Parsing with PetitParser2

This text is part of the Parsing With PetitParser2 series. The table of contents can be found at the end of the chapter.

Turning the script into a real parser

Once we are satisfied with the functionality of a prototype (see the previous chapter) and decide to continue, it becomes more and more inconvenient to keep the code in the form of a playground script. It is good practice to define a parser as a class, which allows us to:

  • manage cyclic dependencies,
  • simplify testing and
  • easily extend the parser with new functionality.

1. Hands On

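We create a parser by subclassing PP2CompositeNode. Each rule is a method, and PP2CompositeNode initializes the instance variable with the same name to refer to that rule, so rules can reference each other (even cyclically) through instance variables: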

PP2CompositeNode subclass: #WebGrammar
	instanceVariableNames: 'document javascript elOpen elContent elClose elementName element text jsOpen jsContent jsClose jsString structuredDocument comment any'
	classVariableNames: ''
	poolDictionaries: ''
	category: 'PetitParser2-Tutorial'

Javascript rule

We define the javascript rule as follows:

WebGrammar>>javascript
	^ jsOpen, jsContent, jsClose ==> #second

The rules used by javascript are defined as follows:

WebGrammar>>jsOpen
	^ '<script>' asPParser

WebGrammar>>jsClose
	^ '</script>' asPParser

WebGrammar>>jsContent
	^ (jsString / any) starLazy

WebGrammar>>jsString
	^ $' asPParser, any starLazy, $' asPParser

WebGrammar>>any
	^ #any asPParser

First, we would like to cover the javascript rule with a test to make sure it works as expected. We do this by subclassing PP2CompositeNodeTest, telling it which parser class to test, and adding a test method:

PP2CompositeNodeTest subclass: #WebGrammarTest
	instanceVariableNames: ''
	classVariableNames: ''
	poolDictionaries: ''
	category: 'PetitParser2-Tutorial'

WebGrammarTest>>parserClass
	^ WebGrammar

WebGrammarTest>>testJavascript
	self parse: '<script>alert("hi there!")</script>' rule: #javascript	
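The test above does not exercise jsString, because the script contains only double quotes. The following extra test is a hypothetical sketch (the method name testJavascriptWithString is our own choice, not part of PetitParser2): it puts a closing tag inside a single-quoted JavaScript string. We would expect it to pass only because jsString consumes the whole string literal; without jsString, the lazy repetition in jsContent would stop at the first </script> and the rest of the input would be left unparsed.

```smalltalk
WebGrammarTest>>testJavascriptWithString
	"Hypothetical extra test: the closing tag inside the JavaScript string
	 literal must be consumed by jsString, not treated as jsClose.
	 In Smalltalk, '' inside a string literal stands for a single quote."
	self parse: '<script>x = ''</script>'';</script>' rule: #javascript
```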

Document rule

To extract JavaScript from an HTML document, we define the document rule simply as a repetition of javascript seas, since we are interested only in JavaScript:

WebGrammar>>document
	^ (javascript sea ==> #second) star

Or alternatively:

WebGrammar>>document
	^ javascript islandInSea star

The islandInSea operator is a shorthand for wrapping a parser in a sea and keeping only the island (the second element of the sea's result):

sea ==> #second
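To convince ourselves that both definitions of document behave the same, we can compare the two forms in a playground. This is only a small sketch: the island parser and the input string are made up for illustration.

```smalltalk
"Playground sketch: compare the explicit form with the shorthand.
 The island and the input are arbitrary examples."
| island explicitForm shorthand input |
island := 'island' asPParser.
explicitForm := island sea ==> #second.
shorthand := island islandInSea.
input := 'some water island more water'.
(explicitForm parse: input) = (shorthand parse: input)
```

Evaluating the last line should answer true, since both parsers skip the surrounding water and return only the result of the island.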

We should not forget the start rule, which defines the entry point of the parser:

WebGrammar>>start
	^ document 
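With start in place, the grammar can also be used outside of tests. A minimal playground sketch (the HTML string is a made-up example):

```smalltalk
"Playground sketch: instantiate the grammar and parse an HTML string.
 parse: begins with the start rule, i.e. with document."
| grammar result |
grammar := WebGrammar new.
result := grammar parse: '<p>Hello</p><script>alert(1)</script><p>Bye</p>'.
result size   "number of <script> elements found"
```

Because start returns document, the result is the collection produced by star, with one element (the script content) per script found.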

Finally, we write a test for the document rule:

WebGrammarTest>>testDocument
	| input |
	input := PP2Sources current htmlSample.
	
	self parse: input rule: #document.
	self assert: result size equals: 2.

Both tests should pass now:

(WebGrammarTest buildSuiteFromMethods: #(#testDocument #testJavascript)) run
2 run, 2 passes, 0 skipped, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
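Instead of listing the test selectors explicitly, we can also run all tests of the class with the standard SUnit protocol. The printed counts depend on how many test methods WebGrammarTest contains at that point.

```smalltalk
"Build a suite from every test method of WebGrammarTest and run it;
 the answer is an SUnit TestResult."
WebGrammarTest suite run
```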

2. Complete Sources

You can download the sources here:

3. Conclusion

In this chapter, we turned the script from the previous chapter into a class, following the practices of PetitParser. We also added tests to verify that the rules work as expected. This step allows us to add new functionality, as we will do in the next chapter.

Table of Contents

This text is part of the Parsing With PetitParser2 series.

Part I, Developer's Workflow:

Do you have ideas, suggestions or issues? Write us an email, or contact us on GitHub!
