Simple, modular and flexible high-performance parsing framework.
To scale, we should turn the playground script into a proper parser class.
Turning a script into a class is good practice because it lets us test the parser and extend it further.
Note that the WebGrammar and WebGrammarTest classes are already loaded in your image. You might want to save them for future reference, or rename them before proceeding.
We create a parser by subclassing PP2CompositeNode:
PP2CompositeNode subclass: #WebGrammar
    instanceVariableNames: 'document javascript elOpen elContent elClose elementName element text jsOpen jsContent jsClose jsString structuredDocument comment any'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitParser2-Tutorial'
We define a javascript rule as follows:
WebGrammar>>javascript
    ^ jsOpen, jsContent, jsClose ==> #second
The rules that javascript refers to are defined as follows:
WebGrammar>>jsOpen
    ^ '<script>' asPParser

WebGrammar>>jsClose
    ^ '</script>' asPParser

WebGrammar>>jsContent
    ^ (jsString / any) starLazy

WebGrammar>>jsString
    ^ $' asPParser, any starLazy, $' asPParser

WebGrammar>>any
    ^ #any asPParser
Cover the javascript rule with tests to make sure it works as expected.
Do this by subclassing PP2CompositeNodeTest and adding test methods:
PP2CompositeNodeTest subclass: #WebGrammarTest
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'PetitParser2-Tutorial'

WebGrammarTest>>parserClass
    ^ WebGrammar

WebGrammarTest>>testJavascript
    self
        parse: '<script>alert("hi there!")</script>'
        rule: #javascript
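The individual rules can be covered in the same way. The following is a minimal sketch and not part of the original tutorial: the test names and inputs are made up for illustration, and fail:rule: is assumed to be available on PP2CompositeNodeTest as the negative counterpart of parse:rule:.

```smalltalk
WebGrammarTest>>testJsOpen
    "jsOpen should accept exactly the opening tag."
    self parse: '<script>' rule: #jsOpen

WebGrammarTest>>testJsClose
    "jsClose should accept exactly the closing tag."
    self parse: '</script>' rule: #jsClose

WebGrammarTest>>testJsOpenRejectsOtherTag
    "Assumption: fail:rule: asserts that the rule does not accept the input."
    self fail: '<style>' rule: #jsOpen
```

Testing the small rules separately tends to make it easier to locate a problem when a composite rule such as javascript fails.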
To extract JavaScript from an HTML document, we first define the document rule as a repetition of javascript seas, since JavaScript is all we are interested in:
WebGrammar>>document
    ^ (javascript sea ==> #second) star
Or alternatively:
WebGrammar>>document
    ^ javascript islandInSea star
The islandInSea operator is a shorthand for:
    sea ==> #second
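To see why #second selects the island, a sea can be tried on its own in a playground. This is a sketch under assumptions: the input string is made up, and the result of a sea is assumed to be a three-element collection holding the water before the island, the island itself, and the water after it, which is what the ==> #second mapping implies.

```smalltalk
| island result |
"The island we are looking for; everything around it is water."
island := '<script>' asPParser.
result := island sea parse: 'some text <script> more text'.
"Pick the island out of the sea result, as ==> #second does."
result second
```

Inspecting result as a whole shows the surrounding water as well, which is the part that islandInSea discards.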
Do not forget the start rule:
WebGrammar>>start
    ^ document
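With start in place, the grammar can also be run as a whole, for example from a playground. A minimal sketch, assuming that sending parse: to a WebGrammar instance begins at the start rule, and using the htmlSample input provided by PP2Sources; the variable names are illustrative.

```smalltalk
| parser result |
parser := WebGrammar new.
"parse: on the composite parser begins at the start rule, i.e. document."
result := parser parse: PP2Sources current htmlSample.
"One element per javascript island found in the input."
result size
```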
And finally, we write a test for document:
WebGrammarTest>>testDocument
    | input |
    input := PP2Sources current htmlSample.
    self parse: input rule: #document.
    self assert: result size equals: 2.
Both tests should pass now:
(WebGrammarTest buildSuiteFromMethods: #(#testDocument #testJavascript)) run.
2 run, 2 passes, 0 skipped, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
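Further tests can be added in the same style. As an illustration, the document rule can be checked against a small inline input, which makes the expected count easy to verify by eye. The test name and the HTML snippet below are made up and are not part of the tutorial sources.

```smalltalk
WebGrammarTest>>testDocumentInline
    | input |
    "Two script elements surrounded by unrelated markup (the water)."
    input := '<html><script>a()</script><p>text</p><script>b()</script></html>'.
    self parse: input rule: #document.
    "Each javascript island contributes one element to the result."
    self assert: result size equals: 2
```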
We turned the script from the previous chapter into a parser class so that we can test it and extend it further.