Parsing with PetitParser2

Simple, modular and flexible high-performance parsing framework.

Grammar: Turning the script into a real parser

To scale beyond a playground script, we should turn the script into a proper parser class.

Note:

Turning a script into a class is good practice because it allows us to:

  • manage cyclic dependencies,
  • simplify testing and
  • easily extend with new functionality.


Hands On

Warning:

Note that the WebGrammar and WebGrammarTest classes are already loaded in your image. You might want to save them for future reference, or rename them before proceeding with the next steps.

We create a parser by subclassing PP2CompositeNode:


PP2CompositeNode subclass: #WebGrammar
  instanceVariableNames: 'document javascript elOpen elContent elClose elementName element text jsOpen jsContent jsClose jsString structuredDocument comment any'
  classVariableNames: ''
  poolDictionaries: ''
  category: 'PetitParser2-Tutorial'

Javascript rule

We define the javascript rule as follows:

WebGrammar>>javascript
  ^ jsOpen, jsContent, jsClose ==> #second
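
To see what ==> #second does in this rule, the same sequence can be tried in a playground. This is a minimal sketch, assuming PetitParser2 is loaded. The inline parsers stand in for jsOpen, jsContent and jsClose, and the variable name js is only illustrative:

| js |
"A sequence of three parsers answers its three results in order;
 ==> #second keeps only the middle one, the script content."
js := '<script>' asPParser, #any asPParser starLazy, '</script>' asPParser ==> #second.
js parse: '<script>alert(1)</script>'.

Inspecting the result should show only the characters between the opening and closing tags, not the tags themselves.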

The rules that javascript depends on are defined as follows:

WebGrammar>>jsOpen
  ^ '<script>' asPParser

WebGrammar>>jsClose
  ^ '</script>' asPParser

WebGrammar>>jsContent
  ^ (jsString / any) starLazy

WebGrammar>>jsString
  ^ $' asPParser, any starLazy, $' asPParser

WebGrammar>>any
  ^ #any asPParser

Cover the javascript rule with tests to make sure it works as expected. Do this by subclassing PP2CompositeNodeTest and adding test methods:

PP2CompositeNodeTest subclass: #WebGrammarTest
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: ''
  category: 'PetitParser2-Tutorial'

WebGrammarTest>>parserClass
  ^ WebGrammar

WebGrammarTest>>testJavascript
  self 
    parse: '<script>alert("hi there!")</script>' 
    rule: #javascript  
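
The jsString rule exists so that a closing tag inside a single-quoted string does not end the script early. A further test along these lines could check that case. It is a sketch: the method name testJavascriptWithString is an assumption, and the input is expected to be accepted only if jsContent and jsString behave as described above:

WebGrammarTest>>testJavascriptWithString
  "The quoted '</script>' should be consumed by jsString, so only the
   final </script> closes the element. Quotes are doubled to escape them."
  self
    parse: '<script>alert(''</script>'')</script>'
    rule: #javascript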

Document rule

To extract javascript from an HTML document, we first define the document rule. Because we are interested only in javascript, the rule is simply a repetition of javascript seas:

WebGrammar>>document
  ^ (javascript sea ==> #second) star

Note:

Or alternatively:

WebGrammar>>document
  ^ javascript islandInSea star

The islandInSea operator is a shorthand for:

sea ==> #second
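
The reason #second selects the island is that a sea answers the water before the island, the island itself, and the water after it, in that order. A small playground sketch, assuming PetitParser2 is loaded (the input string and the variable name island are illustrative):

| island |
"sea wraps the parser for 'b' so that surrounding input is skipped as water."
island := 'b' asPParser sea.
island parse: 'aaabccc'.

"islandInSea keeps only the island part of that result."
'b' asPParser islandInSea parse: 'aaabccc'.

Inspecting both results side by side shows the difference between the full sea result and the island alone.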


Do not forget the start rule, which is the entry point of the parser:

WebGrammar>>start
  ^ document 
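
With start defined, the grammar can also be used directly from a playground. A minimal sketch, assuming the WebGrammar class above compiles as shown (the input string is illustrative):

"Parsing starts at the start rule, which delegates to document."
WebGrammar new parse: '<html><script>alert(1)</script></html>'.

The result should be a collection with one entry per script element found in the input.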

And finally, we write a test for document:

WebGrammarTest>>testDocument
  | input |
  input := PP2Sources current htmlSample.
  
  self parse: input rule: #document.
  self assert: result size equals: 2.
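
A variant of this test with an inline input makes the expected count easier to verify by eye. This is a sketch: the method name testDocumentInline and the input are assumptions, not part of the tutorial sources:

WebGrammarTest>>testDocumentInline
  "Two script elements are embedded in markup that the seas should skip."
  self
    parse: '<html><script>a()</script><p>text</p><script>b()</script></html>'
    rule: #document.
  self assert: result size equals: 2.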

Both tests should pass now:

(WebGrammarTest buildSuiteFromMethods: #(#testDocument #testJavascript)) run.
2 run, 2 passes, 0 skipped, 0 expected failures, 0 failures, 0 errors, 0 unexpected passes
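
As an alternative sketch, the standard SUnit class-side messages can run either the whole test class or a single test. The counts reported depend on how many test methods the class contains at that point:

"Run every test method in the class."
WebGrammarTest suite run.

"Run a single test method."
WebGrammarTest run: #testJavascript.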

Summary

We turned the script from the previous chapter into a parser class so that we can test it and extend it further.