Parsing with PetitParser2

In this series we describe PetitParser2 --- a modular and flexible high-performance top-down parsing framework.

1. Structure

The text is organized into two parts. The first part covers the basic developer's workflow with PetitParser. It is suitable for anyone who has a bit of experience with PetitParser and would like to learn how to efficiently develop production-quality parsers with PetitParser. The second part covers advanced topics and inspects the internals of PetitParser in more depth. It is for those who would like to fully leverage the capabilities of PetitParser to better fit their needs.

Prerequisites

This text expects a basic knowledge of parsing with PetitParser. If you are not sure whether you have the required level of knowledge, we recommend Writing Parser with PetitParser by Lukas Renggli and Introduction to PetitParser2 by Tudor Girba. Optionally, there is a dedicated PetitParser chapter in the Deep into Pharo book.

Part I, Developer's Workflow

In the first part, developer's workflow, we demonstrate how to quickly build a high-performance parser, using the concrete example of extracting Javascript code from any HTML source (including malformed ones).

We start by prototyping in a playground and, over several steps, arrive at a full-fledged, tested parser that effectively builds a structured representation of the parsed source. The tutorial covers many topics, some of which are well known in the area of parsing. The following ones, however, are unique to PetitParser:

  1. We utilize bounded seas --- a technology that allows a programmer to focus on the interesting parts of an input (i.e. Javascript code in our case) and ignore the rest (i.e. the remaining HTML code).
  2. We utilize context-sensitive rules that will allow us to properly (and formally!) detect matching begin and end HTML tags and recover from malformed inputs.
  3. Last but not least, we use the optimization capabilities of PetitParser to turn our prototype into an efficient top-down parser.
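
As a rough illustration of the first point, the following playground sketch wraps a simple island in a bounded sea. It is a minimal sketch rather than the parser developed in the tutorial: the variable names and the input string are made up for this example, and it assumes that PetitParser2 is loaded and that `asPParser`, `starLazy`, `sea` and `parse:` behave as described in the comments.

```smalltalk
"Island: a <script> element. starLazy is assumed to consume as little input as needed before the closing tag."
script := '<script>' asPParser, #any asPParser starLazy, '</script>' asPParser.

"Wrap the island in a sea, so that the surrounding HTML is skipped as water."
scriptSea := script sea.

"Hypothetical input. Inspect the result in a playground to see the water and the island."
scriptSea parse: '<html><body><script>alert(1)</script></body></html>'.
```

The point of the sketch is that the grammar only describes the script element. Nothing has to be written for the rest of the HTML.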

In Extracting Javascript we start with a simple script. Later in From a Script to a Parser, we turn the script into a proper grammar and add tests. In Extracting the Structure we extend the parser with rules to recognize HTML structure, even if the HTML source is malformed. In Abstract Syntax Tree, we show how to build a suitable representation of the HTML source.

In Optimizations we inspect the optimization capabilities of PetitParser2, and in Memoizations we describe the PetitParser2 tooling that helps us pinpoint performance bottlenecks and show techniques for fixing them.

Part II, Advanced Topics

In the second part we describe advanced topics and go into the details of topics mentioned in the first part. For example, we provide more details on context-sensitive parsing and on optimizations with caches and specializations, and we show how to parse streams and how to efficiently highlight text (especially useful for text editors).

Using the concrete example of a Smalltalk grammar, we describe how to develop a parser whose performance is comparable to table-driven parsers such as SmaCC or hand-written parsers such as RBParser (a parser used by the Pharo compiler).

add some graphs?

create a chapter about syntax highlighting

create a chapter about specializations

2. Getting Started

Installation

The easiest way to start this tutorial is to use Moose. Moose is a software and data analysis platform that has everything we need already installed.

Alternatively, you can download a clean Pharo 6 (or higher) image and install PetitParser2 using the following command:

Metacello new
	baseline: 'PetitParser2';
	repository: 'github://kursjan/petitparser2';
	load

If none of the above works, please let us know.

Book Chapters

The list of available chapters:

Part two is not yet finished

If you are interested in more details:

3. Changelog

2017-02-21 - First draft of the Developer's workflow

4. License

The text is released under the MIT license.

5. Contact

Do you have ideas, suggestions or issues? Write us an email or contact us on GitHub!