HTML elements have an interesting property: the name of an opening tag has to match the name of the closing tag. Though natural for humans, this is, surprisingly, a rather difficult task from the point of view of parsing theory.
Unfortunately, standard solutions do not really work the way we would like. We describe the problem in more detail in the supplementary Matching Tags chapter.
To fix the issue, PetitParser2 comes with a special syntax for expressing the constraint that opening and closing tags match.
It can store the result of a rule (e.g. an opening HTML tag) on a stack using the push operator, and assert that the result of a rule (e.g. a closing HTML tag) matches the top of the stack using the match operator. The pop operator then removes the top of the stack.
Here is a concrete example. First, we define an element name as a repetition of letters and digits.
Then we define element as a sequence of elOpen, elContent, and elClose.
In elOpen, we push the element name and consume water in case the element contains attributes.
In elClose, we first match the element name against the top of the stack and pop the stack on success.
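As a rough illustration of the stack discipline behind push, match, and pop, here is a minimal sketch in plain Python. It is not PetitParser2 code: the `TAG` pattern and the `tags_match` function are hypothetical names introduced only for this example, and the check is stricter than the tolerant parser described in this chapter, because it rejects any input with an unmatched tag.

```python
import re

# Hypothetical tokenizer for this sketch: finds opening tags such as
# <p class="x"> and closing tags such as </p>, capturing the slash and the name.
TAG = re.compile(r"<(/?)([A-Za-z0-9]+)[^>]*>")


def tags_match(html):
    """Return True if every closing tag matches the most recently opened tag."""
    stack = []
    for closing, name in TAG.findall(html):
        if not closing:
            stack.append(name)   # push: remember the name of the opening tag
        elif stack and stack[-1] == name:
            stack.pop()          # match succeeded, so pop the top of the stack
        else:
            return False         # match failed: the name differs from the top
    return not stack             # every opened element must have been closed


print(tags_match("<p>lorem <b>ipsum</b></p>"))
print(tags_match("<p>lorem</b>"))
```

The first call prints `True` and the second prints `False`. The point of the sketch is the order of operations: a name is pushed when an element opens, compared against the top of the stack when an element closes, and popped only if the comparison succeeds.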
Now we need to define the elContent rule, which represents the content of an element.
elContent is zero or more repetitions of the following components in the given order:
Text can be anything.
Therefore, we define it with the help of bounded seas, concretely using the starLazy operator.
Note that we mark the text rule with nonEpsilon. The nonEpsilon operator is an extension of PEGs that forbids epsilon parses (in other words, if the underlying parser does not consume any input, it fails).
The reason for this is that #any asPParser starLazy can consume anything, even the empty string, because the starLazy operator allows for zero repetitions.
It might be possible to define a plusLazy operator as well.
Without nonEpsilon, the star repetition in elContent would end up in an infinite loop, recognizing an epsilon in each of its iterations, never failing and never stopping.
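To make the epsilon problem concrete, here is a small Python sketch (again, not PetitParser2 code). It models a parser as a function from a text and a position to a new position, or `None` on failure. All names are assumptions made for this example: `any_star_lazy` is a stand-in for a lazy repetition that happens to consume nothing, and the iteration cap in `star` exists only so that the demonstration terminates.

```python
def any_star_lazy(text, pos):
    """Stand-in for a lazy repetition that stops immediately: consumes nothing."""
    return pos


def non_epsilon(parser):
    """Wrap a parser so that it fails (returns None) when it consumes no input."""
    def wrapped(text, pos):
        end = parser(text, pos)
        return None if end is None or end == pos else end
    return wrapped


def star(parser, max_iterations=1000):
    """Repeat parser until it fails; return (end position, iteration count).

    The cap is only there to keep this demo finite. A real star repetition
    has no such limit and would spin forever on an epsilon parse.
    """
    def wrapped(text, pos):
        count = 0
        while count < max_iterations:
            end = parser(text, pos)
            if end is None:
                break
            pos = end
            count += 1
        return pos, count
    return wrapped


_, runaway = star(any_star_lazy)("<p>", 0)
_, guarded = star(non_epsilon(any_star_lazy))("<p>", 0)
print(runaway, guarded)
```

The unguarded repetition runs until it hits the artificial cap of 1000 iterations, while the guarded one stops after zero iterations, because the wrapped parser fails as soon as it makes no progress.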
You can easily freeze your image by running the following code (we recommend saving your image now):
We have written new code, so let us check whether it works.
We start with
We should be able to parse malformed elements as well.
Let's see if the pop magic works as expected.
So far it looks good. But our tests only tell us that element can parse the given input; they do not tell us how it parses the input. We should assert that the proper element names and the right content are extracted.
It is time to return a more convenient representation of the input: an abstract syntax tree. We will do this in the following chapter.
This text is part of the Parsing With PetitParser2 series.