Camlp4 provides a library module ``Grammar'' and a syntax extension
file ``pa_extend.cmo'' to program grammars. All details on modules,
types and functions are described in chapter 7.

Grammars are created with the function ``Grammar.create''. It takes a
lexer as parameter. A good candidate is the function ``Plexer.make'',
but users can create their own lexer, provided that it is of type
``Token.lexer''.

Entries are created with the function ``Grammar.Entry.create
''. The first parameter is the grammar,
the second one is a string used to name the entry in error messages
and in entry dumps. Entries are created empty, i.e. raising
``Stream.Error'' when called. An entry is composed of precedence
levels, the first one having the lowest precedence and the last one
the highest.

An entry is parsed with the function ``Grammar.Entry.parse
''. In case of syntax error, the
exception ``Stream.Error
'' is raised, encapsulated by
the exception ``Stdpp.Exc_located
'', giving the location
of the error.
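For instance, creating a grammar and an entry, then parsing a string,
might look like this (a minimal sketch, assuming the
``Grammar.create'', ``Plexer.make'' and ``Grammar.Entry'' functions
described above):

  (* Create a grammar from the standard lexer and an entry in it.
     The entry is empty until it is extended. *)
  let gram = Grammar.create (Plexer.make ());;
  let expr = Grammar.Entry.create gram "expr";;

  (* Parsing with an empty entry raises Stdpp.Exc_located
     encapsulating a Stream.Error, with the location of the error. *)
  let parse s =
    try Grammar.Entry.parse expr (Stream.of_string s) with
      Stdpp.Exc_located ((bp, ep), exc) ->
        Printf.eprintf "syntax error at characters %d-%d\n" bp ep;
        raise exc;;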
Entries are extended with the function ``Grammar.extend''. But since
its interface is quite complicated and it must be used with
appropriate type constraints, the Camlp4 library provides a file
named ``pa_extend.cmo'', compatible with ``pa_o.cmo'' and
``pa_r.cmo'', which creates a new instruction doing this work.

This instruction is ``EXTEND'', which has the following format:

  EXTEND
    { GLOBAL : global-list ; }
    entry : { position } extension ;
    ...
    entry : { position } extension ;
  END
where EXTEND, GLOBAL and END are keywords. There are some other
keywords in this instruction, all in uppercase.

The optional ``GLOBAL'' lists the entries that are global in the
``EXTEND
''
instruction. The other entries are created locally. By default, all
entries are global and must correspond to entry variables visible at
the ``EXTEND
'' instruction point.
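For example, an ``EXTEND'' instruction with a ``GLOBAL'' list might
look like the following sketch, where ``expr'' is a global entry
created as above and ``atom'' a local one (the rule syntax is
detailed below):

  EXTEND
    GLOBAL: expr;
    expr: [ [ x = atom; "+"; y = atom -> x + y ] ];
    atom: [ [ i = INT -> int_of_string i ] ];
  END;;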
The optional position can be:

FIRST: The extension is inserted at the beginning of the precedence
  levels.
LAST: The extension is inserted at the end of the precedence levels.
BEFORE label: The extension is inserted before the precedence level
  so labelled.
AFTER label: The extension is inserted after the precedence level so
  labelled.
LEVEL label: The extension is inserted at the precedence level so
  labelled.
Only LEVEL extends already existing levels; the other cases create
new levels.
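For instance, adding a rule to an already existing level labelled
"simple" might be sketched as follows (the entry and the label are
hypothetical; the rule syntax is detailed below):

  EXTEND
    expr: LEVEL "simple"
      [ [ "("; x = expr; ")" -> x ] ];
  END;;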
The extension itself is a list of levels:

  [ { label } { associativity } level-rules
  | ...
  | { label } { associativity } level-rules ]
The optional associativity is LEFTA, RIGHTA or NONA for respectively
left, right and no associativity; the default is left associativity.
The level-rules are a list of rules:

  [ { pattern = } symbol ; ... { pattern = } symbol { -> action }
  | ...
  | { pattern = } symbol ; ... { pattern = } symbol { -> action } ]
In the actions, the variable ``loc'' is bound to the source location
of the rule. The action part is optional; by default it is the value
``()''.
The type of a token or keyword symbol is ``string''. Other useful
symbols are LIST0 and LIST1, whose syntax is:

  LISTx symbol1 { SEP symbol2 }

meaning a list (possibly empty for LIST0 and with at least one
element for LIST1) of symbol1, whose elements are optionally
separated by symbol2. The type is ``t1 list'' where ``t1'' is the
type of symbol1 (the result of the optional symbol2 is lost).
Another symbol is OPT followed by a symbol, meaning this symbol or
nothing. The type is ``t option'' where ``t'' is the type of the
symbol.
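Putting it all together, here is a sketch of an ``EXTEND''
instruction for a small arithmetic evaluator on the entry ``expr''
created above; the level labels are illustrative, and ``expr_list''
is a hypothetical extra entry showing ``LIST0 ... SEP'' and ``OPT'':

  EXTEND
    expr:
      [ "add" LEFTA
        [ x = SELF; "+"; y = SELF -> x + y
        | x = SELF; "-"; y = SELF -> x - y ]
      | "mult" LEFTA
        [ x = SELF; "*"; y = SELF -> x * y ]
      | "simple"
        [ i = INT -> int_of_string i
        | "("; x = SELF; ")" -> x ] ];
    expr_list:
      [ [ l = LIST0 expr SEP ","; OPT ";" -> l ] ];
  END;;

Here ``SELF'' refers to the entry being extended; since the first
level has the lowest precedence, "+" and "-" bind less tightly than
"*".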
Rules can be deleted with the function ``Grammar.delete_rule''. But,
like ``Grammar.extend'', it is not documented. One must use the
instruction ``DELETE_RULE'', which generates a call to this function.
This instruction is a syntax extension, loaded together with the
instruction ``EXTEND'' by the file ``pa_extend.cmo''.

The format of ``DELETE_RULE'' is:

  DELETE_RULE
    entry : symbol ; ... symbol
  END
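For example, the subtraction rule added in the sketch above could be
removed with (a sketch, assuming the same entry ``expr''):

  DELETE_RULE expr: SELF; "-"; SELF END;;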
The lexer returns tokens of type ``Token.t'', which is actually
``(string * string)'', the first string being a constructor (which
must be an identifier starting with an uppercase letter) and the
second string the value. For identifiers, a lexer typically uses the
constructor IDENT and puts the identifier value in the second string.
For example, reading "foo", it returns ("IDENT", "foo")
. Another example: if your lexer
reads integers, you can use INT
as constructor and the string
representation of the integer in the string, e.g. ("INT", "32")
.

In the ``EXTEND'' statement, you can use as a symbol a constructor
with a specific value, e.g.:

  IDENT "bar"
  INT "32"

which recognize only the identifier "bar" or only the integer 32,
respectively. Another possible symbol is the constructor alone, which
recognizes any value of this constructor. It is useful to bind it to
a pattern identifier, to use it in the action part of the rule:

  p = IDENT
  i = INT

Notice that you can use any name for your constructors, provided they
are identifiers starting with an uppercase letter and are not in
conflict with the predefined symbols of the ``EXTEND'' statement,
which are: SELF, NEXT, LIST0, LIST1 and OPT
.""
(the empty string) as constructor, you can use the second string
directly in the rules. It is the case in our examples for the
operators "+"
, "*"
and so on.
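For instance, a rule combining a constructor with a specific value
and a constructor bound to a pattern identifier might be sketched as
follows (the entry ``expr'', the level label and the function
``lookup'' are hypothetical):

  EXTEND
    expr: LEVEL "simple"
      [ [ IDENT "zero" -> 0
        | p = IDENT -> lookup p ] ];
  END;;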
The lexing function creates a stream of couples
``(Token.t * Token.location)'' from a character stream. The type
``Token.t'' is defined above and ``Token.location'' is a couple of
integers giving the input location (the first character of the input
stream having position 0). If your lexer does not compute locations,
it can return (0, 0) as location; this does not prevent the system
from working, since the location is used only in error messages.
The module ``Token'' provides a function to create a lexer function
from an ocamllex lexing function (see the interface of module
``Token''). Moreover, this function takes care of the location stuff.

The lexer parameter of the ``Grammar.create
'' function is a record of type ``Token.lexer'' with the following
fields:

func: it is the main lexing function. It is called once
when calling Grammar.Entry.parse
with the input character
stream. The simplest way to create this function is to apply
Token.lexer_func_of_parser
to your stream parser lexer, or
Token.lexer_func_of_ocamllex
to your ocamllex lexing function.

using: it is a function taking a token pattern as
parameter. When the ``EXTEND''
statement first scans all symbols in all
rules, it calls this function with the token patterns encountered (a
token pattern is of type (string * string)
). This function
allows you to check the pattern (that the constructor is among the
ones the lexer generates: raise an exception if not) and enter
keywords (if you have keywords) in your keyword table (if you use a
keyword table).

removing: like using, but called when a rule is
removed.

tparse: tells how a token pattern is parsed. It is called each
time the grammar machinery has to compare the input token to a token
pattern of a rule. This function must return a specific parser in
``Some'', or ``None'' for the standard token parsing, which is, for
the pattern ``(p_con, p_prm)'':
  if p_prm = "" then
    parser [< '(con, prm) when con = p_con >] -> prm
  else
    parser [< '(con, prm) when con = p_con && prm = p_prm >] -> prm
text: gives the name of a token pattern, to be used in error
messages. You can use the function named ``lexer_text'' provided in
the module ``Token''.
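As a final illustration, here is a sketch of a complete hand-written
lexer record with the fields described above: a toy stream-parser
lexer with no keyword table, returning (0, 0) locations as discussed
earlier. Check the exact field types against the interface of module
``Token''.

  (* A toy lexing function: skip blanks, return INT for digits,
     "" (keyword) tokens for operators and EOI at end of input. *)
  let rec next_token = parser
      [< ' (' ' | '\n' | '\t'); tok = next_token >] -> tok
    | [< ' ('0'..'9' as c) >] -> (("INT", String.make 1 c), (0, 0))
    | [< ' ('+' | '*' as c) >] -> (("", String.make 1 c), (0, 0))
    | [< >] -> (("EOI", ""), (0, 0))

  let lexer =
    {Token.func = Token.lexer_func_of_parser next_token;
     Token.using = (fun _ -> ());     (* no keyword table to fill *)
     Token.removing = (fun _ -> ());
     Token.tparse = (fun _ -> None);  (* standard token parsing *)
     Token.text = Token.lexer_text};;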