|
Absimpa v196 | ||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||
java.lang.Objectabsimpa.GrammarBuilder<N,C>
N - is the type of the objects created by the parser to be
generated from the grammar constructed with a GrammarBuilderC - is the type of token codes provided by the lexer that will be
used by parserpublic class GrammarBuilder<N,C extends java.lang.Enum<C>>
helps to build a Grammar for a language made from objects of type
C. To create your grammar, the following bits and pieces are
needed.
Lexer is needed that provides token codes of type
C. It is completely up to you what these token codes stand for.
For the parser and the grammar the only interesting bit is that C
is an Enum. Typically a token code signifies a type of
token like NUMBER, OPENBRACKET, IDENTIFIER and so on. Normally the lexer
will internally associate the current token code also with a piece of
text, the token text, but the parser we built is never interested in this
text. Nevertheless, the text is not lost. Read on.N is needed. This package makes no assumption
about the type of result created, it is just called N. Whenever
the parser has recognized a token, it will ask the lexer to provide a leaf
node for it. And when it has recognized a partial bit of input, it will
call a NodeFactory with the bits it has recognized and asks it to
create an N.NodeFactory objects are needed. Whenever the
parser recognizes a part of the input, for example a sequence of tokens as
described by the seq() method, it will call the associated
NodeFactory with the bits it recognized to obtain a new N.
It is up to you to define what N is. It may be a type that
describes a syntax tree, but it may as well describe something completely
different that is incrementally built up while parsing.
To build a grammar, start by defining a few token recognizing grammars
with token(). Then you can combine these, for example into
a sequence, by passing the first to seq() and then adding
more with Sequence.add(absimpa.Grammar. To repeat things, use star()
or repeat(). Use choice() and subsequently
Choice.or() to create a choice of subnodes. A rough
outline of an example is:
GrammarBuilder<...> gb = new GrammarBuilder<...>(nodeFactory); Grammar<...> NUMBER = gb.token(CODE_NUMBER); Grammar<...> product = gb.star(NUMBER); Parser<...> parser = product.compile();
This would define a product as arbitrary length sequence of
NUMBERs. When the parser has collected all NUMBERs
available, it will call the nodeFactory with the list of results
provided by the lexer to the token parser. The result of the parse would
be whatever the nodeFactory makes out of the list. It could return
an object that contains the list as a field, but it could as well multiply
the numbers and return an N that contains just the product.
Slightly tricky is the creation of a recursive grammar. Create a
Recurse as a placeholder and later set its recursive child with
Recurse.setChild(Grammar).
The use of the GrammarBuilder is recommended over direct use of
the grammar classes like Sequence, Choice and so on,
because it reliefs from a lot of generic parameter typing.
To make sure that parsers compiled from the grammars produced here parse
the whole input sequence, the lexer eventually needs to produce a specific
eof-token which is explicitly matched by the grammar. Suppose your grammar
is nothing but G-> (SCOPE TERM)+. The parser will happily parse a
non-empty sequence of SCOPE and TERM pairs. In particular
it will succeed for the sequence SCOPE TERM TERM with parsing the
first pair and will leave the 2nd TERM unparsed. To prevent this,
the grammar should rather be G-> (SCOPE TERM)+ EOF.
It would be nice to use varargs methods for seq() and choice(). But due to the generic parameters of grammars, there would then
always be the compiler warning about unsafe conversion to array. Therefore
there are variants of those methods with up to 5 parameters. Only beyond
that, varargs is used.
Remark: This package has no support to create a
Lexer, but there is a SimpleLexer the
source code of which may serve to demonstrate the principles.
| Constructor Summary | |
|---|---|
GrammarBuilder(NodeFactory<N> defaultFactory)
the resulting GrammarBuilder will enter the given factory into
grammar objects as they are created, if no factory is explicitly
provided. |
|
| Method Summary | |
|---|---|
Choice<N,C> |
choice(Grammar<N,C> g)
creates a grammar to recognize any one of a number of sub grammars, one of which is the given g. |
Choice<N,C> |
choice(Grammar<N,C> g1,
Grammar<N,C> g2)
|
Choice<N,C> |
choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3)
|
Choice<N,C> |
choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4)
|
Choice<N,C> |
choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5)
|
Choice<N,C> |
choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5,
Grammar<N,C>... more)
|
Repeat<N,C> |
opt(Grammar<N,C> grammar)
convenience function to call repeat(Grammar,int,int) with
min=0 and max=1. |
Repeat<N,C> |
opt(NodeFactory<N> nf,
Grammar<N,C> grammar)
convenience function to call repeat(NodeFactory,Grammar,int,int) with min=0 and
max=1. |
Repeat<N,C> |
repeat(Grammar<N,C> grammar,
int min,
int max)
creates a grammar like repeat(NodeFactory, Grammar,int,int),
but with the default NodeFactory. |
Repeat<N,C> |
repeat(NodeFactory<N> factory,
Grammar<N,C> g,
int min,
int max)
creates a grammar to recognize a repetition of the given subgrammar g. |
Sequence<N,C> |
seq(Grammar<N,C> g)
creates a grammar like seq(NodeFactory, Grammar), but with the
default NodeFactory. |
Sequence<N,C> |
seq(Grammar<N,C> g1,
Grammar<N,C> g2)
|
Sequence<N,C> |
seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3)
|
Sequence<N,C> |
seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4)
|
Sequence<N,C> |
seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5)
|
Sequence<N,C> |
seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5,
Grammar<N,C>... more)
|
Sequence<N,C> |
seq(NodeFactory<N> factory,
Grammar<N,C> g)
creates a grammar to recognize a sequence of subgrammars which starts with g. |
Repeat<N,C> |
star(Grammar<N,C> grammar)
convenience function to call repeat(Grammar,int,int) with
min=0 and max=Integer.MAX_VALUE. |
Repeat<N,C> |
star(NodeFactory<N> nf,
Grammar<N,C> grammar)
convenience function to call repeat(NodeFactory,Grammar,int,int) with min=0 and
max=Integer.MAX_VALUE. |
TokenGrammar<N,C> |
token(C code)
creates a grammar to recognize the given token code. |
| Methods inherited from class java.lang.Object |
|---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public GrammarBuilder(NodeFactory<N> defaultFactory)
the resulting GrammarBuilder will enter the given factory into
grammar objects as they are created, if no factory is explicitly
provided.
| Method Detail |
|---|
public TokenGrammar<N,C> token(C code)
creates a grammar to recognize the given token code.
public Sequence<N,C> seq(NodeFactory<N> factory,
Grammar<N,C> g)
creates a grammar to recognize a sequence of subgrammars which starts
with g. The factory provided will be used to transform
the recognized list of elements into the result type N. To add
more subgrammars to the sequence, use Sequence.add(absimpa.Grammar.
public Sequence<N,C> seq(Grammar<N,C> g)
creates a grammar like seq(NodeFactory, Grammar), but with the
default NodeFactory.
public Sequence<N,C> seq(Grammar<N,C> g1,
Grammar<N,C> g2)
public Sequence<N,C> seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3)
public Sequence<N,C> seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4)
public Sequence<N,C> seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5)
public Sequence<N,C> seq(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5,
Grammar<N,C>... more)
public Repeat<N,C> repeat(NodeFactory<N> factory,
Grammar<N,C> g,
int min,
int max)
g. The provided factory will be used to transform the
recognized list of elements into the result type N.
min - is the minimum number of times the subgrammar must be found
in the inputmax - is the maximum number of times the subgrammar must be found
in the inputstar(absimpa.Grammar) ,
opt(absimpa.Grammar)
public Repeat<N,C> repeat(Grammar<N,C> grammar,
int min,
int max)
creates a grammar like repeat(NodeFactory, Grammar,int,int),
but with the default NodeFactory.
public Repeat<N,C> star(Grammar<N,C> grammar)
repeat(Grammar,int,int) with
min=0 and max=Integer.MAX_VALUE.
public Repeat<N,C> star(NodeFactory<N> nf,
Grammar<N,C> grammar)
repeat(NodeFactory,Grammar,int,int) with min=0 and
max=Integer.MAX_VALUE.
public Repeat<N,C> opt(Grammar<N,C> grammar)
repeat(Grammar,int,int) with
min=0 and max=1.
public Repeat<N,C> opt(NodeFactory<N> nf,
Grammar<N,C> grammar)
repeat(NodeFactory,Grammar,int,int) with min=0 and
max=1.
public Choice<N,C> choice(Grammar<N,C> g)
g. To add more sub grammars to the choice,
call Choice.or(absimpa.Grammar) . In contrast to all other grammars, the Choice does not need a NodeFactory, because the resulting
parser just passes on the choice that was recognized.
public Choice<N,C> choice(Grammar<N,C> g1,
Grammar<N,C> g2)
public Choice<N,C> choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3)
public Choice<N,C> choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4)
public Choice<N,C> choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5)
public Choice<N,C> choice(Grammar<N,C> g1,
Grammar<N,C> g2,
Grammar<N,C> g3,
Grammar<N,C> g4,
Grammar<N,C> g5,
Grammar<N,C>... more)
|
Absimpa v196 | ||||||||
| PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
| SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD | ||||||||