Harald Kirsch

genug Unfug.

Absimpa — Abstract Simple Parser

Abstract Simple Parser

Absimpa is a Java class library that provides a recursive decent parser. It is an experiment in abstraction and Java generics. I tried to carve out the essentials of what makes out a parser and to leave out, or rather leave free for specification, what is not relevant for the parsing process.

As a result, for example, the parser does not deal with String objects. This is left to the lexical analyzer, or lexer, for short. In fact, the whole package does not even assume that the lexer deals with strings. The only assumption made is, that the lexer produces a sequence of objects from some enumeration type C. C is a generic parameter and users of this package may freely define what they need in their application. As a result, the lexer interface basically looks like this:

public interface Lexer<N,C extends Enum<C>> {
  c current();
  N next();

In the usual lexer design, the lexer analyzes some string and produces tokens, a core ingredient of which is the piece of text the lexer recognized at the current input position. This seems to collide with the fact that Lexer.current() must produce values from an enumeration. There is no place to hold an arbitrary and ever changing piece of text in an enumeration object.

The puzzle is solved when we look at what the parser does when it receives a C from the lexer and finds it to be just right in its current stage of analysis. The parser, then calls next() to obtain an object of the type into which the parsed sequence is to be transformed.

In particular the lexer may now provide textual information in the generated N to create a classic syntax tree node. The parser, however, is completely desinterested in what N is. It will just pass it around and eventually return an object of type N as the result of the parse.

For more information and a step by step guide, see the javadocs for GrammarBuilder.