Difference between revisions of "Lexer Parser"

m (not links)
(Update lexer/parser page with actual outcome, mention Nick's work with perplex)
 
(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
BRL-CAD needs to be able to express grammars for geometry formats in order to provide a flexible, fast, robust and comprehensive framework for conversion from one geometry format type to another.  Ideally, we will use tools designed specifically to generate lexers and parsers to automate the "grunt work" of handling such formats.  Candidates are listed below, with pros and cons.  Decision criteria include license, portability, robustness and ease of integration with our existing build logic.
  
 +
UPDATE - the decision was ultimately for the combination of re2c/lemon, which Nick Reed demonstrated successfully on Windows MSVC and Linux/BSD/OSX builds.  Nick has written a scanner template tool called perplex, which generates the scanner that surrounds the core re2c logic (unlike flex, re2c does not provide a pre-generated scanner).  The current BRL-CAD svn trunk contains a working CMake-based system for using perplex/re2c+lemon, including (if necessary) bootstrapping all three tools from C/C++ source code.
  
 
== Lexers ==
 
 
* Flex - http://flex.sourceforge.net/
 
  
The 800 pound gorilla of open source lexers.  BSD license.  Modern versions require M4, which is why existing Windows versions stop at 2.5.4 - exploring NetBSD m4 as a possible supporting tool.  GNU m4 is a possibility if NetBSD's version doesn't work for some reason.
+
The 800 pound gorilla of open source lexers.  BSD license.  Modern versions require M4, which is why existing Windows versions stop at 2.5.4 - exploring NetBSD m4 as a possible supporting tool.  GNU m4 is a possibility from a licensing standpoint (LGPL) if NetBSD's version doesn't work for some reason.  Got flex+netbsd m4 compiling on some platforms, but modern flex code had some unix-isms that ultimately resulted in our going with re2c.
  
 
* re2c - http://re2c.org/
 
  
Public domain.
+
Public domain.  Seems like it can be built on Windows MSVC platforms.  Searching around, it looks like re2c + lemon may be a relatively common approach for those looking to replace flex/bison.  A worked example: http://fasterparser.blogspot.com/2010/11/re2c-lexer-lemon-parser-calculator.html  Ultimately we went with this one.  The copy in src/other is a fork of the "main" version: it has CMake build logic, and it uses lemon instead of yacc for its parser code.  The latter change was made by Nick Reed to provide a unified toolchain using only re2c/lemon foundations, so that re2c does not introduce a "hidden" dependency on yacc.  If bugs appear in generated re2c code, we need to fix re2c and not the generated code, and using lemon improves our ability to do so.  If upstream is interested, we would be glad to have these changes incorporated into their codebase.
  
 
* Quex - http://quex.sourceforge.net/
 
  
C++, LGPL licensed.  Author "doesn't like it to be used for military applications" although he apparently took the clause forbidding such use out of the license itself...  Requires Python.  Probably not viable for integration into BRL-CAD's build system, or at least not likely to be less trouble than flex itself.
+
C++, LGPL licensed.  Author "doesn't like it to be used for military applications" although he apparently took the clause forbidding such use out of the license itself...  Requires Python.  Probably not viable for integration into BRL-CAD's build system, or at least not likely to be less trouble than flex itself.  Decision was made to use re2c.
  
 
== Parsers ==
 
 
* Bison - http://www.gnu.org/software/bison/
  
800 pound gorilla for parsers.  GNU GPL licensed, which pretty much makes it a non-starter for inclusion in BRL-CAD's tree.
+
800 pound gorilla for parsers.  GNU GPL licensed, which pretty much makes it a non-starter for inclusion in BRL-CAD's tree.  Never seriously considered, primarily due to license.
  
 
* Berkeley Yacc - http://invisible-island.net/byacc/
 
  
License compatible, implements some of the Bison features of interest to us.  Author has been quite helpful.  Current front-runner.
+
License compatible, implements some of the Bison features of interest to us.  Author has been quite helpful.  This came the closest of the non-lemon/re2c tools listed here to actually being used - we got as far as building byacc on MSVC, but never tested it due to problems encountered with flex.  Decision was made to go with lemon due to known cross-platform robustness and ease of compilation, but byacc was never ruled out on its own merits.  byacc CMake and MSVC work is in the BRL-CAD subversion history if it is of interest to anyone.
  
 
* Lemon - http://www.hwaci.com/sw/lemon/
 
  
License compatible, grammar definition style different from yacc.
+
License compatible, grammar definition style different from yacc.  Take a look at the example here: 
 +
http://osdir.com/ml/db.sqlite.general/2003-10/msg00235.html  Proved viable by Nick Reed's work on the obj-g converter; easy to build and integrate.  This is the accepted solution.
  
 
== Both ==
 
 +
 +
* Antlr http://www.antlr.org/
 +
 +
In many ways this is the most interesting of any of these tools (polish, community size, active development and use, language flexibility), but unfortunately it requires Java to generate code.  To use this, we would have to include generated C code in the tree, detect if it is installed, and if installed regenerate the C code (otherwise use the pre-generated code).  This doesn't improve things much over Flex/Bison - particularly given that we'd have to rewrite all our existing code - except it would be much more reasonable to hope that someone could/would install Antlr on Windows.  Moot now for BRL-CAD due to the demonstrated viability of the C/C++ only solution of re2c/lemon.
  
 
* Styx - http://speculate.de/
 
  
Combination of GPL and LGPL - would have to dig to see if parts we need are usable or not.  Probably not of primary interest unless other options start to look really grim.
+
Combination of GPL and LGPL - would have to dig to see if parts we need are usable or not.  Probably not of primary interest unless other options start to look really grim - never got as far as seriously investigating this solution, so can't say much about it.

Latest revision as of 23:23, 23 December 2011

BRL-CAD needs to be able to express grammars for geometry formats in order to provide a flexible, fast, robust and comprehensive framework for conversion from one geometry format type to another. Ideally, we will use tools designed specifically to generate lexers and parsers to automate the "grunt work" of handling such formats. Candidates are listed below, with pros and cons. Decision criteria include license, portability, robustness and ease of integration with our existing build logic.

UPDATE - the decision was ultimately for the combination of re2c/lemon, which Nick Reed demonstrated successfully on Windows MSVC and Linux/BSD/OSX builds. Nick has written a scanner template tool called perplex, which generates the scanner that surrounds the core re2c logic (unlike flex, re2c does not provide a pre-generated scanner). The current BRL-CAD svn trunk contains a working CMake-based system for using perplex/re2c+lemon, including (if necessary) bootstrapping all three tools from C/C++ source code.

Lexers

* Flex - http://flex.sourceforge.net/

The 800 pound gorilla of open source lexers. BSD license. Modern versions require M4, which is why existing Windows versions stop at 2.5.4 - exploring NetBSD m4 as a possible supporting tool. GNU m4 is a possibility from a licensing standpoint (LGPL) if NetBSD's version doesn't work for some reason. Got flex+netbsd m4 compiling on some platforms, but modern flex code had some unix-isms that ultimately resulted in our going with re2c.

* re2c - http://re2c.org/

Public domain. Seems like it can be built on Windows MSVC platforms. Searching around, it looks like re2c + lemon may be a relatively common approach for those looking to replace flex/bison. A worked example: http://fasterparser.blogspot.com/2010/11/re2c-lexer-lemon-parser-calculator.html Ultimately we went with this one. The copy in src/other is a fork of the "main" version: it has CMake build logic, and it uses lemon instead of yacc for its parser code. The latter change was made by Nick Reed to provide a unified toolchain using only re2c/lemon foundations, so that re2c does not introduce a "hidden" dependency on yacc. If bugs appear in generated re2c code, we need to fix re2c and not the generated code, and using lemon improves our ability to do so. If upstream is interested, we would be glad to have these changes incorporated into their codebase.

* Quex - http://quex.sourceforge.net/

C++, LGPL licensed. Author "doesn't like it to be used for military applications" although he apparently took the clause forbidding such use out of the license itself... Requires Python. Probably not viable for integration into BRL-CAD's build system, or at least not likely to be less trouble than flex itself. Decision was made to use re2c.

Parsers

* Bison - http://www.gnu.org/software/bison/

800 pound gorilla for parsers. GNU GPL licensed, which pretty much makes it a non-starter for inclusion in BRL-CAD's tree. Never seriously considered, primarily due to license.

* Berkeley Yacc - http://invisible-island.net/byacc/

License compatible, implements some of the Bison features of interest to us. Author has been quite helpful. This came the closest of the non-lemon/re2c tools listed here to actually being used - we got as far as building byacc on MSVC, but never tested it due to problems encountered with flex. Decision was made to go with lemon due to known cross-platform robustness and ease of compilation, but byacc was never ruled out on its own merits. byacc CMake and MSVC work is in the BRL-CAD subversion history if it is of interest to anyone.

* Lemon - http://www.hwaci.com/sw/lemon/

License compatible, grammar definition style different from yacc. Take a look at the example here: http://osdir.com/ml/db.sqlite.general/2003-10/msg00235.html Proved viable by Nick Reed's work on the obj-g converter; easy to build and integrate. This is the accepted solution.
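One practical difference from yacc is the calling convention: a yacc-style parser pulls tokens by calling the scanner itself, whereas a Lemon-generated parser is handed one token per call by the caller, with token code 0 marking end of input. The sketch below illustrates only that push-style division of labour. It is plain Python and not generated code: the token codes, `scan`, `PushParser` and `evaluate` are hypothetical names invented for this illustration, and a small operator-precedence evaluator stands in for the LALR(1) parser that Lemon would actually generate.

```python
import re

# Hypothetical token codes. Lemon emits its own codes in a generated header;
# like Lemon, this sketch reserves 0 for end of input.
END, NUM, PLUS, MINUS, TIMES, DIVIDE, LPAREN, RPAREN = range(8)

_PUNCT = {"+": PLUS, "-": MINUS, "*": TIMES, "/": DIVIDE,
          "(": LPAREN, ")": RPAREN}
_TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|(.))")


def scan(text):
    """Yield (code, value) pairs: the job a generated scanner would do."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = _TOKEN_RE.match(text, pos)
        number, other = match.groups()
        if number is not None:
            yield NUM, float(number)
        elif other in _PUNCT:
            yield _PUNCT[other], None
        else:
            raise ValueError(f"unexpected character {other!r}")
        pos = match.end()


class PushParser:
    """Operator-precedence stand-in for a generated parser.

    Only the calling convention mirrors Lemon: the caller pushes tokens in.
    """

    _PREC = {PLUS: 1, MINUS: 1, TIMES: 2, DIVIDE: 2}
    _APPLY = {
        PLUS: lambda a, b: a + b,
        MINUS: lambda a, b: a - b,
        TIMES: lambda a, b: a * b,
        DIVIDE: lambda a, b: a / b,
    }

    def __init__(self):
        self.values = []   # operand stack
        self.ops = []      # pending operators and open parentheses
        self.result = None

    def _reduce(self):
        """Apply the most recent pending operator to the top two operands."""
        if len(self.values) < 2:
            raise SyntaxError("operator is missing an operand")
        op = self.ops.pop()
        rhs = self.values.pop()
        lhs = self.values.pop()
        self.values.append(self._APPLY[op](lhs, rhs))

    def push(self, code, value=None):
        """Accept exactly one token; push END (0) to finish."""
        if code == NUM:
            self.values.append(value)
        elif code == LPAREN:
            self.ops.append(code)
        elif code == RPAREN:
            while self.ops and self.ops[-1] != LPAREN:
                self._reduce()
            if not self.ops:
                raise SyntaxError("unbalanced ')'")
            self.ops.pop()  # discard the matching '('
        elif code in self._PREC:
            # Reduce operators of equal or higher precedence first.
            while (self.ops and self.ops[-1] != LPAREN
                   and self._PREC[self.ops[-1]] >= self._PREC[code]):
                self._reduce()
            self.ops.append(code)
        elif code == END:
            while self.ops:
                if self.ops[-1] == LPAREN:
                    raise SyntaxError("unbalanced '('")
                self._reduce()
            if len(self.values) != 1:
                raise SyntaxError("malformed expression")
            self.result = self.values.pop()
        else:
            raise ValueError(f"unknown token code {code}")


def evaluate(text):
    """Driver loop: the caller owns the loop and feeds the parser."""
    parser = PushParser()
    for code, value in scan(text):
        parser.push(code, value)
    parser.push(END)
    return parser.result
```

With these definitions, `evaluate("1 + 2 * 3")` returns `7.0`. Because the driver loop belongs to the caller, the scanner and parser stay independent of each other, which is presumably part of what makes a separately generated scanner easy to pair with Lemon.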

Both

* Antlr http://www.antlr.org/

In many ways this is the most interesting of any of these tools (polish, community size, active development and use, language flexibility), but unfortunately it requires Java to generate code. To use this, we would have to include generated C code in the tree, detect if it is installed, and if installed regenerate the C code (otherwise use the pre-generated code). This doesn't improve things much over Flex/Bison - particularly given that we'd have to rewrite all our existing code - except it would be much more reasonable to hope that someone could/would install Antlr on Windows. Moot now for BRL-CAD due to the demonstrated viability of the C/C++ only solution of re2c/lemon.

* Styx - http://speculate.de/

Combination of GPL and LGPL - would have to dig to see if parts we need are usable or not. Probably not of primary interest unless other options start to look really grim - never got as far as seriously investigating this solution, so can't say much about it.