Logo designer: Guillermo M. Zambrino

CodeWorker

A universal parsing tool & a source code generator


Since 19jan2003

Software+Documentation+Website Release: 4.5.3
Last update: April 27, 2010

Distributed under the terms of the GNU Lesser General Public License.
Tutorial (in English): Practice of parsing and code generation for generative programming
Tutorial (in French): Automatiser le développement d'applications; parcours découverte

Presentation

CodeWorker is a versatile Open Source (GNU Lesser General Public License) parsing tool and source code generator devoted to generative programming. Generative programming is a software engineering approach concerned with automating the production of reusable, tailor-made, adaptable and reliable IT systems.
In layman's terms, CodeWorker lets you generate code by parsing existing languages, or by creating and parsing your own language. Once a language file has been parsed, CodeWorker provides several techniques for generating code.

The tool's scripting language drives the parsing and source code generation process. Its syntax is derived from the C family of languages, making it familiar to most programmers, and its template syntax resembles JSP, ASP, or Velocity. The language comes in variants for parsing, code generation, and procedural programming, giving the developer a number of options for organizing CodeWorker projects.

CodeWorker's parsing and code generation features can also be integrated into C++, Java and .NET applications.

Please do not hesitate to contact us (questions, criticisms, suggestions, ...).

Integration in Eclipse

CodeWorker provides a scripting language suited both to describing language grammars and to writing code generation templates. Unfortunately, scripts are hard to read without syntax highlighting.

Fortunately, an Eclipse plugin now exists. The plugin provides an editor for each kind of script, with syntax highlighting. An informational tooltip appears when the cursor hovers over a built-in function. The reference manual is fully integrated into the Help Contents menu topic, and a tutorial is accessible from the same location.

Future development is planned to allow launching a code generation project from the IDE, instead of running the command line in a shell.

Parsing

CodeWorker can be trained to parse almost any language and provides two distinct methods for creating parsers:
  • The extended-BNF notation is declarative. It derives from BNF (the Backus-Naur Form, which defines the grammar of a language), extended with regular expressions, predefined non-terminals and useful directives. It is close to JavaCC or ANTLR in the Java world, except that CodeWorker does not require a separate parser class, so parsing scripts can be tested without compiling one.
  • Reading tokens is procedural, and somewhat obsolete now that CodeWorker handles BNF parsing scripts smoothly.
While parsing files, CodeWorker feeds nodes into a parse tree. A tree is a convenient structure to represent a hierarchical set of nodes, as in XML for instance.

The parse tree is populated by the parsing task, and used by the source code generation script to generate code, text or binary data.
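The idea of a parser feeding nodes into a tree can be sketched in a few lines of Python. This is only an analogy, not CodeWorker script syntax: the toy input language (`class Name { field : type; }`), the grammar in the docstring, and the names `tokenize` and `parse` are all assumptions made for this illustration.

```python
import re

# Hypothetical toy input language, invented for this sketch:
#   class Customer { name : string; age : int; }
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[{};:])")


def tokenize(text):
    """Split the input into identifiers and punctuation, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse the toy language and return a tree of nested dicts and lists.

    model      ::= class_decl*
    class_decl ::= 'class' ID '{' field* '}'
    field      ::= ID ':' ID ';'
    """
    tokens = tokenize(text)
    pos = 0

    def expect(value=None):
        # Consume one token, optionally checking that it equals `value`.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        if value is not None and token != value:
            raise SyntaxError(f"expected {value!r}, found {token!r}")
        pos += 1
        return token

    def identifier():
        token = expect()
        if not token.isidentifier():
            raise SyntaxError(f"expected an identifier, found {token!r}")
        return token

    tree = {"classes": []}
    while pos < len(tokens):
        expect("class")
        node = {"name": identifier(), "fields": []}
        expect("{")
        while pos < len(tokens) and tokens[pos] != "}":
            field_name = identifier()
            expect(":")
            node["fields"].append({"name": field_name, "type": identifier()})
            expect(";")
        expect("}")
        tree["classes"].append(node)  # the parser populates the tree as it goes
    return tree
```

Calling `parse("class Customer { name : string; age : int; }")` returns a tree with one class node holding two field nodes, which a generation step can then walk.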

Source Code Generation

CodeWorker can parse a language and use the resulting parse tree to generate source code via template-based scripts. One example is database DDL (Data Definition Language): CodeWorker has been used to parse DDL and generate large portions of a Java application.
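As a rough illustration of template-based generation from a parse tree, the Python sketch below turns a table description into a Java class. It is an analogy rather than a CodeWorker template: the tree layout, the `SQL_TO_JAVA` type mapping, and the function name `generate_java` are assumptions made for this example.

```python
from string import Template

# Templates with placeholders, in the spirit of JSP-like generation scripts.
CLASS_TEMPLATE = Template("""\
public class $class_name {
$fields
}
""")
FIELD_TEMPLATE = Template("    private $java_type $name;")

# Assumed mapping from SQL column types to Java types, for illustration only.
SQL_TO_JAVA = {"INTEGER": "int", "VARCHAR": "String", "DATE": "java.time.LocalDate"}


def generate_java(table):
    """Render one Java class from a table node of a (hypothetical) DDL parse tree."""
    fields = "\n".join(
        FIELD_TEMPLATE.substitute(java_type=SQL_TO_JAVA[col["type"]], name=col["name"])
        for col in table["columns"]
    )
    return CLASS_TEMPLATE.substitute(
        class_name=table["name"].capitalize(), fields=fields
    )


# A table node as a DDL parser might have produced it.
table = {
    "name": "customer",
    "columns": [
        {"name": "id", "type": "INTEGER"},
        {"name": "name", "type": "VARCHAR"},
    ],
}
print(generate_java(table))
```

The generator never looks at the DDL text itself. It only reads the tree, which is what lets one parse step feed several different generation scripts.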

CodeWorker's source code generation can occur in three ways: generation, expansion or translation.

  • generation uses a script, much like JSP or PHP, to produce an output file. When the file is generated again, only certain areas, called protected areas in the vocabulary of CodeWorker, are preserved in the file.
  • expansion is used when small portions of an existing file need to be generated. The points where code is to be inserted are called markups in the vocabulary of CodeWorker, and code is inserted at the markups. The Class Wizard of Visual C++ generates code using this principle.
  • translation mode is used when both parsing and source code generation are required to produce a file. It has two main families of use:
    • source-to-source translation: a file must be rewritten in a different syntax. For example, a LaTeX file might have to be translated into HTML.
    • program transformation: a source file has to change for optimizing, refactoring, instrumenting or rewriting selected portions. For example, a script could add a trace at the beginning of each function body in a Java or C++ source file. To do that, parsing discovers the function bodies, and source code generation inserts the code that implements the trace.
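The expansion mode described above can be approximated in Python as follows. This is a minimal sketch under stated assumptions: the marker syntax `// <generated:key>` ... `// </generated:key>` is hypothetical and is not CodeWorker's markup syntax, and the function name `expand` and the `snippets` dictionary are likewise invented for the example.

```python
import re

# Hypothetical marker syntax chosen for this sketch; CodeWorker defines its own.
MARKUP = re.compile(
    r"(?P<open>^[ \t]*// <generated:(?P<key>\w+)>\n)"
    r".*?"
    r"(?P<close>^[ \t]*// </generated:(?P=key)>$)",
    re.DOTALL | re.MULTILINE,
)


def expand(source, snippets):
    """Replace the text between each pair of markers with freshly generated code.

    Everything outside the markers is left exactly as it was written by hand.
    """
    def fill(match):
        body = snippets.get(match.group("key"))
        if body is None:
            return match.group(0)  # unknown markup key: leave the region untouched
        return match.group("open") + body + "\n" + match.group("close")

    return MARKUP.sub(fill, source)


source = (
    "class Account {\n"
    "    // hand-written code stays untouched\n"
    "    // <generated:accessors>\n"
    "    // </generated:accessors>\n"
    "}\n"
)
snippets = {"accessors": "    int getBalance() { return balance; }"}
print(expand(source, snippets))
```

Because the markers stay in the file, the expansion can be run again later: the region between them is replaced rather than duplicated, and the surrounding hand-written code is never rewritten.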

Parse-to-Generate: a straightforward process

Tasks for parsing specifications and generating code are executed in a straightforward process presented in the figure below. Because CodeWorker includes an expressive scripting language, there is no need for a separate "glue language" to join parsing and generation tasks.

The figure describes the classical approach used in a leader script interpreted by CodeWorker. The leader script first calls a BNF-parse script, then continues in sequence and calls a template-based script.

CodeWorker has other capabilities not noted above. For example, it can be run as a CGI program.

CodeWorker is maintained by Cedric Lemaire. Please send a mail to submit a bug or request a feature.
Many thanks to Guillermo M. Zambrino, designer of the CodeWorker logo.