The talk starts with a short overview on how I personally like to tackle the task of implementing a language with Xtext. If the syntax is not yet carved in stone, I usually start of with some sketched sample files to get an idea about the different use cases. In doing so it's quite important to find a concise notation for the more common cases and to be more verbose with the unusual patterns that are anticipated in the language. As soon as the first version of the syntax is settled, it's obvious to begin with the grammar declaration.
That's a task that I really like. The grammar language of Xtext is probably the most concise and information rich DSL that I ever worked with. With very few orthogonal concepts it's possible to describe how a text is parsed and in the very same breath map those parsed information to a in memory representation. This representation is called abstract syntax tree (AST) and often referred to as model. The AST that Xtext yields is strongly typed and therefore heterogeneous, but still provides generic traversal possibilities since it is based on the Eclipse Modeling Framework (EMF, also: Ed Merks Framework). So the grammar is about the concrete syntax and its mapping to the abstract syntax.
As soon as the result of the parsing is satisfying, the next step when implementing a language is scoping. Without that one, any subsequent implementation efforts are quite a waste of effort. Scoping is the utility that helps to enrich the information in the AST by creating a graph of objects (Abstract Syntax Graph, ASG). This process is often called cross linking. Thereby some nodes in the tree will be linked with others that are not directly related to them in the first place. This is one of the most important aspects of a language implementation because after the linking and scoping was done, the model is actually far more powerful from a clients perspective. Any code that is written on top of that can leverage and traverse the complete graph even if the concrete language is split across many files.
Validation is the next step and it is implemented on top of the linked ASG. While the parser and the linking algorithm already produced some error annotations on invalid input sequences, it's the static constraint checking which will find the remaining semantic problems in the input. If the files were parsed and linked successfully and the static analysis does not reveal any problems, the model can be considered valid.
Now that one can be sure that the ASG as the in-memory representation of the files fulfills the semantic constraints of the language, it's possible to implement the execution layer which is often a compiler, a code generator or an interpreter. Actually those three are all very similar. You can think of a code generator as an interpreter which evaluates a model to a string. And of course a compiler is pretty much the same as a code generator but the output is not plain text but some sequence of bytes. The important thing is that the evaluation layer should (at least in the beginning) only consider valid input models. This will dramatically simplify the implementation and that's the reason why I like to implement that on top of a checked ASG. You don't have to take all those possible violated constraints into account.
Now there is of course still the huge field of the user interface that entwines around the editor and its services like content assist, navigation or syntax coloring. However, I would usually postpone that until the language runtime works at least to some extend.
The most important message in this intro is that this is not a waterfall process. All this can be implemented in small iterations each of which is accompanied with refined sample models, unit tests (!) and feedback from potential users.
In the next days I'll wrap up some of the main points of my presentation which will be about grammar tips, some hints on scoping, validation or content assist. Stay tuned for those!