Wednesday, October 31, 2012

Xtext Corner #5 - Backtracking vs Syntactic Predicates

The Xtext grammar language allows to create a working parser in almost no-time. Its concise notation to describe the concrete syntax and the mapping to an object model is giving quite a jump start if you want to create a language. Nevertheless it's also quite easy to get into some trouble. Xtext uses Antlr 3.2 as the underlying parser technology and we try really hard to hide the complexity and peculiarity of Antlr. Unfortunately that's not possible in all cases. From time to time Antlr will report ambiguities in the grammar definition with a charming message like this:
warning(200): Decision can match input such as "{EOF, RULE_ID, '('}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
The parse generator basically complains about an ambiguous grammar. At some point in the syntax description it cannot decide which path to follow for a given input sequence. It's rather obvious that the warning message is not really helpful. Neither is there any chance to find the correct line that caused the error (which is not a problem of Antlr but cause by the translation of Xtext to Antlr) nor is it easily possible to spot the concrete decision that the parser generator complains about. The worst about this is that it's not really a warning either. What the parser generator basically did is the following: It removed a possible path from the grammar description. It will always choose the one remaining path for that particular situation. Which could by chance be the one that you'd expect. But it could also be the wrong path.


Fortunately there is a tool that helps to identify the problem: AntlrWorks allows to take a look at the grammar and visualize all the problems that it has graphically. It's still far from trivial to find the root cause of a problem but better than nothing. Make sure you pick the version 3.2 from download section if you want to give it a try.

Now you may wonder how you should handle the cases that are ambiguous by definition and by intention. You could of course enable backtracking in your language and afterwards everything appears to be fine. However, you can think about backtracking as a wildcard for Antlr to remove alternatives from your grammar everywhere where it spots an ambiguity. This will shadow the real problems in the grammar that may be introduced due to subsequent changes, a refactoring or new language features. That's why I strongly recommend to go for the hard way and analyze the root cause for the warnings. As soon as you found the actual decision that the parser generator complained about, you can use a syntactic predicate to fix that locally. Now you are in control on which alternative to remove and which path to follow. I think that makes perfect sense to be in charge in those cases.


But the shadowing of errors in the generator is only one drawback of backtracking. It will furthermore lead to surprising messages at run-time. If you consider the following snippet it's easy to see that the right operand of the binary operation is missing.
The parser will correctly report something along the lines of
mismatched input '}' expecting RULE_INT
Unfortunately it does so on a totally unexpected location. If you enabled backtracking and the algorithm decides that the function declaration is not complete - it fails to read a valid function body - the parsing will roll back to the start of the function and put the most specific error message on that token. You'll see an error marker under the keyword function. However, it would be more intuitive to have that error on the binary operation. At least that's what I would expect, wouldn't you?

Syntactic Predicates

Nevertheless it's not always possible to write an unambiguous grammar. There are some common patterns that are undecidable by definition. The most famous one is the Dangling else. If a language allows nested if-else constructs, it's not definite where a subsequent else keyword may belong to. Consider the following Java snippets which only differ in formatting:

The semantics of both code snippets should be independent from the formatting. Nevertheless it's ambiguous for the parser in the same way as a reader might be confused by an inconsistent indentation. Therefore you have to force the parser into one concrete direction in order to disambiguate the grammar. A syntactic predicate has to be added.

The => operator forces the parser to go a certain path if the input sequence would allow two or more possible decisions. It can be read as If you see these tokens, go this way. It's even possible to use alternatives or groups of elements as the criteria. Only the UnorderedGroup is prohibited in predicates.

In this example, the parser shall follow the given path if it can look ahead to a sequence like (or more abstract ID . ID =).

Implementation Detail

One thing is important to note. Syntactic predicates in Xtext are different from the plain Antlr predicates. In the Xtext grammar language it's only possible to use a complete or a partial sequence of production tokens as the predicate where Antlr allows to use arbitrary tokens that seem to be independent from the actual rule content. Here Antlrs approach appears to be more powerful. But actually that's only at a first glance. Firstly Xtext's variant is easier to use since you don't have to repeat parts of your grammar manually. And secondly it's the framework that does the heavy lifting: The syntactic predicates in Xtext are automatically propagated to the right places which you'd have to do manually otherwise. Just insert it at the spot that you identified with AntlrWorks and you're done.

Monday, October 29, 2012

Xtext Corner Revived

It's been a long time since I wrote about Xtext tips and tricks. However, I assembled a bunch of interesting tips and tricks while I prepared my Xtext Best Practices session for this years EclipseCon which I want to share with you.

The talk starts with a short overview on how I personally like to tackle the task of implementing a language with Xtext. If the syntax is not yet carved in stone, I usually start of with some sketched sample files to get an idea about the different use cases. In doing so it's quite important to find a concise notation for the more common cases and to be more verbose with the unusual patterns that are anticipated in the language. As soon as the first version of the syntax is settled, it's obvious to begin with the grammar declaration.

That's a task that I really like. The grammar language of Xtext is probably the most concise and information rich DSL that I ever worked with. With very few orthogonal concepts it's possible to describe how a text is parsed and in the very same breath map those parsed information to a in memory representation. This representation is called abstract syntax tree (AST) and often referred to as model. The AST that Xtext yields is strongly typed and therefore heterogeneous, but still provides generic traversal possibilities since it is based on the Eclipse Modeling Framework (EMF, also: Ed Merks Framework). So the grammar is about the concrete syntax and its mapping to the abstract syntax.

As soon as the result of the parsing is satisfying, the next step when implementing a language is scoping. Without that one, any subsequent implementation efforts are quite a waste of effort. Scoping is the utility that helps to enrich the information in the AST by creating a graph of objects (Abstract Syntax Graph, ASG). This process is often called cross linking. Thereby some nodes in the tree will be linked with others that are not directly related to them in the first place. This is one of the most important aspects of a language implementation because after the linking and scoping was done, the model is actually far more powerful from a clients perspective. Any code that is written on top of that can leverage and traverse the complete graph even if the concrete language is split across many files.

Validation is the next step and it is implemented on top of the linked ASG. While the parser and the linking algorithm already produced some error annotations on invalid input sequences, it's the static constraint checking which will find the remaining semantic problems in the input. If the files were parsed and linked successfully and the static analysis does not reveal any problems, the model can be considered valid.

Now that one can be sure that the ASG as the in-memory representation of the files fulfills the semantic constraints of the language, it's possible to implement the execution layer which is often a compiler, a code generator or an interpreter. Actually those three are all very similar. You can think of a code generator as an interpreter which evaluates a model to a string. And of course a compiler is pretty much the same as a code generator but the output is not plain text but some sequence of bytes. The important thing is that the evaluation layer should (at least in the beginning) only consider valid input models. This will dramatically simplify the implementation and that's the reason why I like to implement that on top of a checked ASG. You don't have to take all those possible violated constraints into account.

Now there is of course still the huge field of the user interface that entwines around the editor and its services like content assist, navigation or syntax coloring. However, I would usually postpone that until the language runtime works at least to some extend.

The most important message in this intro is that this is not a waterfall process. All this can be implemented in small iterations each of which is accompanied with refined sample models, unit tests (!) and feedback from potential users.

In the next days I'll wrap up some of the main points of my presentation which will be about grammar tips, some hints on scoping, validation or content assist. Stay tuned for those!

EclipseCon 2013, Proposal Submitted? Check!

As the early bird submission deadline for the EclipseCon 2013 in North America is approaching, I took the time and proposed a session that I had quite some fun with in Ludwigsburg.

The overwhelming interest in my talk about Java Performance MythBusters motivated me to propose round 2. I expect that the time until next years EclipseCon will bring some new insights and refined numbers, too.

After all, Java8 is currently under heavy development and so it's quite likely that the measured times and numbers will change dramatically. And of course it will be interesting to take a look at the performance characteristics of other platforms, e.g. Linux and Windows.

Which topics would you be interested in? Run-time cost of reflective access? Arrays vs collections? Auto-boxing? There are still plenty of myths out there and I will again pick some of them to go for the next round. Let's put them on the test-bet!

Xtend @ JUGF

In Frankfurt and no plans for Wednesday evening? How about joining the JUGF-Stammtisch on 31 Oct 2012 at 18:30 in the German National Library. I will be there and give a talk about Xtend featuring a preview of the upcoming language feature called Active Annotations. If you are interested in the latest news on Xtend, make sure you register for the session and attend the Stammtisch. See you there!

Friday, October 26, 2012

EclipseCon Europe 2012 - Wrap-Up

As promised, this years EclipseCon Europe again was a great community event with astonishing technical content, outstanding food and most importantly many good friends. The conference organizers did a great job and prepared something for everybody: there were autonomously flying robots, a circus with do-it-yourself fire breathing and a great live band. It's this package which makes the EclipseCon a unique and memorable experience. And the co-located beerfest at the Nestor Bar did its share to ensure that we don't get too much sleep.

However, as Sepp Herberger put it: "After the game is before the game!" The next EclipseCon will be in Boston, 25 - 28 March.

The early bird deadline for the call for papers is the 31 Oct. Don't hesitate and submit your proposals about the things that you want to share with others! The more the merrier!

There will also be an EclipseCon in France, on 5-6 June for the first time. After the great success of this years EclipseDay in Toulouse, the foundation will organize a two day conference there in the next year! Stay tuned for the call for papers.

And of course you should safe the date for next years ECE in Ludwigsburg from 29-31 Oct.

In the meantime make sure you don't forget to complete the conference survey and provide feedback for the speakers.

Tuesday, October 16, 2012

EclipseCon Europe - Join the Party!

Only one week until EclipseCon Europe 2012 will take off in Ludwigsburg. Again hundreds of Eclipse enthusiasts will strive for the next record of highest WiFi usage ever in the Swabian city with the largest baroque castle in Germany. From Oct 23 to 25 the Forum am Schlosspark will transform to a vibrant place of technical discussions, entertaining sessions and socializing. Thanks to the huge amount of submissions from the Community the program committee managed again to tie up three days of deep technical content about Eclipse, the framework and the ecosystem, about its past, present and future (actually not to much about the past, but that's a good thing, isn't it?).

I will have the pleasure to talk about a potpourri of different topics that each cover some field of interest of mine.

Tue 9:00AM - 12:30PM: Getting Started With Xtend
My conference starts on Tuesday Morning at 9:00 in the Schubartsaal. Sven and I give a tutorial about Xtend where you will have the chance to get your hands dirty on interesting and challenging programming problems and puzzlers. You should not miss that one!

Wed 2:00PM - 2:30PM: Xtext - Best Practices
On Wednesday I will share lessons learned when using the Xtext framework. I will cover a number of topics that I encountered in the Xtext newsgroup and other noteable things that can be important in your daily work with Xtext. If you are already familiar with this cool framework and want to know more about it or just contribute your own experience to the discussion, stop by in the Theater on Wednesday, 2:00 PM.

Thu 10:30 - 11:00: Java Performance MythBusters
The submission of this talk was inspired by a talk by Arno Haase that I attended at the JAX (Arno greatfully gave permission to hijack the title of his talk - thanks for that!). In this session I want to shed light on some myths about Java's performance and often recommended Dos and Don'ts. Come to the Schubartsaal on Thursday 10:30 and I bet you'll be surprised.

Thu 1:30PM - 2:00PM: Null-Safety on Steroids
Even though the new annotation based null-ness analysis of Eclipse Juno is often very helpful, I am not really fond of the implications that their design has on a reasonable sized code-base. In my last session at this year's ECE I want to share my impressions about null-safety and static analysis. Join me in Silchersaal, Thu 1:30PM if you want to learn about different approaches to tackle the infamous NullPointerException.

Of course there are other interesting sessions, too, e.g. John Arthorne raises the question about The future of Eclipse. The marriage of JavaFX and e4 seems to be a hot topic, too, since JavaFX is a quite powerful rendering technology. And naturally I'm excited about other Xtend and Xtext related content.

If you are still not convinced, take a look at the schedule yourself and make sure you join the party!

Monday, October 8, 2012

Revisited: Xtend @ JavaOne 2012

My talk about Xtend at this years JavaOne is now available in the content catalog on the conference website.

After a quick motivation and the answer to the obvious question "Why the heck did these guys develop yet another JVM language?", I gave a short overview on the basic ideas and design principles behind Xtend. Next up was a demo with different code snippets. Basically it was a walk-through with the examples that can be loaded into everybody's Eclipse as soon as the Xtend SDK is installed. Just select New -> Example... -> Xtend Introductory Examples and there you go.

The last part of the talk was about Active Annotations, a unique feature that will be part of the next version of Xtend. To put it into a few words it's Java's annotation processing on steroids. Active annotations may contribute to the translation of Xtend code to Java and even modify the result of that process. They allow to create additional types, use information from other resources or validate the Xtend code according to the annotation's semantics. Along with the powerful means to design creative and expressive APIs and the tight integration with Xtext languages, Xtend's exceptional support for domain-specific languages is raised to the next level by Active Annotations. Stay tuned for more information!