Issue #6 of the Xtext Corner is about some particularities of the parsing process in
Xtext. As I already mentioned a few times in the past, Xtext uses
Antlr under the hood to do the actual parsing. This is basically a two-step process: first, the input sequence of characters is split into tokens (often referred to as terminals) by a component called the lexer. The second step is to process the resulting list of tokens. The actual parser is responsible for that stage: it creates the abstract syntax tree from the token stream.
This divide-and-conquer approach is usually just called parsing altogether, so the distinction between lexing and parsing is mostly hidden. Nevertheless, the Xtext grammar definition honors both aspects: it is possible to define (parser) rules that are processed by the parser, and it is also possible to define terminal rules, which are handled by the lexer. So when should I use parser rules and when should I use terminal rules?
Production Rules (also: Parser Rules)
The obvious case, and just for the sake of completeness: Production rules will yield an instance in the abstract syntax tree. These can only be implemented by the parser, so there is no question of whether to use terminals instead. Production rules are the most common rules in almost every Xtext grammar.
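To illustrate, here is a minimal production rule along the lines of the well-known hello-world example (the rule is purely illustrative and not taken from any particular grammar):
Greeting: 'Hello' name=ID '!';
Whenever the parser matches this rule, it creates an instance of the inferred Greeting EClass and sets its name feature.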
Data Type Rules
Those are a completely different thing even though they are handled by the parser, too: Where ordinary parser rules produce instances of EClasses, data type rules will return data types (you did not guess that, did you?). Data types in the sense of Xtext and its usage of the Eclipse Modeling Framework are basically primitive Java types, Strings or other common types such as BigDecimal or enums. The parser will not create those on its own but rather pass the consumed tokens as a string to a value converter. The language developer is responsible for converting that string to a data type.
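The classic example of a data type rule is a qualified name (a sketch, not tied to any specific grammar):
QualifiedName: ID ('.' ID)*;
Because the rule contains no assignments, Xtext treats it as a data type rule that returns a plain String (ecore::EString) by default; a registered value converter may post-process the consumed text if necessary.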
Terminal Rules
Terminal rules are essentially the same as data type rules if you only consider the interface to the grammar. Internally they are completely different since they are not processed by the parser but by the lexer. The consequences are quite severe if you want to get a working grammar. But one thing at a time: As already mentioned, terminal rules can return the very same things as data type rules can. That is, they yield Strings, ints or other primitives. But since they are handled by the lexer, they are not quite as powerful as data type rules are.
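A terminal rule may declare its return type explicitly, as in this sketch (which mirrors the INT rule from the default terminals grammar shipped with Xtext):
terminal INT returns ecore::EInt: ('0'..'9')+;
The lexer only produces the token; the associated value converter is what actually turns the token text into an int.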
Implementation Aspects
The lexer is a pretty dumb component which is generated in a way that weighs performance over intuitive behavior. Where the parser generator will produce nice error messages in case of ambiguous data type rules, conflicting terminals are mostly resolved by a first-come-first-served (FCFS) principle. For terminal rules it's crucial to declare them in the right order. Consider the following terminal rules:
terminal ID: ('a'..'z') ('a'..'z'|'0'..'9')*;
terminal CHARS: ('a'..'z'|'_')+;
The ID rule shall consume something that starts with a lowercase letter and is followed by any number of lowercase letters or digits. The CHARS rule is pretty close to that one: It shall match a sequence that contains only lowercase letters or underscores. The problem with these is that the matched sequences are not mutually exclusive. If you take the input abc as an example, it will be matched as an ID given the two rules above. As soon as you switch the order of the declarations, the sequence abc will all of a sudden be returned as CHARS. That's one thing that you have to be aware of if you use terminal rules. But there is more to keep in mind.
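Just to make the ordering effect explicit, here are the same two rules with the declarations swapped:
terminal CHARS: ('a'..'z'|'_')+;
terminal ID: ('a'..'z') ('a'..'z'|'0'..'9')*;
With this order, the input abc is handed to the parser as a CHARS token, even though the rule bodies have not changed at all.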
Terminal rules are applied without any contextual information, which has some interesting implications. The plus: it's easily possible to use the lexer on partial input - entry points can be computed almost trivially, and there is no such thing as an entry rule as there is for the parser. But the disadvantages have to be taken into account, too. The lexer does not care about the characters that are still to come in the input sequence. Everything that matches will be consumed - and not reverted (as of Antlr 3.2). To explain what that means, let's take another example:
terminal DECIMAL: INT '.' INT;
terminal INT: '0'..'9'+;
The rules define decimal numbers in a hypothetical language that also supports method invocation. At first glance, things seem to work fine: 123 is consumed as an INT whereas 123.456 will be a DECIMAL. But the devil's in the details. Let's try to parse the string 123.toString(). The lexer will find an INT 123 - so far, so good. Now it sees a dot, which is expected by the terminal rule DECIMAL. The lexer will consume the dot and try to read an INT afterwards - which is not present. Now it'll simply fail for the DECIMAL rule but never revert that dot character which was consumed almost by accident. The lexer will create an invalid token sequence for the parser and the method call cannot be read successfully. That's because the lexer simply does not know about things like the expectation in the current parser state. Attempts to define decimals, qualified names or other more complex strings like dates as terminal rules are very error prone, whereas they can often be implemented quite easily by means of data type rules:
DECIMAL: INT '.' INT;
terminal INT: '0'..'9'+;
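For completeness, the data type rule can also declare the value type it yields. The following variant is only a sketch and assumes that a matching value converter for DECIMAL is registered:
DECIMAL returns ecore::EBigDecimal: INT '.' INT;
The parser still consumes the tokens INT, '.', INT, but the value that ends up in the model is a BigDecimal produced by the converter.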
Terminals Considered Harmful?
Data type rules move the decision from the dumb, highly optimized lexer to the parser, which has a lot more information at hand (the so-called lookahead) to make decisions. So why not use data type rules everywhere? The simple answer is: performance. The duo of lexer and parser is optimized for a stream of reasonably sized tokens instead of hundreds of single characters. Things will not work out that well with Antlr at run-time if the parser is implemented in a so-called scanner-less manner. The rule of thumb here is to use only a small number of terminal rules that can be distinguished easily and to put data type rules on top of those. It'll simplify your life as a language developer tremendously.
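A small sketch of that rule of thumb (illustrative rules only, not taken from a specific grammar): keep the terminals coarse and mutually exclusive, and express the interesting structure with data type rules on top.
terminal ID: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
terminal INT: ('0'..'9')+;
QualifiedName: ID ('.' ID)*;
DECIMAL: INT '.' INT;
Here the lexer only has to tell identifiers and integers apart, while qualified names and decimals are assembled by the parser, which can take the surrounding context into account.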