Thursday, November 8, 2012

Xtext Corner #8 - Libraries Are Key

In today's issue of the Xtext Corner, I want to discuss the library approach and compare it to hard-coded grammar bits and pieces. The question about which path to choose often arises when you want to implement an IDE for an existing language. Most languages use a run-time environment that exposes some implicit API.

Just to name a few examples: Java includes the JDK with all its classes and the virtual machine has a notion of primitive types (as a bonus). JavaScript code usually has access to a DOM including its properties and functions. The DOM is provided by the run-time environment that executes the script. SQL in turn has built-in functions like max, avg or sum. All these things are more or less an integral part of the existing language.

As soon as you start to work on an IDE for such a language, you may feel tempted to wire parts of the environment into the grammar. After all, keywords like int, boolean or double feel quite natural in a Java grammar - at least at first glance. In the long run it often turns out to be a bad idea to wire these things into the grammar definition. The alternative is to use a so-called library approach: The information about the language run-time is encoded in an external model that is accessible to the language implementation.

An Example

To use the Java example once more (and for the last time in this post): The ultimate goal is to treat types like java.lang.Object and java.util.List in the same way as int or boolean. Since we did this already for Java as part of the Xtext core framework, let's use a different, somewhat artificial example in the following. Our dummy language supports function calls of which max, min and avg are implicitly available.

The hard-coded approach looks quite simple at first. A simplified view suggests that the parser will automatically check that the invoked functions actually exist, that content assist works out of the box, and that even the coloring of keywords marks the three enumerated functions as somehow special.

Not so obvious are the pain points (which come for free): The documentation for these functions has to be hooked up manually, and their complete signatures have to be hard-coded, too. The validation has to be aware of the parameters and return types in order to check the conformance with the actual arguments. Things become rather messy beyond the first quickly sketched grammar snippet. And last but not least there is no guarantee that the set of implicit functions is stable forever with each and every version of the run-time. If the language inventor introduces a new function sum in a subsequent release, everything has to be rebuilt and deployed. And you can be sure that the to-be-introduced keyword sum will cause trouble in at least one of the existing files.
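
The keyword problem can be illustrated with a small, self-contained Java sketch. It is not Xtext code; the class HardCodedKeywords and its two keyword sets are hypothetical and only mimic what a lexer with hard-wired function names would do when a new release adds sum.

```java
import java.util.Set;

// Hypothetical illustration: a lexer with hard-coded keywords claims every
// listed word, so that word can no longer be used as a plain identifier.
final class HardCodedKeywords {
    // Assumed keyword sets of two releases of the dummy language.
    static final Set<String> RELEASE_1 = Set.of("max", "min", "avg");
    static final Set<String> RELEASE_2 = Set.of("max", "min", "avg", "sum");

    // A name is usable as an identifier only if it is not a keyword.
    static boolean usableAsIdentifier(String name, Set<String> keywords) {
        return !keywords.contains(name);
    }

    public static void main(String[] args) {
        // An existing file that declares a variable called "sum".
        String existingVariable = "sum";
        System.out.println("release 1: " + usableAsIdentifier(existingVariable, RELEASE_1));
        System.out.println("release 2: " + usableAsIdentifier(existingVariable, RELEASE_2));
    }
}
```

A file that was valid under the first keyword set stops parsing under the second one, although nothing in the file changed.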

Libraries Instead of Keywords

The library approach seems to be more difficult at first but it pays off quickly. Instead of using hard-coded function names, the grammar uses only a cross reference to the actual function. The function itself is modeled in another resource that is also deployed with the language.

This external definition of the built-in functions can usually follow the same guidelines as custom functions do. But of course they may even use a simpler representation. Such a stub may only define the signature and some documentation comment but not the actual implementation body. It's actually pretty similar to header files. As long as there is no existing format that can be used transparently, it's often easiest to define an Xtext language for a custom stub format. The API description should use the same EPackage as the full implementation of the language. This ensures that the built-ins and the custom functions follow the same rules and that all the utilities like the type checker and documentation provider can be used regardless of which function is actually invoked.
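
To make the idea more tangible, here is a minimal plain-Java sketch. It does not use the Xtext API: FunctionDecl and FunctionScope are hypothetical names that stand in for the shared model type and for what a scope provider would do, and the rule that local declarations shadow the library is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model element: a function declaration. User-defined functions
// and library stubs share this one type, much like sharing one EPackage.
final class FunctionDecl {
    final String name;
    final List<String> parameterTypes;
    final String documentation;

    FunctionDecl(String name, List<String> parameterTypes, String documentation) {
        this.name = name;
        this.parameterTypes = parameterTypes;
        this.documentation = documentation;
    }
}

// Hypothetical stand-in for a scope: resolves a called name to a declaration.
final class FunctionScope {
    private final Map<String, FunctionDecl> local = new LinkedHashMap<>();
    private final Map<String, FunctionDecl> library = new LinkedHashMap<>();

    FunctionScope(List<FunctionDecl> localDecls, List<FunctionDecl> libraryDecls) {
        for (FunctionDecl f : localDecls) local.put(f.name, f);
        for (FunctionDecl f : libraryDecls) library.put(f.name, f);
    }

    // Assumption of this sketch: declarations in the current file shadow the library.
    Optional<FunctionDecl> resolve(String name) {
        FunctionDecl decl = local.containsKey(name) ? local.get(name) : library.get(name);
        return Optional.ofNullable(decl);
    }

    // One check serves built-in and custom functions alike.
    // Returns an error message, or an empty Optional if the call is fine.
    Optional<String> validateCall(String name, List<String> argumentTypes) {
        Optional<FunctionDecl> decl = resolve(name);
        if (!decl.isPresent()) {
            return Optional.of("Unknown function: " + name);
        }
        if (!decl.get().parameterTypes.equals(argumentTypes)) {
            return Optional.of("Expected " + decl.get().parameterTypes + " but got " + argumentTypes);
        }
        return Optional.empty();
    }
}

final class LibraryScopeDemo {
    public static void main(String[] args) {
        FunctionScope scope = new FunctionScope(
            List.of(new FunctionDecl("twice", List.of("int"), "Declared by the user.")),
            List.of(new FunctionDecl("max", List.of("int", "int"), "Returns the larger argument.")));
        // The same lookup and the same validation handle both kinds of functions.
        System.out.println(scope.resolve("max").map(f -> f.documentation).orElse("unresolved"));
        System.out.println(scope.validateCall("twice", List.of("int", "int")).orElse("ok"));
    }
}
```

Adding a function to the library means adding one more declaration to the stub; neither the lookup nor the validation has to change.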

If there is an existing specification of the implicit features available, that one should be used instead. Creating a model from an existing, processable format is straightforward and it avoids mistakes because there is no redundant declaration of the very same information. In both cases there is a clear separation of concerns: The grammar remains what it should be: a description of the concrete syntax and not something that is tied to the run-time. The API specification is concise and easy to grasp, too. And in case an existing format can be used for that purpose, it's likely that the language users are already familiar with that format.

Wrap Up

You should always consider using external descriptions or header stubs of the environment. A grammar that is tightly coupled to a particular version or API is quite error-prone and fragile. Any evolution of the run-time will lead to grammar changes which will in turn lead to broken existing models (that's a promise). Last but not least, with hard-wired built-ins the effort needed to integrate built-in and custom functions seamlessly for the end user far exceeds the effort for a clean separation of concerns.

A very sophisticated implementation of this approach can be explored in the Geppetto repository on GitHub. Geppetto uses Puppet files and Ruby libraries as the target platform, parses them and puts them onto the scope of the project files. This example underlines another advantage of the library approach: It is possible to use a configurable environment. The APIs may be different from version to version and the concrete variant can be chosen by the user. This would never be possible with a hard-wired set of built-ins.
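
The configurable environment can be sketched in a few lines of plain Java. This is not how Geppetto is implemented; the class RuntimeLibraries, the version strings and the function sets are hypothetical, and a real language would load the sets from library models instead of listing them in code.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical registry: one set of built-in function names per run-time version.
final class RuntimeLibraries {
    // Assumed contents; a real setup would read them from library stub files.
    private static final Map<String, Set<String>> BUILT_INS = Map.of(
        "1.0", Set.of("max", "min", "avg"),
        "2.0", Set.of("max", "min", "avg", "sum"));

    // Returns the library for the run-time version chosen by the user.
    static Set<String> forVersion(String version) {
        Set<String> names = BUILT_INS.get(version);
        if (names == null) {
            throw new IllegalArgumentException("No library for run-time version " + version);
        }
        return names;
    }

    // Whether a call resolves depends on the selected library, not on the grammar.
    static boolean isBuiltIn(String version, String functionName) {
        return forVersion(version).contains(functionName);
    }

    public static void main(String[] args) {
        System.out.println(isBuiltIn("1.0", "sum"));
        System.out.println(isBuiltIn("2.0", "sum"));
    }
}
```

The grammar stays the same for every version; only the selected library decides which names can be resolved.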


soru said...

Interesting, but I don't get a sense from this of _how_ you would actually go about doing it. When you define your grammar file, how do you tell the system 'this category of things is going to be supplied at run-time, not defined somewhere else in the file'?

Sebastian Zarnekow said...

Just as it was described in the post:

FunctionCall: function=[Function]

Your scope provider is responsible for finding the invoked function, so you are free to look it up in the current file, the workspace or some implicit library.

Oliver L said...

Hi Sebastian,

I really like this approach.

What's the best way to ensure that these libraries are non-expandable?

I just want the language user not to change or add any library functions. So I guess it would be the best solution to roll it out with the language.

Anyway, there must be a way to import these libraries implicitly (like Xtend does with IterableExtensions,...).

It would be great if you could show how to define those implicit imports in one of your upcoming Xtext Corner blogs. I guess the ImportedNamespaceAwareLocalScopeProvider is a good starting point, isn't it?

Kind regards

Sebastian Zarnekow said...

ImportedNamespaceAwareLocalScopeProvider.getImplicitImports(boolean) is a good starting point (or if you use Xbase: XbaseImportedNamespaceScopeProvider.getImplicitImports(boolean) respectively).

The easiest way in the Java context is most likely to ship the libs in a jar and add that one to the class-path of the project.

soru said...

I'm definitely still missing something about how this would work. If I type:
FunctionCall: function=[Function]

into the Xtext syntax editor, I get an error 'cannot find type for Function'. And as far as I can see, all the scope provider stuff happens downstream from that, after you have generated code from the .xtext file for your DSL editor.

Which you can't do, as it has an error in it.

Is there a tutorial or something somewhere that covers this?

Sebastian Zarnekow said...

Xtext is covered in many tutorials throughout the web and so is the difference between cross references and object instantiation, using EMF and such things. Actually it's a basic thing to have something like

Function: name=ID;
FunctionCall: function=[Function];

The first rule will declare a new function where the second calls such a function. Now it's perfectly possible to create a library with a couple of function declarations that are made available for clients.

soru said...

OK, I think I get it now. You define the _thing_ you want as an unreachable node in the grammar. Left alone everything would be statically legal, just there would be no possible input that would ever match the 'function call' rule.

Then you plug in the _set of values for the thing_ at run-time. That way, you are not modifying the grammar.

Sebastian Zarnekow said...

I'm not sure that I understood what you understood ;-)

No, I don't introduce hidden or invisible concepts in the grammar. It's just the idea that a mechanism which could be used to define functions in the language is used to define the library functions. Alternatively a dedicated syntax can be introduced to define the libraries iff the language itself does not allow defining functions. I think I will come up with a how-to for this approach in the upcoming weeks.

Cristiano Gavião said...

What I couldn't figure out yet is the best way to reuse the generator/inferrer defined for a library element.
For example, suppose that I have a HousePartsLibrary DSL.
With this DSL I could define different house parts: BlackDoor, BlueDoor, BrickWall, GlassWall, TransparentRoof, etc.
And I have a second DSL, HouseBuildingDSL, where I could reuse all parts created by the first DSL. So one house could be:
House myhouse {
    MainRoof mainroof : TransparentRoof
}
It would be interesting to reuse the generator/inferrer of the created element.
Could you explain if/how we could do that?

Sebastian Zarnekow said...


I'm not sure what you have in mind when you say 'reuse' in that context. Both the generator and the inferrer are Java classes. Reusing those is no different from other classes: composition or inheritance will work. If you want to reuse the output of the inferrer / generator, it's straightforward, too. The inferrer will add the elements to the resource that contains the original model so the inferred things are part of the resource content. The generated parts will be available anyway. Does that make sense to you? Maybe I'm missing your point, though.


Jon said...

Hi Sebastian,

I have read this post on several occasions now but still haven't quite understood it.

You commented earlier that "It's just the idea that a mechanism which could be used to define functions in the language, is used to define the library functions." then "Alternatively a dedicated syntax can be introduced to define the libraries iff the language itself does not allow to define functions." and finally "I think I will come up with a how to for this approach in the upcoming weeks."

I would very much like to read such a how to, describing the first and second approaches. A pair of demonstration projects would be even better. Is there any chance of this happening in the near future? I think this would make a valuable addition to the Xtext documentation as it seems to me a recurring theme in language development.

Specifically, I have a situation where I would like to allow users to define functions in my language, but I also want to provide a standard library of functions which follow the same pattern. Some indication of how to validate these would be great. I would rather not use EMF directly as I am only familiar with Xtext.



Sebastian Zarnekow said...


I'm not sure that I'll find the time to provide a fully working example in the next weeks.

The simplest idea is basically the same as with other artifacts, e.g. with Java libs: You create an archive that contains your library models and put that one onto the classpath of the project. There are other approaches that do not require a Java project, but if you reuse Java classpath semantics, it should work transparently.

Jon said...

Thanks Sebastian. That helps somewhat. I will keep checking back in hope of an extended example; no pressure!

Paul Muntean said...

Hello Sebastian,

I have a question that I posted on Stack Overflow.
Maybe you can help me answer it.

Kind regards,

Paul Muntean

Neeraj said...

Nice article. Any thoughts on the following? Is it possible?