Because the Xtext builder loads every affected resource into a single resource set to update its cached state and validate the changes, it tends to require a huge amount of memory for a clean build of a really large project. If thousands of files are involved, the machine may simply run out of physical memory. As a side effect, performance degrades significantly due to paging, and eventually the VM crashes with an OutOfMemoryError.
During the past weeks we worked hard to fix this for Xtext SR1, which is due at the end of August. The good news is that we could tackle the problem at its root: we are now able to build, with a reasonable heap size, huge projects that previously refused to build even with more than 8 GB.
The solution was a (more or less) simple divide-and-conquer algorithm: we split the job into two well-defined phases and clustered each phase. This enables the builder to release memory early and thereby drastically reduces the peak amount of required heap.
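To make the idea more tangible, here is a minimal, self-contained sketch of a clustered two-phase build. It is not Xtext code: the classes `ClusteredBuildSketch` and `ResourceSet` (a stand-in, not EMF's class), the methods `runPhase` and `build`, and the `clusterSize` parameter are all hypothetical. It assumes that phase 1 produces a lightweight index entry per resource that is kept, while the fully loaded resources are dropped after each cluster; phase 2 then validates every resource against the complete index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch: none of these classes or methods are Xtext API.
class ClusteredBuildSketch {

    /** Stand-in for a resource set: loaded resources stay in memory until it is cleared. */
    static final class ResourceSet {
        private final List<String> loaded = new ArrayList<>();
        private int peak = 0;

        void load(String uri) {
            loaded.add(uri);
            peak = Math.max(peak, loaded.size());
        }

        void clear() {
            loaded.clear();
        }

        int peak() {
            return peak;
        }
    }

    /** Runs one phase over the URIs in clusters, clearing the resource set after each cluster. */
    static void runPhase(List<String> uris, int clusterSize, ResourceSet rs, Consumer<String> work) {
        if (clusterSize < 1) throw new IllegalArgumentException("clusterSize must be positive");
        for (int start = 0; start < uris.size(); start += clusterSize) {
            int end = Math.min(start + clusterSize, uris.size());
            for (String uri : uris.subList(start, end)) {
                rs.load(uri);
                work.accept(uri);
            }
            rs.clear(); // release the loaded resources before the next cluster
        }
    }

    /**
     * Phase 1 records a lightweight index entry per resource; phase 2 validates each
     * resource against the complete index. Returns the peak number of resources in memory.
     */
    static int build(List<String> uris, int clusterSize, Map<String, String> index, List<String> log) {
        ResourceSet rs = new ResourceSet();
        runPhase(uris, clusterSize, rs, uri -> {
            index.put(uri, "exported names of " + uri); // survives rs.clear()
            log.add("index " + uri);
        });
        runPhase(uris, clusterSize, rs, uri -> {
            // Linking may need index entries from any cluster, hence the separate phase.
            if (index.size() != uris.size()) throw new IllegalStateException("index incomplete");
            log.add("validate " + uri);
        });
        return rs.peak();
    }

    public static void main(String[] args) {
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < 10; i++) uris.add("file" + i + ".dsl");
        Map<String, String> index = new HashMap<>();
        List<String> log = new ArrayList<>();
        int peak = build(uris, 3, index, log);
        System.out.println("peak resources in memory: " + peak); // 3, not 10
        System.out.println("build steps: " + log.size());        // 20
    }
}
```

The point of the split is that validation of one resource may depend on the exports of any other resource, so all exports have to be known before the second phase starts. Only the small index entries live across clusters; the expensive resources never exceed one cluster at a time.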
To enable the clustering Xtext builder, you have to override some central functionality that drives the builder for every language in your running IDE. Simply add an extension to your UI plug-in that overrides the common configuration.
The second step is to override a binding in the UI module of your language so that it communicates with the clustering builder. This implies that every Xtext-based language installed in the IDE has to expect the clustering builder instead of the default implementation; otherwise you'll be confronted with an exception due to incompatible settings. The binding in your UI module replaces the component that is used by the scope provider in the builder context:
Besides this global change to reduce memory consumption, you may want to optimize some language-specific implementations. It is always a good idea to review the number of exported objects. There is usually no need to export every object that has a name. For example, a local variable is only reachable from inside its resource, so it can simply be skipped when the exported objects are computed. Another good candidate is the fragment provider. The default fragments that identify objects within a resource are quite generic and somewhat verbose, so it pays off to come up with a shorter, resource-specific representation. Both customizations will usually not only make your language consume less memory but also have the potential to improve performance, because your implementation can use language-specific information.
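Both optimizations can be illustrated with a small stand-alone sketch. It deliberately avoids the real Xtext and EMF types: the model class `Element`, the enum `Kind`, and the methods `exportedObjects`, `fragment` and `resolve` are hypothetical. The sketch assumes that qualified names are unique within a resource and that every container on the path to an element is named; in a real language you would plug equivalent logic into the corresponding Xtext customization points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: Element, Kind and all methods are illustrative, not Xtext API.
class ExportSketch {

    enum Kind { TYPE, OPERATION, LOCAL_VARIABLE }

    /** Minimal stand-in for a model object with an optional name and a container. */
    static final class Element {
        final String name;
        final Kind kind;
        final Element container; // null for top-level elements

        Element(String name, Kind kind, Element container) {
            this.name = name;
            this.kind = kind;
            this.container = container;
        }
    }

    /** Exports only named elements that other resources can actually reference. */
    static List<Element> exportedObjects(List<Element> contents) {
        List<Element> exported = new ArrayList<>();
        for (Element e : contents) {
            // Local variables are only visible inside their resource, so skip them.
            if (e.name != null && e.kind != Kind.LOCAL_VARIABLE) {
                exported.add(e);
            }
        }
        return exported;
    }

    /** Short, name-based fragment such as "Customer.save" (assumes named containers). */
    static String fragment(Element e) {
        return e.container == null ? e.name : fragment(e.container) + "." + e.name;
    }

    /** Inverse of fragment(): finds the element with the given fragment, or null. */
    static Element resolve(List<Element> contents, String uriFragment) {
        for (Element e : contents) {
            if (e.name != null && uriFragment.equals(fragment(e))) {
                return e;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Element customer = new Element("Customer", Kind.TYPE, null);
        Element save = new Element("save", Kind.OPERATION, customer);
        Element tmp = new Element("tmp", Kind.LOCAL_VARIABLE, save);
        List<Element> contents = List.of(customer, save, tmp);

        System.out.println(exportedObjects(contents).size());           // 2
        System.out.println(fragment(save));                             // Customer.save
        System.out.println(resolve(contents, "Customer.save") == save); // true
    }
}
```

Name-based fragments like these are typically shorter than generic, index-based paths and do not change when siblings are reordered; the price is that names must be unique within a resource, which only the language designer can guarantee.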