[Gambas-user] Replacing the JIT component

Sun May 13 18:55:00 CEST 2018

On 06/05/2018 at 21:43, Benoît Minisini wrote:
> Hi,
> 
> This is just an idea on the fly. There is nothing concrete at the moment.
> 
> As you may know, there is a JIT component in Gambas (not made by me) 
> based on LLVM. Alas it does not work with recent versions of LLVM, 
> because apparently this huge project is not able to keep backward 
> compatibility between minor versions. (Worse than GTK+, I couldn't 
> imagine it was possible!)
> 
> What to do now?
> 
> 1) Rewriting the JIT compiler for newer versions of LLVM. Alas the 
> original author does not give any news, and I have no knowledge about 
> LLVM, and its C++ interface looks horrible to me.
> 
> 2) gcc has now a JIT library, but it is an alpha version with a big 
> warning that everything may change between releases, even if it is 
> apparently relatively stable. Moreover, no idea about how many bugs the 
> library has.
> 
> 3) Writing a Gambas -> C translator.
> 
> My idea is that the compiler, or maybe an external program possibly 
> written in Gambas, takes the source code of a class or module and 
> transforms it into a C source file.
> 
> That C source file will call the interpreter functions when needed 
> through a dedicated interpreter API.
> 
> Then the C source file will be transformed into a shared library loaded 
> at runtime by the interpreter (by calling the C compiler, which must 
> therefore be installed).
> 
> The advantages are:
> 
> - C syntax is a thousand times more stable than a JIT compiler library 
> that changes at each version.
> 
> - Maybe better optimizations.
> 
> The disadvantages are:
> 
> - You need the compiler. But the JIT library needs most of it too, so...
> 
> - Compiling is slower than calling a JIT library.
> 
> - JIT library can compile at the function level. This is not practical 
> with a compiler (we won't make a shared library for each JIT function!). 
> One shared library for each class, or even one for the entire project 
> may be the solution.
> 
> Now I'm waiting for your comments!
> 
> Regards,
> 

I don't think option #3 is a good way to go (not that the others are 
better …).

First, going that route will require a *lot* of work, both in the short 
and in the long term.
You will first have to write a full Gambas-to-C transpiler, which 
involves either plugging into gbc (unless you read the compiled output), 
or maintaining the full grammar in two different compilers.
Then it requires adding and maintaining extra hooks in the interpreter 
for the generated binary to link against, which I don't know much about, 
but which is probably not trivial either.
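
To make the "extra hooks" point concrete, here is a rough sketch of what 
such an interface could look like. All names (NATIVE_HOOKS, call_method, 
generated_sum, "Square") are made up for illustration; nothing like this 
exists in the interpreter today, and a real version would need far more 
entry points (objects, exceptions, variants, ...).

```c
#include <assert.h>
#include <stddef.h>

#define NATIVE_HOOKS_ABI 1

/* Hypothetical table of interpreter entry points handed to generated code. */
typedef struct {
    int abi_version;                                  /* bumped on any layout change */
    long (*call_method)(const char *name, long arg);  /* run a method in the interpreter */
} NATIVE_HOOKS;

/* What a translator might emit for a loop that sums the results of a method. */
long generated_sum(const NATIVE_HOOKS *hooks, long n)
{
    long total = 0;
    long i;

    /* Generated code and interpreter must agree on the table layout. */
    assert(hooks != NULL && hooks->abi_version == NATIVE_HOOKS_ABI);

    for (i = 1; i <= n; i++)
        total += hooks->call_method("Square", i);     /* call back into the interpreter */
    return total;
}

/* Stand-in for the interpreter side, so the sketch runs on its own. */
static long fake_call_method(const char *name, long arg)
{
    (void)name;
    return arg * arg;
}

const NATIVE_HOOKS fake_hooks = { NATIVE_HOOKS_ABI, fake_call_method };
```

Calling generated_sum(&fake_hooks, 3) goes through the table three times. 
Every function added to such a table becomes an interface that has to stay 
stable across interpreter releases, which is the maintenance cost I mean.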

Having to perform a full compilation at startup also completely 
nullifies the advantages of Just-In-Time compilation: standard 
compilation (e.g. invoking the GCC or LLVM executables) is not just 
slower, it is orders of magnitude slower, which even for small bits of 
code could add a few seconds of startup time to any app that uses it. 
And considering that the speed of 99.9% of Gambas code isn't really 
affected by any kind of compilation to native code (I think the only 
parts that benefit are long, tight loops that do number crunching, but 
most if not all apps are just a bunch of calls to component APIs or 
similar), the tradeoff is simply not worth it in most cases. :/
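
The tradeoff can be put as a back-of-the-envelope formula. If compiling 
costs compile_s seconds and the native code runs speedup times faster, 
compiling only pays off when the code would otherwise spend more than 
compile_s * speedup / (speedup - 1) seconds in the interpreter. The 
function below is only that arithmetic; the name and the numbers in the 
note after it are illustrative, not measurements.

```c
#include <assert.h>

/* Interpreted run time (in seconds) above which compiling first is a win:
 *   compile_s + t / speedup < t   <=>   t > compile_s * speedup / (speedup - 1)
 */
double break_even_seconds(double compile_s, double speedup)
{
    /* With speedup <= 1 the native code never catches up. */
    assert(compile_s >= 0.0 && speedup > 1.0);
    return compile_s * speedup / (speedup - 1.0);
}
```

With a 2 second compile and a 2x speedup, the hot code has to run for more 
than 4 seconds interpreted before the user gains anything, and an app that 
mostly calls into components never gets there.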

Considering all of this, the first solution that comes to my mind would 
be to simply drop the gb.jit component: I don't think it would be too big 
a breaking change, considering the interpreter already falls back to 
normal operation if it can't find the gb.jit component to perform the 
compilation.
And it seems it already doesn't work for most users, so dropping 
something that doesn't work anyway probably isn't a huge deal …
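
The fallback I am referring to amounts to something like the following. 
This is a simplified sketch, not the interpreter's actual code: 
JIT_COMPONENT, run_function and interpret_double are hypothetical names, 
and the "interpreter" here is a one-line stand-in.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*native_fn)(long);

/* Hypothetical handle to the JIT component; NULL when it could not be loaded. */
typedef struct {
    native_fn (*compile)(const char *func_name);  /* may itself return NULL */
} JIT_COMPONENT;

/* Stand-in for executing the function's bytecode. */
static long interpret_double(long x)
{
    return x * 2;
}

/* Use native code when available, otherwise interpret as usual. */
long run_function(const JIT_COMPONENT *jit, const char *func_name,
                  long arg, int *used_native)
{
    native_fn fn = NULL;

    assert(used_native != NULL);

    if (jit != NULL && jit->compile != NULL)
        fn = jit->compile(func_name);

    *used_native = (fn != NULL);
    return fn != NULL ? fn(arg) : interpret_double(arg);
}
```

With jit == NULL the result is the same, only computed by the interpreter, 
so removing the component would change speed but not behaviour.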

The alternative option (which is much simpler than the first three) 
would be to statically link against a specific version of LLVM (the one 
that works for us; we could use git submodules to do this), bundling 
only the parts of the compiler we need with the gb.jit component. Then 
it works for everybody out of the box, and we can update it only when we 
want, incrementally (and drop all those ugly #ifdef checks).
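
As an illustration of what pinning buys us: instead of version checks 
scattered through the code, a single guard could reject any other LLVM. 
LLVM_VERSION_MAJOR and LLVM_VERSION_MINOR are macros LLVM defines in 
llvm/Config/llvm-config.h; the fallback definitions, the GB_JIT_PINNED_* 
names and the 3.5 version number below are assumptions made only so the 
sketch compiles without LLVM installed.

```c
#include <assert.h>

/* Normally provided by llvm/Config/llvm-config.h; stand-ins for this sketch. */
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 3
#define LLVM_VERSION_MINOR 5
#endif

/* Hypothetical: the one version the bundled copy is pinned to (arbitrary here). */
#define GB_JIT_PINNED_MAJOR 3
#define GB_JIT_PINNED_MINOR 5

/* One check at the top replaces every per-version #if in the sources. */
#if LLVM_VERSION_MAJOR != GB_JIT_PINNED_MAJOR || \
    LLVM_VERSION_MINOR != GB_JIT_PINNED_MINOR
#error "gb.jit must be built against its bundled, pinned LLVM version"
#endif

/* Encodes the pinned version as major * 100 + minor, e.g. for a log line. */
int gb_jit_pinned_llvm_version(void)
{
    assert(GB_JIT_PINNED_MINOR < 100);
    return GB_JIT_PINNED_MAJOR * 100 + GB_JIT_PINNED_MINOR;
}
```

Updating LLVM then means bumping the submodule and these two numbers in 
the same commit, and fixing whatever breaks once.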

Now that I think about it, LLVM is not meant to be used as a dynamic 
library at all (which explains the pain you're going through trying to 
do so, and why they release breaking versions regularly). All the 
projects I know that rely on LLVM (like clang, openjdk, mono, 
emscripten, rustc …) link it statically.

-- 
Adrien Prokopowicz