[Gambas-user] Replacing the JIT component

Sun May 13 20:02:57 CEST 2018

On 13/05/2018 at 18:55, Adrien Prokopowicz wrote:
> On 06/05/2018 at 21:43, Benoît Minisini wrote:
>> Hi,
>>
>> This is just an idea on the fly. There is nothing concrete at the moment.
>>
>> As you may know, there is a JIT component in Gambas (not made by me) 
>> based on LLVM. Alas it does not work with recent versions of LLVM, 
>> because apparently this huge project is not able to keep backward 
>> compatibility between minor versions. (Worse than GTK+, I couldn't 
>> imagine it was possible!)
>>
>> What to do now?
>>
>> 1) Rewriting the JIT compiler for newer versions of LLVM. Alas the 
>> original author has not been heard from, I have no knowledge of 
>> LLVM, and its C++ interface looks horrible to me.
>>
>> 2) gcc now has a JIT library, but it is an alpha version with a big 
>> warning that everything may change between releases, even if it is 
>> apparently relatively stable. Moreover, I have no idea how many bugs 
>> the library has.
>>
>> 3) Writing a Gambas -> C translator.
>>
>> My idea is that the compiler, or maybe an external program eventually 
>> written in Gambas, takes the source code of a class or module and 
>> transforms it into a C source file.
>>
>> That C source file will call the interpreter functions when needed 
>> through a dedicated interpreter API.
>>
>> Then the C source file will be compiled into a shared library loaded 
>> at runtime by the interpreter (which calls the C compiler, so the 
>> compiler must be installed).
>>
>> The advantages are:
>>
>> - C syntax is a thousand times more stable than a JIT compiler library 
>> that changes at each version.
>>
>> - Maybe better optimizations.
>>
>> The disadvantages are:
>>
>> - You need the compiler. But the JIT library needs most of it too, so...
>>
>> - Compiling is slower than calling a JIT library.
>>
>> - A JIT library can compile at the function level. This is not practical 
>> with a compiler (we won't make a shared library for each JIT 
>> function!). One shared library for each class, or even one for the 
>> entire project may be the solution.
>>
>> Now I'm waiting for your comments!
>>
>> Regards,
>>
> 
> I don't think option #3 is a good way to go (not that the others are 
> better …).
> 
> First, going that route will require a *lot* of work, both in the short 
> and the long term.
> You will first have to write a full Gambas-to-C transpiler, which 
> involves either plugging into gbc (unless you read the compiled output), 
> or maintaining the full grammar on two different compilers.
> Then it requires adding and maintaining extra hooks in the interpreter 
> for the generated binary to link against, which I don't know much about, 
> but which is probably not trivial either.

I'm not saying it's simple, but it should not be very difficult to generate 
C code once the source code has been parsed by the reader and the 
expression tree generator.

Moreover, the compiler's expression tree generator statically computes the 
datatype of every intermediate expression, which can be used to generate 
statically typed C code accurately.
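
To illustrate how statically known datatypes could drive C generation, here is a minimal sketch. It is not the Gambas compiler's code: the `Node` structure, the `DataType` enum, and the `emit_expr` / `emit_operand` helpers are all hypothetical, and only two datatypes and two operators are modelled. The emitter walks a typed expression tree and inserts an explicit C cast wherever an operand's type differs from the type of the enclosing expression.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified expression tree: every node carries the
   datatype that the compiler has already computed for it. */
typedef enum { T_INTEGER, T_FLOAT } DataType;
typedef enum { N_CONST, N_VAR, N_ADD, N_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    DataType type;                   /* statically known result type */
    const char *text;                /* literal or variable name (leaves only) */
    const struct Node *left, *right; /* operands (operators only) */
} Node;

/* Map a datatype of this sketch to a C type name. */
static const char *c_type(DataType t)
{
    return t == T_INTEGER ? "int" : "double";
}

/* Append s to the NUL-terminated buffer out without overflowing cap bytes. */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);
    if (len + 1 < cap)
        strncat(out, s, cap - len - 1);
}

static void emit_expr(const Node *n, char *out, size_t cap);

/* Emit one operand, with a cast when its type differs from the wanted type. */
static void emit_operand(const Node *child, DataType want, char *out, size_t cap)
{
    if (child->type != want) {
        append(out, cap, "(");
        append(out, cap, c_type(want));
        append(out, cap, ")");
    }
    emit_expr(child, out, cap);
}

/* Append the C text of expression n to out. */
static void emit_expr(const Node *n, char *out, size_t cap)
{
    assert(n != NULL);
    if (n->kind == N_CONST || n->kind == N_VAR) {
        append(out, cap, n->text);
        return;
    }
    append(out, cap, "(");
    emit_operand(n->left, n->type, out, cap);
    append(out, cap, n->kind == N_ADD ? " + " : " * ");
    emit_operand(n->right, n->type, out, cap);
    append(out, cap, ")");
}
```

With an integer variable `i` and a float variable `x`, the tree for `i * 2 + x` is emitted as `((double)(i * 2) + x)`: the multiplication stays in integer arithmetic and only its result is converted, because the type of each intermediate node is known before any C is written.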

> 
> Having to perform a full compilation at startup also completely 
> nullifies the advantages of Just-In-Time compilation: standard 
> compilation (e.g. GCC or LLVM executable invocation) is not just slower, 
> it is orders of magnitude slower, which even for small bits could just 
> add a few seconds of startup time to any app that uses it. And 
> considering 99.9% of Gambas code speed isn't really affected by any kind 
> of compilation to native code (I think the only parts that benefit 
> are long, tight loops that do number crunching, but most if not all apps 
> are just a bunch of calls to component APIs or similar), the tradeoff 
> becomes not worth it at all in most cases. :/

With a cache, the compilation will be done only once (and again if the 
system CPU changes, but that is rare).
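
One way such a cache could be keyed is sketched below, purely as an assumption: nothing here is an existing Gambas mechanism. The key is a 64-bit FNV-1a hash of the class source text combined with a CPU identifier string, so a change to either one yields a different cached library name and triggers one recompilation. The names `cache_key` and `cache_path`, the separator string, and the `.so` file layout are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a, continued from a previous state h so that several
   strings can be folded into a single key. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;          /* FNV prime */
    }
    return h;
}

/* Hypothetical cache key: depends on the source text and on the CPU. */
static uint64_t cache_key(const char *source, const char *cpu_id)
{
    assert(source != NULL && cpu_id != NULL);
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    h = fnv1a(h, source);
    h = fnv1a(h, "\n--cpu--\n");            /* separator between the two parts */
    h = fnv1a(h, cpu_id);
    return h;
}

/* Write "<dir>/<16 hex digits>.so" into out.
   Returns the path length, or -1 if out is too small. */
static int cache_path(char *out, size_t cap, const char *dir, uint64_t key)
{
    int n = snprintf(out, cap, "%s/%016llx.so", dir, (unsigned long long)key);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

At startup the interpreter would compute the path and invoke the C compiler only if no file exists there. A cryptographic hash would be a safer choice if accidental collisions are a concern; FNV-1a is used here only to keep the sketch short.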

And no, I don't think calling the compiler will be orders of magnitude 
slower, as the source code will only include one header for the 
interpreter API, and not the tons of headers a standard C program includes.

Moreover, even if 99% of your code does not need speed, the other 1% may 
need it badly.

> 
> Considering all of this, the first solution that comes to my mind could 
> be to simply drop the gb.jit component: it wouldn't be too big of a 
> breaking change I think, considering the interpreter already falls back 
> to normal operation if it can't find the gb.jit component to perform the 
> compilation.
> And it seems it already doesn't work for most users, so dropping 
> something that doesn't work anyway probably isn't a huge deal …
> 
> 
> The alternative option (which is much simpler than the first three) 
> would be to statically link against a specific version of LLVM (the one 
> that works for us, we could use git submodules to do this), bundling 
> only the parts of the compiler we need with the gb.jit component. Then 
> it works for everybody out of the box, and we can update it only when we 
> want, incrementally (and drop all those ugly #ifdef checks).
> 
> Now that I think about it, LLVM is not meant to be used as a dynamic 
> library at all (which explains the pain you're going through trying to 
> do so, and why they release breaking versions regularly). All the 
> projects I know that rely on LLVM (like clang, openjdk, mono, 
> emscripten, rustc …) link it statically.
> 

Statically linking to LLVM 3.5 does not solve these problems: some 
Gambas instructions are not supported, and new ones cannot be added as I 
do not have any command of the original code.

But generating C means not depending on the compiler version, as C is 
a relatively stable standard.

Regards,

-- 
Benoît Minisini