Is it possible to transform LLVM bytecode into Java bytecode?

Late to the discussion: Sulong executes LLVM IR on the JVM. It creates executable nodes (which are Java objects) from the LLVM IR instead of converting the LLVM IR to Java bytecode. These executable nodes form an AST interpreter. You can check out the project at https://github.com/graalvm/sulong or read a paper about it at http://dl.acm.org/citation.cfm?id=2998416. Disclaimer: I'm working on this project.


It does now appear possible to convert LLVM IR bytecode to Java bytecode, using the LLJVM interpreter.

There is an interesting Disqus comment (21/03/11) from Grzegorz of kraytracing.com which explains, along with code, how he has modified LLJVM's Java class output routine to emit non-monolithic Java classes which agree in number with the input C/C++ modules. He suggests that his technique seems to avoid the excessively long 'compound' Java Constructor method argument signatures usually generated by LLJVM, and he provides links to his modifications and examples.

Although LLJVM doesn't look like it's been in active development for a couple of years now, its still hosted on Github and some documentation can still be found at its former repository at GoogleCode:

LLJVM @ Github
LLJVM documentation @ GoogleCode

I also came across the 'Proteuscc' project which also utilises LLVM to output Java Byte code (it suggests that this is specifically for C/C++, although I assume the project could be modified or fed LLVM Intermediate Representation (IR)). From http://proteuscc.sourceforge.net:

The general process of producing a Java executable with Proteus then can be summarised as below.

  1. Generate human readable representation of the LLVM intermediate representation (ll file)
  2. Pass this ll file as an argument to the proteus compilation system
  3. The above will produce a Java jar file which can be executed or used as a library

I've extended a bash script to compile the latest versions of LLVM and Clang on Ubuntu, it can found be as a Github Gist,here.

[UPDATE 31/03/14] - LLJVM has seemed to have been dead for somewhile, however Howard Chu (https://github.com/hyc) looks to have made LLJVM compatible with the latest version of LLVM (3.3). See Howard's LLJVM-LLVM3.3 branch at Github, here


I doubt you can, at least not without significant effort and run-time abstractions (e.g. building half a Von Neumann machine to execute certain opcodes). LLVM bitcode allows the full range of low-level unsafe "do what you want but we won't clean up the mess" features, from direct, raw, constructor-free memory allocation up to completely unchecked casts - real casts, not conversions -you can take i32 and bitcast it to to a %stuff * if you wish. Also, JVMs are heavily geared towards objects and methods, while the LLVM guys are lucky they have function pointers and structs.

On the other hand, it seems that C can be compiled to Java bytecode and LLVM bitcode can be compiled to Javascript (although many features, e.g. dynamic loading and stdlib functions, are lacking), so it should be possible, given enough effort.