Fun with manually diffing Java bytecode instructions

I recently found myself writing code against a simple library that was distributed in the form of a .class file. For a few reasons (laziness being one of them), I decompiled the class file using jd-gui and just added it to the source path of my project. Decompiling binary Java code and using it in a project is fairly routine and I’ve never had a problem with it before. This time, however, I noticed slightly anomalous behavior from the code.

Swapping it out with the binary version of the class fixed the problem, which was good, but left me wondering – why did the decompiled version behave slightly differently? Decompiling Java code is usually pretty safe, and if it had messed up, I’d expect a more immediate problem like a compilation failure. Intrigued, I decided to spend a few minutes figuring out just why recompiling the decompiled source had led to subtly different behavior.

Java Bytecode

One of the interesting things about Java is that the compiler (javac) doesn’t do very much optimization at all – code optimization occurs at runtime. The resulting bytecode – if deliberate obfuscation steps aren’t performed – can be easily decompiled and reassembled into Java code, and it is generally quite easy to manually read through to figure out what is going on. So, it seems like a reasonable place to start if we want to figure out why two pieces of nearly-identical code are behaving differently.

Bytecode Diffing

I’ll save you the tedium of poring through a few hundred lines of bytecode, and just show the interesting part. Here is the result from running javap -c on the original .class file:


108: iload 16
110: i2d
111: iload 17
113: i2d
114: ddiv
115: dstore 18

This is pretty simple: it pushes the integer in local variable slot #16 onto the stack and converts it to a double (i2d), leaving the double on the stack. It does the same for a second integer (slot #17), then performs double division (ddiv) and stores the double result in variable slot #18. We can imagine the original Java code looked something like this:


double x = (double)y/(double)z;

What does it then do with x?


117: dload 5
119: dload 18
121: dcmpg
122: ifge 136

Again, fairly simple code – it is comparing two doubles and branching based on the result. We can imagine the original code looked something like this:


if(a < x){ .... }

Moving on to the code that has been decompiled (by jd-gui), recompiled with javac, then examined with javap -c, the problem is easy to find:


108: iload 16
110: iload 17
112: idiv
113: i2d
114: dstore 18

This might appear to be fairly similar, but there is an important difference. Here is what this block of code does: push an int (#16) onto the stack, push another int (#17) onto the stack, and perform integer division (idiv), which discards the fractional part, truncating toward zero. Only then does it convert the integer result to a double (i2d) and store that double in #18. The decompiled Java code looked something like this:


double x = y/z;

For any non-trivial piece of code, you'd have to get fairly lucky to spot this problem. For one, when we see this block of code, we don't actually know the types of y and z - and there could be a lot of trivial operations like this. Second, there are cases where rounding is perfectly valid.

This small bug made the comparison behave incorrectly in some cases. For example, if a = 2.0 and the quotient should be 2.5, integer division yields 2.0 instead, so a < x is false when it should be true. That led to statistical anomalies in the output.
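To make the difference concrete, here is a minimal sketch of the two variants side by side. The variable names y, z, a and x follow the snippets above; the class name, method names and sample values are made up for illustration and are not taken from the library in question.

```java
// Hypothetical reconstruction: only the two expressions come from the post.
// The class name, method names and sample values are assumptions.
class DivisionDemo {

    // Mirrors the original bytecode: i2d, i2d, ddiv.
    static double castThenDivide(int y, int z) {
        return (double) y / (double) z;
    }

    // Mirrors the recompiled bytecode: idiv first, then i2d.
    static double divideThenWiden(int y, int z) {
        return y / z; // integer division; the result is widened afterwards
    }

    public static void main(String[] args) {
        int y = 5;
        int z = 2;
        double a = 2.0;

        double original = castThenDivide(y, z);    // 2.5
        double recompiled = divideThenWiden(y, z); // 2.0

        System.out.println("original:   x = " + original + ", a < x is " + (a < original));
        System.out.println("recompiled: x = " + recompiled + ", a < x is " + (a < recompiled));
    }
}
```

With these sample values the first comparison is true and the second is false, which is the kind of silent change in branching described above.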

Decompiler Bugs

So, this is obviously a pretty simple decompiler bug. How did it happen? Well, remember when I told you that javac created bytecode that pretty closely mirrored the Java code? One of the small things it does is automatically insert primitive widening conversions - that is, it inserts bytecodes (such as i2d) to convert a primitive to a wider type wherever the language allows the conversion implicitly. These conversions never lose the magnitude of the value, and int to double in particular is always exact. You can see in the recompiled example how javac automatically inserted an i2d instruction after the idiv.

My guess is that some decompilers assume that all widening conversions (such as i2d) were automatically inserted by javac and can be safely elided from the decompiled code, probably to reduce the amount of noise in the code. However, it is quite clear that not all widening conversions are safe to ignore: a conversion applied to an operand before an arithmetic operation changes which operation is performed. Thinking about it naively, it seems like there would be a set of rules you could follow to determine when it is safe to ignore them and when it isn't, but I'm not convinced you could ever be 100% correct.
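As a rough illustration of where such rules would have to draw the line, the sketch below contrasts a cast that is redundant with casts that change the result. It is not taken from any decompiler, and the class and method names are hypothetical.

```java
// Hypothetical examples of widening casts that can and cannot be dropped.
class WideningDemo {

    // Redundant cast: returning an int as a double widens it either way,
    // so both methods compile to the same conversion (i2d).
    static double explicitWiden(int i) {
        return (double) i;
    }

    static double implicitWiden(int i) {
        return i;
    }

    // Significant cast: widening an operand first selects long multiplication.
    static long wideMultiply(int a, int b) {
        return (long) a * b;
    }

    // Without the cast the multiplication is done in int and may overflow
    // before the result is widened to long.
    static long narrowMultiply(int a, int b) {
        return a * b;
    }

    public static void main(String[] args) {
        System.out.println(explicitWiden(7) == implicitWiden(7));
        System.out.println(wideMultiply(100_000, 100_000));
        System.out.println(narrowMultiply(100_000, 100_000));
    }
}
```

The pattern is the same as in the division case: a cast on the final value of an expression is noise, while a cast on an operand decides which arithmetic instruction gets emitted.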

Conclusion

While there probably isn't a ton of useful technical information in this post, I had a lot of fun tracking the problem down - tracking down weird, seemingly impossible problems can be an enjoyable experience, and having some notion of what bytecode is and how it works can come in handy occasionally. This has, however, made me slightly more careful when using Java source code that has been decompiled from its class file format.
