A second blogpost about developing a mobile app, in which Roberto Tyley finds a bug in Android – and fixes it.
As soon as a software developer tries something unusual, they stand a good chance of encountering our mighty friend – The Bug. When I wrote Agit, the Git client for Android, I encountered a strange little bug that nearly stopped me dead in my tracks. Although I didn't know it at the time, the bug was actually hidden deep within the source code of Android itself.
At the Guardian we encounter a lot of interesting bugs ('cos we do a lot of interesting things) and the process of fixing them is usually made much easier because we tend to use Open Source Software – though I worked on Agit in my spare time, I got to exercise a lot of the same skills in tracking this bug down.
This is the second blogpost about the development of Agit; you can read the first post in this series here, which gives an overview of Agit and Git in general.
Android apps run on a version of the Java platform, and given JGit, a marvellous open-source Java implementation of Git, there should have been no problem getting Git up and running on an Android phone ... yet mysteriously, there was. Using JGit to clone a list of test Git repositories worked perfectly on my laptop (with the standard version of Java supplied by Sun, as was), but worked with only a few of them when I ran it on my phone – mostly, it crashed, throwing me this message: Packfile corruption detected: Unknown zlib error.
So ... Java on the laptop, Java on the phone. What's the difference?
Zero difference with Harmony
Android apps run on a Java-like set of libraries based on the open-source Apache Harmony project. Apache wrote this code from scratch – and wasn't able to get a license from Sun for the Java-compatibility test kit – so it's not quite the same as regular Sun Java. I suspected that somewhere between the two there was a difference causing my problem – and I needed to work out precisely where it was.
Identifying the difference by running code on my phone would have been laborious, but thankfully I didn't have to – the Harmony version of Java is easily installed onto a laptop, and once I did, I was able to see the same behaviour again, with the code failing in the Harmony VM. That was great, because it confirmed that the problem was a discrepancy between Sun vs Harmony Java, but it didn't immediately tell me what that difference was. It also didn't explain why some repos consistently did work with Harmony – what was the poison pill that broke the others?
I added logging to the key parts of the JGit code. The error was occurring when JGit was "unpacking" the Git packfile – packfiles hold the lion's share of your repo's data, stored in an special compressed format. JGit uses the Java Inflater class to decompress that stored data, and then does a bunch of checks to ensure that no data has been corrupted. Both Harmony and Sun were performing this identically, except for one disparity: when JGit was running on Harmony, and had uncompressed the whole of an item, it sometimes wouldn't stop there. It would keep on going, stumble when it found there was no more data to be had, and the "Unknown zlib error" would get thrown.
Why wouldn't it stop? The code looped around, waiting for the finished() flag of the Inflater to return "true" – but it never did. The responsibility for setting the flag to true lay in the native code implementation of Inflater, yet that code was never even getting called, and this is why:
public int inflate(byte[] buf, int off, int nbytes) {
if (nbytes == 0) {
return 0;
}
// actually read compressed data & inflate it
...
}
Harmony's version of Inflater.inflate() had this tiny little optimisation not in the Sun version which would short-cut the rest of the function if asked to inflate zero bytes of data – which sounds fine, right? Reading zero bytes should never change the state of anything. Unfortunately, it just doesn't work so well if your repository happens to contain a zero-length file anywhere in it's history, because that zero-length stream of data will never get past the shortcut, and will never even be examined to determine that it's already finished – the finished flag will never get set.
Zero length files are more common that you might think. The helpful instructions shown by GitHub on creation of a new repository include these commands:
git init
touch README
git add README
git commit -m 'first commit'
The command "touch" creates a zero-length file – and that's the poison pill. All the repositorie with that poison pill died under Harmony, and all the others lived.
Fixing the problem came in two parts – the narrow fix (just useful to Agit) was creating a "HarmonyFixInflater" subclass of Inflater which overrode the inflate() method, then patching JGit to use that Inflater instead ... but the broader fix was fixing Android itself – a tiny change to the source code, but how would I get the weird little patch noticed and accepted?
Submitting patches is a delicate business. You have to succinctly explain where you're coming from, and not sound like a raving lunatic. So I raised a ticket against Harmony, and once that had been accepted, a second one against Android, referencing the first to give a evidence I wasn't talking rubbish, and providing a short snippet of code to demonstrate the problem.
This approach, methodical as it was, missed a massive opportunity – I didn't realise at the time that Android has an open web-based code-review system that anyone can contribute to. Code submitted there can be merged directly into the Android tree, making it trivial for them to accept good patches. This, to me, is just jaw-droppingly fantastic. Android is often called "open" because you can easily check the code out, yet almost nothing is made of the fact that they have such a well-developed system for receiving changes back in – utterly unlike any other commercial smartphone platform that I'm aware of.
Sweetness with Honeycomb
Despite my ignorance of Android's code review system, the patch did get accepted, and made it onto Android devices with the Honeycomb release – the latest, shiniest Android tablets all have it. Agit doesn't require it because it uses the "HarmonyFixInflater" to work on pre-Honeycomb devices – but it's good to know that the problem is fixed for future developers.
--
Source: http://www.guardian.co.uk/info/developer-blog/2011/jul/28/android-zero-byte-killer
~
Manage subscription | Powered by rssforward.com
0 comments:
Post a Comment