Integer Overflow And Underflow

[Image: Glass overflowing with water]

Q: When is optimization sub-optimal? 

A: When it creates an Android bug that crashes your phone during a call to 911.

Details of the bug can be found in Ars Technica, but I want to focus on the integer overflow/underflow that caused the crash:

return account1.hashCode() - account2.hashCode();

Never mind the variable names and the invocation of hashCode(). The problem is subtracting without looking: the difference between two int values can fall outside the int range, in which case it silently wraps around and the returned number has the wrong sign.
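To make the wraparound concrete, here is a minimal sketch. The class and method names are hypothetical, and plain int values stand in for the hash codes; it is not the Android code, only the same subtraction pattern.

```java
public class SubtractionOverflowDemo {
    // Comparison by subtraction, mirroring the pattern in the line above.
    static int compareBySubtraction(int a, int b) {
        return a - b; // wraps silently when the true difference does not fit in an int
    }

    public static void main(String[] args) {
        int large = Integer.MAX_VALUE; // 2147483647
        int negative = -1;

        // Mathematically, large - negative is 2147483648, one past Integer.MAX_VALUE.
        // Java int arithmetic wraps around, so the result is Integer.MIN_VALUE.
        int result = compareBySubtraction(large, negative);
        System.out.println(result);     // prints -2147483648
        System.out.println(result < 0); // prints true, i.e. "large is less than negative"
    }
}
```

A comparison that claims 2147483647 is less than -1 is the kind of inconsistent answer a sort routine cannot be expected to cope with.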

Google corrected the code by using Java's Integer.compare() method:

return Integer.compare(account1.hashCode(), account2.hashCode());

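which returns a negative value, zero, or a positive value (in practice -1, 0, or 1), indicating that account1's hash code is less than, equal to, or greater than account2's, respectively. Because it compares rather than subtracts, it cannot overflow.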
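As a rough illustration of the corrected approach, the sketch below sorts a few boundary values with Integer.compare(). The class name and the sortSafely helper are made up for this example, and boxed Integer values stand in for the hash codes.

```java
import java.util.Arrays;

public class SafeCompareDemo {
    // Returns a sorted copy of the input, ordering elements with Integer.compare.
    static Integer[] sortSafely(Integer[] input) {
        Integer[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy, Integer::compare);
        return copy;
    }

    public static void main(String[] args) {
        // The pair that broke subtraction now gets the right sign.
        System.out.println(Integer.compare(Integer.MAX_VALUE, -1) > 0); // prints true

        Integer[] values = {Integer.MAX_VALUE, -1, 0, Integer.MIN_VALUE};
        System.out.println(Arrays.toString(sortSafely(values)));
        // prints [-2147483648, -1, 0, 2147483647]
    }
}
```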

This raises two questions: "Why did the original code use subtraction?" and "Wasn't this tested?"

Addressing the first question, I've seen overflows and underflows before, and even wrote about them in Space, Time, and Imagination Have No Boundaries, But Your unsigned char Does.  Evidently, ignoring boundary conditions is common.  Doing so is not an error of inexperience either, as I have noticed junior and senior developers alike make this very mistake.  Rather, this code can better be described as an affliction of micro-optimization.

Among some developers, there is an urge to make every line of code carry its own weight, where only the briefest, fastest code will do. In the end, the added performance is negligible, whereas the added risk moves one step closer to reality. Subtracting without looking is not too different from driving a car and changing lanes without looking.

Back when computers were slow and compilers immature, these micro-optimizations did matter somewhat. I've folded strings, unrolled loops, removed unneeded braces even though they provided legibility, and declared loop counters in the C language as register int i.  Today's compilers are robust, and render these measures unnecessary.  Meaningful optimization takes place at the program design level, and only after determining that the program runs correctly (see Dijkstra). 

Which brings us to the second question regarding testing. This bug only showed itself when the Microsoft Teams app was installed, and even then, only under rare circumstances. As a developer, it's fun to blame QA for missing the bug, but realistically, QA can't possibly test this with every third-party app in existence. Nothing would ever get released. No, the blame falls squarely on the shoulders of the developer, who should have been more defensive in his or her code.

Unit testing is a lot less expensive than QA integration testing, which in turn is a lot less expensive than finding a bug in production. First principles matter. I doubt I've seen the last of overflow and underflow errors, but resisting the call of micro-optimizations will go a long way toward producing more reliable programs.
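For what it's worth, a boundary-value check of this kind is small. The sketch below uses no test framework, and the class name, the BOUNDARY_VALUES array, and the hasCorrectSigns helper are all hypothetical; it simply asks whether a comparison function reports the right sign for every pair of edge-case ints.

```java
import java.util.function.IntBinaryOperator;

public class ComparatorBoundaryCheck {
    // Edge cases where int subtraction is most likely to wrap around.
    static final int[] BOUNDARY_VALUES = {Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE};

    // Returns true if cmp reports the correct sign for every pair of boundary values.
    static boolean hasCorrectSigns(IntBinaryOperator cmp) {
        for (int a : BOUNDARY_VALUES) {
            for (int b : BOUNDARY_VALUES) {
                int expected = (a < b) ? -1 : (a > b) ? 1 : 0;
                if (Integer.signum(cmp.applyAsInt(a, b)) != expected) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasCorrectSigns((a, b) -> a - b));  // prints false
        System.out.println(hasCorrectSigns(Integer::compare)); // prints true
    }
}
```

Five values and a nested loop are enough to flag the subtraction version, which is about as cheap as a test gets.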


