Gecko 1.9.0 Key Handling Postmortem

If you’ve participated in or followed Mozilla’s Firefox 3 development over the past month, you’re probably aware that a bunch of key handling issues came up at the very end of our development cycle. In the interest of giving others an opportunity to learn from our mistakes, and to communicate what happened, I’ve agreed to write up a summary of the development team’s postmortem discussion about our recent key issues.

“Key hell” started when I fixed bug 398514, a significant rewrite of our key event flow, and Masayuki Nakano started fixing a major bug concerning key command event mapping on international keyboard layouts (more on Masayuki’s fix later). Prior to my patch, we let Mac OS X match keyboard commands to native menu items and then executed the associated DOM command nodes based on the operating system’s match. The upside is that the operating system does all the mapping of key commands to their menu items; the downside is that nothing but the command node associated with the menu item ever sees the event.

After my patch, we ignored the operating system’s invocation of native menu items in favor of letting the key event flow normally through Gecko. This fixed a lot of important bugs; I can’t believe we got away with pigeonholing key equivalent events for so long. The problem is that we were now in charge of mapping key events to their commands. For US English this is a pretty simple mapping that works without special treatment, and I didn’t notice any problems because I use US English. When I tested other keyboard layouts, I tested text entry in text fields and didn’t test many keyboard commands. My bad #1, though at the time I wasn’t aware that key commands have complex mappings under certain keyboard layouts and circumstances.
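To make the layout problem a little more concrete, here is a minimal sketch in Python of why a command lookup that works on US English can go wrong on another layout. Everything in it is an illustrative assumption: the key codes, the two tiny layout tables, the command names, and the `command_for` helper are all hypothetical, and none of this is Gecko’s actual code or Masayuki’s fix.

```python
# Hypothetical layout tables: physical key position -> character produced.
# Only a few keys are listed; the names are made up for this example.
LAYOUTS = {
    "us": {"KeyA": "a", "KeyQ": "q", "KeyW": "w", "KeyZ": "z"},
    "french_azerty": {"KeyA": "q", "KeyQ": "a", "KeyW": "z", "KeyZ": "w"},
}

# Hypothetical command table keyed by the character of the shortcut.
SHORTCUTS = {"q": "quit", "w": "close_window", "a": "select_all"}


def command_for(layout, key_code):
    """Return the command for a key press, or None if nothing matches.

    The lookup goes through the active layout first, so the same
    physical key can resolve to different commands on different layouts.
    """
    char = LAYOUTS[layout].get(key_code)
    return SHORTCUTS.get(char)


if __name__ == "__main__":
    for layout in LAYOUTS:
        print(layout, "KeyQ ->", command_for(layout, "KeyQ"))
```

Even this toy version shows that testing only on US English says nothing about other layouts. The real problem is much harder, because some layouts produce characters that match no shortcut at all, or only produce them in combination with modifiers.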

Coincidentally, this problem came up for different reasons on Windows and Linux right before I exposed it on Mac OS X, and Masayuki had already started working on a fix for those platforms. By the time we had sorted through bug reports and figured out what was going on with non-US-English keyboard commands on Mac OS X, Masayuki was halfway to a fix. This ordeal would have been much worse without that stroke of good luck. Masayuki is a talented guy who understands far more about international keyboard layouts and input than I do, and I was pretty happy he was already on the problem.

Masayuki’s patch(es) attempted to solve a very complex problem. The problem has a huge number of edge cases under different keyboard layouts, and the process of finding and fixing cases that we hadn’t covered yet dragged out until today. I don’t think any particular engineer is to blame for this. The extended timeline for fixing regressions was the result of late detection and a lack of automated test coverage.

We should have discovered this problem much earlier than we did. I suspect that we finally discovered it as a result of heavy beta usage, especially among international users. User testing is great but it is a perk – not something to be relied upon. We also would have found out about this earlier had I committed my major patch for bug 398514 earlier. We had good reasons for making that change so late, but having good reasons doesn’t shield us from the consequences. At best it just makes the consequences easier to swallow.

The other major factor contributing to this debacle was that we had no automated tests until it was too late. With a decent test suite we could have found out about most of this much earlier. Even after we found out about the problem, it took us too long to get tests in place to help us avoid regressions while we worked on fixes. Eventually Roc wrote up a great test system (based on synthesized native events) and that has helped a lot. We’ve since added a bunch more tests to his original set and we’ll be adding many more. If you only take away one thing from all of this writing, I hope it is the value of tests: we could have saved a huge amount of time by getting those in place earlier.

There was one other last-minute problem with key handling that got confused with the situation I described above, though it is really a completely separate issue. Mac OS X sends key events into Cocoa apps via confusing and inconsistent paths: an event may arrive via performKeyEquivalent: or keyDown:, or both, in different orders; sometimes we don’t get key up events via keyUp: at all; and sometimes we get key up events but they come in via a second call to performKeyEquivalent:. What a mess, Apple! This became a last-minute problem for basically the same reasons as the other issues I described.

A big thank-you goes out to Robert O’Callahan, Karl Tomlinson, Matthew Gregan, and Masayuki for working so hard to get this situation under control over the past week. Everyone worked together really well, and it was some impressive teamwork.
