Captricity Engineering Insights: Pokemon Exception Handling
January 15, 2015 by Nick Jalbert
[Some functions and algorithms have been renamed to protect the innocent.]
Yesterday I was reminded once again of the dangers of what is commonly known as the Pokemon exception handling pattern: a catch-all handler that, true to the slogan, has to "catch 'em all."
In the morning, I noticed sporadic errors in our thumbnail system. We expect each image in our thumbnail system to have one thumbnail and for that thumbnail to be valid. However, I kept seeing multiple thumbnails that were sometimes valid and sometimes invalid.
After some exploratory debugging, I couldn't find any discernible pattern in the broken thumbnails. Given that our thumbnail generation code runs asynchronously and in parallel, I began to suspect a race condition caused by improperly synchronized atomic blocks.
That’s when the fun started! If you've debugged nondeterministic races before, you know it can take hours of trial and error and lots of hard staring at code to find a root cause. This is because reproducing races can be extremely difficult in a dev environment. Then, once you have a plausible root cause, it takes even more effort to write a regression test and fix the bug.
So I dug in. After some work, I narrowed it down to a small block of code in our thumbnail generation function which boiled down to something like this:
    try:
        thumbnail, created = Thumbnail.objects.get_or_create(...)
        ...
    except:
        # Generation failed, mark thumbnail as invalid
        ...
The race condition itself was a simple one: two simultaneous get_or_create calls were conflicting with each other, causing the system to sometimes create two Thumbnail objects.
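To make the interleaving concrete, here is a minimal sketch that replays it deterministically instead of relying on real threads. The in-memory `store` and the `find` and `create` helpers are hypothetical stand-ins for the Thumbnail table, not our actual code or Django's implementation.

```python
store = []  # hypothetical in-memory stand-in for the Thumbnail table


def find(image_id):
    """The "get" half: return every row recorded for this image."""
    return [row for row in store if row["image_id"] == image_id]


def create(image_id):
    """The "create" half: add a new row for this image."""
    row = {"image_id": image_id, "valid": True}
    store.append(row)
    return row


# Both workers run the "get" step before either runs the "create" step.
worker_a_found = find(1)
worker_b_found = find(1)

# Each worker saw nothing, so each one creates a row.
if not worker_a_found:
    create(1)
if not worker_b_found:
    create(1)

duplicates = find(1)
print(len(duplicates), "thumbnails for image 1")
```

Nothing here enforces uniqueness, which is why the second create succeeds. A uniqueness constraint at the database level is the usual guard against this kind of check-then-act race.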
This particular race condition is usually easy to catch, because Django will raise the MultipleObjectsReturned exception on the get_or_create call (since get only expects one object). However, we were catching all exceptions and silently ignoring them to handle errors in thumbnail generation (like an invalid initial image file). So, when something else broke in the try block, the exception was swallowed too, and the only visible symptom was unexpected thumbnails appearing in bad states.
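The sketch below shows the swallowing in isolation. It assumes a toy `get_or_create` over a list of dicts and a locally defined `MultipleObjectsReturned`; both are hypothetical stand-ins for the Django pieces, and `generate_pokemon` is an illustrative name rather than our real function.

```python
class MultipleObjectsReturned(Exception):
    """Stand-in for Django's exception of the same name."""


def get_or_create(store, image_id):
    """Toy get_or_create over a list of dicts; not Django's implementation."""
    matches = [row for row in store if row["image_id"] == image_id]
    if len(matches) > 1:
        raise MultipleObjectsReturned(
            f"{len(matches)} thumbnails for image {image_id}"
        )
    if matches:
        return matches[0], False
    row = {"image_id": image_id, "valid": True}
    store.append(row)
    return row, True


def generate_pokemon(store, image_id):
    """Catch-all version: every failure is treated as a bad image."""
    thumbnail = None
    try:
        thumbnail, created = get_or_create(store, image_id)
        # ... rendering would happen here ...
    except Exception:
        # Generation failed, mark thumbnail as invalid
        if thumbnail is not None:
            thumbnail["valid"] = False
        return "swallowed"
    return "ok"


# Aftermath of the race: two rows already exist for the same image.
store = [{"image_id": 1, "valid": True}, {"image_id": 1, "valid": True}]
result = generate_pokemon(store, 1)
print(result)
```

The duplicate-row error never reaches the caller or a log. The function reports a generic failure and the two rows stay in the store.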
While the catch-all pattern is generally frowned upon, it has its uses. For example, in a web server it is better to return a sensible error code than to fail in the middle of a request and return no response. Even in these cases, you should always record the actual exception, with its traceback, somewhere you can find it later (Sentry is a good service for tracking exceptions in Django).
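As a rough sketch of that legitimate use, the snippet below wraps a request handler in a catch-all that logs the full traceback before returning an error status. `handle_request` and `broken_handler` are hypothetical names for illustration, and the standard library `logging` module stands in for whatever tracking service you use.

```python
import logging

logger = logging.getLogger("thumbnails")


def handle_request(handler, request):
    """Run a handler; on any failure, log the traceback and return a 500."""
    try:
        return 200, handler(request)
    except Exception:
        # logger.exception records the message plus the active traceback.
        logger.exception("Unhandled error while serving %r", request)
        return 500, "Internal Server Error"


def broken_handler(request):
    """Hypothetical handler that always fails."""
    raise ValueError("boom")


status, body = handle_request(broken_handler, "/thumbnails/1")
print(status, body)
```

The caller still gets a response, and the original exception survives in the log rather than disappearing.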
In our case, we weren't even logging the exception, making it extremely difficult to debug. This catch-all usage was also unjustified since it ignored the built-in retry logic in our asynchronous work queues. Worst of all, it hid an actual bug in the system.
So, the next time you are tempted to use the catch-all Pokemon pattern, think really hard about the consequences. Catching only the specific exceptions you expect (like the ones that can be raised by the thumbnail generation code) usually makes debugging easier, because everything else still surfaces as an error.
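Here is a minimal sketch of the narrower approach. `InvalidImageError`, `render`, and `generate_thumbnail` are hypothetical names, and the dict stands in for a thumbnail record. Only the anticipated failure marks the thumbnail invalid.

```python
class InvalidImageError(Exception):
    """Hypothetical error raised when the source image cannot be used."""


def render(image_bytes):
    """Hypothetical renderer: rejects empty input, else returns fake bytes."""
    if not image_bytes:
        raise InvalidImageError("empty image")
    return image_bytes[:4]


def generate_thumbnail(record, image_bytes):
    """Mark the record invalid only for the failure we anticipated."""
    try:
        record["data"] = render(image_bytes)
    except InvalidImageError:
        # The one expected failure: mark thumbnail as invalid.
        record["valid"] = False
    else:
        record["valid"] = True
    return record


good = generate_thumbnail({}, b"\x89PNG....")
bad = generate_thumbnail({}, b"")

# An unexpected bug (here, a missing record) is not swallowed.
try:
    generate_thumbnail(None, b"\x89PNG....")
    unexpected_propagated = False
except TypeError:
    unexpected_propagated = True
```

An invalid image is handled quietly, while a programming error such as the missing record still raises and can be noticed, logged, or retried.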