A few hours ago, the automated email system for Code Jam went a little bit crazy. I'll write about the technical details below, but the short version is that it sent an email titled "Registration Now Open for Google Code Jam 2014!" more than 20 times to a large number of registrants from 2013. You're receiving this email because our systems indicate that you're one of them.
I'm writing this to apologize: we didn't intend to send anyone more than one email, but a bug crept in to a refactoring of our mail system and ruined those plans. On behalf of the Code Jam team and Google, I'm sorry; we'll work hard to make sure this doesn't happen again. Later in this email, I've included answers to some questions you might have. If you'd like to talk about this further, please send us an email at [email protected].
Sincerely, Bartholomew Furrow, on behalf of the Code Jam team.
We sent the notification email to users who registered for Code Jam 2013, checked the box saying "I would like to receive email notifying me about the next Code Jam.", and hadn't registered for 2014's Code Jam.
To automatically identify the users who should get an email based on Code Jam-specific properties (such as scores, booleans in user profiles, and registration statuses), and then to contact them, we decided it would be least error-prone (though not in this case!) to roll our own system that uses App Engine's mail api.
In the App Engine datastore, we have an object called a "notification". That notification has a "status" property, which takes on the following values: "Waiting", "Sending", and "Sent". We have a cron job called the MailCheckWorker that looks for "Waiting" notifications and starts sending them. Once all emails have been sent, the notification is marked "Sent". Do you see the bug?
The MailCheckWorker should start sending the notifications and mark them as "Sending". So a minute after we started sending the email to everyone for the first time, the MailCheckWorker started sending it to everyone for the second time. That process repeated several dozen times before we were alerted by a contestant, and manually stopped the system.
We had run smaller-scale tests of the mail system, but they all finished within a minute; so the MailCheckWorker just saw a "Sent" notification and didn't start it again.