Kailash Badu

Sep 22, 2011

The crunch time diary - when servers don't serve

Last week CloudFactory launched at TechCrunch Disrupt and took its first public step towards its vision of fighting poverty in developing nations through crowdsourcing. We have a lot to share about our launch.

While Mark and Tom were busy demoing our platform to the curious crowd at TechCrunch Disrupt, the CloudFactory team huddled together in our Kathmandu office. We knew that a lot of people would try CloudFactory after seeing our demo at Disrupt, and we wanted to make sure that our servers and applications were on their best behavior.

It was, of course, a little past midnight here in Kathmandu when Disrupt kicked off in San Francisco.

The CloudFactory team had been pulling all-nighters for the past month, chasing down every last bug. Of course, bugs have a way of creeping back into your code no matter how meticulous you are in getting rid of them. Whenever man comes up with a better mousetrap, nature immediately comes up with a better mouse, ya know?

CloudFactory runs on a set of Unicorn application servers, which handle each incoming request from Nginx (our HTTP server) in a separate worker process. At one point CloudFactory was processing a production run that required extracting contact information from images of the dozens of business cards that Disrupt visitors were uploading into our system. Suddenly, one of our Unicorn servers started misbehaving. We noticed that some of the HTTP requests were getting lost before they could complete, which left a few business cards unprocessed. We tried to find out whether there was anything weird about the requests themselves that could be crashing the worker. Nothing!

Bikash, our system admin, jumped into action, and it took him a while to figure out that the real culprit was a Unicorn setting that limited the time a worker could spend on a single request to 30 seconds. Our Unicorn servers are configured and optimized for situations that require CloudFactory to handle a large number of short, concurrent requests. Apparently, a few tasks were taking longer than usual and were consequently being killed abruptly by Unicorn after 30 seconds. He swiftly raised the limit to 60 seconds and the problem was solved.
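A minimal sketch of what such a change looks like in a Unicorn configuration file. Only the timeout values (30 and 60 seconds) come from the story above; the file name and the `worker_processes` value are illustrative assumptions, not CloudFactory's actual setup.

```ruby
# config/unicorn.rb (hypothetical file name, shown for illustration only)

# Number of worker processes forked by the Unicorn master.
# The value 4 is an assumption; the post does not say how many were used.
worker_processes 4

# Maximum number of seconds a worker may spend on a single request.
# A worker that exceeds this limit is killed and replaced by the master.
# The post describes raising this limit from 30 to 60 seconds.
timeout 60
```

A longer timeout lets slow requests finish, but it also means a stuck worker stays occupied for longer, so the limit is usually set only as high as the slowest legitimate request needs.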

The team breathed a sigh of relief when we realized, to our great surprise, that we had no major glitches after that. CloudFactory successfully digitized business cards for several dozen Disrupt visitors that day. Our team was delighted to see our product so well received by the many people who heard pitches and watched demos from Mark and Tom. Mission accomplished.

The team slept for 12 straight hours the next day.
