Digg4 is Alpha. Really Alpha. Don't worry, we got this.
Last night we invited in our biggest batch of V4 alpha users at around 6pm. 20,000 in total. Within 20 minutes, Digg V4 was unusable. Here's what happened:
- We have been working on an ongoing problem with our backend consumers not being load balanced properly across multiple RabbitMQ servers. We have been trying to get haproxy to handle this, and after numerous changes this week it seemed to work in our alpha testing. Once we tried it in production, it did not go so well, so we rolled the change back.
- We added 20,000 people to the whitelist. This doesn't mean the site can't work with 20,000 people. It means that the way the whitelist was implemented on the backend was extremely inefficient. Every time someone tried to log in, it pulled out 25,000+ sets of ids to see if that user was whitelisted. Combine that inefficiency with our coordinated email efforts, and we bombed things. We have corrected this now and it's working great!
- Lazyboy (the library we wrote to talk to Cassandra) has a hardcoded limit (max #) on how much data can be returned at any given time. This value is extremely high (100,000!). That's insane. This default actually hurt us with the whitelist; it should never have tried to fetch so much data at a time. The proper way to pull large sets of data from a distributed database is to paginate, which balances load and avoids overtaxing the resources of a single node.
- We saw that timeouts are still too high in some areas on the frontend and backend. If one particular call fails but the rest succeed, some pages on the frontend will throw an error or show nothing (even though most things came back just fine). We need to make changes so that when things do fail, we fail fast and still give users a limited experience.
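To make the whitelist and pagination points above a bit more concrete, here is a rough sketch in Python. It is not Lazyboy's API and not the actual fix we shipped. `FakeColumnStore`, `get_slice`, `contains`, `iter_whitelist`, `is_whitelisted` and the page size of 1,000 are all made-up names and values for illustration, with an in-memory list standing in for Cassandra. The idea is simply: check one user with a single point lookup, and when you really do need the whole list, walk it in small pages instead of asking for everything at once.

```python
import bisect
from typing import Iterable, Iterator, List, Optional

PAGE_SIZE = 1000  # assumed page size for illustration, far below a 100,000 default


class FakeColumnStore:
    """Hypothetical in-memory stand-in for a row of whitelisted user ids."""

    def __init__(self, user_ids: Iterable[int]) -> None:
        self._ids = sorted(set(user_ids))
        self._lookup = set(self._ids)

    def get_slice(self, start: Optional[int], count: int) -> List[int]:
        """Return up to `count` ids strictly greater than `start`, in sorted order."""
        index = 0 if start is None else bisect.bisect_right(self._ids, start)
        return self._ids[index:index + count]

    def contains(self, user_id: int) -> bool:
        """Point lookup: ask about one id instead of fetching the whole list."""
        return user_id in self._lookup


def iter_whitelist(store: FakeColumnStore, page_size: int = PAGE_SIZE) -> Iterator[int]:
    """Walk the whole whitelist one bounded page at a time."""
    last_seen: Optional[int] = None
    while True:
        page = store.get_slice(last_seen, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        last_seen = page[-1]  # resume after the last id we have seen


def is_whitelisted(store: FakeColumnStore, user_id: int) -> bool:
    """Login-time check: a single lookup, no full scan of the whitelist."""
    return store.contains(user_id)
```

With something like this, a login only ever costs one small read, and a job that needs the full list never asks a single node for more than `page_size` items per request.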
It's an exciting time to be at Digg right now. We will keep you posted on the challenges we encounter over the next few weeks as we continue to roll out V4.