The Digg Blog
Rasmus Lerdorf - PHP Performance
Last night the father of PHP, Rasmus Lerdorf, presented at the Digg office on PHP performance optimization as part of the Engineer-to-Engineer: San Francisco Tech Talks.

After a quick overview of some of the new features in PHP 5.3, he gives us a detailed and practical look into PHP performance pitfalls. He walks us step by step through the tools and techniques to optimize an application, showing us some remarkable improvements in the performance and throughput of a WordPress installation. You can view the slides online here.
Latest Update to Digg's iPhone App
We’re excited to announce that we have launched an update to the Digg iPhone App.
After receiving lots of great user feedback, primarily around stability, we’ve been working hard to address your concerns and improve the App. We’re happy to report that version 1.1.2 is iOS 4 compatible, allows for easier sharing via Twitter and Facebook and, most importantly, is stable!
We’ve been testing and using the App internally for a few weeks now and hope you’ll enjoy this new version as much as we have. We owe a huge thank you to the Digg community for bearing with us as we make improvements; we really appreciate your patience and interest.
We’re excited to hear what you think, so download the iPhone App here and keep the feedback coming!
- Corey Johnson, Peifeng Kuang and Natasha Prasad
Key Changes
We’ve made a number of stability changes, including resolving crashes around:
- Logging in to, or changing accounts on, Facebook or Twitter
- Sharing stories through Facebook or Twitter without first logging in to that service
- Digging a story without first logging in to your account
- Running a search for a random string with no results
- Navigating away from a story while it is loading
- Saving a story or trying to access saved stories
- Loading more user comments related to a story
Continuous Deployment, Code Review and Pre-Tested Commits on Digg4
One of the exciting things, from a development perspective, about Digg4 is continuous deployment - when developers fix a bug or add a new feature, there's no need to wait for a scheduled release. Instead, the change can go live right away. This is great - the turnaround time for a change drops dramatically. But this also opens up the possibility of broken changes going live, since there won't be manual testing and signoffs before the changes go live. Figuring out how to balance the speed and agility of continuous deployment with the requirements for stability and reliability has been, and continues to be, a major challenge for us here. Over the last couple of months, we've rolled out a workflow which we believe helps achieve that balance without sacrificing too much stability or agility.
There are a few pieces to the workflow we've put in place. First, there are the tools we were already using before moving to continuous deployment: git for source control, Hudson for continuous integration and build management, Selenium for UI testing (via the ridiculously handy Sauce OnDemand service from Sauce Labs), and Puppet for managing what's running on our servers. The major addition to that stack has been Gerrit, a code review system for git, originating with Google's Android development group. Mandatory code review for all changes can seem like a pain, but there's nothing like having another developer actually look at your code to find the sort of design snafus automated testing may not catch.
But that's not all we use Gerrit for. Alongside code review, Gerrit by default also provides a "Verified" flag. This is meant to be used to record that the change in question builds and tests properly. By combining this with Hudson, we are able to achieve pre-tested commits: no change can make it to our master branch without passing a battery of unit tests, successfully packaging into a Debian package, and in the case of our frontend code, installing on a test box and passing a battery of Selenium smoke tests. In order to make this work, we rely on the fantastic Gerrit Trigger plugin for Hudson, written and maintained by Robert Sandell and the rest of the Hudson team at Sony Ericsson.
The Gerrit Trigger plugin takes advantage of Gerrit's event streaming functionality to listen for the creation of new patchsets. Whenever it sees a new patchset come down the stream, it checks whether any Hudson projects are configured to build the Gerrit project the patchset belongs to, and if so, it launches a build of that Hudson project, making sure that build pulls down the change we want to verify. When all running builds for a patchset finish, the Gerrit Trigger listener will send a message back to Gerrit with the result. The exact commands, values, etc are configurable - in our workflow, we just use the Verified flag, with a -1 when the build fails and a +1 when it passes. We've also made a minor tweak to the Gerrit Trigger plugin to allow for rebuilding a patchset if the build failed for reasons other than legitimate test failures - the team at Sony Ericsson has a more formal solution for this coming in a future release of the plugin.
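The voting rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as we use it (wait for every build on a patchset, then report -1 or +1), not the Gerrit Trigger plugin's actual code; the `BuildResult` enum and the `verified_vote` function are hypothetical names made up for this sketch.

```python
from enum import Enum
from typing import Iterable, Optional


class BuildResult(Enum):
    """Hypothetical build states for one Hudson build of a patchset."""
    RUNNING = "running"
    SUCCESS = "success"
    FAILURE = "failure"


def verified_vote(results: Iterable[BuildResult]) -> Optional[int]:
    """Return the Verified vote for a patchset, or None if it is too early to vote."""
    results = list(results)
    if not results or BuildResult.RUNNING in results:
        # Nothing to report until every triggered build has finished.
        return None
    # One failing build is enough to block the change.
    if all(r is BuildResult.SUCCESS for r in results):
        return 1
    return -1
```

Holding the vote back until all builds finish means a patchset never shows a +1 from a fast unit test build while a slower Selenium build is still running.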
Why do we only run a subset of unit tests and Selenium tests in our pre-tested commit builds? First and foremost, due to time - we want the pre-tested commit process to be as fast as possible, since every version of every change will go through it. The next step in the continuous deployment process will run less frequently, so it can take a little longer and be more stringent.
Once a change has been code reviewed by another developer and verified by a Hudson build, it can be merged down to the master branch. When a change hits the master branch, a full build is run, with the full set of unit tests run against the code. Next, the code is packaged and installed onto an internal staging environment, and the full set of Selenium tests is run against that environment. If all those tests pass successfully, we deploy the tested code to production.
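As a rough sketch, the post-merge flow is a chain of gates where each stage runs only if the one before it passed. The Python below is an illustration under that assumption and not our Hudson configuration; `run_pipeline` and the stage names are hypothetical, and each stage is reduced to a callable that returns True on success.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (deployable, names of the stages that actually ran).
    """
    ran: List[str] = []
    for name, check in stages:
        ran.append(name)
        if not check():
            # A failed gate stops everything after it, including deployment.
            return False, ran
    return True, ran


# Example wiring with placeholder checks (assumed stage names).
stages = [
    ("full unit tests", lambda: True),
    ("package", lambda: True),
    ("install on staging", lambda: True),
    ("full Selenium suite", lambda: True),
]
deployable, ran = run_pipeline(stages)
```

Only when `deployable` is True would the production deploy step be kicked off.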
Now, is this perfect? Most definitely not - we end up having to manually rebuild more pre-tested commit builds than we'd like, due to instability in some of our tests or external factors, and there are plenty of other areas that could be tweaked to make the process smoother or more reliable. But overall, we've achieved quite a lot here - no change is going to make it onto our master branch without being reviewed by another developer, and being built to verify that tests covering that change pass. No change is going to make it onto our internal site without passing a full battery of unit tests. And nothing is going to go on from our internal site to production without passing an additional set of Selenium tests. Does this mean we're never going to introduce bugs to our live site? Of course not - but we're going to keep the number of bugs that hit the live site to a minimum, and we've made it easy and fast to get bug fixes live as well.
Best of Digg: June Comments
In addition to presenting the most talked-about news of the day, Digg allows users to discover and comment on the sometimes beautiful and sometimes ridiculous images that fly across the web.
To honor the popularity of image submissions (they often dominate the weekly Digg Digest emails, which you can sign up for here), we’re spotlighting only comments related to images in this month’s look at Digg comment lulz. All but one (a personal favorite, can’t help myself) received more than 1,000 comment Diggs. You really need to check out the submissions themselves for these to click, but the effort is worth it.
To start, a question: When did Sarah Jessica Parker jump the horse? Maybe she was never considered “hot,” but didn’t she spend a large portion of her adult film career playing “cute” or “attractive” women? Maybe I’m just crazy. But there’s one thing I do know: Give an (anonymous, male) Internet commenter a horse and it’s only a matter of time until he gives you an SJP joke. Digg does not disappoint.

Give it up for another great, unexpected zing. As user dscan states in his praise of the commenter, “You commented with something witty that the Digg community wasn’t expecting. Touche.” That’s how it works, folks.

Want another example of a user coming up with something surprising? As the follow-up commenter so accurately and succinctly puts it, “You win.”

Digg users often complain about having to click 10 or 15 (or more) “next” arrows to view an entire post. Some even take it upon themselves to list the contents in one uninterrupted block of text. But this might be the first time someone has translated a photo slideshow into text. Bring on the emoticons!
Digg loves The Oatmeal. Digg really loves irony. Bring both together and you get some excellent comments. Here’s one tasty contribution (with proper attribution to boot).

Honorable mention goes to this comment, which not only gets Governator points (we love us some Arnold) but also provides an excuse to bring up a fantastic photo submission (and remind everyone that BP blows). And we’d be remiss not to give a nod to the World Cup fever that has taken over Digg this month (¡viva España!).
Once again, a million thanks to the Digg community for keeping the discussions lively and fun. Holler if you have any comments for us, either via our handy contact form or through Twitter @digg_community.
Hasta pronto,
T.J.
Digg4 is Alpha. Really Alpha. Don't worry, we got this.
Last night we invited in our biggest batch of V4 alpha users at around 6pm. 20,000 in total. Within 20 minutes, Digg V4 was unusable. Here's what happened:
- We have been working on an ongoing problem with our backend consumers not being load balanced properly across multiple RabbitMQ servers. We have been trying to get haproxy to make this happen, and in our testing it seemed to work after numerous changes this week in alpha. Once we tried it on production, it did not work so well. We rolled this change back.
- We added 20,000 people to the whitelist. Now this doesn't mean the site can't work with 20,000 people. It means we added a list of people to our whitelist, and the way the whitelist was implemented on the backend was extremely inefficient. Every time someone tried to log in, it pulled out 25,000+ sets of ids to see whether that user was whitelisted. Combine that inefficiency with our coordinated email efforts, and we bombed things. We have corrected this now and it's working great!
- Lazyboy (the library we wrote to talk to Cassandra) has a hardcoded limit on the maximum amount of data that can be returned at any given time. This value is extremely high (100,000!). That's insane. This default actually hurt us with the whitelist: it should never have tried to fetch so much data at a time. The proper way to pull large sets of data from a distributed database is to paginate (this helps with load balancing and avoids overtaxing the system resources of a single node).
- We saw that timeouts are still too high in some areas on the frontend and backend. If one particular call fails but the rest succeed, some pages on the frontend will throw an error or show nothing (even though most things came back just fine). We will need to make changes so that if things do fail, we fail fast and try to give users a limited experience.
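To make the pagination point concrete, here is a minimal Python sketch. It does not use Lazyboy or Cassandra: `FakeStore` is a hypothetical in-memory stand-in for a key-ordered store, and `get_slice` and `iter_ids` are names invented for this example. The idea it shows is simply to ask for a bounded page, remember the last key seen, and ask for the next page from there.

```python
from bisect import bisect_right
from typing import Iterable, Iterator, List, Optional


class FakeStore:
    """Hypothetical in-memory stand-in for a key-ordered data store."""

    def __init__(self, ids: Iterable[int]) -> None:
        self._ids = sorted(set(ids))
        self.calls = 0  # how many round trips were made

    def get_slice(self, start_after: Optional[int], limit: int) -> List[int]:
        """Return up to `limit` ids strictly greater than `start_after`."""
        self.calls += 1
        begin = 0 if start_after is None else bisect_right(self._ids, start_after)
        return self._ids[begin:begin + limit]


def iter_ids(store: FakeStore, page_size: int = 1000) -> Iterator[int]:
    """Yield every id while never requesting more than `page_size` at once."""
    last: Optional[int] = None
    while True:
        page = store.get_slice(last, page_size)
        if not page:
            return
        yield from page
        last = page[-1]
        if len(page) < page_size:
            # A short page means the end of the data was reached.
            return


store = FakeStore(range(2500))
all_ids = list(iter_ids(store, page_size=1000))
```

Each request stays small no matter how large the list grows. For a yes/no question such as "is this user whitelisted?", a lookup on that single user's key avoids reading the list at all.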
It's an exciting time to be at Digg right now. We will keep you posted on the challenges we encounter over the next few weeks as we continue to roll out V4.
Tom Conrad - Navigating the Smartphone Seas
Interested in why Tom Conrad, the CTO of the Pandora music recommendation service, said “I need Android like I need a hole in the head”? He stopped by the Digg office last week as part of the Engineer-to-Engineer: San Francisco Tech Talks to talk about how a focus on mobile applications helped transform Pandora into the enormously popular service that many of us know and love.
Digg Technical Talks - Kohsuke Kawaguchi
At Digg, releasing code quickly is very important. We do parallel development and testing, automated unit and functional tests, and continuous integration and deployment. We use Hudson to tie everything together. This week, we had the creator of Hudson, Kohsuke Kawaguchi, speak to our engineering team about the current state of Hudson and what we can look forward to down the road.
His comments about Selenium and Hudson are of particular interest to our QA team. There are all kinds of integration possibilities - from custom reports that include embedded Sauce Labs video results to automatically establishing connections between our environments, there are lots of ways to make tests run more often and more quickly through Hudson.
Slides are available here. Really great talk.
Digg Technical Talks - Jeff Hammerbacher
Jeff Hammerbacher stopped by Digg yesterday to talk about analytical data platforms. In his talk he describes what these platforms are and why you should care. He updates us on what's happening in that space and illustrates how Cloudera is building such a platform around Hadoop and HDFS.
Jeff Hammerbacher is a founder and the Vice President of Products of Cloudera. Jeff was an Entrepreneur in Residence at Accel Partners immediately prior to Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the applications of statistics and machine learning at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The Data team produced open source projects such as Hive and Cassandra and their work was recognized at conferences such as CHI, ICWSM, SIGMOD, and VLDB. Before joining Facebook, Jeff was a quantitative analyst on Wall Street. Jeff earned his Bachelor's Degree in Mathematics from Harvard University and recently served as a Contributing Editor for O'Reilly's "Beautiful Data".
New Digg Drupal Modules
Here at Digg, we love open source. Many of our engineers contribute back to projects like Cassandra, Puppet and Hudson, and for the past couple of months we have been doubling down on our use of Drupal. Digg About and the forthcoming developers site are built upon Drupal. Drupal was chosen for its flexibility and active community, which allow us to build sites quickly. We are confident that Drupal will continue to be a force in the future and are eagerly awaiting the release of Drupal 7. To that end, we have decided to release a couple of brand new modules for Drupal.
- Diggable module is written and maintained by Digg engineers and is the official way to integrate your Drupal site with Digg.
- Digg Login is a module that gives your site's users the ability to log into your Drupal site using their Digg user name and password over OAuth. It is built using the following PEAR packages, which are written and maintained by our own engineers:
  - http://pear.php.net/package/Services_Digg2
  - http://pear.php.net/package/HTTP_Request2/
  - http://pear.php.net/package/HTTP_OAuth/
The advantage of the Digg Login module is that your users do not have to maintain separate passwords, giving them another easy way to log into your site. It is written to ensure that any customizations added to the user registration form are preserved and can be filled out by your new users. All it takes to set up is to download the PEAR packages and set up a Digg Application.
For more on how we use open source at Digg, visit our Open Source page.
Speeding Along
Digg's engineering team is committed to a lightning fast website. We've recently released our widget and button rewrite and we've been investing heavily in our V4 infrastructure rewrite.
In February of this year, Steve Souders wrote an article describing the relatively poor performance of the Digg widget. The story was recently picked up by another publication, but fear not, we fixed this in March.
The March release tackled both performance and reliability of our widgets and buttons. We made quite a few changes. We now serve nearly all static content from a CDN. We use non-blocking JavaScript and we no longer use iframes.
Our asynchronous widgets and buttons are now failsafe: even if Digg.com or our CDN has availability problems, publisher pages still load quickly and the Digg items gracefully degrade.
They are fast too. Buttons now load in about 20ms with a warm cache, and in about 100-125ms with a cold cache. All our buttons were automatically upgraded in March.
Widgets are fast too: 30ms with a warm cache and 100-250ms with a cold cache.
Both our widgets and buttons score highly when tested with Google Page Speed and Yahoo! YSlow.
Watch this space for performance-related updates.
John Quinn. VP Engineering. (Digg: doofdoofsf, Twitter: doofdoofsf)






