
Digg: Dupe Detection Updates Are Here

Hi all,

After much anticipation, we are finally releasing several major updates to our dupe detection technology and content submission process that should go a long way toward eliminating duplicate submissions.

To better understand the nature of the problem, we analyzed the types of duplicate stories being submitted. The most common type is the same story from the same site, submitted under different URLs. Our R&D team came up with a solution that identifies these duplicates by using a document similarity algorithm. Look for a separate tech blog post on how this works, but it has proven to be a reliable way of identifying identical content from the same source.
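To give a rough feel for what document similarity can look like, here is a minimal sketch in Python. It is not the algorithm our R&D team built (that is for the tech blog post); it assumes a common textbook approach, word shingles compared with Jaccard similarity. The function names, the shingle size of 3, and the 0.8 threshold are all illustrative assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0  # assumption: two empty documents are not treated as dupes
    return len(a & b) / len(a | b)


def looks_like_same_story(text_a, text_b, threshold=0.8):
    """Hypothetical check: True when the two page bodies are near-identical."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold
```

Two copies of the same article fetched from different URLs would share nearly all of their shingles and score close to 1.0, while unrelated articles would score close to 0.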

Another common type of duplicate is the same (or similar) story covered on different sites. Because this enters more subjective territory, we focused on doing a better job of detecting dupes with similar descriptive information. By leveraging Digg's improved search technology, released a couple of months ago, we now match stories with similar titles and descriptions with much higher accuracy than before.
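As an illustration of matching on titles and descriptions, the sketch below ranks existing stories by cosine similarity over simple word counts. This is only a stand-in: the real matching runs on Digg's search technology, whose internals are not described here. The names `term_counts`, `cosine`, and `rank_candidates`, and the story dictionaries with `title` and `description` keys, are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count each word."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(title, description, existing, top_n=3):
    """Return up to top_n (score, title) pairs, most similar first."""
    query = term_counts(title + " " + description)
    scored = [
        (cosine(query, term_counts(s["title"] + " " + s["description"])), s["title"])
        for s in existing
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

A new submission titled "Apple unveils new iPhone" would rank an existing "Apple announces new iPhone" story at the top, which is the kind of candidate a submitter would be asked to review.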

Most importantly, we made changes to an often cumbersome submission process. We moved the duplicate check immediately after the URL entry, *before* we ask for descriptive information. This eliminates the need to describe your submission before checking for dupes. In addition, the lag time from when a story is submitted to when it's available in our duplicate checker is now a few seconds. These changes may take some time to adjust to, but we anticipate that they'll help to eliminate dupes and encourage folks to Digg previously submitted content.

While we pilot the new dupe detection system, we will continue to block only submissions of the exact same URL within a 30-day period. We'll also be monitoring when certain Diggers choose to bypass high-confidence duplicates and will use this data to continue to improve the process going forward. As always, please share any feedback you have on these updates.
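For the curious, the exact-URL rule can be sketched in a few lines of Python. This is an illustration only, not our production code: the `is_blocked` function, the in-memory dictionary of past submissions, and the choice to stop blocking at exactly 30 days are assumptions.

```python
from datetime import datetime, timedelta

# The 30-day period comes from the rule described above.
BLOCK_WINDOW = timedelta(days=30)


def is_blocked(url, now, submissions):
    """Return True if this exact URL was submitted within the last 30 days.

    submissions is a hypothetical dict mapping a URL string to the
    datetime of its most recent submission. The comparison is an exact
    string match, so a different URL for the same story is not blocked.
    """
    last = submissions.get(url)
    return last is not None and now - last < BLOCK_WINDOW
```

Anything that is not an exact URL match falls through to the similarity checks, where the submitter can still choose to bypass a suggested duplicate.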

Cheers,
Chris