How did they do it?
- Used Analytics to eliminate more than 500,000 of one million documents from review
Wilks, Lukoff, and Bracegirdle, LLC handled a class-action lawsuit involving more than 600,000 records. Over ten months, the records were produced to the firm, and first-pass reviewers worked around the clock to categorize incoming production documents and sort out irrelevant data.
As the case developed, the U.S. Government became involved in a separate but related matter and subpoenaed a copy of the productions from the various producing parties. The government renumbered about 300,000 of the 600,000 records with unique identifiers and reproduced them to the law firm in that related matter. No cross-reference was available, and there was insufficient metadata to de-duplicate across the two data sets. Wilks, Lukoff, and Bracegirdle engaged DLS Discovery to help organize and de-duplicate the data.
DLS Discovery’s challenge was not just to identify the duplicates. They also needed to carry over the first-pass issue coding the firm’s team had completed during the 10 months prior to the duplicate production, eliminating the expense of reviewing the same documents again.
“Both parties knew the data set well and agreed there was a high number of textually similar documents,” said Andrew. “To stay within the projected timeframe and budget, we knew near-duplicate identification in Analytics would be the most efficient tool for the job.”
Analytics in Action
Within two days, DLS loaded all the data—nearly 1 million documents—into Relativity and used Analytics to identify near-duplicates. They set the minimum similarity percentage to 99 percent to ensure Analytics would only mark a document as a duplicate if it matched at least 99 percent of the principal, or example, document.
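The grouping logic described above can be sketched in a few lines of Python. This is a toy illustration of the concept, not Relativity's implementation (which is proprietary and far more scalable); it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, with the 99 percent minimum from the case study as the threshold.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.99  # the 99% minimum similarity used in the case study


def near_duplicate_groups(docs):
    """Greedy near-duplicate grouping sketch.

    docs: iterable of (doc_id, text) pairs.
    Each document joins the first group whose principal (example)
    document it matches at or above the threshold; otherwise it
    starts a new group with itself as the principal.
    Returns a list of (principal_text, [doc_ids]) groups.
    """
    groups = []
    for doc_id, text in docs:
        for principal, members in groups:
            if SequenceMatcher(None, principal, text).ratio() >= SIMILARITY_THRESHOLD:
                members.append(doc_id)
                break
        else:
            groups.append((text, [doc_id]))
    return groups
```

At a 0.99 threshold, only documents that are textually almost identical to a group's principal are folded in, which mirrors the conservative setting the team chose.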
The team ran near-duplicate identification overnight and by morning had their results—Relativity identified nearly 200,000 duplicate groups containing about 500,000 documents of the total population.
DLS next determined whether any of the issue coding decisions from the firm’s first-pass review—performed months earlier—could be replicated to the government’s production, giving the team a head start on the issues coded on each record.
“If we had five records with the same duplicate group ID, then all five of those records were near duplicates,” said Andrew. “So, if one of those records had already been issue coded and the remaining four had not, we replicated that coding decision to those four documents.”
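The replication step Andrew describes can be sketched as a simple two-pass routine. This is an illustrative assumption of how such a propagation might look, not DLS's actual workflow; the `group_id`, `issue_code`, and `coding_source` field names are hypothetical.

```python
def replicate_coding(records):
    """Propagate an existing coding decision to uncoded members
    of the same near-duplicate group.

    records: list of dicts with 'group_id' and 'issue_code'
    (issue_code is None for uncoded records). Mutates records
    in place and returns the number of replicated decisions.
    """
    # Pass 1: remember the first coding decision seen per group.
    coded = {}
    for r in records:
        if r["issue_code"] is not None and r["group_id"] not in coded:
            coded[r["group_id"]] = r["issue_code"]

    # Pass 2: copy that decision to uncoded members of the group,
    # tagging them so QC can target replicated decisions later.
    replicated = 0
    for r in records:
        if r["issue_code"] is None and r["group_id"] in coded:
            r["issue_code"] = coded[r["group_id"]]
            r["coding_source"] = "replicated"
            replicated += 1
    return replicated
```

In Andrew's five-record example, one coded record per group is enough to fill in the remaining four.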
They then used random sampling to QC the workflow.
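A minimal sketch of that QC step, assuming replicated records are tagged with a hypothetical `coding_source` field: draw a simple random sample of the replicated decisions for a reviewer to spot-check. The sample size and seed are illustrative parameters, not figures from the case study.

```python
import random


def qc_sample(records, sample_size, seed=42):
    """Return a simple random sample of replicated coding
    decisions for manual spot-checking.

    records: list of dicts; those with coding_source == "replicated"
    form the sampling pool. A fixed seed keeps the sample
    reproducible for audit purposes.
    """
    pool = [r for r in records if r.get("coding_source") == "replicated"]
    rng = random.Random(seed)
    return rng.sample(pool, min(sample_size, len(pool)))
```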
In the end, DLS eliminated more than 50 percent of the total data set from review and successfully replicated coding from the first-pass review to approximately 153,000 documents in the government agency’s production of duplicates. In total, this saved the firm an estimated 1,150 hours of review time.
“We proved the law firm’s observation that a large portion of the records in the second production set from the U.S. Government were duplicates of the first production set,” said Andrew. “Analytics eliminated the substantial time and cost associated with reviewing the same records a second time.”
| Key Project Stats | |
| --- | --- |
| Total documents analyzed by near-duplicate detection in Analytics | ~1 million |
| Minimum similarity percentage | 99% |
| Total documents identified as duplicates | ~500,000 |
| Review hours saved | ~1,150 |