This post was originally published on D4's Discover More blog last week. We found it an informative introduction to analytics workflows for teams looking to speed up their review of small cases, or to begin embracing text analytics on more manageable projects, so we're sharing it here.
While it is true that predictive coding and technology-assisted review are better suited to cases with more than 100,000 document records, there is a common misperception that only large and very large cases can benefit from analytics. A variety of tools and workflows fall under the umbrella term “analytics.” These include, but are not limited to, email threading, inclusive email review, near-duplication, clustering, categorization, and conceptual search. Depending on the data set and the goals of the review, small cases may see greater benefits from analytics than large, multi-custodian, big data cases.
Here are five analytics workflows that will reduce, or possibly even eliminate, document review in the typical case under 100,000 records.
1. Email Threading and Inclusive Review
Email threading is one of the most commonly used tools in the analytics toolkit, but too often people do not consider it for small-volume or single-mailbox email review. Regardless of the amount of email, organizing any review by email thread improves both the speed and the consistency of reviewers. Limiting the review to just the inclusive emails (emails that represent the end of a conversation, or those forwarding an attachment that was left off a later reply) can reduce the review by as much as 30 percent. For a review of 3,000 emails, even a 20 percent reduction would save 600 documents, or roughly one day of review. Email threading can and should be applied to every case in which email will be reviewed.
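To make the definition of an inclusive email concrete, here is a minimal Python sketch. It assumes each email record carries a hypothetical `parent_id` (the message it replies to or forwards) and a set of attachment names. Real threading engines reconstruct threads from message text and metadata, so treat this as an illustration of the rule described above, not any review platform's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Email:
    doc_id: str
    parent_id: Optional[str]  # hypothetical field: the email this one replies to or forwards
    attachments: frozenset = frozenset()


def inclusive_emails(emails: List[Email]) -> List[str]:
    """Return the doc_ids a reviewer needs to read under the simplified inclusive rule."""
    # Index each email's direct replies/forwards by the parent's doc_id.
    children: Dict[str, List[Email]] = {}
    for email in emails:
        if email.parent_id is not None:
            children.setdefault(email.parent_id, []).append(email)

    inclusive = []
    for email in emails:
        replies = children.get(email.doc_id, [])
        if not replies:
            # Nothing replies to this email, so it ends its branch of the conversation.
            inclusive.append(email.doc_id)
            continue
        carried_forward = frozenset().union(*(r.attachments for r in replies))
        if email.attachments - carried_forward:
            # An attachment on this email was left off every later reply.
            inclusive.append(email.doc_id)
    return inclusive


# Usage: A carries a contract, B replies without it, C replies to B.
thread = [
    Email("A", None, frozenset({"contract.pdf"})),
    Email("B", "A"),
    Email("C", "B"),
]
to_review = inclusive_emails(thread)
print(to_review)  # ['A', 'C']
```

Only A and C need to be read: C contains the full conversation text, and A is kept because its attachment would otherwise never be seen. B is fully contained in C and can be skipped.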
2. Near-duplication Comparison of Separate Collections
Data typically comes in on a rolling basis, and in smaller cases it is likely that one set of data will be completely reviewed before the second wave arrives. While it may be difficult to find time to review the second set, the good news is that near-duplication lets you leverage the coding from your first review to supplement the coding of the second. Near-duplication compares the text of documents and groups textually similar documents together. By identifying the near-duplicate groups that contain documents from both collections, you can simply and quickly pass the coding from the first set to the second with the click of a button.
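The idea behind passing coding across collections can be sketched in a few lines of Python. This is a minimal illustration, not how any particular near-duplicate engine works: it assumes character 5-gram shingles, Jaccard similarity, and a hypothetical 0.8 threshold, and it suggests the code of the most similar already-reviewed document for each new one.

```python
from typing import Dict, Set, Tuple


def shingles(text: str, k: int = 5) -> Set[str]:
    # Character k-grams over lowercased, whitespace-normalized text.
    normalized = " ".join(text.lower().split())
    if len(normalized) < k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def propagate_coding(
    reviewed: Dict[str, Tuple[str, str]],  # doc_id -> (text, code) from the first wave
    incoming: Dict[str, str],              # doc_id -> text from the second wave
    threshold: float = 0.8,                # hypothetical similarity cutoff
) -> Dict[str, str]:
    """Suggest a code for each incoming doc whose best match in the reviewed set meets the threshold."""
    reviewed_shingles = [(shingles(text), code) for text, code in reviewed.values()]
    suggestions = {}
    for doc_id, text in incoming.items():
        doc_shingles = shingles(text)
        best_score, best_code = 0.0, None
        for prior_shingles, code in reviewed_shingles:
            score = jaccard(doc_shingles, prior_shingles)
            if score > best_score:
                best_score, best_code = score, code
        if best_code is not None and best_score >= threshold:
            suggestions[doc_id] = best_code
    return suggestions


# Usage
first_wave = {
    "W1-001": ("Please review the attached supply agreement before Friday's call.", "Responsive"),
    "W1-002": ("Lunch menu for the office party next week.", "Not Responsive"),
}
second_wave = {
    "W2-001": "Please review the attached supply agreement before Friday's call!",
    "W2-002": "Quarterly parking garage maintenance schedule.",
}
suggested = propagate_coding(first_wave, second_wave)
print(suggested)  # {'W2-001': 'Responsive'}
```

Documents with no close match (W2-002 here) get no suggestion and stay in the queue for normal review; in practice the suggested codes would still be spot-checked.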
3. Clustering Each Custodian Separately
With smaller datasets, clustering can be a powerful workflow. When clustering is applied to a smaller dataset, such as a single custodian's documents, there are fewer files to skew the automated process of grouping conceptually similar documents together. Small document clusters make it simple to prioritize (or deprioritize) specific sets of documents for review. With clustering, large chunks of data can be quickly evaluated, and possibly eliminated, before any review is performed, all with little effort from the review team.
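As a rough illustration of clustering each custodian separately, the Python sketch below groups documents by custodian first and then clusters within each group. Review platforms use their own conceptual clustering engines; this sketch swaps in a simple single-pass "leader" clustering on word counts with cosine similarity, and the 0.5 threshold is a hypothetical value.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a stand-in for a real conceptual index.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs: List[Tuple[str, str]], threshold: float = 0.5) -> List[List[str]]:
    """Single pass: join the most similar existing cluster if close enough, else start a new one."""
    clusters: List[Tuple[Counter, List[str]]] = []  # (leader vector, member doc_ids)
    for doc_id, text in docs:
        vec = vectorize(text)
        best = max(clusters, key=lambda c: cosine(vec, c[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= threshold:
            best[1].append(doc_id)
        else:
            clusters.append((vec, [doc_id]))
    return [members for _, members in clusters]


def cluster_by_custodian(docs: List[Tuple[str, str, str]], threshold: float = 0.5) -> Dict[str, List[List[str]]]:
    """docs: (custodian, doc_id, text). Each custodian's files are clustered on their own."""
    by_custodian: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for custodian, doc_id, text in docs:
        by_custodian[custodian].append((doc_id, text))
    return {custodian: leader_cluster(items, threshold) for custodian, items in by_custodian.items()}


# Usage
docs = [
    ("Smith", "S1", "pricing proposal for the widget contract"),
    ("Smith", "S2", "revised pricing proposal for widget contract"),
    ("Smith", "S3", "fantasy football league standings"),
    ("Jones", "J1", "fantasy football league draft order"),
]
result = cluster_by_custodian(docs)
print(result)  # {'Smith': [['S1', 'S2'], ['S3']], 'Jones': [['J1']]}
```

Because each custodian is clustered separately, Smith's small "fantasy football" cluster can be deprioritized at a glance without being blended into a larger, mixed cluster from other mailboxes.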
4. Sample Clusters to Find Categorization Examples
As stated above, small cases do not make good predictive coding candidates. That said, you can mimic a technology-assisted review workflow by running clusters, generating a random sample of each cluster, identifying exemplar documents in each sample set, and using those exemplars to categorize the documents into 10 categories of your choosing. This workflow combines clustering and categorization with a limited review of your documents, allowing you to issue-code an entire population of similar documents. The best part is that if the results are limited because the samples did not contain enough quality documents, you can generate another round of samples.
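The sample-then-categorize loop can be sketched in Python as follows. This is a hedged illustration under stated assumptions: clusters arrive as a hypothetical mapping of cluster IDs to document IDs, sampling uses a fixed seed so a round can be reproduced, and categorization is swapped in as "assign the category of the most similar exemplar" using word-count cosine similarity with a hypothetical 0.3 cutoff. Commercial categorization engines work differently.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional


def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sample_clusters(clusters: Dict[str, List[str]], per_cluster: int = 2, seed: int = 42) -> Dict[str, List[str]]:
    """Draw a reproducible random sample of doc_ids from every cluster."""
    rng = random.Random(seed)
    return {cid: rng.sample(ids, min(per_cluster, len(ids))) for cid, ids in clusters.items()}


def categorize(texts: Dict[str, str], exemplars: Dict[str, str], threshold: float = 0.3) -> Dict[str, Optional[str]]:
    """Give each document the category of its most similar exemplar, if similar enough."""
    exemplar_vecs = [(vectorize(texts[doc_id]), category) for doc_id, category in exemplars.items()]
    results: Dict[str, Optional[str]] = {}
    for doc_id, text in texts.items():
        if doc_id in exemplars:
            results[doc_id] = exemplars[doc_id]  # the reviewer's own call stands
            continue
        vec = vectorize(text)
        score, category = max(((cosine(vec, ev), cat) for ev, cat in exemplar_vecs), default=(0.0, None))
        # Below the cutoff the document stays uncategorized (None) and can feed the next sampling round.
        results[doc_id] = category if score >= threshold else None
    return results


# Usage
clusters = {"c1": ["D1", "D2"], "c2": ["D3", "D4", "D5"]}
samples = sample_clusters(clusters)
texts = {
    "D1": "shipment delayed at port customs hold",
    "D2": "customs hold on shipment at port",
    "D3": "board approves merger terms",
    "D4": "merger terms approved by board",
    "D5": "holiday party rsvp",
}
# Exemplars would be picked by a reviewer from the samples; hard-coded here so the output is deterministic.
exemplars = {"D1": "Logistics", "D3": "Merger"}
categories = categorize(texts, exemplars)
print(categories)  # {'D1': 'Logistics', 'D2': 'Logistics', 'D3': 'Merger', 'D4': 'Merger', 'D5': None}
```

D5 matches neither exemplar, which is exactly the situation the workflow handles by generating another round of samples.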
5. Smoking Gun Concept Search
Conceptual searches can be incredibly powerful tools, and in this workflow the approach is simple. You can use any body of text to generate a conceptual search, because concept searches do not depend on example documents in your database. Simply write an ideal document that gets to the heart of the issue in the case. The text should be one to two fully developed paragraphs focused on a single concept. Submit this text as a conceptual search. At the least, the results will point you to documents that get to the core of your matter. At best, they will contain the hot, relevant documents (and potentially the “smoking gun” documents) around which you can build your argument. D4 recently used the "Smoking Gun Concept Search" workflow; read the case study.
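The mechanics of submitting a free-text "ideal document" as a query can be sketched in Python. Be aware that this is a stand-in: many review platforms build concept indexes with techniques such as latent semantic indexing, which can match documents that share meaning but not words. The sketch below uses plain TF-IDF cosine ranking, which only rewards shared terms, and the function names and `top_n` parameter are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs: Dict[str, str]) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Return smoothed IDF weights and a TF-IDF vector per document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = {doc_id: {t: tf * idf[t] for t, tf in c.items()} for doc_id, c in counts.items()}
    return idf, vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def concept_search(query_text: str, docs: Dict[str, str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank documents against a free-text 'ideal document'; zero-score documents are dropped."""
    idf, vectors = tfidf_vectors(docs)
    # Query terms that never appear in the collection carry no weight.
    query_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query_text)).items() if t in idf}
    ranked = sorted(((cosine(query_vec, v), doc_id) for doc_id, v in vectors.items()), reverse=True)
    return [(doc_id, round(score, 3)) for score, doc_id in ranked[:top_n] if score > 0]


# Usage
docs = {
    "E1": "Let's keep the rebate side agreement off the books until the audit is over.",
    "E2": "The audit team arrives Monday; please prepare the conference room.",
    "E3": "Reminder: submit timesheets by Friday.",
}
ideal_document = (
    "An undisclosed side agreement paying a rebate that was hidden "
    "from the auditors and kept off the books."
)
hits = concept_search(ideal_document, docs)
print([doc_id for doc_id, _ in hits])  # ['E1', 'E2']
```

E1 ranks first because it shares the distinctive terms of the ideal document; E2 scores only on the common word "the", which is why a true conceptual index (or at least stop-word removal) gives cleaner results than this keyword-based approximation.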
The above strategies are not exclusive to smaller-volume cases and can be used on cases of all sizes. But on smaller cases, these workflows will speed up and improve the review, and may even eliminate the need for a full document review phase. With the introduction of analytics into smaller cases, we are seeing a shift in the balance of power and a leveling of the playing field. Analytics is no longer just for terabyte-size cases; it can be just as effective on one gigabyte of email.