3 Tips for Foreign Language Data

Rarely does a review project shape up exactly the way we predict. Litigation support teams need agility and flexibility to be prepared for everything e-discovery can and will throw their way.

Growing data volumes are an obvious contributor to this reality, but so is today’s international landscape. Globalization means more foreign language documents are finding their way into company data stores, and that results in added complications during e-discovery for both litigation and investigations.

If you’re starting to see that foreign language data is becoming a bigger part of everyday e-discovery, here’s how to get ahead of the complexity.

1. Think multilingually.

Always be prepared for foreign language data that may appear in your collections. Odds are good that your business, or your client’s business, involves some dealings in another country, whether via product sales, outsourced services, or recruiting efforts. In modern business, foreign language documents are always possible and often likely.

For example, our team recently kicked off a relatively small internal investigation involving five custodians. After initial strategizing with the client, we knew we might need to handle foreign language data. Even though we didn’t know what languages or volumes to expect, we were fortunate to have prepared the right technological workflows, including tapping a specialized translation plugin for our review workspace, in advance. It turned out that this small investigation became a big one, and more than 10 million documents involving English, Russian, and several Middle Eastern languages were collected when all was said and done.

Bonus Tip: You can also use early case assessment workflows to perform analytics on your case and identify which foreign languages are used in which documents.
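To make the idea concrete, here is a minimal sketch of a first triage pass, assuming plain extracted text is available for each document. It groups documents by the dominant writing script (Latin, Cyrillic, Arabic, Hebrew) using only Python's standard library. The function names and document IDs are hypothetical, and script is only a rough proxy for language: telling Russian from Ukrainian, or English from German, takes a trained language identification model such as the one built into your analytics platform.

```python
import unicodedata
from collections import Counter

# Scripts to look for. Unicode character names begin with the script name,
# e.g. "CYRILLIC SMALL LETTER PE", so a prefix check is enough for a rough pass.
SCRIPT_PREFIXES = ("LATIN", "CYRILLIC", "ARABIC", "HEBREW")


def dominant_script(text):
    """Return the most common script among the letters in text, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        for prefix in SCRIPT_PREFIXES:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def group_by_script(documents):
    """Map each script label to the IDs of documents written mostly in it."""
    groups = {}
    for doc_id, text in documents.items():
        label = dominant_script(text) or "UNKNOWN"
        groups.setdefault(label, []).append(doc_id)
    return groups


# Hypothetical sample documents, for illustration only.
sample = {
    "DOC-1": "Please review the attached contract.",
    "DOC-2": "Пожалуйста, проверьте договор.",
    "DOC-3": "يرجى مراجعة العقد",
}
print(group_by_script(sample))
```

A pass like this is cheap enough to run before review begins, which is the point of early case assessment: you learn which language workflows you need before reviewers open a single document.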

2. Home in on foreign language insights with the right technology.

The days of setting aside individual documents with foreign language content during a manual, linear review so they can be attended to separately by native speakers are more or less behind us. Case teams can now use text analytics to identify those documents at the very start of the review. The benefit is that, while they still require a separate workflow, these documents can undergo first-pass review in parallel with the English documents instead of being flagged and funneled into a separate process as reviewers churn through the entire data set manually.

Working with foreign languages in your e-discovery software also means identifying the right stop words—common terms that the system will ignore, such as “the” or “it”—for searching and analytics, so be sure to have a proper understanding of those dictionaries from the start. You can also get creative during searching by looking into slang or other regional terms that could be present in your data set.
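As a rough illustration of why the stop word list has to match the language, here is a small sketch. The lists are deliberately tiny and hypothetical, as is the `searchable_terms` helper; a real review platform ships its own, much longer dictionaries.

```python
import re

# Hypothetical, deliberately tiny stop word lists keyed by language code.
STOP_WORDS = {
    "en": {"the", "it", "a", "of", "and"},
    "de": {"der", "die", "das", "und", "es"},
}


def searchable_terms(text, language):
    """Lowercase the text, split it into words, and drop that language's stop words."""
    stop = STOP_WORDS.get(language, set())
    return [word for word in re.findall(r"\w+", text.lower()) if word not in stop]


german = "Der Vertrag und die Rechnung"
print(searchable_terms(german, "de"))  # stop words removed
print(searchable_terms(german, "en"))  # wrong list: German filler words survive
```

Applying the English list to German text leaves words like "der" and "und" in the index as noise. The reverse mistake is worse: "die" is filler in German but may be a meaningful search term in English.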

Creating a separate analytics index for each language is a good way to ensure you’re making the most of your system’s conceptual analysis of the data. Additionally, work closely with foreign language experts to identify any foreign names or terms that could be translated but should not be, such as “Deutsche Telekom.” Those experts can also help you dig into foreign keyword search criteria that may uncover the most important files by helping to create clusters, which are conceptually related groups of documents that the system can organize automatically.
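One common way to keep such names intact is to mask them with placeholder tokens before the text reaches a translation engine and restore them afterward. The sketch below assumes the engine passes the placeholder tokens through unchanged, which you would need to verify for your own tool; many engines offer a glossary or do-not-translate feature that does this for you. The function names and the token format are hypothetical.

```python
def protect_terms(text, protected_terms):
    """Replace each protected term with a placeholder token.

    Returns the masked text and a token-to-term mapping for later restoration.
    """
    mapping = {}
    masked = text
    for index, term in enumerate(protected_terms):
        token = f"__TERM{index}__"
        if term in masked:
            masked = masked.replace(term, token)
            mapping[token] = term
    return masked, mapping


def restore_terms(text, mapping):
    """Put the original terms back in place of their placeholder tokens."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


# The list of protected terms would come from your foreign language experts.
masked, mapping = protect_terms(
    "Die Deutsche Telekom hat geantwortet.", ["Deutsche Telekom"]
)
print(masked)

# Stand-in for the engine's output; no real translation call is made here.
translated = "__TERM0__ has replied."
print(restore_terms(translated, mapping))
```

Without the masking step, an engine could plausibly render the company name as "German Telecom," and a search for the real name would then miss the translated document.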

Bonus Tip: Email threading and other analytics features can organize this data with minimal human input. Just take note of the special considerations that apply when you use them on foreign languages.

3. Know you have options for translation.

All of those technology options mean that a slow linear review by native speakers is no longer necessary—at least not to the full extent it once was. However, once you’ve identified potentially relevant materials via these workflows, you still need to get the data into the hands of the experts on your project. You can’t build a convincing case strategy based on second-hand reports of the stories the documents are telling—at some point you’ll need accurate document translation to provide evidence.

Fortunately, even translation is a different animal when you have the right technology and workflows in place. Machine translation is a very low-cost option, but you must be careful. It can convey the gist of a document, but it is unreliable for the precise meaning of any given sentence. While convenient and fast, machine translation may produce misleading results, and some of its output may be simply incomprehensible. For reliable accuracy, consider human revision of the machine’s results.

For instance, on that same case of 10 million documents, our team ended up with more than 70,000 files that required translation—and the task seemed daunting. Working closely with Linguistic Systems, a Relativity developer partner, we were able to identify a collaborative, hybrid workflow that utilized post-editing of the machine translation to split the difference between the cost-effectiveness of machine translation and the refined accuracy of human translation. In the end, it cost 65 percent less than we anticipated for a manual translation—and we gathered all the insight we needed, easily within the time allowed.

Bonus Tip: Specialized tools that can be added directly to your review workspace support translation workflows in real time, so you don’t have to move data around. Discovia worked with Relativity developer partner Linguistic Systems, Inc., which performs this translation work through its proprietary LSI Translation Plug-in, an application in the Relativity Ecosystem.

When it comes down to it, tackling foreign language data is yet another example of how modern e-discovery requires a healthy balance of technology, expertise, and collaboration.
