If you’ve had the pleasure of sitting down for some high-level banter with the internet’s beloved pal, ChatGPT, you know that the capabilities of AI are more pertinent and powerful than ever. ChatGPT—informed by a massive amount of training data and enough creativity to give an illusion of personality (we could talk for hours!)—is becoming wildly popular amongst avid googlers and tech enthusiasts alike.
In the legal realm, a question du jour is how generative AI will weave into e-discovery—something Relativity has set out to answer with the development of aiR for Review in RelativityOne. aiR is powered by Microsoft Azure’s OpenAI Service’s GPT-4, a large language model that scored in the 90th percentile on the bar exam, and legal professionals want to know: is GPT-4 working?
Cristin Traylor, director of law firm strategy at Relativity, sat down at Relativity Fest for a panel with four law firm customers who spent several months piloting the integration of GPT-4 in RelativityOne. The session, titled Law Firms, Generative AI & The Future of e-Discovery, included:
- Jason Lichter, Principal, Troutman Pepper eMerge
- Matthew Jackson, Counsel - Data Analytics and eDiscovery, Sidley Austin LLP
- Melissa Dalziel, Of Counsel, Quinn Emanuel
- Michael Cichy, Litigation Support Regional Manager, Foley & Lardner
Before delving in to hear about their experiences, Cristin made sure the audience was well-acquainted with the blueprint for all of the AI we build: Relativity’s AI Principles. These six principles function as our promises to you—to help us say, “hey, AI is a big deal to us, and we’re handling these matters with care.”
She continued by defining relevant terms (large language model, OpenAI, Microsoft Azure) and then summarizing where GPT-4 fits in the review process during our pilot, which went something like this: an attorney authors a review protocol, aiR annotates documents using that review protocol, attorneys review aiR’s annotations, and then the cycle starts over. Our pilot program repeated this process again and again, all to ask: does aiR understand what we’re looking for?
With that context in mind, we were off to the races to hear from our customers. For those who missed the session (or want to revisit key insights), a snapshot of those conversations is below.
Cristin Traylor: What was your experience with piloting GPT-4?
Melissa Dalziel: Like all good beginnings, it started with a conversation. So, we talked with Relativity and they gave us some ideas: “This is how we’re thinking of developing the tool. Would you like it? Would you buy it? Is that what you want to accomplish?” Through a couple of conversations like that, we were able to get really comfortable that they cared about the firm as a customer and about what we needed in the tool for it to actually be relevant to our daily practice.
It got really exciting as we saw the ideas be refined and become closer and closer to what we really wanted and needed the tool to do.
And then we sent in data—a sample of documents that had already been coded in a case, along with the review protocol—and they, behind-the-scenes, tinkered and tried to figure out how the AI could come close to the coding. And if it wasn’t close to the coding, why not? Was it that the AI was detecting things that the reviewers had missed? Or was it that the AI still needed further refinement? Was it an issue with the prompt?
Matt Jackson: I’ll say that the tool puts a real emphasis on the prompt. I knew that there were some weaknesses in our original protocol, because our data was from a review we had done some time ago, and that played out when we started to look at the documents. We saw some of those weaknesses reflected in the results and when we corrected those weaknesses, we got better results. And actually, pretty good results.
It does place a high emphasis on that prompt, how it’s engineered, how accurate it is, and how much information you have to put into your protocol. We don’t have that information, usually, at the beginning of a case, right?
Michael Cichy: One good experience we had was with a review memo; it was a pretty in-depth one, a thirty-page review memo. aiR was able to point out two documents from the data set and say, “this attachment might be something you want to look at.” You don’t see that kind of granularity in earlier-generation computer-assisted review tools. Whereas this, it said, “this seems important” and it was. These were documents that were critically important.
Cristin: With this pilot, we’ve really focused on how we can accelerate first-pass review. How do you see this technology being incorporated into your e-discovery practice?
Jason Lichter: While first-pass review is the obvious initial place in which we see it being implemented, it is far from the only one.
At the same time that you are putting that prompt in for responsiveness, you can also be doing so for, I believe, up to ten issue codes (in our Advanced Access version of the tool), and you’re going to get a score and snippets for each of those issues. So it can really accelerate issue designations, too.
Beyond that, I would say if you have low-risk matters—subpoena response, cases where you would otherwise do what some call an over-the-wall production based on some sampling and QC—you can meaningfully enhance your QC by using the generative AI tools that Relativity is integrating.
Later in the conversation, Jason added some additional thoughts on this topic: Reasonable minds can differ on whether this technology can be a replacement for contract-attorney-staffed, first-pass review. We may eventually get there; we’re not there today.
Melissa Dalziel: I can’t wait for depo-prep. I’m really excited about just focusing on an individual witness and an individual subject and using it to help cull from a larger data set. As you know, at the end of responsiveness, you may have a lot of tagging, and a lot of documents, but you don’t necessarily have the best documents or the focus that you want to take the case in—because responsiveness is so much broader than relevance.
Cristin: We had a user group where all of our pilot customers got together and shared ideas about what they were going through. Some of the things we talked about were consistency and the unique perseverance of generative AI tools like aiR. What are your thoughts on those concepts?
Jason Lichter: The fact that there’s no fatigue—that issue came up in our pilot. There were a few documents that ostensibly looked like tech exceptions (as opposed to human-generated documents we could use). In the Relativity viewer, it just looked like it was gobbledygook, but if you scrolled down long enough or opened it in the native viewer, in fact, on page 100-something, there was actual relevant material. aiR found that where the human reviewer did not.
Melissa Dalziel: I just love the idea of it as a giant brain. One giant brain, with one perspective, with an infinite memory. With the same understanding of language and what’s important. And that, with a single change of your input, you can change the focus of that entire brain on your data set. As you can imagine, several weeks into a review, you learn new facts. You don’t want to have to send reviewers back over what they’ve looked at. We’ve already talked about inconsistencies across reviewers. I want my big brain to be tweaked and to look at that data again very quickly and implement those changes.
A Level-headed—but Optimistic—Look Forward
Our pilot of GPT-4 in RelativityOne—which has borne aiR for Review, and will become generally available to our users in 2024—has allowed the firms involved to ask big questions about the role AI plays in the review process. Questions like:
- Do this tool’s capabilities outweigh its risks?
- How can we learn to steer it where we want it to go?
- Where else would AI tools be useful?
As we innovate, we know how crucial it is to work directly with our customers to gauge the success of our offerings. And while we know that bringing AI on board takes careful consideration—and we’re devoted to that—we’re also excited for the future of this technology.
“What makes ChatGPT so interesting and unpredictable is that its creativity is unleashed at a very high level, but you can definitely put handcuffs on it,” said Melissa Dalziel. “The secret sauce [to make it successful in a high-stakes field like e-discovery] is just that: how you keep it locked on your documents.”
Graphics for this article were created by Kael Rose.