E-Discovery and the Rise of Machines

Every day, people and businesses create electronic data about themselves and the world around them, and with modern computing and mobile devices, we often create more data than we could possibly sift through with human eyes.

So what happens when litigants have to review their data in order to respond to a subpoena or discovery request? If you’re a business, are you even sure you understand all that you’ve got? If so, how do you make sure that you’re accurately separating what’s responsive from what’s not and what’s relevant from what’s not, holding on to what’s legally privileged, and not missing anything? It’s been a problem in large corporate and commercial cases for a while now, but it’s becoming more prevalent as the volume of electronic data that we create and transmit every day keeps growing.

The historical solution has been the only one we knew: tackle reams of paper with the brute force of people and hours committed to reviewing every page. But nowadays, in a lot of cases, if you were to print out all that data, you couldn’t afford to pay enough competent people to carefully review all that paper in time.

With electronic discovery, or e-discovery, the solution is still to throw time and bodies at the problem, but also to expedite the review by scanning or uploading files into a database that reads them electronically, removes duplicates, and renders them searchable.
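To make the "removes duplicates and renders them searchable" step concrete, here is a minimal sketch in Python. It is not how any particular e-discovery product works. The function names (dedupe, build_index, search) and the sample documents are made up for illustration, and the sketch assumes that two files count as duplicates when their text matches after ignoring capitalization and spacing.

```python
import hashlib
import re
from collections import defaultdict


def dedupe(docs):
    """Keep the first copy of each document, comparing a hash of its normalized text."""
    seen = set()
    unique = {}
    for doc_id, text in docs.items():
        # Assumption: case and whitespace differences do not make a new document.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[doc_id] = text
    return unique


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, *keywords):
    """Return the ids of documents that contain every keyword."""
    hits = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*hits)) if hits else []


# Hypothetical sample data, for illustration only.
docs = {
    "email-1": "Please review the merger agreement before Friday.",
    "email-2": "please review the  merger agreement before friday.",
    "memo-1": "Lunch menu for the Friday staff meeting.",
}

unique = dedupe(docs)          # email-2 is dropped as a duplicate of email-1
index = build_index(unique)
print(search(index, "merger", "agreement"))  # ['email-1']
print(search(index, "friday"))               # ['email-1', 'memo-1']
```

Real review platforms go further (near-duplicate detection, email threading, text extraction from scans), but the basic idea is the same: fingerprint each file so repeats drop out, then index the words so reviewers can search instead of reading page by page.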

An emerging solution, however, is predictive coding, or technology-assisted review (“TAR”). With predictive coding, you teach a computer to analyze a large data set by feeding it small but meaningful subsets that humans have tagged as relevant, privileged, and so on. After these initial inputs, you run tests, gauge the computer’s accuracy, and make adjustments. Once you’ve honed the machine’s understanding of the data, you deploy it to code the entire data set more cheaply, more quickly, and more accurately than we ever could.
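The train, test, and deploy loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the method any TAR vendor actually uses: it swaps in a simple naive Bayes word-count classifier, the function names (train, predict, accuracy) are hypothetical, and the handful of "documents" are invented for illustration.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def train(tagged):
    """Learn word counts per tag from (text, tag) pairs that humans have coded."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in tagged:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens(text))
    return word_counts, doc_counts


def predict(model, text):
    """Pick the tag whose word pattern best matches the text (naive Bayes)."""
    word_counts, doc_counts = model
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for word in tokens(text):
            # Add-one smoothing so an unseen word does not zero out a tag.
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, tagged):
    """Share of human-coded test documents the model tags the same way."""
    return sum(predict(model, t) == label for t, label in tagged) / len(tagged)


# Step 1: humans tag a small seed set (hypothetical examples).
seed = [
    ("merger agreement draft attached for review", "relevant"),
    ("revised merger terms and purchase price", "relevant"),
    ("lunch order for the team on friday", "not relevant"),
    ("parking garage closed this weekend", "not relevant"),
]
model = train(seed)

# Step 2: test the model against documents humans coded but the model never saw.
holdout = [
    ("comments on the merger draft", "relevant"),
    ("friday lunch moved to the garage", "not relevant"),
]
print(accuracy(model, holdout))  # 1.0

# Step 3: once accuracy is acceptable, let the model code the unreviewed documents.
unreviewed = [
    "final purchase price for the merger",
    "team parking passes for the weekend",
]
coded = [predict(model, text) for text in unreviewed]
print(coded)  # ['relevant', 'not relevant']
```

In practice the "make adjustments" step means going back to step 1: when the test score is too low, reviewers tag more documents, especially the ones the model got wrong, and retrain before letting it loose on the full set.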

How far has predictive coding come along?

Last week, an influential federal judge had to decide whether he could force litigants to use it to review and produce data. In 2012, this judge was among the first, if not the very first, to approve its use in civil discovery. By 2015, he wrote, the law had come to firmly support a litigant’s choice to use it, but in this case, the litigant had chosen not to.

The plaintiff who requested the discovery wanted the defendant to use predictive coding, but the defendant, who was producing the discovery, preferred to have its own staff run keyword searches instead.

The judge found that he could not compel a litigant to use predictive coding today, but tomorrow, the answer could be different:

“To be clear, the Court believes that for most cases today, TAR is the best and most efficient search tool…. The Court would have liked the [defendant] to use TAR in this case. But the Court cannot, and will not, force [it] to do so. There may come a time when TAR is so widely used that it might be unreasonable for a party to decline to use TAR. We are not there yet. Thus, despite what the Court might want a responding party to do … [the plaintiff’s] application to force the [defendant] to use TAR is denied.”
