What is predictive coding?


Predictive coding is a form of technology-assisted review (TAR) used to assess the relevance of high volumes of documents for the purposes of electronic disclosure (e-disclosure). E-disclosure is the disclosure of electronically stored information (ESI), as opposed to hard copy documents, as part of the litigation process.

How does it work?

The logic behind predictive coding systems is similar to that of sophisticated search engines (such as Google), which use complex algorithms, combining search terms and personal search history, to serve up web pages in order of relevance, personalised for each user. There are two primary elements involved:

  • keyword and key phrase search; and
  • iterative computer learning to rank relevance of each page.

Similarly, predictive coding software analyses the electronic documents that have been identified as potentially relating to a case. The software looks for document traits such as keywords and phrases, and may also run more nuanced searches on concepts, context and metadata, to determine the relevance of each individual document (i.e. whether it needs to be disclosed), and assigns each one a relevance score (e.g. on a scale of 1 to 10). More importantly, and what sets it apart from basic search, it also learns to improve its relevance grading using a ‘seed set’ of documents which are graded by (human) legal experts: it analyses the manual grading to optimise the automatic grading, and this process is repeated multiple times to enhance accuracy. By combining sophisticated search techniques and iterative learning, the software gradually becomes more adept at correctly identifying the relevance of documents.
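To make the seed set and the repeated grading rounds more concrete, here is a minimal sketch in Python. It is an illustration only and not how any particular commercial product works: the choice of a simple naive Bayes word model, the idea of asking the reviewer about the document the model is least sure of, the function names and the sample documents are all assumptions made for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled):
    """Fit a tiny naive Bayes model from (text, is_relevant) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = {True: 0, False: 0}
    for text, is_relevant in labelled:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_probability(model, text):
    """Estimate the probability (0 to 1) that a document is relevant."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_score = {}
    for label in (True, False):
        # Add-one smoothing so unseen words never give a zero probability.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_score[label] = score
    # Clamp the difference to avoid overflow on very long documents.
    diff = max(-50.0, min(50.0, log_score[False] - log_score[True]))
    return 1.0 / (1.0 + math.exp(diff))

def grade(probability):
    """Map a probability onto a 1 to 10 relevance scale."""
    return max(1, min(10, math.ceil(probability * 10)))

def review_rounds(seed_set, unreviewed, ask_reviewer, rounds=3):
    """Retrain after each human decision, then grade what is left."""
    labelled = list(seed_set)
    pending = list(unreviewed)
    for _ in range(rounds):
        if not pending:
            break
        model = train(labelled)
        # Assumption: send the least certain document to the human reviewer.
        pending.sort(key=lambda d: abs(relevance_probability(model, d) - 0.5))
        doc = pending.pop(0)
        labelled.append((doc, ask_reviewer(doc)))
    model = train(labelled)
    return {d: grade(relevance_probability(model, d)) for d in pending}

# Hypothetical seed set graded by legal experts (True = relevant).
seed_set = [
    ("contract breach payment overdue invoice", True),
    ("invoice dispute over late payment terms", True),
    ("office party friday cake in kitchen", False),
    ("lunch menu and parking reminder", False),
]
unreviewed = [
    "reminder that the invoice payment is overdue",
    "cake and lunch in the kitchen on friday",
    "parking permit renewal",
    "breach of contract terms alleged",
]

def ask_reviewer(doc):
    """Stand-in for a human decision; a real review would ask a lawyer."""
    return "invoice" in doc or "contract" in doc

grades = review_rounds(seed_set, unreviewed, ask_reviewer, rounds=1)
for doc, g in grades.items():
    print(g, doc)
```

In this toy run the software asks the reviewer about one document, retrains on the enlarged set of graded documents, and then grades the rest. Real systems work with far larger seed sets and, as noted above, may also take concepts, context and metadata into account.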

All predictive coding systems work slightly differently, but the goal is always the same: to end up with ESI graded for relevance. Once this process is complete, all the documents which have been graded below a chosen level of relevance are discarded. The remaining ESI is then assessed by lawyers and paralegals to determine which documents need to be disclosed, thereby completing the process of e-disclosure.
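The cut-off step described above can be sketched in a few lines of Python. The document identifiers, the grades and the cut-off of 5 are invented for illustration; in practice the level would be chosen for the case in hand.

```python
def split_by_grade(graded_docs, cutoff):
    """Separate documents at or above the cut-off grade from those below it."""
    keep = [doc_id for doc_id, g in graded_docs.items() if g >= cutoff]
    discard = [doc_id for doc_id, g in graded_docs.items() if g < cutoff]
    return keep, discard

# Hypothetical output of the grading stage: document ID -> grade (1 to 10).
graded_docs = {
    "DOC-001": 9,
    "DOC-002": 2,
    "DOC-003": 5,
    "DOC-004": 1,
    "DOC-005": 7,
}

for_review, set_aside = split_by_grade(graded_docs, cutoff=5)
print("For lawyers and paralegals to assess:", for_review)
print("Below the cut-off:", set_aside)
```

Raising the cut-off shrinks the pile left for manual assessment but increases the chance that a relevant document is discarded, so the choice of level is a judgment call rather than a technical setting.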

Benefits and risks

The main reason for using predictive coding is to reduce cost. Large cases may involve hundreds of thousands or even millions of electronic documents. The traditional method of e-disclosure has been for law firms to temporarily employ teams of paralegals to sift through all the ESI manually (after the application of some basic keyword search), which can lead to vast expenditure. Although legally trained staff are still needed for the final stage of filtering documents, and software licensing fees need to be factored in, predictive coding can significantly reduce the overall cost of e-disclosure. As such, in cases where there is a large imbalance between the parties (e.g. where only one party could afford to undertake full manual e-disclosure), predictive coding can help to level the playing field. The other big advantage of this type of sophisticated TAR is speed; when it comes to filtering large quantities of material, humans are totally outpaced by modern computers. So litigation which might otherwise last many months or years can be completed more quickly, cheaply and efficiently.

The main risk of relying on any automated system is that, if it goes wrong, only manual intervention can fix it. An e-disclosure exercise which fails to correctly classify the pertinent documents can lead to huge losses, financial and otherwise, both for law firms and their clients. As such, it’s vital that checks and balances are in place; lawyers should be properly trained to understand how predictive coding software works so they can manage and monitor it, just like any other tool.

The view of the courts

Predictive coding has been used in the USA for several years for the purposes of e-discovery (similar to e-disclosure but broader in its meaning). In the 2015 case of Rio Tinto PLC v Vale SA et al, No. 14-Civ-3042, Magistrate Judge Andrew Peck stated that case law on the use of predictive coding had developed to the point that it was “black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.”

As is often the way with new technology, it has taken a little longer to sail across to this side of the pond, but the High Court sanctioned the use of predictive coding for the first time in English litigation last year in the case of Pyrrho Investments Ltd v MWB Property Ltd [2016] EWHC 256 (Ch). A few months later, in the case of Brown v BCA Trading and others [2016] EWHC 1464 (Ch), the High Court went one step further and ordered the use of the technology, in the face of opposition from one party, on the basis that it would reduce costs.

What does this mean for the future of the legal profession?

As predictive coding systems become more advanced and the technology continues to receive recognition from the courts as a standard tool in large-scale litigation, its application by law firms will no doubt increase. But what will this mean for paralegals who currently undertake manual document review, and for lawyers in general?

Joanne Frears, partner at Blandy & Blandy, predicts that this shift “may result in the exercise of discovery being undertaken not by teams of legal juniors, but by programmers setting learning parameters to determine the right outcome in tagged documentation … As lawyers our value to clients no longer lies in being able to read quickly, in interpreting complicated legalese, or even in planning for eventualities – computers can foresee more scenarios and then literally ‘do the math’ to ascertain the likelihood of it happening. The profession must work to ensure that creative problem solving skills are made paramount so that, in the world of e-disclosure and AI, the ‘human touch’ remains relevant and, just as importantly, desired, by clients.”

Technology will not replace lawyers, but the legal profession will need to adapt and focus on the uniquely human skills that cannot (currently) be replicated by machines.

Further reading

LexisNexis: Predictive coding in e-disclosure

Anexys: Predictive coding explained

Thomson Reuters: Predictive Coding: It’s Here to Stay

Pepper Hamilton: Facts and fictions underlying the predictive coding revolution

Rosenblatt: Predictive coding – “tar” very much

Alex Heshmaty is a legal copywriter and journalist with a particular interest in legal technology. He runs Legal Words, a copywriting agency in Bristol. Email alex@legalwords.co.uk. Twitter @alexheshmaty.

Image cc by Michael Himbeault on Flickr.