The most significant recent development at ICLR has been the launch, in March 2019, of our legal information lab, ICLR&D.
This was conceived as a space where ICLR, whose traditional role has been publishing legal information built around primary source materials such as case law and legislation, could experiment with case law data in fundamentally different ways. The launch of ICLR&D was to some extent itself a form of experiment. The results have been interesting.
ICLR&D is not a product. Nor is it a physical laboratory or studio. Rather, it is a process or state of mind born of a conscious decision to harness our technical imagination, to expand our development horizons, to embrace new ways of working with the data we already have, and to engage and collaborate with others in the legal information world.
That decision to create the lab was born of a growing awareness of the explosion of interest in the intersection between law and technology. Until now, however, what is often referred to as “legaltech” or “lawtech” seems to have focused its attention mainly on transactional legal content or processes, such as contract drafting, document discovery and regulatory supervision, or the development of chatbot apps and other public-facing products, and (with some exceptions) rather less on primary legal source material.
We felt that the legal information space warranted further research, and that ICLR was in a good position to do this. We identified four main areas of research, for each of which we created a separate project.
This project is concerned with the automatic enrichment of unstructured legal text using rules-based and predictive techniques. To date, it is the most developed of the ICLR&D projects, with a prototype being launched over the summer of 2019 and presented for discussion at international legal conferences during the latter part of the year.
The project’s deliverable is an open source piece of software, the Blackstone library and statistical model, that allows researchers and engineers to automatically extract information from long, unstructured legal texts (such as judgments, skeleton arguments, scholarly articles, Law Commission reports and pleadings).
Trained using machine learning processes based on a body of existing case law data, it uses natural language processing (NLP) to parse raw text and to recognise named entities (such as case names, citations, provisions) and text categories (such as axioms, conclusions, issues), which can then be tagged accordingly and used to build analytical tools and visualisation models based on the data.
That’s a simplification. Blackstone was developed using a single Python library called spaCy. The training data comprised the entire archive of ICLR’s law reports and unreported judgments. This corpus of content was broken down into single sentences, and then each sentence was broken down into words (tokens). The tokens were then vectorised and parsed for parts of speech and dependency relationships within their respective sentences. The Named Entity Recogniser was trained on this data to spot the entities concerned, to pick them out of a mass of words and phrases, and to tag them accordingly. Errors were corrected, and correct recognition was affirmed, in a (manual, human) review process. Once the performance reached an acceptable standard, the model could be applied to live, raw data: it could take a piece of legal text, analyse it, and pick out the named entities. A similar process was applied to words, phrases and sentences that carried particular types of meaning, such as the identification of an issue, the laying down of a rule, or the declaring of an axiom.
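To make the idea of picking entities out of raw legal text more concrete, here is a minimal rules-based sketch using only Python's standard library. It is not Blackstone's code or API, and it does not use a trained statistical model: the hand-written patterns, the `Entity` type, the `extract_entities` function and the label names are all assumptions made for illustration, and the sample text was composed for this sketch.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    label: str  # hypothetical label name chosen for this sketch
    text: str   # the matched span of text
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends


# Hand-written patterns (assumptions for illustration, far from exhaustive).
PATTERNS = {
    # Citations such as "[2015] UKSC 11", "[2010] EWCA Civ 1" or "[1990] 2 AC 605".
    "CITATION": re.compile(
        r"\[\d{4}\]\s+(?:\d+\s+)?[A-Z][A-Za-z]*(?:\s+[A-Z][a-z]+)?\s+\d+"
    ),
    # Statutory provisions such as "section 3(1)" or "s 12".
    "PROVISION": re.compile(r"\b(?:[Ss]ection|s)\s+\d+[A-Z]?(?:\(\w+\))*"),
}


def extract_entities(text: str) -> List[Entity]:
    """Return every pattern match in the text, ordered by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Entity(label, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda entity: entity.start)


# Sample text composed for this sketch.
sample = (
    "In Montgomery v Lanarkshire Health Board [2015] UKSC 11 the court "
    "considered the duty of disclosure. Compare section 3(1) of the "
    "Human Rights Act 1998."
)

entities = extract_entities(sample)
for entity in entities:
    print(f"{entity.label:<10} {entity.text!r} at {entity.start}-{entity.end}")
```

Patterns like these are brittle: they miss any citation format nobody wrote a rule for, and they cannot recognise case names or categories of meaning at all. That limitation is the reason for training a statistical model on a large body of reviewed examples instead.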
Such techniques are not new, even in the legal sphere: they have been developed in order to analyse the language of contracts or to classify documents for the purpose of discovery (disclosure) in litigation. So far as we are aware, however, Blackstone’s open source posture and focus on case law make it the first model of its kind.
So that’s where we are. What was released last summer was essentially a prototype, a proof of concept. We are now working on a better, perhaps Beta, version. Watch this spaCy.
In the meantime, for the benefit of those who are not geeks, we have built an app which offers an easy (code-free) interface for the processing of ingested documents or extracts of text, using the Blackstone library: https://blackstone-demo.herokuapp.com.
This project looks at the promotion of open access to case law by analysing and mapping the judgment supply chain. On this project we have been working with the courts, publishers (such as BAILII and Justis) and providers (such as JUST:transcription) in an effort to improve the volume and speed of publication of both written and oral judgments.
This project is concerned with exploring ways to accurately impart the significance and meaning of legal materials to the public. As an educational charity, ICLR sees public legal education as a priority, and one that can be addressed in ways other than the free supply of reliable judgments and case summaries in front of a paywall. Although this project remains in its infancy, a good example of what we envisage is the building up of free reference resources on the Knowledge section of ICLR’s website.
This is a conceptual project focused on modelling the connections between the various sources of English law. In particular, it is concerned with understanding how cases relate to each other. It uses data modelling and visualisations to analyse the content, and not merely the labelling, of the cases, and to find connections and proximities between them that may bypass more traditional forms of indexing, categorisation or subject matter classification.
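As a purely illustrative sketch of what "proximity" between cases might mean, the example below scores pairs of cases by how much their cited authorities overlap (Jaccard similarity). This is one simple proxy chosen for the illustration, not a description of the project's actual method, and the case names, the cited authorities and the `jaccard` and `proximities` functions are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical data: each case mapped to the set of authorities it cites.
cites: Dict[str, Set[str]] = {
    "Case A": {"Authority X", "Authority Y", "Authority Z"},
    "Case B": {"Authority Y", "Authority Z"},
    "Case C": {"Authority W"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared items divided by all items; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def proximities(cited: Dict[str, Set[str]]) -> List[Tuple[float, str, str]]:
    """Score every pair of cases and return the pairs, closest first."""
    scored = [
        (jaccard(cited[first], cited[second]), first, second)
        for first, second in combinations(sorted(cited), 2)
    ]
    return sorted(scored, reverse=True)


for score, first, second in proximities(cites):
    print(f"{first} and {second}: {score:.2f}")
```

A measure of this kind links cases through what they rely on rather than through the subject headings assigned to them, which is the sense in which such connections can bypass traditional classification.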
Collaboration is key
The driving philosophy of the lab is that open is better than closed. We want to work with others rather than pursue these projects in isolation. So, if you would like to get involved with the lab, or have an idea that you feel fits with our mission, we’d strongly encourage you to make contact with us.