Big data is big news. An estimated 90 per cent of the world’s data was created in the last two years (see www.ibm.com/big-data) and insights gleaned from large datasets are increasingly driving business innovation and economic growth. Underpinning this “big data revolution” is a powerful combination of low-cost cloud computing, open source analytics software and new research methodologies. These are enabling us to move from simply storing large sets of data to extracting real value from them. Big data analysis can now tell us everything from the most borrowed library books in 2013 to the most overweight areas in England. It is being used by companies across the UK to predict the “next big thing” in the music industry, or to underpin new product development. And big data technologies are transforming how we deliver services such as transport, probation and health.
What can the big data revolution do for law? One of the big data challenges for law is the statute book. It’s simply too big and it changes too quickly for any one person to comprehend. The big data revolution provides us with a real opportunity to understand how the statute book works, and to use those insights to deliver better law.
But research across the entire body of current legislation is only possible if the basic ingredients – the data, the tools and some tried and trusted methods – are as readily available as the computing power and the storage. Legal researchers simply aren’t equipped to take advantage of the big data revolution right now. They lack a “big data toolkit” – something tailored to their specific needs and to the complexities and nuances of law. They need easily downloadable data, together with new online and open source tools that they can readily adapt and use. That’s what we’re going to provide through a new project – Big Data for Law.
Understanding researchers’ needs
The Big Data for Law project, announced by David Willetts MP on 6 February 2014, has three aims. The first is to understand researchers’ needs. For the first time ever, we’ll be putting big data technologies into the hands of a non-technical legal audience, so we need to get it right. As a first step it’s vitally important that we understand what researchers need and want, and what their capabilities and limitations are.
Access to raw data
The second aim is to give researchers access to as much raw data as possible, in particular by investigating how to derive new open data from closed data. One example is the vast quantity of legislation.gov.uk usage data, which is currently closed. The project will look at ways of processing that data – for example by anonymising it – so that it can be released as open data that will provide researchers with valuable information about how legislation is used.
We know that researchers currently don’t have access to all of the data they need – so finding new, safe ways of producing open data from closed data is incredibly valuable. We could generate generic cluster data or “recommendations” datasets that help to reveal what people look at after they’ve looked at section 1 of the Defamation Act, or that people who looked at legislation A, B, C also looked at legislation X, Y, Z. That might be a really useful thing to share with users.
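As a rough illustration only, a “people who looked at A also looked at X” dataset could be derived by counting how often two provisions are viewed in the same session. The sketch below is not the project’s method: the session records, the provision labels and the `min_sessions` threshold (which withholds rarely seen pairs) are all hypothetical, and real usage data would need a proper disclosure-risk review before release.

```python
from collections import Counter
from itertools import permutations

# Hypothetical, already-anonymised sessions: each is a list of provisions viewed.
SAMPLE_SESSIONS = [
    ["defamation-act/s1", "defamation-act/s2", "defamation-act/s3"],
    ["defamation-act/s1", "defamation-act/s2"],
    ["defamation-act/s1", "defamation-act/s3"],
    ["defamation-act/s1", "defamation-act/s2", "other-act/s1"],
]


def co_view_counts(sessions):
    """Count, for each ordered pair (a, b), the sessions in which both were viewed."""
    counts = Counter()
    for viewed in sessions:
        # Deduplicate within a session so repeat visits are counted once.
        for a, b in permutations(sorted(set(viewed)), 2):
            counts[(a, b)] += 1
    return counts


def also_viewed(counts, item, min_sessions=2):
    """Provisions seen alongside `item` in at least `min_sessions` sessions,
    most frequent first (ties broken alphabetically)."""
    related = [(b, n) for (a, b), n in counts.items()
               if a == item and n >= min_sessions]
    return sorted(related, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    counts = co_view_counts(SAMPLE_SESSIONS)
    for provision, n in also_viewed(counts, "defamation-act/s1"):
        print(provision, n)
```

Only the aggregated pair counts would be published, not the sessions themselves. Withholding pairs seen in very few sessions is one simple way to lower the risk of exposing an individual’s reading pattern, though it is an assumption here rather than a tested safeguard.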
Understanding usage patterns might lead to us totally restructuring how we present legislation online, based on real evidence of where users experience difficulties, or what works best for them. Opening up more data than ever before could really help us to improve service delivery, making it easier for people to correctly understand the legislation they are looking at on legislation.gov.uk.
Discovering a pattern language
The final aim of Big Data for Law is to examine the patterns that occur across the statute book with the aim of discovering a “pattern language” for legislation. A pattern language is simply a structured method of identifying and describing good design practices. Pattern languages have revolutionised software engineering over the last twenty years and have the potential to do the same for our understanding of the statute book. You don’t create or invent patterns – you identify them as “good design” based on evidence about how useful and effective they are. This might lead to a common vocabulary between the users of legislation and the people who draft it. Drafters could use the patterns to identify useful and effective drafting practices and solutions that deliver good law.
Experience of legislative drafting is scarce and valuable. Detecting and systematising the patterns that distil that experience, and building tools to convey the language and structure of successful legislation, have the potential to offer considerable benefits.
At the moment we simply don’t know whether there are patterns in the statute book. It’s a theory, and having the opportunity to test that theory through Big Data for Law is very exciting. If we do identify patterns, the findings could have a profound impact on legislative policy and practice. They could lead to a radically different approach to structuring teaching materials or guidance for legislators, for example. Or they could generate the evidence that helps policy makers better understand legislative impact. They could also provide a platform for technical developments in natural language processing – which in turn could help make updating legislation easier and more cost effective.
Big data, then, looks set to transform law in the same way that it has transformed business, and research, across a wide range of sectors. And there has never been a more relevant time for research into the architecture and content of law, the language used in legislation and how, through interpretation by the courts, it is given effect.
Legislation.gov.uk is now used by a wider group of people than ever before, and most have very little formal legal training. We know they find legislation difficult. It’s not surprising really – they’re confronted by huge amounts of legislation that is piecemeal in structure, changes frequently, and has complex inter-dependencies that aren’t always evident to the lay reader.
What we’ve lacked, until now, are the tools, data and methods we need to better map and understand the statute book, to establish from evidence which practices make legislation easier to understand and use, and to generate the evidence that could help policy makers understand legislative impact. Big Data for Law will change that.
John Sheridan is Head of Legislation Services at The National Archives. Big Data for Law is led by The National Archives, working with a range of partners across the public and private sectors, and will deliver a new service for researchers – legislation.gov.uk Research – by March 2015, when the project completes.