Understanding algorithms

On Urban Dictionary, “algorithm” is defined as “a word used by programmers when they don’t want to explain what they did.” As the pace of practical AI adoption quickens, the joke rings increasingly true. Anyone who wants to understand this new technology, how it works, and how it might be controlled or improved, will encounter a number of informational barriers. To overcome the first of those hurdles, we need some proper definitions.

AI and algorithms

“Artificial intelligence” is a blanket term for a lot of different technologies. Essentially, though, it involves entrusting computers with activities that have historically depended on human judgment and decision-making. A small subset of that is what is sometimes termed “general artificial intelligence”, which describes the type of AI most commonly seen in science fiction; at the time of writing, however, any AI which we encounter “in the wild” is far more limited.

This tends to be specific software designed to address a particular set of challenges and to yield a limited category of answers from a defined data set. To take a very simple example, it might be a piece of software which asks a child about their preferences (colour, flavour and so on) and then automatically produces an ice cream sundae. The software which captures these inputs, analyses them and then outputs a conclusion or recommendation is called an algorithm.
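For readers who want to see the shape of such a thing, here is a minimal sketch in Python. Every detail, the menu, the preference names and the fallback choice, is invented purely for illustration; it is not how any real parlour (or any real AI system) works.

```python
# A hypothetical, heavily simplified "ice cream algorithm":
# inputs go in, a simple rule runs in the middle, a recommendation comes out.
# Menu, flavour names and the fallback are invented for illustration only.

MENU = {
    "strawberry": "strawberry sundae",
    "chocolate": "chocolate sundae",
    "vanilla": "vanilla sundae",
}

def recommend(flavour: str) -> str:
    """Return a sundae matching the stated flavour, or fall back to a default."""
    return MENU.get(flavour, "banana split")  # default when nothing matches

print(recommend("strawberry"))  # -> strawberry sundae
print(recommend("liquorice"))   # -> banana split (the surprising case discussed below)
```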

Like the child in the ice cream parlour, we may be able to guess at some of how an algorithm works. If we tell the machine we like strawberry flavour and get strawberry ice cream back, the analytical process in the middle appears fairly straightforward. But what if the machine recommends a banana split? Leaving aside whether the recipient appreciates what they are given (we’ll come back to that), something more complicated has happened which is harder to intuit from the outside.

When an output cannot clearly be linked to the input, a number of factors might be at play. The algorithm may “know”, based on data that it has been supplied with or has accumulated, that a very high percentage of children who like strawberry ice cream also like banana splits. This is the “people who liked x also like y” analysis, familiar to anyone who shops online. Alternatively, the algorithm may know that pretty much all children (regardless of other preferences) like banana splits, so it may recommend those when it isn’t able to match a child’s preferences more exactly. There may also be a range of other, unknown variables fed into the calculation, over which the child has no control.
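For the technically curious, the “people who liked x also like y” idea can be sketched in a few lines of Python. The order history below is entirely made up, and a real recommendation engine would be far more sophisticated, but the underlying co-occurrence counting is of this general shape.

```python
from collections import Counter

# A toy "people who liked x also like y" recommender.
# PAST_ORDERS is an invented history of desserts ordered together.
PAST_ORDERS = [
    {"strawberry sundae", "banana split"},
    {"strawberry sundae", "banana split"},
    {"chocolate sundae", "banana split"},
    {"strawberry sundae", "vanilla sundae"},
]

def also_liked(favourite: str) -> str:
    """Recommend whatever most often appears alongside the stated favourite."""
    counts = Counter()
    for order in PAST_ORDERS:
        if favourite in order:
            counts.update(order - {favourite})
    return counts.most_common(1)[0][0] if counts else "banana split"

print(also_liked("strawberry sundae"))  # -> banana split
```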

So, let’s go back to the child. If they like banana splits, they are not going to inquire too deeply into how the machine came up with its recommendation. They will head off to enjoy their dessert. Our concern is with the child who, notwithstanding the cleverness of the algorithm in use, has ended up with a banana split when they don’t like bananas. They will want to know (in no uncertain terms and, probably, at some volume) why the machine has got it wrong. The next hurdle to be overcome is therefore to understand the limitations of the technology.

Algorithmic bias and error

An algorithm which analyses on the basis of the “people who liked x also like y” approach is only ever as good as the data it is given to work with. A range of conscious or unconscious biases may be at play here. There is not space in this article to drill down into this in detail, but to pick a simple example, the programmers of the algorithm may have made a deal with a chain of ice cream sundae shops to obtain the data set used to train the algorithm on preferences. If that chain is particularly famous for its banana splits, then there is a statistically higher likelihood that the subjects whose data are being used have a preference for banana splits. This may not be significant, but it might cause the algorithm mistakenly to correlate preferences for other ice cream flavours with a preference for banana splits. When the machine is then deployed outside that environment, the bias will make itself felt if the public at large in fact has a weaker preference for those desserts.
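The effect of that skew can be illustrated with toy data of the same kind (both order histories below are invented): an association learned from a banana-split-heavy data set looks far stronger than it would in a more representative one, and the recommendations follow the data.

```python
# Invented order histories showing how a skewed training set bakes in bias:
# in the skewed data every order includes a banana split, so every flavour
# appears to "predict" one; in the balanced data the association is weak.
SKEWED_ORDERS = [
    {"strawberry sundae", "banana split"},
    {"chocolate sundae", "banana split"},
    {"vanilla sundae", "banana split"},
]
BALANCED_ORDERS = [
    {"strawberry sundae", "vanilla sundae"},
    {"chocolate sundae", "banana split"},
    {"vanilla sundae", "strawberry sundae"},
]

def banana_split_rate(orders) -> float:
    """Share of training orders that include a banana split."""
    return sum("banana split" in order for order in orders) / len(orders)

print(banana_split_rate(SKEWED_ORDERS))    # 1.0   - every order "confirms" the link
print(banana_split_rate(BALANCED_ORDERS))  # ~0.33 - a much weaker association
```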

Such issues of ingrained bias are almost always inadvertent. But there are occasions where more deliberate interference can take place: an algorithm can be engineered backwards from a desired outcome, so that it produces that outcome either by default or with a greater frequency than would naturally occur. Suppose the owner of our ice cream parlour has entered into a long-term contract for bananas, but has found that she is suffering a high level of wastage because demand has not matched her expectations. She might press for the system to be configured so that it recommends banana splits by default wherever another recommendation is unavailable or in short supply.

Moving beyond that, there is also the consideration that these complicated calculations are only ever going to produce a statistical approximation of real life. For any system, there will be a margin of error that is deemed acceptable. So, if one in every 100 children who seek a recommendation is given a flavour of ice cream that they don’t care for, that might be regarded as acceptable; a 10 per cent error margin might be regarded as too high. Either way, a decision is being taken that some percentage of users must suffer the consequences of an erroneous result, a false positive or a false negative. Where there is no clarity about this margin of error, dissatisfaction (to put it mildly) flows from every incorrect output.
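The underlying arithmetic is trivial, but it is worth spelling out, because the choice of threshold is a policy decision rather than a technical one. The figures below are the illustrative ones from the text, not real data.

```python
# Expected number of users who get a wrong result at a given error rate.
# The figures mirror the illustrative 1 per cent and 10 per cent examples above.
def expected_wrong_results(users: int, error_rate: float) -> float:
    return users * error_rate

print(expected_wrong_results(100, 0.01))  # 1.0  - perhaps acceptable
print(expected_wrong_results(100, 0.10))  # 10.0 - perhaps too high
```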

Now, all of this is bad enough when it is about ice cream, but of course in reality AI has far more serious real-world applications. Embedded bias is a frequent problem. Facial recognition technology is less effective at making correct positive identifications of people of colour, seemingly as a consequence of being trained on predominantly white datasets. This could lead to wrongful positive identifications and therefore potentially wrongful convictions. A similar problem has also been reported in relation to autonomous vehicles, which are apparently 5 per cent less effective at identifying pedestrians with darker skin tones.

Hidden workings

There is also understandable concern about the “black box” nature of most algorithmic processing. The companies which have developed this technology are naturally anxious about allowing their intellectual property to be inspected. But without that oversight, the risk is that decisions will be made which are unexplained and, potentially, inexplicable. A scheme developed by Admiral a few years back would have allowed the insurer to assess customers’ attitudes to risk from inferences drawn from their profiles and posts on Facebook. The scheme was cancelled at the last minute by Facebook for infringing its privacy terms (and, as they say, when Facebook is telling you that you have overstepped the mark on privacy …). Had it gone ahead, it would have been almost impossible for customers to know which, out of the myriad data points within their Facebook identity, had contributed to the premium they were quoted or to a decision whether or not to offer insurance at all.

Malign motives

Finally, there is a particular concern about the use of the technology where the motives of those using it are not benign. These concerns were thrown into even sharper relief by the Cambridge Analytica scandal, in which personal data about voting tendencies and other matters were misappropriated and, it is believed, used to influence voting behaviour. Similarly, there are concerns about China’s social credit scheme, which aims to use profiling to rank all citizens by reference to their adherence to state-mandated behaviours, and which promises to curtail the rights and freedoms of those who fail to fall into line (or who merely associate with those who do).

Legal protections

As is so often the case, legal protections can struggle to keep pace with technological developments. Elements of existing legislation, particularly around personal data protection, can be used to introduce a degree of scrutiny. Article 22 of the GDPR provides protections for data subjects in the context of “automated individual decision-making” and is the closest thing to a provision directed at AI processing that currently exists. Its protections are limited, however.

Firstly, the right conferred is “not to be subject to a decision based solely on automated processing … which produces legal effects concerning him or her or similarly significantly affects him or her.” So any decision which includes a human element, at any stage, is likely to be excluded. A recent news story reported on individuals wrongly suffering cuts to their benefits after being incorrectly identified by predictive algorithms as potential fraudsters. The system, which was being trialled in four London boroughs, was judged to be 80 per cent effective. In other words, one in five of those accused of being a fraudster was likely in fact to have been innocent. Here, though, there would be no prospect of lodging an Article 22 objection to the processing, because the system was only producing information about the statistical likelihood of fraud for human review. In practice, however, any human intervention appeared to be negligible. It seems that every individual identified by the software was written to in terms alleging that they were fraudsters. Given that many of these people would have been impoverished, and may well have been otherwise vulnerable, the risk of harm flowing from such an approach was significant.
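On that reading of the reported figures, the scale of the problem is easy to set out. The numbers below simply restate the article’s own, applied to a notional cohort of 100 people flagged by the software; they are not additional data.

```python
# Restating the reported figures for a notional cohort of 100 flagged people.
flagged = 100             # individuals identified by the software as likely fraudsters
reported_accuracy = 0.80  # the "80 per cent effective" figure, as reported
wrongly_accused = flagged - flagged * reported_accuracy
print(wrongly_accused)    # 20.0, i.e. one in five of those written to
```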

Human rights law may also offer some protection in those parts of the world where it is robustly enforced. In the 2016 case of Szabo v Hungary (37138/14), the European Court of Human Rights ruled that a planned programme of surveillance infringed the Article 8 rights of Hungary’s citizens, not least because the automation of data collection and analysis within it would be difficult to scrutinise or control.

The European Court of Justice’s approach in decisions such as the 2014 case of Digital Rights Ireland Ltd v Minister for Communications (C-293/12) also suggests that large-scale collection of data for AI processing might be incompatible with human rights (as disproportionate) even where the interference could otherwise be justified for reasons such as crime prevention.

The need for a new coherent programme of regulation or legislation grows ever clearer as the technology develops. Initiatives like the Lord Chief Justice’s AI Advisory Group, chaired by Richard Susskind (see https://bit.ly/2UzJFTQ), show a growing recognition that the impact of AI is going to require careful study.

Similarly, the report of the House of Lords’ Communications Select Committee, published on 9 March 2019, shows that there is an ambition to address many of these concerns. But a good deal more thought needs to be given to the real-world implications of this emerging technology and how they might be managed. It is likely to be difficult for these discussions to match the pace of the innovations that they seek to regulate. Without making the attempt, however, there is a risk that legislators and regulators will be overtaken by the driverless car of AI, and will be left running to catch up.

Further reading

CEPEJ: European ethical charter on the use of AI in judicial systems

House of Lords Communications Select Committee Report: Regulating in a digital world (HL Paper 299)

ICO: Big data, artificial intelligence, machine learning and data protection (PDF) https://bit.ly/infolawAI03

Will Richmond-Coggan is a director in the IT & Data team at Freeths Solicitors, specialising in contentious and non-contentious data protection and technology issues. Email William.Richmond-Coggan@freeths.co.uk. Twitter @Tech_Litig8or.

