Garbage in, garbage out: the dangers of training algorithms on biased data

Algorithms and artificial intelligence (AI) have become increasingly present in our daily lives, from the voice recognition on our phones to chatbots used to interact with customers and now with council services in London.

The potential benefits are huge. AI can increase the efficiency of service delivery in both the public and private sectors. In healthcare, algorithms can improve the triaging of patients so that the appropriate practitioners can attend to them faster. AI can also drastically improve people’s outcomes through more accurate prediction or detection. Being able to predict, with a reasonable degree of accuracy, which children are at risk of living in an abusive household, by combining data from schools, health and social care and then targeting interventions accordingly, could be invaluable.

Despite these benefits, our increased reliance on algorithms is not without risks. At times, it seems that we have imbued them with an aura of objectivity and infallibility, almost forgetting that these artificial agents have to function in imperfect environments with incomplete or noisy information. We must be aware of the risks involved and the limitations of algorithms.

Data is the fuel of AI, and most data is biased in one way or another. We very rarely have complete and perfect information about what is going on in the world around us. Most of the time we cannot fully observe the underlying processes that are at play in a given situation. And often the information that we have is limited by our own subjectivity and understanding of a situation. This is reflected in a lot of the data that we collect.

Biases in data can occur in several ways. For example, we can never observe all the crimes that are committed; for the most part we can only observe levels of reported crime. This gives us a partial depiction of what is actually happening. In addition, the people collecting the data can transfer their own biases into the type of data they collect. Racial bias, for instance, can lead to over-sampling a given subgroup within the population.

When people decide to opt out of sharing their medical data, this introduces a selection bias, which means that the sample will no longer be representative of the general population. Although practitioners can try to clean bias from the data, these processes are rarely sufficient.
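To make the opt-out effect concrete, here is a minimal sketch in Python. Every figure in it is invented for illustration: a hypothetical population of 1,000 people, 200 of whom have a condition, under the assumption that people with the condition opt out of sharing their data more often than those without it. The names Patient, build_population and prevalence are hypothetical helpers, not part of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    has_condition: bool
    shares_data: bool


def build_population():
    """Build a hypothetical population of 1,000 people (all figures invented)."""
    population = []
    # Assumption: 200 people have the condition and half of them opt out.
    population += [Patient(True, True)] * 100
    population += [Patient(True, False)] * 100
    # Assumption: 800 people do not have the condition and 10% of them opt out.
    population += [Patient(False, True)] * 720
    population += [Patient(False, False)] * 80
    return population


def prevalence(patients):
    """Share of the given patients who have the condition."""
    patients = list(patients)
    if not patients:
        raise ValueError("cannot compute prevalence of an empty group")
    return sum(p.has_condition for p in patients) / len(patients)


population = build_population()

# What we would see with complete information.
true_rate = prevalence(population)

# What the data set shows once the people who opted out are missing.
observed_rate = prevalence(p for p in population if p.shares_data)

print(f"true prevalence:     {true_rate:.3f}")
print(f"observed prevalence: {observed_rate:.3f}")
```

Under these assumed opt-out rates the shared data understates how common the condition is, and nothing in the shared data alone reveals the gap.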

These issues of bias can have profoundly negative outcomes. Algorithms will be more prone to error on sub-populations that have low representation within a sample. For example, groups that have historically been underrepresented on the credit market might be discriminated against.
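The link between low representation and higher error can be illustrated with a toy Python sketch. It assumes two hypothetical groups whose true outcomes depend on a score in different ways, and a deliberately simple model that picks a single cut-off to minimise total errors across the pooled sample. The groups, scores and cut-offs are all invented, and fit_threshold and error_rate are hypothetical helpers rather than any real credit-scoring method.

```python
def count_errors(samples, threshold):
    """Number of records where the rule 'score >= threshold' disagrees with the label."""
    return sum((score >= threshold) != label for score, label in samples)


def fit_threshold(samples):
    """Pick the single cut-off (0 to 10) that minimises total errors on the sample."""
    return min(range(0, 11), key=lambda t: count_errors(samples, t))


def error_rate(samples, threshold):
    """Fraction of records in the sample that the rule gets wrong."""
    return count_errors(samples, threshold) / len(samples)


# Assumption: in group A (90 records) the true outcome is positive from a score of 5.
group_a = [(score, score >= 5) for score in range(10)] * 9

# Assumption: in group B (10 records) the true outcome is positive from a score of 3.
group_b = [(score, score >= 3) for score in range(10)]

# The model is fitted on the pooled sample, which group A dominates.
threshold = fit_threshold(group_a + group_b)

error_a = error_rate(group_a, threshold)
error_b = error_rate(group_b, threshold)

print(f"fitted cut-off:     {threshold}")
print(f"error rate group A: {error_a:.2f}")
print(f"error rate group B: {error_b:.2f}")
```

Because group A supplies nine records in ten, the fitted cut-off matches group A's pattern, and every mistake the model makes falls on group B.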

Precisely because algorithms and AI are so ubiquitous in our daily lives, we should not become lazy and blindly delegate every task to them. We need to be very aware of the limitations of the data we feed into these decision-support tools.

Eleonora Harwich is a Lead Researcher at Reform Think Tank. Reform is an independent, non-party Think Tank whose mission is to set out a better way to deliver public services and economic prosperity.

Eleonora will also be speaking at THINK AI for Public Sector, this September, in a session entitled Sir Humphrey and the Robots. You can still register to attend here.
