Editorial

UK public data ‘not yet usable’ for AI

ODI test platform highlights gaps in data quality and accessibility of National Data Library.

Posted 26 March 2026 by Christine Horton


A prototype developed by the Open Data Institute (ODI) has provided an early test of the UK government’s proposed National Data Library, raising serious concerns about whether the country’s public data is ready to support it.

The ODI has developed an experimental platform, dubbed “NDL-Lite”, which brings together more than 100,000 public datasets from across government. The platform effectively stress-tests the government’s plans by demonstrating what an AI-ready data platform could look like.

The National Data Library is intended to bring together public sector data into a more accessible, standardised and AI-ready resource to support better public services and policymaking.

But rather than showcasing the strength of Britain’s data infrastructure, NDL-Lite has exposed structural flaws – from poor labelling to outdated records – that prevent AI tools from using official data reliably.

Researchers found that when authoritative datasets were difficult to access or interpret, AI systems defaulted to alternative sources such as news articles or commercial databases. These sources, while easier to process, are often less accurate or consistent.

“The Government’s National Data Library has huge potential, but much of the data it would rely on is not yet usable by modern AI systems,” said Prof. Elena Simperl, director of research at the ODI. “If that doesn’t change, there is a risk that AI tools will increasingly rely on sources that are easier to access, rather than those that are most reliable.”

The findings build on earlier ODI research, which showed that even basic public queries, such as tax or benefits questions, can be answered using non-official sources when government data is inaccessible.

Structural issues in key datasets

Testing within the prototype highlighted several persistent issues across government data, including poor labelling, inconsistent formats and limited interoperability.

In one example, datasets labelled “crime” were found to represent different local authority statistical releases that could not be combined due to a lack of shared standards. Some key datasets were also outdated or inaccessible. One Home Office dataset has not been updated since 2018, while a newer version is not currently accessible via the Office for National Statistics API.

As a result, even basic analytical tasks, such as tracking outcomes from recorded crimes through to charges or convictions, are difficult to carry out consistently.
