It’s no secret that gen AI is hindered by a critical issue: whether we can trust the accuracy of its outputs. That comes down to the LLMs that power gen AI systems and how they work. Drawing on an ocean of content, they often produce wrong answers, otherwise known as AI hallucinations.
Companies trying to move forward with gen AI plans are reportedly grappling with a staggering 41 percent LLM hallucination rate, which produces baseless responses and leads to poor decision-making.
The public sector – like all organisations – wants to leverage AI to enable greater productivity. But what challenges does it face when it comes to the accuracy of results generated by LLMs?
For all the proposed benefits of optimising processes and informing decision-making, gen AI’s credibility for real-world use cases within the public sector is hindered by a lack of accuracy, transparency and explainability in the content it generates – otherwise known as ‘hallucinations.’ The difficulty hallucinations create means public sector decision-makers are left gambling that the result of a prompt will be correct. A wrong result could have a significant impact on the direction of a project, and any issues would erode trust in a department.
The relative nascency of gen AI means public sector IT teams don’t yet have total clarity on the best way to create, train, and maintain the LLMs that power these systems. Traditional LLMs – which rely on probabilities rather than definitive answers – aren’t set up to successfully extract meaning from today’s intertwined data. Drinking from an ocean of online content, public sector records, and potentially contradictory sources, they often repeat false facts or, worse, fill in the blanks with plausible fabrications, generating hallucinations.
Professor Yejin Choi has demonstrated how ChatGPT fails to draw inferences that seem obvious to humans. For example, if you tell ChatGPT it takes two hours for two shirts to dry in the sun, then ask how long it would take five shirts to dry, it will tell you the answer is five hours. Gen AI, in this instance, does the maths but ignores the logic: the shirts dry in parallel, so whether you put out two shirts or ten, they will still take two hours. Data professionals are, therefore, turning to trusted, well-known technologies to help make the outputs of gen AI systems more reliable.
Can you provide more information on knowledge graphs and how they can help organisations feel more confident in AI-driven decision-making?
Graph technology makes LLMs less biased and more accurate: it improves the model’s behaviour and forces it to focus on the correct answers. When an LLM is trained on curated, high-quality, and structured data – the knowledge graph – the risk of errors and hallucinations decreases significantly.
Knowledge graphs organise data from various sources. They capture information about entities such as people, places, or events and establish connections between them. System developers can define ‘golden relationships’ within the data model. These become the standard by which the knowledge graph can correct the LLM’s errors (much faster than the LLM could on its own). The corrections can themselves serve as examples to train refined machine-learning algorithms.
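To make the idea concrete, here is a minimal sketch in Python using the networkx library. The entities, the `golden` edge attribute, and the `contradicts_golden_fact` helper are hypothetical illustrations of how curated relationships might be used to flag an LLM claim that conflicts with them, not a reference to any particular product.

```python
# Hypothetical sketch: a small knowledge graph of entities and relationships,
# with 'golden relationships' that can be used to check an LLM's claims.
import networkx as nx

kg = nx.DiGraph()

# Entities (people, organisations, events) as typed nodes.
kg.add_node("Dr Patel", type="person")
kg.add_node("NHS Trust A", type="organisation")
kg.add_node("Clinical Trial 42", type="event")

# Curated 'golden relationships' as edges.
kg.add_edge("Dr Patel", "NHS Trust A", relation="EMPLOYED_BY", golden=True)
kg.add_edge("Dr Patel", "Clinical Trial 42", relation="LEAD_INVESTIGATOR", golden=True)

def contradicts_golden_fact(subject: str, relation: str, obj: str) -> bool:
    """Return True if an LLM's (subject, relation, object) claim conflicts
    with a golden relationship already recorded in the graph."""
    for _, target, data in kg.out_edges(subject, data=True):
        if data.get("golden") and data["relation"] == relation and target != obj:
            return True
    return False

# An LLM claiming Dr Patel works for a different trust would be flagged.
print(contradicts_golden_fact("Dr Patel", "EMPLOYED_BY", "NHS Trust B"))  # True
```

Corrections flagged in this way could then be logged and fed back as training examples, as described above.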
What about RAG, and GraphRAG?
RAG (Retrieval Augmented Generation) enables organisations to retrieve and query data from external knowledge sources, giving LLMs a logical way to access and leverage that data. GraphRAG, in particular, is a technique that adds knowledge graphs to RAG, and it has become essential for addressing hallucinations. Knowledge graphs ground the data in facts while harnessing the explicit and implicit relationships between data points, forcing the model to focus on the correct answers. The result is gen AI output that is accurate, contextually rich and – critically – explainable.
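As a rough illustration of the pattern (not of any specific GraphRAG implementation), the sketch below pulls the facts connected to the entities in a question out of a small knowledge graph and grounds the prompt in them; the `call_llm` stub and the entity names are placeholders for whatever model and data an organisation actually uses.

```python
# Simplified GraphRAG-style retrieval: gather graph facts about the entities
# in a question, then ground the LLM prompt in those facts.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Dr Patel", "Clinical Trial 42", relation="LEAD_INVESTIGATOR")
kg.add_edge("Clinical Trial 42", "NHS Trust A", relation="SPONSORED_BY")

def retrieve_graph_context(graph: nx.DiGraph, entities: list[str]) -> list[str]:
    """Collect the explicit relationships around the entities in the question."""
    facts = []
    for entity in entities:
        if entity in graph:
            for _, target, data in graph.out_edges(entity, data=True):
                facts.append(f"{entity} -[{data['relation']}]-> {target}")
    return facts

def call_llm(prompt: str) -> str:
    # Stand-in for whichever model API an organisation actually uses.
    return f"[model answer grounded in]:\n{prompt}"

def answer_with_graphrag(question: str, entities: list[str]) -> str:
    facts = "\n".join(retrieve_graph_context(kg, entities))
    prompt = (
        "Answer using ONLY the facts below and cite the fact you relied on.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(answer_with_graphrag("Who leads Clinical Trial 42?", ["Dr Patel"]))
```

Because the model is asked to cite the retrieved facts, each response can be traced back to the graph, which is what makes the output explainable.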
Is there anything that might prevent an organisation from leveraging this technology?
Whilst the public sector may have high expectations for gen AI, there are questions around ethics and responsible AI practices – especially while gen AI cannot be guaranteed to be 100 percent accurate. There are also concerns around gen AI creating material without providing suitable attribution, infringing on intellectual property (IP), and inadvertently incorporating data bias into responses – all issues that public sector organisations will want to avoid. Applying technologies like GraphRAG can help mitigate these risks by enabling gen AI to understand complex relationships and enhance the quality of its output, while also providing direct attribution to the source material.
Do you have any use cases you can share?
Another way to make LLMs more accurate is to use them to generate knowledge graphs. Here, an LLM processes large amounts of natural language and derives a knowledge graph from it. While the LLM itself is opaque, so to speak, the generated knowledge graph is characterised by transparency and explainability.
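A hedged sketch of what that extraction step might look like is below; the triple format, the prompt, and the mocked LLM response are all assumptions made for illustration.

```python
# Illustrative sketch: ask an LLM to extract (subject, relation, object) triples
# from free text, then load them into a transparent, inspectable knowledge graph.
# The LLM call is mocked so the example runs on its own.
import json
import networkx as nx

def mock_llm_extract_triples(text: str) -> str:
    # In practice an LLM would be prompted to return triples as JSON;
    # a canned response stands in for that call here.
    return json.dumps([
        {"subject": "Dr Patel", "relation": "LEAD_INVESTIGATOR", "object": "Clinical Trial 42"},
        {"subject": "Clinical Trial 42", "relation": "STUDIES", "object": "Drug X"},
    ])

def build_graph_from_text(text: str) -> nx.DiGraph:
    graph = nx.DiGraph()
    for triple in json.loads(mock_llm_extract_triples(text)):
        graph.add_edge(triple["subject"], triple["object"], relation=triple["relation"])
    return graph

kg = build_graph_from_text("Dr Patel leads Clinical Trial 42, which studies Drug X.")
for subject, obj, data in kg.edges(data=True):
    # Every fact is visible and traceable, unlike the weights inside the LLM.
    print(f"{subject} -[{data['relation']}]-> {obj}")
```

Unlike the model that produced it, every node and edge in the resulting graph can be inspected, queried, and corrected.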
To take a hypothetical example, if the NHS were using gen AI to assist with medical research, the complex corpus of internal and external research documents involved would make it crucial to be able to prove the origins of experimental results. Knowledge graphs in healthcare represent a powerful tool for organising and analysing constantly growing data – and the entities and relationships within it – to provide verifiable answers without the hallucinations that could otherwise misinform clinical decision support.