Transport for London (TfL) runs and maintains London’s road, rail, and underground transport network, one of the largest in the world. Its mission is to ensure that the city’s nine million residents and almost twenty million annual visitors can travel safely and easily, moving London forward in a healthy and sustainable way.

Managing this multi-layered transport network is a deeply complex challenge – a part of which is monitoring London’s 65,000 roads. Reacting swiftly to road incidents is critical, as each unaddressed minute means traffic jams build exponentially. It’s also costly – congestion costs London $7.5 billion per year in lost labour alone, on top of stress and inconvenience for road users.
So what if TfL could bring together digital and real-time data, and break these traffic jams in seconds rather than minutes, saving the city countless man hours and drastically cutting pollution?
Challenge
A key challenge for TfL was the low-quality, disparate travel data it was provided with. Historically, it had also collected data sets in isolation, which meant it could answer only a fraction of the questions the team wanted to ask.
TfL collects terabytes of data every week, but because that data was stored and analysed separately, no meaningful conclusions could be drawn from the relationships between data sets. A shortage of sensors to gather fresh data, such as cameras and telematics, also meant TfL could only gain insight into traffic incidents once they had been visually spotted.
“We were effectively using this disparate data through Excel sheets,” said Andy Emmonds, chief transport analyst at TfL. “None of this data was aligned or real-time, and what we needed was to be a real-time operator – to do that, we needed a digital twin.”
With that in mind, a “digital twin” was identified as a potential solution to the congestion challenge: a computer replica, made possible by Neo4j’s technology, in which ‘if/then’ scenarios can be tested before the system is deployed in the real world.
Solution
It became apparent that using a graph database would be the most efficient, cost-effective, and performant way to power such a digital twin. This would help uncover and examine hidden patterns and relationships across billions of data connections to make the decisions needed to predict and handle traffic incidents.
Graph databases were perfect for this challenge as they store and examine the connections between data points as data itself, much in the same way commuters think about the routes and connections in their daily travel. Neo4j’s graph solution meant TfL could connect and feed data sets into the digital twin, enabling it to improve its ability to detect and address incidents on London’s roads in as close to real-time as possible.
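To make the idea of storing connections as data concrete, here is a minimal sketch in plain Python. It is not TfL's model and does not use Neo4j's API; the `RoadGraph` class, the junction names, and the `speed_kmh` property are all hypothetical, chosen only to show how a road network can be held as nodes and relationships and then queried by following those relationships.

```python
from collections import defaultdict


class RoadGraph:
    """Tiny in-memory graph: junctions are nodes, road segments are relationships."""

    def __init__(self):
        # edges[a][b] holds the properties of the segment between a and b.
        self.edges = defaultdict(dict)

    def add_segment(self, a, b, **props):
        # Each relationship carries its own properties (hypothetical: speed_kmh).
        self.edges[a][b] = dict(props)
        self.edges[b][a] = dict(props)

    def neighbours(self, junction):
        # Junctions directly connected to the given one.
        return sorted(self.edges[junction])

    def within_hops(self, start, hops):
        """Junctions reachable from `start` in at most `hops` segments (breadth-first)."""
        seen = {start}
        frontier = [start]
        for _ in range(hops):
            next_frontier = []
            for junction in frontier:
                for neighbour in self.edges[junction]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.append(neighbour)
            frontier = next_frontier
        return sorted(seen - {start})


# Hypothetical network: five junctions joined by four road segments.
graph = RoadGraph()
graph.add_segment("A", "B", speed_kmh=12)
graph.add_segment("B", "C", speed_kmh=35)
graph.add_segment("C", "D", speed_kmh=40)
graph.add_segment("B", "E", speed_kmh=8)

# If an incident is reported at junction B, which junctions are one segment away?
print(graph.within_hops("B", 1))
```

The point of the sketch is that the question "what is connected to this incident?" is answered by walking relationships directly, rather than by joining separate tables, which is the property the article attributes to graph databases.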
Result
So, what does TfL’s digital twin look like, and how exactly does graph power it? It consists of five layers:
- Digital twin data: the first level of the model, where input data is aligned with the business challenge
- Framework: the data is organised to solve the challenge
- Graph database: the data is set up so it mirrors the physical network it is modelling
- Visual layer: the data is sent to TfL’s control room for interpretation
- Plug and play layer: the data is used to solve different road problems
To try out its new solution and to see what real-time insights it would provide, TfL set up a stage rehearsal – which yielded results almost immediately.
Recalling the test, Emmonds said: “We set up a test product which was fed data powered by graph that could tell us in near real-time if there was a problem on the road. On the day of the test, the system detected five incidents that the control room didn’t pick up. That was the proof in the pudding for us.”
Currently, it takes TfL between 14 and 17 minutes to detect an incident. By the time it’s spotted and interventions are put in place, an average of 27 minutes have been lost in terms of traffic buildup. That means every minute of delay from an incident’s occurrence is worth $14,000.
The progress indicates that TfL’s digital twin could play a pivotal role in cutting congestion by 10 percent – a result worth $750 million per year to the capital and over $1,500 in time back per driver per year according to its own estimates.
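The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the article ($7.5 billion per year, a 10 percent reduction); the variable names are illustrative. The per-minute calculation is a plausibility check only: the article does not say how the $14,000 figure was derived, and spreading the annual cost evenly over every minute of the year is an assumption made here.

```python
# Figures taken from the article.
annual_congestion_cost_usd = 7_500_000_000
reduction_percent = 10

# A 10 percent cut in congestion cost, using integer arithmetic.
annual_saving_usd = annual_congestion_cost_usd * reduction_percent // 100

# Assumption: the annual cost is spread evenly across every minute of a 365-day year.
minutes_per_year = 365 * 24 * 60
cost_per_minute_usd = annual_congestion_cost_usd / minutes_per_year

print(f"Annual saving at {reduction_percent}%: ${annual_saving_usd:,}")
print(f"Average cost per minute: ${cost_per_minute_usd:,.0f}")
```

Under that assumption the 10 percent saving matches the $750 million quoted, and the per-minute average lands in the same range as the $14,000 figure.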
Looking to the future, TfL hopes to build an optimiser for peak traffic days. For example, when a stadium event is happening, the optimiser would define the best plan and control routes across the network, driven by data from the digital twin. TfL is also hoping to use the solution to build emission reduction strategies for London, and even lay the foundation for an autonomous vehicle network.