With technology evolving at an unprecedented rate, cybercriminals are able to harness the power of new tech for their own unethical gain. In recent years, cybercriminals have extensively exploited the freedom offered by the internet, including the dark web. The latter is a part of the deep web that is inaccessible to ordinary browsers and hidden behind multiple proxy layers, where stolen private data, intellectual property, confidential information, drugs, and illegal weapons are sold.

A recent report found that ransomware blog posts on the dark web, used for blackmailing companies or revealing new successful hacks, have increased to an average of 476 a month in 2023, peaking in November with 634 posts.
Using web crawlers and scrapers, cybersecurity actors can scan hundreds of thousands of URLs looking for specific data that may have been leaked or sold online, including corporate email addresses, phone numbers, company names, and employee information, as well as technical details such as access tokens, IP addresses, or source code. It is also possible to set up instant alerts whenever compromised data surfaces on the open web or the dark web.
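In practice, this kind of monitoring boils down to fetching pages and matching their contents against a watchlist of identifiers. The minimal sketch below is purely illustrative and does not represent any vendor's actual product: the domain, token prefixes, URL list, and alert hook are all assumptions chosen for the example. It fetches a handful of URLs and flags any page containing patterns that resemble corporate email addresses, IP addresses, or access tokens.

```python
import re
import requests

# Illustrative watchlist: data that should never appear on a public page.
# The corporate domain and token prefixes here are placeholders.
PATTERNS = {
    "corporate_email": re.compile(r"[A-Za-z0-9._%+-]+@example-corp\.com", re.I),
    "ipv4_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "access_token": re.compile(r"\b(?:ghp|sk|xoxb)_[A-Za-z0-9]{20,}\b"),
}

def scan_page(url: str) -> dict:
    """Fetch a single page and return any watchlist matches found in it."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    text = response.text
    return {name: pattern.findall(text)
            for name, pattern in PATTERNS.items()
            if pattern.search(text)}

def alert(url: str, findings: dict) -> None:
    """Placeholder alert hook: a real deployment would notify a SOC channel or SIEM."""
    summary = ", ".join(f"{name} x{len(matches)}" for name, matches in findings.items())
    print(f"[ALERT] {url}: {summary}")

if __name__ == "__main__":
    # In a real deployment the URL list would come from a crawler frontier
    # covering paste sites, forums, and dark web mirrors.
    watchlist_urls = ["https://example.com/leaked-dump.txt"]
    for url in watchlist_urls:
        try:
            findings = scan_page(url)
        except requests.RequestException as exc:
            print(f"[WARN] could not fetch {url}: {exc}")
            continue
        if findings:
            alert(url, findings)
```

Production systems layer far more on top of this, such as distributed crawling, deduplication, and machine-learning classifiers, but the core loop of fetch, match, and alert is the same.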
Vaidotas Sedys, head of risk at web data gathering company Oxylabs, said, “Powered by modern web scraping solutions and ML technology, today open source intelligence allows cybersecurity companies to take a proactive approach to incident prevention and management. Web crawlers and scrapers can be customized to scour through millions of pages, including dark web repositories. Scraping the dark web speeds up the detection of data leaks, incident response efforts, cyber threat hunting, and research on the newest criminal strategies.”
Although private cybersecurity companies have been using web scraping effectively for quite some time, the public sector is still lagging behind. According to Sedys, it is imperative for public organisations to start investing in open-source intelligence more actively, too.
“Some public organisations are already scanning the opportunities. For example, in December, the National Agency of the City of London Police announced that it was seeking a £1.5 million investment in web scraping technology. The technology would be used throughout the UK by police forces, regional organized crime units, the Serious Fraud Office, and other law enforcement agencies to provide an increased level of investigative capability,” said Sedys.
However, the public sector still has a limited understanding of how to get the maximum benefit out of web scraping, said Sedys. To counter this, initiatives such as Project 4B are working with public organisations, universities, and NGOs to educate them on the benefits of web scraping tools in the battle against cybercrime.
“In recent years, we have worked closely with the Lithuanian Government to create a tool based on web scraping and AI technologies that now enables the Communications Regulatory Authority of Lithuania to detect illegal content related to child sexual abuse. In the first two months of its use, the tool identified 19 violators of national or EU laws, leading to 11 complaints registered with the Inspector of Journalist Ethics, eight police reports filed, and a couple of pre-trial investigations opened.
Open-source intelligence is the most efficient way to enable the public sector to monitor the entire web in real time and take a proactive approach to incident management and prevention, getting ahead of cybercriminals who constantly come up with new, innovative attack vectors,” said Sedys.