Mapping the Dark Web searching for illegal content

Pierluigi Paganini April 11, 2016

Recently the intelligence firms Intelliagg and Darksum issued an interesting report that maps the Dark Web.

We have discussed the Deep Web and the Dark Web several times, explaining why the hidden part of the web is considered so dangerous.

However, the darknets aren’t a prerogative of criminal organizations: a good portion of the content they host is legal, as demonstrated by a recent global survey commissioned by the Centre for International Governance Innovation (CIGI).

The survey found that 71% of respondents consider it necessary to shut down the dark net (36% strongly, 35% somewhat), likely because the headlines associate the hidden part of the web with criminal activities.

Another interesting result of the research is that citizens in some countries are much more likely than others to believe the “dark net” should be shut down. Indonesia (85%) and India (82%) lead the ranking, followed by Mexico (80%), China (79%) and Egypt (79%). Bringing up the rear are Kenya (61%), South Korea (61%) and Sweden (61%).

It is not clear, in fact, whether the people interviewed were made aware of the legal uses of the dark net before answering the question.

The Dark Web is a place crowded with cyber criminals and hackers, and it hosts the most popular black markets, but it is a serious mistake to forget that it is also a precious environment for journalists, activists, whistleblowers and political dissidents trying to escape censorship and repression.

Many experts ask me whether there is a way to determine the real proportion of illegal and legal content in the dark web, and I always explain that it depends on the sample used to compute the statistics.

Recently the intelligence firms Intelliagg and Darksum issued an interesting report that tries to answer this question. The researchers involved in the study focused their analysis on the Tor network, which represents a significant portion of the dark web, but not its totality.

The experts used spider software to crawl the Tor network and collect the information used in the study.

“We compiled our census of the dark web using the Darksum ‘collection software’, a ‘spider’ or software application that crawls through the web following links in order to compile an index of its pages, and Intelliagg’s ‘machine-learning intelligence classification system’ – complex algorithms that are ‘trained’ by humans then sent off to classify data automatically.” states the report.
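To give an idea of what “following links in order to compile an index” means in practice, here is a minimal sketch of a breadth-first crawler. It is not the Darksum collection software, whose internals the report does not describe. The function `crawl`, the `fetch_links` callback and the toy addresses are all my own assumptions, and the example walks an in-memory link graph instead of touching the Tor network.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=1000):
    """Breadth-first crawl: visit pages, follow their links, build an index.

    fetch_links is a hypothetical callback that returns the list of links
    found on a page, or None when the page cannot be reached.
    """
    index = {}                  # url -> outgoing links found on that page
    seen = set(seed_urls)       # every address discovered so far
    queue = deque(seed_urls)    # addresses still waiting to be visited

    while queue and len(index) < max_pages:
        url = queue.popleft()
        links = fetch_links(url)
        if links is None:       # unreachable: discovered but never indexed
            continue
        index[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Toy link graph standing in for real hidden services (made-up addresses).
# "d.onion" is linked from "b.onion" but has no entry, so it is unreachable.
TOY_WEB = {
    "a.onion": ["b.onion", "c.onion"],
    "b.onion": ["a.onion", "d.onion"],
    "c.onion": [],
}

index = crawl(["a.onion"], TOY_WEB.get)
print(sorted(index))
```

In this toy run the crawler discovers four addresses but indexes only the three it could reach, the same distinction the study draws between identified and accessible addresses.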

“Our classification system was ‘trained’ using data that had been classified manually from 1,000 sites on the dark web. It proceeded to classify the remaining data automatically without human supervision. This automated method proved to be 94% as accurate as it would have been had this process been entirely done by hand, meaning that nine times out of 10 our algorithms came to the same conclusion as an experienced analyst”
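The accuracy figure quoted above is a measure of agreement between the automated labels and those assigned by a human analyst. A minimal sketch of how such an agreement rate can be computed is shown below. The function name `agreement_rate` and the sample labels are hypothetical, chosen only for illustration, and are not taken from the report.

```python
def agreement_rate(manual_labels, automated_labels):
    """Return the fraction of items on which both labelings agree."""
    if len(manual_labels) != len(automated_labels):
        raise ValueError("both label lists must have the same length")
    if not manual_labels:
        raise ValueError("at least one labeled item is required")
    matches = sum(m == a for m, a in zip(manual_labels, automated_labels))
    return matches / len(manual_labels)


# Made-up labels for five sites: the two labelings disagree on the fourth.
manual = ["drugs", "blog", "fraud", "blog", "weapons"]
automated = ["drugs", "blog", "fraud", "drugs", "weapons"]

rate = agreement_rate(manual, automated)
print(f"agreement: {rate:.0%}")
```

Here the two labelings agree on four of the five sites.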

The experts ran their spiders for two weeks in February 2016, focusing their analysis on selected dark web services, including pornography, fake documentation services, drugs, carding sites, financial fraud sites, weapons and blogs.

Dark Web Services - study

According to the experts, the Tor network is currently composed of approximately 30,000 distinct active .onion addresses.

The spiders accessed websites in a total of 32 different languages; the vast majority of the information on the hidden services network is in English, followed by German and Chinese.

Of the 29,532 .onion addresses identified during the sampling period, only 46% could be accessed; according to the report, the remaining ones were probably related to C&C infrastructure used to manage botnets, file-sharing applications or chat clients.

“A total of 29,532 ‘.onion’ addresses were identified during the sampling period. Of these, fewer than half were accessible at some point during this period. The remaining 54% (which were not analysed further) were probably only up on the dark web for a very short period of time. This could be for many reasons: commonly that they were addresses relating to ‘command and control’ servers used to manage malicious software, chat clients, or file-sharing applications” continues the study.
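To translate those percentages into absolute numbers, the short calculation below applies the 46% share to the 29,532 addresses. The report gives rounded percentages, so the resulting counts are approximations of mine and not figures published in the study.

```python
TOTAL_ADDRESSES = 29_532   # .onion addresses identified during the sampling period
ACCESSIBLE_SHARE = 0.46    # rounded share that could be accessed

# Approximate counts derived from the rounded percentage.
accessible = round(TOTAL_ADDRESSES * ACCESSIBLE_SHARE)
not_analysed = TOTAL_ADDRESSES - accessible

print(f"accessible:   about {accessible:,}")
print(f"not analysed: about {not_analysed:,}")
```

In other words, roughly 13,600 hidden services were analysed, while almost 16,000 addresses were left out of the analysis.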

The real surprise concerns the hidden services automatically analyzed by the experts: 48% can be classified as illegal under UK and US law. By manually analyzing a separate sample of 1,000 hidden services, the experts found about 68% of the content to be illegal.

Below are the percentages of content associated with each category.

dark web categories

I suggest you take a look at the report.

Pierluigi Paganini 

(Security Affairs – Dark Web, cybercrime)


