A group of experts from the Massachusetts Institute of Technology’s SMART lab in Singapore has recently published an interesting research paper on the Dark Web.
The researchers collected and analyzed the dark web (a.k.a. the “onionweb”) hyperlink graph, which they found to be highly dissimilar to the well-studied World Wide Web hyperlink graph.
The team, led by Carlo Ratti, director of MIT’s Senseable City Lab, used graph theory as a tool for analyzing social relationships on the dark web.
The experts analyzed the Tor network, one of the most popular darknets, using a crawler that leveraged the tor2web proxy onion.link.
It is important to highlight that the team focused its analysis on the Tor network, which represents only a portion of the dark web.
The team crawled onion.link using the commercial service scrapinghub.com. Starting from two popular lists of dark web sites, they attempted to visit each listed site and then accessed all linked pages using a breadth-first search.
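The breadth-first strategy can be illustrated with a minimal Python sketch. It is not the researchers' crawler: the function name bfs_crawl, the fetch_links callable, and the toy link table are hypothetical stand-ins, and no network access is performed.

```python
from collections import deque


def bfs_crawl(seeds, fetch_links):
    """Breadth-first traversal of a link graph starting from seed domains.

    fetch_links(domain) is a hypothetical callable (an assumption of this
    sketch): it returns the domains linked from `domain`, or None if the
    domain did not respond.
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep order
    seen = set(queue)                    # every domain ever discovered
    responded = set()                    # domains that answered
    edges = set()                        # directed (source, target) links

    while queue:
        domain = queue.popleft()         # FIFO order gives breadth-first
        links = fetch_links(domain)
        if links is None:
            continue                     # unreachable; not expanded further
        responded.add(domain)
        for target in links:
            edges.add((domain, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, responded, edges


# Toy stand-in for the network: "dead.onion" is linked to but never responds.
TOY_WEB = {
    "a.onion": ["b.onion", "c.onion"],
    "b.onion": ["c.onion", "dead.onion"],
    "c.onion": ["a.onion"],
}
seen, responded, edges = bfs_crawl(["a.onion"], TOY_WEB.get)
```

In a real crawl, fetch_links would wrap an HTTP request through a proxy and would need retries, rate limiting, and link extraction, all of which are omitted here.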
The team included in their analysis only websites that responded, to avoid counting services that no longer exist.
“I.e., if we discover a link to a page on domain v, but domain v could not be reached after >10 attempts across November 2016–February 2017, we delete node v and all edges to node v.
In our analysis, before pruning nonresponding domains we found a graph of 13,117 nodes and 39,283 edges. After pruning, we have a graph of 7,178 nodes and 25,104 edges (55% and 64% respectively)” state the researchers.
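The pruning rule and the quoted percentages can be checked with a short Python sketch. The function name prune_unresponsive and the toy graph are hypothetical; only the node and edge counts come from the quote above.

```python
def prune_unresponsive(nodes, edges, responded):
    """Drop nodes that never responded, plus every edge touching them.

    The quoted rule removes node v and all edges to v. Filtering on both
    endpoints is a conservative choice made in this sketch.
    """
    kept_nodes = {n for n in nodes if n in responded}
    kept_edges = {(u, v) for (u, v) in edges
                  if u in kept_nodes and v in kept_nodes}
    return kept_nodes, kept_edges


# Toy graph: "dead.onion" was linked to but never answered.
nodes = {"a.onion", "b.onion", "dead.onion"}
edges = {("a.onion", "b.onion"), ("b.onion", "dead.onion")}
kept_nodes, kept_edges = prune_unresponsive(
    nodes, edges, responded={"a.onion", "b.onion"}
)

# Share of the graph that survived pruning, using the figures in the quote.
node_share = round(100 * 7178 / 13117)   # percent of nodes kept
edge_share = round(100 * 25104 / 39283)  # percent of edges kept
```

With the published counts, node_share and edge_share evaluate to 55 and 64, which matches the percentages the researchers report.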
The first discrepancy to emerge from the research concerns the number of active .onion domains. The maintainers at the Tor Project Inc. state that the Tor network currently hosts ∼60,000 distinct, active .onion addresses, whereas the team of experts found only 7,178 active .onion domains.
The researchers attribute this large discrepancy to various messaging services, particularly TorChat, Tor Messenger, and Ricochet, in which each user is identified by a unique .onion domain.
The graph-theoretic results show that ∼30% of domains have exactly one incoming link, and 62% of those links come from one of the five largest out-degree hubs. In total, 78% of all nodes receive a connection from at least one of these hubs.
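Statistics of this kind can be computed from an edge list, as in the following minimal Python sketch. The function name hub_statistics, the result keys, and the toy graph are hypothetical, and the exact definitions used in the paper may differ from the ones assumed here.

```python
from collections import Counter, defaultdict


def hub_statistics(nodes, edges, k=5):
    """Summarize how much of a directed graph hangs off its top-k hubs.

    Hubs are taken to be the k nodes with the largest out-degree,
    counting distinct targets (an assumption of this sketch).
    """
    in_neighbors = defaultdict(set)
    out_degree = Counter()
    for u, v in set(edges):              # ignore duplicate links
        in_neighbors[v].add(u)
        out_degree[u] += 1

    hubs = {n for n, _ in out_degree.most_common(k)}

    # Nodes with exactly one incoming link.
    single = [n for n in nodes if len(in_neighbors.get(n, ())) == 1]
    # Of those, the ones whose only incoming link comes from a hub.
    single_from_hub = [n for n in single if in_neighbors[n] <= hubs]
    # Nodes that receive a link from at least one hub.
    reached = [n for n in nodes if in_neighbors.get(n, set()) & hubs]

    return {
        "frac_single_in_link": len(single) / len(nodes),
        "frac_single_from_hub":
            len(single_from_hub) / len(single) if single else 0.0,
        "frac_reached_by_hub": len(reached) / len(nodes),
    }


# Toy graph with two obvious hubs, h1 and h2.
nodes = {"h1", "h2", "a", "b", "c", "d"}
edges = [("h1", "a"), ("h1", "b"), ("h1", "c"),
         ("h2", "c"), ("h2", "d"), ("a", "h1")]
stats = hub_statistics(nodes, edges, k=2)
```

Applied to the pruned onion graph with k=5, the three fractions would correspond to the ∼30%, 62%, and 78% figures reported above, under the definitions assumed in this sketch.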