• Home
  • Cyber Crime
  • Cyber warfare
  • APT
  • Data Breach
  • Deep Web
  • Digital ID
  • Hacking
  • Hacktivism
  • Intelligence
  • Internet of Things
  • Laws and regulations
  • Malware
  • Mobile
  • Reports
  • Security
  • Social Networks
  • Terrorism
  • ICS-SCADA
  • POLICIES
  • Contact me
MUST READ

Severe Hikvision HikCentral product flaws: What You Need to Know

 | 

U.S. CISA adds TP-Link Archer C7(EU) and TL-WR841N flaws to its Known Exploited Vulnerabilities catalog

 | 

Google addressed two Android flaws actively exploited in targeted attacks

 | 

U.S. CISA adds WhatsApp, and TP-link flaws to its Known Exploited Vulnerabilities catalog

 | 

Android droppers evolved into versatile tools to spread malware

 | 

Jaguar Land Rover shuts down systems after cyberattack, no evidence of customer data theft

 | 

Cloudflare blocked a record 11.5 Tbps DDoS attack

 | 

Palo Alto Networks disclosed a data breach linked to Salesloft Drift incident

 | 

Von der Leyen’s plane hit by suspected Russian GPS Jamming in Bulgaria, landed Safely

 | 

Supply-chain attack hits Zscaler via Salesloft Drift, leaking customer info

 | 

Crooks exploit Meta malvertising to target Android users with Brokewell

 | 

North Korea’s APT37 deploys RokRAT in new phishing campaign against academics

 | 

Fraudster stole over $1.5 million from city of Baltimore

 | 

SECURITY AFFAIRS MALWARE NEWSLETTER ROUND 60

 | 

Security Affairs newsletter Round 539 by Pierluigi Paganini – INTERNATIONAL EDITION

 | 

Amazon blocks APT29 campaign targeting Microsoft device code authentication

 | 

Lab Dookhtegan hacking group disrupts communications on dozens of Iranian ships

 | 

New zero-click exploit allegedly used to hack WhatsApp users

 | 

US and Dutch Police dismantle VerifTools fake ID marketplace

 | 

Experts warn of actively exploited FreePBX zero-day

 | 
  • Home
  • Cyber Crime
  • Cyber warfare
  • APT
  • Data Breach
  • Deep Web
  • Digital ID
  • Hacking
  • Hacktivism
  • Intelligence
  • Internet of Things
  • Laws and regulations
  • Malware
  • Mobile
  • Reports
  • Security
  • Social Networks
  • Terrorism
  • ICS-SCADA
  • POLICIES
  • Contact me
  • Home
  • Breaking News
  • Deep Web
  • Scraping the TOR for rare contents

Scraping the TOR for rare contents

Pierluigi Paganini July 18, 2019

Cyber security expert Marco Ramilli explains the difficulties for scraping the ‘TOR networks’ and how to enumerate hidden-services with scrapers.

Scraping the “TOR hidden world” is a quite complex topic. First of all you need an exceptional computational power (RAM mostly) for letting multiple runners grab web-pages, extracting new links and re-run the scraping-code against the just extracted links. Plus a queue manager system to manage scrapers conflicts and a database to store scraped data need to be consistent. Second, you need great starting points. In other words you need the .onion addresses where your scrapers start from. You might decide to begin from common and well-known onion links such as The TOR-hidden-wiki or to start from great reddit threads such this one, but seldom those approaches bring you to what I refer as “interesting links”. For this post “interesting links” means specific links that are rare or not very widespread and mostly focused on cyber-attacks and/or cyber-espionage. Another approach needs be used in order to reach better results. One of the most profitable way to search for “interesting links” is to look for .onion addresses in temporal and up-to-date spots such as: temporal pasties, IRC chats, slack or telegram groups, and so on and so forth. In there you might find links that bring you to more rare contents and to less spread information.

Today I want to start from here by showing some simple stats about scraped .onion links in my domestic scraping cluster. From the following graph you might appreciate some statistics of active-and-inactive scraped hidden services. The represented week is actually a great stereotype of what I’ve got in the last whole quarter. What is interesting, at least in my personal point of view, is the percentage of offline (green) onion services versus the percentage of online (yellow) onion services.

Tor crawlers

This scenario changed dramatically in the past few months. While during Q1 (2019) most of the scraped websites were absolutely up-and-running on Q2 (2019) I see, most of the scraped hidden services, dismissed and/or closed even if they persists in the communication channels (IRC chat, Pasties, Telegram, etc.).

I think there are dual factors that so much affected last quarter in spotting active hidden service. (1) Old content revamping. For example bots pushing “interesting links” back online even after months of inactivity. This activity is not new at all, but during the past quarter has been abused too many time respect to previous quarters. (2) Hidden services are changing address much more fast respect to few months ago. In order to make hard to spot malicious actors, they might decide to keep up-and-running their hidden services only for few hours and then change address/location. Is that way to enumerate hidden-services passing away or is it a simple weird time-frame? We will see it during the next “Scraping” months, stay tuned !

About the author: Marco Ramilli, Founder of Yoroi

I am a computer security scientist with an intensive hacking background. I do have a MD in computer engineering and a PhD on computer security from University of Bologna. During my PhD program I worked for US Government (@ National Institute of Standards and Technology, Security Division) where I did intensive researches in Malware evasion techniques and penetration testing of electronic voting systems.

I do have experience on security testing since I have been performing penetration testing on several US electronic voting systems. I’ve also been encharged of testing uVote voting system from the Italian Minister of homeland security. I met Palantir Technologies where I was introduced to the Intelligence Ecosystem. I decided to amplify my cybersecurity experiences by diving into SCADA security issues with some of the biggest industrial aglomerates in Italy. I finally decided to found Yoroi: an innovative Managed Cyber Security Service Provider developing some of the most amazing cybersecurity defence center I’ve ever experienced! Now I technically lead Yoroi defending our customers strongly believing in: Defence Belongs To Humans.

This analysis and many other studies and tools are available on Marco Ramilli’s blog:

https://marcoramilli.com/2019/07/18/scraping-the-tor-for-rare-contents/

[adrotate banner=”9″] [adrotate banner=”12″]

Pierluigi Paganini

(SecurityAffairs – Tor network, DarkWeb)

[adrotate banner=”5″]

[adrotate banner=”13″]


facebook linkedin twitter

Dark Web Deep Web hacking news information security news Pierluigi Paganini Security Affairs Security News Tor

you might also like

Pierluigi Paganini September 04, 2025
Severe Hikvision HikCentral product flaws: What You Need to Know
Read more
Pierluigi Paganini September 04, 2025
U.S. CISA adds TP-Link Archer C7(EU) and TL-WR841N flaws to its Known Exploited Vulnerabilities catalog
Read more

leave a comment

newsletter

Subscribe to my email list and stay
up-to-date!

    recent articles

    Severe Hikvision HikCentral product flaws: What You Need to Know

    Hacking / September 04, 2025

    U.S. CISA adds TP-Link Archer C7(EU) and TL-WR841N flaws to its Known Exploited Vulnerabilities catalog

    Hacking / September 04, 2025

    Crooks turn HexStrike AI into a weapon for fresh vulnerabilities

    Cyber Crime / September 03, 2025

    Google addressed two Android flaws actively exploited in targeted attacks

    Security / September 03, 2025

    U.S. CISA adds WhatsApp, and TP-link flaws to its Known Exploited Vulnerabilities catalog

    Hacking / September 03, 2025

    To contact me write an email to:

    Pierluigi Paganini :
    pierluigi.paganini@securityaffairs.co

    LEARN MORE

    QUICK LINKS

    • Home
    • Cyber Crime
    • Cyber warfare
    • APT
    • Data Breach
    • Deep Web
    • Digital ID
    • Hacking
    • Hacktivism
    • Intelligence
    • Internet of Things
    • Laws and regulations
    • Malware
    • Mobile
    • Reports
    • Security
    • Social Networks
    • Terrorism
    • ICS-SCADA
    • POLICIES
    • Contact me

    Copyright@securityaffairs 2024

    We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept All”, you consent to the use of ALL the cookies. However, you may visit "Cookie Settings" to provide a controlled consent.
    Cookie SettingsAccept All
    Manage consent

    Privacy Overview

    This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities...
    Necessary
    Always Enabled
    Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
    Non-necessary
    Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
    SAVE & ACCEPT