Stylometric analysis to track anonymous users in the underground

Law enforcement and intelligence agencies, conscious of the high risks posed by cyber threats, have started massive monitoring campaigns: everything must be controlled to avoid unpleasant surprises. The trend is shared by governments around the world, and intelligence agencies are investing heavily, in both money and resources, to define new methods and develop new tools for monitoring social media.

One of the most interesting sources of information is underground forums, places in cyberspace where it is possible to discuss any kind of subject and to acquire or rent any kind of illegitimate software or service for conducting cyber attacks.

It’s clear that these forums, despite the anonymous participation of their users, are a mine of information useful for any kind of investigation. But how can the anonymity of participants be bypassed?

According to an interesting study presented by researcher Sadia Afroz at the latest edition of the Chaos Communication Congress in Germany, the 29C3, up to 80 percent of certain anonymous underground forum users can be identified using linguistics, a figure that is stunning in my opinion. Sadia is a member of the Drexel and George Mason universities research team, together with Aylin Caliskan Islam, Ariel Stolerman, Rachel Greenstadt, and Damon McCoy.

The method adopted by the researchers is based on comparing posts across forums and other social media, such as social networks and blogs.

The researchers stated that every user tends to adopt a personal writing style across their internet activity, a peculiarity that makes them identifiable. The identification is possible thanks to the analysis of “function words”, words that serve to express grammatical relationships with other words within a sentence and are strongly related to the attitude or mood of the speaker.
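
To give an idea of what analyzing function words can look like in practice, here is a minimal Python sketch that turns a text into a vector of relative function-word frequencies. The word list and the name `function_word_profile` are my own illustrative assumptions; the researchers' actual feature set is not described in the talk summary.

```python
import re
from collections import Counter

# Illustrative subset only (an assumption); real stylometric tools
# rely on much longer function-word lists.
FUNCTION_WORDS = ["the", "a", "an", "and", "but", "or", "of", "to",
                  "in", "on", "for", "with", "that", "this", "it",
                  "if", "not", "you"]

def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    # Lowercase the text and keep only alphabetic tokens (plus apostrophes).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return [0.0] * len(FUNCTION_WORDS)
    counts = Counter(tokens)
    # Normalize by the total token count so texts of different
    # lengths remain comparable.
    return [counts[word] / len(tokens) for word in FUNCTION_WORDS]
```

Two texts by the same author would be expected to yield similar vectors even when their topics differ, which is the intuition behind the quotes that follow.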

“If our dataset contains 100 users we can at least identify 80 of them,” “Function words are very specific to the writer. Even if you are writing a thesis, you’ll probably use the same function words in chat messages.” “Even if your text is not clean, your writing style can give you away.” 

Digging into underground forums, it is possible to reveal the identity of a cyber criminal who tries to sell malicious code, or of a terrorist who is trying to communicate within a hidden cell. The study demonstrates the enormous potential of the technique, which could also be used to characterize a single forum and its audience and to discover the relationships of its users with other underground communities, information that is very useful for investigators tracking the online places frequented by particular categories of individuals.

The project is very attractive, and future development and improvement could make it possible to trace anonymous profiles. Researcher Aylin Caliskan Islam anticipated that future versions will include temporal information; according to Information Theory (Metzger, 2007), “timeliness or currency is one of the key 5 aspects that determine a document’s credibility besides relevance, accuracy, objectivity and coverage”.

The technique will make it possible to link the temporal information of a user’s posts with the IP addresses used for the connections, information that helps to locate the physical place used for internet access.

The researchers adopted authorship attribution techniques such as stylometric analysis, which is also used in forensic linguistics, and verified the tracking capability of the method even against automated frameworks like JStylo, used to protect users’ privacy and anonymity.
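
As a rough illustration of authorship attribution, the Python sketch below compares the function-word profile of an unknown text with the profiles of known authors and picks the closest one by cosine similarity. This nearest-profile approach is a simplification I chose for clarity, not the classifier used by the researchers; the word list and the names `profile`, `cosine` and `attribute` are assumptions.

```python
import math
import re
from collections import Counter

# Illustrative function-word list (an assumption, not the study's feature set).
WORDS = ["the", "and", "of", "to", "in", "that", "it", "but", "if", "you"]

def profile(text):
    """Relative frequency of each word in WORDS within `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[word] / total for word in WORDS]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def attribute(unknown_text, known_texts):
    """Return (author, similarity) for the known author closest to `unknown_text`."""
    target = profile(unknown_text)
    scores = {author: cosine(target, profile(text))
              for author, text in known_texts.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A call such as `attribute(post, {"alice": alice_posts, "bob": bob_posts})` returns the candidate whose profile is most similar; a real system would use far more features and a trained classifier, and would need a threshold to decide when no candidate matches.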

Another interesting tool is Anonymouth, an authorship recognition circumvention tool. It is based on the JStylo framework and provides many interesting features, such as an interactive editor that helps users evade authorship recognition by changing their text and writing style, using a dictionary and suggesting synonyms.

JStylo was presented last year at the previous edition of the Chaos Communication Congress in Germany, the 28C3. It is able to obfuscate documents to protect the author’s identity from authorship analysis; one of the main problems for the researchers is in fact detecting writing style deception.

The main methods of circumventing writing style analysis are:

  • Obfuscation – An author attempts to write a document in such a way that their personal writing style will not be recognized.

  • Imitation – An author attempts to write a document such that the writing style will be recognized as that of another specific author.

  • Translation – Machine translation is used to translate a document to one or more languages and then back to the original language.

The technique proposed during the 29C3 was tested across millions of posts from tens of thousands of users of a series of multilingual underground websites including thebadhackerz.com, blackhatpalace.com, www.carders.cc, free-hack.com, hackel1te.info, hack-sector.forumh.net, rootwarez.org, L33tcrew.org and antichat.ru.

The method produced the following results:

  • It discovered up to 300 distinct discussion topics in the forums, related to various malicious activities such as password cracking and black SEO.
  • The technique could be applied only with a minimum text length of 5,000 words, which limits the number of results.
  • To improve research on specific topics such as exploits and drugs, the method needs to separate product information from conversational data, so that machine learning can automate the process.
  • The technique is more effective when posts are translated into English: the success rate rises from 66% to around 80%, but free translation tools like Google and Bing are not effective for the purpose.
  • Leetspeak, an alternative alphabet popular in some forum circles, cannot be translated.
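
The minimum text length mentioned above can be pictured with a short Python sketch that pools the posts of each user and keeps only the users with enough text. It assumes that the 5,000-word minimum applies to the combined posts of a single user, which the talk summary does not spell out; the `(user, text)` input format and the name `eligible_users` are hypothetical.

```python
from collections import defaultdict

MIN_WORDS = 5000  # threshold reported above; per-user pooling is an assumption

def eligible_users(posts, min_words=MIN_WORDS):
    """Pool posts per user and keep users with at least `min_words` words.

    `posts` is an iterable of (user, text) pairs (hypothetical format).
    Returns a dict mapping each eligible user to their combined text.
    """
    pooled = defaultdict(list)
    for user, text in posts:
        pooled[user].append(text)

    result = {}
    for user, texts in pooled.items():
        combined = " ".join(texts)
        # Whitespace splitting is a crude word count, good enough for a sketch.
        if len(combined.split()) >= min_words:
            result[user] = combined
    return result
```

A filter of this kind explains why the threshold limits the number of results: users who post only a few short messages never accumulate enough text to be profiled.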

 

The process is still in its first phase; once the research method has been shown to be useful and efficient, the next step will be to automate it. Researcher Sadia Afroz said:

“We want to automate the whole process.”

“We aren’t trying to identify users, we are trying to show them that this is possible,” she said.

Certainly the next few years will bring some interesting developments. The research, as declared, will include more user-specific features and temporal information, and the methods will be able to identify multiple account holders by combining topic information with authorship data … but be sure that someone is already working in the opposite direction to ensure their anonymity 😉

Pierluigi Paganini

Pierluigi Paganini is a member of the ENISA (European Union Agency for Network and Information Security) Threat Landscape Stakeholder Group and Cyber G7 Group; he is also a Security Evangelist, Security Analyst and Freelance Writer. Editor-in-Chief at "Cyber Defense Magazine", Pierluigi is a cyber security expert with over 20 years of experience in the field, and he is a Certified Ethical Hacker at EC Council in London. The passion for writing and a strong belief that security is founded on sharing and awareness led Pierluigi to found the security blog "Security Affairs", recently named a Top National Security Resource for US. Pierluigi is a member of "The Hacker News" team and writes for some major publications in the field such as Cyber War Zone, ICTTF, Infosec Island, Infosec Institute, The Hacker News Magazine and for many other security magazines. He is the author of the books "The Deep Dark Web" and “Digital Virtual Currency and Bitcoin”.
