Experts found an unsecured 16TB database containing 4.3B professional records

Pierluigi Paganini December 14, 2025

An unsecured 16TB database exposed 4.3B professional records and was closed only after researchers alerted the owner.

A 16TB unsecured MongoDB database exposed about 4.3 billion professional records, mainly LinkedIn-style data that could enable large-scale AI-driven social-engineering attacks. Researcher Bob Diachenko and nexos.ai discovered the unsecured database on November 23, 2025, and it was secured two days later.

At this time, it is impossible to know who accessed it beforehand.

Cybernews staff analyzed the unsecured database, finding nine collections, with each name most likely indicating the type of information contained within. Below are the collections contained in the dataset:

  • intent – 2,054,410,607 docs (604.76 GB)
  • profiles – 1,135,462,992 docs (5.85 TB)
  • unique_profiles – 732,412,172 docs (5.63 TB)
  • people – 169,061,357 docs (3.95 TB)
  • sitemap – 163,765,524 docs (20.22 GB)
  • companies – 17,302,088 docs (72.9 GB)
  • company_sitemap – 17,301,617 docs (3.76 GB)
  • address_cache – 8,126,667 docs (26.78 GB)
  • intent_archive – 2,073,723 docs (620 MB)
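The headline figures can be checked against the list above with a short tally. The sketch below uses only the counts and sizes as reported; the dictionary layout, the function names, and the use of decimal units (1 TB = 1,000 GB) are assumptions made for illustration.

```python
# Document counts and sizes exactly as listed for each collection.
COLLECTIONS = {
    "intent": (2_054_410_607, 604.76, "GB"),
    "profiles": (1_135_462_992, 5.85, "TB"),
    "unique_profiles": (732_412_172, 5.63, "TB"),
    "people": (169_061_357, 3.95, "TB"),
    "sitemap": (163_765_524, 20.22, "GB"),
    "companies": (17_302_088, 72.9, "GB"),
    "company_sitemap": (17_301_617, 3.76, "GB"),
    "address_cache": (8_126_667, 26.78, "GB"),
    "intent_archive": (2_073_723, 620, "MB"),
}

# Assumption: decimal units. The report does not say which convention was used.
UNIT_TO_GB = {"MB": 0.001, "GB": 1.0, "TB": 1000.0}


def total_docs(collections):
    """Sum the document counts across all collections."""
    return sum(docs for docs, _, _ in collections.values())


def total_size_tb(collections):
    """Sum the collection sizes and return the result in terabytes."""
    size_gb = sum(size * UNIT_TO_GB[unit] for _, size, unit in collections.values())
    return size_gb / 1000.0


if __name__ == "__main__":
    print(f"documents: {total_docs(COLLECTIONS):,}")
    print(f"size: {total_size_tb(COLLECTIONS):.2f} TB")
```

The document total comes to 4,299,916,747, which matches the rounded "4.3 billion" figure, and the sizes add up to roughly 16 TB.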

At least three collections exposed roughly two billion personal records, including names, emails, phone numbers, LinkedIn links, job roles, employers, work history, education, locations, skills, languages, and social accounts. The “unique_profiles” collection alone listed over 732 million records with image URLs. Another collection, “people,” added enrichment metrics and Apollo IDs linked to the Apollo.io ecosystem, with no signs of an Apollo breach.

“According to our researchers, all records within a specific collection are unique. However, there could be duplicates between different collections within the exposed dataset,” reported Cybernews.

“While different collections contain different sets of information, the researchers confirmed that at least three of them, profiles, unique_profiles, and people, contained personally identifiable information (PII).”
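As a rough illustration, the three PII-bearing collections named in the quote can be summed from the document counts listed earlier. The variable names here are hypothetical, and because duplicates may exist between collections, the sum is an upper bound on distinct individuals rather than a count of them.

```python
# Document counts for the three collections confirmed to contain PII.
PII_COLLECTIONS = {
    "profiles": 1_135_462_992,
    "unique_profiles": 732_412_172,
    "people": 169_061_357,
}

# Sum of the document counts across all nine listed collections.
TOTAL_DOCS = 4_299_916_747

# Upper bound only: records may be duplicated between collections.
pii_total = sum(PII_COLLECTIONS.values())
pii_share = pii_total / TOTAL_DOCS

if __name__ == "__main__":
    print(f"PII-bearing documents: {pii_total:,}")
    print(f"share of all documents: {pii_share:.1%}")
```

The three collections hold 2,036,936,521 documents in total, a little under half of everything in the database.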

Cybernews reported that it is difficult to determine the age of the LinkedIn data. Timestamps show records were collected or updated in 2025, but some data may date back years, including possible scrapes from large LinkedIn leaks claimed by threat actors in 2021.

The ownership of the leaked dataset remains unconfirmed. Researchers found clues suggesting a lead‑generation company, as sitemap records linked “/people” and “/company” paths to its website. The firm claims access to over 700 million professionals, closely matching the exposed “unique_profiles” count, and the database went offline a day after notification. Still, researchers stopped short of attribution, noting the company itself may have been scraped.

The leak is dangerous because such a massive, structured dataset enables targeted attacks, including phishing and CEO fraud, corporate reconnaissance, and large-scale AI-driven campaigns. With billions of records, criminals can automate personalized scams, cut preparation time, and focus on high-value targets, including Fortune 500 employees.

“Large language models (LLMs) are capable of generating personalized messages based on user profile information. With some additional effort, tens of millions of malicious emails can be sent to victims, and it only takes one high-value target for the whole operation to be profitable for the attacker,” concludes Cybernews.

“Large datasets like this one are a prime target for malicious actors, as they act as a strong foundational base for profile enrichment based on other data leaks, enabling malicious actors to craft a large, searchable database of personal data that, after enrichment, could also include passwords, device identifiers, links to other social media, etc. Such datasets simplify social engineering and credential stuffing attacks,” the Cybernews researchers explained.

(SecurityAffairs – hacking, 16TB database)