GeekedIn service exposed 8 million GitHub profiles online

Pierluigi Paganini November 19, 2016

The GeekedIn recruitment project scraped user data from GitHub and other similar websites, but data were inadvertently leaked online.

The popular security expert Troy Hunt, who operates the data breach notification service the owner ‘Have I Been Pwned,’ recently received a 600 Mb MongoDB backup file containing data from a tech recruitment website called GeekedIn.

The analysis of the data dump revealed the presence of more than 8 million GitHub profiles, including names, email addresses, locations and other data.

“On Saturday, a character in the data trading scene popped up and sent me a 594MB file called geekedin.net_mirror_20160815.7z. It was allegedly a MongoDB backup from August belonging to a site I’d not heard of before, one called GeekedIn and they apparently do this:” reported Troy Hunt.

The archive includes over one million of valid email addresses, the remaining part is composed of email “[email protected]” evidently associated with GitHub accounts with no public email address.

The database also contains thousands of accounts apparently taken from BitBucket.

GeekedIn service crawls code hosting repositories, including GitHub and BitBucket, and creates profiles for open-source projects and developers. The GeekedIn data could be used by recruiters to hire developers whose skills match their needs.

The data gathered by the service are public and does not include any sensitive information such as passwords, but GitHub does allow users to use its data for commercial purposes.

The data was stored in a MongoDB database that was exposed on the internet and accessible to anyone.

“And again, yes, you can go and pull this data publicly on a per-individual basis but the constant response I got from close confidants I shared this information with is that ‘it just feels wrong’. And it is wrong, not just the scraping of GitHub in the first place in order to commercialise our information, but then subsequently losing it via a MongoDB with no password and now having it float around the web in data breach trading circles.” wrote Hunt.

At the time I was writing GeekedIn website is down.

In the last year, we observe several cases related to MongoBD databases poorly configured and exposed online.  In July 2015, John Matherly, the creator of Shodan search engine for connected devices, revealed that many MongoDB administrators have exposed something like 595.2 terabytes of data by using bad poor configurations, or un-patched versions of the MongoDB.

[adrotate banner=”9″]

Pierluigi Paganini

(Security Affairs – GeekedIn, hacking)



you might also like

leave a comment