Memex – The new search tool that can also dig into the Deep Web

Pierluigi Paganini February 10, 2015

DARPA has publicly presented, for the first time, a new set of search tools called Memex, which will also improve searches of the “Deep Web”.

In 2014, the U.S. Defense Advanced Research Projects Agency (DARPA) launched the MEMEX project to design advanced search tools that could also be used to scan the Deep Web, which isn’t indexed by Google and other commercial search engines.

The Memex search engine was conceived to allow searches of non-indexed content, an operation that in most cases is still carried out manually by intelligence agencies.

“DARPA has launched the Memex program. Memex seeks to develop the next generation of search technologies and revolutionize the discovery, organization and presentation of search results. The goal is for users to be able to extend the reach of current search capabilities and quickly and thoroughly organize subsets of information based on individual interests. Memex also aims to produce search results that are more immediately useful to specific domains and tasks, and to improve the ability of military, government and commercial enterprises to find and organize mission-critical publically available information on the Internet” states the official page of the project.

A few days ago, the security community had its first opportunity to take a look at the MEMEX system when DARPA opened the doors of its laboratories to journalists from Scientific American. The agency provided a preview of the powerful search engine and explained its use in the fight against cybercrime.

MEMEX search engine

The Pentagon’s research agency gave Scientific American a preview of the software and gave 60 Minutes an exclusive look at the technology. The researchers explained that there is an impressive amount of data that is not considered useful to ordinary web users but that represents a crucial source of information for law enforcement and intelligence agencies.

“That leaves untouched a multitude of information that may not be valuable to the average Web surfer but could provide crucial information to investigators,” states the article published by Scientific American.

Most of the information in the Deep Web is unstructured data gathered from multiple sources that ordinary search engines cannot crawl. The most popular subset of the Deep Web is the Tor network, an anonymizing network that is accessible only through specific software.

“We’re envisioning a new paradigm for search that would tailor indexed content, search results and interface tools to individual users and specific subject areas, and not the other way around,” said Chris White, DARPA program manager. “By inventing better methods for interacting with and sharing information, we want to improve search for everybody and individualize access to information. Ease of use for non-programmers is essential.”

DARPA involved 17 different teams of researchers, drawn from academia and private industry, to develop the most advanced technologies for inclusion in the MEMEX program. The ambitious project aims to revolutionize the way we search for and present information from a larger pool of sources, including content on the Deep Web.

As we have explained on many occasions, the Deep Web is often abused by criminals who run a multitude of illegal activities, including child pornography, human trafficking, drug dealing, and other cybercriminal activities.

According to several reports, including one published by researchers at Carnegie Mellon University, the New York District Attorney’s (NYDA) Office is one of several bureaus and agencies that have already used earlier versions of the Memex system to collect information on human trafficking cases and prosecute criminals.

In a video interview, Chris White, the inventor of the Memex search engine, explained how this amazing platform works and which kinds of information it is able to access.

MEMEX search engine CBS video

“The internet is much, much bigger than people think,” White said. “By some estimates, Google, Microsoft Bing, and Yahoo only give us access to around 5% of the content on the Web.” White also highlighted that the Deep Web represents a privileged environment for bad actors and their illegal affairs.

Particularly eloquent were the remarks of another figure involved in the MEMEX project, DARPA innovation head Dan Kaufman.

“The easiest way to think about Memex is: How can I make the unseen seen?” commented Kaufman. “Most people on the internet are doing benign and good things,” Kaufman said. “But there are parasites that live on there, and we take away their ability to use the internet against us – and make the world a better place.”

At the moment, several organizations are testing Memex, including two district attorneys’ offices, a law enforcement agency, and a nongovernmental organization. In the next phase, access to the platform will be extended to a larger number of beta testers to stress-test the application’s capabilities.

“The next set of testing begins in a few weeks and will include federal and district prosecutors, regional and national law enforcement and multiple NGOs. One of the main objectives of this round is to test new image search capabilities that can analyze photos even when portions that might aid investigators—including traffickers’ faces or a television screen in the background — are obfuscated,” Scientific American reports. “Another goal is to try out different user interfaces and to experiment with streaming architectures that assess time-sensitive data.”

White stressed the potential of a set of tools like MEMEX and the improvements they bring to searches conducted by law enforcement, both for investigations and for crime prevention. He also explained that the system will never hack any service on the Internet in order to retrieve information.

“White made several key decisions about the type of data Memex could access in an effort to steer clear of the controversy around government access to private citizen information and communications, a particularly touchy subject since Edward Snowden’s National Security Agency revelations beginning in June 2013. If something is password protected, it is not public content and Memex does not search it, according to White.” reported the Scientific American.

“We didn’t want to do hacking,” White added. “We didn’t want to cloud this work unnecessarily by dragging in the specter of snooping and surveillance.” 

As usually happens with special projects born in the intelligence community, one day we may all benefit from the services offered by MEMEX and its searches, because we are drowning in a sea of information.

“Memex would ultimately apply to any public domain content; initially, DARPA intends to develop Memex to address a key Defense Department mission: fighting human trafficking. Human trafficking is a factor in many types of military, law enforcement and intelligence investigations and has a significant web presence to attract customers. The use of forums, chats, advertisements, job postings, hidden services, etc., continues to enable a growing industry of modern slavery. An index curated for the counter-trafficking domain, along with configurable interfaces for search and analysis, would enable new opportunities to uncover and defeat trafficking enterprises. Memex plans to explore three technical areas of interest: domain-specific indexing, domain-specific search, and DoD-specified applications.”

(Security Affairs – MEMEX project, DARPA)
