RHINEHART, VoiceRT, how NSA converted spoken words into text

Pierluigi Paganini May 07, 2015

A new collection of documents leaked by Snowden reveals how US intelligence converts spoken conversations into indexable text with the RHINEHART and VoiceRT tools.

Following the disclosure of top-secret documents provided by Edward Snowden, everyone is aware of the risks of using any sort of communication method; what was once private no longer is. But do people know that even spoken conversations (on the telephone) can be converted to text and stored to be searched later?

To understand how it’s done, let’s go back to the 1970s, when the Defense Advanced Research Projects Agency (DARPA) began funding research in speech recognition. That funding led to several projects able to convert speech into text. The early systems were slow, but later improvements made them faster and able to handle more data.

Dan Kaufman, the director of DARPA’s Information Innovation Office, said that converting voice to text is “super hard,” that “there’s a lot of noise on the signal,” and added, “I would tell you we are not very good at that.” He also said, however, that “we’re getting pretty good at being able to do these types of translations.”

After 2001, investment in spying tools increased massively, and from one of the leaked documents provided by Snowden we know that a decade ago NSA analysts were already celebrating the creation of a “Google for Voice”.

No tool yet produces a perfect transcription of natural conversations, and as far as we know that day is still to come. The tools that exist today nevertheless do a pretty good job: they perform extensive keyword searches, extract voice conversations, and use complex algorithms to flag conversations that may be of interest.

The documents leaked by Snowden reveal that US intelligence has used these tools in war zones such as Iraq and Afghanistan, as well as in Latin America, but nothing in them reports their use to spy on US citizens.

Individuals worldwide should be concerned because US intelligence operated under the radar; it seems that not even the US Congress was aware of these activities.

“I think people don’t understand that the economics of surveillance have totally changed,” Jennifer Granick, director of civil liberties at the Stanford Center for Internet and Society, told The Intercept.

She added: “Once you have this capability, then the question is: How will it be deployed? Can you temporarily cache all American phone calls, transcribe all the phone calls, and do text searching of the content of the calls?” “It may not be what they are doing right now, but they’ll be able to do it.” “How would we ever know if they change the policy?” “We don’t have any idea how many innocent people are being affected, or how many of those innocent people are also Americans.”

The tools and their history

Since the terrorist attacks of 2001, huge amounts of voice communications have been collected, and automated tools of this kind are used to process them.

The first generation of tools was codenamed “RHINEHART,” and it was deployed for the first time in 2004.

An internal NSA memo from 2006, titled “For Media Mining, the Future Is Now!”, reported:

“Voice word search technology allows analysts to find and prioritize intercept based on its intelligence content.”

The memo went on to say that RHINEHART was “designed to support both real-time searches, in which incoming data is automatically searched by a designated set of dictionaries, and retrospective searches, in which analysts can repeatedly search over months of past traffic.”
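To make the two search modes concrete, here is a minimal sketch in Python. It is purely illustrative and is not how RHINEHART works internally, which the leaked memo does not describe. It assumes the speech has already been converted to text, and every name in it (`TranscriptIndex`, `watchlist`, the `cut-001` identifiers, the sample sentences) is hypothetical.

```python
from collections import defaultdict


class TranscriptIndex:
    """Toy inverted index over already-transcribed text (hypothetical)."""

    def __init__(self, watchlist):
        # The "dictionary" of terms checked automatically on arrival.
        self.watchlist = {w.lower() for w in watchlist}
        # Maps each word to the ids of the transcripts that contain it.
        self.postings = defaultdict(set)

    def ingest(self, doc_id, text):
        """Real-time mode: index a transcript and return its watchlist hits."""
        words = set(text.lower().split())
        for word in words:
            self.postings[word].add(doc_id)
        return sorted(words & self.watchlist)

    def search(self, word):
        """Retrospective mode: look a word up across everything stored so far."""
        return sorted(self.postings.get(word.lower(), set()))


index = TranscriptIndex(watchlist=["invoice", "shipment"])
hits_a = index.ingest("cut-001", "the shipment arrives on friday")
hits_b = index.ingest("cut-002", "see you on friday")
friday_cuts = index.search("friday")
```

In this sketch the first transcript is flagged on arrival because it contains a watchlist term, while the later lookup for "friday" finds both stored transcripts even though that word was never on the watchlist. That difference is what makes retrospective search over months of stored traffic so powerful.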

In 2006, “RHINEHART” was being used “across a wide variety of missions and languages.”

In 2009, a new tool called “VoiceRT” emerged. It was first used in Baghdad and was “designed to index and tag 1 million cuts per day.”

[Image: NSA slide on RHINEHART and VoiceRT]


In 2011 and 2012, a new tool called “SPIRITFIRE” was created to replace “VoiceRT”. It could handle even more data, and faster, being “a more robust voice processing capability based on speech-to-text keyword search and paired dialogue transcription.”

From the NSA memo we also learn that “RHINEHART” was used by Persian-speaking analysts and that Spanish was the most mature language for speech-to-text conversion.

To finish the article, I would just like to add that we, as citizens of the world, need to demand more respect for our rights, regulation of these kinds of tools, and legislation to protect us from our governments.

About the Author Elsio Pinto

Elsio Pinto is currently the Lead McAfee Security Engineer at Swiss Re, and he also has knowledge in the areas of malware research, forensics, and ethical hacking. He has previous experience at major institutions, the European Parliament being one of them. He is a security enthusiast and tries his best to pass on his knowledge. He also runs his own blog: http://high54security.blogspot.com/

Edited by Pierluigi Paganini

(Security Affairs –  US Intelligence, eavesdropping, RHINEHART, VoiceRT)
