The Intercept is doing an okay-if-not-great job writing about the Snowden-leak-based "revelation" that the NSA uses large-scale speech-to-text on the large-scale collection of phone calls it records. These days everyone's Android and iPhone do this all the time; it hardly seems like it should count as a revelation. When I worked at BBN 8 years ago, the NIST-run competitions had worse error rates than we do now, but they were already running on a single machine at 10x faster than real-time. None of this was secret; read http://www.itl.nist.gov/iad/mig/tests/ace/2007/doc/ace07_eval_official_results_20070402.html if you're a glutton for punishment.
But The Intercept's focus on the question "Are they creating a transcript of everything?" is misplaced. If you wanted to find all phone calls that contained the word "bomb", you would be stupid to make your one-best-guess transcript of the recording and then grep for "bomb", missing all the cases where the speech-to-text erred and produced "balm" or "calm" or "slalom" instead. Instead you would have your computer scan the audio, using a model that's specifically trained to look for "bomb" (and the other thousand words you most care about), and flag the bits that sound like they might match, even if your models say "balm" is a slightly more likely transcription.
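To make the difference concrete, here is a toy sketch in Python. It assumes a hypothetical recognizer that gives, for each stretch of audio, a handful of candidate words with probabilities; the data layout, the function names, the numbers, and the 0.2 threshold are all made up for illustration and are not a description of any real system.

```python
from typing import Dict, List

# Hypothetical recognizer output: one dict per audio segment, mapping each
# candidate word to its (made-up) posterior probability.
Segment = Dict[str, float]


def one_best_transcript(segments: List[Segment]) -> List[str]:
    """Keep only the single most likely word for each segment."""
    return [max(seg, key=seg.get) for seg in segments]


def grep_transcript(segments: List[Segment], keyword: str) -> bool:
    """The naive approach: transcribe first, then search the text."""
    return keyword in one_best_transcript(segments)


def spot_keyword(segments: List[Segment], keyword: str,
                 threshold: float = 0.2) -> List[int]:
    """Flag every segment where the keyword is plausible enough,
    even when some other word is the recognizer's top guess."""
    return [i for i, seg in enumerate(segments)
            if seg.get(keyword, 0.0) >= threshold]


# Toy call: in the middle segment "balm" narrowly beats "bomb".
call = [
    {"the": 0.9, "a": 0.1},
    {"balm": 0.45, "bomb": 0.40, "calm": 0.15},
    {"is": 0.8, "was": 0.2},
]

missed = grep_transcript(call, "bomb")   # False: the transcript says "balm"
flagged = spot_keyword(call, "bomb")     # [1]: the middle segment is flagged
```

The grep over the one-best transcript comes back empty because "balm" won by a hair, while the spotter still flags the segment for a closer look. A real keyword spotter would work on acoustic scores or lattices rather than a tidy dict, but the trade is the same: you accept more false alarms in exchange for far fewer misses.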
Sadly, there's ample evidence that the FISA court's attempts at constraining the NSA are just as naive as The Intercept's way of reporting on it. Surely there's nothing stopping the NSA from doing this, and so surely they do it.