Patrick Nguyen
Patrick's posts

Post has attachment
Disclaimer: the hand is not actually drawing.

Thank you, Google, thank you, for spell-checking my email in whichever language I happen to type it.

Post has attachment
Coming out of stealth.

Post has attachment
Mind-blowing. There's a recording of the Virtual Rachmaninoff on Spotify.

Post has shared content
Yep. That's the speech button right there.
Just a hint for +Google ... anyone else agree?

Post has shared content
More data is better data.
Large Scale Language Modeling in Automatic Speech Recognition by +Ciprian Chelba 

At Google, we’re able to use the large amounts of data made available by the Web’s fast growth. Two such data sources are anonymized search queries and the web itself. They help improve automatic speech recognition through large language models: Voice Search makes use of the former, whereas YouTube speech transcription benefits significantly from the latter. 

The language model is the component of a speech recognizer that assigns a probability to the next word in a sentence given the previous ones. As an example, if the previous words are “new york”, the model would assign a higher probability to “pizza” than, say, “granola”. The n-gram approach to language modeling (predicting the next word based on the previous n-1 words) is particularly well-suited to such large amounts of data: it scales gracefully, and the non-parametric nature of the model allows it to grow with more data. For example, on Voice Search we were able to train and evaluate 5-gram language models consisting of 12 billion n-grams, built using large vocabularies (1 million words), and trained on as many as 230 billion words. 
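
To make the n-gram idea concrete, here is a minimal sketch in Python. It is a toy illustration, not the system described in the post: the four-sentence corpus, the function names, and the add-alpha smoothing are all assumptions chosen for brevity, and a production model would use far more data and a more careful smoothing scheme.

```python
from collections import Counter, defaultdict


def train_ngram_counts(sentences, n):
    """Count how often each word follows each (n-1)-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad the start so the first words also have a full context.
        words = ["<s>"] * (n - 1) + sentence.lower().split() + ["</s>"]
        for i in range(n - 1, len(words)):
            context = tuple(words[i - n + 1:i])
            counts[context][words[i]] += 1
    return counts


def next_word_probability(counts, context, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | context).

    The smoothing is an illustrative assumption, not the method from the post.
    """
    context_counts = counts.get(tuple(context), Counter())
    total = sum(context_counts.values())
    return (context_counts[word] + alpha) / (total + alpha * vocab_size)


# Hypothetical toy corpus, invented for this example.
corpus = [
    "new york pizza is great",
    "best new york pizza",
    "new york granola bar",
    "i love new york pizza",
]

trigram_counts = train_ngram_counts(corpus, n=3)
# Vocabulary: every distinct word plus the end-of-sentence marker.
vocab = {w for s in corpus for w in s.lower().split()} | {"</s>"}

p_pizza = next_word_probability(trigram_counts, ("new", "york"), "pizza", len(vocab))
p_granola = next_word_probability(trigram_counts, ("new", "york"), "granola", len(vocab))
print(p_pizza > p_granola)
```

Because the model is just a table of counts, adding more text adds more entries to the table, which is the sense in which a non-parametric model grows with the data.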

The computational effort pays off, as highlighted by the plot below: both word error rate (a measure of speech recognition accuracy) and search error rate (a metric we use to evaluate the output of the speech recognition system when used in a search engine) decrease significantly with larger language models. A more detailed summary of results on Voice Search and a few YouTube speech transcription tasks, written by +Ciprian Chelba, +Dan Bikel, +Masha Shugrina, +Patrick Nguyen and Shankar Kumar, presents our results when increasing both the amount of training data and the size of the language model estimated from such data. Depending on the task, the availability and amount of training data used, the language model size, and the performance of the underlying speech recognizer, we observe relative reductions in word error rate of between 6% and 10%, for systems on a wide range of operating points.
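
For readers unfamiliar with the metric, the sketch below shows one common way to compute word error rate (word-level edit distance divided by the number of reference words) and what a "relative" reduction means. The function names and the example figures (a 20% baseline dropping to 18%) are hypothetical and chosen only to illustrate the arithmetic; they are not results from the post.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction of the baseline error rate that was removed."""
    return (baseline - improved) / baseline


# One substituted word out of three reference words.
wer = word_error_rate("new york pizza", "new york granola")

# Hypothetical figures: a baseline WER of 20% improved to 18%
# is a 2-point absolute drop but a 10% relative reduction.
reduction = relative_reduction(0.20, 0.18)
print(round(wer, 3), round(reduction, 3))
```

The distinction matters when reading the numbers above: a 6% to 10% relative reduction is measured against each system's own starting error rate, not as an absolute drop in percentage points.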

Cross-posted with the Research Blog: