Nickolay Shmyrev

Google continues with end-to-end models. There are quite a few interesting points here; one is the subword-based model, which is probably the core piece for capturing phonetic variation.

https://arxiv.org/pdf/1712.01769.pdf

https://research.googleblog.com/2017/12/improving-end-to-end-models-for-speech.html
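
To make "subword-based" concrete, here is a toy sketch of splitting words into subword units by greedy longest match. This is not the paper's method: the `segment` helper and the `VOCAB` inventory are made up for illustration, and a real system would learn its unit inventory from data.

```python
def segment(word, vocab):
    """Greedy longest-match-first split of `word` into subword units.

    Returns None when some part of the word is not covered by `vocab`.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is a known unit.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return None  # no unit matches at this position
        pieces.append(word[start:end])
        start = end
    return pieces


# Hypothetical toy inventory, invented for this example only.
VOCAB = {"recogn", "ition", "ize", "r", "s", "speech"}

print(segment("recognition", VOCAB))
print(segment("recognizers", VOCAB))
```

Related words end up sharing units ("recogn" here), so the model can reuse what it learned about one word for another instead of treating each as an unrelated symbol.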

Overall this is great, but the data is 48 kHz audio encoded as 64 kbps MP3. I wonder if that is intentional.
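
A rough back-of-the-envelope check of why that combination looks odd. This sketch reads the MP3 figure as 64 kbit/s and assumes 16-bit mono PCM as the uncompressed reference; both are assumptions, not facts about the dataset.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


raw_kbps = pcm_bitrate_kbps(48000)   # assumed 16-bit mono reference
mp3_kbps = 64                        # assumed reading of the MP3 bitrate
compression_ratio = raw_kbps / mp3_kbps

print(raw_kbps, compression_ratio)
```

Under these assumptions the encoder has to discard roughly eleven twelfths of the raw bitrate, so much of what the high sample rate could carry is unlikely to survive.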

The bell rings for Nuance
Good documentation helps create good clinical care. Can the voice recognition technologies already available in Google Assistant, Google Home, and Google Translate be used to help doctors summarize notes more quickly?

One interesting detail in Google's speech commands demo is that the input features are called a "fingerprint", even though the demo uses pretty standard MFCCs. The unusual term suggests that the production system uses some kind of audio fingerprinting for feature extraction, maybe the wavelet fingerprinting they describe in their papers.

https://github.com/tensorflow/tensorflow/blob/0e983318f711055448c66be6706a6238c866b784/tensorflow/examples/speech_commands/models.py#L51
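
For a sense of scale, the "fingerprint" in such a setup is just the flattened MFCC matrix: frames times coefficients. The sketch below works out that size. The `fingerprint_size` helper is written for this example, and its default values (16 kHz audio, 1 s clips, 30 ms window, 10 ms stride, 40 coefficients) are assumptions, not values read from the linked commit.

```python
def fingerprint_size(sample_rate=16000, clip_duration_ms=1000,
                     window_size_ms=30.0, window_stride_ms=10.0,
                     dct_coefficient_count=40):
    """Return (frames, flattened feature length) for one audio clip.

    All default values are assumptions made for illustration.
    """
    desired_samples = int(sample_rate * clip_duration_ms / 1000)
    window_size_samples = int(sample_rate * window_size_ms / 1000)
    window_stride_samples = int(sample_rate * window_stride_ms / 1000)
    length_minus_window = desired_samples - window_size_samples
    if length_minus_window < 0:
        frames = 0  # clip is shorter than a single analysis window
    else:
        frames = 1 + length_minus_window // window_stride_samples
    # One MFCC vector per frame, flattened into a single vector.
    return frames, frames * dct_coefficient_count


frames, size = fingerprint_size()
print(frames, size)
```

With these assumed settings the result is a 98 x 40 matrix, so nothing about the size itself hints at anything beyond ordinary MFCC frames.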

Awesome deep neural networks for voice conversion (voice style transfer) in TensorFlow
"Speaking like Kate Winslet"
