Nickolay Shmyrev
Lives in Moscow
Nickolay Shmyrev

Shared publicly  - 
 
I spent the whole day reading about discriminative training in various posts and publications. It's a shame that most of the explanations are confusing at best or plain wrong. A good sign of a confusing post is that the model parameters are not mentioned as part of the distribution: the author writes P(x, c) instead of P(x, c | theta). Finally found a good source:

Discriminative models, not discriminative training
by Tom Minka
http://research.microsoft.com/pubs/70229/tr-2005-144.pdf
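
To make the notation point concrete, here is a short sketch of the two objectives with the parameters written out. The symbols (a training set of pairs x_i, c_i and a second parameter set theta') are introduced here for illustration, and the last line is how I read Minka's note rather than a quotation from it.

```latex
% Joint ("generative") maximum likelihood: one parameter set for the joint.
\hat\theta_{\mathrm{joint}} = \arg\max_{\theta} \prod_i P(x_i, c_i \mid \theta)

% What is usually called "discriminative training" of the same family:
\hat\theta_{\mathrm{cond}} = \arg\max_{\theta} \prod_i P(c_i \mid x_i, \theta),
\qquad
P(c \mid x, \theta) = \frac{P(x, c \mid \theta)}{\sum_{c'} P(x, c' \mid \theta)}

% The second objective is ordinary maximum likelihood in a different model,
% where the inputs get their own parameters \theta':
P(x, c \mid \theta, \theta') = P(c \mid x, \theta)\, P(x \mid \theta')
```

Without theta in the notation, the two objectives look like two ways of fitting the same P(x, c), which is exactly the confusion described above.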

Nickolay Shmyrev

Shared publicly  - 
 
So https://www.kaggle.com/c/billion-word-imputation is over. The winning approach by https://www.kaggle.com/users/2809/eric-jackson is described here: https://www.kaggle.com/c/billion-word-imputation/forums/t/14210/one-million-monkeys-approach. Pretty fun: no neural networks were used. However, the solution was tuned to the evaluation metric, which I believe does not really reflect the purpose of the task. Since character edit distance was used as the metric, the solution preferred short words over long words. Nice ideas though, and an honest win.
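
A small sketch of why a character-level edit distance favours short guesses. The sentence and the candidate words are made up for illustration, and the function is a textbook Levenshtein distance, not the competition's scoring code.

```python
def levenshtein(a, b):
    """Character-level edit distance (insert, delete, substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


# Hypothetical example: the true sentence and three ways of filling the gap.
truth = "he walked to the station"
no_guess = "he to the station"                   # leave the gap empty
short_wrong = "he is to the station"             # wrong, but short
long_wrong = "he congratulated to the station"   # wrong and long

for candidate in (no_guess, short_wrong, long_wrong):
    print(levenshtein(candidate, truth), candidate)
```

A wrong long word costs at least the difference in length, while a wrong short word can never cost much more than the missing word itself, so under uncertainty the metric rewards short insertions.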
Tony Robinson:
Ah, great! This was my first intro to Kaggle; simple mistakes were made in setting up the task, but most were quickly rectified. The metric was clearly stated, and whilst I agree it wasn't particularly relevant, it was within scope of the Kaggle team at the time, and it did lend itself to several interesting optimisations. There is much more that can be done on this task; sadly my group ran out of time last year. We did offer to make our n-best lists available, and if anyone wants to keep working on this, the offer still stands. The rnnlm models do give much better results than n-grams, but it's the time taken to make a proper engineering solution that rightly won the day.

Nickolay Shmyrev

Shared publicly  - 
 
This one is nice too

Nickolay Shmyrev

Shared publicly  - 
How ambition turned a student into entrepreneur (Zhang Yi, 2015-03-11)
Liu Qingfeng, chairman of USTC iFlytec and a deputy in the National People's Congress. [Photo/tech.qq.com]
When Liu Qingfeng was a first-year PhD student 15 years ago, dreams of making piles...

Nickolay Shmyrev

Shared publicly  - 
 
The BBC has started an evaluation campaign on a large 1,600-hour dataset:
http://www.mgb-challenge.org. Somehow I miss interesting news in my feeds.

Nickolay Shmyrev

Shared publicly  - 
 
Interesting note about NVIDIA presentations. Overall, GPU vs CPU is a complex question, and it becomes even more complex on mobile. For many matrix algorithms that are traditionally solved on GPUs, a slightly more complex heuristic-based CPU version exists with superior performance, so the advantage of the GPU is really questionable. On the other hand, GPU processing speed lets the system stay at low voltage for a significant amount of time, which reduces power consumption. As far as I understand, the GPU is not yet accessible through OpenCL on iOS, which is another issue.
 
A quick lesson in how to understand BS marketing materials in HPC:
Step 1) Look up the launch date of the processors being compared.
Step 2) Look up the price of the processors being compared.

Here we have a performance comparison of an NVIDIA K40 GPU to an Intel Core i7-3930K CPU (I have no idea if more than one core was used; marketing people at NVIDIA seem to be blissfully unaware of the OMP_NUM_THREADS environment variable).

First, the K40 launched in November 2013, so it's approximately two years newer than the CPU.  A more appropriate comparison would be an Intel Core i7-4930K or an Intel Xeon E5-2697V2 released only one quarter before the K40.

Second, the Intel CPU to which they compared costs less than $600, whereas the K40 GPU costs in excess of $3000 (according to NewEgg and Amazon).  I'm not an accountant, but that seems like a price difference worth noting.

Sometimes you have to ask yourself: should I start computing today, or should I wait 2 years and spend 5x the money to run no more than 6.5x faster? Because running 5-25% faster per dollar with hardware that is two years newer doesn't exactly impress me.
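
The per-dollar arithmetic can be sketched as follows. The prices are the round figures quoted above ($600 and $3000); the 5.25x and 6.25x speedups are my back-calculation from the "5-25% faster per dollar" remark, not numbers stated in the post.

```python
def speedup_per_dollar(speedup, old_price, new_price):
    """Relative performance per dollar of the new part versus the old one."""
    return speedup / (new_price / old_price)


CPU_PRICE = 600.0    # "less than $600" in the post, rounded up
GPU_PRICE = 3000.0   # "in excess of $3000" in the post, rounded down

# 5.25x and 6.25x are assumed speedups; 6.5x is the upper bound from the post.
for speedup in (5.25, 6.25, 6.5):
    gain = speedup_per_dollar(speedup, CPU_PRICE, GPU_PRICE) - 1.0
    print(f"{speedup:.2f}x faster -> {gain:+.0%} performance per dollar")
```

Since the real CPU price is below $600 and the real GPU price is above $3000, the price ratio is larger than 5 and the per-dollar gain is smaller than these rounded figures suggest.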

References:

http://ark.intel.com/products/63697/Intel-Core-i7-3930K-Processor-12M-Cache-up-to-3_80-GHz
http://www.nvidia.com/object/tesla-servers.html
The NVIDIA cuSOLVER library provides a collection of dense and sparse direct solvers which deliver significant acceleration for Computer Vision, CFD, Computational Chemistry, and Linear Optimization applications.

Nickolay Shmyrev

Shared publicly  - 
 
Very impressive, both TTS and ASR

Nickolay Shmyrev

Shared publicly  - 
 
I ported pocketsphinx-android to #Gradle. It looks like a very strong advantage for the Android ecosystem now, because complex project dependencies can be tracked in an elegant way. Looks like Xcode is way behind here.

Nickolay Shmyrev

Shared publicly  - 
 
Happy to understand why non-convex classifiers are required for acoustic frames. Basically, one frame is not enough to classify the phoneme accurately; you need to observe the frames before and after it, and the more, the better. Modern DNN classifiers use around 20 frames (0.2 seconds). A standalone phone sound is convex; however, with a large fixed window like this, you get other phonemes in the window. Such an object is non-convex because the previous phoneme can start at frame 5 or at frame 10. This is why an advanced classifier like a DNN is required. It also suggests that context-dependency handling should be more complex for DNNs than it was for GMMs, since the context is larger. An alternative would be an acoustic classifier with a variable context range, something like landmark detection.
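
A minimal sketch of the fixed-window effect, assuming a 10 ms frame shift and a symmetric context of 10 frames on each side (21 frames, roughly the 0.2 seconds mentioned above). The phone labels and durations are made up for illustration.

```python
def context_window(frames, center, left=10, right=10):
    """Return frames[center-left .. center+right], repeating edge frames as padding."""
    last = len(frames) - 1
    return [frames[min(max(i, 0), last)]
            for i in range(center - left, center + right + 1)]


def boundary_offsets(window):
    """Positions inside the window where the phone label changes."""
    return [i for i in range(1, len(window)) if window[i] != window[i - 1]]


# Hypothetical frame-level labels: 8 frames of "a", 6 of "b", 12 of "c".
labels = ["a"] * 8 + ["b"] * 6 + ["c"] * 12

# Frames 10 and 12 both belong to phone "b", yet their windows differ.
for center in (10, 12):
    window = context_window(labels, center)
    print(center, labels[center], sorted(set(window)), boundary_offsets(window))
```

Both center frames have the same target class, but the neighbouring phones enter the window at different offsets, which is the variability the classifier has to absorb.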
2 comments
 
Perhaps what we have in mind is the phonetic representation of a phoneme as three states, the second of which is a steady-state process. This second state differs significantly from the transient states 1 and 3. Therefore, this elementary unit of speech can conventionally be called convex.
Currently: Moscow
Previously: Astrakhan
Gender: Male
Other names: Николай Шмырёв