## Profile

Sai Rahul

Worked at Veveo (India) Pvt Ltd

Attended IIT Kharagpur

Lives in Bangalore

245 followers | 29,440 views

Education

- IIT Kharagpur, M.Tech, 2004 - 2006
- Sree Vidyanikethan Engineering College, B.Tech, 2000 - 2004

Basic Information

Gender

Male

Work

Employment

- Veveo (India) Pvt Ltd

Places

Currently

Bangalore

Previously

Chandragiri - Tirupati - Kharagpur - Chandragiri - Hyderabad

## Stream

### Sai Rahul

Shared publicly - **Extinct Giant Insects Rediscovered and Rescued!**

Even if you are grossed out by the picture, the story is wonderful -- that is if you like science, adventure, and people's desire to help non-human animals.

#science #conservation #entomology #insects #discovery

Source: http://www.npr.org/blogs/krulwich/2012/02/24/147367644/six-legged-giant-finds-secret-hideaway-hides-for-80-years

The insect is so large — as big as a human hand — it's been dubbed a "tree lobster." It was thought to be extinct, but some enterprising entomologists scoured a barren hunk of rock in the middle of the ocean and found surviving Lord Howe Island stick insects.

### Sai Rahul

Shared publicly - The world is changing! Microsoft made .NET open source, and it is under the MIT license! http://blogs.msdn.com/b/dotnet/archive/2014/11/12/net-core-is-open-source.aspx 2 amazing things in 2 days!

### Sai Rahul

Shared publicly - Interesting book

About: Probabilistic programming languages (PPLs) unify techniques for the formal description of computation and for the representation and use of uncertain knowledge. PPLs have seen recent interest from the artificial intelligence, programming languages, cognitive science, and natural languages ...

### Sai Rahul

Shared publicly - **24, the Monster, and quantum gravity**

Think of a prime number other than 2 or 3. Multiply the number by itself and then subtract 1. The result is a multiple of 24. This observation might appear to be a curiosity, but it turns out to be the tip of an iceberg, with far-reaching connections to other areas of mathematics and physics.

This result works for more than just prime numbers. It works for any number that is **relatively prime** to 24. For example, 25 is relatively prime to 24, because the only positive number that is a factor of both of them is 1. (An easy way to check this is to notice that 25 is not a multiple of 2 or of 3.) Squaring 25 gives 625, and 625=(24x26)+1.

A mathematician might state this property of the number 24 as follows:

*If m is relatively prime to 24, then m^2 is congruent to 1 modulo 24.*

One might ask if any numbers other than 24 have this property. The answer is “yes”, but the only other numbers that exhibit this property are 12, 8, 6, 4, 3, 2 and 1; in other words, the factors of 24.
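These claims are easy to check by brute force. The following is a minimal sketch: the helper names `has_unit_square_property` and `is_prime`, and the search bounds of 500 and 1000, are assumptions made for this illustration and do not come from the post.

```python
from math import gcd


def has_unit_square_property(n: int) -> bool:
    """True if m*m is congruent to 1 modulo n for every m in 1..n coprime to n."""
    return all((m * m - 1) % n == 0 for m in range(1, n + 1) if gcd(m, n) == 1)


def is_prime(k: int) -> bool:
    """Trial-division primality test, adequate for small k."""
    if k < 2:
        return False
    return all(k % d for d in range(2, int(k ** 0.5) + 1))


# Every n up to an arbitrary bound that shares the "defining property of 24".
numbers_with_property = [n for n in range(1, 501) if has_unit_square_property(n)]

# For primes other than 2 and 3, p*p - 1 should always be a multiple of 24.
primes_ok = all((p * p - 1) % 24 == 0 for p in range(5, 1000) if is_prime(p))

print(numbers_with_property)
print(primes_ok)
```

Within the searched range, the list contains exactly the factors of 24, and the prime check holds for every prime tested. A finite search like this illustrates the statement but does not prove it.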

The mathematicians **John H. Conway** and **Simon P. Norton** used this property of 24 in their seminal 1979 paper entitled *Monstrous Moonshine*. In the paper, they refer to this property as “the defining property of 24”. The word “monstrous” in the title is a reference to the **Monster group**, which can be thought of as a collection of more than 8x10^53 symmetries; that is, 8 followed by 53 other digits. The word “moonshine” refers to the perceived craziness of the intricate relationship between the Monster group and the theory of modular functions.

The existence of the Monster group, M, was not proved until shortly after Conway and Norton wrote their paper. It turns out that the easiest way to think of M in terms of symmetries of a vector space over the complex numbers is to use a vector space of dimension **196883**. This number is close to another number that is related to the **Leech lattice**. The Leech lattice can be thought of as a stunningly efficient way to pack unit spheres together in **24** dimensional space. In this arrangement, each sphere will touch **196560** others. The closeness of the numbers 196560 and 196883 is not a coincidence and can be explained using the theory of monstrous moonshine.

It is now known that lying behind monstrous moonshine is a certain **conformal field theory** having the Monster group as symmetries. In 2007, the physicist **Edward Witten** proposed a connection between monstrous moonshine and **quantum gravity**. Witten concluded that pure gravity with maximally negative cosmological constant is dual to the Monster conformal field theory. This theory predicts a value for the semiclassical entropy estimate for a given black hole mass, in the large mass limit. Witten's theory estimates the value of this quantity as the natural logarithm of 196883, which works out at about 12.19. As a comparison, the work of **Jacob Bekenstein** and **Stephen Hawking** gives an estimate of 4π, which is about 12.57.
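The two numerical estimates quoted above can be reproduced with a couple of lines of arithmetic. This is only a sanity check of the quoted figures, not a derivation of either entropy formula. The variable names are made up for this example.

```python
import math

# Estimate attributed to Witten's theory: the natural logarithm of 196883.
witten_estimate = math.log(196883)

# Estimate attributed to Bekenstein and Hawking: 4 * pi.
bekenstein_hawking_estimate = 4 * math.pi

print(f"ln(196883) = {witten_estimate:.2f}")
print(f"4 * pi     = {bekenstein_hawking_estimate:.2f}")
```

Rounded to two decimals, the values are 12.19 and 12.57, which agrees with the figures in the post.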

**Relevant links**

Wikipedia on the Monster group: http://en.wikipedia.org/wiki/Monster_group

Wikipedia on the Leech lattice: http://en.wikipedia.org/wiki/Leech_lattice

Wikipedia on Monstrous Moonshine: http://en.wikipedia.org/wiki/Monstrous_moonshine

A 2004 survey paper about Monstrous Moonshine by **Terry Gannon**: http://arxiv.org/abs/math/0402345

#mathematics #physics #sciencesunday

### Sai Rahul

Shared publicly - A new paper on machine translation, by +Oriol Vinyals and +Quoc Le and myself, on using large deep LSTMs to translate English to French by directly generating translations from the model. The LSTM maps the entire input sentence to a big vector and then produces a translation from that vector. Our LSTM beats a good phrase-based baseline by 1.5 BLEU points on the entire test set (34.8 vs 33.3), where this performance measure penalizes our model on out-of-vocabulary words.

Surprisingly, the model "just works" on long sentences, because the LSTM's combined hidden state is very large (8k dimensions), and because we reversed the order of the words in the source sentence. By reversing the source sentences, we introduce many short term dependencies which make the optimization problem much easier for gradient descent. The final trick was to simply use a large vocabulary and to train the model for a long time.
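The source-reversal trick is a pure preprocessing step, so it can be sketched without any model at all. The snippet below is a hypothetical illustration: the `reverse_source` helper, the `SentencePair` alias, and the toy English/French pair are assumptions of this example and are not taken from the paper's code.

```python
from typing import List, Tuple

SentencePair = Tuple[List[str], List[str]]


def reverse_source(pair: SentencePair) -> SentencePair:
    """Reverse the source tokens; leave the target tokens in their original order."""
    source, target = pair
    return list(reversed(source)), target


# Toy example pair (made up for illustration).
pair = (["the", "cat", "sat"], ["le", "chat", "s'est", "assis"])
reversed_pair = reverse_source(pair)

print(reversed_pair[0])
print(reversed_pair[1])
```

After reversal, the first source word ("the") is the last token the encoder reads, so it sits right next to the first target word ("le") that the decoder has to produce. That proximity is one way to picture the "short term dependencies" mentioned above.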

Our results further confirm the "deep learning hypothesis": a big deep neural network can solve pretty much **any** problem, provided it has a very big high quality labelled training set. And if the results aren't good enough, it's because the model is too small or because it didn't train properly.

The paper will be presented at NIPS 2014.

https://arxiv.org/abs/1409.3215

Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general ...

### Sai Rahul

Shared publicly - A really nice write-up of how to gently introduce some graph theory concepts to younger students.

### Sai Rahul

Shared publicly - Came across this wonderful talk by Tom Mitchell. It is about how our brain represents language and whether we can decode it, and also whether the representation is the same across different people. It's very, very early days, but we can easily imagine Matrix-style stuff, i.e. learning how to fly a chopper without actual training :)

https://www.youtube.com/watch?v=C5WzCRBfObs

Some highlights:

A decoder trained on one person's neural activity is used to decode another person's. The probability of success is better than chance, meaning the representation is similar.

If a person sees a hammer, there is significant activity in the premotor cortex, which is involved in movement.

Neural representation is similar across people, languages, and word vs. picture stimuli. It's easier to decode concrete and emotion nouns, but harder to decode verbs and abstract nouns such as "justice".

### Sai Rahul

Shared publicly - Time to celebrate humans! Rosetta took 10 years to reach the comet. http://images-cdn.9gag.com/photo/avZGG4q_460sa_v1.gif … #CometLanding

### Sai Rahul

Shared publicly - **Learning to Execute** and **Neural Turing Machines**

I'd like to draw your attention to two papers that have been posted in the last few days from some of my colleagues at Google that I think are pretty interesting and exciting:

Learning to Execute: http://arxiv.org/abs/1410.4615

Neural Turing Machines: http://arxiv.org/abs/1410.5401

The first paper, "Learning to Execute", by +Wojciech Zaremba and +Ilya Sutskever attacks the problem of trying to train a neural network to take in a small Python program, one character at a time, and to predict its output. For example, as input, it might take:

"i=8827

c=(i-5347)

print((c+8704) if 2641<8500 else 5308)"

During training, the model is given that the desired output for this program is "12184". During inference, though, the model is able to generalize to completely new programs and does a pretty good job of learning a simple Python interpreter from examples.
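To see what such a training pair looks like, the example program can be run through a real Python interpreter to produce the target string. This is a sketch for illustration only: the use of `exec` with captured standard output, and the variable names, are assumptions of this example and say nothing about how the paper generated its data.

```python
import contextlib
import io

# The example program from the post, as the text the model would read.
program = (
    "i=8827\n"
    "c=(i-5347)\n"
    "print((c+8704) if 2641<8500 else 5308)\n"
)

# The network sees the program one character at a time.
input_chars = list(program)

# Run the snippet in a fresh namespace and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(program, {})

target = buffer.getvalue().strip()
print(len(input_chars), "input characters")
print("target:", target)
```

Here `c` is 8827 - 5347 = 3480, the condition `2641<8500` is true, and the printed value is 3480 + 8704 = 12184.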

The second paper, "Neural Turing Machines", by +alex graves, Greg Wayne, and +Ivo Danihelka from Google's DeepMind group in London, couples an external memory ("the tape") with a neural network in a way that the whole system, including the memory access, is differentiable from end-to-end. This allows the system to be trained via gradient descent, and the system is able to learn a number of interesting algorithms, including copying, priority sorting, and associative recall.
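One way to picture differentiable memory access is a soft, content-based read: instead of picking a single memory row, the read blends all rows, weighted by their similarity to a query key. The sketch below is a heavily simplified illustration in plain Python. The function names, the toy memory, and the `sharpness` value are assumptions of this example, and it leaves out writes and the rest of the addressing scheme.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def soft_read(memory: List[List[float]], key: List[float],
              sharpness: float = 10.0) -> Tuple[List[float], List[float]]:
    """Return attention weights over memory rows and the weighted-average read."""
    weights = softmax([sharpness * cosine(row, key) for row in memory])
    width = len(memory[0])
    read = [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
    return weights, read


# Toy memory with three rows (made up for illustration).
memory = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
weights, read = soft_read(memory, key=[1.0, 0.0])
print(weights)
print(read)
```

Because every step is a smooth function of the key and the memory contents, gradients can flow through the read, which is what makes training by gradient descent possible.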

Both of these are interesting steps along the way of having systems learn more complex behavior, such as learning entire algorithms, rather than being used for just learning functions.

(Edit: changed link to Learning to Execute paper to point to the top-level Arxiv HTML page, rather than to the PDF).

Abstract: We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be ...

### Sai Rahul

Shared publicly - The Unreasonable Effectiveness of Data - Alon Halevy, Peter Norvig, and Fernando Pereira.

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35179.pdf

Interesting comment.

"For many tasks, words and word combinations provide all the representational machinery we need to learn from text. Human language has evolved over millennia to have words for the important concepts; let's use them. Abstract representations (such as clusters from latent analysis) that lack linguistic counterparts are hard to learn or validate and tend to lose information."

+Rakesh Barve Are they ruling out SVD and other such approaches here?

### Sai Rahul

Shared publicly - What features do you want in a smartwatch?

Vibration if you haven't checked in your code in 2 hrs. Large electric shock if you checked in with conflicts :D

http://www.reddit.com/r/apple/comments/2g5927/the_apple_watch_had_about_30_apps_on_it_what_kind/

Looking again at the keynote I saw about 30 apps on the home screen. What kind of 3rd party apps and features would absolutely sell on on this produc...
