Profile

Andrej Karpathy
Worked at Google
Attends Stanford University
Lives in Stanford
6,781 followers | 737,241 views

Stream

Andrej Karpathy

Shared publicly
 
New blog post on "Deep Reinforcement Learning: Pong from Pixels".

Policy Gradients are powerful in certain settings, but further progress in rapid model building is necessary to push beyond them.

http://karpathy.github.io/2016/05/31/rl/
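For readers who haven't seen policy gradients before, the core REINFORCE update can be sketched in a few lines. Everything in this toy example (the environment, the reward, the names) is illustrative and not taken from the post; it uses a logistic policy over two actions and the score-function gradient grad log p(a|x) * reward:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: 4-dim observations; the "correct" action is UP (1)
# exactly when the first feature is positive.
w = np.zeros(4)   # parameters of a logistic policy
lr = 0.1          # learning rate

def p_up(x):
    """Probability the policy assigns to action UP given observation x."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

for step in range(2000):
    x = rng.normal(size=4)
    p = p_up(x)
    a = 1 if rng.random() < p else 0          # sample an action
    r = 1.0 if a == (x[0] > 0) else -1.0      # reward: did we act correctly?
    # REINFORCE: for a Bernoulli/logistic policy, grad of log p(a|x)
    # w.r.t. w is (a - p) * x; scale it by the reward.
    w += lr * r * (a - p) * x

print(w)  # w[0] should come out clearly positive: the policy learned feature 0
```

The same estimator, with a deep network in place of the logistic policy and game frames in place of the toy observations, is what the Pong post builds up to.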
53
18
2 comments
 
Hi, sorry to bother you here. I was just wondering if you ever plan to release the videos of CS231n? Thank you.

Andrej Karpathy

Shared publicly
 
New blog post: Shaking things up a bit with a short story on AI, "A Cognitive Discontinuity".
http://karpathy.github.io/2015/11/14/ai/

Appropriately, names of 2 main characters (Merus, Licia) were sampled from my earlier list of RNN-generated names :)
https://plus.google.com/+AndrejKarpathy/posts/GuutNpJKCUp
27
6
4 comments
 
I see you are into the artistic side of things now. If you like sci-fi/AI/philosophy based art you should check out: https://kissanime.to/Anime/Dimension-W

Andrej Karpathy

Shared publicly
 
#RandomExperimentSundays: I was curious if char-rnn (https://github.com/karpathy/char-rnn) can generate new, fun and plausible baby names. So I got a dataset of 8,000 baby names from an NLP repo (http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/nlp/corpora/names/), trained a 2-layer LSTM and generated some.

To my amusement, many fun, unique names came out, and 90% of them are not found in the training data. Here are 100 example samples that do not occur in the training data:

Rudi
Levette
Berice
Lussa
Hany
Mareanne
Chrestina
Carissy
Marylen
Hammine
Janye
Marlise
Jacacrie
Hendred
Romand
Charienna
Nenotto
Ette
Dorane
Wallen
Marly
Darine
Salina
Elvyn
Ersia
Maralena
Minoria
Ellia
Charmin
Antley
Nerille
Chelon
Walmor
Evena
Jeryly
Stachon
Charisa
Allisa
Anatha
Cathanie
Geetra
Alexie
Jerin
Cassen
Herbett
Cossie
Velen
Daurenge
Robester
Shermond
Terisa
Licia
Roselen
Ferine
Jayn
Lusine
Charyanne
Sales
Sanny
Resa
Wallon
Martine
Merus
Jelen
Candica
Wallin
Tel
Rachene
Tarine
Ozila
Ketia
Shanne
Arnande
Karella
Roselina
Alessia
Chasty
Deland
Berther
Geamar
Jackein
Mellisand
Sagdy
Nenc
Lessie
Rasemy
Guen
Gavi
Milea
Anneda
Margoris
Janin
Rodelin
Zeanna
Elyne
Janah
Ferzina
Susta
Pey
Castina

Here is a much bigger sample: http://cs.stanford.edu/people/karpathy/namesGenUnique.txt

Some of my favorites include "Baby" (haha), "Killie", "Char", "R", "More", "Mars", "Hi", "Saddie", "With" and "Ahbort". Well that was fun.
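The post's model is a 2-layer LSTM trained with char-rnn (Lua/Torch). Just to illustrate the character-level generation idea without Torch, here is a much dumber stand-in: a character-bigram sampler over a tiny toy name list. Everything here (the names, the code) is illustrative, not the post's actual setup:

```python
import random
from collections import defaultdict

# Tiny stand-in for the 8,000-name training set used in the post.
names = ["emma", "olivia", "ava", "isabella", "sophia",
         "mia", "amelia", "harper", "evelyn", "abigail"]

# Character-bigram counts; "^" marks start-of-name, "$" end-of-name.
# (char-rnn uses a 2-layer LSTM instead -- this is the same idea with
# only a 1-character context window.)
counts = defaultdict(lambda: defaultdict(int))
for name in names:
    chars = ["^"] + list(name) + ["$"]
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1

def sample_name(rng):
    """Sample one name character by character from the bigram model."""
    out, ch = [], "^"
    while True:
        nxt, weights = zip(*counts[ch].items())
        ch = rng.choices(nxt, weights=weights)[0]
        if ch == "$" or len(out) > 12:
            break
        out.append(ch)
    return "".join(out)

rng = random.Random(42)
print([sample_name(rng) for _ in range(5)])
```

Like the LSTM samples, many outputs are novel strings not in the training list; the LSTM's longer memory is what makes its names look so much more plausible than a bigram model's.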
58
8
14 comments
 
My friends' names are in this list

Andrej Karpathy

Shared publicly
 
CVPR 2015 papers are now up, so I organized them into my annual pretty interface: http://cs.stanford.edu/people/karpathy/cvpr2015papers/ This year: a new interactive t-SNE map
48
19

Andrej Karpathy

Shared publicly
 
Fooling Linear Classifiers on ImageNet
http://karpathy.github.io/2015/03/30/breaking-convnets/
new blog post with a few interpretations of the fooling-ConvNets papers, along with some experiments on fooling linear classifiers. I tried to structure parts of it so that laymen can understand it, because I've been seeing quite a few misconceptions surrounding the topic online.

(sorry to ppl seeing this twice)
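The linear case is easy to demonstrate in a few lines. This toy sketch uses random weights rather than the ImageNet-trained classifiers from the post, and the class names are made up; the point is only that a tiny per-dimension nudge along sign(w) moves the score a lot:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "classifier": score = w . x, label "cat" if score > 0.
# (Random weights here; the post's experiments use trained ones.)
d = 100
w = rng.normal(size=d)
x = rng.normal(size=d)

def predict(x):
    return "cat" if np.dot(w, x) > 0 else "dog"

before = predict(x)
score = np.dot(w, x)

# Fooling: nudge every input dimension slightly along sign(w), the
# direction that changes the score fastest per unit of max-norm change.
# Pick eps just big enough to cross the decision boundary.
eps = (abs(score) + 1.0) / np.abs(w).sum()
x_fooled = x - eps * np.sign(w) * np.sign(score)

print(before, "->", predict(x_fooled), f"(eps = {eps:.3f} per dimension)")
```

Because the score change is eps times the sum of |w_i|, high-dimensional inputs flip with a per-dimension perturbation far too small to notice, which is the intuition the blog post unpacks.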
40
9
8 comments
 
Thanks!

Andrej Karpathy

Shared publicly
 
Was playing around with optimizing the Caffe forward/backward pass for an AlexNet on LMDB-encoded ImageNet, on a K40 under Ubuntu 12.04. Originally the machine had NVIDIA driver 311 and CUDA 6.0:

initial forward backward average time: 1800ms

move data from harddisk to SSD:
1770ms

install CUDA 6.5 and 340.29 NVIDIA driver:
1535ms
(Caffe website claims 311 driver had some kind of critical issue, which is why we may be seeing this dramatic improvement)

disable ECC, overclock:
(sudo nvidia-smi -i 0 --ecc-config=0
sudo nvidia-smi -pm 1
sudo nvidia-smi -i 0 -ac 3004,875 )
gives 1338ms
(this is comparable to Caffe website, which claims 1325ms)

Compile with cudnn v1:
954ms
A few per-layer breakdowns:
data forward 50ms (suspiciously high)
conv1 forward 50ms, back 54ms
relu1 forward 2.5ms, back 3.7ms
pool1 forward 4.6ms, back: 14ms
norm1 forward 2.5ms (above 4 lines: conv by far most expensive)
conv2,3,4,5 forward: 67ms, 63ms, 47ms, 22ms (relatively uniform)
fc6,fc7,fc8 forward 9ms, 4ms, 1ms (very cheap)

For another comparison, a friend running the same benchmark with Titan Z, cudnnv1, but with leveldb encoding on harddisk got 987ms.

Test CPU: I was also curious about CPU: This machine has 24x [Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz]. Running the same caffe test (caffe compiled to use Intel MKL) shows 45% utilization on average (i.e. ~1100% on $top), and gives 14700ms.
i.e. final GPU performance is 14700/954 = 15.4x faster.

Future work
cudnn v2 is out, but RC2 had a bug with 1x1 convolutions.
cudnn v2 RC3 is now out as of yesterday, but Caffe (on master) doesn't seem to compile with it yet. Apparently they sped up CONV by a lot (40%). Looking forward to this!

(These results were obtained with the Caffe timing script, e.g.:
./build/tools/caffe time --model=models/bvlc_reference_caffenet/train_val.prototxt --gpu 3)
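Tallying the progression (the timings are copied from the steps above; the labels are mine):

```python
# Reported average forward+backward times (ms) for AlexNet on a K40,
# at each optimization step from the post.
timings = {
    "baseline (driver 311, CUDA 6.0, HDD)": 1800,
    "data moved to SSD":                    1770,
    "CUDA 6.5 + driver 340.29":             1535,
    "ECC off + overclock":                  1338,
    "cuDNN v1":                              954,
    "CPU, 24-core Xeon + MKL":             14700,
}

base = timings["baseline (driver 311, CUDA 6.0, HDD)"]
for name, ms in timings.items():
    print(f"{name:40s} {ms:6d} ms  ({base / ms:4.1f}x vs baseline)")

# Final GPU-vs-CPU speedup quoted in the post:
print(round(timings["CPU, 24-core Xeon + MKL"] / timings["cuDNN v1"], 1))  # 15.4
```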
34
7
6 comments
 
Hi +Andrej Karpathy, thanks for sharing. I got Caffe running on the MNIST dataset with cuDNN v2 RC3. It takes 120ms average time, and the forward average time is 0.9ms. It is more or less 7x faster than that on GPU, which got 895.3ms. I guess it would be more significant for the ImageNet dataset.

Andrej Karpathy

Shared publicly
 
Didn't post on G+ in a while, so I thought I'd advertise my recent side project obsession *arxiv-sanity* (http://www.arxiv-sanity.com/top) . This site helps you sort through arxiv papers (especially in Machine Learning): search, sort by similarity, add papers to a library.

It will build you a personalized SVM based on tfidf of all papers in your library, and recommend new ones from Arxiv (recent or not).

I <3 working on this project because arxiv was just out of control.
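The recommendation idea can be sketched in plain Python. This toy version scores unread papers by tfidf cosine similarity to the library rather than fitting an actual SVM, and the paper titles are made up; arxiv-sanity itself fits a per-user SVM over the same kind of tfidf features:

```python
import math
from collections import Counter

# Toy corpus standing in for arxiv abstracts.
papers = {
    "p1": "recurrent neural network language model",
    "p2": "convolutional network image classification",
    "p3": "stochastic gradient descent convergence analysis",
    "p4": "lstm recurrent network sequence learning",
}
library = {"p1"}  # papers the user has saved

def tfidf(corpus):
    """Map each doc to a {word: tf * idf} vector."""
    docs = {k: Counter(v.split()) for k, v in corpus.items()}
    n = len(docs)
    df = Counter(w for c in docs.values() for w in c)  # doc frequency
    return {k: {w: tf * math.log(n / df[w]) for w, tf in c.items()}
            for k, c in docs.items()}

vecs = tfidf(papers)

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Score unread papers by similarity to the library (a linear scorer;
# swapping in an SVM trained on library-vs-rest gives the real thing).
scores = {k: max(cosine(vecs[k], vecs[lib]) for lib in library)
          for k in papers if k not in library}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

With one RNN paper in the library, the other recurrent-network paper ranks first and the unrelated optimization paper last, which is the behavior the site scales up to all of arxiv.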
50
15

Andrej Karpathy

Shared publicly
 
New fun side project blog post on "What a Deep Neural Network thinks about your #selfie".

http://karpathy.github.io/2015/10/25/selfie/
34
9
2 comments
 
Hi, I have studied your convnetjs very carefully. I have some questions about your implementation. Can I ask them via email? You can send me an email in case you do not want to publicize your email address. kim678@illinois.edu
If you can help, it will be very much appreciated.
Thank you!

Andrej Karpathy

Shared publicly
 
Odd experience with ICCV reviews. I was first asked to order ~30 papers based on how qualified I was to review them. I spent a good hour on this; I liked many of them (based on the abstracts) and ended up marking ~15 as papers I was qualified to review.

Yet of my final 5 assigned papers, there are 4 I can't remember ever seeing - not only in my final 15 but even in the wider list of 30. I can't tell if this is a mistake or if it's working as intended. In the end, I'm not at all qualified for 3 of the 5 assigned papers. As a result, it's taking significantly more time and effort to write the reviews, because I need to do my own literature review, and despite my efforts those 3 papers are unlikely to get an informed opinion.

Curious about the experience of others - is this normal?
19
9 comments
 
"I do hope the papers I was eager to review get accepted, because I still want to read them :-)"

Have you checked arxiv? ;-)

Andrej Karpathy

Shared publicly
 
New (epic) blog post on "The Unreasonable Effectiveness of Recurrent Neural Networks": http://karpathy.github.io/2015/05/21/rnn-effectiveness/ - it was immense fun to write.

(sorry to people who are seeing this multiple times)
101
55
9 comments
 
I was using the char-rnn package by +Andrej Karpathy to train a recurrent neural network to communicate using ham radio Morse code "language" (i.e. acronyms, commonly used Q-codes, etc.). After 50 epochs and using very little training material, char-rnn delivered impressive performance - see details in http://ag1le.blogspot.com/2015/11/your-next-qso-partner-artificial.html
Kudos to +Andrej Karpathy for making his code freely available!

Andrej Karpathy

Shared publicly
 
The Final Course Projects for our ConvNet class have been posted (100 ConvNet projects!):
http://cs231n.stanford.edu/reports.html

(sorry to people who are seeing this 2+ times)
99
34
2 comments
 
It is cool! 

Andrej Karpathy

Shared publicly
 
There are several new ImageNet results floating around that beat my 5.1% error rate on ImageNet. Most recently an interesting paper from Google that uses "batch normalization". I wanted to make a few comments regarding "surpassing human-level accuracy". The most critical one is this:

Human accuracy is not a point. It lives on a tradeoff curve.

Estimating the lower bound error
5.1% is an approximate upper bound on human error, achieved by a relatively dedicated labeler who trained on 500 images and was then evaluated on 1500. It is interesting to go further and estimate the lower bound on human error. We can do this approximately, since I have broken down my errors by category: some of them I feel are fixable (by more training, more expert knowledge of dogs, etc.), and some I believe to be relatively insurmountable (e.g. multiple correct answers per image, or an incorrect ground-truth label).

In detail, my human error types were:
1. Multiple correct objects in the image (12 mistakes)
2. Clearly incorrect label ground truth (5 mistakes)
3. Fine-grained recognition error (28 mistakes)
4. Class unawareness error (18 mistakes)
5. Insufficient training data (4 mistakes)
6. Unsorted/misc category (9 mistakes)

For a total of 76 mistakes, giving 76/1500 ~= 0.051 error. Of these, I would argue that 1. and 2. are near insurmountable, while the rest could be further reduced by fine-grained experts (3.) and a longer training period (4., 5.). For an optimistic lower bound, we could drop those errors down to 76 - 28 - 18 - 4 = 26, giving 26/1500 ~= 1.7% error, or even 1.1% if we also drop all of (6.).
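The arithmetic is easy to check in a few lines (the counts are copied from the breakdown above; the labels are abbreviations of the six error types):

```python
# Error breakdown from the post (mistakes out of 1500 images).
mistakes = {
    "multiple correct objects": 12,
    "incorrect ground truth":    5,
    "fine-grained error":       28,
    "class unawareness":        18,
    "insufficient training":     4,
    "unsorted/misc":             9,
}
total = 1500

# Upper bound: all mistakes made by one dedicated labeler.
upper = sum(mistakes.values()) / total
# Lower bound: drop the error types deemed fixable with more
# expertise or training.
fixable = ["fine-grained error", "class unawareness", "insufficient training"]
lower = (sum(mistakes.values()) - sum(mistakes[k] for k in fixable)) / total
# Most optimistic bound: also drop the misc category.
optimistic = (lower * total - mistakes["unsorted/misc"]) / total

print(f"upper bound:      {upper:.1%}")       # 5.1%
print(f"lower bound:      {lower:.1%}")       # 1.7%
print(f"optimistic bound: {optimistic:.1%}")  # 1.1%
```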

In conclusion
When you read the "surpassing-human" headlines, keep in mind that human accuracy is not a point - it's a tradeoff curve. We trade off human effort and expertise against the error rate: I am one point on that curve, at 5.1%. My labmates, with almost no training, are another point, with errors as high as 15%. And based on the above hypothetical calculations, it's not unreasonable to suggest that a group of very dedicated humans might push this down to 2% or so.

That being said, I'm very impressed with how quickly multiple groups have improved from 6.6% down to ~5% and now below! I did not expect to see such rapid progress. It seems that we're now surpassing a dedicated human labeler. And imo, when we get down to 3%, we'll be matching the performance of a hypothetical super-dedicated fine-grained expert human ensemble of labelers.

My blog: 
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
The ILSVRC paper that has more details on human optimistic results:
http://arxiv.org/abs/1409.0575
100
32
3 comments
 
Let's all take a moment to appreciate that we are now doing peer-reviewed research using only Google+ and Arxiv, with a turnaround time of less than a day. Chew on that, PAMI.
People
Have him in circles
6,781 people
Work
Occupation
PhD Student
Employment
  • Google
    Research Intern, 2011 - 2011
Places
Currently
Stanford
Previously
Mountain View - Kosice - Toronto - Vancouver
Story
Tagline
Computer Science PhD student at Stanford. I love technology, robots, and artificial intelligence.
Introduction
Computer Science PhD student at Stanford, working on Machine Learning and Vision. On a quest to solve intelligence.
Education
  • Stanford University
    PhD Computer Science, 2011 - present
  • University of British Columbia
    MSc Computer Science, 2009 - 2011
  • University of Toronto
    BSc Computer Science and Physics, 2005 - 2009
Basic Information
Gender
Male