Andrej Karpathy
Worked at Google
Attends Stanford University
Lives in Stanford
6,174 followers | 615,203 views


Andrej Karpathy

Shared publicly  - 
CVPR 2015 papers are now up, so I organized them into my annual pretty interface … This year: a new interactive t-SNE map

Andrej Karpathy

Shared publicly  - 
Fooling Linear Classifiers on ImageNet
A new blog post with a few interpretations of the fooling-ConvNets papers, along with some experiments on fooling linear classifiers. I tried to structure parts of it so that laymen could understand it, because I've been seeing quite a few misconceptions surrounding the topic online.

(sorry to ppl seeing this twice)
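Since the post is about fooling linear classifiers, here is a minimal sketch of the basic trick: nudge the input along the sign of the weight vector to drive up an arbitrary class score. The weights and image below are random stand-ins, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear classifier for one class: score = w . x + b.
D = 3072                       # e.g. a flattened 32x32x3 image
w = rng.normal(0.0, 0.01, D)   # stand-in weights (a real model would be trained)
b = 0.0
x = rng.uniform(0.0, 1.0, D)   # the image we want to "fool"

# For a linear score the gradient w.r.t. the input is just w, so a small
# step along sign(w) (clipped to the valid pixel range) raises the score
# as much as possible for a bounded per-pixel change.
eps = 0.05
x_fooled = np.clip(x + eps * np.sign(w), 0.0, 1.0)

score_before = w @ x + b
score_after = w @ x_fooled + b
print(score_before < score_after)  # True: the tiny perturbation raises the score
```

Note the perturbation is imperceptibly small per pixel (at most 0.05 on a [0, 1] scale), which is exactly what makes these examples surprising.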

Andrej Karpathy

Shared publicly  - 
Was playing around with optimizing the Caffe forward/backward pass for an AlexNet on LMDB-encoded ImageNet, on a K40 under Ubuntu 12.04. Originally the machine had NVIDIA driver 311 and CUDA 6.0:

initial forward backward average time: 1800ms

move data from hard disk to SSD:

install CUDA 6.5 and the 340.29 NVIDIA driver:
(The Caffe website claims the 311 driver had some kind of critical issue, which is why we may be seeing this dramatic improvement.)

disable ECC, overclock:
(sudo nvidia-smi -i 0 --ecc-config=0
sudo nvidia-smi -pm 1
sudo nvidia-smi -i 0 -ac 3004,875 )
gives 1338ms
(this is comparable to Caffe website, which claims 1325ms)

Compile with cuDNN v1:
A few per-layer breakdowns:
data forward 50ms (suspiciously high)
conv1 forward 50ms, back 54ms
relu1 forward 2.5ms, back 3.7ms
pool1 forward 4.6ms, back 14ms
norm1 forward 2.5ms (of the above layers, conv is by far the most expensive)
conv2,3,4,5 forward: 67ms, 63ms, 47ms, 22ms (relatively uniform)
fc6,fc7,fc8 forward: 9ms, 4ms, 1ms (very cheap)

For another comparison, a friend running the same benchmark with Titan Z, cudnnv1, but with leveldb encoding on harddisk got 987ms.

CPU test: I was also curious about the CPU. This machine has 24x [Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz]. Running the same Caffe test (compiled to use Intel MKL) shows 45% utilization on average (i.e. ~1100% in top), and gives 14700ms.
i.e. the final GPU setup is 14700/954 = 15.4x faster.
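A one-liner sanity check on that speedup, using the two timings quoted above:

```python
cpu_ms = 14700   # 24-core Xeon E5-2620, Caffe compiled with Intel MKL
gpu_ms = 954     # final K40 setup with cuDNN v1
speedup = cpu_ms / gpu_ms
print(round(speedup, 1))  # 15.4
```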

Future work
cuDNN v2 is out, but RC2 had a bug with 1x1 convolutions.
cuDNN v2 RC3 is now out as of yesterday, but Caffe (on master) doesn't seem to compile with it yet. Apparently they sped up conv by a lot (40%). Looking forward to this!

(These results are obtained with the Caffe timing script, e.g.:
./build/tools/caffe time --model=models/bvlc_reference_caffenet/train_val.prototxt --gpu 3)
Hi +Andrej Karpathy, thanks for sharing. I got Caffe running on the MNIST dataset with cuDNN v2 RC3. It takes 120ms average time, and the forward average time is 0.9ms. It is more or less 7x faster than the plain GPU run, which got 895.3ms. I guess it would be even more significant on the ImageNet dataset.

Andrej Karpathy

Shared publicly  - 
There are several new ImageNet results floating around that beat my 5.1% error rate on ImageNet. Most recently an interesting paper from Google that uses "batch normalization". I wanted to make a few comments regarding "surpassing human-level accuracy". The most critical one is this:

Human accuracy is not a point. It lives on a tradeoff curve.

Estimating the lower bound error
5.1% is an approximate upper bound on human error, achieved by a relatively dedicated labeler who trained on 500 images and was then evaluated on 1500. It is interesting to go further and estimate the lower bound on human error. We can do this approximately, since I have broken down my errors by category, some of which I feel are fixable (by more training, more expert knowledge of dogs, etc.), and some of which I believe to be relatively insurmountable (e.g. multiple correct answers per image, or an incorrect ground truth label).

In detail, my human error types were:
1. Multiple correct objects in the image (12 mistakes)
2. Clearly incorrect label ground truth (5 mistakes)
3. Fine-grained recognition error (28 mistakes)
4. Class unawareness error (18 mistakes)
5. Insufficient training data (4 mistakes)
6. Unsorted/misc category (9 mistakes)

For a total of 76 mistakes, giving 76/1500 ~= 0.051 error. Of these, I would argue that 1. and 2. are near insurmountable, while the rest could be further reduced by fine-grained experts (3.) and a longer training period (4., 5.). For an optimistic lower bound, we could drop these errors down to 76 - 28 - 18 - 4 = 26, giving 26/1500 ~= 1.7% error, or even 1.1% if we also drop all of (6.).
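The bound arithmetic above is easy to reproduce; the counts and the fixable-vs-insurmountable grouping are exactly the ones from the list:

```python
# Error counts from the breakdown above, out of 1500 evaluated images.
errors = {
    "multiple_correct_objects": 12,   # 1. near insurmountable
    "incorrect_ground_truth": 5,      # 2. near insurmountable
    "fine_grained": 28,               # 3. reducible with expert knowledge
    "class_unawareness": 18,          # 4. reducible with longer training
    "insufficient_training_data": 4,  # 5. reducible with longer training
    "misc": 9,                        # 6. unsorted
}
n = 1500

total = sum(errors.values())  # 76 mistakes -> the 5.1% upper bound
optimistic = total - (errors["fine_grained"]
                      + errors["class_unawareness"]
                      + errors["insufficient_training_data"])  # 26 mistakes
very_optimistic = optimistic - errors["misc"]                  # 17 mistakes

print(round(total / n * 100, 1))            # 5.1
print(round(optimistic / n * 100, 1))       # 1.7
print(round(very_optimistic / n * 100, 1))  # 1.1
```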

In conclusion
When you read the "surpassing-human" headlines, we should all keep in mind that human accuracy is not a point - it's a tradeoff curve. We trade off human effort and expertise against the error rate: I am one point on that curve, at 5.1%. My labmates, with almost no training, are another point, with up to 15% error. And based on the hypothetical calculations above, it's not unreasonable to suggest that a group of very dedicated humans might push this down to 2% or so.

That being said, I'm very impressed with how quickly multiple groups have improved from 6.6% down to ~5% and now even below! I did not expect to see such rapid progress. It seems that we're now surpassing a dedicated human labeler. And imo, when we are down to 3%, we'd be matching the performance of a hypothetical super-dedicated fine-grained-expert human ensemble of labelers.

My blog:
The ILSVRC paper that has more details on the optimistic human results:
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, ...
Let's all take a moment to appreciate that we are doing peer-reviewed research using only Google+ and arXiv, with a turnaround time of less than a day. Chew on that, PAMI.

Andrej Karpathy

Shared publicly  - 
I'm working on class notes / assignments for our upcoming ConvNet class, CS231n. By far the hardest challenge I'm coming across is deciding on my target audience:

I want the notes to be broadly useful even to people who might not have a lot of ML background, but I'm now realizing that as a result they seem way too dumbed down, leaving a more experienced reader frustrated. Conversely, I want to throw in a lot of nuggets of interesting subtle problems/issues and details, but many of them require quite a lot of background expertise and intuition that sometimes seems outside the scope of the class.

For example, I spend 2 pages motivating the SVM loss and then gloss over the (more complex) Softmax loss, which involves logs, exps, and probabilities, in 2 paragraphs. Or I discuss k-Nearest Neighbor over 3 pages and then in the next section plunge into SGD and talk about convex problems and optimization dynamics. It's all inconsistent, and it's giving me quite the headache.
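For reference, the two losses being contrasted can be sketched for a single example as follows (the class scores are made up; this follows the standard multiclass SVM and softmax formulations, not the course notes verbatim):

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    # Multiclass SVM (hinge) loss for one example: each wrong class is
    # penalized by how far it is from trailing the correct class's score
    # by at least the margin `delta`.
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0
    return margins.sum()

def softmax_loss(scores, y):
    # Softmax (cross-entropy) loss: negative log-probability of the
    # correct class; the max score is subtracted for numerical stability.
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[y]

scores = np.array([3.2, 5.1, -1.7])  # made-up class scores for one image
y = 0                                # index of the correct class
print(round(svm_loss(scores, y), 2))      # 2.9
print(round(softmax_loss(scores, y), 2))  # 2.04
```

The SVM loss only cares that the correct score beats the others by a margin, while the softmax loss always wants the correct class's probability pushed toward 1, which is part of why the latter takes more machinery to motivate.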

What I'm most worried about is a problem I sometimes see in other classes: they try to target everyone and end up targeting no one.

Andrej Karpathy

Shared publicly  - 
Next quarter Fei-Fei and I will be teaching a new Stanford CS class (CS231n) on ConvNets for Visual Recognition (classification/localization/detection). There will be a strong focus on hands-on programming assignments and practical implementation tips/tricks. Over the course of the class, the students will write their own ConvNets in Python and in the end compete on ImageNet (or maybe a subset of it). We will be making a lot of the materials (code on GitHub, course notes, slides, videos) available online for others to follow along.

The exact title is:

CS231n: Convolutional Neural Networks for Visual Recognition

I hope to help build this out into a really good resource, and I'm looking forward to seeing how it turns out!

Andrej Karpathy

Shared publicly  - 
ImageNet 2014 Results are out!
New York Times article: 

and the raw results page:

Very exciting results! TL;DR: a large number of teams participated (a 50% increase from last year), classification error is down to 6.7% (~half of last year's 11%!), and detection AP is now up to 43.9 from 22.5!
In his circles
355 people
Have him in circles
6,174 people

Andrej Karpathy

Shared publicly  - 
My new (epic) blog post, "The Unreasonable Effectiveness of Recurrent Neural Networks", was immense fun to write

(sorry to people who are seeing this multiple times)
Absolutely fascinating. Increase the number of parameters 1000 times and feed it 100+ books on algebraic geometry and it might start to understand schemes. 

Andrej Karpathy

Shared publicly  - 
The Final Course Projects for our ConvNet class have been posted (100 ConvNet projects!):

(sorry to people who are seeing this 2+ times)
It is cool! 

Andrej Karpathy

Shared publicly  - 
I was approached by a Wired reporter who was very excited about the idea of a Human vs. Machine competition in image recognition (after he read my blog post from a while ago reporting an estimate of human accuracy on ImageNet). My initial reaction was to agree to a chat, since I'm a priori eager to share the excitement in the field, talk about the progress we're collectively making, etc. This article came from that:

I don't have much experience with talking to the media, but I'm starting to understand some of the subtleties and dangers of the process. I gave what I thought was a thorough and detailed explanation of the work, its context, and my takeaways, but all of that ends up getting spun around, fluffed up, and transformed in (potentially scary) undefined ways, with little control over the final outcome or its factual correctness. In this particular case:

- The visualization above is not of a ConvNet. It's a t-SNE embedding of CNN representations of the images. But okay, close, I suppose.
- Apparently I am competing against Google AI, not against ILSVRC state of the art image classification model.
- The article alludes to my previous attempt to evaluate human accuracy on CIFAR-10 in 2011, implying that CIFAR-10 was the "standard image recognition test", when in fact ImageNet/PASCAL VOC were around just as they are today, and CIFAR-10 was considered a toy even back then.
- The accuracy on CIFAR-10 is not comparable to ImageNet hit@5 accuracy, though the article does very briefly mention that it isn't "apples to apples".
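To make that mismatch concrete, here is a sketch of the hit@5 criterion versus plain top-1 accuracy on toy data; even a random guesser scores roughly five times higher under hit@5 with 100 classes, so the two numbers measure different things:

```python
import numpy as np

def hit_at_k(scores, labels, k=5):
    # scores: (N, C) class scores; labels: (N,) true class indices.
    # ImageNet's hit@k criterion: a prediction counts as correct if the
    # true label is among the k highest-scoring classes.
    topk = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))

rng = np.random.default_rng(0)
scores = rng.normal(size=(1000, 100))    # toy: 1000 "images", 100 classes
labels = rng.integers(0, 100, size=1000)

# With purely random scores, top-1 accuracy is about 1/100 while hit@5
# is about 5/100, so the two metrics are not apples to apples.
print(hit_at_k(scores, labels, k=1))
print(hit_at_k(scores, labels, k=5))
```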

I suppose it could have been worse. The core idea, that we are making rapid progress but there is still a long way to go, is in there somewhere. I'm just happy that there are no overly hyperbolic claims of Deep Learning AI solving all of computer vision and intelligence, no mentions of consciousness, and that my repeated and defensively over-emphasized pleas not to make it sound like computer vision is now solved have materialized into at least one paragraph at the end.

EDIT: I agree with Sancho in the comments that the points above are relatively subtle, and that the take-aways in the article are "true enough". This post mostly documents my reaction to seeing "science" -> "popular science": what the lossy process looks like and what some of its properties are.
A Stanford graduate student recently took on Google's best image recognition AI software. He won. But it took some effort.
+Andrej Karpathy That was exactly my thought. That this insight into the process makes you wonder about everything you read, even in scientific reporting. 
That's also why I think a response is important. I think the media need to be aware of this issue when they write articles and realize that they should always ponder if "close enough" is really "close enough" for each article they write. It's a very slippery slope otherwise.

Andrej Karpathy

Shared publicly  - 
I rendered the NIPS 2014 papers in pretty format again. Also includes a new feature to filter by day of the poster, which might be useful (?)

This is now the 5th conference page I've made in this format, and I'm becoming relatively efficient at it. Every year there are a few unforeseen complications and new fun parsing puzzles to solve. The first page took a bit more than a day, but now I'm down to ~2 hours per conference thanks to all the code snippets and script vestiges accumulated in this one folder :)
It's possible to link the images to the corresponding page in the PDF by appending "#page=PN" to the URL of the PDF.

Andrej Karpathy

Shared publicly  - 
I should probably link to this on my G+ as well (Sorry to people who are seeing this multiple times)
New blog post: What I learned from competing against a ConvNet on ImageNet

#computervision   #deeplearning
great post
PhD Student
  • Google
    Research Intern, 2011 - 2011
Map of the places this user has lived
Mountain View - Kosice - Toronto - Vancouver
Computer Science PhD student at Stanford. I love technology, robots, and artificial intelligence
Computer Science PhD student at Stanford, working on Machine Learning and Vision. On a quest to solve intelligence.
  • Stanford University
    PhD Computer Science, 2011 - present
  • University of British Columbia
    MSc Computer Science, 2009 - 2011
  • University of Toronto
    BSc Computer Science and Physics, 2005 - 2009