Talks by Facebook AI Research members at CVPR in Columbus, Ohio:

Monday 6/23: Deep Learning tutorial (Grand Ballroom 2)
9:00-10:00: Marc'Aurelio Ranzato: Supervised learning
17:15-18:00: Yann LeCun: Structured Prediction

Monday 6/23, Perceptual Organization Workshop (C115):
11:30: Yann LeCun: Hierarchy, Reasoning, and Representation Learning
16:40: Lubomir Bourdev: Do Mid-Level Parts Still Matter in the Age of CNNs? 

Monday 6/23, Scene Understanding Workshop (C214-215)
17:50: Marc'Aurelio Ranzato

Wednesday 6/25: Oral Session 4B Attribute-based Recognition and Human Pose Estimation
13:45: PANDA: Pose Aligned Networks for Deep Attribute Modeling. Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev

Wednesday 6/25: Oral Session 5A, Face and Gesture
16:15: DeepFace: Closing the Gap to Human-Level Performance in Face Verification. Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf

Saturday 6/28: Large-Scale Visual Recognition Tutorial
13:30-14:30: Marc'Aurelio Ranzato: Large-scale visual recognition with deep learning

Saturday 6/28, Web-scale vision and Social Media
9:10: Marc'Aurelio Ranzato (invited talk)

Saturday 6/28, Big Vision workshop
16:30: Yann LeCun: Toward a Universal Perception System.
Old demo of DrLIM (Dimensionality Reduction by Learning an Invariant Mapping) from 2006.

DrLIM is a "metric learning" criterion for training ML systems (including deep architectures) to produce an embedding. It can be applied to so-called "siamese architectures" in which two identical learning machines (sharing the same parameters) are shown two examples. When the examples are semantically similar (e.g. two portraits of the same person), the distance between the output vectors is decreased. When the examples are semantically distinct, the output vectors are pushed away with a force that decreases with distance (often a hinge).
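
A minimal sketch of the criterion described above, assuming the contrastive form from the DrLIM paper: half the squared distance for similar pairs, and half the squared hinge max(0, m - D) for dissimilar pairs. The linear `embed` function, the `margin` value, and all names here are hypothetical and only for illustration; a real system would use a deep network for the embedding.

```python
import math

def embed(x, weights):
    # Hypothetical linear embedding standing in for the learning machine.
    # Both branches of the siamese pair call this with the SAME weights.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def euclidean_distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def contrastive_loss(out_a, out_b, similar, margin=1.0):
    # Similar pair: penalize any distance, pulling the outputs together.
    # Dissimilar pair: penalize only distances below the margin, so the
    # repulsive "force" shrinks as the outputs move apart and vanishes
    # once they are at least `margin` apart (the hinge).
    d = euclidean_distance(out_a, out_b)
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def siamese_loss(x_a, x_b, similar, weights, margin=1.0):
    # Two identical machines sharing parameters, shown two examples.
    return contrastive_loss(embed(x_a, weights), embed(x_b, weights),
                            similar, margin)
```

For example, with identity weights `[[1.0, 0.0], [0.0, 1.0]]`, a dissimilar pair whose outputs are already farther apart than the margin contributes zero loss, while a similar pair at the same distance is still pulled together.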

Similar methods have become widely used in recent years for image search (series of papers on WSABIE by +Samy Bengio and +Jason Weston), body pose estimation (papers by +Graham Taylor), and face recognition (see the recent DeepFace system from Facebook AI Research by +Yaniv Taigman et al.).

Video of a talk on the subject at a NIPS 2006 workshop:

Relevant papers: 
Raia Hadsell, Sumit Chopra and Yann LeCun: Dimensionality Reduction by Learning an Invariant Mapping, Proc. Computer Vision and Pattern Recognition Conference (CVPR'06), 2006

Sumit Chopra, Raia Hadsell and Yann LeCun: Learning a Similarity Metric Discriminatively, with Application to Face Verification, Proc. of Computer Vision and Pattern Recognition Conference, 2005.

J. Bromley, I. Guyon, Y. LeCun, E. Sackinger and R. Shah: Signature Verification using a Siamese Time Delay Neural Network, in Cowan, J. and Tesauro, G. (Eds), Advances in Neural Information Processing Systems (NIPS 1993), vol 6, 1993
As I've been playing around with siamese networks recently: Is there any particular reason or problem as to why there are no recent publications on siamese networks? I generally liked the idea very much and found it weird not to see any reuse, especially because it has potentially nice applications in similarity search. I would expect the latter to be something that many people in deep learning should be interested in (e.g. Google, Facebook... for search).
Some of my colleagues are trying to do something very similar. Nothing so sophisticated though, I'm sure.
Use a proxy service in Hong Kong.
A paper from Jim DiCarlo's group at MIT comparing the object recognition performance of deep convolutional nets with that of IT and V4 neurons in the cortex.

Basically, the Zeiler-Fergus ConvNet matches the performance of IT neurons. 

I suspect the results would be similar with the OverFeat network (which wasn't available when the work was performed).
Yann LeCun

A crop of new papers:

Jonathan Tompson, Murphy Stein, Yann LeCun and Ken Perlin: Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks, to appear at SIGGRAPH, 2014.

Camille Couprie, Clement Farabet, Laurent Najman and Yann LeCun: Toward Real-time Indoor Semantic Segmentation Using Depth Information, to appear in JMLR, 2014.

Joan Bruna, Arthur Szlam and Yann LeCun: Signal Recovery from Lp Pooling Representations, International Conference on Machine Learning (ICML'14), 2014.
Demo video of the LAGR project from 2008.

NYU and Net-Scale Technologies collaborated between 2005 and 2008 on the DARPA-funded LAGR project (Learning Applied to Ground Robots). The main purpose was to develop ML methods that let autonomous mobile robots drive themselves in natural outdoor environments.

Several teams participated, using identical robot platforms developed by NREC. The teams could not modify the hardware.

This video describes the entire NYU/Net-Scale system, including the long-range vision system based on convolutional nets, the mid-range vision system based on high-resolution stereo, and the short-range obstacle detection system based on high frame-rate, low-resolution stereo. The convnet adapts to unknown environments on the fly, using traversability labels produced by the stereo systems.

The video also describes the robot-centered hyperbolic map, the simple rotational visual odometry and the learning-based control system for short range obstacle avoidance.

Relevant papers:

Raia Hadsell, Pierre Sermanet, Marco Scoffier, Ayse Erkan, Koray Kavukcuoglu, Urs Muller and Yann LeCun: Learning Long-Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics, 26(2):120-144, February 2009

Pierre Sermanet, Raia Hadsell, Marco Scoffier, Matt Grimes, Jan Ben, Ayse Erkan, Chris Crudele, Urs Muller and Yann LeCun: A Multi-Range Architecture for Collision-Free Off-Road Robot Navigation, Journal of Field Robotics, 26(1):58-87, January 2009

More pictures, videos, and information here:


Very cool.
We could use infrared (though to detect body heat you need far infrared, which you won't get from a regular camera). However, we would have to collect data and retrain the network.

This could run on a mobile device with a small hit in accuracy. This video is not using the FPGA; it's pure software and is not real time (but could easily be made real time).
IARPA has announced a new research program called Machine Intelligence from Cortical Networks (MICrONS).

Part of the program is inspired by the recent results in deep learning. One purpose of the program seems to be to bring together ML researchers and neuroscientists.

Proposer's day is July 17th.
Yes, it is open to everyone. 
MIT posted videos of two lectures and a talk I gave there last year, as well as a panel discussion on invariant feature learning.

Lecture on Computer Perception with Deep Learning in Course 9.S912: "Vision and learning - computers and brains", Nov 12, 2013:

- Part 1:
- Part 2:

Workshop on Learning Data Representation: Hierarchies and Invariance:

- Talk: "Learning Invariant Feature Hierarchies":

- Panel discussion:
