### Yann LeCun


Deep Learning and Graphical Models

I sometimes get questions like "how does deep learning compare with graphical models?". There is no answer to this question because deep learning and graphical models are orthogonal concepts that can be (and have been) combined.

Let me state this very clearly: there is no opposition between the two paradigms. They can be advantageously combined.

Of course, deep Boltzmann Machines are a form of probabilistic factor graph themselves. But there are other ways in which the concepts can be combined.

For example, you could imagine a factor graph in which the factors themselves contain a deep neural net. A good example would be a dynamical factor graph in which the state vector at time t, Z(t), is predicted from the states and inputs at previous times through a deep neural net (perhaps a temporal convolutional net). A simple instance is when the negative log factor is equal to ||Z(t) - G(Z(t-1), X(t))||^2, where G is a deep neural net.
This simply says that the conditional distribution of Z(t) given Z(t-1) and X(t) is a Gaussian with mean G(Z(t-1), X(t)) and unit covariance.
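As a minimal sketch of that quadratic factor (with a hypothetical one-hidden-layer stand-in for the deep net G, and made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny stand-in for the deep net G: one hidden layer is enough
# to illustrate the shape of the factor (a real model would be deeper).
W1 = rng.standard_normal((8, 4 + 2)) * 0.1   # hidden weights acting on [Z(t-1); X(t)]
W2 = rng.standard_normal((4, 8)) * 0.1       # output weights, dimension of Z(t)

def G(z_prev, x):
    return W2 @ np.tanh(W1 @ np.concatenate([z_prev, x]))

def factor_energy(z_t, z_prev, x_t):
    """Negative log factor (up to a constant): ||Z(t) - G(Z(t-1), X(t))||^2.
    Equivalently, Z(t) given Z(t-1) and X(t) is Gaussian with mean
    G(Z(t-1), X(t)) and unit covariance."""
    r = z_t - G(z_prev, x_t)
    return float(r @ r)

z_prev = rng.standard_normal(4)
x_t = rng.standard_normal(2)
z_t = G(z_prev, x_t)                     # the predicted mean
print(factor_energy(z_t, z_prev, x_t))   # → 0.0 (the energy is minimal at the mean)
```

The energy is zero exactly at the mean of the Gaussian and grows quadratically away from it, which is just the negative log of the unit-covariance Gaussian density up to an additive constant.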

This type of dynamic factor graph can be used to model multi-dimensional time series. When a sequence X(t) is observed, one can infer the most likely sequence of hidden states Z(t) by minimizing the sum of the negative log factors (which we can call an energy function).

Once the optimal Z(t) is found, one can update the parameters of the network G() to make the energy smaller.
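The two steps above can be sketched as an alternating loop. This is only an illustration under invented assumptions: a tiny one-hidden-layer G with its parameters in one flat vector, and central-difference numerical gradients in place of backprop, purely to keep the code short:

```python
import numpy as np

rng = np.random.default_rng(1)
T, dz, dx = 6, 3, 2                 # sequence length, state dim, input dim
X = rng.standard_normal((T, dx))    # observed input sequence X(t)

# Hypothetical tiny G: one hidden layer, parameters in one flat vector so
# that numerical gradients stay simple (a real model would use backprop).
H = 5
n1 = H * (dz + dx)

def G(p, z_prev, x):
    W1, W2 = p[:n1].reshape(H, dz + dx), p[n1:].reshape(dz, H)
    return W2 @ np.tanh(W1 @ np.concatenate([z_prev, x]))

def energy(p, Z):
    # Sum of the quadratic negative log factors over the sequence.
    return sum(float(np.sum((Z[t] - G(p, Z[t - 1], X[t])) ** 2))
               for t in range(1, T))

def num_grad(f, v, eps=1e-5):
    # Central-difference gradient of the scalar f at the flat vector v.
    g = np.zeros_like(v)
    for i in range(v.size):
        d = np.zeros_like(v)
        d[i] = eps
        g[i] = (f(v + d) - f(v - d)) / (2 * eps)
    return g

params = rng.standard_normal(n1 + dz * H) * 0.1
Z = rng.standard_normal((T, dz)) * 0.1
e0 = energy(params, Z)

for _ in range(50):
    # Inference: descend the energy with respect to the hidden states Z(t).
    z = Z.reshape(-1)
    z = z - 0.1 * num_grad(lambda v: energy(params, v.reshape(T, dz)), z)
    Z = z.reshape(T, dz)
    # Learning: descend the energy with respect to the parameters of G.
    params = params - 0.05 * num_grad(lambda p: energy(p, Z), params)

print(e0, energy(params, Z))   # the energy drops as the two steps alternate
```

The structure is EM-like: the inference step plays the role of finding the MAP estimate of the latent sequence, and the learning step lowers the energy of that sequence by adjusting G.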

A more sophisticated version of this could be used to learn the covariance of the Gaussians, or to marginalize over the Z(t) sequence instead of just doing MAP inference (which only takes the lowest-energy sequence into account).

An example of such a "factor graph with deep factors" was described in a 2009 ECML paper with my former student Piotr Mirowski (who is now at Bell Labs): "Factor Graphs for Time Series Modeling"
(Piotr Mirowski & Yann LeCun, ECML 2009): http://yann.lecun.com/exdb/publis/pdf/mirowski-ecml-09.pdf

A similar model used auto-encoder-type unsupervised pre-training to do language modeling "Dynamic Auto-Encoders for Semantic Indexing" (Piotr Mirowski & Yann LeCun, NIPS Workshop on Deep Learning, 2010):
http://yann.lecun.com/exdb/publis/pdf/mirowski-nipsdl-10.pdf

Another way to combine deep learning with graphical models is through structured prediction. To some, this may sound like a new idea, but its history goes back to the early 90's. Léon Bottou and Xavier Driancourt used sequence alignment on top of a temporal convolutional net to do spoken word recognition. They trained the convnet and the elastic word models simultaneously, at the word level, by back-propagating gradients through the time-alignment module (which you can see as a kind of factor graph in which the time-warping function is a latent variable).

In the early 90's, Léon and colleagues built "hybrid" speech recognition systems in which a temporal convolutional net and an HMM were trained simultaneously using a discriminative criterion at the word (or sentence) level.

A few years later, Léon, Yoshua, Patrick and I used similar ideas to train our handwriting recognition system. Instead of a normalized HMM, we used a kind of energy-based factor graph without normalization. The normalization is superfluous (even hurtful) when the training is discriminative. We called this "Graph Transformer Networks". This was first published at CVPR 1997 and ICASSP 1997, but the best explanation of it is in our 1998 Proceedings of the IEEE paper: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

Some of the history of this with a detailed bibliography is available in the paper "A Tutorial on Energy-Based Learning": http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf (starting around Section 6).

Didn't know about the Mirowski paper, thanks!

Tell us what you think, Frank.

I know you are not really into bio-informatics, but there is a follow-up paper in which we used a similar model (not deep, though) to infer gene regulation networks from temporal sequences of gene expression data: http://yann.lecun.com/exdb/publis/pdf/krouk-gb-10.pdf

A very interesting topic!

Yann, thanks for the link. Studying graphical models these days, so an interesting read.

I am interested in applying deep learning to comparative genomics. Is it possible to get any good research articles on this? I want to compare the genomes of different non-human primates for factors affecting longevity, using a deep learning architecture, and I also want to perform large-scale data analysis. What is your input on this? Thanks.

There are very few papers on the application of deep learning to genomics.

That reminds me: one of the first "real" tasks that I applied backprop to in 1985 was intron/exon site prediction. This was only published in my PhD thesis (in French).