I sometimes get questions like "how does deep learning compare with graphical models?". There is no answer to this question because deep learning and graphical models are orthogonal concepts that can be (and have been) combined.
Let me state this very clearly: there is no opposition between the two paradigms. They can be advantageously combined.
Of course, deep Boltzmann Machines are a form of probabilistic factor graph themselves. But there are other ways in which the concepts can be combined.
For example, you could imagine a factor graph in which the factors themselves contain a deep neural net. A good example would be a dynamical factor graph in which the state vector at time t, Z(t), is predicted from the states and inputs at previous times through a deep neural net (perhaps a temporal convolutional net). A simple instance is when the negative log factor is equal to ||Z(t) - G(Z(t-1), X(t))||^2, where G is a deep neural net.
This simply says that the conditional distribution of Z(t) given Z(t-1) and X(t) is a Gaussian of mean G(Z(t-1), X(t)) and covariance unity.
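To spell out the link between the quadratic term and the Gaussian, here is the short derivation. The symbol d (the dimension of Z(t)) is introduced here for illustration. With unit covariance the negative log-density carries a factor of 1/2 and an additive constant, neither of which changes the minimizer, so the quadratic term can serve directly as the energy of the factor.

```latex
\begin{align*}
p\bigl(Z(t) \mid Z(t-1), X(t)\bigr)
  &= (2\pi)^{-d/2}
     \exp\!\Bigl(-\tfrac{1}{2}\,\bigl\|Z(t) - G\bigl(Z(t-1), X(t)\bigr)\bigr\|^2\Bigr), \\
-\log p\bigl(Z(t) \mid Z(t-1), X(t)\bigr)
  &= \tfrac{1}{2}\,\bigl\|Z(t) - G\bigl(Z(t-1), X(t)\bigr)\bigr\|^2
     + \tfrac{d}{2}\log 2\pi, \\
E(Z, X)
  &= \sum_t \bigl\|Z(t) - G\bigl(Z(t-1), X(t)\bigr)\bigr\|^2 .
\end{align*}
```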
This type of dynamic factor graph can be used to model multi-dimensional time series. When a sequence X(t) is observed, one can infer the most likely sequence of hidden states Z(t) by minimizing the sum of the negative log factors (which we can call an energy function).
Once the optimal Z(t) is found, one can update the parameters of the network G() to make the energy smaller.
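The two steps above (infer Z by minimizing the energy, then nudge the parameters of G) can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the code from any paper: the states and inputs are scalars, G is a tiny one-hidden-layer tanh network standing in for a deep net, and the names energy_and_grads, descend and fit are hypothetical. Gradient descent uses a simple backtracking rule so the energy never goes up.

```python
import math
import random

H = 3  # hidden units in the stand-in network G (an assumption of this sketch)

def energy_and_grads(Z, theta, X):
    """Energy sum_t (Z[t] - G(Z[t-1], X[t]))^2 and its gradients w.r.t. Z and theta."""
    wz, wx, b, v = (theta[i * H:(i + 1) * H] for i in range(4))
    gZ, gT, E = [0.0] * len(Z), [0.0] * len(theta), 0.0
    for t in range(1, len(Z)):
        h = [math.tanh(wz[k] * Z[t - 1] + wx[k] * X[t] + b[k]) for k in range(H)]
        r = Z[t] - sum(v[k] * h[k] for k in range(H))  # residual of the factor
        E += r * r
        gZ[t] += 2 * r
        for k in range(H):
            d = -2 * r * v[k] * (1 - h[k] ** 2)  # dE / d(pre-activation k)
            gZ[t - 1] += d * wz[k]
            gT[k] += d * Z[t - 1]          # w.r.t. wz[k]
            gT[H + k] += d * X[t]          # w.r.t. wx[k]
            gT[2 * H + k] += d             # w.r.t. b[k]
            gT[3 * H + k] += -2 * r * h[k]  # w.r.t. v[k]
    return E, gZ, gT

def descend(f, v, step=0.1, iters=50):
    """Gradient descent with backtracking: a step is kept only if it lowers the energy."""
    e, g = f(v)
    for _ in range(iters):
        s = step
        while s > 1e-8:
            trial = [vi - s * gi for vi, gi in zip(v, g)]
            e_new, g_new = f(trial)
            if e_new < e:
                v, e, g = trial, e_new, g_new
                break
            s *= 0.5
    return v, e

def fit(X, rounds=5, seed=0):
    rng = random.Random(seed)
    Z = [rng.uniform(-1, 1) for _ in X]
    theta = [rng.uniform(-0.5, 0.5) for _ in range(4 * H)]
    history = [energy_and_grads(Z, theta, X)[0]]
    for _ in range(rounds):
        # Inference: minimize the energy over the hidden states, G fixed.
        Z, e = descend(lambda z: energy_and_grads(z, theta, X)[:2], Z)
        history.append(e)
        # Learning: one parameter update of G, hidden states fixed.
        theta, e = descend(lambda th: energy_and_grads(Z, th, X)[::2], theta, iters=1)
        history.append(e)
    return Z, theta, history

X = [math.sin(0.3 * t) for t in range(20)]  # toy observed input sequence
Z, theta, history = fit(X)
```

The list history records the energy after every inference and learning step, so you can check that it does not increase. The sketch only shows the mechanics of the alternation; it makes no claim about the quality of the states it finds.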
A more sophisticated version of this could be used to learn the covariance of the Gaussians, or to marginalize over the Z(t) sequence instead of just doing MAP inference (only taking into account the sequence with the lowest energy).
An example of such a "factor graph with deep factors" was described in a 2009 ECML paper with my former student (who is now at Bell Labs), "Dynamic Factor Graphs for Time Series Modeling"
(Piotr Mirowski & Yann LeCun, ECML 2009): http://yann.lecun.com/exdb/publis/pdf/mirowski-ecml-09.pdf
A similar model used auto-encoder-type unsupervised pre-training to do language modeling: "Dynamic Auto-Encoders for Semantic Indexing" (Piotr Mirowski & Yann LeCun, NIPS Workshop on Deep Learning, 2010).
Another way to combine deep learning with graphical models is through structured prediction. To some, this may sound like a new idea, but its history goes back to the early 90's. Leon Bottou and Xavier Driancourt used a sequence alignment module on top of a temporal convolutional net to do spoken word recognition. They trained the convnet and the elastic word models simultaneously, at the word level, by back-propagating gradients through the time alignment module (which you can see as a kind of factor graph in which the time warping function is a latent variable).
In the early 90's, Leon, Yoshua Bengio and Patrick Haffner built "hybrid" speech recognition systems in which a temporal convolutional net and an HMM were trained simultaneously using a discriminative criterion at the word (or sentence) level.
A few years later, Leon, Yoshua, Patrick and I used similar ideas to train our handwriting recognition system. Instead of a normalized HMM, we used a kind of energy-based factor graph without normalization. The normalization is superfluous (even hurtful) when the training is discriminative. We called this "Graph Transformer Networks". This was first published at CVPR 1997 and ICASSP 1997, but the best explanation of it is in our 1998 Proc. IEEE paper: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
Some of the history of this with detailed bibliography is available in the paper "A Tutorial on Energy-Based Learning": http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf (starting around Section 6).