Suresh Venkatasubramanian
Works at U. Utah
Attended Stanford
Lives in Salt Lake City


Trying to understand why dropout networks work so well, I was quite surprised to see that we can get principled uncertainty information from these models for free – without changing a thing.
And all of this applies even to mathematical reasoning. To build gigantic structures of math, you need powerful abstractions, but you also need to know the limits of these abstractions (and create new ones as needed). 
An old rant from Steve Yegge about what you need to understand the inner workings of and what you can afford to treat as magic.
Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436)

Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

As a case in point, let me now comment on a recent article in Nature (2015) about "deep learning" in artificial neural networks (NNs), by LeCun & Bengio & Hinton (LBH for short), three CIFAR-funded collaborators who call themselves the "deep learning conspiracy" (e.g., LeCun, 2015). They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago. All references below are taken from the recent deep learning overview (Schmidhuber, 2015), except for a few papers listed beneath this critique, which focuses on nine items.

1. LBH's survey does not even mention the father of deep learning, Alexey Grigorevich Ivakhnenko, who published the first general, working learning algorithms for deep networks (e.g., Ivakhnenko and Lapa, 1965). A paper from 1971 already described a deep learning net with 8 layers (Ivakhnenko, 1971), trained by a highly cited method still popular in the new millennium. Given a training set of input vectors with corresponding target output vectors, layers of additive and multiplicative neuron-like nodes are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set, where regularisation is used to weed out superfluous nodes. The numbers of layers and nodes per layer can be learned in problem-dependent fashion.
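The grow-by-regression, prune-by-validation idea can be made concrete with a minimal, hypothetical Python sketch. This is an illustration of the principle only, not Ivakhnenko's actual GMDH procedure: candidate polynomial "nodes" are fitted by least squares on a training split, and all but the best are pruned using a separate validation split.

```python
def fit_linear(xs, ys):
    # Closed-form least squares for y ~ a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def grow_and_prune(train, valid):
    # Candidate polynomial nodes; each is fitted by regression on the
    # training split, then scored on a held-out validation split.
    candidates = {"x": lambda x: x,
                  "x^2": lambda x: x * x,
                  "x^3": lambda x: x ** 3}
    scores = {}
    for name, f in candidates.items():
        a, b = fit_linear([f(x) for x, _ in train], [y for _, y in train])
        scores[name] = sum((a * f(x) + b - y) ** 2
                           for x, y in valid) / len(valid)
    # Pruning: keep only the node with the lowest validation error.
    return min(scores, key=scores.get)
```

On data generated by y = x^2, only the x^2 node survives; a deep net of the Ivakhnenko type would repeat this procedure layer by layer on the surviving nodes' outputs.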

2. LBH discuss the importance and problems of gradient descent-based learning through backpropagation (BP), and cite their own papers on BP, plus a few others, but fail to mention BP's inventors. BP's continuous form was derived in the early 1960s (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only. BP's modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients. By 1980, automatic differentiation could derive BP for any differentiable graph (Speelpenning, 1980). Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis (cited by LBH), which did not have Linnainmaa's (1970) modern, efficient form of BP. BP for NNs on computers 10,000 times faster per Dollar than those of the 1960s can yield useful internal representations, as shown by Rumelhart et al. (1986), who also did not cite BP's inventors.
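The chain-rule derivation of BP credited to Dreyfus (1962) can be illustrated on a toy two-parameter network. This is a hypothetical teaching sketch, not code from any of the cited papers:

```python
import math

def forward_backward(x, w1, w2, target):
    # Forward pass through a tiny net: x -> h = tanh(w1*x) -> y = w2*h.
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = 0.5 * (y - target) ** 2
    # Backward pass: propagate dL/dy back through each node via the chain rule.
    dy = y - target              # dL/dy
    dw2 = dy * h                 # dL/dw2
    dh = dy * w2                 # dL/dh
    dw1 = dh * (1 - h * h) * x   # dL/dw1, using d tanh(u)/du = 1 - tanh(u)^2
    return loss, dw1, dw2
```

The analytic gradients agree with finite-difference checks, which is exactly what makes the reverse-mode form of Linnainmaa (1970) efficient: one backward sweep yields all partial derivatives.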

3. LBH claim: "Interest in deep feedforward networks [FNNs] was revived around 2006 (refs 31-34) by a group of researchers brought together by the Canadian Institute for Advanced Research (CIFAR)." Here they refer exclusively to their own labs, which is misleading. For example, by 2006, many researchers had used deep nets of the Ivakhnenko type for decades. LBH also ignore earlier, closely related work funded by other sources, such as the deep hierarchical convolutional neural abstraction pyramid (e.g., Behnke, 2003b), which was trained to reconstruct images corrupted by structured noise, enforcing increasingly abstract image representations in deeper and deeper layers. (BTW, the term "Deep Learning" (the very title of LBH's paper) was introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al (2000), none of them cited by LBH.)

4. LBH point to their own work (since 2006) on unsupervised pre-training of deep FNNs prior to BP-based fine-tuning, but fail to clarify that this was very similar in spirit and justification to the much earlier successful work on unsupervised pre-training of deep recurrent NNs (RNNs) called neural history compressors (Schmidhuber, 1992b, 1993b). Such RNNs are even more general than FNNs. A first RNN uses unsupervised learning to predict its next input. Each higher level RNN tries to learn a compressed representation of the information in the RNN below, to minimise the description length (or negative log probability) of the data. The top RNN may then find it easy to classify the data by supervised learning. One can even "distill" a higher, slow RNN (the teacher) into a lower, fast RNN (the student), by forcing the latter to predict the hidden units of the former. Such systems could solve previously unsolvable very deep learning tasks, and started our long series of successful deep learning methods since the early 1990s (funded by Swiss SNF, German DFG, EU and others), long before 2006, although everybody had to wait for faster computers to make very deep learning commercially viable. LBH also ignore earlier FNNs that profit from unsupervised pre-training prior to BP-based fine-tuning (e.g., Maclin and Shavlik, 1995). They cite Bengio et al.'s post-2006 papers on unsupervised stacks of autoencoders, but omit the original work on this (Ballard, 1987).
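The "distillation" idea mentioned above (a student forced to predict the teacher's hidden units) can be sketched in a few lines. This is a hypothetical toy version with a one-weight linear student mimicking a single teacher unit, not the 1992 architecture itself:

```python
def distill(inputs, teacher, lr=0.1, steps=200):
    # Train a one-weight linear "student" to reproduce the teacher's hidden
    # activations, rather than training it directly on raw task targets.
    w = 0.0
    for _ in range(steps):
        for x in inputs:
            err = w * x - teacher(x)   # mismatch with the teacher's unit
            w -= lr * err * x          # gradient step on squared error
    return w
```

If the teacher's unit computes 2*x, the student weight converges to 2: the slow teacher's knowledge has been compressed into the fast student.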

5. LBH write that "unsupervised learning (refs 91-98) had a catalytic effect in reviving interest in deep learning, but has since been overshadowed by the successes of purely supervised learning." Again they almost exclusively cite post-2005 papers co-authored by themselves. By 2005, however, this transition from unsupervised to supervised learning was old hat, because back in the 1990s, our unsupervised RNN-based history compressors (see above) were largely phased out by our purely supervised Long Short-Term Memory (LSTM) RNNs, now widely used in industry and academia for processing sequences such as speech and video. Around 2010, history repeated itself, as unsupervised FNNs were largely replaced by purely supervised FNNs, after our plain GPU-based deep FNN (Ciresan et al., 2010) trained by BP with pattern distortions (Baird, 1990) set a new record on the famous MNIST handwritten digit dataset, suggesting that advances in exploiting modern computing hardware were more important than advances in algorithms. While LBH mention the significance of fast GPU-based NN implementations, they fail to cite the originators of this approach (Oh and Jung, 2004).

6. In the context of convolutional neural networks (ConvNets), LBH mention pooling, but not its pioneer (Weng, 1992), who replaced Fukushima's (1979) spatial averaging by max-pooling, today widely used by many, including LBH, who write: "ConvNets were largely forsaken by the mainstream computer-vision and machine-learning communities until the ImageNet competition in 2012," citing Hinton's 2012 paper (Krizhevsky et al., 2012). This is misleading. Earlier, committees of max-pooling ConvNets were accelerated on GPU (Ciresan et al., 2011a), and used to achieve the first superhuman visual pattern recognition in a controlled machine learning competition, namely, the highly visible IJCNN 2011 traffic sign recognition contest in Silicon Valley (relevant for self-driving cars). The system was twice better than humans, and three times better than the nearest non-human competitor (a system co-authored by LeCun of LBH). It also broke several other machine learning records, and surely was not "forsaken" by the machine-learning community. In fact, the later system (Krizhevsky et al. 2012) was very similar to the earlier 2011 system. Here one must also mention that the first official international contests won with the help of ConvNets actually date back to 2009 (three TRECVID competitions) - compare Ji et al. (2013). A GPU-based max-pooling ConvNet committee also was the first deep learner to win a contest on visual object discovery in large images, namely, the ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images (Ciresan et al., 2013). A similar system was the first deep learning FNN to win a pure image segmentation contest (Ciresan et al., 2012a), namely, the ISBI 2012 Segmentation of Neuronal Structures in EM Stacks Challenge.
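Max-pooling itself, the operation attributed here to Weng (1992), is a simple idea: replace each local patch of a feature map by its maximum. A minimal sketch for non-overlapping 2x2 windows (an illustration, not code from any cited system):

```python
def max_pool_2x2(image):
    # Replace each non-overlapping 2x2 patch by its maximum value.
    # `image` is a list of equal-length rows; both dimensions must be even.
    rows, cols = len(image), len(image[0])
    return [[max(image[r][c], image[r][c + 1],
                 image[r + 1][c], image[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]
```

Unlike spatial averaging, the maximum keeps only the strongest local response, which is what gives max-pooling its translation tolerance.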

7. LBH discuss their FNN-based speech recognition successes in 2009 and 2012, but fail to mention that deep LSTM RNNs had outperformed traditional speech recognizers on certain tasks already in 2007 (Fernández et al., 2007) (and traditional connected handwriting recognisers by 2009), and that today's speech recognition conferences are dominated by (LSTM) RNNs, not by the FNNs of 2009. While LBH cite work co-authored by Hinton on LSTM RNNs with several LSTM layers, this approach was pioneered much earlier (e.g., Fernández et al., 2007).

8. LBH mention recent proposals such as "memory networks" and the somewhat misnamed "Neural Turing Machines" (which do not have an unlimited number of memory cells like real Turing machines), but ignore very similar proposals of the early 1990s, on neural stack machines, fast weight networks, self-referential RNNs that can address and rapidly modify their own weights during runtime, etc (e.g., AMAmemory 2015). They write that "Neural Turing machines can be taught algorithms," as if this was something new, although LSTM RNNs were taught algorithms many years earlier, even entire learning algorithms (e.g., Hochreiter et al., 2001b).

9. In their outlook, LBH mention "RNNs that use reinforcement learning to decide where to look" but not that they were introduced a quarter-century ago (Schmidhuber & Huber, 1991). Compare the more recent Compressed NN Search for large attention-directing RNNs (Koutnik et al., 2013).

One more little quibble: While LBH suggest that "the earliest days of pattern recognition" date back to the 1950s, the cited methods are actually very similar to linear regressors of the early 1800s, by Gauss and Legendre. Gauss famously used such techniques to recognize predictive patterns in observations of the asteroid Ceres.

LBH may be backed by the best PR machines of the Western world (Google hired Hinton; Facebook hired LeCun). In the long run, however, historic scientific facts (as evident from the published record) will be stronger than any PR. There is a long tradition of insights into deep learning, and the community as a whole will benefit from appreciating the historical foundations.

The contents of this critique may be used (also verbatim) for educational and non-commercial purposes, including articles for Wikipedia and similar sites.

References not yet in the survey (Schmidhuber, 2015):

Y. LeCun, Y. Bengio, G. Hinton (2015). Deep Learning. Nature 521, 436-444.

Y. LeCun (2015). IEEE Spectrum Interview by L. Gomes, Feb 2015:

R. Dechter (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory. First paper to introduce the term "Deep Learning" to Machine Learning.

I. Aizenberg, N. N. Aizenberg, and J. P. L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media. First work to introduce the term "Deep Learning" to Neural Networks. Compare a popular G+ post on this:

J. Schmidhuber (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. Preprint:

AMAmemory (2015): Answer at reddit AMA (Ask Me Anything) on "memory networks" etc (with references):


Thanks for setting me straight - I jumped at the "who was the Henry Ford of statistics", but should have realized the article was better than that.
I've started using slack to manage communications with my students and some collaborators. It seems to work quite well to reduce clutter in my mailbox, and the various tie-ins (to gcal, dropbox and github) make it quite useful. 
+Robert Ricci Re: wrong O(). I dunno. Dying at half speed is still dying, but I'd take it.
Come to ICML and learn to be fair!! Especially in light of the Supreme Court decision on the Fair Housing Act.
We hope you can join us for this upcoming workshop!

ICML Workshop on Fairness, Accountability, and Transparency in Machine Learning
Saturday, July 11th, 2015 - Lille, France

This interdisciplinary workshop will consider issues of fairness, accountability, and transparency in machine learning. It will address growing anxieties about the role that machine learning plays in consequential decision-making in such areas as commerce, employment, healthcare, education, and policing.

Invited Speakers:
Nick Diakopoulos --- Algorithmic Accountability and Transparency in Journalism
Sara Hajian --- Discrimination- and Privacy-Aware Data Mining
Salvatore Ruggieri --- Privacy Attacks and Anonymization Methods as Tools for Discrimination Discovery and Fairness
Toshihiro Kamishima and Kazuto Fukuchi --- Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, and Theoretical Aspects

Accepted Papers:
Muhammad Bilal Zafar, Isabel Valera Martinez, Manuel Gomez Rodriguez, and Krishna Gummadi --- Fairness Constraints: A Mechanism for Fair Classification
Benjamin Fish, Jeremy Kun, and Ádám D. Lelkes --- Fair Boosting: A Case Study
Zubin Jelveh and Michael Luca --- Towards Diagnosing Accuracy Loss in Discrimination-Aware Classification: An Application to Predictive Policing
Indrė Žliobaitė --- On the Relation between Accuracy and Fairness in Binary Classification

Closing Panel Discussion:
Fernando Diaz, Sorelle Friedler, Mykola Pechenizkiy, Hanna Wallach, and Suresh Venkatasubramanian (Moderator)

Looking forward to seeing you in Lille!

The organizing committee,
Solon Barocas (General Chair), Princeton University
Sorelle Friedler (Program Chair), Haverford College
Moritz Hardt, Google
Josh Kroll, Princeton University
Carlos Scheidegger, University of Arizona
Suresh Venkatasubramanian, University of Utah
Hanna Wallach, Microsoft Research and University of Massachusetts Amherst
More than 30,000 sq. feet of grass at the U is being converted to water-wise plants (like the ones in the photo) this month. Learn more about the project and tips for home use at
I would be quite happy to see almost the entire landscaping industry disappear, especially from university campuses.


or, humans don't regularize :)
A Quick Puzzle to Test Your Problem Solving. A short game sheds light on government policy, corporate America and why no one likes to be wrong. 
So I wasn't going crazy after all. Just installed Skim
I'm not one of those "emacs should be an all encompassing universe" people so your attempt at snark flies right by :)
Successfully unlocked my phone. The process is surprisingly involved. 

I had to explain the notion of a locked phone to my kids, and found that a 'kidnap-and-ransom' metaphor is quite useful. 
While everyone's talking about SCOTUScare, I'm relieved that the Supreme Court continues to uphold the validity of disparate impact. 
Civil rights groups are breathing a little easier today, after the Court’s ruling in an important housing discrimination case. The question before the Court was whether claims brought under the Fair Housing Act, which prohibits housing discrimination “because of” race, can be based on an alleg
A brief note on Ed Catmull's book 'Creativity, Inc'. 
I'm in Bertinoro for the Algorithms and Data Structures workshop organized by Camil Demetrescu, Andrew Goldberg and Valerie King. I will try to post updates from the event, but with the density of talks, no promises :). I'm still waiting to hear more about the STOC theoryfest deliberations from ...
Bimal Roy was, until a few days ago, the director of the Indian Statistical Institute, a graduate institute based in Kolkata. He is well-known in cryptography for his work on block ciphers. He was also awarded the Padma Shri last year, one of the highest civilian awards in India. 

My interactions with him were limited, but I have generally been impressed by the huge summer internship program for Indian undergraduates that he runs at ISI. 

Roy was just removed from his position by the Ministry of Statistics and Programme Implementation, citing concerns about potential "indiscipline". The reasons are almost comically vague:

"A number of general and specific matters of financial and administrative irregularities which show the direct or supervisory responsibilities for acts of omission or commission on the part of the present Director, Prof. B. Roy are available in the Ministry in the various files on the different subjects. "

Below is an account of the events leading up to his dismissal written by Roy's colleague and former Ph.D. student, Sushmita Ruj. I copy her email here verbatim.

I don't know the facts of the story, but I am inclined to believe Sushmita's account. The lack of transparency should be especially troubling for academics, in India and elsewhere. 


This is Sushmita Ruj, writing from the Cryptology group of Indian Statistical Institute, Kolkata.

Professor Bimal Roy, the leader of our group, has been removed from the post of Director of the Indian Statistical Institute with a special emergency order issued by our governing Ministry (MoS&PI) on 10 June 2015. He was about to complete his five year term as Director on 31 July 2015.

The Ministry has expressed its apprehension that Prof. Bimal Roy may indulge in "propagation of indiscipline and mischief, including acts of financial and administrative impropriety" if he continues as the Director. However, neither has it put forward any formal charge against Prof. Roy, nor has it offered him the basic courtesy to defend himself before the decision was taken. There was no show-cause served to Prof. Roy at all, and the grounds for the decision seem quite fragile:

As we understand, this drastic decision from the Ministry is orchestrated by the Chairman of our Institute, Mr. Arun Shourie, who allegedly tried to coerce Prof. Bimal Roy into signing the proceedings of our last Council Meeting, in which the selection of the new Director of the Institute was considered. Allegedly, the drafted proceedings of the meeting were substantially different from the actual incidents at the meeting, especially on the issue of the selection of the new Director, and hence Prof. Roy refused to sign the proceedings. This is the only issue that may have been considered as "indiscipline" on Prof. Roy's part, as far as we know.

To verify the truth in this matter, a senior Professor of our Institute filed a written petition (RTI) to obtain the audio recording of the meeting, and the hearing of the RTI was due on 11 June 2015. It is quite an amazing coincidence that the Ministry took the decision of stripping Prof. Bimal Roy of all his powers as the Director just the night before, in the afternoon of 10 June 2015. Not to mention, the RTI hearing has been postponed at once. This raises a strong suspicion that the Ministry, supporting the Chairman of the Institute, is aiming at suppressing the audio recording of the meeting to avoid a transparent investigation into the matters at hand.

Professor Bimal Roy has dedicated his life for services to the Nation, and this is not what he deserves from the Institute which has gained from his leadership in the last five years more than it probably has under the leadership of any other Director in the recent times. Prof. Roy, and our Institute, deserves a transparent independent public investigation. The audio recording and proceedings of the Council meeting should be released and investigated by independent authorities. Prof. Roy should be offered a chance for a fair trial, independent of the actions by the Ministry, and till the investigation is over, Prof. Roy should be reinstated as the Director of our Institute, till 31 July 2015, the end of his rightful term. At the end of his term, Prof. Bimal Roy deserves a vote of thanks from the Council for his services to the Institute during the last five years, and not a humiliating send-off. This is simply unacceptable!

If you support the cause, and want justice for Prof. Bimal Roy, please convey your thoughts to the Hon'ble President of India at the following email IDs.     
Public Grievance Cell : Mr. Purushottam Dass   
Private Secretary to the President : Mr. Rajneesh   
Private Secretary to the President : Mr. Pradeep Gupta

You may also write directly to the Hon'ble Prime Minister of India through the online portal at :

I request you to propagate the news in the concerned academic community, government organizations and public media that you deem suitable, and sign-and-share the online petition supporting the cause.

It will be nice to have the community support Prof. Bimal Roy in these dire times. Please feel free to write to him directly as well.

  • U. Utah
    Associate professor, present
Map of the places this user has lived
Salt Lake City
Aarhus, DK - Stanford, CA - Philadelphia, PA - Morristown, NJ - New Delhi, India - Berkeley, CA
CS prof, interested in algorithms, geometry, data mining, clustering
  • Stanford
  • IIT Kanpur
Basic Information
Other names
That this restaurant has an overall rating less than 4 is a travesty. Skip Finca's overrated food and preserve your ear drums: Cafe Madrid is a much more intimate (read: quiet and charming) Spanish fine dining experience, with possibly the best service I've ever had in Salt Lake City. Call ahead if you want the paella: it's worth it.
Public - 4 months ago