Profile

Gervais Mulongoy
Works at QNX Software Systems Limited, a subsidiary of BlackBerry
Attended International School of Geneva
Lives in Ottawa, Ontario

Stream

 
Web technology has come so far since 1995.
 

My First Deep Learning System of 1991 + Deep Learning Timeline 1962-2013 (an experiment in open online peer review - comments welcome - as a machine learning researcher I am obsessed with proper credit assignment):

In 2009, our Deep Learning Artificial Neural Networks became the first Deep Learners to win official international pattern recognition competitions [A9] (with deadline and secret test set known only to the organisers); by 2012 they had won eight of them [A11]. In 2011, GPU-based versions achieved the first superhuman visual pattern recognition results [A10]. Others implemented variants and have won additional contests since 2012, e.g., [A12]. The field of Deep Learning research is far older though (see timeline further down).

My first Deep Learner dates back to 1991 [1,2]. It can perform credit assignment across hundreds of nonlinear operators or neural layers, by using unsupervised pre-training for a stack of recurrent neural networks (RNN) (deep by nature) as in the figure. (Such RNN are general computers more powerful than normal feedforward NN, and can encode entire sequences of inputs.)

The basic idea is still relevant today. Each RNN is trained for a while in unsupervised fashion to predict its next input. From then on, only unexpected inputs (errors) convey new information and get fed to the next higher RNN, which thus ticks on a slower, self-organising time scale. It can easily be shown that no information gets lost; it just gets compressed (note that much of machine learning is essentially about compression). We get less and less redundant input sequence encodings at deeper and deeper levels of this hierarchical temporal memory, which compresses data in both space (like feedforward NN) and time. There is also a continuous variant [47].
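To make the mechanism concrete, here is a minimal sketch in Python. It is not the 1991 implementation: simple count-based next-symbol predictors stand in for the trained RNNs, and the toy symbol sequence is invented for illustration. The point is only the control flow: the lower level sees every input, while the higher level receives only the surprises and therefore runs on a slower, compressed time scale.

```python
# Minimal sketch of the history-compression idea (not the original 1991 code).
from collections import defaultdict

class NextSymbolPredictor:
    """Predicts the next symbol from the current one via transition counts."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.prev = None

    def predict(self):
        if self.prev is None or not self.counts[self.prev]:
            return None
        return max(self.counts[self.prev], key=self.counts[self.prev].get)

    def observe(self, symbol):
        if self.prev is not None:
            self.counts[self.prev][symbol] += 1
        self.prev = symbol

def compress(sequence):
    """Return the subsequence of 'surprising' symbols fed to the next level."""
    low = NextSymbolPredictor()
    surprises = []
    for s in sequence:
        if low.predict() != s:   # unexpected inputs carry new information
            surprises.append(s)
        low.observe(s)           # keep training the low-level predictor
    return surprises

seq = list("abcabcabcabdabcabc")
print(compress(seq))  # after a few repetitions, only the deviations remain
```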

One ancient illustrative Deep Learning experiment of 1993 [2] required credit assignment across 1200 time steps, or through 1200 subsequent nonlinear virtual layers. The top-level code of the initially unsupervised RNN stack, however, got so compact that (previously infeasible) sequence classification through additional supervised learning became possible.

There is a way of compressing higher levels down into lower levels, thus partially collapsing the hierarchical temporal memory. The trick is to retrain lower-level RNN to continually imitate (predict) the hidden units of already trained, slower, higher-level RNN, through additional predictive output neurons [1,2]. This helps the lower RNN to develop appropriate, rarely changing memories that may bridge very long time lags.
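A rough sketch of this retraining trick, in PyTorch rather than the original formulation; the names (Automatizer, distill_step), the dimensions, and the data are invented, and the time alignment between the two networks is simplified by assuming the slower chunker's hidden states are given per step.

```python
# Hedged sketch: the lower "automatizer" RNN gets extra predictive output units
# trained to imitate the hidden state of the already trained, slower, higher-level
# "chunker" RNN, in addition to predicting its own next input.
import torch
import torch.nn as nn

class Automatizer(nn.Module):
    def __init__(self, in_dim, hid_dim, chunker_hid_dim):
        super().__init__()
        self.rnn = nn.RNN(in_dim, hid_dim, batch_first=True)
        self.next_input = nn.Linear(hid_dim, in_dim)        # ordinary next-input prediction
        self.imitate = nn.Linear(hid_dim, chunker_hid_dim)  # extra predictive output neurons

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.next_input(h), self.imitate(h)

def distill_step(model, x, next_x, chunker_hidden, opt):
    """One step: predict own next input AND the (frozen) chunker's hidden units."""
    pred_next, pred_chunk = model(x)
    loss = nn.functional.mse_loss(pred_next, next_x) \
         + nn.functional.mse_loss(pred_chunk, chunker_hidden.detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = Automatizer(in_dim=8, hid_dim=16, chunker_hid_dim=32)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(1, 50, 8)                                    # illustrative sequence
print(distill_step(model, x[:, :-1], x[:, 1:], torch.randn(1, 49, 32), opt))
```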

The Deep Learner of 1991 was a first way of overcoming the Fundamental Deep Learning Problem identified and analysed in 1991 by my very first student (now professor) Sepp Hochreiter: the problem of vanishing or exploding gradients [3,4,4a,A5]. The latter motivated all our subsequent Deep Learning research of the 1990s and 2000s.
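A back-of-the-envelope illustration of why gradients vanish or explode: the error signal propagated back through T steps is multiplied by roughly one factor (recurrent weight times activation slope) per step, so it scales exponentially in T. The numbers below are purely illustrative.

```python
# Through T time steps, the backpropagated error is scaled by roughly one
# factor per step; |factor| < 1 makes it vanish, |factor| > 1 makes it explode.
T = 100
for factor in (0.9, 1.0, 1.1):
    print(f"factor {factor}: error scaled by {factor ** T:.3g} after {T} steps")
# 0.9 -> ~2.7e-05 (vanishes), 1.0 -> 1.0, 1.1 -> ~1.4e+04 (explodes)
```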

Through supervised LSTM RNN (1997) (e.g., [5,6,7,A7]) and faster computers we could eventually perform similar feats as with the 1991 system [1,2], overcoming the Fundamental Deep Learning Problem without any unsupervised pre-training. Moreover, LSTM could also learn tasks unlearnable by the partially unsupervised 1991 chunker [1,2].
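For concreteness, a minimal single-step LSTM cell in NumPy. It follows the now-standard formulation with a forget gate [6] rather than the exact 1997 version, and all shapes and initialisations are illustrative; the key point is the additive cell-state update, which lets error flow back across many steps without being repeatedly squashed.

```python
# Minimal LSTM cell sketch (standard forget-gate formulation, illustrative sizes).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """x: input, h: hidden state, c: cell state; W, b parametrize all four gates."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g                 # additive memory: the "constant error carousel"
    h = o * np.tanh(c)
    return h, c

nin, nh = 3, 5
rng = np.random.default_rng(1)
W = rng.standard_normal((4 * nh, nin + nh)) * 0.1
b = np.zeros(4 * nh)
h, c = np.zeros(nh), np.zeros(nh)
for t in range(10):
    h, c = lstm_step(rng.standard_normal(nin), h, c, W, b)
print(h)
```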

Particularly successful are stacks of LSTM RNN [10] trained by Connectionist Temporal Classification (CTC) [8]. In 2009, this became the first RNN system ever to win an official international pattern recognition competition [A9], through the work of my PhD student and postdoc Alex Graves, e.g., [10]. To my knowledge, this also was the first Deep Learning system ever (recurrent or not) to win such a contest. (In fact, it won three different ICDAR 2009 contests on connected handwriting in three different languages, e.g., [11,A9,A13].) A while ago, Alex moved on to Geoffrey Hinton's lab (Univ. Toronto), where a stack [10] of our bidirectional LSTM RNN [7] also broke a famous TIMIT speech recognition record [12], despite thousands of man years previously spent on HMM-based speech recognition research.
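A hedged sketch of CTC training, using PyTorch's built-in CTCLoss rather than the original implementation of [8]; the random tensors stand in for the frame-wise outputs of an LSTM stack and for the unsegmented target label sequences.

```python
# CTC lets an RNN that emits per-frame label probabilities be trained on
# unsegmented sequences, marginalising over all alignments between the
# frame-wise outputs and the target label sequence.
import torch
import torch.nn as nn

T, N, C = 50, 4, 20                                 # frames, batch, label classes (0 = blank)
rnn_out = torch.randn(T, N, C, requires_grad=True)  # stands in for the LSTM stack's outputs
log_probs = rnn_out.log_softmax(dim=2)

targets = torch.randint(1, C, (N, 10))              # unsegmented target label sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                     # gradients flow back into the RNN stack
print(loss.item())
```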

Recently, well-known entrepreneurs also got interested in hierarchical temporal memories [13,14].

The expression Deep Learning actually got coined relatively late, around 2006, in the context of unsupervised pre-training for less general feedforward networks [15]. Such a system reached 1.2% error rate [15] on the MNIST handwritten digits [16], perhaps the most famous benchmark of Machine Learning. Our team first showed that good old backpropagation [A1] on GPUs (with training pattern distortions [42,43] but without any unsupervised pre-training) can actually achieve a three times better result of 0.35% [17] - back then, a world record (a previous standard net achieved 0.7% [43]; a backprop-trained [16] Convolutional NN (CNN) [19a,19,16,16a] got 0.39% [49]; plain backprop without distortions, except for small saccadic eye movement-like translations, already got 0.95%).

Then we replaced our standard net by a biologically rather plausible architecture inspired by early neuroscience-related work [19a,18,19,16]: Deep and Wide GPU-based Multi-Column Max-Pooling CNN (MCMPCNN) [21,22] with alternating backprop-based [16,16a,50] weight-sharing convolutional layers [19,16,23] and winner-take-all [19a,19] max-pooling [20,24,50,46] layers (see [55] for early GPU-based CNN). MCMPCNN are committees of MPCNN [25a] with simple democratic output averaging (compare earlier, more sophisticated ensemble methods [48]). Object detection [54] and image segmentation [53] profit from fast MPCNN-based image scans [28,28a]. Our supervised MCMPCNN was the first method to achieve superhuman performance in an official international competition (with deadline and secret test set known only to the organisers) [25,25a,A10] (compare [51]), and the first with human-competitive performance (around 0.2%) on MNIST [22]. Since 2011, it has won numerous additional competitions on a routine basis [A11-A13].
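A hedged sketch of the multi-column idea in PyTorch; the layer sizes are illustrative, not those of [21,22]. Each column is a small max-pooling CNN for 28x28 MNIST-style input, and the committee simply averages the columns' softmax outputs, as in the democratic averaging described above.

```python
# Illustrative max-pooling CNN "column" and multi-column committee (not the
# hyperparameters of the competition-winning MCMPCNN).
import torch
import torch.nn as nn

class Column(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, 5), nn.Tanh(), nn.MaxPool2d(2),   # conv + max-pooling
            nn.Conv2d(20, 40, 5), nn.Tanh(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(40 * 4 * 4, n_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.softmax(self.classifier(h), dim=1)

class MultiColumn(nn.Module):
    def __init__(self, n_columns=3):
        super().__init__()
        self.columns = nn.ModuleList(Column() for _ in range(n_columns))

    def forward(self, x):
        # democratic output averaging over the (independently trained) columns
        return torch.stack([c(x) for c in self.columns]).mean(0)

print(MultiColumn()(torch.randn(2, 1, 28, 28)).shape)  # torch.Size([2, 10])
```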

Some of our methods were adopted by the groups of Univ. Toronto/Stanford/Google, e.g., [26,27]. Apple Inc., the most profitable smartphone maker, hired Ueli Meier, member of our Deep Learning team that won the ICDAR 2011 Chinese handwriting contest [22,A9]. ArcelorMittal, the world's top steel producer, is using our methods for material defect detection, e.g., [28]. Other users include a leading automotive supplier, recent start-ups such as deepmind (which hired four of my former PhD students/postdocs), and many other companies and leading research labs. One of the most important applications of our techniques is biomedical imaging [54], e.g., for cancer prognosis or plaque detection in CT heart scans.

Remarkably, the most successful Deep Learning algorithms in most international contests since 2009 [A9-A13] are adaptations and extensions of a 40-year-old algorithm, namely, supervised efficient backprop [A1,60,29a] (compare [30,31,58,59,61]) or BPTT/RTRL for RNN, e.g., [32-34,37-39]. (Exceptions include two 2011 contests specialising in transfer learning [44] - but compare [45].) In particular, as of 2013, state-of-the-art feedforward nets [A10-A13] are GPU-based [21] multi-column [22] combinations of two ancient concepts: Backpropagation [A1] applied [16a] to Neocognitron-like convolutional architectures [A2] (with max-pooling layers [20,50,46] instead of alternative [19a,19,40] winner-take-all methods), plus additional tricks from the 1990s and 2000s, e.g., [41a,41b,41c]. In the deep recurrent case, supervised systems also dominate, e.g., [5,8,10,9,39,12,A9].
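For readers unfamiliar with backprop, a tiny self-contained example of the reverse mode: one forward pass through a two-layer net, then derivatives propagated backwards via the chain rule, so that all weight gradients cost about as much as the forward pass itself (the point of [60,A1]). Sizes and data are arbitrary.

```python
# Illustrative reverse-mode gradient computation for a tiny two-layer network.
import numpy as np

def forward_backward(x, y, W1, W2):
    # forward pass
    h = np.tanh(W1 @ x)
    yhat = W2 @ h
    loss = 0.5 * np.sum((yhat - y) ** 2)
    # backward pass (reverse order of the forward computation)
    d_yhat = yhat - y
    dW2 = np.outer(d_yhat, h)
    d_h = W2.T @ d_yhat
    d_pre = d_h * (1.0 - h ** 2)      # derivative of tanh
    dW1 = np.outer(d_pre, x)
    return loss, dW1, dW2

rng = np.random.default_rng(0)
x, y = rng.standard_normal(4), rng.standard_normal(2)
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((2, 3))
print(forward_backward(x, y, W1, W2)[0])
```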

Nevertheless, in many applications it can still be advantageous to combine the best of both worlds - supervised learning and unsupervised pre-training, like in my 1991 system described above [1,2].

Acknowledgments: Thanks for valuable comments to Geoffrey Hinton, Kunihiko Fukushima, Yoshua Bengio, Sven Behnke, Yann LeCun, Sepp Hochreiter, Mike Mozer, Marc'Aurelio Ranzato, Andreas Griewank, Paul Werbos, Shun-ichi Amari, Seppo Linnainmaa, Peter Norvig, Yu-Chi Ho, and others. Graphics: Fibonacci Web Design

-------------------------------------------------------------------

Timeline of Deep Learning Highlights
(under construction - compare references further down)

[A0] 1962: Discovery of simple cells and complex cells in the visual cortex [18], inspiration for later deep artificial neural network (NN) architectures [A2] used in certain modern award-winning Deep Learners [A10-A13]

[A1] 1970 (plus or minus a decade or so): Error functions and their gradients for complex, nonlinear, multi-stage, differentiable, NN-like systems have been discussed at least since the 1960s, e.g., [56-58,64-66]. Gradients can be computed [57-58] by iterating the ancient chain rule [68,69] in dynamic programming style [67]. However, efficient error backpropagation (BP) in sparse, acyclic, NN-like networks apparently was first described in 1970 [60-61]. BP is also known as the reverse mode of automatic differentiation [56], where the costs of forward activation spreading essentially equal the costs of backward derivative calculation. See early FORTRAN code [60], and compare [62]. Compare the concept of ordered derivatives [29], with NN-specific discussion in [29] (section 5.5.1), and the first efficient NN-specific BP of the early 1980s [29a,29b]. Compare [30,31,59] and generalisations for sequence-processing recurrent NN, e.g., [32-34,37-39]. See also natural gradients [63]. As of 2013, BP is still the central Deep Learning algorithm.

[A2] 1979: Deep Neocognitron Architecture [19a,19,40] incorporating neurophysiological insights [A0,18], with weight-sharing convolutional neural layers as well as winner-take-all layers, very similar to the architecture of modern, competition-winning, purely supervised, feedforward, gradient-based Deep Learners [A10-A13] (but using local unsupervised learning rules instead) http://www.scholarpedia.org/article/Neocognitron

[A3] 1987: Ideas published on unsupervised autoencoder hierarchies [35], related to post-2000 feedforward Deep Learners based on unsupervised pre-training, e.g., [15]; compare survey [36] and somewhat related RAAMs [52]

[A4] 1989: Backprop [A1] applied [16,16a] to weight-sharing convolutional neural layers [A2,19a,19,16], essential ingredient of many modern, competition-winning, feedforward, visual Deep Learners [A10-A13]

[A5] 1991: Fundamental Deep Learning Problem discovered and analyzed [3]; compare [4] http://www.idsia.ch/~juergen/fundamentaldeeplearningproblem.html

[A6] 1991: First recurrent Deep Learning system, and perhaps the first working Deep Learner in the modern post-2000 sense, also first Neural Hierarchical Temporal Memory (present page: deep RNN stack plus unsupervised pre-training) [1,2] www.deeplearning.me

[A7] 1997: First purely supervised Deep Learner (LSTM RNN), e.g., [5-10,12,A9] http://www.idsia.ch/~juergen/rnn.html

[A8] 2006: Science paper [15] helps to arouse interest in deep NN (focus on unsupervised pre-training)

[A9] 2009: First official international pattern recognition contests won by Deep Learning (several connected handwriting competitions won by LSTM RNN) [10,11] http://www.idsia.ch/~juergen/handwriting.html

[A10] 2011: First superhuman visual pattern recognition, through deep and wide supervised GPU-based Multicolumn Max-Pooling CNN (MCMPCNN), the current gold standard for deep feedforward NN [25-26] http://www.idsia.ch/~juergen/superhumanpatternrecognition.html

[A11] 2012: 8th international pattern recognition contest won since 2009 (interview on KurzweilAI) http://www.kurzweilai.net/how-bio-inspired-deep-learning-keeps-winning-competitions

[A12] 2013: More pattern recognition contests since 2012 (lab of G.H.) http://www.cs.toronto.edu/~hinton/

[A13] 2013: More benchmark world records set by Deep Learning (lab of J.S.) http://www.idsia.ch/~juergen/deeplearning.html


References

[1] J. Schmidhuber. Learning complex, extended sequences using the principle of history compression, Neural Computation, 4(2):234-242, 1992 (based on TR FKI-148-91, 1991). ftp://ftp.idsia.ch/pub/juergen/chunker.pdf

[2] J. Schmidhuber. Habilitation thesis, TUM, 1993. An ancient experiment with credit assignment across 1200 time steps or virtual layers and unsupervised pre-training for a stack of recurrent NN can be found here http://www.idsia.ch/~juergen/habilitation/node114.html - try Google Translate in your mother tongue.

[3] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, TUM, 1991 (advisor J.S.) http://www.idsia.ch/~juergen/SeppHochreiter1991ThesisAdvisorSchmidhuber.pdf

[4] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S. C. Kremer and J. F. Kolen, eds., A Field Guide to Dynamical Recurrent Neural Networks. IEEE press, 2001. ftp://ftp.idsia.ch/pub/juergen/gradientflow.pdf

[4a] Y. Bengio, P. Simard, P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE TNN 5(2), p 157-166, 1994

[5] S. Hochreiter, J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997. ftp://ftp.idsia.ch/pub/juergen/lstm.pdf

[6] F. A. Gers, J. Schmidhuber, F. Cummins. Learning to Forget: Continual Prediction with LSTM. Neural Computation, 12(10):2451--2471, 2000.

[7] A. Graves, J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18:5-6, pp. 602-610, 2005.

[8] A. Graves, S. Fernandez, F. Gomez, J. Schmidhuber. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 06, Pittsburgh, 2006. ftp://ftp.idsia.ch/pub/juergen/icml2006.pdf

[9] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber. A Novel Connectionist System for Improved Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, 2009.

[10] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. NIPS'22, p 545-552, Vancouver, MIT Press, 2009. http://www.idsia.ch/~juergen/nips2009.pdf

[11] J. Schmidhuber, D. Ciresan, U. Meier, J. Masci, A. Graves. On Fast Deep Nets for AGI Vision. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. http://www.idsia.ch/~juergen/agivision2011.pdf

[12] A. Graves, A. Mohamed, G. E. Hinton. Speech Recognition with Deep Recurrent Neural Networks. ICASSP 2013, Vancouver, 2013. http://www.cs.toronto.edu/~hinton/absps/RNN13.pdf

[13] J. Hawkins, D. George. Hierarchical Temporal Memory - Concepts, Theory, and Terminology. Numenta Inc., 2006.

[14] R. Kurzweil. How to Create a Mind: The Secret of Human Thought Revealed. ISBN 0670025291, 2012.

[15] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp. 504 - 507, 2006. http://www.cs.toronto.edu/~hinton/science.pdf

[16] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989. http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf

[16a] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Handwritten digit recognition with a back-propagation network. Proc. NIPS 1989, 2, Morgan Kaufman, Denver, CO, 1990.

[17] D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber. Deep Big Simple Neural Nets For Handwritten Digit Recognition. Neural Computation 22(12): 3207-3220, 2010. http://arxiv.org/abs/1003.0358

[18] D. H. Hubel, T. N. Wiesel. Receptive Fields, Binocular Interaction And Functional Architecture In The Cat's Visual Cortex. Journal of Physiology, 1962.

[19] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4): 193-202, 1980. Scholarpedia http://www.scholarpedia.org/article/Neocognitron

[19a] K. Fukushima: Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979.

[20] M. Riesenhuber, T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience 11, p 1019-1025, 1999. http://riesenhuberlab.neuro.georgetown.edu/docs/publications/nn99.pdf

[21] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. http://www.idsia.ch/~juergen/ijcai2011.pdf

[22] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, 2012. http://www.idsia.ch/~juergen/cvpr2012.pdf

[23] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998. http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

[24] S. Behnke. Hierarchical Neural Networks for Image Interpretation. Dissertation, FU Berlin, 2002. LNCS 2766, Springer 2003. http://www.ais.uni-bonn.de/books/LNCS2766.pdf

[25] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. Multi-Column Deep Neural Network for Traffic Sign Classification. Neural Networks 32: 333-338, 2012. http://www.idsia.ch/~juergen/nn2012traffic.pdf

[25a] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. A Committee of Neural Networks for Traffic Sign Classification. International Joint Conference on Neural Networks (IJCNN-2011, San Francisco), 2011.

[26] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 25, MIT Press, 2012. http://www.cs.toronto.edu/~hinton/absps/imagenet.pdf

[27] A. Coates, B. Huval, T. Wang, D. J. Wu, A. Y. Ng, B. Catanzaro. Deep Learning with COTS HPC Systems. ICML 2013. http://www.stanford.edu/~acoates/papers/CoatesHuvalWangWuNgCatanzaro_icml2013.pdf

[28] J. Masci, A. Giusti, D. Ciresan, G. Fricout, J. Schmidhuber. A Fast Learning Algorithm for Image Segmentation with Max-Pooling Convolutional Networks. ICIP 2013. http://arxiv.org/abs/1302.1690

[28a] A. Giusti, D. Ciresan, J. Masci, L.M. Gambardella, J. Schmidhuber. Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks. ICIP 2013. http://arxiv.org/abs/1302.1700

[29] P. J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, 1974

[29a] P. J. Werbos. Applications of advances in nonlinear sensitivity analysis. In R. Drenick, F. Kozin, (eds): System Modeling and Optimization: Proc. IFIP (1981), Springer, 1982.

[29b] P. J. Werbos. Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities. In H.M. Bücker, G. Corliss, P. Hovland, U. Naumann, B. Norris (Eds.), Automatic Differentiation: Applications, Theory, and Implementations, 2006. http://www.werbos.com/AD2004.pdf

[30] Y. LeCun: Une procedure d'apprentissage pour reseau a seuil asymetrique. Proceedings of Cognitiva 85, 599-604, Paris, France, 1985. http://yann.lecun.com/exdb/publis/pdf/lecun-85.pdf

[31] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, pages 318-362. MIT Press, 1986 http://www.cs.toronto.edu/~hinton/absps/pdp8.pdf

[32] R. J. Williams. Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report NU-CCS-89-27, Boston: Northeastern University, College of Computer Science, 1989

[33] A. J. Robinson and F. Fallside. The utility driven dynamic error propagation network. TR CUED/F-INFENG/TR.1, Cambridge University Engineering Department, 1987

[34] P. J. Werbos. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1, 1988

[35] D. H. Ballard. Modular learning in neural networks. Proc. AAAI-87, Seattle, WA, p 279-284, 1987

[36] G. E. Hinton. Connectionist learning procedures. Artificial Intelligence 40, 185-234, 1989. http://www.cs.toronto.edu/~hinton/absps/clp.pdf

[37] B. A. Pearlmutter. Learning state space trajectories in recurrent neural networks. Neural Computation, 1(2):263-269, 1989

[38] J. Schmidhuber. A fixed size storage O(n^3) time complexity learning algorithm for fully recurrent continually running networks. Neural Computation, 4(2):243-248, 1992.

[39] J. Martens and I. Sutskever. Training Recurrent Neural Networks with Hessian-Free Optimization. In Proc. ICML 2011.

[40] K. Fukushima: Artificial vision by multi-layered neural networks: Neocognitron and its advances, Neural Networks, vol. 37, pp. 103-119, 2013. http://dx.doi.org/10.1016/j.neunet.2012.09.016

[41a] G. B. Orr, K.R. Müller, eds., Neural Networks: Tricks of the Trade. LNCS 1524, Springer, 1999.

[41b] G. Montavon, G. B. Orr, K.R. Müller, eds., Neural Networks: Tricks of the Trade. LNCS 7700, Springer, 2012.

[41c] Lots of additional tricks for improving (e.g., accelerating, robustifying, simplifying, regularising) NN can be found in the proceedings of NIPS (since 1987), IJCNN (of IEEE & INNS, since 1989), ICANN (since 1991), and other NN conferences since the late 1980s. Given the recent attention to NN, many of the old tricks may get revived.

[42] H. Baird. Document image defect models. IAPR Workshop, Syntactic & Structural Pattern Recognition, p 38-46, 1990

[43] P. Y. Simard, D. Steinkraus, J.C. Platt. Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. ICDAR 2003, p 958-962, 2003.

[44] I. J. Goodfellow, A. Courville, Y. Bengio. Spike-and-Slab Sparse Coding for Unsupervised Feature Discovery. Proc. ICML, 2012.

[45] D. Ciresan, U. Meier, J. Schmidhuber. Transfer Learning for Latin and Chinese Characters with Deep Neural Networks. Proc. IJCNN 2012, p 1301-1306, 2012.

[46] D. Scherer, A. Mueller, S. Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Proc. ICANN 2010. http://www.ais.uni-bonn.de/papers/icann2010_maxpool.pdf

[47] J. Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87-95. Augustinus, 1993.

[48] R. E. Schapire. The Strength of Weak Learnability. Machine Learning 5 (2): 197-227, 1990.

[49] M. A. Ranzato, C. Poultney, S. Chopra, Y. LeCun. Efficient learning of sparse representations with an energy-based model. Proc. NIPS, 2006.

[50] M. Ranzato, F.J. Huang, Y. Boureau, Y. LeCun. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition. Proc. CVPR 2007, Minneapolis, 2007. http://www.cs.toronto.edu/~ranzato/publications/ranzato-cvpr07.pdf

[51] P. Sermanet, Y. LeCun. Traffic sign recognition with multi-scale convolutional networks. Proc. IJCNN 2011, p 2809-2813, IEEE, 2011

[52] J. B. Pollack. Implications of Recursive Distributed Representations. Advances in Neural Information Processing Systems I, NIPS, 527-536, 1989.

[53] Deep Learning NN win 2012 Brain Image Segmentation Contest http://www.idsia.ch/~juergen/deeplearningwinsbraincontest.html

[54] Deep Learning NN win MICCAI 2013 Grand Challenge (and 2012 ICPR Contest) on Mitosis Detection http://www.idsia.ch/~juergen/deeplearningwinsMICCAIgrandchallenge.html

[55] K. Chellapilla, S. Puri, P. Simard. High performance convolutional neural networks for document processing. International Workshop on Frontiers in Handwriting Recognition, 2006.

[56] A. Griewank. Who invented the reverse mode of differentiation? Documenta Mathematica, Extra Volume ISMP, p 389-400, 2012

[57] H. J. Kelley. Gradient Theory of Optimal Flight Paths. ARS Journal, Vol. 30, No. 10, pp. 947-954, 1960.

[57a] A. E. Bryson. A gradient method for optimizing multi-stage allocation processes. Proc. Harvard Univ. Symposium on digital computers and their applications, 1961.

[58] A. E. Bryson, Y. Ho. Applied optimal control: optimization, estimation, and control. Waltham, MA: Blaisdell, 1969.

[59] D.B. Parker. Learning-logic, TR-47, Sloan School of Management, MIT, Cambridge, MA, 1985.

[60] S. Linnainmaa. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 1970. See chapters 6-7 and FORTRAN code on pages 58-60. http://www.idsia.ch/~juergen/linnainmaa1970thesis.pdf

[61] S. Linnainmaa. Taylor expansion of the accumulated rounding error. BIT 16, 146-160, 1976. http://link.springer.com/article/10.1007%2FBF01931367

[62] G.M. Ostrovskii, Yu.M. Volin, W.W. Borisov. Über die Berechnung von Ableitungen. Wiss. Z. Tech. Hochschule für Chemie 13, 382-384, 1971.

[63] S. Amari, Natural gradient works efficiently in learning, Neural Computation, 10, 4-10, 1998

[64] J. H. Wilkinson. The algebraic eigenvalue problem. Clarendon Press, Oxford, UK, 1965.

[65] S. Amari, Theory of Adaptive Pattern Classifiers. IEEE Trans., EC-16, No. 3, pp. 299-307, 1967

[66] S. W. Director, R. A. Rohrer. Automated network design - the frequency-domain case. IEEE Trans. Circuit Theory CT-16, 330-337, 1969.

[67] R. Bellman. Dynamic Programming. Princeton University Press, 1957.

[68] G. W. Leibniz. Memoir using the chain rule, 1676. (Cited in TMME 7:2&3 p 321-332, 2010)

[69] G. F. A. L'Hospital. Analyse des infiniment petits - Pour l'intelligence des lignes courbes. Paris: L'Imprimerie Royale, 1696.
 
Quite, um, disturbing and insanely entertaining at the same time.
And it is a real product!
 
The more I check out the Z30, the more I want it; besides, I need to replace my aged Galaxy Nexus.

#fuelmyfire

#nice  
 
Watch this robot perform a perfect quadruple backflip

YouTuber hinamitetu has engineered a squadron of robot gymnasts capable of executing flips, handsprings, and high-bar acrobatics. Bots capable of entry into other artistic events are sure to follow. In this, his latest video, one of hinamitetu's creations performs a flawless quadruple backflip and sticks the landing like Kerri Strug. Please, nobody tell DARPA about this.

Read more: http://goo.gl/Oa8Mr1

Image: io9

Thoughts?

so #funny I can't stop laughing!
 
Sunday funny

I was looking for my keys. They were not in my pockets. A quick search in the meeting room revealed nothing.
       
Suddenly I realized I must have left them in the car. Frantically, I headed for the parking lot. My husband has scolded me many times for leaving the keys in the ignition.

My theory is the ignition is the best place not to lose them. His theory is that the car will be stolen.

As I burst through the door, I came to a terrifying conclusion. His theory was right.

The parking lot was empty. I immediately called the police. I gave them my location, confessed that I had left my keys in the car, and that it had been stolen.
   
Then I made the most difficult call of all. "Hi, honey," I stammered. (I always call him "honey" in times like these.) "I love you."

"I left my keys in the car and it's been stolen."

There was a period of silence. I thought the call had been dropped, but then I heard his voice. "Are you kidding me!" he barked. "I dropped you off!!!"

Now it was my turn to be silent. Embarrassed, I said, "Well, come and get me."

He retorted, "I will, as soon as I convince this cop I didn't steal your car."
 
Nice video/article by +Simon Sage re: BlackBerry Z30 - I am buying it the moment it is on sale in Canada and relegating my Galaxy Nexus to home phone status.
Work
Occupation
Software Developer
Employment
  • QNX Software Systems Limited, a subsidiary of BlackBerry
    Support Specialist/Software Developer, 2009 - present
Places
Currently
Ottawa, Ontario
Previously
Geneva, Switzerland - North Andover, Massachusetts - Abidjan, Ivory Coast - Ibadan, Nigeria
Links
Contributor to
Story
Introduction
aka Jester aka JSK aka #crooks alum
Bragging rights
Proud father of two wonderful boys!
Education
  • International School of Geneva
    Grade School
  • Brooks School
    High School, 1996 - 1998
  • Oxford Academy
    High School, 1998 - 1999
  • Ottawa University
    Bachelor of Arts Concentration Linguistics, 2000 - 2005
  • Algonquin College
    Computer Engineering Technology - Computing Science, 2005 - 2008
Basic Information
Gender
Male
Relationship
Married
Gervais Mulongoy's +1's are the things they like, agree with, or want to recommend.
BBM
market.android.com

The OFFICIAL version of BBM™ from BlackBerry is now here for Android. Get the free BBM app for the best way to stay connected with friends a

Stallman: How Much Surveillance Can Democracy Withstand? | Wired Opinion...
www.wired.com

Where exactly is the maximum tolerable level of surveillance, beyond which it becomes oppressive? We must consider surveillance a kind of so

Behold The Unfilmable: Hyperion Cantos
litreactor.com

The Hyperion Cantos is a fantastic work of Science Fiction, but it would be a disaster if it were made into a film.

Android
plus.google.com

A place for Android fans everywhere to meet, share and get the latest on all things Android.

End Piracy, Not Liberty – Google
www.google.com

Millions of Americans oppose SOPA and PIPA because these bills would censor the Internet and slow economic growth in the U.S.. Two bills bef

Galaxy Nexus
www.google.com

Galaxy Nexus. First phone with Android 4.0, Face Unlock, Android Beam, an amazing HD screen and 4G LTE fast.

Dub Gabriel
dubgabriel.bandcamp.com

Pushing the boundaries of electronic, reggae, world music and rock, and appearing on over 25 releases, DG has become one of the most in-dema

Humble Bundle
plus.google.com

Pay what you want. Support charity. Get awesome games.

BBC Africa
plus.google.com

African news from the BBC

Galaxy Nexus
www.google.ca

Galaxy Nexus. First phone with Android 4.0, Face Unlock, Android Beam, an amazing HD screen and 4G LTE fast.

Factory Images for Nexus Devices - Google Support for Nexus Phones and F...
code.google.com

Factory Images for Nexus Devices. This page contains binary image files that are provided for use in restoring your Nexus device's origi

NOW Showcase: Eric Benet
www.youtube.com

www.theNOW-online.com We're sharing our evening with Eric Benet, as he sings our favorites like "Chocolate Legs," "You&#3

Google Reader
market.android.com

Follow all your favorite sites, blogs, and more, all in one place. Follow all your favorite sites, blogs, and more, all in one place. See wh

YouTube - Dorian Concept - Her Tears Taste Like Pears
www.youtube.com

Create AccountSign In. Home. BrowseMoviesUpload. Hey there, this is not a commercial interruption. You're using an outdated browser, whi

Sisterhood of Dune
www.chapters.indigo.ca

It is eighty-three years after the last of the thinking machines were destroyed in the Battle of Corrin, after Faykan Butler took the name o

MDK2 gets $15 HD re-release on PC, looks and plays great
feeds.arstechnica.com

MDK2HD takes a classic and gives it a facelift, with some gameplay tweaks. $15 for a quad-wielding mutant dog and his two friends i

Gentoo Forums :: View topic - is it safe to remove these --depclean pack...
forums.gentoo.org

Code: >>> Calculating removal order... >>> These are the packages that would be unmerged: dev-perl/extutils-pkgconfig sele

The Most Expensive One-Byte Mistake - Slashdot
developers.slashdot.org

An anonymous reader writes "Poul-Henning Kamp looks back at some of the bad decisions made in language design, specifically the C/Unix/Posix