Profile

Peter Meijer (The vOICe)
Works at Metamodal
Attended Delft University of Technology
Lives in Eindhoven, The Netherlands
254 followers | 82,743 views

Stream

Pinned
 
(The Guardian) The vOICe: the soundscape headsets that allow blind people to 'see' the world http://www.theguardian.com/society/2014/dec/07/voice-soundscape-headsets-allow-blind-see with +Michael Proulx. Nice coverage!
Technology scans the environment and translates images into whistles and bleeps users can understand
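The mapping is simple enough to sketch in code. Below is a minimal, illustrative Python rendition of a vOICe-style column scan, where vertical position maps to pitch and pixel brightness to loudness as the image is swept from left to right; the frequency range, scan duration, and sample rate here are assumptions for illustration, not The vOICe's actual parameters.

```python
import numpy as np

def image_to_soundscape(image, duration=1.0, sample_rate=22050,
                        f_min=500.0, f_max=5000.0):
    """Minimal sketch of a vOICe-style image-to-sound mapping:
    columns are scanned left to right over `duration` seconds,
    row height maps to pitch, and brightness maps to loudness.
    All parameter values are illustrative assumptions."""
    rows, cols = image.shape                  # grayscale image in [0, 1]
    samples_per_col = int(duration * sample_rate / cols)
    # One sinusoid per image row; the top row gets the highest pitch.
    freqs = f_max * (f_min / f_max) ** (np.arange(rows) / max(rows - 1, 1))
    audio = []
    for c in range(cols):
        t = np.arange(samples_per_col) / sample_rate
        tones = image[:, c, None] * np.sin(2 * np.pi * freqs[:, None] * t)
        audio.append(tones.sum(axis=0))       # mix the rows of this column
    audio = np.concatenate(audio)
    return audio / (np.abs(audio).max() + 1e-12)  # normalize to [-1, 1]
```

Writing the returned samples to a WAV file and listening column by column gives a rough feel for why trained users can learn to decode such soundscapes.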
10 comments
 
+Peter Meijer perhaps a better solution for The vOICe on smart glasses is to try to convince an existing smart glasses manufacturer to support Linux or Android? Maybe an online petition and/or crowdfunding charity campaign could be enough to push one of them to do it? (If you actually think it would be feasible, then I'd gladly help you spread the word)
 
Algorithms for eyes: How deep learning can help the blind http://www.infoworld.com/article/2938030/machine-learning/algorithms-for-eyes-how-deep-learning-can-help-the-blind.html by +James Kobielus; cf. the related recent brief discussion on TeraDeep and The vOICe https://plus.google.com/+EugenioCulurciello/posts/RhbQbpKr8gB : See for yourself (sensory substitution, raw vision) and/or by "canned sighted guide" (computer vision w/ deep learning, semantic analysis). #blind #a11y #vision  
Algorithms for real-time collision avoidance, geospatial nav, and situational awareness -- coupled with haptic feedback -- may soon provide the visually impaired with invaluable aid
 
Augmented-Reality glasses could help legally blind navigate http://www.technologyreview.com/news/538491/augmented-reality-glasses-could-help-legally-blind-navigate/ on the work of +Stephen Hicks 
Startup VA-ST thinks its depth-sensing glasses can help people with little sight get around more easily.
 
Cross-modal orienting of visual attention: sound controls sight http://www.sciencedirect.com/science/article/pii/S0028393215300518
 
Interesting read about proper scientific credit.
 
Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436)

Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

As a case in point, let me now comment on a recent article in Nature (2015) about "deep learning" in artificial neural networks (NNs), by LeCun & Bengio & Hinton (LBH for short), three CIFAR-funded collaborators who call themselves the "deep learning conspiracy" (e.g., LeCun, 2015). They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago. All references below are taken from the recent deep learning overview (Schmidhuber, 2015), except for a few papers listed beneath this critique, which focuses on nine items.

1. LBH's survey does not even mention the father of deep learning, Alexey Grigorevich Ivakhnenko, who published the first general, working learning algorithms for deep networks (e.g., Ivakhnenko and Lapa, 1965). A paper from 1971 already described a deep learning net with 8 layers (Ivakhnenko, 1971), trained by a highly cited method still popular in the new millennium. Given a training set of input vectors with corresponding target output vectors, layers of additive and multiplicative neuron-like nodes are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set, where regularisation is used to weed out superfluous nodes. The numbers of layers and nodes per layer can be learned in problem-dependent fashion.
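To make the method concrete, here is a rough Python sketch of the layer-growing idea; the pairwise polynomial units, the plain least-squares fit, and the `keep` parameter are simplifications for illustration, not Ivakhnenko's algorithm verbatim.

```python
import numpy as np

def pair_features(X, i, j):
    """Design matrix of one additive/multiplicative unit: 1, x_i, x_j, x_i*x_j."""
    return np.column_stack([np.ones(len(X)), X[:, i], X[:, j], X[:, i] * X[:, j]])

def grow_gmdh_layer(X_tr, y_tr, X_va, y_va, keep=8):
    """Rough sketch of one Ivakhnenko-style (GMDH) layer: fit every
    pairwise unit by least-squares regression on the training set,
    then prune, keeping only the units that do best on a separate
    validation set. Stacking such layers yields a deep net whose
    depth and width are learned in problem-dependent fashion."""
    candidates = []
    for i in range(X_tr.shape[1]):
        for j in range(i + 1, X_tr.shape[1]):
            w, *_ = np.linalg.lstsq(pair_features(X_tr, i, j), y_tr, rcond=None)
            val_err = np.mean((pair_features(X_va, i, j) @ w - y_va) ** 2)
            candidates.append((val_err, i, j, w))
    survivors = sorted(candidates, key=lambda c: c[0])[:keep]   # pruning step
    new_X_tr = np.column_stack([pair_features(X_tr, i, j) @ w
                                for _, i, j, w in survivors])
    new_X_va = np.column_stack([pair_features(X_va, i, j) @ w
                                for _, i, j, w in survivors])
    return new_X_tr, new_X_va, survivors[0][0]  # features for next layer + best error
```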

2. LBH discuss the importance and problems of gradient descent-based learning through backpropagation (BP), and cite their own papers on BP, plus a few others, but fail to mention BP's inventors. BP's continuous form was derived in the early 1960s (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only. BP's modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients. By 1980, automatic differentiation could derive BP for any differentiable graph (Speelpenning, 1980). Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis (cited by LBH), which did not have Linnainmaa's (1970) modern, efficient form of BP. BP for NNs on computers 10,000 times faster per dollar than those of the 1960s can yield useful internal representations, as shown by Rumelhart et al. (1986), who also did not cite BP's inventors.
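For readers who want the chain rule spelled out, the following illustrative Python sketch performs one BP step for a tiny two-layer tanh network with squared-error loss; it is a textbook reconstruction, not any of the historical implementations cited above.

```python
import numpy as np

def backprop_step(x, y, W1, W2, lr=0.1):
    """One gradient-descent step via chain-rule backpropagation for a
    two-layer net with tanh hidden units and squared-error loss.
    Shapes: x (n,), y (o,), W1 (h, n), W2 (o, h); lr is illustrative."""
    # Forward pass
    h = np.tanh(W1 @ x)            # hidden activations
    y_hat = W2 @ h                 # linear output
    # Backward pass: apply the chain rule layer by layer
    d_out = y_hat - y              # dL/dy_hat for L = 0.5 * ||y_hat - y||^2
    dW2 = np.outer(d_out, h)
    d_h = W2.T @ d_out             # propagate the error back through W2
    d_pre = d_h * (1.0 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dW1 = np.outer(d_pre, x)
    return W1 - lr * dW1, W2 - lr * dW2
```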

3. LBH claim: "Interest in deep feedforward networks [FNNs] was revived around 2006 (refs 31-34) by a group of researchers brought together by the Canadian Institute for Advanced Research (CIFAR)." Here they refer exclusively to their own labs, which is misleading. For example, by 2006, many researchers had used deep nets of the Ivakhnenko type for decades. LBH also ignore earlier, closely related work funded by other sources, such as the deep hierarchical convolutional neural abstraction pyramid (e.g., Behnke, 2003b), which was trained to reconstruct images corrupted by structured noise, enforcing increasingly abstract image representations in deeper and deeper layers. (BTW, the term "Deep Learning" (the very title of LBH's paper) was introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al. (2000), neither of which is cited by LBH.)

4. LBH point to their own work (since 2006) on unsupervised pre-training of deep FNNs prior to BP-based fine-tuning, but fail to clarify that this was very similar in spirit and justification to the much earlier successful work on unsupervised pre-training of deep recurrent NNs (RNNs) called neural history compressors (Schmidhuber, 1992b, 1993b). Such RNNs are even more general than FNNs. A first RNN uses unsupervised learning to predict its next input. Each higher level RNN tries to learn a compressed representation of the information in the RNN below, to minimise the description length (or negative log probability) of the data. The top RNN may then find it easy to classify the data by supervised learning. One can even "distill" a higher, slow RNN (the teacher) into a lower, fast RNN (the student), by forcing the latter to predict the hidden units of the former. Such systems could solve previously unsolvable very deep learning tasks, and started our long series of successful deep learning methods since the early 1990s (funded by Swiss SNF, German DFG, EU and others), long before 2006, although everybody had to wait for faster computers to make very deep learning commercially viable. LBH also ignore earlier FNNs that profit from unsupervised pre-training prior to BP-based fine-tuning (e.g., Maclin and Shavlik, 1995). They cite Bengio et al.'s post-2006 papers on unsupervised stacks of autoencoders, but omit the original work on this (Ballard, 1987).
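The compression idea is easy to sketch: a lower-level predictor anticipates the next input, and only its failures (with their time stamps) are passed upward, so the higher level sees a much shorter sequence. In the illustrative Python sketch below, `predict` stands in for a trained next-step RNN and the scalar threshold for a probabilistic surprise criterion; both are assumptions for illustration.

```python
def compress_history(seq, predict, threshold=0.5):
    """Conceptual sketch of one level of a neural history compressor.
    A lower-level predictor sees the sequence; only the elements it
    fails to predict (the "surprising" ones) are handed, with their
    time stamps, to the next level, which therefore operates on a
    much shorter, compressed sequence."""
    compressed = [(0, seq[0])]                 # the first element is always a surprise
    for t in range(1, len(seq)):
        expected = predict(seq[:t])            # next-step prediction from the past
        if abs(expected - seq[t]) > threshold: # prediction failed
            compressed.append((t, seq[t]))     # pass the surprise upward
    return compressed
```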

5. LBH write that "unsupervised learning (refs 91-98) had a catalytic effect in reviving interest in deep learning, but has since been overshadowed by the successes of purely supervised learning." Again they almost exclusively cite post-2005 papers co-authored by themselves. By 2005, however, this transition from unsupervised to supervised learning was old hat, because back in the 1990s, our unsupervised RNN-based history compressors (see above) were largely phased out by our purely supervised Long Short-Term Memory (LSTM) RNNs, now widely used in industry and academia for processing sequences such as speech and video. Around 2010, history repeated itself, as unsupervised FNNs were largely replaced by purely supervised FNNs, after our plain GPU-based deep FNN (Ciresan et al., 2010) trained by BP with pattern distortions (Baird, 1990) set a new record on the famous MNIST handwritten digit dataset, suggesting that advances in exploiting modern computing hardware were more important than advances in algorithms. While LBH mention the significance of fast GPU-based NN implementations, they fail to cite the originators of this approach (Oh and Jung, 2004).
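The pattern distortions referred to (Baird, 1990) amount to training on randomly perturbed copies of each digit. A minimal Python sketch of one such affine (rotate-and-shift) distortion follows; elastic deformations are omitted for brevity and all ranges are illustrative assumptions.

```python
import numpy as np

def random_distortion(img, max_shift=2, max_rot_deg=10, rng=None):
    """Sketch of a random pattern distortion for data augmentation:
    a small random rotation plus translation, with nearest-neighbour
    resampling. Ranges are illustrative, not the cited settings."""
    rng = rng or np.random.default_rng()
    h, w = img.shape
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    dy, dx = rng.uniform(-max_shift, max_shift, size=2)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel through the affine transform
            xs = np.cos(theta) * (x - cx) + np.sin(theta) * (y - cy) + cx - dx
            ys = -np.sin(theta) * (x - cx) + np.cos(theta) * (y - cy) + cy - dy
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:
                out[y, x] = img[yi, xi]
    return out
```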

6. In the context of convolutional neural networks (ConvNets), LBH mention pooling, but not its pioneer (Weng, 1992), who replaced Fukushima's (1979) spatial averaging by max-pooling, today widely used by many, including LBH, who write: "ConvNets were largely forsaken by the mainstream computer-vision and machine-learning communities until the ImageNet competition in 2012," citing Hinton's 2012 paper (Krizhevsky et al., 2012). This is misleading. Earlier, committees of max-pooling ConvNets were accelerated on GPU (Ciresan et al., 2011a), and used to achieve the first superhuman visual pattern recognition in a controlled machine learning competition, namely, the highly visible IJCNN 2011 traffic sign recognition contest in Silicon Valley (relevant for self-driving cars). The system was twice as good as humans, and three times better than the nearest non-human competitor (co-authored by LeCun of LBH). It also broke several other machine learning records, and surely was not "forsaken" by the machine-learning community. In fact, the later system (Krizhevsky et al., 2012) was very similar to the earlier 2011 system. Here one must also mention that the first official international contests won with the help of ConvNets actually date back to 2009 (three TRECVID competitions); compare Ji et al. (2013). A GPU-based max-pooling ConvNet committee was also the first deep learner to win a contest on visual object discovery in large images, namely, the ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images (Ciresan et al., 2013). A similar system was the first deep learning FNN to win a pure image segmentation contest (Ciresan et al., 2012a), namely, the ISBI 2012 Segmentation of Neuronal Structures in EM Stacks Challenge.
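In code, the change credited to Weng (1992) is a one-line difference from Fukushima's spatial averaging, as this illustrative sketch with non-overlapping 2x2 windows shows:

```python
import numpy as np

def pool2x2(feature_map, mode="max"):
    """Pool a 2-D feature map over non-overlapping 2x2 windows,
    contrasting max-pooling with Neocognitron-style averaging."""
    h, w = feature_map.shape
    blocks = feature_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))   # keep the strongest local response
    return blocks.mean(axis=(1, 3))      # spatial averaging, as in Fukushima (1979)
```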

7. LBH discuss their FNN-based speech recognition successes in 2009 and 2012, but fail to mention that deep LSTM RNNs had outperformed traditional speech recognizers on certain tasks already in 2007 (Fernández et al., 2007) (and traditional connected handwriting recognizers by 2009), and that today's speech recognition conferences are dominated by (LSTM) RNNs, not by the FNNs of 2009. While LBH cite work co-authored by Hinton on LSTM RNNs with several LSTM layers, this approach was pioneered much earlier (e.g., Fernández et al., 2007).

8. LBH mention recent proposals such as "memory networks" and the somewhat misnamed "Neural Turing Machines" (which do not have an unlimited number of memory cells like real Turing machines), but ignore very similar proposals of the early 1990s, on neural stack machines, fast weight networks, self-referential RNNs that can address and rapidly modify their own weights during runtime, etc. (e.g., AMAmemory, 2015). They write that "Neural Turing machines can be taught algorithms," as if this were something new, although LSTM RNNs were taught algorithms many years earlier, even entire learning algorithms (e.g., Hochreiter et al., 2001b).

9. In their outlook, LBH mention "RNNs that use reinforcement learning to decide where to look" but not that they were introduced a quarter-century ago (Schmidhuber & Huber, 1991). Compare the more recent Compressed NN Search for large attention-directing RNNs (Koutnik et al., 2013).

One more little quibble: While LBH suggest that "the earliest days of pattern recognition" date back to the 1950s, the cited methods are actually very similar to linear regressors of the early 1800s, by Gauss and Legendre. Gauss famously used such techniques to recognize predictive patterns in observations of the asteroid Ceres.
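To see how small the gap is, here is least squares in the Gauss/Legendre spirit, fitting and extrapolating a trajectory in Python; the observations are made up for illustration:

```python
import numpy as np

# Least squares as Gauss and Legendre used it: fit observations with a
# linear model and extrapolate. The data below are invented.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
pos = np.array([0.1, 0.9, 2.1, 2.9, 4.2])        # noisy observed positions
A = np.column_stack([np.ones_like(t), t])         # design matrix [1, t]
coef, *_ = np.linalg.lstsq(A, pos, rcond=None)    # minimizes ||A c - pos||^2
print("extrapolated position at t=5:", coef[0] + coef[1] * 5)
```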

LBH may be backed by the best PR machines of the Western world (Google hired Hinton; Facebook hired LeCun). In the long run, however, historic scientific facts (as evident from the published record) will be stronger than any PR. There is a long tradition of insights into deep learning, and the community as a whole will benefit from appreciating the historical foundations.

The contents of this critique may be used (also verbatim) for educational and non-commercial purposes, including articles for Wikipedia and similar sites.

References not yet in the survey (Schmidhuber, 2015):

Y. LeCun, Y. Bengio, G. Hinton (2015). Deep Learning. Nature 521, 436-444. http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html

Y. LeCun (2015). IEEE Spectrum Interview by L. Gomes, Feb 2015: http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/facebook-ai-director-yann-lecun-on-deep-learning

R. Dechter (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory. First paper to introduce the term "Deep Learning" to Machine Learning.

I. Aizenberg, N. N. Aizenberg, and J. P. L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media. First work to introduce the term "Deep Learning" to Neural Networks. Compare a popular G+ post on this: https://plus.google.com/100849856540000067209/posts/7N6z251w2Wd?pid=6127540521703625346&oid=100849856540000067209.

J. Schmidhuber (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. Preprint: http://arxiv.org/abs/1404.7828

AMAmemory (2015): Answer at reddit AMA (Ask Me Anything) on "memory networks" etc (with references): http://www.reddit.com/r/MachineLearning/comments/2xcyrl/i_am_j%C3%BCrgen_schmidhuber_ama/cp0q12t


#machinelearning
#artificialintelligence
#computervision
#deeplearning

Link: http://people.idsia.ch/~juergen/deep-learning-conspiracy.html
 
Funded by Google: Computer vision and mobile technology could help blind people 'see' http://www.lincoln.ac.uk/news/2015/06/1110.asp
 
Wicab's $10,000 BrainPort V100 tongue display for the blind gets FDA approval http://www.bloomberg.com/news/articles/2015-06-19/now-blind-americans-can-see-with-device-atop-their-tongues #sensorysubstitution #blind  

FDA allows marketing of new device to help the blind process visual signals via their tongues http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm451779.htm
 
Semantic-based crossmodal processing during visual suppression http://journal.frontiersin.org/article/10.3389/fpsyg.2015.00722/full on the influence of auditory input on visual awareness
 
The next Nexus smartphone has been tipped this week with a 3D camera akin to Google's Project Tango device line. Two Nexus smartphones have been tipped to be…
 
We can only hope.
 
(Polish) The vOICe in Kontrast, June issue, p. 16: "Quasi-wizja nadzieją dla niewidomych" ("Quasi-vision as hope for the blind") http://issuu.com/miesiecznikkontrast/docs/kontrast_czerwiec_30317f2cb16ed5/16?e=0
We encourage you to read the June issue of "Kontrast".
People
Have him in circles
254 people
Jack Wan's profile photo
JR Curley's profile photo
Russell James's profile photo
linh my's profile photo
Yunie Dhealova's profile photo
Telariya Dhaval K's profile photo
Olli Niemitalo's profile photo
Andy Lin's profile photo
SKTA Innopartners's profile photo
Work
Occupation
Medical image processing
Employment
  • Metamodal
    Neural engineering, 2011 - present
  • Hemics
    Principal Scientist, 2011 - present
  • NXP Semiconductors
    Senior Scientist, 2006 - 2010
  • Philips Research
    Senior Scientist, 1985 - 2006
Places
Currently
Eindhoven, The Netherlands
Story
Tagline
Developer of The vOICe sensory substitution system for the blind.
Introduction
Peter Meijer was born on June 5, 1961, in Sliedrecht, The Netherlands. He received his M.Sc. in Physics from Delft University of Technology in 1985, for work performed in the Solid State Physics group (nowadays Quantum Transport group) on non-equilibrium superconductivity and sub-micron photolithography.

From September 1985 until August 2006 he worked as a research scientist at Philips Research Laboratories in Eindhoven, The Netherlands, initially focusing on black-box modeling techniques for analog circuit simulation. He developed two classes of highly nonlinear multivariate interpolation techniques (published in IEEE Transactions on Circuits and Systems, 1990), and later generalized multilayer perceptron networks (also known as feedforward neural networks) for learning in the time and frequency domains. Separately, he developed an accelerated reliability simulator for hot-carrier degradation in CMOS circuits (presented at ESREF 1993). In May 1996 he received his Ph.D. from Eindhoven University of Technology, Department of Electrical Engineering, on the subject of dynamic neural networks for device and subcircuit modeling in circuit simulation. Dynamic neural networks were applied to modeling bipolar and MOS transistors, analog video filters (one-chip TV), folding AD converters, intermodulation distortion in mixers, two-port resonance in BNC connectors with leads, frequency-domain transformer modeling, heat flow in IC packages, and a variety of other cases.

From 1999 until 2003 he was cluster leader of the Future Design Technologies cluster within the research group Digital Design & Test at Philips Research, working on novel nanotechnology design options and the simulation and modeling of RF effects in high-speed analog and digital circuits. He applied a combination of FDTD (finite-difference time-domain, a 4D discretization technique for solving Maxwell's equations in space and time) and dynamic neural networks to model cross-talk in ultra-high-frequency (200 GHz) interconnect. His work on nanoimprint techniques was part of a cooperation between Philips and ASML.

In October 2006 he left Philips and joined the Central R&D organization of the newly founded NXP Semiconductors, to work in the field of computer vision research, programming a massively parallel SIMD-based hardware platform for real-time low-power video processing ("pixel crunching" with the 320-core Xetal chip). Applications included real-time body-tracking and automatic camera calibration for nonlinear lens and visual perspective distortion, and real-time stereo vision. In September 2008 the focus of his work shifted towards image processing for improved picture quality in digital television (DTV). He left NXP Semiconductors in December 2010.

In October 2011 he joined Hemics (formerly known as Akeso Medical Imaging), a medical device company active in the field of rheumatoid arthritis. Hemics aims to improve patients' quality of life by creating imaging devices that support rheumatologists in monitoring and treating the disease.

In parallel with his work in the medical and electronics industry, and in line with his interests in human sensing capabilities, he developed an image-to-sound conversion system known as "The vOICe", aimed at the development of a synthetic vision device (artificial vision system) for the blind. Starting with the design and implementation of a 5-stage pipelined special-purpose computer, he later developed software versions for Microsoft Windows netbooks (The vOICe Learning Edition), Nokia camera phones (The vOICe MIDlet, Java ME), and an augmented reality version for Android camera phones (The vOICe for Android). Further evaluation of this noninvasive technology takes place in cooperation with Harvard Medical School, California Institute of Technology, University of Düsseldorf (Institute of Experimental Psychology), University of Jerusalem (Institute of Medical Sciences), University of Lübeck (Institute for Neuro- and Bioinformatics), Montreal Neurological Institute, and other academic partners around the world.

Contact: http://www.seeingwithsound.com/contact.htm

Education
  • Delft University of Technology
    Physics, 1979 - 1985
Basic Information
Gender
Male