Markus Breitenbach
I'm sciencing as fast as I can
Posts

Post has attachment
Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee
Alireza Aghasi, Afshin Abdi, Nam Nguyen, Justin Romberg
(Submitted on 16 Nov 2016 (v1), last revised 23 Nov 2017 (this version, v4))

We introduce and analyze a new technique for model reduction in deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive the general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length N as inputs, we show that if the network response can be described using at most s non-zero weights per node, these weights can be learned from O(s log N) samples.
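
To make the layer-wise program concrete, here is a minimal sketch of one Net-Trim-style convex step using cvxpy. The variable names, the epsilon tolerance, and the exact constraint form are my assumptions, not the authors' released code: active outputs are matched up to epsilon (a linear constraint), inactive units are kept non-positive before the ReLU, and an l1 objective promotes sparsity.

```python
# Hypothetical sketch of one layer's Net-Trim-style convex program (not the
# authors' code). X: (n x m) layer inputs, Y = relu(W0.T @ X): (k x m)
# original layer outputs, epsilon: consistency tolerance (assumed).
import numpy as np
import cvxpy as cp

def net_trim_layer(X, Y, epsilon):
    n, _ = X.shape
    k = Y.shape[0]
    W = cp.Variable((n, k))          # sparse replacement weights
    Z = W.T @ X                      # new pre-activations
    on = (Y > 0).astype(float)       # where the original ReLU fired
    off = 1.0 - on
    constraints = [
        # match the active outputs up to epsilon (linear, hence convex)
        cp.norm(cp.multiply(on, Z - Y), 'fro') <= epsilon,
        # keep originally inactive units non-positive before the ReLU
        cp.multiply(off, Z) <= 0,
    ]
    # l1 objective promotes sparsity in the new weight matrix
    prob = cp.Problem(cp.Minimize(cp.sum(cp.abs(W))), constraints)
    prob.solve()
    return W.value
```

Running this per layer with the original inputs mirrors the parallel variant from the abstract; feeding each pruned layer's outputs forward into the next solve mirrors the cascade variant.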

Post has shared content
Notes from NIPS 2017

1) John Platt's talk on energy, fusion, and the next 100 years of human civilization. Definitely worth a watch! He does a great job of framing the problem and building up to his research group's focus. I'm still optimistic that renewables can provide more than the proposed 40% of energy for the world.

2) Kate Crawford's talk on bias in ML. As with Joelle Pineau's talk on reproducibility, Kate's talk comes at an excellent time to get folks in the community to think deeply about these issues as we build the next generation of tools and systems.

3) Joelle Pineau's talk (no public link yet available) on reproducibility during the Deep RL Symposium.

4) Ali Rahimi's test-of-time talk that caused a lot of buzz around the conference (the "alchemy" piece begins at the 11-minute mark). My takeaway is that Ali is calling for more rigor in our experimentation, methods, and evaluation (and not necessarily just more theory). In light of the findings presented in Joelle's talk, I feel compelled to agree with Ali (at least for Deep RL, where experimental methods are still in the process of being defined). In particular, I think with RL we should open up to other kinds of experimental analysis beyond just "which algorithm got the most reward on task X" and consider other diagnostic tools to understand our algorithms: when did it converge? How suboptimal is the converged policy? How well did it explore the space? How often did an algorithm find a really bad policy, and why? Where does it fail, and why? Ali and Ben just posted a follow-up to their talk that's worth a read.

5) The Hierarchical RL workshop! This event was a blast, in part because I love this area and find so many open foundational questions in it, but also because the speaker lineup and poster collection were fantastic. When videos become available I'll post links to some of my highlights, including the panel (see the end of my linked notes above for a rough transcript of the panel).

Post has shared content
Neural nets can replace other algorithms
The dominance of neural networks continues. This paper shows how learned models can outperform and replace state-of-the-art database indexes. From the paper:
"In summary, we have demonstrated that machine learned models have the potential to provide significant benefits over state-of-the-art database indexes, and we believe this is a fruitful direction for future research."
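
As a toy illustration of the idea, here is a sketch of a learned index in Python: a model predicts a key's position in a sorted array, and a bounded binary search corrects the residual error. The linear model and class names are illustrative stand-ins for the staged models in the paper, not its actual design.

```python
# Toy learned index: model predicts a key's position; a binary search over
# the model's worst-case error window guarantees correctness.
import bisect
import numpy as np

class LearnedIndex:
    def __init__(self, keys):
        self.keys = np.sort(np.asarray(keys, dtype=float))
        pos = np.arange(len(self.keys))
        # fit position ~ a * key + b (the paper uses staged models/nets)
        self.a, self.b = np.polyfit(self.keys, pos, deg=1)
        pred = np.clip(self.a * self.keys + self.b, 0, len(self.keys) - 1)
        # worst-case prediction error, recorded at build time
        self.err = int(np.ceil(np.max(np.abs(pred - pos))))

    def lookup(self, key):
        guess = int(np.clip(self.a * key + self.b, 0, len(self.keys) - 1))
        lo = max(0, guess - self.err)
        hi = min(len(self.keys), guess + self.err + 1)
        # binary search only inside the model's error window
        i = lo + bisect.bisect_left(self.keys[lo:hi], key)
        return i if i < len(self.keys) and self.keys[i] == key else None
```

The error bound recorded at build time is what keeps lookups correct: the model only narrows the search window, so a bad prediction costs time, never correctness.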

Post has attachment
Long Text Generation via Adversarial Training with Leaked Information
Jiaxian Guo, Sidi Lu, Han Cai, Weinan Zhang, Yong Yu, Jun Wang
(Submitted on 24 Sep 2017 (v1), last revised 8 Dec 2017 (this version, v2))

Automatically generating coherent and semantically meaningful text has many applications in machine translation, dialogue systems, image captioning, etc. Recently, by combining with policy gradient, Generative Adversarial Nets (GANs), which use a discriminative model to guide the training of the generative model as a reinforcement learning policy, have shown promising results in text generation. However, the scalar guiding signal is only available after the entire text has been generated and lacks intermediate information about text structure during the generative process. This limits their success when the generated text samples are long (more than 20 words). In this paper, we propose a new framework, called LeakGAN, to address the problem of long text generation. We allow the discriminative net to leak its own high-level extracted features to the generative net to further help the guidance. The generator incorporates such informative signals into all generation steps through an additional Manager module, which takes the extracted features of the words generated so far and outputs a latent vector to guide the Worker module for next-word generation. Our extensive experiments on synthetic data and various real-world tasks with a Turing test demonstrate that LeakGAN is highly effective in long text generation and also improves performance in short text generation scenarios. More importantly, without any supervision, LeakGAN is able to implicitly learn sentence structures through the interaction between Manager and Worker alone.
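
Here is a compressed PyTorch sketch of the Manager/Worker interaction the abstract describes. Dimensions, module names, and the single-step interface are illustrative assumptions, not the authors' exact architecture (which also trains the generator via policy gradient against the leaky discriminator).

```python
# Illustrative Manager/Worker step: the Manager turns features leaked by
# the discriminator into a goal vector that conditions the Worker's
# next-word prediction. All sizes and names are assumptions.
import torch
import torch.nn as nn

class LeakGANGenerator(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hid_dim=64, goal_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Manager consumes the feature vector leaked by the discriminator
        self.manager = nn.LSTMCell(hid_dim, hid_dim)
        self.to_goal = nn.Linear(hid_dim, goal_dim)
        # Worker conditions next-word prediction on the Manager's goal
        self.worker = nn.LSTMCell(emb_dim, hid_dim)
        self.out = nn.Linear(hid_dim + goal_dim, vocab_size)

    def step(self, token, leaked, m_state, w_state):
        # token: (batch,) word ids; leaked: (batch, hid_dim) discriminator
        # features for the text generated so far (dimension assumed)
        m_h, m_c = self.manager(leaked, m_state)
        goal = self.to_goal(m_h)              # latent guidance vector
        w_h, w_c = self.worker(self.embed(token), w_state)
        logits = self.out(torch.cat([w_h, goal], dim=-1))
        return logits, (m_h, m_c), (w_h, w_c)
```

The key point the sketch captures is that guidance arrives at every generation step through the goal vector, rather than as a single scalar reward at the end of the sentence.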

Post has attachment
Solving internal covariate shift in deep learning with linked neurons
Carles Roger Riera Molina, Oriol Pujol Vila
(Submitted on 7 Dec 2017)

This work proposes a novel solution to the problems of internal covariate shift and dying neurons using the concept of linked neurons. We define the neuron linkage in terms of two constraints: first, all neuron activations in the linkage must have the same operating point, that is, they all share input weights. Second, a set of neurons is linked if and only if at least one member of the linkage has a non-zero gradient with respect to the input of the activation function. This means that for any input to the activation function, at least one member of the linkage operates in a non-flat, non-zero region. This simple change has profound implications for the network's learning dynamics. In this article we explore the consequences of this proposal and show that by using these units, internal covariate shift is implicitly solved. As a result, linked neurons allow training arbitrarily large networks without any architectural or algorithmic trick, effectively removing the need for re-normalization schemes such as Batch Normalization, which halves the required training time. They also remove the need for standardized input data. Results show that units using the linkage not only effectively solve the aforementioned problems, but are also a competitive alternative to the state of the art, with very promising results.
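
A minimal sketch of one way to satisfy the two constraints in PyTorch: a pair of ReLUs shares the same pre-activation (hence the same operating point), and one member sees its negation, so for any input at least one of the pair has a non-zero gradient. This concrete pairing is my illustrative reading of the construction, not necessarily the paper's exact design.

```python
# Linked ReLU pair sharing input weights; one unit sees -z, so the linkage
# always has a member in a non-flat, non-zero region of its activation.
import torch
import torch.nn as nn

class LinkedReLU(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # one shared weight matrix -> same operating point for the pair
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        z = self.linear(x)
        # relu(z) is flat for z < 0, relu(-z) is flat for z > 0; jointly
        # at least one of the pair passes gradient for every input
        return torch.cat([torch.relu(z), torch.relu(-z)], dim=-1)
```

Note that each linkage doubles the layer's output width, which the following layer has to account for.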

Post has shared content
Google Is Giving Away AI That Can Build Your Genome Sequence

Today, a teaspoon of spit and a hundred bucks is all you need to get a snapshot of your DNA. But getting the full picture, all 3 billion base pairs of your genome, requires a much more laborious process, one that, even with the aid of sophisticated statistics, scientists still struggle over. It's exactly the kind of problem that makes sense to outsource to artificial intelligence.

On Monday, Google released a tool called DeepVariant that uses deep learning, the machine learning technique that now dominates AI, to assemble full human genomes. Modeled loosely on the networks of neurons in the human brain, these massive mathematical models have learned how to do things like identify faces posted to your Facebook news feed, transcribe your inane requests to Siri, and even fight internet trolls. And now, engineers at Google Brain and Verily (Alphabet's life sciences spin-off) have taught one to take raw sequencing data and line up the billions of As, Ts, Cs, and Gs that make you you. And oh yeah, it's more accurate than all the existing methods out there.

Last year, DeepVariant took first prize in an FDA contest promoting improvements in genetic sequencing. The open-source version the Google Brain/Verily team introduced to the world on Monday reduced the error rates even further, by more than 50 percent. Looks like grandmaster Ke Jie isn't the only one getting bested by Google's AI neural networks this year.

Post has attachment
Generalization Theory and Deep Nets, An introduction http://offconvex.github.io/2017/12/08/generalization1/
Why deep nets generalize instead of simply overfitting.

Post has shared content
'AG Zero is cool but it'll never defeat the superhuman chess engines tuned over half a century and massive distributed computing projects; it's really only appropriate for Go'.

Guess what: AG Zero can be used to learn chess, and it defeats Stockfish after 4 hours of training: "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", Silver et al 2017 https://arxiv.org/abs/1712.01815

"The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case."