Profile

Mat Kelcey
Works at Google
Lives in San Francisco Bay Area
328 followers | 314,084 views

Stream

Mat Kelcey

Shared publicly  - 
 
A Neural Algorithm of Artistic Style: In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
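The separation the abstract describes comes down to two losses over CNN feature maps: a content loss on raw activations and a style loss on Gram matrices of those activations. Below is a minimal NumPy sketch of just those losses; the shapes, the random arrays standing in for VGG features, and the style weighting are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of the content/style losses described above, using
# NumPy arrays as stand-ins for CNN feature maps (the paper uses VGG
# features; shapes and the loss weighting here are illustrative only).
import numpy as np

def gram_matrix(features):
    """Style representation: correlations between feature channels.
    `features` has shape (channels, height * width)."""
    return features @ features.T

def content_loss(gen_features, content_features):
    # Squared-error distance between generated and content activations.
    return 0.5 * np.sum((gen_features - content_features) ** 2)

def style_loss(gen_features, style_features):
    # Squared-error distance between Gram matrices, normalised by
    # 1 / (4 * N^2 * M^2) for N channels and M spatial positions.
    n, m = gen_features.shape
    g_gen = gram_matrix(gen_features)
    g_style = gram_matrix(style_features)
    return np.sum((g_gen - g_style) ** 2) / (4 * n**2 * m**2)

# Toy usage: 64 channels over a 10x10 feature map, flattened.
rng = np.random.default_rng(0)
gen = rng.standard_normal((64, 100))
content = rng.standard_normal((64, 100))
style = rng.standard_normal((64, 100))
# The 1000.0 style weight is an illustrative content/style trade-off,
# gradient descent on the generated image would minimise this total.
total = content_loss(gen, content) + 1000.0 * style_loss(gen, style)
```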

Mat Kelcey

Shared publicly  - 
 
Not All Contexts Are Created Equal, Better Word Representations with Variable Attention: We introduce an extension to the bag-of-words model for learning word representations that takes into account both syntactic and semantic properties within language. This is done by employing an attention model that finds, within the contextual words, the ones that are relevant for each prediction. The general intuition of our model is that some words are only relevant for predicting local context (e.g. function words), while other words are more suited for determining global context, such as the topic of the document. Experiments performed on both semantically and syntactically oriented tasks show gains using our model over the existing bag-of-words model. Furthermore, compared to other more sophisticated models, our model scales better as we increase the size of the context.
Matt Siegel: Excellent find :D I'm impressed with the fast progress in NLP!

Mat Kelcey

Shared publicly  - 
 
Effective Approaches to Attention-based Neural Machine Translation: An attentional mechanism has been used in neural machine translation (NMT) lately to selectively focus on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches over the WMT translation tasks between English and German in both directions. Our attentional NMTs provide a boost of up to 5.0 BLEU points over non-attentional systems which already incorporate known techniques such as dropout. For the English to German direction, we have established new state-of-the-art results of 23.0 BLEU for WMT'14 and 25.9 BLEU for WMT'15. Our in-depth analysis sheds light on which architectures are best.
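As a rough illustration of the global variant described above, here is a NumPy sketch of dot-product attention over all encoder states. Dot-product scoring is one of the alternatives the paper examines; the dimensions and random vectors below are placeholders.

```python
# Hedged sketch of "global" attention: the decoder state attends to all
# encoder states via a dot-product score followed by a softmax.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(decoder_state, encoder_states):
    """decoder_state: (d,); encoder_states: (src_len, d).
    Returns the context vector and the alignment weights."""
    scores = encoder_states @ decoder_state        # (src_len,)
    weights = softmax(scores)                      # attend to ALL source words
    context = weights @ encoder_states             # (d,)
    return context, weights

# A "local" variant would restrict `encoder_states` to a window around
# a predicted alignment position before scoring.
rng = np.random.default_rng(0)
ctx, attn = global_attention(rng.standard_normal(8),
                             rng.standard_normal((5, 8)))
```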

Mat Kelcey

Shared publicly  - 
 
Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
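The gating the abstract mentions can be written in a few lines: a transform gate T(x) blends a candidate transformation H(x) with the untouched input. A minimal NumPy sketch follows, with random placeholder weights and the negative gate-bias initialisation that biases early layers toward the identity.

```python
# Minimal sketch of a single highway layer: a transform gate T decides
# how much of the transformed signal H(x) versus the raw input x to
# carry forward. Weights here are random placeholders; a real network
# would learn them by gradient descent.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    h = np.tanh(W_h @ x + b_h)          # candidate transformation H(x)
    t = sigmoid(W_t @ x + b_t)          # transform gate T(x) in (0, 1)
    return h * t + x * (1.0 - t)        # carry gate is 1 - T(x)

rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal(d)
# The gate bias starts negative so each layer is initially close to the
# identity, which is what lets signal pass through very deep stacks.
for _ in range(100):                    # hundreds of layers still pass signal
    x = highway_layer(x,
                      rng.standard_normal((d, d)) * 0.1, np.zeros(d),
                      rng.standard_normal((d, d)) * 0.1, -2.0 * np.ones(d))
```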

Mat Kelcey

Shared publicly  - 
 
Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation is limited in handling a large target vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method based on importance sampling that allows us to use a very large target vocabulary without increasing training complexity. We show that decoding can be done efficiently, even with a very large target vocabulary, by selecting only a small subset of the whole target vocabulary. Models trained with the proposed approach are empirically found to outperform baseline models with a small vocabulary as well as LSTM-based neural machine translation models. Furthermore, when we use an ensemble of a few models with very large target vocabularies, we achieve state-of-the-art translation performance (measured by BLEU) on English->German translation and almost match the performance of the state-of-the-art English->French translation system.
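The core idea, normalising the softmax over a small sampled candidate set rather than the full target vocabulary, can be sketched as follows. This is a simplified illustration, not the paper's exact biased-importance-sampling estimator; the vocabulary size, dimensions, and uniform negative sampling are all assumptions.

```python
# Rough sketch: approximate the softmax normalisation over a sampled
# subset of the target vocabulary instead of all of it.
import numpy as np

def sampled_softmax_loss(hidden, output_embeddings, target_id,
                         num_sampled, rng):
    vocab_size = output_embeddings.shape[0]
    # Sample negative word ids and always include the true target.
    # (A real implementation would exclude the target from negatives.)
    negatives = rng.choice(vocab_size, size=num_sampled, replace=False)
    candidate_ids = np.concatenate(([target_id], negatives))
    logits = output_embeddings[candidate_ids] @ hidden   # (1 + num_sampled,)
    # Normalise only over the candidates: cost is O(num_sampled),
    # not O(vocab_size).
    log_z = np.log(np.sum(np.exp(logits - logits.max()))) + logits.max()
    return -(logits[0] - log_z)

rng = np.random.default_rng(0)
loss = sampled_softmax_loss(rng.standard_normal(32),
                            rng.standard_normal((50_000, 32)),
                            target_id=42, num_sampled=1024, rng=rng)
```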

Mat Kelcey

Shared publicly  - 
 
We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
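A very loose sketch of how the four components interact, collapsed into one process for illustration: actors add experience to a shared store, learners sample from it and update shared parameters. The linear "Q-network", random environment, and plain update rule below are stand-ins for the real DQN machinery, not the paper's system.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
params = rng.standard_normal((4, 2)) * 0.1   # shared "network": Q = s @ params
replay = deque(maxlen=10_000)                # the store of experience

def actor_step():
    # An actor generates new behaviour; here, a random transition.
    s = rng.standard_normal(4)
    a = int(np.argmax(s @ params))           # act greedily w.r.t. shared params
    r, s2 = rng.standard_normal(), rng.standard_normal(4)
    replay.append((s, a, r, s2))

def learner_step(lr=0.01, gamma=0.99):
    # A learner trains from stored experience with a Q-learning target.
    s, a, r, s2 = replay[rng.integers(len(replay))]
    target = r + gamma * np.max(s2 @ params)
    td_error = target - (s @ params)[a]
    params[:, a] += lr * td_error * s        # update the shared parameters

for _ in range(100):
    actor_step()                             # many of these run in parallel...
    if len(replay) > 10:
        learner_step()                       # ...as do these, in the real system
```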

Mat Kelcey

Shared publicly  - 
 
End-to-End Attention-based Large Vocabulary Speech Recognition: Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with the acoustic modelling, language modelling and sequence decoding. We investigate a more direct approach in which the HMM is replaced with a Recurrent Neural Network (RNN) that performs sequence prediction directly at the character level. Alignment between the input features and the desired character sequence is learned automatically by an attention mechanism built into the RNN. For each predicted character, the attention mechanism scans the input sequence and chooses relevant frames. We propose two methods to speed up this operation: limiting the scan to a subset of most promising frames and pooling over time the information contained in neighboring frames, thereby reducing source sequence length. Integrating an n-gram language model into the decoding process yields recognition accuracies similar to other HMM-free RNN-based approaches.

Mat Kelcey

Shared publicly  - 
 
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form–function relationship in language, our “composed” word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).
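A stripped-down sketch of the composition idea: run a recurrent net over a word's characters in both directions and combine the final states into a word vector. For brevity this uses plain tanh RNN cells where the paper uses LSTMs; all dimensions and weights are placeholders.

```python
import numpy as np

def rnn_final_state(char_vecs, W_x, W_h):
    # Plain tanh recurrence over a character sequence (LSTM stand-in).
    h = np.zeros(W_h.shape[0])
    for c in char_vecs:
        h = np.tanh(W_x @ c + W_h @ h)
    return h

def compose_word(word, char_table, params):
    W_x, W_h, W_f, W_b = params
    chars = [char_table[c] for c in word]
    fwd = rnn_final_state(chars, W_x, W_h)          # left-to-right pass
    bwd = rnn_final_state(chars[::-1], W_x, W_h)    # right-to-left pass
    return W_f @ fwd + W_b @ bwd                    # combine both directions

# One vector per character type -- the model's entire lookup table.
rng = np.random.default_rng(0)
d_c, d_h = 8, 16
char_table = {c: rng.standard_normal(d_c) for c in "abcdefghijklmnopqrstuvwxyz"}
params = (rng.standard_normal((d_h, d_c)) * 0.1,
          rng.standard_normal((d_h, d_h)) * 0.1,
          rng.standard_normal((d_h, d_h)) * 0.1,
          rng.standard_normal((d_h, d_h)) * 0.1)
vec = compose_word("cats", char_table, params)      # works for any spelling
```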
3 comments
sometimes i wonder if there should be an arrgXiv, for failed attempts ;D

Mat Kelcey

Shared publicly  - 
 
"Deep Learning (hopefully faster)" Adam Coates. Great description of the "The Roofline model" across different topologies.
Matt Siegel: clear practical advice :)  i like the graphs

Mat Kelcey

Shared publicly  - 
 
We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On the Google Voice Search task, LAS achieves a word error rate (WER) of 14.2% without a dictionary or a language model, and 11.2% with language model rescoring over the top 32 beams. In comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 10.9%.
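The "pyramidal" part of the listener is easy to sketch: each stacked layer concatenates adjacent time steps before the next recurrence, so the sequence the speller attends over shrinks geometrically. The sketch below shows only that time-reduction step; the recurrences themselves are elided and the shapes are illustrative.

```python
import numpy as np

def pyramid_step(frames):
    """frames: (time, dim) -> (time // 2, 2 * dim)."""
    t, d = frames.shape
    t = t - (t % 2)                        # drop an odd trailing frame
    return frames[:t].reshape(t // 2, 2 * d)   # concat adjacent time steps

# 256 filter-bank frames shrink to 32 after three pyramid layers, so
# the speller's attention scans 8x fewer positions per output character.
x = np.random.default_rng(0).standard_normal((256, 40))
for _ in range(3):
    x = pyramid_step(x)
print(x.shape)    # (32, 320)
```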

Mat Kelcey

Shared publicly  - 
 
We consider the task of generative dialogue modeling for movie scripts. To this end, we extend the recently proposed hierarchical recurrent encoder decoder neural network and demonstrate that this model is competitive with state-of-the-art neural language models and backoff n-gram models. We show that its performance can be improved considerably by bootstrapping the learning from a larger question-answer pair corpus and from pretrained word embeddings.

Mat Kelcey

Shared publicly  - 
 
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
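The update rule itself is short enough to write out. Below is a NumPy transcription of one Adam step with the paper's default hyper-parameters, applied to a toy quadratic; the larger learning rate in the loop is just an assumption to make the toy converge quickly.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad            # 1st moment (mean) estimate
    v = b2 * v + (1 - b2) * grad**2         # 2nd moment (uncentred variance)
    m_hat = m / (1 - b1**t)                 # bias correction for the
    v_hat = v / (1 - b2**t)                 # zero-initialised moments
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([5.0, -3.0])
m = v = np.zeros_like(theta)
for t in range(1, 2001):                    # minimise f(x) = ||x||^2
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)                                # ends close to [0, 0]
```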
Mat's Collections
People
In their circles
286 people
Have them in circles
328 people
Leith Masters's profile photo
Matthew Sinclair's profile photo
Janet Campbell's profile photo
Daniel Naumann's profile photo
Neil Kodner's profile photo
Andrew Louth's profile photo
Cynthia Mullen's profile photo
Angus Ng's profile photo
Rebecca Kelly's profile photo
Work
Occupation
Software Engineer
Skills
Machine learning, natural language processing, information retrieval, distributed systems.
Employment
  • Google
    Software Engineer, present
  • Wavii
    Software Engineer
  • Amazon Web Services
    Software Engineer
  • Lonely Planet
    Software Engineer
  • Sensis
    Software Engineer
  • Distra
    Software Engineer
  • Nokia
    Software Engineer
  • Australian Stock Exchange
    Software Engineer
Basic Information
Gender
Decline to State
Story
Tagline
data nerd wannabe
Introduction
I work in the Machine Intelligence group at Google building as-large-as-I-can-get neural networks for knowledge extraction.
Places
Map of the places this user has lived
Currently
San Francisco Bay Area
Previously
seattle - melbourne - calgary - london - sydney - hobart
Links
Mat Kelcey's +1's are the things they like, agree with, or want to recommend.
Clive Barker - Google Play
market.android.com

Clive Barker is an English author, film director, video game designer and visual artist best known for his work in both fantasy and horror f

Aphex Twin - Music on Google Play
market.android.com

Richard David James, best known by his stage name Aphex Twin, is a British electronic musician and composer. He has been described by The Gu

Chess Tactics Pro (Puzzles)
market.android.com

Get better at chess with this large collection of chess puzzles for all levels !This tactic trainer lets you practice in 3 different modes :

Google Search
market.android.com

Google Search app for Android: The fastest, easiest way to find what you need on the web and on your device.* Quickly search the web and you

NetHack
market.android.com

This is an Android port of NetHack: a classic roguelike game originally released in 1987.Main features ------------- * User-friendly interfa

Improving Photo Search: A Step Across the Semantic Gap
googleresearch.blogspot.com

Posted by Chuck Rosenberg, Image Search Team Last month at Google I/O, we showed a major upgrade to the photos experience: you can now easil

Machine Learning - Stanford University
ml-class.org

A bold experiment in distributed education, "Machine Learning" will be offered free and online to students worldwide during the fa

Game Theory
www.game-theory-class.org

Game Theory is a free online class taught by Matthew Jackson and Yoav Shoham.

Probabilistic Graphical Models
www.pgm-class.org

Probabilistic Graphical Models is a free online class taught by Daphne Koller.

RStudio
rstudio.org

News. RStudio v0.94 Available (6/15/2011). RStudio v0.94 is now available. In this release we've made lots of enhancements based on the

Hadoop 0.20.205.0 API
hadoop.apache.org

Frame Alert. This document is designed to be viewed using the frames feature. If you see this message, you are using a non-frame-capable web

Shapecatcher.com: Unicode Character Recognition
shapecatcher.com

You need to find a specific Unicode Character? With Shapecatcher.com you can search through a database of characters by simply drawing your

Duncan & Sons Automotive Service Center
plus.google.com

Duncan & Sons Automotive Service Center hasn't shared anything on this page with you.

Natural Language Processing
www.nlp-class.org

Natural Language Processing is a free online class taught by Chris Manning and Dan Jurafsky.

name value description hadoop.tmp.dir /tmp/hadoop-${user.name} A ...
hadoop.apache.org

name, value, description. hadoop.tmp.dir, /tmp/hadoop-${user.name}, A base for other temporary directories. hadoop.native.lib, true, Should

Apache OpenNLP Developer Documentation
incubator.apache.org

Written and maintained by the Apache OpenNLP Development Community. Version 1.5.2-incubating. Copyright © , The Apache Software Foundation.

ggplot.
had.co.nz

ggplot. An implementation of the grammar of graphics in R. Check out the documentation for ggplot2 - the next generation. ggplot is an imple

ChainMapper (Hadoop 0.20.1 API)
hadoop.apache.org

public class ChainMapper; extends Object; implements Mapper. The ChainMapper class allows to use multiple Mapper classes within a single Map

Neural net language models - Scholarpedia
www.scholarpedia.org

A language model is a function, or an algorithm for learning such a function, that captures the salient statistical characteristics of the d

tech stuff by mat kelcey
www.matpalm.com

my nerd blog. latent semantic analysis via the singular value decomposition (for dummies). semi supervised naive bayes. statistical synonyms
