Profile

Bradley Neuberg
Worked at Google Inc.
Attended Columbia University
Lives in San Francisco, California
3,082 followers|62,470 views

Stream

Bradley Neuberg

Discussion
 
I've set up Caffe on an Amazon EC2 GPU instance, storing my data and trained weights on an EBS volume. However, I've noticed that my IO seems to be a bit slow, which slows down Caffe. Any tips on how to speed up IO when dealing with Caffe on EC2? I'm also cost-conscious, so I don't want to use any of the more expensive EC2 options.
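Before reaching for a pricier instance type, it can help to confirm that EBS throughput is actually the bottleneck. A rough sequential-write check with `dd` (Linux/GNU `dd` assumed; `TARGET` is a placeholder for your EBS mount point):

```shell
# Rough sequential-write benchmark. TARGET is a placeholder --
# point it at your EBS mount, e.g. /mnt/ebs. conv=fdatasync forces
# the data to disk before dd reports a rate, so the number reflects
# the volume rather than the page cache.
TARGET=.
dd if=/dev/zero of="$TARGET/ddtest.bin" bs=1M count=64 conv=fdatasync
rm "$TARGET/ddtest.bin"
```

If the reported rate is far below what your training reads need, one common (and cheap) workaround is to stage the dataset on the instance's local ephemeral storage before training, so Caffe reads from local disk instead of network-attached EBS.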

Bradley Neuberg

Discussion
 
I'm starting a deep learning paper reading group at my workplace. I'd like to start by reviewing important fundamental papers on adding memory to neural networks (LSTMs, neural Turing machines, etc.) and on attention models. For those two topics, what papers do you recommend as good intros and important developments?
 
By the way, will you be making your own list of papers available, so we have access to it?   Perhaps some of us could follow along with your workplace group.

Bradley Neuberg

Discussion
 
I'm dealing with some large pre-processing and looking for advice on how to setup my pipeline.

I'm training a siamese network on the Labeled Faces in the Wild (LFW) dataset in order to do face verification. LFW has about 13,000 different faces. A siamese network ends up training a metric function that places each face into a learned embedding space, with similar faces clustering together. At test time I can take a given face, see where it lies in this space, and check whether it clusters near another face to do positive/negative identification.

Experimenting with a subset of the data, I've found that the best performance comes from pairing every face with every other face, so that I train on all possible positive and negative combinations. However, doing this with the full 13,000 faces would be a huge processing task.

I'm currently pre-processing my data with Python, pairing faces, shuffling them, then writing them to a LevelDB database for use with Caffe. If I try to pair every possible combo of faces with the full 13K faces I run out of memory. I need to either get much cleverer with how I'm processing these with Python, or move to some other solution to do every possible combo. What data processing pipeline do folks recommend for dealing with a large dataset like this to do the pairing and shuffling without exhausting available memory?

Eventually I'll move from the LFW dataset to the CASIA-WebFace dataset, which has about 500K images, so my data-processing needs will get even more intense; I'll need something that can scale up to that level.
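One low-memory approach (a sketch, not the only option): generate the pairs lazily with `itertools.combinations` and write them out in shuffled chunks, so the full pair list never has to fit in RAM. The names `pair_stream` and `sample_prob` are illustrative, not from any library:

```python
import itertools
import random

def pair_stream(n_images, sample_prob=1.0, seed=0):
    """Lazily yield index pairs (i, j), i < j, over n_images items.

    itertools.combinations never materializes the full pair list,
    so memory use stays constant however many images there are.
    sample_prob keeps only a random subset of pairs, which is
    usually necessary: 13,000 faces give 13000 * 12999 / 2 =
    84,493,500 possible pairs.
    """
    rng = random.Random(seed)
    for i, j in itertools.combinations(range(n_images), 2):
        if rng.random() < sample_prob:
            yield i, j

# Example: stream a 10% sample of all pairs for 200 faces. Here the
# chunk is just collected and shuffled; in practice each shuffled
# chunk would be appended to the LevelDB as it is produced.
chunk = list(pair_stream(200, sample_prob=0.1))
random.shuffle(chunk)
```

Shuffling chunk-by-chunk only approximates a global shuffle, but for SGD-style training that is usually close enough, and it keeps peak memory bounded by the chunk size.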
 
You can avoid this by using a better scheme for sampling your examples. Check out the triplet loss in Google's FaceNet paper.
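For reference, the core of that triplet loss can be sketched in a few lines of NumPy; the margin value here is illustrative (the FaceNet paper uses 0.2 on L2-normalized embeddings), and the vectors below stand in for learned embeddings:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull the anchor embedding toward the positive
    (same identity) and push it away from the negative (different
    identity) by at least `margin` in squared L2 distance."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + margin, 0.0)
```

The sampling win is that you never enumerate all pairs up front: triplets are drawn per minibatch, typically favoring "hard" negatives that currently violate the margin.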

Bradley Neuberg

Discussion
 
I've been learning deep learning through Coursera and other online courses. Are there any master's programs yet that go into deep learning, or is it still restricted mostly to PhD programs? Do Berkeley and Stanford have master's programs in these subjects yet? How about online master's programs?
 
A fair number of students in the general Stanford CS masters program seem to be focusing on deep learning.

Bradley Neuberg

Discussion
 
I'm working on a neural network that can take two segmented facial images as input and return a binary "same/not same" answer on whether the two given images are the same person or not. Is anyone aware of any prior work in the literature that can help provide direction on a best approach to this?
 
Chris that paper looks incredible; thanks for pointing that out to me. Milan, Siamese networks look like a good fit; there's even a Caffe model checked into the Caffe repo recently I can use.

Bradley Neuberg

Shared publicly
 
 
A nice and largely accurate article in The Chronicle of Higher Education about the history of neural nets and deep learning, with quotes from +Geoffrey Hinton, +Terrence Sejnowski, +Yoshua Bengio, and yours truly.

http://chronicle.com/article/The-Believers/190147/
The hidden story behind the code that runs our lives.

Bradley Neuberg

Discussion
 
I'm working through Geoffrey Hinton's 2012 Coursera course on Neural Networks. In the course he mentions the exploding and vanishing gradient problem when working with Recurrent Neural Networks. He gives several solutions to this, including careful default initialization + momentum, using Hessian Free optimization, LSTMs, and Echo State Networks. What is the current state of the art when dealing with RNNs? Are these still the grab bag of techniques one would reach for or are there other, simpler options now? Are Echo State Networks still used, as their generality seems limited?
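One simpler option that has since become standard for the exploding-gradient half of the problem (usually combined with LSTMs for the vanishing half) is gradient norm clipping. A minimal NumPy sketch; the threshold value is illustrative and in practice a tuned hyperparameter:

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    """Global-norm gradient clipping: if the gradient's L2 norm
    exceeds the threshold, rescale the whole gradient so its norm
    equals the threshold, preserving its direction."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad
```

This is applied to the full gradient vector once per update step, just before the parameter update.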
Jim Fan
 
+Junyoung Chung Oh sorry about that, I didn't really read through the paper in details, just heard about it from a friend. 


Bradley Neuberg

Discussion
 
I'm planning on applying deep learning to orbital image data, starting with classifying and segmenting things like forest edges, cars, etc. Are people aware of strong papers I should review beforehand, specifically on applying deep learning to visual satellite data? Are any good raw or training datasets publicly available for this?

Bradley Neuberg

Discussion
 
I'm almost finished with the Hinton 2012 Coursera course. I know that two things have changed since then:

* Using statistical generative models such as RBMs, DBNs, and Sigmoid Belief Nets for pretraining is no longer done, as properly initializing your weights and using backprop is good enough, even for deep networks.
* The logistic activation function is not used as much, with ReLUs preferred as they are simpler and faster.

Does anyone know the scientific papers that established these two changes? I'd love to study the original papers.
Neural Networks for Machine Learning, from the University of Toronto (Coursera).
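As an illustration of the first point: the kind of "proper initialization" typically paired with ReLUs scales the weight variance to the layer's fan-in, so activations neither explode nor vanish through deep stacks. A sketch (layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    """Scaled-Gaussian initialization with variance 2/fan_in,
    sized so ReLU activations keep roughly constant variance
    from layer to layer -- no generative pretraining needed."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

def relu(x):
    return np.maximum(x, 0.0)

# One layer of a deep net, trained directly with backprop:
W = he_init(256, 128)
h = relu(rng.normal(size=(1, 256)) @ W)
```

The factor 2 compensates for ReLU zeroing out roughly half of its inputs; the logistic-era variant used 1/fan_in instead.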
 
Hi all,
I'm new to the field of deep learning, and I'm wondering about the same question myself.

Were RBMs/DBNs/autoencoders found to be less effective (when using proper backprop with ReLUs)?

For example, I'm looking at the latest Caffe deep learning framework and not finding any reference to networks that use RBMs or proper textbook autoencoders.

I would very much appreciate your thoughts on this. Thanks!

Bradley Neuberg

Discussion
 
I'm training a siamese network (http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf) for binary classification of facial images (i.e., are these two faces the same or different?).

By default a siamese network simply outputs a 'distance' value where the same faces have lower values and different faces have higher values.

I'd like to turn this into a binary classifier so that 1 indicates the two faces are the same and 0 indicates they are not. It seems like it might be possible to take a siamese network and have its 'distance' value feed into another fully connected layer, with the output of this layer indicating 1 or 0 as the binary classifier. I could then train not only the distance layer but also the target binary classifier output layer in order to discover a threshold value that indicates that two distance values are the same face or not, using backpropagation. Having two backpropagation targets though seems like it might get confusing for the network. Perhaps this actually has to be two different networks?

Am I off here? Is there a better way to do this?

I'm using Caffe to drive all of this.
 
Ah right sigmoid saturates at 0 and 1. Thanks!
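For future readers, that sigmoid-on-distance idea can be sketched as a single logistic unit sitting on top of the siamese distance output. The weight and bias below are illustrative values; in practice both would be learned by backprop along with the rest of the network:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def same_face_prob(distance, w=-4.0, b=2.0):
    """Map a siamese 'distance' to a probability that the two faces
    match. With a negative weight, small distances (similar faces)
    push the output toward 1 and large distances toward 0, so the
    learned bias effectively encodes the decision threshold."""
    return sigmoid(w * distance + b)
```

Because this is just one extra layer with a single training target (same/not-same), there is no second backpropagation target to confuse the network.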

Bradley Neuberg

Discussion
 
Can Caffe be used for Recurrent Neural Networks (RNN)? If not, what are most people in the field using to model their RNNs?
 
Check out the pull requests for Caffe; there's one pending for integration that brings RNNs and LSTMs to Caffe.

Bradley Neuberg

Discussion
 
Does anyone know if it's easy to generate charts from Caffe, such as seeing how the error rate changes over training epochs for the test and cross-validation data sets? It seems necessary to generate these charts as one tunes the hyperparameters of a neural network, to understand how they affect things.
 
For future reference for others, here's an example summary of how you can capture Caffe's output, run it through the parse_log.sh script, and then make a simple plot of the results:

./examples/mnist/train_lenet.sh 2>&1 | tee "mnist.log"
./tools/extra/parse_log.sh ./mnist.log
gnuplot ./tools/extra/plot_log.gnuplot.example

You'll want to customize the plot_log.gnuplot.example file for your own uses. The 'tee' command will also print out the results as they run as well as save them to a file, so you can follow training as it happens.
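If you'd rather stay in Python than customize gnuplot, the parsed log can be loaded with NumPy and plotted with matplotlib. The column indices below are guesses (check the header row your version of parse_log.sh actually writes, and adjust):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, fine for an EC2 box
import matplotlib.pyplot as plt

def plot_loss(parsed_log_path, iter_col=0, loss_col=3, out_png="loss.png"):
    """Plot loss vs. iteration from a parse_log.sh output file.
    Assumes a whitespace-separated table with one header row;
    iter_col/loss_col are assumptions to verify against your
    parsed log's header before relying on the plot."""
    data = np.loadtxt(parsed_log_path, skiprows=1)
    plt.figure()
    plt.plot(data[:, iter_col], data[:, loss_col])
    plt.xlabel("iteration")
    plt.ylabel("loss")
    plt.savefig(out_png)
```

Re-running this after each training run gives a quick visual diff of how a hyperparameter change moved the loss curve.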
Places
Map of the places this user has lived
Currently
San Francisco, California
Previously
McAllen, Texas - New York, NY - Kamala, Phuket, Thailand - Esalen, California
Work
Occupation
Software Engineer
Employment
  • Google Inc.
    Software Engineer
  • Rojo
  • Bootstrap Foundation
  • Random Walk
  • Internet Archive
  • Sitepen
Education
  • Columbia University
  • The Science Academy
Basic Information
Gender
Male
Other names
Brad