Sander Dieleman

561 followers

Sander's posts

Post has attachment

Inspired by Google's inceptionism art, my colleagues made an interactive visualization of a dreaming convnet. It's pretty trippy!

Post has attachment

My paper about galaxy morphology prediction with convolutional neural networks has been accepted for publication in MNRAS! It's on arXiv: http://arxiv.org/abs/1503.07077

Post has shared content

Our team just won the NDSB on Kaggle! Here's a blog post explaining our approach in detail: http://benanne.github.io/2015/03/17/plankton.html …

Post has shared content

A few months ago we had a small post here discussing different weight initializations, and I remember +Sander Dieleman and a few others had a good discussion. Good weight initialization is fairly important, as the rewards are non-trivial.

For example, AlexNet, which is fairly popular, from Alex's One Weird Trick paper, converges in 90 epochs (using Alex's initialization with a standard deviation of 0.01).

I retrained it from scratch using the weight initialization from Yann's 98 paper, and it converges to the same error within just 50 epochs, so technically +Alex Krizhevsky could've rewritten the paper with even more stellar results (training AlexNet in 8 hours with 8 GPUs).

In fact, more interestingly, just by doing good weight initialization, I even removed the Local Response Normalization layers in AlexNet with no drop in error.

I've noticed the same trend with several other ImageNet-size models, like OverFeat and OxfordNet: they converge in far fewer epochs than reported in their papers, just by making this small change in weight initialization.

If you want the exact formulae, look at the two links below:

https://github.com/torch/nn/blob/master/SpatialConvolution.lua#L28

https://github.com/torch/nn/blob/master/Linear.lua#L18

And read Yann's 98 paper, Efficient BackProp: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

On that note, Surya Ganguli's talk at this year's NIPS workshop on optimal weight initializations triggered this post. Check out his papers on the topic, great work.
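
For readers who don't want to dig through the links, here is a minimal sketch of fan-in scaled initialization in plain Python. It rests on two assumptions: that the linked Torch code draws weights uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)], and that the fan-in of a convolution layer is the number of input planes times the kernel area. Efficient BackProp itself recommends zero-mean weights with standard deviation 1/sqrt(fan_in), which is the Gaussian variant below. All function names are made up for illustration and are not part of Torch.

```python
import math
import random


def conv_fan_in(n_input_planes, kernel_h, kernel_w):
    """Number of inputs feeding one output unit of a convolution layer."""
    return n_input_planes * kernel_h * kernel_w


def fan_in_uniform(fan_in, fan_out, rng):
    """Uniform weights in [-1/sqrt(fan_in), 1/sqrt(fan_in)].

    Assumed to match the linked Torch reset code; check the source to be sure.
    """
    bound = 1.0 / math.sqrt(fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


def fan_in_gaussian(fan_in, fan_out, rng):
    """Zero-mean Gaussian weights with standard deviation 1/sqrt(fan_in)."""
    std = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


def sample_std(weights):
    """Standard deviation over all entries of a weight matrix."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical layer shape, chosen only for illustration.
    fan_in = conv_fan_in(n_input_planes=96, kernel_h=5, kernel_w=5)
    weights = fan_in_gaussian(fan_in, fan_out=16, rng=rng)
    print("fan-in:", fan_in)
    print("target std:", 1.0 / math.sqrt(fan_in))
    print("sample std:", sample_std(weights))
    print("fixed std used in the comparison above: 0.01")
```

The point of the scaling is that the initial spread of the weights adapts to each layer's fan-in instead of being one fixed constant for the whole network. Note that a uniform distribution on [-a, a] has standard deviation a/sqrt(3), so the two variants above do not produce the same spread.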

Post has attachment

A guest post by Jan Schlüter: The fastest convolutions in Theano with meta-optimization!

Post has attachment

Looks like there is a paper about cuDNN now. Some good info about the implementation and the associated design choices. Some very promising GTX 980 benchmarks as well!

The stuff they list as future work (1D, 3D, unshared convolutions, multi-GPU support) looks great, but I'm hoping they will at some point reconsider their stance on FFTs and provide an optimized implementation of that approach as well. If you modify the FFT kernel it should be possible to make the zero-padding on the filters implicit, just like it is now with the GEMM approach, which would already save a ton of memory. Or am I missing something? Of course the transformed filters would still be as big as the input, so it doesn't fix the issue entirely.
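
To make the padding point concrete, here is a small 1D sketch in plain Python. It is not how cuDNN or any GPU library implements this: it uses a naive O(n^2) DFT rather than a real FFT, and every function name is hypothetical. It only shows that the frequency-domain approach needs the filter zero-padded to the full output size, so the transformed filter ends up as large as the padded input however small the original filter was.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def direct_convolve(signal, kernel):
    """Full linear convolution computed directly in the signal domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def fft_convolve(signal, kernel):
    """Full linear convolution via pointwise products in the frequency domain."""
    n = len(signal) + len(kernel) - 1
    padded_signal = list(signal) + [0.0] * (n - len(signal))
    # Explicit zero-padding: the kernel grows from len(kernel) to n entries.
    padded_kernel = list(kernel) + [0.0] * (n - len(kernel))
    signal_f = dft(padded_signal)
    kernel_f = dft(padded_kernel)  # as large as the padded input
    product = [a * b for a, b in zip(signal_f, kernel_f)]
    return [v.real for v in dft(product, inverse=True)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    kernel = [1.0, 0.0, -1.0]
    print(direct_convolve(signal, kernel))
    print([round(v, 6) for v in fft_convolve(signal, kernel)])
```

Making the padding implicit would avoid materializing `padded_kernel`, but `kernel_f` would still have `n` entries, which is the remaining memory cost mentioned above.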

Post has attachment

Does anybody have any experience with the latest NVIDIA cards?

Anandtech has a bunch of compute benchmarks for the 980: http://www.anandtech.com/show/8526/nvidia-geforce-gtx-980-review/20

The results look pretty great, especially considering the significant reduction in power usage (the TDP is 165W). But I'm not sure which of these benchmarks is representative of, say, training a convnet, and to what extent. How would it compare to a 780Ti / Titan / K40? Does anybody have a clue about this? Or perhaps someone has some first-hand experience already?

Thanks!

Post has attachment

I will be speaking about classifying galaxy images with deep learning at the NYC Machine Learning meetup on Thursday, August 21st.
