Profile

Debasish Ghosh
Works at NRI Fintech
Attended Jadavpur University
Lives in Kolkata, India
1,487 followers|467,662 views
People
Have him in circles: 1,487 people, including Edward Kmett, Mike Miller, Peter Hausel, Ralf Laemmel, Derek Williams, Chris Lewis, David MacIver, Bob Nystrom, and John Lato.
Education
  • Jadavpur University
    1984
Basic Information
Gender
Male
Story
Tagline
Programmer, blogger, author, nerd, and Seinfeld fanboy
Introduction
Programming nerd with an interest in functional programming, domain-specific languages, and NoSQL databases.

Debasish is a senior member of ACM and the author of DSLs in Action, published by Manning in December 2010. He is also writing another book, Functional and Reactive Domain Modeling, to be published by Manning.
Work
Occupation
CTO
Employment
  • NRI Fintech
    CTO, 2012 - present
  • Anshin Software
    CTO, 2012
  • PricewaterhouseCoopers Ltd
  • Tanning Technologies
  • Techna International
Places
Currently
Kolkata, India

Stream

Debasish Ghosh

Shared publicly

A few months ago we had a small post here discussing different weight initializations, and I remember +Sander Dieleman and a few others had a good discussion. It is fairly important to do good weight initialization, as the rewards are non-trivial.
For example, AlexNet, which is fairly popular, from Alex's One Weird Trick paper, converges in 90 epochs (using Alex's 0.01 stdv initialization).
I retrained it from scratch using the weight initialization from Yann's 98 paper, and it converges to the same error within just 50 epochs, so technically +Alex Krizhevsky could've rewritten the paper with even more stellar results (training AlexNet in 8 hours with 8 GPUs).
In fact, more interestingly, just by doing good weight initialization, I even removed the Local Response Normalization layers in AlexNet with no drop in error.
I've noticed the same trend with several other ImageNet-size models, like Overfeat and OxfordNet: they converge in far fewer epochs than what is reported in their papers, just by making this small change in weight initialization.
If you want the exact formulae, look at the two links below:
https://github.com/torch/nn/blob/master/SpatialConvolution.lua#L28
https://github.com/torch/nn/blob/master/Linear.lua#L18
And read Yann's 98 paper, Efficient Backprop: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
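To make the fan-in idea concrete, here is a rough sketch in Scala (mine, not the torch code from the links above): scale by 1 / sqrt(fanIn) and sample uniformly in [-stdv, stdv] instead of using a fixed 0.01 stdv. The fanIn computation for a conv layer is an assumption based on the linked modules; check the links for the exact formula.

```scala
import scala.util.Random

// Sketch of fan-in scaled initialization: stdv = 1 / sqrt(fanIn), weights ~ U(-stdv, stdv).
object FanInInit {
  // For a conv layer the linked torch modules use fanIn = kW * kH * nInputPlane (assumption).
  def stdv(fanIn: Int): Double = 1.0 / math.sqrt(fanIn.toDouble)

  def weights(fanIn: Int, n: Int, rng: Random = new Random): Array[Double] = {
    val s = stdv(fanIn)
    Array.fill(n)(rng.nextDouble() * 2 * s - s) // uniform in [-s, s]
  }
}

// A 3x3 kernel over 64 input planes: fanIn = 3 * 3 * 64 = 576, so stdv ≈ 0.042,
// roughly 4x larger than a fixed 0.01 stdv.
```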

On that note, Surya Ganguli's talk at this year's NIPS workshop on optimal weight initializations triggered this post. Check out his papers on that topic; great work.

Debasish Ghosh

Shared publicly

Videos represent an abundant and rich source of (unsupervised) visual information. Extracting meaningful representations from large volumes of unconstrained video sequences in unsupervised fashion is quite challenging.

Here is one attempt at doing this, but I suspect more research will follow up very shortly. 

http://arxiv.org/abs/1502.04681
Abstract: We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or multiple decoder LSTMs to perform different ...
 
Debasish Ghosh

Shared publicly

Thanks for sharing ..

50 Years of Deep Learning and Beyond: an Interview with Jürgen Schmidhuber
INNS Big Data Conference website:  Over the last decades, Jürgen Schmidhuber has been one of the leading protagonists in the advancement of machine learning and neural networks. Alone or with his r...

Debasish Ghosh

Shared publicly

Has to be one of the best introductions to monads ..

Answer by Tikhon Jelvis to "What are monads and why are they useful?" I found this to be the best in-depth explanation of what monads are.
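For anyone who wants the one-screen version, here is a tiny sketch in Scala (mine, not from the linked answer): a monad is a type constructor with pure and flatMap obeying a few laws, and Option is the usual first example for sequencing computations that may fail.

```scala
import scala.language.higherKinds

// Minimal monad interface and an Option instance.
trait Monad[F[_]] {
  def pure[A](a: A): F[A]
  def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B]
}

object OptionMonad extends Monad[Option] {
  def pure[A](a: A): Option[A] = Some(a)
  def flatMap[A, B](fa: Option[A])(f: A => Option[B]): Option[B] = fa.flatMap(f)
}

// Sequencing with Option: the whole expression is None if any step fails.
def parse(s: String): Option[Int] = scala.util.Try(s.toInt).toOption

val sum: Option[Int] =
  for {
    a <- parse("40")
    b <- parse("2")
  } yield a + b // Some(42); parse("oops") anywhere would make it None
```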

Debasish Ghosh

Shared publicly

Functional Patterns in Domain Modeling - Composing a domain workflow with statically checked invariants
I have been doing quite a bit of domain modeling using functional programming mostly in Scala. And as it happens when you work on something for a long period of time you tend to identify more and more patterns that come up repeatedly within your implementat...
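As a teaser of the idea the post develops, here is a minimal sketch (mine, not the code from the post, and assuming Scala 2.12's right-biased Either; the post itself may use a different abstraction): every step of the workflow returns an Either, so a violated invariant short-circuits the composition and the happy path reads as one for-comprehension.

```scala
// Domain invariants checked in small validation functions, composed into a
// workflow that fails fast on the first violation.
case class Account(no: String, balance: BigDecimal)

def validAccountNo(no: String): Either[String, String] =
  if (no.nonEmpty && no.forall(_.isDigit)) Right(no)
  else Left(s"Invalid account no: $no")

def validAmount(amount: BigDecimal): Either[String, BigDecimal] =
  if (amount > 0) Right(amount) else Left(s"Amount must be positive: $amount")

def debit(a: Account, amount: BigDecimal): Either[String, Account] =
  if (a.balance >= amount) Right(a.copy(balance = a.balance - amount))
  else Left(s"Insufficient balance in account ${a.no}")

// Each step only runs if the previous invariants held.
def debitWorkflow(repo: String => Either[String, Account])(no: String, amount: BigDecimal): Either[String, Account] =
  for {
    n   <- validAccountNo(no)
    amt <- validAmount(amount)
    acc <- repo(n)
    upd <- debit(acc, amt)
  } yield upd
```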
Sir, can you share examples from the retail domain, like online shopping?

Debasish Ghosh

Shared publicly

Streaming @ SODA: Part II
This is the second of two posts by Samira Daruki on the streaming sessions at SODA 2015. For the first post, see here. In the third paper from the streaming graph family in SODA15: "Parameterized Streaming: Maximal Matching and Vertex Cover", Chitnis, Cormode, Hajiaghayi and Monemizadeh ...

Debasish Ghosh

Shared publicly

Loved the first one ..

The second run of the MMDS MOOC on "mining massive datasets" starts Saturday, Jan. 31. In addition to the course material from the previous run, we plan to offer a number of optional programming projects. You can enroll here: https://class.coursera.org/mmds-002

Debasish Ghosh

Shared publicly

fed - co Ex

Debasish Ghosh

Shared publicly

There are several new ImageNet results floating around that beat my 5.1% error on ImageNet. Most recently an interesting paper from Google that uses "batch normalization". I wanted to make a few comments regarding "surpassing human-level accuracy":

Optimistic human performance is ~3%
I reported 5.1%, but it is interesting to try to estimate an optimistic human performance on ILSVRC by removing what I call "silly errors":

1. Note that I trained myself on 500 images, and as I documented in my blog post and our ILSVRC paper, 18 of my errors (24%, about a quarter) were due to what I consider to be "class unawareness". That means that when I looked at my mistake, I felt that the answer was relatively evident if only I had thought of that class. If I had trained longer, it's reasonable to suppose that I would have eliminated a large chunk of these, making my error ~3.9%.
2. The other issue, which I call "insufficient training data" (since I was only shown 13 images / class), is also an error type that falls into this category. Without this error type, the error would be ~3.6%.
3. The next error I'd be willing to argue I could have prevented is the fine-grained error. In the optimistic estimate, if I were willing to spend 15 minutes / terrier instead of ~5 minutes / terrier, the error would become 3.2%.

The remainder of the errors were "multiple objects" and "incorrect annotations", which I consider to be near insurmountable to some degree.
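A quick back-of-the-envelope check (my arithmetic, not from the post) of how point 1 gets from 5.1% to ~3.9%, using only the numbers quoted above:

```scala
val reportedError    = 5.1  // % top-5 error of the trained human labeler
val classUnawareness = 0.24 // fraction of errors attributed to "class unawareness"
val optimisticError  = reportedError * (1 - classUnawareness)
// optimisticError ≈ 3.88, i.e. the ~3.9% quoted in point 1
```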

TLDR:
- 5.1% is an error rate for a human who trained for 500 images and then spent up to ~5 minutes per image.
- About ~3% is an optimistic estimate without my "silly errors".

Human ensemble experiments
This ~3% conclusion is also consistent with our "optimistic human" experiments, which were based on ~250 images (and reported in the ILSVRC paper). We had two labelers and considered an image correct if at least one of us got it. Our optimistic human error was 2.4%, but this is a bit of a noisy result due to insufficient data. Moreover, we expect that an actual human ensemble would have a slightly higher error than the "optimistic human", so ~3% seems relatively consistent with this interpretation.

Top5/Top1 error
As a second point, I do think we should start to look at top-1 accuracy a bit more. I understand that there are problems with it, but I do believe that there is some signal there. For example, there are only 5 snake species, so when I saw a snake image I just lazily labeled all 5 snake types and knew I got it right somewhere in the top 5. In other words, top-5 error does not test differentiating between snake types. A few other categories share this property (a few fish and car types, for example).
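For concreteness, here is what the two metrics mean (a sketch of mine, not tied to any particular evaluation code):

```scala
// top-k error: an image counts as correct if the true label appears among the k
// highest-ranked guesses, so top-5 never tests the ordering among, say, snake classes.
def topKError(predictions: Seq[(Seq[String], String)], k: Int): Double = {
  val wrong = predictions.count { case (ranked, truth) => !ranked.take(k).contains(truth) }
  wrong.toDouble / predictions.size
}

// topKError(preds, 1)  // top-1: only the single best guess counts
// topKError(preds, 5)  // top-5: the number usually reported for ILSVRC
```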

I don't at all intend this post to somehow take away from any of the recent results: I'm very impressed with how quickly multiple groups have improved from 6.6% down to ~5% and now also below! I did not expect to see such rapid progress. 

It seems that we're now surpassing a dedicated human labeler. And imo, when we are down to ~3%, we'd be matching the performance of a hypothetical super-dedicated, fine-grained expert human ensemble of labelers.

My blog: 
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
The ILSVRC paper that has more details on human optimistic results:
http://arxiv.org/abs/1409.0575


Debasish Ghosh

Shared publicly

Learning Concurrent Programming in Scala - a brief review

The preface of the book says “Its goal is to introduce important concurrency abstractions, and at the same time show how they work in real code”. The book is indeed a detailed one, with 366 pages discussing concurrency paradigms in Scala. The author has organized the book very carefully, starting with the basics of concurrency and then building advanced material on top of them.

Chapter 2 discusses all the concurrency primitives on the JVM: processes, threads, monitors and synchronization, atomicity, reordering, and the Java Memory Model. Even if you are not very familiar with these concepts, a run through this chapter will serve you well in picking up the basics. I liked this approach very much.

Chapter 3 builds on top of chapter 2 and teaches you how the bigger building blocks are built on top of the primitives. It discusses topics like the Executor framework, atomic variables, lazy values, and concurrent collections. There is a section dedicated to lock-free programming, where the author discusses CAS-based implementations and lock-free operations. I think this section could have been a bit more detailed, with a comparison of lock-free and wait-free programming or a walkthrough of designing a lock-free data structure.
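To give a flavour of the kind of code that section deals with (my own sketch, not an example from the book): a lock-free counter built on compareAndSet in a retry loop.

```scala
import java.util.concurrent.atomic.AtomicLong

// Lock-free increment: read, compute, compareAndSet; retry if another thread raced us.
final class CasCounter {
  private val value = new AtomicLong(0L)

  @annotation.tailrec
  def increment(): Long = {
    val current = value.get
    val updated = current + 1
    if (value.compareAndSet(current, updated)) updated
    else increment() // lost the race, try again with the fresh value
  }

  def get: Long = value.get
}
```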

Chapter 4 is perhaps the core chapter of the book. It discusses the various asynchronous abstractions that the Scala standard library offers, like Futures and Promises, the importance of non-blocking operations, and the scala-async framework. The discussion on Futures and Promises is fairly comprehensive and contains code snippets illustrating the usage of each of them. However, the examples shown in this chapter are quite basic; maybe the book targets a basic learning of the tools (in fact the title of the book also says so). But adding a section for advanced learners, developing some interesting combinators that can be used for concurrent composition, would have been great.
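As a reference point for the kind of API the chapter walks through (my sketch, not the book's code; fetchPrice and the numbers are made up for illustration): two futures composed with a for-comprehension, and a Promise completed from the outside.

```scala
import scala.concurrent.{Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-in for a non-blocking remote call.
def fetchPrice(ticker: String): Future[BigDecimal] =
  Future(BigDecimal(100) + ticker.length)

val apple  = fetchPrice("AAPL") // both futures start eagerly, so they run concurrently
val google = fetchPrice("GOOG")

val total: Future[BigDecimal] =
  for { a <- apple; g <- google } yield a + g

val p = Promise[String]()
val later: Future[String] = p.future // handed out now ...
p.success("done")                    // ... and completed later, from some other piece of code
```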

Chapter 5 discusses parallel collections and chapter 6 details the reactive extensions. Both of these chapters are well written and cover the basics pretty well. In chapter 6 the author discusses reactive programming and event-driven programming. The sections on Observables, composing observables, and writing custom observables give a very detailed account of the programming model. The reader should benefit from these discussions.
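For chapter 5's topic, the appeal is that the collection API stays the same (a sketch of mine, assuming Scala 2.12, where .par still lives in the standard library):

```scala
val xs = 1 to 1000000
val seqSum = xs.map(i => i.toLong * i).sum      // sequential
val parSum = xs.par.map(i => i.toLong * i).sum  // same API, work spread over a fork-join pool
// seqSum == parSum
```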

Chapter 7 discusses software transactional memory, though I am not sure how many people use it in Scala. The chapter covers all the basics of STM as implemented in Scala, but a section on gotchas and pitfalls would have been very useful. The summary section of this chapter has a few good references on STM, but the performance issues in the Scala STM needed to be highlighted a bit more.
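For reference, the basic shape of the ScalaSTM code the chapter discusses looks roughly like this (my sketch, assuming the scala-stm library on the classpath):

```scala
import scala.concurrent.stm._

// Two Refs updated in one atomic block: the transfer commits completely or not at all,
// and conflicting transactions are retried by the STM runtime.
val from = Ref(BigDecimal(100))
val to   = Ref(BigDecimal(0))

def transfer(amount: BigDecimal): Unit =
  atomic { implicit txn =>
    from() = from() - amount
    to()   = to()   + amount
  }
```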

Chapter 8 is all about actors and Akka. Again the author packs in as much as possible to cover the features within a single chapter. It's fairly comprehensive as an introduction to the framework.
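And the flavour of chapter 8, for anyone who has not seen Akka before (my sketch using the classic actor API, not code from the book):

```scala
import akka.actor.{Actor, ActorSystem, Props}

// An actor processes its mailbox one message at a time, so the mutable count
// needs no explicit locking.
class Counter extends Actor {
  private var count = 0
  def receive: Receive = {
    case "inc"  => count += 1
    case "show" => println(s"count = $count")
  }
}

object CounterApp extends App {
  val system  = ActorSystem("demo")
  val counter = system.actorOf(Props[Counter], "counter")
  counter ! "inc"
  counter ! "inc"
  counter ! "show" // eventually prints: count = 2
  system.terminate()
}
```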

The core strength of the book is the breadth of topics it covers. The reader should get an idea of the overall space of concurrency on the JVM in general and in Scala in particular. In some parts of the book the discussion may seem a bit prosaic and mechanical; sometimes you get the feeling that you are reading a textbook. In some of the chapters I would have liked to see specific sections dedicated to design patterns and idioms for using the techniques, and also to gotchas and pitfalls. But overall the book is quite comprehensive and is possibly the best reference material out there that touches upon so many topics of concurrency on the JVM.