Profile

Cover photo
Debasish Ghosh
Works at NRI Fintech
Attended Jadavpur University
Lives in Kolkata, India
1,386 followers | 278,331 views
People
Have him in circles
1,386 people
Work
Occupation
CTO
Employment
  • NRI Fintech
    CTO, 2012 - present
  • Anshin Software
    CTO, 2012
  • PricewaterhouseCoopers Ltd
  • Tanning Technologies
  • Techna International
Places
Map of the places this user has lived
Currently
Kolkata, India
Story
Tagline
Programmer, blogger, author, nerd, and Seinfeld fanboy
Introduction
Programming nerd with interest in functional programming, domain-specific languages, and NoSQL databases.

Debasish is a senior member of ACM and author of DSLs in Action, published by Manning in December 2010.
Education
  • Jadavpur University
    1984
Basic Information
Gender
Male

Stream

Debasish Ghosh

Shared publicly  - 
 
 
How probabilistic programming should change machine learning

The practice of machine learning is ripe for transformation. If researchers are able to overcome some deep technical challenges, probabilistic programming languages can be a big driver of this fundamental change.

(A probabilistic programming language is one which abstracts the inference or learning process from the programmer, leaving the developer free to focus on the model that best explains the data at hand. These languages are in their early days, but various research efforts are underway.)
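To make that abstraction concrete, here is a toy sketch of my own (not taken from any particular PPL): the model is just a scoring function, and a generic, model-agnostic inference routine, here a crude Metropolis sampler, does the learning. The data and names are made up.

```python
import math
import random

# Hypothetical data: eight coin flips, six heads.
data = [1, 1, 0, 1, 0, 1, 1, 1]

def log_posterior(p):
    """The model: Bernoulli likelihood with a uniform prior on p."""
    if not 0 < p < 1:
        return float("-inf")
    return sum(math.log(p) if x else math.log(1 - p) for x in data)

def metropolis(logp, init, steps=20000, scale=0.1):
    """Generic inference: knows nothing about coins, reusable across models."""
    x, lx, samples = init, logp(init), []
    for _ in range(steps):
        y = x + random.gauss(0, scale)
        ly = logp(y)
        if math.log(random.random()) < ly - lx:  # accept/reject step
            x, lx = y, ly
        samples.append(x)
    return samples

samples = metropolis(log_posterior, 0.5)
print(sum(samples) / len(samples))  # posterior mean of p, roughly 0.7
```

Swapping in a different model means writing a new scoring function; the sampler stays untouched. That separation of model from inference is the promise of these languages.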

What does the practice of machine learning look like today? The mainstream workflow looks something like this:

1. Obtain a dataset

2. If it's not already in tabular form, hammer the data into a table where each row is an entity and each column a feature or variable pertaining to the entities

3. Leverage whatever insights are available about the problem to perform feature engineering, i.e., making the columns as informative as possible

4. Run the resulting table through one or more stock ML methods, probably imputing missing values and/or coercing values into the types needed by each method

5. Use cross-validation to evaluate the results along various accuracy metrics

6. If multiple ML methods have been tried, consider creating an ensemble model that combines results from more than one

7. If results are good enough, finish; if not, go back to step 3, iterating as necessary or until the Kaggle deadline hits

Caricature, maybe - but I think this does capture the essence of current, mainstream ML in practice.
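For concreteness, here is a hypothetical rendering of that pipeline with pandas and scikit-learn; the file, the column names, and the model choice are all made up, but the shape of the workflow is the point:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

df = pd.read_csv("entities.csv")              # steps 1-2: one row per entity
df["ratio"] = df["debt"] / df["income"]       # step 3: feature engineering
X, y = df.drop(columns=["label"]), df["label"]

model = make_pipeline(                        # step 4: impute, then fit
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=200),
)
scores = cross_val_score(model, X, y, cv=5)   # step 5: cross-validation
print(scores.mean())                          # step 7: good enough?
```

Notice how much of the code is about forcing the data into the shape the estimator expects, rather than about the problem itself.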

What's sub-optimal about this process? In short, the data is made to fit the methods, rather than the methods being adapted to the nature of the data. The world rarely produces fully-observed, homogeneous tables - yet this is the format that our methods are most comfortable handling, and we work hard to feed them well. Feature engineering is the toolset we use to beat our raw data into this shape.

Highlighting the inflexibility of current mainstream methods in this way casts a different light on the emerging conventional wisdom that prior or domain knowledge is mostly useless in machine learning - that what counts is the ability to bring many different methods to bear, and then choose the combination that yields the best quantitative results. If feature engineering is our best or only way of incorporating domain knowledge, then it should be no surprise that the method with the best generic performance (often random forests these days) would rise to the top.

Performant, accessible probabilistic programming environments would radically alter this balance of power. Practitioners would be able to leverage much more of their knowledge about the problem at hand, not just through the indirect and scattershot mechanisms of feature engineering, but rigorously and predictably by encoding their understanding as a probabilistic model. As a first consequence, the value of this domain knowledge would rise quickly.

The role currently played by feature engineering would be incorporated into the program itself, rather than lying largely outside of the learning process as it does today. Well-written probabilistic programs would simultaneously consider many different projections and transformations of the data, and considering a new feature would not require dropping any previous attempt. In other words, many parts of the workflow that are currently executed in series, iteratively, could instead be explored jointly. As a result, a much larger space of models could be considered, and many more explanatory hypotheses could be entertained. Quality of results would rise, as would our confidence in them.

The fallout would be widespread:

- The machine learning workflow would change to emphasize model exploration (i.e., probabilistic programming), rather than dataset doctoring.

- Skillsets would change, and the division of labor between inference expertise and domain modeling chops would usefully clarify; productivity (analytic insight per dollar) would rise as a result.

- Data collection and retention would (eventually) change, as it became clear that new and different kinds of observations could now be analyzed and understood.

I hope this world comes to pass. It will take at least a few years, but I think the impact will be massive.

A couple notes:

1. This discussion addresses the mainstream practice of machine learning, rather than ML methods research. Part of the attraction of probabilistic programming is that it might provide a channel by which the many exciting new results in the literature (most of them, for all practical purposes, inaccessible to practitioners) could be packaged and made more widely available.

2. This has been focused on supervised learning. If anything, the impact of probabilistic programming will be much greater in the so-called unsupervised setting. Very briefly, probabilistic programs can describe much richer latent structures that can be learned from data; deployed correctly, these could be much more informative than, say, the partitions produced by clustering methods.
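As a small illustration of that last point, here is a toy two-component Gaussian mixture of my own (not from the post) fit by EM: instead of a hard partition, each point gets a posterior membership probability.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two overlapping 1-D clusters.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])

# EM for a two-component Gaussian mixture (unit variances, for brevity).
mu, w = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: soft responsibilities rather than hard cluster labels.
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights and means from the soft assignments.
    w = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)

print(mu, w)     # recovered component means and mixing weights
print(resp[:3])  # per-point membership probabilities, not a partition
```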

Follow me on Twitter: @beaucronin

Debasish Ghosh

Shared publicly  - 
 
Functional Patterns in Domain Modeling - Immutable Aggregates and Functional Updates
In the last post I looked at a pattern that enforces constraints to ensure domain objects honor the domain rules. But what exactly is a domain object ? What should be the granularity of an object that my solution model should expose so that it makes sense t...
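(A minimal sketch of the idea in Python, with hypothetical domain names; the blog series works in Scala, but the shape is the same: the aggregate is immutable, and an "update" returns a new copy rather than mutating in place.)

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Address:
    city: str
    zip: str

@dataclass(frozen=True)
class Customer:
    name: str
    address: Address

def move(customer, new_city, new_zip):
    """Functional update: rebuild only the changed path of the aggregate."""
    return replace(customer,
                   address=replace(customer.address, city=new_city, zip=new_zip))

c1 = Customer("Alice", Address("Kolkata", "700001"))
c2 = move(c1, "Pune", "411001")
assert c1.address.city == "Kolkata"   # the original is untouched
```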

Debasish Ghosh

Shared publicly  - 
 
 
Had a very nice dinner with Mike Tipping and David Duvenaud on Tuesday night in Cambridge. Great to catch up with Mike, who's had a big influence on ML with probabilistic PCA and sparse Bayesian learning. He's been working in the commercial sector for a few years (but it was really nice when David mentioned [in passing] perspectives about sparsity that originated with Mike!).

Mike followed up with a mail asking the following question (shared with permission): "If I were to read 5 papers from the last couple of years that capture the interesting/important stuff happening in ML, what would they be?"

So below is my answer. I love the fact that four of them are on arXiv. I also know that at least two of them had trouble getting published (either delayed in publication or reviewers not enthusiastic, etc.).

They are chosen partly as reflections of where I think the field is going, and partly as reflections of where I think the field should be going. And of course the list is totally subjective and missing great papers by some of my favourite researchers: it's a personal list, but Mike and I share similar tastes. It will be interesting to hear Mike's opinion about them when he's done.

Stochastic variational inference by Hoffman, Wang, Blei and Paisley
http://arxiv.org/abs/1206.7051
A way of doing approximate inference for probabilistic models with potentially billions of data points ... need I say more?

Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget by Korattikara, Chen and Welling
http://arxiv.org/abs/1304.5299
Oh ... I do need to say more ... because these three are at it as well but from the sampling perspective. Probabilistic models for big data ... an idea so important it needed to be in the list twice. 

Practical Bayesian Optimization of Machine Learning Algorithms by Snoek, Larochelle and Adams
http://arxiv.org/abs/1206.2944
This paper represents the rise of probabilistic numerics; I could also have chosen papers by Osborne, Hennig or others. There are too many papers out there already. Definitely an exciting area, be it optimisation, integration, or differential equations. I chose this paper because it seems to have blown the field open to a wider audience, focussing as it did on deep learning as an application, so it lets me capture both an area of developing interest and an area that hits the national news.

Kernel Bayes' Rule by Fukumizu, Song and Gretton
http://arxiv.org/abs/1009.5736
One of the great things about ML is how we have different (and competing) philosophies operating under the same roof. But because we still talk to each other (and sometimes even listen to each other) these ideas can merge to create new and interesting things. Kernel Bayes' Rule makes the list.

ImageNet Classification with Deep Convolutional Neural Networks by Krizhevsky, Sutskever and Hinton
http://www.cs.toronto.edu/~hinton/absps/imagenet.pdf
An obvious choice, but you don't leave the Beatles off lists of great bands just because they are an obvious choice.
 
A Sketch as the Query Model of an EventSourced System
In my last post I discussed the count-min sketch data structure that can be used to process data streams using sub-linear space. In this post I will continue with some of my thoughts on how count-min sketches can be used in a typical event sourced applicati...
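(For reference, a minimal count-min sketch in Python, my toy version rather than the post's implementation: d hash rows of width w, a point update increments one cell per row, and a query takes the minimum across rows, giving an estimate that can only over-count.)

```python
import random

class CountMinSketch:
    def __init__(self, w=272, d=5, seed=42):
        rnd = random.Random(seed)
        self.w, self.d = w, d
        self.salts = [rnd.getrandbits(64) for _ in range(d)]  # one per row
        self.table = [[0] * w for _ in range(d)]

    def _cells(self, item):
        # One cell per row, chosen by a salted hash of the item.
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.w

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # True count <= estimate; the min across rows bounds the error.
        return min(self.table[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
for ev in ["deposit", "deposit", "withdraw"]:
    cms.add(ev)
print(cms.estimate("deposit"))   # >= 2, usually exactly 2
```

The appeal for an event-sourced query model is that the sketch is a tiny, mergeable summary of the event stream, updated in constant time per event.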

Debasish Ghosh

Shared publicly  - 
 
 
Seen in Palo Alto.

Debasish Ghosh

Shared publicly  - 
 
I read the original post and RT'd on Twitter. Then +Stuart Halloway clarified a lot of things about how Datomic addresses them. Also have a look at +Rich Hickey's comment below...
 
Seems like a lot of people are singing the praises of immutable databases. I've heard this song before. http://www.xaprb.com/blog/2013/12/28/immutability-mvcc-and-garbage-collection/
Naftoli Gugenheim: +Debasish Ghosh what comment?

Debasish Ghosh

Shared publicly  - 
 
Functional Patterns in Domain Modeling - Anemic Models and Compositional Domain Behaviors
I was looking at the presentation that Dean Wampler made recently regarding domain driven design, anemic domain models and how using functional programming principles help ameliorate some of the problems there. There are some statements that he made which, ...
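(A small Python sketch of the compositional idea, with made-up domain names: behaviors are pure functions from aggregate to aggregate, so larger behaviors are just function composition rather than methods mutating an anemic object.)

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Account:
    no: str
    balance: float

# Domain behaviors as pure functions Account -> Account.
def credit(amount):
    return lambda a: replace(a, balance=a.balance + amount)

def debit(amount):
    def go(a):
        if a.balance < amount:
            raise ValueError("insufficient funds")
        return replace(a, balance=a.balance - amount)
    return go

def compose(*fs):
    """Chain behaviors left to right into one larger behavior."""
    def run(a):
        for f in fs:
            a = f(a)
        return a
    return run

settle = compose(debit(30.0), credit(50.0))
print(settle(Account("a-123", 100.0)))   # Account(no='a-123', balance=120.0)
```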

Debasish Ghosh

Shared publicly  - 
 
Functional Patterns in Domain Modeling - The Specification Pattern
When you model a domain, you model its entities and behaviors. As Eric Evans mentions in his book Domain Driven Design , the focus is on the domain itself. The model that you design and implement must speak the ubiquitous language so that the essence of the...
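(A minimal sketch of the Specification pattern in Python, on a made-up Order type: each specification is a predicate over domain objects, and the &, |, ~ operators compose them into richer domain rules.)

```python
from dataclasses import dataclass

class Spec:
    def __init__(self, pred):
        self.pred = pred
    def is_satisfied_by(self, obj):
        return self.pred(obj)
    def __and__(self, other):
        return Spec(lambda o: self.pred(o) and other.pred(o))
    def __or__(self, other):
        return Spec(lambda o: self.pred(o) or other.pred(o))
    def __invert__(self):
        return Spec(lambda o: not self.pred(o))

@dataclass
class Order:
    amount: float
    country: str

big = Spec(lambda o: o.amount > 1000)
domestic = Spec(lambda o: o.country == "IN")
needs_approval = big & ~domestic

print(needs_approval.is_satisfied_by(Order(5000, "US")))  # True
```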
 
 
We have just published Version 2.0 of the Mining of Massive Datasets book, available for free at http://i.stanford.edu/~ullman/mmds.html
The major change is the addition of Ch. 12 on Large-Scale Machine Learning. For this second edition, Jure Leskovec has become the third coauthor.
Thanks for sharing :)

Debasish Ghosh

Shared publicly  - 
 
 
Segment on Bloomberg TV about AI and my role at Facebook.

This segment aired Wednesday and is part of a series on AI hosted by Sam Grobart. Thursday's guest was Peter Lee, head of Microsoft Research.

http://www.bloomberg.com/video/meet-facebook-s-head-of-artificial-intelligence-9W0Ysm_1QayALHArsZGkFw.html

Also available on YouTube: Meet Facebook's Head of Artificial Intelligence

A follow-up segment, entitled "Can Facebook's AI Chief Attract Best and Brightest?", talks about the competition for AI talent between Facebook, Google and others: http://www.bloomberg.com/video/can-facebook-s-ai-chief-attract-best-and-brightest-4fv5BBU8SnakvkITP_Rmzw.html

Debasish Ghosh

Shared publicly  - 
 
In today's age of Big Data, streaming is one of the techniques for low-latency computing. Besides the batch processing infrastructure of the map/reduce paradigm, we are seeing a plethora of ways in which streaming data is process...

Debasish Ghosh

Shared publicly  - 
 
Lots of food for thought.
 
Unfortunately I cannot attend this year's #icfp where there is a lot of talk about effects in Haskell. Here is a statement I would have made if I could participate, jointly with Matija Pretnar:

We would like to quickly report on the experience we have had with the design of Eff, a programming language with first-class algebraic effects and handlers.

First of all, algebraic effects and handlers are a lot of fun.

If Haskell goes down the road of algebraic effects and handlers, it will most likely want to incorporate typing information about effects that will in many respects resemble an effect system. It is quite likely that this could be accomplished with existing Haskell technology, or minor modifications of it, as some contributions to ICFP this year are demonstrating. Matija and I have developed an effect system for Eff, in case anyone cares to have a look at how that works. The resemblance with what one would expect in Haskell is plain.

Regarding typing discipline, let us express a real-programmer stance: if we want to write a print statement in the middle of the code as an afterthought, the language should let us do it without fuss. (We can sense the high priests of Haskell shaking their heads in disapproval of such heretic ML-style thoughts.)

Possibly the greatest advantage of algebraic effects and handlers over traditional Haskell monads is the ease with which they combine. Down with monad transformers! In theory monads are more general than algebraic effects, but we feel that the extra generality does not matter in practice because the most important non-algebraic effect, namely (delimited) continuations, is seamlessly incorporated into the picture.

For us it was surprising how easy it is to implement in Eff control mechanisms that are typically accomplished through (delimited) continuations. In fact, it is our conclusion that the relationship between handlers and (delimited) continuations is the same as the relationship between while loops and goto statements. That is, handlers are the structured way of programming with continuations. They take black magic out of continuations and make them comprehensible also to people whose name is not Oleg.
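(To make the handlers-as-structured-continuations analogy concrete, here is a toy sketch using Python generators rather than Eff: the computation yields effect requests, and a handler interprets them and resumes it; the generator's suspension point plays the role of the captured delimited continuation. The operations and names are made up.)

```python
def program():
    # An effectful computation: it performs abstract "ask" and "print"
    # operations without committing to how they are handled.
    name = yield ("ask", "What is your name?")
    yield ("print", "Hello, " + name)
    return len(name)

def run_pure(gen, answers):
    """One handler: no I/O, canned answers, output collected in a list."""
    out, answers = [], iter(answers)
    try:
        request = next(gen)
        while True:
            op, arg = request
            if op == "ask":
                request = gen.send(next(answers))  # resume the continuation
            else:  # "print"
                out.append(arg)
                request = gen.send(None)
    except StopIteration as stop:
        return stop.value, out

def run_console(gen):
    """Another handler for the same program, this time against the console."""
    try:
        request = next(gen)
        while True:
            op, arg = request
            if op == "ask":
                request = gen.send(input(arg + " "))
            else:
                print(arg)
                request = gen.send(None)
    except StopIteration as stop:
        return stop.value

print(run_pure(program(), ["Matija"]))  # (6, ['Hello, Matija'])
```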

None of the existing implementations of algebraic effects and handlers that we are aware of has yet addressed efficiency of execution. Handlers are installed dynamically, a bit like exception handlers with access to continuations. A naive execution strategy with dynamic dispatch will result in unacceptable loss of efficiency. This point needs to be addressed before algebraic effects can appear in real-world programming languages.

We are sorry that we cannot participate at this most interesting panel discussion. With kind regards from Slovenia and Japan! Matija and Andrej

24 September, 2013.