Discussion
How does Theano (or TensorFlow) handle the derivative of L1 regularization?

From the tutorial at http://deeplearning.net/tutorial/mlp.html#going-from-logistic-regression-to-mlp

it defines the L1 term of the cost as

self.L1 = (
    abs(self.hiddenLayer.W).sum()
    + abs(self.logRegressionLayer.W).sum()
)

and then just passes this to T.grad(). But |x| is not differentiable at x = 0 — how is that handled in the code? I tried to look up the definition of abs() in Theano and couldn't find it. I suppose it's just an op like sigmoid() and should define a grad() similarly?
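As far as I can tell, Theano's Abs op defines its gradient as sign(x), which is 0 at x = 0 — i.e. it silently picks one particular subgradient. A quick numpy sketch of that convention (numpy stands in for Theano here; the function name is mine):

```python
import numpy as np

def abs_grad(w):
    # What autodiff frameworks typically return for d|w|/dw:
    # sign(w), with the conventional subgradient choice 0 at w == 0.
    return np.sign(w)

w = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Central finite differences agree everywhere — including at 0, where
# the symmetric difference of |.| also happens to come out as 0.
eps = 1e-6
numeric = (np.abs(w + eps) - np.abs(w - eps)) / (2 * eps)
print(abs_grad(w))   # [-1. -1.  0.  1.  1.]
```

So T.grad() never fails at 0; it just returns 0 there, which is a valid element of the subdifferential [-1, 1].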

The literature seems to use fairly involved methods, such as coordinate descent plus Newton steps, or first approximating |x| with a smooth function. I've never seen a heuristic that, for example, just picks a random subgradient in [-1, 1] at x = 0 and uses it as an ordinary SGD gradient.

Still digging through the TensorFlow source...
The gradient of abs(w) is -1 when w < 0 and +1 when w > 0, so SGD constantly pushes the weights toward 0.
Using L1 as a regularizer should therefore, in theory, give sparse weights, with most weights equal to 0. In practice, this can be achieved by simply zeroing the very small weights.
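To see both effects from the reply — the constant push toward 0, and the final zeroing of small weights — here is a toy run; the target vector, λ, learning rate, and cutoff are all arbitrary illustration values:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, lr = 0.1, 0.05
target = np.array([1.0, 0.0, -0.5])   # only some coordinates matter
w = rng.normal(size=3)

# Gradient descent on 0.5 * ||w - target||^2 + lam * ||w||_1,
# using sign(w) as the subgradient of the L1 term.
for _ in range(2000):
    grad = (w - target) + lam * np.sign(w)
    w -= lr * grad

# The subgradient updates leave the "irrelevant" coordinate oscillating
# near 0 but never exactly 0, so zero small weights explicitly.
w[np.abs(w) < 1e-2] = 0.0
print(w)   # roughly [0.9, 0.0, -0.4]: each surviving weight shrunk by lam
```

Note the surviving weights end up biased toward 0 by about λ — that shrinkage is inherent to L1, not a bug in the optimizer.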