### Yang Yang

Discussion -

How does Theano (or TensorFlow) handle the derivative of the L1 regularizer?

From the tutorial at http://deeplearning.net/tutorial/mlp.html#going-from-logistic-regression-to-mlp

it defines the L1 penalty term (which gets added to the cost):

self.L1 = (
    abs(self.hiddenLayer.W).sum()
    + abs(self.logRegressionLayer.W).sum()
)

and then just passes this to T.grad(). But |x| is not differentiable at x = 0, so how is that handled in the code? I tried to look up the definition of abs() in Theano and couldn't find it. I suppose it's just an op like sigmoid() and should define a grad() method similarly?
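For what it's worth, my understanding is that both Theano and TensorFlow simply define the "gradient" of abs(x) as sign(x), with the convention sign(0) = 0 — i.e. they silently pick the zero subgradient at the kink. A minimal NumPy sketch of that convention (not the frameworks' actual source, just what I believe they compute):

```python
import numpy as np

def abs_grad(x, upstream):
    """Backward pass of y = |x| under the usual framework convention.

    d|x|/dx is sign(x) away from zero; at x = 0 any value in [-1, 1]
    is a valid subgradient, and np.sign(0) = 0 picks the zero one,
    so a weight sitting exactly at zero receives no L1 push.
    """
    return upstream * np.sign(x)

W = np.array([-2.0, 0.0, 3.0])
print(abs_grad(W, 1.0))  # [-1.  0.  1.]
```

So T.grad() never "notices" the non-differentiability: the Abs op just returns this subgradient, and SGD proceeds as if it were an ordinary gradient.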

The literature seems to use fairly involved methods, such as coordinate descent plus Newton steps, or first approximating |x| with a smooth function. I've never seen a heuristic that just picks some subgradient in the [-1, 1] range at zero and uses it as the normal SGD gradient.
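One reason the literature bothers with those methods: plain SGD on the sign subgradient tends to oscillate around zero rather than land exactly on it, so you lose the exact sparsity L1 is supposed to give. A toy 1-D comparison (my own illustration, not from the tutorial) of a subgradient step versus the proximal soft-thresholding update, on 0.5*(w - 1)^2 + lam*|w| whose true minimizer is w = 0 when lam > 1:

```python
import numpy as np

lam, lr = 2.0, 0.1
w_sgd = w_prox = 1.0
for _ in range(200):
    # Plain SGD with the sign(x) subgradient of lam*|w|.
    g = (w_sgd - 1.0) + lam * np.sign(w_sgd)
    w_sgd -= lr * g
    # Proximal step: gradient on the smooth part, then soft-threshold,
    # which can set the weight to exactly zero.
    w_prox -= lr * (w_prox - 1.0)
    w_prox = np.sign(w_prox) * max(abs(w_prox) - lr * lam, 0.0)

print(w_prox)  # exactly 0.0 after a few iterations
print(w_sgd)   # keeps bouncing around zero, never exactly zero
```

The subgradient iterate flips sign every few steps (the update overshoots the kink), while soft-thresholding clamps the weight to exactly 0 and keeps it there, which is what sparsity-seeking solvers exploit.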

Still digging through the TensorFlow source...