### Yang Yang

Discussion: how does Theano (or TensorFlow) handle the gradient of the L1 regularization term?

From the tutorial at http://deeplearning.net/tutorial/mlp.html#going-from-logistic-regression-to-mlp, the cost function includes an L1 term:

```python
self.L1 = (
    abs(self.hiddenLayer.W).sum()
    + abs(self.logRegressionLayer.W).sum()
)
```

This is then passed straight to T.grad(), but abs() is not differentiable at x = 0. How is that handled in the code? I tried to look up the definition of abs() in Theano and couldn't find it. I suppose it is just an op like sigmoid() and should define a grad() method similarly?
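A plausible answer, sketched in plain Python (the helper name is mine, not Theano's): the op can simply define the gradient of abs(x) as sign(x), and choose sign(0) == 0. Zero is a perfectly valid element of the subdifferential [-1, 1] of |x| at zero, so nothing special needs to happen at the kink.

```python
def abs_subgrad(x):
    """A (sub)gradient of |x|: sign(x), with 0 chosen at x == 0.

    Any value in [-1, 1] would be a valid subgradient at zero;
    picking 0 is the conventional choice.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0

# The three regimes of d|x|/dx:
print(abs_subgrad(3.0))   # 1.0
print(abs_subgrad(-2.0))  # -1.0
print(abs_subgrad(0.0))   # 0.0  (chosen from [-1, 1])
```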

The literature seems to use fairly involved methods for L1, such as coordinate descent with Newton steps, or first approximating |x| with a smooth function. I have never seen a heuristic that just picks some subgradient in the [-1, 1] range and uses it as an ordinary SGD gradient.
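For what it's worth, that naive heuristic can be tried directly. A minimal sketch (my own toy problem, not from the tutorial): minimize f(w) = 0.5*(w - 0.1)^2 + lam*|w| with lam = 0.5, whose true minimizer is exactly w = 0, using sign(w) as the L1 "gradient" in plain SGD. The iterate gets close to zero but keeps bouncing around it rather than landing exactly at zero, which is the usual argument for proximal or coordinate methods when exact sparsity matters.

```python
def l1_subgrad(w):
    # sign(w), with 0 chosen at w == 0 (a valid subgradient of |w|)
    return 1.0 if w > 0 else (-1.0 if w < 0 else 0.0)

def sgd_step(w, lam, lr):
    # f(w) = 0.5 * (w - 0.1)**2 + lam * |w|
    grad_smooth = w - 0.1                    # gradient of the smooth part
    return w - lr * (grad_smooth + lam * l1_subgrad(w))

w = 1.0
for _ in range(200):
    w = sgd_step(w, lam=0.5, lr=0.1)

# w hovers near the true minimizer 0, but the subgradient term keeps
# kicking it back and forth instead of pinning it to exactly 0.
print(abs(w) < 0.1)
```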

Still digging into how TensorFlow handles this...



Yang Yang


+Martin Andrews thanks a lot!
