Profile

Behrang Mehrparvar
Works at University of Houston
Attends University of Houston
75 followers | 73,594 views

Stream

 
If we have the network, the representation of a single sample and the error for that sample, can we reconstruct that input sample exactly, without error? If so, how?
2 comments
 
Most of the layers (fully-connected, convolutional, ...) aren't invertible either.
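A quick way to see this (a minimal numpy sketch, not from the thread; the layer shapes are made up): a dimension-reducing fully-connected layer with a ReLU maps many inputs to the same code, so even a least-squares "inverse" only recovers an approximation of the original sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: 20-dim input -> 5-dim code with a ReLU nonlinearity.
W = rng.normal(size=(5, 20))
b = rng.normal(size=5)

def forward(x):
    return np.maximum(0.0, W @ x + b)  # ReLU(Wx + b)

x = rng.normal(size=20)
h = forward(x)

# Best linear attempt at inversion: least-squares solve of W x = h - b,
# ignoring the ReLU (we cannot know which units were clipped to zero).
x_hat, *_ = np.linalg.lstsq(W, h - b, rcond=None)

print("reconstruction error:", np.linalg.norm(x - x_hat))  # clearly nonzero
```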
 
Is saturation good?
Does saturation imply sparsity, or vice versa?
Does the continuous "value" of the hidden units really matter, or is binary good enough?
5 comments
 
They are pushing the nonlinearity into its flat region, where the gradient is zero (Figure 1 of the paper), so learning driven by the gradient does not happen.
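To put a number on "the gradient is zero" (my own sketch, not from the paper): the sigmoid's derivative at a saturated pre-activation is tiny, so whatever error signal backpropagates through that unit gets multiplied by almost nothing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Gradient in the linear region vs. deep in the saturated (flat) region.
for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}   sigmoid'(z) = {sigmoid_grad(z):.2e}")
# At z = 10 the derivative is ~4.5e-05: backprop through a saturated unit
# scales the error signal by almost nothing, so learning there stalls.
```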
 
Can anyone explain the co-adaptation between units mentioned in the dropout paper, in terms of back-propagation? What exactly does it mean?
9 comments
 
It encourages sparsity. It is similar in some ways to dropout in that you are encouraging only some units to be active, but there is no other constraint to keep the units from co-adapting except that you want to encourage solutions with fewer activations, which should force the network into learning simpler relationships. But dropout seems to work much better in practice.
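To make that contrast concrete (a minimal numpy sketch; the batch shape, penalty weight and keep probability are made up): a sparsity penalty is an extra term added to the loss that nudges every activation toward zero, whereas dropout literally zeroes a random subset of units on each forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(32, 100))          # hypothetical batch of hidden activations

# Sparsity penalty: an extra loss term, e.g. lambda * mean |h|.
sparsity_lambda = 1e-3
sparsity_loss = sparsity_lambda * np.abs(h).mean()

# Dropout: multiply activations by a random binary mask during training
# (inverted dropout, scaling so the expected activation is unchanged).
keep_prob = 0.5
mask = rng.random(h.shape) < keep_prob
h_dropped = h * mask / keep_prob

print("sparsity term added to the loss:", sparsity_loss)
print("fraction of units zeroed by dropout:", 1.0 - mask.mean())
```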
 
I trained an auto-encoder and then randomly re-initialized some filters (weight vectors) after some epochs. Surprisingly, after continuing training I realized that those filters (at least visually) learned the same patterns as before. Can anyone explain this?
10 comments
 
The definition of SOME filters is ambiguous. You can see it in the cost if you decompose W into (W + dW(SOME filters)); here dW is a function of SOME. You can actually vary SOME over the permissible range and see how much the network can tolerate. You could also shuffle all hidden nodes randomly and see how far that goes; I guess that would be the extreme case.

Only after doing these kinds of experiments could we conclude what is really happening.
Once you have these empirical results, then I believe it is worth your time to do a theoretical analysis. Why waste time if the empirical results don't show that something is really happening? But a minimal version would be tuning SOME and scaling dW.
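Here is a minimal numpy sketch of that kind of experiment (the weight shapes and the reset fraction are made up, and the actual retraining is whatever training loop you already use): keep a copy of W, re-initialize a tunable fraction of its rows (the "SOME" filters), retrain, and then measure with cosine similarity how closely the re-learned filters return to their old selves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained autoencoder's weights, shape (n_hidden, n_visible).
W = rng.normal(scale=0.1, size=(256, 784))
W_before = W.copy()

# "SOME" filters: re-initialize a tunable fraction of the rows of W.
some_fraction = 0.25                      # the knob to vary, as suggested above
reset_idx = rng.choice(W.shape[0], size=int(some_fraction * W.shape[0]),
                       replace=False)
W[reset_idx] = rng.normal(scale=0.01, size=(len(reset_idx), W.shape[1]))

# ... continue training the autoencoder here with your existing training loop ...

# Then: how close did each reset filter come back to its old self?
def cosine_rows(a, b):
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

print("mean cosine similarity, reset filters vs. before:",
      cosine_rows(W[reset_idx], W_before[reset_idx]).mean())
```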
 
What is the intuition behind the energy of an RBM? Is the term "xWh" encouraging the hidden units to copy the input units? Why? And how do this term and the bias terms xb and yb precisely result in "separating factors of variation"?
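For reference, here is the energy being asked about, written out for a binary RBM (a standard formulation, not from the thread; I'm assuming the "xb" and "yb" terms are the visible-bias and hidden-bias terms, written b and c below, and the sizes are made up). The xWh term lowers the energy, and therefore raises the probability, of configurations in which connected visible and hidden units are active together.

```python
import numpy as np

rng = np.random.default_rng(0)

n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b = rng.normal(scale=0.1, size=n_visible)   # visible bias (the "xb" term)
c = rng.normal(scale=0.1, size=n_hidden)    # hidden bias  (the "yb" term)

def energy(x, h):
    """E(x, h) = -x.W.h - b.x - c.h for a binary RBM."""
    return -(x @ W @ h) - (b @ x) - (c @ h)

# Lower energy = higher probability: p(x, h) is proportional to exp(-E(x, h)).
x = rng.integers(0, 2, size=n_visible)
h = rng.integers(0, 2, size=n_hidden)
print("E(x, h) =", energy(x, h))
```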
 
In perceptron learning (and some other topics in machine learning) there is this notion of "input space" versus "weight space" (sometimes called "version space", I suppose).
Does anyone know of any good references that define these spaces and the mapping between them?
5 comments
 
Sorry, I couldn't help with this. Maybe someone else will have better input.
 
What is the (dis)advantage of using 3D CNNs for (spatio-)temporal data instead of LSTM-RNNs?
6 comments
 
An important consideration is whether you need interpolation or extrapolation in the temporal dimension. CNN is more of a curve-fitting approach, so I would expect it to excel at interpolation. RNNs are designed for extrapolation over time. Also, is the temporal dimension sampled at discrete regular intervals? If not, then RNNs will be tricky to use with your data.
 
OK, here is the problem: On one hand I have backpropagation, which I can understand completely. On the other hand I can see the visual image of the features that we get from, let's say, MNIST. I just cannot see the connection. Why does backpropagation result in such well-shaped features? I cannot accept it just as a black box, and whatever I read is a high-level introduction to what is happening. Can anyone suggest a paper that gets deep into this?
9 comments
 
How about this:

If we have a single repeated image in the training set, the best way to get the least reconstruction error is to copy that image right onto the weight vectors, rescaling the intensity so that the result can pass through the nonlinear function without problems. Gradient descent carries out this copying process iteratively.

Now, we know that we don't have a single image in the training data; therefore, each image tries to copy itself onto the weight vectors, so each image is somehow pulling the weight vectors towards itself.
This pulling is done over and over through the epochs, slowly and iteratively, by gradient descent.

However, there are some regions of the image that do repeat a lot across the training data (factors of variation). These regions, like the whole image in the first example, somehow manage to retain a copy on the weight vectors. These are the blob-like regions on the filters.

Nevertheless, we have many hidden units, and an image can be reconstructed by combining the responses of filters from many hidden units. Therefore, the optimization doesn't need to capture all the variation through one hidden unit, so the variation gets distributed among the units and we end up with different blob-like filters corresponding to different hidden units.
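A toy version of the first paragraph's argument (my own sketch; a random unit vector stands in for the repeated image, and the sizes, learning rate and step count are made up): train a one-hidden-unit, tied-weight linear autoencoder on a single repeated input, and the learned filter ends up as a copy of that input, up to sign and scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# One "image", repeated over and over (a random unit vector stands in for it).
x = rng.normal(size=100)
x /= np.linalg.norm(x)

# One-hidden-unit, tied-weight linear autoencoder: code s = w.x, reconstruction w*s.
w = rng.normal(scale=0.1, size=100)
lr = 0.1

for _ in range(2000):
    s = w @ x
    r = x - w * s                         # reconstruction error
    w += lr * 2 * (s * r + (r @ w) * x)   # gradient descent on ||x - w*(w.x)||^2

# The filter has become a (sign/scale) copy of the training image.
print("|cosine(w, x)| =", abs(w @ x) / np.linalg.norm(w))   # -> ~1.0
```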
 
Isn't sparsity against "distributed" representation?
31 comments
 
Referring to the paper below, it seems they are two opposing concepts.

https://www.cs.toronto.edu/~hinton/absps/rgbn.pdf
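To put rough numbers on the tension the question is pointing at (my own back-of-the-envelope count, not from the paper; n and the k values are arbitrary): a k-hot code over n units loses capacity as k shrinks, all the way down to the purely local one-hot case, yet even a fairly sparse code still has combinatorially many distinct patterns.

```python
from math import comb

n = 100                       # hidden units
for k in (1, 5, 50):          # 1 = purely local, 5 = sparse, 50 = dense
    print(f"k = {k:2d} active units -> {comb(n, k):.3e} distinct binary codes")
# k =  1 -> 1.000e+02   (one-hot / local)
# k =  5 -> 7.529e+07   (sparse yet combinatorial)
# k = 50 -> 1.009e+29   (fully distributed)
```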

 
Why does depth help in CNNs? What does "convolution of a convolution" mean? Why not just a single convolution/pooling layer and many fully connected layers on top of it?
4 comments
 
Second that. The second conv layer has 3D filters, so a "single" filter is applied across multiple feature maps from the first conv layer. Sometimes the third dimension spans all the feature maps, like Sander said; sometimes combinations of maps are chosen (see page 8 of Yann LeCun's seminal paper, "Gradient-Based Learning Applied to Document Recognition").
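A shape-level sketch of that (made-up sizes, plain numpy/scipy, not tied to any particular network): each second-layer filter is a 3D array spanning all the feature maps produced by the first layer, and its output is the sum of 2D correlations over those maps, which is what "convolution of a convolution" amounts to.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)

img = rng.normal(size=(28, 28))              # single-channel input image

# Layer 1: 6 filters of size 5x5 -> 6 feature maps of size 24x24.
W1 = rng.normal(size=(6, 5, 5))
maps1 = np.stack([np.maximum(0, correlate2d(img, f, mode="valid")) for f in W1])

# Layer 2: each filter is 3D, spanning ALL 6 first-layer maps: shape (6, 5, 5).
W2 = rng.normal(size=(16, 6, 5, 5))
maps2 = np.stack([
    np.maximum(0, sum(correlate2d(maps1[c], f2[c], mode="valid") for c in range(6)))
    for f2 in W2
])

print(maps1.shape)   # (6, 24, 24)  -- one convolution
print(maps2.shape)   # (16, 20, 20) -- "convolution of a convolution"
```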
 
The sample NN for the family tree task is always a good example to ask questions about [Coursera: NN for ML (Hinton), Lecture 4a].

What exactly would make a learning algorithm learn such separate features (e.g. nationality, generation, branch, etc.) without any specific constraint or prior related to them?
 
What should I study to understand local vs. global variation/generalization?
 
Just here to show some support. I like the questions you've been asking :)
People
Have him in circles
75 people
Hou Yunqing
Xi Zhao
arash sangari
Hugo Larochelle
Sankar Mukherjee
Hongyang Zhang
Marzieh Berenjkoub
li wei
Dat Chu
Communities
Basic Information
Gender
Male
Work
Occupation
PhD Student
Employment
  • University of Houston
    TA, 2011 - present
Education
  • University of Houston
    PhD, 2011 - present
  • KIAU
    BS Computer Engineering, 2003 - 2008
  • IUST
    MS, 2008 - 2011