Profile

Behrang Mehrparvar
Works at University of Houston
Attends University of Houston
95 followers · 121,963 views

Stream

 
How do we optimize the weights of a Siamese architecture? Do we find the weights of each sub-network separately and then add them?
Remember that you compute the gradient of the loss function with respect to the inputs of the loss, not the weights of the network. So you have G1 and G2, and dL/dG1 = 2 * (G1 - G2), dL/dG2 = -2 * (G1 - G2). Then propagate dL/dG1 through the first network and dL/dG2 through the second.
Additionally, I do not know if this is your true loss function, but as written it makes no sense. In a Siamese net you need to distinguish between same and not-same pairs. Currently your loss function just tries to make the two outputs similar to each other. But what happens when the classes are different? Then the distance should be high. Look at this publication; maybe it will clarify things: http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
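As a sanity check, here is a minimal NumPy sketch of those gradients (G1 and G2 are made-up example embeddings; the loss is the plain ||G1 - G2||² discussed in the comment, not the contrastive loss from the linked paper). The analytic gradients are verified against central finite differences:

```python
import numpy as np

# Hypothetical embeddings produced by the two (weight-sharing) sub-networks
rng = np.random.default_rng(0)
G1 = rng.normal(size=5)
G2 = rng.normal(size=5)

def loss(g1, g2):
    # L = ||g1 - g2||^2, the "make outputs similar" loss discussed above
    return np.sum((g1 - g2) ** 2)

# Analytic gradients from the comment
dL_dG1 = 2 * (G1 - G2)
dL_dG2 = -2 * (G1 - G2)

# Check dL/dG1 against central finite differences
eps = 1e-6
num_dG1 = np.array([
    (loss(G1 + eps * e, G2) - loss(G1 - eps * e, G2)) / (2 * eps)
    for e in np.eye(5)
])
assert np.allclose(dL_dG1, num_dG1, atol=1e-4)
```

Each of dL/dG1 and dL/dG2 is then backpropagated through its own tower; since the towers share weights, the two weight gradients are summed.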
I implemented a deep bottleneck auto-encoder with ReLU activation functions and NO pretraining. Based on visualizing the weights, the activations of the hidden units, and the reconstruction error, it seems that the auto-encoder is learning the correct features.
I used the middle (bottleneck) layer as input to an SVM, but I get 12% accuracy on MNIST. Is that the right way to use the features for classification, and what do you think the problem is? How many middle-layer units are needed for ten classes?
It turned out I didn't shuffle the data and labels correctly.
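For reference, a minimal sketch of that fix (toy arrays of my own, not the actual MNIST pipeline): shuffle features and labels with the same permutation before splitting, so the train and test splits share classes.

```python
import numpy as np

# Toy stand-in for a dataset stored sorted by class -- the failure mode
# described above: a plain 80/20 split leaves test-only classes.
X = np.arange(100, dtype=float).reshape(100, 1)
y_sorted = np.repeat(np.arange(10), 10)        # labels 0..9 in sorted blocks

# Sorted split: the test labels {8, 9} never appear in training
assert set(y_sorted[80:]) & set(y_sorted[:80]) == set()

# Fix: shuffle features and labels with the SAME permutation
rng = np.random.default_rng(0)
perm = rng.permutation(len(X))
X, y = X[perm], y_sorted[perm]                 # (sample, label) pairs stay aligned

# Now train and test share classes
assert len(set(y[80:]) & set(y[:80])) > 0
```

Using one permutation for both arrays is the key point: shuffling X and y independently would destroy the pairing and produce near-chance accuracy.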
How can a deep network disentangle complex variations such as rotations?
Are there any papers about that?
+Behrang Mehrparvar It doesn't; the statement doesn't make sense. The network doesn't choose to learn a cost function: it either can learn it or it can't, and it doesn't choose between cost functions. It's the cost function that describes what "rotation of an image" means. So whatever way this statement is encoded by the cost function is learned (or not) by the network, depending on whether its architecture admits this association as learnable.
If we know the underlying manifold of the data:

- Are the intrinsic dimensions orthogonal (uncorrelated), independent, or neither?

- Does it mean that, starting from two different points on the manifold and moving in the same direction, we are adding the same variation to the data?
1. By definition, they should be independent. Don't be confused by the heuristics of the dimensionality-reduction methods (e.g., see http://www.jmlr.org/papers/volume16/cunningham15a/cunningham15a.pdf). The following link has the definition.
2. That's vague; trying to guess what you mean: yes, see http://www.owlnet.rice.edu/~fjones/chap5.pdf
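To see why "uncorrelated" and "independent" are not the same thing, here is a standard toy example (my own, not from the linked notes): y = x² for x symmetric around zero has essentially zero linear correlation with x, yet is completely determined by it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)   # symmetric around zero
y = x ** 2                                 # fully determined by x

# Linear correlation is ~0: on these axes the data looks "orthogonal"
corr_xy = np.corrcoef(x, y)[0, 1]
assert abs(corr_xy) < 0.02

# ...but the variables are strongly dependent: |x| predicts y almost linearly
corr_abs = np.corrcoef(np.abs(x), y)[0, 1]
assert corr_abs > 0.9
```

So orthogonality of intrinsic dimensions (zero correlation) is strictly weaker than independence, which is what makes the distinction in the question meaningful.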
Does "invariance" imply "independence"?
I guess feature A is invariant to B if changing B does not affect A.
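A toy illustration of that definition (using a translation factor rather than a learned one; purely illustrative): a summed, "pooled" feature A is invariant to the circular shift B of the input, because changing B never changes A.

```python
import numpy as np

x = np.array([1.0, 3.0, 0.0, 2.0])

def feature_A(signal):
    # A "pooled" feature: the sum over all positions
    return signal.sum()

# B = amount of circular shift applied to the input
for b in range(len(x)):
    shifted = np.roll(x, b)                    # change B...
    assert feature_A(shifted) == feature_A(x)  # ...A is unaffected
```

Note invariance runs one way only: A ignores B, but B can still carry information that A discards, which is why invariance does not by itself imply statistical independence between the feature and the factor.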
There is a very interesting claim in this paper [Intriguing properties of neural networks], though the paper doesn't explain it well. Can anyone elaborate on the concept, please, or refer me to related papers?

"... This puts into question the conjecture that neural networks disentangle variation factors across coordinates. Generally, it seems that it is the entire space of activations, rather than the individual units, that contains the bulk of the semantic information. ..."
Here is a quote from Andrej Karpathy about this issue: "it is more appropriate to think of multiple ReLU neurons as the basis vectors of some space that represents in image patches. In other words, the visualization is showing the patches at the edge of the cloud of representations, along the (arbitrary) axes that correspond to the filter weights. This can also be seen by the fact that neurons in a ConvNet operate linearly over the input space, so any arbitrary rotation of that space is a no-op."

You can find more here: http://cs231n.github.io/understanding-cnn/
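The "arbitrary rotation is a no-op" point can be made concrete for the linear part of a layer: if the hidden representation is rotated by an orthogonal matrix R, a linear readout can absorb R, so exactly the same information is available; only the per-coordinate interpretation changes. A small NumPy sketch (all matrices are made-up examples):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))                   # layer weights: h = W x
V = rng.normal(size=(3, 8))                    # linear readout on top of h
R, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal rotation

x = rng.normal(size=16)
h = W @ x                                      # original activations
h_rot = R @ h                                  # same cloud, rotated basis

# The readout compensates with R^T (= R^{-1} for orthogonal R)
out_original = V @ h
out_rotated = (V @ R.T) @ h_rot
assert np.allclose(out_original, out_rotated)
```

With a ReLU applied after the rotation this is no longer an exact no-op (only e.g. permutations commute with elementwise nonlinearities); the point of the quote is that the semantics live in the span of the activations, not in which basis you happen to read them out.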
Is there any specific paper that claims that hidden units in higher layers are more class-specific? Is this just a hypothesis or an assumption, or is there any evidence for it?
I believe you will find this paper relevant:
http://arxiv.org/abs/1312.6199
Why is maximizing the classification score used for visualizing hidden units? What is the intuition?
There won't be a way to visualize without introducing bias. Why not visualize hidden units in the context of things we care about? Is there something else?
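The standard recipe behind "maximize the score" visualizations is gradient ascent on the input while the weights stay fixed. A minimal sketch for a tiny made-up ReLU scorer (the real technique backpropagates through a trained ConvNet, usually with regularizers on the image):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(6, 10))     # hypothetical first-layer weights (fixed)
w2 = rng.normal(size=6)           # hypothetical class-score weights (fixed)

def score(x):
    return w2 @ np.maximum(W1 @ x, 0.0)      # s(x) = w2 . relu(W1 x)

def grad_score(x):
    gate = (W1 @ x > 0).astype(float)        # ReLU derivative
    return W1.T @ (w2 * gate)                # ds/dx via the chain rule

x0 = 0.1 * rng.normal(size=10)               # small random starting "image"
x = x0.copy()
for _ in range(200):                         # ascend the score w.r.t. the INPUT
    candidate = x + 0.05 * grad_score(x)
    if score(candidate) > score(x):          # keep only improving steps
        x = candidate

assert score(x) >= score(x0)
```

The intuition for using the class score as the objective: the resulting input is one the network considers maximally class-like, so it exposes what evidence the network has learned to respond to (with the bias, as noted above, that we chose that objective).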
In which year exactly did research on neural networks stop? And why did it stop? What was the state of the art at that time?
I might have to provide some information to students in class about that.
Does the ReLU nonlinearity satisfy the universal approximation theorem for neural networks?
AFAIK, the corresponding result for ReLU (which had not yet been invented as an activation function at the time) was first proved in
"Approximation by superposition of sigmoidal and radial basis functions" (1992), Mhaskar & Micchelli:
http://www.sciencedirect.com/science/article/pii/019688589290016P
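As a small empirical companion (not the theorem itself, just the mechanism behind it): a single hidden layer of ReLUs builds a piecewise-linear function, and with enough units it fits a smooth target closely. Here sin on [0, π] is fit with evenly spaced ReLU "knots" and output weights solved by least squares:

```python
import numpy as np

# Approximate sin(x) on [0, pi] with a one-hidden-layer ReLU network.
# Hidden units are relu(x - c_k) for evenly spaced knots c_k; the output
# layer is fit by least squares, yielding a piecewise-linear approximant.
knots = np.linspace(0.0, np.pi, 40, endpoint=False)
x = np.linspace(0.0, np.pi, 1000)
target = np.sin(x)

H = np.maximum(x[:, None] - knots[None, :], 0.0)   # hidden activations
H = np.hstack([H, np.ones((len(x), 1))])           # plus a bias unit
w, *_ = np.linalg.lstsq(H, target, rcond=None)

approx = H @ w
max_err = np.max(np.abs(approx - target))
assert max_err < 1e-2   # the piecewise-linear fit is already quite accurate
```

Doubling the number of knots roughly quarters the error for a smooth target, which is the flavor of the approximation rates the cited paper makes precise.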
Is this true?
Coarse coding is only efficient with discrete unit activities; otherwise, it just introduces redundancy.

Behrang Mehrparvar
Shared publicly

Unfortunately, the mechanism is exactly this. #TheFutureIsNow
People

Have him in circles: 95 people
Zeinab Ranji, Xi Zhao, Mohsen Soltani, li wei, jiaping zhao, mohammad tavakoli heshajin, Hugo Larochelle, Panagiotis Moutafis, ali diba

Communities
Basic Information
Gender
Male
Work
Occupation
PhD Student
Employment
  • University of Houston
    TA, 2011 - present
Education
  • University of Houston
    PhD, 2011 - present
  • IUST
    MS, 2008 - 2011
  • KIAU
    BS Computer Engineering, 2003 - 2008