### Behrang Mehrparvar

Discussion - What makes a network focus on learning shapes and edges instead of textures (see figure for reconstructed output images on SVHN)?


+Yossi Biton Well, this is an unsupervised scenario.


Behrang Mehrparvar

Works at University of Houston

Attends University of Houston



Why is maximizing the classification score used for visualizing the hidden units? What is the intuition?


There won't be a way of visualizing without introducing bias. Why not visualize hidden units in the context of things we care about? Is there something else?
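The idea behind this visualization technique can be sketched in a few lines of NumPy (a toy example, not from the thread): gradient ascent on the input to maximize a unit's score, with the input norm held fixed so the score can't grow just by scaling. For a hypothetical linear unit, the optimum aligns with the unit's weight vector, which is exactly the "preferred stimulus" such visualizations reveal.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)          # weights of one hidden unit (hypothetical)
x = rng.normal(size=64)          # start from a random "image"

# Gradient ascent on the unit's score s(x) = w.x, renormalizing x each
# step so the score cannot grow merely by scaling the input.
for _ in range(200):
    grad = w                     # d(w.x)/dx for a linear unit
    x = x + 0.1 * grad
    x = x / np.linalg.norm(x)

# The norm-constrained maximizer of a linear unit's score is the
# normalized weight vector itself.
cosine = (w @ x) / np.linalg.norm(w)
print(cosine)                    # close to 1.0
```

For a real network the gradient is computed by backpropagation rather than being constant, but the bias the comment mentions is visible even here: the procedure shows you the direction the unit's weights pick out, nothing more.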


In which year exactly did research on neural networks stop? Why did it stop, and what was the state of the art at that time?


I might have to provide some information to students in class about that.


Does the ReLU nonlinearity satisfy the universal approximation theorem for neural networks?


AFAIK, the result covering ReLU (which had not yet been invented at the time) was first proved in:

"Approximation by superposition of sigmoidal and radial basis functions" (1992) Mhaskar, Micchelli

http://www.sciencedirect.com/science/article/pii/019688589290016P

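The constructive intuition behind universal approximation with ReLU can be checked numerically: a single hidden layer of ReLU units, with only the output weights fit by least squares, already approximates a smooth function well, since sums of shifted ReLUs realize arbitrary piecewise-linear functions. A minimal NumPy sketch (the target function, knot placement, and unit count are illustrative choices, not from the cited paper):

```python
import numpy as np

# One-hidden-layer ReLU net: f_hat(x) = sum_k c_k * relu(x - b_k) + c_0.
# With enough knots b_k this family contains all piecewise-linear
# interpolants, so it approximates any continuous function on an interval.
x = np.linspace(0.0, np.pi, 200)
target = np.sin(x)

knots = np.linspace(0.0, np.pi, 20)                 # hidden-unit biases
H = np.maximum(0.0, x[:, None] - knots[None, :])    # hidden activations
H = np.hstack([H, np.ones((len(x), 1))])            # constant (bias) column

c, *_ = np.linalg.lstsq(H, target, rcond=None)      # fit output weights only
max_err = np.max(np.abs(H @ c - target))
print(max_err)                                      # well below 1e-2
```

Twenty units already drive the worst-case error on sin(x) to a few thousandths; the theorem says the error can be made arbitrarily small as the width grows.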


Is this true?

"Coarse coding is only efficient when unit activities are discrete. Otherwise, it just introduces redundancy."


There is a very interesting claim in this paper [Intriguing properties of neural networks], though the paper doesn't fully explain it. Can anyone elaborate on the concept, or refer me to related papers?

"... This puts into question the conjecture that neural networks disentangle variation factors across coordinates. Generally, it seems that it is the entire space of activations, rather than the individual units, that contains the bulk of the semantic information. ..."


Here is a quote from Andrej Karpathy about this issue: "it is more appropriate to think of multiple ReLU neurons as the basis vectors of some space that represents in image patches. In other words, the visualization is showing the patches at the edge of the cloud of representations, along the (arbitrary) axes that correspond to the filter weights. This can also be seen by the fact that neurons in a ConvNet operate linearly over the input space, so any arbitrary rotation of that space is a no-op."

You can find more here: http://cs231n.github.io/understanding-cnn/

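Karpathy's point can be demonstrated directly: rotating the activation space with an orthogonal matrix preserves all pairwise distances between representations (so neighborhoods, clusters, and "semantics" survive) while scrambling every individual unit. A minimal NumPy sketch with random stand-in activations:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 32))                   # activations: 100 inputs, 32 units
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))   # random orthogonal rotation

B = A @ Q                                        # same activations, rotated basis

def pdist(X):
    # All pairwise Euclidean distances between rows of X.
    d = X[:, None, :] - X[None, :, :]
    return np.sqrt((d ** 2).sum(-1))

# The geometry of the representation space is untouched by the rotation...
print(np.allclose(pdist(A), pdist(B)))           # True

# ...but any individual "unit" (column) is completely different.
print(np.allclose(A[:, 0], B[:, 0]))             # False
```

This is the sense in which "the entire space of activations, rather than the individual units" carries the information: everything a distance-based readout could use is invariant to the choice of axes.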

How do we optimize the weights of a Siamese architecture? Do we find the weights of each sub-network separately and then add them?


Remember that you compute the gradient of the loss function with respect to the loss's inputs, not the weights of the network. So you have G1 and G2, with dL/dG1 = 2 * (G1 - G2) and dL/dG2 = -2 * (G1 - G2). Then propagate dL/dG1 through the first network and dL/dG2 through the second.

Additionally, I don't know if this is your actual loss function, but as written it makes no sense. In a Siamese net you need to distinguish between same and not-same pairs. Currently your loss function just pulls the inputs toward each other. But what about when the classes are different? Then the distance should be high. Look at this publication; maybe it will clarify things: http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
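The gradient formulas in the first comment can be written out and checked numerically, assuming the squared-distance loss L = ||G1 - G2||^2 under discussion (toy 8-dimensional embeddings, not anyone's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)
G1 = rng.normal(size=8)                # embedding from tower 1 (toy values)
G2 = rng.normal(size=8)                # embedding from tower 2

loss = np.sum((G1 - G2) ** 2)          # L = ||G1 - G2||^2
dG1 = 2.0 * (G1 - G2)                  # gradient backpropagated into tower 1
dG2 = -2.0 * (G1 - G2)                 # gradient backpropagated into tower 2

# Finite-difference check of dL/dG1 for one coordinate.
eps = 1e-6
G1p = G1.copy()
G1p[0] += eps
num = (np.sum((G1p - G2) ** 2) - loss) / eps
print(abs(num - dG1[0]))               # tiny: analytic and numeric agree
```

Since the towers share weights, in practice both gradient signals flow into the same parameters; the two backward passes are summed, not kept separate.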


I implemented a deep bottleneck auto-encoder with ReLU activation functions and NO pretraining. Based on visualizing the weights, the hidden-unit activations, and the reconstruction error, the auto-encoder seems to be learning the correct features.

I used the middle (bottleneck) layer as input to an SVM, but I only get 12% accuracy on MNIST. Is that the right way to use the features for classification, and what do you think the problem is? How many middle-layer units are needed for ten classes?


It turned out I didn't shuffle the data and the class labels consistently.
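For reference, a toy NumPy sketch of the intended pipeline: low-dimensional codes (here a linear PCA "bottleneck" as a stand-in for the auto-encoder's middle layer) fed to a simple classifier (nearest centroid as an SVM stand-in). On data whose inputs and labels are aligned, accuracy is high; near-chance accuracy like 12% on MNIST usually points to a data/label mismatch such as the shuffling issue above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data in 20 dimensions (stand-in for MNIST).
X0 = rng.normal(size=(200, 20)) + 3.0   # class 0
X1 = rng.normal(size=(200, 20)) - 3.0   # class 1
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# Linear "bottleneck": project onto the top 2 principal directions
# (a linear auto-encoder with squared loss learns this same subspace).
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
codes = Xc @ Vt[:2].T                   # 2-D codes for every sample

# Nearest-centroid classifier on the codes (SVM stand-in).
c0 = codes[y == 0].mean(0)
c1 = codes[y == 1].mean(0)
pred = (np.linalg.norm(codes - c1, axis=1)
        < np.linalg.norm(codes - c0, axis=1)).astype(int)
print((pred == y).mean())               # near 1.0 on this separable toy data
```

The pipeline itself is the standard one (extract codes, train a classifier on them); the failure mode in the thread was in the data handling, not the architecture.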


How can a deep network disentangle complex variations such as rotations?

Are there any papers about that?


+Behrang Mehrparvar It doesn't; the statement doesn't make sense. The network doesn't choose to learn a cost function: either it can learn it or it can't, and it doesn't choose between cost functions. It's the cost function that defines what "rotation of an image" means. So however this notion is encoded by the cost function, it is learned (or not) by the network, depending on whether its architecture admits the association as learnable.


If we know the underlying manifold of the data:

- Are the intrinsic dimensions orthogonal (uncorrelated), independent, or neither?

- Does it mean that, starting from two different points on the manifold and moving in the same direction, we are adding the same variation to the data?


1. By definition, they should be independent. Don't get confused by the heuristics of dimensionality-reduction methods (e.g., see http://www.jmlr.org/papers/volume16/cunningham15a/cunningham15a.pdf). The following link has the definition.

2. That's vague; trying to guess what you mean: yes. See http://www.owlnet.rice.edu/~fjones/chap5.pdf
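The second question can be made concrete on the simplest manifold, the unit circle: moving in the same intrinsic direction (increasing angle) from two different points produces different variations in the ambient space, because the tangent vector depends on the base point. A minimal NumPy sketch:

```python
import numpy as np

# The unit circle is a 1-D manifold embedded in R^2, parametrized by
# theta -> (cos(theta), sin(theta)). The ambient variation caused by the
# intrinsic direction d/dtheta is the tangent vector at that point.
def tangent(theta):
    return np.array([-np.sin(theta), np.cos(theta)])  # d/dtheta of (cos, sin)

t1 = tangent(0.0)            # tangent at (1, 0)
t2 = tangent(np.pi / 2)      # tangent at (0, 1)
print(t1, t2)                # different ambient directions
print(np.allclose(t1, t2))   # False
```

So "the same direction" is only well-defined in intrinsic coordinates; the induced variation in data space generally differs from point to point.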


Does "invariance" imply "independence"?


I guess feature A is invariant to B if changing B does not affect A.
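That reading can be illustrated in code: invariance is a functional property of the feature map, while independence is a statistical property of the data distribution, and the first does not imply the second. A toy NumPy sketch with a hypothetical nuisance variable:

```python
import numpy as np

rng = np.random.default_rng(0)

# The feature f(s, n) = s is invariant to n: changing n never changes it.
def f(s, n):
    return s

s = rng.normal(size=10_000)
n = s + 0.1 * rng.normal(size=10_000)   # nuisance that happens to track s

a = f(s, n)

# Invariance holds pointwise: perturbing n leaves the feature unchanged.
print(np.allclose(f(s, n + 5.0), a))    # True

# Yet across the dataset the feature is strongly correlated with the very
# variable it is invariant to, so A and B are far from independent.
print(np.corrcoef(a, n)[0, 1] > 0.9)    # True
```

So "invariance" rules out a causal/functional influence of B on A, but A and B can still be dependent whenever the data couples them.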



Education

- University of Houston, PhD, 2011 - present
- KIAU, BS Computer Engineering, 2003 - 2008
- IUST, MS, 2008 - 2011


Work

Occupation

PhD Student

Employment

- University of Houston, TA, 2011 - present

Basic Information

Gender

Male