### Behrang Mehrparvar

Discussion - How do we optimize the weights of a Siamese architecture? Do we find the weights of each sub-network separately and then add them?


Remember that you compute the gradient of the loss function with respect to the outputs of the network, not its weights. So you have the two outputs G1 and G2, and for a squared-distance loss L = (G1 - G2)^2 the gradients are dL/dG1 = 2 * (G1 - G2) and dL/dG2 = -2 * (G1 - G2). Then propagate dL/dG1 back through the first network and dL/dG2 back through the second.
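A minimal numerical sketch of that gradient step, assuming the two sub-networks have already produced embedding vectors (the values of `G1` and `G2` here are made up for illustration):

```python
import numpy as np

# Hypothetical embeddings produced by the two weight-shared sub-networks
G1 = np.array([0.5, 1.0, -0.3])
G2 = np.array([0.2, 0.8, 0.1])

# Squared-distance loss L = ||G1 - G2||^2
diff = G1 - G2
L = np.sum(diff ** 2)

# Gradients of L with respect to the two outputs
dL_dG1 = 2 * diff   # backpropagated through the first sub-network
dL_dG2 = -2 * diff  # backpropagated through the second sub-network

# Because the sub-networks share weights, each weight's total gradient
# is the sum of the contributions arriving from both branches.
```

Note that `dL_dG1 == -dL_dG2`, so the two branches receive equal and opposite signals that pull the embeddings toward each other.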

Additionally, I do not know if this is your true loss function, but as written it makes no sense. In a Siamese net you need to distinguish between same and not-same pairs. Currently your loss function only tries to make the two outputs similar to each other. But what happens when the classes are different? Then the distance should be high. Look at this publication; maybe it will clarify things for you: http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf

