Hello everybody. I'm new to machine learning and have implemented a neural network from scratch. https://github.com/zxteloiv/neural-network-from-scratch
After training it on 2,000 MNIST samples, the model predicts the same class for every test sample.
My question is whether 2,000 training samples are enough for this kind of task.
To be precise: the ten output values do differ slightly from one test sample to the next, but the same class always ends up with the largest value, so taking the argmax gives me that one class as the prediction for every single test sample.
Which class wins changes if I retrain the model, which makes sense given the randomized weight initialization.
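This is roughly how I confirmed the collapse (`model.forward` and `test_images` are placeholder names for illustration, not the actual identifiers in my repo):

```python
import numpy as np

# hypothetical names: model.forward(x) returns the 10 output scores for one image,
# test_images is an iterable of test inputs
scores = np.stack([model.forward(x) for x in test_images])  # shape (N, 10)
preds = scores.argmax(axis=1)

print(np.bincount(preds, minlength=10))  # one huge bin => every prediction is the same class
print(scores.std(axis=0))                # near-zero stds => outputs barely depend on the input
```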
I've checked that both the code and the partial-derivative (backprop) computations are correct.
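In case it helps, here is the kind of numerical gradient check I used, comparing the backprop gradient against central finite differences. This is a generic sketch (the function and argument names are mine, not from the repo), assuming the loss can be evaluated as a function of a flat parameter vector:

```python
import numpy as np

def gradient_check(loss_fn, params, analytic_grad, eps=1e-5):
    """Compare an analytic gradient against central finite differences.

    loss_fn:       callable taking a flat parameter vector, returning a scalar loss
    params:        1-D numpy array of parameters (modified in place, then restored)
    analytic_grad: 1-D numpy array, the gradient computed by backprop
    """
    num_grad = np.zeros_like(params)
    for i in range(params.size):
        orig = params[i]
        params[i] = orig + eps
        loss_plus = loss_fn(params)
        params[i] = orig - eps
        loss_minus = loss_fn(params)
        params[i] = orig  # restore the original value
        num_grad[i] = (loss_plus - loss_minus) / (2 * eps)

    # relative error: ~1e-7 is typically fine, ~1e-2 or worse suggests a bug
    rel_err = (np.linalg.norm(num_grad - analytic_grad)
               / (np.linalg.norm(num_grad) + np.linalg.norm(analytic_grad) + 1e-12))
    return rel_err
```

The relative error I get with this kind of check is small, which is why I believe the derivatives themselves are right.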