[R] Architecture Determines Optimization: Deriving Weight Updates from Network Topology (seeking arXiv endorsement – cs.LG)
Abstract: We derive neural network weight updates from first principles without assuming gradient descent or a specific loss function. Starting from the error equation E = y − f(x) and linearizing with respect to the free parameters, while noting that …
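To make the linearization step concrete, here is a minimal sketch under my own assumptions, not the paper's derivation. It assumes a single linear unit f(x) = w · x, so the Jacobian with respect to the free parameters is just x, and it picks the minimum-norm Δw that solves the linearized equation E = J · Δw. The names `f` and `linearized_update` and the sample numbers are hypothetical.

```python
# Hypothetical illustration: one linear unit f(x) = w . x (not taken from the post).

def f(w, x):
    """Model output for a single linear unit."""
    return sum(wi * xi for wi, xi in zip(w, x))

def linearized_update(w, x, y):
    """Return the minimum-norm dw that solves E = J . dw, where E = y - f(x).

    For a linear unit the Jacobian of f with respect to w is x itself.
    Choosing the minimum-norm solution is an assumption of this sketch.
    """
    error = y - f(w, x)
    jac = list(x)                      # df/dw_i = x_i
    norm_sq = sum(j * j for j in jac)  # J . J^T (a scalar here)
    if norm_sq == 0.0:
        return [0.0] * len(w)          # no parameter can change the output
    return [error * j / norm_sq for j in jac]

# Made-up sample values, for illustration only.
w = [0.5, -1.0]
x = [2.0, 1.0]
y = 3.0

dw = linearized_update(w, x, y)
w_new = [wi + di for wi, di in zip(w, dw)]
residual = y - f(w_new, x)  # error on this sample after the update
```

For a linear unit this step coincides with the normalized LMS (Kaczmarz) update with unit step size, which is why the residual on the sample vanishes up to rounding. No gradient of a loss is taken anywhere; the update comes only from solving the linearized error equation.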