Analyzing Training Example Order Effects in Neural Network Gradient Descent
This article explores how the order of training examples affects neural network training via gradient descent, even though the Bayesian view assumes that training data is unordered. It explains how to compute the per-parameter effect of swapping the order of training examples using Lie brackets, which measure the non-commutativity of gradient updates from dif
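
To make the Lie bracket idea concrete, here is a minimal sketch in plain Python. It assumes a hypothetical two-parameter toy model, f(x) = w2 * tanh(w1 * x) with squared-error loss, and treats one gradient step on example a as the vector field v_a(theta) = -grad L_a(theta). The model, the two examples, the learning rate, and all function names are illustrative assumptions, not taken from the article. Under the convention [v_a, v_b] = J_{v_b} v_a - J_{v_a} v_b, a second-order Taylor expansion suggests that updating on a then b, versus b then a, changes each parameter by roughly lr^2 times the matching bracket entry. The Jacobian-vector products are estimated with finite differences here; an autodiff library would be the more usual choice.

```python
import math

def grad(theta, example):
    """Gradient of 0.5 * (w2 * tanh(w1 * x) - y)**2 with respect to (w1, w2)."""
    w1, w2 = theta
    x, y = example
    t = math.tanh(w1 * x)
    r = w2 * t - y  # residual of the toy model on this example
    return [r * w2 * (1.0 - t * t) * x, r * t]

def update_field(theta, example):
    """Gradient-step vector field per unit learning rate: v(theta) = -grad."""
    return [-g for g in grad(theta, example)]

def directional_derivative(theta, direction, example, h=1e-4):
    """Central-difference estimate of (Jacobian of update_field) @ direction."""
    plus = update_field([p + h * d for p, d in zip(theta, direction)], example)
    minus = update_field([p - h * d for p, d in zip(theta, direction)], example)
    return [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]

def lie_bracket(theta, ex_a, ex_b):
    """[v_a, v_b] = J_{v_b} v_a - J_{v_a} v_b, one entry per parameter."""
    v_a = update_field(theta, ex_a)
    v_b = update_field(theta, ex_b)
    jb_va = directional_derivative(theta, v_a, ex_b)  # J_{v_b} applied to v_a
    ja_vb = directional_derivative(theta, v_b, ex_a)  # J_{v_a} applied to v_b
    return [p - q for p, q in zip(jb_va, ja_vb)]

def sgd_step(theta, example, lr):
    """One plain gradient descent step on a single example."""
    return [p - lr * g for p, g in zip(theta, grad(theta, example))]

# Hypothetical starting parameters, examples (x, y), and learning rate.
theta0 = [0.5, -0.3]
ex_a = (1.0, 1.0)
ex_b = (-2.0, 0.5)
lr = 1e-3

# Actual effect of swapping the order of the two updates.
theta_ab = sgd_step(sgd_step(theta0, ex_a, lr), ex_b, lr)
theta_ba = sgd_step(sgd_step(theta0, ex_b, lr), ex_a, lr)
swap_effect = [p - q for p, q in zip(theta_ab, theta_ba)]

# Second-order prediction from the Lie bracket.
predicted = [lr ** 2 * c for c in lie_bracket(theta0, ex_a, ex_b)]

for name, actual, pred in zip(("w1", "w2"), swap_effect, predicted):
    print(f"{name}: actual swap effect {actual:.3e}, lr^2 * bracket {pred:.3e}")
```

The two printed columns should agree up to terms of order lr^3, and a bracket entry near zero would indicate a parameter that is, to this order, insensitive to the ordering of these two examples.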