The document discusses phase transitions in neural networks with different activation functions (sigmoid vs. ReLU). It finds that for sigmoid activations with more than two hidden units (K > 2), learning proceeds via a discontinuous first-order phase transition from an initially unspecialized state to a specialized state that enables good generalization. For ReLU activations, the transition is continuous, and both specialized and anti-specialized states achieve good generalization. Monte Carlo simulations show the learning dynamics converging to either the specialized or the anti-specialized state, depending on initialization.
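A minimal sketch of the kind of experiment described above, assuming a teacher-student soft-committee machine and simple Metropolis Monte Carlo on the student weights; the sizes, temperature, step count, and the specific "specialized" / "anti-specialized" initializations below are illustrative assumptions, not the document's exact protocol.

```python
# Sketch: Metropolis Monte Carlo on a student soft-committee machine,
# comparing sigmoid vs. ReLU and different initializations.
# All hyperparameters here (N, K, P, beta, steps) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N, K = 50, 3        # input dimension, number of hidden units (K > 2)
P = 500             # number of training examples
beta = 50.0         # inverse temperature for the Metropolis rule
steps = 3000        # Monte Carlo proposals per run

def g_sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def g_relu(x):    return np.maximum(x, 0.0)

def output(W, X, g):
    # Soft-committee machine: unit second-layer weights, sum over K hidden units.
    return g(X @ W.T / np.sqrt(N)).sum(axis=1)

def train_error(W, X, y, g):
    return 0.5 * np.mean((output(W, X, g) - y) ** 2)

def run_mc(g, W0, X, y):
    W = W0.copy()
    E = train_error(W, X, y, g)
    for _ in range(steps):
        # Propose a small Gaussian perturbation of one hidden unit's weights.
        k = rng.integers(K)
        Wnew = W.copy()
        Wnew[k] += 0.1 * rng.standard_normal(N)
        Enew = train_error(Wnew, X, y, g)
        dE = Enew - E
        # Metropolis acceptance with an extensive energy P * (mean error).
        if dE <= 0 or rng.random() < np.exp(-beta * P * dE):
            W, E = Wnew, Enew
    return W, E

# Teacher network and training data.
W_teacher = rng.standard_normal((K, N))
X = rng.standard_normal((P, N))

for name, g in [("sigmoid", g_sigmoid), ("relu", g_relu)]:
    y = output(W_teacher, X, g)
    inits = {
        "specialized":      W_teacher + 0.5 * rng.standard_normal((K, N)),
        "anti-specialized": -W_teacher + 0.5 * rng.standard_normal((K, N)),
        "unspecialized":    rng.standard_normal((K, N)),
    }
    for label, W0 in inits.items():
        W, E = run_mc(g, W0, X, y)
        # Student-teacher overlaps R_ij = w_i . w*_j / N; the sign pattern of
        # the diagonal distinguishes specialized from anti-specialized states.
        R = W @ W_teacher.T / N
        print(f"{name:8s} {label:17s} error {E:.4f}  diag(R) {np.round(np.diag(R), 2)}")
```

The overlap matrix R printed at the end is the usual order parameter in this setting: a positive diagonal indicates specialization toward the teacher, a negative one indicates the anti-specialized configuration.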
Related topics: