Splitting Adam Apr 2026

Based on your interest in "Splitting Adam," you are likely referring to research surrounding the Adam optimizer, which is widely used in machine learning. There isn't one single paper with that exact title, but several interesting papers analyze splitting the algorithm's components or its behavior:

1. The Sign, Magnitude and Variance of Stochastic Gradients

This work splits Adam's update into two parts: it isolates the stochastic direction (the sign of the gradient) from the adaptive step size (the relative variance). By testing these separately, researchers found that "Stochastic Sign Descent" can actually outperform standard Adam on specific datasets like MNIST and CIFAR-10.

2. Adaptive Multilevel Splitting (ADAM)

A different use of the acronym: this version of ADAM is used for "splitting" an elite population of particles to better sample rare events or solve multi-objective optimization problems. It's often applied to power grid reliability or particle transport.

3. Adam Reduces a Unique Form of Sharpness

A more recent and highly regarded paper (2025) investigates what happens when Adam "wanders" around the manifold of minimizers. It argues that Adam's second moment actually causes word representations to become narrow and directional (anisotropic).
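The decomposition in item 1 can be sketched numerically: Adam's step direction is the elementwise sign of the first-moment estimate, and the rest of the update is a variance-adapted magnitude. A minimal sketch (the function names and toy values are my own, not from the paper):

```python
import numpy as np

def adam_update(m, v, eps=1e-8):
    """Standard Adam step direction: m / (sqrt(v) + eps)."""
    return m / (np.sqrt(v) + eps)

def sign_magnitude_split(m, v, eps=1e-8):
    """Split the Adam step into a sign part and a magnitude part.

    The sign carries the stochastic direction; the magnitude
    |m| / (sqrt(v) + eps) is the variance-adapted step size.
    """
    direction = np.sign(m)
    magnitude = np.abs(m) / (np.sqrt(v) + eps)
    return direction, magnitude

# Toy first- and second-moment estimates for a 4-parameter model.
m = np.array([0.5, -0.2, 0.0, 1.5])
v = np.array([0.25, 0.04, 0.01, 4.0])
d, mag = sign_magnitude_split(m, v)

# The product of the two factors recovers the original Adam step.
assert np.allclose(d * mag, adam_update(m, v))
```

Running sign descent alone means keeping `direction` and replacing `magnitude` with a shared scalar learning rate, which is the variant the paper benchmarks against full Adam.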
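Adaptive multilevel splitting (item 2) estimates rare-event probabilities by repeatedly killing the lowest-scoring particles, cloning survivors above the current level, and multiplying the survival fraction into the estimate. A toy sketch for estimating P(X > 3) with X ~ N(0, 1), where the true answer is about 1.35e-3 (a simplified illustration under my own parameter choices, not the algorithm from any specific paper):

```python
import math
import random

def ams_rare_event_prob(n_particles=1000, n_kill=100, threshold=3.0, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) via adaptive
    multilevel splitting: kill the n_kill lowest particles each round,
    clone survivors with a Metropolis move, accumulate survival fractions."""
    rng = random.Random(seed)
    particles = [rng.gauss(0, 1) for _ in range(n_particles)]
    prob = 1.0
    while True:
        particles.sort()
        level = particles[n_kill - 1]  # the n_kill-th smallest score
        if level >= threshold:
            # final round: count particles already past the target
            n_hit = sum(1 for x in particles if x > threshold)
            return prob * n_hit / n_particles
        # a fraction (n - n_kill) / n of particles survives this round
        prob *= (n_particles - n_kill) / n_particles
        survivors = particles[n_kill:]
        # "split": replace killed particles with mutated clones of survivors
        reborn = []
        for _ in range(n_kill):
            x = rng.choice(survivors)
            y = x + rng.gauss(0, 0.5)
            # Metropolis step targeting N(0, 1) restricted to (level, inf)
            accept = y > level and rng.random() < math.exp((x * x - y * y) / 2)
            reborn.append(y if accept else x)
        particles = survivors + reborn

estimate = ams_rare_event_prob()
```

Each round the empirical level rises, so the loop terminates once the population has been pushed past the threshold; the product of per-round survival fractions is what makes probabilities far smaller than 1/n_particles reachable.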
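The anisotropy claim in item 3 is commonly quantified as the average cosine similarity between pairs of embedding vectors: isotropic vectors average near zero, while vectors squeezed into a narrow directional cone average near one. A small sketch of that diagnostic on synthetic data (my own illustration, not code from the paper):

```python
import numpy as np

def mean_pairwise_cosine(X):
    """Average cosine similarity over all distinct pairs of rows of X.
    Near 0 indicates isotropy; near 1, a narrow directional cone."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    n = len(Xn)
    sims = Xn @ Xn.T
    # exclude the self-similarity terms on the diagonal
    return (sims.sum() - n) / (n * (n - 1))

rng = np.random.default_rng(0)
# 500 isotropic Gaussian "embeddings" in 64 dimensions
isotropic = rng.standard_normal((500, 64))
# shifting every vector toward one shared direction creates a cone
cone = isotropic + 5.0 * np.ones(64)

assert abs(mean_pairwise_cosine(isotropic)) < 0.05
assert mean_pairwise_cosine(cone) > 0.9
```

Applied to real word embeddings, the same statistic is how one would check the paper's claim that training with Adam's second moment pushes representations toward the anisotropic regime.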