It is well known that certain neural network architectures produce loss functions that train easier and generalize better, but the reasons for this are not well understood. To understand this better, we explore the structure of neural loss functions using a range of visualization methods.
Adversarial networks are notoriously hard to train, and simple training methods often collapse. We present a simple modification to the standard training method that increases stability. The method is provably stable for a class of saddle-point problems, and improves performance of numerous GANs.
Neural net parameters can often be compressed down to just one single bit without a significant loss in network performance, yielding a huge reduction in memory footprint and computational workload. We develop a theory of quantized nets, and explain the performance of algorithms for weight quantization.
A number of non-convex optimization problems can be convexified by “lifting” strategies. These methods yield convex formulations at the cost of substantially increased dimensionality. PhaseMax is a new type of convex relaxation that does not require lifting; it solves problems in their original low-dimensional parameter space.
FASTA (Fast Adaptive Shrinkage/ Thresholding Algorithm) is an efficient, easy-to-use implementation of the Forward-Backward Splitting (FBS) method (also known as the proximal gradient method) for regularized optimization problems. Many variations on FBS are available in FASTA, including the popular accelerated variant FISTA (Beck and Teboulle ’09), the adaptive stepsize rule SpaRSA
PDHG is a powerful splitting method that can solve a wide range of constrained and non-differentiable optimization problems. Unlike the popular ADMM method, the PDHG approach usually does not require expensive minimization sub-steps. We provide adaptive stepsize selection rules that automate the solver, while increasing its speed and robustness.