
Understanding generalization through visualization

The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well.

Continue reading

Attacks on copyright systems

Overview

Copyright detection systems are among the most widely used machine learning systems in industry, and the security of these systems is of foundational importance to some of the largest companies in the world. Examples include YouTube’s Content ID, which has resulted in more than $3 billion in revenue for copyright holders, and Google Jigsaw, which has been developed to detect and remove videos that promote terrorism or jeopardize national security.

Continue reading

Adversarial training for FREE!

“Adversarial training,” in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters.
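
To make the gradient-recycling idea concrete, here is a minimal PyTorch sketch of one training epoch. The names (model, loader, epsilon, replay count m) are illustrative, and this is a sketch of the idea rather than the paper’s exact implementation:

import torch
import torch.nn.functional as F

def free_train_epoch(model, loader, optimizer, epsilon, m=4, device="cuda"):
    delta = None  # the adversarial perturbation persists across minibatches
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)
        for _ in range(m):  # replay each minibatch m times
            delta.requires_grad_(True)
            loss = F.cross_entropy(model(x + delta), y)
            optimizer.zero_grad()
            loss.backward()  # one backward pass yields both gradients
            grad_x = delta.grad.detach()
            optimizer.step()  # weight update, as in natural training
            # recycle the input gradient to take an ascent step on delta
            delta = (delta.detach() + epsilon * grad_x.sign()).clamp(-epsilon, epsilon)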

Our “free” adversarial training algorithm is comparable to state-of-the-art methods on CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods.

Continue reading

Stacked U-Nets: A simple architecture for image segmentation

Many imaging tasks require global information about all pixels in an image. For example, the output of an image classifier may depend on many pixels in separate regions of an image. For image segmentation, in which a neural network must produce a high-resolution map of classifications rather than a single output, each pixel’s label may depend on information from distant pixels.

Conventional bottom-up classification networks globalize information by decreasing resolution; features are pooled and downsampled into a single output that “sees” the whole image. But for semantic segmentation, object detection, and other image-to-image regression tasks, a network must preserve and output high-resolution maps, and so pooling alone is not an option. To globalize information while preserving resolution, many researchers propose the inclusion of sophisticated auxiliary blocks, but these come at the cost of a considerable increase in network size, computational cost, and implementation complexity.
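
A toy PyTorch sketch of one simple alternative, stacking small U-Net modules, is shown below. The widths and depth are illustrative rather than the paper’s configuration; each module pools to gather global context, upsamples back, and keeps a skip connection so resolution is preserved (even spatial dimensions are assumed):

import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.down = nn.Sequential(  # halve resolution to gather context
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.up = nn.Sequential(    # restore the original resolution
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x):
        return x + self.up(self.down(x))  # skip connection preserves detail

class StackedUNets(nn.Module):
    def __init__(self, in_channels, num_classes, width=64, depth=4):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, width, 3, padding=1)
        self.stack = nn.Sequential(*[MiniUNet(width) for _ in range(depth)])
        self.head = nn.Conv2d(width, num_classes, 1)  # per-pixel class scores

    def forward(self, x):
        return self.head(self.stack(self.stem(x)))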

Continue reading

Visualizing the Loss Landscape of Neural Nets

Neural network training relies on our ability to find “good” minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that are easier to train, and that well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effects on the underlying loss landscape, are not well understood.
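
As a rough illustration of the kind of visualization involved, the sketch below evaluates a model’s loss on a 2-D slice of parameter space spanned by two random directions. (The filter-wise normalization of directions used in the paper is omitted for brevity.)

import torch

def loss_surface(model, loss_fn, batch, span=1.0, steps=21):
    x, y = batch
    theta = [p.detach().clone() for p in model.parameters()]  # trained weights
    d1 = [torch.randn_like(p) for p in theta]  # random direction 1
    d2 = [torch.randn_like(p) for p in theta]  # random direction 2
    alphas = torch.linspace(-span, span, steps)
    surface = torch.zeros(steps, steps)
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(alphas):
                for p, t, u, v in zip(model.parameters(), theta, d1, d2):
                    p.copy_(t + a * u + b * v)  # move within the 2-D slice
                surface[i, j] = loss_fn(model(x), y)
        for p, t in zip(model.parameters(), theta):
            p.copy_(t)  # restore the trained weights
    return surface  # plot with, e.g., matplotlib's contour()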

Continue reading

Stabilizing GANs with Prediction

Why are GANs hard to train?

Adversarial neural networks solve many important problems in data science, but are notoriously difficult to train. These difficulties come from the fact that optimal weights for adversarial nets correspond to saddle points, and not minimizers, of the loss function. The alternating stochastic gradient methods typically used for such problems do not reliably converge to saddle points, and when convergence does happen it is often highly sensitive to learning rates.
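
One way to realize the prediction step in PyTorch is sketched below: after each discriminator update, the generator trains against extrapolated (“predicted”) discriminator weights w_new + (w_new - w_old), which anticipate where the discriminator is heading. The loss functions and optimizers here are placeholders:

import torch

def gan_step_with_prediction(G, D, g_opt, d_opt, d_loss_fn, g_loss_fn, batch):
    # 1) ordinary discriminator update
    w_old = [p.detach().clone() for p in D.parameters()]
    d_opt.zero_grad()
    d_loss_fn(G, D, batch).backward()
    d_opt.step()
    w_new = [p.detach().clone() for p in D.parameters()]

    # 2) generator update against the *predicted* discriminator
    with torch.no_grad():
        for p, wn, wo in zip(D.parameters(), w_new, w_old):
            p.copy_(2 * wn - wo)  # lookahead: w_new + (w_new - w_old)
    g_opt.zero_grad()
    g_loss_fn(G, D, batch).backward()
    g_opt.step()
    with torch.no_grad():
        for p, wn in zip(D.parameters(), w_new):
            p.copy_(wn)  # restore the true discriminator weights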

Continue reading

PhasePack

A phase retrieval library
Phase retrieval is the recovery of a signal from only the magnitudes, and not the phases, of complex-valued linear measurements. Phase retrieval problems arise in many different applications, particularly in crystallography and microscopy. Mathematically, phase retrieval recovers a complex-valued signal \(x\in \mathbb{C}^n\) from \(m\) measurements of the form \(b_i = |\langle a_i, x\rangle|\), \(i = 1,\dots,m\), where the \(a_i\in\mathbb{C}^n\) are known measurement vectors.
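
As a small illustration of the problem (not one of PhasePack’s solvers), the NumPy sketch below attacks the least-squares formulation by Wirtinger gradient descent. Note that the signal is only recoverable up to a global phase:

import numpy as np

def phase_retrieval_gd(A, b, steps=500):
    """Recover x from b = |Ax| by gradient descent on
    f(x) = 0.5 * || |Ax| - b ||^2."""
    m, n = A.shape
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # random start
    lr = 1.0 / np.linalg.norm(A, 2) ** 2  # conservative step size
    for _ in range(steps):
        z = A @ x
        phase = z / np.maximum(np.abs(z), 1e-12)  # sign(z), safely
        x = x - lr * (A.conj().T @ (z - b * phase))  # Wirtinger gradient step
    return x  # determined only up to a global phase factor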

Continue reading

Distributed Machine Learning

Classical machine learning methods, including stochastic gradient descent (with gradients computed by backpropagation), work well on a single machine but don’t scale well to the cloud or cluster setting. We propose a variety of algorithmic frameworks for scaling machine learning across many workers. Many of our distributed ML experiments are done using USNA’s Grace Supercomputer, which is currently hosted at the University of Maryland.
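
As a generic illustration of one such framework (synchronous data-parallel SGD with gradient averaging, not the group’s specific method), here is a sketch using mpi4py; local_gradient is a placeholder for each worker’s gradient on its own data shard:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def distributed_sgd_step(w, local_gradient, lr):
    """Average gradients across workers so every node applies the
    same update and the model replicas stay in sync."""
    g = local_gradient(w)                 # gradient on this worker's shard
    g_avg = np.empty_like(g)
    comm.Allreduce(g, g_avg, op=MPI.SUM)  # sum gradients across workers
    g_avg /= comm.Get_size()              # turn the sum into an average
    return w - lr * g_avg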

Continue reading

The STOne transform: flexible compressed sensing

Overview

Compressive sensing enables the reconstruction of high-resolution signals from under-sampled data. While compressive methods simplify data acquisition, they require the solution of difficult recovery problems to make use of the resulting measurements. The STOne transform is a new sensing framework that combines the advantages of both conventional and compressive sensing. Using the proposed STOne transform, measurements can be reconstructed instantly at Nyquist rates at any power-of-two resolution. The same data can then be “enhanced” to higher resolutions using compressive methods that leverage sparsity to “beat” the Nyquist limit. The availability of a fast direct reconstruction enables compressive measurements to be processed on small embedded devices. We demonstrate this by constructing a real-time compressive video camera.
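
The “enhance” step relies on standard sparse recovery. As a generic illustration (not the STOne-specific solver), the sketch below implements ISTA for the l1-regularized least-squares problem that such compressive reconstructions typically solve:

import numpy as np

def ista(A, b, lam, steps=200):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 by iterative
    shrinkage-thresholding."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        z = x - (A.T @ (A @ x - b)) / L  # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x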

Continue reading