We designed synthetic model datasets to produce two large sets of deep neural networks based on common modeling practice:

1. Model enumeration with both architecture and hyperparameter alternations.
2. Model fine-tuning with different retraining settings.

We evaluate the effectiveness of the proposed system by applying it to manage these synthetic models.

Description

The most common behavior in deep learning modeling is to mutate the architecture and hyperparameters of neural networks in the hope of achieving better prediction accuracy. We model this behavior with a model generator that starts from a convolutional network and iteratively applies minor mutations automatically. Each mutation randomly selects an existing model in the pool and applies either a network structure mutation or a hyperparameter mutation. A network structure mutation randomly inserts, deletes, or changes a layer, while a hyperparameter mutation switches the base learning rate to a value chosen at random from $[10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}]$.

Generator Scheme

At each step, we sample a model from the pool according to a probability distribution computed from the validation accuracies, and generate a new model training configuration by sampling one mutation operation. The new model is then trained and added to the repository. The model generation state machine is shown as follows:

Generated Models

The models, together with their connections and validation accuracies, are shown below. Each node is a model labeled with its generation order and accuracy. Each edge shows the enumeration operation and the parent-child relationship between two models. Models circled in red have the top three validation accuracies.

Download

All models can be downloaded directly: d1enumerations.zip (503.7 MB)
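
The enumeration generator described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the generator used to build the dataset: the softmax weighting (and its `temperature`), the 50/50 split between structure and hyperparameter mutations, and all function and field names are hypothetical. Only the list of learning rates and the three structure operations come from the text.

```python
import math
import random

# Base learning rates listed in the description.
LEARNING_RATES = [1e-1, 1e-2, 1e-3, 1e-4]
# Structure mutations named in the description (identifiers are hypothetical).
STRUCTURE_OPS = ["insert_layer", "delete_layer", "change_layer"]


def selection_weights(accuracies, temperature=0.1):
    """Turn validation accuracies into selection probabilities.

    Assumption: a softmax is used; the text only says the distribution
    is computed from validation accuracy.
    """
    best = max(accuracies)
    exps = [math.exp((a - best) / temperature) for a in accuracies]
    total = sum(exps)
    return [e / total for e in exps]


def propose_mutation(pool, rng):
    """Pick a parent model and sample one mutation operation.

    `pool` is a list of dicts with keys 'id', 'accuracy', and 'lr'.
    Returns a training configuration for the child model.
    """
    weights = selection_weights([m["accuracy"] for m in pool])
    parent = rng.choices(pool, weights=weights, k=1)[0]
    if rng.random() < 0.5:  # assumed split between the two mutation types
        op = rng.choice(STRUCTURE_OPS)
        return {"parent": parent["id"], "op": op, "lr": parent["lr"]}
    new_lr = rng.choice(LEARNING_RATES)
    return {"parent": parent["id"], "op": "set_learning_rate", "lr": new_lr}
```

In a full generator, each proposed configuration would then be trained, evaluated on the validation set, and appended to `pool` before the next proposal.
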
Description

Another popular technique in deep learning is model fine-tuning, which is used when the user has an existing model properly trained on a large dataset (such as ImageNet) and wants to apply that model to a new dataset, which may have a different set of prediction labels. The motivation for fine-tuning is that a well-trained model should generalize well to varied data, so it becomes unnecessary to train every model end-to-end, which can consume a large amount of time.

Only a very small number of models (usually well-known architectures) are trained end-to-end, compared to the number of models that are fine-tuned. We therefore designed another model generator to simulate the use cases of fine-tuning. We start from the VGG-16 network, which was originally trained on the ImageNet dataset, and fine-tune the model on the CASIA face recognition dataset. The CASIA dataset has 10575 face categories, of which we sample 1000 in our experiments. The way to fine-tune a model is to replace the last fully connected layer with a new one and set small or even zero learning rates for the existing layers. The newly mutated model trained on the new dataset converges much faster than a model trained end-to-end. It is often unnecessary to fine-tune the early layers, but it is necessary to set small non-zero learning rates for some of the later existing layers, so practitioners enumerate different pivot layers for fine-tuning. Our fine-tuning model generator replicates this behavior.

Generator Scheme

Each time, we first choose a pivot parametric layer in VGG16: the parameters of the layers before it are fixed, while the parametric layers after it are retrained. The layer-wise learning rate and weight decay scalars are set to (1,2) and (1,0) for (weight, bias) respectively. For the last fully connected layer (fc8 in VGG16, fc8_casia1k in new models), the scalars are set to (10,10) for (weight, bias). Once the pivot layer is chosen, we enumerate the global optimization hyperparameters: learning rate in $[10^{-3}, 10^{-4}, 5 \times 10^{-5}]$ and weight decay in $[2 \times 10^{-4}, 5 \times 10^{-4}]$. In total, there are 54 different model configurations. Each is retrained for 10000 iterations and checkpointed every 1000 iterations.
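
As a rough illustration, the configuration grid above can be enumerated as follows. The pivot layer names are hypothetical placeholders, since the text does not list them; nine pivots are assumed only because 54 configurations divided by the 3 x 2 hyperparameter combinations gives 9. The function and field names are likewise assumptions.

```python
from itertools import product

# Assumption: nine pivot layers, inferred from 54 / (3 * 2) = 9.
# The names are placeholders, not actual VGG16 layer names.
PIVOT_LAYERS = [f"pivot_{i}" for i in range(9)]

# Global optimization hyperparameters listed in the text.
LEARNING_RATES = [1e-3, 1e-4, 5e-5]
WEIGHT_DECAYS = [2e-4, 5e-4]

MAX_ITERATIONS = 10000      # retraining length per configuration
CHECKPOINT_INTERVAL = 1000  # one checkpoint every 1000 iterations


def enumerate_configs():
    """Return one dict per (pivot layer, learning rate, weight decay) combination."""
    return [
        {"pivot": pivot, "base_lr": lr, "weight_decay": wd}
        for pivot, lr, wd in product(PIVOT_LAYERS, LEARNING_RATES, WEIGHT_DECAYS)
    ]


def checkpoint_iterations():
    """Iterations at which a checkpoint is written during retraining."""
    return list(range(CHECKPOINT_INTERVAL, MAX_ITERATIONS + 1, CHECKPOINT_INTERVAL))


configs = enumerate_configs()
checkpoints = checkpoint_iterations()
```

Under these assumptions, `configs` holds 54 entries and each configuration produces 10 checkpoints.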

Generated Models

The models, with their fine-tuning configurations and validation accuracies, are shown below. Each oval node is a model labeled with its generation order and accuracy. The path from the root to a model shows the detailed retraining configuration. Models circled in red have the top three validation accuracies.