Student Team: NO
Tools Used
ComVis
Audacity
FFMPEG (for MP3 to WAV and downsampling)
Code from https://github.com/johnmartinsson/bird-species-classification adapted to create the spectrograms
Approximately how many hours were spent working on this submission in total? 150
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES
Video
https://drive.google.com/open?id=1T75buZ0TEICJ3GZ36Hp5i2OYaP3CG6DK
Questions
The number of recordings varies significantly over time. There are few recordings in the early years, and the number rises from 2007 onward. We computed the centroid (average position) of each bird species for each year, as well as how far these centroids move over the years. The results differ widely between species. Figure 1.1 depicts the yearly centroids of the four species that move the least and the four species that move the most. As expected, the centroids of the former are tightly clustered, while those of the latter are spread out. The corresponding cumulative path view shows how the distance travelled evolves over time. Note the steep rise for the Rose-Crested Blue Pipit in 2018: the birds moved far away.
Figure 1.1: Yearly centroids for the four species which move the least (left) and four species which move the most (middle). The corresponding cumulative path view shows how distance evolves over time (right).
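The centroid and movement computation described above could be sketched in Python as follows. This is a hypothetical illustration, not the ComVis setup we used: it assumes each recording is a `(species, year, x, y)` tuple and takes the "cumulative path" to be the running sum of Euclidean distances between consecutive yearly centroids.

```python
import math
from collections import defaultdict


def yearly_centroids(recordings):
    """Average (x, y) position per species and year.

    `recordings` is assumed to be an iterable of (species, year, x, y) tuples.
    Returns {species: {year: (mean_x, mean_y)}}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for species, year, x, y in recordings:
        acc = sums[(species, year)]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    centroids = defaultdict(dict)
    for (species, year), (sx, sy, n) in sums.items():
        centroids[species][year] = (sx / n, sy / n)
    return centroids


def cumulative_path(year_to_centroid):
    """Running total of the distance between consecutive yearly centroids.

    Returns a list of (year, cumulative_distance), starting at 0.0.
    """
    years = sorted(year_to_centroid)
    if not years:
        return []
    path = [(years[0], 0.0)]
    total = 0.0
    for prev, cur in zip(years, years[1:]):
        total += math.dist(year_to_centroid[prev], year_to_centroid[cur])
        path.append((cur, total))
    return path


def rank_by_movement(centroids):
    """Species ordered from least to most total centroid movement."""
    totals = {s: cumulative_path(c)[-1][1] for s, c in centroids.items() if c}
    return sorted(totals, key=totals.get)
```

With `ranked = rank_by_movement(yearly_centroids(recordings))`, the two groups in Figure 1.1 would correspond to `ranked[:4]` and `ranked[-4:]`.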
A detailed examination with linking and brushing reveals various patterns. The Green-tipped Scarlet Pipit, for example, has been recorded all over the preserve (Figure 1.2). Ordinary Snape and Qax, in contrast, do not move much: all their recordings are limited to a relatively small area. Some species, such as the Scrawny Jay, live essentially at two locations. We could not find a temporal pattern corresponding to these two locations, so we reason that there are two separate communities of Jays in the preserve.
Figure 1.2: Locations for Green-tipped Scarlet Pipit, Ordinary Snape, Qax, and Scrawny Jay.
We have also noticed that some birds (Figure 1.3) do not appear between October and January (the cold months). Most birds have fewer recordings in these months. Two exceptions are Qax and Carries Champagne Pipit: these two species are recorded more often in the cold months. As we have no information about the location of the preserve, we can only reason that these seasonal variations are due to migratory birds that leave the preserve for part of the year.
Figure 1.3: Some species not present between October and January - migratory birds.
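The seasonal check can be expressed as simple counts per species and month. A minimal sketch, assuming recordings are given as hypothetical `(species, month)` pairs with months numbered 1 to 12: `absent_in` lists species never recorded in the cold months, and `cold_share` gives the fraction of a species' recordings made in those months (for a species recorded evenly all year, this would be about 4/12).

```python
from collections import Counter

# October to January, the months highlighted in Figure 1.3
COLD_MONTHS = (10, 11, 12, 1)


def monthly_counts(recordings):
    """Count recordings per (species, month); input is assumed to be (species, month) pairs."""
    return Counter(recordings)


def absent_in(counts, months=COLD_MONTHS):
    """Species with no recordings at all in the given months."""
    species = {s for s, _ in counts}
    return sorted(s for s in species if all(counts[(s, m)] == 0 for m in months))


def cold_share(counts, species, months=COLD_MONTHS):
    """Fraction of a species' recordings that fall in the given months."""
    total = sum(n for (s, _), n in counts.items() if s == species)
    in_months = sum(counts[(species, m)] for m in months)
    return in_months / total if total else 0.0
```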
Let us examine the Rose-Crested Blue Pipit in more detail. We brush the years 2007-2018 and examine the Blue Pipit distribution relative to the overall number of recordings. Blue Pipit recordings peaked in 2015, while 2016 had the most recordings overall. As we do not have data for the whole of 2018 (and there are always fewer recordings in winter), we cannot tell whether the number of recordings drops in 2018 or whether 2017 is only an exception. We do see, however, that the Pipit count started to drop earlier.
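The relative view used when brushing can be approximated by dividing each year's count for one species by that year's total. A sketch under the assumption that recordings are hypothetical `(species, year)` pairs; `peak_year` returns the year with the most recordings, for one species or overall.

```python
from collections import Counter


def yearly_share(recordings, species):
    """Fraction of each year's recordings that belong to `species`.

    `recordings` is assumed to be an iterable of (species, year) pairs.
    """
    total, target = Counter(), Counter()
    for s, year in recordings:
        total[year] += 1
        if s == species:
            target[year] += 1
    return {year: target[year] / total[year] for year in sorted(total)}


def peak_year(recordings, species=None):
    """Year with the most recordings, for one species or (species=None) overall."""
    counts = Counter(year for s, year in recordings if species is None or s == species)
    return max(counts, key=counts.get)
```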
As Pipits are sensitive birds, it is worth investigating whether something caused the population decrease. We also brush year by year (from 2007 to 2018) and show the recording locations, their bounding box, and a crosshair marking the average position and one standard deviation in the x and y directions. Figure 1.4 shows the histogram of recording counts and the overall location scatterplot, while Figure 1.5 shows how the Blue Pipits moved. We see how the number of recordings increased in 2015 and then decreased. The change from 2017 to 2018 is extreme, as the Pipits have never before been recorded so far to the southwest.
Figure 1.4: Blue Pipit recording counts histogram (left) and locations scatterplot (right).
Figure 1.5: Changes in Blue Pipit locations over the years.
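The bounding box and crosshair for each brushed year can be computed as below. This sketch assumes recordings are hypothetical `(year, x, y)` tuples and uses the population standard deviation (`statistics.pstdev`); whether the original view used the population or the sample standard deviation is not stated.

```python
import statistics
from collections import defaultdict


def crosshair(points):
    """Bounding box plus mean +/- one standard deviation of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Population standard deviation (assumption; statistics.stdev gives the sample one)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return {
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        "mean": (mx, my),
        "x_range": (mx - sx, mx + sx),
        "y_range": (my - sy, my + sy),
    }


def yearly_crosshairs(recordings):
    """One crosshair per year; `recordings` is assumed to be (year, x, y) tuples."""
    by_year = defaultdict(list)
    for year, x, y in recordings:
        by_year[year].append((x, y))
    return {year: crosshair(pts) for year, pts in sorted(by_year.items())}
```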
The set of bird calls provided by Kasios consists of 15 anonymous audio files in mp3 format, which the company claims are recordings of Pipits made only recently in the Preserve. The only available information about the sound files is their coordinates (Figure 2.1).
Figure 2.1: Map of the test bird recordings provided by Kasios.
Our findings show that the audio files are NOT all of Pipits and, moreover, that Pipits are a minority among them. To show this, we used the large bird song collection provided to build a classifier for all bird species it contains, and then used it to predict the species in the Kasios collection.
Our classifier follows a Deep Learning approach and uses a Convolutional Neural Network (CNN), which takes images as input. Thus, to prepare our input data for the CNN, we first converted the mp3 files from the library into spectrograms of equal length, 3 seconds each. We further separated these spectrograms into signal and noise files in tiff format with 32 bits per pixel. For this step we adapted publicly available code from GitHub (see Tools Used). Finally, we converted the tiff files into png files with 16 bits per pixel.
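The segmentation and signal/noise split can be illustrated with NumPy. This is not the adapted GitHub code; it is a sketch assuming the audio is already a mono NumPy array (e.g., loaded from the WAV files produced with FFMPEG). The STFT parameters (`n_fft`, `hop`) and the median-clipping rule with its thresholds are assumptions chosen for illustration, and saving to tiff/png is omitted.

```python
import numpy as np


def split_into_segments(samples, sample_rate, seconds=3.0):
    """Cut a mono signal into non-overlapping segments of `seconds` length.

    A trailing remainder shorter than one segment is dropped (assumption).
    """
    seg_len = int(round(seconds * sample_rate))
    n_segments = len(samples) // seg_len
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def magnitude_spectrogram(segment, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window: rows are frequency bins, columns are frames."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(segment) - n_fft) // hop
    frames = np.stack([segment[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def is_signal(spec, factor=3.0, min_fraction=0.01):
    """Median clipping (assumed rule): a pixel counts as signal if it exceeds
    `factor` times both its row median and its column median. The segment is
    labelled signal if the fraction of such pixels exceeds `min_fraction`
    (a hypothetical threshold)."""
    row_med = np.median(spec, axis=1, keepdims=True)
    col_med = np.median(spec, axis=0, keepdims=True)
    mask = (spec > factor * row_med) & (spec > factor * col_med)
    return bool(mask.mean() > min_fraction)
```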
The png files are the training input for our CNN classifier. The CNN is built with the Python library Keras and has the following architecture: a convolutional layer with 32 3x3 kernels followed by a max-pooling layer with a 2x2 pool size, then a second convolutional layer with 64 3x3 kernels followed by a second 2x2 max-pooling layer. After a flattening layer, the data is fed into a fully connected network with one hidden layer of 128 units and an output layer of 39 units: one call class and one song class for each of the 19 bird species, plus the noise files as an additional class. To avoid overfitting we use Dropout, and the optimization is done with the “adam” method. Note that to reduce computation time we used only one fully connected layer and reduced the images to less than half their original size, to 64x64 pixels.
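The architecture described above could be written in Keras roughly as follows. Layer sizes follow the text; the single grayscale input channel, ReLU activations, the position and rate of the Dropout layer (0.5 after the 128-unit layer), and the categorical cross-entropy loss are assumptions, since they are not stated.

```python
from tensorflow import keras

# 19 species x (call, song) + 1 noise class
NUM_CLASSES = 39


def build_model(input_shape=(64, 64, 1), num_classes=NUM_CLASSES, dropout_rate=0.5):
    """CNN matching the described layer sizes; activations and dropout placement are assumed."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),                      # 64x64 grayscale spectrogram (assumed 1 channel)
        keras.layers.Conv2D(32, (3, 3), activation="relu"),  # -> 62x62x32
        keras.layers.MaxPooling2D(pool_size=(2, 2)),         # -> 31x31x32
        keras.layers.Conv2D(64, (3, 3), activation="relu"),  # -> 29x29x64
        keras.layers.MaxPooling2D(pool_size=(2, 2)),         # -> 14x14x64
        keras.layers.Flatten(),                              # -> 12544 features
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

With these assumptions the model has 1,629,607 trainable parameters, almost all of them in the 12544x128 fully connected layer. Because the flattened size grows with the input area, shrinking the images to 64x64 directly shrinks this layer, which is where most of the saved computation comes from. The 16-bit png pixel values would typically be scaled to [0, 1] before training.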
We also augmented our data using an image generator. The classifier was trained on ~26 000 files and tested on only about 10% as many, ~2800 files. Our training accuracy reached 91% and the test accuracy 68% (overfitting is still noticeable, so there is room for improvement).
After applying the same data preparation steps to the test sound files provided by Kasios, we used the classifier to predict the species of each mp3 file. More precisely, we predicted the species of each three-second segment of the test files. As the final prediction for a file, we took the species predicted for the majority of its three-second segments. When too many segment predictions differed from each other, we left the file's prediction as unknown. We highlighted Rose Crested Blue Pipit predictions even when they made up only a minority of the segment predictions.
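The file-level decision rule can be sketched as follows. The text does not say how much disagreement counts as "too many" differing predictions; here we assume a strict majority (`min_share=0.5`), and the species label string is a hypothetical placeholder. The second return value flags whether any segment was predicted as a Rose Crested Blue Pipit, mirroring how such predictions were highlighted even when they were a minority.

```python
from collections import Counter

# Hypothetical label string for the highlighted species
TARGET = "Rose-crested Blue Pipit"


def file_prediction(segment_labels, min_share=0.5, target=TARGET):
    """Aggregate per-segment labels into one label per file.

    Returns (label, target_present): label is the most common segment label
    if it covers more than `min_share` of the segments (assumed rule),
    otherwise "unknown"; target_present is True if any segment was `target`.
    """
    if not segment_labels:
        return "unknown", False
    counts = Counter(segment_labels)
    label, n = counts.most_common(1)[0]
    final = label if n / len(segment_labels) > min_share else "unknown"
    return final, target in counts
```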
The results of classifying the Kasios test birds (training accuracy: 0.91, validation accuracy: 0.68) are summarized in Figure 2.2.
Figure 2.2: Map of the test birds with name labels predicted by our classifier.
If we examine where the classified species were recorded over all years (Figure 2.3, left) and in 2016-2018 only (Figure 2.3, right), we see that the locations of the Kasios recordings are not plausible. The recordings classified as Bent-Beak Riffraff appear in the Blue Pipit area. The recordings that were actually classified as Blue Pipits were, according to Kasios, made in the far northwest corner, an area where Blue Pipits have never appeared. The classification and our audio examination of the Kasios files strongly suggest that the recordings are not of the Rose Crested Blue Pipit, and the stated locations also seem to be wrong. All this demonstrates a need for further investigation.
Figure 2.3: Locations where the birds were recorded, all years (left) and 2016-2018 only (right).
The map of the Rose Crested Blue Pipits over the years of the large collection is shown in Figure 2.4.
Figure 2.4: Rose Crested Blue Pipits over the years based on the bird song collection.
Figure 2.4 shows that the Rose Crested Blue Pipits have slowly migrated away from their original area in the Preserve, especially from 2017 to 2018. In addition, the visualisation of our results has led us to the conclusion that the test files provided by Kasios DO NOT correspond to Pipits, as claimed. Consequently, the Rose Crested Blue Pipits are not thriving: their numbers are lower and they have moved away from their original locations.
Further investigation should try to determine why the audio files are NOT all of Pipits and, moreover, why Pipits are a minority among them. Identifying the currently unknown birds in those recordings would show whether there are false recordings, i.e., recordings of bird species that are not, or could not be, present in the Preserve.
Additional statistical analysis could determine whether the drop in the number of Rose Crested Blue Pipits is statistically significant, especially when compared to the other bird species and the overall bird population. It seems that the Blue Pipit population started to decrease earlier than other species (Figure 1.4), which certainly deserves further investigation. The location where the Pipits used to live corresponds to the spot that has been (wrongly) identified as an illegal toxic-waste disposal site. Perhaps toxic waste has indeed been disposed of there, and now, in 2018, there are no more Pipits in the area. Water readings in the area could help support this hypothesis.
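As a starting point for the statistical analysis suggested above, a two-proportion z-test could compare the share of Blue Pipit recordings among all recordings in two periods (for example, before and after a chosen year). This is a sketch, not an analysis we performed; it assumes recordings can be treated as independent samples, which is questionable for field recordings, and the period split is a hypothetical choice.

```python
import math


def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for H0: p1 == p2 using the pooled proportion.

    k1, k2: recordings of the species in periods 1 and 2.
    n1, n2: all recordings in periods 1 and 2.
    Assumes independent samples and reasonably large counts.
    Returns (z, p_value).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A positive z with a small p-value would indicate that the species made up a significantly larger share of recordings in the first period; comparing the same statistic across species would address whether the Blue Pipit decline stands out.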