Entry Name: "TUB-Srinivasan-MC1"

VAST Challenge 2018
Mini-Challenge 1

 

 

Team Members:

Bharathi Srinivasan, Technical University of Berlin, bharathi.srini@gmail.com PRIMARY

Vincent Deuschle, Technical University of Berlin, vincent.deuschle@campus.tu-berlin.de

Ksenia Legostay, Technical University of Berlin, ksenia.legostay@campus.tu-berlin.de

Simon Fallnich, Technical University of Berlin, simon.fallnich@gmx.de

Andreas Meyer, Technical University of Berlin, andreas.meyer@campus.tu-berlin.de



Student Team: YES

 

Tools Used:

Tableau

Python

R

 

Approximately how many hours were spent working on this submission in total?

200

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES

 

 

 

 

Questions

1. Using the bird call collection and the included map of the Wildlife Preserve, characterize the patterns of all of the bird species in the Preserve over the time of the collection. Please assume we have a reasonable distribution of sensors and human collectors providing the recordings, so that the patterns are reasonably representative of the bird locations across the area. Do you detect any trends or anomalies in the patterns? Please limit your answer to 10 images and 1000 words.


We performed exploratory data analysis on the dataset, which revealed several insights about the bird species in the Preserve.

Figure 1 shows the number of bird recordings per year. A sharp decline is noticeable starting in 2007 and, after a period of growth, another decline begins in 2016. Given the assumption that the recordings are representative, this suggests a reduction in the overall number of birds. The bar chart also shows a sharp decline in the recordings of the Rose-crested Blue Pipit from 2015 onward.

 

Images/image1.png

Figure 1 : Number of recordings in the Preserve

 

From the number of recordings per month seen in Figure 2, we also notice that all species sing more during the spring months, from April to June.

Images/image2.png

Figure 2 : Number of recordings per month

 

The habitats of the birds were studied by visualizing the recordings (Figure 3) on the map of the Preserve.

Images/image3.png

Figure 3 : Habitats of bird species in the Preserve

 

Using the coordinates of the Preserve, we visualized where the songs and calls of the birds were recorded before and after 2016, to probe the population decrease of the Rose-crested Blue Pipit (Figure 4).

 

Images/image4.png

Figure 4 : Habitats of birds before and after 2016

We find that the Rose-crested Blue Pipit's habitat (denoted by red dots) has also changed: the settlement in the north-east has decreased significantly since 2016. The same pattern is observed for all bird species. A possible explanation is that smokestack emissions from the Kasios Furniture manufacturing company (marked with a blue diamond on the map) blow to the north and affect the birds' nesting habits.


In Figure 5, we look at the migration pattern of the Rose-crested Blue Pipit as a function of time. The intensity of the red color denotes the year in which a bird's call was recorded (scale: lighter = earlier years, more intense = later years).

Images/image5.png

Figure 5 : Migration of Rose-crested Blue Pipits

The change in habitats can be visualized using a 2D kernel density plot, as seen in Figure 6. The bottom map shows the habitats before 2016 and the top map shows the habitats from 2016 onward. The comparison shows how strongly the population has changed in the northeast area of the park (the spot closest to the viewer).

Images/image6.png

Figure 6 : Kernel density plot of bird habitats in the Preserve
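
The source does not state which tool or bandwidth produced Figure 6. As a minimal sketch of the underlying idea, a 2D Gaussian kernel density estimate can be evaluated on a grid as below. The function name kde_2d, the grid, and the bandwidth are assumptions made for illustration only.

```python
import numpy as np

def kde_2d(points, grid_x, grid_y, bandwidth):
    """Evaluate an isotropic Gaussian kernel density estimate on a grid."""
    pts = np.asarray(points, dtype=float)               # (n, 2) recording locations
    gx, gy = np.meshgrid(grid_x, grid_y)                # each (len(grid_y), len(grid_x))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)   # (m, 2) grid cell centres
    # Squared distance from every grid cell to every recording location.
    d2 = np.sum((grid[:, None, :] - pts[None, :, :]) ** 2, axis=2)
    # 2D Gaussian kernel, normalised so the estimate integrates to 1.
    kernel = np.exp(-d2 / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=1).reshape(gx.shape)
```

Computing one density surface for the recordings before 2016 and one for those from 2016 onward, on the same grid, and subtracting them would highlight the areas where recordings became rarer.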

 

2. Turn your attention to the set of birdcalls supplied by Kasios. Does this set support the claim of Pipits being found across the Preserve? A machine learning approach using the birdcall library may help your investigation. What is the role of visualization in your analysis of the Kasios birdcalls? Please limit your answer to 10 images and 1000 words.

1. Pre-processing

The bird songs from the ornithology collection and the ones from Kasios were pre-processed, and audio features were extracted from the audio files. For each clip, 34 features, including energy, entropy of energy, zero-crossing rate, MFCCs, and chroma vector features, were computed over sliding windows. Each feature was then averaged over all time windows of a clip, yielding a 2081 x 34 dataset that was used as the training data. The birdcalls provided by Kasios were processed in the same way and used as the test data.
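
The windowing-and-averaging step described above can be sketched as follows. This is a minimal illustration, not the feature extractor used for the submission: it computes only three of the named features (zero-crossing rate, energy, entropy of energy), and the function names, frame length, hop length, and number of sub-frames are assumptions.

```python
import numpy as np

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]

def clip_features(signal, frame_len=1024, hop_len=512, n_sub=8):
    """Return per-frame features averaged over all frames of one clip."""
    frames = frame_signal(np.asarray(signal, dtype=float), frame_len, hop_len)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.sign(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Short-term energy: mean squared amplitude per frame.
    energy = np.mean(frames ** 2, axis=1)
    # Entropy of energy: Shannon entropy of normalised sub-frame energies
    # (frame_len must be divisible by n_sub).
    sub = frames.reshape(len(frames), n_sub, -1)
    sub_energy = np.sum(sub ** 2, axis=2)
    p = sub_energy / (np.sum(sub_energy, axis=1, keepdims=True) + 1e-12)
    entropy = -np.sum(p * np.log2(p + 1e-12), axis=1)
    per_frame = np.column_stack([zcr, energy, entropy])  # (n_frames, 3)
    # Averaging over frames gives one fixed-length vector per clip.
    return per_frame.mean(axis=0)
```

Stacking the vectors returned for every clip gives one row per recording, which is the shape of the training matrix described above (2081 rows, with 34 columns in the full feature set).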

The features of Pipits from Kasios were compared with those from the Ornithology collection by visualizing them using a swarm plot as seen in Figure 7. It did not show any anomalies between the two sets of Pipits.

Images/image7.png

Figure 7 : Swarm plot of audio features

 

To learn whether different bird species exhibit distinct features in their calls, we reduced the 34 dimensions using t-SNE and interpreted the resulting plot (Figure 8). However, the species could not be distinguished based on the reduced dimensions.

Images/image8.png

Figure 8 : Scatter plot of tSNE dimensionality reduction

So we used the following machine-learning methods to proceed with our analysis.

In section 2 we describe how we mapped the bird sounds provided by the Preserve to feasible probability distributions for each species. We used this distribution to determine the most likely species the Kasios bird sounds belong to.

In section 3 we elaborate on the machine-learning technique used to train a classifier that takes bird sounds as input and predicts the species the bird belongs to. The classifier was trained on the Preserve dataset in order to evaluate the Kasios dataset.

In both approaches we worked with the assumption that the features of bird sounds are normally distributed. Lacking ornithological domain knowledge, we have no reason to assume that the variance of bird sounds would skew in one direction more than any other.

2. Probability density computation

In order to estimate the probability of a sample bird sound provided by Kasios belonging to a specific species, we used the set of bird sounds provided by the Preserve to approximate the probability distribution of sound samples for each species. Under the assumption of a normal distribution, we computed the mean vector \mu_s and the covariance matrix \Sigma_s for each bird species s within the dataset of the Preserve and mapped each sample x provided by Kasios to the multivariate normal probability density function:

\[ p(x \mid s) = (2\pi)^{-k/2} \, |\Sigma_s|^{-1/2} \exp\left( -\tfrac{1}{2} (x - \mu_s)^{\top} \Sigma_s^{-1} (x - \mu_s) \right) \]

where k = 34 is the number of features.

 

 

The probability density of each sample was computed in this way for every species and then averaged over all 15 samples provided by Kasios. Figure 9 plots the mean probability density over all 15 samples for all 19 observed species.

Images/image9.png

Figure 9 : Rank Distribution of probability densities

As we can see, the rank distribution of the mean probability density for the 19 observed species resembles a power-law distribution (like many other rank distributions). Moreover, the species with the highest mean probability density is not the Rose-crested Blue Pipit (RCBP) but the Darkwing Sparrow, with the RCBP ranking only fourth. We conclude that it is possible, even likely, that the bird sounds provided by Kasios were not uttered by the RCBP. We tested this conclusion with a classification approach to find further support for it.
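
The density-ranking procedure of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the submission: the function names are hypothetical, and the small ridge term added to each covariance matrix is an assumption made to keep it invertible.

```python
import numpy as np

def fit_gaussians(X, y):
    """Estimate a mean vector and covariance matrix for each species label."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mu = Xc.mean(axis=0)
        # Small ridge term (an assumption) keeps the covariance invertible.
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        params[label] = (mu, cov)
    return params

def gaussian_pdf(x, mu, cov):
    """Multivariate normal density evaluated at each row of x."""
    k = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row to the species mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return np.exp(-0.5 * (k * np.log(2 * np.pi) + logdet + maha))

def rank_species(params, X_test):
    """Rank species by mean density over all test samples, highest first."""
    scores = {label: gaussian_pdf(X_test, mu, cov).mean()
              for label, (mu, cov) in params.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With the Preserve features as X, the species labels as y, and the 15 Kasios feature vectors as X_test, rank_species(fit_gaussians(X, y), X_test) would return the kind of ranking plotted in Figure 9. In 34 dimensions the raw densities can become very small, so comparing log-densities may be numerically safer.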

3. Machine Learning Classification

We used a machine-learning approach to build a classifier that predicts a bird's species from recorded sound samples. Following up on the assumption that bird sounds within a species are normally distributed, we chose a multivariate Gaussian Naive Bayes classifier as an appropriate model. We designed our model as a multi-class classifier for all 19 species that occur in the dataset provided by the Preserve, which served as the training set. Using 5-fold cross-validation on the training set, the Gaussian Naive Bayes classifier achieved an accuracy of 81% (GaussianNB implementation from the sklearn package for Python).
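
To make the technique concrete, the following is a from-scratch sketch of Gaussian Naive Bayes with k-fold cross-validation. It is not sklearn's GaussianNB, which the submission used. The class and function names, the variance floor, and the shuffled, non-stratified fold split are assumptions made for illustration.

```python
import numpy as np

class SimpleGaussianNB:
    """Gaussian Naive Bayes: one independent normal per feature and class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Small variance floor (an assumption) avoids division by zero.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        # Log-likelihood of each sample under each class, summed over features.
        diff2 = (X[:, None, :] - self.mean_[None, :, :]) ** 2
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None], axis=2)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]

def cross_val_accuracy(X, y, n_folds=5, seed=0):
    """Mean accuracy over shuffled (non-stratified) k folds."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = SimpleGaussianNB().fit(X[train_idx], y[train_idx])
        scores.append(np.mean(model.predict(X[test_idx]) == y[test_idx]))
    return float(np.mean(scores))
```

After fitting on the Preserve data, the predicted labels for the Kasios samples can be tallied with np.unique(predictions, return_counts=True) to obtain per-species counts.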

Figure 10 plots the number of classification occurrences for the 15 samples provided by Kasios.

Images/image10.png

Figure 10 : Classification results of Kasios audio files

As we can see, our model classifies only one out of 15 samples as belonging to the RCBP. This finding supports our conclusion from the previous section that the samples provided by Kasios do not belong to the RCBP.

4. Conclusion

Our findings in the previous two sections give us strong reasons to believe that the 15 bird samples provided by Kasios do not belong to the RCBP. In section 2 we computed mean probability density values for each species and found that the RCBP is only the fourth most likely species according to this metric. In section 3 we trained a multivariate Gaussian Naive Bayes classifier on the verified dataset provided by the Preserve. Only one sample provided by Kasios was classified by this model as belonging to the RCBP. Based on these findings we assume that the Kasios bird sounds belong to other species and are an attempt to mislead the public about the impact of the company's activities on the surrounding nature and wildlife.

 

3. Formulate a hypothesis concerning the state of the Rose Crested Blue Pipit. What are your primary pieces of evidence to support your assertion? What next steps should be taken in the investigation to either support or refute the Kasios claim that the Pipits are actually thriving across the Boonsong Lekagul Wildlife Preserve? Please limit your answer to 500 words.

 

The recordings indicate that the population of the Rose-crested Blue Pipit has been diminishing since 2015. Moreover, all bird species in the Preserve show signs of disturbance, and their habitats have been shifting away from the northeast area of the Preserve. Based on our analysis, we believe that the Pipits, whose habitat is primarily in the affected area, face a higher risk of becoming endangered.

 

Using machine learning techniques, we were able to determine that the audio clips provided by Kasios are most likely not those of the birds in question. The evidence provided by Kasios does not back their claim that the Pipits are healthy and thriving in the Preserve. Not only is their evidence false, but 15 samples are also too few to give a realistic impression of the true state of the Pipits. From the bird calls provided, we cannot establish a direct link between the actions of Kasios and the state of the Pipits, except to conclude that Kasios has provided false evidence. This raises further suspicion, which should be investigated by gathering more data from the Preserve and on the company's activities.
