Bharathi Srinivasan, Technical University of Berlin, bharathi.srini@gmail.com PRIMARY
Vincent Deuschle, Technical University of Berlin, vincent.deuschle@campus.tu-berlin.de
Ksenia Legostay, Technical University of Berlin, ksenia.legostay@campus.tu-berlin.de
Simon Fallnich, Technical University of Berlin, simon.fallnich@gmx.de
Andreas Meyer, Technical University of Berlin, andreas.meyer@campus.tu-berlin.de
Student Team: YES
Tableau
Python
R
Approximately how many hours were spent working on this submission in total? 200
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES
Questions
1 – Using the bird call collection and the included map of the Wildlife Preserve, characterize the patterns of all of the bird species in the Preserve over the time of the collection. Please assume we have a reasonable distribution of sensors and human collectors providing the recordings, so that the patterns are reasonably representative of the bird locations across the area. Do you detect any trends or anomalies in the patterns? Please limit your answer to 10 images and 1000 words.
Exploratory data analysis of the dataset revealed several patterns among the bird species in the Preserve.
The number of recordings per year (Figure 1) declines sharply starting in 2007 and, after a period of growth, declines again in 2016. Assuming the recordings are representative, this suggests a reduction in the overall number of birds. The bar chart also shows a sharp decline in recordings of the Rose-crested Blue Pipit from 2015 onward.
Figure 1: Number of recordings in the Preserve
The number of recordings by month (Figure 2) also shows that all species sing more during spring, from April to June.
Figure 2: Number of recordings per month
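The yearly and monthly counts behind Figures 1 and 2 can be reproduced with a simple tally. The following is a minimal sketch using only the Python standard library; the record layout (species name plus an ISO-formatted date string) and the example rows are assumptions for illustration, not the actual format or content of the bird call collection.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (species, recording date). The real collection
# may use a different layout and date format.
records = [
    ("Rose-crested Blue Pipit", "2014-05-03"),
    ("Rose-crested Blue Pipit", "2015-04-17"),
    ("Darkwing Sparrow", "2015-06-21"),
    ("Rose-crested Blue Pipit", "2016-11-02"),
]

def count_by(records, part):
    """Count recordings per (species, year) or per (species, month)."""
    counts = Counter()
    for species, date_text in records:
        date = datetime.strptime(date_text, "%Y-%m-%d")
        counts[(species, getattr(date, part))] += 1
    return counts

per_year = count_by(records, "year")    # basis for a chart like Figure 1
per_month = count_by(records, "month")  # basis for a chart like Figure 2
```

Plotting these counts as bars per year or per month gives the kind of overview described above.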
The habitats of the birds were studied by visualizing
the recordings (Figure 3) on the map of the Preserve.
Figure 3: Habitats of bird species in the Preserve
Using the coordinates of the Preserve, we visualized where songs and calls of the birds were recorded before and after 2016 (Figure 4) to probe the population decrease of the Rose-crested Blue Pipit.
Figure 4: Habitats of birds before and after 2016
We find that the Rose-crested Blue Pipit's habitat (denoted by red dots) has also changed: their settlement in the north-east has decreased significantly since 2016. The same pattern is observed for all bird species. A possible explanation is that smokestack emissions from the Kasios Furniture manufacturing company (marked with a blue diamond on the map) are blown to the north and affect the birds' nesting habits.
In Figure 5, we look at the migration pattern of the Rose-crested Blue Pipit over time. The intensity of the red color encodes the year in which the bird's voice was recorded (lighter = earlier years, more intense = later years).
Figure 5: Migration of Rose-crested Blue Pipits
The change in habitats can be visualized with a 2D kernel density plot, as seen in Figure 6. The bottom map shows the habitats before 2016 and the top map shows the habitats afterwards. The plot shows how strongly the population changed in the northeast area of the park (the spot closest to the viewer).
Figure 6: Kernel density plot of bird habitats in the Preserve
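A 2D kernel density comparison of this kind can be sketched as follows. This is an illustration under stated assumptions, not necessarily the tool used to produce Figure 6: the recording coordinates are synthetic, the map extent of 0 to 200 is assumed, and `scipy.stats.gaussian_kde` with its default bandwidth stands in for whatever estimator was actually used.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic (x, y) recording locations; real coordinates would come
# from the bird call collection.
before = rng.normal(loc=[150.0, 150.0], scale=15.0, size=(200, 2))
after = rng.normal(loc=[100.0, 100.0], scale=15.0, size=(200, 2))

def density_grid(points, grid_size=50, extent=(0.0, 200.0)):
    """Evaluate a 2D Gaussian kernel density estimate on a square grid."""
    kde = gaussian_kde(points.T)  # expects shape (n_dims, n_points)
    axis = np.linspace(extent[0], extent[1], grid_size)
    xx, yy = np.meshgrid(axis, axis)
    grid = np.vstack([xx.ravel(), yy.ravel()])
    return kde(grid).reshape(grid_size, grid_size)

density_before = density_grid(before)
density_after = density_grid(after)
# Positive values mark areas that gained density, negative values
# mark areas that lost density.
change = density_after - density_before
```

The `change` grid can be drawn as a heat map or surface to highlight where recordings became less frequent.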
2 – Turn your attention to the set of birdcalls supplied by Kasios. Does this set support the claim of Pipits being found across the Preserve? A machine learning approach using the birdcall library may help your investigation. What is the role of visualization in your analysis of the Kasios birdcalls? Please limit your answer to 10 images and 1000 words.
1. Pre-processing
The bird songs from the ornithology collection and those from Kasios were pre-processed, and audio features were extracted from the audio files. 34 features, including energy, entropy of energy, zero-crossing rate, MFCCs, and chroma vector features, were generated over sliding windows of each audio clip. The features were averaged over all time windows to create a 2081 x 34 dataset, which was used as the training data. The birdcalls provided by Kasios were processed in the same way and used as the test data.
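The windowing and averaging step can be illustrated with a small sketch. It computes only two of the 34 features (energy and zero-crossing rate) with NumPy; the window length, step size, and the synthetic test tone are assumptions chosen for illustration and do not reflect the settings or the feature extraction library actually used.

```python
import numpy as np

def short_term_features(signal, window=1024, step=512):
    """Return one row per sliding window: [energy, zero-crossing rate]."""
    rows = []
    for start in range(0, len(signal) - window + 1, step):
        frame = signal[start:start + window]
        energy = np.mean(frame ** 2)
        # Fraction of adjacent sample pairs whose sign differs.
        zcr = np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
        rows.append([energy, zcr])
    return np.array(rows)

def clip_features(signal, **kwargs):
    """Average the per-window features into one vector per clip."""
    return short_term_features(signal, **kwargs).mean(axis=0)

# Synthetic one-second clip: a 440 Hz sine tone (assumed sample rate).
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)
features = clip_features(clip)
```

Stacking one such averaged vector per audio file yields a matrix with one row per recording, which is the shape of the training data described above.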
The features of the Pipit calls from Kasios were compared with those from the ornithology collection using a swarm plot, as seen in Figure 7. The plot did not show any anomalies between the two sets of Pipit calls.
Figure 7: Swarm plot of audio features
To learn whether different bird species exhibit distinct features in their calls, we reduced the 34 dimensions using t-SNE and inspected the resulting plot (Figure 8). However, the species could not be distinguished in the reduced dimensions.
Figure 8: Scatter plot of t-SNE dimensionality reduction
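A t-SNE projection of a 34-dimensional feature matrix can be sketched with scikit-learn as shown below. The feature matrix here is random placeholder data, and the standardization step, perplexity, and initialization are assumptions; the settings used for Figure 8 are not stated.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder data: 120 clips with 34 features and 3 species labels.
features = rng.normal(size=(120, 34))
species = rng.integers(0, 3, size=120)

# Standardize first so that features on large scales do not dominate
# the pairwise distances (an assumed pre-processing choice).
scaled = StandardScaler().fit_transform(features)

embedding = TSNE(
    n_components=2, perplexity=30.0, init="pca", random_state=0
).fit_transform(scaled)
```

Plotting `embedding[:, 0]` against `embedding[:, 1]` and coloring the points by `species` gives a scatter plot in which well-separated species would appear as distinct clusters.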
Because the reduced dimensions did not separate the species, we proceeded with the following machine-learning methods.
In section 2 we describe how we fitted a probability distribution to the bird sounds provided by the Preserve for each species. We used these distributions to determine the species to which the Kasios bird sounds most likely belong.
In section 3 we describe the machine-learning technique used to train a classifier that takes a bird sound as input and predicts the species of the bird. The classifier was trained on the Preserve dataset and then applied to the Kasios dataset.
In both approaches we assumed that the features of bird sounds are normally distributed. Lacking ornithological domain knowledge, we have no reason to assume that the variation in bird sounds is skewed in any particular direction.
2. Probability density computation
In order to estimate the probability that a sample bird sound provided by Kasios belongs to a specific species, we used the bird sounds provided by the Preserve to approximate, for each species $s$, the probability distribution of a sound sample $x$ belonging to $s$. Under the assumption of a normal distribution, we computed the mean $\mu_s$ and the covariance matrix $\Sigma_s$ for each bird species in the Preserve dataset and mapped each sample $x$ provided by Kasios to the following probability density function:
$$p(x \mid s) = \frac{1}{\sqrt{(2\pi)^{d} \det \Sigma_s}} \exp\left(-\frac{1}{2}(x - \mu_s)^{\top} \Sigma_s^{-1} (x - \mu_s)\right)$$
where $d = 34$ is the number of features.
The probability density of each Kasios sample was computed in this way for every species, and the mean was then taken over all 15 samples provided by Kasios. Figure 9 plots the mean probability density over all 15 samples for all 19 observed species.
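The density computation can be sketched in a few lines of NumPy: fit a mean and covariance matrix per species, evaluate the multivariate normal density of each query sample, and rank species by the mean density. The species names, the three-dimensional synthetic features, and the helper names `fit_gaussian` and `density` are hypothetical and serve only to illustrate the procedure.

```python
import numpy as np

def fit_gaussian(samples):
    """Estimate the mean vector and covariance matrix of one species."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def density(x, mean, cov):
    """Multivariate normal density, evaluated via the log density."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + mahalanobis))

rng = np.random.default_rng(0)
# Synthetic training features for two hypothetical species.
train = {
    "species_a": rng.normal(0.0, 1.0, size=(200, 3)),
    "species_b": rng.normal(4.0, 1.0, size=(200, 3)),
}
# 15 synthetic query samples, drawn near species_b.
query = rng.normal(4.0, 1.0, size=(15, 3))

models = {name: fit_gaussian(x) for name, x in train.items()}
mean_density = {
    name: float(np.mean([density(x, mean, cov) for x in query]))
    for name, (mean, cov) in models.items()
}
# Species sorted from most to least likely under this metric.
ranking = sorted(mean_density, key=mean_density.get, reverse=True)
```

With many features, individual densities can become very small, so comparing log densities instead of raw densities is a common alternative.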
Figure 9: Rank distribution of probability densities
As we can see, the rank distribution of the mean probability density for the 19 observed species resembles a power-law distribution, as many rank distributions do. Moreover, the species with the highest mean probability density is not the Rose-crested Blue Pipit (RCBP) but the Darkwing sparrow, with the RCBP ranking only fourth. We conclude that it is possible, even likely, that the bird sounds provided by Kasios were not uttered by the RCBP. We tested this conclusion with a classification approach to see whether it supports this claim.
3. Machine Learning Classification
We used a machine-learning approach to build a classifier that predicts a bird's species from recorded sound samples. Following the assumption that the bird sounds within a species are normally distributed, we chose a multivariate Gaussian Naive Bayes classifier as an appropriate model. We designed the model as a multi-class classifier for all 19 species that occur in the dataset provided by the Preserve, which served as the training set. Using 5-fold cross-validation on the training set, the Gaussian Naive Bayes classifier achieved an accuracy of 81% (GaussianNB implementation from the sklearn package for Python).
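The training and evaluation procedure can be sketched with scikit-learn's `GaussianNB` and `cross_val_score`. The data below is synthetic (three hypothetical species, five features), so the scores it produces say nothing about the 81% reported above. Note that `GaussianNB` models each feature independently per class, which corresponds to a diagonal covariance matrix.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_per_class, n_features = 60, 5
# Synthetic training set: three well-separated hypothetical species.
X_train = np.vstack([
    rng.normal(loc=3.0 * k, scale=1.0, size=(n_per_class, n_features))
    for k in range(3)
])
y_train = np.repeat(["species_a", "species_b", "species_c"], n_per_class)
# 15 synthetic query samples, drawn near species_b.
X_test = rng.normal(loc=3.0, scale=1.0, size=(15, n_features))

model = GaussianNB()
# 5-fold cross-validation accuracy on the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training set, then classify the query samples.
model.fit(X_train, y_train)
predicted = model.predict(X_test)
tally = Counter(predicted)  # number of query samples per predicted species
```

The `tally` counter holds the per-species prediction counts, which is the kind of summary a bar chart of classification results would display.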
Figure 10 plots how many of the 15 samples provided by Kasios were assigned to each species.
Figure 10: Classification results of Kasios audio files
As we can see, our model classifies only one of the 15 samples as belonging to the RCBP. This finding supports our conclusion from the previous section that the samples provided by Kasios do not belong to the RCBP.
4. Conclusion
Our findings in the previous two sections give us strong reason to believe that the 15 bird samples provided by Kasios do not belong to the RCBP. In section 2 we computed mean probability density values for each species and found that the RCBP is only the fourth most likely species according to this metric. In section 3 we trained a multivariate Gaussian Naive Bayes classifier on the verified dataset provided by the Preserve. Only one sample provided by Kasios was classified as belonging to the RCBP by this model. Based on these findings, we assume that the Kasios bird sounds belong to other species and are an attempt to mislead the public about the impact of the company's activities on the surrounding nature and wildlife.
3 – Formulate a hypothesis concerning the state of the Rose Crested Blue Pipit. What are your primary pieces of evidence to support your assertion? What next steps should be taken in the investigation to either support or refute the Kasios claim that the Pipits are actually thriving across the Boonsong Lekagul Wildlife Preserve? Please limit your answer to 500 words.
Since 2015, the population of the Rose-crested Blue Pipit has evidently been diminishing. Moreover, all bird species in the Preserve show signs of disturbance, and their habitats have been shifting away from the northeast area of the Preserve. Based on our analysis, we believe that the Pipits, whose habitat is primarily in the affected area, face a higher risk of becoming endangered.
Using machine-learning techniques, we were able to determine that the audio clips provided by Kasios were not those of the birds in question. The evidence provided by Kasios does not back their claim that the Pipits are healthy and thriving in the Preserve. Not only is their evidence false, but 15 samples are also too few to give a realistic impression of the true state of the Pipits. From the bird calls provided, we cannot draw a direct link between the actions of Kasios and the state of the Pipit, except to conclude that Kasios has provided false evidence. This raises further suspicion, which can be investigated by gathering more data from the Preserve and on the company's actions.