Student Team: YES
Python
Tableau
Excel
Approximately how many hours were spent working on this submission in total?
180
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES
Video
Tableau Workbook: https://tinyurl.com/yc3edlrp
Questions
1 – Using
the bird call collection and the included map of the Wildlife Preserve,
characterize the patterns of all of the bird species in the Preserve over the
time of the collection. Please assume we have a reasonable distribution of
sensors and human collectors providing the recordings, so that the patterns are
reasonably representative of the bird locations across the area. Do you detect
any trends or anomalies in the patterns? Please limit your answer to 10 images
and 1000 words.
Spatiotemporal data of bird calls, recorded at various locations and at
different points in time, help us to visualize the migrations of bird species.
As seen below, there is significant activity across the time range between
various points in the Preserve. The lines trace an animated path through time,
showing that bird calls are recorded at almost all points in the Preserve.
Given the assumption that the recording patterns are representative of bird
locations, we can estimate where the home base of each bird species is.
Figure 1 Tracing the path of bird movements across the preserve through the cumulative density of the recordings
Identifying the home base and migration patterns of bird species
We can analyze where a
bird spends the most time in the preserve, and what kind of movement patterns
it has exhibited. For each bird, we can infer and attribute a home base.
Furthermore, we can also see if there are any notable trends of bird
migrations.
In the table below, north is at the top of the provided Preserve map. For
example, the alleged Kasios dumping site is in the North East of the Preserve.
| Bird | Home base identified | Migration patterns (if any) |
|---|---|---|
| Bent Beak Riffraff | No clear home base, however this species is mostly on the left side of the Preserve | |
| Blue Collared Zipper | South West | Travels across the Preserve occasionally |
| Bombadil | North West | Rarely migrates |
| Broad-Winged Jojo | South West | Travels across the Preserve occasionally |
| Canadian Cootamum | North West | Travels across the Preserve occasionally |
| Carries Champaign Pipit | South East | Travels across the Preserve occasionally |
| Darkwing Sparrow | No clear home base | Travels all around the Preserve |
| Eastern Corn Skeet | Upper middle portion of the Preserve | Predominantly travels in the top half of the Preserve |
| Green-tipped Scarlet Pipit | No clear home base | Travels all around the Preserve |
| Lesser Birchbeere | Lower middle portion of the Preserve | Alternates between South East lower to South West lower |
| Orange Pine Plover | Lower middle portion of the Preserve | Alternates between South East lower to South West lower |
| Ordinary Snape | North East | Rarely migrates |
| Pinkfinch | Lower middle portion of the Preserve | Alternates between South East lower to South West lower |
| Purple Tooting Foot | No clear home base | Travels all around the Preserve |
| Qax | Middle-left portion of the Preserve (based on limited data) | Limited data available |
| Queenscoat | North West | Rarely migrates |
| Rose-crested Blue Pipit | North East | Travels to an extent around the Preserve |
| Scrawny Jay | Left of the Preserve | Alternates between North West and South West |
| Vermillion Trillian | North West | Alternates between North and South West, closer to the middle of the Preserve |
Table 1: Home base and migration patterns of the bird species over time

Figure 2: Identifying the home base and migration patterns of each of the bird species
The Frequent Flyers
The coordinates of each location allow us to calculate the distance travelled
by each species between successive recordings. The stacked box plot below gives
a closer look at the total distances travelled over time.
Figure 3: Sum of total distances travelled by a bird species between successive recordings
Overall, the Orange Pine Plover, Queenscoat and Rose-crested Blue Pipit species
travel the most across the Preserve, in line with the migration patterns
identified above.
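The distance calculation described in this section can be sketched as follows. This is a minimal illustration, not the team's actual code: the `(species, timestamp, x, y)` record layout, the function name and the sample values are all assumptions made for the example.

```python
import math
from collections import defaultdict


def total_distance_by_species(recordings):
    """Sum straight-line distances between successive recordings per species.

    `recordings` is an iterable of (species, timestamp, x, y) tuples. This
    layout is hypothetical and chosen only for illustration.
    """
    by_species = defaultdict(list)
    for species, timestamp, x, y in recordings:
        by_species[species].append((timestamp, x, y))

    totals = {}
    for species, points in by_species.items():
        points.sort()  # ISO-formatted timestamps sort chronologically
        totals[species] = sum(
            math.hypot(x2 - x1, y2 - y1)
            for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
        )
    return totals


# Made-up sample records, deliberately out of order.
sample = [
    ("Queenscoat", "2015-01-01", 0, 0),
    ("Queenscoat", "2015-03-01", 3, 4),
    ("Queenscoat", "2015-02-01", 0, 4),
    ("Bombadil", "2015-01-15", 10, 10),
]
totals = total_distance_by_species(sample)
```

A species with a single recording gets a total of zero, since there is no successive pair to measure.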
An emergency landing take-off in the North East of the Preserve?
Knowing the exact location of the alleged Kasios dumping site lets us check
whether any significant activity was observed close to it. Specifically, we
look for alarming patterns or changes in migration away from the North East of
the Preserve at a point in time, using the Euclidean distance between each
species recording and the dumping site.
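One way to turn this into a number that can be tracked over time is to count, per month, the recordings that fall within some radius of the site. The sketch below assumes a `(month, x, y)` record layout; the site coordinates and the radius are placeholders, not values from the challenge data.

```python
import math
from collections import Counter


def monthly_counts_near_site(recordings, site, radius):
    """Count recordings per month whose Euclidean distance to `site` is <= radius.

    `recordings` is an iterable of (month, x, y) tuples, with month written
    as "YYYY-MM". The layout is hypothetical.
    """
    counts = Counter()
    for month, x, y in recordings:
        if math.hypot(x - site[0], y - site[1]) <= radius:
            counts[month] += 1
    return dict(sorted(counts.items()))


# Placeholder site position and made-up recordings, for illustration only.
site = (150, 150)
sample = [
    ("2014-11", 145, 155),
    ("2014-11", 160, 140),
    ("2014-12", 100, 100),
    ("2014-12", 150, 160),
]
counts = monthly_counts_near_site(sample, site, radius=20)
```

A sudden drop in these monthly counts for one species is the kind of signal examined in the rest of this section.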
The Rose-crested Blue Pipits inhabited the region around the dumping site over
the years until around December 2014. A significant and sudden drop in the
number of recordings near the dumping site after that point indicates a clear
movement of this pipit species away from the site. This makes a strong case for
the team to flag it as an anomaly, as the resting base of the pipits, which had
hitherto been stable, has now been disturbed.
Figure 4: Migration of the Rose Crested Blue Pipit from December 2014
Narrowing down further, it is clear from the GIF below that the sudden movement
away from the site starts in December 2014.
Figure 5: An animated visualization of the sudden shift in the home base of the Rose Crested Blue Pipit
2 – Turn
your attention to the set of bird calls supplied by Kasios. Does this set support
the claim of Pipits being found across the Preserve? A machine learning
approach using the bird call library may help your investigation. What is the
role of visualization in your analysis of the Kasios bird calls? Please
limit your answer to 10 images and 1000 words.
In this section, we shall cover the methodology
underlying our analyses of the audio files provided, followed by how we
utilized that approach to refute Kasios’ claim.
Role of Visualization in analyzing bird calls
An overview of the methodology is as shown below.
Figure 6: An overview of the deep learning approach to classify bird sounds
Data Preprocessing
The team analyzed the audio files by converting them into spectrograms and then applying image classification to identify the bird species tagged to each audio file. We used Librosa, a Python package, for the conversion. In doing so, we came to believe that recordings of sound quality A and B are of higher audio quality than those in the other 3 bands. This inference was arrived at by manually inspecting a random sample of audio clips from each category.
Because the audio recordings are of varying lengths, the spectrograms were
trimmed into chunks of 5 seconds each. As can be seen below, the images clearly
show a difference between the audio waves of different bird species.
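A minimal sketch of the 5-second chunking with Librosa is given below. It is not the team's pipeline: it assumes a mel spectrogram, a 22050 Hz sample rate, and that a trailing partial chunk is dropped, none of which is stated in this write-up. The helper names are hypothetical.

```python
def chunk_bounds(n_samples, sample_rate, chunk_seconds=5.0):
    """Return (start, end) sample indices of consecutive full-length chunks.

    A trailing chunk shorter than `chunk_seconds` is dropped (an assumption).
    """
    chunk_len = int(sample_rate * chunk_seconds)
    return [
        (start, start + chunk_len)
        for start in range(0, n_samples - chunk_len + 1, chunk_len)
    ]


def chunk_spectrograms(path, sample_rate=22050, chunk_seconds=5.0):
    """Load an audio file and return one dB-scaled mel spectrogram per chunk."""
    # Imported here so chunk_bounds stays usable without audio dependencies.
    import numpy as np
    import librosa

    audio, sr = librosa.load(path, sr=sample_rate)
    spectrograms = []
    for start, end in chunk_bounds(len(audio), sr, chunk_seconds):
        mel = librosa.feature.melspectrogram(y=audio[start:end], sr=sr)
        spectrograms.append(librosa.power_to_db(mel, ref=np.max))
    return spectrograms
```

Each returned array can then be saved as an image and fed to the classifier; recordings shorter than one chunk yield no spectrograms under this scheme.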
Figure 7: [Above]: An illustration of how an audio file is made into a spectrogram [Below] An illustration of a 5 second chunk of specific bird call audio
A dev set of 50 images for each bird was held out for model validation. The
rest of the images were used as training data, treating this as a supervised
classification problem.
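The per-species hold-out can be sketched as below. The `(image_id, label)` layout, the random shuffling and the fixed seed are assumptions; the write-up does not say how the 50 dev images per bird were chosen.

```python
import random
from collections import defaultdict


def split_dev_train(items, dev_per_class=50, seed=0):
    """Hold out `dev_per_class` items of every label; the rest become training data.

    `items` is a list of (image_id, label) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for image_id, label in items:
        by_label[label].append(image_id)

    dev, train = [], []
    for label, ids in by_label.items():
        rng.shuffle(ids)
        dev += [(i, label) for i in ids[:dev_per_class]]
        train += [(i, label) for i in ids[dev_per_class:]]
    return dev, train


# Made-up image identifiers for two species.
items = [(f"qax_{i}", "Qax") for i in range(120)]
items += [(f"pinkfinch_{i}", "Pinkfinch") for i in range(80)]
dev, train = split_dev_train(items)
```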
Tuning the weights of the Inception V4 Network
The team is inspired by the traction gained in the field of computer vision and
believes the active research in the deep learning community can be leveraged
for this classification and visualization task. We used Inception V4 [1], a
pioneering neural network architecture developed by Szegedy et al. Built from
inception blocks, such networks are tailor-made for computer vision workloads
and are popular entrants in events such as the ImageNet Challenge. The
underlying architecture of the network is shown below: the red circles indicate
the inception blocks, the blue layers indicate convolution, the red boxes
indicate pooling layers, and the final fully connected layer feeds into a
yellow softmax layer. Normalization layers are shown in green. No changes were
made to the authors' architecture, and we follow the authors' implementation of
the Inception V4 network.
Figure 8: The architecture of the Inception V4 neural network (for more details, please refer to https://arxiv.org/pdf/1602.07261.pdf)
We trained the network for 30 epochs (passes through the training set), tuning
its weights with a cross-entropy loss function and backpropagation.
Development was done in Python using the Caffe framework.
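For reference, the cross-entropy loss for a single example can be written in a few lines of plain Python. This is a generic illustration of the loss, not the Caffe layer used in training; the function names are our own.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Negative log-probability that the softmax assigns to the true class."""
    return -math.log(softmax(logits)[true_index])


confident_right = cross_entropy([5.0, 0.0, 0.0], true_index=0)
confident_wrong = cross_entropy([5.0, 0.0, 0.0], true_index=1)
```

The loss is small when the network puts most of its probability on the correct species and large when it is confidently wrong, which is what backpropagation pushes against.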
Furthermore, t-distributed Stochastic Neighbor Embedding (t-SNE) is used to
visualize how well separated the species are based on the features learnt by
the network. The more separated each species is, the better the model
performance is. As seen below, the network makes a clear distinction between
the Rose Crested Blue Pipit and the other birds.
Figure 9: Visualizing the features learnt by the network on a two-dimensional scale by the use of t-SNE
Figure 10: The T-SNE shows a clear separation of the Rose Crested Blue Pipit’s features learnt by the network
Model Evaluation
We use precision as the metric for assessing our models on the dev set. We attain 78% precision on the image chunks (label 1 in the figure below). We then labeled each audio file based on the majority classification of its chunks: for example, if 7 out of 10 chunks were classified as Rose Crested Blue Pipit and the remaining 3 as Qax, the entire recording would be labeled Rose Crested Blue Pipit. A recording is assigned a label only when more than 50% of its chunks share it.
The precision attained on the overall label classification (label 2) is 80.4%.
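The chunk-to-recording vote and the precision metric can be sketched as follows. Two points are assumptions rather than details from this write-up: a recording with no label above the threshold is returned as `None`, and precision is computed for a single target class.

```python
from collections import Counter


def label_recording(chunk_labels, threshold=0.5):
    """Return the label shared by strictly more than `threshold` of the chunks, else None."""
    if not chunk_labels:
        return None
    label, count = Counter(chunk_labels).most_common(1)[0]
    return label if count / len(chunk_labels) > threshold else None


def precision(predicted, actual, target):
    """True positives divided by all predictions of `target`."""
    actual_for_predicted = [a for p, a in zip(predicted, actual) if p == target]
    if not actual_for_predicted:
        return 0.0
    hits = sum(a == target for a in actual_for_predicted)
    return hits / len(actual_for_predicted)


# The example from the text: 7 of 10 chunks classified as the pipit, 3 as Qax.
chunks = ["Rose Crested Blue Pipit"] * 7 + ["Qax"] * 3
recording_label = label_recording(chunks)
```

With a strict "greater than 50%" rule, an even split between two species leaves the recording unlabeled rather than picking one arbitrarily.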
Figure 11: A simplified map illustrating the model flow
Testing Kasios Claims
The 15 test files provided by Kasios are then fed into the model to produce the
output labels. We find that the bird recordings do not all belong to the Pipit
species as claimed by Kasios. In fact, 12 out of the 15 recordings belong to
non-Pipit species. A breakdown of the classified recordings can be seen below;
it indicates that pipits are not found across the Preserve and that just 20%
(3 of 15) of Kasios’ claims are correct.
Figure 12: Results of classification of the 15 audio files provided by Kasios
Furthermore, when we plot these classified outputs on the map, it is evident
that Kasios’ recording locations do not match what we inferred as the pipit
home base from Figure 2.
Figure 13: The files provided by Kasios annotated with the results derived from the model indicate that only 3 of them are Rose Crested Blue Pipits as opposed to the Kasios claim of pipits being found across the preserve.
Furthermore, the home base of such pipits should have been near the X annotated
in red in the North East, as identified in Figure 2, and not in the North West
as shown above.
3 – Formulate
a hypotheses concerning the state of the Rose Crested Blue Pipit. What
are your primary pieces of evidence to support your assertion? What next
steps should be taken in the investigation to either support or refute the
Kasios claim that the Pipits are actually thriving across the Boonsong Lekagul
Wildlife Preserve? Please limit your answer to 500 words.
Hypotheses and supporting evidence
The key hypothesis we develop is that the Rose Crested Blue Pipit is a species
that will continue to be affected by Kasios’ activities, such as the chemical
dumping and smokestack emissions. Kasios claims that pipits are happily
thriving across the Preserve.
The following are factors that reinforce why we can refute their claim.
Where to head in the future with this investigation?
The next step in refuting Kasios’ claim would be to take independent recordings
across the entire Preserve over a period of a few months to determine whether
the Pipit species is really found across the Preserve. These recordings should
be taken at regular intervals, e.g. daily or weekly, in the mornings or
evenings, to accurately determine the true migration patterns of the species.
The effort to estimate pipit numbers can be extended by using video analytics
to capture visual proof of the birds at different locations. A transfer
learning approach similar to the one used for audio classification could be
applied to such footage, as surveillance methods are fast gaining traction.
“The ball is in your court, Kasios!”
P.S.: Is John Torch still driving those trucks around?