Student Team: YES
1. Tableau Desktop 2018.1
2. Excel
3. Python
4. R
Approximately how many hours were spent working on this submission in total?
230
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete?
YES
Video
Questions
1 – Using the bird call collection and the included map of the
Wildlife Preserve, characterize the patterns of all of the bird species in the
Preserve over the time of the collection. Please assume we have a reasonable
distribution of sensors and human collectors providing the recordings, so that
the patterns are reasonably representative of the bird locations across the
area. Do you detect any trends or anomalies in the patterns? Please limit your
answer to 10 images and 1000 words.
We first plotted all the bird vocalizations on a map to understand the historical location of each bird species by year. As the following figure shows, the Rose-Crested Blue Pipit (RCBP) is mostly located in the north-east of the Preserve until 2015. Based on contextual information from this Mini-Challenge and previous challenges, we know that is when the waste dump took place:

Figure 1: Map of bird location by year
If we compare the aggregated information for the RCBP between the periods 2012/2014 and 2015/2017 and calculate the variance of location, we find that it moved from the NE to a more central location. To confirm this, we created the following figure:

Figure 2: RCBP location change from 2012 to 2017
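The comparison of aggregated locations between the two periods can be sketched as follows. This is a minimal illustration rather than the dashboard code: the function names and the sample coordinates are hypothetical, and recordings are assumed to be available as (x, y) map coordinates.

```python
from statistics import mean, pvariance


def location_summary(points):
    """Return the centroid and per-axis variance of (x, y) map coordinates."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "centroid": (mean(xs), mean(ys)),
        "variance": (pvariance(xs), pvariance(ys)),
    }


def centroid_shift(before, after):
    """Return (dx, dy): how far the centroid moved between two periods."""
    bx, by = location_summary(before)["centroid"]
    ax, ay = location_summary(after)["centroid"]
    return (ax - bx, ay - by)


# Hypothetical coordinates for illustration only, not the challenge data.
recordings_2012_2014 = [(150, 160), (160, 170), (155, 165)]
recordings_2015_2017 = [(100, 110), (110, 100), (105, 105)]

print(centroid_shift(recordings_2012_2014, recordings_2015_2017))
```

A negative shift on both axes corresponds to a move away from the NE corner, assuming x grows eastward and y grows northward.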
Figure 2 is based on a dashboard we built in Python using Dash. It is shown several times below for different combinations of birds. It is composed of a line chart highlighting the trend of recordings for the selected bird during a specific time period. Below it are two charts of population movement: a variation heatmap on the left and a variation histogram on the right. The heatmap clearly shows a migration from the NE (in red) to the center of the Preserve (in blue). During the selected period the overall net variation of recordings increased by 76%. To understand whether that increase is an outlier, we created the following charts:

Figure 3: Historical trend for RCBP call recordings
This chart shows the historical Call recordings for the RCBP over the entire period (data before 2010 is ignored because it is incomplete). We see a total of 5 points more than one standard deviation (STD) above the mean after the waste dump; statistically, these are not considered outliers. However, one point lies more than two STD above the mean. It corresponds to December 2015. This possible outlier in the number of calls came after the dump, suggesting a different behavior for the RCBP after migrating.
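The standard-deviation check described above can be sketched as follows. It assumes the thresholds are measured from the mean of the whole series, which the text does not state explicitly; the function name and the sample counts are hypothetical.

```python
from statistics import mean, pstdev


def flag_spikes(counts, k=2.0):
    """Return the indices of counts lying more than k STD above the mean."""
    threshold = mean(counts) + k * pstdev(counts)
    return [i for i, c in enumerate(counts) if c > threshold]


# Hypothetical monthly call counts for illustration only.
monthly_calls = [10, 12, 11, 9, 10, 11, 10, 40]

print(flag_spikes(monthly_calls, k=2.0))
```

Calling it with `k=1.0` gives the weaker "above one STD" band, and `k=2.0` the stricter band used to single out possible outliers.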
If we evaluate Songs, a similar scenario is found:

Figure 4: Historical trend for RCBP song recordings
Again, we see two points above two STD. We conclude there are spikes in both vocalization types, but we need contextual information to understand why they are taking place.
To obtain it, we first compared the RCBP with all the other birds, considering the Call and Song vocalization types:

Figure 5: Sparklines for all birds calls & songs
We see that before 2015 the majority of the species were stable. However, after the dump event, we see spikes for the RCBP and Ordinary Snape.
Experts in bird behavior suggest that birds call when they:
1. Want to stay in contact with friendly birds
2. Are engaged in territorial aggression
3. Are alarmed by the presence of predators
Note: Intensity is the key to moving from 1) to 3).
Also, they sing when they:
1. Demonstrate territory
2. Attract a mate
3. Protect their area from rival birds
4. Are not alarmed about a predator
So, if we see two different species of birds increasing the number of calls and songs while in the same area and on the same territory, we can infer they are competing for resources and space. The cycles should alternate: first we should see spikes in songs and later in calls.
This is why we decided to evaluate both species mentioned above together:

Figure 6: Number of Calls/Songs for RCBP & Ordinary Snape
As can be seen in Figure 6, the trend suggests the two birds started to contend for the same territory after the date of the dump. The time axis is expressed in quarters because bird recordings show a strong seasonality effect due to mating (spring) and breeding (summer). The average number of calls for the RCBP increased after 2015, right in between two song spikes. This could be the cycle we are looking for.
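Grouping recordings by quarter, as done for the time axis of Figure 6, can be sketched as follows. The record layout (a date plus a vocalization type) and the function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def quarter_label(d):
    """Map a date to a label such as '2015Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def counts_by_quarter(records):
    """Count recordings per (quarter, vocalization type) pair."""
    return Counter((quarter_label(d), voc) for d, voc in records)


# Hypothetical records for illustration only.
records = [
    (date(2015, 1, 15), "song"),
    (date(2015, 3, 2), "song"),
    (date(2015, 5, 20), "call"),
    (date(2015, 12, 9), "call"),
]

print(counts_by_quarter(records))
```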
Before drawing any conclusions about the trend for the RCBP, we first have to analyze its share of the recordings, in other words, its percentage of total recordings:

Figure 7: Area chart for all birds between 2010/2017 for Calls & Songs
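The share of recordings per species plotted in the area chart can be computed as sketched below. The record layout (period, species) and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def recording_shares(records):
    """Return {period: {species: percent of that period's recordings}}."""
    per_period = defaultdict(Counter)
    for period, species in records:
        per_period[period][species] += 1
    shares = {}
    for period, counts in per_period.items():
        total = sum(counts.values())
        shares[period] = {sp: 100.0 * n / total for sp, n in counts.items()}
    return shares


# Hypothetical records for illustration only.
sample = [
    ("2015Q1", "RCBP"),
    ("2015Q1", "RCBP"),
    ("2015Q1", "Ordinary Snape"),
    ("2015Q1", "Qax"),
]

print(recording_shares(sample))
```

Working with shares rather than raw counts keeps a species' trend comparable across periods in which the total number of recordings differs.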
The previous figure shows how the percentage of Call & Song recordings is distributed by species. Once more, we see an interesting spike for the RCBP as well as for the Queenscoat and the Orange Pine Plover. Interestingly, the last two belong to other areas of the Preserve, the W and SW respectively. To understand this better, we needed to drill down into the Vocalization Type again:

Figure 8: Trends by Vocalization grouping other. Calls/Songs instances are excluded
It can be noted from the previous figure that:
· While the percentage of calls for other bird species remains constant (Slope = 0), the RCBP shows a negative trend (Slope < 0).
· Although the negative trend is mostly due to the overall low number of recordings after 2016 for both calls and songs, there is a decrease in RCBP observations in that period.
· Spikes for Songs are present in 2015 Q1 & 2016 Q1, before each local maximum for Calls. Experts interpret this as a territorial challenge cycle, in which one species opposes another for resources or territory.
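The slope comparison in the first bullet can be reproduced with an ordinary least-squares fit over equally spaced periods. This is a sketch under that assumption; the text does not say how the trend lines in Figure 8 were fitted, and the function name is hypothetical.

```python
def ols_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Hypothetical percentage series for illustration only.
print(ols_slope([5, 5, 5, 5]))   # flat share
print(ols_slope([10, 8, 6, 4]))  # declining share
```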
To conclude, we found evidence that:
1. The RCBP is migrating from the NE to the center.
2. The Ordinary Snape was occupying the area the RCBP migrated to.
3. The challenge for resources and territory explains the spike in recordings for both species.
4. After both species occupied the same space, the number of recordings decreased.
Extra: During our analysis we noticed a similar situation taking place on the SW side of the Preserve. Take, for example, the Scrawny-Jay in the density dashboard:

Figure 9: Similar behavior observed on the SW of the preserve. Scrawny-Jay example.
We saw that during the selected period the overall net variation of recordings decreased by 40%.
For all birds:

Figure 10: Similar behavior observed on the SW of the preserve. All birds
It is clear from Figure 10 that something is happening in the SW (red area) of the Preserve that is driving the birds away. This may be another waste dump or some other type of hazard forcing the birds to move away from there.
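The net variation percentages and the variation heatmap used in the density dashboard can be sketched as follows. The formula (change relative to the earlier period), the grid cell size, and the function names are assumptions; the text does not specify how the dashboard computes them.

```python
from collections import Counter


def net_variation_pct(n_before, n_after):
    """Percent change in the number of recordings between two periods."""
    if n_before == 0:
        raise ValueError("no recordings in the reference period")
    return 100.0 * (n_after - n_before) / n_before


def cell_variation(before, after, cell=20):
    """Per grid cell, recordings in the later period minus the earlier one."""
    def binned(points):
        return Counter((int(x) // cell, int(y) // cell) for x, y in points)

    b, a = binned(before), binned(after)
    return {c: a[c] - b[c] for c in set(a) | set(b)}


print(net_variation_pct(100, 176))  # an increase such as the one reported for the RCBP
print(net_variation_pct(50, 30))    # a decrease such as the one reported for the Scrawny-Jay
print(cell_variation([(150, 160)], [(100, 110)]))
```

Cells with a negative value lost recordings and cells with a positive value gained them, which is the information a diverging color scale encodes.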
2 – Turn your attention to the
set of bird calls supplied by Kasios. Does this set support the claim of Pipits
being found across the Preserve? A
machine learning approach using the bird call library may help your
investigation. What is the role of visualization in your analysis of the Kasios
bird calls? Please limit your answer to
10 images and 1000 words.
We executed a two-step process to classify and visualize the results of the Kasios audio files.
First, using Python, we processed all the audio files in the following way:
1. Preprocessing:
1.1. Sampling frequencies were normalized to 22,050 Hz, given that the majority of the files had that sampling frequency. We took care to avoid aliasing when the original frequency was higher.
2. Normalizing Samples:
2.1. We sliced each file into 30 randomly placed chunks of 5 seconds each, allowing overlap.
2.2. We calculated the Mel-spectrogram. Mel is a frequency scale that approximates how humans perceive pitch.
2.3. If the energy during those 5 seconds was below -75 dB, we considered the chunk silence and labeled it as such, adding a new class.
3. Training/Testing Split:
3.1. We did an 80-20% random split of the provided bird recordings.
4. We then trained a convolutional neural network on the generated Mel-spectrogram features, which have a dimensionality of 128x216 (frequency resolution times a ~5 s window).
5. After that, we classified the audio files from Kasios.
5.1. The final class of each file was decided by the majority class over all its chunks, without considering silences.
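Steps 2.1 and 2.3 above, together with the 216-frame width of the spectrogram, can be sketched as follows. Several details are assumptions rather than facts from the text: the energy is taken as mean power in dB relative to a full-scale amplitude of 1.0, and the hop length of 512 samples with centered frames is chosen because it reproduces the stated 216 frames for a 5-second chunk. All names are hypothetical.

```python
import math
import random

SAMPLE_RATE = 22_050      # Hz, from step 1.1
CHUNK_SECONDS = 5         # from step 2.1
N_CHUNKS = 30             # from step 2.1
SILENCE_DB = -75.0        # from step 2.3
HOP_LENGTH = 512          # assumption: not stated in the text
CHUNK_LEN = SAMPLE_RATE * CHUNK_SECONDS


def random_chunk_starts(n_samples, rng, n_chunks=N_CHUNKS, chunk_len=CHUNK_LEN):
    """Pick random start offsets; chunks may overlap."""
    last_start = max(n_samples - chunk_len, 0)
    return [rng.randint(0, last_start) for _ in range(n_chunks)]


def energy_db(samples):
    """Mean power in dB relative to full scale (assumed definition)."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power) if power > 0 else float("-inf")


def is_silence(samples, threshold_db=SILENCE_DB):
    """Label a chunk as silence when its energy is below the threshold."""
    return energy_db(samples) < threshold_db


def n_frames(n_samples, hop=HOP_LENGTH):
    """Number of spectrogram frames when frames are centered on each hop."""
    return 1 + n_samples // hop


starts = random_chunk_starts(60 * SAMPLE_RATE, random.Random(0))
print(len(starts), n_frames(CHUNK_LEN))
```

The Mel-spectrogram itself would come from an audio library; only the chunking, the silence rule, and the frame-count arithmetic are shown here.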
The result can be seen in the following confusion matrix:

Figure 11: Confusion Matrix
Regarding the files provided by Kasios, the classification results can be found in the following figure:

Figure 12: Classification Results
· Bent-beak Riffraff: Files 1, 6, 11, and 15.
· RCBP: Files 2, 9, and 13.
· Orange Pine Plover: Files 10 and 12.
· Bombadil: Files 3 and 4.
· Lesser Birchbeere: File 8.
· Qax: File 7.
· Canadian Cootamum: Files 5 and 14.
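The per-file labels listed above come from a majority vote over chunk predictions that ignores silent chunks (step 5.1). A minimal sketch of that rule, with a hypothetical function name and label strings:

```python
from collections import Counter


def file_label(chunk_labels, silence="silence"):
    """Majority class over a file's chunks, ignoring silent chunks."""
    votes = Counter(label for label in chunk_labels if label != silence)
    if not votes:
        return silence  # every chunk was silent
    return votes.most_common(1)[0][0]


# Hypothetical chunk predictions for one file.
print(file_label(["silence", "RCBP", "Qax", "RCBP", "silence", "silence"]))
```

In this sketch ties are resolved in favor of the class that appears first among the chunks; the text does not say how ties were handled.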
Second, we represented the audio files in a 2D scatterplot using the t-SNE procedure. For that, we executed the following steps:
1. Filtering:
1.1. We kept only class-A files.
1.2. We used the training library and the Kasios files together.
2. Preprocessing:
2.1. We filtered out frequencies with low energy or variation.
Going from:

Figure 13: File 12 original audio file spectrum.
To:

Figure 14: File 12 spectrum without noise.
3. Centralizing Samples:
To isolate the bird sounds, we:
3.1. Scanned along the time dimension, finding the maximum amplitude at each time step (blue line in Figure 15).
3.2. Once all maximums were found, applied two triangular convolutions to smooth the results (orange line in Figure 15).
3.3. Found the local maximums of that smoothed line above the 80th percentile of the total amplitude.

Figure 15: Centralizing process steps.
3.4. We located the maximums (green lines in Figure 15). Centering on each of them, we took a window extending 1.486 seconds to each side, 2.972 seconds in total.

Figure 16: Centralized bird call for first chunk of file 12. Original above and pre-processed below.
4. Using a convolutional auto-encoder with 64 latent dimensions, we translated the data set resulting from step 3 into a 64-feature data set for all the samples.
5. Finally, we used t-SNE to create a 2D map maximizing distances between classes.
6. Using the data from step 5, we created a t-SNE plot and then added two area overlays using a Gaussian kernel density estimator, enclosing 90% and 75% of the points of the same class.
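The centralizing procedure in steps 3.1 to 3.4 can be sketched as follows, starting from an amplitude envelope (one maximum per time frame). The kernel half-width, the nearest-rank percentile, the choice to compute the percentile on the raw envelope, and all function names are assumptions. For reference, 1.486 s is roughly 32,768 samples at 22,050 Hz.

```python
import math


def triangular_smooth(values, half_width=2):
    """Weighted moving average with triangular weights and clamped edges."""
    offsets = range(-half_width, half_width + 1)
    weights = [half_width + 1 - abs(k) for k in offsets]
    total = sum(weights)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)  # clamp the index at the borders
            acc += w * values[j]
        out.append(acc / total)
    return out


def percentile(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def find_call_centers(envelope, q=80, half_width=2):
    """Local maxima of the twice-smoothed envelope above the q-th percentile."""
    smooth = triangular_smooth(triangular_smooth(envelope, half_width), half_width)
    floor = percentile(envelope, q)
    return [
        i for i in range(1, len(smooth) - 1)
        if smooth[i] > floor and smooth[i - 1] < smooth[i] >= smooth[i + 1]
    ]


def window_around(center, half_len, n):
    """Clip a window of half_len frames on each side of center to [0, n)."""
    return max(center - half_len, 0), min(center + half_len, n)


# Hypothetical envelope: silence with one call centered on frame 20.
envelope = [0.0] * 40
envelope[18:23] = [0.2, 0.6, 1.0, 0.6, 0.2]
centers = find_call_centers(envelope)
print(centers, [window_around(c, 8, len(envelope)) for c in centers])
```

Smoothing twice before searching for peaks avoids treating every small ripple in the raw envelope as a separate call.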

Figure 17: t-SNE map vs geographical map with predicted classes for Kasios test files
Please note that each point in the map represents a chunk of a given file. For the Kasios files, only medoids are represented. We did this in order to validate the classification.
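A medoid is the member of a set with the smallest total distance to all other members, so unlike a mean it is always an actual chunk. The sketch below assumes Euclidean distance and that the medoid is taken over one file's chunks in the embedding space; neither detail is stated in the text, and the function name is hypothetical.

```python
import math


def medoid(points):
    """Return the point with the smallest summed distance to all the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


# Hypothetical 2D chunk positions for one file.
print(medoid([(0, 0), (1, 0), (10, 0)]))
```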
Extra: Although we are confident about the algorithm’s predictions, there seems to be a problem with the RCBP and Bent-beak Riffraff recording locations on the map. As can be seen in Figure 17, the Bent-beak Riffraff recordings are in the new RCBP area. Does this mean Kasios faked the audio files’ locations? Or was the RCBP expelled from the NW area before 2015, and the recordings are really old? Without a bigger dataset and certainty about when the Kasios recordings were taken, it is not possible to conclude anything.
Our results:
· We found that only 3 instances belong to the RCBP. The exact distribution can be seen in Figure 12.
· The t-SNE map helps us confirm the classification suggested by the algorithm.
· Using Dash, we created another interactive dashboard that lets you analyze each recording chunk individually. Please refer to the video to see how it works.
File 8 example:

Figure 18: t-SNE map focused around File 8 area. RCBP in blue and test file using black X.
When File 8 is selected, the dashboard automatically populates the following figures:

Figure 19: File 8 Spectrogram analyses
This view compares the RCBP spectrogram with the File 8 medoid and the class suggested by the model. The dashboard then presents information about the frequency characteristics of the audio file:

Figure 20: Compared frequency characteristics of audio file vs expected distribution
Although they may seem similar, different frequency components are present in the two instances.
We conclude that we found evidence to affirm:
1. The data set provided by Kasios doesn’t support the claim of RCBP being found across the Preserve. Only three recordings belonged to Pipits.
2. As can be seen in Figure 12, the rest of the recordings belong to other species.
3 – Formulate a hypothesis
concerning the state of the Rose Crested Blue Pipit. What are your primary pieces of evidence to
support your assertion? What next steps
should be taken in the investigation to either support or refute the Kasios
claim that the Pipits are actually thriving across the Boonsong Lekagul
Wildlife Preserve? Please limit your
answer to 500 words.
Hypothesis:
The RCBP population is decreasing due to a forced migration and the need to fight for limited resources and territory because of the water contamination.
Our evidence, assuming recordings are good estimators of bird population, is:
· The RCBP population is decreasing.
· The RCBP migrated from the NE to a more central location.
· That area was mainly occupied by the Ordinary Snape. This presented a challenge for the two now-competing species.
· The increasing number of recordings during 2015/2017 can be explained as war cries during the territory competition.
· Both species survived, but in lower numbers because of fewer resources and less territory.
· A similar situation has been spotted in the SW of the Preserve. Three bird species are being affected: Orange Pine Plover, Lesser Birchbeere & Scrawny-Jay.
· Only 3 out of the 15 recordings provided by Kasios belong to the RCBP.
· Those 3 correctly matched recordings are located in an area different from the one where the RCBP currently is.
To confirm or refute our hypothesis, the Boonsong Lekagul Wildlife Preserve should:
1. Continue recording birds in exactly the same way as has been done so far.
2. Gather visual evidence of birds in the new RCBP living area.
3. Measure air quality to confirm it is healthy, knowing that the air in the Preserve was polluted in the past.
4. Evaluate water and/or air conditions in the SW of the Preserve. Kasios may also be dumping waste there. Repeat the experiment if so.
5. Analyze whether the recordings provided by Kasios are in fact new, or whether they are old RCBP recordings from before 2015, when some of the birds could be found in the NW side of the Preserve.
6. Limit the water supply of the new area and evaluate the behavior of the RCBP and Ordinary Snape.
6.1. If they migrate and the same trend of calls/songs is observed, we could conclude that water is a decisive factor and a proxy for bird population size.
6.2. If not, the nesting and food hypotheses should be tested by taking a similar approach.