Entry Name:  "UBA-Rukavina-MC1"

VAST Challenge 2018
Mini-Challenge 1

 

 

Team Members:

Sergio BANCHERO, Universidad de Buenos Aires, banserbase@gmail.com

Andrei RUKAVINA, Universidad de Buenos Aires, rukavina.andrei@gmail.com PRIMARY

 

Student Team:  YES

 

Tools Used:

1.      Tableau Desktop 2018.1

2.      Excel

3.      Python

4.      R

 

Approximately how many hours were spent working on this submission in total?

·        230

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete?

·        YES

 

Video

https://youtu.be/YTliLScMyLo

 

 

Questions

1 – Using the bird call collection and the included map of the Wildlife Preserve, characterize the patterns of all of the bird species in the Preserve over the time of the collection. Please assume we have a reasonable distribution of sensors and human collectors providing the recordings, so that the patterns are reasonably representative of the bird locations across the area. Do you detect any trends or anomalies in the patterns? Please limit your answer to 10 images and 1000 words.

We first plotted all bird vocalizations on a map to understand the historical location of each species by year. As the following figure shows, the Rose-Crested Blue Pipit (RCBP) was mostly located in the north-east of the Preserve until 2015. Based on contextual information from this Mini-Challenge and previous challenges, we know that this is when the waste dumping took place:

Figure 1: Map of bird location by year

If we compare the aggregated information for the RCBP between the periods 2012/2014 and 2015/2017 and calculate the variation in location, we find that it moved from the NE to a more central location. To confirm this, we created the following figure:

Figure 2: RCBP location change from 2012 to 2017
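
A period comparison of this kind can be sketched as follows. This is a minimal illustration and not the dashboard code: the `(x, y, year)` record layout, the grid cell size, and the function names are assumptions made for the example. It counts recordings per grid cell in each period, subtracts the two grids to obtain a variation map, and reports the net percentage change in recordings.

```python
from collections import Counter

def cell_counts(records, years, cell=20):
    """Count recordings per grid cell for the given years.

    records: iterable of (x, y, year) tuples in map coordinates (assumed layout).
    cell: grid cell size in map units (illustrative value).
    """
    counts = Counter()
    for x, y, year in records:
        if year in years:
            counts[(x // cell, y // cell)] += 1
    return counts

def variation(records, before, after, cell=20):
    """Return (per-cell difference, net percentage change) between two periods.

    Assumes the 'before' period contains at least one recording.
    """
    c_before = cell_counts(records, before, cell)
    c_after = cell_counts(records, after, cell)
    cells = set(c_before) | set(c_after)
    diff = {c: c_after[c] - c_before[c] for c in cells}
    total_before = sum(c_before.values())
    total_after = sum(c_after.values())
    net_pct = 100.0 * (total_after - total_before) / total_before
    return diff, net_pct

# Toy data (made up): two early recordings in one cell, three later ones in another.
toy = [(150, 160, 2013), (155, 165, 2014),
       (90, 100, 2015), (95, 105, 2016), (92, 101, 2017)]
diff, net_pct = variation(toy, before={2012, 2013, 2014}, after={2015, 2016, 2017})
```

Cells with a negative difference lost recordings between the two periods and cells with a positive difference gained them, which is the information a variation heatmap encodes with color.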

Figure 2 is based on a dashboard we built in Python using Dash. It will be shown several times for different combinations of birds. It consists of a line chart highlighting the trend of recordings for the selected bird during a specific time period. Below it are two charts of population movement: a variation heatmap on the left and a variation histogram on the right. The heatmap clearly shows a migration from the NE (in red) to the center of the Preserve (in blue). During the selected period the overall net variation of recordings increased by 76%. To understand whether that increase is an outlier, we created the following charts:

 

Figure 3: Historical trend for RCBP call recordings

This chart shows the historical Call recordings for the RCBP over the entire period (data before 2010 is ignored because it is incomplete). We see a total of 5 points above one standard deviation (STD) after the waste dump; statistically, these are not considered outliers. However, one point lies above two STD. It corresponds to December 2015. This possible outlier in the number of calls occurred after the dump, suggesting a change in RCBP behavior after migrating.
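
The threshold rule used here can be sketched as below. This is an illustration under assumptions: the monthly counts are made-up numbers, the function name is hypothetical, and the population standard deviation is used, whereas the original charts may have used the sample standard deviation.

```python
from statistics import mean, pstdev

def flag_spikes(counts, k=2.0):
    """Return the keys whose count exceeds mean + k * std of all counts."""
    values = list(counts.values())
    threshold = mean(values) + k * pstdev(values)
    return [key for key, value in counts.items() if value > threshold]

# Toy monthly counts (made-up numbers, not the challenge data).
monthly = {"2015-09": 4, "2015-10": 5, "2015-11": 4, "2015-12": 30,
           "2016-01": 5, "2016-02": 4, "2016-03": 5, "2016-04": 4}
spikes_2std = flag_spikes(monthly, k=2.0)
spikes_1std = flag_spikes(monthly, k=1.0)
```

With `k=1.0` the rule marks points that are merely high, while `k=2.0` keeps only the candidates for outliers.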

If we evaluate Songs, a similar scenario is found:

 

Figure 4: Historical trend for RCBP song recordings

Again, we see two points above two STD. We conclude there are spikes in both vocalization types, but we need contextual information to understand why they are taking place.

To do so, we first compared the RCBP with all the other birds, considering both the Calls and Songs vocalization types:

Figure 5: Sparklines for all birds calls & songs

We see that before 2015 the majority of the species were stable. However, after the dump event, we see spikes for the RCBP and Ordinary Snape.

 

Experts in bird behavior suggest that calls are heard when birds:

1.      Want to stay in contact with friendly birds

2.      Are engaged in territorial aggression

3.      Are alarmed by the presence of predators.

Note: Increasing intensity marks the progression from 1) to 3).

Also, they sing when they:

1.      Demonstrate territory

2.      Attract a mate

3.      Protect their area from rival birds

4.      Are not alarmed by a predator

 

So, if we see two different species of birds increasing their number of calls and songs while in the same area and on the same territory, we can infer they are competing for resources and space. The cycles should alternate: first we should see spikes in songs and later in calls.

This is why we decided to evaluate both species mentioned above together:

Figure 6: Number of Calls/Songs for RCBP & Ordinary Snape

As can be seen in Figure 6, the trend suggests the two birds started to contend for the same territory after the date of the dump. The time axis is expressed in quarters because bird recordings show a strong seasonality effect due to mating (spring) and breeding (summer). The average number of calls for the RCBP increased after 2015, right in between two song spikes. This could be the cycle we are looking for.
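
A quarterly aggregation like the one used for the time axis can be sketched as follows. The `(species, vocalization_type, date)` record layout and the function names are assumptions made for this example, and the records are toy data.

```python
from collections import Counter
from datetime import date

def quarter_label(d):
    """Map a date to a label such as '2015Q1'."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"

def quarterly_counts(recordings):
    """Count recordings per (species, vocalization type, quarter).

    recordings: iterable of (species, vocalization_type, date) tuples (assumed layout).
    """
    counts = Counter()
    for species, voc_type, d in recordings:
        counts[(species, voc_type, quarter_label(d))] += 1
    return counts

# Toy records (made up, not the challenge data).
toy = [
    ("RCBP", "song", date(2015, 2, 10)),
    ("RCBP", "song", date(2015, 3, 1)),
    ("RCBP", "call", date(2015, 5, 20)),
    ("Ordinary Snape", "call", date(2015, 6, 2)),
]
counts = quarterly_counts(toy)
```

Grouping by quarter keeps each mating and breeding season in a single bin, so a song spike followed by a call spike shows up as consecutive quarters rather than as noise across months.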

Before drawing any conclusions about the RCBP trend, we first have to analyze its share of the recordings, in other words, its percentage of total recordings:

 

Figure 7: Area chart for all birds between 2010/2017 for Calls & Songs

In the previous figure we can see how the percentage of Call & Song recordings is distributed by species. Once more, we see an interesting spike for the RCBP as well as for the Queenscoat & Orange Pine Plover. Interestingly, the last two belong to other areas of the Preserve, the W and SW respectively. To understand this better, we needed to drill down into the Vocalization Type again:

 

Figure 8: Trends by Vocalization grouping other. Calls/Songs instances are excluded

It can be noted from the previous figure that:

 

·        While the percentage of calls for other bird species remains constant (Slope = 0), the RCBP shows a negative trend (Slope < 0).

·        Although the negative trend is mostly due to the overall low number of recordings after 2016 for both calls and songs, there is a decrease in RCBP observations in that period.

·        Spikes for Songs are present in 2015 Q1 & 2016 Q1, before each local maximum for Calls. Experts interpret this as a territorial challenge cycle, in which one species opposes another for resources or territory.
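
The share and slope figures discussed above can be computed along these lines. This is a sketch with made-up counts and hypothetical function names; the least-squares fit against the period index is an assumption about how the slope was obtained.

```python
def shares(species_counts, total_counts):
    """Percentage of total recordings attributed to one species, per period."""
    return [100.0 * s / t for s, t in zip(species_counts, total_counts)]

def ols_slope(ys):
    """Least-squares slope of ys against the period index 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# Toy counts per period (made up): one species against all recordings.
rcbp = [40, 30, 20, 10]
total = [100, 100, 100, 100]
rcbp_share = shares(rcbp, total)
slope = ols_slope(rcbp_share)
```

A slope near zero corresponds to a stable share, while a negative slope means the species accounts for a shrinking part of the recordings over time.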

 

To conclude, we found evidence that:

1.      The RCBP is migrating from the NE to the center.

2.      The Ordinary Snape was occupying the area to which the RCBP migrated.

3.      The competition for resources and territory explains the spike in recordings for both species.

4.      After both species occupied the same space, the number of recordings decreased.

 

Extra: During our analysis we noticed a similar situation taking place on the SW side of the Preserve. Consider, for example, the Scrawny-Jay in the density dashboard:

 

Figure 9: Similar behavior observed on the SW of the preserve. Scrawny-Jay example.

We saw that during the selected period the overall net variation of recordings decreased by 40%.

For all birds:

 

Figure 10: Similar behavior observed on the SW of the preserve. All birds

It is clear from Figure 10 that something is happening in the SW (red area) of the Preserve that is driving the birds out. This may be another waste dump or some other type of hazard forcing the birds to move away from there.

 

2 – Turn your attention to the set of bird calls supplied by Kasios. Does this set support the claim of Pipits being found across the Preserve?  A machine learning approach using the bird call library may help your investigation. What is the role of visualization in your analysis of the Kasios bird calls?   Please limit your answer to 10 images and 1000 words.

 

We followed a two-step process to classify and visualize the results of the Kasios audio files.

 

Firstly, using Python we processed all the audio files in the following way:

 

1.      Preprocessing.

1.1.   Sampling frequencies were normalized to 22,050 Hz, given that the majority of the files had that sampling frequency. We took care to avoid aliasing when the original sampling frequency was higher.

2.      Normalizing Samples:

2.1.   We sliced each file into 30 randomly placed chunks of 5 seconds each, allowing overlap.

2.2.   We calculated the Mel-spectrogram. Mel is a frequency scale that approximates how humans perceive pitch.

2.3.   If the energy during those 5 seconds was below -75 dB, we considered the chunk to be silence and classified it as such, adding a new class.

3.      Training/Testing Split:

3.1.   We did an 80-20% random split of the provided bird recordings.

4.      We then trained a convolutional neural network on the generated Mel-spectrogram features, which have a dimensionality of 128x216 (frequency resolution times a ~5 s window).

5.      After that, we classified the audio files from Kasios.

5.1.   The final class was decided by the majority class of all the chunks for each file without considering silences.
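
The chunking, silence, and voting steps above can be sketched as follows. This is a minimal illustration and not the pipeline we ran: the helper names are hypothetical, the energy is computed as mean power relative to full scale (an assumption), and the hop length of 512 samples is an assumed value under which a 5 s chunk at 22,050 Hz yields 216 frames, consistent with the 128x216 dimensionality mentioned in step 4.

```python
import math
import random
from collections import Counter

SAMPLE_RATE = 22_050   # Hz, from step 1.1
CHUNK_SECONDS = 5      # from step 2.1
N_CHUNKS = 30          # from step 2.1
SILENCE_DB = -75.0     # from step 2.3
HOP_LENGTH = 512       # assumption: a common STFT hop size

def random_chunk_starts(n_samples, rng, n_chunks=N_CHUNKS):
    """Pick random, possibly overlapping, chunk start offsets (in samples)."""
    chunk_len = SAMPLE_RATE * CHUNK_SECONDS
    last_start = max(n_samples - chunk_len, 0)
    return [rng.randint(0, last_start) for _ in range(n_chunks)]

def energy_db(samples):
    """Mean power of a chunk in dB, for samples scaled to [-1, 1] (assumed definition)."""
    power = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(power) if power > 0 else float("-inf")

def is_silence(samples):
    """Step 2.3: a chunk below the threshold gets the extra 'silence' class."""
    return energy_db(samples) < SILENCE_DB

def n_frames(n_samples, hop=HOP_LENGTH):
    """Frame count of a centered STFT: 1 + floor(n_samples / hop)."""
    return 1 + n_samples // hop

def file_label(chunk_labels, silence="silence"):
    """Step 5.1: majority class over a file's chunks, ignoring silent chunks."""
    votes = Counter(label for label in chunk_labels if label != silence)
    return votes.most_common(1)[0][0] if votes else silence

starts = random_chunk_starts(60 * SAMPLE_RATE, random.Random(0))
frames = n_frames(SAMPLE_RATE * CHUNK_SECONDS)
label = file_label(["silence", "RCBP", "silence", "RCBP", "Qax", "silence"])
```

Excluding the silence class from the vote prevents a recording with long quiet stretches from being labeled as silence when a minority of its chunks contain a clear vocalization.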

 

The result can be seen in the following confusion matrix:

 

Figure 11: Confusion Matrix

Regarding the files provided by Kasios, the classification results can be found in the following figure:

 


Figure 12: Classification Results

·        Bent-beak Riffraff: Files 1, 6, 11, and 15.

·        RCBP: Files 2, 9, and 13.

·        Orange Pine Plover: Files 10 and 12.

·        Bombadil: Files 3 and 4.

·        Lesser Birchbeere: File 8.

·        Qax: File 7

·        Canadian Cootsmum: Files 5 and 14.

 

Secondly, we represented the audio files in a 2D scatterplot using t-SNE. For that, we executed the following steps:

 

1.      Filtering:

1.1.   We kept only class-A type of files.

1.2.   We used both training library and Kasios files together.

2.      Preprocessing:

2.1.   Filter to remove frequencies with low energy or variation.

Going from:

Figure 13: File 12 original audio file spectrum.

To:

Figure 14: File 12 without noise spectrum.

3.      Centralizing Samples:

In order to isolate the bird sounds, we:

3.1.   Scanned along the time dimension to find the maximum amplitude at each time step. (Blue line in Figure 15)

3.2.   Once all maximums were found, applied two triangular convolutions to smooth the result. (Orange line in Figure 15)

3.3.   Then found the local maximums of that smoothed line that lie above the 80th percentile of the total amplitude.

Figure 15: Centralizing process steps.

3.4.   We located the maximums (green lines in Figure 15). Centering on each of them, we took a window extending 1.486 seconds on either side, 2.972 seconds in total.

Figure 16: Centralized bird call for first chunk of file 12. Original above and pre-processed below.

4.      Using a convolutional auto-encoder with 64 latent dimensions we translated the resulting data set from step 3 into a 64-feature dataset for all the samples.

5.      Finally, we used t-SNE to create a 2D map that separates the classes as much as possible.

6.      Using data from step 5, we created a t-SNE plot and then added two area overlays, using a Gaussian kernel density estimator, that enclose 90% and 75% of the points of the same class.
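
The centralizing procedure in step 3 can be sketched as below. This is a simplified illustration, not the code we ran: the function names, the kernel half-width, and the nearest-rank percentile are assumptions, and the threshold is taken on the smoothed curve, which is one possible reading of step 3.3.

```python
def max_envelope(spectrogram):
    """Step 3.1: maximum amplitude in each time frame.

    spectrogram: list of time frames, each a list of magnitudes (assumed layout).
    """
    return [max(frame) for frame in spectrogram]

def triangular_smooth(values, half_width=1):
    """Convolve with a normalized triangular kernel; edges use the available samples."""
    offsets = range(-half_width, half_width + 1)
    weights = [half_width + 1 - abs(k) for k in offsets]
    out = []
    for i in range(len(values)):
        acc = norm = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):
                acc += w * values[j]
                norm += w
        out.append(acc / norm)
    return out

def percentile(values, q):
    """Nearest-rank percentile for an integer q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]

def find_centers(envelope, half_width=1, q=80):
    """Steps 3.2-3.3: smooth twice, then keep local maxima above the q-th percentile."""
    smooth = triangular_smooth(triangular_smooth(envelope, half_width), half_width)
    threshold = percentile(smooth, q)
    return [i for i in range(1, len(smooth) - 1)
            if smooth[i - 1] < smooth[i] >= smooth[i + 1] and smooth[i] > threshold]

def window_bounds(center, half_len, n):
    """Step 3.4: clipped [start, stop) bounds of a window centered on a peak."""
    return max(0, center - half_len), min(n, center + half_len)

# Toy envelope (made up) with two bursts of activity.
toy_envelope = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0, 0, 0, 1, 4, 1, 0, 0, 0, 0, 0]
centers = find_centers(toy_envelope)
```

Smoothing before peak picking merges the many small maxima inside a single vocalization into one peak, so each window is centered on a whole call rather than on a transient.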

 

Figure 17: T-sne map vs geographical map with predicted classes for Kasios test files

Please note that each point in the t-SNE map represents a chunk of a given file. For the Kasios files, only the medoids are shown. We did this in order to validate the classification.
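
A medoid of a file's chunks can be computed as in the sketch below. The function name and the toy coordinates are assumptions, and so is the choice of Euclidean distance in the 2D map; the same idea applies if the distances are measured in the 64-dimensional latent space instead.

```python
import math

def medoid(points):
    """Return the point with the smallest total Euclidean distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Toy 2D coordinates (made up) for the chunks of one file, including one stray chunk.
chunks_2d = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
center = medoid(chunks_2d)
```

Unlike a mean, the medoid is always one of the actual chunks, so the representative point can be inspected and listened to, and a stray chunk such as `(10.0, 10.0)` does not drag it away from the main group.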

 

Extra: Although we are confident about the algorithm's predictions, there seems to be a problem with the RCBP and Bent-beak Riffraff recording locations on the map. As can be seen in Figure 17, the Bent-beak Riffraff recordings are in the new RCBP area. Does this mean Kasios faked the location of the audio files? Or was the RCBP expelled from the NW area before 2015 and the recordings are really old? Without a bigger dataset and certainty about when the Kasios recordings were taken, it is not possible to conclude anything.

 

Our results:

 

·        We found that only 3 instances belong to the RCBP. The exact distribution can be seen in Figure 12.

·        The t-SNE map helps us confirm the classification suggested by the algorithm.

·        Using Dash, we created another interactive dashboard that lets you analyze each recording chunk individually. Please refer to the video to see how it works.

 

File 8 example:

 

Figure 18: T-sne map focused around File 8 area. RCBP in blue and test file using black X.

When File 8 is selected the dashboard automatically populates the following figures:

Figure 19: File 8 Spectrogram analyses

These figures compare the RCBP spectrogram with the File 8 medoid and the class suggested by the model. The dashboard then presents information about the frequency characteristics of the audio file:

 

Figure 20: Compared frequency characteristics of audio file vs expected distribution

Although they may seem similar, the two instances contain different frequency components.

 

We conclude we were able to find evidence to affirm:

 

1.      The data set provided by Kasios doesn't support the claim of RCBP being found across the Preserve. Only three recordings belonged to Pipits.

2.      As can be seen in Figure 12, the rest of the recordings belong to other species.

 

3 – Formulate a hypothesis concerning the state of the Rose Crested Blue Pipit.  What are your primary pieces of evidence to support your assertion?  What next steps should be taken in the investigation to either support or refute the Kasios claim that the Pipits are actually thriving across the Boonsong Lekagul Wildlife Preserve?  Please limit your answer to 500 words.

 

Hypothesis:

 

The RCBP population is decreasing due to a forced migration and the need to compete for limited resources and territory because of the water contamination.

 

Our evidence, assuming recordings are good estimators of bird population, is:

 

·        RCBP population is decreasing.

·        The RCBP migrated from the NE to a more centric location.

·        That area was mainly occupied by the Ordinary Snape. This presented a challenge for the two now-competing species.

·        The increasing number of recordings during 2015/2017 can be explained as war cries during the territory competition.

·        Both species survived, but in lower numbers because of reduced resources and territory.

·        A similar situation has been spotted in the SW of the Preserve. Three species of bird are being affected: Orange Pine Plover, Lesser Birchbeere & Scrawny-Jay.

·        Only 3 out of the 15 recordings provided by Kasios belong to the RCBP.

·        Those 3 correctly matched recordings are located in an area different from the one where the RCBP currently is.

 

In order to confirm or refute our hypothesis, the Boonsong Lekagul Wildlife Preserve should:

 

1.      Continue recording birds in exactly the same way as has been done so far.

2.      Gather visual evidence of birds in the new RCBP living area.

3.      Measure air quality to confirm it is healthy, knowing that the air in the Preserve was polluted in the past.

4.      Evaluate water and/or air conditions in the SW of the Preserve. It is possible that Kasios is also dumping waste there. Repeat the experiment if so.

5.      Analyze whether the recordings provided by Kasios are in fact new or whether they are old RCBP recordings from before 2015, when some of the birds could be found on the NW side of the Preserve.

6.      Limit the water supply of the new area and evaluate the behavior of the RCBP and Ordinary Snape.

6.1.   If they migrate and the same trend of calls/songs is observed, we could conclude water is a decisive factor and proxy to bird population size.

6.2.   If not, the nesting and food hypotheses should be tested by taking a similar approach.