Colin Scruggs, Southwestern University, scruggsc@southwestern.edu PRIMARY
Cameron Henkel, Southwestern University, henkelc@southwestern.edu
Dr. Chad Stolper, Southwestern University, stolperc@southwestern.edu
Student Team: YES
Excel, OpenRefine
Tableau
Vue.js, D3.js, Leaflet.js, Wavesurfer.js, and vue-slider-component
Python 3 with TensorFlow, Keras, and Librosa
Colaboratory
Approximately how many hours were spent working on this submission in total?
280
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete?
YES
Video: https://vimeo.com/279573533
Live demo: https://chadstolper.github.io/vast-challenge-2018-mc1/
Questions
1 – Using the bird call collection and the included map of the Wildlife Preserve, characterize the patterns of all of the bird species in the Preserve over the time of the collection. Please assume we have a reasonable distribution of sensors and human collectors providing the recordings, so that the patterns are reasonably representative of the bird locations across the area. Do you detect any trends or anomalies in the patterns? Please limit your answer to 10 images and 1000 words.
After taking note of errors and nonstandard pieces of data found in the dataset, we utilized OpenRefine to make the dataset more readily usable. This process included standardizing date and time formats (like ‘13.5’ or ‘9.10am’) and making reasonable approximations for ambiguous time entries (such as ‘09.03m’ or ‘Dawn’). With a refined dataset, we began by examining the temporal aspect of the bird recordings.
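The cleanup itself was done in OpenRefine, but the same kind of normalization can be sketched in Python. The snippet below is only an illustration: the function name `normalize_time`, the choice of 06:00 for ‘Dawn’, the reading of a single trailing digit as tens of minutes (‘13.5’ as 13:50), and the decision to ignore a stray ‘m’ suffix are all assumptions made for this example, not the approximations we actually applied.

```python
import re
from typing import Optional

# Assumed stand-in for vague entries; the real approximation may differ.
APPROXIMATE_TIMES = {"dawn": "06:00"}

# Hour, a ':' or '.' separator, minutes, and an optional am/pm (or stray 'm') suffix.
TIME_PATTERN = re.compile(r"^(\d{1,2})[:.](\d{1,2})\s*(am|pm|m)?$")


def normalize_time(raw: str) -> Optional[str]:
    """Return the entry as 'HH:MM', or None when it cannot be interpreted."""
    text = raw.strip().lower()
    if text in APPROXIMATE_TIMES:
        return APPROXIMATE_TIMES[text]

    match = TIME_PATTERN.match(text)
    if match is None:
        return None  # e.g. '?:?' stays a null entry

    hour = int(match.group(1))
    minute_text = match.group(2)
    # Assumption: a single digit such as the '5' in '13.5' means 50 minutes.
    minute = int(minute_text) * 10 if len(minute_text) == 1 else int(minute_text)

    suffix = match.group(3)
    if suffix == "pm" and hour < 12:
        hour += 12
    elif suffix == "am" and hour == 12:
        hour = 0
    # A bare 'm' suffix is ambiguous, so the hour is left unchanged.

    if hour > 23 or minute > 59:
        return None
    return f"{hour:02d}:{minute:02d}"
```

Entries that return None would be treated as null and left out of the hourly analysis.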
Not accounting for null entries (e.g. ‘?:?’), we discovered that the majority of bird species found in the Preserve were recorded between 6am and 12pm, as shown in Fig. 1.1 below.
Fig. 1.1 Number of recordings across each hour of the day (created in Tableau)
This visualization also reveals that, unlike most other species, the Canadian Cootamum was recorded most often in the evening, between 6 and 9pm.
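The hourly counts behind a chart like this were produced in Tableau; a rough Python equivalent is sketched below. It assumes each record is a `(species, time)` pair with the time already normalized to 'HH:MM' (or None for null entries). The function names are hypothetical.

```python
from collections import Counter, defaultdict


def recordings_per_hour(records):
    """Count recordings per species and hour of day, skipping null times."""
    counts = defaultdict(Counter)
    for species, time_text in records:
        if time_text is None:
            continue  # null entries such as '?:?' are not counted
        hour = int(time_text.split(":")[0])
        counts[species][hour] += 1
    return counts


def peak_hour(counts, species):
    """Return the hour of day with the most recordings for one species."""
    return counts[species].most_common(1)[0][0]
```

Comparing `peak_hour` across species is one way to flag a species whose activity peaks outside the morning window.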
We then turned our focus towards examining the number of records across each month of the year.
Fig. 1.2 Number of recordings across each month of the year (created in Tableau)
The first 12 species listed in Figure 1.2 were most often recorded in the Preserve through the spring and early summer (March through June). Of those, four species have few or no recordings from late fall through winter (September through January), which may suggest they migrate during this time.
While most of the remaining seven species display a fairly even number of recordings from month to month, the Carries Champagne Pipit deviates with a discernible peak from October through December.
Fig. 1.3 Carries Champagne Pipit monthly recording distribution. (created in Tableau)
We then shifted focus to the geographic locations of the recordings. An integral part of our visualization application generates heatmaps for each species overlaid on the provided map of the Preserve; using this feature, we discovered fairly well-defined similarities between certain species.
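We compared the heatmaps visually. As a hedged illustration of how such a comparison could be made numeric, the sketch below bins recording coordinates into grid cells and measures how many occupied cells two species share. The grid cell size, the function names, and the use of Jaccard overlap are assumptions for this example and are not part of our application.

```python
from collections import Counter


def bin_recordings(points, cell_size):
    """Count recordings per grid cell; points are (x, y) map coordinates."""
    grid = Counter()
    for x, y in points:
        grid[(int(x // cell_size), int(y // cell_size))] += 1
    return grid


def overlap_score(grid_a, grid_b):
    """Jaccard overlap of the cells occupied by two species (0.0 to 1.0)."""
    cells_a, cells_b = set(grid_a), set(grid_b)
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0
```

A score near 1.0 would indicate two species recorded in largely the same cells, while a score near 0.0 would indicate little shared territory.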
One of the first similarities we discovered concerned a group of species whose recordings are concentrated in two general clusters in the northwestern region of the preserve.
Fig. 1.4 Four species with similar NW/W clusters (created with Leaflet)
Three species, while located in different sections of the map from one another, all feature a single tight-knit gathering of recordings with a few outliers.
Fig. 1.5 Heatmaps of three species with similar single clusters and outliers (created with Leaflet)
In contrast with these species, which have typically been recorded in more concentrated areas, we also identified two species that share a very similar, widely spread distribution across the Preserve, shown in Figure 1.6. Despite the large area they cover, both share three distinct regions of concentration.
Fig. 1.6 Similar wide distributions w/ concentrated areas (created with Leaflet)
Like the species in the figure above, we found that the Lesser Birchbeere and Orange Pine Plover populations resemble one another. Their heatmaps show that they have both been recorded in two more densely concentrated areas over time. Additionally, their outliers share a similar region on the map.
Fig. 1.7 Two species with comparable geographic locations based on the recording metadata (created with Leaflet)
When looking into geographic data with regard to time, we detected a movement and decrease in the Rose-crested Blue Pipit population. Our heatmap visualizer revealed that, past July of 2014, no recordings of the Blue Pipit were taken in the region near the dump site where they were historically found. A concentration of Blue Pipits in the northeastern region (right on top of the Kasios dump site) stopped appearing in recordings, as shown in Figure 1.8.
Fig. 1.8 Rose-crested Blue Pipit population shift away from area near dump site (created with Audio Explorer)
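A before/after comparison like this one can be sketched as a simple filter on date and distance. In the snippet below, the record layout `(date, x, y)`, the site coordinates, and the radius are all hypothetical parameters; a cutoff of 1 August 2014 would correspond to "past July of 2014".

```python
import math
from datetime import date


def count_near_site(records, site_xy, radius, cutoff):
    """Count recordings within `radius` of a site, split at a cutoff date.

    records: iterable of (date, x, y) tuples in map coordinates.
    Returns (count_before_cutoff, count_on_or_after_cutoff).
    """
    before = after = 0
    for day, x, y in records:
        if math.hypot(x - site_xy[0], y - site_xy[1]) > radius:
            continue  # too far from the site to count
        if day < cutoff:
            before += 1
        else:
            after += 1
    return before, after
```

A result with a nonzero first count and a zero second count would match the pattern visible in the heatmaps.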
Finally, we visualized how the number of recordings for each species changes over the years. We concluded that each species exhibits one of three trends: 1) the number of recordings stays fairly consistent with a gradual increase, 2) the number of recordings rises sharply over the years, or 3) the number of recordings rises to a peak and then declines.
The first category includes the Qax, Bent-beak Riffraff, Blue-collared Zipper, Bombadil, Broad-winged Jojo, Canadian Cootamum, Carries Champagne Pipit, Green-tipped Scarlet Pipit, Eastern Corn Skeet, Lesser Birchbeere, Ordinary Snape, Pinkfinch, Purple Tooting Tout, Scrawny Jay, and the Vermillion Trillian, all with reasonably stable numbers of recordings. The second category is made up of the Queenscoat, Orange Pine Plover, and Darkwing Sparrow, whose recording numbers exhibit a drastic, sudden increase. The Rose-crested Blue Pipit is the only species that has experienced a jump in recordings followed by a steady decrease.
Fig. 1.9 Change in number of recordings over the years for three species (created in Tableau)
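We assigned these categories by inspecting the charts. A rule-based version of the same three-way split might look like the sketch below; the thresholds `sharp_ratio` and `decline_ratio` are arbitrary illustrative values, not ones we derived from the data, and a partial final year should be dropped before calling the function.

```python
def classify_trend(yearly_counts, sharp_ratio=3.0, decline_ratio=0.6):
    """Assign a list of yearly recording counts to one of three trend labels."""
    peak = max(yearly_counts)
    peak_index = yearly_counts.index(peak)
    first, last = yearly_counts[0], yearly_counts[-1]

    # The peak is not in the final year, and the series has since fallen well below it.
    peaked_early = peak_index < len(yearly_counts) - 1
    if peaked_early and peak > first and last <= decline_ratio * peak:
        return "peak then decline"

    # The final year is several times larger than the first.
    if last >= sharp_ratio * max(first, 1):
        return "sharp rise"

    return "stable or gradual increase"
```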
Taking a closer look at the Rose-crested Blue Pipit’s population, we found that the number of recordings has been on a downward trend between 2014 and 2017 (excluding 2018 as it is not a full year’s worth of data).
Fig. 1.10 The Rose-crested Blue Pipit’s population has been on a downward trend since 2014. (created in Tableau)
It’s also worth noting that the three recordings of this species in 2018 are all outliers.
2 – Turn your attention to the set of bird calls supplied by Kasios. Does this set support the claim of Pipits being found across the Preserve? A machine learning approach using the bird call library may help your investigation. What is the role of visualization in your analysis of the Kasios bird calls? Please limit your answer to 10 images and 1000 words.
To analyze and classify the 15 audio files Kasios provided, we used Python in Google’s Colaboratory to build several machine learning models based on recurrent neural networks (more specifically, long short-term memory and gated recurrent unit networks). Our .ipynb notebook file can be found on our GitHub page. The models were trained on the ornithologists’ database of bird audio of quality C and higher; however, the audio had to be preprocessed before it could be translated into machine-readable data. For each audio file, we used a noise gate to isolate the actual bird audio from as much background/ambient noise as possible, then normalized the overall dB level using the Python library Librosa. After this processing, the audio’s spectral data was converted into mel-frequency cepstral coefficients (MFCCs, commonly used for speech recognition), which were fed into the algorithm for training and validation. It is worth noting that we did not account for vocalization type when training our models. Once several models were trained on the ornithologists’ database (after tuning the parameters for accuracy), the same preprocessing and feature extraction were applied to the Kasios audio files to produce predictions of their species classification.
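To make the two preprocessing steps concrete, here is a minimal, dependency-free sketch of a frame-based noise gate followed by peak normalization. It is not the code from our notebook: the frame size, the threshold, the function names, and the use of plain Python lists instead of Librosa/NumPy arrays are assumptions chosen to keep the example self-contained.

```python
import math


def noise_gate(samples, frame_size, threshold_db):
    """Silence frames whose RMS level (dB relative to 1.0) is below the threshold."""
    gated = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db >= threshold_db:
            gated.extend(frame)  # loud enough: keep the frame as-is
        else:
            gated.extend([0.0] * len(frame))  # treat as background noise
    return gated


def peak_normalize(samples, target_peak=1.0):
    """Scale the signal so that its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # an all-silent signal cannot be scaled
    scale = target_peak / peak
    return [s * scale for s in samples]
```

In a Librosa-based pipeline, the gated and normalized signal would then be passed to a feature extractor such as `librosa.feature.mfcc` before being fed to the recurrent models.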
We found that Kasios’ dataset does not support their claim. Out of the 15 recordings provided, only 3 were clearly identifiable as Rose-crested Blue Pipits, not only by our machine learning predictions but also using the audio playback and spectrogram features of our audio analyzer tool.
Visualizing the spectrograms of both the Kasios recordings and the ornithologists’ recordings was a useful means of comparison and validation when examining the results of our machine learning experiments. Additionally, our tool places a marker on the map at the location where Kasios claims each recording was taken. This allowed us to further assess the plausibility of our predictions, as well as uncover any geographic anomalies.
Fig. 2.1 Rose-crested Blue Pipit and Kasios recordings #2, #9, and #13. These three Kasios recordings were predicted to be of the Pipit and visual inspection of their spectrograms confirms this. (created with Audio Explorer)
Interestingly, the locations of these three recordings do not correspond with the historical geographic data for the species.
We also identified three Kasios recordings as Bent-beak Riffraffs. Like the Rose-crested recordings, visual analysis of the spectral data (look for the ‘h’ patterns on the spectrograms in Figure 2.2) confirmed our machine learning predictions, and the locations deviated from where we’ve historically heard Bent-beaks. We are led to believe that either Kasios’ location data is inaccurate for these two species or they have moved from where we would typically expect to hear them, almost swapping places (i.e. the Kasios audio of Bent-beaks was recorded where Rose-crested are concentrated and vice versa).
Fig. 2.2 Bent-beak Riffraff and Kasios recordings #1, #11, and #15. These three Kasios recordings were predicted to be of the Bent-beak and visual inspection of their spectrograms shows the unique ‘h’ pattern of the chirp present in all. (created with Audio Explorer)
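We judged these location mismatches by eye from the map markers and heatmaps. One simple way to quantify such a mismatch is sketched below: measure the distance from a claimed location to the nearest historical recording of the predicted species. The function names and the `max_distance` cutoff are hypothetical and would need to be chosen in map units.

```python
import math


def distance_to_nearest(claimed_xy, historical_points):
    """Distance from a claimed location to the closest historical recording."""
    return min(
        math.hypot(claimed_xy[0] - x, claimed_xy[1] - y)
        for x, y in historical_points
    )


def is_location_atypical(claimed_xy, historical_points, max_distance):
    """Flag a claimed location that lies farther than max_distance from any past recording."""
    return distance_to_nearest(claimed_xy, historical_points) > max_distance
```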
Six more Kasios recordings were predicted as four additional species. Unlike the Rose-crested or Bent-beak recordings, the location data for these more closely matched where their predicted species have normally been found over time.
Fig. 2.3 Bombadil and Kasios recordings #3 and #4. The spectrograms confirm the predictions. (created with Audio Explorer)
Fig. 2.4 Green-tipped Scarlet Pipit and Kasios recording #6. The low-range waves in the spectrograms confirm the prediction. (created with Audio Explorer)
Fig. 2.5 Lesser Birchbeere and Kasios recording #8. The unique harmonic patterns in the spectrograms confirm the prediction. (created with Audio Explorer)
Fig. 2.6 Orange Pine Plover and Kasios recordings #10, #12. The very low chirp patterns in the spectrograms confirm the predictions. (created with Audio Explorer)
Finally, our model proved unable to accurately assign a species to three of the Kasios recordings. After further analysis we discovered two of those recordings were highly repetitive and not similar in timbre or in spectral form to any of the species in the historical database. We attempted to corroborate these recordings’ locations against any species that shared a similar geographic distribution, to no avail.
We believe this may be because these two recordings are not of any species found in the Preserve, which casts serious doubt on Kasios’ claim.
Fig. 2.7 Spectrograms of Kasios recordings #5 and #14, which neither our machine learning models nor our manual analysis could link to a species found in the Preserve. (created with Audio Explorer)
We were able to find a match between the final unidentified Kasios recording and a species within the Preserve. We manually identified two audio recordings that contained the same unique chirps as those found in Kasios file #7. Recordings 311816 and 325435 are both classified as Canadian Cootamum in the database, but their calls are audibly distinct from the rest of that species’ recordings. Our machine learning models did not learn to pick up this specific spectral pattern because it did not occur within a significant portion of the Canadian Cootamum audio.
Fig. 2.8 Spectrograms for Kasios recording #7 and the two audio files from the ornithological database that match it. The spectrograms confirm the match. (created with Audio Explorer)
3 – Formulate a hypothesis concerning the state of the Rose Crested Blue Pipit. What are your primary pieces of evidence to support your assertion? What next steps should be taken in the investigation to either support or refute the Kasios claim that the Pipits are actually thriving across the Boonsong Lekagul Wildlife Preserve? Please limit your answer to 500 words.
The Rose-crested Blue Pipit population is both declining in number and shifting geographically within the Preserve. We can see this in the downward trend of recordings since a peak in 2015 (Fig. 1.10) and the disappearance of a northeastern concentration of recordings after July 2014 (Fig. 1.8). While recording numbers aren’t a precise measure of overall population, it’s reasonable to assume they are connected (especially considering Kasios’ claim relies on this connection). As shown in Figure 1.8, no Pipits have been recorded in the region surrounding the dump site since mid-2014. Interestingly, Pipit recordings in a small area directly south of the dump site began appearing around this time. This suggests the birds have been affected and have moved. The spike in recording numbers found in 2015 could be explained by the population’s increased density after abandoning the region they used to inhabit around the dump site. That said, the recording numbers have still experienced a general decline, which indicates the overall Rose-crested Blue Pipit population is decreasing.
Regarding the investigation, we recommend that Kasios publish more details about their 15 audio clips, like the date and time of recording, as well as the methods used to produce them. This may clarify why certain recordings have atypical locations compared to where their respective species have been historically found in the Preserve (refer to Figures 2.1 and 2.2), or reveal any possible fabrication of data provided as ‘evidence.’ This is especially pertinent considering the strange results from audio files #5 and #14, which do not appear to be birds found in the Preserve.
We’d also propose instituting new methods of keeping track of the species populations in addition to audio recordings. For example, the Preserve’s rangers or ornithologists could begin keeping logs of when they spot each species and combine this with data from wildlife camera installations. A comprehensive survey of the Preserve’s species is required in order to definitively address this problem and prevent further ecological issues in the future.