Entry Name:  VAST of Hornets

VAST Challenge 2018

Mini-Challenge 1

 

 

Team Members:

Aaron Miller, California State University, Sacramento, aaronjeromemiller@csus.edu

Andrew Tran, California State University, Sacramento, andrewtran@csus.edu

Anna Baynes, California State University, Sacramento, abaynes@ecs.csus.edu

Cyrill Castro, California State University, Sacramento, cyrillcastro@csus.edu

Sameera Dasu, California State University, Sacramento, yaminidasu@csus.edu

Student Team:  NO

Tools Used:

 

Approximately how many hours were spent working on this submission in total?

40 Hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? Yes

 

Video

 

 https://drive.google.com/file/d/1ORNUKzreF2syGmJZpYZn65vC8NMkSMqI/view?usp=sharing

 

Questions

1 – Using the bird call collection and the included map of the Wildlife Preserve, characterize the patterns of all of the bird species in the Preserve over the time of the collection. Please assume we have a reasonable distribution of sensors and human collectors providing the recordings, so that the patterns are reasonably representative of the bird locations across the area. Do you detect any trends or anomalies in the patterns? Please limit your answer to 10 images and 1000 words.

Looking at the bird call collection data from Mistford, a few patterns are clear in the distribution and number of recordings. The most apparent is the increase in the number of recordings over time. The frequency table of recording years (using the subset of entries that include this information and excluding the incomplete data for 2018) shows a roughly exponential pattern of growth.

This increase in recordings may reflect a general increase in the bird population over time. Interestingly, while the number of recordings increased every year from 2009 to 2016, it dropped suddenly from 2016 to 2017. Furthermore, if only the entries made from January to March (the months for which 2018 data are available) are examined, the histogram below shows that the number of recordings from 2018 is unexpectedly low given the general growth trend. This suggests that the number of recordings for the rest of 2018 will also be lower than expected.
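One way to check the "roughly exponential" growth described above is a least-squares fit of log(count) against year. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be a plain list of recording years (with None for entries that lack a date), and the sample years in the usage note are made up, not taken from the collection.

```python
import math
from collections import Counter


def yearly_counts(years, exclude=(2018,)):
    """Count recordings per year, skipping undated entries and excluded years."""
    counts = Counter(y for y in years if y is not None and y not in exclude)
    return dict(sorted(counts.items()))


def fit_exponential(counts):
    """Least-squares fit of log(count) = a + b * year; returns (a, b).

    exp(b) is the estimated year-over-year growth factor.
    Needs at least two distinct years.
    """
    xs = list(counts)
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope
```

For example, `fit_exponential(yearly_counts([2009, 2010, 2010, 2011, 2011, 2011, 2011]))` gives a slope whose exponential is the growth factor for that toy input. A year whose actual count falls well below the fitted curve, as reported for 2017, would show up as a large negative residual.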

An examination of the bird collection data shows that a majority of the bird recordings were taken in the western half of the map. The eastern half of the map, where the alleged illegal dump site of the Kasios furniture company is located (marked by the triangle in the north-east portion of the map), has noticeably fewer bird recordings. This pattern could be connected to the Methylosmolene that Kasios was alleged to have used in its manufacturing process. Below is a map demonstrating the locations of the bird recordings in a given year.

Bird Recordings: Metadata Analysis

For convenience, some abbreviations for the 19 bird species being discussed are listed below:

ID   Species Name                 ID   Species Name
BEN  Bent-beak Riffraff           ORA  Orange Pine Plover
BLU  Blue-collared Zipper         ORD  Ordinary Snape
BOM  Bombadil                     PIN  Pinkfinch
BRO  Broad-winged Jojo            PUR  Purple Tooting Tout
CAN  Canadian Cootamum            QAX  Qax
CAR  Carries Champagne Pipit      QUE  Queenscoat
DAR  Darkwing Sparrow             ROS  Rose-crested Blue Pipit
EAS  Eastern Corn Skeet           SCR  Scrawny Jay
GRE  Green-tipped Scarlet Pipit   VER  Vermillion Trillian
LES  Lesser Birchbeere

Analysis of Aggregate Data

Population Analysis:

This map shows the location of all recordings in the region. The colors represent the different species and the color coding is given above in the bar plot. The triangle denotes the location of the alleged dump site.

ID   # of Recordings   ID   # of Recordings
BEN   72               ORA  215
BLU   67               ORD   94
BOM  140               PIN   73
BRO   94               PUR   73
CAN   82               QAX   53
CAR  104               QUE  241
DAR   86               ROS  186
EAS   88               SCR   91
GRE   88               VER   84
LES  150

Analysis of Time of Year and Day:

The number of recordings seems to peak in May and again in October. It is not surprising that the birds would appear more in spring and summer. The bird calls also seem to peak at 8:00 a.m. and again at 4:00 p.m. and 8:00 p.m.

When the distribution of recordings over time of day and time of year is broken down by species, there is some variation, but the recordings are distributed in roughly the same way across the different species.
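The month and hour peaks described above come down to simple tallies. The following is a minimal sketch, assuming each recording has been reduced to a (species ID, datetime) pair. The function names and that record layout are hypothetical, and the sample records in the usage note are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def tally_times(records):
    """Tally recordings by month, by hour, and by (species, month).

    records: iterable of (species_id, datetime) pairs.
    """
    by_month, by_hour, species_month = Counter(), Counter(), Counter()
    for species, when in records:
        by_month[when.month] += 1
        by_hour[when.hour] += 1
        species_month[(species, when.month)] += 1
    return by_month, by_hour, species_month


def top(counter, n=2):
    """Return the n most frequent keys of a tally."""
    return [key for key, _ in counter.most_common(n)]
```

Calling `top(by_month)` and `top(by_hour)` on the full collection would list the busiest months and hours, and comparing the `species_month` tallies species by species is one way to check whether they follow the same seasonal pattern.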

Analysis of Vocalization Type

Vocalization Type   # of Recordings
Call                1168
Song                 767
Call & Song          112
Drumming              11
Bill-snapping          1
Scold                  1

Most of the recordings are call vocalizations, followed by song vocalizations.

2 – Turn your attention to the set of bird calls supplied by Kasios. Does this set support the claim of Pipits being found across the Preserve?  A machine learning approach using the bird call library may help your investigation. What is the role of visualization in your analysis of the Kasios bird calls?   Please limit your answer to 10 images and 1000 words.

To study the bird calls supplied by Kasios, we first wanted to learn about bird calls and what bird watchers know about them. After some investigation, we learned that Mel-frequency cepstral coefficients (MFCCs) are used to characterize sounds. MFCCs are coefficients that represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. MFCCs are commonly used to compare audio files.
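To make the "nonlinear mel scale" concrete, here is a small sketch of one widely used Hz-to-mel conversion, m = 2595 * log10(1 + f / 700), and of how filter-band edges spaced evenly in mel become progressively wider in Hz. The function names and the choice of 26 filters over 0 to 8000 Hz in the usage note are illustrative assumptions, not settings taken from our submission.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz, high_hz, n_filters):
    """Edges (in Hz) of n_filters triangular bands equally spaced in mel."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
```

With `mel_band_edges(0, 8000, 26)`, the lowest bands are much narrower in Hz than the highest ones, so the low end of the spectrum is resolved more finely than the high end.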

Our workflow for identifying the birds in the Kasios set is as follows. First, we convert each of the known bird calls and each Kasios bird call into MFCCs. Next, we build a Gaussian mixture model (GMM) classifier from the known bird calls. Finally, we use the GMM to predict the bird name for each unknown call in the Kasios set. To compute the MFCCs, we use a Python library (python_speech_features). However, this library only works on WAV files, so we first use the sox utility to convert each MP3 file into a WAV file. As a sanity check, we also cross-validated the GMM on the known, labeled bird sounds and saw around 60% correct predictions. This accuracy is not particularly convincing, so we need visualizations to further validate and understand the bird sounds.
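The workflow above can be sketched roughly as follows. This is a hedged reconstruction, not our exact code: the function names, the use of scikit-learn's GaussianMixture, the one-model-per-species design, the number of mixture components, and the diagonal covariance are all assumptions made for illustration. Only sox and python_speech_features are named in the text. The third-party imports are done inside the functions so that the pure-Python helpers can be used without them.

```python
import subprocess
from pathlib import Path


def sox_command(mp3_path, wav_dir):
    """Build the sox call that converts one MP3 file into a WAV file."""
    wav_path = Path(wav_dir) / (Path(mp3_path).stem + ".wav")
    return ["sox", str(mp3_path), str(wav_path)]


def convert_to_wav(mp3_path, wav_dir):
    """Run sox (needs a build with MP3 support) and return the WAV path."""
    cmd = sox_command(mp3_path, wav_dir)
    subprocess.run(cmd, check=True)
    return cmd[-1]


def wav_to_mfcc(wav_path, numcep=13):
    """Return a (frames x numcep) MFCC matrix for one WAV file."""
    from scipy.io import wavfile              # third-party
    from python_speech_features import mfcc   # third-party
    rate, signal = wavfile.read(wav_path)
    if signal.ndim > 1:                        # mix stereo down to mono
        signal = signal.mean(axis=1)
    return mfcc(signal, samplerate=rate, numcep=numcep)


def train_species_gmms(features_by_species, n_components=8):
    """Fit one GMM per species on the stacked MFCC frames of its recordings.

    features_by_species: {species_id: [mfcc_matrix, ...]} (assumed layout).
    """
    import numpy as np                               # third-party
    from sklearn.mixture import GaussianMixture      # third-party
    models = {}
    for species, matrices in features_by_species.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        models[species] = gmm.fit(np.vstack(matrices))
    return models


def best_species(log_likelihoods):
    """Pick the species whose model gives the highest log-likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)


def predict_species(models, mfcc_matrix):
    """Score an unknown recording against every species model."""
    scores = {name: gmm.score(mfcc_matrix) for name, gmm in models.items()}
    return best_species(scores)
```

In this design each Kasios recording is assigned to the species whose model gives its MFCC frames the highest average log-likelihood. A cross-validation check like the one mentioned above would hold out some labeled recordings, train on the rest, and report the fraction of held-out recordings for which `predict_species` returns the true label.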

Using this GMM classification technique, we obtain the following predictions (each bird name is abbreviated to its first four letters):

1 - Quee
2 - Purp
3 - Carr
4 - Carr
5 - Gree
6 - Dark
7 - Carr
8 - Rose
9 - Rose
10 - Carr
11 - Rose
12 - Carr
13 - Dark
14 - Rose
15 - Bent

Only four of the 15 bird calls (8, 9, 11, and 14) were identified as Rose-crested Blue Pipit. This result does not support Kasios' claim that the Pipits are seen throughout the Preserve. Our next step is to understand these predictions using visualizations, which we use to validate the results of the GMM classification.
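As a quick cross-check, the 15 predictions listed above can be tallied by species. The snippet copies the labels from that list in order. The variable names are illustrative.

```python
from collections import Counter

# Predicted labels for Kasios recordings 1 to 15, in order, as listed above.
predictions = [
    "Quee", "Purp", "Carr", "Carr", "Gree",
    "Dark", "Carr", "Rose", "Rose", "Carr",
    "Rose", "Carr", "Dark", "Rose", "Bent",
]

tally = Counter(predictions)

# Share of the Kasios recordings labeled Rose-crested Blue Pipit.
rose_share = tally["Rose"] / len(predictions)

# Recording numbers (1-based) that received the Rose label.
rose_ids = [i for i, name in enumerate(predictions, start=1) if name == "Rose"]
```

Here `tally.most_common()` lists the predicted species from most to least frequent, and `rose_ids` gives the recordings to inspect on the map.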

The GMM classifier prediction is shown in bright green, and blue shows the known recordings of the Bent-beak Riffraff. Based on the locations of the known Bent-beak Riffraff recordings, we are not very confident in the result of the GMM classifier: there are no Bent-beak Riffraff recordings in the region where the classifier placed this label. Based on the visualizations of the bird mappings, we guess this call might instead be a Carries Champagne Pipit.

Again, the predictions here are in bright green. Most of the known Carries Champagne Pipit recordings are in the same region, but the GMM classifier labeled bird calls outside of that region as Carries Champagne Pipits.

We are more confident in the GMM predictions for these bird calls because the bright green points (predicted sounds) are in the same area as the known Darkwing Sparrow recordings.

The result here is inconclusive because there are not many predictions and the Green-tipped Scarlet Pipit is seen in many regions throughout the park.

The Purple Tooting Tout is also seen throughout the park and in low quantities, but the prediction (bright green) is near the known Purple Tooting Touts.

We are not confident in the prediction made for the Queenscoat because there have not been any other Queenscoats found in the spot of the prediction (bright green).  

When we browse through all the predicted versus known bird sounds above, we do not see any bright green values (which would be bird sounds from the Kasios set) in the region where we see the Rose-crested Blue Pipits (the lower-right bright pinks in the visualization above). This leads us to doubt Kasios' claim that the Rose-crested Blue Pipits are thriving throughout the park. However, our GMM classifier did label a few recordings as Rose-crested Blue Pipits. These were not in places where Rose-crested Blue Pipits have never been recorded, but they are not in the dense area for the Pipits either.
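The visual plausibility check used above (is a predicted call near known recordings of the same species?) can also be expressed numerically. The sketch below is one possible formalization, not what we ran: the function names, the (x, y) map-coordinate inputs, and the distance threshold are all hypothetical.

```python
import math


def nearest_known_distance(prediction_xy, known_xy):
    """Distance from a predicted call to the closest known recording."""
    px, py = prediction_xy
    return min(math.hypot(px - kx, py - ky) for kx, ky in known_xy)


def is_plausible(prediction_xy, known_xy, max_distance=20.0):
    """True if some known recording of the predicted species lies nearby.

    max_distance is in map units; 20.0 is an arbitrary illustrative value.
    """
    return nearest_known_distance(prediction_xy, known_xy) <= max_distance
```

A prediction far from every known recording of its species, as in the Bent-beak Riffraff and Queenscoat cases above, would fail this check, while one inside a cluster of known recordings would pass. The threshold would need tuning against the actual spread of each species.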

3 – Formulate a hypotheses concerning the state of the Rose Crested Blue Pipit.  What are your primary pieces of evidence to support your assertion?  What next steps should be taken in the investigation to either support or refute the Kasios claim that the Pipits are actually thriving across the Boonsong Lekagul Wildlife Preserve?  Please limit your answer to 500 words.

The histogram of recordings for the Rose-crested Blue Pipit shows a decrease in the number of recordings from 2015 to 2017. This departs from the general pattern of growth in the number of recordings and aligns with the reports that the Rose-crested Blue Pipit population has been decreasing. The map below also shows that the majority of the Rose-crested Blue Pipit recordings are located in the south-east portion of the map.

In addition to the analysis and trends above, we analyzed the population, timeline, vocalization type, and data collection over time for all the bird species. The data analysis and visualizations of the metadata for all the bird species follow.

A number of signs suggest that the Rose-crested Blue Pipit may in fact be in danger and that there may be a connection to the manufacturing activities of Kasios. The drop in the number of recordings of the Rose-crested Blue Pipit indicates a possible drop in the population, and its location in the eastern half of the map, where the dump site is located, indicates a possible connection with the toxic chemicals released by Kasios.

The average position of the recordings also seems to converge over the years to just south-west of the center (around (80, 80)). The standard deviations of the x and y positions also seem to be stable, at 30 and 50 respectively. So there does not seem to be much change in the average position or spread of the birds. The large initial variation in the average position was probably due to there being so few recordings; as the number of recordings increased, the average position settled.
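The yearly average position and spread discussed above can be computed as in the following sketch. It assumes each recording is available as a (year, x, y) tuple in map coordinates. The function name and that record layout are hypothetical, and the population standard deviation is used here as one reasonable choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def yearly_position_stats(records):
    """Per-year mean position and spread of recordings.

    records: iterable of (year, x, y) tuples.
    Returns {year: (mean_x, mean_y, sd_x, sd_y)}.
    """
    by_year = defaultdict(list)
    for year, x, y in records:
        by_year[year].append((x, y))
    stats = {}
    for year, points in sorted(by_year.items()):
        xs, ys = zip(*points)
        stats[year] = (mean(xs), mean(ys), pstdev(xs), pstdev(ys))
    return stats
```

Years with only a handful of recordings give unstable means, which is consistent with the large early variation noted above. Plotting these four values against year shows whether they settle as the number of recordings grows.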

There does not seem to be much change in the frequency patterns across months and times of day over the years in which the data was collected. May and 8:00 a.m. are consistently the most frequent times of recording. However, the following heatmap shows the drop in the number of recordings of the Rose-crested Blue Pipit, which could correlate with a population drop.

The following are several questions to explore in the future to understand the Rose-crested Blue Pipit's habitat and population sustenance: