Entry Name:  "SMU-Chan-MC1"

VAST Challenge 2018
Mini-Challenge 1

 

 

Team Members:

Chan En Ying Grace, Singapore Management University, gracechan.2017@mitb.smu.edu.sg   PRIMARY

Student Team:  YES

 

Tools Used:

R

 

Approximately how many hours were spent working on this submission in total?

100 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES

 

Video

https://youtu.be/IYQLBHvg3YM  

 

 

Questions

Question 1: Using the bird call collection and the included map of the Wildlife Preserve, characterize the patterns of all of the bird species in the Preserve over the time of the collection. Please assume we have a reasonable distribution of sensors and human collectors providing the recordings, so that the patterns are reasonably representative of the bird locations across the area. Do you detect any trends or anomalies in the patterns? Please limit your answer to 10 images and 1000 words.

 

1. Clustering by Species

 

The facet plot characterises each species’ clustering patterns relative to the alleged dumping site.

 

If the dumping took place, one would expect other species in the dumping area, besides the Rose Pipits, to decline as well. Other species clustered around the dumping site include the Ordinary Snape and the Lesser Birchbeere. We therefore assign Ordinary Snape (OS) and Lesser Birchbeere (LB) as our Control Groups, and Rose-Crested Blue Pipits (Rose Pipits) as our Treatment Group.

 


Figure 1: Clustering of Species Across Locations

 

 

2. Clusters of Rose Pipits Across Time

 

a. First, the Pipit population increased from 2013, peaking in 2015.

In 2013, the cluster was located right at the alleged dumping site. We define this location as the home range, where Pipits tended to thrive.

 

b. Second, from 2015 onwards, the home range moved away from the dumping site.

This could be a sign of dumping. The dumping may have driven the Pipits further away, and their population may have dwindled as they struggled to survive.

 

Figure 2: Rose Pipits Across Time

 

3. Kernel Density Plot of Rose Pipits Across Time

 

For a clearer visualisation, we use a kernel density plot to estimate the intensity of the point distribution for each year from 2012 to 2017.

 

a. Two Clusters Become One; New Cluster "overtook" Old

In 2012 there were originally two clusters: one at the dumping site and one away from it. The cluster at the dumping site grew until 2014. From 2014 onwards, the cluster further away "took over" the growth.

 

b. Old Cluster Vanished in 2015

From 2015, the cluster at the dumping site vanished entirely, leaving only the cluster away from the site. This could indicate that birds moved from the dumping-site cluster towards the other cluster, possibly in response to dumping.

 

 


Figure 3: Kernel Density Plot of Rose Pipits
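The kernel density plot can be sketched as follows. This is a minimal Python illustration, not the R code behind Figure 3. It assumes an isotropic Gaussian kernel with a hand-picked bandwidth `sigma` and applies no edge correction. The function names, the 200 x 200 grid and the sample coordinates are all hypothetical.

```python
import math


def gaussian_intensity(points, x, y, sigma):
    """Estimated point intensity at (x, y) using an isotropic Gaussian kernel.

    Each recorded location adds a 2-D Gaussian bump with standard deviation
    `sigma` (the bandwidth). No edge correction is applied in this sketch.
    """
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return sum(
        norm * math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * sigma ** 2))
        for px, py in points
    )


def intensity_grid(points, width, height, step, sigma):
    """Evaluate the intensity on a regular grid covering [0, width] x [0, height]."""
    return [
        [gaussian_intensity(points, gx, gy, sigma) for gx in range(0, width + 1, step)]
        for gy in range(0, height + 1, step)
    ]


# Hypothetical Pipit locations for one year (map units).
year_points = [(140, 150), (145, 155), (150, 160), (60, 140)]
grid = intensity_grid(year_points, width=200, height=200, step=10, sigma=10)
peak = max(max(row) for row in grid)
```

Running this once per year and drawing each grid as a heat map gives one panel per year. The bandwidth `sigma` controls how readily nearby clusters merge. A large value can make two separate clusters look like one.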

 

 

4. Spatial Point Pattern Analysis (Across Time)

 

We then apply density-based and distance-based measures to test whether the clusters are statistically significant, which could shed light on the dumping allegation.

 

a. Quadrat Analysis

 

Clusters were significant in all six years but became less compact after 2015. The Pearson chi-squared statistic peaked in 2015 and declined afterwards, indicating reduced cluster compactness.

 

Figure 4: Quadrat Analysis Results
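The quadrat test behind the Pearson statistic works in three steps. The window is split into equal cells, the points in each cell are counted, and the counts are compared with the even spread expected under complete spatial randomness (CSR). The Python below is a hypothetical illustration; the original analysis was done in R. The grid size is an assumption. To obtain a p-value, the statistic would be compared against a chi-squared distribution with k - 1 degrees of freedom, which is omitted here.

```python
def quadrat_counts(points, width, height, nx, ny):
    """Count points in an nx-by-ny grid of equal cells covering [0, width] x [0, height]."""
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Points on the right or top edge are placed in the last cell.
        i = min(int(x / width * nx), nx - 1)
        j = min(int(y / height * ny), ny - 1)
        counts[j][i] += 1
    return counts


def quadrat_pearson(points, width, height, nx, ny):
    """Pearson chi-squared statistic and degrees of freedom for a quadrat test of CSR."""
    cells = [c for row in quadrat_counts(points, width, height, nx, ny) for c in row]
    k = len(cells)
    expected = len(points) / k
    statistic = sum((observed - expected) ** 2 / expected for observed in cells)
    return statistic, k - 1
```

A larger statistic means the counts deviate further from an even spread. A statistic that peaks in one year and then falls can therefore be read as clustering that loosens afterwards.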

 

b. Nearest Neighbour, followed by Clark-Evans Test

 

In 2015, most Pipits had their nearest neighbour within 10 units. From 2016, however, nearest-neighbour distances expanded towards 40 units. The Clark-Evans test shows that the clusters were statistically significant from 2012 to 2017. Even so, they started to disperse after 2015, the year of suspected dumping. This supports the quadrat finding that 2015 had the most compact cluster.

 

Figure 5: Nearest Neighbour for Rose Pipits
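The nearest-neighbour distances and the Clark-Evans ratio can be sketched with brute force, as below. This is Python rather than the R used for the report, and the function names are hypothetical. The formal significance test (a z-score or simulated p-value) and edge correction are omitted.

```python
import math


def nearest_neighbour_distances(points):
    """Distance from each point to its closest other point (brute force, O(n^2)).

    Requires at least two points.
    """
    distances = []
    for i, (xi, yi) in enumerate(points):
        distances.append(
            min(math.hypot(xi - xj, yi - yj) for j, (xj, yj) in enumerate(points) if j != i)
        )
    return distances


def clark_evans_r(points, area):
    """Clark-Evans ratio R = observed mean NN distance / expected mean under CSR.

    Under complete spatial randomness with intensity n / area, the expected mean
    nearest-neighbour distance is 1 / (2 * sqrt(n / area)). R < 1 suggests
    clustering and R > 1 regularity. No edge correction is applied here.
    """
    n = len(points)
    observed = sum(nearest_neighbour_distances(points)) / n
    expected = 1.0 / (2.0 * math.sqrt(n / area))
    return observed / expected
```

When the typical nearest-neighbour distance grows from about 10 towards 40 units, R rises towards 1. This is the dispersal described for the years after 2015.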

 

c. K-Function

 

Similarly, the K-Function shows that the size and strength of clusters decreased after 2015, as the grey confidence band grew. By 2017, the cluster was no longer significant at radii below 5; that is, it had become less compact.

 

Figure 6: K-Function of Rose Pipits
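Ripley's K-function compares K(r) with πr², its value under complete spatial randomness (CSR). The grey band is the range of K(r) across simulated CSR patterns. The Python below is a minimal sketch under stated assumptions: no edge correction, a pointwise min/max envelope, and 99 simulations by default. It is not the R code behind Figure 6, and the function names are hypothetical.

```python
import math
import random


def ripley_k(points, area, r):
    """Naive Ripley's K: area / (n (n - 1)) times the number of ordered pairs within r.

    No edge correction, so pairs near the window boundary are undercounted.
    """
    n = len(points)
    pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= r
    )
    return area * pairs / (n * (n - 1))


def csr_envelope(n, width, height, r, nsim=99, seed=0):
    """Min and max of K(r) over `nsim` simulated CSR patterns: a pointwise grey band."""
    rng = random.Random(seed)
    values = []
    for _ in range(nsim):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        values.append(ripley_k(sim, width * height, r))
    return min(values), max(values)
```

An observed K(r) above the band's upper limit indicates significant clustering at radius r. If it falls inside the band, clustering at that scale is no longer significant, which is what the report describes for 2017 at radii below 5.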

 

5. Were Rose Pipits the only ones affected?

 

If dumping occurred, our Control Groups should also have been affected, unless there were biological reasons why they were "immune" to the chemical. Interestingly, the Control Groups showed no sign of being affected by the dumping site.

 

a) Ordinary Snape (Control Group 1)

 

As visualised in the Kernel Density Plot, the OS home range did not move and remained near the dumping site. This contrasts with the Rose Pipits, whose home range moved away from the dumping site.

 

The K-Function shows that the significance of the OS clusters grew over time. This supports our hypothesis that the OSs were not affected by the dumping.

 

A K-Cross between Rose Pipits and Ordinary Snapes shows that the spatial dependence between the two species was strongest in 2015.

 

Evidence Against Dumping

If dumping took place, the OS should have been affected too, especially in 2015, when the two species were spatially dependent. However, neither the OS population nor its home range was affected. Possibly, the chemical affected only the Rose Pipits.

 

Evidence For Dumping

2015 was the year the Rose Pipits moved away from the dumping site. If dumping indeed affected the Pipits, we would expect spatial dependence to be strongest in that year. Having left the dumping site, the Pipits were closer to the OS home range, which produces a stronger K-Cross result.

 

Figure 7: KDE, K-Function & K-Cross for Ordinary Snape
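The K-Cross used to compare Rose Pipits with Ordinary Snapes can be sketched as a cross-type version of Ripley's K. The Python below is a hypothetical illustration without edge correction or a simulation envelope. The report's figures were produced in R, and the function name is an assumption.

```python
import math


def cross_k(points_a, points_b, area, r):
    """Naive cross-type K for two point patterns (e.g. two species).

    Returns area / (n_a * n_b) times the number of (a, b) pairs no farther apart
    than r. If the two patterns are independent, this is about pi * r**2; larger
    values suggest the patterns attract each other. No edge correction is applied.
    """
    close_pairs = sum(
        1
        for ax, ay in points_a
        for bx, by in points_b
        if math.hypot(ax - bx, ay - by) <= r
    )
    return area * close_pairs / (len(points_a) * len(points_b))
```

Here `points_a` would be one year's Pipit records and `points_b` that year's Ordinary Snape records. The same function applies to Pipits that called versus Pipits that sang.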

 

b) Lesser Birchbeere (Control Group 2)

 

The Lesser Birchbeere's home range moved even closer to the dumping site. A new cluster formed near the site in 2016 and, by 2017, had grown almost as large as the former cluster.

 

However, the clusters were not statistically significant in 2016 until the radius exceeded 15. This may reflect the emergence of the new cluster in 2016 (Figure 8). The clusters became significant again in 2017, when the new cluster grew. We therefore maintain our hypothesis that the Lesser Birchbeeres were not affected by the dumping.

 

The K-Cross shows no spatial dependence between the LBs and the Rose Pipits in any of the six years, so the LB may be a poor control. The OS nevertheless remains a good control, so we place greater emphasis on the OS results.


Figure 8: KDE, K-Function & K-Cross for Lesser Birchbeere

 

6. Perhaps Bird Song Indicates a Thriving Population, while Bird Calls Signal Distress?

 

Some research suggests that songs may indicate happier birds, while calls may signal distress. Consistent with this, Rose Pipits stopped singing and produced more distress signals (calls) from 2015 to 2017. This could be because of the dumping, with songs turning into calls once dumping occurred. 2015 was the year with the most calls.

 

To test this, we perform a K-Cross between Pipits that called and Pipits that sang. As hypothesised, 2015, the suspected year of dumping, showed the most statistically significant spatial dependence between the two groups. This supports our hypothesis that songs turned into calls during the year of dumping.

 

Figure 9: Facet Plot of Rose Pipit - By Call & Song

 


Figure 10: K-Cross between Pipits who Sang & Pipits who Called
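The shift from songs to calls can be quantified as a per-year call share. This Python sketch assumes each recording has been reduced to a (year, vocalisation type) pair labelled "Song" or "Call". The input format and function name are hypothetical, and recordings with other labels are skipped.

```python
from collections import Counter, defaultdict


def call_share_by_year(records):
    """Fraction of song/call recordings that are calls, per year.

    `records` is an iterable of (year, vocalisation_type) pairs such as
    (2015, "Call"). Types other than "call" or "song" are ignored.
    """
    per_year = defaultdict(Counter)
    for year, kind in records:
        per_year[year][kind.strip().lower()] += 1
    shares = {}
    for year in sorted(per_year):
        counts = per_year[year]
        total = counts["call"] + counts["song"]
        if total:
            shares[year] = counts["call"] / total
    return shares
```

A share that jumps towards 1.0 from 2015 would match the pattern described above, where the Pipits stopped singing and mostly called.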

 

 

Question 2: Turn your attention to the set of bird calls supplied by Kasios. Does this set support the claim of Pipits being found across the Preserve? A machine learning approach using the bird call library may help your investigation. What is the role of visualization in your analysis of the Kasios bird calls? Please limit your answer to 10 images and 1000 words.

 

1. Envelope Plot

 

First, we plot the amplitude envelope for each of the 19 bird species (training data). We then plot the same for the 15 test birds (testing data). Comparing these against the training envelopes lets us label each test bird's species.


Figure 11: Amplitude Envelope Plot of 19 Training Bird Species


Figure 12: Amplitude Envelope Plot for 15 Testing Birds
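The report compares envelope shapes by eye. A hypothetical numeric counterpart is sketched below in Python. It computes a simple peak-amplitude envelope, stretches envelopes to a common length, and picks the training species with the highest Pearson correlation. The windowed-peak envelope, the resampling length and the correlation score are all assumptions of this sketch. They are not the method used in the report, which was carried out visually in R.

```python
import math


def amplitude_envelope(samples, window):
    """Peak absolute amplitude in consecutive, non-overlapping windows of `window` samples."""
    return [max(abs(s) for s in samples[i:i + window]) for i in range(0, len(samples), window)]


def resample(values, length):
    """Linearly interpolate a sequence to `length` (>= 2) points so envelopes of different durations align."""
    if len(values) == 1:
        return [float(values[0])] * length
    out = []
    for k in range(length):
        pos = k * (len(values) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def closest_species(test_envelope, templates, length=64):
    """Species whose template envelope has the highest correlation with the test envelope."""
    test = resample(test_envelope, length)
    return max(templates, key=lambda sp: correlation(test, resample(templates[sp], length)))
```

Here `templates` would map each of the 19 training species to its representative envelope, and `test_envelope` would come from one Kasios recording. Correlation ignores overall loudness, which suits a comparison of call shape rather than volume.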

 

 

Results

We visually compared the amplitude envelope plots of the training and testing data. On that basis, we assign a predicted species to each of the 15 test birds, shown in the last column of the table below.

 

Two of the 15 birds, Test Bird 2 and Test Bird 9, are predicted to be Rose Pipits.

 

ID   X     Y     Predicted Species
1    140   119   Eastern Corn Skeet
2    63    153   Rose-Crested Blue Pipit
3    70    136   Queenscoat
4    78    150   Bombadil
5    60    90    Canadian Cootamum
6    126   103   Qax
7    71    121   Orange Pine Plover
8    78    62    Green-Tipped Scarlet Pipit
9    61    145   Rose-Crested Blue Pipit
10   45    39    Qax
11   132   106   Scrawny Jay
12   61    20    Qax
13   35    160   Qax
14   40    125   Bombadil
15   110   121   Pinkfinch

 

2. Oscillogram Plot

 

For confirmation, we also examine the oscillogram plots. After visually comparing the oscillograms, we record the predicted species in the last column. Because of the image limit, we display only the birds predicted to be Rose Pipits.

 

Our results show that the species predicted from the oscillogram plots match those predicted from the envelope plots. This is unsurprising, because the envelope is derived from the oscillogram.

 

Table 2: Oscillogram Plot for 2 Testing Birds Predicted as Rose Pipits

Bird ID   Oscillogram   Predicted Species
2         [plot]        Rose-Crested Blue Pipit
9         [plot]        Rose-Crested Blue Pipit

 

 

3. Trellis Plot of Acoustic Parameters

 

A caveat to the previous analysis is that we did not use all the training birds in the visualisation. Instead, we randomly selected 5 birds per species and chose 1 of them to represent the entire species. We now use all the training birds by plotting the distributions of their acoustic parameters.

 

There are 15 parameters in total. We chose 7 of them because they distinguish between the species more clearly: dom_median, HNR_median, meanFreq_median, peakFreq_median, pitch_median, pitchAutocor_median and pitchSpec_median.

 

Distributions

We draw a trellis plot of the 7 parameters for the training birds, with the mean shown as a solid black line. Next, we overlay each of the 15 Kasios test birds on this plot as a dotted blue line. For each parameter, we select the species closest to the test bird. The species selected for the most parameters becomes the predicted species.

 

Given that Test Bird 2 and Test Bird 9 were predicted to be Rose-Crested Blue Pipits, we will focus on these two birds for visualisation.

 

Predicting Test Bird 2 & 9

The species with the most ticks (i.e. closest to the test bird on the most parameters) is selected as the predicted species. On this basis, Test Bird 2 is predicted to be a Qax and Test Bird 9 a Vermillion Trillian.

 

Unfortunately, this does not match our earlier predictions from the amplitude envelope plots. We conclude that this method may not be ideal because it relies on numerical summaries. The amplitude plots are likely to reflect the shape of each call better, though they are less representative of the entire training population.

 

As such, we rely on Method 1 (Envelope Plot) and Method 2 (Oscillogram Plot), and leave Method 3 (Trellis Plot) out of our concluding hypothesis.

 

Figure 13: Test Bird 2 - Trellis Plot of Acoustic Parameters

 

Figure 14: Test Bird 9 - Trellis Plot of Acoustic Parameters
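The tick-counting rule used with the trellis plots can be written as a short vote. In this Python sketch, per-species means stand in for the solid black lines. The parameter values in the usage check are made up, and the function name is hypothetical. Distances are compared only within a parameter, so parameters on different scales do not need rescaling.

```python
from collections import Counter


def vote_species(test_bird, species_means):
    """Tick-based vote across acoustic parameters.

    test_bird:     {parameter: value} for one Kasios test bird.
    species_means: {species: {parameter: mean}} computed from the training birds.
    For each parameter, the species whose mean is closest to the test value gets
    one tick. The species with the most ticks is returned; ties go to whichever
    species received its first tick earliest.
    """
    ticks = Counter()
    for parameter, value in test_bird.items():
        closest = min(species_means, key=lambda sp: abs(species_means[sp][parameter] - value))
        ticks[closest] += 1
    winner = ticks.most_common(1)[0][0]
    return winner, dict(ticks)
```

With 7 parameters, the winning species is the one closest to the test bird on the largest number of panels. This is how Test Bird 2 and Test Bird 9 were assigned in the trellis approach.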

 

 

4. Audio Classification

We also attempted to predict the bird species by classification, first with a Decision Tree and then with a Random Forest.

 

Method 1: Decision Tree

The decision tree produced a high misclassification error rate of 0.574.

 

The Decision Tree model predicted Test Bird 2 as a Lesser Birchbeere and Test Bird 9 as a Green-Tipped Scarlet Pipit, contrary to our earlier predictions. Only 1 of the 15 predictions matches the earlier ones, for Test Bird 7. Given the high misclassification rate (57.4%), we do not rely on the Decision Tree results.

 

Method 2: Random Forest

We then use a Random Forest to try to improve on the single decision tree. We fit 3 Random Forest models, tuning their parameters to reduce the misclassification rate.

 

Unfortunately, the lowest misclassification rate is 0.5565. This is still high and only slightly better than the Decision Tree's 0.574. The Random Forest predictions matched neither our visualisation plots nor the Decision Tree predictions. We therefore do not rely on the classification results.
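Two quantities are used to judge the classifiers: the misclassification rate, and the test birds on which two methods agree. Both are simple to compute, as in the hypothetical Python helpers below. The report does not state how its rates (0.574 and 0.5565) were estimated, for example on a hold-out set or from out-of-bag error. This sketch therefore assumes no particular validation scheme.

```python
def misclassification_rate(actual, predicted):
    """Share of labelled birds whose predicted species differs from the true species."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def matching_ids(predictions_a, predictions_b):
    """1-based bird IDs on which two sets of predictions name the same species."""
    return [i for i, (a, b) in enumerate(zip(predictions_a, predictions_b), start=1) if a == b]
```

A rate of 0.574 means that roughly 57 of every 100 labelled recordings are assigned the wrong species. Comparing two prediction lists with `matching_ids` gives counts like the single agreement (Test Bird 7) reported for the Decision Tree.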

 

Visualisation or Classification?

Visualisation.

 

In my opinion, classification is not a good method for predicting bird species here, because it uses the same parameters as the Trellis Plots. Bird calls of different species may share similar mean amplitude, pitch frequency, etc., yet differ in nature. We should look at the shape of the call (the amplitude pattern) rather than at summary statistics.

 

Note: We also tried spectrogram plots but found little variation across species.

 

5. Where Did the 2 Suspected Rose Pipits Come From?

 

The two birds predicted to be Pipits (in green) were not found in the two clusters near the dumping site. They did, however, appear together. This is plausible, since birds of the same species tend to flock together, and it lends some credibility to our predictions by visualisation.

 

 

Figure 15: Predicted Rose Pipits - Not Found Near Dumping Site
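The two predicted Pipits sit far from the dumping site but close to each other. This can be checked with plain Euclidean distances. The Python snippet below uses only coordinates stated in this report: the dumping site at (148, 159) and the test-bird locations in the results table. It assumes both share the same map coordinate system.

```python
import math

# Dumping-site coordinates as given in the RCT section of this report.
DUMPING_SITE = (148, 159)

# Locations of the two test birds predicted to be Rose Pipits (from the envelope-plot table).
PREDICTED_PIPITS = {2: (63, 153), 9: (61, 145)}


def distance(p, q):
    """Euclidean distance in map units."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


for bird_id, location in PREDICTED_PIPITS.items():
    print(f"Test Bird {bird_id}: {distance(location, DUMPING_SITE):.1f} units from the dumping site")
print(f"Between the two birds: {distance(PREDICTED_PIPITS[2], PREDICTED_PIPITS[9]):.1f} units")
```

This prints about 85.2 and 88.1 units from the dumping site for Test Birds 2 and 9, and 8.2 units between the two birds.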

 

6. Key Observations

 

· Only 2 of the 15 birds resemble Rose Pipits.

· These 2 birds were found neither near the dumping site nor in the 2 clusters identified earlier.

 

7. Hypothesis: Pipits not found across the Preserve

 

Only 2 of the 15 birds provided by Kasios are likely to be Pipits, so Kasios' claim that Pipits are thriving across the Preserve is doubtful. The set of bird calls supplied by Kasios does not support the claim that Pipits are found across the Preserve.

 

 

Question 3: Formulate a hypotheses concerning the state of the Rose Crested Blue Pipit. What are your primary pieces of evidence to support your assertion? What next steps should be taken in the investigation to either support or refute the Kasios claim that the Pipits are actually thriving across the Boonsong Lekagul Wildlife Preserve? Please limit your answer to 500 words.

 

1. Hypothesis: Pipits surviving, but not thriving, on the Preserve.

 

Key Observations

1. Pipit clusters were significant from 2012 to 2017.
2. The Pipit population peaked in 2015.
3. From 2015, the Pipit home range moved away from the dumping site.
4. Pipit clusters became less compact from 2015 and, in 2017, lost their significance at radii below 5.
5. Pipits stopped singing after 2015; songs turned into calls, a sign of distress.
6. Among the species examined, Pipits were the only ones whose home range and population were affected.
7. The Control Groups thrived, and the Lesser Birchbeere's home range even moved closer to the dumping site.
8. Pipits were spatially dependent on the Ordinary Snapes in 2015. Both species should therefore have received the same "treatment" if dumping did occur, yet the Ordinary Snapes were not affected.

 

Signs of Dumping But Affected Pipits Only

Rose Pipits were surviving: their clusters still existed and remained significant through 2017, although in 2017 only at larger radii. However, they were not thriving at the dumping site. They moved away from it, and their population fell. This matters because 2015 was also the year that songs turned into calls. Meanwhile, neither control group experienced a fall in population. From 2015, the OS population even increased, while the LB home range moved closer to the dumping site.

I therefore conclude that there are signs of dumping, most likely in 2015. The dumped chemicals most likely affected mainly the Rose Pipits rather than the other species.

 

2. Next Steps: Need for an RCT to Determine Whether Dumping Was the Cause

 

However, we have not confirmed that dumping was the cause. If it was, it affected only the Rose Pipits. If it was not, something else must be causing the slow death of the Pipits.

To test whether dumping was the cause, we can conduct a Randomised Controlled Trial (RCT). An RCT is a more rigorous way of determining whether a cause-effect relationship exists between the treatment (the dumped substance) and the outcome (death of Pipits).

 

Treatment Group

Introduce the dumped substance to both a Rose Pipit and an Ordinary Snape at the same location (e.g. the dumping site at coordinates (148, 159)). Suppose the Rose Pipit dies after exposure while the Ordinary Snape survives. This would support our hypothesis that "the dumping took place and affected only the Rose Pipits because of their biological make-up".

 

Control Group

Introduce the dumped chemical again to a Rose Pipit and an Ordinary Snape, but this time at a different location, say the new cluster at coordinates (130, 120). If only the Rose Pipit dies after exposure, our hypothesis holds. If it does not die, then something other than the dumping must be causing the deaths of Pipits in the dumping-site area.