Student Team: NO
Matlab
Excel
Approximately how many hours were spent working on this submission in total?
60 Hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES
Video
Questions
1.1 description of data and data preprocessing
The given data set contains water quality readings from the Boonsong Lekagul Wildlife Preserve recorded from 2009 to 2016. It covers 106 water indicators collected at 10 different locations. Among those indicators, there are 12 pesticides, 50 chemical compounds, and 44 biochemical elements.
Because the data set is raw, it contains many missing values and inconsistencies. To make the subsequent visual analysis effective, we mainly performed data alignment and filtering during preprocessing.
Data alignment: First, we unified the units of measurement. Then we merged duplicate readings that shared the same time, location, and chemical element by replacing them with their average. Finally, we normalized the data.
Data filtering: Not all indicators help reveal patterns in the data or answer the question, so we need to identify the most relevant pollutants, that is, reduce the dimensionality of the data. For each indicator d, we define:
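The three alignment steps can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this submission: the record layout, the unit conversion table, and min-max scaling as the normalization method are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical unit table: converts every reading to mg/l.
UNIT_FACTORS = {"mg/l": 1.0, "ug/l": 0.001}

def align(records):
    """Unify units, then average readings sharing date, location and measure."""
    groups = defaultdict(list)
    for day, location, measure, value, unit in records:
        groups[(day, location, measure)].append(value * UNIT_FACTORS[unit])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

def min_max_normalize(aligned):
    """Rescale each measure to [0, 1] over all dates and locations (assumed method)."""
    by_measure = defaultdict(list)
    for (_, _, measure), value in aligned.items():
        by_measure[measure].append(value)
    result = {}
    for key, value in aligned.items():
        lo, hi = min(by_measure[key[2]]), max(by_measure[key[2]])
        result[key] = 0.0 if hi == lo else (value - lo) / (hi - lo)
    return result

# Made-up sample records: (date, location, measure, value, unit).
records = [
    ("2016-01-15", "Chai", "Lead", 2.0, "mg/l"),
    ("2016-01-15", "Chai", "Lead", 4000.0, "ug/l"),  # duplicate in another unit
    ("2016-02-15", "Chai", "Lead", 1.0, "mg/l"),
]
aligned = align(records)
normalized = min_max_normalize(aligned)
```

Here the two January readings collapse into one averaged value before normalization, so each (date, location, measure) triple appears only once in the later views.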
P=![]()
We sorted the indicators by P and selected the top third as candidates. We then manually picked out those that might cause pollution, which left about 15 indicators.
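The ranking step can be sketched as follows. The definition of P is not recoverable from this text, so the scores below are made-up placeholders, and the sketch assumes that a higher P marks a more relevant indicator. The function name is hypothetical.

```python
def top_third(scores):
    """Rank indicators by descending score and keep the top third (at least one)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, len(ranked) // 3)
    return ranked[:keep]

# Placeholder scores, not values from the data set.
scores = {
    "Lead": 0.9, "Chromium": 0.7, "Iron": 0.2,
    "AGOC-3A": 0.8, "Calcium": 0.1, "Zinc": 0.3,
}
candidates = top_third(scores)
```

The manual selection that follows the ranking is a judgment step and is not modeled here.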
1.2 data analysis
Before conducting the analysis, we reviewed last year's conclusions. Jo Wood, winner of the Comprehensive Mini-Challenge 2 Answer award, comprehensively analyzed the relationship between wind direction and air pollutant concentration. He found that Kasios Furniture Company discharged AGOC-3A and Methylosmolene into the air between 11:00 p.m. and 4:00 a.m., which is unusual. Another team analyzed vehicle traveling patterns and found that some vehicles showed 1-3 unusual behaviors per month, all occurring between midnight and 6:00 a.m. They therefore inferred that Kasios Furniture Company transported industrial waste to somewhere northeast of the Preserve.
In this task, we were asked to analyze the water quality data to determine whether Kasios Furniture Company had caused environmental pollution in the Boonsong Lekagul Wildlife Preserve. To do so, we implemented a visual analysis system composed of 4 views: map, graph, bubble chart, and line chart.
We use a node-link graph to analyze data patterns across months. In Figure 1.1, the data for each month is represented as a single node, and nodes representing consecutive months are connected by edges. A slider lets us step through time; the node corresponding to the month being inspected is colored red. After exploration, we found that the nodes of 2016 are clearly separated from the other nodes. This suggests that the type or concentration of pollutants in 2016 differs from that in previous years.

Figure 1.1 Overview of cluster membership: (a) the red dot lies in the cluster of 2009; (b) the red dot lies in the cluster of 2011; (c) the red dot lies in the cluster of 2009.
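The graph structure itself (one node per month, edges between consecutive months) can be sketched as below. This is only an illustration of the structure: the month labels and function name are hypothetical, and the sketch does not reproduce how our system positions the nodes. Note that it links neighbors in sorted order, so a month with no data would simply be skipped.

```python
def month_graph(month_labels):
    """Build one node per distinct 'YYYY-MM' label and link neighbors in time order."""
    nodes = sorted(set(month_labels))
    edges = list(zip(nodes, nodes[1:]))
    return nodes, edges

# Made-up labels; duplicates collapse into a single node.
nodes, edges = month_graph(["2016-02", "2015-12", "2016-01", "2016-01"])
```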
To describe the changes in individual measurements in more detail, we map the data directly onto the map of the preserve. As shown in Figure 1.2, the bar charts at each location show the type and concentration of anomalous pollutants for a given month. Dragging the slider on the left side reveals the changes over time.
Through interactive analysis, we found that the concentrations of chromium and lead dropped suddenly at all locations from 2015 onward (jasper and pink bars in Figure 1.2a), while the concentrations of Methylosmolene and AGOC-3A increased rapidly at Boonsri, Kohsoom, Busarakhan, and Chai (blue and brown bars in Figure 1.2b).
The above analysis matches last year's conclusion that Kasios Furniture Company transports pollutants to the location marked by the blue dots. These pollutants degrade the water quality of nearby water sources.

Figure 1.2 Visualization map of the preserve: (a) the concentrations of chromium and lead were high in 2012; (b) the concentrations of Methylosmolene and AGOC-3A increased, while those of chromium and lead decreased.
Conclusion: Kasios Furniture Company secretly dumps pollutants in the Preserve and should be held responsible for the environmental pollution in the Boonsong Lekagul Wildlife Preserve. In addition, since the increase of Methylosmolene and AGOC-3A is followed by the decrease of chromium and lead, we assume that the production materials of Kasios Furniture Company have changed in recent years, so that new pollutants replaced the old ones.
Last year's analysis concluded that some vehicles showed 1-3 abnormal behaviors per month. Whether all of these behaviors are related to the discharge of harmful pollutants still requires more evidence. If last year's conclusion is correct, the concentration of pollutants in the water should also change 1-3 times per month. However, the hydrology department samples only one day per month, so the existing data cannot support such an analysis.
We summarized the problems in the data and the impact of these problems on our analysis:
1 Unreasonable data collection interval
Figure 2.1 shows the concentration changes of AGOC-3A at each sampling point between 2015 and 2016. We found only 7 peaks during this time. The sampling interval is so wide that we cannot accurately determine whether a contaminant underwent multiple abnormal changes within a month. This sampling strategy may also miss some data spikes.

Figure 2.1 The concentration changes of AGOC-3A at Chai between 2015 and 2016.
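The width of the sampling interval can be quantified with a short sketch such as the one below. The dates are made up for illustration and the function name is hypothetical; the point is only that an average gap of about a month cannot resolve events that occur 1-3 times per month.

```python
from datetime import date

def mean_gap_days(sample_dates):
    """Average number of days between consecutive sampling dates, or None if fewer than two."""
    ordered = sorted(sample_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None

# Made-up sampling dates for one pollutant at one location.
dates = [date(2016, 1, 10), date(2016, 2, 9), date(2016, 3, 20)]
average_gap = mean_gap_days(dates)
```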
2 Data collection time is not unified
Figure 2.2 compares the sampling times at each location in 2016; the x-axis and y-axis represent the sampling time and the location, respectively. We found that within each month's measurements, different locations were sampled at different times (as shown by the gray line in Figure 2.2). Focusing only on Chai and Kohsoom (Chai is geographically downstream of Kohsoom), Chai has fewer measurements than Kohsoom, and these measurements were not taken on the same days. This complicates the joint analysis of upstream and downstream sampling sites.

Figure 2.2 Comparison of sampling time at each location in 2016
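A minimal sketch of how the mismatch between two sites could be measured is given below. The dates are invented and the function name is hypothetical; the same comparison applies to two pollutants sampled at one site.

```python
def sampling_overlap(dates_upstream, dates_downstream):
    """Return dates sampled at both sites and the share of downstream dates they cover."""
    up, down = set(dates_upstream), set(dates_downstream)
    shared = sorted(up & down)
    coverage = len(shared) / len(down) if down else 0.0
    return shared, coverage

# Made-up ISO dates, not values from the data set.
kohsoom = ["2016-01-05", "2016-01-19", "2016-02-02", "2016-02-16"]
chai = ["2016-01-19", "2016-02-09"]
shared, coverage = sampling_overlap(kohsoom, chai)
```

A low coverage value means that few downstream readings have an upstream reading from the same day to compare against.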
Figure 2.3 compares the sampling times of AGOC-3A and Biochemical Oxygen at the same place. We found that different pollutants at the same place are not sampled at the same times. As a result, we cannot judge whether two pollutants share a consistent abnormal trend, especially when the sampling interval is so large.

Figure 2.3 Comparison of sampling time of different pollutants in the same place
3 Inaccurate measurement time
The current sampling time is accurate only to the day, so analysts cannot see the order of multiple measurements taken on a particular day. Coupled with the coarse sampling interval, this makes it difficult for analysts to deduce from the data when pollutants are dumped.
Conclusion: The hydrology department did not collect enough data to understand the overall situation of the entire protected area. To address the issues above, we suggest increasing the number of samples, recording sampling times more precisely, and ensuring that sampling timestamps are consistent across locations and pollutants.
The concentrations of AGOC-3A and Methylosmolene near the dumping point, such as at Kohsoom and Boonsri, seriously exceed the allowable standard. From reference material, we know that Methylosmolene can cause toxic side effects in vertebrates. This increases our concern for the Pipits and other wildlife.
To analyze whether Methylosmolene and other pollutants have significant effects on Pipits, we additionally use the bird call data from Mini-Challenge 1. To reduce the influence of poor recordings, we only use records whose audio quality is better than level C. Each red point represents one bird call record. Assuming that the sampling method is sound and that the sampling frequency is similar from month to month, the number of points approximates the number of birds. If bird calls were collected at a place, that place is one of the birds' habitats.
We tallied the times and coordinates of the bird calls and mapped the data into our system. Figure 3.1 shows the distribution of bird habitats in the preserve in May 2012, represented by red dots. The birds in the red circle are downstream of the polluted water source, so they are very likely to be affected by pollutants discharged by Kasios Furniture Company.
By repeatedly adjusting the observation time with the slider, we found that the distribution of birds in the preserve changes periodically (possibly because of migration): birds are detected in quantity only between March and June each year, and the largest number occurs in May. However, there is no direct evidence that pollutants have a significant negative impact on the number of birds. There are two possible reasons why. One is that AGOC-3A and Methylosmolene have no effect on birds, but this is highly improbable. The other is that these pollutants only affect birds once they accumulate to a certain level. The concentrations of AGOC-3A and Methylosmolene have only increased since 2015, and birds stay in the preserve for only about 3 months per year, so we have not yet observed any obvious impact.
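The filtering and counting described above can be sketched as follows. The sketch assumes letter grades in which A is best, so that "better than C" means A or B; the field names, grades, and records are hypothetical and only illustrate the idea.

```python
from collections import Counter

def monthly_call_counts(recordings, keep_grades=("A", "B")):
    """Count bird call records per month, keeping only the accepted quality grades."""
    counts = Counter()
    for rec in recordings:
        if rec["quality"] in keep_grades:
            counts[rec["date"][:7]] += 1  # 'YYYY-MM' prefix of an ISO date
    return counts

# Made-up records, not values from the Mini-Challenge 1 data.
recordings = [
    {"date": "2012-05-03", "quality": "A"},
    {"date": "2012-05-17", "quality": "B"},
    {"date": "2012-05-20", "quality": "D"},
    {"date": "2012-06-01", "quality": "C"},
    {"date": "2012-06-11", "quality": "A"},
]
counts = monthly_call_counts(recordings)
```

Under the stated assumption of similar sampling effort each month, these monthly counts serve as a rough proxy for the number of birds.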
Figure 3.1 (a) Distribution of birds in the preserve in May 2012; (b) Population and distribution of birds under the influence of pesticides.
In addition, we found another interesting phenomenon: pesticides have a great impact on birds. In Figure 3.1(b), the black skeleton marks locations where the pesticide concentration exceeded the standard. In these places, the number of birds was significantly reduced.
Conclusion: We have no direct evidence that pollutants discharged by Kasios have a negative impact on the birds' survival. We therefore still need to improve the sampling strategy. The Hydrology Department should add sampling points in places that have a large number of Pipits but no sampling sensor. This would let us analyze whether the above pollutants affect birds by comparing water quality against bird counts at different sites. In addition, future analyses should account for the error introduced by the impact of pesticides.