Entry Name:  "SMU-JAKAY-MC2"

VAST Challenge 2018
Mini-Challenge 2

Team Members:

Kishan Bharadwaj Shridhar, Singapore Management University, kishanbs.2016@mitb.smu.edu.sg PRIMARY

Akangsha Bandalkul, Singapore Management University, akangshab.2016@mitb.smu.edu.sg

Angad Srivastava, Singapore Management University, angads.2016@mitb.smu.edu.sg

Ong Guan Jie Jason, Singapore Management University, jason.ong.2016@mitb.smu.edu.sg 

Zhang Yanrong Yale, Singapore Management University, yrzhang.2016@mitb.smu.edu.sg

Student Team:  YES

Tools Used:

Python

Tableau

Excel

D3.js

 

Approximately how many hours were spent working on this submission in total?

180

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete?  YES

 

 

Video: https://tinyurl.com/y7zmtfae

 

Tableau Workbook: https://tinyurl.com/y8gxg7u6

 

 

Questions

  1. Characterize the past and most recent situation with respect to chemical contamination in the Boonsong Lekagul waterways. Do you see any trends of possible interest in this investigation?  Your submission for this question should contain no more than 10 images and 1000 words.

 

Developing one single view across space and time

 

The Horizon plot[1] is a visual technique that captures differences across both time and location. We developed it to identify which chemicals to focus on first, given that the dataset covers over 100 chemicals and a 19-year timeframe. Before building this single view, we apply min-max scaling (normalization) to the readings of each chemical individually. For example, all readings of Ammonium in the dataset are scaled from 0 to 1, irrespective of the time or location at which they were captured. A data point closer to 1 therefore shows which locations have significant levels of the chemical and when they were observed. The gradations in level are termed PCT 1 to PCT 5, each representing a band of width 0.2. When the PCT 5 band is filled, we can clearly pick out where the scaled value of the chemical is >0.8 (since the range of readings is now 0-1) and the other four bands are filled completely.
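The scaling and banding described above can be sketched in Python as follows. This is a minimal illustration and not the workbook's actual calculation: the sample rows, the function names `min_max_scale` and `horizon_bands`, and the tuple layout are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows (chemical, location, month, value); not real readings.
readings = [
    ("Ammonium", "Kohsoom", "1998-06", 3.0),
    ("Ammonium", "Boonsri", "1998-06", 1.0),
    ("Ammonium", "Tansanee", "2014-11", 2.0),
    ("Magnesium", "Chai", "2011-03", 40.0),
    ("Magnesium", "Kannika", "2011-03", 10.0),
]


def min_max_scale(rows):
    """Scale each chemical's readings to [0, 1] over all locations and dates."""
    by_chemical = defaultdict(list)
    for chemical, _, _, value in rows:
        by_chemical[chemical].append(value)
    scaled = []
    for chemical, location, month, value in rows:
        low, high = min(by_chemical[chemical]), max(by_chemical[chemical])
        span = high - low
        # A chemical with a single distinct value has no range; map it to 0.
        scaled.append((chemical, location, month,
                       (value - low) / span if span else 0.0))
    return scaled


def horizon_bands(scaled_value, n_bands=5):
    """Return the fill fraction of each band (PCT 1 first).

    With five bands each band is 0.2 wide, and a lower band fills
    completely before the next one starts to fill.
    """
    width = 1.0 / n_bands
    return [min(max(scaled_value - i * width, 0.0), width) / width
            for i in range(n_bands)]


scaled = min_max_scale(readings)
```

A scaled value above 0.8 is the only case in which the fifth band receives any fill, which is why PCT 5 is a convenient marker for the most extreme readings.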

 

*As a note, we have used GIF files in our submission, especially for Q1. Please be patient while they run through. For a more hands-on exploration, please refer to the Tableau Workbook link above.

 

In the order of appearance below, one can see discernible patterns in the following chemicals.

 

1.     Ammonium: A constant uptrend is found at Kohsoom as compared to the other locations. Though Ammonium at Kohsoom peaked in 1998, the PCT 2 band shows that consistently higher levels of Ammonium are found at Kohsoom. More recently, Tansanee has begun to show spikes.

2.     Anionic Active Surfactants: Abnormalities in Kohsoom and Boonsri alone, with all other locations being dormant.

3.     Chlorodinine: Abnormality at Kohsoom.

4.     Magnesium: A period where there is consistency in uptrend across locations.

5.     Methylosmoline: Kohsoom and Somchair show some patterns of interest.

6.     Total Nitrogen: Tansanee shows high spikes (more PCT5 bands) in recent years.

7.     Water Temperature: Has a very seasonal pattern.

 

 

 

Figure 1: The Horizon plot allows a visual examination of spatiotemporal data in a single view.

 

 

Narrowing down on the analysis set

 

Equipped with the knowledge from the Horizon plot, we narrow our analysis to examine how the actual value of a chemical at locations such as Kohsoom or Tansanee differs from the others. We use Control Charts, with gray bands indicating +/- 1 standard deviation of the value of the chemical across time. Before narrowing our focus, we note that other chemicals at Kohsoom, such as Atrazine and Nickel, spike in February 2014 and February 2011 respectively. To keep the analysis succinct, the focus below is on chemicals that show more than one-off spikes, which gives us better confidence in inferring trends.
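As a rough illustration of the control-chart band, the sketch below computes the mean +/- 1 standard deviation of a series and lists the points that fall outside it. The series is hypothetical, the function names are ours, and the use of the population standard deviation is an assumption (the charts may equally have used the sample standard deviation).

```python
from statistics import mean, pstdev


def control_band(values, k=1.0):
    """Return (lower, centre, upper) for a band of k standard deviations."""
    centre = mean(values)
    spread = pstdev(values)  # assumption: population standard deviation
    return centre - k * spread, centre, centre + k * spread


def outside_band(values, k=1.0):
    """Return the indices of the values lying outside the band."""
    lower, _, upper = control_band(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly averages for one chemical at one location.
monthly_average = [10, 11, 9, 10, 30, 10]
flagged = outside_band(monthly_average)
```

A single flagged index corresponds to a one-off spike; the chemicals discussed below are those where several points leave the band.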

 

 

Ammonium

 

The below GIF shows:

1.     The levels of Ammonium at Kohsoom have been higher than at the other locations throughout the timeframe.

2.     Significant spikes occurring throughout 2014-2016 in Kohsoom.

3.     While the overall average of Ammonium across locations is 620 µg/l, the average value at Kohsoom constantly hovers above it (the value shown by the moving line across time).

4.     A change-point is detected around July 2008, where the average Kohsoom concentration of Ammonium jumps from around 2100 to more than 2300.

5.     Tansanee in the recent years also shows continual spikes in Ammonium, with a peak attained in November 2014.
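The submission does not state how the change-point was located. One simple way to find a single shift in the mean, shown here purely as an assumption, is to try every split of the series and keep the one that minimizes the summed squared error around the two segment means. The series below is hypothetical and only mimics a jump from about 2100 to about 2300.

```python
def best_split(values):
    """Return the index i that best splits values into values[:i] and values[i:].

    "Best" means the smallest total squared deviation of each segment
    from its own mean (a single mean-shift change-point).
    """
    def squared_error(segment):
        centre = sum(segment) / len(segment)
        return sum((v - centre) ** 2 for v in segment)

    return min(range(1, len(values)),
               key=lambda i: squared_error(values[:i]) + squared_error(values[i:]))


# Hypothetical running averages before and after a level shift.
series = [2100, 2090, 2110, 2100, 2310, 2300, 2320, 2290]
change_index = best_split(series)
```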

 

 

 


 

Figure 2: The change points of Ammonium

 

By stepping through the chemical filters, we find that Kohsoom also has a higher baseline for Orthophosphorus-Phosphorus, with peaks in December 2013.

Total Phosphorus follows a very similar uptrend at Kohsoom, with an average twice the baseline of all other locations.

 

 

Anionic Active Surfactants

 

The concentration of Anionic active surfactants near Kohsoom clearly increases from May 2015. Boonsri, which is also near the dumping site, shows an abnormal deviation from the baseline, whereas all other locations are dormant over the same period.

 

 


 

Figure 3: Anionic Active Surfactants

 

 

Chlorodinine

 

The levels of Chlorodinine have dropped significantly in recent years. Despite this uniform drop across all locations, Kohsoom alone experiences a spike in June 2016, as seen below.

 

 


Figure 4: Chlorodinine

 

Magnesium

 

There is a clear period from February to May 2011 in which all locations of a particular stream (Boonsri-Kohsoom-Chai-Busarakhan-Kannika), along with Somchair, show a clear uptrend.

 

 

Figure 5: Magnesium

 

 

Methylosmoline

 

It is very evident that around January 2016, Methylosmoline levels start to spike up at two locations, namely Kohsoom and Somchair.

 


 

Figure 6: Methylosmoline

 

Total Nitrogen

 

It is clearly visible that during the period of July 2014 to June 2015 (vertical yellow band), in addition to Kohsoom, an uptrend in Total Nitrogen is visible at Tansanee as well.

 


 

 

Figure 7: Total Nitrogen

 

A deeper investigation reveals that other water properties/chemicals such as COD, Chlorides and Sulphates show a distinctly higher baseline at Tansanee.

For sulphates, the average at Busarakhan is higher across the years too.

 

 

Figure 8: A Box Plot showing the trends in COD, Chlorides and Sulphates

 

 

The seasonal nature of Water Temperature.

 

The cycle plot shows that the water temperature rises by 4-5°C during June, July and August throughout the waterways.
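A cycle plot groups the readings by calendar month and then traces each month across the years. The sketch below prepares that grouping; the rows and the function name `cycle_series` are hypothetical and the temperatures are illustrative, not values from the dataset.

```python
from collections import defaultdict
from statistics import mean


def cycle_series(rows):
    """Group (year, month, temperature) rows into one yearly series per month.

    Returns {month: [(year, mean temperature), ...]} with years ascending.
    """
    cells = defaultdict(list)
    for year, month, temperature in rows:
        cells[(month, year)].append(temperature)
    series = defaultdict(list)
    for month, year in sorted(cells):
        series[month].append((year, mean(cells[(month, year)])))
    return dict(series)


# Hypothetical water temperature readings in degrees Celsius.
rows = [(2015, 1, 4.0), (2015, 1, 6.0), (2016, 1, 5.0),
        (2015, 7, 24.0), (2016, 7, 26.0)]
by_month = cycle_series(rows)
```

Plotting each month's series side by side makes the seasonal level and the year-on-year drift within a month visible at the same time.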

 

 

Figure 9: The cycle plot is used to explain seasonal variations in Water Temperature

 

A few notes on Data Preprocessing:

·       Macrozoobenthos has no unit of measurement in the dataset, hence it is excluded from all analysis.

·       For the chemicals measured in mg/l, we convert them to µg/l to aid in comparison.

·       Water temperature, which is a physical property, has a different unit of measurement from the rest.

·       The chemical readings shown in the figures above and below are average values. Tableau automatically aggregates multiple readings at a particular time/location.
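The unit conversion and averaging in the notes above can be sketched as follows. The row layout, the names `to_ug_per_l` and `average_readings`, and the sample values are assumptions for illustration; in the submission the aggregation is done by Tableau.

```python
from collections import defaultdict
from statistics import mean

# 1 mg/l = 1000 µg/l; other units (e.g. temperature) are deliberately absent.
TO_UG_PER_L = {"mg/l": 1000.0, "µg/l": 1.0}


def to_ug_per_l(value, unit):
    """Convert a concentration to µg/l; raises KeyError for other units."""
    return value * TO_UG_PER_L[unit]


def average_readings(rows):
    """Average repeated readings of a chemical at one location and date."""
    grouped = defaultdict(list)
    for chemical, location, day, value, unit in rows:
        grouped[(chemical, location, day)].append(to_ug_per_l(value, unit))
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical duplicate readings reported in different units.
rows = [
    ("Ammonium", "Kohsoom", "2014-11-03", 0.5, "mg/l"),
    ("Ammonium", "Kohsoom", "2014-11-03", 700.0, "µg/l"),
]
averaged = average_readings(rows)
```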

 

  2. What anomalies do you find in the waterway samples dataset?  How do these affect your analysis of potential problems to the environment? Is the Hydrology Department collecting sufficient data to understand the comprehensive situation across the Preserve? What changes would you propose to make in the sampling approach to best understand the situation? Your submission for this question should contain no more than 6 images and 500 words.

 

 

The August 2003 effect

 

Several locations show a spike in Chromium, Manganese, Copper and Iron for just one day, leading the team to believe it might be an anomaly in the readings. This is probably due to a wrong calibration of the measuring equipment. It occurs at every location except Boonsri.

 

Figure 10: The effect of 15 Aug 2003, where there were spikes in certain chemicals all at once and at different locations

 

 

 

The surge in unique measurements

 

On average, 43 chemicals were measured in a typical year. 2008 and 2009 saw the introduction of over 35 chemicals, leading to a surge in the number of unique chemicals captured. This affects the analysis because this set of chemicals cannot be compared with readings from before or after that narrow timeframe. We presume that the hydrology department had a pressing need to collect these samples during those two years, perhaps for research purposes, but later dropped the initiative.

 

The heatmap illustrates that only 20% of the total chemicals (21 out of 105) had measurements in all years of the sampling. It also shows which chemicals were measured only as part of the two-year surge.
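The coverage check behind the heatmap can be expressed as a small set operation: a chemical counts as fully covered only if it has at least one record in every year. The sketch below uses hypothetical rows and a hypothetical function name; only the 21-out-of-105 figure comes from the text above.

```python
from collections import defaultdict


def measured_every_year(rows, years):
    """Return the chemicals that have at least one record in every given year."""
    seen = defaultdict(set)
    for chemical, year in rows:
        seen[chemical].add(year)
    required = set(years)
    return sorted(c for c, observed in seen.items() if required <= observed)


# Hypothetical (chemical, year) records.
rows = [("Ammonium", 1998), ("Ammonium", 1999), ("Atrazine", 1999)]
always_measured = measured_every_year(rows, [1998, 1999])

# Share quoted in the text: 21 of 105 chemicals measured in all years.
share = 21 / 105
```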

 

 

Figure 11: A heatmap of the number of records for a chemical in a particular month of a year

 

Some examples where an uptrend could not be investigated further due to missing samples across locations are listed below.

·       Fecal Coliforms [April 2011-Kohsoom]

·       AOX [April 2013 (Kohsoom), May 2013 (Boonsri)]

·       Arsenic [August 2015-Tansanee]

·       Total Hardness, as seen below, seems quite anomalous because no other location was measured during 2011-2012 (gray window). This raises the doubt that some readings could have been tampered with, for the sake of achieving environmental compliance, etc.

 

 

Figure 12: Total hardness of water

 

Coverage of the sampling

 

The plots below show that the coverage of locations is disproportionate, implying the hydrology department has not collected comprehensive data across the preserve. Coverage is also affected by the fact that Achara, Decha and Tansanee were measured only from 2009 onward.

 

Four segments are visible from an analysis of the total number of records: [Chai & Boonsri], [Kannika & Sakda], [Kohsoom, Somchair & Busarakhan], [Achara, Decha & Tansanee].

 

 

  

 

 

Figure 13: LEFT: The coverage of locations across the years RIGHT: A map view of the aggregated numbers

 

 

Sequential sampling approach

 

Currently, chemical settlement along the river flow is hard to analyze, as there is no indication of flow direction. For a thorough analysis, points upstream and downstream along the same stream must be measured sequentially. For example, if Boonsri was measured in the first week, then it makes sense to check whether the same chemical has propagated downstream to Chai or Kannika during the second/third week. Given the date and location of each sample, we find the percentage of common routes taken in a river system. More on this is explained in our video.

 

The diagram below illustrates that only 7.5% of measurements follow a sequential pattern, with a wide variety of routes being adopted by the department.
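One way to derive monthly sampling routes from dates and locations is sketched below. It is our reading of the approach, not the exact procedure used for the submission: each month's route is taken as the order in which locations are first sampled, and ties on the same day are broken alphabetically (an assumption). The sample records and function names are hypothetical.

```python
from collections import Counter, defaultdict


def monthly_routes(samples):
    """Map each month to the order in which locations were first sampled.

    samples: iterable of (ISO date "YYYY-MM-DD", location).
    """
    first_seen = defaultdict(dict)
    for day, location in samples:
        month = day[:7]
        known = first_seen[month].get(location)
        if known is None or day < known:
            first_seen[month][location] = day
    return {month: tuple(sorted(days, key=lambda loc: (days[loc], loc)))
            for month, days in first_seen.items()}


def start_share(routes):
    """Return the fraction of months whose route begins at each location."""
    starts = Counter(route[0] for route in routes.values())
    total = sum(starts.values())
    return {location: count / total for location, count in starts.items()}


# Hypothetical sampling records.
samples = [
    ("2016-01-05", "Boonsri"), ("2016-01-12", "Chai"), ("2016-01-20", "Kannika"),
    ("2016-02-03", "Chai"), ("2016-02-10", "Boonsri"),
]
routes = monthly_routes(samples)
shares = start_share(routes)
```

Counting how often each full route tuple occurs, rather than only its first element, gives the nested proportions that a sunburst diagram displays.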

 

 

 


 

 

Figure 14: The sequential sunburst diagram illustrating the sampling route adopted by the Hydrology Department in a typical month (e.g. 46.7% at Boonsri means that, of the 228 months [19 years × 12 months] in the time range, sampling began at Boonsri 46.7% of the time)

 

 

 

With limited knowledge on flow direction, another suggestion could be to follow the minimum distance between the sampling points. This also helps to see propagation of the effluents or chemicals from upstream or downstream, when done with a regular frequency. This also makes sense from a logistics perspective, if the same research group is tasked with collecting the measurements at each point.
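One reading of "follow the minimum distance" is a nearest-neighbour walk: start at one site and always move to the closest site not yet visited. The sketch below illustrates that heuristic only; the site names and coordinates are placeholders, not the real positions of the sampling points, and the heuristic does not guarantee the shortest overall route.

```python
from math import dist


def nearest_neighbour_route(points, start):
    """Visit every site once, always moving to the closest unvisited site.

    points: {site name: (x, y)}; ties are broken alphabetically.
    """
    route = [start]
    remaining = set(points) - {start}
    while remaining:
        here = points[route[-1]]
        closest = min(sorted(remaining),
                      key=lambda name: dist(here, points[name]))
        route.append(closest)
        remaining.remove(closest)
    return route


# Placeholder coordinates in arbitrary map units.
sites = {"Site A": (0, 0), "Site B": (1, 0), "Site C": (5, 0), "Site D": (2, 0)}
route = nearest_neighbour_route(sites, "Site A")
```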

 

 

 

 


 

 

Figure 15: A suggestion for a possible route to sample across locations

 

 

  3. After reviewing the data, do any of your findings cause particular concern for the Pipit or other wildlife? Would you suggest any changes in the sampling strategy to better understand the waterways situation in the Preserve? Your submission for this question should contain no more than 6 images and 500 words.

 

1.     Drop in Oxygen Saturation Levels

Birds and wildlife feed off the water, and oxygen is essential for any living organism, but oxygen saturation in the water is clearly declining near Kohsoom and Tansanee.

 


 

Figure 16: Drop in levels of Oxygen saturation being observed at Kohsoom and Tansanee

2.     Lead Contamination

At Somchair, a steep uptrend in lead contamination levels is seen. Lead is a harmful contaminant in water. 

 

 

 

Figure 17: The level of lead in the water has gone up in the recent years at Somchair

3.     Release of hot effluents?

 

The trend line shows a steeper uptrend at Tansanee than at the other locations. There is good reason to believe that the peak of 34°C in August 2016 was not caused by natural factors, as the nearby locations hover in the range of 22-25°C. One plausible explanation is a factory that uses Tansanee as a water source for its cooling towers and discharges hot effluents into the water.

 

 

Figure 18: The seasonal water temperature peak at Tansanee rises to 34°C in August 2016, a 5°C increase from the previous year that is not observed at nearby locations. Hence, there is strong reason to believe this might not be a natural cause.

 

4.     A relocated dumping site at Somchair/Tansanee

The recent spikes in Lead, Total Nitrogen, etc. at Somchair and Tansanee raise the question of whether new dumping sites have appeared there in recent years. The points below support this:

1.     Methylosmoline is expected to be higher near Kohsoom, as Kasios had been dumping their industrial waste. However, the uptrend at Somchair is totally unexpected and shows that the Methylosmoline content elsewhere in the preserve might continue to harm the pipits.

2.     Magnesium uptrend at Somchair (Figure 5) happens along with an uptrend in the stream system of Boonsri-Kannika. These two streams are totally unconnected, discounting a transmission through water borne channels, giving a strong reason to believe some other illicit activity might be happening at Somchair.

 

 

Proposed changes to sampling strategy

 

-Regular sampling frequency

 

The interval between two consecutive samples (not counting multiple samples on the same day) is shown below. At Boonsri, the department previously measured samples once a month, whereas in recent years we see a surge in weekly/daily measurements. At Achara, the trend is the opposite of Boonsri. The absence of daily recordings at Tansanee probably reflects either how difficult it is for samplers to get there or the importance they attach to Tansanee. Tansanee and Somchair need to be measured more frequently than before. Water temperature being measured every day at Chai in 2016 suggests that the samplers have begun to move towards consistent sampling at various locations.
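The interval calculation can be sketched as below: for each location, take the distinct sampling dates (so that several samples on one day count once), sort them, and difference neighbouring dates. The records and the function name `sampling_gaps` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def sampling_gaps(samples):
    """Return {location: [days between consecutive distinct sampling dates]}.

    samples: iterable of (location, datetime.date), irrespective of chemical.
    """
    days = defaultdict(set)
    for location, day in samples:
        days[location].add(day)  # same-day repeats collapse into one date
    gaps = {}
    for location, distinct in days.items():
        ordered = sorted(distinct)
        gaps[location] = [(later - earlier).days
                          for earlier, later in zip(ordered, ordered[1:])]
    return gaps


# Hypothetical sampling dates; 8 March appears twice on purpose.
samples = [
    ("Boonsri", date(2016, 3, 1)), ("Boonsri", date(2016, 3, 8)),
    ("Boonsri", date(2016, 3, 8)), ("Boonsri", date(2016, 4, 7)),
]
gaps = sampling_gaps(samples)
```

Gaps of about 30 days indicate monthly sampling, about 7 weekly, and 1 daily, which is how the frequency patterns per location can be read.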

 

Redundancy in the chemicals collected can also be understood if the sampling frequencies are aligned across locations, as correlations of different chemical measurements can be checked upon and thereby help narrow down on the chemicals to focus on.

 

 

 

 

 

Figure 19: Recording the frequency of sampling by noting the difference between two consecutive measurements at a particular location (irrespective of the chemical)

 

 

-New sources of data

 

Measuring the flow rate might help to understand whether a chemical's dilution is being exacerbated from upstream or downstream, and would at the same time provide an indication of the flow direction. It would also help to identify any choke points, where the river takes a long time to flow from point A to B, leaving the birds or wildlife directly affected by the streams. The altitude of the sampling points can also serve as an estimate of flow direction. Water properties such as pH and conductivity can also be measured.

 

In a nutshell, our recommendation is to target the right and relevant measurements, and measure them consistently across locations by adopting similar routes. If the readings of the sensors are automated, then all that is needed is to schedule the recordings in a cohesive time interval. We hope that the hydrology department can take our suggestions in their future endeavors.

 



[1] https://www.tableau.com/about/blog/2016/4/visualizing-dense-data-how-cut-and-superpose-areas-52839