Student Team: YES
Python
Tableau
Excel
D3.js
Approximately how many hours were spent working on this submission in total?
180
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete?
YES
Video: https://tinyurl.com/y7zmtfae
Tableau Workbook: https://tinyurl.com/y8gxg7u6
Questions
Developing one single view across space and time
The Horizon plot [1] is a visual technique that captures differences across both time and location. We developed it to find the initial chemicals to focus on, given that the dataset covers over 100 chemicals and a 19-year timeframe. We apply min-max scaling (normalization) to each chemical's readings before building this single view. For example, all readings of Ammonium in the dataset are scaled from 0 to 1, irrespective of the time or location at which they were captured. Hence, a data point closer to 1 shows which locations have significant trends in that chemical and when they are observed. The gradations in levels are termed PCT1 to PCT5, each representing a width of 0.2. The idea is that when the PCT5 band appears, we can clearly pick out where the scaled value of the chemical is above 0.8 (since the range of readings is now 0-1), with the other four bands filled completely.
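The scaling and banding described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Tableau calculation used for the figure: the function names and the sample readings are made up for the example.

```python
def min_max_scale(values):
    """Scale a list of readings of one chemical to the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no variation to display.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def horizon_bands(scaled, n_bands=5):
    """Return the fill fraction (0-1) of each band, lowest band first.

    With n_bands=5 each band is 0.2 wide, matching PCT1 to PCT5.
    """
    bands = []
    for i in range(n_bands):
        fill = scaled * n_bands - i  # how far the value reaches into band i
        bands.append(min(1.0, max(0.0, fill)))
    return bands


# Made-up readings for one chemical, across all locations and dates.
readings = [120.0, 300.0, 620.0, 2100.0, 2300.0]
scaled = min_max_scale(readings)

# A scaled value of 0.5 fills PCT1 and PCT2 and half of PCT3.
half = horizon_bands(0.5)
# The maximum reading fills all five bands.
top = horizon_bands(scaled[-1])
```

A non-zero entry in the last position of the result corresponds to a visible PCT5 band, that is, a scaled value above 0.8.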
Note: we have used GIF files in our submission, especially for Q1. Please be patient while they run through. For a more hands-on exploration, please refer to the link to the Tableau Workbook above.
In the order of appearance below, one can see discernible patterns in the following chemicals.
1. Ammonium: A constant uptrend is found at Kohsoom compared to the other locations. Though the peak of Ammonium at Kohsoom was in 1998, the PCT2 band shows consistently higher levels of Ammonium at Kohsoom. More recently, Tansanee has begun to show spikes.
2. Anionic Active Surfactants: Abnormalities at Kohsoom and Boonsri alone, with all other locations being dormant.
3. Chlorodinine: Abnormality at Kohsoom.
4. Magnesium: A period where there is a consistent uptrend across locations.
5. Methylosmoline: Kohsoom and Somchair show some patterns of interest.
6. Total Nitrogen: Tansanee shows high spikes (more PCT5 bands) in recent years.
7. Water Temperature: Has a very seasonal pattern.

Figure 1: The Horizon plot allows a visual examination of spatiotemporal data in a single view.
Narrowing down on the analysis set
Equipped with the knowledge from the Horizon plot, we can
narrow down our analysis to examine how different the actual value of the
chemical was at these locations such as Kohsoom or Tansanee as compared to the
others. We leverage Control Charts, with gray bands indicating +/- 1 standard
deviation of the value of the chemical across time. Before we narrow down our
focus, we do notice other chemicals such as Atrazine and Nickel at Kohsoom are
spiking up during the months of February 2014 and February 2011 respectively.
To keep the analysis succinct, the focus below has been on chemicals that show
more than one-off spikes, giving us better confidence in inferring trends.
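As a rough illustration of the control-chart band, the sketch below computes a band of +/- 1 standard deviation around the mean and lists the readings that fall outside it. The choice of the population standard deviation, the function names and the sample series are assumptions made for this example; the charts themselves were built in Tableau.

```python
from statistics import mean, pstdev


def control_band(values, k=1.0):
    """Return (lower, upper) limits at k standard deviations from the mean."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation (an assumption)
    return centre - k * spread, centre + k * spread


def outside_band(values, k=1.0):
    """Return the indices of readings that fall outside the band."""
    lower, upper = control_band(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up readings of one chemical over six sampling dates.
series = [10.0, 11.0, 9.0, 10.0, 30.0, 10.0]
lower, upper = control_band(series)
flagged = outside_band(series)  # only the reading of 30.0 leaves the band
```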
Ammonium
The GIF below shows:
1. The levels of Ammonium at Kohsoom have been higher than at the other locations throughout the timeframe.
2. Significant spikes occur throughout 2014-2016 at Kohsoom.
3. While the overall average of Ammonium across locations is 620 µg/l, the average value at Kohsoom constantly hovers above it (the value shown by the moving line across time).
4. A change-point is detected around July 2008, where the average Kohsoom concentration of Ammonium jumps from around 2100 to more than 2300.
5. In recent years, Tansanee also shows continual spikes in Ammonium, with a peak in November 2014.

Figure 2: The change points of Ammonium
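One simple way to locate a single shift in the average, such as the one around July 2008, is to try every split of the series and keep the one that minimizes the squared error around the two segment means. The sketch below shows that idea only; it is not necessarily the method behind Figure 2, and the series is made up to mimic a jump from about 2100 to about 2300.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the segment mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(values, min_size=2):
    """Return the index where splitting the series gives the lowest total error."""
    best_index, best_cost = None, float("inf")
    for i in range(min_size, len(values) - min_size + 1):
        cost = sse(values[:i]) + sse(values[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Made-up series: four readings near 2100 followed by four near 2300.
series = [2100, 2090, 2110, 2105, 2310, 2320, 2300, 2315]
split = best_split(series)
before, after = mean(series[:split]), mean(series[split:])
```

Here `split` marks the first reading of the new regime, and `before` and `after` are the average levels on either side of it.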
By stepping through the chemical filters, we infer that higher baselines at Kohsoom are also found for Orthophosphorus-Phosphorus, with peaks in December 2013.
Total Phosphorus also follows a very similar uptrend at Kohsoom, with an average twice the baseline of all other locations.
Anionic Active Surfactants
It is quite evident that the concentration of Anionic Active Surfactants near Kohsoom has been increasing since May 2015. The site Boonsri, which is also near the dumping site, shows an abnormal deviation away from the baseline, whereas all other locations have been dormant in the same time period.

Figure 3: Anionic Active Surfactants
Chlorodinine
The levels of Chlorodinine have dropped significantly in recent years. Despite a uniform drop across all locations, Kohsoom alone experiences a spike in June 2016, as seen below.

Figure 4: Chlorodinine
Magnesium
There is a clear period from February to May 2011 in which all locations of a particular stream (Boonsri-Kohsoom-Chai-Busarakhan-Kannika), along with Somchair, show an uptrend.

Figure 5: Magnesium
Methylosmoline
It is very evident that around January 2016, Methylosmoline levels start to spike up at two locations, namely Kohsoom and Somchair.

Figure 6: Methylosmoline
Total Nitrogen
It is clearly visible that during the period of July 2014 to June 2015 (vertical yellow band), in addition to Kohsoom, an uptrend in Total Nitrogen is visible at Tansanee as well.

Figure 7: Total Nitrogen
A deeper investigation reveals that other water properties/chemicals such as COD, Chlorides and Sulphates show a distinctly higher baseline at Tansanee.
For Sulphates, the average at Busarakhan is higher across the years too.

Figure 8: A Box Plot showing the trends in COD, Chlorides and Sulphates
The seasonal nature of Water Temperature
The cycle plot shows that the temperature of the water goes up by 4-5 °C during the months of June, July and August throughout the waterways.

Figure 9: The cycle plot is used to explain seasonal variations in Water Temperature
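A cycle plot groups the readings by calendar month and then shows, within each month, how the value changes from year to year. The sketch below prepares data in that shape; the function name and the temperature values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def cycle_plot_series(readings):
    """Map each calendar month to its yearly averages, in year order.

    `readings` is an iterable of (year, month, value) tuples.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for year, month, value in readings:
        grouped[month][year].append(value)
    return {
        month: [mean(values) for _, values in sorted(years.items())]
        for month, years in sorted(grouped.items())
    }


# Made-up water temperature readings in degrees Celsius.
readings = [
    (2015, 1, 5.0),
    (2015, 7, 21.0),
    (2016, 1, 6.0),
    (2016, 7, 23.0),
    (2016, 7, 25.0),  # a second July 2016 reading, averaged with the first
]
series = cycle_plot_series(readings)
# series[7] holds the July averages for 2015 and 2016, in that order.
```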
A few notes on Data Preprocessing:
· Macrozoobenthos is a measure that has no unit of measurement, hence it is excluded from all analysis.
· For the chemicals measured in mg/l, we convert them to µg/l to aid in comparison.
· Water temperature, which is a physical property, has a different unit of measurement from the rest.
· The chemical readings shown in the figures above and below are average values. Tableau automatically aggregates multiple readings taken at a particular time and location.
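The unit conversion and averaging steps above can be sketched as follows. This is an illustration under assumptions, not the Tableau workflow itself: the row layout (location, date, measure, value, unit) and the sample values are made up, and measures with other units, such as water temperature, would need to be handled separately.

```python
from collections import defaultdict
from statistics import mean

MG_TO_UG = 1000.0  # 1 mg/l = 1000 µg/l


def to_micrograms(value, unit):
    """Convert a concentration to µg/l; reject units this sketch does not cover."""
    if unit == "mg/l":
        return value * MG_TO_UG
    if unit == "µg/l":
        return value
    raise ValueError(f"unsupported unit: {unit}")


def average_by_key(rows):
    """Average multiple readings that share a location, date and measure."""
    grouped = defaultdict(list)
    for location, day, measure, value, unit in rows:
        grouped[(location, day, measure)].append(to_micrograms(value, unit))
    return {key: mean(values) for key, values in grouped.items()}


# Made-up rows: two readings of the same measure on the same day.
rows = [
    ("Kohsoom", "2016-06-01", "Ammonium", 2.0, "mg/l"),
    ("Kohsoom", "2016-06-01", "Ammonium", 2400.0, "µg/l"),
]
averaged = average_by_key(rows)
```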
The August 2003 effect
Several locations show a spike in Chromium, Manganese, Copper and Iron for just one day, leading the team to believe it might be an anomaly in the readings. This might be due to a wrong calibration of the measuring equipment. It is seen at every location except Boonsri.

Figure 10: The effect of 15 Aug 2003, where there were spikes in certain chemicals all at once and at different locations
The surge in unique measurements
On average, 43 chemicals were measured in a typical year. 2008 and 2009 saw the introduction of over 35 chemicals, leading to a surge in the number of unique chemicals captured. This affects the analysis because this set of chemicals cannot be compared with readings either before or after that narrow timeframe. We presume that the hydrology department had a pressing need to collect a few samples during those two years for some research purpose, but later dropped the initiative.
The heatmap illustrates that only 20% of the chemicals (21 out of 105) had measurements in all years of the sampling. It also shows which chemicals were measured only as part of the two-year surge.

Figure 11: A heatmap of the number of records for a chemical in a particular month of a year
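The share of chemicals measured in every year can be derived from a list of (measure, year) pairs, as in the sketch below. The function names and the toy records are assumptions for illustration; applied to the full dataset, the same calculation corresponds to the 21 out of 105 (20%) reported above.

```python
from collections import defaultdict


def years_by_measure(records):
    """Map each measure to the set of years in which it was recorded."""
    seen = defaultdict(set)
    for measure, year in records:
        seen[measure].add(year)
    return seen


def measured_every_year(records, all_years):
    """Return the measures that have at least one record in every year."""
    required = set(all_years)
    seen = years_by_measure(records)
    return sorted(m for m, years in seen.items() if required <= years)


# Toy records: Ammonium is measured every year, AOX only once.
records = [
    ("Ammonium", 1998),
    ("Ammonium", 1999),
    ("Ammonium", 2000),
    ("AOX", 1999),
]
always = measured_every_year(records, range(1998, 2001))
share = len(always) / len(years_by_measure(records))
```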
Some examples where an uptrend could not be investigated further, due to missing samples across locations, are listed below.
· Fecal Coliforms [April 2011-Kohsoom]
· AOX [April 2013 (Kohsoom), May 2013 (Boonsri)]
· Arsenic [August 2015-Tansanee]
· Total Hardness, as seen below, seems quite anomalous because no other location was measured during 2011-2012 (gray window). This raises a doubt that some readings could have been tampered with, for the sake of achieving environmental compliance, etc.

Figure 12: Total hardness of water
Coverage of the sampling
The plots below show that the coverage of locations is disproportionate, implying that the hydrology department has not collected comprehensive data across the preserve. The coverage is also affected by the fact that Achara, Decha and Tansanee started to be measured only from 2009.
Four segments are visible from an analysis of the total number of records: [Chai & Boonsri], [Kannika & Sakda], [Kohsoom, Somchair & Busarakhan] and [Achara, Decha & Tansanee].


Figure 13: LEFT: The coverage of locations across the years RIGHT: A map view of the aggregated numbers
Sequential sampling approach
Currently, analysis of chemical settlement along the river flow is hard to do, as there is no indication of flow direction. For a thorough analysis, points upstream and downstream along the same stream must be measured sequentially. For example, if Boonsri was measured in the first week, then it makes sense to check whether the same chemical has propagated downstream to Chai or Kannika during the second or third week. Given the date of sampling and location, we find the percentage of common routes taken in a river system. More on this is explained in our video.
The figure below illustrates that only 7.5% of measurements follow a sequential pattern, with quite a wide variety of routes being adopted by the department.

Figure 14: The sequential sunburst diagram to illustrate the route of the sampling adopted by the Hydrology Department in a typical month (e.g. 46.7% at Boonsri means out of the 228 months [19years * 12 months] in the time range, 46.7% of the time the sampling began at Boonsri)
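To make the idea of a sequential route concrete, the sketch below orders the locations of one month by the first date on which each was sampled, and then checks whether that route respects an assumed stream order. The order in `ASSUMED_ORDER` is hypothetical, since the data gives no flow direction, and the function names and sample dates are made up; this is not the exact calculation behind the 7.5% figure.

```python
from datetime import date

# Hypothetical upstream-to-downstream order (an assumption, not given in the data).
ASSUMED_ORDER = ["Boonsri", "Chai", "Kannika"]


def monthly_route(samples):
    """Order locations by the first date each one was sampled in the month."""
    first_seen = {}
    for location, day in samples:
        if location not in first_seen or day < first_seen[location]:
            first_seen[location] = day
    # Ties on the same day are broken alphabetically, which is arbitrary.
    return sorted(first_seen, key=lambda loc: (first_seen[loc], loc))


def follows_order(route, assumed_order):
    """True if at least two stream locations appear in the assumed order."""
    on_stream = [loc for loc in route if loc in assumed_order]
    expected = [loc for loc in assumed_order if loc in on_stream]
    return len(on_stream) > 1 and on_stream == expected


# Made-up sampling dates for one month.
january = [
    ("Chai", date(2016, 1, 12)),
    ("Boonsri", date(2016, 1, 5)),
    ("Kannika", date(2016, 1, 19)),
    ("Boonsri", date(2016, 1, 20)),  # a repeat visit does not change the route
]
route = monthly_route(january)
sequential = follows_order(route, ASSUMED_ORDER)
```

Counting the months for which `sequential` is true, and dividing by the number of months with data, would give one possible estimate of how often sampling followed the stream.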
With limited knowledge on flow direction, another suggestion could be to follow the minimum distance between the sampling points. This also helps to see propagation of the effluents or chemicals from upstream or downstream, when done with a regular frequency. This also makes sense from a logistics perspective, if the same research group is tasked with collecting the measurements at each point.

Figure 15: A suggestion for a possible route to sample across locations
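One way to read the minimum-distance suggestion is as a greedy nearest-neighbour route: from the current sampling point, always go to the closest point not yet visited. The sketch below shows that heuristic with placeholder names and coordinates, not the real positions of the sampling sites. It is a heuristic only and does not guarantee the shortest overall route.

```python
from math import hypot


def nearest_neighbour_route(points, start):
    """Visit every point once, always moving to the closest unvisited point.

    `points` maps a name to an (x, y) position.
    """
    route = [start]
    remaining = set(points) - {start}
    while remaining:
        here_x, here_y = points[route[-1]]
        # Break distance ties by name so the result is deterministic.
        nxt = min(
            remaining,
            key=lambda name: (
                hypot(points[name][0] - here_x, points[name][1] - here_y),
                name,
            ),
        )
        route.append(nxt)
        remaining.remove(nxt)
    return route


# Placeholder sampling points on a line (hypothetical coordinates).
points = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (2, 0)}
route = nearest_neighbour_route(points, "A")
```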
1. Drop in Oxygen Saturation Levels
Birds and wildlife depend on the water, and oxygen is essential for any living organism, but oxygen saturation in the water is clearly dropping near Kohsoom and Tansanee.

Figure 16: Drop in levels of Oxygen saturation being observed at Kohsoom and Tansanee
2. Lead Contamination
At Somchair, a steep uptrend in lead contamination levels is seen. Lead is a harmful contaminant in water.

Figure 17: The level of lead in the water has gone up in the recent years at Somchair
3. Release of hot effluents?
The trend line shows a steeper uptrend at Tansanee than at the other locations. There is good reason to believe that the peak of 34 °C in August 2016 might not have been caused by natural factors, as the rest of the proximal locations hover in the range of 22-25 °C. One plausible explanation could be a factory that uses Tansanee as a water source for its cooling towers and discharges hot effluents into the water.

Figure 18: The usual seasonal water temperature pattern at Tansanee rises to 34 °C in August 2016, a 5 °C increase from the previous year that is not observed in the proximity. Hence, there is strong reason to believe this might not be a natural cause.
4. A relocated dumping site at Somchair/Tansanee
The recent spikes at Somchair and Tansanee in Lead, Total Nitrogen, etc. raise the question of whether these might be new dumping sites that have appeared in recent years. The points below support this:
1. Methylosmoline is expected to be higher near Kohsoom, as Kasios had been dumping their industrial waste. However, the uptrend at Somchair is totally unexpected and shows that the Methylosmoline content elsewhere in the preserve might continue to harm the pipits.
2. Magnesium uptrend at Somchair (Figure 5) happens along with an uptrend in the stream system of Boonsri-Kannika. These two streams are totally unconnected, discounting a transmission through water borne channels, giving a strong reason to believe some other illicit activity might be happening at Somchair.
Proposed changes to sampling strategy
-Regular sampling frequency
The interval between two consecutive samples (not counting multiple samples on the same day) is shown below. At Boonsri, the department previously measured samples once a month, whereas in recent years we see a surge in measurements on a weekly or daily basis. At Achara, the trend is quite the opposite of Boonsri. The absence of daily recordings at Tansanee probably reflects the level of access samplers have in getting there, or the importance of Tansanee to the samplers. Tansanee and Somchair need to be measured more frequently than before. Water temperature being measured every day at Chai in 2016 probably indicates that the samplers have begun to move towards consistent sampling of chemicals at various locations.
Redundancy in the chemicals collected can also be understood if the sampling frequencies are aligned across locations, as correlations of different chemical measurements can be checked upon and thereby help narrow down on the chemicals to focus on.


Figure 19: Recording the frequency of sampling by noting the difference between two consecutive measurements at a particular location (irrespective of the chemical)
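The interval calculation can be sketched as follows: collapse the sampling dates of one location to unique days, sort them, and take the difference in days between neighbours. The function name and the sample dates are made up for illustration.

```python
from datetime import date


def sampling_intervals(dates):
    """Return the gaps in days between consecutive sampling days.

    Multiple samples on the same day count as a single sampling day.
    """
    unique_days = sorted(set(dates))
    return [
        (later - earlier).days
        for earlier, later in zip(unique_days, unique_days[1:])
    ]


# Made-up sampling dates at one location, irrespective of the chemical.
dates = [
    date(2016, 1, 1),
    date(2016, 1, 1),  # second sample on the same day, ignored
    date(2016, 1, 8),
    date(2016, 2, 8),
]
intervals = sampling_intervals(dates)  # a weekly gap, then a monthly gap
```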
-New sources of data
Measuring the flow rate might help to understand whether a chemical's dilution is being exacerbated from upstream or downstream and, at the same time, provide an indication of the flow direction. This would also help to identify any choke points, where the river takes a long time to flow from point A to B, leading to the birds or wildlife being directly affected by the streams. The altitude of the sampling points can also serve as an estimate of flow direction. Water properties such as pH and conductivity can also be measured.
In a nutshell, our recommendation is to target the right and relevant measurements, and measure them consistently across locations by adopting similar routes. If the readings of the sensors are automated, then all that is needed is to schedule the recordings in a cohesive time interval. We hope that the hydrology department can take our suggestions in their future endeavors.
[1] https://www.tableau.com/about/blog/2016/4/visualizing-dense-data-how-cut-and-superpose-areas-52839