1. Entry Name:  "KULeuven-Alcaide-MC2"

  2. VAST Challenge 2018
    Mini-Challenge 2

  3. Team Members:

Daniel Alcaide, KU Leuven, daniel.alcaide@esat.kuleuven.be     PRIMARY
Jan Aerts, KU Leuven, jan.aerts@kuleuven.be

Student Team:  YES

  4. Tools Used:

R (mclean, tidyverse, shiny, igraph)

Approximately how many hours were spent working on this submission in total?

50h

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES

Video

https://www.youtube.com/watch?v=XMm1NRwBb8s

 

Important notes:


Questions

  1. Characterize the past and most recent situation with respect to chemical contamination in the Boonsong Lekagul waterways. Do you see any trends of possible interest in this investigation?  Your submission for this question should contain no more than 10 images and 1000 words.

To characterise the situation of the water, we analysed the measurements grouped by location. We assume that some sampling points are connected unidirectionally, as illustrated in Fig-1.1. Each location name carries a code derived from its position in this hierarchy. The first letter (uppercase) identifies the section (A, B, C or D), an independent part of the river that has no contact with the other sections. The second letter (lowercase) identifies the position in the hierarchy within the section.

Fig 1.1 – Hierarchy of sampling points. We identified four independent sections (A, B, C and D).

A. Analysis of chemical concentration

The analysis of chemical values is based on a custom interface developed for the mini-challenge. The interface is composed of three views: a dendrogram, a heat-map and a set of small-multiple line charts shown through brushing interaction.

The dendrogram defines the level of aggregation in the heat-map and complements the set of filters at the top of the interface.

The heat-map shows patterns in summarised form (the time series of each measure at each location, clustered using Euclidean distance), as shown in Fig-1.2. Rows are sorted using multidimensional scaling to make similar patterns easier to detect. When the user selects a single pattern or a group of patterns, line charts of the individual selected measurements are shown below the heat-map.
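The clustering and row ordering described above can be sketched as follows. This is a minimal Python illustration, not the R/shiny implementation behind the interface. It assumes that each heat-map row is an equal-length numeric series (one measure at one location, e.g. monthly values), uses complete linkage (the linkage method is not stated in the text), and approximates the multidimensional-scaling sort with classical MDS reduced to one dimension by power iteration. The function names are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_rows(rows, threshold):
    """Complete-linkage agglomerative clustering cut at `threshold`.

    `rows` maps a row label (e.g. "Ammonium@Ba-Tansanee") to a series.
    Two clusters merge only while the largest distance between their members
    is <= threshold, so threshold = 0 merges only identical series.
    """
    clusters = [[label] for label in rows]

    def linkage(c1, c2):
        return max(euclidean(rows[a], rows[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


def mds_order(rows, iterations=200):
    """Order row labels by their coordinate on the first classical-MDS axis."""
    labels = list(rows)
    n = len(labels)
    d2 = [[euclidean(rows[a], rows[b]) ** 2 for b in labels] for a in labels]
    mean = [sum(r) / n for r in d2]  # row means (equal to column means: symmetric)
    grand = sum(mean) / n
    # Double centring: B = -1/2 * J D^2 J
    centred = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)]
               for i in range(n)]
    v = [float(k + 1) for k in range(n)]  # non-constant start vector
    for _ in range(iterations):  # power iteration towards the leading eigenvector
        w = [sum(centred[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [label for _, label in sorted(zip(v, labels))]
```

With `threshold = 0`, `cluster_rows` keeps every distinct series as its own row, which mirrors the threshold-0 view discussed in Finding 2; larger thresholds give the more aggregated default view.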

About the interface

Our first exploration revealed two large groups in Fig-1.2 (Part-A and Part-B). Part-A is a group of chemicals with consistently high values over time, described in the following section. Part-B includes most of the substances analysed in the preserve, which show a progressive decrease over time (regardless of their risk to the environment). Visually, a breakpoint can be set at 2009: values after 2009 are relatively lower (a higher proportion of blue rectangles) than those before (a higher proportion of red rectangles).

Fig 1.2 – Default view of the interface. The dendrogram represents the grouping of all measures by location. The heat-map gives an overview of the measures from 1998 to 2016 using a clustering threshold of 100. Each row in the heat-map represents the behaviour over time of a chemical (or a group of chemicals), expressed as the number of median absolute deviations (MADs) from the global median. Colours encode a categorisation of this number: a negative value (blue) represents a value below the global median (the median of each substance regardless of location), and a positive value (red) represents a value above it.
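The MAD encoding in the caption can be sketched as below: each value becomes a signed number of median absolute deviations from the substance's global median, which is then binned into colour categories. This is an illustrative Python sketch that assumes an unscaled MAD (no 1.4826 consistency factor) and hypothetical bin edges of ±1 and ±3 MADs; the edges actually used in the interface are not given in the text.

```python
from statistics import median


def mad_scores(values):
    """Signed number of MADs between each value and the median of `values`.

    `values` should hold every measurement of one substance across all
    locations, so that the median is the substance's global median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: the score is undefined, so every value
        # is reported as typical (0).
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def categorise(score, edges=(-3.0, -1.0, 1.0, 3.0)):
    """Bin a MAD score into -2..2 (hypothetical edges; negative = blue, positive = red)."""
    if score < edges[0]:
        return -2
    if score < edges[1]:
        return -1
    if score <= edges[2]:
        return 0
    if score <= edges[3]:
        return 1
    return 2
```

Using the median and MAD rather than the mean and standard deviation keeps a few extreme measurements from shifting the reference point for the whole substance.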

Finding 1: High values at locations Ba-Tansanee and Da-Kohsoom

In contrast with the global situation described in the previous section (Part-B of Fig-1.2), the measurements in Part-A of Fig-1.2 come mainly from locations Ba-Tansanee and Da-Kohsoom, as shown in Fig-1.3. The values in this group are higher than at other locations:

-        Ammonium in Ba-Tansanee and Da-Kohsoom

-        Chlorides in Ba-Tansanee

-        Fecal streptococci in Da-Kohsoom

-        Nitrates in Ba-Tansanee

-        Orthophosphate-phosphorus in Da-Kohsoom

-        Total dissolved salts in Ba-Tansanee, Ca-Somchair and Dd-Busarakhan

-        Total phosphorus in Da-Kohsoom

Fig 1.3 – Selection of measures with sustained high values over time (from Part-A of Fig-1.2).

Finding 2: Substances are not evaluated equally

Clustering the signals can hide underlying groups. The lowest threshold value (threshold = 0) reveals all unique patterns, although it increases the number of elements to explore, as shown in Fig-1.4. Differences in the frequency of the measurements can be identified quickly. Part-B of Fig-1.4 represents the most common behaviour: samples spread out across the whole time period. Parts A and C correspond to chemicals analysed only during a specific period, and Part D to the inclusion of new sampling points such as Aa-Decha, Ba-Tansanee and Cb-Achara.

Fig 1.4 – Overview of the measures from 1998 to 2016 using a clustering threshold of 0. Parts A and C reveal chemicals collected only during the periods 2005-2008 and 2008-2010, respectively. Part D represents the inclusion of new sampling points and Part B the most general sampling pattern.

Finding 3: Correlation between the sampled period and values collected

The measurements in Part A of Fig-1.4 were collected in sections C and D of the preserve from January 2005 until January 2008. This group includes Alachlor, alpha-Hexachlorocyclohexane, beta-Hexaxchlorocyclohexane, Dieldrin, Endosulfan (alpha and beta), Heptachlor and Heptachloroepoxide, Methoxychlor and Metolachlor, p,p-DDD, p,p-DDE, and Simazine.

The elements alpha-Hexachlorocyclohexane, beta-Hexaxchlorocyclohexane, p,p-DDD and p,p-DDE (Fig-1.5) show similar patterns, with a rise during the first year of sampling (2005) and a decline during the last sampling period (2007).

Fig 1.5 – Chemical concentration evolution for alpha-Hexachlorocyclohexane, beta-Hexaxchlorocyclohexane, p,p-DDD and p,p-DDE.

Measurements of Endosulfan (alpha and beta) collected during the first ten months of 2005 show low concentrations of these chemicals in the preserve (Fig-1.6). The number of samples decreased during 2006, but the few results collected revealed a sharp increase in concentration. At the beginning of 2007, concentrations returned to a low and stable level.

Fig 1.6 – Chemical concentration evolution for Endosulfan (alpha and beta).

Between January 2008 and January 2010, new measures were evaluated, such as Aluminium, Anthracene, Benzo(a)anthracene, Cyanides, Pyrene, Selenium, Sulfides and Tetrachloromethane.

This group of chemicals is represented in aggregated form in Part-C of Fig-1.4 and individually in Fig-1.7. AOX, PCB 138 and Fluoranthene also belong to this group, although they had been collected before: AOX has measurements from 1998 that were discontinued until 2008, and PCB 138 and Fluoranthene have a single sample in June 2007, before the active period in 2008.

Fig 1.7 – Chemical concentration evolution for Aluminium, Anthracene, Benzo(a)anthracene, Cyanides, Pyrene, Selenium, Sulfides, Tetrachloromethane, AOX, PCB 138 and Fluoranthene.

B. Analysis of sampling

To understand the situation in the preserve, we analysed which chemicals were collected at each location and how often. We developed a custom interface for this exploration (Fig-1.8). It integrates three views: a dendrogram (Part-A) that shows the clustering of the chemicals collected in each sampling (using Tanimoto distance), a projection of the water analyses over time (Part-B), and a detailed representation of the measures assessed in each sampling type (Part-C).
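On presence/absence data, the Tanimoto distance between two samplings is one minus the ratio of shared chemicals to all chemicals measured in either of them. A minimal Python sketch, assuming each sampling is represented as the set of measure names it contains (the helper names are hypothetical):

```python
def tanimoto_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of measured chemicals."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty samplings are treated as identical
    return 1.0 - len(a & b) / len(union)


def distance_matrix(samplings):
    """Pairwise Tanimoto distances; `samplings` maps a sampling id to its chemicals."""
    ids = sorted(samplings)
    return ids, [[tanimoto_distance(samplings[i], samplings[j]) for j in ids]
                 for i in ids]
```

A sampling that measures only water temperature is at distance 1.0 from any sampling that does not include water temperature, which is why such samplings end up in a separate cluster of the dendrogram.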

According to the hierarchical clustering shown in Part-A of Fig-1.8, we consider that four clusters characterise the types of samples collected in the preserve. Part-B of Fig-1.8 shows a progressive decline in the diversity of the measures evaluated, especially from 2008 onwards.

Cluster 1 is the most diverse analysis type (it collects the most substances). It was the most common analysis until 2008, and from 2014 it was frequently performed at locations Cc-Sakda and De-Kannika (the last positions in the hierarchy of Fig-1.1).

Cluster 2 can be considered a simplification of Cluster 1: the colour intensity in Part-C is lighter for Cluster 2 than for Cluster 1. After 2008, this became the most common type at all locations.

Cluster 3 appears to be a specialised analysis focused on the measurement of AGOC-3A, Chlordinine and Methylosmoline.

Lastly, Cluster 4 measured only water temperature and was performed only at location Dc-Chai.

 

Fig 1.8 – Screenshot of the interface for the sampling types, using a clustering threshold of 150. The overview (Part-B) plots the sampling types over time, with a one-dimensional projection on the y-axis: similar water samplings are placed close together and dissimilar ones far apart. The colour of each point represents its cluster in the dendrogram (Part-A).

        

  2. What anomalies do you find in the waterway samples dataset?  How do these affect your analysis of potential problems to the environment? Is the Hydrology Department collecting sufficient data to understand the comprehensive situation across the Preserve? What changes would you propose to make in the sampling approach to best understand the situation? Your submission for this question should contain no more than 6 images and 500 words.

        

Finding 1: Disappearance of signals

Based on the hierarchy defined in Fig-1.1, we assume that when a set of locations lies on the same connected waterway, signals at an upstream location propagate to the downstream sites; that is, signals detected at the beginning of the waterway should still be visible at its end.
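This propagation assumption can be checked mechanically. The Python sketch below assumes that the hierarchy of Fig-1.1 is available as (upstream, downstream) pairs and that each chemical has already been flagged as a "high signal" per location (for instance, with a MAD-category threshold). Both the edge list in the example and the flagged sets are hypothetical placeholders, not the study's data.

```python
from collections import defaultdict


def downstream_of(edges):
    """Map each location to every location reachable downstream of it."""
    graph = defaultdict(set)
    for up, down in edges:
        graph[up].add(down)
    reach = {}

    def visit(node):
        if node not in reach:
            reach[node] = set()  # set before recursing, so a cycle cannot loop forever
            for nxt in graph[node]:
                reach[node] |= {nxt} | visit(nxt)
        return reach[node]

    for up, _ in edges:
        visit(up)
    return reach


def vanished_signals(edges, high):
    """(location, chemical, downstream locations) for every chemical flagged
    high at a location but at none of the locations downstream of it."""
    reach = downstream_of(edges)
    vanished = []
    for loc in sorted(reach):
        below = reach[loc]
        if not below:
            continue  # terminal location: nothing downstream to compare with
        for chem in sorted(high.get(loc, set())):
            if not any(chem in high.get(d, set()) for d in below):
                vanished.append((loc, chem, tuple(sorted(below))))
    return vanished


# Hypothetical chain for section C and made-up flags, for illustration only.
edges = [("Ca-Somchair", "Cb-Achara"), ("Cb-Achara", "Cc-Sakda")]
high = {"Ca-Somchair": {"Mercury"}, "Cb-Achara": {"Methylosmoline"}, "Cc-Sakda": set()}
print(vanished_signals(edges, high))
```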

During the analysis, we discovered signals that disappeared at subsequent sampling points. In section C (locations Somchair, Achara and Sakda), Methylosmoline, Fecal coliforms, Chemical Oxygen Demand and Mercury had high concentrations at Somchair or Achara, as shown in Fig-2.1. However, these signals are not detectable at Sakda, which is downstream in the waterway.

Fig 2.1. Chemical concentration evolution for Methylosmoline, Fecal coliforms, Chemical Oxygen Demand (Mn) and Mercury in section C (Somchair, Achara and Sakda).

Similarly, in section D we detected high signals of Ammonium, Orthophosphate-phosphorus, Fecal streptococci, Mercury, Total coliforms, Total dissolved salts, Total nitrogen, Anionic active surfactants and Methylosmoline (Fig-2.2). These signals appeared at locations Da-Kohsoom and Dd-Busarakhan, but not at the following positions, Dc-Chai and De-Kannika, respectively.

The quick disappearance of these signals may be due to the ingestion or absorption of these products by the fauna or flora, which could cause severe damage to the environment if toxicity levels are reached.

Fig 2.2. Line charts of samples for Ammonium, Orthophosphate-phosphorus, Fecal streptococci, Mercury, Total coliforms, Total dissolved salts, Total nitrogen, Anionic active surfactants and Methylosmoline in section D (Kohsoom, Boonsri, Chai, Busarakhan and Kannika).

Finding 2: Consistent sampling frequency with one exception

Previously we identified different types of samples (Fig-1.8) but did not investigate their frequency. Fig-2.3 shows the weekly number of samplings by location. One analysis per week is clearly the norm, although from 2014 the number fluctuates slightly between one and two analyses per week at all locations. The increase in analyses at location Dc-Chai corresponds to the type of analysis described as Cluster 4 in Fig-1.8.

Fig 2.3. Weekly number of water analyses by location.
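Counting samplings per week requires deciding what one "sampling" is, because the data hold one row per measurement. A minimal Python sketch, assuming that all measurements taken at the same location on the same date form one sampling and that weeks are ISO weeks (both are assumptions; the text does not specify them):

```python
from collections import Counter
from datetime import date


def weekly_samplings(records):
    """Number of samplings per (location, ISO year, ISO week).

    `records` holds (location, date, measure) rows, one per measurement.
    Rows sharing a location and a date are counted as a single sampling.
    """
    samplings = {(loc, day) for loc, day, _ in records}
    counts = Counter()
    for loc, day in samplings:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(loc, iso_year, iso_week)] += 1
    return counts


# Made-up rows: two samplings at Dc-Chai in the same week, one at Aa-Decha.
records = [
    ("Dc-Chai", date(2014, 3, 3), "Water temperature"),
    ("Dc-Chai", date(2014, 3, 3), "Ammonium"),
    ("Dc-Chai", date(2014, 3, 6), "Water temperature"),
    ("Aa-Decha", date(2014, 3, 4), "Ammonium"),
]
print(weekly_samplings(records))
```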

Finding 3: The frequency of diverse tests declines over time

Fig-2.4 illustrates the diversity of the samplings, i.e. the number of different substances evaluated in each analysis.

The strong fluctuation between consecutive weeks reflects an alternation between diverse and non-diverse samplings.

On average, around thirty substances are evaluated per analysis, although a marked increase can be identified between 2005 and 2010.

In general, Fig-2.4 indicates that diverse analyses became less frequent, mainly in the last two years of data.

To get a clearer understanding of the situation, we suggest weekly samplings that use the sampling types performed from 2005 to 2010 as a reference.

Fig 2.4. Diversity (number of chemicals evaluated) of water analyses over time. When a week contains more than one analysis, their combined diversity is shown. Periods of high variability show a different number of chemicals evaluated between samplings.
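The diversity measure of Fig-2.4 can be sketched in Python as follows. The sketch assumes ISO weeks and one (location, date, measure) row per measurement. It reads "combined diversity" as the number of distinct measures across all analyses of a location in that week, which is our interpretation rather than a documented rule.

```python
from collections import defaultdict
from datetime import date


def weekly_diversity(records):
    """Number of distinct measures per (location, ISO year, ISO week).

    When a week contains several analyses, the union of their measures is
    counted once (one reading of the caption's 'combined diversity').
    """
    measures = defaultdict(set)
    for loc, day, measure in records:
        iso_year, iso_week, _ = day.isocalendar()
        measures[(loc, iso_year, iso_week)].add(measure)
    return {key: len(names) for key, names in measures.items()}


# Made-up rows: two analyses in one week that share one measure.
records = [
    ("Cc-Sakda", date(2009, 6, 1), "Ammonium"),
    ("Cc-Sakda", date(2009, 6, 1), "Nitrates"),
    ("Cc-Sakda", date(2009, 6, 4), "Ammonium"),
    ("Cc-Sakda", date(2009, 6, 4), "Zinc"),
]
print(weekly_diversity(records))
```

Taking the union rather than the sum avoids counting a measure twice when it appears in both analyses of the same week.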

Finding 4: Number of samples per measure

Chemical analysis traditionally follows a standard procedure of three replicates per measure to mitigate the effect of external factors. Fig-2.5 represents the range between the minimum and the maximum number of samples per measure. Although the global mean is three, we found unexpected cases:

Fig 2.5. Range (minimum and maximum) of samples per chemical over time. The area shows the range of samples taken in the samplings. For example, a range between 2 and 4 means that every chemical was analysed at least twice and at least one chemical was analysed four times. Data are aggregated by month.
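The range plotted in Fig-2.5 can be sketched in Python as shown below. The sketch makes several assumptions: there is one (location, date, measure) row per individual sample, a sampling is identified by its location and date, and monthly aggregation keeps the smallest minimum and the largest maximum within the month. The exact aggregation used for the figure is not stated.

```python
from collections import Counter, defaultdict
from datetime import date


def replicate_range_by_month(records):
    """(min, max) number of samples per measure, aggregated by (year, month).

    Within each sampling (location, date) the samples of each measure are
    counted; the monthly range spans the smallest and largest of these counts.
    """
    per_sampling = defaultdict(Counter)
    for loc, day, measure in records:
        per_sampling[(loc, day)][measure] += 1
    monthly = {}
    for (_, day), counts in per_sampling.items():
        month = (day.year, day.month)
        low, high = min(counts.values()), max(counts.values())
        if month in monthly:
            low = min(low, monthly[month][0])
            high = max(high, monthly[month][1])
        monthly[month] = (low, high)
    return monthly


# Made-up sampling: Ammonium measured twice, Nitrates four times.
records = (
    [("Ba-Tansanee", date(2010, 5, 3), "Ammonium")] * 2
    + [("Ba-Tansanee", date(2010, 5, 3), "Nitrates")] * 4
)
print(replicate_range_by_month(records))
```

With this reading, the toy sampling reproduces the caption's example: a 2-4 range where every measure has at least two samples and one measure has four.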

  3. After reviewing the data, do any of your findings cause particular concern for the Pipit or other wildlife? Would you suggest any changes in the sampling strategy to better understand the waterways situation in the Preserve? Your submission for this question should contain no more than 6 images and 500 words.

Substances outside of the normal range

Health organisations have defined toxicity levels and normality ranges for most chemicals in water; these values determine the concentration limit or interval that poses no risk to the environment. Beyond these limits, the chance of problems derived from contamination increases.

We searched the literature for the normality range or the toxicity level of each substance collected in the preserve. A value above or below the acceptance limits was categorised as dangerous for wildlife.
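The categorisation rule above can be sketched in Python as follows. The limits in the example are placeholders, not the literature values used in the study, and the function names are hypothetical. A bound may be missing (for instance, when only a toxicity level is known), so only the available bound is checked.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Limits = Dict[str, Tuple[Optional[float], Optional[float]]]


def classify(value: float, low: Optional[float], high: Optional[float]) -> int:
    """-1 below the range (blue), +1 above it (red), 0 inside.

    Either bound may be None, e.g. when only a toxicity level (upper limit)
    is known for a substance.
    """
    if low is not None and value < low:
        return -1
    if high is not None and value > high:
        return 1
    return 0


def flag_measurements(rows: Iterable[Tuple[str, str, float]], limits: Limits):
    """Yield (location, measure, value, flag) for out-of-range rows only.

    Measures without known limits are skipped.
    """
    for loc, measure, value in rows:
        if measure not in limits:
            continue
        flag = classify(value, *limits[measure])
        if flag:
            yield loc, measure, value, flag


def locations_by_measure(rows, limits: Limits):
    """Group flagged rows into measure -> set of locations (a Table 3.1-like view)."""
    table = defaultdict(set)
    for loc, measure, _, _ in flag_measurements(rows, limits):
        table[measure].add(loc)
    return dict(table)


# Placeholder limits and rows, NOT the literature values used in the study.
limits = {"Dissolved oxygen": (5.0, None), "Zinc": (None, 0.1)}
rows = [
    ("Aa-Decha", "Dissolved oxygen", 3.0),
    ("Ba-Tansanee", "Zinc", 0.5),
    ("Ca-Somchair", "Zinc", 0.05),
    ("Cb-Achara", "Unlisted measure", 9.0),
]
print(locations_by_measure(rows, limits))
```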

The interface presented in Fig-1.2 can highlight chemicals outside the acceptance range, which therefore present a potential health risk (Fig-3.1). We distinguish two types of findings: sporadic hazardous values and hazardous values sustained over time.

All findings are listed by location in Table 3.1; Fig-3.2 complements the table with individual line charts by compound and location.

Fig 3.1. Overview of the measures from 1998 to 2016 using a clustering threshold of 100, with detection of values outside the acceptance range enabled.

Table 3.1. List of anomalies detected by location. Chemicals are linked to the source used to determine the toxicity level or normality range of the substances.

Locations: Aa, Ba, Ca, Cb, Cc, Da, Db, Dc, Dd, De

Chemical                          Locations flagged (of 10)
alpha-Hexachlorocyclohexane       7
Ammonium                          10
AOX                               7
Arsenic                           1
beta-Hexaxchlorocyclohexane       7
Biochemical Oxygen                5
Cadmium                           8
Cesium                            1
Chemical Oxygen Demand (Cr)       1
Chemical Oxygen Demand (Mn)       4
Chlorides                         2
Chromium                          2
Dissolved oxygen                  6
Fecal coliforms                   7
Fecal streptococci                4
gamma-Hexachlorocyclohexane       2
Heptachloroepoxide                5
Inorganic nitrogen                3
Iron                              6
Lead                              1
Manganese                         10
Mercury                           3
Nitrates                          3
Nitrites                          3
Organic nitrogen                  3
Orthophosphate-phosphorus         10
p,p-DDD                           7
p,p-DDE                           5
p,p-DDT                           6
PAHs                              1
PCB 101                           7
PCB 118                           5
PCB 138                           7
PCB 153                           7
PCB 180                           7
PCB 28                            6
PCB 52                            6
Petroleum hydrocarbons            4
Potassium                         8
Total dissolved phosphorus        7
Total dissolved salts             7
Total hardness                    9
Total organic carbon              8
Total phosphorus                  10
Zinc                              10


        

Fig 3.2. Chemical concentration evolution for dangerous values. Values above or below the normality range are coloured red or blue, respectively.

Of all the potential hazards detected (Fig-3.2), the substances with the clearest signals are:

New sampling strategy

Fig-1.1 defines a hierarchy of connections between locations. A more efficient approach could focus the analysis on downstream locations such as Aa-Decha, Ba-Tansanee, Cc-Sakda and De-Kannika, because signals from upstream locations should still be detectable there. However, a few signals disappeared between locations, as discussed in question 2, so those chemicals should still be evaluated at all locations.
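Choosing the downstream sampling points amounts to finding the sinks of the flow graph and, for each sink, the upstream locations whose signals it should capture. A Python sketch under the assumption that Fig-1.1 is encoded as (upstream, downstream) pairs; the edge list in the example is a hypothetical placeholder covering section C only.

```python
from collections import defaultdict, deque


def terminal_locations(edges, locations):
    """Locations with no downstream neighbour (the sinks of the flow graph)."""
    has_downstream = {up for up, _ in edges}
    return sorted(loc for loc in locations if loc not in has_downstream)


def upstream_of(edges, sink):
    """All locations whose water eventually flows into `sink` (breadth-first search)."""
    parents = defaultdict(set)
    for up, down in edges:
        parents[down].add(up)
    seen, queue = set(), deque([sink])
    while queue:
        node = queue.popleft()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


locations = ["Aa-Decha", "Ba-Tansanee", "Ca-Somchair", "Cb-Achara", "Cc-Sakda"]
edges = [("Ca-Somchair", "Cb-Achara"), ("Cb-Achara", "Cc-Sakda")]  # hypothetical
for sink in terminal_locations(edges, locations):
    print(sink, sorted(upstream_of(edges, sink)))
```

Sampling only the sinks relies on the propagation assumption; chemicals whose signals vanish between locations would still need to be measured at the upstream points returned by `upstream_of`.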

One hundred and six chemicals were measured in the preserve. Some of them are correlated, so we think a more efficient analysis can be defined. Given that around 25-30 measures are evaluated in each water sampling, we defined a list of chemicals that best represents the situation, using the MCLEAN (Multilevel Clustering Exploration As Network) technique. We detected five clusters in the dendrogram (Fig-3.3) and visualised them as a network in Fig-3.4. Elements in the dense sections of the graph are highly similar (small differences), so fewer chemicals from those sections are required. Node aggregation in MCLEAN reduces the number of nodes in the resulting network (Fig-3.5). A node in Fig-3.5 can contain several chemicals; its label is the compound with the highest correlation to the other compounds. The resulting network in Fig-3.5 provides a list of 25 chemicals that can give a representative view of the situation in the preserve.
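The network step can be approximated as follows. This is not MCLEAN itself (the authors' R package), whose exact aggregation rule is not described here. It is a simplified stand-in with three assumptions. First, chemicals are linked when the Euclidean distance between their profiles is at most the threshold. Second, each connected component becomes one aggregated node. Third, that node is labelled by its member with the most links, used as a proxy for "most correlated with other compounds". Names and toy data are hypothetical.

```python
import math
from itertools import combinations


def threshold_network(profiles, threshold):
    """Edges between chemicals whose profiles lie within `threshold` (Euclidean)."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if math.dist(profiles[a], profiles[b]) <= threshold
    ]


def aggregate_nodes(names, edges):
    """Collapse connected components into single nodes.

    Returns label -> sorted members, where the label is the member with the
    most edges (ties broken alphabetically).
    """
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = {n: 0 for n in names}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        parent[find(a)] = find(b)  # union the two components
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return {
        max(sorted(members), key=lambda m: degree[m]): sorted(members)
        for members in groups.values()
    }


# Toy profiles: chem_a..chem_c are close to each other, chem_d is far away.
profiles = {"chem_a": [0, 0], "chem_b": [0, 1], "chem_c": [0, 2], "chem_d": [10, 10]}
edges = threshold_network(profiles, threshold=1.0)
print(aggregate_nodes(sorted(profiles), edges))
```

Connected components can chain dissimilar chemicals through intermediates, so this stand-in tends to aggregate more aggressively than a dendrogram-based cut. It only illustrates how aggregation shrinks a large network into a shorter list of representative compounds.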

Fig 3.3. Clustering of the 106 chemicals collected in the preserve using Euclidean distance.

Fig 3.4. MCLEAN network without node aggregation, using a threshold parameter of 175. Chemicals within the selected threshold distance are linked (Alcaide & Aerts, 2018).

Fig 3.5. MCLEAN network with node aggregation, using a threshold parameter of 175.