Visualizing Alaska King Crab Data with Spotfire 

CMSC838B: Information Visualization

Jinwook Seo 

jinwook@cs.umd.edu

 

Introduction

The Alaska Department of Fish and Game produced the Kodiak Island king crab survey data. The underlying purpose of the survey was to gather biological and abundance information for use in managing the king crab fishery (opening and closing fisheries, setting quotas, etc.). The data set also came with a set of general questions and issues intended to stimulate analysis; the major ones are taken up one by one in the Visualization Results section below.

Originally the data set was intended to be analyzed statistically. Before starting a statistical analysis, an interactive visualization tool can help set a good direction for it. I selected Spotfire because it seemed well suited to visualizing a simple tabular data set like this King Crab data set. As a result, I found some interesting patterns in the data, which could serve as good pilot results for the actual statistical analysis.

 

King Crab Data

The crab data set consists of 9 data files, each with its own size and content, as follows.

File      Lines  Columns  Bytes   Contents
survey    3450   14       244950  the basic survey data
kodiak    2687   -        48264   map coordinates for the Kodiak Island shoreline
dstns     1845   -        40590   distributions of crab by size, year, and category
fleet     23     -        946     commercial fleet/catch/price data, by year
catch     96     -        2112    commercial catch data, by year and district
eggs      14     -        140     average number of eggs per female, by year
salinity  60     -        718     ocean salinity, by month
celsius   68     -        741     ocean temperature, by month, in degrees Celsius
fullness  1170   -        37440   distributions of females by clutch fullness

 

Visualization Results

I tried to find rough answers to the questions that accompany the data set. For each question below, I present the visualization results that give an idea of the answer.

1. Can trends in crab abundance be characterized?

As the following scatter plots show, the number of crabs decreased overall across all age groups and both sexes. Interestingly, the number of eggs per female increased rapidly. Because abundance fell even though each female carried more eggs, this points to a decrease in the hatch rate or an increase in the mortality of embryos.
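One way to put a number on "decreased overall" is to fit a least-squares line to the yearly averages and read off the slope: a negative slope means a declining trend, and its size is the average change per year. This is a minimal sketch that assumes the yearly averages have already been computed from the survey file; the function name `trend_slope` and the sample values are hypothetical, for illustration only.

```python
def trend_slope(years, values):
    """Least-squares slope of values regressed on years (change per year)."""
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    if sxx == 0:
        raise ValueError("years must not all be equal")
    return sxy / sxx


# Hypothetical yearly averages (crabs per pot), NOT taken from the survey file.
years = [1973, 1974, 1975, 1976, 1977]
avg_per_pot = [12.0, 10.5, 9.0, 7.5, 6.0]
print(trend_slope(years, avg_per_pot))  # -1.5
```

A slope computed this way for each age group and sex would make the comparison across groups explicit rather than visual.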

2. To the extent that abundance has declined, what factors are linked to the decline?

The following graph shows the number of vessels registered for fishing (red line), the number of crabs caught (blue line), and the wholesale price of king crab in dollars per pound (black line). As the graph shows, the number of crabs caught decreased even though the number of registered vessels increased. From this I conclude that overfishing is not the major cause, although it could be a minor cause that accelerated the decline.
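The relationship read off the graph can be summarized by a correlation coefficient between the yearly number of registered vessels and the yearly catch; a value near -1 means the catch fell steadily as the fleet grew. This is a minimal sketch of Pearson's r using only the standard library; the sample numbers are hypothetical and not from the fleet file. A correlation by itself does not establish a cause, so it only supports the visual comparison.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly values, for illustration only (not from the fleet file).
vessels = [100, 120, 140, 160]
catch_millions = [40.0, 30.0, 20.0, 10.0]
print(round(pearson_r(vessels, catch_millions), 6))  # -1.0
```

The same function applies to any pair of yearly series, for example the adult-female and recruit-male averages compared under question 7.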

King Crab spawn in shallow water in late winter and early spring, and adult King Crab prefer temperatures of 0 °C to 5.5 °C. Considering these two facts, the following results suggest that environmental factors contributed to the decline in King Crab abundance. I think that low salinity and high water temperature prevented embryos from hatching. My reasoning is as follows.

Global warming might be one cause of the rise in water temperature. Higher temperatures also melt ice in the Arctic region, which would lower ocean salinity.
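To connect the monthly temperature data with the 0 °C to 5.5 °C range preferred by adult King Crab, one could count, for each year, the months whose temperature falls outside that range. This sketch assumes the celsius file has already been parsed into (year, month, temperature) tuples; that layout, the function name, and the sample readings are all hypothetical.

```python
# Preferred temperature range of adult King Crab, as stated in the text.
PREFERRED_MIN_C = 0.0
PREFERRED_MAX_C = 5.5


def months_outside_range(readings, lo=PREFERRED_MIN_C, hi=PREFERRED_MAX_C):
    """Count, per year, the monthly readings outside [lo, hi] degrees Celsius.

    readings: iterable of (year, month, temperature_c) tuples. This layout is
    an assumption, not the actual format of the celsius file.
    Years whose readings are all inside the range map to 0.
    """
    counts = {}
    for year, _month, temp_c in readings:
        counts.setdefault(year, 0)
        if temp_c < lo or temp_c > hi:
            counts[year] += 1
    return counts


# Hypothetical readings, NOT taken from the celsius file.
sample = [
    (1975, 1, 3.2), (1975, 7, 6.1),
    (1976, 1, 4.0), (1976, 7, 5.0),
]
print(months_outside_range(sample))  # {1975: 1, 1976: 0}
```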

The survey data cover at most two decades, while the life span of King Crab is almost 20 years, so the data set alone cannot reveal any natural population cycle.

3. Characterize the dynamics of the crab population

The next figure is an animated GIF showing the length distribution of adult female King Crab over time. The peak of the distribution shifts gradually and slightly to the left as time goes by. This means that the King Crab population was getting older, since the length of a King Crab generally increases with its age.
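The movement of the peak in the animation can also be tracked numerically: for each year, bin the lengths into a histogram and record the midpoint of the most populated bin (the histogram mode). This is a minimal sketch; the 5-unit bin width, the millimetre units, and the sample lengths are assumptions for illustration, not values from the survey file.

```python
from collections import Counter


def peak_length(lengths, bin_width=5.0):
    """Midpoint of the most populated length bin (the histogram mode).

    Ties are broken in favour of the smaller bin.
    """
    if not lengths:
        raise ValueError("no lengths given")
    bins = Counter(int(length // bin_width) for length in lengths)
    best_bin = min(bins, key=lambda b: (-bins[b], b))
    return (best_bin + 0.5) * bin_width


def peak_by_year(lengths_by_year, bin_width=5.0):
    """Map each year to the peak of its length distribution, in year order."""
    return {year: peak_length(lengths_by_year[year], bin_width)
            for year in sorted(lengths_by_year)}


# Hypothetical lengths (assumed mm) of adult females, for illustration only.
sample = {
    1975: [131, 133, 134, 128, 142],
    1976: [126, 127, 129, 133, 141],
}
print(peak_by_year(sample))  # {1975: 132.5, 1976: 127.5}
```

Plotting the resulting peak values against year turns the animation into a single static line, which makes the direction and size of the shift easier to judge.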

4. Characterize the changing spatial distribution of the crab population

There were no notable changes in the spatial distribution, except that the population sizes decreased in all districts.

5. By comparing abundances of recruits, prerecruits and postrecruits from year to year, can cohorts be identified and tracked?

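I could see a peak near 1976 in the graph titled "Average # of pre-recruit-4 male", and the peak gradually shifts to the right (toward later years) in the following three graphs. Reading these graphs as successive stages, a peak that appears a little later at each stage is what one would expect from a strong cohort growing through the stages. The peak therefore suggests that there was a strong cohort in the population during that period, which is consistent with the known tendency of the crab population to be large when there are strong cohorts.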
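The cohort argument rests on the peak year moving later from one stage to the next. A minimal sketch of that check, assuming each graph's series is available as a {year: average} mapping and that the series are ordered from the youngest stage to the oldest; the stage labels and numbers below are hypothetical, not survey values.

```python
def peak_year(series):
    """Year with the largest average in a {year: average} mapping (earliest on ties)."""
    return min(series, key=lambda year: (-series[year], year))


def peak_moves_later(stage_series):
    """Return (moves_later, peaks) for an ordered list of stage series.

    stage_series: list of (stage_name, {year: average}) pairs, ordered from
    the youngest stage to the oldest. moves_later is True when the peak year
    strictly increases from each stage to the next.
    """
    peaks = [peak_year(series) for _name, series in stage_series]
    moves_later = all(a < b for a, b in zip(peaks, peaks[1:]))
    return moves_later, peaks


# Hypothetical stages and averages, for illustration only.
stages = [
    ("stage 1", {1975: 2.0, 1976: 5.0, 1977: 3.0, 1978: 1.0}),
    ("stage 2", {1975: 1.0, 1976: 2.5, 1977: 4.5, 1978: 2.0}),
    ("stage 3", {1976: 1.5, 1977: 2.0, 1978: 4.0, 1979: 1.0}),
]
print(peak_moves_later(stages))  # (True, [1976, 1977, 1978])
```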

6. Can anything be said about the "social" interaction of crab with regard to sexes and age groups?

The aging process and length distribution of adult males are similar to those of adult females, shown in the animated graph for question 3. This suggests something about the social interaction of crab with regard to sexes and age groups: there seems to be a natural population balance between the two sexes. I am not sure, but there may also be some interaction between male and female groups of similar size (or age).

7. Although the primary interest is in recruits, clearly having a plentiful supply of fecund females to produce new generations of crab is essential.

As the graph titled "Number of eggs per female" shows, there seems to be no problem with the fertility of females. As I mentioned already, however, the abundance of females seems to be in jeopardy. The small population of juvenile females could also be a potential threat.

I tried to find a correlation by comparing the two graphs titled "Average # of adult female in a pot" and "Average # of recruit males", but I could not find any correlation between the two.

 

Critiques of Spotfire

Spotfire is a powerful visualization tool. We can use it to get a brief but important idea of the raw data before starting more complex and difficult data analysis. Direct manipulation combined with an interactive display makes it easy to find important patterns in relatively large raw data sets. Spotfire is effective at showing correlations among two or three data attributes. It is easy to learn, and what one learns is retained over time.

However, Spotfire has some limitations. It does not support many commonly used aggregation functions (count, sum, average, standard deviation, etc.). I could not standardize the raw data to compare them on the same scale, so I had to preprocess the raw data in MS Excel before importing them. Another limitation is that it is difficult to visualize hierarchical data, or to show correlations in a high-dimensional data set, in a natural way using Spotfire.
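The kind of preprocessing described above comes down to two steps: aggregating raw records (for example, averaging a count per year) and standardizing each resulting series to z-scores so that series with different units can be compared on one scale. This is a minimal sketch using Python's standard `statistics` module, not a reconstruction of the actual Excel steps; the (year, value) record layout and the sample values are hypothetical.

```python
import statistics


def yearly_means(records):
    """Average value per year from (year, value) records, in year order."""
    by_year = {}
    for year, value in records:
        by_year.setdefault(year, []).append(value)
    return {year: statistics.mean(vals) for year, vals in sorted(by_year.items())}


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / sd for v in values]


# Hypothetical pot counts, for illustration only.
records = [(1975, 10.0), (1975, 14.0), (1976, 8.0), (1976, 6.0),
           (1977, 4.0), (1977, 2.0)]
means = yearly_means(records)
print(means)  # {1975: 12.0, 1976: 7.0, 1977: 3.0}
print(standardize(list(means.values())))
```

After standardization every series has mean 0 and standard deviation 1, so curves for, say, catch and price can share one axis in a scatter plot.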

Data source: StatLib, Department of Statistics, Carnegie Mellon University