Drew Hamilton Develops New Prototype for the Global Historical Climate Data through the NOAA-NCEI Internship
Drew Hamilton, a rising junior majoring in Computer Science, worked this summer for the Cooperative Institute for Satellite Earth System Studies (CISESS) under the National Oceanic and Atmospheric Administration (NOAA) 2020 Young Summer Scholars program (ncei.noaa.gov/news/2020-young-summer-scholars). Each year, NOAA hosts and mentors early-career scientists as they gain experience in science and technology.
Hamilton’s work as an intern focused on experimentation and development of a new prototype data processing and data analysis system. He applied graph database architecture to increase the efficiency and function of the Global Historical Climatology Network database, one of the biggest climate observation datasets.
“It was rather difficult to get an internship in the midst of COVID-19,” said Hamilton. “I was fortunate enough to get this opportunity.”
Hamilton is one of only 17 interns selected to support the work of NOAA’s National Centers for Environmental Information (NCEI) in a wide range of topics—from analyzing solar weather on the sun to analyzing the impacts of extreme weather here on Earth.
The National Oceanic and Atmospheric Administration stewards the world’s largest collection of environmental data, and its records include everything from ice cores to weather observations dating as far back as 1763. It is a permanent archive for the nation’s (and the world’s) geophysical data.
“One of the current challenges with GHCN-D is the computational efficiency,” said Hamilton. “The database, around 115,000 stations spanning more than 250 years of daily data, is continually growing in size, which will pose significant problems as the database adds new and diverse station data, especially true with the current storage system, which essentially operates on ASCII text files.”
He created a parallel version of the existing database, GHCN-D (Global Historical Climatology Network Daily), which hosts daily temperature, precipitation, and a myriad of other measurements. The new version uses graph database technology to optimize storage, updating, and processing of the data.
Neo4j, a NoSQL graph database that relies on graph theory, has become more widely known in recent years. Because graph database languages are a relatively new concept, comparatively few examples exist of their application to climatic data.
Hamilton’s role in this project has been to develop and deploy a new graph-enabled version of GHCN-D and to explore the efficiency and analysis advantages that Neo4j and the Cypher query language could bring to climate data processing at NOAA-NCEI, including simpler maintenance and easier academic use of the dataset.
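For readers curious what a graph-enabled layout might look like, the following is a minimal sketch and not Hamilton's actual design. It assumes a hypothetical schema in which a Station node is linked to Observation nodes by a RECORDED relationship; the labels, property names, station ID, and values are all made up for illustration.

```python
from datetime import date

# Hypothetical schema (not from the project): (:Station)-[:RECORDED]->(:Observation)
# MERGE creates a node or relationship only if it does not already exist.
MERGE_OBSERVATION = """
MERGE (s:Station {id: $station_id})
MERGE (o:Observation {station_id: $station_id, date: $date, element: $element})
SET o.value = $value
MERGE (s)-[:RECORDED]->(o)
"""

# Read one station's time series for a single element by following relationships.
STATION_SERIES = """
MATCH (s:Station {id: $station_id})-[:RECORDED]->(o:Observation {element: $element})
RETURN o.date AS date, o.value AS value
ORDER BY date
"""


def observation_params(station_id, obs_date, element, value):
    """Build the parameter map expected by MERGE_OBSERVATION."""
    return {
        "station_id": station_id,
        "date": obs_date.isoformat(),  # store dates as ISO 8601 strings
        "element": element,
        "value": value,
    }


# Illustrative values only; "EXAMPLE00001" is not a real station ID.
params = observation_params("EXAMPLE00001", date(2020, 7, 1), "TMAX", 31.5)
print(params)
```

In a setup like this, the query text and the parameter map would be handed to a Neo4j driver session. Keeping values in parameters rather than pasting them into the query string lets the same Cypher statement be reused for every observation.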
“Though I knew on some level how interdisciplinary Computer Science is by nature, it really surprised me to have an opportunity to work on a problem so directly intertwined with Atmospheric Science,” said Hamilton. “This project also tested my foundations in what I would consider largely my academic hobbies: Mathematics, Science, English, and even Art, foundations which were reinforced by the University of Maryland curriculum.”
In the future, Hamilton plans to deploy this database on the cloud and make the tool available for public use.