Taming climate data

World Data Center conference at MARUM a success

[Image: climate model. Credit: NOAA / GFDL]

From 7 to 9 May 2007, the directors of the 51 World Data Centers met at the World Data Centers Conference to discuss the future of the largest collection of freely accessible scientific data. The meeting took place at the World Data Center for Marine Environmental Data (WDC-MARE) in Bremen, which is operated jointly by MARUM, the DFG Research Center Ocean Margins, and the Alfred Wegener Institute for Polar and Marine Research in Bremerhaven.

Scientists need and produce data in exponentially growing volumes and of ever-increasing complexity. To investigate how the climate will develop in the future, for example, researchers analyze information about the past from very different scientific disciplines: air temperature, water temperature, CO2 content, solar radiation, the magnetic field, ocean currents, ice cover, the position of the continents and much more. Such transdisciplinary measurements cannot be gathered by a single researcher, if only because any one person has the necessary expertise in just one or a few fields. The World Data Centers offer anyone interested free access to data from a wide range of disciplines.

50-year tradition

With the beginning of the International Geophysical Year 1957/58, the predecessor organization of the International Council for Science (ICSU) established the World Data Centers (WDCs). They were to store and manage the observations from what was then the largest scientific measurement campaign ever undertaken. Early on, the centers switched to machine-readable storage media; initially, these were punch cards. Today, fifty years after their founding, the centers around the world archive data on computers with enormous storage capacity and computing power. The data cover all areas of basic scientific research related to the Earth.

In Germany, there are now three World Data Centers: one for climate in Hamburg, one for satellite data on the atmosphere in Oberpfaffenhofen, and WDC-MARE in the state of Bremen. There are currently around two billion data points in the WDC-MARE system, which can be queried in any combination within seconds. It is precisely this kind of data management that accounts for most of the added value of the data centers.

Rapid data growth

"Every two years, the volume of scientific knowledge doubles," explains Michael Diepenbroek, head of WDC-MARE. "Fortunately, the explosion of knowledge goes hand in hand with the rapid development of the computing and storage performance of modern computers. This is the only way to make efficient data management possible." Anyone who has bought a computer in recent years knows how fast hard drive sizes have risen at constant or even falling prices. At the beginning of the nineties, 10 megabytes cost the equivalent of about 250 euros; today, you get 100,000 times that amount, one terabyte.

WDC-MARE alone has a capacity of 1,200 terabytes, held on 3,000 tapes of 50 to 400 gigabytes each. All of this is stored not just once but twice, in different buildings, to be prepared for all eventualities. Overall, however, this data explosion means that data management makes up an increasing share of scientific work. "In order to incorporate a publication with an average of ten data sets into our system, a scientist needs about half a day," says Diepenbroek, estimating the workload.
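As a rough plausibility check of the figures above, the following minimal Python sketch computes the range of total capacity that 3,000 tapes of 50 to 400 gigabytes could hold. It assumes decimal units (1 terabyte = 1,000 gigabytes), which the article does not specify, and the function name is purely illustrative. Under that assumption, the quoted 1,200 terabytes corresponds to the upper bound, in which every tape holds 400 gigabytes.

```python
# Assumption: decimal units, 1 terabyte = 1,000 gigabytes.
GB_PER_TB = 1_000


def tape_capacity_range_tb(num_tapes, min_gb, max_gb):
    """Return the (lowest, highest) possible total capacity in terabytes."""
    lowest = num_tapes * min_gb / GB_PER_TB
    highest = num_tapes * max_gb / GB_PER_TB
    return lowest, highest


# Figures from the article: 3,000 tapes of 50 to 400 gigabytes each.
low_tb, high_tb = tape_capacity_range_tb(3_000, 50, 400)
print(f"Total capacity between {low_tb:.0f} and {high_tb:.0f} terabytes")
```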

Complex climate models

The requirements placed on the individual data centers differ widely. While we at WDC-MARE deal mainly with very complex and diverse data, WDC-CLIMATE in Hamburg has to cope with huge volumes of data. The Hamburg WDC is currently the only one that stores the results of computer-generated climate models. Bremen, on the other hand, is the only center that combines data from various disciplines in its database. This creates a problem, since even individual scientists record their data differently. When different disciplines measure the same parameter, the differences are even greater.

We also have to document exactly, for each data point, how the information was generated: which units and which measuring devices or methods were used. Otherwise, the data cannot be compared. Precisely because scientists are constantly using new parameters, new measuring devices and new methods, the data managers at the centers are constantly facing new challenges, because only comparable data is valuable data. One of the biggest challenges for us is the quality control of the data.
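The idea that every data point must carry its unit and method before values can be compared can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataPoint` record, the unit labels, the conversion table and the sample values are all hypothetical and do not describe the actual WDC-MARE data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    parameter: str  # what was measured, e.g. "water temperature"
    value: float
    unit: str       # unit in which the value was recorded
    method: str     # measuring device or method used


# Hypothetical conversion table to one reference unit (degrees Celsius).
TO_CELSIUS = {
    "degC": lambda v: v,
    "K": lambda v: v - 273.15,
}


def to_celsius(point):
    """Return the data point in degrees Celsius, or fail if the unit is unknown."""
    if point.unit not in TO_CELSIUS:
        raise ValueError(f"undocumented or unknown unit: {point.unit!r}")
    converted = TO_CELSIUS[point.unit](point.value)
    return DataPoint(point.parameter, converted, "degC", point.method)


# Two made-up measurements of the same parameter, recorded differently.
a = DataPoint("water temperature", 4.0, "degC", "CTD probe")
b = DataPoint("water temperature", 277.15, "K", "reversing thermometer")
print(to_celsius(a).value, round(to_celsius(b).value, 2))
```

A value whose unit is missing or unknown is rejected rather than guessed, which mirrors the point made above: without documentation, the data cannot be compared.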

Laborious search

But to use the treasures from the depths of the databases, scientists must first find them. "The data of the German centers can all be found through a common portal using full-text search, and even via Google. The keyword 'nitrate', for example, then returns all the data sets that contain this parameter," reports Diepenbroek. Extending this function to the other centers was one of the objectives of the conference. To this end, the roughly 70 participants from all over the world decided on a pilot project involving five Chinese, three American and three German centers. By the end of the year, their holdings are to be searchable through a common portal with a full-text search that is as quick and convenient as searching the Internet with search engines like Google.
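How a keyword such as "nitrate" can return every data set that mentions it may be illustrated with a small inverted index. The following Python sketch is a simplified assumption of how full-text search works in general; the catalogue entries and identifiers are invented, and it is not a description of the portal's actual implementation.

```python
from collections import defaultdict

# Hypothetical catalogue: data set identifier -> free-text description.
DATASETS = {
    "ds-001": "Nitrate and phosphate concentrations in surface water",
    "ds-002": "Sea ice cover from satellite observations",
    "ds-003": "Sediment core: nitrate, silicate, water temperature",
}


def build_index(datasets):
    """Map every lower-cased word to the set of data sets containing it."""
    index = defaultdict(set)
    for ds_id, text in datasets.items():
        for raw in text.lower().split():
            word = raw.strip(".,:;()")
            if word:
                index[word].add(ds_id)
    return index


def search(index, keyword):
    """Return the sorted identifiers of all data sets containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_index(DATASETS)
print(search(index, "nitrate"))
```

Building the index once and looking up keywords afterwards is what keeps such a search fast even when the number of data sets grows.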

This networking capability allows data from the WDCs to be retrieved through other portals and systems as well, such as the Global Earth Observation System of Systems (GEOSS). This is an international initiative launched by the Group on Earth Observations (GEO), which comprises more than 68 states. The vision of GEOSS is to better understand the dynamic processes on Earth through a coordinated information network. The project can thus also help to monitor compliance with environmental agreements and to provide information for political decisions. In his presentation, GEOSS Director Prof. José Achache highlighted the role of the WDCs as global long-term data archives. In this role, they will also be used for the International Polar Year (IPY) from 2007 to 2009. The IPY's worldwide program involves some 50,000 scientists, who will collect approximately 500,000 data records. This corresponds to the amount of data collected by the German WDCs in 20 years.

Another topic of discussion at the conference was the citability of data sets and data sources. The databases depend on the scientists who provide their data. Doing so involves work but has so far brought few benefits. Therefore, data sets should be citable in the same way as a scientific publication and should receive a corresponding identifier, the so-called DOI (Digital Object Identifier). In this way, the work of the publishing scientists is acknowledged and can be cited by colleagues in their own articles. The German WDCs are pioneers in this field and presented their experiences from a pilot project at the conference. "However, that was probably still a dream for most of the participants. It will take quite some time before the centers worldwide follow us," regrets Michael Diepenbroek.
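What a DOI-based data citation might look like can be sketched as follows. The citation layout, the function name and every value in the example, including the DOI itself, are hypothetical placeholders and not a format prescribed by the WDCs; the only real-world element assumed is the public resolver address https://doi.org/.

```python
def format_data_citation(authors, year, title, publisher, doi):
    """Build a citation string for a data set that ends in a resolvable DOI link."""
    author_text = "; ".join(authors)
    return f"{author_text} ({year}): {title}. {publisher}, https://doi.org/{doi}"


# All values below are invented for illustration only.
citation = format_data_citation(
    authors=["Example, A", "Sample, B"],
    year=2007,
    title="Nitrate concentrations of an example sediment core",
    publisher="Example Data Center",
    doi="10.1234/example.5678",
)
print(citation)
```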

(Kirsten Achenbach, MARUM Research Center Ocean Margins, 21.05.2007 - AHE)