SOLEIL reaches one petabyte of experimental data storage

The first byte of experimental data was stored on DIFFABS on September 13, 2006. Twelve years later, the scientists and users at SOLEIL will also remember July 17, 2018, the day the total volume of data stored at SOLEIL reached one petabyte.

The petabyte is a unit of digital storage capacity equal to 10 to the 15th power bytes (1 PB = 10^15 bytes). This is about 55 times the content of the American Library of Congress, if it were exclusively digital, and also the amount of data that a human brain (often compared to a computer) is said to be capable of storing while consuming only 20 watts.

Using the successive versions of detectors that have been made available since 2008, the scientists and users at SOLEIL have produced tremendous amounts of data that continue to feed scientific knowledge, discoveries and innovation. This scientific equipment is constantly being upgraded and relies heavily on data storage infrastructures for its operation, as these allow for the collection and analysis of increasing amounts of data that contribute to progress in science. The improvements have involved several significant upgrades in SOLEIL's IT architecture, initially set up in 2006.

Handling such large amounts of data is complex, and managing one petabyte is a challenge of its own. Data isn't “frozen”: it has its own life cycle, developing and then aging. Once produced, data must be stored and archived for future use, and it must be protected at each step of the process to avoid any loss. These tasks become more complex as the volume of scientific data increases.

In the past 10 years, the amount of data stored at SOLEIL has increased steadily, with a sharp rise in the past 3 years due to the arrival of the latest high-performance detectors and multi-detector acquisition methods. As of September 20, 2018, 195 million files were stored for a total volume of 1.05 petabytes, i.e. an average of roughly 5 megabytes per file.
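
As a quick check of the average quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes), in line with the definition of the petabyte given earlier.

```python
# Figures quoted in the article (as of September 20, 2018).
TOTAL_BYTES = 1.05e15      # 1.05 petabytes, assuming 1 PB = 10**15 bytes
FILE_COUNT = 195_000_000   # 195 million files


def average_file_size_mb(total_bytes: float, file_count: int) -> float:
    """Return the mean file size in decimal megabytes (1 MB = 10**6 bytes)."""
    return total_bytes / file_count / 1e6


if __name__ == "__main__":
    # Print the mean size per file, rounded to two decimals.
    print(f"{average_file_size_mb(TOTAL_BYTES, FILE_COUNT):.2f} MB per file")
```

With these inputs the result falls a little above 5 megabytes per file, consistent with the rounded figure given in the text.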

Although all beamlines use the data storage platform known as RUCHE, some require more storage space: unsurprisingly, the biggest “consumers” are the beamlines dedicated to macromolecular crystallography (PROXIMA-1 and PROXIMA-2) and tomography (PSICHE and, soon, ANATOMIX), as they produce larger amounts of data. These beamlines are equipped with “two-dimensional” detectors, i.e. detectors in which each acquired pixel contains data, and they record a large number of images in order to identify the properties of the sample. Beamlines such as NANOSCOPIUM, with “scanning” experiments, are expected to produce similar amounts of data in the short term. On these beamlines, the X-ray beam that illuminates and probes the sample measures only (a few hundred nm)². The sample is moved horizontally and vertically with respect to the beam to characterize it, and several sets of data are recorded “in flight,” so the result is an image with a very large number of pixels.