Public Service: analysis, opinion, debate

The grid to greatness

Thursday, January 07, 2010

CERN's Bob Jones looks at how large-scale science, collaboration and e-infrastructures work, and how we can make them last

Cutting-edge science in the 21st century generates and uses far more data than researchers 100 years ago could have dreamed of. Scientists working in physics, astronomy, biology and many other fields are frequently part of large, international collaborations that use elaborate, expensive equipment and produce a true flood of information. These scientific collaborations need 'e-infrastructures' (networks and computers) to store, circulate and process their data. Such infrastructures are essentially the 'lifeblood' of the experiment. Just as a physical body needs a circulatory system to integrate its functions, a collaboration needs an e-infrastructure to connect all members of the project and enable them to work together.

How are they used?
The world's largest science experiment, the Large Hadron Collider, located at the European Organization for Nuclear Research (CERN) in Geneva, Switzerland, is expected to produce 15 petabytes of data per year (roughly three million DVDs or 20,000 years of music in MP3 format). This data can be securely accessed and processed by a tiered network of sites all over the world. The e-infrastructure model set up for the LHC has proven successful and is currently used by many other scientific collaborations.
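As a rough check on the scale of the 15-petabyte figure, the comparison can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes decimal petabytes, a 4.7 GB single-layer DVD and an MP3 bitrate of 192 kbit/s, none of which are stated in the article.

```python
# Back-of-envelope check of the LHC data volume comparison.
# All constants below are assumptions, not figures from the article.
PETABYTE = 10**15            # decimal petabyte, in bytes
DVD_BYTES = 4.7e9            # single-layer DVD capacity, in bytes
MP3_BITRATE = 192_000        # assumed MP3 encoding rate, in bits per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def dvds_needed(total_bytes, dvd_bytes=DVD_BYTES):
    """Number of DVDs required to hold total_bytes."""
    return total_bytes / dvd_bytes


def mp3_years(total_bytes, bitrate=MP3_BITRATE):
    """Years of continuous MP3 playback that total_bytes would hold."""
    seconds = total_bytes * 8 / bitrate
    return seconds / SECONDS_PER_YEAR


annual_bytes = 15 * PETABYTE  # the article's expected yearly output
print(f"DVDs per year: {dvds_needed(annual_bytes):,.0f}")
print(f"Years of MP3 audio: {mp3_years(annual_bytes):,.0f}")
```

Under these assumptions the result lands close to three million DVDs and about 20,000 years of music; a lower bitrate would give a longer playing time.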

This particular solution, known as 'grid computing', pools existing distributed computing resources in the form of computing clusters, located around the world and connected by high-speed internet links. Grid systems spread the computing load, easing the cost of resources for any one institution. Politically, contributing countries can retain resources within national borders, rather than in a centralised location.

In Europe, the publicly funded project 'Enabling Grids for E-sciencE' (EGEE)1, like its counterpart 'Open Science Grid' in the US, originally sought to respond to the data access and processing requirements of high energy physicists. Today, however, due to the success of grid computing as a framework for collaborative work, these projects support research in a range of disciplines: from astronomy to finance, and the humanities to epidemiology.

For example, the Cardiogenics Consortium2, coordinated at the University of Lübeck in Germany, investigates the genetic causes of one of the world's biggest killers – coronary artery disease. In work published in the March 2009 issue of Nature Genetics, the consortium used the EGEE infrastructure to complete, in less than 45 days, computations that would have taken two years on a single processing core. In this short time, they identified possible genetic candidates for the causes of a disease that kills more than two million people a year in Europe alone.
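To put the Cardiogenics figures in perspective, the speedup implied by "two years on a single core" versus "45 days on the grid" can be estimated as follows. This is a minimal sketch: the function name is hypothetical, and the core count assumes perfect parallel scaling, which real grid workloads rarely achieve.

```python
import math

DAYS_PER_YEAR = 365.25


def min_speedup(serial_days, wall_clock_days):
    """Lower bound on speedup when the job finished within wall_clock_days."""
    return serial_days / wall_clock_days


serial_days = 2 * DAYS_PER_YEAR   # "two years" on a single processing core
grid_days = 45                    # upper bound on the elapsed time on the grid

speedup = min_speedup(serial_days, grid_days)
# With ideal scaling, this many cores would need to be busy on average.
min_cores = math.ceil(speedup)

print(f"Speedup of at least {speedup:.1f}x")
print(f"At least {min_cores} cores busy on average (ideal scaling assumed)")
```

Because the run took less than 45 days and scaling is never ideal, both numbers are lower bounds rather than a description of the resources the consortium actually used.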

What will happen in the future?
The EGEE project will come to a close at the end of April 2010. A new organisational model, implemented by the European Grid Initiative (EGI)3 and bringing together National Grid Initiatives from more than 20 countries, will take over, and ensure the sustainability of the European grid computing infrastructure.

Europe's publicly funded e-infrastructure is also likely to evolve in technology and form. Most user communities do not mind whether their resources come from grids, clouds or supercomputers, as long as data management facilities are easy to use, yet powerful and secure. External, commercially operated clouds, which allow users to create on demand 'virtual computers' that include the applications and operating systems of their choice, can be a good solution to purchase. However, some user communities require computation only possible on sophisticated, and expensive, high-performance resources such as supercomputers – currently not provided by clouds.

Each computing paradigm has its advantages and drawbacks and, in the future, a custom-fit solution for each user community will surely work best. Standards bodies, such as Open Grid Forum, aim to make it easier to bring clouds, grids and supercomputer installations together by defining simple interoperable interfaces. The interoperability groundwork EGEE has undertaken with supercomputing structures (such as Distributed European Infrastructure for Supercomputing Applications, DEISA/PRACE4), volunteer grids (such as Enabling Desktop Grids for e-Science, EDGeS5) and cloud systems (such as DigitalRibbon6) has been driven by the needs of users, such as the fusion and life sciences communities.

These distributed computing solutions will continue to complement grid computing in the future, helping it to support tomorrow's scientific collaborations.

1 http://eu-egee.org
2 http://cardiogenics.eu/web
3 http://eu-egi.org
4 http://deisa.eu
5 http://edges-grid.eu
6 www.digitalribbon.com