VESPA Comes of Age
Stéphane Erard (Observatoire de Paris) explores the evolution of Europlanet’s virtual access service.
Read article in the fully formatted PDF of the Europlanet Magazine.
Like many other fields of research, planetary science has experienced a huge increase in data over recent years. The exponential growth in data volumes and their increasing complexity call for new and more efficient ways to handle them. This is needed not only to ensure optimal exploitation, but also to facilitate the comparison between independent but related data records, for instance imaging and spectroscopic observations of a small region on Mars.
Practical issues include how to identify observations overlapping in space and time, or specific illumination configurations in datasets from space experiments. Finding planetary data of interest in telescope archives, locating relevant reference data for laboratory or field measurements, or accessing simulations of actual observations for any given field, such as planetary atmospheres, can be challenging.
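To make the first of these issues concrete, the sketch below tests whether two observations overlap in both time and space. It is only an illustration under simplifying assumptions: the `Footprint` class and its field names are hypothetical, each footprint is reduced to a longitude/latitude bounding box, and boxes that wrap around the 0°/360° meridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Hypothetical summary of one observation: a time range and a lon/lat box."""
    t_min: float    # start time, e.g. in Julian days
    t_max: float    # end time, same unit as t_min
    lon_min: float  # degrees
    lon_max: float
    lat_min: float  # degrees
    lat_max: float


def intervals_overlap(a_min, a_max, b_min, b_max):
    """Two closed intervals overlap when each one starts before the other ends."""
    return a_min <= b_max and b_min <= a_max


def footprints_overlap(a, b):
    """True when the observations share some time, longitude and latitude range."""
    return (
        intervals_overlap(a.t_min, a.t_max, b.t_min, b.t_max)
        and intervals_overlap(a.lon_min, a.lon_max, b.lon_min, b.lon_max)
        and intervals_overlap(a.lat_min, a.lat_max, b.lat_min, b.lat_max)
    )


# Made-up values: an image and a spectral cube over the same small region.
image = Footprint(2459000.0, 2459000.1, 10.0, 12.0, -5.0, -3.0)
cube = Footprint(2459000.05, 2459000.2, 11.5, 13.0, -4.0, -2.0)
later = Footprint(2459100.0, 2459100.1, 11.5, 13.0, -4.0, -2.0)
```

Here `footprints_overlap(image, cube)` holds, while `footprints_overlap(image, later)` does not because the time ranges are disjoint. Running such a test across archives is only possible if every archive describes time and coordinates in the same way, which is the point of the uniform metadata discussed next.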
The key to solving these problems is to have a metadata system that describes the data in an informative and uniform way. Although this has long been recognised, several such systems currently co-exist in planetary science. Thus, in practice, the issues remain: archives are not always consistent between successive space missions; descriptions in telescope archives are better suited to astronomical data than to moving objects; experimental measurements use a variety of proprietary formats; and various fields of Solar System studies make use of entirely different systems (for example, the standards used for plasma data and planetary surfaces data have little in common). Beyond the metadata descriptions, there are further inconsistencies in accessing the data themselves. For example, although NASA’s Planetary Data System (PDS) and telescope archives provide some similar services to users,1 the access and query methods can be entirely different, and each community uses specific data formats.
During the first Europlanet Research Infrastructure (RI) programme (2009-2012), the Integrated and Distributed Information Service (IDIS) activity conducted a study of existing and desirable data handling systems that could be used to develop interoperability in this field and facilitate access to Solar System data in general.2
The outcome of this study was to identify the Virtual Observatory (VO) as the most mature and promising infrastructure to handle planetary science data in general, and the most flexible provider of interfaces for thematic use. The VO has been under development since the early 2000s to address similar issues in astronomy.3 However, astronomy provides a less challenging context: all objects are located in the same coordinate frame, most data relate to measurements of light (rather than physical interactions), and time variations are simpler (e.g. there are no seasons or daily cycles).
During the Europlanet 2020 and 2024 RI programmes (2015-present), VESPA has focused on expanding the VO to accommodate planetary science data and take advantage of developments elsewhere in the VO.4 The first step has been to define a uniform description of data that could encompass all fields of Solar System research, at least at the top level, so that users could search for data of interest for their research. This metadata system, called EPNCore, is combined with a common mechanism for sending queries online (the Table Access Protocol, TAP) to form the EPN-TAP protocol for accessing Solar System data. Both EPNCore and EPN-TAP are now standards of the International Virtual Observatory Alliance (IVOA), the consortium which supervises VO developments. As a VO standard, the EPN-TAP data search and access protocol is compliant with the Open Science policy, and in particular with the Findable, Accessible, Interoperable, Reusable (FAIR) principles, which were pioneered through the development of the VO.5 EPN-TAP not only opens the use of powerful VO tools to visualise and analyse the data, but can also be interfaced with other environments used in planetary science for specific applications, e.g. Geographic Information Systems (GIS) for planetary surfaces or the Space Physics Archive Search and Extract (SPASE) standards for plasma data.
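As an illustration of what an EPN-TAP query can look like, the sketch below builds a synchronous TAP request as a plain URL, without sending it. The service address `https://example.org/tap` and the schema name `my_service` are placeholders, not real services. The `epn_core` table name, the `granule_uid`, `target_name`, `time_min` and `time_max` columns, and the `REQUEST`/`LANG`/`QUERY` parameters follow our reading of the EPN-TAP and TAP specifications and should be checked against them before use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_epn_tap_query(schema, target_name, max_rows=10):
    """Return an ADQL query listing granules for one target.

    `schema` is assumed to be a trusted identifier; it is inserted as-is.
    """
    # ADQL string literals escape a single quote by doubling it.
    safe_target = target_name.replace("'", "''")
    return (
        f"SELECT TOP {int(max_rows)} granule_uid, target_name, time_min, time_max "
        f"FROM {schema}.epn_core "
        f"WHERE target_name = '{safe_target}'"
    )


def build_sync_url(base_url, adql):
    """Return the URL of a synchronous TAP request for the given ADQL query."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "votable",
        "QUERY": adql,
    }
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"


# Placeholder service and schema, for illustration only.
adql = build_epn_tap_query("my_service", "Mars", max_rows=5)
url = build_sync_url("https://example.org/tap/", adql)
```

Because every EPN-TAP service exposes the same table layout, the same query text can in principle be sent to any of them by changing only the service address and schema name. In practice, VO client libraries and tools handle the request and parse the returned table.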
VESPA also uses the publication system of the VO to make the data accessible. A simple procedure has been identified so that any institute can publish its data online with minimum effort and make its data services visible in the VO. The VO uses a distributed infrastructure, with no single data centre. Each institute hosts its own data services and simply declares them in a common registry, so that even small teams have the same visibility as large space agency databases.
There are currently 63 EPN-TAP data services published by more than 20 institutes. The VESPA portal is a specific user interface that queries all planetary science services together and locates not only datasets of interest, but also detailed configurations inside many datasets, based on space/time/spectral/illumination coverages, and provenance of the data. Spaceborne data are already discoverable this way (e.g. ESA’s Planetary Science Archive provides an EPN-TAP interface), and further interfaces with archives from space agencies are being developed with a common dictionary for PDS4, a project of the International Planetary Data Alliance, of which VESPA is a member along with the space agencies.
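A coverage-based search of this kind comes down to range conditions on a few shared columns. The sketch below builds such a condition for a time window and a wavelength range. It assumes, following our reading of the EPN-TAP specification, that `time_min`/`time_max` are Julian days and `spectral_range_min`/`spectral_range_max` are frequencies in Hz; the function names are hypothetical, and leap seconds are ignored in the time conversion.

```python
from datetime import datetime, timezone

SPEED_OF_LIGHT_M_S = 299_792_458.0
UNIX_EPOCH_JD = 2440587.5  # Julian day of 1970-01-01T00:00:00 UTC


def to_julian_day(moment):
    """Convert a timezone-aware datetime to a Julian day (leap seconds ignored)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.timestamp() / 86400.0 + UNIX_EPOCH_JD


def wavelength_um_to_hz(wavelength_um):
    """Convert a wavelength in micrometres to a frequency in hertz."""
    return SPEED_OF_LIGHT_M_S / (wavelength_um * 1e-6)


def coverage_where(t_start, t_end, wl_min_um, wl_max_um):
    """Return a WHERE condition selecting granules that overlap both ranges."""
    jd_start, jd_end = to_julian_day(t_start), to_julian_day(t_end)
    # A shorter wavelength is a higher frequency, so the bounds swap.
    f_min = wavelength_um_to_hz(wl_max_um)
    f_max = wavelength_um_to_hz(wl_min_um)
    return (
        f"time_max >= {jd_start!r} AND time_min <= {jd_end!r} "
        f"AND spectral_range_max >= {f_min!r} AND spectral_range_min <= {f_max!r}"
    )


# Example: one day starting at 2000-01-01 12:00 UTC, between 1.0 and 2.5 micrometres.
start = datetime(2000, 1, 1, 12, tzinfo=timezone.utc)
end = datetime(2000, 1, 2, 12, tzinfo=timezone.utc)
condition = coverage_where(start, end, 1.0, 2.5)
```

The overlap logic is the usual interval test: a granule matches when it ends after the window starts and starts before the window ends. The same condition can be appended to a query on each service's `epn_core` table, which is what allows a single search to run across all services at once.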
Beyond the publication of data related to a research paper, there are many applications for such a system. Space experiments can use EPN-TAP as an off-the-shelf data-management system that allows restricted access (an important facility used by the VESPA team itself). Nanosat projects can use it as a data handling, archiving and distribution system. EPN-TAP can help experimental projects to promote their data in a simple way by favouring cross-searches with observational data (e.g. through the PVOL service).6 Ground-based observational networks and pro-amateur projects can use it to share their data (e.g. as demonstrated by PVOL and other observational networks within the Europlanet project).7 EPN-TAP also provides a simple and cost-effective solution for making data open access, and thus supports Open Science policy, as required by funders (e.g. in Horizon Europe).
EPNCore can help to handle composite datasets during a research project and provides an easy interface with tools such as TOPCAT, Aladin or CASSIS to manipulate tables, images, cubes, or spectra.8 Installation on the European Open Science Cloud (EOSC) is also being assessed by VESPA.
In summary, VESPA has developed an infrastructure to handle and publish large datasets in many areas of Solar System and exoplanet studies. Through deep roots in the VO, and recognition as a community standard, VESPA has secured a sustainable long-term future.
- TOPCAT: https://bit.ly/TOPCAT; Aladin: https://aladin.u-strasbg.fr; CASSIS: http://cassis.irap.omp.eu