EU FP7 - diXa

Data Infrastructure for Chemical Safety

Status: finalized

The main scientific concept behind the diXa proposal is to create a large public data infrastructure of genomic signatures of drugs, industrial chemicals and cosmetics, and to develop pattern-matching bioinformatics and biostatistics tools that detect similarities among these signatures, in order to describe all biological states induced by chemical exposure in terms of genomic signatures relevant to the human in vivo situation. These genomic signatures are to be provided to the diXa data infrastructure by past, ongoing and future EU Toxicogenomics projects (Workpackage 3). Next, since novel disruptive innovations and scientific breakthroughs are often born when fragments of data are unified in unconventional ways, this toxicogenomics data infrastructure is to be linked, by applying an in silico model, with databases holding information on molecular disease signatures from humans, as provided by research communities beyond the Toxicogenomics Research Community that investigate human pathobiologies (Workpackage 5). This infrastructure also needs to be connected to databases containing records with peer-reviewed chemical, physicochemical and detailed toxicological information on multiple classes of toxic chemicals (Workpackage 4).
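As a rough illustration of the kind of linked record such an infrastructure might hold, the sketch below defines a signature type carrying cross-references to chemical and disease databases; all field names and identifiers are hypothetical and do not reflect the actual diXa schema:

```python
from dataclasses import dataclass, field

@dataclass
class GenomicSignature:
    """Hypothetical record type; field names are illustrative only,
    not the actual diXa schema."""
    compound: str        # tested drug, industrial chemical or cosmetic
    assay: str           # in vitro model that produced the data
    fold_changes: dict   # gene symbol -> log2 fold change
    chemical_refs: list = field(default_factory=list)  # links to chemical databases (Workpackage 4)
    disease_refs: list = field(default_factory=list)   # links to human disease signatures (Workpackage 5)

sig = GenomicSignature(
    compound="compound_X",
    assay="primary human hepatocytes, 24 h",
    fold_changes={"CYP1A1": 2.3, "GADD45A": 1.1, "TP53": -0.4},
    chemical_refs=["CHEMDB:0001"],  # placeholder identifier
)
print(sig.compound, sig.chemical_refs)
```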

This approach will enable translational interdisciplinary research by allowing for this connectivity and, thus, for integrated meta-analysis (Workpackage 8). By accessing such a resource and applying the developed in silico model (Workpackage 9), a researcher studying a drug candidate, a novel cosmetic ingredient or a suspected environmental toxicant could compare its signature, obtained from an in vitro cellular assay, against the human disease database to discover unexpected connections, thus predicting its toxicity without the use of an animal test.
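To make the comparison step concrete, here is a minimal sketch of such a pattern-matching query, assuming signatures are represented as log2 fold-change vectors over a shared, ordered gene set; the gene names and disease signatures below are invented for illustration:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two fold-change vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical signatures over the same ordered gene set.
genes = ["CYP1A1", "GADD45A", "TP53", "MYC", "HMOX1"]
query = np.array([2.3, 1.1, -0.4, 0.8, 1.9])  # in vitro assay of the test compound

disease_db = {  # invented disease signatures for illustration
    "cholestasis":    np.array([2.1, 0.9, -0.2, 0.5, 2.2]),
    "steatosis":      np.array([-1.0, 0.1, 0.3, -0.8, 0.0]),
    "renal_fibrosis": np.array([0.2, 1.5, -1.1, 0.9, -0.3]),
}

# Rank diseases by similarity to the compound's in vitro signature.
hits = sorted(disease_db.items(),
              key=lambda kv: cosine_similarity(query, kv[1]),
              reverse=True)
for disease, sig in hits:
    print(f"{disease:15s} {cosine_similarity(query, sig):+.2f}")
```

Rank-based measures such as Spearman correlation are a common alternative to cosine similarity when the signatures being compared come from heterogeneous platforms.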

Toxicogenomics, e.g. whole-genome gene expression analysis through microarray technology (transcriptomics), has already been successfully applied to in vitro models using human cells for the purpose of predicting toxicity, for instance with respect to genotoxicity/carcinogenicity, liver and kidney toxicity, and endocrine disruption, in general predicting toxicity with an accuracy above 85%. While this is already promising, more advanced research strategies are now expected, all characterized by increasing data richness, which will integrate the multiple data streams derived from transcriptomics, proteomics, epigenomics and metabonomics, as well as next-generation sequencing, with traditional toxicological and histopathological endpoint evaluation into a systems biology, or rather systems toxicology, approach.
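To illustrate how such predictive accuracy figures are typically estimated, the sketch below cross-validates a simple classifier on synthetic expression data; the dataset, feature counts and model choice are placeholders rather than the methods of any diXa partner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for transcriptomics data: 100 samples x 500 genes,
# labelled toxic (1) or non-toxic (0), with 10 informative "marker genes".
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)
X[y == 1, :10] += 1.0  # shift the marker genes in toxic samples

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation
print(f"cross-validated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```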

The ultimate goal of this expected paradigm shift in toxicity testing, as pursued by the Toxicogenomics Research Community, is the development and deployment of a data infrastructure and e-science environment that promotes toxicology from a predominantly observational science at the level of disease-specific models in animals to a predominantly predictive science focused on broad inclusion of target-specific, mechanism-based biological observations in vitro.


The ever-increasing data stream in toxicogenomics is demonstrated by the fact that it all started, some 8 years ago, with microarray-based whole-genome gene expression analysis producing several tens of kilobytes per experiment; in a second stage this expanded to several megabytes per experiment through advanced microarray technologies, such as ChIP-on-chip-based epigenomic analysis and genome-wide association studies (GWAS), and through metabonomics; and we are now awaiting exponential growth in data streams through the application of next-generation sequencing technologies generating up to terabytes per analysis, which will undoubtedly lead to a wealth of novel insights into transcript structure and gene expression. In general, systems toxicology will thus benefit from defined workflows and standard operating procedures specified for particular data analysis and modelling tasks. Formalizing these workflows, for example as web service workflows, will improve the reproducibility, accuracy and sustainability of the applied methodology.
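As an illustration of what formalizing such a workflow can look like, here is a minimal sketch in which analysis steps are chained as an explicit, ordered pipeline; the step names and their internals are invented placeholders, and a production deployment would typically wrap comparable steps as versioned web services:

```python
# Hypothetical three-step analysis workflow expressed as an ordered
# pipeline of pure functions, so every run is reproducible and auditable.
def normalise(raw):
    """Placeholder normalisation step: scale values to a unit total."""
    total = sum(raw.values())
    return {gene: value / total for gene, value in raw.items()}

def filter_low_signal(expr, threshold=0.01):
    """Placeholder filtering step: drop weakly expressed genes."""
    return {gene: v for gene, v in expr.items() if v >= threshold}

def top_genes(expr, n=2):
    """Placeholder reporting step: the n most strongly expressed genes."""
    return sorted(expr, key=expr.get, reverse=True)[:n]

PIPELINE = [normalise, filter_low_signal, top_genes]

def run(data, steps=PIPELINE):
    for step in steps:  # in practice each step would be logged and versioned
        data = step(data)
    return data

print(run({"CYP1A1": 40.0, "TP53": 5.0, "MYC": 0.2}))
```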

Bringing such large quantities of genomic data from collaborating EU Toxicogenomics projects together, in order to enable cross-fertilisation of scientific results and to favour cross-study meta-analyses that identify innovative molecular pathways for assessing chemical safety in in vitro models, therefore involves three main tasks:

  1. Assembling data from appropriate repositories developed by these EU Toxicogenomics projects (and possibly other depositors), which to date remain locally isolated, thus increasing the scale of federation and interoperation of these local data infrastructures;
  2. Establishing methodology for efficiently querying the assembled large toxicogenomics data collections, thereby providing better synergies between these repositories (see the sketch after this list);
  3. Integrating information from a variety of experimental data types, including chemical data and molecular data on human disease, thus bridging across disciplines.
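For the second of these tasks, the following is a minimal, hedged sketch of what a federated query across several repositories could look like; the endpoint URLs and the JSON response format are entirely hypothetical and stand in for whatever interfaces the federated repositories actually expose:

```python
import json
from urllib.request import urlopen

# Entirely hypothetical repository endpoints; real diXa services
# and their URLs/response schemas may differ.
REPOSITORIES = [
    "https://repo-a.example.org/api/signatures",
    "https://repo-b.example.org/api/signatures",
]

def federated_query(compound: str) -> list:
    """Ask every repository for signatures of one compound and merge results."""
    results = []
    for base in REPOSITORIES:
        try:
            with urlopen(f"{base}?compound={compound}", timeout=10) as resp:
                results.extend(json.load(resp))  # assumes each repo returns a JSON list
        except OSError:
            continue  # skip unreachable repositories rather than failing the query
    return results

# federated_query("compound_X")  # would merge hits from all reachable repositories
```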

The diXa project will therefore develop a sustainable, web-based, open and publicly accessible data storage infrastructure capable of seamlessly capturing these increasing toxicogenomics data flows, appropriately linked with existing web-based databases holding chemical information as well as molecular data on human disease, thus also enabling novel, integrative approaches to the challenge of developing better alternatives to current animal models for chemical safety testing. In this respect, the proposed initiative will build on past EU initiatives such as the Networks of Excellence EMBRACE and BIOSAPIENS and the SLING project, and also leverage ESFRI investments such as ELIXIR, EATRIS and EU-OPENSCREEN. It is expected that the generic tools and services developed under the diXa e-infrastructure could serve as a proof of principle for the further development of research infrastructures in Europe, and in particular for the implementation of clusters of ESFRI projects in the domain of Biomedical and Life Sciences.


The diXa project will thereby also contribute to the Digital Agenda for Europe, which outlines policies and actions to maximize the benefit of the Digital Revolution for all, by tackling important problems identified by the Digital Agenda, such as fragmentation of digital information, lack of interoperability, lack of investment in networks, and insufficient research and innovation effort, particularly in the area of finding better safety testing models. In doing so, the diXa project will address important societal challenges from which European chemical manufacturing industries (pharma, chemicals, cosmetics, food) will also clearly benefit, by being able to develop novel, safer products and services that are healthier for patients and consumers, thus also improving the competitiveness of the EU economy.

List of participants:

Participant no.    Participant organisation name                        Participant short name   Country
1 (Coordinator)    Maastricht University                                UM                       The Netherlands
2                  European Molecular Biology Laboratory                EMBL                     Germany
3                  Max Planck Society for the Advancement of Science    MPG                      Germany
4                  Genedata                                             GD                       Switzerland
5                  Imperial College London                              IMPERIAL                 United Kingdom
6                  EU Joint Research Centre                             JRC                      Italy
7                  Klinikum der Universitaet zu Koeln                   UKK                      Germany