CEFIC LRI C4 – Towards the development of an omics data analysis framework (ODAF) for regulatory application
Omics has enjoyed a great deal of success in research. Nevertheless, the use of omics data in regulatory assessment has been hindered, in particular because different approaches to processing the data can lead to different outcomes even from the same data set.
Developing and gaining acceptance of common foundation methods for data analysis and reporting will help establish omics analysis as a fundamental tool in regulatory toxicology. This can be achieved by setting guidelines for foundation methods of data analysis that allow easy comparison between datasets without precluding further analytical methods that the analyst considers appropriate.
A survey conducted by ECETOC found that, to date, omics data have never been used to support a submission under REACH, though they have been used to support pesticide submissions in the USA. A key problem is the lack of agreed, consistent methods that can be applied to the analysis of omics data.
In a research environment, analysis methods are justified in independently reviewed publications in the scientific literature. The regulatory environment, however, requires a more prescriptive approach in which studies must be consistent, reproducible and comparable in order to reach consistent decisions.
To start debate within the regulatory science community, an ECETOC expert team developed a ‘strawman’ framework for transcriptomics and other big data analysis for regulatory application. This debate has taken place through several meetings with ECHA, the OECD (EAGMST) and most recently a multi-stakeholder ECETOC workshop on Applying ‘Omics Technologies in Chemicals Risk Assessment (Madrid, 10-12 October, 2016).
These discussions and debates have led to the preparation of three Requests for Proposals (RfPs):
- Towards the development of an Omics Data Analysis Framework (ODAF) for regulatory application
- Integrating Multiple Molecular-level Data Streams to understand (a) range of normal adaptation vs pathology and (b) molecular generated gene expression changes and persistence over time
- Omics & Read Across
The first is described below, but the three RfPs are intended to be interconnected as parts of the overall Omics package under the ECETOC-LRI long-term Transformational Programmes. Researchers engaged on these projects are therefore expected to participate with the steering team and to collaborate, share and exchange information with the other grant awardee teams to support the development of complementary work products.
All RfPs will build on the output from the ECETOC Workshop.
Generate an Omics Data Analysis Framework supporting bioinformatics best practices. In the first instance, the project will focus on best practices for microarray and RNA-Seq platforms (and perhaps newer platforms such as BioSpyder), up to the identification of differential gene expression.
The Framework will outline best practices in areas identified as problematic in omics data interpretation. These areas will include, but are not restricted to, outliers and batch effects.
Specifically, this project will:
- Identify suitable transcriptomic datasets from publicly available sources that are linked to a measured adverse outcome, have a defined exposure and use either a dose-response or a time-course design. The datasets should come from a variety of platforms and, where possible, include new technologies, e.g. RNA-Seq.
- Analyse the data using a variety of methods for interpretation and compare these analyses with the method established from the ECETOC workshops.
- Develop common foundation methods that also include practices for recognising outliers in the data set and thresholds that should be observed for excluding outliers.
- Present the common foundation methods at a meeting of the ECETOC expert team, and in other similar forums, for improvement.
By the end of this project, a validated ODAF containing the common foundation methods will be produced that can be applied to other CEFIC LRI projects, including but not restricted to the RfP on Integrating Multiple Molecular-level Data Streams.
Establish the common foundation methods as benchmark best practices, combined with framework development to facilitate the uptake of these data streams in risk assessment.
To achieve this the project will:
- Undertake new analysis of pre-existing data, e.g. data from TG-GATEs, open source databases such as GEO and ArrayExpress, and Percellome.
- Use data sets that have been designed to optimise gene expression data for insight into MoA and toxicology with validated results, e.g. MAQC/SEQC.
- The framework needs to be ‘future-proof’ so that it can address other technologies coming into the picture (e.g. next generation sequencing and beyond).
- It is of course possible that the statistical power will not be sufficient to decide which methods are “less wrong”. Thus, the framework might require further validation, experimentally or with simulated datasets.
- The final report will contain an executive summary (max. 2 pages), a main part (max. 50 pages) with annexes as required for detailed bioinformatics methods, and a bibliography.
- It is expected that the findings will be developed into at least one peer reviewed publication and presentation(s) at suitable scientific conference(s).
Timing: 2 years, start early 2018
LRI funding: €250k