Translational bioinformatics: Statistical learning for patient stratification

Big Data analytics on modern e-infrastructures

Biomarker discovery, large-scale image analysis, and the emerging routine sequencing and phenotyping of patients motivate the use of Big Data analytics on modern e-infrastructures such as HPC and cloud. We will conduct research on improving the efficiency of bioinformatics pipelines using workflows and data analytics, including TensorFlow on virtualised and GPU resources. The project is aligned with the H2020 projects PhenoMeNal and OpenRiskNet (Spjuth is WP-leader in both), as well as the SSF Big Data project HASTE (Spjuth is co-PI). We will also run a pilot on the EOSC HNSciCloud.

Improving clinical diagnostics with next-generation sequencing

The ClinSeq project at KI (Grönberg) takes a pan-cancer approach to initial cancer diagnosis and treatment, focusing on short-read sequencing carried out at the Clinical Diagnostics facility of SciLifeLab in Stockholm. We have previously developed AutoSeq, a fully automated preprocessing pipeline for bioinformatics in the ClinSeq project that runs on high-performance e-infrastructure resources. We will take further steps to increase pipeline performance [M1.1-M1.3, D1.1-D1.2]. The data generated in ClinSeq will form an internationally leading resource of well-characterised biobank material with associated clinical information. Work in Uppsala is attached to the SciLifeLab Clinical Diagnostics facility in Uppsala and the National Genomics Infrastructure (NGI). The focus there is on novel methodology using long-read, single-molecule real-time sequencing. A system, CLAMP, has been developed and is now in routine clinical use at Uppsala University Hospital. We will continue to develop it and work towards other clinical applications, including p53 and HCV [D1.5].

Prediction models for personalised precision medicine (PPM)

PPM depends on the ability to transform large datasets into clinically actionable information. Predictive and prognostic PPM models that combine multiple types of high-dimensional molecular and clinical data for patient stratification have the potential to substantially improve patient outcomes by reducing over- and under-treatment and by increasing the probability of treatment response. We will apply modern statistical and machine learning methods to turn large molecular datasets into insights for PPM applications in cancer. The focus areas are (a) to develop models to improve patient stratification (breast cancer, prostate cancer and acute myeloid leukemia), and (b) to develop novel statistical methodologies for integrative modeling of multiple data modalities [M1.4-M1.5, D1.3-D1.4]. The project utilises statistical machine learning, HPC and the integration of both in-house and publicly available data.