SeRC Data Science

In addition to the ongoing simulation-oriented activity in e-Science, we are seeing a rapid increase in methods that make use of real-life data. This field of research is often called Data Science. A closely related area is Machine Learning, which encompasses methods for training computer models with data. Data Science has several aspects, each presenting its own methodological challenges and each addressed with different tools from mathematics and computer science.

For some applications, e.g. climate and flow modeling, the challenge is to extract knowledge from enormous amounts of data, for example by learning approximate models of highly complex physical systems so that predictions can be made faster. Often the datasets are so large that they cannot even be stored completely, but have to be analyzed sequentially. This poses challenges both in terms of hardware use and Machine Learning methodology. The current state-of-the-art approach to learning from large datasets is Deep Learning.
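
As a small illustration of what analyzing data sequentially can mean in practice, the sketch below computes the mean and variance of a stream in a single pass, keeping only three numbers in memory regardless of how many values arrive. It uses Welford's online algorithm, which is chosen here purely as a familiar example and is not a method described in this text; the names `RunningStats` and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Single-pass mean and variance (Welford's online algorithm)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one new observation into the summary, then discard it.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 here for fewer than two values.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(stream: Iterable[float]) -> RunningStats:
    # The stream is consumed once and never stored as a whole.
    stats = RunningStats()
    for x in stream:
        stats.update(x)
    return stats
```

Because `summarize` accepts any iterable, it can be fed a generator that reads measurements from a file or a sensor one at a time, so memory use stays constant as the dataset grows.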

In other applications, e.g. medical diagnostics, data is most often very sparse and incomplete, and the challenge is instead to “fill in the gaps” and make as much use as possible of the little data that exists. There is huge potential in using tools from Statistics and Numerical Analysis to make more efficient use of sparse and incomplete data.