PSDE: OpCoReS – Optimized Component Runtime System

OpCoReS focuses on task-oriented programming models; on high-level equation-based object-oriented textual and graphical modelling and the efficient compilation of such models to task-oriented and data-parallel code; on the exploitation of multi-level parallelism both during application development and at runtime; and on the associated performance monitoring and analysis.

Task-oriented programming models already underpin recent developments such as the task model of OpenMP and the programming model of SMP Superscalar from the Barcelona Supercomputing Center. Component-based approaches make it possible to combine building blocks that hide the underlying technologies and implement algorithms in the way best suited to a specific hardware or system environment. These implementations may be serial or parallel, and are not restricted to specific programming techniques.

Starting with components based on MPI, hybrid MPI/OpenMP, or PGAS approaches, we will demonstrate, for a selection of possible component implementations, how this approach supports the efficient use even of large-scale HPC systems. In a second step we intend to extend this model to heterogeneous installations, applying components that combine processing elements such as GPGPUs and other many-core processors. Our approach also relies on efficient task scheduling and data distribution provided by a dynamic runtime system, which monitors the composed application, re-computes the schedule in the case of load imbalances, and ensures high parallel efficiency through dynamic load balancing. This runtime system will also include a real-time performance-analysis module that applies artificial-intelligence methods to program performance observations combined with previously recorded data.

On a higher level we will analyze high-level equation-based object-oriented (EOO) textual or graphical models (e.g. for typical engineering applications) to extract parallelism and compile the models to efficient parallel code that can in turn exploit the component approach described above. For languages such as Modelica or similar modelling languages, there are three main approaches: (1) automatic parallelization of the mathematical models, (2) coarse-grained explicit parallelization based on user-defined partitioning of the models, and (3) explicit parallel programming integrated with the modelling language. Following approach (1), we will convert the numerical solver method into explicit form (the right-hand side), analyze the structure for data dependencies, perform scheduling and load balancing, and generate task-oriented code for functional parallelism and data-parallel code for array computations. This will be combined with coarse-grained parallelization based on user partitioning, using TLM connectors to decouple model subsystems.