We aim to develop new better numerical algorithms for certain problems stemming from data science and machine learning, by using state-of-the-art numerical linear algebra techniques and software and solve the corresponding computationally demanding core problems. The problems are of fundamental nature and the long-term impact is expected in a wide range of data science applications. Several data science approaches require the solution to eigenvalue problems and various generalizations, including the problems in our approach. SeRC-supported competence on eigenvalue computations has been build up over the last years, and there are close collaborations with, e.g., KU Leuven, Univ. Strathclyde, TU Eindhoven, Univ. Pisa. The initial part of the project will be carried out with mainly the collaboration partner Francesco Tudisco, Univ. Strathclyde. Through this MCP we will enable collaboration and synergy also for researchers from all SeRC-universities, e.g., other research concerned with computation of graphs such as CausalHealth.
We are addressing two specific approaches
a. p-spectral clustering of graphs
b. robust Fischer discriminant analysis
Although these approaches are for different types of machine learning ((a) unsupervised and (b) supervised learning), both have a common fundamental computational component requiring the solution of a nonlinear eigenvalue problem with a particular structure. The characterizations as nonlinear eigenproblems have been presented at world-leading ML-conferences. Characterizations of (a) as a nonlinear eigenvalue problem has been presented in [Th. Buehler, M. Hein, Spectral clustering based on the graph p-Laplacian, Proc. 26th Ann. Inter. conf. Machine Learning, 2009] and (b) in [Kim, Magnani, Boyd, Advances in Neural Information Processing Systems (NIPS), 2006]. The usefulness of both of these approaches are restricted only by the fact that the core problem is difficult to solve reliably and efficiently. The SeRC NA-community has decades of experience on nonlinear eigenvalue problems, with a strong potential to form an entirely new orthogonal viewpoint in relation to current approaches. This will be the starting point for our algorithm developement in this area. We aim to develop new better algorithm by studying convergence of current algorithms and carefully study (both in theory and computationally on realistic problems) and improve other algorithms for nonlinear eigenvalue problems when applied to these problems, e.g., superlinear convergence algorithms such as (rational) Krylov approaches and matrix function approaches.
We aim to, in collaboration with CausalHealth, organize activities around graph computation and supervised learning with uncertainty (in which both (a) and (b) fit), with potential invited speakers such as M Hein, Univ. Tuebingen (formerly Univ. Saarland). This is outlined in Deliverable [6,8].1.
Current software for nonlinear eigenvalue problems in the KTH NA community has been developed in the Julia programming language. We also aim to organize training/workshop for machine learning and data science in the Julia programming language, e.g., by external teachers from Julia Computing (http://juliacomputing.com). Since the first stable Julia version was released in August 2018, we expect a substantial increase in this programming language in the near future, and such a workshop may be of interest to all projects in this MCP and SeRC in general. This is outlined in Deliverable 8.3.