Cluster management at Google

Abstract:
Cluster management is the term that Google uses to describe how we
control the computing infrastructure in our datacenters that
supports almost all of our external services. It includes
allocating resources to different applications on our fleet of
computers, looking after software installations and hardware,
monitoring, and many other things. My goal is to present an
overview of some of these systems, introduce Omega, the new
cluster-manager tool we are building, and present some of the
challenges that we’re facing along the way. Many of these
challenges represent research opportunities, so I’ll spend the
majority of the time discussing those.

Short bio:
John Wilkes has been at Google since 2008, where he works on
cluster management and infrastructure services. He is interested in far
too many aspects of distributed systems, but a recurring theme has been
technologies that allow systems to manage themselves. In his spare time
he continues, stubbornly, trying to learn how to blow glass.

John will be available for individual discussions after the talk. If you are interested, please contact erwinl@pdc.kth.se