World Record Hadoop Performance

In 2016, SeRC researchers from the PSDE community, led by Dr Jim Dowling, announced world-record performance for the Hadoop platform with HopsFS, their next-generation distribution of the Apache Hadoop Distributed File System (HDFS). Hadoop is the de facto open-source platform for Big Data. HopsFS delivers over 16 times the throughput of HDFS for a real-world Hadoop workload from Spotify AB. The key innovation in HopsFS is a novel distributed architecture for managing Hadoop’s metadata in MySQL Cluster, Oracle’s open-source NewSQL database. The result is a more scalable, more reliable, and more customizable drop-in replacement for HDFS. A new startup, Logical Clocks AB, has been founded to commercialize the results of the research.

References: 

Salman Niazi, Mahmoud Ismail, Seif Haridi, and Jim Dowling (KTH Royal Institute of Technology); Steffen Grohsschmiedt (Spotify AB); Mikael Ronström (Oracle). "HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases." USENIX FAST 2017.