Breakthrough in Big Data: 16X performance gains for Hadoop, delivering over 1.2 million operations per second

Scientific Highlights

At USENIX FAST 2017, researchers from RISE SICS and KTH, in collaboration with Spotify and Oracle, presented HopsFS, a next-generation distribution of the Apache Hadoop Distributed File System (HDFS) that substantially increases both the cluster size and the throughput that Hadoop clusters can support. Hadoop is the de facto open-source platform for Big Data. On the SICS ICE cluster, HopsFS delivers over 16 times the throughput of HDFS for a real-world Hadoop workload from Spotify AB. HopsFS’ key innovation is a novel distributed architecture for managing Hadoop’s metadata in MySQL Cluster, Oracle’s open-source NewSQL database. The result is a more scalable, more reliable, and more customizable drop-in replacement for HDFS.
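To give a rough feel for what keeping file system metadata in a database can look like, here is a minimal sketch. It is not the HopsFS schema or code: the `inodes` table, its columns, and the helper functions `open_store`, `resolve`, `create`, and `list_dir` are all hypothetical, and an in-memory SQLite database stands in for MySQL Cluster purely so the example runs on its own. Each file or directory is one row keyed by its parent directory and its name, and a path is resolved one component at a time.

```python
import sqlite3

ROOT_ID = 1  # hypothetical inode id reserved for "/"


def open_store():
    """Create an in-memory table with one row per file or directory."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE inodes ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " parent_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " is_dir INTEGER NOT NULL,"
        " UNIQUE (parent_id, name))"
    )
    # The root directory has no real parent; 0 is used as a placeholder.
    db.execute(
        "INSERT INTO inodes (id, parent_id, name, is_dir) VALUES (?, 0, '', 1)",
        (ROOT_ID,),
    )
    return db


def resolve(db, path):
    """Walk the path component by component; return the inode id or None."""
    inode_id = ROOT_ID
    for part in (p for p in path.split("/") if p):
        row = db.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, part),
        ).fetchone()
        if row is None:
            return None
        inode_id = row[0]
    return inode_id


def create(db, path, is_dir=False):
    """Insert a new file or directory row under an existing parent."""
    parent_path, _, name = path.rstrip("/").rpartition("/")
    if not name:
        raise ValueError("path must name a file or directory")
    parent_id = resolve(db, parent_path)
    if parent_id is None:
        raise FileNotFoundError(parent_path)
    with db:  # run the insert inside a transaction
        cur = db.execute(
            "INSERT INTO inodes (parent_id, name, is_dir) VALUES (?, ?, ?)",
            (parent_id, name, int(is_dir)),
        )
    return cur.lastrowid


def list_dir(db, path):
    """Return the sorted names of the entries in a directory."""
    dir_id = resolve(db, path)
    if dir_id is None:
        raise FileNotFoundError(path)
    rows = db.execute(
        "SELECT name FROM inodes WHERE parent_id = ? ORDER BY name", (dir_id,)
    )
    return [r[0] for r in rows]
```

In this sketch, creating `/user`, then `/user/alice`, then `/user/alice/song.txt` adds three rows, and listing a directory becomes a single query on `parent_id`. Because the metadata lives in database rows rather than in the memory of a single server process, capacity and throughput can in principle grow with the database; how HopsFS actually achieves this is described in the paper cited below.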

HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases, USENIX FAST 2017. Salman Niazi, Mahmoud Ismail, Mikael Ronström, Steffen Grohsschmiedt, Seif Haridi, Jim Dowling.