At USENIX FAST 2017, researchers from RISE SICS and KTH, in collaboration with Spotify and Oracle, presented HopsFS, a next-generation distribution of the Apache Hadoop Distributed File System (HDFS) that supports substantially larger clusters and higher throughput than existing Hadoop clusters. Hadoop is the de facto open-source platform for Big Data. On the SICS ICE cluster, HopsFS delivers over 16 times the throughput of HDFS for a real-world Hadoop workload from Spotify AB. HopsFS’ key innovation is a novel distributed architecture for managing Hadoop’s metadata in MySQL Cluster, Oracle’s open-source NewSQL database. The result is a more scalable, more reliable, and more customizable drop-in replacement for Hadoop.
HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases, USENIX FAST 2017. Salman Niazi, Mahmoud Ismail, Mikael Ronström, Steffen Grohsschmiedt, Seif Haridi, Jim Dowling.