Choose Your Weapon: Comparing Spark on FPGAs vs GPUs (May 27, 2021)
Data + AI Summit 2021
Bigstream was honored to participate in this year's Data + AI Summit! In addition to this presentation, check out "Turbocharge Spark with Samsung SmartSSDs powered by Xilinx" with Bigstream's Steve Tuohy and Xilinx's Sr Architect Seong Kim.
Today, general-purpose CPU clusters are the most widely used environment for data analytics workloads. Recently, acceleration solutions employing programmable hardware accelerators have emerged, offering cost, performance, and power-consumption advantages. Field-programmable gate arrays (FPGAs) and graphics processing units (GPUs) are two leading technologies being applied. GPUs are well known for high performance on dense, highly regular operations such as graphics processing and matrix manipulation. FPGAs have a flexible programming architecture and are adept at operations that contain conditionals and/or branches. These architectural differences have significant performance impacts, which manifest all the way up to the application layer. It is therefore critical that data scientists and engineers understand these impacts in order to make informed decisions about whether and how to accelerate.
This talk will characterize the architectural aspects of the two hardware types as applied to analytics, with the ultimate goal of informing the application programmer. Recently, both GPUs and FPGAs have been applied to Apache SparkSQL through services on the Amazon Web Services (AWS) cloud. These solutions aim to give Spark users high performance and cost savings. We first characterize the key aspects of the two hardware platforms. Based on this characterization, we examine and contrast the sets and types of SparkSQL operations they accelerate well, how they accelerate them, and the implications for the user’s application. Finally, we present and analyze a performance comparison of the two AWS solutions (one FPGA-based, one GPU-based). The tests employ the TPC-DS (decision support) benchmark suite, a widely used performance test for data analytics.
Bishwa Roop Ganguly
Chief Solutions Architect, Bigstream
Bishwa Roop Ganguly is Chief Solutions Architect at Bigstream Solutions. He has a PhD in Electrical Engineering from MIT, and an MS and a BS in Computer Science from the University of Illinois and the University of California at Berkeley, respectively. He has published extensively in the fields of parallel processing and computer networks. He also has five years of experience as a data scientist using Hadoop, Spark, and SQL. He currently manages customer and partner engagements at Bigstream.