June benchmark results show a 2.89X speedup of Spark
Bigstream is in the acceleration business. Our approach to hyper-acceleration is at the platform level, as opposed to point solutions that require special APIs, custom coding, and changes to IT and DevOps processes. This means that as we add new platforms and features, the entire platform benefits. And Bigstream customers reap the benefits without ever changing a line of code.
This month’s Bigstream Benchmark Report shows nearly 3X the performance of baseline open source Apache Spark 2.1 on the same configuration. This is the result of a complete performance overhaul of Spark using Bigstream Hyper-acceleration technology. To get a better understanding of what Hyper-acceleration is, check out this interview with Bigstream CEO Maysam Lavasani.
Benchmarks are part of our daily processes, and from time to time it makes sense to share performance data as we refine and expand our hyper-acceleration product. The number and types of tests we run will evolve over time, but we use a combination of standard industry benchmarks like TPC-DS (decision support) and specific big data and machine learning use cases such as ETL, data ingest and parsing, and SQL analytics, along with vertical industry use cases like AdTech real-time bidding, FinServ trading systems, and Retail analytics.
Here is a summary of this month’s results running Apache Spark with the Bigstream Hyper-acceleration Layer on Amazon EMR versus unaccelerated Apache Spark on the same configuration.
Here is a look at the full results of the run:
All tests were done on standard Amazon EC2 server instances (m4.4xlarge) running on Intel Xeon processors.
For a full accounting of the results, take a look at the Bigstream Benchmark Report for July 2017. In that report, you will see the following:
- Test environment: Amazon EMR cluster of four m4.4xlarge instances, Apache Spark 2.1.1
- Average acceleration: 2.89X using csv data, 2.47X using Avro data
- Maximum acceleration: 3.86X on TPC-DS Query #9 (csv), 3.11X on TPC-DS Query #27 (Avro)
- Average monthly cost savings on Amazon EMR: 54% (csv) and 46% (Avro)
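To put the speedup factors above in more familiar terms, here is a minimal sketch of the arithmetic that turns an "N times faster" figure into a percentage gain and a runtime reduction. The function names are hypothetical and chosen for illustration; this is not the method used in the report, and it does not attempt to reproduce the report's cost model.

```python
# Illustrative arithmetic only; function names are hypothetical, not from the report.

def percent_gain(speedup: float) -> float:
    """Throughput gain over baseline, in percent (2.0X -> 100%)."""
    return (speedup - 1.0) * 100.0


def runtime_reduction(speedup: float) -> float:
    """Reduction in wall-clock time, in percent (2.0X -> 50%)."""
    return (1.0 - 1.0 / speedup) * 100.0


# Speedup factors quoted in the list above.
reported = {
    "average (csv)": 2.89,
    "average (Avro)": 2.47,
    "maximum, TPC-DS Query #9 (csv)": 3.86,
    "maximum, TPC-DS Query #27 (Avro)": 3.11,
}

for label, speedup in reported.items():
    print(f"{label}: {speedup:.2f}X = {percent_gain(speedup):.0f}% faster, "
          f"{runtime_reduction(speedup):.1f}% less runtime")
```

By this arithmetic, the 2.89X average corresponds to roughly 65% less runtime, while the reported csv cost savings are 54%. That gap suggests the cost calculation accounts for more than runtime alone.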
Check out the report for more detail about the testing environment and how we worked out the cost savings. Things will get even more interesting in our next benchmark report, which will include some testing of FPGA-powered servers.
About the Author
Bigstream provides hyper-acceleration technology for popular big data processing engines like Apache Spark using both hardware and software accelerators. Hyper-acceleration of big data, machine learning and AI workloads is achieved using advanced compiler techniques and transparent support for FPGAs, many-core CPUs and GPUs. Unlike other hardware- or platform-specific approaches, Bigstream delivers orders of magnitude performance acceleration instantly and with no application code changes or special APIs.