Hyperacceleration With Bigstream Technology
Bigstream has pioneered the field of hyperacceleration, realizing the elusive potential of software- and hardware-based acceleration for big data platforms without requiring any user code changes.
In this whitepaper we cover:
- Hardware acceleration background and challenges
- Performance results
- Technical details
- Use cases
Bridging the Gap
After holding for 50 years, Moore’s Law is coming to an end. The law that predicted a doubling in processor transistor count, and hence compute power, roughly every 18 to 24 months has ceased to hold, largely because of fundamental physical limits.
At the same time, the demand for big data and machine learning continues to grow as enterprises use data for a competitive advantage. Data demands have inspired a new generation of tools, such as Apache Spark and TensorFlow, that are pushing advanced analytics into the mainstream.
These tools generally rely on a cluster of servers to handle large computations, but cluster scaling alone has its limitations in providing high performance. Scale-up and scale-out strategies can work effectively for smaller workloads, but they run into diminishing returns as cluster size (scale out) or server capability (scale up) grows.
Hardware accelerators such as GPUs or FPGAs offer a path to high performance and, in fact, enhance the gains of scaling. To date, however, acceleration has had limited success due to a key gap between data scientists and the performance engineers working with the computing infrastructure, illustrated by Figure 1.
Figure 1: Programming Model Gap Inhibiting Hardware Acceleration
Until recently, there was no automated way for big data platforms such as Spark to leverage advanced field programmable hardware. Consequently, data scientists and analysts had to work with performance engineers to fill that programming model gap. Though feasible, this process was inefficient and time-consuming.
Data scientists, developers and quants are accustomed to programming big data platforms in a high-level language. Performance engineers, on the other hand, focus on low-level programming, including field programmable hardware. The scarcity of such engineers, along with the additional implementation time, would significantly lengthen the time to value of analytics when accelerating. In addition, the resulting solutions would typically be difficult to update as analytics evolve.
Figure 2 illustrates Bigstream’s architecture to address this gap.
Figure 2: Bigstream Hyperacceleration Addresses Gap
At a high level, Bigstream Hyperacceleration automates the process of acceleration for users of big data platforms. It includes compiler technology for both software acceleration via native C++ and FPGA acceleration via bitfile templates. This technology yields up to 10x end-to-end performance gains for analytics with zero code changes.
Cluster Scaling with Acceleration
When organizations need to increase their cluster performance, they typically scale by growing the cluster. Scaling up refers to increasing the capability of cluster nodes, keeping their number the same. Scaling out refers to increasing the number of nodes in the cluster, keeping their type the same. An approach combining scale up and scale out is also common.
Figure 3: Scale-up, Scale-out
Figure 3 illustrates these two approaches to scaling. In this example, both approaches increase the number of virtual CPUs (vCPUs), increasing the compute power. It is also possible to scale in other ways, such as adjusting network connections, memory, disk or other resources.
Scaling, in almost all cases, yields a sub-linear performance increase as resources are added. That is, when the cluster is scaled by a factor of N, performance increases by a factor of less than N. This diminishing return is severe at large scale. The technical reasons for this are listed below:
- Increased I/O overhead/throttling
- Shared resource contention (memory, L2 Cache)
- Scheduling complexity increase
- Increased network overhead
- Straggler effect exacerbated
- Failure rate increase
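One common way to reason about this sub-linear behavior is Amdahl's law. It is a simplified textbook model, not something measured in this paper, and it lumps all of the overheads listed above into a single non-parallelizable fraction. The sketch below assumes a hypothetical `parallel_fraction` of 0.95 purely for illustration.

```python
def amdahl_speedup(scale_factor: float, parallel_fraction: float) -> float:
    """Speedup predicted by Amdahl's law when resources grow by scale_factor.

    parallel_fraction is the share of the work that scales perfectly;
    the remainder is treated as fixed overhead (an assumption of the model).
    """
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / scale_factor)


# Hypothetical value chosen for illustration only.
PARALLEL_FRACTION = 0.95

for n in (1, 2, 4, 8, 16):
    s = amdahl_speedup(n, PARALLEL_FRACTION)
    # Efficiency is speedup divided by the scale factor; it drops as n grows.
    print(f"scale x{n:>2}: speedup {s:5.2f}, efficiency {s / n:.0%}")
```

Under this model the efficiency (speedup divided by N) shrinks with every doubling, which is the diminishing return described above.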
Bigstream Hyperacceleration improves scaling in two ways:
- It can reduce the size of the cluster needed to yield a given performance level, and
- It reduces the overhead of some of the above factors (such as network and I/O), thus reducing their impact.
Figure 4. Results of Scale up Experiment with Spark
Figure 4 illustrates the scaling issue with a scale-up experiment. We ran two TPC-DS benchmark queries on Amazon EMR using Spark in various cluster configurations. Moving from left to right, each point represents the performance for a given number of vCPUs, and the cluster size doubles with each step (16, 32, 64, 128, 256 vCPUs). The vertical axis, “Speedup”, is calculated relative to the “Base”. The speedup of both benchmarks falls away from the blue linear line as the cluster scales, as predicted above.
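As a rough illustration of how such a speedup curve can be derived from raw runtimes, the sketch below divides the base runtime by each measured runtime and compares it with the ideal linear speedup. It assumes the "Base" is the smallest cluster, and the runtimes are invented placeholders, not the measurements behind Figure 4.

```python
def speedups(runtimes_by_vcpus):
    """Map each vCPU count to (measured_speedup, ideal_linear_speedup).

    Assumption: the base configuration is the one with the fewest vCPUs.
    """
    base_vcpus = min(runtimes_by_vcpus)
    base_time = runtimes_by_vcpus[base_vcpus]
    return {
        vcpus: (base_time / runtime, vcpus / base_vcpus)
        for vcpus, runtime in sorted(runtimes_by_vcpus.items())
    }


# Hypothetical runtimes in seconds, for illustration only.
example_runtimes = {16: 800.0, 32: 440.0, 64: 260.0, 128: 170.0, 256: 125.0}

for vcpus, (measured, ideal) in speedups(example_runtimes).items():
    print(f"{vcpus:>3} vCPUs: speedup {measured:5.2f} (linear would be {ideal:5.2f})")
```

The gap between the two columns corresponds to the distance between a benchmark curve and the linear line in a plot of this kind.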
Figure 5. Results of Scale up Experiment with Spark and Accelerated Spark
Next, we ran the same experiment with clusters equipped with Bigstream software-based acceleration. Figure 5 shows the combined results: the dashed lines are the unaccelerated runs from Figure 4 and the solid lines are the Bigstream-accelerated runs. The accelerated curve displays a much gentler falloff with scaling than the Spark curve. So for a given cluster size, Bigstream generates significantly more speedup than Spark alone (the vertical difference between the two lines). The horizontal distance tells the same story from another angle: Bigstream provides nearly the same speedup at 64 vCPUs as Spark alone does with 256 vCPUs.
We see similar results with scale out experiments. These results indicate that acceleration can work synergistically with scaling, to provide maximum performance and a wide variety of performant configuration choices for the user. This, in turn, can result in total cost of ownership (TCO) savings. Cloud users can use smaller clusters or use the same cluster for a shorter time period for a given analysis. Users with on-premises clusters will be able to run faster applications and accomplish more in a given period of time.
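The cloud cost argument can be made concrete with simple arithmetic. The sketch below assumes a flat per-node hourly price and ignores any cost of the acceleration itself; the node count, hours, price, and speedup are hypothetical values, not figures from this paper.

```python
def run_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    """Cost of one run under a flat per-node hourly price (an assumption)."""
    return nodes * hours * price_per_node_hour


def accelerated_hours(baseline_hours: float, speedup: float) -> float:
    """Runtime after applying an end-to-end speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


# Hypothetical inputs for illustration only.
NODES, BASELINE_HOURS, PRICE, SPEEDUP = 8, 10.0, 1.0, 2.0

baseline = run_cost(NODES, BASELINE_HOURS, PRICE)
accelerated = run_cost(NODES, accelerated_hours(BASELINE_HOURS, SPEEDUP), PRICE)
print(f"baseline cost: {baseline:.2f}, accelerated cost: {accelerated:.2f}")
```

The same function covers the "smaller cluster" option: keep the hours fixed and reduce `nodes` instead.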
As discussed, hardware-based acceleration has the highest performance potential. Adding an FPGA to a server can be a cost-effective way to speed up big data platforms if it doesn't come with additional complexity. These chips are typically a fraction of the cost of a full CPU-based server.
Figure 6 illustrates the results of a separate set of benchmark tests comparing Spark alone versus Spark with Bigstream FPGA-based acceleration. 104 TPC-DS benchmark queries were run with Spark in a CPU-only platform as the baseline, and then again using Spark along with Bigstream and a commodity FPGA platform.
The average speedup across the 104 queries was 3.3x with some queries as much as 5x faster with Bigstream and FPGAs. As with all Bigstream Hyperacceleration, no Spark code needed to be changed in the accelerated runs.
This section presents a technical overview of Bigstream Hyperacceleration applied to Spark, and the role of its components. We focus on its relationship to the standard Spark architecture and how it enables acceleration transparently.
Baseline Spark Architecture
Figure 7 shows the basic components of standard Spark using YARN for resource management. The Spark components and associated roles are as follows:
Figure 7: Baseline Spark Architecture
- Spark Driver – Runs the client application and communicates with the Master to install the application to be run and configurations for the cluster. The configurations include the number of Master and Core nodes as well as memory size selections for each of these.
- Spark Master – Instantiates the Spark Executors. The Master must communicate with the Resource Manager with requests for resources according to the application needs. The Resource Manager system, in turn, allocates resources for Executor creation. The Master creates the stages of the application and distributes tasks to the Executors.
- Spark Executors – Also known as Core nodes, they run individual Spark tasks, reporting back to the Master when stages are completed. The computation proceeds in stages, generating parallelism among the Executor nodes. The faster the Executors can execute their individual task sets, the faster each stage finishes, and therefore the faster the application finishes. In standard Spark, tasks are created as Java bytecode at runtime and downloaded to the Executors for execution.
Bigstream Hyperacceleration Architecture
Figure 8: Spark with Bigstream Acceleration Architecture
Figure 8 shows the Spark architecture with Bigstream acceleration integrated. Note that this illustration applies equally to software and hardware (many-core, GPU and FPGA) acceleration. The blue items indicate Hyperacceleration components that are added at bootstrap time and can then provide acceleration throughout the course of multiple application executions. The Client Application, Driver, Resource Manager components, and the structure of the Master and Executors all remain unchanged. Bigstream Hyperacceleration does not require changes to anything in the system related to fault tolerance, storage management and resource management. It has been carefully designed only to provide an alternative execution substrate at a node level that is transparent to the rest of Spark. These are the functions and interfaces of the components:
- Spark Master – Generates the physical plan exactly as in standard Spark through the execution of the Catalyst optimizer. The standard bytecode for Spark tasks is generated by the Master as normal.
- Bigstream Runtime – The Bigstream runtime is a set of natively compiled C++ modules (software acceleration), or bitfile templates (FPGA acceleration) and their associated APIs that implement accelerated versions of Spark operations.
- Streaming Compiler – The Bigstream Streaming Compiler examines the physical plan and inspects and evaluates individual stages for potential optimized execution. The output of the evaluation process is a set of calls into the Bigstream Runtime API, implementing each stage in the plan if deemed possible.
- Spark Executor – Via a hook inserted at cluster bootstrap time, all Executors have a pre-execution check that determines if a stage has been accelerated. If it has, the associated compiled module is called. Otherwise, the standard Java bytecode version is executed. This check is invisible to the programmer. Other than improved performance, the programmer doesn’t know whether a stage is running accelerated. Thus, stages are accelerated optimistically, defaulting to being run as in standard Spark. This approach ensures that Bigstream users continue to see an identical interface to standard Spark.
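The optimistic dispatch described above can be sketched in a few lines. This is a conceptual illustration only, not Bigstream's or Spark's actual API: `AcceleratedStageRegistry`, `run_stage`, and the stage identifier are hypothetical names, and plain Python functions stand in for both the compiled modules and the standard bytecode path.

```python
from typing import Callable, Dict, List, Optional

Rows = List[dict]
StageFn = Callable[[Rows], Rows]


class AcceleratedStageRegistry:
    """Hypothetical record of the stages that were successfully accelerated."""

    def __init__(self) -> None:
        self._modules: Dict[str, StageFn] = {}

    def register(self, stage_id: str, module: StageFn) -> None:
        self._modules[stage_id] = module

    def lookup(self, stage_id: str) -> Optional[StageFn]:
        return self._modules.get(stage_id)


def run_stage(stage_id: str, rows: Rows, standard_impl: StageFn,
              registry: AcceleratedStageRegistry) -> Rows:
    """Pre-execution check: use the accelerated module if one exists,
    otherwise fall back to the standard implementation."""
    accelerated = registry.lookup(stage_id)
    if accelerated is not None:
        return accelerated(rows)
    return standard_impl(rows)


def standard_filter(rows: Rows) -> Rows:
    # Stand-in for the standard (bytecode) version of a filter stage.
    return [r for r in rows if r["qty"] > 10]


def accelerated_filter(rows: Rows) -> Rows:
    # Stand-in for a natively compiled module; it must return the same result.
    return list(filter(lambda r: r["qty"] > 10, rows))


registry = AcceleratedStageRegistry()
registry.register("filter-qty", accelerated_filter)

data = [{"qty": 5}, {"qty": 20}, {"qty": 11}]
# "filter-qty" takes the accelerated path; "unknown-stage" falls back.
fast = run_stage("filter-qty", data, standard_filter, registry)
slow = run_stage("unknown-stage", data, standard_filter, registry)
print(fast == slow)
```

The caller invokes `run_stage` the same way in both cases, which mirrors the point that the check is invisible to the programmer.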
Delivering Solutions to Real-World Challenges
Bigstream continues to develop its product, focusing on the biggest opportunities for acceleration of the Spark platform. The current product has accelerated a broad range of functions across technical and business use cases. ETL and ELT workloads have seen some of the most dramatic acceleration, as have SQL analytics with Hive and Spark SQL. Bigstream developed Hyperacceleration to solve real-world big data problems. This state-of-the-art technology delivers significant ROI and staggering performance gains.