Today, we’re excited to announce Acceleration 2.0. We have seen great interest and innovation in the big data acceleration market over the past two years, and today’s announcement takes us into a new era of acceleration. In particular, the expanded functionality changes the calculus for investing in hardware acceleration in your data center. By combining caching with acceleration and broadening coverage of file formats and use cases, customers can now upgrade their Apache Spark clusters 80% less expensively than in the past.
Spark and infinite scaling
Big data systems have thrived on their ability to spread large workloads across many computing nodes. Apache Spark is the most prominent example: a fast-developing, open-source platform that distributes and optimizes work across large clusters of computers. The promise of infinite horizontal scaling is appealing: any capacity or performance obstacle, the thinking goes, can be overcome by adding more servers to the cluster.
Limits to scale out and scale up
In practice, though, this perspective has significant limitations, and it has led many organizations into inefficiency and overspending. As more server nodes are added to a Spark cluster, the network traffic and communication overhead between them grows even faster. Rather than a one-for-one return on each new node, every expansion delivers a smaller gain than the one before. Below we lay out empirical examples of this for the two traditional paths to cluster growth: scaling out, which adds servers to the cluster, and scaling up, which increases the computing capacity of each existing server. For example, doubling the node count of a 128-node cluster (going from 128 to 256 nodes) falls far short of doubling its speed: it generates a respectable 18% gain, but at a sizable cost.
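One simple way to model this diminishing-returns effect is Amdahl's law. The sketch below is purely illustrative: the parallel fraction is a hypothetical assumption (not a Bigstream measurement) chosen so that doubling from 128 to 256 nodes yields roughly the 18% gain quoted above.

```python
# Illustrative model of diminishing scale-out returns using Amdahl's law.
# The parallel fraction P is a hypothetical assumption chosen to roughly
# reproduce the ~18% gain seen when doubling a 128-node cluster to 256.

def speedup(nodes: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup over a single node for a given parallel fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)

P = 0.983  # assumed fraction of the workload that parallelizes (illustrative)

s128 = speedup(128, P)
s256 = speedup(256, P)
gain = s256 / s128 - 1.0

print(f"128 -> 256 nodes: {gain:.0%} additional speedup")  # ~18%, far from 100%
```

Even with 98%+ of the work parallelizable in this toy model, doubling the hardware buys well under a 20% speedup, which is the economic trap the scale-out path falls into.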
Chart: Scale Out (Spark)
Hardware and software acceleration technology can turbo-charge an existing Spark cluster far faster than a traditional scale-out project. Acceleration boosts the cluster's performance without adding servers or CPUs and the communication overhead that comes with them. Successful approaches include native software that makes better use of existing CPUs, as well as hardware accelerators such as field-programmable gate arrays (FPGAs), graphics processing units (GPUs), and computational storage. All of these add efficiency to the computing paradigm and remove CPU bottlenecks. Caching is another approach that has yielded promising performance gains.
Introducing Acceleration 2.0
In Acceleration 2.0, we pull many of these elements together for an overall performance gain across use cases. This raises the impact of all of Bigstream's solutions, but it is a particular game changer for customers' on-premises Spark deployments, which are not limited to the instance types cloud providers make available but where a traditional hardware refresh can be an extensive, multiyear process.
Acceleration 2.0 for on-premises clusters adds a computational storage device (CSD), specifically a Samsung SmartSSD. Computational storage suits environments like Apache Spark, where processing data at the storage layer reduces demands on the CPU. The SmartSSD combines an FPGA and fast flash storage on a single device. Paired with Bigstream technology, it acts as both a computation accelerator and a caching engine, bringing significant efficiencies to Spark workloads. Bigstream offloads expensive computations onto the FPGA, freeing the CPU to process other operations. On certain Spark operators, a single FPGA has been shown to outperform servers with 10-20 CPUs or more.
Even with a single SmartSSD, Bigstream can deliver 50% to 300% acceleration, depending on the customer's mix of use cases. This small investment with outsized results generates enormous economic benefits.
Total cost of ownership (TCO) savings examples
For an on-premises Spark cluster, most customers approach changes to their storage design and hardware infrastructure cautiously, as they should. Big data clusters often serve many use cases, so an acceleration path must have broad-based impact to justify the change. The elements of Acceleration 2.0 combine to deliver that impact and ease the transition.
Acceleration 2.0 is a low-disruption addition. Customers do not need to change their HDFS structure, and they can upgrade incrementally because Acceleration 2.0 does not require new hardware on every server in the cluster. By adding a small number of SmartSSDs, customers immediately realize performance benefits: the SmartSSD enables caching, and that caching combined with the SmartSSD's FPGA delivers broad-based performance gains. Finally, Acceleration 2.0 accelerates an expanded set of widely used file types and use cases. Let's dig into a few approaches to scaling and compare their economic impact and total cost of ownership (TCO).
Baseline TCO - Scale out
Customer server clusters range from a handful of nodes to tens of thousands. The relative costs below stay consistent across cluster sizes, so let's examine a customer with a 1,000-node on-premises cluster. When this customer reaches the cluster's limits and wants 25% more speed or overall capacity, the traditional approach is to add 25% more servers: in this case, 250 new machines. The customer likely enjoys discount pricing, having already purchased at least 1,000 servers, but servers are a large investment with a large physical footprint, a large upfront cost, and ongoing operational expenses including power and licensing. Even with discounting, these 250 servers would carry a TCO above $8 million over three years.
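As a rough sanity check on that figure, the arithmetic can be sketched with illustrative per-server costs. The CapEx and OpEx values below are hypothetical assumptions, not quoted prices; they are chosen only to show how 250 servers land in the $8 million range over three years.

```python
# Hypothetical per-server cost assumptions (illustrative only, not quoted prices).
SERVER_CAPEX = 20_000          # upfront cost per server after volume discounting
SERVER_OPEX_PER_YEAR = 4_000   # power, space, licensing, and support per server
YEARS = 3
NEW_SERVERS = 250              # 25% expansion of a 1,000-node cluster

# Three-year TCO = upfront purchase plus three years of operating expense.
tco = NEW_SERVERS * (SERVER_CAPEX + YEARS * SERVER_OPEX_PER_YEAR)
print(f"Three-year scale-out TCO: ${tco:,}")  # $8,000,000 under these assumptions
```

Note how the ongoing OpEx is a large share of the total: over three years it adds more than half of the upfront purchase price again, which is why OpEx-free upgrades change the economics so sharply.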
Alternative 1 - Scale out with accelerated servers
For the customer reluctant to make any changes to existing servers, the first alternative is to add acceleration-enabled servers. The benefits apply not just to the new servers but to the whole cluster, because the SmartSSDs enable caching and Bigstream software delivers acceleration. To gain the same 25% performance boost, the customer only needs to add 130 new servers (just over half as many), each with a small SmartSSD card. The SmartSSD fits in a U.2 slot, occupying a small fraction of a full server's space and costing a similarly small fraction of its price. The upfront purchase is around one eighth the server cost, and unlike a server, it adds almost no ongoing annual operating expense (OpEx). Overall, the three-year TCO of this approach is under $5 million, matching the capacity and performance gains of the traditional scale-out 42% less expensively.
Alternative 2 - Accelerate your existing cluster
Adding Acceleration 2.0 and SmartSSDs to the existing cluster is where we see truly game-changing TCO economics. By adding SmartSSDs to a subset of existing servers, a customer can match the 25% capacity and performance increase at a remarkably lower cost. For the existing 1,000-node cluster, the customer would buy 270 SmartSSDs. The upfront capital expenditure (CapEx) is about 20% of the scale-out approach, and the ongoing operating expenses are less than 5%, since the SmartSSDs require no additional racks or power. Over three years, the total cost of ownership is 89% lower, as shown in the graphic. Again, this merely matches the performance of the scale-out approach.
Alternative 3 - Exceed performance gains and lower TCO
Each of the prior scenarios generates a 25% performance gain, but at three very different costs. Bigstream is an acceleration company focused on delivering the fastest big data results possible, and this fourth scenario prioritizes the performance gain while still delivering an impressive TCO reduction. By adding a SmartSSD to all 1,000 servers in the customer's data center, the customer achieves a 90% performance gain instead of the 25% of the three prior scenarios. At the same time, the three-year TCO is 60% lower than the traditional scale-out approach: the best of both worlds from a capability and cost perspective.
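Pulling the quoted figures from all four scenarios together makes the comparison concrete. The script below normalizes the baseline to the roughly $8 million scale-out TCO and applies the stated percentage reductions; the cost-per-point metric is our own illustrative way of reading the numbers, not a figure from the scenarios themselves.

```python
# Compare the four scenarios using the savings percentages quoted above.
# BASELINE_TCO approximates the ~$8M three-year scale-out figure.
BASELINE_TCO = 8_000_000

scenarios = {
    "Scale out (baseline)":       {"tco": BASELINE_TCO,        "gain_pct": 25},
    "Alt 1: accelerated servers": {"tco": BASELINE_TCO * 0.58, "gain_pct": 25},  # 42% lower
    "Alt 2: accelerate existing": {"tco": BASELINE_TCO * 0.11, "gain_pct": 25},  # 89% lower
    "Alt 3: SmartSSD everywhere": {"tco": BASELINE_TCO * 0.40, "gain_pct": 90},  # 60% lower
}

for name, s in scenarios.items():
    # Illustrative metric: dollars spent per percentage point of performance gained.
    per_point = s["tco"] / s["gain_pct"]
    print(f"{name:28s} TCO ${s['tco']:>11,.0f}   ${per_point:>9,.0f} per % gained")
```

Read this way, Alternative 3 spends barely a tenth as much per point of performance as the baseline while also tripling the absolute gain.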
Bigstream is focused on flexibility, whether it be delivering acceleration in the cloud, in an on-premises data center, with hardware accelerators, or sometimes simply with software acceleration.
Testing and implementation are simple and incremental
The on-premises Acceleration 2.0 solution has a simple two-step deployment process. In the first step, we work with customers to install a single SmartSSD and Bigstream software, along with a measurement tool that provides visibility into performance. This produces a view of the accelerated server's processing versus the unaccelerated servers', illustrating the actual and potential speed benefits.
The second step is to equip a subset of nodes with SmartSSDs and run a full cluster test. Bigstream is confident in the results and will refund any initial investment if this testing does not deliver satisfactory performance gains. Customers can then realize further acceleration and savings with incremental installations across further server nodes. Acceleration 2.0 gives data and analytics teams a set of high performance servers for time-sensitive workloads, and this simple and incremental approach lets them match the mix of performance and cost with the needs of their specific mix of workloads.
Pushing acceleration forward
Bigstream has been at the forefront of acceleration technology for five years, creating the middleware that helps the big data community tap the best available computing options. Acceleration 2.0 is the culmination of that technology, delivering a level of TCO savings unmatched by other approaches.