Bigstream can’t solve the chip shortage, but it can help solve one of the three challenges described below.

Much attention is rightfully being paid to the global shortage of semiconductors. By sending workers and students home, the pandemic increased demand for home tech equipment at the very moment 2020’s production disruptions were constraining supply, throwing the market out of balance.

Chips are ubiquitous

From the obvious to the less obvious, many industries increasingly depend on microchips. Data centers and their servers, PCs, and the 5G telecom rollout fall into the obvious category. But the reach of microprocessors now extends well beyond these tech arenas, and the impact of the shortage has made its way into the formerly low-tech world. One example is the dog-washing business, whose equipment is now instrumented for precision with microchips. The global shortage has left equipment manufacturers scrambling for supply, and dog owners and pet stores facing price increases.

The crunch has already created challenges for a long list of industries between the low-tech pet care and high-tech enterprise data worlds. Apple has resorted to stocking older iPhones because it has not been able to source enough iPhone 12 Pro components. Sony cannot build PlayStation 5 consoles fast enough to keep up with gaming demand, and General Motors and Ford have idled some factories, sending thousands of workers onto reduced pay.

Impact on big data and distributed computing

Organizations running big data applications like Apache Spark™ in their data centers are feeling the supply chain challenge. One of the key selling points of big data software is its scalability: in theory, users can add 20 percent more capacity simply by adding 20 percent more server nodes to a cluster. Across the industry, a fast-growing share of big data workloads has moved to the cloud, but a very large share remains on-premises, and many organizations have clusters that continue to grow quarter after quarter. As server prices rise and supplies remain scarce, the supply crunch will accelerate the shift of certain workloads to the cloud for some organizations.

Innovative organizations are finding an alternative path: adding big data acceleration technology can dramatically increase the capacity of existing data center clusters without purchasing additional expensive CPU servers.

Mindless scale-out already needed a better approach

The CPU is ideal for general-purpose computing, able to process almost any workload, even if not in the fastest or most efficient way. Acceleration is based on the idea that there are many situations where general-purpose hardware is not ideal. For big data, it improves on CPU-only clusters in two ways: 1) it makes existing CPUs process specific operators more efficiently, and 2) it lets existing CPU clusters incorporate advanced processors, such as FPGAs, that execute many operations faster and more efficiently than a CPU can.
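As an illustration of how this kind of acceleration typically plugs in, here is a minimal sketch of a Spark session configured to load an acceleration plugin through Spark 3's plugin framework. The plugin class and device setting are hypothetical placeholders, not Bigstream's actual integration; the point is that the application code itself does not change.

```python
from pyspark.sql import SparkSession

# Minimal sketch, assuming a Spark 3.x cluster. "com.example.accel.SQLPlugin"
# and the device setting below are hypothetical placeholders for a vendor's
# acceleration plugin, not Bigstream's actual class names.
spark = (
    SparkSession.builder
    .appName("accelerated-etl")
    .config("spark.plugins", "com.example.accel.SQLPlugin")
    .config("spark.example.accel.devicesPerExecutor", "1")
    .getOrCreate()
)

# The Spark application code is unchanged: operators the plugin supports
# (e.g., scans, filters, aggregations) are offloaded to the accelerator,
# and everything else falls back to the CPU.
events = spark.read.parquet("/data/events")   # illustrative path
events.groupBy("event_type").count().show()
```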

A total cost of ownership (TCO) example

Consider an organization with a 100-node Spark cluster that needs to increase capacity by 15 percent. One approach is to purchase, install, and configure an additional 15 servers. There is a wide range in server pricing, but let's say that between the capital expenditures (up front) and the operating expenditures (ongoing), a server ends up costing $1,000 per month, or $36,000 across a three-year life.

A different approach is to use Bigstream software with small form factor field-programmable gate arrays (FPGAs). FPGAs attach to existing servers in the cluster, and per node, the FPGA plus Bigstream combination costs less than 20 percent of the server cost described above. It also increases the cluster's capacity far more than adding the 15 extra CPU servers would. Results vary based on file and workload profiles, but typical end-to-end FPGA acceleration ranges from 2x to as much as 10x. Even at the conservative end of that range, accelerating the existing CPU servers rather than adding more of them cuts the TCO of this expansion by over 80 percent.
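The arithmetic is simple enough to sanity-check. The sketch below reruns the comparison using only the figures quoted above; the 20 percent cost ratio and 2x speedup are the conservative inputs from this post, and a real TCO model would add line items such as power, cooling, and licensing.

```python
# Back-of-the-envelope TCO comparison using the figures from this post.
# Assumptions (stated above): a server costs ~$1,000/month, i.e. $36,000
# over a three-year life; a per-node FPGA + Bigstream add-on costs under
# 20% of that; the conservative end-to-end speedup is 2x.

SERVER_COST_3YR = 36_000      # $ per server over three years
ACCEL_COST_RATIO = 0.20       # accelerator cost as a fraction of a server
ACCEL_SPEEDUP = 2.0           # conservative end of the 2x-10x range

cluster_nodes = 100
target_gain = 0.15            # need 15% more capacity

# Option 1: scale out with additional CPU servers.
extra_servers = round(cluster_nodes * target_gain)        # 15 servers
scale_out_cost = extra_servers * SERVER_COST_3YR          # $540,000

# Option 2: accelerate existing nodes. Each accelerated node gains
# (ACCEL_SPEEDUP - 1) nodes' worth of capacity, so 15 accelerated
# nodes deliver the same 15-node-equivalent increase.
nodes_to_accelerate = round(cluster_nodes * target_gain / (ACCEL_SPEEDUP - 1))
accel_cost = nodes_to_accelerate * SERVER_COST_3YR * ACCEL_COST_RATIO  # $108,000

savings = 1 - accel_cost / scale_out_cost
print(f"Scale out: ${scale_out_cost:,}  Accelerate: ${accel_cost:,}")
print(f"TCO reduction: {savings:.0%}")   # 80%, and higher with cheaper
                                         # accelerators or bigger speedups
```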

There are several options along the acceleration path, including software-only acceleration. The best approach depends on the organization's environment and workloads (for instance, whether its main work is ingest, transformation, analytic queries, or machine learning).

Even without a chip shortage, bigger clusters aren’t always better

There are other benefits to the acceleration approach versus scaling up or scaling out. It is true that Spark scales by adding more and bigger servers. However, the communication overhead between nodes leads to significant diminishing returns at scale: a 200-node cluster, for instance, is likely to perform only 50 percent better than a 100-node cluster, where you would theoretically expect a 100 percent improvement.

In the chart below, the black line represents the performance gains that would come from perfect, linear Spark scaling: a 4x node increase (moving to the right) would generate a 4x speed gain (moving up). In practice, the orange bars show the diminishing returns: it takes a 6x node increase to generate a 4x speed gain. The blue bars represent the third alternative, accelerating CPUs to reach a 4x speed gain with just 2x the nodes, half the growth of theoretical Spark scaling and one third the growth of actual Spark scaling. In normal times and amid the chip shortage alike, the clear path to 4x speed gains is node acceleration rather than more nodes.
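For readers who want to play with the trade-off, here is a small model of the same curves. The power-law efficiency exponent is an illustrative assumption fitted to the chart's 6x-nodes-for-4x-speed point, not a measurement; real scaling behavior depends heavily on the workload.

```python
import math

# Illustrative scaling model (an assumption, not a measurement): overall
# speedup grows as node_factor ** ALPHA, with ALPHA < 1 capturing the
# communication overhead between nodes. ALPHA is fitted so that a 6x
# node increase yields a 4x speed gain, matching the chart above.
ALPHA = math.log(4) / math.log(6)   # ~0.77

def nodes_needed(target_speedup: float, per_node_speedup: float = 1.0) -> float:
    """Node growth factor required to reach a target overall speedup."""
    return (target_speedup / per_node_speedup) ** (1 / ALPHA)

target = 4.0
print(f"Scale-out only: {nodes_needed(target):.1f}x nodes")        # 6.0x
# With a conservative 2x per-node acceleration, each node does twice the
# work, so far fewer nodes are needed for the same 4x target (roughly
# the 2x shown in the chart, depending on the exact scaling curve).
print(f"With 2x acceleration: {nodes_needed(target, 2.0):.1f}x nodes")
```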

Conclusion: turn the chip shortage into an opportunity to modernize your Spark cluster

Some of our partners and customers have shared with us their challenges in securing servers for their data centers. Of course, FPGAs are not immune from the global forces of supply and demand either. However, because an FPGA is a cluster component priced at less than 10 percent of a server, price pressures are a much smaller issue. Additionally, organizations that develop multiple options for increasing performance and capacity are finding that flexibility increasingly valuable in today's economy.

Market analysts don’t expect this chip shortage to go away soon; it is likely to last another two years or more and will continue to impact businesses and consumers. Bigstream can’t avert a dog-washing or gaming crisis, but we’d love to help organizations do more with less in their Spark clusters!