Numascale Tops World Record STREAM Benchmark

The system pushes the limits of scalability, supporting more cores than any other single system (5,184) along with 20.7 TBytes of shared memory to handle massive amounts of data.


Frankfurt, July 13, 2015 – Numascale today announced record-breaking results from a shared memory system running the McCalpin STREAM Benchmark, a synthetic benchmark program that measures sustainable memory bandwidth and the corresponding computation rate for simple vector kernels. Numascale’s cache-coherent shared memory system, which is targeted at big data analytics, reached 10.06 TBytes/second for the Scale function. This result is 53% higher than that of the next most scalable system on the list, which achieved 6.59 TBytes/second.
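For readers unfamiliar with the benchmark, the Scale kernel and the way its bandwidth figure is counted can be sketched as follows. This is only an illustration in pure Python, not the official STREAM code (which is written in C and Fortran and is normally run multi-threaded); the function names `stream_scale` and `percent_higher` are hypothetical, and interpreter overhead means the number it reports says nothing about real hardware bandwidth.

```python
import time
from array import array


def stream_scale(n=200_000, scalar=3.0, repeats=3):
    """Run the Scale kernel a[i] = scalar * b[i] and return (bytes/second, a).

    STREAM counts 16 bytes of traffic per element for Scale:
    one 8-byte read of b[i] and one 8-byte write of a[i].
    """
    b = array("d", [1.0]) * n
    a = array("d", [0.0]) * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(n):
            a[i] = scalar * b[i]
        best = min(best, time.perf_counter() - start)
    bytes_moved = 2 * 8 * n
    return bytes_moved / best, a


def percent_higher(result, reference):
    """Relative advantage of one bandwidth figure over another, in percent."""
    return (result / reference - 1.0) * 100.0


if __name__ == "__main__":
    bandwidth, _ = stream_scale()
    print(f"Scale (pure Python, illustrative only): {bandwidth / 1e6:.1f} MB/s")
    # The two figures quoted in the announcement, in TBytes/second.
    print(f"Advantage: {percent_higher(10.06, 6.59):.1f}%")
```

The second function simply reproduces the comparison in the announcement: 10.06 against 6.59 TBytes/second is a little under 53% higher.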

Numascale’s record-breaking system is the first part of a large cloud computing installation at a North American customer data center facility for the analytics and simulation of sensor data combined with historical data. The system is being used to run analytic models that simulate complex dynamic behavior in a certain supply chain. Its data sets are large, and the model uses both historical data and close to real-time information to predict behavior. The size of the data sets requires a large memory with short access times so that computations can be completed within deadlines.

The customer data center’s analysis evaluates location placement, megawatt sizing, and energy services mix in order to determine the greatest optimization and efficiency gains from the integration of banks that store and deliver energy to an electric grid. In similar fashion, Numascale’s technology has numerous Smart City applications, such as traffic analysis, where 24×7 real-time streaming data from thousands of sensors aids decisions and actions that need to be made in real time.

Running all calculations compiled from disparate data sources, both structured and unstructured, in a timely manner requires significant computing power and a large shared memory. Numascale’s STREAM results indicate that the total bandwidth of the system is capable of supporting large parallel workloads. The STREAM benchmark is specifically designed to work with datasets much larger than the available cache on any given system, so its results are indicative, to some degree, of the performance of very large, vector-style applications.

Numascale’s system consists of 108 Supermicro 1U servers connected in a 3D torus via the company’s NumaConnect™ Interconnect technology. Three cabinets with 36 servers apiece were used in a 6x6x3 topology. Each server has 48 cores in three AMD Opteron™ 6386 CPUs and 192 GBytes of memory, providing a single system image with 20.7 TBytes of memory shared by all 5,184 cores. The system was designed to meet requirements for “very large memory” hardware solutions running a standard single image Linux OS on commodity x86-based servers.
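The totals quoted above follow directly from the per-server figures, and the 6x6x3 torus can be pictured as a grid in which every server links to its neighbours with wraparound in each dimension. The sketch below checks the arithmetic and enumerates torus neighbours. The names `system_totals` and `torus_neighbors` are hypothetical, the coordinate scheme is an assumption made for illustration rather than NumaConnect’s actual addressing, and TBytes are taken as 1,000 GBytes.

```python
TORUS_DIMS = (6, 6, 3)        # topology stated in the announcement
CORES_PER_SERVER = 48         # three CPUs with 48 cores in total
MEMORY_PER_SERVER_GB = 192


def system_totals(dims=TORUS_DIMS,
                  cores_per_server=CORES_PER_SERVER,
                  memory_per_server_gb=MEMORY_PER_SERVER_GB):
    """Derive server, core and memory totals from the per-server figures."""
    servers = dims[0] * dims[1] * dims[2]
    return {
        "servers": servers,
        "cores": servers * cores_per_server,
        "memory_gb": servers * memory_per_server_gb,
    }


def torus_neighbors(coord, dims=TORUS_DIMS):
    """Return the set of nodes one hop away in a 3D torus (with wraparound).

    Coordinates are illustrative (x, y, z) tuples, not real node IDs.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size
            neighbors.add(tuple(nxt))
    neighbors.discard(tuple(coord))
    return neighbors


if __name__ == "__main__":
    totals = system_totals()
    print(totals["servers"], "servers,", totals["cores"], "cores,",
          totals["memory_gb"] / 1000, "TBytes")
    print(sorted(torus_neighbors((0, 0, 0))))
```

With these inputs the totals come to 108 servers, 5,184 cores and 20,736 GBytes, which matches the 20.7 TBytes quoted.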

NumaConnect enables scalable server computer systems to be built from commodity components at cluster prices, while providing high-performance shared memory programming capabilities. The Interconnect technology eliminates the difficulty of MPI coding for big data problems and typically increases programmer productivity.

“This alternative represents a compelling solution for scientists who currently work with shared memory codes on x86 desktops and laptops,” said Einar Rustad, CTO of Numascale. “These users can now scale up their data sets without any extra effort within a familiar, standard Linux OS environment. With NumaConnect, system administration is identical to that of a single server because there are no separate node images to maintain and distribute.”

In Numascale’s record-breaking system, NumaConnect provides a total physical address space of up to 256 TBytes of system-wide shared memory. It does so using cache coherency logic with a directory-based protocol that scales to 4096 nodes, providing 196,608 cores. When running this STREAM benchmark, Numascale’s system did not use all of its cores: letting a single core drive each memory controller makes better use of the memory channels, because it avoids arbitration between cores and so delivers optimum memory bandwidth.
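The scaling limits quoted here can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the 196,608-core figure corresponds to 4096 nodes of the same 48-core servers used in the record system, and that TBytes are binary (2^40 bytes). Both are assumptions made for illustration, and the variable names are hypothetical.

```python
MAX_NODES = 4096
CORES_PER_NODE = 48            # assumption: same 48-core servers as above
ADDRESS_SPACE_TBYTES = 256     # assumption: binary TBytes (2**40 bytes)

# Maximum core count if every node is a 48-core server.
max_cores = MAX_NODES * CORES_PER_NODE

# Number of physical address bits needed to cover the full address space.
address_space_bytes = ADDRESS_SPACE_TBYTES * 2**40
address_bits = address_space_bytes.bit_length() - 1

# Number of bits needed to distinguish 4096 nodes.
node_id_bits = MAX_NODES.bit_length() - 1

if __name__ == "__main__":
    print("max cores:", max_cores)
    print("physical address bits:", address_bits)
    print("node ID bits:", node_id_bits)
```

Under those assumptions the core count comes out at 196,608, the address space corresponds to 48 physical address bits, and 12 bits are enough to identify every node.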

For this installation, Numascale will deliver a training session to teach best-practice software design methods that take full advantage of its unique architecture. The company has signed a development agreement whereby Numascale will co-develop future software solutions with the data center.

Visit Numascale in booth 1124 at ISC15 to see a demo of Numascale’s Smart City solution.

About Numascale

Numascale provides turnkey systems for the Big Data Analytics market. Numascale combines optimized and maintained open source software with its unique shared memory architecture to deliver solutions that scale extremely well. The combined software and hardware solution provides unmatched performance at a very attractive price point. Numascale is Bigger Data Analytics. For more information visit


Einar Rustad
CTO, Numascale
+47 924 84 510
[email protected]