Numascale Whitepapers


The Numascale Solution: Extreme BIG DATA Computing

Extreme Big Data Computing: A whitepaper on NumaConnect used in Big Data and data-intensive applications. →download pdf


Traditional clusters based on distributed memory cannot adequately handle today's flood of data; shared-memory approaches are required. The NumaConnect technology provides an affordable solution: it delivers the advantages of shared-memory computing – streamlined application development, the ability to compute on large datasets, the ability to run more rigorous algorithms, and enhanced scalability – at a substantially reduced cost.



NumaConnect Whitepaper: A whitepaper that introduces the NumaChip and the technology. →download pdf


The web version of the whitepaper can be found →here


Numascale's NumaConnect™ technology enables computer system vendors to build scalable servers with the functionality of enterprise mainframes at the cost level of clusters.


NumaConnect differs from all other interconnects in providing a true high-performance ccNUMA architecture. This gives unified access to all resources in a system and allows caching techniques to be used to achieve very low latency.

Redefining Scalable OpenMP and MPI
Price-to-Performance with Numascale's NumaConnect

Some early benchmark results with the Numascale technology inspired Doug Eadline to write this white paper. It shows NumaConnect's position in the interconnect market. →download pdf


The web version of the whitepaper can be found →here


Using commodity hardware and the “plug-and-play” NumaConnect interconnect, Numascale delivers true shared memory programming and simpler administration at standard HPC cluster price points. One such system currently offers users over 1,700 cores with a 4.6 TB single memory image.

The NumaConnect cluster excels at both OpenMP and MPI computing within the same shared memory environment. No extra software or program modifications are needed to take advantage of the entire system. Results for the NASA Advanced Supercomputing (NAS) Parallel Benchmarks have set a new record for OpenMP core count and problem size. OpenMP results show good scalability, with best results coming from larger problem sizes.

In addition, NumaConnect shared memory MPI performance delivers better results than InfiniBand clusters, using standard tools without modification for the underlying hardware environment. A cost comparison with a small FDR InfiniBand cluster shows a comparable price point when ease of programming, high performance, and ease of administration are considered.

The Numascale Solution:
Affordable BIG DATA Computing

Affordable Big Data Whitepaper: An HPCwire article by John Russell. →download pdf


The web version of the whitepaper can be found →here


The challenge is that traditional cluster computing based on distributed memory – which was so successful in bringing down the cost of high-performance computing (HPC) – struggles when forced to run applications whose memory requirements exceed the capacity of a single node. Increased interconnect latencies, longer and more complicated software development, inefficient system utilization, and additional administrative overhead are all adverse factors.

“Any application requiring a large memory footprint can benefit from a shared memory computing […]”


SMP Redux: You Can Have It All

SMP Redux Whitepaper: A Numascale whitepaper by Douglas Eadline. →download pdf


The web version of the whitepaper can be found →here


Large-scale plug-and-play SMP delivers mainframe performance at commodity prices.


When High Performance Computing (HPC) is mentioned, one often envisions a large expensive mainframe. These machines are highly parallel and employ a Symmetric Multi-Processing (SMP) design that provides global memory and process spaces. Delivering this level of performance has always been expensive due to the custom engineering required to integrate a large number of processors into a shared memory environment. Indeed, the added expense of large-scale SMP systems has pushed the market to use a “cluster approach,” where a large number of commodity server machines connected together are used as a single resource. It is well known that clusters are harder to manage and less efficient than SMP mainframe systems.


In this paper we will contrast current SMP and cluster designs in the context of HPC. We will also introduce a breakthrough technology from Numascale that allows commodity hardware to be combined into a cost-effective and scalable SMP system that delivers the best of both worlds.


A Qualitative Discussion of NumaConnect vs InfiniBand and other high-speed networks

NumaConnect vs InfiniBand: A Numascale whitepaper. →download pdf


The web version of the whitepaper can be found →here



The main focus of InfiniBand was to encapsulate any packet format and to provide high-bandwidth connections between systems as well as between systems and their peripherals. InfiniBand is a "shared nothing" architecture: all communication requires a software driver layer to control the communication and to handle buffers for the RDMA (Remote Direct Memory Access) engines.


NumaConnect provides a scalable, low-latency, high-bandwidth interconnect with full support for cache coherence. The shared address space allows processors to access remote memory directly, without any driver-level software intervention and without the overhead of setting up RDMA transfers. A NumaConnect system addresses all memory and all memory-mapped I/O devices directly.

The Change is on: From Cluster to Scalable ccNUMA SMP