Technology Brief: NCALLOC Accelerating Big Data Applications with Efficient Memory Management

Analyze bigger data with higher precision using Numascale R Analytics Appliance

Data scientists and power users can now run complex R analytics at high throughput rates, eliminating the need to recode R algorithms in Python or C++.

The Numascale Solution: Extreme BIG DATA Computing

A whitepaper on NumaConnect used in Big Data and Data Intensive applications

A Numascale Whitepaper by Einar Rustad, CTO

Traditional clusters based on distributed memory cannot adequately handle the current crush of data. Shared memory approaches are required. The NumaConnect technology provides an affordable solution. It delivers all the advantages of shared memory computing – streamlined application development, the ability to compute on large datasets, the ability to run more rigorous algorithms, enhanced scalability, etc. – at a substantially reduced cost.

An introduction to the NumaChip and the technology

Numascale’s NumaConnect™ technology enables computer system vendors to build scalable servers with the functionality of enterprise mainframes at the cost level of clusters.

NumaConnect differs from all other interconnects in providing a true high-performance ccNUMA architecture. This gives it unified access to all resources in a system and lets it use caching techniques to obtain very low latency.

Redefining Scalable OpenMP and MPI Price-to-Performance with Numascale’s NumaConnect

Early benchmark results and NumaConnect’s position in the interconnect market

A Numascale Whitepaper by Doug Eadline

Using commodity hardware and the “plug-and-play” NumaConnect interconnect, Numascale delivers true shared memory programming and simpler administration at standard HPC cluster price points. One such system currently offers users over 1,700 cores with a 4.6 TB single memory image.

The NumaConnect cluster excels at both OpenMP and MPI computing within the same shared memory environment. No extra software or program modifications are needed to take advantage of the entire system. Results for the NASA Advanced Supercomputing (NAS) Parallel Benchmarks have set a new record for OpenMP core count and problem size. OpenMP results show good scalability, with best results coming from larger problem sizes.

In addition, NumaConnect shared memory MPI performance delivers better results than InfiniBand clusters, using standard tools without modification for the underlying hardware environment. A cost comparison with a small FDR InfiniBand cluster shows a comparable price point when ease of programming, high performance, and ease of administration are considered.
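The shared-memory OpenMP style described above can be pictured with a small sketch. It is not taken from the whitepaper: Python threads stand in for OpenMP threads, and the function name `parallel_sum` and the chunking scheme are assumptions made for illustration. Python's interpreter lock means this shows the programming model only, not a speedup. The point is that every worker reads the same array in place, with no explicit data distribution step.

```python
# Illustrative sketch only: Python threads stand in for OpenMP threads.
# parallel_sum and its chunking are hypothetical, not Numascale code.
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data, workers=4):
    """Sum `data` with several threads that all read the same list in place."""
    n = len(data)
    if n == 0:
        return 0
    workers = max(1, workers)
    chunk = -(-n // workers)  # ceiling division: elements per worker
    ranges = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    def partial(bounds):
        # Each worker indexes the shared list directly; nothing is copied
        # or sent to it beforehand.
        start, stop = bounds
        total = 0
        for i in range(start, stop):
            total += data[i]
        return total

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Combine the per-worker partial sums (a reduction).
        return sum(pool.map(partial, ranges))


if __name__ == "__main__":
    values = list(range(1_000))
    print(parallel_sum(values, workers=4))
```

In a distributed-memory version of the same computation, the array would first have to be split and shipped to each node before any summing could start.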

The Numascale Solution: Affordable Big Data Computing

HPC Wire Newspaper by John Russell

Today’s challenge is that traditional cluster computing based on distributed memory – which was so successful in bringing down the cost of high performance computing (HPC) – struggles when forced to run applications where memory requirements exceed the capacity of a single node. Increased interconnect latencies, longer and more complicated software development, inefficient system utilization, and additional administrative overhead are all adverse factors.

Therefore, any application requiring a large memory footprint can benefit from a shared memory computing environment. Find out more about affordable big data computing here.

SMP Redux: You Can Have It All

Large scale plug-and-play SMP delivers mainframe performance at commodity prices

A Numascale whitepaper by Douglas Eadline

When High Performance Computing (HPC) is mentioned, one often envisions a large expensive mainframe. These machines are highly parallel and employ a Symmetric Multi-Processing (SMP) design that provides global memory and process spaces. Delivering this level of performance has always been expensive due to the custom engineering required to integrate a large number of processors into a shared memory environment. Indeed, the added expense of large-scale SMP systems has pushed the market to use a “cluster approach,” where a large number of commodity server machines connected together are used as a single resource. It is well known that clusters are harder to manage and less efficient than SMP mainframe systems.

This paper contrasts current SMP and cluster designs in the context of HPC. It also introduces a breakthrough technology from Numascale that allows commodity hardware to be combined into a cost-effective and scalable SMP system that delivers the best of both worlds.

A Qualitative Discussion of NumaConnect vs InfiniBand and other high-speed networks

The main focus of InfiniBand was to enable encapsulation of any packet format and provide high bandwidth connections between systems and their peripherals as well as between systems. InfiniBand is a “shared nothing” architecture. This means that all communication requires a driver level in software to control the communication and handle buffers for the RDMA (Remote Direct Memory Access) engines.

NumaConnect provides a scalable, low-latency, high-bandwidth interconnect with full support for cache coherence. The shared address space allows processors to access remote memory directly, without any driver-level software intervention and without the overhead of setting up RDMA transfers. A NumaConnect system addresses all memory and all memory-mapped I/O devices directly.
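The contrast between the two models can be sketched in a few lines. This is a toy model under stated assumptions, not a description of real InfiniBand or NumaConnect hardware: the `Node` class, `WORDS_PER_NODE`, and the simple division-based address decoding are all hypothetical. In the shared-nothing path, software on both sides handles a buffer. In the shared-address path, a single global address resolves straight to a node and an offset.

```python
# Toy model only. Node, WORDS_PER_NODE and the address layout are
# assumptions for illustration, not the actual NumaConnect address map.
from dataclasses import dataclass, field
from typing import List, Tuple

WORDS_PER_NODE = 4  # hypothetical amount of memory per node, in words


@dataclass
class Node:
    memory: List[int]                                      # node-local memory
    inbox: List[List[int]] = field(default_factory=list)   # received buffers


def make_nodes(count: int) -> List[Node]:
    """Build nodes whose word i on node n holds the value n * 100 + i."""
    return [
        Node(memory=[n * 100 + i for i in range(WORDS_PER_NODE)])
        for n in range(count)
    ]


def message_read(nodes: List[Node], requester: int, owner: int,
                 offset: int, count: int) -> List[int]:
    """Shared-nothing style: data moves through an explicit buffer."""
    # Owner side: software stages a copy of the requested words and sends it.
    buffer = list(nodes[owner].memory[offset:offset + count])
    nodes[requester].inbox.append(buffer)
    # Requester side: software picks the buffer up from its inbox.
    return nodes[requester].inbox.pop(0)


def decode(global_addr: int) -> Tuple[int, int]:
    """Split a global word address into (node id, local offset)."""
    return divmod(global_addr, WORDS_PER_NODE)


def shared_load(nodes: List[Node], global_addr: int) -> int:
    """Shared-address style: one load, no buffer handling by software."""
    node_id, offset = decode(global_addr)
    return nodes[node_id].memory[offset]


if __name__ == "__main__":
    nodes = make_nodes(3)
    print(message_read(nodes, requester=0, owner=1, offset=1, count=2))
    print(shared_load(nodes, 5))
```

Both paths return the same data. The difference the sketch highlights is where the work sits: the first needs cooperating software on two nodes, while the second is an ordinary memory access from the program's point of view.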