Architecture
Innovative and groundbreaking coherent shared memory technology
The architecture heritage of NumaConnect™ dates back to the development of IEEE Standard 1596, Scalable Coherent Interface (SCI). SCI was architected on three major pillars: scalability, a global shared address space, and cache/memory coherence.
These principles led to the definition of a packet format supporting a 64-bit global address space, with 16 bits to address 65,536 physical nodes, where each node can hold multiple processors. Each node can then have 256 TeraBytes of memory, adding up to a maximum system addressing capacity of 16 ExaBytes (2^64 bytes). In that respect, the architects had the foresight to envision systems in the exascale range.
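The numbers above follow directly from the address split. The short C sketch below is purely illustrative: the 16-bit node field and 48-bit per-node offset are taken from the text, while the exact field layout of the real SCI/NumaConnect packet format is not reproduced here.

    /* Illustrative only: 16-bit node ID + 48-bit per-node offset = 64-bit
     * global address space, as described in the text above. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const unsigned node_bits   = 16;                  /* 2^16 = 65,536 nodes */
        const unsigned offset_bits = 64 - node_bits;      /* 48 bits per node    */
        const uint64_t nodes       = 1ULL << node_bits;
        const uint64_t per_node    = 1ULL << offset_bits; /* 256 TB per node     */

        printf("nodes: %llu\n", (unsigned long long)nodes);
        printf("memory per node: %llu TB\n", (unsigned long long)(per_node >> 40));
        printf("total address space: 2^%u bytes (16 EB)\n", node_bits + offset_bits);
        return 0;
    }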
The big differentiator between NumaConnect and other high-speed interconnect technologies is its shared memory and cache coherency mechanisms. These features allow programs to access any memory location and any memory-mapped I/O device in a multiprocessor system with a high degree of efficiency.
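A minimal sketch of what this model means for software follows: threads read and write a shared array directly through ordinary loads and stores, with no message passing. The example uses generic POSIX threads and is not tied to any NumaConnect-specific API; on a ccNUMA system the same code simply spans nodes.

    /* Generic POSIX threads sketch of the shared-memory model: any thread can
     * access any location with plain loads and stores. Not NumaConnect-specific. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static double shared_data[N];           /* visible to every thread/processor */

    static void *writer(void *arg)
    {
        long id = (long)arg;
        shared_data[id] = id * 2.0;         /* plain store into shared memory    */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[N];
        for (long i = 0; i < N; i++)
            pthread_create(&t[i], NULL, writer, (void *)i);
        for (int i = 0; i < N; i++)
            pthread_join(t[i], NULL);

        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += shared_data[i];          /* plain loads, no message passing   */
        printf("sum = %g\n", sum);
        return 0;
    }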
NumaConnect provides scalable servers with a unified programming model that stays the same from the small multi-core machines used in laptops and desktops to the largest imaginable single-system-image machines, which may contain more than a thousand processor cores and many terabytes of memory. This architecture is classified as ccNUMA (cache-coherent Non-Uniform Memory Access), or just NUMA.
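On Linux, this shared address space can also be combined with explicit memory placement when locality matters. The sketch below uses the standard libnuma library (an assumed generic Linux facility, not a NumaConnect-specific API) to place a buffer on one NUMA node while still accessing it with ordinary loads and stores; build with -lnuma.

    /* libnuma sketch: memory is placed on one NUMA node but remains part of the
     * single shared address space, so any processor can still access it directly. */
    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not available on this system\n");
            return EXIT_FAILURE;
        }

        size_t size = 1 << 20;                       /* 1 MiB buffer         */
        int node = 0;                                /* place on NUMA node 0 */
        long *buf = numa_alloc_onnode(size, node);
        if (!buf)
            return EXIT_FAILURE;

        for (size_t i = 0; i < size / sizeof(long); i++)
            buf[i] = (long)i;                        /* ordinary stores      */

        printf("first/last: %ld %ld (nodes in system: %d)\n",
               buf[0], buf[size / sizeof(long) - 1], numa_max_node() + 1);

        numa_free(buf, size);
        return 0;
    }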
Shared memory machines have a number of advantages over clusters that lead many experts to regard the architecture as the holy grail of computing:
Any processor can access any data location through direct load and store operations – easier programming, less code to write and debug
Compilers can automatically exploit loop-level parallelism – higher efficiency with less human effort (see the sketch after this list)
System administration relates to a unified system as opposed to a large number of separate images in a cluster – less effort to maintain
Resources can be mapped and used by any processor in the system – optimal use of resources in a single image operating system environment
No need to decompose or duplicate data sets for scaling – significantly shorter development time for applications that need to both scale up and scale out, such as graph processing
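To illustrate the loop-level parallelism point above, here is a minimal OpenMP sketch under the assumption of a standard OpenMP toolchain: the compiler distributes the loop across all cores of the single shared-memory image, with no data decomposition or message passing.

    /* Generic OpenMP sketch of compiler-exploited loop-level parallelism on a
     * shared-memory machine. Not tied to any NumaConnect API. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double a[N], b[N];
        double dot = 0.0;

        #pragma omp parallel for reduction(+:dot)
        for (long i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 2.0 * i;
            dot += a[i] * b[i];             /* all threads share a and b directly */
        }

        printf("dot = %g (threads: %d)\n", dot, omp_get_max_threads());
        return 0;
    }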