Scale-Up Node Controller
Lowering the technology barrier for building high performance scale-up systems
Node Controller for Intel UPI
UNC3 ASIC for UPI 1.1 available
The market for scale-up server systems is growing rapidly, driven by demand for in-memory analytics and databases, IoT, and server consolidation/virtualization across a number of application verticals. Numascale helps grow this market by lowering the barrier to developing scale-up systems, providing better TCO and customer differentiation. Our UPI Node Controllers can effectively scale systems from 8 sockets to 32 sockets and 48TB of shared memory.
The NumaChip on-chip fabric interconnect can be configured in different topologies through its routing tables. The number of fabric channels is implemented according to customer requirements. The first implementation of NumaChip™ supported six 4-lane serial channels to accommodate 3-D torus topologies. New designs now target up to eight 4-lane channels or four 8-lane channels to support a reasonable number of direct links between system node main boards.
On-chip fabric also eliminates the need for external switches and long cables, and provides a better TCO.
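As an illustration of how six channels per node map onto a 3-D torus, the sketch below computes a node's six neighbours and a simple next-hop table. The coordinate scheme, the port names and the choice of dimension-order routing are assumptions made for this example; the actual NumaChip routing-table format is not described here.

```python
from itertools import product

def neighbours(node, dims):
    """Return {port: neighbour} for a node in a 3-D torus of size dims.

    Port names such as "x+" are hypothetical labels for the six channels.
    """
    result = {}
    for axis in range(3):
        for sign, step in (("+", 1), ("-", -1)):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]  # wrap-around link
            result["xyz"[axis] + sign] = tuple(coord)
    return result

def next_port(src, dst, dims):
    """Dimension-order routing: resolve x, then y, then z, going the shorter way round."""
    for axis in range(3):
        delta = (dst[axis] - src[axis]) % dims[axis]
        if delta == 0:
            continue
        sign = "+" if delta <= dims[axis] - delta else "-"
        return "xyz"[axis] + sign
    return None  # src is already the destination

def routing_table(src, dims):
    """Map every other node to the outgoing port used to reach it from src."""
    return {dst: next_port(src, dst, dims)
            for dst in product(*(range(d) for d in dims)) if dst != src}
```

Because every node only needs links to its direct neighbours, a table like this stays the same size per node as the torus grows, which is why no external switch is required.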
The following software products are available, supporting the Scale-up architecture, system and applications:
- Linux Kernel modifications for managing NumaChip hardware and supporting a large number of CPU cores in a single scale-up system
- Interrupt extensions to make Linux work with more than ≈180 CPUs
- Linux Kernel Performance Optimizations to improve OS scalability
- Hierarchical Kernel Locks to avoid irregular stalling
- System Clock Synchronization support in NumaChip hardware
- Bootloader (BIOS-level code)
- System Check and System Initialization to map all resources and check hardware for error at start-up
- System Manager (external) for flexible management of large systems:
  - Remote power control through IPMI
  - Remote Configuration for partitioning and OS management
  - System Monitoring for optimizing system utilization
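Remote power control over IPMI is commonly scripted with the `ipmitool` command-line utility. The sketch below builds and runs a `chassis power` command for one server. The helper names, host name and credentials are placeholders for this example, and it does not reflect how the Numascale System Manager itself is implemented.

```python
import subprocess

# Power actions accepted by "ipmitool chassis power".
VALID_ACTIONS = {"status", "on", "off", "cycle", "reset", "soft"}

def power_command(host, user, password, action):
    """Build the ipmitool argument list for a remote chassis power action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, "chassis", "power", action]

def power(host, user, password, action="status"):
    """Run the command against a server's BMC and return ipmitool's output."""
    result = subprocess.run(power_command(host, user, password, action),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `power("bmc01.example.com", "admin", "secret", "cycle")` would power-cycle the server behind that BMC, assuming `ipmitool` is installed and the BMC is reachable.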
Optimized Software Libraries and Tools
- NCBTL as plug-in using non-temporal store instructions to avoid cache pollution and reduce overheads
- NBLACS for optimized performance of ScaLAPACK for linear algebra
- NCAlloc for local allocation of stack and heap
- NumaPlace for memory affinity to improve NUMA behaviour
Application Profiling tools
- Analysing application behaviour for performance optimisations
- Tracing tools and disassembly functions for development purposes
Node Controller for AMD HyperTransport
Available as NumaConnect plug-in module for servers
Scalable, Cache Coherent, Shared Memory System Interconnect
AMD processor nodes with Coherent HyperTransport
Based on field proven design
Enables commodity cost level for high-end servers
NumaChip 1: ASIC implementation
NumaChip 2: Modular and flexible FPGA implementation
Converts between snoop-based (broadcast) and directory based coherency protocols
Write-back to Remote Cache
Non-coherent transactions (for optimized MPI)
Remote Cache size up to 16GBytes (remote data)
NumaChip RAS Features
ECC for single bit correction and double bit detection
Automatic scrubbing after single bit error detection
Automatic background scrubbing to minimize probability of soft error accumulation
Flexible micro-coded coherence processing engine
Watch-bus for internal activity observation in real-time
Built-in Performance Counters
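To illustrate what "single bit correction and double bit detection" means, the sketch below implements a small extended Hamming (8,4) code. This is an assumption chosen for brevity: the word width and the code actually used in NumaChip are not stated here, and ECC memory typically protects much wider words, for example 64 data bits with 8 check bits.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits; index 0 is overall parity)."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    # The parity bit at position p covers every position whose index has bit p set.
    for p in (1, 2, 4):
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code

def decode(code):
    """Return (nibble, status) where status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome |= p
    overall = sum(code) % 2  # 0 when the total parity is intact
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single-bit error at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped) and repair it.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status
```

Scrubbing builds on the same mechanism: a word that decodes as "corrected" is written back in its repaired form, so that a second soft error in the same word does not later turn it into an uncorrectable one.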
NumaConnect Adapter Card N313 and N343 HTX Socket
Scalable Cache Coherent Shared Memory on your Cluster Budget
- ccNUMA and NUMA low latency shared memory interconnect
- Virtualizes Everything, Including Memory and IO
- >10x price/performance benefit over proprietary solutions
- Seamless Scaling of Application Size and Performance – NO Porting Efforts
- Scalable, Cache Coherent, Shared Memory System Interconnect
- AMD processor nodes with Coherent HyperTransport
- Based on field proven design
- Enables commodity cost level for high-end servers
- Converts between snoop-based (broadcast) and directory based coherency protocols
- Write-back to Remote Cache
- Non-coherent transactions (for optimized MPI)
- Pipelined memory access (16 outstanding transactions + 16 non-coherent)
- Remote Cache size up to 4GBytes (remote data)
NumaConnect RAS Features
- ECC for single bit correction and double bit detection
- Automatic scrubbing after single bit error detection
- Automatic background scrubbing to minimize probability of soft error accumulation
- Flexible micro-coded coherence processing engine
- Watch-bus for internal activity observation in real-time
- Built-in Performance Counters
DENSISHIELD™ I/O SYSTEM: High-Speed Cable Assemblies and Connectors
NumaConnect™ uses the DensiShield™ technology from FCI or interchangeable systems from other vendors.
FCI’s DensiShield™ I/O system is designed to support the transmission of high-speed, serial differential signals, while taking customers' equipment packaging requirements into consideration. Combining high density, mechanical robustness, and ease of PCB assembly with excellent shielding and signal integrity performance means that system designers no longer have to compromise.
The DensiShield™ connector is built around a wafer system employing a differential pair construction where each pair is shielded from the adjacent pair and the adjacent wafer. The 8-pair connector is ideally suited for 4 bi-directional channels working at >2.5 Gb/s, similar to InfiniBand or XAUI links, but can also perform at much higher data rates.
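As a rough worked example of what four lanes per connector deliver, the sketch below computes the payload bandwidth per direction. It assumes 8b/10b line coding, as used by 4x SDR InfiniBand (2.5 Gb/s per lane) and XAUI (3.125 Gb/s per lane); the coding used on a given DensiShield link depends on the protocol run over it, and the function name is made up for this example.

```python
def payload_gbps(lanes, line_rate_gbps, encoding=(8, 10)):
    """Payload bandwidth per direction for a multi-lane serial link.

    encoding is (data bits, line bits); (8, 10) models 8b/10b coding.
    """
    data_bits, line_bits = encoding
    return lanes * line_rate_gbps * data_bits / line_bits

print(payload_gbps(4, 2.5))    # 4x SDR InfiniBand style link
print(payload_gbps(4, 3.125))  # XAUI style link
```

Each bi-directional channel needs one transmit pair and one receive pair, which is how four channels fill the connector's eight pairs.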
The DensiShield connector’s design for long engagement and low profile not only allows for more compact cable routing in systems, when compared to other high speed cabling systems, but it is also capable of being used in dense system designs whose board-to-board spacing is as low as 15mm.
The connector can also be configured to handle low-speed signals and power and is also available in selectively-loaded versions, thereby allowing system designers to benefit from the density and robustness offered by this connector system in I/O applications that do not require high-speed signaling.
Features and Benefits
- 8-pair connectors can be mounted side by side on 12.5mm pitch enabling multiple I/O ports along a card edge
- Robust strain relief with short cable exit enables close spacing between chassis panel and cabinet door or wall
- Low vertical profile allows use in systems having 15mm pitch card slot spacing
- Low crosstalk between differential pairs with controlled 100-ohm impedance to match 100-ohm shielded pair cable
- Crimp ferrule system reliably terminates EMC shield of cable to connector covers
- Robust EMC shield to chassis panel termination with shielding down to PCB level
- Signal ground is isolated from EMC ground
- SMT reflow-compatible PCB connector
- Dual-beam contact system provides redundancy and long term reliability
Numascale uses enterprise-class x86 servers from Supermicro.
Numascale Manager™ Appliance – NMA
One-Click Numascale Manager Deployment Solution
The Numascale Manager Appliance (NMA) provides a complete and easy solution for managing the Numascale system/appliance. It provides secure GUI and CLI access for essential monitoring and management of the system.
By using the NumaConnect peer-to-peer 3D torus topology, the system optimizes both cost and performance with out-of-the-box operating systems. The NMA applies the required configuration and boot environment to achieve maximum performance.
The NMA provides lights-out server monitoring through standard IPMI and can automatically set up IPMI on servers.
The Numascale Manager provides one-click OS deployment, and no software needs to be installed on the servers before using the Numascale Manager. It handles all cabling and configuration settings needed for installation and operation.
Package selections are optimized for large Shared Memory system use and are standard, secure, with optional lifetime security and errata updates.