18 Jun 2013 Leipzig – During the first BoF session on European exascale research at the ISC’13 event in Leipzig, Alex Ramirez from the Barcelona Supercomputing Center described the progress within the EC-funded Mont-Blanc project. The Mont-Blanc hardware is due to start being tested in July 2013. The hardware incorporates ARM multicore and an integrated OpenCL accelerator, Ethernet NIC, and high density packaging. Mont-Blanc is based on a hybrid MPI and OmpSs programming model. Full-scale applications are being ported to Mont-Blanc.
The challenge in developing a European exascale approach is the software, stated Alex Ramirez. The developers have to leverage commodity and embedded power-efficient technology. They need to deploy a prototype HPC system based on currently available energy-efficient embedded technology. Next, they have to port and optimise a small number of representative exascale applications capable of exploiting this new generation of HPC systems.
Alex Ramirez explained how commodity components are driving HPC. In the past, RISC processors replaced vector processors. Next, x86 processors replaced RISC. At present, vector processing survives in the form of widening SIMD extensions.
Killer mobile processors are the most recent evolution. Microprocessors killed the vector supercomputers. They were not faster, but they were significantly cheaper and greener. History may be about to repeat itself with mobile processors, according to Alex Ramirez.
The Samsung Exynos 5 Dual Superphone SoC, for example, is built on a 32nm HKMG process and provides a dual-core ARM Cortex-A15 at 1.7 GHz. The compute card is based on the Exynos 5 Dual, and each daughter card is a full HPC compute node the size of a phone.
In the BullX Carrier Blade, each blade is a cluster on its own, a cluster in a blade, so to speak, with 15 compute nodes and an integrated GbE switch.
The Mont-Blanc prototype architecture has 9 blades. It is limited by SoC timing and availability, but better mobile SoCs keep appearing on the market.
Alex Ramirez said it is an energy-efficient machine built out of commodity parts. Because its cores are slower, twice as many cores and 8 times as many cluster nodes will be needed for the same performance. The system is a hybrid of CPU and GPU with half the on-chip memory per core. It also has lower I/O bandwidth. So there is no free lunch, as the speaker put it. Communication and computation have to be overlapped to hide the lower bandwidth.
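Why overlapping helps can be illustrated with a very simple timing model. The following Python sketch is not taken from the talk: the function names, the workload figures, and the assumption that a transfer can run fully in the background are all hypothetical, and the link speed is simply the nominal rate of Gigabit Ethernet with protocol overhead ignored.

```python
# Illustrative model only; all names and workload numbers are assumptions.
GBE_BYTES_PER_S = 125e6  # 1 Gbit/s in bytes per second, protocol overhead ignored


def comm_time(message_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Time to move one message: fixed latency plus size divided by bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def step_time(compute_s, comm_s, overlap):
    """Duration of one iteration that both computes and exchanges data."""
    if overlap:
        # The transfer runs in the background while the cores compute,
        # so the slower of the two activities sets the step time.
        return max(compute_s, comm_s)
    # Without overlap the cores sit idle until the transfer has finished.
    return compute_s + comm_s


if __name__ == "__main__":
    # Hypothetical step: 0.25 s of computation and a 12.5 MB exchange.
    comm_s = comm_time(12.5e6, GBE_BYTES_PER_S)
    for overlap in (False, True):
        t = step_time(0.25, comm_s, overlap)
        print(f"overlap={overlap}: {t:.2f} s per step")
```

With these assumed numbers the transfer takes 0.1 s, so overlapping shortens each step from about 0.35 s to 0.25 s. The model also shows the limit of the technique: once a transfer takes longer than the computation it is hidden behind, the network again determines the step time.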
The programmer is exposed to a simple architecture and can focus on the algorithm. The partners have to exploit knowledge about the future and automatically handle all of the architecture challenges, including strong scalability, multiple address spaces, and low cache size.
The advances made so far consist of adding Fortran support as well as GPU support.
At present, applications are being ported to Mont-Blanc. So far there are 11 applications, concluded Alex Ramirez.