Wednesday, November 13, 2013

Flux: The State of the Cluster

Last Year

What is Flux in Terms of Hardware?

Flux is a rate-based service that provides a Linux-based High Performance Computing (HPC) system to the University of Michigan community.
It is a fast system. Its CPUs, internal network, and storage are all fast in their own right and are designed to be fast together.
It is a large system on campus. Flux consists of 12,260 cores.

Flux continues to grow in cores and allocations


Flux was moved to the Modular Data Center from the MACC

Moving Flux to the MDC from the MACC led directly to a decrease in the rate and an accompanying change in service level.
Before the move, Flux had generator-backed electrical power and could run for days during a utility power outage.
After the move, Flux has battery-backed electrical power and can run for 5 minutes during a utility power outage.

The rate for all of the Flux services was reduced on October 1, 2013

Table 1: The monthly rates for Flux services were reduced.
Service               Old monthly rate   New monthly rate
standard Flux         $18.00/core        $11.72/core
larger memory Flux    $24.35/core        $23.82/core
Flux Operating Env.   $267.00/node       $113.25/node
GPU Flux              n/a                $107.10/GPU
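
To make the size of the reduction concrete, the following is a minimal sketch that computes the percentage decrease for each service in Table 1 and the monthly cost of an allocation. The rates are taken from the table; the function names and the 100-core allocation are hypothetical, chosen only for illustration.

```python
# Monthly rates from Table 1 in USD: (old rate, new rate).
# GPU Flux is omitted because it had no old rate to compare against.
RATES = {
    "standard Flux (per core)": (18.00, 11.72),
    "larger memory Flux (per core)": (24.35, 23.82),
    "Flux Operating Env. (per node)": (267.00, 113.25),
}


def percent_reduction(old_rate, new_rate):
    """Return the decrease as a percentage of the old rate."""
    return (old_rate - new_rate) / old_rate * 100.0


def monthly_cost(units, rate):
    """Return the monthly cost of `units` cores (or nodes) at `rate`."""
    return units * rate


if __name__ == "__main__":
    for service, (old, new) in RATES.items():
        print(f"{service}: {percent_reduction(old, new):.1f}% lower")

    # Hypothetical example: a 100-core standard Flux allocation.
    cores = 100
    old, new = RATES["standard Flux (per core)"]
    print(f"{cores} cores: ${monthly_cost(cores, old):.2f} -> "
          f"${monthly_cost(cores, new):.2f} per month")
```

With the rates above, the standard Flux rate drops by roughly 35% and the Flux Operating Environment rate by roughly 58%, while the larger memory rate changes by only about 2%.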

Flux has the newest GPUs from NVIDIA - the K20x

Flux has 40 K20x GPUs connected to 5 compute nodes.
Each GPU allocation comes with 2 compute cores and 8GB of CPU RAM.
Table 2: NVIDIA Tesla K20x specification
Number and type of GPU                       one Kepler GK110
Peak double precision floating point perf.   1.31 Tflops
Peak single precision floating point perf.   3.95 Tflops
Memory bandwidth (ECC off)                   250 GB/sec
Memory size (GDDR5)                          6 GB
CUDA cores                                   2688
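
The figures above can be combined into aggregate numbers for the whole GPU partition. The sketch below does that arithmetic using only values stated in this section; the totals are theoretical peaks derived from the specification, not measured performance, and the variable and function names are illustrative.

```python
# Values stated in the text and in Table 2.
GPUS = 40
GPU_NODES = 5
PEAK_DP_TFLOPS_PER_GPU = 1.31
PEAK_SP_TFLOPS_PER_GPU = 3.95
GPU_MEMORY_GB = 6
CORES_PER_GPU_ALLOCATION = 2
CPU_RAM_GB_PER_GPU_ALLOCATION = 8


def gpu_partition_summary():
    """Return aggregate figures for the GPU partition as a dict."""
    return {
        "gpus_per_node": GPUS // GPU_NODES,
        "peak_dp_tflops": GPUS * PEAK_DP_TFLOPS_PER_GPU,
        "peak_sp_tflops": GPUS * PEAK_SP_TFLOPS_PER_GPU,
        "gpu_memory_gb": GPUS * GPU_MEMORY_GB,
        # CPU resources bundled with GPU allocations if every GPU is allocated.
        "bundled_cpu_cores": GPUS * CORES_PER_GPU_ALLOCATION,
        "bundled_cpu_ram_gb": GPUS * CPU_RAM_GB_PER_GPU_ALLOCATION,
    }


if __name__ == "__main__":
    for name, value in gpu_partition_summary().items():
        print(f"{name}: {value:g}")
```

This works out to 8 GPUs per node and a theoretical double precision peak of about 52.4 Tflops across all 40 GPUs.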

Flux has Intel Phis as a technology preview

Flux has 8 Intel 5110P Phi co-processors connected to one compute node.
As a technology preview, there is no cost to use the Phis.
Table 3: Intel Phi 5110P specification
Number and type of processor   one 5110P
Processor clock                1.053 GHz
Memory bandwidth (ECC off)     320 GB/sec
Memory size (GDDR5)            8 GB
Number of cores                60

Flux has Hadoop as a technology preview

Flux has a Hadoop environment that offers 16TB of HDFS storage, soon expanding to more than 100TB.
The Hadoop environment is based on Apache Hadoop version 1.1.2.
Table 4: Flux's Hadoop environment includes common software.
Software           Version
Hive               v0.9.0
HBase              v0.94.7
Sqoop              v1.4.3
Pig                v0.11.1
R + rhdfs + rmr2   v3.0.1
The Hadoop environment is a technology preview and has no charge associated with it. For more information on using Hadoop on Flux, email hpc-support@umich.edu.

Next Year

The initial hardware will be replaced

Flux has a three-year hardware replacement cycle; we are in the process of replacing the initial 2,000 cores.
The new cores are likely to come from Intel's 10-core Xeon CPUs, resulting in 20 cores per node.
We are planning on keeping the 4GB RAM per core ratio. The memory usage profile over the last three years supports this direction.
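
As a rough sizing sketch, the per-node memory and node count implied by these numbers can be worked out as follows. It assumes the 2,000 initial cores are replaced one-for-one, which the text does not state, and all names are illustrative.

```python
import math

# Figures stated in the text.
CORES_TO_REPLACE = 2000
CORES_PER_NODE = 20
RAM_GB_PER_CORE = 4


def replacement_plan(cores, cores_per_node, ram_gb_per_core):
    """Return (nodes needed, RAM per node in GB, total RAM in GB).

    Assumes a one-for-one core replacement (an assumption, not a
    stated plan) and rounds the node count up to whole nodes.
    """
    nodes = math.ceil(cores / cores_per_node)
    ram_per_node = cores_per_node * ram_gb_per_core
    return nodes, ram_per_node, nodes * ram_per_node


if __name__ == "__main__":
    nodes, ram_per_node, total_ram = replacement_plan(
        CORES_TO_REPLACE, CORES_PER_NODE, RAM_GB_PER_CORE
    )
    print(f"{nodes} nodes, {ram_per_node} GB RAM per node, "
          f"{total_ram} GB RAM in total")
```

Under that assumption, keeping 4GB per core on 20-core nodes means 80GB of RAM per node, and about 100 nodes to provide 2,000 cores.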

Flux may offer an option without software

ARC is hoping to offer a version of Flux that excludes most commercial software and, with it, the cost of that software.
The "software-free" version of Flux will include
  • the Intel compilers
  • the Allinea debuggers and code profilers
  • MathWorks MATLAB®
  • other no- or low-cost software
This depends heavily on acquiring licenses for the compilers, debuggers, and MATLAB that are suitable for broad use.

A clearer software usage policy will be published

Changes in how software on Flux is presented will be accompanied by guidance on appropriate use of the Flux software library.
In approximate terms, the Flux software library is
  • licensed for academic research and education by faculty, students, and staff of the University of Michigan.
  • not licensed for commercial work, work that yields proprietary or restricted results, or for people who are not at the University of Michigan.

Flux on Demand may be available

ARC continues to work on developing a Flux-on-Demand service.
We hope to have some variant of this available sometime in the Winter semester.

A High-Throughput Computing service will be piloted

ARC, CAEN, and ITS are working on a High-Throughput Computing service based on HTCondor (http://research.cs.wisc.edu/htcondor/).
This will allow large numbers (1000s) of serial jobs to be run on either Windows or Linux.
ARC does not expect to charge researchers for this service.
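
For readers unfamiliar with HTCondor, the sketch below generates a submit description file that queues many independent serial jobs, each receiving its own index through the `$(Process)` macro. The submit commands shown (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `queue`) are standard HTCondor ones, but the script name, directory, and job count are hypothetical, and the configuration of the pilot service itself is not yet known.

```python
def build_submit_description(executable, n_jobs, log_dir="logs"):
    """Return the text of an HTCondor submit description file.

    Each of the n_jobs jobs runs `executable` with its own job index
    (0 .. n_jobs - 1) as the only argument. The executable name and
    log directory are illustrative placeholders.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = $(Process)",
        f"output = {log_dir}/job.$(Process).out",
        f"error = {log_dir}/job.$(Process).err",
        f"log = {log_dir}/jobs.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: 1000 serial runs of a script named run_case.sh.
    print(build_submit_description("run_case.sh", 1000), end="")
```

The resulting text would typically be saved to a file and passed to `condor_submit`; whether the pilot service requires additional settings, such as an operating system requirement for Windows or Linux, is an open question.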

Advanced Research Computing at the University of Michigan