Tuesday, November 26, 2013

Amazon Visiting Ann Arbor to talk about AWS for Research

On Thursday, December 5th, 2013, Steve Elliott and KD Singh of Amazon will be at the Hilton Garden Inn Ann Arbor (1401 Briarwood Circle, Ann Arbor, MI 48108).

Steve and KD will be talking about both compute and storage services for researchers. This will be a technical discussion and presentation, including live demos of Gluster and StarCluster, among other technologies, and possibly Elastic MapReduce or Redshift (AWS' data warehouse service).

To register, please email Steve Elliott at: elliotts@amazon.com.

Thursday, November 14, 2013

Flux for Research Administrators

About This Talk

This talk was given at the 2013 University of Michigan CyberInfrastructure Days conference.

Administrative and related support activities are needed for researchers to successfully plan for and use Flux in their projects.
This presentation describes Flux and how the use of Flux by researchers is planned for, acquired, and managed.
The information presented here is intended to help you to better support the proposal or other planning process and manage or track Flux use.

What is Flux in Terms of Hardware?

Flux is a rate-based service that provides a Linux-based High Performance Computing (HPC) system to the University of Michigan community.
It is a fast system. Its CPUs, internal network, and storage are all fast in their own right and are designed to be fast together.
It is a large system on campus. Flux consists of 12,260 cores.

Flux Services and Costs

Table 1: Monthly rates for Flux services.
Service                Monthly rate
standard Flux          $11.72/core
larger memory Flux     $23.82/core
Flux Operating Env.    $113.25/node
GPU Flux               $107.10/GPU
The size and duration of a Flux allocation determine the cost of the allocation. The number of computers added to the Flux Operating Environment determines the cost of the installation.
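As a rough illustration of how size and duration combine into a cost, the sketch below multiplies a monthly rate from Table 1 by the number of units and the number of months. The function name and the example allocation are hypothetical, and the sketch assumes whole-month billing at a constant rate.

```python
from decimal import Decimal

# Monthly rates from Table 1, in US dollars per unit per month.
MONTHLY_RATES = {
    "standard": Decimal("11.72"),       # per core
    "larger_memory": Decimal("23.82"),  # per core
    "foe": Decimal("113.25"),           # per node (Flux Operating Environment)
    "gpu": Decimal("107.10"),           # per GPU
}


def allocation_cost(service, units, months):
    """Hypothetical helper: estimated cost = rate x units x months."""
    if units < 0 or months < 0:
        raise ValueError("units and months must be non-negative")
    return MONTHLY_RATES[service] * units * months


# Example (made-up allocation): 100 standard cores for 12 months.
print(allocation_cost("standard", 100, 12))
```

Using `Decimal` rather than floating point keeps the dollar amounts exact to the cent.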

Planning to Use Flux

Planning for using Flux is done by estimating usage needs and considering the limits or availability of funding.
Using Flux is more flexible than purchasing hardware. Allocations can be adjusted up or down or kept the same over the duration of a project.
There are two approaches to planning for the use of Flux:
  1. Determine the amount of Flux resources your research will need and create a budget to meet that demand.
  2. Determine how much Flux time and cores you can afford on a given budget.
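The two approaches above can be sketched as a pair of small calculations. This is only an illustration: the function names are hypothetical, the default rate is the standard Flux rate of $11.72 per core per month, and it assumes the same number of cores is held for every month of the project.

```python
# Standard Flux rate, in cents per core per month ($11.72).
STANDARD_RATE_CENTS = 1172


def budget_for_demand(cores, months, rate_cents=STANDARD_RATE_CENTS):
    """Approach 1: dollars needed to run `cores` cores for `months` months."""
    return cores * months * rate_cents / 100


def cores_for_budget(budget_dollars, months, rate_cents=STANDARD_RATE_CENTS):
    """Approach 2: largest whole number of cores a budget covers for `months` months."""
    budget_cents = round(budget_dollars * 100)
    return budget_cents // (months * rate_cents)


# Made-up examples: a 64-core, 24-month project, and a $10,000 budget for 12 months.
print(budget_for_demand(64, 24))
print(cores_for_budget(10000, 12))
```

Because allocations can be adjusted over the life of a project, a real plan might sum several such estimates, one per phase.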

Understanding Flux Allocations, Accounts, and Projects is Important

A Flux project is a collection of Flux user accounts that are associated with one or more Flux allocations.
A Flux project can have as many allocations as you wish.

Instructions for Research and Other Administrators During the Planning Process

Administrators should confirm, as necessary, that the grant writer has done what he or she needs to do.
Grant writers need to make sure their computing needs are suitable for the use of Flux, estimate the Flux resources that are required for the project, describe Flux in the proposal, and prepare the information needed to complete the Flux PAF Supplement form.
The administrator sends the completed Flux PAF Supplement to coe-flux-paf-review@umich.edu, and attaches the returned and approved Flux PAF Supplement to the PAF packet.

The Flux PAF Supplement

The completion and the review of the Flux PAF Supplement are important steps in the Flux planning process.
Being able to fill out the Flux PAF Supplement is a good self-check for having completed a good planning process.
The review of the Flux PAF Supplement allows the Flux operators to do some system planning. In some cases you may be asked for some clarification.

Using Flux

A Flux User Account and a Flux Allocation are needed to use Flux.
A Flux user account is a Linux login ID and password (the same as your U-M uniqname and UMICH.EDU password).
Flux user accounts and allocations may be requested using email. (See http://arc.research.umich.edu/flux/managing-a-flux-project/)

Monitoring and Tracking Flux Allocations

Historical usage data for Flux allocations is available in MReports (http://mreports.umich.edu/mreports/pages/Flux.aspx).
Instructions for accessing data in MReports are available online (http://arc.research.umich.edu/flux/managing-a-flux-project/check-my-flux-allocation/).
Billing is done monthly by ITS.
Flux allocations can be started and ended (on month boundaries). Multiple allocations may be created.

More Information is Available

Email hpc-support@umich.edu.
Look at CAEN's High Performance Computing website: http://caen.engin.umich.edu/hpc/overview.
Look at ARC's Flux website: http://arc.research.umich.edu/flux/

Wednesday, November 13, 2013

Flux: The State of the Cluster

Last Year

What is Flux in Terms of Hardware?

Flux is a rate-based service that provides a Linux-based High Performance Computing (HPC) system to the University of Michigan community.
It is a fast system. Its CPUs, internal network, and storage are all fast in their own right and are designed to be fast together.
It is a large system on campus. Flux consists of 12,260 cores.

Flux continues to grow in cores and allocations

Flux was moved to the Modular Data Center from the MACC

Moving Flux to the MDC from the MACC resulted directly in the decrease in the rate and an accompanying change in service level.
Before the move Flux had generator-backed electrical power and could run for days during a utility power outage.
After the move Flux has battery-backed electrical power and can run for 5 minutes during a utility power outage.

The rate for all of the Flux services was reduced on October 1, 2013

Table 1: The monthly rates for Flux services were reduced.
Service                Old monthly rate    New monthly rate
standard Flux          $18.00/core         $11.72/core
larger memory Flux     $24.35/core         $23.82/core
Flux Operating Env.    $267.00/node        $113.25/node
GPU Flux               n/a                 $107.10/GPU
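To put the reductions in relative terms, the following sketch computes the percentage decrease for each service that had an old rate. The function name is hypothetical; the figures are the old and new monthly rates listed above.

```python
def percent_reduction(old_rate, new_rate):
    """Percentage decrease from old_rate to new_rate, rounded to one decimal."""
    return round((old_rate - new_rate) / old_rate * 100, 1)


# (old, new) monthly rates in dollars; GPU Flux is omitted because it had no old rate.
rates = {
    "standard Flux": (18.00, 11.72),
    "larger memory Flux": (24.35, 23.82),
    "Flux Operating Env.": (267.00, 113.25),
}

for service, (old, new) in rates.items():
    print(f"{service}: {percent_reduction(old, new)}% lower")
```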

Flux has the newest GPUs from NVIDIA - the K20x

Flux has 40 K20x GPUs connected to 5 compute nodes.
Each GPU allocation comes with 2 compute cores and 8GB of CPU RAM.
Table 2: NVIDIA Tesla K20x specification
Number and type of GPU                        one Kepler GK110
Peak double precision floating point perf.    1.31 Tflops
Peak single precision floating point perf.    3.95 Tflops
Memory bandwidth (ECC off)                    250 GB/sec
Memory size (GDDR5)                           6 GB
CUDA cores                                    2688

Flux has Intel Phis as a technology preview

Flux has 8 Intel 5110P Phi co-processors connected to one compute node.
As a technology preview, there is no cost to use the Phis.
Table 3: Intel Phi 5110P specification
Number and type of processor    one 5110P
Processor clock                 1.053 GHz
Memory bandwidth (ECC off)      320 GB/sec
Memory size (GDDR5)             8 GB
Number of cores                 60

Flux has Hadoop as a technology preview

Flux has a Hadoop environment that offers 16TB of HDFS storage, soon expanding to more than 100TB.
The Hadoop environment is based on Apache Hadoop version 1.1.2.
Table 4: Flux's Hadoop environment includes common software.
Software            Version
Hive                v0.9.0
HBase               v0.94.7
Sqoop               v1.4.3
Pig                 v0.11.1
R + rhdfs + rmr2    v3.0.1
The Hadoop environment is a technology preview and has no charge associated with it. For more information on using Hadoop on Flux, email hpc-support@umich.edu.

Next Year

The initial hardware will be replaced

Flux has a three-year hardware replacement cycle; we are in the process of replacing the initial 2,000 cores.
The new cores are likely to be in Intel's 10-core Xeon CPUs, resulting in 20 cores per node.
We are planning on keeping the 4GB RAM per core ratio. The memory usage over the last three years has a profile that supports this direction.
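As a back-of-the-envelope check on what these figures imply per node, the sketch below multiplies cores per node by RAM per core. The node count is an assumption: it supposes the initial 2,000 cores are replaced one-for-one, which the plan above does not state. The function names are hypothetical.

```python
import math

CORES_PER_NODE = 20      # from the text: 20 cores per node
RAM_PER_CORE_GB = 4      # from the text: 4GB RAM per core
CORES_TO_REPLACE = 2000  # from the text: the initial 2,000 cores


def ram_per_node_gb(cores_per_node, ram_per_core_gb):
    """Memory per node implied by the per-core ratio."""
    return cores_per_node * ram_per_core_gb


def nodes_needed(total_cores, cores_per_node):
    """Nodes required for total_cores, ASSUMING a one-for-one core replacement."""
    return math.ceil(total_cores / cores_per_node)


print(ram_per_node_gb(CORES_PER_NODE, RAM_PER_CORE_GB), "GB per node")
print(nodes_needed(CORES_TO_REPLACE, CORES_PER_NODE), "nodes (one-for-one assumption)")
```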

Flux may offer an option without software

ARC is hoping to have a Flux product offering that does not include the availability, and thus cost, of most commercial software.
The "software-free" version of Flux will include
  • the Intel compilers
  • the Allinea debuggers and code profilers
  • MathWorks MATLAB®
  • other no- or low-cost software
This is very dependent on acquiring licenses for the compilers, debuggers, and MATLAB that are suitable for broad use.

A clearer software usage policy will be published

With changes in how software on Flux is presented will come guidance on appropriate use of the Flux software library.
In approximate terms, the Flux software library is
  • licensed for academic research and education by faculty, students, and staff of the University of Michigan.
  • not licensed for commercial work, work that yields proprietary or restricted results, or for people who are not at the University of Michigan.

Flux on Demand may be available

ARC continues to work on developing a Flux-on-Demand service.
We hope to have some variant of this available sometime in the Winter semester.

A High-Throughput Computing service will be piloted

ARC, CAEN, and ITS are working on a High-Throughput Computing service based on HTCondor (http://research.cs.wisc.edu/htcondor/).
This will allow for large quantities (1000s) of serial jobs to be run on either Windows or Linux.
ARC does not expect there to be any charge to the researchers for this.

Advanced Research Computing at the University of Michigan

Wednesday, November 6, 2013

Nyx/Flux Winter 2013-14 Outage

Nyx, Flux, and their storage systems (/home, /home2, /nobackup, and /scratch) will be unavailable starting at 6am on Thursday, January 2nd, returning to service on Saturday, January 4th.

During this time, CAEN will be making the following updates:
* The OS and system software will be upgraded. These should be minor updates provided by Red Hat.
* Scheduling software updates, including the resource manager (PBS/Torque), job scheduler (Moab), and associated software
* PBS-generated emails related to job data will now come from hpc-support@umich.edu, rather than the current cac-support@umich.edu
* Transitioning some compute nodes to a more reliable machine room
* Software updates to the high speed storage systems (/nobackup and /scratch)
* Retiring the College of Engineering AFS cell (/afs/engin.umich.edu). Jobs using the Modules system should have no issue, but any PBS scripts which directly reference /afs/engin.umich.edu/ will be impacted.
* Migrating /home from a retiring filesystem to Value Storage
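
For the AFS cell retirement listed above, one way to find affected PBS scripts before the outage is to search them for the retired path. The sketch below is a minimal illustration, not a CAEN-provided tool: the function names are hypothetical, and the `*.pbs` filename pattern is an assumption that should be changed to match however your scripts are named.

```python
from pathlib import Path

RETIRED_PREFIX = "/afs/engin.umich.edu"


def afs_lines(text):
    """Return (line_number, stripped_line) pairs that mention the retired AFS cell."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if RETIRED_PREFIX in line
    ]


def scan_scripts(root, pattern="*.pbs"):
    """Search files under `root` matching `pattern` (an assumed naming convention)."""
    hits = {}
    for script in sorted(Path(root).rglob(pattern)):
        try:
            found = afs_lines(script.read_text(errors="replace"))
        except OSError:
            continue  # skip unreadable files
        if found:
            hits[str(script)] = found
    return hits


if __name__ == "__main__":
    # Scan the current directory and report each matching line.
    for path, found in scan_scripts(".").items():
        for lineno, line in found:
            print(f"{path}:{lineno}: {line}")
```

Any line reported this way would need to be updated to a path that does not depend on the retired cell.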

We will post status updates on our Twitter feed (https://twitter.com/UMCoECAC), which can also be found on the CAEN HPC website at http://cac.engin.umich.edu .