Friday, December 20, 2013

Holiday Schedule

In line with the University's holiday schedule, the CAEN HPC group will be on holiday from December 25th through January 1st.   Nyx and Flux will be operational during this time and we will be monitoring these systems to ensure everything is operating appropriately.

Staff will be monitoring the ticket system but will, in general, only respond to critical or systems-related issues during the holiday break, addressing non-critical issues after the holiday. As a reminder, immediately following the holiday break is the 2013-14 Winter Outage, and there will be no access to the cluster or storage systems as of 6am on January 2nd.

As always, email any questions you may have to hpc-support@umich.edu and have a great holiday.

CAEN HPC Staff


Thursday, December 12, 2013

SPINEVOLUTION on Flux


SPINEVOLUTION is, according to its web site, a highly efficient computer program for the numerical simulation of NMR experiments and spin dynamics in general.

Its installation on Flux is somewhat more involved than that of most other software, but Flux is a general-purpose platform and SPINEVOLUTION can be installed and run on it.

The LSA Research Support and Advocacy group, and Mark Montague in particular, have documented the installation and use of SPINEVOLUTION on Flux at https://sites.google.com/a/umich.edu/flux-support/software/lsa/spinevolution

While SPINEVOLUTION is a narrowly focused software package, the details of its installation on Flux may be applicable to other narrowly focused software packages.

For more information, please send email to hpc-support@umich.edu.

Undergraduate Student job in web data visualization

The CAEN HPC group would like to improve the graphical reporting of much of the data available from the cluster.

In the past, we would run commands via scripts and parse the output and make graphs.

The most recent versions of the cluster management software present some (and increasingly more) of the information via a RESTful interface that returns JSON-formatted results.

In addition, JavaScript graphing libraries are improving in usefulness and usability. Among these are d3.js, JavaScript InfoVis Toolkit, Chart.js, Google Charts and others.
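As a rough sketch of the kind of pipeline we have in mind, the short Python script below pulls JSON from a usage endpoint and aggregates it into a file that a library such as d3.js or Chart.js could plot. The endpoint URL and the record fields ("account", "cores") are hypothetical placeholders, not the actual scheduler API.

    # Sketch: pull JSON usage records from a scheduler REST endpoint and
    # aggregate them into a file that d3.js or Chart.js could plot.
    # The URL and the record fields ("account", "cores") are placeholders.
    import json
    import urllib.request
    from collections import defaultdict

    USAGE_URL = "https://flux-stats.example.umich.edu/api/usage?days=30"  # placeholder

    def fetch_usage(url):
        """Return the list of usage records from the REST interface."""
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    def cores_by_account(records):
        """Sum the cores in use for each Flux project account."""
        totals = defaultdict(int)
        for record in records:
            totals[record["account"]] += record["cores"]
        return dict(totals)

    if __name__ == "__main__":
        usage = fetch_usage(USAGE_URL)
        with open("usage_by_account.json", "w") as fh:
            json.dump(cores_by_account(usage), fh, indent=2)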

Our current usage graphs (an example of which is below) do not differentiate different types of Flux products (regular nodes, larger memory nodes, GPU nodes, FOE nodes, etc.) and do not separate utilization by Flux project accounts or by Flux user accounts.


Figure 1: The current Flux usage graphs do not differentiate between Flux projects, do not offer different time scales, and are generally of limited use.

We would like
  • an overview page that improves on the current usage graphs
  • a way to see daily, weekly, monthly, and yearly detail
  • a way to see Flux products (as above) individually and stacked together
  • a place for this to live (locally? MiServer? AWS? Google Sites?)
  • a web site akin to http://flux-stats/?account-flux(|m|g) that provides per-Flux-project reports including:
    • allocated cores over time
    • running cores (by user) over time
    • current resources in use versus total:
      • running cores / allocated cores
      • GB of RAM in use / GB of RAM allocated
    • the current queue represented as:
      • running jobs: job owner, # cores in use, # GB RAM in use, times (start, running, total), job name, job ID
      • idle jobs: job owner, # cores req’d, # GB RAM req’d, time req’d, job name, job ID
    • some heuristic advice along the lines of:
      • if you had X more cores, then Y more jobs would start
      • if you had A more GB of RAM, then B more jobs would start
      • “you would save money by switching the allocations in project G from standard Flux to larger memory Flux”, etc.
Email us at coe-hpc-jobs@umich.edu if you are interested.

Tuesday, November 26, 2013

Amazon Visiting Ann Arbor to talk about AWS for Research

On Thursday, December 5th, 2013, Steve Elliott and KD Singh of Amazon will be at the Hilton Garden Inn Ann Arbor (1401 Briarwood Circle, Ann Arbor, MI 48108).

Steve and KD will be talking about both compute and storage services for researchers. This will be a technical discussion and presentation, including live demos of Gluster and StarCluster, among other technologies, and possibly including Elastic MapReduce or Redshift (AWS' data warehouse service).

To register, please email Steve Elliott at: elliotts@amazon.com.

Thursday, November 14, 2013

Flux for Research Administrators

About This Talk

This talk was given at the 2013 University of Michigan CyberInfrastructure Days conference.

Administrative and related support activities are needed for researchers to successfully plan for and use Flux in their projects.
This presentation describes Flux and how the use of Flux by researchers is planned for, acquired, and managed.
The information presented here is intended to help you to better support the proposal or other planning process and manage or track Flux use.

What is Flux in Terms of Hardware?

Flux is a rate-based service that provides a Linux-based High Performance Computing (HPC) system to the University of Michigan community.
It is a fast system. Its CPUs, internal network, and storage are all fast in their own right and are designed to be fast together.
It is a large system on campus. Flux consists of 12,260 cores.

Flux Services and Costs

Table 1: Monthly rates for Flux services.
  Service                Monthly rate
  standard Flux          $11.72/core
  larger memory Flux     $23.82/core
  Flux Operating Env.    $113.25/node
  GPU Flux               $107.10/GPU
The size and duration of a Flux allocation determine the cost of the allocation. The number of computers added to the Flux Operating Environment determines the cost of the installation.

Planning to Use Flux

Planning for the use of Flux is done by estimating usage needs and considering the limits or availability of funding.
Using Flux is more flexible than purchasing hardware: allocations can be adjusted up or down, or kept the same, over the duration of a project.
There are two approaches to planning for the use of Flux (a small sketch of both calculations follows the list):
  1. Determine the amount of Flux resources your research will need and create a budget to meet that demand.
  2. Determine how much Flux time and cores you can afford on a given budget.
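As a hedged, back-of-the-envelope illustration of both approaches, the sketch below uses the standard Flux rate from Table 1. The numbers are examples only; ITS billing is authoritative, and the larger memory and GPU rates differ.

    # Back-of-the-envelope sketch of the two planning approaches, using the
    # standard Flux rate from Table 1.  Illustration only; ITS billing is
    # authoritative, and the larger memory and GPU rates differ.

    STANDARD_RATE = 11.72  # dollars per core per month

    def cost_of_allocation(cores, months, rate=STANDARD_RATE):
        """Approach 1: the cost of a given number of cores for a given duration."""
        return cores * months * rate

    def cores_for_budget(budget, months, rate=STANDARD_RATE):
        """Approach 2: how many cores a fixed budget buys for a given duration."""
        return int(budget // (months * rate))

    print(cost_of_allocation(50, 12))   # 50 cores for a year: about $7,032
    print(cores_for_budget(10000, 12))  # a $10,000 budget for a year: 71 cores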

Understanding Flux Allocations, Accounts, and Projects is Important

A Flux project is a collection of Flux user accounts that are associated with one or more Flux allocations.
A Flux project can have as many allocations as you wish.

Instructions for Research and Other Administrators During the Planning Process

Administrators should confirm, as necessary, that the grant writer has done what he or she needs to do.
Grant writers need to make sure their computing needs are suitable for the use of Flux, estimate the Flux resources that are required for the project, describe Flux in the proposal, and prepare the information needed to complete the Flux PAF Supplement form.
The administrator sends the completed Flux PAF Supplement to coe-flux-paf-review@umich.edu, and attaches the returned and approved Flux PAF Supplement to the PAF packet.

The Flux PAF Supplement

The completion and the review of the Flux PAF Supplement are important steps in the Flux planning process.
Being able to fill out the Flux PAF Supplement is a good self-check that a thorough planning process has been completed.
The review of the Flux PAF Supplement allows the Flux operators to do some system planning. In some cases you may be asked for some clarification.

Using Flux

A Flux User Account and a Flux Allocation are needed to use Flux.
A Flux user account is a Linux login ID and password (the same as your U-M uniqname and UMICH.EDU password).
Flux user accounts and allocations may be requested using email. (See http://arc.research.umich.edu/flux/managing-a-flux-project/)

Monitoring and Tracking Flux Allocations

Historical usage data for Flux allocations is available in MReports (http://mreports.umich.edu/mreports/pages/Flux.aspx).
Instructions for accessing data in MReports are available online (http://arc.research.umich.edu/flux/managing-a-flux-project/check-my-flux-allocation/).
Billing is done monthly by ITS.
Flux allocations can be started and ended (on month boundaries). Multiple allocations may be created.

More Information is Available

Email hpc-support@umich.edu.
Look at CAEN's High Performance Computing website: http://caen.engin.umich.edu/hpc/overview.
Look at ARC's Flux website: http://arc.research.umich.edu/flux/

Wednesday, November 13, 2013

Flux: The State of the Cluster

Last Year

What is Flux in Terms of Hardware?

Flux is a rate-based service that provides a Linux-based High Performance Computing (HPC) system to the University of Michigan community.
It is a fast system. Its CPUs, internal network, and storage are all fast in their own right and are designed to be fast together.
It is a large system on campus. Flux consists of 12,260 cores.

Flux continues to grow in cores and allocations


Flux was moved to the Modular Data Center from the MACC

Moving Flux from the MACC to the MDC directly resulted in the rate decrease and an accompanying change in service level.
Before the move, Flux had generator-backed electrical power and could run for days during a utility power outage.
After the move, Flux has battery-backed electrical power and can run for 5 minutes during a utility power outage.

The rate for all of the Flux services was reduced on October 1, 2013

Table 1: The monthly rates for Flux services were reduced.
  Service                Old monthly rate    New monthly rate
  standard Flux          $18.00/core         $11.72/core
  larger memory Flux     $24.35/core         $23.82/core
  Flux Operating Env.    $267.00/node        $113.25/node
  GPU Flux               n/a                 $107.10/GPU

Flux has the newest GPUs from NVIDIA - the K20x

Flux has 40 K20x GPUs connected to 5 compute nodes.
Each GPU allocation comes with 2 compute cores and 8GB of CPU RAM.
Table 2: NVIDIA Tesla K20x specification
  Number and Type of GPU                        one Kepler GK110
  Peak double precision floating point perf.    1.31 Tflops
  Peak single precision floating point perf.    3.95 Tflops
  Memory bandwidth (ECC off)                    250 GB/sec
  Memory size (GDDR5)                           6 GB
  CUDA cores                                    2688

Flux has Intel Phis as a technology preview

Flux has 8 Intel 5110P Phi co-processors connected to one compute node.
As a technology preview, there is no cost to use the Phis.
Table 3: Intel Phi 5110P specification
  Number and type of processor    one 5110P
  Processor clock                 1.053 GHz
  Memory bandwidth (ECC off)      320 GB/sec
  Memory size (GDDR5)             8 GB
  Number of cores                 60

Flux has Hadoop as a technology preview

Flux has a Hadoop environment that offers 16TB of HDFS storage, soon expanding to more than 100TB.
The Hadoop environment is based on Apache Hadoop version 1.1.2.
Table 4: Flux's Hadoop environment includes common software.
  Hive                v0.9.0
  HBase               v0.94.7
  Sqoop               v1.4.3
  Pig                 v0.11.1
  R + rhdfs + rmr2    v3.0.1
The Hadoop environment is a technology preview and has no charge associated with it. For more information on using Hadoop on Flux email hpc-support@umich.edu.
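To give a feel for what a simple job in this environment might look like, below is a minimal word-count mapper and reducer for Hadoop Streaming, written in Python. This is a generic Hadoop Streaming sketch rather than Flux-specific documentation; the location of the streaming jar on Flux is not shown here, so email hpc-support@umich.edu for the exact invocation.

    # --- mapper.py: emit (word, 1) for every word read from stdin ---
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word, 1))

    # --- reducer.py: sum the counts for each word; input arrives sorted by key ---
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        if not line.strip():
            continue
        word, count = line.rsplit("\t", 1)
        if word != current_word:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, 0
        current_count += int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

These two scripts would typically be launched with the Hadoop Streaming jar, passing them as the -mapper and -reducer arguments along with HDFS input and output paths.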

Next Year

The initial hardware will be replaced

Flux has a three-year hardware replacement cycle; we are in the process of replacing the initial 2,000 cores.
The new cores are likely to be Intel's 10-core Xeon CPUs, resulting in 20 cores per node.
We are planning on keeping the 4GB of RAM per core ratio; the memory usage over the last three years has a profile that supports this direction.

Flux may offer an option without software

ARC is hoping to have a Flux product offering that does not include the availability, and thus cost, of most commercial software.
The "software-free" version of Flux will include
  • the Intel compilers
  • the Allinea debuggers and code profilers
  • MathWorks MATLAB®
  • other no- or low-cost software
This is very dependent on acquiring licenses for the compilers, debuggers, and MATLAB that are suitable for broad use.

A clearer software usage policy will be published

With changes in how software on Flux is presented will come guidance on appropriate use of the Flux software library.
In approximate terms, the Flux software library is
  • licensed for academic research and education by faculty, students, and staff of the University of Michigan.
  • not licensed for commercial work, work that yields proprietary or restricted results, or for people who are not at the University of Michigan.

Flux on Demand may be available

ARC continues to work on developing a Flux-on-Demand service.
We hope to have some variant of this available sometime in the Winter semester.

A High-Throughput Computing service will be piloted

ARC, CAEN, and ITS are working on a High-Throughput Computing service based on HTCondor (http://research.cs.wisc.edu/htcondor/).
This will allow for large quantities (1000s) of serial jobs to be run on either Windows or Linux.
ARC does not expect there to be any charge to the researchers for this.

Advanced Research Computing at the University of Michigan

Wednesday, November 6, 2013

Nyx/Flux Winter 2013-14 Outage

Nyx, Flux, and their storage systems (/home, /home2, /nobackup, and /scratch) will be unavailable starting at 6am Thursday January 2nd, returning to service on Saturday, January 4th.

During this time, CAEN will be making the following updates:
* The OS and system software will be upgraded. These should be minor updates provided by Red Hat.
* Scheduling software updates, including the resource manager (PBS/Torque), job scheduler (Moab), and associated software
* PBS-generated email related to job data will now come from hpc-support@umich.edu, rather than the current cac-support@umich.edu
* Transitioning some compute nodes to a more reliable machine room
* Software updates to the high speed storage systems (/nobackup and /scratch)
* The College of Engineering AFS cell (/afs/engin.umich.edu) is being retired. Jobs using the Modules system should have no issue, but any PBS scripts that directly reference /afs/engin.umich.edu/ will be affected; a small script for finding such references follows this list.
* Migrating /home from a retiring filesystem to Value Storage
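If you are not sure whether any of your job scripts still reference the retiring AFS cell, a small Python sketch along the following lines can find them. The script directory and the .pbs extension are assumptions; adjust them to match how you keep your job scripts.

    # Sketch: find job scripts that still reference the retiring CoE AFS cell.
    # SCRIPT_DIR and the .pbs extension are assumptions; adjust them to match
    # how you organize your job scripts.
    import os

    SCRIPT_DIR = os.path.expanduser("~/jobs")
    AFS_PATH = "/afs/engin.umich.edu"

    for dirpath, _dirnames, filenames in os.walk(SCRIPT_DIR):
        for name in filenames:
            if not name.endswith(".pbs"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    if AFS_PATH in line:
                        print("%s:%d: %s" % (path, lineno, line.rstrip()))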

We will post status updates on our Twitter feed (https://twitter.com/UMCoECAC), which can also be found on the CAEN HPC website at http://cac.engin.umich.edu .

Wednesday, October 2, 2013

Expanded XSEDE Support at Michigan

The Extreme Science and Engineering Discovery Environment (XSEDE) is a great source of free high-performance computing resources for the research community. Researchers who wish to use XSEDE can get support on campus from a number of people.

Brock Palen
I (Brock Palen) am the XSEDE Campus Champion for the University of Michigan. As the Champion I have direct communication with a number of the XSEDE resource providers.

I can support you on XSEDE resources in much the same way you are supported on any local resource. I can benchmark, test, and debug issues. I can also support you in the XSEDE proposal writing process for review and resource selection.

I work closely with many of the research computing support staff in the College of LSA, the School of Public Health, and the Medical School, and they can also help you with any questions or problems you have with any part of the XSEDE process. You can get a hold of any of them and me at hpc-support@umich.edu.

As always, if you run into difficulty with any part of the XSEDE application process or when using XSEDE resources, reach out to us locally for help at hpc-support@umich.edu.

Thursday, August 29, 2013

Engineering New Faculty Orientation: Research Computing

Today Ken Powell, Andy Caird, and Amadi Nwankpa spoke to the new College of Engineering faculty about research computing. The slides and handout are presented below.

Who's who on campus

IT Groups

CAEN

  • has operational responsibilities for College of Engineering IT services and facilities.
  • has five main groups: high-performance computing (HPC), web, applications, student computing environment, and instructional technology
  • http://caen.engin.umich.edu

ITS

Your departmental computing support

Research Computing Support

Michigan Institute for Computational Discovery and Engineering

Office of Advanced Research Computing (ARC)

  • coordinates campus-level research computing infrastructure and events
  • helps connect faculty in various colleges doing related work
  • http://arc.research.umich.edu

What's What—HPC resources

Flux

  • Recommended HPC resource for most CoE faculty
  • Homogeneous collection of hardware owned by ARC and operated by CAEN HPC with purchasing/business/HR support by ITS.
  • Run on an allocation model—faculty purchase by core-month allocation
  • About 10,000 InfiniBand-connected cores, all owned by ARC. Used by about 120 research groups and several courses.
  • http://caen.engin.umich.edu/hpc/overview

Flux Operating Environment

Where to turn for help

  • Purchasing a desktop or laptop for you or a student/post-doc in your group:
    • your departmental IT group
  • Finding out whether a certain software package has already been licensed to the department, college or university:
  • Licensing a piece of software:
    • your department IT person, or Amadi Nwankpa (amadi@umich.edu), the CAEN faculty liaison and software guru
    • this is increasingly important to get correct
  • Purchasing storage:
  • Assessing your HPC needs (cluster computing, storage):
  • Getting your doctoral students enrolled in the Scientific Computing PhD or scientific computing certificate program:
  • General questions about using HPC in your research:
    • Andrew Caird and Ken Powell

More Information

Friday, August 9, 2013

Scratch Filesystem Details

There have been a lot of questions about how /scratch is built. As of 8/2013, /scratch consists of 12 15,000RPM SAS drives for metadata in a RAID10, and 300 3,000GB 7,200RPM SATA drives for data.

The filesystem is based on Lustre (http://www.whamcloud.com/), a high-performance parallel filesystem. For details on how Lustre works, visit the NICS page on Lustre. If you want to implement any of the details NICS lists, please contact us first, as our system is different from theirs.

The 300 3,000GB drives are housed in an SFA10K-X, which has two active-active heads. Each head has two paths to each of the 5 disk arrays that hold 60 drives each. Each head can process data at a rate of 3-4GB/s. Each head is also a backup for the other, and each head has two paths to each Lustre server. This allows the loss of a head or a path without disrupting access to data.

At the start of the last unplanned outage (8/2013), a head was removed for service and a SAS card failed in the remaining head, causing 60 of the 300 drives to disappear; /scratch continued to operate at that point, though at significant risk of data loss.

The 300 drives are broken into groups of 10 in a double-parity (RAID6) configuration. Each of the 5 disk shelves holds two drives from each group. With RAID6, two drives can fail in each group of 10 without losing data. Because the drives are spread across the shelves, an entire shelf can be lost without losing data.

Each group provides 21.3TB of space in /scratch (eight 3,000GB data drives per group, less filesystem overhead). These groups are also known as OSTs, the building blocks of Lustre and of /scratch. As performance and space needs grow, OSTs can be added for capacity and performance.

Refer to the NICS page for details. By default, a file written to /scratch has an entry on the metadata server and its data are stored on one of the 30 OSTs. For very large files, or when using MPI-IO (parallel I/O), users should stripe files across OSTs, so that the data are distributed across all, or a subset, of the OSTs. Striping files allows users to sum the performance of the OSTs. This should only be done for large files; small files will actually be slower when striped. A small sketch of setting a stripe count follows.
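For those who do need striping, here is a minimal Python sketch that wraps the standard Lustre lfs utility to set and then inspect a directory's stripe count. The stripe count and the path are examples only, and, as noted above, please contact us before changing striping on /scratch.

    # Sketch: set a wide stripe on a directory before writing large files into
    # it, then show the resulting layout.  Uses the standard Lustre `lfs`
    # utility; the stripe count and path are examples only.
    import subprocess

    def set_stripe_count(directory, count):
        """New files created in `directory` will be striped across `count` OSTs."""
        subprocess.check_call(["lfs", "setstripe", "-c", str(count), directory])

    def show_stripe(path):
        """Print the current striping layout of a file or directory."""
        subprocess.check_call(["lfs", "getstripe", path])

    big_run_dir = "/scratch/example_flux/uniqname/big_run"  # example; must already exist
    set_stripe_count(big_run_dir, 8)  # stripe new files in this directory across 8 OSTs
    show_stripe(big_run_dir)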

In the event an OST is lost, only the data on that OST are gone. Files with a single stripe on the failed OST would be completely missing; striped files would have holes where their data resided on the failed OST.

The largest known Lustre filesystem is for the LLNL Sequoia system, at 55,000TB and with performance over 1,024GB/s. Here is a talk from the Lustre User Group on the Sequoia system.

Scratch in its old data center being installed. The 6 machines on top are the Lustre servers, followed by the metadata array, the two heads and the 5 disk shelves.

Back of scratch during installation

Scratch installed in the new Modular Data Center

Closeup of the back of the SFA10k-X Heads showing the 40 SAS connections to the disks. Each connection supports 3GB/s.

Scratch Back Online

Thursday night the checks of the /scratch disk volumes completed, letting the staff restore access to files on that system. No problems were found. Parity rebuilds continued and performance was significantly degraded. We resumed jobs on Flux around 11pm that night.

On Friday morning the first set of parity calculations (RAID5) finished on all of the /scratch volumes. The risk of data loss was significantly reduced at this point, as every volume could now survive a single disk failure. The staff then failed some of the volumes over to the other active head (which had been unavailable). This should let the second-level parity (RAID6) calculation proceed more quickly, as well as double the performance of /scratch for applications running on Flux.

All Flux allocations affected by the outage have been extended by 4 days.

Performance is still degraded compared to normal operation due to the impact of the remaining parity calculations. Data are now generally safe. The /scratch filesystem is for scratch data and is not backed up. For a listing of the scratch policies, visit our scratch page.

Wednesday, August 7, 2013

/scratch problems

While trying to swap out a head that was exhibiting problems we had a SAS card failure.

This failure caused 60 drives to disappear from the system. Because of the ungraceful way the drives were removed, this took away all of the redundancy in the RAID6 arrays.

We were able to get the drives back up on the old head (that had been removed) but because they had been missing from the system for 10 minutes, the arrays forced themselves into full rebuild.

Right now scratch has no parity --- none --- and we have 60 drives trying to rebuild on only one head. The other head is up but is not picking up the paths. We have been working with the vendor, DDN, on this.

Right now the head is rebuilding only 30 of the drives (getting us up to RAID5) and will then continue on to RAID6.

With only one head working we are CPU-bound, and the rebuild is going at 1%/hour. We are at risk of losing data until the end of the week, and it will take another week to get back to full RAID6.

Thursday, June 20, 2013

COMSOL no longer available on Nyx/Flux

Recently we have learned more about the license the University has with COMSOL for their software. This license precludes us from installing any COMSOL software where it can be accessed via the Internet. Because Nyx and Flux can be accessed via the Internet using the U-M VPN or one of the ITS or CAEN login hosts, we are not currently in compliance with the license.

To come into compliance with the COMSOL license, we will be removing COMSOL from Nyx and Flux on June 15, 2013. COMSOL will still be available in CAEN Computing Labs.

At this point, we are unable to provide access to COMSOL on Nyx, Flux, or machines that allow access from the Internet at large, even at one or more removes.

You might want to contact the COMSOL representative, Siva Hariharan <siva@comsol.com>, and ask about this. I can't guarantee it, but if a license holder has a license server of their own and asks us to install COMSOL for use with their own license, we may be able to work something out.

It does appear that COMSOL considers it OK to run the software from Amazon, however, if that is an option for you. There is information about that on these pages: https://aws.amazon.com/marketplace/pp/B00A41KQUY/ and http://www.comsol.com/ec2_manual/

We have no experience with that, and it appears that you need to provide your own license server, as, at the bottom of the AWS page, it says:

Refund Policy: This is a BYOL product. See http://www.comsol.com/sla

If you have access to CAEN lab machines, you may be able to use COMSOL on them; if so, the CAEN Hotline can direct you to the highest-powered ones.

If your group has its own network license and license server for COMSOL, please let us know.