Tuesday, November 25, 2014

Optical Networking and Flux

A student recently asked us what kinds of optical networking systems are used in Flux. Thinking this information might be of interest to the community, we decided to post the answer here as well.


Flux uses two main networking technologies: InfiniBand within the cluster and Ethernet to the rest of campus and the Internet.

The type of InfiniBand (IB) network we use is called QDR 4x, the QDR standing for quad data rate. In QDR IB, each data lane has a raw transmission rate of 10 gbit/s and there are four lanes per connection, so each link has a raw data rate of 40 gbit/s. QDR IB can run over copper or fiber optic cables, and the vast majority of the IB cables used in Flux are fiber optic. Our fiber IB cables come pre-terminated with QSFP connectors, so it is not entirely obvious what kind of lasers are used. That said, my understanding is that there are actually eight fiber strands in a QDR IB cable: four 10 gbit/s strands for each direction of data transfer.

On the Ethernet side, we use multiple 10 gbit/s 10GBASE-SR links between the Flux access switches and their serving distribution switches. There are two distribution switches serving Flux; each has a 100 gbit/s Ethernet link to the campus backbone and a 100 gbit/s Ethernet link to the other distribution switch.

The 100 gbit/s link between the distribution switches is of the 100GBASE-SR10 type, and the links between the distribution switches and the campus backbone are 100GBASE-LR4/ER4.

The ARC Data Science Platform (a.k.a. Fladoop or Flux Hadoop) uses nine 40 gbit/s Ethernet connections within the datacenter; each of these is a 40GBASE-SR link.

Friday, November 7, 2014

Citing XSEDE Resources

If you are using XSEDE resources in any way please cite this paper:
John Towns, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor Hazlewood, Scott Lathrop, Dave Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, Nancy Wilkins-Diehr, "XSEDE: Accelerating Scientific Discovery", Computing in Science & Engineering, vol. 16, no. 5, pp. 62-74, Sept.-Oct. 2014, doi:10.1109/MCSE.2014.80
This helps show NSF that XSEDE is a valuable resource that should continue to be funded.

Saturday, November 1, 2014

Optimizing Star Cluster on AWS

In a previous post we pointed to instructions and some common tasks for using Star Cluster to create an HPC cluster on the fly in Amazon Web Services.  Now we will focus on some options for optimizing your use of Star Cluster.

Cutting the Fat - Your Head Node is Too Big

By default Star creates a head node and N compute nodes. You select the instance type with NODE_INSTANCE_TYPE, but this same type is used for both the head/master node and the compute nodes. In most cases your head node does not need to be that big. Un-comment MASTER_INSTANCE_TYPE and set it to a more modest instance size to control costs. A c3.large head node at $0.105/hr is much better than paying $1.68/hr for a c3.8xlarge, though you probably still want the c3.8xlarge for the compute nodes because it has a lower cost per unit of CPU performance.
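In the cluster template of your StarCluster config, that looks roughly like this (the instance types here are just example values):
NODE_INSTANCE_TYPE = c3.8xlarge
MASTER_INSTANCE_TYPE = c3.large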

Use Spot For Lower Costs

Spot instances are where you bid a price you are willing to pay for AWS's extra capacity. Star can use spot, but only for compute nodes. That is fine: the compute nodes are your main cost, and you don't want your head node being killed if the spot price rises above your bid.

Almost every Star command that starts an instance can be submitted as a spot bid using the -b <bid price> option.
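For example, starting a cluster with the compute nodes requested as spot instances looks something like this (the bid price and cluster name are placeholders):
$ starcluster start -b 0.50 mycluster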

Switch Regions and/or Availability Zone

AWS is made up of regions, which in turn are made up of availability zones. Each region has its own pricing; my favorites are Virginia (us-east-1) and Oregon (us-west-2) for the lowest prices. By default Star uses us-east-1, but I mostly switch to us-west-2. Why? Lower spot prices. The spot price history graphs from the AWS console for c3.8xlarge, the fastest single node on AWS, show the difference between the two regions.
[Graphs: c3.8xlarge 24-hour spot price history, us-east-1 vs. us-west-2]
On average, the spot price for compute power in us-west-2 is much lower than in us-east-1. Be sure to really think about how spot works: you can bid high, and it is possible to pay more than the on-demand rate for a few hours. But a high bid keeps your nodes from being killed, and your total spend, the area under the price curve, should still be much lower than it would have been at on-demand rates.

Changing regions in Star Cluster:

Update the region name, the region host (the endpoint that accepts AWS API commands from Star), and the availability zone.
# Get valid region and zone names from:
$ starcluster listregions
$ starcluster listzones

# Then set them in your StarCluster config:
AWS_REGION_NAME = us-west-2
AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
AVAILABILITY_ZONE = us-west-2c
Key pairs are also per region, so create a new one for each region you use: repeat the createkey step from the Star Cluster Quick-Start with a new name for the key, and update the [key key-name-here] section in your cluster config.
$ starcluster createkey mykey-us-west-2 -o ~/.ssh/mykey-us-west-2.rsa

[key mykey-us-west-2]
KEY_LOCATION=~/.ssh/mykey-us-west-2.rsa
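Your cluster template also has to reference the new key; assuming the example template from the Quick-Start named smallcluster, that looks roughly like:
[cluster smallcluster]
KEYNAME = mykey-us-west-2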
The AMI IDs are also per region, so when you switch regions you need to update the ID of the image to boot; in general, select an HVM image.
#get list of AMI images for region
$ starcluster listpublic 
NODE_IMAGE_ID=ami-80bedfb0

Get a Reserved Instance for your Head/Master Node

If you know you are going to have your Star cluster up and running for a decent amount of time, look into Reserved Instances. Discounts of close to 50% compared to on-demand pricing can be had. There are also light, medium, and heavy utilization reserved types that match how often you expect your Star head/master node to be running. Discounts vary by instance type and term, so refer to the AWS pricing pages to figure out whether this makes sense for you. You can even buy the remaining time on a reservation from another AWS user, or sell your unused reserved contract, on the Reserved Instance Marketplace. Be careful: reserved contracts are tied to a region and availability zone, so if you plan to move between these to chase lower spot prices your contract won't follow you.

Switch to Instance Store

By default Star uses EBS volumes as the disk for each node. This is very simple and lets data stored on the worker nodes persist even when they are shut down, but EBS has an extra cost. The cost of a few GB of EBS is small compared to the compute costs, but if you plan to run a large cluster it can add up to real money. Consider the instance storage supported by Star: with instance store, each compute node boots by copying the Star AMI image to the node's local disk.

Most users' clusters will not be large enough or run long enough for this to matter; contact hpc-support@umich.edu if you want to switch. Just remember to terminate your cluster, not just stop it. If you only stop it, the EBS volumes remain and you keep paying for them.
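As a reminder of the difference (mycluster is a placeholder name):
# stops the instances but keeps their EBS volumes, which still cost money
$ starcluster stop mycluster
# destroys the instances and their EBS root volumes
$ starcluster terminate mycluster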

Building Cloud HPC Clusters with Star Cluster and AWS

Researchers who have traditionally used HPC clusters have been asking how they can make use of Amazon Web Services (AWS), a.k.a. the cloud, to run their workloads. While AWS gives you great hardware infrastructure, it is really just renting you machines by the hour.

I should stress that using AWS is not magic. If you are new to cloud computing, there is a lot you need to know to avoid extra costs or the risk of losing data. Before you start, contact ARC at hpc-support@umich.edu.

Admins of HPC clusters know that it takes a lot more than metal to make a useful HPC service, which is what researchers really want. Researchers don't want to spend time installing and configuring queueing systems, exporting shared storage, and building AMI images.

Lucky for the community, the nice folks at MIT created Star Cluster. Star Cluster is really a set of prebuilt AMIs plus Python code that uses the AWS API to create HPC clusters on the fly. The AMIs also include many common packages such as MPI libraries, compilers, and Python packages.

There is a great Quick-Start guide from the Star team. Anyone can follow it, but HPC users at the University of Michigan can use the ARC cluster Flux, which has Star Cluster installed as an application. Users only need a user account on the login node to create clusters on AWS from there.
module load starcluster/0.95.5
Following the rest of the Quick-Start guide will get your first cluster up and running.
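Once your config file is set up per the guide, bringing up, logging into, and tearing down a cluster looks roughly like this (smallcluster is the example template name from the Quick-Start):
$ starcluster start smallcluster
$ starcluster sshmaster smallcluster
$ starcluster terminate smallcluster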

Common Star Cluster Tasks

Switch Instance Type

AWS offers a number of instance types, each with its own features and costs. You switch your instance type with the NODE_INSTANCE_TYPE setting.
NODE_INSTANCE_TYPE=m3.medium

Make a shared Disk on EBS

EBS is the persistent storage service on AWS. In Star you can create an EBS volume, attach it to your master node, and share it across your entire cluster. Be careful that you don't leave the volumecreator cluster running: use the -s flag to createvolume so the volume host shuts down when it finishes, and also check your running clusters with listclusters.
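A sketch of the workflow, with the volume name, size, zone, volume ID, mount path, and cluster template name all placeholders:
# create a 50 GB volume; -s shuts down the volumecreator host when done
$ starcluster createvolume --name=mydata -s 50 us-east-1c
# then register the volume in the config and attach it to a cluster template
[volume mydata]
VOLUME_ID = vol-xxxxxxxx
MOUNT_PATH = /data

[cluster smallcluster]
VOLUMES = mydata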

Add/Remove Nodes to Cluster

This is very important when using cloud services: don't leave machines running that you don't need. Compute nodes should be nothing special and should be clones of each other. Using the addnode and removenode commands, clusters can be resized to current needs. In general, if you are not computing, you should remove all your compute nodes, leaving only the master/head node with the shared disk so you can still get at your data. You can still queue jobs in this state and then add nodes when you are ready to run.
$ starcluster addnode -n <number of nodes> <clustername>
$ starcluster removenode -n 3 smallcluster

Further Reading

Our next post will have tricks for optimizing your Star Cluster.