
Thursday, September 17, 2015

Globus endpoint sharing available to UM researchers

We have described in a number of blog posts some features and benefits of using the Globus File Transfer service.  Now that UM is a Globus Provider, you have a new feature available to you: sharing directories and files with collaborators who are also Globus users.

There are two avenues of sharing available to you now: via standard server endpoints that have sharing enabled, and via "Globus Connect Personal" client endpoints. Today I will describe sharing for standard server endpoints only. Sharing for Globus Connect Personal endpoints is a bit more complicated due to differences between OS versions of the client and will be described in a later post.

To see if the endpoint you use has sharing enabled, navigate to the endpoint under "Manage Endpoints" in the Globus web interface and click on the Sharing tab; note that you may have to activate (log in to) a session on the endpoint first.  If sharing is enabled you will be told so and will see an "Add Shared Endpoint" button in the panel.  Shared endpoints are essentially sub-endpoints that you can create and make accessible to any other Globus user.

Let's go ahead and make a shared endpoint from umich#flux by clicking on the button.  You are presented with a web form asking for the required information:

Host Path  ~/Test_share

You can either give a complete absolute path or use the unix shorthand (~/) for your home directory, as I have done (make sure the shared directory exists first!).

New Endpoint Name   traeker#Test_share

Description   Let others know what this share is about.


Clicking on the "Create and Manage permissions" button creates the shared endpoint and presents you with a new panel for managing permissions. It shows the current access settings, and clicking on the "Add Permission" button presents a number of options for how to share this endpoint with other Globus users.

Share With     choose one of: email, user, group, all users

Permissions  check one or both of: read, write


A couple of things to keep in mind as you set these parameters:
  • Be careful about choosing all users as this will allow all users logged into Globus to access this share.
  • By default only read permission is set. If you allow write permission, you could receive files containing viruses, and incoming files also count against any disk usage quotas you have.


One easy way to manage permissions for a large group of people is to create a Globus group and populate it with users.  Be advised that the entire group will have the same permissions, so if some users need different permissions you must either create another group or add each user to the share individually.  Groups come in especially handy when you share multiple directories with similar sets of collaborators.

Once a directory is shared with another Globus user, they can find the endpoint via the "shared with me" filter at the top of the endpoint list panel.  With the name in hand, they can transfer files to and from that endpoint by typing the name into the "Transfer Files" screen, just like any other endpoint they have access to.

You can go back to this shared endpoint at any time to add new permissions or edit existing ones.
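As an aside for those who prefer scripting: the standalone globus command-line client (a newer tool than the web interface shown here) can manage the same permissions. A hedged sketch, with an illustrative endpoint ID and collaborator email:

$ globus endpoint permission create --permissions r --identity colleague@example.edu ENDPOINT_ID:/Test_share/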


Globus endpoint sharing is very powerful: it gives non-UM collaborators access to your research data without your having to create a UM "Sponsored account" for them to access your systems.  This is very similar to cloud file sharing services like Box and Dropbox.  The big difference is that Globus does not store the data, so quotas are governed by your own systems' policies.



Thursday, July 2, 2015

Sending Data to Amazon AWS S3 storage

Researchers at UM have numerous storage options available to them on and off campus. In this post we focus on moving data to Amazon's AWS cloud storage, S3.  This storage is fast and easily accessible from other AWS resources as well as from UM systems.


To use S3 you first need an AWS account, and then you create what are called S3 buckets.  Buckets can be created via the AWS web console or with the AWS Command Line Interface (CLI) tools on your local system. Installation and setup instructions are available in the provided link. Below we shall assume this has already been done.
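For reference, a minimal install and one-time setup look something like the following (the access keys come from your AWS console; the values shown here are placeholders):

$ pip install awscli
$ aws configure
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Default region name [None]: us-east-1
Default output format [None]: json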

Let's go through a good sample use case: creating an S3 bucket and sending a large backup file to it. First, note that if you have configured the aws CLI tools correctly, they know your account credentials and have full access to your S3 resources.


Now create an S3 bucket called "mybackups":

$ aws s3 mb s3://mybackups


To confirm creation and check contents use:

$ aws s3 ls s3://mybackups

Now let's copy the file backup.tar to that bucket:

$ aws s3 cp backup.tar s3://mybackups

In this test case I got 107 MB/s from my laptop, which is pretty awesome.  This speed is largely due to two things: 1) the aws s3 cp command can break the file into numerous parts and send them to the bucket simultaneously, and 2) the route from UM to AWS is via Internet2, which can be 1-10 Gb/s depending on your particular uplink speed to the UM backbone.  I can confirm that doing this from my home computer is exceedingly slow!
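If you want to tune that multipart behavior, the aws CLI exposes a few settings (names per the AWS CLI S3 configuration options; the values here are just examples):

$ aws configure set default.s3.max_concurrent_requests 20
$ aws configure set default.s3.multipart_chunksize 16MB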


Confirm the file is in the bucket via:

$ aws s3 ls s3://mybackups

Some among you might say, "I do not have enough space on my system to make a temporary backup tar file." Fear not: you can pipe unix utilities together to avoid this.

$ tar -czf - raeker | aws s3 cp - s3://mybackups/raeker.tgz
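The pipe works in the other direction too. aws s3 cp can stream an object to stdout when you give "-" as the destination, so you can restore the archive above without a temporary file:

$ aws s3 cp s3://mybackups/raeker.tgz - | tar -xzf -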

Alternatively you can use the aws s3 sync command!  This functions much like the traditional unix rsync command, syncing files between a source and a target:

$ aws s3 sync my_directory s3://mybackups

Be warned, though, that if there are lots of files to sync you likely will not get anywhere near the 100 MB/s I got above. Also be advised that AWS charges for operations as well as storage, so each file cp/put counts as a request operation toward the $0.005 per 1,000 requests!

You can also use sync if you simply need a copy of your local files in an S3 bucket, for use in, say, EC2 instances for computing.

Normally, sync only copies missing or updated files or objects from the source to the target. However, you may supply the --delete option to remove files or objects from the target that are not present in the source.
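For example, to make the bucket mirror the local directory exactly (careful: this deletes bucket objects that no longer exist locally):

$ aws s3 sync my_directory s3://mybackups --delete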


Of course you can reverse the data flow by making s3://mybackups the source and a local file/folder the target!
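For instance, to pull everything in the bucket down into a local directory:

$ aws s3 sync s3://mybackups my_directory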

In another blog post I will show you how to automatically archive your S3 objects to the considerably cheaper Glacier storage.  Stay tuned.


Saturday, November 1, 2014

Optimizing Star Cluster on AWS

In a previous post we pointed to instructions and some common tasks for using Star Cluster to create an HPC cluster on the fly in Amazon Web Services.  Now we will focus on some options for optimizing your use of Star Cluster.

Cutting the Fat - Your Head Node is too Big

By default Star creates a head node and N compute nodes. You select the instance type with NODE_INSTANCE_TYPE, but this same type is used for both the head/master and the compute nodes. In most cases your head node need not be so big. Un-comment MASTER_INSTANCE_TYPE and set it to a more modest instance size to control costs. A c3.large head node at $0.105/hr is much better than a c3.8xlarge at $1.68/hr, though you probably still want the c3.8xlarge for your compute nodes because of its lower cost per unit of CPU performance.
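In your cluster template in the Star config, that split looks like:

MASTER_INSTANCE_TYPE = c3.large
NODE_INSTANCE_TYPE = c3.8xlarge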

Use Spot For Lower Costs

Spot instances are where you bid a price you are willing to pay for AWS's extra capacity. Star can use spot, but only for compute nodes. That is fine: the compute nodes are your main cost, and you don't want your head node being killed if prices rise above your bid.

Almost every Star command that starts an instance can place a spot bid instead, using the -b <bid price> option.
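For example, to start a cluster while bidding $0.50/hr for each compute node (the cluster name and bid price are illustrative):

$ starcluster start -b 0.50 smallcluster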

Switch Regions and/or Availability Zone

AWS is made up of regions, which in turn are made up of availability zones. Different regions have their own pricing; my favorites are Virginia (us-east-1) and Oregon (us-west-2) for the lowest prices. By default Star will use us-east-1, but I mostly switch to us-west-2. Why do I do this? Lower spot prices! The AWS console spot price history graphs for the c3.8xlarge (the fastest single node on AWS) in both regions show the difference.
[Graphs: 24-hour spot price history for c3.8xlarge in us-east-1 and us-west-2]
The spot price in us-west-2 is on average much lower than in us-east-1.  Be sure to really think about how spot works: you can bid high, and it is possible to pay more than the on-demand rate for a few hours. But a high bid keeps your nodes from being killed, and your total spend (the area under the curve) should still be much lower than what you would have paid on-demand.

Changing regions in Star Cluster:

Update the region name, the region host (the machine that accepts AWS API commands from Star), and the availability zone.
#get values from:
$ starcluster listregions
$ starcluster listzones
 
AWS_REGION_NAME = us-west-2
AWS_REGION_HOST = ec2.us-west-2.amazonaws.com
AVAILABILITY_ZONE = us-west-2c
Create a new key pair for each region: repeat that step from the Star Cluster Quick-Start with a new name for the key, and update the [key key-name-here] section in your cluster config.
$ starcluster createkey mykey-us-west-2 -o ~/.ssh/mykey-us-west-2.rsa

[key mykey-us-west-2]
KEY_LOCATION=~/.ssh/mykey-us-west-2.rsa
The AMI names are also per-region, so when you switch regions you need to update the name of the image to boot. In general, select an HVM image.
#get list of AMI images for region
$ starcluster listpublic 
NODE_IMAGE_ID=ami-80bedfb0

Get a Reserved Instance for your Head/Master Node

If you know you are going to have your Star cluster up and running for a decent amount of time, look into Reserved Instances. Discounts of close to 50% compared to on-demand pricing can be had.  There are also light, medium, and heavy reserved types, which match how often you expect your Star head/master node to be running. Discounts vary by instance type and term, so refer to AWS pricing to figure out whether this makes sense for you.  You can even buy the remaining time on a reservation from another AWS user, or sell your own unused reserved contract, on the Reserved Instance Marketplace. Be careful: reserved contracts are tied to regions and availability zones, so if you plan to move between these to chase lower spot costs your contract won't follow you.
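You can survey the current offerings from the aws CLI before buying; a sketch (the filters are illustrative):

$ aws ec2 describe-reserved-instances-offerings --instance-type c3.large --product-description "Linux/UNIX" --region us-west-2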

Switch to Instance Store

By default Star uses EBS volumes for the disk on each node. While this is very simple and allows data stored on the worker nodes to persist even when they are shut down, EBS carries an extra cost. The cost of a few GB of EBS will be small compared to the compute costs, but if you plan to have a large cluster, it can add up to real money.  Consider instance storage, which is supported by Star.  With instance store, a compute node boots by copying the Star AMI image.

Most users' clusters will not run long enough or grow large enough for this to matter; contact hpc-support@umich.edu if you want to switch. Just remember to terminate your cluster, not just stop it.  If you only stop it, the EBS volumes remain.
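For clarity, the two commands (cluster name illustrative):

$ starcluster stop smallcluster       #instances stop; EBS volumes and their charges remain
$ starcluster terminate smallcluster  #destroys the cluster and its EBS volumes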

Building Cloud HPC Clusters with Star Cluster and AWS

Researchers who have been traditional users of HPC clusters have been asking how they can make use of Amazon Web Services (AWS), aka the cloud, to run their workloads. While AWS gives you a great hardware infrastructure, they really are just renting you bare metal machines by the hour.

I should stress that using AWS is not magic. If you are new to cloud computing, there is a lot you need to know to avoid extra costs or the risk of losing data.  Before you start, contact ARC at hpc-support@umich.edu.

Admins of HPC clusters know that it takes a lot more than metal to make a useful HPC service, which is what researchers really want.  Researchers don't want to spend time installing and configuring queueing systems, exporting shared storage, and building AMI images.

Luckily for the community, the nice folks at MIT created Star Cluster. Star Cluster is really a set of prebuilt AMIs plus a set of python code that uses the AWS API to create HPC clusters on the fly. Their AMIs also include many common packages such as MPI libraries, compilers, and python packages.

There is a great Quick-Start guide from the Star team. Users can follow this, but HPC users at the University of Michigan can use the ARC cluster Flux, which has Star Cluster installed as an application. Users only need user accounts to access the login node and can then create clusters on AWS.
module load starcluster/0.95.5
Following the rest of the Quick-Start guide will get your first cluster up and running.
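Assuming you kept the Quick-Start's default cluster template name, bringing up your first cluster is then a single command:

$ starcluster start smallcluster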

Common Star Cluster Tasks

Switch Instance Type

AWS offers a number of instance types, each with its own features and costs. You switch your instance type with NODE_INSTANCE_TYPE.
NODE_INSTANCE_TYPE=m3.medium

Make a shared Disk on EBS

EBS is the persistent storage service on AWS.  In Star you can create an EBS volume, attach it to your master node, and share it across your entire cluster.   Be careful that you don't leave your volumecreator cluster running: use the -s flag to createvolume, and also check your running clusters with listclusters.
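A sketch of that workflow (the volume name and size are illustrative; -s shuts down the volume host once the volume is created):

$ starcluster createvolume --name=mydata -s 50 us-west-2c
$ starcluster listclusters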

Add/Remove Nodes to Cluster

This is very important when using cloud services: don't leave machines running that you don't need. Compute nodes should be nothing special and should be clones of each other. Using the addnode and removenode commands, clusters can be resized to current needs.  In general, if you are not computing, you should remove all your compute nodes, leaving only the master/head node with the shared disk so you can still move data.  You can still queue jobs in this state and then start nodes when ready.
$ starcluster addnode -n <number of nodes> <clustername>
$ starcluster removenode -n 3 smallcluster

Further Reading

Our next post will have tricks for optimizing your Star Cluster.