Sunday, January 26, 2014

High-speed Ethernet Network Connections to Flux

Flux provides high-speed Ethernet connections to locations elsewhere on- or off-campus using a hardware-based gateway between its high-speed InfiniBand network and the campus Ethernet network. There is no additional cost to use the gateway beyond the cost of using Flux.
[Image: schematic diagram of the connection between Flux nodes and the U-M campus network, showing connection speeds and paths]

The most common use case for this high-bandwidth connection to Flux nodes is access to NFS-based storage, although it is not restricted to that.

While any high-bandwidth network demand or application will work with this gateway and the Flux compute nodes, the example we have is of bandwidth to NFS storage. A researcher in the School of Public Health has a high-bandwidth NFS file server connected to the campus backbone at 10 Gbps. He also has a Flux allocation and asked us to ensure that the network path to his storage servers is via the InfiniBand-to-Ethernet gateway.

After the configuration was in place (it is available to anyone using Flux), his compute jobs read data from his NFS server at between 4.6 Gbps and 9.2 Gbps, approaching the maximum speed of the 10 Gbps interface on the file server.
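
If you want a rough check of the bandwidth you are seeing from a compute node, timing a large sequential read is usually enough. Below is a minimal Python sketch; the file path is a hypothetical NFS mount, so substitute your own.

    import time

    # Hypothetical NFS-mounted file on a Flux compute node; substitute your own.
    path = "/nfs/mystorage/bigfile.dat"

    chunk = 64 * 1024 * 1024                # read in 64 MiB pieces
    total = 0
    start = time.time()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total += len(data)
    elapsed = time.time() - start

    # 8 bits per byte; compare the Gbps figure against the link speed
    print("read %.1f GiB at %.2f Gbps" % (total / 2.0**30, total * 8 / elapsed / 1e9))

Note that a file already cached in memory from an earlier read will report far more than the network can carry, so use a file larger than RAM or one you have not read recently.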

[Image: network traffic approaching 10 Gbps in January 2014; this traffic was between Flux nodes and one file server attached at 10 Gbps]

The gateway hardware balances traffic across the two 10 Gbps links to the campus network, ensuring that there is no bottleneck between Flux and the campus network.

[Image: balanced network traffic across the two 10 Gbps Ethernet connections to campus minimizes network bottlenecks]

The InfiniBand-to-Ethernet gateway is a physical system from Mellanox called a BX5020. The BX5020 can support up to twelve 10 Gbps Ethernet connections, so there is sufficient network bandwidth for a large number of connections to other network locations on campus.

If you are interested in making use of this high-bandwidth Ethernet connection, please let us know at hpc-support@umich.edu.

Wednesday, January 22, 2014

When to Use Flux and When to Use XSEDE

We had a user ask the following:
I am a bit confused about what the XSEDE program is. On your website, you
indicate that XSEDE provides computing at no cost to researchers. Who is
eligible to use the XSEDE computing at no charge? What is the incentive for
using the for-fee Flux service if XSEDE is free?
I had never thought of this case before, and it is very important. I have copied our reply below for anyone else who has had this question. Text in [square brackets] was added for clarity in this post.

I should start by pointing out that I [Brock Palen] am both a CAEN-HPC employee and Flux/Nyx admin, as well as the XSEDE Campus Champion for campus.

XSEDE has no monetary cost, but it does have a resource application process in which you propose your use of the requested resources.  Flux has no such requirement; if you can bill a shortcode [University account], you can generally get Flux time.

XSEDE is provided by the NSF and is open to any open research in the country that passes its allocation review process.

Beyond the allocation policies for access, the provisioning of resources once you have access is different.

A Flux allocation is much more like a lease.  Within our resource definition we generally promise that if you have 32 cores, you can run 32 cores _now_.

XSEDE works like a debit card that is oversubscribed.  You have X hours you can burn up, and within each resource's policies your job will run if you have enough hours left AND the resources are idle. Thus jobs may (and often will) queue on XSEDE.
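
To make the debit model concrete, here is a minimal sketch in Python; the balance, job size, and helper function are hypothetical illustrations, not actual XSEDE policy.

    # Hypothetical numbers and helper; not actual XSEDE policy.
    balance = 50000                       # core-hours remaining on the allocation

    def eligible(cores, walltime_hours, balance):
        # A job can be charged only if enough hours remain; even then it
        # still waits in the queue until the resources are idle.
        cost = cores * walltime_hours     # core-hours the job would debit
        return cost <= balance, cost

    ok, cost = eligible(256, 24, balance)
    print("eligible: %s, would debit %d core-hours" % (ok, cost))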

So Flux gives you much more 'right now' access and might be a better platform for development.  XSEDE will let you run bigger jobs, but they will take time before they start, and turnaround to job start may be longer.

Lastly, if you use any commercial software, the license may not allow you to run it on XSEDE.  We have many such packages on Flux, and it is easier for us to enforce software license restrictions locally on Flux. Generally, no such arrangements are possible on XSEDE.

There are other pros and cons.  Personally, I would try to use them both.  Flux resources can be provisioned quickly (normally the same or next work day), while large XSEDE requests are reviewed only four times per year.  Startup and classroom allocations are the exception and are turned around much more quickly (two weeks on average).

Additional thoughts: There are many different resources available to researchers.  I would encourage you to contact us early in your planning process to discuss all the options available.

Monday, January 20, 2014

Theano for GPU Computing

In the near future, Flux will be offering GPGPU services based on the NVIDIA K20X GPU.

A user had requested support for Theano for GPU computing, so we installed it:
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:
  • tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
  • transparent use of a GPU – Perform data-intensive calculations up to 140x faster than on a CPU (float32 only).
  • efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
  • speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
  • dynamic C code generation – Evaluate expressions faster.
  • extensive unit-testing and self-verification – Detect and diagnose many types of mistakes.
To use Theano on the GPUs, set THEANO_FLAGS when you launch your script; a typical invocation looks like this (substitute your own script name):
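
    # replace yourscript.py with the name of your own program
    THEANO_FLAGS='floatX=float32,device=gpu' python yourscript.py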

Setting device=gpu rather than gpu# (a specific GPU number) lets our system assign the correct GPU for you.  For testing on the CPU, set device=cpu. Theano has many configuration options, but the above are the most common.
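
As a quick check that the setup works, a small script like the following (a generic Theano example, not Flux-specific) can be run with the flags above; with device=gpu the compiled function executes on the GPU:

    import numpy
    import theano
    import theano.tensor as T

    x = T.matrix('x')                 # symbolic matrix; float32 under floatX=float32
    y = T.exp(x).sum()                # an elementwise expression plus a reduction
    f = theano.function([x], y)       # compiled for whatever device= selects

    a = numpy.random.rand(1000, 1000).astype('float32')
    print(f(a))

Theano prints the device it is using when it starts, so it is easy to confirm the GPU was picked up.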

If you want to use Theano on multiple GPUs in a single job, contact us.


Thursday, January 16, 2014

New HPC Training Sessions

Offered by LSA IT for the entire University community.

Here's an opportunity for Flux cluster users and potential users to learn about using the Flux computing cluster.  This will be a hands-on experience, in which you will log in to the cluster and work with jobs.

We're offering three courses this term, some with multiple sessions:

HPC 100:
This course will familiarize the student with the basics of accessing and interacting with high-performance computers using the GNU/Linux operating system's command line. Topics include: a brief overview of Linux, the command shell, navigating the file system, basic commands, shell redirection, permissions, processes, and the command environment. Through hands-on experience, students will become familiar with the Linux command-line interface to high-performance computer systems, or other Linux systems for manipulating and analyzing data.

HPC 101 (prerequisite is HPC 100 or equivalent):
This course will provide an overview of cluster computing in general and how to use the U-M Flux Cluster in particular. Topics to be covered include cluster computing concepts, common parallel programming models, introduction to the Flux Cluster; creating, submitting, observing, and analyzing cluster jobs; common pitfalls and how to avoid them; and some useful tools. We will issue you a temporary allocation to use for the course, or you can use your existing Flux allocations, if any. Short sample programs will be provided, or come to class with your own.

HPC 201 (prerequisite is HPC 101 or equivalent):
This course will cover some more advanced topics in cluster computing on the U-M Flux Cluster. Topics to be covered include a review of common parallel programming models and basic use of Flux; dependent and array scheduling; advanced troubleshooting and analysis using checkjob, qstat, and other tools; use of common scientific applications including Python, MATLAB, and R in parallel environments; parallel debugging and profiling of C and Fortran code, including logging, gdb (line-oriented debugging), ddt (GUI-based debugging) and map (GUI-based profiling) of MPI and OpenMP programs; and an introduction to using GPUs.

Please visit http://ttc.iss.lsa.umich.edu/ttc/sessions/tag/hpc/ to register for the session(s) of your choice.  Seating is limited, so please register early.  We plan on teaching these courses again next term, so folks can take them later if they can't make it this time.

Please forward this to your researchers who are using or contemplating using Flux for their research.  We think graduate students will want to attend, and faculty and admins are certainly welcome as well.