Wednesday, April 30, 2014

Using qselect to make job management easy

How many times have you wanted to delete all your queued jobs, but not your running ones?  What about putting holds on all your queued jobs so that the next job you submit jumps ahead of them in line?

There is an easy way to do this: a combination of qselect, to filter your jobs, and subshells makes this a breeze.  For the most recent qselect documentation, run man qselect on a login node.

Below is a selection of example qselect invocations. All of the options can be combined, making job selection very flexible.
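A few example invocations, assuming the standard Torque/PBS qselect flags (check man qselect to confirm them on your system; the account name below is hypothetical):

```shell
# all of your jobs
qselect -u $USER

# only your queued jobs
qselect -u $USER -s Q

# only your running jobs
qselect -u $USER -s R

# only jobs charged to a particular account
qselect -u $USER -A example_flux
```

Each line of output is a job ID, one per line, ready to feed to another command.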



Now that qselect has given us all the job IDs we want, we can use a subshell.  A subshell evaluates the command inside it (our qselect) first, then feeds that command's output to the outer command.  To create a subshell, wrap the command you want to run first in $( ). The options below are some of our most common requests.
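For example, feeding qselect's output to qdel or qhold through a subshell (a sketch using the standard Torque commands):

```shell
# delete all of your queued jobs, leaving running jobs alone
qdel $(qselect -u $USER -s Q)

# put a hold on all of your queued jobs
qhold $(qselect -u $USER -s Q)

# release those holds later
qrls $(qselect -u $USER -s H)
```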

Users need not stop here: combine qselect options to hold all your jobs so that other jobs from your Flux account can run.  In this case, assume you have jobs queued in two Flux accounts and only want to do this in one of them.
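A sketch of that case, assuming your two accounts are named example_flux1 and example_flux2 (substitute your real account names):

```shell
# hold only your queued jobs charged to example_flux1;
# jobs charged to example_flux2 are left untouched
qhold $(qselect -u $USER -A example_flux1 -s Q)
```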



qselect should make mass job changes very easy. There are a few commands (qmove, for one) that do not take a list of jobs, so this method won't work with them. Contact us if you are in need of one of these commands.

Friday, April 25, 2014

Upcoming High Performance Computing Webinars

High Performance Computing — How Can It Improve My Research?

Where: https://univofmichigan.adobeconnect.com/flux  (Select as a Guest)
April 28th 10:30am-11:30am

Learn about what can be accomplished with HPC clusters, such as the ARC Flux cluster and NSF XSEDE machines.  We will cover how HPC resources let you get more work done, or enable work that was previously unapproachable.  Any remaining time will go into the details of the construction of the Flux system on campus as an example.
-----------------------

Basic Linux Commands and Remote File Transfer and Access

Where: https://univofmichigan.adobeconnect.com/flux  (Select as a Guest)
April 30th 10:30am-11:30am

Aimed at new Linux users, this session covers the basics of the command line. At the end of this session users should feel comfortable manipulating files and navigating the Linux filesystem.  Particular focus will be given to connecting to remote Linux systems with SSH from Mac and Windows, using the Flux cluster as an example.
Commands covered:  ssh, sftp, globus, ls, mv, pwd, cd, etc.
------------------------

Using Modules and the PBS Batch system

Where: https://univofmichigan.adobeconnect.com/flux  (Select as a Guest)
May 2nd 10:30am-11:30am

Users will learn how to use the extensive Flux software library via the Modules system. Modules provides a powerful and flexible way to manipulate a user's environment, easing the management of many software titles on a shared system.
In the remaining time, users will learn how to use PBS, the batch system on Flux, to submit workloads.  All users of the cluster must use PBS, and at the end of this session users should feel comfortable submitting basic serial and parallel batch jobs to the cluster.
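As a preview, a typical Modules workflow looks something like this (the gcc module name is just an example; run module avail to see what is actually installed):

```shell
module avail          # list the available software
module load gcc       # add a package to your environment
module list           # show what is currently loaded
module unload gcc     # remove it again
```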
------------------------

Notes:
Free Flux account required: https://www.engin.umich.edu/form/cacaccountapplication
Sessions are drop-in. Attendees need not attend all of the sessions, nor attend them in any particular order.
Sessions will be recorded and posted on: https://www.youtube.com/user/UMCoECAC

Wednesday, April 16, 2014

Distributed Memory HFSS Simulations (DDM, DSO)

Back in March we gave an updated way to run HFSS on a single node. Two things have happened since then: we were able to get Distributed HFSS software licenses, and we figured out how to make them work.

Rather than post everything here, we have updated our HFSS documentation page.

We are excited about this because enabling Domain Decomposition (DDM) for very large models has been requested many times before, and we were limited to using the FluxM service just to get the 40 cores/node. Now users who don't need the large memory footprint can use the standard Flux services and realize savings.

The second group of users who benefit are those doing sweeps, or sweeps inside optimization problems. HFSS will now use multiple nodes, farming out different values in your parameter space to several nodes at once.  This should be great for those users.

If you want to use HFSS and have questions, please contact hpc-support@umich.edu.

Tuesday, April 15, 2014

High-speed automated file transfers

Do you have a large amount of data that you need to move?
Do you need to move data reliably, even over unreliable networks?
Would you like to start moving data and ignore it until you get a notification that it is complete?
Do you move data often between UM and XSEDE or Flux and your laptop?

Globus can be a tremendous help in any of those scenarios, and we have good support for it on campus, especially on Flux!

The source and destination for file transfers using Globus are called endpoints, and Flux’s endpoint supports access to /scratch, files in your Flux home directory, and files in any Value Storage shares that are available to Flux. The other endpoint could be your laptop, an XSEDE site, or even another location on Flux (for example, you can use Globus to move data between a Value Storage share on Flux and /scratch on Flux).

Using Globus is as simple as using a graphical file-transfer client; detailed instructions are at http://cac.engin.umich.edu/resources/login-nodes/globus-gridftp. As always, the HPC support staff on campus are available to help; simply send an email to hpc-support@umich.edu.

Monday, April 14, 2014

The Efficiency of Compute Jobs

Flux users often ask about the efficiency of their compute jobs — how to measure it, what can affect it, how it can be increased, etc.

One measure of efficiency, the ratio of CPU time to wallclock time, is easily accessible to Flux users. The operating system (Linux) used by Flux reports statistics about compute jobs, and those statistics are in turn reported by the job management (PBS) and job scheduling (Moab) system back to the owner of the job. The email sent to job owners should look something like this:

PBS Job Id: ########.nyx.engin.umich.edu
Job Name:   myJobName
Exec host:  nyx5571/3+nyx5571/10+nyx5571/11+nyx5571/15
Execution terminated
Exit_status=0
resources_used.cput=00:03:42
resources_used.mem=3308kb
resources_used.vmem=315204kb
resources_used.walltime=00:05:00

To calculate the ratio of CPU time to wallclock time, divide “resources_used.cput” by “resources_used.walltime” — in this case, 3m42s / 5m0s or 74% efficiency.
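The arithmetic can be scripted. This is a small sketch that converts both times to seconds and prints the percentage:

```shell
# convert an HH:MM:SS time string to seconds
to_secs() {
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

cput=$(to_secs 00:03:42)           # 222 seconds
wall=$(to_secs 00:05:00)           # 300 seconds
echo "$(( 100 * cput / wall ))%"   # prints 74%
```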
If that is the best you think it can be, then there isn't much else to do. If your efficiency is below about 60% for a program that you run regularly, it is probably worth considering investing some time in increasing the efficiency.
In general, the time spent by a computer program that is not CPU time is time spent waiting for input or output (I/O) of some sort. On Flux the two main sources of I/O are reading or writing data to storage or sending or receiving data over the network.
Understanding what sort of file reading and writing your program is doing will help determine whether efficiency can be improved, and if so, how.
• Opening and closing files repeatedly is a time-consuming process, so ensuring you aren't doing that in a loop is a good first step.

• If you are reading or writing large files, taking advantage of Lustre striping can help reduce the time spent doing so, improving CPU usage and reducing the amount of wall-clock time required for your program to complete.

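The first point can be illustrated in shell; the same idea applies to file handles opened and closed inside loops in any language:

```shell
# inefficient: the output file is opened and closed on every iteration
for i in 1 2 3 4 5; do
    echo "result $i" >> results_slow.txt
done

# better: the file is opened once for the whole loop
for i in 1 2 3 4 5; do
    echo "result $i"
done > results_fast.txt
```

Both produce identical output, but the second version touches the filesystem far less often.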
Tools like the Allinea code profiler MAP (for serial or parallel programs), gprof (for serial programs), the Matlab profiler, the R profiler, and the Python profiler can help determine where your program is spending its time. This information can help you decide whether there are changes you can make in the code that will not change the results but will improve performance.
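For example, a minimal gprof workflow looks something like this (prog.c is a placeholder for your own source file):

```shell
gcc -pg -O2 prog.c -o prog    # compile with profiling instrumentation
./prog                        # run normally; writes gmon.out
gprof ./prog gmon.out | less  # inspect the flat profile and call graph
```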
Lastly, consider using optimized and well-supported third-party libraries for common tasks in scientific or engineering programs. There are only rare cases where writing your own Fourier transform is a better option than using an existing FFT library such as FFTW or Intel's MKL; an existing matrix math library such as Intel's MKL or NVIDIA's cuBLAS; existing pseudo-random number generators in MKL or cuRAND; or existing file formats such as HDF5 or NetCDF. There are many options for other common tasks; if Google doesn't help you find them, please ask us and we'll do our best to help.
When to improve efficiency
Improving program efficiency can pay dividends over a long time; a 5% improvement in performance due to efficiency gains means you get one more free job for every 20 that you run. The value of that gain can be roughly quantified: improving efficiency is worthwhile when the total time saved is greater than the amount of time you spend making the improvement.
For example, if your program takes 6 hours to run, you run it 10 times each week, and you expect to keep running it for 50 weeks over the coming years, you'll spend 3,000 hours waiting for your program to complete. A 5% improvement in wall-clock time will save you 150 hours (almost a week) of waiting over those years. At current Flux rates of $11.70/core/month ($0.39/core/day), that is about $2.44 in Flux cost savings per core. If you can improve the efficiency of your code in less than 150 hours of effort, there are significant time savings for you and for anyone else who uses that program.
The most valuable and costly resource to worry about is the time of the researcher—that is the rarest commodity. Intelligent use of that time is the most important consideration.
For a program or subroutine you’ll only run a couple of times, there is little value in improving its efficiency. 
Summary
  1. Be smart about improving efficiency. Unless you're developing code that will be run many times for many years, or you suspect you have a serious efficiency problem, the value of your time spent improving efficiency is probably greater than the value of the computer time you'll save.
  2. Start by looking at reads from and writes to storage. Storage is the slowest I/O on most systems, so minimizing reads and writes can often have a dramatic effect on efficiency and wall-clock time for your programs. If you must read and write data, use the fastest storage that you can: either the local /tmp space on every Flux node or the shared /scratch parallel filesystem.
  3. Use well-regarded third-party libraries instead of inventing your own. For things like FFTs, matrix algebra, data storage formats, and other common components of scientific or engineering software, making use of third-party libraries can have a large positive effect on the performance and efficiency of your program. Some examples are FFTW for FFTs, MKL for matrix algebra, and HDF5 for data storage; all of these are available on Flux.
  4. Use a profiler to see where your code is spending its time. The Allinea code profiler MAP is available on Flux and can help guide you to the places in your code where changes will have the biggest effect. MAP will also show MPI network traffic to make sure you aren't spending too much time sending small packets between ranks or blocking progress on some ranks waiting for another rank to deliver updated data.


Tuesday, April 1, 2014

VNC Remote Desktop

VNC (Virtual Network Computing) was recently installed on all the Flux cluster nodes, providing virtual desktop access to the cluster and improving the performance of jobs using graphics or a GUI.

While most users should still strive to make their codes work in batch without graphics or a GUI, sometimes you just need to make a plot or generate a mesh in a GUI-only tool, but you still need the horsepower of the cluster.

Traditionally, users had to use X11 forwarding with the -X option to qsub. This required a working X server, and while every Linux and Mac user had one, it was slow, worked poorly over slow connections, and some applications performed very badly with it.

VNC is essentially your new Flux desktop. When started inside a batch or interactive job, VNC will start a desktop on the node you were assigned. You then need a VNC client and an SSH tunnel, and you can connect to that desktop.

The first step is to set your VNC password. (NOTE: Use a totally different password for VNC than for any other service. VNC authentication is very insecure and the password is easy to find and crack.) From a login node run:


$ vncpasswd

Now that we have a working password, we need to get a VNC session started on a compute node. You can use an interactive job and start vncserver there, or you can submit a batch job.
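For the interactive route, something like the following works (the account and resource values mirror the batch script below and should be replaced with your own):

```shell
# request an interactive job, then start the VNC server by hand
qsub -I -V -l nodes=1:ppn=4,walltime=8:00:00,pmem=1gb \
     -A example_fluxg -l qos=flux -q fluxg
# ...wait for the prompt on the compute node, then:
vncserver -geometry 1280x1024
```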


#PBS -N vncjob
#PBS -l nodes=1:ppn=4,walltime=8:00:00,pmem=1gb
#PBS -A example_fluxg
#PBS -l qos=flux
#PBS -q fluxg
#PBS -M uniqname@umich.edu
#PBS -m b
#PBS -V

# vncserver -geometry XxY  -fg
# -geometry 1280x1024    <= default is 1024x768
# -fg                    <= Run in foreground, needed for PBS

vncserver -geometry 1280x1024 -fg

# be sure to set your e-mail in -M and use -m b (mail when the job starts);
# without it you will have to check manually for when the job starts.

Wait for the job to start, which you can check with qstat <jobid>; or, if you set -M uniqname@umich.edu -m b, PBS will e-mail you when your job begins. At that point you have a remote desktop running on the first core of your cluster job. Don't leave it idle: if you leave the PBS job running while not using the VNC desktop, you are blocking resources from other users.

Now that a desktop is running (vncserver) we need to create an SSH Tunnel to connect to it. You need to tunnel from your local machine via flux-xfer.arc-ts.umich.edu to the first CPU in your batch job. The script below explains how to find both of those and how to start an SSH Tunnel from Linux, Mac, or Cygwin.


# create an ssh tunnel via flux-xfer to the machine with your VNC Display.
# this example works for Mac, Linux, and Cygwin

# find your vnchost and display number
# If you have no VNC sessions running, you can make this easier by
# cleaning your .vnc folder first with
# $ rm $HOME/.vnc/*.log
# $ rm $HOME/.vnc/*.pid
# prior to submitting the job with your VNC session

ls -rt $HOME/.vnc/

#  eg: nyx5330.arc-ts.umich.edu:1.log
#  eg: nyx5330.arc-ts.umich.edu   <== host running vnc
#  eg: 1   <== display number

# the port number is 5900 + the display number. Display numbers start at 1
# and increment if other VNC sessions are already running, so check your
# display number before reusing this template.
# ssh -L 5901:<host running vnc>:5901  flux-xfer.arc-ts.umich.edu

ssh -L 5901:nyx5330.arc-ts.umich.edu:5901  flux-xfer.arc-ts.umich.edu

Windows users connecting by SSH with PuTTY can follow these instructions.

Using the example above, the values would be:

  • Source Port: 5901
  • Destination: nyx5330.arc-ts.umich.edu:5901
  • Host Name (or IP address): flux-login.arc-ts.umich.edu

At this point you should be able to connect a VNC client to localhost:5901 or if your client uses display numbers, display 1.

Here is a list, certainly not exhaustive, of VNC clients.

When connecting from a VNC client/viewer, the host should be localhost and the port should be the one you forwarded in the prior step, in our example 5901. Some viewers want the display number instead; in that case subtract 5900 from your port number, in our case display 1.
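For example, with a command-line client such as TigerVNC's vncviewer (other clients take the same host:display or host::port values; the exact syntax may vary by client):

```shell
vncviewer localhost:1        # by display number (port 5900+1)
vncviewer localhost::5901    # or give the port explicitly
```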

At this point the viewer will either ask for your VNC password (set in step 1) before connecting or after you connect. You should now see a desktop with a terminal. You can run any GUI application we have on our nodes; you can even spawn parallel jobs with MPI, as the PBS environment is picked up by VNC.

One of VNC's great features is that you can detach and reattach later. This makes it very useful if your connection might drop, or if you work from a laptop and need to change locations. Using the UMich VPN, you can even create the tunnel from home.

We hope you find that this is a powerful feature, giving you access to more of the functionality of Flux's software, and that it makes your Flux allocation that much more useful for your research. Below is a video showing the entire process.