Monday, April 14, 2014

The Efficiency of Compute Jobs

Flux users often ask about the efficiency of their compute jobs — how to measure it, what can affect it, how it can be increased, etc.

One measure of efficiency, the ratio of CPU time to wallclock time, is easily accessible to Flux users. The operating system (Linux) used by Flux reports statistics about compute jobs, and those statistics are in turn reported by the job management (PBS) and job scheduling (Moab) systems back to the owner of the job. The email sent to job owners should look something like this:

PBS Job Id: ########.nyx.engin.umich.edu
Job Name:   myJobName
Exec host:  nyx5571/3+nyx5571/10+nyx5571/11+nyx5571/15
Execution terminated
Exit_status=0
resources_used.cput=00:03:42
resources_used.mem=3308kb
resources_used.vmem=315204kb
resources_used.walltime=00:05:00

To calculate the ratio of CPU time to wallclock time, divide “resources_used.cput” by “resources_used.walltime” — in this case, 3m42s / 5m0s or 74% efficiency.
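The same calculation can be scripted. The following is a minimal sketch, assuming the job email has been saved as text and that times are always printed as HH:MM:SS. The function names and the `cores` parameter are illustrative and not part of PBS; `cores` defaults to 1 so that the result matches the calculation above, and dividing by the number of cores may be more appropriate for multi-core jobs.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as printed in the job email, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(report: str, cores: int = 1) -> float:
    """Return cput / (walltime * cores) from the text of a job email."""
    fields = {}
    for line in report.splitlines():
        # Only lines of the form key=value carry the statistics we need.
        key, sep, value = line.strip().partition("=")
        if sep:
            fields[key] = value
    cput = hms_to_seconds(fields["resources_used.cput"])
    walltime = hms_to_seconds(fields["resources_used.walltime"])
    return cput / (walltime * cores)


# The statistics from the sample email above.
report = """\
Exit_status=0
resources_used.cput=00:03:42
resources_used.mem=3308kb
resources_used.vmem=315204kb
resources_used.walltime=00:05:00
"""

print(f"{cpu_efficiency(report):.0%}")  # prints 74%
```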
If that is the best you think it can be, then there isn’t much else to do. If your efficiency is below about 60% for a program that you run regularly, it is probably worth investing some time in increasing the efficiency.
In general, the time spent by a computer program that is not CPU time is time spent waiting for input or output (I/O) of some sort. On Flux the two main sources of I/O are reading or writing data to storage or sending or receiving data over the network.
Understanding what sort of file reading and writing your program is doing will help determine whether efficiency can be improved, and if so, how.
· Opening and closing files repeatedly is a time-consuming process, so ensuring you aren’t doing that in a loop is a good first step.

· If you are reading or writing large files, taking advantage of Lustre striping can reduce the time spent on that I/O, which improves CPU usage and reduces the wall-clock time your program needs to complete.
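The first point in the list above can be illustrated with a short sketch. The function and file names are hypothetical, and the example writes to a temporary directory rather than to any Flux filesystem. Both functions produce the same file, but the first opens and closes it once per value while the second opens it once.

```python
import os
import tempfile


def write_reopening(path, values):
    """Slow pattern: open and close the file once per value."""
    for value in values:
        with open(path, "a") as handle:
            handle.write(f"{value}\n")


def write_once(path, values):
    """Preferred pattern: open the file once, then write inside the loop."""
    with open(path, "w") as handle:
        for value in values:
            handle.write(f"{value}\n")


with tempfile.TemporaryDirectory() as workdir:
    slow_path = os.path.join(workdir, "slow.txt")
    fast_path = os.path.join(workdir, "fast.txt")
    write_reopening(slow_path, range(1000))
    write_once(fast_path, range(1000))
    # Confirm that the change in I/O pattern does not change the output.
    with open(slow_path) as slow, open(fast_path) as fast:
        same_contents = slow.read() == fast.read()

print(same_contents)
```

How much time the second pattern saves depends on the filesystem, so it is worth timing both on the storage your job actually uses.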

Tools like the Allinea code profiler MAP (for serial or parallel programs), gprof (for serial programs), the Matlab profiler (for Matlab programs), the R profiler (for R scripts), and the Python profiler (for Python scripts) can help determine where your program is spending its time. This information can help you decide whether there are changes you can make in the code that will not change the results but will improve performance.
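As one illustration, here is a minimal sketch of using the Python profiler through the standard library modules `cProfile` and `pstats`. The functions `slow_sum` and `main` are made-up stand-ins for real work; the report lists the functions in which the most cumulative time was spent.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately simple hot spot for the profiler to find."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return slow_sum(200_000)


# Collect timing data only while main() is running.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Format the five entries with the largest cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
profile_text = buffer.getvalue()
print(profile_text)
```

For a whole script, running `python -m cProfile -s cumulative myscript.py` (where `myscript.py` is a placeholder name) gives a similar report without changing the code.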
Lastly, consider using optimized and well-supported third-party libraries for common tasks in scientific or engineering programs. There are only rare cases where writing your own code is a better option than using an existing FFT library such as FFTW or Intel’s MKL; an existing matrix math library such as Intel’s MKL or NVIDIA’s cuBLAS; existing pseudo-random number generators such as those in MKL or NVIDIA’s cuRAND; or existing file formats such as HDF5 or NetCDF. There are many options for other common tasks. If Google doesn’t help you find them, please ask us and we’ll do our best to help.
When to improve efficiency
Improving program efficiency can pay benefits over a long time; a 5% improvement in performance due to efficiency gains means you get one more free job for every 20 that you run. The value of that gain can be roughly quantified: the total time savings due to efficiency should be greater than the amount of time you spend working to improve efficiency.
For example, if your program takes 6 hours to run and you run it 10 times each week and expect to run it for 50 weeks in the coming years, you’ll be spending 3000 hours waiting for your program to complete. A 5% improvement in wall clock time will save you 150 hours (almost a week) of waiting over those years. At current Flux rates of $11.70/core/month ($0.39/core/day) that is about $2.44 per core in savings on Flux costs. If you can improve the efficiency of your code in less than 150 hours of effort, there are significant time savings for you and for anyone else who might be using that program.
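The arithmetic in this example can be written out as a short sketch. The function name and its parameters are illustrative only, and the cost figure assumes a single-core job at the rate quoted above.

```python
def efficiency_savings(hours_per_run, runs_per_week, weeks,
                       improvement, rate_per_core_day, cores=1):
    """Return (total hours, hours saved, dollars saved) for a given speedup."""
    total_hours = hours_per_run * runs_per_week * weeks
    hours_saved = total_hours * improvement
    # Flux is billed per core per day, so convert saved hours to days.
    dollars_saved = hours_saved / 24 * rate_per_core_day * cores
    return total_hours, hours_saved, dollars_saved


total_hours, hours_saved, dollars_saved = efficiency_savings(
    hours_per_run=6,
    runs_per_week=10,
    weeks=50,
    improvement=0.05,
    rate_per_core_day=0.39,
)

print(f"total: {total_hours} h, saved: {hours_saved:.0f} h, "
      f"cost saved: ${dollars_saved:.2f}")
# prints: total: 3000 h, saved: 150 h, cost saved: $2.44
```

The comparison that matters is `hours_saved` against the hours you expect to spend on the optimization; the dollar figure is small either way.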
The most valuable and costly resource to worry about is the time of the researcher—that is the rarest commodity. Intelligent use of that time is the most important consideration.
For a program or subroutine you’ll only run a couple of times, there is little value in improving its efficiency. 
Summary
  1. Be smart about improving efficiency. Unless you’re developing code that will be run many times for many years or you suspect you have a serious efficiency problem, the value of your time in working on improving efficiency is probably greater than the value of the computer time you’ll save.
  2. Start by looking at reads from and writes to storage. Storage is the slowest I/O on most systems, so minimizing reads and writes can often have a dramatic effect on efficiency and wall-clock time for your programs. If you must read and write data, make use of the fastest storage that you can, either the local /tmp space on every Flux node or the shared /scratch parallel filesystem.
  3. Use well-regarded third-party libraries instead of inventing your own. For things like FFTs, matrix algebra, data storage formats, and other common components of scientific or engineering software, making use of third-party libraries can have a large positive effect on the performance and efficiency of your program. Some examples are FFTW for FFTs, MKL for matrix algebra, and HDF5 for data storage; all of these are available on Flux.
  4. Use a profiler to see where your code is spending its time. The Allinea code profiler MAP is available on Flux and can help guide you to the places in your code where changes will have the biggest effect. MAP will also show MPI network traffic, which helps you check that you aren’t spending too much time sending small packets between ranks or blocking progress on some ranks while they wait for another rank to deliver updated data.