Thursday, December 4, 2014

XSEDE Domain Champions

The Campus Champion program has successfully helped local campus researchers reach out and use national HPC resources through the XSEDE (ARC Notes) project.  Our Campus Champion is Brock Palen, a member of the campus HPC support staff, who can be reached at hpc-support@umich.edu.

Campus Champions have worked well for general help, but what about researchers who need more detailed help in their own domain?  XSEDE recently started a new type of Champion, the Domain Champion.  The local champion, Brock Palen, can put you in contact with them.

The current Domains are:

  • Bioinformatics/Genomics
  • Data Analytics
  • Economics
  • Digital Humanities
  • Humanities
  • Molecular Dynamics

XSEDE plans to expand this list as Champions become available and demand is expressed.


If you have any questions, feel free to ask us at hpc-support@umich.edu.


Tuesday, December 2, 2014

Finding the Data Network Bottleneck with perfSONAR and BWCTL

UPDATE: ITS page on perfSONAR
Java based (no software install required) bandwidth test.

Networks are famous for getting the blame when things are slow. It would be wonderful to use tools like IPerf3 at points along the network to pin down whether the network is the problem and, if so, where it goes bad.  We all love data, so how can we collect it?

The problem with IPerf is that a server has to be started on the remote end; if you don't have access to a server there, you can't run a test.  Enter perfSONAR, a way of registering network tests that allows both authenticated and anonymous bandwidth, ping, and other tests.

A perfSONAR node publishes a list of tests and limits what an external anonymous tester can run against it.  By using a perfSONAR node on the network along your data path, you can find out whether the network can hit the speeds you expect.  In this example we will focus on the Bandwidth Test Controller, or BWCTL.  BWCTL handles the communication with the perfSONAR box and then relies on popular tools such as IPerf, IPerf3, Nuttcp, etc. to run the actual tests.

To run your tests you will need two things.  The first is an install of BWCTL with the plugins supported by the endpoints you use; use IPerf3, as most endpoints support it.  Most major distributions have packages for BWCTL; if not, you can build it from the sites linked above.

You will also need a perfSONAR server to test against. At Michigan, as part of the process of upgrading the backbone to 100Gig and other links, ITS has installed a series of perfSONAR boxes in each datacenter near the network core.  This is where you should start: make sure you get good performance between your machine and the core servers.

There is a directory of perfSONAR deployments worldwide. As of this writing there are 850 BWCTL servers in the directory. For a list of boxes at umich.edu or another domain, you can filter directly to that result. An example server is ntap-dc-mdc-10g.umnet.umich.edu, the perfSONAR server in the Modular Data Center, which is the datacenter where Flux and the data transfer node flux-xfer.engin.umich.edu are located.

With BWCTL and IPerf3 installed and the hostname of the perfSONAR server in hand, we can run tests:
bwctl -c "ntap-dc-mdc-10g.umnet.umich.edu:4823" -T iperf3 -t 20
[ ID] Interval           Transfer     Bandwidth       Retr
[ 17]   0.00-20.04  sec  14.9 GBytes  6.38 Gbits/sec    0             sender
[ 17]   0.00-20.04  sec  14.9 GBytes  6.37 Gbits/sec                  receiver
If we have 0 retries and decent bandwidth, things are looking good. Next, test the network in the other direction using the -o and -s options in place of -c:
bwctl -o -s "ntap-dc-mdc-10g.umnet.umich.edu:4823" -T iperf3 -t 20
[ ID] Interval           Transfer     Bandwidth       Retr
[ 17]   0.00-20.04  sec  9.85 GBytes  4.22 Gbits/sec    0             sender
[ 17]   0.00-20.04  sec  9.85 GBytes  4.22 Gbits/sec                  receiver
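If you run many of these tests, it can help to pull the numbers out of the summary lines automatically. The following is only a minimal sketch in Python, assuming the output always looks like the summary lines shown above; the names SUMMARY_RE and parse_summary are made up for this example and are not part of BWCTL or IPerf3.

```python
import re

# Matches summary lines in the layout shown in the transcripts above, e.g.
# "[ 17]   0.00-20.04  sec  14.9 GBytes  6.38 Gbits/sec    0             sender"
# Assumption: other iperf3 versions may format these lines differently.
SUMMARY_RE = re.compile(
    r"\[\s*\d+\]\s+[\d.]+-[\d.]+\s+sec\s+"        # stream id and time interval
    r"[\d.]+\s+\w+\s+"                             # amount transferred and its unit
    r"(?P<rate>[\d.]+)\s+(?P<unit>[KMG])bits/sec"  # bandwidth
    r"(?:\s+(?P<retr>\d+))?\s+"                    # retransmits (sender line only)
    r"(?P<role>sender|receiver)"
)

# Scale factors to express every rate in Gbits/sec.
UNIT_TO_GBITS = {"K": 1e-6, "M": 1e-3, "G": 1.0}


def parse_summary(text):
    """Return {"sender": {...}, "receiver": {...}} for the summary lines found."""
    results = {}
    for line in text.splitlines():
        match = SUMMARY_RE.search(line)
        if match is None:
            continue  # header or unrelated line
        retr = match.group("retr")
        results[match.group("role")] = {
            "gbits_per_sec": float(match.group("rate")) * UNIT_TO_GBITS[match.group("unit")],
            "retransmits": int(retr) if retr is not None else None,
        }
    return results


# Output copied from the first test above.
sample = """\
[ ID] Interval           Transfer     Bandwidth       Retr
[ 17]   0.00-20.04  sec  14.9 GBytes  6.38 Gbits/sec    0             sender
[ 17]   0.00-20.04  sec  14.9 GBytes  6.37 Gbits/sec                  receiver
"""
parsed = parse_summary(sample)
print(parsed)
```

The sender entry carries the retransmit count, which is the first thing to look at: a nonzero value is a hint that packets are being lost somewhere along the path.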

Where to go from here?

Choose servers from the directory along the path your data takes; you can find the path using tools like traceroute or tracepath.  Work with network administrators if networks do appear to be slowing your data transfer.  If the network is the problem because of errors, speeds normally fall very low.  If you are getting around 50% of the network's rated speed, as in our tests above, things are probably OK on the network side.
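The "50% of the network" rule of thumb is simple arithmetic. As a rough sketch in Python, assuming the link to the test server is 10 Gbits/sec (a guess based only on the "10g" in the hostname, not a confirmed figure), the two measurements above work out as follows. The function name fraction_of_link is made up for this example.

```python
def fraction_of_link(measured_gbits, link_gbits):
    """Return measured bandwidth as a fraction of the link's rated bandwidth."""
    if link_gbits <= 0:
        raise ValueError("link_gbits must be positive")
    return measured_gbits / link_gbits


# Assumption: 10 Gbits/sec link, inferred from "10g" in the server hostname.
LINK_GBITS = 10.0

# Sender-side bandwidth from the two tests above, labeled by the bwctl option used.
measurements = [("-c test", 6.38), ("-o -s test", 4.22)]

for label, measured in measurements:
    share = fraction_of_link(measured, LINK_GBITS)
    print(f"{label}: {measured} Gbits/sec is {share:.0%} of the assumed link speed")
```

If your own link speed is different, change LINK_GBITS; a result far below the rule of thumb is the cue to start testing against servers further along the path.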

If the network isn't the problem, the file transfer protocol is likely performing poorly and should be replaced with tools like bbcp or our recommended Globus.  Lastly, make sure the storage system can send or write data at the speed of the network.
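Network tools report bits per second, while storage benchmarks usually report bytes per second, so a quick conversion helps when comparing the two. This is only an illustrative sketch in Python using decimal units (1 MB = 1,000,000 bytes); the function name is made up for this example.

```python
def gbits_to_mbytes_per_sec(gbits_per_sec):
    """Convert Gbits/sec to MB/sec (decimal units: 1 Gbit = 1e9 bits, 1 MB = 1e6 bytes)."""
    bits_per_sec = gbits_per_sec * 1e9
    bytes_per_sec = bits_per_sec / 8
    return bytes_per_sec / 1e6


# Bandwidths measured in the two tests above.
for gbits in (6.38, 4.22):
    print(f"{gbits} Gbits/sec needs storage that sustains about "
          f"{gbits_to_mbytes_per_sec(gbits):.1f} MB/sec")
```

If the storage at either end cannot sustain that rate, the transfer will be limited by storage no matter how well the network tests.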

More Reading

ESnet has a great collection of tools and information at Faster Data.

Monday, December 1, 2014

Classroom HPC Resources

As the fall semester comes to a close, those teaching need to start thinking about what resources they will need for their winter offerings.  Many may not realize that there are many options for using HPC resources in courses at no cost.

Depending on what is being taught, most classes would be served by the above. Other organizations also provide classroom/teaching access, such as NSF Blue Waters and DoE NERSC.

Any questions about the different resources can be directed to ARC at hpc-support@umich.edu.  We can provide consulting, guidance, and support.  This includes guest demo and training lectures in your classroom.