Saturday, August 22, 2015

Flux High-Speed Data Transfer Service

Do you have a large data set on your own storage equipment that you would like to process on Flux? We can accommodate up to 40 gigabits per second of data transfer in and out of Flux via the campus Ethernet backbone. There is no additional cost to use this service, but you do need to contact us in order to set it up.

By default, network traffic between Flux compute nodes and other systems on campus takes place over standard one gigabit Ethernet connections. This is sufficient for modest amounts of traffic such as that generated by administrative tasks, monitoring, and home directory access.

Traffic between Flux and its high-speed /scratch filesystem runs over a separate 40 gigabit per second InfiniBand network within the datacenter, and data between Flux and off-campus systems on the Internet can be staged through our transfer server at up to 10 gigabits per second. This would seem to leave a gap though: what if you want direct high-speed connections between the Flux nodes and other systems on campus? We provide such connections using a Mellanox BX5020 InfiniBand/Ethernet gateway:

The Flux BX5020 Gateway

The gateway connects to the Flux InfiniBand network and to the campus Ethernet network and allows traffic to flow between the two networks. The InfiniBand network runs at 40 gigabits per second, and the gateway has four 10 gigabit links to the campus Ethernet network. This allows any Flux node to communicate with any system on campus at up to 40 gbit/s.

We have a customer that has multiple petabytes of data on their own storage equipment which they have been using Flux to process. We mount this customer's NFS servers on Flux and route the traffic through the gateway. The customer is currently running jobs on Flux against two of their 10-gigabit connected servers, and last weekend they reached a sustained data transfer rate into Flux of 14.3 gigabits per second.

Gateway traffic for the week of 8/11/2015 - 8/18/2015
Although we have pushed more than 14 gbit/s through the gateway during testing, this is a new record for production traffic through the system.

Our gateway is currently connected to the Ethernet network at 40 gigabits per second, but it can be readily expanded to 80 and possibly 120 gigabits per second as needed. Additionally, we plan to replace the existing gateway in the near future with newer equipment. The planned initial bandwidth for the new equipment is 160 gbit/s, and there is room for growth even beyond that.

No changes to your network configuration are needed to use the gateway; those changes take place on our end only. All you have to do is export your storage to our IP ranges. If you want to discuss or get set up for this service, please let us know! Our email address is hpc-support@umich.edu and we will be happy to answer any questions you have.
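For example, if your data lives on a Linux NFS server, the change on your side can be as small as adding one line to /etc/exports and reloading the export table. The sketch below assumes a Linux NFS server; the path and IP range shown are placeholders only, and we will supply the actual Flux IP ranges when you contact us:

  # /etc/exports -- export the data directory read-write to the Flux IP range
  # (the 10.0.0.0/24 range here is a placeholder, not an actual ARC-TS range)
  /export/mydata  10.0.0.0/24(rw,no_subtree_check)

  # re-read the export table without restarting the NFS server
  exportfs -ra

Once the export is in place, we mount it on the Flux nodes and route the traffic through the gateway from our side.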

If you are interested in the technical details of how the gateway works, this presentation from Mellanox on the Ethernet over InfiniBand (EoIB) technology used by the system should prove informative. There is no need to know anything about EoIB in order to use the service; the link is provided strictly for the curious.

Next Flux Bulk Purchase and Flux Operating Environment

Update 9/29: Final quoting took longer than expected; see our post for details. Additions to the order must be placed by October 13th.

Update 8/28: The date for expressing interest was extended to Tuesday, September 8th.  After September 8th, a final pricing proposal will be sent to the vendors.

Flux will be purchasing new cores for the Standard and Large Memory services. Because we realize that not all funding sources allow for the purchase of a service like Flux, we provide the Flux Operating Environment (FOE).

FOE is the Flux service minus the hardware; thus a grant that provides only hardware (capital) funds can add nodes to Flux, with ARC-TS providing login, storage, networking, support, power, etc.

More importantly, for grants submitted from LSA, COE, and the Medical School there is no cost for placing grant nodes in FOE. Thus the only cost to the researcher is the node itself, and the researcher is granted dedicated access to it.

Because Flux will be making a larger (4,000-core) purchase, any faculty with such funds are invited to join our purchase process. If you are interested, email hpc-support@umich.edu by August 28th with your node requirements.

Flux 7 two-socket nodes:
  • 128 GB RAM
  • 2 x E5-2680v3 CPUs (24 cores total)
  • 3 TB 7200 RPM HDD or 1 TB 7200 RPM HDD
  • EDR InfiniBand (100 Gbit ConnectX-4)

Flux 7 four-socket nodes:
  • 1,024-2,048 GB RAM
  • 4 x E5-class CPUs (40-48 cores)
  • 3 TB 7200 RPM HDD
  • EDR InfiniBand (100 Gbit ConnectX-4)
Faculty purchasing their own nodes via FOE can modify the drive and memory types and quantities to match their needs and will likely still benefit from the bulk purchasing power of buying alongside Flux. Researchers who wish to purchase other specialty nodes (GPU, Xeon Phi, FPGA, Hadoop/Spark, etc.) are still encouraged to contact us.

Sunday, August 2, 2015

XSEDE15 Updates

We recently returned from XSEDE15, where we represented Michigan and learned about the new resources and features coming online at XSEDE. What follows are our notes. There will also be a one-hour live-streamed webinar on August 6th at 10:30am; if you have questions, please attend:

Webinar: ARC-TS XSEDE[15] Faculty and Student Update
Location: http://univofmichigan.adobeconnect.com/flux/
Time: August 6th 10:30am-11:30am
Login: Select Guest and use your uniqname

Champions Program Update

Michigan currently participates in the Campus Champions program via the staff at ARC-TS.  There are two newer programs that faculty and students might take interest in:

Domain Champions

Domain Champions are XSEDE participants, like Campus Champions, but organized by field. These Champions are available nationally to help researchers in their fields even if those researchers do not use XSEDE resources:

Domain | Champion | Institution
Data Analysis | Rob Kooper | University of Illinois
Finance | Mao Ye | University of Illinois
Molecular Dynamics | Tom Cheatham | University of Utah
Genomics | Brian Couger | Oklahoma State University
Digital Humanities | Virginia Kuhn | University of Southern California
Digital Humanities | Michael Simeone | Arizona State University
Chemistry and Material Science | Sudhakar Pamidighantam | Indiana University

Student Champion

The Student Champions program is a way for students (graduate students preferred, but not required) to get more involved in supporting researchers in research computing. Michigan does not currently have any Student Champions. If you are interested, contact ARC-TS at hpc-support@umich.edu.

New Clusters and Clouds

Many of the new XSEDE resources coming online or already available are adding virtualization capability. This ability is sometimes called "cloud" but can have subtle differences depending on which resources you are using. If you have questions about using any of the XSEDE resources, contact ARC-TS at hpc-support@umich.edu.

NSF and XSEDE have recognized that data plays a much larger role than in the past.  Many of the resources have added persistent storage options (file space that isn't purged) as well as database hosting and other resources normally not found on HPC clusters.

Wrangler 

Wrangler is a new data-focused system and is in production. Notable features are:
  • iRODS service available, along with persistent storage options (see the sketch after this list)
  • Can host long-running reservations for databases and other services if needed
  • 600 TB of directly attached flash storage. This storage can change its identity to provide different service types (GPFS, object, HDFS, etc.) and sustains over 4.5 TB/minute on the TeraSort benchmark.
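As a rough sketch of what the iRODS service can look like in practice (the file name below is made up), the standard iRODS icommands can be used to stage data into and out of Wrangler's persistent storage:

  # authenticate to the iRODS zone (prompts for connection details the first time)
  iinit

  # upload a local file into your current iRODS collection and list the collection
  iput results.dat
  ils

  # later, pull the file back down for post-processing
  iget results.dat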

Comet

Comet is a very large traditional HPC system recently put into production. It provides over 2 petaflops of compute, mostly in the form of 47,000+ CPU cores. Notable features are:
  • Hosts Virtual Clusters: customized cluster images for researchers who need to make modifications that are not possible in the traditional batch environment
  • 36 nodes with 2 x NVIDIA K80 GPUs (4 GPU dies total per node)
  • SSDs in each node for fast local I/O
  • 4 nodes with 1.5 TB of RAM

Bridges

Bridges is a large cluster that will support more interactive work, virtual machines, and database hosting along with traditional batch HPC processing. Bridges is not yet in production; some notable features are:
  • Nodes with 128 GB, 3 TB, and 12 TB of RAM
  • Reservations for long-running database, web server, and other services
  • Planned support for Docker containers

Jetstream

Jetstream is a cloud platform for science. It is OpenStack-based and will give researchers great control over their exact computing environment. Jetstream is not yet in production; notable features are:
  • Libraries of VMs will be created and hosted in Atmosphere; researchers will be able to contribute their own images or use images already configured for their needs
  • Split across two geographically distant national sites

Chameleon

Chameleon is an experimental environment for large-scale cloud research. It allows researchers to reconfigure images not only as virtual machines but also on bare metal. Chameleon is now in production; some notable features are:
  • Geographically separated OpenStack private cloud
  • Not allocated by XSEDE, but allocated in a similar way

CloudLab

CloudLab is a unique environment where researchers can deploy their own cloud to do research about clouds or on clouds. It is in production; some notable features are:
  • Able to prototype entire cloud stacks under researcher control, or run on bare metal
  • Geographically distributed across three sites
  • Supports multiple network types (Ethernet, InfiniBand)
  • Supports multiple CPU types (Intel/x86, ARM64)

XSEDE 2.0

XSEDE was a 5-year proposal, and we are wrapping up year 4. The XSEDE proposal doesn't actually provide any of the compute resources; these are separate awards and are allocated only through the XSEDE process. A new solicitation has been issued for another 5 years, and a response is currently under review by NSF. The next generation of XSEDE aims to be even more inclusive and to focus more on data-intensive computing.

XSEDE Gateways, Get on the HPC Train With Less Effort

We have written about XSEDE (Arc Docs) before; it is a set of national computing resources for research.

XSEDE Gateways, on the other hand, are simple, normally web-based front ends to the XSEDE computers for specific areas of interest. They lower the barrier to getting started with supercomputers in research, and they are also a great educational tool.

List of current XSEDE Gateways. 

One might want to use a gateway for the following reasons:
  • Not comfortable using supercomputers at the command line
  • Don't need the power of a huge system, but need more than a laptop
  • Looking for an easy way to introduce new users to an area of simulation
  • Undergraduate work supplementing course material
A snapshot of some portals (over 30 at this writing):


Gateway | Research Area
The iPlant Collaborative Agave API | Integrative Biology and Neuroscience
VLab - Virtual Laboratory for Earth and Planetary Materials | Materials Research
NIST Digital Repository of Mathematical Formulae | Mathematical Sciences
Integrated database and search engine for systems biology (IntegromeDB) | Molecular Biosciences
ROBETTA: Automated Prediction of Protein Structure and Interactions | Molecular Biosciences
Providing a Neuroscience Gateway | Neuroscience Biology
General Automated Atomic Model Parameterization | Physical Chemistry
SCEC Earthworks Project | Seismology
Asteroseismic Modeling Portal | Stellar Astronomy and Astrophysics
CIPRES Portal for inference of large phylogenetic trees | Systematic and Population Biology
Computational Anatomy | Visualization, Graphics, and Image Processing