Sunday, April 26, 2015

Filetransfer Tool Performance

On the HPC systems at ARC-TS we have two primary tools for transferring data, scp (secure copy), and Globus (GridFTP).  Other tools like rsync and sftp operate over scp and thus will have performance comparable to that tool.

So which tool performs the best?  We are going to test two cases each moving data to the XSEDE machine Gordon at SDSC. One test will be for moving a single large file, the second will be many small files.

Large file case.

For the large file we are moving a single 5GB file from Flux's scratch directory to the Gordon scratch directory.  Both filesystems can move data at GB/s rates so the network or tool will be the bottleneck.

scp / gsiscp

[brockp@flux-login2 stripe]$ gsiscp -c arcfour all-out.txt.bz2
                             gordon.sdsc.xsede.org:/oasis/scratch/brockp/temp_project
india-all-out.txt.bz2                     100% 5091MB  20.5MB/s  25.6MB/s   04:08   

Duration: 4m:08s

Globus 

Request Time            : 2015-04-26 22:41:04Z
Completion Time         : 2015-04-26 22:42:44Z
Effective MBits/sec     : 427.089

Duration: 1m:40s  2.5x faster than SCP

Many File Case

In this case the same 5GB file was split into 5093 1MB files.  Many may not know that every file has overhead, and that it is well known that moving many small files of the same size is much slower than moving one larger file of the same total size.  How much impact and can Globus help with this impact read below.

scp / gsiscp

[brockp@flux-login2 stripe]$ time gsiscp -r -c arcfour iobench
                             gordon.sdsc.xsede.org:/oasis/scratch/brockp/temp_project/
real 28m9.179s

Duration: 28m:09s

Globus

Request Time            : 2015-04-27 00:18:40Z
Completion Time         : 2015-04-27 00:25:30Z
Effective MBits/sec     : 104.423

Duration: 7m:50s  3.6x faster than SCP

Conclusion

Globus provides significant speedup both for single large files and many smaller files over scp.  The result is even more significant the smaller the files because of the overhead in scp doing one file at a time.