ARC and CAEN-HPC recently announced our second-generation
Hadoop cluster. The first-generation cluster ran for about a
year and served to validate the software and the usefulness of the platform. This new-generation platform represents a
real investment in hardware to match the software and enable true data science
at scale.
The Fladoop Cluster
Architecture
One of the core components of Hadoop is a distributed file
system known as HDFS. The power of Hadoop is its ability to break work up and
move it to the chunks of data, known as blocks, that sit on a drive local to the
machine doing the work.
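
To make the idea of blocks concrete, here is a minimal sketch of how a file could be cut into fixed-size pieces. The function name split_into_blocks is hypothetical, and the 128 MiB block size is an assumption for illustration; the block size actually configured on Fladoop is not stated here.

```python
MIB = 1024 * 1024

def split_into_blocks(file_size_bytes, block_size_bytes=128 * MIB):
    """Return a list of (offset, length) pairs, one per block.

    The 128 MiB default is an assumed value, not Fladoop's configuration.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be non-negative / positive")
    blocks = []
    offset = 0
    while offset < file_size_bytes:
        # The last block may be shorter than the rest.
        length = min(block_size_bytes, file_size_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return blocks
```

Each (offset, length) pair is a unit of work that can be handed to whichever machine already holds that piece of the file.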
Compared to a traditional compute-focused HPC cluster such
as Flux, this feature of Hadoop requires nodes with many local hard
drives and a fast TCP network for shuffling large quantities of data between
nodes.
Fladoop, the ARC Hadoop cluster for Data Science, consists of
7 data nodes. Each of these nodes has 64GB of main memory, 12 CPU cores and 12
local hard drives. The network tying
them together is 40Gbit Ethernet, using Chelsio T580-LP-CR adapters all attached to an Arista 7050q switch. This is 40 times the bandwidth per host of standard gigabit Ethernet, and 4 times that of standard high-performance 10Gig-E.
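
The arithmetic behind these figures can be sketched as follows. The node counts and link speed come from the description above; the helper names are hypothetical, and the transfer times assume an ideal link running at full line rate with no protocol overhead, which real transfers will not reach.

```python
# Figures taken from the cluster description above.
NODES = 7
MEMORY_GB_PER_NODE = 64
CORES_PER_NODE = 12
DRIVES_PER_NODE = 12

def cluster_totals():
    """Aggregate resources across all data nodes."""
    return {
        "memory_gb": NODES * MEMORY_GB_PER_NODE,
        "cores": NODES * CORES_PER_NODE,
        "drives": NODES * DRIVES_PER_NODE,
    }

def ideal_transfer_seconds(gigabytes, link_gbit):
    """Seconds to move `gigabytes` (decimal GB) over a link of `link_gbit`
    Gbit/s, assuming full line rate and zero overhead (an idealization)."""
    return gigabytes * 8 / link_gbit
```

Under these assumptions, moving 100 GB between two hosts takes 800 seconds on gigabit Ethernet, 80 seconds on 10Gig-E and 20 seconds on the 40Gbit links.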
HDFS -- Hadoop File System
HDFS has some unique features that help with both data
integrity and performance. HDFS manages
each hard drive directly and independently.
Because hard drives fail, HDFS by default stores 3 copies
of the data, on at least 2 unique nodes, and tracks how many copies are available at any time. If a drive or node fails, HDFS automatically
makes new copies from the remaining copies.
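
The recovery behavior described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how HDFS is implemented: it puts each copy on a different node (stricter than the "at least 2 unique nodes" rule), ignores individual drives, and uses hypothetical names throughout.

```python
import random

REPLICATION = 3  # default number of copies described above

def place_replicas(nodes, rng, replication=REPLICATION):
    """Pick `replication` distinct nodes to hold one block (a simplification)."""
    return set(rng.sample(nodes, replication))

def handle_node_failure(block_map, failed, live_nodes, rng,
                        replication=REPLICATION):
    """Forget copies on the failed node, then re-copy from the survivors
    until every block is back at the target number of copies."""
    for holders in block_map.values():
        holders.discard(failed)
        if not holders:
            continue  # every copy was lost; nothing left to copy from
        candidates = [n for n in live_nodes if n not in holders]
        missing = replication - len(holders)
        holders.update(rng.sample(candidates, min(missing, len(candidates))))
    return block_map
```

With 7 nodes and one failure, every block that had a copy on the failed node still has two surviving copies to read from, so the model can always restore the third.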
These copies can also improve performance, at the
cost of total available space. Because Hadoop tries to move work to where the
data are local, there are now 3 possible places to do that before falling back
to accessing the data over the network.
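
A minimal sketch of that placement decision follows. The function assign_task and its arguments are hypothetical, and the logic is deliberately simplified: it only distinguishes "local" from "not local", whereas the real Hadoop scheduler weighs more factors.

```python
def assign_task(replica_nodes, free_slots):
    """Choose a node to process one block.

    replica_nodes: nodes holding a copy of the block.
    free_slots: dict mapping node name to number of free task slots
                (mutated when a slot is taken).
    Returns (node, is_local); node is None if the cluster is full.
    """
    # First choice: any node that already has the data on a local drive.
    for node in replica_nodes:
        if free_slots.get(node, 0) > 0:
            free_slots[node] -= 1
            return node, True
    # Fallback: any free node, which must read the block over the network.
    for node, slots in free_slots.items():
        if slots > 0:
            free_slots[node] -= 1
            return node, False
    return None, False
```

With three copies there are three chances to hit the first branch; with a single copy there would be only one.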
Getting Started
Use of the new cluster is free of charge. People interested
in using it should contact hpc-support@umich.edu
to get an account and look over the documentation. ARC also provides
consulting to explore whether the platform will work well for your application.