High-performance computing (HPC) clusters are the ideal platform for companies seeking insights from computer simulations – whether to optimize the airflow around an aircraft, improve the combustion process inside an engine, create climate models, or assess the risk of an investment portfolio.
Companies of all sizes are discovering these computer networks for performing complex mathematical calculations: according to Hyperion Research, a management consultancy specializing in this field, global revenues from high-performance computing will exceed 19.5 billion US dollars by 2022. A growing share of this budget will be invested in the cloud: the management consultancy Gartner estimates that the proportion of companies running HPC clusters in the cloud will rise to 30 percent by 2023.
If you too are thinking about purchasing an HPC cluster for your business, you need to decide whether to run it in your own data center or in the public cloud. Let's take a closer look at these two options.
What is an HPC cluster and how is it structured?
An HPC cluster consists of a network of often hundreds or thousands of interconnected servers. Combined, they achieve processing speeds that were once the preserve of supercomputers, while being far more cost-effective and easier to scale.
The individual computers in such a cluster are called "nodes", and a cluster usually comprises several different node types. Here are the three most important ones:
- Head node or master node: This node controls the others and assigns them tasks. In most cases, the user also logs in via this node – which is why it is often called the login node.
- Compute nodes: These are the workhorses of the cluster and perform the actual calculations. They are usually equipped with many high-clock-rate CPUs and plenty of RAM, while local disks are kept rather small.
- Storage nodes or storage system: These nodes provide the cluster's persistent storage. They usually run a high-performance parallel file system (PFS) that allows all nodes to access the storage drives in parallel.
Companies planning to operate such an HPC cluster can choose between three operating models:
- All cloud: The cluster runs entirely in the cloud.
- On premise: The HPC cluster is operated entirely in a dedicated data center.
- Cloud bursting: The cluster is operated in its own data center, but during peak loads, it draws on reserves in the public cloud.
Let us take a closer look at each of these three options.
HPC clusters in the cloud
With the all-cloud option, all components of the HPC infrastructure – master node, compute nodes, and storage nodes – are located in the cloud. The user connects to the cluster's master node remotely, for example via SSH, to configure and launch the HPC workload.
Companies are quickly becoming aware of the advantages of cloud computing in the HPC sector. As a result, Hyperion forecasts that spending on high-performance computing in the cloud will increase by 83 percent by 2022.
This is not surprising, as the cloud offers several advantages. It reduces deployment time and cost: by using HPC in the cloud, you avoid the high initial investment and lengthy procurement cycles of building your own data center. A cluster in the cloud can be created at any time, scaled as needed, and dissolved again when the project is complete – you only pay for the capacity you use. In addition, you can choose from a variety of hardware configurations, such as different CPU or GPU types, try them out for a short time, and decide which one delivers the best results for your specific workload.
In addition, cloud vendors often offer pricing models that provide significant cost savings by allowing you to manage your workloads with time flexibility:
- On-demand: Computing resources are billed per hour or per second. This option is well suited to occasional workload peaks.
- Reserved: A time block is reserved for a specific, pre-scheduled job. Some cloud vendors offer a lower price for such pre-planned work.
- Spot: The user bids for unused capacity in the cloud network. This option often offers a lower price for jobs that can run outside peak hours.
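The cost impact of choosing between these models can be estimated with simple arithmetic. The sketch below compares the three options for a hypothetical 1,000 core-hour job; all prices are invented placeholders, so check your provider's actual price list.

```python
# Back-of-the-envelope comparison of the three pricing models for a
# hypothetical 1,000 core-hour job. All rates are invented
# placeholders, not real provider prices.
ON_DEMAND_PER_CORE_HOUR = 0.10   # pay-as-you-go rate
RESERVED_PER_CORE_HOUR  = 0.07   # discounted, pre-scheduled block
SPOT_PER_CORE_HOUR      = 0.03   # bid price for unused capacity

def job_cost(core_hours: float, rate: float) -> float:
    """Total price of a job at a given per-core-hour rate."""
    return core_hours * rate

core_hours = 1000
for name, rate in [("on-demand", ON_DEMAND_PER_CORE_HOUR),
                   ("reserved", RESERVED_PER_CORE_HOUR),
                   ("spot", SPOT_PER_CORE_HOUR)]:
    print(f"{name}: {job_cost(core_hours, rate):.2f} USD")
```

For time-flexible batch jobs, the spot model can cut the bill to a fraction of the on-demand price – at the risk of interruptions when capacity is reclaimed.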
While a cluster in your own data center runs the risk of becoming obsolete due to technological changes, HPC in the cloud gives you access to new technologies without having to replace hardware. It also frees IT administrators from managing capital-intensive physical hardware so they can focus on software development and analysis.
However, an HPC cluster in the cloud cannot relieve your IT administrators of all management tasks. They will still be busy managing and maintaining operating systems, networks (in the form of VPCs), application software, and distributed software frameworks. In addition, new challenges emerge, such as managing cloud credentials, VPNs, direct-connect and PLAS offerings, and data synchronization.
You also need to consider whether your workload is suitable for purely cloud-based processing. Compute-intensive applications that require little communication between nodes and little data movement over network connections benefit most from the easy scalability of cloud computing power. A typical example is genome sequencing.
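One rough way to quantify "little communication between nodes" – an illustration we add here, not a claim of the article – is Amdahl's law: the fraction of a job spent on non-parallelizable work such as inter-node communication caps the speedup you can buy by adding nodes.

```python
# Amdahl's-law estimate (illustrative): the serial/communication
# fraction of a job limits the speedup achievable by adding nodes.
def speedup(serial_fraction: float, n_nodes: int) -> float:
    """Theoretical speedup on n_nodes for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)

# A communication-light job (1% serial share) scales well on 100
# cloud nodes; a communication-heavy one (30% serial share) does not:
print(round(speedup(0.01, 100), 1))  # 50.3
print(round(speedup(0.30, 100), 1))  # 3.3
```

This is why workloads such as genome sequencing, where tasks are nearly independent, are such a good fit for elastic cloud capacity.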
In short: The all-cloud option is ideal for these scenarios:
- You want to quickly set up an HPC system without having to wait for the purchase, installation, and configuration of the hardware.
- You need the HPC cluster only temporarily or expect that it won't be used enough to justify the cost of an on-premise cluster.
- You have a compute-intensive workload that requires little communication between nodes.
Operating HPC clusters on-premise
The question of whether companies should run their simulations "on premise" – i.e., in their own data center – or in the cloud is primarily a cost issue. Setting up your own data center is a long-term investment: not only servers, software, and networking must be provided, but also power, cooling, buildings, personnel, and insurance. The costs of high-performance computing are usually compared via the hourly price of operating one CPU core (the "core-hour"). Naturally, the core-hour price of an in-house HPC data center decreases with better utilization and longer operation. You must also take into account that hardware such as processors becomes obsolete over time and has to be replaced with newer models.
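How strongly utilization drives the on-premise core-hour price can be sketched with a simplified amortization calculation; all figures below are invented placeholders, not real cost data.

```python
# Simplified on-premise core-hour cost as a function of utilization.
# All figures are invented placeholders for illustration only.
def on_prem_core_hour_cost(total_cost: float, cores: int,
                           years: float, utilization: float) -> float:
    """total_cost covers hardware, power, cooling, staff etc. over the period."""
    available_hours = cores * years * 365 * 24
    return total_cost / (available_hours * utilization)

# A hypothetical 1,000-core cluster costing 2 million over 4 years:
print(f"{on_prem_core_hour_cost(2_000_000, 1000, 4, 0.9):.3f}")  # 0.063
print(f"{on_prem_core_hour_cost(2_000_000, 1000, 4, 0.3):.3f}")  # 0.190
```

At 90 percent utilization, the cluster in this toy example is three times cheaper per core-hour than at 30 percent – which is exactly why a dedicated data center only pays off under sustained, heavy use.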
However, in addition to cost, the type of task to be solved with the HPC cluster also plays a role. In an in-house data center, companies can work "close to the metal" and tailor the hardware optimally to the task. In this way, latencies in network and inter-node communication can be reduced to a minimum. Since a cloud solution usually relies on virtual machines simulated in software, it typically cannot match the low latencies of an in-house HPC data center. Furthermore, when operating an HPC cluster in the public cloud, the workload must first be uploaded via the Internet; extremely large workloads may be impractical or uneconomical to move this way.
As a result, an on-premise solution tends to be better suited for massive workloads that require a lot of communication between nodes and over networks. Ultimately, however, the question of return on investment and the cost per core-hour will always be the deciding factor.
In short: The on-premise option is ideal for these scenarios:
- You expect to use the HPC cluster over a long period of time and with high intensity.
- Your task requires a lot of communication between nodes and over networks.
- You have an extremely large workload whose upload and configuration over the Internet would cause delays or high costs.
Cloud bursting: combining on-premise and cloud
In a bursting scenario, companies operate a master node, compute nodes, and storage nodes in their local data center – exactly as in the on-premise option. During peak loads, however, they also add compute nodes from a cloud provider.
The cloud bursting configuration combines the advantages of the on-premise option with those of a cloud solution. You can reduce the cost and maintenance of your HPC data center while retaining the ability to run large workloads on demand.
If you already own an HPC system and are looking for a cost-effective way to upgrade, bursting is the best option. Even if you are planning to build a new HPC system, bursting can save you money: you design your local HPC capacity for less than the expected peak load and scale up to the cloud as needed.
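The sizing idea behind bursting can be sketched in a few lines: provision the local cluster for the typical load and rent cloud nodes only for the hours in which demand exceeds that baseline. The node counts below are invented placeholders.

```python
# Sketch of the bursting idea: size the local cluster for typical
# load and add cloud nodes only when demand exceeds local capacity.
# All node counts are invented placeholders.
def burst_nodes_needed(demand, local_nodes):
    """Cloud nodes to rent per time slot when demand exceeds local capacity."""
    return [max(0, d - local_nodes) for d in demand]

hourly_demand = [40, 60, 120, 200, 90, 50]  # compute nodes needed per hour
local = 100                                  # on-premise capacity
print(burst_nodes_needed(hourly_demand, local))  # [0, 0, 20, 100, 0, 0]
```

In this toy example, a 100-node local cluster covers four of the six hours entirely, and the cloud absorbs only the two peak hours – instead of buying 200 nodes that would sit idle most of the time.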
A bursting configuration requires a secure, suitable method of Internet access to integrate the cloud nodes with the local infrastructure. Network access is typically provided via a VPN gateway or a dedicated high-bandwidth connection such as Direct Connect or PLAS.
In summary, cloud bursting is ideal for these scenarios:
- You want to upgrade an existing HPC cluster in a data center but do not want to invest in expensive hardware.
- You are planning to set up an HPC cluster in your own data center and want to save costs by using the public cloud during peak workloads.
Conclusion
HPC clusters provide powerful computing resources for simulations and modeling. Implementation in a dedicated data center is only worthwhile if high utilization is ensured over an extended period. In most cases, however, companies plan calculations on a temporary basis, so operation in the cloud is usually more cost-effective. Existing HPC clusters can also be expanded cost-efficiently through cloud bursting, drawing on cloud resources during load peaks. With high-performance computing from the Deutsche Telekom cloud, you can grow flexibly – from one to 720,000 cores.
Do you have questions?
We answer your questions about testing, booking, and use – free of charge and tailored to your needs. Give it a try! Our hotline is available 24 hours a day, 7 days a week:
0800 33 04477 from Germany / 00800 33 04 47 70 from abroad