In this article, high-performance computing expert Alfred Geiger explains:
- which different methods are available to implement high-performance computing with the public cloud
- how companies can find out which one best suits their particular application
- and why the Open Telekom Cloud is the best offer on many levels compared to other public clouds.
Which IT concept is the fastest way for companies to solve complex problems? A general answer is the cloud bursting principle: a clever combination of on-premises and high-performance public cloud computing resources. On-premises systems cover certain basic tasks, while scalable high-performance capacity from the public cloud is used as required to absorb peak loads.
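The cloud bursting principle can be illustrated with a minimal sketch. It assumes a deliberately simple, hypothetical policy: jobs are placed on-premises as long as free cores remain, and everything else goes to the public cloud. The `Job` class, the `place_jobs` function and all numbers are invented for illustration and do not describe any Open Telekom Cloud interface.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int  # cores the job needs while it runs


def place_jobs(jobs, on_prem_cores):
    """Assign each job to on-premises capacity first; burst the rest to the cloud.

    Hypothetical greedy policy for illustration only: jobs are considered in
    the order given, and a job is never split across both locations.
    """
    free = on_prem_cores
    placement = {}
    for job in jobs:
        if job.cores <= free:
            placement[job.name] = "on-premises"
            free -= job.cores
        else:
            placement[job.name] = "public cloud"
    return placement


# Illustrative workload: two small jobs and one peak-load simulation.
jobs = [Job("nightly-report", 16), Job("crash-simulation", 512), Job("render", 32)]
print(place_jobs(jobs, on_prem_cores=64))
```

In this example the two small jobs stay on the 64 local cores, while the 512-core simulation is sent to the public cloud. A real scheduler would also weigh data transfer, cost and queue times.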
If you look under the hood, things get more complex. High-performance computing (HPC) from the public cloud offers a wide variety of options. Companies should therefore know some basic facts about this topic, because not every technology is equally suitable for every application.
When it comes to the processor: CPU or GPU?
Virtual machines with virtualized central processing units (vCPUs), for example, are in principle suitable for working through complex computations in a short time. The prerequisite is that many powerful vCPUs work on a problem at the same time. In this area, the Open Telekom Cloud offers high-performance virtual machines with up to 256 vCPUs. Another special feature of these high-performance flavors is that they are made available to companies on a dedicated basis: the hardware underlying the HPC flavors is not shared with other customers. This ensures consistently high performance over the entire booking period.
With CPUs, companies can achieve even more performance only with so-called bare metal servers. As the name suggests, companies get the bare metal: they rent dedicated resources from the cloud without a hypervisor or virtualization layer. The computing resources come with nothing more than a host operating system. This gives users constant performance and maximum design freedom, because they can set up bare metal servers entirely as they wish, including with their own hypervisors and virtualization technologies.
However, depending on the scaling and deployment scenario, computing with vCPUs only makes sense up to a point. With the technology currently used in the Open Telekom Cloud, virtual computing clusters become inefficient beyond a certain size. The reason: there is a natural limit to the number of x86 computing cores that can usefully work on one problem at the same time. The internal network that connects the vCPUs contributes to this limit, because its bandwidth and latency are the limiting factors. Because the vCPUs of the servers within the Open Telekom Cloud are networked with each other using so-called InfiniBand technology, companies can operate computing clusters of up to around 1,000 cores here without blocking. That’s a very high value: with other public cloud services such as Amazon Web Services or Microsoft Azure, the limit is currently still far lower.
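Why clusters become inefficient beyond a certain size can be made tangible with a toy model. The sketch below assumes a runtime that consists of a serial part, a part that divides evenly across cores, and a network cost that grows with the logarithm of the core count. All parameter values are invented for illustration; they are not measurements from the Open Telekom Cloud or any other platform.

```python
import math


def runtime(cores, serial=0.01, parallel=0.99, comm_per_step=0.0005):
    """Normalized runtime of one job on `cores` cores (toy model).

    serial        -- fraction of the work that cannot be parallelized
    parallel      -- fraction that divides evenly across cores
    comm_per_step -- assumed network cost per communication step; the number
                     of steps is taken to grow with log2(cores), as in a
                     tree-shaped exchange
    """
    comm = comm_per_step * math.log2(cores) if cores > 1 else 0.0
    return serial + parallel / cores + comm


def efficiency(cores, **kwargs):
    """Speedup divided by core count: 1.0 means perfect scaling."""
    return runtime(1, **kwargs) / (cores * runtime(cores, **kwargs))


for n in (1, 16, 128, 1024, 8192):
    print(f"{n:5d} cores: efficiency {efficiency(n):.2f}")
```

In this model, efficiency falls as cores are added, and a lower `comm_per_step` (standing in for a faster, lower-latency interconnect) pushes the point of diminishing returns further out.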
Demand-oriented supercomputer from the cloud
Companies that need even more computing power should rely on computing resources from special supercomputing facilities, such as the High Performance Computing Center in Stuttgart (HLRS), which is directly linked to the Open Telekom Cloud. In such facilities, liquid cooling enables a higher packing density than in cloud data centers. Workloads with currently up to 180,000 cores can be processed at the HLRS simultaneously. The HLRS is currently the only data center of its kind in which companies can book resources according to their needs on a pay-as-you-go basis, meaning that they pay only for as long as they use the resources. Via T-Systems, every company can therefore get a supercomputer from the cloud that meets its needs.
For AI & Co.: GPUs are the best option
But whether virtualized or not, for some application scenarios it doesn’t make sense to use processors with a conventional CPU architecture, because CPUs are general-purpose tools suitable for many different applications. They can do a lot, but are not really specialized for any one application. They are like a Swiss army knife that has a lot to offer, from bottle openers to screwdrivers, but that should be replaced by a specialized tool when it really counts, because the specialized tool simply does the job better and faster.
The same applies to IT: for example, graphics cards with dedicated graphics memory and their own graphics processing units (GPUs) are the more suitable tool for applications in the field of artificial intelligence (AI). Compared to CPUs, they have a very large number of simple processor cores, which enables them to perform many similar operations in parallel. Suitable tasks are therefore processed very quickly and cost-effectively. GPUs are thus the first choice not only for graphics applications such as computer-aided design (CAD), rendering or graphics-intensive video games, but also for machine learning, i.e. for training AI models, where large amounts of data often have to be analyzed and processed in the shortest possible time.
When it gets special: individual computing power
Every technology mentioned also has its disadvantages. CPUs can do a lot relatively well, but nothing outstandingly well. That's why high speeds require many CPUs working in parallel, which makes the use of CPUs expensive and inefficient above a certain limit. GPUs, on the other hand, calculate certain tasks very quickly with a very large number of computing cores, but only certain tasks, because this works only if every core is given the same kind of task.
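The restriction that every GPU core should get the same task can be illustrated with a simplified cost model. It assumes that a group of cores ("lanes") executes in lockstep, so the group has to work through every distinct branch that any of its lanes needs. This is a sketch of the general idea, not a description of a specific GPU; the function names and costs are hypothetical.

```python
def lockstep_cost(branch_per_lane, branch_cost):
    """Time steps a lockstep group needs when lanes may take different branches.

    Simplified model (an assumption, not a description of any specific GPU):
    all lanes run the same instruction at once, so the group has to run every
    distinct branch that any lane needs, one after the other.
    """
    return sum(branch_cost[b] for b in set(branch_per_lane))


def independent_cost(branch_per_lane, branch_cost):
    """Time steps if every lane were a fully independent core."""
    return max(branch_cost[b] for b in branch_per_lane)


cost = {"a": 10, "b": 10, "c": 10}
uniform = ["a"] * 32                       # every lane does the same work
mixed = ["a", "b", "c"] * 10 + ["a", "b"]  # 32 lanes, three different tasks

print(lockstep_cost(uniform, cost), independent_cost(uniform, cost))  # 10 10
print(lockstep_cost(mixed, cost), independent_cost(mixed, cost))      # 30 10
```

With uniform work, the lockstep group is as fast as independent cores would be. With three different tasks mixed across the lanes, it takes three times as long in this model, which is why irregular workloads fit GPUs poorly.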
But which technology should companies use if the tasks to be solved do not fit into these categories? This is the case, for example, when certain computational processes need to be accelerated in order to react to new circumstances in real time instead of waiting overnight for results. Examples include online shop operators who want to calculate their prices based on current demand and the market environment (after all, they don't want to sell their products too cheaply) and operators of trading platforms who need to react as quickly as possible to market developments in order to minimize losses.
Applications in the field of weather forecasting are also conceivable, for example to warn all participants at major events about a sudden storm in good time and bring them to safety. The same goes for companies that want to shorten their time-to-market by considerably accelerating simulations and product development, even though the latter certainly does not take place in real time.
ASICs: Fast, but expensive and not flexible
In traditional IT, companies use application-specific integrated circuits (ASICs) to significantly accelerate certain partial steps of tasks. ASICs are like nimble dinghies that make slow tankers more agile: circuits that are designed and manufactured solely for one specific (partial) task of a process. One example is a controller in a car that processes the signals of a radar sensor, monitors the area in front of the vehicle and immediately triggers emergency braking if the end of a traffic jam unexpectedly appears behind the next bend.
However, ASICs are very expensive to develop, so they only pay off in mass production, for example in automotive engineering, where ASICs are installed across different model series in hundreds of thousands or even millions of vehicles. Another serious disadvantage of ASICs is that they cannot be changed: they do what they were built for, and there is no reprogramming.
FPGAs: Process accelerator on demand
FPGAs, on the other hand, combine the flexibility of freely programmable hardware with the specialization of ASICs. Hence their name: Field Programmable Gate Arrays. Companies can now book FPGAs from the Open Telekom Cloud according to their needs, which enables them to significantly accelerate a very wide range of processes. However, this requires the appropriate expertise, because FPGAs also need to be programmed properly.
Only very few companies have the appropriate personnel for this. It’s a challenge that the start-up Xelera solves for companies: The Darmstadt-based company has developed a middleware that allows companies to considerably accelerate their processes with the help of FPGAs without having to be familiar with them. Depending on the scaling, the start-up can provide an acceleration of more than a factor of 100.
But even without such middleware acting as an intermediary between FPGA and certain applications, companies without their own expertise can still use FPGAs. There are ready-made software packages that companies can use to program FPGAs, for example in the areas of personalized medicine and data analytics. In addition, Deutsche Telekom experts are on hand to help companies accelerate highly specialized applications with the help of FPGAs.
Conclusion: The right HPC technology for every need
The technical achievements of the public cloud enable companies to book high-performance computing resources on demand. With CPUs, GPUs and FPGAs, companies have the right technology for every need. For extreme requirements, the Open Telekom Cloud offers a unique, needs-based, scalable supercomputer on demand through its connection to the High Performance Computing Center (HLRS) in Stuttgart.
Why high-performance computing from the Open Telekom Cloud?
Telekom's public cloud offers companies in the high-performance computing sector many advantages that other public cloud providers cannot offer.
- Holistic approach: With high-performance computing from the Open Telekom Cloud, companies can map complete workflows. The Open Telekom Cloud offers all the tools and expertise for this, because Telekom's specialist contacts provide companies with help and advice: from development and implementation to the operation of a process or application that requires high-performance resources.
- InfiniBand network technology: If it says HPC on the outside, then it’s also HPC on the inside. The high-performance computing resources of the Open Telekom Cloud are designed for high performance throughout on the hardware side, from the clock frequency and the main memory to the InfiniBand network connection. In this way, companies can work in a data-centric way, that is, use all data from a single source. This avoids loading times that eat into performance. This offer goes far beyond what conventional cloud providers have to offer.
- Hybrid cloud: With the Open Telekom Cloud Hybrid Solution, Telekom offers a combination of public and private cloud on the same hardware and software basis. This enables companies to implement cloud bursting scenarios more quickly and easily.
- Direct connection to Stuttgart: With the Open Telekom Cloud, Telekom not only offers companies HPC capacities on demand that can be used and paid for as needed. In addition, companies can use the supercomputing capacities of the High Performance Computing Center Stuttgart (HLRS), based on the same pay-as-you-go model. This is unique on the market, as supercomputing has so far only been offered by other providers in a dedicated supply model.
- Expert advice directly from the provider: Deutsche Telekom employs dozens of HPC and supercomputing experts who advise companies on all aspects of their complex problems.
- Security and data protection: The Open Telekom Cloud is one of the most secure public cloud offerings in the world. The German data centers have already been certified several times by an independent body and have received awards for their high level of data security and data protection. In addition, Deutsche Telekom's own "Telekom Security" division employs countless experts who keep an eye on the security of their corporate customers in the operational environment and continuously improve it.
- End-to-end responsibility: Deutsche Telekom owns and operates its own global network. For this reason, unlike most other public cloud providers, Telekom can offer complete end-to-end responsibility.
- Price/performance: The computing and storage resources of the Open Telekom Cloud adapt flexibly to business development. This also applies to HPC and supercomputing resources. Companies pay only as long as they use the capacities. This enables Deutsche Telekom to offer every company a fully flexible supercomputer from the cloud that meets its needs – a network connection is sufficient. The good price/performance ratio was recently confirmed by Cloud Spectator's "Western Europe Cloud Service Provider Analysis."
About the author
Alfred Geiger holds a doctorate in aerospace engineering and is authorized to teach computational science and engineering at the University of Stuttgart. After completing his studies, he spent more than 15 years conducting research at the University of Stuttgart. After another decade in various positions at T-Systems, he was appointed eight years ago as Managing Director of Höchstleistungsrechner für Wissenschaft und Wirtschaft GmbH, a joint venture between T-Systems, Porsche, the High-Performance Computing Center Stuttgart (HLRS) and the Karlsruhe Institute of Technology (KIT), as well as Head of Scientific Computing at T-Systems.