Many applications produce large amounts of data that can be turned into business-relevant information. Evaluating such quantities of data, however, requires platforms with sufficient technical capacity. MapReduce is a widely used solution for analyzing, processing and streaming mass data. The MapReduce framework comprises a suite of tools that perform tasks such as recording, storing, streaming and analyzing data.
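To make the underlying programming model concrete, the following is a minimal single-process sketch of the map, shuffle and reduce phases, using a word count as the example. It only illustrates the idea; it is not how MRS or Hadoop is invoked, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over the input records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical sample input: three short log lines.
records = ["error disk full", "warning disk slow", "error network down"]
counts = word_count(records)
```

In a real cluster, the map and reduce steps would run in parallel on many nodes and the grouping step would move data between them; here everything runs in one process to keep the structure visible.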
The MapReduce Service (MRS) in the Open Telekom Cloud generates complete clusters with separate functions for saving and processing data. All cluster management functions (creation, configuration, expansion, scaling down, scanning) can be triggered via a REST API or the console. The saving and processing functions can be integrated or work separately, depending on the chosen scenario. When the Hadoop Distributed File System (HDFS) is used, data can be saved and processed simultaneously; this scenario suits continuous or frequent use. If data is only analyzed occasionally, Object Storage is recommended, although it offers a lower transmission speed. MRS offers the analysis and data management tools HBase, Hive, Spark, Hadoop and Loader, with ZooKeeper as a further component of the suite. In addition, MRS supports Kafka, Storm and Flume for streaming and data ingestion, together with the CarbonData storage format. The service is billed hourly for the VMs, including licenses for software images, plus the cost of data storage.