0800 3304477 24 hours a day, seven days a week

Write an E-mail 

Book now and claim starting credit of EUR 250

Data Lake Insight (DLI)

Data Lake Insight (DLI) is a serverless big data query and analysis service fully compatible with Apache Spark and Apache Flink ecosystems. DLI supports standard SQL and is compatible with Spark and Flink SQL. It also supports multiple access modes and is compatible with mainstream data formats. DLI supports SQL statements and Spark applications for heterogeneous data sources, including CloudTable, RDS, DWS, CSS, OBS, custom databases on ECSs, and offline databases.

Spark is a unified analytics engine for large-scale data processing that focuses on query, compute, and analysis. DLI builds on open-source Spark, optimizing its performance and restructuring its services. It remains compatible with the Apache Spark ecosystem and interfaces while improving performance by 2.5x compared with open-source Spark. This enables you to query and analyze exabytes of data within hours.

Flink is a distributed compute engine that is ideal for batch processing, i.e., for processing static data sets and historical data sets. You can also use it for stream processing, i.e., processing real-time data streams and generating data results in real time. DLI enhances features and security based on the open-source Flink and provides the Stream SQL feature required for data processing.
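The difference between batch and stream processing can be illustrated with a minimal Python sketch. It does not use Flink or any DLI API; the event format, the function names, and the 10-second tumbling window are assumptions chosen for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, event key); hypothetical format


def batch_count(events: Iterable[Event]) -> Dict[str, int]:
    """Batch style: read the complete, static data set, then emit one result."""
    counts: Dict[str, int] = defaultdict(int)
    for _, key in events:
        counts[key] += 1
    return dict(counts)


def tumbling_window_counts(
    events: Iterable[Event], window_s: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Stream style: emit a result as soon as each fixed-size window closes.

    Assumes events arrive ordered by timestamp.
    """
    current_start = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, key in events:
        start = ts - ts % window_s  # start of the window this event belongs to
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete: publish it and start a new one.
            yield current_start, dict(counts)
            current_start, counts = start, defaultdict(int)
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


events = [(0, "login"), (3, "click"), (7, "click"), (12, "login"), (14, "click")]
batch_result = batch_count(events)
stream_results = list(tumbling_window_counts(events, window_s=10))
print(batch_result)
print(stream_results)
```

The batch function returns a single answer once all data has been read, while the streaming function produces one partial answer per window as data arrives.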

Reasons for DLI in the Open Telekom Cloud

Ease of use

DLI lets you explore terabytes of data in your data lake in seconds using standard SQL, with zero O&M burden.

One-stop analysis

Fully compatible with Apache Spark and Flink; stream & batch processing and interactive analysis in one place.

Scalable resources

On-demand, shared access to pooled resources, flexible scaling based on preset priorities.

Cross-source connection

Easy cross-source data access for collaborative analysis with DLI datasource connections, no need for data migration.


Key Features of DLI

Full SQL compatibility

You do not need a background in big data to conduct big data analyses. You only need to know SQL, and you are good to go. The SQL syntax is fully compatible with the standard ANSI SQL 2003.

 
Serverless Spark/Flink

Seamlessly migrate your offline applications to the cloud with serverless technology. DLI is fully compatible with Apache Spark, Apache Flink, and Presto ecosystems and APIs.

Cross-source analysis

Analyze your data across databases. No migration required. A unified view of your data gives you a comprehensive understanding of your data and helps you innovate faster. There are no restrictions on data formats, cloud data sources, or whether the database is created online or off.

Enterprise multi-tenant

Manage compute or resource related permissions by project or by user. Enjoy fine-grained control that makes it easy to maintain data independence for separate tasks.

Storage-compute decoupling

DLI decouples storage from computing so that you can lower costs while improving resource utilization.

O&M-free and high availability

DLI frees you from the hassle of complicated O&M and upgrade operations while you enjoy high data availability with dual-AZ deployment.


Identity and Access Management

DLI has a comprehensive permission control mechanism and supports fine-grained authentication through Identity and Access Management (IAM). You can create policies in IAM to manage DLI permissions, and you can use DLI's own permission control mechanism together with the IAM service for permission management.

Application Scenarios of IAM Authentication

When using DLI on the cloud, enterprise users need to manage DLI resources (queues) used by employees in different departments, including creating, deleting, using, and isolating resources. In addition, data of different departments needs to be managed, including data isolation and sharing.

DLI uses IAM for fine-grained, enterprise-level multi-tenant management. IAM provides identity authentication, permissions management, and access control, helping you securely access your cloud resources.

With IAM, you can use your cloud account to create IAM users for your employees and assign permissions to the users to control their access to specific resource types. For example, some software developers in your enterprise may need to use DLI resources but should not delete them or perform any high-risk operations. To guarantee this result, you can create IAM users for the software developers and grant them only the permissions required for using DLI resources.

 

DLI system permissions

Roles: A coarse-grained authorization mechanism that defines permissions based on user responsibilities. This mechanism provides only a limited number of service-level roles for authorization. When you use roles to grant permissions, you also need to assign any other roles on which those permissions depend for them to take effect. Roles are therefore not an ideal choice for fine-grained authorization and secure access control.

Policies: A type of fine-grained authorization mechanism that defines permissions required to perform operations on specific cloud resources under certain conditions. This mechanism allows for more flexible policy-based authorization, meeting requirements for secure access control. For example, you can grant DLI users only the permissions for managing a certain type of ECSs.
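As an illustration of policy-based authorization, the following Python sketch models a custom policy for developers who may use DLI resources but not delete them. The policy layout (`Version`, `Statement`, `Effect`, `Action`) and the action names are assumptions made for this example and should be checked against the IAM and DLI permission references. The evaluation function is a simplified model, not the IAM implementation.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical custom policy. The action names are illustrative placeholders,
# not copied from the DLI permissions reference.
developer_policy = {
    "Version": "1.1",
    "Statement": [
        {"Effect": "Allow", "Action": ["dli:queue:submitJob", "dli:table:select"]},
        {"Effect": "Deny", "Action": ["dli:*:drop*"]},
    ],
}


def is_allowed(policy: dict, action: str) -> bool:
    """Simplified evaluation: explicit Deny wins, otherwise Allow, otherwise deny."""
    allowed = False
    for statement in policy["Statement"]:
        if any(fnmatchcase(action, pattern) for pattern in statement["Action"]):
            if statement["Effect"] == "Deny":
                return False  # an explicit deny overrides any allow
            allowed = True
    return allowed  # actions that match no statement are denied by default


print(json.dumps(developer_policy, indent=2))
for action in ("dli:queue:submitJob", "dli:queue:dropQueue", "dli:database:createTable"):
    print(action, "->", "allowed" if is_allowed(developer_policy, action) else "denied")
```

In this model, a developer can submit jobs and read tables, while any delete-style action and any action not listed are rejected.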
 

| Role/Policy Name | Description | Category |
| --- | --- | --- |
| DLI FullAccess | All permissions for DLI | System defined policy |
| DLI ReadOnlyAccess | DLI read permissions | System defined policy |
| Tenant Administrator | Tenant administrator. Administer permissions for managing and accessing all cloud services. After a database or a queue is created, the user can use the Access Control List (ACL) to assign rights to other users. Scope: project-level service | System defined role |
| DLI Service Admin | DLI administrator. Administer permissions for managing and accessing the queues and data of DLI. After a database or a queue is created, the user can use the Access Control List (ACL) to assign rights to other users. Scope: project-level service | System defined role |

 

DLI service permissions

| Permission Type | Subtype | SQL Syntax |
| --- | --- | --- |
| Queue Permissions | Queue management permissions; queue usage permission | None |
| Data Permissions | Database permissions; table permissions; column permissions | For details, see SQL Syntax of Batch Jobs > Data Permissions Management > Data Permissions List in the Data Lake Insight SQL Syntax Reference. |
| Job Permissions | Flink job permissions | None |
| Package Permissions | Package group permissions; package permissions | None |
| Datasource Connection Permissions | Datasource connection permissions | None |

 

For details, see Permission-related APIs > Granting Users with the Data Usage Permission in the Data Lake Insight API Reference.

 

DLI console features

SQL Editor

You can use SQL statements in the SQL job editor to run data queries. DLI supports SQL 2003 and is compatible with Spark SQL.

On the overview page, click ‘SQL Editor’ in the navigation pane on the left or ‘Create Job’ in the upper right corner of the SQL Jobs pane. The SQL Editor page will be displayed.

A message is displayed, indicating that a temporary DLI data bucket will be created. This bucket stores temporary data generated by DLI, such as job logs. If you choose not to create the bucket, you cannot view job logs. You can periodically delete objects in the bucket or transition objects between storage classes. The bucket name is set by default.

Job Management

SQL jobs allow you to execute SQL statements entered in the SQL Editor, import data, and export data.

SQL job management provides the following functions:

  • Searching for jobs: Search for jobs that meet the search criteria.
  • Viewing job details: Display job details.
  • Terminating a job: Stop a job in the ‘Submitting’ or ‘Running’ status.
  • Exporting query results: A maximum of 1000 records can be displayed in the query result on the console. To view more or all data, you can export the data to OBS.
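Two of the rules above lend themselves to a short Python sketch: a job can be stopped only in the 'Submitting' or 'Running' status, and the console shows at most 1000 records. The function names and the 'Finished' status used in the example are assumptions for illustration; this is not DLI client code.

```python
CONSOLE_RESULT_LIMIT = 1000  # from the text: maximum records shown on the console
TERMINABLE_STATUSES = {"Submitting", "Running"}  # from the text


def can_terminate(status: str) -> bool:
    """A job can only be stopped while it is being submitted or is running."""
    return status in TERMINABLE_STATUSES


def console_view(rows: list) -> tuple:
    """Return the rows the console would show and whether an export is needed."""
    shown = rows[:CONSOLE_RESULT_LIMIT]
    needs_export = len(rows) > CONSOLE_RESULT_LIMIT
    return shown, needs_export


rows = list(range(2500))  # stand-in for a query result with 2500 records
shown, needs_export = console_view(rows)
print(len(shown), needs_export)
print(can_terminate("Running"), can_terminate("Finished"))  # 'Finished' is hypothetical
```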

Resources in Queue Management

Queues in DLI are computing resources, which are the basis for using DLI. All executed jobs require computing resources.

Currently, DLI provides two types of queues: for SQL and for general use. SQL queues are used to run SQL jobs. General-use queues are compatible with Spark queues of earlier versions and are used to run Spark and Flink jobs.
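The mapping between job kinds and queue types can be written down as a small lookup. This is a sketch only; the enum and function names are hypothetical and not part of any DLI SDK.

```python
from enum import Enum


class QueueType(Enum):
    SQL = "sql"          # runs SQL jobs
    GENERAL = "general"  # runs Spark and Flink jobs


# Job kind -> queue type, as described in the text.
JOB_TO_QUEUE = {
    "sql": QueueType.SQL,
    "spark": QueueType.GENERAL,
    "flink": QueueType.GENERAL,
}


def queue_type_for(job_kind: str) -> QueueType:
    """Return the queue type a job of the given kind has to run on."""
    try:
        return JOB_TO_QUEUE[job_kind.lower()]
    except KeyError:
        raise ValueError(f"unknown job kind: {job_kind!r}") from None


print(queue_type_for("SQL"), queue_type_for("Flink"))
```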

Data Management

DLI database and table management provide the following functions:

  • Database Permission Management
  • Table Permission Management
  • Creating a database or a table
  • Deleting a database or a table
  • Modifying the owners of databases and tables
  • Importing data to the table
  • Exporting data from DLI to OBS
  • Viewing metadata
  • Previewing data

Job Template

To simplify running SQL operations, DLI allows you to customize query templates or save the SQL statements in use as templates. Once a template is saved, you no longer need to write the SQL statements again. Instead, you can run the SQL operations directly from the template.

SQL templates include sample templates and custom templates. The default sample template contains 22 standard TPC-H query statements, which can meet most TPC-H test requirements.
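To give an impression of what a TPC-H style statement looks like, the sketch below runs a query modeled on TPC-H Q6 against a tiny in-memory table. SQLite stands in for DLI here, the date arithmetic of the benchmark is replaced by literal bounds, and the four sample rows are invented for illustration. The text of the actual DLI sample templates may differ.

```python
import sqlite3

# Modeled on TPC-H Q6 (forecasting revenue change), simplified for SQLite.
Q6 = """
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= '1994-01-01'
  AND l_shipdate < '1995-01-01'
  AND l_discount BETWEEN 0.05 AND 0.07
  AND l_quantity < 24
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem ("
    "l_quantity REAL, l_extendedprice REAL, l_discount REAL, l_shipdate TEXT)"
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
    [
        (10, 1000.0, 0.06, "1994-03-15"),  # matches every filter
        (30, 2000.0, 0.06, "1994-06-01"),  # quantity too high
        (5, 500.0, 0.10, "1994-09-09"),    # discount out of range
        (8, 800.0, 0.05, "1995-02-01"),    # shipped outside the date range
    ],
)

revenue = conn.execute(Q6).fetchone()[0]
print(revenue)
```

Only the first row passes all filters, so the result is its extended price multiplied by its discount.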

SQL template management provides the following functions:

  • Sample templates
  • Custom templates
  • Creating a template
  • Executing the template
  • Searching for a template
  • Modifying a template
  • Deleting a template

Datasource Connections

DLI supports and extends the native Spark datasource capability. With DLI datasource connections, you can access other data storage services through SQL statements, Spark jobs, and Flink jobs, and import, query, analyze, and process the data stored in these services.

Global Configuration

Global variables can be used to simplify complex parameters. For example, long, hard-to-read values can be replaced with short variable names to improve the readability of SQL statements.
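The idea can be sketched in Python as a simple placeholder substitution. The `{{name}}` placeholder syntax, the variable name, and the table name are assumptions for this example; check the DLI documentation for the exact syntax the service expects.

```python
import re

# Assumed placeholder syntax: {{variable_name}}
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_sql(sql: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with the value of the global variable."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined global variable: {name}")
        return variables[name]

    return _PLACEHOLDER.sub(replace, sql)


# Hypothetical global variable that hides a long table name.
global_variables = {"orders_tbl": "sales_db.orders_partitioned_by_region_v2"}

rendered = render_sql("SELECT * FROM {{orders_tbl}} LIMIT 10", global_variables)
print(rendered)
```

The statement stays short and readable, and the long value is maintained in one place.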

 

Application scenarios

Analytics

Database Analysis

Application data stored in relational databases needs analysis to derive more value. For example, big data from registration details helps with commercial decision-making.

Pain Points

  • Complicated queries are not supported for larger relational databases.
  • Comprehensive analysis is not possible because database and table partitions are spread across multiple relational databases.
  • Business data analysis might overload available resources.


Advantages

  • SQL experience transferability
    Hit the ground running with new services. DLI supports standard ANSI SQL 2003 relational database syntax so there is almost no learning curve.
  • Versatile, robust performance
    Distributed in-memory computing models effortlessly handle complicated queries, cross-partition analysis, and business intelligence processing.


Related Services

DataArts Studio

Cloud Data Migration (CDM)

E-Commerce

Precision Marketing

Associative analysis combines information from multiple channels to improve conversion rates.

Advantages

  • Cross-source analysis
    Advertisement CTR data stored in OBS and user registration data in RDS can be directly queried without migration to DLI.
  • Only SQL needed
    Interconnected data sources map together with a table created using just SQL statements.
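The principle of querying two separate sources in one SQL statement can be tried locally with SQLite, which stands in for DLI in the sketch below. One in-memory database plays the role of the click data in OBS, a second attached database plays the role of the registration data in RDS. All table names, column names, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # stands in for the OBS-backed source
conn.execute("ATTACH DATABASE ':memory:' AS rds")  # stands in for the RDS source

conn.execute("CREATE TABLE main.ad_clicks (user_id INTEGER, ad_id TEXT, clicked INTEGER)")
conn.execute("CREATE TABLE rds.users (user_id INTEGER PRIMARY KEY, region TEXT)")

conn.executemany(
    "INSERT INTO main.ad_clicks VALUES (?, ?, ?)",
    [(1, "a", 1), (1, "b", 0), (2, "a", 1), (3, "a", 0), (3, "b", 0)],
)
conn.executemany(
    "INSERT INTO rds.users VALUES (?, ?)",
    [(1, "north"), (2, "south"), (3, "south")],
)

# One statement joins both sources; no data is copied between them beforehand.
clicks_by_region = conn.execute(
    """
    SELECT u.region, SUM(c.clicked) AS clicks, COUNT(*) AS impressions
    FROM main.ad_clicks AS c
    JOIN rds.users AS u ON u.user_id = c.user_id
    GROUP BY u.region
    ORDER BY u.region
    """
).fetchall()
print(clicks_by_region)
```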

Large Enterprises

Permission Control

When multiple departments need to manage resources independently, fine-grained permissions management improves data security and operations efficiency.

Advantages

  • Easier permissions assignment
    Grant permissions by column or by specific operation, such as INSERT INTO/OVERWRITE, and set metadata to read-only.
  • Unified management
    A single IAM account handles permissions for all staff users.

Genetics

Library Integration

Genome analysis relies on third-party analysis libraries, which are built on the Spark distributed framework.

Pain Points

  • High technical skills are required to install analysis libraries such as ADAM and Hail.
  • Every time you create a cluster, you have to install these analysis libraries again.


Advantages

  • Custom images
    Instead of installing libraries in a technically demanding process, package them into custom images uploaded directly to the Software Repository for Container (SWR). When using DLI to create a cluster, custom images in SWR are automatically pulled so you don't have to reinstall these libraries.

Finance

Real-time Risk Control

Almost every aspect of financial services requires comprehensive risk management and mitigation.

Pain Point

  • There is very little tolerance for excessive latency when it comes to risk control.


Advantages

  • High throughput
    Real-time data analysis in DLI with the help of an Apache Flink dataflow model keeps latency low. A single CPU processes 1,000 to 20,000 messages per second.
  • Ecosystem coverage
    Save real-time data streams to multiple cloud services such as CloudTable and SMN for comprehensive application.
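The throughput figure of 1,000 to 20,000 messages per second per CPU allows a rough sizing estimate. The sketch below is plain arithmetic under the assumption that throughput scales linearly with the number of CPUs; the function name and the target rate are hypothetical.

```python
import math

MIN_MSGS_PER_CPU = 1_000   # lower bound from the text
MAX_MSGS_PER_CPU = 20_000  # upper bound from the text


def cpu_range(target_msgs_per_s: int) -> tuple:
    """Return (best case, worst case) CPU counts for a target message rate.

    Assumes linear scaling across CPUs, which real workloads may not achieve.
    """
    best = math.ceil(target_msgs_per_s / MAX_MSGS_PER_CPU)
    worst = math.ceil(target_msgs_per_s / MIN_MSGS_PER_CPU)
    return best, worst


best, worst = cpu_range(50_000)
print(f"50,000 msg/s needs between {best} and {worst} CPUs")
```

The wide range shows why the actual per-message cost of a job should be measured before resources are sized.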

Geography

Big Data Analysis

Geographic data comes in massive volumes, including petabytes of satellite images, and in many types: structured remote sensing grid data, vector data, and unstructured spatial location data. Analyzing and mining all this data requires efficient tools.

Advantages

  • Spatial data analysis
    Spark algorithm operators in DLI enable real-time stream processing and offline batch processing. They support large volumes of data of many types, including structured remote sensing image data, unstructured 3D modeling data, and laser point cloud data.
  • CEP SQL functionality
    SQL statements are all that is needed for yaw detection and geo-fencing.
  • Heavy data processing
    Quickly migrate up to exabytes of remote sensing images to the cloud, then slice them into data sources for distributed batch processing.
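The check behind a circular geo-fence can be sketched in a few lines of Python. In DLI this logic would be expressed in SQL; the sketch only shows the underlying distance test using the haversine formula. The fence center, the radius, and the function names are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_fence(lat: float, lon: float, center: tuple, radius_m: float) -> bool:
    """True if the position lies within radius_m of the fence center."""
    return haversine_m(lat, lon, center[0], center[1]) <= radius_m


fence_center = (52.52, 13.405)  # hypothetical fence center
print(inside_fence(52.521, 13.405, fence_center, radius_m=500))
print(inside_fence(52.53, 13.405, fence_center, radius_m=500))
```

The first position is roughly 110 meters from the center and falls inside the 500-meter fence, while the second is more than a kilometer away and falls outside it.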


Related Services

Data Ingestion Service (DIS)

Cloud Data Migration (CDM)

 
 
