
GeminiDB (for Cassandra)

GeminiDB (for Cassandra) is a cloud-native NoSQL database compatible with Cassandra. It supports Cassandra Query Language (CQL), which gives you SQL-like syntax. GeminiDB (for Cassandra) is secure, reliable, scalable, and easy to manage. It provides outstanding read/write performance and supports the Cassandra 3.11 DB engine.


Reasons for GeminiDB (for Cassandra) in the Open Telekom Cloud


High security and reliability

A multi-layer security system, including VPC, subnet, security group, SSL, and fine-grained permission control ensures database security and user privacy.
You can deploy an instance across three Availability Zones (AZs) and quickly back up or restore data to improve data reliability.
The distributed architecture provides superlative fault tolerance (N-1 reliability).


Outstanding read/write performance

GeminiDB (for Cassandra) gives you 3 times the performance of the open-source version. Data can be written to this high availability database 24/7, and with automated load balancing and elastic scaling you always have all the performance you need.


Flexible scaling

Decoupled compute and storage allow you to add compute nodes in minutes and scale up storage capacity in seconds without service interruptions.
The compute clusters consist of multiple homogeneous nodes, and data is stored in a distributed, shared storage pool. Compute and storage resources are decoupled from each other so they can be flexibly scaled in or out without having to migrate any data.


Architecture

GeminiDB (for Cassandra) is a distributed database with decoupled storage and compute architecture. One compute cluster may consist of multiple homogeneous nodes, and data is stored in a distributed, shared storage pool. It allows you to scale compute and storage resources flexibly without having to migrate any data.

[Figure: GeminiDB (for Cassandra) architecture]

Key Features of GeminiDB (for Cassandra)


Data backup

Backups are stored in Object Storage Service (OBS) buckets, which provide disaster recovery capabilities and save space. When you create a DB instance, the automated backup policy is enabled by default. As soon as the instance has been created, an automated full backup is triggered. The backup retention period is 7 days by default. You can change the retention period and modify the backup policy. In addition, you can initiate a backup at any time according to your service requirements. Manual backups are kept until you delete them.

 

Network isolation

GeminiDB (for Cassandra) uses Virtual Private Clouds (VPCs) and network security groups to keep DB instances isolated. VPCs allow you to define which IP addresses are allowed to access a given database. Running a DB instance in a VPC improves security. To further enhance database security, you can configure subnets and security groups to control access to DB instances.


Access control

VPC security groups can be configured with rules to control traffic to and from DB instances.


Encryption

GeminiDB (for Cassandra) uses Secure Sockets Layer (SSL) to encrypt transmitted data. You can download the root CA certificate from the management console and upload it for authentication when connecting to a database.
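
As a rough illustration, an SSL connection from Python might look like the sketch below. It assumes a recent version of the open-source `cassandra-driver` package and a locally saved copy of the root CA certificate. The host, port, credentials, and certificate path are placeholders, and `build_ssl_context` and `connect` are hypothetical helper names, none of which come from this page.

```python
import ssl
from typing import Optional


def build_ssl_context(ca_cert_path: Optional[str] = None,
                      verify_hostname: bool = True) -> ssl.SSLContext:
    """Create a TLS context that verifies the server certificate.

    ca_cert_path: root CA certificate downloaded from the management console
    (hypothetical local path). With None, the system trust store is used.
    verify_hostname: set to False only if the certificate does not cover the
    address you connect to (an assumption to check for your own instance).
    """
    context = ssl.create_default_context(cafile=ca_cert_path)
    context.check_hostname = verify_hostname
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def connect(host: str, port: int, user: str, password: str, ca_cert_path: str):
    """Open a CQL session over SSL (requires `pip install cassandra-driver`)."""
    # Imported lazily so build_ssl_context works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    cluster = Cluster(
        contact_points=[host],
        port=port,
        auth_provider=PlainTextAuthProvider(username=user, password=password),
        ssl_context=build_ssl_context(ca_cert_path),
    )
    return cluster.connect()
```

Calling `connect(...)` with the private IP address of the instance and the path to the downloaded certificate would return a driver session for running CQL statements.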


Security system

GeminiDB (for Cassandra) supports multi-layer network protection. The security system consists of VPCs, subnets, security groups, Anti-DDoS, and SSL, which collectively can defend against a wide range of attacks and keep your data secure.

  • VPCs isolate resources and control access.
  • SSL connections ensure data security and integrity.
  • Security group rules control traffic to and from specific IP addresses and ports, protecting connections between GeminiDB (for Cassandra) and other services.

Performance monitoring

GeminiDB (for Cassandra) monitors instance performance, reducing O&M activities by 60%. It provides real-time monitoring of CPU utilization, disk usage, IOPS, and the number of active connections, so you can check instance status at any time.


Immediately ready for use

You can create a DB instance on the management console and access the database using private network IP addresses to reduce latency and avoid the cost of using a public network.

 

GeminiDB (for Cassandra) instance specifications

| Flavor | vCPUs | Memory (GB) | Max. storage space (GB) |
| --- | --- | --- | --- |
| geminidb.cassandra.xlarge.arm.8 | 4 | 32 | 24,000 |
| geminidb.cassandra.2xlarge.arm.8 | 8 | 64 | 48,000 |
| geminidb.cassandra.4xlarge.arm.8 | 16 | 128 | 96,000 |
| geminidb.cassandra.8xlarge.arm.8 | 32 | 256 | 192,000 |
| geminidb.cassandra.15xlarge.arm.8 | 60 | 480 | 360,000 |

 

Compatible APIs and versions

| Compatible API | Instance type | Version |
| --- | --- | --- |
| Cassandra | Cluster | 3.11 |

 
 

Related Services

Elastic Cloud Server (ECS)

Elastic Cloud Server (ECS) provides GeminiDB (for Cassandra) with elastic computing resources. GeminiDB (for Cassandra) requests resources from ECS to build the runtime environment for DB instances.

Object Storage Service (OBS)

Backups are stored in Object Storage Service (OBS) buckets, which provide disaster recovery capabilities and save space.

Virtual Private Cloud (VPC)

GeminiDB (for Cassandra) uses Virtual Private Clouds (VPCs) and network security groups to keep DB instances isolated. VPCs allow you to define which IP addresses are allowed to access a given database. Running a DB instance in a VPC improves security.

Cloud Eye

Cloud Eye serves as an open monitoring platform, keeping track of GeminiDB (for Cassandra) resources for you. It reports alarms and promptly issues warnings to ensure that services remain running properly.

 
 

Application Scenarios

Internet

GeminiDB (for Cassandra) provides excellent read and write performance, flexibility, and fault tolerance, making it easy for those websites that provide product catalogs, recommendations, personalization engines, and transaction records to handle high concurrency and ensure low latency.

Advantages

  • Large-scale clusters: Each cluster can include up to 200 nodes, helping write-intensive Internet applications process massive volumes of data.
  • High availability and scalability: The failure of one node does not affect the availability of the entire cluster. Compute resources and storage space can be quickly scaled up with minimal service interruptions.
Industrial data collection

GeminiDB (for Cassandra) is fully compatible with Cassandra, so it can help you collect, organize, and store data from different types of terminals, as well as aggregate and analyze the data in real time.

Advantages

  • Large-scale clusters: The large-scale clusters are well suited to collect and store massive numbers of manufacturing metrics.
  • High availability and performance: Data can be written to this database 24/7.
  • Fast backup and restoration: Snapshots allow for fast backup and recovery.
  • Scaling in minutes: Service or project peaks can be handled easily.
 

Performance

Performance ratio of GeminiDB (for Cassandra) to open-source Cassandra (ECS with Data disk type: Ultra-high I/O)

| Selected Hardware (flavors) | Concurrent client threads | Data used | 95% read and 5% update | 50% read and 50% update | 65% read, 25% update and 10% insert | 90% insert and 10% read |
| --- | --- | --- | --- | --- | --- | --- |
| 8 vCPUs 32 GB | 64 | 100 GB | 8.62 | 8.60 | 4.19 | 5.67 |
| 16 vCPUs 64 GB | 128 | 200 GB | 8.31 | 3.47 | 3.05 | 4.28 |
| 32 vCPUs 128 GB | 1256 | 400 GB | 10.18 | 3.85 | 3.76 | 4.99 |

 

Test Conclusion

GeminiDB (for Cassandra) performs ten times better than the open-source Cassandra cluster in read latency.

The GeminiDB (for Cassandra) cluster delivers nearly the same write performance as the open-source cluster.

Adding nodes slightly affects both the GeminiDB (for Cassandra) and open-source clusters.

  • Scaling out GeminiDB (for Cassandra) is fast: the process takes 10 minutes, affects services only briefly (10 seconds), and requires no parameter changes.
  • For an open-source Cassandra cluster, the time needed to add nodes depends on the data volume and parameter settings, and the impact on performance varies. In a test scenario, scaling took more than 30 minutes with a preset data size of 50 GB.

Best practices to choose the best flavor

  • In a multiple-node instance with 4 vCPUs per node there should be no more than 250 GB on each node, and the transactions per second (TPS) on each node are limited to 1000.
  • In a multiple-node instance with 8 vCPUs per node there should be no more than 250 GB on each node, and the transactions per second (TPS) on each node are limited to 2500.
  • In a multiple-node instance with 16 vCPUs per node there should be no more than 500 GB on each node, and the transactions per second (TPS) on each node are limited to 5000.
  • In a multiple-node instance with 32 vCPUs per node there should be no more than 500 GB on each node, and the transactions per second (TPS) on each node are limited to 10000.
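
The per-node limits above can be turned into a rough sizing helper. The following is a minimal sketch, assuming data and requests are spread evenly across nodes; the names `NodeLimit`, `nodes_needed`, and `sizing_options` are hypothetical and not part of any GeminiDB API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLimit:
    vcpus: int
    max_data_gb: int  # recommended data volume per node
    max_tps: int      # recommended transactions per second per node


# Per-node limits copied from the best-practice list above.
NODE_LIMITS = [
    NodeLimit(4, 250, 1000),
    NodeLimit(8, 250, 2500),
    NodeLimit(16, 500, 5000),
    NodeLimit(32, 500, 10000),
]


def nodes_needed(limit: NodeLimit, data_gb: float, tps: float) -> int:
    """Smallest node count that keeps data and TPS within the per-node limits."""
    by_data = math.ceil(data_gb / limit.max_data_gb)
    by_tps = math.ceil(tps / limit.max_tps)
    return max(1, by_data, by_tps)


def sizing_options(data_gb: float, tps: float) -> dict:
    """Map vCPUs per node to the node count required for the workload."""
    return {limit.vcpus: nodes_needed(limit, data_gb, tps) for limit in NODE_LIMITS}


# Example workload (illustrative numbers): 2,000 GB of data at 20,000 TPS.
options = sizing_options(2000, 20000)
```

For this example workload, `options` is `{4: 20, 8: 8, 16: 4, 32: 4}`, which suggests that 16-vCPU nodes reach the smallest cluster size before larger nodes stop helping.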

 

Best Practice: Design Rules

Rules

Rule 1

Do not store large data such as images and files in these databases.

Rule 2

The maximum size of the key and value in a single row cannot exceed 64 KB, and the average size of rows cannot exceed 10 KB.

Rule 3

The data deletion policy must be considered in the design of a table. Data in a table cannot increase infinitely without being deleted.

Rule 4

Design partition keys so that workloads are evenly distributed and data skew is avoided.
The partition key part of a primary key determines the logical partition in which table data is stored. If partition keys are not evenly distributed, data and load are unbalanced between nodes, resulting in data skew.

Rule 5

Design partition keys so that data access requests are evenly distributed, avoiding BigKey and HotKey issues.

  • BigKey issue: The main cause of a BigKey is an improperly designed primary key that puts too many records or too much data in a single partition. Once a partition becomes extremely large, access to it increases the load on the server where the partition is located and can even cause an Out of Memory (OOM) error.
  • HotKey issue: This issue occurs when a key is accessed very frequently within a short period of time. For example, breaking news can cause a spike in traffic and a large number of requests. As a result, the CPU usage and the load on the node where the key is located increase, affecting other requests to the node and reducing the success rate of services. HotKey issues also occur, for example, during promotions of popular products and live streams by Internet celebrities.
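
One common way to keep a single key from growing too large or too hot is to add a bucket component to the partition key. This technique is not described on this page; the sketch below is an illustration under that assumption, with hypothetical function names and an arbitrary default of 16 buckets.

```python
import hashlib


def bucketed_partition_key(natural_key: str, row_id: str, buckets: int = 16) -> tuple:
    """Spread the rows of one hot or oversized key across `buckets` partitions.

    The bucket comes from a stable hash of row_id, so the same row always
    lands in the same partition regardless of process or host.
    """
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return (natural_key, bucket)


def all_partitions(natural_key: str, buckets: int = 16) -> list:
    """Partitions a reader must query to see every row of natural_key."""
    return [(natural_key, b) for b in range(buckets)]
```

The trade-off is that a read of the full key has to fan out over all buckets, so the bucket count should stay small.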

Rule 6 

A single partition key cannot hold more than 100,000 rows, and the records stored under a single partition key cannot occupy more than 100 MB of disk space.
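
The size limits in Rule 2 and Rule 6 can be checked in a test or data-quality job before data is written. This is a minimal sketch, assuming 1 KB = 1,024 bytes and 1 MB = 1,024 KB (the page does not define the units); `check_partition` is a hypothetical helper.

```python
MAX_ROW_BYTES = 64 * 1024              # Rule 2: key + value of a single row
MAX_PARTITION_ROWS = 100_000           # Rule 6: rows per partition key
MAX_PARTITION_BYTES = 100 * 1024 ** 2  # Rule 6: size of a single partition


def check_partition(row_sizes_bytes: list) -> list:
    """Return human-readable violations for the row sizes of one partition."""
    problems = []
    oversized = sum(1 for size in row_sizes_bytes if size > MAX_ROW_BYTES)
    if oversized:
        problems.append(f"{oversized} row(s) exceed 64 KB")
    if len(row_sizes_bytes) > MAX_PARTITION_ROWS:
        problems.append(f"{len(row_sizes_bytes)} rows exceed the 100,000-row limit")
    if sum(row_sizes_bytes) > MAX_PARTITION_BYTES:
        problems.append("partition exceeds 100 MB")
    return problems
```

An empty list means the partition stays within the stated limits.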

Rule 7

GeminiDB (for Cassandra) ensures strong consistency between the data copies written to it but does not support transactions.

| Consistency model | Consistency supported | Description |
| --- | --- | --- |
| Concurrent write consistency | Yes | GeminiDB (for Cassandra) does not support transactions, and data writing is strongly consistent. |
| Consistency between tables | Yes | GeminiDB (for Cassandra) does not support transactions, and data writing is strongly consistent. |
| Data migration consistency | Eventual consistency | DRS migration provides data sampling, comparison, and verification capabilities. After services are migrated, data verification occurs automatically. |

Rule 8

For large-scale storage, database splitting must be considered. Ensure that the number of nodes in the GeminiDB (for Cassandra) cluster is less than 100. If the number of nodes exceeds 100, split the cluster vertically or horizontally.

Vertical splitting: Data is split by functional module, for example, the order database, product database, and user database. In this mode, the table structures of multiple databases are different.

Horizontal sharding: Data in the same table is divided into blocks and stored in different databases. The table structures in these databases are the same.
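
The two splitting modes can be illustrated with a small routing sketch. The cluster names and helper functions are hypothetical. The hash-modulo routing is only one possible scheme, and it remaps keys whenever the number of shards changes.

```python
import hashlib

# Vertical splitting: each functional module has its own cluster.
MODULE_CLUSTERS = {
    "orders": "cluster-orders",
    "products": "cluster-products",
    "users": "cluster-users",
}

# Horizontal sharding: rows of one table are spread over clusters
# that all use the same table structure.
ORDER_SHARDS = ["cluster-orders-0", "cluster-orders-1", "cluster-orders-2"]


def cluster_for_module(module: str) -> str:
    """Vertical splitting: pick the cluster by functional module."""
    return MODULE_CLUSTERS[module]


def shard_for_key(key: str, shards: list) -> str:
    """Horizontal sharding: pick the cluster by a stable hash of the row key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```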

Rule 9

Avoid tombstones caused by large-scale deletion.

  • Use TTL instead of Delete if possible.
  • Do not delete a large amount of data. Delete data by primary key prefix.
  • A maximum of 1,000 rows can be deleted at a time within a partition key.
  • Avoid querying deleted data in range queries.
  • Do not frequently delete data of a large range in one partition.
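
The points above can be sketched in client code: rows are written with a TTL where possible, and any remaining deletes within one partition are issued in chunks of at most 1,000 rows. The helper names and the example table are hypothetical. The functions only prepare statements and batches, and executing them is left to whichever driver you use.

```python
from typing import Iterator, List, Sequence

MAX_DELETES_PER_PARTITION = 1000  # limit stated in the rule above


def delete_batches(row_keys: Sequence[str],
                   batch_size: int = MAX_DELETES_PER_PARTITION) -> Iterator[List[str]]:
    """Yield the row keys of one partition in chunks of at most batch_size."""
    if not 1 <= batch_size <= MAX_DELETES_PER_PARTITION:
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(row_keys), batch_size):
        yield list(row_keys[start:start + batch_size])


def insert_with_ttl(table: str, columns: Sequence[str], ttl_seconds: int) -> str:
    """Build a parameterised CQL INSERT whose rows expire after ttl_seconds."""
    placeholders = ", ".join("?" for _ in columns)
    return (f"INSERT INTO {table} ({', '.join(columns)}) "
            f"VALUES ({placeholders}) USING TTL {int(ttl_seconds)}")
```

For example, `insert_with_ttl("metrics", ["id", "value"], 86400)` produces a statement whose rows expire after one day, so no explicit delete is needed for them.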
 

Best Practices: Design Suggestions

Suggestions

Suggestion 1

Properly control the database scale and quantity. It is recommended that the number of data records in a single table be less than or equal to 100 billion. It is also recommended that a single database contains no more than 100 tables and that the maximum number of fields in a single table be 20 to 50.

Suggestion 2

Estimate how many resources GeminiDB (for Cassandra) servers can process. If it is estimated that N nodes need to be used, adding additional N/2 nodes is recommended for fault tolerance and performance consistency. In normal scenarios, the CPU usage of each node is limited to 50% to avoid fluctuation during peak hours.
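
As a worked example of this suggestion, the sketch below adds N/2 nodes to an estimate of N nodes. Rounding up for odd N is an assumption, since the page does not say how to round, and both function names are hypothetical.

```python
import math


def recommended_nodes(estimated_nodes: int) -> int:
    """Estimated N nodes plus N/2 additional nodes (rounded up, an assumption)."""
    if estimated_nodes < 1:
        raise ValueError("estimated_nodes must be at least 1")
    return estimated_nodes + math.ceil(estimated_nodes / 2)


def within_cpu_target(cpu_usage_percent: float, target_percent: float = 50.0) -> bool:
    """True if a node's CPU usage stays within the 50% target for normal load."""
    return cpu_usage_percent <= target_percent
```

With an estimate of 6 nodes, `recommended_nodes(6)` gives 9; with an estimate of 5, it gives 8.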

Suggestion 3

To store large volumes of data, perform a test run based on service scenarios. In service scenarios with a large number of requests and data volume, you need to test the performance in advance because the service read/write ratio, random access mode, and instance specifications vary greatly.

Suggestion 4

Split database cluster granularity properly.

  • In distributed scenarios, microservices of a service can share a GeminiDB (for Cassandra) cluster to reduce resource and maintenance costs.
  • The service can be divided into different clusters based on the data importance, number of tables, and number of records in a single table.

Suggestion 5

Do not frequently update fields in a single data record.

Suggestion 6

If there are too many nested elements such as List, Map, or Set, read and write performance will be affected. In this case, convert such elements into JSON data for storage.
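
A minimal sketch of this conversion, using Python's standard `json` module: the nested structure is serialized into one string that can be stored in a single text column. The sample data and function names are illustrative. Python sets are not JSON-serializable, so they would need to be converted to lists first.

```python
import json


def to_json_column(value) -> str:
    """Serialize nested lists and maps into one compact JSON string."""
    return json.dumps(value, separators=(",", ":"), sort_keys=True)


def from_json_column(text: str):
    """Restore the nested structure read back from the text column."""
    return json.loads(text)


profile = {"tags": ["a", "b"], "prefs": {"lang": "en", "alerts": [1, 2]}}
encoded = to_json_column(profile)
```

Here `encoded` is the string `{"prefs":{"alerts":[1,2],"lang":"en"},"tags":["a","b"]}`, and `from_json_column(encoded)` returns the original dictionary.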

New features

  • New GaussDB (for Cassandra) service is now available in the EU-DE region
  • Renaming of the database service GaussDB (for Cassandra)
  • GeminiDB (for Cassandra) supports storage auto-scaling

The Open Telekom Cloud Community

This is where users, developers and product owners meet to help each other, share knowledge and discuss.

Discover now

Free expert hotline

Our certified cloud experts provide you with personal service free of charge.

 0800 3304477 (from Germany)

 +800 33044770 (from abroad)

 24 hours a day, seven days a week

Write an E-Mail

Our customer service is available free of charge via E-Mail
