08 Using and Migrating Database Services

This document outlines the migration process to Huawei Cloud, focusing on database services. It details key milestones, tasks, and precautions necessary for successful migration, including user permission planning and cyber security measures. Additionally, it provides insights into various database architectures and their evolution, emphasizing the importance of planning and execution in database migration projects.
Migration to Huawei Cloud
Module 8: Using and Migrating Database Services
Objectives
⚫ Upon completion of this course, you will:
 Understand the main steps and outputs of database migration.
 Understand how to migrate existing complex databases to Huawei
Cloud.
 Acquire some hands-on experience.

3
Contents
1. Introduction to Key Milestones

2. Migration of Complex Databases

3. Survey of the Source

4. Evaluation of the Destination

5. Planning and Practices for Database Migration

4
Key Milestones of a Database Migration Project
Timeline
▪ Day T: Initiate the project.
▪ Day T+20: Provide the migration design.
▪ Day T+40: Make service changes to match the new database architecture.
▪ Day T+60: Complete data synchronization and finalize the migration runbook.
▪ Day T+150: Complete the service migration.
▪ Day T+160: Complete project acceptance.

Phases: Project planning → Development and testing of services to be migrated to the cloud → Data migration → Service switchover and observation → Acceptance

Project management deliverables: project management master plan, project milestones, WBS, project progress schedule, and communication plan; project risk tracking form, project self-check list, quality deviation form, and progress deviation analysis.

Project planning
▪ Tasks: Review services, output the Service Cloud Migration Sorting and Migration Resources, and specify migration objectives.
▪ Teams involved: Development, O&M
▪ Focus: Assist the service team and O&M team in listing the service systems to be migrated to the cloud.

Development and testing of services to be migrated to the cloud
▪ Tasks: Make service changes to match the new database architecture, deploy service systems on Huawei Cloud, test functions and performance, resolve problems related to services running on the cloud, and output the Service Test Solution.
▪ Teams involved: Service, development, test, O&M

Data migration
▪ Tasks: Clear test data, start full and incremental data migration, prepare for the migration, and update and review the Migration Runbook.
▪ Teams involved: Service, development, O&M

Service switchover and observation
▪ Tasks: Perform migration drills, migrate workloads, and observe services.
▪ Teams involved: Service, development, O&M

Acceptance
▪ Tasks: Provide an acceptance report.
▪ Focus: The service teams and the O&M team verify project delivery.
5
Overall Migration Process
Information survey
▪ Optimize the survey template and survey the source database. The template and the survey will be used to design the specifications of the destination database and plan the migration.
▪ Export the source database details.

Solution design
▪ Plan the migration based on the source database details and the customer's migration requirements, including the technologies used, the migration principles applied, and the user permissions required.
▪ After the migration solution is confirmed, generate a list of all source and destination databases.

Migration drill
▪ Develop a step-by-step solution based on the design requirements of the database solution.

Migration implementation
▪ Add the database migration procedures to the migration runbook and use scripts to monitor the source and destination during the migration phase to ensure data consistency.

6
Information Survey
PM/VM/Cloud | OS Version | Number of CPUs | Total Number of CPU Cores | CPU Usage | Memory (GB) | Memory Usage | IP Address (Internal Network) | IP Address (Internet)
VM | Windows Server 2008 | 2 | 32 | - | 127 | - | 19.16.XX.XX | 211.11.XX.XX
PM | CentOS 7.6 | 2 | 32 | - | 127 | - | 19.16.XX.XX | 211.11.XX.XX

Storage Resource (Local Storage)

Whether to Use the Local Disk on the Cloud | Disk Type (SATA/SAS/SSD) | Disk Capacity (TB) | Disk Utilization | Whether the Maximum Space Used by an Instance Exceeds 4 TB
- | SAS | 0.6 | 76% | -
- | SAS | 0.6 | 81% | -

7
User Permission Planning
No | Usage | Service User | Database Permission
1 | Service | xx_user | Read and write permissions on service DBs
2 | Read-only | readonly | Read-only permissions on service DBs
3 | O&M | inception | Read-only permissions on database dictionary tables
4 | Administrator | root (MySQL), rwuser (MongoDB) | Only for the customer's O&M department; not used as independent service users or read-only users

8
Cyber Security Planning

No | Usage | VPC Design | VPC CIDR Block | Database Subnet Design | Subnet CIDR Block | Precaution
1 | Production | qingting_prd | 10.112.0.0/13 | prd_db | 10.118.0.0/16 | Services must be granted access to the destination database. This access must be allowed by the network security devices, the migration source, DRS, the container production environment, big data, and the VPN. Outgoing traffic of the destination database is not limited.
2 | Testing | qingting_test | 10.104.0.0/14 | test-DB | 10.107.0.0/16 | Services must be granted access to the destination database from the container test environment, the big data test environment, and over the VPN.

9
Migration Precautions
No | Database Type | Migration Principle | Precaution
1 | MySQL | Migrate to destination databases with the same version as the source database; for example, you can migrate from 5.6 to 5.6, 5.7 to 5.7, or 8.0 to 8.0. | To synchronize a large amount of data, change the binlog retention period of the destination database to 0, and change innodb_flush_log_at_trx_commit and sync_binlog to 2 and 500, respectively, to improve the migration efficiency. During the incremental synchronization period, change the values of the two parameters to 1 and change the binlog retention period back to 18 hours.
2 | MySQL | Select the primary database rather than the standby database as the migration source in a DRS task. | The binlog of standby databases of some vendors is non-standard, which may result in lost data.
3 | DDM | You are advised to migrate MySQL databases with a large amount of data to Distributed Database Middleware (DDM). | You are advised to design sharding for DDM. Perform DDM operations on DDM instances rather than RDS instances, and perform them on the console.
4 | MySQL and MongoDB | Create a test database instance for each major version. Databases with different names on the source can be migrated to an instance of the same major version. | MongoDB does not support the mapping of database names and table names.
5 | MongoDB | Migrate to destination databases with the same major version as the source database; for example, you can migrate from 3.x to 3.4 or from 4.2 to 4.0. | The migration from a later version to an earlier version fails to pass some pre-check items. If advanced functions exclusive to the later version are not used, ask DRS maintenance personnel to skip the check items.
6 | MongoDB | Create service users on the destination for the admin database for unified management. | In MongoDB 4.0, each service user must have the read permission on the admin database. In MongoDB 3.0, each service user must have the read and readAnyDatabase permissions on the admin database to read the database locally.
10
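For precaution 1, the parameter changes can be scripted as a minimal sketch like the one below, assuming a self-managed MySQL destination and hypothetical connection details. On RDS for MySQL these values are normally applied through the console parameter template rather than SET GLOBAL, and the binlog retention period is a separate RDS setting not covered here.

```python
import pymysql

# Values from the precautions table: relaxed durability for the bulk-load
# phase, then the durable settings restored for incremental synchronization.
BULK_LOAD   = {"innodb_flush_log_at_trx_commit": 2, "sync_binlog": 500}
INCREMENTAL = {"innodb_flush_log_at_trx_commit": 1, "sync_binlog": 1}

def apply_settings(settings, host, user, password):
    """Run SET GLOBAL statements on a self-managed MySQL destination."""
    conn = pymysql.connect(host=host, user=user, password=password)
    try:
        with conn.cursor() as cur:
            for name, value in settings.items():
                cur.execute(f"SET GLOBAL {name} = %s", (value,))
    finally:
        conn.close()

# apply_settings(BULK_LOAD, "dest-mysql.example.com", "admin", "***")    # before the full migration
# apply_settings(INCREMENTAL, "dest-mysql.example.com", "admin", "***")  # before incremental sync
```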
Migration Planning
Current Database | Version | Cluster Architecture | Instances | Databases | Recommended Huawei Cloud Database Service | Migration Method | Full Migration Time | Migration Tool
MySQL | 5.5 | Cloud service in primary/standby mode | 1 | 121 (MySQL total) | RDS for MySQL 5.6 in primary/standby mode | Online migration | - | DRS
MySQL | 5.6 | Cloud service in primary/standby or single-node mode | 9 | - | RDS for MySQL 5.6 in primary/standby or single-node mode | Online migration | - | DRS
MySQL | 5.7 | Cloud service in primary/standby or single-node mode | 61 | - | RDS for MySQL 5.7 in primary/standby or single-node mode | Online migration | - | DRS
MySQL | 5.7 | Three-node enterprise edition | 1 | - | RDS for MySQL 5.7 in primary/standby mode | Online migration | - | DRS
MySQL | 8.0 | Cloud service in primary/standby or single-node mode | 34 | - | RDS for MySQL 8.0 in primary/standby or single-node mode | Online migration | - | DRS
PostgreSQL | 9.4 | Cloud service in primary/standby mode | 1 | 15 (PostgreSQL total) | RDS for PostgreSQL 9.5 in primary/standby mode | Online migration (full) | - | DRS
PostgreSQL | 10 | Cloud service in primary/standby mode | 1 | - | RDS for PostgreSQL 10 in primary/standby mode | Online migration (full+incremental) | - | DRS
PostgreSQL | 12 | Cloud service in primary/standby mode | 7 | - | RDS for PostgreSQL 12 in primary/standby mode | Online migration (full+incremental) | - | DRS
MongoDB | 3.4 | Single-node | 1 | 125 (MongoDB total) | DDS 3.4 in single-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 4.0 | Single-node | 2 | - | DDS 4.0 in single-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 3.2 | Three-node | 6 | - | DDS 3.2 in three-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 3.4 | Three-node | 10 | - | DDS 3.4 in three-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 4.0 | Three-node | 30 | - | DDS 4.0 in three-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 4.2 | Three-node | 25 | - | DDS 4.0 in three-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 3.4 | Five-node | 2 | - | DDS 3.4 in five-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 4.0 | Five-node | 3 | - | DDS 4.0 in five-node mode | Online migration (full+incremental) | - | DRS
MongoDB | 4.2 | Five-node | 1 | - | DDS 4.0 in five-node mode | Online migration (full+incremental) | - | DRS
InfluxDB | 1.7 | HA | 1 | 1 (InfluxDB total) | GaussDB(for Influx) 1.7 in two-node mode | Full | - | Non-commercial tool
InfluxDB | 1.8 | Basic | 1 | - | GaussDB(for Influx) 1.7 in two-node mode | Full | - | Non-commercial tool

11
Migration Implementation and Efficiency
Improvement
⚫ Using a script to batch-create migration users with the minimum
permissions needed
⚫ Creating service users and granting permissions in batches
⚫ Data migration implementation solution
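A minimal sketch of the batch user-creation script mentioned in the first two bullets is shown below. The user names follow the permission plan earlier in this module; the host, passwords, service database name (svc_db), and the interpretation of "dictionary tables" as the mysql system schema are illustrative assumptions.

```python
import pymysql

# (user, password, grant statement) - names follow the permission plan slide;
# adjust the databases and host patterns to your own environment.
USERS = [
    ("xx_user",   "***", "GRANT SELECT, INSERT, UPDATE, DELETE ON svc_db.* TO 'xx_user'@'%'"),
    ("readonly",  "***", "GRANT SELECT ON svc_db.* TO 'readonly'@'%'"),
    ("inception", "***", "GRANT SELECT ON mysql.* TO 'inception'@'%'"),  # read-only dictionary access
]

def create_users(host, admin_user, admin_password):
    conn = pymysql.connect(host=host, user=admin_user, password=admin_password)
    try:
        with conn.cursor() as cur:
            for user, pwd, grant in USERS:
                # CREATE USER IF NOT EXISTS requires MySQL 5.7 or later.
                cur.execute(f"CREATE USER IF NOT EXISTS '{user}'@'%%' IDENTIFIED BY %s", (pwd,))
                cur.execute(grant)
            cur.execute("FLUSH PRIVILEGES")
    finally:
        conn.close()
```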

12
Migration Implementation
⚫ The primary node of a DDS instance must be connected. If the secondary
node of a DDS instance is connected, data cannot be updated.

13
Migration Implementation
⚫ The DDS password must contain uppercase letters, lowercase letters,
digits, and special characters. The recommended special characters are
asterisks (*) and exclamation marks (!). Other special characters may
be truncated on your client and an insufficient permissions error will
be reported.

14
Migration Implementation
⚫ For DDS 3.4, the read and readAnyDatabase permissions on the admin
database need to be added for local login. For DDS 4.0 and later
versions, only the read permission on the admin database needs to be
added.
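A minimal sketch of granting these admin-database roles with pymongo is shown below; the connection string and the service user name (svc_user) are hypothetical placeholders.

```python
from pymongo import MongoClient

# Placeholder connection string; connect as an administrator such as rwuser.
client = MongoClient("mongodb://rwuser:***@dds-host:8635/admin")
admin = client["admin"]

roles_dds34 = [{"role": "read", "db": "admin"}, {"role": "readAnyDatabase", "db": "admin"}]
roles_dds40 = [{"role": "read", "db": "admin"}]

# Pick the role set that matches the destination DDS version (here 4.0+).
admin.command("grantRolesToUser", "svc_user", roles=roles_dds40)
```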

15
Migration Drill
⚫ Due to complex service associations, you can revoke write permissions and grant read
permissions to reduce how long your database services are unavailable.
⚫ Scripts can be used to:
 Monitor write operations on source MySQL databases.
 Revoke permissions on the migration source and kill sessions.
 Grant write permissions on Huawei Cloud databases.
 Monitor the write status on Huawei Cloud databases.
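A minimal sketch of the source-side part of such a drill script (revoke write permissions, kill the remaining sessions, and confirm that writes have stopped) is shown below; the user and database names are hypothetical and should follow your own permission plan.

```python
import time
import pymysql

def revoke_writes_and_kill(host, admin_user, admin_password, svc_user="xx_user", db="svc_db"):
    """Take write access away from the service user on the source and kill its sessions."""
    conn = pymysql.connect(host=host, user=admin_user, password=admin_password)
    with conn.cursor() as cur:
        cur.execute(f"REVOKE INSERT, UPDATE, DELETE ON {db}.* FROM '{svc_user}'@'%'")
        cur.execute("SELECT id FROM information_schema.processlist WHERE user = %s", (svc_user,))
        for (session_id,) in cur.fetchall():
            cur.execute(f"KILL {session_id}")
    conn.close()

def writes_stopped(host, admin_user, admin_password, wait_s=10):
    """Read the binlog position twice; if it no longer moves, no writes are arriving."""
    conn = pymysql.connect(host=host, user=admin_user, password=admin_password)
    with conn.cursor() as cur:
        cur.execute("SHOW MASTER STATUS")
        before = cur.fetchone()
        time.sleep(wait_s)
        cur.execute("SHOW MASTER STATUS")
        after = cur.fetchone()
    conn.close()
    return before == after
```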

Switchover flow: Revoke the write permissions → Stop applications on the migration source → Monitor source databases and ensure that no write operations are performed on them → Grant write permissions on Huawei Cloud databases → Check whether data can be written to Huawei Cloud databases → Release applications on Huawei Cloud.

16
Contents
1. Introduction to Key Milestones

2. Migration of Complex Databases

3. Survey of the Source

4. Evaluation of the Destination

5. Planning and Practices for Database Migration

17
MySQL Architecture Evolution — Single-Node Architecture

[Figure: an application reads from (SELECT) and writes to (INSERT, DELETE, UPDATE) a single MySQL node with its own CPU, memory, and local disk.]

Characteristics
Applications directly read (SELECT) from and write (INSERT, DELETE, and UPDATE) to a single node.

Advantages
The architecture is simple and the O&M is easy.

Disadvantages
There may be single points of failure (SPOFs). As workloads grow and data volume increases, available storage may be insufficient and read and write performance may be affected (low CPU usage, memory usage, and storage IOPS).

Solutions
Scale up (vertical scaling): You can use higher specifications (CPU and memory) or high-end storage, or scale the storage capacity.
Scale out (horizontal scaling):
✓ Grouping: primary/standby architecture and read/write splitting. All nodes in the same group use primary/standby replication to ensure data consistency. This architecture is used to enhance the database read capability.
✓ Sharding: distributed architecture and sharding. Table data is horizontally split into several copies. Each copy of data can be treated as a shard, and the sum of all shards forms the full data. Different shards may reside on different MySQL instances. This architecture is used to enhance the database write capability.

18
MySQL Architecture Evolution — Primary/Standby Architecture+Read/Write Splitting

[Figure: an application writes to the primary node (CPU and memory) and reads from the primary and standby nodes; data is replicated from the primary node's local disk to the standby node's local disk.]

Characteristics
✓ Data changes on the primary node are synchronized to the standby node in real time.
✓ Data stored on each data node is consistent.
✓ Applications write data to the primary node, but read data from the standby node or from both the primary and standby nodes.

Advantages
The database read (SELECT) capability in the monolithic architecture is enhanced.

Remarks
This architecture is the simplest primary/standby architecture and can be modified in multiple different ways.
✓ Number of nodes: one primary node and one standby node, one primary node and multiple standby nodes, and multi-level replication
✓ Number of primary nodes: single-write and dual-write
✓ Read/Write splitting: Read and write requests from an application are routed to primary and standby instances, respectively. Alternatively, you can add proxies to split read and write requests.
✓ Source replication: single-source replication and multi-source replication
✓ High availability: MMM, MHA, MGR, and PXC

19
MySQL Architecture Evolution — Distributed Architecture+Sharding

[Figure: an application writes to and reads from multiple primary nodes (CPU and memory), each holding a different shard (Shard 1, Shard 2, Shard 3).]

Characteristics
✓ Table data is horizontally split into several copies. Each copy of data can be treated as a shard.
✓ The sum of all shards forms full data.

Advantages
The shared-nothing architecture provides higher write performance (INSERT, DELETE, and UPDATE) than a monolithic architecture. In a shared-nothing architecture, different data is routed to different data nodes, so the maximum IOPS is much higher than with a single storage node.

Remarks
This architecture is the simplest distributed architecture and can be modified in multiple different ways.
✓ The deployment of shards can be single-node, primary/standby, or primary/standby+read replicas (read/write splitting).
✓ The compute layer (data splitting and routing) can be implemented by using the application code, by invoking a sharding framework such as Sharding-JDBC or ShardingSphere through the application code, or by using third-party open-source or commercial distributed database middleware.

20
Database Sharding — Vertical Sharding
⚫ If there is too much data in a database table and database performance
is affected, you can split data using vertical sharding or horizontal
sharding.
⚫ Vertical sharding is classified as either vertical table sharding or vertical database sharding.

[Figure: Example of vertical table sharding. A Post table with columns Post ID, Title, Publisher, Posted, and Content is split by column into two narrower tables that share the same Post ID values (rows 1 through 4 appear in both tables).]
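A minimal DDL sketch of this kind of split, run from Python: the table and column names (post_base, post_content) and the connection details are illustrative assumptions rather than names from the course.

```python
import pymysql

# Split the wide Post table into a frequently accessed base table and a
# separate table holding the large Content column, joined by the same key.
DDL = [
    """CREATE TABLE post_base (
           post_id   BIGINT PRIMARY KEY,
           title     VARCHAR(255),
           publisher VARCHAR(64),
           posted    DATETIME
       )""",
    """CREATE TABLE post_content (
           post_id BIGINT PRIMARY KEY,   -- same key as post_base
           content LONGTEXT
       )""",
]

conn = pymysql.connect(host="mysql-host", user="admin", password="***", database="svc_db")
with conn.cursor() as cur:
    for stmt in DDL:
        cur.execute(stmt)
conn.close()
```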

21
Database Sharding — Vertical Sharding
⚫ The basic idea of vertical database sharding is to divide databases by service module
and place data tables with highly coupled database transactions in the same database.
⚫ For table sharding and vertical database sharding, the most important concern is the
customer's services.

[Figure: Example of vertical database sharding. Databases are divided by service module, for example the unified authentication center, user center, policy center, settlement center, organization center, service mid-end, payment center, insurance type center, product center, risk control center, message center, engine center, e-credential center, financial center, report center, and electronic archive center databases.]

22
Database Sharding — Horizontal Sharding
⚫ Horizontal sharding consists of horizontal table sharding and horizontal database
sharding.
[Figure: Example of horizontal table sharding. The Basic Personnel Info table (RID, Name, ...) in the basic information center database is split by row into table shards Basic Personnel Info_0 through Basic Personnel Info_N within the same database.]

[Figure: Example of horizontal table and database sharding. The basic information center database is split into database shards (basic information center database_0 through database_N), and each database shard holds table shards Basic Personnel Info_0 through Basic Personnel Info_N.]

23
Database Sharding — Horizontal Sharding
⚫ In database sharding without table sharding, each database shard contains only one table and there are no child tables, so the table does not need to be sharded again.

[Figure: Example of database sharding without table sharding. The Basic Personnel Info table (RID, Name, ...) is distributed across basic information center database_0 through database_N, with exactly one table in each database shard.]


In the scenario of database sharding without table sharding, there is only one table in a database shard and the table does
not need to be sharded again. In this case, only the level-1 sharding is performed. If both database sharding and table
sharding are performed, it is called level-2 sharding.
When using Huawei Cloud DDM, performing only level-1 sharding maximizes DDM performance. DDM is designed such that, for some batch operations, if level-2 sharding is used (tables are sharded after database sharding), the table shards within each database shard are processed serially. If level-2 sharding is not used, the child tables are spread across all database shards and all shards are processed concurrently, so level-1 sharding is better for parallel execution.
Generally, sharding refers to horizontal database and table sharding, which is one of the main features of distributed
database middleware.
24
Distributed Databases — Key Concepts
⚫ In horizontal sharding scenarios, where there are multiple table shards, a database shard is also called a shard.
⚫ Database shards and table shards are physical concepts.
⚫ Sharded table: Full data in a table is split and distributed in database shards of multiple MySQL instances. This is
applicable to tables, for example, order tables in fact tables, which have a large amount of data and are frequently read
and written.
⚫ Global table: It is also called broadcast table. Each database shard stores a copy of full and identical data. This is
applicable to tables that have a small amount of data but are read-intensive, for example, province tables in dimension
tables. The main purpose is to prevent performance loss caused by cross-shard data flow during associated queries. Some
distributed middleware does not support global tables. There may be no global tables when customers use application code
to implement sharding and data routing.
[Figure: database shards and table shards. The basic information center database is split into database shards (basic information center database_0, database_1, ...), and each database shard contains table shards Basic Personnel Info_0 through Basic Personnel Info_N.]

25
Distributed Databases — Key Concepts
⚫ Sharding keys
 Database sharding keys
 Table sharding keys
 Rule for selecting sharding keys
⚫ Sharding algorithms
 The value of a sharding key determines the database shard and table shard that a record should be routed to.
 General sharding algorithms include hash algorithms and time-based hash algorithms.

Example: the Post table (Post ID, Title, Publisher, Posted, Content) is routed by mod(Post ID, 2). Records with Post IDs 1 and 3 are routed to one shard, and records with Post IDs 2 and 4 are routed to the other.
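A minimal sketch of this kind of routing logic is shown below. The two-level layout (number of database shards and table shards) and the db_{n}/post_{n} naming are illustrative assumptions; the mod-based algorithm matches the example above.

```python
# The value of the sharding key (Post ID) determines both the database shard
# and the table shard that a record is routed to.
DB_SHARDS = 2      # number of database shards
TABLE_SHARDS = 4   # table shards per database shard

def route(post_id: int) -> tuple:
    db_index = post_id % DB_SHARDS                        # database sharding key
    table_index = (post_id // DB_SHARDS) % TABLE_SHARDS   # table sharding key
    return f"db_{db_index}", f"post_{table_index}"

print(route(1))  # ('db_1', 'post_0')
print(route(2))  # ('db_0', 'post_1')
```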

26
Distributed Databases — Key Concepts
⚫ Relationships between database shards and MySQL DB instances
 A MySQL DB instance can contain one or more database shards. This means that there can be multiple databases on a MySQL DB instance. The number of database shards on a MySQL DB instance depends on the storage IOPS and cannot be increased without limit. If the TPS/QPS of an application is high, a single MySQL DB instance may be unable to carry all of its database shards. In this case, the data needs to be sharded and distributed across multiple MySQL DB instances.

[Figure: the basic information center database is split into database shards (basic information center database_0 through database_5), which are distributed across multiple MySQL DB instances.]

27
Distributed Databases — Logical Architecture
⚫ Logically, a distributed database consists of the storage layer, compute layer, and metadata
layer.
 Compute layer: It is the SQL layer in a single-node database. It is used to perform operations such as permission
check, route access, distributed transaction control, and calculation result summary and sorting on data access.
 Metadata layer: It records the number of storage nodes in a distributed database cluster and metadata such as IP
addresses and ports.
 Storage layer: It is used to store actual service data.

[Figure: a distributed database consists of a compute layer (SQL layer), a metadata layer (metadata), and multiple storage-layer nodes.]

28
Distributed Databases — Logical Architecture
✓ Schemas and logical tables
The compute layer stores schemas and logical tables and provides unified access for external applications. Applications access schemas and logical tables as if they were accessing database tables in a single-node database. Schemas and logical tables shield the implementation details of the underlying physical databases and physical tables from applications, and they do not store actual physical data.

✓ Physical databases and physical tables
The storage layer stores physical databases and physical tables, which store the actual physical data. Generally, one schema corresponds to multiple physical databases, and one logical table corresponds to multiple physical tables. The mappings between schemas and physical databases and between logical tables and physical tables are stored at the metadata layer.

[Figure: schema db with logical table t at the compute layer maps to physical databases db_0 through db_3 at the storage layer, each containing physical tables t_0, t_1, and t_2; the mappings are recorded at the metadata layer.]

29
Distributed Databases — Logical Architecture
⚫ Multiple schemas can be created in a distributed database (at the compute layer). A schema can contain multiple logical tables. A MySQL instance (storage layer) can store database shards and table shards corresponding to multiple schemas.

[Figure: schema db1 (logical table x) and schema db2 (logical table y) at the compute layer map to physical databases db1_0 through db1_3 (physical tables x_0, x_1) and db2_0 through db2_3 (physical tables y_0, y_1) spread across the storage-layer nodes; the metadata layer records the mappings.]
30
Distributed Databases — Logical Architecture
⚫ In data migration scenarios, pay special attention to the data distribution structure
at the storage layer of a distributed database (for example, how many database shards
are stored in a MySQL instance, how many table shards are stored in each database
shard, what the database shard names are, and what the table shard names are). Full
and incremental data capture (CDC) needs to be implemented through the storage layer.
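A minimal sketch of a survey script that enumerates this layout on one storage-layer MySQL instance is shown below; the shard naming pattern (db_%) is an assumption and should be adapted to the customer's naming rules.

```python
import pymysql

def list_shards(host, user, password, db_pattern="db_%"):
    """Return {database_shard: [table shards]} for one storage-layer instance."""
    # Note: "_" is a LIKE wildcard; escape it if the real naming rules require it.
    conn = pymysql.connect(host=host, user=user, password=password)
    layout = {}
    with conn.cursor() as cur:
        cur.execute("SHOW DATABASES LIKE %s", (db_pattern,))
        for (db_shard,) in cur.fetchall():
            cur.execute(f"SHOW TABLES FROM `{db_shard}`")
            layout[db_shard] = [t for (t,) in cur.fetchall()]
    conn.close()
    return layout

# Example result: {'db_0': ['t_0', 't_1', 't_2'], 'db_1': ['t_0', 't_1', 't_2'], ...}
```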

31
DRS Technical Principles
⚫ In accordance with the full+incremental principle of DRS, when data is migrated from other distributed
databases to DDM, the source needs to be connected to the storage layer of the distributed database
because the storage layer has full physical data and binary logs that record changes in real time. The
destination needs to be connected to DDM, the compute layer of the distributed database, because SQL
statement playback needs to pass through the compute layer, and then related metadata is saved to the
metadata layer. This ensures that subsequent data query requests entering DDM are routed correctly.

[Figure: DRS captures logs from the source log file, flushes the captured data to disks, and replays the data to the destination database.]

Full synchronization
▪ Source database data is synchronized to the destination database all at once.
▪ Table structures, indexes, unique keys, and primary keys can be synchronized.
▪ Tables can be concurrently synchronized in shards.

Incremental synchronization
▪ The source database logs are captured and parsed in real time and converted into data in the DRS memory storage format.
▪ DRS stores the captured data on disks.
▪ Data on disks is reconstructed into the corresponding SQL statements and replayed in the destination database.
32
DRS Task Types
⚫ When using DRS to migrate data in distributed databases, you may need to combine DRS tasks of
different types to complete the data transfer due to specific restrictions of different DRS
task types.
⚫ What is migration?
 Migration means that databases will be replaced and services will be migrated. Databases are migrated by instance, which is usually a
short one-off procedure.

⚫ What is synchronization?
 Synchronization means that the source and destination databases are retained at the same time. The source and destination databases
usually use different service systems. Data is shared and synchronized between the two databases in the unit of some tables. Data
continuously flows from the source system to the destination system for a long time.

⚫ How do I select a DRS task type?


 Based on the concepts described here, DRS online migration tasks are usually used to migrate all objects and data on the source at a
time. They do not provide functions such as database table mapping and data filtering.
 DRS real-time synchronization tasks provide more functions, including mapping databases and tables, filtering table data, adding columns
and timestamps, and editing running tasks.
 In particular, data flows between heterogeneous databases (for example, from Oracle to DDM) are also included in real-time synchronization tasks.
33
Contents
1. Introduction to Key Milestones

2. Migration of Complex Databases

3. Survey of the Source

4. Evaluation of the Destination

5. Planning and Practices for Database Migration

34
Survey of the Source
⚫ Survey the following aspects of the source:

1. Source construction
2. Migration scope
3. Sharding scheme of the source
4. Data volume of the source
5. Source database resource status
6. KPIs
7. Future service growth
8. Other

35
Survey of the Source
⚫ Source construction
 Learn whether the source is an IDC database, database built on the cloud, or cloud database
service.
 This information will let you know if the source has any special restrictions. For example,
the values of some key parameters of cloud databases provided by different cloud vendors
may be different, and migration prerequisites may be different.

⚫ Migration scope
 Environment: development environment, test environment, and production environment
 Migration level: instance level, database level, and table level

⚫ The information determines the scope of subsequent survey and is used as the input
for selecting the scope when creating a DRS migration or synchronization task.
36
Sharding scheme of the source
⚫ Whether to use distributed middleware
 For example, whether to use the application code logic, or to invoke a sharding framework such as Sharding-JDBC or ShardingSphere in an
application program, or to use a third-party independent distributed database middleware

⚫ Sharding rules
 Determine the total number of schemas and the number of logical tables corresponding to each schema.
 Determine the sharding mode of each table, for example, database and table sharding, database sharding without table sharding, and table
sharding without database sharding.
 Determine the naming rules of database shards and table shards.
 Determine the mapping between database shards and MySQL instances. For example, determine how many database shards are stored in a MySQL
instance and what the database shard names are.

⚫ Sharding keys and sharding algorithms


 Based on this information, if DDM is to be used on the destination, provide a reference input for creating logical tables in DDM.

⚫ Main query conditions in services


 Work with the customer to clarify what query conditions are used for the main query statements of their services. If DDM is to be used on
the destination, provide a reference input for sharding keys of logical tables in DDM.

⚫ Whether there is a global table


 If third-party distributed middleware is used, check whether there is a global table. Based on the input, determine whether to migrate
global tables and which topology structure will be used for the migration.
37
Source Database Resources (1)
⚫ Data volume of the source
 Number of data records: Determine how many shards will be needed for evaluating
destination database resources based on the number of data records in the largest
table.
 Data size: Calculate the total data volume of all data tables to be migrated for
evaluating destination database resources.

⚫ Source database resource status


 MySQL version
 MySQL storage engine
 MySQL networking mode
38
Source Database Resources (2)
⚫ Performance specifications of the server where the MySQL instance is located
 Check the main performance specifications of the server where the MySQL instance is located, such as the number of CPU cores, memory
size, and storage capacity, to provide a reference for evaluating destination resources.
⚫ MySQL HA solutions
 Check whether an HA solution is used for MySQL instances and what the solution is. This information provides a reference for designing
the MySQL HA solution on the destination.
⚫ MySQL network information
 Ask the customer to provide the IP addresses and port numbers of all MySQL instances. This information is used as the source information
for creating a DRS task.
⚫ Key MySQL DB parameters
 character_set_server: character set of the source MySQL server
 server_id: ID of the source MySQL server
 lower_case_table_names: whether the table names in the source MySQL instance are case sensitive
 binlog_format: binary log format of the source MySQL instance
 innodb_flush_log_at_trx_commit: policy for flushing transaction logs of the source MySQL instance
 sync_binlog: policy for flushing binary logs of the source MySQL instance
 The values of these parameters affect whether the DRS task pre-check is successful and serve as the basis for modifications of the source
or destination MySQL instance before migration.
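A minimal sketch of collecting these parameters from a source instance so they can be checked against the DRS pre-check requirements; the connection details are placeholders.

```python
import pymysql

KEY_PARAMS = (
    "character_set_server", "server_id", "lower_case_table_names",
    "binlog_format", "innodb_flush_log_at_trx_commit", "sync_binlog",
)

def survey_parameters(host, user, password, port=3306):
    """Return the key global variables of one source MySQL instance as a dict."""
    conn = pymysql.connect(host=host, port=port, user=user, password=password)
    with conn.cursor() as cur:
        placeholders = ", ".join(["%s"] * len(KEY_PARAMS))
        cur.execute(f"SHOW GLOBAL VARIABLES WHERE Variable_name IN ({placeholders})", KEY_PARAMS)
        values = dict(cur.fetchall())
    conn.close()
    return values

# Example output: {'binlog_format': 'ROW', 'sync_binlog': '1', ...}
```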

39
Source Database Monitoring
⚫ MySQL key monitoring information
 Collect key monitoring information about the source MySQL database, such as the CPU usage, memory
usage, average IOPS, and disk usage. You can use these details to evaluate the resources of the
destination.
 Collect information about workload fluctuations, for example, whether there are peak hours in the
morning, at noon, or in the evening and the corresponding periods, whether there are batch tasks and
when they are processed, whether there are specific service peak hours (such as during weekly or
monthly settlement) per week or month, and whether there are traffic peaks caused by marketing
activities or promotions.
 You can evaluate the resources on the destination based on the collected information and the database
resource monitoring information in the corresponding periods. For example, you can evaluate whether
higher performance specifications are required. Alternatively, evaluate whether the application
architecture needs to be optimized. For example, you can introduce caches to reduce read pressure, or
introduce message queues to reduce write pressure. You can also determine whether to control the
traffic flow during the creation of a DRS task, as well as set the traffic limiting period, and determine whether to start the task immediately upon creation.
40
More Information Such as Data Volume and Growth
❖ Future service growth
▪ Data increments in the next N years
The customer needs to evaluate the data increments in the next N years based on current service volume. This
information helps you evaluate the destination database resources.
▪ Termination plan in the future
This information provides the reference input for the migration solution.

❖ KPIs
▪ TPS/QPS
This information helps you evaluate the resources of the destination.
▪ IOPS
This information helps you optimize the performance of the destination.

❖ Other
▪ Backup and restoration
The surveys of the backup mode, backup period, and backup frequency of the source provide inputs for configuring the
backup policy of the destination.
▪ Interconnection with related systems
For example, check whether the service system is interconnected with a data warehouse, whether other systems need to
periodically extract data from the source database, and whether the DDM read-only group capability is recommended.

41
Contents
1. Introduction to Key Milestones

2. Migration of Complex Databases

3. Survey of the Source

4. Evaluation of the Destination

5. Planning and Practices for Database Migration

42
RDS TPS/QPS
To evaluate RDS resources, factors such as the number of shards, TPS/QPS, storage, and HA must be considered.
Evaluate the TPS/QPS based on live-network workloads, and then evaluate the number of RDS DB instances needed based on the
performance baseline.
Note 1: You need to log in to the console to check whether there are dedicated MySQL instances of different versions in each
region.
Note 2: The performance baselines are obtained based on the sysbench stress tests.
MySQL Version | Instance Type | Performance Baseline
MySQL 5.6 | General-purpose | https://support.huaweicloud.com/intl/en-us/pwp-rds/rds_swp_mysql_06.html
MySQL 5.6 | Dedicated | https://support.huaweicloud.com/intl/en-us/pwp-rds/rds_swp_mysql_07.html
MySQL 5.7 | General-purpose | https://support.huaweicloud.com/intl/en-us/pwp-rds/rds_swp_mysql_09.html
MySQL 5.7 | Dedicated | https://support.huaweicloud.com/intl/en-us/pwp-rds/rds_swp_mysql_10.html
MySQL 8.0 | General-purpose | https://support.huaweicloud.com/intl/en-us/pwp-rds/rds_swp_mysql_11.html
MySQL 8.0 | Dedicated | https://support.huaweicloud.com/intl/en-us/pwp-rds/rds_swp_mysql_12.html

The following uses the reference performance baseline of MySQL 5.7 general-purpose DB instances as an example.

CPU Cores | Memory (GB) | Connections (Stress Test Value) | TPS | QPS
2 | 4 | 1,500 | 334 | 6,673
2 | 8 | 2,500 | 552 | 11,039
4 | 8 | 2,500 | 756 | 15,122
4 | 16 | 5,000 | 1,062 | 21,249
8 | 16 | 5,000 | 1,338 | 26,756
8 | 32 | 10,000 | 2,117 | 42,335
16 | 32 | 10,000 | 2,630 | 52,000
16 | 64 | 18,000 | 3,016 | 60,336

If the application QPS is 50,000 and you use RDS DB instances with 8 CPU cores and 16 GB memory, two RDS DB instances are required.

43
RDS Sharding Design
✓ Database and table sharding recommendations
If there are less than 50 million data records, you are advised to use a single table.
If there are more than 50 million data records, sharding is recommended. It is recommended that the number of records in a
single table be no more than 10 million.
For example, if there are 1 billion data records and a single table contains 10 million records, a total of 100 table
shards are required.
✓ RDS DB instance sharding recommendations
For RDS DB instances with medium performance specifications (8 CPU cores and 16 or 32 GB memory), you are advised to use
16 shards for a single instance.
For RDS DB instances with high performance specifications (16 CPU cores and 32 or 64 GB memory), you are advised to use 32
shards for a single instance.
Note 1: Calculate the total number of shards and the number of RDS DB instances based on the table with the maximum number
of records in a service database.
Note 2: The number of RDS DB instances is more important than the performance specifications of the instances. If resources
are insufficient, adding more RDS DB instances is more effective than upgrading specifications. RDS DB instances with low
specifications can be scaled up in minutes if resources are sufficient. If high-specification RDS DB instances are used,
your only choice is to scale out. Scale-out involves data redistribution and takes hours. It is a slow, risky process.
For example, if there are 1 billion data records and a single table contains 10 million records, a total of 100 database
shards are required. If RDS DB instances with 8 CPU cores and 32 GB memory are used and 16 database shards are used for an
RDS DB instance, you are advised to plan eight RDS DB instances.

44
RDS Storage Space and HA
❑ Evaluate storage capacity from the following factors:
• The total storage capacity needed depends on the existing data volume, on how fast the data is growing (for example,
data growth over the next N years, avoiding scale-out upon rollout), and on how long the data will need to be
retained.
• It is recommended that the disk usage be kept around 60% to 70% of the total disk capacity.
• The storage capacity of a single RDS DB instance on Huawei Cloud is limited to 4 TB.
In summary: Total RDS storage capacity = Data capacity/(60% or 70%)
For example, if the actual data capacity is 5 TB, you are advised to apply for 7.2 TB to 8.4 TB of RDS storage capacity
(5 TB/(60% or 70%) = 7.2 TB to 8.4 TB). Based on the storage capacity limit of a single RDS DB instance, three RDS DB
instances are required.
❑ HA
To avoid single points of failure (SPOFs), select at least primary/standby instances in the production environment.
In scenarios where the service database needs to be provided for other applications, such as big data extraction, you can
apply for at least one read replica to reduce the impact of non-core services on system services.

Comprehensive evaluation
Based on these four factors, determine the performance specifications you will need for your RDS DB instances, how many
database shards will be needed for a single instance, the instance deployment mode, and how many RDS DB instances you will
need.

45
Production Environment Resource Evaluation — DDM
Factors such as TPS/QPS and HA must be considered during DDM resource evaluation.
❑ TPS/QPS
Evaluate the TPS/QPS based on live-network workloads, and then evaluate the number of DDM instance nodes based on the following
performance baselines.
Note: The performance baselines are obtained based on the sysbench stress tests.

DDM Concurrent Requests TPS QPS


1 x 4 CPU cores and 8 GB memory 512 1,127.349 22,563.47
1 x 8 CPU cores and 16 GB memory 512 2,256.993 45,158.85
1 x 16 CPU cores and 32 GB memory 512 4,405.716 88,138.33
1 x 32 CPU cores and 64 GB memory 1,000 6,340.824 126,855.2

❑ HA
1. The total number of CPU cores of all DDM instance nodes is half of that of RDS DB instances.
2. If the first requirement is met, you are advised to use at least two nodes to prevent SPOFs of DDM instances.
Case 1: If the estimated QPS is 40,000, one DDM instance with 8 CPU cores and 16 GB memory can meet the performance baseline
requirements. In this case, to prevent SPOFs, at least two DDM instances with 8 CPU cores and 16 GB memory must be used.
Case 2: A customer's system uses distributed databases. According to the RDS resource evaluation, eight RDS DB instances with 8 CPU
cores and 32 GB memory are required. In this case, 64 CPU cores (8 x 8 cores) are required for RDS instances. Therefore, DDM instance
nodes require a total of 32 cores (0.5 x 64 cores). In this case, you can use four DDM instances with 8 CPU cores and 16 GB memory or
two DDM instances with 16 CPU cores and 32 GB memory.
❑ Comprehensive evaluation
Evaluate the number of required DDM nodes and node performance specifications based on TPS/QPS and HA.
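As a worked illustration, the sizing rules from the last few slides can be combined in one short calculation. The baselines (QPS per RDS instance, shards per instance, the 4 TB storage cap, and DDM cores equal to half of the RDS cores) come from the slides; the workload numbers are illustrative assumptions for a hypothetical system.

```python
import math

app_qps           = 50_000          # peak QPS measured on the live network
qps_per_rds       = 26_756          # baseline of an 8 vCPU / 16 GB RDS for MySQL 5.7 instance
total_records     = 1_000_000_000   # rows in the largest table
records_per_shard = 10_000_000      # recommended upper bound per table shard
shards_per_rds    = 16              # recommended for medium-spec instances
data_tb           = 5               # current data volume (TB)
rds_cores_each    = 8

rds_by_qps     = math.ceil(app_qps / qps_per_rds)
table_shards   = math.ceil(total_records / records_per_shard)
rds_by_shards  = math.ceil(table_shards / shards_per_rds)   # the slide example rounds this up to 8
                                                            # so the 100 shards spread evenly
storage_tb     = data_tb / 0.6                  # keep disk usage around 60-70%
rds_by_storage = math.ceil(storage_tb / 4)      # 4 TB cap per RDS DB instance

rds_instances = max(rds_by_qps, rds_by_shards, rds_by_storage)
ddm_cores     = math.ceil(rds_instances * rds_cores_each / 2)

print(f"RDS instances: {rds_instances}, total DDM CPU cores: {ddm_cores}")
```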
46
Contents
1. Introduction to Key Milestones

2. Migration of Complex Databases

3. Survey of the Source

4. Evaluation of the Destination

5. Planning and Practices for Database Migration

47
Migration Solution Design — Service Overview
[Figure: Service overview of the customer's estate. Client side: applications, HTML5 pages, and APIs. Service applications: gaming, internal office, operations, consumer goods, and other applications (A, B, C, D, F, G, OKR). O&M: a DevOps platform and a monitoring platform (Prometheus and infrastructure). Platform services and essential middle-end services: Disconf, tinker, Lark, CAT, elephant, sonar, eService, wolves, ulike, rooster, Nexus, CAS, twamp, Grafana, test platform, leopard, advertising marketing, API Monitor, Web-appstore, EasyMock, bee, and Dubbo-admin. Big data platform: MRS clusters and test clusters. Infrastructure: database, CDN, network, storage, VM or container, short message, security, gateway, middleware, and scheduling.]
48
MySQL Migration and Rollback Solution —
Migration
⚫ During the migration, MySQL instance C and RDS for MySQL instance B need to be
retained for a period of time.

[Figure: Migration link. Applications and self-managed MySQL instance A run at the migration source; applications and RDS for MySQL instance B run on Huawei Cloud, connected over Direct Connect. A DRS task (①) replicates data from instance A to RDS for MySQL instance B, and a second DRS task (⑦) links RDS for MySQL instance B with MySQL instance C at the source so that the migration can be rolled back; the remaining circled steps (③ through ⑥) switch the applications over.]
49
Distributed Reconstruction Scenario 1 —
Migration Link Design of Shard Migration
[Figure: Shard migration link. Self-managed MySQL instances A, B, and C at the migration source are synchronized over Direct Connect to RDS for MySQL instance D on Huawei Cloud through DRS tasks (① ② ③); the applications are then switched over to Huawei Cloud (④ ⑤ ⑥).]

Migration scenarios
• The source databases and tables are sharded in the applications and data is also routed in the applications. No third-party middleware is used.
• The total amount of data in all table shards is small. If there is only limited potential for data growth, you can combine instances to reduce costs.
• Applications do not need to be reconstructed. You just need to modify the mapping between the domain names and IP addresses of the MySQL instances where the shards are located. Combining databases and tables is not required.

Migration solution
① ② ③ Create a DRS real-time synchronization task to synchronize database-level incremental data from MySQL instances A, B, and C to RDS for MySQL instance D.
④ Start applications on Huawei Cloud and verify services.
⑤ ⑥ Stop services on the migration source, compare data, and ensure that the data is correct. Then, stop the synchronization task.
50
Distributed Reconstruction Scenario 1 —
Migration Topology Structure of Shard Migration
[Figure: Shard migration topology. At the migration source, MySQL instances A1 through A10 each hold one database shard (shard_001 through shard_010), and each database shard contains table shards data_0 through data_63. DRS online migration moves the shards to Huawei Cloud so that each RDS DB instance (RDS 001 through RDS 005) hosts two database shards with the same table shards.]

51
Distributed Reconstruction Scenario 2 -
Migration Link Design for Data Aggregation and
Redistribution

[Figure: Data aggregation and redistribution link. Self-managed MySQL instances A, B, and C at the migration source are aggregated over Direct Connect into RDS for MySQL instance D (the intermediate database) through DRS synchronization tasks (① ② ③), and the data is then redistributed into a DDM instance through a DRS migration task (④); the applications are switched over in the remaining steps (⑤ through ⑧).]

52
Distributed Reconstruction Scenario 2 — Database
Sharding and Table Sharding for Data Aggregation and
Redistribution
[Figure: Source MySQL instances A1 through A3 each hold a database shard (DB 1 through DB 3) containing table shards index_0 through index_63. DRS real-time synchronization aggregates them into a single index table in the intermediate RDS for MySQL database, and DRS online migration then redistributes the data into DDM backed by multiple RDS DB instances.]

✓ To migrate data from a self-managed MySQL database to an RDS for MySQL database, create a DRS real-time synchronization (full+incremental) task, map database names and table names, and aggregate data from multiple tables in multiple shards on the source to a single table on the destination.
✓ To migrate data from Huawei Cloud RDS for MySQL to DDM, create a DRS online migration (full+incremental) task.
53
Distributed Reconstruction Scenario 2 — Database Sharding
Without Table Sharding for Data Aggregation and
Redistribution
⚫ To migrate data from a self-managed MySQL database to an RDS for MySQL database, create a DRS real-time
synchronization (full+incremental) task, map database names, and aggregate data from multiple shards on the
source to a single table on the destination.
⚫ To migrate data from Huawei Cloud RDS for MySQL to DDM, create a DRS online migration (full+incremental)
task.
[Figure: Source MySQL instances A1 through A3 each hold a database shard (DB 1 through DB 3) with tables user_info and ugc_category. DRS real-time synchronization aggregates them into the intermediate RDS for MySQL database, and DRS online migration redistributes the user_info and ugc_category tables into DDM backed by multiple RDS DB instances.]

54
Distributed Reconstruction Scenario 2 — Table Sharding
Without Database Sharding for Data Aggregation and
Redistribution
[Figure: Source MySQL instance A1 holds a single database (DB 1) with table shards user_0 through user_127. DRS real-time synchronization aggregates them into a single user table in the intermediate RDS for MySQL database, and DRS online migration redistributes the data into DDM backed by multiple RDS DB instances.]

✓ To migrate data from a self-managed MySQL database to an RDS for MySQL database, create a DRS real-time synchronization
(full+incremental) task, map database names and table names, and aggregate data from multiple tables on the source to a single
table on the destination.
✓ To migrate data from Huawei Cloud RDS for MySQL to DDM, create a DRS online migration (full+incremental) task.
55
Advantages and Disadvantages of Solutions in
Different Scenarios
Solutions are listed from low to high complexity.

Solution | Scenario | Method | Advantage | Whether Code Reconstruction Is Involved
Migration | There is only one DB instance used for one application. | DRS migration | 1. Mature solution. 2. No code refactoring. | No
Shard migration | 1. There are multiple DB instances with database and table sharding used for one application on the source. 2. The total amount of data in all shards is small. 3. Little service growth is expected. | Use DRS shard migration to aggregate multiple shards of multiple instances on the source to a single RDS DB instance on the destination. | 1. Reduced customer costs. 2. Little code refactoring. | Very little
Data aggregation and redistribution | 1. There are multiple DB instances with database and table sharding used for one application on the source. 2. The total amount of data in all shards is large. 3. Service volume is expected to grow fast, and so is the amount of data. | 1. Use DRS tasks and RDS DB instances to aggregate data. 2. Use DRS tasks and DDM instances to redistribute the data. | 1. Reduced maintenance workload of MySQL instances. 2. Simplified application code. 3. Good scalability of the data layer, simple scale-out, and no impact on services. | Yes
56
SOW for Distributed Reconstruction Scenarios
Application A
• Source: No middleware is used; shard routing is implemented through the application code. Sharding modes: database and table sharding, database sharding without table sharding, and table sharding without database sharding.
• Destination and solution: distributed middleware (DDM); data aggregation and redistribution.
• Project side:
  - Survey the current source and expected future workloads, evaluate the resources of the destination (including RDS DB instances and DDM instances), and design the migration topology.
  - Create DDM schemas (mount RDS DB instances and specify the total number of shards), accounts, and logical tables (specify sharding keys and sharding algorithms).
  - Create synchronization links to synchronize data. After the migration is complete, compare the data.
• Customer side:
  - Reconstruct applications on the destination: delete the data routing logic (routing fields and algorithms) from the source code, and create new code that accesses logical tables in DDM in the same way as accessing a single MySQL table.
  - Configuration modification: configure the database connection pool. Before reconstruction, applications need to maintain and access 64 data sources (64 IP addresses and 64 ports); after reconstruction, applications only need to maintain and access a single data source (DDM: 1 IP address and 1 port).

Application B
• Source: The code branch pulled from Application A is refactored, and the sharding middleware Sharding-JDBC is used in the code. Sharding modes: database and table sharding, and database sharding without table sharding.
• Destination and solution: RDS for MySQL; shard migration and instance combination.
• Project side:
  - Survey the current source and expected future workloads, evaluate the resources of the destination, and design the migration topology.
  - Create synchronization links to synchronize data. After the migration is complete, compare the data.
• Customer side:
  - Modify the mapping of private DNS domain names (map the domain names to the new IP addresses).
  - The application code does not need to be refactored; the data routing logic is still implemented on the application side.

57
Reflections on Distributed Database Migration
Migration links
▪ Use DRS real-time synchronization tasks to aggregate the data of table shards in each database shard on the source to intermediate databases on the destination.
▪ Use DRS online migration tasks to synchronize data from the intermediate databases to DDM instances.

Migration problems
▪ After the data of table shards in each database shard on the source is aggregated to RDS DB instances (intermediate databases) on Huawei Cloud using DRS, the data comparison shows that the total number of data records on the destination is about 100,000 less than that on the source.
▪ After analysis, the primary key values on the source are not globally unique and are unique only within each database shard or table shard. After data is aggregated to a single table in the intermediate or destination databases, data with duplicate key values is deleted.

Solutions
▪ Change the database table structure on the source and add the uid column.
▪ Reconstruct applications on the source so that a uid value is added to the data before it is written into the database.
▪ Use a script to copy the value of the uid column in the primary table to the uid column in child tables based on the associated field content_id.
▪ Change the database table structure of the intermediate or destination databases. Use content_id and uid as composite primary keys.
▪ Bring the new code online. Perform the addition, deletion, modification, and query operations based on composite primary keys.

Summary and reflections
▪ In a distributed database, the primary key value of a table must be globally unique. Otherwise, data migration or server merging (for example, in the gaming industry) is difficult and the reconstruction is expensive.
▪ If distributed middleware is used, the middleware can be used to ensure that the primary key value is globally unique.
▪ If distributed middleware is not used, for example, when data splitting and routing are performed in an application, a primary key generator can be used to ensure global uniqueness of a primary key value.
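A minimal sketch of the uid backfill script mentioned under Solutions is shown below. The content_id and uid column names come from the slide; the primary table name (content), the list of child tables, and the connection details are illustrative assumptions.

```python
import pymysql

CHILD_TABLES = ["content_detail", "content_comment"]   # hypothetical child tables

conn = pymysql.connect(host="source-db", user="admin", password="***", database="svc_db")
with conn.cursor() as cur:
    for child in CHILD_TABLES:
        # Join each child table to the primary table on content_id and copy uid
        # into the newly added uid column of the child table.
        cur.execute(
            f"""UPDATE {child} AS c
                JOIN content AS p ON c.content_id = p.content_id
                SET c.uid = p.uid
                WHERE c.uid IS NULL"""
        )
conn.commit()
conn.close()
```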

58
Thank You.
Copyright © 2023 Huawei Technologies Co., Ltd. All Rights Reserved.
The information in this document may contain predictive statements including,
without limitation, statements regarding the future financial and operating
results, future product portfolio, new technology, etc. There are a number of
factors that could cause actual results and developments to differ materially
from those expressed or implied in the predictive statements. Therefore, such
information is provided for reference purpose only and constitutes neither an
offer nor an acceptance. Huawei may change the information at any time
without notice.
