Module 3.
Hyperstore Software Architecture
Now, we are going to talk about some of the design elements of Hyperstore's
software architecture.
Some of these we have already mentioned.
By now you already know Hyperstore is natively S3 compatible.
Hyperstore is also designed using a peer-to-peer architecture, sometimes
referred to as a shared-nothing architecture.
This inherently allows for partition tolerance, or the ability to continue
operations in the event that hardware fails.
As we also mentioned, we store metadata in a distributed database to allow for a
scale-out architecture.
For this, we use Cassandra.
Cassandra is not a Cloudian product.
It is maintained by Apache and was originally designed by Facebook.
Hyperstore is designed to be multi-tenant: in other words, able to share storage
across multiple applications and multiple users,
or in the case of managed service providers, multiple tenants.
We have two programmable APIs in the solution.
The first, the S3 API, we have already talked about.
There is another called the admin API, which we will discuss later on.
Although Hyperstore is intended as an on-prem solution, we do support the ability
to tier data to the public or other private clouds and support hybrid cloud models.
We also provide intelligent, proactive support capabilities using Smart Support.
And lastly, there is the intelligence of the software, which is the power behind our
Hyperstore solution.
At a high level, Hyperstore can be regarded as a collection of integrated services.
Each, however, plays a different role.
These are access, metadata management, data management, user management, quality
of service, system management from the CLI, system management via the web UI portal,
and lastly configuration management.
Some of these services we refer to as common services.
These common services run on every node in the cluster. Others are regarded as
master/slave services and only run on some of the nodes.
The common services are S3, Cassandra, Hyperstore, the admin service, and the CMC
service.
In a large, distributed environment, it is beneficial and in very, very large
environments necessary to have a single source of configuration management.
With Cloudian Hyperstore, we achieve this using the Puppet configuration
management platform.
This modular approach allows us to scale linearly for both capacity and
performance.
It is a single software stack, which makes it easy to scale with no separate
gateways, storage managers, or metadata servers.
This allows us to provide a single scalable namespace with capacity and performance
scaling as the cluster expands.
Hyperstore Interfaces. We have two programmable APIs, the S3 API, and the admin
management API.
Applications capable of interfacing with the S3 API can do so directly.
The other API is the admin API, which listens on port 19443 on all servers in the
cluster, assuming a single region.
A customer or partner who wishes to create their own management portal can do so
and utilize the same functionality as the CMC web interface.
This is also useful for customers or partners who wish to write scripts which can
be run as required directly against the admin management API.
Our CMC is our web interface portal. From this portal, all administration tasks can
be performed as you can see from the slide.
The web interface can interface directly with the admin management API.
When an action is taken on the portal, in the background, the corresponding admin
management API request is sent to one of the servers listening on the admin
management API port.
You can also see an arrow pointing from the CMC web interface to the S3 API.
This is used to allow a user or administrator or group administrator to view the
buckets and objects of users on the system.
Our Hyperstore S3 service supports the vast majority of the Amazon S3 REST API,
including advanced features.
It supports the same REST formats for the following areas as specified in the S3
specification.
It works over HTTP and HTTPS. It is responsible for error responses and the REST
error format,
common request headers and response headers, authenticating requests,
and operations on the service, buckets, objects, and ACLs.
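To make this concrete, here is a minimal sketch of connecting a standard S3 client to the Hyperstore S3 service, using boto3 in Python as an example library. The endpoint URL, credentials and bucket name are placeholders, not values from this module.

```python
# Minimal sketch: pointing a standard S3 client (boto3) at a HyperStore
# S3 endpoint. Endpoint URL, credentials and bucket name are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-region1.example.com",  # your HyperStore S3 endpoint
    aws_access_key_id="ACCESS_KEY",                 # user's access key
    aws_secret_access_key="SECRET_KEY",             # user's secret key
)

# Standard S3 operations work unchanged against the HyperStore S3 service.
s3.create_bucket(Bucket="demo-bucket")
s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello world")
print(s3.list_objects_v2(Bucket="demo-bucket")["Contents"][0]["Key"])
```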
The Hyperstore Admin API. In comparison, whereas the S3 API is used for data
management, the Hyperstore Admin API is used to manage Hyperstore.
This includes creating user groups, managing storage policies and rating plans,
setting quality of service controls, storing auto-tiering credentials,
managing public URLs, usage reporting and billing, as well as system services and
system monitoring.
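As a hedged illustration of scripting against the admin API, the sketch below issues an HTTPS request to port 19443 using Python's requests library. The request path, credentials and TLS settings are assumptions for illustration only; the actual resources and authentication are defined in the Admin API reference for your cluster.

```python
# Illustrative sketch of calling the HyperStore Admin API with requests.
# The path, credentials and certificate handling below are assumptions,
# not values confirmed by this module.
import requests

ADMIN_ENDPOINT = "https://admin.example.com:19443"   # admin API port from this module
AUTH = ("admin-user", "admin-password")              # placeholder credentials

resp = requests.get(
    f"{ADMIN_ENDPOINT}/group/list",   # hypothetical resource path
    auth=AUTH,
    verify=False,                     # only for lab / self-signed certificates
)
resp.raise_for_status()
print(resp.json())
```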
The Cloudian Management Console, or CMC, is a customizable or brandable multi-
tenant web user portal.
This portal allows for management of the Hyperstore cluster and reporting.
It provides an S3 client or data explorer with upload, download, delete and file
sharing features.
It provides group and user account management through role-based access control, as
well as configuration settings for SMTP and SNMP alerting for cluster health.
Hyperstore Data Management. From a high level, the structure of a Hyperstore
cluster consists of a number of nodes.
We support a minimum of three nodes. Each of those nodes must reside within a data
center.
Note that a node cannot reside in more than one data center, and node placement
should be based on physical location.
A data center or multiple data centers reside within a single region.
A region is generally a geographic location. However, a region can also be used to
segregate data from other data, for example for multi-tenancy.
If tenants require data separation from other tenants at the node level, then
multiple regions could be a way to achieve this.
Multiple regions would exist within a single cluster.
A single Cloudian Management Console can manage up to 20 regions.
A user can create a bucket or select a bucket from any region in the cluster.
A bucket within a region can span multiple data centers but cannot span multiple
regions.
Data is distributed between nodes and data centers in a region based on a defined
storage policy.
Hyperstore's cluster architecture is based on a shared nothing architecture.
Each node has connectivity to every other node in the cluster.
There is no limitation on individual nodes, and nodes can be of varying capacity or
hardware.
They can be on premises or in the cloud, and indeed physical or virtual.
The peer-to-peer nature ensures that data is only ever one hop away.
This provides for consistent performance.
It is identical software on all the nodes irrespective of the hardware.
As the software is built around the S3 API at its core, there are no gateways
involved and you can start small and grow as needed.
As S3 is a common service, the S3 API is reachable on every single node in the
cluster.
Therefore, scaling the S3 capability is a simple matter of adding additional
nodes.
A single Hyperstore cluster can have one or many storage policies.
Storage policies determine the data protection and consistency level applied,
which in turn determines the data protection and consistency at the bucket and
object level.
Consistency levels can be applied to ensure strong consistency, eventual
consistency, or dynamic consistency.
These will be discussed later.
From a disaster recovery perspective, we can create standard replication across
data centers,
replicated erasure coding across data centers, and where there are three or more
data centers in a metro area,
then distributed erasure coding is an option.
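As a simple illustration of what these protection choices mean for capacity, the arithmetic below compares a three-copy replication policy with an erasure coding 4+2 policy for a single 1 GB object. The replication factor of 3 is an assumed example, not a default stated in this module.

```python
# Illustrative arithmetic only: raw-capacity overhead of 3-copy replication
# versus erasure coding 4+2 for a 1 GB object. RF=3 is an assumed example.
object_gb = 1.0

replicas = 3
replicated_raw = object_gb * replicas                    # 3.0 GB of raw capacity

ec_data, ec_parity = 4, 2                                # erasure coding 4+2
ec_raw = object_gb * (ec_data + ec_parity) / ec_data     # 1.5 GB of raw capacity

print(f"3x replication stores {replicated_raw:.1f} GB, EC 4+2 stores {ec_raw:.1f} GB")
```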
In this slide, we will try to show how data is distributed in a cluster.
When an object is stored, a combination of the bucket name and the object name,
as well as the date and timestamp, is converted into an MD5 hash, which is then
converted into an integer.
This gives us a unique key, which we use to determine where we will store the data
in the cluster.
For metadata, the storage policy also determines the number of metadata copies.
A node list is generated based on the number of copies required,
and the metadata is stored in the Cassandra database on those nodes, on SSD.
A similar process happens for the data.
The number of object copies or fragments required, based on replication or
erasure coding,
as well as the size of the object, is used to generate a node list that determines
where each fragment or copy of the data will be stored
on the high-capacity drives of the relevant nodes.
This diagram attempts to explain that process.
At the top is the client which needs to store an object.
The S3 request is sent to one of the nodes in the cluster through either the use of
a load balancer or DNS round robin.
That node takes on a dynamic role which we call the coordinator role.
It is the coordinator's responsibility to determine if there are enough nodes
available to satisfy the storage policy data protection scheme.
If there are, the coordinator node is then responsible for sending the data
fragments to each of the nodes that will store them.
In this example, the object being stored is 60 megabytes in size.
We adopt a feature called chunking in order to reduce that 60 megabyte file down
into manageable 10 megabyte chunks.
Each of those 10 megabyte chunks can be regarded as a separate object for storage,
and as such each chunk will be protected
using the storage policy's data protection scheme, in this instance erasure coding
4 plus 2.
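The arithmetic behind that example can be sketched as follows; fragment sizes are simple division and ignore any metadata or padding overhead.

```python
# Worked example of the chunking described above: a 60 MB object is split
# into 10 MB chunks, and each chunk is protected with erasure coding 4+2.
object_mb = 60
chunk_mb = 10
ec_data, ec_parity = 4, 2

chunks = object_mb // chunk_mb                              # 6 chunks
fragment_mb = chunk_mb / ec_data                            # 2.5 MB per fragment
fragments_per_chunk = ec_data + ec_parity                   # 6 fragments per chunk
raw_stored_mb = chunks * fragments_per_chunk * fragment_mb  # 90 MB of raw capacity

print(f"{chunks} chunks, {fragments_per_chunk} x {fragment_mb} MB fragments each, "
      f"{raw_stored_mb} MB stored in total")
```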
As the initial location where data is stored is based on the text of the bucket
name, object name, and timestamp,
it is possible that the resultant integer range could be allocated to some
disks more than others.
This could produce an imbalance of data storage.
Cloudian Hyperstore uses a feature called dynamic object routing, which will
periodically determine if certain disks are being more utilized than others.
If this happens, the dynamic object routing table will start to use the other disks
in the nodes in order to store the data in a more balanced way.
Dynamic object routing will also be employed where a disk in a node has failed;
the remaining disks in that node can then be used to store data that was destined
for the failed disk.
Cluster expansion is a simple process of adding additional nodes to increase
capacity and/or front-end compute.
It would be inefficient if new nodes in the cluster were only able to receive net
new data.
Therefore, the process of adding additional nodes will also rebalance the
existing data across all nodes in the cluster.
This allows for a more balanced capacity usage going forward.
As well as data expansion being a non-intrusive process which can all be done
online, so too is our software upgrade process.
There is no requirement for downtime.
Once initiated, it is an automated process.
Software upgrades are currently done using the Puppet framework, where the upgrade
is processed on a node-by-node basis.
Assuming a data protection scheme and consistency level have been employed that can
tolerate a node failure,
the node upgrade will have no impact on the operation of the cluster.
Automated maintenance and self-healing.
Data protection determines the durability of the stored data.
Consistency level determines data availability.
With a consistency level which utilizes eventual consistency, data remains
available even when a disk or node is in a failure state.
Eventual consistency also helps reduce S3 write request latencies, as not all
writes need to be committed before acknowledgement is returned to the client.
It however does still ensure a high degree of data durability.
It must be noted that not all of the object's intended replicas or erasure coded
fragments may exist in the system when using an eventual consistency policy.
The data will be written when the node or disk is brought back online, either
automatically via proactive repair or other means.
We utilize proactive repair, repair on read and auto repair for this process.
The current state of repair within the cluster can be seen via the CMC by clicking
on the analytics tab and then rebuilding status subtab.
Here we can see the repair status legend which identifies which nodes are in
proactive rebuild, rebalancing data, conducting a data integrity check or requiring
repair.
It also shows nodes which the cluster regards as all clear.
Green is good.
Proactive repair happens when a node has been down for a limited time.
The default is restoration of any replicas that were destined for
that node, provided the node was down for less than 4 hours.
This is also known as hints.
If the node has been offline for greater than 4 hours then additional features are
required to restore the intended number of replicas or erasure coding fragments to
the node.
One of these is called auto repair.
Not all nodes in the cluster will run auto repair at the same time, however; it is
staggered over a 30-day period across all nodes in the cluster.
A common process to repair objects outside of these time scales is repair on read.
This will repair replicas or fragments which are either out of date or missing
during a read process.
If a replica or fragment is missing or out of date during a GET request, then that
data is re-filled during the GET process.
System architecture.
As hopefully you will be starting to understand, Cloudian Hyperstore is very much a
network-based solution.
As such, there are some network components and information that Cloudian
Hyperstore relies on to function correctly.
One of these bits of information is the network name of the cluster.
We refer to this as an S3 endpoint.
The S3 endpoint is a URL the S3 client uses to connect to Hyperstore.
If this is a public URL with no access restrictions, that is all that is required.
If, however, the data is protected, then to ensure only authorized access, an
access key and secret key are also required.
As the S3 endpoint is a URL, it requires a DNS lookup to translate the DNS name to
an IP address.
It is recommended that the DNS lookup points to a virtual IP address, commonly
running on a load balancer.
The load balancer will then, through health checks, be able to determine which of
the cluster nodes are able to respond to the S3 request.
If a load balancer is not used then basic DNS round robin is an option.
However this is not without its challenges.
As can be seen in this slide using DNS round robin can very quickly become complex
as each endpoint and URL requires an entry for each Hyperstore node.
In a small cluster of three nodes this may be manageable.
However, as clusters typically grow, each additional node added requires multiple
entries in the DNS table.
These required DNS entries can be seen here.
As mentioned the S3 service endpoint is just one of several required DNS entries.
There will also be one of these per region.
Each bucket that is created in Hyperstore can also be accessed via the bucket name.
For this we would recommend the use of a wildcard DNS entry as bucket names may not
always be readily known ahead of time.
And this helps reduce the management overhead of having to add a new DNS entry
every time a bucket is created.
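To illustrate why the wildcard entry helps, the sketch below assumes virtual-hosted-style addressing, where the bucket name becomes part of the hostname. The endpoint, credentials and bucket name are placeholders.

```python
# Minimal sketch: with virtual-hosted-style addressing the bucket name becomes
# part of the hostname, so every bucket needs "<bucket>.<s3-endpoint>" to
# resolve. Endpoint and credentials below are placeholders.
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3-region1.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "virtual"}),
)

# This request goes to demo-bucket.s3-region1.example.com, which a wildcard
# DNS record (*.s3-region1.example.com) can resolve without a per-bucket entry.
s3.list_objects_v2(Bucket="demo-bucket")
```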
We also support static website hosting from a bucket.
If this is a feature that is required, then as it is enabled at the bucket level,
you will also need a wildcard entry for the website endpoint.
A critical DNS entry is the admin service.
As each server or node runs the admin service, it has to be added into the DNS.
The Cloudian Management Console service needs to be able to resolve that entry so
it can issue admin API requests.
Lastly every node can also respond and supply access to the Cloudian Management
Console.
Therefore a DNS entry for the CMC is also required.
Cloudian would recommend the use of a load balancer rather than DNS round robin for
the following reason.
DNS typically does not employ health checks of the service and is only supplying an
IP address to the client that has requested the name.
It is possible the supplied IP address may be down and therefore the S3, CMC or
admin request will time out based on the TTL in your network.
This could be considerable.
Subsequent requests will receive the next IP address in the DNS list.
However this gives inconsistent performance and user experience.
By employing a load balancer it is possible to not only send requests to IP
addresses which are online but extend that to only send the request to a node where
the service is also online.
It is possible to have a Hyperstore node where only the S3 service has been taken
down for some reason.
In this scenario we would still want CMC and admin requests to be routed to
this node.
Hyperstore can now also be found in cloud architecture.
Customers, users and tenants can access data via the CMC or third-party S3 API
applications and clients.
Service providers and administrative tenants can use the cloud provider management
portal or CMC portal
in order to configure access to Cloudian Hyperstore through the cloud
architecture or orchestration tools natively.
This allows for the provisioning, usage tracking and reporting, alerting, logging
and quota management as well as chargeback.
An example of this is VMware VCD using a virtual object storage engine.
Some of the capabilities and offerings can be seen here for management.
Using the VCD UI integration provides a single management platform for both
orchestration and usage of Cloudian Hyperstore.
It can be used for importing and exporting VCD objects and provides single sign-on
between the Hyperstore CMC and VCD.
It also provides for multi-region support.
Further capabilities include reporting for usage tracking and utility based
consumption.
Some of the use cases supported are providing an S3-compatible storage service for
temporary storage,
acting as a document store service, file sharing with guest access, and holding VCD
catalogs and templates.
User management capabilities include object storage specific rights in VCD, quota
management and usage integration.
There is also close integration using the Hyperstore APIs for S3, IAM and admin.