Cloud Computing Notes

The document provides an overview of cloud computing, including its delivery models (IaaS, PaaS, SaaS), advantages, types, and ethical challenges. It discusses net-centric computing principles and peer-to-peer computing architectures, highlighting their characteristics, benefits, and drawbacks. Additionally, it addresses major issues in cloud computing such as privacy, compliance, security, and sustainability.


UNIT I:

Introduction: Network-centric computing, network-centric content, peer-to-peer systems, cloud computing delivery models and services, ethical issues, vulnerabilities, major challenges for cloud computing. Parallel and Distributed Systems: Introduction, architecture, distributed systems, communication protocols, logical clocks, message delivery rules, concurrency, modelling concurrency with Petri Nets.

Cloud Computing is the delivery of computing services such as servers, storage, databases, networking, software, analytics, and intelligence over the Internet ("the cloud").

Cloud Computing provides an alternative to the on-premises datacentre. With an on-premises datacentre, we have to manage everything ourselves: purchasing and installing hardware, virtualization, installing the operating system and any other required applications, setting up the network, configuring the firewall, and setting up storage for data. After completing the set-up, we remain responsible for maintaining it through its entire lifecycle.

But if we choose Cloud Computing, a cloud vendor is responsible for the hardware purchase and maintenance. The vendor also provides a wide variety of software and platforms as a service, and we can rent any required service. Cloud computing services are charged based on usage.
The cloud environment provides an easily accessible online portal that makes it handy for the user to manage compute, storage, network, and application resources. Common cloud service providers include Amazon Web Services, Microsoft Azure, and Google Cloud.

Advantages of cloud computing

 Cost: It reduces the huge capital cost of buying hardware and software.
 Speed: Resources can be provisioned in minutes, typically with a few clicks.
 Scalability: We can increase or decrease resources according to business requirements.
 Productivity: Cloud computing requires less operational effort. We do not need to apply patches or maintain hardware and software, so the IT team can be more productive and focus on achieving business goals.
 Reliability: Backup and recovery of data are less expensive and very fast, which supports business continuity.
 Security: Many cloud vendors offer a broad set of policies, technologies, and controls that strengthen our data security.

Types of Cloud Computing

 Public Cloud: Cloud resources that are owned and operated by a third-party cloud service provider are termed public clouds. A public cloud delivers computing resources such as servers, software, and storage over the internet.
 Private Cloud: Cloud computing resources that are used exclusively by a single business or organization are termed a private cloud. A private cloud may be physically located in the company’s on-site datacentre or hosted by a third-party service provider.
 Hybrid Cloud: It is the combination of public and private clouds, bound together by technology that allows data and applications to be shared between them. A hybrid cloud provides flexibility and more deployment options to the business.

Types of Cloud Services

1. Infrastructure as a Service (IaaS): In IaaS, we can rent IT infrastructure such as servers and virtual machines (VMs), storage, networks, and operating systems from a cloud service vendor. We can create a VM running Windows or Linux and install anything we want on it. Using IaaS, we do not need to care about the hardware or virtualization software, but we do have to manage everything else. IaaS gives maximum flexibility, but we still need to put more effort into maintenance (a short code sketch follows this list).
2. Platform as a Service (PaaS): This service provides an on-demand environment for developing, testing, delivering, and managing software applications. The developer is responsible for the application, and the PaaS vendor provides the ability to deploy and run it. With PaaS, flexibility is reduced, but management of the environment is taken care of by the cloud vendor.
3. Software as a Service (SaaS): It provides centrally hosted and managed software services to end-users. It delivers software over the internet, on demand, and typically on a subscription basis, e.g., Microsoft OneDrive, Dropbox, WordPress, Office 365, and Amazon Kindle. SaaS minimizes operational cost for the customer.
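As a concrete illustration of the IaaS idea above, the sketch below uses Python with the boto3 library (Amazon's SDK) to rent a single virtual machine. This is only a minimal sketch: the image ID, key pair name, and region are hypothetical placeholders, and it assumes boto3 is installed and AWS credentials are already configured locally.

# Minimal IaaS sketch: renting a virtual machine programmatically.
# The image ID, instance type, key name and region are placeholder values.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical Linux image ID
    InstanceType="t2.micro",           # small, pay-per-use instance size
    KeyName="my-key-pair",             # hypothetical SSH key pair
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", instances[0].id)

Everything above the hypervisor (operating system patches, installed software, firewall rules inside the VM) remains the user's responsibility, which is exactly the IaaS trade-off described in point 1.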
Net-centric computing is a set of principles that has been heavily adopted by the non-profit organization Cloud Security Alliance. Cloud computing is here to stay, but the cloud ecosystem can be complex to navigate. This section looks at net-centric principles and breaks them down in plain English.

The cloud is a computer network of remote computing servers, made accessible on-demand
to users who may be located anywhere. The cloud gives you the ability to access, store and
share your information and data from any Internet-connected device.

The cloud has revolutionized the way companies store and share information in comparison to traditional on-premises infrastructures. However, not all organizations have yet taken advantage of this technology. The cloud computing service provider industry offers services in the following categories:

 IaaS (Infrastructure as a Service)
 PaaS (Platform as a Service)
 SaaS (Software as a Service)

SaaS is basically application delivery over the Internet. The application is installed on the cloud provider’s servers, and each user accesses it through a web browser interface. The data that you store in this environment can be accessed from any device with an internet connection.

PaaS offers a platform over the cloud where each user can access resources such as databases, storage, and bandwidth with a single login. The platform enables users to develop and deploy applications using application programming interfaces (APIs).

IaaS provides storage, processor power, memory, operating systems, and networking
capabilities to customers so that they do not have to buy and maintain their own computer
system infrastructure.



Net-Centric:

Net-Centric is a way to manage your data, applications, and infrastructure in the cloud. Net-
centric cloud computing can be considered an evolution of Software as a Service (SaaS). It
leverages the power of the Internet to provide an environment for data, applications, and
infrastructure on demand. It allows you to manage everything from one interface without
worrying about hardware or server management issues.

The term net-centric combines network-based computing with the integration of various types of information technology resources – servers, storage devices, computers – into centralized repositories that are served using standard Web-based protocols such as HTTP or HTTPS over a global computer communications network like the internet.

Net-centric computing allows organizations to focus on their core business needs without being limited by software or hardware constraints imposed on their infrastructure. In other words, when an organization adopts net-centric principles, it is able to completely virtualize its IT footprint while still taking advantage of modern networking technologies like LANs and WANs.

Net-centric cloud computing service is a combination of IaaS, PaaS, and SaaS. What this
means is that instead of buying hardware and software for your own data center, you buy it
from the cloud provider. This gives you the ability to move your data to the cloud and access
it from anywhere.

Net-centric computing service allows you to centralize your applications with a single interface. It provides fully managed services according to the user’s specific requirements, which are invoked in real time as needed rather than being pre-provisioned for use. The concept of net-centric computing enables multiple distributed clients to access a single entity’s applications in real time.

Benefits of Net-Centric Computing:

Net-centric computing allows organizations to effectively manage their IT infrastructure via a unified application that is more flexible and easier to maintain, without the added overhead of operating multiple hardware platforms. In turn, organizations of all sizes can now enjoy the same benefits that larger, more traditional enterprises achieve with their own data centers. The net-centric virtualization platform establishes a single management point for security, performance, and capacity, as well as for cloud applications and services.

In cloud computing, there are many advantages over traditional data center technologies.
Cloud computing allows for agility on a business level by not having to invest in maintaining
multiple physical data centers.

Cloud computing has gained traction with both enterprises and consumers. It is expected that
CSPs will continue to embrace this technology as it becomes the norm for organizations of all
kinds. As a result, CISOs need to be trained on how to adopt net-centric principles for
managing the cloud without limits in order to be successful within this new market.
Peer to peer computing

The peer to peer computing architecture contains nodes that are equal participants in data sharing. All tasks are divided equally between the nodes. The nodes interact with each other as required and share resources.


Characteristics of Peer to Peer Computing

The different characteristics of peer to peer networks are as follows −

 Peer to peer networks are usually formed by groups of a dozen or fewer computers. These computers all store their data using individual security but also share data with all the other nodes.
 The nodes in peer to peer networks both use resources and provide resources. So, if the number of nodes increases, the resource sharing capacity of the peer to peer network increases. This is different from client-server networks, where the server gets overwhelmed if the number of nodes increases.
 Since nodes in peer to peer networks act as both clients and servers, it is difficult to
provide adequate security for the nodes. This can lead to denial of service attacks.
 Most modern operating systems such as Windows and Mac OS contain software to
implement peer to peer networks.

Advantages of Peer to Peer Computing

Some advantages of peer to peer computing are as follows −

 Each computer in the peer to peer network manages itself. So, the network is quite
easy to set up and maintain.
 In the client server network, the server handles all the requests of the clients. This
provision is not required in peer to peer computing and the cost of the server is saved.
 It is easy to scale the peer to peer network and add more nodes. This only increases
the data sharing capacity of the system.
 None of the nodes in the peer to peer network are dependent on the others for their
functioning.
Disadvantages of Peer to Peer Computing

Some disadvantages of peer to peer computing are as follows −

 It is difficult to back up the data as it is stored on different computer systems and there is no central server.
 It is difficult to provide overall security in the peer to peer network as each system is independent and contains its own data.
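To make the "every node is both client and server" idea concrete, here is a minimal sketch of a peer in Python: one thread listens for incoming requests while the main program can query another peer. The port and the other peer's address are hypothetical placeholders; a real peer-to-peer system would add peer discovery and security on top of this.

# Minimal peer sketch: each node both serves data and requests data.
# Ports and peer addresses are illustrative placeholders.
import socket
import threading

SHARED = {"hello.txt": b"hello from this peer"}

def serve(port):
    # Server role: answer requests for locally shared items.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            name = conn.recv(1024).decode().strip()
            conn.sendall(SHARED.get(name, b"NOT FOUND"))

def fetch(peer_host, peer_port, name):
    # Client role: request an item from another peer.
    with socket.create_connection((peer_host, peer_port)) as c:
        c.sendall(name.encode())
        return c.recv(4096)

if __name__ == "__main__":
    threading.Thread(target=serve, args=(9000,), daemon=True).start()
    # Example query against another (hypothetical) peer on the network:
    # print(fetch("192.168.1.20", 9000, "hello.txt"))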

Cloud Service Models

There are the following three types of cloud service models -

1. Infrastructure as a Service (IaaS)
2. Platform as a Service (PaaS)
3. Software as a Service (SaaS)

Infrastructure as a Service (IaaS)

IaaS is also known as Hardware as a Service (HaaS). It is a computing infrastructure managed over the internet. The main advantage of using IaaS is that it helps users to avoid the cost and complexity of purchasing and managing physical servers.

Characteristics of IaaS

There are the following characteristics of IaaS -

 Resources are available as a service
 Services are highly scalable
 Dynamic and flexible
 GUI and API-based access
 Automated administrative tasks
Example: DigitalOcean, Linode, Amazon Web Services (AWS), Microsoft Azure, Google
Compute Engine (GCE), Rackspace, and Cisco Metacloud.


Platform as a Service (PaaS)

PaaS cloud computing platform is created for the programmer to develop, test, run, and
manage the applications.

Characteristics of PaaS

There are the following characteristics of PaaS -

 Accessible to various users via the same development application.
 Integrates with web services and databases.
 Builds on virtualization technology, so resources can easily be scaled up or down as per the organization's need.
 Supports multiple languages and frameworks.
 Provides the ability to "Auto-scale".

Example: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App
Engine, Apache Stratos, Magento Commerce Cloud, and OpenShift.


Software as a Service (SaaS)

SaaS is also known as "on-demand software". It is software in which the applications are hosted by a cloud service provider. Users can access these applications over an internet connection using a web browser.

Characteristics of SaaS

There are the following characteristics of SaaS -

 Managed from a central location
 Hosted on a remote server
 Accessible over the internet
 Users are not responsible for hardware and software updates; updates are applied automatically.
 The services are purchased on a pay-per-use basis.

Example: BigCommerce, Google Apps, Salesforce, Dropbox, ZenDesk, Cisco WebEx, Slack, and GoToMeeting.

Difference between IaaS, PaaS, and SaaS

The below table shows the difference between IaaS, PaaS, and SaaS -

Scope: IaaS provides a virtual data center to store information and create platforms for app development, testing, and deployment. PaaS provides virtual platforms and tools to create, test, and deploy apps. SaaS provides web software and apps to complete business tasks.

Resources: IaaS provides access to resources such as virtual machines, virtual storage, etc. PaaS provides runtime environments and deployment tools for applications. SaaS provides software as a service to the end-users.

Users: IaaS is used by network architects. PaaS is used by developers. SaaS is used by end users.

Layers delivered: IaaS provides only infrastructure. PaaS provides infrastructure + platform. SaaS provides infrastructure + platform + software.

Cloud Computing is a new name for an old concept: the delivery of computing services from a remote location. Cloud Computing is Internet-based computing, where shared resources, software, and information are provided to computers and other devices on demand.

These are major issues in Cloud Computing:

1. Privacy: The user data can be accessed by the host company with or without permission.
The service provider may access the data that is on the cloud at any point in time. They could
accidentally or deliberately alter or even delete information.

2. Compliance: There are many regulations in place related to data and hosting. To comply with regulations (Federal Information Security Management Act, Health Insurance Portability and Accountability Act, etc.), the user may have to adopt deployment modes that are expensive.

3. Security: Cloud-based services involve a third party for storage and security. Can one assume that a cloud-based company will protect and secure one’s data if one is using their services at a very low cost or for free? They may share users’ information with others. Security presents a real threat to the cloud.

4. Sustainability: This issue refers to minimizing the effect of cloud computing on the environment. Because data centers have a significant environmental footprint, countries where the climate favors natural cooling and renewable electricity is readily available, such as Finland, Sweden, and Switzerland, are trying to attract cloud computing data centers. But beyond nature’s favors, would these countries have enough technical infrastructure to sustain high-end clouds?

5. Abuse: While providing cloud services, it should be ascertained that the client is not purchasing cloud computing services for a nefarious purpose. In 2009, a banking Trojan illegally used the popular Amazon service as a command and control channel that issued software updates and malicious instructions to PCs infected by the malware. So the hosting companies and the servers should have proper measures to address these issues.

6. Higher Cost: If you want to use cloud services uninterruptedly, you need a powerful network with higher bandwidth than ordinary internet connections, and if your organization is large, an ordinary cloud service subscription may not suit it. Otherwise, you might face hassles in using an ordinary cloud service while working on complex projects and applications. This is a major problem for small organizations and restricts them from diving into cloud technology for their business.

7. Recovery of lost data in contingency: Before subscribing to any cloud service provider, go through all the norms and documentation and check whether their services match your requirements and whether they maintain a sufficient, well-kept resource infrastructure. Once you subscribe to the service, you essentially hand over your data to a third party. If you choose a proper cloud service, then you will not need to worry about recovering lost data in any contingency.

8. Upkeep (management) of the cloud: Maintaining a cloud is a herculean task, because a cloud architecture contains a large resource infrastructure along with other challenges and risks, such as user satisfaction. Users usually pay for how much of the resources they have consumed, so it sometimes becomes hard to decide how much should be charged when the user wants scalability and extended services.

9. Lack of resources/skilled expertise: One of the major issues that companies and enterprises are going through today is the lack of resources and skilled employees. Every second organization seems interested in, or has already moved to, cloud services. Because of this, the workload in the cloud is increasing, so cloud service hosting companies need continuous, rapid advancement. Due to these factors, organizations are having a tough time keeping up to date with the tools. As new tools and technologies emerge every day, more skilled/trained employees are needed. These challenges can only be minimized through additional training of IT and development staff.

10. Pay-per-use service charges: Cloud computing services are on-demand services; a user can extend or compress the volume of resources as per need and pays only for what has been consumed. This makes it difficult to define a fixed, pre-determined cost for a particular quantity of services. Such ups and downs and price variations make budgeting for cloud computing difficult and intricate. It is not easy for a firm’s owner to anticipate consistent demand and the fluctuations that come with seasons and various events, so it is hard to build a budget for a service that could consume several months’ worth of budget in a few days of heavy use.
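A small worked example (with entirely hypothetical prices and usage numbers) of why pay-per-use bills are hard to budget:

# Hypothetical pay-per-use cost illustration; the rates below are made up.
RATE_PER_HOUR = 0.10          # cost of one instance-hour
steady_state = 2 * 24 * 30    # 2 instances running all month
spike = 50 * 24 * 3           # 50 instances for a 3-day traffic spike

print("Normal month:", steady_state * RATE_PER_HOUR)                # 144.0
print("Month with spike:", (steady_state + spike) * RATE_PER_HOUR)  # 504.0

A three-day spike more than triples the monthly bill in this toy calculation, which is the kind of variation the paragraph above is describing.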
Cloud computing provides various advantages, such as improved collaboration, excellent accessibility, mobility, storage capacity, etc. But there are also security risks in cloud computing.

Some of the most common security risks of cloud computing are given below:

Data Loss

Data loss is the most common security risk of cloud computing. It is also known as data leakage. Data loss is the process in which data is deleted, corrupted, or made unreadable by a user, software, or application. In a cloud computing environment, data loss occurs when our sensitive data is in somebody else's hands, when one or more data elements cannot be utilized by the data owner, when a hard disk is not working properly, or when software is not updated.

Hacked Interfaces and Insecure APIs

As we all know, cloud computing depends completely on the Internet, so it is compulsory to protect the interfaces and APIs that are used by external users. APIs are the easiest way to communicate with most cloud services. In cloud computing, a few services are available in the public domain. These services can be accessed by third parties, so there is a chance that they can easily be harmed or hacked by attackers.

Data Breach

A data breach is the process in which confidential data is viewed, accessed, or stolen by a third party without any authorization, so the organization's data is compromised by attackers.

Vendor lock-in

Vendor lock-in is one of the biggest security risks in cloud computing. Organizations may face problems when transferring their services from one vendor to another. Because different vendors provide different platforms, it can be difficult to move from one cloud to another.

Increased complexity strains IT staff

Migrating, integrating, and operating cloud services is complex for the IT staff. IT staff require extra capability and skills to manage, integrate, and maintain data in the cloud.

Spectre & Meltdown

Spectre and Meltdown allow programs to view and steal data that is currently being processed on a computer. They can affect personal computers, mobile devices, and the cloud, exposing passwords and personal information such as images, emails, and business documents held in the memory of other running programs.
Denial of Service (DoS) attacks

Denial of service (DoS) attacks occur when the system receives more traffic than the server can buffer. DoS attackers mostly target the web servers of large organizations such as banks, media companies, and government organizations. Recovering from such attacks costs a great deal of time and money.

Account hijacking

Account hijacking is a serious security risk in cloud computing. It is the process in which an individual user's or organization's cloud account (bank account, e-mail account, or social media account) is stolen by hackers. The hackers use the stolen account to perform unauthorized activities.

Major challenges for cloud computing: Cloud computing is the provisioning of resources like data and storage on demand, in real time. It has proven to be revolutionary in the IT industry, with its market valuation growing at a rapid rate. Cloud development has proved beneficial not only for huge public and private enterprises but for small-scale businesses as well, as it helps to cut costs. It is estimated that more than 94% of businesses will increase their spending on the cloud by more than 45%. This has also resulted in more high-paying jobs for cloud developers.

Cloud technology was flourishing before the pandemic, but there was a sudden spike in cloud deployment and usage during the lockdown. The tremendous growth can be linked to the fact that classes shifted online, virtual office meetings happen on video calling platforms, conferences take place virtually, and on-demand streaming apps have a huge audience. All of this is made possible by the use of cloud computing. It is safe to conclude that the cloud is an important part of our lives today, whether we are an enterprise, a student, a developer, or anyone else, and that we are heavily dependent on it. But with this dependence, it is also important to look at the issues and challenges that arise with cloud computing. The most common challenges faced when dealing with cloud computing are discussed below, one by one:

1. Data Security and Privacy

Data security is a major concern when switching to cloud computing. User or organizational
data stored in the cloud is critical and private. Even if the cloud service provider assures data
integrity, it is your responsibility to carry out user authentication and authorization, identity
management, data encryption, and access control. Security issues on the cloud include
identity theft, data breaches, malware infections, and a lot more which eventually decrease
the trust amongst the users of your applications. This can in turn lead to potential loss in
revenue alongside reputation and stature. Also, dealing with cloud computing requires
sending and receiving huge amounts of data at high speed, and therefore is susceptible to data
leaks.

2. Cost Management

Even though almost all cloud service providers have a “Pay As You Go” model, which reduces the overall cost of the resources being used, there are times when huge costs are incurred by the enterprise using cloud computing. When resources are under-optimized, say when servers are not being used to their full potential, hidden costs add up. Degraded application performance or sudden spikes or overages in usage also add to the overall cost. Unused resources are another main reason why costs go up: if you turn on a service or a cloud instance and forget to turn it off during the weekend or when it is not needed, it will increase the cost without the resources even being used.

3. Multi-Cloud Environments

Due to an increase in the options available to companies, enterprises no longer rely on a single cloud but depend on multiple cloud service providers. Most of these companies use hybrid cloud tactics, and close to 84% are dependent on multiple clouds. This often ends up being difficult for the infrastructure team to manage. The process frequently becomes highly complex for the IT team due to the differences between multiple cloud providers.

4. Performance Challenges

Performance is an important factor when considering cloud-based solutions. If the performance of the cloud is not satisfactory, it can drive away users and decrease profits. Even a little latency while loading an app or a web page can result in a huge drop in the percentage of users. This latency can be a product of inefficient load balancing, which means that the server cannot efficiently split the incoming traffic so as to provide the best user experience. Challenges also arise in the case of fault tolerance, which means that operations continue as required even when one or more of the components fail.
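As a rough illustration of the load-balancing idea mentioned above, here is a minimal round-robin dispatcher in Python that skips servers marked unhealthy. The server names are hypothetical, and a production balancer would do much more (health probes, weighting, session affinity).

# Minimal round-robin load balancer sketch with a health check.
from itertools import cycle

SERVERS = ["app-server-1", "app-server-2", "app-server-3"]
HEALTHY = {"app-server-1": True, "app-server-2": False, "app-server-3": True}

_ring = cycle(SERVERS)

def pick_server():
    # Walk the ring until a healthy server is found (at most one full loop).
    for _ in range(len(SERVERS)):
        candidate = next(_ring)
        if HEALTHY.get(candidate, False):
            return candidate
    raise RuntimeError("no healthy servers available")

for request_id in range(5):
    print(f"request {request_id} -> {pick_server()}")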

5. Interoperability and Flexibility

When an organization uses a specific cloud service provider and wants to switch to another cloud-based solution, it often turns out to be a tedious procedure, since applications written for one cloud with its application stack have to be re-written for the other cloud. There is a lack of flexibility in switching from one cloud to another due to the complexities involved. Handling data movement and setting up security and networking from scratch also add to the issues encountered when changing cloud solutions, thereby reducing flexibility.

6. High Dependence on Network

Since cloud computing deals with provisioning resources in real time, it involves enormous amounts of data transfer to and from the servers. This is only made possible by the availability of a high-speed network. Because data and resources are exchanged over the network, the system is highly vulnerable to limited bandwidth or sudden outages. Even when enterprises can cut their hardware costs, they need to ensure that the internet bandwidth is high and that there are zero network outages, or else it can result in a potential business loss. This is therefore a major challenge for smaller enterprises, which have to maintain high network bandwidth at a high cost.
7. Lack of Knowledge and Expertise

Due to its complex nature and the high demand for research, working with the cloud often ends up being a highly tedious task. It requires immense knowledge and wide expertise on the subject. Although there are a lot of professionals in the field, they need to constantly update themselves. Cloud computing is a highly paid field due to the extensive gap between demand and supply: there are a lot of vacancies but very few talented cloud engineers, developers, and professionals. Therefore, there is a need for upskilling so that these professionals can actively understand, manage and develop cloud-based applications with minimum issues and maximum reliability.

We have therefore discussed the most common cloud issues and challenges faced by cloud engineers all over the world.

There are mainly two computation types, including parallel computing and distributed
computing. A computer system may perform tasks according to human instructions. A single
processor executes only one task in the computer system, which is not an effective way.
Parallel computing solves this problem by allowing numerous processors to accomplish tasks
simultaneously. Modern computers support parallel processing to improve system
performance. In contrast, distributed computing enables several computers to communicate
with one another and achieve a goal. All of these computers communicate and collaborate
over the network. Distributed computing is commonly used by organizations such as
Facebook and Google that allow people to share resources.

In this article, you will learn about the difference between Parallel Computing and
Distributed Computing. But before discussing the differences, you must know about parallel
computing and distributed computing.

What is Parallel Computing?

It is also known as parallel processing. It utilizes several processors, each of which completes the tasks allocated to it. In other words, parallel computing involves performing numerous tasks simultaneously. Either a shared memory or a distributed memory system can be used for parallel computing. In shared memory systems, all CPUs share a single memory; in distributed memory systems, each processor has its own memory and processors exchange data explicitly.

Parallel computing provides numerous advantages. It helps to increase CPU utilization and improve performance because several processors work simultaneously. Moreover, the failure of one CPU has no impact on the other CPUs' functionality. However, if one processor needs data or instructions from another, this communication can introduce latency.
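A minimal sketch of parallel computing in Python, using the standard library's multiprocessing pool to spread independent tasks across several processors (the work function is just an illustrative placeholder):

# Parallel computing sketch: several worker processes compute results at once.
from multiprocessing import Pool

def square(n):
    # Placeholder for a CPU-bound task.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:          # four workers run simultaneously
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]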

Advantages and Disadvantages of Parallel Computing

There are various advantages and disadvantages of parallel computing. Some of the
advantages and disadvantages are as follows:
Advantages

1. It saves time and money because many resources working together cut down on time and costs.
2. Larger problems that are difficult to solve with serial computing can be tackled.
3. You can do many things at once using many computing resources.
4. Parallel computing is much better than serial computing for modeling, simulating, and comprehending complicated real-world events.

Disadvantages

1. Multi-core architectures consume a lot of power.
2. Parallel solutions are more difficult to implement, debug, and prove correct due to the complexity of communication and coordination, and they can perform worse than their serial equivalents when that overhead dominates.

What is Distributed Computing?

It comprises several software components that reside on different systems but operate as a
single system. A distributed system's computers can be physically close together and linked
by a local network or geographically distant and linked by a wide area network (WAN). A
distributed system can be made up of any number of different configurations, such as
mainframes, PCs, workstations, and minicomputers. The main aim of distributed computing
is to make a network work as a single computer.

There are various benefits of using distributed computing. It enables scalability and makes it
simpler to share resources. It also aids in the efficiency of computation processes.
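A minimal sketch of distributed computing using Python's standard-library XML-RPC modules: one machine exposes a function over the network and another machine calls it as if it were local. The host and port values are hypothetical placeholders.

# Distributed computing sketch: a service on one machine, a caller on another.
# Run the server part on one host and the client part on another (or on the
# same host for testing). Host/port values are placeholders.

# --- server side ---
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(add, "add")
# server.serve_forever()   # uncomment to actually run the service

# --- client side ---
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://192.168.1.10:8000/")
# print(proxy.add(2, 3))   # would print 5 once the server is reachable

The two machines share no memory; they cooperate only by exchanging messages over the network, which is the defining property of distributed computing described above.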

Advantages and Disadvantages of Distributed Computing

There are various advantages and disadvantages of distributed computing. Some of the
advantages and disadvantages are as follows:

Advantages

1. It is flexible, making it simple to install, use, and debug new services.


2. In distributed computing, you may add multiple machines as required.
3. If the system crashes on one server, that doesn't affect other servers.
4. A distributed computer system may combine the computational capacity of several
computers, making it faster than traditional systems.

Disadvantages

1. Data security and sharing are the main issues in distributed systems due to the features of open systems.
2. Because of the distribution across multiple servers, troubleshooting and diagnostics
are more challenging.
3. The main disadvantage of distributed computer systems is the lack of software
support.
Key differences between the Parallel Computing and Distributed Computing

Here, you will learn the various key differences between parallel computing and distributed computing. Some of the key differences are as follows:

1. Parallel computing is a sort of computation in which various tasks or processes are run at the same time. In contrast, distributed computing is that type of computing in which the components are located on various networked systems that interact and coordinate their actions by passing messages to one another.
2. In parallel computing, processors communicate with another processor via a bus. On
the other hand, computer systems in distributed computing connect with one another
via a network.
3. Parallel computing takes place on a single computer. In contrast, distributed
computing takes place on several computers.
4. Parallel computing aids in improving system performance. On the other hand,
distributed computing allows for scalability, resource sharing, and the efficient
completion of computation tasks.
5. The computer in parallel computing can have shared or distributed memory. In contrast, every system in distributed computing has its own memory.
6. Multiple processors execute multiple tasks simultaneously in parallel computing. In
contrast, many computer systems execute tasks simultaneously in distributed
computing.

Head-to-head Comparison between the Parallel Computing and Distributed Computing

Definition: Parallel computing is a type of computation in which various processes run simultaneously. Distributed computing is a type of computing in which the components are located on various networked systems that interact and coordinate their actions by passing messages to one another.

Communication: In parallel computing, the processors communicate with one another via a bus. In distributed computing, the computer systems connect with one another via a network.

Functionality: In parallel computing, several processors execute various tasks simultaneously. In distributed computing, several computers execute tasks simultaneously.

Number of computers: Parallel computing occurs in a single computer system. Distributed computing involves various computers.

Memory: In parallel computing, the system may have distributed or shared memory. In distributed computing, each computer system has its own memory.

Usage: Parallel computing helps to improve system performance. Distributed computing allows for scalability, resource sharing, and the efficient completion of computation tasks.

Conclusion

There are two types of computations: parallel computing and distributed computing. Parallel
computing allows several processors to accomplish their tasks at the same time. In contrast,
distributed computing splits a single task among numerous systems to achieve a common
goal.

Logical Clock in Distributed System


Logical Clocks refer to implementing a protocol on all machines within your distributed
system, so that the machines are able to maintain consistent ordering of events within some
virtual timespan. A logical clock is a mechanism for capturing chronological and causal
relationships in a distributed system. Distributed systems may have no physically
synchronous global clock, so a logical clock allows global ordering on events from
different processes in such systems.
Example:
When we go out, we make a plan of which place to visit first, second, and so on. We do not visit the second place before the first; we always follow the order that was planned beforehand. In a similar way, the operations in a distributed system should be carried out in an organized order.
Suppose we have more than 10 PCs in a distributed system and every PC is doing its own work; how do we make them agree on a common ordering of events? The solution to this is the LOGICAL CLOCK.

Method-1:
One approach to ordering events across processes is to try to synchronize the physical clocks.
This means that if one PC shows 2:00 pm, then every PC should show the same time, which is practically impossible; not every clock can be kept in sync. So we cannot rely on this method.
Method-2:
Another approach is to assign timestamps to events.
Continuing the example, this means assigning the first place the number 1, the second place 2, the third place 3, and so on. Then we always know that the first place comes first, and so on. Similarly, if we give each PC its individual number, the work is organized so that the 1st PC completes its process first, then the second, and so on.
BUT timestamps will only work as long as they obey causality.
What is causality?
Causality is fully based on the HAPPENED-BEFORE RELATIONSHIP.
 Taking a single PC, if two events A and B occur one after the other, then TS(A) < TS(B). If A has a timestamp of 1, then B should have a timestamp greater than 1; only then does the happened-before relationship hold.
 Taking two PCs, with event A in P1 (PC 1) and event B in P2 (PC 2), the condition is again TS(A) < TS(B). For example, suppose you send a message to someone at 2:00:00 pm and the other person receives it at 2:00:02 pm. Then it is obvious that TS(sender) < TS(receiver).
Properties Derived from the Happened-Before Relationship –
 Transitive Relation –
If TS(A) < TS(B) and TS(B) < TS(C), then TS(A) < TS(C).
 Causally Ordered Relation –
a -> b means that a occurs before b, and any change in a will be reflected in b.
 Concurrent Event –
Not every pair of events is ordered; some events happen simultaneously, i.e., A || B.
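A minimal sketch of a Lamport logical clock in Python, showing the standard update rules (increment on a local event or send, and take max(local, received) + 1 on receive); the two processes in the demo are arbitrary.

# Lamport logical clock sketch: timestamps that respect happened-before.
class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        # A send is also a local event; the message carries the timestamp.
        self.time += 1
        return self.time

    def receive(self, msg_time):
        # Rule: new time = max(local, received) + 1, so TS(send) < TS(receive).
        self.time = max(self.time, msg_time) + 1
        return self.time

p1, p2 = LamportClock(), LamportClock()
ts_send = p1.send()            # P1 sends a message at logical time 1
p2.local_event()               # P2 does some unrelated work (time 1)
ts_recv = p2.receive(ts_send)  # P2 receives: max(1, 1) + 1 = 2
print(ts_send, ts_recv)        # 1 2  -> TS(sender) < TS(receiver)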
Message Passing Model of Process Communication
Message passing means how a message can be sent from one end to the other. It may follow a client-server model, or messages may flow from one node to another node. The formal model for distributed message passing has two timing models: one is synchronous and the other is asynchronous.
The fundamental points of message passing are:
1. In message-passing systems, processors communicate with one another by sending and
receiving messages over a communication channel. So how the arrangement should be
done?
2. The pattern of the connection provided by the channel is described by some topology
systems.
3. The collection of the channels is called a network.
4. By the definition of distributed systems, we know that they are geographically distributed sets of computers, so it is not always possible for one computer to connect directly with every other node.
5. So all channels in the Message-Passing Model are private.
6. The sender decides what data has to be sent over the network. An example is, making a
phone call.
7. The data is only fully communicated after the destination worker decides to receive the
data. Example when another person receives your call and starts to reply to you.
8. There is no time barrier. It is in the hands of the receiver after how many rings to answer your call. He can make you wait forever by not picking up the call.
9. For successful network communication, it needs active participation from both sides.

Message Passing Model

Algorithm:
1. Consider a network consisting of n nodes named p0, p1, p2, ..., pn-1, connected by bidirectional point-to-point channels.
2. Each node might not know who is at the other end of a channel; this is how the topology is arranged.
3. Only when communication is established and message passing starts do the processes know from where to where a message has to be sent.
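A minimal sketch of the message-passing model using Python's multiprocessing module: two processes with no shared memory communicate only by sending and receiving messages over a private channel (a pipe). Communication succeeds only when the sender sends and the receiver actively receives, mirroring points 6-9 above.

# Message-passing sketch: two processes exchange data over a private channel.
from multiprocessing import Process, Pipe

def worker(conn):
    # Receiver side: nothing happens until this process decides to recv().
    msg = conn.recv()
    conn.send(f"ack: {msg}")
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()   # a bidirectional point-to-point channel
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send("hello over the channel")   # sender decides what to send
    print(parent_end.recv())                    # "ack: hello over the channel"
    p.join()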
Advantages of Message Passing Model :
1. Easier to implement.
2. Quite tolerant of high communication latencies.
3. Easier to build massively parallel hardware.
4. Message passing libraries are faster and give high performance.
Disadvantages of Message Passing Model :
1. Programmer has to do everything.
2. Connection setup takes time, which makes it slower.
3. Data transfer usually requires cooperative operations, which can be difficult to achieve.
4. It is difficult for programmers to develop portable applications using this model, because message-passing implementations commonly consist of a library of subroutines embedded in source code. Here again, the programmer has to do everything on his own.
UNIT II:
Cloud Infrastructure: At Amazon, the Google perspective, Microsoft Windows Azure, open source software platforms, cloud storage diversity, Intercloud, energy use and ecological impact, responsibility sharing, user experience, software licensing. Cloud Computing: Applications and Paradigms: Challenges for the cloud, existing cloud applications and new opportunities, architectural styles, workflows, ZooKeeper, the MapReduce programming model, HPC on the cloud, biological research.

Cloud Computing architecture comprises many loosely coupled cloud components. We can broadly divide the cloud architecture into two parts:

 Front End
 Back End

Each end is connected to the other through a network, usually the Internet.

Front End

The front end refers to the client part of the cloud computing system. It consists of the interfaces and applications that are required to access cloud computing platforms, for example, a web browser.

Back End

The back end refers to the cloud itself. It consists of all the resources required to provide cloud computing services. It comprises huge data storage, virtual machines, security mechanisms, services, deployment models, servers, etc.
Note

 It is the responsibility of the back end to provide built-in security mechanism, traffic
control and protocols.
 The server employs certain protocols known as middleware, which help the connected
devices to communicate with each other.

Cloud infrastructure consists of servers, storage devices, network, cloud management software, deployment software, and platform virtualization.

Hypervisor

A hypervisor is firmware or a low-level program that acts as a Virtual Machine Manager. It allows a single physical instance of cloud resources to be shared between several tenants.

Management Software

It helps to maintain and configure the infrastructure.

Deployment Software

It helps to deploy and integrate the application on the cloud.

Network

It is the key component of cloud infrastructure. It allows cloud services to be connected over the Internet. It is also possible to deliver the network as a utility over the Internet, meaning the customer can customize the network route and protocol.

Server

The server helps to compute resource sharing and offers other services such as resource allocation and de-allocation, monitoring of resources, providing security, etc.

Storage

The cloud keeps multiple replicas of stored data. If one of the storage resources fails, the data can be retrieved from another replica, which makes cloud computing more reliable.
Infrastructural Constraints

The fundamental constraints that a cloud infrastructure should address are the following:

Transparency

Virtualization is the key to sharing resources in a cloud environment, but it is not possible to satisfy demand with a single resource or server. Therefore, there must be transparency in resources, load balancing, and applications, so that we can scale them on demand.

Scalability

Scaling up an application delivery solution is not as easy as scaling up an application, because it involves configuration overhead or even re-architecting the network. So the application delivery solution needs to be scalable, which requires a virtual infrastructure in which resources can be provisioned and de-provisioned easily.

Intelligent Monitoring

To achieve transparency and scalability, application solution delivery will need to be capable
of intelligent monitoring.

Security

The mega data centre in the cloud should be securely architected. Also, the control node, an entry point into the mega data centre, needs to be secure.
AWS

AWS stands for Amazon Web Services which uses distributed IT infrastructure to provide
different IT resources on demand.

This unit covers topics such as the introduction and history of AWS, its global infrastructure and features, IAM, storage services, database services, etc.

What is AWS?

 AWS stands for Amazon Web Services.
 The AWS service is provided by Amazon, using distributed IT infrastructure to provide different IT resources on demand. It provides different services such as infrastructure as a service (IaaS), platform as a service (PaaS), and packaged software as a service (SaaS).
 Amazon launched AWS, a cloud computing platform, to allow different organizations to take advantage of reliable IT infrastructure.

Uses of AWS

 A small manufacturing organization can use its expertise to expand its business by leaving its IT management to AWS.
 A large enterprise spread across the globe can use AWS to deliver training to its distributed workforce.
 An architecture consulting company can use AWS for high-compute rendering of construction prototypes.
 A media company can use AWS to deliver different types of content, such as e-books or audio files, to users worldwide.

Pay-As-You-Go

AWS provides its services to customers on a Pay-As-You-Go basis.

AWS provides services to customers when required, without any prior commitment or upfront investment. Pay-As-You-Go enables customers to procure from AWS services such as:

 Computing
 Programming models
 Database storage
 Networking
Advantages of AWS

1) Flexibility

 We can get more time for core business tasks due to the instant availability of new features and services in AWS.
 It provides effortless hosting of legacy applications. AWS does not require learning new technologies, and migrating applications to AWS provides advanced computing and efficient storage.
 AWS also offers the choice of whether to run our applications and services together or not. We can also choose to run part of the IT infrastructure in AWS and the remaining part in our own data centres.

2) Cost-effectiveness

AWS requires no upfront investment, long-term commitment, and minimum expense when
compared to traditional IT infrastructure that requires a huge investment.

3) Scalability/Elasticity

Through AWS auto scaling and elastic load balancing, resources are automatically scaled up or down as demand increases or decreases. AWS techniques are ideal for handling unpredictable or very high loads. For this reason, organizations enjoy the benefits of reduced cost and increased user satisfaction.

4) Security

 AWS provides end-to-end security and privacy to customers.
 AWS has a virtual infrastructure that offers optimum availability while maintaining full privacy and isolation of operations.
 Customers can expect a high level of physical security because of Amazon's many years of experience in designing, developing and maintaining large-scale IT operation centers.
 AWS ensures the three aspects of security, i.e., confidentiality, integrity, and availability of users' data.

History of AWS

 2003: In 2003, Chris Pinkham and Benjamin Black presented a paper on how
Amazon's own internal infrastructure should look like. They suggested to sell it as a
service and prepared a business case on it. They prepared a six-page document and
had a look over it to proceed with it or not. They decided to proceed with the
documentation.
 2004: SQS ("Simple Queue Service") was officially launched in 2004. A team in Cape Town, South Africa, launched this service.
 2006: AWS (Amazon Web Services) was officially launched.
 2007: In 2007, over 180,000 developers had signed up for the AWS.
 2010: In 2010, amazon.com retail web services were moved to the AWS, i.e.,
amazon.com is now running on AWS.
 2011: AWS suffered from some major problems. Some EBS (Elastic Block Store) volumes became stuck and were unable to serve read and write requests. It took two days for the problem to get resolved.
 2012: AWS hosted its first customer event, the re:Invent conference, at which new products were launched. In the same year, another major problem occurred in AWS that affected many popular sites such as Pinterest, Reddit, and Foursquare.
 2013: In 2013, certifications were launched. AWS started a certifications program for
software engineers who had expertise in cloud computing.
 2014: AWS committed to achieve 100% renewable energy usage for its global
footprint.
 2015: AWS revenue reached $6 billion USD per annum, growing 90% every year.
 2016: By 2016, revenue doubled and reached $13 billion USD per annum.
 2017: At re:Invent 2017, AWS released a host of artificial intelligence services, and AWS revenue doubled again, reaching $27 billion USD per annum.
 2018: In 2018, AWS launched a Machine Learning Specialty certification, focused heavily on automating artificial intelligence and machine learning.

Features of AWS

The following are the features of AWS:

 Flexibility
 Cost-effective
 Scalable and elastic
 Secure
 Experienced

1) Flexibility

 The difference between AWS and traditional IT models is flexibility.
 Traditional models used to deliver IT solutions require large investments in new architecture, programming languages, and operating systems. Although these investments are valuable, it takes time to adopt new technologies, which can also slow down your business.
 The flexibility of AWS allows us to choose which programming models, languages, and operating systems are better suited for our project, so we do not have to learn new skills to adopt new technologies.
 Flexibility means that migrating legacy applications to the cloud is easy and cost-effective. Instead of re-writing applications to adopt new technologies, you just need to move the applications to the cloud and tap into advanced computing capabilities.
 Building applications in AWS is like building applications using existing hardware resources.
 Larger organizations run in a hybrid mode, i.e., some pieces of the application run in their data center, and other portions of the application run in the cloud.
 The flexibility of AWS is a great asset for organizations, allowing them to deliver products with up-to-date technology on time and enhancing overall productivity.

2) Cost-effective

 Cost is one of the most important factors that need to be considered in delivering IT solutions.
 For example, developing and deploying an application can incur a low cost, but after successful deployment, there is a need for hardware and bandwidth. Owning our own infrastructure can incur considerable costs, such as power, cooling, real estate, and staff.
 The cloud provides on-demand IT infrastructure that lets you consume only the resources you actually need. In AWS, you are not limited to a set amount of resources such as storage, bandwidth or computing, as it is very difficult to predict the requirements of every resource. Therefore, we can say that the cloud provides flexibility by maintaining the right balance of resources.
 AWS requires no upfront investment, long-term commitment, or minimum spend.
 You can scale up or scale down as the demand for resources increases or decreases respectively.
 AWS allows you to access resources almost instantly. The ability to respond to changes quickly, whether they are large or small, means that we can take on new opportunities to meet business challenges that could increase revenue and reduce cost.

3) Scalable and elastic

 In a traditional IT organization, scalability and elasticity were calculated in terms of investment and infrastructure, while in the cloud, scalability and elasticity provide savings and improved ROI (Return On Investment).
 Scalability in AWS is the ability to scale computing resources up or down as demand increases or decreases.
 Elastic load balancing in AWS distributes incoming application traffic across multiple targets such as Amazon EC2 instances, containers, IP addresses, and Lambda functions.
 Elastic load balancing and auto scaling automatically scale your AWS computing resources to meet unexpected demand and scale down automatically when demand decreases.
 The AWS cloud is also useful for implementing short-term jobs, mission-critical jobs, and jobs repeated at regular intervals.
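The scaling decision itself can be pictured with a tiny rule-of-thumb sketch. This is illustrative pseudologic in Python, not the actual AWS Auto Scaling API; the thresholds and capacity limits are made-up numbers.

# Illustrative autoscaling decision logic; thresholds are hypothetical.
MIN_INSTANCES, MAX_INSTANCES = 2, 20

def desired_capacity(current, avg_cpu_percent):
    if avg_cpu_percent > 70 and current < MAX_INSTANCES:
        return current + 1          # scale out under load
    if avg_cpu_percent < 30 and current > MIN_INSTANCES:
        return current - 1          # scale in when idle
    return current                  # otherwise keep the fleet as it is

print(desired_capacity(4, 85))   # 5
print(desired_capacity(4, 20))   # 3
print(desired_capacity(2, 20))   # 2 (never below the minimum)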

4) Secure
 AWS provides a scalable cloud-computing platform that provides customers with
end-to-end security and end-to-end privacy.
 AWS incorporates the security into its services, and documents to describe how to use
the security features.
 AWS maintains the confidentiality, integrity, and availability of your data, which is of the utmost importance to AWS.

Physical security: Amazon has many years of experience in designing, constructing, and operating large-scale data centres. The AWS infrastructure is housed in AWS-controlled data centres throughout the world. The data centres are physically secured to prevent unauthorized access.

Secure services: Each service provided by the AWS cloud is secure.

Data privacy: A personal and business data can be encrypted to maintain data privacy.

5) Experienced

 The AWS cloud provides high levels of scale, security, reliability, and privacy.
 AWS has built an infrastructure based on lessons learned from over sixteen years of experience managing the multi-billion dollar Amazon.com business.
 Amazon continues to benefit its customers by enhancing its infrastructure capabilities.
 Nowadays, Amazon has become a global web platform that serves millions of customers, and AWS has evolved since 2006 to serve hundreds of thousands of customers worldwide.

AWS Global Infrastructure

 AWS is a cloud computing platform which is globally available.
 The global infrastructure is the set of regions around the world in which AWS operates, offering a bundle of high-level IT services.
 As of December 2018, AWS was available in 19 regions and 57 availability zones, with 5 more regions and 15 more availability zones announced for 2019.

The following are the components that make up the AWS infrastructure:

 Availability Zones
 Region
 Edge locations
 Regional Edge Caches
Availability zone as a Data Center

 An availability zone is a facility that can be somewhere in a country or in a city. Inside this facility, i.e., the data centre, we can have multiple servers, switches, load balancers, and firewalls. The things which interact with the cloud sit inside the data centers.
 An availability zone can consist of several data centers, but if they are close together, they are counted as one availability zone.

Region

 A region is a geographical area. Each region consists of two or more availability zones.
 A region is a collection of data centers which is completely isolated from other regions.
 The availability zones within a region are connected to each other through low-latency links.
 Availability zones are connected through redundant and isolated metro fibers (the regions and zones visible to your account can be listed from the CLI, as sketched below).
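The sketch below assumes the AWS CLI is already configured with credentials and uses us-east-1 purely as an example region.

    # List all regions currently enabled for the account.
    aws ec2 describe-regions --query "Regions[].RegionName"

    # List the availability zones inside one example region (us-east-1 here).
    aws ec2 describe-availability-zones --region us-east-1 \
        --query "AvailabilityZones[].ZoneName"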

Edge Locations

 Edge locations are the endpoints for AWS that are used for caching content.
 Edge locations are used by CloudFront, Amazon's Content Delivery Network (CDN).
 There are more edge locations than regions; currently, there are over 150 edge locations.
 An edge location is not a region but a small site that AWS operates, used for caching content.
 Edge locations are mainly located in major cities to distribute content to end users with reduced latency.
 For example, if a user accesses your website from Singapore, the request is redirected to the edge location closest to Singapore, where the cached data can be read.

Regional Edge Cache

 AWS announced a new type of edge location in November 2016, known as a Regional Edge Cache.
 A Regional Edge Cache lies between the CloudFront origin servers and the edge locations.
 A Regional Edge Cache has a larger cache than an individual edge location.
 When data expires, it is removed from the cache at the edge location, while it is retained at the Regional Edge Cache.
 When a user then requests that data and it is no longer available at the edge location, the edge location retrieves the cached data from the Regional Edge Cache instead of the origin servers, which have higher latency.

AWS Free Tier

How to SignUp to the AWS platform

 Firstly visit the website https://aws.amazon.com.


 The following screen appears after opening the website; then click on Complete Sign Up to create an account and fill in the required details.

 The following screen appears after clicking on the "Complete Sign Up" button. If you are an existing user of an AWS account, enter the email address of your AWS account; otherwise, click on "Create an AWS account".
 On clicking the "Create an AWS account" button, the following screen appears, which requires some fields to be filled in by the user.

 Now, fill your contact information.

 After providing the contact information, fill your payment information.

 After providing your payment information, confirm your identity by entering your
phone number and security check code, and then click on the "Contact me" button.

 AWS will contact you to verify whether the provided contact number is correct or not.

 When the number is verified, the following message appears on the screen.

 The final step is the confirmation step. Click on the link to log in again; it redirects
you to the "Management Console".

AWS Account Identifiers


AWS assigns two types of unique ID to each user's account:

 An AWS account ID
 A canonical user ID

AWS account ID

The AWS account ID is a 12-digit number, such as 123456780123, which can be used to construct Amazon Resource Names (ARNs). When we refer to resources such as an IAM user, the AWS account ID distinguishes those resources from resources in other AWS accounts.

Finding the AWS account ID

We can find the AWS account ID from AWS Management Console. The following steps are
taken to view your account ID:

 Log in to the AWS account by entering your email address and password, and then you will be taken to the Management Console.

Now, click on the account name; a dropdown menu appears.

Click on "My Account" in the dropdown menu of the account name to view your account ID (a CLI alternative is sketched below).

Canonical User ID

 A canonical user ID is a 64-character, hexadecimal-encoded 256-bit number.
 A canonical user ID is used in an Amazon S3 bucket policy for cross-account access, which means that one AWS account can access resources in another AWS account. For example, if you want another AWS account to access your bucket, you need to specify that account's canonical user ID in your bucket's policy.

Finding the canonical user ID

 Firstly, visit the website https://aws.amazon.com, and log in to the aws account by
entering your email address and password.
 From the right side of the management console, click on the account name.
 Click on the "My Security Credentials" from the dropdown menu of the account
name. The screen appears which is shown below:
 Click on the Account identifiers to view the Canonical user ID.
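The canonical user ID can also be fetched from the AWS CLI, because S3 reports it as the owner of your buckets. This is a hedged sketch assuming configured credentials; the principal snippet in the comment only illustrates where the ID would go in a bucket policy.

    # The "Owner.ID" field returned by list-buckets is the canonical user ID
    # of the account that owns the buckets.
    aws s3api list-buckets --query "Owner.ID" --output text

    # The returned 64-character ID can then be referenced as a principal, e.g.
    #   "Principal": { "CanonicalUser": "<canonical-user-id>" }
    # inside an S3 bucket policy that grants another account access to the bucket.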
IAM Identities

IAM identities are created to provide authentication for people and processes in your AWS account.

IAM identities are categorized as given below:

 IAM Users
 IAM Groups
 IAM Roles

AWS Account Root User

 When you first create an AWS account, you create an account as a root user identity
which is used to sign in to AWS.
 You can sign in to the AWS Management Console by entering your email address and password. The combination of email address and password is known as the root user credentials.
 When you sign in to the AWS account as the root user, you have unrestricted access to all the resources in the AWS account.
 The root user can also access the billing information as well as change the password.

What is a Role?

 A role is a set of permissions that grant access to actions and resources in AWS.
These permissions are attached to the role, not to an IAM User or a group.
 An IAM User can use a role in the same AWS account or a different account.
 A role is similar to an IAM User; a role is also an AWS identity with permission policies that determine what the identity can and cannot do in AWS.
 A role is not uniquely associated with a single person; it can be used by anyone who needs it.
 A role does not have long-term security credentials, i.e., a password or access keys. Instead, when a user assumes a role, temporary security credentials are created and provided to the user.
 You can use the roles to delegate access to users, applications or services that
generally do not have access to your AWS resources.

Situations in which "IAM Roles" can be used:

 Sometimes you want to grant users access to the AWS resources in your own AWS account.
 Sometimes you want to grant users access to the AWS resources in another AWS account.
 It also allows a mobile app to access AWS resources without storing access keys in the app.
 It can be used to grant access to AWS resources to users who have identities outside of AWS.
 It can also be used to grant access to AWS resources to a third party so that they can perform an audit on the AWS resources.

Following are the important terms associated with the "IAM Roles":

 Delegation: Delegation is the process of granting permissions to a user to allow access to the AWS resources that you control. Delegation sets up a trust between a trusting account (the account that owns the resource) and a trusted account (the account that contains the users that need to access the resource).
The trusting and trusted accounts can be of three types:
o The same account
o Two different accounts under the same organization's control
o Two different accounts owned by different organizations

To delegate permission to access the resources, an IAM role is created in the trusting account with the two policies attached.

Permission Policy: It grants the user of the role the permissions needed to carry out the intended tasks.

Trust Policy: It specifies which trusted account members are allowed to use the role.

 Federation: Federation is the process of creating a trust relationship between an external identity provider and AWS. For example, Facebook allows users to log in to different websites by using their Facebook accounts.
 Trust policy: A document written in JSON format that defines who is allowed to use the role. This document is written based on the rules of the IAM policy language (see the sketch after this list).
 Permissions policy: A document written in JSON format that defines the actions and resources that the role can use. This document is also based on the rules of the IAM policy language.
 Permissions boundary: It is an advanced feature of AWS in which you can limit the
maximum permissions that the role can have. The permission boundaries can be
applied to IAM User or IAM role but cannot be applied to the service-linked role.
 Principal: A principal can be the AWS account root user, an IAM User, or a role. The permissions can be granted in one of two ways:
o Attach a permission policy to a role.
o For services that support resource-based policies, identify the principal in the Principal element of the policy attached to the resource.
 Cross-account access: Roles vs Resource-Based Policies: Granting a trusted principal in one account access to the resources in another account is known as cross-account access. Some services allow you to attach the policy directly to the resource, which is known as a resource-based policy. The services that support resource-based policies include Amazon S3 buckets, Amazon SNS topics, and Amazon SQS queues.
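As a minimal sketch of what the trust policy and permissions policy can look like, the example below supplies a hypothetical trust policy (allowing a hypothetical trusted account 111122223333 to assume the role) and a hypothetical permissions policy (read-only access to an example S3 bucket) directly on the command line; every name and ID here is an assumption chosen only for illustration.

    # Trust policy: who is allowed to assume the role.
    aws iam create-role --role-name example-delegated-role \
        --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
            "Action": "sts:AssumeRole"
          }]
        }'

    # Permissions policy: what the role is allowed to do.
    aws iam put-role-policy --role-name example-delegated-role \
        --policy-name example-read-bucket \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": ["arn:aws:s3:::example-bucket", "arn:aws:s3:::example-bucket/*"]
          }]
        }'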

IAM Roles Use Cases

There are two ways to use the roles:

 IAM Console: When IAM Users work in the IAM Console and want to use a role, they access the permissions of the role temporarily. IAM Users give up their original permissions and take on the permissions of the role. When the IAM User exits the role, the original permissions are restored.
 Programmatic Access: An AWS service such as an Amazon EC2 instance can use a role by requesting temporary security credentials using programmatic requests to AWS.

An IAM Role can be used in the following ways:

 IAM User: IAM Roles are used to grant your IAM Users permissions to access AWS resources within your own or a different account. An IAM User can use the permissions attached to the role through the IAM Console. A role also prevents accidental access to sensitive AWS resources.
 Applications and Services: You can grant the permissions attached to a role to applications and services by calling the AssumeRole API operation. AssumeRole returns temporary security credentials associated with the role. An application or service can only take those actions which are permitted by the role. An application cannot exit the role the way an IAM User in the Console does; rather, it simply stops using the temporary credentials and resumes using its original credentials.
 Federated Users: Federated users can sign in using temporary credentials provided through an identity provider. The identity provider authenticates the user, and AWS issues temporary credentials associated with the role; those credentials grant the role's permissions to the user.

Following are the cases of Roles:

 Switch to a role as an IAM User in one AWS account to access resources in another
account that you own.
o You can grant the permission to your IAM Users to switch roles within your
AWS account or different account. For example, you have Amazon EC2
instances which are very critical to your organization. Instead of directly
granting permission to users to terminate the instances, you can create a role
with the privileges that allows the administrators to switch to the role when
they need to terminate the instance.
o You have to grant users permission to assume the role explicitly.
o Multi-factor authentication (MFA) protection can be added to the role so that only users who sign in with MFA can use the role.
o Roles prevent accidental changes to sensitive resources, especially if you combine them with auditing so that the roles are only used when needed.
o An IAM User in one account can switch to a role in the same or a different account. With roles, a user can access the resources permitted by the role. When the user switches to the role, their original permissions are taken away. When the user exits the role, their original permissions are restored.
 Providing access to an AWS service
o AWS services use roles to access AWS resources.
o Each service is different in how it uses roles and how the roles are assigned to the service.
o Suppose an AWS service such as an Amazon EC2 instance that runs your application wants to make requests to AWS resources such as an Amazon S3 bucket; the service must have security credentials to access those resources. If you embed security credentials directly into the instance, distributing and rotating the credentials across multiple instances creates a security risk. To overcome such problems, you can create a role and assign it to the Amazon EC2 instance, which grants the instance permission to access the resources.
 Providing access to externally authenticated users.
Sometimes users have identities outside of AWS, such as in your corporate directory. If such users want to work with AWS resources, they need AWS security credentials. In such situations, we can use a role to specify the permissions for users authenticated by a third-party identity provider (IDP).
o SAML -based federation
SAML 2.0 (Security Assertion Markup Language 2.0) is an open framework
that many identity providers use. SAML provides the user with the federated
single-sign-on to the AWS Management Console, so that user can log in to the
AWS Management Console.
How SAML-based federation works
o Web-identity federation
Suppose you are creating a mobile app that accesses AWS resources such as a
game that run on a mobile device, but the information is stored using Amazon
S3 and DynamoDB.
When you create such an app, you need to make requests to the AWS services
that must be signed with an AWS access key. However, it is recommended not
to use long-term AWS credentials, not even in an encrypted form. An
Application must request for the temporary security credentials which are
dynamically created when needed by using web-identity federation. These
temporary security credentials will map to a role that has the permissions
needed for the app to perform a task.
With web-identity federation, users do not require any custom sign-in code or
user identities. A User can log in using the external identity provider such as
login with Amazon, Facebook, Google or another OpenID. After login, the
user gets the authentication token, and they exchange the authentication token
for receiving the temporary security credentials.
 Providing access to third parties
When third parties want to access the AWS resources, then you can use roles to
delegate access to them. IAM roles grant these third parties to access the AWS
resources without sharing any security credentials.
Third parties provide the following information to create a role:
o The third party provides the account ID that contains the IAM Users to use
your role. You need to specify AWS account ID as the principal when you
define the trust policy for the role.
o The external ID of the third party is used to associate with the role. You
specify the external ID to define the trust policy of the role.
o The permissions that the third party needs in order to access the AWS resources. These permissions are associated with the role through its permissions policy, which defines what actions they can take and what resources they can use.

Creating IAM Roles for a service

Creating a Role for a service using the AWS Management Console.

 In the navigation pane of the console, click Roles and then click on "Create Role". The following screen appears on clicking the Create Role button.

 Choose the service that you want to use with the role.
 Select the managed policy that attaches the permissions to the service.

 In a role name box, enter the role name that describes the role of the service, and then
click on "Create role".

Creating a Role for a service using the CLI (Command Line Interface)

 When you create a role using the console, many of the steps are done for you, but with the CLI you explicitly perform each step yourself. You must create a trust policy and assign a permissions policy to the role.
To create a role for an AWS service using the AWS CLI, use the following commands:
o Create a role: aws iam create-role
o Attach a permission policy to the role: aws iam put-role-policy
 If you are using a role with an instance such as an Amazon EC2 instance, then you need to create an instance profile to hold the role. An instance profile is a container for a role, and it can contain only one role. If you create the role by using the AWS Management Console, the instance profile is created for you. If you create the role using the CLI, you must explicitly perform this step yourself (see the sketch below).
To create an instance profile using the CLI, use the following commands:
o Create an instance profile: aws iam create-instance-profile
o Add a role to the instance profile: aws iam add-role-to-instance-profile
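Putting those commands together, the hedged sketch below creates a hypothetical role named ec2-app-role that the EC2 service can assume, attaches a small inline permissions policy, and wraps the role in an instance profile; all names and the example permission are assumptions.

    # 1. Create the role with a trust policy that lets EC2 assume it.
    aws iam create-role --role-name ec2-app-role \
        --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "Service": "ec2.amazonaws.com" },
            "Action": "sts:AssumeRole"
          }]
        }'

    # 2. Attach an inline permissions policy (here: list the account's S3 buckets).
    aws iam put-role-policy --role-name ec2-app-role \
        --policy-name app-list-buckets \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [{ "Effect": "Allow",
                          "Action": "s3:ListAllMyBuckets",
                          "Resource": "*" }]
        }'

    # 3. Create an instance profile and place the role inside it.
    aws iam create-instance-profile --instance-profile-name ec2-app-profile
    aws iam add-role-to-instance-profile \
        --instance-profile-name ec2-app-profile --role-name ec2-app-role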
Creating IAM Roles for an IAM User

Creating a Role for an IAM User using AWS Management Console

 In the navigation pane of the console, click Roles and then click on "Create Role". The following screen appears on clicking the Create Role button.

 Specify the account ID that you want to grant access to the resources, and then click on the Next: Permissions button.
 Selecting the option "Require external ID" allows users from a third party to access the resources. You need to enter the external ID provided by the administrator of the third party. This condition is automatically added to the trust policy, which allows the user to assume the role.
 Selecting the option "Require MFA" restricts the role to users who sign in with multi-factor authentication.
 Select a policy that you want to attach with the role. A policy contains the permissions
that specify the actions that they can take and resources that they can access.

 In a role name box, enter the role name and the role description.

 Click on Create role to complete the creation of the role.

Creating a Role for an IAM User using CLI (Command Line Interface)

When you use the console to create a role, many of the steps are already done for you. In the
case of CLI, you must specify each step explicitly.

To create a role for cross-account access using CLI, use the following commands:

 Create a role: aws iam create-role


 Attach a permission policy to the role: aws iam put-role-policy
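Once such a role exists, a user in the trusted account switches to it by requesting temporary credentials with sts assume-role. The role ARN below is hypothetical (account 999988887777 standing in for the trusting account that owns the role); the response contains a temporary AccessKeyId, SecretAccessKey, and SessionToken.

    # Switch to a cross-account role and receive temporary security credentials.
    aws sts assume-role \
        --role-arn arn:aws:iam::999988887777:role/example-delegated-role \
        --role-session-name cross-account-session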

Creating IAM Roles for a Third Party Identity Provider (Federation)

Identity Federation allows you to access AWS resources for users who can sign in using
third-party identity provider. To configure Identity Federation, you must configure the
identity provider and then create an IAM Role that determines the permissions which
federated users can have.

 Web Identity Federation: Web Identity Federation provides access to AWS resources for users who have signed in with Login with Amazon, Facebook, Google, or another OpenID Connect-compatible identity provider. To configure Web Identity Federation, you must first create and configure the identity provider and then create the IAM Role that determines the permissions that federated users will have.
 Security Assertion Markup Language (SAML) 2.0 Federation: SAML-Based
Federation provides access to the AWS resources in an organization that uses SAML.
To configure SAML 2.0 Based Federation, you must first create and configure the
identity provider and then create the IAM Role that determines the permissions the
federated users from the organization will have.
Creating a Role for a web identity using AWS Management Console

 Open the IAM Console at https://console.aws.amazon.com/iam/


 In the navigation pane, click Roles and then click on Create role.
 After clicking on the create role, select the type of trusted entity, i.e., web identity

 Specify the client ID that identifies your application.


o If you are creating a role for Amazon Cognito, specify the ID of the identity pool that you created for your Amazon Cognito applications in the Identity Pool ID box.
o If you are creating a role for a single web identity provider, specify the ID that
the provider provides when you have registered your application with the
identity provider.
 (Optional) Click Add Conditions to add the additional conditions that must be met
before users of your application can use the permissions granted by the role.
 Now, attach the permission policies to the role and then click Next: Tags.

 In a role name box, specify the role name and role description

 Click Create role to complete the process of creation of role.

Creating a Role for SAML Based 2.0 Federation using AWS Management Console

 Open the IAM Console at https://console.aws.amazon.com/iam/


 In the navigation pane of the console, Click Roles and then click on Create role
 Click on Role for Identity Provider Access.
 Select the type of the role that you want to create for Grant Web Single Sign-On
(SSO) or Grant API access.
 Select the SAML Provider for which you want to create the role.
 If you are creating a role for API access, select the attribute from the attribute list.
Then in the value box, enter the value that you want to include in the role. It restricts
the access to the role to the users from the identity providers whose SAML
authentication response includes the attributes you select.
 If you want to add more attribute related conditions, click on Add Conditions.
 Attach the permission policies to the role.
 Click Create role to complete the process of creation of role.

Creating a role for Federated Users using AWS CLI

To create a role for federated users using AWS CLI, use the following commands:

Create a role: aws iam create-role

To attach a permissions policy to the role: aws iam attach-role-policy or aws iam put-role-policy
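For SAML-based federation specifically, the identity provider must first be registered in IAM before a role's trust policy can reference it. The hedged sketch below registers a hypothetical SAML provider from a metadata file and creates a role that federated users can assume; the provider name, metadata file, account ID, and attached managed policy are all assumptions.

    # Register the external identity provider from its SAML metadata document.
    aws iam create-saml-provider --name ExampleIdP \
        --saml-metadata-document file://idp-metadata.xml

    # Create a role whose trust policy allows identities from that provider
    # to assume it through SAML single sign-on.
    aws iam create-role --role-name federated-analysts \
        --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": { "Federated": "arn:aws:iam::999988887777:saml-provider/ExampleIdP" },
            "Action": "sts:AssumeRoleWithSAML",
            "Condition": { "StringEquals": { "SAML:aud": "https://signin.aws.amazon.com/saml" } }
          }]
        }'

    # Attach a permissions policy (here, the AWS managed read-only policy).
    aws iam attach-role-policy --role-name federated-analysts \
        --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess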

S3-101

 S3 is one of the first services produced by AWS.
 S3 stands for Simple Storage Service.
 S3 provides developers and IT teams with secure, durable, highly scalable object
storage.
 It is easy to use with a simple web services interface to store and retrieve any amount
of data from anywhere on the web.

What is S3?

 S3 is a safe place to store the files.


 It is Object-based storage, i.e., you can store the images, word files, pdf files, etc.
 The files which are stored in S3 can be from 0 Bytes to 5 TB.
 It has unlimited storage means that you can store the data as much you want.
 Files are stored in Bucket. A bucket is like a folder available in S3 that stores the files.
 S3 uses a universal namespace, i.e., bucket names must be unique globally. A bucket gets a DNS address; therefore, the bucket must have a unique name so that it generates a unique DNS address.

If you create a bucket, the URL looks like, for example, https://bucketname.s3.amazonaws.com (the bucket name becomes part of the DNS address).

 If you upload a file to an S3 bucket, you will receive an HTTP 200 status code, which means that the upload of the file was successful.
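The basic bucket-and-object workflow can be tried from the AWS CLI; the bucket name below is a hypothetical, globally unique name chosen only for illustration.

    # Create a bucket (the name must be globally unique).
    aws s3 mb s3://example-notes-bucket-2024

    # Upload a file; behind the scenes a successful upload returns HTTP 200.
    aws s3 cp hello.txt s3://example-notes-bucket-2024/hello.txt

    # List the bucket contents and download the object back.
    aws s3 ls s3://example-notes-bucket-2024/
    aws s3 cp s3://example-notes-bucket-2024/hello.txt ./hello-copy.txt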

Advantages of Amazon S3
 Create Buckets: Firstly, we create a bucket and provide a name to the bucket.
Buckets are the containers in S3 that stores the data. Buckets must have a unique
name to generate a unique DNS address.
 Storing data in buckets: A bucket can be used to store an unlimited amount of data. You can upload as many files as you want into an Amazon S3 bucket, i.e., there is no maximum limit on the number of files. Each object can contain up to 5 TB of data, and each object can be stored and retrieved by using a unique, developer-assigned key.
 Download data: You can also download your data from a bucket and can also give
permission to others to download the same data. You can download the data at any
time whenever you want.
 Permissions: You can also grant or deny access to others who want to download or
upload the data from your Amazon S3 bucket. Authentication mechanism keeps the
data secure from unauthorized access.
 Standard interfaces: S3 provides standard REST and SOAP interfaces which are designed in such a way that they can work with any development toolkit.
 Security: Amazon S3 offers security features by protecting unauthorized users from
accessing your data.

S3 is a simple key-value store

S3 is object-based. Objects consist of the following:

 Key: It is simply the name of the object. For example, hello.txt, spreadsheet.xlsx, etc.
You can use the key to retrieve the object.
 Value: It is simply the data, which is made up of a sequence of bytes. It is the actual content of the file.
 Version ID: Version ID uniquely identifies the object. It is a string generated by S3
when you add an object to the S3 bucket.
 Metadata: It is the data about data that you are storing. A set of a name-value pair
with which you can store the information regarding an object. Metadata can be
assigned to the objects in Amazon S3 bucket.
 Sub resources: Sub resource mechanism is used to store object-specific information.
 Access control information: You can put the permissions individually on your files.
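These object components can be seen directly with the lower-level s3api commands. The sketch below uploads a hypothetical object with custom metadata and then reads its metadata back without downloading the body; the bucket, key, and metadata values are assumptions.

    # Upload an object under the key "docs/hello.txt" with custom metadata attached.
    aws s3api put-object --bucket example-notes-bucket-2024 \
        --key docs/hello.txt --body hello.txt \
        --metadata author=alice,project=cloud-notes

    # Inspect the object's metadata, content type, and (if versioning is enabled)
    # its version ID.
    aws s3api head-object --bucket example-notes-bucket-2024 --key docs/hello.txt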
Amazon S3 Concepts

 Buckets
 Objects
 Keys
 Regions
 Data Consistency Model

 Buckets
o A bucket is a container used for storing the objects.
o Every object is incorporated in a bucket.
o For example, if the object named photos/tree.jpg is stored in the treeimage
bucket, then it can be addressed by using the URL
http://treeimage.s3.amazonaws.com/photos/tree.jpg.
o A bucket has no limit on the number of objects that it can store. No bucket can exist inside another bucket.
o S3 performance remains the same regardless of how many buckets have been created.
o The AWS user that creates a bucket owns it, and no other AWS user can own it. Therefore, we can say that the ownership of a bucket is not transferable.
o The AWS account that creates a bucket can delete it, but no other AWS account can delete the bucket.

 Objects
o Objects are the entities which are stored in an S3 bucket.
o An object consists of object data and metadata where metadata is a set of
name-value pair that describes the data.
o An object consists of some default metadata such as date last modified, and
standard HTTP metadata, such as Content type. Custom metadata can also be
specified at the time of storing an object.
o It is uniquely identified within a bucket by key and version ID.

 Key
o A key is a unique identifier for an object.
o Every object in a bucket is associated with one key.
o An object can be uniquely identified by using a combination of bucket name,
the key, and optionally version ID.
o For example, in the URL
http://jtp.s3.amazonaws.com/2019-01-31/Amazons3.wsdl where "jtp" is the
bucket name, and key is "2019-01-31/Amazons3.wsdl"

 Regions
o You can choose a geographical region in which you want to store the buckets
that you have created.
o A region is chosen in such a way that it optimizes latency, minimizes costs, or addresses regulatory requirements.
o Objects will not leave the region unless you explicitly transfer the objects to
another region.

 Data Consistency Model


Amazon S3 replicates data to multiple servers to achieve high availability.
The classic S3 model has two types of consistency (note that since December 2020, Amazon S3 provides strong read-after-write consistency for all PUT and DELETE operations):
o Read-after-write consistency for PUTS of new objects.
 For a PUT request, S3 stores the data across multiple servers to achieve high availability.
 When a process stores a new object to S3, the object is immediately available for reading.
 When a process stores a new object to S3, the object immediately appears when listing the keys within the bucket.
 Propagation does not take time; the changes are reflected immediately.

Eventual consistency for overwrite PUTS and DELETES

 For PUTS that overwrite existing objects and for DELETES, the changes are reflected eventually; they are not available immediately.
 If a process replaces an existing object with a new object and you try to read it immediately, S3 might return the prior data until the change is fully propagated.
 If a process deletes an existing object and you try to read it immediately, S3 might return the deleted data until the change is fully propagated.
 If a process deletes an existing object and you immediately list all the keys within the bucket, S3 might still include the deleted key in the listing until the change is fully propagated.
Challenges of cloud computing: Cloud computing is a hot topic at the moment, and there is
a lot of ambiguity when it comes to managing its features and resources. Technology is
evolving, and as companies scale up, their need to use the latest Cloud frameworks also
increases. Some of the benefits introduced by cloud solutions include data security,
flexibility, efficiency, and high performance. Smoother processes and improved collaboration
between enterprises while reducing costs are among its perks. However, the Cloud is not
perfect and has its own set of drawbacks when it comes to data management and privacy
concerns. Thus, there are various benefits and challenges of cloud computing. The list below
discusses some of the key challenges in the adoption of cloud computing.

The top 15 cloud computing challenges and problems include:

1. Data Security and Privacy

Data security is a major concern when working with Cloud environments. It is one of the
major challenges in cloud computing as users have to take accountability for their data, and
not all Cloud providers can assure 100% data privacy. Lack of visibility and control tools, no
identity access management, data misuse, and Cloud misconfiguration are the common
causes behind Cloud privacy leaks. There are also concerns with insecure APIs, malicious
insiders, and oversights or neglect in Cloud data management.

Solution: Configure network hardware and install the latest software updates to prevent
security vulnerabilities. Using firewalls, antivirus, and increasing bandwidth for Cloud data
availability are some ways to prevent data security risks.

2. Multi-Cloud Environments

Common cloud computing issues and challenges with multi-cloud environments are -
configuration errors, lack of security patches, data governance, and no granularity. It is
difficult to track the security requirements of multi-clouds and apply data management
policies across various boards.

Solution: Using a multi-cloud data management solution is a good start for enterprises. Not
all tools will offer specific security functionalities, and multi-cloud environments grow highly
sophisticated and complex. Open-source products like Terraform provide a great deal of
control over multi-cloud architectures.

3. Performance Challenges

The performance of Cloud computing solutions depends on the vendors who offer these
services to clients, and if a Cloud vendor goes down, the business gets affected too. It is one
of the major challenges associated with cloud computing.

Solution: Sign up with Cloud Service Providers who have real-time SaaS monitoring policies.

The Cloud Solution Architect Certification training addresses all Cloud performance issues
and teaches learners how to mitigate them.
4. Interoperability and Flexibility

Interoperability is a challenge when you try to move applications between two or multiple
Cloud ecosystems. It is one of the challenges faced in cloud computing. Some common issues
faced are:

 Rebuilding application stacks to match the target cloud environment's specifications


 Handling data encryption during migration
 Setting up networks in the target cloud for operations
 Managing apps and services in the target cloud ecosystem

Solution: Setting Cloud interoperability and portability standards in organizations before


getting to work on projects can help solve this problem. The use of multi-layer authentication
and authorization tools is also encouraged for account verifications in public, private, and
hybrid cloud ecosystems.

5. High Dependence on Network

Lack of sufficient internet bandwidth is a common problem when transferring large volumes
of information to and from Cloud data servers. It is one of the various challenges in cloud
computing. Data is highly vulnerable, and there is a risk of sudden outages. Enterprises that
want to lower hardware costs without sacrificing performance need to ensure there is high
bandwidth, which will help prevent business losses from sudden outages.

Solution: Pay more for higher bandwidth and focus on improving operational efficiency to
address network dependencies.

6. Lack of Knowledge and Expertise

Organizations are finding it tough to find and hire the right Cloud talent, which is another
common challenge in cloud computing. There is a shortage of professionals with the required
qualifications in the industry. Workloads are increasing, and the number of tools launched in
the market is increasing. Enterprises need good expertise in order to use these tools and find
out which ones are ideal for them.

Solution: Hire Cloud professionals with specializations in DevOps and automation

7. Reliability and Availability

High unavailability of Cloud services and a lack of reliability are two major concerns in these
ecosystems. Organizations are forced to seek additional computing resources in order to keep
up with changing business requirements. If a Cloud vendor gets hacked or affected, the data
of organizations using their services gets compromised. It is another one of the many cloud
security risks and challenges faced by the industry.

Solution: Implementing the NIST Framework standards in Cloud environments can greatly
improve both aspects.
8. Password Security

Account managers use the same passwords to manage all their Cloud accounts. Password
management is a critical problem, and it is often found that users resort to using reused and
weak passwords.

Solution: Use a strong password management solution to secure all your accounts. To further
improve security, use Multifactor Authentication (MFA) in addition to a password manager.
Good cloud-based password managers alert users of security risks and leaks.

9. Cost Management

Even though Cloud Service Providers (CSPs) offer a pay-as-you-go subscription for services,
the costs can add up. Hidden costs appear in the form of underutilized resources in
enterprises.

Solution: Auditing systems regularly and implementing resource utilization monitoring tools
are some ways organizations can fix this. It's one of the most effective ways to manage
budgets and deal with major challenges in cloud computing.

10. Lack of expertise

Cloud computing is a highly competitive field, and there are many professionals who lack the
required skills and knowledge to work in the industry. There is also a huge gap in supply and
demand for certified individuals and many job vacancies.

Solution: Companies should retrain their existing IT staff and help them in upskilling their
careers by investing in Cloud training programs.

11. Control or Governance

Good IT governance ensures that the right tools are used, and assets get implemented
according to procedures and agreed-to policies. Lack of governance is a common problem,
and companies use tools that do not align with their vision. IT teams don't get total control of
compliance, risk management, and data quality checks, and there are many uncertainties
faced when migrating to the Cloud from traditional infrastructures.

Solution: Traditional IT processes should be adopted in ways to accommodate Cloud


migrations.

12. Compliance

Cloud Service Providers (CSP) are not up-to-date when it comes to having the best data
compliance policies. Whenever a user transfers data from internal servers to the Cloud, they
run into compliance issues with state laws and regulations.

Solution: The General Data Protection Regulation (GDPR) is expected to push CSPs to resolve such compliance issues more quickly in the future.
13. Multiple Cloud Management

Enterprises depend on multiple cloud environments due to scaling up and provisioning


resources. One of the hybrid cloud security challenges is that most companies follow a hybrid
cloud strategy, and many resort to multi-cloud. The problem is that infrastructures grow
increasingly complex and difficult to manage when multiple cloud providers get added,
especially due to technological cloud computing challenges and differences.

Solution: Creating strong data management and privacy policies is a starting point when it
comes to managing multi-cloud environments effectively.

14. Migration

Migration of data to the Cloud takes time, and not all organizations are prepared for it. Some
report increased downtimes during the process, face security issues, or have problems with
data formatting and conversions. Cloud migration projects can get expensive and are harder
than anticipated.

Solution: Organizations will have to employ in-house professionals to handle their Cloud data
migration and increase their investments. Experts must analyze cloud computing issues and
solutions before investing in the latest platforms and services offered by CSPs.

15. Hybrid-Cloud Complexity

Hybrid-cloud complexity refers to the cloud computing challenges that arise from mixing computing, storage, and services across environments, and multi-cloud security adds further challenges. A hybrid setup comprises private cloud services, public clouds, and on-premises infrastructure, for example products like Microsoft Azure and Amazon Web Services, which must be orchestrated across various platforms.

Solution: Using centralized Cloud management solutions, increasing automation, and


hardening security are good ways to mitigate hybrid-cloud complexity.

Conclusion

Now that you're aware of the cloud computing opportunities and challenges and key
challenges in the adoption of cloud computing and their solutions, you can take the necessary
steps to address them. It is important to view a company's Cloud strategy as a whole,
involving both people and processes, and not just the technology. The Cloud is powerful and,
if implemented properly, can substantially help an organization grow faster and perform
better.

Check out the KnowledgeHut’s Cloud Computing Courses List to learn the most in-demand
Cloud Computing skills from experts in the industry. Develop knowledge in managing cloud
storage, databases, networking, security, and analytics with in-depth course materials.
Cloud Computing Applications

Cloud service providers provide various applications in the field of art, business, data storage
and backup services, education, entertainment, management, social networking, etc.

The most widely used cloud computing applications are given below -

1. Art Applications

Cloud computing offers various art applications for quickly and easily designing attractive cards, booklets, and images. Some of the most commonly used cloud art applications are given below:

i. Moo

Moo is one of the best cloud art applications. It is used for designing and printing business
cards, postcards, and mini cards.

ii. Vistaprint

Vistaprint allows us to easily design various printed marketing products such as business cards, postcards, booklets, and wedding invitation cards.

iii. Adobe Creative Cloud

Adobe Creative Cloud is made for designers, artists, filmmakers, and other creative professionals. It is a suite of apps which includes the Photoshop image editing program, Illustrator, InDesign, TypeKit, Dreamweaver, XD, and Audition.
2. Business Applications

Business applications are based on cloud service providers. Today, every organization
requires the cloud business application to grow their business. It also ensures that business
applications are 24*7 available to users.

There are the following business applications of cloud computing -

i. MailChimp

MailChimp is an email publishing platform which provides various options to design, send,
and save templates for emails.

ii. Salesforce

Salesforce platform provides tools for sales, service, marketing, e-commerce, and more. It
also provides a cloud development platform.

iii. Chatter

Chatter helps us to share important information about the organization in real time.

iv. Bitrix24

Bitrix24 is a collaboration platform which provides communication, management, and social


collaboration tools.

v. Paypal

Paypal offers the simplest and easiest online payment mode using a secure internet account.
Paypal accepts the payment through debit cards, credit cards, and also from Paypal account
holders.

vi. Slack

Slack stands for Searchable Log of all Conversation and Knowledge. It provides a user-
friendly interface that helps us to create public and private channels for communication.

vii. Quickbooks

Quickbooks works on the terminology "Run Enterprise anytime, anywhere, on any


device." It provides online accounting solutions for the business. It allows more than 20 users
to work simultaneously on the same system.

3. Data Storage and Backup Applications

Cloud computing allows us to store information (data, files, images, audios, and videos) on
the cloud and access this information using an internet connection. As the cloud provider is responsible for providing security, they also offer various backup and recovery applications for retrieving lost data.
A list of data storage and backup applications in the cloud are given below -

i. Box.com

Box provides an online environment for secure content management, workflow, and
collaboration. It allows us to store different files such as Excel, Word, PDF, and images on
the cloud. The main advantage of using box is that it provides drag & drop service for files
and easily integrates with Office 365, G Suite, Salesforce, and more than 1400 tools.

ii. Mozy

Mozy provides powerful online backup solutions for our personal and business data. It
schedules automatically back up for each day at a specific time.

iii. Joukuu

Joukuu provides the simplest way to share and track cloud-based backup files. Many users
use joukuu to search files, folders, and collaborate on documents.

iv. Google G Suite

Google G Suite is one of the best cloud storage and backup applications. It includes Google Calendar, Docs, Forms, Google+, Hangouts, as well as cloud storage and tools for managing cloud apps. The most popular app in Google G Suite is Gmail, which offers free email services to users.

4. Education Applications

Cloud computing in the education sector has become very popular. It offers various online distance learning platforms and student information portals to the students. The advantage of using the cloud in the field of education is that it offers strong virtual classroom environments, ease of accessibility, secure data storage, scalability, greater reach for the students, and minimal hardware requirements for the applications.

There are the following education applications offered by the cloud -

i. Google Apps for Education

Google Apps for Education is the most widely used platform for free web-based email,
calendar, documents, and collaborative study.

ii. Chromebooks for Education

Chromebook for Education is one of the most important Google's projects. It is designed for
the purpose that it enhances education innovation.

iii. Tablets with Google Play for Education

It allows educators to quickly implement the latest technology solutions into the classroom
and make it available to their students.
iv. AWS in Education

AWS cloud provides an education-friendly environment to universities, community colleges,


and schools.

5. Entertainment Applications

Entertainment industries use a multi-cloud strategy to interact with the target audience.
Cloud computing offers various entertainment applications such as online games and video
conferencing.

i. Online games

Today, cloud gaming has become one of the most important entertainment media. It offers various online games that run remotely from the cloud. The best-known cloud gaming services are Shadow, GeForce Now, Vortex, Project xCloud, and PlayStation Now.

ii. Video Conferencing Apps

Video conferencing apps provide a simple and instantly connected experience. They allow us to communicate with our business partners, friends, and relatives using cloud-based video conferencing. The benefits of using video conferencing are that it reduces cost, increases efficiency, and removes interoperability issues.

6. Management Applications

Cloud computing offers various cloud management tools which help admins to manage all
types of cloud activities, such as resource deployment, data integration, and disaster recovery.
These management tools also provide administrative control over the platforms, applications,
and infrastructure.

Some important management applications are -

i. Toggl

Toggl helps users to track the time allocated to a particular project.

ii. Evernote

Evernote allows you to sync and save your recorded notes, typed notes, and other notes in
one convenient place. It is available for both free as well as a paid version.

It uses platforms like Windows, macOS, Android, iOS, Browser, and Unix.

iii. Outright

Outright is used by management users for accounting purposes. It helps to track income, expenses, profits, and losses in a real-time environment.

iv. GoToMeeting
GoToMeeting provides video conferencing and online meeting apps, which allow you to start a meeting with your business partners at any time, from anywhere, using mobile phones or tablets. Using the GoToMeeting app, you can perform management-related tasks such as joining meetings in seconds, viewing presentations on the shared screen, getting alerts for upcoming meetings, etc.

7. Social Applications

Social cloud applications allow a large number of users to connect with each other using social networking applications such as Facebook, Twitter, LinkedIn, etc.

There are the following cloud based social applications -

i. Facebook

Facebook is a social networking website which allows active users to share files, photos, videos, statuses, and more with their friends, relatives, and business partners using the cloud storage system. On Facebook, we always get notifications when our friends like or comment on our posts.

ii. Twitter

Twitter is a social networking site and a microblogging system. It allows users to follow high-profile celebrities, friends, and relatives, and to receive news. It sends and receives short posts called tweets.

iii. Yammer

Yammer is the best team collaboration tool that allows a team of employees to chat, share
images, documents, and videos.

iv. LinkedIn

LinkedIn is a social network for students, freshers, and professionals.

Architecture of Cloud Computing

Cloud computing, one of the most in-demand technologies of the current time, is giving a new shape to every organization by providing on-demand virtualized services/resources. From small to medium and medium to large, every organization uses cloud computing services for storing information and accessing it from anywhere and at any time, only with the help of the internet. In this section, we will learn more about the internal architecture of cloud computing.

Transparency, scalability, security, and intelligent monitoring are some of the most important requirements that every cloud infrastructure should meet. Ongoing research on other important constraints is helping cloud computing systems come up with new features and strategies capable of providing more advanced cloud solutions.
Cloud Computing Architecture :
The cloud architecture is divided into 2 parts i.e.

1. Frontend
2. Backend

The below figure represents an internal architectural view of cloud computing.

Architecture of Cloud Computing

Architecture of cloud computing is the combination of both SOA (Service Oriented


Architecture) and EDA (Event Driven Architecture). Client infrastructure, application,
service, runtime cloud, storage, infrastructure, management and security all these are the
components of cloud computing architecture.

1. Frontend:
The frontend of the cloud architecture refers to the client side of the cloud computing system. It contains all the user interfaces and applications which are used by the client to access the cloud computing services/resources. For example, the use of a web browser to access the cloud platform.

 Client Infrastructure – Client Infrastructure is a part of the frontend component. It


contains the applications and user interfaces which are required to access the cloud
platform.
 In other words, it provides a GUI( Graphical User Interface ) to interact with the
cloud.

2. Backend:
Backend refers to the cloud itself which is used by the service provider. It contains the
resources as well as manages the resources and provides security mechanisms. Along with
this, it includes huge storage, virtual applications, virtual machines, traffic control
mechanisms, deployment models, etc.

1. Application –
Application in the backend refers to the software or platform that the client accesses. It provides the service in the backend as per the client's requirements.
2. Service –
Service in the backend refers to the three major types of cloud-based services: SaaS, PaaS, and IaaS. It also manages which type of service the user accesses.
3. Runtime Cloud-
Runtime cloud in backend provides the execution and Runtime platform/environment
to the Virtual machine.
4. Storage –
Storage in backend provides flexible and scalable storage service and management of
stored data.
5. Infrastructure –
Cloud Infrastructure in backend refers to the hardware and software components of
cloud like it includes servers, storage, network devices, virtualization software etc.
6. Management –
Management in backend refers to management of backend components like
application, service, runtime cloud, storage, infrastructure, and other security
mechanisms etc.
7. Security –
Security in the backend refers to the implementation of different security mechanisms for securing cloud resources, systems, files, and infrastructure for end-users.
8. Internet –
Internet connection acts as the medium or a bridge between frontend and backend and
establishes the interaction and communication between frontend and backend.

Benefits of Cloud Computing Architecture:

 Makes overall cloud computing system simpler.


 Improves data processing requirements.
 Helps in providing high security.
 Makes it more modularized.
 Results in better disaster recovery.
 Gives good user accessibility.
 Reduces IT operating costs.
Architecture Style

An architecture style is a family of architectures that share certain characteristics. For example, N-tier is a common architecture style. More recently, microservice architectures have started to gain favour. Architecture styles don't require the use of particular technologies, but some technologies are well-suited for certain architectures. For example, containers are a natural fit for microservices.

We have identified a set of architecture styles that are commonly found in cloud applications.
The article for each style includes:

 A description and logical diagram of the style.


 Recommendations for when to choose this style.
 Benefits, challenges, and best practices.
 A recommended deployment using relevant Azure services.

A quick tour of the styles

This section gives a quick tour of the architecture styles that we've identified, along with
some high-level considerations for their use. Read more details in the linked topics.

N-tier

N-tier is a traditional architecture for enterprise applications. Dependencies are managed by


dividing the application into layers that perform logical functions, such as presentation,
business logic, and data access. A layer can only call into layers that sit below it. However,
this horizontal layering can be a liability. It can be hard to introduce changes in one part of
the application without touching the rest of the application. That makes frequent updates a
challenge, limiting how quickly new features can be added.

N-tier is a natural fit for migrating existing applications that already use a layered architecture. For that reason, N-tier is most often seen in infrastructure as a service (IaaS) solutions, or in applications that use a mix of IaaS and managed services.
Web-Queue-Worker

For a purely PaaS solution, consider a Web-Queue-Worker architecture. In this style, the
application has a web front end that handles HTTP requests and a back-end worker that
performs CPU-intensive tasks or long-running operations. The front end communicates to the
worker through an asynchronous message queue.

Web-queue-worker is suitable for relatively simple domains with some resource-intensive


tasks. Like N-tier, the architecture is easy to understand. The use of managed services
simplifies deployment and operations. But with complex domains, it can be hard to manage
dependencies. The front end and the worker can easily become large, monolithic components
that are hard to maintain and update. As with N-tier, this can reduce the frequency of updates
and limit innovation.

Microservices

If your application has a more complex domain, consider moving to a Microservices


architecture. A microservices application is composed of many small, independent services.
Each service implements a single business capability. Services are loosely coupled,
communicating through API contracts.
Each service can be built by a small, focused development team. Individual services can be
deployed without a lot of coordination between teams, which encourages frequent updates. A
microservice architecture is more complex to build and manage than either N-tier or web-
queue-worker. It requires a mature development and DevOps culture. But done right, this
style can lead to higher release velocity, faster innovation, and a more resilient architecture.

Event-driven architecture

Event-Driven Architectures use a publish-subscribe (pub-sub) model, where producers


publish events, and consumers subscribe to them. The producers are independent from the
consumers, and consumers are independent from each other.

Consider an event-driven architecture for applications that ingest and process a large volume
of data with very low latency, such as IoT solutions. The style is also useful when different
subsystems must perform different types of processing on the same event data.

Big Data, Big Compute

Big Data and Big Compute are specialized architecture styles for workloads that fit certain
specific profiles. Big data divides a very large dataset into chunks, performing parallel
processing across the entire set, for analysis and reporting. Big compute, also called high-
performance computing (HPC), makes parallel computations across a large number
(thousands) of cores. Domains include simulations, modeling, and 3-D rendering.
Architecture styles as constraints

An architecture style places constraints on the design, including the set of elements that can
appear and the allowed relationships between those elements. Constraints guide the "shape"
of architecture by restricting the universe of choices. When architecture conforms to the
constraints of a particular style, certain desirable properties emerge.

For example, the constraints in microservices include:

 A service represents a single responsibility.


 Every service is independent of the others.
 Data is private to the service that owns it. Services do not share data.

By adhering to these constraints, what emerges is a system where services can be deployed
independently, faults are isolated, frequent updates are possible, and it's easy to introduce
new technologies into the application.

Before choosing an architecture style, make sure that you understand the underlying
principles and constraints of that style. Otherwise, you can end up with a design that
conforms to the style at a superficial level, but does not achieve the full potential of that style.
It's also important to be pragmatic. Sometimes it's better to relax a constraint, rather than
insist on architectural purity.

The following table summarizes how each style manages dependencies, and the types of
domain that are best suited for each.

Architecture style         | Dependency management                                                             | Domain type
---------------------------|-----------------------------------------------------------------------------------|------------------------------------------------------------------
N-tier                     | Horizontal tiers divided by subnet                                                | Traditional business domain. Frequency of updates is low.
Web-queue-worker           | Front and backend jobs, decoupled by async messaging.                             | Relatively simple domain with some resource-intensive tasks.
Microservices              | Vertically (functionally) decomposed services that call each other through APIs.  | Complicated domain. Frequent updates.
Event-driven architecture  | Producer/consumer. Independent view per sub-system.                               | IoT and real-time systems.
Big data                   | Divide a huge dataset into small chunks. Parallel processing on local datasets.   | Batch and real-time data analysis. Predictive analysis using ML.
Big compute                | Data allocation to thousands of cores.                                            | Compute-intensive domains such as simulation.
Consider challenges and benefits

Constraints also create challenges, so it's important to understand the trade-offs when adopting any of these styles. Do the benefits of the architecture style outweigh the challenges for this subdomain and bounded context?

Here are some of the types of challenges to consider when selecting an architecture style:

 Complexity. Is the complexity of the architecture justified for your domain? Conversely, is the style too simplistic for your domain? In that case, you risk ending up with a "big ball of mud", because the architecture does not help you to manage dependencies cleanly.
 Asynchronous messaging and eventual consistency. Asynchronous messaging can be used to decouple services, and increase reliability (because messages can be retried) and scalability. However, this also creates challenges in handling eventual consistency, as well as the possibility of duplicate messages (a small sketch of one way to handle duplicates follows this list).
 Inter-service communication. As you decompose an application into separate
services, there is a risk that communication between services will cause unacceptable
latency or create network congestion (for example, in a microservices architecture).
 Manageability. How hard is it to manage the application, monitor, deploy updates,
and so on?
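
To make the duplicate-message challenge above concrete, here is a minimal, illustrative Java sketch of an idempotent consumer: it remembers the IDs of messages it has already processed and silently skips redeliveries. The class and message IDs are hypothetical, and in a real system the set of processed IDs would be kept in durable storage rather than in memory.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy idempotent consumer: duplicate deliveries from an at-least-once queue
// are detected by message ID and ignored.
public class IdempotentConsumer {
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    public void handle(String messageId, String payload) {
        // add() returns false when the ID was already seen, i.e. a duplicate.
        if (!processedIds.add(messageId)) {
            System.out.println("Duplicate " + messageId + " ignored");
            return;
        }
        System.out.println("Processing " + messageId + ": " + payload);
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        consumer.handle("msg-1", "charge card");
        consumer.handle("msg-1", "charge card");   // redelivered by the queue; skipped
    }
}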

Amazon Simple Workflow Service (SWF)

 SWF stands for Simple Workflow Service.
 It is a web service used to build scalable and resilient applications.
 It provides simple API calls which can be executed from code written in any language and can be run on your EC2 instance or on any of your machines located anywhere in the world that can access the internet. For example, if you are building an application which consists of various modules, then to coordinate among the various modules we rely on SWF in AWS. SWF acts as a coordinator, and it has control over all the modules of an application.
 It allows you to build applications and makes it easy to coordinate the work across distributed components.
 SWF provides a logical separation among all the components of a project.
 SWF coordinates various tasks such as managing inter-task dependencies, scheduling, and concurrency in accordance with the logical flow of the application. You do not have to manage the tasks manually; SWF will do everything for you.

Let's understand this through an example. Suppose a customer has placed an order.

Step 1: You have to verify the order. You have your EC2 instances, and they go and check whether the order is in stock or not. Once the order has been verified, i.e., the item is in stock, move to step 2.

Step 2: Now, it works on the Charge Credit card. It checks whether the charge of a credit
card has been successful or not.

Step 3: If the charge of the credit card has been successful, we will ship the order. Shipping an order needs human interaction: a human brings the order from the warehouse, and once the product has been boxed up it is ready for shipment.

Step 4: Record Completion is a database entry which says that the product has been boxed up and shipped to the destination address. It also provides the tracking number. This is the end of the typical workflow.
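
The four steps above can be read as the control flow a decider expresses. The following plain-Java sketch only models that flow for illustration; it does not use the real Amazon SWF SDK, and the method names and order ID are made up.

// Plain-Java sketch of the order workflow above (not the Amazon SWF API).
public class OrderWorkflowSketch {

    static boolean verifyOrder(String orderId)      { System.out.println("Verify " + orderId); return true; }
    static boolean chargeCreditCard(String orderId) { System.out.println("Charge " + orderId); return true; }
    static void shipOrder(String orderId)           { System.out.println("Ship " + orderId); }
    static void recordCompletion(String orderId)    { System.out.println("Record completion of " + orderId); }

    public static void main(String[] args) {
        String orderId = "order-123";             // hypothetical order id
        if (!verifyOrder(orderId)) return;        // Step 1: is the item in stock?
        if (!chargeCreditCard(orderId)) return;   // Step 2: did the payment succeed?
        shipOrder(orderId);                       // Step 3: human picks and boxes the item
        recordCompletion(orderId);                // Step 4: persist status and tracking number
    }
}

In the real service, each of these steps would be an activity task processed by a worker, while SWF itself stores the task state and assigns each task only once.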

SWF Workers and Deciders

 Workers are the programs that interact with the Amazon SWF to get the tasks, process
the received tasks, and return the results.
 The decider is a program that provides coordination of tasks such as ordering,
concurrency, scheduling, etc according to the application logic.
 Both workers and deciders run on the cloud infrastructure such as Amazon EC2, or
machines behind firewalls.
 Deciders take a consistent view into the progress of tasks and initiate new tasks while
Amazon SWF stores the tasks and assigns them to the workers to process them.
 Amazon SWF ensures that the task is assigned only once and is never duplicated.
 Workers and Deciders do not have to keep track of the execution state as Amazon
SWF maintains the state durably.
 Both the workers and deciders run independently and scale quickly.

SWF Domains

 Domains are containers which isolate a set of types, executions, and task lists from
others within the same account.
 Workflow, activity types, and workflow execution are all scoped to a domain.
 You can register a domain either by using the AWS Management Console or
RegisterDomain action in the Amazon SWF API.

The parameters are specified in a JSON (Javascript Object Notation) format. The format is
shown below:

RegisterDomain
{
    "name": "867530901",
    "description": "music",
    "workflowExecutionRetentionPeriodInDays": "60"
}

Where,

workflowExecutionRetentionPeriodInDays defines the number of days of retention period.

Note: The maximum workflow execution time can be 1 year, and its value is measured in seconds.

Differences b/w SQS and SWF

 Amazon SWF provides a task-oriented API while Amazon SQS provides a message-oriented API.
 Amazon SWF ensures that a task is assigned only once and is never duplicated. With Amazon SQS, a message can be duplicated, and the application may need to ensure that a message is processed only once.
 SWF keeps track of all tasks and events in an application while SQS implements its
own application level tracking when an application uses multiple queues.

Features of SWF
 Scalable
Amazon SWF automatically scales the resources along with your application's usage.
There is no manual administration of the workflow service required when you add
more cloud workflows or increase the complexity of the workflows.
 Reliable
Amazon SWF runs at Amazon's highly available data centres, therefore state
tracking is provided whenever applications need it. Amazon SWF stores the tasks,
sends them to the respective application components, and keeps track of their progress.
 Simple
Amazon SWF completely replaces the complexity of old workflow solutions and
process automation software with a cloud-based workflow web service. It eliminates
the need for developers to manage the automation process so that they can focus
on the unique functionality of an application.
 Logical separation
Amazon SWF provides a logical separation between the control flow of your
background job's stepwise logic and the actual units of work that contains business
logic. Due to the logical separation, you can separately manage, maintain, and scale
"state machinery" of your application from the business logic. According to the
change in the business requirements, you can easily manage the business logic
without having to worry about the state machinery, task dispatch, and flow control.
 Flexible
Amazon SWF allows you to modify the application components, i.e., you can write
the application logic in any programming language and run it within the cloud or
on-premises.
Zookeeper is a distributed, open-source coordination service for distributed applications. It
exposes a simple set of primitives to implement higher-level services for synchronization,
configuration maintenance, and groups and naming.

Why do we need it?

 Coordination services: The integration/communication of services in a distributed environment.
 Coordination services are complex to get right. They are especially prone to errors such as race conditions and deadlocks.
 Race condition – two or more systems trying to perform the same task at the same time.
 Deadlock – two or more operations waiting for each other.
 To make coordination in distributed environments easy, developers came up with ZooKeeper, which relieves distributed applications of the responsibility of implementing coordination services from scratch.

What is distributed system?

 Multiple computer systems working on a single problem.


 It is a network that consists of autonomous computers that are connected using
distributed middleware.
 Key Features: Concurrent, resource sharing, independent, global, greater fault
tolerance, and price/performance ratio is much better.
 Key Goals: Transparency, Reliability, Performance, Scalability.
 Challenges: Security, Fault, Coordination, and resource sharing.

Coordination Challenge

 Why is coordination in a distributed system a hard problem?
 Coordination or configuration management for a distributed application that has many systems is difficult.
 There is a master node where the cluster data is stored.
 Worker nodes or slave nodes get the data from this master node.
 The master node is a single point of failure.
 Synchronization is not easy.
 Careful design and implementation are needed.

Apache Zookeeper

Apache Zookeeper is a distributed, open-source coordination service for distributed systems.
It provides a central place for distributed applications to store data, communicate with one
another, and coordinate activities. Zookeeper is used in distributed systems to coordinate
distributed processes and services. It provides a simple, tree-structured data model, a simple
API, and a distributed protocol to ensure data consistency and availability. Zookeeper is
designed to be highly reliable and fault-tolerant, and it can handle high levels of read and
write throughput.
Zookeeper is implemented in Java and is widely used in distributed systems, particularly in
the Hadoop ecosystem. It is an Apache Software Foundation project and is released under the
Apache License 2.0.
Zookeeper Architecture
Figure – ZooKeeper services

The ZooKeeper architecture consists of a hierarchy of nodes called znodes, organized in a
tree-like structure. Each znode can store data and has a set of permissions that control access
to the znode. The znodes are organized in a hierarchical namespace, similar to a file system.
At the root of the hierarchy is the root znode, and all other znodes are children of the root
znode. The hierarchy is similar to a file system hierarchy, where each znode can have
children and grandchildren, and so on.
Important Components in Zookeeper

 Leader & Follower
 Request Processor – Active in the Leader node and responsible for processing write requests. After processing, it sends the changes to the follower nodes.
 Atomic Broadcast – Present in both the Leader node and the Follower nodes. It is responsible for sending the changes to the other nodes.
 In-memory Databases (Replicated Databases) – Responsible for storing the data in ZooKeeper. Every node contains its own database. Data is also written to the file system, providing recoverability in case of any problems with the cluster.

Other Components

 Client – One of the nodes in our distributed application cluster. Access information
from the server. Every client sends a message to the server to let the server know that
client is alive.
 Server– Provides all the services to the client. Gives acknowledgment to the client.
 Ensemble– Group of Zookeeper servers. The minimum number of nodes that are
required to form an ensemble is 3.
Zookeeper Data Model
Figure – ZooKeeper data model

In Zookeeper, data is stored in a hierarchical namespace, similar to a file system. Each node
in the namespace is called a Znode, and it can store data and have children. Znodes are
similar to files and directories in a file system. Zookeeper provides a simple API for creating,
reading, writing, and deleting Znodes. It also provides mechanisms for detecting changes to
the data stored in Znodes, such as watches and triggers. Znodes maintain a stat structure that
includes: Version number, ACL, Timestamp, Data Length

Types of Znodes:

 Persistent: Alive until they're explicitly deleted.
 Ephemeral: Active only as long as the client connection is alive.
 Sequential: Either persistent or ephemeral. (A short creation sketch for each type follows.)
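
As a minimal sketch of these znode types with the ZooKeeper Java API (assuming a server is reachable at localhost:2181; the paths are just examples):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Creating persistent, ephemeral, and sequential znodes.
public class ZnodeTypes {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> { });

        // Persistent: stays until explicitly deleted.
        zk.create("/config", "v1".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Ephemeral: removed automatically when this client's session ends.
        zk.create("/workers", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/workers/worker-a", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Sequential: ZooKeeper appends a monotonically increasing counter to the name.
        zk.create("/tasks", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        String path = zk.create("/tasks/task-", new byte[0], Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println("Created " + path);   // e.g. /tasks/task-0000000000

        zk.close();
    }
}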

Why do we need ZooKeeper in the Hadoop?

Zookeeper is used to manage and coordinate the nodes in a Hadoop cluster, including the
NameNode, DataNode, and ResourceManager. In a Hadoop cluster, Zookeeper helps to:

 Maintain configuration information: Zookeeper stores the configuration information for the Hadoop cluster, including the location of the NameNode, DataNode, and ResourceManager.
 Manage the state of the cluster: Zookeeper tracks the state of the nodes in the Hadoop
cluster and can be used to detect when a node has failed or become unavailable.
 Coordinate distributed processes: Zookeeper can be used to coordinate distributed
processes, such as job scheduling and resource allocation, across the nodes in a
Hadoop cluster.

Zookeeper helps to ensure the availability and reliability of a Hadoop cluster by providing a
central coordination service for the nodes in the cluster.

How ZooKeeper in Hadoop Works?

ZooKeeper operates as a distributed file system and exposes a simple set of APIs that enable
clients to read and write data to the file system. It stores its data in a tree-like structure called
a znode, which can be thought of as a file or a directory in a traditional file system.
ZooKeeper uses a consensus algorithm to ensure that all of its servers have a consistent view
of the data stored in the Znodes. This means that if a client writes data to a znode, that data
will be replicated to all of the other servers in the ZooKeeper ensemble.

One important feature of ZooKeeper is its ability to support the notion of a “watch.” A watch
allows a client to register for notifications when the data stored in a znode changes. This can
be useful for monitoring changes to the data stored in ZooKeeper and reacting to those
changes in a distributed system.

In Hadoop, ZooKeeper is used for a variety of purposes, including:

 Storing configuration information: ZooKeeper is used to store configuration information that is shared by multiple Hadoop components. For example, it might be used to store the locations of NameNodes in a Hadoop cluster or the addresses of JobTracker nodes.
 Providing distributed synchronization: ZooKeeper is used to coordinate the activities
of various Hadoop components and ensure that they are working together in a
consistent manner. For example, it might be used to ensure that only one NameNode
is active at a time in a Hadoop cluster.
 Maintaining naming: ZooKeeper is used to maintain a centralized naming service for
Hadoop components. This can be useful for identifying and locating resources in a
distributed system.

ZooKeeper is an essential component of Hadoop and plays a crucial role in coordinating the
activity of its various subcomponents.

Reading and Writing in Apache Zookeeper

ZooKeeper provides a simple and reliable interface for reading and writing data. The data is
stored in a hierarchical namespace, similar to a file system, with nodes called znodes. Each
znode can store data and have children znodes. ZooKeeper clients can read and write data to
these znodes by using the getData() and setData() methods, respectively. Here is an example
of reading and writing data using the ZooKeeper Java API:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// The calls below throw IOException, KeeperException and InterruptedException,
// so in real code they must run inside a method that handles or declares them.

// Connect to the ZooKeeper ensemble (no-op default watcher)
ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> { });

// Write data to the znode "/myZnode"
String path = "/myZnode";
String data = "hello world";
zk.create(path, data.getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

// Read data from the znode "/myZnode"
byte[] bytes = zk.getData(path, false, null);
String readData = new String(bytes);

// prints "hello world"
System.out.println(readData);

// Close the connection to the ZooKeeper ensemble
zk.close();
Session and Watches

Session

 Requests in a session are executed in FIFO order.
 Once the session is established, a session id is assigned to the client.
 The client sends heartbeats to keep the session valid.
 The session timeout is usually represented in milliseconds.

Watches

 Watches are mechanisms for clients to get notifications about the changes in the
Zookeeper
 Client can watch while reading a particular znode.
 Znodes changes are modifications of data associated with the znodes or changes in the
znode’s children.
 Watches are triggered only once.
 If the session is expired, watches are also removed.
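
Building on these watch semantics, here is a small illustrative example of setting a watch while reading a znode with the ZooKeeper Java API. It assumes the "/myZnode" znode created earlier exists on localhost:2181; because watches fire only once, a real client would re-register the watch inside the callback.

import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

// Read a znode and register a one-shot watch on its data.
public class WatchExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> { });

        byte[] data = zk.getData("/myZnode", event -> {
            if (event.getType() == EventType.NodeDataChanged) {
                System.out.println("Data of " + event.getPath() + " changed");
            }
        }, null);
        System.out.println("Current value: " + new String(data));

        Thread.sleep(60_000);   // keep the session alive so a notification can arrive
        zk.close();
    }
}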

Use cases of Apache Zookeeper

 Configuration management: ZooKeeper can store and manage configuration information for distributed systems, making it easy to update and maintain configuration data.
 Leader election: ZooKeeper can be used to elect a leader among a group of nodes in a distributed system. This is useful for tasks such as load balancing or task scheduling (a minimal election sketch follows this list).
 Group membership: ZooKeeper can be used to track the membership of a group of
nodes in a distributed system. This can be used to implement failover or other
reliability features.
 Synchronization: ZooKeeper can be used to synchronize access to shared resources in
a distributed system.
 Dependency management: ZooKeeper can be used to manage dependencies between
tasks in a distributed system, ensuring that tasks are completed in the correct order.
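
As promised above, here is a simplified sketch of the classic leader-election recipe using ephemeral sequential znodes: every candidate creates a znode under /election, and the candidate whose znode has the smallest sequence number is the leader. A complete recipe would also watch the predecessor znode to react when the current leader fails; the connection string and paths here are assumptions.

import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

// Minimal leader-election sketch built on ephemeral sequential znodes.
public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> { });

        // Parent node for the election; ignore the error if it already exists.
        try {
            zk.create("/election", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } catch (KeeperException.NodeExistsException ignored) { }

        // This candidate's znode, e.g. /election/n_0000000003.
        String myZnode = zk.create("/election/n_", new byte[0], Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL_SEQUENTIAL);

        // The candidate holding the smallest sequence number leads.
        List<String> children = zk.getChildren("/election", false);
        Collections.sort(children);
        boolean leader = myZnode.endsWith(children.get(0));
        System.out.println(leader ? "I am the leader" : "I am a follower");
    }
}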

Popular applications/companies using Zookeeper

 Hadoop: Zookeeper is used by the Hadoop distributed file system (HDFS) to manage
the coordination of data processing tasks and metadata.
 Apache Kafka: Zookeeper is used by Apache Kafka, a distributed streaming platform,
to manage the coordination of Kafka brokers and consumer groups.
 Apache Storm: Zookeeper is used by Apache Storm, a distributed real-time
processing system, to manage the coordination of worker processes and task
assignments.
 LinkedIn: Zookeeper is used by LinkedIn, the social networking site, to manage the
coordination of distributed systems and services.
 Yahoo!: Zookeeper is used by Yahoo!, the internet company, to manage the
coordination of distributed systems and services.
MapReduce

MapReduce and HDFS are the two major components of Hadoop which make it so powerful
and efficient to use. MapReduce is a programming model used for efficient processing in
parallel over large data-sets in a distributed manner. The data is first split and then combined
to produce the final result. Libraries for MapReduce are available in many programming
languages, with various different optimizations. The purpose of MapReduce in Hadoop is to
map each job and then reduce it to equivalent tasks, providing less overhead over the cluster
network and reducing the processing power needed. The MapReduce task is mainly divided
into two phases, the Map phase and the Reduce phase.

MapReduce Architecture:

Components of MapReduce Architecture:

1. Client: The MapReduce client is the one who brings the Job to the MapReduce for
processing. There can be multiple clients available that continuously send jobs for
processing to the Hadoop MapReduce Manager.
2. Job: The MapReduce Job is the actual work that the client wanted to do which is
comprised of so many smaller tasks that the client wants to process or execute.
3. Hadoop MapReduce Master: It divides the particular job into subsequent job-parts.
4. Job-Parts: The task or sub-jobs that are obtained after dividing the main job. The
result of all the job-parts combined to produce the final output.
5. Input Data: The data set that is fed to the MapReduce for processing.
6. Output Data: The final result is obtained after the processing.

In MapReduce, we have a client. The client will submit the job of a particular size to the
Hadoop MapReduce Master. Now, the MapReduce master will divide this job into further
equivalent job-parts. These job-parts are then made available for the Map and Reduce Task.
This Map and Reduce task will contain the program as per the requirement of the use-case
that the particular company is solving. The developer writes their logic to fulfill the
requirement that the industry requires. The input data which we are using is then fed to the
Map Task and the Map will generate intermediate key-value pair as its output. The output of
Map i.e. these key-value pairs are then fed to the Reducer and the final output is stored on the
HDFS. There can be n number of Map and Reduce tasks made available for processing the
data as per the requirement. The algorithm for Map and Reduce is made with a very
optimized way such that the time complexity or space complexity is minimum.

Let’s discuss the MapReduce phases to get a better understanding of its architecture:

The MapReduce task is mainly divided into 2 phases, i.e., the Map phase and the Reduce phase; a minimal word-count sketch in the Hadoop Java API follows them.

1. Map: As the name suggests its main use is to map the input data in key-value pairs.
The input to the map may be a key-value pair where the key can be the id of some
kind of address and value is the actual value that it keeps. The Map() function will be
executed in its memory repository on each of these input key-value pairs and
generates the intermediate key-value pair which works as input for the Reducer or
Reduce() function.

2. Reduce: The intermediate key-value pairs that work as input for the Reducer are shuffled,
sorted, and sent to the Reduce() function. The Reducer aggregates or groups the data based
on its key-value pair as per the reducer algorithm written by the developer.
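
To make the two phases concrete, here is a minimal word-count sketch using the Hadoop MapReduce Java API: the Mapper emits (word, 1) pairs, and after the shuffle the Reducer sums the counts for each word. A complete program would also configure a Job driver with input and output paths; the class names here are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: one (word, 1) pair per token of the input line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);             // intermediate key-value pair
            }
        }
    }

    // Reduce phase: sum all counts received for the same word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum)); // final (word, total) pair
        }
    }
}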

How Job tracker and the task tracker deal with MapReduce:

1. Job Tracker: The work of Job tracker is to manage all the resources and all the jobs
across the cluster and also to schedule each map on the Task Tracker running on the
same data node since there can be hundreds of data nodes available in the cluster.

2. Task Tracker: The Task Tracker can be considered as the actual slaves that are
working on the instruction given by the Job Tracker. This Task Tracker is deployed
on each of the nodes available in the cluster that executes the Map and Reduce task as
instructed by Job Tracker.

There is also one important component of Map Reduce Architecture known as Job History
Server. The Job History Server is a daemon process that saves and stores historical
information about the task or application, like the logs which are generated during or after the
job execution are stored on Job History Server.
MapReduce is a programming model used to perform distributed processing in parallel in a
Hadoop cluster, which makes Hadoop work so fast. When you are dealing with Big Data,
serial processing is no longer of any use. MapReduce has mainly two tasks which are divided
phase-wise:

 Map Task
 Reduce Task

Let us understand it with a real-time example, and the example helps you understand
Mapreduce Programming Model in a story manner:

 Suppose the Indian government has assigned you the task of counting the population of India. You can demand all the resources you want, but you have to do this task in 4 months. Calculating the population of such a large country is not an easy task for a single person (you). So what will be your approach?
 One of the ways to solve this problem is to divide the country by states and assign
individual in-charge to each state to count the population of that state.
 Task Of Each Individual: Each Individual has to visit every home present in the state
and need to keep a record of each house members as:
 State_Name Member_House1
 State_Name Member_House2
 State_Name Member_House3

 .
 .
 State_Name Member_House n
 .
 .

For Simplicity, we have taken only three states.


This is a simple Divide and Conquer approach and will be followed by each
individual to count people in his/her state.

 Once they have counted each house member in their respective state, they need to sum up their results and send them to the headquarters at New Delhi.
 We have a trained officer at the headquarters to receive all the results from each state and aggregate them by state to get the population of each state. Now, with this approach, you are easily able to count the population of India by summing up the results obtained at the headquarters.
 The Indian Govt. is happy with your work and the next year they asked you to do the
same job in 2 months instead of 4 months. Again you will be provided with all the
resources you want.
 Since the Govt. has provided you with all the resources, you will simply double the number of assigned individual in-charges for each state from one to two. For that, divide each state into 2 divisions and assign a different in-charge for each of these two divisions as:
 State_Name_Incharge_division1
 State_Name_Incharge_division2
 Similarly, each individual in charge of its division will gather the information about
members from each house and keep its record.
 We can also do the same thing at the headquarters, so let's also divide the headquarters into two divisions as:
 Head-quarter_Division1
 Head-quarter_Division2
 Now with this approach, you can find the population of India in two months. But there is a small problem with this: we never want the divisions of the same state to send their results to different headquarters. In that case, we would have the partial population of that state in Head-quarter_Division1 and Head-quarter_Division2, which is inconsistent because we want the consolidated population by state, not partial counts.
 One easy way to solve this is to instruct all individuals of a state to send their result either to Head-quarter_Division1 or to Head-quarter_Division2. Similarly, for all the states.
 Our problem has been solved, and you successfully did it in two months.
 Now, if they ask you to do this process in a month, you know how to approach the
solution.
 Great, now we have a good scalable model that works so well. The model we have
seen in this example is like the Map Reduce Programming model. so now you must be
aware that Map Reduce is a programming model, not a programming language.
Now let’s discuss the phases and important things involved in our model.

1. Map Phase: The Phase where the individual in-charges are collecting the population of
each house in their division is Map Phase.

 Mapper: The individual in-charges calculating the population
 Input Splits: The state or the division of the state
 Key-Value Pair: Output from each individual Mapper, where, for example, the key is Rajasthan and the value is 2

2. Reduce Phase: The Phase where you are aggregating your result

 Reducers: Individuals who aggregate the actual result; here in our example, the trained officers. Each Reducer produces its output as a key-value pair.

3. Shuffle Phase: The phase where the data is copied from Mappers to Reducers is the Shuffle phase. It comes in between the Map and Reduce phases. The Map Phase, Reduce Phase, and Shuffle Phase are the three main phases of our MapReduce.
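
The story above maps directly onto code. The following plain-Java sketch (with made-up sample records) shows the same idea: the "map" step turns each house record into a (state, members) pair, grouping by state plays the role of the shuffle, and summing per state is the reduce.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Population counting expressed as map / shuffle (group by key) / reduce (sum).
public class PopulationCount {
    record House(String state, int members) { }

    public static void main(String[] args) {
        List<House> houses = List.of(
                new House("Rajasthan", 4), new House("Rajasthan", 2),
                new House("Kerala", 3),    new House("Punjab", 5));

        Map<String, Integer> populationByState = houses.stream()
                .collect(Collectors.groupingBy(House::state,          // shuffle by key
                         Collectors.summingInt(House::members)));     // reduce per key

        populationByState.forEach((state, total) ->
                System.out.println(state + " -> " + total));
    }
}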

High performance Computing

It is the use of parallel processing for running advanced application programs efficiently,
reliably, and quickly. The term applies especially to systems that function above a teraflop
(10^12 floating-point operations per second). The term high-performance computing is
occasionally used as a synonym for supercomputing, although technically a supercomputer is
a system that performs at or near the currently highest operational rate for computers. Some
supercomputers work at more than a petaflop (10^15 floating-point operations per second).
The most common users of HPC systems are scientific researchers, engineers, and academic
institutions. Some government agencies, particularly the military, also rely on HPC for
complex applications.
High-performance Computers:

High Performance Computing (HPC) generally refers to the practice of combining computing
power to deliver far greater performance than a typical desktop or workstation, in order to
solve complex problems in science, engineering, and business.

Processors, memory, disks, and the OS are the elements of high-performance computers. Of
interest to small and medium size businesses today are really clusters of computers. Each
individual computer in a commonly configured small cluster has between one and four
processors, and today's processors typically have from 2 to 4 cores; HPC people often refer
to the individual computers in a cluster as nodes. A cluster of interest to a small business
could have as few as 4 nodes, or 16 cores. A common cluster size in many businesses is
between 16 and 64 nodes, or from 64 to 256 cores. The main reason to use this is that the
individual nodes can work together to solve a problem larger than any one computer can
easily solve. These nodes are connected so that they can communicate with each other in
order to produce some meaningful work. There are two popular HPC operating systems, i.e.,
Linux and Windows. Most installations are on Linux because of Linux's legacy in
supercomputing and large-scale machines. But one can use either, according to his/her
requirements.

Importance of High performance Computing :

1. It is used for scientific discoveries, game-changing innovations, and to improve quality of life.
2. It is a foundation for scientific and industrial advancements.
3. As technologies like IoT, AI, and 3-D imaging evolve, the amount of data used by organizations increases exponentially; to increase the ability of a computer to handle it, we use high-performance computing.
4. HPC is used to solve complex modelling problems in a spectrum of disciplines. It
includes AI, Nuclear Physics, Climate Modelling, etc.
5. HPC is applied to business uses, data warehouses & transaction processing.

Need of High performance Computing:

1. It will complete a time-consuming operation in less time.
2. It will complete an operation under a tight deadline and perform a high number of operations per second.
3. It is fast computing; we can compute in parallel over a lot of computation elements (CPU, GPU, etc.), connected by a very fast network.

Need of ever increasing Performance:

1. Climate modelling
2. Drug discovery
3. Data Analysis
4. Protein folding
5. Energy research

How Does HPC Work?

User/Scheduler → Compute cluster → Data storage


To create a high-performance computing architecture, multiple computer servers are
networked together to form a compute cluster. Algorithms and software programs are
executed simultaneously on the servers, and the cluster is networked to data storage to
retrieve the results. All of these components work together to complete a diverse set of tasks.

To achieve maximum efficiency, each module must keep pace with others, otherwise, the
performance of the entire HPC infrastructure would suffer.
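
As a toy, single-machine illustration of the idea of splitting one large computation across many workers (a real HPC cluster does the same across many networked nodes), the sketch below sums the range 1..100,000,000 in chunks on a thread pool and then gathers the partial results; all numbers are arbitrary.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Split a big sum into chunks, compute them in parallel, then combine.
public class ParallelSum {
    public static void main(String[] args) throws Exception {
        long n = 100_000_000L;
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        long chunk = n / workers;
        List<Future<Long>> parts = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            long start = i * chunk + 1;
            long end = (i == workers - 1) ? n : (i + 1) * chunk;
            parts.add(pool.submit(() -> {
                long sum = 0;
                for (long x = start; x <= end; x++) sum += x;
                return sum;
            }));
        }

        long total = 0;
        for (Future<Long> part : parts) total += part.get();  // gather partial results
        pool.shutdown();

        System.out.println("Sum = " + total);                  // expect n*(n+1)/2
    }
}
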
UNIT III
Cloud Resource virtualization: Virtualization, layering and virtualization, virtual machine
monitors, virtual machines, virtualization- full and para, performance and security isolation,
hardware support for virtualization, Case Study: Xen, vBlades,

Cloud Resource Management and Scheduling: Policies and Mechanisms, Applications of


control theory to task scheduling, Stability of a two-level resource allocation architecture,
feedback control based on dynamic thresholds, coordination, resource bundling, scheduling
algorithms, fair queuing, start time fair queuing, cloud scheduling subject to deadlines,
Scheduling Map Reduce applications, Resource management and dynamic application
scaling

Virtualization is the "creation of a virtual (rather than actual) version of something, such as a
server, a desktop, a storage device, an operating system or network resources".

In other words, virtualization is a technique which allows sharing a single physical instance
of a resource or an application among multiple customers and organizations. It does so by
assigning a logical name to a physical resource and providing a pointer to that physical
resource when demanded.
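
A toy Java sketch of that "logical name plus pointer" idea is shown below; the class, volume name, and device paths are made up and only illustrate the mapping, not any real virtualization product.

import java.util.HashMap;
import java.util.Map;

// Clients use a logical name; a mapping layer resolves it to whatever
// physical resource currently backs it.
public class LogicalStorageMap {
    private final Map<String, String> logicalToPhysical = new HashMap<>();

    public void attach(String logicalName, String physicalDevice) {
        logicalToPhysical.put(logicalName, physicalDevice);
    }

    public String resolve(String logicalName) {
        return logicalToPhysical.get(logicalName);   // the "pointer" to the physical resource
    }

    public static void main(String[] args) {
        LogicalStorageMap volumes = new LogicalStorageMap();
        volumes.attach("volume-A", "/dev/array1/disk7");
        System.out.println("volume-A is served by " + volumes.resolve("volume-A"));

        // The backing device can change without the logical name changing.
        volumes.attach("volume-A", "/dev/array2/disk3");
        System.out.println("volume-A is now served by " + volumes.resolve("volume-A"));
    }
}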

What is the concept behind the Virtualization?

Creation of a virtual machine over existing operating system and hardware is known as
Hardware Virtualization. A Virtual machine provides an environment that is logically
separated from the underlying hardware.

The machine on which the virtual machine is going to be created is known as the Host Machine, and
that virtual machine is referred to as a Guest Machine.

Types of Virtualization:

1. Hardware Virtualization.
2. Operating system Virtualization.
3. Server Virtualization.
4. Storage Virtualization.

1) Hardware Virtualization:

When the virtual machine software or virtual machine manager (VMM) is directly installed
on the hardware system, it is known as hardware virtualization.

The main job of the hypervisor is to control and monitor the processor, memory and other
hardware resources.

After virtualization of the hardware system, we can install different operating systems on it and run
different applications on those OSes.

Usage:
Hardware virtualization is mainly done for the server platforms, because controlling virtual
machines is much easier than controlling a physical server.

2) Operating System Virtualization:

When the virtual machine software or virtual machine manager (VMM) is installed on the
Host operating system instead of directly on the hardware system, it is known as operating
system virtualization.

Usage:

Operating System Virtualization is mainly used for testing the applications on different
platforms of OS.

3) Server Virtualization:

When the virtual machine software or virtual machine manager (VMM) is directly installed
on the Server system, it is known as server virtualization.

Usage:

Server virtualization is done because a single physical server can be divided into multiple
servers on the demand basis and for balancing the load.

4) Storage Virtualization:

Storage virtualization is the process of grouping the physical storage from multiple network
storage devices so that it looks like a single storage device.

Storage virtualization is also implemented by using software applications.

Usage:

Storage virtualization is mainly done for back-up and recovery purposes.

How does virtualization work in cloud computing?

Virtualization plays a very important role in cloud computing technology. Normally, in
cloud computing, users share the data present in the cloud, such as applications, but
with the help of virtualization users actually share the infrastructure.

The main usage of virtualization technology is to provide applications in their standard
versions to the cloud users; suppose the next version of that application is released, then
the cloud provider has to provide the latest version to their cloud users, and practically
this is difficult because it is more expensive.

To overcome this problem, we basically use virtualization technology. By using virtualization,
all the servers and the software applications which are required by other cloud providers are
maintained by third-party people, and the cloud providers have to pay the money on a
monthly or annual basis.

Conclusion

Mainly, virtualization means running multiple operating systems on a single machine while
sharing all the hardware resources. It helps us to provide a pool of IT resources so that
we can share these IT resources in order to get benefits in the business.

Data Virtualization

Data virtualization is the process of retrieving data from various resources without knowing its
type or the physical location where it is stored. It collects heterogeneous data from different
resources and allows data users across the organization to access this data according to their
work requirements. This heterogeneous data can be accessed using any application such as
web portals, web services, E-commerce, Software as a Service (SaaS), and mobile
application.

We can use Data Virtualization in the field of data integration, business intelligence, and
cloud computing.

Advantages of Data Virtualization

There are the following advantages of data virtualization -

 It allows users to access the data without worrying about where it resides on the
memory.
 It offers better customer satisfaction, retention, and revenue growth.
 It provides various security mechanism that allows users to safely store their personal
and professional information.
 It reduces costs by removing data replication.
 It provides a user-friendly interface to develop customized views.
 It provides various simple and fast deployment resources.
 It increases business user efficiency by providing data in real-time.
 It is used to perform tasks such as data integration, business integration, Service-
Oriented Architecture (SOA) data services, and enterprise search.

Disadvantages of Data Virtualization

 It creates availability issues, because availability is maintained by third-party providers.
 It requires a high implementation cost.
 It creates availability and scalability issues.
 Although it saves time during the implementation phase of virtualization, it consumes more time to generate the appropriate result.

Uses of Data Virtualization

There are the following uses of Data Virtualization -

1. Analyze performance

Data virtualization is used to analyze the performance of the organization compared to


previous years.

2. Search and discover interrelated data

Data Virtualization (DV) provides a mechanism to easily search the data which is similar and
internally related to each other.

3. Agile Business Intelligence

It is one of the most common uses of Data Virtualization. It is used in agile reporting, real-
time dashboards that require timely aggregation, analyze and present the relevant data from
multiple resources. Both individuals and managers use this to monitor performance, which
helps to make daily operational decision processes such as sales, support, finance, logistics,
legal, and compliance.

4. Data Management

Data virtualization provides a secure centralized layer to search, discover, and govern the
unified data and its relationships.

Data Virtualization Tools

There are the following Data Virtualization tools -

1. Red Hat JBoss data virtualization

Red Hat virtualization is the best choice for developers and those who are using microservices
and containers. It is written in Java.
2. TIBCO data virtualization

TIBCO helps administrators and users to create a data virtualization platform for accessing
the multiple data sources and data sets. It provides a built-in transformation engine to
combine non-relational and unstructured data sources.

3. Oracle data service integrator

It is a very popular and powerful data integration tool which mainly works with Oracle
products. It allows organizations to quickly develop and manage data services to access a
single view of data.

4. SAS Federation Server

SAS Federation Server provides various technologies such as scalable, multi-user, and
standards-based data access to access data from multiple data services. It mainly focuses on
securing data.

5. Denodo

Denodo is one of the best data virtualization tools which allows organizations to minimize the
network traffic load and improve response time for large data sets. It is suitable for both small
as well as large organizations.

Industries that use Data Virtualization

 Communication & Technology


In Communication & Technology industry, data virtualization is used to increase
revenue per customer, create a real-time ODS for marketing, manage customers,
improve customer insights, and optimize customer care, etc.
 Finance
In the field of finance, DV is used to improve trade reconciliation, empowering data
democracy, addressing data complexity, and managing fixed-risk income.
 Government
In the government sector, DV is used for protecting the environment.
 Healthcare
Data virtualization plays a very important role in the field of healthcare. In healthcare,
DV helps to improve patient care, drive new product innovation, accelerating M&A
synergies, and provide more efficient claims analysis.
 Manufacturing
In manufacturing industry, data virtualization is used to optimize a global supply
chain, optimize factories, and improve IT assets utilization.

Virtual Machine abstracts the hardware of our personal computer such as CPU, disk drives,
memory, NIC (Network Interface Card) etc, into many different execution environments as
per our requirements, hence giving us a feel that each execution environment is a single
computer. For example, VirtualBox.
When we run different processes on an operating system, it creates an illusion that each
process is running on a different processor having its own virtual memory, with the help of
CPU scheduling and virtual-memory techniques. There are additional features of a process
that cannot be provided by the hardware alone like system calls and a file system. The virtual
machine approach does not provide these additional functionalities but it only provides an
interface that is same as basic hardware. Each process is provided with a virtual copy of the
underlying computer system.

We can create a virtual machine for several reasons, all of which are fundamentally related to
the ability to share the same basic hardware yet can also support different execution
environments, i.e., different operating systems simultaneously.

The main drawback with the virtual-machine approach involves disk systems. Let us suppose
that the physical machine has only three disk drives but wants to support seven virtual
machines. Obviously, it cannot allocate a disk drive to each virtual machine, because virtual-
machine software itself will need substantial disk space to provide virtual memory and
spooling. The solution is to provide virtual disks.

Users are thus given their own virtual machines. After which they can run any of the
operating systems or software packages that are available on the underlying machine. The
virtual-machine software is concerned with multi-programming multiple virtual machines
onto a physical machine, but it does not need to consider any user-support software. This
arrangement can provide a useful way to divide the problem of designing a multi-user
interactive system, into two smaller pieces.

Advantages:

1. There are no protection problems because each virtual machine is completely isolated
from all other virtual machines.
2. Virtual machine can provide an instruction set architecture that differs from real
computers.
3. Easy maintenance, availability and convenient recovery.

Disadvantages:

1. When multiple virtual machines are simultaneously running on a host computer, one
virtual machine can be affected by other running virtual machines, depending on the
workload.
2. Virtual machines are not as efficient as a real one when accessing the hardware.

Types of Virtual Machines : You can classify virtual machines into two types:

1. System Virtual Machine: These types of virtual machines give us a complete system
platform and support the execution of a complete virtual operating system. Just like
VirtualBox, a system virtual machine provides an environment for an OS to be installed
completely. We can see in the below image that the hardware of the Real Machine is being
distributed between two simulated operating systems by the Virtual Machine Monitor. Then
some programs and processes run on that distributed hardware of the simulated machines
separately.

2. Process Virtual Machine: A process virtual machine, unlike a system virtual
machine, does not provide the facility to install a virtual operating system
completely. Rather, it creates a virtual environment of that OS while an app or
program is running, and this environment is destroyed as soon as we exit from that app.
As in the below image, there are some apps running on the main OS as well as some virtual
machines created to run other apps. This shows that when those programs require a
different OS, the process virtual machine provides them with that for the time those
programs are running. Example – Wine software in Linux helps to run Windows
applications.

Virtual Machine Language: It's a type of language which can be understood by different
operating systems. It is platform-independent. Just as to run any programming language (C,
Python, or Java) we need a specific compiler that converts the code into system-understandable
code (also known as byte code), virtual machine language works the same way. If
we want code that can be executed on different types of operating systems like
Windows, Linux, etc., then a virtual machine language will be helpful.
1. Full Virtualization: Full Virtualization was introduced by IBM in the year 1966. It is the
first software solution for server virtualization and uses binary translation and direct approach
techniques. In full virtualization, guest OS is completely isolated by the virtual machine from
the virtualization layer and hardware. Microsoft and Parallels systems are examples of full

virtualization.

2. Paravirtualization: Paravirtualization is the category of CPU virtualization which uses
hypercalls for operations to handle instructions at compile time. In paravirtualization, the guest
OS is not completely isolated, but it is partially isolated by the virtual machine from the
virtualization layer and hardware. VMware and Xen are some examples of paravirtualization.
The differences between Full Virtualization and Paravirtualization are as follows:

S.No. | Full Virtualization | Paravirtualization

1. In full virtualization, virtual machines permit the execution of the instructions with the running of an unmodified OS in an entirely isolated way. | In paravirtualization, a virtual machine does not implement full isolation of the OS but rather provides a different API which is utilized when the OS is subjected to alteration.

2. Full virtualization is less secure. | Paravirtualization is more secure than full virtualization.

3. Full virtualization uses binary translation and a direct approach as a technique for operations. | Paravirtualization uses hypercalls at compile time for operations.

4. Full virtualization is slower than paravirtualization in operation. | Paravirtualization is faster in operation as compared to full virtualization.

5. Full virtualization is more portable and compatible. | Paravirtualization is less portable and compatible.

6. Examples of full virtualization are Microsoft and Parallels systems. | Examples of paravirtualization are Microsoft Hyper-V, Citrix Xen, etc.

7. It supports all guest operating systems without modification. | The guest operating system has to be modified and only a few operating systems support it.

8. The guest operating system will issue hardware calls. | Using the drivers, the guest operating system will directly communicate with the hypervisor.

9. It is less streamlined compared to paravirtualization. | It is more streamlined.

10. It provides the best isolation. | It provides less isolation compared to full virtualization.
Performance And Security Isolation
 The run-time behavior of an application is affected by other applications running
concurrently on the same platform and competing for CPU cycles, cache, main memory, disk
and network access. Thus, it is difficult to predict the completion time!
 Performance isolation - a critical condition for QoS guarantees in shared computing
environments.
 A VMM is a much simpler and better specified system than a traditional operating system.
Example - Xen has approximately 60,000 lines of code; Denali has only about half, 30,000.
 The security vulnerability of VMMs is considerably reduced as the systems expose a much
smaller number of privileged functions

Hardware Virtualization

Previously, there was a "one-to-one relationship" between physical servers and operating
systems, even though only low CPU, memory, and networking capacity was required. So, by
using this model, the costs of doing business increased. The physical space, amount of power,
and hardware required meant that costs were adding up.

The hypervisor manages the shared physical resources of the hardware between the guest
operating systems and the host operating system. The physical resources become abstracted
versions in standard formats regardless of the hardware platform. The abstracted hardware is
represented as actual hardware. Then the virtualized operating system sees these
resources as if they were physical entities.

Virtualization means abstraction. Hardware virtualization is accomplished by abstracting


the physical hardware layer by use of a hypervisor or VMM (Virtual Machine Monitor).

When the virtual machine software or virtual machine manager (VMM) or hypervisor
software is directly installed on the hardware system, it is known as hardware virtualization.

The main job of the hypervisor is to control and monitor the processor, memory and other
hardware resources.

After virtualization of the hardware system, we can install different operating systems on it and run
different applications on those OSes.

Usage of Hardware Virtualization

Hardware virtualization is mainly done for the server platforms, because controlling virtual
machines is much easier than controlling a physical server.

Advantages of Hardware Virtualization

The main benefits of hardware virtualization are more efficient resource utilization, lower
overall costs as well as increased uptime and IT flexibility.

1) More Efficient Resource Utilization:

Physical resources can be shared among virtual machines. Unused resources allocated to one
virtual machine can be used by other virtual machines if the need exists.

2) Lower Overall Costs Because Of Server Consolidation:

Now it is possible for multiple operating systems to co-exist on a single hardware platform,
so that the number of servers, rack space, and power consumption drops significantly.

3) Increased Uptime Because Of Advanced Hardware Virtualization Features:

The modern hypervisors provide highly orchestrated operations that maximize the abstraction
of the hardware and help to ensure the maximum uptime. These functions help to migrate a
running virtual machine from one host to another dynamically, as well as maintain a running
copy of virtual machine on another physical host in case the primary host fails.

4) Increased IT Flexibility:

Hardware virtualization helps with quick deployment of server resources in managed and
consistent ways. That results in IT being able to adapt quickly and provide the business with
the resources needed in good time.
Xen is an open-source hypervisor based on paravirtualization. It is the most popular
application of paravirtualization. Xen has been extended to be compatible with full virtualization
using hardware-assisted virtualization. It enables high performance when executing guest operating
systems. This is done by avoiding the performance loss while executing instructions that require
significant handling, and by modifying the portions of the guest operating system executed by Xen
with respect to the execution of such instructions. Hence it especially supports x86, which is the
most used architecture on commodity machines and servers.

Xen network architecture: (a) original; (b) optimized


Figure – Xen Architecture and Guest OS Management

Above figure describes the Xen Architecture and its mapping onto a classic x86 privilege
model. A Xen based system is handled by Xen hypervisor, which is executed in the most
privileged mode and maintains the access of guest operating system to the basic hardware.
Guest operating systems run inside domains, which represent virtual machine
instances.

In addition, a particular control software, which has privileged access to the host and handles
all other guest OSes, runs in a special domain called Domain 0. This is the only one loaded once
the virtual machine manager has fully booted, and it hosts an HTTP server that serves
requests for virtual machine creation, configuration, and termination. This component
establishes the primary version of a shared virtual machine manager (VMM), which is a
necessary part of a cloud computing system delivering an Infrastructure-as-a-Service (IaaS)
solution.

Various x86 implementation support four distinct security levels, termed as rings, i.e.,

Ring 0,
Ring 1,
Ring 2,
Ring 3

Here, Ring 0 represents the level having the most privilege and Ring 3 represents the level
having the least privilege. Almost all frequently used operating systems, except for OS/2, use
only two levels, i.e., Ring 0 for the kernel code and Ring 3 for user applications and non-
privileged OS programs. This gives Xen a chance to implement paravirtualization. It enables
Xen to keep the Application Binary Interface (ABI) unchanged, thus allowing a simple shift
to Xen-virtualized solutions from an application perspective.
Due to the structure of the x86 instruction set, some instructions allow code executing in Ring 3
to switch to Ring 0 (kernel mode). Such an operation is done at the hardware level, and hence
within a virtualized environment it will lead to a TRAP or a silent fault, thus preventing the
general operation of the guest OS, as the guest OS is now running in Ring 1.

This condition is basically caused by a subset of system calls. To eliminate this situation, the
operating system implementation requires modification, and all the sensitive system calls
need to be re-implemented with hypercalls. Here, hypercalls are the special calls exposed by
the virtual machine (VM) interface of Xen; by using them, the Xen hypervisor catches the
execution of all the sensitive instructions, manages them, and returns the control to the guest
OS with the help of a supplied handler.

Paravirtualization demands that the OS codebase be changed, and hence not all operating systems
can be used as guest OSes in a Xen-based environment. This is the case where hardware-assisted
virtualization is not available, which would otherwise enable running the hypervisor in a mode more
privileged than Ring 0 and the guest OS in Ring 0. Hence, Xen shows some limitations in terms of
legacy hardware and in terms of legacy OSes.

In fact, these cannot be modified to run safely in Ring 1, as their codebase is not
accessible, and, at the same time, the underlying hardware has no support to execute them in a
mode more privileged than Ring 0. Open-source OSes like Linux can be simply modified, as their
code is openly available, and Xen delivers full support for their virtualization, while components of
Windows are basically not compatible with Xen unless hardware-assisted virtualization is
available. As new releases of OSes are designed to be virtualized, the problem is getting
resolved, and new hardware supports x86 virtualization.

Pros:

 a) Xen server is developed over open-source Xen hypervisor and it uses a


combination of hardware-based virtualization and paravirtualization. This tightly
coupled collaboration between the operating system and virtualized platform enables
the system to develop lighter and flexible hypervisor that delivers their functionalities
in an optimized manner.
 b) Xen supports balancing large workloads efficiently, capturing CPU, memory, disk input-output and network input-output data. It offers two modes to handle this workload: performance enhancement and handling data density.
 c) It also comes equipped with a special storage feature that we call Citrix StorageLink, which allows a system administrator to use the features of storage arrays from giant companies – HP, NetApp, Dell EqualLogic, etc.
 d) It also supports multiple processors, live migration from one machine to another, physical-server-to-virtual-machine or virtual-server-to-virtual-machine conversion tools, centralized multi-server management, and real-time performance monitoring over Windows and Linux.

Cons:

 a) Xen is more reliable on Linux than on Windows.
 b) Xen relies on third-party components to manage resources like drivers, storage, backup, recovery & fault tolerance.
 c) Xen deployment could become burdensome on your Linux kernel system as time passes.
 d) Xen may sometimes cause an increase in load on your resources due to a high input-output rate and may cause starvation of other VMs.

vBlades – Paravirtualization of the Itanium (IA-64) Processor

 The goal of the project was to create a VMM for the Itanium family of IA-64 processors, developed jointly by HP and Intel.

 Itanium is based on explicitly parallel instruction computing (EPIC), which allows the processor to execute multiple instructions in each clock cycle. EPIC implements a form of Very Long Instruction Word (VLIW) architecture; a single instruction word contains multiple instructions.

 A 128-bit instruction word contains three instructions; the fetch mechanism can read
up to two instruction words per clock from the L1 cache into the pipeline.

 The hardware supports 64-bit addressing; it has 32 64-bit general-purpose registers, R0 - R31, and 96 automatically renumbered registers, R32 - R127, used by procedure calls. When a procedure is entered, the alloc instruction specifies the registers the procedure could access by setting the bits of a 7-bit field that controls the register usage; an illegal read operation from such a register out of range returns a zero value, while an illegal write operation to it is trapped as an illegal instruction.
UNIT IV
Storage Systems: Evolution of storage technology, storage models, file systems and
database, distributed file systems, general parallel file systems. Google file system. Apache
Hadoop, Big Table, Megastore (text book 1), Amazon Simple Storage Service(S3) (Text
book 2), Cloud Security: Cloud security risks, security – a top concern for cloud users,
privacy and privacy

Cloud computing is all about renting computing services. This idea first came in the 1950s. In
making cloud computing what it is today, five technologies played a vital role. These are
distributed systems and its peripherals, virtualization, web 2.0, service orientation, and utility
computing.

 Distributed Systems:
It is a composition of multiple independent systems but all of them are depicted as a
single entity to the users. The purpose of distributed systems is to share resources and
also use them effectively and efficiently. Distributed systems possess characteristics
such as scalability, concurrency, continuous availability, heterogeneity, and
independence in failures. But the main problem with this system was that all the
systems were required to be present at the same geographical location. Thus to solve
this problem, distributed computing led to three more types of computing and they
were-Mainframe computing, cluster computing, and grid computing.

 Mainframe computing:
Mainframes, which first came into existence in 1951, are highly powerful and reliable
computing machines. They are responsible for handling large volumes of data and massive
input-output operations. Even today they are used for bulk-processing tasks such as online
transactions. These systems have almost no downtime and high fault tolerance. After
distributed computing, mainframes increased the processing capability of systems, but they
were very expensive. To reduce this cost, cluster computing came as an alternative to
mainframe technology.

 Cluster computing:
In the 1980s, cluster computing came as an alternative to mainframe computing. The machines
in a cluster were connected to each other by a high-bandwidth network.
These were way cheaper than those mainframe systems. These were equally capable
of high computations. Also, new nodes could easily be added to the cluster if it was
required. Thus, the problem of the cost was solved to some extent but the problem
related to geographical restrictions still pertained. To solve this, the concept of grid
computing was introduced.

 Grid computing:
In the 1990s, the concept of grid computing was introduced: different systems were placed at
entirely different geographical locations, and all of them were connected via the internet.
These systems belonged to different organizations, and thus the grid consisted of
heterogeneous nodes. Although it solved some problems, new problems emerged as the distance
between the nodes increased, the main one being the low availability of high-bandwidth
connectivity, along with other network-related issues. Cloud computing is therefore often
referred to as the “successor of grid computing”.

 Virtualization:
It was introduced nearly 40 years back. It refers to the process of creating a virtual
layer over the hardware which allows the user to run multiple instances
simultaneously on the hardware. It is a key technology used in cloud computing. It is
the base on which major cloud computing services such as Amazon EC2, VMware
vCloud, etc work on. Hardware virtualization is still one of the most common types of
virtualization.
 Web 2.0:
It is the interface through which the cloud computing services interact with the clients.
It is because of Web 2.0 that we have interactive and dynamic web pages. It also
increases flexibility among web pages. Popular examples of web 2.0 include Google
Maps, Facebook, Twitter, etc. Needless to say, social media is possible only because of
this technology. It gained major popularity in 2004.
 Service orientation:
It acts as a reference model for cloud computing. It supports low-cost, flexible, and
evolvable applications. Two important concepts were introduced in this computing
model. These were Quality of Service (QoS) which also includes the SLA (Service
Level Agreement) and Software as a Service (SaaS).
 Utility computing:
It is a computing model that defines techniques for provisioning services such as
compute, storage, and infrastructure on a pay-per-use basis.

Storage Systems in the Cloud:


There are 3 types of storage systems in the Cloud as follows.

 Block-Based Storage System


 File-Based Storage System
 Object-Based Storage System

Let’s discuss it one by one as follows.

Type-1 :
Block-Based Storage System –

 Hard drives are block-based storage systems. Your operating system, such as Windows or
Linux, sees a hard disk drive as a raw device: it sees a drive on which you can create a
volume, and then you can partition that volume and format the partitions.
 For example, If a system has 1000 GB of volume, then we can partition it into 800
GB and 200 GB for local C and local D drive respectively.
 Remember with a block-based storage system, your computer would see a drive, and
then you can create volumes and partitions.

Type-2 :
File-Based Storage System –

 In this, you are actually connecting through a Network Interface Card (NIC). You are
going over a network, and then you can access the network-attached storage server
(NAS). NAS devices are file-based storage systems.
 This storage server is another computing device with its own disks. It has already
created a file system and formatted its partitions, and it shares its file system over
the network. Here, you can map a drive to its network location.
 In this, unlike the previous type, there is no need for the user to partition and
format the volume; that is already done in file-based storage systems. So, the operating
system sees a file system that is mapped to a local drive letter.

Type-3 :
Object-Based Storage System –

 In this, a user uploads objects, for example through a web browser, into a container,
i.e., an object storage container. Access uses the HTTP protocol with REST APIs
(for example: GET, PUT, POST, SELECT, DELETE).
 For example, when you connect to any website and need to download some images, text,
or anything else that the website contains, the browser issues an HTTP GET request. If
you want to post a review of a product, a PUT or POST request is used.
 Also, there is no hierarchy of objects in the container. Every object is on the same
level in an object-based storage system.
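
To make the object interface concrete, here is a minimal sketch using Python's requests
library against a hypothetical object-store endpoint (the URL, container name, and
authentication are placeholders; real services such as Amazon S3 or OpenStack Swift add their
own authentication headers and URL layout):

    import requests

    # Hypothetical endpoint and container; a real object store would also require
    # authentication headers or signed URLs.
    BASE = "https://objectstore.example.com/my-container"

    # PUT uploads an object body under a key. The key is just a flat name;
    # the "folders" in it are only a naming convention, not a real hierarchy.
    with open("report.pdf", "rb") as f:
        resp = requests.put(f"{BASE}/reports/2024/report.pdf", data=f)
        resp.raise_for_status()

    # GET downloads the same object back by its key.
    resp = requests.get(f"{BASE}/reports/2024/report.pdf")
    resp.raise_for_status()
    with open("report_copy.pdf", "wb") as out:
        out.write(resp.content)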
Advantages:

 Scalability –
Capacity and storage can be expanded and performance can be enhanced.

 Flexibility –
Data can be manipulated and scaled according to the rules.

 Simpler Data Migrations –
Because new and old data can be added and removed as required, disruptive data
migrations are largely eliminated.

Disadvantages:

 Data centres require electricity and a reliable internet connection to operate; if
either fails, the system will not work properly.

File System Approach

File-based systems were an early attempt to computerize manual record keeping. This is also
called the traditional approach: a decentralized approach in which each department stored and
controlled its own data with the help of a data processing specialist. The main role of a data
processing specialist was to create the necessary computer file structures, manage the data
within those structures, and design application programs that create reports based on the file
data.

Consider an example of a student's file system. The student file will contain information
regarding the student (i.e. roll no, student name, course etc.). Similarly, we have a subject file
that contains information about the subject and the result file which contains the information
regarding the result.

Some fields are duplicated in more than one file, which leads to data redundancy. So to
overcome this problem, we need to create a centralized system, i.e. DBMS approach.

DBMS:

A database is a well-organized collection of meaningfully related data that can be accessed by
different users but is stored only once in the system. The various operations performed by the
DBMS are insertion, deletion, selection, sorting, etc.

With the DBMS approach, duplication of data is reduced due to the centralization of data.


There are the following differences between DBMS and File systems:

 Meaning – DBMS: DBMS is a collection of data; the user is not required to write the
procedures for managing it. File system: the file system is a collection of data in which
the user has to write the procedures for managing the data.
 Sharing of data – DBMS: due to the centralized approach, data sharing is easy. File
system: data is distributed in many files, possibly of different formats, so it isn't easy
to share data.
 Data Abstraction – DBMS: gives an abstract view of data that hides the details. File
system: exposes the details of data representation and storage.
 Security and Protection – DBMS: provides a good protection mechanism. File system: it
isn't easy to protect a file under the file system.
 Recovery Mechanism – DBMS: provides a crash recovery mechanism, i.e., it protects the
user from system failure. File system: has no crash recovery mechanism; if the system
crashes while entering some data, the content of the file is lost.
 Manipulation Techniques – DBMS: contains a wide variety of sophisticated techniques to
store and retrieve data. File system: cannot store and retrieve data efficiently.
 Concurrency Problems – DBMS: takes care of concurrent access to data using some form of
locking. File system: concurrent access has many problems, such as accessing the file
while some information is being deleted or updated.
 Where to use – DBMS: the database approach is used in large systems which interrelate
many files. File system: the file system approach is used in large systems which
interrelate many files.
 Cost – DBMS: the database system is expensive to design. File system: the file system
approach is cheaper to design.
 Data Redundancy and Inconsistency – DBMS: due to the centralization of the database, the
problems of data redundancy and inconsistency are controlled. File system: files and
application programs are created by different programmers, so there exists a lot of
duplication of data, which may lead to inconsistency.
 Structure – DBMS: the database structure is complex to design. File system: has a simple
structure.
 Data Independence – DBMS: data independence exists and can be of two types, logical data
independence and physical data independence. File system: no data independence exists.
 Integrity Constraints – DBMS: integrity constraints are easy to apply. File system:
integrity constraints are difficult to implement.
 Data Models – DBMS: three types of data models exist – hierarchical, network, and
relational data models. File system: there is no concept of data models.
 Flexibility – DBMS: changes to the content of the stored data are often necessary, and
such changes are made more easily with a database approach. File system: the flexibility
of the system is less compared to the DBMS approach.
 Examples – DBMS: Oracle, SQL Server, Sybase, etc. File system: COBOL, C++, etc.
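
To make the contrast tangible, here is a small sketch using the student example above, with a
plain CSV file standing in for the file system approach and SQLite standing in for a DBMS
(file names and data are made up):

    import csv
    import sqlite3

    students = [("101", "Asha", "CSE"), ("102", "Ravi", "ECE")]

    # File system approach: the programmer writes the procedures for storing
    # and searching records in a plain file.
    with open("students.csv", "w", newline="") as f:
        csv.writer(f).writerows(students)

    def find_student(roll_no):
        with open("students.csv", newline="") as f:
            for row in csv.reader(f):
                if row[0] == roll_no:
                    return row
        return None

    print(find_student("102"))

    # DBMS approach: the database engine handles storage, integrity, and
    # concurrent access; the user only declares what is wanted.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT, course TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", students)
    print(conn.execute("SELECT * FROM student WHERE roll_no = ?", ("102",)).fetchone())
    conn.close()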

A Distributed File System (DFS), as the name suggests, is a file system that is distributed
across multiple file servers or multiple locations. It allows programs to access or store
remote files just as they do local ones, allowing users to access files from any computer on
the network.

The main purpose of the Distributed File System (DFS) is to allow users of physically
distributed systems to share their data and resources by using a common file system. A
collection of workstations and mainframes connected by a Local Area Network (LAN) is a typical
configuration for a DFS. A DFS is implemented as a part of the operating system. In DFS, a
namespace is created, and this process is transparent to the clients.

DFS has two components:

 Location Transparency –
Location transparency is achieved through the namespace component.
 Redundancy –
Redundancy is provided through the file replication component.

In the case of failure and heavy load, these components together improve data availability by
allowing the sharing of data in different locations to be logically grouped under one folder,
which is known as the “DFS root”.

It is not necessary to use both components of DFS together: it is possible to use the
namespace component without using the file replication component, and it is perfectly possible
to use the file replication component between servers without using the namespace component.

File system replication:

Early iterations of DFS made use of Microsoft’s File Replication Service (FRS), which
allowed for straightforward file replication between servers. FRS recognises new or updated
files and distributes the most recent versions of the whole file to all servers.

Windows Server 2003 R2 introduced “DFS Replication” (DFSR). It improves on FRS by copying
only the portions of files that have changed and by minimising network traffic with data
compression. Additionally, it provides users with flexible configuration options to manage
network traffic on a configurable schedule.

Features of DFS :

 Transparency :
o Structure transparency –
There is no need for the client to know about the number or locations of file
servers and the storage devices. Multiple file servers should be provided for
performance, adaptability, and dependability.
o Access transparency –
Both local and remote files should be accessible in the same manner. The file
system should automatically locate the accessed file and deliver it to the
client’s side.
o Naming transparency –
There should not be any hint in the name of the file to the location of the file.
Once a name is given to the file, it should not be changed during transferring
from one node to another.
o Replication transparency –
If a file is replicated on multiple nodes, the copies of the file and their
locations should be hidden from the clients.
 User mobility :
It will automatically bring the user’s home directory to the node where the user logs
in.
 Performance :
Performance is based on the average amount of time needed to satisfy client
requests. This time covers the CPU time + the time taken to access secondary storage +
the network access time. It is desirable that the performance of a Distributed File
System be comparable to that of a centralized file system.
 Simplicity and ease of use :
The user interface of a file system should be simple and the number of commands in
the file should be small.
 High availability :
A Distributed File System should be able to continue operating in the face of partial
failures such as a link failure, a node failure, or a storage drive crash.
A highly reliable and adaptable distributed file system should have multiple,
independent file servers controlling multiple, independent storage devices.
 Scalability :
Since growing the network by adding new machines or joining two networks together
is routine, the distributed system will inevitably grow over time. As a result, a good
distributed file system should be built to scale quickly as the number of nodes and
users in the system grows. Service should not be substantially disrupted as the number
of nodes and users grows.
 High reliability :
The likelihood of data loss should be minimized as much as feasible in a suitable
distributed file system. That is, because of the system’s unreliability, users should not
feel forced to make backup copies of their files. Rather, a file system should create
backup copies of key files that can be used if the originals are lost. Many file systems
employ stable storage as a high-reliability strategy.
 Data integrity :
Multiple users frequently share a file system. The integrity of data saved in a shared
file must be guaranteed by the file system. That is, concurrent access requests from
many users who are competing for access to the same file must be correctly
synchronized using a concurrency control method. Atomic transactions are a high-
level concurrency management mechanism for data integrity that is frequently offered
to users by a file system.
 Security :
A distributed file system should be secure so that its users may trust that their data
will be kept private. To safeguard the information contained in the file system from
unwanted & unauthorized access, security mechanisms must be implemented.
 Heterogeneity :
Heterogeneity in distributed systems is unavoidable as a result of huge scale. Users of
heterogeneous distributed systems have the option of using multiple computer
platforms for different purposes.

History :

The server component of the Distributed File System was initially introduced as an add-on
feature. It was added to Windows NT 4.0 Server and was known as “DFS 4.1”. Later it was
included as a standard component of all editions of Windows 2000 Server. Client-side support
has been included in Windows NT 4.0 and in later versions of Windows.

Linux kernels 2.6.14 and later come with an SMB client VFS known as “cifs” which supports
DFS. Mac OS X 10.7 (Lion) and onwards also supports DFS.

Properties:

 File transparency: users can access files without knowing where they are physically
stored on the network.
 Load balancing: the file system can distribute file access requests across multiple
computers to improve performance and reliability.
 Data replication: the file system can store copies of files on multiple computers to
ensure that the files are available even if one of the computers fails.
 Security: the file system can enforce access control policies to ensure that only
authorized users can access files.
 Scalability: the file system can support a large number of users and a large number of
files.
 Concurrent access: multiple users can access and modify the same file at the same
time.
 Fault tolerance: the file system can continue to operate even if one or more of its
components fail.
 Data integrity: the file system can ensure that the data stored in the files is accurate
and has not been corrupted.
 File migration: the file system can move files from one location to another without
interrupting access to the files.
 Data consistency: changes made to a file by one user are immediately visible to all
other users.
 Support for different file types: the file system can support a wide range of file
types, including text files, image files, and video files.
Applications:

 NFS –
NFS stands for Network File System. It is a client-server architecture that allows a
computer user to view, store, and update files remotely. The protocol of NFS is one of
the several distributed file system standards for Network-Attached Storage (NAS).
 CIFS –
CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is,
CIFS is an implementation of the SMB protocol, designed by Microsoft.
 SMB –
SMB stands for Server Message Block. It is a file-sharing protocol that was
invented by IBM. The SMB protocol was created to allow computers to perform read
and write operations on files to a remote host over a Local Area Network (LAN). The
directories present in the remote host can be accessed via SMB and are called as
“shares”.
 Hadoop –
Hadoop is a group of open-source software services. It provides a software framework
for distributed storage and processing of big data using the MapReduce programming
model. The core of Hadoop consists of a storage part, known as the Hadoop Distributed
File System (HDFS), and a processing part, which is the MapReduce programming model.
 NetWare –
NetWare is a discontinued computer network operating system developed by Novell, Inc.
It primarily used cooperative multitasking to run different services on a personal
computer, using the IPX network protocol.

Working of DFS :

There are two ways in which DFS can be implemented:

 Standalone DFS namespace –
It allows only DFS roots that exist on the local computer and do not use Active
Directory. A standalone DFS can only be accessed on the computer on which it is
created. It does not provide any fault tolerance and cannot be linked to any other
DFS. Standalone DFS roots are rarely encountered because of their limited advantages.
 Domain-based DFS namespace –
It stores the configuration of DFS in Active Directory, making the DFS namespace
root accessible at \\<domainname>\<dfsroot> or \\<FQDN>\<dfsroot> (a small
path-access sketch follows this list).
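
As a small sketch of what access through a domain-based namespace looks like from a client
(the domain and folder names below are hypothetical; to the client the DFS root is just
another UNC path, and the namespace hides which file server actually holds the data):

    from pathlib import Path

    # Hypothetical domain-based DFS root; replace with a real path such as
    # \\yourdomain.local\dfsroot\shared when running on a domain-joined
    # Windows machine.
    dfs_root = Path(r"\\example.local\dfsroot\shared")

    if dfs_root.exists():
        for entry in dfs_root.iterdir():
            print(entry.name)
    else:
        print("DFS root not reachable from this machine")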
Advantages:

 DFS allows multiple users to access or store the data.
 It allows the data to be shared remotely.
 It improves the availability of files, access time, and network efficiency.
 It improves the ability to change the size of the data and also the ability to
exchange the data.
 A Distributed File System provides transparency of data even if a server or disk fails.

Disadvantages:

 In a Distributed File System, nodes and connections need to be secured, so security is
a concern.
 Messages and data may be lost in the network while moving from one node to another.
 Database connection in the case of a Distributed File System is complicated.
 Handling of the database is also not as easy in a Distributed File System as in a
single-user system.
 Overloading may occur if all nodes try to send data at once.

Parallel File System

Cloud computing is a popular choice among IT professionals and companies in the digital
marketing industry. It allows users to access shared resources through the Internet with little
to no up-front investment. Companies that offer cloud computing services typically charge
clients a flat fee per month or yearly contract, but they also might offer free cloud hosting
options for individuals who want to try it out before paying for a subscription plan. The
downside of using cloud services is that data can easily be lost or corrupted when it is
accessed from multiple computers simultaneously without locking down the file system to
prevent users from interfering with one another's files.
Terminologies in Cloud Computing:

 Parallel File System: The parallel file system is a system that is used to store data
across multiple network servers. It provides high-performance network access through
parallel coordinated input-output operations. This is a file system that allows
concurrent access to data by more than one user.
 Flock: A group of processes (corresponding to a group of threads) sharing the same
memory image.
 Flock semantics: The properties describe how an entity can be accessed by other
processes within the flock when it is not active. In flock semantics, only one process
at a time may have exclusive access to an entity and all other processes must share the
same view of the entity, even if it is active or protected.

How PFS Relates to Cloud Computing:

Cloud computing gives users a lot of freedom to access the data and resources that they need
on demand. However, when data is accessed from different machines at the same time, it is
important not to lose or corrupt it. Without locking down file system access between different
machines, there is a high risk of losing or corrupting important data across multiple
computers at once. This can make managing files difficult, because some users may end up
reading a file while others are trying to edit it at the same time.
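
To illustrate the kind of locking referred to above, here is a minimal advisory-locking sketch
using Python's fcntl module on a POSIX system; real parallel and distributed file systems use
their own (often distributed) lock managers, but the idea of serializing conflicting writers is
the same:

    import fcntl

    # Increment a shared counter file safely: take an exclusive lock so that two
    # processes updating the file at the same time cannot corrupt it.
    with open("shared_counter.txt", "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # block until we hold the exclusive lock
        f.seek(0)
        current = int(f.read() or 0)
        f.seek(0)
        f.truncate()
        f.write(str(current + 1))
        fcntl.flock(f, fcntl.LOCK_UN)   # release so other processes can proceed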

Example: Google File System is a cloud file system that uses a parallel file system. Google
File System (GFS) is a scalable distributed file system that provides consistently high
performance across tens of thousands of commodity servers. It manages huge data sets across
dynamic clusters of computers using only application-level replication and auto-recovery
techniques. This architecture provides high availability with no single point of failure and
supports expected and/or unexpected hardware and software failures without data loss or
system shutdown.
There are two main types of parallel file systems:

 Cloud-based parallel file systems.


 Traditional parallel file systems.

Advantages:

 Data integrity
 Data security
 Disaster recovery

Disadvantages:

 Low scalability
 Less performance

Google File System

Google Inc. developed the Google File System (GFS), a scalable distributed file system
(DFS), to meet the company’s growing data processing needs. GFS offers fault tolerance,
dependability, scalability, availability, and performance to big networks and connected nodes.
GFS is made up of a number of storage systems constructed from inexpensive commodity
hardware parts. The search engine, which creates enormous volumes of data that must be
kept, is only one example of how it is customized to meet Google’s various data use and
storage requirements.

The Google File System is designed to tolerate the hardware flaws of inexpensive, commercially
available servers while still benefiting from their low cost. Google FS is another name for
GFS. It manages two types of data, namely file metadata and file data. A GFS node cluster
consists of a single master and several chunk servers that various client systems regularly
access. On local discs, chunk servers keep data in the form of Linux files. The stored data is
split into large (64 MB) chunks, which are replicated at least three times around the network.
The large chunk size reduces network overhead.

GFS is designed to meet Google’s huge cluster requirements without burdening applications.
Files are stored in hierarchical directories identified by path names. The master is in charge
of managing metadata, including the namespace, access control, and the mapping of files to
chunks. The master communicates with each chunk server through periodic heartbeat messages and
keeps track of its status.

More than 1,000 nodes with 300 TB of disc storage capacity make up the largest GFS
clusters. This is available for constant access by hundreds of clients.
Components of GFS

A group of computers makes up GFS. A cluster is just a group of connected computers. There
could be hundreds or even thousands of computers in each cluster. There are three basic
entities included in any GFS cluster as follows:

 GFS Clients: They can be computer programs or applications which may be used to
request files. Requests may be made to access and modify already-existing files or
add new files to the system.
 GFS Master Server: It serves as the cluster’s coordinator. It preserves a record of the
cluster’s actions in an operation log. Additionally, it keeps track of the data that
describes chunks, or metadata. The chunks’ place in the overall file and which files
they belong to are indicated by the metadata to the master server.
 GFS Chunk Servers: They are the GFS’s workhorses. They store file chunks of 64 MB in
size. The chunk servers do not send chunks to the master server; instead, they deliver
the requested chunks directly to the client. To ensure reliability, GFS makes multiple
copies of each chunk and stores them on different chunk servers; the default is three
copies, and each copy is referred to as a replica. (A toy illustration of fixed-size
chunking follows this list.)
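
The following toy sketch only illustrates the fixed-size chunking idea described above; it is
not GFS code, and the file name is hypothetical. In GFS each such chunk would be placed on
three different chunk servers chosen by the master:

    CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS chunk size mentioned above

    def split_into_chunks(path, chunk_size=CHUNK_SIZE):
        """Yield (chunk_index, data) pairs for a local file."""
        with open(path, "rb") as f:
            index = 0
            while True:
                data = f.read(chunk_size)
                if not data:
                    break
                yield index, data
                index += 1

    # Hypothetical usage: print how many bytes each chunk would hold.
    # for idx, chunk in split_into_chunks("web_crawl.dat"):
    #     print(f"chunk {idx}: {len(chunk)} bytes")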

Features of GFS

 Namespace management and locking.


 Fault tolerance.
 Reduced client and master interaction because of large chunk server size.
 High availability.
 Critical data replication.
 Automatic and efficient data recovery.
 High aggregate throughput.

Advantages of GFS

1. High availability: data is still accessible even if a few nodes fail, thanks to
replication. As the saying goes, component failures are the norm rather than the exception.
2. High throughput: many nodes operate concurrently.
3. Dependable storage: data that has been corrupted can be detected and re-replicated.

Disadvantages of GFS

1. Not the best fit for small files.
2. The master may act as a bottleneck.
3. Random writes are not supported.
4. Suitable for data that is written once (and possibly appended to) and then only read.

Hadoop

Hadoop is an open-source software framework that is used for storing and processing large
amounts of data in a distributed computing environment. It is designed to handle big data and
is based on the MapReduce programming model, which allows for the parallel processing of
large datasets.

Hadoop has two main components:

 HDFS (Hadoop Distributed File System): This is the storage component of Hadoop,
which allows for the storage of large amounts of data across multiple machines. It is
designed to work with commodity hardware, which makes it cost-effective.
 YARN (Yet Another Resource Negotiator): This is the resource management
component of Hadoop, which manages the allocation of resources (such as CPU and
memory) for processing the data stored in HDFS.
 Hadoop also includes several additional modules that provide additional functionality,
such as Hive (a SQL-like query language), Pig (a high-level platform for creating
MapReduce programs), and HBase (a non-relational, distributed database).
 Hadoop is commonly used in big data scenarios such as data warehousing, business
intelligence, and machine learning. It’s also used for data processing, data analysis,
and data mining.

What is Hadoop?

Hadoop is an open-source software programming framework for storing large amounts of data and
performing computation. Its framework is based on Java programming, with some native code in C
and shell scripts.


History of Hadoop

Hadoop was developed under the Apache Software Foundation, and its co-founders are Doug
Cutting and Mike Cafarella. Doug Cutting named it after his son’s toy elephant. In October
2003, Google released its first paper, on the Google File System. In January 2006, MapReduce
development started on Apache Nutch, consisting of around 6,000 lines of code for MapReduce
and around 5,000 lines of code for HDFS. In April 2006, Hadoop 0.1.0 was released.

Hadoop is an open-source software framework for storing and processing big data. It was
created by Apache Software Foundation in 2006, based on a white paper written by Google in
2003 that described the Google File System (GFS) and the MapReduce programming model.
The Hadoop framework allows for the distributed processing of large data sets across clusters
of computers using simple programming models. It is designed to scale up from single
servers to thousands of machines, each offering local computation and storage. It is used by
many organizations, including Yahoo, Facebook, and IBM, for a variety of purposes such as
data warehousing, log processing, and research. Hadoop has been widely adopted in the
industry and has become a key technology for big data processing.

Features of Hadoop:

1. It is fault tolerant.

2. It is highly available.

3. Its programming model is easy.

4. It has huge, flexible storage.

5. It is low cost.

Hadoop has several key features that make it well-suited for big data processing:

 Distributed Storage: Hadoop stores large data sets across multiple machines, allowing
for the storage and processing of extremely large amounts of data.
 Scalability: Hadoop can scale from a single server to thousands of machines, making
it easy to add more capacity as needed.
 Fault-Tolerance: Hadoop is designed to be highly fault-tolerant, meaning it can
continue to operate even in the presence of hardware failures.
 Data locality: Hadoop provides data locality feature, where the data is stored on the
same node where it will be processed, this feature helps to reduce the network traffic
and improve the performance
 High Availability: Hadoop provides High Availability feature, which helps to make
sure that the data is always available and is not lost.
 Flexible Data Processing: Hadoop’s MapReduce programming model allows for the
processing of data in a distributed fashion, making it easy to implement a wide variety
of data processing tasks.
 Data Integrity: Hadoop provides built-in checksum feature, which helps to ensure that
the data stored is consistent and correct.
 Data Replication: Hadoop provides data replication feature, which helps to replicate
the data across the cluster for fault tolerance.
 Data Compression: Hadoop provides built-in data compression feature, which helps to
reduce the storage space and improve the performance.
 YARN: A resource management platform that allows multiple data processing
engines like real-time streaming, batch processing, and interactive SQL, to run and
process data stored in HDFS.

Hadoop Distributed File System

Hadoop has a distributed file system known as HDFS, which splits files into blocks and
distributes them across the nodes of large clusters. In case of a node failure, the system
keeps operating, and the data transfer between the nodes is handled by HDFS.

HDFS

Advantages of HDFS: it is inexpensive, immutable in nature, stores data reliably, tolerates
faults, is scalable and block structured, can process a large amount of data simultaneously,
and much more. Disadvantages of HDFS: its biggest disadvantage is that it is not a good fit
for small quantities of data; it also has issues related to stability, and it can be
restrictive and rough in nature. Hadoop also supports a wide range of software packages such
as Apache Flume, Apache Oozie, Apache HBase, Apache Sqoop, Apache Spark, Apache Storm,
Apache Pig, Apache Hive, Apache Phoenix, and Cloudera Impala.

Some common frameworks of Hadoop

1. Hive – It uses HiveQL for data structuring and for writing complicated MapReduce jobs
over data in HDFS.
2. Drill – It consists of user-defined functions and is used for data exploration.
3. Storm – It allows real-time processing and streaming of data.
4. Spark – It contains a Machine Learning Library (MLlib) for providing enhanced machine
learning and is widely used for data processing. It also supports Java, Python, and Scala.
5. Pig – It has Pig Latin, a SQL-like language, and performs data transformation of
unstructured data.
6. Tez – It reduces the complexities of Hive and Pig and helps their code run faster.
The Hadoop framework is made up of the following modules:

1. Hadoop MapReduce – a MapReduce programming model for handling and processing large data
(a minimal word-count sketch follows this list).
2. Hadoop Distributed File System (HDFS) – distributes files in blocks across the nodes of a
cluster.
3. Hadoop YARN – a platform that manages computing resources.
4. Hadoop Common – packages and libraries used by the other modules.
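
As a concrete, minimal illustration of the MapReduce model listed above, the mapper and
reducer below implement word count in the style used with Hadoop Streaming. The job submission
command, input paths, and cluster configuration are omitted and depend on the installation;
locally the pipeline can be simulated as
cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce.

    #!/usr/bin/env python3
    # wordcount.py -- word count for Hadoop Streaming (map and reduce roles).
    import sys

    def mapper():
        # Emit "word<TAB>1" for every word on stdin.
        for line in sys.stdin:
            for word in line.strip().split():
                print(f"{word}\t1")

    def reducer():
        # Hadoop Streaming delivers mapper output sorted by key, so equal words
        # arrive consecutively and can be summed with a running total.
        current_word, count = None, 0
        for line in sys.stdin:
            word, value = line.rstrip("\n").split("\t")
            if word == current_word:
                count += int(value)
            else:
                if current_word is not None:
                    print(f"{current_word}\t{count}")
                current_word, count = word, int(value)
        if current_word is not None:
            print(f"{current_word}\t{count}")

    if __name__ == "__main__":
        role = sys.argv[1] if len(sys.argv) > 1 else "map"
        mapper() if role == "map" else reducer()

On a real cluster the two roles would typically be split into separate scripts and passed to
the Hadoop Streaming jar as the -mapper and -reducer programs.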

Advantages and Disadvantages of Hadoop

Advantages:

 Ability to store a large amount of data.


 High flexibility.
 Cost effective.
 High computational power.
 Tasks are independent.
 Linear scaling.

Hadoop has several advantages that make it a popular choice for big data processing:

 Scalability: Hadoop can easily scale to handle large amounts of data by adding more
nodes to the cluster.
 Cost-effective: Hadoop is designed to work with commodity hardware, which makes
it a cost-effective option for storing and processing large amounts of data.
 Fault-tolerance: Hadoop’s distributed architecture provides built-in fault-tolerance,
which means that if one node in the cluster goes down, the data can still be processed
by the other nodes.
 Flexibility: Hadoop can process structured, semi-structured, and unstructured data,
which makes it a versatile option for a wide range of big data scenarios.
 Open-source: Hadoop is open-source software, which means that it is free to use and
modify. This also allows developers to access the source code and make
improvements or add new features.
 Large community: Hadoop has a large and active community of developers and users
who contribute to the development of the software, provide support, and share best
practices.
 Integration: Hadoop is designed to work with other big data technologies such as
Spark, Storm, and Flink, which allows for integration with a wide range of data
processing and analysis tools.

Disadvantages:

 Not very effective for small data.


 Hard cluster management.
 Has stability issues.
 Security concerns.
 Complexity: Hadoop can be complex to set up and maintain, especially for
organizations without a dedicated team of experts.
 Latency: Hadoop is not well-suited for low-latency workloads and may not be the best
choice for real-time data processing.
 Limited Support for Real-time Processing: Hadoop’s batch-oriented nature makes it
less suited for real-time streaming or interactive data processing use cases.
 Limited Support for Structured Data: Hadoop is designed to work with unstructured
and semi-structured data, it is not well-suited for structured data processing
 Data Security: Hadoop does not provide built-in security features such as data
encryption or user authentication, which can make it difficult to secure sensitive data.
 Limited Support for Ad-hoc Queries: Hadoop’s MapReduce programming model is
not well-suited for ad-hoc queries, making it difficult to perform exploratory data
analysis.
 Limited Support for Graph and Machine Learning: Hadoop’s core component HDFS
and MapReduce are not well-suited for graph and machine learning workloads,
specialized components like Apache Giraph and Mahout are available but have some
limitations.
 Cost: Hadoop can be expensive to set up and maintain, especially for organizations
with large amounts of data.
 Data Loss: In the event of a hardware failure, the data stored in a single node may be
lost permanently.
 Data Governance: Data governance is a critical aspect of data management; Hadoop
does not provide built-in features to manage data lineage, data quality, data
cataloging, and data auditing.
Cloud Bigtable: we may store terabytes or even petabytes of data in Google Cloud
Bigtable, a sparsely populated table that can scale to billions of rows and thousands of
columns. The row key is the only indexed value that appears in every row. Google Cloud
Bigtable provides low-latency storage for massive amounts of single-keyed data. Because it
supports high read and write throughput at low latency, it is an ideal data source for
MapReduce operations.

Applications can access Google Cloud BigTable through a variety of client libraries,
including a supported Java extension to the Apache HBase library. Because of this, it is
compatible with the current Apache ecosystem of open-source big data software.

 Google Cloud Bigtable’s powerful backend servers have a number of advantages over a
self-managed HBase installation, including:
Exceptional scalability. Google Cloud Bigtable scales in direct proportion to the number
of machines in your cluster. A self-managed HBase system has a design bottleneck that
restricts performance after a certain point; this bottleneck does not exist for Google
Cloud Bigtable, so you can extend your cluster to support more reads and writes.
 Ease of administration Upgrades and restarts are handled by Google Cloud Bigtable
transparently, and it automatically upholds strong data durability. Simply add a
second cluster to your instance to begin replicating your data; replication will begin
immediately. Simply define your table schemas, and Google Cloud Bigtable will take
care of the rest for you. No more managing replication or regions.
 Cluster scaling with minimal disruption. Without any downtime, you may scale down
a Google Cloud Bigtable cluster after increasing its capacity for a few hours to handle
a heavy load. Under load, Google Cloud Bigtable usually balances performance
across all of the nodes in your cluster within a few minutes after you modify the size
of a cluster.

Why use BigTable?

Applications that require high throughput and scalability for key/value data, where each value
is typically no more than 10 MB, should use Google Cloud BigTable. Additionally, Google
Cloud Bigtable excels as a storage engine for machine learning, stream processing, and batch
MapReduce operations.

All of the following forms of data can be stored in and searched using Google Cloud
Bigtable:

 Time-series information, such as CPU and memory utilization patterns across


various servers.
 Marketing information, such as consumer preferences and purchase history.
 Financial information, including stock prices, currency exchange rates, and transaction
histories.
 Internet of Things data, such as consumption statistics from home appliances and energy
meters.
 Graph data, which includes details on the connections between users.

BigTable Storage Concept:

Each massively scalable table in Google Cloud Bigtable is a sorted key/value map that holds
the data. The table is made up of columns that contain unique values for each row and rows
that typically describe a single object. A single row key is used to index each row, and a
column family is often formed out of related columns. The column family and a column
qualifier, a distinctive name within the column family, are combined to identify each column.

Multiple cells may be present at each row/column intersection. Each cell contains a distinct
timestamped copy of the data for that row and column, so when many cells are written to a
column, a history of the recorded data for that row and column is preserved. Google Cloud
Bigtable tables are sparse: if a column is not used in a given row, it takes up no space. A
few points to remember: columns in a row may be empty, and the cells at a specific row and
column carry individual timestamps (t).

All client requests made through the Google Cloud Bigtable architecture are sent through a
frontend server before being forwarded to a Google Cloud Bigtable node. The nodes are
organized into a Google Cloud Bigtable cluster, which belongs to a Google Cloud Bigtable
instance, a container for the cluster.

A portion of the requests made to the cluster is handled by each node. The number of
simultaneous requests that a cluster can handle can be increased by adding nodes. The
cluster’s maximum throughput rises as more nodes are added. You can send various types of
traffic to various clusters if replication is enabled by adding more clusters. Then you can fail
over to another cluster if one cluster is unavailable.

It’s important to note that data is never really saved in Google Cloud Bigtable nodes; rather,
each node contains pointers to a collection of tablets that are kept on Colossus. Because the
real data is not duplicated, rebalancing tablets from one node to another proceeds swiftly.
When a Google Cloud Bigtable node fails, no data is lost; recovery from a node failure is
quick since only metadata must be moved to the new node. Google Cloud Bigtable merely
changes the pointers for each node.

Load balancing

A primary process oversees each Google Cloud Bigtable zone, balancing workload and data
volume within clusters. By dividing busier/larger tablets in half and combining
less-used/smaller tablets, this procedure moves tablets across nodes as necessary. Google
Cloud Bigtable divides a tablet into two when it experiences a spike in traffic, and then
moves one of the new tablets to a different node. By handling the splitting, merging, and
rebalancing automatically with Google Cloud Bigtable, you may avoid having to manually
manage your tablets.

It’s crucial to distribute writes among nodes as equally as you can in order to obtain the
optimum write performance out of Google Cloud Bigtable. Using row keys with
unpredictable ordering is one method to accomplish this.

Additionally, grouping comparable rows together and placing them next to one another
makes it much easier to read multiple rows at once. If you were keeping various kinds of
weather data across time, for instance, your row key may be the place where the data was
gathered, followed by a timestamp (for instance, WashingtonDC#201803061617). A
contiguous range of rows would be created using this kind of row key to combine all the data
from one location. With several sites gathering data at the same rate, writes would still be
dispersed uniformly between tablets. For other places, the row would begin with a new
identifier.
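
The sketch below shows how the time-series row-key pattern just described might be written and
read with the google-cloud-bigtable Python client; the project, instance, table, and
column-family names are placeholders, and the library must be installed and authenticated
separately:

    from datetime import datetime, timezone
    from google.cloud import bigtable  # pip install google-cloud-bigtable

    # Placeholder identifiers -- substitute a real project, instance, and table.
    client = bigtable.Client(project="my-project")
    table = client.instance("weather-instance").table("weather")

    # Row key pattern from the text: location first, then a timestamp, so rows
    # for one site form a contiguous, time-ordered range.
    now = datetime.now(timezone.utc)
    row_key = f"WashingtonDC#{now:%Y%m%d%H%M}".encode()

    row = table.direct_row(row_key)
    # The column family "measurements" is assumed to already exist in the schema.
    row.set_cell("measurements", b"temperature_c", b"21.5", timestamp=now)
    row.commit()

    # Read the row back by its key.
    fetched = table.read_row(row_key)
    if fetched is not None:
        cell = fetched.cells["measurements"][b"temperature_c"][0]
        print(row_key.decode(), cell.value.decode())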

Supported data types

For the majority of uses, Google Cloud Bigtable treats all data as raw byte strings. Only
during increment operations, where the destination must be a 64-bit integer encoded as an 8-
byte big-endian value, does Google Cloud Bigtable attempt to ascertain the type.
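
To make that encoding concrete, this is how a counter value would be packed as an 8-byte
big-endian integer in Python (illustration only; the client libraries normally handle this
for you):

    import struct

    encoded = struct.pack(">q", 42)           # 8-byte, big-endian signed integer
    print(encoded)                            # b'\x00\x00\x00\x00\x00\x00\x00*'
    print(struct.unpack(">q", encoded)[0])    # 42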

Use of the disc and memory

The sections that follow explain how various Google Cloud Bigtable features impact the
amount of memory and disc space used by your instance.

Inactive columns

A Google Cloud Bigtable row does not use any space for columns that are not being used.
Each row is essentially a set of key/value entries, where the key is a combination of the
column family, column qualifier, and timestamp. If a row doesn’t have a value for a certain
column, the key/value entry is simply absent.

Column qualifiers

Column qualifiers take up space in rows, since each column qualifier used in a row is stored
in that row. As a result, it is often efficient to use column qualifiers as data.

Compactions

To make reads and writes more effective and to eliminate removed entries, Google Cloud
Bigtable periodically rewrites your tables. This procedure is called compaction. Your data is
automatically compacted by Google Cloud Big Table; there are no tuning options.

Removals and Modifications

Because Google Cloud Bigtable saves mutations sequentially and only periodically compacts
them, updates to a row require more storage space. A table is compacted by Google Cloud
Bigtable by removing values that are no longer required. The original value and the updated
value will both be kept on disc until the data is compressed if you change a cell’s value.
Because deletions are actually a particular kind of mutation, they also require more storage
space, at least initially. A deletion consumes additional storage rather than releasing space up
until the table is compacted.

 Compression of data: your data is automatically compressed by Google Cloud Bigtable
using a clever algorithm. Compression settings for your table cannot be configured. It
is useful, though, to store data in a way that allows it to be compressed effectively:
 Patterned data can be compressed more effectively than random data. Compression performs
best when identical values are next to one another, either in the same row or in adjacent
rows. Text, such as the page you are reading right now, is an example of patterned data.
The data can be compressed efficiently if your row keys are arranged so that rows with
similar pieces of data are close to one another.
 Before saving values in Google Cloud Bigtable, compress those that are greater than 1
MiB. This compression conserves network traffic, server memory, and CPU cycles.
Compression is automatically off for values greater than 1 MiB in Google Cloud
Bigtable.

Data longevity

When you use Google Cloud Bigtable, your information is kept on Colossus, an internal,
incredibly resilient file system, employing storage components located in Google’s data
centers. To use Google Cloud Bigtable, you do not need to run an HDFS cluster or any other
type of file system.

Beyond what conventional HDFS three-way replication offers, Google employs customized
storage techniques to ensure data persistence. Additionally, we make duplicate copies of your
data to enable disaster recovery and protection against catastrophic situations.
Consistency model

Single-cluster Google Cloud Bigtable instances provide strong consistency.

Security

IAM roles that you apply for security prevent specific users from creating new instances,
reading from tables, or writing to tables. None of your tables can be accessed by anyone who
does not have access to your project or who does not have an IAM role with the necessary
Google Cloud Bigtable permissions.

At the level of projects, instances, and tables, security can be managed. There are no row-
level, column-level, or cell-level security constraints supported by Google Cloud Bigtable.

Encryption

The same hardened key management mechanisms that we employ for our own encrypted data
are used by default for all data stored within Google Cloud, including the data in Google
Cloud Big Table tables.

Customer-managed encryption keys (CMEK) give you more control over the keys used to protect
your Google Cloud Bigtable data at rest.

Backups

With Google Cloud Bigtable backups, you may copy the schema and data of a table and later
restore it to a new table using the backup. You can recover from operator errors, such as
accidentally deleting a table and application-level data destruction with the use of backups.

AWS Storage Services: AWS offers a wide range of storage services that can be provisioned
depending on your project requirements and use case. AWS storage services have different
provisions for highly confidential data, frequently accessed data, and the not so frequently
accessed data. You can choose from various storage types namely, object storage, file
storage, block storage services, backups, and data migration options. All of which fall under
the AWS Storage Services list.
AWS Simple Storage Service (S3): From the aforementioned list, S3, is the object storage
service provided by AWS. It is probably the most commonly used, go-to storage service for
AWS users given the features like extremely high availability, security, and simple
connection to other AWS Services. AWS S3 can be used by people with all kinds of use
cases like mobile/web applications, big data, machine learning and many more.

AWS S3 Terminology:

 Bucket: Data, in S3, is stored in containers called buckets.


o Each bucket will have its own set of policies and configuration. This enables
users to have more control over their data.
o Bucket Names must be unique.
o Can be thought of as a parent folder of data.
o There is a limit of 100 buckets per AWS account, but it can be increased if
requested from AWS support.
 Bucket Owner: The person or organization that owns a particular bucket is its bucket
owner.
 Import/Export Station: A machine that uploads or downloads data to/from S3.
 Key: Key, in S3, is a unique identifier for an object in a bucket. For example in a
bucket ‘ABC’ your GFG.java file is stored at javaPrograms/GFG.java then
‘javaPrograms/GFG.java’ is your object key for GFG.java.
o It is important to note that ‘bucketName+key’ is unique for all objects.
o This also means that there can be only one object for a key in a bucket. If you
upload 2 files with the same key. The file uploaded latest will overwrite the
previously contained file.
 Versioning: Versioning means to always keep a record of previously uploaded files
in S3. Points to note:
o Versioning is not enabled by default. Once enabled, it is enabled for all objects
in a bucket.
o Versioning keeps all the copies of your file, so, it adds cost for storing
multiple copies of your data. For example, 10 copies of a file of size 1GB will
have you charged for using 10GBs for S3 space.
o Versioning is helpful to prevent unintended overwrites and deletions.
o Note that objects with the same key can be stored in a bucket if versioning is
enabled (since they have a unique version ID).
 null Object: Version ID for objects in a bucket where versioning is suspended is null.
Such objects may be referred to as null objects.
o For buckets with versioning enabled, each version of a file has a specific
version ID.
 Object: Fundamental entity type stored in AWS S3.
 Access Control Lists (ACL): A document for verifying the access to S3 buckets
from outside your AWS account. Each bucket has its own ACL.
 Bucket Policies: A document for verifying the access to S3 buckets from within your
AWS account, this controls which services and users have what kind of access to your
S3 bucket. Each bucket has its own Bucket Policies.
 Lifecycle Rules: This is a cost-saving practice that can move your files to AWS
Glacier (The AWS Data Archive Service) or to some other S3 storage class for
cheaper storage of old data or completely delete the data after the specified time.
Features of AWS S3:

 Durability: AWS claims Amazon S3 to have 99.999999999% durability (11 9’s).
This means the possibility of losing your data stored on S3 is about one in a billion.
This means the possibility of losing your data stored on S3 is one in a billion.
 Availability: AWS ensures that the up-time of AWS S3 is 99.99% for standard
access.
o Note that availability is related to being able to access data and durability is
related to losing data altogether.
 Server-Side-Encryption (SSE): AWS S3 supports three types of SSE models:
o SSE-S3: AWS S3 manages encryption keys.
o SSE-C: The customer manages encryption keys.
o SSE-KMS: The AWS Key Management Service (KMS) manages the
encryption keys.
 File Size support: AWS S3 can hold files of size ranging from 0 bytes to 5 terabytes.
A 5TB limit on file size should not be a blocker for most of the applications in the
world.
 Infinite storage space: Theoretically AWS S3 is supposed to have infinite storage
space. This makes S3 infinitely scalable for all kinds of use cases.
 Pay as you use: The users are charged according to the S3 storage they hold.
 AWS-S3 is region-specific.

S3 storage classes:

AWS S3 provides multiple storage types that offer different performance and features and
different cost structure.

 Standard: Suitable for frequently accessed data, that needs to be highly available and
durable.
 Standard Infrequent Access (Standard IA): This is a cheaper data-storage class and
as the name suggests, this class is best suited for storing infrequently accessed data
like log files or data archives. Note that there may be a per GB data retrieval fee
associated with Standard IA class.
 Intelligent Tiering: This service class classifies your files automatically into
frequently accessed and infrequently accessed and stores the infrequently accessed
data in infrequent access storage to save costs. This is useful for unpredictable data
access to an S3 bucket.
 One Zone Infrequent Access (One Zone IA): All the files on your S3 have their
copies stored in a minimum of 3 Availability Zones. One Zone IA stores this data in a
single availability zone. It is only recommended to use this storage class for
infrequently accessed, non-essential data. There may be a per GB cost for data
retrieval.
 Reduced Redundancy Storage (RRS): All the other S3 classes ensure the durability
of 99.999999999%. RRS only ensures a 99.99% durability. AWS no longer
recommends RRS due to its less durability. However, it can be used to store non-
essential data.
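
Tying the terminology and storage classes above together, here is a minimal boto3 sketch that
creates a bucket, enables versioning, and uploads and retrieves an object with SSE-S3
encryption and the Standard-IA storage class; the bucket name and region are placeholders, and
AWS credentials are assumed to be configured:

    import boto3  # pip install boto3

    s3 = boto3.client("s3", region_name="us-east-1")
    bucket = "my-example-bucket-name-12345"  # bucket names must be globally unique

    # Create the bucket (the "parent folder" for objects).
    s3.create_bucket(Bucket=bucket)

    # Enable versioning so overwritten objects keep their previous versions.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Upload an object under a key, with SSE-S3 encryption and the Standard-IA
    # storage class chosen to illustrate the options discussed above.
    s3.put_object(
        Bucket=bucket,
        Key="javaPrograms/GFG.java",
        Body=b"public class GFG {}",
        ServerSideEncryption="AES256",
        StorageClass="STANDARD_IA",
    )

    # Retrieve the object by bucket + key.
    obj = s3.get_object(Bucket=bucket, Key="javaPrograms/GFG.java")
    print(obj["Body"].read().decode())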
Security Issues in Cloud Computing

Cloud computing is a type of technology that provides remote services over the internet to
manage, access, and store data rather than storing it on servers or local drives. This
technology is also known as serverless technology. Here the data can be anything, such as
images, audio, video, documents, files, etc.
Need for Cloud Computing:
Before using cloud computing, most large as well as small IT companies used traditional
methods, i.e., they stored data on their own servers and needed a separate server room for
that. In that server room there would be database servers, mail servers, firewalls, routers,
modems, high-speed network devices, etc. IT companies had to spend a lot of money on this. To
reduce these problems and costs, cloud computing came into existence, and most companies
shifted to this technology.

Security Issues in Cloud Computing:


There is no doubt that cloud computing provides various advantages, but there are also some
security issues in cloud computing, as described below.

1. Data Loss –
Data loss is one of the issues faced in cloud computing. It is also known as data
leakage. As we know, our sensitive data is in the hands of somebody else, and we do not
have full control over our database. So, if the security of the cloud service is
breached by hackers, it may be possible for the hackers to get access to our sensitive
data or personal files.

2. Interference of Hackers and Insecure APIs –

Talking about the cloud and its services means talking about the Internet, and the easiest
way to communicate with the cloud is through APIs. It is therefore important to protect the
interfaces and APIs that external users rely on. Some cloud services are also exposed in the
public domain, which makes them a vulnerable part of cloud computing because third parties
can reach them, and through these services hackers may be able to compromise or harm our
data.

3. User Account Hijacking –

Account hijacking is one of the most serious security issues in cloud computing. If the
account of a user or an organization is hijacked by a hacker, the hacker gains full
authority to perform unauthorized activities.

4. Changing Service Provider –

Vendor lock-in is also an important security issue in cloud computing. Many organizations
face different problems when shifting from one vendor to another. For example, if an
organization wants to shift from AWS to Google Cloud, it has to migrate all of its data, and
because the two platforms use different techniques and functions, the migration itself
creates problems. The pricing of AWS may also differ from that of Google Cloud, etc.

5. Lack of Skill –
Working in the cloud, shifting to another service provider, needing an extra feature, or
knowing how to use a feature are the main problems for an IT company that does not have
skilled employees. Working with cloud computing therefore requires skilled people.

6. Denial of Service (DoS) attack –

This type of attack occurs when the system receives too much traffic. DoS attacks mostly
target large organizations such as the banking sector, government sector, etc. When a DoS
attack occurs, services become unavailable, and recovering from the attack requires a great
amount of money as well as time.

Almost every organization has adopted cloud computing to varying degrees within their
business. However, with this adoption of the cloud comes the need to ensure that the
organization’s cloud security strategy is capable of protecting against the top threats to cloud
security.

Misconfiguration

Misconfigurations of cloud security settings are a leading cause of cloud data breaches. Many
organizations’ cloud security posture management strategies are inadequate for protecting
their cloud-based infrastructure.

Several factors contribute to this. Cloud infrastructure is designed to be easily usable and to
enable easy data sharing, making it difficult for organizations to ensure that data is only
accessible to authorized parties. Organizations using cloud-based infrastructure also do not
have complete visibility and control over it, meaning that they need to
rely upon security controls provided by their cloud service provider (CSP) to configure and
secure their cloud deployments. Since many organizations are unfamiliar with securing cloud
infrastructure and often have multi-cloud deployments – each with a different array of
vendor-provided security controls – it is easy for a misconfiguration or security oversight to
leave an organization’s cloud-based resources exposed to attackers.

Unauthorized Access

Unlike an organization’s on-premises infrastructure, their cloud-based deployments are


outside the network perimeter and directly accessible from the public Internet. While this is
an asset for the accessibility of this infrastructure to employees and customers, it also makes
it easier for an attacker to gain unauthorized access to an organization’s cloud-based
resources. Improperly-configured security or compromised credentials can enable an attacker
to gain direct access, potentially without an organization’s knowledge.

Insecure Interfaces/APIs

CSPs often provide a number of application programming interfaces (APIs) and interfaces for
their customers. In general, these interfaces are well-documented in an attempt to make them
easily-usable for a CSP’s customers.

However, this creates potential issues if a customer has not properly secured the interfaces for
their cloud-based infrastructure. The documentation designed for the customer can also be
used by a cybercriminal to identify and exploit potential methods for accessing and
exfiltrating sensitive data from an organization’s cloud environment.

Hijacking of Accounts

Many people have extremely weak password security, including password reuse and the use
of weak passwords. This problem exacerbates the impact of phishing attacks and data
breaches since it enables a single stolen password to be used on multiple different accounts.

Account hijacking is one of the more serious cloud security issues as organizations are
increasingly reliant on cloud-based infrastructure and applications for core business
functions. An attacker with an employee’s credentials can access sensitive data or
functionality, and compromised customer credentials give full control over their online
account. Additionally, in the cloud, organizations often lack the ability to identify and
respond to these threats as effectively as for on-premises infrastructure.

Lack of Visibility

An organization’s cloud-based resources are located outside of the corporate network and run
on infrastructure that the company does not own. As a result, many traditional tools for
achieving network visibility are not effective for cloud environments, and some organizations
lack cloud-focused security tools. This can limit an organization’s ability to monitor their
cloud-based resources and protect them against attack.

External Sharing of Data

The cloud is designed to make data sharing easy. Many clouds provide the option to
explicitly invite a collaborator via email or to share a link that enables anyone with the URL
to access the shared resource.

While this easy data sharing is an asset, it can also be a major cloud security issue. The use of
link-based sharing – a popular option since it is easier than explicitly inviting each intended
collaborator – makes it difficult to control access to the shared resource. The shared link can
be forwarded to someone else, stolen as part of a cyberattack, or guessed by a cybercriminal,
providing unauthorized access to the shared resource. Additionally, link-based sharing makes
it impossible to revoke access to only a single recipient of the shared link.
Malicious Insiders

Insider threats are a major security issue for any organization. A malicious insider already has
authorized access to an organization’s network and some of the sensitive resources that it
contains. Attempts to gain this level of access are what reveals most attackers to their target,
making it hard for an unprepared organization to detect a malicious insider.

On the cloud, detection of a malicious insider is even more difficult. With cloud
deployments, companies lack control over their underlying infrastructure, making many
traditional security solutions less effective. This, along with the fact that cloud-based
infrastructure is directly accessible from the public Internet and often suffers from security
misconfigurations, makes it even more difficult to detect malicious insiders.

Cyberattacks

Cybercrime is a business, and cybercriminals select their targets based upon the expected
profitability of their attacks. Cloud-based infrastructure is directly accessible from the public
Internet, is often improperly secured, and contains a great deal of sensitive and valuable data.
Additionally, the cloud is used by many different companies, meaning that a successful attack
can likely be repeated many times with a high probability of success. As a result,
organizations’ cloud deployments are a common target of cyberattacks.

Denial of Service Attacks

The cloud is essential to many organizations’ ability to do business. They use the cloud to
store business-critical data and to run important internal and customer-facing applications.

This means that a successful Denial of Service (DoS) attack against cloud infrastructure is
likely to have a major impact on a number of different companies. As a result, DoS attacks
where the attacker demands a ransom to stop the attack pose a significant threat to an
organization’s cloud-based resources.

Main Cloud Security Concerns in 2021

In the Cloud Security Report, organizations were asked about their major security concerns
regarding cloud environments. Despite the fact that many organizations have decided to
move sensitive data and important applications to the cloud, concerns about how they can
protect it there abound.

Data Loss/Leakage

Cloud-based environments make it easy to share the data stored within them. These
environments are accessible directly from the public Internet and include the ability to share
data easily with other parties via direct email invitations or by sharing a public link to the
data.

The ease of data sharing in the cloud – while a major asset and key to collaboration in the
cloud – creates serious concerns regarding data loss or leakage. In fact, 69% of organizations
point to this as their greatest cloud security concern. Data sharing using public links or
setting a cloud-based repository to public makes it accessible to anyone with knowledge of
the link, and tools exist specifically for searching the Internet for these unsecured cloud
deployments.

Data Privacy/Confidentiality

Data privacy and confidentiality is a major concern for many organizations. Data protection
regulations like the EU’s General Data Protection Regulation (GDPR), the Health Insurance
Portability and Accessibility Act (HIPAA), the Payment Card Industry Data Security
Standard (PCI DSS) and many more mandate the protection of customer data and impose
strict penalties for security failures. Additionally, organizations have a large amount of
internal data that is essential to maintaining competitive advantage.

Placing this data on the cloud has its advantages but also has created major security concerns
for 66% of organizations. Many organizations have adopted cloud computing but lack the
knowledge to ensure that they and their employees are using it securely. As a result, sensitive
data is at risk of exposure – as demonstrated by a massive number of cloud data breaches.

Accidental Exposure of Credentials

Phishers commonly use cloud applications and environments as a pretext in their phishing
attacks. With the growing use of cloud-based email (G-Suite, Microsoft 365, etc.) and
document sharing services (Google Drive, Dropbox, OneDrive), employees have become
accustomed to receiving emails with links that might ask them to confirm their account
credentials before gaining access to a particular document or website.

This makes it easy for cybercriminals to learn an employee’s credentials for cloud services.
As a result, accidental exposure of cloud credentials is a major concern for 44% of
organizations since it potentially compromises the privacy and security of their cloud-based
data and other resources.

Incident Response

Many organizations have strategies in place for responding to internal cyber security
incidents. Since the organization owns their entire internal network infrastructure and security
personnel are on-site, it is possible to lock down the incident. Additionally, this ownership of
their infrastructure means that the company likely has the visibility necessary to identify the
scope of the incident and perform the appropriate remediation actions.

With cloud-based infrastructure, a company only has partial visibility and ownership of their
infrastructure, making traditional processes and security tools ineffective. As a result, 44% of
companies are concerned about their ability to perform incident response effectively in the
cloud.

Legal and Regulatory Compliance

Data protection regulations like PCI DSS and HIPAA require organizations to demonstrate
that they limit access to the protected information (credit card data, healthcare patient
records, etc.). This could require creating a physically or logically isolated part of the
organization’s network that is only accessible to employees with a legitimate need to access
this data.

When moving data protected by these and similar regulations to the cloud, achieving and
demonstrating regulatory compliance can be more difficult. With a cloud deployment,
organizations only have visibility and control into some of the layers of their infrastructure.
As a result, legal and regulatory compliance is considered a major cloud security issue by
42% of organizations and requires specialized cloud compliance solutions.

Data Sovereignty/Residence/Control

Most cloud providers have a number of geographically distributed data centres. This helps to
improve the accessibility and performance of cloud-based resources and makes it easier for
CSPs to ensure that they are capable of maintaining service level agreements in the face of
business-disrupting events such as natural disasters, power outages, etc.

Organizations storing their data in the cloud often have no idea where their data is actually
stored within a CSP’s array of data centres. This creates major concerns around data
sovereignty, residence, and control for 37% of organizations. With data protection regulations
such as the GDPR limiting where EU citizens data can be sent, the use of a cloud platform
with data centres outside of the approved areas could place an organization in a state of
regulatory non-compliance. Additionally, different jurisdictions have different laws regarding
access to data for law enforcement and national security, which can impact the data privacy
and security of an organization’s customers.

Protecting the Cloud

The cloud provides a number of advantages to organizations; however, it also comes with its
own security threats and concerns. Cloud-based infrastructure is very different from an on-
premises data centre, and traditional security tools and strategies are not always able to secure
it effectively. For more information about leading cloud security issues and threats, download
the Cloud Security Report.

Privacy Challenges

Cloud computing is a widely discussed topic today, with interest from all fields, be it
research, academia, or the IT industry. It has suddenly become a hot topic in international
conferences and other venues throughout the world. The spike in job opportunities is attributed
to the huge amounts of data being processed and stored on cloud servers. The cloud paradigm
revolves around the convenient, easy provision of a huge pool of shared computing resources.

The rapid development of the cloud has led to more flexibility, cost-cutting, and scalability of
products but also faces an enormous amount of privacy and security challenges. Since it is a
relatively new concept and is evolving day by day, there are undiscovered security issues that
creep up and need to be taken care of as soon as discovered. Here we discuss the top 7
privacy challenges encountered in cloud computing:
1. Data Confidentiality Issues

Confidentiality of the user's data is an important issue to consider when outsourcing extremely
delicate and sensitive data to a cloud service provider. Personal data should be unreachable by
users who do not have proper authorization to access it, and one way of ensuring confidentiality
is the use of strict access control policies and regulations. The lack of trust between users and
the cloud service provider or cloud database service provider regarding the data is a major
security concern and holds a lot of people back from using cloud services.

2. Data Loss Issues

Data loss or data theft is one of the major security challenges that cloud providers face. If a
cloud vendor has reported the loss or theft of critical or sensitive data in the past, more than
sixty percent of users would decline to use the cloud services provided by that vendor. Outages
of cloud services are reported quite frequently, even at firms such as Dropbox, Microsoft, and
Amazon, which in turn results in an absence of trust in these services at critical times. Also,
it is quite easy for an attacker to gain access to multiple storage units once a single one is
compromised.

3. Geographical Data Storage Issues

Since cloud infrastructure is distributed across different geographical locations throughout the
world, the user's data is often stored in a location outside their legal jurisdiction. This
raises concerns about how local law enforcement and regulations apply to data stored outside
their region. Moreover, users fear that local laws can be violated, because the dynamic nature of
the cloud makes it very difficult to designate a specific server to be used for trans-border data
transmission.

4. Multi-Tenancy Security Issues

Multi-tenancy is a paradigm that follows the concept of sharing computational resources, data
storage, applications, and services among different tenants. This is then hosted by the same
logical or physical platform at the cloud service provider’s premises. While following this
approach, the provider can maximize profits but puts the customer at a risk. Attackers can
take undue advantage of the multi-residence opportunities and can launch various attacks
against their co-tenants which can result in several privacy challenges.

5. Transparency Issues

In cloud computing security, transparency means the willingness of a cloud service provider to
reveal details about its security preparedness, including its policies and regulations on
security, privacy, and service levels. Beyond willingness, it also matters how accessible the
security-readiness data and information actually are. It does not matter how much security
information an organization has at hand if it is not presented in an organized and easily
understandable way for cloud service users and auditors; in that case, the transparency of the
organization must still be rated relatively low.
6. Hypervisor Related Issues

Virtualization means the logical abstraction of computing resources from physical restrictions
and constraints. But this poses new challenges for factors like user authentication,
accounting, and authorization. The hypervisor manages multiple Virtual Machines and
therefore becomes the target of adversaries. Different from the physical devices that are
independent of one another, Virtual Machines in the cloud usually reside in a single physical
device that is managed by the same hypervisor. The compromise of the hypervisor will hence
put various virtual machines at risk. Moreover, the newness of the hypervisor technology,
which includes isolation, security hardening, access control, etc. provides adversaries with
new ways to exploit the system.

7. Managerial Issues

There are not only technical aspects of cloud privacy challenges but also non-technical and
managerial ones. Even after implementing a technical solution to a problem or a product, not
managing it properly is eventually bound to introduce vulnerabilities. Some examples are
lack of control, security and privacy management for virtualization, developing
comprehensive service level agreements, going through cloud service vendors and user
negotiations, etc.

Operating System Security

Every computer system and software design must handle all security risks and implement the
necessary measures to enforce security policies. At the same time, it's critical to strike a
balance because strong security measures might increase costs while also limiting the
system's usability, utility, and smooth operation. As a result, system designers must assure
efficient performance without compromising security.

In this article, you will learn about operating system security with its issues and other
features.

What is Operating System Security?

The process of ensuring OS availability, confidentiality, integrity is known as operating


system security. OS security refers to the processes or measures taken to protect the operating
system from dangers, including viruses, worms, malware, and remote hacker intrusions.
Operating system security comprises all preventive-control procedures that protect any
system assets that could be stolen, modified, or deleted if OS security is breached.
Security refers to providing safety for computer system resources like software, CPU,
memory, disks, etc. It can protect against all threats, including viruses and unauthorized
access. It can be enforced by assuring the operating system's integrity, confidentiality, and
availability. If an illegal user runs a computer application, the computer or data stored may
be seriously damaged.

System security may be threatened through two violations, and these are as follows:

1. Threat

A program that has the potential to harm the system seriously.

2. Attack

A breach of security that allows unauthorized access to a resource.

There are two types of security breaches that can harm the system: malicious and accidental.
Malicious threats are a type of destructive computer code or web script that is designed to
cause system vulnerabilities that lead to back doors and security breaches. On the other hand,
Accidental Threats are comparatively easier to protect against.

Security may be compromised through the breaches. Some of the breaches are as follows:

1. Breach of integrity

This violation has unauthorized data modification.

2. Theft of service

It involves the unauthorized use of resources.

3. Breach of confidentiality

It involves the unauthorized reading of data.


4. Breach of availability

It involves the unauthorized destruction of data.

5. Denial of service

It includes preventing legitimate use of the system. Some attacks may be accidental.

The goal of Security System

There are several goals of system security. Some of them are as follows:

1. Integrity

Unauthorized users must not be allowed to access the system's objects, and users with
insufficient rights should not modify the system's critical files and resources.

2. Secrecy

The system's objects must only be available to a small number of authorized users. The
system files should not be accessible to everyone.

3. Availability

All system resources must be accessible to all authorized users, i.e., no single user/process
should be able to consume all system resources. If such a situation arises, service denial may
occur. In this case, malware may restrict system resources and prevent legitimate
processes from accessing them.

Types of Threats

There are mainly two types of threats that occur. These are as follows:

Program threats

The operating system's processes and kernel carry out the specified task as directed. Program
Threats occur when a user program causes these processes to do malicious operations. The
common example of a program threat is that when a program is installed on a computer, it
could store and transfer user credentials to a hacker. There are various program threats. Some
of them are as follows:

1.Virus

A virus may replicate itself on the system. Viruses are extremely dangerous and can
modify or delete user files as well as crash computers. A virus is a little piece of code
embedded in a system program. As the user interacts with the program, the virus spreads into
other files and programs, potentially rendering the system inoperable.

2. Trojan Horse
This type of application captures user login credentials. It stores them to transfer them to a
malicious user who can then log in to the computer and access system resources.

3. Logic Bomb

A logic bomb is a situation in which software only misbehaves when particular criteria are
met; otherwise, it functions normally.

4. Trap Door

A trap door is when a program that is supposed to work as expected has a security weakness
in its code that allows it to do illegal actions without the user's knowledge.

System Threats

System threats are described as the misuse of system services and network connections to
cause user problems. These threats may be used to trigger the program threats over an entire
network, known as program attacks. System threats make an environment in which OS
resources and user files may be misused. There are various system threats. Some of them are
as follows:

1. Port Scanning

It is a method by which the cracker determines the system's vulnerabilities for an attack. It is
a fully automated process that involves connecting to specific ports via TCP/IP. To hide the
attacker's identity, port scanning attacks are often launched through zombie systems: previously
compromised systems that appear to serve their owners normally while being used for such
purposes.

2. Worm

The worm is a process that can choke a system's performance by exhausting all system
resources. A Worm process makes several clones, each consuming system resources and
preventing all other processes from getting essential resources. Worm processes can even
bring a network to a halt.

3. Denial of Service

Denial of service attacks usually prevents users from legitimately using the system. For
example, if a denial-of-service attack is executed against the browser's content settings, a user
may be unable to access the internet.

Threats to Operating System

There are various threats to the operating system. Some of them are as follows:

Malware

It contains viruses, worms, trojan horses, and other dangerous software. These are generally
short code snippets that may corrupt files, delete the data, replicate to propagate further, and
even crash a system. The malware frequently goes unnoticed by the victim user while
criminals silently extract important data.

Network Intrusion

Network intruders are classified as masqueraders, misfeasors, and clandestine users. A
masquerader is an unauthorized person who gains access to a system and uses an authorized
person's account. A misfeasor is a legitimate user who gains unauthorized access to and misuses
programs, data, or resources. A clandestine user seizes supervisory control and uses it to evade
access controls and suppress audit collection.

Buffer Overflow

It is also known as buffer overrun. It is the most common and dangerous security issue of the
operating system. It is defined as a condition at an interface under which more input is placed
into a buffer or data-holding area than its allocated capacity, overwriting other information.
Attackers use such a situation to crash a system or insert specially crafted
malware that allows them to take control of the system.

How to ensure Operating System Security?

There are various ways to ensure operating system security. These are as follows:

Authentication

The process of identifying every system user and associating the programs executing with
those users is known as authentication. The operating system is responsible for implementing
a security system that ensures the authenticity of a user who is executing a specific program.
In general, operating systems identify and authenticate users in three ways.

1. Username/Password

Every user has a unique username and password that must be entered correctly before accessing
the system; the system stores and checks these credentials (a short sketch follows the list
below).

2. User Attribution

These techniques usually include biometric verification, such as fingerprints, retina scans,
etc. This authentication is based on the user's unique characteristics, which are compared
against samples already stored in the system database. Access is granted only if there is a
match.

3. User card and Key

To login into the system, the user must punch a card into a card slot or enter a key produced
by a key generator into an option provided by the operating system.
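As a rough illustration of the username/password mechanism noted above, an operating system or
service normally stores only a salted hash of the password rather than the password itself. The
sketch below uses Python's standard library with PBKDF2; the iteration count, salt size, and
sample password are illustrative choices, not a prescription:

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # Store only a salted PBKDF2 hash, never the plain-text password.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    # Recompute the hash and compare in constant time.
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, digest = hash_password("s3cret-pass")
print(verify_password("s3cret-pass", salt, digest))   # True
print(verify_password("wrong-pass", salt, digest))    # False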
One Time passwords

Along with standard authentication, one-time passwords give an extra layer of security. Every
time a user attempts to log in, a unique one-time password is required, and once it has been used
it cannot be reused. One-time passwords may be implemented in several ways, listed below; a
minimal code sketch follows the list.

1. Secret Key

The user is given a hardware device that can generate a secret id that is linked to the user's id.
The system prompts for such a secret id, which must be generated each time you log in.

2. Random numbers

Users are given cards that have alphabets and numbers printed on them. The system requests
numbers that correspond to a few alphabets chosen at random.

3. Network password

Some commercial applications issue one-time passwords to registered mobile/email


addresses, which must be input before logging in.
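As a minimal sketch of how such one-time passwords can be generated, the following Python code
implements an HMAC-based OTP in the spirit of RFC 4226/6238; the shared secret and the 30-second
time step are illustrative values:

import hashlib
import hmac
import struct
import time

def hotp(secret, counter, digits=6):
    # HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = (struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF) % (10 ** digits)
    return str(code).zfill(digits)

def totp(secret, step=30):
    # Time-based variant: the counter is the current 30-second window.
    return hotp(secret, int(time.time()) // step)

shared_secret = b"illustrative-shared-secret"   # agreed between the user's device and the server
print(totp(shared_secret))                      # the code changes every 30 seconds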

Firewalls

Firewalls are essential for monitoring all incoming and outgoing traffic. A firewall imposes a
local security policy by defining which traffic may pass through it. Firewalls are an efficient
way of protecting networks or local systems from network-based security threats.
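In a cloud deployment, this kind of traffic filtering is often expressed as security-group rules
rather than a host firewall. The following is a hedged boto3 sketch that allows SSH only from one
network; the security group ID and CIDR range are placeholders:

import boto3

ec2 = boto3.client("ec2")  # assumes AWS credentials and a region are configured

# Allow SSH (TCP/22) only from one office network; all other inbound traffic stays blocked.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",          # placeholder security group ID
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": "203.0.113.0/24", "Description": "Office network"}],
    }],
)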

Physical Security

The most important method of maintaining operating system security is physical security. An
attacker with physical access to a system may edit, remove, or steal important files since
operating system code and configuration files are stored on the hard drive.

Operating System Security Policies and Procedures

Various operating system security policies may be implemented based on the organization
that you are working in. In general, an OS security policy is a document that specifies the
procedures for ensuring that the operating system maintains a specific level of integrity,
confidentiality, and availability.

OS security protects systems and data from worms, malware, ransomware, backdoor intrusions,
viruses, and other threats. Security policies cover all preventive activities and procedures that
protect an operating system, including protection against data theft, modification, and deletion.

As OS security policies and procedures cover a large area, there are various techniques for
addressing them. Some of them are as follows:

1. Installing and updating anti-virus software


2. Ensure the systems are patched or updated regularly
3. Implementing user management policies to protect user accounts and privileges.
4. Installing a firewall and ensuring that it is properly set to monitor all incoming and
outgoing traffic.

To develop and implement OS security policies and procedures, you must first determine which
assets, systems, hardware, and data are the most vital to your organization. Once that is done,
policies can be developed to secure and safeguard them properly.

Azure Virtual Machine Security

There are many services available to secure our virtual machine.

Azure Active Directory

 By using the Azure Active Directory, we can control access to our virtual machines to
different users or groups of users. When we create a virtual machine, we can assign a
user to it, and while assigning the user to the virtual machine, we also associate a
particular role with them. That role defines the level of access the user will have on
our virtual machine.
 Users, groups, and applications from that directory can manage resources in the Azure
subscription.
 It grants access by assigning the appropriate RBAC role to users, groups, and
applications at a certain scope. The scope of a role assignment can be a subscription, a
resource group, or a single resource.
 Azure RBAC has three essential roles that apply to all resource types:
o Owner: They have full access to all resources, including the right to delegate
access to others.
o Contributor: They can create and manage all types of Azure resources but
can't grant access to others.
o Reader: They can only view existing Azure resources.

Azure security center

The Azure security center identifies potential virtual machine (VM) configuration issues and
targeted security threats. These might include VMs that are missing network security groups,
unencrypted disks, and brute-force Remote Desktop Protocol (RDP) attacks.

We can customize the recommendations we would like to see from the Security Center using
security policies.

 Set up data collection


 Set up security policies
 View VM configuration health
 Remediate configuration issues
 View detected threats
Managed Service Identity

Managed Service Identity is a newer Azure feature. Earlier, whenever we deployed an application
into a virtual machine, we generally kept user IDs and passwords in a configuration file inside
the application's folder. If someone gained access to that virtual machine, they could open the
configuration file and view those credentials as well. To further increase the security of our
application code and of the services it accesses, we can use Managed Service Identity instead.

Other Security Features

 Network security group: To filter the traffic in and out of the virtual machine.
 Microsoft Antimalware for Azure: We can install on our Azure virtual machines to
secure our machines against any malware.
 Encryption: We can enable Azure Disk Encryption.
 Key Vault and SSH Keys: we can use key vault to store the certificates or any
sensitive key.
 Policies: All the security-related policies we can apply using it.
UNIT V:
Cloud Application Development: Amazon Web Services : EC2 – instances, connecting
clients, security rules, launching, usage of S3 in Java, Installing Simple Notification Service
on Ubuntu 10.04, Installing Hadoop on Eclipse, Cloud based simulation of a Distributed trust
algorithm, Cloud service for adaptive data streaming ( Text Book 1), Google: Google App
Engine, Google Web Toolkit (Text Book 2), Microsoft: Azure Services Platform, Windows
live, Exchange Online, Share Point Services, Microsoft Dynamics CRM (Text Book2).

EC2

 EC2 stands for Amazon Elastic Compute Cloud.


 Amazon EC2 is a web service that provides resizable compute capacity in the cloud.
 Amazon EC2 reduces the time required to obtain and boot new user instances to
minutes. In the past, if you needed a server you had to raise a purchase order and wait
for delivery and cabling, which was a very time-consuming process. Now Amazon provides
EC2, a virtual machine in the cloud, which has completely changed the industry.
 You can scale the compute capacity up and down as per the computing requirement
changes.
 Amazon EC2 changes the economics of computing by allowing you to pay only for
the resources that you actually use. Previously, when buying physical servers, you would
look for a server with extra CPU and RAM capacity and buy it for a 5-year term, so you had
to plan 5 years in advance and sink a lot of capital into the investment. EC2 lets you pay
only for the capacity you actually use.
 Amazon EC2 provides the developers with the tools to build resilient applications that
isolate themselves from some common scenarios.
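Programmatically, launching and terminating an instance reflects this pay-for-what-you-use model.
The following is a minimal boto3 sketch; the AMI ID, key pair name, security group ID, and region
are placeholders that must be replaced with values from your own account:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # region is an assumption

# Launch a single t2.micro instance.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                 # placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",                           # placeholder key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],       # placeholder security group
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# Later, stop paying for the capacity by terminating the instance.
ec2.terminate_instances(InstanceIds=[instance_id])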

EC2 Pricing Options


On Demand

 It allows you to pay a fixed rate by the hour or even by the second with no
commitment.
 Linux instance is by the second and windows instance is by the hour.
 On Demand is perfect for the users who want low cost and flexibility of Amazon EC2
without any up-front investment or long-term commitment.
 It is suitable for the applications with short term, spiky or unpredictable workloads
that cannot be interrupted.
 It is useful for the applications that have been developed or tested on Amazon EC2 for
the first time.
 On Demand instance is recommended when you are not sure which instance type is
required for your performance needs.

Reserved

 It is a way of making a reservation with Amazon or we can say that we make a


contract with Amazon. The contract can be for 1 or 3 years in length.
 In a Reserved instance, you are making a contract means you are paying some
upfront, so it gives you a significant discount on the hourly charge for an instance.
 It is useful for applications with steady state or predictable usage.
 It is used for those applications that require reserved capacity.
 Users can make up-front payments to reduce their total computing costs. For example,
if you pay everything up front on a 3-year contract, you get the maximum discount; if
you pay nothing up front on a 1-year contract, the discount is much smaller.

Types of Reserved Instances:

 Standard Reserved Instances


 Convertible Reserved Instances
 Scheduled Reserved Instances

Standard Reserved Instances

 It provides a discount of up to 75% off on demand. For example, you are paying all
up-fronts for 3 year contract.
 It is useful when your Application is at the steady-state.
Convertible Reserved Instances

 It provides a discount of up to 54% off on demand.


 It provides the feature that has the capability to change the attributes of RI as long as
the exchange results in the creation of Reserved Instances of equal or greater value.
 Like Standard Reserved Instances, it is also useful for the steady state applications.

Scheduled Reserved Instances

 Scheduled Reserved Instances are available to launch within the specified time
window you reserve.
 It allows you to match your capacity reservation to a predictable recurring schedule
that only requires a fraction of a day, a week, or a month.

Spot Instances

 It allows you to bid whatever price you want for instance capacity, providing better
savings if your applications have flexible start and end times.
 Spot Instances are useful for those applications that have flexible start and end times.
 It is useful for those applications that are feasible at very low compute prices.
 It is useful for those users who have an urgent need for large amounts of additional
computing capacity.
 EC2 Spot Instances are offered at a steep discount compared to On-Demand prices.
 Spot Instances are used to optimize your costs on the AWS cloud and scale your
application's throughput up to 10X.
 EC2 Spot Instances run until you terminate them or until AWS reclaims the capacity, in
which case you receive a short interruption notice.

Dedicated Hosts

 A dedicated host is a physical server with EC2 instance capacity which is fully
dedicated to your use.
 A dedicated host is a physical EC2 server that can help you reduce costs by allowing
you to use your existing server-bound software licenses (for example, VMware, Oracle, or
SQL Server, depending on which licenses you can bring over to AWS).
 Dedicated hosts are also used to address compliance requirements.
 It can be purchased as a Reservation for up to 70% off On-Demand price.

Web Services in Cloud Computing

The Internet is the worldwide connectivity of hundreds of thousands of computers belonging


to many different networks.

A web service is a standardized method for propagating messages between client and server
applications on the World Wide Web. A web service is a software module that aims to
accomplish a specific set of tasks. Web services can be found and implemented over a
network in cloud computing.

The web service would be able to provide the functionality to the client that invoked the web
service.

A web service is a set of open protocols and standards that allow data exchange between
different applications or systems. Web services can be used by software programs written in
different programming languages and on different platforms to exchange data through
computer networks such as the Internet. In the same way, communication on a computer can
be inter-processed.

Any software, application, or cloud technology that uses a standardized web protocol (HTTP or
HTTPS) to connect, interoperate, and exchange data messages, usually in XML (Extensible Markup
Language), over the Internet is considered a web service.

Web services allow programs developed in different languages to be connected between a


client and a server by exchanging data over a web service. A client invokes a web service by
submitting an XML request, to which the service responds with an XML response.

 Web services functions


 A web service can be accessed over the Internet or an intranet.
 It uses a standardized XML messaging protocol.
 It is independent of operating system and programming language.
 It is self-describing, thanks to the XML standard.
 It can be discovered using a simple location mechanism (for example, a registry).

Web Service Components

XML and HTTP form the most fundamental web service platform. All typical web services use
the following components:

1. SOAP (Simple Object Access Protocol)

SOAP stands for "Simple Object Access Protocol". It is a transport-independent messaging


protocol. SOAP is built on sending XML data in the form of SOAP messages, and an XML document
is attached to each message.

Only the structure of an XML document, not the content, follows a pattern. The great thing
about web services and SOAP is that everything is sent through HTTP, the standard web
protocol.

Every SOAP document requires a root element known as the Envelope element. In an XML document,
the root element is the first element.

The "envelope" is divided into two halves. The header comes first, followed by the body.
Routing data, or information that directs the XML document to which client it should be sent,
is contained in the header. The real message will be in the body.
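A hedged sketch of what calling such a SOAP service can look like from Python is shown below; the
endpoint URL and the GetPrice operation are invented for illustration, and the third-party
requests package is assumed to be installed:

import requests  # assumption: any HTTP client works; requests is used here for brevity

SOAP_ENVELOPE = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header/>
  <soap:Body>
    <GetPrice xmlns="http://example.com/prices">
      <Item>Widget</Item>
    </GetPrice>
  </soap:Body>
</soap:Envelope>"""

# POST the SOAP envelope over plain HTTP(S) to a hypothetical pricing service.
response = requests.post(
    "https://example.com/pricing-service",
    data=SOAP_ENVELOPE,
    headers={"Content-Type": "application/soap+xml"},
)
print(response.text)  # the service replies with a SOAP XML response in the body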
2. UDDI (Universal Description, Search, and Integration)

UDDI is a standard for specifying, publishing and searching online service providers. It
provides a specification that helps in hosting the data through web services. UDDI provides a
repository where WSDL files can be hosted so that a client application can search the WSDL
file to learn about the various actions provided by the web service. As a result, the client
application will have full access to UDDI, which acts as the database for all WSDL files.

The UDDI Registry will keep the information needed for online services, such as a telephone
directory containing the name, address, and phone number of a certain person so that client
applications can find where it is.

3. WSDL (Web Services Description Language)

The client implementing the web service must be aware of the location of the web service. If
a web service cannot be found, it cannot be used. Second, the client application must
understand what the web service does to implement the correct web service. WSDL, or Web
Service Description Language, is used to accomplish this. A WSDL file is another XML-
based file that describes what a web service does with a client application. The client
application will understand where the web service is located and how to access it using the
WSDL document.

How does web service work?

The diagram shows a simplified version of how a web service would function. The client will
use requests to send a sequence of web service calls to the server hosting the actual web
service.

Remote procedure calls are used to perform these requests. The calls to the methods hosted
by the respective web service are known as Remote Procedure Calls (RPC). Example:
Flipkart provides a web service that displays the prices of items offered on Flipkart.com. The
front end or presentation layer can be written in .NET or Java, and the web service can be
invoked from any programming language.
The data exchanged between the client and the server, XML, is the most important part of
web service design. XML (Extensible Markup Language) is a simple, intermediate language
understood by various programming languages. It is the equivalent of HTML.

As a result, when programs communicate with each other, they use XML. It forms a common
platform for applications written in different programming languages to communicate with
each other.

Web services employ SOAP (Simple Object Access Protocol) to transmit XML data between
applications. The data is sent using standard HTTP. A SOAP message is data sent from a web
service to an application. An XML document is all that is contained in a SOAP message. The
client application that calls the web service can be built in any programming language as the
content is written in XML.

Features of Web Service

Web services have the following characteristics:

(a) XML-based: A web service's information representation and record transport layers employ
XML. There is no need for networking, operating system, or platform bindings when using XML, so
web-service-based applications are highly interoperable at the middle level.

(b) Loosely Coupled: The consumer of a web service is not necessarily tied directly to that
service provider. The web service provider's interface may change over time without affecting
the consumer's ability to interact with the service. A tightly coupled system, in contrast,
means that the client and server logic are inextricably linked, so if one interface changes,
the other must be updated.

A loosely connected architecture makes software systems more manageable and easier to
integrate between different structures.

(c) Ability to be synchronous or asynchronous: Synchronicity refers to the binding of the client
to the execution of the function. In synchronous invocation, the client is blocked and must wait
for the service to complete its operation before continuing. Asynchronous operations allow the
client to initiate a task and continue with other tasks; the client gets its result later,
whereas a synchronous client gets its result immediately when the service completes.
Asynchronous capability is a key factor in enabling loosely coupled systems.

(d) Coarse-Grained: Object-oriented systems, such as Java, expose their services through
individual methods. An individual method is too fine-grained an operation to be useful at the
enterprise level. Building a Java application from scratch requires the development of several
fine-grained methods, which are then composed into a coarse-grained service that is consumed by
the client or another service.

Businesses, and the interfaces they expose, should be coarse-grained. Building web services is
an easy way to define coarse-grained services that have access to substantial business logic.
(e) Supports remote procedural calls: Consumers can use XML-based protocols to call
procedures, functions, and methods on remote objects that use web services. A web service
must support the input and output framework of the remote system.

Over the years, Enterprise JavaBeans (EJBs) and .NET components have become more prevalent in
architectural and enterprise deployments, and several RPC techniques are used to distribute and
access them.

A web service can support RPC either by providing services of its own, similar to a traditional
component, or by translating incoming invocations into an invocation of an EJB or a .NET
component.

(f) Supports document exchange: One of the most attractive features of XML is its generic way of
representing both data and complex documents, and web services support the exchange of such
documents between applications.

Cloud Computing Security Architecture

Security in cloud computing is a major concern. Proxy and brokerage services should be
employed to restrict a client from accessing the shared data directly. Data in the cloud should
be stored in encrypted form.

Security Planning

Before deploying a particular resource to the cloud, one should need to analyze several
aspects of the resource, such as:

 Select the resource that needs to move to the cloud and analyze its sensitivity to risk.
 Consider cloud service models such as IaaS, PaaS, and SaaS. These models require the
customer to be responsible for security at different service levels.
 Consider the cloud type, such as public, private, community, or hybrid.
 Understand the cloud service provider's system regarding data storage and its transfer
into and out of the cloud.
 The risk in cloud deployment mainly depends upon the service models and cloud
types.

Understanding Security of Cloud

Security Boundaries

The Cloud Security Alliance (CSA) stack model defines the boundaries between each
service model and shows how different functional units relate. A particular service model
defines the boundary between the service provider's responsibilities and the customer. The
following diagram shows the CSA stack model:
Key Points to CSA Model

 IaaS is the most basic level of service, with PaaS and SaaS being the next two levels
above it.
 Moving upwards, each service inherits the capabilities and security concerns of the
model beneath.
 IaaS provides the infrastructure, PaaS provides the platform development
environment, and SaaS provides the operating environment.
 IaaS has the lowest integrated functionality and security level, while SaaS has the
highest.
 This model describes the security boundaries at which cloud service providers'
responsibilities end and customers' responsibilities begin.
 Any protection mechanism below the security limit must be built into the system and
maintained by the customer.

Although each service model has a security mechanism, security requirements also depend on
where these services are located, private, public, hybrid, or community cloud.

Understanding data security

Since all data is transferred using the Internet, data security in the cloud is a major concern.
Here are the key mechanisms to protect the data.

 access control
 audit trail
 authentication
 authorization

The service model should include security mechanisms working in all of the above areas.

Separate access to data

Since the data stored in the cloud can be accessed from anywhere, we need to have a
mechanism to isolate the data and protect it from the client's direct access.

Brokered cloud storage access is a way of isolating storage in the cloud from direct client
access. In this approach, two services are created:

1. A broker, which has full access to the storage but no access to the client.
2. A proxy, which has no access to the storage but has access to both the client and the
broker.

Working of the brokered cloud storage access system, when the client issues a request to
access data:

1. The client data request goes to the external service interface of the proxy.
2. The proxy forwards the request to the broker.
3. The broker requests the data from the cloud storage system.
4. The cloud storage system returns the data to the broker.
5. The broker returns the data to the proxy.
6. Finally, the proxy sends the data to the client.

All the above steps are shown in the following diagram:
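The same delegation chain can also be sketched in a few lines of Python. The class names and the
in-memory "storage" below are invented purely to illustrate that the client only ever talks to
the proxy, the proxy only to the broker, and the broker only to the storage:

class CloudStorage:
    def __init__(self):
        self._objects = {"report.txt": "encrypted-bytes..."}
    def read(self, key):
        return self._objects[key]

class Broker:
    """Has full access to storage but never talks to clients directly."""
    def __init__(self, storage):
        self._storage = storage
    def fetch(self, key):
        return self._storage.read(key)

class Proxy:
    """Exposed to clients; has no direct access to storage."""
    def __init__(self, broker):
        self._broker = broker
    def handle_request(self, key):
        return self._broker.fetch(key)

proxy = Proxy(Broker(CloudStorage()))
print(proxy.handle_request("report.txt"))   # the client only ever sees the proxy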


Encryption

Encryption helps to protect the data from being hacked. It protects the data being transferred
and the data stored in the cloud. Although encryption helps protect data from unauthorized
access, it does not prevent data loss.
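As a minimal illustration of client-side encryption before data ever reaches the cloud, the
sketch below uses the third-party cryptography package (Fernet, a symmetric AES-based scheme);
the key handling shown is deliberately simplified and not a complete key-management solution:

# Requires the third-party "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()               # keep this key outside the cloud provider
cipher = Fernet(key)

plaintext = b"customer records ..."
ciphertext = cipher.encrypt(plaintext)    # this is what actually gets uploaded/stored

# Only a holder of the key can recover the data, even if the stored copy is exposed.
assert cipher.decrypt(ciphertext) == plaintext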

Why is cloud security architecture important?

The difference between "cloud security" and "cloud security architecture" is that the former is
built from problem-specific measures while the latter is built from threats. A cloud security
architecture can reduce or eliminate the holes in security that point-solution approaches almost
certainly leave behind.

It does this by building downward: defining threats starting with the users, moving to the cloud
environment and service provider, and then to the applications. Cloud security architectures can
also reduce redundancy in security measures, which contributes to threat mitigation and lowers
both capital and operating costs.

The cloud security architecture also organizes security measures, making them more
consistent and easier to implement, particularly during cloud deployments and
redeployments. Security is often undermined when it is illogical or overly complex, and such
flaws can be identified with a proper cloud security architecture.
Elements of cloud security architecture

The best way to approach cloud security architecture is to start with a description of the
goals. The architecture has to address three things: an attack surface represented by the
external access interfaces, a protected asset set that represents the information being
protected, and the vectors designed to attack the system, directly or indirectly, anywhere,
including in the cloud.

The goal of the cloud security architecture is accomplished through a series of functional
elements. These elements are often considered separately rather than part of a coordinated
architectural plan. It includes access security or access control, network security, application
security, contractual Security, and monitoring, sometimes called service security. Finally,
there is data protection, which are measures implemented at the protected-asset level.

Complete cloud security architecture addresses the goals by unifying the functional elements.

Cloud security architecture and shared responsibility model

The security and security architectures for the cloud are not single-player processes. Most
enterprises will keep a large portion of their IT workflow within their data centers, local
networks, and VPNs. The cloud adds additional players, so the cloud security architecture
should be part of a broader shared responsibility model.

A shared responsibility model is an architecture diagram and a contract form. It exists


formally between a cloud user and each cloud provider and network service provider if they
are contracted separately.

Each will divide the components of a cloud application into layers, with the top layer being
the responsibility of the customer and the lower layer being the responsibility of the cloud
provider. Each separate function or component of the application is mapped to the
appropriate layer depending on who provides it. The contract form then describes how each
party responds.

Launch a Website on AWS S3

Launching a website is one of the most important things for a company, whether it is a startup
or a well-established business. But launching a website is not an easy task; there are a lot of
things to take care of. AWS makes it easier for everyone: using the AWS S3 service, hosting a
website is child's play now. This gives company owners more time to focus on other important
things in the company.
Let’s see how you can host a website using AWS with some easy steps.

Step 1: Gathering the Basics

The most basic step is to first have a working AWS account and your front end code (.html
file) which will be the content of your website. Don’t worry about the .html content even a
basic <p>Hello World</p> can be made.
Step 2: Create a S3 Bucket of your website

To keep things simple we will use only one AWS service to host our website: AWS S3. AWS S3 is a
storage service where all files are stored in S3 buckets.

Login to your AWS account and choose S3 in search box.

After this when the S3 Dashboard opens, click on create bucket.

Then provide a globally unique name to your S3 Bucket and select the region you want your
bucket to be in.
After clicking on Next, you will see a panel asking you to define some tags to your bucket,
this is optional since tags are just for your recognition of the bucket. You can skip this step by
simply clicking on Next.

Once this is done, a new panel will come up where all public access to your bucket is denied by
default. Since we are going to host a website, the bucket should be public so that everyone can
see it.

To do this, untick the check box. Once you untick it, a pop-up will warn you that the bucket is
going to be public; don't panic, just check the acknowledgement box.

After this review your bucket and click on Create bucket.

Step 3: Uploading file into your S3 Bucket

Once the bucket is created, now it’s time to upload the .html file onto it. For this click on blue
Upload button on the top right.

Add your file and click on Next.

After uploading the file click on Next

Once you click on Next, under Manage public permissions select Grant public read access to
this object, so that your website is publicly readable.

At last select your S3 storage type, we choose the basic standard type. But, to reduce the cost
you can choose any other type depending upon your needs.

Now at the end just review the details and click on Upload.

After these step you can see that our index.html file is successfully uploaded.

Step 4: Configure the settings of your S3 Bucket

To inform your S3 bucket that it will be used for hosting your website, click on the Properties
tab. Then select the Static website hosting card and fill in your index document name; the error
document is optional (you can type 404.html).

Next, click on the Permissions tab and open the "Bucket Policy" subsection. Here you'll be
prompted to create a JSON object that contains the details of your bucket's access
permission policy.

This part can be confusing. For now, I’ll just give you the JSON that will grant full public
access to the files in your bucket. This will make the website publicly accessible.

Paste the following into the bucket policy editor:


{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadForGetBucketObjects",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
    }
  ]
}

In place of YOUR-BUCKET-NAME type your bucket name.

Once this is done, just click on Save. All done! You have now successfully set up a
simple static website on AWS S3.
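
Both console actions in this step, enabling static website hosting and attaching the bucket
policy, can also be performed from code. The boto3 sketch below makes the same assumptions
as before (placeholder bucket name, configured credentials) and reuses the policy shown above.

# Enable static website hosting and attach the public-read bucket policy.
import json
import boto3

bucket = "my-unique-gfg-bucket"   # placeholder; use your own bucket name
s3 = boto3.client("s3")

# Equivalent of the "Static website hosting" settings in the Properties tab.
s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "404.html"},
    },
)

# Equivalent of pasting the JSON into the Bucket Policy editor.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadForGetBucketObjects",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::%s/*" % bucket,
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))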

Step 5: Hosting your Website

To access your site, go back to the "Overview" tab in S3 and click on your index document.
You'll get a slide-in panel with the link to your website.

Copy and paste the link into your browser, and your website will be accessible.

How To Install AWS CLI – Amazon Simple Notification Service (SNS)?

Amazon Simple Notification Service (SNS) is used for Application-to-Application (A2A)
and Application-to-Person (A2P) communication. It provides developers with a highly
scalable, flexible, and cost-effective capability to publish messages from an application and
immediately deliver them to subscribers or other applications. Using the AWS Console, it is
easy to publish messages to your endpoint of choice (HTTP, SQS, Lambda, mobile push,
email, or SMS) and edit topic policies to control publisher and subscriber access.

Advantages of SNS:

1. Immediate, push-based delivery
2. Easy integration
3. Flexible message delivery using multiple protocols
4. It supports FIFO topics

Steps to create AWS SNS Service:

The steps below relate to the AWS email SNS service.

 Firstly, open the AWS CloudShell and use the following command to create the
topic, specifying the name of the topic, for example gfg-topic:

$ aws sns create-topic --name gfg-topic

 Subscribe to the Topic:

$ aws sns subscribe --topic-arn arn:aws:sns:us-west-2:123456789012:gfg-topic \
    --protocol email --notification-endpoint example@example.com

 Here you can choose the protocol through which your message will be sent, i.e.,
HTTP, SQS, Lambda, mobile push, email, or SMS. After running the command you
will receive an email; confirm your subscription by clicking on the given link.

 Publish to a topic: After subscribing to the topic, publish a message to it so that it is
delivered to the respective person or device.

$ aws sns publish --topic-arn arn:aws:sns:us-west-2:123456789012:gfg-topic \
    --message "Hello Geeks"

 Unsubscribe from the Topic: To stop receiving messages from a particular
application, you can unsubscribe using this command:

$ aws sns unsubscribe --subscription-arn \
    arn:aws:sns:us-west-2:123456789012:gfg-topic:1328f057-de93-4c15-512e-8bb22EXAMPLE

 Delete the topic: To delete the topic, you can simply use the command below:

$ aws sns delete-topic --topic-arn arn:aws:sns:us-west-2:123456789012:gfg-topic
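
The same lifecycle can also be driven from Python instead of the CLI. The boto3 sketch below
uses the same placeholder region, account, topic name, and email address as the commands
above; it is an illustration of the flow, not the only way to do it.

# Minimal SNS lifecycle with boto3; all identifiers below are placeholders.
import boto3

sns = boto3.client("sns", region_name="us-west-2")

# Create the topic (idempotent: returns the existing ARN if the topic already exists).
topic_arn = sns.create_topic(Name="gfg-topic")["TopicArn"]

# Subscribe an email endpoint; the recipient must click the confirmation link
# in the email before messages are delivered.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="example@example.com")

# Publish a message to every confirmed subscriber of the topic.
sns.publish(TopicArn=topic_arn, Message="Hello Geeks")

# After confirmation, subscription ARNs can be looked up and removed.
for sub in sns.list_subscriptions_by_topic(TopicArn=topic_arn)["Subscriptions"]:
    if sub["SubscriptionArn"].startswith("arn:"):  # skip "PendingConfirmation" entries
        sns.unsubscribe(SubscriptionArn=sub["SubscriptionArn"])

# Finally, delete the topic itself.
sns.delete_topic(TopicArn=topic_arn)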

How to Configure the Eclipse with Apache Hadoop?

Eclipse is an IDE (Integrated Development Environment) that helps to create and build an
application as per our requirements, and Hadoop is used for storing and processing big data.
If you need to configure Eclipse for Hadoop, you can follow this section step by step. Here we
discuss 8 steps covering the download, installation, and configuration of Eclipse, and how to
configure it for Hadoop along the way.

Step 1: Download and Install Eclipse IDE

Eclipse is the most popular IDE for developing Java applications. It is an easy-to-use and
powerful IDE, which is why so many programmers trust it. It can be downloaded from the
link given below.

http://www.eclipse.org/downloads/eclipse-packages

You will get a file named eclipse-committers-photon-R-linux-gtk.tar.gz.

We need to extract it using the command tar -zxvf eclipse-committers-photon-R-linux-gtk.tar.gz
Step 2: Move the eclipse folder to the home directory

In this step, you can see how to move the eclipse folder to the home directory. You can check
the screenshot for your reference.

Step 3: Open Eclipse

Now, in this step, you will see the Eclipse icon once you have successfully downloaded and
extracted it to the required folder. Double-click it to open. You can see the screenshot for
your reference.

Step 4: Execute Eclipse

Now choose a workspace directory and then click LAUNCH.

Step 5: Download required files

 Hadoop-core-1.2.1.jar
 commons-cli-1.2.jar

Step 6: Creating Java Project

Create a java project in the package explorer.

file—>new—>java project—>finish
right click—>new—>package—>finish
right click on package—>new—>class_name

Step 7: Adding Reference libraries

Now add the following reference libraries. First, we need to go to Build Path ->
Configure Build Path.

Step 8: Adding .jar Files

Click on Add External JARs, browse to the folder where the files were downloaded, select
the two files (hadoop-core-1.2.1.jar and commons-cli-1.2.jar), and click Open.
Cloud based simulation of a Distributed trust algorithm

Cloud service for adaptive data streaming


What is Google Cloud Platform (GCP)?

Before we begin learning about Google Cloud Platform, we will talk about what Cloud
Computing is. Basically, it is using someone else's computers over the internet; examples
include GCP, AWS, IBM Cloud, etc. Some interesting features of cloud computing are as follows:

 You get computing resources on demand and self-service. The customer uses a
simple user interface and gets the computing power, storage, and network they need,
without human intervention.
 You can access these cloud resources over the internet from anywhere on the globe.
 The provider of these resources has a huge collection of these resources and allocates
them to customers out of that collection.
 The resources are elastic. If you need more resources you can get more, rapidly. If
you need less, you can scale back down.
 The customers pay only for what they use or reserve. If they stop using resources,
they stop paying.

Three Categories of Cloud Services

 Infrastructure as a Service (IaaS): It provides you all the hardware components you
require such as computing power, storage, network, etc.
 Platform as a Service (PaaS): It provides you a platform that you can use to develop
applications, software, and other projects.
 Software as a Service (SaaS): It provides you with complete software to use like
Gmail, google drive, etc.
Google Cloud Platform

Starting in 1998 with the launch of Google Search, Google has developed one of the largest
and most powerful IT infrastructures in the world. Today, this infrastructure is used by
billions of users for services such as Gmail, YouTube, Google Photos and Maps. In 2008,
Google decided to open its network and IT infrastructure to business customers, taking an
infrastructure that was initially developed for consumer applications to public service and
launching Google Cloud Platform.

All the services listed above are provided by Google, hence the name Google Cloud Platform
(GCP). Apart from these, there are many other services provided by GCP, and many related
concepts, which we are going to discuss in this article.

Regions and zones:

Let's start at the finest-grained level (i.e. the smallest or first step in the hierarchy), the zone. A
zone is an area where Google Cloud Platform resources like virtual machines or storage are
deployed.

For example, when you launch a virtual machine in GCP using Compute Engine, it runs in a
zone you specify (suppose Europe-west2-a). Although people consider a zone as being sort of
a GCP Data Center, that’s not strictly accurate because a zone doesn’t always correspond to
one physical building. You can still visualize the zone that way, though.

Zones are grouped into regions, which are independent geographic areas much larger than
zones (for example, the zones mentioned above are grouped into the single region europe-
west2), and you can choose which regions you want your GCP resources to be placed in. All
the zones within a region have fast network connectivity among them; locations within a
region usually have round-trip network latencies of under five milliseconds.

As part of developing a fault-tolerant application, you'll want to spread your resources
across multiple zones in a region. That helps protect against unexpected failures. You can run
resources in different regions too. Lots of GCP customers do this, both to bring their
applications closer to users around the world, and also to guard against the loss of a whole
region, say, due to a natural disaster.

A few GCP services support deploying resources in what we call a multi-region. For
example, Google Cloud Storage lets you place data within the Europe multi-region, which
means it is stored redundantly in at least two different geographic locations, separated by at
least 160 kilometers, within Europe. Previously, GCP had 15 regions; visit
cloud.google.com to see what the total is up to today.

Pricing

Google was the first major cloud provider to bill by the second, instead of rounding up to
larger units of time, for its virtual-machines-as-a-service offering. This may not sound
like a big deal, but charges for rounding up can really add up for customers who create
and run lots of virtual machines. Per-second billing is available for virtual machine
use through Compute Engine and for several other services too.

Compute Engine provides automatically applied sustained-use discounts, which are discounts
you get for running a virtual machine for a significant portion of the billing month. When you
run an instance for at least 25% of a month, Compute Engine automatically gives you a
discount for each incremental minute you use it. Here's one more way Compute Engine saves
you money.

Normally, you choose a virtual machine type from a standard set of predefined values, but
Compute Engine also offers custom virtual machine types, so that you can fine-tune the sizes
of the virtual machines you use. That way, you can tailor your pricing to your workloads.

Open APIs

Some people are afraid to bring their workloads to the cloud because they're afraid they'll get
locked into a specific vendor. But in many ways, Google gives customers the ability to run
their applications elsewhere if Google is no longer the best provider for their needs. Here are
some examples of how Google helps its customers avoid lock-in. GCP services are
compatible with open-source products. For example, take Cloud Bigtable, a database that
uses the interface of the open-source database Apache HBase, which gives customers the
advantage of code portability. As another example, Cloud Dataproc provides the open-source
big data environment Hadoop as a managed service.

Why choose GCP?

 GCP allows you to choose between computing, storage, big data, machine learning,
and application services for your web, mobile, analytics, and back-end solutions.
 It’s global and it is cost-effective.
 It’s open-source friendly.
 It’s designed for security.

Advantages of GCP

1. Good documentation: We are talking about many pages in total, including a
reasonably detailed API reference guide.
2. Different storage classes for every necessity: Regional (frequent use), Nearline
(infrequent use), and Coldline (long-term storage).
3. High durability: This means that data survives even in the event of the
simultaneous loss of two disks.
4. Many regions available to store your data: North America, South America, Europe,
Asia, and Australia.
5. The "Console" tab within the documentation allows you to try different SDKs free of
charge. It's incredibly useful for developers.
6. One of the best free tiers in the industry: $300 free credit to start with any
GCP product during the first year. Afterward, 5 GB of Storage to use forever
without any charges.
Disadvantages of GCP

1. The support fee is quite hefty: around 150 USD per month for the most basic service
(Silver class).
2. Downloading data from Google Cloud Storage is expensive: 0.12 USD per GB.
3. The Google Cloud Platform web interface is somewhat confusing; sometimes I get lost
while browsing around the menus.
4. Prices in both Microsoft Azure (around 0.018 USD per GB/month) and Backblaze B2 (about
0.005 USD per GB/month) are lower than Google Cloud Storage.
5. It has a complex pricing schema, much like AWS S3, so it's easy to incur unexpected costs
(e.g. number of requests, transfers, etc.).

Google App Engine lets you run your Python and Java Web applications on elastic
infrastructure supplied by Google. App Engine allows your applications to scale
dynamically as your traffic and data storage requirements increase or decrease. It gives
developers a choice between a Python stack and Java. The App Engine serving architecture
is notable in that it allows real-time auto-scaling without virtualization for many common
types of Web applications. However, such auto-scaling is dependent on the application
developer using a limited subset of the native APIs on each platform, and in some instances
you need to use specific Google APIs such as URLFetch, Datastore, and memcache in
place of certain native API calls. For example, a deployed App Engine application cannot
write to the file system directly (you must use the Google Datastore), nor can it open a socket
or access another host directly (you must use the Google URL Fetch service). A Java
application cannot create a new Thread either.
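
To make these sandbox restrictions concrete, here is a minimal sketch of an outbound HTTP
request in the legacy Python 2.7 runtime, where the URL Fetch service stands in for raw
sockets; the URL and function name are arbitrary examples, not part of any particular
application.

# Outbound HTTP inside the first-generation Python sandbox goes through URL Fetch,
# not raw sockets.
from google.appengine.api import urlfetch

def fetch_example():
    result = urlfetch.fetch("https://www.example.com/", deadline=10)
    if result.status_code == 200:
        return result.content  # response body as a string
    return None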

A scalable runtime environment, Google App Engine is mostly used to run Web applications.
These scale dynamically as demand changes over time thanks to Google's vast computing
infrastructure. Because it offers a secure execution environment in addition to a number of
services, App Engine makes it easier to develop scalable and high-performance Web apps.
Applications will scale up and down in response to shifting demand. Cron tasks,
communications, scalable data stores, work queues, and in-memory caching are some of
these services.

The App Engine SDK facilitates the development and testing of applications by
emulating the production runtime environment and allowing developers to design and test
applications on their own PCs. When an application is finished, developers
can quickly migrate it to App Engine, put quotas in place to control the cost that is generated,
and make the program available to everyone. Python, Java, and Go are among the
languages that are currently supported.

The development and hosting platform Google App Engine, which powers anything from
web programming for huge enterprises to mobile apps, uses the same infrastructure as
Google’s large-scale internet services. It is a fully managed PaaS (platform as a service)
cloud computing platform that uses in-built services to run your apps. You can start creating
almost immediately after receiving the software development kit (SDK). You may
immediately access the Google app developer’s manual once you’ve chosen the language you
wish to use to build your app.

After creating a Cloud account, you may Start Building your App
 Using the Go template/HTML package
 Python-based webapp2 with Jinja2
 PHP and Cloud SQL
 using Java’s Maven

App Engine runs the programs on various servers while "sandboxing" them. It allows a
program to use more resources in order to handle increased demand. App Engine powers
programs like Snapchat, Rovio, and Khan Academy.

Features of App Engine

Runtimes and Languages

To create an application for App Engine, you can use Go, Java, PHP, or Python. You can
develop and test an app locally using the SDK's deployment toolkit. Each language's SDK
and runtime are separate. Your program runs in one of the following environments (a minimal
Python sketch follows the list below):

 Java Runtime Environment version 7
 Python Runtime Environment version 2.7
 PHP runtime's PHP 5.4 environment
 Go runtime 1.2 environment
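
As a small illustration of the Python 2.7 runtime listed above, a minimal webapp2 handler
might look like the sketch below; the module name, class name, and route are arbitrary, and an
accompanying app.yaml is assumed to map incoming requests to the WSGI application.

# main.py -- a minimal webapp2 handler for the Python 2.7 runtime (names are arbitrary).
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.headers["Content-Type"] = "text/plain"
        self.response.write("Hello from App Engine!")

# app.yaml would route a URL pattern to this WSGI application, e.g. "script: main.app".
app = webapp2.WSGIApplication([("/", MainPage)], debug=True)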

Generally Usable Features

These are covered by the service-level agreement and deprecation policy of App Engine.
The implementation of such a feature is generally stable, and any changes made to it are
backward-compatible. These include communications, process management, computing, data
storage, retrieval, and search, as well as app configuration and management. Features like the
HRD migration tool, Google Cloud SQL, logs, datastore, dedicated Memcache, blobstore,
Memcache, and search are included in the categories of data storage, retrieval, and search.

Features in Preview

In a later iteration of the app engine, these functions will undoubtedly be made broadly
accessible. However, because they are in the preview, their implementation may change in
ways that are backward-incompatible. Sockets, MapReduce, and the Google Cloud Storage
Client Library are a few of them.

Experimental Features

These might or might not be made broadly accessible in future App Engine updates, and they
might be changed in ways that are backward-incompatible. "Trusted tester" features,
moreover, are only accessible to a limited user base and require registration in order to use
them. The experimental features include Prospective Search, Page Speed, OpenID,
Datastore Admin/Backup/Restore, Task Queue Tagging, MapReduce, Task Queue REST
API, OAuth, and app metrics analytics.
Third-Party Services

Because Google provides documentation and helper libraries that expand the capabilities of
the App Engine platform, your app can perform tasks that are not built into the core App
Engine product. To do this, Google collaborates with other organizations. Along with the
helper libraries, these partners frequently provide exclusive deals to App Engine users.

Advantages of Google App Engine

The Google App Engine has a lot of benefits that can help you advance your app ideas. This
comprises:

1. Infrastructure for Security: The Internet infrastructure that Google uses is arguably
the safest in the entire world. Since the application data and code are hosted on
extremely secure servers, there has rarely been any kind of illegal access to date.
2. Faster Time to Market: For every organization, getting a product or service to
market quickly is crucial. When it comes to quickly releasing the product,
encouraging the development and maintenance of an app is essential. A firm can grow
swiftly with Google Cloud App Engine’s assistance.
3. Quick to Start: You don’t need to spend a lot of time prototyping or deploying the
app to users because there is no hardware or product to buy and maintain.
4. Easy to Use: The tools that you need to create, test, launch, and update the
applications are included in Google App Engine (GAE).
5. Rich set of APIs & Services: A number of built-in APIs and services in Google App
Engine enable developers to create strong, feature-rich apps.
6. Scalability: This is one of the deciding variables for the success of any software.
When using the Google app engine to construct apps, you may access technologies
like GFS, Big Table, and others that Google uses to build its own apps.
7. Performance and Reliability: Among international brands, Google ranks among the
top ones. Therefore, you must bear that in mind while talking about performance and
reliability.
8. Cost Savings: To administer your servers, you don’t need to employ engineers or
even do it yourself. The money you save might be put toward developing other areas
of your company.
9. Platform Independence: Since the app engine platform only has a few dependencies,
you can easily relocate all of your data to another environment.

Google Web Toolkit (GWT) :


GWT stands for Google Web Toolkit. It is an open-source set of tools that allows developers
to create and manage applications in Java. The original author of GWT is Google. It supports
various operating systems like Linux, UNIX, Windows, OS X, and FreeBSD. GWT was
released by Google on May 16, 2006, and it is written in Java. It helps to create browser-based
applications whose code is written in Java, with some CSS knowledge. GWT is used by some
of Google's most revenue-generating internet products, such as Google AdWords, AdSense,
Blogger, Wallet, etc.

Features of GWT –

 GWT provides easy integration with JUnit and Maven.
 Being Java based, GWT has a low learning curve for Java developers.
 GWT generates optimized JavaScript code and produces browser-specific JavaScript
code by itself.
 GWT provides a widgets library that covers most of the tasks required in an application.

Difference between AngularJs and Google Web Toolkit (GWT) :

 AngularJs is an open-source JavaScript framework maintained by Google with support
for all major browsers, whereas Google Web Toolkit is an open-source set of tools that
allows developers to create and manage applications in Java.
 AngularJs was released by Google on 20 October 2010; GWT was released by Google
on May 16, 2006.
 AngularJs is written in JavaScript; GWT is written in the Java programming language.
 AngularJs supports the MVVM design pattern; GWT supports the MVP design pattern.
 For client-server code, AngularJs uses MVVM web services; GWT uses MVC.
 AngularJs is open-source under the MIT license; GWT is also open-source, under the
Apache license.
 AngularJs supports dynamic typing; GWT does not support dynamic typing.
 For cloud platform support, AngularJs uses Google App Engine; GWT uses
DigitalOcean.
 AngularJs supports an 80 KB file size; GWT supports a 32 MB file size.
 AngularJs supports object-oriented or event-driven programming; GWT supports only
object-oriented programming.
 In AngularJs there are some conditions for code generation; GWT supports code
generation.
Introduction to Microsoft Azure | A cloud computing service

Azure is Microsoft's cloud platform, just as Google has its Google Cloud and Amazon has
its Amazon Web Services (AWS). Generally, it is a platform through which we can use
Microsoft's resources. For example, setting up a huge server would require a huge investment,
effort, physical space, and so on. In such situations, Microsoft Azure comes to our rescue. It
provides us with virtual machines, fast processing of data, analytical and monitoring tools,
and so on to make our work simpler. The pricing of Azure is also simple and cost-effective,
popularly termed "Pay As You Go", which means you pay only for what you use.

Azure History

Microsoft unveiled Windows Azure in early October 2008, but it went live only in February
2010. Later, in 2014, Microsoft changed its name from Windows Azure to Microsoft Azure.
Azure provided a service platform for .NET services, SQL services, and many Live services.
Many people were still very skeptical about "the cloud"; as an industry, we were entering a
brave new world with many possibilities. Microsoft Azure keeps getting bigger and better,
with more tools and more functionality being added. It has had two releases so far: the
well-known Microsoft Azure v1 and the later Microsoft Azure v2. Microsoft Azure v1 was
driven more by JSON scripts, while the new version v2 has an interactive UI for
simplification and easy learning. At the time of writing, Microsoft Azure v2 was still in preview.

How Azure can help in business?

Azure can help in our business in the following ways-

 Capital less: We don’t have to worry about the capital as Azure cuts out the high cost
of hardware. You simply pay as you go and enjoy a subscription-based model that’s
kind to your cash flow. Also, to set up an Azure account is very easy. You simply
register in Azure Portal and select your required subscription and get going.
 Less Operational Cost: Azure has low operational cost because it runs on its own
servers whose only job is to make the cloud functional and bug-free, so it's usually a
whole lot more reliable than your own on-location server.
 Cost Effective: If we set up a server on our own, we need to hire a tech support team
to monitor it and make sure things are working fine. Also, there might be situations
where the tech support team takes too much time to solve an issue on the server. So,
in this regard, Azure is far more pocket-friendly.
 Easy Back Up and Recovery options: Azure keeps backups of all your valuable data.
In disaster situations, you can recover all your data in a single click without your
business getting affected. Cloud-based backup and recovery solutions save time,
avoid large up-front investment and roll up third-party expertise as part of the deal.
 Easy to implement: It is very easy to implement your business models in Azure.
With a couple of on-click activities, you are good to go. Even there are several
tutorials to make you learn and deploy faster.
 Better Security: Azure provides more security than local servers, so you can be
carefree about your critical data and business applications, as they stay safe in the
Azure Cloud. Even in natural disasters, where local resources can be harmed, Azure is
a rescue; the cloud is always on.
 Work from anywhere: Azure gives you the freedom to work from anywhere and
everywhere. It just requires a network connection and credentials. And with most
serious Azure cloud services offering mobile apps, you’re not restricted to which
device you’ve got to hand.
 Increased collaboration: With Azure, teams can access, edit and share documents
anytime, from anywhere. They can work and achieve future goals hand in hand.
Another advantage of Azure is that it preserves records of activity and data.
Timestamps are one example of Azure's record keeping. Timestamps improve
team collaboration by establishing transparency and increasing accountability.

Microsoft Azure Services

The following are some of the services that Microsoft Azure offers:

1. Compute: Includes Virtual Machines, Virtual Machine Scale Sets, Functions for
serverless computing, Batch for containerized batch workloads, Service Fabric for
microservices and container orchestration, and Cloud Services for building cloud-
based apps and APIs.
2. Networking: With Azure you can use a variety of networking tools, like the Virtual
Network, which can connect to on-premise data centers; Load Balancer; Application
Gateway; VPN Gateway; Azure DNS for domain hosting; Content Delivery Network;
Traffic Manager; ExpressRoute dedicated private network fiber connections; and
Network Watcher monitoring and diagnostics.
3. Storage: Includes Blob, Queue, File and Disk Storage, as well as a Data Lake Store,
Backup and Site Recovery, among others.
4. Web + Mobile: Creating Web + Mobile applications is very easy as it includes
several services for building and deploying applications.
5. Containers: Azure includes the Container Service, which supports Kubernetes,
DC/OS or Docker Swarm, and the Container Registry, as well as tools for
microservices.
6. Databases: Azure also includes several SQL-based databases and related tools.
7. Data + Analytics: Azure has big data tools like HDInsight for Hadoop, Spark, R
Server, HBase and Storm clusters.
8. AI + Cognitive Services: Azure supports developing applications with artificial
intelligence capabilities through services like the Computer Vision API, Face API,
Bing Web Search, Video Indexer, and Language Understanding Intelligent Service.
9. Internet of Things: Includes IoT Hub and IoT Edge services that can be combined
with a variety of machine learning, analytics, and communications services.
10. Security + Identity: Includes Security Center, Azure Active Directory, Key Vault
and Multi-Factor Authentication Services.
11. Developer Tools: Includes cloud development services like Visual Studio Team
Services, Azure DevTest Labs, HockeyApp mobile app deployment and monitoring,
Xamarin cross-platform mobile development and more.

Difference between AWS (Amazon Web Services), Google Cloud and Azure
What Does Windows Live Mean?

Windows Live is Microsoft's branded suite of online and client-side tools and applications.
Windows Live includes browser-based Web services, mobile services and Windows Live
Essentials.

Similar to Google Apps, Windows Live is part of Microsoft's cloud strategy, or Software Plus
Services (Software + Services or S+S).

Techopedia Explains Windows Live

Released in November 2005, Windows Live serves as an online user gateway that provides
Microsoft and third-party applications for seamless user interaction. Classic Windows Live
applications include Hotmail (Microsoft's free email service), Live Messenger, Live Photos
and Live Calendar.

Recently added Windows Live applications include:

 Windows Live Mail: POP3 email client that easily integrates with non-Microsoft
email services
 Windows Live SkyDrive: Facilitates online Microsoft Office collaboration and
provides free cloud storage for documents and photos.
 Windows Live Messenger Companion: Internet Explorer add-in for live collaboration
 Windows Live Family Safety: Extends parental controls in Windows 7 and Vista

Exchange Online: Exchange Online is the hosted version of Microsoft's Exchange Server
messaging platform that organizations can obtain as a stand-alone service or via an Office
365 subscription.

Exchange Online gives companies a majority of the same benefits that on-premises Exchange
deployments provide. Users connect to Exchange Online via the Microsoft Outlook desktop
client, Outlook on the web with a web browser, or with mobile devices using the Outlook
mobile app to access email and collaboration functionality, including shared calendars, global
address lists and conference rooms.

Exchange Online management options

Administrators can use different tools to manage Exchange Online.

Administrators use the Exchange admin center to tweak features in Exchange Online, such as
the ability to put disclaimers in an email.

The Exchange admin center is a centralized management console used to adjust Exchange
Online features, including permissions, compliance management, protection, and mobile
settings for configuring mobile device access.

Administrators can also use Windows PowerShell to set up permissions and manage
functionality from the PowerShell command line using cmdlets. While the Exchange admin
center operates from a web browser, PowerShell requires the administrator to execute several
steps to make a remote PowerShell session to Exchange Online.

Exchange Online offerings

With the basic Exchange Online Plan 1 offering, users have 50 GB of mailbox storage at a
cost of $4 per user, per month. Microsoft provides Exchange Online Protection as part of this
service to scan email for malware and spam.

At $8 a month, the Exchange Online Plan 2 gives unlimited mailbox space and Unified
messaging features, including call answering and automated attendant functionality.
Administrators get additional features such as data loss prevention policies for regulated
industries and organizations that require additional protections for sensitive information.

Organizations that use the Office 365 Business Premium subscription pay $12.50 per user,
per month for Exchange Online and access to other features, including web- and desktop-
based Office 365 applications, SharePoint intranet features, 1 TB of storage via the OneDrive
for Business service, and video conferencing with Microsoft Teams, which is replacing
Skype for Business.
Microsoft offers Exchange Online in its other Office 365 offerings, including Office 365
Business Essentials, Office 365 Business, Office 365 Enterprise E1, Office 365 Enterprise
E3, Office 365 Enterprise E5, Office 365 Enterprise F1, Office 365 Education and Office 365
Education E5.

Deployment choices

Organizations can use Exchange Online in a hybrid arrangement in which some mailboxes
remain in the data center while others are hosted in Microsoft's cloud. A hybrid deployment
allows an organization to retain some control and use some of its on-premises functionality,
such as secure mail transport, with a cloud-based mailbox.

Organizations can also use cloud-only deployments that put all the mailboxes in a Microsoft
data center.

Features and drawbacks of Exchange Online

Microsoft positions Exchange Online as one way to reduce the workload of an IT staff. It
takes time and effort to maintain an on-premises version of Exchange to ensure mail,
calendars and other messaging features perform as expected.

Administrators must contend with regular patching via Patch Tuesday to maintain the
stability and security of the Exchange deployment. The IT staff must also plot out an upgrade
if their current Exchange Server version is due to move out of support, which might require
purchasing newer equipment and developing a method to perform an upgrade without
disrupting end users. With Exchange Online, Microsoft runs the service in its data centers and
executes updates without downtime or involving an organization's IT department.

A switch to Exchange Online can alleviate some of the hardware issues and problems with
infrastructure components that can affect an on-premises Exchange deployment. Microsoft
touts the stability of its service and offers a 99.9% uptime guarantee and a service-level
agreement to provide a credit back to the organization if a disruption occurs.

Cost is another factor to consider when deciding whether to stay with Exchange Server or
move to Exchange Online. Depending on the number of users and the frequency of server
hardware upgrades, it might be cheaper to subscribe to Exchange Online for smaller
organizations that buy new server equipment every three years.

Another benefit of Exchange Online is scalability. A cloud-based service can more quickly
absorb a merger involving a significant number of users than if it had an on-premises
Exchange deployment.

The other advantage of storing data in Microsoft's data centers is not needing to develop and
maintain the infrastructure for disaster recovery.

A drawback to Exchange Online is that disruptions will happen, and because Microsoft
handles the support, it can be difficult to determine when the service will return.

Another drawback of Exchange Online is that Microsoft updates its cloud services on a
frequent basis to add, remove and modify certain features. Users might get frustrated if some
functionality changes or disappears when Microsoft pushes out an update. With an on-
premises Exchange deployment, the feature set tends to remain fixed.

Some organizations might have to stop using certain third-party tools or applications that
work with Exchange Server if they don't integrate with Exchange Online. These
organizations might want to maintain the level of flexibility provided by an on-premises
messaging deployment.

SharePoint Apps and Microsoft Azure

SharePoint and Microsoft Azure are two sizeable platforms unto themselves. SharePoint is
one of Microsoft’s leading server productivity platforms or the collaborative platform for the
enterprise and the Web.

Microsoft Azure is Microsoft’s operating system in the cloud. Separately, they have their
own strengths, market viability, and developer following.

Together, they provide many powerful benefits. They are −

 They help expand how and where you deploy your code and data.
 They increase opportunities to take advantage of the Microsoft Azure while at the
same time reducing the storage and failover costs of on-premises applications.
 They provide you with new business models and offerings that you can take to your
customers to increase your own solution offerings.

In SharePoint 2010, Azure and SharePoint were two distinct platforms and technologies,
which could be integrated easily enough, but they were not part of the same system.
However, in SharePoint 2013 this has changed.

SharePoint 2013 introduces different types of cloud applications. In fact, you can build two
types of Azure integrated applications.

The first type of application is Autohosted, and the second is Provider-hosted (sometimes
referred to as self-hosted).

The major difference between the two is −


 Autohosted applications natively support a set of Azure features such as Web Sites
and SQL Database with the SharePoint development and deployment experience.
 Provider-hosted applications are meant to integrate with a broader set of web
technologies and standards than Autohosted applications, one of which is Microsoft
Azure.

Thus, you can take advantage of the entire Microsoft Azure stack when building
Provider-hosted apps that use Azure.

Microsoft Dynamics CRM

Microsoft Dynamics CRM is a customer relationship management software package
developed by Microsoft focused on enhancing the customer relationship for any organization.
Out of the box, the product focuses mainly on Sales, Marketing, and Customer Service
sectors, though Microsoft has been marketing Dynamics CRM as an XRM platform and has
been encouraging partners to use its proprietary (.NET based) framework to customize it. In
recent years, it has also grown as an Analytics platform driven by CRM.

The CRM Solution can be used to drive the sales productivity and marketing effectiveness for
an organization, handle the complete customer support chain, and provide social insights,
business intelligence, and a lot of other out-of-the-box functionalities and features. As a
product, Microsoft Dynamics CRM also offers full mobile support for using CRM apps on
mobiles and tablets.
As of writing this tutorial, the latest version of CRM is CRM 2016. However, in this tutorial
we will be using CRM 2015 Online version as it is the latest stable version as well as
frequently used in many organizations. Nevertheless, even if you are using any other versions
of CRM, all the concepts in the tutorial will still hold true.

Product Offerings

Microsoft Dynamics CRM is offered in two categories −

CRM Online

CRM Online is a cloud-based offering of Microsoft Dynamics CRM where all the backend
processes (such as application servers, setups, deployments, databases, licensing, etc.) are
managed on Microsoft servers. CRM Online is a subscription-based offering which is
preferred for organizations who may not want to manage all the technicalities involved in a
CRM implementation. You can get started with setting up your system in a few days (not
weeks, months or years) and access it on web via your browser.

CRM On-Premise

CRM on-premise is a more customized and robust offering of Microsoft Dynamics CRM,
where the CRM application and databases will be deployed on your servers. This offering
allows you to control all your databases, customizations, deployments, backups, licensing and
other network and hardware setups. Generally, organizations who want to go for a
customized CRM solution prefer on-premise deployment as it offers better integration and
customization capabilities.

From the functional standpoint, both the offerings offer similar functionalities; however, they
differ significantly in terms of implementation. The differences are summarized in the
comparison below.

CRM Online vs CRM On-Premise:

 CRM Online is a cloud-based solution provided by Microsoft in which all the servers and
databases are managed by Microsoft; CRM On-Premise is an on-premise solution provided
by Microsoft in which the servers and databases are managed by the customer.
 You can get started with the online offering in a matter of a few days and pay for the users
and used space as you go; setting up an on-premise offering needs technical skills as well as
sufficient time to set up the CRM instance and get it running.
 CRM Online supports relatively fewer customizations and extensions; CRM On-Premise
supports relatively more customization and extensions.
 CRM Online does not give the ability to perform manual data backup and restore, since the
database is hosted on Microsoft servers (however, Microsoft performs daily backups of the
database); CRM On-Premise gives you complete ability to manage your database.
 CRM Online has various plans based on data storage limits such as 5 GB, 20 GB, etc.;
CRM On-Premise does not have any such limits on storage size, since the data exists on
your own servers.
 CRM Online provides inbuilt capabilities for features such as insights, social listening,
analytics, etc.; CRM On-Premise has extra costs for these features.
 CRM Online supports automatic updates to future versions; CRM On-Premise updates
need to be installed by the administrator.

Accessing CRM

Microsoft Dynamics CRM can be accessed via any of the following options −

 Browser
 Mobile and Tablets
 Outlook

Product Competitors

Microsoft Dynamics CRM is undoubtedly one of the top products in the CRM space.
However, following are the other products that compete with Microsoft Dynamics CRM.

 Salesforce.com
 Oracle
 SAP
 Sage CRM
 Sugar CRM
 NetSuite

Product Versions

Microsoft Dynamics CRM has grown over the years starting from its 1.0 version in 2003.
The latest version (as of writing this article) is 2015. Following is the chronological list of
release versions −

 Microsoft CRM 1.0


 Microsoft CRM 1.2
 Microsoft Dynamics CRM 3.0
 Microsoft Dynamics CRM 4.0
 Microsoft Dynamics CRM 2011
 Microsoft Dynamics CRM 2013
 Microsoft Dynamics CRM 2015
 Microsoft Dynamics CRM 2016
