Cloud Computing Notes
Introduction: Network centric computing, Network centric content, peer-to-peer systems,
cloud computing delivery models and services, Ethical issues, Vulnerabilities, Major
challenges for cloud computing. Parallel and Distributed Systems: Introduction,
architecture, distributed systems, communication protocols, logical clocks, message delivery
rules, concurrency, model concurrency with Petri Nets
Cloud Computing is the delivery of computing services such as servers, storage, databases,
networking, software, analytics, intelligence, and more, over the Cloud (Internet).
But if we choose Cloud Computing, a cloud vendor is responsible for the hardware purchase
and maintenance. They also provide a wide variety of software and platform as a service. We
can take any required services on rent. The cloud computing services will be charged based
on usage.
The cloud environment provides an easily accessible online portal that makes it easy for the
user to manage the compute, storage, network, and application resources. Some cloud service
providers are shown in the following figure.
Cost: It reduces the huge capital costs of buying hardware and software.
Speed: Resources can be accessed in minutes, typically within a few clicks.
Scalability: We can increase or decrease the requirement of resources according to
the business requirements.
Productivity: While using cloud computing, we put less operational effort. We do not
need to apply patching, as well as no need to maintain hardware and software. So, in
this way, the IT team can be more productive and focus on achieving business goals.
Reliability: Backup and recovery of data are less expensive and very fast for business
continuity.
Security: Many cloud vendors offer a broad set of policies, technologies, and controls
that strengthen our data security.
Public Cloud: The cloud resources that are owned and operated by a third-party
cloud service provider are termed as public clouds. It delivers computing resources
such as servers, software, and storage over the internet
Private Cloud: The cloud computing resources that are exclusively used inside a
single business or organization are termed as a private cloud. A private cloud may
physically be located on the company’s on-site datacentre or hosted by a third-party
service provider.
Hybrid Cloud: It is the combination of public and private clouds, bound together by
technology that allows data and applications to be shared between them. Hybrid cloud
provides flexibility and more deployment options to the business.
The cloud is a computer network of remote computing servers, made accessible on-demand
to users who may be located anywhere. The cloud gives you the ability to access, store and
share your information and data from any Internet-connected device.
The cloud has revolutionized the way that companies store and share information in
comparison to traditional on-premise infrastructures. However, not all organizations have yet
taken advantage of this technology. The Cloud Computing Service Provider industry includes
firms offering the following types of services:
SaaS is basically application delivery over the Internet. The application is installed onto
the cloud provider’s servers, and each user has a web browser interface to access the
applications. The data that you store in this environment can be accessed from any device
with an internet connection.
PaaS offers a platform over the cloud where each user can access resources such as databases,
storage, and bandwidth with a single login. The platform enables users to develop and deploy
applications in which they can use applications programming interfaces (API).
IaaS provides storage, processor power, memory, operating systems, and networking
capabilities to customers so that they do not have to buy and maintain their own computer
system infrastructure.
Net-Centric is a way to manage your data, applications, and infrastructure in the cloud. Net-
centric cloud computing can be considered an evolution of Software as a Service (SaaS). It
leverages the power of the Internet to provide an environment for data, applications, and
infrastructure on demand. It allows you to manage everything from one interface without
worrying about hardware or server management issues.
The term net-centric combines network-based computing with the integration of various types
of information technology resources – servers, storage devices, computers – into
centralized repositories that are served using standard Web-based protocols such as HTTP or
HTTPS via a global computer communications network like the internet.
Net-centric computing allows organizations to focus on their core business needs without
limiting themselves by software or hardware limitations imposed on their infrastructure. In
other words, when an organization adopts net-centric principles, it is able to completely
virtualize its IT footprint while still being able to take advantage of modern networking
technologies like LANs and WANs.
Net-centric cloud computing service is a combination of IaaS, PaaS, and SaaS. What this
means is that instead of buying hardware and software for your own data center, you buy it
from the cloud provider. This gives you the ability to move your data to the cloud and access
it from anywhere.
Net-centric computing service allows you to centralize your applications with a single
interface. It provides fully managed services according to user’s specific requirements, which
are invoked in real-time as needed rather than being provided on-demand or already
provisioned for use. The concept of net-centric computing enables multiple distributed clients
to access a single entity’s applications in real-time.
In cloud computing, there are many advantages over traditional data center technologies.
Cloud computing allows for agility on a business level by not having to invest in maintaining
multiple physical data centers.
Cloud computing has gained traction with both enterprises and consumers. It is expected that
CSPs will continue to embrace this technology as it becomes the norm for organizations of all
kinds. As a result, CISOs need to be trained on how to adopt net-centric principles for
managing the cloud without limits in order to be successful within this new market.
Peer to peer computing
The peer to peer computing architecture contains nodes that are equal participants in data
sharing. All the tasks are equally divided between all the nodes. The nodes interact with each
other as required and share resources.
Peer to peer networks are usually formed by groups of a dozen or less computers.
These computers all store their data using individual security but also share data with
all the other nodes.
The nodes in peer to peer networks both use resources and provide resources. So, if
the number of nodes increases, then the resource sharing capacity of the peer to peer network
increases. This is different from client-server networks, where the server gets
overwhelmed as the number of clients increases.
Since nodes in peer to peer networks act as both clients and servers, it is difficult to
provide adequate security for the nodes. This can lead to denial of service attacks.
Most modern operating systems such as Windows and Mac OS contain software to
implement peer to peer networks.
Each computer in the peer to peer network manages itself. So, the network is quite
easy to set up and maintain.
In the client server network, the server handles all the requests of the clients. This
provision is not required in peer to peer computing and the cost of the server is saved.
It is easy to scale the peer to peer network and add more nodes. This only increases
the data sharing capacity of the system.
None of the nodes in the peer to peer network are dependent on the others for their
functioning.
Disadvantages of Peer to Peer Computing
It is difficult to back up the data as it is stored on different computer systems and there
is no central server.
It is difficult to provide overall security in the peer to peer network as each system is
independent and contains its own data.
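To make the idea concrete, the following is a minimal sketch (an assumed illustration, not part of the original notes) of a peer-to-peer node in Python: every peer runs a small server thread that answers requests for items in its local store, and the same peer can also act as a client and request items from other peers. The host, port, and file name are hypothetical placeholders.

import socket
import threading
import time

SHARED = {"hello.txt": b"data stored on this peer"}   # hypothetical local data of this peer

def serve(host="127.0.0.1", port=9000):
    # Server role: answer requests from other peers for named items.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        with conn:
            name = conn.recv(1024).decode()
            conn.sendall(SHARED.get(name, b"NOT FOUND"))

def request(peer_host, peer_port, name):
    # Client role: fetch an item from another peer.
    with socket.create_connection((peer_host, peer_port)) as c:
        c.sendall(name.encode())
        return c.recv(65536)

if __name__ == "__main__":
    threading.Thread(target=serve, daemon=True).start()   # this peer serves data ...
    time.sleep(0.5)                                        # give the server thread time to start
    print(request("127.0.0.1", 9000, "hello.txt"))         # ... and also requests data, like any other peer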
Characteristics of IaaS
PaaS cloud computing platform is created for the programmer to develop, test, run, and
manage the applications.
Characteristics of PaaS
Example: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App
Engine, Apache Stratos, Magento Commerce Cloud, and OpenShift.
SaaS is also known as "on-demand software". It is a software in which the applications are
hosted by a cloud service provider. Users can access these applications with the help of
internet connection and web browser.
Characteristics of SaaS
The below table shows the difference between IaaS, PaaS, and SaaS:

IaaS: It provides a virtual data center to store information and create platforms for app
development, testing, and deployment. It provides access to resources such as virtual
machines, virtual storage, etc. IaaS provides only Infrastructure.

PaaS: It provides virtual platforms and tools to create, test, and deploy apps. It provides
runtime environments and deployment tools for applications. PaaS provides
Infrastructure + Platform.

SaaS: It provides web software and apps to complete business tasks. It provides software as
a service to the end-users. SaaS provides Infrastructure + Platform + Software.
Cloud Computing is a new name for an old concept: the delivery of computing services from
a remote location. Cloud Computing is Internet-based computing, where shared resources,
software, and information are provided to computers and other devices on demand.
1. Privacy: The user data can be accessed by the host company with or without permission.
The service provider may access the data that is on the cloud at any point in time. They could
accidentally or deliberately alter or even delete information.
2. Compliance: There are many regulations in place related to data and hosting. To comply
with regulations (Federal Information Security Management Act, Health Insurance Portability
and Accountability Act, etc.) the user may have to adopt deployment modes that are
expensive.
3. Security: Cloud-based services involve a third party for storage and security. Can one
assume that a cloud-based company will protect and secure one’s data if one is using their
services at a very low cost or for free? They may share users’ information with others. Security
presents a real threat to the cloud.
4. Sustainability: This issue refers to minimizing the effect of cloud computing on the
environment. Citing the environmental effects of servers, countries with favorable conditions,
where the climate favors natural cooling and renewable electricity is readily available, such as
Finland, Sweden, and Switzerland, are trying to attract cloud computing data centers. But
other than nature’s favors, do these countries have enough technical infrastructure to sustain
the high-end clouds?
5. Abuse: While providing cloud services, it should be ascertained that the client is not
purchasing the services of cloud computing for a nefarious purpose. In 2009, a banking
Trojan illegally used the popular Amazon service as a command and control channel that
issued software updates and malicious instructions to PCs that were infected by the malware.
So the hosting companies and the servers should have proper measures to address these
issues.
6. Higher Cost: If you want to use cloud services uninterruptedly, then you need to have a
powerful network with higher bandwidth than ordinary internet networks, and if your
organization is broad and large, an ordinary cloud service subscription won’t suit your
organization. Otherwise, you might face hassle in utilizing an ordinary cloud service while
working on complex projects and applications. This is a major problem for small
organizations, as it restricts them from diving into cloud technology for their business.
7. Recovery of lost data in contingency: Before subscribing to any cloud service provider, go
through all norms and documentation and check whether their services match your
requirements and whether they have a sufficient, well-maintained resource infrastructure with
proper upkeep. Once you subscribe to the service, you almost hand over your data into the
hands of a third party. If you are able to choose a proper cloud service, then you will not need
to worry about the recovery of lost data in any contingency.
9. Lack of resources/skilled expertise: One of the major issues that companies and
enterprises are going through today is the lack of resources and skilled employees. Every
second organization seems interested or has already moved to cloud services.
As the workload on the cloud is increasing, cloud service hosting companies need
continuous, rapid advancement. Due to these factors, organizations are having a tough
time keeping up to date with the tools. As new tools and technologies emerge every day,
more skilled/trained employees are needed. These challenges can only be minimized
through additional training of IT and development staff.
10. Pay-per-use service charges: Cloud computing services are on-demand services; a user
can extend or compress the volume of resources as per need, and you pay for how much
you have consumed. It is difficult to define a pre-defined cost for a particular quantity of
services. Such ups and downs and price variations make the implementation of cloud
computing very difficult and intricate. It is not easy for a firm’s owner to study consistent
demand and fluctuations with the seasons and various events. So it is hard to build a budget
for a service that could consume several months of the budget in a few days of heavy use.
Cloud computing provides various advantages, such as improved collaboration, excellent
accessibility, Mobility, Storage capacity, etc. But there are also security risks in cloud
computing.
Some most common Security Risks of Cloud Computing are given below-
Data Loss
Data loss is one of the most common security risks of cloud computing. It is also known as
data leakage. Data loss is the process in which data is deleted, corrupted, or made
unreadable by a user, software, or application. In a cloud computing environment, data loss
occurs when our sensitive data is in somebody else's hands, when one or more data elements
cannot be utilized by the data owner, when the hard disk is not working properly, or when
software is not updated.
Data Breach
A data breach is the process in which confidential data is viewed, accessed, or stolen by a
third party without any authorization, so the organization's data is hacked by attackers.
Vendor lock-in
Vendor lock-in is one of the biggest security risks in cloud computing. Organizations may
face problems when transferring their services from one vendor to another. As different
vendors provide different platforms, moving from one cloud to another can be difficult.
Migrating, integrating, and operating the cloud services is complex for the IT staff. IT staff
require extra capability and skills to manage, integrate, and maintain the data in the
cloud.
Spectre & Meltdown allows programs to view and steal data which is currently processed on
computer. It can run on personal computers, mobile devices, and in the cloud. It can store the
password, your personal information such as images, emails, and business documents in the
memory of other running programs.
Denial of Service (DoS) attacks
Denial of service (DoS) attacks occur when a system receives more traffic than the server can
buffer. DoS attackers mostly target the web servers of large organizations such as banking
sectors, media companies, and government organizations. Recovering the lost data after such
an attack costs a great deal of time and money.
Account hijacking
Account hijacking is a serious security risk in cloud computing. It is the process in which an
individual user's or organization's cloud account (bank account, e-mail account, or social
media account) is stolen by hackers. The hackers use the stolen account to perform
unauthorized activities.
Major challenges for cloud computing: Cloud computing is the provisioning of resources
like data and storage on demand, that is, in real time. It has been proven to be revolutionary in
the IT industry, with its market valuation growing at a rapid rate. Cloud development has
proved to be beneficial not only for huge public and private enterprises but for small-scale
businesses as well, as it helps to cut costs. It is estimated that more than 94% of businesses will
increase their spending on the cloud by more than 45%. This has also resulted in more
high-paying jobs for cloud developers.
Cloud technology was flourishing before the pandemic, but there has been a sudden spike in
cloud deployment and usage during the lockdown. The tremendous growth can be linked to
the fact that classes have been shifted online, virtual office meetings are happening on video
calling platforms, conferences are taking place virtually, and on-demand streaming apps
have a huge audience. All this is made possible by the use of cloud computing. We can safely
conclude that the cloud is an important part of our life today, whether we are an enterprise, a
student, a developer, or anyone else, and that we are heavily dependent on it. But with this
dependence, it is also important for us to look at the issues and challenges that arise with
cloud computing. Therefore, here are the most common challenges that are faced when
dealing with cloud computing; let’s have a look at them one by one:
1. Data Security
Data security is a major concern when switching to cloud computing. User or organizational
data stored in the cloud is critical and private. Even if the cloud service provider assures data
integrity, it is your responsibility to carry out user authentication and authorization, identity
management, data encryption, and access control. Security issues on the cloud include
identity theft, data breaches, malware infections, and a lot more, which eventually decrease
the trust amongst the users of your applications. This can in turn lead to potential loss in
revenue alongside reputation and stature. Also, dealing with cloud computing requires
sending and receiving huge amounts of data at high speed, and it is therefore susceptible to
data leaks.
2. Cost Management
Even though almost all cloud service providers have a “Pay As You Go” model, which reduces
the overall cost of the resources being used, there are times when huge costs are incurred by
the enterprise using cloud computing. Under-optimization of resources, say servers that are
not being used to their full potential, adds up to the hidden costs. Degraded application
performance or sudden spikes or overages in usage also add to the overall cost. Unused
resources are another main reason why the costs go up: if you turn on a service or a cloud
instance and forget to turn it off during the weekend or when it is not in use, it will increase
the cost without the resources even being used.
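As a tiny arithmetic sketch of how idle resources add to cost under a pay-as-you-go model (the hourly rate and instance count below are assumed, illustrative numbers, not real AWS prices):

hourly_rate = 0.10            # hypothetical $/hour for one instance
idle_hours = 48               # left running over Saturday and Sunday with no use
instances = 5
wasted = hourly_rate * idle_hours * instances
print(f"Wasted weekend spend: ${wasted:.2f}")   # prints: Wasted weekend spend: $24.00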
3. Multi-Cloud Environments
Due to an increase in the options available to companies, enterprises no longer use a single
cloud but depend on multiple cloud service providers. Most of these companies use hybrid
cloud tactics, and close to 84% are dependent on multiple clouds. This often ends up being
difficult for the infrastructure team to manage. The process most of the time ends up being
highly complex for the IT team due to the differences between multiple cloud providers.
4. Performance Challenges
When an organization uses a specific cloud service provider and wants to switch to another
cloud-based solution, it often turns out to be a tedious procedure, since applications written
for one cloud with its application stack have to be rewritten for the other cloud. There is a
lack of flexibility in switching from one cloud to another due to the complexities involved.
Handling data movement and setting up security and networking from scratch also add to the
issues encountered when changing cloud solutions, thereby reducing flexibility.
Since cloud computing deals with provisioning resources in real-time, it deals with enormous
amounts of data transfer to and from the servers. This is only made possible due to the
availability of a high-speed network. Since these data and resources are exchanged over
the network, this can prove to be highly vulnerable in the case of limited bandwidth or a
sudden outage. Even when enterprises can cut their hardware costs, they need to ensure that
the internet bandwidth is high and that there are zero network outages, or else it can result in
a potential business loss. It is therefore a major challenge for smaller enterprises that have to
maintain network bandwidth, which comes at a high cost.
7. Lack of Knowledge and Expertise
Due to its complex nature and the high demand for research, working with the cloud often
ends up being a highly tedious task. It requires immense knowledge and wide expertise on the
subject. Although there are a lot of professionals in the field, they need to constantly update
themselves. Cloud computing is a highly paid job due to the extensive gap between demand
and supply. There are a lot of vacancies but very few talented cloud engineers, developers,
and professionals. Therefore, there is a need for upskilling so these professionals can actively
understand, manage and develop cloud-based applications with minimum issues and
maximum reliability.
We have therefore discussed the most common cloud issues and challenges that are faced by
cloud engineers all over the world.
There are mainly two computation types, including parallel computing and distributed
computing. A computer system may perform tasks according to human instructions. A single
processor executes only one task at a time in the computer system, which is not an efficient approach.
Parallel computing solves this problem by allowing numerous processors to accomplish tasks
simultaneously. Modern computers support parallel processing to improve system
performance. In contrast, distributed computing enables several computers to communicate
with one another and achieve a goal. All of these computers communicate and collaborate
over the network. Distributed computing is commonly used by organizations such as
Facebook and Google that allow people to share resources.
In this article, you will learn about the difference between Parallel Computing and
Distributed Computing. But before discussing the differences, you must know about parallel
computing and distributed computing.
Parallel computing is also known as parallel processing. It utilizes several processors. Each of the processors
completes the tasks that have been allocated to them. In other words, parallel computing
involves performing numerous tasks simultaneously. A shared memory or distributed
memory system can be used to support parallel computing. All CPUs in shared memory
systems share the memory, whereas in distributed memory systems each processor has its
own private memory.
Parallel computing provides numerous advantages. Parallel computing helps to increase
CPU utilization and improve performance because several processors work
simultaneously. Moreover, the failure of one CPU has no impact on the other CPUs'
functionality. However, if one processor needs data or instructions from another, the
communication between CPUs can introduce latency.
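A minimal sketch of parallel computing in Python (an assumed example, not from the notes): the range to be summed is split into chunks, and four worker processes compute the partial sums simultaneously.

from multiprocessing import Pool

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))          # each worker handles its own chunk

if __name__ == "__main__":
    chunks = [(0, 2_500_000), (2_500_000, 5_000_000),
              (5_000_000, 7_500_000), (7_500_000, 10_000_000)]
    with Pool(processes=4) as pool:    # four processes run simultaneously
        total = sum(pool.map(partial_sum, chunks))
    print(total)                       # same result as sum(range(10_000_000)), computed in parallel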
There are various advantages and disadvantages of parallel computing. Some of the
advantages and disadvantages are as follows:
Advantages
1. It saves time and money because many resources working together cut down on time
and costs.
2. It can solve larger problems that are difficult to handle with serial computing.
3. You can do many things at once using many computing resources.
4. Parallel computing is much better than serial computing for modeling, simulating, and
comprehending complicated real-world events.
Disadvantages
A distributed system comprises several software components that reside on different systems
but operate as a single system. A distributed system's computers can be physically close
together and linked by a local network, or geographically distant and linked by a wide area
network (WAN). A
distributed system can be made up of any number of different configurations, such as
mainframes, PCs, workstations, and minicomputers. The main aim of distributed computing
is to make a network work as a single computer.
There are various benefits of using distributed computing. It enables scalability and makes it
simpler to share resources. It also aids in the efficiency of computation processes.
There are various advantages and disadvantages of distributed computing. Some of the
advantages and disadvantages are as follows:
Advantages
Disadvantages
1. Data security and sharing are the main issues in distributed systems due to the features
of open systems
2. Because of the distribution across multiple servers, troubleshooting and diagnostics
are more challenging.
3. The main disadvantage of distributed computer systems is the lack of software
support.
Key differences between Parallel Computing and Distributed Computing
Here, you will learn the various key differences between parallel computing and distributed
computing. Some of the key differences between parallel computing and distributed
computing are as follows:
Communication: In parallel computing, the processors communicate with one another via a
bus, whereas in distributed computing, the computer systems connect with one another via a
network.
Conclusion
There are two types of computations: parallel computing and distributed computing. Parallel
computing allows several processors to accomplish their tasks at the same time. In contrast,
distributed computing splits a single task among numerous systems to achieve a common
goal.
Method-1:
One approach to ordering events across processes is to try to synchronize the physical clocks.
This means that if one PC has a time of 2:00 pm, then every PC should have the same time,
which is practically not possible; not every clock can be synchronized at the same time. So we
cannot follow this method.
Method-2:
Another approach is to assign Timestamps to events.
Taking the example into consideration, this means if we assign the first place as 1, the second
place as 2, the third place as 3, and so on, then we always know that the first place will always
come first, and so on. Similarly, if we give each PC its own individual number, then it will
be organized in a way that the 1st PC will complete its process first, then the second, and so on.
BUT, timestamps will only work as long as they obey causality.
What is causality ?
Causality is fully based on HAPPEN BEFORE RELATIONSHIP.
Taking single PC only if 2 events A and B are occurring one by one then TS(A) <
TS(B). If A has timestamp of 1, then B should have timestamp more than 1, then only
happen before relationship occurs.
Taking 2 PCs and event A in P1 (PC.1) and event B in P2 (PC.2), the condition will still
be TS(A) < TS(B). Taking an example: suppose you are sending a message to someone
at 2:00:00 pm, and the other person is receiving it at 2:00:02 pm. Then it’s obvious that
TS(sender) < TS(receiver).
Properties Derived from Happen Before Relationship –
Transitive Relation –
If, TS(A) <TS(B) and TS(B) <TS(C), then TS(A) < TS(C)
Causally Ordered Relation –
a->b, this means that a is occurring before b, and if there is any change in a, it will surely
reflect on b.
Concurrent Event –
This means that not every process occurs one by one, some processes are made to
happen simultaneously i.e., A || B.
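The happen-before rules above are exactly what Lamport's logical clocks implement. Below is a minimal sketch (an assumed illustration, not from the notes) in Python: the counter is incremented on every local or send event, and on a receive it is set to max(local, received) + 1, so that TS(send) < TS(receive) always holds.

class LamportClock:
    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1                             # tick on every local event
        return self.time

    def send_event(self):
        self.time += 1                             # the timestamp travels with the message
        return self.time

    def receive_event(self, msg_time):
        self.time = max(self.time, msg_time) + 1   # jump past the sender's clock
        return self.time

p1, p2 = LamportClock(), LamportClock()
ts_a = p1.send_event()                 # event A on PC.1 (send)
ts_b = p2.receive_event(ts_a)          # event B on PC.2 (receive)
assert ts_a < ts_b                     # causality preserved: TS(A) < TS(B)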
Message Passing Model of Process Communication
Message passing means how a message can be sent from one end to the other end. It may be
a client-server model, or it may be from one node to another node. The formal
model for distributed message passing has two timing models: one is synchronous and the
other is asynchronous.
The fundamental points of message passing are:
1. In message-passing systems, processors communicate with one another by sending and
receiving messages over a communication channel. So how the arrangement should be
done?
2. The pattern of the connection provided by the channel is described by some topology
systems.
3. The collection of the channels is called a network.
4. By the definition of distributed systems, we know that they are a geographically
distributed set of computers, so it is not always possible for one computer to directly
connect with every other node.
5. So all channels in the Message-Passing Model are private.
6. The sender decides what data has to be sent over the network. An example is, making a
phone call.
7. The data is only fully communicated after the destination worker decides to receive the
data. Example when another person receives your call and starts to reply to you.
8. There is no time barrier. It is in the hand of a receiver after how many rings he receives
your call. He can make you wait forever by not picking up the call.
9. For successful network communication, it needs active participation from both sides.
Algorithm:
1. Let us consider a network consisting of n nodes named p0, p1, p2, ..., pn-1, connected by
bidirectional point-to-point channels.
2. Each node might not know who is at another end. So in this way, the topology would be
arranged.
3. Whenever the communication is established and whenever the message passing is
started then only the processes know from where to where the message has to be sent.
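A minimal sketch of the message-passing model in Python (an assumed example): two processes share no memory and communicate only through a channel; the sender decides what to send, and delivery completes only when the receiver decides to take the message, much like answering a phone call.

from multiprocessing import Process, Queue

def sender(channel):
    channel.put("hello over the channel")    # the sender decides what data to send

def receiver(channel):
    msg = channel.get()                      # delivery completes only when the receiver takes it
    print("received:", msg)

if __name__ == "__main__":
    channel = Queue()                        # the private communication channel between the nodes
    p_send = Process(target=sender, args=(channel,))
    p_recv = Process(target=receiver, args=(channel,))
    p_send.start(); p_recv.start()
    p_send.join(); p_recv.join()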
Advantages of Message Passing Model :
1. Easier to implement.
2. Quite tolerant of high communication latencies.
3. Easier to build massively parallel hardware.
4. Message passing libraries are faster and give high performance.
Disadvantages of Message Passing Model :
1. Programmer has to do everything.
2. Connection setup takes time that’s why it is slower.
3. Data transfer usually requires cooperative operations which can be difficult to achieve.
4. It is difficult for programmers to develop portable applications using this model because
message-passing implementations commonly comprise a library of subroutines that are
embedded in source code. Here again, the programmer has to do everything on his own.
UNIT II:
Cloud Infrastructure: At Amazon, The Google Perspective, Microsoft Windows Azure,
Open Source Software Platforms, Cloud storage diversity, Inter cloud, energy use and
ecological impact, responsibility sharing, user experience, Software licensing, Cloud
Computing :Applications and Paradigms: Challenges for cloud, existing cloud applications
and new opportunities, architectural styles, workflows, The Zookeeper, The Map Reduce
Program model, HPC on cloud, biological research.
Cloud Computing architecture comprises many cloud components, which are loosely
coupled. We can broadly divide the cloud architecture into two parts:
Front End
Back End
Each of the ends is connected through a network, usually Internet. The following diagram
shows the graphical view of cloud computing architecture:
Front End
The front end refers to the client part of cloud computing system. It consists of interfaces
and applications that are required to access the cloud computing platforms, Example - Web
Browser.
Back End
The back end refers to the cloud itself. It consists of all the resources required to provide
cloud computing services. It comprises huge data storage, virtual machines, security
mechanisms, services, deployment models, servers, etc.
Note
It is the responsibility of the back end to provide built-in security mechanism, traffic
control and protocols.
The server employs certain protocols known as middleware, which help the connected
devices to communicate with each other.
Hypervisor
Management Software
Deployment Software
Network
It is the key component of cloud infrastructure. It allows cloud services to be connected over
the Internet. It is also possible to deliver the network as a utility over the Internet, which
means the customer can customize the network route and protocol.
Server
The server helps to share computing resources and offers other services such as resource
allocation and de-allocation, monitoring of resources, providing security, etc.
Storage
Cloud keeps multiple replicas of storage. If one of the storage resources fails, then the data
can be retrieved from another replica, which makes cloud computing more reliable.
Infrastructural Constraints
Fundamental constraints that cloud infrastructure should implement are shown in the
following diagram:
Transparency
Virtualization is the key to sharing resources in a cloud environment. But it is not possible to
satisfy the demand with a single resource or server. Therefore, there must be transparency in
resources, load balancing, and applications, so that we can scale them on demand.
Scalability
Intelligent Monitoring
To achieve transparency and scalability, application solution delivery will need to be capable
of intelligent monitoring.
Security
The mega data centre in the cloud should be securely architected. Also, the control node, an
entry point into the mega data centre, needs to be secure.
AWS
AWS stands for Amazon Web Services which uses distributed IT infrastructure to provide
different IT resources on demand.
Our AWS tutorial includes all the topics such as introduction, history of aws, global
infrastructure, features of aws, IAM, Storage services, Database services, etc.
What is AWS?
Uses of AWS
Pay-As-You-Go
Based on the concept of Pay-As-You-Go, AWS provides the services to the customers.
AWS provides services to customers when required without any prior commitment or upfront
investment. Pay-As-You-Go enables the customers to procure services from AWS.
Computing
Programming models
Database storage
Networking
Advantages of AWS
1) Flexibility
We can get more time for core business tasks due to the instant availability of new
features and services in AWS.
It provides effortless hosting of legacy applications. AWS does not require learning
new technologies, and migrating applications to AWS provides advanced
computing and efficient storage.
AWS also offers the choice of whether we want to run the applications and services
together or not. We can also choose to run a part of the IT infrastructure in AWS and
the remaining part in data centres.
2) Cost-effectiveness
AWS requires no upfront investment, long-term commitment, and minimum expense when
compared to traditional IT infrastructure that requires a huge investment.
3) Scalability/Elasticity
Through AWS autoscaling and elastic load balancing, applications are automatically scaled up
or down as demand increases or decreases. AWS techniques are ideal for
handling unpredictable or very high loads. Due to this reason, organizations enjoy the
benefits of reduced cost and increased user satisfaction.
4) Security
History of AWS
2003: In 2003, Chris Pinkham and Benjamin Black presented a paper on what
Amazon's own internal infrastructure should look like. They suggested selling it as a
service and prepared a business case for it. They prepared a six-page document and
reviewed it to decide whether to proceed with it or not. They decided to proceed with
the documentation.
2004: SQS, which stands for "Simple Queue Service", was officially launched in 2004. A
team launched this service in Cape Town, South Africa.
2006: AWS (Amazon Web Services) was officially launched.
2007: In 2007, over 180,000 developers had signed up for the AWS.
2010: In 2010, amazon.com retail web services were moved to the AWS, i.e.,
amazon.com is now running on AWS.
2011: AWS suffered from some major problems. Some EBS (Elastic Block Store)
volumes became stuck and were unable to serve read and write requests. It took
two days for the problem to get resolved.
2012: AWS hosted its first customer event, known as the re:Invent conference, at
which new products were launched. In the same year, another major problem
occurred in AWS that affected many popular sites such as Pinterest, Reddit, and
Foursquare.
2013: In 2013, certifications were launched. AWS started a certifications program for
software engineers who had expertise in cloud computing.
2014: AWS committed to achieve 100% renewable energy usage for its global
footprint.
2015: AWS revenue reached $6 billion USD per annum. The revenue was growing
90% every year.
2016: By 2016, revenue doubled and reached $13 billion USD per annum.
2017: In 2017, AWS re:Invent released a host of Artificial Intelligence services, due
to which the revenue of AWS doubled and reached $27 billion USD per annum.
2018: In 2018, AWS launched the Machine Learning Specialty certification. It heavily
focused on automating Artificial Intelligence and Machine Learning.
Features of AWS
Flexibility
Cost-effective
Scalable and elastic
Secure
Experienced
1) Flexibility
2) Cost-effective
Cost is one of the most important factors that need to be considered in delivering IT
solutions.
For example, developing and deploying an application can incur a low cost, but after
successful deployment, there is a need for hardware and bandwidth. Owning our own
infrastructure can incur considerable costs, such as power, cooling, real estate, and
staff.
The cloud provides on-demand IT infrastructure that lets you consume only the resources
you actually need. In AWS, you are not limited to a set amount of resources such
as storage, bandwidth, or computing resources, as it is very difficult to predict the
requirements of every resource. Therefore, we can say that the cloud provides
flexibility by maintaining the right balance of resources.
AWS provides no upfront investment, long-term commitment, or minimum spend.
You can scale up or scale down as the demand for resources increases or decreases
respectively.
3) Scalable and elastic
AWS allows you to access resources almost instantly. The ability to respond to changes
more quickly, no matter whether the changes are large or small, means that we can take
up new opportunities to meet the business challenges that could increase the revenue and
reduce the cost.
4) Secure
AWS provides a scalable cloud-computing platform that provides customers with
end-to-end security and end-to-end privacy.
AWS incorporates security into its services and provides documentation describing how
to use the security features.
AWS maintains the confidentiality, integrity, and availability of your data, which is of
the utmost importance to AWS.
Physical security: Amazon has many years of experience in designing, constructing, and
operating large-scale data centres. The AWS infrastructure is housed in AWS-controlled
data centres throughout the world. The data centres are physically secured to prevent
unauthorized access.
Data privacy: Personal and business data can be encrypted to maintain data privacy.
5) Experienced
The AWS cloud provides levels of scale, security, reliability, and privacy.
AWS has built an infrastructure based on lessons learned from over sixteen years of
experience managing the multi-billion dollar Amazon.com business.
Amazon continues to benefit its customers by enhancing their infrastructure
capabilities.
Nowadays, Amazon has become a global web platform that serves millions of
customers, and AWS has been evolving since 2006, serving hundreds of thousands of
customers worldwide.
The following are the components that make up the AWS infrastructure:
Availability Zones
Region
Edge locations
Regional Edge Caches
Availability zone as a Data Center
Region
Availability zones are connected through redundant and isolated metro fibers.
Edge Locations
Edge locations are the endpoints for AWS used for caching content.
Edge locations consist of Cloud Front, Amazon's Content Delivery Network (CDN).
There are more edge locations than regions; currently, there are over 150 edge locations.
An edge location is not a region but a small location that AWS has. It is used for
caching the content.
Edge locations are mainly located in most of the major cities to distribute the content
to end users with reduced latency.
For example, if a user accesses your website from Singapore, then this request
would be redirected to the edge location closest to Singapore, where cached data can
be read.
The following screen appears after clicking on the "Complete Sign Up" button. If
you are an already existing user of an AWS account, then enter the email address of
your AWS account otherwise "create an aws account".
On clicking on the "create an aws account" button, the following screen appears
that requires some fields to be filled by the user.
After providing your payment information, confirm your identity by entering your
phone number and security check code, and then click on the "Contact me" button.
AWS will contact you to verify whether the provided contact number is correct or not.
When the number is verified, the following message appears on the screen.
The final step is the confirmation step. Click on the link to log in again; it redirects
you to the "Management Console".
An AWS account ID
A canonical user ID
AWS account ID
The AWS account ID is a 12-digit number, such as 123456780123, which can be used to
construct Amazon Resource Names (ARNs). When we refer to resources such as an IAM user,
the AWS account ID distinguishes those resources from resources in other AWS accounts.
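As a small illustration (the account ID is the example value from above, and the user name "alice" is a hypothetical placeholder), the account ID is embedded in an ARN like this:

account_id = "123456780123"                      # example 12-digit account ID from above
arn = f"arn:aws:iam::{account_id}:user/alice"    # hypothetical IAM user "alice"
print(arn)                                       # arn:aws:iam::123456780123:user/alice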
We can find the AWS account ID from AWS Management Console. The following steps are
taken to view your account ID:
Login to the aws account by entering your email address and password, and then you
will move to the management console.
Click on "My Account" in the dropdown menu of account name to view your account ID.
Canonical User ID
Firstly, visit the website https://aws.amazon.com, and log in to the aws account by
entering your email address and password.
From the right side of the management console, click on the account name.
Click on the "My Security Credentials" from the dropdown menu of the account
name. The screen appears which is shown below:
Click on the Account identifiers to view the Canonical user ID.
IAM Identities
IAM identities are created to provide authentication for people and processes in your aws
account.
IAM Users
IAM Groups
IAM Roles
When you first create an AWS account, you create an account as a root user identity
which is used to sign in to AWS.
You can sign to the AWS Management Console by entering your email address and
password. The combination of email address and password is known as root user
credentials.
When you sign in to the AWS account as a root user, you have unrestricted access to all
the resources in the AWS account.
The root user can also access the billing information and can change the
password.
What is a Role?
A role is a set of permissions that grant access to actions and resources in AWS.
These permissions are attached to the role, not to an IAM User or a group.
An IAM User can use a role in the same AWS account or a different account.
A role is similar to an IAM User: a role is also an AWS identity with permission
policies that determine what the identity can and cannot do in AWS.
A role is not uniquely associated with a single person; it can be used by anyone who
needs it.
A role does not have long-term security credentials, i.e., a password or access keys.
Instead, when a user assumes a role, temporary security credentials are created and
provided to the user.
You can use the roles to delegate access to users, applications or services that
generally do not have access to your AWS resources.
Sometimes you want to grant users access to the AWS resources in your AWS
account.
Sometimes you want to grant users access to the AWS resources in another AWS
account.
It also allows a mobile app to access the AWS resources without storing the
keys in the app.
It can be used to grant access to the AWS resources to users who have identities
outside of AWS.
It can also be used to grant access to the AWS resources to a third party so that they
can perform an audit on the AWS resources.
Following are the important terms associated with the "IAM Roles":
To delegate permission to access the resources, an IAM role is to be created in the trusting
account that has the two policies attached.
Permission Policy: It grants the user with a role the needed permissions to carry out the
intended tasks.
Trust Policy: It specifies which trusted account members can use the role.
IAM Console: When IAM Users working in the IAM Console want to use a
role, they acquire the permissions of the role temporarily. IAM Users give up
their original permissions and take on the permissions of the role. When an IAM User
exits the role, their original permissions are restored.
Programmatic Access: An AWS service such as Amazon EC2 instance can use role
by requesting temporary security credentials using the programmatic requests to
AWS.
IAM User: IAM Roles are used to grant the permissions to your IAM Users to access
AWS resources within your own or different account. An IAM User can use the
permissions attached to the role using the IAM Console. A Role also prevents the
accidental access to the sensitive AWS resources.
Applications and Services: You can grant the permissions attached to a
role to applications and services by calling the AssumeRole API function. The
AssumeRole function returns temporary security credentials associated with the role.
Applications and services can only take those actions that are permitted by the
role. An application cannot exit the role in the way an IAM User in the console does;
rather, it stops using the temporary credentials and resumes using its original
credentials.
Federated Users: Federated Users can sign in using the temporary credentials
provided by an identity provider. AWS provides an IDP (identity provider) and
temporary credentials associated with the role to the user. The credentials grant the
access of permissions to the user.
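A hedged sketch of the AssumeRole call using boto3, the AWS SDK for Python (the role ARN and session name are hypothetical placeholders): STS returns temporary security credentials that can then be used to create clients limited to the role's permissions.

import boto3

sts = boto3.client("sts")
resp = sts.assume_role(
    RoleArn="arn:aws:iam::123456780123:role/demo-role",   # hypothetical role ARN
    RoleSessionName="demo-session",
)
creds = resp["Credentials"]            # temporary AccessKeyId, SecretAccessKey, SessionToken
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)                                      # this client can only do what the role permits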
Switch to a role as an IAM User in one AWS account to access resources in another
account that you own.
o You can grant the permission to your IAM Users to switch roles within your
AWS account or different account. For example, you have Amazon EC2
instances which are very critical to your organization. Instead of directly
granting permission to users to terminate the instances, you can create a role
with the privileges that allows the administrators to switch to the role when
they need to terminate the instance.
o You have to grant users permission to assume the role explicitly.
o Multi-factor authentication (MFA) can be required for the role so that only users who
sign in with MFA can use the role.
o Roles prevent accidental changes to the sensitive resource, especially if you
combine them with the auditing so that the roles can only be used when
needed.
o An IAM User in one account can switch to a role in the same or a different
account. With roles, a user can access the resources permitted by the role.
When users switch to the role, their original permissions are taken away; if
a user exits the role, their original permissions are restored.
Providing access to an AWS service
o AWS services use roles to access AWS resources.
o Each service is different in how it uses roles and how the roles are assigned to
the service.
o Suppose an AWS service such as an Amazon EC2 instance that runs your
application wants to make requests to AWS resources such as an Amazon S3
bucket; the service must have security credentials to access those resources. If
you embed security credentials directly into the instance, then distributing the
credentials to multiple instances creates a security risk. To overcome such
problems, you can create a role that is assigned to the Amazon EC2 instance
and grants it permission to access the resources.
Providing access to externally authenticated users.
Sometimes users have identities outside of AWS, such as in your corporate directory.
If such users want to work with the AWS resources, they need security
credentials. In such situations, we can use a role to specify the permissions
for a third-party identity provider (IDP).
o SAML -based federation
SAML 2.0 (Security Assertion Markup Language 2.0) is an open framework
that many identity providers use. SAML provides the user with the federated
single-sign-on to the AWS Management Console, so that user can log in to the
AWS Management Console.
How SAML-based federation works
o Web-identity federation
Suppose you are creating a mobile app that accesses AWS resources, such as a
game that runs on a mobile device, but the information is stored using Amazon
S3 and DynamoDB.
When you create such an app, you need to make requests to the AWS services
that must be signed with an AWS access key. However, it is recommended not
to use long-term AWS credentials, not even in an encrypted form. An
Application must request for the temporary security credentials which are
dynamically created when needed by using web-identity federation. These
temporary security credentials will map to a role that has the permissions
needed for the app to perform a task.
With web-identity federation, users do not require any custom sign-in code or
user identities. A user can log in using an external identity provider such as
Login with Amazon, Facebook, Google, or another OpenID provider. After login, the
user gets an authentication token, and they exchange the authentication token
for temporary security credentials.
Providing access to third parties
When third parties want to access the AWS resources, you can use roles to
delegate access to them. IAM roles allow these third parties to access the AWS
resources without sharing any security credentials.
Third parties provide the following information to create a role:
o The third party provides the account ID that contains the IAM Users to use
your role. You need to specify AWS account ID as the principal when you
define the trust policy for the role.
o The external ID of the third party is used to associate with the role. You
specify the external ID to define the trust policy of the role.
o The permissions that the third party uses to access the AWS resources. The
permissions are associated with the role made when you define the trust
policy. The policy defines the actions they can take and the resources
they can use.
In the navigation pane of the console, click Roles and then click on "Create Role".
The screen appears shown below on clicking Create Role button.
Choose the service that you want to use with the role.
Select the managed policy that attaches the permissions to the service.
In a role name box, enter the role name that describes the role of the service, and then
click on "Create role".
Creating a Role for a service using the CLI (Command Line Interface)
When you create a role using the console, many of the steps are done for you, but with
the CLI you explicitly perform each step yourself. You must create the role and then
assign a permission policy to the role.
To create a role for an AWS service using the AWS CLI, use the following
commands:
o Create a role: aws iam create-role
o Attach a permission policy to the role: aws iam put-role-policy
If you are using a role with an instance such as an Amazon EC2 instance, then you need to
create an instance profile to store the role. An instance profile is a container for a role, and
an instance profile can contain only one role. If you create the role by using the AWS
Management Console, the instance profile is created for you. If you create the role using
the CLI, you must explicitly perform each step yourself.
To create an instance profile using CLI, use the following commands:
o Create an instance profile: aws iam create-instance-profile
o Add a role to instance profile: aws iam add-role-to-instance-profile
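A hedged boto3 (Python) equivalent of the CLI steps above; the role name, instance profile name, and policy contents are hypothetical placeholders chosen for illustration.

import json
import boto3

iam = boto3.client("iam")

trust_policy = {                                   # trust policy: lets EC2 assume the role
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow",
                   "Principal": {"Service": "ec2.amazonaws.com"},
                   "Action": "sts:AssumeRole"}],
}
iam.create_role(RoleName="demo-ec2-role",
                AssumeRolePolicyDocument=json.dumps(trust_policy))

permission_policy = {                              # permission policy: what the role may do (example only)
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "s3:GetObject", "Resource": "*"}],
}
iam.put_role_policy(RoleName="demo-ec2-role", PolicyName="demo-s3-read",
                    PolicyDocument=json.dumps(permission_policy))

iam.create_instance_profile(InstanceProfileName="demo-profile")       # container for the role
iam.add_role_to_instance_profile(InstanceProfileName="demo-profile",
                                 RoleName="demo-ec2-role")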
Creating IAM Roles for an IAM User
In the navigation pane of the console, click Roles and then click on "Create Role".
The screen appears shown below on clicking Create Role button.
Specify the account ID to which you want to grant access to the resources, and then
click on the Next Permissions button.
Selecting the option "Require external ID" allows users from a third party to access the
resources. You need to enter the external ID provided by the administrator of the third
party. This condition is automatically added to the trust policy that allows the user to
assume the role.
Selecting the option "Require MFA" restricts the role to users who provide
multi-factor authentication.
Select a policy that you want to attach with the role. A policy contains the permissions
that specify the actions that they can take and resources that they can access.
In a role name box, enter the role name and the role description.
Creating a Role for an IAM User using CLI (Command Line Interface)
When you use the console to create a role, many of the steps are already done for you. In the
case of CLI, you must specify each step explicitly.
To create a role for cross-account access using CLI, use the following commands:
Identity Federation allows users who sign in using a third-party identity provider to access
AWS resources. To configure Identity Federation, you must configure the
identity provider and then create an IAM Role that determines the permissions which
federated users can have.
Web Identity Federation: Web Identity Federation provides access to the AWS
resources for users who have signed in with Login with Facebook, Google, Amazon, or
another OpenID standard. To configure Web Identity Federation, you must
first create and configure the identity provider and then create the IAM Role that
determines the permissions that federated users will have.
Security Assertion Markup Language (SAML) 2.0 Federation: SAML-Based
Federation provides access to the AWS resources in an organization that uses SAML.
To configure SAML 2.0 Based Federation, you must first create and configure the
identity provider and then create the IAM Role that determines the permissions the
federated users from the organization will have.
Creating a Role for a web identity using AWS Management Console
In a role name box, specify the role name and role description
Creating a Role for SAML Based 2.0 Federation using AWS Management Console
To create a role for federated users using AWS CLI, use the following commands:
To attach permissions to the role: aws iam attach-role-policy or aws iam put-role-policy
S3-101
What is S3?
If you upload a file to an S3 bucket, then you will receive an HTTP 200 code, which
means that the upload of the file was successful.
Advantages of Amazon S3
Create Buckets: Firstly, we create a bucket and provide a name to the bucket.
Buckets are the containers in S3 that store the data. Buckets must have a unique
name to generate a unique DNS address.
Storing data in buckets: A bucket can be used to store an infinite amount of data. You
can upload as many files as you want into an Amazon S3 bucket, i.e., there is no
maximum limit on the number of files. Each object can contain up to 5 TB of data, and
each object can be stored and retrieved by using a unique, developer-assigned key.
Download data: You can also download your data from a bucket and can also give
permission to others to download the same data. You can download the data at any
time whenever you want.
Permissions: You can also grant or deny access to others who want to download or
upload the data from your Amazon S3 bucket. Authentication mechanism keeps the
data secure from unauthorized access.
Standard interfaces: S3 provides standard REST and SOAP interfaces, which are
designed in such a way that they can work with any development toolkit.
Security: Amazon S3 offers security features by preventing unauthorized users from
accessing your data.
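A hedged sketch of these operations using boto3, the AWS SDK for Python; the bucket and file names are hypothetical placeholders, credentials and region are assumed to be configured already, and bucket names must be globally unique.

import boto3

s3 = boto3.client("s3")

# Create a bucket (outside us-east-1 a CreateBucketConfiguration with a LocationConstraint is also required).
s3.create_bucket(Bucket="my-example-bucket-12345")

# Store data in the bucket: upload a local file under a key.
s3.upload_file("hello.txt", "my-example-bucket-12345", "docs/hello.txt")

# Download the object back at any time.
s3.download_file("my-example-bucket-12345", "docs/hello.txt", "hello-copy.txt")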
Key: It is simply the name of the object. For example, hello.txt, spreadsheet.xlsx, etc.
You can use the key to retrieve the object.
Value: It is simply the data, which is made up of a sequence of bytes. It is actually the
data inside the file.
Version ID: Version ID uniquely identifies the object. It is a string generated by S3
when you add an object to the S3 bucket.
Metadata: It is the data about data that you are storing. A set of a name-value pair
with which you can store the information regarding an object. Metadata can be
assigned to the objects in Amazon S3 bucket.
Sub resources: Sub resource mechanism is used to store object-specific information.
Access control information: You can put the permissions individually on your files.
Amazon S3 Concepts
Buckets
Objects
Keys
Regions
Data Consistency Model
Buckets
o A bucket is a container used for storing the objects.
o Every object is incorporated in a bucket.
o For example, if the object named photos/tree.jpg is stored in the treeimage
bucket, then it can be addressed by using the URL
http://treeimage.s3.amazonaws.com/photos/tree.jpg.
o A bucket has no limit on the number of objects that it can store. No bucket can
exist inside another bucket.
o S3 performance remains the same regardless of how many buckets have been
created.
o The AWS user that creates a bucket owns it, and no other AWS user can own
it. Therefore, we can say that the ownership of a bucket is not transferable.
o The AWS account that creates a bucket can delete a bucket, but no other AWS
user can delete the bucket.
Objects
o Objects are the entities which are stored in an S3 bucket.
o An object consists of object data and metadata, where metadata is a set of
name-value pairs that describes the data.
o An object has some default metadata, such as the date last modified, and
standard HTTP metadata, such as Content-Type. Custom metadata can also be
specified at the time of storing an object.
o It is uniquely identified within a bucket by key and version ID.
Key
o A key is a unique identifier for an object.
o Every object in a bucket is associated with one key.
o An object can be uniquely identified by using a combination of bucket name,
the key, and optionally version ID.
o For example, in the URL
http://jtp.s3.amazonaws.com/2019-01-31/Amazons3.wsdl where "jtp" is the
bucket name, and key is "2019-01-31/Amazons3.wsdl"
Regions
o You can choose a geographical region in which you want to store the buckets
that you have created.
o A region is chosen in such a way that it optimizes latency, minimizes costs,
or addresses regulatory requirements.
o Objects will not leave the region unless you explicitly transfer the objects to
another region.
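To make the bucket/key/object model above concrete, here is a minimal sketch using the AWS SDK for Java (v1). The bucket name, key, object content, and region are illustrative placeholders (they are not from these notes), and the client assumes credentials are already configured in the environment:

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.util.IOUtils;

public class S3Sketch {
    public static void main(String[] args) throws Exception {
        // Build a client for a chosen region (illustrative).
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .build();

        String bucket = "treeimage-notes-demo";   // bucket names must be globally unique
        String key = "photos/tree-notes.txt";     // the object's key within the bucket

        // Create the bucket (the container for objects).
        s3.createBucket(bucket);

        // Store an object: bucket + key + value (the data).
        s3.putObject(bucket, key, "hello from the notes");

        // Retrieve the same object by bucket name and key.
        S3Object object = s3.getObject(bucket, key);
        String value = IOUtils.toString(object.getObjectContent());
        System.out.println(value);   // prints: hello from the notes
    }
}

The stored object is then addressable at a URL of the form http://&lt;bucket&gt;.s3.amazonaws.com/&lt;key&gt;, exactly as in the treeimage example above.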
1. Data Security and Privacy
Data security is a major concern when working with Cloud environments. It is one of the
major challenges in cloud computing as users have to take accountability for their data, and
not all Cloud providers can assure 100% data privacy. Lack of visibility and control tools, no
identity access management, data misuse, and Cloud misconfiguration are the common
causes behind Cloud privacy leaks. There are also concerns with insecure APIs, malicious
insiders, and oversights or neglect in Cloud data management.
Solution: Configure network hardware and install the latest software updates to prevent
security vulnerabilities. Using firewalls, antivirus, and increasing bandwidth for Cloud data
availability are some ways to prevent data security risks.
2. Multi-Cloud Environments
Common cloud computing issues and challenges with multi-cloud environments are -
configuration errors, lack of security patches, data governance, and no granularity. It is
difficult to track the security requirements of multi-clouds and apply data management
policies across various boards.
Solution: Using a multi-cloud data management solution is a good start for enterprises. Not
all tools will offer specific security functionalities, and multi-cloud environments grow highly
sophisticated and complex. Open-source products like Terraform provide a great deal of
control over multi-cloud architectures.
3. Performance Challenges
The performance of Cloud computing solutions depends on the vendors who offer these
services to clients, and if a Cloud vendor goes down, the business gets affected too. It is one
of the major challenges associated with cloud computing.
Solution: Sign up with Cloud Service Providers who have real-time SaaS monitoring policies.
4. Interoperability and Flexibility
Interoperability is a challenge when you try to move applications between two or more
Cloud ecosystems, and it is one of the common challenges faced in cloud computing:
applications built for one Cloud often have to be reworked before they run on another.
5. High Dependence on Network
Lack of sufficient internet bandwidth is a common problem when transferring large volumes
of information to and from Cloud data servers; it is another of the many challenges in cloud
computing. Data is highly vulnerable, and there is a risk of sudden outages. Enterprises that
want to lower hardware costs without sacrificing performance need to ensure there is high
bandwidth, which will help prevent business losses from sudden outages.
Solution: Pay more for higher bandwidth and focus on improving operational efficiency to
address network dependencies.
6. Lack of Knowledge and Expertise
Organizations are finding it tough to find and hire the right Cloud talent, which is another
common challenge in cloud computing. There is a shortage of professionals with the required
qualifications in the industry. Workloads are increasing, and the number of tools launched in
the market is increasing. Enterprises need good expertise in order to use these tools and find
out which ones are ideal for them.
7. Reliability and Availability
High unavailability of Cloud services and a lack of reliability are two major concerns in these
ecosystems. Organizations are forced to seek additional computing resources in order to keep
up with changing business requirements. If a Cloud vendor gets hacked or affected, the data
of organizations using their services gets compromised. It is another one of the many cloud
security risks and challenges faced by the industry.
Solution: Implementing the NIST Framework standards in Cloud environments can greatly
improve both aspects.
8. Password Security
Account managers often use the same passwords to manage all their Cloud accounts.
Password management is a critical problem, and it is often found that users resort to
reused and weak passwords.
Solution: Use a strong password management solution to secure all your accounts. To further
improve security, use Multifactor Authentication (MFA) in addition to a password manager.
Good cloud-based password managers alert users of security risks and leaks.
9. Cost Management
Even though Cloud Service Providers (CSPs) offer a pay-as-you-go subscription for services,
the costs can add up. Hidden costs appear in the form of underutilized resources in
enterprises.
Solution: Auditing systems regularly and implementing resource utilization monitoring tools
are some ways organizations can fix this. It's one of the most effective ways to manage
budgets and deal with major challenges in cloud computing.
10. Lack of Expertise
Cloud computing is a highly competitive field, and there are many professionals who lack the
required skills and knowledge to work in the industry. There is also a huge gap in supply and
demand for certified individuals and many job vacancies.
Solution: Companies should retrain their existing IT staff and help them in upskilling their
careers by investing in Cloud training programs.
11. Control and Governance
Good IT governance ensures that the right tools are used and assets get implemented
according to procedures and agreed-upon policies. Lack of governance is a common problem,
and companies use tools that do not align with their vision. IT teams don't get total control of
compliance, risk management, and data quality checks, and there are many uncertainties
faced when migrating to the Cloud from traditional infrastructures.
12. Compliance
Many Cloud Service Providers (CSPs) are not up to date when it comes to having the best data
compliance policies. Whenever a user transfers data from internal servers to the Cloud, they
can run into compliance issues with state laws and regulations.
Solution: The General Data Protection Regulation (GDPR) is expected to streamline
compliance for CSPs in the future.
13. Multiple Cloud Management
Solution: Creating strong data management and privacy policies is a starting point when it
comes to managing multi-cloud environments effectively.
14. Migration
Migration of data to the Cloud takes time, and not all organizations are prepared for it. Some
report increased downtimes during the process, face security issues, or have problems with
data formatting and conversions. Cloud migration projects can get expensive and are harder
than anticipated.
Solution: Organizations will have to employ in-house professionals to handle their Cloud data
migration and increase their investments. Experts must analyze cloud computing issues and
solutions before investing in the latest platforms and services offered by CSPs.
Conclusion
Now that you're aware of the key challenges in the adoption of cloud computing and their
solutions, you can take the necessary steps to address them. It is important to view a
company's Cloud strategy as a whole,
involving both people and processes, and not just the technology. The Cloud is powerful and,
if implemented properly, can substantially help an organization grow faster and perform
better.
Cloud Computing Applications
Cloud service providers provide various applications in the field of art, business, data storage
and backup services, education, entertainment, management, social networking, etc.
The most widely used cloud computing applications are given below -
1. Art Applications
Cloud computing offers various art applications for quickly and easily designing attractive
cards, booklets, and images. Some of the most commonly used cloud art applications are
given below:
i. Moo
Moo is one of the best cloud art applications. It is used for designing and printing business
cards, postcards, and mini cards.
ii. Vistaprint
Vistaprint allows us to easily design various printed marketing products such as business
cards, postcards, booklets, and wedding invitation cards.
iii. Adobe Creative Cloud
Adobe Creative Cloud is made for designers, artists, filmmakers, and other creative
professionals. It is a suite of apps which includes Photoshop for image editing, Illustrator,
InDesign, Typekit, Dreamweaver, XD, and Audition.
2. Business Applications
Business applications are based on cloud service providers. Today, every organization
requires cloud business applications to grow its business. The cloud also ensures that business
applications are available to users 24/7.
i. MailChimp
MailChimp is an email publishing platform which provides various options to design, send,
and save templates for emails.
iii. Salesforce
Salesforce platform provides tools for sales, service, marketing, e-commerce, and more. It
also provides a cloud development platform.
iv. Chatter
Chatter helps us to share important information about the organization in real time.
v. Bitrix24
vi. Paypal
Paypal offers the simplest and easiest online payment mode using a secure internet account.
Paypal accepts the payment through debit cards, credit cards, and also from Paypal account
holders.
vii. Slack
Slack stands for Searchable Log of all Conversation and Knowledge. It provides a user-
friendly interface that helps us to create public and private channels for communication.
viii. Quickbooks
3. Data Storage and Backup Applications
Cloud computing allows us to store information (data, files, images, audio, and video) on
the cloud and access this information using an internet connection. As the cloud provider is
responsible for providing security, it also offers various backup and recovery applications
for retrieving lost data.
A list of data storage and backup applications in the cloud are given below -
i. Box.com
Box provides an online environment for secure content management, workflow, and
collaboration. It allows us to store different files such as Excel, Word, PDF, and images on
the cloud. The main advantage of using box is that it provides drag & drop service for files
and easily integrates with Office 365, G Suite, Salesforce, and more than 1400 tools.
ii. Mozy
Mozy provides powerful online backup solutions for our personal and business data. It
automatically schedules a backup each day at a specific time.
iii. Joukuu
Joukuu provides the simplest way to share and track cloud-based backup files. Many users
use Joukuu to search files and folders and to collaborate on documents.
iv. Google G Suite
Google G Suite is one of the best cloud storage and backup applications. It includes Google
Calendar, Docs, Forms, Google+, Hangouts, as well as cloud storage and tools for managing
cloud apps. The most popular app in the Google G Suite is Gmail. Gmail offers free email
services to users.
4. Education Applications
Cloud computing has become very popular in the education sector. It offers various online
distance learning platforms and student information portals to students. The advantage of
using the cloud in the field of education is that it offers strong virtual classroom
environments, ease of accessibility, secure data storage, scalability, greater reach for
students, and minimal hardware requirements for the applications.
i. Google Apps for Education
Google Apps for Education is the most widely used platform for free web-based email,
calendar, documents, and collaborative study.
ii. Chromebooks for Education
Chromebooks for Education is one of Google's most important projects. It is designed to
enhance innovation in education.
iii. Tablets with Google Play for Education
It allows educators to quickly implement the latest technology solutions in the classroom
and make them available to their students.
iv. AWS in Education
5. Entertainment Applications
Entertainment industries use a multi-cloud strategy to interact with the target audience.
Cloud computing offers various entertainment applications such as online games and video
conferencing.
i. Online games
Today, cloud gaming has become one of the most important entertainment media. It offers
various online games that run remotely from the cloud. The best-known cloud gaming services
are Shadow, GeForce Now, Vortex, Project xCloud, and PlayStation Now.
ii. Video Conferencing Apps
Video conferencing apps provide a simple and instant connected experience. They allow us to
communicate with our business partners, friends, and relatives using cloud-based video
conferencing. The benefits of video conferencing are that it reduces cost, increases
efficiency, and removes interoperability problems.
6. Management Applications
Cloud computing offers various cloud management tools which help admins to manage all
types of cloud activities, such as resource deployment, data integration, and disaster recovery.
These management tools also provide administrative control over the platforms, applications,
and infrastructure.
i. Toggl
Toggl helps users track the time allocated to a particular project.
ii. Evernote
Evernote allows you to sync and save your recorded notes, typed notes, and other notes in
one convenient place. It is available in both free and paid versions.
It runs on platforms such as Windows, macOS, Android, iOS, browsers, and Unix.
iii. Outright
Outright is used by management users for accounting purposes. It helps to track income,
expenses, profits, and losses in a real-time environment.
iv. GoToMeeting
GoToMeeting provides video conferencing and online meeting apps, which allow you to
start a meeting with your business partners anytime, anywhere using mobile phones or
tablets. Using the GoToMeeting app, you can perform management-related tasks such as
joining meetings in seconds, viewing presentations on a shared screen, and getting alerts
for upcoming meetings.
7. Social Applications
Social cloud applications allow a large number of users to connect with each other using
social networking applications such as Facebook, Twitter, LinkedIn, etc.
i. Facebook
Facebook is a social networking website which allows active users to share files, photos,
videos, status updates, and more with their friends, relatives, and business partners using the
cloud storage system. On Facebook, we always get notifications when our friends like or
comment on our posts.
ii. Twitter
iii. Yammer
Yammer is the best team collaboration tool that allows a team of employees to chat, share
images, documents, and videos.
iv. LinkedIn
Cloud computing is one of the most in-demand technologies of the current time, and it is
giving a new shape to every organization by providing on-demand virtualized
services/resources. From small to medium and medium to large, every organization uses
cloud computing services for storing information and accessing it from anywhere at any
time with the help of the internet. In this section, we look at the internal architecture of
cloud computing.
Transparency, scalability, security, and intelligent monitoring are some of the most important
properties that every cloud infrastructure should provide. Ongoing research on other
important properties is helping cloud computing systems come up with new features and
strategies capable of providing more advanced cloud solutions.
Cloud Computing Architecture :
The cloud architecture is divided into two parts:
1. Frontend
2. Backend
1. Frontend:
The frontend of the cloud architecture refers to the client side of the cloud computing system.
It contains all the user interfaces and applications which are used by the client to access the
cloud computing services/resources, for example, a web browser used to access the cloud
platform.
2. Backend:
The backend refers to the cloud itself, which is used by the service provider. It contains the
resources, manages them, and provides security mechanisms. Along with this, it includes
huge storage, virtual applications, virtual machines, traffic control mechanisms, deployment
models, etc. The main components of the backend are:
1. Application –
Application in the backend refers to the software or platform that the client accesses;
it provides the service in the backend as per the client's requirements.
2. Service –
Service in the backend refers to the three major types of cloud-based services: SaaS,
PaaS, and IaaS. It also manages which type of service the user accesses.
3. Runtime Cloud-
Runtime cloud in backend provides the execution and Runtime platform/environment
to the Virtual machine.
4. Storage –
Storage in backend provides flexible and scalable storage service and management of
stored data.
5. Infrastructure –
Cloud Infrastructure in backend refers to the hardware and software components of
cloud like it includes servers, storage, network devices, virtualization software etc.
6. Management –
Management in backend refers to management of backend components like
application, service, runtime cloud, storage, infrastructure, and other security
mechanisms etc.
7. Security –
Security in the backend refers to the implementation of different security mechanisms
to secure cloud resources, systems, files, and infrastructure for end users.
8. Internet –
Internet connection acts as the medium or a bridge between frontend and backend and
establishes the interaction and communication between frontend and backend.
We have identified a set of architecture styles that are commonly found in cloud applications.
The article for each style includes:
This section gives a quick tour of the architecture styles that we've identified, along with
some high-level considerations for their use. Read more details in the linked topics.
N-tier
N-tier is a natural fit for migrating existing applications that already use a layered
architecture. For that reason, N-tier is most often seen in infrastructure as a service (IaaS)
solutions, or in applications that use a mix of IaaS and managed services.
Web-Queue-Worker
For a purely PaaS solution, consider a Web-Queue-Worker architecture. In this style, the
application has a web front end that handles HTTP requests and a back-end worker that
performs CPU-intensive tasks or long-running operations. The front end communicates to the
worker through an asynchronous message queue.
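As a rough, in-process sketch of this pattern (assuming a Java BlockingQueue stands in for a managed queue service such as Amazon SQS or Azure Queue Storage, and threads stand in for separate web and worker processes):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WebQueueWorkerSketch {
    public static void main(String[] args) throws InterruptedException {
        // The queue decouples the web front end from the back-end worker.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Back-end worker: pulls messages and performs the long-running work.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String job = queue.take();
                    System.out.println("worker processing: " + job);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();

        // Web front end: handles a "request" by enqueueing work and returning immediately.
        queue.put("resize-image-42");
        queue.put("generate-report-7");

        Thread.sleep(500);   // give the worker time to drain the queue before the demo exits
    }
}

In a real deployment the queue is durable and the web and worker tiers scale independently; the in-memory queue here only illustrates the decoupling.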
Microservices
Event-driven architecture
Consider an event-driven architecture for applications that ingest and process a large volume
of data with very low latency, such as IoT solutions. The style is also useful when different
subsystems must perform different types of processing on the same event data.
Big Data and Big Compute are specialized architecture styles for workloads that fit certain
specific profiles. Big data divides a very large dataset into chunks, performing parallel
processing across the entire set, for analysis and reporting. Big compute, also called high-
performance computing (HPC), makes parallel computations across a large number
(thousands) of cores. Domains include simulations, modeling, and 3-D rendering.
Architecture styles as constraints
An architecture style places constraints on the design, including the set of elements that can
appear and the allowed relationships between those elements. Constraints guide the "shape"
of architecture by restricting the universe of choices. When architecture conforms to the
constraints of a particular style, certain desirable properties emerge.
By adhering to these constraints, what emerges is a system where services can be deployed
independently, faults are isolated, frequent updates are possible, and it's easy to introduce
new technologies into the application.
Before choosing an architecture style, make sure that you understand the underlying
principles and constraints of that style. Otherwise, you can end up with a design that
conforms to the style at a superficial level, but does not achieve the full potential of that style.
It's also important to be pragmatic. Sometimes it's better to relax a constraint, rather than
insist on architectural purity.
The following table summarizes how each style manages dependencies, and the types of
domain that are best suited for each.
Architecture style: Web-Queue-Worker
Dependency management: Front-end and back-end jobs, decoupled by asynchronous messaging.
Domain type: Relatively simple domain with some resource-intensive tasks.
Constraints also create challenges, so it's important to understand the trade-offs when
adopting any of these styles. Do the benefits of the architecture style outweigh the challenges
for this subdomain and bounded context?
Here are some of the types of challenges to consider when selecting an architecture style:
Step 1: You have to verify an order. You have your EC2 instances, and they go and check
whether the order is in stock or not. Once the order has been verified, i.e., you have got a
stock, then move to step 2.
Step 2: Now, it works on the Charge Credit card. It checks whether the charge of a credit
card has been successful or not.
Step 3: If the charge of the credit card has been successful, we ship the order. Shipping an
order needs human interaction: a human brings the order from the warehouse, and once the
product has been boxed up, it is ready for shipment.
Step 4: Record Completion is a database which says that the product has been boxed up and
shipped to the destination address. It also provides the tracking number. This is the end of the
typical workflow.
Workers are the programs that interact with the Amazon SWF to get the tasks, process
the received tasks, and return the results.
The decider is a program that provides coordination of tasks such as ordering,
concurrency, scheduling, etc according to the application logic.
Both workers and deciders run on the cloud infrastructure such as Amazon EC2, or
machines behind firewalls.
Deciders take a consistent view into the progress of tasks and initiate new tasks while
Amazon SWF stores the tasks and assigns them to the workers to process them.
Amazon SWF ensures that the task is assigned only once and is never duplicated.
Workers and Deciders do not have to keep track of the execution state as Amazon
SWF maintains the state durably.
Both the workers and deciders run independently and scale quickly.
SWF Domains
Domains are containers which isolate a set of types, executions, and task lists from
others within the same account.
Workflow, activity types, and workflow execution are all scoped to a domain.
You can register a domain either by using the AWS Management Console or
RegisterDomain action in the Amazon SWF API.
The parameters are specified in a JSON (Javascript Object Notation) format. The format is
shown below:
RegisterDomain
{
  "name": "867530901",
  "Description": "music",
  "workflowExecutionRetentionPeriodInDays": "60"
}
Here, name is the name of the domain, Description is an optional text description of the
domain, and workflowExecutionRetentionPeriodInDays specifies how many days Amazon
SWF retains the workflow execution history after an execution closes.
Note: The maximum workflow execution time can be 1 year, and its value is measured in seconds.
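The same registration can also be done programmatically. Below is a minimal sketch using the AWS SDK for Java (v1); it mirrors the JSON parameters above and assumes default credentials and region configuration are available to the client builder:

import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflow;
import com.amazonaws.services.simpleworkflow.AmazonSimpleWorkflowClientBuilder;
import com.amazonaws.services.simpleworkflow.model.RegisterDomainRequest;

public class RegisterSwfDomain {
    public static void main(String[] args) {
        // Build an SWF client from the default credential/region chain.
        AmazonSimpleWorkflow swf = AmazonSimpleWorkflowClientBuilder.defaultClient();

        // Register the domain described by the JSON parameters above.
        swf.registerDomain(new RegisterDomainRequest()
                .withName("867530901")
                .withDescription("music")
                .withWorkflowExecutionRetentionPeriodInDays("60"));
    }
}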
Amazon SWF provides a task-oriented API, while Amazon SQS provides a message-
oriented API.
Amazon SWF ensures that a task is assigned only once and is never duplicated. With
Amazon SQS, a message can be duplicated, and the application may need to ensure that
a message is processed only once.
SWF keeps track of all tasks and events in an application, while with SQS the application
has to implement its own application-level tracking when it uses multiple queues.
Features of SWF
Scalable
Amazon SWF automatically scales the resources along with your application's usage.
There is no manual administration of the workflow service required when you add
more cloud workflows or increase the complexity of the workflows.
Reliable
Amazon SWF runs at Amazon's highly available data centres, so state tracking is
provided whenever applications need it. Amazon SWF stores the tasks, sends them to
their respective application components, and keeps track of their progress.
Simple
Amazon SWF replaces the complexity of older workflow solutions and process
automation software with a cloud-based workflow web service. It eliminates the need
for developers to manage the automation process so that they can focus on the unique
functionality of the application.
Logical separation
Amazon SWF provides a logical separation between the control flow of your
background job's stepwise logic and the actual units of work that contain the business
logic. Due to this logical separation, you can manage, maintain, and scale the "state
machinery" of your application separately from the business logic. As business
requirements change, you can easily change the business logic without having to worry
about the state machinery, task dispatch, and flow control.
Flexible
Amazon SWF allows you to modify the application components, i.e., you can write
the application logic in any programming language and run it within the cloud or
on-premises.
Zookeeper is a distributed, open-source coordination service for distributed applications. It
exposes a simple set of primitives that can be used to implement higher-level services for
synchronization, configuration maintenance, and groups and naming.
Coordination Challenge
Apache Zookeeper
Other Components
Client – One of the nodes in our distributed application cluster; it accesses information
from the server. Every client sends a message to the server to let the server know that
the client is alive.
Server – Provides all the services to the client and gives acknowledgments to the client.
Ensemble– Group of Zookeeper servers. The minimum number of nodes that are
required to form an ensemble is 3.
Zookeeper Data Model
In Zookeeper, data is stored in a hierarchical namespace, similar to a file system. Each node
in the namespace is called a Znode, and it can store data and have children. Znodes are
similar to files and directories in a file system. Zookeeper provides a simple API for creating,
reading, writing, and deleting Znodes. It also provides mechanisms for detecting changes to
the data stored in Znodes, such as watches and triggers. Znodes maintain a stat structure that
includes: Version number, ACL, Timestamp, Data Length
Types of Znodes: Persistent, Ephemeral, and Sequential.
Zookeeper is used to manage and coordinate the nodes in a Hadoop cluster, including the
NameNode, DataNode, and ResourceManager. In a Hadoop cluster, Zookeeper helps to ensure
the availability and reliability of the cluster by providing a central coordination service for the
nodes in the cluster.
ZooKeeper operates as a distributed file system and exposes a simple set of APIs that enable
clients to read and write data to the file system. It stores its data in a tree-like structure called
a znode, which can be thought of as a file or a directory in a traditional file system.
ZooKeeper uses a consensus algorithm to ensure that all of its servers have a consistent view
of the data stored in the Znodes. This means that if a client writes data to a znode, that data
will be replicated to all of the other servers in the ZooKeeper ensemble.
One important feature of ZooKeeper is its ability to support the notion of a “watch.” A watch
allows a client to register for notifications when the data stored in a znode changes. This can
be useful for monitoring changes to the data stored in ZooKeeper and reacting to those
changes in a distributed system.
ZooKeeper is an essential component of Hadoop and plays a crucial role in coordinating the
activity of its various subcomponents.
ZooKeeper provides a simple and reliable interface for reading and writing data. The data is
stored in a hierarchical namespace, similar to a file system, with nodes called znodes. Each
znode can store data and have children znodes. ZooKeeper clients can read and write data to
these znodes by using the getData() and setData() methods, respectively. Here is an example
of reading and writing data using the ZooKeeper Java API (a minimal sketch, with an
illustrative connection string and znode path):
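import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZkReadWrite {
    public static void main(String[] args) throws Exception {
        // Connect to a ZooKeeper server; the connection string and timeout are illustrative.
        // In production, wait for the SyncConnected event before issuing requests.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> { });

        String path = "/demo-config";           // an illustrative znode path
        byte[] data = "hello".getBytes();

        // Create the znode if it does not exist yet, otherwise update it with setData().
        if (zk.exists(path, false) == null) {
            zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData(path, data, -1);         // -1 matches any version
        }

        // Read the data back with getData().
        String readData = new String(zk.getData(path, false, null));
        System.out.println(readData);
        zk.close();
    }
}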
Session and Watches
Session
Watches
Watches are mechanisms for clients to get notifications about changes in
Zookeeper.
A client can set a watch while reading a particular znode (see the sketch after this list).
Znode changes are modifications of the data associated with the znode or changes in the
znode's children.
Watches are triggered only once.
If the session expires, watches are also removed.
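For illustration, a minimal watch sketch with the Java API is shown below. It assumes zk is a connected ZooKeeper handle (as in the earlier sketch), the org.apache.zookeeper.Watcher import, and that the znode "/demo-config" already exists; these assumptions are illustrative, not from the original notes:

// Register a one-shot watch while reading the znode; the callback fires when its data changes.
zk.getData("/demo-config", event -> {
    if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
        System.out.println("znode changed: " + event.getPath());
        // Watches are triggered only once, so re-register here if further notifications are needed.
    }
}, null);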
Hadoop: Zookeeper is used by the Hadoop distributed file system (HDFS) to manage
the coordination of data processing tasks and metadata.
Apache Kafka: Zookeeper is used by Apache Kafka, a distributed streaming platform,
to manage the coordination of Kafka brokers and consumer groups.
Apache Storm: Zookeeper is used by Apache Storm, a distributed real-time
processing system, to manage the coordination of worker processes and task
assignments.
LinkedIn: Zookeeper is used by LinkedIn, the social networking site, to manage the
coordination of distributed systems and services.
Yahoo!: Zookeeper is used by Yahoo!, the internet company, to manage the
coordination of distributed systems and services.
MapReduce
MapReduce and HDFS are the two major components of Hadoop which make it so powerful
and efficient to use. MapReduce is a programming model used for efficient processing in
parallel over large data-sets in a distributed manner. The data is first split and then combined
to produce the final result. Libraries for MapReduce have been written in many programming
languages, with various optimizations. The purpose of MapReduce in Hadoop is to map each
of the jobs and then reduce them to equivalent tasks, providing less overhead over the cluster
network and reducing the processing power required. The MapReduce task is mainly divided
into two phases: the Map phase and the Reduce phase.
MapReduce Architecture:
1. Client: The MapReduce client is the one who brings the Job to the MapReduce for
processing. There can be multiple clients available that continuously send jobs for
processing to the Hadoop MapReduce Manager.
2. Job: The MapReduce Job is the actual work that the client wanted to do which is
comprised of so many smaller tasks that the client wants to process or execute.
3. Hadoop MapReduce Master: It divides the particular job into subsequent job-parts.
4. Job-Parts: The tasks or sub-jobs that are obtained after dividing the main job. The
results of all the job-parts are combined to produce the final output.
5. Input Data: The data set that is fed to the MapReduce for processing.
6. Output Data: The final result is obtained after the processing.
In MapReduce, we have a client. The client will submit the job of a particular size to the
Hadoop MapReduce Master. Now, the MapReduce master will divide this job into further
equivalent job-parts. These job-parts are then made available for the Map and Reduce Task.
This Map and Reduce task will contain the program as per the requirement of the use-case
that the particular company is solving. The developer writes their logic to fulfill the
requirement that the industry requires. The input data is then fed to the Map Task, and the
Map generates intermediate key-value pairs as its output. The output of the Map, i.e., these
key-value pairs, is then fed to the Reducer, and the final output is stored on the HDFS. There
can be any number of Map and Reduce tasks made available for processing the data as per
the requirement. The algorithms for Map and Reduce are written in a very optimized way
such that the time complexity or space complexity is minimal.
Let’s discuss the MapReduce phases to get a better understanding of its architecture:
The MapReduce task is mainly divided into 2 phases i.e. Map phase and Reduce phase.
1. Map: As the name suggests its main use is to map the input data in key-value pairs.
The input to the map may be a key-value pair where the key can be the id of some
kind of address and value is the actual value that it keeps. The Map() function will be
executed in its memory repository on each of these input key-value pairs and
generates the intermediate key-value pair which works as input for the Reducer or
Reduce() function.
2. Reduce: The intermediate key-value pairs that work as input for the Reducer are shuffled
and sorted and then sent to the Reduce() function. The Reducer aggregates or groups the
data based on its key-value pair as per the reducer algorithm written by the developer
(a word-count sketch of both phases follows this list).
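As referenced above, here is the classic word-count example sketched with the Hadoop MapReduce Java API. The class and variable names are illustrative, and the driver (Job configuration and submission) is omitted for brevity:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: emit an intermediate (word, 1) pair for every word in the input split.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);      // intermediate key-value pair
            }
        }
    }

    // Reduce phase: sum the counts that the shuffle grouped under each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));   // final (word, count) pair written to HDFS
        }
    }
}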
How Job tracker and the task tracker deal with MapReduce:
1. Job Tracker: The work of Job tracker is to manage all the resources and all the jobs
across the cluster and also to schedule each map on the Task Tracker running on the
same data node since there can be hundreds of data nodes available in the cluster.
2. Task Tracker: The Task Tracker can be considered as the actual slaves that are
working on the instruction given by the Job Tracker. This Task Tracker is deployed
on each of the nodes available in the cluster that executes the Map and Reduce task as
instructed by Job Tracker.
There is also one important component of Map Reduce Architecture known as Job History
Server. The Job History Server is a daemon process that saves and stores historical
information about the task or application, like the logs which are generated during or after the
job execution are stored on Job History Server.
MapReduce is a programming model used to perform distributed processing in parallel in a
Hadoop cluster, which makes Hadoop work so fast. When you are dealing with Big Data,
serial processing is no longer of any use. MapReduce has mainly two tasks which are divided
phase-wise:
Map Task
Reduce Task
Let us understand it with a real-time example, and the example helps you understand
Mapreduce Programming Model in a story manner:
Suppose the Indian government has assigned you the task of counting the population of
India. You can demand all the resources you want, but you have to do this task in 4
months. Calculating the population of such a large country is not an easy task for a
single person (you). So what will be your approach?
One of the ways to solve this problem is to divide the country by states and assign
individual in-charge to each state to count the population of that state.
Task Of Each Individual: Each Individual has to visit every home present in the state
and need to keep a record of each house members as:
State_Name Member_House1
State_Name Member_House2
State_Name Member_House3
.
.
State_Name Member_House n
.
.
Once they have counted each house member in their respective state, they need to sum
up their results and send them to the headquarters at New Delhi.
We have a trained officer at the headquarters to receive all the results from each state
and aggregate them by state to get the population of each entire state. With this
approach, you are easily able to count the population of India by summing up the
results obtained at the headquarters.
The Indian Govt. is happy with your work and the next year they asked you to do the
same job in 2 months instead of 4 months. Again you will be provided with all the
resources you want.
Since the Govt. has provided you with all the resources, you will simply double the
number of assigned individual in-charges for each state from one to two. For that,
divide each state into two divisions and assign a different in-charge to each of the two
divisions:
State_Name_Incharge_division1
State_Name_Incharge_division2
Similarly, each individual in charge of its division will gather the information about
members from each house and keep its record.
We can also do the same thing at the headquarters, so let's also divide the
headquarters into two divisions:
Head-quarter_Division1
Head-quarter_Division2
Now with this approach, you can find the population of India in two months. But
there is a small problem with this, we never want the divisions of the same state to
send their result at different Head-quarters then, in that case, we have the partial
population of that state in Head-quarter_Division1 and Head-quarter_Division2 which
is inconsistent because we want consolidated population by the state, not the partial
counting.
One easy way to solve this is to instruct all individuals of a state to send their results
to either Head-quarter_Division1 or Head-quarter_Division2, and similarly for all the
states.
Our problem has been solved, and you successfully did it in two months.
Now, if they ask you to do this process in a month, you know how to approach the
solution.
Great, now we have a good scalable model that works well. The model we have
seen in this example is like the MapReduce programming model, so now you must be
aware that MapReduce is a programming model, not a programming language.
Now let’s discuss the phases and important things involved in our model.
1. Map Phase: The Phase where the individual in-charges are collecting the population of
each house in their division is Map Phase.
2. Reduce Phase: The phase where you aggregate your results.
Reducers: The individuals who aggregate the actual result; in our example, the
trained officers. Each Reducer produces its output as a key-value pair.
3. Shuffle Phase: The phase where the data is copied from the Mappers to the Reducers is
the Shuffle phase. It comes between the Map and Reduce phases. The Map phase,
Reduce phase, and Shuffle phase are the three main phases of our MapReduce.
High-Performance Computing (HPC)
It is the use of parallel processing for running advanced application programs efficiently,
reliably, and quickly. The term applies especially to a system that functions above a teraflop
(10^12 floating-point operations per second). The term high-performance computing is
occasionally used as a synonym for supercomputing, although technically a supercomputer is
a system that performs at or near the currently highest operational rate for computers. Some
supercomputers work at more than a petaflop (10^15 floating-point operations per second).
The most common users of HPC systems are scientific researchers, engineers, and academic
institutions. Some government agencies, particularly the military, also rely on HPC for
complex applications.
High-performance Computers:
High Performance Computing (HPC) generally refers to the practice of combining computing
power to deliver far greater performance than a typical desktop or workstation, in order to
solve complex problems in science, engineering, and business.
1. Climate modelling
2. Drug discovery
3. Data Analysis
4. Protein folding
5. Energy research
To achieve maximum efficiency, each module must keep pace with others, otherwise, the
performance of the entire HPC infrastructure would suffer.
UNIT III
Cloud Resource virtualization: Virtualization, layering and virtualization, virtual machine
monitors, virtual machines, virtualization- full and para, performance and security isolation,
hardware support for virtualization, Case Study: Xen, vBlades,
Virtualization is the "creation of a virtual (rather than actual) version of something, such as a
server, a desktop, a storage device, an operating system or network resources".
In other words, virtualization is a technique which allows sharing a single physical instance
of a resource or an application among multiple customers and organizations. It does so by
assigning a logical name to physical storage and providing a pointer to that physical
resource when demanded.
Creation of a virtual machine over existing operating system and hardware is known as
Hardware Virtualization. A Virtual machine provides an environment that is logically
separated from the underlying hardware.
The machine on which the virtual machine is going to create is known as Host Machine and
that virtual machine is referred as a Guest Machine
Types of Virtualization:
1. Hardware Virtualization.
2. Operating system Virtualization.
3. Server Virtualization.
4. Storage Virtualization.
1) Hardware Virtualization:
When the virtual machine software or virtual machine manager (VMM) is installed directly
on the hardware system, it is known as hardware virtualization.
The main job of the hypervisor is to control and monitor the processor, memory, and other
hardware resources.
After virtualization of the hardware system, we can install different operating systems on it
and run different applications on those operating systems.
Usage:
Hardware virtualization is mainly done for the server platforms, because controlling virtual
machines is much easier than controlling a physical server.
2) Operating System Virtualization:
When the virtual machine software or virtual machine manager (VMM) is installed on the
host operating system instead of directly on the hardware system, it is known as operating
system virtualization.
Usage:
Operating System Virtualization is mainly used for testing the applications on different
platforms of OS.
3) Server Virtualization:
When the virtual machine software or virtual machine manager (VMM) is installed directly
on the server system, it is known as server virtualization.
Usage:
Server virtualization is done because a single physical server can be divided into multiple
servers on an on-demand basis and for load balancing.
4) Storage Virtualization:
Storage virtualization is the process of grouping the physical storage from multiple network
storage devices so that it looks like a single storage device.
Usage:
Storage virtualization is mainly done for back-up and recovery purposes.
Virtualization plays a very important role in cloud computing technology. Normally, in
cloud computing, users share the data present in the cloud, such as applications, but with the
help of virtualization users actually share the infrastructure.
The main use of virtualization technology is to provide standard versions of applications to
cloud users; when the next version of an application is released, the cloud provider can
supply the latest version to its users, which would be expensive and impractical to do
without virtualization.
Conclusion
Virtualization mainly means running multiple operating systems on a single machine while
sharing all the hardware resources. It helps to provide a pool of IT resources that we can
share in order to gain benefits in the business.
Data Virtualization
Data virtualization is the process of retrieving data from various resources without knowing
its type or the physical location where it is stored. It collects heterogeneous data from
different resources and allows data users across the organization to access this data according
to their work requirements. This heterogeneous data can be accessed using any application,
such as web portals, web services, e-commerce, Software as a Service (SaaS), and mobile
applications.
We can use Data Virtualization in the field of data integration, business intelligence, and
cloud computing.
It allows users to access the data without worrying about where it resides on the
memory.
It offers better customer satisfaction, retention, and revenue growth.
It provides various security mechanisms that allow users to safely store their personal
and professional information.
It reduces costs by removing data replication.
It provides a user-friendly interface to develop customized views.
It provides various simple and fast deployment resources.
It increases business user efficiency by providing data in real-time.
It is used to perform tasks such as data integration, business integration, Service-
Oriented Architecture (SOA) data services, and enterprise search.
1. Analyze performance
2. Search and discover interrelated data
Data Virtualization (DV) provides a mechanism to easily search data which is similar and
internally related to each other.
3. Agile Business Intelligence
It is one of the most common uses of Data Virtualization. It is used in agile reporting and real-
time dashboards that require the timely aggregation, analysis, and presentation of relevant data
from multiple resources. Both individuals and managers use this to monitor performance, which
helps with daily operational decisions in processes such as sales, support, finance, logistics,
legal, and compliance.
4. Data Management
Data virtualization provides a secure centralized layer to search, discover, and govern the
unified data and its relationships.
1. Red Hat Data Virtualization
Red Hat data virtualization is the best choice for developers and those who are using
microservices and containers. It is written in Java.
2. TIBCO data virtualization
TIBCO helps administrators and users create a data virtualization platform for accessing
multiple data sources and data sets. It provides a built-in transformation engine to
combine non-relational and unstructured data sources.
3. Oracle Data Service Integrator
It is a very popular and powerful data integration tool which mainly works with Oracle
products. It allows organizations to quickly develop and manage data services to access a
single view of data.
4. SAS Federation Server
SAS Federation Server provides scalable, multi-user, and standards-based data access
technologies for accessing data from multiple data services. It mainly focuses on
securing data.
5. Denodo
Denodo is one of the best data virtualization tools which allows organizations to minimize the
network traffic load and improve response time for large data sets. It is suitable for both small
as well as large organizations.
Virtual Machine abstracts the hardware of our personal computer such as CPU, disk drives,
memory, NIC (Network Interface Card) etc, into many different execution environments as
per our requirements, hence giving us a feel that each execution environment is a single
computer. For example, VirtualBox.
When we run different processes on an operating system, it creates an illusion that each
process is running on a different processor having its own virtual memory, with the help of
CPU scheduling and virtual-memory techniques. There are additional features of a process
that cannot be provided by the hardware alone like system calls and a file system. The virtual
machine approach does not provide these additional functionalities but it only provides an
interface that is same as basic hardware. Each process is provided with a virtual copy of the
underlying computer system.
We can create a virtual machine for several reasons, all of which are fundamentally related to
the ability to share the same basic hardware yet can also support different execution
environments, i.e., different operating systems simultaneously.
The main drawback with the virtual-machine approach involves disk systems. Let us suppose
that the physical machine has only three disk drives but wants to support seven virtual
machines. Obviously, it cannot allocate a disk drive to each virtual machine, because virtual-
machine software itself will need substantial disk space to provide virtual memory and
spooling. The solution is to provide virtual disks.
Users are thus given their own virtual machines. After which they can run any of the
operating systems or software packages that are available on the underlying machine. The
virtual-machine software is concerned with multi-programming multiple virtual machines
onto a physical machine, but it does not need to consider any user-support software. This
arrangement can provide a useful way to divide the problem of designing a multi-user
interactive system, into two smaller pieces.
Advantages:
1. There are no protection problems because each virtual machine is completely isolated
from all other virtual machines.
2. Virtual machine can provide an instruction set architecture that differs from real
computers.
3. Easy maintenance, availability and convenient recovery.
Disadvantages:
1. When multiple virtual machines are simultaneously running on a host computer, one
virtual machine can be affected by other running virtual machines, depending on the
workload.
2. Virtual machines are not as efficient as a real one when accessing the hardware.
Types of Virtual Machines : You can classify virtual machines into two types:
1. System Virtual Machine: These virtual machines give us a complete system platform and
allow the execution of a complete virtual operating system. Just like VirtualBox, a system
virtual machine provides an environment in which an OS can be installed completely. The
hardware of the real machine is distributed between the simulated operating systems by the
virtual machine monitor, and programs and processes then run separately on the distributed
hardware of each simulated machine.
2. Process Virtual Machine: A process virtual machine, unlike a system virtual machine,
does not provide the facility to install a virtual operating system completely. Rather, it
creates a virtual environment of that OS while some app or program is running, and this
environment is destroyed as soon as we exit the app. Some apps run on the main OS, while
virtual environments are created to run other apps that require a different OS; the process
virtual machine provides that environment for the time those programs are running.
Example – the Wine software on Linux helps to run Windows applications.
Virtual Machine Language: It is a type of language which can be understood by different
operating systems; it is platform-independent. Just as, to run any programming language (C,
Python, or Java), we need a specific compiler that converts the code into system-
understandable code (also known as byte code), a virtual machine language works the same
way. If we want code that can be executed on different types of operating systems
(Windows, Linux, etc.), then a virtual machine language is helpful.
1. Full Virtualization: Full Virtualization was introduced by IBM in the year 1966. It is the
first software solution for server virtualization and uses binary translation and direct approach
techniques. In full virtualization, guest OS is completely isolated by the virtual machine from
the virtualization layer and hardware. Microsoft and Parallels systems are examples of full
virtualization.
2. Paravirtualization: Paravirtualization is the category of CPU virtualization which uses
hypercalls for operations, handling instructions at compile time. In paravirtualization, the
guest OS is not completely isolated, but it is partially isolated by the virtual machine from
the virtualization layer and hardware. VMware and Xen are some examples of
paravirtualization.
The difference between Full Virtualization and Para virtualization are as follows:
Hardware Virtualization
Previously, there was "one to one relationship" between physical servers and operating
system. Low capacity of CPU, memory, and networking requirements were available. So, by
using this model, the costs of doing business increased. The physical space, amount of power,
and hardware required meant that costs were adding up.
The hypervisor manages the shared physical resources of the hardware between the guest
operating systems and the host operating system. The physical resources become abstracted
versions in standard formats regardless of the hardware platform. The abstracted hardware is
represented as if it were actual hardware, and the virtualized operating system sees these
resources as physical entities.
When the virtual machine software, virtual machine manager (VMM), or hypervisor
software is installed directly on the hardware system, it is known as hardware virtualization.
The main benefits of hardware virtualization are more efficient resource utilization, lower
overall costs as well as increased uptime and IT flexibility.
1) More Efficient Resource Utilization:
Physical resources can be shared among virtual machines; unused resources allocated to one
virtual machine can be used by other virtual machines if the need exists.
2) Lower Overall Costs:
It is now possible for multiple operating systems to co-exist on a single hardware platform,
so the number of servers, rack space, and power consumption drops significantly.
3) Increased Uptime:
Modern hypervisors provide highly orchestrated operations that maximize the abstraction
of the hardware and help to ensure the maximum uptime. These functions help to migrate a
running virtual machine from one host to another dynamically, as well as maintain a running
copy of virtual machine on another physical host in case the primary host fails.
4) Increased IT Flexibility:
Hardware virtualization helps with the quick deployment of server resources in a managed
and consistent way. As a result, IT is able to adapt quickly and provide the business with the
resources it needs in good time.
Xen is an open-source hypervisor based on paravirtualization. It is the most popular
application of paravirtualization. Xen has been extended to be compatible with full
virtualization using hardware-assisted virtualization. It enables high-performance execution
of guest operating systems. This is done by removing the performance loss incurred while
executing instructions that require significant handling, and by modifying the portions of the
guest operating system executed by Xen with reference to the execution of such instructions.
Xen especially supports x86, which is the most used architecture on commodity machines
and servers.
The Xen architecture maps onto the classic x86 privilege model. A Xen-based system is
handled by the Xen hypervisor, which is executed in the most privileged mode and controls
the access of the guest operating systems to the underlying hardware. Guest operating
systems run within domains, which represent virtual machine instances.
In addition, a particular control software, which has privileged access to the host and handles
all other guest operating systems, runs in a special domain called Domain 0. This is the only
domain loaded once the virtual machine manager has fully booted, and it hosts an HTTP
server that serves requests for virtual machine creation, configuration, and termination. This
component constitutes the primary version of a shared virtual machine manager (VMM),
which is a necessary part of a cloud computing system delivering Infrastructure-as-a-Service
(IaaS) solutions.
Various x86 implementations support four distinct security levels, termed rings, i.e.,
Ring 0,
Ring 1,
Ring 2,
Ring 3
Here, Ring 0 represents the most privileged level and Ring 3 represents the least privileged
level. Almost all frequently used operating systems, except OS/2, use only two levels: Ring 0
for kernel code and Ring 3 for user applications and non-privileged OS programs. This gives
Xen the opportunity to implement paravirtualization. It enables Xen to keep the Application
Binary Interface (ABI) unchanged, thus allowing a simple shift to Xen-virtualized solutions
from an application perspective.
Due to the structure of the x86 instruction set, some instructions allow code executing in
Ring 3 to switch to Ring 0 (kernel mode). Such an operation is done at the hardware level,
and hence, within a virtualized environment, it will lead to a trap or a silent fault, thus
preventing the normal operation of the guest OS, which is now running in Ring 1.
This condition is basically caused by a subset of system calls. To eliminate this situation,
the operating system implementation requires modification, and all the sensitive system calls
need to be re-implemented with hypercalls. Here, hypercalls are the special calls exposed by
the virtual machine (VM) interface of Xen; by using them, the Xen hypervisor catches the
execution of all the sensitive instructions, manages them, and returns the control to the guest
OS with the help of a supplied handler.
Paravirtualization requires the OS codebase to be modified, and hence not every operating system can be used as a guest OS in a Xen-based environment. This limitation applies when hardware-assisted virtualization is not available, since hardware assistance allows the hypervisor to run in a mode more privileged than Ring 0 (often referred to as Ring -1) while the guest OS stays in Ring 0. As a result, Xen shows some limitations in terms of legacy hardware and legacy operating systems.
In fact, legacy operating systems cannot be modified to run safely in Ring 1, because their codebase is not accessible, and at the same time legacy hardware offers no support for executing them in a mode more privileged than Ring 0. Open-source operating systems such as Linux can easily be modified, since their code is openly available, and Xen provides full virtualization support for them, whereas Windows components are generally not compatible with Xen unless hardware-assisted virtualization is available. As new OS releases are designed to be virtualized and new hardware supports x86 virtualization, this problem is gradually being resolved.
Pros:
Cons:
The goal of the project was to create a VMM for the Itanium family of IA-64 processors, developed jointly by HP and Intel.
Itanium is based on explicitly parallel instruction computing (EPIC), which allows the processor to execute multiple instructions in each clock cycle. EPIC implements a form of Very Long Instruction Word (VLIW) architecture; a single instruction word contains multiple instructions.
A 128-bit instruction word contains three instructions; the fetch mechanism can read up to two instruction words per clock from the L1 cache into the pipeline.
The hardware supports 64-bit addressing; it has 32 64-bit general-purpose registers, R0 - R31, and 96 automatically renumbered registers, R32 - R127, used by procedure calls. When a procedure is entered, the alloc instruction specifies the registers the procedure can access by setting the bits of a 7-bit field that controls register usage; an illegal read from a register outside this range returns a zero value, while an illegal write to it is trapped as an illegal instruction. A small model of this behaviour is sketched below.
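As an illustration of the register-usage rule just described, here is a small, purely conceptual Python model; the class RegisterFrame and its methods are invented for this sketch, and real hardware behaviour is considerably more involved.

```python
# Conceptual sketch only: models the behaviour described above for the
# Itanium stacked registers R32-R127. Reads outside the frame granted by
# "alloc" return zero; writes outside it are trapped as illegal instructions.

class IllegalInstruction(Exception):
    pass

class RegisterFrame:
    def __init__(self, allocated):
        # 'allocated' plays the role of the field set by alloc: the number of
        # stacked registers (starting at R32) the procedure may use.
        self.allocated = allocated
        self.regs = {}

    def read(self, index):                      # index in 32..127
        if index >= 32 + self.allocated:
            return 0                            # illegal read returns zero
        return self.regs.get(index, 0)

    def write(self, index, value):
        if index >= 32 + self.allocated:
            raise IllegalInstruction(f"write to R{index} outside frame")
        self.regs[index] = value

frame = RegisterFrame(allocated=8)              # procedure may use R32..R39
frame.write(35, 42)
print(frame.read(35))    # 42
print(frame.read(100))   # 0  (out-of-range read)
try:
    frame.write(100, 1)
except IllegalInstruction as e:
    print("trapped:", e)
```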
UNIT IV
Storage Systems: Evolution of storage technology, storage models, file systems and
database, distributed file systems, general parallel file systems. Google file system. Apache
Hadoop, Big Table, Megastore (text book 1), Amazon Simple Storage Service(S3) (Text
book 2), Cloud Security: Cloud security risks, security – a top concern for cloud users,
privacy and privacy
Cloud computing is all about renting computing services. This idea first emerged in the 1950s. Five technologies played a vital role in making cloud computing what it is today: distributed systems and their peripherals, virtualization, Web 2.0, service orientation, and utility computing.
Distributed Systems:
It is a composition of multiple independent systems, but all of them are presented as a single entity to the users. The purpose of distributed systems is to share resources and use them effectively and efficiently. Distributed systems possess characteristics such as scalability, concurrency, continuous availability, heterogeneity, and independence of failures. But the main problem with this system was that all the systems were required to be present at the same geographical location. To solve this problem, distributed computing led to three more types of computing: mainframe computing, cluster computing, and grid computing.
Mainframe computing:
Mainframes, which first came into existence in 1951, are highly powerful and reliable computing machines. They are responsible for handling large volumes of data, such as massive input-output operations. Even today they are used for bulk processing tasks such as online transactions. These systems have almost no downtime and high fault tolerance. After distributed computing, they increased the processing capability of systems, but they were very expensive. To reduce this cost, cluster computing emerged as an alternative to mainframe technology.
Cluster computing:
In the 1980s, cluster computing emerged as an alternative to mainframe computing. Each machine in the cluster was connected to the others by a high-bandwidth network. Clusters were far cheaper than mainframe systems yet equally capable of high computation, and new nodes could easily be added to the cluster when required. Thus the problem of cost was solved to some extent, but the problem of geographical restrictions still remained. To solve this, the concept of grid computing was introduced.
Grid computing:
In the 1990s, the concept of grid computing was introduced. Different systems were placed at entirely different geographical locations and connected via the Internet. These systems belonged to different organizations, so the grid consisted of heterogeneous nodes. Although it solved some problems, new problems emerged as the distance between the nodes increased, chiefly the low availability of high-bandwidth connectivity and other network-related issues. Thus, cloud computing is often referred to as the "successor of grid computing".
Virtualization:
It was introduced nearly 40 years ago. It refers to the process of creating a virtual layer over the hardware that allows the user to run multiple instances simultaneously on the same hardware. It is a key technology used in cloud computing and the base on which major cloud computing services such as Amazon EC2, VMware vCloud, etc. work. Hardware virtualization is still one of the most common types of virtualization.
Web 2.0:
It is the interface through which cloud computing services interact with clients. It is because of Web 2.0 that we have interactive and dynamic web pages. It also increases flexibility among web pages. Popular examples of Web 2.0 include Google Maps, Facebook, Twitter, etc. Needless to say, social media is possible only because of this technology. It gained major popularity in 2004.
Service orientation:
It acts as a reference model for cloud computing. It supports low-cost, flexible, and
evolvable applications. Two important concepts were introduced in this computing
model. These were Quality of Service (QoS) which also includes the SLA (Service
Level Agreement) and Software as a Service (SaaS).
Utility computing:
It is a computing model that defines service provisioning techniques for services such
as compute services along with other major services such as storage, infrastructure,
etc which are provisioned on a pay-per-use basis.
Type-1 :
Block-Based Storage System –
Hard drives are block-based storage systems. Your operating system, such as Windows or Linux, actually sees a hard disk drive: it sees a drive on which you can create a volume, and you can then partition that volume and format the partitions.
For example, if a system has a 1000 GB volume, we can partition it into 800 GB and 200 GB for the local C: and D: drives respectively.
Remember that with a block-based storage system, your computer sees a drive, and then you can create volumes and partitions.
Type-2 :
File-Based Storage System –
In this, you actually connect through a Network Interface Card (NIC). You go over a network and access a network-attached storage (NAS) server. NAS devices are file-based storage systems.
This storage server is another computing device that has its own disks. It has already created a file system and formatted its partitions, and it shares its file systems over the network. Here, you can map a drive to its network location.
Unlike the previous type, there is no need for the user to partition and format the volume; that is already done in file-based storage systems. So, the operating system sees a file system that is mapped to a local drive letter.
Type-3 :
Object-Based Storage System –
In this, a user uploads objects, for example through a web browser, into a container, i.e., an object storage container. Access uses the HTTP protocol through REST APIs (for example: GET, PUT, POST, SELECT, DELETE).
For example, when you connect to any website and need to download some images, text, or anything else that the website contains, that is done with an HTTP GET request. If you want to submit a review of a product, you use PUT and POST requests.
Also, there is no hierarchy of objects in the container; every file is on the same level in an object-based storage system. A minimal HTTP example follows.
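As a rough sketch of this access pattern, the snippet below uses Python's requests library against a hypothetical container URL (https://objects.example.com/photos); real object stores additionally require authentication headers or signed requests.

```python
# Sketch of object-storage access over HTTP, assuming a hypothetical
# endpoint https://objects.example.com and a container named "photos".
# Real services (S3, Swift, etc.) add authentication on top of this.
import requests

BASE = "https://objects.example.com/photos"     # hypothetical container URL

# Upload (PUT) an object into the container.
with open("cat.jpg", "rb") as f:
    resp = requests.put(f"{BASE}/cat.jpg", data=f)
    resp.raise_for_status()

# Download (GET) the same object; there is no directory hierarchy,
# only a flat key ("cat.jpg") inside the container.
resp = requests.get(f"{BASE}/cat.jpg")
resp.raise_for_status()
with open("cat_copy.jpg", "wb") as out:
    out.write(resp.content)
```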
Advantages:
Scalability –
Capacity and storage can be expanded and performance can be enhanced.
Flexibility –
Data can be manipulated and scaled according to the rules.
Disadvantages:
Data centres require electricity and reliable internet connectivity to operate; if either fails, the system will not work properly.
File-based systems were an early attempt to computerize the manual system. This is also called the traditional approach, a decentralized approach in which each department stored and controlled its own data with the help of a data processing specialist. The main role of the data processing specialist was to create the necessary computer file structures, manage the data within those structures, and design application programs that create reports based on the file data.
Some fields are duplicated in more than one file, which leads to data redundancy. To overcome this problem, we need a centralized system, i.e., the DBMS approach.
DBMS:
A Distributed File System (DFS), as the name suggests, is a file system that is distributed across multiple file servers or multiple locations. It allows programs to access or store remote files just as they do local ones, allowing users to access files from any network or computer.
The main purpose of a Distributed File System (DFS) is to allow users of physically distributed systems to share their data and resources through a common file system. A collection of workstations and mainframes connected by a Local Area Network (LAN) is a typical configuration for a Distributed File System. A DFS is implemented as part of the operating system. In DFS, a namespace is created, and this process is transparent to the clients.
Location Transparency –
Location transparency is achieved through the namespace component.
Redundancy –
Redundancy is achieved through the file replication component.
In the case of failure and heavy load, these components together improve data availability by allowing data in different locations to be logically grouped under one folder, known as the "DFS root".
It is not necessary to use both components of DFS together; it is possible to use the namespace component without the file replication component, and it is perfectly possible to use the file replication component between servers without the namespace component. A conceptual sketch of the two components appears below.
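A minimal conceptual sketch of the two components, assuming made-up server and share names, might look like this in Python:

```python
# Conceptual sketch only: a DFS namespace maps logical folders under a
# single "DFS root" to one or more physical shares (the replication
# component supplies the extra copies). Server names are made up.

dfs_root = r"\\corp.example.com\root"

namespace = {
    # logical folder -> list of physical shares holding replicas
    r"\projects": [r"\\fileserver1\projects", r"\\fileserver2\projects"],
    r"\software": [r"\\fileserver3\software"],          # not replicated
}

def resolve(logical_path):
    """Return candidate physical shares for a path under the DFS root."""
    for folder, targets in namespace.items():
        if logical_path.startswith(folder):
            rest = logical_path[len(folder):]
            return [t + rest for t in targets]
    raise FileNotFoundError(logical_path)

# A client asks for a logical path; any returned replica can serve it.
print(resolve(r"\projects\plan.docx"))
```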
Early iterations of DFS made use of Microsoft's File Replication Service (FRS), which allowed for straightforward file replication between servers. FRS recognises new or updated files and distributes the most recent versions of the whole file to all servers.
Windows Server 2003 R2 introduced "DFS Replication" (DFSR). It improves on FRS by copying only the portions of files that have changed and by minimising network traffic with data compression. It also provides users with flexible configuration options to manage network traffic on a configurable schedule.
Features of DFS :
Transparency :
o Structure transparency –
There is no need for the client to know about the number or locations of file
servers and the storage devices. Multiple file servers should be provided for
performance, adaptability, and dependability.
o Access transparency –
Both local and remote files should be accessible in the same manner. The file system should automatically locate the accessed file and deliver it to the client's side.
o Naming transparency –
There should not be any hint in the name of the file to the location of the file.
Once a name is given to the file, it should not be changed during transferring
from one node to another.
o Replication transparency –
If a file is copied to multiple nodes, both the existence of the copies and their locations should be hidden from the clients.
User mobility :
It will automatically bring the user’s home directory to the node where the user logs
in.
Performance :
Performance is measured as the average amount of time needed to satisfy client requests. This time covers CPU time + the time taken to access secondary storage + network access time. It is desirable that the performance of a Distributed File System be comparable to that of a centralized file system.
Simplicity and ease of use :
The user interface of a file system should be simple, and the number of commands needed to manipulate files should be small.
High availability :
A Distributed File System should be able to continue in case of any partial failures
like a link failure, a node failure, or a storage drive crash.
A highly available and adaptable distributed file system should have multiple independent file servers controlling multiple independent storage devices.
Scalability :
Since growing the network by adding new machines or joining two networks together
is routine, the distributed system will inevitably grow over time. As a result, a good
distributed file system should be built to scale quickly as the number of nodes and
users in the system grows. Service should not be substantially disrupted as the number
of nodes and users grows.
High reliability :
The likelihood of data loss should be minimized as much as feasible in a suitable
distributed file system. That is, because of the system’s unreliability, users should not
feel forced to make backup copies of their files. Rather, a file system should create
backup copies of key files that can be used if the originals are lost. Many file systems
employ stable storage as a high-reliability strategy.
Data integrity :
Multiple users frequently share a file system. The integrity of data saved in a shared
file must be guaranteed by the file system. That is, concurrent access requests from
many users who are competing for access to the same file must be correctly
synchronized using a concurrency control method. Atomic transactions are a high-
level concurrency management mechanism for data integrity that is frequently offered
to users by a file system.
Security :
A distributed file system should be secure so that its users may trust that their data
will be kept private. To safeguard the information contained in the file system from
unwanted & unauthorized access, security mechanisms must be implemented.
Heterogeneity :
Heterogeneity in distributed systems is unavoidable as a result of huge scale. Users of
heterogeneous distributed systems have the option of using multiple computer
platforms for different purposes.
History :
The server component of the Distributed File System was initially introduced as an add-on feature. It was added to Windows NT 4.0 Server and was known as "DFS 4.1". Later, it was included as a standard component in all editions of Windows 2000 Server. Client-side support has been included in Windows NT 4.0 and in later versions of Windows.
Linux kernels 2.6.14 and later come with an SMB client VFS known as "cifs" which supports DFS. Mac OS X 10.7 (Lion) and onwards also supports DFS.
Properties:
File transparency: users can access files without knowing where they are physically
stored on the network.
Load balancing: the file system can distribute file access requests across multiple
computers to improve performance and reliability.
Data replication: the file system can store copies of files on multiple computers to
ensure that the files are available even if one of the computers fails.
Security: the file system can enforce access control policies to ensure that only
authorized users can access files.
Scalability: the file system can support a large number of users and a large number of
files.
Concurrent access: multiple users can access and modify the same file at the same
time.
Fault tolerance: the file system can continue to operate even if one or more of its
components fail.
Data integrity: the file system can ensure that the data stored in the files is accurate
and has not been corrupted.
File migration: the file system can move files from one location to another without
interrupting access to the files.
Data consistency: changes made to a file by one user are immediately visible to all
other users.
Support for different file types: the file system can support a wide range of file types,
including text files, image files, and video files.
Applications:
NFS –
NFS stands for Network File System. It is a client-server architecture that allows a
computer user to view, store, and update files remotely. The protocol of NFS is one of
the several distributed file system standards for Network-Attached Storage (NAS).
CIFS –
CIFS stands for Common Internet File System. CIFS is a dialect of SMB; that is, CIFS is an implementation of the SMB protocol, designed by Microsoft.
SMB –
SMB stands for Server Message Block. It is a file-sharing protocol invented by IBM. The SMB protocol was created to allow computers to perform read and write operations on files on a remote host over a Local Area Network (LAN). The directories on the remote host that can be accessed via SMB are called "shares".
Hadoop –
Hadoop is a group of open-source software services. It gives a software framework
for distributed storage and operating of big data using the MapReduce programming
model. The core of Hadoop contains a storage part, known as Hadoop Distributed File
System (HDFS), and an operating part which is a MapReduce programming model.
NetWare –
NetWare is a discontinued computer network operating system developed by Novell, Inc. It primarily used cooperative multitasking to run different services on a personal computer, using the IPX network protocol.
Working of DFS :
Disadvantages:
Cloud computing is a popular choice among IT professionals and companies in the digital marketing industry. It allows users to access shared resources through the Internet with little to no up-front investment. Companies that offer cloud computing services typically charge clients a flat monthly fee or a yearly contract, but they may also offer free cloud hosting options for individuals who want to try it out before paying for a subscription plan. The downside of using cloud services is that data can easily be lost when it is accessed from multiple computers simultaneously without locking down the file system to prevent users from interfering with one another's files.
Terminologies in Cloud Computing:
Parallel File System: The parallel file system is a system that is used to store data
across multiple network servers. It provides high-performance network access through
parallel coordinated input-output operations. This is a file system that allows
concurrent access to data by more than one user.
Flock: A group of processes (corresponding to a group of threads) sharing the same
memory image.
Flock semantics: The properties that describe how an entity can be accessed by other processes within the flock when it is not active. Under flock semantics, only one process at a time may have exclusive access to an entity, and all other processes must share the same view of the entity, whether it is active or protected.
Cloud computing gives users a lot of freedom to access the data and resources that they need on demand. However, when it comes to accessing data, it is important that data is not lost or corrupted because different machines touch it at the same time. Without locking down file-system access between different machines, there is a high risk of losing or corrupting important data across multiple computers at once. This can make managing files difficult, because some users may end up accessing a file while others are trying to edit it at the same time. A minimal locking example is shown below.
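On a Unix-like system, one simple way to serialize access to a shared file is advisory locking with fcntl.flock; the sketch below (the file name is hypothetical) takes an exclusive lock before writing so that concurrent writers do not interfere.

```python
# Minimal sketch of advisory file locking on a Unix-like system using
# fcntl.flock: an exclusive lock is taken before editing a shared file so
# that concurrent writers do not interfere with one another.
import fcntl

with open("shared_report.txt", "a") as f:
    fcntl.flock(f, fcntl.LOCK_EX)      # block until exclusive access is granted
    try:
        f.write("appended safely by one writer at a time\n")
        f.flush()
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)  # release so other processes can proceed
```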
Example: Google File System is a cloud file system that uses a parallel file system. Google
File System (GFS) is a scalable distributed file system that provides consistently high
performance across tens of thousands of commodity servers. It manages huge data sets across
dynamic clusters of computers using only application-level replication and auto-recovery
techniques. This architecture provides high availability with no single point of failure and
supports expected and/or unexpected hardware and software failures without data loss or
system shutdown.
There are two main types of parallel file systems:
Advantages:
Data integrity
Data security
Disaster recovery
Disadvantages:
Low scalability
Less performance
Google Inc. developed the Google File System (GFS), a scalable distributed file system
(DFS), to meet the company’s growing data processing needs. GFS offers fault tolerance,
dependability, scalability, availability, and performance to big networks and connected nodes.
GFS is made up of a number of storage systems constructed from inexpensive commodity
hardware parts. The search engine, which creates enormous volumes of data that must be
kept, is only one example of how it is customized to meet Google’s various data use and
storage requirements.
The Google File System was designed to minimize the impact of hardware flaws while exploiting the gains of commercially available servers. Google FS is another name for GFS. It manages two types of data, namely file metadata and file data. A GFS node cluster consists of a single master and several chunk servers that are regularly accessed by various client systems. Chunk servers keep data on local discs in the form of Linux files. The stored data is split into large (64 MB) chunks, each of which is replicated at least three times across the network. The large chunk size reduces network overhead.
Without hindering applications, GFS is made to meet Google’s huge cluster requirements.
Hierarchical directories with path names are used to store files. The master is in charge of
managing metadata, including namespace, access control, and mapping data. The master
communicates with each chunk server by timed heartbeat messages and keeps track of its
status updates.
More than 1,000 nodes with 300 TB of disc storage capacity make up the largest GFS
clusters. This is available for constant access by hundreds of clients.
Components of GFS
A group of computers makes up GFS. A cluster is just a group of connected computers. There
could be hundreds or even thousands of computers in each cluster. There are three basic
entities included in any GFS cluster as follows:
GFS Clients: They can be computer programs or applications which may be used to
request files. Requests may be made to access and modify already-existing files or
add new files to the system.
GFS Master Server: It serves as the cluster’s coordinator. It preserves a record of the
cluster’s actions in an operation log. Additionally, it keeps track of the data that
describes chunks, or metadata. The chunks’ place in the overall file and which files
they belong to are indicated by the metadata to the master server.
GFS Chunk Servers: They are the GFS's workhorses. They keep 64 MB file chunks. Chunks do not pass through the master server; instead, the chunk servers deliver the requested chunks directly to the client. To ensure stability, GFS makes multiple copies of each chunk and stores them on different chunk servers; the default is three copies, and each copy is referred to as a replica. A minimal sketch of this layout follows the list.
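The following Python sketch is purely conceptual (file paths, chunk handles, and server names are made up) and only illustrates how master metadata, 64 MB chunks, and three-way replication fit together; it is not Google's implementation.

```python
# Conceptual sketch only (not Google's implementation): the master keeps
# metadata mapping each file to 64 MB chunk handles and each handle to the
# chunk servers holding its (default three) replicas. Clients fetch chunk
# locations from the master, then read the data from a chunk server directly.

CHUNK_SIZE = 64 * 1024 * 1024          # 64 MB
REPLICATION = 3

master_metadata = {
    "/logs/web-2024.log": {
        "chunks": ["c001", "c002"],     # chunk handles, in file order
    },
}

chunk_locations = {
    "c001": ["chunkserver-3", "chunkserver-7", "chunkserver-9"],
    "c002": ["chunkserver-1", "chunkserver-4", "chunkserver-8"],
}

def locate(path, offset):
    """Master RPC: return the chunk handle and replica servers for an offset."""
    handle = master_metadata[path]["chunks"][offset // CHUNK_SIZE]
    return handle, chunk_locations[handle]

# Client-side read: ask the master once, then talk to a chunk server.
handle, replicas = locate("/logs/web-2024.log", 70 * 1024 * 1024)
print(f"read chunk {handle} from any of {replicas}")
```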
Features of GFS
Advantages of GFS
1. High availability: data is still accessible even if a few nodes fail, thanks to replication. As the GFS designers put it, component failures are the norm rather than the exception.
2. High throughput: many nodes operate concurrently.
3. Reliable storage: corrupted data can be detected and re-replicated.
Disadvantages of GFS
Hadoop
Hadoop is an open-source software framework that is used for storing and processing large
amounts of data in a distributed computing environment. It is designed to handle big data and
is based on the MapReduce programming model, which allows for the parallel processing of
large datasets.
HDFS (Hadoop Distributed File System): This is the storage component of Hadoop,
which allows for the storage of large amounts of data across multiple machines. It is
designed to work with commodity hardware, which makes it cost-effective.
YARN (Yet Another Resource Negotiator): This is the resource management
component of Hadoop, which manages the allocation of resources (such as CPU and
memory) for processing the data stored in HDFS.
Hadoop also includes several additional modules that provide additional functionality,
such as Hive (a SQL-like query language), Pig (a high-level platform for creating
MapReduce programs), and HBase (a non-relational, distributed database).
Hadoop is commonly used in big data scenarios such as data warehousing, business
intelligence, and machine learning. It’s also used for data processing, data analysis,
and data mining.
What is Hadoop?
Hadoop is an open source software programming framework for storing a large amount of
data and performing the computation. Its framework is based on Java programming with
some native code in C and shell scripts.
It enables the distributed processing of large data sets across clusters of computers using a simple programming model. A word-count sketch of that model is given below.
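As a rough illustration of the MapReduce programming model, the following self-contained Python word-count sketch runs the map, shuffle, and reduce steps locally; on a real Hadoop cluster the same logic would run in parallel over HDFS blocks (for example via Hadoop Streaming).

```python
# A minimal sketch of the MapReduce programming model (word count), run
# locally to illustrate the idea; on a real cluster the map and reduce
# functions would run in parallel over HDFS blocks.
from collections import defaultdict

documents = [
    "cloud computing delivers computing services",
    "hadoop processes large data sets",
]

# Map phase: emit (word, 1) pairs for every word in every input record.
mapped = []
for doc in documents:
    for word in doc.split():
        mapped.append((word, 1))

# Shuffle phase: group all values by key (done by the framework in Hadoop).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: combine the grouped values for each key.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)
```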
History of Hadoop
Hadoop was developed under the Apache Software Foundation; its co-founders are Doug Cutting and Mike Cafarella. Doug Cutting named it after his son's toy elephant. The Google File System paper was published in October 2003. In January 2006, MapReduce development started on Apache Nutch, with around 6,000 lines of code for MapReduce and around 5,000 lines of code for HDFS. Hadoop 0.1.0 was released in April 2006.
Hadoop is an open-source software framework for storing and processing big data. It was
created by Apache Software Foundation in 2006, based on a white paper written by Google in
2003 that described the Google File System (GFS) and the MapReduce programming model.
The Hadoop framework allows for the distributed processing of large data sets across clusters
of computers using simple programming models. It is designed to scale up from single
servers to thousands of machines, each offering local computation and storage. It is used by
many organizations, including Yahoo, Facebook, and IBM, for a variety of purposes such as
data warehousing, log processing, and research. Hadoop has been widely adopted in the
industry and has become a key technology for big data processing.
Features of Hadoop:
1. It is fault tolerant.
2. It is highly available.
3. It is low cost.
Hadoop has several key features that make it well-suited for big data processing:
Distributed Storage: Hadoop stores large data sets across multiple machines, allowing
for the storage and processing of extremely large amounts of data.
Scalability: Hadoop can scale from a single server to thousands of machines, making
it easy to add more capacity as needed.
Fault-Tolerance: Hadoop is designed to be highly fault-tolerant, meaning it can
continue to operate even in the presence of hardware failures.
Data locality: Hadoop provides data locality feature, where the data is stored on the
same node where it will be processed, this feature helps to reduce the network traffic
and improve the performance
High Availability: Hadoop provides High Availability feature, which helps to make
sure that the data is always available and is not lost.
Flexible Data Processing: Hadoop’s MapReduce programming model allows for the
processing of data in a distributed fashion, making it easy to implement a wide variety
of data processing tasks.
Data Integrity: Hadoop provides built-in checksum feature, which helps to ensure that
the data stored is consistent and correct.
Data Replication: Hadoop provides data replication feature, which helps to replicate
the data across the cluster for fault tolerance.
Data Compression: Hadoop provides built-in data compression feature, which helps to
reduce the storage space and improve the performance.
YARN: A resource management platform that allows multiple data processing
engines like real-time streaming, batch processing, and interactive SQL, to run and
process data stored in HDFS.
Hadoop has a distributed file system known as HDFS, which splits files into blocks and distributes them across the nodes of large clusters. In case of a node failure, the system continues to operate, and the data transfer between nodes is facilitated by HDFS.
HDFS
1. Hive- It uses HiveQL for data structuring and for writing complicated MapReduce jobs over data in HDFS.
2. Drill- It consists of user-defined functions and is used for data exploration.
3. Storm- It allows real-time processing and streaming of data.
4. Spark- It contains a Machine Learning Library(MLlib) for providing enhanced
machine learning and is widely used for data processing. It also supports Java,
Python, and Scala.
5. Pig- It has Pig Latin, a SQL-Like language and performs data transformation of
unstructured data.
6. Tez- It reduces the complexities of Hive and Pig and helps their code run faster.
Hadoop framework is made up of the following modules:
Advantages:
Hadoop has several advantages that make it a popular choice for big data processing:
Scalability: Hadoop can easily scale to handle large amounts of data by adding more
nodes to the cluster.
Cost-effective: Hadoop is designed to work with commodity hardware, which makes
it a cost-effective option for storing and processing large amounts of data.
Fault-tolerance: Hadoop’s distributed architecture provides built-in fault-tolerance,
which means that if one node in the cluster goes down, the data can still be processed
by the other nodes.
Flexibility: Hadoop can process structured, semi-structured, and unstructured data,
which makes it a versatile option for a wide range of big data scenarios.
Open-source: Hadoop is open-source software, which means that it is free to use and
modify. This also allows developers to access the source code and make
improvements or add new features.
Large community: Hadoop has a large and active community of developers and users
who contribute to the development of the software, provide support, and share best
practices.
Integration: Hadoop is designed to work with other big data technologies such as
Spark, Storm, and Flink, which allows for integration with a wide range of data
processing and analysis tools.
Disadvantages:
Applications can access Google Cloud BigTable through a variety of client libraries,
including a supported Java extension to the Apache HBase library. Because of this, it is
compatible with the current Apache ecosystem of open-source big data software.
Powerful backend servers from Google Cloud Bigtable have a number of advantages
over a self-managed HBase installation, including:
Exceptional scalability – Google Cloud Bigtable scales in direct proportion to the number of machines in your cluster. Beyond a certain point, a self-managed HBase installation has a design bottleneck that limits performance. Google Cloud Bigtable does not have this bottleneck, so you can scale your cluster up to support more reads and writes.
Ease of administration – Google Cloud Bigtable handles upgrades and restarts transparently, and it automatically maintains strong data durability. To replicate your data, simply add a second cluster to your instance, and replication starts automatically. Simply define your table schemas, and Google Cloud Bigtable will take care of the rest; no more managing replication or regions.
Cluster resizing with minimal disruption – you can increase the capacity of a Google Cloud Bigtable cluster for a few hours to handle a heavy load and then scale it back down, all without any downtime. Under load, Google Cloud Bigtable usually rebalances performance across all of the nodes in your cluster within a few minutes after you change the size of the cluster.
Applications that require high throughput and scalability for key/value data, where each value
is typically no more than 10 MB, should use Google Cloud BigTable. Additionally, Google
Cloud Bigtable excels as a storage engine for machine learning, stream processing, and batch
MapReduce operations.
All of the following forms of data can be stored in and searched using Google Cloud
Bigtable:
Each massively scalable table in Google Cloud Bigtable is a sorted key/value map that holds
the data. The table is made up of columns that contain unique values for each row and rows
that typically describe a single object. A single row key is used to index each row, and a
column family is often formed out of related columns. The column family and a column
qualifier, a distinctive name within the column family, are combined to identify each column.
Multiple cells may be present at each row/column intersection. Each cell contains a distinct timestamped copy of the data for that row and column, so when many cells are written to a column, a history of the recorded data for that row and column is preserved. Google Cloud Bigtable tables are sparse: a column takes up no space in a row that does not use it.
A few points to remember: some columns in a row may be empty, and a given row and column may contain multiple cells, each with its own timestamp (t). A minimal model of this layout is sketched below.
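A minimal Python sketch of this data model (row keys, families, qualifiers, and timestamps are made up) is shown below; it only illustrates the sorted, sparse, versioned map described above.

```python
# Conceptual sketch only: a Bigtable table behaves like a sorted map from
# row key -> (column family, qualifier) -> {timestamp: value}. Unused
# columns simply have no entry, which is why the table is sparse.

table = {
    b"user#1001": {
        ("profile", b"name"):  {1700000000000: b"Asha"},
        ("profile", b"email"): {1700000000000: b"asha@example.com"},
    },
    b"user#1002": {
        # this row has no "email" column at all: it costs no space
        ("profile", b"name"):  {1700000300000: b"Ravi",
                                1690000000000: b"R. Kumar"},  # older cell kept
    },
}

def read_latest(row_key, family, qualifier):
    cells = table[row_key][(family, qualifier)]
    latest_ts = max(cells)                 # cells are versioned by timestamp
    return cells[latest_ts]

print(read_latest(b"user#1002", "profile", b"name"))   # b'Ravi'
```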
All client requests in the Google Cloud Bigtable architecture go through a frontend server before being forwarded to a Google Cloud Bigtable node. The nodes are organized into a Google Cloud Bigtable cluster, which belongs to a Google Cloud Bigtable instance, a container for the cluster.
A portion of the requests made to the cluster is handled by each node. The number of
simultaneous requests that a cluster can handle can be increased by adding nodes. The
cluster’s maximum throughput rises as more nodes are added. You can send various types of
traffic to various clusters if replication is enabled by adding more clusters. Then you can fail
over to another cluster if one cluster is unavailable.
It’s important to note that data is never really saved in Google Cloud Bigtable nodes; rather,
each node contains pointers to a collection of tablets that are kept on Colossus. Because the
real data is not duplicated, rebalancing tablets from one node to another proceeds swiftly.
When a Google Cloud Bigtable node fails, no data is lost; recovery from a node failure is
quick since only metadata must be moved to the new node. Google Cloud Bigtable merely
changes the pointers for each node.
Load balancing
A primary process oversees each Google Cloud Bigtable zone, balancing workload and data
volume within clusters. By dividing busier/larger tablets in half and combining
less-used/smaller tablets, this procedure moves tablets across nodes as necessary. Google
Cloud Bigtable divides a tablet into two when it experiences a spike in traffic, and then
moves one of the new tablets to a different node. By handling the splitting, merging, and
rebalancing automatically with Google Cloud Bigtable, you may avoid having to manually
manage your tablets.
It’s crucial to distribute writes among nodes as equally as you can in order to obtain the
optimum write performance out of Google Cloud Bigtable. Using row keys with
unpredictable ordering is one method to accomplish this.
Additionally, grouping comparable rows together and placing them next to one another
makes it much easier to read multiple rows at once. If you were keeping various kinds of
weather data across time, for instance, your row key may be the place where the data was
gathered, followed by a timestamp (for instance, WashingtonDC#201803061617). A
contiguous range of rows would be created using this kind of row key to combine all the data
from one location. With several sites gathering data at the same rate, writes would still be
dispersed uniformly between tablets. For other places, the row would begin with a new
identifier.
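A small Python sketch of this row-key scheme (the function and station names are illustrative only):

```python
# Sketch of the row-key design described above: location followed by a
# timestamp keeps one station's readings in a contiguous range of rows,
# while different stations spread writes across tablets.
from datetime import datetime

def weather_row_key(location, when):
    # e.g. "WashingtonDC#201803061617" (minute-level timestamp)
    return f"{location}#{when:%Y%m%d%H%M}"

keys = [
    weather_row_key("WashingtonDC", datetime(2018, 3, 6, 16, 17)),
    weather_row_key("WashingtonDC", datetime(2018, 3, 6, 16, 18)),
    weather_row_key("Seattle",      datetime(2018, 3, 6, 16, 17)),
]

# Bigtable stores rows sorted by key, so a prefix scan such as
# "WashingtonDC#2018030616" reads one location's data for that hour.
for key in sorted(keys):
    print(key)
```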
For the majority of uses, Google Cloud Bigtable treats all data as raw byte strings. Only
during increment operations, where the destination must be a 64-bit integer encoded as an 8-
byte big-endian value, does Google Cloud Bigtable attempt to ascertain the type.
The sections that follow explain how various Google Cloud Bigtable features impact the
amount of memory and disc space used by your instance.
Inactive columns
A Google Cloud Bigtable row doesn’t have any room for columns that aren’t being used.
Each row is essentially made up of a set of key/value entries, where the key is made up of the
timestamp, column family, and column qualifier. The key/value entry is just plain absent if a
row doesn’t have a value for a certain column.
Since each column qualifier used in a row is stored in that row, column qualifiers occupy
space in rows. As a result, using column qualifiers as data is frequently effective.
Compactions
To make reads and writes more effective and to eliminate removed entries, Google Cloud
Bigtable periodically rewrites your tables. This procedure is called compaction. Your data is
automatically compacted by Google Cloud Big Table; there are no tuning options.
Because Google Cloud Bigtable stores mutations sequentially and only compacts them periodically, updates to a row require extra storage space. Google Cloud Bigtable compacts a table by removing values that are no longer needed. If you change a cell's value, both the original value and the updated value are kept on disc until the data is compacted.
Because deletions are actually a special kind of mutation, they also require extra storage space, at least initially. Rather than freeing up space, a deletion consumes additional storage until the table is compacted.
Data longevity
When you use Google Cloud Bigtable, your information is kept on Colossus, an internal,
incredibly resilient file system, employing storage components located in Google’s data
centers. To use Google Cloud Bigtable, you do not need to run an HDFS cluster or any other
type of file system.
Beyond what conventional HDFS three-way replication offers, Google employs customized
storage techniques to ensure data persistence. Additionally, we make duplicate copies of your
data to enable disaster recovery and protection against catastrophic situations.
Dependable model
At the level of projects, instances, and tables, security can be managed. There are no row-
level, column-level, or cell-level security constraints supported by Google Cloud Bigtable.
Encryption
The same hardened key management mechanisms that we employ for our own encrypted data
are used by default for all data stored within Google Cloud, including the data in Google
Cloud Big Table tables.
Customer-managed encryption keys provide you more control over the keys used to protect
your Google Cloud Bigtable data at rest (CMEK).
Backups
With Google Cloud Bigtable backups, you may copy the schema and data of a table and later
restore it to a new table using the backup. You can recover from operator errors, such as
accidentally deleting a table and application-level data destruction with the use of backups.
AWS Storage Services: AWS offers a wide range of storage services that can be provisioned
depending on your project requirements and use case. AWS storage services have different
provisions for highly confidential data, frequently accessed data, and the not so frequently
accessed data. You can choose from various storage types namely, object storage, file
storage, block storage services, backups, and data migration options. All of which fall under
the AWS Storage Services list.
AWS Simple Storage Service (S3): From the aforementioned list, S3, is the object storage
service provided by AWS. It is probably the most commonly used, go-to storage service for
AWS users given the features like extremely high availability, security, and simple
connection to other AWS Services. AWS S3 can be used by people with all kinds of use
cases like mobile/web applications, big data, machine learning and many more.
AWS S3 Terminology:
S3 storage classes:
AWS S3 provides multiple storage types that offer different performance and features and
different cost structure.
Standard: Suitable for frequently accessed data, that needs to be highly available and
durable.
Standard Infrequent Access (Standard IA): This is a cheaper data-storage class and
as the name suggests, this class is best suited for storing infrequently accessed data
like log files or data archives. Note that there may be a per GB data retrieval fee
associated with Standard IA class.
Intelligent Tiering: This service class classifies your files automatically into
frequently accessed and infrequently accessed and stores the infrequently accessed
data in infrequent access storage to save costs. This is useful for unpredictable data
access to an S3 bucket.
One Zone Infrequent Access (One Zone IA): All the files on your S3 have their
copies stored in a minimum of 3 Availability Zones. One Zone IA stores this data in a
single availability zone. It is only recommended to use this storage class for
infrequently accessed, non-essential data. There may be a per GB cost for data
retrieval.
Reduced Redundancy Storage (RRS): All the other S3 classes ensure a durability of 99.999999999%; RRS ensures only 99.99% durability. AWS no longer recommends RRS because of its lower durability. However, it can be used to store non-essential data. A short upload sketch using these classes follows.
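As a rough sketch of how a storage class is chosen at upload time, the following Python snippet uses boto3 (assuming AWS credentials are already configured; the bucket name example-notes-bucket and the object keys are hypothetical).

```python
# Sketch of uploading objects to S3 with an explicit storage class using
# boto3 (assumes AWS credentials are configured; the bucket name
# "example-notes-bucket" and the keys are hypothetical).
import boto3

s3 = boto3.client("s3")

# Frequently accessed data: default STANDARD class.
s3.put_object(
    Bucket="example-notes-bucket",
    Key="reports/daily.csv",
    Body=b"date,value\n2024-01-01,42\n",
)

# Rarely read archive: STANDARD_IA trades a retrieval fee for cheaper storage.
s3.upload_file(
    "app.log",
    "example-notes-bucket",
    "archive/app.log",
    ExtraArgs={"StorageClass": "STANDARD_IA"},
)
```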
Security Issues in Cloud Computing
Cloud computing is a type of technology that provides remote services over the internet to manage, access, and store data rather than storing it on servers or local drives. This technology is also sometimes referred to as serverless technology. Here the data can be anything: images, audio, video, documents, files, etc.
Need of Cloud Computing :
Before cloud computing, most large as well as small IT companies used traditional methods, i.e., they stored data on their own servers and needed a separate server room for that. That server room would contain a database server, mail server, firewalls, routers, modems, high-speed network devices, etc., and IT companies had to spend a lot of money on it. In order to reduce all these problems and costs, cloud computing came into existence, and most companies shifted to this technology.
1. Data Loss –
Data loss is one of the issues faced in cloud computing. It is also known as data leakage. Our sensitive data is in the hands of somebody else, and we do not have full control over our database. So, if the security of the cloud service is breached by hackers, they may get access to our sensitive data or personal files.
5. Lack of Skill –
Shifting to another service provider, needing an extra feature, not knowing how to use a feature, and so on are the main problems faced in an IT company that does not have skilled employees. So working with cloud computing requires skilled people.
Almost every organization has adopted cloud computing to varying degrees within their
business. However, with this adoption of the cloud comes the need to ensure that the
organization’s cloud security strategy is capable of protecting against the top threats to cloud
security.
Misconfiguration
Misconfigurations of cloud security settings are a leading cause of cloud data breaches. Many
organizations’ cloud security posture management strategies are inadequate for protecting
their cloud-based infrastructure.
Several factors contribute to this. Cloud infrastructure is designed to be easily usable and to
enable easy data sharing, making it difficult for organizations to ensure that data is only
accessible to authorized parties. Also, organizations using cloud-based infrastructure do not have complete visibility and control over their infrastructure, meaning that they need to
rely upon security controls provided by their cloud service provider (CSP) to configure and
secure their cloud deployments. Since many organizations are unfamiliar with securing cloud
infrastructure and often have multi-cloud deployments – each with a different array of
vendor-provided security controls – it is easy for a misconfiguration or security oversight to
leave an organization’s cloud-based resources exposed to attackers.
Unauthorized Access
Insecure Interfaces/APIs
CSPs often provide a number of application programming interfaces (APIs) and interfaces for
their customers. In general, these interfaces are well-documented in an attempt to make them
easily-usable for a CSP’s customers.
However, this creates potential issues if a customer has not properly secured the interfaces for
their cloud-based infrastructure. The documentation designed for the customer can also be
used by a cybercriminal to identify and exploit potential methods for accessing and
exfiltrating sensitive data from an organization’s cloud environment.
Hijacking of Accounts
Many people have extremely weak password security, including password reuse and the use
of weak passwords. This problem exacerbates the impact of phishing attacks and data
breaches since it enables a single stolen password to be used on multiple different accounts.
Account hijacking is one of the more serious cloud security issues as organizations are
increasingly reliant on cloud-based infrastructure and applications for core business
functions. An attacker with an employee’s credentials can access sensitive data or
functionality, and compromised customer credentials give full control over their online
account. Additionally, in the cloud, organizations often lack the ability to identify and
respond to these threats as effectively as for on-premises infrastructure.
Lack of Visibility
An organization’s cloud-based resources are located outside of the corporate network and run
on infrastructure that the company does not own. As a result, many traditional tools for
achieving network visibility are not effective for cloud environments, and some organizations
lack cloud-focused security tools. This can limit an organization’s ability to monitor their
cloud-based resources and protect them against attack.
The cloud is designed to make data sharing easy. Many clouds provide the option to
explicitly invite a collaborator via email or to share a link that enables anyone with the URL
to access the shared resource.
While this easy data sharing is an asset, it can also be a major cloud security issue. The use of
link-based sharing – a popular option since it is easier than explicitly inviting each intended
collaborator – makes it difficult to control access to the shared resource. The shared link can
be forwarded to someone else, stolen as part of a cyberattack, or guessed by a cybercriminal,
providing unauthorized access to the shared resource. Additionally, link-based sharing makes
it impossible to revoke access to only a single recipient of the shared link.
Malicious Insiders
Insider threats are a major security issue for any organization. A malicious insider already has
authorized access to an organization’s network and some of the sensitive resources that it
contains. Attempts to gain this level of access are what reveals most attackers to their target,
making it hard for an unprepared organization to detect a malicious insider.
On the cloud, detection of a malicious insider is even more difficult. With cloud
deployments, companies lack control over their underlying infrastructure, making many
traditional security solutions less effective. This, along with the fact that cloud-based
infrastructure is directly accessible from the public Internet and often suffers from security
misconfigurations, makes it even more difficult to detect malicious insiders.
Cyberattacks
Cybercrime is a business, and cybercriminals select their targets based upon the expected
profitability of their attacks. Cloud-based infrastructure is directly accessible from the public
Internet, is often improperly secured, and contains a great deal of sensitive and valuable data.
Additionally, the cloud is used by many different companies, meaning that a successful attack
can likely be repeated many times with a high probability of success. As a result,
organizations’ cloud deployments are a common target of cyberattacks.
The cloud is essential to many organizations’ ability to do business. They use the cloud to
store business-critical data and to run important internal and customer-facing applications.
This means that a successful Denial of Service (DoS) attack against cloud infrastructure is
likely to have a major impact on a number of different companies. As a result, DoS attacks
where the attacker demands a ransom to stop the attack pose a significant threat to an
organization’s cloud-based resources.
In the Cloud Security Report, organizations were asked about their major security concerns
regarding cloud environments. Despite the fact that many organizations have decided to
move sensitive data and important applications to the cloud, concerns about how they can
protect it there abound.
Data Loss/Leakage
Cloud-based environments make it easy to share the data stored within them. These
environments are accessible directly from the public Internet and include the ability to share
data easily with other parties via direct email invitations or by sharing a public link to the
data.
The ease of data sharing in the cloud – while a major asset and key to collaboration in the
cloud – creates serious concerns regarding data loss or leakage. In fact, 69% of organizations
point to this as their greatest cloud security concern. Data sharing using public links or
setting a cloud-based repository to public makes it accessible to anyone with knowledge of
the link, and tools exist specifically for searching the Internet for these unsecured cloud
deployments.
Data Privacy/Confidentiality
Data privacy and confidentiality is a major concern for many organizations. Data protection
regulations like the EU’s General Data Protection Regulation (GDPR), the Health Insurance
Portability and Accessibility Act (HIPAA), the Payment Card Industry Data Security
Standard (PCI DSS) and many more mandate the protection of customer data and impose
strict penalties for security failures. Additionally, organizations have a large amount of
internal data that is essential to maintaining competitive advantage.
Placing this data on the cloud has its advantages but also has created major security concerns
for 66% of organizations. Many organizations have adopted cloud computing but lack the
knowledge to ensure that they and their employees are using it securely. As a result, sensitive
data is at risk of exposure – as demonstrated by a massive number of cloud data breaches.
Phishers commonly use cloud applications and environments as a pretext in their phishing
attacks. With the growing use of cloud-based email (G-Suite, Microsoft 365, etc.) and
document sharing services (Google Drive, Dropbox, OneDrive), employees have become
accustomed to receiving emails with links that might ask them to confirm their account
credentials before gaining access to a particular document or website.
This makes it easy for cybercriminals to learn an employee’s credentials for cloud services.
As a result, accidental exposure of cloud credentials is a major concern for 44% of
organizations since it potentially compromises the privacy and security of their cloud-based
data and other resources.
Incident Response
Many organizations have strategies in place for responding to internal cyber security
incidents. Since the organization owns their entire internal network infrastructure and security
personnel are on-site, it is possible to lock down the incident. Additionally, this ownership of
their infrastructure means that the company likely has the visibility necessary to identify the
scope of the incident and perform the appropriate remediation actions.
With cloud-based infrastructure, a company only has partial visibility and ownership of their
infrastructure, making traditional processes and security tools ineffective. As a result, 44% of
companies are concerned about their ability to perform incident response effectively in the
cloud.
Data protection regulations like PCI DSS and HIPAA require organizations to demonstrate
that they limit access to the protected information (credit card data, healthcare patient
records, etc.). This could require creating a physically or logically isolated part of the
organization’s network that is only accessible to employees with a legitimate need to access
this data.
When moving data protected by these and similar regulations to the cloud, achieving and
demonstrating regulatory compliance can be more difficult. With a cloud deployment,
organizations only have visibility and control into some of the layers of their infrastructure.
As a result, legal and regulatory compliance is considered a major cloud security issue by
42% of organizations and requires specialized cloud compliance solutions.
Data Sovereignty/Residence/Control
Most cloud providers have a number of geographically distributed data centres. This helps to
improve the accessibility and performance of cloud-based resources and makes it easier for
CSPs to ensure that they are capable of maintaining service level agreements in the face of
business-disrupting events such as natural disasters, power outages, etc.
Organizations storing their data in the cloud often have no idea where their data is actually
stored within a CSP’s array of data centres. This creates major concerns around data
sovereignty, residence, and control for 37% of organizations. With data protection regulations
such as the GDPR limiting where EU citizens data can be sent, the use of a cloud platform
with data centres outside of the approved areas could place an organization in a state of
regulatory non-compliance. Additionally, different jurisdictions have different laws regarding
access to data for law enforcement and national security, which can impact the data privacy
and security of an organization’s customers.
The cloud provides a number of advantages to organizations; however, it also comes with its
own security threats and concerns. Cloud-based infrastructure is very different from an on-
premises data centre, and traditional security tools and strategies are not always able to secure
it effectively. For more information about leading cloud security issues and threats, download
the Cloud Security Report.
Privacy Challenges
Cloud computing is a widely discussed topic today, with interest from all fields, be it research, academia, or the IT industry. It has suddenly become a hot topic at international conferences and other venues throughout the world. The spike in job opportunities is attributed to the huge amounts of data being processed and stored on cloud servers. The cloud paradigm revolves around convenience and the easy provision of a huge pool of shared computing resources.
The rapid development of the cloud has led to more flexibility, cost-cutting, and scalability of
products but also faces an enormous amount of privacy and security challenges. Since it is a
relatively new concept and is evolving day by day, there are undiscovered security issues that
creep up and need to be taken care of as soon as discovered. Here we discuss the top 7
privacy challenges encountered in cloud computing:
1. Data Confidentiality Issues
Data loss or data theft is one of the major security challenges that cloud providers face. If a
cloud vendor has reported loss or theft of critical or sensitive data in the past, more than sixty
percent of users would decline to use the cloud services provided by that vendor. Outages of
cloud services are reported quite frequently, even from firms such as Dropbox, Microsoft,
and Amazon, which in turn results in an absence of trust in these services during critical
times. Also, it is quite easy for an attacker to gain access to multiple storage units if even a
single one is compromised.
Since cloud infrastructure is distributed across different geographical locations spread
throughout the world, the user's data is often stored in a location that lies outside their legal
jurisdiction. This raises concerns about the legal accessibility of that data to local law
enforcement and about the regulations that apply to data stored outside the user's region.
Moreover, the dynamic nature of the cloud makes it very difficult to designate a specific
server for trans-border data transmission, so users fear that local laws may be violated.
Multi-tenancy is a paradigm in which computational resources, data storage, applications,
and services are shared among different tenants hosted on the same logical or physical
platform at the cloud service provider's premises. By following this approach the provider
can maximize profits, but it puts the customers at risk. Attackers can take undue advantage of
multi-residence and launch various attacks against their co-tenants, which can result in
several privacy challenges.
5. Transparency Issues
In cloud computing security, transparency means the willingness of a cloud service provider
to reveal details about its security preparedness. Some of these details comprise policies and
regulations on security, privacy, and service levels. In addition to willingness and disposition,
when assessing transparency it is important to consider how accessible the security-readiness
data and information actually are. It does not matter how much security information about an
organization is available; if it is not presented in an organized and easily understandable way
for cloud service users and auditors, the organization's transparency must still be rated
relatively low.
6. Hypervisor Related Issues
Virtualization means the logical abstraction of computing resources from physical restrictions
and constraints. This poses new challenges for factors like user authentication, accounting,
and authorization. The hypervisor manages multiple virtual machines and therefore becomes
a target for adversaries. Unlike physical devices, which are independent of one another,
virtual machines in the cloud usually reside on a single physical device managed by the same
hypervisor. Compromise of the hypervisor therefore puts multiple virtual machines at risk.
Moreover, the relative newness of hypervisor technology, including its isolation, security
hardening, and access control mechanisms, provides adversaries with new ways to exploit the
system.
7. Managerial Issues
Cloud privacy challenges are not only technical but also non-technical and managerial. Even
after implementing a technical solution to a problem or a product, failing to manage it
properly is eventually bound to introduce vulnerabilities. Some examples are lack of control,
security and privacy management for virtualization, developing comprehensive service level
agreements, and going through cloud service vendor and user negotiations.
Every computer system and software design must handle all security risks and implement the
necessary measures to enforce security policies. At the same time, it's critical to strike a
balance because strong security measures might increase costs while also limiting the
system's usability, utility, and smooth operation. As a result, system designers must assure
efficient performance without compromising security.
System security may be threatened through two violations, and these are as follows:
1. Threat
2. Attack
There are two types of security breaches that can harm the system: malicious and accidental.
Malicious threats are a type of destructive computer code or web script that is designed to
cause system vulnerabilities that lead to back doors and security breaches. On the other hand,
Accidental Threats are comparatively easier to protect against.
Security may be compromised through the breaches. Some of the breaches are as follows:
1. Breach of integrity
2. Theft of service
3. Breach of confidentiality
4. Denial of service
It includes preventing legitimate use of the system. Some attacks may be accidental.
There are several goals of system security. Some of them are as follows:
1. Integrity
Unauthorized users must not be allowed to access the system's objects, and users with
insufficient rights should not modify the system's critical files and resources.
2. Secrecy
The system's objects must only be available to a small number of authorized users. The
system files should not be accessible to everyone.
3. Availability
All system resources must be accessible to all authorized users, i.e., no single user/process
should be able to consume all system resources. If such a situation arises, denial of service
may occur. In this case, malware may restrict system resources and prevent legitimate
processes from accessing them.
Types of Threats
There are mainly two types of threats that occur. These are as follows:
Program threats
The operating system's processes and kernel carry out the specified task as directed. Program
Threats occur when a user program causes these processes to do malicious operations. The
common example of a program threat is that when a program is installed on a computer, it
could store and transfer user credentials to a hacker. There are various program threats. Some
of them are as follows:
1. Virus
A virus may replicate itself on the system. Viruses are extremely dangerous and can
modify/delete user files as well as crash computers. A virus is a little piece of code that is
implemented on the system program. As the user interacts with the program, the virus
becomes embedded in other files and programs, potentially rendering the system inoperable.
2. Trojan Horse
This type of application captures user login credentials. It stores them to transfer them to a
malicious user who can then log in to the computer and access system resources.
3. Logic Bomb
A logic bomb is a situation in which software only misbehaves when particular criteria are
met; otherwise, it functions normally.
4. Trap Door
A trap door is when a program that is supposed to work as expected has a security weakness
in its code that allows it to do illegal actions without the user's knowledge.
System Threats
System threats are described as the misuse of system services and network connections to
cause user problems. These threats may be used to trigger the program threats over an entire
network, known as program attacks. System threats make an environment in which OS
resources and user files may be misused. There are various system threats. Some of them are
as follows:
1. Port Scanning
It is a method by which a cracker determines a system's vulnerabilities for an attack. It is a
fully automated process that involves connecting to specific ports via TCP/IP. To hide the
attacker's identity, port-scanning attacks are often launched through zombie systems:
previously independent systems that still appear to serve their owners while also being used
for such malicious purposes.
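To make the idea concrete, here is a minimal Python sketch of a TCP connect scan (for use only against systems you are authorized to test); the target address and port list are hypothetical placeholders, not values from these notes.

```python
# A minimal port-scanning sketch (authorized testing only): try a TCP
# connection on each port and report whether it accepted the connection.
import socket

TARGET = "192.0.2.10"        # hypothetical example address (TEST-NET range)
PORTS = [22, 80, 443, 3389]  # a few commonly probed ports

for port in PORTS:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)                      # do not hang on filtered ports
        result = s.connect_ex((TARGET, port))  # 0 means the connection succeeded
        state = "open" if result == 0 else "closed/filtered"
        print(f"port {port}: {state}")
```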
2. Worm
The worm is a process that can choke a system's performance by exhausting all system
resources. A Worm process makes several clones, each consuming system resources and
preventing all other processes from getting essential resources. Worm processes can even
bring a network to a halt.
3. Denial of Service
Denial of service attacks usually prevents users from legitimately using the system. For
example, if a denial-of-service attack is executed against the browser's content settings, a user
may be unable to access the internet.
There are various threats to the operating system. Some of them are as follows:
Malware
It contains viruses, worms, trojan horses, and other dangerous software. These are generally
short code snippets that may corrupt files, delete the data, replicate to propagate further, and
even crash a system. The malware frequently goes unnoticed by the victim user while
criminals silently extract important data.
Network Intrusion
Buffer Overflow
It is also known as buffer overrun. It is one of the most common and dangerous security
issues in operating systems. It is defined as a condition at an interface under which more
input can be placed into a buffer or data-holding area than its allotted capacity, overwriting
other information. Attackers exploit such a condition to crash a system or to insert specially
crafted malware that allows them to take control of the system.
There are various ways to ensure operating system security. These are as follows:
Authentication
The process of identifying every system user and associating the programs executing with
those users is known as authentication. The operating system is responsible for implementing
a security system that ensures the authenticity of a user who is executing a specific program.
In general, operating systems identify and authenticate users in three ways.
1. Username/Password
Every user has a unique username and password that must be entered correctly before
accessing the system.
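As an illustration of how a system might store and check such credentials without keeping plaintext passwords, here is a minimal Python sketch assuming a salted hash; the function names and example password are hypothetical.

```python
# A minimal sketch of salted password hashing and verification.
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    salt = salt or os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest            # store salt + digest, never the plaintext

def verify_password(password, salt, stored_digest):
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)  # constant-time compare

salt, digest = hash_password("s3cret-example")          # at account creation
print(verify_password("s3cret-example", salt, digest))  # True at login
print(verify_password("wrong-guess", salt, digest))     # False
```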
2. User Attribution
These techniques usually include biometric verification, such as fingerprints, retina scans,
etc. This authentication is based on user uniqueness and is compared to database samples
already in the system. Users are only allowed access if there is a match.
3. User Card or Key
To log into the system, the user must punch a card into a card slot or enter a key produced by
a key generator into an option provided by the operating system.
One Time passwords
Along with standard authentication, one-time passwords give an extra layer of security. Every
time a user attempts to log into the One-Time Password system, a unique password is needed.
Once a one-time password has been used, it cannot be reused. One-time passwords may be
implemented in several ways.
1. Secret Key
The user is given a hardware device that can generate a secret id that is linked to the user's id.
The system prompts for such a secret id, which must be generated each time you log in.
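As an illustration, here is a minimal Python sketch of a counter-based one-time code in the spirit of such a hardware token (HOTP-style); the shared secret and counter handling are assumptions made for the example, not a description of any particular product.

```python
# A minimal counter-based one-time code (HOTP-like) sketch.
import hashlib
import hmac
import struct

SECRET = b"per-user-shared-secret"   # provisioned on the token and the server

def one_time_code(counter, digits=6):
    msg = struct.pack(">Q", counter)                     # 8-byte big-endian counter
    mac = hmac.new(SECRET, msg, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                              # dynamic truncation (RFC 4226 style)
    code = int.from_bytes(mac[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Token and server both advance the counter; codes match only if the key matches.
print(one_time_code(1))   # code shown on the device
print(one_time_code(1))   # same counter -> same code on the server side
print(one_time_code(2))   # the next login uses a new counter, hence a new code
```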
2. Random numbers
Users are given cards that have alphabets and numbers printed on them. The system requests
numbers that correspond to a few alphabets chosen at random.
3. Network password
Firewalls
Firewalls are essential for monitoring all incoming and outgoing traffic. They impose local
security, defining the traffic that may pass through them. Firewalls are an efficient way of
protecting network systems or local systems from any network-based security threat.
Physical Security
The most important method of maintaining operating system security is physical security. An
attacker with physical access to a system may edit, remove, or steal important files since
operating system code and configuration files are stored on the hard drive.
Various operating system security policies may be implemented based on the organization
that you are working in. In general, an OS security policy is a document that specifies the
procedures for ensuring that the operating system maintains a specific level of integrity,
confidentiality, and availability.
OS security protects systems and data from worms, malware, threats, ransomware, backdoor
intrusions, viruses, etc. Security policies cover all preventative activities and procedures that
ensure an operating system's protection, including protection against data being stolen,
edited, or deleted.
As OS security policies and procedures cover a large area, there are various techniques for
addressing them. Some of them are as follows:
To develop and implement OS security policies and procedures, you must first determine
which assets, systems, hardware, and data are the most vital to your organization. Once that
is completed, a policy can be developed to secure and safeguard them properly.
By using Azure Active Directory, we can control access to our virtual machines for
different users or groups of users. When we create a virtual machine, we can assign a
user to it, and while assigning the user to the virtual machine, we also associate a
particular role with them. That role defines the level of access that the user will have
on our virtual machine.
Users, groups, and applications from that directory can manage resources in the Azure
subscription.
It grants access by assigning the appropriate RBAC role to users, groups, and
applications at a certain scope. The scope of a role assignment can be a subscription, a
resource group, or a single resource.
Azure RBAC has three essential roles that apply to all resource types:
o Owner: They have full access to all resources, including the right to delegate
access to others.
o Contributor: They can create and manage all types of Azure resources but
can't grant access to others.
o Reader: They can only view existing Azure resources.
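Azure evaluates these role assignments for you; purely as a conceptual illustration (this is not the Azure SDK or its API), the sketch below shows how roles granted at a subscription or resource-group scope could gate actions, with hypothetical principals and scope strings.

```python
# Conceptual RBAC check: a role assigned at a broader scope applies to
# everything underneath that scope.
ROLE_PERMISSIONS = {
    "Owner":       {"read", "write", "delegate"},
    "Contributor": {"read", "write"},
    "Reader":      {"read"},
}

# Hypothetical assignments: (principal, scope) -> role
ASSIGNMENTS = {
    ("alice", "/subscriptions/sub1"): "Owner",
    ("bob",   "/subscriptions/sub1/resourceGroups/rg1"): "Reader",
}

def is_allowed(principal, resource, action):
    for (who, scope), role in ASSIGNMENTS.items():
        if who == principal and resource.startswith(scope):
            if action in ROLE_PERMISSIONS[role]:
                return True
    return False

print(is_allowed("alice", "/subscriptions/sub1/resourceGroups/rg1/vm1", "write"))  # True
print(is_allowed("bob",   "/subscriptions/sub1/resourceGroups/rg1/vm1", "write"))  # False
```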
The Azure security center identifies potential virtual machine (VM) configuration issues and
targeted security threats. These might include VMs that are missing network security groups,
unencrypted disks, and brute-force Remote Desktop Protocol (RDP) attacks.
We can customize the recommendations we would like to see from the Security Center using
security policies.
Managed Service Identity is a newly introduced feature in Azure. Earlier, whenever we
deployed an application to a virtual machine, we generally kept a user id and password in a
configuration file within the application's folder. If someone got access to that virtual
machine, they could open the configuration file and view those credentials. To further
increase the security of our application code and the safety of the services accessed by that
code, we can use Managed Service Identity.
Network security group: To filter the traffic in and out of the virtual machine.
Microsoft Antimalware for Azure: We can install on our Azure virtual machines to
secure our machines against any malware.
Encryption: We can enable Azure Disk Encryption.
Key Vault and SSH Keys: we can use key vault to store the certificates or any
sensitive key.
Policies: All the security-related policies we can apply using it.
UNIT V:
Cloud Application Development: Amazon Web Services : EC2 – instances, connecting
clients, security rules, launching, usage of S3 in Java, Installing Simple Notification Service
on Ubuntu 10.04, Installing Hadoop on Eclipse, Cloud based simulation of a Distributed trust
algorithm, Cloud service for adaptive data streaming ( Text Book 1), Google: Google App
Engine, Google Web Toolkit (Text Book 2), Microsoft: Azure Services Platform, Windows
live, Exchange Online, Share Point Services, Microsoft Dynamics CRM (Text Book2).
EC2
On Demand
It allows you to pay a fixed rate by the hour or even by the second with no
commitment.
Linux instances are billed by the second and Windows instances by the hour.
On Demand is perfect for the users who want low cost and flexibility of Amazon EC2
without any up-front investment or long-term commitment.
It is suitable for the applications with short term, spiky or unpredictable workloads
that cannot be interrupted.
It is useful for the applications that have been developed or tested on Amazon EC2 for
the first time.
On Demand instance is recommended when you are not sure which instance type is
required for your performance needs.
Reserved
It provides a discount of up to 75% compared to On-Demand pricing, for example
when you pay all upfront for a 3-year contract.
It is useful when your Application is at the steady-state.
Convertible Reserved Instances
Convertible Reserved Instances can be exchanged during the term for other
Convertible Reserved Instances with different attributes (such as instance family).
Scheduled Reserved Instances
Scheduled Reserved Instances are available to launch within the specified time
window you reserve.
It allows you to match your capacity reservation to a predictable recurring schedule
that only requires a fraction of a day, a week, or a month.
Spot Instances
It allows you to bid whatever price you want for instance capacity, providing even
greater savings if your applications have flexible start and end times.
Spot Instances are useful for those applications that have flexible start and end times.
It is useful for those applications that are feasible at very low compute prices.
It is useful for those users who have an urgent need for large amounts of additional
computing capacity.
EC2 Spot Instances provide steep discounts compared to On-Demand prices.
Spot Instances are used to optimize your costs on the AWS cloud and scale your
application's throughput up to 10X.
EC2 Spot Instances will continue to run until you terminate them or until AWS
reclaims the capacity, in which case you receive a two-minute interruption notice.
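As a sketch of how a Spot Instance is requested programmatically, the snippet below uses the boto3 SDK; the AMI id, key pair name, and maximum price are placeholders you would replace, not values from these notes.

```python
# Request a one-time Spot Instance via the EC2 run_instances API.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",     # placeholder AMI id
    InstanceType="t3.micro",
    KeyName="my-key-pair",               # placeholder key pair
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.01",          # highest hourly price you are willing to pay
            "SpotInstanceType": "one-time",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```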
Dedicated Hosts
A dedicated host is a physical server with EC2 instance capacity which is fully
dedicated to your use.
The physical EC2 server is the dedicated host, which can help you reduce costs by
allowing you to use your existing server-bound software licenses, for example
VMware, Oracle, or SQL Server, depending on the licenses that you can bring over to
AWS and then use on the dedicated host.
Dedicated Hosts are used to address compliance requirements and reduce costs by
allowing you to use your existing server-bound software licenses.
It can be purchased as a Reservation for up to 70% off On-Demand price.
A web service is a standardized method for propagating messages between client and server
applications on the World Wide Web. A web service is a software module that aims to
accomplish a specific set of tasks. Web services can be found and implemented over a
network in cloud computing.
The web service would be able to provide the functionality to the client that invoked the web
service.
A web service is a set of open protocols and standards that allow data exchange between
different applications or systems. Web services can be used by software programs written in
different programming languages and running on different platforms to exchange data over
computer networks such as the Internet, in much the same way as inter-process
communication happens on a single computer.
Any software, application, or cloud technology that uses a standardized web protocol (HTTP
or HTTPS) to connect, interoperate, and exchange data messages over the Internet, usually in
XML (Extensible Markup Language), is considered a web service.
XML and HTTP together form the most fundamental web service platform. All typical web
services use the following components:
1. SOAP (Simple Object Access Protocol)
SOAP messages are XML documents: only the structure of the document follows a fixed
pattern, not its content. The great thing about web services and SOAP is that everything is
sent over HTTP, the standard web protocol.
Every SOAP document requires a root element known as the Envelope element; in an XML
document, the root element is the first element.
The envelope is divided into two parts. The header comes first, followed by the body.
Routing data, i.e., information that directs the XML document to the client it should be sent
to, is contained in the header. The actual message is in the body.
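To make the envelope structure concrete, here is a hedged Python sketch that posts a hypothetical SOAP message over plain HTTP; the endpoint URL, namespace, and GetPrice operation are invented for illustration only.

```python
# Build a SOAP envelope by hand and send it over HTTP.
import requests  # third-party: pip install requests

envelope = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <!-- routing data: which client/service the message is intended for -->
  </soap:Header>
  <soap:Body>
    <GetPrice xmlns="http://example.com/prices">
      <ItemName>Keyboard</ItemName>
    </GetPrice>
  </soap:Body>
</soap:Envelope>"""

response = requests.post(
    "http://example.com/price-service",          # hypothetical endpoint
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
)
print(response.status_code, response.text[:200])  # the reply is also a SOAP/XML document
```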
2. UDDI (Universal Description, Search, and Integration)
UDDI is a standard for specifying, publishing and searching online service providers. It
provides a specification that helps in hosting the data through web services. UDDI provides a
repository where WSDL files can be hosted so that a client application can search the WSDL
file to learn about the various actions provided by the web service. As a result, the client
application will have full access to UDDI, which acts as the database for all WSDL files.
The UDDI registry keeps the information needed to locate online services, much like a
telephone directory contains the name, address, and phone number of a person, so that client
applications can find the services they need.
3. WSDL (Web Services Description Language)
The client implementing the web service must be aware of the location of the web service. If
a web service cannot be found, it cannot be used. Second, the client application must
understand what the web service does in order to invoke the correct operation. This is
accomplished using WSDL, the Web Services Description Language. A WSDL file is
another XML-based file that describes what a web service does to a client application. Using
the WSDL document, the client application understands where the web service is located and
how to access it.
The diagram shows a simplified version of how a web service would function. The client will
use requests to send a sequence of web service calls to the server hosting the actual web
service.
Remote procedure calls are used to perform these requests. The calls to the methods hosted
by the respective web service are known as Remote Procedure Calls (RPC). Example:
Flipkart provides a web service that displays the prices of items offered on Flipkart.com. The
front end or presentation layer can be written in .NET or Java, and either language can
communicate with the web service.
The data exchanged between the client and the server is XML, the most important part of
web service design. XML (Extensible Markup Language) is a simple, intermediate language
understood by various programming languages. It is a counterpart of HTML.
As a result, when programs communicate with each other, they use XML. It forms a common
platform for applications written in different programming languages to communicate with
each other.
Web services employ SOAP (Simple Object Access Protocol) to transmit XML data between
applications. The data is sent using standard HTTP. A SOAP message is data sent from a web
service to an application. An XML document is all that is contained in a SOAP message. The
client application that calls the web service can be built in any programming language as the
content is written in XML.
(a) XML-based: A web service's information representation and record transport layers
employ XML. There is no need for networking, operating system, or platform bindings when
using XML. Web-service-based applications are highly interoperable at the middle level.
(b) Loosely Coupled: The consumer of a web service is not necessarily tied directly to that
service. The web service interface can change over time without affecting the client's ability
to interact with the service. A tightly coupled system, by contrast, means that the client and
the server logic are inextricably linked, so that if one interface changes, the other must be
updated as well.
A loosely connected architecture makes software systems more manageable and easier to
integrate between different structures.
(c) Synchronous or Asynchronous: Synchronous clients get their result immediately when
the service completes, whereas asynchronous clients retrieve their result later. Asynchronous
capability is a key factor in enabling loosely coupled systems.
(d) Coarse-Grained: Object-oriented systems such as Java make their services available
through individual methods. An individual method is too fine-grained an operation to provide
useful capability at the corporate level. Building a Java application from the ground up
requires the development of several fine-grained methods, which are then combined into a
coarse-grained service that is consumed by the buyer or by another service.
Businesses should be coarse-grained, as should the interfaces they expose. Building web
services is an easy way to define coarse-grained services that have access to substantial
business logic.
(e) Supports Remote Procedure Calls: Consumers can use XML-based protocols to call
procedures, functions, and methods on remote objects through web services. A web service
must support the input and output framework of the remote system.
Enterprise-wide component development has, over the years, come to rely heavily on
Enterprise JavaBeans (EJBs) and .NET components in architectural and enterprise
deployments. Several RPC techniques are used to both distribute and access them.
A web service can support RPC by providing services of its own, equivalent to those of a
traditional component, or by translating incoming invocations into an invocation of an EJB
or a .NET component.
(f) Supports Document Exchange: One of the most attractive features of XML is its generic
way of representing not only data but also complex documents, and web services support the
transparent exchange of such documents.
Security in cloud computing is a major concern. Proxy and brokerage services should be
employed to restrict a client from accessing the shared data directly. Data in the cloud should
be stored in encrypted form.
Security Planning
Before deploying a particular resource to the cloud, one should need to analyze several
aspects of the resource, such as:
Select the resource that needs to move to the cloud and analyze its sensitivity to risk.
Consider cloud service models such as IaaS, PaaS, and SaaS. These models require the
customer to be responsible for security at different service levels.
Consider the cloud type, such as public, private, community, or hybrid.
Understand the cloud service provider's system regarding data storage and its transfer
into and out of the cloud.
The risk in cloud deployment mainly depends upon the service models and cloud
types.
Security Boundaries
The Cloud Security Alliance (CSA) stack model defines the boundaries between each
service model and shows how different functional units relate. A particular service model
defines the boundary between the service provider's responsibilities and the customer. The
following diagram shows the CSA stack model:
Key Points to CSA Model
IaaS is the most basic level of service, with PaaS and SaaS as the next two levels of
services above it.
Moving upwards, each service inherits the capabilities and security concerns of the
model beneath.
IaaS provides the infrastructure, PaaS provides the platform development
environment, and SaaS provides the operating environment.
IaaS has the lowest integrated functionality and security level, while SaaS has the
highest.
This model describes the security boundaries at which cloud service providers'
responsibilities end and customers' responsibilities begin.
Any protection mechanism below the security limit must be built into the system and
maintained by the customer.
Although each service model has a security mechanism, security requirements also depend on
where these services are located, private, public, hybrid, or community cloud.
Since all data is transferred using the Internet, data security in the cloud is a major concern.
Here are the key mechanisms to protect the data.
Access control
Auditing (audit trail)
Authentication
Authorization
The service model should include security mechanisms working in all of the above areas.
Since the data stored in the cloud can be accessed from anywhere, we need to have a
mechanism to isolate the data and protect it from the client's direct access.
Brokered cloud storage access is an approach for isolating storage in the cloud. In this
approach, two services are created:
1. A broker, which has full access to the storage but no access to the client.
2. A proxy, which has no access to the storage but has access to both the client and the
broker.
Working of a brokered cloud storage access system (sketched in code below): when the
client issues a request to access data:
1. The client's data request goes to the external service interface of the proxy.
2. The proxy forwards the request to the broker.
3. The broker requests the data from the cloud storage system.
4. The cloud storage system returns the data to the broker.
5. The broker returns the data to the proxy.
6. Finally, the proxy sends the data to the client.
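The following is a purely conceptual Python sketch of that flow; the classes and the in-memory "cloud storage" are illustrative stand-ins, not a real storage API.

```python
# Conceptual brokered storage access: client -> proxy -> broker -> storage.
class CloudStorage:
    def __init__(self):
        self._objects = {"report.txt": b"encrypted-bytes..."}
    def read(self, key):
        return self._objects[key]

class Broker:
    """Has full access to storage, but is never exposed to clients."""
    def __init__(self, storage):
        self._storage = storage
    def fetch(self, key):
        return self._storage.read(key)

class Proxy:
    """Client-facing service; has no storage credentials, only a broker."""
    def __init__(self, broker):
        self._broker = broker
    def handle_request(self, client_id, key):
        # authentication / authorization checks on the client would go here
        return self._broker.fetch(key)

proxy = Proxy(Broker(CloudStorage()))
print(proxy.handle_request(client_id="client-42", key="report.txt"))
```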
Encryption helps to protect the data from being hacked. It protects the data being transferred
and the data stored in the cloud. Although encryption helps protect data from unauthorized
access, it does not prevent data loss.
The difference between "cloud security" and "cloud security architecture" is that the former is
usually built from problem-specific point measures, while the latter is built from threats. A
cloud security architecture can reduce or eliminate the holes in security that point-solution
approaches are almost certain to leave.
It does this by building downward: defining threats starting with the users, moving to the
cloud environment and service provider, and then to the applications. Cloud security
architectures can also reduce redundancy in security measures, which contributes to threat
mitigation and reduces both capital and operating costs.
The cloud security architecture also organizes security measures, making them more
consistent and easier to implement, particularly during cloud deployments and
redeployments. Security is often undermined when it is illogical or overly complex, and these
flaws can be identified with a proper cloud security architecture.
Elements of cloud security architecture
The best way to approach cloud security architecture is to start with a description of the
goals. The architecture has to address three things: an attack surface represented by external
access interfaces, a protected asset set that represents the information being protected, and
vectors designed to perform indirect attacks from anywhere, including from within the cloud
itself.
The goal of the cloud security architecture is accomplished through a series of functional
elements. These elements are often considered separately rather than part of a coordinated
architectural plan. It includes access security or access control, network security, application
security, contractual security, and monitoring, sometimes called service security. Finally,
there is data protection, which consists of measures implemented at the protected-asset level.
Complete cloud security architecture addresses the goals by unifying the functional elements.
The security and security architectures for the cloud are not single-player processes. Most
enterprises will keep a large portion of their IT workflow within their data centers, local
networks, and VPNs. The cloud adds additional players, so the cloud security architecture
should be part of a broader shared responsibility model.
Each will divide the components of a cloud application into layers, with the top layer being
the responsibility of the customer and the lower layer being the responsibility of the cloud
provider. Each separate function or component of the application is mapped to the
appropriate layer depending on who provides it. The contract form then describes how each
party responds.
Launching a website is one of the most important things for a company, whether it is a
startup or a well-established company. But launching a website is not an easy task; there are a
lot of things to be taken care of. AWS makes it easier both for experienced teams and for
startups. By using the AWS S3 service, hosting a website is a child's game now. This gives
company owners more time to focus on other important things to be done in the company.
Let's see how you can host a website using AWS with some easy steps.
Step 1: Prerequisites
The most basic step is to first have a working AWS account and your front-end code (.html
file), which will be the content of your website. Don't worry about the .html content; even a
basic <p>Hello World</p> will do.
Step 2: Create an S3 Bucket for your website
To keep things simple, we will be using only one AWS service to host our website: AWS S3.
AWS S3 is a storage service where all files are stored in S3 Buckets.
Then provide a globally unique name for your S3 Bucket and select the region you want your
bucket to be in.
After clicking on Next, you will see a panel asking you to define some tags for your bucket.
This is optional, since tags are just for your own identification of the bucket. You can skip
this step by simply clicking on Next.
Once this is done, a new panel will come up where all public access to your bucket is denied
by default. But we are going to host a website, which should be public so that everyone can
see it.
To do this you need to untick the checkbox. Once you untick it, a pop-up will warn you that
the bucket is going to be public. Don't panic; just check the acknowledgement box.
Once the bucket is created, now it’s time to upload the .html file onto it. For this click on blue
Upload button on the top right.
Once you click on Next, under Manage public permissions select Grant public read access to
this object, so that your website is publicly readable.
At last select your S3 storage type, we choose the basic standard type. But, to reduce the cost
you can choose any other type depending upon your needs.
Now at the end just review the details and click on Upload.
After these steps you can see that our index.html file has been successfully uploaded.
To inform your S3 Bucket that you are going to use it for hosting your website, click on the
Properties tab. After this, select the Static website hosting card and fill in your index
document name; the error document name is optional (you can type 404.html).
Next click on Permission tab, Now you’ll need to click on the “Bucket Policy” subsection.
Here, you’ll be prompted to create a JSON object that contains the details of your bucket’s
access permission policy.
This part can be confusing. For now, you just need a JSON policy that grants full public read
access to the files in your bucket (one such policy appears in the boto3 sketch at the end of
this walkthrough). This will make the website publicly accessible.
Once this is done just click on Save and All Done! You have now successfully uploaded a
simple static website on AWS S3.
To access your site, go back to the “Overview” tab on S3 and click on your index document.
You’ll get a slide-in menu with the link on your website.
Copy and Paste the link on your browser and your website will be accessible.
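For reference, the same console walkthrough can be scripted. The boto3 sketch below is one way to do it under stated assumptions: the bucket name and region are placeholders, and on newer accounts the public-access-block settings must also be relaxed, mirroring the "untick the checkbox" step above.

```python
# Create a bucket, allow public reads, upload index.html, enable static hosting.
import json
import boto3

s3 = boto3.client("s3", region_name="ap-south-1")
bucket = "my-unique-demo-bucket-12345"   # must be globally unique

s3.create_bucket(
    Bucket=bucket,
    CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
)

# Mirror the console step that unblocks public access for this bucket.
s3.put_public_access_block(
    Bucket=bucket,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": False, "IgnorePublicAcls": False,
        "BlockPublicPolicy": False, "RestrictPublicBuckets": False,
    },
)

# Typical policy granting public read access to every object in the bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/*",
    }],
}
s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))

s3.upload_file("index.html", bucket, "index.html",
               ExtraArgs={"ContentType": "text/html"})

s3.put_bucket_website(
    Bucket=bucket,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "404.html"},
    },
)
# The website URL follows the regional pattern, e.g.
# http://<bucket>.s3-website.<region>.amazonaws.com
```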
Amazon Simple Notification Service (SNS) is used for the Application to Application (A2A)
and Application to Person (A2P) communication. It provides developers with a highly
scalable, flexible, and cost-effective capability to publish messages from an application and
immediately deliver them to subscribers or other applications. Using AWS Console it is easy
to publish messages to your endpoint of choice (HTTP, SQS, Lambda, mobile push, email, or
SMS) and edit topic policies to control publisher and subscriber access.
Advantages of SNS:
Create a topic: Firstly, open AWS CloudShell and use the SNS create-topic command,
specifying the name of the topic, for example gfg-topic.
Subscribe to the topic: Here you choose the protocol through which your messages will
be sent, i.e., HTTP, SQS, Lambda, mobile push, email, or SMS. After running the
subscribe command you will receive an email asking you to confirm your subscription
by clicking on the given link.
Publish to the topic: After subscribing to the topic, publish to it to send the message to
the intended person or device.
Unsubscribe from the topic: To stop receiving messages from the application, use the
unsubscribe command.
Delete the topic: To delete the topic, simply use the delete-topic command. (The
equivalent boto3 flow is sketched below.)
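As a sketch of the same flow with the boto3 Python SDK, the snippet below creates, subscribes to, publishes to, unsubscribes from, and deletes a topic; the topic name and e-mail address are placeholders, and an email subscription still requires the recipient to confirm before messages are delivered.

```python
# End-to-end SNS flow with boto3.
import boto3

sns = boto3.client("sns", region_name="us-east-1")

topic_arn = sns.create_topic(Name="gfg-topic")["TopicArn"]

sub = sns.subscribe(
    TopicArn=topic_arn,
    Protocol="email",                   # could also be http, sqs, lambda, sms, ...
    Endpoint="someone@example.com",     # recipient must confirm the subscription
    ReturnSubscriptionArn=True,
)

sns.publish(TopicArn=topic_arn, Subject="Hello", Message="Test message from SNS")

sns.unsubscribe(SubscriptionArn=sub["SubscriptionArn"])   # stop receiving messages
sns.delete_topic(TopicArn=topic_arn)                      # remove the topic itself
```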
Eclipse is the most popular IDE for developing Java applications. It is an easy-to-use and
powerful IDE, which is the reason behind the trust of many programmers. It can be
downloaded from the link below:
http://www.eclipse.org/downloads/eclipse-packages
In this step, you can see how to move the eclipse folder to the home directory. You can check
the screenshot for your reference.
Now, in this step, you will see the Eclipse icon once you have successfully downloaded and
extracted it to the required folder. Double-click to open it. You can see the screenshot for
your reference.
Hadoop-core-1.2.1.jar
commons-cli-1.2.jar
file—>new—>java project—>finish
right click—>new—>package—>finish
right click on package—>new—>class_name
Now add the following reference libraries. First, right-click the project and go to Build Path
–> Configure Build Path.
Click on Add External JARs, browse to the folder where the files were downloaded, click
Open, and select the two files hadoop-core-1.2.1.jar and commons-cli-1.2.jar.
Cloud based simulation of a Distributed trust algorithm
Before we begin learning about Google Cloud Platform, we will talk about what Cloud
Computing is. Basically, it means using someone else's computer over the internet.
Examples: GCP, AWS, IBM Cloud, etc. Some interesting features of cloud computing are as
follows:
You get computing resources on-demand and self-service. The customer has to use a
simple User Interface and they get the computing power, storage requirements, and
network you need, without human intervention.
You can access these cloud resources over the internet from anywhere on the globe.
The provider of these resources has a huge collection of these resources and allocates
them to customers out of that collection.
The resources are elastic. If you need more resources you can get more, rapidly. If
you need less, you can scale down back.
The customers pay only for what they use or reserve. If they stop using resources,
they stop paying.
Infrastructure as a Service (IaaS): It provides you all the hardware components you
require such as computing power, storage, network, etc.
Platform as a Service (PaaS): It provides you a platform that you can use to develop
applications, software, and other projects.
Software as a Service (SaaS): It provides you with complete software to use like
Gmail, google drive, etc.
Google Cloud Platform
Starting in 1998 with the launch of Google Search, Google has developed one of the largest
and most powerful IT infrastructures in the world. Today, this infrastructure is used by
billions of users for services such as Gmail, YouTube, Google Photos and Maps. In 2008,
Google decided to open its network and IT infrastructure to business customers, taking an
infrastructure that was initially developed for consumer applications to public service and
launching Google Cloud Platform.
All the services listed above are provided by Google hence the name Google Cloud Platform
(GCP). Apart from these, there are so many other services provided by GCP and also many
concepts related to it that we are going to discuss in this article.
Let’s start at the finest grain level (i.e. the smallest or first step in the hierarchy), the Zone. A
zone is an area where Google Cloud Platform Resources like virtual machines or storage is
deployed.
For example, when you launch a virtual machine in GCP using Compute Engine, it runs in a
zone you specify (suppose Europe-west2-a). Although people consider a zone as being sort of
a GCP Data Center, that’s not strictly accurate because a zone doesn’t always correspond to
one physical building. You can still visualize the zone that way, though.
Zones are grouped into regions, which are independent geographic areas and much larger
than zones (for example, all the zones shown above are grouped into a single region, europe-
west2), and you can choose which regions you want your GCP resources to be placed in. All
the zones within a region have fast network connectivity among them. Locations within
regions usually have round-trip network latencies of under five milliseconds.
A few GCP services support deploying resources in what we call a multi-region. For
example, Google Cloud Storage lets you place data within the Europe multi-region. What
that means is that it is stored redundantly in at least two different geographic locations,
separated by at least 160 kilometers, within Europe. Previously, GCP had 15 regions; visit
cloud.google.com to see what the current total is.
Pricing
Google was the first major cloud provider to bill by the second, instead of rounding up to
larger units of time, for its virtual-machines-as-a-service offering. This may not sound like a
big deal, but charges for rounding up can really add up for customers who are creating and
running lots of virtual machines. Per-second billing is available for virtual machine use
through Compute Engine and for several other services too.
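A tiny arithmetic illustration of why this matters is below; the hourly rate is a hypothetical number chosen for the example, not an actual Compute Engine price.

```python
# Per-second billing vs. rounding up to a full hour, for a short-lived VM.
HOURLY_RATE = 0.0475          # hypothetical USD per hour for one VM
seconds_used = 12 * 60 + 40   # the VM ran for 12 minutes 40 seconds

per_second_cost = HOURLY_RATE / 3600 * seconds_used
rounded_up_cost = HOURLY_RATE * 1   # what hourly rounding would have charged

print(f"per-second billing:   ${per_second_cost:.4f}")
print(f"hour-rounded billing: ${rounded_up_cost:.4f}")
```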
Compute Engine provides automatically applied sustained use discounts, which are discounts
that you get for running a virtual machine for a significant portion of the billing month. When
you run an instance for at least 25% of a month, Compute Engine automatically gives you a
discount for every incremental minute you use it. Here's one more way Compute Engine
saves you money.
Normally, you choose a virtual machine type from a standard set of predefined types, but
Compute Engine also offers custom virtual machine types, so that you can fine-tune the sizes
of the virtual machines you use. That way, you can tailor your pricing to your workloads.
Open API’s
Some people are afraid to bring their workloads to the cloud because they're afraid they'll get
locked into a specific vendor. But in many ways, Google gives customers the ability to run
their applications elsewhere if Google stops being the best provider for their needs. Here are
some examples of how Google helps its customers avoid feeling locked in.
GCP services are compatible with open source products. For example, take Cloud
Bigtable, a database that uses the interface of the open-source database Apache HBase, which
provides customers the advantage of code portability. Another example, Cloud Dataproc
provides the open-source big data environment Hadoop, as a managed service, etc.
GCP allows you to choose between computing, storage, big data, machine learning,
and application services for your web, mobile, analytics, and, back-end solutions.
It’s global and it is cost-effective.
It’s open-source friendly.
It’s designed for security.
Disadvantages of GCP
1. The support fee is quite hefty: around 150 USD per month for the most basic service
(Silver class).
2. Downloading data from Google Cloud Storage is expensive: 0.12 USD per GB.
3. The Google Cloud Platform web interface is somewhat confusing; sometimes I get lost
while browsing around the menus.
4. Prices in both Microsoft Azure (around 0.018 USD per GB/month) and Backblaze B2
(about 0.005 USD per GB/month) are lower than Google Cloud Storage.
5. It has a complex pricing schema, much like AWS S3, so it's easy to incur unexpected costs
(e.g. number of requests, transfers, etc.).
Google App Engine lets you run your Python and Java web applications on elastic
infrastructure supplied by Google. App Engine allows your applications to scale
dynamically as your traffic and data storage requirements increase or decrease. It gives
developers a choice between a Python stack and Java. The App Engine serving architecture
is notable in that it allows real-time auto-scaling without virtualization for many common
types of web applications. However, such auto-scaling is dependent on the application
developer using a limited subset of the native APIs on each platform, and in some instances
you need to use specific Google APIs such as URLFetch, Datastore, and memcache in
place of certain native API calls. For example, a deployed App Engine application cannot
write to the file system directly (you must use the Google Datastore) or open a socket or
access another host directly (you must use the Google URLFetch service). A Java application
cannot create a new Thread either.
A scalable runtime environment, Google App Engine is mostly used to run web applications.
These scale dynamically as demand changes over time, thanks to Google's vast computing
infrastructure. Because it offers a secure execution environment in addition to a number of
services, App Engine makes it easier to develop scalable and high-performance web apps.
Google's applications will scale up and down in response to shifting demand. Cron tasks,
communications, scalable data stores, work queues, and in-memory caching are some of
these services.
The App Engine SDK facilitates the development and testing of applications by emulating
the production runtime environment and allowing developers to design and test applications
on their own PCs. When an application is finished, developers can quickly migrate it to App
Engine, put quotas in place to control the costs generated, and make the program available to
everyone. Python, Java, and Go are among the languages that are currently supported.
The development and hosting platform Google App Engine, which powers anything from
web programming for huge enterprises to mobile apps, uses the same infrastructure as
Google’s large-scale internet services. It is a fully managed PaaS (platform as a service)
cloud computing platform that uses in-built services to run your apps. You can start creating
almost immediately after receiving the software development kit (SDK). You may
immediately access the Google app developer’s manual once you’ve chosen the language you
wish to use to build your app.
After creating a Cloud account, you may start building your app using, for example, one of
the stacks below (a minimal Python sketch follows the list):
Using the Go template/HTML package
Python-based webapp2 with Jinja2
PHP and Cloud SQL
using Java’s Maven
The app engine runs the programs on various servers while "sandboxing" them. The app
engine allows a program to use more resources in order to handle increased demand. The
app engine powers programs like Snapchat, Rovio, and Khan Academy.
To create an application for the app engine, you can use Go, Java, PHP, or Python. You can
develop and test an app locally using the SDK's deployment toolkit. Each language's SDK
and runtime are unique. Your program runs in a sandboxed environment.
Generally Available Features
These are protected by the service-level agreement and depreciation policy of the app engine.
The implementation of such a feature is often stable, and any changes made to it are
backward-compatible. These include communications, process management, computing, data
storage, retrieval, and search, as well as app configuration and management. Features like the
HRD migration tool, Google Cloud SQL, logs, datastore, dedicated Memcached, blob store,
Memcached, and search are included in the categories of data storage, retrieval, and search.
Features in Preview
In a later iteration of the app engine, these functions will undoubtedly be made broadly
accessible. However, because they are in the preview, their implementation may change in
ways that are backward-incompatible. Sockets, MapReduce, and the Google Cloud Storage
Client Library are a few of them.
Experimental Features
These might or might not be made broadly accessible in later app engine updates, and they
might be changed in ways that are backward-incompatible. The "trusted tester" features,
however, are only accessible to a limited user base and require registration in order to use
them. The experimental features include Prospective Search, Page Speed, OpenID, Datastore
Admin (backup/restore), Task Queue Tagging, MapReduce, Task Queue REST API, OAuth,
and app metrics analytics.
Third-Party Services
As Google provides documentation and helper libraries to expand the capabilities of the app
engine platform, your app can perform tasks that are not built into the core product you are
familiar with as app engine. To do this, Google collaborates with other organizations. Along
with the helper libraries, the partners frequently provide exclusive deals to app engine users.
The Google App Engine has a lot of benefits that can help you advance your app ideas. This
comprises:
1. Infrastructure for Security: The Internet infrastructure that Google uses is arguably
the safest in the entire world. Since the application data and code are hosted on
extremely secure servers, there has rarely been any kind of illegal access to date.
2. Faster Time to Market: For every organization, getting a product or service to
market quickly is crucial. When it comes to quickly releasing the product,
encouraging the development and maintenance of an app is essential. A firm can grow
swiftly with Google Cloud App Engine’s assistance.
3. Quick to Start: You don’t need to spend a lot of time prototyping or deploying the
app to users because there is no hardware or product to buy and maintain.
4. Easy to Use: The tools that you need to create, test, launch, and update the
applications are included in Google App Engine (GAE).
5. Rich set of APIs & Services: A number of built-in APIs and services in Google App
Engine enable developers to create strong, feature-rich apps.
6. Scalability: This is one of the deciding variables for the success of any software.
When using the Google app engine to construct apps, you may access technologies
like GFS, Big Table, and others that Google uses to build its own apps.
7. Performance and Reliability: Among international brands, Google ranks among the
top ones. Therefore, you must bear that in mind while talking about performance and
reliability.
8. Cost Savings: To administer your servers, you don’t need to employ engineers or
even do it yourself. The money you save might be put toward developing other areas
of your company.
9. Platform Independence: Since the app engine platform only has a few dependencies,
you can easily relocate all of your data to another environment.
Features of GWT: GWT (Google Web Toolkit) is an open-source set of tools, developed by
Google, that allows developers to create and manage browser-based applications in Java.
Difference between AngularJs and GWT:
AngularJs is an open-source JavaScript framework maintained by Google with support for
all major browsers; GWT is an open-source set of tools that allows developers to create and
manage applications in Java.
AngularJs was released by Google on 20 October 2010; GWT was released by Google on 16
May 2006.
AngularJs is written in JavaScript; GWT is written in the Java programming language.
AngularJs supports the MVVM design pattern; GWT supports the MVP design pattern.
For client-server code, AngularJs uses MVVM web services; GWT uses MVC.
AngularJs is open source under the MIT license; GWT is open source under the Apache
license.
AngularJs supports dynamic typing; GWT does not support dynamic typing.
For cloud platform support, AngularJs uses Google App Engine; GWT uses DigitalOcean.
AngularJs has a file size of about 80 KB; GWT supports a 32 MB file size.
AngularJs supports both object-oriented and event-driven programming; GWT supports only
object-oriented programming.
In AngularJs there are some conditions for code generation; GWT supports code generation.
Introduction to Microsoft Azure | A cloud computing service
Azure is Microsoft’s cloud platform, just like Google has it’s Google Cloud and Amazon has
it’s Amazon Web Service or AWS.000. Generally, it is a platform through which we can use
Microsoft’s resource. For example, to set up a huge server, we will require huge investment,
effort, physical space and so on. In such situations, Microsoft Azure comes to our rescue. It
will provide us with virtual machines, fast processing of data, analytical and monitoring tools
and so on to make our work simpler. The pricing of Azure is also simpler and cost-effective.
Popularly termed as “Pay As You Go”, which means how much you use, pay only for that.
Azure History
Microsoft unveiled Windows Azure in early October 2008, but it went live in February 2010.
Later, in 2014, Microsoft changed its name from Windows Azure to Microsoft Azure. Azure
provided a service platform for .NET services, SQL services, and many Live services. Many
people were still very skeptical about "the cloud". As an industry, we were entering a brave
new world with many possibilities. Microsoft Azure keeps getting bigger and better; more
tools and more functionality are being added. It has had two releases so far: the well-known
Microsoft Azure v1 and the later Microsoft Azure v2. Microsoft Azure v1 was more JSON-
script driven, whereas the new version v2 has an interactive UI for simplification and easy
learning. Microsoft Azure v2 is still in the preview version.
Capital-less: We don't have to worry about capital, as Azure cuts out the high cost of
hardware. You simply pay as you go and enjoy a subscription-based model that's
kind to your cash flow. Also, setting up an Azure account is very easy. You simply
register in the Azure Portal, select your required subscription, and get going.
Less Operational Cost: Azure has a low operational cost because it runs on its own
servers whose only job is to make the cloud functional and bug-free, so it's usually a
whole lot more reliable than your own, on-location server.
Cost Effective: If we set up a server on our own, we need to hire a tech support team
to monitor it and make sure things are working fine. There might also be situations
where the tech support team takes too much time to solve an issue incurred in the
server. In this regard, Azure is much more pocket-friendly.
Easy Back Up and Recovery options: Azure keeps backups of all your valuable data.
In disaster situations, you can recover all your data in a single click without your
business getting affected. Cloud-based backup and recovery solutions save time,
avoid large up-front investments and roll up third-party expertise as part of the deal.
Easy to implement: It is very easy to implement your business models in Azure.
With a couple of on-click activities, you are good to go. Even there are several
tutorials to make you learn and deploy faster.
Better Security: Azure provides more security than local servers, so you can be
carefree about your critical data and business applications, as they stay safe in the
Azure Cloud. Even in natural disasters, where local resources can be harmed, Azure
is a rescue; the cloud is always on.
Work from anywhere: Azure gives you the freedom to work from anywhere and
everywhere. It just requires a network connection and credentials. And with most
serious Azure cloud services offering mobile apps, you’re not restricted to which
device you’ve got to hand.
Increased collaboration: With Azure, teams can access, edit and share documents
anytime, from anywhere. They can work and achieve future goals hand in hand.
Another advantage of Azure is that it preserves records of activity and data.
Timestamps are one example of Azure's record keeping. Timestamps improve
team collaboration by establishing transparency and increasing accountability.
1. Compute: Includes Virtual Machines, Virtual Machine Scale Sets, Functions for
serverless computing, Batch for containerized batch workloads, Service Fabric for
microservices and container orchestration, and Cloud Services for building cloud-
based apps and APIs.
2. Networking: With Azure, you can use a variety of networking tools, like Virtual
Network, which can connect to on-premises data centers; Load Balancer; Application
Gateway; VPN Gateway; Azure DNS for domain hosting; Content Delivery Network;
Traffic Manager; ExpressRoute dedicated private network fiber connections; and
Network Watcher monitoring and diagnostics.
3. Storage: Includes Blob, Queue, File and Disk Storage, as well as a Data Lake Store,
Backup and Site Recovery, among others.
4. Web + Mobile: Creating web and mobile applications is very easy, as it includes
several services for building and deploying applications.
5. Containers: Azure includes Container Service, which supports Kubernetes, DC/OS
or Docker Swarm, and Container Registry, as well as tools for microservices.
6. Databases: Azure also includes several SQL-based databases and related tools.
7. Data + Analytics: Azure has big data tools like HDInsight for Hadoop, Spark, R
Server, HBase and Storm clusters.
8. AI + Cognitive Services: Azure offers services for developing applications with
artificial intelligence capabilities, like the Computer Vision API, Face API, Bing Web
Search, Video Indexer, and Language Understanding Intelligent Service.
9. Internet of Things: Includes IoT Hub and IoT Edge services that can be combined
with a variety of machine learning, analytics, and communications services.
10. Security + Identity: Includes Security Center, Azure Active Directory, Key Vault
and Multi-Factor Authentication Services.
11. Developer Tools: Includes cloud development services like Visual Studio Team
Services, Azure DevTest Labs, HockeyApp mobile app deployment and monitoring,
Xamarin cross-platform mobile development and more.
Difference between AWS (Amazon Web Services), Google Cloud and Azure
What Does Windows Live Mean?
Windows Live is Microsoft's branded suite of online and client-side tools and applications.
Windows Live includes browser-based Web services, mobile services and Windows Live
Essentials.
Similar to Google Apps, Windows Live is part of Microsoft's cloud strategy, or Software Plus
Services (Software + Services or S+S).
Released in November 2005, Windows Live serves as an online user gateway that provides
Microsoft and third-party applications for seamless user interaction. Classic Windows Live
applications include Hotmail (Microsoft's free email service), Live Messenger, Live Photos
and Live Calendar.
Windows Live Mail: POP3 email client that easily integrates with non-Microsoft
email services
Windows Live SkyDrive: Facilitates online Microsoft Office collaboration and
provides free cloud storage for documents and photos.
Windows Live Messenger Companion: Internet Explorer add-in for live collaboration
Windows Live Family Safety: Extends parental controls in Windows 7 and Vista
Exchange Online: Exchange Online is the hosted version of Microsoft's Exchange Server
messaging platform that organizations can obtain as a stand-alone service or via an Office
365 subscription.
Exchange Online gives companies a majority of the same benefits that on-premises Exchange
deployments provide. Users connect to Exchange Online via the Microsoft Outlook desktop
client, Outlook on the web with a web browser, or with mobile devices using the Outlook
mobile app to access email and collaboration functionality, including shared calendars, global
address lists and conference rooms.
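As a hedged illustration of programmatic mailbox access (not something prescribed by these notes), the short Python sketch below reads the newest messages from an Exchange Online inbox using the third-party exchangelib library, which talks to Exchange Web Services. The address and password are placeholders, and it assumes EWS access with basic credentials is enabled for the mailbox; many tenants now require modern (OAuth) authentication instead.

```python
# Illustrative sketch only: read recent mail from an Exchange Online mailbox
# via Exchange Web Services using the third-party exchangelib package.
# The address and password are placeholders; assumes EWS with basic auth is allowed.
from exchangelib import Credentials, Account, DELEGATE

creds = Credentials(username="user@contoso.com", password="<password>")
account = Account(
    primary_smtp_address="user@contoso.com",
    credentials=creds,
    autodiscover=True,      # locate the EWS endpoint automatically
    access_type=DELEGATE,
)

# Print the five most recently received messages in the inbox.
for item in account.inbox.all().order_by("-datetime_received")[:5]:
    print(item.datetime_received, item.subject)
```

The same account object also exposes the calendar and other folders, which mirrors the shared-calendar access described above.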
Administrators use the Exchange admin center to tweak features in Exchange Online, such as
the ability to put disclaimers in an email.
The Exchange admin center is a centralized management console used to adjust Exchange
Online features, including permissions, compliance management, protection, and mobile
device access.
Administrators can also use Windows PowerShell to set up permissions and manage
functionality from the command line using cmdlets. While the Exchange admin center
operates from a web browser, PowerShell requires the administrator to perform several
steps to establish a remote PowerShell session to Exchange Online.
With the basic Exchange Online Plan 1 offering, users have 50 GB of mailbox storage at a
cost of $4 per user, per month. Microsoft provides Exchange Online Protection as part of this
service to scan email for malware and spam.
At $8 per user, per month, Exchange Online Plan 2 gives unlimited mailbox space and
unified messaging features, including call answering and automated attendant functionality.
Administrators get additional features such as data loss prevention policies for regulated
industries and organizations that require additional protections for sensitive information.
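For example, at these list prices a 100-user organization would pay roughly $400 per month on Plan 1 versus roughly $800 per month on Plan 2, so the unlimited storage, unified messaging and compliance features effectively double the per-user cost.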
Organizations that use the Office 365 Business Premium subscription pay $12.50 per user,
per month for Exchange Online and access to other features, including web- and desktop-
based Office 365 applications, SharePoint intranet features, 1 TB of storage via the OneDrive
for Business service, and video conferencing with Microsoft Teams, which is replacing
Skype for Business.
Microsoft offers Exchange Online in its other Office 365 offerings, including Office 365
Business Essentials, Office 365 Business, Office 365 Enterprise E1, Office 365 Enterprise
E3, Office 365 Enterprise E5, Office 365 Enterprise F1, Office 365 Education and Office 365
Education E5.
Deployment choices
Organizations can use Exchange Online in a hybrid arrangement in which some mailboxes
remain in the data center while others are hosted in Microsoft's cloud. A hybrid deployment
allows an organization to retain some control and use some of its on-premises functionality,
such as secure mail transport, with a cloud-based mailbox.
Organizations can also use cloud-only deployments that put all the mailboxes in a Microsoft
data center.
Microsoft positions Exchange Online as one way to reduce the workload of an IT staff. It
takes time and effort to maintain an on-premises version of Exchange to ensure mail,
calendars and other messaging features perform as expected.
Administrators must contend with regular patching via Patch Tuesday to maintain the
stability and security of the Exchange deployment. The IT staff must also plot out an upgrade
if their current Exchange Server version is due to move out of support, which might require
purchasing newer equipment and developing a method to perform an upgrade without
disrupting end users. With Exchange Online, Microsoft runs the service in its data centers and
executes updates without downtime or involving an organization's IT department.
A switch to Exchange Online can alleviate some of the hardware issues and problems with
infrastructure components that can affect an on-premises Exchange deployment. Microsoft
touts the stability of its service and offers a 99.9% uptime guarantee and a service-level
agreement to provide a credit back to the organization if a disruption occurs.
Cost is another factor to consider when deciding whether to stay with Exchange Server or
move to Exchange Online. Depending on the number of users and the frequency of server
hardware upgrades, subscribing to Exchange Online can be cheaper, particularly for smaller
organizations that would otherwise buy new server equipment every three years.
Another benefit of Exchange Online is scalability. A cloud-based service can absorb a
merger involving a significant number of users more quickly than an on-premises
Exchange deployment could.
The other advantage of storing data in Microsoft's data centers is not needing to develop and
maintain the infrastructure for disaster recovery.
A drawback to Exchange Online is that disruptions will happen, and because Microsoft
handles the support, it can be difficult to determine when the service will return.
Another drawback of Exchange Online is that Microsoft updates its cloud services on a
frequent basis to add, remove and modify certain features. Users might get frustrated if some
functionality changes or disappears when Microsoft pushes out an update. With an on-
premises Exchange deployment, the feature set tends to remain fixed.
Some organizations might have to stop using certain third-party tools or applications that
work with Exchange Server if they don't integrate with Exchange Online. These
organizations might want to maintain the level of flexibility provided by an on-premises
messaging deployment.
SharePoint and Microsoft Azure are two sizeable platforms in their own right. SharePoint is
one of Microsoft's leading server productivity platforms, serving as the collaborative
platform for the enterprise and the Web.
Microsoft Azure is Microsoft’s operating system in the cloud. Separately, they have their
own strengths, market viability, and developer following.
They help expand how and where you deploy your code and data.
They increase opportunities to take advantage of Microsoft Azure while at the
same time reducing the storage and failover costs of on-premises applications.
They provide you with new business models and offerings that you can take to your
customers to increase your own solution offerings.
In SharePoint 2010, Azure and SharePoint were two distinct platforms and technologies,
which could be integrated easily enough, but they were not part of the same system.
However, in SharePoint 2013 this has changed.
SharePoint 2013 introduces different types of cloud applications. In fact, you can build two
types of Azure integrated applications.
The first type of application is Autohosted, and the second is Provider-hosted (sometimes
referred to as self-hosted).
Thus, you can take advantage of the entire Microsoft Azure stack when building
Provider-hosted apps that use Azure.
The CRM solution can be used to drive sales productivity and marketing effectiveness for
an organization, handle the complete customer support chain, and provide social insights,
business intelligence, and many other out-of-the-box functionalities and features. As a
product, Microsoft Dynamics CRM also offers full mobile support for using CRM apps on
mobiles and tablets.
As of writing this tutorial, the latest version of CRM is CRM 2016. However, in this tutorial
we will be using the CRM 2015 Online version, as it is the most recent stable version and is
frequently used in many organizations. Nevertheless, even if you are using any other version
of CRM, all the concepts in the tutorial will still hold true.
Product Offerings
CRM Online
CRM Online is a cloud-based offering of Microsoft Dynamics CRM where all the backend
processes (such as application servers, setups, deployments, databases, licensing, etc.) are
managed on Microsoft servers. CRM Online is a subscription-based offering that is
preferred by organizations that may not want to manage all the technicalities involved in a
CRM implementation. You can get started with setting up your system in a few days (not
weeks, months or years) and access it on the web via your browser.
CRM On-Premise
CRM on-premise is a more customized and robust offering of Microsoft Dynamics CRM,
where the CRM application and databases will be deployed on your servers. This offering
allows you to control all your databases, customizations, deployments, backups, licensing and
other network and hardware setups. Generally, organizations that want a customized CRM
solution prefer on-premise deployment, as it offers better integration and customization
capabilities.
From the functional standpoint, both the offerings offer similar functionalities; however, they
differ significantly in terms of implementation. The differences are summarized below.
Setup: With CRM Online you can get started in a matter of a few days, and you pay for the
users and used space on the go; setting up an on-premise offering needs technical skills as
well as sufficient time to set up the CRM instance and get it running.
Customization: CRM Online supports relatively fewer customizations and extensions, while
CRM on-premise supports relatively more customizations and extensions.
Database control: CRM Online does not give the ability to perform manual data backup and
restore, since the database is hosted on Microsoft servers; however, Microsoft performs
daily backups of the database. CRM on-premise gives you complete ability to manage your
database.
Updates: CRM Online supports automatic updates to future versions, whereas CRM
on-premise updates need to be installed by the administrator.
Accessing CRM
Microsoft Dynamics CRM can be accessed via any of the following options −
Browser
Mobile and Tablets
Outlook
Product Competitors
Microsoft Dynamics CRM is undoubtedly one of the top products in the CRM space.
However, following are the other products that compete with Microsoft Dynamics CRM.
Salesforce.com
Oracle
SAP
Sage CRM
Sugar CRM
NetSuite
Product Versions
Microsoft Dynamics CRM has grown over the years, starting from its 1.0 version in 2003.
The latest version (as of writing this article) is CRM 2016. Following is the chronological
list of release versions −