Unit 5 Notes
Data mining – Data warehousing – Data mining vs Data warehouse – Machine learning –
Supervised learning – Unsupervised learning – Business Intelligence – Cloud computing.
1.DATA MINING
Data mining is a multidisciplinary field that involves extracting valuable insights from large
datasets. It combines techniques from statistics, machine learning, artificial intelligence, and
database systems to identify patterns, relationships, and trends that are not immediately obvious.
This knowledge can be used to make predictions, improve decision-making, and uncover hidden
insights within vast amounts of data.
The process of data mining can be broken down into the following phases:
1. Data Collection:
The first step in any data mining project is gathering data. This could come from various
sources, including databases, data warehouses, data lakes, and external data sources. It's
important to have access to large, clean, and relevant data.
2. Data Preprocessing:
This phase involves cleaning and preparing the data for analysis. It includes:
o Data Cleaning: Removing or correcting errors, missing values, and
inconsistencies in the data.
o Data Transformation: Normalizing or scaling data to ensure all variables are on
the same scale (especially important in algorithms like k-means clustering).
o Data Integration: Combining data from different sources and ensuring
compatibility.
o Data Reduction: Reducing the complexity of the data without losing important
information (e.g., dimensionality reduction).
3. Data Exploration:
Here, data scientists perform exploratory data analysis (EDA) to better understand the
data's structure, patterns, and relationships. Visualization techniques and summary
statistics are often used.
4. Data Mining:
This is the core phase, where various data mining techniques are applied to the data.
These techniques help to uncover patterns, trends, and relationships. The primary
methods include:
o Classification: Predicting a categorical label for new data based on historical
data. For example, classifying an email as spam or not.
o Clustering: Grouping similar data points together based on certain attributes. For
instance, clustering customers based on buying behavior.
o Regression: Predicting a continuous numeric value based on input features. For
example, predicting house prices based on factors like location, size, and age.
o Association Rule Mining: Discovering interesting relationships between
variables, such as finding that customers who buy diapers are also likely to buy
baby wipes.
o Anomaly Detection: Identifying rare or unusual patterns, often used for fraud
detection or fault detection.
5. Model Evaluation:
After building the model, it’s crucial to assess its performance. Evaluation metrics
depend on the type of problem:
o For classification, metrics such as accuracy, precision, recall, and F1 score are
used.
o For regression, mean squared error (MSE) or R-squared can be used.
o Cross-validation is often applied to check how well the model generalizes to new,
unseen data (a brief code sketch follows this list).
6. Deployment and Monitoring:
After evaluation, the model can be deployed into production to start making predictions
or identifying patterns in real-time data. Continuous monitoring and updates are required
to maintain model accuracy as new data is collected.
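As a minimal illustration of phases 4 and 5, the sketch below trains a classifier on a small synthetic dataset with scikit-learn and evaluates it with accuracy, F1 score, and cross-validation. The dataset is generated for the example; any labeled historical data would work the same way.

# Minimal sketch of the data mining and model evaluation phases (scikit-learn).
# The synthetic dataset stands in for real historical data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# Toy dataset: 500 rows, 6 features, binary label (e.g., spam / not spam).
X, y = make_classification(n_samples=500, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Phase 4: apply a mining technique (here, classification with a decision tree).
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# Phase 5: evaluate on unseen data and with cross-validation.
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1 score:", f1_score(y_test, pred))
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean())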
Tools and Technologies in Data Mining
There are many tools and technologies available for data mining. Some of the most commonly
used ones include:
Programming Languages:
o Python: Libraries like Pandas, Scikit-learn, TensorFlow, and Keras are widely
used in data mining for data manipulation, analysis, and machine learning (a short
Pandas example follows this list).
o R: Another popular language for statistical analysis and data mining, with
packages like caret, randomForest, and ggplot2 for visualization.
Data Mining Software:
o RapidMiner: A user-friendly tool that provides a wide array of data mining and
machine learning algorithms without much programming.
o KNIME: Open-source data analytics platform with a graphical interface to build
data pipelines.
o Weka: A collection of machine learning algorithms for data mining tasks, with an
easy-to-use interface.
Databases:
o SQL Databases: For managing structured data.
o NoSQL Databases: Used for unstructured or semi-structured data, like
MongoDB or Cassandra.
o Data Warehouses: Specialized databases for reporting and analysis (e.g.,
Amazon Redshift, Google BigQuery).
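As a small illustration of the preprocessing work these libraries are used for, the sketch below cleans, encodes, and scales a toy table with Pandas and scikit-learn. The column names and values are invented for the example.

# Hypothetical toy table; in practice this would come from a database or CSV file.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [25, 32, None, 45, 32],
    "income": [30000, 52000, 41000, None, 52000],
    "segment": ["A", "B", "B", "A", "B"],
})

df = df.drop_duplicates()                      # data cleaning: remove duplicate rows
df = df.fillna(df.mean(numeric_only=True))     # data cleaning: fill missing numeric values
df = pd.get_dummies(df, columns=["segment"])   # data transformation: encode categories

# Data transformation: scale numeric columns to a common range.
df[["age", "income"]] = StandardScaler().fit_transform(df[["age", "income"]])
print(df.head())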
2.DATA WAREHOUSING
Data warehousing is the process of collecting, storing, and managing large volumes of data
from multiple sources into a centralized repository designed for reporting and analysis. It enables
businesses to consolidate data from disparate sources and provides a platform for decision
support and business intelligence (BI). The data warehouse is structured to support query
processing and decision-making activities efficiently.
A typical data warehousing architecture consists of four layers. The data source layer is
the foundation: it consists of all the external data sources that provide the raw data for the
data warehouse. These sources could include:
Operational Databases: These are transactional systems (e.g., CRM, ERP) that manage
day-to-day operations and generate transactional data.
Flat Files: Data can also come from files like CSV, Excel, or log files.
External Data: This could include third-party data sources or data from public APIs.
Social Media, IoT Devices, and Web Scraping: Additional sources of data that can feed
into the data warehouse.
Once data is collected from the data sources, it is moved to the data staging layer. This is a
temporary area where data is processed before it is loaded into the data warehouse. In this layer,
the data undergoes ETL (Extract, Transform, Load) operations.
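A minimal ETL sketch is shown below, assuming a hypothetical sales.csv export and a local SQLite database standing in for the staging area and warehouse; the column names are illustrative, and a real pipeline would use a dedicated ETL tool and a production warehouse.

# Minimal ETL sketch: extract from a (hypothetical) CSV source, transform, load into SQLite.
import sqlite3
import pandas as pd

# Extract: read raw data from an operational export (file name is illustrative).
raw = pd.read_csv("sales.csv")

# Transform: clean and reshape the data for analysis.
raw = raw.dropna(subset=["order_id"])                  # drop rows without a key
raw["order_date"] = pd.to_datetime(raw["order_date"])  # standardize the date format
raw["revenue"] = raw["quantity"] * raw["unit_price"]   # derive an analytical measure

# Load: write the cleaned data into the warehouse table.
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("fact_sales", conn, if_exists="append", index=False)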
The data warehouse layer is the central repository where data is stored after the ETL process.
This layer is designed to support reporting, querying, and analytical tasks. It is the most
important part of the architecture and serves as the source of truth for all business intelligence
activities.
The data presentation layer is where end users interact with the data warehouse. This layer
involves tools and interfaces that allow users to visualize, analyze, and report on the data stored
in the warehouse. Business intelligence (BI) tools are commonly used in this layer.
Business Intelligence (BI) Tools: Tools like Tableau, Power BI, QlikView, and
Looker allow users to create reports, dashboards, and visualizations that provide insights.
OLAP (Online Analytical Processing) Tools: Used alongside BI tools to analyze
multidimensional data and generate reports and dashboards.
3.DATA MINING VS DATA WAREHOUSE
1. Definition
Data Warehousing:
Data warehousing is the process of collecting, storing, and managing large volumes of
historical data from various sources into a centralized repository known as a data
warehouse. This structured repository is designed to support query and reporting
processes, enabling business intelligence (BI) and decision-making.
Data Mining:
Data mining refers to the process of analyzing large datasets to uncover hidden patterns,
correlations, and useful insights. It involves using algorithms and statistical techniques to
extract valuable knowledge from the data stored in databases or data warehouses. Data
mining is more focused on discovering insights rather than storing or organizing data.
2. Primary Focus
Data Warehousing:
The primary focus of data warehousing is on data storage, management, and
consolidation. It involves organizing and storing data from various operational systems
into a centralized location, typically optimized for read-heavy analytical processing.
Data Mining:
Data mining is focused on the analysis of data to find patterns, trends, or relationships.
The goal is to extract actionable insights that can drive decision-making or predict future
outcomes.
3. Purpose
Data Warehousing:
The main purpose of a data warehouse is to store and consolidate data for reporting,
querying, and analysis. A data warehouse supports Business Intelligence (BI) activities
by providing a single source of truth from which companies can draw data for reporting
and decision-making.
Data Mining:
The primary purpose of data mining is to discover patterns and trends in the data. It
uses advanced analytical techniques, such as machine learning, clustering, classification,
and regression, to extract insights that were previously hidden in large datasets.
4. Key Technologies and Techniques
Data Warehousing:
o ETL (Extract, Transform, Load): Used to extract data from multiple sources,
clean and transform it, and load it into the data warehouse.
o OLAP (Online Analytical Processing): Used to analyze multidimensional data
and generate reports and dashboards (a small aggregation sketch follows this list).
o Relational Database Management Systems (RDBMS): For storing structured
data in a way that is optimized for reporting and analysis.
Data Mining:
o Classification: Assigning data into predefined categories or classes.
o Clustering: Grouping data into similar categories.
o Regression: Predicting a continuous value.
o Association Rule Mining: Identifying relationships between variables (e.g., in
market basket analysis).
o Anomaly Detection: Identifying outliers or rare events.
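To make the OLAP idea concrete, the sketch below performs a small OLAP-style roll-up with a Pandas pivot table; the region, product, and amount columns are invented, and a real OLAP engine would run similar aggregations inside the warehouse.

# OLAP-style roll-up sketch: aggregate a measure across two dimensions with pandas.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "amount":  [120, 80, 200, 150, 90],
})

# Rows = region, columns = product, values = total sales (a simple cube slice with totals).
cube = sales.pivot_table(index="region", columns="product",
                         values="amount", aggfunc="sum", margins=True)
print(cube)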
Example Applications
Data Warehousing:
o Business Reporting: A retail chain may have a data warehouse containing sales
data from multiple stores. Analysts can query the warehouse to generate sales
reports, customer behavior insights, and inventory analysis.
o Historical Trend Analysis: A financial organization might use a data warehouse
to store transaction data and track long-term trends, such as stock performance or
client portfolios over time.
Data Mining:
o Customer Segmentation: A company can use data mining to analyze customer
behavior and segment customers into groups based on purchasing patterns,
demographics, and preferences.
o Fraud Detection: In banking, data mining techniques can be applied to detect
unusual transaction patterns that could indicate fraudulent activity (see the sketch
after this list).
o Recommendation Systems: Data mining algorithms are used by e-commerce
sites (like Amazon or Netflix) to recommend products or movies based on users’
past behavior and preferences.
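As a hedged sketch of the fraud detection use case mentioned above, the code below flags unusual transactions with scikit-learn's IsolationForest on a made-up set of transaction amounts; real systems would use many more features and domain-specific rules.

# Anomaly detection sketch: flag unusual transaction amounts with IsolationForest.
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly ordinary transaction amounts, plus a few extreme values.
amounts = np.array([[25], [40], [31], [28], [35], [5000], [27], [33], [7200], [30]])

detector = IsolationForest(contamination=0.2, random_state=0)
labels = detector.fit_predict(amounts)   # -1 = anomaly, 1 = normal

for amount, label in zip(amounts.ravel(), labels):
    flag = "ANOMALY" if label == -1 else "ok"
    print(f"{amount:>7.0f}  {flag}")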
Key Difference: Data warehousing is concerned with storing and organizing data in a centralized
repository, whereas data mining is concerned with analyzing that stored data to discover patterns
and insights.
4.MACHINE LEARNING
Machine Learning (ML) is a branch of artificial intelligence (AI) that focuses on developing
algorithms and statistical models that allow computers to learn from and make predictions or
decisions based on data, without being explicitly programmed. The goal of machine learning is
to enable computers to automatically improve their performance on a task through experience.
A typical machine learning workflow involves the following steps:
1. Data Collection:
Collect relevant data from various sources such as databases, sensors, websites, or logs.
The quality and quantity of the data are crucial for building an effective machine learning
model.
2. Data Preprocessing:
Raw data often requires cleaning and transformation before it can be used in machine
learning models. This involves:
o Handling missing values
o Removing duplicates
o Encoding categorical data
o Normalizing or scaling numerical data
3. Feature Selection and Engineering:
Selecting the most important features (variables) and transforming them into a format
suitable for the model. Feature engineering involves creating new features based on
domain knowledge to improve the model's performance.
4. Model Training:
Train the machine learning model on the prepared data. During training, the algorithm
learns to recognize patterns in the data that correlate with the target output.
5. Model Evaluation:
Once the model is trained, evaluate its performance using testing data that it has not seen
before. Common evaluation metrics include accuracy, precision, recall, F1-score, and
mean squared error (MSE).
6. Model Tuning:
Adjust the hyperparameters (e.g., learning rate, number of layers in a neural network) to
improve the model's performance. Techniques such as cross-validation, grid search, and
random search are commonly used for hyperparameter tuning (a small grid search sketch
follows this list).
7. Model Deployment:
Once the model is trained and tuned, it is deployed into production where it can make
predictions on real-world data.
8. Model Monitoring and Maintenance:
After deployment, it’s important to monitor the model’s performance over time and
retrain it if necessary, especially if the data changes or if the model starts to degrade.
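As a small illustration of the model tuning step (item 6 above), the sketch below uses grid search with cross-validation in scikit-learn to pick hyperparameters for a decision tree; the parameter grid and synthetic dataset are just examples.

# Hyperparameter tuning sketch: grid search with cross-validation (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Candidate hyperparameter values to try (illustrative grid).
param_grid = {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]}

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best hyperparameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)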
Types of Machine Learning
1. Supervised Learning:
In supervised learning, the algorithm is trained using a labeled dataset, where both the
input data (features) and the correct output (labels) are provided. The goal is to learn a
mapping from inputs to outputs, so the model can predict the label for new, unseen data.
o Example: Predicting house prices based on features such as the number of rooms,
location, etc.
o Algorithms: Linear Regression, Logistic Regression, Decision Trees, Support
Vector Machines (SVM), Neural Networks.
2. Unsupervised Learning:
In unsupervised learning, the algorithm is provided with input data but no labels
(outputs). The goal is to identify underlying patterns or groupings in the data. It is often
used for clustering, anomaly detection, or association.
o Example: Grouping customers into segments based on purchasing behavior.
o Algorithms: K-Means Clustering, Hierarchical Clustering, Principal Component
Analysis (PCA), Association Rule Learning.
3. Reinforcement Learning:
In reinforcement learning, an agent interacts with an environment and learns by receiving
rewards or penalties for actions taken. The agent's goal is to maximize the cumulative
reward by choosing the best actions over time.
o Example: Training a robot to navigate a maze by rewarding it for getting closer to
the goal and penalizing it for making wrong moves.
o Algorithms: Q-Learning, Deep Q-Networks (DQN), Policy Gradient Methods.
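The sketch below is a minimal tabular Q-learning example on a made-up five-state corridor, where the agent is rewarded only for reaching the rightmost state; real reinforcement learning problems involve far richer environments and reward structures.

# Tabular Q-learning sketch on a toy corridor: states 0..4, goal at state 4.
# Actions: 0 = move left, 1 = move right.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

def step(state, action):
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    done = next_state == n_states - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))  # learned policy: mostly "right" (1) on non-terminal states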
Machine learning has a wide variety of applications across different industries, such as:
Healthcare:
o Predicting disease outbreaks, diagnosing illnesses, and personalizing treatment
plans.
o Machine learning models can analyze medical images, such as X-rays and MRIs,
to assist doctors in diagnosis.
Finance:
o Fraud detection, credit scoring, algorithmic trading, and risk management.
o Predicting stock prices and market movements based on historical data.
Retail and E-commerce:
o Recommender systems that suggest products based on customer preferences.
o Predicting customer behavior, optimizing inventory, and demand forecasting.
Autonomous Vehicles:
o Machine learning is used to train self-driving cars to recognize objects, navigate
roads, and make decisions in real-time.
Marketing:
o Customer segmentation, targeted advertising, and social media analysis to
personalize marketing efforts.
Natural Language Processing (NLP):
o Sentiment analysis, machine translation, chatbots, and speech recognition.
Image and Video Processing:
o Image recognition, facial recognition, and object detection in security systems,
social media, and autonomous vehicles.
5.SUPERVISED LEARNING
The working of supervised learning can be easily understood from the following example:
Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles,
and polygons. The first step is to train the model on each shape:
o If the given shape has four sides, and all the sides are equal, then it will be labelled as
a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.
Now, after training, we test our model using the test set, and the task of the model is to identify
the shape.
The machine is already trained on all types of shapes, and when it finds a new shape, it classifies
the shape on the basis of the number of sides and predicts the output.
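A hedged sketch of this shape example is shown below, using a decision tree trained on two simple hand-crafted features (number of sides and whether all sides are equal); the lesson example is conceptual, so this encoding is just one possible choice.

# Sketch of the shape example: classify shapes from simple hand-crafted features.
from sklearn.tree import DecisionTreeClassifier

# Features: [number of sides, all sides equal (1/0)] -- an illustrative encoding.
X_train = [[4, 1], [4, 0], [3, 0], [3, 1], [6, 1]]
y_train = ["square", "rectangle", "triangle", "triangle", "hexagon"]

model = DecisionTreeClassifier().fit(X_train, y_train)

# Test: a new shape with four equal sides should be labelled as a square.
print(model.predict([[4, 1]]))   # expected: ['square']
print(model.predict([[3, 0]]))   # expected: ['triangle']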
1. Regression
Regression algorithms are used when there is a relationship between the input variables and a
continuous output variable. They are used for the prediction of continuous values, such as weather
forecasting, market trends, etc. Some popular regression algorithms used in supervised learning
are listed below (a brief sketch follows the list):
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
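As a brief sketch of regression, the code below fits a linear regression model to a tiny invented house price dataset with scikit-learn; a real model would use many more examples and features.

# Regression sketch: predict a continuous value (house price) from numeric features.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Invented data: [size in square feet, number of rooms] -> price
X = [[1000, 2], [1500, 3], [1800, 3], [2400, 4], [3000, 5]]
y = [200000, 280000, 310000, 400000, 500000]

model = LinearRegression().fit(X, y)

pred = model.predict([[2000, 3]])
print("predicted price for a 2000 sq ft, 3-room house:", round(pred[0]))
print("training MSE:", mean_squared_error(y, model.predict(X)))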
2. Classification
Classification algorithms are used when the output variable is categorical, meaning there are
discrete classes such as Yes/No, Male/Female, or True/False. Spam filtering is a common example.
Popular classification algorithms used in supervised learning include:
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines
Advantages of Supervised Learning:
o With the help of supervised learning, the model can predict the output on the basis of
prior experience.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning models help us solve various real-world problems, such as fraud
detection and spam filtering.
Disadvantages of Supervised Learning:
o Supervised learning models are not suitable for handling very complex tasks.
o Supervised learning cannot predict the correct output if the test data is very different from
the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.
6.UNSUPERVISED LEARNING
Unsupervised learning is a type of machine learning in which models are trained using an
unlabeled dataset and are allowed to act on that data without any supervision. Unsupervised
learning problems fall into two main types:
o Clustering: Clustering is a method of grouping objects into clusters such that objects
with the most similarities remain in one group and have few or no similarities with the
objects of another group. Cluster analysis finds the commonalities between data objects
and categorizes them according to the presence or absence of those commonalities (a small
clustering sketch follows the algorithm list below).
o Association: An association rule is an unsupervised learning method used for finding
relationships between variables in a large database. It determines the sets of items that
occur together in the dataset. Association rules make marketing strategies more effective;
for example, people who buy item X (say, bread) also tend to purchase item Y (butter or
jam). A typical example of association rule mining is Market Basket Analysis.
Below is a list of some popular unsupervised learning algorithms:
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular Value Decomposition
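As a minimal sketch of clustering (referenced above), the code below groups made-up customers into segments with K-means based on two invented features; real customer segmentation would use many more behavioural attributes.

# Clustering sketch: segment customers with K-means on two invented features.
import numpy as np
from sklearn.cluster import KMeans

# Columns: [annual spend, number of purchases] -- toy values for illustration.
customers = np.array([
    [200, 4], [250, 5], [220, 3],      # low spenders
    [900, 20], [950, 25], [880, 22],   # frequent, high spenders
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("segment labels:", kmeans.labels_)
print("segment centres:", kmeans.cluster_centers_)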
7.BUSINESS INTELLIGENCE
Business Intelligence (BI) refers to the technologies, processes, and practices used to collect,
analyze, present, and interpret business data to help organizations make informed decisions. BI
helps companies gain insights into their operations, understand market trends, and optimize
strategies for competitive advantage. It involves using various tools, systems, and methodologies
to turn raw data into meaningful and actionable insights. Key challenges in implementing BI
include:
1. Data Quality:
o BI is only as good as the data it analyzes. Poor-quality or incomplete data can lead
to incorrect insights and decisions.
2. Data Security and Privacy:
o Storing and analyzing sensitive data presents security and privacy challenges.
Organizations need to implement strong data protection measures to ensure
compliance with regulations (e.g., GDPR, HIPAA).
3. Integration Issues:
o Integrating data from different systems, especially legacy systems, can be
complex and time-consuming.
4. Cost:
o Implementing and maintaining BI tools and systems can be costly, especially for
small and medium-sized enterprises (SMEs).
5. User Adoption:
o Getting stakeholders to adopt BI tools and fully leverage them for decision-
making can be challenging, particularly in organizations with limited data
literacy.
8.CLOUD COMPUTING
What is cloud computing?
Cloud computing is the on-demand delivery of IT resources over the Internet with pay-as-you-go
pricing. Instead of buying, owning, and maintaining physical data centers and servers, you can
access technology services, such as computing power, storage, and databases, on an as-needed
basis from a cloud provider like Amazon Web Services (AWS).
Organizations of every type, size, and industry are using the cloud for a wide variety of use
cases, such as data backup, disaster recovery, email, virtual desktops, software development and
testing, big data analytics, and customer-facing web applications. For example, healthcare
companies are using the cloud to develop more personalized treatments for patients. Financial
services companies are using the cloud to power real-time fraud detection and prevention. And
video game makers are using the cloud to deliver online games to millions of players around the
world.
Key Characteristics of Cloud Computing
1. On-Demand Self-Service:
o Users can provision computing resources (such as storage or processing power)
automatically, without needing to interact with service providers.
2. Broad Network Access:
o Cloud services are accessible from various devices and locations, as long as
there's internet access, enabling flexible and remote work.
3. Resource Pooling:
o The cloud provider’s resources (e.g., processing power, storage) are pooled
together and distributed to multiple users based on demand. This is achieved
through multi-tenant models.
4. Rapid Elasticity:
o Cloud resources can be scaled up or down quickly, providing flexibility to
accommodate fluctuating workloads.
5. Measured Service:
o Cloud services are metered, meaning users only pay for what they use, based on
usage or resource consumption.
6. Security and Privacy:
o Cloud providers invest in high levels of security for data storage and transfer,
though users also need to implement their own security practices.
Deployment Models
1. Public Cloud:
o Services are delivered over the internet and shared among multiple organizations
(tenants). Examples: AWS, Google Cloud, Microsoft Azure.
2. Private Cloud:
o A dedicated cloud infrastructure for a single organization. This is used when there
are strict security, compliance, or data privacy requirements. It can be hosted
internally or externally by a third-party provider.
3. Hybrid Cloud:
o A mix of public and private clouds that work together, allowing data and
applications to be shared between them. This model provides more flexibility and
optimization.
Common Cloud Services
1. Cloud Storage:
o Services that allow you to store and retrieve data online, such as Dropbox, Google
Drive, and Amazon S3 (a minimal upload sketch follows this list).
2. Cloud Databases:
o Databases provided as a service, such as Amazon RDS, Google Cloud SQL, and
Azure SQL Database.
3. Cloud Hosting:
o Hosting services for websites and applications, like AWS EC2, Google Compute
Engine, or DigitalOcean.
4. Cloud Analytics and AI:
o Cloud-based tools for processing and analyzing data, including services for
machine learning, data analytics, and artificial intelligence (e.g., Google AI, AWS
SageMaker).
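As a hedged sketch of using cloud storage programmatically, the code below uploads and downloads a file with Amazon S3 via the boto3 library; the bucket name and file paths are placeholders, and valid AWS credentials are assumed to be configured.

# Cloud storage sketch: upload and retrieve a file from Amazon S3 with boto3.
# Assumes AWS credentials are already configured (e.g., via environment variables).
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket-name"   # placeholder bucket name

# Upload a local file to the bucket under a chosen key.
s3.upload_file("report.csv", bucket, "reports/report.csv")

# Download it again to a new local path.
s3.download_file(bucket, "reports/report.csv", "report_copy.csv")
print("transfer complete")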
Emerging Trends in Cloud Computing
1. Edge Computing:
o Processing data closer to where it's generated (e.g., on IoT devices) rather than
relying on centralized cloud servers. This reduces latency and bandwidth usage.
2. Serverless Computing:
o Serverless computing, where developers focus on writing code without managing
the underlying infrastructure, is growing rapidly (a minimal handler sketch appears
at the end of this list).
3. AI and Cloud Integration:
o More AI and machine learning services are being integrated into cloud platforms,
making it easier for businesses to leverage advanced analytics.
4. Quantum Computing:
o Cloud-based quantum computing services are beginning to emerge, offering
potential advancements in computing power for specialized tasks.
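A minimal serverless function sketch is shown below in the style of an AWS Lambda handler; the event fields and the response format (suited to an API Gateway trigger) are illustrative assumptions.

# Minimal serverless function sketch (AWS Lambda-style handler).
import json

def lambda_handler(event, context):
    # The event structure depends on the trigger; a query parameter is assumed here.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }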