TLM Week 1: Introduction to Data Science
Data science can be seen as the interdisciplinary field that deals with the creation of insights or data
products from a given set of data files (usually in unstructured form), using analytics methodologies.
The data it handles is often what is commonly known as “big data,” although data science is also
applied to conventional data streams, such as those found in a business’s databases, spreadsheets,
and text documents. We’ll take a closer look at big data in the next section.
Data science is not a guaranteed tool for finding the answers to the questions we have about the
data, though it does a good job at shedding some light on what we are investigating. For example, we
may be interested in figuring out the answer to “How can we predict customer attrition based on the
demographics data we have on them?” This is something that may not be possible with that data
alone.
However, investigating the data may help us come up with other questions, like “Can demographics
data supplement a prediction system for attrition, based on the orders customers have made?” Also,
data science is only as good as the data we have, so it doesn’t make sense to expect breathtaking
insights if the data we have is of low quality.
The term “data science” combines two key elements: “data” and “science.”
1. Data: It refers to the raw information that is collected, stored, and processed. In today’s digital
age, enormous amounts of data are generated from various sources such as sensors, social
media, transactions, and more. This data can come in structured formats (e.g., databases) or
unstructured formats (e.g., text, images, videos).
2. Science: It refers to the systematic study and investigation of phenomena using scientific
methods and principles. Science involves forming hypotheses, conducting experiments,
analyzing data, and drawing conclusions based on evidence.
What is data science used for?
1. Descriptive analysis
Descriptive analysis examines data to gain insights into what happened or what is happening in
the data environment. It is characterized by data visualizations such as pie charts, bar charts, line
graphs, tables, or generated narratives. For example, a flight booking service may record data like the
number of tickets booked each day. Descriptive analysis will reveal booking spikes, booking slumps,
and high-performing months for this service.
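To make the descriptive analysis example a bit more concrete, here is a minimal Python (pandas) sketch, assuming a hypothetical bookings.csv file with date and tickets_booked columns; it simply summarizes monthly booking totals to reveal spikes and slumps:

    import pandas as pd

    # Hypothetical daily booking records with columns: date, tickets_booked
    bookings = pd.read_csv("bookings.csv", parse_dates=["date"])

    # Aggregate daily records into monthly totals
    monthly = bookings.groupby(bookings["date"].dt.to_period("M"))["tickets_booked"].sum()

    print(monthly.describe())               # overall summary statistics
    print(monthly.sort_values().tail(3))    # the three highest-performing months

The output of such a summary is exactly the kind of material that would feed the pie charts, bar charts, and line graphs mentioned above.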
2. Diagnostic analysis
Diagnostic analysis is a deep-dive examination of data to understand why something happened. It is
characterized by techniques such as drill-down, data discovery, data mining, and correlations. For
example, the flight booking service might drill down into a booking slump to check whether it
coincided with a price change, a website outage, or a seasonal dip in demand.
3. Predictive analysis
Predictive analysis uses historical data to make accurate forecasts about data patterns that may occur
in the future. It is characterized by techniques such as machine learning, forecasting, pattern
matching, and predictive modeling. In each of these techniques, computers are trained to identify
patterns and relationships in historical data. For example, at the start of each year the flight service
team might use data science to predict flight booking patterns for the coming year. The program may
look at past data and predict booking spikes for certain destinations in May. Having anticipated its
customers’ future travel requirements, the company could start targeted advertising for those cities
from February.
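As a rough sketch of the predictive analysis idea, the following Python example (scikit-learn) fits a simple linear trend to invented monthly booking totals and forecasts the next twelve months; the numbers are made up, and a real system would also model seasonality (such as the May spike):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical monthly booking totals for the past two years
    history = np.array([120, 135, 160, 210, 260, 240, 230, 225, 200, 180, 150, 140,
                        130, 150, 175, 230, 290, 265, 250, 245, 215, 190, 160, 150])
    months = np.arange(len(history)).reshape(-1, 1)

    # Learn a simple linear trend from past data
    model = LinearRegression().fit(months, history)

    # Forecast the next 12 months
    future = np.arange(len(history), len(history) + 12).reshape(-1, 1)
    print(model.predict(future).round())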
4. Prescriptive analysis
Prescriptive analytics takes predictive data to the next level. It not only predicts what is likely to
happen but also suggests an optimum response to that outcome. It can analyze the potential
implications of different choices and recommend the best course of action. It uses graph analysis,
simulation, complex event processing, neural networks, and recommendation engines from machine
learning.
Returning to the flight booking example, prescriptive analysis could examine historical marketing
campaigns to make the most of the upcoming booking spike. A data scientist could project booking
outcomes for different levels of marketing spend on various marketing channels. These forecasts
would give the flight booking company greater confidence in its marketing decisions.
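A minimal sketch of the prescriptive step, assuming hypothetical spend levels and booking forecasts (all numbers invented), could compare the projected net return of each option and recommend the best one:

    # Hypothetical ad-spend options and the bookings a predictive model projects for each
    spend_levels = [10_000, 20_000, 30_000, 40_000]
    projected_bookings = [1_100, 1_900, 2_300, 2_450]
    revenue_per_booking = 80

    # Recommend the spend level with the highest projected net return
    net_returns = [b * revenue_per_booking - s
                   for s, b in zip(spend_levels, projected_bookings)]
    best = max(range(len(spend_levels)), key=lambda i: net_returns[i])
    print(f"Recommended spend: ${spend_levels[best]:,} "
          f"(projected net return ${net_returns[best]:,})")

Real prescriptive systems use far richer techniques (simulation, graph analysis, recommendation engines), but the underlying idea of comparing the implications of different choices is the same.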
Data science is used across many industries, including:
Finance
Marketing
Retail
Transportation
Education
Entertainment
Manufacturing
Energy
Government
DATA SCIENCE VS. BUSINESS INTELLIGENCE
Data Science: Data science is a field in which information and knowledge are extracted from data
using various scientific methods, algorithms, and processes. It can be defined as a combination of
mathematical tools, algorithms, statistics, and machine learning techniques that are used to find
hidden patterns and insights in the data, which help in the decision-making process. Data science
deals with both structured and unstructured data, and it is related to both data mining and big data.
It involves studying historic trends and using those conclusions to redefine present trends and
predict future trends.
Business Intelligence: Business intelligence, in contrast, refers to the technologies and practices used
to collect, integrate, and analyze existing business data in order to report on what has happened and
support day-to-day decision-making.
S. No. | Factor    | Data Science                                                 | Business Intelligence
7.     | Expertise | Its expertise lies with the data scientist.                  | Its expertise lies with the business user.
8.     | Questions | It deals with the questions of what will happen and what if. | It deals with the question of what happened.
DATA SCIENCE VS. STATISTICS
Statistics is a field that is similar to data science and business intelligence, but it has its own domain.
Namely, it involves doing basic manipulations on a set of data (usually tidy and easy to work with) and
applying a set of tests and models to that data. It’s like a conventional vehicle that you drive on city
roads. It does a decent job, but you wouldn’t want to take that vehicle onto country roads or off-road.
For this kind of terrain you’ll need something more robust and better equipped for messy data: data
science. If you have data that comes straight from a database, it’s fairly clean, and all you want to do
is create a simple regression model or check whether February sales are significantly different from
January sales, conventional statistics will work. That’s why statisticians remain in business, even if
most of the methods they use are not as effective as the techniques a data scientist employs.
Scientists make use of statistics, though it is not formally a scientific field. This is an important point.
In fact, even mathematicians look down on the field of statistics, for the simple reason that it fails to
create robust theories that can be generalized to other aspects of mathematics. So, even though
statistical techniques are employed in various areas, they are often seen as less rigorous than most
principles of mathematics and science. Also, statistics is not a fool-proof framework when it comes to
drawing inferences about the data. Despite the confidence metrics it provides, its results are only as
good as the assumptions it makes about the distribution of each variable, and how well these
assumptions hold. This is why many scientists also employ simulation methods to ensure that the
conclusions their statistical models come up with are indeed viable and robust enough to be used in
the real world.
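For the January-versus-February sales check mentioned above, a minimal SciPy sketch might run a two-sample t-test; the daily sales figures here are invented:

    from scipy import stats

    # Hypothetical daily sales for January and February
    january = [210, 195, 230, 220, 205, 215, 225, 200, 240, 210]
    february = [235, 250, 225, 260, 245, 230, 255, 240, 265, 250]

    # Two-sample t-test (unequal variances); a small p-value suggests a significant difference
    t_stat, p_value = stats.ttest_ind(january, february, equal_var=False)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

As the paragraph above notes, the p-value is only as trustworthy as the test's assumptions (for example, roughly normal daily sales), which is one reason simulation methods are often used as a cross-check.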
BIG DATA
Big data refers to extremely large datasets that are complex, grow rapidly, and require advanced
techniques and technologies for storage, analysis, and processing. Here’s an overview of big data, its
characteristics, and its applications:
Characteristics of Big Data
1. Volume: The sheer amount of data generated every second from various sources, such as social
media, sensors, transactions, and more.
2. Velocity: The speed at which new data is generated and the pace at which it must be processed
to be useful.
3. Variety: The different types of data, including structured, semi-structured, and unstructured data
(e.g., text, images, videos, sensor data).
4. Veracity: The quality and accuracy of the data, which can vary and affect the reliability of
analysis.
5. Value: The potential insights and benefits that can be derived from analyzing the data.
6. Variability: The way the same data can carry multiple meanings or be formatted differently across
separate data sources.
Technologies and Tools for Big Data
1. Storage Solutions: Distributed file systems like Hadoop Distributed File System (HDFS) and
cloud storage solutions such as Amazon S3.
2. Processing Frameworks:
Spark: A fast and general-purpose cluster computing system for big data processing (see the
PySpark sketch after this list).
3. Data Management: NoSQL databases (e.g., MongoDB, Cassandra) for handling unstructured
data.
4. Data Integration: Tools like Apache Kafka and Apache NiFi for data ingestion and streaming.
5. Analytics and Machine Learning: Tools and frameworks like Apache Mahout, TensorFlow, and
H2O.ai for big data analytics and machine learning.
6. Visualization: Tools like Tableau, Power BI, and D3.js for visualizing large datasets.
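To make the Spark entry above concrete, here is a minimal PySpark sketch, assuming a hypothetical bookings.csv file with a destination column; it counts bookings per destination on a local Spark session:

    from pyspark.sql import SparkSession

    # Start a local Spark session
    spark = SparkSession.builder.appName("BookingCounts").getOrCreate()

    # Read a (hypothetical) CSV file and count bookings per destination
    df = spark.read.csv("bookings.csv", header=True, inferSchema=True)
    df.groupBy("destination").count().orderBy("count", ascending=False).show(10)

    spark.stop()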
Applications of Big Data
1. Healthcare: Predictive analytics for patient care, personalized medicine, and epidemic outbreak
prediction.
2. Government: Smart city initiatives, public safety, and efficient resource management.
Benefits of Big Data
Cost savings. Big data can be used to pinpoint ways businesses can enhance operational
efficiency. For example, analysis of big data on a company's energy use can help it become more
efficient.
Positive social impact. Big data can be used to identify solvable problems, such as improving
healthcare or tackling poverty in a certain area.
Challenges of Big Data
Skill requirements. Deploying and managing big data systems also requires new skills compared
to those that database administrators and developers focused on relational software typically
possess.
Costs. Using a managed cloud service can help keep costs under control. However, IT managers
still must keep a close eye on cloud computing use to make sure costs don't get out of hand.
Migration. Migrating on-premises data sets and processing workloads to the cloud can be a
complex process.
Accessibility. Among the main challenges in managing big data systems is making the data
accessible to data scientists and analysts, especially in distributed environments that include a
mix of different platforms and data stores. To help analysts find relevant data, data management
and analytics teams are increasingly building data catalogs that incorporate metadata
management and data lineage functions.
Integration. The process of integrating sets of big data is also complicated, particularly when data
variety and velocity are factors.
MACHINE LEARNING
Machine learning is a branch of data science in which algorithms learn patterns from data and use
them to make predictions or decisions without being explicitly programmed for each task. Its main
types are:
1. Supervised Learning: The algorithm is trained on labeled data, where each training example is
paired with an output label, and it learns to predict the output from the input.
2. Unsupervised Learning: The algorithm is trained on unlabeled data and tries to learn the underlying
structure of the data without explicit instructions on what to predict.
3. Semi-Supervised Learning: This approach combines a small amount of labeled data with a large
amount of unlabeled data during training. It falls between supervised and unsupervised learning.
4. Reinforcement Learning: An agent learns by interacting with an environment, receiving rewards or
penalties for its actions, with the goal of maximizing cumulative rewards.
Machine learning is used in various domains to solve complex problems and automate tasks. Some
common applications include:
1. Natural Language Processing (NLP): Used in language translation, sentiment analysis, and
chatbots.
2. Computer Vision: Applied in facial recognition, image classification, and autonomous vehicles.
3. Healthcare: Used for disease prediction, personalized treatment plans, and medical imaging
analysis.
4. Finance: Used for fraud detection, algorithmic trading, and credit scoring.
A typical machine learning project moves through stages such as data preparation, data wrangling,
data analysis, and deployment.
Types of Machine Learning Algorithms
Machine learning algorithms can be broadly classified into several categories based on their
learning styles and the nature of tasks they are designed to solve. Here are the primary types of
machine learning algorithms:
1. Supervised Learning Algorithms
Supervised learning algorithms are trained on labeled data. This means that each training example
is paired with an output label. The algorithm learns to predict the output from the input data.
Linear Regression: Used for predicting continuous values. It models the relationship between
a dependent variable and one or more independent variables using a linear equation.
Logistic Regression: Used for binary classification problems. It predicts the probability of a
binary outcome using a logistic function.
Support Vector Machines (SVM): Used for both classification and regression tasks. It finds the
optimal hyperplane that separates data points of different classes with the maximum margin.
Decision Trees: Used for classification and regression tasks. It splits the data into subsets
based on the value of input features, forming a tree-like structure.
Random Forests: An ensemble learning method that combines multiple decision trees to
improve predictive performance and reduce overfitting.
k-Nearest Neighbors (k-NN): A simple, instance-based learning algorithm that classifies a data
point based on the majority class among its k nearest neighbors.
Naive Bayes: Based on Bayes' theorem, it assumes independence between features and is
used for classification tasks.
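As a minimal illustration of two of the supervised algorithms listed above (logistic regression and random forests), the following scikit-learn sketch trains both on a small bundled dataset and reports held-out accuracy:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Labeled data: feature matrix X and binary labels y
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    for model in (LogisticRegression(max_iter=5000),
                  RandomForestClassifier(random_state=0)):
        model.fit(X_train, y_train)                # learn from labeled examples
        accuracy = model.score(X_test, y_test)     # evaluate on held-out data
        print(type(model).__name__, round(accuracy, 3))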
2. Unsupervised Learning Algorithms
Unsupervised learning algorithms are trained on unlabeled data. They try to learn the underlying
structure of the data without any explicit instructions on what to predict.
K-Means Clustering: Partitions data into k clusters based on feature similarity. Each data
point is assigned to the nearest cluster center.
Hierarchical Clustering: Builds a hierarchy of clusters either by merging smaller clusters into
larger ones (agglomerative) or by splitting larger clusters into smaller ones (divisive).
t-Distributed Stochastic Neighbor Embedding (t-SNE): Used for dimensionality reduction and
visualization of high-dimensional data.
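The following sketch shows k-means clustering, one of the unsupervised algorithms above, on synthetic two-dimensional data (the data is generated on the spot, so nothing here depends on an external file):

    import numpy as np
    from sklearn.cluster import KMeans

    # Unlabeled data: two blobs of 2-D points around different centers
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)),
                   rng.normal(5, 1, (100, 2))])

    # Partition the points into k=2 clusters based on feature similarity
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.cluster_centers_)     # learned cluster centers
    print(kmeans.labels_[:5])          # cluster assignments of the first few points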
3. Semi-Supervised Learning Algorithms
Semi-supervised learning algorithms use a combination of a small amount of labeled data and a
large amount of unlabeled data. This approach helps improve learning accuracy when labeled data
is scarce.
Self-Training: Uses a model trained on labeled data to predict labels for the unlabeled data.
The model is then retrained on the combined dataset.
Co-Training: Utilizes two or more models trained on different views of the data to label the
unlabeled data, iteratively improving each other.
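Here is a minimal self-training sketch using scikit-learn's SelfTrainingClassifier; it starts from a fully labeled toy dataset and hides most of the labels (marking them with -1) to mimic the scarce-label setting:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.semi_supervised import SelfTrainingClassifier
    from sklearn.svm import SVC

    # Hide about 80% of the labels; -1 marks an unlabeled sample
    X, y = load_iris(return_X_y=True)
    rng = np.random.default_rng(0)
    y_partial = y.copy()
    y_partial[rng.random(len(y)) < 0.8] = -1

    # A base classifier iteratively labels the unlabeled points and is retrained
    model = SelfTrainingClassifier(SVC(probability=True, random_state=0))
    model.fit(X, y_partial)
    print(round(model.score(X, y), 3))   # accuracy against the true labels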
4. Reinforcement Learning Algorithms
Reinforcement learning algorithms learn by interacting with an environment. The algorithm, known
as an agent, takes actions and receives rewards or penalties based on the outcomes of those
actions. The goal is to learn a strategy that maximizes cumulative rewards.
Q-Learning: A value-based method that aims to learn the value of taking a particular action in
a particular state.
Deep Q-Networks (DQN): Combines Q-learning with deep neural networks to handle high-
dimensional state spaces.
Policy Gradient Methods: Directly optimizes the policy by adjusting the parameters in the
direction that maximizes expected rewards.
Proximal Policy Optimization (PPO): An advanced policy gradient method that balances
exploration and exploitation to improve training stability.
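As a sketch of the Q-learning idea, the following example learns action values on a made-up five-state corridor where the agent must walk right to reach a reward; the environment, reward, and hyperparameters are all invented for illustration:

    import numpy as np

    # Tiny corridor: states 0..4, actions 0=left and 1=right, reward on reaching state 4
    n_states, n_actions = 5, 2
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate
    rng = np.random.default_rng(0)

    for episode in range(500):
        state = 0
        while state != 4:
            # Epsilon-greedy action selection: sometimes explore, otherwise exploit
            action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
            next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
            reward = 1.0 if next_state == 4 else 0.0
            # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
            Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
            state = next_state

    print(Q.round(2))   # learned values should favor moving right in states 0 through 3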
5. Ensemble Learning Algorithms
Ensemble learning algorithms combine the predictions of multiple base models to produce a final
prediction. This approach often improves the accuracy and robustness of the model.
Bagging (Bootstrap Aggregating): Builds multiple models from different subsamples of the
training dataset and aggregates their predictions. Random Forest is a popular bagging
algorithm.
Boosting: Builds models sequentially, each trying to correct the errors of the previous one.
Common boosting algorithms include AdaBoost, Gradient Boosting, and XGBoost.
Stacking: Combines multiple models by training a meta-model to make the final prediction
based on the outputs of the base models.
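To close the section, here is a minimal scikit-learn sketch that compares bagging (a random forest), boosting (gradient boosting), and stacking on the same small bundled dataset using cross-validation:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import (GradientBoostingClassifier,
                                  RandomForestClassifier, StackingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)

    # One representative of each ensemble style
    models = {
        "bagging (random forest)": RandomForestClassifier(random_state=0),
        "boosting (gradient boosting)": GradientBoostingClassifier(random_state=0),
        "stacking": StackingClassifier(
            estimators=[("rf", RandomForestClassifier(random_state=0)),
                        ("gb", GradientBoostingClassifier(random_state=0))],
            final_estimator=LogisticRegression(max_iter=5000),   # the meta-model
        ),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")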