CUITM217-DATA-SCIENCE Data
MODULE TEMPLATE
LEVEL 1.1
ASSESSMENT Practicum
Group tasks and presentations on group assignment
Individual in-class activities and programming assignments
Examination
Quiz
Units Module Units
1 – Introduction To Data And Data Science
1.0 Introduction
Data is a collection of information that can be analyzed to gain
insights and knowledge. Data science is the field that deals with the
extraction of knowledge from data. It involves the use of various
techniques, such as statistical analysis, machine learning, and data
visualization, to uncover patterns and trends in data. Data science is
used in various industries, from finance to healthcare, to improve
decision-making and drive innovation. It is a rapidly growing field
and has become an essential part of many businesses and
organizations.
What is data?
The data science lifecycle is the process of using data to solve problems.
It consists of the following stages:
1. Data collection: This stage involves collecting data from a
variety of sources, such as databases, sensors, and social media.
2. Data cleaning: This stage involves removing errors and
inconsistencies from the data.
3. Data preparation: This stage involves transforming the data into
a format that can be easily analyzed.
4. Data analysis: This stage involves using statistical and machine
learning techniques to extract knowledge from the data.
5. Data visualization: This stage involves creating charts and other
visuals that make the findings of the data analysis easy to see.
6. Data communication: This stage involves presenting the
findings of the data analysis to others clearly and concisely.
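As a rough illustration, the six stages above can be sketched as a tiny Python pipeline. The sample readings, the function names, and the plausible range of 0 to 50 are all assumptions made up for this example; a real project would use far richer tools at every stage.

```python
from statistics import mean

def collect():
    # 1. Data collection: a stand-in for a database, sensor, or API call.
    return ["21.5", "22.0", "", "23.0", "abc", "95.0", "21.5"]

def clean(raw, low=0.0, high=50.0):
    # 2. Data cleaning: drop entries that are empty, not numeric,
    #    or outside an assumed plausible range.
    cleaned = []
    for text in raw:
        try:
            value = float(text)
        except ValueError:
            continue
        if low <= value <= high:
            cleaned.append(value)
    return cleaned

def prepare(values):
    # 3. Data preparation: put the values in a form that is easy to analyse
    #    (here, simply sorted).
    return sorted(values)

def analyse(values):
    # 4. Data analysis: simple summary statistics.
    return {"count": len(values), "mean": mean(values), "max": max(values)}

def visualise(values):
    # 5. Data visualization: one text bar per reading, one '#' per whole unit.
    return [f"{v:5.1f} " + "#" * int(v) for v in values]

def communicate(summary):
    # 6. Data communication: a one-sentence report.
    return (f"{summary['count']} valid readings, "
            f"average {summary['mean']:.1f}, highest {summary['max']:.1f}.")

readings = prepare(clean(collect()))
summary = analyse(readings)
print("\n".join(visualise(readings)))
print(communicate(summary))
```

Each stage takes the output of the previous one, which mirrors the way the lifecycle is usually described, although in practice the stages are often revisited several times.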
Machine learning
Statistical analysis
Data visualization
If you are interested in getting started with data science, there are a few
things you can do:
Activity 1.1
Class discussion:
What data science projects have you heard of? How
are they being used to solve real-world problems?
Here are some examples of how businesses are using data today:
1. Start with a clear goal in mind. What are you hoping to achieve
by using data? Once you know your goal, you can start to collect
and analyse the data that will help you achieve it.
2. Use the right tools and techniques. There are a variety of data
analysis tools and techniques available. Choose the ones that are
most appropriate for your needs and the data that you have.
3. Be sceptical of the data. Not all data is accurate or reliable. Be
sure to verify the data before you use it to make any decisions.
4. Communicate the findings effectively. Once you have analysed
the data, you need to communicate the findings to your team and
stakeholders in a way that is clear and concise.
1.7 Summary
Data and data science are crucial components of today's world. Data
refers to the information that is collected, stored, and analysed by
individuals and organizations. Data science, on the other hand, is the
process of extracting insights from this data using various techniques
such as statistical analysis, machine learning, and artificial
intelligence. Data science helps businesses and individuals to make
informed decisions, improve efficiency, and gain a competitive
advantage. It involves various tasks such as data cleaning, data
analysis, data visualization, and data interpretation. In summary,
data and data science play a vital role in today's world and help
individuals and organizations to succeed in their respective fields.
Van der Aalst, W. (2016). Data science in action (pp. 3-23).
Springer Berlin Heidelberg.
2 – INTRODUCTION TO POWER BI/EXCEL
2.0 Introduction
Power BI is a powerful tool for data science that allows you to create
interactive visualizations, reports, and dashboards. It is designed to help
you make sense of large amounts of data and turn that data into
actionable insights. With Power BI, you can easily connect to various
data sources, clean and transform data, and create compelling reports
and visualizations. Whether you are a data scientist, analyst, or business
user, Power BI can help you explore, analyze, and communicate your
data more effectively and efficiently.
2.3 Power BI
Power BI is a business intelligence (BI) and data visualization tool that
can be used by data scientists to analyze and visualize data. It is a cloud-
based service that provides a variety of features for data scientists,
including:
Here are some examples of how data scientists can use Power BI:
Here are some additional tips for data scientists using Power BI:
Class discussion:
Design a piece of code to determine which letter grade
a student has obtained based on a given final mark.
To explore data with Power BI, you can follow these steps:
1. Import your data into Power BI. Power BI can import data from
a variety of sources, including databases, spreadsheets, and cloud
storage services.
2. Clean and prepare your data. Before you can explore your data,
you need to clean it and prepare it for analysis. This may involve
removing errors and inconsistencies from the data and
transforming the data into a format that is easy to analyze.
3. Create data visualizations. Power BI provides a variety of data
visualization tools that can be used to explore data. You can
create charts, graphs, and maps to visualize your data and
identify trends, patterns, and outliers.
4. Interact with your data visualizations. Power BI allows you to
interact with your data visualizations to explore your data
further. For example, you can filter and slice your data to focus
on specific subsets of data. You can also drill down into your
data to get more detailed information.
5. Share your findings. Once you have explored your data and
identified key findings, you can share your findings with others
using Power BI. You can share your data visualizations or create
reports to communicate your findings.
Here are some tips for exploring data with Power BI:
Here are some examples of how you can use Power BI to explore data:
To analyse data with Power BI, you can follow these steps:
1. Import your data into Power BI. Power BI can import data from
a variety of sources, including databases, spreadsheets, and cloud
storage services.
2. Clean and prepare your data. Before you can analyze your data,
you need to clean it and prepare it for analysis. This may involve
removing errors and inconsistencies from the data, and
transforming the data into a format that is easy to analyze.
3. Create data models. Power BI uses data models to represent the
data that is imported into the tool. Data models define the
relationships between different variables in the data set.
4. Apply statistical and machine learning functions. Power BI
provides a variety of statistical and machine learning functions
that can be applied to data. You can use these functions to
perform data analysis, such as calculating averages, correlations,
and regressions.
5. Create data visualizations. Power BI provides a variety of data
visualization tools that can be used to communicate the findings
of your data analysis. You can create charts, graphs, and maps to
visualize your data and identify trends, patterns, and outliers.
6. Share your findings. Once you have analysed your data and
identified key findings, you can share your findings with others
using Power BI. You can share your data visualizations or create
reports to communicate your findings.
Here are some tips for analyzing data with Power BI:
Here are some examples of how you can use Power BI to analyze data:
Activity 2.1
Lab activity
Objective: To learn how to use Power BI to
explore and visualize data.
Materials:
Instructions:
Summary
Power BI is a powerful tool used in data science to create interactive
and visually appealing reports and dashboards from various data
sources. It enables data analysts to explore data, identify patterns, and
make data-driven decisions through its robust data visualization and
analysis features. Power BI is widely used in industries such as finance,
marketing, healthcare, and retail to gain insights and make informed
decisions. With its ability to connect to multiple data sources, Power BI
simplifies data processing and analysis, making it a valuable tool in the
field of data science.
Further reading
Activity 3.1
Lab activity:
Objective: To practice identifying and correcting
common data quality issues.
Materials:
Instructions:
Discussion:
Summary
The steps in the data preparation process include data acquisition, data
cleaning, data transformation, and data integration. Each step is
important in ensuring that the data is of high quality and ready for
analysis.
To help with the data preparation process, there are various tools
available such as data profiling tools, data cleaning tools, and data
transformation tools. These tools can help to automate and streamline
the data preparation process.
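The cleaning, transformation, and integration steps can be sketched in a few lines of Python. The records, field names, and rules below (drop repeated ids, treat a non-numeric age as missing) are assumptions chosen for illustration; data acquisition is represented only by the two hard-coded lists.

```python
# Acquisition stand-ins: two small hypothetical sources.
raw_rows = [
    {"id": "1", "name": " alice ", "age": "34"},
    {"id": "2", "name": "bob", "age": ""},          # missing age
    {"id": "2", "name": "bob", "age": ""},          # duplicate record
    {"id": "3", "name": "CAROL", "age": "forty"},   # invalid age
    {"id": "4", "name": "Dan", "age": "29"},
]
region_rows = [{"id": "1", "region": "North"}, {"id": "4", "region": "South"}]

def clean(rows):
    # Cleaning: keep only the first record seen for each id.
    seen, kept = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            kept.append(dict(row))
    return kept

def transform(rows):
    # Transformation: convert types and normalise text.
    # An age that is not a whole number is recorded as None (missing).
    result = []
    for row in rows:
        age_text = row["age"].strip()
        result.append({
            "id": int(row["id"]),
            "name": row["name"].strip().title(),
            "age": int(age_text) if age_text.isdigit() else None,
        })
    return result

def integrate(rows, extra):
    # Integration: attach the region from the second source by id.
    region_by_id = {int(e["id"]): e["region"] for e in extra}
    return [{**row, "region": region_by_id.get(row["id"])} for row in rows]

prepared = integrate(transform(clean(raw_rows)), region_rows)
for row in prepared:
    print(row)
```

Missing values are kept as `None` rather than dropped here, so the decision about how to handle them stays visible to whoever analyses the data next.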
Further Reading
Soh, J., & Singh, P. (2020). Data Preparation and Data Engineering
Basics. Data Science Solutions on Azure: Tools and Techniques Using
Databricks and MLOps, 65-115.
Zhang, S., Zhang, C., & Yang, Q. (2003). Data preparation for data
mining. Applied Artificial Intelligence, 17(5-6), 375-381.
In data analysis for data science, there are three main areas of focus:
exploratory data analysis (EDA), descriptive data analysis, and
predictive data analysis. EDA involves identifying patterns and
relationships in data, while descriptive data analysis involves
summarizing data using statistical measures. Predictive data analysis
utilizes statistical models to make predictions based on historical data.
Overall, these techniques provide a comprehensive approach to
analyzing data in data science.
4.1 Learning outcomes
Benefits of EDA:
Helps to identify patterns and trends in the data that may not
be immediately obvious
Helps to develop hypotheses about the underlying
relationships between variables
Helps to identify outliers and anomalies in the data
Can be used to inform the design of further data analysis and
modelling
Regression analysis
Classification algorithms
Clustering algorithms
Time series analysis
Example
1. EDA: The data scientist would first use EDA to explore the
data and identify patterns and trends. This might involve
creating histograms and scatter plots to examine the
relationships between variables such as customer
demographics, purchase history, and product type.
2. Descriptive data analysis: The data scientist would then use
descriptive data analysis to summarize and describe the data.
This might involve calculating the average spending per
customer, the most popular product categories, and the
correlation between customer demographics and spending.
3. Predictive data analysis: Finally, the data scientist would use
predictive data analysis to develop a model that can predict
customer spending. This model could then be used by the
company to identify customers who are likely to spend more
money and to develop targeted marketing campaigns.
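The three steps of this example can be sketched on a handful of invented purchase records. The data, the field names, and the very simple prediction rule (use the average spending of customers in the same age group) are assumptions for illustration only; a real project would use a proper statistical or machine learning model for step 3.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical purchase records.
purchases = [
    {"customer": "A", "age_group": "18-30", "category": "books", "amount": 20.0},
    {"customer": "A", "age_group": "18-30", "category": "games", "amount": 40.0},
    {"customer": "B", "age_group": "31-50", "category": "books", "amount": 30.0},
    {"customer": "C", "age_group": "31-50", "category": "garden", "amount": 90.0},
    {"customer": "D", "age_group": "18-30", "category": "books", "amount": 30.0},
]

# 1. EDA: a text histogram of purchases per product category.
category_counts = Counter(p["category"] for p in purchases)
for category, count in category_counts.most_common():
    print(f"{category:<8}{'#' * count}")

# 2. Descriptive analysis: total spending per customer, then the average.
spend_by_customer = defaultdict(float)
group_of_customer = {}
for p in purchases:
    spend_by_customer[p["customer"]] += p["amount"]
    group_of_customer[p["customer"]] = p["age_group"]
average_spend = mean(list(spend_by_customer.values()))
most_popular_category = category_counts.most_common(1)[0][0]

# 3. Predictive analysis: a baseline that predicts a customer's spending
#    from the average spending of known customers in the same age group.
spend_by_group = defaultdict(list)
for customer, total in spend_by_customer.items():
    spend_by_group[group_of_customer[customer]].append(total)

def predict_spend(age_group):
    # Fall back to the overall average for an unseen age group.
    if age_group in spend_by_group:
        return mean(spend_by_group[age_group])
    return average_spend

print("average spend per customer:", average_spend)
print("most popular category:", most_popular_category)
print("predicted spend for a new 31-50 customer:", predict_spend("31-50"))
```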
Activity 4.1
Materials:
Instructions:
Discussion:
Summary
Further Reading
Metre, K. V., Mathur, A., Dahake, R. P., Bhapkar, Y., Ghadge, J., Jain,
P., & Gore, S. (2024). An Introduction to Power BI for Data Analysis.
International Journal of Intelligent Systems and Applications in
Engineering, 12(1s), 142-147.
5: DATA VISUALISATION
5.0 Introduction
Data visualization in data science is the process of presenting data in a
graphical or pictorial format. It involves the use of charts, graphs, and
other visual aids to present complex data sets in a clear and concise
manner. Data visualization is an important part of data science as it
helps analysts and stakeholders understand the data better and make
informed decisions based on the findings. It is used in various fields
including finance, healthcare, marketing, and others, to identify trends,
patterns, and relationships in data sets. Data visualization tools such as
Tableau, Power BI, and QlikView are widely used in the industry to
create interactive and engaging visualizations. Overall, data
visualization plays a crucial role in data science as it helps to
communicate complex information in a simple and meaningful way.
Graphical basics
There are a few basic elements that are common to all data
visualizations:
There are a few things that can be done to make data visualizations more
understandable:
Use clear and concise titles and labels. Make sure that the title of
your visualization accurately reflects what it shows, and that the
labels for the axes and data points are easy to read and
understand.
Choose appropriate colours. Use colours that are visually
appealing and that can be easily distinguished from each other.
Avoid using too many colours, as this can make your
visualization cluttered and confusing.
Use a consistent design. Use the same fonts, colours, and
symbols throughout your visualization to create a consistent look
and feel.
Highlight important features. Use visual cues such as bold text,
different colours, or larger data points to highlight the most
important features of your visualization.
Provide context. Add a brief explanation to your visualization
that provides context for the data and helps the viewer to
understand what they are looking at.
Here are some additional tips for creating effective data visualizations:
Use the right type of visualization for your data. There are many
different types of data visualizations, each with its strengths and
weaknesses. Choose the type of visualization that is most
appropriate for the type of data you are visualizing and the
message you want to communicate.
Keep it simple. Don't try to cram too much information into a
single visualization. Focus on communicating the most important
insights from your data.
Tell a story. Use your visualization to tell a story about your
data. What insights can be drawn from the data? What
implications do the insights have?
Get feedback. Once you have created a visualization, share it
with others and get their feedback. This will help you to identify
any areas where the visualization can be improved.
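Several of the tips above (a clear title, readable labels, a highlighted key feature, and a line of context) can be seen even in a plain text chart. The sketch below is only an illustration: the `bar_chart` function and the quarterly figures are hypothetical, and a real report would use a charting tool rather than text.

```python
def bar_chart(title, values, unit, context, width=30):
    """Return a text bar chart as a list of lines."""
    top = max(values.values())
    label_width = max(len(label) for label in values)
    lines = [title, "=" * len(title)]            # clear title
    for label, value in values.items():
        # Scale each bar so the largest value fills the full width.
        bar = "#" * round(value / top * width)
        # Highlight the most important feature: the highest value.
        marker = "  <-- highest" if value == top else ""
        lines.append(f"{label:<{label_width}} | {bar} {value} {unit}{marker}")
    lines.append(context)                        # context for the viewer
    return lines

# Hypothetical quarterly sales figures.
sales = {"Q1": 120, "Q2": 150, "Q3": 90, "Q4": 300}
chart = bar_chart("Sales per quarter", sales, "units",
                  "Source: invented figures for illustration")
print("\n".join(chart))
```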
Activity 5.1
Materials:
Instructions:
Discussion:
Summary
Further Reading
There are many different types of regression algorithms, each with its
own strengths and weaknesses. Some of the most common regression
algorithms include:
To train a regression model, you need a set of labelled data, where each
data point has a known target value. You can then use a variety of
machine learning libraries to train a regression model on your data.
Once the model is trained, you can evaluate its performance on a held-
out test set. This will help you to assess how well the model generalizes
to unseen data.
Once you have trained and evaluated a regression model, you can use it
to make predictions on new data points. However, it is important to
interpret the results of the model carefully.
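The train, evaluate, and predict workflow described above can be sketched with a simple linear regression written from scratch. The data (hours studied against exam mark), the fixed train/test split, and the function names are assumptions made for this example; in practice a machine learning library and a random split would normally be used.

```python
def fit_line(xs, ys):
    # Ordinary least squares for one feature: returns (slope, intercept).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

def mean_squared_error(model, xs, ys):
    # Average squared difference between predictions and known targets.
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical labelled data: each point has a known target value.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [52, 55, 61, 64, 70, 73, 79, 82]

# Hold out every fourth point as the test set (a fixed split keeps the
# example reproducible).
pairs = list(zip(hours, marks))
train = [p for i, p in enumerate(pairs) if i % 4 != 3]
test = [p for i, p in enumerate(pairs) if i % 4 == 3]

model = fit_line([x for x, _ in train], [y for _, y in train])
test_mse = mean_squared_error(model, [x for x, _ in test], [y for _, y in test])

print("slope and intercept:", model)
print("test mean squared error:", test_mse)
print("predicted mark for 9 hours:", predict(model, 9))
```

The prediction for 9 hours lies outside the range of the training data, which is exactly the kind of result that should be interpreted with care.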
Here are some additional tips for using regression algorithms in data
science:
Choose the right algorithm for the task at hand. There is no one-
size-fits-all regression algorithm. The best algorithm for a
particular task will depend on the specific characteristics of the
data and the problem you are trying to solve.
Prepare your data carefully. Before you train a regression model,
it is important to clean and prepare your data. This includes
removing outliers and missing values.
Tune the hyperparameters. Most regression algorithms have a
number of hyperparameters that can be tuned to improve the
performance of the model. It is important to tune the
hyperparameters for your specific problem.
Evaluate the model carefully. Once you have trained a regression
model, it is important to evaluate its performance on a held-out
test set. This will help you to assess how well the model
generalizes to unseen data.
Use the model responsibly. Regression models can be used to
make predictions, but it is important to use them responsibly. Be
aware of the limitations of the model and do not rely on it to
make important decisions without consulting with other experts.
By following these tips, you can use regression algorithms to
build and deploy effective machine learning models that can
solve real-world problems.
Activity 6.1
Conclusion:
This learning activity will help you to understand how
to use regression algorithms to solve real-world
problems. By experimenting with different regression
algorithms, datasets, features, and hyperparameters,
you can develop the skills you need to build and
deploy effective machine learning models.
Solution:
Summary
Further Reading
Once the model is trained, you can evaluate its performance on a held-
out test set. This will help you to assess how well the model generalizes
to unseen data.
Choose the right algorithm for the task at hand. There is no one-
size-fits-all unsupervised learning algorithm. The best algorithm
for a particular task will depend on the specific characteristics of
the data and the problem you are trying to solve.
Prepare your data carefully. Before you train an unsupervised
learning model, it is important to clean and prepare your data.
This may involve removing outliers and missing values.
Tune the hyperparameters. Most unsupervised learning
algorithms have a number of hyperparameters that can be tuned
to improve the performance of the model. It is important to tune
the hyperparameters for your specific problem.
Evaluate the model carefully. Once you have trained an
unsupervised learning model, it is important to evaluate its
performance on a held-out test set. This will help you to assess
how well the model generalizes to unseen data.
Use the model responsibly. Unsupervised learning models can be
used to discover patterns and relationships in data, but they
cannot explain why these patterns and relationships exist. It is
important to use your domain knowledge to interpret the results
of unsupervised learning models and to identify meaningful
patterns and relationships.
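As one concrete instance of unsupervised learning, the sketch below runs k-means clustering on a few two-dimensional points. k-means is chosen here only as a familiar example; the points, the starting centroids, and the number of clusters (a hyperparameter) are all assumptions. Fixed starting centroids are used so that the result is reproducible.

```python
def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, centroids, iterations=10):
    # Repeat two steps: assign each point to its nearest centroid,
    # then move each centroid to the mean of its assigned points.
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        updated = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                updated.append((sum(x for x, _ in cluster) / len(cluster),
                                sum(y for _, y in cluster) / len(cluster)))
            else:
                updated.append(old)   # keep an empty cluster's centroid
        if updated == centroids:
            break                     # assignments have stopped changing
        centroids = updated
    return centroids, clusters

def inertia(centroids, clusters):
    # Total squared distance of points to their centroid (lower is tighter).
    return sum(squared_distance(p, c)
               for c, cluster in zip(centroids, clusters) for p in cluster)

# Hypothetical unlabelled data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])

print("centroids:", centroids)
print("clusters:", clusters)
print("inertia:", inertia(centroids, clusters))
```

The algorithm reports which points belong together but not why; naming the groups still depends on domain knowledge.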
Activity 7.1
Variations:
Conclusion:
Summary
Further Reading
8: RECOMMENDER SYSTEMS
8.0 Introduction
Recommender systems are a type of algorithm that suggests items to
users based on their preferences and behaviour. They are commonly
used in e-commerce websites, streaming services, and social networks.
By analysing user data such as past purchases, ratings, and browsing
history, recommender systems can make personalized recommendations
to help users find products or content they are likely to enjoy. This can
improve user engagement, increase sales, and enhance the user
experience. However, designing effective recommender systems
requires careful consideration of factors such as data privacy, bias, and
evaluation metrics.
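A minimal sketch of the idea, assuming a simple approach based on shared purchase history: items bought by users with overlapping purchases are scored by the size of the overlap. The users, items, and the `recommend` function are hypothetical, and production systems use considerably more sophisticated methods.

```python
from collections import Counter

# Hypothetical purchase history: user -> set of items bought.
history = {
    "ann": {"laptop", "mouse", "keyboard"},
    "ben": {"laptop", "mouse"},
    "cat": {"mouse", "keyboard", "monitor"},
    "dan": {"laptop"},
}

def recommend(user, history, top_n=2):
    # Score each item the user does not own by how much the users
    # who bought it overlap with this user's own purchases.
    owned = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue                  # no shared taste, ignore this user
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

print("for dan:", recommend("dan", history))
print("for ben:", recommend("ben", history))
```

Because the scores come entirely from past purchases, a brand-new user with no history receives no recommendations, which is one reason evaluation and data considerations matter in practice.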
Here are some examples of how recommender systems are used in the
real world:
Here are some tips for building and deploying effective recommender
systems:
Activity 8.1
Lab Activity:
Conclusion:
Solution:
Summary
Further Reading
Ricci, F., Rokach, L., Shapira, B., & Kantor, P. (2011). Recommender
systems handbook. Springer Science & Business Media.