Introduction to Data Science Course Outline

The document outlines the course 'Introduction to Data Science' offered by Wachemo University, detailing its objectives, learning outcomes, and assessment methods. It covers essential topics such as data exploration, statistical concepts, machine learning, and ethical issues in data science, aiming to equip students with foundational skills in the field. Prerequisites include basic knowledge of algorithms and programming, and the course utilizes Python and various data science tools for practical applications.

Uploaded by

Abdulkarim Emam

WACHEMO UNIVERSITY

COLLEGE OF ENGINEERING AND TECHNOLOGY


COMPUTER SCIENCE DEPARTMENT

Course Title: Introduction to Data Science (CoSc 2042)
Credit Hours: 3 Cr. Hrs
Year: II
Semester: I
Prerequisites:
Office: 4
Email: habteshiferaw27@gmail.com
Instructor: Habtamu sh.

Continuous Assessment (100%)
• Class Participation (5%)
• Mid-exam (20%)
• Assignments (25%)
• Final exam (50%)

Course Description
Data Science is the study of the generalizable extraction of knowledge from data. Being a data
scientist requires an integrated skill set spanning mathematics, statistics, machine learning,
databases and other branches of computer science along with a good understanding of the craft of
problem formulation to engineer effective solutions. This course will introduce students to this
rapidly growing field and equip them with some of its basic principles and tools as well as its
general mindset. Students will learn concepts, techniques and tools they need to deal with various
facets of data science practice, including data collection and integration, exploratory data analysis,
predictive modeling, descriptive modeling, data product creation, evaluation, and effective
communication. The focus in the treatment of these topics will be on breadth, rather than depth,
and emphasis will be placed on integration and synthesis of concepts and their application to
solving problems. To make the learning contextual, real datasets from a variety of disciplines will
be used.

Learning Outcomes
At the conclusion of the course, students should be able to:

▪ Describe what Data Science is and the skill sets needed to be a data scientist.
▪ Explain in basic terms what Statistical Inference means. Identify probability distributions
commonly used as foundations for statistical modeling. Fit a model to data.
▪ Use Python to carry out basic statistical modeling and analysis.
▪ Explain the significance of exploratory data analysis (EDA) in data science. Apply basic
tools (plots, graphs, summary statistics) to carry out EDA.

▪ Describe the Data Science Process and how its components interact.
▪ Use APIs and other tools to scrape the Web and collect data.
▪ Apply EDA and the Data Science process in a case study.
▪ Apply basic machine learning algorithms (Linear Regression, k-Nearest Neighbors (k-NN),
k-means, Naive Bayes) for predictive modeling. Explain why Linear Regression and k-NN
are poor choices for Filtering Spam. Explain why Naive Bayes is a better alternative.
▪ Identify common approaches used for Feature Generation. Identify basic Feature Selection
algorithms (Filters, Wrappers, Decision Trees, Random Forests) and use them in applications.
▪ Identify and explain fundamental mathematical and algorithmic ingredients that constitute a
Recommendation Engine (dimensionality reduction, singular value decomposition, principal
component analysis). Build their own recommendation system using existing components.
▪ Create effective visualization of given data (to communicate or persuade).
▪ Work effectively (and synergistically) in teams on data science projects.
▪ Reason around ethical and privacy issues in data science conduct and apply ethical practices.
Prerequisites
Students are expected to have basic knowledge of algorithms and reasonable programming
experience and some familiarity with basic linear algebra (e.g., solution of linear systems and
eigenvalue/vector computation) and basic probability and statistics. If you are interested in taking
the course, but are not sure if you have the right background, talk to the instructor. You may still
be allowed to take the course if you are willing to put in the extra effort to fill in any gaps.

Topics and course outline:


1. Introduction to Data Science
▪ What is Data Science?
▪ The need for Data Science
▪ Jobs in Data Science
▪ Types of Jobs in Data Science
▪ Components of Data Science
▪ Work Flow of Data Science
▪ Life Cycle (Process) of Data Science
▪ BI (Business Intelligence) Vs. Data Science
▪ Applications of Data Science
▪ Toolboxes for Data Scientists
➢ Introduction
➢ Why Python
➢ Fundamental Python Libraries for Data Scientists
✓ Numeric and Scientific Computation: NumPy and SciPy
✓ SCIKIT-Learn: Machine Learning in Python
✓ PANDAS: Python Data Analysis Library
➢ Data Science Ecosystem Installation
➢ Integrated Development Environments (IDE)
✓ Web Integrated Development Environment (WIDE): Jupyter
➢ Get Started with Python for Data Scientists
✓ Reading, Selecting Data, Filtering Data, Filtering Missing Values, Manipulating
Data, Sorting, Grouping Data, Rearranging Data, Ranking Data and Plotting
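
As a rough illustration of the "Get Started with Python" operations listed above, the sketch below runs them with pandas on a small made-up table. The column names and values are hypothetical and exist only for this example; plotting is left out to keep it short.

```python
import io

import pandas as pd

# Hypothetical CSV content invented for this example (not a course dataset).
csv_text = """city,year,score
Hosaena,2021,71.0
Addis Ababa,2021,85.5
Hawassa,2021,
Hosaena,2022,78.0
Hawassa,2022,90.0
"""

# Reading: an in-memory buffer stands in for a file path here.
df = pd.read_csv(io.StringIO(csv_text))

selected = df[["city", "score"]]                  # selecting columns
recent = df[df["year"] == 2022]                   # filtering rows by a condition
complete = df.dropna(subset=["score"])            # filtering out missing values

# Manipulating and ranking: add a rank column (1 = highest score).
ranked = complete.assign(rank=complete["score"].rank(ascending=False))

by_score = complete.sort_values("score", ascending=False)    # sorting
mean_by_city = complete.groupby("city")["score"].mean()      # grouping

# Rearranging: one row per city, one column per year.
wide = complete.pivot(index="city", columns="year", values="score")
```

With a real dataset, `pd.read_csv` would normally be given a file path or URL instead of the in-memory buffer used here.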
2. Data Exploration, Cleaning and Data visualization
▪ Exploratory Data Analysis (EDA)
▪ Data cleaning and preprocessing techniques
▪ Dealing with missing data and outliers
▪ Data Visualization
▪ Tools for data visualization (e.g., Matplotlib, Seaborn, ggplot2)
▪ Creating static and interactive visualizations
3. Statistical Concepts in Data Science
3.1 Descriptive statistics
▪ Introduction
▪ Descriptive statistics
▪ Exploratory Data Analysis
▪ Estimation
✓ Sample and Estimated Mean, Variance and Standard
3.2 Inferential statistics and hypothesis testing
▪ Introduction
▪ Statistical Inference
▪ Measuring the Variability in Estimates
✓ Point Estimates
✓ Confidence Intervals
▪ Hypothesis Testing
✓ Testing Hypotheses Using Confidence Intervals
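
The topic "Testing Hypotheses Using Confidence Intervals" can be sketched in a few lines of standard-library Python. This is a minimal sketch under two assumptions: the sample values are invented for the example, and the interval uses the normal approximation (z = 1.96 for roughly 95% coverage), whereas for a sample this small a t-based interval would usually be preferred.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.mean(sample)                 # point estimate
    se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    return mean - z * se, mean + z * se


def reject_null(sample, mu0, z=1.96):
    """Reject H0: 'the true mean equals mu0' when mu0 lies outside the interval."""
    low, high = mean_confidence_interval(sample, z)
    return not (low <= mu0 <= high)


# Hypothetical measurements invented for this example.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.3, 4.7, 5.0, 5.1, 4.9]
low, high = mean_confidence_interval(sample)
print(f"approximate 95% CI for the mean: ({low:.3f}, {high:.3f})")
print("reject H0: mean = 5.0?", reject_null(sample, 5.0))
print("reject H0: mean = 5.5?", reject_null(sample, 5.5))
```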

4. Machine learning
▪ Introduction
▪ Supervised learning (e.g., decision trees, random forests, support vector machines)
▪ Unsupervised learning (e.g., clustering, dimensionality reduction)
▪ Evaluation of machine learning models
▪ Three Basic Machine Learning Algorithms
✓ Linear Regression
✓ k-Nearest Neighbors (k-NN)
✓ k-means
▪ Machine Learning Algorithm and Usage in Applications
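
The three basic algorithms named above can each be written from scratch in a few lines of NumPy. The sketch below is illustrative only: the function names and toy data are assumptions made for this example, and in practice a library such as scikit-learn (listed under Software and Tools) would normally be used instead.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, slope_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])      # prepend a column of ones
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


def kmeans(X, k, n_iter=20, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels


# Toy data invented for this example.
X_line = np.array([[0.0], [1.0], [2.0], [3.0]])
y_line = 1.0 + 2.0 * X_line[:, 0]                  # exact line y = 1 + 2x
coef = fit_linear_regression(X_line, y_line)

X_pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_pts = np.array([0, 0, 0, 1, 1, 1])
pred_near_origin = knn_predict(X_pts, y_pts, np.array([0.5, 0.5]))
pred_far = knn_predict(X_pts, y_pts, np.array([5.5, 5.5]))

centroids, cluster_labels = kmeans(X_pts, k=2)
```

Linear regression and k-NN are supervised (they use the labels `y`), while k-means is unsupervised and only sees the points.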
5. Regression analysis
▪ Introduction
▪ Linear regression
✓ Simple linear regression
✓ Multiple & Polynomial regression
▪ Sparse model
▪ Logistic regression
6. Unsupervised learning
▪ Introduction
▪ Clustering
✓ Similarity and distances
✓ Quality measures of clustering
7. Mining Social-Network Graphs
▪ Social networks as graphs
▪ Clustering of graphs
▪ Direct discovery of communities in graphs
▪ Partitioning of graphs
▪ Neighborhood properties in graphs
8. Recommendation Systems: Building a User-Facing Data Product
▪ Algorithmic ingredients of a Recommendation Engine
▪ Dimensionality Reduction
▪ Singular Value Decomposition
▪ Principal Component Analysis
▪ Exercise: build your own recommendation system
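
One possible starting point for the exercise is a truncated singular value decomposition of a user-by-item rating matrix. The sketch below is a simplification under stated assumptions: the ratings are invented, every entry is treated as observed (real rating matrices are mostly missing and need extra handling), and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical ratings invented for this example (rows: users, columns: items).
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])


def low_rank_approximation(matrix, rank):
    """Keep only the `rank` largest singular values (dimensionality reduction)."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank, :]


def recommend(matrix, user, seen, rank=2):
    """Return the unseen item with the highest reconstructed score for `user`."""
    scores = low_rank_approximation(matrix, rank)[user]
    candidates = [i for i in range(matrix.shape[1]) if i not in seen]
    return max(candidates, key=lambda i: scores[i])


approx = low_rank_approximation(ratings, rank=2)
# Pretend user 0 has only seen item 0 and ask for the best remaining item.
suggestion = recommend(ratings, user=0, seen={0})
```

Keeping two singular values captures the two taste groups visible in the toy matrix; choosing the rank for real data is a modeling decision in its own right.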

9. Data Science and Ethical Issues
▪ Discussions on privacy, security, ethics
▪ A look back at Data Science
▪ Next-generation data scientists

Books
1. "Python for Data Analysis" by Wes McKinney
2. "Data Science for Business" by Foster Provost and Tom Fawcett
3. Introduction to Data Science: A Python Approach to Concepts, Techniques and
Applications, Igual, L.; Seguí, S. Springer, ISBN: 978-3-319-50016-4
4. Data Analysis with Python: A Modern Approach, David Taieb, Packt Publishing, ISBN:
9781789950069
5. Python Data Analysis, Second Ed., Armando Fandango, Packt Publishing, ISBN:
9781787127487
Software and Tools:
• Python (Jupyter Notebooks)
• R (optional)
• Data visualization tools (e.g., Matplotlib, Seaborn, ggplot2)
• Machine learning libraries (e.g., scikit-learn, TensorFlow, PyTorch)
Additional references and books related to the course:

• Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1,
Cambridge University Press. 2014. (free online)
• Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020. 2013.
• Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know about
Data Mining and Data-analytic Thinking. ISBN 1449361323. 2013.
• Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning,
Second Edition. ISBN 0387952845. 2009. (free online)
• Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science. (Note:
this is a book currently being written by the three authors. The authors have made the first
draft of their notes for the book available online. The material is intended for a modern
theoretical course in computer science.)
• Mohammed J. Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts
and Algorithms. Cambridge University Press. 2014.
• Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, Third
Edition. ISBN 0123814790. 2011.
