Full Stack Data Science Syllabus


Data Science & AI Curriculum

1. Introduction to Python

● What is Python?
● Features of Python
● Python 2 vs Python 3
● Installing Python and IDEs (VS Code, PyCharm, Jupyter)
● Writing and running your first Python script
● Python interactive mode vs script mode

2. Python Basics

● Variables and data types (int, float, str, bool, NoneType)
● Comments (single-line, multi-line)
● Type casting
● Input/output functions: input(), print()
● f-strings and formatting strings

3. Operators

● Arithmetic operators
● Assignment operators
● Comparison (Relational) operators
● Logical operators
● Bitwise operators
● Identity operators (is, is not)
● Membership operators (in, not in)
● Operator precedence

4. Conditional Statements

● if, if-else, if-elif-else


● Nested if
● Short-hand if statements
● pass statement

5. Loops and Iteration

● while loop
● for loop
● range() function
● break, continue, pass
● Nested loops
● Loop else block

6. Data Structures in Python

6.1 Strings

● Creating and accessing strings


● String slicing and indexing
● String methods (e.g., upper(), split(), find())
● in keyword
● String formatting (format(), f-strings)

6.2 Lists

● Creating lists
● Indexing, slicing
● List methods (append(), insert(), remove(), sort())
● List comprehensions

6.3 Tuples

● Creating tuples
● Tuple unpacking
● Immutability
● Tuple methods

6.4 Sets

● Creating sets
● Set operations (union, intersection, difference)
● Set methods (add(), remove(), discard())

6.5 Dictionaries

● Creating dictionaries
● Accessing, updating, deleting items
● Dictionary methods (get(), items(), keys(), values())
● Dictionary comprehensions
7. Functions

● Defining functions with def


● Arguments and return values

Types of arguments:

● Default
● Keyword
● Arbitrary (*args, **kwargs)
● Recursion
● Lambda functions
● map(), filter(), reduce()
● zip(), enumerate()
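
As a rough illustration of the argument types and functional helpers listed above, the following minimal sketch combines them in one place. The function name describe and its parameters are hypothetical and chosen only for this example.

```python
from functools import reduce


def describe(name, *scores, unit="pts", **extra):
    """Hypothetical helper mixing positional, arbitrary, default, and keyword arguments."""
    # reduce() folds the scores into a single total, starting from 0.
    total = reduce(lambda acc, s: acc + s, scores, 0)
    tags = ", ".join(f"{key}={value}" for key, value in sorted(extra.items()))
    return f"{name}: {total}{unit}" + (f" ({tags})" if tags else "")


# filter() keeps the even numbers, map() squares them.
squares_of_evens = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, range(6))))

# zip() pairs items from two sequences, enumerate() adds a running index.
indexed_pairs = [(i, a, b) for i, (a, b) in enumerate(zip("ab", [10, 20]))]

print(describe("Ana", 1, 2, 3))        # Ana: 6pts
print(squares_of_evens)                # [0, 4, 16]
print(indexed_pairs)                   # [(0, 'a', 10), (1, 'b', 20)]
```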

8. Modules and Packages

● Importing built-in modules (math, random, datetime, etc.)


● Creating and using user-defined modules
● The if __name__ == "__main__": idiom
● from module import ... syntax
● Installing external packages using pip

9. File Handling

● Opening files (open())


● Reading and writing text files
● Reading and writing CSV files
● Working with file modes: 'r', 'w', 'a', 'rb', 'wb'
● Context manager (with statement)
● File methods: read(), readline(), readlines()

10. Exception Handling

● Syntax: try-except
● finally block
● else block
● Raising exceptions (raise)
● Built-in exceptions (e.g., ValueError, TypeError)
● Creating custom exceptions
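
The try-except-else-finally flow and a custom exception can be sketched as follows. NegativeAmountError and parse_amount are hypothetical names invented for this illustration, not part of any library.

```python
class NegativeAmountError(ValueError):
    """Hypothetical custom exception for this sketch."""


def parse_amount(text):
    """Return (value, events), where events records which blocks ran."""
    events = []
    try:
        value = float(text)              # raises ValueError for non-numeric text
        if value < 0:
            raise NegativeAmountError(f"negative amount: {value}")
    except NegativeAmountError:          # the subclass must be caught before ValueError
        events.append("negative")
        value = None
    except ValueError:
        events.append("invalid")
        value = None
    else:                                # runs only when no exception was raised
        events.append("ok")
    finally:                             # runs in every case
        events.append("done")
    return value, events


print(parse_amount("3.5"))   # (3.5, ['ok', 'done'])
print(parse_amount("abc"))   # (None, ['invalid', 'done'])
```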

11. Object-Oriented Programming (OOP)


● Classes and objects
● __init__ method (constructor)
● self keyword
● Instance vs class variables
● Methods (instance, class, static)
● Inheritance
● Method overriding
● super() function
● Encapsulation, Abstraction, Polymorphism
● Dunder/Magic Methods (__str__, __repr__, etc.)

12. Advanced Python Topics

12.1 Iterators and Generators

● iter(), next()
● Creating custom iterators
● Generator functions using yield
● Generator expressions
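
A minimal sketch of the topics above: the same countdown written once as a generator function and once as a custom iterator class, plus a generator expression. The names countdown and Countdown are illustrative only.

```python
def countdown(start):
    """Generator function: yields start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1


class Countdown:
    """Custom iterator with the same behavior, using __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals the end of iteration
        value = self.current
        self.current -= 1
        return value


# Generator expression: values are computed lazily, one at a time.
squares = (n * n for n in countdown(3))

it = iter([7, 8])                    # iter() builds an iterator from an iterable
print(next(it))                      # 7
print(list(Countdown(3)))            # [3, 2, 1]
```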

12.2 Decorators

● Function decorators
● Chaining decorators
● @property decorator
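
The decorator topics above might look like this in practice. This is a sketch under assumed names (count_calls, shout, greet, Circle), all hypothetical.

```python
import functools


def count_calls(func):
    """Function decorator that counts how often the wrapped function runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def shout(func):
    """Function decorator that upper-cases the returned string."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@count_calls      # applied second (outermost)
@shout            # applied first (innermost)
def greet(name):
    return f"hello, {name}"


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        """Read access looks like an attribute but runs this method."""
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value


print(greet("ana"))   # HELLO, ANA
print(greet.calls)    # 1
```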

12.3 Context Managers

● with statement
● Custom context managers using classes
● contextlib module

12.4 Regular Expressions

● re module
● match(), search(), findall(), sub()
● Meta-characters and special sequences

12.5 Comprehensions

● List, dict, set, and generator comprehensions

13. Working with External Libraries


● requests (HTTP requests)
● json (parsing JSON data)
● os, sys (OS-level operations)
● time, datetime
● shutil, glob
● argparse (CLI arguments)

14. Working with Data (Pandas + NumPy)

Introduction to NumPy arrays and operations

Pandas:

● DataFrame and Series


● Reading/writing data (read_csv, to_csv)
● Indexing, filtering, sorting
● Handling missing data
● Grouping and aggregations

15. Multithreading and Multiprocessing

● threading module
● multiprocessing module
● Use cases and differences

16. Virtual Environments & Packaging

● Creating virtual environments using venv, virtualenv


● requirements.txt
● Creating and installing Python packages
● setup.py, __init__.py

17. Libraries for Data Science

● NumPy: Arrays and Mathematical Operations


● Pandas: DataFrames, Series, Data Manipulation
● Matplotlib & Seaborn: Data Visualization
● SciPy: Scientific Computing and Statistics
18. Web Scraping

● Fetching data from APIs

● Scraping using Beautiful Soup

2. Introduction to Statistics and Math for Data Science

3. Descriptive Statistics

3.1 Central Tendency

● Mean (Arithmetic, Geometric, Harmonic)


● Median
● Mode

3.2 Dispersion (Variability)

● Range
● Variance
● Standard deviation
● Interquartile Range (IQR)
● Coefficient of Variation

3.3 Shape of Data

● Skewness (positive, negative, symmetric)


● Kurtosis (leptokurtic, platykurtic, mesokurtic)
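
As a small worked example of the central tendency and dispersion measures above, here is a sketch using only Python's standard statistics module. The data values are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample values

mean = statistics.mean(data)             # arithmetic mean: 5
median = statistics.median(data)         # middle value: 4.5
mode = statistics.mode(data)             # most frequent value: 4
value_range = max(data) - min(data)      # 7
pop_std = statistics.pstdev(data)        # population standard deviation: 2.0
pop_var = statistics.pvariance(data)     # population variance: 4
cv = pop_std / mean                      # coefficient of variation: 0.4

# Quartile cut points; the exact values depend on the interpolation method
# (statistics.quantiles defaults to the "exclusive" method).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
```

Note that pstdev and pvariance treat the data as a whole population; stdev and variance use the sample (n - 1) formulas instead.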

4. Data Types and Scales of Measurement

● Qualitative vs Quantitative data


● Discrete vs Continuous data

Levels of measurement:

● Nominal
● Ordinal
● Interval
● Ratio

5. Data Collection and Sampling


● Population vs Sample

Types of Sampling:

● Random sampling
● Stratified sampling
● Cluster sampling
● Systematic sampling
● Convenience sampling
● Bias and variability

6. Data Visualization

● Histogram
● Box plot
● Bar chart
● Pie chart
● Scatter plot
● Heatmap
● Pair plots (with seaborn/pandas)

7. Probability Theory

● Basic terminology (experiment, sample space, event)


● Types of events (independent, mutually exclusive, exhaustive)
● Classical vs Empirical vs Subjective probability
● Addition and multiplication rules
● Conditional probability
● Bayes’ Theorem
● Complementary rule
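
Bayes' Theorem and the complementary rule can be illustrated with a short calculation. The scenario (a test with a 1% base rate, 95% sensitivity, and 5% false-positive rate) and the function name bayes_posterior are assumptions made up for this sketch.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence; P(not H) = 1 - P(H) is the complementary rule.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))   # 0.161
```

Even with an accurate test, the posterior stays low here because the prior is small.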

8. Probability Distributions

8.1 Discrete Distributions

● Bernoulli distribution
● Binomial distribution
● Poisson distribution

8.2 Continuous Distributions

● Uniform distribution
● Normal distribution (Gaussian)
● Exponential distribution
● Log-normal distribution

8.3 Key Properties

● Probability Density Function (PDF)


● Cumulative Distribution Function (CDF)
● Z-scores and standardization
● Central Limit Theorem (CLT)

9. Inferential Statistics

● Population vs sample
● Sampling distribution
● Estimation (point estimate vs interval estimate)
● Confidence intervals (CI)

10. Hypothesis Testing

● Null and Alternative Hypothesis (H0 vs H1)


● Type I and Type II errors
● p-value and significance level (α)
● One-tailed vs two-tailed tests
● Steps in hypothesis testing

10.1 Common Tests

● Z-test (one-sample, two-sample)


● T-test (independent, paired)
● ANOVA (One-way and Two-way)
● Chi-Square test (goodness of fit, independence)
● F-test

11. Correlation and Covariance

● Covariance (positive, negative, zero)


● Pearson correlation coefficient
● Spearman rank correlation
● Causation vs correlation

12. Linear Algebra (For ML & Data Science)

● Scalars, vectors, matrices, tensors


● Matrix operations:
● Addition, subtraction, multiplication
● Transpose, inverse, determinant
● Identity and diagonal matrices
● Rank of a matrix
● Linear transformations
● Eigenvalues and eigenvectors
● Dot product and cross product
● Norms (L1, L2)
● Applications in ML (PCA, SVD)

13. Calculus (For ML & Deep Learning)

13.1 Differential Calculus

● Limits and continuity


● Derivatives and rules (product, quotient, chain rule)
● Partial derivatives
● Gradient, slope, tangent
● Optimization (minima, maxima)
● Cost function & gradient descent
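
To make the link between derivatives and optimization concrete, here is a minimal gradient descent sketch on a one-dimensional cost function. The cost f(x) = (x - 3)^2, the learning rate, and the step count are arbitrary choices for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Cost f(x) = (x - 3)^2 has derivative f'(x) = 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))   # 3.0
```

With this learning rate the distance to the minimum shrinks by a factor of 0.8 per step; a rate that is too large would make the iterates diverge instead.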

13.2 Integral Calculus

● Indefinite and definite integrals


● Area under the curve
● Applications in ML (e.g., loss functions)

14. Discrete Mathematics

● Set theory
● Logic and truth tables
● Functions and relations
● Graph theory basics (nodes, edges, trees)

Combinatorics:
● Permutations and combinations
● Factorials and counting problems

SQL

1. Introduction to Databases & SQL

● What is a database?
● Types of databases: Relational vs Non-relational
● What is SQL?
● RDBMS vs DBMS
● SQL dialects (MySQL, PostgreSQL, SQLite, SQL Server, Oracle)
● Setting up environment (MySQL Workbench / PgAdmin / SQLite)

2. Database & Table Basics

● Creating a database: CREATE DATABASE


● Using a database: USE
● Dropping a database: DROP DATABASE
● Creating tables with CREATE TABLE
● Data types (INT, VARCHAR, TEXT, DATE, BOOLEAN, etc.)

Constraints:

● PRIMARY KEY
● FOREIGN KEY
● NOT NULL
● UNIQUE
● DEFAULT
● CHECK
● Dropping tables: DROP TABLE
● Modifying tables: ALTER TABLE (ADD, DROP, MODIFY column)

3. Basic Data Operations (CRUD)

● INSERT data into tables


● SELECT data from tables
● UPDATE existing records
● DELETE records from tables

4. Filtering and Sorting


● WHERE clause
● Logical operators: AND, OR, NOT
● Comparison operators: =, !=, <, >, <=, >=
● Pattern matching: LIKE, NOT LIKE, %, _
● BETWEEN, IN, IS NULL, IS NOT NULL
● ORDER BY clause (ASC/DESC)
● LIMIT and OFFSET

5. Working with Functions

Aggregate functions:

● COUNT(), SUM(), AVG(), MAX(), MIN()

String functions:

● LENGTH(), UPPER(), LOWER(), CONCAT(), SUBSTRING(), REPLACE(), TRIM()

Date functions:

● NOW(), CURDATE(), DATEDIFF(), DATE_ADD(), EXTRACT()

Mathematical functions:

● ROUND(), CEIL(), FLOOR(), MOD()

6. Grouping and Aggregating

● GROUP BY clause
● HAVING clause (vs WHERE)
● Grouping multiple columns
● Nested aggregations
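
The difference between WHERE and HAVING can be tried out with Python's built-in sqlite3 module and an in-memory database. The orders table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")       # throwaway in-memory database
conn.execute("CREATE TABLE orders (customer TEXT NOT NULL, amount REAL NOT NULL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 10.0), ("ana", 30.0), ("ben", 5.0), ("cy", 50.0)],
)

rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount > 0              -- filters individual rows before grouping
    GROUP BY customer
    HAVING SUM(amount) >= 40      -- filters whole groups after aggregation
    ORDER BY total DESC
    """
).fetchall()
conn.close()

print(rows)   # [('cy', 1, 50.0), ('ana', 2, 40.0)]
```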

7. SQL Joins (Combining Tables)

● INNER JOIN
● LEFT JOIN (LEFT OUTER JOIN)
● RIGHT JOIN (RIGHT OUTER JOIN)
● FULL OUTER JOIN
● CROSS JOIN
● Joining more than 2 tables
● Aliases with joins
● Self joins

8. Subqueries & Nested Queries


● Subquery in SELECT
● Subquery in WHERE and FROM
● Correlated subqueries
● EXISTS, NOT EXISTS
● IN vs EXISTS

9. Views

● Creating views: CREATE VIEW


● Updating views
● Dropping views
● Advantages and limitations

10. Indexes & Performance

● Creating indexes: CREATE INDEX


● Unique index
● Composite index
● Dropping indexes
● Performance benefits and trade-offs

11. Transactions and ACID

● What is a transaction?
● BEGIN, COMMIT, ROLLBACK
● Savepoints
● ACID properties (Atomicity, Consistency, Isolation, Durability)
● Isolation levels

12. Stored Procedures and Functions

● CREATE PROCEDURE
● CALL, IN, OUT parameters
● CREATE FUNCTION
● Differences between procedures and functions
● Dropping procedures and functions

13. Triggers and Events

● What are triggers?


● BEFORE and AFTER triggers
● Row-level vs statement-level triggers
● Creating, updating, and deleting triggers
● Scheduled events
14. Advanced SQL

● CASE statement
● Common Table Expressions (CTEs): WITH clause

Window functions:

● ROW_NUMBER(), RANK(), DENSE_RANK()


● LEAD(), LAG(), NTILE()
● Pivot tables using SQL
● Recursive queries

15. Database Normalization

● What is normalization?
● 1NF, 2NF, 3NF, BCNF
● Decomposition and dependency preservation
● Denormalization

4. Data Visualization and BI Tools


4.1. Power BI

● Data Loading and Transformation


● Power Query Editor
● DAX Formulas and Measures
● Creating Interactive Dashboards
● Publishing and Sharing Reports

4.2. Tableau

● Data Connection and Preparation


● Creating Basic and Advanced Charts
● Filters, Parameters, Calculated Fields
● Dashboards and Storytelling
● Tableau Prep for Data Cleaning

5. Data Wrangling & Exploratory Data Analysis (EDA)


● Handling Missing Values
● Data Type Conversion
● Outlier Detection
● Feature Engineering & Feature Selection
● Correlation and Trend Analysis
● Techniques for Handling Imbalanced Data

Machine Learning
1. Introduction to Machine Learning
What is Machine Learning?
Types of ML:

● Supervised Learning
● Unsupervised Learning
● Semi-supervised Learning
● Reinforcement Learning
● ML vs AI vs Deep Learning

2. Data Preprocessing
● Importing datasets
● Handling missing data
● Mean/Median/Mode imputation
● Forward/Backward fill
● Handling categorical data
● Label Encoding
● One-Hot Encoding
● Feature Scaling
● MinMaxScaler
● StandardScaler
● Train-Test Split
● Pipeline creation using Pipeline and ColumnTransformer

3. Exploratory Data Analysis (EDA)


● Summary statistics
● Univariate, bivariate, multivariate analysis
● Correlation matrix & heatmap
● Outlier detection and treatment (Z-score, IQR)
● Distribution plots (histogram, boxplot, KDE)
● Feature importance analysis

4. Supervised Learning – Regression


4.1 Linear Regression
● Simple and Multiple Linear Regression
● Assumptions of linear regression
● Evaluation metrics: MAE, MSE, RMSE, R²

Regularization:

● Lasso Regression
● Ridge Regression
● ElasticNet

4.2 Polynomial Regression


● Polynomial features
● Overfitting and underfitting

5. Supervised Learning – Classification


5.1 Logistic Regression
● Binary and Multiclass classification
● Sigmoid function
● Confusion matrix
● Accuracy, Precision, Recall, F1-Score, ROC-AUC
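
The confusion matrix and the metrics derived from it can be computed by hand, as in this sketch. The function name binary_metrics, the [[TN, FP], [FN, TP]] layout, and the label lists are assumptions for illustration; returning 0.0 when a denominator is zero is one possible convention.

```python
def binary_metrics(y_true, y_pred):
    """Confusion matrix and common metrics for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = binary_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 1, 0],
    y_pred=[1, 0, 1, 0, 1, 0, 1, 0],
)
print(metrics["confusion"])   # [[3, 1], [1, 3]]
```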

5.2 K-Nearest Neighbors (KNN)


5.3 Support Vector Machines (SVM)
● Linear and non-linear SVM
● Kernel trick (RBF, polynomial, sigmoid)

5.4 Decision Trees


● Gini index vs Entropy
● Overfitting and pruning

5.5 Random Forest


● Ensemble concept
● Feature importance
● Hyperparameter tuning
5.6 Gradient Boosting
● AdaBoost
● XGBoost
● LightGBM
● CatBoost

6. Unsupervised Learning
6.1 Clustering
● K-Means clustering
● Elbow method & silhouette score
● Hierarchical clustering
● DBSCAN

6.2 Dimensionality Reduction


● PCA (Principal Component Analysis)
● t-SNE
● LDA (Linear Discriminant Analysis)

6.3 Association Rule Learning


● Apriori algorithm

7. Model Evaluation and Selection


● Cross-validation (K-Fold, Stratified K-Fold)
● GridSearchCV vs RandomizedSearchCV
● Bias-Variance tradeoff
● Learning curves
● Precision-Recall tradeoff
● ROC-AUC Curve

8. Feature Engineering
● Feature creation and extraction
● Handling date and time features

9. Time Series Forecasting (Basic)


● Time series components
● Lag features, rolling statistics
● AR, MA, ARMA, ARIMA
● Stationarity and differencing
● ACF & PACF plots
● Seasonal decomposition
● Prophet model

Deep Learning
1. Introduction to Deep Learning
● What is Deep Learning?
● Deep Learning vs Machine Learning
● Why Deep Learning? Use-cases and advantages
● History and evolution of neural networks
● AI → ML → DL → ANN/CNN/RNN → Transformers

2. Neural Networks Basics


● Biological Neuron vs Artificial Neuron
● Perceptron (single-layer)

Activation Functions:

● Sigmoid
● Tanh
● ReLU
● Leaky ReLU
● Softmax
● Feedforward and Backpropagation

Loss functions:

● MSE, MAE
● Cross-entropy loss

Optimizers:

● SGD
● Adam
● RMSProp
● Momentum
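
Several of the activation and loss functions listed above fit in a few lines of plain Python. This is a minimal sketch for scalar inputs and short lists; real frameworks provide vectorized versions.

```python
import math


def sigmoid(x):
    """Squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Passes positive values through and clips negatives to zero."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return x if x > 0 else slope * x


def softmax(logits):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(logits)                                # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Cross-entropy loss for one example with a single correct class."""
    return -math.log(probs[target_index])


probs = softmax([1.0, 2.0, 3.0])
loss = cross_entropy(probs, target_index=2)
```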

3. Building Neural Networks from Scratch


● NumPy implementation of ANN
● Forward pass and backward pass
● Weight initialization
● Hyperparameters: learning rate, batch size, epochs
● Underfitting vs Overfitting
● Train/Validation/Test split
● Regularization:
● L1/L2
● Dropout
● Early stopping

4. Deep Neural Networks (DNN)


● Multi-layer Perceptrons (MLPs)
● Depth vs Width of networks
● Vanishing/exploding gradients
● Batch Normalization
● Model tuning: grid search, random search
● Saving/loading models

5. Computer Vision with Convolutional Neural Networks (CNN)

● Image basics: pixels, channels, filters
● Convolution operation
● Pooling layers: MaxPool, AvgPool
● CNN architecture:
● LeNet
● AlexNet
● VGG16/VGG19
● ResNet (skip connections)
● Inception, MobileNet, EfficientNet
● Data augmentation
● Transfer Learning & Fine-tuning
● Image classification, object detection basics
Natural Language Processing
1. Introduction to NLP
● What is NLP?
● NLP vs NLU vs NLG

Applications of NLP:

● Chatbots
● Sentiment Analysis
● Search Engines
● Spam Detection
● Machine Translation
● Challenges in NLP
● Structured vs Unstructured Text

2. Text Preprocessing
● Text cleaning basics
● Tokenization
● Word tokenization
● Sentence tokenization
● Normalization
● Lowercasing
● Removing punctuation, special characters
● Removing stopwords
● Stemming vs Lemmatization
● POS (Part-of-Speech) tagging
● Named Entity Recognition (NER)
● Spell correction
● Regex for text patterns

3. Feature Extraction from Text


● Bag of Words (BoW)
● Term Frequency (TF)
● Inverse Document Frequency (IDF)
● TF-IDF
● N-grams (unigram, bigram, trigram)
● Vocabulary and document matrix

Text vectorization with:

● CountVectorizer
● TfidfVectorizer
● Document frequency analysis
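
Bag of Words and TF-IDF can be computed from scratch on a toy corpus, as sketched below. The corpus is made up, and the formula is the plain textbook form tf * log(N / df); libraries such as scikit-learn's TfidfVectorizer use smoothed and normalized variants by default, so their numbers will differ.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data for illustration only).
docs = ["the cat sat", "the dog sat", "the cat ran fast"]
tokenized = [doc.lower().split() for doc in docs]          # whitespace word tokenization
vocabulary = sorted({word for doc in tokenized for word in doc})


def bag_of_words(doc):
    """Raw term counts for one document, ordered by the vocabulary."""
    counts = Counter(doc)
    return [counts[word] for word in vocabulary]


def tf(term, doc):
    """Term frequency: share of the document's tokens equal to term."""
    return doc.count(term) / len(doc)


def idf(term):
    """Inverse document frequency; term must occur in at least one document."""
    df = sum(term in doc for doc in tokenized)
    return math.log(len(tokenized) / df)


def tf_idf(term, doc):
    return tf(term, doc) * idf(term)


bow_matrix = [bag_of_words(doc) for doc in tokenized]
print(vocabulary)      # ['cat', 'dog', 'fast', 'ran', 'sat', 'the']
print(bow_matrix[0])   # [1, 0, 0, 0, 1, 1]
```

A word that appears in every document, such as "the" here, gets an IDF of zero and therefore carries no weight.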

4. Word Embeddings (Vector Representations)


● One-hot encoding
● Word2Vec (CBOW & Skip-gram)
● GloVe (Global Vectors for Word Representation)
● FastText
● Comparison: BoW vs Word2Vec vs GloVe
● Visualizing embeddings using t-SNE or PCA

9. MLOps (Machine Learning Operations)


● CI/CD for ML
● Model Monitoring and Logging
● MLflow, DVC
● Git, GitHub Actions
● Deployment using Cloud Services (AWS, GCP, Azure)

Generative AI
1. Introduction to Generative AI
● What is Generative AI?
● Discriminative vs Generative models
● Why is Generative AI important?
● Evolution of GenAI

Real-world applications:

● ChatGPT, Bard, Claude


● DALL·E, MidJourney, Stable Diffusion
● Music & voice generation
● Code generation

2. Foundations of Generative Models


● What is a generative model?
● Data distribution learning
● Types of generative models:
● Explicit vs implicit models
● Directed vs undirected models
● Key architectures: VAEs, GANs, Transformers

3. Transformers & Foundation Models


● Introduction to Transformer architecture
● Self-attention
● Positional encoding
● Multi-head attention
● Encoder vs Decoder vs Encoder-Decoder models
● Pretraining & fine-tuning

4. Large Language Models (LLMs)


● What are LLMs?
● BERT vs GPT

Key models:

● GPT-2, GPT-3, GPT-4


● T5
● PaLM
● LLaMA, Mistral, Falcon, Claude, Gemini
● Tokenization: BPE, SentencePiece
● Prompt engineering
● Chain-of-Thought reasoning
● In-context learning & few-shot learning
● Instruction-tuning
● Retrieval-Augmented Generation (RAG)

5. Diffusion Models
● What are diffusion models?
● Forward & reverse processes
● Denoising Diffusion Probabilistic Models (DDPM)
● Training and sampling process

Popular models:

● Stable Diffusion
● Imagen
● GLIDE

6. Multimodal Generative AI
● What is multimodal AI?
● Combining text, images, audio, and video
● CLIP (Contrastive Language–Image Pretraining)
● Flamingo (text + image)
● Visual Question Answering (VQA)
● Audio + Text (Whisper + GPT)
● Video generation models

7. Fine-Tuning & Customization


● Fine-tuning vs Prompt-tuning vs LoRA
● Dataset preparation for fine-tuning
● Parameter-efficient tuning (PEFT)

Tools:

● PEFT by Hugging Face


● DeepSpeed, bitsandbytes, QLoRA
● Training custom LLMs or image generators

8. Hugging Face Transformers, Datasets, Diffusers


● LangChain for LLM workflows
● OpenAI API (ChatGPT, DALL·E, Whisper)
● Vertex AI, Azure OpenAI, Amazon Bedrock
● Replicate.com for hosted models
● Gradio / Streamlit for GenAI apps
● Weights & Biases (WandB) for experiment tracking

11. BASICS OF AI AGENTS

11.1. Introduction to AI Agents

● What is an agent in AI?


● Agent = Perceives + Acts
● Types of agents:
o Simple Reflex Agents
o Model-based Reflex Agents
o Goal-based Agents
o Utility-based Agents
o Learning Agents

11.2. Agent Architecture

● Perception → Decision → Action


● PEAS Framework (Performance measure, Environment, Actuators, Sensors)
● Environment types:
o Fully vs Partially Observable
o Deterministic vs Stochastic
o Episodic vs Sequential
o Static vs Dynamic
o Discrete vs Continuous

11.3. Simple Agent Programs

● Rule-based agents
● IF-THEN rules
● Condition-action rules

11.4. Problem Solving Agents

● Search problem formulation


● Uninformed search (DFS, BFS, UCS)
● Informed search (A*, Greedy)
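
As one concrete instance of uninformed search, the sketch below runs breadth-first search (BFS) on a small graph and returns a path with the fewest edges. The graph and the function name bfs_path are hypothetical.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a shortest path (by edge count) from start to goal, or None."""
    frontier = deque([start])            # FIFO queue of nodes to expand
    parent = {start: None}               # doubles as the visited set
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:      # walk parent links back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                frontier.append(neighbor)
    return None


# Hypothetical directed graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(bfs_path(graph, "A", "E"))   # ['A', 'B', 'D', 'E']
```

Replacing the FIFO queue with a stack gives depth-first search, and replacing it with a priority queue ordered by path cost gives uniform-cost search.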

12. Career Readiness


● Resume Building
● LinkedIn Optimization
● Mock Interviews
● Project Presentation Skills
● Freelancing & Portfolio Websites
