Data Science and Machine Learning Syllabus
Module 1: Python & Basic Tools
- Python basics (variables, loops, functions)
- NumPy: arrays and matrix operations
- Pandas: data manipulation and analysis
- Matplotlib & Seaborn: data visualization
- Scikit-learn basics
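As a warm-up that ties Python basics (loops, functions) to NumPy's matrix operations, here is a small sketch of matrix multiplication written in plain Python. The function name `matmul` and the list-of-rows representation are assumptions for illustration. With NumPy installed, the same product is normally written `np.array(a) @ np.array(b)`.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p); both are lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    result = []
    for row in a:                      # each row of a
        new_row = []
        for j in range(len(b[0])):     # each column of b
            # dot product of the row of a with column j of b
            new_row.append(sum(row[k] * b[k][j] for k in range(len(b))))
        result.append(new_row)
    return result


# matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]) -> [[19, 22], [43, 50]]
```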
Module 2: Statistics & Probability
- Descriptive statistics (mean, median, std dev, IQR)
- Probability theory and distributions (Normal, Binomial, Poisson)
- Bayes' theorem
- Hypothesis testing (t-test, chi-square, ANOVA)
- Confidence intervals and p-values
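The descriptive statistics listed above can be computed with Python's standard `statistics` module (Python 3.8+ for `quantiles`). This is a minimal sketch; the helper name `describe` is hypothetical. Quartile conventions differ between libraries, so an IQR from pandas or NumPy defaults may not match the `statistics.quantiles` default ("exclusive") exactly.

```python
import statistics


def describe(values):
    """Return mean, median, sample standard deviation and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points, default 'exclusive' method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),  # sample std dev (n - 1 in the denominator)
        "iqr": q3 - q1,                       # spread of the middle 50% of the data
    }


# describe([2, 4, 4, 4, 5, 5, 7, 9]) -> mean 5, median 4.5, iqr 2.5
```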
Module 3: Data Preprocessing
- Handling missing values
- Encoding categorical variables (Label, One-Hot)
- Feature scaling (Normalization, Standardization)
- Outlier detection & treatment
- Data splitting (train/test/validation)
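Normalization and standardization can be sketched in a few lines of plain Python. The function names below are hypothetical; `standardize` divides by the population standard deviation, which is also what scikit-learn's `StandardScaler` uses. Constant features are mapped to zeros here as an assumed convention, since they have no spread to rescale.

```python
import statistics


def min_max_scale(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: shift to mean 0 and scale to unit population std dev."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature
    return [(v - mu) / sigma for v in values]


# min_max_scale([10, 20, 30]) -> [0.0, 0.5, 1.0]
```

In a real workflow the min, max, mean and standard deviation would be computed on the training split only and then reused on the validation and test splits, so no information leaks from held-out data.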
Module 4: Machine Learning Algorithms
- Supervised Learning: Linear Regression, Logistic Regression, Decision Trees, Random Forest, SVM, KNN, Naive Bayes
- Unsupervised Learning: K-Means, Hierarchical Clustering, PCA
- Model Evaluation: Accuracy, Precision, Recall, F1-score, Confusion matrix, ROC-AUC, Cross-validation
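The binary-classification metrics above all derive from the four confusion-matrix counts. This dependency-free sketch assumes labels are 0/1 with `1` as the positive class; the helper names are hypothetical (scikit-learn provides equivalents in `sklearn.metrics`). Metrics with a zero denominator are reported as 0.0 by assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif t == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn


def classification_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
# confusion_counts(y_true, y_pred) -> (3, 2, 1, 2): precision 0.6, recall 0.75
```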
Module 5: Advanced Machine Learning
- Ensemble methods: Bagging, Boosting (AdaBoost, XGBoost)
- Feature selection techniques
- Hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
- Pipelines in Scikit-learn
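To show the idea behind GridSearchCV without any dependencies, here is a sketch of exhaustive grid search scored by k-fold cross-validation. The k-NN classifier on 1-D features, the toy data, and the helper names (`knn_predict`, `cross_val_accuracy`, `grid_search`) are hypothetical choices for illustration. Folds here are contiguous and unshuffled, which is a simplification; real splits are usually shuffled or stratified.

```python
from collections import Counter
from itertools import product


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds):
    """Mean k-NN accuracy over n_folds contiguous, unshuffled folds."""
    fold_size = len(X) // n_folds
    scores = []
    for f in range(n_folds):
        start, stop = f * fold_size, (f + 1) * fold_size
        train_X, train_y = X[:start] + X[stop:], y[:start] + y[stop:]  # hold out one fold
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in range(start, stop))
        scores.append(correct / (stop - start))
    return sum(scores) / len(scores)


def grid_search(X, y, param_grid, n_folds=3):
    """Evaluate every parameter combination; keep the first one with the best score."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(X, y, params["k"], n_folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Two well-separated clusters, classes interleaved so every fold contains both.
X = [1.0, 9.0, 1.5, 8.5, 2.0, 8.0, 1.2, 9.2, 1.8, 8.8, 2.2, 7.8]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best, score = grid_search(X, y, {"k": [1, 3, 5]})
```

Random search differs only in the loop: instead of iterating over every combination from `product`, it evaluates a fixed number of randomly sampled combinations.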
Module 6: Deep Learning (Optional/Advanced)
- Neural networks basics (Perceptron, Backpropagation)
- Introduction to TensorFlow or PyTorch
- CNNs (for images)
- RNNs/LSTMs (for sequences/text)
- Transfer Learning
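Before moving to a framework, it can help to see the perceptron learning rule on its own. This is a minimal plain-Python sketch trained on the logical AND function, which is linearly separable, so the rule is guaranteed to converge. The function names and the AND data are illustrative assumptions; backpropagation generalizes this kind of error-driven update to multi-layer networks with differentiable activations.

```python
def step(z):
    """Heaviside step activation: output 1 only for strictly positive input."""
    return 1 if z > 0 else 0


def predict(w, b, x):
    """Weighted sum of inputs plus bias, passed through the step function."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    """Classic perceptron rule: w += lr * (y - y_hat) * x, b += lr * (y - y_hat)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            error = y - predict(w, b, x)  # -1, 0 or +1
            if error:
                errors += 1
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if errors == 0:  # a full pass with no mistakes: converged
            break
    return w, b


and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]
w, b = train_perceptron(and_inputs, and_labels)
```

Because the weights start at zero, the learning rate only rescales `w` and `b` and does not change which points are classified correctly, which is why `lr=1.0` is a reasonable default here.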
Module 7: Projects & Applications
- Predictive modeling
- Sentiment analysis (NLP)
- Time-series forecasting
- Image classification
- Recommendation systems
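For the recommendation-systems project, one common starting point is user-based collaborative filtering with cosine similarity. This sketch assumes a tiny hypothetical ratings table where each list holds one user's ratings for items 0 to 3 and `0` means "not rated"; treating unrated items as zeros in the similarity is a simplification. The helper names are illustrative, not a library API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    target = ratings[user]
    scores = {}
    for other, vec in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(target, vec)
        for item, r in enumerate(vec):
            if target[item] == 0 and r > 0:  # only score items the user has not rated
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


ratings = {
    "alice": [5, 4, 0, 0],
    "bob":   [5, 5, 4, 0],
    "carol": [0, 1, 0, 5],
}
# recommend(ratings, "alice") -> [2]  (bob, the most similar user, rated item 2)
```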
Tools & Platforms
- Jupyter Notebook / Google Colab
- Git & GitHub
- SQL (for database queries)
- Excel (for simple data analysis)
- Tableau or Power BI (for dashboards)
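SQL can be practiced without installing a database server by using Python's built-in `sqlite3` module and an in-memory database. In this sketch the `sales` table, its columns, and the sample records are hypothetical; the `GROUP BY` aggregation itself is standard SQL and should run on most relational databases with little or no change.

```python
import sqlite3


def totals_by_region(records):
    """Load (region, amount) rows into an in-memory SQLite table and aggregate per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", records)
        query = """
            SELECT region, SUM(amount) AS total, COUNT(*) AS n_sales
            FROM sales
            GROUP BY region
            ORDER BY region
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


records = [("north", 120.0), ("south", 80.0), ("north", 60.0), ("south", 40.0)]
# totals_by_region(records) -> [('north', 180.0, 2), ('south', 120.0, 2)]
```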