
Q3.

Handling missing data is crucial for effective machine learning. Here are some common techniques (sketched in code further below):
1. *Deletion Methods*:
- *Listwise Deletion*: Removes rows with any missing values. Adv: simple; maintains dataset integrity if data is missing completely at random (MCAR).
- *Pairwise Deletion*: Uses all available data for each analysis. Adv: retains more data than listwise deletion.
2. *Imputation Methods*:
- *Mean/Median/Mode Imputation*: Replaces missing values with the mean, median, or mode. Adv: easy and quick; preserves the overall data distribution.
- *K-Nearest Neighbors (KNN) Imputation*: Imputes based on the average of the k nearest neighbors. Adv: accounts for data similarity; often more accurate.
3. *Advanced Techniques*:
- *Multiple Imputation*: Creates several imputed datasets and combines the results. Adv: accounts for uncertainty in imputed values; provides robust statistical inference.
- *Expectation-Maximization (EM)*: Iteratively estimates missing values to maximize the data likelihood. Adv: statistically robust; handles complex data well.
These techniques improve model performance and robustness by addressing missing data effectively.

Maximum margin classification, epitomized by Support Vector Machines (SVM), finds the hyperplane that separates the classes with the widest margin. This approach aims to enhance model generalization by focusing on the most challenging instances, known as support vectors. SVM accomplishes this by optimizing the margin, ensuring robustness to noise and better handling of outliers. While initially designed for linearly separable data, SVM's kernel trick extends its applicability to nonlinear datasets by mapping them into higher-dimensional spaces. Consequently, maximum margin classification with SVM remains a prominent choice for binary classification tasks, renowned for its strong performance and resilience across domains.

Data preprocessing is a crucial step in preparing raw data for analysis or machine learning. It involves several tasks: cleaning, where missing or erroneous values are handled; transformation, where data is normalized or standardized for uniformity; feature selection, to identify the most relevant attributes; and feature engineering, creating new features or modifying existing ones to enhance model performance. Additionally, data may be encoded or scaled for compatibility with algorithms, and dimensionality reduction techniques such as PCA or t-SNE can be applied to reduce complexity. Overall, effective preprocessing ensures data quality, improves model accuracy, and expedites the learning process.
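A minimal sketch of the deletion and imputation methods listed above, using pandas and scikit-learn; the toy DataFrame and its values are invented purely for illustration:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

# Toy data with missing entries (NaN); values are made up
df = pd.DataFrame({"age": [25, np.nan, 40, 35],
                   "income": [50.0, 62.0, np.nan, 58.0]})

# Listwise deletion: drop every row that contains a missing value
listwise = df.dropna()

# Mean imputation: replace NaNs with the column mean
mean_filled = SimpleImputer(strategy="mean").fit_transform(df)

# KNN imputation: fill NaNs from the k nearest complete rows
knn_filled = KNNImputer(n_neighbors=2).fit_transform(df)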
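As a rough sketch of maximum margin classification with the kernel trick, scikit-learn's SVC can be tried on a synthetic nonlinear dataset; the dataset and all parameter choices here are arbitrary, not part of the original answer:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data that is not linearly separable
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps points into a higher-dimensional
# space where a maximum-margin separating hyperplane exists
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)

print("number of support vectors:", len(clf.support_vectors_))
print("test accuracy:", clf.score(X_te, y_te))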
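A sketch of how the preprocessing steps described above might be chained with a scikit-learn Pipeline; the tiny input matrix is fabricated for the example:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Fabricated raw data with a couple of missing values
X_raw = np.array([[1.0, 200.0, 3.0],
                  [2.0, np.nan, 1.0],
                  [1.5, 180.0, np.nan],
                  [0.5, 210.0, 2.0]])

preprocess = Pipeline([
    ("clean", SimpleImputer(strategy="median")),  # cleaning: handle missing values
    ("scale", StandardScaler()),                  # transformation: standardize features
    ("reduce", PCA(n_components=2)),              # dimensionality reduction
])
X_clean = preprocess.fit_transform(X_raw)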

Q4. Bagging (Bootstrap Aggregating) and Boosting are both ensemble learning techniques used to improve the performance of machine learning models, but they have key differences (contrasted in the code sketch below):
*Bagging*:
- *Independent Models*: Trains multiple models independently on random subsets of the training data (sampled with replacement).
- *Parallel Training*: Models are trained in parallel.
- *Goal*: Reduces variance by averaging predictions, leading to more stable, less overfit models.
- *Example*: Random Forest, where each tree is trained on a bootstrap sample.
*Boosting*:
- *Sequential Models*: Trains models sequentially, where each model corrects the errors of its predecessor.
- *Sequential Training*: Models are trained one after another.
- *Goal*: Reduces bias by focusing on hard-to-predict instances, improving overall model accuracy.
- *Example*: AdaBoost and Gradient Boosting, where each new model is added to correct the mistakes of the combined previous models.

Supervised learning is a machine learning paradigm where algorithms learn from labeled data consisting of input-output pairs. The algorithm aims to learn the mapping between input features and the corresponding output labels, enabling it to make predictions or decisions on new, unseen data. Supervised learning tasks include classification, where the output is a category or class label, and regression, where the output is a continuous numerical value. Common supervised learning algorithms include linear regression, logistic regression, decision trees, support vector machines, and neural networks. Supervised learning finds applications in fields such as image recognition, natural language processing, healthcare diagnostics, and financial forecasting.

Unsupervised learning is a machine learning approach where algorithms are trained on unlabeled data, aiming to discover underlying patterns, structures, or relationships within the data. Unlike supervised learning, there are no predetermined output labels, and the algorithm must identify meaningful patterns autonomously. Unsupervised learning tasks include clustering, where data points are grouped by similarity, and dimensionality reduction, where the dataset's complexity is reduced while preserving essential information. Common unsupervised learning algorithms include k-means clustering, hierarchical clustering, principal component analysis (PCA), and autoencoders. Unsupervised learning has applications in areas such as anomaly detection, customer segmentation, recommendation systems, and data compression. It plays a crucial role in exploratory data analysis and in understanding complex datasets without explicit guidance.
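To make the bagging-versus-boosting contrast from Q4 concrete, here is a sketch using scikit-learn's ensemble classes on a synthetic dataset; all settings are arbitrary, and both classes default to decision-tree base learners:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: 50 trees fit independently on bootstrap samples (variance reduction)
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: trees fit sequentially, each reweighting the errors
# of the models before it (bias reduction)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print("bagging :", cross_val_score(bagging, X, y).mean())
print("boosting:", cross_val_score(boosting, X, y).mean())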
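A minimal supervised learning sketch, fitting one of the listed algorithms (logistic regression) to labeled input-output pairs; the Iris dataset is just a convenient stand-in:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: feature matrix X and class labels y
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Learn the input-output mapping from the training pairs
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("predictions on unseen data:", model.predict(X_te[:5]))
print("test accuracy:", model.score(X_te, y_te))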
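And a matching unsupervised sketch: clustering unlabeled synthetic data with k-means and compressing it with PCA, the two task types named above; all settings are arbitrary:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Unlabeled data: the generator's true group labels are discarded
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Clustering: group points by similarity without any labels
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: keep 2 components preserving most variance
X_2d = PCA(n_components=2).fit_transform(X)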
Random Forest is a popular supervised machine learning algorithm. It can be used for both classification and regression problems. It is based on the concept of ensemble learning, the process of combining multiple classifiers to solve a complex problem and improve the model's performance.
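A short Random Forest sketch for the classification case via scikit-learn; the dataset and hyperparameters are chosen only for illustration (RandomForestRegressor covers regression):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An ensemble of decision trees, each fit on a bootstrap sample,
# with random feature subsets considered at each split
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))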
