ANOVA IS NOT IN MINING SYLLABUS
//////////////////////////////////////////////////////////////////////////////////////////////////////////
NOT DONE BY MAM
/////////////////////////////////////////////////////////////////////////////////////////////////////////
MACHINE LEARNING ALGORITHM
5. Root Mean Squared Error
Root Mean Squared Error (RMSE) is the square root of the Mean Squared Error (MSE), i.e. the square root of the mean of the squared differences between actual and predicted values.
RMSE = √( (1/N) Σ (Yi − Ŷi)² )
Here, N = total number of data points, Yi = actual value, Ŷi = predicted value
The higher the RMSE, the larger the deviation between actual and predicted values. The lower the RMSE, the better the model is at its predictions.
Advantages of RMSE:
i) The value of RMSE is in the same unit as the output, which makes the interpretation of the loss easy.
Disadvantages of RMSE:
i) Not robust to outliers.
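The RMSE formula above can be sketched in plain Python (a minimal sketch; the function name rmse is illustrative, not from a library):

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean of the
    squared differences between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2
                         for y, y_hat in zip(actual, predicted)) / n)

print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]))  # ≈ 0.408
```

Because the squaring step weights large errors heavily, a single outlier can dominate the result, which is the lack of robustness noted above.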
6. Mean Squared Log Error (MSLE)
MSLE is a variation of Mean Squared Error. Use MSLE when you do not want to heavily penalize large differences between actual and predicted values.
The logarithm was introduced to capture the relative difference between actual and predicted values. To avoid taking the natural log of a possible 0 value, add 1 to both the actual and predicted values before taking the logarithm.
MSLE = (1/N) Σ (log(Yi + 1) − log(Ŷi + 1))²
Here, N = total number of data points, Yi = actual value, Ŷi = predicted value
Advantages of MSLE:
i) Treats small differences between small actual and predicted values the same as big differences between large actual and predicted values.
Disadvantages of MSLE:
i) Penalizes underestimates more than overestimates.
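A minimal sketch of the MSLE formula in plain Python (the function name msle is illustrative), including a check of the underestimate-vs-overestimate behaviour noted above:

```python
import math

def msle(actual, predicted):
    """Mean Squared Log Error: 1 is added to both values
    before taking the log, to avoid log(0)."""
    n = len(actual)
    return sum((math.log(y + 1) - math.log(y_hat + 1)) ** 2
               for y, y_hat in zip(actual, predicted)) / n

# For the same absolute error of 1, underestimating is penalized more:
under = msle([2.0], [1.0])  # predicted below actual
over = msle([2.0], [3.0])   # predicted above actual
# under > over
```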
Linear Regression:
Linear Regression is a supervised machine learning algorithm that performs regression by fitting the straight line that best fits the data points.
Y = b0 + b1X1 + b2X2 + ... + bnXn
Assumptions:
Assumes a linear relationship between the independent variables 'x' and the dependent variable 'y' (Linearity)
Assumes no correlation between the independent variables 'x' (No Multicollinearity)
Assumes residuals have constant variance at every level of x (Homoscedasticity)
Assumes residuals of the model are normally distributed (Normality)
Assumes no pattern is formed when residuals are plotted
Advantages:
Simple implementation
Performs best on linear data
Overfitting can be reduced by regularization
Disadvantages:
Prone to underfitting
Sensitive to outliers
Assumes that data is independent
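For the single-feature case Y = b0 + b1X1, the best-fit coefficients have a closed form (ordinary least squares). A minimal sketch in plain Python (the function name fit_simple_linear is illustrative):

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = b0 + b1 * x.
    b1 = covariance(x, y) / variance(x); b0 = mean_y - b1 * mean_x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Data lying exactly on y = 1 + 2x recovers those coefficients:
b0, b1 = fit_simple_linear([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```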
Statistical tests to check for normal distribution:
Chi-square test
Kolmogorov-Smirnov test
Shapiro-Wilk test
Graphical methods to check for normal distribution:
Histogram
Quantile-Quantile Plot
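Of the tests above, the Kolmogorov-Smirnov statistic is simple enough to sketch in plain Python: it is the largest gap between the empirical CDF of the data and the CDF of a normal distribution fitted to it (a minimal sketch only; a statistics library's KS test would also return a p-value):

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(data):
    """One-sample Kolmogorov-Smirnov statistic against a normal
    distribution with the sample's own mean and standard deviation.
    Smaller values mean the data looks closer to normal."""
    xs = sorted(data)
    n = len(xs)
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = normal_cdf(x, mu, sigma)
        # Compare the fitted CDF to the empirical CDF just before
        # and just after the step at x.
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d

# A symmetric sample scores lower (closer to normal) than a skewed one:
d_sym = ks_statistic([1.0, 2.0, 3.0, 4.0, 5.0])
d_skew = ks_statistic([1.0, 1.0, 1.0, 1.0, 10.0])
```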
μ = mean
σ² = variance
σ = standard deviation
Assumption:
Features are continuous
Other variants of Naive Bayes are:
Bernoulli Naive Bayes (Bernoulli distribution) and Multinomial Naive Bayes (multinomial distribution)
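Under the continuous-features assumption, Gaussian Naive Bayes models each feature per class with the μ and σ² defined above, then picks the class maximizing log P(c) + Σ log N(xi; μ, σ²). A minimal sketch in plain Python (function names are illustrative, not from a library):

```python
import math
from statistics import mean, variance

def train_gaussian_nb(X, y):
    """Estimate per-class prior and per-feature (mu, sigma^2) pairs."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        stats = [(mean(col), variance(col)) for col in zip(*rows)]
        model[c] = (prior, stats)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior score."""
    def log_gauss(v, mu, var):
        # log of the Gaussian density N(v; mu, var)
        return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
    best, best_score = None, float("-inf")
    for c, (prior, stats) in model.items():
        score = math.log(prior) + sum(log_gauss(v, mu, var)
                                      for v, (mu, var) in zip(x, stats))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: one continuous feature, two well-separated classes.
X = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,)]
y = [0, 0, 0, 1, 1, 1]
model = train_gaussian_nb(X, y)
```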