Simple Linear Regression Lab II
Dataset (After All Preprocessing)
Years of Experience    Salary
1.1                    39343
1.3                    46205
1.5                    37731
2                      43525
2.2                    39891
2.9                    56642
3                      60150
3.2                    54445
3.2                    64445
3.7                    57189
3.9                    63218
4                      55794
4                      56957
4.1                    57081
4.5                    61111
4.9                    67938
5.1                    66029
5.3                    83088
5.9                    81363
6                      93940
6.8                    91738
7.1                    98273
7.9                    101302
8.2                    113812
8.7                    109431
9                      105582
9.5                    116969
9.6                    112635
10.3                   122391
10.5                   121872
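The complete code reads the data from a file named Salary_Data.csv. If that file is not at hand, it can be recreated from the table above. The following is only a sketch: the helper name write_salary_csv and the header names YearsExperience and Salary are assumptions, not part of the lab. The lab code selects columns by position, so the exact header text should not matter.

```python
import csv

# Rows copied from the dataset table: (years of experience, salary).
ROWS = [
    (1.1, 39343), (1.3, 46205), (1.5, 37731), (2.0, 43525), (2.2, 39891),
    (2.9, 56642), (3.0, 60150), (3.2, 54445), (3.2, 64445), (3.7, 57189),
    (3.9, 63218), (4.0, 55794), (4.0, 56957), (4.1, 57081), (4.5, 61111),
    (4.9, 67938), (5.1, 66029), (5.3, 83088), (5.9, 81363), (6.0, 93940),
    (6.8, 91738), (7.1, 98273), (7.9, 101302), (8.2, 113812), (8.7, 109431),
    (9.0, 105582), (9.5, 116969), (9.6, 112635), (10.3, 122391), (10.5, 121872),
]


def write_salary_csv(path="Salary_Data.csv"):
    """Write the table to a CSV file (hypothetical helper, assumed header names)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["YearsExperience", "Salary"])  # assumed column names
        writer.writerows(ROWS)
    return path


if __name__ == "__main__":
    write_salary_csv()
```

Running this once in the working directory should leave a Salary_Data.csv with one header line and 30 data rows.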
Complete Code
# Simple Linear Regression
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
# X holds every column except the last one (years of experience)
y = dataset.iloc[:, -1].values
# y holds the last column (salary); the index -1 selects the last column
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
# Training the Simple Linear Regression model on the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)
# Compare y_test with y_pred here:
# y_test contains the real salaries and y_pred contains the predicted salaries
# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
# x-axis is experience and y-axis is salary
# the observed (real) training points are shown in red
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
# here the y coordinates are the predictions for X_train
# the predicted values are drawn as a blue regression line
plt.title('Salary vs Experience (Training set)')
# Title of graph
plt.xlabel('Years of Experience')
# x-axis label
plt.ylabel('Salary')
# y-axis label
plt.show()
# in this graph the real values are the red points and the predicted values lie on the blue line
# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
# the blue line is the same regression line as in the previous graph (it was fitted on the training set)
# the red points are now the observations from the test set
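The blue line in both graphs is an ordinary least-squares fit. As a rough cross-check of what LinearRegression computes, the slope and intercept can also be worked out by hand from the closed-form formulas. The sketch below is plain Python, and the names DATA, fit_line and r_squared are illustrative, not part of the lab. It fits on all 30 rows instead of the training split, so its numbers will not match regressor.coef_ and regressor.intercept_ exactly.

```python
# (years of experience, salary) pairs copied from the dataset table.
DATA = [
    (1.1, 39343), (1.3, 46205), (1.5, 37731), (2.0, 43525), (2.2, 39891),
    (2.9, 56642), (3.0, 60150), (3.2, 54445), (3.2, 64445), (3.7, 57189),
    (3.9, 63218), (4.0, 55794), (4.0, 56957), (4.1, 57081), (4.5, 61111),
    (4.9, 67938), (5.1, 66029), (5.3, 83088), (5.9, 81363), (6.0, 93940),
    (6.8, 91738), (7.1, 98273), (7.9, 101302), (8.2, 113812), (8.7, 109431),
    (9.0, 105582), (9.5, 116969), (9.6, 112635), (10.3, 122391), (10.5, 121872),
]


def fit_line(points):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    # the fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(points, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(DATA)
print(f"salary ~ {intercept:.0f} + {slope:.0f} * years")
print(f"R^2 on all 30 rows: {r_squared(DATA, slope, intercept):.3f}")
```

The slope is the estimated salary increase per additional year of experience, and the intercept is the estimated salary at zero years. An R^2 close to 1 would indicate that the straight line explains most of the variation in salary.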