# Linear Regression (Step by Step)

I’ve created these step-by-step machine learning algorithm implementations in Python for everyone who is new to the field and might be confused by the different steps.

Check out this page to learn about the curious history of Linear Regression.

#### Estimated Time

10 mins

#### Skill Level

Advanced

#### Course Provider

Provided by HolyPython.com

I’ve split the Linear Regression implementation into 2 categories here: the preparation phase (the first four steps) and the actual machine learning work (the last four steps).

- Import the relevant Python libraries
- Import the data
- Read / clean / adjust the data (if needed)
- Create a train / test split
- Create the *Linear Regression* model object
- Fit the model
- Predict
- Evaluate the accuracy

## 1 Import Libraries

pandas can be useful for constructing **dataframes**, and **scikit-learn** is a go-to library for simple machine learning operations and for learning and practicing machine learning.

## 2 Import the Data

We need a nice dataset that’s sensible to analyze with machine learning techniques, particularly *linear regression* in this case. **Scikit-learn** has some cool sample data as well.

## 3 Read the Data

Reading data is simple, but there can be important points such as dealing with columns, headers and titles, constructing data frames, etc.

## 4 Split the Data

Even splitting data is made easy with *scikit-learn*. For this operation we will use **train_test_split** from the **sklearn.model_selection** module.

## 5 Create the Model

Machine learning models can be created with a very simple and straightforward process using scikit-learn. In this case we will create a *Linear Regression* object from the **sklearn.linear_model** module.

## 6 Fit the Model

Machine learning models are generally *fit* on training data. This is the part where the training of the model takes place, and we will do the same for our **Linear Regression** model.

## 7 Predict

Once the model is ready, predictions can be made on the test part of the data. Furthermore, I enjoy predicting foreign values that are not in the initial dataset just to observe the outcomes the model creates. The *.predict* method is used for predictions.

## 8 Evaluation

Finally, the **metrics** module of the **scikit-learn** library is very useful to test the accuracy of the model’s predictions. This part could be done manually as well, but the **metrics** module brings lots of functionality and simplicity to the table.

### 1- Importing the libraries (*pandas and sklearn libraries*)

First the import part for libraries:

- **pandas** and **numpy** can be useful to handle data and data frames (this example only needs **numpy**)
- **train_test_split** from **sklearn.model_selection** makes splitting data for train and test purposes very easy and proper
- **sklearn.linear_model** provides the actual model for **Linear Regression**
- The **datasets** module of **sklearn** has great datasets, making it easy to experiment with AI & Machine Learning
- **metrics** is great for evaluating the results we’ll get from linear regression

```
###Importing Libraries
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split as tts
```

### 2- Importing the data (*diabetes dataset*)

It’s time to find some data to work with. For simplicity, I suggest using the pre-included **datasets** module in scikit-learn. These datasets are great for practice and everything is already taken care of, so there won’t be complications such as missing values or invalid characters while you’re learning.

**One thing I’ve been learning is to keep it simple while I’m learning in fields outside my expertise and then step up gradually to avoid burn-out.**

Let’s import the diabetes dataset:

```
###Importing Dataset
X, y = datasets.load_diabetes(return_X_y=True)
```
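If you’d like to peek at what was just loaded, here is a small sketch. It assumes you call `load_diabetes()` without `return_X_y=True`, which returns a Bunch object that also carries metadata. The names `diabetes`, `X_all` and `y_all` are just my picks for illustration and are not used elsewhere in this tutorial.

```python
from sklearn import datasets

# Load the dataset as a Bunch object so the metadata is available too
diabetes = datasets.load_diabetes()
X_all, y_all = diabetes.data, diabetes.target

print(X_all.shape)             # (number of rows, number of features)
print(y_all.shape)             # one outcome value per row
print(diabetes.feature_names)  # names of the feature columns
```

The feature names are handy later on, when a single column is picked by its index.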

### 3- Reading the data (*scikitlearn datasets and pandas dataframe*)

Now we can get the data ready:

*In machine learning you usually have feature(s) and one or more outcomes to work with. This means different titles and sometimes different types of data, which is why a DataFrame is often the perfect structure to work with.*

*In this regression example, we’re choosing one of the features (index 1) to represent X for simplicity.*

```
###Selecting a Single Feature
X = X[:, np.newaxis, 1]  # Keep only the feature at index 1, as a 2D column
```
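The `np.newaxis` trick can look odd at first. As a rough illustration (the names `X_full`, `X_one` and `X_alt` are made up for this sketch), it keeps the selected column two-dimensional, which is the shape scikit-learn estimators expect for features. Indexing with a list, `X_full[:, [1]]`, is an alternative spelling that gives the same result.

```python
import numpy as np
from sklearn import datasets

X_full, y = datasets.load_diabetes(return_X_y=True)

X_flat = X_full[:, 1]             # 1D: shape (n_samples,)
X_one = X_full[:, np.newaxis, 1]  # 2D: shape (n_samples, 1)
X_alt = X_full[:, [1]]            # same 2D result via list indexing

print(X_flat.shape, X_one.shape, X_alt.shape)
print(np.array_equal(X_one, X_alt))
```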

### 4- Splitting the data (*train_test_split module*)

This is another standard Machine Learning step:

**We need to split data so that there are:**

- training feature(s) and outcome(s)
- test feature(s) and test outcome(s)

We will train the **linear regression machine learning model** with the train split and then test the **trained model** with the test split.

It’s a rather simple step thanks to scikit-learn’s **train_test_split** function.

- I named the variables X_tr, y_tr for training and X_ts, y_ts for test input. This is up to your taste or your circumstances.
- X_tr and X_ts will each be assigned a part of the features
- y_tr and y_ts will each be assigned a part of the outcomes
- The split ratio can be set using the **test_size** parameter. This is an important parameter and something you should experiment with to get a better understanding. A test share of 1/3 or 30% is usually a reasonable ratio.
- Then the model works on X_tr and y_tr for training.
- Then we will test it on X_ts and y_ts to see how successful the model is.

```
###Splitting train/test data
X_tr, X_ts, y_tr, y_ts = tts(X,y, test_size=30/100, random_state=None)
```
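To get a feel for what the split produces, the sketch below prints the sizes of each part. One assumption to flag: it uses `random_state=42`, an arbitrary seed that is not in the original code (which uses `None`), so that the same rows land in the same split on every run.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split as tts

X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, np.newaxis, 1]  # single feature, kept as a 2D column

# A fixed seed (arbitrary value) makes the split repeatable
X_tr, X_ts, y_tr, y_ts = tts(X, y, test_size=0.3, random_state=42)
X_tr2, X_ts2, y_tr2, y_ts2 = tts(X, y, test_size=0.3, random_state=42)

print(X_tr.shape, X_ts.shape)        # train rows vs. test rows
print(np.array_equal(X_ts, X_ts2))   # same seed, same split
```

With `random_state=None` the split changes on every run, so the evaluation numbers further down will vary a little each time too.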

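Linear Regression can be prone to overfitting, especially with many features, and regularization can be very useful for further optimization. Note that scikit-learn’s plain `LinearRegression` has no regularization parameter; regularized variants such as `Ridge` and `Lasso` expose it as `alpha`.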

You can take a look at this page regarding Regularization parameter in Linear Regression: Linear Regression Optimization Parameters
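As a rough sketch of what regularization looks like in code, the example below swaps in `Ridge`, a regularized relative of ordinary least squares, next to the plain model. This is my own illustration and not part of the tutorial’s pipeline; `alpha=1.0` and `random_state=0` are arbitrary values picked for the demo.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split as tts

X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, [1]]  # same single feature (index 1), as a 2D column
X_tr, X_ts, y_tr, y_ts = tts(X, y, test_size=0.3, random_state=0)

# Plain least squares vs. ridge regression (L2 penalty controlled by alpha)
ols = linear_model.LinearRegression().fit(X_tr, y_tr)
ridge = linear_model.Ridge(alpha=1.0).fit(X_tr, y_tr)

print('OLS coefficient:  ', ols.coef_[0])
print('Ridge coefficient:', ridge.coef_[0])
```

The ridge coefficient is pulled toward zero compared to the plain one; a larger `alpha` means a stronger pull.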

### 5- Creating the model (*linear_model.LinearRegression*)

Now we can create a **Linear Regression** object and put machine learning to work using the training data:

```
###Creating Linear Regression Model (OLS)
linreg = linear_model.LinearRegression()
```

### 6- Fitting the model (*Training with features(X) and outcomes (y)*)

```
###Training the Model
linreg.fit(X_tr, y_tr)
```
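After fitting, the model is just a line: a slope and an intercept. The sketch below (self-contained, with an arbitrary `random_state=0` that is my assumption) reads them from `coef_` and `intercept_` and checks that computing `slope * x + intercept` by hand matches what `.predict` returns.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split as tts

X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, np.newaxis, 1]
X_tr, X_ts, y_tr, y_ts = tts(X, y, test_size=0.3, random_state=0)

linreg = linear_model.LinearRegression()
linreg.fit(X_tr, y_tr)

slope = linreg.coef_[0]        # one coefficient per feature
intercept = linreg.intercept_  # value of the line at x = 0

# Rebuild the predictions by hand from the fitted line
manual = slope * X_ts[:, 0] + intercept
print(slope, intercept)
print(np.allclose(manual, linreg.predict(X_ts)))
```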

### 7- Making predictions (*.predict method*)

```
###Making Predictions
y_pr = linreg.predict(X_ts)
# print(y_pr)
```

### 8- Evaluating results (*scikitlearn metrics module*)

```
###Evaluating Prediction Accuracy
print('Coefficients: \n', linreg.coef_)
print('Mean squared error: %.2f' % mean_squared_error(y_ts, y_pr))
print('Coefficient of determination: %.2f' % r2_score(y_ts, y_pr))
```
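Since this part can be done manually as well, here is a minimal sketch of both metrics written out with numpy and compared against the **metrics** functions. The variable names `mse_manual`, `ss_res`, `ss_tot` and `r2_manual` are my own, and `random_state=0` is an arbitrary choice.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split as tts

X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, np.newaxis, 1]
X_tr, X_ts, y_tr, y_ts = tts(X, y, test_size=0.3, random_state=0)

linreg = linear_model.LinearRegression().fit(X_tr, y_tr)
y_pr = linreg.predict(X_ts)

# Mean squared error: average of the squared prediction errors
mse_manual = np.mean((y_ts - y_pr) ** 2)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_ts - y_pr) ** 2)
ss_tot = np.sum((y_ts - np.mean(y_ts)) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(np.isclose(mse_manual, mean_squared_error(y_ts, y_pr)))
print(np.isclose(r2_manual, r2_score(y_ts, y_pr)))
```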

### Bonus: Predicting foreign data

```
###Making Prediction with Foreign Data
linreg.predict([[4.5555]])  # predict expects a 2D input: one row per sample, one column per feature
```
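One thing to keep in mind with foreign data: the features in the diabetes dataset are mean-centered and scaled, so realistic inputs are small values around zero. The sketch below is an illustration under that assumption; `new_values` is a made-up name, and the inputs are simply taken from the range the feature actually covers.

```python
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_diabetes(return_X_y=True)
X = X[:, np.newaxis, 1]

linreg = linear_model.LinearRegression().fit(X, y)

# New inputs must be 2D as well: reshape(-1, 1) gives one row per value
new_values = np.array([X.min(), 0.0, X.max()]).reshape(-1, 1)
preds = linreg.predict(new_values)

print(new_values.ravel())
print(preds)
```

A value far outside that range, such as 4.5555, still produces a number, but the line is being extrapolated well beyond anything it has seen.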

You can see the full one-piece code on this page: Linear Regression Simple Implementation