Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Installation
$ git clone https://github.com/eriklindernoren/ML-From-Scratch
$ cd ML-From-Scratch
$ python setup.py install
QUESTION
Why is the value predicted by LinearRegression exactly the same as the true value?
Asked 2020-Sep-24 at 23:10
I'm fitting a regression with LinearRegression and get a mean squared error of 0. I think there should be some deviation (at least a small one). Could you please explain this phenomenon?
## Import packages
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import urllib.request
## Import dataset
urllib.request.urlretrieve('https://raw.githubusercontent.com/Data-Science-FMI/ml-from-scratch-2019/master/data/house_prices_train.csv',
'house_prices_train.csv')
df_train = pd.read_csv('house_prices_train.csv')
x = df_train['GrLivArea'].values.reshape(1, -1)
y = df_train['SalePrice'].values.reshape(1, -1)
print('The explanatory variable is', x)
print('The variable to be predicted is', y)
## Regression
reg = LinearRegression().fit(x, y)
mean_squared_error(y, reg.predict(x))
print('The MSE is', mean_squared_error(y, reg.predict(x)))
print('Predicted value is', reg.predict(x))
print('True value is', y)
The result is
The explanatory variable is [[1710 1262 1786 ... 2340 1078 1256]]
The variable to be predicted is [[208500 181500 223500 ... 266500 142125 147500]]
The MSE is 0.0
Predicted value is [[208500. 181500. 223500. ... 266500. 142125. 147500.]]
True value is [[208500 181500 223500 ... 266500 142125 147500]]
ANSWER
Answered 2020-Sep-24 at 23:10
While the comments are certainly correct that a model's score on its own training set will be inflated, it is unlikely to get a perfect fit with linear regression, especially with just one feature.
Your problem is that you've reshaped the data incorrectly: reshape(1, -1) makes an array of shape (1, n), so your model thinks it has n features and n outputs with only a single sample, and so is a multiple linear regression with a perfect fit. Try instead with reshape(-1, 1) for x and no reshaping for y.
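A minimal sketch of the difference, assuming synthetic data in place of the house prices CSV so it runs without a download. The arrays area and price and the noise level are made up for illustration; only the two reshape calls mirror the question and the fix above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for GrLivArea / SalePrice (assumed values, not the real CSV):
# price grows roughly linearly with area, plus noise.
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)
price = 100.0 * area + 20000.0 + rng.normal(0.0, 15000.0, size=200)

# Shape used in the question: (1, n) is read as 1 sample, n features, n outputs.
x_wrong = area.reshape(1, -1)
y_wrong = price.reshape(1, -1)
reg_wrong = LinearRegression().fit(x_wrong, y_wrong)
mse_wrong = mean_squared_error(y_wrong, reg_wrong.predict(x_wrong))

# Shape suggested in the answer: (n, 1) is n samples, 1 feature; y stays 1-D.
x_right = area.reshape(-1, 1)
y_right = price
reg_right = LinearRegression().fit(x_right, y_right)
mse_right = mean_squared_error(y_right, reg_right.predict(x_right))

print('x_wrong shape:', x_wrong.shape, '-> MSE', mse_wrong)
print('x_right shape:', x_right.shape, '-> MSE', mse_right)
```

With the (1, n) shape the single sample is reproduced exactly, so the first MSE should come out as essentially zero; with the (n, 1) shape the noise in the data shows up as a clearly non-zero MSE.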
Community Discussions, Code Snippets contain sources that include Stack Exchange Network