Kaggle | Exercise 2: Model Validation

Author: 十二支箭 | Published 2020-04-09 17:02

A standardized machine-learning workflow from the official Kaggle site.

Recap

You've built a model. In this exercise you will test how good your model is.

Run the cell below to set up your coding environment where the previous exercise left off.

# Code you have previously used to load data
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

home_data = pd.read_csv(iowa_file_path)
y = home_data.SalePrice
feature_columns = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[feature_columns]

# Specify Model
iowa_model = DecisionTreeRegressor()
# Fit Model
iowa_model.fit(X, y)

print("First in-sample predictions:", iowa_model.predict(X.head()))
print("Actual target values for those homes:", y.head().tolist())

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex4 import *
print("Setup Complete")
First in-sample predictions: [208500. 181500. 223500. 140000. 250000.]
Actual target values for those homes: [208500, 181500, 223500, 140000, 250000]
Setup Complete

Exercises

Step 1: Split Your Data

Use the train_test_split function to split up your data.

Give it the argument random_state=1 so the check functions know what to expect when verifying your code.

Recall, your features are loaded in the DataFrame X and your target is loaded in y.

# Import the train_test_split function
from sklearn.model_selection import train_test_split

# Split the data into training and validation sets
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Check your answer
step_1.check()
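By default (when neither test_size nor train_size is given), train_test_split holds out 25% of the rows for validation. A minimal sketch on synthetic arrays (not the Iowa data) showing the split sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)  # 50 rows, 2 features
y = np.arange(50)

# Default split: 75% training, 25% validation (test size is rounded up)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
print(len(train_X), len(val_X))  # 37 13
```

Passing random_state=1 makes the shuffle reproducible, which is why the checker can verify your split.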

Step 2: Specify and Fit the Model

Create a DecisionTreeRegressor model and fit it to the relevant data.
Set random_state to 1 again when creating the model.

# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model
iowa_model = DecisionTreeRegressor(random_state=1)

# Fit iowa_model with the training data.
iowa_model.fit(train_X, train_y)

# Check your answer
step_2.check()

Step 3: Make Predictions with Validation Data

# Predict with all validation observations
val_predictions = iowa_model.predict(val_X)

# Check your answer
step_3.check()

Inspect your predictions and actual values from validation data.

# print the top few validation predictions
print(iowa_model.predict(val_X.head()))
# or: print(val_predictions[:5])
# print the top few actual prices from validation data
print(val_y.head().tolist())
[186500. 184000. 130000.  92000. 164500.]
[231500, 179500, 122000, 84500, 142000]

What do you notice that is different from the in-sample predictions (which are printed after the top code cell on this page)?

Do you remember why validation predictions differ from in-sample (or training) predictions? This is an important idea from the last lesson.
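The gap exists because an unconstrained decision tree can memorize its training rows, so in-sample error is near zero while held-out error stays substantial. A minimal sketch on synthetic data (not the Iowa dataset) illustrating this:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic noisy regression problem
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(0, 2, size=200)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# An unconstrained tree places each training row in its own leaf
model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

train_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))
print(train_mae, val_mae)  # train MAE is ~0; validation MAE stays well above it
```

The model "predicts" the training targets almost perfectly because it has seen them, but that accuracy does not transfer to unseen rows.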

Step 4: Calculate the Mean Absolute Error in Validation Data

from sklearn.metrics import mean_absolute_error
val_mae = mean_absolute_error(val_y, val_predictions)

# print the validation MAE
print(val_mae)

# Check your answer
step_4.check()
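As a sanity check, MAE can be worked out by hand on the five validation pairs printed above (note this is only the first five rows, not the full validation set, so it will not match val_mae exactly):

```python
import numpy as np

# The five validation predictions and actual prices printed earlier
preds = np.array([186500., 184000., 130000., 92000., 164500.])
actual = np.array([231500., 179500., 122000., 84500., 142000.])

# Mean Absolute Error: average of |prediction - actual|
# |diff| = 45000, 4500, 8000, 7500, 22500 -> sum 87500 -> / 5
mae = np.abs(preds - actual).mean()
print(mae)  # 17500.0
```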

Is that MAE good? There isn't a general rule for what values are good that applies across applications. But you'll see how to use (and improve) this number in the next step.

Keep Going

You are ready for [Underfitting and Overfitting].

To be continued


Original link: https://www.haomeiwen.com/subject/knwcmhtx.html