[Python] 캐글 자전거 수요 예측
bike-sharing-demand
1. 모듈 import
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
import os
from scipy import stats
%matplotlib inline
mpl.rcParams['axes.unicode_minus'] = False
2. 데이터 로드
train = pd.read_csv("train.csv", parse_dates=["datetime"])
test = pd.read_csv("test.csv", parse_dates=["datetime"])
train.shape, test.shape
((10886, 12), (6493, 9))
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 10886 non-null datetime64[ns]
1 season 10886 non-null int64
2 holiday 10886 non-null int64
3 workingday 10886 non-null int64
4 weather 10886 non-null int64
5 temp 10886 non-null float64
6 atemp 10886 non-null float64
7 humidity 10886 non-null int64
8 windspeed 10886 non-null float64
9 casual 10886 non-null int64
10 registered 10886 non-null int64
11 count 10886 non-null int64
dtypes: datetime64[ns](1), float64(3), int64(8)
memory usage: 1020.7 KB
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6493 entries, 0 to 6492
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 datetime 6493 non-null datetime64[ns]
1 season 6493 non-null int64
2 holiday 6493 non-null int64
3 workingday 6493 non-null int64
4 weather 6493 non-null int64
5 temp 6493 non-null float64
6 atemp 6493 non-null float64
7 humidity 6493 non-null int64
8 windspeed 6493 non-null float64
dtypes: datetime64[ns](1), float64(3), int64(5)
memory usage: 456.7 KB
train.isnull().sum()
datetime 0
season 0
holiday 0
workingday 0
weather 0
temp 0
atemp 0
humidity 0
windspeed 0
casual 0
registered 0
count 0
dtype: int64
test.isnull().sum()
datetime 0
season 0
holiday 0
workingday 0
weather 0
temp 0
atemp 0
humidity 0
windspeed 0
dtype: int64
#year, month, day, dayofweek(0~6), quarter, hour, minute, second 컬럼 생성
train['year'] = train['datetime'].dt.year
train['month'] = train['datetime'].dt.month
train['day'] = train['datetime'].dt.day
train['dayofweek'] = train['datetime'].dt.dayofweek
train['quarter'] = train['datetime'].dt.quarter
train['hour'] = train['datetime'].dt.hour
train['minute'] = train['datetime'].dt.minute
train['second'] = train['datetime'].dt.second
train.shape
(10886, 20)
test['year'] = test['datetime'].dt.year
test['month'] = test['datetime'].dt.month
test['day'] = test['datetime'].dt.day
test['dayofweek'] = test['datetime'].dt.dayofweek
test['quarter'] = test['datetime'].dt.quarter
test['hour'] = test['datetime'].dt.hour
test['minute'] = test['datetime'].dt.minute
test['second'] = test['datetime'].dt.second
test.shape
(6493, 17)
train.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | casual | registered | count | year | month | day | dayofweek | quarter | hour | minute | second | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 3 | 13 | 16 | 2011 | 1 | 1 | 5 | 1 | 0 | 0 | 0 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 8 | 32 | 40 | 2011 | 1 | 1 | 5 | 1 | 1 | 0 | 0 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 5 | 27 | 32 | 2011 | 1 | 1 | 5 | 1 | 2 | 0 | 0 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 3 | 10 | 13 | 2011 | 1 | 1 | 5 | 1 | 3 | 0 | 0 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 0 | 1 | 1 | 2011 | 1 | 1 | 5 | 1 | 4 | 0 | 0 |
test.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | year | month | day | dayofweek | quarter | hour | minute | second | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-20 00:00:00 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 | 2011 | 1 | 20 | 3 | 1 | 0 | 0 | 0 |
| 1 | 2011-01-20 01:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 2011 | 1 | 20 | 3 | 1 | 1 | 0 | 0 |
| 2 | 2011-01-20 02:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 2011 | 1 | 20 | 3 | 1 | 2 | 0 | 0 |
| 3 | 2011-01-20 03:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 2011 | 1 | 20 | 3 | 1 | 3 | 0 | 0 |
| 4 | 2011-01-20 04:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 2011 | 1 | 20 | 3 | 1 | 4 | 0 | 0 |
3. EDA
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6)) = plt.subplots(2, 3)
fig.set_size_inches(18, 8)
sns.barplot(data = train, x = 'year', y='count', ax=ax1)
sns.barplot(data = train, x = 'month', y='count', ax=ax2)
sns.barplot(data = train, x = 'day', y='count', ax=ax3)
sns.barplot(data = train, x = 'hour', y='count', ax=ax4)
sns.barplot(data = train, x = 'minute', y='count', ax=ax5)
sns.barplot(data = train, x = 'second', y='count', ax=ax6)
ax1.set(ylabel='Count')
ax2.set(ylabel='month')
ax3.set(ylabel='day')
ax4.set(ylabel='hour')
[Text(0, 0.5, 'hour')]

minute와 second는 값이 0밖에 없으므로 해당 Feature는 사용하지 않는다.
fig, axes = plt.subplots(2, 2)
fig.set_size_inches(12, 10)
sns.boxplot(data = train, y='count', orient="v", ax=axes[0][0])
sns.boxplot(data = train, y='count', x = "season", orient="v", ax=axes[0][1])
sns.boxplot(data = train, y='count', x = "hour", orient="v", ax=axes[1][0])
sns.boxplot(data = train, y='count', x = "workingday", orient="v", ax=axes[1][1])
axes[0][0].set(ylabel='Count')
axes[0][1].set(xlabel = "Season", ylabel='month')
axes[1][0].set(xlabel = "Hour of the Day", ylabel='day')
axes[1][1].set(xlabel = "Woking Day", ylabel='hour')
[Text(0.5, 0, 'Woking Day'), Text(0, 0.5, 'hour')]

hour의 너무 많은 값이 이상치로 나온다. hour를 여러 조건으로 나누어 살펴보자.
fig, ((ax1, ax2, ax3, ax4, ax5)) = plt.subplots(5)
fig.set_size_inches(18, 25)
sns.pointplot(data = train, x = 'hour', y='count', ax=ax1)
sns.pointplot(data = train, x = 'hour', y='count', hue="workingday", ax=ax2)
sns.pointplot(data = train, x = 'hour', y='count', hue="dayofweek", ax=ax3)
sns.pointplot(data = train, x = 'hour', y='count', hue="weather", ax=ax4)
sns.pointplot(data = train, x = 'hour', y='count', hue="season", ax=ax5)
<AxesSubplot:xlabel='hour', ylabel='count'>

주중과 주말, 요일별로 hour당 수요량이 다르다.
train[['temp', 'atemp', 'weather', 'count', 'casual', 'registered']].corr()
| temp | atemp | weather | count | casual | registered | |
|---|---|---|---|---|---|---|
| temp | 1.000000 | 0.984948 | -0.055035 | 0.394454 | 0.467097 | 0.318571 |
| atemp | 0.984948 | 1.000000 | -0.055376 | 0.389784 | 0.462067 | 0.314635 |
| weather | -0.055035 | -0.055376 | 1.000000 | -0.128655 | -0.135918 | -0.109340 |
| count | 0.394454 | 0.389784 | -0.128655 | 1.000000 | 0.690414 | 0.970948 |
| casual | 0.467097 | 0.462067 | -0.135918 | 0.690414 | 1.000000 | 0.497250 |
| registered | 0.318571 | 0.314635 | -0.109340 | 0.970948 | 0.497250 | 1.000000 |
plt.figure(figsize=(12,10))
sns.heatmap(abs(train.corr()), annot=True, cmap='RdPu')
plt.show()

def concatenate_year_month(datetime) :
return "{0}-{1}".format(datetime.year, datetime.month)
train["year_month"] = train["datetime"].apply(concatenate_year_month)
print(train.shape)
train[["datetime", "year_month"]].head()
(10886, 21)
| datetime | year_month | |
|---|---|---|
| 0 | 2011-01-01 00:00:00 | 2011-1 |
| 1 | 2011-01-01 01:00:00 | 2011-1 |
| 2 | 2011-01-01 02:00:00 | 2011-1 |
| 3 | 2011-01-01 03:00:00 | 2011-1 |
| 4 | 2011-01-01 04:00:00 | 2011-1 |
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(18, 4)
sns.barplot(data=train, x="year", y="count", ax=ax1)
sns.barplot(data=train, x="month", y="count", ax=ax2)
fig, ax3 = plt.subplots(1, 1)
fig.set_size_inches(18, 4)
sns.barplot(data=train, x="year_month", y="count", ax=ax3)
<AxesSubplot:xlabel='year_month', ylabel='count'>


#이상치 제거
trainWithoutOutliers = train[np.abs(train["count"] - train["count"].mean()) <= (3*train["count"].mean())]
print(train.shape)
print(trainWithoutOutliers.shape)
(10886, 21)
(10768, 21)
fig, axes = plt.subplots(2, 2)
fig.set_size_inches(12, 10)
sns.distplot(train["count"], ax=axes[0][0])
stats.probplot(train["count"], dist='norm', fit=True, plot=axes[0][1])
sns.distplot(np.log(trainWithoutOutliers["count"]), ax=axes[1][0])
stats.probplot(np.log(trainWithoutOutliers["count"]), dist='norm', fit=True, plot=axes[1][1])
C:\Users\yun70\anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
C:\Users\yun70\anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
((array([-3.82886059, -3.6047202 , -3.48171232, ..., 3.48171232,
3.6047202 , 3.82886059]),
array([0. , 0. , 0. , ..., 6.63463336, 6.64118217,
6.64118217])),
(1.4123284761790171, 4.528748279449013, 0.9542628138734534))

4. 예측 준비
풍속 0값 예측
# widspeed 풍속에 0 값이 가장 많음 -> 잘못 기록된 데이터를 고쳐 줄 필요가 있음
fig, axes = plt.subplots(2)
fig.set_size_inches(18, 10)
plt.sca(axes[0])
plt.xticks(rotation=30, ha='right')
axes[0].set(ylabel='Count', title="train windspeed")
sns.countplot(data=train, x="windspeed", ax=axes[0])
plt.sca(axes[1])
plt.xticks(rotation=30, ha='right')
axes[1].set(ylabel='Count', title="test windspeed")
sns.countplot(data=test, x="windspeed", ax=axes[1])
<AxesSubplot:title={'center':'test windspeed'}, xlabel='windspeed', ylabel='count'>

trainWind0 = train.loc[train['windspeed']==0]
trainWindNot0 = train.loc[train['windspeed']!=0]
print(trainWind0.shape)
print(trainWindNot0.shape)
(1313, 21)
(9573, 21)
# 랜덤포레스트로 예측해서 풍속 넣기
from sklearn.ensemble import RandomForestClassifier
def predict_windspeed(data) :
dataWind0 = data.loc[data['windspeed'] == 0]
dataWindNot0 = data.loc[data['windspeed'] != 0]
wCol = ["season", "weather", "humidity", "month", "temp", "year", "atemp"]
dataWindNot0["windspeed"] = dataWindNot0["windspeed"].astype("str")
rfModel_wind = RandomForestClassifier()
rfModel_wind.fit(dataWindNot0[wCol], dataWindNot0["windspeed"])
wind0Values = rfModel_wind.predict(X = dataWind0[wCol])
predictWind0 = dataWind0
predictWindNot0 = dataWindNot0
predictWind0["windspeed"] = wind0Values
data = predictWindNot0.append(predictWind0)
data["windspeed"] = data["windspeed"].astype("float")
data.reset_index(inplace=True)
data.drop('index', inplace=True, axis=1)
return data
train = predict_windspeed(train)
fig, ax1 = plt.subplots()
fig.set_size_inches(18, 6)
plt.sca(ax1)
plt.xticks(rotation=30, ha='right')
ax1.set(ylabel='Count', title="train windspeed")
sns.countplot(data=train, x="windspeed", ax=ax1)
<AxesSubplot:title={'center':'train windspeed'}, xlabel='windspeed', ylabel='count'>

feature 정하기
categorical_feature_names = ["season", "holiday", "workingday", "weather", "dayofweek", "month", "year", "hour"]
for var in categorical_feature_names :
train[var] = train[var]. astype("category")
test[var] = test[var]. astype("category")
feature_names = ["season", "weather", "temp", "atemp", "humidity", "windspeed", "year", "hour", "dayofweek", "holiday", "workingday"]
feature_names
['season',
'weather',
'temp',
'atemp',
'humidity',
'windspeed',
'year',
'hour',
'dayofweek',
'holiday',
'workingday']
X_train = train[feature_names]
print(X_train.shape)
X_train.head()
(10886, 11)
| season | weather | temp | atemp | humidity | windspeed | year | hour | dayofweek | holiday | workingday | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 9.84 | 12.880 | 75 | 6.0032 | 2011 | 5 | 5 | 0 | 0 |
| 1 | 1 | 1 | 15.58 | 19.695 | 76 | 16.9979 | 2011 | 10 | 5 | 0 | 0 |
| 2 | 1 | 1 | 14.76 | 16.665 | 81 | 19.0012 | 2011 | 11 | 5 | 0 | 0 |
| 3 | 1 | 1 | 17.22 | 21.210 | 77 | 19.0012 | 2011 | 12 | 5 | 0 | 0 |
| 4 | 1 | 2 | 18.86 | 22.725 | 72 | 19.9995 | 2011 | 13 | 5 | 0 | 0 |
X_test = test[feature_names]
print(X_test.shape)
X_test.head()
(6493, 11)
| season | weather | temp | atemp | humidity | windspeed | year | hour | dayofweek | holiday | workingday | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 | 2011 | 0 | 3 | 0 | 1 |
| 1 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 2011 | 1 | 3 | 0 | 1 |
| 2 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 2011 | 2 | 3 | 0 | 1 |
| 3 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 2011 | 3 | 3 | 0 | 1 |
| 4 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 2011 | 4 | 3 | 0 | 1 |
label_name = "count"
y_train = train[label_name]
print(y_train.shape)
y_train.head()
(10886,)
0 1
1 36
2 56
3 84
4 94
Name: count, dtype: int64
RMSLE
- 과소평가에 패널티 줌
- 오차 제곱 평균의 제곱근
- 0에 가까울수록 정밀도 높음
from sklearn.metrics import make_scorer
def rmsle(predicted_values, actual_values) :
predicted_values = np.array(predicted_values)
actual_values = np.array(actual_values)
log_predict = np.log(predicted_values + 1)
log_actual = np.log(actual_values + 1)
difference = log_predict - log_actual
difference = np.square(difference)
mean_difference = difference.mean()
score = np.sqrt(mean_difference)
return score
rmsle_scorer = make_scorer(rmsle)
rmsle_scorer
make_scorer(rmsle)
KFold 교차검증
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
k_fold = KFold(n_splits=10, shuffle=True, random_state=0)
5. 예측 모델
Linear Regression Model
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import GridSearchCV
from sklearn import metrics
import warnings
pd.options.mode.chained_assignment = None
warnings.filterwarnings("ignore", category=DeprecationWarning)
# 선형회귀 모델을 초기화
lModel = LinearRegression()
# 모델 학습
y_train_log = np.log1p(y_train)
lModel.fit(X_train, y_train_log)
# 예측, 정확도 평가
preds = lModel.predict(X_train)
print("RMSLE Value For Linear Regression: ", rmsle(np.exp(y_train_log), np.exp(preds)))
RMSLE Value For Linear Regression: 0.9798738624110362
Regularization Model - Ridge
- 가중치 모든 원소가 0에 가깝게 만들어 모든 Feature가 주는 영향을 최소화(기울기 작게 만듦)
- max_iter : 반복 실행하는 최대 횟수
ridge_m_ = Ridge()
ridge_params_ = {'max_iter' : [3000], 'alpha' : [0.01, 1, 2, 3, 4, 10, 30, 100, 200, 300, 400, 800, 900, 1000]}
rmsle_scorer = metrics.make_scorer(rmsle, greater_is_better=False)
grid_ridge_m = GridSearchCV(ridge_m_, ridge_params_, scoring=rmsle_scorer, cv=5)
y_train_log = np.log1p(y_train)
grid_ridge_m.fit(X_train, y_train_log)
preds = grid_ridge_m.predict(X_train)
print(grid_ridge_m.best_params_)
print("RMSLE Value For Ridge Regression: ", rmsle(np.exp(y_train_log), np.exp(preds)))
fig,ax = plt.subplots()
fig.set_size_inches(12, 5)
df = pd.DataFrame(grid_ridge_m.cv_results_)
df["alpha"] = df["params"].apply(lambda x:x["alpha"])
df["rmsle"] = df["mean_test_score"].apply(lambda x:-x)
plt.xticks(rotation=30, ha='right')
sns.pointplot(data=df, x="alpha", y="rmsle", ax=ax)
{'alpha': 0.01, 'max_iter': 3000}
RMSLE Value For Ridge Regression: 0.9798738604013573
<AxesSubplot:xlabel='alpha', ylabel='rmsle'>

-
여기서 ‘GridSearchCV’ object has no attribute ‘gridscores’ 라는 오류가 떴다.
구글에 쳐보니 sklearn의 이전 버전에는 ‘gridscores‘가 있고 새 버전은 ‘cvresults‘로 변경되었다고 한다.
그래서 df = pd.DataFrame(gridridge_m.cv_results) 고치고 했다. -
고쳤더니 컬럼값도 바껴있다..
원하는 컬럼값을 찾아 바꿔주었다.
df[“parameters”] -> df[“params”]
df[“mean_validation_score”] -> df[“mean_test_score”]
Regularization Model - Lasso
- 계수를 0에 가깝게 만들려고 하며 이를 L1 규제라고 한다. 0인 계수는 완전히 제외하는 Feature가 생긴다는 의미
- Feature 선택이 자동으로 이루어짐
- alpha 값의 기본 값은 1.0이며, 과소 적합을 줄이기 위해서는 이 값을 줄여야한다.
lasso_m_ = Lasso()
alpha = 1/np.array([0.01, 1, 2, 3, 4, 10, 30, 100, 200, 300, 400, 800, 900, 1000])
lasso_params_ = {'max_iter' : [3000], 'alpha' : alpha}
grid_lasso_m = GridSearchCV(lasso_m_, lasso_params_, scoring=rmsle_scorer, cv=5)
y_train_log = np.log1p(y_train)
grid_lasso_m.fit(X_train, y_train_log)
preds = grid_lasso_m.predict(X_train)
print(grid_lasso_m.best_params_)
print("RMSLE Value For Lasso Regression: ", rmsle(np.exp(y_train_log), np.exp(preds)))
fig,ax = plt.subplots()
fig.set_size_inches(12, 5)
df = pd.DataFrame(grid_ridge_m.cv_results_)
df["alpha"] = df["params"].apply(lambda x:x["alpha"])
df["rmsle"] = df["mean_test_score"].apply(lambda x:-x)
plt.xticks(rotation=30, ha='right')
sns.pointplot(data=df, x="alpha", y="rmsle", ax=ax)
{'alpha': 0.00125, 'max_iter': 3000}
RMSLE Value For Lasso Regression: 0.9798830685468796
<AxesSubplot:xlabel='alpha', ylabel='rmsle'>

Ensemble Model - RandomForest
from sklearn.ensemble import RandomForestRegressor
rfModel = RandomForestRegressor(n_estimators = 100)
y_train_log = np.log1p(y_train)
rfModel.fit(X_train, y_train_log)
preds = rfModel.predict(X_train)
score = rmsle(np.exp(y_train_log), np.exp(preds))
print("RMSLE Value For Random Forest: ", score)
RMSLE Value For Random Forest: 0.10687876851241544
Ensemble Model - Gradient Boost
- 이진 트리의 오차를 보완
- 무작위성X, 강력한 사전 가지치기
from sklearn.ensemble import GradientBoostingRegressor
gbm = GradientBoostingRegressor(n_estimators = 4000, alpha=0.01)
y_train_log = np.log1p(y_train)
gbm.fit(X_train, y_train_log)
preds = gbm.predict(X_train)
print("RMSLE Value For Gradient Boost: ", rmsle(np.exp(y_train_log), np.exp(preds)))
RMSLE Value For Gradient Boost: 0.20502659343932583
predsTest = rfModel.predict(X_test)
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(12, 5)
sns.distplot(y_train, ax=ax1, bins=50)
sns.distplot(np.exp(predsTest), ax=ax2, bins=50)
C:\Users\yun70\anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
C:\Users\yun70\anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
<AxesSubplot:ylabel='Density'>

6. Submit
submission = pd.read_csv("sampleSubmission.csv")
submission
submission["count"] = np.exp(predsTest)
print(submission.shape)
submission.head()
(6493, 2)
| datetime | count | |
|---|---|---|
| 0 | 2011-01-20 0:00 | 12.422343 |
| 1 | 2011-01-20 1:00 | 5.785711 |
| 2 | 2011-01-20 2:00 | 4.143073 |
| 3 | 2011-01-20 3:00 | 4.055910 |
| 4 | 2011-01-20 4:00 | 3.508064 |
submission.to_csv("Score_{0:.5f}_submission.csv" .format(score), index=False)
※ Kaggle에 제출할 때 “RandomForestRegressor(n_estimators = 100)” 처럼 어떤 파라미터를 썼는지 작성하고 제출하면 더 좋다.
댓글남기기