
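Experiment requirements:

1. Classification target: from sklearn.datasets import load_wine

2. Procedure: following Experiment 5 in the random forest chapter, write the code for steps 1-12 and report the best model parameters at the end.

Steps

I. Import the modules and dataset we need (the imports are not commented; see Experiment 1 if anything is unclear)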

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import numpy as np

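II. Instantiate the model and evaluate it

This is the starting score, before any tuning. random_state can be set to any number.

wine = load_wine()
rfc = RandomForestClassifier(random_state=0)  # instantiate the classifier
score_f = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()  # mean score over 10-fold cross-validation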
score_f

III. Plot the learning curve for n_estimators and find the best n_estimators

This code takes a while to run. n_jobs sets how many CPU cores are used in parallel; it may trigger warnings.warn("Estimator fit failed. The score on this train-test", and if it does you can comment it out.

scorel = []    # list that stores the score from each iteration
for i in range(0,200,10):
    rfc = RandomForestClassifier(n_estimators=i+1,  # n_estimators cannot be 0, so add 1 to i
                             #   n_jobs=-1,
                                 random_state=0)
    score = cross_val_score(rfc,wine.data,wine.target,cv=10).mean()
    scorel.append(score)
print(max(scorel),(scorel.index(max(scorel))*10)+1)  # print the best score and the n_estimators value that produced it
plt.figure(figsize=[20,5])
plt.plot(range(1,201,10),scorel)
plt.show()

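My best score came at n_estimators = 31, so to pin down n_estimators more precisely we can plot the curve again over a narrower range.

scorel = []
for i in range(25,35):   # the narrower range
    rfc = RandomForestClassifier(n_estimators=i,  # the range starts at 25, so no need to add 1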
                                # n_jobs=-1,
                                 random_state=0)
    score = cross_val_score(rfc,wine.data,wine.target,cv=10).mean()
    scorel.append(score)
print(max(scorel),([*range(25,35)][scorel.index(max(scorel))]))
plt.figure(figsize=[20,5])
plt.plot(range(25,35),scorel)
plt.show()

The score is highest when n_estimators is 29.
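The coarse pass followed by a narrower pass can also be wrapped in one helper. The following is a minimal sketch, not the chapter's reference code: the helper name best_n_estimators, the shortened coarse range (1 to 91, chosen here only to keep the run time low) and the window of 5 on either side of the coarse winner are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def best_n_estimators(candidates, X, y, cv=10, random_state=0):
    """Return (best_n, best_score) over the candidate n_estimators values.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    candidates = list(candidates)
    scores = [
        cross_val_score(
            RandomForestClassifier(n_estimators=n, random_state=random_state),
            X, y, cv=cv,
        ).mean()
        for n in candidates
    ]
    best = int(np.argmax(scores))  # index of the first highest mean score
    return candidates[best], scores[best]


wine = load_wine()

# Coarse pass: every 10th value (range shortened here as an assumption, for speed).
coarse_n, coarse_score = best_n_estimators(range(1, 101, 10), wine.data, wine.target)

# Fine pass: every value within 5 of the coarse winner (window size is an assumption).
fine_range = range(max(1, coarse_n - 5), coarse_n + 6)
fine_n, fine_score = best_n_estimators(fine_range, wine.data, wine.target)

print(coarse_n, coarse_score)
print(fine_n, fine_score)
```

Because the fine range always contains the coarse winner and the folds are identical between passes, the fine pass can only match or improve on the coarse score.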

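IV. Use grid search to find the other best parameters

1. Start with max_depth

# tune max_depth
param_grid = {'max_depth':np.arange(1, 14, 1)} # values 1 through 13 in steps of 1 (np.arange excludes the endpoint 14); 13 is chosen because there are 13 features, as wine.data.shape shows
rfc = RandomForestClassifier(n_estimators=29 # remember to pass in the n_estimators value found above
                             ,random_state=0
                            )
GS = GridSearchCV(rfc,param_grid,cv=10) # grid search
GS.fit(wine.data,wine.target)
GS.best_params_  # show the best parameter found
GS.best_score_    # the score is unchanged, which suggests max_depth does not affect the model here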
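To check the "score unchanged" observation, it can help to look at the mean score for every max_depth value rather than only the best one. The sketch below reads them from the cv_results_ attribute of GridSearchCV; the variable names are illustrative, and no particular scores are assumed.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

wine = load_wine()
print(wine.data.shape)  # (n_samples, n_features) -> (178, 13)

param_grid = {"max_depth": np.arange(1, 14)}  # 1 through 13
gs = GridSearchCV(
    RandomForestClassifier(n_estimators=29, random_state=0),
    param_grid,
    cv=10,
)
gs.fit(wine.data, wine.target)

# One entry per grid point: the parameter dict and its mean cross-validated accuracy.
params = gs.cv_results_["params"]
means = gs.cv_results_["mean_test_score"]
for p, mean in zip(params, means):
    print(f"max_depth={p['max_depth']}: mean CV accuracy={mean:.4f}")
```

If the printed scores level off after some depth, the trees are probably not growing any deeper than that on this dataset, which would explain why larger max_depth values leave the score untouched.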

2. Tune max_features

# tune max_features
 
param_grid = {'max_features':np.arange(1,10,1)} 
 
rfc = RandomForestClassifier(n_estimators=29
                             ,random_state=0
                             #,max_depth=4  # adding max_depth=4 lowers the score; you can add it, compare the scores, then comment it out again
                            )
GS = GridSearchCV(rfc,param_grid,cv=10)
GS.fit(wine.data,wine.target)
 
GS.best_params_
GS.best_score_

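After tuning this parameter, I found that the remaining parameters had hardly any effect on the model's score, but you can still try tuning them.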

3. Tune min_samples_leaf

Change the parameter being searched; everything else stays the same.

param_grid={'min_samples_leaf':np.arange(1, 1+10, 1)}
rfc = RandomForestClassifier(n_estimators=29
                             ,random_state=0
                             ,max_features=1
                            )
GS = GridSearchCV(rfc,param_grid,cv=10)
GS.fit(wine.data,wine.target)
 
GS.best_params_
GS.best_score_

4. Tune min_samples_split

param_grid={'min_samples_split':np.arange(2, 2+20, 1)}
rfc = RandomForestClassifier(n_estimators=29
                             ,random_state=0
                             ,max_features=1
                            )
GS = GridSearchCV(rfc,param_grid,cv=10)
GS.fit(wine.data,wine.target)
 
GS.best_params_
GS.best_score_

5. Tune criterion

# tune criterion
 
param_grid = {'criterion':['gini', 'entropy']}
 
rfc = RandomForestClassifier(n_estimators=29
                             ,random_state=0
                            )
GS = GridSearchCV(rfc,param_grid,cv=10)
GS.fit(wine.data,wine.target)
 
GS.best_params_
GS.best_score_
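Steps 3 to 5 repeat the same pattern with a different parameter each time, so they can be driven by one loop. This is a minimal sketch under stated assumptions: tune_one is a hypothetical helper, and max_features=1 is kept in the shared base parameters for all three searches, whereas the criterion step above leaves it out.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def tune_one(param_name, values, base_params, X, y, cv=10):
    """Grid-search a single parameter on top of fixed base parameters.

    Hypothetical helper for illustration; returns (best_value, best_score).
    """
    rfc = RandomForestClassifier(**base_params)
    gs = GridSearchCV(rfc, {param_name: list(values)}, cv=cv)
    gs.fit(X, y)
    return gs.best_params_[param_name], gs.best_score_


wine = load_wine()

# Assumption: the same base parameters are reused for every search.
base_params = {"n_estimators": 29, "random_state": 0, "max_features": 1}

# Same value ranges as the three steps above.
single_param_grids = {
    "min_samples_leaf": range(1, 11),
    "min_samples_split": range(2, 22),
    "criterion": ["gini", "entropy"],
}

results = {}
for name, values in single_param_grids.items():
    results[name] = tune_one(name, values, base_params, wine.data, wine.target)
    print(name, results[name])
```

Each search here is independent of the others, so the best values are not carried forward from one parameter to the next.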

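V. Summary

The best parameters found are max_depth=13 and max_features=1. Going by the generalization-error curve, though, max_depth=13 should place the model on the left side of the curve, so reaching the optimum should mean tuning toward the right, whereas max_features=1 clearly pushes the model toward the left. That seems slightly contradictory. Is this down to the wine dataset itself?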
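One way to probe this question is to search max_depth and max_features together, since tuning them one at a time cannot show how they interact. The sketch below is only an illustration: the grid values are assumptions picked to keep the run short, and it does not settle which side of the curve the model sits on.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

wine = load_wine()

# Assumed, deliberately small grid; None lets the trees grow without a depth limit.
param_grid = {
    "max_depth": [3, 5, 7, None],
    "max_features": [1, 2, 3, 4],
}

gs = GridSearchCV(
    RandomForestClassifier(n_estimators=29, random_state=0),
    param_grid,
    cv=10,
)
gs.fit(wine.data, wine.target)

print(gs.best_params_)
print(gs.best_score_)

# Mean score for every (max_depth, max_features) pair, to see how the two interact.
for p, mean in zip(gs.cv_results_["params"], gs.cv_results_["mean_test_score"]):
    print(p, round(float(mean), 4))
```

With only 178 samples, differences between neighbouring grid points may be smaller than the fold-to-fold variation, so the scores are best read as a rough guide.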


Experiment 2: Building a classification model for the wine dataset with the random forest algorithm
