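This article uses version v1.1. GitHub repository: https://github.com/kingfengji/gcForest
The code has two main parts: the examples folder holds the main scripts (.py) and the configuration files (.json); the lib folder holds the libraries that the code relies on.
Main code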
from gcforest.gcforest import GCForest
gc = GCForest(config) # should be a dict
X_train_enc = gc.fit_transform(X_train, y_train)
y_pred = gc.predict(X_test)
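The lib package in detail
gcforest.py: implementation of the overall framework
fgnet.py: the multi-grained part, i.e. the FineGrained implementation
cascade/cascade_classifier: implementation of the cascade classifier
datasets/....: definitions of a set of datasets
estimators/...: the functions the decision trees use for evaluation (prediction with several kinds of classifiers)
layer/...: the different layer operations, such as concatenation, pooling and sliding windows
utils/..: assorted utility functions, e.g. accuracy computation, win_vote, win_avg, get_windows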
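The JSON configuration file in detail
Parameters
- max_depth: maximum depth of each decision tree. The default is None, in which case the depth is not limited and nodes keep splitting until every leaf contains a single class or holds fewer than min_samples_split samples. With few samples or few features this value can usually be left alone. With many samples and many features, limiting the depth is recommended; the right value depends on the data distribution, and values between 10 and 100 are common.
- estimators: the classifiers to use
- n_estimators: the number of trees in the forest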
- n_jobs: int (default=1)
The number of jobs to run in parallel for any Random Forest fit and predict.
If -1, then the number of jobs is set to the number of cores.
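The n_jobs rule above can be sketched in a few lines. This is only an illustration: resolve_n_jobs is a hypothetical helper, not part of gcForest or scikit-learn, and it deliberately handles only -1 and positive values.

```python
import os


def resolve_n_jobs(n_jobs: int = 1) -> int:
    """Hypothetical helper: turn an n_jobs setting into a worker count."""
    if n_jobs == -1:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    if n_jobs < 1:
        # Simplification for this sketch: other negative values are rejected.
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


workers_default = resolve_n_jobs()    # one job, the documented default
workers_all = resolve_n_jobs(-1)      # one job per available core
```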
Training configuration, in three cases:
- Using the default model
def get_toy_config():
    config = {}
    ca_config = {}
    ca_config["random_state"] = 0  # 0 or 1
    ca_config["max_layers"] = 100  # maximum number of layers; a "layer" corresponds to a "level" in the paper
    ca_config["early_stopping_rounds"] = 3  # stop adding layers if accuracy has not improved within 3 layers
    ca_config["n_classes"] = 3  # number of classes to predict
    ca_config["estimators"] = []
    ca_config["estimators"].append(
        {"n_folds": 5, "type": "XGBClassifier", "n_estimators": 10, "max_depth": 5,
         "objective": "multi:softprob", "silent": True, "nthread": -1, "learning_rate": 0.1})
    ca_config["estimators"].append({"n_folds": 5, "type": "RandomForestClassifier", "n_estimators": 10, "max_depth": None, "n_jobs": -1})
    ca_config["estimators"].append({"n_folds": 5, "type": "ExtraTreesClassifier", "n_estimators": 10, "max_depth": None, "n_jobs": -1})
    ca_config["estimators"].append({"n_folds": 5, "type": "LogisticRegression"})
    config["cascade"] = ca_config  # four base learners in total
    return config
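To make the early_stopping_rounds setting more concrete, here is a minimal sketch of the stopping rule as described in the comment above. It is not the library's code: find_stop_layer and the accuracy list are made up for illustration, and the rule assumed is "stop once the best layer is at least early_stopping_rounds layers behind".

```python
def find_stop_layer(layer_accuracies, early_stopping_rounds=3, max_layers=100):
    """Return (best_layer, last_trained_layer), both 0-based.

    layer_accuracies holds the validation accuracy that each successive
    cascade layer would reach (illustrative numbers, not real results).
    """
    best_layer = 0
    best_acc = float("-inf")
    last_layer = -1
    for layer_id, acc in enumerate(layer_accuracies[:max_layers]):
        last_layer = layer_id
        if acc > best_acc:
            # A strict improvement resets the patience counter.
            best_acc, best_layer = acc, layer_id
        elif layer_id - best_layer >= early_stopping_rounds:
            # No improvement for early_stopping_rounds layers: stop growing.
            break
    return best_layer, last_layer


accuracies = [0.90, 0.93, 0.95, 0.95, 0.94, 0.95, 0.96]
best, last = find_stop_layer(accuracies, early_stopping_rounds=3)
print(best, last)
```

With these made-up accuracies, layer 2 is the best one and growth stops after layer 5, so the later 0.96 is never reached. That is the usual trade-off of a patience-based rule.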
Supported base classifiers:
RandomForestClassifier
XGBClassifier
ExtraTreesClassifier
LogisticRegression
SGDClassifier
You can add any other classifier manually in:
lib/gcforest/estimators/__init__.py
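The article does not show what that file looks like, so the following is only a generic sketch of the idea: mapping the "type" string of an estimator entry to a class. ESTIMATOR_REGISTRY, build_estimator and MajorityClassifier are all hypothetical names, and the sketch assumes that "n_folds" is a cross-validation setting rather than a model parameter.

```python
class MajorityClassifier:
    """Toy classifier: always predicts the most frequent training label."""

    def __init__(self, **params):
        self.params = params
        self.label_ = None

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# Hypothetical registry: config "type" string -> estimator class.
ESTIMATOR_REGISTRY = {"MajorityClassifier": MajorityClassifier}


def build_estimator(est_config):
    """Instantiate an estimator from one entry of the "estimators" list."""
    args = dict(est_config)            # copy so the config stays untouched
    est_type = args.pop("type")
    args.pop("n_folds", None)          # assumed to belong to cross-validation
    if est_type not in ESTIMATOR_REGISTRY:
        raise ValueError("unknown estimator type: %s" % est_type)
    return ESTIMATOR_REGISTRY[est_type](**args)


clf = build_estimator({"n_folds": 5, "type": "MajorityClassifier"})
clf.fit([[0], [1], [2]], [1, 1, 0])
predictions = clf.predict([[5], [6]])
```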
- Cascade part only
{
  "cascade": {
    "random_state": 0,
    "max_layers": 100,
    "early_stopping_rounds": 3,
    "n_classes": 10,
    "estimators": [
      {"n_folds":5,"type":"XGBClassifier","n_estimators":10,"max_depth":5,"objective":"multi:softprob", "silent":true, "nthread":-1, "learning_rate":0.1},
      {"n_folds":5,"type":"RandomForestClassifier","n_estimators":10,"max_depth":null,"n_jobs":-1},
      {"n_folds":5,"type":"ExtraTreesClassifier","n_estimators":10,"max_depth":null,"n_jobs":-1},
      {"n_folds":5,"type":"LogisticRegression"}
    ]
  }
}
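A cascade-only configuration like this one can be sanity-checked with the standard json module before training. The sketch below uses a shortened copy of the config; check_cascade_config is a hypothetical helper, and the set of required keys is an assumption based only on the fields shown in this article.

```python
import json

CONFIG_TEXT = """
{
  "cascade": {
    "random_state": 0,
    "max_layers": 100,
    "early_stopping_rounds": 3,
    "n_classes": 10,
    "estimators": [
      {"n_folds": 5, "type": "XGBClassifier", "n_estimators": 10, "max_depth": 5, "silent": true},
      {"n_folds": 5, "type": "RandomForestClassifier", "n_estimators": 10, "max_depth": null, "n_jobs": -1},
      {"n_folds": 5, "type": "LogisticRegression"}
    ]
  }
}
"""


def check_cascade_config(config):
    """Hypothetical check: return the estimator types, or raise on a missing key."""
    cascade = config["cascade"]
    for key in ("max_layers", "early_stopping_rounds", "n_classes", "estimators"):
        if key not in cascade:
            raise KeyError("missing cascade key: %s" % key)
    types = []
    for est in cascade["estimators"]:
        if "type" not in est or "n_folds" not in est:
            raise ValueError("each estimator needs 'type' and 'n_folds'")
        types.append(est["type"])
    return types


config = json.loads(CONFIG_TEXT)
estimator_types = check_cascade_config(config)
# JSON true/null become Python True/None, matching get_toy_config().
rf_max_depth = config["cascade"]["estimators"][1]["max_depth"]
```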
- Both parts: "multi fine-grained + cascade"
Sliding window sizes: {[d/16], [d/8], [d/4]}, where d is the number of input features.
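As a quick worked example of these window sizes, the sketch below assumes the brackets mean rounding down and that windows slide over a 1-D feature vector with stride 1. Both helper names are made up for illustration.

```python
def window_sizes(d):
    """Window sizes [d/16], [d/8], [d/4] for d input features (rounded down)."""
    return [d // 16, d // 8, d // 4]


def n_windows(d, window, stride=1):
    """Number of window positions when sliding over d features."""
    return (d - window) // stride + 1


sizes = window_sizes(400)
counts = [n_windows(400, w) for w in sizes]
print(sizes)   # [25, 50, 100]
print(counts)  # [376, 351, 301]
```

Smaller windows yield more positions per sample, so the finest grain produces the most instances for the window-level forests.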
"look_indexs_cycle": [
[0, 1],
[2, 3],
[4, 5]]
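This specifies how the multi-grained outputs are fed into the cascade: the first cascade layer takes the outputs of forests 0 and 1, the second layer those of forests 2 and 3, and the third layer those of forests 4 and 5.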
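The assignment described above can be sketched as a simple lookup. The sketch assumes that the pattern repeats for deeper layers, as the word "cycle" suggests; inputs_for_layer is a hypothetical helper, and the output names are taken from the "outputs" list of the configuration in this article.

```python
LOOK_INDEXS_CYCLE = [[0, 1], [2, 3], [4, 5]]

# The six multi-grained outputs, in the order given by "outputs" in the config.
OUTPUT_NAMES = [
    "pool1/7x7/ets", "pool1/7x7/rf",
    "pool1/10x10/ets", "pool1/10x10/rf",
    "pool1/13x13/ets", "pool1/13x13/rf",
]


def inputs_for_layer(layer_id, look_indexs_cycle=LOOK_INDEXS_CYCLE):
    """Indices of the multi-grained outputs that cascade layer layer_id sees."""
    # Assumption: layers beyond the end of the list wrap around.
    return look_indexs_cycle[layer_id % len(look_indexs_cycle)]


for layer_id in range(5):
    names = [OUTPUT_NAMES[i] for i in inputs_for_layer(layer_id)]
    print(layer_id, names)
```

Under this assumption, layer 3 sees the 7x7 outputs again, layer 4 the 10x10 outputs, and so on.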
{
  "net":{
    "outputs": ["pool1/7x7/ets", "pool1/7x7/rf", "pool1/10x10/ets", "pool1/10x10/rf", "pool1/13x13/ets", "pool1/13x13/rf"],
    "layers":[
      // win1/7x7
      {
        "type":"FGWinLayer",
        "name":"win1/7x7",
        "bottoms": ["X","y"],
        "tops":["win1/7x7/ets", "win1/7x7/rf"],
        "n_classes": 10,
        "estimators": [
          {"n_folds":3,"type":"ExtraTreesClassifier","n_estimators":20,"max_depth":10,"n_jobs":-1,"min_samples_leaf":10},
          {"n_folds":3,"type":"RandomForestClassifier","n_estimators":20,"max_depth":10,"n_jobs":-1,"min_samples_leaf":10}
        ],
        "stride_x": 2,
        "stride_y": 2,
        "win_x":7,
        "win_y":7
      },
      // win1/10x10
      {
        "type":"FGWinLayer",
        "name":"win1/10x10",
        "bottoms": ["X","y"],
        "tops":["win1/10x10/ets", "win1/10x10/rf"],
        "n_classes": 10,
        "estimators": [
          {"n_folds":3,"type":"ExtraTreesClassifier","n_estimators":20,"max_depth":10,"n_jobs":-1,"min_samples_leaf":10},
          {"n_folds":3,"type":"RandomForestClassifier","n_estimators":20,"max_depth":10,"n_jobs":-1,"min_samples_leaf":10}
        ],
        "stride_x": 2,
        "stride_y": 2,
        "win_x":10,
        "win_y":10
      },
      // win1/13x13
      {
        "type":"FGWinLayer",
        "name":"win1/13x13",
        "bottoms": ["X","y"],
        "tops":["win1/13x13/ets", "win1/13x13/rf"],
        "n_classes": 10,
        "estimators": [
          {"n_folds":3,"type":"ExtraTreesClassifier","n_estimators":20,"max_depth":10,"n_jobs":-1,"min_samples_leaf":10},
          {"n_folds":3,"type":"RandomForestClassifier","n_estimators":20,"max_depth":10,"n_jobs":-1,"min_samples_leaf":10}
        ],
        "stride_x": 2,
        "stride_y": 2,
        "win_x":13,
        "win_y":13
      },
      // pool1
      {
        "type":"FGPoolLayer",
        "name":"pool1",
        "bottoms": ["win1/7x7/ets", "win1/7x7/rf", "win1/10x10/ets", "win1/10x10/rf", "win1/13x13/ets", "win1/13x13/rf"],
        "tops": ["pool1/7x7/ets", "pool1/7x7/rf", "pool1/10x10/ets", "pool1/10x10/rf", "pool1/13x13/ets", "pool1/13x13/rf"],
        "pool_method": "avg",
        "win_x":2,
        "win_y":2
      }
    ]
  },
  "cascade": {
    "random_state": 0,
    "max_layers": 100,
    "early_stopping_rounds": 3,
    "look_indexs_cycle": [
      [0, 1],
      [2, 3],
      [4, 5]
    ],
    "n_classes": 10,
    "estimators": [
      {"n_folds":5,"type":"ExtraTreesClassifier","n_estimators":1000,"max_depth":null,"n_jobs":-1},
      {"n_folds":5,"type":"RandomForestClassifier","n_estimators":1000,"max_depth":null,"n_jobs":-1}
    ]
  }
}
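Note that this configuration contains // comment lines, which strict JSON does not allow, so json.loads rejects the text as written. The library's own loader may handle this differently; the sketch below is just one simple way to read such a file, assuming comments always sit on a line of their own. strip_line_comments is a hypothetical helper, and the sample text is a shortened stand-in for the real file.

```python
import json

SAMPLE_TEXT = """
{
  "net": {
    "layers": [
      // win1/7x7
      {"type": "FGWinLayer", "name": "win1/7x7", "win_x": 7, "win_y": 7},
      // pool1
      {"type": "FGPoolLayer", "name": "pool1", "pool_method": "avg"}
    ]
  }
}
"""


def strip_line_comments(text):
    """Drop lines that consist only of a // comment.

    Only whole-line comments are removed, so a "//" inside a string value
    (for example in a URL) is left untouched.
    """
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("//")]
    return "\n".join(kept)


parsed = json.loads(strip_line_comments(SAMPLE_TEXT))
layer_names = [layer["name"] for layer in parsed["net"]["layers"]]
print(layer_names)  # ['win1/7x7', 'pool1']
```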