A Summary of the Bagging and Boosting Ensemble Learning Algorithms

Technical exchange QQ group: 1027579432 — you are welcome to join!

I. Overview of Ensemble Learning

    1. An ensemble method, or meta-algorithm, is a way of combining other algorithms; the rest of this post focuses on the AdaBoost meta-algorithm. Different classifiers are combined, and the combined result is called an ensemble method or meta-algorithm. Ensembles can take many forms, for example:
    • an ensemble of different algorithms
    • an ensemble of the same algorithm under different settings
    • an ensemble in which different parts of the dataset are assigned to different classifiers
    2. Strengths and weaknesses of AdaBoost
    • Strengths: low generalization error, easy to code, applicable to most classifiers, no parameters to tune
    • Weaknesses: sensitive to outliers
    • Works with: numeric and nominal values

II. Classifiers Based on Multiple Resamplings of the Dataset

    1. Bagging (bootstrap aggregating)
    • Main idea (a minimal sketch in code follows this list):
      • (1). Draw new training sets from the original dataset. Each draw samples n examples with replacement, so some examples may be picked several times and others not at all. Repeating the draw k times yields k new datasets, mutually independent of one another and each the same size as the original.
      • (2). Train one model on each new training set, for k models in total.
      • (3). For classification, combine the k models from (2) by majority vote; for regression, average their outputs (every model carries equal weight!).
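    To make steps (1)-(3) concrete, here is a minimal sketch of bagging for ±1 labels. It assumes numpy arrays for the data and scikit-learn's DecisionTreeClassifier as the base learner; the function name and estimator choice are illustrative, not from the original post.

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def bagging_predict(X_train, y_train, X_test, k=10, seed=0):
    """Train k trees on bootstrap resamples and combine them by majority vote."""
    rng = np.random.RandomState(seed)
    n = X_train.shape[0]
    votes = []
    for _ in range(k):
        idx = rng.randint(0, n, size=n)  # draw n indices with replacement
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    # Majority vote over the k models (a tie yields 0); for regression,
    # return np.mean(votes, axis=0) instead.
    return np.sign(np.sum(votes, axis=0))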
    2. Boosting
    • In both bagging and boosting, the multiple classifiers used are all of the same type. In boosting, however, the classifiers are trained sequentially: each new classifier is trained in light of the performance of the classifiers already built. Boosting obtains new classifiers by concentrating on the data that the existing classifiers misclassify. Boosting comes in several versions; the one described below, AdaBoost, is the most popular.
    • Main idea (a compact sketch in code follows this list):
      • (1). Every training sample carries a weight, and each round's weight distribution depends on the previous round's classification results.
      • (2). The base classifiers are combined sequentially as a weighted linear sum.
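    A compact sketch of this sequential re-weighting, again assuming numpy arrays and scikit-learn decision stumps as the base learner (illustrative only; section IV walks through a from-scratch AdaBoost):

import numpy as np
from sklearn.tree import DecisionTreeClassifier


def boosting_fit(X, y, rounds=10):
    """Fit weighted stumps in sequence, up-weighting after each round the
    samples the current classifiers get wrong (labels are assumed +1/-1)."""
    n = X.shape[0]
    D = np.full(n, 1.0 / n)  # uniform initial sample weights
    models, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = stump.predict(X)
        eps = D[pred != y].sum()  # weighted error of this round's stump
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-16))  # classifier weight
        D *= np.exp(-alpha * y * pred)  # misclassified samples grow, correct ones shrink
        D /= D.sum()
        models.append(stump)
        alphas.append(alpha)
    # Combined prediction: sign(sum(alpha_i * model_i.predict(X)))
    return models, alphas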

III. Bagging vs. Boosting

    1. Sample selection:
    • Bagging: each new training set is drawn from the original training set with replacement, and the new training sets are independent of one another.
    • Boosting: the training set stays the same from round to round; only the weight of each sample changes, and the weights are adjusted according to the previous round's classification results.
    2. Sample weights:
    • Bagging: samples are drawn uniformly, so every sample has the same weight.
    • Boosting: sample weights are continually adjusted according to the classification results; samples that keep being misclassified receive ever larger weights.
    3. Prediction functions:
    • Bagging: all prediction functions carry equal weight.
    • Boosting: every weak classifier has its own weight, and classifiers with smaller classification error receive larger weights.
    4. Parallelism:
    • Bagging: the individual prediction functions can be computed in parallel.
    • Boosting: the prediction functions can only be generated sequentially, because each model needs the output of the model before it.

IV. Common Applications of Ensemble Learning

    1. Common algorithms (an off-the-shelf illustration follows this list)
    • Bagging + decision trees = random forest
    • AdaBoost + decision trees = boosted trees
    • Gradient boosting + decision trees = GBDT
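    All three correspondences are available ready-made in scikit-learn; the sketch below assumes scikit-learn is installed and uses its class names (they are the library's, not the original post's):

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)

X, y = make_classification(n_samples=200, random_state=0)
for model in (RandomForestClassifier(n_estimators=50, random_state=0),       # Bagging + trees
              AdaBoostClassifier(n_estimators=50, random_state=0),           # AdaBoost + trees
              GradientBoostingClassifier(n_estimators=50, random_state=0)):  # GBDT
    print(type(model).__name__, model.fit(X, y).score(X, y))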
    2. Boosting classifier performance from the error rate (how AdaBoost works)
    • 2.1 Introduction to AdaBoost
      The idea behind the ensemble: build a strong classifier out of weak classifiers and repeated use of the samples. AdaBoost is short for adaptive boosting, and it runs as follows. First, every sample in the training set is assigned a weight; the weights form a vector D, and at the start all samples have the same weight. A weak classifier is then trained on the data and its error rate is computed. Next, a weak classifier is trained again on the same data, but in this second pass the sample weights are readjusted: samples the first pass classified correctly have their weights lowered, and samples it misclassified have their weights raised. Finally, to combine all the weak classifiers into one result, AdaBoost also assigns each weak classifier a weight alpha, computed from that classifier's error rate.
    • 2.2 The error rate ε is defined as:

      ε = (number of misclassified samples) / (total number of samples)
    • 2.3 alpha is computed from the error rate as:

      α = (1/2) · ln((1 − ε) / ε)
    • 2.4 The AdaBoost training loop (figure: AdaBoost算法流程.png)
    • 2.5 The figure can be read as follows:
      • First, every sample in the training set is given the same initial weight; together the weights form the weight vector D. After the first weak classifier is trained, the sample weights change: the classifier's error rate ε is computed from its predictions, alpha is computed from ε, and with alpha known the weight vector D is updated so that samples the first classifier misclassified get larger weights while correctly classified samples get smaller weights. D is updated as follows:
        • 2.5.1 If a sample was classified correctly by the first weak classifier, its weight is updated as:

          D_i^(t+1) = D_i^(t) · e^(−α) / Sum(D)
        • 2.5.2 If a sample was misclassified by the first weak classifier, its weight is updated as:

          D_i^(t+1) = D_i^(t) · e^(α) / Sum(D)
      • Once D has been recomputed, AdaBoost starts the next round, repeatedly training and re-weighting until the training error rate reaches 0 or the number of weak classifiers reaches the user-specified limit.
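      • A quick sanity check on the alpha formula: in the run at the end of this post, the best first-round stump has ε = 0.2, so α = 0.5 · ln(0.8 / 0.2) = 0.5 · ln 4 ≈ 0.6931, which is exactly the ±0.69314718 that appears in the first aggClassEst printout.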
    3. AdaBoost in practice (building weak classifiers from decision stumps)

      (Figure: 數據集可視化.png — scatter plot of the toy dataset)
  • 3.1 As the figure shows, no single value on either axis (that is, no line parallel to a coordinate axis) can separate all the blue points from all the orange points. This is the classic problem a decision stump cannot handle on its own. By combining several decision stumps, however, we can build a classifier that classifies this dataset perfectly.
  ################# Dataset visualization #####################

import numpy as np
import matplotlib.pyplot as plt


def loadSimData():
    """
    Create the dataset for the decision stump example.
    """
    dataMat = np.matrix([[1., 2.1],
                         [1.5, 1.6],
                         [1.3, 1.],
                         [1., 1.],
                         [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels


def showDataSet(dataMat, labelMat):
    """
    Visualize the dataset.
    """
    data_plus = []  # positive samples
    data_minus = []  # negative samples
    for i in range(len(dataMat)):
        if labelMat[i] > 0:
            data_plus.append(dataMat[i])
        else:
            data_minus.append(dataMat[i])
    data_plus_np = np.array(data_plus)  # convert to numpy arrays
    data_minus_np = np.array(data_minus)
    plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1])
    plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1])
    plt.title("Dataset Visualize")
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.show()


if __name__ == '__main__':
    data_Arr, classLabels = loadSimData()
    showDataSet(data_Arr, classLabels)
(Figure: 數據集可視化2.png — the dataset with a candidate horizontal split)
  • 3.2 Points above the blue horizontal line fall into one class and points below it into the other. Clearly, one blue point is then misclassified, giving a classification error rate of 1/5 = 0.2. The intersection of this horizontal line with the y axis is the threshold we set; by sweeping the threshold, we look for the value that minimizes the stump's classification error. The same goes for vertical lines: finding the threshold that classifies best gives the best decision stump.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2019-05-12 21:31:41
# @Author  : cdl (1217096231@qq.com)
# @Link    : https://github.com/cdlwhm1217096231/python3_spider
# @Version : $Id$

import numpy as np
import matplotlib.pyplot as plt

# Dataset visualization


def loadSimpleData():
    dataMat = np.matrix([[1., 2.1],
                         [1.5, 1.6],
                         [1.3, 1.],
                         [1., 1.],
                         [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels


def showDataSet(dataMat, labelMat):
    data_plus = []
    data_minus = []
    for i in range(len(dataMat)):
        if labelMat[i] > 0:
            data_plus.append(dataMat[i])
        else:
            data_minus.append(dataMat[i])
    data_plus_np = np.array(data_plus)
    data_minus_np = np.array(data_minus)
    plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1])
    plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1])
    plt.title("dataset visualize")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()


# Decision-stump classification function
def stumpClassify(dataMat, dimen, threshval, threshIneq):
    """
        dataMat: data matrix
        dimen: column index, i.e. which feature to split on
        threshval: threshold
        threshIneq: comparison flag ("lt" or "gt")
        returns retArray: the classification result
    """
    retArray = np.ones((np.shape(dataMat)[0], 1))  # initialize retArray to all 1s
    if threshIneq == "lt":
        retArray[dataMat[:, dimen] <= threshval] = -1.0   # values at or below the threshold get -1
    else:
        retArray[dataMat[:, dimen] > threshval] = -1.0   # values above the threshold get -1
    return retArray
# Find the best decision stump for the dataset. A decision stump considers only one feature and classifies on it, so we just look for the threshold with the lowest classification error. In this post's example, taking the first feature and a threshold of 1.3, labeling values > 1.3 as -1 and values < 1.3 as +1, already gives a binary classifier.


def buildStump(dataMat, classLabels, D):
    """
        dataMat: data matrix
        classLabels: data labels
        D: sample weights
        returns: bestStump: info about the best stump; minError: the smallest error; bestClasEst: the best classification result
    """
    dataMat = np.matrix(dataMat)
    labelMat = np.matrix(classLabels).T
    m, n = np.shape(dataMat)
    numSteps = 10.0
    bestStump = {}  # dictionary holding the best stump's parameters
    bestClasEst = np.mat(np.zeros((m, 1)))   # best classification result
    minError = float("inf")
    for i in range(n):  # iterate over all features
        rangeMin = dataMat[:, i].min()
        rangeMax = dataMat[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps  # step size for the threshold sweep
        for j in range(-1, int(numSteps) + 1):
            for inequal in ["lt", "gt"]:
                threshval = (rangeMin + float(j) * stepSize)  # candidate threshold
                predictVals = stumpClassify(
                    dataMat, i, threshval, inequal)  # classify with this stump
                errArr = np.mat(np.ones((m, 1)))  # initialize the error vector
                errArr[predictVals == labelMat] = 0  # correctly classified samples get 0
                # The stump is scored with the weight vector D rather than a plain error count
                weightedError = D.T * errArr  # weighted error of this weak classifier, not the usual unweighted accuracy
                print("split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (
                    i, threshval, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictVals.copy()
                    bestStump["dim"] = i
                    bestStump["thresh"] = threshval
                    bestStump["ineq"] = inequal
    return bestStump, minError, bestClasEst
  • 3.3 By sweeping over different thresholds and computing the resulting classification error, we find the split with the smallest error: that is the best decision stump. Here "lt" stands for less than (samples at or below the threshold are assigned -1) and "gt" for greater than (samples above the threshold are assigned -1). The sweep shows that the best single decision stump on this dataset has a minimum classification error of 0.2; no decision stump on its own can do better. That is our trained weak classifier. Next we use AdaBoost to boost the classifier's performance and drive the classification error down to 0; let's see how AdaBoost achieves this.
# Use AdaBoost to boost the weak classifiers' performance
def adbBoostTrainDS(dataMat, classLabels, numIt=40):
    """
        dataMat: data matrix
        classLabels: label matrix
        numIt: maximum number of iterations
        returns: weakClassArr: the trained weak classifiers; aggClassEst: the accumulated class estimates
    """
    weakClassArr = []
    m = np.shape(dataMat)[0]
    D = np.mat(np.ones((m, 1)) / m)  # initialize the sample weights D uniformly
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(
            dataMat, classLabels, D)  # build one decision stump
        # Compute the weak classifier's weight alpha; max(error, 1e-16) keeps the denominator nonzero
        alpha = float(0.5 * np.log((1.0 - error) / max(error, 1e-16)))
        bestStump["alpha"] = alpha   # store each weak classifier's weight alpha
        weakClassArr.append(bestStump)  # store this decision stump
        print("classEst: ", classEst.T)
        expon = np.multiply(-1 * alpha *
                            np.mat(classLabels).T, classEst)  # exponent for the weight update
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()

        # Compute the AdaBoost training error; when it reaches 0, stop early
        aggClassEst += alpha * classEst  # accumulated class estimate -- note it includes every weak classifier trained so far
        print("aggClassEst: ", aggClassEst.T)
        aggErrors = np.multiply(np.sign(aggClassEst) != np.mat(
            classLabels).T, np.ones((m, 1)))  # errors of the current ensemble
        errorRate = aggErrors.sum() / m  # ensemble error rate; when it hits 0, training is done and the whole algorithm stops
        print("total error: ", errorRate)
        if errorRate == 0.0:
            break
    return weakClassArr, aggClassEst
  • 3.4 Using AdaBoost to boost the classifier's performance
# AdaBoost classification function
def adaClassify(dataToClass, classifier):
    """
        dataToClass: samples to classify
        classifier: the trained strong classifier
    """
    dataMat = np.mat(dataToClass)
    m = np.shape(dataMat)[0]
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(len(classifier)):   # run every weak classifier and accumulate its weighted vote
        classEst = stumpClassify(
            dataMat, classifier[i]["dim"], classifier[i]["thresh"], classifier[i]["ineq"])
        aggClassEst += classifier[i]["alpha"] * classEst
        print(aggClassEst)
    return np.sign(aggClassEst)
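  Trained on the toy dataset, the ensemble classifies new points by summing the weighted stump votes; for example (this matches the final printout of the run below):

dataMat, classLabels = loadSimpleData()
weakClassArr, aggClassEst = adbBoostTrainDS(dataMat, classLabels)
print(adaClassify([[0, 0], [5, 5]], weakClassArr))   # expected: [[-1.], [1.]]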
  • 3.5 The complete AdaBoost code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Date    : 2019-05-12 21:31:41
# @Author  : cdl (1217096231@qq.com)
# @Link    : https://github.com/cdlwhm1217096231/python3_spider
# @Version : $Id$

import numpy as np
import matplotlib.pyplot as plt

# Dataset visualization


def loadSimpleData():
    dataMat = np.matrix([[1., 2.1],
                         [1.5, 1.6],
                         [1.3, 1.],
                         [1., 1.],
                         [2., 1.]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return dataMat, classLabels


def showDataSet(dataMat, labelMat):
    data_plus = []
    data_minus = []
    for i in range(len(dataMat)):
        if labelMat[i] > 0:
            data_plus.append(dataMat[i])
        else:
            data_minus.append(dataMat[i])
    data_plus_np = np.array(data_plus)
    data_minus_np = np.array(data_minus)
    plt.scatter(np.transpose(data_plus_np)[0], np.transpose(data_plus_np)[1])
    plt.scatter(np.transpose(data_minus_np)[0], np.transpose(data_minus_np)[1])
    plt.title("dataset visualize")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()


# Decision-stump classification function
def stumpClassify(dataMat, dimen, threshval, threshIneq):
    """
        dataMat: data matrix
        dimen: column index, i.e. which feature to split on
        threshval: threshold
        threshIneq: comparison flag ("lt" or "gt")
        returns retArray: the classification result
    """
    retArray = np.ones((np.shape(dataMat)[0], 1))  # initialize retArray to all 1s
    if threshIneq == "lt":
        retArray[dataMat[:, dimen] <= threshval] = -1.0   # values at or below the threshold get -1
    else:
        retArray[dataMat[:, dimen] > threshval] = -1.0   # values above the threshold get -1
    return retArray


# Find the best decision stump for the dataset. A decision stump considers only one feature and classifies on it, so we just look for the threshold with the lowest classification error. In this post's example, taking the first feature and a threshold of 1.3, labeling values > 1.3 as -1 and values < 1.3 as +1, already gives a binary classifier.
def buildStump(dataMat, classLabels, D):
    """
        dataMat: data matrix
        classLabels: data labels
        D: sample weights
        returns: bestStump: info about the best stump; minError: the smallest error; bestClasEst: the best classification result
    """
    dataMat = np.matrix(dataMat)
    labelMat = np.matrix(classLabels).T
    m, n = np.shape(dataMat)
    numSteps = 10.0
    bestStump = {}  # dictionary holding the best stump's parameters
    bestClasEst = np.mat(np.zeros((m, 1)))   # best classification result
    minError = float("inf")
    for i in range(n):  # iterate over all features
        rangeMin = dataMat[:, i].min()
        rangeMax = dataMat[:, i].max()
        stepSize = (rangeMax - rangeMin) / numSteps  # step size for the threshold sweep
        for j in range(-1, int(numSteps) + 1):
            for inequal in ["lt", "gt"]:
                threshval = (rangeMin + float(j) * stepSize)  # candidate threshold
                predictVals = stumpClassify(
                    dataMat, i, threshval, inequal)  # classify with this stump
                errArr = np.mat(np.ones((m, 1)))  # initialize the error vector
                errArr[predictVals == labelMat] = 0  # correctly classified samples get 0
                # The stump is scored with the weight vector D rather than a plain error count
                weightedError = D.T * errArr  # weighted error of this weak classifier, not the usual unweighted accuracy
                print("split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (
                    i, threshval, inequal, weightedError))
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictVals.copy()
                    bestStump["dim"] = i
                    bestStump["thresh"] = threshval
                    bestStump["ineq"] = inequal
    return bestStump, minError, bestClasEst


# Use AdaBoost to boost the weak classifiers' performance
def adbBoostTrainDS(dataMat, classLabels, numIt=40):
    """
        dataMat: data matrix
        classLabels: label matrix
        numIt: maximum number of iterations
        returns: weakClassArr: the trained weak classifiers; aggClassEst: the accumulated class estimates
    """
    weakClassArr = []
    m = np.shape(dataMat)[0]
    D = np.mat(np.ones((m, 1)) / m)  # initialize the sample weights D uniformly
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(numIt):
        bestStump, error, classEst = buildStump(
            dataMat, classLabels, D)  # build one decision stump
        # Compute the weak classifier's weight alpha; max(error, 1e-16) keeps the denominator nonzero
        alpha = float(0.5 * np.log((1.0 - error) / max(error, 1e-16)))
        bestStump["alpha"] = alpha   # store each weak classifier's weight alpha
        weakClassArr.append(bestStump)  # store this decision stump
        print("classEst: ", classEst.T)
        expon = np.multiply(-1 * alpha *
                            np.mat(classLabels).T, classEst)  # exponent for the weight update
        D = np.multiply(D, np.exp(expon))
        D = D / D.sum()

        # Compute the AdaBoost training error; when it reaches 0, stop early
        aggClassEst += alpha * classEst  # accumulated class estimate -- note it includes every weak classifier trained so far
        print("aggClassEst: ", aggClassEst.T)
        aggErrors = np.multiply(np.sign(aggClassEst) != np.mat(
            classLabels).T, np.ones((m, 1)))  # errors of the current ensemble
        errorRate = aggErrors.sum() / m  # ensemble error rate; when it hits 0, training is done and the whole algorithm stops
        print("total error: ", errorRate)
        if errorRate == 0.0:
            break
    return weakClassArr, aggClassEst


# AdaBoost classification function
def adaClassify(dataToClass, classifier):
    """
        dataToClass: samples to classify
        classifier: the trained strong classifier
    """
    dataMat = np.mat(dataToClass)
    m = np.shape(dataMat)[0]
    aggClassEst = np.mat(np.zeros((m, 1)))
    for i in range(len(classifier)):   # run every weak classifier and accumulate its weighted vote
        classEst = stumpClassify(
            dataMat, classifier[i]["dim"], classifier[i]["thresh"], classifier[i]["ineq"])
        aggClassEst += classifier[i]["alpha"] * classEst
        print(aggClassEst)
    return np.sign(aggClassEst)


if __name__ == "__main__":
    dataMat, classLabels = loadSimpleData()
    showDataSet(dataMat, classLabels)
    weakClassArr, aggClassEst = adbBoostTrainDS(dataMat, classLabels)
    print(adaClassify([[0, 0], [5, 5]], weakClassArr))

  • The output:
      split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.200
      split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.200
      split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.400
      split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.400
      split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.600
      split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.400
      split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.200
      split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.200
      split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.200
      split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.200
      split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.200
      split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.200
      split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.400
      split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.400
      split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.400
      split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.400
      split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.400
      split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.600
      split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.400
      classEst:  [[-1.  1. -1. -1.  1.]]
      aggClassEst:  [[-0.69314718  0.69314718 -0.69314718 -0.69314718  0.69314718]]
      total error:  0.2
      split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.250
      split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.500
      split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.500
      split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.625
      split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.250
      split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.750
      split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.250
      split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.125
      split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.125
      split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.125
      split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.125
      split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.125
      split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.125
      split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.250
      split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.250
      split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.250
      split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.250
      split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.250
      split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.750
      split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.250
      classEst:  [[ 1.  1. -1. -1. -1.]]
      aggClassEst:  [[ 0.27980789  1.66610226 -1.66610226 -1.66610226 -0.27980789]]
      total error:  0.2
      split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.143
      split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.286
      split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.286
      split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.357
      split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.143
      split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.857
      split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.143
      split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.500
      split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.500
      split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.500
      split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.500
      split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.500
      split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.500
      split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.571
      split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.571
      split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.571
      split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.571
      split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.143
      split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.857
      split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.143
      classEst:  [[1. 1. 1. 1. 1.]]
      aggClassEst:  [[ 1.17568763  2.56198199 -0.77022252 -0.77022252  0.61607184]]
      total error:  0.0
      [[-0.69314718]
     [ 0.69314718]]
      [[-1.66610226]
     [ 1.66610226]]
      [[-2.56198199]
     [ 2.56198199]]
      [[-1.]
       [ 1.]]
    