Python 2.7
IDE Pycharm 5.0.3
numpy 1.11.0
matplotlib 1.5.1

Recommended reading first:
1. (Big) Data Processing: From txt to Data Visualization
2. Machine Learning: K-Nearest Neighbors Algorithm (Python) Basics

This tutorial is based on Chapter 2 of Machine Learning in Action.
The code and data are in a zip archive in the DataSource folder of github@Mini-Python-Project.

Preface

Having covered the basics of kNN, we now use kNN to solve a real problem.

Goal

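Analyze data saved in a txt file, classify new data with the kNN algorithm, measure the classifier's accuracy, and perform matching. If anything is unclear, please read the basics first: @MrLevo520--Machine Learning: K-Nearest Neighbors Algorithm (Python) Basics.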

First: visualize the data

The raw data stored in the txt file looks like this:

[Figure: the raw tab-separated rows of datingTestSet2.txt]

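All you need to know:
In each row, the first column is frequent-flier miles earned per year, the second is the percentage of time spent playing video games, the third is liters of ice cream consumed per year, and the fourth is how interested a certain person (call them xx) is in dating someone like this. For example, one row describes someone who flies 40,920 frequent-flier miles a year, spends about 8% of their time playing video games, and eats 0.9 liters of ice cream a year, and xx finds them very attractive and would really like a date. That's all there is to it!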

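For details, see (Big) Data Processing: From txt to Data Visualization; I won't go into them here. Below is a plot of the data; for how it was read in, see that article.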

[Figure: scatter plot of the dating data]

Normalizing feature values

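In short, normalization squashes every value into the range 0 to 1: each value x in a column becomes (x - min) / (max - min), using that column's min and max. This removes the outsized influence that features with large numeric ranges would otherwise have once differences are squared in the distance formula, so every feature carries equal weight. If you consider some feature particularly important, you can give it a larger weight (normalization by default weights all features equally), but that is a topic for later. Here is the added code: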

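from numpy import tile

#Normalization
def autoNorm(dataSet):
    minVals = dataSet.min(0) # minimum of each column, returned as one row
    maxVals = dataSet.max(0) # maximum of each column
    ranges = maxVals-minVals # max minus min of each column, one row
    m = dataSet.shape[0] # number of rows
    normDataSet = dataSet -tile(minVals,(m,1)) # subtract each column's minimum
    normDataSet = normDataSet/tile(ranges,(m,1)) # divide element-wise by each column's range; the result is a matrix
    return normDataSet,ranges,minVals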

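The output of this step is the normalized data, ready for the next step. If your data has already been preprocessed into a directly usable range, however, normalization is unnecessary, so check your own dataset first.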
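To make the effect of normalization concrete, here is a standard-library sketch (no NumPy) of the same per-column formula, (x - min) / (max - min). It is applied to three hypothetical rows in the miles / game % / ice cream layout described earlier. The rows are made up, not taken from datingTestSet2.txt, and `min_max_normalize` and `euclidean` are illustrative helper names, not functions from the book.

```python
import math

# Hypothetical helper (not from the book): per-column min-max scaling like autoNorm
def min_max_normalize(rows):
    """Scale every column of `rows` (equal-length lists) into [0, 1].

    Returns (normalized_rows, ranges, min_vals), mirroring autoNorm.
    Assumes no column is constant (a range of 0 would divide by zero).
    """
    columns = list(zip(*rows))
    min_vals = [min(col) for col in columns]
    ranges = [max(col) - min(col) for col in columns]
    normalized = [
        [(x - lo) / float(r) for x, lo, r in zip(row, min_vals, ranges)]
        for row in rows
    ]
    return normalized, ranges, min_vals

# Hypothetical helper: plain Euclidean distance, the metric classify0 uses
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up samples: [frequent-flier miles, % time gaming, liters of ice cream]
rows = [
    [40000.0, 10.0, 1.0],
    [41000.0, 0.0, 0.1],   # similar miles, very different habits
    [10000.0, 10.0, 1.0],  # very different miles, identical habits
]

raw_ab = euclidean(rows[0], rows[1])
raw_ac = euclidean(rows[0], rows[2])

norm, ranges, min_vals = min_max_normalize(rows)
norm_ab = euclidean(norm[0], norm[1])
norm_ac = euclidean(norm[0], norm[2])
```

On the raw values, row 0 looks far closer to row 1 (distance about 1000, almost all from the miles column) than to row 2 (30000). After normalization the ranking flips (about 1.41 versus about 0.97), because a 1000-mile gap no longer outweighs completely different gaming and ice-cream habits.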

The idea behind validating the classifier

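With the preparation done, we can test the classifier's accuracy.
The steps are:
1. Split the dataset into a test set and a training set (strictly speaking, kNN has no training step; the "training set" is simply the reference data the test samples are compared against).
2. Hide the test set's labels, feed in only the features, and let the kNN algorithm predict each label.
3. The test set's own labels are known to be correct. They are only set aside, and are then used to check whether kNN predicted correctly.
4. Error rate = number of wrongly predicted labels / total number of test samples.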
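The four steps can be sketched without NumPy as follows. This is an illustrative standard-library version, not the code below. `knn_classify` (a hypothetical name) mirrors classify0: Euclidean distance plus a majority vote, here counted with collections.Counter. `holdout_error_rate` (also hypothetical) holds out the first `ratio` of the rows, exactly as datingTest does. The toy data is made up.

```python
import math
from collections import Counter

# Hypothetical helper: same idea as classify0, in plain Python
def knn_classify(x, train_rows, train_labels, k):
    """Label x by majority vote among its k nearest training rows (Euclidean distance)."""
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(x, row)))
             for row in train_rows]
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical helper: steps 1-4 above
def holdout_error_rate(rows, labels, ratio, k):
    """Hold out the first `ratio` of the rows, predict them, return wrong / total."""
    num_test = int(len(rows) * ratio)               # step 1: how many test samples
    if num_test == 0:
        raise ValueError("ratio too small: no test samples")
    train_rows, train_labels = rows[num_test:], labels[num_test:]
    errors = 0
    for i in range(num_test):
        predicted = knn_classify(rows[i], train_rows, train_labels, k)  # step 2: label hidden
        if predicted != labels[i]:                  # step 3: compare with the true label
            errors += 1
    return errors / float(num_test)                 # step 4: error rate

# Made-up toy data: two well-separated clusters, classes interleaved
rows = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0],
        [11, 10], [1, 1], [11, 11], [0.5, 0.5], [10.5, 10.5]]
labels = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
rate = holdout_error_rate(rows, labels, 0.2, 3)
```

The test set is simply the first rows of the data. This therefore only gives a fair estimate when the rows are not ordered by class, which is why the toy data interleaves the two classes.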

Code for measuring the classifier's accuracy

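# -*- coding: utf-8 -*-
from numpy import *
import operator

def classify0(inX,dataSet,labels,k): # inX: the sample to classify; dataSet: the training set

    ######Distance between the input and every training sample######
    dataSetSize = dataSet.shape[0] # number of rows; shape[1] would be the number of columns
    diffMat = tile(inX,(dataSetSize,1))-dataSet # tile repeats inX dataSetSize times down the rows and once across the columns
    sqDiffMat = diffMat**2 # element-wise square
    sqDistances = sqDiffMat.sum(axis=1) # sum across each row (axis=0 would sum down each column)
    distances = sqDistances**0.5


    sortedDistIndicies = distances.argsort() # argsort returns the indices that would sort the array in ascending order
    #print sortedDistIndicies # an array of indices, nearest sample first
    classCount={}

    ######Count the votes in a dictionary######
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]] # label of the i-th nearest neighbour
        classCount[voteIlabel] = classCount.get(voteIlabel,0)+1 # add one vote; the count accumulates in the dictionary

        #get(key,default) returns default when key is not in the dictionary yet


    ######Find the label with the most votes######
    sortedClassCount = sorted(classCount.iteritems(),key = operator.itemgetter(1),reverse=True)
    #sort by value (the vote count); reverse=True gives descending order
    #operator.itemgetter(1) does not fetch a value itself; it returns a function that fetches item 1 when applied to an object

    return sortedClassCount[0][0]
    #return the key (label) with the most votes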


def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    fr.close()
    numberOfLines = len(arrayOlines)
    returnMat = zeros((numberOfLines,3)) # all-zero matrix to hold the features
    classLabelVector = [] # container for the labels
    index = 0

    for line in arrayOlines:
        #clean the line
        line = line.strip()
        listFromLine = line.split('\t')
        #store the values
        returnMat[index,:] = listFromLine[0:3] # the three features fill the three columns of one row
        classLabelVector.append(int(listFromLine[-1])) # the last column is the class label
        index +=1
    return returnMat,classLabelVector


#Normalization
def autoNorm(dataSet):
    minVals = dataSet.min(0) # minimum of each column, returned as one row
    maxVals = dataSet.max(0) # maximum of each column
    ranges = maxVals-minVals # max minus min of each column, one row
    m = dataSet.shape[0] # number of rows
    normDataSet = dataSet -tile(minVals,(m,1)) # subtract each column's minimum
    normDataSet = normDataSet/tile(ranges,(m,1)) # divide element-wise by each column's range; the result is a matrix
    return normDataSet,ranges,minVals


#Test the classifier's accuracy
def datingTest(HORATIO,K):
    hoRatio = HORATIO # fraction held out as test data, e.g. 0.1 for ten percent
    datingDataMat,datingLabels = file2matrix(r"C:\Users\MrLevo\Desktop\machine_learning_in_action\Ch02\datingTestSet2.txt")
    normMat,ranges,minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio) # number of test samples
    errorCount = 0.0

    for i in range(numTestVecs):
        # rows [0, numTestVecs) are the test set; rows [numTestVecs, m) serve as the training set
        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:],datingLabels[numTestVecs:m],K)

        print "the classifier came back with:%d,the real answer is %d"%(classifierResult,datingLabels[i])
        if classifierResult !=datingLabels[i]:
            errorCount +=1.0
        # printed on every iteration; only the last line is the final error rate
        print "the total error rate is : %f" % (errorCount/float(numTestVecs))

if __name__ == '__main__':
    HORATIO = input("Please enter test set (%all): ")
    K = input("Please enter the k: ")
    datingTest(HORATIO,K)

Input and output in the IDE:

Please enter test set (%all): 0.1
Please enter the k: 3

the classifier came back with:3,the real answer is 3
the total error rate is : 0.000000
the classifier came back with:2,the real answer is 2
the total error rate is : 0.000000

...

the classifier came back with:1,the real answer is 1
the total error rate is : 0.040000
the classifier came back with:3,the real answer is 1
the total error rate is : 0.050000

The error rate is 5%, which is acceptable.

Building a system usable in practice

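The application here is dating-site matching. The user enters what the person they have in mind is roughly like, such as the expected frequent-flier miles, how much they play video games, and how much ice cream they eat.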

The modified code is as follows:

def classifyPerson():
    resultList = ['not at all','in small doses','in large doses']
    percentTats = input("percentage of time spent playing video games?")
    ffMiles = input("frequent flier miles earned per year?")
    iceCream = input("liters of ice cream consumed per year?")
    datingDataMat,datingLabels = file2matrix(r"C:\Users\MrLevo\Desktop\machine_learning_in_action\Ch02\datingTestSet2.txt")
    normMat,ranges,minVals = autoNorm(datingDataMat)
    inArr =array([ffMiles,percentTats,iceCream]) # same column order as the file: miles, game %, ice cream
    # scale the query with the training set's minVals and ranges so it is comparable to normMat
    classifierResult = classify0((inArr-minVals)/ranges,normMat,datingLabels,3)
    print "You will probably like this person: ",resultList[classifierResult -1] # labels are 1-3, list indices 0-2

if __name__ == '__main__':
    classifyPerson()
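The key detail in classifyPerson is that the user's input is scaled with the minVals and ranges computed from the training data, not normalized on its own. A standard-library sketch of just that step follows. The helper name `normalize_query` is hypothetical, and the min/range values are made up, not taken from datingTestSet2.txt.

```python
# Hypothetical helper: the (inArr-minVals)/ranges step in plain Python
def normalize_query(query, min_vals, ranges):
    """Scale a new sample with the training set's per-column min and range."""
    return [(q - lo) / float(r) for q, lo, r in zip(query, min_vals, ranges)]

# Made-up training statistics for [miles, % gaming, liters of ice cream]
min_vals = [0.0, 0.0, 0.0]
ranges = [80000.0, 20.0, 2.0]

# Same column order as inArr: [ffMiles, percentTats, iceCream]
query = [40000.0, 10.0, 1.0]
scaled = normalize_query(query, min_vals, ranges)
```

The column order of the query must match the file's column order (miles first), even though the prompts ask for the gaming percentage first. A query outside the training range can map below 0 or above 1. The distance computation still works in that case; the values are just not bounded to [0, 1].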

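Let's try some examples. First, the plot: red points are "very interested", green points "somewhat interested", and black points "not interested".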

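[Figure: scatter plot of frequent-flier miles and percentage of time spent playing video games; red = very interested, green = somewhat interested, black = not interested]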

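First example. This is the user interface: 10, 40000, 1 and so on are entered by the user, and the system uses the algorithm to decide which category this person falls into.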

percentage of time spent playing video games?10
frequent flier miles earned per year?40000
liters of ice cream consumed per year?1
You will probably like this person:  in large doses

The plot above shows a dense cluster of red points (i.e. "in large doses") around 40,000 miles and 10% gaming time, which matches the result.

Another example:

percentage of time spent playing video games?3
frequent flier miles earned per year?40000
liters of ice cream consumed per year?1
You will probably like this person:  not at all

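From the plot, people with 3% gaming time and 40,000 miles fall in the "not interested" region, so this test passes too. For a more detailed analysis, see the example analysis in (Big) Data Processing: From txt to Data Visualization; the results largely agree.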

What's More!

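Of course, this is textbook material. It is good for understanding the algorithm and its structure and gives a deeper understanding of kNN, but that is not enough. So I boldly pulled out the data from my research project: AVIRIS, a hyperspectral remote-sensing image dataset. Simply put, it is a scaled-up version of the data above. The dimensionality goes from 3 to 200 (spectral bands), the number of samples from 1000 to 10266, and the number of classes from 3 to 13. That's all. Let's see what its structure looks like.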

[Figure: structure of the AVIRIS data]

This data only exists in MATLAB's .mat format. No problem: first convert it to txt.

From .mat to txt

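For how to convert .mat to txt, see the separate article I wrote: Solved: Saving a .mat File to .txt without Scientific Notation (e-0). After conversion, the file looks roughly like this: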

[Figure: the converted txt file]
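The conversion itself is covered in the separate article. The point about scientific notation can, however, be sketched with the standard library: Python's default `str()` of a small float such as 0.00001 is `1e-05`, while `%f` formatting always writes fixed-point digits. This sketch assumes the .mat matrix has already been loaded into a list of rows (for example with scipy.io.loadmat, not shown here). `format_row`, `rows_to_txt` and the six-decimal default are my own hypothetical choices.

```python
# Hypothetical helper: one row as tab-separated fixed-point text
def format_row(values, decimals=6):
    """Join one row as tab-separated fixed-point numbers, never in 1e-05 style."""
    fmt = "%." + str(decimals) + "f"          # e.g. "%.6f"
    return "\t".join(fmt % v for v in values)

# Hypothetical helper: the whole matrix as the text of a txt file
def rows_to_txt(rows, decimals=6):
    """Turn a list of rows into tab-separated text, one row per line."""
    return "\n".join(format_row(row, decimals) for row in rows) + "\n"

# Values that str() would print as 1e-05 and 2.5e-07
rows = [[1e-05, 2.5, 300], [2.5e-07, 0.125, 7]]
text = rows_to_txt(rows)
```

As the second row shows, fixed-point output rounds tiny values to zero at the chosen precision, so pick `decimals` to suit the data. The text can then be written out with an ordinary `open(..., 'w').write(text)`.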

With all of this done, we can apply the algorithm from the first example. Let's go!

A first kNN attempt for AVIRIS (with a fatal bug: high error rate)

Code changes

sortedClassCount = sorted(classCount.iteritems(),key = lambda d:d[1],reverse=True)

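This sorts the accumulated (label, count) pairs by count in descending order, so the most frequent label comes first. A lambda is easier to read here than operator.itemgetter(1).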
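Both key functions pick element 1 of each (label, count) pair, so they sort identically. Here is a small check with made-up vote counts. It uses `.items()`, which also exists in Python 3, unlike `iteritems()`.

```python
import operator

class_count = {3: 1, 1: 2}  # made-up vote counts: label -> votes

# Both keys return element 1 (the count) of each (label, count) pair
by_lambda = sorted(class_count.items(), key=lambda d: d[1], reverse=True)
by_itemgetter = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)

winner = by_lambda[0][0]  # the label with the most votes
```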

Add a file2matrix_Label function and modify file2matrix.
See the full code for details. Pay attention to the size of the zeros matrix: the data is now 200-dimensional with 10266 samples.

Modify datingTest(HORATIO,K)
The HORATIO and K parameters let you set the test-set percentage and k yourself.

Complete test code

# -*- coding: utf-8 -*-
from numpy import *
import re


def file2matrix(filename):
    fr = open(filename,'r')
    arrayOlines = fr.readlines()
    fr.close()
    numberOfLines = len(arrayOlines) # number of rows
    #numberOfColumn = shape(mat(arrayOlines))[0] # number of columns
    returnMat = zeros((numberOfLines,200)) # all-zero matrix: one row per sample, 200 bands per row
    index = 0
    for line in arrayOlines:
        #clean the line
        line = line.strip()

        line = re.sub('\t',' ',line) # tabs to spaces first...
        line = re.sub(' +',' ',line) # ...then collapse runs of spaces, so split(' ') yields no empty fields
        listFromLine = line.split(' ')
        #store the values
        #print listFromLine
        returnMat[index,:] = listFromLine[0:200]
        index +=1
    #print returnMat

    return returnMat

def file2matrix_Label(filename):
    fr = open(filename,'r')

    arrayOlines = fr.readlines()
    fr.close()
    numberOfLines = len(arrayOlines) # number of rows
    returnLab = zeros((numberOfLines,1)) # all-zero matrix for the labels
    classLabelVector = [] # container for the labels
    index = 0

    for line in arrayOlines:
        #clean the line
        line = line.strip()
        line = re.sub('\t',' ',line)
        line = re.sub(' +',' ',line)
        listFromLine = line.split(' ')
        #store the values
        #print listFromLine
        returnLab[index,:] = listFromLine[0:1]
        classLabelVector.append(int(listFromLine[0])) # the label is the first value on each line
        index +=1
    return classLabelVector



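def classify0(inX,dataSet,labels,k): # inX: the sample to classify; dataSet: the training set

    ######Distance between the input and every training sample######
    dataSetSize = dataSet.shape[0] # number of rows; shape[1] would be the number of columns
    diffMat = tile(inX,(dataSetSize,1))-dataSet # tile repeats inX dataSetSize times down the rows and once across the columns
    sqDiffMat = diffMat**2 # element-wise square
    sqDistances = sqDiffMat.sum(axis=1) # sum across each row (axis=0 would sum down each column)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort() # argsort returns the indices that would sort the array in ascending order
    #print sortedDistIndicies # indices sorted from nearest to farthest
    classCount={}

    ######Count the votes in a dictionary######
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]] # label of the i-th nearest neighbour
        classCount[voteIlabel] = classCount.get(voteIlabel,0)+1 # add one vote; the count accumulates in the dictionary
        #get(key,default) returns default when key is not in the dictionary yet

    ######Find the label with the most votes######
    #sortedClassCount = sorted(classCount.iteritems(),key = operator.itemgetter(1),reverse=True)
    sortedClassCount = sorted(classCount.iteritems(),key = lambda d:d[1],reverse=True)

    #sort by value (the vote count); reverse=True gives descending order
    return sortedClassCount[0][0]
    #return the key (label) with the most votes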



#Test the classifier's accuracy
def datingTest(HORATIO,K):
    hoRatio = HORATIO*0.01 # HORATIO is a percentage, e.g. 10 for ten percent
    datingDataMat = file2matrix('C:\\Users\\MrLevo\\Desktop\\AL_Toolbox\\data.txt')
    datingLabels = file2matrix_Label('C:\\Users\\MrLevo\\Desktop\\AL_Toolbox\\label2.txt')
    #datingDataMat,datingLabels = file2matrixComebin('C:\Users\MrLevo\Desktop\AL_Toolbox\datacombinlabel.txt')

    m = datingDataMat.shape[0]
    numTestVecs = int(m*hoRatio) # number of test samples (taken from the top of the file)
    errorCount = 0.0

    for i in range(numTestVecs):

        classifierResult = classify0(datingDataMat[i,:],datingDataMat[numTestVecs:m,:],datingLabels[numTestVecs:m],K)
        print "the classifier came back with:%d,the real answer is %d"%(classifierResult,datingLabels[i])
        if classifierResult !=datingLabels[i]:
            errorCount +=1.0
        print "the total error rate is : %f" % (errorCount/float(numTestVecs))


if __name__ == '__main__':
    HORATIO = input("Please enter test set (%): ")
    K = input("Please enter the k: ")
    datingTest(HORATIO,K)

Test results, with ten percent of the data as the test set and k=3:

Please enter test set (%): 10
Please enter the k: 3
the classifier came back with:2,the real answer is 2
the total error rate is : 0.000000
the classifier came back with:2,the real answer is 2
the total error rate is : 0.000000

...

the classifier came back with:9,the real answer is 2
the total error rate is : 0.594542
the classifier came back with:9,the real answer is 2
the total error rate is : 0.595517

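An error rate of 60%!!!! Is kNN a disaster for high-dimensional data? Is this method simply unsuitable for my AVIRIS dataset? Then why did it give a reasonably good error rate on the dating-match data?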

Ugh!

Analyzing and fixing the bug

Cause

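The samples of each class are all stacked together in the file! So the test samples, which are taken from the top of the file, are a long run of the same class, like this: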

[Figure: the label column, with samples of the same class stored in consecutive rows]

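How can you sample from that? In the first example, classes 1, 2 and 3 are almost interleaved, so taking the first rows still gives a fair sample. In my data, however, the stacking is severe. The held-out test set covers only the one or few classes stored first, and those classes lose that many samples from the training set, so the error rate is absurdly high.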
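The effect is easy to reproduce with a hypothetical label column stored class by class. This standard-library sketch uses the same "first rows are the test set" split as datingTest. `holdout_split` is a hypothetical helper, and the class sizes are made up.

```python
import random
from collections import Counter

# Hypothetical helper: the same split as datingTest
def holdout_split(labels, ratio):
    """The first `ratio` of the rows are the test set, the rest the training set."""
    num_test = int(len(labels) * ratio)
    return labels[:num_test], labels[num_test:]

# Made-up label column stored class by class, like the stacked file
labels = [1] * 50 + [2] * 30 + [3] * 20

test_sorted, train_sorted = holdout_split(labels, 0.1)   # all 10 test labels are class 1

shuffled = labels[:]
random.Random(0).shuffle(shuffled)                       # fixed seed for a repeatable run
test_shuffled, train_shuffled = holdout_split(shuffled, 0.1)
```

In the sorted case the test set contains class 1 only, and class 1 loses those ten samples from training. After shuffling, the test slice is drawn from the whole mix of classes, which is what the fix below relies on.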

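Solution
Shuffle the rows with random.shuffle(new_mat). The features and labels must first be merged so that they are shuffled together, so I rewrote the function def SelectLabel(numberOfLines,returnMatLabel,list_label,numberOfColumns):. The first argument is the number of rows of the array, i.e. the number of samples. The second is the array holding every sample with its label in the last column. The third is a list of the class labels you want to classify. The fourth is the total number of columns, including both the feature dimensions and the label.
The whole function is: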

def SelectLabel(numberOfLines,returnMatLabel,list_label,numberOfColumns):

    new_mat =[]
    for i in range(numberOfLines):
        if returnMatLabel[i,-1] in list_label: # keep only the selected labels
            new_mat.append(returnMatLabel[i,:])

    random.shuffle(new_mat) # numpy.random.shuffle (via from numpy import *); whole rows move, so each label stays with its features
    classLabelVector = list(array(new_mat)[:,-1]) # last column: labels
    returnMat = array(new_mat)[:,0:numberOfColumns-1] # remaining columns: features
    return returnMat,classLabelVector
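For readers without NumPy, here is a standard-library sketch of what SelectLabel does: filter rows by label, shuffle whole rows, then split off the last column. `select_and_shuffle` is a hypothetical name. Unlike the original, it uses Python's `random` module with an optional seed so that a run can be repeated.

```python
import random

# Hypothetical helper (not from the original code): SelectLabel in plain Python
def select_and_shuffle(rows, wanted_labels, seed=None):
    """Keep rows whose last value is a wanted label, shuffle them, then split.

    Each row holds the features followed by the label, so shuffling whole rows
    keeps every sample paired with its own label.
    """
    wanted = set(float(label) for label in wanted_labels)
    kept = [list(row) for row in rows if float(row[-1]) in wanted]
    random.Random(seed).shuffle(kept)        # a fixed seed makes a run repeatable
    features = [row[:-1] for row in kept]    # every column except the last
    labels = [row[-1] for row in kept]       # the last column
    return features, labels

# Made-up rows: two features, then the label
rows = [[0.1, 0.2, 1.0], [0.3, 0.4, 2.0], [0.5, 0.6, 3.0], [0.7, 0.8, 1.0]]
features, labels = select_and_shuffle(rows, [1, 3], seed=42)
```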

Rebuilt kNN for AVIRIS (supports choosing any subset of classes)

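Improvements and extensions
1. Added a function for choosing which classes to classify. For example, if I want kNN's accuracy on classes 1, 2, 3 and 6, I just enter 1,2,3,6. The core statements added are: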

list_label = input("please enter label you want to classify(use comma to separate):")
list_label = list(list_label) # Python 2: input() evaluates "1,2,3,6" into the tuple (1, 2, 3, 6)

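Entering each label with its own input() call is too tedious, so I enter the whole group of classes at once, build a list from it, and pass that list to the next function.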

2. Added k and the test-sample percentage as selectable parameters, so you can set both yourself and try out more of your own ideas.

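3. Automatic format adaptation: you only need to enter the file path to run it. Format requirements: a tab-separated txt file in which the last value on each line is the label.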
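Two small standard-library sketches for items 1 and 3 follow; the helper names are hypothetical. In Python 2, `input()` evaluates the typed text, so `1,2,3,6` becomes the tuple (1, 2, 3, 6). A single label such as `5`, however, becomes an int, and `list(5)` raises a TypeError. In Python 3, `input()` returns a plain string. Parsing the raw text avoids both issues. Likewise, `str.split()` with no argument splits on any run of tabs or spaces, so the column count can be read from the data itself.

```python
# Hypothetical helper: parse the typed label list instead of evaluating it
def parse_label_list(text):
    """Turn '1, 2,3,6' into [1.0, 2.0, 3.0, 6.0]; floats compare equal to labels read from the txt."""
    return [float(part) for part in text.split(",") if part.strip()]

# Hypothetical helper: one line of the merged file
def parse_line(line):
    """Split one line on any whitespace into (features, label); the label is the last value."""
    values = [float(v) for v in line.split()]
    return values[:-1], values[-1]

# Hypothetical helper: all lines, with a consistency check on the column count
def load_rows(lines):
    """Parse all non-empty lines; the column count comes from the data, not a hard-coded 200."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    widths = set(len(features) for features, _ in parsed)
    if len(widths) > 1:
        raise ValueError("rows have different numbers of columns: %s" % sorted(widths))
    return parsed

# Example input, as typed by the user and as read from a small file
wanted = parse_label_list("1, 8,11")
rows = load_rows(["0.5\t1.5\t2.5\t8\n", "\n", "3 4  5 11\n"])
```

In Python 2 the typed text would come from `raw_input(...)`, and in Python 3 from `input(...)`.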

Complete code

# -*- coding: utf-8 -*-
#Author: 哈士奇說喵
#KNN algorithm
from numpy import *

#Convert the txt into a format ready for analysis
def file2matrixComebin(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    fr.close()
    numberOfLines = len(arrayOlines)

    #count the columns (label included) from the first line
    numberOfColumns = len(arrayOlines[0].strip().split('\t'))

    returnMatLabel = zeros((numberOfLines,numberOfColumns)) # all-zero matrix for features and labels
    returnAllLabel = zeros((numberOfLines,1)) # labels only
    index = 0

    for line in arrayOlines:
        #clean the line
        line = line.strip()
        listFromLine = line.split('\t')

        #store the values
        returnMatLabel[index,:] = listFromLine[0:numberOfColumns]

        returnAllLabel[index,:] = listFromLine[-1]
        index +=1

    #show each class and how many samples it has
    labelclass = set(list(array(returnAllLabel)[:,-1]))
    for i in labelclass:
        print 'Label:',i,'number:',list(array(returnAllLabel)[:,-1]).count(i)
    print 'please select the labels from this ! '

    list_label = input("please enter label you want to classify(use comma to separate):")
    list_label = list(list_label) # Python 2: input() evaluates "1,2,3" into the tuple (1, 2, 3)
    #call SelectLabel to choose which classes to classify
    returnMat,classLabelVector = SelectLabel(numberOfLines,returnMatLabel,list_label,numberOfColumns)

    return returnMat,classLabelVector



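#SelectLabel: freely choose which classes (and how many) to classify
def SelectLabel(numberOfLines,returnMatLabel,list_label,numberOfColumns):

    new_mat =[]
    for i in range(numberOfLines):
        if returnMatLabel[i,-1] in list_label: # keep only the selected labels
            new_mat.append(returnMatLabel[i,:])

    random.shuffle(new_mat) # numpy.random.shuffle (via from numpy import *); whole rows move, so each label stays with its features
    classLabelVector = list(array(new_mat)[:,-1]) # last column: labels
    returnMat = array(new_mat)[:,0:numberOfColumns-1] # remaining columns: features
    return returnMat,classLabelVector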


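def classify0(inX,dataSet,labels,k): # inX: the sample to classify; dataSet: the training set

    ######Distance between the input and every training sample######
    dataSetSize = dataSet.shape[0] # number of rows; shape[1] would be the number of columns
    diffMat = tile(inX,(dataSetSize,1))-dataSet # tile repeats inX dataSetSize times down the rows and once across the columns
    sqDiffMat = diffMat**2 # element-wise square
    sqDistances = sqDiffMat.sum(axis=1) # sum across each row (axis=0 would sum down each column)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort() # argsort returns the indices that would sort the array in ascending order
    #print sortedDistIndicies # indices sorted from nearest to farthest
    classCount={}

    ######Count the votes in a dictionary######
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]] # label of the i-th nearest neighbour
        classCount[voteIlabel] = classCount.get(voteIlabel,0)+1 # add one vote; the count accumulates in the dictionary
        #get(key,default) returns default when key is not in the dictionary yet

    ######Find the label with the most votes######
    sortedClassCount = sorted(classCount.iteritems(),key = lambda d:d[1],reverse=True)

    #sort by value (the vote count); reverse=True gives descending order
    return sortedClassCount[0][0]
    #return the key (label) with the most votes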


#Test the classifier's accuracy
def datingTest(HORATIO,K,Path):

    hoRatio = HORATIO*0.01 # HORATIO is a percentage, e.g. 10 for ten percent
    datingDataMat,datingLabels = file2matrixComebin(Path) # already filtered and shuffled by SelectLabel
    m = datingDataMat.shape[0]
    numTestVecs = int(m*hoRatio) # number of test samples
    errorCount = 0.0

    for i in range(numTestVecs):
        classifierResult = classify0(datingDataMat[i,:],datingDataMat[numTestVecs:m,:],datingLabels[numTestVecs:m],K)
        print "the classifier came back with:%d,the real answer is %d"%(classifierResult,datingLabels[i])
        if classifierResult !=datingLabels[i]:
            errorCount +=1.0
        print "the total error rate is : %f" % (errorCount/float(numTestVecs))


if __name__ == '__main__':
    HORATIO = input("Please enter test set (%) : ")
    K = input("Please enter the k: ")
    Path = raw_input("Please enter the data path (.txt):")
    datingTest(HORATIO,K,Path)

Testing: first with k=3, ten percent of the samples as the test set, and all 13 classes selected.

Please enter test set (%) : 10
Please enter the k: 3
Please enter the data path (.txt):C:\Users\MrLevo\Desktop\AL_Toolbox\datacombinlabel.txt
Label: 1.0 number: 1434
Label: 2.0 number: 834
Label: 3.0 number: 234
Label: 4.0 number: 497
Label: 5.0 number: 747
Label: 6.0 number: 489
Label: 7.0 number: 968
Label: 8.0 number: 2468
Label: 9.0 number: 614
Label: 10.0 number: 212
Label: 11.0 number: 1294
Label: 12.0 number: 380
Label: 13.0 number: 95
please select the labels from this ! 
please enter label you want to classify(use comma to separate):1,2,3,4,5,6,7,8,9,10,11,12,13

the classifier came back with:11,the real answer is 11
the total error rate is : 0.000000
the classifier came back with:8,the real answer is 8
the total error rate is : 0.000000

...

the classifier came back with:9,the real answer is 9
the total error rate is : 0.208577

With k=3, ten percent sampling, and classes 1, 8 and 11:

...

the classifier came back with:11,the real answer is 11
the total error rate is : 0.073218

With k=3, ten percent sampling, and classes 2, 5, 7 and 9:

...

the classifier came back with:7,the real answer is 7
the total error rate is : 0.088608

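In terms of error, this accuracy is quite decent, given that the data has 200 dimensions, 10266 samples and 13 classes. In my experience, a single supervised algorithm without improvements, short of using an SVM, reaches roughly this accuracy. So it really was the sampling problem!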

Next, another hyperspectral dataset, KSC1, with 176 dimensions and 3784 samples.
k=3, ten percent as the test set:

Please enter test set (%) : 10
Please enter the k: 3
Please enter the data path (.txt):C:\Users\MrLevo\Desktop\AL_Toolbox\testKSC1.txt
Label: 0.0 number: 761
Label: 1.0 number: 243
Label: 2.0 number: 256
Label: 3.0 number: 252
Label: 4.0 number: 161
Label: 5.0 number: 229
Label: 6.0 number: 105
Label: 7.0 number: 431
Label: 8.0 number: 419
Label: 9.0 number: 927
please select the labels from this ! 
please enter label you want to classify(use comma to separate):0,1,2,3,4,7

the classifier came back with:7,the real answer is 7
the total error rate is : 0.000000

...

the classifier came back with:0,the real answer is 0
the total error rate is : 0.061905

OK, it works perfectly. I won't test the remaining combinations one by one.

Pay Attention

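1. Try to choose classes with similar numbers of samples; otherwise the classification accuracy fluctuates widely. You can check beforehand how many samples each label in your data has, for example: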

labelclass = set(datingLabels)
for i in labelclass:
    print i,datingLabels.count(i)

For my data this gives the number of samples per label; for example, class 1 has 1434 samples.

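2. Merging data and labels
In the improved code, samples and labels are merged into a single txt file, unlike the first attempt, which kept labels and samples in separate files. Because the order is reshuffled, each sample must stay paired with its label: merge them first, then shuffle, then split them apart again. To merge, select the label column in Excel, copy it, and paste it as the last column of the samples, i.e. column 201, like this; then save as txt.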

[Figure: the label column pasted as column 201 in Excel]
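Both notes can also be handled in a few lines of standard-library Python. This is a sketch with made-up inputs and hypothetical helper names. `class_balance` counts samples per label with collections.Counter and reports the ratio of the largest class to the smallest. For the full AVIRIS label list above (2468 samples for class 8 versus 95 for class 13), that ratio is about 26. `merge_lines` does the Excel copy-paste step in code. It assumes one feature row per line in one file and one label per line in the other, in the same order.

```python
from collections import Counter

# Hypothetical helper: per-class counts plus a simple imbalance measure
def class_balance(labels):
    """Return (samples per label, ratio of the largest class to the smallest)."""
    counts = Counter(labels)
    ratio = max(counts.values()) / float(min(counts.values()))
    return counts, ratio

# Hypothetical helper: the Excel "paste as last column" step
def merge_lines(feature_lines, label_lines):
    """Append each label as the last tab-separated column of its feature row."""
    features = [line.split() for line in feature_lines if line.strip()]
    labels = [line.strip() for line in label_lines if line.strip()]
    if len(features) != len(labels):
        raise ValueError("%d feature rows but %d labels" % (len(features), len(labels)))
    return ["\t".join(row + [label]) for row, label in zip(features, labels)]

counts, ratio = class_balance([1.0] * 12 + [8.0] * 20 + [13.0] * 4)
merged = merge_lines(["0.1 0.2\n", "0.3\t0.4\n"], ["1\n", "2\n"])
```

Joining `merged` with newlines and saving it gives a tab-separated file with the label in the last column, which is the layout file2matrixComebin reads.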

Finally

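I packaged the program as an exe. Click here to download the source code and the packaged files, so that students without a Python environment can also learn the kNN algorithm and test their own ideas on their own datasets. For how to do the packaging, see How to Package a .py File into an exe.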

[Figure: the packaged exe]

One last thing

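I now understand in depth how kNN is implemented. Many problems came up along the way that I did not record one by one, but I really learned a lot. It also happened to tie in with my research project, so the past two days of repeatedly tearing down and rebuilding the code were well worth it!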

Acknowledgements

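Wes McKinney, Python for Data Analysis
Peter Harrington, Machine Learning in Action
@MrLevo520--Machine Learning: K-Nearest Neighbors Algorithm (Python) Basics
@MrLevo520--(Big) Data Processing: From txt to Data Visualization
@MrLevo520--NumPy Quick Start
@MrLevo520--Solved: Saving a .mat File to .txt without Scientific Notation (e-0)
@MrLevo520--How to Package a .py File into an exe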

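Machine Learning: K-Nearest Neighbors Algorithm (Python) in Practice on Data with Hundreds of Dimensions and Tens of Thousands of Samples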
