
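Good morning, good afternoon, and good evening, everyone. Today we continue with the K-means clustering model in scikit-learn. scikit-learn ships many clustering models, as the screenshot below shows:

Figure 1

The differences between these algorithms are shown in the next figure:

Figure 2

That looks pretty complicated, so let's start with K-means, which seems to be the simplest clustering method.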

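In scikit-learn, the k-means algorithm is implemented by the KMeans model. The basic idea is the same one covered in the previous post, "Unsupervised Learning: K-means Clustering Algorithm Notes (Python)": iteratively solve for the centroids by minimizing the SSE (sum of squared errors), which partitions the data into clusters.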
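To make the SSE idea concrete, here is a minimal sketch that fits KMeans and recomputes the sum of squared errors by hand. The toy data from make_blobs and the choice of 3 clusters are assumptions made only for illustration; they are not from the original post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data (assumed for illustration): 300 points around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Fit K-means; n_init is set explicitly so the run does not depend on
# the library's default, which has changed between versions.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# SSE by hand: squared distance from each sample to its assigned centroid.
assigned_centroids = km.cluster_centers_[km.labels_]
sse = np.sum((X - assigned_centroids) ** 2)

print("manual SSE :", sse)
print("km.inertia_:", km.inertia_)
```

The two printed numbers should agree up to floating-point error, which is what "Inertia is the SSE" means in practice.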

Figure 3

The Inertia mentioned above is the SSE. The main drawbacks of the K-means method are:

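1) Inertia (SSE) implicitly assumes that clusters are convex and isotropic, because it minimizes the distance to the centroid. Real data does not always look like that. When the data is elongated, or irregularly shaped with many small branches (It responds poorly to elongated clusters, or manifolds with irregular shapes.), the error rate of this clustering method goes up, as in the clustering shown below

Figure 4

2) Inertia (SSE) is not a normalized metric: we only know that lower values are better and that 0 is optimal. On datasets with high-dimensional features, Euclidean distances become inflated as the number of dimensions grows, which is the so-called curse of dimensionality. In that case, reduce the dimensionality with some method first and then apply KMeans.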
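The two drawbacks above can be sketched in a few lines. Everything here is an assumption chosen for illustration: the synthetic "strip" data, the 50-feature blobs, PCA as the dimensionality reduction method, and the adjusted Rand index as the agreement score. The original post does not prescribe any of them.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)

# Drawback 1: two long, thin, parallel strips (made-up data).
# Each strip spans x in [-10, 10]; the strips differ only in y (0 vs 1.5).
n = 500
x = rng.uniform(-10, 10, size=2 * n)
y = np.concatenate([rng.normal(0.0, 0.1, n), rng.normal(1.5, 0.1, n)])
X_strips = np.column_stack([x, y])
true_strips = np.repeat([0, 1], n)

# K-means lowers the SSE more by cutting along x (left/right) than by
# separating the strips, so agreement with the true grouping is poor.
pred_strips = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_strips)
ari_strips = adjusted_rand_score(true_strips, pred_strips)
print("ARI on elongated strips:", ari_strips)

# Drawback 2: high-dimensional features. Reduce the dimensionality first
# (PCA is one possible choice), then cluster in the reduced space.
X_high, true_high = make_blobs(n_samples=300, n_features=50, centers=3, random_state=0)
pca_kmeans = make_pipeline(
    PCA(n_components=2, random_state=0),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
pred_high = pca_kmeans.fit_predict(X_high)
ari_high = adjusted_rand_score(true_high, pred_high)
print("ARI after PCA + KMeans:", ari_high)
```

An adjusted Rand index near 0 means the clustering is no better than chance relative to the true grouping, while a value near 1 means near-perfect agreement.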

Let's take a closer look at the KMeans model

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

n_clusters -> the number of clusters to form

init: {‘k-means++’, ‘random’ or an ndarray} -> the method used to obtain the initial centroids

Method for initialization, defaults to ‘k-means++’:

‘k-means++’ : selects initial cluster centers for k-mean clustering in a smart way to speed up convergence. See section Notes in k_init for more details.

‘random’: choose k observations (rows) at random from data for the initial centroids.

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
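As a rough sketch of the three init options described above, the snippet below fits KMeans once per option on the same data. The toy data, the choice of 4 clusters, and the use of the first 4 rows of X as hand-picked centers are assumptions for illustration only; this is not scikit-learn's own comparison example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data (assumed for illustration): 500 points around 4 centers.
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

# An explicit ndarray of shape (n_clusters, n_features); taking the first
# 4 rows of X is an arbitrary, hypothetical choice of starting centers.
manual_centers = X[:4].copy()

results = {}
for name, init in [
    ("k-means++", "k-means++"),
    ("random", "random"),
    ("ndarray", manual_centers),
]:
    # n_init=1 so that each fit reflects a single initialization.
    km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
    results[name] = (km.inertia_, km.n_iter_)
    print(f"{name:10s} inertia={km.inertia_:.2f} iterations={km.n_iter_}")
```

With n_init=1 the outcome depends heavily on the starting centroids, which is why the final inertia and iteration count can differ between the three options.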

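scikit-learn has an example that compares the options of ‘init: {‘k-means++’, ‘random’ or an ndarray}’ and shows how the method used to obtain the initial centroids affects the clustering result of K-means.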