Preface:

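This post follows the Python programming assignments of Stanford's CS231n course and uses them to work through the main content of the course and some of the underlying math.
The course notes referenced here come from the Zhihu collection of the authorized Chinese translation of the official CS231n notes (知乎-CS231n官方筆記授權翻譯總集篇發布).
Course materials and examples are taken from cs231n.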

Introduction to the SVM classifier:

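The SVM (Support Vector Machine) is a supervised linear classifier.
Linear classifier: in this model we start from the simplest possible function, a linear mapping:

$$f(x_i, W, b) = W x_i + b$$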

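This is the familiar linear function, most often seen in its one-dimensional form (i.e. with a scalar W). The SVM deals with the case where the function is extended to many dimensions. First, each image is stretched into a column vector of length D, of size [D × 1]. The matrix W of size [K × D] and the column vector b of size [K × 1] are the parameters of the function. Taking CIFAR-10 as an example, each image has size [32 × 32 × 3] and holds all of the image's pixel information; it is stretched into a [3072 × 1] column vector, W has size [10 × 3072], and b has size [10 × 1]. So 3072 numbers (the raw pixel values) go into the function and 10 numbers (the scores for the different classes) come out. The parameters W are called the weights, and b is called the bias vector.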
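The score function and its CIFAR-10 shapes can be sketched in NumPy. The random image and weights below are placeholders (assumptions of this sketch, not trained values); the point is only that a [3072 × 1] input, a [10 × 3072] W and a [10 × 1] b produce 10 scores.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 32 * 32 * 3, 10                   # CIFAR-10: 3072 pixel values, 10 classes
image = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)  # placeholder image

x = image.reshape(D, 1)                  # stretch the image into a [D x 1] column vector
W = 0.001 * rng.standard_normal((K, D))  # weights, [K x D] (random placeholder)
b = np.zeros((K, 1))                     # bias vector, [K x 1]

scores = W @ x + b                       # f(x, W, b) = Wx + b, one score per class
print(x.shape, W.shape, scores.shape)    # (3072, 1) (10, 3072) (10, 1)
```

Note that the assignment code later in this post stores the data row-wise instead (X has shape (N, D) and W has shape (D, C)), so it computes `X.dot(W)`; that is the same function, transposed and applied to a whole batch at once.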

Understanding the linear classifier

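The linear classifier computes the class scores as a matrix product between the weights and the values of all pixels across the 3 color channels. Depending on the values we give the weights, the function favors or penalizes certain colors at certain positions in the image, and the resulting score expresses how strongly the image matches the class. For example, airplane images tend to contain a lot of blue sky, white clouds and a white airplane, so the airplane classifier will tend to have large weights on the blue channel and smaller weights on the other channels. As the notes put it: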

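An example of mapping an image to class scores. For ease of visualization, assume the image has only 4 pixels (grayscale, so RGB channels are ignored here) and there are 3 classes (red stands for cat, green for dog and blue for ship; these colors only label the classes and have nothing to do with RGB channels). The image pixels are first stretched into a column vector and multiplied by W to obtain the score of each class. Note that this W is not good at all: the cat score is very low. Judging from the figure, the algorithm actually thinks this image is a dog.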
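A worked version of the 4-pixel example. The pixel values, weights and biases below are illustrative numbers chosen for this sketch (assumed, not read off the figure); each row of W is the classifier for one class. As in the text, the cat score comes out very low and the dog score wins.

```python
import numpy as np

classes = ["cat", "dog", "ship"]
x = np.array([56.0, 231.0, 24.0, 2.0])   # 4 grayscale pixels stretched into a vector
W = np.array([[0.2, -0.5, 0.1, 2.0],     # cat classifier (one row per class)
              [1.5, 1.3, 2.1, 0.0],      # dog classifier
              [0.0, 0.25, 0.2, -0.3]])   # ship classifier
b = np.array([1.1, 3.2, -1.2])           # illustrative biases

scores = W @ x + b                       # cat: -96.8, dog: 437.9, ship: 60.75
print(classes[int(np.argmax(scores))])   # dog
```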

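Now consider the high-dimensional case. Again taking CIFAR-10 as an example, once an image is turned into a 3072-dimensional vector we have a high-dimensional problem: each vector (built from the 3 color channels) can be seen as a point in 3072-dimensional space, and each linear classifier is a hyperplane in that space that separates the points. As shown in the figure: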

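Schematic of the image space. Each image is a point, and there are 3 classifiers. Take the red car classifier as an example: the red line is the set of points where the car score is 0, and the red arrow shows the direction in which the score increases. All points to the right of the red line have positive scores that grow linearly; points to the left have negative scores that decrease linearly.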
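The geometry in the figure can be checked with a 2-D sketch of a single classifier. The weights `w` and bias `b` are made-up values for illustration. The point `on_line` has score 0, so it lies on the zero-score line; moving along the direction of `w` (the "arrow") raises the score by ||w|| per unit step, and moving the other way lowers it.

```python
import numpy as np

w = np.array([2.0, 1.0])   # hypothetical weights of one class (e.g. "car")
b = -4.0                   # hypothetical bias


def score(x):
    """Linear score w^T x + b for a single 2-D point."""
    return float(w @ x + b)


on_line = np.array([1.0, 2.0])     # 2*1 + 1*2 - 4 = 0: a point on the zero-score line
step = w / np.linalg.norm(w)       # unit vector in the direction of w

for t in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    p = on_line + t * step
    # score is linear in t: negative on one side of the line, positive on the other
    print(t, round(score(p), 3))
```

Changing `w` rotates the zero-score line and changing `b` shifts it, which is the "rotate and translate" picture used in the next paragraph.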

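Goal: we want to find a W and a b such that these hyperplanes separate the classes well. We do this by repeatedly adjusting W and b, i.e. rotating and translating the hyperplanes, until the classification error is small.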

Components of the SVM:

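- Image data preprocessing: in the examples above, all images used raw pixel values (from 0 to 255). In machine learning it is standard practice to normalize the input features. In image processing each pixel can be treated as a simple feature. The usual first step is to center the features: compute the mean image over the training set and subtract it from every image, so that pixel values fall roughly in [-127, 127]. A common next step is to scale the values so that they lie in [-1, 1].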
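A minimal sketch of the two preprocessing steps, assuming the images are already stretched into the rows of arrays of shape (N, D). The random arrays stand in for real CIFAR-10 images. Dividing by the largest absolute centered training value is one simple way to land in [-1, 1]; it is an assumption of this sketch, not necessarily what the assignment does.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(500, 3072)).astype(np.float64)  # placeholder raw pixels
X_test = rng.integers(0, 256, size=(100, 3072)).astype(np.float64)

# 1) Centering: compute the mean image over the training set only ...
mean_image = X_train.mean(axis=0)      # shape (3072,)
X_train_c = X_train - mean_image       # ... and subtract it from every image
X_test_c = X_test - mean_image         # reuse the *training* mean for other splits

# 2) Scaling: divide by the largest absolute training value so training data lies in [-1, 1]
scale = np.abs(X_train_c).max()
X_train_n = X_train_c / scale
X_test_n = X_test_c / scale

print(X_train_n.min(), X_train_n.max())
```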

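- Loss function: the next question is how to measure how wrong the classifier is, and the answer is a loss function. For the i-th example with class scores $s_j = f(x_i, W)_j$ and correct label $y_i$, the multiclass SVM loss is:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

This value measures how wrong the current classification of that example is.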

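Example: let us walk through how the formula is computed. Suppose there are 3 classes and the scores are s = [13, -7, 11], the first class is the correct one, i.e. $y_i = 0$, and $\Delta$ is 10. The formula sums over all incorrect classes, so we get two terms:
$$L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10)$$
The first term is 0 because -7 - 13 + 10 is negative, and taking the max with 0 clamps it to 0. This class pair contributes no loss because the correct class score 13 exceeds the incorrect score -7 by 20, which is more than the margin of 10. The SVM only cares that the gap is at least 10; any larger gap still gives a loss of 0. The second term, 11 - 13 + 10, gives 8. Although the correct class scores higher than the incorrect one (13 > 11), the gap of 2 is smaller than the margin of 10, which is why the loss is 8, and the total is $L_i = 0 + 8 = 8$. In short, the SVM loss wants the score of the correct class to be higher than every incorrect class score by at least $\Delta$; whenever this is violated, loss accumulates.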
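The per-example loss is short enough to check in code. The helper name `svm_loss_single` is hypothetical; it implements the formula for one score vector and reproduces the worked example above.

```python
import numpy as np


def svm_loss_single(s, y, delta):
    """Multiclass SVM loss L_i for one score vector s with correct label y."""
    margins = np.maximum(0, s - s[y] + delta)  # max(0, s_j - s_y + delta) for every j
    margins[y] = 0                             # the sum skips j == y
    return margins.sum()


s = np.array([13.0, -7.0, 11.0])
print(svm_loss_single(s, y=0, delta=10.0))     # 8.0  (= max(0, -10) + max(0, 8))
```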
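In this model the score function is linear ($f(x_i, W) = W x_i$), so the loss can be rewritten as:

$$L_i = \sum_{j \neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)$$

where $w_j$ is the j-th row of W, reshaped into a column vector. Once we move to more complex score functions f, however, this form no longer necessarily applies.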
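The rewritten form only regroups the computation: with a linear score, $s_j = w_j^T x_i$, so looping over the rows of W gives the same loss as computing all scores with one matrix product. The small random W, x, label and margin Δ = 1 below are placeholders for this check.

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 3, 5
W = rng.standard_normal((K, D))   # row j is w_j^T
x = rng.standard_normal(D)
y, delta = 2, 1.0

# Form 1: compute all scores at once, s = W x
s = W @ x
loss_matrix = sum(max(0.0, s[j] - s[y] + delta) for j in range(K) if j != y)

# Form 2: use the rows w_j directly: w_j^T x - w_y^T x + delta
loss_rows = sum(max(0.0, W[j] @ x - W[y] @ x + delta) for j in range(K) if j != y)

print(np.isclose(loss_matrix, loss_rows))  # True
```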

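- Regularization: the loss function above has a problem. Suppose we have a dataset and a set of weights W that classifies every example correctly (i.e. all margins are met, so $L_i = 0$ for all i). The problem is that this W is not unique: there may be many similar W that also classify all the data correctly.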

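A simple example: if W classifies all data correctly, i.e. the loss is 0 for every example, then any multiple $\alpha W$ with $\alpha > 1$ also gives zero loss, because the scaling stretches all scores uniformly and therefore also stretches the absolute differences between them. For instance, if the gap between a correct class score and the closest incorrect class score is 15, multiplying W by 2 makes the gap 30.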
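A quick numerical check of this ambiguity, using made-up 2-class weights and data (assumptions for illustration) that satisfy all margins with Δ = 1: scaling W by 2 keeps the data loss at 0 and doubles the gap between correct and incorrect scores.

```python
import numpy as np


def data_loss(W, X, y, delta=1.0):
    """Average multiclass SVM loss without regularization (X is N x D, W is D x C)."""
    scores = X @ W
    correct = scores[np.arange(len(y)), y][:, None]
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(len(y)), y] = 0
    return margins.sum() / len(y)


X = np.array([[3.0, 0.0], [0.0, 3.0]])   # two examples
y = np.array([0, 1])
W = np.eye(2)                            # class 0 uses feature 0, class 1 uses feature 1

for scale in (1.0, 2.0):
    Ws = scale * W
    s = X @ Ws
    gap = s[0, 0] - s[0, 1]              # correct minus incorrect score for example 0
    print(scale, data_loss(Ws, X, y), gap)
```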

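We would like to remove this ambiguity by expressing a preference for one particular set of weights. The way to do so is to add a regularization penalty $R(W)$ to the loss function. The most common penalty is the squared L2 norm, which discourages large weights through an elementwise quadratic penalty on all parameters:

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

Expanded in full, the loss becomes:

$$L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta) \right] + \lambda \sum_k \sum_l W_{k,l}^2$$

where N is the number of training examples. The regularization penalty is now part of the loss, weighted by the hyperparameter $\lambda$. This hyperparameter cannot be set by a simple rule and is usually chosen by cross-validation. With the L2 penalty included, the SVM gains the appealing max-margin property (see CS229 if you are interested).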
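The L2 penalty favors smaller, more spread-out weights. In the sketch below, the input `x` and the two weight vectors are illustrative: both give the same score $w^T x = 1$, so the data loss cannot tell them apart, but the penalty is 1.0 for the first and 0.25 for the second. The function `full_loss` (a hypothetical name) then assembles the full loss $L$ above; λ = 0.1 is an arbitrary placeholder, since in practice it would be chosen by cross-validation. The assignment code below uses `0.5 * reg` in front of the penalty instead, which only rescales λ.

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])       # all weight on one input
w2 = np.array([0.25, 0.25, 0.25, 0.25])   # weight spread over all inputs

print(w1 @ x, w2 @ x)                     # 1.0 1.0  -> identical scores
print(np.sum(w1 ** 2), np.sum(w2 ** 2))   # 1.0 0.25 -> the L2 penalty prefers w2


def full_loss(W, X, y, lam, delta=1.0):
    """L = (1/N) sum_i sum_{j != y_i} max(0, s_j - s_{y_i} + delta) + lam * sum(W**2).

    X is (N, D) and W is (D, C), so the scores are X @ W (one row per example).
    """
    N = X.shape[0]
    scores = X @ W
    correct = scores[np.arange(N), y][:, None]
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(N), y] = 0
    data_loss = margins.sum() / N
    reg_loss = lam * np.sum(W * W)        # lambda * R(W), R(W) = sum_k sum_l W_kl^2
    return data_loss + reg_loss


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
y = rng.integers(0, 3, size=6)
W = 0.01 * rng.standard_normal((4, 3))
print(full_loss(W, X, y, lam=0.1))        # lam = 0.1 is an arbitrary placeholder
```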

SVM implementation:

- linear_svm.py

```python
# coding: utf-8
import numpy as np


def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape)  # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1  # note delta = 1
      if margin > 0:
        loss += margin
        dW[:, y[i]] += -X[i, :]  # d(margin)/d(w_{y_i}) = -x_i
        dW[:, j] += X[i, :]      # d(margin)/d(w_j) = +x_i

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train
  # Add regularization 0.5 * reg * ||W||^2, whose gradient is reg * W
  # (the same convention as svm_loss_vectorized, so both return identical values).
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W

  return loss, dW


def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape)  # initialize the gradient as zero
  scores = X.dot(W)       # N by C
  num_classes = W.shape[1]
  num_train = X.shape[0]

  scores_correct = scores[np.arange(num_train), y]              # shape (N,)
  scores_correct = np.reshape(scores_correct, (num_train, -1))  # N by 1
  margins = scores - scores_correct + 1    # N by C, delta = 1
  margins = np.maximum(0, margins)
  margins[np.arange(num_train), y] = 0     # the correct class does not contribute
  loss += np.sum(margins) / num_train
  loss += 0.5 * reg * np.sum(W * W)

  # compute the gradient
  margins[margins > 0] = 1                 # each violated margin contributes +x_i to column j
  row_sum = np.sum(margins, axis=1)        # shape (N,): number of violations per example
  margins[np.arange(num_train), y] = -row_sum  # ... and -x_i to column y_i per violation
  dW += np.dot(X.T, margins) / num_train + reg * W  # D by C

  return loss, dW
```
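A common way to verify an analytic gradient such as the `dW` returned above is to compare it against a centered finite-difference estimate. To stay self-contained, the sketch re-implements a compact vectorized SVM loss (`svm_loss`, a hypothetical name that mirrors `svm_loss_vectorized`, including the `0.5 * reg` convention) and checks a few random entries of the gradient on small random data.

```python
import numpy as np


def svm_loss(W, X, y, reg):
    """Vectorized multiclass SVM loss (delta = 1) and gradient; X is (N, D), W is (D, C)."""
    N = X.shape[0]
    scores = X @ W
    correct = scores[np.arange(N), y][:, None]
    margins = np.maximum(0, scores - correct + 1.0)
    margins[np.arange(N), y] = 0
    loss = margins.sum() / N + 0.5 * reg * np.sum(W * W)

    # Each positive margin adds +x_i to column j and -x_i to column y_i.
    coeff = (margins > 0).astype(float)
    coeff[np.arange(N), y] = -coeff.sum(axis=1)
    dW = X.T @ coeff / N + reg * W
    return loss, dW


def grad_check(f, W, num_checks=10, h=1e-5, seed=0):
    """Max relative error between analytic and centered numeric gradients at random entries."""
    rng = np.random.default_rng(seed)
    _, analytic = f(W)
    worst = 0.0
    for _ in range(num_checks):
        idx = tuple(int(rng.integers(0, n)) for n in W.shape)
        old = W[idx]
        W[idx] = old + h
        f_plus, _ = f(W)
        W[idx] = old - h
        f_minus, _ = f(W)
        W[idx] = old                              # restore the entry
        numeric = (f_plus - f_minus) / (2 * h)
        rel = abs(numeric - analytic[idx]) / max(1e-8, abs(numeric) + abs(analytic[idx]))
        worst = max(worst, rel)
    return worst


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8))
y = rng.integers(0, 4, size=20)
W = 0.01 * rng.standard_normal((8, 4))
err = grad_check(lambda W_: svm_loss(W_, X, y, reg=0.1), W)
print("max relative error:", err)
```

With small random weights every margin sits near 1, far from the hinge's kink at 0, so the two estimates should agree to within rounding error. Checks that land exactly on a kink can show larger errors without indicating a bug.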

- linear_classifier.py

```python
# coding: utf-8
import numpy as np
from classifiers.linear_svm import *
from classifiers.softmax import *


class LinearClassifier(object):

  def __init__(self, w=None):
    self.W = w

  def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
            batch_size=200, verbose=False):
    """
    Train this linear classifier using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) containing training data; there are N
      training samples each of dimension D.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label 0 <= c < C for C classes.
    - learning_rate: (float) learning rate for optimization.
    - reg: (float) regularization strength.
    - num_iters: (integer) number of steps to take when optimizing
    - batch_size: (integer) number of training examples to use at each step.
    - verbose: (boolean) If true, print progress during optimization.

    Outputs:
    A list containing the value of the loss function at each training iteration.
    """
    num_train, dim = X.shape
    num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
      # lazily initialize W
      self.W = 0.001 * np.random.randn(dim, num_classes)

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in range(num_iters):
      # sample a minibatch without replacement (requires batch_size <= num_train)
      sample_index = np.random.choice(num_train, batch_size, replace=False)
      X_batch = X[sample_index, :]   # select the batch samples
      y_batch = y[sample_index]      # select the batch labels

      # evaluate loss and gradient
      loss, grad = self.loss(X_batch, y_batch, reg)
      loss_history.append(loss)

      # perform parameter update
      self.W += -learning_rate * grad

      if verbose and it % 100 == 0:
        print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return loss_history

  def predict(self, X):
    """
    Use the trained weights of this linear classifier to predict labels for
    data points.

    Inputs:
    - X: A numpy array of shape (N, D) containing the data; each row is a
      D-dimensional point.

    Returns:
    - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
      array of length N, and each element is an integer giving the predicted
      class.
    """
    scores = X.dot(self.W)               # N by C class scores
    y_pred = np.argmax(scores, axis=1)   # highest-scoring class for each example
    return y_pred

  def loss(self, X_batch, y_batch, reg):
    """
    Compute the loss function and its derivative.
    Subclasses will override this.

    Inputs:
    - X_batch: A numpy array of shape (N, D) containing a minibatch of N
      data points; each point has dimension D.
    - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
    - reg: (float) regularization strength.

    Returns: A tuple containing:
    - loss as a single float
    - gradient with respect to self.W; an array of the same shape as W
    """
    pass


class LinearSVM(LinearClassifier):
  """ A subclass that uses the Multiclass SVM loss function """

  def loss(self, X_batch, y_batch, reg):
    return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
  """ A subclass that uses the Softmax + Cross-entropy loss function """

  def loss(self, X_batch, y_batch, reg):
    return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
```
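A minimal usage sketch of the SGD loop in `train`, written self-contained so that it does not depend on the `classifiers` package: `sgd_train` (a hypothetical name) repeats the same steps (sample a minibatch without replacement, compute loss and gradient, step against the gradient), and prediction uses the same `argmax` rule as `predict`. The synthetic Gaussian-blob data, the seeds and the hyperparameter values are assumptions chosen for illustration only.

```python
import numpy as np


def svm_loss(W, X, y, reg):
    """Vectorized multiclass SVM loss (delta = 1) and gradient, 0.5 * reg * ||W||^2 penalty."""
    N = X.shape[0]
    scores = X @ W
    correct = scores[np.arange(N), y][:, None]
    margins = np.maximum(0, scores - correct + 1.0)
    margins[np.arange(N), y] = 0
    loss = margins.sum() / N + 0.5 * reg * np.sum(W * W)
    coeff = (margins > 0).astype(float)
    coeff[np.arange(N), y] = -coeff.sum(axis=1)
    return loss, X.T @ coeff / N + reg * W


def sgd_train(X, y, learning_rate=1e-2, reg=1e-3, num_iters=300, batch_size=50, seed=0):
    """Same loop as LinearClassifier.train: minibatch sampling plus a gradient step."""
    rng = np.random.default_rng(seed)
    num_train, dim = X.shape
    W = 0.001 * rng.standard_normal((dim, int(y.max()) + 1))
    history = []
    for _ in range(num_iters):
        idx = rng.choice(num_train, batch_size, replace=False)
        loss, grad = svm_loss(W, X[idx], y[idx], reg)
        history.append(loss)
        W -= learning_rate * grad
    return W, history


# Synthetic data: 3 well-separated Gaussian blobs in 2-D, 100 points per class.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.standard_normal((300, 2))

W, history = sgd_train(X, y)
accuracy = np.mean(np.argmax(X @ W, axis=1) == y)   # same rule as LinearClassifier.predict
print(history[0], history[-1], accuracy)
```

On real CIFAR-10 data the learning rate and regularization strength have to be tuned (for example on a validation split), which is what the tests in the next section compare.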

Testing:

The accuracy of the SVM on the 10-class problem under different parameter settings is as follows:

Summary:

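The SVM classifies very well when there are few classes and the data are close to linearly separable (especially in the two-class case), and combining it with PCA may give even better results.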

cs231n Course Assignment 1 (SVM)
