Module1 Classification

Paste_Image.png

Each observation is represented by a set of numbers(features).

Paste_Image.png
Paste_Image.png

\

Formally, given training set (xi,yi) for i=1…n, we want to create a classification model f that can predict label y for a new x.

Paste_Image.png

The machine learning algorithm will create the function f for you.
The predicted y for a new x is the sigh of f(x).

Loss Functions For Classificaiton

How do we measure classification error?


Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png

Statistical Learning Theory For Supervised Learning

Statistical Learning Theory

  • Ockham's Razor: The best models are simple models that fit the data well.
  • William of Ockham,English frier and philosopher (1287-1347) said that among hypotheses that predict equally well, we should choose the one with the fewest assumptions.

Paste_Image.png

We need a balance between accuracy and simplicity.
Most common machine learning methods choose f to minimize training error and complexity.
Aims to thwart the "curse" of dimensionality.

Paste_Image.png

Basic Outline for ML

  • step 1: Split data randomly into training and test sets.
  • step 2: Estimate coefficients/ Train Model:
Paste_Image.png
  • step 3: Score model: Compute score for each xi in the test set
  • step 4: Evaluate model.

Logistic Regression

simple, fast, often competes with the best ML algorithms.

Paste_Image.png
Paste_Image.png

Another perspective:

Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png

Evaluation Measures for Classifiers

Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png
Paste_Image.png

ROC Curves

  • Started during WWII for analyzing radar signals.
  • For a particular False Positive Rate(FPR), what is the True Positive Rate(TPR)?
  • FPR = number of negatives that were classified by the ML algorithm as positives / total number of negatives
  • TPR = number of positives that were classified by the ML algorithm as positives / total number of positives.
Paste_Image.png

TPR=7/11
FPR=3/11

Paste_Image.png

TPR=3/11
FPR=2/11


Paste_Image.png
Paste_Image.png
Paste_Image.png
最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
平臺聲明:文章內容(如有圖片或視頻亦包括在內)由作者上傳并發布,文章內容僅代表作者本人觀點,簡書系信息發布平臺,僅提供信息存儲服務。

推薦閱讀更多精彩內容

  • 最近男朋友老忙的,然后我也老忙的,都沒有時間在一起了?? 求老板給他減活! 求我能找到新機遇!
    任小菲閱讀 133評論 0 0
  • 星期六,今天我正好也休息,這樣在家就可以和兒子一起來做飯了。因為,上一次學校里就已經組織了一次讓孩子動手給父母做一...
    張子洋媽媽閱讀 815評論 0 2
  • 有時候,不帶目的性地做一件自己喜歡做的事,愿意去做的事,我覺得那是種很美好的感覺,隨心所欲,真誠的,善良的,開心就...
    秋陳閱讀 268評論 0 1
  • 第一次來。 這個應用才下載不超過幾個小時。 我甚至今天剛知道這個應用——簡書。 我什么都不知道,只知道它能寫東西,...
    Dreamfall閱讀 135評論 0 0
  • 還是那句話,我只推薦自己讀過的那些好書,其余的不做太多的評價。 計算機網絡,謝希仁版<a id="orgheadl...
    Yihulee閱讀 17,008評論 1 3