Setting Up the Programming Environment
Octave is recommended here; if you already have Matlab installed, that works as well. How to install Octave or Matlab is not covered here; please consult the relevant instructions yourself.
Multiple Features
We previously studied linear regression with a single variable; now we continue with the housing-price example to study linear regression with multiple variables.
As shown in the figure above, we add some features to the housing-price model, such as the number of bedrooms, the number of floors, and the age of the house. We let x_1, x_2, x_3, and x_4 denote the size of the house, the number of bedrooms, the number of floors, and the age of the house, respectively.
With these additional features we also need to introduce some new notation:
- n: the number of features
- x^(i): the i-th training example, i.e. the i-th row of the feature matrix
- x_j^(i): the value of feature j in the i-th training example (the j-th entry of the i-th row of the feature matrix)
The hypothesis for multivariate linear regression is therefore:
h_θ(x) = θ_0 + θ_1x_1 + θ_2x_2 + ··· + θ_nx_n
This formula has n+1 parameters and n variables. To simplify it we introduce x_0 = 1 (that is, x_0^(i) = 1 for every example), and the formula becomes:
h_θ(x) = θ_0x_0 + θ_1x_1 + θ_2x_2 + ··· + θ_nx_n
The formula now has n+1 parameters and n+1 variables, so we can treat the parameters and the variables as (n+1)-dimensional vectors (θ is the (n+1)-dimensional parameter vector and x is the (n+1)-dimensional feature vector), which lets us simplify the formula to:
h_θ(x) = θ^T x
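As a quick illustration (the numbers and variable names below are made up, not from the course), here is a minimal Octave sketch of computing h_θ(x) = θ^T x for a single example, and for a whole training set stacked into a design matrix:

```matlab
% One training example: size, bedrooms, floors, age (illustrative values).
x = [2104; 3; 2; 40];
x = [1; x];                      % prepend x_0 = 1, giving an (n+1)-vector
theta = [80; 0.1; 10; 5; -2];    % an (n+1)-dimensional parameter vector
h = theta' * x;                  % hypothesis h_theta(x) = theta' * x

% For m examples stacked as rows of a design matrix X (with a leading
% column of ones), all predictions are computed at once:
X = [1 2104 3 2 40;
     1 1416 2 1 30];
predictions = X * theta;         % m-vector of h_theta(x^(i))
```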
Supplementary Notes
Multiple Features
Linear regression with multiple variables is also known as "multivariate linear regression".
We now introduce notation for equations where we can have any number of input variables.
- x_j^(i) = value of feature j in the i-th training example
- x^(i) = the input (features) of the i-th training example
Note:
- m = the number of training examples
- n = the number of features
The multivariable form of the hypothesis function accommodating these multiple features is as follows:
hθ(x) = θ0+θ1x1+θ2x2+···+θnxn
In order to develop intuition about this function, we can think about θ0 as the basic price of a house, θ1 as the price per square meter, θ2 as the price per floor, etc. x1 will be the number of square meters in the house, x2 the number of floors, etc.
Using the definition of matrix multiplication, our multivariable hypothesis function can be concisely represented as:
h_θ(x) = [θ_0 θ_1 ··· θ_n] [x_0; x_1; ···; x_n] = θ^T x
This is a vectorization of our hypothesis function for one training example.
Gradient Descent for Multiple Variables
As with univariate linear regression, we again define a cost function J:
J(θ_0, θ_1, ..., θ_n) = 1/(2m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2
Our goal is the same as in the univariate case: to find the set of parameters that minimizes the cost function. In univariate linear regression we used gradient descent to find those parameters, so for multivariate linear regression we again use gradient descent.
That is:
repeat until convergence {
  θ_j := θ_j − α · ∂/∂θ_j J(θ_0, ..., θ_n)
} (simultaneously updating θ_j for every j = 0, ..., n)
Taking the partial derivatives gives:
repeat until convergence {
  θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)
} (simultaneously updating θ_j for every j = 0, ..., n)
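The update rule above translates almost directly into Octave. Below is a minimal sketch, assuming X is the m×(n+1) design matrix whose first column is all ones, y is the m×1 vector of targets, and gradientDescentMulti is just an illustrative function name:

```matlab
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
  % Run num_iters iterations of gradient descent, returning the final
  % parameters and the history of the cost J(theta) after each iteration.
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    errors = X * theta - y;                       % h_theta(x^(i)) - y^(i) for all i
    theta = theta - (alpha / m) * (X' * errors);  % simultaneous update of every theta_j
    J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);
  end
end
```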
Supplementary Notes
Gradient Descent for Multiple Variables
The gradient descent equation itself is generally the same form; we just have to repeat it for our 'n' features:
repeat until convergence: {
  θ_0 := θ_0 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_0^(i)
  θ_1 := θ_1 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_1^(i)
  θ_2 := θ_2 − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_2^(i)
  ...
}
In other words:
repeat until convergence: {
  θ_j := θ_j − α · (1/m) · Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x_j^(i)   for j := 0, ..., n
}
The following image compares gradient descent with one variable to gradient descent with multiple variables:
Feature Scaling
When we have multiple features, gradient descent converges much faster if the features are all on a similar scale.
Take housing-price prediction again, and suppose we use only two features: the size of the house, which ranges from 0 to 2000 square feet, and the number of bedrooms, which ranges from 0 to 5. We then draw the contour plot of the cost function with these two parameters on the horizontal and vertical axes.
From the plot we can see that the contours are very elongated ellipses, and the red trajectory in the figure shows that gradient descent needs many iterations to converge.
To make gradient descent converge faster, we therefore use feature scaling and mean normalization. Feature scaling divides each feature by its range (the maximum value minus the minimum value), so that the feature's new values lie in a range of roughly 1, i.e. −1 ≤ x_(i) ≤ 1. Mean normalization subtracts the mean of a feature from each of its values, so that the feature's new mean is 0. We usually implement both at once with the formula:
x_n := (x_n − μ_n) / s_n
where μ_n is the mean of the feature and s_n is its standard deviation (or the difference between the maximum and minimum values, i.e. max − min).
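A minimal Octave sketch of this transformation follows (featureNormalize is an illustrative name, and the element-wise division relies on implicit broadcasting, available in Octave and in MATLAB R2016b or later):

```matlab
function [X_norm, mu, sigma] = featureNormalize(X)
  % X holds the raw features only (no column of ones). mu and sigma are
  % returned so the same transform can be applied to new examples later.
  mu = mean(X);                % 1 x n vector of per-feature means
  sigma = std(X);              % 1 x n vector of per-feature standard deviations
  X_norm = (X - mu) ./ sigma;  % (x_n - mu_n) / s_n, column by column
end
```

To scale by the range instead of the standard deviation, sigma = max(X) - min(X) could be used in place of std(X).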
Supplementary Notes
Feature Scaling
We can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges, and so will oscillate inefficiently down to the optimum when the variables are very uneven.
The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally:
-1 ≤ x(i) ≤1
or
-0.5 ≤ x(i) ≤0.5
These aren't exact requirements; we are only trying to speed things up. The goal is to get all input variables into roughly one of these ranges, give or take a few.
Two techniques to help with this are feature scaling and mean normalization. Feature scaling involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. Mean normalization involves subtracting the average value for an input variable from the values for that input variable, resulting in a new average value for the input variable of just zero. To implement both of these techniques, adjust your input values as shown in this formula:
x_i := (x_i − μ_i) / s_i
Where μi is the average of all the values for feature (i) and si is the range of values (max - min), or si is the standard deviation.
Note that dividing by the range and dividing by the standard deviation give different results.
Learning Rate α
The number of iterations gradient descent needs in order to converge varies from model to model, and it is hard to tell in advance how many iterations will be required. In practice, we plot the cost function against the number of iterations and use this curve to judge whether gradient descent has converged.
This kind of plot also makes automatic convergence tests possible. (Note: an automatic convergence test is an algorithmic check for convergence; it usually compares the decrease of the cost function J(θ) in one iteration against a chosen threshold ε and declares convergence once the decrease falls below ε. Choosing a good ε is difficult, however, so in practice we usually still judge convergence by inspecting the curve.)
Every iteration of gradient descent is affected by the learning rate α. If α is too small, gradient descent needs a very large number of iterations to converge; if α is too large, gradient descent may go wrong: the cost function may fail to decrease on some iterations, and the algorithm may overshoot the local minimum and fail to converge.
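One practical way to pick α is to run gradient descent with a few candidate values and plot the cost curves side by side. The Octave sketch below assumes X and y are already defined and reuses the gradientDescentMulti sketch from above; the candidate values of α are only examples:

```matlab
alphas = [0.01 0.03 0.1 0.3];                 % illustrative candidate learning rates
num_iters = 400;
figure; hold on;
for k = 1:length(alphas)
  theta0 = zeros(size(X, 2), 1);              % start from theta = 0
  [~, J_history] = gradientDescentMulti(X, y, theta0, alphas(k), num_iters);
  plot(1:num_iters, J_history);               % J(theta) against iteration number
end
xlabel('Number of iterations');
ylabel('Cost J(\theta)');
legend('\alpha = 0.01', '\alpha = 0.03', '\alpha = 0.1', '\alpha = 0.3');
hold off;
```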
Supplementary Notes
Learning Rate
Debugging gradient descent. Make a plot with the number of iterations on the x-axis. Now plot the cost function J(θ) over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.
Automatic convergence test. Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as 10^(−3). However, in practice it is difficult to choose this threshold value.
It has been proven that if learning rate α is sufficiently small, then J(θ) will decrease on every iteration.
To summarize:
- if α is too small: slow convergence.
- if α is too large: may not decrease on every iteration and thus may not converge.
Features and Polynomial Regression
We have covered linear regression with multiple variables; now we look at polynomial regression, which lets us use the machinery of linear regression to fit very complicated functions, even nonlinear ones.
For example, sometimes we may want to fit the data with a quadratic model (h_θ(x) = θ_0 + θ_1x_1 + θ_2x_1^2), and sometimes with a cubic model (h_θ(x) = θ_0 + θ_1x_1 + θ_2x_1^2 + θ_3x_1^3), and so on.
Usually we need to look at the data first and then decide what form of model to use.
In addition, we can define new features:
- x_2 = x_1^2
- x_3 = x_1^3
- ...
which turns these polynomial regression models back into a linear regression model. (Note: because polynomial regression takes squares, cubes, and so on of the variables, it is necessary to apply feature scaling before running gradient descent.)
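A minimal Octave sketch of this construction, reusing the featureNormalize sketch from the feature-scaling section (x1 and its values are purely illustrative):

```matlab
x1 = [1000; 1500; 2000];                        % raw feature, e.g. house size
X_poly = [x1, x1 .^ 2, x1 .^ 3];                % new features x_2 = x_1^2, x_3 = x_1^3
[X_poly, mu, sigma] = featureNormalize(X_poly); % scale each column before gradient descent
X = [ones(size(X_poly, 1), 1), X_poly];         % prepend the x_0 = 1 column
% X can now be used with the same gradient descent as ordinary linear regression.
```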
Supplementary Notes
Features and Polynomial Regression
We can improve our features and the form of our hypothesis function in a couple different ways.
We can combine multiple features into one. For example, we can combine x1 and x2 into a new feature x3 by taking x1 * x2.
Polynomial Regression
Our hypothesis function need not be linear (a straight line) if that does not fit the data well.
We can change the behavior or curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form).
For example, if our hypothesis function is h_θ(x) = θ_0 + θ_1x_1, then we can create additional features based on x_1 to get the quadratic function h_θ(x) = θ_0 + θ_1x_1 + θ_2x_1^2 or the cubic function h_θ(x) = θ_0 + θ_1x_1 + θ_2x_1^2 + θ_3x_1^3.
In the cubic version, we have created new features x_2 = x_1^2 and x_3 = x_1^3.
To make it a square root function, we could do:
h_θ(x) = θ_0 + θ_1x_1 + θ_2√x_1
One important thing to keep in mind is that if you choose your features this way, then feature scaling becomes very important.
E.g. if x_1 has range 1–1000, then the range of x_1^2 becomes 1–1,000,000 and that of x_1^3 becomes 1–1,000,000,000.