Applications of the cost function and the gradient descent algorithm
/#1
Consider the following training set of m=4 training examples:
x | y |
---|---|
1 | 0.5 |
2 | 1 |
4 | 2 |
0 | 0 |
Consider the linear regression model hθ(x)=θ0+θ1x. What are the values of θ0 and θ1 that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)
A. θ0=0.5,θ1=0
B. θ0=0.5,θ1=0.5
C. θ0=1,θ1=1
D. θ0=1,θ1=0.5
F. θ0=0,θ1=0.5
Analysis: the four training examples lie exactly on the line y = 0.5x (a simple linear equation in one variable), so gradient descent on a model that can fit the data perfectly converges to θ0 = 0 and θ1 = 0.5; the answer is F.
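As a sanity check, here is a minimal sketch (my own, not part of the quiz) that runs batch gradient descent on the four examples; the learning rate and iteration count are arbitrary choices for illustration, and the parameters settle at θ0 ≈ 0, θ1 ≈ 0.5:

```python
# Minimal sketch: batch gradient descent on the four training examples,
# confirming convergence to theta0 = 0, theta1 = 0.5.
X = [1.0, 2.0, 4.0, 0.0]
y = [0.5, 1.0, 2.0, 0.0]
m = len(X)

theta0, theta1 = 0.0, 0.0   # arbitrary initialization
alpha = 0.1                 # learning rate chosen for illustration

for _ in range(5000):
    errors = [theta0 + theta1 * x - t for x, t in zip(X, y)]   # h_theta(x) - y
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, X)) / m
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(round(theta0, 4), round(theta1, 4))   # approximately 0.0 and 0.5
```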
/#2
Let f be some function so that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)
A. If θ0 and θ1 are initialized at the global minimum, then one iteration will not change their values.
B. Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent.
C. If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value.
D. No matter how θ0 and θ1 are initialized, so long as α is sufficiently small, we can safely expect gradient descent to converge to the same solution.
Analysis: the learning rate α only controls how fast the parameters change. A very small α merely slows convergence rather than speeding it up (so B is false), and because f may have local optima, gradient descent can converge to different solutions from different initializations (so D is false). At the global minimum the gradient is zero, so one iteration leaves the values unchanged (A is true), and a too-large α can cause f to increase (C is true).
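A toy sketch (my own example, using the assumed function f(θ) = θ² rather than the quiz's unknown f) to illustrate the two learning-rate claims: a very small α still decreases f, only slowly, while a too-large α makes f increase at every step:

```python
# Toy illustration on f(theta) = theta**2, whose gradient is 2 * theta.
def descend(alpha, steps=10, theta=1.0):
    history = []
    for _ in range(steps):
        theta -= alpha * 2 * theta      # gradient descent step
        history.append(theta ** 2)      # value of f after the step
    return history

print(descend(alpha=0.01))   # very small alpha: f decreases, but slowly (B is false)
print(descend(alpha=1.5))    # too-large alpha: f grows every step (C is true)
```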
/#3
For this question, assume that we are using the training set from Q1. Recall that our definition of the cost function was J(θ0,θ1) = (1/(2m)) ∑_{i=1}^{m} (hθ(x^(i)) − y^(i))². What is J(0,1)? In the box below, please enter your answer (simplify fractions to decimals when entering the answer, and use '.' as the decimal delimiter, e.g., 1.5).
Analysis: with θ0 = 0 and θ1 = 1 we have hθ(x) = x, so J(0,1) = (1/(2·4))[(1 − 0.5)² + (2 − 1)² + (4 − 2)² + (0 − 0)²] = 5.25/8 = 0.65625.
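A short sketch (my own) that evaluates the cost function on the Q1 training set to confirm the value of J(0,1):

```python
# Evaluate J(theta0, theta1) = (1 / (2m)) * sum of squared errors on the Q1 data.
X = [1.0, 2.0, 4.0, 0.0]
y = [0.5, 1.0, 2.0, 0.0]
m = len(X)

theta0, theta1 = 0.0, 1.0
J = sum((theta0 + theta1 * x - t) ** 2 for x, t in zip(X, y)) / (2 * m)
print(J)   # 0.65625
```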
Multivariate linear regression
Suppose m=4 students have taken some class, and the class had a midterm exam and a final exam. You have collected a dataset of their scores on the two exams, which is as follows:
midterm exam | (midterm exam)^2 | final exam |
---|---|---|
89 | 7921 | 96 |
72 | 5184 | 74 |
94 | 8836 | 87 |
69 | 4761 | 78 |
You'd like to use polynomial regression to predict a student's final exam score from their midterm exam score. Concretely, suppose you want to fit a model of the form hθ(x)=θ0+θ1x1+θ2x2, where x1 is the midterm score and x2 is (midterm score)^2. Further, you plan to use both feature scaling (dividing by the "max-min", or range, of a feature) and mean normalization.
What is the normalized feature x1^(3)? (Hint: midterm = 94, final = 87 is training example 3.) Please round off your answer to two decimal places and enter it in the text box below.
Formula: normalized feature = (value − mean) / (max − min)
Analysis: x1^(3) is the midterm score of training example 3, so the mean and range must come from the midterm-exam column, not the squared column.
Mean = (89 + 72 + 94 + 69) / 4 = 81
Max − Min = 94 − 69 = 25
x1^(3) = (94 − 81) / 25 = 0.52
Rounded to two decimal places: 0.52
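A small sketch (my own) of the mean-normalization computation for the midterm feature of training example 3:

```python
# Mean normalization with range scaling for the midterm feature x1.
midterm = [89, 72, 94, 69]

mean = sum(midterm) / len(midterm)          # 81.0
value_range = max(midterm) - min(midterm)   # 94 - 69 = 25

x1_3 = (94 - mean) / value_range            # training example 3 has midterm = 94
print(round(x1_3, 2))   # 0.52
```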