Thanks to Dr.fish for the patient explanations and detailed answers.
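The in-class assignment for this lesson is as follows:
There is a sample of 100 house areas with a mean of 300.85 ㎡, and the population standard deviation is known to be 86 ㎡.
Use the t distribution to find the 95% confidence interval for the mean house area.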
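Because the assignment states the summary statistics directly, the interval can also be worked out from them alone, without the CSV. This minimal sketch assumes the stated σ = 86 ㎡ is used as the standard deviation. It compares the t critical value the assignment asks for with the z value that a known population σ would strictly call for; with n = 100 the two are close (about 1.984 vs 1.960).

```python
import math
from scipy import stats

# Summary statistics stated in the assignment
sample_mean = 300.85  # sample mean, in m²
sigma = 86.0          # population standard deviation given in the assignment, in m²
n = 100               # sample size
confidence = 0.95

standard_error = sigma / math.sqrt(n)  # 86 / 10 = 8.6

# Two-sided critical values: the (1 + confidence) / 2 = 0.975 quantile
t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)  # about 1.984
z_crit = stats.norm.ppf((1 + confidence) / 2)         # about 1.960

t_interval = (sample_mean - t_crit * standard_error, sample_mean + t_crit * standard_error)
z_interval = (sample_mean - z_crit * standard_error, sample_mean + z_crit * standard_error)

print("t-based 95%% CI: (%.2f, %.2f)" % t_interval)
print("z-based 95%% CI: (%.2f, %.2f)" % z_interval)
```

The t-based interval comes out at roughly (283.8, 317.9), and the z-based one is slightly narrower.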
Import the analysis packages and load the data
import scipy.stats
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
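# Jupyter/IPython magic for high-resolution inline figures; omit when running as a plain script
%config InlineBackend.figure_format = 'retina'
house = pd.read_csv('house_size.csv', header=None)  # the CSV has no header row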
Take all the data (the first and only column)
house_size = house.iloc[:, 0]  # all rows of the first column, as a pandas Series
Compute the 95% confidence interval for the mean house area using the t distribution
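For reference, the two-sided t interval for the mean, with s the sample standard deviation, α = 1 − 0.95 and n = 100, is shown below; the multiplier is an upper quantile of the t distribution with n − 1 degrees of freedom.

```latex
\bar{x} \pm t_{1-\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}},
\qquad t_{0.975,\,99} \approx 1.984
```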
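# Compute the 95% confidence interval for the mean house area using the t distribution
house_std = house_size.std()  # sample standard deviation (pandas uses ddof=1 by default)
sample_mean = house_size.mean()  # sample mean
sample_size = len(house_size)  # n = 100
# Two-sided critical value t_{0.975, n-1}: ppf is the inverse CDF (quantile function)
t_score = scipy.stats.t.ppf(0.975, sample_size - 1)
margin_error = t_score * house_std / np.sqrt(sample_size)  # margin of error = t * s / sqrt(n)
lower_limit = sample_mean - margin_error
upper_limit = sample_mean + margin_error
print('95%% Confidence Interval: (%.1f, %.1f)' % (lower_limit, upper_limit))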
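# Output as originally reported. It was produced with the density t.pdf(0.025, 99) ≈ 0.40 as the multiplier instead of the critical value t.ppf(0.975, 99) ≈ 1.98, so this interval is about five times too narrow:
95% Confidence Interval: ( 297.3, 304.4)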
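The difference between the two SciPy calls is easy to check directly. This minimal sketch (df = 99, matching n = 100) shows that `pdf` returns the height of the density curve near its peak, whereas `ppf`, the inverse of `cdf`, returns the 97.5th percentile that serves as the critical value.

```python
from scipy import stats

df = 99  # degrees of freedom for a sample of 100

# pdf: height of the density curve at x = 0.025 (close to the peak at 0), NOT a critical value
density = stats.t.pdf(0.025, df)

# ppf: inverse CDF, the x below which 97.5% of the probability lies
critical = stats.t.ppf(0.975, df)

# Round-trip check: the CDF evaluated at the critical value gives back 0.975
back = stats.t.cdf(critical, df)

print("t.pdf(0.025, 99)    = %.4f" % density)
print("t.ppf(0.975, 99)    = %.4f" % critical)
print("t.cdf(critical, 99) = %.4f" % back)
```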
Another approach: define a function to compute the confidence interval
# Define a function to compute a two-sided t confidence interval for the mean
def ci_t(data, house_std, confidence):
    sample_mean = np.mean(data)
    sample_size = len(data)
    alpha = (1 - confidence) / 2  # probability in each tail, 0.025 for 95%
    t_score = scipy.stats.t.ppf(1 - alpha, sample_size - 1)  # upper critical value
    ME = t_score * house_std / np.sqrt(sample_size)  # margin of error
    lower_limit = sample_mean - ME
    upper_limit = sample_mean + ME
    return (lower_limit, upper_limit)
Call the function on the data
# 95% confidence level
ci_t(house_size, house_std, 0.95)
# Output as originally reported (also computed with t.pdf rather than t.ppf, so too narrow):
(297.311149,304.388851)
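As a cross-check, SciPy's built-in `scipy.stats.t.interval` should agree with the hand-written function. The sketch below restates the function with `ppf` under a generic parameter name `std` and runs both on a hypothetical synthetic sample drawn with NumPy (not the real house_size.csv data), so the printed numbers are illustrative only.

```python
import numpy as np
import scipy.stats

def ci_t(data, std, confidence):
    """Two-sided t confidence interval for the mean: mean ± t_{1-alpha} * std / sqrt(n)."""
    sample_mean = np.mean(data)
    sample_size = len(data)
    alpha = (1 - confidence) / 2  # probability in each tail
    t_score = scipy.stats.t.ppf(1 - alpha, sample_size - 1)
    margin = t_score * std / np.sqrt(sample_size)
    return (sample_mean - margin, sample_mean + margin)

# Hypothetical synthetic sample standing in for house_size.csv (not the real data)
rng = np.random.default_rng(0)
data = rng.normal(loc=300, scale=86, size=100)

s = np.std(data, ddof=1)  # sample standard deviation, matching pandas' Series.std()
manual = ci_t(data, s, 0.95)

# Built-in interval: first positional argument is the confidence level,
# loc is the centre and scale is the standard error of the mean
builtin = scipy.stats.t.interval(0.95, len(data) - 1,
                                 loc=np.mean(data), scale=scipy.stats.sem(data))

print(manual)
print(builtin)
```

Passing the confidence level positionally keeps the call working on SciPy versions both before and after that argument was renamed.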