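These are my study notes from the Kaggle hands-on project "Daily Sea Ice Extent Data". The project analyses the daily sea ice extent data provided by NSIDC (National Snow and Ice Data Center). The source code I studied is Mathew Savage's "visualisation of sea-ice data". Shared for exchange and reference only.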
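3 Time series analysis
3.1 Daily variation in sea ice extent
Because the data are already daily, no further processing is needed. An x-y line plot shows the day-to-day variation.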
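- Main figure: `plt.figure`, `plt.plot`
`plt.figure(figsize=(9,3))` sets the figure size in inches (9 wide, 3 high). Both hemispheres are then drawn as lines on the same axes.
```python
plt.figure(figsize=(9,3))
plt.plot(north.index,north['Extent'],label="North Hemisphere")
plt.plot(south.index,south['Extent'],label="South Hemisphere")
```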
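To try the overlay without the Kaggle files, here is a minimal sketch on synthetic data. The `north`/`south` frames below are made-up seasonal curves standing in for the real DataFrames (assumption: only a DatetimeIndex and an `Extent` column are needed). The Agg backend is selected so the sketch runs without a display.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend, no window needed
import matplotlib.pyplot as plt

# Synthetic seasonal curves standing in for north/south (not NSIDC data)
dates = pd.date_range("2000-01-01", periods=730, freq="D")
phase = 2 * np.pi * dates.dayofyear.to_numpy() / 365.25
north = pd.DataFrame({"Extent": 12.0 + 3.0 * np.cos(phase)}, index=dates)
south = pd.DataFrame({"Extent": 11.0 - 7.0 * np.cos(phase)}, index=dates)

fig = plt.figure(figsize=(9, 3))  # width 9 in, height 3 in
# Successive plt.plot calls draw on the same current axes, overlaying the lines
plt.plot(north.index, north["Extent"], label="North Hemisphere")
plt.plot(south.index, south["Extent"], label="South Hemisphere")
ax = plt.gca()
```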
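- Legend: `plt.legend`
```python
#add plot legend and titles
plt.legend(bbox_to_anchor=(0.,-.363,1.,.102),loc=3,ncol=2,mode="expand",borderaxespad=0)
```
Parameters:
- `bbox_to_anchor=(0.,-.363,1.,.102)`: the anchor box as (x, y, width, height) in axes coordinates. A 2-tuple (x, y) gives only an anchor point. With a 4-tuple the legend is placed inside that box, and the width matters when `mode="expand"`. Here the box spans the full axes width and sits below the axes (negative y).
- `loc=3`: places the legend at the lower left of the anchor box, equivalent to `loc="lower left"`. It may be omitted, in which case the default (`"best"` for axes legends) picks the position inside the box.
- `ncol=2`: the number of legend columns, here two (Matplotlib 3.6+ also accepts `ncols`).
- `mode="expand"`: one of {"expand", None}. "expand" stretches the legend horizontally to fill the anchor box (or the axes if no box is given).
- `borderaxespad=0`: the padding between the legend border and the axes (or anchor box), in font-size units.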
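A sketch of the legend placement on a toy plot, assuming nothing beyond Matplotlib itself. After drawing, it converts the legend's extent into axes coordinates, which makes the effect of `bbox_to_anchor` plus `mode="expand"` checkable: the legend sits below the axes (negative y) and spans nearly the full axes width.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(9, 3))
ax.plot([0, 1, 2], [3, 1, 2], label="North Hemisphere")  # toy data
ax.plot([0, 1, 2], [1, 2, 3], label="South Hemisphere")

# Anchor box (x, y, width, height) in axes coordinates: full width, below the axes
leg = ax.legend(bbox_to_anchor=(0., -.363, 1., .102), loc=3,
                ncol=2, mode="expand", borderaxespad=0)

fig.canvas.draw()  # lay out the figure so extents are final
renderer = fig.canvas.get_renderer()
box = leg.get_window_extent(renderer).transformed(ax.transAxes.inverted())
print(box.y1, box.width)  # top edge and width, in axes units
```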
- Title and axis labels: `plt.title`, `plt.xlabel`, `plt.ylabel`
```python
plt.ylabel("Sea ice extent (10^6 sq km)")
plt.xlabel('Date')
plt.title('Daily sea ice extent')
```
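3.2 Annual variation in sea ice extent
3.2.1 Resampling a time series
Resampling converts a time series from one frequency to another. It covers downsampling (high to low frequency, which aggregates several values into one) and upsampling (low to high frequency, which has to fill or interpolate the new points).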
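Parameters of `resample` (as listed in the older pandas API these notes follow; `how`, `fill_method` and `loffset` have since been removed from pandas, and `kind` is deprecated):
- `rule`, e.g. `'12M'`, `'5min'` or `pd.offsets.Second(15)`: the target frequency. pandas 2.2 deprecates the aliases `'M'` and `'A'` in favour of `'ME'` and `'YE'`.
- `how='mean'`, `'sum'`, `'max'`, `'min'`, `'first'`, `'last'`, `'median'`: the aggregation method (`'ohlc'` computes open/high/low/close, as used in finance). Current pandas calls the method on the resampler instead, e.g. `.resample('5min').sum()`.
- `axis=0`: the axis to resample along.
- `closed='right'` or `'left'`: which end of each bin interval is included.
- `label='right'` or `'left'`: which bin edge labels the result; for the bin 9:30–9:35, `label='right'` gives 9:35. Both `closed` and `label` default to `'left'`, except for end-anchored frequencies such as `'M'`, `'A'`, `'Q'`, `'BM'`, `'BA'`, `'BQ'` and `'W'`, where they default to `'right'`.
- `loffset=None` or e.g. `'-1s'`: shifts the result labels, here one second earlier.
- `kind=None`: `'period'` or `'timestamp'` converts the result index to that type; by default the input index type is kept.
- `fill_method='ffill'` or `'bfill'`: how new points are filled when upsampling (now `.ffill()` / `.bfill()` on the resampler).
- `limit=None`: the maximum number of consecutive periods to fill when upsampling.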
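The effect of `closed` and `label` is easiest to see on a tiny series. This sketch uses the current pandas spelling (`.resample(...).sum()` instead of `how='sum'`) and twelve made-up one-minute values.

```python
import pandas as pd

# Toy series: one value per minute from 09:00 to 09:11 (values 0..11)
idx = pd.date_range("2000-01-01 09:00", periods=12, freq="min")
ts = pd.Series(range(12), index=idx)

# Default for a minute frequency: bins closed on the left, labelled by the left edge
left = ts.resample("5min").sum()
# Bins closed on the right and labelled by the right edge: (09:00, 09:05] -> 09:05
right = ts.resample("5min", closed="right", label="right").sum()

print(left)
print(right)
```

With the defaults, 9:05 opens the second bin; with `closed='right'` it closes the bin (9:00, 9:05] instead, which is why the two results differ.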
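The daily data need to be averaged over coarser periods (annual means in 3.2.2). This is done with `north.resample`, i.e. the `resample` method of a pandas object.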
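Examples
Which end of each interval is closed, and which edge is used as its label?
Downsampling (aggregation), controlled by `closed` and `label`:
```python
ts.resample('5min').sum()  # older pandas: ts.resample('5min', how='sum')
```
Grouping by calendar fields with groupby:
```python
ts.groupby(lambda x: x.month).mean()
ts.groupby(lambda x: x.weekday).mean()
```
Upsampling needs filling or interpolation, controlled by `fill_method` and `limit`:
```python
df_daily = frame.resample('D').ffill()  # older pandas: frame.resample('D', fill_method='ffill')
```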
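A runnable version of the groupby and upsampling examples, on toy data with the current pandas API (the values are invented for illustration). It also shows what `limit` does: only one missing day per gap is filled.

```python
import pandas as pd

# Toy series: one value in January and one in February of two different years
ts = pd.Series([1.0, 2.0, 3.0, 4.0],
               index=pd.to_datetime(["2001-01-10", "2001-02-10",
                                     "2002-01-10", "2002-02-10"]))
# Group by calendar month across years: January -> (1 + 3) / 2, February -> (2 + 4) / 2
by_month = ts.groupby(lambda x: x.month).mean()

# Upsampling: two observations three days apart, converted to daily frequency
frame = pd.DataFrame({"x": [1.0, 4.0]},
                     index=pd.to_datetime(["2000-01-01", "2000-01-04"]))
df_daily = frame.resample("D").ffill()           # fill every new row
df_limited = frame.resample("D").ffill(limit=1)  # fill at most one row per gap
print(df_limited)
```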
3.2.2 Downsampling the sea ice series
The sampling is changed from daily ('D') to 12-month ('12M') bins, taking the mean of each bin.
```python
#resample raw data into annual averages ('12ME' in pandas >= 2.2)
northyear=north.resample('12M').mean()
southyear=south.resample('12M').mean()
```
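With a month-end frequency, bins are closed on the right and labelled by their right edge by default. The first and last bins may cover incomplete periods, so they are removed. Note that `'12M'` bins are anchored on the month end of the first observation rather than on December, so they are consecutive 12-month periods, not necessarily calendar years; `'A'` (`'YE'` in pandas 2.2+) gives calendar-year bins.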
```python
#remove the initial and final items as they are averaged incorrectly
northyear=northyear[1:-1]
southyear=southyear[1:-1]
```
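The anchoring of 12-month bins can be checked on a toy series that starts in mid-March. This sketch uses offset objects (`pd.offsets.MonthEnd(12)`, `pd.offsets.YearEnd()`) rather than frequency strings, so it runs unchanged across the pandas versions that renamed the aliases. The data are synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic daily series starting mid-March (not NSIDC data)
idx = pd.date_range("2000-03-15", "2003-12-31", freq="D")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# 12 month-ends: bin edges follow the month of the first observation (March)
twelve_month = s.resample(pd.offsets.MonthEnd(12)).mean()
# Year-end offset: calendar-year bins ending on 31 December
calendar = s.resample(pd.offsets.YearEnd()).mean()
# Drop the first and last bins, which cover incomplete years
calendar_trimmed = calendar[1:-1]
print(twelve_month.index)
print(calendar_trimmed.index)
```

Every 12-month bin is labelled 31 March, while the year-end bins end on 31 December. After trimming the partial first and last years, only 2001 and 2002 remain.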
3.2.3 Plotting
```python
#plot
plt.figure(figsize=(9,3))
plt.plot(northyear.Year,northyear['Extent'],marker='.',label='North Hemisphere')
plt.plot(southyear.Year,southyear['Extent'],marker='.',label='South Hemisphere')
#add plot legend and title
plt.legend()
plt.xlabel('Year')
plt.ylabel('Sea ice extent (10^6 sq km)')
plt.title('Annual average sea ice')
plt.xlim(1977,2016)
```
- `plt.xlim` restricts the x-axis range, here to 1977–2016.
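A sketch of the annual plot on synthetic annual means (a straight declining line, not real data). One design choice differs from the notes and is an assumption of this sketch: the x values come from the index of calendar-year bins (`northyear.index.year`) instead of the averaged `Year` column, so each point sits on an integer year.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt

# Synthetic annual means indexed by year-end dates (not NSIDC data)
years = pd.date_range("1979-12-31", "2015-12-31", freq=pd.offsets.YearEnd())
northyear = pd.DataFrame({"Extent": np.linspace(12.5, 10.5, len(years))}, index=years)

fig, ax = plt.subplots(figsize=(9, 3))
# Integer years on the x-axis, one '.' marker per annual mean
ax.plot(northyear.index.year, northyear["Extent"], marker=".", label="North Hemisphere")
ax.set_xlim(1977, 2016)  # same effect as plt.xlim(1977, 2016) on the current axes
```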
3.3 Monthly variation in sea ice extent
```python
#define the range of years to plot
start=1978
end=dt.datetime.now().year+1
```
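Two subplots are drawn with `plt.subplots`; `sharex=True` makes them share the x-axis. It returns `f`, the Figure, and `axarr`, an array holding the two Axes.
```python
#define plot
f,axarr=plt.subplots(2,sharex=True,figsize=(9,6))
```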
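The major tick labels of the x-axis are formatted as abbreviated month names with `axarr[0].xaxis.set_major_formatter(mdates.DateFormatter("%b"))`; because the subplots share the x-axis, they also share its formatter.
Each year is plotted as its own line, and Matplotlib cycles through the colours of the axes' property cycle, so a gradient is set as that cycle: `plt.cm.winter` is sampled once per year with `np.linspace(0,1,len(range(start,end)))`, running from blue for the earliest years to green for the latest.
```python
#organise plot axes
month_fmt=mdates.DateFormatter("%b")
axarr[0].xaxis.set_major_formatter(month_fmt)
axarr[0].set_prop_cycle(plt.cycler('color',plt.cm.winter(np.linspace(0,1,len(range(start,end))))))
axarr[1].set_prop_cycle(plt.cycler('color',plt.cm.winter(np.linspace(0,1,len(range(start,end))))))
```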
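A self-contained sketch of the two-panel layout with a per-year colour gradient, using five toy years and flat dummy lines. It sets the month formatter after plotting, a conservative order chosen for this sketch, and imports `cycler` from the `cycler` package that Matplotlib depends on (the notes use `plt.cycler` for the same purpose).

```python
import datetime as dt
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from cycler import cycler

start, end = 1978, 1983                             # toy range: five years
n_years = len(range(start, end))
colors = plt.cm.winter(np.linspace(0, 1, n_years))  # one RGBA row per year

f, axarr = plt.subplots(2, sharex=True, figsize=(9, 6))
for ax in axarr:
    ax.set_prop_cycle(cycler(color=colors))         # lines take the colours in order

# Dummy flat lines, one per year, on a common 1972 date axis
x = [dt.date(1972, 1, 1), dt.date(1972, 12, 31)]
for i in range(n_years):
    axarr[0].plot(x, [i, i])

# Shared x-axes share their tick formatter, so this labels both panels
axarr[0].xaxis.set_major_formatter(mdates.DateFormatter("%b"))
```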
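Axis labels and titles of the subplots are set with `axarr[i].set_xlabel`, `axarr[i].set_ylabel` and `axarr[i].set_title`.
`axarr[i].add_artist(AnchoredText(...))` adds a text box (`AnchoredText` comes from `matplotlib.offsetbox`); `loc` sets its position using the legend location codes, e.g. 2 = upper left, 3 = lower left.
```python
#add axis labels, title and hemisphere labels
axarr[0].set_ylabel('Sea ice extent (10^6 sq km)')
axarr[1].set_ylabel('Sea ice extent (10^6 sq km)')
axarr[1].set_xlabel('Month')
axarr[0].set_title('Annual change in sea-ice extent')
axarr[0].add_artist(AnchoredText('Northern Hemisphere', loc=3))
axarr[1].add_artist(AnchoredText('Southern Hemisphere', loc=2))
```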
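A sketch of the labels and `AnchoredText` boxes on an empty pair of subplots, isolating the location codes (no data are plotted).

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnchoredText

f, axarr = plt.subplots(2, sharex=True, figsize=(9, 6))
axarr[0].set_ylabel("Sea ice extent (10^6 sq km)")
axarr[1].set_ylabel("Sea ice extent (10^6 sq km)")
axarr[1].set_xlabel("Month")
axarr[0].set_title("Annual change in sea-ice extent")

# loc uses the legend location codes: 3 = lower left, 2 = upper left
north_box = AnchoredText("Northern Hemisphere", loc=3)
south_box = AnchoredText("Southern Hemisphere", loc=2)
axarr[0].add_artist(north_box)
axarr[1].add_artist(south_box)
```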
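The author does not show the monthly variation by computing monthly means. Instead, each year's daily curve is drawn in a loop, all on the same axes. To put every year on one common time axis, the 'Year' of every record is artificially set to 1972. 1972 is a leap year, so 29 February records from leap years still map to valid dates. No resampling is needed; the data are plotted directly.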
```python
# loop for every year between the start year and current
for year in range(start, end):
    # create new dataframe for each year,
    # and set the year to 1972 so all are plotted on the same axis
    # (.copy() avoids a SettingWithCopyWarning when adding the Year column)
    nyeardf = north[['Extent', 'Day', 'Month']][north['Year'] == year].copy()
    nyeardf['Year'] = 1972
    nyeardf['Date'] = pd.to_datetime(nyeardf[['Year','Month','Day']])
    nyeardf.index = nyeardf['Date'].values
    syeardf = south[['Extent', 'Day', 'Month']][south['Year'] == year].copy()
    syeardf['Year'] = 1972
    syeardf['Date'] = pd.to_datetime(syeardf[['Year','Month','Day']])
    syeardf.index = syeardf['Date'].values
    # plot each year individually
    axarr[0].plot(nyeardf.index,nyeardf['Extent'], label = year)
    axarr[1].plot(syeardf.index,syeardf['Extent'])
```
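The core trick of the loop, re-dating one year's rows onto 1972, can be sketched on toy records that include a leap year. The helper name `to_common_year` is hypothetical, not from the original kernel; it uses `.loc[...].copy()` so adding the `Year` column does not act on a view. The last statement shows why the common year must be a leap year: 29 February cannot be assembled into a valid date in 1971.

```python
import numpy as np
import pandas as pd

# Synthetic daily records spanning the leap year 2000 (not NSIDC data)
dates = pd.date_range("1999-12-30", "2001-01-02", freq="D")
north = pd.DataFrame({"Year": dates.year, "Month": dates.month,
                      "Day": dates.day, "Extent": np.ones(len(dates))})

def to_common_year(df, year, common=1972):
    """Hypothetical helper: one year's rows, re-dated onto the common year."""
    sub = df.loc[df["Year"] == year, ["Extent", "Day", "Month"]].copy()
    sub["Year"] = common
    # to_datetime assembles dates from the Year/Month/Day columns
    sub.index = pd.to_datetime(sub[["Year", "Month", "Day"]])
    return sub

y2000 = to_common_year(north, 2000)

# 1971 is not a leap year, so 29 February becomes NaT under errors="coerce"
bad = pd.to_datetime(pd.DataFrame({"year": [1971], "month": [2], "day": [29]}),
                     errors="coerce")
```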
3.4 Summary
Key points of this chapter: resampling time-series data and drawing x-y line plots.
3.5 Complete code
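```python
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.offsetbox import AnchoredText
# north / south: daily DataFrames from earlier chapters (DatetimeIndex + Year, Month, Day, Extent)
plt.figure(figsize=(9,3))
plt.plot(north.index,north['Extent'],label="North Hemisphere")
plt.plot(south.index,south['Extent'],label="South Hemisphere")
#add plot legend and titles
#plt.legend(bbox_to_anchor=(0.,-.363,1.,.102),loc=3,ncol=2,mode="expand",borderaxespad=0)
plt.legend(bbox_to_anchor=(0.1,-0.1,0.8,0),ncol=2,mode="expand",borderaxespad=0)
plt.ylabel("Sea ice extent (10^6 sq km)")
plt.xlabel('Date')
plt.title('Daily sea ice extent')
```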
```python
#define the range of years to plot
start=1978
end=dt.datetime.now().year+1
#define plot
f,axarr=plt.subplots(2,sharex=True,figsize=(9,6))
#organise plot axes
month_fmt=mdates.DateFormatter("%b")
axarr[0].xaxis.set_major_formatter(month_fmt)
axarr[0].set_prop_cycle(plt.cycler('color',plt.cm.winter(np.linspace(0,1,len(range(start,end))))))
axarr[1].set_prop_cycle(plt.cycler('color',plt.cm.winter(np.linspace(0,1,len(range(start,end))))))
#add axis labels, title and hemisphere labels
axarr[0].set_ylabel('Sea ice extent (10^6 sq km)')
axarr[1].set_ylabel('Sea ice extent (10^6 sq km)')
axarr[1].set_xlabel('Month')
axarr[0].set_title('Annual change in sea-ice extent')
axarr[0].add_artist(AnchoredText('Northern Hemisphere', loc=3))
axarr[1].add_artist(AnchoredText('Southern Hemisphere', loc=2))
# loop for every year between the start year and current
for year in range(start, end):
    # create new dataframe for each year,
    # and set the year to 1972 so all are plotted on the same axis
    nyeardf = north[['Extent', 'Day', 'Month']][north['Year'] == year].copy()
    nyeardf['Year'] = 1972
    nyeardf['Date'] = pd.to_datetime(nyeardf[['Year','Month','Day']])
    nyeardf.index = nyeardf['Date'].values
    syeardf = south[['Extent', 'Day', 'Month']][south['Year'] == year].copy()
    syeardf['Year'] = 1972
    syeardf['Date'] = pd.to_datetime(syeardf[['Year','Month','Day']])
    syeardf.index = syeardf['Date'].values
    # plot each year individually
    axarr[0].plot(nyeardf.index,nyeardf['Extent'], label = year)
    axarr[1].plot(syeardf.index,syeardf['Extent'])
```