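This article walks you through a simple analysis of S&P 100 data using Python. You will learn:
- NumPy arrays and their operations
- Filtering data with boolean indexing
- Drawing scatter plots and histograms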
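S&P 100 data
The S&P 100 index measures the stock performance of large companies. It is made up of 100 major companies across multiple sectors. The figure below shows the share of each sector in the S&P 100 in 2017.
The data analyzed in this article is shown in the table below. It has four columns: company name (Name), sector (Sector), share price (Price), and earnings per share (EPS).
We store these four columns in four separate Python lists.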
names = ['Apple Inc', 'Abbvie Inc', 'Abbott Laboratories', 'Accenture Plc', 'Allergan Plc', 'American International Group', 'Allstate Corp', 'Amgen', 'Amazon.Com Inc.', 'American Express Company', 'Boeing Company', 'Bank of America Corp', 'Biogen Inc', 'Bank of New York Mellon Corp', 'Blackrock', 'Bristol-Myers Squibb Company', 'Berkshire Hath Hld B', 'Citigroup Inc', 'Caterpillar Inc', 'Celgene Corp', 'Charter Communicatio', 'Colgate-Palmolive Company', 'Comcast Corp A', 'Capital One Financial Corp', 'Conocophillips', 'Costco Wholesale', 'Cisco Systems Inc', 'CVS Corp', 'Chevron Corp', 'Danaher Corp', 'Walt Disney Company', 'Duke Energy Corp', 'Dowdupont Inc.', 'Emerson Electric Company', 'Exelon Corp', 'Ford Motor Company', 'Facebook Inc', 'Fedex Corp', '21st Centry Fox Class B', '21st Centry Fox Class A', 'General Dynamics Corp', 'General Electric Company', 'Gilead Sciences Inc', 'General Motors Company', 'Alphabet Class C', 'Alphabet Class A', 'Goldman Sachs Group', 'Halliburton Company', 'Home Depot', 'Honeywell International Inc', 'International Business Machines', 'Intel Corp', 'Johnson & Johnson', 'JP Morgan Chase & Co', 'Kraft Heinz Co', 'Kinder Morgan', 'Coca-Cola Company', 'Eli Lilly and Company', 'Lockheed Martin Corp', "Lowe's Companies", 'Mastercard Inc', "McDonald's Corp", 'Mondelez Intl Cmn A', 'Medtronic Inc', 'Metlife Inc', '3M Company', 'Altria Group', 'Monsanto Company', 'Merck & Company', 'Morgan Stanley', 'Microsoft Corp', 'Nextera Energy', 'Nike Inc', 'Oracle Corp', 'Occidental Petroleum Corp', 'Priceline Group', 'Pepsico Inc', 'Pfizer Inc', 'Procter & Gamble Company', 'Philip Morris International Inc', 'Paypal Holdings', 'Qualcomm Inc', 'Raytheon Company', 'Starbucks Corp', 'Schlumberger N.V.', 'Southern Company', 'Simon Property Group', 'AT&T Inc', 'Target Corp', 'Time Warner Inc', 'Texas Instruments', 'Unitedhealth Group Inc', 'Union Pacific Corp', 'United Parcel Service', 'U.S. Bancorp', 'United Technologies Corp', 'Visa Inc', 'Verizon Communications Inc', 'Walgreens Boots Alliance', 'Wells Fargo & Company', 'Wal-Mart Stores', 'Exxon Mobil Corp']
prices = [170.12, 93.29, 55.28, 145.3, 171.81, 59.5, 100.5, 168.93, 1126.82, 93.92, 265.04, 26.7, 311.92, 52.73, 474.05, 60.48, 181.27, 71.87, 137.37, 102.88, 346.2, 72.16, 36.13, 88.26, 49.89, 171.22, 36.38, 70.18, 114.84, 93.45, 103.02, 88.61, 71.12, 60.14, 41.32, 12.11, 179.14, 217.75, 30.42, 31.14, 198.7, 17.91, 71.63, 44.74, 1018.48, 1034.09, 238.05, 41.57, 170.13, 148.04, 151.4, 44.88, 138.54, 98.58, 80.59, 17.04, 45.6, 82.97, 312.93, 81.43, 149.93, 167.01, 42.49, 79.52, 51.85, 232.49, 66.51, 118.19, 53.74, 49.06, 82.49, 155.7, 59.46, 48.97, 68.17, 1762.23, 115.5, 35.38, 88.33, 103.35, 76.55, 66.83, 184.22, 56.83, 61.53, 51.12, 159.25, 34.59, 57.77, 88.62, 98.59, 209.75, 115.58, 113.2, 51.88, 117.05, 110.27, 45.85, 70.25, 54.02, 96.08, 80.31]
earnings = [9.2, 5.31, 2.41, 5.91, 15.42, 2.51, 6.79, 12.58, 3.94, 5.22, 9.75, 1.75, 21.59, 3.47, 21.55, 2.96, 6.29, 5.19, 5.55, 6.4, 1.61, 2.87, 2.02, 7.58, 0.02, 5.82, 2.17, 5.71, 3.57, 3.89, 5.7, 4.45, 3.66, 2.58, 2.48, 1.68, 5.19, 11.91, 1.92, 1.92, 10.07, 1.24, 9.58, 6.19, 29.87, 29.87, 19.2, 0.73, 6.96, 6.95, 13.66, 3.18, 7.14, 6.94, 3.56, 0.65, 1.89, 4.09, 12.72, 4.34, 4.31, 6.4, 2.05, 4.69, 5.2, 8.95, 3.16, 5.53, 3.89, 3.61, 3.38, 6.67, 2.35, 2.55, 0.35, 74.45, 5.12, 2.5, 3.98, 4.49, 1.4, 3.78, 7.56, 2.07, 1.29, 2.75, 6.05, 2.93, 4.93, 6.06, 4.06, 9.6, 5.66, 5.98, 3.37, 6.62, 3.48, 3.75, 5.1, 4.14, 4.36, 3.56]
sectors = ['Information Technology', 'Health Care', 'Health Care', 'Information Technology', 'Health Care', 'Financials', 'Financials', 'Health Care', 'Consumer Discretionary', 'Financials', 'Industrials', 'Financials', 'Health Care', 'Financials', 'Financials', 'Health Care', 'Financials', 'Financials', 'Industrials', 'Health Care', 'Consumer Discretionary', 'Consumer Staples', 'Consumer Discretionary', 'Financials', 'Energy', 'Consumer Staples', 'Information Technology', 'Consumer Staples', 'Energy', 'Health Care', 'Consumer Discretionary', 'Utilities', 'Materials', 'Industrials', 'Utilities', 'Consumer Discretionary', 'Information Technology', 'Industrials', 'Consumer Discretionary', 'Consumer Discretionary', 'Industrials', 'Industrials', 'Health Care', 'Consumer Discretionary', 'Information Technology', 'Information Technology', 'Financials', 'Energy', 'Consumer Discretionary', 'Industrials', 'Information Technology', 'Information Technology', 'Health Care', 'Financials', 'Consumer Staples', 'Energy', 'Consumer Staples', 'Health Care', 'Industrials', 'Consumer Discretionary', 'Information Technology', 'Consumer Discretionary', 'Consumer Staples', 'Health Care', 'Financials', 'Industrials', 'Consumer Staples', 'Materials', 'Health Care', 'Financials', 'Information Technology', 'Utilities', 'Consumer Discretionary', 'Information Technology', 'Energy', 'Consumer Discretionary', 'Consumer Staples', 'Health Care', 'Consumer Staples', 'Consumer Staples', 'Information Technology', 'Information Technology', 'Industrials', 'Consumer Discretionary', 'Energy', 'Utilities', 'Real Estate', 'Telecommunications', 'Consumer Discretionary', 'Consumer Discretionary', 'Information Technology', 'Health Care', 'Industrials', 'Industrials', 'Financials', 'Industrials', 'Information Technology', 'Telecommunications', 'Consumer Staples', 'Financials', 'Consumer Staples', 'Energy']
Let's start by inspecting the data with slicing. For example, we can view the names of the first four companies.
print(names[:4])
['Apple Inc', 'Abbvie Inc', 'Abbott Laboratories', 'Accenture Plc']
Or we can print all the information for the last company.
print("Company name:", names[-1])
print("Share price:", prices[-1])
print("EPS:", earnings[-1])
print("Sector:", sectors[-1])
Company name: Exxon Mobil Corp
Share price: 80.31
EPS: 3.56
Sector: Energy
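The four lists are aligned by position, so the same index or slice picks out matching fields for the same company. As a minimal sketch of that idea, the snippet below zips the first four entries of each list into one record per company. The `sample_` variable names and `records` are hypothetical names introduced only for this illustration.

```python
# First four entries of each list, copied from the data above.
# The sample_ names are hypothetical, chosen to avoid overwriting the full lists.
sample_names = ['Apple Inc', 'Abbvie Inc', 'Abbott Laboratories', 'Accenture Plc']
sample_sectors = ['Information Technology', 'Health Care', 'Health Care', 'Information Technology']
sample_prices = [170.12, 93.29, 55.28, 145.3]
sample_earnings = [9.2, 5.31, 2.41, 5.91]

# zip() pairs up the items that share a position, giving one tuple per company.
records = list(zip(sample_names, sample_sectors, sample_prices, sample_earnings))

# Slices and negative indices work on the combined list just as on each original list.
first_two = records[:2]
last_record = records[-1]

print(first_two)
print(last_record)  # ('Accenture Plc', 'Information Technology', 145.3, 5.91)
```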
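Computing the P/E ratio
The price-to-earnings ratio (P/E ratio) is the share price divided by the annual earnings per share (EPS). It is one of the indicators used to judge whether a share price is at a reasonable level.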
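As a quick worked example of the definition, here is the calculation for a single company, using Apple's share price and EPS from the lists above. The variable names are chosen only for this sketch.

```python
# Worked P/E example using Apple's figures from the data in this article.
price = 170.12  # share price
eps = 9.2       # earnings per share (EPS)

# P/E ratio = share price / earnings per share
pe_ratio = price / eps

print(round(pe_ratio, 2))  # 18.49
```

A P/E ratio of about 18.49 means the share price is roughly 18.5 times the company's annual earnings per share.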
To make the P/E ratio easy to compute, we first convert the data from Python lists to NumPy arrays.
The numpy.array() function creates a NumPy array.
# Import the scientific computing package NumPy
import numpy as np
# Convert the lists to NumPy arrays
names = np.array(names)
prices = np.array(prices)
earnings = np.array(earnings)
sectors = np.array(sectors)
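The advantage of NumPy arrays is that arithmetic can be applied to whole arrays at once, element by element, which Python lists do not support. For example, to compute the P/E ratio pe, we simply divide the array prices by the array earnings.
# Compute the P/E ratio
pe = prices / earnings
# Print the first 5 P/E values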
print(pe[:5])
[ 18.49130435 17.56873823 22.93775934 24.58544839 11.14202335]
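To make the contrast with lists concrete, the sketch below tries the same division on plain Python lists, which raises a TypeError, and then shows the list-comprehension equivalent next to the NumPy version. It uses only the first three prices and EPS values from the data. The variable names are hypothetical.

```python
import numpy as np

# First three prices and EPS values from the data above.
prices_list = [170.12, 93.29, 55.28]
earnings_list = [9.2, 5.31, 2.41]

# Dividing one list by another is not defined for Python lists.
try:
    prices_list / earnings_list
    list_division_works = True
except TypeError:
    list_division_works = False

# With lists, the division has to be written out element by element.
pe_from_lists = [p / e for p, e in zip(prices_list, earnings_list)]

# With NumPy arrays, the division is applied element-wise in one expression.
pe_from_arrays = np.array(prices_list) / np.array(earnings_list)

print(list_division_works)                         # False
print(np.allclose(pe_from_lists, pe_from_arrays))  # True
```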
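Next, we analyze individual sectors. For the IT sector, for example, we first need to select the companies that belong to it.
This calls for boolean indexing. For example, to find the numbers greater than 3 in the array numbers, we first use numbers > 3 to get a boolean array that contains only True and False.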
numbers = np.array([1,2,3,4,5])
boolean_array = (numbers > 3)
print(boolean_array)
Output: [False False False  True  True]
We then use this boolean array to select the elements that correspond to True, which gives us the numbers greater than 3.
large_number = numbers[boolean_array]
print(large_number)
Output: [4 5]
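Boolean arrays can also be combined before indexing. The sketch below reuses the numbers array and shows the element-wise operators `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`. The result variable names are hypothetical.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])

# Both conditions must hold: greater than 1 AND less than 5.
middle = numbers[(numbers > 1) & (numbers < 5)]

# Either condition may hold: less than 2 OR greater than 4.
extremes = numbers[(numbers < 2) | (numbers > 4)]

# ~ inverts a boolean array: everything that is NOT greater than 3.
not_large = numbers[~(numbers > 3)]

# True counts as 1, so summing a boolean array counts the matches.
count_large = int(np.sum(numbers > 3))

print(middle)       # [2 3 4]
print(extremes)     # [1 5]
print(not_large)    # [1 2 3]
print(count_large)  # 2
```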
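Applying the same technique to our data, we select the companies in the IT sector.
# Create a boolean array for the IT sector
boolean_array = (sectors == 'Information Technology')
# Select the subset of data for the IT sector
it_names = names[boolean_array]
it_pe = pe[boolean_array]
# Print the company names and P/E ratios of the IT sector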
print(it_names)
print(it_pe)
['Apple Inc' 'Accenture Plc' 'Cisco Systems Inc' 'Facebook Inc'
'Alphabet Class C' 'Alphabet Class A' 'International Business Machines'
'Intel Corp' 'Mastercard Inc' 'Microsoft Corp' 'Oracle Corp'
'Paypal Holdings' 'Qualcomm Inc' 'Texas Instruments' 'Visa Inc']
[ 18.49130435 24.58544839 16.76497696 34.51637765 34.09708738
34.6196853 11.08345534 14.11320755 34.78654292 24.40532544
19.20392157 54.67857143 17.67989418 24.28325123 31.68678161]
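In the same way, we select the companies and P/E ratios of the Consumer Staples sector.
# Create a boolean array for the Consumer Staples (CS) sector
boolean_array = (sectors == 'Consumer Staples')
# Select the subset of data for the CS sector
cs_names = names[boolean_array]
cs_pe = pe[boolean_array]
# Print the company names and P/E ratios of the CS sector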
print(cs_names)
print(cs_pe)
['Colgate-Palmolive Company' 'Costco Wholesale' 'CVS Corp' 'Kraft Heinz Co'
'Coca-Cola Company' 'Mondelez Intl Cmn A' 'Altria Group' 'Pepsico Inc'
'Procter & Gamble Company' 'Philip Morris International Inc'
'Walgreens Boots Alliance' 'Wal-Mart Stores']
[ 25.14285714 29.41924399 12.29071804 22.63764045 24.12698413
20.72682927 21.04746835 22.55859375 22.19346734 23.01781737
13.7745098 22.03669725]
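The same filter can be repeated for every sector without writing each one by hand. The sketch below is one possible way to do this, not part of the original course material: it uses `np.unique` to list the distinct sectors and builds a dictionary of P/E arrays. It runs on the first six companies only, and the `sample_` names and `pe_by_sector` are hypothetical.

```python
import numpy as np

# First six entries of the sector, price, and EPS data above.
sample_sectors = np.array(['Information Technology', 'Health Care', 'Health Care',
                           'Information Technology', 'Health Care', 'Financials'])
sample_prices = np.array([170.12, 93.29, 55.28, 145.3, 171.81, 59.5])
sample_earnings = np.array([9.2, 5.31, 2.41, 5.91, 15.42, 2.51])

sample_pe = sample_prices / sample_earnings

# np.unique returns the distinct sector names in sorted order.
pe_by_sector = {}
for sector in np.unique(sample_sectors):
    # Boolean indexing, exactly as for the IT and CS sectors above.
    pe_by_sector[str(sector)] = sample_pe[sample_sectors == sector]

for sector, values in pe_by_sector.items():
    print(sector, len(values))
```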
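Having selected the data for the IT and Consumer Staples sectors, we now compute the mean and standard deviation of the P/E ratio in each sector.
The numpy.mean(array) function computes the mean of array.
The numpy.std(array) function computes the standard deviation of array.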
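# Compute the mean and standard deviation of the IT sector's P/E ratio
it_pe_mean = np.mean(it_pe)
it_pe_std = np.std(it_pe)
print("Mean P/E ratio of the IT sector:", it_pe_mean)
print("Standard deviation of the IT sector's P/E ratio:", it_pe_std)
Mean P/E ratio of the IT sector: 26.3330554204
Standard deviation of the IT sector's P/E ratio: 10.8661467927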
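# Compute the mean and standard deviation of the Consumer Staples sector's P/E ratio
cs_pe_mean = np.mean(cs_pe)
cs_pe_std = np.std(cs_pe)
print("Mean P/E ratio of the Consumer Staples sector:", cs_pe_mean)
print("Standard deviation of the Consumer Staples sector's P/E ratio:", cs_pe_std)
Mean P/E ratio of the Consumer Staples sector: 21.5810689064
Standard deviation of the Consumer Staples sector's P/E ratio: 4.41202165427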
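One detail worth knowing about the numbers above: by default, `np.std` divides by n (the population standard deviation, `ddof=0`). If the companies are treated as a sample instead, passing `ddof=1` divides by n - 1. The sketch below compares the two on the Consumer Staples P/E values printed earlier. Whether the sample version is more appropriate here is a judgment call that the original notes do not address.

```python
import numpy as np

# Consumer Staples P/E ratios, copied from the output above.
cs_pe = np.array([25.14285714, 29.41924399, 12.29071804, 22.63764045,
                  24.12698413, 20.72682927, 21.04746835, 22.55859375,
                  22.19346734, 23.01781737, 13.7745098, 22.03669725])

population_std = np.std(cs_pe)       # default ddof=0: divide by n
sample_std = np.std(cs_pe, ddof=1)   # ddof=1: divide by n - 1

print(round(float(population_std), 2))  # 4.41
print(sample_std > population_std)      # True
```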
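Plotting
First, we use a scatter plot to look at the P/E ratio of each company in these two sectors. Here we use matplotlib, a commonly used Python plotting package.
The matplotlib.pyplot.scatter() function draws a scatter plot.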
# Import the matplotlib.pyplot module
import matplotlib.pyplot as plt
# Assign an id to each company
it_id = np.arange(len(it_pe))
cs_id = np.arange(len(cs_pe))
# Draw the scatter plot of P/E ratios
plt.scatter(it_id, it_pe, color='red', label='IT')
plt.scatter(cs_id, cs_pe, color='green', label='CS')
# Add a legend
plt.legend()
# Add axis labels
plt.xlabel('Company ID')
plt.ylabel('P/E Ratio')
# Show the figure
plt.show()
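To relate the scatter plot to the means computed earlier, one option is to draw each sector's mean as a horizontal dashed line. The sketch below is an illustrative variation, not the course's code. It uses matplotlib's object-oriented interface (`plt.subplots`, `ax.scatter`, `ax.axhline`) and selects the non-interactive `Agg` backend, which is an assumption made so the example runs without a display. The P/E values are copied from the outputs above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

# P/E ratios copied from the outputs above.
it_pe = np.array([18.49130435, 24.58544839, 16.76497696, 34.51637765, 34.09708738,
                  34.6196853, 11.08345534, 14.11320755, 34.78654292, 24.40532544,
                  19.20392157, 54.67857143, 17.67989418, 24.28325123, 31.68678161])
cs_pe = np.array([25.14285714, 29.41924399, 12.29071804, 22.63764045,
                  24.12698413, 20.72682927, 21.04746835, 22.55859375,
                  22.19346734, 23.01781737, 13.7745098, 22.03669725])

fig, ax = plt.subplots()

# Same scatter plot as before, drawn on an explicit Axes object.
ax.scatter(np.arange(len(it_pe)), it_pe, color='red', label='IT')
ax.scatter(np.arange(len(cs_pe)), cs_pe, color='green', label='CS')

# Dashed horizontal lines at each sector's mean P/E ratio.
ax.axhline(np.mean(it_pe), color='red', linestyle='--')
ax.axhline(np.mean(cs_pe), color='green', linestyle='--')

ax.legend()
ax.set_xlabel('Company ID')
ax.set_ylabel('P/E Ratio')
```

In an interactive session, the backend line can be dropped and plt.show() called as in the code above. `fig.savefig()` writes the figure to a file instead.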
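We notice that one IT company in the upper right of the figure above has a particularly high P/E ratio. A stock whose P/E ratio is higher than that of comparable stocks often carries higher growth expectations. So let's look more closely at the distribution of P/E ratios in the IT sector. A histogram is a convenient way to see how the data is distributed.
The matplotlib.pyplot.hist() function draws a histogram.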
# Draw a histogram of the IT sector's P/E ratios, splitting the values into 8 bins
plt.hist(it_pe, bins=8)
# Add axis labels
plt.xlabel('P/E ratio')
plt.ylabel('Frequency')
# Show the figure
plt.show()
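The bin counts behind the histogram can also be inspected as numbers. As a sketch, `np.histogram` with `bins=8` splits the range from the smallest to the largest value into 8 equal-width bins, which is the same default binning that `plt.hist` uses. The IT P/E values are copied from the output above.

```python
import numpy as np

# IT sector P/E ratios, copied from the output above.
it_pe = np.array([18.49130435, 24.58544839, 16.76497696, 34.51637765, 34.09708738,
                  34.6196853, 11.08345534, 14.11320755, 34.78654292, 24.40532544,
                  19.20392157, 54.67857143, 17.67989418, 24.28325123, 31.68678161])

# counts[i] is the number of values in the bin from bin_edges[i] to bin_edges[i + 1].
counts, bin_edges = np.histogram(it_pe, bins=8)

print(counts)          # [2 4 3 1 4 0 0 1]
print(len(bin_edges))  # 9
```

The two empty bins followed by a single value in the last bin are what show up as the gap and the isolated bar on the right of the histogram.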
The histogram makes it easier to see that there is an outlier on the right with a very high P/E ratio. We can use boolean indexing to find this company.
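# Select the P/E values greater than 50
outlier_pe = it_pe[it_pe > 50]
# Select the companies whose P/E ratio is greater than 50
outlier_name = it_names[it_pe > 50]
# Print the result; round() rounds the value to two decimal places
print("The P/E ratio of " + str(outlier_name[0]) + " is " + str(round(outlier_pe[0], 2)))
The P/E ratio of Paypal Holdings is 54.68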
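The threshold of 50 was read off the plot. Two alternative sketches that avoid a hand-picked value are shown below: `np.argmax` returns the position of the largest P/E ratio, and a z-score expresses how many standard deviations each value lies from the mean. The cut-off of 2 standard deviations is an assumption chosen for illustration, not a rule from the course. The names and values are copied from the outputs above.

```python
import numpy as np

# IT sector company names and P/E ratios, copied from the outputs above.
it_names = np.array(['Apple Inc', 'Accenture Plc', 'Cisco Systems Inc', 'Facebook Inc',
                     'Alphabet Class C', 'Alphabet Class A', 'International Business Machines',
                     'Intel Corp', 'Mastercard Inc', 'Microsoft Corp', 'Oracle Corp',
                     'Paypal Holdings', 'Qualcomm Inc', 'Texas Instruments', 'Visa Inc'])
it_pe = np.array([18.49130435, 24.58544839, 16.76497696, 34.51637765, 34.09708738,
                  34.6196853, 11.08345534, 14.11320755, 34.78654292, 24.40532544,
                  19.20392157, 54.67857143, 17.67989418, 24.28325123, 31.68678161])

# Position of the largest P/E ratio.
top = np.argmax(it_pe)
print(it_names[top])  # Paypal Holdings

# z-score: distance from the mean in units of standard deviation.
z_scores = (it_pe - np.mean(it_pe)) / np.std(it_pe)

# Assumed cut-off: flag values more than 2 standard deviations above the mean.
flagged = it_names[z_scores > 2]
print(flagged)  # ['Paypal Holdings']
```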
Note: this article is a set of study notes for the DataCamp course Intro to Python for Finance.