(8) pandas Basics, Part 3: Python Data Analysis and Machine Learning in Practice (Study Notes)

Original article. Last updated: 2018-05-03

Introduction: an overview of Series

So that you can follow along with this Series example, the file fandango_score_comparison.csv is shared via Baidu Netdisk. Link: https://pan.baidu.com/s/1U6z7OvXK75L1AGm1vYlN4w Password: qe1a

Course source: Python Data Analysis and Machine Learning in Practice, by 唐宇迪

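A DataFrame is comparable to a matrix, and a Series is comparable to a single column (or row) of that matrix. A Series consists of a set of values together with an associated index.
Here is a small example: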

import pandas as pd
a=pd.Series([9,8,7,6])
a
Out[19]: 
0    9
1    8
2    7
3    6
dtype: int64

The dataset below contains movie ratings and related data. Let's see what is special about the Series structure.

import pandas as pd

fandango=pd.read_csv('fandango_score_comparison.csv')

series_film = fandango['FILM']

type(series_film)
Out[85]: pandas.core.series.Series

As shown above, fandango is a DataFrame. Selecting a single column from it, fandango['FILM'], yields a Series.

How does selecting data in a Series differ from selecting data in a DataFrame?

The usage is essentially the same: by index and by slicing.

series_film = fandango['FILM']
series_film[0:5]
Out[84]: 
0    Avengers: Age of Ultron (2015)
1                 Cinderella (2015)
2                    Ant-Man (2015)
3            Do You Believe? (2015)
4     Hot Tub Time Machine 2 (2015)
Name: FILM, dtype: object

series_rt = fandango['RottenTomatoes']
series_rt[0:5]
Out[87]: 
0    74
1    85
2    80
3    18
4    14
Name: RottenTomatoes, dtype: int64

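How do we create a new Series?

First, look at the structure of series.values: the result is an ndarray that holds the values of the Series. In other words, a DataFrame is made up of Series, and a Series is backed by an ndarray. pandas is built on top of NumPy.

Many pandas operations are convenient combinations of NumPy operations, and the two libraries interoperate in many ways.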

film_names=series_film.values

type(film_names)
Out[89]: numpy.ndarray

Next we create a Series. To do so, import Series from pandas.

from pandas import Series

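The string representation of a Series shows the index on the left and the values on the right. When no index is specified for the data, an integer index from 0 to N-1 (where N is the length of the data) is created automatically. You can get the array representation and the index object of a Series through its values and index attributes.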
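As a small illustration of the values and index attributes, the sketch below builds a Series without an explicit index. The variable name s and the sample numbers are chosen only for this example.

```python
import pandas as pd

# A Series created without an explicit index gets a default 0..N-1 index.
s = pd.Series([9, 8, 7, 6])

print(s.values)       # the underlying data as a NumPy array
print(s.index)        # the index object (a RangeIndex here)
print(list(s.index))  # the index labels as a plain list
```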

Unlike a plain NumPy array, a Series lets you use index labels to select a single value or a set of values.
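A minimal sketch of label-based selection, assuming a small Series built by hand from the first three films and RottenTomatoes scores shown earlier. The names scores, single and subset are illustrative.

```python
import pandas as pd

# Scores taken from the first rows of the RottenTomatoes column shown above.
scores = pd.Series(
    [74, 85, 80],
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
)

single = scores["Cinderella (2015)"]                      # one label -> a scalar
subset = scores[["Ant-Man (2015)", "Cinderella (2015)"]]  # list of labels -> a Series

print(single)
print(subset)
```

Passing a list of labels returns the rows in the order the labels were requested, not in their original order.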

Example: create a Series in which each movie name maps to the score given by one of the review sites.

from pandas import Series
rt_scores = series_rt.values
series_custom = Series(rt_scores , index=film_names)

series_custom[['Minions (2015)', 'Leviathan (2014)']]
Out[96]: 
Minions (2015)      54
Leviathan (2014)    99
dtype: int64

How do we sort a Series?

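reindex does not so much change the index labels of a pandas object as rearrange the object to follow a new index order. If a label in the new index does not exist in the original, that entry is filled with NaN by default. reindex does not modify the original object; it returns a new one, so assign the result if you want to keep it.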
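The behaviour described above can be sketched on a toy Series. The labels "a", "b", "c" and "z" are made up for illustration; the values are the first three RottenTomatoes scores shown earlier.

```python
import pandas as pd

s = pd.Series([74, 85, 80], index=["b", "c", "a"])  # toy labels for illustration

# Reorder by an explicit label list; "z" is not in the original index.
reordered = s.reindex(["a", "b", "c", "z"])
print(reordered)  # "z" is filled with NaN, so the dtype becomes float64

# The original Series is unchanged.
print(s.index.tolist())
```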

First, extract the movie names, i.e. convert the index to a list.

original_index = series_custom.index.tolist()

original_index
Out[110]: 
['Avengers: Age of Ultron (2015)',
 'Cinderella (2015)',
 'Ant-Man (2015)',
 'Do You Believe? (2015)',
 'Hot Tub Time Machine 2 (2015)',
 ....
 'Mr. Holmes (2015)',
 "'71 (2015)",
 'Two Days, One Night (2014)',
 'Gett: The Trial of Viviane Amsalem (2015)',
 'Kumiko, The Treasure Hunter (2015)']

Sort the movie names. The sorted result is:

sorted_index = sorted(original_index)

sorted_index
Out[112]: 
["'71 (2015)",
 '5 Flights Up (2015)',
 'A Little Chaos (2015)',
 'A Most Violent Year (2014)',
 'About Elly (2015)',
....
 'What We Do in the Shadows (2015)',
 'When Marnie Was There (2015)',
 "While We're Young (2015)",
 'Wild Tales (2014)',
 'Woman in Gold (2015)']

Use reindex to reorder series_custom according to the sorted movie names:

sorted_by_index = series_custom.reindex(sorted_index)

sorted_by_index
Out[114]: 
'71 (2015)                                         97
5 Flights Up (2015)                                52
A Little Chaos (2015)                              40
A Most Violent Year (2014)                         90
About Elly (2015)                                  97
....
When Marnie Was There (2015)                       89
While We're Young (2015)                           83
Wild Tales (2014)                                  96
Woman in Gold (2015)                               52
Length: 146, dtype: int64

How do we sort a Series by its index or by its values?

Use sort_index() to sort by index, giving sc2

sc2 = series_custom.sort_index()

sc2
Out[116]: 
'71 (2015)                                         97
5 Flights Up (2015)                                52
A Little Chaos (2015)                              40
A Most Violent Year (2014)                         90
About Elly (2015)                                  97
....
What We Do in the Shadows (2015)                   96
When Marnie Was There (2015)                       89
While We're Young (2015)                           83
Wild Tales (2014)                                  96
Woman in Gold (2015)                               52
Length: 146, dtype: int64

Use sort_values() to sort by value, giving sc3

sc3 = series_custom.sort_values()

sc3
Out[118]: 
Paul Blart: Mall Cop 2 (2015)                    5
Hitman: Agent 47 (2015)                          7
Hot Pursuit (2015)                               8
Fantastic Four (2015)                            9
Taken 3 (2015)                                   9
....
Song of the Sea (2014)                          99
Phoenix (2015)                                  99
Selma (2014)                                    99
Seymour: An Introduction (2015)                100
Gett: The Trial of Viviane Amsalem (2015)      100
Length: 146, dtype: int64
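Both sort methods return a new Series and accept an ascending parameter. The sketch below uses a hand-built five-film Series (values copied from the output above) rather than the full CSV; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(
    [74, 85, 80, 18, 14],
    index=[
        "Avengers: Age of Ultron (2015)",
        "Cinderella (2015)",
        "Ant-Man (2015)",
        "Do You Believe? (2015)",
        "Hot Tub Time Machine 2 (2015)",
    ],
)

by_label = s.sort_index()                       # alphabetical by label
by_score_desc = s.sort_values(ascending=False)  # highest score first

print(by_label.index.tolist())
print(by_score_desc.head(3))
```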

How do we add two Series?

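Adding two Series produces a new Series. pandas aligns the two operands by index label: values with matching labels are added together, and a label that appears in only one of the two Series yields NaN in the result.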
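A minimal sketch of this alignment, using two made-up Series a and b whose indexes only partly overlap. Series.add with fill_value is shown as one way to avoid NaN for unmatched labels.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

total = a + b                    # aligned by label; unmatched labels become NaN
filled = a.add(b, fill_value=0)  # treat a value missing on one side as 0 instead

print(total)
print(filled)
```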


Use the np.add function to add series_custom to itself.

series_custom
Out[123]: 
Avengers: Age of Ultron (2015)                     74
Cinderella (2015)                                  85
Ant-Man (2015)                                     80
Do You Believe? (2015)                             18
Hot Tub Time Machine 2 (2015)                      14
....
Mr. Holmes (2015)                                  87
'71 (2015)                                         97
Two Days, One Night (2014)                         97
Gett: The Trial of Viviane Amsalem (2015)         100
Kumiko, The Treasure Hunter (2015)                 87
Length: 146, dtype: int64

np.add(a, b) is equivalent to a + b. The result is:

import numpy as np
np.add(series_custom, series_custom)  # equivalent to series_custom + series_custom
Out[124]: 
Avengers: Age of Ultron (2015)                    148
Cinderella (2015)                                 170
Ant-Man (2015)                                    160
Do You Believe? (2015)                             36
Hot Tub Time Machine 2 (2015)                      28
....
Mr. Holmes (2015)                                 174
'71 (2015)                                        194
Two Days, One Night (2014)                        194
Gett: The Trial of Viviane Amsalem (2015)         200
Kumiko, The Treasure Hunter (2015)                174
Length: 146, dtype: int64

Use np.sin() to compute the sine of each value in the Series

np.sin(series_custom)
Out[126]: 
Avengers: Age of Ultron (2015)                   -0.985146
Cinderella (2015)                                -0.176076
Ant-Man (2015)                                   -0.993889
Do You Believe? (2015)                           -0.750987
Hot Tub Time Machine 2 (2015)                     0.990607
....
Mr. Holmes (2015)                                -0.821818
'71 (2015)                                        0.379608
Two Days, One Night (2014)                        0.379608
Gett: The Trial of Viviane Amsalem (2015)        -0.506366
Kumiko, The Treasure Hunter (2015)               -0.821818
Length: 146, dtype: float64

Compute the maximum of series_custom with np.max()

np.max(series_custom)
Out[127]: 100

Test which values in series_custom are greater than 50

series_custom > 50
Out[128]: 
Avengers: Age of Ultron (2015)                     True
Cinderella (2015)                                  True
Ant-Man (2015)                                     True
Do You Believe? (2015)                            False
Hot Tub Time Machine 2 (2015)                     False
....
Mr. Holmes (2015)                                  True
'71 (2015)                                         True
Two Days, One Night (2014)                         True
Gett: The Trial of Viviane Amsalem (2015)          True
Kumiko, The Treasure Hunter (2015)                 True
Length: 146, dtype: bool

Select the values in series_custom that are greater than 50

series_greater_than_50 = series_custom[series_custom > 50]
series_greater_than_50
Out[130]: 
Avengers: Age of Ultron (2015)                                             74
Cinderella (2015)                                                          85
Ant-Man (2015)                                                             80
The Water Diviner (2015)                                                   63
Top Five (2014)                                                            86
....
Mr. Holmes (2015)                                                          87
'71 (2015)                                                                 97
Two Days, One Night (2014)                                                 97
Gett: The Trial of Viviane Amsalem (2015)                                 100
Kumiko, The Treasure Hunter (2015)                                         87
Length: 94, dtype: int64

Select the values in series_custom that are greater than 50 and less than 75

criteria_one = series_custom > 50

criteria_two = series_custom < 75

both_criteria = series_custom[criteria_one & criteria_two]

both_criteria
Out[134]: 
Avengers: Age of Ultron (2015)                                            74
The Water Diviner (2015)                                                  63
Unbroken (2014)                                                           51
Southpaw (2015)                                                           59
Insidious: Chapter 3 (2015)                                               59
The Man From U.N.C.L.E. (2015)                                            68
....
Woman in Gold (2015)                                                      52
The Last Five Years (2015)                                                60
Jurassic World (2015)                                                     71
Minions (2015)                                                            54
Spare Parts (2015)                                                        52
dtype: int64
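Boolean masks are combined element-wise with &, | and ~. Python's and / or raise an error on a Series, and each comparison needs its own parentheses when written inline. The sketch below assumes a hand-built Series with the first five scores shown earlier; the short labels are made up.

```python
import pandas as pd

s = pd.Series([74, 85, 80, 18, 14], index=["a", "b", "c", "d", "e"])

mid_range = s[(s > 50) & (s < 75)]  # both conditions hold
extremes = s[(s < 20) | (s > 80)]   # either condition holds
not_high = s[~(s > 50)]             # negation of a mask

print(mid_range)
print(extremes)
print(not_high)
```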

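What happens when two Series share the same index? How is the computation done?

When the indexes are the same, the values at corresponding labels are combined, producing a new Series.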

rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])

rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])

rt_mean = (rt_critics + rt_users)/2

rt_mean
Out[138]: 
FILM
Avengers: Age of Ultron (2015)                    80.0
Cinderella (2015)                                 82.5
Ant-Man (2015)                                    85.0
Do You Believe? (2015)                            51.0
Hot Tub Time Machine 2 (2015)                     21.0
....
Inside Out (2015)                                 94.0
Mr. Holmes (2015)                                 82.5
'71 (2015)                                        89.5
Two Days, One Night (2014)                        87.5
Gett: The Trial of Viviane Amsalem (2015)         90.5
Kumiko, The Treasure Hunter (2015)                75.0
Length: 146, dtype: float64

How do we set an index?

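More on set_index:
A DataFrame's set_index method can set either a single index or a compound (multi-level) index.
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
keys names the column(s) to use as the index. With drop=True the chosen column is removed from the data, while drop=False keeps it as a column as well. append=True adds the new index to the existing one instead of replacing it. inplace=True modifies the DataFrame itself instead of returning a new one.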
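The effect of drop and append can be sketched on a tiny stand-in for the fandango DataFrame. The three rows are copied from the output shown earlier; the names df, dropped, kept and both are illustrative.

```python
import pandas as pd

# A tiny stand-in for the fandango DataFrame, using values shown earlier.
df = pd.DataFrame({
    "FILM": ["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
    "RottenTomatoes": [74, 85, 80],
})

dropped = df.set_index("FILM")            # default drop=True: FILM leaves the columns
kept = df.set_index("FILM", drop=False)   # FILM is both the index and a column
both = df.set_index("FILM", append=True)  # compound index: (old integer index, FILM)

print(dropped.columns.tolist())
print(kept.columns.tolist())
print(both.index.nlevels)
```

Because inplace is left at its default of False, df itself keeps its original integer index in all three calls.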

The index of fandango is a RangeIndex running from 0 to 145 (146 rows).

fandango=pd.read_csv('fandango_score_comparison.csv')
fandango.index
Out[149]: RangeIndex(start=0, stop=146, step=1)

Using set_index, replace the default integer index with the values of the 'FILM' column. The result:

fandango_films = fandango.set_index('FILM', drop=False)
fandango_films.index
Out[140]: 
Index(['Avengers: Age of Ultron (2015)', 'Cinderella (2015)', 'Ant-Man (2015)',
       'Do You Believe? (2015)', 'Hot Tub Time Machine 2 (2015)',
       'The Water Diviner (2015)', 'Irrational Man (2015)', 'Top Five (2014)',
       'Shaun the Sheep Movie (2015)', 'Love & Mercy (2015)',
       ...
       'The Woman In Black 2 Angel of Death (2015)', 'Danny Collins (2015)',
       'Spare Parts (2015)', 'Serena (2015)', 'Inside Out (2015)',
       'Mr. Holmes (2015)', ''71 (2015)', 'Two Days, One Night (2014)',
       'Gett: The Trial of Viviane Amsalem (2015)',
       'Kumiko, The Treasure Hunter (2015)'],
      dtype='object', name='FILM', length=146)

Slicing with the custom index

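Just as an integer index can be sliced, a string index can be sliced by putting a colon between two labels, for example 'a':'c'. The slice returns all the data for every row from the start label through the end label, in the order the labels appear in the index. Unlike an integer position slice, it includes the end label. Otherwise the usage is similar to slicing with numbers.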
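A minimal sketch of the difference between position slicing and label slicing, assuming a toy Series with made-up labels "a" to "e":

```python
import pandas as pd

s = pd.Series([74, 85, 80, 18, 14], index=["a", "b", "c", "d", "e"])

by_position = s.iloc[0:2]  # positions 0 and 1; the end position is excluded
by_label = s.loc["a":"c"]  # labels "a" through "c"; the end label is included

print(len(by_position))
print(len(by_label))
```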

Example: slice the rows from "Avengers: Age of Ultron (2015)" to "Hot Tub Time Machine 2 (2015)".

fandango_films["Avengers: Age of Ultron (2015)":"Hot Tub Time Machine 2 (2015)"] is equivalent to fandango_films.loc["Avengers: Age of Ultron (2015)":"Hot Tub Time Machine 2 (2015)"].

fandango_films["Avengers: Age of Ultron (2015)":"Hot Tub Time Machine 2 (2015)"]
Out[147]: 
                                                          FILM  \
FILM                                                             
Avengers: Age of Ultron (2015)  Avengers: Age of Ultron (2015)   
Cinderella (2015)                            Cinderella (2015)   
Ant-Man (2015)                                  Ant-Man (2015)   
Do You Believe? (2015)                  Do You Believe? (2015)   
Hot Tub Time Machine 2 (2015)    Hot Tub Time Machine 2 (2015)   


                                RT_user_norm         ...           IMDB_norm  \
FILM                                                 ...                       
Avengers: Age of Ultron (2015)           4.3         ...                3.90   
Cinderella (2015)                        4.0         ...                3.55   
Ant-Man (2015)                           4.5         ...                3.90   
Do You Believe? (2015)                   4.2         ...                2.70   
Hot Tub Time Machine 2 (2015)            1.4         ...                2.55   

                                RT_norm_round  RT_user_norm_round  \

                                Fandango_Difference  
FILM                                                 
Avengers: Age of Ultron (2015)                  0.5  
Cinderella (2015)                               0.5  
Ant-Man (2015)                                  0.5  
Do You Believe? (2015)                          0.5  
Hot Tub Time Machine 2 (2015)                   0.5  

[5 rows x 22 columns]

A few similar exercises:

# Look up the row for a single index label
fandango_films.loc['Kumiko, The Treasure Hunter (2015)']
# Look up the rows for three index labels
movies = ['Kumiko, The Treasure Hunter (2015)', 'Do You Believe? (2015)', 'Ant-Man (2015)']
fandango_films.loc[movies]

How do we work with data types?

Use the dtypes attribute to look up the data type of each column of the DataFrame. The result:

import numpy as np

types = fandango_films.dtypes

types
Out[158]: 
FILM                           object
RottenTomatoes                  int64
RottenTomatoes_User             int64
Metacritic                      int64
Metacritic_User               float64
....
IMDB_norm_round               float64
Metacritic_user_vote_count      int64
IMDB_user_vote_count            int64
Fandango_votes                  int64
Fandango_Difference           float64
dtype: object

Get the labels (column names) whose data type is float64

float_columns = types[types.values == 'float64'].index

float_columns
Out[160]: 
Index(['Metacritic_User', 'IMDB', 'Fandango_Stars', 'Fandango_Ratingvalue',
       'RT_norm', 'RT_user_norm', 'Metacritic_norm', 'Metacritic_user_nom',
       'IMDB_norm', 'RT_norm_round', 'RT_user_norm_round',
       'Metacritic_norm_round', 'Metacritic_user_norm_round',
       'IMDB_norm_round', 'Fandango_Difference'],
      dtype='object')

Use the float64 column labels to select those columns, with all of their rows

float_df = fandango_films[float_columns]

float_df
Out[162]: 
                                                Metacritic_User  IMDB  \
FILM                                                                    
Avengers: Age of Ultron (2015)                              7.1   7.8   
Cinderella (2015)                                           7.5   7.1   
Ant-Man (2015)                                              8.1   7.8   
Do You Believe? (2015)                                      4.7   5.4   
Hot Tub Time Machine 2 (2015)                               3.4   5.1   
The Water Diviner (2015)                                    6.8   7.2   
Irrational Man (2015)                                       7.6   6.9   
Top Five (2014)                                             6.8   6.5   
Shaun the Sheep Movie (2015)                                8.8   7.4   
Love & Mercy (2015)                                         8.5   7.8   
Far From The Madding Crowd (2015)                           7.5   7.2   
Black Sea (2015)                                            6.6   6.4   
Leviathan (2014)                                            7.2   7.7   
Unbroken (2014)                                             6.5   7.2   
The Imitation Game (2014)                                   8.2   8.1   
Taken 3 (2015)                                              4.6   6.1   
Ted 2 (2015)                                                6.5   6.6   
Southpaw (2015)                                             8.2   7.8   
Night at the Museum: Secret of the Tomb (2014)              5.8   6.3   
Pixels (2015)                                               5.3   5.6   
McFarland, USA (2015)                                       7.2   7.5   
Insidious: Chapter 3 (2015)                                 6.9   6.3   
The Man From U.N.C.L.E. (2015)                              7.9   7.6   
Run All Night (2015)                                        7.3   6.6   
Trainwreck (2015)                                           6.0   6.7   
Selma (2014)                                                7.1   7.5   
Ex Machina (2015)                                           7.9   7.7   
Still Alice (2015)                                          7.8   7.5   
Wild Tales (2014)                                           8.8   8.2   
The End of the Tour (2015)                                  7.5   7.9   
                                                            ...  
Clouds of Sils Maria (2015)                                     0.1  
Testament of Youth (2015)                                       0.1  
Infinitely Polar Bear (2015)                                    0.1  
Phoenix (2015)                                                  0.1  
The Wolfpack (2015)                                             0.1  
The Stanford Prison Experiment (2015)                           0.1  
Tangerine (2015)                                                0.1  
Magic Mike XXL (2015)                                           0.1  
Home (2015)                                                     0.1  
The Wedding Ringer (2015)                                       0.1  
Woman in Gold (2015)                                            0.1  
The Last Five Years (2015)                                      0.1  
Mission: Impossible – Rogue Nation (2015)                       0.1  
Amy (2015)                                                      0.1  
Jurassic World (2015)                                           0.0  
Minions (2015)                                                  0.0  
Max (2015)                                                      0.0  
Paul Blart: Mall Cop 2 (2015)                                   0.0  
The Longest Ride (2015)                                         0.0  
The Lazarus Effect (2015)                                       0.0  
The Woman In Black 2 Angel of Death (2015)                      0.0  
Danny Collins (2015)                                            0.0  
Spare Parts (2015)                                              0.0  
Serena (2015)                                                   0.0  
Inside Out (2015)                                               0.0  
Mr. Holmes (2015)                                               0.0  
'71 (2015)                                                      0.0  
Two Days, One Night (2014)                                      0.0  
Gett: The Trial of Viviane Amsalem (2015)                       0.0  
Kumiko, The Treasure Hunter (2015)                              0.0  

[146 rows x 15 columns]
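The float-column selection above can also be sketched with DataFrame.select_dtypes, which filters columns by dtype directly. The small DataFrame below is a stand-in built from two rows of the output above; it is not the full dataset.

```python
import pandas as pd

# Stand-in data: two films with one text, one integer and two float columns.
df = pd.DataFrame({
    "FILM": ["Cinderella (2015)", "Ant-Man (2015)"],
    "RottenTomatoes": [85, 80],
    "Metacritic_User": [7.5, 8.1],
    "IMDB": [7.1, 7.8],
})

# Approach used in this article: filter the dtypes Series, then take its index.
float_cols = df.dtypes[df.dtypes == "float64"].index

# Built-in alternative: select the float64 columns in one call.
float_df = df.select_dtypes(include="float64")

print(list(float_cols))
print(float_df.columns.tolist())
```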

Use apply with np.std() to compute the standard deviation of each column

deviations = float_df.apply(lambda x: np.std(x))

deviations
Out[165]: 
Metacritic_User               1.505529
IMDB                          0.955447
Fandango_Stars                0.538532
Fandango_Ratingvalue          0.501106
RT_norm                       1.503265
RT_user_norm                  0.997787
Metacritic_norm               0.972522
Metacritic_user_nom           0.752765
IMDB_norm                     0.477723
RT_norm_round                 1.509404
RT_user_norm_round            1.003559
Metacritic_norm_round         0.987561
Metacritic_user_norm_round    0.785412
IMDB_norm_round               0.501043
Fandango_Difference           0.152141
dtype: float64

A similar exercise:

rt_mt_user = float_df[['RT_user_norm', 'Metacritic_user_nom']]
rt_mt_user.apply(lambda x: np.std(x), axis=1)
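The axis argument decides whether apply passes each column (axis=0, the default) or each row (axis=1) to the function. The sketch below uses three rows of the Metacritic_User and IMDB columns shown earlier as stand-in data, and converts to a NumPy array first so that the population standard deviation (ddof=0) is used explicitly. DataFrame.std() defaults to the sample standard deviation (ddof=1) instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Metacritic_User": [7.1, 7.5, 8.1], "IMDB": [7.8, 7.1, 7.8]},
    index=["Avengers: Age of Ultron (2015)", "Cinderella (2015)", "Ant-Man (2015)"],
)

# axis=0 (default): the function receives one column at a time.
per_column = df.apply(lambda col: np.std(col.to_numpy()))

# axis=1: the function receives one row at a time.
per_row = df.apply(lambda row: np.std(row.to_numpy()), axis=1)

print(per_column)
print(per_row)
```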