Common pandas methods for data analysis

Reading and writing

Reader                        Writer
read_csv                      to_csv
read_excel                    to_excel
read_hdf                      to_hdf
read_sql                      to_sql
read_json                     to_json
read_msgpack (experimental)   to_msgpack (experimental)
read_html                     to_html
read_gbq (experimental)       to_gbq (experimental)
read_stata                    to_stata
read_sas                      (no writer)
read_clipboard                to_clipboard
read_pickle                   to_pickle    # faster than CSV

Note: read_msgpack and to_msgpack were removed in pandas 1.0.

Reading a CSV file:

pd.read_csv('foo.csv')  # read a CSV file

# The header parameter gives the row number to use as the column names (keys); rows before it are not processed. header=None means no row of the CSV is used as column names, and header=0 means row 0 of the CSV is used. | If names is not passed, header behaves as 0; if names is passed, header behaves as None.
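The interaction between header and names is easier to see on a tiny in-memory file. The sketch below uses made-up CSV text and io.StringIO in place of a real file; the column names are only for illustration.

```python
import io
import pandas as pd

csv_text = "id,score\n1,90\n2,85\n"   # made-up sample data

# Default: row 0 becomes the column names.
df_default = pd.read_csv(io.StringIO(csv_text))

# header=None: no row is treated as names; columns are numbered 0, 1, ...
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names plus header=0: the file's own header row is replaced by custom names.
df_renamed = pd.read_csv(io.StringIO(csv_text), names=["key", "value"], header=0)

print(df_default.columns.tolist())   # ['id', 'score']
print(df_no_header.shape)            # (3, 2)
print(df_renamed.columns.tolist())   # ['key', 'value']
```

With header=None the original header line is kept as an ordinary data row, which is why that frame has three rows.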

Saving to a CSV file:

submission = pd.DataFrame({'PassengerId': test_df['PassengerId'], 'Survived': predictions})

submission.to_csv("submission.csv", index=False)

# The index parameter controls whether the row labels (index) are written to the file.

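Selecting data

Official indexing and selection tutorial

Official MultiIndex selection tutorial

[]:

df['A']  # select a column by its name (key)

df[['A', 'B']]  # select several columns with a list

df[0:3]  # select rows by implicit position (a slice over row positions)

df['20130102':'20130104']  # select rows by index label (key)

dataset[(dataset['Sex'] == i) & (dataset['Pclass'] == j+1)]['Age']  # Boolean selection: filter rows using conditions on other columns. When combining conditions, wrap each comparison in parentheses. An expression like the following is error-prone, because & binds more tightly than !=: dataset[dataset['TARGET'].notnull() & dataset['need_pre']!=1 ]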
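The precedence problem described above can be reproduced on a small frame. This is a minimal sketch with made-up values; only the column names TARGET and need_pre come from the text.

```python
import pandas as pd

df = pd.DataFrame({"TARGET": [1.0, None, 0.0, 1.0],
                   "need_pre": [1, 0, 0, 1]})

# Intended: TARGET is present AND need_pre is not 1.
good = df[(df["TARGET"].notnull()) & (df["need_pre"] != 1)]

# Without parentheses Python parses the mask as
# (df["TARGET"].notnull() & df["need_pre"]) != 1, which is a different condition.
bad = df[df["TARGET"].notnull() & df["need_pre"] != 1]

print(good.index.tolist())   # [2]
print(bad.index.tolist())
```

The unparenthesized version lets through a row whose TARGET is missing, which is exactly what the filter was meant to exclude.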

loc:

dataset.loc[dataset.Age.isnull(), 'BB']  # select column BB for the rows where Age is null

train_df.loc[:, ['Age*Class', 'Age', 'Pclass']].head(10)

dataset.loc[EID, 'Age']  # select the Age cell by index label (the value of the index, not the row position)

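iloc:

iloc selects by row and column position in the DataFrame (like an array, with integer positions starting at 0).

df.iloc[3:5, 0:2]

df.iloc[1:3, :]

df.iat[1, 1]  # fast access to a single value by position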
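To make the label-versus-position distinction concrete, here is a small sketch. The index values 101 to 103 are hypothetical stand-ins for an EID-style index.

```python
import pandas as pd

df = pd.DataFrame({"Age": [22, 38, 26], "Pclass": [3, 1, 3]},
                  index=[101, 102, 103])   # hypothetical EID-style labels

by_label = df.loc[102, "Age"]       # label 102 -> 38
by_position = df.iloc[1, 0]         # second row, first column -> 38
scalar = df.iat[2, 1]               # third row, second column -> 3

# Label slices include both endpoints; position slices exclude the stop.
label_slice = df.loc[101:102]       # 2 rows
position_slice = df.iloc[0:1]       # 1 row
```

Because the index here does not start at 0, df.loc[1, "Age"] would raise a KeyError while df.iloc[1, 0] works.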

Iterating over rows:

for i, row in colTypes.iterrows():

    # i is the DataFrame index label and row is that row's data as a Series
    print(i, row)

Using another Series as the filter condition for a DataFrame:

import numpy as np

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [4, 5, 6, 7]})

a = pd.Series([1, 2, 3, 1])

# Filter the Series

(a == 1).sum()

>>> 2

# Filter the DataFrame

df[a == 1].sum(0)

>>>

A     5

B    11

dtype: int64

Computing on data

Counting repeated values:

Series.value_counts()  # counts how often each distinct value occurs; returns a Series with the values as the index (keys) and the counts as the values

X[c].value_counts().index[0]  # the most frequent value

Median:

Series.median()  # compute the median of the values

Mean and standard deviation:

age_mean = guess_df.mean()

# compute the mean

age_std = guess_df.std()

# compute the standard deviation

Mode:

freq_port = train_df.Embarked.dropna().mode()[0]

# mode returns the most frequent value; there may be ties, so it returns a Series and [0] takes the first entry
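A short sketch of value_counts and mode on made-up data, showing that both ignore or can be made to ignore missing values:

```python
import pandas as pd

embarked = pd.Series(["S", "C", "S", None, "Q", "S", "C"])  # made-up sample

counts = embarked.value_counts()     # missing values are excluded by default
most_common = counts.index[0]        # counts are sorted in descending order
modes = embarked.dropna().mode()     # a Series, because ties are possible

print(counts.to_dict())              # {'S': 3, 'C': 2, 'Q': 1}
print(most_common, modes[0])         # S S
```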

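Other:

Method                  Description

count                   number of non-NA values

describe                summary statistics for a Series or for each DataFrame column

min, max                minimum and maximum values

argmin, argmax          integer positions at which the minimum and maximum values occur

much_nuclei = df_img['nuclei'].argmax()

plt.imshow(imgs[much_nuclei])

idxmin, idxmax          index labels at which the minimum and maximum values occur

df.idxmax()  # per column

df.idxmax(axis=1)  # per row

quantile                sample quantile (from 0 to 1)

sum                     sum of the values

df.sum()  # sum of each column

df.sum(axis=1)  # sum of each row

mean                    mean of the values

df.mean(axis=1)  # mean of each row; NaN values are skipped, and a row that is entirely NaN gives NaN

df.mean(axis=1, skipna=False)  # do not skip NaN values

median                  arithmetic median of the values

mad                     mean absolute deviation from the mean (removed in pandas 2.0)

var                     sample variance

std                     sample standard deviation

skew                    sample skewness (third moment)

kurt                    sample kurtosis (fourth moment)

cumsum                  cumulative sum, that is, the total from the first position up to the current one

df.cumsum()  # cumulative sum down each column; a NaN at the current position stays NaN, while NaNs at earlier positions are skipped

df.cumsum(axis=1)  # cumulative sum along each row

cummin, cummax          cumulative minimum and cumulative maximum

cumprod                 cumulative product

diff                    first-order difference (useful for time series)

pct_change              percentage change

isin                    whether each value of a Series or DataFrame is contained in another collection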
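The axis and skipna behaviour described in the table can be checked on a frame with a couple of NaN values. The data below is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0],
                   "y": [4.0, 5.0, np.nan]})

col_sum = df.sum()                                # x: 4.0, y: 9.0 (NaN skipped)
row_mean = df.mean(axis=1)                        # 2.5, 5.0, 3.0
row_mean_strict = df.mean(axis=1, skipna=False)   # 2.5, NaN, NaN
running = df["x"].cumsum()                        # 1.0, NaN, 4.0
in_set = df["y"].isin([4.0, 5.0])                 # True, True, False
```

Note that cumsum keeps NaN at the position where the input is NaN but still carries the running total past it.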

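Handling missing values

Method          Description

count           number of non-NA values

dropna          filters axis labels according to whether their values contain missing data; a threshold (thresh) adjusts how much missing data is tolerated

fillna          fills missing data with a specified value or with a fill method such as ffill or bfill

isnull          returns an object of Boolean values marking which values are missing/NA, with the same type as the source object

notnull         the negation of isnull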

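There are three ways to complete a continuous numerical feature:

1. Simple: generate a random number between (mean - standard deviation) and (mean + standard deviation).

2. More accurate: guess the missing values from correlated features.

3. Combine 1 and 2: for each combination of correlated feature values, generate a random number between (mean - standard deviation) and (mean + standard deviation).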

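Filling missing values:

dataset['E'] = dataset['E'].fillna(f)

# Fill the missing values with f (for example 0). The value argument may be a scalar, dict, Series, or DataFrame, but not a list. A Series is matched on index and a DataFrame on column; anything not present in the dict/Series/DataFrame is left unfilled.

Dropping missing values: .dropna()

dataset.loc[(dataset.Age.isnull()) & (dataset.Sex == i) & (dataset.Pclass == j+1), 'Age'] = guess_ages[i,j]

# Fill under several conditions at once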
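As a small illustration of the dict form of fillna and the threshold form of dropna, assuming a toy frame with Titanic-style column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [22.0, np.nan, 30.0],
                   "Fare": [7.25, np.nan, np.nan],
                   "Cabin": [None, None, "C85"]})

# A dict fills each listed column with its own value; Cabin is not listed, so it stays missing.
filled = df.fillna({"Age": df["Age"].median(), "Fare": 0.0})

# thresh=2 keeps only the rows that have at least two non-missing values.
kept = df.dropna(thresh=2)

print(filled["Age"].tolist())   # [22.0, 26.0, 30.0]
print(kept.index.tolist())      # [0, 2]
```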

Method 1:

for dataset in full_data:

    age_avg = dataset['Age'].mean()

    age_std = dataset['Age'].std()

    age_null_count = dataset['Age'].isnull().sum()

    # draw random integer ages within one standard deviation of the mean
    age_null_random_list = np.random.randint(int(age_avg - age_std), int(age_avg + age_std), size=age_null_count)

    # assign through .loc instead of chained indexing so the original DataFrame is modified
    dataset.loc[dataset['Age'].isnull(), 'Age'] = age_null_random_list

    dataset['Age'] = dataset['Age'].astype(int)

Method 3:

# Create an empty array to hold the guessed Age values:

guess_ages = np.zeros((2, 3))

# Iterate over Sex and Pclass to compute the guesses (rnd is assumed to be the standard random module, imported as rnd):

for dataset in combine:

    for i in range(0, 2):

        for j in range(0, 3):

            # select Age by the correlated features Pclass and Sex, dropping missing values
            guess_df = dataset[(dataset['Sex'] == i) & (dataset['Pclass'] == j+1)]['Age'].dropna()

            # age_mean = guess_df.mean()   # compute the mean

            # age_std = guess_df.std()     # compute the standard deviation

            # age_guess = rnd.uniform(age_mean - age_std, age_mean + age_std)   # draw a random value

            age_guess = guess_df.median()  # or use the median

            # Convert random age float to nearest .5 age
            guess_ages[i, j] = int(age_guess/0.5 + 0.5) * 0.5

    for i in range(0, 2):

        for j in range(0, 3):

            # assign the guess to the matching missing entries
            dataset.loc[(dataset.Age.isnull()) & (dataset.Sex == i) & (dataset.Pclass == j+1), 'Age'] = guess_ages[i, j]

    # convert only after every group has been filled; astype(int) fails while NaN values remain
    dataset['Age'] = dataset['Age'].astype(int)

Filling with the mode:

freq_port = train_df.Embarked.dropna().mode()[0]

# mode returns the most frequent value; there may be ties, so it returns a Series and [0] takes the first entry

# Fill:

for dataset in combine:

    dataset['E'] = dataset['E'].fillna(freq_port)
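Guessing missing values from correlated features can also be written without explicit loops. The sketch below is an alternative formulation, not the code used above: it assumes the group median is an acceptable guess and uses groupby with transform on made-up data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Sex":    [0, 0, 0, 1, 1, 1],
                   "Pclass": [1, 1, 1, 3, 3, 3],
                   "Age":    [40.0, np.nan, 50.0, 20.0, 30.0, np.nan]})

# Median Age of each (Sex, Pclass) group, aligned row by row with df.
group_median = df.groupby(["Sex", "Pclass"])["Age"].transform("median")

# Only the missing entries are replaced; existing ages are untouched.
df["Age"] = df["Age"].fillna(group_median)

print(df["Age"].tolist())   # [40.0, 45.0, 50.0, 20.0, 30.0, 25.0]
```

A group in which every Age is missing would still produce NaN, so a global fallback fill may be needed on real data.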

Inspecting data

Keys and values:

train_data = pd.read_csv('train.csv')

# Inspect the row index (index, index.values), the column labels (columns, columns.values), and the values (values)

print(train_data.index)

print(train_data.index.values)

Summary statistics:

train_data.info()

# Reports, for each column, the number of non-null entries (useful later when filling missing values) and the dtype, plus a count of columns per dtype (object usually means strings).

print(train_data.describe())

# By default, summarizes each numeric column: count, mean, standard deviation, minimum, maximum, and the 25%, 50%, and 75% quantiles.

print(train_data.describe(include=['O']))

# For string columns, reports the count, the number of distinct values, the most frequent value, and its frequency. The include parameter is a whitelist of dtypes to summarize; 'O' stands for object, and the dtypes printed by info() can be used here.

print("Before", train_data.shape)

# Number of rows and columns

Viewing part of the data:

# Look at the first five and last five rows to get a rough idea of the contents

print(train_data.head())

print(train_data.tail())

# Pick three rows at random

train_data.sample(3)

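Sorting:

features.sort_values(by='EID', ascending=True)

features.sort_index(axis=1, ascending=True)

Sorting lists and dicts in plain Python

sorted([wifi for wifi in line[5]], key=lambda x: int(x[1]), reverse=True)[:5]  # sorted is ascending by default; reverse=True gives descending order

sorted(d.items(), key=lambda x: x[1], reverse=True)[0][0]  # d is a dict; returns the key with the largest value

sorted(L, key=lambda x: x[1])  # Python 3 form; Python 2 also accepted sorted(L, cmp=lambda x,y: cmp(x[1],y[1])), where x and y are the two elements being compared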
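The cmp argument no longer exists in Python 3. A minimal sketch of the two usual replacements, on made-up data: a key function, or functools.cmp_to_key when a two-argument comparison is really needed.

```python
from functools import cmp_to_key

pairs = [("a", 3), ("b", 1), ("c", 2)]   # made-up data

# Preferred: a key function.
by_key = sorted(pairs, key=lambda item: item[1])

# If a two-argument comparison is really needed, wrap it with cmp_to_key.
def compare(x, y):
    # negative if x sorts first, positive if y sorts first, 0 if equal
    return (x[1] > y[1]) - (x[1] < y[1])

by_cmp = sorted(pairs, key=cmp_to_key(compare))

scores = {"a": 3, "b": 1, "c": 2}
top_key = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[0][0]

print(by_key == by_cmp, top_key)   # True a
```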

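Controlling output format:

Print every column of a pandas DataFrame, instead of replacing some with an ellipsis when there is a lot of data.

pd.set_option('display.max_columns', None)

or

with pd.option_context('display.max_rows', 10, 'display.max_columns', 5):
    print(df)  # the options apply only inside the with block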

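Length statistics for a text column

lens = train.comment_text.str.len()  # number of characters in each row

lens.mean(), lens.std(), lens.max()

print('mean word count:', train["comment_text"].str.count(r'\S+').mean())  # \S+ matches runs of non-whitespace characters, so this counts words

print('max word count:', train["comment_text"].str.count(r'\S+').max())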
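The difference between character length and word count is easy to confirm on a toy column (the strings below are made up):

```python
import pandas as pd

train = pd.DataFrame({"comment_text": ["hello world", "a  b c", ""]})  # toy data

char_len = train["comment_text"].str.len()             # 11, 6, 0
word_count = train["comment_text"].str.count(r"\S+")   # 2, 3, 0
```

The raw-string prefix keeps Python from interpreting \S as a string escape.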

Analyzing relationships in the data

Grouping data with groupby:

train_data[['Pclass','Survived']].groupby(['Pclass'], as_index=False).mean().sort_values(by='Survived', ascending=False)

# Select two columns, group by Pclass, compute the mean within each group, and sort by the mean of Survived in descending order. as_index=False keeps Pclass as a regular column instead of making it the row index of the result.

After grouping, size() gives the number of rows in each group, sum() gives the per-group sums, and count() gives the number of non-null values per column in each group:

df = pd.DataFrame({'key1':['a','a','b','b','a'],'key2':['one','two','one','two','one'],'data1':np.random.randn(5),'data2':np.random.randn(5)})

df

#[Out]#       data1     data2 key1 key2
#[Out]# 0  0.439801  1.582861    a  one
#[Out]# 1 -1.388267 -0.603653    a  two
#[Out]# 2 -0.514400 -0.826736    b  one
#[Out]# 3 -1.487224 -0.192404    b  two
#[Out]# 4  2.169966  0.074715    a  one

group2 = df.groupby(['key1','key2'])

group2.size()

#[Out]# key1  key2
#[Out]# a     one     2    # note: size returns a Series (2, 1, 1, 1) with no column labels
#[Out]#       two     1
#[Out]# b     one     1
#[Out]#       two     1
#[Out]# dtype: int64

group2.count()

#[Out]#            data1  data2
#[Out]# key1 key2
#[Out]# a    one       2      2    # note: count returns a DataFrame (2, 1, 1, 1) with the column labels data1 and data2
#[Out]#      two       1      1
#[Out]# b    one       1      1
#[Out]#      two       1      1

group2.sum()

#[Out]#               data1     data2
#[Out]# key1 key2
#[Out]# a    one    2.609767  1.657576
#[Out]#      two   -1.388267 -0.603653
#[Out]# b    one   -0.514400 -0.826736
#[Out]#      two   -1.487224 -0.192404

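Custom per-group statistics:

BRA_CLOSE_DECADE = branch2[['EID', 'B_ENDYEAR']].groupby('EID').agg(lambda s: (s > 2007).sum())  # for each EID, count the rows whose B_ENDYEAR is later than 2007; agg passes each column of each group to the function as a Series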
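A runnable sketch of a custom aggregation of this kind, using made-up values for EID and B_ENDYEAR:

```python
import pandas as pd

branch = pd.DataFrame({"EID": [1, 1, 1, 2, 2],
                       "B_ENDYEAR": [2005, 2008, 2010, 2007, 2012]})  # toy values

# agg passes the selected column of each group to the function as a Series.
closed_after_2007 = (
    branch.groupby("EID")["B_ENDYEAR"]
          .agg(lambda years: int((years > 2007).sum()))
)

print(closed_after_2007.to_dict())   # {1: 2, 2: 1}
```

Summing a Boolean mask counts the True entries, which avoids building a filtered copy of each group.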

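Cross-tabulating with crosstab:

pd.crosstab(train_data['Title'], train_data['Sex'])

# Uses Title (Mrs, Mr, and so on) as the rows and Sex (female, male) as the columns and counts how often each pair occurs, which shows how the two features correspond.

Pivoting data:

impute_grps = data.pivot_table(values=["LoanAmount"], index=["Gender","Married","Self_Employed"], aggfunc="mean")  # mean LoanAmount for each combination of the three index columns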
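Both calls can be tried on a toy frame. The column names below echo the examples above, but the values are invented.

```python
import pandas as pd

people = pd.DataFrame({"Title": ["Mr", "Mrs", "Mr", "Miss", "Mr"],
                       "Sex":   ["male", "female", "male", "female", "male"],
                       "Fare":  [10.0, 20.0, 30.0, 40.0, 50.0]})   # toy data

freq = pd.crosstab(people["Title"], people["Sex"])                  # frequency table
mean_fare = people.pivot_table(values="Fare", index="Sex", aggfunc="mean")

print(freq.loc["Mr", "male"])            # 3
print(mean_fare.loc["female", "Fare"])   # 30.0
```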

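Computing cov() and corr()

Covariance, cov(): indicates the direction of a linear relationship and can take any value from negative infinity to positive infinity. A positive covariance means that as one variable increases the other tends to increase; a negative covariance means that as one increases the other tends to decrease; a value of 0 means the two variables have no linear relationship.

Correlation coefficient, corr(): indicates both the direction and the strength of a linear relationship, with values in [-1, 1]. A positive coefficient means that as one variable increases the other tends to increase; a negative one means that as one increases the other tends to decrease; 0 means there is no linear relationship. The closer the absolute value is to 1, the stronger the linear relationship.

corrwith(): computes the correlation between the columns (axis=0, the default) or rows (axis=1) of a DataFrame and another Series or DataFrame.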
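A small numerical check of these statements, using columns constructed to be exact linear functions of x:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                   "up": [2.0, 4.0, 6.0, 8.0],      # exactly 2 * x
                   "down": [8.0, 6.0, 4.0, 2.0]})   # exactly 10 - 2 * x

cov_x_up = df["x"].cov(df["up"])          # sign gives direction; size depends on units
corr_x_up = df["x"].corr(df["up"])        # +1 for a perfectly increasing line
corr_x_down = df["x"].corr(df["down"])    # -1 for a perfectly decreasing line
per_column = df[["up", "down"]].corrwith(df["x"])   # one coefficient per column
```

Rescaling a column changes its covariance with x but leaves the correlation unchanged, which is why corr is the easier number to compare across features.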

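Deleting data

print(df.drop(0, axis=0))  # drop a row; the original data is unchanged and a new object is returned

df.drop(['col1'], axis=1, inplace=True)  # drop a column; inplace=True modifies the original data instead of creating a new object, and the call returns None, so there is nothing useful to print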

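Combining data

concat:

Stacks tables that share the same columns end to end.

result = pd.concat([df1, df2, df3], keys=['x','y','z'])  # keys labels each block of rows with the table it came from

After concatenating several tables the index may contain duplicates; in that case it is best to rebuild it with reset_index.

result.reset_index(drop=True)

pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False)

# join_axes is only accepted by pandas versions before 1.0

Appending:

result = df1.append([df2, df3])  # returns df1 with df2 and df3 appended; DataFrame.append was removed in pandas 2.0, where pd.concat([df1, df2, df3]) does the same job

# [Official merging tutorial](http://pandas.pydata.org/pandas-docs/stable/merging.html#)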
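The effect of keys and of renumbering the index can be seen with two tiny frames (made-up data):

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2]})
df2 = pd.DataFrame({"A": [3, 4]})

stacked = pd.concat([df1, df2])                         # index repeats: 0, 1, 0, 1
tagged = pd.concat([df1, df2], keys=["x", "y"])         # MultiIndex: (x, 0), (x, 1), (y, 0), (y, 1)
renumbered = pd.concat([df1, df2], ignore_index=True)   # fresh index: 0, 1, 2, 3

print(stacked.index.tolist())          # [0, 1, 0, 1]
print(tagged.loc["y", "A"].tolist())   # [3, 4]
print(renumbered.index.tolist())       # [0, 1, 2, 3]
```

ignore_index=True gives the same result as calling reset_index(drop=True) afterwards.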

merge:

merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False)

merge joins the rows of two datasets on one or more keys, much like JOIN in SQL.

on=None names the key column(s) explicitly. If the key has different names in the two objects, give them separately with left_on and right_on. To use the row index as the join key instead, set left_index and/or right_index to True. If none of these is specified, the columns whose names appear in both DataFrames are used as the join keys.

how='inner' decides what to keep when some keys appear in only one of the two objects: inner keeps the intersection, outer keeps the union, and left and right keep the keys of one side.

suffixes=('_x', '_y') is how same-named columns other than the join keys are told apart in the result: each side gets its own suffix.

For many-to-many joins, the result is the Cartesian product of the matching rows.

# how takes one of {'left', 'right', 'outer', 'inner'}, default 'inner', corresponding to SQL's LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, and INNER JOIN

>>> A                  >>> B
   lkey  value            rkey  value
0   foo      1         0   foo      5
1   bar      2         1   bar      6
2   baz      3         2   qux      7
3   foo      4         3   bar      8

>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer')
   lkey  value_x  rkey  value_y
0   foo        1   foo        5
1   foo        4   foo        5
2   bar        2   bar        6
3   bar        2   bar        8
4   baz        3   NaN      NaN
5   NaN      NaN   qux        7
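The same A and B can be rebuilt and merged in a few lines. This sketch adds indicator=True, which is not used in the example above, to show where each row of the outer join came from; row order may vary between pandas versions.

```python
import pandas as pd

A = pd.DataFrame({"lkey": ["foo", "bar", "baz", "foo"], "value": [1, 2, 3, 4]})
B = pd.DataFrame({"rkey": ["foo", "bar", "qux", "bar"], "value": [5, 6, 7, 8]})

outer = A.merge(B, left_on="lkey", right_on="rkey", how="outer", indicator=True)
inner = A.merge(B, left_on="lkey", right_on="rkey", how="inner")

# _merge is 'both', 'left_only', or 'right_only' for each row.
print(outer["_merge"].value_counts().to_dict())
print(len(outer), len(inner))   # 6 4
```

The inner join has four rows because foo appears twice on the left and bar twice on the right, so each match is repeated.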

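join:

Its parameters mean essentially the same as those of merge, except that join defaults to a left outer join (how='left'). It combines on the index by default and can join objects with identical or similar indexes, so it is mainly used for index-based combination.

join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False):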
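A minimal sketch of index-based joining, with made-up frames whose indexes only partly overlap:

```python
import pandas as pd

left = pd.DataFrame({"age": [22, 38, 26]}, index=["a", "b", "c"])
right = pd.DataFrame({"fare": [7.25, 71.28]}, index=["a", "b"])

joined = left.join(right)               # how='left' by default: keeps a, b, c
inner = left.join(right, how="inner")   # keeps only a, b

print(joined.index.tolist())            # ['a', 'b', 'c']
print(joined["fare"].isna().tolist())   # [False, False, True]
print(inner.index.tolist())             # ['a', 'b']
```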

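Modifying data

Extracting data from existing data:

dataset['Title'] = dataset.Name.str.extract(r' ([A-Za-z]+)\.', expand=False)

# The left side, dataset['Title'], adds a column to the DataFrame. The right side, dataset.Name, takes the Name column of the DataFrame; str.extract then applies the regular expression to each string in that Series and returns the captured group.

data['sum_Times'] = data.groupby('userID')['Times'].cumsum()  # running total of Times within each userID group, up to and including the current row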
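Both operations can be checked on toy data. The names and numbers below are invented; only the column names follow the snippets above.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Braund, Mr. Owen", "Heikkinen, Miss. Laina"]})
# expand=False returns a Series when the pattern has a single capture group.
people["Title"] = people["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)

visits = pd.DataFrame({"userID": ["u1", "u2", "u1", "u1"],
                       "Times":  [1, 5, 2, 3]})
# The running total restarts for each userID and keeps the original row order.
visits["sum_Times"] = visits.groupby("userID")["Times"].cumsum()

print(people["Title"].tolist())       # ['Mr', 'Miss']
print(visits["sum_Times"].tolist())   # [1, 5, 3, 6]
```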

Replacing data:

dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')

dataset['Title'].replace('Ms', 'Miss')  # without the assignment this only returns a new Series; dataset is unchanged

# Replaces the value Ms with Miss in one column. [Details](https://jingyan.baidu.com/article/454316ab4d0e64f7a6c03a41.html)

Converting categorical data to numbers:

title_mapping = {"Mr": 1, "Miss": 2, "Mrs": 3, "Master": 4, "Rare": 5}

for dataset in combine:

    dataset['Title'] = dataset['Title'].map(title_mapping)

# dataset['Sex'] = dataset['Sex'].map({'female': 1, 'male': 0}).astype(int)

Converting to a matrix:

big_X_imputed[0:train_df.shape[0]].to_numpy()  # convert the DataFrame to a NumPy array; as_matrix(), used in older code, was removed in pandas 1.0

Splitting continuous values into bins:

# Automatic

pd.cut(np.array([.2, 1.4, 2.5, 6.2, 9.7, 2.1]), 3,
       labels=["good", "medium", "bad"])

[good, good, good, medium, bad, good]

# Manual; it usually helps to run the automatic version first and study the result.

# train_df['AgeBand'] = pd.cut(train_df['Age'], 5)

# train_df[['AgeBand', 'Survived']].groupby(['AgeBand'], as_index=False).mean().sort_values(by='AgeBand', ascending=True)

# Manual binning

# for dataset in combine:

#     dataset.loc[ dataset['Age'] <= 16, 'Age'] = 0

#     dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1

#     dataset.loc[ dataset['Age'] > 64, 'Age'] = 4
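Manual bands like these can also be expressed as a single pd.cut call with explicit edges. In this sketch the edges 16, 32, and 64 come from the snippet above, while the edge at 48 and the sample ages are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

ages = pd.Series([5, 16, 17, 40, 64, 80])   # made-up ages

# Intervals are right-closed by default:
# (-inf, 16], (16, 32], (32, 48], (48, 64], (64, inf)
bins = [-np.inf, 16, 32, 48, 64, np.inf]    # 48 is an assumed edge
age_band = pd.cut(ages, bins=bins, labels=[0, 1, 2, 3, 4]).astype(int)

print(age_band.tolist())   # [0, 0, 1, 2, 3, 4]
```

Using one call avoids the risk that an earlier assignment (for example setting Age to 1) is picked up by a later condition on the same column.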

Applying a function to every row or column:

def num_missing(x):

    return sum(x.isnull())

# Apply to each column:

print(data.apply(num_missing, axis=0))

# Apply to each row:

print(data.apply(num_missing, axis=1).head())

import re

def get_title(name):

    title_search = re.search(r' ([A-Za-z]+)\.', name)

    # If the title exists, extract and return it.

    if title_search:

        return title_search.group(1)

    return ""

for dataset in full_data:

    dataset['Title'] = dataset['Name'].apply(get_title)

df.Cabin = df.Cabin.apply(lambda x: x[0])  # keep only the first character of each Cabin value

Converting string data to numeric values:

from sklearn import preprocessing

def encode_features(df_train, df_test):

    features = ['Fare', 'Cabin', 'Age', 'Sex', 'Lname', 'NamePrefix']

    # fit on train and test together so that every value seen in either set gets a code
    df_combined = pd.concat([df_train[features], df_test[features]])

    for feature in features:

        le = preprocessing.LabelEncoder()

        le = le.fit(df_combined[feature])

        df_train[feature] = le.transform(df_train[feature])

        df_test[feature] = le.transform(df_test[feature])

    return df_train, df_test

data_train, data_test = encode_features(data_train, data_test)

Removing outliers:

If plotting reveals outliers in the data, remove them; a pandas Boolean mask is enough:

train = train[abs(train['length']) < 10]

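One-hot (dummy) encoding of unordered categorical features:

Day of the week is an unordered feature. If it takes three values, Monday, Tuesday, and Wednesday, they can be represented by the three-dimensional vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1). Use pd.get_dummies() for this. If a feature has too many distinct values, merge the less important ones into a single category according to the distribution of the data.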
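A sketch of both steps on made-up data: folding infrequent values into one bucket, then one-hot encoding. The threshold of 2 occurrences and the bucket name "Other" are arbitrary choices for illustration.

```python
import pandas as pd

days = pd.Series(["Mon", "Tue", "Wed", "Mon", "Thu", "Fri"], name="day")  # toy data

# Fold values seen fewer than 2 times into a single "Other" bucket.
counts = days.value_counts()
rare = counts[counts < 2].index
days_grouped = days.where(~days.isin(rare), "Other")

# One indicator column per remaining category.
one_hot = pd.get_dummies(days_grouped, prefix="day", dtype=int)

print(one_hot.columns.tolist())   # ['day_Mon', 'day_Other']
print(one_hot.iloc[0].tolist())   # [1, 0]
```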

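Removing duplicate rows:

alter.duplicated()  # returns a Boolean for each row saying whether it duplicates an earlier row; frame.duplicated(['state']) checks only the given columns

alter.duplicated().value_counts()

alter2 = alter.drop_duplicates()  # drop duplicate rows; this returns new data rather than modifying the original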
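The difference between whole-row and single-column duplicate checks, on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame({"state": ["NY", "NY", "CA", "NY"],
                      "year":  [2000, 2000, 2001, 2002]})

full_row_dupes = frame.duplicated()          # compares whole rows
state_dupes = frame.duplicated(["state"])    # compares only the state column
deduped = frame.drop_duplicates()            # frame itself is unchanged

print(full_row_dupes.tolist())    # [False, True, False, False]
print(state_dupes.tolist())       # [False, True, False, True]
print(len(deduped), len(frame))   # 3 4
```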

Renaming the index and column labels:

df.columns = ['a', 'b', 'c', 'd', 'e']

df.columns = df.columns.str.strip('$')

df.columns = df.columns.map(lambda x: x[1:])

df.rename(columns={'$a': 'a', '$b': 'b', '$c': 'c', '$d': 'd', '$e': 'e'}, inplace=True)

df.rename(columns=lambda x: x.replace('$', ''), inplace=True)

investFeature.index.rename('EID', inplace=True)

Column to index, index to column:

df.set_index('date', inplace=True)

df['index'] = df.index

df.reset_index(level=0, inplace=True)

df.reset_index(level=['tick', 'obs'])

df['si_name'] = df.index.get_level_values('si_name')  # where si_name is the name of the subindex.

Deleting the index

df_load.reset_index(inplace=True)

del df_load['index']  # for an unnamed index, these two lines have the same effect as df_load.reset_index(drop=True, inplace=True)
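A round trip through these operations on a toy frame; the column names date and $a are made up to mirror the snippets above.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2013-01-01", "2013-01-02"], "$a": [1, 2]})

df = df.rename(columns=lambda name: name.replace("$", ""))   # '$a' -> 'a'
indexed = df.set_index("date")              # column -> index
restored = indexed.reset_index()            # index -> column again
dropped = indexed.reset_index(drop=True)    # discard the index entirely

print(indexed.index.name)          # date
print(restored.columns.tolist())   # ['date', 'a']
print(dropped.columns.tolist())    # ['a']
```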

Original reference: https://blog.csdn.net/qq_16234613/article/details/64217337
