# S!T!A!R!T```python%matplotlib inlineimport numpy as npimport pandas as pdimport matplotlib.pyplot as plt```## 1.define a Dataframe data```pythondates = pd.date_range('20170129', periods = 5, freq = 'd')df = pd.DataFrame(np.random.randn(5,4), columns = list('ABCD'), index = dates)``````pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythondf.plot(kind = 'barh')```## 2.having data types in 'df'```pythondf.dtypes```? ? A? ? float64? ? B? ? float64? ? C? ? float64? ? D? ? float64? ? dtype: object## 3. Choose indexs&cloumns&values```pythonprint('-_'*33)print('The index of df is :\n{0}'.format(df.index))print('-_'*33)print('The values of df is :\n{0}'.format(df.values))print('-_'*33)print('The columns of df is :\n{0}'.format(df.columns))print('-_'*33)print(df.describe())print('-_'*33)```? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? The index of df is :? ? DatetimeIndex(['2017-01-29', '2017-01-30', '2017-01-31', '2017-02-01',? ? ? ? ? ? ? ? ? '2017-02-02'],? ? ? ? ? ? ? ? ? dtype='datetime64[ns]', freq='D')? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? The values of df is :? ? [[-0.87679455? 0.62570401? 0.8394737? 1.23924262]? ? [ 0.35142547? 0.61910446? 0.69015155? 0.89032261]? ? [ 0.48453389? 2.18840161 -0.2584778? -0.83329982]? ? [ 1.49215272 -1.739632? -1.86894142 -0.01332174]? ? [ 0.67477899? 0.31703409 -0.37920271 -0.07616798]]? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? The columns of df is :? ? Index(['A', 'B', 'C', 'D'], dtype='object')? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? A? ? ? ? B? ? ? ? C? ? ? ? D? ? count? 5.000000? 5.000000? 5.000000? 5.000000? ? mean? 0.425219? 0.402122 -0.195399? 0.241355? ? std? ? 0.852178? 1.403764? 1.083244? 0.827270? ? min? -0.876795 -1.739632 -1.868941 -0.833300? ? 25%? ? 0.351425? 0.317034 -0.379203 -0.076168? ? 50%? ? 0.484534? 0.619104 -0.258478 -0.013322? ? 75%? ? 0.674779? 0.625704? 0.690152? 0.890323? ? max? ? 1.492153? 2.188402? 0.839474? 1.239243? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_## 4.Transposing your data```pythonprint(df.T)```? ? ? 2017-01-29? 2017-01-30? 2017-01-31? 2017-02-01? 2017-02-02? ? A? -0.876795? ? 0.351425? ? 0.484534? ? 1.492153? ? 0.674779? ? B? ? 0.625704? ? 0.619104? ? 2.188402? -1.739632? ? 0.317034? ? C? ? 0.839474? ? 0.690152? -0.258478? -1.868941? -0.379203? ? D? ? 1.239243? ? 0.890323? -0.833300? -0.013322? -0.076168## 5.Sort```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint('-_'*33)print(df.sort_index(axis = 0, ascending = False))print('-_'*33)print(df.sort_index(axis = 1, ascending = False))print('-_'*33)print(df.sort_values(axis = 0, ascending = True , by = 'A'))print('-_'*33)```? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? ? ? A? ? ? ? B? ? ? ? C? ? ? ? D? ? 2017-02-02? 0.674779? 0.317034 -0.379203 -0.076168? ? 2017-02-01? 1.492153 -1.739632 -1.868941 -0.013322? ? 2017-01-31? 0.484534? 2.188402 -0.258478 -0.833300? ? 2017-01-30? 0.351425? 0.619104? 0.690152? 0.890323? ? 2017-01-29 -0.876795? 0.625704? 0.839474? 1.239243? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? ? ? D? ? ? ? C? ? ? ? B? ? ? ? A? ? 2017-01-29? 1.239243? 0.839474? 0.625704 -0.876795? ? 2017-01-30? 0.890323? 0.690152? 0.619104? 0.351425? ? 2017-01-31 -0.833300 -0.258478? 2.188402? 0.484534? ? 2017-02-01 -0.013322 -1.868941 -1.739632? 1.492153? ? 2017-02-02 -0.076168 -0.379203? 0.317034? 0.674779? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? ? ? A? ? ? ? B? ? ? ? C? ? ? ? D? ? 2017-01-29 -0.876795? 0.625704? 0.839474? 1.239243? ? 2017-01-30? 0.351425? 0.619104? 0.690152? 0.890323? ? 2017-01-31? 0.484534? 2.188402 -0.258478 -0.833300? ? 2017-02-02? 0.674779? 0.317034 -0.379203 -0.076168? ? 2017-02-01? 1.492153 -1.739632 -1.868941 -0.013322? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_## 6.Choose data### 6.1.Choose columns by single label```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint('-_'*33)print(df.A)print('-_'*33)print(df['B'])print('-_'*33)```? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? 2017-01-29? -0.876795? ? 2017-01-30? ? 0.351425? ? 2017-01-31? ? 0.484534? ? 2017-02-01? ? 1.492153? ? 2017-02-02? ? 0.674779? ? Freq: D, Name: A, dtype: float64? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? 2017-01-29? ? 0.625704? ? 2017-01-30? ? 0.619104? ? 2017-01-31? ? 2.188402? ? 2017-02-01? -1.739632? ? 2017-02-02? ? 0.317034? ? Freq: D, Name: B, dtype: float64? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_### 6.2.Choose rows by slice via'[ : ]'```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```python#not include the second numberdf[ :3]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
### 6.3. Choose rows by index name via'[:]'```python# include the second labledf['20170129': '2017-1-31']```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
### 6.4. Choose complex sections using '.loc( )'```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint(df.loc[dates[0]])print('-_'*33)print(df.loc['20170201'])print('-_'*33)print(df.loc[dates[0]:dates[len(df)-1], list('AB')])print('-_'*33)print(df.loc['20170129': '20170202', ['A', 'C', 'D']])print('-_'*33)print(df.loc[:, list('CBA')])print('-_'*33)print(df.loc[dates[0], 'A'])print('-_'*33)print(df.loc[dates[0], ['A']])print('-_'*33)##A' can not be a listprint(df.at[dates[0], 'A'])```? ? A? -0.876795? ? B? ? 0.625704? ? C? ? 0.839474? ? D? ? 1.239243? ? Name: 2017-01-29 00:00:00, dtype: float64? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? A? ? 1.492153? ? B? -1.739632? ? C? -1.868941? ? D? -0.013322? ? Name: 2017-02-01 00:00:00, dtype: float64? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? ? ? A? ? ? ? B? ? 2017-01-29 -0.876795? 0.625704? ? 2017-01-30? 0.351425? 0.619104? ? 2017-01-31? 0.484534? 2.188402? ? 2017-02-01? 1.492153 -1.739632? ? 2017-02-02? 0.674779? 0.317034? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? ? ? A? ? ? ? C? ? ? ? D? ? 2017-01-29 -0.876795? 0.839474? 1.239243? ? 2017-01-30? 0.351425? 0.690152? 0.890323? ? 2017-01-31? 0.484534 -0.258478 -0.833300? ? 2017-02-01? 1.492153 -1.868941 -0.013322? ? 2017-02-02? 0.674779 -0.379203 -0.076168? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? ? ? C? ? ? ? B? ? ? ? A? ? 2017-01-29? 0.839474? 0.625704 -0.876795? ? 2017-01-30? 0.690152? 0.619104? 0.351425? ? 2017-01-31 -0.258478? 2.188402? 0.484534? ? 2017-02-01 -1.868941 -1.739632? 1.492153? ? 2017-02-02 -0.379203? 0.317034? 0.674779? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? -0.876794550096? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? A? -0.876795? ? Name: 2017-01-29 00:00:00, dtype: float64? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? -0.876794550096### 6.5. Choose complex sections using '.lioc( )'```pythondf```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```pythonprint('-_'*33)#single elementprint(df.iloc[0,0])print('-_'*33)#complex sectionsprint(df.iloc[:3,2:])print('-_'*33)#single rowprint(df.iloc[0])print('-_'*33)#single columnprint(df.iloc[:,0])print('-_'*33)```? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? -0.876794550096? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? ? ? ? ? ? ? ? ? ? C? ? ? ? D? ? 2017-01-29? 0.839474? 1.239243? ? 2017-01-30? 0.690152? 0.890323? ? 2017-01-31 -0.258478 -0.833300? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? A? -0.876795? ? B? ? 0.625704? ? C? ? 0.839474? ? D? ? 1.239243? ? Name: 2017-01-29 00:00:00, dtype: float64? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_? ? 2017-01-29? -0.876795? ? 2017-01-30? ? 0.351425? ? 2017-01-31? ? 0.484534? ? 2017-02-01? ? 1.492153? ? 2017-02-02? ? 0.674779? ? Freq: D, Name: A, dtype: float64? ? -_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_### 7. Boolean indexing```python#element in column 'A' > 0, return the DataFramedf[df.A > 0]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
```python#all element in 'df' which <=0 is np.nandf[df > 0]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29NaN0.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402NaNNaN
2017-02-011.492153NaNNaNNaN
2017-02-020.6747790.317034NaNNaN
```pythondf2=df.copy()df2['E'] = ['fuck', 'shit', ] * 2 + ['suck']df2[df2.E.isin(['fuck', 'suck'])]```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDE
2017-01-29-0.8767950.6257040.8394741.239243fuck
2017-01-310.4845342.188402-0.258478-0.833300fuck
2017-02-020.6747790.317034-0.379203-0.076168suck
### 8.Missing Data```pythondf3 = df[df > 0]df3```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29NaN0.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402NaNNaN
2017-02-011.492153NaNNaNNaN
2017-02-020.6747790.317034NaNNaN
```pythondf3.dropna()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-300.3514250.6191040.6901520.890323
```pythondf3.fillna(value = 'Missing')```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29Missing0.6257040.8394741.23924
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.1884MissingMissing
2017-02-011.49215MissingMissingMissing
2017-02-020.6747790.317034MissingMissing
```pythondf4 = pd.isnull(df3)df4```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29TrueFalseFalseFalse
2017-01-30FalseFalseFalseFalse
2017-01-31FalseFalseTrueTrue
2017-02-01FalseTrueTrueTrue
2017-02-02FalseFalseTrueTrue
### 9. Merge```pythona = df[? : 2]b = df[2 : 4]c = df[4 : 5]pieces = [a, b, c]pd.concat(pieces)```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCD
2017-01-29-0.8767950.6257040.8394741.239243
2017-01-300.3514250.6191040.6901520.890323
2017-01-310.4845342.188402-0.258478-0.833300
2017-02-011.492153-1.739632-1.868941-0.013322
2017-02-020.6747790.317034-0.379203-0.076168
### 10. Time Series### 10.1. define time series based 'df'```pythontime = pd.date_range('20170129', periods = 1000, freq = 'd')col = list('ABCDEFG')df_time = pd.DataFrame(np.random.randn(len(time), len(col)), index = time, columns = col)df_time.head(5)```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-29-2.442170-1.423526-0.2833060.4749680.2284170.8108171.918797
2017-01-300.3414900.611065-0.6841250.9758580.595433-1.2319040.027245
2017-01-31-0.448969-0.416562-1.026428-1.529416-0.5672462.305233-0.186806
2017-02-010.3858290.0555172.3483670.419194-0.3857280.0490750.613855
2017-02-02-0.2629490.3337900.750670-0.7366430.0635610.6028150.328664
### 10.2. resample 'df' by month```pythondf_time2 = df_time.resample('m').mean()df_time2.head()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-31-0.849883-0.409675-0.664620-0.0261960.0855350.6280490.586412
2017-02-28-0.0904390.0590900.069942-0.049585-0.075990-0.1410150.258787
2017-03-310.133955-0.0674840.157919-0.0252380.2037280.217022-0.109552
2017-04-30-0.1271820.1696820.065305-0.2698410.2872390.1400860.193535
2017-05-31-0.0336770.088312-0.284405-0.1335220.114316-0.551639-0.158488
```pythondf_time3 = df_time2.tz_localize('utc')df_time3.head()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-31 00:00:00+00:00-0.849883-0.409675-0.664620-0.0261960.0855350.6280490.586412
2017-02-28 00:00:00+00:00-0.0904390.0590900.069942-0.049585-0.075990-0.1410150.258787
2017-03-31 00:00:00+00:000.133955-0.0674840.157919-0.0252380.2037280.217022-0.109552
2017-04-30 00:00:00+00:00-0.1271820.1696820.065305-0.2698410.2872390.1400860.193535
2017-05-31 00:00:00+00:00-0.0336770.088312-0.284405-0.1335220.114316-0.551639-0.158488
```pythondf_time3.tz_convert('US/Eastern').head()```
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
ABCDEFG
2017-01-30 19:00:00-05:00-0.849883-0.409675-0.664620-0.0261960.0855350.6280490.586412
2017-02-27 19:00:00-05:00-0.0904390.0590900.069942-0.049585-0.075990-0.1410150.258787
2017-03-30 20:00:00-04:000.133955-0.0674840.157919-0.0252380.2037280.217022-0.109552
2017-04-29 20:00:00-04:00-0.1271820.1696820.065305-0.2698410.2872390.1400860.193535
2017-05-30 20:00:00-04:00-0.0336770.088312-0.284405-0.1335220.114316-0.551639-0.158488
### 11. Plotting```pythondf.plot()```
### 12. Data I/O
```python
df_time.to_csv('DataFrame_time_data.csv')
```
```python
df_time.to_excel('DataFrame_time_data.xlsx', sheet_name = 'data_1')
```
```python
df_time.to_html('DataFrame_time_data.html')
```
```python
df_time.to_json('DataFrame_time_data.txt')
```
```python
```