Pandas Basic Attributes

If NumPy is like a list, Pandas is more like a dictionary: rows and columns carry names (labels), and those names can be changed.
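To make the list-versus-dictionary comparison concrete, here is a minimal sketch. The names `arr`, `df_demo`, and the labels `r1`, `r2`, `x`, `y`, `z` are made up for illustration and are not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# A plain NumPy array is addressed by position only.
arr = np.arange(6).reshape((2, 3))
by_position = arr[1, 2]

# Wrapping the same data in a DataFrame lets us address it by label.
df_demo = pd.DataFrame(arr, index=['r1', 'r2'], columns=['x', 'y', 'z'])
by_label = df_demo.loc['r2', 'z']

# rename returns a new DataFrame with the row/column names changed.
renamed = df_demo.rename(index={'r1': 'first'}, columns={'x': 'X'})
print(by_position, by_label)
print(renamed)
```

Both lookups reach the same cell; the DataFrame version just uses names instead of positions.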

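Creating a pandas Series automatically attaches an integer index and a dtype. Here the np.nan forces the integers to be stored as float64.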

import pandas as pd
import numpy as np
s = pd.Series([1,3,6,np.nan,44,1])
s

0     1.0
1     3.0
2     6.0
3     NaN
4    44.0
5     1.0
dtype: float64
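As a small follow-up sketch (the names `s_int`, `missing`, and `labelled` are my own, chosen for illustration), the snippet below shows that the float64 dtype comes from the NaN, how to locate the missing entry, and how to pass your own index instead of the default one.

```python
import numpy as np
import pandas as pd

# With a NaN present, the whole Series is stored as float64.
s = pd.Series([1, 3, 6, np.nan, 44, 1])

# Without NaN, the same kind of data stays an integer Series.
s_int = pd.Series([1, 3, 6])

# isna marks the missing entries with True.
missing = s.isna()

# A custom index replaces the automatic 0, 1, 2, ... labels.
labelled = pd.Series([1, 3, 6], index=['a', 'b', 'c'])
print(s.dtype, s_int.dtype, int(missing.sum()), labelled['b'])
```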

Creating a DataFrame

  1. With default row and column labels
df1 = pd.DataFrame(np.arange(12).reshape((3,4)))
df1

    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9   10  11
  2. With a date index
dates = pd.date_range('20160101',periods = 6)
dates

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
               '2016-01-05', '2016-01-06'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame(np.random.randn(6,4),index = dates,columns = ['a','b','c','d'])
df

            a           b           c           d
2016-01-01  -1.281511   1.713843    -0.606131   -0.699298
2016-01-02  -0.690049   -0.624657   1.521370    -0.226207
2016-01-03  1.280099    0.188350    -0.481156   0.131706
2016-01-04  -0.026690   0.899729    -0.678333   -1.096834
2016-01-05  0.517648    0.291178    -0.879998   -0.823239
2016-01-06  -1.936642   -0.286916   0.362583    0.444345
  3. From a dictionary that defines each column
df2 = pd.DataFrame({'A':1.,
                    'B':pd.Timestamp('20130102'),
                    'C':pd.Series(1,index=list(range(4)), dtype= 'float32' ),
                    'D':np.array([3]*4, dtype = 'int32'),
                    'E':pd.Categorical(["test","train","test","train"]),
                    'F':'foo'})
df2

    A   B           C   D   E       F
0   1.0 2013-01-02  1.0 3   test    foo
1   1.0 2013-01-02  1.0 3   train   foo
2   1.0 2013-01-02  1.0 3   test    foo
3   1.0 2013-01-02  1.0 3   train   foo
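In the dictionary form, scalar values such as 1. and 'foo' are repeated to match the length of the array-like columns. The sketch below, using the hypothetical names `df_ok`, `scalars_only_failed`, and `df_indexed`, illustrates this, along with the case where every value is a scalar and pandas has no length to go by.

```python
import numpy as np
import pandas as pd

# The array fixes the length at 4; the scalar 1.0 is repeated to fit.
df_ok = pd.DataFrame({'A': 1.0,
                      'D': np.array([3] * 4, dtype='int32')})

# With only scalars there is no length to infer, so pandas raises ValueError.
try:
    pd.DataFrame({'A': 1.0, 'F': 'foo'})
    scalars_only_failed = False
except ValueError:
    scalars_only_failed = True

# Passing an explicit index supplies the missing length.
df_indexed = pd.DataFrame({'A': 1.0, 'F': 'foo'}, index=range(4))
print(df_ok.shape, scalars_only_failed, df_indexed.shape)
```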

Basic attributes of a DataFrame

  1. Show the data type of each column
df2.dtypes 

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
  2. Show the row labels, column labels, and values
df2.index # row labels
Int64Index([0, 1, 2, 3], dtype='int64')

df2.columns # column labels
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')

df2.values # the underlying values as a NumPy array
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
       [1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
      dtype=object)
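The dtype=object in the output above appears because the columns hold different types, so NumPy falls back to a generic object array. A rough sketch of the difference, using a cut-down frame I call `df_mixed` (not from the original) and the `to_numpy()` method, which does the same job as `.values`:

```python
import numpy as np
import pandas as pd

# A small frame mixing float, int32 and string columns.
df_mixed = pd.DataFrame({'A': 1.0,
                         'D': np.array([3] * 4, dtype='int32'),
                         'F': 'foo'})

# Mixed column types collapse to a single object array.
all_values = df_mixed.to_numpy()

# Selecting only the numeric columns gives a proper numeric array.
numeric_values = df_mixed[['A', 'D']].to_numpy()
print(all_values.dtype, numeric_values.dtype)
```

If you plan to do arithmetic on the array, selecting the numeric columns first is usually the more convenient route.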
  3. describe: count, mean, standard deviation, min/max, and quartiles (non-numeric columns such as E and F are skipped by default)
df2.describe() 

        A   C   D
count   4.0 4.0 4.0
mean    1.0 1.0 3.0
std     0.0 0.0 0.0
min     1.0 1.0 3.0
25%     1.0 1.0 3.0
50%     1.0 1.0 3.0
75%     1.0 1.0 3.0
max     1.0 1.0 3.0
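The skipped columns can still be summarized if you ask for them. The following is only a sketch on a reduced frame I named `df_cat`; it relies on the `include` parameter of `describe` and on calling `describe` directly on a single column.

```python
import pandas as pd

df_cat = pd.DataFrame({'D': [3, 3, 3, 3],
                       'E': pd.Categorical(['test', 'train', 'test', 'train'])})

# Default behaviour: the categorical column E is left out.
numeric_summary = df_cat.describe()

# Describing the categorical column alone reports count, unique, top and freq.
cat_summary = df_cat['E'].describe()

# include='all' summarizes every column; statistics that do not apply show NaN.
full_summary = df_cat.describe(include='all')
print(numeric_summary)
print(cat_summary)
print(full_summary)
```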
  4. Transpose rows and columns
df2.T

    0   1   2   3
A   1   1   1   1
B   2013-01-02 00:00:00 2013-01-02 00:00:00 2013-01-02 00:00:00 2013-01-02 00:00:00
C   1   1   1   1
D   3   3   3   3
E   test    train   test    train
F   foo foo foo foo
  5. Sorting
df2.sort_index(axis=1,ascending=False) # sort columns by label, descending

    F   E   D   C   B   A
0   foo test    3   1.0 2013-01-02  1.0
1   foo train   3   1.0 2013-01-02  1.0
2   foo test    3   1.0 2013-01-02  1.0
3   foo train   3   1.0 2013-01-02  1.0


df2.sort_index(axis=0,ascending=False) # sort rows by index, descending

    A   B   C   D   E   F
3   1.0 2013-01-02  1.0 3   train   foo
2   1.0 2013-01-02  1.0 3   test    foo
1   1.0 2013-01-02  1.0 3   train   foo
0   1.0 2013-01-02  1.0 3   test    foo

df2.sort_values(by='E') # sort rows by the values in column E

    A   B   C   D   E   F
0   1.0 2013-01-02  1.0 3   test    foo
2   1.0 2013-01-02  1.0 3   test    foo
1   1.0 2013-01-02  1.0 3   train   foo
3   1.0 2013-01-02  1.0 3   train   foo
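Two details of the sorting calls above are easy to miss: they return a new DataFrame rather than changing the original, and `sort_values` accepts several columns with a separate direction for each. A minimal sketch, with a made-up frame `df_sort` whose D column is varied so the second sort key is visible:

```python
import pandas as pd

df_sort = pd.DataFrame({'E': ['test', 'train', 'test', 'train'],
                        'D': [3, 1, 2, 0]})

# Sort by E ascending, then by D descending within each E group.
sorted_df = df_sort.sort_values(by=['E', 'D'], ascending=[True, False])

print(sorted_df)
print(df_sort)  # the original keeps its row order
```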

These notes follow a Pandas tutorial; see the original link for the source.
