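This article is mainly about indexing, but before walking through how each kind of index is used, I'll ramble through some background first. None of it is the main point, but you do need to know it. There is no particular order; it's just rambling.
All English quotations below come from the official NumPy documentation page on Indexing.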
ndarrays can be indexed using the standard Python x[obj] syntax, where x is the array and obj the selection. There are three kinds of indexing available: field access, basic slicing, advanced indexing. Which one occurs depends on obj.
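The form is x[obj], where x is the array and obj is the selection. There are three kinds of indexing: field access, basic slicing, and advanced indexing. This is the official documentation's classification, and it differs a little from the way we usually talk about indexing.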
In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]; the latter is just syntactic sugar for the former.
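In Python, x[(exp1, exp2, ..., expN)] is equivalent to x[exp1, exp2, ..., expN]. In addition, the book 《數據分析》 ("Data Analysis") says that "x[1][2] is equivalent to x[1,2]". The two forms return the same element, although x[1][2] indexes twice and creates an intermediate array along the way.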
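A quick way to check these equivalences is to try all three spellings on a small array. This is only a minimal sketch; the array `x` and its values are made up for illustration.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# A tuple index and the comma form are the same indexing operation.
a = x[(1, 2)]
b = x[1, 2]
# Chained indexing reaches the same element, but builds the row x[1] first.
c = x[1][2]
print(a, b, c)  # 6 6 6
```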
All arrays generated by basic slicing are always views of the original array.
Arrays produced by basic slicing are views of the original array.
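The view behaviour can be confirmed with `np.shares_memory`. The sketch below assumes a small illustrative array; the names `x` and `v` are not from the documentation.

```python
import numpy as np

x = np.arange(10)
v = x[2:5]          # basic slice: a view, no data is copied
v[0] = 100          # writing to the view writes through to x
print(x[2])                     # 100
print(np.shares_memory(x, v))   # True
```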
Basic slicing with more than one non-: entry in the slicing tuple, acts like repeated application of slicing using a single non-: entry, where the non-: entries are successively taken (with all other non-: entries replaced by :). Thus, x[ind1,...,ind2,:] acts like x[ind1][...,ind2,:] under basic slicing.
Warning: The above is not true for advanced indexing.
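Basic slicing with several non-: entries in the slicing tuple behaves like applying slicing repeatedly with a single non-: entry, taking the non-: entries one after another. The chained form only works when the entry you split off comes first: in my tests, splitting off a leading : fails, because x[:][ind] is simply x[ind] and not x[:, ind]. So x[ind1,...,ind2,:] is equivalent to x[ind1][...,ind2,:].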
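The following sketch checks the equivalence on a 3-D array and shows why a leading `:` cannot be split off the same way. The shape (2, 3, 4) and the index values are arbitrary choices for illustration.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Leading integer first, then the rest: both forms pick the same data.
left = x[1, ..., 2, :]
right = x[1][..., 2, :]
print(np.array_equal(left, right))   # True

# With ':' in front, the chained form is a different operation:
# x[:][1] is just x[1], not index 1 along axis 1.
print(x[:, 1].shape, x[:][1].shape)  # (2, 4) (3, 4)
```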
You may use slicing to set values in the array, but (unlike lists) you can never grow the array. The size of the value to be set in x[obj] = value must be (broadcastable) to the same shape as x[obj].
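When assigning through a slice index, the shape of value must match the shape of x[obj]. If the shapes differ, value must be broadcastable to that shape, and the shape of the array still cannot change after the assignment.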
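Here is a small sketch of what "broadcastable" means in practice for slice assignment. The array and the assigned values are hypothetical.

```python
import numpy as np

x = np.zeros((3, 4), dtype=int)

x[:, 1:3] = [7, 8]        # shape (2,) broadcasts to the (3, 2) target
x[0, :] = 5               # a scalar broadcasts to any shape
print(x)

try:
    x[:, 1:3] = [1, 2, 3] # shape (3,) cannot broadcast to (3, 2)
except ValueError as err:
    print("rejected:", err)

print(x.shape)            # (3, 4): assignment never resizes the array
```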
Advanced indexing is triggered when the selection object, obj, is a non-tuple sequence object, an ndarray (of data type integer or bool), or a tuple with at least one sequence object or ndarray (of data type integer or bool). There are two types of advanced indexing: integer and Boolean.
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
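This is the definition of advanced indexing, which is what we usually call fancy indexing. Advanced indexing is triggered when obj in x[obj] is a non-tuple sequence object, or an ndarray whose dtype is integer or bool, or a tuple that contains at least one sequence object or ndarray of integer or bool dtype.
The translation here really is a mouthful, so please point out any mistakes. The "non-tuple" at the start presumably rules out a tuple of plain numbers, e.g. (2,3,4): a tuple is also a sequence object, but x[(2,3,4)] is the same as x[2,3,4], which is basic indexing.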
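To make the trigger conditions concrete, the sketch below contrasts a tuple of plain integers (basic indexing) with an integer ndarray and with a tuple containing a list (both advanced indexing), and checks the view versus copy behaviour. All names and values are illustrative.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

basic = x[(1, 2, 3)]            # tuple of ints: same as x[1, 2, 3], basic indexing
fancy = x[np.array([1, 0])]     # integer ndarray: advanced indexing
mixed = x[1, [0, 2]]            # tuple containing a list: advanced indexing

print(basic)                         # 23
print(fancy.shape, mixed.shape)      # (2, 3, 4) (2, 4)
print(np.shares_memory(x, x[0:1]))   # True: a basic slice is a view
print(np.shares_memory(x, fancy))    # False: advanced indexing copies
```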
Integer array indexing
Integer array indexing allows selection of arbitrary items in the array based on their N-dimensional index. Each integer array represents a number of indexes into that dimension.
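Integer array indexing lets you select arbitrary elements of the array by their N-dimensional index; each integer array holds a number of indices into one particular dimension.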
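As a minimal sketch of "one integer array per dimension", the hypothetical `rows` and `cols` arrays below are paired element by element to form (row, column) positions.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
rows = np.array([0, 2, 2])
cols = np.array([1, 3, 0])
picked = x[rows, cols]   # elements at (0, 1), (2, 3), (2, 0)
print(picked.tolist())   # [1, 11, 8]
```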
Combining advanced and basic indexing
When there is at least one slice (:), ellipsis (...) or np.newaxis in the index (or the array has more dimensions than there are advanced indexes), then the behaviour can be more complicated. It is like concatenating the indexing result for each advanced index element.
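When an advanced index is mixed with basic indexing such as a slice, the behaviour is like indexing once for each element of the advanced index and concatenating the results. To be honest, I'm not really sure what the sentence in parentheses is saying; my best reading is that it covers cases such as x[ind] on a multi-dimensional array, where the trailing axes that receive no index behave like implicit slices.
An example follows below. It is genuinely complicated, and I'm largely guessing.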
The easiest way to understand the situation may be to think in terms of the result shape. There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. Two cases of index combination need to be distinguished:
- The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
- The advanced indexes are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1, :, 1] since 1 is an advanced index in this regard.
In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that. In the second case, the dimensions from the advanced indexing operations are inserted into the result array at the same spot as they were in the initial array (the latter logic is what makes simple advanced indexing behave just like slicing).
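The two cases are easiest to see by comparing result shapes. The sketch below uses a (10, 20, 30) array and two hypothetical index arrays of shape (3,); only the shapes matter here.

```python
import numpy as np

x = np.zeros((10, 20, 30))
arr1 = np.array([0, 1, 2])        # shape (3,)
arr2 = np.array([3, 4, 5])        # shape (3,)

separated = x[arr1, :, arr2]      # advanced indexes split by a slice
adjacent = x[:, arr1, arr2]       # advanced indexes next to each other

print(separated.shape)  # (3, 20): the index dimension moves to the front
print(adjacent.shape)   # (10, 3): the index dimension stays in place
```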
Example
Suppose x.shape is (10,20,30) and ind is a (2,3,4)-shaped indexing intp array, then result = x[...,ind,:] has shape (10,2,3,4,30) because the (20,)-shaped subspace has been replaced with a (2,3,4)-shaped broadcasted indexing subspace. If we let i, j, k loop over the (2,3,4)-shaped subspace then result[...,i,j,k,:] = x[...,ind[i,j,k],:]. This example produces the same result as x.take(ind, axis=-2).
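The documentation example can be reproduced directly. In this sketch the contents of `x` and `ind` are random placeholders (the seed and generator are my own choice); what is checked is the result shape, the agreement with `take`, and one instance of the element-wise relation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 20, 30))
ind = rng.integers(0, 20, size=(2, 3, 4))   # indices into the axis of length 20

result = x[..., ind, :]
print(result.shape)                                   # (10, 2, 3, 4, 30)
print(np.array_equal(result, x.take(ind, axis=-2)))   # True
# One instance of result[..., i, j, k, :] == x[..., ind[i, j, k], :]
print(np.array_equal(result[:, 1, 2, 3, :], x[:, ind[1, 2, 3], :]))  # True
```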
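I can't say I fully follow this example, but it does explain an odd problem I ran into before: indexing a 3×3 array with a 2×2 array produces a 2×2×3 array, yet a hand-built nested list with the same structure does not reproduce this. It gives the same result as passing two separate arrays.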
In [62]: array
Out[62]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [63]: aa
Out[63]:
array([[1, 0],
       [0, 1]])

In [64]: array[aa]
Out[64]:
array([[[4, 5, 6],
        [1, 2, 3]],

       [[1, 2, 3],
        [4, 5, 6]]])

In [65]: array[[1,0],[0,1]]
Out[65]: array([4, 2])

In [66]: array[[[1,0],[0,1]]]
Out[66]: array([4, 2])
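If this needs an explanation, I think it is the following. When the index is a single ndarray, every entry in it, at whatever nesting depth, indexes the same axis of the target array (axis 0 here). That is unlike a pair of index lists, where the first list indexes axis 0 and the second indexes axis 1. The entries are not applied recursively; they are peers, and the rows they select are peers too. The shape does follow the documentation's rule (the length-3 axis 0 is replaced by the index shape (2, 2), giving (2, 2, 3)), but that replacement only applies to a single index array, not to a nested list that is read as one index per axis. Note that Out[66] reflects the NumPy version I used, where a bare nested list was read as a tuple of index lists. As far as I know, newer releases (NumPy 1.23 and later) treat array[[[1,0],[0,1]]] as a single index array, so it returns the same 2×2×3 result as array[aa].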
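The difference between "one index array" and "one index per axis" can be checked without depending on how a particular NumPy version reads a bare nested list. This sketch passes the pair form as an explicit tuple, which is unambiguous; the variable names are my own.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
aa = np.array([[1, 0], [0, 1]])

by_array = x[aa]               # one index array: every entry picks a row (axis 0)
by_pair = x[[1, 0], [0, 1]]    # two index lists: first is axis 0, second is axis 1
by_tuple = x[([1, 0], [0, 1])] # the same pair, written as an explicit tuple

print(by_array.shape)      # (2, 2, 3)
print(by_pair.tolist())    # [4, 2]
print(by_tuple.tolist())   # [4, 2]
```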
Boolean array indexing
This advanced indexing occurs when obj is an array object of Boolean type, such as may be returned from comparison operators.
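Boolean indexing occurs when obj is a Boolean array, such as one returned by a comparison operator.
In fact, Boolean indexing behaves much like advanced indexing with integer lists.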
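One way to see the similarity is to convert a Boolean mask into integer index arrays with `nonzero()` and compare the two selections. The array below is a made-up example.

```python
import numpy as np

x = np.array([[1, -2, 3], [-4, 5, -6]])
mask = x < 0

by_mask = x[mask]
by_ints = x[mask.nonzero()]    # tuple of integer arrays, one per axis
print(by_mask.tolist())                    # [-2, -4, -6]
print(np.array_equal(by_mask, by_ints))    # True
```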
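Right, enough rambling; it's tiring. From here on I'll go through each kind of indexing properly, entirely through examples. Everything I could think of that needs attention is written into the examples.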
- Basic indexing
In [4]: arr
Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]: arr[5]
Out[5]: 5

In [7]: arr2
Out[7]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [8]: arr2[1][2]
Out[8]: 5

In [9]: arr2[1,2]
Out[9]: 5
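x[a][b] == x[a,b]: the list of indices is applied one dimension at a time. a indexes along axis 0 (the outermost dimension) and b indexes along axis 1 (the next one in).
- Slice indexing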
In [10]: arr
Out[10]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [11]: arr[2:5]
Out[11]: array([2, 3, 4])

In [12]: arr[2::2]
Out[12]: array([2, 4, 6, 8])

In [13]: arr2
Out[13]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [14]: arr2[1:]
Out[14]:
array([[3, 4, 5],
       [6, 7, 8]])

In [15]: arr2[1:,1:]
Out[15]:
array([[4, 5],
       [7, 8]])

In [16]: arr2[:,:1]
Out[16]:
array([[0],
       [3],
       [6]])

In [55]: arr2
Out[55]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [56]: temp = arr2[2:]

In [57]: temp
Out[57]: array([[6, 7, 8]])

In [58]: temp = 9

In [59]: temp
Out[59]: 9

In [60]: arr2
Out[60]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])   # unchanged: temp was rebound to a new object

In [61]: temp = arr2[2:]

In [62]: temp
Out[62]: array([[6, 7, 8]])

In [63]: temp[:] = 9

In [64]: temp
Out[64]: array([[9, 9, 9]])

In [65]: arr2
Out[65]:
array([[0, 1, 2],
       [3, 4, 5],
       [9, 9, 9]])   # writing through the slice changed temp and arr2 together
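Slicing selects a range along an axis. The selected elements are peers on that axis, and the sliced axis is kept in the result, so the original dimensionality is preserved. For example, slicing out the first column of a 3×3 array with arr2[:, :1] gives a (3, 1) array, whereas an integer index, arr2[:, 0], gives a (3,) array.
Slices and integer indexes can be mixed in the same index expression.
An array produced by the basic indexing above or by slicing is a view of the original array, so changing the view changes the original data. (Indexing all the way down to a single element, as in arr[5], returns a scalar rather than a view.)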
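The shape difference between a slice and an integer index, and the mixing of the two, can be sketched as follows, using the same 3×3 `arr2` as in the session above.

```python
import numpy as np

arr2 = np.arange(9).reshape(3, 3)
print(arr2[:, :1].shape)      # (3, 1): the slice keeps the axis
print(arr2[:, 0].shape)       # (3,): the integer drops the axis
print(arr2[1:, 0].tolist())   # [3, 6]: slice on axis 0, integer on axis 1
```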
- Advanced indexing
- Integer array indexing
In [10]: arr
Out[10]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [11]: arr[[5,2,1,0]]
Out[11]:
array([[20, 21, 22, 23],
       [ 8,  9, 10, 11],
       [ 4,  5,  6,  7],
       [ 0,  1,  2,  3]])

In [12]: arr[[5,2,1,0],[2,2,1,2]]   # the two lists form index pairs
Out[12]: array([22, 10,  5,  2])

In [13]: arr[[[0,0],[5,5]],[[0,3],[0,3]]]   # selecting the four corners, method 1
Out[13]:
array([[ 0,  3],
       [20, 23]])

In [14]: arr[[[0],[5]],[0,3]]   # selecting the four corners, method 2
Out[14]:
array([[ 0,  3],
       [20, 23]])
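Integer array indexing selects elements by pairing the index arrays element by element into index tuples, with each array standing for a different axis. If the arrays have different shapes, the selection still works as long as they can be broadcast against each other to form the pairs.
If you need to select a rectangular region, you can combine an advanced index with a slice, or use the np.ix_ function. np.ix_ takes 1-D sequences, one per axis (two for a 2-D array), and returns a tuple of arrays; looking at the shapes of those arrays makes it clear how np.ix_ works.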
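The sketch below looks at the shapes `np.ix_` returns and uses them to select the four corners of the same 6×4 array, then shows the "advanced index plus slice" alternative. The row and column choices are illustrative.

```python
import numpy as np

arr = np.arange(24).reshape(6, 4)
rows, cols = [0, 5], [0, 3]

grid = np.ix_(rows, cols)
print([g.shape for g in grid])   # [(2, 1), (1, 2)]: shapes that broadcast to (2, 2)
corners = arr[grid]
print(corners.tolist())          # [[0, 3], [20, 23]]

# An advanced index on the rows plus a slice on the columns selects a region.
block = arr[[5, 2], 1:3]
print(block.tolist())            # [[21, 22], [9, 10]]
```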
- Boolean array indexing
In [22]: arr
Out[22]:
array([[ 1.12105851,  0.27287448,  0.07762638, -0.26287726],
       [ 0.78763995, -0.48796014,  0.3238146 ,  0.22576988],
       [ 0.86004933,  1.79189963, -0.88055021, -0.1065679 ]])

In [23]: arr[np.array([False,True,False])]
Out[23]: array([[ 0.78763995, -0.48796014,  0.3238146 ,  0.22576988]])

In [24]: arr[arr < 0]
Out[24]: array([-0.26287726, -0.48796014, -0.88055021, -0.1065679 ])

In [25]: arr[arr < 0] = 0   # setting values through a Boolean array

In [26]: arr
Out[26]:
array([[ 1.12105851,  0.27287448,  0.07762638,  0.        ],
       [ 0.78763995,  0.        ,  0.3238146 ,  0.22576988],
       [ 0.86004933,  1.79189963,  0.        ,  0.        ]])
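Selecting data with a Boolean array always creates a copy of the data, because Boolean array indexing is one kind of advanced indexing. Assigning through a Boolean index, as in In [25], still writes into the original array.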
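The difference between reading through a mask (a copy) and assigning through a mask (in place) can be sketched like this; the small array is a hypothetical stand-in for the random one above.

```python
import numpy as np

arr = np.array([[1.0, -2.0], [-3.0, 4.0]])
mask = arr < 0

selected = arr[mask]   # a copy of the selected elements
selected[:] = 0        # changes only the copy
print(arr.tolist())    # [[1.0, -2.0], [-3.0, 4.0]]

arr[mask] = 0          # assignment through the mask writes into arr itself
print(arr.tolist())    # [[1.0, 0.0], [0.0, 4.0]]
```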
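- ndarray indexing
Using an ndarray as an index was already covered above, so I won't repeat it here.
As always, if anything here is inaccurate or misunderstood, corrections are welcome.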