about

This article is a set of study notes on section 7.3 of the official Python 2 documentation, struct ("Interpret strings as packed binary data").
The official introduction reads:

This module performs conversions between Python values and C structs represented as Python strings. This can be used in handling binary data stored in files or from network connections, among other sources.

Put simply, it converts between Python values (e.g. int, float, string) and strings that hold raw binary data.

The main functions of the struct module are:

pack(fmt, v1, v2, ...)
unpack(fmt, string)
pack_into(fmt, buffer, offset, v1, v2, ...)
unpack_from(fmt, buffer, offset=0)

Each is introduced below.

pack() and unpack()

pack()

First, the official description:

pack(fmt, v1, v2, ...):

Return a string containing the values v1, v2, ... packed according to the given format. The arguments must match the values required by the format exactly.

In other words, it converts the values v1, v2, ... into a string according to the format fmt.

An example:

>>> import struct
>>> 
>>> v1 = 1
>>> v2 = 'abc'
>>> bytes = struct.pack('i3s', v1, v2)
>>> bytes
'\x01\x00\x00\x00abc'

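Here fmt is 'i3s'. What does that mean? The i stands for integer (a C int, 4 bytes here), and the s stands for string. In the example above, abc is a string of length 3, hence 3s.

Here is the complete list of format characters:

[Image: fmt.png, the table of format characters]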
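
As a rough stand-in for part of that table, the sketch below prints the size of a few common format characters. It is written for Python 3, which is an assumption on my part (the article targets Python 2, where packed data is a str rather than bytes). The '<' prefix is used so that the sizes are the standard ones and do not depend on the machine.

```python
import struct

# A few common format characters; the '<' prefix selects standard sizes,
# so the results do not depend on the platform.
codes = ['c', 'b', 'h', 'i', 'q', 'f', 'd', '3s']
sizes = {code: struct.calcsize('<' + code) for code in codes}

for code in codes:
    print(code, sizes[code])
```

A repeat count in front of s, as in 3s, gives the length of a single string rather than the number of fields.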

unpack()

Again, the official documentation first:

unpack(fmt, string)

Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).

Put simply, it parses the string according to the given fmt. Note that the result is always a tuple.

An example:

>>> bytes = '\x01\x00\x00\x00abc'
>>> v1, v2 = struct.unpack('i3s', bytes)
>>> v1
1
>>> v2
'abc'

This restores the v1 and v2 from above.

Note what happens when there is only one return value:

>>> a = 2
>>> a_pack = struct.pack('i',a)
>>> a_unpack = struct.unpack('i',a_pack)  # a_unpack is a tuple here
>>> a_unpack
(2,)
>>> a_unpack, = struct.unpack('i',a_pack) # a_unpack is an int here
>>> a_unpack
2
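
The two rules quoted from the documentation (the result is always a tuple, and the data length must equal calcsize(fmt)) can be checked with a minimal sketch. It assumes Python 3, so the packed data is bytes, and uses an explicit '<' prefix so the layout is the same on every machine.

```python
import struct

packed = struct.pack('<i3s', 1, b'abc')   # 4-byte int followed by a 3-byte string

# unpack() always returns a tuple, so unpack into as many names as there are fields.
v1, v2 = struct.unpack('<i3s', packed)

# The data length must equal calcsize(fmt); one byte short raises struct.error.
try:
    struct.unpack('<i3s', packed[:-1])
    length_checked = False
except struct.error:
    length_checked = True

print(v1, v2, length_checked)
```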

Byte Order, Size, and Alignment

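Here is a brief digression on byte order, size, and alignment.

byte order

See the table below:

[Image: order.png, the table of byte order, size, and alignment prefix characters]

If fmt starts with '<', the bytes are laid out in little-endian order; if it starts with '>', they are laid out in big-endian order. The default is '@', which uses the machine's native byte order.

An example: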

>>> a = 2
>>> a_pack = struct.pack('i',a)      # the default; it may differ between machines, and on mine it is little-endian
>>> a_pack
'\x02\x00\x00\x00'
>>> 
>>> a_pack2 = struct.pack('>i',a)    # '>' means big-endian
>>> a_pack2
'\x00\x00\x00\x02'
>>> 
>>> a_pack3 = struct.pack('<i',a)   # '<' means little-endian
>>> a_pack3
'\x02\x00\x00\x00'

If you pack with an explicit '<' or '>' instead of the default byte order, you have to be careful when unpacking:

>>> a = 2
>>> a_pack2 = struct.pack('>i',a)   # big-endian
>>> a_pack2
'\x00\x00\x00\x02'
>>> a_unpack, = struct.unpack('<i',a_pack2)    # little-endian
>>> a_unpack
33554432
>>> a_unpack2, = struct.unpack('>i', a_pack2)   # big-endian
>>> a_unpack2
2

As shown above, if pack and unpack do not use the same byte order, mixing little-endian and big-endian, the data comes out wrong.
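
The value 33554432 is not random: it is the same four bytes read with the opposite significance. The sketch below, assuming Python 3 (int.from_bytes does not exist in Python 2), makes that explicit and also shows that the '!' network prefix behaves like '>'.

```python
import struct
import sys

big = struct.pack('>i', 2)       # most significant byte first
little = struct.pack('<i', 2)    # least significant byte first

# '!' (network order) is big-endian, the same as '>'.
network = struct.pack('!i', 2)

# Reading big-endian bytes as little-endian reverses the byte significance:
# the result is 0x02000000, the surprising value seen above.
misread = int.from_bytes(big, 'little')

# sys.byteorder reports which order the native '@' mode uses on this machine.
print(sys.byteorder, big, little, misread)
```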

size and alignment

In fact, struct stores data in much the same way as a struct in C, so data alignment comes into play. On a 32-bit machine (one with a 4 GB address space), data is generally aligned on 4-byte boundaries: the CPU reads 4 bytes at a time and places them in the corresponding cache.

An example:

struct A{
  char c1;
  int a;
  char c2;
};

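How much memory does struct A occupy? Intuition says 1+4+1 = 6 bytes, but in general it is actually 12 bytes! After the first char variable c1 takes up one byte, the int variable a is not placed directly behind it, because of the 4-byte alignment. Instead, 3 bytes of padding are implicitly added after c1, and a is placed on the next row. Finally the char variable c2 is placed on the row below a, and that row is padded with 3 more bytes so that the total size is a multiple of 4: 4 + 4 + 4 = 12.
Now look at the following: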

struct A{
  char c1;
  char c2;
  int a;
};

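How much memory does struct A occupy in this case? The answer is 8 bytes. The principle is the same: the char variable c1 goes in first, leaving 3 bytes in its row. The next variable, the char c2, is only 1 byte, so it is placed right after c1. That leaves 2 bytes, which is not enough for an int, so those 2 bytes are filled with padding and a starts a new row.

Why do it this way? Isn't this a waste of memory?! In a sense it is, but it improves the efficiency of the CPU!
Consider this scenario: suppose a row of memory already holds a 1-byte char variable c, and the next variable is an int a of 4 bytes. Its first 3 bytes are placed after c, and its last byte is placed on the next row.
What does the CPU have to do to read a? It has to read twice! One read fetches 3 bytes and the other fetches 1 byte, so the access is twice as slow. If a starts on a new row instead, a single read fetches all 4 bytes.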
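
The two C layouts above can be inspected from Python without a compiler. This is a minimal sketch that assumes ctypes.Structure follows the platform's C layout rules (it does by default); the class names A1 and A2 are my own labels for the first and second struct.

```python
import ctypes

class A1(ctypes.Structure):
    # Same member order as the first C struct: char, int, char
    _fields_ = [('c1', ctypes.c_char), ('a', ctypes.c_int), ('c2', ctypes.c_char)]

class A2(ctypes.Structure):
    # Same member order as the second C struct: char, char, int
    _fields_ = [('c1', ctypes.c_char), ('c2', ctypes.c_char), ('a', ctypes.c_int)]

for cls in (A1, A2):
    # Each field descriptor exposes the byte offset chosen by the layout rules.
    offsets = [(name, getattr(cls, name).offset) for name, _ in cls._fields_]
    print(cls.__name__, 'size:', ctypes.sizeof(cls), 'offsets:', offsets)
```

On a typical platform with a 4-byte, 4-byte-aligned int, this should report sizes of 12 and 8, with a at offset 4 in both cases.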

calcsize()

With the basics above in mind, it is easy to see what this function does.

The documentation says:

struct.calcsize(fmt)

Return the size of the struct (and hence of the string) corresponding to the given format.

Put simply, it computes how many bytes the struct described by fmt occupies.

An example:

>>> struct.calcsize('ci')
8
>>> struct.calcsize('ic')
5

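The format table above shows that c corresponds to a char of 1 byte and i to an int of 4 bytes, which explains these results; the reasoning is the same as before and is not repeated. As for 'ic' giving 5, my guess is that the last row of memory occupied by the struct is not padded. This matches the documented behaviour: in native mode, padding is only inserted between fields, never after the last one.

The examples above all include padding. What happens without it?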

>>> struct.calcsize('<ci')
5
>>> struct.calcsize('@ci')
8

If fmt starts with '<', '>', '=' or '!', no padding is added. With the default, or with an explicit '@', padding is added.
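
The missing trailing padding is the one place where calcsize() differs from sizeof in C. A short sketch, assuming a platform where int is 4 bytes and 4-byte aligned: ending the format with a zero-count code such as '0i' asks struct to pad the end to that type's alignment.

```python
import struct

native = struct.calcsize('@cic')           # padding before the int only
native_padded = struct.calcsize('@cic0i')  # '0i' pads the end to int alignment
standard = struct.calcsize('<cic')         # no padding at all: 1 + 4 + 1

print(native, native_padded, standard)
```

Under that assumption the three values are 9, 12 and 6, and the middle one matches the 12 bytes of the first C struct shown earlier.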

pack_into() and unpack_from()

Before going into the details, here are a few functions as a warm-up.

binascii module

This module converts between binary data and ASCII. A few of its functions are introduced below.

binascii.b2a_hex(data)
binascii.hexlify(data)

Return the hexadecimal representation of the binary data. Every byte of data is converted into the corresponding 2-digit hex representation. The resulting string is therefore twice as long as the length of data.

Put simply, it represents binary data in hexadecimal.

An example:

>>> import binascii
>>> s = 'abc'
>>> binascii.b2a_hex(s)
'616263'
>>> binascii.hexlify(s)
'616263'

binascii.a2b_hex(hexstr)
binascii.unhexlify(hexstr)

Return the binary data represented by the hexadecimal string hexstr. This function is the inverse of b2a_hex()
hexstr must contain an even number of hexadecimal digits (which can be upper or lower case), otherwise a TypeError is raised.

Put simply, it is the inverse of the function above: it converts a hexadecimal string back into binary data.

An example:

>>> binascii.a2b_hex('616263')
'abc'
>>> binascii.unhexlify('616263')
'abc'
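
For readers on Python 3 (an assumption; the transcripts above are Python 2), the same round trip works on bytes. One difference worth knowing: an odd number of hex digits raises binascii.Error there, rather than the TypeError mentioned in the Python 2 documentation quoted above.

```python
import binascii

data = b'abc'
hex_bytes = binascii.hexlify(data)          # two hex digits per input byte
round_trip = binascii.unhexlify(hex_bytes)  # back to the original bytes

# An odd number of hex digits cannot be decoded.
try:
    binascii.unhexlify(b'61626')
    odd_rejected = False
except binascii.Error:
    odd_rejected = True

print(hex_bytes, round_trip, odd_rejected)
```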

pack_into() and unpack_from()

The documentation says:

struct.pack_into(fmt, buffer, offset, v1, v2, ...)

Pack the values v1, v2, ... according to the given format, write the packed bytes into the writable buffer starting at offset. Note that the offset is a required argument.

Put simply, it packs the values v1, v2, ... according to fmt and writes the result into the given memory buffer. The offset specifies where in the buffer writing starts.

struct.unpack_from(fmt, buffer[, offset=0])

Unpack the buffer according to the given format. The result is a tuple even if it contains exactly one item. The buffer must contain at least the amount of data required by the format (len(buffer[offset:]) must be at least calcsize(fmt)).

Put simply, it reads from the given buffer in memory and parses the data according to fmt. The offset specifies where in the buffer reading starts.

What do these two functions offer compared with pack and unpack? The visible difference is the extra buffer, a region of memory. With pack, the values v1, v2 are packed into a newly allocated region that the program chooses internally, and it may set aside more space than is needed, which is somewhat wasteful. Moreover, if values arrive intermittently, new space has to be allocated each time, which is slow. It is better to allocate the buffer once ourselves. We also decide exactly how much memory it gets, so none is wasted.

An example:

import struct
import binascii
import ctypes

vals1 = (1, 'hello', 1.2)
vals2 = ('world', 2)
s1 = struct.Struct('I5sf')
s2 = struct.Struct('5sI')
print 's1 format: ', s1.format
print 's2 format: ', s2.format

b_buffer = ctypes.create_string_buffer(s1.size+s2.size)  # allocate one buffer large enough for both structs
print 'Before pack:',binascii.hexlify(b_buffer)
s1.pack_into(b_buffer,0,*vals1)
s2.pack_into(b_buffer,s1.size,*vals2)
print 'After pack:',binascii.hexlify(b_buffer)
print 'vals1 is:', s1.unpack_from(b_buffer,0)
print 'vals2 is:', s2.unpack_from(b_buffer,s1.size)

Output:

s1 format:  I5sf
s2 format:  5sI
Before pack: 00000000000000000000000000000000000000000000000000000000
After pack: 0100000068656c6c6f0000009a99993f776f726c6400000002000000
vals1 is: (1, 'hello', 1.2000000476837158)
vals2 is: ('world', 2)

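At first glance, the difference from before is that we used the class struct.Struct(format). Earlier the code was procedural and now it is object-oriented, but the functions do the same thing.
One point to note here: the float does not come back exactly after unpack! The 'f' format is a 4-byte single-precision float, which cannot represent 1.2 exactly.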
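
The float issue can be isolated in a few lines. This sketch assumes Python 3 and compares the 4-byte 'f' format with the 8-byte 'd' format, which has the same precision as a Python float.

```python
import struct

value = 1.2
as_float = struct.unpack('<f', struct.pack('<f', value))[0]   # 4-byte single precision
as_double = struct.unpack('<d', struct.pack('<d', value))[0]  # 8-byte double precision

print(as_float)             # close to 1.2, but not equal to it
print(as_double == value)   # the double survives the round trip unchanged
```

If the exact value matters, 'd' is the safer choice, at the cost of 4 extra bytes per number.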

Because vals1 and vals2 are tuples, they are passed with a star, as in *vals1. The star unpacks the tuple into separate arguments. Without it, the call fails with an argument error.
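
The same idea can be sketched for Python 3 with a plain bytearray instead of a ctypes buffer. The record layout ('<I5s', an id plus a 5-byte name) and the sample rows are hypothetical and chosen only for illustration. The buffer is allocated once and each record is written at its own offset.

```python
import struct

record = struct.Struct('<I5s')           # hypothetical record: id + 5-byte name
rows = [(1, b'hello'), (2, b'world')]

# Allocate the whole buffer once, then write each record at its own offset.
buf = bytearray(record.size * len(rows))
for index, row in enumerate(rows):
    record.pack_into(buf, index * record.size, *row)   # *row spreads the tuple

decoded = [record.unpack_from(buf, index * record.size) for index in range(len(rows))]

# Passing the tuple itself (no star) supplies one argument where two are expected.
try:
    record.pack_into(buf, 0, rows[0])
    star_needed = False
except struct.error:
    star_needed = True

print(decoded, star_needed)
```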

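References

Official Python 2 documentation, 7.3. struct: Interpret strings as packed binary data
糖拌咸魚, "淺析Python中的struct模塊" (A brief analysis of the struct module in Python)
Official Python 2 documentation, 18.14. binascii: Convert between binary and ASCII
zhangxinrun, "結構體struct的自然對齊問題（經典）" (The natural alignment of structs, a classic)
鄭誠's answer on Zhihu to "元組的reference前加個星號是什麼意思？" (What does a star in front of a tuple reference mean?)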
