Python raises UnicodeDecodeError when reading a CSV file: 'utf-8' codec can't decode byte 0xc9 in position 0: invalid co...

Today I had to process a large number of CSV files and ran into the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc9 in position 0: invalid continuation byte. So I converted all of the files to UTF-8. The code is as follows:


# Convert files to UTF-8 encoding
import os

# Candidate encodings, tried in this order
CANDIDATE_ENCODINGS = ['utf-8', 'gbk', 'gb2312', 'gb18030', 'big5', 'cp936']

def change_code(original_file, newfile):
    # original_file: source directory; newfile: output directory
    files = os.listdir(original_file)
    for name in files:
        original_path = os.path.join(original_file, name)
        with open(original_path, 'rb') as f:
            content = f.read()

        # The first candidate that decodes without error is taken as the source encoding
        source_encoding = None
        for encoding in CANDIDATE_ENCODINGS:
            try:
                content.decode(encoding)
                source_encoding = encoding
                break
            except UnicodeDecodeError:
                continue
        if source_encoding is None:
            raise ValueError('Cannot determine the encoding of ' + original_path)

        # Read the file with the detected encoding and save it as UTF-8
        block_size = 4096
        newfile_path = os.path.join(newfile, name)
        # newline='' keeps the original line endings untouched
        with open(original_path, 'r', encoding=source_encoding, newline='') as f:
            with open(newfile_path, 'w', encoding='utf-8', newline='') as f2:
                while True:
                    content = f.read(block_size)
                    if not content:
                        break
                    f2.write(content)
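
To see why the error appears and how the trial-decoding step behaves, here is a minimal sketch that works on bytes in memory instead of files. The helper names `detect_encoding` and `utf8_failure` and the sample bytes are assumptions made for illustration and are not part of the original code. The candidate list mirrors the encodings tried above.

```python
# Sample bytes chosen for illustration (an assumption, not data from the post):
# they are valid GBK but not valid UTF-8, and the first byte is 0xc9,
# the same byte reported in the error message.
SAMPLE = b'\xc9\xcf\xba\xa3'

# Same candidates, in the same order, as the conversion code above.
CANDIDATES = ['utf-8', 'gbk', 'gb2312', 'gb18030', 'big5', 'cp936']


def detect_encoding(data, candidates=CANDIDATES):
    """Return the first candidate encoding that decodes data, or None.

    Hypothetical helper: a successful decode only shows that the bytes are
    valid in that encoding, not that the guess is the true encoding.
    """
    for encoding in candidates:
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return None


def utf8_failure(data):
    """Return the UnicodeDecodeError raised by a UTF-8 decode, or None."""
    try:
        data.decode('utf-8')
        return None
    except UnicodeDecodeError as exc:
        return exc


if __name__ == '__main__':
    # 0xc9 announces a two-byte UTF-8 sequence, but the next byte (0xcf)
    # is not a continuation byte, so decoding fails at position 0.
    print(utf8_failure(SAMPLE))
    print(detect_encoding(SAMPLE))
```

One caveat about the candidate order: in Python, cp936 is an alias of gbk, and anything that decodes as gb2312 should also decode as gbk, so in practice those two entries are unlikely ever to be selected after gbk has failed. Trial decoding is also only a heuristic; text in one legacy encoding can decode without error under another and produce wrong characters.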
