印度女人丰满大屁股毛茸茸,亚洲同性猛男毛片,东京热无码AV一区二区

數據集中含有10072個圖片文件和10072個圖片所對應的包含圖片中中文字內容的文本文。

訓練樣本

task:

1.得到圖片數據集中所有的中文字符，構成字符字典，字典大小為所包含不同中文字符的類別數；（dict_size=992，加上一個“空白”，在CTC中一共含有992+1=993個類別）

2.構建訓練數據 train_x,train_y; train_x中每一個元素為一張圖片（cv2.imread()讀取的灰度圖），train_y 中每一個元素為圖片對應的文字在字符字典中的序號；

print("train_size:{}".format(len(train_x)))//輸出訓練集大小

code：

def get_char_dict(path):
char_dict = []
txt_files = glob.glob(path + '*.txt')
# print(len(txt_files))
for file in txt_files:
    with open(file, 'r') as f:
        text = f.readline()
    char_dict += text
char_dict = set(char_dict)
char_dict = list(char_dict)
dict_size = len(char_dict)
print("dict_size:{}".format(dict_size))
return char_dict


def get_data(path, char_dict):
  train_x = []
  train_y = []
  txt_files = glob.glob(path + '*.txt')
  # random.shuffle(txt_files)
  for file in txt_files:
      base_name = os.path.basename(file)
      file_name, _ = os.path.splitext(base_name)
      image = cv2.imread(path + file_name + '.jpg', cv2.IMREAD_GRAYSCALE)
      train_x.append(image)
      with open(file, 'r') as f:
        label = []
        text = f.readline()
        for c in text:
            index = char_dict.index(c)
            label.append(index)
    train_y.append(label)
# # 若圖片路徑中含中文字符時，
# # cv2.imread()讀取圖像失敗返回None,
# # 刪除為None的數據
# for i, img in enumerate(train_x):
#    if train_x[i] is None:
#        del train_x[i]
#        del train_y[i]
print("train_size:{}".format(len(train_x)))
return train_x, train_y

reference：

glob.glob()

返回path路徑下的符合條件的所有文件，然后用for循環對每一個文件進行操作。

import glob
txt_files = glob.glob(path + '*.txt')
for file in txt_files:
      ***

返回path路徑下的符合條件的所有文件

python open()

用于打開一個文件。創建一個 file 對象，相關的方法才可以調用它進行讀寫。

 with open(file, 'r') as f:
    text = f.readline()

readline()函數讀取整行，包括 "\n" 字符。

創建一個 file 對象，用readline()方法讀取.txt文件中的文字

python set()

返回一個無序不重復元素集(這里用于刪除重復的中文字符)。

python list()

用于將元組轉換為列表。

三个男躁一个女,国精产品一区一手机的秘密,麦子交换系列最经典十句话,欧美国产综合欧美视频

利用CRNN來識別圖片中的文字（一）數據預處理

利用CRNN來識別圖片中的文字（一）數據預處理

task:

code：

reference：

glob.glob()

python open()

python set()

python list()

三个男躁一个女,国精产品一区一手机的秘密,麦子交换系列最经典十句话,欧美 国产 综合 欧美 视频

利用CRNN來識別圖片中的文字（一）數據預處理

task:

code：

reference：

glob.glob()

python open()

python set()

python list()

三个男躁一个女,国精产品一区一手机的秘密,麦子交换系列最经典十句话,欧美国产综合欧美视频