Building a Chatbot with TensorFlow

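Last time I mentioned a good resource for learning how to build chatbots. I wonder whether you have gone through it yet:
自己動手做聊天機器人教程 (the "Build Your Own Chatbot" tutorial series)
I have been working through a little of it every day, and here I share my reading notes.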

Outline of this post:

1. Architecture of a chatbot
2. Implementing the chatbot model with TensorFlow
3. Preparing the chatbot training data
4. Walking through the chatbot source code

1. Architecture of a chatbot

Learning resource:
[自己動手做聊天機器人九-聊天機器人應該怎么做](http://www.shareditor.com/blogshow/?blogId=73) (Build Your Own Chatbot, part 9: how a chatbot should be built)

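A chatbot's workflow is roughly: question analysis, retrieval, answer extraction.

Question analysis: work out the keywords in the user's question, the type of question, and what the user actually wants to know.

Retrieval: look for answers based on the analysis from the previous step.

Answer extraction: the retrieved results cannot be used directly; they have to be condensed into a reply that is actually useful as an answer.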
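
To make the three stages concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the tiny `KNOWLEDGE_BASE`, the stop-word list, and the keyword-overlap scoring are hypothetical stand-ins, not the techniques the tutorial uses.

```python
import re

# Hypothetical toy knowledge base, invented for this illustration.
KNOWLEDGE_BASE = [
    "TensorFlow is an open source machine learning library",
    "word2vec maps each word to a dense vector",
    "Seq2Seq models use an encoder and a decoder",
]

# Hypothetical stop-word list; a real system would use proper NLP tooling.
STOP_WORDS = {"what", "is", "a", "an", "the", "does", "do", "how", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_question(question):
    """Question analysis: reduce the question to a set of keywords."""
    return {w for w in tokenize(question) if w not in STOP_WORDS}


def retrieve(keywords):
    """Retrieval: rank knowledge-base entries by keyword overlap."""
    scored = [(len(keywords & set(tokenize(entry))), entry)
              for entry in KNOWLEDGE_BASE]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored if score > 0]


def extract_answer(candidates):
    """Answer extraction: here simply the best candidate, or a fallback."""
    return candidates[0] if candidates else "Sorry, I do not know."


def answer(question):
    return extract_answer(retrieve(analyze_question(question)))


print(answer("What is word2vec?"))
```

Each function maps to one stage, so any of them can be swapped for something smarter without touching the other two.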

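The key technologies involved are shown in the figure.

In case the figure is hard to read, here they are as text:

Question parsing:
Chinese word segmentation, part-of-speech tagging, entity tagging, concept category tagging, syntactic analysis, semantic analysis, logical structure tagging, coreference resolution, relation tagging, question classification, and answer type determination.

Knowledge representation for massive text:
acquisition of web text resources, machine learning methods, large-scale semantic computation and inference, knowledge representation systems, and knowledge base construction.

Answer generation and filtering:
candidate answer extraction, relation inference, match scoring, and noise filtering.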

2. Implementing the chatbot model with TensorFlow

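I previously wrote a post based on Siraj's video, 《自己動手寫個聊天機器人吧》 (Let's Write a Chatbot Ourselves).
That post only covered the simple flow of the main function (Data, Model, Training) and was implemented in Lua; the detailed code is available on his GitHub.

The article below is implemented with TensorFlow and the tflearn library, and it goes into more detail on model building, training, and prediction:

Learning resource: 自己動手做聊天機器人三十八-原來聊天機器人是這么做出來的 (Build Your Own Chatbot, part 38: so this is how chatbots are made)

What the two have in common is that both use Seq2Seq.

The structure of the LSTM model is:

For the details, go straight to the article above. Here I post a brief flowchart and description of the model-building stage:

First, the raw data, "300w chat" (3 million chat records), is preprocessed: the text is segmented into words and split into question-answer pairs.
Then word2vec is used to train word vectors, producing a binary word-vector file.

These vectors are fed as input data X into the following flow:

The question goes into the LSTM encoder and the answer goes into the decoder, each producing an output tensor.
The decoder generates its result one word at a time, and all of these results are collected in a list.
Finally, that list, together with the encoder output, becomes the input to the next stage, Regression, and is passed into the DNN network.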

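3. Preparing the chatbot training data

Learning resource:
自己動手做聊天機器人三十八-原來聊天機器人是這么做出來的 (Build Your Own Chatbot, part 38)

The training data is generated as follows:

First, each line of the input file is read and split on '|' into a question sentence and an answer sentence.
In each sentence, every word is converted to a word vector via word2vec.
Each sentence's vector sequence is then padded to the same size: self.max_seq_len vectors of self.word_vec_dim dimensions each (self.word_vec_dim * self.max_seq_len values).
Finally, the answer (with a GO vector prepended) forms the y data and question + answer forms the xy data, which are fed into the model for training:

model.fit(trainXY, trainY, n_epoch=1000, snapshot_epoch=False, batch_size=1)

The code is as follows: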

def init_seq(input_file):
    """Read the word-segmented text file and load all word-vector sequences.

    Relies on the module-level globals word_vector_dict, question_seqs
    and answer_seqs. Written for Python 3.
    """
    with open(input_file, 'r', encoding='utf-8') as file_object:
        for line in file_object:
            line_pair = line.rstrip('\n').split('|')
            if len(line_pair) < 2:
                continue  # skip lines that are not 'question|answer'
            line_question = line_pair[0]
            line_answer = line_pair[1]
            question_seq = []
            answer_seq = []
            # Keep only the words that have a word2vec vector
            for word in line_question.split(' '):
                if word in word_vector_dict:
                    question_seq.append(word_vector_dict[word])
            for word in line_answer.split(' '):
                if word in word_vector_dict:
                    answer_seq.append(word_vector_dict[word])
            question_seqs.append(question_seq)
            answer_seqs.append(answer_seq)

    def generate_trainig_data(self):
        # Method of MySeq2Seq: builds fixed-length training arrays
        xy_data = []
        y_data = []
        for question_seq, answer_seq in zip(question_seqs, answer_seqs):
            # Only pairs shorter than max_seq_len are used
            if len(question_seq) < self.max_seq_len and len(answer_seq) < self.max_seq_len:
                # Question: zero vectors in front, then the question in reversed order
                sequence_xy = [np.zeros(self.word_vec_dim)] * (self.max_seq_len-len(question_seq)) + list(reversed(question_seq))
                # Answer: zero vectors appended at the end
                sequence_y = answer_seq + [np.zeros(self.word_vec_dim)] * (self.max_seq_len-len(answer_seq))
                sequence_xy = sequence_xy + sequence_y
                # The target sequence starts with an all-ones GO vector
                sequence_y = [np.ones(self.word_vec_dim)] + sequence_y
                xy_data.append(sequence_xy)
                y_data.append(sequence_y)
        return np.array(xy_data), np.array(y_data)
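
To see what the padding does to the shapes, here is a small worked example. It is a sketch only: the helper name `build_training_pair` is hypothetical, and it uses plain Python lists instead of the NumPy arrays in the code above so that it runs without extra packages.

```python
def build_training_pair(question_seq, answer_seq, max_seq_len, word_vec_dim):
    """Pad one question/answer pair following the steps described above."""
    zero = [0.0] * word_vec_dim
    go = [1.0] * word_vec_dim  # all-ones vector used as the GO marker
    # Question: zero padding in front, then the question in reversed order
    padded_question = ([zero] * (max_seq_len - len(question_seq))
                       + list(reversed(question_seq)))
    # Answer: zero padding at the end
    padded_answer = answer_seq + [zero] * (max_seq_len - len(answer_seq))
    sequence_xy = padded_question + padded_answer
    sequence_y = [go] + padded_answer
    return sequence_xy, sequence_y


# Toy 2-dimensional "word vectors": a 2-word question and a 1-word answer
question = [[0.1, 0.2], [0.3, 0.4]]
answer = [[0.5, 0.6]]
xy, y = build_training_pair(question, answer, max_seq_len=3, word_vec_dim=2)

print(len(xy), len(y))  # 6 4
print(xy)
print(y)
```

With max_seq_len = 3, the xy sequence has 2 * 3 = 6 steps and the y sequence has 3 + 1 = 4 steps, the extra one being the GO vector.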

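4. Walking through the chatbot source code

Learning resource:
自己動手做聊天機器人三十八-原來聊天機器人是這么做出來的 (Build Your Own Chatbot, part 38)

The source code for that article is available on GitHub.

The steps boil down to the following:

1. Import packages
2. Prepare the data
3. Build the model
4. Train
5. Predict

Steps 2 (preparing the data) and 3 (building the model) are the parts covered in detail above.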

1. Import packages

import sys
import math
import tflearn
import tensorflow as tf
from tensorflow.python.ops import rnn_cell
from tensorflow.python.ops import rnn
import chardet
import numpy as np
import struct

2. Prepare the data

def load_word_set()
Splits the corpus of 3 million lines into question and answer parts and extracts the words.

def load_word_set():
    # Collects every word of the corpus into the module-level dict word_set.
    # Written for Python 3.
    with open('./segment_result_lined.3000000.pair.less', 'r', encoding='utf-8') as file_object:
        for line in file_object:
            line_pair = line.rstrip('\n').split('|')
            if len(line_pair) < 2:
                continue  # skip lines that are not 'question|answer'
            line_question = line_pair[0]
            line_answer = line_pair[1]
            for word in line_question.split(' '):
                word_set[word] = 1
            for word in line_answer.split(' '):
                word_set[word] = 1

def load_vectors(input)
Loads the word vectors from vectors.bin and returns a dictionary, word_vector_dict, whose keys are words and whose values are 200-dimensional vectors.
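
The post does not list the body of this function, so the following is only a sketch of how such a loader could look. It assumes vectors.bin is in the binary format written by the original word2vec tool (an ASCII header line with vocabulary size and dimension, then for each word its bytes, a space, and the vector as 4-byte little-endian floats). The name `load_vectors_from_stream` is hypothetical, and the demo data is made up.

```python
import io
import struct


def load_vectors_from_stream(stream):
    """Read {word: vector} from a binary stream in word2vec binary format."""
    header = stream.readline().decode('utf-8').split()
    vocab_size, dim = int(header[0]), int(header[1])
    word_vector_dict = {}
    for _ in range(vocab_size):
        # Read the word byte by byte up to the separating space
        word_bytes = b''
        while True:
            ch = stream.read(1)
            if ch == b'' or ch == b' ':
                break
            word_bytes += ch
        # A newline may follow each vector, so strip it from the next word
        word = word_bytes.decode('utf-8').strip()
        # Assumption: floats are stored little-endian, 4 bytes each
        vector = list(struct.unpack('<%df' % dim, stream.read(4 * dim)))
        word_vector_dict[word] = vector
    return word_vector_dict


# Build a tiny in-memory file with two 3-dimensional vectors for the demo
demo = (b"2 3\n"
        + b"hello " + struct.pack('<3f', 1.0, 2.0, 3.0) + b"\n"
        + "世界 ".encode('utf-8') + struct.pack('<3f', 0.5, -1.0, 0.25) + b"\n")
vectors = load_vectors_from_stream(io.BytesIO(demo))
print(vectors)
```

For a real file, the same function could be called as `load_vectors_from_stream(open('vectors.bin', 'rb'))`.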

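def init_seq(input_file)
Puts the word vectors of the words in each question and answer into the word-vector sequences question_seqs and answer_seqs. This is the same function listed in section 3.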

def vector_sqrtlen(vector)
Computes the length (Euclidean norm) of a vector.

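def vector_sqrtlen(vector):
    # Euclidean norm: square root of the sum of squared components
    total = 0  # renamed from 'len' to avoid shadowing the built-in
    for item in vector:
        total += item * item
    return math.sqrt(total)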

def vector_cosine(v1, v2)
Computes the cosine similarity between two vectors (larger means closer in direction).

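def vector_cosine(v1, v2):
    # Cosine similarity is only defined for vectors of equal length
    if len(v1) != len(v2):
        sys.exit(1)
    sqrtlen1 = vector_sqrtlen(v1)
    sqrtlen2 = vector_sqrtlen(v2)
    # Dot product of the two vectors
    value = 0
    for item1, item2 in zip(v1, v2):
        value += item1 * item2
    # Note: this divides by zero if either vector has zero length
    return value / (sqrtlen1*sqrtlen2)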
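
Written as a formula, the function above computes the standard cosine similarity of two n-dimensional vectors (the subscript notation is mine, not the article's):

```latex
\cos(v_1, v_2) = \frac{\sum_{i=1}^{n} v_{1,i}\, v_{2,i}}
                      {\sqrt{\sum_{i=1}^{n} v_{1,i}^{2}}\;\sqrt{\sum_{i=1}^{n} v_{2,i}^{2}}}
```

For example, with v1 = (1, 0) and v2 = (1, 1) the dot product is 1 and the two lengths are 1 and √2, so the cosine is 1/√2 ≈ 0.707.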

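def vector2word(vector)
Given a word vector, searches the word-vector dictionary for the vector with the highest cosine similarity to it, remembers the corresponding word, and returns that word together with the cosine value.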

def vector2word(vector):
    max_cos = -10000
    match_word = ''
    for word in word_vector_dict:
        v = word_vector_dict[word]
        cosine = vector_cosine(vector, v)
        if cosine > max_cos:
            max_cos = cosine
            match_word = word
    return (match_word, max_cos)

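3. Build the model

class MySeq2Seq(object)
The two parts below were written up separately in my two previous notes.

def generate_trainig_data(self)
Builds xy_data and y_data from question_seqs and answer_seqs.

def model(self, feed_previous=False)
Uses the input data to build encoder_inputs and decoder_inputs, the latter with a GO header.
encoder_inputs is passed to the encoder, which returns an output (the first value of the predicted sequence) and a state (passed on to the decoder).
In the decoder, the encoder's last output serves as the first input; during prediction, the output of the previous time step is used as the input of the next time step.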
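
The data flow described here can be illustrated without TensorFlow. The sketch below is not the article's graph code: `split_inputs` and `decoder_input_at` are hypothetical helpers on plain Python lists. It also assumes that the decoder inputs are the answer shifted right by one step behind an all-ones GO vector.

```python
def split_inputs(sequence_xy, max_seq_len, word_vec_dim):
    """Split one xy sample into encoder inputs and GO-prefixed decoder inputs."""
    go = [1.0] * word_vec_dim
    encoder_inputs = sequence_xy[:max_seq_len]   # the padded question
    answer_part = sequence_xy[max_seq_len:]      # the padded answer
    # Assumption: the decoder sees GO first, then the answer minus its last step
    decoder_inputs = [go] + answer_part[:-1]
    return encoder_inputs, decoder_inputs


def decoder_input_at(step, decoder_inputs, previous_outputs, feed_previous):
    """Pick the decoder input for one time step.

    Training (feed_previous=False): use the known answer vector.
    Prediction (feed_previous=True): use the decoder's own previous output.
    """
    if step == 0 or not feed_previous:
        return decoder_inputs[step]
    return previous_outputs[step - 1]


xy = [[0.0, 0.0], [0.3, 0.4], [0.1, 0.2], [0.5, 0.6], [0.0, 0.0], [0.0, 0.0]]
enc, dec = split_inputs(xy, max_seq_len=3, word_vec_dim=2)
print(enc)
print(dec)
```

The feed_previous switch is what separates training from prediction: with False the decoder is guided by the known answer, with True it has to rely on what it produced itself.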

4. Train

def train(self)
Uses generate_trainig_data() to produce the X and y data, passes them to the model defined above, trains it with model.fit, and then saves it.

    def train(self):
        trainXY, trainY = self.generate_trainig_data()
        model = self.model(feed_previous=False)
        model.fit(trainXY, trainY, n_epoch=1000, snapshot_epoch=False, batch_size=1)
        model.save('./model/model')
        return model

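5. Predict

The data is generated with generate_trainig_data(), and model.predict makes the predictions. Each sample in the prediction result corresponds to the word-vector sequence of one sentence. For every vector in a sample, the most similar vector in the word-vector dictionary is looked up, and the corresponding word is returned along with the cosine between the two.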

if __name__ == '__main__':
    # First argument: 'train' to train, anything else to predict.
    # Optional second argument: the input file.
    phrase = sys.argv[1]
    if 3 == len(sys.argv):
        my_seq2seq = MySeq2Seq(word_vec_dim=word_vec_dim, max_seq_len=max_seq_len, input_file=sys.argv[2])
    else:
        my_seq2seq = MySeq2Seq(word_vec_dim=word_vec_dim, max_seq_len=max_seq_len)
    if phrase == 'train':
        my_seq2seq.train()
    else:
        model = my_seq2seq.load()
        trainXY, trainY = my_seq2seq.generate_trainig_data()
        predict = model.predict(trainXY)
        for sample in predict:
            print("predict answer")
            # The first output vector of each sample is skipped
            for w in sample[1:]:
                (match_word, max_cos) = vector2word(w)
                #if vector_sqrtlen(w) < 1:
                #    break
                print(match_word, max_cos, vector_sqrtlen(w))
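
The decoding loop can be tried on toy data. The sketch below is self-contained and hypothetical: `decode_sample`, the two-word dictionary, and the fake prediction are invented for illustration. It enables the stopping rule that is commented out above (stop at the first vector whose length is below 1), which is one possible way to cut off the padding.

```python
import math


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def cosine(v1, v2):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (norm(v1) * norm(v2))


def decode_sample(sample, word_vector_dict, min_norm=1.0):
    """Turn one predicted vector sequence into a list of words."""
    words = []
    for w in sample[1:]:          # skip the first output, as in the loop above
        if norm(w) < min_norm:    # treat a short vector as padding and stop
            break
        best_word = max(word_vector_dict,
                        key=lambda word: cosine(w, word_vector_dict[word]))
        words.append(best_word)
    return words


toy_dict = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}
fake_prediction = [[1.0, 1.0], [1.8, 0.2], [0.2, 1.5], [0.01, 0.02], [1.0, 0.0]]
print(decode_sample(fake_prediction, toy_dict))
```

Whether a length threshold of 1 suits a real model depends on the scale of the trained word vectors, so treat `min_norm` as a tunable guess.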

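Index of my earlier technical posts

I am Alice, 不會停的蝸牛 (the snail that never stops)
A full-time homemaker born after 1985
I love artificial intelligence and am a doer
Working on my creativity, thinking, and learning skills
Your likes, follows, and comments are welcome!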
