
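1. Overview

Let's start with a figure from Prof. Hung-yi Lee's course "Deep Learning for Human Language Processing":

[Figure: overview of the model, from the course]

The core consists of three parts: Encoder, Attention and Decoder. Attention can be implemented in many different ways and is the main subject of research. The Encoder and Decoder are used and implemented in fairly standard ways, and both are built on RNNs.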

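2. RNN

An RNN does essentially the same kind of job as a convolution. Take a (None, Width, Height, Channel) tensor: a convolution extracts information from the spatial dimensions w and h into the channel dimension. Likewise, an RNN compresses the information spread along the time dimension t into a fixed-capacity representation, the hidden state.

[Figure: an RNN unrolled over time]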

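Each time step takes one input $x_t$. An RNN shares its parameters across time, so every step goes through the same cell. The equations are:

$$h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h)$$

$$y_t = \sigma_y(W_y h_t + b_y)$$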

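After each step the cell produces two outputs, $y_t$ and $h_t$. At the next step, $h_t$ is fed back in as an input. After it passes through $\sigma_h$, the cell yields $y_{t+1}$ and $h_{t+1}$, and so on.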
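Here is a minimal pure-Python sketch of an Elman-style RNN that makes the recurrence concrete. It assumes $\sigma_h = \tanh$ and an identity output activation $\sigma_y$, and the weight values and sizes are arbitrary illustrations. It computes $h_t = \tanh(W_h x_t + U_h h_{t-1} + b_h)$ and $y_t = W_y h_t + b_y$. The same `W_h`, `U_h` and `W_y` are reused at every step, which is what parameter sharing means.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def rnn_forward(xs: List[Vector], W_h: Matrix, U_h: Matrix, b_h: Vector,
                W_y: Matrix, b_y: Vector) -> Tuple[List[Vector], Vector]:
    """h_t = tanh(W_h x_t + U_h h_{t-1} + b_h); y_t = W_y h_t + b_y."""
    h = [0.0] * len(b_h)            # h_0 starts at zero
    ys = []
    for x in xs:                    # the same weights are used at every step
        h = [math.tanh(v) for v in add(matvec(W_h, x), matvec(U_h, h), b_h)]
        ys.append(add(matvec(W_y, h), b_y))
    return ys, h                    # per-step outputs and the final state


# Toy sizes: input dim 2, hidden dim 3, output dim 1, sequence length 4.
W_h = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
U_h = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.3, 0.0, 0.1]]
b_h = [0.0, 0.0, 0.0]
W_y = [[1.0, -1.0, 0.5]]
b_y = [0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

ys, h_T = rnn_forward(xs, W_h, U_h, b_h, W_y, b_y)
print(len(ys), len(ys[0]), len(h_T))  # 4 1 3
```

Each step returns both an output and the state that feeds the next step, which matches the two outputs described above.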

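The RNN structure appears to solve feature extraction along the time dimension perfectly, but it has several problems, for example:

No long-term memory: an RNN records relevant information effectively for short sequences. For long sequences, however, features are merged by simple accumulation into a fixed-size state. Earlier information is therefore easily overwritten, especially when the sequence contains many similar features.

Hard to tune: an RNN contains a recursive product, $h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h)$. When you take derivatives through $t$ steps, you get a product of $t-1$ factors that involve the recurrent matrix $U_h$, roughly a $U_h^{t-1}$ term. What does that mean? With poorly chosen hyperparameters (for example the weight initialization), even a small change can cause huge swings in the network parameters during backpropagation. The gradients explode, or they vanish.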
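A minimal sketch shows where the repeated product comes from. It uses a scalar, linear recurrence $h_k = u\,h_{k-1} + w\,x_k$, with the activation dropped purely for illustration. By the chain rule, $\partial h_t / \partial h_1$ is the product of $t-1$ copies of $u$, so it shrinks or grows exponentially depending on whether $|u| < 1$ or $|u| > 1$.

```python
def grad_through_time(u: float, t: int) -> float:
    """d h_t / d h_1 for the scalar linear recurrence h_k = u * h_{k-1} + w * x_k.

    Each step contributes the local derivative d h_k / d h_{k-1} = u, and the
    chain rule multiplies the t - 1 local derivatives together.
    """
    grad = 1.0
    for _ in range(t - 1):    # one factor per step from h_2 up to h_t
        grad *= u
    return grad


for u in (0.5, 1.0, 1.5):
    print(u, grad_through_time(u, t=50))
# 0.5 gives about 1.8e-15 (vanishing); 1.0 gives 1.0; 1.5 gives about 4.3e+08 (exploding)
```

With a matrix $U_h$ and a saturating activation the picture is more involved, but the same repeated multiplication drives both failure modes.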

How can this be fixed? The RNN variants LSTM and GRU largely mitigate these problems through their gating mechanisms.
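The following minimal sketch of a single GRU step gives some intuition for how gating helps. It uses a scalar hidden state, and all weights are toy values. It follows the convention used by Keras, $h_t = z\,h_{t-1} + (1 - z)\,\tilde{h}_t$, where the update gate $z$ interpolates between the old state and a candidate. Other sources swap the roles of $z$ and $1 - z$. When $z$ is close to 1, the old state passes through almost unchanged, which lets information survive many steps.

```python
import math
from dataclasses import dataclass


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


@dataclass
class ScalarGRU:
    """A GRU cell with 1-dimensional input and state; all weights are toy values."""
    w_z: float  # update gate
    u_z: float
    b_z: float
    w_r: float  # reset gate
    u_r: float
    b_r: float
    w_h: float  # candidate state
    u_h: float
    b_h: float

    def step(self, x: float, h_prev: float) -> float:
        z = sigmoid(self.w_z * x + self.u_z * h_prev + self.b_z)
        r = sigmoid(self.w_r * x + self.u_r * h_prev + self.b_r)
        h_cand = math.tanh(self.w_h * x + self.u_h * (r * h_prev) + self.b_h)
        # z near 1 keeps the previous state, z near 0 replaces it with the candidate
        return z * h_prev + (1.0 - z) * h_cand


inputs = [0.3, -0.5, 0.9, 0.0, -0.2] * 10   # 50 steps of arbitrary input

# Update-gate bias +10 -> z ~ 1: the initial state 0.8 survives 50 steps almost unchanged.
keep = ScalarGRU(0, 0, 10.0, 0, 0, 0, 1.0, 1.0, 0)
# Update-gate bias -10 -> z ~ 0: the state is overwritten by tanh(x) at every step.
forget = ScalarGRU(0, 0, -10.0, 0, 0, 0, 1.0, 0, 0)

h_keep = h_forget = 0.8
for x in inputs:
    h_keep = keep.step(x, h_keep)
    h_forget = forget.step(x, h_forget)
print(round(h_keep, 2), round(h_forget, 2))  # 0.8 -0.2
```

In a trained GRU the gate values are learned and depend on the input, so the network decides per step what to keep.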

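3. Embedding

Much real-world data is correlated. Take words as an example: "Husky", "Samoyed", "British Shorthair" and "Garfield". Plain one-hot encoding turns them into 1000, 0100, 0010 and 0001. This increases the amount of computation, because the vector length grows with the vocabulary size. It also destroys the correlation in the data: every pair of words ends up equally far apart, so the two dog breeds look no more alike than a dog and a cat.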
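A small check confirms that one-hot vectors carry no similarity information. The word list is the article's example, written as ASCII identifiers for convenience. Every pair of distinct one-hot vectors is exactly $\sqrt{2}$ apart in Euclidean distance.

```python
import math
from itertools import combinations

words = ["husky", "samoyed", "british_shorthair", "garfield"]

# One-hot: word i becomes a vector with a single 1 at position i.
one_hot = {w: [1.0 if j == i else 0.0 for j in range(len(words))]
           for i, w in enumerate(words)}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


for a, b in combinations(words, 2):
    print(f"{a:>18} - {b:<18} {euclidean(one_hot[a], one_hot[b]):.4f}")
# Every pair prints 1.4142 (sqrt 2): the encoding carries no notion of similarity.
```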

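So what does the data look like after it passes through an Embedding layer? See the figure:

[Figure: 2-D visualization of word embeddings]

Each item is represented by a dense vector, and relatedness is measured by Euclidean distance. In the visualization, related items cluster together in the same region.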
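This minimal sketch shows what an embedding layer does at lookup time and how Euclidean distance then reflects relatedness. The 2-D vectors are hand-picked for illustration only; they are not learned values. A real `tf.keras.layers.Embedding` learns its table during training.

```python
import math

vocab = {"husky": 0, "samoyed": 1, "british_shorthair": 2, "garfield": 3}

# Hypothetical embedding table: one row per word id (embedding_dim = 2).
embedding_table = [
    [0.9, 0.8],   # husky
    [1.0, 0.7],   # samoyed
    [-0.8, 0.9],  # british_shorthair
    [-0.9, 1.0],  # garfield
]


def embed(ids):
    """Look up one row per id, like an Embedding layer: (seq_len,) -> (seq_len, dim)."""
    return [embedding_table[i] for i in ids]


def nearest(word):
    """Return the other word whose vector is closest in Euclidean distance."""
    v = embedding_table[vocab[word]]
    others = [w for w in vocab if w != word]
    return min(others, key=lambda w: math.dist(v, embedding_table[vocab[w]]))


print(embed([vocab["husky"], vocab["garfield"]]))  # [[0.9, 0.8], [-0.9, 1.0]]
print(nearest("husky"), nearest("garfield"))       # samoyed british_shorthair
```

The lookup is a plain table index, which is why an Embedding layer maps `(None, 20)` token ids to `(None, 20, embedding_dim)`.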

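4. Building the model

The sections above covered the basic building blocks. Next, we assemble them into a larger structure, the ==model==.

Encoder model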


import tensorflow as tf


class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        # Maps token ids to dense vectors: (batch, seq_len) -> (batch, seq_len, embedding_dim)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        # return_sequences=True: output at every time step; return_state=True: also return the final state
        self.gru = tf.keras.layers.GRU(self.enc_units, return_sequences=True, return_state=True,
                                       recurrent_initializer='glorot_uniform')

    def call(self, x, hidden):
        x = self.embedding(x)
        output, state = self.gru(x, initial_state=hidden)
        return output, state

    def initialize_hidden_state(self):
        # All-zero initial state, one row per example in the batch
        return tf.zeros((self.batch_sz, self.enc_units))

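As a batch of data passes through the embedding layer and the RNN layer, its shape changes as follows.

i. Embedding layer

(None, 20) => (None, 20, embedding_dim)

ii. RNN layer

The RNN returns two tensors. The first, (None, 20, enc_units), holds the output at every time step (because return_sequences=True). The second, (None, enc_units), is the state after the last RNN step (because return_state=True), and it can be regarded as the context of the whole sentence.

(None, 20, embedding_dim) => (None, 20, enc_units), (None, enc_units)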
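For a Keras GRU, the final state returned with `return_state=True` equals the last time step of the sequence output. This minimal pure-Python sketch reproduces the encoder's two outputs and that relationship. Two simplifications keep it short: a plain tanh RNN cell stands in for the GRU, and the sizes are small hypothetical values.

```python
import math
import random

random.seed(0)

# Hypothetical sizes; the article uses 20 time steps and 256 units.
vocab_size, embedding_dim, enc_units = 50, 4, 3
batch, seq_len = 2, 5


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


embedding = rand_matrix(vocab_size, embedding_dim)  # lookup table, one row per token id
W = rand_matrix(enc_units, embedding_dim)           # input-to-hidden weights
U = rand_matrix(enc_units, enc_units)               # hidden-to-hidden weights


def encode(token_ids):
    """(batch, seq_len) ids -> outputs (batch, seq_len, enc_units), state (batch, enc_units)."""
    outputs, states = [], []
    for seq in token_ids:
        h = [0.0] * enc_units                       # zero initial state
        per_step = []
        for t in seq:
            x_t = embedding[t]                      # embedding lookup: id -> (embedding_dim,)
            h = [math.tanh(a + b) for a, b in zip(matvec(W, x_t), matvec(U, h))]
            per_step.append(h)                      # like return_sequences=True
        outputs.append(per_step)
        states.append(h)                            # like return_state=True
    return outputs, states


ids = [[random.randrange(vocab_size) for _ in range(seq_len)] for _ in range(batch)]
outputs, state = encode(ids)
print(len(outputs), len(outputs[0]), len(outputs[0][0]))  # 2 5 3
print(len(state), len(state[0]))                          # 2 3
print(all(o[-1] == s for o, s in zip(outputs, state)))    # True
```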

Decoder model


class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        # Projects the GRU output to one logit per vocabulary entry (no softmax)
        self.fc = tf.keras.layers.Dense(vocab_size)

        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # Feed the decoder hidden state and the encoder outputs into attention:
        # hidden (128, 256), enc_output (128, 20, 256)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # (batch, 1) -> (batch, 1, embedding_dim)
        x = self.embedding(x)

        # (batch, 1, enc_units + embedding_dim)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # Continue the recurrence from the previous decoder state
        output, state = self.gru(x, initial_state=hidden)

        # (batch, 1, dec_units) -> (batch, dec_units)
        output = tf.reshape(output, (-1, output.shape[2]))

        # (batch, vocab_size) logits
        x = self.fc(output)

        return x, state, attention_weights

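The decoder's interface is essentially similar to the encoder's. The difference is that the decoder's RNN input comes from the output of the previous step, and the very first input is always a fixed marker, "start". The start marker alone is not enough: it must be combined with the context derived from the encoder output.

Input at the current step = start marker (at later steps, the previous output token) + context vector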

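i. Embedding layer

This part is identical to the encoder, except that the input is a single token (at the first step, the start marker).
(None, 1) => (None, 1, embedding_dim)

ii. Combining with the context

context_vector comes from the attention model, described below. It is given a time axis of length 1 and then concatenated with the embedded token along the last axis (context_vector_dim equals enc_units):


concat(expand_dims(context_vector, 1), (None, 1, embedding_dim)) ==> (None, 1, context_vector_dim + embedding_dim)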
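The concatenation step gives the context vector a time axis of length 1 and then joins it with the embedded token along the last axis, with the context first, as in the decoder code. In this minimal sketch, nested lists stand in for tensors. `expand_dims_axis1` and `concat_last_axis` are hypothetical helpers that mimic `tf.expand_dims(t, 1)` and `tf.concat(..., axis=-1)` for this one case, and the sizes are arbitrary.

```python
def expand_dims_axis1(t):
    """(batch, d) -> (batch, 1, d), mimicking tf.expand_dims(t, 1)."""
    return [[row] for row in t]


def concat_last_axis(a, b):
    """Join (batch, 1, d_a) and (batch, 1, d_b) along the last axis -> (batch, 1, d_a + d_b)."""
    return [[ra[0] + rb[0]] for ra, rb in zip(a, b)]   # list '+' concatenates


batch, units, embedding_dim = 2, 3, 4
context_vector = [[0.1 * (i + j) for j in range(units)] for i in range(batch)]  # (batch, units)
x = [[[1.0] * embedding_dim] for _ in range(batch)]                              # (batch, 1, embedding_dim)

merged = concat_last_axis(expand_dims_axis1(context_vector), x)
print(len(merged), len(merged[0]), len(merged[0][0]))  # 2 1 7
```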

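iii. RNN layer

The attention-enhanced input then passes through an RNN layer again, as shown below.

[Figure: decoder RNN steps chained over time]

The steps are chained end to end: the output of one step is the input of the next. The shape changes as follows:

(None, 1, context_vector_dim + embedding_dim) => GRU => (None, 1, dec_units) => reshape => (None, dec_units) => Dense => (None, vocab_size); taking the argmax gives one token id per example, (None, 1)

The loop ends after N steps (when the end marker is produced) or when the maximum length is exceeded. Concatenating the per-step outputs gives shape (None, N), and looking the ids up in the vocabulary gives the corresponding characters.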
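The decoding loop can be sketched as greedy decoding. `step_fn(token_id, state) -> (logits, state)` is a hypothetical stand-in for one call of the decoder (attention, GRU, Dense). The start/end ids and the tiny vocabulary are also made up for illustration. Each step feeds back the argmax token, and the loop stops at the end marker or at `max_len`.

```python
from typing import Any, Callable, Dict, List, Tuple

START_ID, END_ID = 1, 2                                  # hypothetical special-token ids
id_to_char: Dict[int, str] = {3: "h", 4: "i", 5: "!"}   # hypothetical lookup table

# step_fn(previous_token_id, state) -> (logits over the vocabulary, new state)
StepFn = Callable[[int, Any], Tuple[List[float], Any]]


def greedy_decode(step_fn: StepFn, init_state: Any, max_len: int) -> List[int]:
    """Feed back the argmax token until END_ID appears or max_len steps pass."""
    token, state = START_ID, init_state       # the first input is the start marker
    out: List[int] = []
    for _ in range(max_len):
        logits, state = step_fn(token, state)
        token = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if token == END_ID:
            break
        out.append(token)
    return out


def toy_step(token: int, state: int) -> Tuple[List[float], int]:
    """Hypothetical stand-in for one decoder call: emits 3, 4, 5, then END_ID."""
    script = [3, 4, 5, END_ID]
    logits = [0.0] * 6
    logits[script[min(state, len(script) - 1)]] = 1.0
    return logits, state + 1


ids = greedy_decode(toy_step, 0, max_len=10)
print(ids, "".join(id_to_char[i] for i in ids))  # [3, 4, 5] hi!
```

Greedy argmax is the simplest choice here. Beam search is a common alternative that this article does not cover.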

Attention

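This is the layer with the most variants. The classic ones are Luong attention and Bahdanau attention, and here I only cover Bahdanau attention.

Structure diagram:

[Figure: Bahdanau attention structure]

The decoder's hidden state from step t-1 and the encoder outputs at all time steps go through a transformation $a_t$ that yields the attention weights. The attention weights and the encoder outputs then go through a second transformation $c_t$ that yields the context vector for step t. In the formulas below, $s_{t-1}$ is the decoder state and $h_j$ is the encoder output at step j, and the Dense-layer biases are omitted:

$$e_{t,j} = v^\top \tanh(W_1 h_j + W_2 s_{t-1})$$

$$a_{t,j} = \frac{\exp(e_{t,j})}{\sum_k \exp(e_{t,k})}, \qquad c_t = \sum_j a_{t,j} h_j$$

The shape changes are shown in the comments below.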

class BahdanauAttention(tf.keras.Model):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # hidden shape == (batch_size, hidden size)
        # hidden_with_time_axis shape == (batch_size, 1, hidden size)
        # we are doing this to perform addition to calculate the score
        # (128, 256) => (128, 1,  256)
        hidden_with_time_axis = tf.expand_dims(query, 1)

        # W1(values): (128, 20, 256); W2(hidden_with_time_axis): (128, 1, 256), broadcast over the 20 steps
        # tanh(W1(values) + W2(...)) => (128, 20, 256)
        # V (a Dense(1) layer): (128, 20, 256) => (128, 20, 1)
        # score shape == (batch_size, max_length, 1)
        score = self.V(tf.nn.tanh(
            self.W1(values) + self.W2(hidden_with_time_axis)))

        # softmax over the time axis (axis=1) maps the scores into 0 ~ 1, summing to 1
        # attention_weights shape == (batch_size, max_length, 1)
        # we get 1 at the last axis because we are applying score to self.V
        attention_weights = tf.nn.softmax(score, axis=1)

        # (128, 20, 1) * (128, 20, 256) => (128, 20, 256), element-wise with broadcasting
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights

Note that the transformation above covers a single decoder step, not all steps at once. Running it at every decoder step produces one context_vector per step.
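The same single-step computation can be written in plain Python for one example (no batch axis), for readers without TensorFlow at hand. The weight matrices are hypothetical, and the Dense-layer biases are omitted to keep the sketch short. It computes $e_j = v^\top \tanh(W_1 h_j + W_2 s)$, applies a softmax over the encoder steps $j$, and takes the weighted sum of the encoder outputs. Calling it once per decoder step yields one context vector per step.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    m = max(xs)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(query: Vector, values: List[Vector],
                       W1: Matrix, W2: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """query: decoder state (hidden,); values: encoder outputs (T, hidden)."""
    q = matvec(W2, query)                         # W2 s, shared by every encoder step
    scores = []
    for h_j in values:                            # one score per encoder step
        k = matvec(W1, h_j)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, k, q)))
    weights = softmax(scores)                     # (T,), sums to 1
    context = [sum(w * h_j[d] for w, h_j in zip(weights, values))
               for d in range(len(values[0]))]     # weighted sum -> (hidden,)
    return context, weights


# Toy sizes: hidden = 2, attention units = 2, T = 3 encoder steps.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
values = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
query = [0.5, 0.5]

context, weights = bahdanau_attention(query, values, W1, W2, v)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```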

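5. Loss computation

Categorical cross-entropy works here. Because the decoder's Dense layer outputs raw logits, compute the loss from logits. With integer token ids as targets, use the sparse variant, for example tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True). If the gradients vanish, compute the loss as the batch loss, averaged over the batch.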
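Here is a minimal sketch of the per-step loss. As in the decoder above, it assumes raw logits and integer target ids, so it is sparse categorical cross-entropy computed from logits. It also assumes that id 0 is padding and masks those positions out. That is a common convention, not something stated in this article. The batch loss is the mean over the non-padding positions.

```python
import math
from typing import List

PAD_ID = 0  # assumed padding id


def sparse_ce_from_logits(logits: List[float], target: int) -> float:
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def masked_batch_loss(batch_logits: List[List[float]], targets: List[int]) -> float:
    """Average cross-entropy over the positions whose target is not padding."""
    losses = [sparse_ce_from_logits(lg, t)
              for lg, t in zip(batch_logits, targets) if t != PAD_ID]
    return sum(losses) / len(losses) if losses else 0.0


batch_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
targets = [1, 2, PAD_ID]        # the last position is padding and is ignored
print(round(masked_batch_loss(batch_logits, targets), 4))  # 1.3863 (= log 4)
```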

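6. Notes

Training text must be marked with start and end tokens so that the model knows where input begins and where output ends.
Training takes a long time, so first confirm on a small dataset that the loss decreases, then train on the full data.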
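This minimal sketch shows the marking step together with the padding that usually accompanies it. The marker strings `<start>`/`<end>`, whitespace tokenization and padding id 0 are all illustrative choices, not taken from the article's code.

```python
from typing import Dict, List


def add_markers(sentence: str) -> str:
    """Wrap a sentence with start and end markers."""
    return f"<start> {sentence.strip()} <end>"


def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Assign ids from 1 upwards; 0 is reserved for padding."""
    vocab: Dict[str, int] = {}
    for s in sentences:
        for tok in s.split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab


def encode_and_pad(sentences: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Map tokens to ids, truncate to max_len, and right-pad with 0."""
    rows = []
    for s in sentences:
        ids = [vocab[tok] for tok in s.split()][:max_len]
        rows.append(ids + [0] * (max_len - len(ids)))
    return rows


corpus = [add_markers(s) for s in ["how are you", "fine thanks"]]
vocab = build_vocab(corpus)
print(corpus[0])                           # <start> how are you <end>
print(encode_and_pad(corpus, vocab, 6))    # [[1, 2, 3, 4, 5, 0], [1, 6, 7, 5, 0, 0]]
```

If max_len is shorter than a sentence, truncation also drops its end marker, so choose max_len with that in mind.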

7. Appendix

Please right-click to download before use.

Video and code


An Explanation of a Chatbot Based on the LAS Model
