劉鐵巖 (Tie-Yan Liu)
- Deputy Director of Microsoft Research Asia, Principal Researcher
Key Technical Area
- Computer Vision
ImageNet: the deep learning breakthrough of 2011-12
ResNet (Residual Network)
- Speech
Speech recognition
At the end of 2016, Microsoft brought the speech recognition word error rate down to 5.1% (the magic number: the human error rate)
- Natural Language
Machine translation is still below human level, but the gap is small
How to quantify translation accuracy? N-gram overlap gives a rough measure
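The n-gram measure mentioned above (the idea behind BLEU) can be sketched as modified n-gram precision; the tokenized sentences in the usage line are purely illustrative:

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Fraction of candidate n-grams that also appear in the reference,
    with counts clipped as in BLEU's modified precision."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    if not cand:
        return 0.0
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / sum(cand.values())
```

For example, `ngram_precision("the cat sat".split(), "the cat sat down".split())` is 1.0, since both candidate bigrams occur in the reference.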
Industry consensus: within a year it may surpass expert simultaneous interpreters
- Games
Alphago
Key Industries
Security
Public security, traffic
Techniques: person analysis, vehicle analysis, behavior analysis
Industry Trend
Capital is flowing toward the technology
Boao Forum for Asia: security powered by Face++
Autonomous Drive
Google,Baidu,Mobileye,Tesla,Benz,BMW
Biggest open problems: complex road conditions, ethics, legal provisions
Who is liable when a driverless car hits a pedestrian?
Industry Trend
Baidu: Apollo program
Google: 2 million miles of road-test data
Mobileye: 30 million km of road-test data
Tesla: ended its partnership with Mobileye in 2016
Healthcare
The most digitized field (computer-assisted techniques are long established: routine blood tests, CT, ...)
- Diagnostic assistance systems based on big data (CT, MRI)
- Medical knowledge graphs
- Intelligent medical advisors
- Genetic engineering
- Drug development, immunology
Deep Learning
An end-to-end learning approach that uses a highly complex model (nonlinear, multi-layer) to fit the training data from scratch.
You can do genomics without first spending years studying biology
LightGBM
Faster than XGBoost
Basic Machine Learning Concepts
- The goal: to learn a model from experiences/data
Training data → model
- Test/inference/prediction
- Validation sets for hyperparameter tuning
- Training: empirical loss minimization
Loss Function L
1. Linear regression
2. SVM
3. Maximum likelihood
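Toy scalar versions of the three losses listed above (squared loss for linear regression, hinge loss for the SVM, and negative log-likelihood for a maximum-likelihood logistic model); a sketch, not a definitive formulation:

```python
import math

def squared_loss(y_hat, y):
    """Linear regression: (prediction - target)^2."""
    return (y_hat - y) ** 2

def hinge_loss(score, y):
    """SVM: y in {-1, +1}; zero once the margin y*score reaches 1."""
    return max(0.0, 1.0 - y * score)

def logistic_nll(score, y):
    """Maximum likelihood for binary labels y in {0, 1}: negative
    log-likelihood of a sigmoid model (i.e., cross entropy)."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
```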
Biological Motivation and Connections
Dendrite: receives input signals
Synapse: connection between neurons
Axon: carries the output signal
Perceptron
Feedforward Neural Networks
Universal Approximation Theorem: a bounded continuous function can be approximated arbitrarily well by a feedforward network with at least one hidden layer
Hidden Units: Sigmoid and Tanh
Sigmoid: f(x) = 1/(1 + e^(-x))
Rectified Linear Units
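The hidden-unit activations above, written out as plain functions:

```python
import math

def sigmoid(x):
    """Squashes to (0, 1): f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes to (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: identity for positive inputs, else 0."""
    return max(0.0, x)
```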
Loss Function
Cross entropy
Gradient Descent
GD is guaranteed to converge, but each step is computationally expensive
SGD (stochastic gradient descent) is much faster per step; the stochastic gradient is an unbiased estimate of the full gradient
SGD has its own problem: the variance can be very large, the noise drowns out the small movements toward the optimum, and convergence is not guaranteed
Convergence can be recovered with a learning-rate schedule whose squares have a finite sum (while the rates themselves sum to infinity)
In practice a compromise is used: minibatch SGD
The above are the basic methods
In practice many tricks and refinements are added
e.g. Momentum SGD, Nesterov Momentum
AdaGrad
Adam
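A minimal sketch of gradient descent with momentum on a toy quadratic; the objective and hyperparameters are illustrative, and Nesterov, AdaGrad, and Adam all refine the same basic loop:

```python
def sgd_momentum(grad, w0, lr=0.1, beta=0.9, steps=200):
    """Gradient descent with momentum: the velocity v accumulates
    an exponentially decaying sum of past gradients."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + grad(w)   # update velocity
        w = w - lr * v           # step along the velocity
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_star = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

In a real minibatch setting, `grad` would be evaluated on a random subset of the training data at each step.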
Regularization for deep learning
Overfitting
Generalization gap
Dropout: prevents units from co-adapting too much
Batch Normalization: the distribution of each layer's inputs changes during training; normalization with learnable parameters
Weight decay(or L^2 parameter norm penalty)
Early Stopping
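Dropout above can be sketched as an "inverted dropout" mask; the scaling by 1/(1-p) keeps the layer's expected activation the same at training and test time:

```python
import random

def dropout(activations, p=0.5, train=True, rng=random):
    """Inverted dropout: during training, zero each unit with
    probability p and scale survivors by 1/(1-p); at test time,
    pass activations through unchanged."""
    if not train or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```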
Convolutional neural networks
Local connectivity
Mimics the human pattern-recognition process
Convolution kernels: learned by SGD
Pooling: reduce dimension
An example: VGG
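A 1-D miniature of the two core operations: a convolution kernel slid over the input (learned by SGD in a real network; fixed here for illustration) followed by non-overlapping max pooling:

```python
def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation, as deep learning
    frameworks implement it): slide the kernel over x."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the max of each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]
```

With the difference kernel `[1, 0, -1]`, `conv1d` responds to changes in the signal, a toy analogue of an edge detector.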
- Gradient Vanishing
In deep networks the gradient at the early layers becomes vanishingly small
The sigmoid's derivative is at most 0.25; multiplying such factors across many layers makes the product tiny
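A numeric check on that claim: the sigmoid's derivative is s(x)(1 - s(x)), which peaks at 0.25, so even the best-case gradient factor through many sigmoid layers collapses:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative s(x) * (1 - s(x)); maximized at x = 0, value 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Best case: every layer contributes the maximal factor 0.25.
best_case_20_layers = 0.25 ** 20
```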
Fix: Residual Network (ResNet)
- What's Missing?
Feedforward network and CNN
However, many applications involve sequences with variable lengths
Recurrent Neural Networks(RNN)
We can process a sequence of vectors x by applying a recurrence formula at every time step
Remembers the input from the previous time step
- Many to One: input a sequence, output a single scalar
- One to Many: input a single vector, output a sequence (e.g. image captioning)
- Many to Many: language modeling (predicting the next word); Encoder-Decoder for sequence generation
Same problem: the unrolled network becomes very deep
Fix: Long Short-Term Memory (LSTM)
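The recurrence formula above, in a scalar toy version (real layers use weight matrices, and the illustrative weights here are arbitrary; LSTM replaces this cell with gated memory):

```python
import math

def rnn_step(x, h_prev, w_xh=0.5, w_hh=0.8, b=0.0):
    """One step of a vanilla RNN: h_t = tanh(w_xh*x_t + w_hh*h_{t-1} + b)."""
    return math.tanh(w_xh * x + w_hh * h_prev + b)

def run_rnn(xs, h0=0.0):
    """Apply the same recurrence (same weights) at every time step."""
    h = h0
    for x in xs:
        h = rnn_step(x, h)
    return h
```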
Deep learning toolkits
- Tensorflow (Google)
- Caffe (UC Berkeley)
- CNTK (Microsoft)
- MXNet (Amazon)
- Torch7 (NYU/Facebook)
- Theano (U Montreal)
Image classification: Caffe, Torch
Text: Theano
Large scale: CNTK
Richest feature set: Tensorflow
Advanced topics in deep learning
Challenges of deep learning
- Relying on Big Training Data
- Relying on Big Computation
- Modify Coefficients
- Lack of interpretability
Black box? White box?
- Lack of Diverse Tech Roadmaps
More and more NIPS/ICML papers are about deep learning
- Overlooking Differences between Animals and Humans
What it solves is function fitting; that is still far from real intelligence
Dual learning
- A New View: The Beauty of Symmetry
Dual learning with only 10% of the bilingual data (NIPS 2016)
Lightweight deep learning
LightRNN
Distributed deep learning
Convex Problems
The Universal Approximation Theorem is only an existence statement