A Brief Overview of Visual Tracking + Deep-Learning Tracking + Context-Based Tracking

Visual Tracking With Deep Learning And The Context

I. Overview of Visual Tracking

1. What is visual tracking?


These three pictures are frames 1, 40 and 80 of the same video. Given the bounding box of the running woman in the first frame, the tracker keeps the bounding box on the same woman in the later frames.

Given the initial state (e.g., position and size) of a target object in one frame of a video, the goal of tracking is to estimate the state of the target in the subsequent frames.

Although object tracking has been studied for several decades, and much progress has been made in recent years, it remains a very challenging problem.

Numerous factors affect the performance of a tracking algorithm, such as illumination variation, occlusion and background clutter, and no single tracking approach can successfully handle all scenarios.

2. Difficulties of visual tracking

Video-based object tracking is limited by many factors, and research on target tracking still faces great challenges in both theory and methods.

The diversity of the target

  • Multiple moving targets make it difficult to describe them with a unified model.

  • The motion patterns of the targets are very complex.

  • The movement of a target can change its appearance.

  • Mutual occlusion may occur between multiple moving objects.

The complexity of the scene

  • Changes in lighting and atmospheric conditions in the scene can cause serious interference.

  • Regions with an appearance similar to the target.

  • The target may be occluded by objects in the scene.

In a dilemma

  • Fast but Fallible

  • Robust but Slow

  • The contradiction between real-time and accuracy

Dilemma

3. Recent algorithms for visual tracking

Based on model matching

-----Global model matching

  • Create a target appearance model online or offline.
  • Search the image for the region most similar to the model.
  • Advantage: works well for tracking rigid targets.
  • Disadvantage: fails when the appearance of the target changes.
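To make global model matching concrete, here is a minimal sketch in Python with NumPy. It assumes grayscale frames stored as 2-D arrays and uses the sum of squared differences (SSD) as the similarity measure. SSD is our choice for illustration, and `match_template_ssd` is a hypothetical helper, not a function from any tracker discussed here. The exhaustive double loop also shows why this family of methods is slow.

```python
import numpy as np

def match_template_ssd(frame, template):
    """Return the top-left (row, col) of the window in `frame` whose sum of
    squared differences (SSD) to `template` is smallest (exhaustive search)."""
    fh, fw = frame.shape
    th, tw = template.shape
    best_score, best_pos = np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = frame[r:r + th, c:c + tw]
            score = np.sum((window - template) ** 2)  # lower = more similar to the model
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

# Toy example: the appearance model is cut from frame 1 ...
rng = np.random.default_rng(0)
frame1 = rng.random((40, 40))
template = frame1[10:18, 12:20].copy()
# ... and frame 2 is frame 1 shifted by 3 rows down and 2 columns left.
frame2 = np.roll(frame1, shift=(3, -2), axis=(0, 1))
print(match_template_ssd(frame2, template))  # (13, 10)
```

Because the template is never updated, a rotated or deformed target would no longer give a near-zero SSD, which is exactly the disadvantage listed above.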

-----Local model matching

  • The target is divided into components, and a separate model is built for each component.
  • For example, a human body is divided into head, limbs and torso.
  • Advantage: stable tracking, especially under occlusion.
  • Disadvantage: matching between components is difficult and time-consuming.

-----Feature matching

  • Extract features that are invariant to translation, rotation and scaling.
  • Match these features in the current frame.
  • Advantage: insensitive to changes in the shape, scale, etc. of the target.
  • Disadvantage: most image features are sensitive to ambient conditions such as changes in lighting.

Based on classification

  • Treat tracking as online classification.
  • One class is the target, the other is the background.
  • Train a target-background classifier.
  • Update the classifier with each new frame.
  • Advantage: adapts to some extent to changes in the target.
  • Disadvantage: classification accuracy often depends on how well the features represent the target.
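The classification view can be sketched as an online logistic-regression classifier over patch features. This is an illustration, not any specific published tracker. The feature vectors, the class name `OnlineLogisticTracker`, and the learning-rate and step values are all assumptions. On each frame, `update` is called with positive samples from the current target box and negative samples from the surrounding background. `locate` then picks the candidate with the highest target probability.

```python
import numpy as np

class OnlineLogisticTracker:
    """Target-vs-background logistic classifier that is re-trained online."""

    def __init__(self, dim, lr=0.5):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def score(self, X):
        # Probability that each row of X (one candidate patch's features) is the target.
        return 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))

    def update(self, X, y, steps=50):
        # A few gradient-descent steps on the log-loss using the current frame's samples.
        for _ in range(steps):
            err = self.score(X) - y
            self.w -= self.lr * X.T @ err / len(y)
            self.b -= self.lr * err.mean()

    def locate(self, candidates):
        # Index of the candidate patch most likely to be the target.
        return int(np.argmax(self.score(candidates)))

rng = np.random.default_rng(1)
pos = rng.normal(1.0, 0.3, size=(20, 5))   # features of patches on the target
neg = rng.normal(-1.0, 0.3, size=(20, 5))  # features of background patches
tracker = OnlineLogisticTracker(dim=5)
tracker.update(np.vstack([pos, neg]), np.r_[np.ones(20), np.zeros(20)])

# Next frame: nine background candidates and one target candidate (index 9).
candidates = np.vstack([rng.normal(-1.0, 0.3, size=(9, 5)),
                        rng.normal(1.0, 0.3, size=(1, 5))])
print(tracker.locate(candidates))  # 9
```

Calling `update` again on every new frame gives the classifier its self-adaptability. It is also how errors can accumulate (drift) if a wrong patch is labeled positive.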

Based on Bayes filtering

  • Combine prior information with current observations.
  • The state of the target in the current frame is estimated optimally by combining the prior information accumulated before the current frame with the current observation.
  • Typical algorithms include the Kalman filter and the particle filter.
  • Advantage: wide range of applications and few constraints.
  • Disadvantage: to reach high filtering precision, particle filters often need a large number of particles, and the more particles are required, the higher the computational complexity.
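As an illustration of Bayes filtering, a constant-velocity Kalman filter on one coordinate of the target centre fits in a few lines of NumPy. The motion model, the noise levels `q` and `r`, and the function name `kalman_track` are assumptions made for this sketch. A real tracker would filter x, y and scale jointly. The predict step carries the prior forward from the previous frame, and the update step fuses it with the current observation.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Constant-velocity Kalman filter over noisy 1-D positions.
    State x = [position, velocity]; q and r are illustrative noise levels."""
    F = np.array([[1.0, dt], [0.0, 1.0]])  # motion model: carries the prior forward
    H = np.array([[1.0, 0.0]])             # only the position is observed
    Q = q * np.eye(2)                      # process-noise covariance
    R = np.array([[r]])                    # measurement-noise covariance
    x = np.array([measurements[0], 0.0])   # start at the first measurement, at rest
    P = np.eye(2)
    estimates = []
    for z in measurements:
        # Predict: propagate the previous estimate and its uncertainty.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: weigh the prediction against the current observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([z]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0])
    return np.array(estimates)

rng = np.random.default_rng(0)
true_pos = 2.0 * np.arange(50)                   # target moving at constant speed
meas = true_pos + rng.normal(0.0, 1.0, size=50)  # noisy detections
est = kalman_track(meas)
print(np.sqrt(np.mean((meas[20:] - true_pos[20:]) ** 2)),
      np.sqrt(np.mean((est[20:] - true_pos[20:]) ** 2)))
```

The first printed number is the error of the raw detections and the second is the error after filtering. A particle filter replaces the Gaussian prior with a set of weighted samples, which is where the cost mentioned above comes from.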

Based on deep learning (after 2015)

Applying deep learning to target tracking has not been smooth sailing. The main problem is the lack of training data. Much of the power of deep models comes from effective training on large amounts of labeled data, whereas tracking provides only the bounding box in the first frame as training data. It is therefore difficult to train a deep model for the current target at the start of tracking.

Several ideas:
  • Pre-train the deep model on auxiliary image data and fine-tune it online during tracking (DLT, SO-DLT).
  • Use a CNN classification network pre-trained on an existing large-scale classification dataset to extract features (FCNT, HCFT; ICCV 2015).
  • Pre-train on tracking sequences (MDNet, CVPR 2016).
  • Use an RNN (RTT, CVPR 2016).

4. Deep Learning for visual tracking

DLT: Learning a Deep Compact Image Representation for Visual Tracking (NIPS 2013)

DLT

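Pre-training: a stacked denoising autoencoder (SDAE) trained without supervision on the Tiny Images dataset, giving a generic object representation;
Online tracking network: the SDAE encoder (generic feature representation) + a sigmoid classification layer (tracking as binary classification), which separates the target from the background;
Fine-tuning: positive and negative samples taken from the first frame are used to obtain a classification network specific to the current target and background;
Tracking in later frames: a particle filter extracts patches from the current frame, the patches are fed through the classification network one by one, and each gets a confidence;
Model update: governed by a threshold;
Pros: pre-training + fine-tuning addresses the shortage of training data;
Cons: it is doubtful whether a 32×32 autoencoder suits classification-based tracking, and a 4-layer network has limited representational power.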
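The later-frame tracking and update steps of DLT can be sketched as follows. This is a simplification under stated assumptions. The state is (x, y, scale), and `confidence_fn` is a hypothetical stand-in for the fine-tuned SDAE encoder + sigmoid classifier. We also assume the threshold triggers re-tuning when even the best particle's confidence falls below it. The particle count, noise levels and threshold value are illustrative.

```python
import numpy as np

def dlt_style_step(prev_state, confidence_fn, rng, n_particles=200,
                   sigma=(4.0, 4.0, 0.05), update_threshold=0.9):
    """One DLT-style tracking step on state (x, y, scale).
    `confidence_fn(states) -> scores in [0, 1]` is a hypothetical stand-in for
    the fine-tuned SDAE encoder + sigmoid classification layer."""
    # Particle filter proposal: perturb the previous state with Gaussian noise,
    # i.e. draw candidate patches around the last known position.
    particles = prev_state + rng.normal(0.0, sigma, size=(n_particles, 3))
    scores = confidence_fn(particles)
    best = int(np.argmax(scores))
    # Assumed update rule: if even the best patch scores below the threshold,
    # the appearance has probably changed and the network should be re-tuned.
    needs_update = bool(scores[best] < update_threshold)
    return particles[best], float(scores[best]), needs_update

target = np.array([50.0, 40.0])  # toy ground-truth centre

def toy_confidence(states):
    # Hypothetical classifier: confidence falls off with distance to the target.
    d2 = np.sum((states[:, :2] - target) ** 2, axis=1)
    return np.exp(-d2 / 200.0)

rng = np.random.default_rng(0)
prev = np.array([48.0, 41.0, 1.0])
state, conf, needs_update = dlt_style_step(prev, toy_confidence, rng)
print(state, conf, needs_update)
```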

SO-DLT: Transferring Rich Feature Hierarchies for Robust Visual Tracking (ICCV 2015)

SO-DLT

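Online tracking: when processing frame t, center the search on the position predicted in frame t-1; sample regions of increasing scale, from small to large, and feed them into the network one by one; once the probability map output by the CNN exceeds a threshold, stop sampling and take the current region as the best region; finally, determine the size and position of the bounding box within that region.
Model update: CNN_S (short-term) responds quickly to changes in the target; CNN_L (long-term) is robust to noise.
Takeaway: using an ensemble to reduce the sensitivity of model updates is a secret weapon for raising a tracker's benchmark scores.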
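The small-to-large sampling loop can be sketched like this. `prob_map_fn` is a hypothetical placeholder for the CNN. It returns a probability map for a region of a given scale around the previous centre. The scale list and threshold are illustrative values, not the paper's settings, and the CNN_S/CNN_L ensemble is left out.

```python
import numpy as np

def so_dlt_scale_search(prob_map_fn, center, scales=(1.0, 1.5, 2.0, 3.0), threshold=0.5):
    """Grow the search region around the previous centre until the network's
    probability map is confident enough, then return that scale and map.
    prob_map_fn(center, scale) -> 2-D probability map (stand-in for the CNN)."""
    prob_map = None
    for s in scales:                        # sample from small to large
        prob_map = prob_map_fn(center, s)
        if prob_map.max() > threshold:      # confident enough: stop sampling
            return s, prob_map
    return scales[-1], prob_map             # fall back to the largest region

def toy_prob_map(center, scale):
    # Hypothetical CNN output: the target only falls inside regions of scale >= 2.
    m = np.full((8, 8), 0.1)
    if scale >= 2.0:
        m[4, 4] = 0.9
    return m

scale, prob_map = so_dlt_scale_search(toy_prob_map, center=(120, 80))
row, col = np.unravel_index(np.argmax(prob_map), prob_map.shape)
print(scale, (int(row), int(col)))  # 2.0 (4, 4)
```

The bounding box would then be fitted inside the chosen region, here around the peak at (4, 4).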

FCNT: Visual Tracking with Fully Convolutional Networks (ICCV 2015)

FCNT

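Pre-training: VGGNet on the labeled ImageNet classification dataset;
Core idea: feature maps can be used directly to localize the tracked target;
High-level features: good at distinguishing different categories (highly abstract);
Low-level features: good at distinguishing objects of the same category (focus on local detail);
Two convolutional layers: conv4-3 distinguishes similar objects, i.e. distractors (SNet); conv5-3 captures category information (GNet);
Online tracking: a region centered on the previous frame's position is cropped and fed into SNet and GNet separately, producing two complementary heatmaps;
SNet: the distractor is removed;
GNet: the target stands out more;
Summary: effectively suppresses drift but is not robust to occlusion; offers a new way of thinking about tracking (how many layers, and which ones, to use).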
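A simple rule can illustrate how the two complementary heatmaps might be combined. The rule is an assumption made for illustration and is not necessarily FCNT's exact procedure. It uses the GNet map by default. If that map also responds strongly away from its own peak (a distractor), it takes the peak of the SNet map instead. `radius` and `ratio` are hypothetical parameters.

```python
import numpy as np

def fuse_fcnt_heatmaps(g_map, s_map, radius=3, ratio=0.5):
    """Choose the target location from two complementary heatmaps (assumed rule)."""
    peak = np.unravel_index(np.argmax(g_map), g_map.shape)
    r0, c0 = peak
    # Blank out the neighbourhood of GNet's peak and look for strong responses elsewhere.
    outside = g_map.copy()
    outside[max(r0 - radius, 0):r0 + radius + 1,
            max(c0 - radius, 0):c0 + radius + 1] = 0.0
    if outside.max() > ratio * g_map[peak]:
        # A distractor is present: fall back to the instance-level SNet map.
        peak = np.unravel_index(np.argmax(s_map), s_map.shape)
    return int(peak[0]), int(peak[1])

g = np.zeros((20, 20))
g[5, 5] = 1.0     # GNet also fires on a distractor of the same category
g[14, 12] = 0.95  # ... and on the real target
s = np.zeros((20, 20))
s[14, 12] = 1.0   # SNet fires only on the target instance
print(fuse_fcnt_heatmaps(g, s))  # (14, 12)
```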

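MDNet: Learning Multi-Domain Convolutional Neural Networks for Visual Tracking (CVPR 2016)

Image classification and actual tracking differ greatly:
Image classification: arbitrary combinations of object and background; an object must be detected whatever background it appears in;
Actual tracking: once the foreground and background are given in the first frame, the foreground and background in subsequent frames are very similar to those in the first frame;
Pre-train the CNN directly on video sequences; the catch is that targets differ: an object of some class can be the target in one sequence and background in another.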

MDNet

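Shared layers: a CNN learns a generic feature representation of targets;
Domain-specific layers: each training sequence ---> a separate domain ---> a separate binary classification layer ---> separates foreground from background in that sequence (solving the problem that targets are inconsistent across sequences);
Bounding box: R-CNN-style region proposals: 256 proposals are sampled around the previous frame's position, followed by bounding-box regression;
Summary: precision reaches 94.8%; real-time performance: is detection-style region proposal suitable for online tracking? (256 proposals, 89 domains)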
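A tiny NumPy model can show the multi-domain idea. It has one shared ReLU layer, standing in for MDNet's shared layers, and one logistic head per training sequence, trained round-robin over the sequences. Everything here is an assumption for illustration: the layer sizes, the learning rate, the toy data, and the names `train_multi_domain` and `sigmoid`. The point is that the same object type can be positive in one domain and negative in another without the heads conflicting.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def train_multi_domain(domains, hidden=16, lr=0.1, iters=300, seed=0):
    """domains: list of (X, y) pairs, one per training sequence.
    Returns the shared layer W and one binary head per domain."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.5, size=(domains[0][0].shape[1], hidden))  # shared layer
    heads = [np.zeros(hidden) for _ in domains]                     # domain-specific layers
    for it in range(iters):
        k = it % len(domains)                       # visit the sequences round-robin
        X, y = domains[k]
        H = np.maximum(X @ W, 0.0)                  # shared ReLU features
        err = (sigmoid(H @ heads[k]) - y) / len(y)  # d(mean log-loss)/d(logits)
        grad_head = H.T @ err
        grad_W = X.T @ (np.outer(err, heads[k]) * (H > 0))
        heads[k] -= lr * grad_head                  # only this domain's head changes
        W -= lr * grad_W                            # the shared layer learns from all domains
    return W, heads

rng = np.random.default_rng(1)
a = rng.normal(2.0, 0.3, size=(30, 4))    # object type A
b = rng.normal(-2.0, 0.3, size=(30, 4))   # object type B
labels = np.r_[np.ones(30), np.zeros(30)]
domains = [(np.vstack([a, b]), labels),    # sequence 1: A is the target, B is background
           (np.vstack([b, a]), labels)]    # sequence 2: B is the target, A is background
W, heads = train_multi_domain(domains)
for (X, y), h in zip(domains, heads):
    pred = sigmoid(np.maximum(X @ W, 0.0) @ h) > 0.5
    print(np.mean(pred == y))
```

At tracking time, MDNet discards the training heads and attaches a new binary head for the test sequence, which is fine-tuned online. That step is not shown here.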

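Using an RNN?

These are frames 1, 10 and 20 of a video. As the car moves forward at constant speed, the video sequence shows clear temporal correlation.
What makes tracking special: it is a time series in which consecutive frames are correlated.
Can a multi-directional recurrent neural network (RNN) learn the temporal correlations in a tracking video sequence?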

What is an RNN?

An RNN neuron
An RNN unrolled over time
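The two figures above show a single RNN neuron and the same neuron unrolled over time. In code, unrolling is a loop that reuses the same weights at every step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The sketch below is a plain (Elman-style) RNN with random weights. It is not the multi-directional RNN used by the trackers mentioned here, and all names are hypothetical.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Unroll a vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    The same weights are reused at every time step."""
    h = np.zeros(W_hh.shape[0])        # h_0 = 0
    states = []
    for x in xs:                       # one iteration per frame
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, D, H = 20, 3, 8                     # frames, input size, hidden size
xs = rng.normal(size=(T, D))           # e.g. one feature vector per frame
W_xh = rng.normal(0.0, 0.5, size=(H, D))
W_hh = rng.normal(0.0, 0.5, size=(H, H))
b_h = np.zeros(H)
hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(hs.shape)  # (20, 8)
```

Because h_{t-1} feeds into h_t, a change in an early frame still affects later hidden states. That is the temporal dependence the tracking question above hopes to exploit.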

RNN Tracker

CVPR2016


AAAI2016

5. Visual Tracking With The Context

Context information is also very important for tracking.
Recently, several approaches have been proposed that mine auxiliary objects or local visual information surrounding the target to assist tracking.
Context information is especially helpful when the target is fully occluded or leaves the image region.
To improve tracking performance, some tracker-fusion methods have also been proposed recently.

Context-Aware Visual Tracking

The environment can also be advantageous to the tracker if it contains objects that are correlated with the target.

Question: is the object being followed by the tracker really the target?
Answer: use the dynamic environment!


How to track a face in a crowd?

  • It is almost impossible to learn a discriminative model that distinguishes the face of interest from the rest of the crowd.

Why do we have to focus our attention only on the target?

  • If the person (with that face) is wearing a fairly unique shirt (or hat), then including the shirt (or hat) in the matching will make tracking much easier and more robust.
  • If another face always accompanies the target face, we can treat the two as a geometric structure and track them as a group.

It seems that:

  • A target is seldom isolated from, or independent of, the entire scene.
  • There may exist objects that have short-term or long-term motion correlations with the target.

So why not track the target and auxiliary objects as a group?

What are auxiliary objects?

  • Frequent co-occurrence with the target.
  • Consistent motion correlation with the target.
  • Suitable for tracking.

This definition may cover a large variety of image regions or features:

  • Simple, generic and low-level is better.
  • Choose color regions rather than other features.
  • Reason: color regions can be tracked reliably and efficiently.
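The first two criteria for auxiliary objects, frequent co-occurrence and consistent motion correlation, can be checked with a few lines of NumPy once the candidate color regions have been tracked. This is a sketch, not the data-mining procedure of the original paper. The track format, the use of the Pearson correlation of frame-to-frame displacements, and both thresholds are assumptions.

```python
import numpy as np

def is_auxiliary(target_track, region_track, min_cooccur=0.8, min_corr=0.7):
    """Decide whether a tracked color region qualifies as an auxiliary object.
    Tracks are (T, 2) arrays of per-frame centres, with NaN where the object was
    not found. The thresholds are illustrative, not values from the paper."""
    both = ~np.isnan(target_track).any(axis=1) & ~np.isnan(region_track).any(axis=1)
    if both.mean() < min_cooccur or both.sum() < 3:
        return False                              # criterion 1: frequent co-occurrence
    # Frame-to-frame displacements over the frames where both were found.
    d_target = np.diff(target_track[both], axis=0).ravel()
    d_region = np.diff(region_track[both], axis=0).ravel()
    corr = np.corrcoef(d_target, d_region)[0, 1]
    return bool(corr >= min_corr)                 # criterion 2: consistent motion

rng = np.random.default_rng(2)
target = np.cumsum(rng.normal(0.0, 3.0, size=(50, 2)), axis=0)        # wandering target
hat = target + np.array([0.0, -20.0]) + rng.normal(0.0, 0.5, size=(50, 2))
wall = np.array([100.0, 100.0]) + rng.normal(0.0, 0.5, size=(50, 2))  # static background
flicker = hat.copy()
flicker[::2] = np.nan                                                  # found in half the frames
print(is_auxiliary(target, hat), is_auxiliary(target, wall), is_auxiliary(target, flicker))
```

The hat moves with the person and is accepted. The static wall and the region that is found in only half the frames are both rejected. The third criterion, suitability for tracking, is not modeled here.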

Experiments


(The yellow bounding box is the target; the red boxes are the color regions.)

Tracking the Invisible: Learning Where the Object Might be

It is well known that context helps object detection.
For example, one of the strongest predictors of a vehicle's presence and location in an image is the shadow it casts on the road.


In tracking, many temporary, but potentially very strong links exist between the tracked object and the rest of the image.

Local image features vote for the object position.

  • An Implicit Shape Model is used to select the local image features.
  • Object points lie on the object surface and thus always have a strong correlation with the object motion (green points).
  • Points on other independently moving objects or on the static background are considered to carry no information about the object position (blue points).
  • Supporters are features that are useful for predicting the target object position. They move, at least temporarily, in a way that is statistically related to the motion of the target (red points).

Thus the position of an object can be estimated even when it is not seen directly (e.g., when it is fully occluded or outside the image region).
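The voting idea can be sketched as follows. While the target is visible, each supporter learns its offset to the object centre. When the target disappears, the supporters' current positions plus their offsets still vote for where it should be. This simplification takes the median of the votes instead of accumulating them in a Hough-style voting space. The class name and the `momentum` parameter are hypothetical.

```python
import numpy as np

class SupporterModel:
    """Supporters vote for the object centre through learned offsets."""

    def __init__(self, n_supporters, momentum=0.8):
        self.offsets = np.zeros((n_supporters, 2))  # learned (target - supporter) offsets
        self.seen = np.zeros(n_supporters, dtype=bool)
        self.momentum = momentum

    def learn(self, supporter_pos, target_pos):
        # Call only while the target is visible: blend in the newly observed offsets.
        new = target_pos - supporter_pos
        blended = self.momentum * self.offsets + (1.0 - self.momentum) * new
        self.offsets = np.where(self.seen[:, None], blended, new)
        self.seen[:] = True

    def predict(self, supporter_pos):
        # Works even if the target is occluded: every supporter casts a vote and
        # the median keeps a few unrelated points from dragging the estimate.
        votes = supporter_pos + self.offsets
        return np.median(votes, axis=0)

true_offsets = np.array([[5.0, 0.0], [-4.0, 3.0], [0.0, -6.0], [2.0, 2.0], [-3.0, -3.0]])
target0 = np.array([50.0, 50.0])
model = SupporterModel(n_supporters=5)
model.learn(target0 - true_offsets, target0)   # frame 0: target visible

target1 = target0 + np.array([10.0, 5.0])      # frame 1: target moves and is occluded
supporters1 = target1 - true_offsets
supporters1[4] = (target0 - true_offsets)[4]   # one point stayed behind
print(model.predict(supporters1))              # [60. 55.]
```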
How are the supporters chosen?


Experiments

We can see what we cannot see

Context Tracker: Exploring Supporters and Distracters

Visual tracking is very challenging when the target leaves the field of view: the tracker may start following another similar object and fail to reacquire the right target when it reappears.
There is additional information that can be exploited instead of using only the object region.

What are supporters and distracters?
Distracters

  • Regions that have an appearance similar to the target
  • Consistently co-occur with the target
  • The tracker must keep tracking these distracters to avoid drifting
  • Dangerous


Supporters

  • Local key-points around the target
  • Consistently co-occur with the target
  • Have motion correlation with the target
  • Useful


Experiments

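6. Directions for visual tracking

Improve the descriptive power of target features

  • Sufficiently strong features can cope with most adverse environmental effects.

Improve the real-time performance of the system

  • Search strategies have to traverse many redundant regions, which greatly hurts the real-time performance of tracking algorithms.
  • How can the target search range be narrowed?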