NoteSet | SSL

Introductory Notes

Recent Progress in Self-Supervised Learning, by Naiyan Wang

https://zhuanlan.zhihu.com/p/30265894


Yann LeCun, Self-Supervised Learning: Can Machines Learn Like Humans? (110-page slides + video)

https://cloud.tencent.com/developer/article/1356966


What are some newer ideas in self-learning (Self Learning)? Answer by Xiaolong Wang

https://www.zhihu.com/question/267563087/answer/327486390


ICML 2019 workshop on self-supervised learning

https://sites.google.com/view/self-supervised-icml2019


LeCun IJCAI-18 slides / Zisserman ICML-19 slides

my MacBook :)


Paper Notes

Multi-task Self-Supervised Visual Learning (ICCV 2017)

https://blog.csdn.net/hibercraft/article/details/80150148

Learning image representations with multiple self-supervised tasks jointly.

Introduction

Vision is one of the most promising domains for unsupervised learning. Unlabeled images and video are available in practically unlimited quantities, and the most prominent present image models—neural networks—are data starved, easily memorizing even random labels for large image collections. Yet unsupervised algorithms are still not very effective for training neural networks: they fail to adequately capture the visual semantics needed to solve real-world tasks like object detection or geometry estimation the way strongly-supervised methods do. For most vision problems, the current state-of-the-art approach begins by training a neural network on ImageNet or a similarly large dataset which has been hand-annotated.

How might we better train neural networks without manual labeling? Neural networks are generally trained via backpropagation on some objective function. Without labels, however, what objective function can measure how good the network is? Self-supervised learning answers this question by proposing various tasks for networks to solve, where performance is easy to measure, i.e., performance can be captured with an objective function like those seen in supervised learning. Ideally, these tasks will be difficult to solve without understanding some form of image semantics, yet any labels necessary to formulate the objective function can be obtained automatically. In the last few years, a considerable number of such tasks have been proposed [1, 2, 6, 7, 8, 17, 20, 21, 23, 25, 26, 27, 28, 29, 31, 39, 40, 42, 43, 46, 47], such as asking a neural network to colorize grayscale images, fill in image holes, solve jigsaw puzzles made from image patches, or predict movement in videos. Neural networks pre-trained with these tasks can be re-trained to perform well on standard vision tasks (e.g. image classification, object detection, geometry estimation) with less manually-labeled data than networks which are initialized randomly. However, they still perform worse in this setting than networks pre-trained on ImageNet.
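To make the "objective without labels" idea concrete, here is a minimal sketch (my own illustration, not from the paper) of one of the pretext tasks listed above, inpainting: the label is simply the image region that was masked out, so the loss looks exactly like an ordinary supervised regression loss. The model and hole size are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def inpainting_loss(model, images, hole=32):
    """Sketch of an inpainting pretext objective.

    `model` is any image-to-image network (hypothetical); the supervisory
    signal is derived from the image itself, so no manual labels are needed.
    """
    n, c, h, w = images.shape
    y0, x0 = (h - hole) // 2, (w - hole) // 2
    masked = images.clone()
    masked[:, :, y0:y0 + hole, x0:x0 + hole] = 0.0  # cut out a central hole
    pred = model(masked)
    # measure reconstruction error only on the hidden region
    return F.mse_loss(pred[:, :, y0:y0 + hole, x0:x0 + hole],
                      images[:, :, y0:y0 + hole, x0:x0 + hole])
```

The same pattern—derive the target from the data, then backpropagate an ordinary loss—also covers colorization, jigsaw solving, and motion prediction.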

Related Work

Self-supervised methods

Two categories: use auxiliary information / use raw pixels.

video & image


TextTopicNet: Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

https://blog.csdn.net/qq_26074263/article/details/81277630

Supervising image features with text.


Split-Brain Autoencoders

https://richzhang.github.io/splitbrainauto/

Prediction between different channels as supervision: L-ab, RGB-D.
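A rough sketch of the cross-channel idea (my own illustration in PyTorch, not the authors' code): the network is split into two disjoint halves, one predicting ab from L and the other predicting L from ab, and the learned representation concatenates features from both halves. The paper quantizes the target channels and uses a classification loss; plain regression is used here for brevity, and the layer sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the two disjoint sub-networks.
half_l_to_ab = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(16, 2, 3, padding=1))
half_ab_to_l = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                             nn.Conv2d(16, 1, 3, padding=1))

def split_brain_loss(lab):
    """Each half predicts the channels it does not see (Lab input)."""
    l, ab = lab[:, :1], lab[:, 1:]
    return (F.mse_loss(half_l_to_ab(l), ab) +
            F.mse_loss(half_ab_to_l(ab), l))
```

The same split works for RGB-D: one half predicts depth from color, the other color from depth.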


Unsupervised Visual Representation Learning by Context Prediction

relative position

Abstract

This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. (Self-supervision.)

Introduction

1. Scarcity of labeled data

Recently, new computer vision methods have leveraged large datasets of millions of labeled examples to learn rich, high-performance visual representations.

Yet efforts to scale these methods to truly Internet-scale datasets (i.e. hundreds of billions of images) are hampered by the sheer expense of the human annotation required.

A natural way to address this difficulty would be to employ unsupervised learning, which aims to use data without any annotation.

2. Motivation: context in the text domain

This converts an apparently unsupervised problem (finding a good similarity metric between words) into a “self-supervised” one: learning a function from a given word to the words surrounding it.

Here the context prediction task is just a “pretext” to force the model to learn a good word embedding, which, in turn, has been shown to be useful in a number of real tasks, such as semantic word similarity.

3. Our paper

Our underlying hypothesis is that doing well on this task requires understanding scenes and objects, i.e., a good visual representation for this task will need to extract objects and their parts in order to reason about their relative spatial location. (The role of the pretext task.)

“Objects,” after all, consist of multiple parts that can be detected independently of one another, and which occur in a specific spatial configuration (if there is no specific configuration of the parts, then it is “stuff” [1]).

We demonstrate that the resulting visual representation is good for both object detection, providing a significant boost on PASCAL VOC 2007 compared to learning from scratch, as well as for unsupervised object discovery / visual data mining. This means, surprisingly, that our representation generalizes across images, despite being trained using an objective function that operates on a single image at a time. That is, instance-level supervision appears to improve performance on category-level tasks.

Related Work

1. Generative models

Problem: Generative models have shown promising performance on smaller datasets such as handwritten digits [25, 24, 48, 30, 46], but none have proven effective for high-resolution natural images. (As of 2016.)

2. Unsupervised learning

Problem: We believe that current reconstruction-based algorithms struggle with low-level phenomena, like stochastic textures, making it hard to even measure whether a model is generating well.

Text domain: context prediction

Various pretext tasks: However, such a task would be trivial, since discriminating low-level color statistics and lighting would be enough. To make the task harder and more high-level, in this paper, we instead classify between multiple possible configurations of patches sampled from the same image, which means they will share lighting and color statistics, as shown in Figure 2.
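A sketch of how such a patch-pair example can be generated (the patch and gap sizes are illustrative choices, not necessarily the paper's exact settings): sample a center patch plus one of its eight neighbors, separated by a gap, and use the neighbor's index as an 8-way classification label.

```python
import numpy as np

# Eight possible positions of the neighbor patch relative to the center.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           ( 0, -1),          ( 0, 1),
           ( 1, -1), ( 1, 0), ( 1, 1)]

def sample_pair(image, patch=96, gap=48, rng=np.random):
    """Sample (center, neighbor, label) from an HxWx3 image.

    The gap keeps the two patches from sharing a boundary, which would
    otherwise make the task trivially solvable from low-level cues.
    """
    h, w, _ = image.shape
    step = patch + gap
    y = rng.randint(step, h - step - patch)
    x = rng.randint(step, w - step - patch)
    label = rng.randint(len(OFFSETS))
    dy, dx = OFFSETS[label]
    center = image[y:y + patch, x:x + patch]
    neighbor = image[y + dy * step:y + dy * step + patch,
                     x + dx * step:x + dx * step + patch]
    return center, neighbor, label  # train an 8-way classifier on the pair
```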

Another line of work in unsupervised learning from images aims to ...

Video

Our work

Avoiding trivial solutions

When designing a pretext task, care must be taken to ensure that the task forces the network to extract the desired information (high-level semantics, in our case), without taking “trivial” shortcuts. In our case, low-level cues like boundary patterns or textures continuing between patches could potentially serve as such a shortcut. Hence, for the relative prediction task, it was important to include a gap between patches.

However, even these precautions are not enough: we were surprised to find that, for some images, another trivial solution exists. We traced the problem to an unexpected culprit: chromatic aberration.
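One of the countermeasures the paper describes is "color dropping": keep one color channel and replace the others with noise, so the network cannot exploit the lens's color fringing to localize a patch within the image. A minimal numpy sketch (the noise scale is my own choice):

```python
import numpy as np

def drop_color_channels(patch, rng=np.random):
    """Keep one random color channel; replace the other two with noise."""
    out = patch.astype(np.float32).copy()
    keep = rng.randint(3)  # channel to preserve
    for ch in range(3):
        if ch != keep:
            out[..., ch] = rng.normal(out[..., ch].mean(), 1.0,
                                      size=out.shape[:2])
    return out
```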



Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Abstract

By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection.

We show that the CFN (context-free network) includes fewer parameters than AlexNet while preserving the same semantic learning capabilities.
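A sketch of how the jigsaw training data can be produced (my own illustration; the paper selects a permutation set with large Hamming distances between permutations, whereas random permutations are used here):

```python
import numpy as np

rng = np.random.default_rng(0)
# A small fixed permutation set; its index serves as the class label.
PERMUTATIONS = [rng.permutation(9) for _ in range(100)]

def make_puzzle(image, tile=64):
    """Cut a 3x3 grid of tiles from `image` (at least 3*tile per side)
    and shuffle them with a permutation from the fixed set."""
    tiles = [image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
             for r in range(3) for c in range(3)]
    label = int(rng.integers(len(PERMUTATIONS)))
    shuffled = [tiles[i] for i in PERMUTATIONS[label]]
    return shuffled, label  # the CFN sees the 9 tiles and predicts `label`
```

Because every tile must land in exactly one slot, seeing all nine tiles together resolves ambiguities that any single tile would leave open, which is the paper's core argument.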

Introduction

1. Label scarcity in vision

However, as manually labeled data can be costly, unsupervised learning methods are gaining momentum.

2. Self-supervision

... have explored a novel paradigm for unsupervised learning called self-supervised learning. The main idea is to exploit different labelings that are freely available besides or within visual data, and to use them as intrinsic reward signals to learn general-purpose features.

The features obtained with these approaches have been successfully transferred to classification and detection tasks, and their performance is very encouraging when compared to features trained in a supervised manner.

We introduce a novel self-supervised task, the Jigsaw puzzle reassembly problem (see Fig. 1), which builds features that yield high performance when transferred to detection and classification tasks.

3. Our work

We argue that solving Jigsaw puzzles can be used to teach a system that an object is made of parts and what these parts are. The association of each separate puzzle tile to a precise object part might be ambiguous. However, when all the tiles are observed, the ambiguities might be eliminated more easily because the tile placement is mutually exclusive. This argument is supported by our experimental validation. Training a Jigsaw puzzle solver takes about 2.5 days compared to 4 weeks of [10]. Also, there is no need to handle chromatic aberration or to build robustness to pixelation. Moreover, the features are highly transferable to detection and classification and yield the highest performance to date for an unsupervised method.

Related Work

1. Representation learning

transfer learning / pre-training (framing it this way seems fine too; see the experiments later)

2. Unsupervised learning

Three categories: probabilistic, direct mapping (autoencoders), and manifold learning.

3. Self-supervised learning





PixelCNN

https://blog.csdn.net/Jasminexjf/article/details/82499513
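The core trick in PixelCNN is a masked convolution that keeps the model autoregressive: each pixel's prediction may depend only on pixels above it and to its left. A minimal PyTorch sketch (my own, not the original implementation; the RGB channel-ordering masks are omitted for brevity):

```python
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    """Conv2d whose kernel is zeroed to the right of / below the center.

    Mask type 'A' (first layer) also hides the center pixel itself;
    type 'B' (later layers) may see it.
    """
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        _, _, kh, kw = self.weight.shape
        mask = torch.ones(kh, kw)
        mask[kh // 2, kw // 2 + (1 if mask_type == 'B' else 0):] = 0
        mask[kh // 2 + 1:, :] = 0
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask  # re-apply the mask on every call
        return super().forward(x)

# Example: the first layer of a PixelCNN over grayscale images.
first = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
```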



Weak Supervision

Weak supervision is usually divided into three types:

1. Incomplete supervision: only a few samples are labeled. Approaches: active learning, semi-supervised learning, transfer learning.

2. Inexact supervision: labels are coarse-grained. Approach: multi-instance learning.

3. Inaccurate supervision: labels are noisy.

Letting machines "know autumn from a single leaf": weakly supervised visual semantic segmentation

https://blog.csdn.net/xwukefr2tnh4/article/details/80479335

Exploring weakly supervised learning in medical imaging

http://www.sohu.com/a/240831591_133098

Survey paper by Prof. Zhi-Hua Zhou (Nanjing University): weakly supervised learning

A brief introduction to weakly supervised learning
