
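Preface

ggplot is a plotting system with a complete, consistent grammar that is easy to pick up. It can be used from both R and Python (in Python through ports such as plotnine), and it is used very widely for data analysis and visualization. This post introduces the ggplot2 package from the R side. First, here are the reasons I think make it most worth recommending:

It builds plots by stacking "layers". This makes it easy to relate different plots to one another, and it also makes the package easier to learn and understand; long-time Photoshop users will appreciate how convenient this is.
It covers a wide range of uses and is thoroughly documented: typing ? followed by a function name in R opens that function's help page, complete with examples.
It can be used in both R and Python, which lowers the cost of switching between the two languages.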

Basic concepts

This post uses diamonds, a dataset bundled with ggplot2.

> head(diamonds)
# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

# Variable meanings
price  : price in US dollars ($326–$18,823)
carat  : weight of the diamond (0.2–5.01)
cut    : quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color  : diamond colour, from D (best) to J (worst)
clarity: a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x      : length in mm (0–10.74)
y      : width in mm (0–58.9)
z      : depth in mm (0–31.8)
depth  : total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43–79)
table  : width of top of diamond relative to widest point (43–95)
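Here is a worked check of the depth formula above. The short base-R sketch below recomputes depth from x, y and z for the six rows printed by head(diamonds). The ratio is multiplied by 100 to express it as a percentage. Small mismatches are expected, because x, y and z are themselves rounded to two decimals. The data frame `d` is typed in by hand from that output, so nothing beyond base R is assumed.

```r
# The six rows of diamonds printed above, typed in by hand
d <- data.frame(
  x     = c(3.95, 3.89, 4.05, 4.20, 4.34, 3.94),
  y     = c(3.98, 3.84, 4.07, 4.23, 4.35, 3.96),
  z     = c(2.43, 2.31, 2.31, 2.63, 2.75, 2.48),
  depth = c(61.5, 59.8, 56.9, 62.4, 63.3, 62.8)
)

# depth = z / mean(x, y) * 100 = 2 * z / (x + y) * 100, rounded to one decimal
d$depth_calc <- round(2 * d$z / (d$x + d$y) * 100, 1)
print(d[, c("depth", "depth_calc")])
```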

Building on the ideas of layers and a canvas, ggplot2 derives the following grammar framework:

Image source: https://mp.weixin.qq.com/s/uskZWGAwfK9BVqLBQIXpGA

[Figure: the ggplot2 grammar framework]

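data: the data source, usually a data.frame; other inputs are converted to one
Global and local mappings: the mapping = aes() argument of ggplot() is a global mapping, inherited by every subsequent geom_xxx() and stat_xxx(); a mapping given inside a geom_xxx() or stat_xxx() is local and applies only to that layer (a layer can opt out of the global mapping with inherit.aes = FALSE)
mapping: maps variables to aesthetics, including colour-type aesthetics (color, fill), shape-type aesthetics (linetype, size, shape) and position aesthetics (x, y, etc.)
geom_xxx: geometric objects; common ones include points, lines, bars and histograms, plus helpers for drawing smoothed curves, sloped lines, horizontal and vertical lines, text, and so on
aesthetic attributes: graphical parameters such as colour, size and shape; when set outside aes() they are fixed values rather than mappings
facetting: splits the dataset into several subsets and draws the same chart for each subset
theme: controls the look of the chart's non-data elements (background, grid lines, fonts, etc.)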
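Here is a minimal sketch of global versus local mappings. It assumes only ggplot2 and a small made-up data frame; `df` and its columns `x`, `y`, `g` are hypothetical. x and y are set once in ggplot() and inherited by both layers, while colour is mapped only in geom_point(). layer_data() is used to inspect what each layer actually received.

```r
library(ggplot2)

# Hypothetical toy data: three groups of two points
df <- data.frame(
  x = 1:6,
  y = c(2, 3, 5, 4, 6, 7),
  g = rep(c("a", "b", "c"), each = 2)
)

# Global mapping: x and y are inherited by every layer below
p <- ggplot(df, mapping = aes(x = x, y = y)) +
  # Local mapping: colour applies to the point layer only
  geom_point(aes(colour = g), size = 3) +
  # No colour mapping here, so the line keeps its fixed default colour
  geom_line()
print(p)

# Colours actually assigned in each layer
print(unique(layer_data(p, 1)$colour))
print(unique(layer_data(p, 2)$colour))
```

The line layer never sees the colour mapping, so all of its rows share one fixed colour.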

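ggplot(data = NULL, mapping = aes(x = , y = )) +   # dataset and global mapping
    geom_xxx() | stat_xxx() +  # geometric layer or statistical transformation (either one)
    coord_xxx() +  # coordinate transformation; Cartesian by default
    scale_xxx() +  # scales: adjust individual scales
    facet_xxx() +  # facetting: split the plot into panels by one or more variables
    guides() +     # legend adjustments
    theme()        # theme system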
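To make the template concrete, here is one possible way to fill in each slot with the diamonds data. The particular choices are illustrative assumptions, not recommendations: a log10 price axis, one panel per color grade, and enlarged legend keys via guide_legend(override.aes = ...). The point is only that each component is one more `+` layer.

```r
library(ggplot2)

p <- ggplot(data = diamonds, mapping = aes(x = carat, y = price)) +  # data + global mapping
  geom_point(aes(colour = cut), alpha = 0.3, size = 0.5) +            # geometric layer
  coord_cartesian(xlim = c(0, 3)) +                                   # coordinate system
  scale_y_log10() +                                                   # scale: log10 y axis
  facet_wrap(~ color) +                                               # one panel per color grade
  guides(colour = guide_legend(override.aes = list(size = 3, alpha = 1))) +  # legend keys
  theme_minimal()                                                     # theme
print(p)
```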

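These concepts are easier to digest after reading the whole post, so feel free to come back to them; they serve as a summary. Once you have mastered them, you have understood the core logic of ggplot2.

The meaning of several core concepts can also be seen at a glance in RStudio's official ggplot2 cheat sheet: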

[Figure: RStudio ggplot2 cheat sheet]

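Some examples

The examples below, each with R code, introduce ggplot2's syntax step by step, from simple to advanced.

1. A scatter plot with all the trimmings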

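library(ggplot2)

# Use the diamonds dataset
ggplot(diamonds) +
  # Scatter plot: x = carat, y = price, point colour by the color column;
  # alpha = transparency, size = point size, shape = 15 (filled square), stroke = border width
  geom_point(aes(x = carat, y = price, colour = color), alpha = 0.7, size = 1.0, shape = 15, stroke = 1) +
  # Add a fitted line (method = "glm" with its default Gaussian family gives a linear fit)
  geom_smooth(aes(x = carat, y = price), method = "glm") +
  # Add a horizontal reference line (lines take linewidth instead of size since ggplot2 3.4.0)
  geom_hline(yintercept = 0, linewidth = 1, linetype = "dotted", color = "black") +
  # Add a vertical reference line
  geom_vline(xintercept = 3, linewidth = 1, linetype = "dotted", color = "black") +
  # Add the plot title and axis titles
  labs(title = "Diamonds Point Plot", x = "Carat", y = "Price") +
  # Zoom to the visible axis ranges without dropping any data
  coord_cartesian(xlim = c(0, 3), ylim = c(0, 20000)) +
  # Switch to a clean theme; the ggthemes package offers many more
  theme_linedraw()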

[Figure: output of the scatter plot code above]
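Example 1 uses coord_cartesian() to restrict the visible range. This differs from setting limits on the scale with xlim(), ylim() or scale_x_continuous(limits = ...). Coordinate limits only zoom the view. Scale limits discard out-of-range observations before statistics such as geom_smooth() are computed, which can change the fitted line. Here is a minimal sketch with a hypothetical five-point data frame:

```r
library(ggplot2)

# Hypothetical toy data
df <- data.frame(x = 1:5, y = c(2, 4, 5, 4, 6))
base <- ggplot(df, aes(x, y)) + geom_point()

# Zoom: only the visible window changes; all five points remain in the layer data
p_zoom <- base + coord_cartesian(xlim = c(1, 3))
# Scale limits: x values outside c(1, 3) are out of bounds and become NA (dropped when drawn)
p_limit <- base + scale_x_continuous(limits = c(1, 3))

zoom_x  <- layer_data(p_zoom)$x
limit_x <- layer_data(p_limit)$x
print(zoom_x)
print(limit_x)
```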

2. Custom plot layout & multiple geoms

library(ggplot2)
library(gridExtra)

# Build a small dataset
df <- data.frame(
  x = c(3, 1, 5),
  y = c(2, 4, 6),
  label = c("a", "b", "c")
)

p <- ggplot(df, aes(x, y, label = label)) +
  # Remove both axis titles
  labs(x = NULL, y = NULL) +
  # Switch theme
  theme_linedraw()

p1 <- p + geom_point() + ggtitle("point")
p2 <- p + geom_text() + ggtitle("text")
p3 <- p + geom_col() + ggtitle("bar")      # same as geom_bar(stat = "identity")
p4 <- p + geom_tile() + ggtitle("raster")
p5 <- p + geom_line() + ggtitle("line")
p6 <- p + geom_area() + ggtitle("area")
p7 <- p + geom_path() + ggtitle("path")
p8 <- p + geom_polygon() + ggtitle("polygon")

# Collect the ggplot objects in a list
plots <- list(p1, p2, p3, p4, p5, p6, p7, p8)
# Arrange them in a custom grid with 4 columns
gridExtra::grid.arrange(grobs = plots, ncol = 4)

[Figure: the eight geoms arranged in a 2 × 4 grid]
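In the grid above, "line" and "path" look different even though they use the same three points. The sketch below, which reuses the x and y values of df, shows why. geom_line() joins points in order of x, whereas geom_path() joins them in the order the rows appear in the data. layer_data() shows the order each geom will use.

```r
library(ggplot2)

# Same x/y values as df above; the rows are not sorted by x
df <- data.frame(x = c(3, 1, 5), y = c(2, 4, 6))

p_line <- ggplot(df, aes(x, y)) + geom_line()  # connects points in order of x
p_path <- ggplot(df, aes(x, y)) + geom_path()  # connects points in row order

line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x
print(line_x)
print(path_x)
```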

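3. Box plots

A box plot is a standard statistical graphic for showing how spread out the data are. In exploratory analysis it is often used to show the spread of a response variable across the levels of a factor.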
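Concretely, the box spans the first to third quartile, with a line at the median. By default the whiskers reach the most extreme values within 1.5 × IQR of the box, and points beyond that are drawn as outliers. The sketch below assumes ggplot2 is installed for the diamonds data. It compares quartiles computed with quantile() to the values geom_boxplot() computes for the same data:

```r
library(ggplot2)

# Quartiles of carat within each cut level, computed directly
q <- sapply(split(diamonds$carat, diamonds$cut), quantile, probs = c(0.25, 0.5, 0.75))
print(round(q, 2))

# Box statistics computed by geom_boxplot() (via stat_boxplot()) for the same data
p  <- ggplot(diamonds, aes(x = cut, y = carat)) + geom_boxplot()
bx <- layer_data(p)[, c("lower", "middle", "upper")]
print(bx)
```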

Below are some of the most common ways to use box plots:

library(ggplot2) # plotting
library(ggsci)   # journal-style colour palettes

# diamonds data frame: categorical variable cut, target variable carat
p <- ggplot(diamonds, aes(x = cut, y = carat)) +
  theme_linedraw()

# One factor: fill the boxes by category; legend.position = "none" hides the legend
p1 <- p + geom_boxplot(aes(fill = cut)) + theme(legend.position = "none")
# Two factors: one on the x axis, the other distinguished by fill colour
p2 <- p + geom_boxplot(aes(fill = color)) + theme(legend.position = "none")
# Flip the box plot horizontally
p3 <- p + geom_boxplot(aes(fill = cut)) + coord_flip() + theme(legend.position = "none")
# Ready-made palettes: ggsci's scale_fill_jama(), scale_fill_nejm(), scale_fill_lancet(),
# or ggplot2's scale_fill_brewer() (blues by default)
p4 <- p + geom_boxplot(aes(fill = cut)) + scale_fill_brewer() + theme(legend.position = "none")

# Collect the ggplot objects in a list
plots <- list(p1, p2, p3, p4)
# Arrange them in a 2-column grid
gridExtra::grid.arrange(grobs = plots, ncol = 2)

[Figure: the four box plot variants]

When the box plots of a continuous variable involve several categorical variables, facetting often makes the chart easier to read.

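library(ggplot2)

ggplot(diamonds, aes(x = color, y = carat)) +
  # Switch theme
  theme_linedraw() +
  # Fill the boxes according to the factor color
  geom_boxplot(aes(fill = color)) +
  # Facetting: split the data frame into subsets by the factor cut and draw the same box plot for each subset
  # scales = "free" gives each panel its own axis ranges; with the default fixed scales, panels whose
  # data ranges differ a lot get squeezed, although free scales make cross-panel comparison harder
  facet_wrap(~cut, scales = "free")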

[Figure: box plots of carat by color, faceted by cut]
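Facetting can be read as "split, then plot": facet_wrap(~cut) is conceptually the same as splitting diamonds by cut and drawing one box plot per subset. The quick sketch below shows that correspondence by counting the panels ggplot2 builds and the subsets split() produces:

```r
library(ggplot2)

# Same facetted plot as above, without the styling
p <- ggplot(diamonds, aes(x = color, y = carat)) +
  geom_boxplot() +
  facet_wrap(~cut, scales = "free")

# Panel assigned to each computed box
ld <- layer_data(p)
print(table(ld$PANEL))

# The "split" half of the idea, done by hand: one data subset per cut level
subsets <- split(diamonds, diamonds$cut)
print(sapply(subsets, nrow))
```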

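4. Bar charts

Strictly speaking, the charts below are bar charts: geom_bar() counts the rows for each level of a discrete variable. A histogram of a continuous variable (carat, for example) is drawn with geom_histogram().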

library(ggplot2)

# Plain bar chart of counts
p1 <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut)) +
  theme_linedraw() +
  scale_fill_brewer()

# Stacked bar chart ("stack" is the default; "identity" would overlap the bars instead)
p2 <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "stack") +
  theme_linedraw() +
  scale_fill_brewer()

# Proportional (100%) stacked bar chart: every bar is rescaled to height 1
p3 <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill") +
  theme_linedraw() +
  scale_fill_brewer()

# Grouped (side-by-side) bar chart
p4 <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge") +
  theme_linedraw() +
  scale_fill_brewer()

# Collect the ggplot objects in a list
plots <- list(p1, p2, p3, p4)
# Arrange them in a 2-column grid
gridExtra::grid.arrange(grobs = plots, ncol = 2)

[Figure: plain, stacked, proportional and grouped bar charts]
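The heights drawn by geom_bar() are plain row counts, computed by its default stat, stat_count(). position = "fill" turns each bar into within-bar proportions. The base-R sketch below reproduces both numbers with table() and prop.table(), which can be handy for checking a chart against the data:

```r
library(ggplot2)

# Row counts per cut level, by hand and as computed for geom_bar()
counts <- table(diamonds$cut)
p <- ggplot(diamonds, aes(x = cut)) + geom_bar()
bar_counts <- layer_data(p)$count
print(counts)

# What position = "fill" shows: proportions of each clarity within each cut (rows sum to 1)
props <- prop.table(table(diamonds$cut, diamonds$clarity), margin = 1)
print(round(props, 3))
```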

5. Coordinate systems

Besides coord_flip(), used above to flip the box plot, ggplot2 provides many other functions for working with coordinate systems.

library(ggplot2)

bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1) +
  # Fix the aspect ratio at 1 so both panels are square
  theme(aspect.ratio = 1) +
  scale_fill_brewer() +
  labs(x = NULL, y = NULL)

# Swap the x and y axes
bar1 <- bar + coord_flip()
# Polar coordinates: the bars become the wedges of a coxcomb (rose) chart
bar2 <- bar + coord_polar()

# Collect the ggplot objects in a list
plots <- list(bar1, bar2)
# Arrange them side by side
gridExtra::grid.arrange(grobs = plots, ncol = 2)

[Figure: flipped bar chart and polar-coordinate bar chart]
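A common use of coord_polar() is the pie chart. A pie chart is a single stacked bar whose y axis (the counts) is mapped to the angle with theta = "y". Here is a minimal sketch using the same diamonds counts; factor(1) is just a placeholder x value so that all counts stack into one bar:

```r
library(ggplot2)

# One stacked bar: constant x, bar split by cut
pie <- ggplot(diamonds, aes(x = factor(1), fill = cut)) +
  geom_bar(width = 1) +
  # Map the y axis (the counts) to the angle to turn the stacked bar into a pie
  coord_polar(theta = "y") +
  scale_fill_brewer() +
  labs(x = NULL, y = NULL)
print(pie)
```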

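6. Tile plots and heat maps

During exploratory analysis for machine learning, corrplot can draw the correlation coefficients of all variables in one call, giving a quick overview of how the variables are correlated.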

library(corrplot)
# Compute the correlation matrix of mtcars and visualize it
mycor <- cor(mtcars)
corrplot(mycor, tl.col = "black")

[Figure: corrplot correlation plot of mtcars]

ggplot2 allows more customized tile plots:

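library(ggplot2)
library(RColorBrewer)  # ColorBrewer palettes such as "Reds"
# Correlation matrix rounded to 2 decimals
corr <- round(cor(mtcars), 2)
# Reshape the matrix into long format: one row per (Var1, Var2, value) cell
df <- reshape2::melt(corr)
p1 <- ggplot(df, aes(x = Var1, y = Var2, fill = value, label = value)) +
  geom_tile() +
  theme_bw() +
  # Print each coefficient on its tile; a fixed text size belongs outside aes()
  geom_text(size = 3, color = "white") +
  labs(title = "mtcars - Correlation plot") +
  theme(text = element_text(size = 10), legend.position = "none", aspect.ratio = 1)
# Sequential red palette
p2 <- p1 + scale_fill_distiller(palette = "Reds")
# Diverging gradient centred at 0 (negative vs positive correlations)
p3 <- p1 + scale_fill_gradient2()
gridExtra::grid.arrange(p1, p2, p3, ncol = 3)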

[Figure: the three ggplot2 correlation heat maps]
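The heat map code relies on reshape2::melt() to turn the correlation matrix into long format, with one row per Var1/Var2 pair, which is what geom_tile() needs. If reshape2 is not installed, base R can produce the same three columns. Here is a sketch that renames the columns to match the names used above:

```r
# Correlation matrix rounded to 2 decimals, as above
corr <- round(cor(mtcars), 2)

# as.table() keeps the row/column names; as.data.frame() then gives one row per cell
df_long <- as.data.frame(as.table(corr))
names(df_long) <- c("Var1", "Var2", "value")
print(head(df_long))
```

The resulting df_long can be passed to the ggplot() call above in place of df.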

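Other articles

1. Machine learning essentials and algorithm principles

Introduction to machine learning: what is machine learning
Machine learning essentials: convex optimization
Machine learning algorithms in plain terms: XGBoost
Machine learning essentials: gradient descent

2. Data analysis and web-scraping case studies

Python data analysis: which film truly earned the title of 2018's "number one" Chinese film
How to inflate page views (PV) with a simple Python crawler, using CSDN as an example
Building your own free proxy IP pool from scratch with Python scripts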

Reference

[1] https://ggplot2-book.org/introduction.html#welcome-to-ggplot2
[2] https://rstudio.com/resources/cheatsheets/
[3] https://r4ds.had.co.nz/data-visualisation.html
[4] https://www.sohu.com/a/320024110_718302

[R language] The best solution for data visualization: ggplot2
