R Visualization: Basic Principles and Usage of ggplot2

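ggplot2 is a third-party visualization package for R, and to a large extent it has replaced base R graphics for everyday plotting. The package was created by Hadley Wickham, Chief Scientist at RStudio, during his PhD, and its powerful plotting logic has made it one of the most popular R packages.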

Introduction

ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same few components: a data set, a set of geoms (visual marks that represent data points), and a coordinate system.

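A statistical graphic is a mapping from data to the aesthetic attributes (aes) of geometric objects (geom). A graphic may also contain statistical transformations of the data (stats), and it is finally drawn in a specific coordinate system (coord). Faceting (facet) can be used to draw the same plot for different subsets of the data.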

To display data values, map variables in the data set to aesthetic properties of the geom, such as size, color, and x and y locations.
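
As a minimal sketch of these three components, the following uses R's built-in mtcars data set. The data set and the chosen columns are assumptions made only for illustration; they do not come from the original article.

```r
library(ggplot2)

# Data set: the built-in mtcars data frame (chosen only for illustration).
# Geom: points, with wt and mpg mapped to the x and y positions.
# Coordinate system: Cartesian, which is also the default when none is given.
p <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg)) +
  coord_cartesian()

print(p)  # draws the scatter plot
```

Leaving out coord_cartesian() should give the same picture, since it is the default coordinate system.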

Basic concepts

  • Data: data
  • Statistical transformation: stats
  • Geometric object: geom
  • Aesthetic attribute: aes
  • Scale: scale
  • Layer: layer
  • Coordinate system: coord
  • Facet: facet

Data and Mapping

Aesthetic mappings map variables in the data to graphical properties; they control the relationship between the data and what is drawn.

An aesthetic is "something you can see", for example:

  1. position (x, y axes)
  2. color ("outside" color)
  3. fill ("inside" color)
  4. shape (points)
  5. linetype
  6. size

Each type of geom accepts only a subset of all aesthetics; refer to the geom help pages to see which mappings each geom accepts. Aesthetic mappings are set with the aes() function.
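
The sketch below illustrates aes() on the built-in mtcars data (an assumption for illustration). It also queries the internal GeomPoint and GeomLine objects for the aesthetics they accept. This is only a convenience; the help pages remain the documented reference.

```r
library(ggplot2)

# Map three variables to aesthetics of the points: position (x, y) and color.
# factor() turns cyl into a discrete variable so it gets a discrete color scale.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)),
             size = 3)  # size is set to a constant here, not mapped to data

print(p)

# Each geom accepts only some aesthetics. ?geom_point lists them; the
# underlying ggproto objects can also be asked directly.
point_aes <- GeomPoint$aesthetics()
line_aes  <- GeomLine$aesthetics()
print(setdiff(point_aes, line_aes))  # accepted by points but not by lines
```

Note that ggplot2 stores the mapping under the British spelling colour, whichever spelling is used in aes().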

Scale

Scales map values in the data space to values in the aesthetic space (color, size, shape, ...). They are reported on the plot through axes and legends, and they control how each aesthetic mapping is rendered.

Scales are modified with a series of functions that follow a scale_<aesthetic>_<type> naming scheme, where <aesthetic> is one of:

  1. position
  2. color and fill
  3. size
  4. shape
  5. line type

The following arguments are common to most scales in ggplot2:

  1. name: the first argument gives the axis or legend title
  2. limits: the minimum and maximum of the scale
  3. breaks: the points along the scale where labels should appear
  4. labels: the labels that appear at each break
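
library(ggplot2)
library(scales)  # provides muted()

# housing is assumed to be a data frame with the columns State,
# Home.Price.Index and Date; it is not defined in this article.
ggplot(housing,
       aes(x = State,
           y = Home.Price.Index)) +
  geom_point(aes(color = Date),
             alpha = 0.5,
             size = 1.5,
             position = position_jitter(width = 0.25, height = 0)) +
  scale_color_continuous(name = "",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"),
                         low = muted("blue"), high = muted("red")) +
  theme(legend.position = "top",
        axis.text = element_text(size = 6))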
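
Because the housing data is not included here, the following self-contained sketch shows the same common scale arguments (name, limits, breaks, labels) on the built-in mtcars data. The data set, axis titles, and break values are assumptions chosen only for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point() +
  # Position scale: scale_<aesthetic>_<type> with aesthetic = x, type = continuous
  scale_x_continuous(name = "Weight (1000 lbs)",
                     limits = c(1, 6),
                     breaks = 1:6) +
  # Color scale: a gradient between two colors, with a custom legend
  scale_color_gradient(name = "Horsepower",
                       breaks = c(100, 200, 300),
                       labels = c("100 hp", "200 hp", "300 hp"),
                       low = "blue", high = "red")

print(p)
```

The x scale ends up as the axis and the color scale as the legend, which is how scales are reported on the plot.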

Geometric Objects (geom)

Geometric objects are the actual marks we put on a plot:

  1. points (geom_point, for scatter plots and dot plots)
  2. lines (geom_line, for time series and trend lines)
  3. boxplots (geom_boxplot, for box-and-whisker plots)

A plot must have at least one geometric object, and there is no upper limit. Add a geom to a plot with the + operator.
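
A short sketch of adding geoms with +, again on the built-in mtcars data (an assumption for illustration). Each geom added becomes one layer of the plot.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                # first layer: the raw observations
  geom_smooth(method = "lm", formula = y ~ x)   # second layer: a fitted straight line

print(p)
print(length(p$layers))  # number of layers in the plot
```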

Statistical Transformations

It's often useful to transform your data before plotting, and that's what statistical transformations do.

Statistic      Explanation
stat_bin       Bins (discretizes) a continuous variable and counts the observations in each bin
stat_smooth    Fits a smoother to the data
stat_density   Estimates the probability density function (PDF)

Every geom function has a default statistic:

  1. geom_histogram = stat_bin + bar
  2. geom_smooth = stat_smooth + line with a confidence ribbon
  3. geom_density = stat_density + area
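
The pairing of geoms and default statistics can be checked with a small sketch like the one below. It uses the built-in mtcars data and an arbitrary bin count, both assumptions for illustration, and it reads the stat field of the layer objects, which is an implementation detail rather than a documented interface.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = mpg))

# geom_histogram() uses stat_bin by default ...
p_geom <- base + geom_histogram(bins = 10)
# ... so calling the stat directly with a bar geom describes the same layer.
p_stat <- base + stat_bin(geom = "bar", bins = 10)

# The statistic attached to each geom's layer.
print(class(geom_histogram()$stat)[1])
print(class(geom_smooth()$stat)[1])
print(class(geom_density()$stat)[1])

# The transformed data that is actually drawn: one row per bin.
binned <- layer_data(p_geom)
print(binned[, c("x", "count")])
```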

Themes

The ggplot2 theme system handles non-data plot elements such as:

  1. Axis labels
  2. Plot background
  3. Facet label background
  4. Legend appearance

Built-in themes include:

  1. theme_gray() (default)
  2. theme_bw()
  3. theme_classic()
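
To create a new theme, start from a built-in theme and override individual elements with theme():

library(ggplot2)
library(scales)  # provides muted()

# Note: ggplot2 3.4.0 and later prefer linewidth over size for the border width.
theme_new <- theme_bw() +
  theme(plot.background = element_rect(size = 1, color = "blue", fill = "black"),
        text = element_text(size = 12, family = "serif", color = "ivory"),
        axis.text.y = element_text(colour = "purple"),
        axis.text.x = element_text(colour = "red"),
        panel.background = element_rect(fill = "pink"),
        strip.background = element_rect(fill = muted("orange")))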
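
A theme is applied by adding it to a plot. The sketch below uses the built-in mtcars data and arbitrary element settings (assumptions for illustration) to show a built-in theme on its own and combined with theme() overrides.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Swap the default theme_gray() for a built-in alternative ...
p_classic <- p + theme_classic()

# ... or start from a built-in theme and override single non-data elements.
p_custom <- p +
  theme_bw() +
  theme(legend.position = "top",
        axis.text = element_text(size = 10))

print(p_custom)
```

Order matters here: theme() should come after the complete theme, because a complete theme such as theme_bw() replaces earlier settings.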

Facet

  • Faceting is ggplot2 parlance for small multiples
  • The idea is to create separate graphs for subsets of data
  • ggplot2 offers two functions for creating small multiples:
    • facet_wrap(): define subsets as the levels of a single grouping variable
    • facet_grid(): define subsets as the crossing of two grouping variables
  • Faceting facilitates comparison among plots, not just among geoms within a plot
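
The two faceting functions can be sketched as follows. The built-in mtcars data and the grouping variables cyl and am are assumptions chosen only for illustration.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# One panel per level of a single grouping variable.
p_wrap <- base + facet_wrap(~ cyl)

# One panel per combination of two grouping variables (rows ~ columns).
p_grid <- base + facet_grid(am ~ cyl)

print(p_wrap)
print(p_grid)
```

With three cyl levels and two am levels, facet_wrap() produces three panels here and facet_grid() produces a 2 x 3 grid.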
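
A complete example that combines the components above (EconomistData.csv is assumed to be in the working directory):

library(ggrepel)
library(ggplot2)
library(scales)

dat <- read.csv("EconomistData.csv")
# R-squared of the fitted model, converted from a fraction to a percentage for the legend label
mR2 <- summary(lm(HDI ~ CPI + log(CPI), data = dat))$r.squared
mR2 <- paste0(format(mR2 * 100, digits = 2), "%")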
ggplot(dat,
            mapping = aes(x = CPI, y = HDI)) +
    geom_point(mapping = aes(color = Region),
               shape = 1,
               size = 4,
               stroke = 1.5) +
    geom_smooth(mapping = aes(linetype = "r2"),
                method = "lm",
                formula = y ~ x + log(x), se = FALSE,
                color = "red") +
    geom_text_repel(mapping = aes(label = Country, alpha = labels),
                    data = transform(dat,
                                     labels = Country %in% c("Russia",
                                                             "Venezuela",
                                                             "Iraq",
                                                             "Myanmar",
                                                             "Sudan",
                                                             "Afghanistan",
                                                             "Congo",
                                                             "Greece",
                                                             "Argentina",
                                                             "Italy",
                                                             "Brazil",
                                                             "India",
                                                             "China",
                                                             "South Africa",
                                                             "Spain",
                                                             "Cape Verde",
                                                             "Bhutan",
                                                             "Rwanda",
                                                             "France",
                                                             "Botswana",
                                                             "US",
                                                             "Germany",
                                                             "Britain",
                                                             "Barbados",
                                                             "Japan",
                                                             "Norway",
                                                             "New Zealand",
                                                             "Singapore"))) +
    scale_x_continuous(name = "Corruption Perception Index, 2011 (10=least corrupt)",
                       limits = c(1.0, 10.0),
                       breaks = 1:10) +
    scale_y_continuous(name = "Human Development Index, 2011 (1=best)",
                       limits = c(0.2, 1.0),
                       breaks = seq(0.2, 1.0, by = 0.1)) +
    scale_color_manual(name = "",
                       values = c("#24576D",
                                  "#099DD7",
                                  "#28AADC",
                                  "#248E84",
                                  "#F2583F",
                                  "#96503F"),
                       guide = guide_legend(nrow = 1)) +
    scale_alpha_discrete(range = c(0, 1),
                         guide = "none") +
    scale_linetype(name = "",
                   breaks = "r2",
                   labels = list(bquote(R^2==.(mR2))),
                   guide = guide_legend(override.aes = list(linetype = 1, size = 2, color = "red"))) +
    ggtitle("Corruption and human development") +
    theme_bw() +
    theme(panel.border = element_blank(),
          panel.grid = element_blank(),
          panel.grid.major.y = element_line(color = "gray"),
          axis.line.x = element_line(color = "gray"),
          axis.text = element_text(face = "italic"),
          legend.position = "top",
          legend.direction = "horizontal",
          legend.box = "horizontal",
          legend.text = element_text(size = 12),
          plot.title = element_text(size = 16, face = "bold"))
