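Today we continue with another important function from the dplyr package: mutate, whose basic job is to create new columns. The options available inside mutate are almost endless, since functions can be combined to transform a dataset in nearly any way. The worked examples below demonstrate this.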
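This time we use the msleep dataset, which ships with ggplot2 (loaded as part of the tidyverse) and records the sleep times of mammals. First load the packages and take a look at the data: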
library(tidyverse)
msleep
name genus vore order conservation sleep_total sleep_rem sleep_cycle
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 Cheetah Acino~ carni Carn~ lc 12.1 NA NA
2 Owl mo~ Aotus omni Prim~ NA 17 1.8 NA
3 Mounta~ Aplod~ herbi Rode~ nt 14.4 2.4 NA
4 Greate~ Blari~ omni Sori~ lc 14.9 2.3 0.133
Basic mutate operations
The simplest operation is a calculation based on the values in other columns. In the example code, the sleep data is converted from hours to minutes:
msleep %>%
select(name,sleep_total) %>%
mutate(sleep_total_min = sleep_total * 60)
name sleep_total sleep_total_min
<chr> <dbl> <dbl>
1 Cheetah 12.1 726
2 Owl monkey 17 1020
3 Mountain beaver 14.4 864
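The following code creates two new columns: one shows the difference between each animal's sleep time and the mean sleep time, the other shows the difference from the animal that sleeps the least. round() rounds the mean to one decimal place.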
msleep %>%
select(name, sleep_total) %>%
mutate(AVG = sleep_total - round(mean(sleep_total), 1),
MIN = sleep_total - min(sleep_total))
# A tibble: 83 x 4
name sleep_total AVG MIN
<chr> <dbl> <dbl> <dbl>
1 Cheetah 12.1 1.7 10.2
2 Owl monkey 17 6.6 15.1
3 Mountain beaver 14.4 4 12.5
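To compute a mean across selected columns within each row, use rowwise(), which tells dplyr to operate row by row. Note that the result is NA whenever one of the inputs is NA: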
msleep %>%
select(name, contains("sleep")) %>%
rowwise() %>%
mutate(avg = mean(c(sleep_rem,sleep_cycle)))
name sleep_total sleep_rem sleep_cycle avg
<chr> <dbl> <dbl> <dbl> <dbl>
1 Cheetah 12.1 NA NA NA
2 Owl monkey 17 1.8 NA NA
3 Mountain beaver 14.4 2.4 NA NA
4 Greater short-tail~ 14.9 2.3 0.133 1.22
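If the NA results above are not wanted, one option is to pass na.rm = TRUE to mean(). The sketch below assumes a small toy tibble (`sleep_toy` is a made-up name, with values copied from the output above) rather than the full msleep data. A row in which every value is missing yields NaN rather than NA.

```r
library(dplyr)

# Toy rows copied from the msleep output above (hours); sleep_toy is a hypothetical name.
sleep_toy <- tibble(
  name        = c("Cheetah", "Owl monkey", "Greater short-tailed shrew"),
  sleep_rem   = c(NA, 1.8, 2.3),
  sleep_cycle = c(NA, NA, 0.133)
)

avg_tbl <- sleep_toy %>%
  rowwise() %>%
  # na.rm = TRUE averages whatever values are present in each row
  mutate(avg = mean(c(sleep_rem, sleep_cycle), na.rm = TRUE)) %>%
  # drop the row-wise grouping so later verbs work on whole columns again
  ungroup()

print(avg_tbl)
```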
Use an ifelse statement to modify values conditionally: if brainwt > 4, return NA; otherwise return the original value.
msleep %>%
select(name, brainwt) %>%
mutate(brainwt2 = ifelse(brainwt > 4, NA, brainwt)) %>%
arrange(desc(brainwt))
name brainwt brainwt2
<chr> <dbl> <dbl>
1 African elephant 5.71 NA
2 Asian elephant 4.60 NA
3 Human 1.32 1.32
4 Horse 0.655 0.655
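dplyr also provides if_else(), a stricter variant of base ifelse() that checks that both branches have compatible types. A minimal sketch, assuming a toy tibble (`brain_toy` is a made-up name); NA_real_ is used so the NA branch is explicitly a double:

```r
library(dplyr)

# Toy data; the first two values are copied from the output above,
# the missing brainwt for Cheetah matches the msleep preview.
brain_toy <- tibble(
  name    = c("African elephant", "Human", "Cheetah"),
  brainwt = c(5.71, 1.32, NA)
)

brain_tbl <- brain_toy %>%
  # rows where the condition is NA stay NA in the result
  mutate(brainwt2 = if_else(brainwt > 4, NA_real_, brainwt))

print(brain_tbl)
```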
You can also combine stringr functions or regular expressions to work on string columns.
The example code extracts the last word of each animal's name and converts it to lowercase:
msleep %>%
select(name) %>%
mutate(name_last_word = tolower(str_extract(name, pattern = "\\w+$")))
name name_last_word
<chr> <chr>
1 Cheetah cheetah
2 Owl monkey monkey
3 Mountain beaver beaver
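To make the pattern easier to read, here is the same extraction on a plain character vector, as a small sketch (the `animals` vector is made up from names shown above). It also illustrates one pitfall: because `$` anchors the match to the end of the string, a trailing space prevents any match.

```r
library(stringr)

animals <- c("Cheetah", "Owl monkey", "Greater short-tailed shrew")

# \\w+ matches a run of word characters; $ anchors that run to the end of the string
last_word <- str_to_lower(str_extract(animals, "\\w+$"))

# With a trailing space no word characters touch the end, so nothing matches;
# trimming the string first avoids that.
untrimmed <- str_extract("Owl monkey ", "\\w+$")
trimmed   <- str_extract(str_trim("Owl monkey "), "\\w+$")

print(last_word)
```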
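Operating on multiple columns at once
- mutate_all() applies a function to every column
- mutate_if() first takes a predicate function that returns a logical value; the mutate instruction is applied to the columns for which it returns TRUE
- mutate_at() requires the columns to change to be specified inside vars()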
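In dplyr 1.0.0 and later these scoped verbs are superseded by across() used inside mutate(); they still work, but new code is usually written with across(). A rough sketch of the equivalents, assuming dplyr >= 1.0.0 and a toy tibble (`toy` is a made-up stand-in for msleep):

```r
library(dplyr)

# Hypothetical stand-in for msleep with a few of its columns.
toy <- tibble(
  name        = c("Cheetah", "Owl monkey"),
  sleep_total = c(12.1, 17),
  sleep_rem   = c(NA, 1.8),
  awake       = c(11.9, 7)
)

# mutate_all(f)            ->  across(everything(), f)
all_chr <- toy %>% mutate(across(everything(), as.character))

# mutate_if(is.numeric, f) ->  across(where(is.numeric), f)
rounded <- toy %>% mutate(across(where(is.numeric), round))

# mutate_at(vars(...), f)  ->  across(<column selection>, f)
minutes <- toy %>% mutate(across(contains("sleep"), ~ .x * 60))

print(minutes)
```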
Convert all values to lowercase (note that every column, including the numeric ones, becomes character):
msleep %>% mutate_all(tolower)
name genus vore order conservation sleep_total sleep_rem
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 cheetah acin~ carni carn~ lc 12.1 NA
2 owl mo~ aotus omni prim~ NA 17 1.8
3 mounta~ aplo~ herbi rode~ nt 14.4 2.4
Append " /n " to the values in every column:
msleep %>% mutate_all(~paste(., " /n "))
Then remove every "/n" again and trim the leftover whitespace:
msleep_ohno <- msleep %>% mutate_all(~paste(., " /n "))
msleep_ohno %>%
mutate_all(~str_replace_all(., "/n", "")) %>%
mutate_all(str_trim)
mutate_if(): applying a function conditionally
If a column is numeric, round it:
msleep %>%
select(name, sleep_total:bodywt) %>%
mutate_if(is.numeric, round)
name sleep_total sleep_rem sleep_cycle awake brainwt bodywt
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Cheetah 12 NA NA 12 NA 50
2 Owl monkey 17 2 NA 7 0 0
3 Mountain beaver 14 2 NA 10 NA 1
mutate_at(): operating on specific columns
Apply the operation to columns whose names contain "sleep":
msleep %>%
select(name, sleep_total:awake) %>%
mutate_at(vars(contains("sleep")), ~(.*60))
name sleep_total sleep_rem sleep_cycle awake
<chr> <dbl> <dbl> <dbl> <dbl>
1 Cheetah 726 NA NA 11.9
2 Owl monkey 1020 108 NA 7
3 Mountain beaver 864 144 NA 9.6
Renaming the changed columns
msleep %>%
select(name, sleep_total:awake) %>%
mutate_at(vars(contains("sleep")), ~(.*60)) %>%
rename_at(vars(contains("sleep")), ~paste0(.,"_min"))
name sleep_total_min sleep_rem_min sleep_cycle_min awake
<chr> <dbl> <dbl> <dbl> <dbl>
1 Cheetah 726 NA NA 11.9
2 Owl monkey 1020 108 NA 7
3 Mountain beaver 864 144 NA 9.6
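Keeping the original columns
Passing a named function adds new columns with that name as a suffix instead of overwriting the originals. funs() has been deprecated since dplyr 0.8.0, so a named list with a lambda is used here:
msleep %>%
select(name, sleep_total:awake) %>%
mutate_at(vars(contains("sleep")), list(min = ~ . * 60))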
name sleep_total sleep_rem sleep_cycle awake sleep_total_min sleep_rem_min sleep_cycle_min
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Cheetah 12.1 NA NA 11.9 726 NA NA
2 Owl monkey 17 1.8 NA 7 1020 108 NA
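With across() (dplyr >= 1.0.0, an assumption about the installed version), the `.names` argument controls how the new columns are named, so the originals can be kept in the same way. A minimal sketch on a toy tibble (`toy_sleep` is a made-up name):

```r
library(dplyr)

# Hypothetical two-row stand-in for the selected msleep columns.
toy_sleep <- tibble(
  name        = c("Cheetah", "Owl monkey"),
  sleep_total = c(12.1, 17),
  sleep_rem   = c(NA, 1.8)
)

with_minutes <- toy_sleep %>%
  # {.col} is replaced by each original column name
  mutate(across(contains("sleep"), ~ .x * 60, .names = "{.col}_min"))

print(names(with_minutes))
```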
Using ifelse to create a discrete column with two levels
msleep %>%
select(name, sleep_total) %>%
mutate(sleep_time = ifelse(sleep_total > 10, "long", "short"))
name sleep_total sleep_time
<chr> <dbl> <chr>
1 Cheetah 12.1 long
2 Owl monkey 17 long
3 Mountain beaver 14.4 long
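Using case_when to create a discrete column with multiple levels
This function is very useful in later data cleaning, so it is worth practicing. Conditions are evaluated in order, and the first one that matches wins.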
msleep %>%
select(name, sleep_total) %>%
mutate(sleep_total_discr = case_when(
sleep_total > 13 ~ "very long",
sleep_total > 10 ~ "long",
sleep_total > 7 ~ "limited",
TRUE ~ "short"))
name sleep_total sleep_total_discr
<chr> <dbl> <chr>
1 Cheetah 12.1 long
2 Owl monkey 17 very long
3 Mountain beaver 14.4 very long
4 Greater short-tailed shrew 14.9 very long
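Because case_when() uses the first condition that matches, the order of the conditions matters. The sketch below runs the same thresholds in two orders on a made-up vector `hours` to show the effect:

```r
library(dplyr)

hours <- c(17, 12.1, 8, 3)

# Conditions listed from most to least restrictive, as in the example above.
ordered <- case_when(
  hours > 13 ~ "very long",
  hours > 10 ~ "long",
  hours > 7  ~ "limited",
  TRUE       ~ "short"
)

# Same conditions with the loosest test first: it captures every value above 7,
# so the later branches are never reached.
shadowed <- case_when(
  hours > 7  ~ "limited",
  hours > 10 ~ "long",
  hours > 13 ~ "very long",
  TRUE       ~ "short"
)

print(ordered)
print(shadowed)
```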
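Converting specific values to NA
na_if() replaces a given value with NA; here every "omni" becomes NA. Newer dplyr versions (1.1.0 onward) are stricter about argument types and expect a vector rather than a whole data frame, so the call is applied column by column:
msleep %>%
select(name:order) %>%
mutate_all(~na_if(., "omni"))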
name genus vore order
<chr> <chr> <chr> <chr>
1 Cheetah Acinonyx carni Carnivora
2 Owl monkey Aotus NA Primates
3 Mountain beaver Aplodontia herbi Rodentia
4 Greater short-tailed shrew Blarina NA Soricomorpha
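When only one column needs the replacement, na_if() can be called on that column directly inside mutate(). A minimal sketch on a toy tibble (`vore_toy` is a made-up name, with values copied from the output above):

```r
library(dplyr)

vore_toy <- tibble(
  name = c("Cheetah", "Owl monkey", "Mountain beaver"),
  vore = c("carni", "omni", "herbi")
)

vore_tbl <- vore_toy %>%
  # only the vore column is touched; other columns are left as they are
  mutate(vore = na_if(vore, "omni"))

print(vore_tbl)
```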