As usual, here is the link to the study material first:

https://bioconductor.org/packages/release/bioc/vignettes/MutationalPatterns/inst/doc/Introduction_to_MutationalPatterns.html

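This time we cover the Genomic distribution chapter.

Mutations are not randomly distributed across the genome. With MutationalPatterns you can visualize how mutations are distributed throughout the genome. You can also look at specific genomic regions, such as promoters, CTCF binding sites and transcription factor binding sites. Within these regions you can look for enrichment or depletion of mutations, and you can look for differences in the mutational spectra between them.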

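I. Rainfall plot

A rainfall plot shows mutation types and intermutation distance. Rainfall plots can be used to visualize the distribution of mutations along the genome or a subset of chromosomes.

The y-axis corresponds to the distance of a mutation from the previous mutation and is log10 transformed. Drop-downs in the plot indicate clusters or "hotspots" of mutations. Rainfall plots can be made for SNVs, indels, DBSs and MBSs.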
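To make the y-axis concrete, here is a minimal base R sketch of the quantity being plotted. The positions are invented, and leaving the first mutation as NA is an assumption of this sketch, not necessarily what plot_rainfall does internally.

```r
# Toy positions (bp) of mutations on one chromosome; values are made up.
pos <- sort(c(1000, 51000, 51020, 51045, 900000, 2500000))

# Distance of each mutation to the previous one.
# The first mutation has no predecessor, so it is left as NA here (assumption).
dist_prev <- c(NA, diff(pos))

# A rainfall plot shows this distance on a log10 scale.
log_dist <- log10(dist_prev)

rainfall <- data.frame(pos = pos, dist_prev = dist_prev, log_dist = log_dist)
print(rainfall)

# Mutations that sit much closer together than the rest show up as "drop-downs".
# The 100 bp threshold is arbitrary and only for illustration.
clustered <- which(dist_prev < 100)
print(clustered)
```

In this toy data the third and fourth mutations lie within 25 bp of their predecessors, so they would form the drop-down.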

In this example we make a rainfall plot over the autosomal chromosomes for one sample:

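library(MutationalPatterns)
library(ggplot2)  # provides ggsave(), used below to write plots to disk
library(BSgenome)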
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library(ref_genome, character.only = TRUE)
vcf_files <- list.files(system.file("extdata", package = "MutationalPatterns"),
  pattern = "sample.vcf", full.names = TRUE)
sample_names <- c("colon1", "colon2", "colon3","intestine1", "intestine2", "intestine3","liver1", "liver2", "liver3")
tissue <- c(rep("colon", 3), rep("intestine", 3), rep("liver", 3))

grl <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)

## Rainfall plot

# Define autosomal chromosomes
chromosomes <- seqnames(get(ref_genome))[1:22]

# Make a rainfall plot
p <- plot_rainfall(grl[[1]], title = names(grl[1]), chromosomes = chromosomes, cex = 1.5, ylim = 1e+09 )
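# opt$od is the output directory; it is not defined in this post and is assumed
# to be set earlier in the script (for example from a command-line option).
ggsave(filename = paste0(opt$od,"/rainfall.png"), width = 8, height = 5, plot = p)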

[Figure: rainfall plot]

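II. Define genomic regions

To look at specific types of genomic regions, you first need to define them in a named list of type GRangesList. You can use your own genomic region definitions (for example based on ChIP-seq experiments) or use publicly available genomic annotation data, as in the example below.

The example below shows how to use the biomaRt package to download promoter, CTCF binding site and transcription factor binding site regions for genome build hg19 from Ensembl. For other datasets, see the biomaRt documentation.

Download the genomic regions:

Note: here we take a shortcut by loading precomputed results from the example data. The code that downloads each dataset is shown, commented out, above the command that we actually run: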

library(biomaRt)

# regulatory <- useEnsembl(biomart="regulation",
#                          dataset="hsapiens_regulatory_feature",
#                          GRCh = 37)

## Download the regulatory CTCF binding sites and convert them to
## a GRanges object.
# CTCF <- getBM(attributes = c('chromosome_name',
#                             'chromosome_start',
#                             'chromosome_end',
#                             'feature_type_name'),
#              filters = "regulatory_feature_type_name",
#              values = "CTCF Binding Site",
#              mart = regulatory)
#
# CTCF_g <- reduce(GRanges(CTCF$chromosome_name,
#                 IRanges(CTCF$chromosome_start,
#                 CTCF$chromosome_end)))

CTCF_g <- readRDS(system.file("states/CTCF_g_data.rds", package = "MutationalPatterns"))
CTCF_g

## Download the promoter regions and convert them to a GRanges object.

# promoter = getBM(attributes = c('chromosome_name', 'chromosome_start',
#                                 'chromosome_end', 'feature_type_name'),
#                  filters = "regulatory_feature_type_name",
#                  values = "Promoter",
#                  mart = regulatory)
# promoter_g = reduce(GRanges(promoter$chromosome_name,
#                     IRanges(promoter$chromosome_start,
#                             promoter$chromosome_end)))

promoter_g <- readRDS(system.file("states/promoter_g_data.rds",package = "MutationalPatterns"))
promoter_g

## Download the promoter flanking regions and convert them to a GRanges object.

# flanking = getBM(attributes = c('chromosome_name',
#                                 'chromosome_start',
#                                 'chromosome_end',
#                                 'feature_type_name'),
#                  filters = "regulatory_feature_type_name",
#                  values = "Promoter Flanking Region",
#                  mart = regulatory)
# flanking_g = reduce(GRanges(
#                        flanking$chromosome_name,
#                        IRanges(flanking$chromosome_start,
#                        flanking$chromosome_end)))

flanking_g <- readRDS(system.file("states/promoter_flanking_g_data.rds", package = "MutationalPatterns"))
flanking_g

Combine all genomic regions (GRanges objects) in a named GRangesList:

regions <- GRangesList(promoter_g, flanking_g, CTCF_g)
names(regions) <- c("Promoter", "Promoter flanking", "CTCF")
regions

Make sure these regions use the same chromosome naming convention as the mutation data:

seqlevelsStyle(regions) <- "UCSC"

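III. Enrichment or depletion of mutations in genomic regions

It is necessary to include a list of the surveyed regions in the analysis for each sample, that is, the regions of the genome where you have enough high-quality reads to support calling a mutation. This can be determined with, for example, CallableLoci from GATK.

If you do not include the surveyed regions in your analysis, you might see a depletion of mutations in a certain genomic region that is solely the result of low coverage in that region, and therefore does not represent an actual depletion of mutations.

We provide an example surveyed region data file in the package. For simplicity, we use the same surveyed file for every sample here. For a proper analysis, determine the surveyed regions for each sample and use those in your analysis.

Load the example surveyed region data: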

## Get the filename with surveyed/callable regions
surveyed_file <- system.file("extdata/callableloci-sample.bed", package = "MutationalPatterns")

## Import the file using rtracklayer and use the UCSC naming standard
library(rtracklayer)
surveyed <- import(surveyed_file)
seqlevelsStyle(surveyed) <- "UCSC"

## For this example we use the same surveyed file for each sample.
surveyed_list <- rep(list(surveyed), 9)

First, you need to calculate the number of observed and expected mutations in each genomic region for each sample.

distr <- genomic_distribution(grl, surveyed_list, regions)

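Next, you can test for enrichment or depletion of mutations in the defined genomic regions using a two-sided binomial test. In this test, the probability of observing a mutation is calculated as the total number of mutations divided by the total number of surveyed bases, and a correction for multiple testing is applied. The significance cutoffs for the FDR and the p-value can be changed in the same way as for strand_bias_test. In this example we perform the enrichment/depletion test per tissue type.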

distr_test <- enrichment_depletion_test(distr, by = tissue)
head(distr_test)
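The test described above can be sketched with base R's binom.test. This is only a stand-in to show the arithmetic, not the package's internal code, and every number below is hypothetical rather than taken from the example data.

```r
# Hypothetical counts for one region in one tissue (invented for illustration).
n_muts_total    <- 2000    # mutations in all surveyed bases
surveyed_total  <- 2.5e7   # surveyed (callable) bases overall
surveyed_region <- 4e5     # surveyed bases that fall inside the region
observed        <- 12      # mutations observed inside the region

# Probability of a mutation per surveyed base:
# total number of mutations divided by total number of surveyed bases.
prob <- n_muts_total / surveyed_total

# Expected number of mutations in the region if mutations were spread evenly.
expected <- prob * surveyed_region

# Two-sided binomial test: is the observed count compatible with `prob`?
res <- binom.test(observed, surveyed_region, p = prob, alternative = "two.sided")
effect <- if (observed < expected) "depletion" else "enrichment"
cat(effect, "observed:", observed, "expected:", expected, "p:", res$p.value, "\n")

# Multiple testing correction across regions; the two extra p-values are made up.
pvals <- c(region_a = res$p.value, region_b = 0.20, region_c = 0.03)
fdr <- p.adjust(pvals, method = "fdr")
print(fdr)
```

Note that `surveyed_region` is the number of callable bases in the region, which is why the surveyed list matters: a poorly covered region has a small denominator and therefore a small expected count.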

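[Figure: output of head(distr_test)]

Finally, you can plot the results. Asterisks indicate significant enrichment or depletion of mutations. Here we use the p-values to draw the asterisks.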

p <- plot_enrichment_depletion(distr_test, sig_type = "p")
ggsave(filename = paste0(opt$od,"/enrichment_depletion.png"), width = 8, height = 5, plot = p)

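[Figure: enrichment/depletion plot]

IV. Mutational patterns of genomic regions

1) Split mutations by genomic region

You can also look at the mutational patterns of genomic regions. Keep in mind, however, that regions with very few mutations will give less reliable results.

First, you can split the GRangesList containing the mutations according to the defined genomic regions.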

grl_region <- split_muts_region(grl, regions)
names(grl_region)

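[Figure: output of names(grl_region)]

You can now treat these sample/region combinations as completely separate samples. For example, you can run NMF on them to try to identify signatures that are specific to certain genomic regions.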

mut_mat_region <- mut_matrix(grl_region, ref_genome)
nmf_res_region <- extract_signatures(mut_mat_region, rank = 2, nrun = 10, single_core = TRUE)

signatures = get_known_signatures()
nmf_res_region <- rename_nmf_signatures(nmf_res_region, signatures, cutoff = 0.85)
p <- plot_contribution_heatmap(nmf_res_region$contribution, cluster_samples = TRUE, cluster_sigs = TRUE)
ggsave(filename = paste0(opt$od,"/enrichment_depletion_nmf.png"), width = 8, height = 5, plot = p)

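[Figure: contribution heatmap]

2) Mutation spectrum

Instead of treating the sample/region combinations as separate samples, you can also plot the spectrum of each genomic region with the plot_spectrum_region function. The arguments of plot_spectrum can also be used with this function. By default, the y-axis shows the number of variants divided by the total number of variants in that sample and genomic region. This makes it easier to compare the spectra of regions with few mutations to those of regions with many mutations.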

type_occurrences_region <- mut_type_occurrences(grl_region, ref_genome)
p <- plot_spectrum_region(type_occurrences_region)
ggsave(filename = paste0(opt$od,"/enrichment_depletion_spectrum.png"), width = 8, height = 5, plot = p)

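[Figure: spectrum per genomic region]

You can also plot the number of variants divided by the total number of variants in that sample on the y-axis. In that case the counts are not normalized per genomic region. As the figure below shows, the vast majority of mutations in this example occur in the "Other" region.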

p <- plot_spectrum_region(type_occurrences_region, mode = "relative_sample")
ggsave(filename = paste0(opt$od,"/enrichment_depletion_spectrum.relative_sample.png"), width = 8, height = 5, plot = p)
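The difference between the two y-axis scalings can be shown on a toy count table. This is a base R sketch with invented counts for a single sample; it only mirrors the two denominators described in the text, not the plotting code itself.

```r
# Toy counts of the six base substitution types for ONE sample, per region.
# All numbers are invented for illustration.
counts <- rbind(
  Promoter = c(2, 1, 1, 4, 1, 1),
  Other    = c(40, 20, 30, 80, 20, 10)
)
colnames(counts) <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")

# Default scaling: divide by the total in that sample AND region,
# so every region (row) sums to 1 and small regions remain readable.
by_region <- counts / rowSums(counts)

# "relative_sample" scaling: divide by the total in the whole sample,
# so regions with few mutations shrink relative to large ones.
by_sample <- counts / sum(counts)

print(round(by_region, 3))
print(round(by_sample, 3))
```

With these numbers the "Other" row carries about 95% of the sample's mutations under the second scaling, which is the effect described for the example data.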

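[Figure: spectrum per genomic region, mode = "relative_sample"]

3) Mutational profiles

In addition to spectra, you can also plot mutational profiles. To do this, you first need to make a "long" mutation matrix. In this matrix, the different genomic regions are treated as different mutation types rather than as different samples, as was done before.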

mut_mat_region <- mut_matrix(grl_region, ref_genome)
mut_mat_long <- lengthen_mut_matrix(mut_mat_region)
mut_mat_long[1:5, 1:5]
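The idea of the "long" matrix can be illustrated in base R on a tiny example. This is a sketch of the reshaping concept only, not the implementation of lengthen_mut_matrix; the "sample.region" column naming and the "type_region" row naming are assumptions made for this illustration.

```r
# Toy "wide" matrix: rows are mutation types, columns are sample.region pairs.
wide <- matrix(1:8, nrow = 2,
  dimnames = list(c("A[C>A]A", "A[C>A]C"),
                  c("s1.Promoter", "s1.Other", "s2.Promoter", "s2.Other")))

# Split the column names into their sample and region parts.
parts <- do.call(rbind, strsplit(colnames(wide), ".", fixed = TRUE))
sample_ids   <- unique(parts[, 1])
region_names <- unique(parts[, 2])

# For each sample, stack the columns of its regions on top of each other,
# so that each region contributes its own set of rows.
long <- sapply(sample_ids, function(s) {
  unlist(lapply(region_names, function(r) wide[, paste(s, r, sep = ".")]))
})

# Label each row with its mutation type and region (hypothetical naming).
rownames(long) <- paste(rep(rownames(wide), times = length(region_names)),
                        rep(region_names, each = nrow(wide)), sep = "_")
print(long)
```

The result has one column per sample and one row per mutation type and region combination, so the total count is unchanged and only the layout differs.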

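[Figure: output of mut_mat_long[1:5, 1:5]]

You can now plot it with plot_profile_region. The arguments of plot_96_profile can also be used with this function. The y-axis options are the same as for plot_spectrum_region. By default, however, no normalization for the number of variants per genomic region is applied, because the number of mutations per mutation type is generally quite limited.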

p <- plot_profile_region(mut_mat_long[, c(1, 4, 7)])
ggsave(filename = paste0(opt$od,"/enrichment_depletion_profile.png"), width = 8, height = 5, plot = p)

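[Figure: mutational profile per genomic region]

4) Mutation density

In the examples above we used known features, such as promoters, as regions. Regions can also be defined by mutation density. You can split the genome into 3 bins with different mutation densities like this: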

regions_density <- bin_mutation_density(grl, ref_genome, nrbins = 3)
names(regions_density) <- c("Low", "Medium", "High")

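These regions can be used in the same way as the earlier regions. It could be useful, for example, to compare the spectrum of kataegis regions with the spectrum of the rest of the genome.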

grl_region_dens <- split_muts_region(grl, regions_density, include_other = FALSE)

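V. Unsupervised local mutational patterns

Regional mutational patterns can also be studied with an unsupervised approach using the determine_regional_similarity function. This function uses a sliding window approach to calculate the cosine similarity between the global mutation profile and the mutation profile of smaller genomic windows, which allows unbiased identification of regions whose mutation profile differs from the rest of the genome. Because of its unbiased approach, the function works best on large datasets containing at least 100,000 substitutions.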
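The sliding window idea can be sketched in base R on a handful of toy mutations. This is a simplified illustration under stated assumptions: windows are defined by a fixed number of consecutive mutations, each window is compared with all mutations outside it, and only the six substitution types are used. It is not the code of determine_regional_similarity.

```r
# Cosine similarity between two count vectors.
cos_sim <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Toy data: substitution type of 12 consecutive mutations along a chromosome.
types <- c("C>T", "C>T", "C>A", "C>T", "T>C", "C>T",
           "T>G", "T>G", "T>G", "T>G", "C>T", "C>A")
types <- factor(types, levels = c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G"))

window_size <- 4   # mutations per window (toy value)
stepsize    <- 2   # mutations to shift between windows (toy value)
starts <- seq(1, length(types) - window_size + 1, by = stepsize)

# For each window: profile inside the window versus profile of the rest.
sims <- sapply(starts, function(s) {
  in_window <- seq(s, s + window_size - 1)
  window_profile <- as.numeric(table(types[in_window]))
  rest_profile   <- as.numeric(table(types[-in_window]))
  cos_sim(window_profile, rest_profile)
})

print(data.frame(first_mutation = starts, cosine_similarity = round(sims, 3)))
```

The window starting at the seventh mutation contains only T>G changes, which do not occur anywhere else in the toy data, so its similarity to the rest drops to zero. This is the kind of dip the real plot is meant to expose.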

First we combine all samples. Normally you would only do this for samples from the same cancer type or tissue, but because the number of substitutions in the example data is limited, we combine everything here.

gr = unlist(grl)

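Next, the regions whose mutational patterns differ from the rest of the genome are identified. We use a small window size here because the example data is small. In practice, a window size of 100 or more works better.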

regional_sims <- determine_regional_similarity(gr,
                                               ref_genome, 
                                               chromosomes = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6"),
                                               window_size = 40,
                                               stepsize = 10,
                                               max_window_size_gen = 40000000 )

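The results of determine_regional_similarity can be visualized. Each dot shows the cosine similarity between the mutation profile of a single window and that of the rest of the genome. Regions with a different mutation profile will have a lower cosine similarity. The dots are colored by the size of the window, which is the distance between the first and last mutation in the window.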

p <- plot_regional_similarity(regional_sims)
ggsave(filename = paste0(opt$od,"/regional_similarity.png"), width = 8, height = 5, plot = p)

[Figure: regional similarity plot]

There is a lot in this package and this session could only skim through it. There is more to come, see you next time.

2023 MutationalPatterns package study notes (5): Genomic distribution
