美文网首页
phyloseq: Explore microbiome pro

phyloseq: Explore microbiome pro

作者: 超人快飞 | 来源:发表于2020-04-13 17:40 被阅读0次

本节主要是在PhyloseqTutorials学习Gap Statistic的函数功能.


Gap Statistic

How many clusters are there?


在cluster的包中的clusGap函数计算聚类度量的优度,称为“gap”统计量。对于有很多的研究对象时,要想直接找出他们之间的差异,并且能够最快的把他们分组,并发现其中的规律,可以利用本章中的clusGap()的统计方法。R语言聚类分析--cluster,factoextra的分析方法也可以用来进行进行分析。


First perform an ordination

在这个例子中,利用 Bray-Curtis距离

library("phyloseq"); packageVersion("phyloseq")
library("cluster"); packageVersion("cluster")
library("ggplot2"); packageVersion("ggplot2")
theme_set(theme_bw())
# Load data
data(enterotype)
# ordination
exord = ordinate(enterotype, method="MDS", distance="jsd")

Compute Gap Statistic

#Partitioning Around Medoids(缩写为pam,围绕中心点划分),x为距离矩阵或者数据框
pam1 = function(x, k){list(cluster = pam(x,k, cluster.only=TRUE))}
x = phyloseq:::scores.pcoa(exord, display="sites")
#使用下列函数直接计算最有的k值
#gskmn=clusGap[x[,1:2],FUN=kmeans,nsatrt=20,k.max=6,B=500)
gskmn = clusGap(x[, 1:2], FUN=pam1, K.max = 6, B = 50)

将Gap statistic的函数“附加”到phyloseq中

gap_statistic_ordination = function(ord, FUNcluster, type="sites", K.max=6, axes=c(1:2), B=500, verbose=interactive(), ...){
    require("cluster")
    #   If "pam1" was chosen, use this internally defined call to pam
    if(FUNcluster == "pam1"){
        FUNcluster = function(x,k) list(cluster = pam(x, k, cluster.only=TRUE))     
    }
    # Use the scores function to get the ordination coordinates
    x = phyloseq:::scores.pcoa(ord, display=type)
    #   If axes not explicitly defined (NULL), then use all of them
    if(is.null(axes)){axes = 1:ncol(x)}
    #   Finally, perform, and return, the gap statistic calculation using cluster::clusGap  
    clusGap(x[, axes], FUN=FUNcluster, K.max=K.max, B=B, verbose=verbose, ...)
}

Plot Results

利用Function函数定义一个对结果的绘图方法

plot_clusgap = function(clusgap, title="Gap Statistic calculation results"){
    require("ggplot2")
    gstab = data.frame(clusgap$Tab, k=1:nrow(clusgap$Tab))
    p = ggplot(gstab, aes(k, gap)) + geom_line() + geom_point(size=5)
    p = p + geom_errorbar(aes(ymax=gap+SE.sim, ymin=gap-SE.sim))
    p = p + ggtitle(title)
    return(p)
}

试验这个函数,(特别注意的是:不将phyloseq定义的scores扩展作为常规函数导出以避免冲突,因此只能通过前面的phyloseq:::前缀来访问由phyloseq定义的scores扩展.

gs = gap_statistic_ordination(exord, "pam1", B=50, verbose=FALSE)
print(gs, method="Tibs2001SEmax")
plot_clusgap(gs)

和基础的绘图图形进行比较

plot(gs, main = "Gap statistic for the 'Enterotypes' data")
mtext("Looks like 4 clusters is best, with 3 and 5 close runners up.")  

相关文章

网友评论

      本文标题:phyloseq: Explore microbiome pro

      本文链接:https://www.haomeiwen.com/subject/kbhqmhtx.html