- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
- phyloseq: Explore microbiome pro
本节主要是在PhyloseqTutorials学习Gap Statistic的函数功能.
Gap Statistic
How many clusters are there?
在cluster的包中的clusGap函数计算聚类度量的优度,称为“gap”统计量。对于有很多的研究对象时,要想直接找出他们之间的差异,并且能够最快的把他们分组,并发现其中的规律,可以利用本章中的clusGap()的统计方法。R语言聚类分析--cluster,factoextra的分析方法也可以用来进行进行分析。
First perform an ordination
在这个例子中,利用 Bray-Curtis距离
library("phyloseq"); packageVersion("phyloseq")
library("cluster"); packageVersion("cluster")
library("ggplot2"); packageVersion("ggplot2")
theme_set(theme_bw())
# Load data
data(enterotype)
# ordination
exord = ordinate(enterotype, method="MDS", distance="jsd")
Compute Gap Statistic
#Partitioning Around Medoids(缩写为pam,围绕中心点划分),x为距离矩阵或者数据框
pam1 = function(x, k){list(cluster = pam(x,k, cluster.only=TRUE))}
x = phyloseq:::scores.pcoa(exord, display="sites")
#使用下列函数直接计算最有的k值
#gskmn=clusGap[x[,1:2],FUN=kmeans,nsatrt=20,k.max=6,B=500)
gskmn = clusGap(x[, 1:2], FUN=pam1, K.max = 6, B = 50)
将Gap statistic的函数“附加”到phyloseq中
gap_statistic_ordination = function(ord, FUNcluster, type="sites", K.max=6, axes=c(1:2), B=500, verbose=interactive(), ...){
require("cluster")
# If "pam1" was chosen, use this internally defined call to pam
if(FUNcluster == "pam1"){
FUNcluster = function(x,k) list(cluster = pam(x, k, cluster.only=TRUE))
}
# Use the scores function to get the ordination coordinates
x = phyloseq:::scores.pcoa(ord, display=type)
# If axes not explicitly defined (NULL), then use all of them
if(is.null(axes)){axes = 1:ncol(x)}
# Finally, perform, and return, the gap statistic calculation using cluster::clusGap
clusGap(x[, axes], FUN=FUNcluster, K.max=K.max, B=B, verbose=verbose, ...)
}
Plot Results
利用Function函数定义一个对结果的绘图方法
plot_clusgap = function(clusgap, title="Gap Statistic calculation results"){
require("ggplot2")
gstab = data.frame(clusgap$Tab, k=1:nrow(clusgap$Tab))
p = ggplot(gstab, aes(k, gap)) + geom_line() + geom_point(size=5)
p = p + geom_errorbar(aes(ymax=gap+SE.sim, ymin=gap-SE.sim))
p = p + ggtitle(title)
return(p)
}
试验这个函数,(特别注意的是:不将phyloseq定义的scores扩展作为常规函数导出以避免冲突,因此只能通过前面的phyloseq:::前缀来访问由phyloseq定义的scores扩展.
gs = gap_statistic_ordination(exord, "pam1", B=50, verbose=FALSE)
print(gs, method="Tibs2001SEmax")
plot_clusgap(gs)
和基础的绘图图形进行比较
plot(gs, main = "Gap statistic for the 'Enterotypes' data")
mtext("Looks like 4 clusters is best, with 3 and 5 close runners up.")





网友评论