美文网首页
Aspera 下载测序数据

Aspera 下载测序数据

作者: 吴十三和小可爱的札记 | 来源:发表于2021-03-14 21:49 被阅读0次

下载安装

conda install -c hcc aspera-cli

# path of the ascp
which ascp

下载链接获取

  1. 已知文章中所用数据的BioProject number: PRJNAxxxxx

  2. 进入https://www.ebi.ac.uk/ena/browser/home

  3. 搜索项目文件信息


  4. 展开项目文件


  5. 选择链接类型


  6. 下载链接文件


解析链接文件

setlinks <- function(file_path, ena_tsv){
 raw_data <- read.delim(paste0(file_path, ena_tsv), header = TRUE)
  # fastq_aspera/fastq_ftp
 raw_links <- stringr::str_split(raw_data$fastq_aspera, ";")
 links <- unlist(raw_links)
 # links <- paste0("ftp://", links)
 temp <- gsub("_tsv.txt", "", ena_tsv)
 file_name <- gsub("filereport_read_run_", "", temp)
 write.table(links, quote = FALSE, row.names = FALSE,
             col.names = FALSE,
             file = paste0(file_path, file_name, "_links.txt"))
}

setlinks(file_path, ena_tsv)

下载

# find path
which ascp
find -name asperaweb_id_dsa.openssh

# download
ascp -i path_to/asperaweb_id_dsa.openssh --overwrite=diff -P33001 
    -T -l 30m era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/-XXXX

针对ftp链接


# 去掉末尾换行符
sed 's/\r//' .*url.txt | while read url;
do
wget -b $url;
done

# 查看文件情况
cat wget-log* | grep 100% | wc -l

# 抓取未完成链接
cat wget-log* | grep "Giving up" -B3 | grep "SRR" 
    | sed "s/ (try:20) => ‘//" | sed "s/’//"

md5 检查

md5检查可以校验文件完整性;在“下载链接获取”的 “5. 选择链接类型”中有fastq_md5,同样下载TSV格式并构建md5检查文件。


setmd5 <- function(file_path, md5_tsv){
 raw_data <- read.delim(paste0(file_path, md5_tsv),
                        header = TRUE)

 raw_md5 <- stringr::str_split(raw_data$fastq_md5, ";")


 clean_md5 <- data.frame(unlist(raw_md5),
                  paste0(rep(raw_data$run_accession,
                             each=2),
                         c("_1.fastq.gz", "_2.fastq.gz")))

 # set file name
 temp <- gsub("_tsv.txt", "", ena_tsv)
 file_name <- gsub("filereport_read_run_", "", temp)
 

 write.table(clean_md5, quote = FALSE,
             row.names = FALSE,
             col.names = FALSE,
             file = paste0(file_path,
                           file_name,
                           "_md5.txt"))
}
setmd5(file_path, md5_tsv)

md5sum -c _md5.txt

相关文章

网友评论

      本文标题:Aspera 下载测序数据

      本文链接:https://www.haomeiwen.com/subject/lsbccltx.html