[samtools] Converting between SAM and BAM, extracting unmapped reads, and converting to FASTQ
Published: 2019-06-19


Converting a SAM file to a BAM file

First, look at the top of the file with the Unix command

head test.sam

head prints the first 10 lines of the file. The output should begin with lines starting with an "@" sign, which marks a header line. If you don't see any lines starting with "@", the header information is most likely missing.
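The same check can be made without reading the output by eye, by counting the lines that begin with "@". This is a minimal sketch that assumes a plain-text SAM file; the function name has_sam_header and the file demo.sam are made up for illustration.

```shell
# Hypothetical helper: succeeds if the SAM file contains at least one "@" header line.
has_sam_header() {
    # grep -c prints the number of lines that start with "@"
    [ "$(grep -c '^@' "$1")" -gt 0 ]
}

# Tiny hand-written SAM file, for demonstration only (not real data).
printf '@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\nr1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' > demo.sam

if has_sam_header demo.sam; then
    echo "header present"
else
    echo "header missing"
fi
```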

If the header information is absent from the SAM file, use the command below, where reference.fa is the reference FASTA file used to map the reads:

samtools view -bT reference.fa test.sam > test.bam

If the header information is available:

samtools view -bS test.sam > test.bam

Sorting a BAM file

samtools sort -o test_sorted.bam test.bam

Creating a BAM index file

samtools index test_sorted.bam test_sorted.bai

Converting a BAM file to a SAM file

Note: remember to use -h to ensure the converted SAM file contains the header information. Generally, I suggest storing only sorted BAM files as they use much less disk space and are faster to process.

samtools view -h NA06984.chrom16.ILLUMINA.bwa.CEU.low_coverage.20100517.bam > NA06984.chrom16.ILLUMINA.bwa.CEU.low_coverage.20100517.sam

Simple stats

samtools flagstat NA06984.chrom16.ILLUMINA.bwa.CEU.low_coverage.20100517.bam

10182494 in total

0 QC failure
223627 duplicates
9861117 mapped (96.84%)
10095646 paired in sequencing
5049066 read1
5046580 read2
8174084 properly paired (80.97%)
9452892 with itself and mate mapped
321377 singletons (3.18%)
215316 with mate mapped to a different chr
126768 with mate mapped to a different chr (mapQ>=5)
For more statistics of SAM or BAM files have a look at the  program.
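The percentages in the flagstat output can be reproduced by hand. The sketch below assumes that the mapped percentage is taken over all reads, and that the properly paired and singleton percentages are taken over the reads paired in sequencing. These denominators are inferred from the numbers above, not from the samtools documentation. The helper name pct is made up.

```shell
# Counts copied from the flagstat output above.
total=10182494
mapped=9861117
paired=10095646
proper=8174084
singletons=321377

# Hypothetical helper: pct NUMERATOR DENOMINATOR prints a percentage with two decimals.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", 100 * n / d }'
}

pct "$mapped" "$total"        # mapped reads as a share of all reads
pct "$proper" "$paired"       # properly paired reads as a share of paired reads
pct "$singletons" "$paired"   # singletons as a share of paired reads
```

With these denominators the three calls print 96.84, 80.97 and 3.18, matching the output above.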

Interpreting the BAM flags

(A tool that explains FLAG values can also be used:)

Here are some common BAM flags:

163: 10100011 in binary

147: 10010011 in binary

99: 1100011 in binary

83: 1010011 in binary

Interpretation of 10100011 (reading the binary from right to left, i.e. starting with the least significant bit):

1 the read is paired in sequencing, no matter whether it is mapped in a pair
1 the read is mapped in a proper pair (depends on the protocol, normally inferred during alignment)
0 the query sequence itself is unmapped
0 the mate is unmapped
0 strand of the query (0 for forward; 1 for reverse strand)
1 strand of the mate
0 the read is the first read in a pair
1 the read is the second read in a pair

163: second read of a pair on the forward strand, with the mate on the reverse strand

147: second read of a pair on the reverse strand, with the mate on the forward strand

99: first read of a pair on the forward strand, with the mate on the reverse strand

83: first read of a pair on the reverse strand, with the mate on the forward strand
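The bit-by-bit reading can also be done with shell arithmetic. The sketch below covers only the eight bits discussed here (values 1 to 128); higher FLAG bits are left out. The function name decode_flag is made up for illustration.

```shell
# Hypothetical helper: print the value and meaning of each bit set in a SAM FLAG.
decode_flag() {
    flag=$1
    # Each line of the here-document is "bit value:meaning".
    while IFS=: read -r bit meaning; do
        if [ $(( flag & bit )) -ne 0 ]; then
            echo "$bit $meaning"
        fi
    done <<'EOF'
1:read paired in sequencing
2:read mapped in a proper pair
4:read itself unmapped
8:mate unmapped
16:read on the reverse strand
32:mate on the reverse strand
64:first read in a pair
128:second read in a pair
EOF
}

decode_flag 163
```

For 163 this lists bits 1, 2, 32 and 128: a paired, properly paired second read whose mate is on the reverse strand. Since bit 16 is not set, the read itself is on the forward strand.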

Extracting only the first read from paired end BAM files

samtools view -h -f 0x0040 test.bam > test_first_pair.sam

0x0040 is hexadecimal for 64 (i.e. 16 * 4), which is 1000000 in binary; this bit marks the first read in a pair.
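To see what the -f filter selects, the same test can be imitated on plain SAM text with awk. This is only an illustration of the bit test, not how samtools is implemented. The function name sam_require_bit and the file pair.sam are made up, and the helper assumes the bit argument is a single power of two.

```shell
# Hypothetical helper: keep header lines and records whose FLAG (column 2)
# has the given bit set, imitating "samtools view -h -f BIT" on SAM text.
sam_require_bit() {
    awk -v bit="$1" -F '\t' '
        /^@/ { print; next }          # always keep header lines
        int($2 / bit) % 2 == 1        # keep records with the bit set
    ' "$2"
}

echo $((0x40))   # hexadecimal 0x40 written in decimal

# Hand-written pair for demonstration only: FLAG 99 (first read) and FLAG 147 (second read).
printf '@SQ\tSN:chr1\tLN:1000\nfrag1\t99\tchr1\t1\t60\t4M\t=\t101\t104\tACGT\tIIII\nfrag1\t147\tchr1\t101\t60\t4M\t=\t1\t-104\tACGT\tIIII\n' > pair.sam

sam_require_bit $((0x40)) pair.sam
```

On this demo file only the header line and the FLAG 99 record are printed.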

Filtering out unmapped reads in BAM files

samtools view -h -F 4 blah.bam > blah_only_mapped.sam

Creating FASTQ files from a BAM file

I found this great tool at 

For example, to extract ONLY unaligned reads from a BAM file:

bam2fastq -o blah_unaligned.fastq --no-aligned blah.bam

To extract ONLY aligned reads from a BAM file:

bam2fastq -o blah_aligned.fastq --no-unaligned blah.bam
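What the conversion does for unaligned reads can be sketched on SAM text (for example the output of samtools view -h) with awk: each record with FLAG bit 4 set becomes one four-line FASTQ entry built from the read name, sequence and quality columns. This is a rough illustration, not a replacement for bam2fastq. It assumes the sequence is stored in its original orientation and does not add /1 or /2 suffixes for paired reads. The function name sam_unmapped_to_fastq and the file mixed.sam are made up.

```shell
# Hypothetical helper: write the unmapped reads of a SAM file as FASTQ.
sam_unmapped_to_fastq() {
    awk -F '\t' '
        /^@/ { next }                 # skip header lines
        int($2 / 4) % 2 == 1 {        # FLAG bit 4: the read itself is unmapped
            # columns: 1 = read name, 10 = sequence, 11 = base qualities
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }
    ' "$1"
}

# Hand-written file for demonstration only: one mapped read (m1) and one unmapped read (u1).
printf '@SQ\tSN:chr1\tLN:1000\nm1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\nu1\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tFFFF\n' > mixed.sam

sam_unmapped_to_fastq mixed.sam > mixed_unaligned.fastq
cat mixed_unaligned.fastq
```

Only u1 ends up in mixed_unaligned.fastq; the mapped read m1 is skipped.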

Reposted from: https://www.cnblogs.com/xiaofeiIDO/p/6424649.html
