AMPtk file formats
NGS Sequencing Data
AMPtk handles FASTQ or gzipped FASTQ input. Platform-specific formats, namely SFF from 454 and BAM from Ion Torrent, are also supported by their respective scripts.
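As an illustration of what accepting both plain and gzipped FASTQ involves, here is a minimal Python sketch. It is not AMPtk's own code: the helper names `open_fastq` and `count_reads` are hypothetical, and the sketch assumes that compressed files end in `.gz` and that each record spans exactly four lines.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, handling gzip compression transparently.

    Assumption: compressed files are recognised by a ".gz" suffix.
    """
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count records, assuming the common four-lines-per-record layout."""
    with open_fastq(path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

Checking the file suffix keeps the sketch short; inspecting the gzip magic bytes would be more robust for files that are compressed but not named accordingly.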
Mapping file (QIIME-like)
The pre-processing scripts in AMPtk automatically produce a QIIME-like mapping file if one is not specified in the command-line arguments. The format is similar to the one used in QIIME, although it has fewer formatting rules. This file is intended to be used as a metadata file in the amptk taxonomy script. Below is an example of a mapping file for a dual-indexed PE MiSeq run:
#SampleID BarcodeSequence LinkerPrimerSequence ReversePrimer phinchID Treatment
301-1 TCCGGAGA-CCTATCCT GTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC 301-1 no_data
301-2 TCCGGAGA-GGCTCTGA GTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC 301-2 no_data
spike CGCTCATT-GGCTCTGA GTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC spike no_data
A properly formatted AMPtk mapping file contains at least the first five columns shown above. The pre-processing scripts parse the mapping file to obtain the primer sequences. Ion Torrent and 454 mapping files should have the following format:
#SampleID BarcodeSequence LinkerPrimerSequence ReversePrimer phinchID Treatment Location
BC.1 CTAAGGTAAC CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAAGGTAACAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC BC.1 Treatment1 Woods
BC.2 TAAGGAGAAC CCATCTCATCCCTGCGTGTCTCCGACTCAGTAAGGAGAACAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC BC.2 Treatment1 Woods
BC.3 AAGAGGATTC CCATCTCATCCCTGCGTGTCTCCGACTCAGAAGAGGATTCAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC BC.3 Treatment2 Field
BC.4 TACCAAGATC CCATCTCATCCCTGCGTGTCTCCGACTCAGTACCAAGATCAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC BC.4 Treatment1 Field
BC.5 CAGAAGGAAC CCATCTCATCCCTGCGTGTCTCCGACTCAGCAGAAGGAACAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC BC.5 Treatment2 Woods
BC.6 CTGCAAGTTC CCATCTCATCCCTGCGTGTCTCCGACTCAGCTGCAAGTTCAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC BC.6 Treatment2 Woods
BC.7 TTCGTGATTC CCATCTCATCCCTGCGTGTCTCCGACTCAGTTCGTGATTCAGTGARTCATCGAATCTTTG TCCTCCGCTTATTGATATGC BC.7 Treatment2 Field
Note that the barcode is nested within the ‘LinkerPrimerSequence’ column. In this example, there are two metadata columns (Treatment and Location).
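The layout described above can be checked with a short Python sketch. This is not how AMPtk itself parses the file: the function names `read_mapping` and `barcode_is_nested` are hypothetical, and the sketch assumes the mapping file is tab-delimited, as QIIME mapping files are. It verifies the five leading columns, treats any remaining columns as metadata, and tests whether the barcode appears inside the LinkerPrimerSequence value.

```python
import csv
import io

# The five leading columns named in the examples above.
REQUIRED = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence",
            "ReversePrimer", "phinchID"]


def read_mapping(handle):
    """Parse a tab-delimited mapping file into a list of per-sample dicts."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[:5] != REQUIRED:
        raise ValueError(f"unexpected leading columns: {header[:5]}")
    return [dict(zip(header, row)) for row in reader if row]


def barcode_is_nested(sample):
    """True if the barcode occurs inside the LinkerPrimerSequence value."""
    return sample["BarcodeSequence"] in sample["LinkerPrimerSequence"]


# One row from the Ion Torrent example, rebuilt with tab separators.
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tReversePrimer"
    "\tphinchID\tTreatment\tLocation\n"
    "BC.1\tCTAAGGTAAC\t"
    "CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAAGGTAACAGTGARTCATCGAATCTTTG\t"
    "TCCTCCGCTTATTGATATGC\tBC.1\tTreatment1\tWoods\n"
)
samples = read_mapping(io.StringIO(example))
for s in samples:
    # Everything after the required columns is treated as metadata.
    metadata = {k: v for k, v in s.items() if k not in REQUIRED}
    print(s["#SampleID"], barcode_is_nested(s), metadata)
```

The nesting check applies only to the Ion Torrent and 454 layout; in the dual-indexed MiSeq example the barcode column holds the index pair and the primer column holds the primer alone.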
Barcode FASTA files
Barcode FASTA files for pre-processing should be in standard FASTA format:
>Sample1
CTAAGGTAAC
>Sample2
TAAGGAGAAC
>Sample3
AAGAGGATTC
>Sample4
TACCAAGATC
>Sample5
CAGAAGGAAC
>Mock
CTGCAAGTTC
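A barcode file in this format can be sanity-checked before pre-processing with a few lines of Python. The sketch below is illustrative only: `read_barcodes` is a hypothetical helper rather than part of AMPtk, and the rule that each sample name must be unique is an assumption made for the example.

```python
def read_barcodes(lines):
    """Return {sample_name: barcode} from FASTA-formatted lines."""
    barcodes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a sample name")
            name = fields[0]
            if name in barcodes:
                raise ValueError(f"duplicate sample name: {name}")
            barcodes[name] = ""
        elif name is None:
            raise ValueError("sequence found before first header")
        else:
            # Sequences may wrap across lines; join and upper-case them.
            barcodes[name] += line.upper()
    return barcodes


fasta = """>Sample1
CTAAGGTAAC
>Sample2
TAAGGAGAAC
>Mock
CTGCAAGTTC
"""
barcodes = read_barcodes(fasta.splitlines())
for sample, seq in barcodes.items():
    print(sample, seq, len(seq))
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of `fasta.splitlines()`.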