Mini-Tutorals: Difference between revisions
imported>Saymon |
Revision as of 18:02, 24 November 2015
Bp-utils: sequence, alignment & tree utilities by Qiu Lab
bioseq: sequence/FASTA manipulations
- Use accession "CP002316.1" to retrieve the Genbank file from NCBI. Save the output (in genbank format) to a file named as "cp002316.gb".
bioseq -f "CP002316.1" -o'genbank' > cp002316.gb
- Use the above file as input, extract FASTA sequences for each gene and save the output to a new file called "cp002316.fas". Use this file for the following questions.
bioseq -i "genbank" -F cp002316.gb > cp002316.fas
- Count the number of sequences.
bioseq -n cp002316.fas
- In a single command, pick the first 10 sequences and find their length
bioseq -p "order:1-10" cp002316.fas | bioseq -l
- In a single command, pick the third and seventh sequences from the file and do the 3-frame translation. Which reading frame is correct for both? Specify.
bioseq -p "order:3,7" cp002316.fas | bioseq -t3
- Find the base composition of the last two sequences
bioseq -p "order:25-26" cp002316.fas | bioseq -c
- Pick the sequence with id "BbuJD1_B11|8784|9302|1" and count the number of codons present in this sequence
bioseq -p "id:BbuJD1_B11|8784|9302|1" cp002316.fas | bioseq -C
- Delete the last 10 sequences from the file and save the output to cp002316-v2.nuc
bioseq -d "order:17-26" cp002316.fas > cp002316-v2.nuc
- In a single command, pick the first sequence, then get nucleotides 50-110 and make the reverse complement of the sub-sequence
bioseq -p "order:1" cp002316.fas | bioseq -s "50,110" | bioseq -r
- In a single command, get the first 100 nucleotides of all the sequences present in the file and do 1-frame translation of all sub-sequences.
bioseq -s "1,100" cp002316.fas | bioseq -t1
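As a cross-check on the counts and lengths that bioseq reports in the exercises above, the same numbers can be recovered with standard Unix tools. This is a minimal sketch; the function names fasta_count and fasta_lengths are made up for illustration and are not part of bp-utils.

```shell
# Count FASTA records: every record starts with a ">" header line.
fasta_count() {
  grep -c '^>' "$1"
}

# Print "id<TAB>length" for each record, summing over wrapped sequence lines.
# The id is the first word of the header, without the leading ">".
fasta_lengths() {
  awk '
    /^>/ { if (id != "") print id "\t" len; id = substr($1, 2); len = 0; next }
         { gsub(/[ \t\r]/, ""); len += length($0) }
    END  { if (id != "") print id "\t" len }
  ' "$1"
}
```

For example, `fasta_count cp002316.fas` should agree with `bioseq -n cp002316.fas`, and `fasta_lengths cp002316.fas | head -10` can be compared with the lengths of the first 10 sequences.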
bioaln: alignment/CLUSTALW manipulations
- Go to /home/shared/LabMeetingReadings/Test-Data and find the sequence alignment file “bioaln_tutorial.aln”. Name the format of the alignment file. Use it to answer all the questions below.
CLUSTALW
- Find the length of the alignment.
bioaln -l bioaln_tutorial.aln
- Count the number of the sequences present in the alignment.
bioaln -n bioaln_tutorial.aln
- How do you convert this alignment to PHYLIP format? Save the output.
bioaln -o "phylip" bioaln_tutorial.aln > test.phy
- Pick “seq2, seq5, seq7, seq10” from the alignment and calculate their average percent identity.
bioaln -p "seq2, seq5, seq7, seq10" bioaln_tutorial.aln | bioaln -a
- Get an alignment slice from “50-140” and find the average identities of the slice for sliding windows of 25.
bioaln -s "50, 140" bioaln_tutorial.aln | bioaln -w "25"
- Extract conserved blocks from the alignment.
bioaln -B bioaln_tutorial.aln
- Find the unique sequences and list their ids.
bioaln -u bioaln_tutorial.aln | bioaln -L
- Extract third sites from the alignment and show only variable sites in match view.
bioaln -T bioaln_tutorial.aln | bioaln -v | bioaln -m
- Remove the gaps and show the final alignment in codon view for an alignment slice “1-100”.
bioaln -s "1, 100" bioaln_tutorial.aln | bioaln -g | bioaln -c
- Add a 90% consensus sequence and then show the final alignment in match plus codon view for an alignment slice “20-80”. (Hint: match view followed by codon view)
bioaln -s "20, 80" bioaln_tutorial.aln | bioaln -C "90" | bioaln -m | bioaln -c
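To make "percent identity" in the exercises above concrete, here is a rough sketch for two aligned sequences given as strings. It assumes identity is counted only over columns where neither sequence has a gap ("-"); bioaln may define it differently, so treat this as an illustration rather than a re-implementation. The function name pair_identity is hypothetical.

```shell
# Percent identity between two aligned sequences of equal length.
# Assumption: columns with a gap in either sequence are skipped.
pair_identity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a); same = 0; cols = 0
    for (i = 1; i <= n; i++) {
      x = substr(a, i, 1); y = substr(b, i, 1)
      if (x == "-" || y == "-") continue      # skip gapped columns
      cols++
      if (toupper(x) == toupper(y)) same++    # case-insensitive match
    }
    if (cols == 0) { print "NA"; exit }
    printf "%.1f\n", 100 * same / cols
  }'
}
```

Usage: `pair_identity "ACGT-A" "ACGA-A"` compares five ungapped columns, four of which match.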
biotree: tree/NEWICK manipulations
biopop: SNP statistics
Homology searching and clustering
BLAST+: search("google") for homologs/pairwise alignment
hmmer
cdhit
cdhit -i all.pep -o all.cdhit -c 0.5 -n 3
Options:
- -i: input file
- -o: output file
- -c: sequence identity threshold (0.5 = 50%; sequences below it are placed in different families)
- -n: word length
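Besides the representative sequences, CD-HIT writes a cluster report next to the output file (here that would be all.cdhit.clstr), in which each family starts with a ">Cluster N" line followed by one line per member. A small sketch for summarizing that report; the function names are made up for illustration.

```shell
# Number of clusters: each one starts with a ">Cluster N" header line.
cdhit_cluster_count() {
  grep -c '^>Cluster' "$1"
}

# Print "cluster_number<TAB>member_count" for every cluster.
cdhit_cluster_sizes() {
  awk '
    /^>Cluster/ { if (seen) print cl "\t" n; cl = $2; n = 0; seen = 1; next }
    NF          { n++ }                      # any non-empty line is a member
    END         { if (seen) print cl "\t" n }
  ' "$1"
}
```

For example, `cdhit_cluster_sizes all.cdhit.clstr | sort -k2,2nr | head` lists the largest families first.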
interproscan
../../software/interproscan/interproscan-5.13-52.0/interproscan.sh -i trep-cdhit.representatives.pep2 -o trep-representatives.tsv -t p -goterms -pa -f tsv
Documentation page: How to run
Programs for producing multiple alignments
MUSCLE
CLUSTALW
MAFFT
TCOFFEE
Programs for producing phylogeny & phylogenetic analysis
FastTree
PHYLIP
MrBayes
RAxML
PhyloNet
R packages for phylogenetics
APE
phangorn
phytools
Population genetics
ms: coalescence simulation
SFS: forward simulation
PAML: testing selection with Ka/Ks
Microbial genome databases & pipelines in Qiu Lab
borreliabase
pa2
spiro_genomes/treponema
Genome annotation pipeline
de novo variant call with cortex_var
Create binary file of fasta genome file.
Run cortex_var_31_c1 (the c1 build handles 1 colour, so it is used for 1 genome)
- --se_list is the option that reads the list of files you want to target (e.g. list-genome.txt)
- --kmer_size is the k-mer size; it has to be an odd integer
- --mem_height always choose 17
- --mem_width always choose 100
- --dump_binary names the output binary file (e.g. Genome.ctx)
/home/weigang/CORTEX_release_v1.0.5.21/bin/cortex_var_31_c1 --se_list list-Evo.txt --kmer_size 31 --mem_height 17 --mem_width 100 --dump_binary Evo.ctx > Evo.log
List each binary file (.ctx) in its own individual color list (ls Evo.ctx > Evo.colorlist). Then save these lists into a collective colorlist.txt (ls *.ctx > colorlist.txt).
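With more than a couple of genomes, the two ls commands above can be wrapped in a loop. The sketch below simply repeats them for every .ctx file in a directory; make_colorlists is a hypothetical helper name, and it assumes the directory holds only the single-genome binaries (a previously built all-colors.ctx would be picked up too).

```shell
# Write one <name>.colorlist per .ctx file, then the collective colorlist.txt.
# Argument: directory holding the .ctx binaries (default: current directory).
make_colorlists() {
  dir=${1:-.}
  (
    cd "$dir" || exit 1
    set -- *.ctx
    # If the glob matched nothing, "$1" is the literal pattern and does not exist.
    [ -e "$1" ] || { echo "no .ctx files in $dir" >&2; exit 1; }
    for ctx in "$@"; do
      ls "$ctx" > "${ctx%.ctx}.colorlist"
    done
    ls "$@" > colorlist.txt
  )
}
```

Usage: `make_colorlists .` in the directory containing Evo.ctx and the other binaries.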
Reveal genetic variation using the Bubble Caller from cortex_var.
/home/weigang/CORTEX_release_v1.0.5.21/bin/cortex_var_31_c5 --se_list colorlist.txt --kmer_size 31 --mem_height 17 --mem_width 100 --dump_binary all-colors.ctx > all-colors.log
Bubble Caller detects differences between genomes by assigning a distinct color to each genome (note that the options use the UK spelling of color: colour)
- --multicolour_bin takes the all-colors.ctx binary built in the previous step
- --detect_bubbles1 i/j detects variants (bubbles) between genomes i and j, where i and j are the positions of the genomes in the colorlist.txt file. Positions are counted from 0, as in the 0/1 used in the command below, so the fourth genome on colorlist.txt is colour 3
- --output_bubbles1 names the output file of variant reads in fasta format (e.g. Evo-RefHG.var for bubble detection between the Evolved genome and the Reference HG genome)
- --print_colour_coverages necessary for output
/home/weigang/CORTEX_release_v1.0.5.21/bin/cortex_var_31_c5 --kmer_size 31 --mem_height 17 --mem_width 100 --multicolour_bin all-colors.ctx --detect_bubbles1 0/1 --output_bubbles1 Evo-RefHG.var --print_colour_coverages > Evo-RefHG.log
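To avoid miscounting positions when filling in --detect_bubbles1, the colour number can be looked up from colorlist.txt. This sketch assumes colours are numbered from 0 in file order, as the 0/1 in the command above suggests; colour_index is a hypothetical helper, not a cortex_var tool.

```shell
# Print the 0-based position of an entry in the colour list.
# Arguments: entry exactly as written in the list, then the list file.
# Exits with status 1 if the entry is not found.
colour_index() {
  awk -v name="$1" '
    $0 == name { print NR - 1; found = 1; exit }
    END        { if (!found) exit 1 }
  ' "$2"
}
```

Usage, assuming Evo.ctx and RefHG.ctx are both listed in colorlist.txt: `--detect_bubbles1 "$(colour_index Evo.ctx colorlist.txt)/$(colour_index RefHG.ctx colorlist.txt)"`.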