The bioinformaticians of the Australian Centre for Ecogenomics actively design and develop software for analysing ecogenomic datasets, software that furthers the research aims of the centre.
Examples of the software currently in use are described below.
RefineM is a set of tools for improving population genomes. It provides methods designed to improve the completeness of a genome along with methods for identifying and removing contamination. RefineM comprises only part of a full genome QC pipeline and should be used in conjunction with existing QC tools such as CheckM.
Please see https://github.com/dparks1134/RefineM.
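Contamination is often spotted as scaffolds whose composition looks out of place for the rest of the bin. The sketch below shows one generic way to flag such scaffolds. It compares each scaffold's GC content to the bin's median GC. This is not RefineM's implementation. The function names and the 10-percentage-point threshold (`max_dev`) are hypothetical choices for illustration.

```python
from statistics import median


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (ambiguous bases such as N are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def flag_gc_outliers(scaffolds: dict[str, str], max_dev: float = 0.10) -> list[str]:
    """Return IDs of scaffolds whose GC differs from the bin's median GC by more than max_dev.

    max_dev is a hypothetical threshold (10 percentage points), not a RefineM default.
    """
    gc = {sid: gc_content(seq) for sid, seq in scaffolds.items()}
    bin_median = median(gc.values())  # the median is robust to a few contaminating scaffolds
    return sorted(sid for sid, value in gc.items() if abs(value - bin_median) > max_dev)


# Toy bin: two scaffolds at 40% GC and one at 90% GC.
example_bin = {
    "s1": "ATGCATGCAT" * 10,
    "s2": "AATTGGCCAA" * 10,
    "s3": "GCGCGCGCGA" * 10,
}
print(flag_gc_outliers(example_bin))  # ['s3']
```

The median is used instead of a mean and standard deviation. In a small bin, a single outlier inflates the standard deviation enough to hide itself.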
CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. It provides robust estimates of genome completeness and contamination by using collocated sets of genes that are ubiquitous and single-copy within a phylogenetic lineage. Genome quality can also be examined using plots of key genomic characteristics (e.g., GC content, coding density) that highlight sequences falling outside the expected distributions of a typical genome. CheckM also provides tools for identifying genome bins that are likely candidates for merging, based on marker set compatibility, similarity in genomic characteristics, and proximity within a reference genome tree.
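The idea of scoring a genome against collocated marker sets can be illustrated with a simplified calculation. The sketch assumes that each set contributes the fraction of its markers that are present towards completeness. It also assumes that each set contributes its extra copies, divided by the set size, towards contamination. Both scores are averaged over all sets. The sketch does not show how markers are identified or how lineage-specific sets are chosen, and CheckM's exact weighting may differ. The function and marker names are hypothetical.

```python
def completeness_contamination(
    marker_sets: list[set[str]], copy_counts: dict[str, int]
) -> tuple[float, float]:
    """Estimate completeness and contamination (both in %) from marker gene copy numbers.

    marker_sets: groups of marker IDs, each group assumed to be collocated (treated as one unit).
    copy_counts: number of copies of each marker found in the genome (missing = 0 copies).
    """
    if not marker_sets or any(len(s) == 0 for s in marker_sets):
        raise ValueError("need at least one non-empty marker set")
    completeness = contamination = 0.0
    for mset in marker_sets:
        present = sum(1 for m in mset if copy_counts.get(m, 0) >= 1)
        # every copy beyond the first counts towards contamination
        extra = sum(max(copy_counts.get(m, 0) - 1, 0) for m in mset)
        completeness += present / len(mset)
        contamination += extra / len(mset)
    n = len(marker_sets)
    return 100.0 * completeness / n, 100.0 * contamination / n


sets = [{"M1", "M2"}, {"M3", "M4"}]
counts = {"M1": 1, "M2": 1, "M3": 2}  # M4 missing, M3 duplicated
print(completeness_contamination(sets, counts))  # (75.0, 25.0)
```

Collocated markers are grouped into sets so that genes which tend to be gained or lost together do not inflate the estimate. In this sketch, a duplicated marker raises contamination without changing completeness.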
GraftM is a meta-omic tool that identifies and classifies marker genes in short-read datasets (metagenomes and metatranscriptomes), as well as in assembled contigs, whole genomes and protein sequences. GraftM outputs a taxonomic/functional summary table, a Krona plot and various other run statistics. Both unaligned and aligned "hit" sequences are provided. GraftM is designed for speed and accuracy: it can find marker genes in 200 Mb of assembled metagenome in under 20 seconds, and it compares favourably with similar tools in accuracy benchmarks.
Please see http://geronimp.github.io/graftM
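A taxonomic summary table is essentially a count of classified reads per lineage. The sketch below builds such a table from per-read assignments and can optionally roll lineages up to a given rank depth. It assumes a hypothetical `;`-separated lineage string and does not reproduce GraftM's actual output format or API.

```python
from collections import Counter


def summarise(assignments, depth=None):
    """Count reads per taxon, optionally truncating lineages to the first `depth` ranks.

    assignments: iterable of (read_id, lineage) pairs; lineage is a ';'-separated string
    (a hypothetical format, not GraftM's exact output).
    Returns (lineage, count) pairs, most abundant first, ties broken alphabetically.
    """
    counts = Counter()
    for _read_id, lineage in assignments:
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        if depth is not None:
            ranks = ranks[:depth]  # roll up to a coarser rank
        counts["; ".join(ranks)] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


reads = [
    ("r1", "Root; k__Bacteria; p__Firmicutes"),
    ("r2", "Root; k__Bacteria; p__Proteobacteria"),
    ("r3", "Root; k__Bacteria; p__Firmicutes"),
    ("r4", "Root; k__Archaea"),
]
for lineage, n in summarise(reads, depth=2):
    print(f"{n}\t{lineage}")
```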
GroopM is a metagenomic binning toolset. It leverages spatio-temporal dynamics (differential coverage) to accurately, and almost automatically, extract population genomes from multi-sample metagenomic datasets.
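Differential coverage rests on a simple observation. Contigs from the same population should rise and fall in coverage together across samples. The sketch below illustrates that idea with a greedy single-pass clustering of per-sample coverage profiles by cosine similarity. It is not GroopM's algorithm. The function names and the `min_sim` threshold are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two coverage profiles (0.0 if either profile is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bin_by_coverage(profiles: dict[str, list[float]], min_sim: float = 0.99) -> list[list[str]]:
    """Each contig joins the first bin whose seed profile is similar enough, else starts a new bin."""
    bins: list[tuple[list[float], list[str]]] = []  # (seed profile, member contig IDs)
    for contig, profile in profiles.items():
        for seed, members in bins:
            if cosine(seed, profile) >= min_sim:
                members.append(contig)
                break
        else:
            bins.append((profile, [contig]))
    return [members for _seed, members in bins]


# Coverage of four contigs across three samples.
profiles = {
    "c1": [10, 50, 5],
    "c2": [20, 100, 10],  # same pattern as c1 at twice the depth
    "c3": [40, 2, 30],
    "c4": [80, 4, 61],
}
print(bin_by_coverage(profiles))  # [['c1', 'c2'], ['c3', 'c4']]
```

Cosine similarity compares the shape of a profile and ignores its scale. As a result, two populations with proportional abundance patterns would merge, so a realistic binner would also weigh absolute coverage and sequence composition.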
BamM is a C library, wrapped in Python, for efficiently generating and parsing BAM files, specifically for the analysis of metagenomic data. For instance, it implements several methods for assessing contig-wise read coverage and provides a convenient interface for mapping multiple sequencing libraries against an assembly.
Please see http://ecogenomics.github.io/BamM/
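Contig-wise coverage can be summarised in several ways. The sketch below compares a plain mean depth with a trimmed mean that discards extreme positions, such as a spike from a repeat. It works on simple (start, end) read intervals rather than BAM records and does not use BamM's API. The function names and the default `trim` fraction are hypothetical.

```python
def depth_profile(contig_length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-base read depth for one contig.

    alignments: 0-based, half-open (start, end) reference intervals, one per aligned read.
    """
    diff = [0] * (contig_length + 1)  # difference array: +1 where a read starts, -1 where it ends
    for start, end in alignments:
        start, end = max(start, 0), min(end, contig_length)  # clip to the contig
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(contig_length):
        running += diff[i]
        depth.append(running)
    return depth


def mean_coverage(depth: list[int]) -> float:
    return sum(depth) / len(depth) if depth else 0.0


def trimmed_mean_coverage(depth: list[int], trim: float = 0.1) -> float:
    """Mean depth after dropping the lowest and highest `trim` fraction of positions."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    if not depth:
        return 0.0
    ordered = sorted(depth)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


reads = [(0, 5), (2, 7), (5, 10)]
spiky = depth_profile(10, reads + [(3, 4)] * 20)  # 20 extra reads pile onto one base
print(mean_coverage(spiky), trimmed_mean_coverage(spiky))  # 3.5 1.5
```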
Metagenome and isolate assemblers generate contigs from reads, but still leave valuable information on the table. FinishM exploits this information to improve/finish a draft genome without any further laboratory-based work.
In even a moderately successful assembly, the resulting contigs make up the vast majority of the genome being sequenced, yet assemblers ignore this fact. Unlike a traditional assembler, FinishM does not attempt to extend contigs directly; instead, it focuses on connecting contigs that have already been assembled.
Please see https://github.com/wwood/finishm
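To make "connecting already assembled contigs" concrete, the sketch below shows a deliberately simple and generic approach. It counts read pairs whose mates land near the ends of two different contigs, keeps links with enough support, and then greedily accepts the strongest links so that each contig end joins at most once. This is not FinishM's method. The input format, function names and `min_support` threshold are all hypothetical.

```python
from collections import Counter


def count_links(pair_hits, min_support=3):
    """pair_hits: ((contig, end), (contig, end)) tuples, one per read pair, where end is
    "start" or "end". Returns links with at least min_support pairs, strongest first."""
    counts = Counter()
    for a, b in pair_hits:
        if a[0] == b[0]:
            continue  # both mates on the same contig: no information about joins
        counts[tuple(sorted((a, b)))] += 1  # a-b and b-a are the same link
    links = [(pair, n) for pair, n in counts.items() if n >= min_support]
    return sorted(links, key=lambda item: (-item[1], item[0]))


def choose_joins(links):
    """Greedily accept the best-supported links, using each contig end at most once."""
    used, joins = set(), []
    for (a, b), n in links:
        if a in used or b in used:
            continue
        used.update((a, b))
        joins.append((a, b, n))
    return joins


hits = (
    [(("c1", "end"), ("c2", "start"))] * 4
    + [(("c2", "start"), ("c1", "end"))]
    + [(("c2", "start"), ("c3", "end"))] * 3  # conflicts with the c1-c2 link
    + [(("c1", "start"), ("c1", "end"))]
)
print(choose_joins(count_links(hits)))  # [(('c1', 'end'), ('c2', 'start'), 5)]
```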
SingleM is a tool for determining the abundances of discrete operational taxonomic units (OTUs) directly from shotgun metagenome data, without heavy reliance on reference sequence databases. It can differentiate closely related species even when those species belong to lineages new to science.
Please see https://github.com/wwood/singlem
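One way to define OTUs without a reference database is to compare reads directly with each other over a short, fixed-length window of a marker gene. The sketch below assumes that such windows have already been extracted, one per read, and treats each distinct window sequence as its own OTU. The extraction step is not shown, and the function name is hypothetical. Treat this as an illustration of the idea, not SingleM's pipeline.

```python
from collections import Counter


def otu_table(window_seqs):
    """Group reads into OTUs by exact sequence identity over a shared marker-gene window.

    window_seqs: one nucleotide string per read, all assumed to cover the same window.
    Returns (sequence, read count, relative abundance) rows, most abundant first.
    """
    counts = Counter(seq.upper() for seq in window_seqs if seq)
    total = sum(counts.values())
    return [(seq, n, n / total) for seq, n in counts.most_common()]


windows = ["ACGTACGTAC", "ACGTACGTAC", "acgtacgtac", "ACGTACGTAT"]
for seq, n, frac in otu_table(windows):
    print(seq, n, f"{frac:.2f}")
```

Because OTUs are defined by the reads themselves, a single-base difference in the window (here the last base) keeps two closely related lineages apart even if neither has a reference sequence.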
A simple and not slow open reading frame (ORF) caller. No bells or whistles like frameshift detection, just a straightforward goal of returning a FASTA file of open reading frames over a certain length from a FASTA/Q file of nucleotide sequences.
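The core task of a stop-to-stop ORF caller can be sketched in a few lines. The sketch scans all six frames and reports stretches between stop codons that reach a minimum length. It is not the tool's implementation, does no FASTA/Q parsing, and does not treat ambiguous bases specially. The function names and the 96 nt default for `min_len` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def call_orfs(seq: str, min_len: int = 96):
    """Yield stop-to-stop ORFs of at least min_len nucleotides in all six frames.

    Yields (strand, frame, start, end, orf); coordinates are 0-based, half-open, on the
    scanned strand, and the ORF excludes the terminating stop codon. An ORF running off
    the end of the sequence (no stop) is truncated at the last complete codon.
    """
    min_len = max(min_len, 3)  # ORF lengths are whole codons; this only rules out empty ORFs
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if i - orf_start >= min_len:
                        yield strand, frame, orf_start, i, s[orf_start:i]
                    orf_start = i + 3  # next ORF begins after this stop
            last_codon_end = frame + ((len(s) - frame) // 3) * 3
            if last_codon_end - orf_start >= min_len:
                yield strand, frame, orf_start, last_codon_end, s[orf_start:last_codon_end]


example = "ATG" + "AAA" * 40 + "TAA"
for strand, frame, start, end, orf in call_orfs(example):
    print(f">example_{strand}{frame}_{start}_{end}\n{orf}")
```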