changelogs.md


mad-lab/transit


3.2.3

October 16, 2021

TRANSIT:

  • added Binomial essentials (EB) to Gumbel analysis (supplementing genes categorized as E), to help with low-saturation datasets
  • modified ANOVA and ZINB so that --include-conditions and --exclude-conditions refer to original Conditions column in samples metadata file (instead of whatever is specified by --conditions)

TPP:

  • improved metrics reported in *.tn_stats by TPP, to help diagnose why reads don't map

3.2.2

September 8, 2021

TRANSIT:

  • fixed bug in converting gff_to_prot_table
  • fixed bug in tn5gaps (fixes some false negative calls)
  • fixed some bugs in pathway_enrichment (GSEA calculations)
  • fixed links to Salmonella Tn5 data in docs
  • fixed problem with margins in heatmap.py that was causing R to fail
  • added --ref to anova.py and zinb.py (for computing LFCs relative to designated reference condition)
  • added --low_mean_filter for heatmap.py (for excluding genes with low counts, even if they are significant by ANOVA or ZINB)
  • added dependency on pypubsub<4.0

3.2.1

December 22, 2020

TRANSIT:

  • maintenance release
    • fixed a bug in the GUI caused by changes in wxPython 4.1.0
    • added GO terms for M. smegmatis in the data directory for doing pathway analysis

3.2.0

October 26, 2020

TRANSIT:

  • improvements to pathway_enrichment analysis
    • added '--ranking' flag for GSEA to sort genes based on LFC or SLPV
    • implemented Ontologizer method (-M ONT), which works better for GO terms
    • updated auxiliary files in transit data directory for different systems of functional categories (COG, Sanger, GO)
  • added '-signif' flag to GI (Genetic Interaction analysis) (options: HDI, prob, BFDR, FWER)
    • updated description of methods for determining significant interactions in documentation
  • various improvements to other methods, including corrplot and heatmap

3.1.0

March 8, 2020

TRANSIT:

  • added 'corrplot' and 'heatmap' commands
  • pathway_enrichment:
    • completely re-done so it is faster and simpler
    • now implements Fisher's exact test and GSEA
    • can be used with COG categories and GO terms
    • switch to 2-column format for associations files
  • resampling:
    • changed semantics of pseudocounts from "fake sites" (-pc, dropped) to calculation of log-fold-changes (-PC, new)
  • anova:
    • put columns for condition means in correct order
    • added columns for log-fold-changes for each condition to output
  • zinb:
    • improved handling of --include-conditions and --ignore-conditions
    • now prints out a summary of how many samples are in each condition (including cross-product with covars and interactions)
  • make pseudocounts flag (-PC) work uniformly for resampling, anova, and zinb
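
The change in pseudocount semantics can be illustrated with a minimal sketch. It assumes the log-fold-change is the log2 ratio of the two condition means, with the pseudocount added to both; the function name `log_fold_change` and its default value are hypothetical and are not taken from the TRANSIT source.

```python
import math

def log_fold_change(mean_a, mean_b, pseudocount=1.0):
    """Hypothetical helper: log2 fold change of condition B relative to A.

    The pseudocount is added to both means (an assumed formula), so genes
    with a zero mean in one condition still yield a finite value.
    """
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))

# A gene with no insertions in condition A still gets a finite LFC.
print(log_fold_change(0.0, 7.0))
# A larger pseudocount pulls the LFC of low-count genes toward zero.
print(log_fold_change(0.0, 7.0, pseudocount=5.0))
```

Under this assumption the pseudocount changes only the reported log-fold-change, not the counts themselves, which is the difference from the older "fake sites" behavior described above.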

3.0.2

December 21, 2019

TRANSIT:

  • Mostly cosmetic fixes
  • Updated some command-line and GUI messages
  • Updated documentation (especially for GI and resampling)
  • Removed "warning: high stderr" from gene status in ZINB
  • Added LFCs in ZINB output
  • updated 'convert gff_to_prot_table' so it works with gff3 files downloaded from NCBI

3.0.1

August 1, 2019

TRANSIT:

  • Add check for python3 (TRANSIT 3+ requires python3.6+)
  • Minor fixes in GI and Pathway Enrichment

3.0.0

July 18, 2019

TRANSIT:

  • TRANSIT now supports python 3. (To use with python 2, use releases < 3.0.0)
  • Improved speed of GSEA in Pathway Enrichment analysis.

2.5.2

May 16, 2019

TRANSIT:

  • Made some improvements in command-line version of 'tn5gaps'
  • Added flags for trimming insertions in N- and C-termini of genes for tn5gaps (-iN and -iC)

2.5.1

April 25, 2019

TRANSIT:

2.5.0

March 28, 2019

TRANSIT:

  • Added analysis method for Zero-Inflated Negative Binomial (ZINB)
  • Fix LOESS flag bug in resampling 2.4.2
  • Resampling supports combined_wig files
  • Change ordering of metadata and annotation file in ANOVA cmd

2.4.2

March 15, 2019

TPP:

  • updated docs for TPP; expanded discussion of protocols, including Mme1
  • for Mme1, change min read length from 20bp to 15bp (for genomic part of read1)
  • replaced '-himar1' and '-tn5' flags with '-protocol [sassetti|tn5|mme1]'
  • added 'auto' for -replicon-ids
  • added 'pre-trimmed' as option for transposon in TPP GUI (prefix="")

TRANSIT:

  • resampling can now be done between TnSeq libraries from different strains
  • add documentation for 'griffin' and Mann-Whitney 'utest' analysis methods

2.4.1

March 4, 2019

TPP:

  • allow the primer sequence to be the empty string (i.e. -primer "" on command-line; for pre-trimmed reads)
  • do not throw an error if header ids in read1 and read2 fastq files happen to match identically
  • minor bug fixes:
    • fixed problem of order of data in tn_stats table when there are multiple contigs but only single-ended reads
    • fixed name of flag from "replicon-id" to "replicon-ids"
    • prevent div-by-zero error in cases where no reads map

2.4.0

February 28, 2019

TPP:

  • can now handle genomes with multiple contigs (thanks to modifications by Robert Jenquin and William Matern); it creates multiple .wig files as output
  • BWA: switched from using 'aln' to 'mem' by default
  • added flags to set the nucleotide window for searching for start of primer sequence (-primer-window-start)
  • fixed bug in counting misprimed reads, and reads mapped to both R1 and R2
  • added some fields to TPP GUI, and made it more consistent about saving/reading parameters in the tpp.cfg config file

TRANSIT:

  • fixed bug in handling '-minreads' flag in Gumbel analysis
  • updated support for converting .gff files to .prot_table format (in GUI and on command line)
  • added a status field to ANOVA output
  • TrackView scales all plots simultaneously by default
  • updated documentation

Pull Request 18 by Robert Jenquin and William Matern (Jan, 2019)

  • Added the ability to accept multiple replicons in the form of either multiline reference genomes or multiple reference genome files.
  • Added -bwa-alg argument, allowing the user to specify mem or aln to use bwa mem or bwa aln algorithms
  • Now requires -replicon-id argument to specify names for the replicons if multiple reference genomes are given (in the same order as they appear in the reference genome(s))
  • Code cleanup: closing dangling file handles
  • Bug fix: if adapter is at exact end of R1, it is now properly handled
  • Bug fix: trimmed_reads now counted properly
  • Added support for specifying -window-size argument
  • Sample usage:
    python2 src/tpp.py -himar1 -bwa /usr/bin/bwa -bwa-alg aln -ref MAC109_genome.fa -replicon-id CP029332 CP029333 CP029334 -reads1 ../HJKK5BCX2_ATGCTG_1.fastq -reads2 ../HJKK5BCX2_ATGCTG_2.fastq -primer AACCTGTTA -mismatches 2 -window-size 6 -output tpp_output/avium
  • Explanation of arguments
    • -himar1 specifies that the Himar1 transposon was used in the transposon mutagenesis procedure. Tn5 is also supported (-tn5)
    • -bwa specifies the path to the bwa executable
    • -bwa-alg specifies either mem or aln algorithms for bwa to use. aln is widely considered obsolete compared to mem for reads of length > 70bp. aln is default.
    • -ref specifies the reference genome(s) in FASTA format to which reads will be mapped. If more than one, they can be specified in either multiple FASTAs, or as a multilined FASTA (or a combination of both).
    • -replicon-id [contig1 contig2 ...] specifies the names of the contigs in the genome(s). These are used as filename suffixes for output files (ie *_contig1.wig, *_contig2.wig, etc). The order of the contigs is assumed to be the same as they appear in the reference genome(s) (as given with -ref). Specifying this option is only required if there is more than one contig. Note: While you can technically use any contig name at this step, if you wish to use wig_gb_to_csv.py to organize the data you should use the contig names as they appear in the Genbank file (as specified by wig_gb_to_csv.py -g).
    • -reads1 specifies the file containing the raw reads (untrimmed) for read1 in FASTQ or FASTA format
    • -reads2 specifies the file containing the raw reads (untrimmed) for read2 in FASTQ or FASTA format
    • -primer specifies a nucleotide sequence at the end of the transposon, which is used to separate transposon DNA from genomic DNA in read 1.
    • -window-size specifies how many positions to look for -primer within read 1. It should be set to at least the difference between the maximum and minimum expected positions of the first base of genomic DNA in read 1 (and larger if you want to allow for insertions/deletions). For the Long et al 2015 protocol (using a pool of 4 shifting prefixes) the window-size should be at least 6. Default value is 6.
    • -mismatches specifies the number of mismatches to allow when searching for the transposon in read 1 (ie number of mismatches to -primer).
    • -output specifies the filename prefix to be applied to output files. Can include directories, allowing custom paths to be specified.
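
How -primer, -window-size and -mismatches interact can be sketched as follows. This is an illustrative approximation, not the code in tpp.py: it assumes the primer may start at any of the first `window_size` offsets of read 1 and that mismatches are counted position by position. The names `hamming` and `find_genomic_start` are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_genomic_start(read, primer, window_size=6, max_mismatches=1):
    """Return the index of the first genomic base in read 1, or None.

    Assumption: the primer may begin at any offset in [0, window_size).
    """
    for offset in range(window_size):
        candidate = read[offset:offset + len(primer)]
        if len(candidate) < len(primer):
            break  # read is too short to contain the primer here
        if hamming(candidate, primer) <= max_mismatches:
            return offset + len(primer)
    return None

primer = "AACCTGTTA"
read = "GG" + primer + "TCGATCGA"   # primer shifted by a 2-base prefix
start = find_genomic_start(read, primer)
print(read[start:])                 # genomic part of the read
```

With shifting prefixes, the primer start varies from read to read, so a window that is too small causes reads with longer prefixes to be discarded as lacking the transposon sequence.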

2.3.4

January 14, 2019
  • TRANSIT:
    • Minor bug fixes related to flags in Resampling and HMM

2.3.3

December 6, 2018
  • TRANSIT:
    • Minor bug fixes related to flags in HMM

2.3.2

November 9, 2018
  • TRANSIT:
    • Minor bug fixes related to changing parameters in TPP GUI

2.3.1

October 19, 2018
  • TRANSIT:
    • Removed dependence on PyPubSub (can run Transit in command-line mode without it, but needed for GUI)

2.3.0

October 10, 2018
  • TRANSIT:
    • Added calculation of Pathway Enrichment as post-processing for resampling, to determine if conditionally essential genes over-represent a particular functional category or pathway (such as for GO terms)
    • Added ANOVA analysis for identifying genes with significant variability of counts across multiple conditions
    • Updated Documentation - especially for "Quality Control/TnSeq Statistics"; also added more command-line examples under "Analysis Methods"
    • Fixed bugs (including TrackView in the GUI)
    • Upgraded dependencies, including wxPython 4.0 (required)

2.2.0

June 4, 2018
  • TRANSIT:
    • Added analysis method for Genetic Interactions.
    • Added Mann-Whitney U-test for comparative analysis.
    • Made TRANSIT compatible with wxPython 4.0 (Phoenix).
    • Datasets now automatically selected when they are added to TRANSIT.
    • Fixed bug in packaging of TPP, causing problem with console mode in new setuptools.
    • Miscellaneous bug fixes.

2.1.0

June 23, 2017
  • TRANSIT:

    • Added tooltips next to most parameters to explain their functionality.
    • Added Quality Control window, with choice for normalization method.
    • Added more normalization options to the HMM method.
    • Added LOESS correction functionality back to TRANSIT
    • Added ability to scale Track View based on mean-count of the window.
    • Added ability to scale individual tracks in Track View.
    • Added ability to add tracks of features to Track View.
    • New documentation on normalization.
  • TPP:

    • TPP can now accept empty primer prefix (in case reads have been trimmed).
    • TPP can now process reads obtained using Mme1 enzyme and protocol.
    • TPP can now pass flags to BWA.

2.0.2

August 19, 2016
  • TRANSIT:

    • Now accepts GFF3 formatted annotations.
    • Added ability to specify pseudocounts for resampling.
    • Added extra columns to resampling output.
    • Fixed bug with some log2FC calculations.
    • Export to combined wig format now asks for normalization BEFORE file name.
    • Fixed bug preventing Quality Control window from opening.
    • Miscellaneous bug fixes.
    • Updates to Documentation
  • TPP:

    • Now accepts custom primer sequences.
    • Reporting additional diagnostic statistics for reads mapping to phiMycoMarT7, and Illumina adapters.
    • Miscellaneous bug fixes.

2.0.1

July 5, 2016

  • TRANSIT:

    • Fixed crash in TPP.
    • Misc changes for outputs.

2.0.0

June 16, 2016
  • TRANSIT:
    • Added new method for datasets created with Tn5 transposons.
    • Added label indicating intended transposons for the methods.
    • Added textbox with short description of the chosen method.
    • Changed methods choices to be in menu (on top).
    • Changed the file display window.
    • Added Help menu with link to online documentation.
    • Added new logo.
    • Added option to export (normalized) datasets to IGV or combined wig format.
    • Can now select multiple .wig files at the same time (Ctrl + select).
    • Lots of changes under the hood.

1.4.5

January 10, 2016
  • TRANSIT:
    • Added Binomial analysis method as an option to TRANSIT.
    • Added DE-HMM analysis method as an option to TRANSIT.

1.4.3

January 2, 2016
  • TRANSIT:
    • Fixed bug causing TRANSIT not to open on some Windows systems.

1.4.3

December 4, 2015
  • TRANSIT:
    • Precision of resampling p-values in output file now increases with sample size
    • Added preliminary Quality Control functionality. Select some datasets and click View -> Quality Control
    • In resampling, changed logFC to divide by number of replicates
    • Changed plotting of results files to be more versatile
    • Fixed bug causing HMM_sites output not to be added to list of files
    • Fixed bug causing LOESS correction not to work in HMM

1.4.2

July 29, 2015
  • TRANSIT:
    • Added Total Trimmed Reads normalization (TTR) as the default option. This is the recommended normalization method at this point.
    • Added BetaGeometric Correction (betageom) as a normalization option. This is recommended for datasets that are very skewed.
    • Fixed bug that caused transit to create histograms when not desired.
    • Added a pseudo-count when calculating log-FC to genes without reads.
    • Increased size of result windows so that all columns are immediately visible.

1.4.1

June 5, 2015
  • TRANSIT:
    • TRANSIT now accepts read-counts in floating-point precision, not just integers.
    • Made transit work with most recent versions of matplotlib.

1.4.0

May 27, 2015
  • TRANSIT:
    • Added option to correct for genomic position bias (using LOESS)
    • Added more options for normalization, including zero-inflated negative binomial and quantile normalization.
  • TPP:
    • Eliminated soft-clipped reads.
    • Modified template_counts() to be much more memory efficient (does not need gigabytes of RAM any more to process large datasets)
    • Added ability to process Tn5 datasets

1.3.0

March 31, 2015
  • TRANSIT:
    • Fixed threading issue for volcano plot.
    • Improved format and quality of the output messages.
    • Fixed direction of log-fold change in volcano plots.
    • Added log-fold change column to resampling output file.
    • Made adaptive resampling work better with custom sample sizes.
  • TPP:
    • Fixed genomic portion for single ends.
    • Added usage help as part of command line arguments.

1.2.33

March 6, 2015
  • TRANSIT:
    • Fixed issue with histograms created using adaptive resampling.

1.2.32

March 5, 2015
  • TRANSIT:

    • Put .pyc files in new src/ directory.
    • Fixed error that sometimes occurred when plotting volcano plots.
    • Made TRANSIT default to the current working directory when opening file dialogs.
  • TPP:

    • TPP can now process files with single-end reads.
    • TPP can now process .fasta and compressed files with ".fastq.gz" extension

1.2.7

February 25, 2015
  • TRANSIT:

    • Fixed error that occurred when displaying graphs after running an analysis.
    • Updated datasets included in the data/ directory.
  • TPP:

    • Removed the requirement for wxPython when running TPP on command-line mode.

1.1.0

February 20, 2015
  • TRANSIT:

    • Fixed error in HMM results file table, which was not correctly showing breakdown of genes.
    • Made TRANSIT work from the command-line, without displaying GUI. See documentation for arguments/flags.
    • Added ability to convert annotation files between several formats (.prot_table, ptt.table, gff3).
  • TPP:

    • User can supply reads in either FastA or FastQ format.
    • Added an option to specify number of mismatches (default=1) when looking for sequence patterns such as the transposon prefix in read 1.
    • Added command-line arguments so TPP can be run in batch mode without the GUI.
    • Number of mapped reads for R1 and R2 independently is also now reported.
    • Modified how barcodes are extracted from read 2. It now looks for specific sequence patterns, even if they are shifted. This should greatly increase the number of mapped reads (esp. the genomic part of R2) for certain datasets.
    • Properly handle short fragments, ie. for reads where the insert size is shorter than the read length. In such cases, the adapter from other end appears at the end of read 1, and this suffix is now stripped off so these reads will map too.
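
The short-fragment handling described in the last item can be sketched roughly as below. This is an assumption about the general technique (trimming an adapter suffix from read 1), not TPP's actual implementation; the function name, the `min_overlap` parameter, and the adapter string used in the example are all placeholders.

```python
def strip_adapter_suffix(read, adapter, min_overlap=5):
    """Trim read 1 at the adapter, if the adapter (or a prefix of it) is present.

    Hypothetical sketch: a full adapter anywhere in the read, or an adapter
    prefix of at least min_overlap bases at the very end, is removed.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # The read may end partway through the adapter.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read

adapter = "AGATCGGAAGAGC"                 # placeholder adapter sequence
insert = "ACGTACGTAC"                     # fragment shorter than the read
print(strip_adapter_suffix(insert + adapter + "TTT", adapter))
print(strip_adapter_suffix(insert + adapter[:7], adapter))
```

After trimming, only the genomic insert remains, so the read can be aligned instead of being rejected because of the trailing adapter bases.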

1.0.0

February 10, 2015
  • First limited-release version of TRANSIT
  • Released to close collaborators first and presented in teleconference to get feedback.