The UEA Small RNA Workbench Version 2.5.0

The UEA Small RNA Workbench Version 2.5.0: Released: 13/12/2012

Includes: changes and additions to the behaviour of certain tools in the Workbench (when running in GUI mode) and some bug fixes

PLEASE NOTE: The addition of batch mode to many of the tools has changed the behaviour of many of the progress bars. Because the jobs are processed in parallel, reporting each file's progress is not possible without a separate progress bar for each file. Until a better solution presents itself, the main progress bars of many batch mode tools will tell you which files have completed, and some tools contain tabbed interfaces showing which files are still processing. See details below:

Sequence Alignment

  • The sequence alignment tool can now handle long read files in Windows format. Previously the Workbench only converted the short read files from Windows line endings to UNIX line endings, which could cause problems in the patman output.
  • The sequence alignment tool can now output matches on either the positive or negative strand (or both) depending on the selected parameters
  • Batch mode is here! (note CLI mode is unchanged)
    • To use the tool, add the files to be aligned to the box using the file dialogs
    • The output must now be a path to a directory. Each aligned sequence file is placed there under the input filename with _Aligned.patman appended to the end
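
The line-ending conversion mentioned above can be illustrated with a minimal Python sketch. This is not the Workbench's own code (the Workbench is a Java application); the function names `to_unix_line_endings` and `convert_file` are hypothetical, and the whole file is read into memory, which a real tool would probably avoid for large long read files.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to UNIX (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path with UNIX line endings to dst_path."""
    # Binary mode stops Python from translating newlines itself.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_line_endings(data))
```

Files that already use UNIX line endings pass through unchanged, so the conversion is safe to apply to every input.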

Filter

  • Batch mode is here! (note CLI mode is unchanged)
    • To use the tool, add the files to be filtered to the box using the file dialogs
    • The output must now be a path to a directory. Each filtered sequence file is placed there under the input filename with _Filter.fa appended to the end
    • The optional Discarded Sequences box must now also be a directory. Each file is named after the original file with _Discarded.fa appended to the end
  • The output statistics are now given in a set of tabs, one per input file and labelled with its filename
    • Each tab has its own progress indicator. When the tab shows a green tick, that file has been completed
    • As the files are all processed in parallel, they may finish in a different order depending on file size
  • The maximum limit for sequence lengths has been increased to 60
  • The minimum limit for sequence lengths has been decreased to 10
  • The GUI now gives a size class distribution before and after filtering
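
As a rough illustration of a size class distribution before and after length filtering, here is a minimal Python sketch. It is not the Filter tool's implementation: the function names are hypothetical, the bounds 10 and 60 are simply the new parameter limits quoted above, and treating both bounds as inclusive is an assumption.

```python
from collections import Counter

MIN_LENGTH = 10  # lowest minimum length mentioned in these notes
MAX_LENGTH = 60  # highest maximum length mentioned in these notes


def size_class_distribution(sequences):
    """Count how many sequences fall into each length (size class)."""
    return Counter(len(seq) for seq in sequences)


def filter_by_length(sequences, min_length=MIN_LENGTH, max_length=MAX_LENGTH):
    """Split sequences into (kept, discarded) using inclusive length bounds."""
    kept, discarded = [], []
    for seq in sequences:
        if min_length <= len(seq) <= max_length:
            kept.append(seq)
        else:
            discarded.append(seq)
    return kept, discarded


reads = ["ACGTACGT", "TTGACAGAAGATAGAGAGCAC", "TGACAGAAGAGAGTGAGCAC"]
before = size_class_distribution(reads)
kept, discarded = filter_by_length(reads)
after = size_class_distribution(kept)
```

Comparing `before` and `after` shows which size classes the filter removed; the discarded reads correspond to what the GUI writes to the Discarded Sequences directory.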

Adapter Removal

  • Batch mode is here! (note CLI mode is unchanged)
    • The interface has changed slightly to allow for this: users can now input a list of files into the GUI for adapter removal
    • The output box must now contain a path to a directory. After processing, each file is placed in this directory under the original file name with _AR appended to the end
    • The optional Discarded Sequences box must now also be a directory. Each file is named after the original file with _Discarded.fa appended to the end
  • Every file in a batch is trimmed with the same adapter sequence, so that adapter must be present in each file
  • The tables are now contained in a series of tabs, labelled with the filename of each file that has been processed
    • Each tab has its own progress indicator. When the tab shows a green tick, that file has been completed
    • As the files are all processed in parallel, they may finish in a different order depending on file size
  • The main progress bar currently will only report how many files have been completed rather than the progress of each file being processed
  • Added a new Adapter Sequence to the pre-defined sequences
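
The batch behaviour described above (files processed in parallel, with the main progress bar counting completed files rather than tracking each file) can be sketched in Python. This is illustrative only and not the Workbench's code, which is written in Java. `process_file`, `run_batch` and `max_workers` are hypothetical names, and the per-file work is a stand-in that only builds the `_AR` output name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_file(path):
    """Hypothetical stand-in for per-file work such as adapter removal."""
    return path + "_AR"  # original file name with _AR appended


def run_batch(paths, max_workers=4):
    """Process files in parallel and report each one as it completes."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted job back to the file it belongs to.
        futures = {pool.submit(process_file, p): p for p in paths}
        # as_completed yields jobs in finishing order, not submission order.
        for done_count, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            results[path] = future.result()
            print(f"{done_count}/{len(paths)} files completed: {path}")
    return results
```

Because jobs are reported in the order they finish, small files tend to be announced before large ones, which matches the note about files completing in different orders.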

RNA Annotation

  • Added file load support for rendering multiple sequences from FASTA files. The primary use is for files exported from miRCat results
  • The program will create secondary structure plots from any FASTA formatted file that contains foldable RNA sequences. If the FASTA file is formatted in the same way as the miRCat output, the miRNA and star sequence will also be highlighted

The format in FASTA is as follows:

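>maturemiRNA_SEQUENCE_miRNA*_SEQUENCE_CHROMOSOMEHEADER_PRECURSOR-START_COORD_END_COORD_MFE_VALUE_STRAND_PLUSMINUS [Negative strand][Positive strand]

HAIRPIN

The keywords maturemiRNA, miRNA*, PRECURSOR-START, END, MFE and STRAND do not change. The remaining values (SEQUENCE, CHROMOSOMEHEADER, COORD, VALUE, PLUSMINUS) must be replaced with the required data, and HAIRPIN is the precursor sequence on the line after the header. An example is given below: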

maturemiRNA: TTGAGCCGTGCCAATATCACG

miRNA*: AGATATTAGTGCGGTTCAATC

Chromosome header: “1 CHROMOSOME dumped from ADB: Feb/3/09 16:9; last updated: 2009-02-02”

PRECURSOR Start-End coord: 3961364, 3961453

Minimum Free Energy : -41.9

and it is on the negative strand

>maturemiRNA_TTGAGCCGTGCCAATATCACG_miRNA*_AGATATTAGTGCGGTTCAATC_1 CHROMOSOME dumped from ADB: Feb/3/09 16:9; last updated: 2009-02-02_PRECURSOR-START_3961364_END_3961453_MFE_-41.9_STRAND_- [Negative strand]
CGCGAGATATTAGTGCGGTTCAATCAAATAGTCGTCCTCTTAACTCATGGAGAACGGTGTTGTTCGATTGAGCCGTGCCAATATCACGCG
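
For readers who want to check that their own FASTA headers follow this layout, here is a minimal Python sketch of a parser. It is inferred from the single example above rather than taken from the Workbench source, so the regular expression, the field names and the `parse_header` function are all assumptions. It anchors on the fixed keywords so that the chromosome header may contain spaces.

```python
import re

# Fixed keywords come from the example header; group names are illustrative.
HEADER_PATTERN = re.compile(
    r"^>?maturemiRNA_(?P<mature>[^_]+)"
    r"_miRNA\*_(?P<star>[^_]+)"
    r"_(?P<chromosome>.+)"
    r"_PRECURSOR-START_(?P<start>\d+)"
    r"_END_(?P<end>\d+)"
    r"_MFE_(?P<mfe>-?\d+(?:\.\d+)?)"
    r"_STRAND_(?P<strand>[+-])"
)


def parse_header(header):
    """Return the header fields as a dict, or None if the layout differs."""
    match = HEADER_PATTERN.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["start"] = int(fields["start"])
    fields["end"] = int(fields["end"])
    fields["mfe"] = float(fields["mfe"])
    return fields


example = (
    ">maturemiRNA_TTGAGCCGTGCCAATATCACG_miRNA*_AGATATTAGTGCGGTTCAATC"
    "_1 CHROMOSOME dumped from ADB: Feb/3/09 16:9; last updated: 2009-02-02"
    "_PRECURSOR-START_3961364_END_3961453_MFE_-41.9_STRAND_- [Negative strand]"
)
fields = parse_header(example)
```

A header that returns `None` here does not match the layout of the example, although the Workbench itself may be more or less strict than this sketch.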

miRProf

  • Removed the weighting from the calculation of normalised values for miRProf
  • Removed the need for a genome to be supplied in order to detect miRNAs from a sRNA dataset
    • If no genome is specified the tool will conduct normalisation on total reads rather than genome matching (can be dangerous if there is contamination!)
    • A warning is shown to the user when no genome is being used
  • Completely redesigned the output of miRProf to a more useful form
    • Previously, miRProf would output (depending on grouping options):
      • miRNA code, total raw abundance (of all sequences matching miRNA code), sequences, normalised abundance over total for each sample file
    • Now it will output:
      • miRNA code, which drops down to reveal every miRNA sequence, each followed by its raw and normalised abundance and the number of hits that individual sequence had to the genome (if present)
    • This should make it far easier to identify the most abundant miRNA sequence in your dataset, to choose a sequence when selecting probes for further experiments, and to judge differential expression over your sample set. An example of the old output and the new output is given below:
OLD (Truncated to allow text to fit on the page):
mir156 RAW: 359 NORM: 2308.38 TTGACAGAAGATAGAGAGCAC; RAW: 14 NORM: 184.15 TTGACAGAAGATAGAGAGCAC 14 972.7
NEW

mir156                  Genome-Matches  S1 Raw  S2 Raw  S3 Raw  S4 Raw  S1 Norm  S2 Norm  S3 Norm  S4 Norm
CTGACAGAAGATAGAGAGCAC   1               3       0       0       57      19.29    0        0        812.12
GCTCACCTCTCTTTCTGTCAGT  1               6       0       0       0       38.58    0        0        0
GCTCACTGCTCTTTCTGTCAGA  1               0       0       0       1       0        0        0        14.25
TGACAGAAGAAAGAGAGCAC    1               0       1       0       0       0        13.15    0        0
TGACAGAAGAGAGTGAGCA     6               1       0       0       0       6.43     0        0        0
TGACAGAAGAGAGTGAGCAC    6               236     4       0       19      1517.49  52.61    0        270.71
TGACAGAAGAGAGTGAGCACA   6               9       0       0       3       57.87    0        0        42.74
TGACAGAAGATAGAGAGCAC    4               2       0       0       11      12.86    0        0        156.72
TTGACAGAAGAAAGAGAGCA    1               0       0       0       1       0        0        0        14.25
TTGACAGAAGAAAGAGAGCAC   1               0       0       0       6       0        0        0        85.49
TTGACAGAAGAGAGTGAGC     1               0       0       0       1       0        0        0        14.25
TTGACAGAAGAGAGTGAGCA    1               1       0       0       1       6.43     0        0        14.25
TTGACAGAAGAGAGTGAGCAC   1               28      5       2       26      180.04   65.77    138.96   370.44
TTGACAGAAGATAGAGAGCA    3               0       0       0       3       0        0        0        42.74
TTGACAGAAGATAGAGAGCAC   3               73      4       12      203     469.39   52.61    833.74   2892.27
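
To show how the new per-sequence layout helps with picking the most abundant sequence, here is a minimal Python sketch that reads a few rows copied from the NEW output above and finds the top sequence per sample. The column names and the functions `parse_rows` and `most_abundant` are hypothetical, and the sketch assumes whitespace-separated columns in the order given by the header line.

```python
# Column order taken from the header line of the NEW output; names are illustrative.
COLUMNS = ["genome_matches",
           "S1_raw", "S2_raw", "S3_raw", "S4_raw",
           "S1_norm", "S2_norm", "S3_norm", "S4_norm"]

# Three rows copied from the mir156 example above.
ROWS = """\
TGACAGAAGAGAGTGAGCAC 6 236 4 0 19 1517.49 52.61 0 270.71
TTGACAGAAGAGAGTGAGCAC 1 28 5 2 26 180.04 65.77 138.96 370.44
TTGACAGAAGATAGAGAGCAC 3 73 4 12 203 469.39 52.61 833.74 2892.27
"""


def parse_rows(text):
    """Turn whitespace-separated rows into a list of dicts."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        values = line.split()
        record = {"sequence": values[0]}
        for name, value in zip(COLUMNS, values[1:]):
            record[name] = float(value)
        records.append(record)
    return records


def most_abundant(records, column):
    """Return the sequence with the highest value in the given column."""
    return max(records, key=lambda record: record[column])["sequence"]


records = parse_rows(ROWS)
top_s1 = most_abundant(records, "S1_norm")  # TGACAGAAGAGAGTGAGCAC
top_s4 = most_abundant(records, "S4_norm")  # TTGACAGAAGATAGAGAGCAC
```

In these rows the most abundant sequence differs between samples S1 and S4, which is the kind of difference the old single-line output made hard to see.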

SiLoCo

  • Fixed a bug that was preventing users from being able to select a min_length parameter when running from the command line
  • Modified the output table to make it clearer to read (chromosome, start and stop are each given a separate column)
  • Fixed a bug where CLI runs were not flushing the output file correctly

miRCat

  • Added a new parameter that allows user control over how many heavyweight processing threads miRCat is allowed to generate. For example, this lets users prevent miRCat from consuming all of the CPU resources on a shared system
  • Reorganised the procedure for creating thread pools within miRCat in an attempt to reduce the zombie processes that are created when the program suffers a severe crash
  • Fixed a bug in the export miRNA to FASTA function that was giving incorrect data as the original abundance of the sequence in that file
  • The export hairpins function now outputs the precursor sequences formatted as FASTA. The header contains mature miRNA, miRNA*, chromosome header, precursor start, precursor end. All data is underscore separated

TASI

  • Added new functionality to output single (or multiple) loci to FASTA.
    • Highlight each row you wish to output by holding ctrl/option or shift. Then right click on the row and select the desired control
    • A single locus will output its entire sequence as FASTA that can be blasted. The FASTA header will contain the chromosome and the start and stop positions
  • Added new functionality to output individual small RNA (phased and unphased) from single or multiple loci
    • Highlight each row you wish to output by holding ctrl/option or shift. Then right click on the row and select the desired control
    • The FASTA header of each individual small RNA contains: chromosome, start and stop positions, phased or unphased, abundance and strand
  • Entire TASI result sets can be output to FASTA as individual small RNA sequences

VisSR

  • Fixed a bug that was stopping various tools from producing a render of the data if the FASTA header of the reference genome file contained spaces (affecting SiLoCo and TASI mainly)

GENERAL

  • Mac OS X users will now find there is a dock icon for the Workbench rather than the standard Java icon
  • A 64-bit version of the patman binary has been included (for Linux only) to help alleviate some of the problems faced during initial install
Comments

  • UTKARSH RAGHUVANSHI

    Hi,
    I am using Mac OS X 10.5.8.
    I would like to know how to install the UEA Small RNA Workbench 3.01 alpha version on my system. I want to perform adapter trimming, so can anyone suggest something? I didn’t find any executable files.
    I tried, but without success.

    • Hi,

      The executable files have a .jar extension on them. I have not built (yet…) a full OSX package for the workbench but to use the tool you can simply double click on the sRNAWorkbenchStartup.jar file.

      Alternatively you can navigate to the folder in the terminal and type java -jar sRNAWorkbenchStartup.jar.

      Let me know if this helps,
      Cheers,
      Matt

  • Brandon

    “A 64bit version of the patman binary has been included (for LINUX only) to help alleviate some of the problems faced during initial install”

    Does this mean all the other software is 32bit? I’ve tried to run the software on 64bit Windows 7. The Java screen comes up and shows it is loading the various programs, but then it just closes.

    Thanks. I hope this is something silly on my end, because this program would be perfectly suited for the work in our lab.

    -Brandon

    • Hi Brandon,

      Sorry for the delay in replying. I work mostly alone on this project and I am currently on paternity leave (so there may be a further delay until I return in two weeks).

      I just thought I would quickly mail you and mention that this is most likely a small Java setup problem. The entire program apart from the Patman binary is 64-bit, so there are no limits on RAM. The reason I recompiled a 64-bit version of Patman is just to fix a Linux setup issue that some people had. This will not affect Windows users, and you should ensure that you have a 64-bit version of Java installed on your Windows machine.

      Could you please run the sRNAWorkbenchStartup.jar file from your command line instead of double clicking it? Please let me know if you need further instructions on how to do this.

      Please let me know if this helps 🙂

      Best wishes,
      Matt

  • thirumaran

    Does this tool support species for which an annotated genome sequence is not available, and small RNA sequences obtained by isolation, cloning and sequencing (not by Next Gen Sequencing)?

    • Please see comment on home page. I will look into this comment when I return 🙂