ta-si prediction

Available from Version: 2.0

The trans-acting siRNA (ta-siRNA) prediction tool identifies phased 21 nt sRNAs characteristic of ta-siRNA loci, using a supplied sRNA dataset and an associated genome.

This tool requires an sRNA dataset and a genome file from a plant species as input (to the best of our knowledge, ta-siRNAs have not yet been found in animals). The paths to the sample and genome files are entered at the top of the main dialog box. The tool also requires a p-value cutoff (which controls its sensitivity) and a minimum sRNA abundance; both text boxes are pre-filled with default values.

When all input parameters are configured, the user can start the ta-si prediction tool by clicking the “Start” button on the main dialog or by selecting “Start” from the Run menu. A running job can be cancelled at any time with the “Cancel” button or menu item. Note: cancellation may not be instant, because execution must reach a safe point in the code before the run stops cleanly.

The first step aligns the sRNAs to the genome; sRNAs that do not match the genome are discarded. The tool then applies the algorithm described by Chen et al. (2007), which uses the hypergeometric distribution to calculate the probability that the observed phasing is significant (see figure below). Our implementation differs slightly in that it takes the length of the input sRNA sequences into account, using only 21 nt sRNAs in the phasing analysis.
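The two stages above (discarding unusable reads, then testing phasing) can be sketched as follows. The p-value is a hypergeometric tail probability in the form commonly cited for Chen et al.: a window holds register × m positions, m of which are in phase; n distinct 21 nt sRNAs fall in the window and k of them are in phase. This is a minimal sketch, not the Workbench's code: the function names, the read tuple layout and the default m = 11 (a 231 nt window of 11 cycles, counted on one strand) are assumptions.

```python
from math import comb


def filter_srnas(reads, register=21, min_abundance=1):
    """Keep reads that matched the genome, have the register length and enough abundance.

    `reads` is a hypothetical iterable of (sequence, abundance, matches_genome) tuples.
    """
    return [
        (seq, abundance)
        for seq, abundance, matches_genome in reads
        if matches_genome and len(seq) == register and abundance >= min_abundance
    ]


def phasing_p_value(n, k, m=11, register=21):
    """Hypergeometric tail P(X >= k).

    n distinct register-length sRNAs fall among register * m window positions,
    of which m are in phase; k is the number observed in phase.
    """
    total = register * m
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of window positions")
    out_of_phase = total - m  # equals (register - 1) * m
    hits = sum(
        comb(m, x) * comb(out_of_phase, n - x)
        for x in range(max(k, 0), min(m, n) + 1)
    )
    return hits / comb(total, n)
```

For example, `phasing_p_value(n=8, k=6)` estimates how likely it is that 6 or more of 8 distinct 21 nt sRNAs in a 231 nt window land in phase by chance; such a value would then be compared against the p-value cutoff set in the dialog.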

After the run has completed, the results appear in the scrollable table beneath the input boxes (as shown in the screenshot below). Each predicted TAS gene is displayed as a row in the table, and the columns represent:

  • The chromosome from which the predicted TAS originated.
  • The start position of the predicted TAS gene in the chromosome.
  • The end position of the predicted TAS gene in the chromosome.
  • The number of distinct sRNAs that align to the region described in the previous columns.
  • The number of distinct phased sRNAs detected in that region.
  • The p-value calculated by the algorithm described by Chen et al.

Each predicted TAS gene can be copied to the clipboard by opening the context menu for its row in the table and selecting “Copy to Clipboard”. In addition, the phased sRNAs for each TAS gene can be displayed in a separate pop-up dialog by selecting “Show phased sRNAs” from the same context menu.

All TAS loci can be visualised in VisSR using the “Show in VisSR” item in the View menu. Alternatively, each locus can be sent to VisSR individually by opening the context menu on that locus and selecting “Show in VisSR”. An example of a TAS locus in VisSR is shown below.

Once results are available, the tool can export two types of file: a results table of all predicted TAS loci in .csv format, and a list of the phased sRNAs for each TAS locus in .txt format. The .csv file can be opened in any spreadsheet program.
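To post-process the exported table in a script, the .csv file can be read into typed records. This minimal sketch assumes the file has one header row followed by columns in the same order as the on-screen table (chromosome, start, end, sRNA count, phased sRNA count, p-value); the real header names and column order are not documented here, so check them against an exported file. `TasLocus` and `load_tas_csv` are hypothetical names.

```python
import csv
from dataclasses import dataclass


@dataclass
class TasLocus:
    chromosome: str
    start: int
    end: int
    srna_count: int
    phased_count: int
    p_value: float


def load_tas_csv(lines):
    """Parse exported results from an open file or any iterable of text lines.

    Assumes one header row and the column order of the on-screen table.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    loci = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        chrom, start, end, n_srna, n_phased, p = row[:6]
        loci.append(TasLocus(chrom, int(start), int(end), int(n_srna), int(n_phased), float(p)))
    return loci
```

Typical use would be `with open("tas_results.csv", newline="") as f: loci = load_tas_csv(f)` (the file name is illustrative), followed by `sorted(loci, key=lambda l: l.p_value)` to rank loci by significance.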

  • Marek D. Koter

    Hi Matt,

    The first time I used your phasiRNA prediction tool, it worked GREAT. But then I chose another data set and got the “Bad formatting in input file. Line with problem:” message. Then I returned to the previous data set, which had worked, but I got the same error message :-((( What is wrong???
    Thank you for your help!
    Marek D. Koter

  • Kavita Goswami

    Hi, I need some help. I want to predict tasiRNAs using ta-si prediction, but it's not working properly: it's showing a memory problem, even though I have plenty of space on the drive where it is saved. What should I do now? Please help me solve this problem.

  • Robert King

    From the command line, what are the variable names for the parameter file? I couldn’t get them from the Workbench. Are the names below correct?

    min_abundance=10
    phasing_register=21
    p-value_threshold=0.0001

    • The sRNA Workbench

      Hi Robert,

      There is an example of each of the parameter files in the directory

      data/default_params

      A file containing the parameters should be supplied rather than each individual parameter (many of the tools have quite long lists of parameters, so it made sense to do it this way instead).

      However, the phasing register cannot currently be modified from the CLI, only from the GUI (it stays at the default of 21).
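Because the command-line tools take a parameter file rather than individual flags, it can help to read or generate such a file from a script. The sketch below assumes a simple `key=value` layout with one entry per line, as in Robert's example, and treats lines starting with `#` as comments; both are assumptions, and the files in data/default_params remain the authority on the real layout and key names. `read_params` and `write_params` are hypothetical helpers.

```python
def read_params(lines):
    """Parse key=value lines into a dict, skipping blanks and '#' comments (assumed format)."""
    params = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got: {raw!r}")
        params[key.strip()] = value.strip()
    return params


def write_params(params):
    """Serialise a dict back to key=value lines, preserving insertion order."""
    return "\n".join(f"{key}={value}" for key, value in params.items()) + "\n"
```

A sensible workflow is to copy a default file, load it with `read_params`, change only the values you need, and write it back, so that key names always come from the shipped example rather than from memory.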

      Cheers,
      Matt

  • Mayur Divate

    Hi Matthew,

    I have recently started working on ta-siRNA prediction and came across your tool.

    What I understand is that you tile the aligned genomic loci into 21 nt segments (overlapping or non-overlapping?), then look for sRNAs falling exactly within a tile (with the allowed mismatches and minimum abundance specified), which are called phased, while sRNAs falling across two adjacent tiles are called un-phased. If this is the case, then in the demo visualization above, both phased and un-phased sRNAs appear to align to exactly the same coordinates.

    Can you please help us understand phased and un-phased sRNAs? And which of them should be considered true ta-siRNA candidates?

    Thanking you in anticipation !!!

    • The sRNA Workbench

      Hi Mayur,

      Sorry for the delay in response I have been away on holiday.

      The phased sequences themselves will not overlap. The phased sequences are those that fall into phase on their respective strand, whereas unphased sequences are the same size as the phased ones (depending on the phase register selected) but align perhaps one or two nt outside of phase.

      We report those in case they are useful, or in case the sequences are so close that an error further upstream might be responsible for them being out of phase.
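The phased/unphased distinction described above can be sketched as a check on each sRNA's start coordinate relative to the start of the phasing register on the same strand. This is an illustration only: `classify_phase`, the `tolerance` of 2 nt for near-phase reads, and measuring offsets from a single register start are assumptions, not the Workbench's code.

```python
def classify_phase(start, register_start, length, register=21, tolerance=2):
    """Label an aligned sRNA as 'phased', 'unphased' (near phase) or 'other'.

    Coordinates are assumed to be on the same strand as register_start.
    """
    if length != register:
        return "other"  # only reads of the register length are considered
    offset = (start - register_start) % register
    # Distance to the nearest phase position, looking both up- and downstream.
    distance = min(offset, register - offset)
    if distance == 0:
        return "phased"
    if distance <= tolerance:
        return "unphased"
    return "other"
```

With a register starting at position 100, a 21 nt read at 142 (two cycles downstream) is phased, while reads at 140 or 143 are one or two nt out of phase and would be reported as unphased.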

      Further information on the algorithm can be found here:

      http://aao.sinica.edu.tw/download/publication_list/en/96.pdf

      Determining a good candidate for a TAS locus is a tricky one: the Chen algorithm was designed for 454 data and so does not work as well with HTS data. But you can follow these rules:

      * a high number (usually more than 5-6) of putative tasiRNAs with decent, i.e. comparable, abundances. Something like one sequence with an abundance of 1000 and the others with abundances < 10 is not relevant.
      * existence of a miRNA which could cleave and induce the phasing
      * enrichment for 21mers in the given locus
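The first and third rules lend themselves to a simple screen over a locus's aligned reads. This sketch is a rough heuristic with illustrative thresholds that are assumptions, not Workbench values: at least 6 putative tasiRNAs, "comparable" meaning the most abundant is at most 10 times the least abundant, and register-length reads making up at least half of the distinct reads. The miRNA-trigger rule is not checked, because it needs separate target prediction. `looks_like_tas` is a hypothetical name.

```python
def looks_like_tas(phased_abundances, read_lengths, register=21,
                   min_phased=6, max_ratio=10.0, min_register_fraction=0.5):
    """Apply two of the rule-of-thumb checks to one candidate locus.

    phased_abundances: abundances of the putative phased tasiRNAs.
    read_lengths: lengths of all distinct sRNAs aligned to the locus.
    """
    if len(phased_abundances) < min_phased:
        return False
    # Comparable abundances: the strongest read should not dwarf the weakest.
    if max(phased_abundances) > max_ratio * min(phased_abundances):
        return False
    if not read_lengths:
        return False
    # Enrichment for register-length reads (21-mers by default).
    fraction = sum(1 for n in read_lengths if n == register) / len(read_lengths)
    return fraction >= min_register_fraction
```

A locus with one read at abundance 1000 and the rest below 10, as in the example above, fails the ratio check regardless of how many phased reads it has.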

      I hope this helps. I will put up a post about this in our new upcoming information blogs. In addition, I will be developing an entirely new tool for TASI prediction in the near future that should work much better for IL samples with millions of reads.

      Cheers,
      Matt

  • Manon

    Hi Matt,

    I would like to know whether there is any exclusion between different ta-si locus windows of 251 bp. I mean, if a region of my genome where phased small RNAs map is larger than 251 bp, what output will ta-si prediction give me?

    – Several 251 bp windows, each shifted by 21 nucleotides (one per phased small RNA)?

    – Several 251 bp windows, each shifted by 251 bp from the previous one?

    – Or a single 251 bp window somewhere in the locus?

    In other words, is there a minimum distance between two 251 bp ta-si regions?

    Thanks,

    Best regards

    Manon

    • The sRNA Workbench

      Hi Manon,

      We use the algorithm described in Chen et al. (2007), where we have windows of 231 nt + 21 (252 nt = 12 × 21 nt reads for a phase register of 21) and check for phasing in this region.

      So if the region is larger than the window, the Chen algorithm produces several TAS loci separated by 21 nt.
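The behaviour described above can be sketched as sliding a fixed-size window along the region in steps of one phase register, so a long phased region yields a series of overlapping candidate windows 21 nt apart. This is an assumption-laden sketch: `phasing_windows`, 1-based inclusive coordinates and the default window of 231 nt are illustrative choices, and only windows passing the p-value cutoff would be reported as TAS loci.

```python
def phasing_windows(region_start, region_end, window=231, step=21):
    """Yield (start, end) of each candidate window, 1-based inclusive.

    Windows advance by one phase register (step) until the next window
    would run past region_end.
    """
    start = region_start
    while start + window - 1 <= region_end:
        yield start, start + window - 1
        start += step
```

For a phased region spanning positions 1 to 273, this yields three windows, (1, 231), (22, 252) and (43, 273), which matches the "several TAS loci separated by 21 nt" behaviour rather than non-overlapping 251 bp blocks.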

      I hope to replace this system in the near future with something more optimised and accurate (in fact I will be improving the TASI tool quite considerably including determination of potential TAS loci)

      I hope this helps; let me know if you have any further questions or if this is not clear.

      Cheers,
      Matt

  • Vincent

    Hi!
    I don’t understand why, in my results, the locus length is always 251 bp. According to the Chen et al. algorithm, shouldn’t it be 231 bp?
    Best,

    Vincent

    • The sRNA Workbench

      Hi Vincent,

      Sorry for the delay in replying. I have had a look through the code: although the region size is 231 bp, we allow a sequence to be included if a portion of it lies within that region, hence the overall size of 251 bp. Would it be worth adding a parameter to restrict the locus to a smaller window?
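The 251 bp figure follows from window arithmetic: if a 21 nt read may start on the last base of a 231 bp window, it ends 20 nt beyond it, giving 231 + 21 - 1 = 251 bp. A minimal sketch, assuming 1-based inclusive coordinates and that inclusion is judged by where a read starts (the reply above does not say which end is used); `locus_bounds` and `locus_length` are hypothetical names.

```python
def locus_bounds(window_start, window=231, read_length=21):
    """1-based inclusive bounds of a locus whose last read may start on the window's final base."""
    last_read_start = window_start + window - 1
    return window_start, last_read_start + read_length - 1


def locus_length(bounds):
    """Length in bp of a 1-based inclusive (start, end) interval."""
    start, end = bounds
    return end - start + 1
```

With the defaults, a window starting at position 1 gives bounds (1, 251), i.e. a 251 bp locus; with a 24 nt register the same logic would give 254 bp.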

      Cheers,
      Matt

      • Vincent

        Thanks for clarifying. Now my concern is finding larger regions, such as the 1 kb ta-si loci described in many papers. If such regions occur in my genome, would ta-si prediction be able to find many 251 bp ta-si loci in a row?

        • The sRNA Workbench

          Just to let you know I am away on leave at the moment. I will attend to your comment as soon as I return in December!

          Best wishes,
          Matt

          • The sRNA Workbench

            Hi Vincent,

            I have returned from leave now. Currently the software has a hard limit of 231 bp for a TAS locus, but a sequence that lies partly within the locus can be included, which makes a longer prediction possible. However, I can add the locus size as a parameter if you think this would help. Could you point me towards any publications describing such loci for my own reading?

            Best wishes,
            Matt

  • rod

    Hi Matt,
    A colleague has set up the tasi tool for me to use. Our data set is small (one small RNA sample and one reference transcriptome). We would like to run a positive control to make sure everything is working correctly. Do you have a couple of sequences you use for this purpose?
    Thanks,
    Rod

    • The sRNA Workbench

      Hi, sorry for the delay in response I have been away at the start of the week.

      Have you tried with the tutorial data? You can download it at the same place as the workbench:

      http://sourceforge.net/projects/srnaworkbench/files/TutorialData/TutorialDataV2.zip/download

      Use the sRNA data in FASTA/SRNAOME

      GSM154336_carrington_col0_flower_nr.fa

      and the file in the GENOME folder

      If you are getting no results with your data you could change the p-value parameter and see if that helps?

      Cheers,
      Matt

  • Ania

    Hi 🙂
    I have a question concerning ta-si prediction. Since I have quite large data sets, I’m using the Perl script for my analysis. It works fine, but there are only two output formats – is it possible to export individual tasiRNAs to FASTA, as in the Java-based Workbench?
    Thank you very much for a quick response!!
    Best regards,
    Ania

    • The sRNA Workbench

      Hi Ania,

      I am afraid the Perl scripts do not include this functionality, and I no longer update them. Are you having a memory issue using the Workbench?

      Cheers,
      Matt

  • david horner

    Hi Guys,

    Just a quick comment on the tasi hunter … can you set up the option to look for 24nt phasing?

    many thanks

    David

    • The sRNA Workbench

      Hi David,

      Yes, I can't see any issue with adding a user-configurable parameter for it. I will add it to the list for the stable Version 3.0 release 🙂

      Thanks for the suggestion!

      Best wishes,
      Matt

A suite of tools for analysing micro RNA and other small RNA data from High-Throughput Sequencing devices