SiLoCo

Available from Version: 2.0

This tool predicts sRNA loci using the method described in [1] and [2]. It also enables the user to compare the expression profiles of sRNA loci between different samples. To determine the relative positions of sRNAs, the reads are mapped to the reference genome using PatMaN [3]. Only full-length, perfect matches are accepted as hits. The genome-matching reads are normalised [4] and weighted by repetitiveness. The normalisation method divides the hit count of each distinct read by the total number of redundant reads that match the genome, and the normalised count is given in “hits per 1 million matching reads”. Because it is impossible to decide where a sRNA with multiple matches to the genome originated, we correct the normalised read abundance for repetitiveness by dividing it by the number of matches to the genome. The result is a weighted hit count. The method uses the normalised and weighted read abundance and the relative position of sRNAs on the reference genome to predict the sRNA loci. A locus must have a minimum of 3 weighted sRNA hits (this threshold can be adjusted using the min hits parameter) and no gap (absence of sRNA hits) longer than 300nt (this threshold can be adjusted using the sRNA loci distance parameter).
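The weighting and locus-building steps described above can be sketched in Python as follows. This is a minimal illustration, not the SiLoCo implementation: the `Hit` record, the function names, the gap definition (start of the next hit minus the current end of the locus) and the reading of the min hits threshold as a count of hits are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    """One genome match of a distinct read (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    raw_count: int       # abundance of the read in the sample
    genome_matches: int  # number of places the read matches the genome


def weighted_abundance(hit: Hit, total_matching_reads: int) -> float:
    """Normalise to hits per million matching reads, then divide by repetitiveness."""
    per_million = hit.raw_count * 1_000_000 / total_matching_reads
    return per_million / hit.genome_matches


def predict_loci(hits: List[Hit], total_matching_reads: int,
                 min_hits: int = 3, max_gap: int = 300
                 ) -> List[Tuple[str, int, int, float]]:
    """Group hits into loci; return (chrom, start, end, weighted abundance)."""
    loci: List[Tuple[str, int, int, float]] = []
    group: List[Hit] = []
    group_end = 0

    def close_group() -> None:
        # Assumption: min_hits is compared with the number of hits in the group.
        if group and len(group) >= min_hits:
            total = sum(weighted_abundance(h, total_matching_reads) for h in group)
            loci.append((group[0].chrom, group[0].start, group_end, total))

    # Sorting by chromosome and start lets one pass find every gap.
    for hit in sorted(hits, key=lambda h: (h.chrom, h.start)):
        starts_new_group = (not group
                            or hit.chrom != group[0].chrom
                            or hit.start - group_end > max_gap)
        if starts_new_group:
            close_group()
            group = [hit]
            group_end = hit.end
        else:
            group.append(hit)
            group_end = max(group_end, hit.end)
    close_group()  # the final group is not followed by another hit
    return loci
```

Sorting the hits first means a single pass is enough: a new locus starts whenever the chromosome changes or the distance to the previous hits exceeds `max_gap`.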

The datasets must contain sRNA sequence reads in FASTA format, in redundant form, i.e. with one entry for each read. Sequences shorter than 18nt (minsize parameter) or longer than 30nt (maxsize parameter) will be removed.
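As an illustration of the input requirements above, the following Python sketch collapses redundant FASTA records into per-sequence abundances and drops reads outside the size limits. The function name `count_reads` and its arguments are hypothetical, and the defaults simply mirror the 18nt and 30nt limits quoted in the text; it is not how the tool itself reads its input.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_reads(fasta_lines: Iterable[str], min_size: int = 18,
                max_size: int = 30) -> Dict[str, int]:
    """Collapse a redundant FASTA stream into {sequence: abundance}."""
    counts: Counter = Counter()
    sequence_parts: List[str] = []

    def flush() -> None:
        # Join the lines of the current record and apply the size filter.
        sequence = "".join(sequence_parts).upper()
        if sequence and min_size <= len(sequence) <= max_size:
            counts[sequence] += 1
        sequence_parts.clear()

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()              # a header ends the previous record
        elif line:
            sequence_parts.append(line)
    flush()                      # the last record has no following header
    return dict(counts)
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example `count_reads(open("sample.fa"))` with a hypothetical file name.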

Required Parameters:

  • Genome File: The location of the genome file in FASTA format.
  • Sample names: The locations of the sRNA samples.

Input files are entered using the box displayed below:

Input dialogue for SiLoCo. Enter your databases for each sample and your genome file

Optional Parameters

  • sRNA loci distance: The maximum gap length in a locus (default max gap = 300).
  • max size: The maximum length of a sRNA (18 ≤ maxsize ≤ 35, default maxsize = 25).
  • min size: The minimum length of a sRNA (18 ≤ minsize ≤ 35, default minsize = 25).
  • min sRNA locus size: The minimum number of sRNAs in a locus (1 ≤ min hits, default min hits = 3).
  • max genome hits: The maximum number of times a sRNA can hit the genome (18 ≤ minsize ≤ 35, default minsize = 18).

The results are presented in a table, as shown in the image below.

An example of the results from a two sample SiLoCo run

The header for each column contains a description of the data and the name of the sample file. Locus data is shown in a table with the following columns:

  • Chromosome, start/end position and length: Genomic location and length of the locus in nucleotides. Some incomplete genomes may not yet be assembled into chromosomes, and the accessions listed here may be scaffolds or BACs instead. The list is initially sorted by chromosome and position.
  • Raw count: Sum of read abundances in samples 1 and 2 that originate from the locus (not corrected for repetitiveness).
  • Weighted count: Sum of raw read abundances divided by the number of matches of each sequence to the genome.
  • Normalised count: Sum of weighted counts divided by the total number of genome-matching reads in each sample, given in “hits per 1 million genome-matching reads”. Normalised counts (abundances) are comparable between samples.
  • Uniquely matching reads (optional): Number of sequence reads in the locus that only have a single match to the genome.
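The count columns listed above can be reproduced for a single locus and sample with a few lines of Python. This is a sketch under stated assumptions rather than the tool's own code: the function name `summarise_locus` and the `(abundance, genome matches)` input layout are hypothetical, and the uniquely matching column is assumed to sum the abundances of reads with exactly one genome match.

```python
from typing import Dict, List, Tuple


def summarise_locus(reads: List[Tuple[int, int]],
                    total_matching_reads: int) -> Dict[str, float]:
    """Compute the count columns for one locus in one sample.

    reads: (raw abundance, number of genome matches) for each distinct
    read that falls inside the locus.
    """
    # Raw count: plain sum of abundances, no correction for repetitiveness.
    raw = sum(abundance for abundance, _ in reads)
    # Weighted count: each abundance is divided by its number of genome matches.
    weighted = sum(abundance / matches for abundance, matches in reads)
    # Normalised count: weighted count per 1 million genome-matching reads.
    normalised = weighted * 1_000_000 / total_matching_reads
    # Assumption: "uniquely matching" sums abundances of single-match reads.
    unique = sum(abundance for abundance, matches in reads if matches == 1)
    return {
        "raw": raw,
        "weighted": weighted,
        "normalised": normalised,
        "unique": unique,
    }
```

Because the normalised value is scaled by the total number of genome-matching reads in the sample, it is the column to use when comparing the same locus across samples of different sequencing depth.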

The context menu operates on the currently selected result line. ‘Show in VisSR’ will display the selected locus in VisSR. An example of a locus shown in VisSR is below:


A locus, predicted in SiLoCo and displayed in VisSR

[1] Attila Molnar, Frank Schwach, David J Studholme, Eva C Thuenemann, and David C Baulcombe. miRNAs control gene expression in the single-cell alga Chlamydomonas reinhardtii. Nature, 447(7148):1126-1129, Jun 2007.

[2] Rebecca A Mosher, Frank Schwach, David Studholme, and David C Baulcombe. PolIVb influences RNA-directed DNA methylation independently of its role in siRNA biogenesis. Proc Natl Acad Sci U S A, 105(8):3145-3150, Feb 2008.

[3] Kay Prüfer, Udo Stenzel, Michael Dannemann, Richard E Green, Michael Lachmann, and Janet Kelso. PatMaN: rapid alignment of short sequences to large databases. Bioinformatics, 24(13):1530-1531, Jul 2008.

[4] Ali Mortazavi, Brian A Williams, Kenneth McCue, Lorian Schaeffer, and Barbara Wold. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods, 5(7):621-628, Jul 2008.

6 comments on “SiLoCo”

  1. Kenlee Nakasugi said:

    Hi Matt,
    I’m using the latest v2.5, and the output from SiLoCo doesn’t appear to have normalized values as described above, just the raw abundance and unique hits and a few other columns. I’m hoping to use the normalized values for some other analysis. Will this field be back?
    Also, the default parameter for min. size is 16 set in the params window – is this intended?
    Some other notes:
    - parameters won’t write to disk even when clicking save
    - when exporting the output to csv, if one wants to over-write a previously saved file, it actually appends to it instead of overwriting. I had a 20Mb file saved initially, and after a second analysis it doubled to 40Mb. The number of lines doubled exactly too.
    Cheers
    Ken

    • Hi Ken,

      Yes the minimum size is set to 16 by default. Do you need to examine smaller sequences?

      I am looking into the file write issues you have reported. Hopefully it should not be too tricky to re-create and fix!

      Thanks for letting me know,
      Matt

  2. I am using a parameter file that contains this

    max_genome_hits=100
    min_abundance=1
    cluster_sentinel=100
    min_length=18
    max_length=26
    min_locus_size=100

    I get this error

    Illegal min_length parameter value. Valid values: 16 <= min_length <= 0.

    But I can not see what the problem is. I have run this from the GUI using these settings and it works fine there. When I try to save the settings from the GUI no file gets written to disk.

    I also can not see how to name the output if I run this from the command line. I ran once without specifying -params and the program ran but no output was written in the current directory.

    When I run this with a large number of samples (24) I get many loci generated with no or very low expression in all samples. Should it be the case that at least one of the samples has a clear expression signal for each locus detected?

    • Hi,

      I have figured out the problem here. It is just a silly bug I have introduced while setting up the program. Clearly <=0 is incorrect.

      I have fixed it and will add it to the change list for the next release. Thanks for pointing it out! Output for SiLoCo is placed into the user/SiLoCoData directory, in a time/date stamped folder; you should find your results there (but you will not be able to modify the command line params until I release the fixed version of the code, I am afraid).

      Also, for your large 24 sample experiment: with SiLoCo, no, you may not find that each locus has a strong expression profile, because the rule-based conditions used to determine loci are likely to produce many false positives (we have found in tomato and A.th data that 1/3 of the predicted loci had a high chance of being real and the other 2/3 could be degradation products).

      For this reason, we have developed a new locus detection tool that will improve detection for your experiment. It is based on a statistical approach to locus detection and, although it will not replace SiLoCo completely, it will undoubtedly provide you with a result set that is far more usable!

      We hope to make this tool available in the next few weeks. Most likely we will have two releases, one with the bug fixes and a few features and then a major release with the new tool.

  3. In the first column of the csv file output there seems to be a formatting issue. I have values like these

    Locus
    scaffold_11989-2047
    scaffold_16151-6168

    I guess there should be a delimiter between the scaffolds name and the start position?

    • hmm that is strange, usually there is a ‘/’ between the chromosome name and start and stop.

      To be honest this format is just a legacy from the original scripts. For the next release I will just put all data into separate columns anyway and this should stop any type of problem like this appearing in the future.
