This tool infers the location of significant biological units, known as sRNA loci, by combining genomic location with the analysis of other information such as variation in expression levels (expression pattern) and size class distribution. In the CoLIde tool we define a locus as a union of regions that share the same pattern and are located in close proximity on the genome. The biological relevance of each locus, detected through the analysis of its size class distribution, is presented with the results.
This tool can be used on an ordered (e.g. time-dependent) or unordered (e.g. organ, mutant) series of samples, both with and without biological/technical replicates. The tool reliably identifies known types of loci and shows improved performance on sequencing data from both plants (e.g. A. thaliana, S. lycopersicum) and animals (e.g. D. melanogaster) when compared to existing locus detection techniques.
The tool first requires the number of samples that form the experiment.
It then requires the input files for the experiment and for each sample:
- Genome File: The location of the genome file in FASTA format.
- Sample Names: The locations of the sRNA sample files and their optional replicates.
Input files are entered using the box displayed below:
Each sample can be modified (i.e. have files added and removed) individually by selecting the desired sample number from the table below:
Series Type Parameters
- Ordered Series: (select this option if order is important to the experiment e.g. time series)
- Unordered Series: (select this option if order is not important to the experiment e.g. organ series)
The confidence intervals (CIs) are controlled using the following parameters, which set the percentage of replicated measurements to be included in each CI.
Non Replicate Data – Confidence Interval Control
- Percentage CI: The percentage of the normalised expression value that is added to and subtracted from that value to form the CI
Replicate Data – Confidence Interval Control
- Min Max: CI spans the minimum to the maximum normalised expression value of the replicates (100%)
- +-SD: CI is mean ± 1 standard deviation (about 68% for normally distributed data)
- +-r(2)SD: CI is mean ± the standard deviation divided by the square root of 2 (about 50%)
- +-2SD: CI is mean ± 2 × standard deviation (about 95%)
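The CI options above can be sketched in a few lines of Python. This is an illustration only, not the Workbench's own code: the function names and the method keys are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def percentage_ci(value, percent):
    """CI for non-replicated data: value +/- percent of value."""
    delta = value * percent / 100.0
    return value - delta, value + delta


def replicate_ci(values, method="sd"):
    """CI for replicated data (needs at least two replicates for the SD options).

    method is one of the hypothetical keys:
    "minmax", "sd", "sd_over_sqrt2", "2sd".
    """
    if method == "minmax":
        return min(values), max(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = stdev(values)
    factor = {"sd": 1.0, "sd_over_sqrt2": 1.0 / sqrt(2), "2sd": 2.0}[method]
    centre = mean(values)
    return centre - factor * sd, centre + factor * sd


replicates = [10, 12, 14]  # normalised expression of three replicates
for name in ("minmax", "sd", "sd_over_sqrt2", "2sd"):
    print(name, replicate_ci(replicates, name))
```

Wider intervals (Min Max, +-2SD) make it harder to call a change between samples; narrower ones make the pattern more sensitive to small differences.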
Percentage Overlap: The minimum percentage by which the confidence intervals of two samples must overlap for the expression between them to be considered a straight (unchanged) pattern
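As a rough illustration of how an overlap threshold can turn a series of CIs into an expression pattern, the sketch below compares consecutive samples. Several details are assumptions rather than documented behaviour: the overlap is measured relative to the shorter of the two intervals, consecutive samples are compared in order, and the letters S (straight), U (up) and D (down) are illustrative labels.

```python
def overlap_percentage(ci_a, ci_b):
    """Overlap of two (low, high) intervals as a % of the shorter interval."""
    low = max(ci_a[0], ci_b[0])
    high = min(ci_a[1], ci_b[1])
    shorter = min(ci_a[1] - ci_a[0], ci_b[1] - ci_b[0])
    if shorter <= 0:
        # Zero-width interval: treat it as fully overlapping if it lies inside the other.
        return 100.0 if high >= low else 0.0
    return 100.0 * max(0.0, high - low) / shorter


def classify_step(ci_a, ci_b, min_overlap=50.0):
    """Label the change from one sample to the next."""
    if overlap_percentage(ci_a, ci_b) >= min_overlap:
        return "S"
    mid_a = (ci_a[0] + ci_a[1]) / 2.0
    mid_b = (ci_b[0] + ci_b[1]) / 2.0
    return "U" if mid_b > mid_a else "D"


def expression_pattern(cis, min_overlap=50.0):
    """Join the labels of all consecutive sample pairs into one string."""
    return "".join(classify_step(a, b, min_overlap) for a, b in zip(cis, cis[1:]))


cis = [(90, 110), (95, 115), (200, 240), (100, 120)]
print(expression_pattern(cis))  # prints "SUD"
```

Raising the threshold makes the straight label harder to obtain, so more steps are reported as up or down.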
The results are presented in a table as shown in the image below.
The header of each column contains a description of the data and the name of the sample file.
Locus data is shown in a table with the following columns:
- ID: Split by chromosome/scaffold: each locus is numbered per chromosome.
- Start: Start coordinate for locus
- End: End coordinate for locus
- Length: Locus length
- P-Val: The probability value for the locus as calculated from the chi-square statistic
- Sample 1-n: The expression series for this locus
- Chromosome: The chromosome this locus resides on
- Differential Expression: The absolute differential expression for this locus
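This page does not spell out how the chi-square statistic behind the P-Val column is computed. One plausible reading, given that loci are assessed by their size class distribution, is a goodness-of-fit test of the observed size class counts against a uniform distribution. The sketch below follows that assumption; the function name and the example counts are hypothetical.

```python
def chi_square_uniform(size_class_counts):
    """Chi-square statistic of observed size class counts against a uniform split.

    size_class_counts maps sRNA length (nt) to the number of reads of that length.
    Returns (statistic, degrees_of_freedom).
    """
    counts = list(size_class_counts.values())
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        raise ValueError("need at least two size classes and one read")
    # Assumption: under the null hypothesis every size class is equally likely.
    expected = total / len(counts)
    statistic = sum((observed - expected) ** 2 / expected for observed in counts)
    return statistic, len(counts) - 1


# Hypothetical locus dominated by 24 nt sRNAs.
stat, dof = chi_square_uniform({21: 10, 22: 10, 23: 10, 24: 70})
print(stat, dof)  # prints "108.0 3"
```

The statistic and degrees of freedom can then be converted to a probability with the chi-square survival function, for example `scipy.stats.chi2.sf(stat, dof)` if SciPy is available. A large statistic (small P-Val) indicates a locus with a clear size class preference.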
The context menu operates on the currently selected result line.
- Export individual sequences: Export the sequences that form the locus to FASTA
- Output entire locus: Export the entire locus sequence from the genome to FASTA
- Show locus in genome view: display the selected locus using standard arrow view in VisSR
- Show locus in aggregate genome view: display the selected locus as a compressed view in VisSR
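Exporting an entire locus amounts to slicing the genome sequence between the locus coordinates. The sketch below shows the idea outside the Workbench. It assumes 1-based, inclusive Start and End coordinates and uses a hypothetical FASTA header format; the helper names are illustrative.

```python
import io


def read_fasta(handle):
    """Read a FASTA file handle into a dict of {sequence name: sequence}."""
    parts = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {key: "".join(chunks) for key, chunks in parts.items()}


def locus_to_fasta(genome, chromosome, start, end):
    """Return the locus sequence as a FASTA record (1-based, inclusive coordinates)."""
    sequence = genome[chromosome][start - 1:end]
    return f">{chromosome}:{start}-{end}\n{sequence}\n"


genome = read_fasta(io.StringIO(">chr1 example\nACGTACGT\nTTGGCCAA\n"))
print(locus_to_fasta(genome, "chr1", 3, 10), end="")
```

For a real genome, pass an open file instead of the in-memory example, e.g. `read_fasta(open("genome.fa"))`, where the file name is a placeholder.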
Users have two options when viewing loci predicted in CoLIde: the standard arrow view, as shown below:
or the aggregated view, as shown below (same data and location):
The aggregated view groups all small RNAs in the locus into windows of 100 nt and generates a histogram showing the total abundance of the small RNAs within each window.
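The windowing behind the aggregated view can be approximated as follows. This is a sketch under stated assumptions, not the VisSR implementation: each small RNA is assigned to a window by its start coordinate, windows are counted from the locus start, and the function name and input layout are hypothetical.

```python
def aggregate_abundance(reads, locus_start, locus_end, window=100):
    """Sum sRNA abundance per fixed-size window across a locus.

    reads is an iterable of (start_coordinate, abundance) pairs.
    Returns a list with one total per window, ordered from the locus start.
    """
    n_windows = (locus_end - locus_start) // window + 1
    histogram = [0] * n_windows
    for start, abundance in reads:
        if locus_start <= start <= locus_end:
            # Assumption: a read belongs to the window containing its start position.
            histogram[(start - locus_start) // window] += abundance
    return histogram


reads = [(1000, 5), (1050, 3), (1100, 2), (1299, 7)]
print(aggregate_abundance(reads, 1000, 1299))  # prints "[8, 2, 7]"
```

Each list entry corresponds to one bar of the histogram.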