When small RNAs are sequenced, the minimum read length of the sequencing device often exceeds the length of the small RNA. Depending on the device, this results in sequenced reads with adaptor sequences at one or both ends of the read. The Adaptor Removal tool removes these adaptor sequences, making sRNA data ready for analysis and processing by other tools.
The tool quickly and efficiently processes high-throughput sequencing data in FastQ or FastA format and produces a FastA file containing the trimmed reads with redundancy removed. It processes the input file as follows:
- Optionally trim the 5′ adaptor from the beginning of each read. If a 5′ adaptor is specified, reads that do not contain it are discarded.
- Trim the 3′ adaptor from the end of each read. Reads that do not contain a 3′ adaptor are discarded.
- Discard all trimmed sequences that fall outside a user-specified length range.
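The three steps above, together with the redundancy removal that produces the FastA output, can be sketched in Python as follows. This is a minimal illustration, not the tool's own implementation: the function names, the use of the leftmost adaptor occurrence, and the `>sequence(abundance)` FastA header are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Optional


def trim_read(read: str, adaptor_3p: str, adaptor_5p: Optional[str] = None) -> Optional[str]:
    """Return the sRNA insert, or None if the read is discarded.

    Assumption: exact matching against the leftmost occurrence of each
    adaptor; the real tool may choose occurrences differently.
    """
    if adaptor_5p:
        pos = read.find(adaptor_5p)
        if pos == -1:
            return None  # 5' adaptor specified but absent: discard read
        read = read[pos + len(adaptor_5p):]  # keep bases after the 5' adaptor
    pos = read.find(adaptor_3p)
    if pos == -1:
        return None  # no 3' adaptor: discard read
    return read[:pos]  # keep bases before the 3' adaptor


def remove_adaptors(reads: Iterable[str], adaptor_3p: str, min_len: int, max_len: int,
                    adaptor_5p: Optional[str] = None) -> Counter:
    """Trim every read, apply the length filter, and collapse identical inserts."""
    counts: Counter = Counter()
    for read in reads:
        insert = trim_read(read, adaptor_3p, adaptor_5p)
        if insert is not None and min_len <= len(insert) <= max_len:
            counts[insert] += 1  # redundancy removal: one entry per distinct sequence
    return counts


def to_fasta(counts: Counter) -> str:
    """Hypothetical FastA layout: header ">sequence(abundance)", most abundant first."""
    return "".join(f">{seq}({n})\n{seq}\n" for seq, n in counts.most_common())
```

Keying the `Counter` on the trimmed sequence is what removes redundancy: each distinct insert is written once, with its abundance kept in the header rather than as repeated records.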
5′ adaptor trimming is an optional step because some sequencing devices automatically trim the 5′ adaptors from sequenced data. For example, Solexa/Illumina reads start at the first base of the sRNA and contain only the 3′ adaptor, whereas 454 datasets contain both the 5′ and the 3′ adaptors, as shown in the diagram below.
The adaptor matching process looks for an exact match to the adaptor sequence in each read, so a read whose adaptor contains a mismatch will not be trimmed. In addition, it is common, particularly in reads from Illumina/Solexa devices, for the adaptor to be truncated in the raw read. For these reasons, it is preferable to match using a truncated version of the adaptor sequence; in practice, the first 8 nt of the 3′ adaptor and/or the last 8 nt of the 5′ adaptor are normally sufficient. This behaviour is easily controlled via the GUI (as shown below), which lets users specify the full adaptor sequence and then enter the number of nucleotides to use in the matching algorithm. In addition, a set of commonly used adaptor sequences is available from drop-down menus, saving time when processing datasets that use common adaptors.
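The effect of matching on a truncated adaptor can be seen in a short Python sketch. It assumes exact, leftmost matching as described above; the helper names and the `n` parameter are hypothetical stand-ins for the GUI's "number of nucleotides" setting, and the adaptor strings in the usage note are illustrative rather than a specific vendor's kit.

```python
from typing import Optional


def key_3p(adaptor: str, n: int = 8) -> str:
    """First n nt of the 3' adaptor, used as the exact-match search string."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return adaptor[:n]


def key_5p(adaptor: str, n: int = 8) -> str:
    """Last n nt of the 5' adaptor (n < 1 is rejected: adaptor[-0:] is the whole string)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return adaptor[-n:]


def trim_3p(read: str, adaptor: str, n: int = 8) -> Optional[str]:
    """Keep the bases before the first exact match of the truncated 3' adaptor."""
    pos = read.find(key_3p(adaptor, n))
    return None if pos == -1 else read[:pos]


def trim_5p(read: str, adaptor: str, n: int = 8) -> Optional[str]:
    """Keep the bases after the first exact match of the truncated 5' adaptor."""
    key = key_5p(adaptor, n)
    pos = read.find(key)
    return None if pos == -1 else read[pos + len(key):]
```

For example, with a full 3′ adaptor `TCGTATGCCGTCTTCTGCTTG`, a read ending in only `TCGTATGCCGTC` (adaptor cut short) or in a copy with a mismatch at the ninth base does not contain the full adaptor, yet both are trimmed correctly because only `TCGTATGC` must match exactly. A mismatch within those first 8 nt would still cause the read to be missed.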
After processing has completed, the user can view a table of the job's execution statistics and a table of the length distribution of the trimmed reads in the results panel below the input panel, as shown in the figure below. This information is also written to file.
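A length-distribution table like the one described above can be derived from the non-redundant trimmed reads. The Python sketch below assumes the table reports, for each length, both the total (redundant) read count and the number of distinct sequences; that column choice and the tab-separated file layout are hypothetical and may not match the tool's actual output.

```python
from typing import Dict, Mapping, Tuple


def length_distribution(counts: Mapping[str, int]) -> Dict[int, Tuple[int, int]]:
    """Map each insert length to (redundant reads, distinct sequences)."""
    dist: Dict[int, Tuple[int, int]] = {}
    for seq, n in counts.items():
        reads, distinct = dist.get(len(seq), (0, 0))
        dist[len(seq)] = (reads + n, distinct + 1)
    return dict(sorted(dist.items()))  # ascending by length


def to_tsv(dist: Mapping[int, Tuple[int, int]]) -> str:
    """Hypothetical tab-separated layout for the length-distribution file."""
    rows = ["length\tredundant\tnon_redundant"]
    rows += [f"{length}\t{r}\t{nr}" for length, (r, nr) in dist.items()]
    return "\n".join(rows) + "\n"
```

Here `counts` maps each distinct trimmed sequence to its abundance, so the redundant column sums abundances while the non-redundant column counts sequences.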