miRProf

Available from Version: 2.0

This tool determines normalised expression levels of sRNAs matching known miRNAs in miRBase. miRProf can also quickly and easily compare miRNA expression levels across multiple samples.

miRProf requires a set of sRNA samples and a genome file for input. The paths to the files representing the samples and the genome are entered at the top of the main dialog box. In addition, miRProf can be configured in a number of ways, which are described in the following paragraphs.

miRProf filters sRNA sequences in each sample before expression levels are computed. The user can specify criteria such as minimum sRNA size, maximum sRNA size and minimum abundance. In addition, all low-complexity sRNAs (sequences that contain fewer than 3 distinct nucleotides) and sequences containing non-canonical nucleotides (e.g. ‘N’ for an unknown nucleotide) are discarded. Finally, any sRNAs not aligning to the user-specified genome are discarded. Genome filtering is mandatory in order to provide a frame of reference for comparing sRNAs across multiple samples.
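The sequence-level filters described above can be sketched in a few lines of Python. This is only an illustration, not miRProf’s own code: the function names and the default thresholds are hypothetical, the input is assumed to be a mapping from each distinct sequence to its abundance, and the genome-alignment filter is left out.

```python
# Hypothetical sketch of the per-sequence filters; not miRProf's implementation.
CANONICAL = set("ACGTU")  # anything else (e.g. 'N') counts as non-canonical


def passes_filters(seq, abundance, min_len=16, max_len=35, min_abundance=1):
    """Return True if a distinct sRNA survives the size, abundance,
    canonical-nucleotide and low-complexity filters.
    The default thresholds are placeholders, not miRProf's defaults."""
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    if abundance < min_abundance:
        return False
    if any(base not in CANONICAL for base in seq):
        return False
    # Low complexity: fewer than 3 distinct nucleotides.
    if len(set(seq)) < 3:
        return False
    return True


def filter_sample(reads, **thresholds):
    """reads maps each distinct sequence to its abundance in one sample."""
    return {seq: n for seq, n in reads.items()
            if passes_filters(seq, n, **thresholds)}
```

Calling `filter_sample(reads, min_len=18, max_len=25, min_abundance=2)` would apply stricter user-chosen limits to one sample.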

Filtered reads from each sample are then aligned to a (sub)set of miRBase mature sequences. The user has the option to partition the miRBase mature sequences based on whether they originate from plant, animal or virus miRNA precursors. Alternatively, the user can choose to include all sets for alignment. miRBase is constantly updated with new miRNAs reported from the field, so the user can select any version of miRBase for their specific experiments. Data download and management are handled automatically within the tool. All the user need do is type the version they wish to use into the parameters dialog box.

The sRNA to mature miRNA matching process can be controlled using a few parameters:

    • Overhangs allowed

When aligning sRNAs to mature sequences, the alignment tool will fail to report a match if the short read (the sRNA) overhangs the long read (the miRNA). To circumvent this problem, miRNAs are padded with “XXX” at the beginning and end of the sequence. sRNAs that overhang the actual mature sequence will then be treated as having a number of mismatches equal to the number of nucleotides that overhang the miRNA. This behaviour occurs by default, but the user can switch it off by deselecting the “overhangs allowed” check box.

    • Mismatches allowed

The number of allowed mismatches is controlled using the “Mismatches allowed” text box. The default value here is 0, meaning the sRNA must match exactly somewhere within the mature sequence. The maximum number of mismatches allowed is capped at 3 to prevent huge numbers of meaningless results being recorded.

    • Only keep best match

Even with 1, 2, or 3 mismatches allowed, many hits can be returned. These can be minimised by checking the “only keep best match” box, which instructs miRProf to discard all hits to a given miRNA that have more than the best number of mismatches (a smaller number of mismatches is considered “better” than a larger number).
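The three matching parameters can be illustrated with a small Python sketch. It is a naive sliding-window comparison written for clarity, not the aligner the tool actually uses; the function names `align` and `keep_best` are hypothetical, and the cap of 3 mismatches is not enforced here.

```python
PAD = "XXX"  # padding described in the text; 'X' never equals a nucleotide


def align(srna, mirna, max_mismatches=0, overhangs_allowed=True):
    """Return (offset, mismatches) for the best placement of srna on mirna,
    or None if no placement is within max_mismatches.

    offset is relative to the first base of the unpadded miRNA, so a
    negative offset means the sRNA overhangs the start of the miRNA.
    """
    pad = PAD if overhangs_allowed else ""
    target = pad + mirna + pad
    best = None
    for start in range(len(target) - len(srna) + 1):
        window = target[start:start + len(srna)]
        # Each overhanging base faces an 'X' and so counts as one mismatch.
        mismatches = sum(a != b for a, b in zip(srna, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start - len(pad), mismatches)
    return best


def keep_best(hits):
    """hits is a list of (srna, mirna_name, mismatches) tuples.
    For each miRNA, keep only the hits with the fewest mismatches."""
    fewest = {}
    for _, name, mismatches in hits:
        fewest[name] = min(mismatches, fewest.get(name, mismatches))
    return [hit for hit in hits if hit[2] == fewest[hit[1]]]
```

In this sketch, an sRNA that carries one extra base in front of an otherwise identical mature sequence is reported with one mismatch at offset -1 when `max_mismatches=1`, and is not reported at all when `overhangs_allowed=False`.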

The last part of miRProf configuration involves determining how the sRNAs that match known miRNAs should be grouped. The user has these options:

    • Group organisms

If checked, miRProf does not consider the organism in which the miRNA was found. Users who select this check box will therefore see only a single organism returned: (all-combined).

    • Group variants

If checked, matches to different variants of a single miRNA precursor family (for example, different arms or copy numbers) are combined into one.

    • Group mature and star

If checked, matches to mature and star sequences derived from the same miRNA precursor are grouped into one.

    • Group mismatches

If checked, matches to the same miRNA are combined into a group regardless of the number of mismatches.

All miRProf configuration parameters can be saved to and loaded from file using the toolbar buttons at the top of the parameter browser dialog. This functionality can help the user maintain consistency between separate runs and can be used as part of an experiment log book by automatically documenting the experimental setup.

Setting parameters for the miRProf tool

When the user has configured all the parameters to their satisfaction, they can start miRProf by clicking on the “Start” button on the main dialog, or by selecting the “Start” menu item from the run menu. Once running, miRProf can be cancelled at any time by clicking on the “Cancel” button or menu item. Note: cancelling a run may not be instant, as execution must reach a safe position in the code before cleanly stopping the run.

After the run has completed, the results for all samples are available in the main miRProf dialog as shown in the figure below. The configuration of the results table is highly dependent on the grouping options the user has selected. For example, if the user selected “group organisms”, then the results table will initially contain only one row, called “all combined”. Otherwise the display shows each organism the sRNAs could be mapped to. Clicking on an organism entry reveals the individual detected miRNAs. Again, the number of miRNAs displayed here can vary dramatically based on the grouping options selected by the user. Each row containing a detected miRNA will contain 4 columns for each sample: raw count (total number of reads in the sRNA dataset matching this miRNA), weighted count (total number of reads in the sRNA dataset matching this miRNA divided by the number of times the matching sRNAs aligned to the genome), normalised count (weighted count divided by the total number of reads in this sample, multiplied by 1 million), and finally the actual sRNA sequences in this sample that matched the miRNA. Any row within the table can be copied by selecting it and then using “ctrl-c” or by bringing up a context menu and selecting the “Copy to Clipboard” item.

Displaying miRProf results
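The three numeric columns of the results table can be worked through with a short Python sketch. It assumes one reading of the definitions given for the table, namely that each matching sRNA’s abundance is divided by its own number of genome alignments before summing; the function name and input layout are hypothetical, and whether the sample total is taken before or after filtering is not specified here.

```python
def expression_counts(matches, total_reads):
    """Compute (raw, weighted, normalised) counts for one miRNA in one sample.

    matches: list of (abundance, genome_hits) pairs, one per distinct sRNA
             that matched the miRNA; genome_hits is at least 1 because
             sRNAs that do not align to the genome were already discarded.
    total_reads: total number of reads in the sample.
    """
    raw = sum(abundance for abundance, _ in matches)
    # Assumption: weighting is applied per sRNA, then summed.
    weighted = sum(abundance / genome_hits for abundance, genome_hits in matches)
    # Reads per million.
    normalised = weighted / total_reads * 1_000_000
    return raw, weighted, normalised
```

For instance, two matching sRNAs with abundances 100 and 50 that align to the genome once and twice respectively give a raw count of 150 and a weighted count of 125; in a sample of 2,000,000 reads the normalised count is then about 62.5.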

Now that the results are available to the user, miRProf can export two types of file: a results table in .csv format and a list of sRNAs matching known miRNAs (in FASTA format). The results table contains a formatted list of reads that match known miRNAs. It also contains information about redundant (total) and non-redundant (distinct) sequence counts in the input set before and after every filtering step. The .csv file can be loaded into any spreadsheet program.

26 comments on “miRProf”

  1. Alice Lunardon said:

    Hi Matt,
    I tried to use miRProf on a Linux server I now have in my lab. The tool works correctly; the only problem I encountered is when I save my results in .csv format. To explain: inside the tool I see the correct raw and normalized abundances, but when I save the data to the .csv file I lose all decimal numbers, and the decimal numbers are split into two columns. I could not solve the problem by opening the file in Excel because, in the .csv file, both commas and decimal points are saved as commas; for example, a normalized abundance of 2.16 is saved in the .csv file as 2,16. I don’t think it is a problem with my computer settings, because I tried several settings and always obtain the same result.

    I thank you for your help!

    Alice

    • Hi Alice,

      How are you logging into the Linux server? Is it through a terminal from a Windows machine? Is the Linux machine set up to be English or Italian? The problem is that CSV files are comma-separated values, which of course poses a problem if the values themselves also contain commas! Currently, the software must be run on a machine configured as English (UK or US) for it to format numbers with a decimal point instead of a comma, which doesn’t really help you at the moment!

      A quick solution would be to drop down all the miRNAs from the result table and simply highlight and copy-paste the data into an Excel spreadsheet (which is a bit of work, I am afraid, if there are lots of miRNAs in your results). In fact, the miRProf exporter is not working correctly at the moment anyway in terms of how it groups miRNAs (one column is missing; it is a known bug and will be fixed in the next release). As this is going to be a problem for anyone using a European format, I will also add it to the fixes and look at including the option to output the results to TSV (tab-separated values) or, if possible, an XML-based exporter so I can directly create Excel files.

      Sorry I cannot be of any more help, I will step up the release schedule and try and get this available as soon as possible.

      Best wishes,
      Matt

      • Alice Lunardon said:

        Hi Matt,
        I am logging in through a terminal and my PC is set to English, but I still don’t manage to get the correct results, don’t know why... thank you anyway!

        Alice

  2. Hi Alice,

    yes it is output row by row, since redesigning I need to simply add the missing column in the table but the order remains the same :) it is a simple fix but I usually try and wait until I have several new features/fixes before releasing new code to reduce the amount of downloading people have to do (unless it is a real game breaker)

    For the command line, you must provide a text file containing the parameters you wish to use. Examples for these (you can just copy paste and then edit in any text editor) can be found in data/default_params

    If you need to access the help just type … -tool mirprof and the message should appear

    You will find example files for all tools, if you copy the default_mirprof_params.cfg into a new location and modify it you could expect your command to look something like

    java -jar Workbench.jar -tool mirprof -srna_file_list 118372.fa -mirbase_db /sRNAWorkbench/Workbench/dist/data/mirbase/19/mature.fa -out_file result2.csv -genome TAIR10_chr_all.fas -params default_mirprof_params.cfg

    where is the path to your install. A few things though, the csv file is currently suffering even worse from this bug and has none of the information contained from the GUI, you will get the fasta file of all the miRNA sequences though.

    In addition, if you are running many small RNA files through miRProf you may need to increase the RAM from the default amount. Usually this is done for you with the sRNAWorkbenchStartup.jar but to run from the CLI you must bypass this program and therefore need to do this yourself. Let me know if you need instructions on how to do this.

    Please expect a fix for the output as soon as possible :)

    • Alice Lunardon said:

      Hi Matt,
      I found a strategy to recover only the matches of one species by replacing the file of mature miRNAs with one containing only those of interest :)
      With the command chmod 777 I think all the executables and the workbench now have full read/write access, but I am still not able to find the help message :( I tried typing:
      java -jar Workbench.jar -tool mirprof
      Workbench.jar -tool mirprof
      -tool mirprof
      (also with the entire path to the directory where the Workbench.jar is) I really can’t understand what is wrong with it!
      To eventually increase the memory I think I have to type:
      java -jar -Xmx10G /path/../srna-workbenchV2.5.0/Workbench.jar … is it right?

      Thanks a lot!!

      Alice

      • hmm a very strange one, I have no idea why you are not receiving the help message. I just tried java -jar Workbench.jar -tool mirprof from inside the workbench directory and got back:

        Error: The parameter srna_file_list must be specified.
        Usage:
        java -jar /path/to/Workbench.jar [-verbose] -tool mirprof [-f] -srna_file_list comma-seperated-srna-file-paths -mirbase_db mirbase-file-path -out_file output-file-path [ -params params-file-path ]
        -f = Force overwriting of output file

        which is the correct help message. If you just type java -jar Workbench.jar -tool do you get any of the other help back? if you leave the -tool does the workbench boot into GUI mode?

        Yes you are nearly correct with your -Xmx but prefix with -Xms10g so:

        java -Xms10g -Xmx10g -jar Workbench.jar -tool ….

  3. Hi,
    I am using miRProf (2.5.0) on a Linux cluster. I started the analysis, but while the program is matching the sRNA data to the genome it is killed with an error message. This is what I see at my prompt:
    Mar 20, 2013 12:48:12 PM uk.ac.uea.cmp.srnaworkbench.utils.WorkbenchLogger log
    SEVERE: WORKBENCH: MIRPROF: Message: 1;
    Stack Trace: uk.ac.uea.cmp.srnaworkbench.utils.patman.PatmanEntry.parse(PatmanEntry.java:185)
    uk.ac.uea.cmp.srnaworkbench.utils.patman.PatmanReader.process(PatmanReader.java:78)
    uk.ac.uea.cmp.srnaworkbench.utils.patman.PatmanReader.process(PatmanReader.java:42)
    uk.ac.uea.cmp.srnaworkbench.tools.mirprof.Mirprof.process(Mirprof.java:211)
    uk.ac.uea.cmp.srnaworkbench.tools.RunnableTool.run(RunnableTool.java:339)
    java.lang.Thread.run(Thread.java:662)

    Could you please help me on understanding the problem?

    Thank you

    Alice

    • Hi Alice,

      Are you able to run the program without supplying the genome file? In addition, are you able to align the sRNA data to the genome through the sequence alignment tool?

      In addition, are you running in GUI mode with X-Window or from the CLI?

      Thanks,
      Matt

      • Alice Lunardon said:

        Hi Matt,
        I performed the alignment separately and then ran the program without matching to the genome, and it works, thanks a lot!
        I use GUI mode with -X on a Linux cluster.

        I ran the program without grouping by organism: I obtained the right results in the interface, but when I download the csv file I lose the information about the different organisms. Is there a way to save the results divided per organism?

        Many thanks

        Alice

        • Ok great, however I must now look into why the alignment did not work, because this will have an effect on normalised values for the sequence abundance!

          Can you confirm that the directory for the workbench has full read/write access, also that the patman binary (ExeFiles/linux) has execute permissions?

      • Alice Lunardon said:

        Hi Matt,
        I checked the .csv file and I see that the counts are divided per organism, but I don’t have the names of the organisms. Why don’t I recover the names of the species?

        Thank you

        Alice

        • Hi Alice,

          Yes, there are a few issues with the output to CSV at the moment that a couple of users have reported. I changed quite a lot with the updates to miRProf in the last release but obviously missed something in the output modules. I am working on a fix for it now so that the CSV file completely reflects what is shown in the GUI. I will release the fixed version as soon as possible. You may be able to simply copy-paste the entire table into Excel (I haven’t tried this). Bear with me on the fix :)

          Cheers,
          Matt

          • Alice Lunardon said:

            Hi Matt,
            can I assume that the species in the .csv file are in the same order as in the interface? I need Zea mays; that way I would know that it is the last in the list, although its name is not written.
            (I will also try to copy and paste from the interface)

            If I want to run the program from the command line, what command do I type to see the manual, in particular to see how to write the parameters? I tried java -jar Workbench.jar -tool mirprof –help but it does not work.

            Thank you very much!

            Alice

          • Hi Alice, please see reply above as the text became very squashed and unreadable!

  4. Kenlee Nakasugi said:

    Hi Matt,
    I just started using the latest version 2.5 of Workbench.jar, in Linux Ubuntu using java -Xmx10g -jar Workbench.jar.
    I ran miRProf off this, without grouping organisms, using miRBase19 and plant only matures. When I saved the output to csv, there are no indicators in the export file as to which miRNAs belong to which organism, although this is shown in the GUI. The row order of the miRNAs in the csv do appear to coincide with that of the GUI, just missing the organism names to separate them. Is this expected?
    Also as a feature request, would it be possible to save the miRBase version and category details in the ‘save parameters to disk’ option?

    cheers,
    Ken

    • Hi Ken,

      I hadn’t noticed, but I will fix this for the next version. Thanks for pointing it out! Also, yes your feature request is fairly simple and I will also include it in the next version :)

      Thanks again,
      Matt

  5. Nathaniel Street said:

    Would it be possible to also include the ability to use output from miRCat as an addition to the use of miRBase? It would be cool to predict de novo miRNAs and then use miRProf to generate expression values for those de novo predictions.

    • Hi Nathaniel,

      could you give a little more information about what you mean? Are you thinking something along the lines of outputting normalised expression levels for the miRNAs predicted over a range of samples? Then collating these into a spread sheet?

      • Nathaniel Street said:

        That’s exactly what I’m thinking. I would like to have normalised expression values for the miRNAs predicted by miRCat using the same normalisation calculation as in miRProf. I can generate that by mapping all of the reads back to the predicted miRNAs but it would be great if one of the tools could include this option.

        • Hi Nathaniel,

          Ok no problem, you will notice I just released a version of the program (hence the delay on replies etc while I was locked in a cupboard with my laptop) with some pretty major changes to the output miRProf produces (hopefully this will make the tool much more usable). Unfortunately this feature is not included in this release. However, although the release notes are quite long, this is really just a precursor for a much larger release in the next few weeks.

          I have started on the code for this today, and will introduce a normalised calculation into miRCat and try and devise some kind of usable output for the major release (along with the SAM/BAM format if possible)

          Thanks again for the suggestion :) this is exactly the sort of thing that helps me improve this program.

          Best wishes,
          Matt

  6. Gracie said:

    Hi,

    I am using miRProf in sRNA Workbench 2.4.0 (release11); miRBase is the latest version, 18.
    I don’t get a parameter window either. After uploading the sRNA and reference, starting miRProf gives me an error “String index out of range: -17”.
    Could you please drop a hint why this is?

    Thanks!
    Gracie

    • Hi Gracie,

      Sorry for the delay, I tried to email you yesterday to gather some further information but my email got bounced for some reason. If possible could you email me at matthew.stocks@uea.ac.uk so I can gather some further information privately?

      Best Wishes,
      Matt

    • Gabrielle said:

      What was the answer to the string index out of range problem?
      I have the same version, I believe, and mine says “string index out of range: -5”

      Any clues would be great!
      Thanks,
      Gabrielle

      • Hi Gabrielle,

        The problem was related to the input data not being in non-redundant format. Many of our tools produce a non-redundant format and the others require this format to work. If you run your data set through the filter tool and ensure the “make output non-redundant” is checked this may solve your problem.

        Please let me know if this helps or if you need further explanation.

        Best wishes,
        Matt

  7. Chintan Vora said:

    I am not getting the parameter window of miRProf in version 2.2. Also, how does it update miRBase? I am not able to update miRBase either. Please help me out with this.

    • admin said:

      Hi,

      RE your problem with miRBase (copied from another response)

      you may be experiencing an issue related to Java 1.7 when attempting to download/update miRBase files (related to support for IPv4, Java are aware of the problem but I am not sure when their updated code will be available). We are attempting to add a work-around at the moment.

      The software should work fine on Java 1.6 (any update). If you want instruction on how to use an earlier version of Java please let me know.

      As for the parameter window, I am not sure why this would happen, I have not experienced a case where it didn’t appear. Did you try to press the “Show Parameter Browser” button on the tools drop down menu? miRProf does not show the parameter browser by default.
