miRNAs


There are more than 45,000 miRNA-related publications in PubMed. While some studies investigate miRNA biogenesis, function and decay, the majority of work has focused on identifying specific miRNAs with roles in disease and developmental processes. In the most common approach, miRNA association studies use microarray or Next Generation Sequencing (NGS) platforms to compare two conditions (e.g. healthy versus cancer) and identify miRNAs whose expression levels differ significantly between them. The mRNA targets of these miRNAs are then predicted using computational tools such as TargetScan [3], and functionally interesting ones may also be experimentally verified.

However, these studies implicitly assume an oversimplified model of miRNA function. Several issues complicate this picture:

  • Annotation: miRNA studies depend on annotation, and the primary reference resource is miRBase. The quality of this resource is variable, and different versions can return different results in miRNA expression studies (i.e. in which miRNAs are identified as differentially expressed). A further complication is that miRBase includes highly similar or duplicate miRNAs, as well as miRNAs with multiple copies.

  • isomiRs: Most miRNA expression studies assume that a miRNA exists as a well-defined and stable entity, i.e., that the single sequence specified in miRBase is the exact form in which the miRNA is expressed. In reality, a miRNA is expressed as a series of highly similar isoforms, or isomiRs, which have demonstrated functional roles [18]. Microarray-based studies are unable to capture this variation, and most NGS Small RNA Sequencing (Small RNA Seq) studies fail to consider it.

  • Ethnicity: This is rarely considered in miRNA studies. miRBase annotation is based on the standard reference genome, GRCh38.p13, so ignoring ethnicity can affect miRNA studies in two ways: (i) reads may fail to map to features containing population-specific SNVs, and (ii) population-specific variation in the 3′ UTR targets is not incorporated.

  • Targeting: Due to cost and throughput issues, determining miRNA targets is heavily dependent on computational prediction tools. Many of these are rule-based, i.e., they incorporate existing knowledge into the prediction process. In particular, they require the presence of seed region binding (nucleotides 2 to 7 or 8 of the miRNA). Even the machine-learning-based approaches used by tools such as TargetScan incorporate this information into their models. While this helps to improve model performance, it biases the model towards targeting events that fit existing knowledge, rather than providing new insight into the targeting process.
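
To make the seed rule concrete, the following is a minimal sketch of the kind of check a rule-based predictor starts from: it scans a 3′ UTR for stretches that are perfectly complementary to nucleotides 2 to 7 (or 2 to 8) of a miRNA. The function names and the toy UTR sequence are hypothetical and chosen only for illustration; the miRNA is the mature hsa-let-7a-5p sequence. This is not how TargetScan or any other specific tool is implemented, since real tools add further features such as site context and conservation.

```python
# Illustrative seed-match scan; helper names and the UTR are hypothetical.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA-style T to RNA-style U."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(seq)))


def seed_sites(mirna: str, utr: str, seed_end: int = 7) -> list[int]:
    """Return 0-based UTR positions where a perfect seed match starts.

    The seed is taken as miRNA nucleotides 2..seed_end (1-based, inclusive),
    so seed_end=7 gives a 6-nt seed and seed_end=8 a 7-nt seed.
    """
    seed = to_rna(mirna)[1:seed_end]      # skip nucleotide 1
    site = reverse_complement(seed)       # what the seed pairs with in the mRNA
    utr = to_rna(utr)
    width = len(site)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"          # hsa-let-7a-5p
toy_utr = "AAACUACCUCAGGGUACCUCUU"        # made-up sequence, not a real UTR

print(seed_sites(let7a, toy_utr, seed_end=7))  # 6-nt seed matches: [4, 14]
print(seed_sites(let7a, toy_utr, seed_end=8))  # 7-nt seed matches: [3]
```

Note that the second site in the toy UTR is found with the 6-nt seed but lost with the 7-nt seed. Any interaction that does not satisfy the chosen rule is never reported, which is the bias described above.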

Simon Rayner
Group Leader

Computational Biology Group.
