Plant Long Non-Coding RNA Prediction by Random fOrests


Long non-coding RNAs (lncRNAs) make up a significant portion of non-coding RNAs and are involved in a variety of biological processes. LncRNAs do not code for proteins and have a minimum transcript length of 200 bp. Accurate identification and annotation of lncRNAs is the first step towards gaining deeper insight into their functions. Next-generation RNA-sequencing methods make it possible to study the whole transcriptome of any organism, and these data can be used to identify potential lncRNAs. We have developed a novel tool, PLncPRO, for prediction of lncRNAs in plants using transcriptome data. PLncPRO is based on machine learning: it uses the random forest algorithm to build a training model from 71 features and classifies transcripts as coding or long non-coding. PLncPRO has very high prediction accuracy and is particularly well suited for plants. It also performed well on vertebrate transcriptome data. We demonstrated its utility by identifying novel lncRNAs in rice and chickpea. The availability of a plant-specific lncRNA prediction tool will provide a useful resource for the discovery of lncRNAs and for understanding their role in plants.
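
As a rough illustration of the length criterion mentioned above, the following Python sketch reads transcript sequences in FASTA format and keeps only those that meet the 200 bp minimum. It is not part of PLncPRO: the function names `read_fasta` and `length_filter` are hypothetical, and the sketch covers only a length pre-filter, not the 71-feature random forest model itself.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Minimum lncRNA transcript length stated in the text above.
MIN_LNCRNA_LENGTH = 200


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_filter(
    records: Iterable[Tuple[str, str]],
    min_length: int = MIN_LNCRNA_LENGTH,
) -> Dict[str, str]:
    """Keep only transcripts that are at least min_length long."""
    return {name: seq for name, seq in records if len(seq) >= min_length}
```

Transcripts that pass such a filter would still need to be classified as coding or non-coding; that step is what the trained model performs.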


