Recent advances in genome analysis, including microarrays and massively parallel sequencing, have shown that a much larger portion of the human genome is transcribed into RNA than previously recognized. Moreover, much of the evidence emerging in recent years has highlighted the biological and pathological importance of RNA molecules that lack protein-coding potential; these are collectively referred to as noncoding RNAs (ncRNAs) (1). Long ncRNAs (lncRNAs) are broadly defined as transcribed RNA molecules greater than 200 nt in length that lack an open reading frame of significant length (i.e., any open reading frame encodes fewer than 100 amino acids). Although there is no formal classification scheme, lncRNAs can be categorized into several subgroups based on their locations and characteristics. For instance, antisense RNAs are transcribed from the opposite strand of a protein-coding gene, while lncRNAs transcribed from intergenic regions are referred to as large intergenic ncRNAs (lincRNAs).
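The working definition above can be expressed as a simple filter. The following Python sketch is illustrative only: the function names are hypothetical, the ORF search is a simplifying assumption (ATG-to-stop ORFs in the three forward frames only, ignoring ORFs that lack a stop codon), and it is not the procedure used by any of the studies cited here, which rely on dedicated coding-potential tools.

```python
MIN_LENGTH_NT = 200   # transcripts must be longer than this (working definition)
MAX_ORF_AA = 100      # ORFs encoding fewer residues than this are not "significant"

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Return the length (in amino acids) of the longest ATG-to-stop ORF.

    Assumption: only the three forward reading frames are scanned, and an
    ORF must end in a stop codon to be counted.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # residues, stop excluded
                start = None
    return best


def is_candidate_lncrna(seq: str) -> bool:
    """Apply the two criteria: longer than 200 nt and no ORF of 100 aa or more."""
    return len(seq) > MIN_LENGTH_NT and longest_orf_aa(seq) < MAX_ORF_AA


if __name__ == "__main__":
    short_rna = "ATG" + "GCT" * 10 + "TAA"               # 36 nt: too short
    noncoding = "ATG" + "GCT" * 20 + "TAA" + "C" * 200   # 266 nt, 21-aa ORF
    coding = "ATG" + "GCT" * 150 + "TAA"                 # 456 nt, 151-aa ORF
    for name, s in [("short", short_rna), ("noncoding", noncoding), ("coding", coding)]:
        print(name, len(s), longest_orf_aa(s), is_candidate_lncrna(s))
```

A filter of this kind only flags candidates; a short ORF does not by itself rule out translation, and a long one does not prove it.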
Much about the molecular and biological characteristics of lncRNAs is as yet unknown, but what is known suggests that many show expression patterns that are spatially and temporally specific and that they are generally poorly conserved among species. Cabili et al. recently used integrated RNA-seq data to construct a reference catalog of 8195 human lincRNAs (2). According to their report, lincRNAs, like protein-coding transcripts, are transcribed by RNA polymerase II and are spliced and polyadenylated, but the maximum expression levels of lincRNAs are 10 times lower than those of protein-coding transcripts. In addition, lincRNAs are smaller than protein-coding RNAs and have fewer exons: on average, lincRNAs are ~1 kb in length with 2.9 exons, whereas protein-coding transcripts are ~2.9 kb in length with 10.7 exons. In addition, lincRNAs are alternatively spliced more frequently than protein-coding mRNAs (2.3 isoforms per lincRNA locus, on average). It is also noteworthy that whereas 78% of examined lincRNAs exhibit tissue-specific expression patterns, only 19% of protein-coding transcripts do so. In fact, the tissue (cell or context) specificity of lincRNA expression has been reported often enough (3,4) to suggest that it is an important feature of lincRNAs.
Although the specific functions of the large majority of lncRNAs remain unknown, recent studies have begun to shed light on the critical roles played by these molecules in a variety of cellular processes, including differentiation, development and tumorigenesis. In this review, we will briefly outline the known functions of lncRNAs and their involvement in cancer.
FUNCTIONS OF lncRNA
Although, as mentioned, the function of most lncRNAs remains unknown, dozens of examples of biologically functional lncRNAs have already been reported, and the number of such examples is rapidly increasing (Table 1). lncRNAs appear to be involved in all aspects of gene regulation, including chromosome dosage compensation, imprinting, epigenetic regulation, nuclear and cytoplasmic trafficking, transcription, mRNA splicing and translation. Through such gene regulation, lncRNAs are involved in a wide range of biological processes, including proliferation, cell cycle, apoptosis, differentiation and maintenance of pluripotency, among others. A good review of lncRNA functions, focusing on biological and pathological processes in cancer, is available (5); refer to that article for further details. It is also useful to look at lncRNAs in the context of their molecular functionality (6). In this review we will focus on three molecular functions of lncRNAs: guide, scaffold and decoy.
Table 1. Representative lncRNAs implicated in cancer
One group of lncRNAs is able to bind specific proteins and then direct the localization of the resultant complex to specific targets. Such lncRNAs can guide proteins either in cis (on neighboring genes) or in trans (on distantly located genes). XIST, one of the best-studied lncRNAs, guides PRC2 (polycomb repressive complex 2) to one of the two X chromosomes in cis to achieve X inactivation (7). Two other examples of lncRNAs that function as guides in cis are AIR and HOTTIP. AIR silences transcription of its target gene on the paternal chromosome by recruiting G9a and then mediating targeted histone H3 lysine 9 (H3K9) methylation and allelic silencing (8). HOTTIP, which is transcribed from the end of the HOXA cluster, binds to WDR5 and recruits the MLL histone H3 lysine 4 (H3K4) methyltransferase complex to the HOXA cluster to support an active chromatin conformation (9).
Some lncRNAs, including lincRNA-p21 and HOTAIR, are able to alter and regulate epigenetic states and gene expression across multiple sites in trans (10,11). LincRNA-p21 is transcribed upstream of the CDKN1A gene and acts as a transcriptional repressor through its interaction with hnRNP-K, which it guides to target sites. Knocking down lincRNA-p21 alters the expression of over 1,000 genes, suggesting lincRNAs may be able to regulate numerous genes in trans. In addition, some lncRNAs may serve as cellular “navigation systems” for proteins lacking direct DNA binding capacity (12).
The mechanism by which lncRNAs specifically regulate their target genes remains unclear, and their binding sites throughout the genome are largely unknown. In one recent study, Chu and colleagues addressed this question using a novel assay they named ChIRP (Chromatin Isolation by RNA Purification)-seq, which is a method for genome-wide mapping of lncRNA binding sites in vivo (13). This analysis enabled them to obtain a high-resolution map of ncRNA occupancy throughout the genome and to identify a set of 832 HOTAIR binding sites in human breast cancer cells. Interestingly, binding sites for HOTAIR are focal (<500 bp) and located in the midst of a broad polycomb binding domain, which suggests HOTAIR may act as a pioneering factor able to recruit polycomb to its target genes and then bilaterally spread the repressive regions outward. They also discovered an underlying DNA sequence motif enriched in HOTAIR binding sites, indicating the existence of a new class of regulatory element: lncRNA target sites. These findings raise the possibility that lncRNAs can function in a manner analogous to sequence-specific transcription factors.
Another class of lncRNAs may possess distinct domains that bind different effector molecules. Such lncRNAs can mediate assembly of multiple molecular components in a temporally and spatially specific manner. Two examples of lncRNAs that act as scaffolds are ANRIL, which interacts with components from PRC1 (polycomb repressive complex 1) and PRC2 (14,15), and KCNQ1OT1, which binds both G9a and PRC2 (16). In addition, MALAT1 and NEAT1 serve as molecular scaffolds for proteins within nuclear speckles and paraspeckles, respectively (17). Depletion of NEAT1 is sufficient to cause loss of paraspeckles from within the nucleus, and overexpressing NEAT1, but not paraspeckle-associated proteins, leads to an increase in the number of paraspeckles, suggesting NEAT1 plays an essential role as a scaffold in the formation of paraspeckles.
A recent study revealed that HOTAIR interacts with two chromatin modifying complexes, the PRC2 complex (“writer” of a repressive mark, H3K27 trimethylation) and the LSD1/CoREST H3K4 demethylase complex (“eraser” of an activating mark, H3K4 trimethylation) (18). Using a series of deletion mutants, the PRC2 binding domain was mapped to the 5’ end (the first 300 nt) of HOTAIR, while the LSD1 binding site corresponds to the 3’ end. This suggests that HOTAIR bridges the PRC2 and LSD1 complexes, and that the resultant HOTAIR/PRC2/LSD1 complex can suppress gene expression via multiple mechanisms.
This property is not unique to HOTAIR; many other lncRNAs also appear to interact with both the PRC2 and LSD1 complexes. For example, Khalil and colleagues performed RIP-chip assays (RNA coimmunoprecipitation combined with high-throughput lincRNA microarray) using antibodies directed against several proteins involved in chromatin modifying complexes (PRC2, CoREST and SMCX) (19). They found that as many as 38% of lincRNAs expressed in the cell types studied reproducibly associate with one of these complexes. In mouse embryonic stem cells, moreover, a number of lincRNAs were found to be strongly associated with multiple chromatin modifier complexes (20). For example, eight lincRNAs bind to the PRC2 H3K27 and ESET H3K9 methyltransferase complexes and the JARID1C H3K4 demethylase complex. Similarly, 17 lincRNAs were found to bind to the PRC2, PRC1 and JARID1B complexes. Taken together, these results suggest the attractive hypothesis that lincRNAs bind to ubiquitously expressed chromatin modifying complexes in order to guide them to specific genomic regions.
A third class of lncRNAs binds and then sequesters a protein or RNA target, but does not exert additional effects. By acting as molecular decoys, these lncRNAs negatively regulate the activity of their targets. Examples of lncRNAs with decoy functionality include GAS5 (growth arrest-specific 5), which binds to the glucocorticoid receptor (GR) and represses GR-induced genes (21); PANDA (P21 associated ncRNA DNA damage activated), which binds to the transcription factor NF-YA to negatively regulate expression of pro-apoptotic genes (4); and TERRA (telomeric repeat-containing RNA), which interacts with telomerase via a repeat sequence to reduce enzyme activity (22).
Interestingly, several pseudogenes also reportedly act as molecular decoys. The 3’UTR of PTENP1, a tumor suppressor pseudogene, was found to bind the same set of regulatory miRNA sequences that normally target the tumor-suppressor gene PTEN, which reduces the downregulation of PTEN mRNA and allows its translation into the tumor-suppressor protein PTEN (23). It was also shown that the PTENP1 locus is selectively lost in human cancer, and that similar relationships also exist between other cancer-related genes and their pseudogenes (24). These findings attribute a novel biological function to expressed pseudogenes, as they can regulate coding gene expression and reveal a non-coding function in mRNAs.
DYSREGULATION OF lncRNAs IN HUMAN CANCER
The lincRNA HOTAIR was originally discovered by Rinn and colleagues (25). It is encoded within the HOXC gene cluster and acts in trans to regulate HOXD genes through recruitment of PRC2 to induce trimethylation of H3K27 (H3K27me3). Remarkably, pull-down assays with PRC2 components demonstrated a direct and specific interaction with HOTAIR. This observation that HOTAIR binds PRC2 and induces epigenetic silencing of another HOX cluster on a different chromosome was an unexpected and novel finding.
In breast cancer, elevated expression of HOTAIR reportedly correlates with a poor prognosis and tumor metastasis (11). It is noteworthy that expression of a single lincRNA in primary tumors can be a powerful predictor of eventual metastasis and death. Enforced expression of HOTAIR induces genome-wide re-targeting of PRC2, leading to altered H3K27me3 and gene expression, and increased cancer invasiveness and metastasis. The link between HOTAIR and metastatic disease depends on both the direct interaction between the ncRNA and its protein partner and between the ncRNA and its target DNA sequence. Furthermore, several other studies have also shown that the level of HOTAIR expression correlates positively with metastasis and poor outcome in hepatocellular carcinoma, colorectal cancer and pancreatic cancer (26-28). We also discovered that upregulation of HOTAIR is strongly associated with aggressiveness in gastrointestinal stromal tumors (GISTs) (29). Interestingly, in malignant GISTs, HOTAIR is concurrently overexpressed with collinear HOXC genes and an oncogenic microRNA, miR-196a. We also observed enrichment of an active histone mark, H3K4me3, over a wide range of the HOXC cluster, suggesting the entire region is epigenetically activated in malignant GISTs. Taken together, these results suggest that lincRNAs play active roles in modulating the cancer epigenome and could be useful targets for cancer diagnosis and therapy.
ANRIL (antisense lncRNA of the INK4 locus) is transcribed antisense to the INK4 locus and mediates INK4a transcriptional repression in cis (30). Independent studies have shown that overexpression of ANRIL in prostate cancer results in the silencing of INK4b/ARF/INK4a and p15/CDKN2B due to heterochromatin formation (14,31). ANRIL interacts with SUZ12 (suppressor of zeste 12 homolog), a subunit of PRC2 (15), and with CBX7 (chromobox homolog 7), a subunit of PRC1 (14). ANRIL may recruit multiple sets of chromatin-modifying complexes to a target gene for silencing, serving as a molecular scaffold. Genome-wide association studies (GWAS) have linked ANRIL with increased susceptibility to coronary disease, intracranial aneurysm and type 2 diabetes, as well as to several types of cancer, including acute lymphoblastic leukemia, glioma, basal cell carcinoma, nasopharyngeal carcinoma, breast cancer and plexiform neurofibromas (32-35).
MALAT1 (Metastasis-Associated Lung Adenocarcinoma Transcript 1) was initially identified in a screen for genes associated with metastasis (36). It is abundantly expressed in many human cell types and is highly conserved across several species. MALAT1 is reportedly upregulated in multiple malignancies, including lung cancer, uterine endometrial stromal sarcoma, cervical cancer and hepatocellular carcinoma (36-38). In lung metastasizing tumors, MALAT1 expression is three-fold higher than in non-metastasizing tumors (36), and it can serve as an independent prognostic parameter for patient survival in early stage lung adenocarcinoma (39). MALAT1 promotes the motility of lung cancer cells through transcriptional or post-transcriptional regulation of motility-related genes (40).
MALAT1 localizes to nuclear speckles, which contain several proteins known to be involved in alternative splicing (41). It appears that MALAT1 forms a molecular scaffold for several of the proteins present within nuclear speckles, and modulates the phosphorylation of SR proteins. Depletion of MALAT1 is sufficient to alter the patterns of alternative splicing of a subset of mRNAs.
Recent studies identified a number of lincRNAs induced by the p53 tumor-suppressor gene (10). LincRNA-p21 is located upstream of the CDKN1A gene on mouse chromosome 17 and is directly activated by p53 in response to DNA damage. LincRNA-p21 acts as a transcriptional repressor in the canonical p53 pathway and plays a role in triggering apoptosis. Inhibition of lincRNA-p21 alters the expression of hundreds of genes normally repressed by p53, potentially explaining how p53 can activate large numbers of genes while simultaneously repressing many others. LincRNA-p21 also interacts with heterogeneous nuclear ribonucleoprotein K (hnRNP-K), a well-known RNA binding protein that acts as a transcriptional repressor. A 780-nt region at the 5’ end of lincRNA-p21 is necessary for interaction with hnRNP-K, which is required for proper genomic localization of hnRNP-K at repressed genes and for regulation of p53-mediated apoptosis. Although lincRNA-p21 has not been directly associated with disease, we speculate that loss of lincRNA-p21 function could be an important factor contributing to cancer initiation.
PANDA is also induced in a p53-dependent manner. After DNA damage, p53 directly binds to the CDKN1A locus to activate PANDA (4), which appears to possess decoy function. PANDA inhibits the expression of apoptotic genes by directly binding to and sequestering the transcription factor NF-YA away from target gene promoters. PANDA is overexpressed in a subset of human breast cancers, and its depletion can sensitize cells to chemotherapeutic agents.
Prensner and colleagues used high-throughput RNA-seq with a large panel of clinical samples to comprehensively evaluate the ncRNAs dysregulated in prostate cancer (42). They identified approximately 1,800 lincRNAs in prostate tissue, of which 121 were transcriptionally dysregulated in prostate cancer. Among them, PCAT-1 (prostate cancer associated transcript 1) showed tissue-specific expression and was selectively upregulated in prostate cancer. Like HOTAIR, PCAT-1 functions predominantly as a transcriptional repressor by facilitating trans-regulation of genes preferentially involved in mitosis and cell division, including known tumor suppressor genes such as BRCA2. The discovery of PCAT-1 highlights the usefulness of unbiased transcriptome analysis when investigating the actions of lncRNAs in cancer.
GAS5 was originally identified as a gene highly expressed in cells whose growth was arrested (43). The human GAS5 gene is transcribed from chromosome 1q25.1 and is alternatively spliced. GAS5 sensitizes cells to apoptosis by regulating the activity of glucocorticoids in response to nutrient starvation (21). GAS5 binds to the DNA-binding domain of the GR, where it acts as a decoy preventing the GR from interacting with cognate glucocorticoid response elements (GRE). The binding of GAS5 to the GR is sufficient to repress GR-induced genes such as cIAP2.
GAS5 has been implicated in breast cancer by the observation that levels of GAS5 transcript are significantly lower in breast cancer cells than in unaffected normal breast epithelium (44). In addition, genetic aberrations at the GAS5 locus have been found in many types of tumors, including melanomas and breast and prostate cancers (45,46), though their functional significance has not yet been established. The GAS5 gene locus has also been linked to increased susceptibility to autoimmune disorders such as systemic lupus erythematosus in the mouse BXSB strain (21). Chromosomal translocations affecting the 1q25 locus containing the GAS5 gene have been detected in melanoma, B-cell lymphoma, and prostate and breast cancer.
In recent years, technological advances in high-throughput sequencing have enabled the identification of numerous ncRNAs, which has improved our appreciation for the complexity of the transcriptome and made it possible to demonstrate the dysregulated expression of a number of lncRNAs in various types of cancer. The affected molecules are now thought to function as oncogenes or tumor suppressors. However, the biological and molecular characteristics of most lncRNAs remain unknown, and much effort will be needed before a full understanding of their roles in normal and cancer cells is attained. This work will in part entail bioinformatics analysis, and secondary structure prediction will be important for identifying potentially functional motifs in lncRNAs. Elucidation of the structures of genes encoding lncRNAs, including the promoters, transcription start sites and enhancers, will be essential for understanding the mechanisms governing their spatiotemporal expression patterns. Moreover, greater knowledge of the gene structures will lead to complete cloning of lncRNAs, which will facilitate functional studies using expression constructs and mouse models. Analysis of genetic and epigenetic alterations of lncRNA genes will also provide clues to understanding their pathological roles, and the mechanisms by which such alterations affect lncRNA function must be carefully characterized. Although the roles played by lncRNAs in cancer have just begun to be revealed, it is anticipated that advances in the study of lncRNAs will yield new diagnostic biomarkers as well as RNA-based therapeutic strategies.