View examples of each type of basepair observed or predicted to occur in RNA 3D structures from this page. Basepairs are organized in geometric families using the Leontis-Westhof classification. Select basepair families with the dropdown menu to view base combinations that form in each family. The interactive heat map shows geometric similarities of basepairs in each family and can be used to select instances to superpose in the 3D viewer window.
More Information About the RNA Basepair Catalog
Exemplar instances are presented for each base combination that forms an RNA basepair in each geometric family, according to the Leontis-Westhof system [1,2]. Also provided are visualizations of the basepairs, counts of observed instances of each combination in structures at 4 Angstroms resolution or better in release 1.59 (April 26, 2014) of the Representative Sets of RNA 3D Structures (RNAEQ), and quantitative measures of geometric similarity between different base combinations in the same geometric family. In each of the twelve basepair families, there are potentially 4 x 4 = 16 base combinations, i.e. AA, AC, AG, AU, CA, CC, etc. Not all families have all combinations because of the arrangement of H-bonding donors and acceptors on each base.
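The counting above (twelve families, sixteen ordered base combinations) can be spelled out in a short Python sketch. The variable names are illustrative only and are not part of the catalog; the family order follows the frequency tables on this page.

```python
from itertools import product

# Interacting edges: Watson-Crick (W), Hoogsteen (H), Sugar (S).
EDGE_PAIRS = ["WW", "WH", "WS", "HH", "HS", "SS"]
ORIENTATIONS = ["c", "t"]  # cis and trans
BASES = "ACGU"

# Twelve families: cWW, tWW, cWH, tWH, cWS, tWS, cHH, tHH, cHS, tHS, cSS, tSS.
families = [o + e for e in EDGE_PAIRS for o in ORIENTATIONS]

# Sixteen ordered base combinations: AA, AC, AG, AU, CA, CC, ...
combinations = ["".join(p) for p in product(BASES, repeat=2)]

print(len(families), families)
print(len(combinations), combinations)
```

This only enumerates the potential combinations; which ones actually form in a given family depends on the hydrogen-bonding donors and acceptors, as noted above.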
Display areas
There are four main display areas which interact with one another to facilitate viewing:
- 4x4 grid of base combinations displays static images of each known base combination in each geometric family with ideal hydrogen bonds. Clicking one of the 16 cells will cause the selected base combination to appear in the 3D coordinate viewer; control-clicking will add or remove the clicked base combinations. In cases such as cWW AA, where there is both an Aa and aA combination, clicking on the AA cell will display both the Aa and aA. Selected cells will be highlighted with a circle in the cell.
- 3D coordinate viewer, which displays one or more base combinations, as selected by the user. The button labeled “Show Numbers” applies to the 3D coordinate viewer. The “Clear Selected” button clears both the 3D display and the other views. Right-clicking brings up a menu of JSmol options. Clicking and dragging in the window rotates the molecule(s); rolling the mouse wheel zooms in or out on many platforms; hovering over an atom pops up text giving the name of the atom and the number of the nucleotide. Many more controls exist and are described in the JSmol documentation.
- IsoDiscrepancy heat map. The IsoDiscrepancy is a numerical measure of geometric similarity, known as isostericity, between two base combinations in the same basepairing family [3]. Low numerical values, less than 2.2, indicate geometrically similar basepairs and are colored in shades of blue; these are considered to be isosteric. Higher numerical values falling between 2.2 and 3.5, colored in shades of yellow, are nearly isosteric, while cells with larger values indicate non-isosteric base combinations and are colored orange or red. Basepair combinations are listed in an order which puts geometrically similar base combinations near each other in the list. Clicking (and control-clicking) on the diagonal cells selects which base combinations appear in the 3D coordinate viewer. Clicking off the diagonal superimposes the two base combinations (one for the row, one for the column) in the 3D coordinate viewer. The base combinations being shown are indicated with green dots in the diagonal cells.
- Base Combinations / Exemplars Table: For each base combination, the table lists the isostericity value, the count in the 4 Angstrom representative set, the PDB ID and resolution of the chosen exemplar, the model, chain, nucleotide numbers, and crystallographic symmetry operators, and the isostericity group as defined in [2]. The isostericity grouping indicates which sets of basepairs are isosteric with each other. Clicking on a row displays that combination in the 3D view and selects it in the 4x4 grid.
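The IsoDiscrepancy cutoffs used to color the heat map can be written as a small classifier. This is a minimal sketch: the function name, the returned labels, and the treatment of a value exactly equal to 3.5 as nearly isosteric are assumptions made for illustration, and the example values are invented rather than taken from the catalog.

```python
ISOSTERIC_CUTOFF = 2.2        # below this: isosteric (shades of blue)
NEAR_ISOSTERIC_CUTOFF = 3.5   # up to this: nearly isosteric (shades of yellow)

def classify_isodiscrepancy(value: float) -> tuple[str, str]:
    """Return (category, heat map color family) for an IsoDiscrepancy value."""
    if value < ISOSTERIC_CUTOFF:
        return ("isosteric", "blue")
    if value <= NEAR_ISOSTERIC_CUTOFF:  # boundary handling is an assumption
        return ("nearly isosteric", "yellow")
    return ("non-isosteric", "orange/red")

# Invented example values, not catalog data.
for v in (0.0, 1.8, 2.9, 5.4):
    print(v, classify_isodiscrepancy(v))
```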
The uses of basepair isostericity
Basepair isostericity was introduced to explain observed sequence variability in homologous positions between different biological sources such as different organisms. Most familiar is the sequence variability in cWW basepairs in RNA double helices, where a CG base combination in one organism may correspond to an AU base combination in another.
Isostericity explains this observation by noting that the CG cWW basepair and the AU cWW basepair have nearly identical connections to the RNA backbone, as measured by the relative locations of their glycosidic bonds (between the N1/N9 atom of the base and the C1’ atom of the backbone). Similarly, in the tHS basepair family, the AA and AG base combinations are isosteric (though less so than the canonical cWW base combinations). In [3] it was documented that corresponding positions of homologous molecules tend to have conserved 3D structure and, furthermore, make RNA basepairs from the same basepairing family.
Uppercase and lowercase letters
Some basepairs like AA cWW are not actually symmetric, even though the bases are the same and the interacting edges are the same. We use upper and lowercase letters to distinguish between the two geometries. One can see them separately by clicking on the diagonal entries on the heat map. Note, for example, that cWW Cc and cWW uU are isosteric while cWW Cc and cWW Uu are not. Note also that cWW uU is more nearly isosteric with cWW UG than with GU, so the difference in geometry might be seen in substitution patterns.
Basepair frequencies by base combination and by geometric family
Basepair frequencies calculated from the representative set provide estimates of the relative occurrences of base combinations and basepair families for use in bioinformatics and RNA structure modeling. In the following tables, cells with frequency values of 20% or more are shaded blue, and cells with values between 10% and 20% are shaded grey.
- Relative frequencies by geometric basepair family, i.e., percent of all basepairs that are cWW, tWW, cWH, tWH, …, tSS for all twelve basepair families, normalized by total number of basepairs (12 x 1 table).
- Relative frequencies by bases in the pair, i.e., percent of basepairs that are AA, AC, AG, … UU for all base combinations, normalized by total number of basepairs (4 x 4 table). In this summary, each pair of bases is listed only once with bases in alphabetical order, so for example, CG is listed but GC is not.
- Relative frequencies of each base combination by geometric family, normalized by total number of basepairs in that family (12 x 16 table). In symmetric families cWW, tWW, cHH, and tHH, each pair of bases is listed only once, so for example CG is listed but GC is not (cell is shaded black). Percentages in each row sum to 100. Basepair combinations with 0% frequency are indicated by empty cells.
- Relative frequencies of each pair of bases by geometric family, normalized by total number of basepairs having that pair of bases (12 x 10 table). In this summary, each pair of bases is listed only once, so for example CG is listed but GC is not. Percentages in each column sum to 100. Basepair combinations with 0% frequency are indicated by empty cells.
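The difference between normalizing by family and normalizing by pair of bases can be sketched in Python. The annotations in `observed` are invented for illustration and are not catalog data; the names `observed`, `SYMMETRIC_FAMILIES`, and `fold` are hypothetical helpers, not part of any catalog software.

```python
from collections import Counter

# Hypothetical (family, ordered base combination) annotations, NOT catalog data.
observed = [
    ("cWW", "CG"), ("cWW", "GC"), ("cWW", "AU"), ("cWW", "UA"),
    ("cWW", "GU"), ("tHS", "AG"), ("tHS", "GA"), ("tWH", "UA"),
]

# Families in which both bases use the same edge, so XY and YX are one pair.
SYMMETRIC_FAMILIES = {"cWW", "tWW", "cHH", "tHH"}

def fold(pair: str) -> str:
    """List each pair of bases once, in alphabetical order (GC -> CG)."""
    return "".join(sorted(pair))

# 12 x 16 style: fold only in symmetric families, normalize by family total.
family_counts = Counter(
    (fam, fold(bp) if fam in SYMMETRIC_FAMILIES else bp) for fam, bp in observed
)
family_totals = Counter(fam for fam, _ in observed)
by_family = {k: 100 * n / family_totals[k[0]] for k, n in family_counts.items()}

# 12 x 10 style: fold in every family, normalize by pair-of-bases total.
pair_counts = Counter((fam, fold(bp)) for fam, bp in observed)
pair_totals = Counter(fold(bp) for _, bp in observed)
by_pair = {k: 100 * n / pair_totals[k[1]] for k, n in pair_counts.items()}

for key in sorted(by_family):
    print("by family", key, round(by_family[key], 2))
for key in sorted(by_pair):
    print("by pair  ", key, round(by_pair[key], 2))
```

With family normalization the percentages for each family sum to 100; with pair normalization the percentages for each pair of bases sum to 100 across families.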
Family | Notation | Count | Percentage |
---|---|---|---|
1 | cWW | 15181 | 74.21 |
2 | tWW | 239 | 1.17 |
3 | cWH | 284 | 1.39 |
4 | tWH | 826 | 4.04 |
5 | cWS | 326 | 1.59 |
6 | tWS | 286 | 1.40 |
7 | cHH | 9 | 0.04 |
8 | tHH | 199 | 0.97 |
9 | cHS | 273 | 1.33 |
10 | tHS | 1100 | 5.38 |
11 | cSS | 953 | 4.66 |
12 | tSS | 781 | 3.82 |
Total | | 20457 | 100.00 |
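As a worked check, the percentages in the family table follow directly from the counts. The sketch below recomputes them and applies the shading rule described earlier (20% or more blue, 10% to 20% grey); the `shade` helper and its handling of a value exactly equal to 10% are assumptions for illustration.

```python
# Counts copied from the family table.
family_counts = {
    "cWW": 15181, "tWW": 239, "cWH": 284, "tWH": 826,
    "cWS": 326, "tWS": 286, "cHH": 9, "tHH": 199,
    "cHS": 273, "tHS": 1100, "cSS": 953, "tSS": 781,
}

total = sum(family_counts.values())
percentages = {fam: round(100 * n / total, 2) for fam, n in family_counts.items()}

def shade(pct: float) -> str:
    """Shading rule from the text; the 10% boundary is treated as grey here."""
    if pct >= 20:
        return "blue"
    if pct >= 10:
        return "grey"
    return "none"

print("total", total)
for fam, pct in percentages.items():
    print(fam, pct, shade(pct))
```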
1st base↓/2nd base→ | A | C | G | U |
---|---|---|---|---|
A | 3.31 | 4.64 | 9.17 | 25.77 |
C | | 0.29 | 46.3 | 0.45 |
G | | | 1.45 | 7.24 |
U | | | | 1.37 |
Family | Notation | AA | AC | AG | AU | CA | CC | CG | CU | GA | GC | GG | GU | UA | UC | UG | UU | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | cWW | 0.11 | 0.44 | 1.26 | 27.67 | 0.04 | 61.37 | 0.16 | 7.54 | 1.42 | 100.0 | |||||||
2 | tWW | 28.87 | 5.86 | 29.29 | 2.93 | 23.85 | 1.26 | 4.18 | 1.67 | 2.09 | 100.0 | |||||||
3 | cWH | 1.41 | 1.41 | 3.17 | 2.11 | 43.31 | 40.49 | 2.82 | 5.28 | 100.0 | ||||||||
4 | tWH | 11.38 | 0.73 | 12.35 | 0.97 | 0.97 | 5.81 | 0.97 | 64.77 | 0.12 | 1.94 | 100.0 | ||||||
5 | cWS | 23.62 | 26.69 | 0.31 | 12.27 | 6.44 | 2.45 | 1.84 | 5.52 | 0.92 | 1.84 | 1.53 | 3.37 | 7.36 | 0.31 | 3.68 | 1.84 | 100.0 |
6 | tWS | 2.80 | 1.05 | 48.95 | 1.05 | 0.70 | 3.15 | 6.64 | 2.80 | 22.38 | 5.59 | 1.05 | 3.15 | 0.7 | 100.0 | |||
7 | cHH | 44.44 | 11.11 | 44.44 | 100.0 | |||||||||||||
8 | tHH | 80.40 | 3.02 | 2.51 | 4.02 | 2.01 | 3.02 | 5.03 | 100.0 | |||||||||
9 | cHS | 14.29 | 2.20 | 2.20 | 2.20 | 9.16 | 1.83 | 0.73 | 6.96 | 1.10 | 2.2 | 0.37 | 0.73 | 49.45 | 6.59 | 100.0 | ||
10 | tHS | 8.36 | 2.82 | 74.73 | 2.09 | 1.00 | 1.18 | 0.73 | 3.27 | 3.09 | 2.73 | 100.0 | ||||||
11 | cSS | 7.87 | 28.86 | 12.49 | 0.52 | 17.52 | 2.52 | 0.52 | 9.86 | 0.63 | 0.63 | 1.78 | 12.59 | 0.42 | 3.57 | 0.21 | 100.0 | |
12 | tSS | 6.02 | 17.03 | 59.80 | 9.09 | 0.51 | 0.77 | 6.27 | 0.51 | 100.0 |
Family | Notation | AA | AC | AG | AU | CC | CG | CU | GG | GU | UU |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | cWW | 2.51 | 10.70 | 9.26 | 93.84 | 10.00 | 95.62 | 66.12 | 88.77 | 77.14 | |
2 | tWW | 10.18 | 3.08 | 0.48 | 1.59 | 11.67 | 0.69 | 2.20 | 3.37 | 0.59 | 1.79 |
3 | cWH | 0.59 | 6.16 | 0.32 | 6.67 | 1.35 | 4.13 | 41.41 | 0.98 | 5.36 | |
4 | tWH | 13.86 | 1.17 | 2.62 | 0.34 | 13.33 | 0.57 | 4.41 | 16.16 | 1.57 | 5.71 |
5 | cWS | 11.36 | 13.93 | 0.29 | 0.98 | 13.33 | 0.11 | 6.61 | 1.68 | 1.11 | 2.14 |
6 | tWS | 1.18 | 1.76 | 6.79 | 0.11 | 15.00 | 0.20 | 0.55 | 4.31 | 0.71 | |
7 | cHH | 0.39 | 0.05 | 1.35 | |||||||
8 | tHH | 23.60 | 0.88 | 0.73 | 0.17 | 0.14 | 1.65 | 3.37 | |||
9 | cHS | 5.75 | 1.61 | 0.58 | 0.51 | 8.33 | 0.08 | 10.19 | 2.02 | 1.17 | 6.43 |
10 | tHS | 13.57 | 6.45 | 41.61 | 0.49 | 21.67 | 0.37 | 2.20 | 12.12 | ||
11 | cSS | 11.06 | 40.32 | 6.06 | 0.15 | 0.31 | 1.93 | 2.02 | 1.24 | 0.71 | |
12 | tSS | 6.93 | 19.50 | 25.02 | 1.51 | 0.50 | 16.50 | 0.26 | |||
Total | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
References
1. Leontis, N.B. and E. Westhof, Geometric nomenclature and classification of RNA base pairs. RNA, 2001. 7(4): p. 499-512.
2. Leontis, N.B., J. Stombaugh, and E. Westhof, The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Research, 2002. 30(16): p. 3497-531.
3. Stombaugh, J., C.L. Zirbel, E. Westhof, and N.B. Leontis, Frequency and isostericity of RNA base pairs. Nucleic Acids Research, 2009. 37(7): p. 2294-312.