The nucleotide sequence of the rubella virus capsid protein (C) gene has been determined from a cDNA clone derived from the 40S genomic RNA. The sequence covers the coding region of the C protein (831 nucleotides), 70 nucleotides of the 5′ untranslated region, and the 5′ end of the downstream E2 membrane protein gene. The capsid gene is unusually rich in C (41.6%) and G (31.2%) residues (G + C 72.8%), and poor in A (15.4%) and U (11.8%) residues. There are long stretches containing up to 45% C or 35% G residues. The codon usage is non-random, with a strong preference for C and G residues in the third position. Starting from two in-frame AUG codons (seven amino acid residues apart), an open reading frame (ORF) was identified that extends in frame into the ORF of the downstream E2 membrane protein gene. Since the amino terminus of the capsid protein is blocked, we could not determine which of the two AUGs serves as the initiation codon. To verify that the deduced ORF was correct, we determined the amino acid sequence of 13 tryptic peptides corresponding to one-third of the C protein. Our data show that the C protein is about 277 residues in length (Mr about 30750). It is very hydrophilic and rich in proline (14.1%) and arginine (14.4%) residues. Clusters of these amino acids are concentrated in the amino-terminal third of the C protein. No sequence homology to the capsid proteins of several alphaviruses was observed. Together with our previous sequence data, this completes the sequence of the genes coding for the structural proteins C, E2 and E1 of rubella virus.
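The composition figures quoted above (percentage of each base, and the preference for C and G in the third codon position) are simple counts over the sequence. The following minimal Python sketch shows one way such values could be computed. The function names `base_composition` and `third_position_gc` and the sequence `toy_orf` are assumptions made for illustration; the toy sequence is invented and is not the rubella virus capsid sequence, which is not reproduced here.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentage of each base (A, C, G, U) in an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # accept cDNA-style input as well
    counts = Counter(seq)
    total = len(seq)
    return {base: 100.0 * counts[base] / total for base in "ACGU"}


def third_position_gc(orf: str) -> float:
    """Return the percentage of complete codons whose third base is G or C."""
    orf = orf.upper().replace("T", "U")
    usable = len(orf) - len(orf) % 3  # ignore any trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    gc_third = sum(1 for codon in codons if codon[2] in "GC")
    return 100.0 * gc_third / len(codons)


# Hypothetical toy ORF, invented for illustration only.
toy_orf = "AUGGCCUCCACCCGCGGCCCCUGA"

composition = base_composition(toy_orf)
gc3 = third_position_gc(toy_orf)

# Length check using the figure from the abstract: 831 coding nucleotides
# correspond to 831 / 3 = 277 codons, matching the stated protein length.
capsid_codons = 831 // 3

print(composition)
print(gc3)
print(capsid_codons)
```

Applied to the real coding region, the same counts would be expected to reproduce the reported values (for example G + C = 41.6% + 31.2% = 72.8%), provided the input is exactly the 831-nucleotide coding sequence.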