How to refer to amino acids by their single-letter abbreviations in ChemicalData[]?
Well, I guess the only way is to build a list of rules that does the conversion for you. I've done just that, so here it is:
abb = {"A" -> "LAlanine", "R" -> "LArginine", "N" -> "LAsparagine",
"D" -> "LAsparticAcid", "C" -> "LCysteine", "E" -> "LGlutamicAcid",
"Q" -> "LGlutamine", "G" -> "Glycine", "H" -> "LHistidine",
"I" -> "LIsoleucine", "L" -> "LLeucine", "K" -> "LLysine",
"M" -> "LMethionine", "F" -> "LPhenylalanine", "P" -> "LProline",
"S" -> "LSerine", "T" -> "LThreonine", "W" -> "LTryptophan",
"Y" -> "LTyrosine", "V" -> "LValine", "U" -> "Selenocysteine",
"O" -> "LPyrrolysine"}
Now let's use it:
Map[ChemicalData[#, "MolarMass"] &, {"Y", "A", "R", "F"} /. abb]
(* {181.189g/mol, 89.0932g/mol, 174.201g/mol, 165.189g/mol} *)
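For whole sequences, the same rule list can be applied to the characters of a string. Here is a minimal sketch, assuming a hypothetical helper toStandardNames (my own name, not part of ChemicalData) and a small demo rule list so that it runs on its own. It returns a Failure object when a letter has no rule, instead of passing that letter on to ChemicalData unchanged:

```wolfram
(* Small demo rule list so the block is self-contained; in practice pass abb *)
demoRules = {"Y" -> "LTyrosine", "A" -> "LAlanine",
   "R" -> "LArginine", "F" -> "LPhenylalanine"};

(* Hypothetical helper, not a built-in *)
toStandardNames[seq_String, rules_List] :=
 With[{letters = Characters[ToUpperCase[seq]]},
  With[{unknown = Complement[letters, Keys[rules]]},
   If[unknown === {},
    letters /. rules,
    Failure["UnknownResidue", <|"Letters" -> unknown|>]]]]

toStandardNames["YARF", demoRules]
(* {"LTyrosine", "LAlanine", "LArginine", "LPhenylalanine"} *)
```

The result can then be mapped over with ChemicalData[#, "MolarMass"] & exactly as above.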
Not an answer (community wiki)
Thanks to J.M. for a correction to the 3-letter abbreviations.
aaList = {"LAlanine", "LCysteine", "LAsparticAcid", "LGlutamicAcid",
"LPhenylalanine", "Glycine", "LHistidine", "LIsoleucine", "LLysine",
"LLeucine", "LMethionine", "LAsparagine", "LPyrrolysine", "LProline",
"LGlutamine", "LArginine", "LSerine", "LThreonine", "Selenocysteine",
"LValine", "LTryptophan", "LTyrosine"};
Standard Amino Acids (22) incorporated on ribosome
stdAA = ChemicalData[#, "StandardName"] & /@ aaList;
and
nameAA = ChemicalData[#, "Name"] & /@ aaList;
Accepted Amino Acid 1-Letter and 3-Letter Abbreviations
oneLetterAA = Delete[CharacterRange["A", "Z"], {{2}, {10}, {24}, {26}}];
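The positions passed to Delete are those of B, J, X and Z, which are ambiguity codes rather than single amino acids. A quick check using only built-ins (the name alphabet is mine, introduced for illustration):

```wolfram
alphabet = CharacterRange["A", "Z"];

(* The four letters that get dropped *)
alphabet[[{2, 10, 24, 26}]]
(* {"B", "J", "X", "Z"} *)

(* What remains is one letter per amino acid in aaList *)
Length[Delete[alphabet, {{2}, {10}, {24}, {26}}]]
(* 22 *)
```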
threeLetterAA = ReplacePart[StringTake[#, 3] & /@ (StringTrim[#, "L"] & /@ stdAA) /. {"Pyr" ->
"Pyl", "Sel" -> "Sec", "Iso" -> "Ile", "Try" -> "Trp"}, {12 ->
"Asn", 15 -> "Gln"}]
1-Letter to Standard
oneToStdAA = Thread[oneLetterAA -> stdAA]
{A->LAlanine,C->LCysteine,D->LAsparticAcid,E->LGlutamicAcid,F->LPhenylalanine,
G->Glycine,H->LHistidine,I->LIsoleucine,K->LLysine,L->LLeucine,
M->LMethionine,N->LAsparagine,O->LPyrrolysine,P->LProline,Q->LGlutamine,R->LArginine,
S->LSerine,T->LThreonine,U->Selenocysteine,V->LValine,W->LTryptophan,
Y->LTyrosine}
3-Letter to Standard
threeToStdAA = Thread[threeLetterAA -> stdAA]
{Ala->LAlanine,Cys->LCysteine,Asp->LAsparticAcid,Glu->LGlutamicAcid,
Phe->LPhenylalanine,Gly->Glycine,His->LHistidine,Ile->LIsoleucine,
Lys->LLysine,Leu->LLeucine,Met->LMethionine,Asn->LAsparagine,
Pyl->LPyrrolysine,Pro->LProline,Gln->LGlutamine,Arg->LArginine,Ser->LSerine,
Thr->LThreonine,Sec->Selenocysteine,Val->LValine,Trp->LTryptophan,
Tyr->LTyrosine}
Example One
Transpose[{oneLetterAA, threeLetterAA, stdAA, nameAA,
Map[ChemicalData[#, "MolarMass"] &, oneLetterAA /. oneToStdAA]}] //
TableForm
Example Two. The molecular weight of bovine serum albumin (BSA)
aaToMMass = Thread[oneLetterAA -> Map[ChemicalData[#, "MolarMass"] &, oneLetterAA /. oneToStdAA]];
Import Sequence
importedSequence = Import["http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=3336842&rettype=fasta&retmode=text", "Data"];
{bsaSequenceHeading, bsaSequenceData} = {First@#, Rest@#} &@importedSequence;
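The heading/sequence split can be tried without network access. This is a sketch on a made-up FASTA record: the record text and the demo names are hypothetical, and it assumes the import yields one string per line, as the code above does.

```wolfram
(* Hypothetical FASTA text standing in for the imported data *)
demoLines = StringSplit[">demo|hypothetical record\nYARF \nFRAY", "\n"];

(* First line is the heading, the rest are sequence lines *)
{demoHeading, demoData} = {First@#, Rest@#} &@demoLines;

(* Strip whitespace, then split each line into single letters *)
Flatten@Characters@StringReplace[demoData, Whitespace -> ""]
(* {"Y", "A", "R", "F", "F", "R", "A", "Y"} *)
```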
bsaSequenceHeading
>gi|3336842|emb|CAA76847.1| bovine serum albumin [Bos taurus]
Sequence molar mass of bovine serum albumin
bsaMolarMass = Total[# /. aaToMMass] - (Length@# - 1) ChemicalData["Water", "MolarMass"] &@
Flatten@Characters@StringReplace[bsaSequenceData, Whitespace -> ""]
69323. g/mol
69332.0 g/mol
Thanks to J.M. for suggesting ChemicalData["Water", "MolarMass"]. I had originally just multiplied by 18.
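The calculation is the sum of the free amino acid masses minus one water for each peptide bond, that is, (n - 1) waters for n residues. Here is a minimal offline sketch of that arithmetic, assuming a hypothetical helper peptideMass and plain numbers in g/mol: the four amino acid values are the rounded ones from the first answer's output, and 18.0153 is a rounded value for water.

```wolfram
(* Illustrative values only, in g/mol, rounded *)
demoMass = {"Y" -> 181.189, "A" -> 89.0932, "R" -> 174.201, "F" -> 165.189};
waterMass = 18.0153;

(* Hypothetical helper: sum the residue masses, subtract one water per peptide bond *)
peptideMass[seq_String, massRules_List, water_] :=
 With[{residues = Characters[seq]},
  Total[residues /. massRules] - (Length[residues] - 1) water]

peptideMass["YARF", demoMass, waterMass]
```

For a single residue the correction term vanishes and the free amino acid mass is returned unchanged.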
If "MolecularMass" is preferred to "MolarMass":
aaMolecularMass = Thread[ChemicalData["AminoAcids", "StandardName"] ->
ChemicalData["AminoAcids", "MolecularMass"]];
bsaMolecularMass = Total[# /. aaMolecularMass] - (Length@# - 1) ChemicalData["Water",
"MolecularMass"] &@(Flatten@
Characters@StringReplace[bsaSequenceData, Whitespace -> ""] /.
oneToStdAA)
69323. u
As pointed out by J.M., an advantage of using "MolecularMass" is that the result may be converted to kilodaltons:
UnitConvert[%, "Kilodaltons"]
69.323 kDa
Example Three
Putting it all together:
importedSequence2 = Import["http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=AAS66653&rettype=fasta&retmode=text", "Data"];
Flatten@{{UnitConvert[#, "Kilodaltons"] &@(Total[# /. aaMolecularMass]
- (Length@# - 1) ChemicalData["Water","MolecularMass"]), Quantity[Length@# ,
IndependentUnit["amino acids"]]} &@(Flatten@Characters@StringReplace[#, Whitespace -> ""] /.
oneToStdAA) &@Rest@# , First@#} &@importedSequence2
51.2581kDa, 472 amino acids,
>gi|45479207|gb|AAS66653.1| variable surface glycoprotein [Trypanosoma evansi]