Comprehensive Natural Products II: Chemistry and Biology: Volume 9

, because of the mutual chiral displacement. Therefore, the CD Cotton effects of this group are much stronger than those of groups (a) and (b). When determining ACs by CD spectroscopy, it is therefore advisable to take advantage of the large Cotton effects of this group, even when theoretical calculations are used.

9.04.3.5 Theoretical Calculation of CD Spectra

As described above, if the rotational strength R defined by Equation (10) can be calculated by quantum mechanical theory, the CD spectrum can be reproduced by Equation (13), leading to the determination of ACs. Several theoretical methods are available for calculating the rotational strength R:

1. CD exciton chirality method8,11: the simplest and most reliable method, applicable to a variety of natural products. Exciton-coupled CD is based on the coupled oscillator theory, and the mechanism of this method is well established, as briefly explained in the following sections. Numerical calculations using a computer are therefore not necessary.
2. De Voe calculation12,13: a simple method based on the coupled oscillator theory, which is applicable to more complex chiral molecules composed of two or more groups. This method requires numerical calculations using a computer. Some examples are listed in the applications section.
3. π-Electron SCF-CI-DV MO (Self-Consistent Field-Configuration Interaction-Dipole Velocity-Molecular Orbital) method8,14: a molecular orbital (MO) method with the π-electron approximation, which is applicable to chiral molecules having twisted π-electron systems. As this method treats only π-electrons, the computation time is shorter than for methods treating all electrons. Some examples are shown in the application sections.
4. Ab initio MO calculations15: recent years have witnessed great advances in first-principles calculations of chiroptical properties. The development of ab initio methodologies, which include Hartree–Fock, density functional theory (DFT), and high-level correlation methods such as coupled cluster theory, has enabled theoretical simulations of CD, OR, and other chiroptical properties. Since these methods treat all electrons, including σ-electrons, a large amount of computation is necessary, especially for conformationally flexible molecules. Some examples are shown in the application sections.

Figure 6 shows the general scheme for determining the AC of a chiral compound based on theoretical calculations and experimental CD measurements.

[Figure 6 scheme: natural or synthetic chiral compound → experimental CD spectrum. In parallel: conformational analysis by empirical force field MM or density functional theory (DFT) → conformers of different energies → calculated CD curve for each conformer by De Voe coupled oscillator, π-electron SCF-CI-DV MO, or time-dependent DFT (TDDFT) → Boltzmann weighting → calculated CD spectrum → comparison with the experimental CD spectrum.]

Figure 6 The scheme for determining the AC of a chiral compound by theoretical calculation of CD spectrum and comparison with the experimental CD spectrum.
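The Boltzmann weighting step of the scheme in Figure 6 can be sketched as follows. This is a minimal illustration, not the procedure of any particular program: the function names, the use of relative energies in kcal/mol, the temperature of 298.15 K, and the two-conformer input data are all assumptions made for the example, and the per-conformer CD curves are assumed to have been calculated beforehand on a shared wavelength grid.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(rel_energies_kcal, temperature=298.15):
    """Return normalized Boltzmann populations for conformer energies (kcal/mol)."""
    e_min = min(rel_energies_kcal)
    factors = [
        math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in rel_energies_kcal
    ]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_cd(spectra, weights):
    """Population-weighted average of per-conformer CD curves (shared grid)."""
    n_points = len(spectra[0])
    return [sum(w * s[i] for w, s in zip(weights, spectra)) for i in range(n_points)]


# Hypothetical input: two conformers, delta-epsilon values at three wavelengths.
energies = [0.0, 1.0]
spectra = [[-10.0, 0.0, 8.0], [4.0, 0.0, -2.0]]

weights = boltzmann_weights(energies)
averaged = weighted_cd(spectra, weights)
```

The averaged curve is what would then be compared with the experimental CD spectrum; with these illustrative numbers the lower-energy conformer dominates the average.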

100 Characterization by Circular Dichroism Spectroscopy

9.04.4 CD Exciton Chirality Method

9.04.4.1 Basic Principles

The CD exciton chirality method has been successfully applied to a variety of natural products to determine their ACs. This method enables one to deduce the AC of a chiral compound without any reference compound, and it is therefore established as a nonempirical method. The principles of the CD exciton chirality method are explained using the steroidal bis(p-dimethylaminobenzoate) shown below as a model compound, for which the nonempirical nature of this method is easily proved.8,11,16

As exemplified with cholest-5-ene-3β,4β-diol bis(p-dimethylaminobenzoate) 1 in Figure 7, when two identical chromophores (i and j), which exhibit intense UV absorption of their transition (ground state 0 → excited state a), exist in a molecule, these two chromophores interact with each other and the excited state splits into two energy levels (α and β states).8 The ground state (0) remains unsplit. This phenomenon is called exciton coupling or exciton interaction. Thus there are two electronic transitions from the ground state 0 to the excited states α and β, that is, transitions 0 → α and 0 → β. The wavefunction, energy, dipole strength, and rotational strength for the α-state and β-state are formulated as shown in Figure 7, where Vij is defined as the interaction energy between the two electric transition moments μi0a and μj0a. If Vij is positive, the α-state corresponds to the transition at longer wavelength, while the β-state corresponds to the transition at shorter wavelength. As shown in Figure 7, the rotational strength of the α-state, Rα, is opposite in sign to that of the β-state, Rβ, but their absolute values are equal. It should be noted that the sign and magnitude of Rα and Rβ are governed by the triple product Rij · (μi0a × μj0a).8

Figure 7 Theoretical summary of the CD exciton chirality method.

[Figure 8 panels: (a) structure of 1 with benzoate groups i and j, the vector product μi0a × μj0a lying antiparallel to Rij; (b) expected CD pattern: (1) interaction energy Vij > 0; (2) antiparallel, so Rα < 0 and Rβ > 0, giving a negative first and a positive second Cotton effect; (c) observed spectra in EtOH: CD, λext 320.5 nm (Δε −63.1) and 295.5 nm (Δε +39.7); UV, λmax 308 nm (ε 53 200).]

Figure 8 Application of the CD exciton chirality method to cholest-5-ene-3β,4β-diol bis(p-dimethylaminobenzoate) 1: CD and UV spectra in EtOH. Redrawn from N. Harada; K. Nakanishi, Circular Dichroic Spectroscopy – Exciton Coupling in Organic Stereochemistry; University Science Books: Mill Valley, CA, and Oxford University Press: Oxford, 1983.

These equations are next applied to steroidal dibenzoate 1 in Figure 8. For the two electric transition moments μi0a and μj0a in the benzoate chromophores (Figure 8), the interaction energy Vij is positive, and therefore the α-state is lower in energy than the β-state. The two vectors μi0a and μj0a constitute a counterclockwise screw, and so the resultant vector μi0a × μj0a is antiparallel to the distance vector Rij. Therefore the triple product Rij · (μi0a × μj0a) is negative, and so Rα is negative while Rβ is positive. This result leads to the CD spectral pattern shown in Figure 8(b), where the Cotton effect at longer wavelength (named the first Cotton effect) is negative and that at shorter wavelength (the second Cotton effect) is positive. These exciton-coupled CD Cotton effects of mutually opposite sign are called 'bisignate Cotton effects'. This is the theoretical deduction of exciton CD Cotton effects reflecting the AC of the two electric transition moments, that is, of the two chromophores.8

Figure 8(c) shows the UV and CD spectra of the actual compound, cholest-5-ene-3β,4β-diol bis(p-dimethylaminobenzoate) 1. The UV spectrum shows an intense absorption band (λmax 308 nm, ε 53 200), which is polarized along the long axis of the chromophore. The CD spectrum shows negative first and positive second Cotton effects in agreement with the theoretical conclusion: first Cotton effect, λext 320.5 nm, Δε −63.1, and second Cotton effect, λext 295.5 nm, Δε +39.7. The amplitude of the exciton CD is defined as A = Δε1 − Δε2, where Δε1 and Δε2 are the Δε values of the first and second Cotton effects, respectively. In the case of dibenzoate 1, A = −102.8. From these results, one can easily determine the AC of the original glycol.

In Figure 9, the UV spectrum of cholest-5-ene-3β,4β-diol bis(p-bromobenzoate) 2 shows the long-axis-polarized transition at 244 nm, while the CD spectrum shows negative first and positive second Cotton effects (A = −51.6) in agreement with the negative screw sense between the two long axes. This counterclockwise screw sense is directly observed in the X-ray crystallographic stereoview shown in Figure 9, where the 3β-equatorial benzoate chromophore is placed in front and the 4β-axial benzoate in the rear.

From the above-mentioned results, the exciton chirality governing the sign and intensity of Cotton effects is defined as shown in Table 1.8 The qualitative definition of exciton chirality is very simple: (1) if two transition moments constitute a clockwise screw sense, the CD shows positive first and negative second Cotton effects; (2) if they describe a counterclockwise screw sense, negative first and positive second Cotton effects are observed. In most cases, intense exciton-coupled CD Cotton effects are observed at the long-axis-polarized transition, and therefore the above results are rephrased as follows:8

1. If the long axes of two interacting chromophores constitute a clockwise screw sense, the CD shows a positive first Cotton effect at a longer wavelength and a negative second Cotton effect at a shorter wavelength (Table 1 and Figure 10).
2. If they make a counterclockwise screw sense, a negative first Cotton effect at a longer wavelength and a positive second Cotton effect at a shorter wavelength are observed.

In general, the CD zero-crossing point corresponds to λmax of the UV band.
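The amplitude arithmetic quoted above can be checked with a short sketch. The helper name `exciton_amplitude` and the tuple layout are hypothetical choices for this illustration; the only rule taken from the text is that the first Cotton effect is the one at longer wavelength and that A is the first Δε minus the second.

```python
def exciton_amplitude(extrema):
    """Return A = delta_eps(first) - delta_eps(second) for a bisignate couplet.

    `extrema` is a pair of (wavelength_nm, delta_eps) tuples; the first Cotton
    effect is taken to be the one at the longer wavelength.
    """
    first, second = sorted(extrema, key=lambda pair: pair[0], reverse=True)
    return first[1] - second[1]


# CD extrema quoted in the text and in Figure 9.
dibenzoate_1 = [(320.5, -63.1), (295.5, 39.7)]
dibenzoate_2 = [(236.2, 21.2), (243.6, -30.4)]  # input order does not matter

a_1 = exciton_amplitude(dibenzoate_1)
a_2 = exciton_amplitude(dibenzoate_2)

# A negative amplitude corresponds to a negative first Cotton effect,
# i.e. a counterclockwise screw sense according to the rules above.
screw_1 = "clockwise" if a_1 > 0 else "counterclockwise"
```

Rounded to one decimal place, the two amplitudes reproduce the A-values given for dibenzoates 1 and 2.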

[Figure 9 data: CD, λext 243.6 nm (Δε −30.4) and 236.2 nm (Δε +21.2), A = −51.6; UV, λmax 244.0 nm (ε 41 800); X-ray crystallographic stereoview with the 3β-equatorial group i in front and the 4β-axial group j in the rear.]

Figure 9 CD and UV spectra of cholest-5-ene-3β,4β-diol bis(p-bromobenzoate) 2 in 10% 1,4-dioxane/EtOH and X-ray crystallographic stereoview (X-ray, N. Harada, unpublished data).


Table 1 Definition of exciton chirality

Qualitative definition | Quantitative definition | Cotton effects
Positive exciton chirality (+) | Rij · (μi0a × μj0a) Vij > 0 | Positive first (at longer wavelength) and negative second (at shorter wavelength) Cotton effects
Negative exciton chirality (−) | Rij · (μi0a × μj0a) Vij < 0 | Negative first (at longer wavelength) and positive second (at shorter wavelength) Cotton effects

Redrawn from N. Harada; K. Nakanishi, Circular Dichroic Spectroscopy – Exciton Coupling in Organic Stereochemistry; University Science Books: Mill Valley, CA, and Oxford University Press: Oxford, 1983.

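[Figure 10 panels: negative exciton chirality (negative first and positive second Cotton effects) and positive exciton chirality (positive first and negative second Cotton effects); in each panel the CD couplet with amplitude A is drawn above the UV band, with λmax marked on the λ axis.]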

Figure 10 Typical pattern of exciton-coupled CD Cotton effects and UV absorption band.
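The quantitative definition in Table 1 can be evaluated numerically once the two transition moments and the interchromophoric vector are known. The sketch below is an illustration under stated assumptions: Vij is estimated with the point-dipole approximation (one common choice, not something this section prescribes), all quantities are in arbitrary units, and the helper names and the `twisted_pair` test geometry are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def point_dipole_coupling(mu_i, mu_j, r_ij):
    """Interaction energy V_ij in the point-dipole approximation (assumption)."""
    r = math.sqrt(dot(r_ij, r_ij))
    return dot(mu_i, mu_j) / r**3 - 3.0 * dot(mu_i, r_ij) * dot(mu_j, r_ij) / r**5


def exciton_chirality(mu_i, mu_j, r_ij):
    """Sign of R_ij . (mu_i x mu_j) * V_ij: +1 positive, -1 negative, 0 none."""
    q = dot(r_ij, cross(mu_i, mu_j)) * point_dipole_coupling(mu_i, mu_j, r_ij)
    return (q > 0) - (q < 0)


def twisted_pair(angle_deg):
    """Hypothetical geometry: two unit moments perpendicular to R_ij, twisted by angle_deg."""
    t = math.radians(angle_deg)
    return (1.0, 0.0, 0.0), (math.cos(t), math.sin(t), 0.0), (0.0, 0.0, 1.0)


positive = exciton_chirality(*twisted_pair(60.0))
negative = exciton_chirality(*twisted_pair(-60.0))
```

Because the triple product and Vij both change sign when the arbitrary direction of one transition moment is reversed, their product does not, which is why the quantitative definition multiplies the two.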

From the quantitative definition of exciton chirality, some important features are derived.

1. The intensity of the exciton CD (A-value) is inversely proportional to the square of the interchromophoric distance Rij, provided the remaining angular part is the same.8

$$A \propto R_{ij}^{-2} \qquad (16)$$

2. The A-value of exciton split CD is a function of the dihedral angle between the two transition moments. In the case of vicinal glycol dibenzoates, the sign of the exciton split Cotton effects remains unchanged from 0° to 180°. Therefore, the qualitative definition shown in Table 1 is applicable even to a dibenzoate with a dihedral angle of more than 90°. The maximum A-value is found at around 70°.8
3. In the case of chiral 1,1′-binaphthyl and related compounds, it was theoretically calculated that the sign of the exciton CD Cotton effects changes from plus to minus, or vice versa, when the dihedral angle between the two naphthalene planes is changed from 0° to 180°; the zero point is around 110°.17–19 Therefore, when the CD exciton method is applied to these compounds, information on the dihedral angle is necessary. However, the X-ray data of some compounds of this series revealed that the dihedral angle is distributed in the range of 68–92°.20
4. The A-value is proportional to the square of the absorption coefficient ε of the chromophore. Therefore, it is advisable to use chromophores undergoing an intense transition.
5. For exciton coupling systems with three or more chromophores, it was found that the so-called additivity rule holds. For example, for a trimer,

$$A(\mathrm{total}) = A(1,2) + A(1,3) + A(2,3) \qquad (17)$$

where A(1,2), A(1,3), and A(2,3) are the A-values of the component pairs.
6. The Cotton effects of the α- and β-states have rotational strengths of identical magnitude and opposite sign. Namely, the two split Cotton effects are conservative and satisfy the sum rule.

$$\sum_k R_k = 0 \qquad (18)$$

7. Since the rotational strength R is a physically observable quantity, it should be origin independent. The equations shown in Figure 7 satisfy the origin independence of rotational strength.
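Features 1, 4, and 5 lend themselves to quick estimates. The following sketch only encodes the stated proportionalities and the additivity rule; the function names and all numerical inputs are hypothetical, and the scaling helpers assume that the angular part of the geometry stays the same, as required in feature 1.

```python
def scale_by_distance(a_ref, r_ref, r_new):
    """Rescale an A-value using A proportional to R_ij**-2 (Equation (16))."""
    return a_ref * (r_ref / r_new) ** 2


def scale_by_epsilon(a_ref, eps_ref, eps_new):
    """Rescale an A-value using A proportional to epsilon**2 (feature 4)."""
    return a_ref * (eps_new / eps_ref) ** 2


def total_amplitude(pairwise):
    """Additivity rule (Equation (17)): sum the A-values of all chromophore pairs."""
    return sum(pairwise.values())


# Hypothetical numbers, chosen only to illustrate the scaling.
a_far = scale_by_distance(-100.0, r_ref=10.0, r_new=20.0)           # doubled distance
a_strong = scale_by_epsilon(10.0, eps_ref=20000.0, eps_new=40000.0)  # doubled epsilon
a_trimer = total_amplitude({(1, 2): 30.0, (1, 3): -10.0, (2, 3): 15.0})
```

Doubling the distance cuts the amplitude to a quarter, while doubling ε quadruples it, which is the quantitative reason for preferring intensely absorbing chromophores when the interacting groups are far apart.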

9.04.4.2 The Consistency between X-Ray Crystallographic Bijvoet and CD Exciton Chirality Methods

It is well known that the AC of chiral compounds was first determined by Bijvoet et al. in 1951 by X-ray crystallography using the anomalous dispersion effect of heavy atoms.21–23 As discussed above, the CD exciton chirality method enables one to determine ACs in a nonempirical manner without any reference compounds of known AC. These methods are based on totally different physical phenomena, but for a specific chiral compound the X-ray and CD exciton methods should naturally arrive at the same AC. However, it was claimed in 1972 that the ACs determined by the X-ray and CD exciton methods disagreed with each other, and that the ACs determined by the X-ray Bijvoet method should be revised.24–26 This conclusion was based on the X-ray and CD analyses of compounds (−)-5 and (+)-6 in Figure 11, where the CD of the weak ¹Lb transition (290 nm) of the aniline chromophore, polarized along the short axis, was analyzed as an exciton couplet. However, this claim was subsequently retracted as a wrong assignment. Thus the exciton chirality method should be applied to an intense UV transition, as shown in Figure 11, and not to a weak UV transition.

In 1976, the synthesis and CD spectra of the most ideal chiral cage compound (+)-3 with two anthracene chromophores were reported;27–29 the results completely proved the consistency between the X-ray Bijvoet and CD exciton methods (Figure 11). Compound (+)-3 was synthesized starting from diester (+)-4, which was chemically correlated with compounds (−)-5 and (+)-6. The ACs of (−)-5 and (+)-6 had previously been determined by the X-ray Bijvoet method.30,31 As expected, compound (+)-3 shows extremely intense exciton-coupled CD Cotton effects at the strong ¹Bb transition of the anthracene chromophore, polarized along the long axis: λext 268.0 nm (Δε +931.3) and 249.7 nm (Δε −720.8), A = +1652.1 (Figure 11). It was thus shown that the use of a strong UV transition gives rise to intense exciton-coupled CD. Since the UV transition at 267.2 nm is polarized along the long axis of the anthracene chromophore, the exciton split Cotton effects at 268.0 and 249.7 nm are generated by the exciton coupling between the transition moments of the two anthracene groups. The long axes of the anthracene moieties constitute a clockwise screw sense leading to positive first and negative second Cotton effects, and therefore the AC of compound (+)-3 was determined as shown. This result agrees with that determined by the X-ray Bijvoet method (Figure 11). It is now established that both the X-ray and CD exciton chirality methods give correct ACs.27

As discussed above, the CD exciton chirality method enables one to determine ACs in a nonempirical manner. A striking example proving the nonempirical nature and utility of the CD exciton chirality method is the reversal of the ACs of clerodin 7 and the related diterpenes 8 and 9, as briefly explained below (Figure 12). In 1962, the AC of clerodin 7,32 a key compound of the clerodane diterpenes, was determined as shown in Figure 12 by the X-ray Bijvoet method.33 Since this AC was believed to be correct, clerodin 7 was then treated as a reference compound for newly isolated members of this diterpene family. For example, in 1974, the ACs of caryoptin 8 and 3-epicaryoptin 9 were determined to be as shown by CD and/or chemical correlation with 7.

[Figure 11 data: CD of (+)-3, λext 268.0 nm (Δε +931.3) and 249.7 nm (Δε −720.8), A = +1652.1; UV, λmax 267.2 nm (ε 268 000); structures of (+)-3, (+)-4, (−)-5, and (+)-6, with the AC of (+)-3 assigned by CD exciton chirality and those of (−)-5 and (+)-6 by the X-ray Bijvoet method.]

Figure 11 CD and UV spectra of (6R,15R)-(+)-6,15-dihydro-6,15-ethanonaphtho[2,3-c]pentaphene 3 in dioxane/EtOH and chemical correlation between compound 3 and related compounds, where the most ideal chiral cage compound 3 with two anthracene chromophores shows intense exciton-coupled CD Cotton effects, establishing the consistency between X-ray Bijvoet and CD exciton chirality methods. Redrawn from N. Harada; K. Nakanishi, Circular Dichroic Spectroscopy – Exciton Coupling in Organic Stereochemistry; University Science Books: Mill Valley, CA, and Oxford University Press: Oxford, 1983.

However, the observed positive exciton couplet of 3,6-bis(p-Cl-benzoate) 11 derived from 3-epicaryoptin 9 disagreed with the negative one expected from the ACs of 11 and 11a (Figure 12). To explain the discrepancy between the CD and the AC, conformation 11b was proposed, in which one of the benzoate groups adopted a twisted conformation due to an intramolecular hydrogen bond (H-bond), thus generating a positive twist (Figure 12). This result was reported as an exception to the CD exciton chirality method.34 On the other hand, in 1973, the AC of newly isolated clerodendrin A 10 was independently determined by X-ray crystallography and chemical correlation and was shown to be opposite to that of 7.35–37 Thus, it was classified as an ent-clerodane (enantiomeric clerodane) (Figure 12).

As described above, 3,6-bis(p-Cl-benzoate) 11 was reported as an exception to the CD exciton chirality method. To resolve this problem, the steroidal model compound 12 was synthesized in 1978.38 Compounds 12 and 11 have the same relative configurations at key positions, as shown in 12a and 11a (Figure 13). The CD spectrum of 12 showed a positive couplet in agreement with the positive twist of conformation 12a, indicating that the conformation of the benzoate group is not twisted by an intramolecular H-bond. Since 3,6-bis(p-Cl-benzoate) 11 and 12 showed CD couplets of the same sign, 11a should have the same AC as 12a, as shown in Figure 13. From these results, the ACs of 11, 9, and 7 were reversed.38 The above results prompted new X-ray analyses of 7 and 9. The results showed that the original X-ray analysis used to determine the AC of clerodin was incorrect and that the AC as originally assigned should be reversed.39

9.04.4.3 The Use of Preexisting Chromophores in Natural Products for Exciton Coupling

Some natural products already have one or two chromophores, which are useful for observing exciton CD to determine their ACs. The following chromophores are commonly found in natural products.


Figure 12 The absolute configurations of clerodane diterpenes as determined by X-ray crystallography/CD/chemical correlation. However, the ACs of compounds in brackets were later reversed.

9.04.4.3.1 Substituted benzene and polyacene chromophores for the CD exciton chirality method

As demonstrated in Section 9.04.4.2, the ¹Bb transition of polyacene chromophores is ideally suited to observing exciton-coupled CD. The UV data of some polyacenes with D2h symmetry are shown in Figure 14. In the polyacene systems, there is no ambiguity in determining the long and short axes, and therefore the CD exciton chirality method offers more reliable and definite conclusions about the AC.

9.04.4.3.2 Conjugated dienes, enones, ene-esters, ene-lactones, and diene-esters as exciton CD chromophores

The conjugated dienes, enones, etc., shown in Figure 14 are useful chromophores for the CD exciton chirality method. The transition moment of their π–π* band is almost parallel to the long axis of the chromophores, as depicted in the tables.

9.04.4.3.3 Natural products with two chromophores showing exciton CDs: Nondegenerate and degenerate cases

The exciton coupling CD mechanism is also applicable to compounds having two different chromophores that exhibit long-axis-polarized transitions at different wavelengths. This case is called the nondegenerate system because of the different transition energies. On the other hand, if a compound has two identical chromophores, for example, steroidal bis(p-dimethylaminobenzoate) 1 in Figure 7, it is called a degenerate system because the transition energies are the same (degenerate excited state). The ACs of some natural products, such as those in Figure 15, were established conveniently by direct analysis of their CD spectra without any additional chemical modification. In such cases the interaction of at least two preexisting chromophores with suitable electronic and geometrical attributes leads to a very diagnostic exciton split CD band, and hence to assignment of the AC.

Dihydro-β-agarofuran 13 itself has cinnamate and benzoate chromophores, which exhibit exciton CDs at 270.7 and 227.8 nm (see Section 9.04.6.3.4).40 In vinblastine 14, the indole and indoline chromophores interact with each other, generating exciton CDs (see Section 9.04.6.3.13).41 For abscisic acid 15, the opposite AC was once assigned, but it was later revised as shown on the basis of several studies. One was the application of exciton CD to the interaction between the enone and the diene-carboxylic acid chromophores, which shows a positive couplet.42 The AC of dendryphiellin F 16 was determined on the basis of exciton CDs generated by the interaction between the diene and diene-carboxylate chromophores.43 The case of quassin 17 is unique because of the exciton coupling between two identical chromophores, that is, the α-methoxy-enone groups.44 The exciton coupling between the dehydro-tetralone and phthalide chromophores enabled the determination of the AC of arnottin II 18.45

Figure 13 Comparison of exciton CD of model compound 12a with that of 11a led to the reversal of ACs of clerodane and related compounds. The same conclusion was obtained later by X-ray crystallography.

9.04.4.4 Suitable Chromophores for the CD Exciton Chirality Method

In general, natural products contain either only one useful chromophore or none at all. For this reason the selection of suitable chromophore(s) to be introduced into the substrate by chemical derivatization is an important issue when it comes to the determination of AC by CD. With the so-called monochromophoric approach, applicable mainly to rigid substrates, two identical chromophores are introduced in a one-step reaction, usually by acylating the primary or secondary hydroxyl or amino groups. In such cases the exciton coupling provides an intense CD and allows for uncomplicated AC assignments.9,11


Figure 14 Some exciton CD chromophores found in natural products.

The chromophores used for the CD exciton chirality method have to satisfy the following requirements: (1) an intense transition must be present and (2) the direction of the transition moment must be clear from the geometry of the chromophore. Therefore, in general, chromophores of high symmetry are desirable. Figure 16 shows typical chromophores useful for the CD exciton chirality method, where arrows indicate the direction of the transition moment responsible for the exciton-coupled CD. In general, the long-axis-polarized transitions are suitable for exciton CD because of their larger UV intensity; as discussed above, the exciton coupling between strong UV transition moments gives rise to strong CD Cotton effects. The newly introduced chromophores are selected either because of their suitability for exciton coupling with another preexisting chromophore in the substrate or to avoid interaction with such chromophores if the latter possess an electronically complicated structure.

9.04.4.4.1 Para-substituted benzoate chromophores for glycols

As discussed in the above examples of natural products, the intramolecular CT or ¹La transition (230–310 nm) of para-substituted benzoate chromophores is useful for determining the AC of glycols.8 The intramolecular CT transition is polarized along the long axis of the benzoate chromophore, which is almost parallel to the alcoholic C–O bond. Therefore, the AC of the glycol part can be determined from the exciton CD data. On the other hand, ortho- and meta-substituted benzoate chromophores are not suitable for the CD exciton chirality method because their transition moments are not parallel to the alcoholic C–O bond.

9.04.4.4.2 Cinnamate, β-naphthoate, and other chromophores for glycols

These chromophores are also useful because of their absorption at longer wavelength and/or strong absorption intensity.


Figure 15 Examples of natural products with two preexisting chromophores showing exciton CD.

9.04.4.4.3 Tetraphenyl-porphyrin-carboxylic acid

Among the chromophores shown in Figure 16, the tetraphenylporphyrins and metalloporphyrins (see also Figure 17) deserve special attention. They possess a very intense, sharp, and narrow Soret band (ε 450 000–550 000), shifted to the red (at 420 nm). They are also endowed with many other unique geometrical and electronic properties, such as fluorescence, facile modification, variable solubility, and approximately planar geometry. Therefore, the porphyrins and their Zn and Mg derivatives are among the most powerful and versatile CD chromophores. A detailed discussion of the application of porphyrins as CD reporter groups, as well as an account of the theoretical analysis of porphyrin–porphyrin exciton interactions, is available.46,47

The Soret band originates from the two degenerate transitions Bx and By (Figure 17), which are perpendicular to each other; therefore, theoretically, the porphyrin Soret band should be considered as a circular oscillator.47 However, due to rotational flexibility around the meso porphyrin 5-C-phenyl junction (librational averaging), the transitions Bx and By can be represented by one effective transition moment along the 5–15 axis (Figure 17), and the exciton CD reflects the chirality between the two effective transition moments. Tetraphenyl-porphyrin-carboxylic acid (TPP-COOH) is therefore very useful for observing exciton CD because of its large red shift and large ε value.

Most of the chromophores shown in Figure 16 are useful for exciton split CD analysis at short to medium interchromophoric distances of 13–15 Å. For distances up to 50 Å, only the porphyrins and metalloporphyrins can provide couplets sufficiently intense for configurational analysis. Therefore, when the AC of remote stereogenic centers is sought, the tetraarylporphyrin (TPP) and metalloporphyrins are an excellent choice. In cases where the configurational analysis involves remote stereogenic centers with C–C distances of approximately 8–9 Å and interchromophoric distances Rij of approximately 13–14 Å, the observed CD couplet becomes very weak or even undetectable with chromophores having weak or even moderate absorption bands, such as benzoate or substituted benzoates. This is because the amplitude A is inversely proportional to the square of the interchromophoric distance Rij (see Equation (16)). A striking increase in the A-value is seen with TPP and its Zn derivative (Zn-TPP). The Zn-TPP derivative exhibits an A-value more than 10-fold larger than that of p-dimethylaminobenzoate at an Rij distance of 24.0 Å. Other examples of efficient porphyrin–porphyrin CD coupling over 40–50 Å can be found in Matile et al.48

Figure 16 Typical chromophores useful for the CD exciton chirality method, where arrows show the direction of transition moment responsible for the exciton-coupled CD.

[Figure 17 data (CH2Cl2): bis(Zn-TPP) derivative, CD λext 424 nm (Δε +153) and 416 nm (Δε −117), A = +270; UV–Vis λmax 419 nm (ε 914 000); monomer, λmax 419 nm (ε 550 000).]

Figure 17 UV–Vis and fluorescence data for TPP-COOH 19 and Zn-TPP-COOH 20. Bottom: bis(tetraarylporphyrin) derivatives of 5-cholestane-3,17-diol 23 and 24: a positive helicity between the two effective transition moments defined in direction 5C/15C, the interchromophoric distance Rij, and the UV–Vis and CD spectra in CH2Cl2; UV ε, Rij values, and CD amplitudes A of other bischromophoric derivatives of 5-cholestane-3,17-diol 21 and 22.

9.04.4.4.4 Benzamido and C2v-symmetrical 2,3-naphthalenedicarboximido chromophores for amino alcohols and diamines

The CD exciton chirality method is also applicable to the intramolecular CT band of benzamido groups. The transition is polarized along the long axis of the chromophore. However, in some cases, the benzamide moiety exists as a mixture of (E) and (Z) isomers, and therefore the mutual orientation of the transition moments is uncertain. In these situations, one should be cautious in assigning the AC by CD. The 2,3-naphthalenedicarboximide chromophore exhibits an intense ¹Bb transition around 260 nm, which is polarized along the long axis of the chromophore. This C2v-symmetrical chromophore is ideally suited to the CD exciton chirality method because the long-axis-polarized transition moment is exactly parallel to the C–N bond of the amine moiety. This is an advantage of the 2,3-naphthalenedicarboximide group, and hence the use of this chromophore is highly recommended for primary amines.

9.04.4.4.5 Bichromophoric methods and derivatization

For acyclic or conformationally flexible natural products, the bichromophoric approach is suitable, where chromophores with very different λmax are introduced selectively by two-step protocols. When chromophores whose absorption maxima span 50–100 nm are introduced, the coupling leads to a CD curve with a unique, fingerprint shape, depending on the absolute twist between the interacting chromophores and the conformational population in the solvent employed. The comparison of such curves, characteristic for each solvent, with the corresponding reference curves of known standards leads to a configurational assignment of several stereogenic centers at the same time, although in a semiempirical manner. This approach was successfully applied to 1,2- and mixed 1,2-/1,3-polyols and amino alcohols.49–51 Figure 18 illustrates a submicroscale chemical protocol developed for the analysis of sphingosines and dihydrosphingosines isolated from new cell lines. First, the NH2 group of D-erythro-sphingosine 25 was blocked as a naphthimido group, yielding derivative 26. Then the OH groups were converted to 2-naphthoate groups, affording derivative 27, which can be sensitively detected by high-performance liquid chromatography (HPLC), mass spectrometry, CD, and fluorescence analysis. Upon comparison of the observed CD with the standard CD curves of erythro- and threo-sphingosines/dihydrosphingosines, the relative configuration and AC can be assigned.52

9.04.4.4.6 Chromophores for carboxylic acids and olefin compounds

The chromophores suitable for chiral carboxylic acids are listed in Figure 16. The application of the exciton chirality method to olefin compounds is unique and interesting. The isolated olefin group shows a transition below 200 nm, and therefore the exciton method is not applicable in a straightforward manner. However, by the use of olefin metathesis, the chromophores shown in Figure 16 can be introduced and exciton CD can be used for determining ACs.

9.04.4.4.7 Natural products with one preexisting chromophore useful for exciton coupling

If a natural product contains one chromophore that is useful as a partner for exciton coupling, the second chromophore can be introduced by chemical derivatization to determine the AC by exciton CD. The newly introduced chromophore is selected for optimal exciton coupling with the preexisting chromophore. Thus a chromophore showing a UV λmax similar to that of the preexisting chromophore is effective for exciton coupling CD.


Figure 18 By a selective two-step microscale chemical derivatization procedure, two different types of chromophores are introduced in D-erythro-sphingosine 25.

9.04.4.4.8 Natural products with preexisting chromophore not useful for exciton coupling: Use of red-shifted chromophores

If a natural product has a preexisting chromophore that may disturb the observation of exciton CD, it is advisable to choose chromophores with a longer-wavelength UV λmax than that of the preexisting chromophore to avoid the overlap of Cotton effects. Red-shifted chromophores shown in Figure 19 are useful for this purpose. As shown in Figure 19, taxinine derivative -glycol 28a shows a very intense positive CD Cotton effect due to the transition of the strongly strained enone group around 263 nm. In the previous application of the exciton chirality method, unsubstituted benzoate chromophores were used and a negative exciton couplet was clearly observed despite the overlap with the enone Cotton effect.53 To avoid the overlap of the exciton CD with the enone Cotton effect, a red-shifted chromophore (chrom-3) was used for derivatization, yielding ester 28b. As expected, the CD of 28b exhibited a clear negative exciton couplet, indicating a counterclockwise screw sense between the two hydroxyl groups, in full agreement with the previous report of the AC.54
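The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels chrom_a, chrom_b, and chrom_c are hypothetical placeholders for the three chromophores whose UV λmax and ε values are printed in Figure 19, the benzoate entry uses approximate illustrative numbers, and the 50 nm minimum separation is an assumed threshold rather than a value given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chromophore:
    label: str            # hypothetical placeholder label
    lambda_max_nm: float  # UV absorption maximum in nm
    epsilon: float        # molar absorption coefficient


def red_shifted_candidates(candidates, preexisting_nm, min_gap_nm=50.0):
    """Return chromophores absorbing at least min_gap_nm to the red of the
    preexisting band, strongest absorber first.

    min_gap_nm is an assumed threshold chosen for illustration only.
    """
    kept = [c for c in candidates
            if c.lambda_max_nm - preexisting_nm >= min_gap_nm]
    return sorted(kept, key=lambda c: c.epsilon, reverse=True)


# The enone Cotton effect of 28a lies around 263 nm (value from the text).
ENONE_NM = 263.0

candidates = [
    Chromophore("benzoate (illustrative values)", 230.0, 15000.0),
    Chromophore("chrom_a", 382.0, 34000.0),  # values printed in Figure 19
    Chromophore("chrom_b", 382.0, 27000.0),  # values printed in Figure 19
    Chromophore("chrom_c", 410.0, 37000.0),  # values printed in Figure 19
]

selected = red_shifted_candidates(candidates, ENONE_NM)
for c in selected:
    print(f"{c.label}: {c.lambda_max_nm:.0f} nm, epsilon {c.epsilon:.0f}")
```

Ranking the surviving candidates by ε is a design choice of this sketch; in practice the choice of chromophore would also depend on the derivatization chemistry available for the functional groups at hand.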

Figure 19 Red-shifted chromophores and application to taxinine system. [CD (Δε) and UV (ε × 10–4) spectra versus λ/nm, 200–500 nm. Dotted line: 28a, R = H; solid line: 28b, R = chrom-3 ester. Labeled CD extrema: 263 nm (Δε +25), 389 nm (Δε +16), 455 nm (Δε –25). Labeled UV maxima: 274 nm (ε 58 000), 412 nm (ε 63 000). Chromophore UV λmax values shown: 382 nm (ε 34 000), 382 nm (ε 27 000), 410 nm (ε 37 000).]

9.04.4.5 Supramolecular Approach in Exciton Chirality Method – Application of Porphyrin Tweezers

Recently, the use of TPPs and their metal derivatives as useful CD chromophores was extended by the development of a totally new supramolecular approach for the determination of the AC of chiral compounds that contain a single stereogenic center and only one site for chromophoric derivatization. This group includes various natural products carrying only a single functionality, such as secondary hydroxyl, primary or secondary amino, and carboxyl groups. They are unsuitable for the conventional exciton chirality approach, in which at least two intramolecularly interacting chromophores are necessary.

The supramolecular approach mentioned above employs a dimeric zinc porphyrin reagent, now available under the name ‘Zn-tweezer’. The latter is capable of forming 1:1 host–guest complexes upon addition of a solution of an N,N-bidentate conjugate, prepared by reacting the chiral substrate with an achiral trifunctional bidentate carrier as shown in Figure 20.55,56 Interestingly, the observed facile N/Zn coordination to a Zn-porphyrin tweezer and the formation of a 1:1 sandwiched chiral host–guest complex proceed under steric control and usually lead to a very intense exciton-coupled bisignate CD spectrum in the Soret region. The origin of such intense CD couplets lies in the predominant presence of conformers with a preferred interporphyrin helicity in which the larger group L protrudes from the binding pocket in order to avoid unfavorable steric interactions. Therefore the chiral sense of twist between the two porphyrins in the complex is dictated by the steric orientation of L and M at the stereogenic center of the substrate. Provided there is no ambiguity in the assignment of the L and M groups, the sign of the couplet determines the AC at this center. Over the past years the search for a more reliable discrimination of the relative steric size of L and M and for a theoretical prediction of the preferred interporphyrin helicity of the host–guest complex has led to the development of a molecular mechanics calculation protocol using the Merck Molecular Force Field (MMFF) coupled to Monte Carlo-based conformational analysis.57 The porphyrin tweezers method is now well established and has allowed successful determination of the AC of some natural products, such as isotomenoic acid 36, an irregular diterpene,58 and bovidic acid 37, an 18-carbon hydroxyfuranoid acid59 (Figure 21). More recently, other types of porphyrin-based tweezers have been developed. Structural changes in the tweezer, such as the introduction of various substituents at the aryl groups and in the bridge between the two porphyrins, allow for tuning the complexation ability of the tweezer and for extending its application to other types of chiral substrates.60–62

Figure 20 Formation and CD of 1:1 host–guest complex between achiral Zn-porphyrin tweezer and chiral substrate. (a) A reaction of the carrier molecule 30 with a starting substrate 29 (secondary alcohol or primary amine; X = O, NH) leads to formation of bidentate chiral conjugate 31 (guest), which upon mixing with an achiral Zn-porphyrin tweezer 32 yields a 1:1 host–guest complex 33. (b) Example of the formation with (S)-α-(2-naphthyl)ethanol 34 of a host–guest complex 35 in two conceivable conformations with opposite interporphyrin twist. The one where the L (larger) group protrudes away from the P-1/P-2 binding pocket is preferred and has a positive twist between the two porphyrins. This gives rise to a characteristic exciton split CD with positive amplitude A = +170 (in methylcyclohexane), in agreement with the (S)-absolute configuration of the starting substrate. [Labeled values in (b): CD 433 nm (Δε +91), 423 nm (Δε –79); wavelength labels 422.2 and 416.6 nm in the UV/Vis panel; λ axis 400–450 nm.] Redrawn from N. Berova; L. Di Bari; G. Pescitelli, Chem. Soc. Rev. 2007, 36, 914–931.

Figure 21 Applications of the porphyrin tweezers method to natural products.
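The way the sign of the couplet is read from a spectrum can be made concrete with a small Python sketch. It assumes the usual convention that the amplitude A is the Δε of the first (longer-wavelength) Cotton effect minus the Δε of the second (shorter-wavelength) one; the function names are hypothetical. The input values are those printed in Figure 20 (433 nm, Δε +91; 423 nm, Δε –79).

```python
def couplet_amplitude(first_ce, second_ce):
    """Amplitude A of an exciton couplet.

    Each argument is a (wavelength_nm, delta_epsilon) pair. The first Cotton
    effect is the one at longer wavelength, and
    A = delta_epsilon(first) - delta_epsilon(second).
    """
    (lam1, de1), (lam2, de2) = first_ce, second_ce
    if lam1 <= lam2:
        raise ValueError("first Cotton effect must lie at longer wavelength")
    return de1 - de2


def couplet_sign(amplitude):
    """Classify the couplet as 'positive', 'negative', or 'none'."""
    if amplitude > 0:
        return "positive"
    if amplitude < 0:
        return "negative"
    return "none"


# Host-guest complex 35, values printed in Figure 20.
a_fig20 = couplet_amplitude((433.0, 91.0), (423.0, -79.0))
print(a_fig20, couplet_sign(a_fig20))
```

For these inputs the amplitude reproduces the A = +170 quoted in the caption of Figure 20. Translating a positive or negative couplet into an AC still requires the conformational argument about the L and M groups given in the text; the sketch covers only the arithmetic.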

9.04.5 Induced CD

The enormous attention to and advances in supramolecular chemistry in the past few decades have stimulated interest in CD arising from different types of intermolecular interactions. Four typical situations are encountered: (1) A chiral (nonracemic) ‘guest’ and an achiral chromophoric compound as ‘host’, for example, crown ethers, calixarenes, atropisomeric biaryls, and bis-porphyrin systems, can form a chiral host–guest complex, which exhibits an induced CD (ICD) within the absorption bands of the host.63 (2) Conversely, a small achiral guest molecule, whose chromophore is therefore chiroptically inactive, may produce an ICD upon binding to a biopolymer host, such as proteins,64 polypeptides, oligonucleotides,65 or oligosaccharides (notably including cyclodextrins),63 owing to the chiral perturbation by the biopolymer host. (3) A third case is when a coupling between several guest molecules bound to different sites of a macromolecular host results in a diagnostic CD spectrum.66 (4) A chiral, nonchromophoric ligand binds to a metal ion with observable d- or f-type transitions in the UV–Vis spectrum, making them CD active. In several cases, CD lends itself not only to the detection of host–guest interactions, but also to the analysis of binding modes, association–dissociation kinetics, and thermodynamics (see Section 9.04.6.5.2). Figure 22 shows an interesting example of ICD of type (1), where achiral resorcinol-dodecanal cyclotetramer 40 interacts with D-(+)-fucose 41 to form a chiral host–guest complex, the CD spectrum of which shows positive and negative Cotton effects around 305 and 290 nm, respectively. Upon the host–guest interaction, host 40 takes chiral conformations, in which the four resorcinol rings are chirally twisted to generate an induced bisignate CD. When L-(–)-fucose 41 was used, opposite CDs were observed. Based on these results, the use of host 40 as a supramolecular probe for the assignment of ACs of chiral guests was reported.67
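The observation that enantiomeric guests give opposite ICD curves can be checked numerically with a simple similarity index. The Python sketch below is only an illustration: the spectra are synthetic Gaussian bands placed at the 305 and 290 nm positions quoted above, with arbitrary unit intensities and an assumed band width, and the function names are hypothetical.

```python
import math


def gaussian_band(wavelengths, center, height, width=8.0):
    """Gaussian band shape; the width is an arbitrary illustrative value."""
    return [height * math.exp(-((w - center) / width) ** 2)
            for w in wavelengths]


def cd_similarity(spec_a, spec_b):
    """Normalized overlap of two CD curves sampled on the same grid.

    +1 means identical shape, -1 means mirror-image curves.
    """
    dot = sum(x * y for x, y in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(x * x for x in spec_a))
    norm_b = math.sqrt(sum(y * y for y in spec_b))
    return dot / (norm_a * norm_b)


wavelengths = [260.0 + 0.5 * i for i in range(161)]  # 260-340 nm grid

# Synthetic ICD of the complex with the D guest: positive band near 305 nm
# and negative band near 290 nm (positions from the text, heights arbitrary).
positive = gaussian_band(wavelengths, 305.0, 1.0)
negative = gaussian_band(wavelengths, 290.0, -1.0)
icd_d = [p + n for p, n in zip(positive, negative)]

# Idealized ICD of the complex with the L guest: exact mirror image.
icd_l = [-x for x in icd_d]

print(round(cd_similarity(icd_d, icd_d), 6))
print(round(cd_similarity(icd_d, icd_l), 6))
```

With measured spectra the index would fall between the two limits, and how close to –1 it must be before two curves are called mirror images is a judgment the sketch does not make.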


Figure 22 Induced CD of complexes of achiral host 40 with chiral sugar guests 41. The CD λext data were obtained from the published spectra.

9.04.6 Characterization of Natural Products by CD – Selected Examples

As discussed above, CD spectroscopy is useful for the characterization of natural products. The following sections illustrate the application of CD spectroscopy to structural studies of natural products. The cases are (1) CD and solvent-dependent conformational change, (2) determination of AC by comparison of CD spectra, (3) application of the CD exciton chirality method, (4) CD of atropisomers, (5) determination of ACs by theoretical calculation of CD spectra, and (6) supramolecular systems and CD spectra.

9.04.6.1

CD and Solvent-Dependent Atropisomerism of Antibiotic FD-594

Antibiotic FD-594 42 exhibited almost opposite CD curves in CHCl3 and MeOH owing to solvent-dependent atropisomerism, which was confirmed by 1H nuclear magnetic resonance (NMR) coupling constants68 (Figure 23). The AC of 42 was determined by X-ray crystallography as shown. A strong negative CD around 270 nm in CHCl3 implies a negative exciton coupling between the two aromatic chromophores. In MeOH, the helicity is inverted to generate a strong positive CD around 270 nm. Similar behavior was observed with aglycon 43.

9.04.6.2

Determination of Absolute Configuration by Comparison of CD Spectra

9.04.6.2.1

Absolute configuration of thysanone isolated from Thysanophora penicilloides

The (1R,3S) AC of thysanone 44, a fungal benzoisochromanquinone with potent rhinovirus 3C-protease inhibitory activity, was determined by comparison of the CD spectrum of authentic natural thysanone with that of a synthetic sample prepared by total synthesis from (S)-ethyl lactate69 (Figure 24).

9.04.6.2.2 Absolute configurations of mutafurans A–G isolated from Bahamian sponge Xestospongia muta

The ACs of mutafurans A–G 45–51, brominated ene-yne tetrahydrofurans (THFs) isolated from the Bahamian sponge X. muta, were determined by comparison of CD spectra as shown in Figure 25. The observed CD Cotton effects are very weak because the conjugated ene-yne chromophore is only weakly perturbed by the chirality of the THF ring. On the other hand, the terminal bromo-diene or bromo-ene chromophore does not contribute to the CD because it is remote from the chiral THF ring. As reference compounds, two model compounds (–)-d 52 and (+)-e 53 with the ene-yne THF moiety were synthesized starting from (R)-(+)-epoxyhexane. Since the CD Cotton effects of (–)-d 52 are the same in sign as those of natural products 45–51, their ACs were determined as shown.70

9.04.6.2.3

Absolute configuration of ciguatoxin

The 2S configuration of ciguatoxin (CTX, 54)71 was assigned on the basis of the CD exciton chirality data of the tetrakis(p-Br-benzoate) of 54 and the tris(p-Br-benzoate) of the AB fragment (Figure 26). This was later confirmed by chemical degradation and comparison with an authentic sample. The AC of C5 in CTX4A 55 was determined by comparison of the CD spectrum of stereoselectively synthesized p-Br-benzoate 56a, containing the AB ring fragment of CTX4A, with that of tris(p-Br-benzoate) 55a of CTX4A. Both compounds show intense exciton CDs of positive chirality, which are caused by the interaction between the 1,3-diene and p-Br-benzoate chromophores. Since the relative configurations of CTXs have been determined by intensive NMR spectral studies, the ACs of CTXs were determined as illustrated. It should be noted that because of the extremely limited availability of CTXs, these studies were carried out using 5–100 μg samples. These ACs of the CTXs were later confirmed by total synthesis.

Figure 23 Solvent-dependent atropisomerism of antibiotic FD-594. Redrawn from T. Eguchi; K. Kondo; K. Kakinuma; H. Uekusa; Y. Ohashi; K. Mizoue; Y.-F. Qiao, J. Org. Chem. 1999, 64, 5371–5376.

Figure 24 Thysanone and CD data.

Figure 25 Mutafurans A–G 45–51 and CD spectra (Δε versus λ, 220–280 nm) of compounds a–e, including model compounds (–)-d and (+)-e, at 25 °C in hexane. Redrawn from B. I. Morinaka; C. K. Skepper; T. F. Molinski, Org. Lett. 2007, 9, 1975–1978.


Figure 26 Absolute configuration of ciguatoxin and CD data.

9.04.6.3

Determination of Absolute Configuration by the CD Exciton Chirality Method

9.04.6.3.1

Application of the CD exciton chirality method to acyclic 1,2-glycols

To determine the ACs of acyclic 1,2-glycols, the CD exciton chirality method has been applied to their dibenzoates or bis(2-anthroates), which show typical bisignate Cotton effects (see Section 9.04.4) as exemplified in Figures 27 and 28.72,73 Acyclic dibenzoates or bis(2-anthroates) can rotate around the bond connecting the two benzoate or 2-anthroate chromophores, and therefore the CD sign depends on the conformational equilibrium. From the data of many examples, general rules were derived as shown in Figures 27 and 28.

In the case of the diesters of a terminal 1,2-glycol, CD and AC are correlated as shown in Figure 27. For example, the diester 57 (bis(p-Br-benzoate)72 or bis(2-anthroate)73) with the AC as shown adopts three rotational conformers 57A, 57B, and 57C, among which the conformer 57B is unstable because of two gauche relationships among three bulky groups. On the other hand, the conformers 57A and 57C each have one gauche relationship between two bulky groups, and therefore they are stable and dominant in the equilibrium. The stable conformer 57A has a positive exciton chirality between the two chromophores, while in conformer 57C the two chromophores are in a trans-relationship, and therefore no exciton chirality is generated. Thus the CD spectrum of diester 57 reflects the positive exciton chirality of conformer 57A. The 1H NMR coupling constants (J(trans) = 6.8–8.4 Hz, J(gauche) = 3.6 Hz) support this conclusion. The CD spectrum of (S)-1,2-propanediol bis(2-anthroate) 59 shows very intense exciton Cotton effects, from which the AC of this compound could be assigned (Figure 27).73 If a terminal 1,2-diester has the opposite AC, as shown in 58, the opposite CD is obtained. Thus the AC of terminal 1,2-glycols can be determined by the CD exciton chirality method.

In a similar manner, the CD exciton chirality method is applicable to internal 1,2-glycols with threo-configuration (Figure 28).72,73 The exciton chirality between the two chromophores depends on the rotational conformation. For example, the diester 60 (bis(p-Br-benzoate) or bis(2-anthroate)) with the AC as shown adopts three rotational conformers 60A, 60B, and 60C, among which the conformers 60B and 60C are unstable because of three gauche relationships among four bulky groups. On the other hand, the conformer 60A has two gauche relationships between bulky groups, and hence it is stable and dominant in the equilibrium. The conformers 60A and 60B have positive and negative twists between the two chromophores, respectively, while in the conformer 60C the two chromophores are in a trans-relationship, and therefore no exciton chirality is generated. The 1H NMR coupling constant (J(trans) = 6.1–8.7 Hz) supports the preference of the conformer 60A. Consequently, the CD spectrum of diester 60 reflects the positive chirality of conformer 60A. The CD spectrum of (2S,3S)-2,3-butanediol bis(p-Br-benzoate) 64 shows a positive exciton couplet, from which the AC of this compound could be assigned (Figure 28).72 If an internal 1,2-glycol has the opposite AC, the opposite CD Cotton effects are observed as shown in 61.

The above relationship between the AC and the exciton CD Cotton effects holds for most internal 1,2-glycols. However, if a glycol has polar or extremely bulky groups (R1 and R2), the conformational equilibrium is changed. In such a case, the two polar or extremely bulky groups R1 and R2 adopt a trans-relationship to diminish the electrostatic or steric repulsion, and therefore the conformer 60B becomes dominant. The preference of the conformer 60B is supported by the 1H NMR coupling constant (J(gauche) = 2.9–4.1 Hz). The CD spectrum of (2R,3R)-diethyl tartrate bis(p-Br-benzoate) 65, in which the two polar ethyl ester groups adopt a trans-relationship, shows a negative exciton couplet reflecting the preference of the conformer 60B.72 Thus the AC of internal 1,2-glycols can be determined by the CD exciton chirality method in conjunction with 1H NMR analysis. If the groups R1 and R2 are identical, the 1H NMR vicinal coupling constant between the two methine protons cannot be obtained from the routine NMR spectrum because the protons have the same chemical shift. In such a case, the 1H NMR 13C satellite band method is useful to determine the Jvic value.72,74

In the case of erythro-1,2-glycols, the determination of AC is more difficult. If the two groups R1 and R2 are identical, the glycol is a meso-isomer and hence achiral. If they are different, the glycol is chiral. In general, the exciton CD Cotton effects of erythro-diesters are weak and depend on the equilibrium of the rotational conformations. Therefore, the assignment of ACs requires further conformational analysis by other methods, for example, the nuclear Overhauser effect (NOE).73 The AC of 1,3-glycols can also be assigned in a similar manner.75–77

Figure 27 Applications of the CD exciton method to acyclic terminal 1,2-glycols. [CD (Δε) and UV (ε × 10–4) spectra of (S)-59, Chrom = 2-anthroyl, in CH3CN, 200–350 nm. Labeled data: CD 273 nm (Δε +97), 253 nm (Δε –93), A = +190; UV 258 nm (ε 147 000).] Redrawn from I. Akritopoulou-Zanze; K. Nakanishi; H. Stepowska; B. Grzeszczyk; A. Zamojski; N. Berova, Chirality 1998, 9, 699–712.

Figure 28 Applications of the CD exciton method to acyclic internal 1,2-glycols with threo-configuration. [Conformer labels: 60A, J(trans) = 6.1–8.7 Hz; 60B, J(gauche) = 2.9–4.1 Hz; 60C, exciton chirality zero. Labeled data: (2S,3S)-64, CD 252.8 nm (Δε +12.4), 238.2 nm (Δε –8.5), A = +20.9, UV 244.0 nm (ε 36 200); (2R,3R)-65, CD 254.3 nm (Δε –40.7), 237.0 nm (Δε +6.9), A = –47.6, UV 245.0 nm (ε 40 300).] Redrawn from N. Harada; A. Saito; H. Ono; S. Murai; H.-Y. Li; J. Gawronski; K. Gawronska; T. Sugioka; H. Uda, Enantiomer 1996, 1, 119–138.

9.04.6.3.2

Absolute configuration of urothion

To determine the AC of urothion 66, a yellowish pteridine pigment isolated from human urine, the compound was subjected to desulfurization with Raney-Ni, yielding a product 67, which was converted to tris(p-Cl-benzoate) 68 (Figure 29).78 On the other hand, authentic samples of (S)-67a and (R)-67b were synthesized starting from D-glucose. Since the [α]D values of 67, 67a, and 67b were too small to assign their ACs by comparison, tris(p-Cl-benzoates) 68, 68a, and 68b were prepared and their CD spectra compared. The CD spectrum of 68 agreed with that of (S)-68a, and therefore, the AC of urothion 66 was determined to be R. The bisignate Cotton effects at 247 and 228 nm originate mainly from the exciton coupling between the two benzoate groups in the side chain. According to the exciton chirality method applied to acyclic 1,2-glycols (Section 9.04.6.3.1), the positive sign of the first Cotton effect leads to the S configuration, which agrees with that obtained by comparison of CD spectra.

9.04.6.3.3

Absolute configuration of cephalocyclidin A, a five-membered ring cis-α-glycol

The unprecedented pentacyclic structure of cephalocyclidin A 69, a cephalotaxus alkaloid, was elucidated on the basis of X-ray crystallography, 1H NMR, and CD analysis79 (Figure 30). The presence of a tetra-substituted benzenoid ring in the intact cephalocyclidin does not allow assignment of the AC from CD. However, upon p-methoxycinnamoylation of the secondary hydroxyl groups at 2-C and 3-C, the derivative 70 provided a useful, though rather weak, bisignate CD band associated with a negative exciton coupling; the weakness is due to the small dihedral angle between the cinnamate chromophores. The CD couplet, sufficiently removed from other aromatic transitions, allowed for a straightforward assignment of the (2R,3S) AC, and eventually of the remaining four stereogenic centers, which were determined by taking into account the known relative configurations from NMR and X-ray analysis.

9.04.6.3.4

Absolute configuration of dihydro-β-agarofuran sesquiterpene

A screening program of South American medicinal plants for drugs resistant to parasites yielded a number of dihydro-β-agarofuran sesquiterpenes from the roots of Maytenus magellanica.40 Because of their unique ability to block P-glycoprotein exporter activity, these compounds are considered to be privileged structures (Figure 31). Compound 13 (see Figure 15), the most active of this series, is representative of these new sesquiterpenes, which were isolated based on their activity against a multidrug-resistant strain of Leishmania tropica overexpressing a P-glycoprotein-like exporter. As seen in the structure, 13 contains cinnamate and benzoate esters on carbons 1 and 9 in an ideal 1–3 relationship for determining their AC by taking advantage of the sign of the anticipated exciton couplet in its CD spectrum. In other words, the long axes of the transition dipole moments of these esters, depending on their absolute stereochemical relationship, will describe either a right- or a left-handed twist, as evidenced by the sign of the exciton couplet in the CD spectrum. In the event, the CD shows a clear positive exciton couplet with extrema at around 270 and 226 nm. Thus, this positive, right-handed relationship defines the absolute stereochemistry of this family of sesquiterpenes.

Figure 29 Urothion and CD data.

Figure 30 Absolute configuration of cephalocyclidin A 69 as determined by CD exciton chirality method.

Figure 31 Dihydro-β-agarofuran sesquiterpene 13 and CD data.

9.04.6.3.5

Absolute configuration of phomopsidin

The CD spectrum of phomopsidin 71, a marine-derived fungal metabolite, shows only one very weak Cotton effect at 266 nm associated with the diene-carboxylic acid chromophore at C-6, which has a moderate UV absorption at 266 nm80 (Figure 32). The Cotton effects due to the absorption of the two isolated double bonds below 200 nm were difficult to measure. Since the observed single Cotton effect was unsuitable for a determination of the AC, the phomopsidin methyl ester was subjected to esterification with p-nitrobenzoyl chloride. As expected, the CD spectrum of the corresponding 11-p-nitrobenzoate derivative 72 exhibited a clear-cut positive exciton couplet arising from a through-space interaction between the dienoate and p-nitrobenzoate chromophores, whose electric transition moments and twisted axial/equatorial orientation, respectively, fit well the requirements for a nondegenerate exciton coupling.

Figure 32 CD and UV data of phomopsidin 71 and phomopsidin methyl ester p-nitrobenzoate 72.

9.04.6.3.6

Absolute configuration of spiroxin A, a bis-acetophenone fungal metabolite

Spiroxin A 73 is the major component of a group of metabolites isolated from fermentations of the fungus LL-37H248 (Figure 33).81 It is a bis-acetophenone with a spiroketal grouping at carbon 4 that locks the two conjugated chromophores in either a right- or a left-handed twist. The relative stereochemistry had previously been established by NMR. The CD spectrum of spiroxin A 73, however, is complex in the 200–280 nm region and was difficult to interpret. Thus, the AC could not be assigned from the CD of the intact molecule. To resolve this issue, esterification of the two phenolic hydroxyls with retinoic acid was carried out, because all-trans retinoic acid methyl ester has a λmax of 356 nm (ε 39 500), well red-shifted from the absorption of the existing chromophore. Furthermore, the transition dipole moment is aligned parallel to the all-trans polyene, providing for a potentially clear interpretation of the CD of the two interacting retinoate chromophores. Microscale derivatization gave access to spiroxin A bis(retinoate) 74, whose CD showed no clear exciton couplet in the retinoic acid region. Thus the CD of spiroxin A 73 itself was subtracted from that of the bis(retinoate) 74 to give the difference spectrum, which then showed a clear negative exciton couplet at λext 385 nm (Δε –17.3) and 331 nm (Δε +17.4). This permits an unequivocal assignment of the twist as left-handed and thus of the AC as shown.

9.04.6.3.7

Absolute configuration of pinellic acid

A useful application of the allylic benzoate CD method for determining the AC of allylic alcohols is the case of pinellic acid 75 (Figure 34).82,83 Pinellic acid was isolated from Pinelliae tuber, a component of Japanese herbal medicine, and exhibits oral adjuvant activity for nasal influenza vaccine. The relative configuration of the three stereogenic centers at carbons 9, 12, and 13 was determined by NOE studies of the methyl ester of its acetonide. This established the syn configuration for the 12-C, 13-C vicinal diol. The acetonide was then converted to the p-bromo-benzoate 76, the 1H NMR spectrum of which indicated an antiperiplanar relationship between the 9 and 10 protons (J9,10 = 7.0 Hz). The CD of the bromo-benzoate allylic ester showed a positive Cotton effect at λext 245 nm (Δε +6.97), indicative of the S configuration at 9-C. This predicted the AC of pinellic acid to be either (9S,12S,13S) or (9S,12R,13R). The question was resolved by a stereospecific synthesis of both isomers. Comparison of the spectral data of the two synthetic preparations with those of the natural product indicated that the AC of the natural product was (9S,12S,13S) as shown.

Figure 33 Spiroxin A 73 and CD data.

Figure 34 Pinellic acid 75 and exciton CD.

9.04.6.3.8

Absolute configuration of phorboxazole

Phorboxazoles are marine natural products that exhibit strong cytostatic activity. The AC of phorboxazole A 77 was assigned as shown by total synthesis except for the configuration of 38-C (Figure 35).84 The AC at the 38-C allylic alcohol had originally been assigned as R by application of the Mosher methoxy(trifluoromethyl)phenylacetic acid (MTPA) method. However, there was an anomaly in the NMR data. To corroborate the assigned R configuration, the following CD studies were carried out. The threo and erythro model compounds 78a and 78b were synthesized in several steps from (S)-malic acid, and the derived allylic alcohols were converted to 2-naphthoate esters 79a and 79b. The NMR vicinal coupling constants of 37-H/38-H were observed to be J37,38 = 7.0 Hz for 79a and J37,38 = 3.7 Hz for 79b. The value for 79a is similar to that of natural product 77, J37,38 = 7.9 Hz, indicating that compound 77 has the same relative configurational relationship as 79a. With this relative configurational relationship established, examination of the CD spectra of these two model compounds permitted the assignment of the 38-C ACs in 79a and 79b. The minor threo ester 79a showed a strong negative Cotton effect at λext 234 nm (Δε –9.2), indicating a negative twist between the esterified alcohol and the allylic double bond. In contrast, the major erythro product 79b showed a CD similar to that of the threo compound 79a except for the sign of the Cotton effect, λext 234 nm (Δε +15.1), describing a positive helicity. These Cotton effects are ascribed to exciton interactions between the transition dipole moment of the 1Bb band of the 2-naphthoate chromophore and that of the π–π* transition of the 39-C/40-C double bond. The NMR coupling constant between 38-H and 39-H was observed as J38,39 = 9.6 Hz for 79a and J38,39 = 9.2 Hz for 79b, indicating that these two protons are in a trans-relationship in their stable conformations. Based on these data, the AC for 38-C was determined to be S in threo-79a and R in erythro-79b. Because the two model compounds were derived from (S)-malic acid, they have opposite ACs to phorboxazole A between 33-C and 37-C. And because both phorboxazole A 77 and threo-79a have the same relative configurations at 33-C through 38-C, these analyses corroborated the originally assigned 38R configuration of the natural product.
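The step in which the relative configuration of the natural product is matched to one of the two model compounds amounts to a nearest-value comparison of the vicinal coupling constants. A minimal Python sketch of that comparison follows; the function name is hypothetical, and the J37,38 values are those quoted in the text.

```python
def closest_model(j_natural_hz, model_couplings_hz):
    """Return the model whose vicinal coupling constant is closest to that
    of the natural product (simple nearest-value match)."""
    return min(model_couplings_hz,
               key=lambda name: abs(model_couplings_hz[name] - j_natural_hz))


# J(37-H/38-H) values quoted in the text, in Hz.
models = {"threo-79a": 7.0, "erythro-79b": 3.7}
j_natural = 7.9  # phorboxazole A 77

match = closest_model(j_natural, models)
print(match)  # threo-79a
```

A nearest-value match is only meaningful when, as here, the two model values are well separated; it is not a substitute for the conformational reasoning that links the coupling constants to the threo and erythro arrangements.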


Figure 35 Phorboxazole A 77 and allylic benzoate method.

9.04.6.3.9

Absolute configuration of gymnocin-B

Several challenges were encountered during the course of the configurational assignments of gymnocin-B 80, a cytotoxic marine natural product containing the largest 15-polyether skeleton isolated so far85 (Figure 36). Along with the conformational flexibility arising from the presence of five seven-membered rings, investigation of the sterically hindered and remote critical hydroxyl groups at 10-C and 37-C in the B and J rings was especially difficult. In addition, the sample isolated from the red-tide dinoflagellate was available in extremely limited amount. On the basis of the known relative configurations at all 31 stereogenic centers previously assigned by NMR, the formidable task of determining their ACs was achieved by direct CD analysis at the critical 10-C and 37-C secondary hydroxyl groups. The potent but bulky triphenylporphyrin-cinnamate chromophore was chosen and introduced at 10-OH and 37-OH by acryloylation/cross metathesis under microscale conditions. Owing to the very intense UV–Vis porphyrin absorption, this gymnocin-B derivative 81 did show a clear-cut exciton split CD even though the two stereogenic centers bearing the porphyrins were approximately 30 Å apart. However, only after extensive conformational analysis of this derivative could the observed positive couplet be rationalized with regard to the ACs at 10-C/37-C. For this purpose a conformational analysis by MMFF94s/Monte Carlo calculation was first carried out on a few truncated models and then finally on the entire gymnocin-B bis(triphenylporphyrin-cinnamate) derivative 81. This analysis permitted correlation of the positive interporphyrin twist in the preferred 10-axial and 37-equatorial TPP-cin conformations of the arbitrarily chosen (10S,37S)-configuration with the observed positive exciton-coupled CD. In addition, the Boltzmann-weighted CD calculated by De Voe’s coupled oscillator method was in full agreement with the experimental results.
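The Boltzmann weighting mentioned above can be illustrated with a short Python sketch. It shows only the generic averaging step: the conformer energies and per-conformer Δε values below are made-up placeholders, not results for derivative 81, and the calculation of each conformer's spectrum (by De Voe's method in the original work) is not reproduced here.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(rel_energies_kcal, temperature=298.0):
    """Normalized Boltzmann populations from relative energies (kcal/mol)."""
    e_min = min(rel_energies_kcal)
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature))
               for e in rel_energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_spectrum(spectra, weights):
    """Population-weighted average of spectra sampled on a common grid."""
    n_points = len(spectra[0])
    return [sum(w * s[i] for w, s in zip(weights, spectra))
            for i in range(n_points)]


# Hypothetical conformer energies (kcal/mol) and delta-epsilon values at
# three wavelengths; placeholders for illustration only.
energies = [0.0, 0.5, 1.5]
spectra = [
    [20.0, 0.0, -20.0],
    [10.0, 0.0, -12.0],
    [-5.0, 0.0, 6.0],
]

weights = boltzmann_weights(energies, temperature=298.0)
average = weighted_spectrum(spectra, weights)
print([round(w, 3) for w in weights])
print([round(v, 2) for v in average])
```

Low-energy conformers dominate the average, so a minor conformer with an opposite couplet reduces the predicted amplitude without changing its sign, provided its population stays small.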

Characterization by Circular Dichroism Spectroscopy

[Structures: gymnocin-B 80, R = H; bis(TPP-cinnamate) 81, R = TPP-cinnamate; 10-TPPcin (ax), 37-TPPcin (eq).]

Figure 36 Gymnocin-B 80 and the lowest-energy conformation of its 10,37-bis(TPP-cinnamate) derivative 81 obtained by Monte Carlo/MMFF94s with Spartan 02. Experimental CD (in MeOH, c = 3.0 × 10⁻⁶): 419 nm (Δε +11), 414 nm (Δε −15); Boltzmann-weighted (at 298 K) average CD calculated by De Voe's method: 420 nm (Δε +25), 414 nm (Δε −25). Redrawn from K. Tanaka; Y. Itagaki; M. Satake; H. Naoki; T. Yasumoto; K. Nakanishi; N. Berova, J. Am. Chem. Soc. 2005, 127, 9561–9570.
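The Boltzmann-weighted average CD quoted for derivative 81 can be illustrated with a minimal sketch. It assumes that each conformer's CD curve has already been calculated on a common wavelength grid and that relative conformer energies are available in kcal/mol; the function names, energies, and Δε values below are hypothetical and are not taken from the gymnocin-B study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(rel_energies_kcal, temperature=298.0):
    """Return normalized Boltzmann populations for conformer energies (kcal/mol)."""
    e_min = min(rel_energies_kcal)
    factors = [
        math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in rel_energies_kcal
    ]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_spectrum(spectra, weights):
    """Population-weighted average of per-conformer spectra on one wavelength grid."""
    n = len(spectra[0])
    if any(len(s) != n for s in spectra):
        raise ValueError("all spectra must share the same wavelength grid")
    return [sum(w * s[i] for w, s in zip(weights, spectra)) for i in range(n)]


# Hypothetical input: three conformers, delta-epsilon sampled at three wavelengths.
energies = [0.0, 0.5, 1.5]
spectra = [[10.0, 0.0, -12.0], [6.0, 1.0, -8.0], [-4.0, 0.0, 5.0]]
weights = boltzmann_weights(energies)
average = weighted_spectrum(spectra, weights)
```

Because the weights fall off exponentially with energy, a conformer lying 1.5 kcal/mol above the minimum still contributes at 298 K, which is why a thorough conformational search precedes the averaging step.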

9.04.6.3.10 Absolute configuration of antitumor antibiotic AT2433-A1 containing a secondary amino group

To determine the AC of the antitumor antibiotic AT2433-A1 82, the amino sugar bis(p-Br-benzoyl) derivative 84 was prepared from the natural product86 (Figure 37). Its CD spectrum showed a negative couplet, from which the AC was assigned as shown. However, this assignment was later found to be wrong, as explained below. The authentic samples 85a and 85b were synthesized from a starting material of known AC. Surprisingly, the CD of 85a showed a weak positive exciton couplet, while that of 85b showed a strong negative one. The 1H NMR spectrum of 85a, a benzamide derivative of the secondary amine, indicated the existence of (Z) and (E) amide isomers, which adopt negative and positive exciton chiralities, respectively. Their contributions partly cancel, and the remaining CD is governed by the (E) amide. In contrast, the CD of the primary amine derivative 85b reflects its AC in a straightforward way because of its (Z) conformation. Therefore, when the exciton chirality method is applied to secondary amines, the analysis of (E) and (Z) conformations is critical. The total synthesis of AT2433-B1 83 was carried out, confirming the AC of the secondary amine.

Figure 37 Application of the exciton chirality method to secondary amine (numerical CD data were obtained from the spectrum reported in Chisholm et al.86).

9.04.6.3.11 Absolute configuration of chiral binaphthoquinones

The synthesis of (−)-8′-hydroxyisodiospyrin 86, a naturally occurring bi(naphthoquinone), was carried out as shown; the coupling of chiral compound 88 with bromide 89 yielded a product that was treated with MeI to give iodide 90 as crystals.87 The AC of 90 was determined as shown by X-ray crystallography. Compound 90 was converted to binaphthalene 91, the CD of which shows typical and intense exciton-coupled Cotton effects as shown in Figure 38. From the positive sign of the first Cotton effect, an S configuration was assigned to 91. Oxidative demethylation and treatment with AlCl3 furnished (S)-(+)-8′-hydroxyisodiospyrin 87, which was identified as the enantiomer of natural product 86. The AC of the natural product was thus determined to be (R)-(−)-86. The CD spectrum of (S)-(+)-87 shows two positive and one negative Cotton effects around 360–260 nm, but their Δε values are smaller than those of 91 and the CD curve deviates from the ideal pattern of exciton coupling. Thus, to determine ACs by the exciton method, it is important to select the most appropriate chromophore, which in this case is the binaphthalene rather than the binaphthoquinone.

9.04.6.3.12 Absolute configuration of pre-anthraquinones

Atropisomeric pigments 92, 93, 94, and 95 were isolated from indigenous Australian toadstools belonging to the genus Dermocybe (Figure 39). The structures of these pigments were deduced by spectroscopic methods, and the ACs of their axial chirality (atropisomerism) were determined from their CD spectra.88 The CD spectrum of 93 shows intense negative first and positive second Cotton effects at 272 and 251 nm, respectively, and therefore the AC with negative helicity between the two long axes of the aromatic chromophores was assigned. The same helicity was assigned to 92, which shows a similar CD curve. The AC at the 3′ position was determined by chemical correlation; the reductive cleavage of 93 yielded (R)-torosachrysone methyl ether of known AC. Pigments 94 and 95 are diastereomers of each other, but their CD curves are almost mirror images, indicating opposite helicity between the two aromatic chromophores. The weak Cotton effects of 95 as compared with those of 93 reflect a smaller dihedral angle between the two aromatic chromophores in the rigid structure of 95.
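The sign convention used throughout these examples (the first Cotton effect is the one at longer wavelength, and its sign names the couplet) can be written down as a short sketch. The function names are hypothetical, and the unit magnitudes used for pigment 93 are placeholders because only the signs and wavelengths are quoted in the text; the gymnocin-B values are those given in the caption of Figure 36.

```python
def _ordered(cotton_effects):
    """Sort two (wavelength_nm, delta_epsilon) pairs, longer wavelength first."""
    if len(cotton_effects) != 2:
        raise ValueError("expected exactly two Cotton effects")
    return sorted(cotton_effects, key=lambda ce: ce[0], reverse=True)


def exciton_chirality(cotton_effects):
    """Classify a couplet as 'positive' or 'negative' exciton chirality."""
    first, second = _ordered(cotton_effects)
    if first[1] > 0 and second[1] < 0:
        return "positive"
    if first[1] < 0 and second[1] > 0:
        return "negative"
    return "not a bisignate couplet"


def couplet_amplitude(cotton_effects):
    """Amplitude A = delta_epsilon(first) - delta_epsilon(second)."""
    first, second = _ordered(cotton_effects)
    return first[1] - second[1]


# Derivative 81 (Figure 36): 419 nm (+11) and 414 nm (-15).
gymnocin = [(419, 11), (414, -15)]
# Pigment 93: negative at 272 nm, positive at 251 nm (placeholder magnitudes).
pigment_93 = [(272, -1.0), (251, 1.0)]

print(exciton_chirality(gymnocin), couplet_amplitude(gymnocin))
print(exciton_chirality(pigment_93))
```

The classification is only the first step; as the AT2433-A1 and gymnocin-B cases show, relating the couplet sign to an AC still requires knowledge of the preferred conformation.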


Figure 38 Synthesis and absolute configuration of axially chiral binaphthoquinones (numerical CD data were obtained from the spectra reported in Baker et al.87).

9.04.6.3.13 Absolute configuration of vinblastine

The AC of vinblastine 14, one of the best-known Vinca alkaloids, was originally established by X-ray crystallography by Moncrief and Lipscomb.89 Experimental and theoretical CD studies have also been carried out to determine the AC of vinblastine and its natural and synthetic analogues.41,90 The CD spectrum of vinblastine 14 consists of intrinsic CD bands associated with the isolated transitions in chiral cleavamine and indoline ('half-molecules') alone, together with the exciton CD due to the through-space interaction between the indole and vindoline chromophores (Figure 40). This exciton CD reflects the AC at the 16′-C stereogenic center of vinblastine 14. Therefore, to obtain the net exciton CD, the intrinsic CD bands were subtracted from the CD of 14 to give a 'difference CD', which showed an intense positive couplet around 220 nm (Figure 40(b)). This exciton CD is generated by the interaction between the two ¹Bb transition moments of the indole and indoline chromophores. The analysis of these CD spectra thus illustrates the validity of the general CD additivity rule and, importantly, also reflects the positive exciton chirality and the S AC at 16′-C.
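The subtraction that yields the difference CD can be sketched as follows, assuming that all spectra have been sampled on the same wavelength grid. The function name and the Δε values are hypothetical placeholders and are not data from the vinblastine study.

```python
def difference_cd(cd_whole, *cd_parts):
    """Return CD(whole) - sum of CD(parts), point by point on a shared grid."""
    if not cd_parts:
        raise ValueError("at least one component spectrum is required")
    if any(len(part) != len(cd_whole) for part in cd_parts):
        raise ValueError("all spectra must share the same wavelength grid")
    summed = [sum(values) for values in zip(*cd_parts)]
    return [whole - s for whole, s in zip(cd_whole, summed)]


# Hypothetical delta-epsilon values at two wavelengths.
cd_vinblastine = [5.0, -3.0]
cd_cleavamine = [1.0, 1.0]
cd_vindoline = [2.0, -1.0]
net_exciton_cd = difference_cd(cd_vinblastine, cd_cleavamine, cd_vindoline)
```

The result isolates the contribution that the additivity rule attributes to the interaction between the two halves, provided that the half-molecule spectra are good models of the intrinsic bands.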

9.04.6.4 Absolute Configurations by Theoretical Calculation of CD Spectra

9.04.6.4.1 Absolute configuration of a biflavone as determined by π-electron SCF-CI-DV MO

The AC of a natural biflavone atropisomer, (−)-4′,4‴,7,7″-tetra-O-methylcupressuflavone 96, has been determined to be aR (axial chirality, R) by theoretical calculation of its CD spectrum91 (Figure 41). The π-electron system of biflavone 96 is strongly twisted and produces intense CD Cotton effects as shown. For the molecular structure of (aR)-96 with a counterclockwise screw sense, UV and CD spectra were calculated by the π-electron SCF-CI-DV MO method. The calculated CD and UV curves are in excellent agreement with the observed spectra. Therefore, the AC of biflavone (−)-96 was determined to be aR. The theoretically determined AC of this biflavone was later confirmed by total synthesis of the natural product (−)-96. This theoretical approach should be a promising tool for determining the AC of various natural products with a twisted π-electron system.

Figure 39 Pre-anthraquinones and CD data.

9.04.6.4.2 Absolute configurations of naturally occurring dihydroazulene and marine natural product halenaquinol as determined by SCF-CI-DV MO

The AC of a naturally occurring (+)-1,8a-dihydro-3,8-dimethylazulene 97 was similarly determined by π-electron SCF-CI-DV MO (Figure 42).92 Dihydroazulene 97 shows intense CD Cotton effects reflecting its twisted π-electron system. The CD and UV spectra of a model compound (8aS)-98 were calculated, giving the CD data shown in Figure 42, which were similar in position and sign to those of the natural product (+)-97. Therefore, the AC of (+)-97 was theoretically determined to be 8aS. To verify this theoretical determination experimentally, a model compound (8aS)-(+)-99, which has a methyl group at the angular position and is therefore inert toward oxidation to azulene, was synthesized. The observed CD data of (8aS)-(+)-99 were also similar to those of (+)-97. Therefore, the 8aS AC of (+)-97 was established, and it was proved that the π-electron SCF-CI-DV MO method gives a correct AC.92

Figure 40 (a) Vinblastine 14 consists of two half-molecules, cleavamine and vindoline; the CD spectra of the component molecules are shown. (b) Left: the sum CD spectrum (dotted line) = CD(cleavamine) + CD(vindoline) and the CD spectrum of vinblastine. Center: electric transition moments of the indole and indoline chromophores. Right: difference CD = CD(vinblastine) − [CD(cleavamine) + CD(vindoline)]. Redrawn from C. A. Parish; J.-G. Dong; W. G. Bornmann; J. Chang; K. Nakanishi; N. Berova, Tetrahedron 1998, 54, 15739–15758.

Figure 41 Absolute configuration of atropisomer, biflavone 96, as determined by CD calculation. Panels: observed CD and UV spectra of (−)-96 in EtOH and calculated CD and UV spectra of (aR)-96; axes Δε and ε × 10⁻⁴ versus λ (nm). Redrawn from N. Harada; H. Ono; H. Uda; M. Parveen; N. U.-D. Khan; B. Achari; P. K. Dutta, J. Am. Chem. Soc. 1992, 114, 7687–7692.

Figure 42 Dihydroazulene and halenaquinol compounds with a twisted π-electron system and their CD data.

The ACs of the marine natural product halenaquinol 100 and related compounds were also determined by π-electron SCF-CI-DV MO (Figure 42).93 Halenaquinol 100 has a twisted π-electron system, but its CD Cotton effects are weak. On the other hand, halenaquinol derivative (−)-101 showed intense CD Cotton effects as listed in Figure 42. Therefore, the π-electron system of this compound was selected for theoretical calculation. The CD spectrum of the model compound (12bS)-102 was calculated, giving the data shown in Figure 42. Although the wavelength positions of the calculated Cotton effects deviated from the observed values, the basic pattern of the CD spectrum was well reproduced by the calculation. Therefore, the ACs of (−)-101 and (+)-100 were determined to be 12bS. This AC was later confirmed by the total synthesis of halenaquinol (+)-100 and related compounds.93


9.04.6.4.3 TDDFT calculation of ECD of β-lactam antibiotics

Recently, J. Frelek and coworkers proposed an empirical helicity rule relating the configuration of the bridgehead carbon atom in clavams and oxacephams to the sign of the Cotton effect observed at approximately 220–240 nm94–97 (Figure 43). The rule was established empirically on the basis of X-ray data and a tentative assignment of the electronic transition at 220 nm (oxacephams) and 240 nm (clavams) to an n–π* amide transition in the azetidinone system. According to this rule, which was found experimentally to be correct for a variety of oxacephams94,95 and clavams,96,97 a positive sign of the 220 nm Cotton effect corresponds to the (R)-AC at the bridgehead carbon atom whereas a negative sign indicates the (6S)-AC. The rationale for this rule relied on the assumptions that the bicyclic system is conformationally rigid and that most of the molecular excitations are localized within the amide chromophore. Given the long-recognized utility of the β-lactam helicity rule, it is not surprising that it was one of the first rules whose validity was investigated quantum mechanically. In a recent study, Frelek et al. confirmed by time-dependent density functional theory (TDDFT) calculations the validity of the β-lactam helicity rule for a series of clavams.98,99 Furthermore, by combining, for the first time, TDDFT calculations with full quantum mechanical Born–Oppenheimer molecular dynamics, Frelek et al. were able to show a surprisingly high sensitivity of CD to the molecular conformations of cephams and their carba and oxa analogues.98,99

Figure 43 β-Lactam antibiotics.

9.04.6.4.4 TDDFT calculation of ECD of quadron and related compounds

The study of four sesquiterpenes, quadron 110, suberosenone 111, suberosanone 112, and suberosenol A acetate 113, represents the first attempt to apply the ab initio DFT methodology to the simulation of three different chiroptical properties, namely OR and electronic and vibrational CD (ECD and VCD), for the purpose of determining the ACs of natural products100 (Figure 44). For example, in the case of quadron 110, the same AC is obtained from all three chiroptical properties, which leads to an AC of the highest reliability. It should be noted that quadron belongs to a molecular type for which the AC can be safely assigned using only the ECD method. In such molecules the low-energy transitions, such as the n–π* transition of the carbonyl group, are limited in number and density and are therefore well resolved experimentally. In general, however, when the assignment of AC is the main goal, the more experimental chiroptical data are available for the substrate, the better the selection of suitable theoretical method(s). For example, the ACs of suberosenone 111, suberosanone 112, and suberosenol A acetate 113 were assigned on the basis of calculated OR values and comparison with the experimental ORs, the only experimental data currently available. The AC assignments would have benefited greatly if ECD and VCD data had been provided.

Figure 44 Quadron and related compounds.

9.04.6.4.5 TDDFT concerted calculation of CD, VCD, and OR of schizozygine, plumericin, and related compounds

The alkaloid schizozygine 114 and the iridoids plumericin 115 and isoplumericin 116 are natural products for which the concerted application of two or three chiroptical methods has led to a more reliable assignment of ACs by TDDFT (Figure 45). Owing to the presence of multisignate bands in the CD spectrum of schizozygine 114 and to imperfections of the B3LYP functional, the calculated ECD alone did not permit a safe AC assignment.101 Additional support from OR and VCD data was therefore required. Plumericin 115 and isoplumericin 116 also exhibited trisignate CD spectra that precluded the sole application of the ECD method. The unequivocal assignment of their ACs was made after the experimental VCD spectra were compared with the calculated data.102 A recent study by Stephens et al. on iso-schizogaline 117 and iso-schizogamine 118 provides further insight into the difficulties encountered in some cases when the AC is assigned only on the basis of TDDFT simulations of ECD data.103 According to the authors, when chiral molecules contain a substantial density of low-energy electronic states, the resulting electronic excitations in the near-UV may prevent the resolution and assignment of individual transitions. In such cases the assignment of AC requires additional support from VCD and OR data.

Figure 45 Schizozygine, plumericin, and related compounds.

9.04.6.4.6 TDDFT calculation of CD of alkaloid chimonantine

This study on chimonantine 119 provides an example of the enormous advance in computational analysis over the past decade, which makes feasible the correct assignment of the AC of large and conformationally flexible molecules by ab initio calculations of ECD and OR104 (Figure 46). Such calculations are free of the limitations typical of coupled oscillator approaches, where the presence of chromophores with certain electronic and geometrical attributes is a prerequisite for a straightforward configurational analysis. The molecular flexibility of pyrrolo[2,3-b]indoline alkaloids, including chimonantine, led Mason and Vane in 1966 to conclude that it was impossible to deduce their AC from chiroptical data.105 The recent study by Giorgio et al. shows that, 40 years later, the situation has changed dramatically in favor of theoretical predictions of AC, even when the natural products possess challenging molecular complexity.

Figure 46 (−)-Chimonantine 119: experimental CD spectrum in cyclohexane and two spectra calculated in the velocity and length formalisms (Δε versus λ in nm). The TDDFT/B3LYP/6-31G calculated spectra were obtained as the Boltzmann average over all six conformers, taking into account the 40 lowest-energy transitions and assuming a Gaussian band shape with a width parameter of 0.15 eV. Redrawn from E. Giorgio; K. Tanaka; L. Verotta; K. Nakanishi; N. Berova; C. Rosini, Chirality 2007, 19, 434–445.

9.04.6.4.7 Absolute configuration of hypothemycin by TDDFT calculation and solid-state CD

The theoretical calculations of the CD spectrum of the antitumor macrolide hypothemycin 120 were based on TDDFT methodology (B3LYP/TZVP) and on a geometry derived from X-ray analysis without further optimization106 (Figure 47). Comparison of the calculated CD spectrum with the experimental CD spectra, measured in the solid state (in KBr) and in solution, revealed some differences at approximately 270–300 nm. In other recent studies,107 good agreement was found between the calculated CD and the spectra measured in the solid state as well as in solution. The hypothemycin example therefore appears to be a borderline case for the application of this new solid-state CD/TDDFT approach, and it points to the need for caution in the interpretation of solid-state experimental/theoretical data. Most likely, intermolecular H-bonding of hypothemycin in the solid state, and perhaps also in solution, which was not taken into account by the calculation in vacuum, is responsible for the observed differences.

Figure 47 Absolute configuration of hypothemycin.
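Calculated CD curves such as the TDDFT spectra discussed in this section are obtained by broadening discrete transitions, each with an excitation energy and a rotational strength, into bands (a Gaussian shape with a width of 0.15 eV in the chimonantine study). The following is a minimal sketch of that step, not the procedure of any cited work: the transition list is hypothetical, the intensity is left in arbitrary units, and the width is taken here as the half-width at 1/e of the peak height, which is an assumption about the band-shape convention.

```python
import math

EV_NM = 1239.84193  # hc in eV nm, converts wavelength to photon energy


def simulate_cd(transitions, wavelengths_nm, sigma_ev=0.15):
    """Sum one Gaussian band per transition; returns CD in arbitrary units.

    transitions: list of (excitation_energy_eV, rotational_strength) pairs.
    """
    curve = []
    for lam in wavelengths_nm:
        energy = EV_NM / lam
        value = sum(
            strength * math.exp(-(((energy - e_i) / sigma_ev) ** 2))
            for e_i, strength in transitions
        )
        curve.append(value)
    return curve


# Hypothetical transitions: a positive band at 4.0 eV and a negative one at 4.4 eV.
transitions = [(4.0, 20.0), (4.4, -30.0)]
grid_nm = [EV_NM / 4.0, EV_NM / 4.4]
curve = simulate_cd(transitions, grid_nm)
```

When two transitions of opposite sign lie closer together than the band width, their Gaussians cancel in part, so the apparent extrema shift and weaken; this is one reason why a dense manifold of low-energy states makes ECD-only assignments unreliable.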


9.04.6.5 CD of Supramolecular Systems

CD has been particularly useful in providing insight into the chirality of supramolecular assemblies. In some cases, monitoring the change in the CD signal on titration of the host molecule with a ligand can provide information about the mode of binding and the dissociation constant. Thermodynamic parameters may be obtained by monitoring the CD signal as a function of temperature. Examples of these cases are given below.

9.04.6.5.1 Theoretical simulation of CD spectrum of calicheamicin

The aglycon portion of calicheamicin 121 and other 10-membered ring enediyne antitumor antibiotics contains dienonecarbamate and enediyne chromophores in a unique bicyclic ring structure in which these two subunits are essentially orthogonal to each other. The CD spectra of calicheamicin 121 and of the other members of this family, all of which contain the same bicyclic system, exhibit a characteristic and strongly negative exciton-coupled CD at 310 and 270 nm, which was used to assign the stereochemical relationship between these two chromophores at the time of the structure determination. This was later confirmed by stereospecific total synthesis of the (−)-aglycon, calicheamicinone 122. Additional confirmation was then obtained by calculating the theoretical CD spectra of the whole calicheamicin aglycon A, and of the dienone and enediyne chromophores individually, by using DFT and De Voe's coupled oscillator method.108 The necessary input geometry was obtained from the X-ray structure of the synthetic (−)-calicheamicinone 122. To simplify the calculations, the allylic sulfur of the trisulfide moiety was replaced with a methyl group and the carbohydrate tail portion with a hydrogen. The DFT calculations showed that the enediyne chromophore alone contributes very weakly to the exciton couplet, whereas the twisted dienone chromophore makes a more significant contribution. However, only by taking both chromophores into account simultaneously could the shape of the experimental CD spectrum be adequately reproduced, and then with only about a third of the experimental intensity. The De Voe calculations support those of the DFT method. The dipole transition moments used for the De Voe calculations are shown in the structure of calicheamicin (Figure 48).

9.04.6.5.2 Calicheamicin binding to an oligonucleotide

CD titration studies of the calicheamicin–DNA interaction provided a dissociation constant and evidence for a calicheamicin-induced DNA conformational change. As mentioned in Section 9.04.6.5.1, calicheamicin is a potent antitumor agent that binds to double-stranded DNA at specific sequences and subsequently cleaves both strands of the double helix. The carbohydrate tail is the DNA recognition portion of the molecule. Measurement of the change in the CD signal of a duplex DNA 12-mer (3′-GGGCCAGGATTC-5′ hybridized with its complementary sequence) on titration with calicheamicin, at a wavelength not masked by the absorption of the antibiotic, gave a binding isotherm with saturation as well as a prominent isosbestic point.109
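A saturating binding isotherm of this kind can be reduced to a dissociation constant by fitting a binding model. The sketch below assumes a simple 1:1 host–ligand equilibrium, which is an illustrative assumption and not necessarily the model used in the cited study; the concentrations (in µM), the signal changes, and the function names are hypothetical, and the synthetic data are generated from the model itself.

```python
import math


def fraction_bound(host_total, ligand_total, kd):
    """Fraction of host in a 1:1 complex (exact solution of the quadratic)."""
    b = host_total + ligand_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * host_total * ligand_total)) / 2.0
    return complex_conc / host_total


def fit_kd(host_total, ligand_totals, signal_changes, kd_grid):
    """Grid search over Kd; the amplitude is solved by linear least squares."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(host_total, lig, kd) for lig in ligand_totals]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        amp = sum(x * y for x, y in zip(f, signal_changes)) / denom
        sse = sum((amp * x - y) ** 2 for x, y in zip(f, signal_changes))
        if best is None or sse < best[0]:
            best = (sse, kd, amp)
    return best[1], best[2]


# Synthetic titration: 10 uM duplex, true Kd = 2 uM, CD change of -5 at saturation.
host = 10.0
ligands = [2.0, 5.0, 10.0, 20.0, 40.0]
signals = [-5.0 * fraction_bound(host, lig, 2.0) for lig in ligands]
kd_fit, amp_fit = fit_kd(host, ligands, signals, [0.1 * i for i in range(1, 101)])
```

The exact quadratic expression is used rather than the simpler hyperbolic form because, at micromolar affinity and comparable host concentration, the free ligand concentration differs appreciably from the total added.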

Figure 48 Calicheamicin 121 and CD data.


The calculated dissociation constant of a few micromolar agreed well with the value obtained from a direct measurement by microcalorimetry.110 Furthermore, the CD titration showed that the DNA conformation became somewhat condensed, as evidenced by a decrease in the CD signal of the DNA caused by the binding of the hydrophobic antibiotic in the DNA minor groove.

9.04.6.5.3 Chiral stacking of anthocyanin flower pigments as revealed by CD spectroscopy

The molecular basis of flower pigments has fascinated organic chemists for the past 100 years. However, only recently has this enigma finally been resolved, primarily through the seminal efforts of Japanese researchers led by Goto and Kondo,111–113 and more recently by Takeda and coworkers.114 Goto and Kondo determined that the deep blue color of the Commelina communis flower is due to a high molecular weight pigment, commelinin 123.111,112 This supramolecular, Mg-containing pigment consists of a flattened spherical cluster of six molecules each of an anthocyanin and a copigment flavone glycoside, which surround two Mg²⁺ ions located in the center of the complex (Figure 49) on a threefold axis of symmetry. The structure was subsequently confirmed by X-ray analysis. Two other similar supramolecular pigments, protodelphin115 and protocyanin,113,114 were later determined to be responsible for the color of the blue flowers of Salvia patens and the cornflower, Centaurea cyanus, respectively.116 CD spectroscopy, in addition to X-ray analysis of commelinin 123 and protocyanin, showed that all three of these supramolecular pigments are stacked in a left-handed helical assembly, with complex, large, negative exciton couplets in the visible region of the spectra (e.g., commelinin 123, λext 668 nm (Δε −145.5) and 580 nm (Δε +186.4)) matching the maxima in the UV–Vis spectra. The pendent glucose sugars are all of the D form, and this homochirality drives the formation of the left-handed chiral supramolecular assembly through specific hydrogen bonding between the sugars of the anthocyanin 124 and the copigment flavone 125. Metal chelation and hydrophobic interactions between the aromatic chromophores also play an important role in the assembly. The observed exciton couplet in the CD of commelinin 123 clearly indicates that the homodimeric anthocyanins in the quinonoidal tautomeric form assemble in a left-handed helical twist. The dimeric flavone copigments are intercalated between the anthocyanins and are also assembled in a left-handed offset. Flavocommelin 125 itself stacks in a left-handed geometry with a clear negative exciton couplet at approximately λext 365 nm (negative) and 320 nm (positive), but this region is obscured in the CD of the supramolecule.

Figure 49 Flower color pigment, commelinin 123 and CD data. Redrawn from G. A. Ellestad, Chirality 2006, 18, 134–144.

9.04.6.5.4 CD of chirally stacked carotenoid pigments

An unusual application of CD published by Zsila et al. concerns the chirality of self-assemblies of carotenoids and their esters in intact orange and yellow flower petals117 (Figure 50). Carotenoid esters themselves have been found to stack intermolecularly in either a right- or a left-handed twist, as evidenced by strong, complex exciton couplets in the visible region of the CD spectrum. The handedness of the stacking is related to the absolute stereochemistry of the stereogenic centers at each end of the carotenoids. For example, the CD spectrum of lutein diacetate 126 recorded in aqueous ethanol shows a positive exciton couplet between 450 and 500 nm, indicative of right-handed chirality of the stacked carotenoid esters. The aqueous solvent promotes the aggregation and apparently simulates the aqueous environment of the plant cell. A remarkably clear CD spectrum of intact flower petals from Chelidonium majus, obtained by using freshly picked petals pressed between two quartz windows, closely matched the above-mentioned spectrum of lutein diacetate, demonstrating the validity of using intact petals. Each plant species appears to produce flowers with a distinctive CD spectrum that is characteristic of that species. Furthermore, there are usually two or more types of carotenoids in the petal, and the CD is influenced not only by this chemical heterogeneity but also by the presence of the proteins and lipids that co-occur with the pigments.

Figure 50 Carotenoid pigment.

9.04.6.5.5 CD of diazepam–HSA and diazepam–AGP complexes

Induced CD (ICD) was used to determine the bound conformations of the 1,4-benzodiazepine anxiolytic drug diazepam 127 in its complexes with the two main serum proteins, human serum albumin (HSA) and α1-acid glycoprotein (AGP).118,119 This is an interesting application of CD because diazepam lacks a stereogenic center, but owing to the rapid inversion of the nonplanar seven-membered ring, the drug exists as an equimolar equilibrium mixture of two chiral conformers, P (plus) and M (minus) (Figure 51). This study shows that the two serum proteins display different conformer preferences. The ICD of HSA-bound diazepam is strongly positive at λext 260 nm (Δε +46.5) with a smaller negative band at λext 321 nm (Δε −8.3), which indicates an M conformer preference. In contrast, the CD of AGP-bound diazepam shows a negative signal at λext 261 nm (Δε −9.6) and a positive one at λext 313 nm (Δε +1.6), indicating a P conformer preference. Photolabeling studies have shown that diazepam binds to the domain III region of HSA, although it also binds to a low-affinity domain I site with an inverse ICD spectrum. This is presumably due to a preference for the P conformation. Thus AGP/diazepam binding seems to mimic this minor binding site of HSA. Ultrafiltration experiments show that the binding affinities for the two serum proteins are similar.

Figure 51 Equilibrium of diazepam 127 between two chiral conformers.

9.04.6.5.6 CD of bilirubin bound to human and bovine serum albumins

Bilirubin 128, the cytotoxic yellow pigment of jaundice, is an achiral tetrapyrrole with no stereogenic centers.120 It exists as an equimolar equilibrium mixture of two conformers, P and M (Figure 52). Although it does not show any CD signal in aqueous solution, upon binding to HSA, which serves as a chiral selector, the P conformer is preferentially bound, as evidenced by the appearance of a positive exciton-split CD at 457 nm (Δε +49.5) and 407 nm (Δε −29.5). Interestingly, a negative exciton couplet at 457 nm (Δε −62.5) and 407 nm (Δε +23.7) is observed for bilirubin bound to bovine serum albumin (BSA), which corresponds to a preference for the M conformer in this binding site. Thus, this methodology provides important insight into the drug-binding properties of serum proteins. Another publication described the interesting preference for bilirubin enantiomeric conformations in biomembrane models composed of chiral micellar aggregates formed from enantiomeric N-alkyl-N,N-dimethyl-N-(1-phenyl)ethylammonium bromides, as determined by CD. This study points to a possible correlation between conformer-specific bilirubin neuromembrane alterations and bilirubin neurotoxicity.121

Figure 52 Enantiomeric conformations P and M of bilirubin 128 with electric transition moments. Redrawn from R. V. Person; B. R. Peterson; D. A. Lightner, J. Am. Chem. Soc. 1994, 116, 42–59.

9.04.7 Concluding Remarks and Outlook

As can be seen from the great variety of examples set forth in this chapter, ranging from more traditional small-molecule natural products and drugs to supramolecular flower pigments and the yellow jaundice pigment bilirubin, the interest in applying CD methods to the determination of AC in natural products has increased dramatically in recent years. The reasons for this are twofold. First, the relevance of stereochemistry to biological activity has made the determination of molecular and supramolecular chirality critical for understanding the behavior and interaction of molecules. This is especially true for the binding of natural products to the cellular receptors that mediate their biological activity. Second, there has been a tremendous advance during the past 10 years in the theoretical treatment of optical activity and in the computational methods that support the ab initio calculation of CD spectra. Furthermore, these calculations, in conjunction with experimental results obtained with the more sophisticated instrumentation presently available, make the assignment of ACs, and in certain cases even of bioactive conformations, a straightforward and highly efficient process. It is these advances in the development of ab initio calculations of CD spectra over the past decade that have put the interpretation of CD spectra on more solid ground. There are good reasons to remain optimistic that the recent momentum in chiroptical instrumentation and in the development of new, more sophisticated ab initio computational methodologies will continue at an even faster pace. This will make the calculation of optical activity properties, including CD, a truly indispensable and widely affordable approach in the stereochemical analysis of natural products and their interactions at the molecular and supramolecular levels.

References

1. S. F. Mason, Molecular Optical Activity and the Chiral Discrimination; Cambridge University Press: Cambridge, 1982.
2. N. Berova; K. Nakanishi; R. W. Woody, Eds., Circular Dichroism: Principles and Applications, 2nd ed.; Wiley-VCH: New York, 2000.
3. E. L. Eliel; S. H. Wilen; L. N. Mander, Stereochemistry of Organic Compounds; John Wiley & Sons Inc.: New York, 1994; Chapter 13, pp 991–1118.
4. J. R. Cheeseman; M. J. Frisch; F. J. Devlin; P. J. Stephens, Chem. Phys. Lett. 1996, 252, 211–220.
5. P. L. Polavarapu, Vibrational Spectra: Principles and Applications with Emphasis on Optical Activity; Elsevier: Amsterdam-Lausanne-New York-Shannon-Tokyo, 1998.
6. T. B. Freedman; X. Cao; R. K. Dukor; L. A. Nafie, Chirality 2003, 15, 743–758.
7. N. Harada, Optical Rotation, Optical Rotatory Dispersion, and Circular Dichroism. In Handbook of Instrumental Analysis, Part 2, 2nd ed.; Y. Izumi, M. Ogawa, S. Kato, J. Shiokawa, T. Shiba, Eds.; Kagakudojin: Kyoto, Japan, 2005; pp 119–133.
8. N. Harada; K. Nakanishi, Circular Dichroic Spectroscopy – Exciton Coupling in Organic Stereochemistry; University Science Books: Mill Valley, CA, and Oxford University Press: Oxford, 1983.
9. N. Berova; L. Di Bari; G. Pescitelli, Chem. Soc. Rev. 2007, 36, 914–931.
10. D. A. Lightner; J. E. Gurst, Organic Conformational Analysis and Stereochemistry from Circular Dichroism Spectroscopy; Wiley-VCH: New York, 2000.
11. N. Berova; K. Nakanishi, Exciton Chirality Method: Principles and Application. In Circular Dichroism: Principles and Applications, 2nd ed.; N. Berova, K. Nakanishi, R. W. Woody, Eds.; Wiley-VCH: New York, 2000; Chapter 12, pp 337–382.
12. H. De Voe, J. Chem. Phys. 1964, 41, 393–400; 1965, 43, 3199–3208.
13. S. Superchi; E. Giorgio; C. Rosini, Chirality 2005, 16, 422–451.
14. C. M. Kemp; S. F. Mason, Tetrahedron 1966, 22, 629–635.
15. T. D. Crawford, Theor. Chem. Acc. 2006, 115, 227–245.
16. N. Berova; N. Harada; K. Nakanishi, Electronic Spectroscopy: Exciton Coupling, Theory and Applications. In Encyclopedia of Spectroscopy and Spectrometry; J. Lindon, G. Tranter, J. Holmes, Eds.; Academic Press: London, 2000; pp 470–488.
17. S. F. Mason; R. H. Seal; D. R. Roberts, Tetrahedron 1974, 30, 1671–1682.
18. I. Hanazaki; H. Akimoto, J. Am. Chem. Soc. 1972, 94, 4102–4106.
19. L. Di Bari; G. Pescitelli; P. Salvadori, J. Am. Chem. Soc. 1999, 121, 7998–8004.
20. K. Harata; J. Tanaka, Bull. Chem. Soc. Jpn. 1973, 46, 2747–2751.
21. J. M. Bijvoet; A. F. Peerdeman; A. J. Van Bommel, Nature 1951, 168, 271–272.
22. J. Trommel; J. M. Bijvoet, Acta Crystallogr. 1954, 7, 703–709.
23. J. M. Bijvoet; A. F. Peerdeman, Acta Crystallogr. 1956, 9, 1012–1015.
24. J. Tanaka; F. Ogura; H. Kuritani; M. Nakagawa, Chimia 1972, 26, 471–473.
25. J. Tanaka; K. Ozeki-Minakata; F. Ogura; M. Nakagawa, Nature (London) Phys. Sci. 1973, 241, 22–23.
26. J. Tanaka; K. Ozeki-Minakata; F. Ogura; M. Nakagawa, Spectrochim. Acta A 1973, 29, 897–924.
27. N. Harada; Y. Takuma; H. Uda, J. Am. Chem. Soc. 1976, 98, 5408–5409.
28. N. Harada; Y. Takuma; H. Uda, Bull. Chem. Soc. Jpn. 1977, 50, 2033–2038.
29. N. Harada; Y. Takuma; H. Uda, J. Am. Chem. Soc. 1978, 100, 4029–4036.
30. N. Sakabe; K. Sakabe; K. Ozeki-Minakata; J. Tanaka, Acta Crystallogr. B 1972, 28, 3441–3446.
31. J. Tanaka; C. Katayama; F. Ogura; H. Tatemitsu; M. Nakagawa, Chem. Commun. 1973, 21–22.
32. D. H. R. Barton; H. T. Cheung; A. D. Cross; L. M. Jackman; M. Martin-Smith, J. Chem. Soc. 1961, 5061–5073.
33. I. C. Paul; G. A. Sim; T. A. Hamor; J. M. Robertson, J. Chem. Soc. 1962, 4133–4145.
34. S. Hosozawa; N. Kato; K. Munakata, Tetrahedron Lett. 1974, 15, 3753–3756.

Characterization by Circular Dichroism Spectroscopy 35. 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87. 88. 89. 90. 91. 92. 93. 94. 95.

143

N. Kato; S. Shibayama; K. Munakata; C. Katayama, Chem. Commun. 1971, 1632–1633. N. Kato; K. Munakata; C. Katayama, J. Chem. Soc., Perkin Trans. 2 1973, 69–73. N. Kato; M. Shibayama; K. Munakata, J. Chem. Soc., Perkin Trans. 1 1973, 712–719. N. Harada; H. Uda, J. Am. Chem. Soc. 1978, 100, 8022–8024. D. Rogers; G. G. Unal; D. J. Williams; S. V. Ley; G. A. Sim; B. S. Joshi; K. R. Ravindranath, J. Chem. Soc., Chem. Commun. 1979, 97–99. M. L. Kenney; F. Corte´s-Selva; J. M. Perez-Victoria; I. A. Jime´nez; A. G. Gonzalez; O. M. Mun˜oz; F. Gamarro; S. Castanys; A. G. Ravelo, J. Med. Chem. 2001, 44, 4668–4676. C. A. Parish; J.-G. Dong; W. G. Bornmann; J. Chang; K. Nakanishi; N. Berova, Tetrahedron 1998, 54, 15739–15758. N. Harada, J. Am. Chem. Soc. 1973, 95, 240–242. A. Guerriro; M. D’Ambrosio; V. Cuomo; F. Vanzanella; F. Pietra, Helv. Chim. Acta 1989, 72, 438–446. M. Koreeda; N. Harada; K. Nakanishi, J. Am. Chem. Soc. 1974, 96, 266–268. T. Ishikawa; M. Murota; T. Watanabe; T. Harayam; H. Ishii, Tetrahedron Lett. 1995, 36, 4269–4272. X. Huang; K. Nakanishi; N. Berova, Chirality 2000, 12, 237–255. G. Pescitelli; S. Gabriel; Y. Wang; J. Fleischhauer; R. W. Woody; N. Berova, J. Am. Chem. Soc. 2003, 125, 7613–7628. S. Matile; N. Berova; K. Nakanishi, Chem. Biol. 1996, 3, 379–392. N. Zhao; P. Zhou; N. Berova; K. Nakanishi, Chirality 1995, 7, 636–651. D. Rele; N. Zhao; K. Nakanishi; N. Berova, Tetrahedron 1996, 52, 2759–2776. N. Zhao; N. Berova; K. Nakanishi; M. Rohmer; P. Mougenot; U. J. Jurgens, Tetrahedron 1996, 52, 2777–2788. A. Kawamura; N. Berova; V. Dirsch; A. Mangoni; K. Nakanishi; G. Schwartz; A. Bielawska; Y. Hannun; I. Kitagawa, Bioorg. Med. Chem. 1996, 4, 1035–1043. N. Harada; K. Nakanishi, J. Am. Chem. Soc. 1969, 91, 3989–3991. G. Cai; N. Bozhkova; J. Odingo; N. Berova; K. Nakanishi, J. Am. Chem. Soc. 1993, 115, 7192–7198. T. Kurta´n; N. Nesnas; Y.-Q. Li; X. Huang; K. Nakanishi; N. Berova, J. Am. Chem. Soc. 2001, 123, 5962–5973. T. Kurta´n; N. Nesnas; F. E. 
Koehn; Y.-Q. Li; K. Nakanishi; N. Berova, J. Am. Chem. 2001, 123, 5974–5982. X. Huang; N. Fujioka; G. Pescitelli; F. E. Koehn; T. R. Williamson; K. Nakanishi; N. Berova, J. Am. Chem. Soc. 2002, 124, 10320–10335. J. W. Van Klink; S.-H. Baek; A. J. Barlow; H. Ishii; K. Nakanishi; N. Berova; N. B. Perry; R. T. Weavers, Chirality 2004, 16, 549–558. H. Ishii; S. Krane; Y. Itagaki; N. Berova; K. Nakanishi; P. J. Weldon, J. Nat. Prod. 2004, 67, 1426–1430. Q. Yang; C. Olmsted; B. Borhan, Org. Lett. 2002, 4, 3423–3426. X. Li; M. Tanasova; C. Vasileiou; B. Borhan, J. Am. Chem. Soc. 2008, 130, 1885–1893. V. V. Borovkov; J. M. Lintuluoto; Y. Inoue, J. Am. Chem. Soc. 2001, 123, 2979–2989. S. Allenmark, Chirality 2003, 15, 409–422. G. A. Ascoli; E. Domenici; C. Bertucci, Chirality 2006, 18, 667–690. B. Norde´n; T. Kurucsev, J. Mol. Recognit. 1994, 7, 141–156. M. Simonyi; Z. Bikadi; F. Zsila; J. Deli, Chirality 2003, 15, 680–698. Y. Kikuchi; K. Kobayashi; Y. Aoyama, J. Am. Chem. Soc. 1992, 114, 1351–1358. T. Eguchi; K. Kondo; K. Kakinuma; H. Uekusa; Y. Ohashi; K. Mizoue; Y.-F. Qiano, J. Org. Chem. 1999, 64, 5371–5376. C. D. Donner; M. Gill, J. Chem. Soc., Perkin Trans.1 2002, 938–948. B. I. Morinaka; C. K. Skepper; T. F. Molinski, Org. Lett. 2007, 9, 1975–1978. M. Satake; A. Morohashi; H. Oguri; T. Oishi; M. Hirama; N. Harada; T. Yasumoto, J. Am. Chem. Soc. 1997, 119, 11325–11326. N. Harada; A. Saito; H. Ono; S. Murai; H.-Y. Li; J. Gawronski; K. Gawronska; T. Sugioka; H. Uda, Enantiomer 1996, 1, 119–138. I. Akritopoulou-Zanze; K. Nakanishi; H. Stepowska; B. Grzeszczyk; A. Zamojski; N. Berova, Chirality 1998, 9, 699–712. N. Harada; H.-Y. Li; N. Koumura; T. Abe; M. Watanabe; M. Hagiwara, Enantiomer 1997, 2, 349–352. N. Harada; A. Saito; H. Ono; J. Gawronski; K. Gawronska; T. Sugioka; H. Uda; T. Kuriki, J. Am. Chem. Soc. 1991, 113, 3842–3850. D. Rele; N. Zhao; K. Nakanishi; N. Berova, Tetrahedron 1996, 52, 2759–2776. N. Zhao; N. Berova; K. Nakanishi; M. Rohmer; P. Mougenot; U. J. 
Ju¨rgens, Tetrahedron 1996, 52, 2777–2788. A. Sakurai; H. Horibe; N. Kuboyama; Y. Hashimoto; Y. Okumura, J. Biochem. 1995, 118, 552–554. J. Kobayashi; M. Yoshinaga; N. Yoshida; M. Shiro; H. Morita, J. Org. Chem. 2002, 67, 2283–2286. H. Kobayashi; S. Meguro; T. Yoshimoto; M. Namikoshi, Tetrahedron 2003, 59, 455–459. T. Wang; O. Shirota; K. Nakanishi; N. Berova; L. A. McDonald; L. R. Barbieri; G. Carter, Can. J. Chem. 2001, 79, 1786–1791. T. Sunazuka; T. Shirahata; K. Yoshida; D. Yamamoto; Y. Harigaya; T. Nagai; H. Kiyohara; H. Yamada; I. Kuwajima; S. Omura, Tetrahedron Lett. 2002, 43, 1265–1268. K. Kouda; T. Ooi; K. Kaya; T. Kusumi, Tetrahedron Lett. 1996, 37, 6347–6350. T. F. Molinski; L. J. Brzezinski; J. W. Leahy, Tetrahedron: Asymmetry 2002, 13, 1013–1016. K. Tanaka; Y. Itagaki; M. Satake; H. Naoki; T. Yasumoto; K. Nakanishi; N. Berova, J. Am. Chem. Soc. 2005, 127, 9561–9570. J. D. Chisholm; J. Golik; B. Krishnan; J. A. Matson; D. L. Van Vranken, J. Am. Chem. Soc. 1999, 121, 3801–3802. R. W. Baker; S. Liu; M. V. Sargent, Aust. J. Chem. 1998, 51, 255–266. M. S. Buchanan; M. Gill; P. Millar; S. Phonh-Axa; E. Raudies; J. Yu, J. Chem. Soc. Perkin Trans. 1 1999, 795–801. J. W. Moncrief; W. N. Lipscomb, Acta Cryst. 1966, 21, 322–331. J. P. Kutney; D. E. Gregonis; R. Imhof; I. Itoh; E. Jahngen; A. I. Scott; W. K. Chan, J. Am. Chem. Soc. 1975, 97, 5013–5015. N. Harada; H. Ono; H. Uda; M. Parveen; N. U.-D. Khan; B. Achari; P. K. Dutta, J. Am. Chem. Soc. 1992, 114, 7687–7692. N. Harada; J. Kohori; H. Uda; K. Nakanishi; R. Takeda, J. Am. Chem. Soc. 1985, 107, 423–428. N. Harada; H. Uda; M. Kobayashi; N. Shimizu; I. Kitagawa, J. Am. Chem. Soc. 1989, 111, 5668–5674. R. Lysek; K. Borsuk; M. Chmielewski; Z. Kaluza; Z. Urbanczyk-Lipkowska; A. Klimek; J. Frelek, J. Org. Chem. 2002, 67, 1472–1479. T. T. Danh; W. Bocian; L. Kozerski; P. Szczukiewicz; J. Frelek; M. Chmielewski, Eur. J. Org. Chem. 2005, 67, 429–440.

144 Characterization by Circular Dichroism Spectroscopy 96. 97. 98. 99. 100. 101. 102. 103. 104. 105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. 116. 117. 118. 119. 120. 121.

J. Frelek; R. Lysek; K. Borsuk; J. Jagodzinski; B. Furman; A. Klimek; M. Chmielewski, Enantiomer 2002, 7, 107–114. M. Cierpucha; J. Solecka; J. Frelek; P. Szczukiewicz; M. Chmielewski, Biorg. Med. Chem. 2004, 12, 405–416. M. Chmielewski; M. Cierpucha; P. Kowalska; M. Kwit; J. Frelek, Chirality 2008, 20, 621–627. J. Frelek; P. Kowalska; M. Masnyk; A. Kazimierski; A. Korda; M. Woznica; M. Chmielewski; F. Furche, Chem. Eur. J. 2007, 13, 6732–6744. P. J. Stephens; D. M. McCann; F. J. Devlin; A. B. Smith III, J. Nat. Prod. 2006, 69, 1055–1064. P. J. Stephens; J.-J. Pan; F. J. Devlin; M. Urbanova; J. Hajicek, J. Org. Chem. 2007, 72, 2508–2524. P. J. Stephens; J.-J. Pan; F. J. Devlin; K. Kron; T. Kurtan, J. Org. Chem. 2007, 72, 3521–3536. P. J. Stephens; J.-J. Pan; F. J. Devlin; M. Urbanova; O. Julinek; J. Hajicek, Chirality 2008, 20, 454–470. E. Giorgio; K. Tanaka; L. Verotta; K. Nakanishi; N. Berova; C. Rosini, Chirality 2007, 19, 434–445. S. F. Mason; G. W. Vane, J. Chem. Soc. B 1966, 370–374. H. Hussain; K. Krohn; U. Florke; B. Schulz; S. Draeger; G. Pesitelli; P. Salvadori; S. Antus; T. Kurtan, Tetrahedron Asymmetry 2007, 18, 925–930. H. Hussain; K. Krohn; U. Floerke; B. Schulz; S. Draeger; G. Pescitelli; S. Antus; T. Kurtan, Eur. J. Org. Chem. 2007, 292–295. E. Giorgio; K. Tanaka; W. Ding; G. Krishnamurthy; K. Pitts; G. Ellestad; C. Rosini; N. Berova, Bioorg. Med. Chem. 2005, 13, 5072–5079. G. Krishnamurthy; W.-D. Ding; G. A. Ellestad, Tetrahedron 1994, 50, 1341–1349. M. Chatterjee; P. J. Smith; C. A. Townsend, J. Am. Chem. Soc. 1996, 118, 1938–1948. T. Goto; T. Kondo, Angew. Chem. Int. Ed. Engl. 1991, 30, 17–33. T. Kondo; K. Yoshida; A. Nakagawa; T. Kawai; H. Tamura; T. Goto, Nature 1992, 358, 515–518. T. Kondo; M. Ueda; H. Tamura; K. Yoshida; M. Isobe; T. Goto, Angew. Chem. Int. Ed. Engl. 1994, 33, 978–979. M. Shiono; N. Matsugaki; K. Takeda, Nature 2005, 436, 791. T. Kondo; K. Oyama; K. Yoshida, Angew. Chem. Int. Ed. Engl. 2001, 40, 894–897. G. A. 
Ellestad, Chirality 2006, 18, 134–144. F. Zisla; J. Deli; M. Simonyi, Planta 2001, 213, 937–942. I. Fitos; J. Visy; F. Zsila; G. Ma´dy; M. Simonyi, Bioorg. Med. Chem. 2007, 15, 4857–4862. M. Pistolozzi; C. Bertucci, Chirality 2008, 20, 552–558. R. V. Person; B. R. Peterson; D. A. Lightner, J. Am. Chem. Soc. 1994, 116, 42–59. C. Bombelli; C. Bernadini; G. Elemento; G. Manacini; A. Sorrenti; C. Villani, J. Am. Chem. Soc. 2008, 130, 2732–2733.

Biographical Sketches

Professor Nina Berova received her Ph.D. in chemistry in 1971 from the University of Sofia, Bulgaria. In 1982 she became an associate professor at the University of Sofia and at the Institute of Organic Chemistry, Bulgarian Academy of Sciences. In 1988 she joined the Department of Chemistry of Columbia University, New York, first as a visiting professor, and later she accepted her current position of research professor in the same department. She has been a recipient of many scholarships, among them a research fellowship at the University of Bochum, Germany (1989–92), a visiting professorship at the École Normale Supérieure de Lyon (1994), a lectureship from the Japan Society for the Promotion of Science (JSPS) (1996), and more recently visiting professorships at the University of Naples, the University of Santiago de Compostela, Spain, Tokyo Institute of Technology, and University ‘Louis Pasteur’, Strasbourg. Her research is focused on organic stereochemistry and chiroptical spectroscopy, in particular on electronic circular dichroism and its application in structural analysis. In 2000 she was coeditor and coauthor of the comprehensive monograph Circular Dichroism: Principles and Applications (first edition 1994), coedited with K. Nakanishi and R. W. Woody and published by Wiley-VCH. She has received various awards including the Gold Medal ‘Piero Pino’ (2003), the ACS/CA Editor Award (2005), and the ‘Chirality’ Gold Medal (2007). Since 1998 she has been the editor of the Wiley-Liss journal Chirality.

Dr. George A. Ellestad obtained a B.S. and M.S. in chemistry from Oregon State University in 1957 and 1958, respectively, and a Ph.D. in organic chemistry from the University of California, Los Angeles in 1962. After postdoctoral studies at the University of London he joined Lederle Laboratories in Pearl River, New York in 1964. His almost 40-year career at Lederle/Wyeth included structural and bioorganic studies on pharmacologically active mold metabolites; spermidine, glycopeptide, and tetracycline antibiotics; enediyne antitumor agents; and finally biophysical chemistry and enzymology for hit and lead characterization and assay development. His group’s efforts on the structure and DNA cleavage chemistry of the enediyne antitumor agent calicheamicin helped lead to Mylotarg, an antibody conjugate for use in the treatment of acute myelogenous leukemia. He also contributed to the development of Tygacil, a new semisynthetic tetracycline active against resistant bacterial infections that are no longer susceptible to previously useful antimicrobial agents. George retired from Wyeth in 2004 and in 2005 became an adjunct senior research chemist at Columbia University, working in the laboratory of Professors Koji Nakanishi and Nina Berova on the circular dichroism properties of porphyrin–DNA conjugates. Dr. Ellestad was the recipient of the 2006 ACS Medicinal Chemistry award.

Professor Nobuyuki Harada obtained his B.Sc., M.Sc., and Ph.D. degrees from Tohoku University in 1965, 1967, and 1970, respectively. In 1970 he joined the Chemical Research Institute of Nonaqueous Solutions, Tohoku University as a research associate. After postdoctoral studies at the Department of Chemistry, Columbia University, U.S.A. (1973–75) he was promoted to associate professor at the Chemical Research Institute of Nonaqueous Solutions, Tohoku University. In addition, he was appointed adjunct associate professor, Institute for Molecular Science, Okazaki National Research Institutes, Japan (1980–82) and visiting research scientist, R&D Department, Experimental Station, Du Pont de Nemours & Company, U.S.A. (1987). In 1992 he was promoted to professor, Institute for Chemical Reaction Science, Tohoku University. In 2006 he retired from Tohoku University and was appointed professor emeritus, Tohoku University. Since then he has been a visiting researcher and scholar, Department of Chemistry, Columbia University, U.S.A. Professor Harada’s research field covers (a) natural products chemistry and structural organic chemistry; (b) theory and development of the CD exciton chirality method; (c) enantioresolution and absolute configurational and conformational studies of chiral compounds by NMR and X-ray methods using novel reagents; and (d) molecular machines, in particular light-powered chiral molecular motors. In 1983 he published with Professor K. Nakanishi the monograph Circular Dichroic Spectroscopy – Exciton Coupling in Organic Stereochemistry (University Science Books, Mill Valley, California and Oxford University Press, Oxford). He has also contributed to the chemical community as an active editor of the journal Enantiomer (1996–2002), as an associate editor of Chirality (2003–05), and as the organizer of CD Conference 2001, Sendai. He received the Academic Prize from the Chemical Society of Japan in 1984 and the Molecular Chirality Award from the Molecular Chirality Research Organization, Japan in 2000.

9.05 Determination of Structure including Absolute Configuration of Bioactive Natural Products
Kenji Mori, The University of Tokyo, Tokyo, Japan
© 2010 Elsevier Ltd. All rights reserved.

9.05.1 Introduction
9.05.2 Absolute Configuration and Sign of Optical Rotation
9.05.3 Elucidating the Structure of Pheromones of Stink Bugs
9.05.4 Absolute Configuration Involving Remote Stereocenters
9.05.4.1 German Cockroach Pheromone
9.05.4.2 Plakoside A
9.05.5 Absolute Configuration Involving Stereocenters Separated by a Polymethylene Spacer
9.05.5.1 cis-Solamin
9.05.5.2 Murisolin
9.05.5.3 New World Screwworm Fly
9.05.6 Origin of Biological Homochirality
9.05.7 Exceptions to Biological Homochirality
9.05.8 Mimics of Bioactive Natural Products and Bioisosterism
9.05.9 Inventions of Pesticides and Medicinals
9.05.10 Conclusion
References

9.05.1 Introduction

Since the advent of modern physical tools such as UV, IR, 1H-NMR, 13C-NMR, mass spectrometry (MS), circular dichroism (CD), and X-ray analysis, the structure determination of bioactive small molecules has often been regarded as a routine operation for natural products chemists.1 Such a view, held by biologists and many chemists, is contestable, and two reviews have appeared recently, both treating incorrectly assigned structures of many natural products.2,3 Even X-ray analysis can be erroneous.3 There are some cases in which the correctly proposed structures of the presumably bioactive molecules do not represent the structures of the genuinely bioactive molecules, as shown by the bioassay of synthetic compounds with the proposed structures.2 This type of error usually stems from incorrect and nonreproducible bioassay methods employed for the biological phenomena under discussion. In the case of the complex marine polyether brevenal (1), the structure shown in the upper part of Figure 1 was proposed by Bourdelais et al.4 through extensive spectroscopic studies. After completing the synthesis of the proposed structure, Fuwa et al.5 revised the structure of brevenal to 1, because there were subtle discrepancies between the chemical shifts in the 1H- and 13C-NMR spectra of the left-hand region of the synthetic material and those of natural brevenal. It may therefore be important to evaluate a proposed structure by means of its synthesis.

Figure 1 Structure of brevenal (1).

9.05.2 Absolute Configuration and Sign of Optical Rotation

Semiochemicals are usually compounds with structures much simpler than that of brevenal. But even with simple compounds, it is possible to misassign the absolute configuration. Male-produced pheromone components of the flea beetle Aphthona flava were isolated and identified in 2001 by Bartelt et al.6 They proposed himachalene-type sesquiterpene structures 2–5 (Figure 2) for the components. In 2004, Mori and coworkers synthesized 2–5 and their enantiomers ent-2–ent-5 from the enantiomers of citronellal, and the pheromone components were found to possess the absolute configuration depicted in ent-2–ent-5.7 Mori’s assignments were opposite to those proposed by Bartelt et al., and indeed ent-2–ent-4 were pheromonally active against the Hungarian flea beetle Phyllotreta cruciferae, while 2–4 were inactive.8 Bartelt et al.6 proposed the absolute configuration 5 for their pheromone component on the basis of its positive rotation (in hexane), because Pandey and Dev9 reported positive rotation (in chloroform) for their synthetic 5. Mori10 synthesized ent-5 by employing (R)-ar-turmerone (6) as the key intermediate, and found it to be dextrorotatory in hexane but levorotatory in chloroform. The simple mistake of measuring the optical rotation in hexane, instead of the reported chloroform, resulted in misassignment of the absolute configuration of the pheromone components. A similar example had been reported in 1976.11 (1S,4S,5S)-cis-Verbenol (7) is a pheromone component of Ips bark beetles. Prior to Mori’s work,11 some researchers had called 7 (+)-cis-verbenol, while others referred to it as (−)-cis-verbenol. After synthesis of 7 and measurement of its optical rotation in different solvents, it became clear that 7 was dextrorotatory in acetone or methanol but levorotatory in chloroform. It is therefore of utmost importance to use the same solvent as reported by others when one compares the sign of the optical rotation of a new sample with previous data.
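The solvent caveat in this section can be made concrete with a short sketch. It uses the standard definition of specific rotation, [α] = 100·α/(l·c) with l in dm and c in g/100 mL, and refuses to compare signs across different solvents. The class name, function name, and all numerical values below are hypothetical illustrations and are not taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rotation:
    """One polarimetric measurement (all example values are hypothetical)."""
    observed_deg: float       # observed rotation, degrees
    path_dm: float            # cell path length, decimetres
    conc_g_per_100ml: float   # concentration, g per 100 mL
    solvent: str

    def specific_rotation(self) -> float:
        # [alpha] = 100 * alpha / (l * c), l in dm, c in g/100 mL
        return 100.0 * self.observed_deg / (self.path_dm * self.conc_g_per_100ml)


def same_sign(sample: Rotation, reference: Rotation) -> bool:
    """Compare signs of rotation, but only for measurements in the same solvent."""
    if sample.solvent.lower() != reference.solvent.lower():
        raise ValueError(
            f"solvents differ ({sample.solvent} vs {reference.solvent}); "
            "the sign of rotation may invert, so the comparison is not valid"
        )
    return (sample.specific_rotation() > 0) == (reference.specific_rotation() > 0)


if __name__ == "__main__":
    new_sample = Rotation(0.25, 1.0, 1.0, "hexane")        # hypothetical
    literature = Rotation(0.12, 1.0, 1.0, "chloroform")    # hypothetical
    try:
        same_sign(new_sample, literature)
    except ValueError as err:
        print("comparison rejected:", err)
```

Raising an error, rather than silently comparing, mirrors the point of the section: a sign comparison across solvents carries no stereochemical information.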

9.05.3 Elucidating the Structure of Pheromones of Stink Bugs

A simple example of the examination of a proposed structure through synthesis is provided in this section. In 2005, Takita12 proposed the structure of the male-produced aggregation pheromone of the stink bug Eysarcoris lewisi as the sesquisabinene alcohol, (E)-2-methyl-6-(4′-methylenebicyclo[3.1.0]hexyl)hept-2-en-1-ol (8) (Scheme 1). Mori13 synthesized (6R)-8 and (6S)-8 from the enantiomers of citronellal (10). The key steps were the intramolecular addition of an α-keto carbene to the alkene bond (11 → 12) and (E)-selective olefination of 13 to give 14. The 1H- and 13C-NMR spectra of 8 around the trisubstituted double bond at C-2 were different from those of the natural pheromone. (Z)-Alcohol 9 was therefore synthesized by (Z)-selective olefination of 13 with Ando’s reagent 15, giving 16. The 1H- and 13C-NMR spectra of synthetic (R,Z)-9 and (S,Z)-9 (both mixtures of diastereomers at C-1′ and C-5′) were very similar to those of the natural pheromone, and (R,Z)-9 was pheromonally active against E. lewisi.13 Mori’s synthesis, however, could not determine the relative configuration at C-1′ and C-5′ of the pheromone.


Figure 2 Absolute configuration of Aphthona flava pheromone components.

Mori et al.14 finally determined the absolute configuration of the pheromone as (2Z,6R,1′S,5′S)-9 by employing lipase-catalyzed asymmetric acetylation of 17′ as the key step (Scheme 2). Reduction of (6R)-12 with L-selectride afforded a mixture of 17 and 17′, the latter of which could be acetylated with vinyl acetate in the presence of lipase PS-D (Amano) to give 18. The remaining 17 was oxidized to give (6R,1′S,5′R)-12. Its absolute configuration was determined as depicted by CD comparison with (−)-sabina ketone 19 of known absolute configuration. (2Z,6R,1′S,5′S)-9 was synthesized from (6R,1′S,5′R)-12, while (6R,1′R,5′S)-12 yielded (2Z,6R,1′R,5′R)-9. NMR and GC comparisons of these two products with the natural pheromone revealed (2Z,6R,1′S,5′S)-9 to be the correct structure of the pheromone. Synthetic (2Z,6R,1′S,5′S)-9 was biologically active, and none of its stereoisomers was either active or inhibitory.

Scheme 1 Synthesis of the possible structures of the male-produced aggregation pheromone of the stink bug Eysarcoris lewisi. Reagents: (i) 37% CH2O, EtCO2H, pyrrolidine, i-PrOH (90%); (ii) LiAlH4, Et2O (91%); (iii) MeC(OEt)3, EtCO2H, heat (95%); (iv) KOH, aq. EtOH (83%); (v) NaOEt, EtOH; (vi) (COCl)2, C5H5N, hexane (quant., 2 steps); (vii) CH2N2, Et2O (quant.); (viii) Cu, CuSO4, cyclohexane, heat (58%); (ix) OsO4, NaIO4, THF, t-BuOH, H2O (quant.); (x) Ph3P=C(Me)CO2Et, THF, CH2Cl2 (57%); (xi) Ph3P(Me)Br, n-BuLi, THF (96%); (xii) i-Bu2AlH, toluene (55%).

Scheme 2 Synthesis of the male-produced aggregation pheromone of the stink bug Eysarcoris lewisi. Reagents: (i) (a) LiB(s-Bu)3H, THF; (b) 30% H2O2, dil. NaOH (94%); (ii) (a) lipase PS-D (Amano), CH2=CHOAc, Et2O, room temperature, 10–13 h, repeat three times; (b) SiO2 chromatography; (iii) n-Pr4NRuO4, NMO, MS 4 Å, CH2Cl2, room temperature, 5 h (quant.); (iv) K2CO3, MeOH (quant.).

9.05.4 Absolute Configuration Involving Remote Stereocenters

9.05.4.1 German Cockroach Pheromone

In 1974, Nishida et al.15–17 isolated and identified the components of the contact sex pheromone of the German cockroach Blattella germanica. They proposed the structures of the three components as 20, 21, and 22 (Figure 3). Their isolated amounts are shown in parentheses. As to the absolute configuration of 20 and 21, Nishida et al.18 proposed the 3S-configuration on the basis of their optical rotatory dispersion (ORD) spectra coupled with NMR studies employing a chiral shift reagent. No information was available to assign the absolute configuration at C-11, because the stereocenter at C-11 was separated from the C-3 stereocenter by seven methylene groups. Mori et al.19 established the absolute configuration of 20 and 21 as 3S,11S by synthesizing all four stereoisomers of 20 and 21 and comparing their physical properties with those of the natural products.

As shown in Figure 3, the stereoisomers of 20 and 21 were synthesized from (R)-isopulegol (23) via (R)-citronellic acid (24) of 92% ee (enantiomeric excess). Because the two stereocenters of 20 and 21 were separated, their stereoisomers showed identical 1H- and 13C-NMR spectra. However, their IR spectra as nujol mulls (i.e., in the solid state and not in solution) showed differences. Their optical rotations and melting points (mp’s) were also very important in assigning the absolute configuration of natural 20, as shown in Table 1. The natural ketone 20 was dextrorotatory in hexane, and (3S,11S)-20 as well as (3S,11R)-20 showed positive rotations, while (3R,11R)- and (3R,11S)-20 were levorotatory. The natural 20 must therefore be either (3S,11S)- or (3S,11R)-20.

Figure 3 Structures of the sex pheromone components of the German cockroach Blattella germanica and related compounds.

Table 1 Specific rotations, IR spectra, and mp’s of the natural and synthetic stereoisomers of 20 and their mixture mp’s with natural 20

As chloroform solutions, all the stereoisomers of 20 showed IR spectra that were identical to each other. Their 1H- and 13C-NMR spectra were also indistinguishable. However, when their IR spectra were measured as nujol mulls, the stereoisomeric and crystalline ketones 20 showed subtle differences in the spectra due to the difference in their crystal lattice structures. Thus, the IR spectrum of natural 20 was identical to those of (3S,11S)- and (3R,11R)-20. The natural 20 seemed to be (3S,11S)-20 at this stage. To confirm this conclusion, the mp’s of the four stereoisomers of 20 were measured, and mixture mp determinations of the four isomers with the natural 20 were carried out. As can be seen from Table 1, (3S,11S)- and (3R,11R)-20 showed the same mp as the natural 20. Mixture mp determinations revealed (3S,11S)-20 to be the natural 20, because it showed no depression.19 The classical mixture mp test is still useful in establishing the identity of two like samples. Similarly, the absolute configuration of the natural 21 could be established as 3S,11S.19 Later in 1990, highly pure (>99% ee) stereoisomers of 20 were synthesized from (R)-citronellal and ethyl (R)-3-hydroxybutanoate.20 Bioassay of the four pure isomers of 20 by Schal and coworkers21 showed that the natural pheromone (3S,11S)-20 was the least effective of the four stereoisomers at eliciting courtship responses in males. The German cockroach produces the least active (3S,11S)-20 due to the stereochemical restriction in the course of its biosynthesis.
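The assignment of natural 20 described above is an elimination over the four candidate stereoisomers, which can be sketched as successive set intersections. The qualitative outcomes encoded below are those stated in the text (sign of rotation in hexane, nujol-mull IR identity, mixture mp); the variable and function names are hypothetical, and the numerical data of Table 1 are not reproduced.

```python
# Candidate stereoisomers of ketone 20.
ISOMERS = {"3S,11S", "3S,11R", "3R,11S", "3R,11R"}

# For each test, the isomers whose behavior matched that of natural 20,
# as described qualitatively in the text.
evidence = {
    "positive rotation in hexane": {"3S,11S", "3S,11R"},
    "nujol-mull IR identical to natural": {"3S,11S", "3R,11R"},
    "no mixture mp depression": {"3S,11S"},
}


def surviving(candidates, tests):
    """Yield (test name, sorted remaining candidates) after applying each test."""
    remaining = set(candidates)
    for name, consistent in tests.items():
        remaining &= consistent  # keep only isomers consistent with this test
        yield name, sorted(remaining)


steps = list(surviving(ISOMERS, evidence))
for name, left in steps:
    print(f"{name}: {left}")
```

No single measurement is decisive here: rotation halves the candidates, the solid-state IR removes one more, and the mixture mp confirms the remaining isomer.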

9.05.4.2 Plakoside A

In 1997, plakoside A (25) (Figure 4) was isolated by Fattorusso and coworkers22 as an immunosuppressive metabolite of the Caribbean sponge Plakortis simplex. It is a structurally unique glycosphingolipid with a prenylated D-galactose moiety and cyclopropane-containing alkyl chains. Its 2S,3R,2‴R stereochemistry was proposed on the basis of the CD measurements of its degradation products.22 The absolute configuration at the two cyclopropane moieties of 25, however, remained unknown, although the cis-stereochemistry was suggested by detailed 1H-NMR analysis of 25.22 In 2000, Nicolaou et al.23 accomplished the synthesis of (2S,3R,11R,12S,2‴R,5‴Z,11‴R,12‴S)-25, and found its 1H- and 13C-NMR data to be identical to those reported for the natural 25. They therefore claimed their synthetic product to be identical to the natural product. However, in 2001, Seki and Mori24 synthesized both (2S,3R,11R,12S,2‴R,5‴Z,11‴R,12‴S)- and (2S,3R,11S,12R,2‴R,5‴Z,11‴S,12‴R)-25, both of which were spectroscopically indistinguishable from natural 25. Which stereoisomer of 25, then, is plakoside A?

In order to solve this problem, degradation studies of natural plakoside A (25) were executed as shown in Figure 5, and the degradation products were compared with synthetic samples of known absolute configuration.25 Lipase-catalyzed asymmetric acetylation of meso-diol 26 gave enantiomerically pure 27, which was converted to the enantiomers of the reference acids 28 and 29. These acids were derivatized and analyzed by high-performance liquid chromatography (HPLC) according to Ohrui and coworkers.26–28 Esterification of acids 28 and 29 with Ohrui’s chiral and fluorescent reagent ROH 30 yielded esters 31 and 32. All of these derivatives were separable by reversed-phase HPLC at a column temperature of −50 °C. Owing to the presence of the anthracene system in 31 and 32, picogram quantities were detectable by fluorescence, and therefore minute amounts of degradation products could be analyzed. Degradation of plakoside A pentaacetate (33) was executed by first treating it with nitrous acid in acetic anhydride, which effected N-nitrosation at the amide nitrogen of 33 to give 34 and 35; these were further cleaved to give 28 and 29, respectively. The mixture of 28 and 29 was derivatized with 30, and HPLC analysis showed the products to be (6S,7R)-31 and (9S,10R)-32. Accordingly, the absolute configuration of plakoside A must be (2S,3R,11S,12R,2‴R,5‴Z,11‴S,12‴R)-25. The synthetic product (2S,3R,11R,12S,2‴R,5‴Z,11‴R,12‴S)-25 of Nicolaou et al. turned out to be a diastereomer of plakoside A.25 A combination of enantioselective synthesis and HPLC analysis is a powerful method for determining the absolute configuration of a compound whose stereogenic centers are remote from other functionalities and stereogenic centers.


Figure 4 Structure of plakoside A (25).

Figure 5 Determination of the absolute configuration of plakoside A.

9.05.5 Absolute Configuration Involving Stereocenters Separated by a Polymethylene Spacer

9.05.5.1 cis-Solamin

Annonaceous acetogenins, isolated from plant species belonging to the Annonaceae (custard apple family), are waxy solids with cytotoxic and antitumor activity. They are characterized by the presence of one or more 2,5-disubstituted tetrahydrofuran rings connected to a butenolide through a polymethylene spacer. As exemplified by cis-solamin A (36) and cis-solamin B (37) (Figure 6), metabolites of the tropical fruit tree Annona muricata, two stereogenic moieties are separated by a polymethylene spacer. Within the tetrahydrofuran moiety, the relative configuration could be determined by NMR analysis as depicted, but the absolute configuration was difficult to determine. Brown and coworkers29 synthesized the four possible stereoisomers (36, ent-36, 37, and ent-37) of cis-solamin by the route summarized in Figure 6. The four synthetic isomers of cis-solamin were indistinguishable from each other and from natural cis-solamin on the basis of their IR, MS, 1H-NMR, and 13C-NMR spectra, owing to the length and flexibility of the spacer connecting the tetrahydrofurandiol and butenolide moieties. Optical rotation values obtained for each of the pairs of diastereomers were also very similar, consistent with the known fact that the contribution to optical rotation from the butenolide moiety dominates that from a pseudosymmetrical tetrahydrofurandiol region in acetogenins. In the course of this study, Brown and coworkers29 found that all four isomers (36, ent-36, 37, and ent-37) were separable by enantioselective HPLC employing a cyclodextrin-based stationary phase. Subsequently, Figadère and coworkers30 analyzed natural cis-solamin by enantioselective HPLC, and found it to be a 9:8 mixture of cis-solamin A (36) and cis-solamin B (37). Thus, natural cis-solamin was stereochemically heterogeneous, demonstrating that enantioselective chromatography is indeed a powerful technique in stereochemical studies of natural products. Although NMR and X-ray analysis are regarded as the most powerful techniques for structure elucidation, a chromatographic method gave the decisive evidence for the heterogeneity of cis-solamin.

Figure 6 Structure and synthesis of cis-solamin A (36).

9.05.5.2 Murisolin

In 2006, Curran et al.31 published an important paper on murisolin (43; Figure 7), another acetogenin, entitled ‘On the proof and disproof of natural product stereostructures’. They synthesized two 16-member stereoisomer libraries of murisolin isomers that provided 24 of the 32 possible diastereomers of murisolin (43). Each member of the 16-member sublibrary of murisolins was subjected to NMR analysis at 600 MHz (1H) and 150 MHz (13C). The library members have 4R,34S configurations in the butenolide moiety with all possible configurations at the remaining stereogenic centers in the tetrahydrofurandiol fragment. Every NMR spectrum belongs to one of only six groups, and the spectra within each group are substantially identical. Symmetry considerations of the simple model compounds 44 shown in Figure 7 help us to understand why there are only six groups. The butenolide group in 43 is substantially separated from the tetrahydrofurandiol moiety, and therefore the 1H-NMR spectrum of the former is not affected enough to show differences due to the stereochemistry of the latter. The six groups of 1H-NMR spectra were organized according to the local symmetry of the tetrahydrofurandiol moiety. On the basis of this NMR information, inspection of the NMR spectrum of a murisolin stereoisomer enables users to assign the relative configuration of its tetrahydrofurandiol region. Very small (0.1 ppm) differences were observed in the hydroxybutenolide region of the 150 MHz 13C-NMR spectra of murisolin stereoisomers, based on the syn/anti relative configuration at C-4 and C-34. Derivatization of 43 and its stereoisomers to tris-(S)-Mosher esters followed by NMR measurements revealed that murisolin-Mosher ester stereoisomers exhibited one of only 10 sets of 1H-NMR spectra. Through these observations it was possible to assign the 4R,15R,16R,19R,20R,34S configuration to murisolin (43), which was in accord with the previous proposals. Curran et al. also comment that enantioselective HPLC is superior to either optical rotation or melting point comparisons for proving or disproving structures, if all the candidate isomers are available. Their work shows that construction of stereoisomer libraries followed by thorough studies of their NMR and enantioselective HPLC behavior is an especially reliable way of elucidating the stereostructure of natural products. This type of thorough stereochemical analysis is likely to become more popular in the future in connection with advances in parallel synthesis.

Figure 7 Structure of murisolin (43) and group classifications of its stereoisomers on the basis of simple model compounds 44.

9.05.5.3 New World Screwworm Fly

Female-produced sex pheromones of the New World screwworm fly Cochliomyia hominivorax were first studied by Pomonis et al.32 in 1993. They isolated 16 pheromone candidates from the female flies, but they were unable to identify the pheromonally active compounds. In 2002, Mori and coworkers33 synthesized stereoisomeric mixtures of 45 and 46 (Figure 8), and these were found to be pheromonally active. Subsequently, all four stereoisomers of the more potent acetate 45 were synthesized as shown in Figure 8.34 In the course of the synthesis, the parent alcohol 50 was esterified with the anthracene-containing acid (1S,2S)-51, and the derived ester 52 was analyzed by HPLC at 25 C by the method of Ohrui and coworkers.26–28 All four diastereomers of 52 were separable, and therefore the stereochemical purities of the four isomers of 45 could be estimated as depicted.34 The four isomers of 45 showed identical IR, 1H-NMR, and 13C-NMR spectra. In addition, all of them were equally bioactive as the sex pheromone. Accordingly, their derivatization to 52 followed by HPLC analysis was the only way to distinguish the stereoisomers. Finally, the natural pheromone component was shown to be (6R,19R)-45 by its derivatization to 52 followed by HPLC analysis.35 Enantioselective HPLC or gas chromatography (GC) and chromatographic analysis after derivatization with Ohrui’s reagent seem to be the most sensitive methods for discriminating stereoisomers. Thus, it may be concluded that structural analysis must, from time to time, be carried out by employing several different analytical methods. Otherwise, mistakes are likely to occur.
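The isomer counts used in this section follow from simple enumeration, which the sketch below illustrates. It assumes that every stereogenic center can independently be R or S and that no meso forms arise; the function names are hypothetical and chosen only for illustration. Under those assumptions, the two centers of 45 (C-6 and C-19) give four stereoisomers, and the six centers of murisolin (C-4, C-15, C-16, C-19, C-20, and C-34) give 64 stereoisomers, that is, 32 enantiomeric pairs.

```python
from itertools import product


def enumerate_stereoisomers(centres):
    """List every R/S assignment for the given stereocentre labels.

    Assumes each centre is independently R or S (no meso forms).
    """
    return [
        ",".join(f"{label}{descriptor}" for label, descriptor in zip(centres, combo))
        for combo in product("RS", repeat=len(centres))
    ]


def count_enantiomeric_pairs(n_centres):
    """Number of enantiomer pairs for n independent stereocentres."""
    return 2 ** n_centres // 2


# Acetate 45: stereocentres at C-6 and C-19.
isomers_45 = enumerate_stereoisomers(["6", "19"])
print(isomers_45)

# Murisolin: six stereocentres in total.
murisolin_centres = ["4", "15", "16", "19", "20", "34"]
print(len(enumerate_stereoisomers(murisolin_centres)))
print(count_enantiomeric_pairs(len(murisolin_centres)))

# Sublibrary with C-4 and C-34 fixed: four centres remain variable.
print(len(enumerate_stereoisomers(["15", "16", "19", "20"])))
```

With C-4 and C-34 held fixed, the four remaining centers give 16 members, which matches the size of the sublibrary described for murisolin.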

9.05.6 Origin of Biological Homochirality

A characteristic hallmark of life is believed to be its ‘homochirality’.36 In general this is true, although natural products are not always enantiomerically pure.37 The origin of biomolecular homochirality is discussed in depth by MacDermott.36 Those who are interested to see whether the parity-violating weak force is the cosmic dissymmetry that Pasteur was looking for should read her chapter in the book entitled ‘Chirality in Natural and Applied Science’. Soai et al.38 discovered and developed asymmetric autocatalysis (Figure 9), in which the structures of the chiral catalyst (S)-54 and the chiral product (S)-54 are the same after the addition of diisopropylzinc to aldehyde 53. Consecutive asymmetric autocatalysis starting with (S)-54 of 0.6% ee amplifies its ee, and yields itself as the product with >99.5% ee. Even chiral inorganic crystals, such as quartz or sodium chlorate, act as chiral inducers in this reaction. Soai et al.’s asymmetric autocatalysis gives us an insight with which to speculate on the early asymmetric reactions on Earth. However, it can be argued whether such strictly anhydrous organometallic reactions are possible under nonartificial conditions. A phenomenon that may be related to the origin of biological homochirality was recently reported by Cooks and coworkers:39 serine sublimes with spontaneous chiral amplification. Sublimation of a near-racemic sample of serine 55 (Figure 9) yields a sublimate that is enriched in the major enantiomer. The chiral purity reaches its maximum at 190–210 °C, and then falls as thermolysis becomes favorable. This simple one-step sublimation may represent a possible mechanism for the chiral amplification step in the origin of biological homochirality.
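To put the amplification figures in perspective, enantiomeric excess can be converted into the composition of the mixture. The sketch below uses the standard definition ee = (major − minor)/(major + minor) × 100; the function names are hypothetical. A 0.6% ee corresponds to roughly a 50.3:49.7 mixture, whereas 99.5% ee corresponds to about 99.75:0.25.

```python
def ee_to_fractions(ee_percent):
    """Convert enantiomeric excess (%) to (major %, minor %) of the mixture."""
    major = (100.0 + ee_percent) / 2.0
    minor = (100.0 - ee_percent) / 2.0
    return major, minor


def fractions_to_ee(major, minor):
    """Convert amounts of the two enantiomers to enantiomeric excess (%)."""
    return 100.0 * (major - minor) / (major + minor)


# Starting catalyst and final product quoted for asymmetric autocatalysis.
for ee in (0.6, 99.5):
    major, minor = ee_to_fractions(ee)
    print(f"{ee}% ee -> {major:.2f}% major, {minor:.2f}% minor")
```

Seen this way, the starting catalyst is almost racemic, which is why the amplification to a nearly single enantiomer is remarkable.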


Figure 8 Synthesis of four stereoisomers of the most potent component 45 of the female sex pheromone of the New World screwworm fly Cochliomyia hominivorax.

9.05.7 Exceptions to Biological Homochirality

Until recently, we believed our human bodies to be constituted from L-amino acids only. Advances in analytical methods now indicate that there are a number of D-amino acids in human bodies, as detailed in the review by Fujii and Saito.40 Free D-serine was observed predominantly in mammalian brain, and free D-aspartic acid (56; Figure 10) exists in various mammalian tissues. For example, in the prefrontal cortex of human brain, as much as 60% of the total aspartic acid was in the D-form at week 14 of gestation, but this rapidly decreased to trace levels by the time of birth. D-Amino acids were detected in various aged human tissues such as tooth, bone, aorta, brain, erythrocytes, eye lens, skin, ligament, and lung. In particular, D-serine was found in the β-amyloid protein of Alzheimer’s disease. Stereoinversion of L-aspartic acid to D-aspartic acid takes place in alpha A and alpha B crystallins of human lens. Thus, aged persons possess more D-aspartic acid in the lens. This phenomenon seems to be related to cataract.40 The fluctuation of the amount of free D-amino acids in living bodies suggests that D-amino acids might be one of the factors controlling the generation and differentiation of cells or tissues. D-Amino acids in proteins can be interpreted as molecular markers of aging.40 Another review is available concerning the occurrence and functions of free D-aspartic acid and its metabolizing enzymes.41 As already described in the first edition of this Comprehensive Series,42 the limpet Acmaea (Collisella) limatula produces a defensive metabolite, limatulone, both as the racemate (57 and its enantiomer) and as the meso isomer 58.43 This example demonstrates that nature does not always produce enantiomerically pure compounds. In 2006, Rezanka et al.44 isolated an antifeedant, syriacin (59), from the freshwater sponge Ephydatia syriaca in the Jordan river. It is an unusual sulfated ceramide glycoside with a branched-chain sphingosine and also a branched-chain fatty acid. From the viewpoint of absolute configuration, 59 is very unusual, because it contains both (R)- and (S)-configured sec-butyl terminals in its alkyl chains. It seems to be biosynthesized from precursors with opposite absolute configurations.

Figure 9 Structures of compounds in connection with biological homochirality (1).

Figure 10 Structures of compounds in connection with biological homochirality (2).

9.05.8 Mimics of Bioactive Natural Products and Bioisosterism

There are practical demands for the invention of pheromone mimics, because pheromones are often too labile to be used in pest control. Various mimics have been prepared to date, several of which will be described in this section. Tacke et al.45 synthesized the enantiomers of sila-linalool (61) as shown in Figure 11. The starting material 60 was converted into (±)-61, which was resolved by GC to give both (+)-61 and (–)-61. Both enantiomers were bioactive as tested by electroantennographic detection (EAD) on the males of the vernal solitary bee Colletes cunicularius. There was no major difference between the bioactivity of the sila-pheromone 61 and that of natural linalool. The substitution of a carbon atom by silicon provides a good example of bioisosterism. (1S,5R)-Frontalin (62) is the aggregation pheromone of bark beetles such as Dendroctonus brevicomis and D. frontalis. Strunz et al.46 synthesized its isomer 63, which was shown to be pheromonally active. Bravo et al.47 synthesized the trifluoro analogue 64 of frontalin. Its bioactivity, however, was not reported. (4S,5R)-Eldanolide (65) is the male-produced sex pheromone of the African sugarcane borer Eldana saccharina. Itoh et al.48,49 reported the synthesis and pheromonal activity of its fluorinated analogues 66–68. Two analogues, 66 and ent-66, were bioactive, while the remaining four analogues showed no activity as revealed by EAD. (7R,8S)-Disparlure (69) is the female-produced sex pheromone of the gypsy moth Lymantria dispar. Plettner and coworkers50 synthesized and bioassayed its 5-oxa analogues 70 and ent-70. GC-EAD bioassay revealed both 70 and ent-70 to be bioactive. The dose–response curves for 70 and ent-70 were similar. Interestingly, pheromone-binding protein 1 (PBP1), which binds (7S,8R)-ent-69 strongly, binds 70 and ent-70 with nearly the same affinity as ent-69. The affinity of PBP1 for naturally occurring (7R,8S)-69 is known to be much weaker than for ent-69. Neither 70 nor ent-70 functioned as a pheromone inhibitor. The concept of bioisosterism works in this case, too, although with a subtle difference. (4R,8R)-4,8-Dimethyldecanal (71; tribolure) is the aggregation pheromone of the flour beetles Tribolium castaneum and T. confusum. Owing to the air sensitivity of 71 as an aldehyde, the more stable formate ester 72 was synthesized. This was found to be bioactive and was used in commercial pheromone traps.51 This is an example of bioisosterism in which a carbon atom is replaced by an oxygen atom. (2S,3R,1′R)-Stegobinone (73) is the female sex pheromone of the drugstore beetle Stegobium paniceum. Its (2S,3R,1′S) isomer is a strong inhibitor of pheromone action. The methyl group at C-1′ of 73 is so readily epimerizable that natural 73 soon becomes biologically inactive, and 73 cannot be used practically.


Figure 11 Structures of pheromones and their mimics.


Scientists at Fuji Flavor Co. synthesized stegobiene (74), which showed pheromone activity and could be used commercially to monitor the population of the drugstore beetle. The female sex pheromone (R)-75 of the Israeli pine bast scale Matsucoccus josephi is also a potent kairomone that attracts the scale insect’s predator Elatophilus hebraicus. A mimic 76 of the pheromone 75 shows only the pheromone activity with no kairomone activity.52,53 Accordingly, 76 is a more useful population-monitoring agent for M. josephi than the pheromone itself, which also catches the beneficial predator E. hebraicus.

9.05.9 Inventions of Pesticides and Medicinals

Natural products continue to be prototypes of pesticides and medicinals. Chemists’ creativity and efforts have brought about many new mimics that are more potent, more economical, and more stable or safer than the original natural products. Pyrethrum powder is the dried flowerheads of Chrysanthemum cinerariaefolium and has been used widely as an insecticide. Its active principle was studied by L. Ruzicka, H. Staudinger, R. Yamamoto, and others, and pyrethrin I (77) (Figure 12) was identified as the major component.54 Even now, 80 years after the elucidation of the structure, modification of 77 continues to generate a group of insecticides called pyrethroids. Allethrin (78) was the first mimic to be manufactured on a large scale, by Sumitomo Chemical Co. in 1953. Subsequently, in 1979 Sumitomo developed (S)-fenvalerate (79), while in 1981 etofenprox (80) was commercialized by Mitsui Chemical Co. These two compounds are stable under field conditions and are widely used as agricultural insecticides.55 Agelasphin 9b (81) and its relatives were isolated from the Okinawan marine sponge Agelas mauritianus as glycosphingolipids and they exhibited anticancer activity in vivo in mice and humans.56 By simplifying the structure of 81, researchers at Kirin Brewery Co. developed KRN7000 (82) as an anticancer drug candidate.57 It has been shown that KRN7000 (82) is a ligand that forms a complex with CD1d protein, a glycolipid presentation protein on the surface of the antigen-presenting cells of the immune system. The lipid alkyl chains of 82 are bound in the grooves in the interior of the CD1d protein and the galactose head group of 82 is presented to the invariant Vα14 antigen receptors of natural killer (NK)T cells. After activation by recognition of the CD1d–82 complex, NKT cells release both T helper 1 (Th1) and Th2 types of cytokines simultaneously in large quantities. Th1-type cytokines such as interferon (IFN)-γ mediate protective immune functions like tumor rejection, whereas Th2-type cytokines such as interleukin (IL)-4 mediate regulatory immune functions to ameliorate autoimmune diseases. Th1- and Th2-type cytokines can antagonize each other’s biological actions. Because of this antagonism, the use of 82 for clinical therapy has not been successful yet. To circumvent this problem, many research groups modified the structure of KRN7000 (82) to develop new analogues that induce NKT cells to produce either Th1- or Th2-type cytokines. Modification of the α-galactosyl part of 82 afforded α-C-GalCer (83)58 and RCAI-56 (84).59 These two compounds showed an enhanced Th1-type response in vivo to generate IFN-γ. Modification of the phytosphingosine part of 82 by shortening the alkyl chain to give OCH (85) resulted in an enhanced Th2-type response in vivo to produce IL-4.60 Introduction of an aromatic ring at the end of the fatty acid chain to give 86 caused enhanced IFN-γ production.61 Chemical modification of the parent KRN7000 (82) turned out to be a promising way to invent a more specific anticancer drug candidate. It is true that we can see computer-generated docking models of the bioactive prototype compounds and their receptors. Even so, invention of mimics is restricted by the limited human capacity to imagine only conventional changes in the functional groups and skeletons of the parent compounds. Natural products will certainly give us new and vast opportunities to find unusual structures beyond our imagination. There are many mimics of natural products in the areas of taste, flavor, and fragrance. These interesting topics have been treated in Volume 4.


Figure 12 Structures of natural product-inspired pesticides and medicinals.

9.05.10 Conclusion

This chapter has treated three important points in the study of bioactive natural products. First, recent examples of the determination of structure including the absolute configuration of bioactive natural products have been discussed, emphasizing the techniques used to solve stereochemical problems among compounds with remote stereogenic moieties separated by a polymethylene spacer. Case studies with pheromones and acetogenins have been given to illustrate the problems and solutions. Second, problems related to biological homochirality have been discussed to contemplate its origin and also to see exceptions to the homochirality principle. The presence and roles of D-amino acids in organisms as well as the presence of stereochemically heterogeneous compounds have been illustrated with examples. Third, invention of mimics of bioactive natural products has been briefly discussed to show the importance of natural products as prototypes of pesticides and medicinals. There are so many new discoveries in natural products research that no one can be an expert in all the areas unless content to be superficial. Thus, we have to remember the following words of Apostle Paul, “The person who thinks he knows something really does not know as he ought to know.” (I Corinthians 8:2).

Abbreviations

EAD electroantennographic detection
HPLC high-performance liquid chromatography
IFN interferon
IL interleukin
NK natural killer
mp melting point
ORD optical rotatory dispersion
PBP1 pheromone-binding protein 1

References

1. K. Mori, Ed., Comprehensive Natural Products Chemistry, Vol. 8: Miscellaneous Natural Products including Marine Natural Products, Pheromones, Plant Hormones, and Aspect of Ecology; Elsevier: Oxford, 1999; Chapter 1, pp 2–6.
2. K. Mori, Chem. Rec. 2005, 5, 1–6.
3. K. C. Nicolaou; S. A. Snyder, Angew. Chem. Int. Ed. Engl. 2005, 44, 1012–1044.
4. A. J. Bourdelais; H. M. Jacocks; J. L. C. Wright; P. M. Bigwarfe, Jr.; D. G. Baden, J. Nat. Prod. 2005, 68, 2–6.
5. H. Fuwa; M. Ebine; A. J. Bourdelais; D. G. Baden; M. Sasaki, J. Am. Chem. Soc. 2006, 128, 16989–16999.
6. R. J. Bartelt; A. A. Cossé; B. W. Zilkowski; D. Weisleder; F. A. Momany, J. Chem. Ecol. 2001, 27, 2397–2423.
7. S. Muto; M. Bando; K. Mori, Eur. J. Org. Chem. 2004, 1946–1952.
8. M. Tóth; E. Csonka; R. J. Bartelt; A. A. Cossé; B. W. Zilkowski; S. Muto; K. Mori, J. Chem. Ecol. 2005, 31, 2705–2720.
9. R. C. Pandey; S. Dev, Tetrahedron 1968, 24, 3829–3839.
10. K. Mori, Tetrahedron: Asymmetry 2005, 16, 685–692.
11. K. Mori; N. Mizumachi; M. Matsui, Agric. Biol. Chem. 1976, 40, 1611–1615.
12. M. Takita, Tohoku Nogyo Kenkyu Seika Joho 2005, 19, 50–51.
13. K. Mori, Tetrahedron: Asymmetry 2007, 18, 838–846.
14. K. Mori; T. Tashiro; T. Yoshimura; M. Takita; J. Tabata; S. Hiradate; H. Sugie, Tetrahedron Lett. 2008, 49, 354–357.
15. R. Nishida; H. Fukami; S. Ishii, Experientia 1974, 30, 978–979.
16. R. Nishida; H. Fukami; S. Ishii, Appl. Entomol. Zool. 1975, 10, 10–18.
17. R. Nishida; T. Sato; Y. Kuwahara; H. Fukami; S. Ishii, J. Chem. Ecol. 1976, 2, 449–455.
18. R. Nishida; Y. Kuwahara; H. Fukami; S. Ishii, J. Chem. Ecol. 1979, 5, 289–297.
19. K. Mori; S. Masuda; T. Suguro, Tetrahedron 1981, 37, 1329–1340.
20. K. Mori; H. Takikawa, Tetrahedron 1990, 46, 4473–4486.
21. D. Eliyahu; K. Mori; H. Takikawa; W. S. Leal; C. Schal, J. Chem. Ecol. 2004, 30, 1839–1848.
22. V. Costantino; E. Fattorusso; A. Mangoni; M. Di Rosa; A. Ianaro, J. Am. Chem. Soc. 1997, 119, 12465–12470.
23. K. C. Nicolaou; J. Li; G. Zanke, Helv. Chim. Acta 2000, 83, 1977–2006.
24. M. Seki; K. Mori, Eur. J. Org. Chem. 2001, 3797–3809.
25. T. Tashiro; K. Akasaka; H. Ohrui; E. Fattorusso; K. Mori, Eur. J. Org. Chem. 2002, 3659–3665.
26. H. Ohrui; H. Terashima; K. Imaizumi; K. Akasaka, Proc. Jpn. Acad. Ser. B Phys. Biol. Sci. 2002, 78, 69–72.
27. K. Imaizumi; H. Terashima; K. Akasaka; H. Ohrui, Anal. Sci. 2003, 19, 1243–1249.
28. H. Ohrui, Proc. Jpn. Acad. Ser. B Phys. Biol. Sci. 2007, 83, 127–135.
29. A. R. L. Cecil; Y. Hu; M. J. Vincent; R. Duncan; R. C. D. Brown, J. Org. Chem. 2004, 69, 3368–3374.
30. Y. Hu; A. R. L. Cecil; X. Frank; C. Gleye; B. Figadère; R. C. D. Brown, Org. Biomol. Chem. 2006, 4, 1217–1219.
31. D. P. Curran; Q. Zhang; H. Lu; V. Gudipati, J. Am. Chem. Soc. 2006, 128, 9943–9956.
32. J. G. Pomonis; L. Hammack; H. Hakk, J. Chem. Ecol. 1993, 19, 985–1007.
33. A. Furukawa; C. Shibata; K. Mori, Biosci. Biotechnol. Biochem. 2002, 66, 1164–1169.
34. K. Mori; T. Ohtaki; H. Ohrui; D. R. Berkebile; D. A. Carlson, Eur. J. Org. Chem. 2004, 1089–1096.
35. K. Akasaka; D. A. Carlson; T. Ohtaki; H. Ohrui; K. Mori; D. R. Berkebile, to be submitted.
36. A. J. MacDermott, The Origin of Biomolecular Chirality. In Chirality in Natural and Applied Science; W. J. Lough, I. W. Wainer, Eds.; CRC Press: Boca Raton, FL, 2002; Chapter 2, pp 23–52.
37. K. Mori, Acc. Chem. Res. 2000, 33, 102–110.
38. K. Soai; I. Sato; T. Shibata, Chem. Rec. 2001, 1, 321–332.
39. R. H. Perry; C. Wu; M. Nefliu; R. G. Cooks, Chem. Commun. 2007, 1071–1073.
40. M. Fujii; T. Saito, Chem. Rec. 2004, 4, 267–278.
41. R. Yamada; Y. Kera; S. Takahashi, Chem. Rec. 2006, 6, 259–266.
42. K. Mori, Overview. In Comprehensive Natural Products Chemistry, Vol. 8: Miscellaneous Natural Products including Marine Natural Products, Pheromones, Plant Hormones, and Aspect of Ecology; K. Mori, Ed.; Elsevier: Oxford, 1999; Chapter 1, pp 7–10.
43. K. Mori; H. Takikawa; M. Kido, J. Chem. Soc. Perkin Trans. 1 1993, 169–179.
44. T. Rezanka; K. Sigler; V. M. Dembitsky, Tetrahedron 2006, 62, 5937–5943.
45. R. Tacke; T. Schmidt; M. Hofmann; T. Tolasch; W. Francke, Organometallics 2003, 22, 370–372.
46. G. M. Strunz; C.-M. Yu; L. Ya; P. S. White; E. A. Dixon, Can. J. Chem. 1990, 68, 782–786.
47. P. Bravo; E. Corradi; M. Frigerio; S. V. Meille; W. Panzeri; C. Pesenti; F. Viani, Tetrahedron Lett. 1999, 40, 6317–6320.
48. T. Itoh; K. Sakabe; K. Kudo; P. Zagatti; M. Renou, Tetrahedron Lett. 1998, 39, 4071–4074.
49. T. Itoh; K. Sudo; K. Yokota; N. Tanaka; S. Hayase; M. Renou, Eur. J. Org. Chem. 2004, 406–412.
50. J. A. H. Inkster; I. Ling; N. S. Honson; L. Jacquet; R. Gries; E. Plettner, Tetrahedron: Asymmetry 2005, 16, 3773–3784.
51. K. Mori; S. Kuwahara; M. Fujiwhara, Proc. Indian Acad. Sci. (Chem. Sci.) 1988, 100, 113–117.
52. S. Kurosawa; M. Takenaka; E. Dunkelblum; Z. Mendel; K. Mori, ChemBioChem 2000, 1, 56–66.
53. E. Dunkelblum; M. Harel; F. Assael; K. Mori; Z. Mendel, J. Chem. Ecol. 2000, 26, 1649–1657.
54. E. D. Morgan; I. D. Wilson, Insect Hormones and Insect Chemical Ecology. In Comprehensive Natural Products Chemistry, Vol. 8: Miscellaneous Natural Products including Marine Natural Products, Pheromones, Plant Hormones, and Aspect of Ecology; K. Mori, Ed.; Elsevier: Oxford, 1999; Chapter 5, pp 333–336.
55. K. Mori, Searching Environmentally Benign Methods for Pest Control: Reflection of a Synthetic Chemist. In Pesticide Chemistry. Crop Protection, Public Health, Environmental Safety; H. Ohkawa, H. Miyagawa, P. W. Lee, Eds.; Wiley-VCH Verlag: Weinheim, 2007; Chapter 2, pp 13–22.
56. T. Natori; M. Morita; K. Akimoto; Y. Koezuka, Tetrahedron 1994, 50, 2771–2776.
57. M. Morita; K. Motoki; K. Akimoto; T. Natori; T. Sakai; E. Sawa; K. Yamaji; Y. Koezuka; E. Kobayashi; H. Fukushima, J. Med. Chem. 1995, 38, 2176–2187.
58. G. Yang; J. Schmieg; M. Tsuji; R. W. Franck, Angew. Chem. Int. Ed. Engl. 2004, 43, 3818–3822.
59. T. Tashiro; R. Nakagawa; T. Hirokawa; S. Inoue; H. Watarai; M. Taniguchi; K. Mori, Tetrahedron Lett. 2007, 48, 3343–3347.
60. K. Murata; T. Toba; K. Nakanishi; B. Takahashi; T. Yamamura; S. Miyake; H. Annoura, J. Org. Chem. 2005, 70, 2398–2401.
61. M. Fujino; D. Wu; R. Garcia-Navarro; D. D. Ho; M. Tsuji; C.-H. Wong, J. Am. Chem. Soc. 2006, 128, 9022–9023.

Biographical Sketch

Kenji Mori was born in 1935. In all, he spent 42 years at the University of Tokyo. He holds B.Sc. (agricultural chemistry, 1957), M.Sc. (biochemistry, 1959), and Ph.D. (organic chemistry, 1962) degrees. He was appointed as assistant professor in the Department of Agricultural Chemistry at the University of Tokyo (1962), and was promoted to associate professor (1968) and professor (1978–95). Currently, he is Professor Emeritus. Dr. Mori worked for 7 years (1995–2001) as a professor at the Science University of Tokyo. At present, he is a research consultant at RIKEN (Institute of Physical and Chemical Research) and at Toyo Gosei Co., Ltd. He was awarded the Japan Academy Prize (1981), the Silver Medal of the International Society of Chemical Ecology (1996), the American Chemical Society’s Ernest Guenther Award in the Chemistry of Natural Products (1999), the Special Prize of the Society of Synthetic Organic Chemistry, Japan (2003), and the Frantisek Sorm Memorial Medal of the Academy of the Czech Republic (2003).

9.06 NMR – Small Molecules and Analysis of Complex Mixtures

Arthur S. Edison, University of Florida, Gainesville, FL, USA
Frank C. Schroeder, Cornell University, Ithaca, NY, USA

© 2010 Elsevier Ltd. All rights reserved.

9.06.1 Introduction
9.06.1.1 Sensitivity
9.06.1.2 Mixtures
9.06.2 Routine NMR Spectroscopy for Natural Products Structure Elucidation
9.06.2.1 COSY and TOCSY
9.06.2.2 HSQC and HMQC
9.06.2.3 HMBC
9.06.2.4 NOESY and ROESY
9.06.2.5 Other Techniques and Current Limitations
9.06.3 Complex Mixtures
9.06.3.1 NMR Spectroscopic Analysis of Complex Natural Products Mixtures
9.06.3.1.1 Comparing NMR spectroscopic characterization of proteins and small molecule mixtures
9.06.3.1.2 NMR spectroscopy versus mass-spectrometry-based approaches for characterizing crude mixtures
9.06.3.2 Differential Analysis through 2D NMR Spectroscopy
9.06.3.2.1 DANS for screening of a fungal extract library
9.06.3.2.2 DANS-based identification of bacillaene
9.06.3.2.3 Identification of signaling molecules in Caenorhabditis elegans through DANS
9.06.3.3 Complex Mixture Analysis by NMR
9.06.3.4 Metabolomics/Metabonomics
9.06.4 Methods to Improve Sensitivity
9.06.4.1 Specialized NMR Probes
9.06.4.2 Signal-to-Noise Issues
9.06.4.3 Small Coils
9.06.4.4 Cooling the Electronics
9.06.4.5 High-Temperature Superconducting Coils
9.06.4.6 Probe Summary
9.06.4.7 Dynamic Nuclear Polarization
9.06.5 Outlook
References

9.06.1 Introduction

Modern spectroscopic techniques have revolutionized compound identification and quantification. Only a few decades ago, identification of a structurally complex natural product would require multigram quantities of isolated material, which would then be subjected to a series of derivatization and degradation experiments, aiming to deduce the unknown’s structure from that of resulting derivatives or fragments that may represent known compounds. As a result of the tremendous advances in sensitivity and resolution of NMR spectroscopy over the past 30 years, identification of microgram quantities of new compounds has now become routine. For example, the structure of the polyketide antibiotic, erythromycin (1), was identified in 1957 only after extensive chemical and spectroscopic studies based on multigram amounts of isolated compound.1–3 By the time its chemical structure was finally identified, erythromycin had already found extensive use in human medicine. Today, natural products of similar complexity, for example, hemi-phorboxazole A (2), are routinely identified based on samples of 100 µg or less4–7 (Figure 1). Of course, factors such as the structural complexity and novelty of the discovered compounds must be considered when making such comparisons. Whereas it may not be particularly challenging to design an analytical method that can reliably detect 10⁻⁹ mol of a known compound (e.g., a pesticide residue), determining the structure of an unknown natural product based on 10⁻⁹ mol of sample will likely present great difficulty.8

Figure 1 Structures of the polyketide antibiotic erythromycin (1) and the marine polyketide hemi-phorboxazole A (2), which was recently identified based on a 16.5 µg (28 nmol) sample isolated from Phorbas sp.7
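A quick consistency check connects the mass and molar amounts quoted for the hemi-phorboxazole A sample in Figure 1. The sketch below (with a hypothetical helper name) divides mass by amount of substance: 28 nmol weighing 16.5 µg implies a molar mass near 589 g/mol, a plausible value for a polyketide of this size, whereas 28 nmol weighing 16.5 mg would imply an implausible value near 589,000 g/mol.

```python
def implied_molar_mass(mass_g, amount_mol):
    """Molar mass (g/mol) implied by a sample mass and an amount of substance."""
    return mass_g / amount_mol


MICROGRAM = 1e-6  # g
MILLIGRAM = 1e-3  # g
NANOMOLE = 1e-9   # mol

# Sample quoted for hemi-phorboxazole A: 28 nmol.
as_micrograms = implied_molar_mass(16.5 * MICROGRAM, 28 * NANOMOLE)
as_milligrams = implied_molar_mass(16.5 * MILLIGRAM, 28 * NANOMOLE)

print(f"16.5 ug / 28 nmol -> {as_micrograms:.0f} g/mol")
print(f"16.5 mg / 28 nmol -> {as_milligrams:.0f} g/mol")
```

The same check is useful whenever a sample is reported both by mass and by moles, since only one unit prefix gives a chemically sensible molar mass.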

9.06.1.1 Sensitivity

A recent example illustrates how increases in sensitivity and the advent of multidimensional NMR spectroscopy have truly revolutionized organic structure determination. Identification of the first cardiotonic steroids from an invertebrate source in the late 1970s required the extraction of 28 000 Photinus pyralis fireflies. The crude extract was fractionated into five pure fractions, representing amounts from over 1 g down to 70 mg, which were then characterized by a combination of chemical and spectroscopic methods. Key structural information was afforded by one-dimensional 1H- and 13C-NMR spectroscopic analyses using a modest 250 MHz NMR spectrometer, resulting in identification of the bufadienolide (3).9 Just over 25 years later, a similar analysis was carried out using a partially purified extract obtained from only 50 fireflies of the rare species, Lucidota atra. A 600 MHz spectrometer equipped with a microcoil probe was used,10 allowing the characterization of 13 new bufadienolides (e.g., 4) present in amounts ranging from 20 to 75 µg, corresponding to a decrease in sample requirement of roughly four orders of magnitude.4 In another example, the disulfated steroid (5) was identified based on a sample of only 4 µg (6 nmol) and a 1.7 mm microprobe at 500 MHz. The steroid (5) functions as a ‘sperm attracting and activating factor’ (SAAF) in chemical signaling systems of the ascidian (sea squirt) Ciona intestinalis (Figure 2).11 Improvements in NMR spectroscopic sensitivity also benefit studies aimed at elucidating the biological context of natural products. For example, NMR spectroscopic analysis of insect metabolites traditionally necessitated the pooling of material collected from multiple individuals, effectively eliminating NMR spectroscopy as a technique for the detailed analysis of metabolite dynamics in ecological studies. However, sensitivity increases derived from microsample NMR technology enabled Dossey et al.12 to analyze metabolite mixtures within individual walking sticks, Anisomorpha buprestoides, permitting complete characterization of the iridoid anisomorphal (6) from a single insect specimen. Using the A. buprestoides secretion as a model system, subsequent work by Zhang et al.13 demonstrated the application of covariance-based mixture analysis to automatically identify individual components in an unpurified sample. The ability to analyze individual specimens by both NMR spectroscopy and MS holds considerable promise for future biological studies.

NMR – Small Molecules and Analysis of Complex Mixtures

171

Figure 2 Bufadienolides (3 and 4) from fireflies, SAAF (5) from the ascidian Ciona intestinalis, and anisomorphal (6) from walking sticks.

9.06.1.2 Mixtures

Traditionally, detailed NMR spectroscopic characterization of natural product samples was not initiated until largely pure samples of individual compounds had been obtained, usually through extensive chromatographic fractionation. However, the potential advantages of structure identification of individual components in mixtures have been widely recognized since at least the mid-1990s. Several diffusion-based techniques were developed to aid in this process, including DOSY (diffusion-ordered spectroscopy)14 and DECODES (diffusion-encoded spectroscopy).15 Unfortunately, these methods often fail to resolve multiply overlapping signals and suffer from low dynamic range, which reduces their utility for structure determination in complex mixtures of organic small molecules. As a result, DOSY and related methods were primarily used to analyze mixtures of synthetic products16–19 and never found widespread use in natural products research. Using 2D NMR spectroscopy for the analysis of complex natural products mixtures recently regained momentum, as several studies demonstrated that using simple dqfCOSY (double-quantum-filtered correlation spectroscopy), TOCSY, HSQC, or HMBC spectra for complex mixtures offers exciting new perspectives for natural products research and chemical biology. Compared to mass spectrometric (MS) analyses of small molecule mixtures, such 2D NMR spectroscopic investigations offer the benefit of more detailed structural information, which is of particular relevance for the detection of unanticipated chemotypes. Recent examples include the identification of sulfated nucleosides, such as 7, from spider venom,20,21 the detection of ascarosides (e.g., 8) as part of the mating signal in the nematode Caenorhabditis elegans,22,23 and the identification of the highly unstable polyketide bacillaene (9) from Bacillus subtilis.24 These studies show that, using state-of-the-art NMR spectroscopy, even minor components of complex small molecule mixtures can be characterized. Such NMR spectroscopic analyses of complex mixtures may not always permit complete structural assignments; however, additional results from mass spectrometric analyses frequently allow complete structures to be proposed. As a result, the need for chromatographic separations is greatly reduced, which not only accelerates compound discovery, but also offers distinct advantages for the discovery of chemically unstable metabolites. It seems likely that the pervasive use of chromatography in natural products chemistry has skewed our knowledge of secondary metabolism, because sensitive compounds often do not survive extended exposure to solvents or chromatography media. In fact, the original motivation to explore the utility of high-resolution 2D NMR spectroscopy for the characterization of small molecule mixtures arose because alkaloids present in the poison gland secretion of Myrmicaria ants were found to be too unstable for chromatographic isolation.25,26 Myrmicarin


Figure 3 Natural products identified from complex metabolite mixtures.

430A (10), the most unstable of the Myrmicaria alkaloids identified so far, thus represents one of the first members of a growing class of natural products that have never been isolated in pure form (Figure 3). Advanced processing of spectroscopic data, taking advantage of statistical tools originally developed for metabolomics studies such as STOCSY,27 SHY,28 and others,29–32 could further enhance the utility of NMR spectroscopy and MS for analyzing natural products mixtures and correlating chemical information with biological data. However, to date there have been few reports on the application of metabolomics techniques to natural products research.33

In this chapter, we start with a brief overview of the standard methods currently used for the NMR spectroscopic identification of natural products and other types of organic small molecules, which is followed by a section dedicated to NMR spectroscopic characterization of small molecule mixtures and a discussion of approaches to increase NMR spectroscopic sensitivity.

9.06.2 Routine NMR Spectroscopy for Natural Products Structure Elucidation

Strategies for NMR spectroscopic structure elucidation of organic compounds have been reviewed extensively.34–37 In this section, we briefly describe a set of the most commonly useful 2D NMR spectra that is sufficient for most (though certainly not all) organic structure determination problems, and we comment on specific modifications of acquisition parameters that facilitate the analysis of mixtures. Any NMR spectroscopic analysis of an organic sample will normally begin with the examination of a simple 1H-NMR spectrum, which serves to assess purity, concentration of minor components (if any), and overall complexity of the structures in the sample. Furthermore, the 1H spectrum provides an opportunity to examine line shape characteristics of the sample’s components, and, if necessary, reevaluate solvent choice, sample concentration, or acquisition temperature. If large quantities of pure compound are available, 1D 13C-NMR spectra may also be useful. However, in most cases acquisition of a pair of (1H,13C)-HSQC and (1H,13C)-HMBC spectra will be a better use of spectrometer time, unless structural features are suspected that preclude full characterization by HSQC and HMBC. For example, compounds that feature quaternary carbon atoms that


cannot be detected by HMBC will often require acquisition of a 1D 13C spectrum and, if possible, a 2D 13C-INADEQUATE (incredible natural abundance double quantum transfer experiment) spectrum. Experienced natural products chemists may be able to recognize certain compound classes or characteristic structural features at this early stage of the analysis, and for known compounds such tentative structural assignments can often be confirmed through comparison with literature NMR data and additional mass spectrometric analyses. For unknown compounds, the next step in the structure elucidation process usually consists of acquisition of variants of four different types of 2D NMR spectra:

9.06.2.1 COSY and TOCSY

(1) (1H,1H)-COSY or TOCSY is used for characterization of the proton spin systems. A ‘spin system’ is represented by any group of protons that interact through scalar couplings; for example, ethyl butanoate features two spin systems, one consisting of the five protons of the ethoxy group, and one consisting of the seven protons of the butanoyl group. There are many different COSY and TOCSY variants, differing in time requirement and type of (1H,1H)-coupling information provided. COSY spectra show crosspeaks only for directly coupled protons, whereas TOCSY spectra may show crosspeaks not only for protons directly coupled with each other, but also with other protons in the same spin system. For example, in COSY spectra, the alpha proton in the amino acid leucine will show crosspeaks to the adjacent beta-methylene protons, whereas in a TOCSY spectrum, depending on the experimental parameters, the alpha proton may have additional crosspeaks with protons of the gamma methine and the two methyl groups. TOCSY spectra are useful in situations where part of a spin system is obscured in the corresponding COSY spectrum, for example, due to an extensive overlap in the aliphatic region. TOCSY crosspeaks at the chemical shift of a nonobscured proton can often be used to reveal the obscured parts of the spin system. For example, TOCSY spectra are used extensively in the NMR-spectroscopic characterization of proteins, where they are used to map complete amino acid spin systems onto the corresponding amide protons. TOCSY can also be useful for the analysis of natural products, especially for complex mixtures where overlap is often a problem. As described below, Bruschweiler’s group has developed mixture analysis methods that are based on TOCSY spectra.13,38,39 Acquisition parameters for TOCSY spectra can be roughly tuned to emphasize either COSY-type interactions with short mixing times or more complete spin system correlations with longer mixing times.
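The difference between COSY-type and TOCSY-type correlations can be pictured as a graph problem: proton groups are nodes, direct scalar couplings are edges (the pairs a COSY spectrum connects), and a spin system is a connected component (the set a long-mixing-time TOCSY may reveal). The following minimal Python sketch applies this picture to ethyl butanoate. The group labels and the helper names `spin_systems` and `tocsy_partners` are illustrative assumptions, and only vicinal couplings between neighboring proton-bearing carbons are modeled.

```python
from collections import defaultdict

# Proton groups of ethyl butanoate with their proton counts (labels are illustrative).
PROTONS = {
    "OCH2": 2, "OCH2CH3": 3,          # ethoxy group
    "C2H2": 2, "C3H2": 2, "C4H3": 3,  # butanoyl group
}

# Direct (vicinal) couplings: the pairs a COSY spectrum would connect.
COSY_PAIRS = [("OCH2", "OCH2CH3"), ("C2H2", "C3H2"), ("C3H2", "C4H3")]


def spin_systems(groups, couplings):
    """Return the connected components of the coupling graph as sets of labels."""
    neighbors = defaultdict(set)
    for a, b in couplings:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, systems = set(), []
    for start in groups:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        systems.append(component)
    return systems


def tocsy_partners(group, groups, couplings):
    """Groups that may show TOCSY crosspeaks to `group` at long mixing times."""
    for system in spin_systems(groups, couplings):
        if group in system:
            return system - {group}
    return set()


if __name__ == "__main__":
    for system in spin_systems(PROTONS, COSY_PAIRS):
        print(sorted(system), sum(PROTONS[g] for g in system), "protons")
    print("TOCSY partners of C2H2:", sorted(tocsy_partners("C2H2", PROTONS, COSY_PAIRS)))
```

In this model the C-2 methylene has a single COSY partner (the C-3 methylene) but two potential TOCSY partners, which mirrors the leucine example in the text.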
Several mixing sequences are available on modern spectrometers, and some of the more popular are DIPSI-2⁴⁰ and MLEV-17.⁴¹ Bax42 has provided an excellent overview of TOCSY (also known as homonuclear Hartmann–Hahn or HOHAHA) that summarizes the basic principles and demonstrates applications. For identifying new natural products, TOCSY spectra are often less straightforward to analyze than COSY spectra, because a TOCSY crosspeak does not necessarily indicate that two protons are coupled with each other – the presence of a crosspeak only shows that two protons are part of the same spin system. Furthermore, TOCSY spectra may be significantly more crowded than COSY spectra, and TOCSY crosspeak intensity is often difficult to correlate with structural properties. Finally, the fine structure of TOCSY crosspeaks is much less amenable to detailed analysis than that of dqfCOSY crosspeaks; as described below, the antiphase dqfCOSY crosspeaks contain information on proton multiplicities and scalar coupling constants, which TOCSY crosspeaks cannot provide. Among variants of COSY, simple gradient COSY (gCOSY) spectra and phase-sensitive double-quantum-filtered COSY (dqfCOSY) spectra are the most useful for natural product analysis. gCOSY spectra can be acquired extremely fast – using a small molecule sample of >1 mg a decent spectrum can usually be acquired within 5–10 min. However, gCOSY spectra provide only very limited information. The fine structure of gCOSY crosspeaks is often poorly defined, which poses problems for differentiating signals of overlapping peaks and does not allow distinguishing crosspeaks that are due to large coupling constants from crosspeaks that are due to smaller couplings. This is of particular relevance for the analysis of complex spin systems where distinguishing between long-range couplings and stronger geminal or vicinal couplings is important, and where coupling constants may carry important information about relative configuration.
In addition, artifacts are sometimes difficult to distinguish from ‘real’ crosspeaks in gCOSY spectra. For these reasons, dqfCOSY spectra are usually a better choice for any compound or mixture sample that includes complex proton spin systems. If acquired using sufficiently long acquisition times (600 ms or more),


[Figure 4 image: dqfCOSY spectrum region with the structure of 11; axes F1 (ppm), 3.450–3.750, and F2 (ppm), 1.950–1.700]

Figure 4 Part of the dqfCOSY spectrum of the ascaroside (11), a component of the Caenorhabditis elegans dauer pheromone. The fine structure of the four shown crosspeaks permits accurate determination and assignments of the geminal and all vicinal coupling constants of the two methylene protons (red).43

dqfCOSY crosspeaks closely reflect the splitting patterns of corresponding multiplets in one-dimensional 1H spectra. Based on their splitting patterns, dqfCOSY crosspeaks belonging to a specific proton can be easily recognized and grouped together, and as a result, overlapping signals can be clearly distinguished. Furthermore, the characteristic antisymmetric fine structure of each crosspeak not only allows for fairly accurate determination of coupling constant values, but also permits determining the coupling partner responsible for the coupling constant (Figure 4). Therefore, crosspeaks due to small coupling constants can be easily distinguished from crosspeaks due to large coupling constants. Another advantage resulting from the highly characteristic appearance of dqfCOSY crosspeaks is that artifacts can be recognized very easily. dqfCOSY spectra should always be acquired using pulse sequences that employ phase-cycling for coherence selection. Although gradient-selected versions of dqfCOSY are available, line shapes in these gradient versions are usually extremely poor. dqfCOSY spectra usually provide sufficiently accurate values for coupling constants that are larger than twice the line width of the corresponding proton signals, for example, coupling constants larger than 2–4 Hz. However, for signals of protons that have several similar though not identical coupling constants, the interpretation of the dqfCOSY crosspeaks may present considerable difficulty. For analysis of such highly complex spin systems, or in situations where precise knowledge of small coupling constants is required, E.COSY (‘exclusive’ COSY) spectra are better suited.44,45 E.COSY crosspeaks are less complex than dqfCOSY crosspeaks and can be used to obtain highly accurate values even for very small long-range coupling constants.
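The acquisition-time recommendation (600 ms or more) and the rule of thumb that couplings should exceed twice the line width can be made concrete with a small calculation. The sketch below assumes the usual Fourier relation that the point spacing of a spectrum before zero filling is about 1/AQ. The function names and the example line widths of 1 and 2 Hz are illustrative assumptions, not values taken from a spectrometer manual.

```python
def digital_resolution_hz(acquisition_time_s):
    """Approximate point spacing (Hz) of a spectrum before zero filling: 1 / AQ."""
    if acquisition_time_s <= 0:
        raise ValueError("acquisition time must be positive")
    return 1.0 / acquisition_time_s


def coupling_is_measurable(j_hz, line_width_hz):
    """Rule of thumb from the text: J should exceed twice the line width."""
    return abs(j_hz) > 2.0 * line_width_hz


if __name__ == "__main__":
    for aq in (0.2, 0.6, 1.0):
        print(f"AQ = {aq:.1f} s -> {digital_resolution_hz(aq):.2f} Hz/point")
    # Line widths of 1-2 Hz put the threshold at 2-4 Hz, the range quoted in the text.
    for lw in (1.0, 2.0):
        usable = [j for j in (1.5, 3.0, 7.0, 12.5) if coupling_is_measurable(j, lw)]
        print(f"line width {lw:.1f} Hz: measurable couplings {usable}")
```

With an acquisition time of 0.6 s the point spacing is below 2 Hz, so the fine structure of typical vicinal and geminal couplings is sampled by several points.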
For example, E.COSY was used to determine coupling constants in the ladybird beetle alkaloid psylloborine A (12) (Figure 5), which features a mostly aliphatic heptacyclic ring system. As evident from the dqfCOSY spectrum shown in Figure 6, this compound’s spin systems are extremely complex and some dqfCOSY crosspeaks are

Figure 5 Structure of the dimeric polyacetate alkaloid psylloborine from the ladybird beetle Psyllobora vigintiduopunctata (12).46


[Figure 6 image: 1H-NMR and dqfCOSY spectra; axes F1 (ppm) and F2 (ppm); labeled crosspeaks (1-Heq/1-Hax) and (1′-Heq/1′-Hax)]

Figure 6 The 0.75–1.64 ppm region of the 1H-NMR and the dqfCOSY spectrum of psylloborine A (12) (C6D6, 500 MHz). The vicinal coupling constants of the proton 1′-Heq (1.37 ppm) cannot be directly extracted, due to poor resolution in F2 and overlap with other crosspeaks, for example, of the proton 8-Heq at 1.36 ppm. The E.COSY signals corresponding to the crosspeaks (1-Heq/1-Hax) and (1′-Heq/1′-Hax) are shown in Figure 7.

difficult to interpret. However, the interpretation of the corresponding E.COSY spectrum was straightforward (Figure 7).46 It should be noted, however, that the signal-to-noise ratio (S/N) of E.COSY is considerably lower than that of dqfCOSY, and that E.COSY requires careful calibration of pulse width in order to minimize artifacts.

9.06.2.2 HSQC and HMQC

(1H,13C)-HSQC or HMQC serves to identify proton-bearing carbons and to associate these carbons with their attached protons. HSQC spectra generally feature better line shapes than HMQC, and the commonly used multiplicity-edited HSQC versions offer the added benefit of distinguishing CH3, CH2, and CH groups. However, the HMQC pulse sequence is significantly shorter than the HSQC sequence, and therefore magnitude-mode HMQC spectra may have better S/N than magnitude-mode HSQC spectra. Use of adiabatic 13C pulses in the HSQC sequence can significantly reduce this sensitivity disadvantage. For the analysis of complex mixtures with many overlapping proton and/or carbon signals, HSQC is much better suited than HMQC because of better line shapes. Both HMQC and HSQC are usually acquired with 13C-decoupling during acquisition and as a result feature one crosspeak per 1H-resonance that is located at a chemical shift value close to that of the 12C-attached protons. It should be noted, however, that the proton chemical shifts of HSQC or HMQC crosspeaks are not exactly identical to those of the main signals in the 1H [or (1H,1H)-COSY and (1H,13C)-HMBC] spectra. The latter represent 12C-bound protons (unless the sample is isotopically labeled), whereas the signals in HSQC/HMQC represent 13C-bound protons, whose chemical shift values may differ slightly, and to a varying extent. Such small differences in chemical shift can make precise calibration of HMQC/HSQC spectra difficult and may present a problem for samples that feature


[Figure 7 image: two E.COSY crosspeak panels with axes F1 (ppm) and F2 (ppm); annotated couplings J(1eq,2), J(1eq,9a), and J(1eq,1ax) (left) and J(1′eq,3′eq), J(1′eq,1′ax), and J(1′eq,2′) + J(1′eq,9a′) (right)]

Figure 7 Left: Crosspeak of the geminal pair 1-Heq/1-Hax in the E.COSY spectrum of psylloborine A (12) (CD2Cl2, 500 MHz). The passive vicinal couplings J(1eq,2) and J(1eq,9a) and the active coupling J(1eq,1ax) can be determined without interference from components of opposite phase. Right: E.COSY crosspeak of the geminal pair 1′-Heq/1′-Hax. Although, in comparison to the corresponding dqfCOSY spectrum (Figure 6), the crosspeak is greatly simplified, the passive vicinal coupling constants J(1′eq,9a′) and J(1′eq,2′) cannot be directly extracted. Owing to partial overlap, only the sum J(1′eq,9a′) + J(1′eq,2′) can be determined. Since J(1′eq,9a′) is accessible from the E.COSY crosspeak of 9a′-H/1′-Hax (not shown), J(1′eq,2′) can be calculated. The E.COSY crosspeak 1′-Heq/1′-Hax also allows one to determine the active geminal coupling J(1′eq,1′ax). In addition, a small four-bond coupling J(1′eq,3′eq) is revealed.46

many overlapping proton signals of very similar chemical shift, for example, complex mixtures. Acquisition of coupled (‘nondecoupled’) HSQC spectra is sometimes advantageous, for example, in cases where one-bond 1H–13C coupling constants are of interest, or in cases where decoupling would result in low-quality spectra due to sample heating (this can be an issue when using long acquisition times, or when using polar solvents, especially in the presence of salts).

9.06.2.3 HMBC

(1H,13C)-HMBC provides correlations between protons and carbons that are two or three bonds apart from each other (though occasionally four-bond or even five-bond correlations may be observed). HMBC spectra are important for the detection of quaternary carbons and serve to link separate structural fragments obtained from analysis of COSY/TOCSY and HSQC/HMQC. However, interpretation of HMBC spectra often represents the most challenging step in the structure elucidation process, for several reasons. The intensity of crosspeaks in HMBC spectra is notoriously difficult to predict. Some two- or three-bond correlations may be extremely weak or not appear at all, and therefore the absence of an HMBC crosspeak cannot, a priori, be taken as evidence against a specific structural connection. Furthermore, there is no simple method for distinguishing two- and three-bond correlations. Additional difficulties may arise in situations where weak HMBC crosspeaks could represent either a three-bond or a four-bond (or rarely five-bond) correlation. Finally, because standard HMBC experiments are usually optimized for (1H,13C)-long-range coupling constants of intermediate size, both very strong and very weak (1H,13C)-long-range couplings may give rise to weak crosspeaks in routine HMBC spectra, which can be a source of considerable confusion. The latter concern can be addressed by acquiring two separate HMBC spectra using two different mixing delays, for example, 50 and 100 ms. It should be noted that when using long mixing delays, the acquisition time should be increased to at least twice the mixing delay, for example, for a delay of 100 ms, the acquisition time should be at least 200 ms, which is considerably above the default values in common HMBC parameter sets. Even when using standard parameters, HMBC signal


intensity is often somewhat lower than that of HSQC or HMQC spectra, primarily because of the relatively long mixing delay (40–120 ms) in the HMBC pulse sequence. Often 1H line shape is a good predictor of S/N in HMBC spectra: generally, narrow 1H line widths correlate with good S/N of corresponding signals in the HMBC spectra. For most organic small molecules, acquisition of one-dimensional 13C spectra is not required when well-resolved HSQC and HMBC spectra are available. Exceptions include compounds that have quaternary carbons that simply do not show any HMBC correlations, for example, because there are no protons within two or three bonds of some carbons (see also Section 9.06.2.5). Another limitation of routine HMBC is that spectral resolution in the 13C-chemical shift dimension is limited. As is the case for HMQC (but not HSQC), HMBC crosspeaks are broadened in the 13C-chemical shift dimension by the (1H,1H)-coupling constants of the proton whose long-range (1H,13C)-coupling is observed. As a result, crosspeaks belonging to carbons with very similar chemical shifts can sometimes not be unambiguously assigned. However, the interfering (1H,1H)-couplings can be removed by using a constant-time variant of the standard HMBC experiment. Using band-selective, constant-time HMBC variants, spectra with extremely high resolution in the 13C-dimension can be easily obtained.47
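The suggestion to record HMBC spectra with two different delays can be illustrated numerically. The sketch below assumes the common idealization that the long-range delay is optimal for a coupling J when delay = 1/(2J), and that the crosspeak amplitude scales as |sin(π·J·delay)|, with relaxation and (1H,1H)-coupling evolution ignored. The function names are hypothetical, and the coupling values are chosen only for illustration.

```python
import math


def optimal_j_hz(delay_s):
    """Coupling for which a long-range delay is optimal, using delay = 1 / (2 J)."""
    return 1.0 / (2.0 * delay_s)


def relative_transfer(j_hz, delay_s):
    """Idealized crosspeak amplitude |sin(pi * J * delay)|; relaxation is ignored."""
    return abs(math.sin(math.pi * j_hz * delay_s))


def minimum_acquisition_time_s(delay_s):
    """Guideline from the text: acquire for at least twice the mixing delay."""
    return 2.0 * delay_s


if __name__ == "__main__":
    for delay in (0.050, 0.100):
        print(f"delay {delay * 1e3:.0f} ms: optimal J = {optimal_j_hz(delay):.0f} Hz, "
              f"AQ >= {minimum_acquisition_time_s(delay) * 1e3:.0f} ms")
    for j in (2.0, 5.0, 10.0, 20.0):
        row = ", ".join(f"{relative_transfer(j, d):.2f}" for d in (0.050, 0.100))
        print(f"J = {j:4.1f} Hz -> relative amplitude at 50/100 ms: {row}")
```

Under these assumptions a 10 Hz coupling is optimal at 50 ms but passes through a null at 100 ms, whereas a 5 Hz coupling is optimal at 100 ms. This is one way to see why both unusually large and unusually small couplings can give weak crosspeaks in a single routine spectrum.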

9.06.2.4 NOESY and ROESY

(1H,1H)-NOESY and ROESY provide information about spatial proximity of protons that are separated by up to about 5 Å, which can be used to determine relative configuration and, in some cases, conformation of organic small molecules. Other applications include the study of chemical exchange or investigations of the interaction of natural products with their protein targets. NOESY and ROESY spectra are similar, but the choice of which to use depends on the rate of tumbling of the molecule, which is roughly proportional to the molecular weight, but also depends on its polarity and on polarity and viscosity of the solvent. NOESY crosspeaks are opposite in sign for small and large molecules. For small molecules, the sign of NOESY crosspeaks is opposite to the sign of the diagonal, whereas for large molecules NOESY cross- and diagonal peaks have the same sign. Correspondingly, there is a range of molecules for which NOESY crosspeaks are close to zero,48 and thus NOESY is not suitable. NOESY can generally be used for organic small molecules of molecular weights below 800 Da, unless the compound under investigation is very polar, or requires the use of a highly polar and/or viscous solvent, such as DMSO. In ROESY spectra, the sign of the crosspeaks is always opposite to that of the diagonal peaks, and therefore ROESY is suitable for intermediate-sized molecules around 1 kDa and smaller molecules in viscous solvents or at low temperatures. Both NOESY and ROESY can also be used to investigate chemical exchange, and they show crosspeaks for nuclei that are in slow (relative to the chemical shift differences) exchange. Importantly, the sign of chemical exchange crosspeaks is the same as that of the diagonal, and so for small molecules both NOESY and ROESY can be used to distinguish chemical exchange crosspeaks from crosspeaks due to spatial proximity.
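The sign change of NOESY crosspeaks with molecular size, and the absence of such a change in ROESY, can be sketched with the standard textbook expressions for an isolated pair of protons undergoing isotropic tumbling. With the spectral density J(ω) = τc/(1 + ω²τc²), the laboratory-frame cross-relaxation rate is proportional to 6J(2ω) − J(0) and the rotating-frame rate to 2J(0) + 3J(ω). All function names below are hypothetical, the common dipolar prefactor is dropped, and the 500 MHz field is an illustrative assumption.

```python
import math


def spectral_density(omega, tau_c):
    """Lorentzian spectral density for isotropic tumbling (arbitrary units)."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def noe_rate(field_mhz, tau_c):
    """Laboratory-frame (NOESY) cross-relaxation rate, up to a positive constant."""
    omega = 2.0 * math.pi * field_mhz * 1e6
    return 6.0 * spectral_density(2.0 * omega, tau_c) - spectral_density(0.0, tau_c)


def roe_rate(field_mhz, tau_c):
    """Rotating-frame (ROESY) cross-relaxation rate, up to the same constant."""
    omega = 2.0 * math.pi * field_mhz * 1e6
    return 2.0 * spectral_density(0.0, tau_c) + 3.0 * spectral_density(omega, tau_c)


def noe_null_tau_c(field_mhz):
    """Correlation time at which the NOE vanishes: omega * tau_c = sqrt(5) / 2."""
    omega = 2.0 * math.pi * field_mhz * 1e6
    return math.sqrt(5.0) / (2.0 * omega)


if __name__ == "__main__":
    field = 500.0  # 1H frequency in MHz (illustrative)
    null = noe_null_tau_c(field)
    print(f"NOE null at tau_c = {null * 1e9:.2f} ns")
    for tau_c in (null / 10.0, null, null * 10.0):
        print(f"tau_c = {tau_c * 1e9:6.3f} ns: "
              f"NOE {noe_rate(field, tau_c):+.2e}, ROE {roe_rate(field, tau_c):+.2e}")
```

In this model a positive laboratory-frame rate corresponds to the small-molecule regime (crosspeaks opposite in sign to the diagonal) and a negative rate to the large-molecule regime, while the rotating-frame rate stays positive at every correlation time, consistent with the recommendation of ROESY for intermediate-sized molecules.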
In larger molecules, but sometimes also in small molecules, additional crosspeaks can occur through ‘spin diffusion’, a relay of magnetization along a chain of 1H’s that are close together that leads to TOCSY-type crosspeaks. Mixing times for acquiring natural product NOESY or ROESY spectra should be set to 500–800 ms (NOESY) and 200–400 ms (ROESY).

9.06.2.5 Other Techniques and Current Limitations

Not all types of natural products can be sufficiently characterized using routine 2D NMR spectroscopy as described in the preceding section. For example, in highly unsaturated compounds some carbons may not be detected by HMBC simply because there are no protons within three bonds of these carbons. There are few ways to address this problem by NMR-spectroscopic means, and in some cases, chemical modification (e.g., hydrogenation) or degradation may be necessary in order to complete structural assignments. In the rare event that large amounts of the compound in question are available, (13C,13C)-correlations that provide direct evidence for carbon–carbon bonds such as (13C,13C)-INADEQUATE (or its 1H-detected cousin, ADEQUATE) can be useful.49 However, due to the low natural abundance of 13C, sensitivity of INADEQUATE is extremely low, as only pairs of adjacent 13C atoms contribute to the signal. One of the


Figure 8 Myrmicarin 237A (13) and 237B (14), which were identified using (13C,13C)-2D-INADEQUATE.50

very few examples of the use of INADEQUATE for natural product samples is presented by the identification of the indolizidine alkaloids myrmicarin 237A and 237B (13 and 14) (Figure 8).50 These compounds equilibrate through keto–enol tautomerism and therefore had to be characterized as a 1:1 mixture of diastereomers, which resulted in extremely crowded COSY and HMBC spectra. A (13C,13C)-COSY-type INADEQUATE was then used to unambiguously distinguish between the 13C-resonances of the two diastereomers (Figure 9). A different problem is posed by structures that include large numbers of NMR-inactive heteroatoms. In such cases, it may be impossible to assemble a complete structure based on NMR spectroscopic data simply because there are too many possibilities for arranging the heteroatoms around the identified carbon- and hydrogen-based partial structures. For compounds that include nitrogen or phosphorus, 15N- and 31P-NMR spectroscopy can often supply important additional information. As natural products chemists detect and investigate more and more complex structures, additional limitations of current NMR spectroscopic approaches have become apparent. Examples of compound classes that pose great difficulty for NMR spectroscopists include compounds with ill-defined conformations, natural products that occur as large families of structurally similar compounds, or oligomeric compounds whose

[Figure 9 image: 2D spectrum with both axes in ppm, spanning approximately 10–60 ppm]

Figure 9 COSY-like (13C,13C)-2D-INADEQUATE of a 220 mg sample of a 1:1 mixture of myrmicarin 237A (13) and 237B (14), acquired over 54 h.50


assembly follows an irregular scheme, such as complex glycosides or lipidated natural product derivatives. For the latter groups of compounds, some partial degradation and/or derivatization may still be required in order to enable NMR spectroscopic analysis.
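The very low sensitivity of INADEQUATE noted earlier in this section follows from simple isotope statistics: at natural abundance, only molecules that carry 13C at both ends of a given carbon–carbon bond contribute to the corresponding correlation. The sketch below assumes a natural 13C abundance of about 1.1% and statistically independent labeling of the two positions. The constant and function names are illustrative.

```python
C13_ABUNDANCE = 0.011  # approximate natural abundance of 13C (assumed value, about 1.1%)


def adjacent_pair_fraction(abundance=C13_ABUNDANCE):
    """Fraction of molecules with 13C at both positions of one given C-C bond."""
    return abundance ** 2


def molecules_per_contributing_pair(abundance=C13_ABUNDANCE):
    """Average number of molecules per molecule that contributes to a given pair signal."""
    return 1.0 / adjacent_pair_fraction(abundance)


if __name__ == "__main__":
    print(f"fraction with a given 13C-13C pair: {adjacent_pair_fraction():.2e}")
    print(f"about 1 molecule in {molecules_per_contributing_pair():.0f}")
    # Relative to a 1D 13C signal of one site, the pair signal is smaller by ~abundance.
    print(f"pair signal relative to single-site 13C signal: {C13_ABUNDANCE:.1%}")
```

Roughly one molecule in several thousand contributes to any given 13C–13C correlation, which is consistent with the large sample amount and long acquisition time reported for the myrmicarin mixture in Figure 9.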

9.06.3 Complex Mixtures

NMR spectroscopy evolved primarily as a tool for the characterization of pure compounds or simple well-defined mixtures (see Section 9.06.1.2), whereas strategies for the NMR spectroscopic identification of compounds from complex mixtures have been developed only recently. As several examples have shown, using NMR spectroscopy for the analysis of complex mixtures can open up new perspectives and may enable new lines of inquiry in both natural products chemistry and metabolomics. Before discussing some of these examples in greater detail, it is useful to consider what developments spawned the recent surge in applications of NMR spectroscopy to mixtures and what prevented earlier uses of NMR spectroscopy for this purpose. Following the initial observation in the early 1950s that the resonance frequency of a nucleus is influenced by its chemical environment,51 and that the fine structure of a resonance could be influenced by other nuclei through intervening chemical bonds,51,52 chemists quickly seized upon the enormous potential of NMR spectroscopy for structure determination. As a result, NMR spectroscopy became one of the most important spectroscopic tools of organic chemists. Combining NMR spectroscopic with mass spectrometric analyses proved particularly useful, with MS providing information about molecular weight and atomic composition, and NMR spectroscopy contributing information about chemical environment and, importantly, connectivity and spatial configuration. Until about 1980, NMR spectroscopic structure elucidation was largely based on one-dimensional spectra, providing information about chemical shift (suggesting a specific chemical environment), relative signal intensity (indicating the number of a specific type of nuclei in the molecule), and multiplicity (suggesting connectivity between individual groups of nuclei in the molecule).34,53 Extracting signal intensity and multiplicity information from one-dimensional NMR spectra depended crucially on sample purity, because signals of impurities could easily skew signal intensities or obstruct important splitting patterns. Furthermore, based on one-dimensional spectra it is often impossible to determine whether two signals represent nuclei that are part of the same molecule or whether they represent two (or more) separate structures. As a result, NMR spectroscopy was deemed largely unsuitable for the analysis of complex mixtures such as crude natural products extracts, and NMR spectroscopic analysis was usually initiated only after pure or almost pure samples of the compound(s) of interest had been obtained, generally as the end product of extensive chromatographic fractionation. The eventual realization that NMR spectroscopy can nonetheless be applied most advantageously to the characterization of mixtures then depended on at least two separate developments. First, the advent of 2D (and subsequently, multidimensional) NMR spectroscopy enabled much better access to connectivity information than could be obtained from the analysis of multiplets in one-dimensional spectra.
2D spectra such as COSY, HSQC, or HMBC yield correlations that correspond to connectivity through one or more chemical bonds.34,54 Importantly, dispersion of signals along a second (or third) chemical shift dimension almost always removes any ambiguity resulting from overlap of signals in one dimension, and therefore enables recognition and identification of partial structures that may belong to several different compounds.55 However, even the advent of multidimensional spectroscopy alone did not yet suffice to make NMR spectroscopic analysis of complex mixtures broadly applicable. In the early days of 2D-spectroscopy, the capabilities of spectrometers and processing hardware limited resolution and dynamic range of the spectra severely. For example, processing of a very low-resolution COSY spectrum on a Bruker AC250P (250 MHz proton) spectrometer console in 1990 could take as much as 30 min. Moreover, early 2D spectra, especially the most useful inversely detected HSQC and HMBC, were prone to artifacts, making it nearly impossible to unambiguously discern signals representing minor components. The advent of improved data acquisition systems and greatly increased computing power fundamentally changed the scope of multidimensional NMR spectroscopy, and today very low-artifact COSY, TOCSY, or HSQC spectra can be obtained whose resolution approximates that of one-dimensional spectra.35,36 Along with increases in sensitivity and resolution derived from higher magnetic field strength and improved probe design


(see Section 9.06.4), these developments have set the stage for a broad exploration of the utility of NMR spectroscopy for characterizing complex mixtures. Sections 9.06.3.1 and 9.06.3.2 describe recent examples for using 2D NMR spectroscopy for the characterization of new natural products from complex biological extracts, whereas Section 9.06.3.3 describes a method for computational deconvolution of 2D-spectra of complex mixtures into one-dimensional subspectra that represent partial structures of individual components. Computational approaches have also been applied to ensembles of one-dimensional spectra of complex small molecule mixtures for the purpose of biomarker identification. Corresponding applications in metabolomics are discussed in Section 9.06.3.4.

9.06.3.1

NMR Spectroscopic Analysis of Complex Natural Products Mixtures

The idea of using 2D NMR spectroscopy for a systematic characterization of crude or unfractionated natural products mixtures was first conceived in connection with research on the chemical ecology of arthropods.20,21,25,26,46,56–58 During studies of the chemical composition of various arthropod secretions, several cases were encountered in which conventional analytical methodology, based on fractionation of the secretions with the aim of isolating individual components, failed to identify the biologically active principles. It was concluded that the chromatography-based fractionation of these secretions had resulted in destruction or loss of the active components. As a consequence, the use of NMR spectroscopy for the characterization of native, entirely unfractionated materials was considered. As one of the first examples, the unfractionated defensive secretion of a ladybird beetle pupa, Epilachna borealis, was subjected to 2D NMR spectroscopic analysis including dqfCOSY, NOESY, HSQC, and HMBC spectra, for which acquisition parameters were somewhat modified in order to obtain higher-resolution spectra.58 This approach quickly resulted in the identification of a previously overlooked group of compounds that made up more than 50% of the secretion, a new family of macrocyclic lactone alkaloids, the polyazamacrolides, such as 15 and 16 (Figure 10). These insect secretions presented a perfect starting point for exploring the utility of direct NMR spectroscopic analysis of crude mixtures, because the secretions consisted of mixtures of only one to three structurally distinct groups of small molecules.

Figure 10 Polyazamacrolides from pupae of the ladybird beetle Epilachna borealis58 and sulfated nucleosides (17) and (18) identified from spider venom.20,21,59

However, recent analyses of crude spider venom have shown that even much more complicated mixtures of small molecules are amenable to NMR spectroscopic analysis.20,21 NMR spectroscopic studies of crude spider venom were motivated by the earlier identification of a bis-sulfated nucleoside, HF-7 (17), a selective and potent kainate receptor antagonist, from the venom of the grass spider, Hololena curta.59 The discovery of this entirely unexpected natural product suggested that spider venoms might harbor interesting new classes of neurotoxins. Moreover, it seemed unlikely that HF-7 is the only spider venom component of its kind. The question remained why sulfated nucleosides had previously escaped detection, even though spider venoms had been subject to intensive chemical scrutiny, which had led to identification of hundreds of proteins, peptides, acylated polyamines, and various small molecule neurotransmitters. Given this very high degree of complexity, it is not surprising that most previous studies of spider venom chemistry applied some form of chromatographic fractionation as a first step. Because sulfated nucleosides are somewhat susceptible to hydrolysis, it was suspected that they may have been overlooked in some earlier analyses as a result of decomposition during chromatographic fractionation. Building on experience gathered from characterization of the polyazamacrolides, an attempt was thus made to characterize entire, unfractionated spider venom samples using 2D NMR spectra, including dqfCOSY, HMQC, and HMBC.
This approach led to the identification of sulfated nucleosides such as 18 as important components in the venoms of several spider species, including previously well-studied species such as the hobo spider, Tegenaria agrestis, and the brown recluse spider, Loxosceles reclusa.20,21 Effectively, these 2D NMR spectroscopic analyses provided a largely undistorted and impartial view of spider venom composition, without any skewing of the results stemming from chromatographic separation. It is important to note that such ‘direct’ NMR spectroscopic analyses of complex natural product mixtures may not always permit assigning complete structures. In many cases, a full or near-complete characterization will only be possible for a few major components, whereas more or less extensive partial structures will be obtained for minor components. However, any partial structures elucidated will provide important information that may be (1) used to search natural product databases for similar compounds, (2) combined with results from GC–MS or LC–MS analysis to develop better hypotheses about their structures, (3) used to develop a fractionation scheme tailored to the isolation of specific compounds of interest, and (4) used to design syntheses for the proposed structures. Direct NMR spectroscopic analyses are particularly well suited to examining natural product extracts for the presence of novel or unanticipated compounds. However, 2D spectra of natural product mixtures are often extremely complex, which limits the feasibility of using 2D NMR as a first-line tool for the characterization of large numbers of natural product extracts. In some cases, this concern can be addressed by focusing only on specific features in the spectra, for example, groups of crosspeaks that correlate with a certain biological activity or genotype.
A method that facilitates recognition of 2D NMR signals relevant within a specific biological context, DANS (‘differential analysis through 2D NMR spectroscopy’), is discussed in Section 9.06.3.2. Even if detailed spectral interpretation is not pursued, 2D spectra obtained for a crude natural products sample can be useful as a largely unbiased record of its original composition against which the results from any subsequent fractionation can be compared. Such comparisons can aid in recognizing artifacts or detecting loss of some components of the original mixture.

9.06.3.1.1 Comparing NMR spectroscopic characterization of proteins and small molecule mixtures

In some regards, NMR spectra obtained from complex mixtures of small molecules resemble those of biological macromolecules such as large peptides, proteins, or oligonucleotides.55 NMR spectra of both biological macromolecules and small molecule mixtures feature a very large number of overlapping signals, which through the use of a variety of two- or three-dimensional experiments can be assigned to individual substructures. In the case of macromolecules such as proteins, these substructures may represent individual amino acid residues, whereas in the case of crude natural products extracts these substructures constitute fragments of the various secondary metabolites they contain. However, there are significant differences in the strategies for NMR spectroscopic analysis of crude natural products mixtures and biological macromolecules. Analysis of sets of NMR spectra from proteins or nucleic acids is primarily based on template recognition, and thus NMR spectroscopic analysis of biological macromolecules usually consists of sets of 2D and 3D experiments addressing specific structural features of these templates. For example, NMR spectroscopic analysis of proteins is based on a series of specialized NMR pulse sequences designed to identify amino acid residues, the sequence of amino acids, and spatial proximity within the chain(s). Many macromolecular NMR pulse sequences are highly specific, such as the HNCO experiment to detect the repeating covalent structure of peptide bonds in proteins.60 For the analysis of crude natural products extracts, NMR spectroscopic experiments cannot be tailored in this way, because these crude mixtures usually contain a very large variety of components featuring highly diverse structures. Furthermore, the structural properties of these compounds will vary considerably between extracts, in a largely unpredictable manner. Therefore, NMR-based analysis of crude natural products extracts has to rely on experiments that focus on the most basic common features of organic molecules as frameworks of carbon and hydrogen. These experiments are primarily versions and combinations of (1H,1H)-COSY/TOCSY, (1H,1H)-NOESY/ROESY, and (1H,13C)-HSQC/HMQC/HMBC.36 When dealing with spectra of complex mixtures, signal overlap, especially in the proton dimensions, becomes a serious problem, which in some cases may necessitate some form of prefractionation. Using high-field spectrometers can help increase spectral dispersion and thus reduce overlap, as the relative size of crosspeaks decreases approximately as $(1/F)^n$, with F being the field strength of the magnet and n the dimensionality of the experiment (neglecting line shape effects). Interference by overlap can be alleviated further by taking advantage of the much longer relaxation times of small molecules compared to those of macromolecules.
The slower relaxation of small molecules allows for longer acquisition times especially for directly and indirectly detected proton magnetization, which results in correspondingly higher resolution of the spectra and permits detection of smaller scalar couplings. For example, the initial NMR spectroscopic characterization of crude natural products extracts in the studies discussed here was largely based on very high-resolution dqfCOSY spectra.21,23,61
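The two rules of thumb above, crosspeak size shrinking as (1/F)^n and resolution improving with longer acquisition times, can be turned into a short calculation. The following is a minimal sketch only: the function names are hypothetical, line shape effects are neglected as in the text, and the digital resolution is approximated as the reciprocal of the acquisition time.

```python
def relative_crosspeak_size(field_ratio: float, dimensions: int) -> float:
    """Relative crosspeak footprint after scaling the field by `field_ratio`.

    Uses the (1/F)^n rule of thumb from the text; line shape effects are ignored.
    """
    if field_ratio <= 0 or dimensions < 1:
        raise ValueError("field_ratio must be positive and dimensions >= 1")
    return (1.0 / field_ratio) ** dimensions


def digital_resolution_hz(acquisition_time_s: float) -> float:
    """Approximate digital resolution (Hz per point) as 1 / acquisition time."""
    if acquisition_time_s <= 0:
        raise ValueError("acquisition_time_s must be positive")
    return 1.0 / acquisition_time_s


if __name__ == "__main__":
    # Illustrative values only: a 1.6-fold higher field in a 2D experiment,
    # and acquisition times of 0.2 s versus 0.8 s.
    print(relative_crosspeak_size(1.6, 2))
    print(digital_resolution_hz(0.2), digital_resolution_hz(0.8))
```

Under these assumptions, a 1.6-fold increase in field strength (for example, 500 to 800 MHz) reduces the relative crosspeak footprint in a 2D spectrum to about 39% of its original size, and extending the acquisition time from 0.2 to 0.8 s improves the digital resolution from 5 Hz to 1.25 Hz per point.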

9.06.3.1.2 NMR spectroscopy versus mass-spectrometry-based approaches for characterizing crude mixtures

Traditionally, efforts to characterize unfractionated small-molecule mixtures have relied primarily on combinations of MS with HPLC or gas chromatography. As MS is extremely sensitive and typical GC–MS and LC–MS analyses are fast, can be automated easily, and thus can accommodate large numbers of samples, these techniques would seem extremely well suited for the purpose of characterizing libraries of unfractionated natural product extracts,62 and in fact, various LC–MS-based approaches are being pursued to characterize fungal or bacterial metabolomes.33 However, there are important drawbacks to exclusively LC–MS-based approaches. One major disadvantage of using MS as the primary analytical tool is that most mass-spectrometric techniques are strongly biased toward the detection of a few specific compound classes.63 For example, positive ion electrospray ionization MS is orders of magnitude more sensitive for basic amines, amino acids, or peptides than for nonbasic polyketides or terpenoids. Alternative ionization techniques such as APCI and MALDI are biased in different ways and to varying degrees, and no single spectrometric approach is sufficient to provide an unbiased snapshot of a small molecule mixture of unknown composition. Regardless of the ionization technique chosen, the structural information available from mass spectrometric analyses is often insufficient for detailed structural assignments.64 Although a few compound classes, notably peptides, can be characterized extremely well by MS,65,66 for most classes of small molecule metabolites MS can provide only limited structural information beyond a tentative molecular formula. In the context of analyzing natural product extracts of diverse origins, this lack of structural information is particularly problematic, as decisions over further fractionation of an extract depend entirely on assumptions as to whether a specific extract is likely to contain new, interesting chemotypes or not.
Therefore, the detailed structural information available through 2D NMR spectroscopy of small molecule mixtures can represent an invaluable addition to mass spectrometric results. Of course, any NMR-based characterization of natural product mixtures normally will have to be complemented by HPLC–MS or GC–MS analyses. A significant disadvantage of NMR-based approaches for the characterization of natural product mixtures is the much lower sensitivity and dynamic range of NMR spectra compared to MS. Furthermore, the often high complexity of 2D NMR spectra obtained for mixtures can make their interpretation challenging. The latter disadvantage could be overcome through the use of computational analysis, or through approaches based on graphical comparison of sets of 2D NMR spectra (DANS), as described in the following section.

9.06.3.2

Differential Analysis through 2D NMR Spectroscopy

One of the big remaining challenges in natural products chemistry is to develop better methods for connecting newly identified small molecule structures with their biological functions, including knowledge of the mechanisms regulating their biosynthesis and of their molecular targets. The traditional armamentarium of natural products chemistry appears ill-suited for this purpose, given the complexity of most organisms’ metabolomes and the scope of assigning functions to hundreds, if not thousands, of individual components, many of which represent previously undescribed chemical structures.33 Efforts aimed at determining the structure of biologically relevant small molecules have traditionally relied on bioassay-guided fractionation, usually based on highly time-consuming multistep chromatographic fractionation schemes that require extensive biological assays at every stage in the process. As a result, approaches based on bioassay-guided fractionation often take years to tease out and identify the biologically active component(s), and the need for fractionation poses great difficulty in cases of synergism, that is, cases where more than one compound is required to elicit the monitored activity.22,23 Importantly, for chemically unstable compounds chromatographic fractionation may be unsuitable altogether. Several recent studies have shown that 2D NMR analyses of natural product extracts can be highly effective for associating small molecules with specific biological properties, most significantly phenotype and genotype of the producing organism(s). These studies are based on differential analyses of 2D NMR spectra (DANS), a method for graphic comparison of 2D NMR spectra representing different biological states, for example different phenotypes or genotypes.

9.06.3.2.1 DANS for screening of a fungal extract library

DANS was first used for the detection of differential expression of natural products in a small library of fungal extracts.67 This library was derived from a Tolypocladium cylindrosporum strain that was cultured under a variety of ‘stress’ conditions, aiming to elicit the production of secondary metabolites from otherwise inactive biosynthetic pathways. The resulting unfractionated metabolite extracts were used to acquire dqfCOSY spectra with very high resolution in both dimensions. dqfCOSY was chosen for these studies because dqfCOSY crosspeaks feature highly regular fine structures and are thus particularly information rich. In addition, dqfCOSY spectra offer fairly good dynamic range, which often permits detailed characterization of spin systems representing even very minor components, as had been demonstrated with the examples described in Section 9.06.3.1. For differential analysis, dqfCOSY spectra corresponding to different extracts were superimposed onto each other, using a specific algorithm that suppressed signals common to all extracts, but highlighted signals unique to individual spectra. The algorithm chosen for this overlay allowed suppression of signals even in cases where compounds occurred at significantly different concentrations in different fungal extracts. As a result, only signals representing compounds whose expression was very strongly dependent on the culturing conditions were highlighted in the overlay. The DANS algorithm can be fine-tuned to reveal less severe differences as well, though it is not suitable for accurate quantitative measurements (Figure 11).
Application of DANS enabled fast screening of the Tolypocladium extract library for proton spin systems representing chemotypes that are produced only under specific conditions, and led to the identification of two new terpenoid indole alkaloids that are expressed under certain nutrient-deficient conditions, but are not produced under standard culturing protocols. The structures of the two new indole alkaloids, TC-705A (19) and TC-705B (20), were proposed on the basis of NMR spectra obtained for the unfractionated extracts and subsequently confirmed through additional spectroscopic analyses of isolated samples.67 In addition to TC-705A and TC-705B, differential expression of several known compounds was observed. These known compounds were identified based on comparison of NMR spectroscopic data obtained from DANS with literature data, in conjunction with results from additional mass spectrometric analyses.
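The overlay logic described above (suppress whatever is present in both spectra, regardless of relative intensity, and keep what is unique to one) can be illustrated with a small numerical sketch. This is not the published DANS algorithm, whose details are not given here; it is a simplified presence-based mask, and it assumes that both spectra are sampled on the same grid, are already aligned, and are compared in magnitude mode. The function name and the `noise_floor` parameter are hypothetical.

```python
import numpy as np


def differential_overlay(target, reference, noise_floor):
    """Keep signals of `target` that have no counterpart in `reference`.

    A point counts as "present" when its magnitude exceeds `noise_floor`.
    Any point present in the reference is suppressed in the output, however
    different the two intensities are, so the result is qualitative only.
    """
    tgt = np.abs(np.asarray(target, dtype=float))
    ref = np.abs(np.asarray(reference, dtype=float))
    if tgt.shape != ref.shape:
        raise ValueError("spectra must share the same grid")
    unique_to_target = (tgt > noise_floor) & ~(ref > noise_floor)
    return np.where(unique_to_target, tgt, 0.0)


if __name__ == "__main__":
    # Toy 2 x 3 "spectra": one signal unique to the target, one shared.
    stressed = np.array([[0.0, 5.0, 0.0], [8.0, 0.0, 0.0]])
    standard = np.array([[0.0, 0.2, 0.0], [3.0, 0.0, 0.0]])
    print(differential_overlay(stressed, standard, noise_floor=1.0))
```

In this toy case the shared signal is removed even though its intensity differs between the two spectra, which mirrors the behavior described in the text; a real implementation would additionally have to handle chemical shift drift between samples and the antiphase fine structure of dqfCOSY crosspeaks.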

Figure 11 Identification of new fungal natural products through DANS (schematic).67 Panels: library of fungal extracts; overlay of 2D-NMR spectra; structures of TC-705A (19) and TC-705B (20).

9.06.3.2.2 DANS-based identification of bacillaene

In a second example, DANS was used to determine the structure of the elusive product of the polyketide gene cluster pksX in B. subtilis.24 The 80 kb pksX gene cluster encodes an unusual hybrid polyketide/nonribosomal peptide synthase that had been linked to the production of the uncharacterized antibiotic bacillaene. Multiple copies of this synthase – each similar in size to the ribosome – assemble into a single organelle-like complex with a mass of tens to hundreds of megadaltons. The resource requirements of the assembled megacomplex suggest that bacillaene serves important biological functions. However, the unconventional domain organization of the PksX synthase and the presence of multiple enzymes that act in trans rather than in the standard assembly-line mode that is characteristic of polyketide and nonribosomal peptide biosynthesis precluded bioinformatic prediction of bacillaene’s structure. Furthermore, isolation of bacillaene using traditional activity-based fractionation could not be accomplished due to the molecule’s chemical instability. Therefore, identification of bacillaene based on NMR spectra of largely unfractionated bacterial extracts was pursued. DANS-based comparison of a bacillaene-producing B. subtilis strain and a corresponding knockout strain clearly identified distinct proton spin systems present in the bacillaene producer but absent in the knockout. Acquisition of additional (1H,13C)-HMQC, (1H,13C)- and (1H,15N)-HMBC, and ROESY spectra for the bacillaene-producing strain subsequently permitted full identification of the two main products of PksX, bacillaene (9) and dihydrobacillaene (21), along with several double-bond stereoisomers.
The biosynthesis of bacillaene by the PksX synthase was subsequently investigated by Moldenhauer et al.68 Small molecules like bacillaene, which link genotype (the pksX gene cluster) with phenotype (antibiotic and likely other activities), are central to chemical biology, and as this example demonstrates, comparative NMR-based approaches such as DANS should be generally useful for their characterization (Figure 12).

9.06.3.2.3 Identification of signaling molecules in Caenorhabditis elegans through DANS

The utility of DANS for the identification of signaling molecules in eukaryotes was recently demonstrated with the identification of a mating pheromone in the nematode C. elegans.23 C. elegans is an important model organism for biomedical research, and a systematic characterization of structures and functions of small molecules in C. elegans will be critical for advancing our understanding of many biological processes.69 Earlier work had shown that three glycosides of the dideoxysugar ascarylose are part of a male-attracting pheromone that is produced by C. elegans hermaphrodites.22,43,70 These compounds, the ascarosides ascr#2, ascr#3, and ascr#4, showed strong synergism as mating signals: mixtures of ascarosides were potently active at concentrations at which individual components effected no response.22 Although biologically fascinating, the ascarosides’ synergistic properties resulted in tremendous logistical challenges for their identification through activity-guided fractionation, as this required chromatographic fractions to be recombined combinatorially in order to assess activity. Despite these efforts, biological testing of mixtures of ascr#2, ascr#3, and ascr#4 at


Figure 12 Compounds identified through DANS in Bacillus subtilis.

physiological concentrations did not fully reproduce activity of the original pheromone extracts, and it seemed likely that important components of the mating pheromones remained to be identified (Figure 13). For the purpose of identifying missing components of the mating pheromone, the C. elegans mutant strain daf-22 offered a unique opportunity. daf-22-derived metabolite extracts had been shown to have little dauer-inducing activity and are not significantly active in the male attraction assay. Therefore, a careful comparison of the daf-22 metabolome with that of wild-type worms should reveal the missing daf-22-dependent pheromone components among compounds present in wild-type worms but absent in daf-22. As in the examples described in the preceding sections, this comparison was accomplished through DANS based on largely unfractionated metabolite extracts that represent highly complex mixtures of many hundreds of metabolites.23 For differential analysis of the dqfCOSY spectra, the daf-22-derived spectrum was superimposed onto the wild-type spectrum, again using an algorithm that suppressed signals present in both mutant and wild-type spectra. As a result, only signals present in the wild-type spectrum but entirely absent from the daf-22 spectrum remained unaltered in the overlay (Figure 14). DANS-based comparisons of daf-22 and wild-type metabolite extracts revealed several partial structures representing compounds produced by wild-type but not daf-22 worms, including several previously unknown compounds. These compounds represented far less than 0.1% of the entire metabolite mixture and therefore further characterization through HSQC or HMBC was not possible based on spectra of the unfractionated metabolite extracts. However, the differentially produced compounds were easily identified after partial

Figure 13 DANS-based comparison of Caenorhabditis elegans wild-type and daf-22 mutant metabolomes.23 Schematic: the NMR spectrum of the active wild-type metabolome and the NMR spectrum of the inactive daf-22 metabolome are combined in a DANS overlay that reveals daf-22-dependent metabolites.


Figure 14 Components of the Caenorhabditis elegans mating signal identified through DANS.23

chromatographic purification, using additional 2D NMR spectroscopy and MS, as the ascarosides ascr#7 (22) and ascr#8 (8). In total, the DANS-based comparison of C. elegans wild-type and daf-22 metabolite extracts led to the identification of four novel ascarosides, three of which were shown to function as mating pheromones or regulators of developmental timing. Ultimately, these investigations made it possible to fully reconstitute the male-attracting activity of wild-type pheromone extract to that of a daf-22 mutant.23 One significant problem for any comparison of metabolite mixtures is that metabolism is strongly dependent on environmental conditions, and even small changes in temperature, nutrient conditions, or other factors can induce significant changes in relative concentrations of compounds. To minimize the impact of such variations, the algorithm used for DANS in this study was chosen in such a way that it would highlight only cases where a compound is completely absent (given the detection limit of the NMR spectroscopic equipment) from the daf-22 spectra. An increase in NMR spectroscopic sensitivity, or consideration of metabolites whose biosynthesis is less strongly daf-22-dependent, could reveal additional compounds relevant for phenotypic differences between wild-type and daf-22 worms. This study showed that comparative NMR spectroscopic methods such as DANS can be used to dissect changes in small molecule production in response to genetic manipulation, and that this approach could complement or replace activity-guided fractionation for identifying biologically relevant small molecules. The primary benefit of DANS lies in the ability to quickly obtain structural information for metabolites that may represent good candidates for further evaluation in a specific biological context.

9.06.3.3

Complex Mixture Analysis by NMR

Covariance NMR data processing developed by the Bruschweiler laboratory leads to high-resolution symmetric 2D datasets, even with relatively low-resolution acquisition in the indirect dimension.71,72 Bruschweiler’s group has developed an efficient approach called COLMAR to identify individual components in complex biological covariance NMR spectra.73 COLMAR is freely available through a Web Portal developed and maintained by the Bruschweiler laboratory (http://spin.magnet.fsu.edu/). The input dataset for COLMAR is a covariance-processed 2D NMR spectrum. Originally, COLMAR was developed for homonuclear TOCSY spectra, but the Bruschweiler laboratory is adding other options for the analysis of 2D 13C-HSQC–TOCSY datasets. The heart of COLMAR is an algorithm called DemixC, which deconvolutes covariance TOCSY spectra and extracts 1D spectral traces that represent individual spin systems with minimal likelihood of overlap and thus, individual compounds.13,74 Although they are a probabilistic measure of nonoverlapping spin systems, the 1D traces from DemixC look like 1D NMR spectra and can be analyzed similarly to 1D NMR spectra of pure compounds. The final component of COLMAR is an efficient database matching algorithm called COLMAR Query.75,76 Chemical shifts from the DemixC traces are screened against the BMRB or other metabolomics spectral databases.77 The output of COLMAR Query is a ranked list of the highest scoring compounds with the best matches to known compounds in the database. COLMAR represents an efficient way to semiautomatically identify known compounds from a complex mixture, because it only requires a single 2D TOCSY spectrum as input. As mixture analysis by NMR is developed, it is increasingly critical to improve NMR small molecule databases.
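To illustrate why covariance processing yields a symmetric spectrum whose resolution in both dimensions matches that of the directly detected dimension, the sketch below applies the covariance transform as it is commonly described, the matrix square root of F^T F for a real data matrix F with one row per indirect-dimension point. It is an illustration under that assumption, not the COLMAR or DemixC implementation, and the function name is hypothetical.

```python
import numpy as np


def covariance_spectrum(mixed):
    """Return the symmetric matrix (F^T F)^(1/2) for an N1 x N2 real matrix F.

    Rows index the (sparsely sampled) indirect dimension, columns the
    directly detected frequency axis. The output is N2 x N2.
    """
    f = np.asarray(mixed, dtype=float)
    if f.ndim != 2:
        raise ValueError("expected a 2D data matrix")
    # With F = U diag(s) V^T, F^T F = V diag(s^2) V^T,
    # so its matrix square root is V diag(s) V^T.
    _, s, vt = np.linalg.svd(f, full_matrices=False)
    return (vt.T * s) @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(8, 32))   # 8 indirect points, 32 direct points
    cov = covariance_spectrum(data)
    print(cov.shape)                  # (32, 32)
```

With only 8 rows in the indirect dimension, the result is still a 32 x 32 symmetric matrix, which is the sense in which a low-resolution acquisition in the indirect dimension can produce a high-resolution symmetric 2D dataset.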
There are currently three main publicly accessible small-molecule databases available with NMR data, the Biological Magnetic Resonance Data Bank (BMRB: http://www.bmrb.wisc.edu/),78,79 the Madison Metabolomics Consortium Database (MMCD: http://mmcd.nmrfam.wisc.edu/),80 and the Human Metabolome Database (HMDB: http://www.hmdb.ca/).81 These databases each support searching experimental NMR databases for matches to experimental spectra. The MMCD and HMDB both extensively link to other databases, and MMCD has chemical shift prediction protocols that can aid identification. Importantly, they contain experimental NMR spectra that can be downloaded and compared with experimental mixtures. New tools such as MetaboMiner82 are being developed to analyze experimental NMR data of unknown mixtures with library spectra from all of the databases. The BMRB database has extensive datasets with raw time-domain data that can be freely downloaded and analyzed. The MMCD directly utilizes the BMRB experimental data, and efforts are being made to put the experimental data from the HMDB into the BMRB database. The BMRB accepts referenced and assigned NMR data from users, so the database is steadily growing. NMR databases are less developed than their GC–MS counterparts, and there are several technical issues related to referencing, solution conditions, scalar couplings, and specific types of NMR experiments and detected nuclei that make NMR databases more complex than mass-spectrometry databases. As NMR small molecule databases develop, complex mixture analysis by NMR will become more and more important and routine.

9.06.3.4

Metabolomics/Metabonomics

Another very powerful approach to complex mixture analysis by NMR has been developed for biomarker discovery. The Nicholson group has developed computational tools and approaches to identify small-molecule metabolites that change in response to some perturbation such as the use of a drug or disease.83 Statistical correlation spectroscopy (STOCSY) is a powerful approach that utilizes the natural variation found in all biological samples to find biomarkers.27 STOCSY is based on a very simple concept: standard 1D NMR spectra are recorded on a large number of samples, and the NMR signals in these spectra are then statistically correlated by comparing their amplitudes between samples. By statistically correlating the amplitudes of chemical shifts from individual spectra, resonances that are from the same compound or biosynthetic pathway can be identified. Furthermore, the Nicholson group has developed approaches that can be used to discriminate between correlations from the same compound versus correlations from the same metabolic pathway.84 Much of the development and most applications of STOCSY concerned very complex mixtures such as human urine or blood plasma. For example, STOCSY can be used to compare groups of control versus diseased or drug-treated individuals in order to discover compounds that are unique biomarkers of the condition of interest.85–87 A relatively simple extension of STOCSY, called statistical heterospectroscopy (SHY), allows for correlating different types of datasets, such as NMR and MS or even microarray data collected on the same samples.28 SHY could aid in structure identification from complex mixtures by correlating NMR and mass spectrometry data to assign molecular weights to compounds with known chemical shifts. One great advantage of the STOCSY approach to complex mixture analysis is that key resonances can be efficiently identified as potential biomarkers.
In other words, it can help to find the proverbial needle in a haystack from a large mixture of compounds.
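The core idea of STOCSY, correlating signal amplitudes across many 1D spectra, can be sketched in a few lines. The example below computes the Pearson correlation between one chosen "driver" data point and every other point of the spectrum across samples. It is a minimal illustration of the concept, not the Nicholson group's software; the function name and the layout of the input array (one row per sample, one column per chemical shift point) are assumptions.

```python
import numpy as np


def stocsy_trace(spectra, driver_index):
    """Correlate every spectral point with a driver point across samples.

    `spectra` has shape (n_samples, n_points). Returns one Pearson
    correlation coefficient per point; points that do not vary between
    samples are assigned a correlation of 0.
    """
    x = np.asarray(spectra, dtype=float)
    if x.ndim != 2 or x.shape[0] < 3:
        raise ValueError("need a 2D array with at least three samples")
    centered = x - x.mean(axis=0)
    norms = np.sqrt((centered ** 2).sum(axis=0))
    driver = centered[:, driver_index]
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (centered.T @ driver) / (norms * norms[driver_index])
    return np.nan_to_num(r)


if __name__ == "__main__":
    # Four samples, four points: points 0 and 1 belong to the same compound,
    # point 2 varies independently, point 3 is constant.
    spectra = np.array([
        [1.0, 2.0, 2.0, 7.0],
        [2.0, 4.0, 1.0, 7.0],
        [3.0, 6.0, 2.0, 7.0],
        [4.0, 8.0, 1.0, 7.0],
    ])
    print(stocsy_trace(spectra, driver_index=0))
```

Points whose amplitudes rise and fall together with the driver across samples give coefficients close to 1 and are candidates for resonances of the same compound, whereas weaker correlations may point to compounds linked through a shared pathway, as discussed above.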

9.06.4 Methods to Improve Sensitivity

Much of the technical development of NMR over the past half century has focused on improving sensitivity. The fundamental problem is the low starting Boltzmann polarization that arises from the low energies of nuclear spin transitions. Several methods have been developed to improve the sensitivity or S/N in NMR. One major approach is through pulse sequence development to optimize the efficiency and information content of NMR spectra through manipulating the spin physics; some of the more important experiments for small molecules were described above. NMR frequencies are directly proportional to the magnetic field by the basic equation, $\omega_0 = \gamma B_0$, which relates the frequency ($\omega_0$) to the applied field ($B_0$) by the gyromagnetic ratio ($\gamma$). This simple equation drives the development and purchase of larger and larger magnets, because the S/N goes up as the resonance frequency goes up. The exact increase in S/N depends on many factors, especially differential rates of relaxation at different field strengths, but it is commonly accepted that the S/N increases approximately as $B_0^{1.5}$ to $B_0^{1.75}$.88 Unfortunately, the price of big magnets also increases significantly as the field strength increases.
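As a worked example of these two relations, the sketch below converts a field strength into a proton resonance frequency and estimates the S/N gain between two fields for the quoted range of exponents. The function names are hypothetical, and the proton gyromagnetic ratio is entered as the rounded value 42.577 MHz/T, that is, divided by 2π so that the result is an ordinary frequency rather than an angular frequency.

```python
GAMMA_1H_MHZ_PER_T = 42.577  # proton gyromagnetic ratio / (2*pi), rounded


def proton_frequency_mhz(b0_tesla: float) -> float:
    """Proton resonance frequency in MHz for a static field in tesla."""
    return GAMMA_1H_MHZ_PER_T * b0_tesla


def sn_gain(b0_low: float, b0_high: float, exponent: float) -> float:
    """Estimated S/N ratio between two fields, assuming S/N ~ B0**exponent."""
    if b0_low <= 0 or b0_high <= 0:
        raise ValueError("field strengths must be positive")
    return (b0_high / b0_low) ** exponent


if __name__ == "__main__":
    for b0 in (11.7, 22.3):
        print(f"{b0} T -> {proton_frequency_mhz(b0):.0f} MHz")
    for exponent in (1.5, 1.75):
        print(f"exponent {exponent}: gain {sn_gain(11.7, 22.3, exponent):.2f}")
```

Under these assumptions, 11.7 T and 22.3 T correspond to proton frequencies of about 500 and 950 MHz, and the estimated S/N gain between them is a factor of roughly 2.6 to 3.1, which has to be weighed against the cost of the larger magnet.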


For example, a 950 MHz (22.3 T) superconducting system costs around $8 million whereas a 500 MHz (11.7 T) system is closer to $500 000. The biggest magnets also require considerable physical infrastructure and space, making the highest field systems difficult for most users to acquire, maintain, and operate. In the future, NMR facilities might become more like X-ray synchrotron facilities with very large magnets at a few major sites that can provide remote access to users. Some major magnet facilities, such as the National High Magnetic Field Laboratory, also have resistive or hybrid (resistive plus superconducting) magnets that currently can reach field strengths up to 45 T (for the hybrid magnets) and require large power supplies and other infrastructure. While these low-homogeneity magnets are not yet suitable for routine NMR, there is a possibility that methods will be developed to better utilize these for high-resolution studies.89 A far more practical way for most natural products chemists to improve S/N is through the NMR probe. Through the radio frequency (RF) coil, the NMR probe is the interface between the sample and the spectrometer, and it is used to both excite nuclear spins and detect the electrical signals generated by precessing spins. For a fraction of the cost required to purchase a magnet, a fairly routine 500 or 600 MHz system can provide outstanding S/N for small molecules with the right choice of probe. The basic requirement for a probe is that it has an electrical conductor oriented to deliver a magnetic field B1 that is perpendicular to the static field B0, and there are several ways to achieve this. Standard commercial probes that are sold with virtually every NMR system have coils that are made from copper wire and wound in a geometry to deliver a horizontal B1 magnetic field while accommodating a 5 mm vertically loaded NMR tube.
This system has been used successfully for many years, because 5-mm tubes allow for approximately 600 µl of liquid for analysis and provide good S/N for samples with concentrations of about 1 mmol l⁻¹ on modern instruments using a standard probe. For a molecule with a molecular weight of 500 Da, an investigator would need about 300 µg to get good results with a standard 5-mm NMR probe using a 600 MHz spectrometer. For challenging studies, it is helpful to consider two different types of sample limitation: mass and solubility. Natural product studies are often mass limited because of challenges associated with the collection and isolation of samples, but they are often not solubility limited, because a wide range of organic solvents can be employed. In contrast, studies with proteins or other biological macromolecules often are solubility limited, but relatively large quantities of samples can often be produced. The worst scenario for NMR is when a sample is both mass and solubility limited. Although there is some overlap, the type of sample will often dictate the choice of NMR probe technology that can best solve the problem. Many natural product samples are severely mass limited, and it is difficult or impossible to isolate enough material to achieve the necessary concentration in a 5-mm tube. Several methods can improve the situation with mass-limited samples. Perhaps the simplest is to use 5-mm NMR tubes with susceptibility-matched plugs that reduce the need for excess sample outside of the active volume of the probe. Samples need to be long enough to extend beyond the coil to avoid edge effects that severely degrade the homogeneity of the field. Susceptibility plugs allow most of the sample to be positioned in the center of the probe, but they require careful loading and positioning to avoid air bubbles that will degrade line shapes. 
Using such plugs brings the sample requirement for a 500 Da compound to about 150 µg in a standard probe at 600 MHz. Although susceptibility-matched NMR tubes are useful in optimizing the use of the available sample, they can be difficult to shim and, as a result, the line shapes are sometimes compromised, which can lower overall S/N, especially in HMBC spectra. Of all 2D spectra routinely used for small molecule structure elucidation, HMBC spectra generally have the lowest S/N, and, unfortunately, signal strength in HMBC spectra is also strongly dependent on 1H line shapes.
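The sample amounts quoted in this section follow from mass = concentration × volume × molecular weight. The short sketch below works through that arithmetic for the standard 5-mm tube; the function name and unit choices (mmol per litre, microlitres, micrograms) are illustrative assumptions, not part of the chapter.

```python
def required_mass_ug(conc_mmol_per_l: float, volume_ul: float, mw_g_per_mol: float) -> float:
    """Return the sample mass in micrograms.

    conc_mmol_per_l : target concentration in mmol per litre
    volume_ul       : sample volume in microlitres
    mw_g_per_mol    : molecular weight in g/mol (numerically equal to Da)
    """
    moles = (conc_mmol_per_l * 1e-3) * (volume_ul * 1e-6)  # mol = (mol/l) * l
    grams = moles * mw_g_per_mol
    return grams * 1e6  # g -> micrograms


# Standard 5-mm tube: about 600 microlitres at about 1 mmol/l, 500 Da compound.
standard_5mm = required_mass_ug(1.0, 600.0, 500.0)
print(f"standard 5-mm probe: about {standard_5mm:.0f} micrograms")
```

Halving the required volume at the same concentration halves the mass, which is the effect the susceptibility-matched plugs aim for.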

9.06.4.1 Specialized NMR Probes

For studies at common magnetic field strengths such as 11.7 T (500 MHz) or 14.1 T (600 MHz), there are three main ways to improve S/N beyond a standard 5-mm probe. The simplest is to make the coils smaller, as the mass sensitivity of an NMR measurement increases roughly in inverse proportion to the diameter of the coil. A second very popular approach is to cool the entire coil and preamplifier in order to reduce the noise, thus increasing the S/N. A third approach utilizes material that conducts electricity more efficiently than copper wire.
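The field and frequency pairs used throughout this section are related by the Larmor equation (frequency proportional to field through the gyromagnetic ratio). As a rough illustration, the sketch below converts field strength to 1H frequency and evaluates the commonly quoted S/N scaling exponents of 1.5 to 1.75. The constant 42.577 MHz/T is the standard proton value of the gyromagnetic ratio divided by 2π; the function names are hypothetical, and the scaling ignores relaxation and probe differences.

```python
GAMMA_1H_MHZ_PER_T = 42.577  # proton gyromagnetic ratio / (2*pi), MHz per tesla


def proton_frequency_mhz(b0_tesla: float) -> float:
    """1H resonance frequency in MHz for a static field B0 in tesla."""
    return GAMMA_1H_MHZ_PER_T * b0_tesla


def snr_gain_range(b0_low: float, b0_high: float, exponents=(1.5, 1.75)):
    """Idealized S/N gain on going from b0_low to b0_high, one value per exponent."""
    ratio = b0_high / b0_low
    return tuple(ratio ** p for p in exponents)


for b0 in (11.7, 14.1, 22.3):
    print(f"{b0:5.1f} T -> {proton_frequency_mhz(b0):6.1f} MHz")

low, high = snr_gain_range(11.7, 22.3)
print(f"11.7 T -> 22.3 T: S/N gain between {low:.1f}x and {high:.1f}x")
```

Under these assumptions, nearly doubling the field buys roughly a threefold gain at best, which is why probe technology is usually the more economical route.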

9.06.4.2 Signal-to-Noise Issues

S/N values are routinely used in NMR, especially when shopping for a new spectrometer or probe. One would think that this ratio of two numbers would be an unambiguous and objective way to compare systems, but unfortunately, it is not so straightforward. First, major NMR vendors use different algorithms to estimate noise, and several additional definitions of noise are used in the literature. Second, the thickness of the walls of NMR tubes can influence S/N measurements, especially as the tube diameter decreases. Third, not all probe and spectrometer manufacturers use the same standards: conventional top-loading tube systems most commonly use 0.1% ethylbenzene in CDCl3, but solenoidal flow systems typically report S/N values using 10 mmol l⁻¹ sucrose in D2O. Finally, when working with very small volumes, solvent volatility can play a role in manufacturing consistent sealed standards. For example, when evaluating the performance of a 1-mm probe, we found differences as large as 10% between two factory-sealed samples of 0.1% ethylbenzene in CDCl3. In short, S/N is a useful guide but needs to be interpreted with great care, especially when informing decisions on major purchases.

9.06.4.3 Small Coils

The sensitivity of an NMR coil is defined as the B1 field per unit current, and this is inversely proportional to the diameter of the coil,88 so smaller coils have greater mass sensitivity. Any type of coil can be made smaller, and standard saddle coil or Helmholtz designs that accommodate vertically loaded tubes are commercially available as small as 1 mm in diameter. However, depending on details of the coil geometry, solenoid coils can provide between 2 and 3 times greater sensitivity than a standard saddle coil.88 Solenoid coils pose two challenges, both of which have been nicely addressed. First, the horizontal orientation of the solenoid in the main B0 field causes severe distortions to the field. Uncorrected, this leads to poor NMR line shape and big losses in sensitivity. Andrew Webb, Jonathan Sweedler, and colleagues solved this problem by surrounding the coil in a fluid that has the same magnetic susceptibility as copper wire.90 Using this approach, high-quality NMR spectra can be obtained from extremely small volumes of sample by using small coil diameters.90–92 The second problem involves sample handling; the horizontal orientation of the coil makes standard NMR tubes and sample loading impossible. In principle, samples can be placed in sealed capillary tubes and inserted into the coils, but this is cumbersome and requires that the probe be removed from the magnet for each sample change. Moreover, sealing the tubes without introducing air bubbles is difficult. A much more practical solution is to connect tubing to flow the samples into and out of the coil. This loading scheme can be as simple as a syringe attached to tubing or as complex as the output of a chromatographic separation. Integrated systems that utilize 1-mm solenoidal microcoil probes and various sample-loading methods are available commercially from Protasis. 
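The inverse-diameter scaling described above can be put into numbers with a minimal sketch. It assumes only the two statements in the text (sensitivity inversely proportional to coil diameter, and a solenoid advantage of 2 to 3 over a saddle coil); the function name is hypothetical, and real probes deviate from this idealized scaling because of filling factor, wall thickness, and coil geometry.

```python
def relative_mass_sensitivity(d_ref_mm: float, d_mm: float, geometry_factor: float = 1.0) -> float:
    """Idealized mass sensitivity of a coil of diameter d_mm relative to one of d_ref_mm.

    geometry_factor is 1.0 for the same coil type, or 2-3 when comparing a
    solenoid against a saddle coil (range taken from the text).
    """
    return (d_ref_mm / d_mm) * geometry_factor


# 5-mm saddle coil versus a 1-mm saddle coil, then versus a 1-mm solenoid.
print(relative_mass_sensitivity(5.0, 1.0))        # same geometry
print(relative_mass_sensitivity(5.0, 1.0, 2.0))   # solenoid, lower bound
print(relative_mass_sensitivity(5.0, 1.0, 3.0))   # solenoid, upper bound
```

This is a scaling argument only; the measured gains reported for specific probes should take precedence over these estimates.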
The utility of commercially available solenoidal microprobes for the analysis of mass-limited natural products has recently been reviewed.10 Examples of natural products applications include the identification of 13 new steroids from only 50 specimens of the firefly Lucidota atra (e.g., 4 in Figure 2).4 These analyses were carried out on only partially purified samples, each containing 20–100 µg of up to three steroids. In direct comparison to using a 5-mm inverse-detection room temperature probe and susceptibility plugs (Shigemi tubes), the use of a solenoidal microprobe provided up to a threefold gain in S/N while maintaining very high spectral quality. These small-volume systems are good choices for either high-throughput semiautomated analysis or environments with multiple users, because the probes can be easily switched with other standard probes.10 S/N comparisons between conventional tube systems and flow solenoids are especially problematic. However, based on comparisons between a 1-mm cryogenic HTS probe93 and values in the literature from a 1-mm room temperature solenoid,94 about 30 µg of a 500 Da sample would give comparable results to a 1 mmol l⁻¹ sample in a 5-mm warm copper probe.

9.06.4.4 Cooling the Electronics

A second approach to increasing S/N through NMR probe design involves reducing the coil and receiver noise. Significant advances have been made during the past decade in cryogenically cooling the coils, electrical circuits, and preamplifiers in order to reduce the thermal noise associated with the measurement.95 Major commercial NMR vendors offer cryogenically cooled probes, and these are very effective, even with standard copper wire and coil geometries that allow top-loading samples. All of these probes thermally isolate the sample from the coils, which are cooled to about 20 K. Sample temperatures can be regulated in modest ranges around room temperature, so biological samples are easily analyzed. Although the increase in S/N depends on the dielectric properties of the sample, an increase by a factor of about 4 is not unusual. Most cryogenic probes accommodate 5-mm tubes, and with such large sample tubes, the conductivity of the solvent can have a significant influence on S/N. For organic solvents and low-salt aqueous buffers, cryogenic probes deliver the best results. However, even moderate salt concentrations can seriously degrade their performance. The salt dependence worsens with increasing field strengths and with larger diameter samples. A 500 Da sample in an organic solvent would require roughly 75 µg for good NMR spectra in a 5-mm cryoprobe using a standard 600 MHz spectrometer. One should note that the required sample concentration would only be about 250 µmol l⁻¹, whereas in order to get similar performance with a room-temperature probe one would have to use a much higher concentration of about 1 mmol l⁻¹. Large-volume cryogenic probes are excellent choices for samples with limited solubility, explaining their widespread use in biomolecular NMR. The primary disadvantages of cryogenic probes are that they are very expensive, require more physical infrastructure (such as chilled water lines) than a conventional probe, and are difficult to install and remove. Most facilities with cryogenic probes keep them in dedicated instruments and only remove them for maintenance or repair. This works well for groups with similar samples and needs. 
Cryogenic probes typically have fixed frequencies and cannot replace the flexibility of broadband probes for unusual nuclei.
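To give a feel for why cooling the coil helps, the sketch below uses the textbook Johnson-Nyquist expression for thermal noise voltage, which scales with the square root of temperature times resistance. This is an idealized estimate, not the authors' model: it assumes coil thermal noise dominates, that the coil resistance does not change on cooling, and that room temperature is 298 K (the 20 K coil temperature is from the text). Sample and preamplifier noise, which matter in practice, are ignored.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def johnson_noise_vrms(temp_k: float, resistance_ohm: float, bandwidth_hz: float) -> float:
    """RMS thermal noise voltage of a resistor: sqrt(4 k T R df)."""
    return math.sqrt(4.0 * K_B * temp_k * resistance_ohm * bandwidth_hz)


def noise_reduction_factor(t_warm_k: float, t_cold_k: float) -> float:
    """Ratio of warm to cold noise voltage at equal resistance and bandwidth."""
    return math.sqrt(t_warm_k / t_cold_k)


# Assumed values for illustration: 1 ohm coil, 1 Hz bandwidth.
warm = johnson_noise_vrms(298.0, 1.0, 1.0)
cold = johnson_noise_vrms(20.0, 1.0, 1.0)
print(f"noise voltage ratio warm/cold: {warm / cold:.2f}")
```

Under these simplifying assumptions the noise drops by a factor close to 4, the same order as the S/N gains typically reported for cryogenic probes; conductive (salty) samples add noise that cooling the coil cannot remove.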

9.06.4.5 High-Temperature Superconducting Coils

Copper-based material is the most common conductor for NMR probes. However, there are other choices that can provide better sensitivity through improved current-carrying capacity. High-temperature superconducting (HTS) material, specifically YBCO (yttrium barium copper oxide), has been used since the early 1990s in NMR coils. The first HTS probe was designed and built at Conductus (Sunnyvale, CA) in the 1990s.96 HTS coils are constructed by depositing YBCO onto planar surfaces and inductively coupling them to a copper RF circuit.97 These coils have a much higher quality factor (Q) than cooled copper coils and as a result have been shown to provide significantly higher S/N in NMR than achieved by cold copper coils. The drawbacks and challenges of HTS coils include poor filling factors due to the flat coil geometry, difficulty and cost of construction, and difficulty in tuning flat wafers to the multiple frequencies required for biological NMR. However, the benefits of HTS probe technology are significant: for the same temperature and coil diameter, planar HTS coils can increase the S/N by up to a factor of 2 over copper wire. Combining HTS materials with cryogenic cooling and small coil size can result in very sensitive NMR probes. The National High Magnetic Field Laboratory, Bruker Biospin, and University of Florida recently collaborated to design and build a 1-mm cryogenic probe with HTS coils.93 This probe uses top-loading glass tubes with sample volumes between 5 and 10 µl, depending on the wall thickness. It has an S/N value of close to 300 for 0.1% ethylbenzene, which makes it about 20 times more mass sensitive than a commercial 5-mm warm copper probe (S/N 1000 for 0.1% ethylbenzene). Thus, a 500 Da sample would require about 10 µg for good results. However, the concentration would increase to about 2 mmol l⁻¹, so the smallest probes are not appropriate for concentration-limited samples. 
This 1-mm HTS probe has been used in several natural products studies, including the analysis of insect defensive secretions from a single or very few insects,12,13,98–100 identification of a component of the C. elegans mating pheromone,22 identification of the glycosylated pheromones suspensoside A (23) and suspensoside B from male Caribbean fruit flies,101 and several marine natural product identifications (Figure 15).7,102,103 A 1.7-mm cryogenic probe is now available commercially, and this appears to provide excellent results. Dalisay et al.5,6 reported the identification of several new natural products of very low abundance from marine sponges of the genus Phorbas, including the tetrachloro polyketide muironolide A (24) and another polyketide, hemi-phorboxazole A (2). These structures were determined based on samples of only 90 µg of muironolide A and 16.5 µg of hemi-phorboxazole A.
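The mass-sensitivity comparison for the 1-mm HTS probe can be checked with a short calculation. Because both S/N values refer to the same 0.1% ethylbenzene standard, the mass of analyte in the coil scales with sample volume, so mass sensitivity is S/N divided by volume. The sketch assumes a 600 microlitre volume for the 5-mm probe and 10 microlitres for the 1-mm probe (the upper end of the stated range); the function names are hypothetical.

```python
def mass_sensitivity_ratio(snr_a: float, vol_a_ul: float, snr_b: float, vol_b_ul: float) -> float:
    """S/N per unit mass of probe A relative to probe B at equal standard concentration."""
    return (snr_a / vol_a_ul) / (snr_b / vol_b_ul)


def concentration_mmol_per_l(mass_ug: float, volume_ul: float, mw_g_per_mol: float) -> float:
    """Concentration in mmol/l for a given mass (micrograms) and volume (microlitres)."""
    moles = mass_ug * 1e-6 / mw_g_per_mol
    litres = volume_ul * 1e-6
    return moles / litres * 1e3


# 1-mm HTS probe (S/N ~300, 10 microlitres) versus 5-mm warm probe (S/N ~1000, 600 microlitres).
print(f"mass sensitivity ratio: {mass_sensitivity_ratio(300, 10, 1000, 600):.0f}x")

# 10 micrograms of a 500 Da compound dissolved in 10 microlitres.
print(f"concentration: {concentration_mmol_per_l(10, 10, 500):.1f} mmol/l")
```

With a 5 microlitre sample the ratio doubles, so the quoted factor of about 20 sits at the conservative end of this estimate. The second result illustrates the trade-off noted in the text: very small volumes push the required concentration up.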


Figure 15 Examples for natural products identified using small-volume cryogenic probes.

The limitations of small-volume cryogenic probes are similar to those of standard cryogenic probes. If the coils are made from HTS material, there are additional challenges: the coils need to be on planar surfaces, which are not fully optimized for the geometry of cylindrical sample tubes or for the glass vacuum tubes needed to isolate the cryogenic coils from the sample at room temperature.

9.06.4.6 Probe Summary

There are many choices of probes, and the best one depends on the amount of sample, the solubility of the sample, the number of different types of users of the system, and the budget. The most flexible probes in terms of accommodating a wide range of samples and users are standard 5-mm room temperature probes. Smaller diameter probes have higher mass sensitivity, but because their sample volumes are dramatically smaller, they are not optimal for samples with limited solubility. The highest sensitivity probes are small, cryogenically cooled, and utilize high-conducting HTS materials, but these are expensive and suffer from the general limitations of cumbersome cryogenic systems and small volumes. A good bet for general biomolecular NMR and some natural product work is a 5-mm cryogenically cooled probe; however, all cryogenic probes are expensive and more difficult to exchange with other probes, such as broadband probes for non-proton-detected studies.

9.06.4.7 Dynamic Nuclear Polarization

Dynamic nuclear polarization (DNP) is a rapidly developing technique that achieves significantly higher S/N than conventional NMR spectroscopy by transferring the very large polarization of electrons to nuclei at low temperatures. Much of the development in DNP focused on solid state samples and frozen liquid samples,104 and techniques to directly polarize solutions in high magnetic fields are now being developed.105 Under the right conditions, DNP enhancements can reach several orders of magnitude,104,106 but there are some limitations of DNP that need to be overcome before it could be widely applied to natural product studies. First, DNP only lasts as long as the T1 of the polarized nucleus, so most applications are short 1D 13C experiments, although rapid acquisition 2D methods have been demonstrated on hyperpolarized samples.107 Second, nuclei with short T1 times are difficult or impossible to detect, and therefore identification of unknown compounds could be challenging without additional analyses using conventional NMR spectroscopy. Third, radicals need to be added to samples to provide a source of electrons, and investigators might be unwilling to add these to very precious natural product samples that took considerable time to isolate and purify. Finally, most solution DNP studies employ samples that were polarized in the frozen state and then thawed,106 and most studies have used aqueous solutions. Modified protocols and polarizing agents would need to be developed for natural products analysis in organic solvents. Despite these current limitations, the future of DNP for major enhancements of NMR S/N is very promising, and natural products chemists should follow the developments of this field.
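The size of the polarization gap that DNP exploits can be estimated from the Boltzmann expression for a spin-1/2 particle, P = tanh(γħB0 / 2kT). The sketch below is a hedged illustration using standard physical constants; the field (14.1 T) and temperature (298 K) are assumed example conditions, and the ratio of gyromagnetic ratios is only the theoretical ceiling for polarization transfer at a fixed field and temperature, not an achieved enhancement.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant, J s
K_B = 1.380649e-23           # Boltzmann constant, J/K
GAMMA_1H = 2.6752218744e8    # proton gyromagnetic ratio, rad s^-1 T^-1
GAMMA_13C = 6.728284e7       # carbon-13 gyromagnetic ratio, rad s^-1 T^-1
GAMMA_E = 1.76085963023e11   # electron gyromagnetic ratio (magnitude), rad s^-1 T^-1


def polarization(gamma: float, b0_tesla: float, temp_k: float) -> float:
    """Thermal-equilibrium polarization of a spin-1/2 ensemble."""
    return math.tanh(gamma * HBAR * b0_tesla / (2.0 * K_B * temp_k))


# Assumed example conditions: 600 MHz magnet at room temperature.
p_h = polarization(GAMMA_1H, 14.1, 298.0)
p_e = polarization(GAMMA_E, 14.1, 298.0)
print(f"1H polarization:       {p_h:.2e}")
print(f"electron polarization: {p_e:.2e}")
print(f"ceiling for 1H enhancement:  {GAMMA_E / GAMMA_1H:.0f}x")
print(f"ceiling for 13C enhancement: {GAMMA_E / GAMMA_13C:.0f}x")
```

Polarizing at low temperature and then observing at room temperature, as in dissolution experiments, adds a further temperature factor on top of these ratios, which is one way enhancements of several orders of magnitude can arise.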


9.06.5 Outlook

It is an exciting time for natural products chemistry. Analytical tools are now available that significantly reduce the amount of sample needed for structure determination. Studies that a few decades ago required heroic efforts and years to isolate enough material for NMR spectroscopic analysis can now be done with two, three, or more orders of magnitude less material. This not only makes natural products research much more efficient; more importantly, it opens up possibilities for entirely new lines of scientific inquiry, involving individual variation, population chemical biology, and much more extensive examination of the influence of genetic or environmental factors on natural product expression levels. Because of the advances in analytical technology, natural products chemists now have tremendous opportunities to take full advantage of ‘omics’-type approaches. Genomics and proteomics technologies and databases make it much more feasible to both study and manipulate the biosynthesis of important natural products. This will not only allow a more complete understanding of basic biological processes but also enable more efficient drug development that uses natural products as a starting point. Computational power and databases constantly improve and will allow the design of more and more comprehensive approaches to biological problems. Metabolomics has recently emerged as a key component of ‘systems biology’, and with new analytical, computational, and information technology, metabolomics is evolving into a central hub that connects many of the other ‘omics’ to better understand biology and, consequently, human health. As analytical and computational tools improve, the distinction between natural products chemistry and metabolomics is becoming less and less clear. 
Natural products represent both downstream products and upstream regulators of metabolic pathways, and as we learn more about these interactions, we learn more about function and possible applications of natural products. One of the current challenges in metabolomics studies is ‘biomarker identification’, a new term for natural products chemistry. As traditional natural products studies become more integrated with the ‘omics’ and as metabolomics becomes more focused on identifying key metabolites, the two fields will become less and less distinct.

Abbreviations

2D-NMR        two-dimensional NMR
COLMAR        complex liquid mixture analysis by NMR
COSY          correlation spectroscopy
DANS          differential analysis of 2D NMR spectra
DECODES       diffusion-encoded spectroscopy
DNP           dynamic nuclear polarization
DOSY          diffusion-ordered spectroscopy
dqfCOSY       double-quantum-filtered correlation spectroscopy
E.COSY        exclusive correlation spectroscopy
gCOSY         gradient correlation spectroscopy
HMBC          heteronuclear multiple-bond correlation
HMQC          heteronuclear multiple-quantum correlation
HOHAHA        homonuclear Hartmann–Hahn correlation
HSQC          heteronuclear single-quantum correlation
INADEQUATE    incredible natural abundance double quantum transfer experiment
MS            mass spectrometry
NMR           nuclear magnetic resonance
NOESY         nuclear Overhauser effect spectroscopy
ROESY         rotating-frame Overhauser effect spectroscopy
SHY           statistical heterospectroscopy
STOCSY        statistical correlation spectroscopy
TOCSY         total correlation spectroscopy


References

1. P. F. Wiley; K. Gerzon; E. H. Flynn; M. V. Sigal; O. Weaver; U. C. Quarck; R. R. Chauvette; R. Monahan, J. Am. Chem. Soc. 1957, 79, 6062–6070.
2. P. F. Wiley; M. V. Sigal; O. Weaver; R. Monahan; K. Gerzon, J. Am. Chem. Soc. 1957, 79, 6070–6074.
3. P. F. Wiley; R. Gale; C. W. Pettinga; K. Gerzon, J. Am. Chem. Soc. 1957, 79, 6074–6077.
4. M. Gronquist; J. Meinwald; T. Eisner; F. C. Schroeder, J. Am. Chem. Soc. 2005, 127, 10810–10811.
5. D. S. Dalisay; T. F. Molinski, Org. Lett. 2009, 11, 1967–1970.
6. D. S. Dalisay; B. I. Morinaka; C. K. Skepper; T. F. Molinski, J. Am. Chem. Soc. 2009, 131, 7552–7553.
7. D. S. Dalisay; E. W. Rogers; A. S. Edison; T. F. Molinski, J. Nat. Prod. 2009, 72, 732–738.
8. T. F. Molinski, Curr. Opin. Drug Discovery Dev. 2009, 12, 197–206.
9. J. Meinwald; D. F. Wiemer; T. Eisner, J. Am. Chem. Soc. 1979, 101, 3055–3060.
10. F. C. Schroeder; M. Gronquist, Angew. Chem. Int. Ed. Engl. 2006, 45, 7122–7131.
11. M. Yoshida; M. Murata; K. Inaba; M. Morisawa, Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 14831–14836.
12. A. T. Dossey; S. S. Walse; J. R. Rocca; A. S. Edison, ACS Chem. Biol. 2006, 1, 511–514.
13. F. Zhang; A. T. Dossey; C. Zachariah; A. S. Edison; R. Bruschweiler, Anal. Chem. 2007, 79, 7748–7752.
14. H. Barjat; G. A. Morris; S. Smart; A. G. Swanson; S. C. R. Williams, J. Magn. Reson. Ser. B 1995, 108, 170–172.
15. K. Bleicher; M. F. Lin; M. J. Shapiro; J. R. Wareing, J. Org. Chem. 1998, 63, 8486–8490.
16. M. F. Lin; M. J. Shapiro, J. Org. Chem. 1996, 61, 7617–7619.
17. M. F. Lin; M. J. Shapiro; J. R. Wareing, J. Org. Chem. 1997, 62, 8930–8931.
18. M. F. Lin; M. J. Shapiro; J. R. Wareing, J. Am. Chem. Soc. 1997, 119, 5249–5250.
19. M. G. Lin; M. J. Shapiro, Anal. Chem. 1997, 69, 4731–4733.
20. A. E. Taggi; J. Meinwald; F. C. Schroeder, J. Am. Chem. Soc. 2004, 126, 10364–10369.
21. F. C. Schroeder; A. E. Taggi; M. Gronquist; R. U. Malik; J. B. Grant; T. Eisner; J. Meinwald, Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 14283–14287.
22. J. Srinivasan; F. Kaplan; R. Ajredini; C. Zachariah; H. T. Alborn; P. E. Teal; R. U. Malik; A. S. Edison; P. W. Sternberg; F. C. Schroeder, Nature 2008, 454, 1115–1118.
23. C. Pungaliya; J. Srinivasan; B. W. Fox; R. U. Malik; A. H. Ludewig; P. W. Sternberg; F. C. Schroeder, Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 7708–7713.
24. R. A. Butcher; F. C. Schroeder; M. A. Fischbach; P. D. Straight; R. Kolter; C. T. Walsh; J. Clardy, Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 1506–1509.
25. F. Schroder; V. Sinnwell; H. Baumann; M. Kaib, Chem. Commun. 1996, 2139–2140.
26. F. Schroder; V. Sinnwell; H. Baumann; M. Kaib; W. Francke, Angew. Chem. Int. Ed. Engl. 1997, 36, 77–80.
27. E. Holmes; O. Cloarec; J. K. Nicholson, J. Proteome Res. 2006, 5, 1313–1320.
28. D. J. Crockford; E. Holmes; J. C. Lindon; R. S. Plumb; S. Zirah; S. J. Bruce; P. Rainville; C. L. Stumpf; J. K. Nicholson, Anal. Chem. 2006, 78, 363–371.
29. D. B. Kell; M. Brown; H. M. Davey; W. B. Dunn; I. Spasic; S. G. Oliver, Nat. Rev. Microbiol. 2005, 3, 557–565.
30. J. K. C. Nicholson; L. John; C. John; E. Holmes, Nat. Rev. Drug Discov. 2002, 1, 153–161.
31. B. J. Blaise, Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 19808.
32. Y. Wang; J.r. Utzinger; J. Saric; J. V. Li; J. Burckhardt; S. Dirnhofer; J. K. Nicholson; B. H. Singer; R. Brun; E. Holmes, Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 6127–6132.
33. S. Rochfort, J. Nat. Prod. 2005, 68, 1813–1820.
34. A. E. Derome, Nat. Prod. Rep. 1989, 6, 111–141.
35. W. F. Reynolds; R. G. Enriquez, J. Nat. Prod. 2002, 65, 221–244.
36. E. E. Kwan; S. G. Huang, Eur. J. Org. Chem. 2008, 2671–2688.
37. T. D. W. Claridge, High-Resolution NMR Techniques in Organic Chemistry, 1st ed.; Pergamon: Amsterdam: New York, 1999.
38. S. L. Robinette; F. Zhang; L. Bruschweiler-Li; R. Bruschweiler, Anal. Chem. 2008, 80, 3606–3611.
39. F. Zhang; R. Bruschweiler, ChemPhysChem 2004, 5, 794–796.
40. S. P. Rucker; A. J. Shaka, Mol. Phys. 1989, 68, 509–517.
41. A. Bax; D. G. Davis, J. Magn. Reson. 1985, 65, 355–360.
42. A. Bax, Methods Enzymol. 1989, 176, 151–168.
43. R. A. Butcher; M. Fujita; F. C. Schroeder; J. Clardy, Nat. Chem. Biol. 2007, 3, 420–422.
44. C. Griesinger; O. W. Sorensen; R. R. Ernst, J. Magn. Reson. 1987, 75, 474–492.
45. B. Vogeli; L. Yao; A. Bax, J. Biomol. NMR 2008, 41, 17–28.
46. F. C. Schroeder; T. Tolasch, Tetrahedron 1998, 54, 12243–12248.
47. T. D. Claridge; I. Perez-Victoria, Org. Biomol. Chem. 2003, 1, 3632–3634.
48. D. Neuhaus; M. P. Williamson, The Nuclear Overhauser Effect in Structural and Conformational Analysis, 2nd ed.; Wiley-VCH: New York, 2000.
49. A. Bax; R. Freeman; T. A. Frenkiel, J. Am. Chem. Soc. 1981, 103, 2102–2104.
50. W. Francke; F. Schroeder; F. Walter; V. Sinnwell; H. Baumann; M. Kaib, Liebigs Ann. 1995, 0, 965–977.
51. J. T. Arnold; S. S. Dharmatti; M. E. Packard, J. Chem. Phys. 1951, 19, 507.
52. M. E. Packard; J. T. Arnold, Phys. Rev. 1951, 83, 210–211.
53. R. R. Ernst, Biosci. Rep. 1992, 12, 143–187.
54. N. Bross-Walch; T. Kuhn; D. Moskau; O. Zerbe, Chem. Biodivers. 2005, 2, 147–177.
55. P. L. Rinaldi, Analyst 2004, 129, 687–699.
56. F. Schroeder; S. Franke; W. Francke; H. Baumann; M. Kaib; J. M. Pasteels; D. Daloze, Tetrahedron 1996, 52, 13539–13546.
57. F. C. Schroeder; S. R. Smedley; L. K. Gibbons; J. J. Farmer; A. B. Attygalle; T. Eisner; J. Meinwald, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 13387–13391.
58. F. C. Schroder; J. J. Farmer; A. B. Attygalle; S. R. Smedley; T. Eisner; J. Meinwald, Science 1998, 281, 428–431.
59. J. McCormick; Y. Li; K. McCormick; H. I. Duynstee; A. K. van Engen; G. A. van der Marel; B. Ganem; J. H. van Boom; J. Meinwald, J. Am. Chem. Soc. 1999, 121, 5661–5665.
60. S. Grzesiek; H. Dobeli; R. Gentz; G. Garotta; A. M. Labhardt; A. Bax, Biochemistry 1992, 31, 8180–8190.
61. F. C. Schroeder; D. M. Gibson; A. C. Churchill; P. Sojikul; E. J. Wursthorn; S. B. Krasnoff; J. Clardy, Angew. Chem. Int. Ed. Engl. 2007, 46, 901–904.
62. F. E. Koehn, Prog. Drug Res. 2008, 65, 175, 177–210.
63. K. Biemann, J. Am. Soc. Mass Spectrom. 2002, 13, 1254–1272.
64. L. Smith; J. Novak; J. Rocca; S. McClung; J. D. Hillman; A. S. Edison, Eur. J. Biochem. 2000, 267, 6810–6816.
65. V. H. Wysocki; K. A. Resing; Q. Zhang; G. Cheng, Methods 2005, 35, 211–222.
66. K. Adermann; H. John; L. Standker; W. G. Forssmann, Curr. Opin. Biotechnol. 2004, 15, 599–606.
67. F. C. Schroeder; D. M. Gibson; A. C. L. Churchill; P. Sojikul; E. J. Wursthorn; S. B. Krasnoff; J. Clardy, Angew. Chem. Int. Ed. Engl. 2007, 46, 901–904.
68. J. Moldenhauer; X. H. Chen; R. Borriss; J. Piel, Angew. Chem. Int. Ed. Engl. 2007, 46, 8195–8197.
69. F. C. Schroeder, ACS Chem. Biol. 2006, 1, 198–200.
70. P. Y. Jeong; M. Jung; Y. H. Yim; H. Kim; M. Park; E. Hong; W. Lee; Y. H. Kim; K. Kim; Y. K. Paik, Nature 2005, 433, 541–545.
71. R. Bruschweiler; F. Zhang, J. Chem. Phys. 2004, 120, 5253–5260.
72. R. Bruschweiler, J. Chem. Phys. 2004, 121, 409–414.
73. F. Zhang; L. Bruschweiler-Li; S. L. Robinette; R. Bruschweiler, Anal. Chem. 2008, 80, 7549–7553.
74. F. L. Zhang; R. Bruschweiler, ChemPhysChem 2004, 5, 794–796.
75. S. L. Robinette; F. Zhang; L. Brüschweiler-Li; R. Brüschweiler, Anal. Chem. 2008, 80, 3606–3611.
76. D. A. Snyder; F. Zhang; S. L. Robinette; L. Brüschweiler-Li; R. Brüschweiler, J. Chem. Phys. 2008, 128, 052313.
77. B. R. Seavey; E. A. Farr; W. M. Westler; J. L. Markley, J. Biomol. NMR 1991, 1, 217–236.
78. E. L. Ulrich; H. Akutsu; J. F. Doreleijers; Y. Harano; Y. E. Ioannidis; J. Lin; M. Livny; S. Mading; D. Maziuk; Z. Miller; E. Nakatani; C. F. Schulte; D. E. Tolmie; R. Kent Wenger; H. Yao; J. L. Markley, Nucleic Acids Res. 2008, 36, D402–D408.
79. J. L. Markley; M. E. Anderson; Q. Cui; H. R. Eghbalnia; I. A. Lewis; A. D. Hegeman; J. Li; C. F. Schulte; M. R. Sussman; W. M. Westler; E. L. Ulrich; Z. Zolnai, Pac. Symp. Biocomput. 2007, 12, 157–168.
80. Q. L. Cui; A. Ian; A. D. Hegeman; M. E. Anderson; J. Li; C. F. Schulte; W. M. Westler; H. R. Eghbalnia; M. R. Sussman; J. L. Markley, Nat. Biotechnol. 2008, 26, 162–164.
81. D. S. Wishart; D. Tzur; C. Knox; R. Eisner; A. Guo; N. Young; D. Cheng; K. Jewell; D. Arndt; S. Sawhney; C. Fung; L. Nikolai; M. Lewis; M. Coutouly; I. Forsythe; P. Tang; S. Shrivastava; K. Jeroncic; P. Stothard; G. Amegbey; D. Block; D. Hau; J. Wagner; J. Miniaci; M. Clements; M. Gebremedhin; N. Guo; Y. Zhang; G. E. Duggan; G. D. Macinnis; A. M. Weljie; R. Dowlatabadi; F. Bamforth; D. Clive; R. Greiner; L. Li; T. Marrie; B. D. Sykes; H. J. Vogel; L. Querengesser, Nucleic Acids Symp. Ser. 2007, 35, D521–D526.
82. J. Munger; B. D. Bennett; A. Parikh; X.-J. Feng; J. McArdle; H. A. Rabitz; T. Shenk; J. D. Rabinowitz, Nat. Biotechnol. 2008, 26, 1179–1186.
83. J. K. Nicholson; J. C. Lindon, Nature 2008, 455, 1054–1056.
84. A. Couto Alves; M. Rantalainen; E. Holmes; J. K. Nicholson; T. M. Ebbels, Anal. Chem. 2009, 81, 2075–2084.
85. A. D. Maher; O. Cloarec; P. Patki; M. Craggs; E. Holmes; J. C. Lindon; J. K. Nicholson, Anal. Chem. 2009, 81, 288–295.
86. A. D. Maher; D. Crockford; H. Toft; D. Malmodin; J. H. Faber; M. I. McCarthy; A. Barrett; M. Allen; M. Walker; E. Holmes; J. C. Lindon; J. K. Nicholson, Anal. Chem. 2008, 80, 7354–7362.
87. E. Holmes; R. L. Loo; O. Cloarec; M. Coen; H. R. Tang; E. Maibaum; S. Bruce; Q. Chan; P. Elliott; J. Stamler; I. D. Wilson; J. C. Lindon; J. K. Nicholson, Anal. Chem. 2007, 79, 2629–2640.
88. D. I. Hoult; R. E. Richards, J. Magn. Reson. 1976, 24, 71–85.
89. B. Shapira; K. Shetty; W. W. Brey; Z. H. Gan; L. Frydman, Chem. Phys. Lett. 2007, 442, 478–482.
90. D. L. Olson; T. L. Peck; A. G. Webb; R. L. Magin; J. V. Sweedler, Science 1995, 270, 1967–1970.
91. D. Raftery, Anal. Bioanal. Chem. 2004, 378, 1403–1404.
92. A. G. Webb, J. Pharm. Biomed. Anal. 2005, 38, 892–903.
93. W. W. Brey; A. S. Edison; R. E. Nast; J. R. Rocca; S. Saha; R. S. Withers, J. Magn. Reson. 2006, 179, 290–293.
94. D. L. Olson; J. A. Norcross; M. O’Neil-Johnson; P. F. Molitor; D. J. Detlefsen; A. G. Wilson; T. L. Peck, Anal. Chem. 2004, 76, 2966–2974.
95. H. Kovacs; D. Moskau; M. Spraul, Prog. Nuclear Magn. Reson. Spectrosc. 2005, 46, 131–155.
96. W. W. Brey; W. Anderson; W. H. Wong; L. F. Fuks; V. Y. Kotsubo; R. S. Withers, Nuclear Magnetic Resonance Probe Coil. U.S. Patent 5,565,778, 1996.
97. W. A. Anderson; W. W. Brey; A. L. Brooke; B. Cole; K. A. Delin; L. F. Fuks; H. D. W. Hill; M. E. Johanson; V. Kotsubo; R. Nast; R. S. Withers; W. H. Wong, Bull. Magn. Reson. 1995, 17, 98–102.
98. A. T. Dossey; S. S. Walse; A. S. Edison, J. Chem. Ecol. 2008, 34, 584–590.
99. A. T. Dossey; S. S. Walse; O. V. Conle; A. S. Edison, J. Nat. Prod. 2007, 70, 1335–1338.
100. B. Wang; A. T. Dossey; S. S. Walse; A. S. Edison; K. M. Merz, Jr., J. Nat. Prod. 2009, 72, 709–713.
101. S. S. Walse; F. Lu; P. E. Teal, J. Nat. Prod. 2008, 71, 1726–1731.
102. S. Matthew; P. J. Schupp; H. Luesch, J. Nat. Prod. 2008, 71, 1113–1116.
103. J. C. Kwan; J. R. Rocca; K. A. Abboud; V. J. Paul; H. Luesch, Org. Lett. 2008, 10, 789–792.
104. A. B. Barnes; G. D. Paepe; P. C. van der Wel; K. N. Hu; C. G. Joo; V. S. Bajaj; M. L. Mak-Jurkauskas; J. R. Sirigiri; J. Herzfeld; R. J. Temkin; R. G. Griffin, Appl. Magn. Reson. 2008, 34, 237–263.
105. M. J. Prandolini; V. P. Denysenkov; M. Gafurov; B. Endeward; T. F. Prisner, J. Am. Chem. Soc. 2009, 131, 6090–6092.
106. J. H. Ardenkjaer-Larsen; B. Fridlund; A. Gram; G. Hansson; L. Hansson; M. H. Lerche; R. Servin; M. Thaning; K. Golman, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 10158–10163.
107. L. Frydman; D. Blazina, Nat. Phys. 2007, 3, 415–419.


Biographical Sketches

Arthur S. Edison obtained a B.S. in chemistry from the University of Utah, where he studied monoterpenes isolated from southern Utah sagebrush by NMR. He completed his Ph.D. in biophysics from the University of Wisconsin, Madison, where he developed and applied NMR methods for peptide and protein structural studies under the supervision of John Markley and Frank Weinhold. In 1993, Dr. Edison joined the laboratory of Anthony O. W. Stretton at the University of Wisconsin as a Jane Coffin Childs postdoctoral fellow where he investigated the role of neuropeptides in the nervous system of the parasitic nematode Ascaris suum. He joined the faculty at the University of Florida and the National High Magnetic Field Laboratory in 1996 and is currently the Director of Chemistry & Biology at the NHMFL. Dr. Edison’s current research is in technology development for high-sensitivity NMR and natural product discovery in nematodes and other invertebrates. Dr. Edison is the recipient of the 1997 American Heart Association Robert J. Boucek Award, a CAREER Award from the National Science Foundation in 1999, and, with his postdoctoral scientist Aaron Dossey, the Beal award for the best publication of the year in the Journal of Natural Products in 2007.

Frank C. Schroeder studied chemistry and physics at the University of Hamburg, where he worked under the guidance of Wittko Francke. He received his doctorate in 1998 for studies on structures and functions of insect-derived natural products, which included the serendipitous discovery of a group of structurally complex ant alkaloids, the myrmicarins. During his graduate studies, he developed a deep appreciation for NMR spectroscopy as a tool in natural products chemistry and metabolomics. He continued to develop new analytical methodology for characterizing structures and functions of small molecule metabolites as a postdoc and later research associate with Jerrold Meinwald at Cornell University and Jon Clardy at Harvard Medical School. In August 2007, he joined the faculty of Cornell University’s Boyce Thompson Institute and the Cornell Department of Chemistry and Chemical Biology. Dr. Schroeder’s research aims to develop NMR spectroscopy-based approaches that complement or enhance traditional methodology by enabling detailed characterization of small molecule metabolites in complex biological samples, with regard to both chemical structure and biological function. His current work focuses on a comprehensive structural and functional annotation of the metabolome of the model organism Caenorhabditis elegans.

9.07 Biomolecular Recognition by Oligosaccharides and Glycopeptides: The NMR Point of View
Katalin E. Kövér, László Szilágyi, and Gyula Batta, University of Debrecen, Debrecen, Hungary
Dušan Uhrín, University of Edinburgh, Edinburgh, UK
Jesús Jiménez-Barbero, Centro de Investigaciones Biológicas, Madrid, Spain
© 2010 Elsevier Ltd. All rights reserved.

9.07.1 Scalar and Residual Dipolar Coupling Constants in the Structure Determination of Carbohydrates by NMR
9.07.1.1 Introduction
9.07.1.2 NMR Methodology
9.07.1.2.1 Assignment of resonances
9.07.1.2.2 Measurement of proton–proton coupling constants
9.07.1.2.3 Measurement of proton–carbon coupling constants
9.07.1.2.4 Measurement of carbon–carbon coupling constants
9.07.1.3 Conformational Analysis of Carbohydrates in Solution
9.07.1.3.1 Conformational analysis of glycosidic linkages in carbohydrates
9.07.1.3.2 Conformational analysis of hydroxymethyl groups in carbohydrates
9.07.1.3.3 Conformational analysis of hydroxyl groups in carbohydrates
9.07.1.4 Conformational Analysis of Carbohydrates in Dilute Liquid Crystalline Media
9.07.1.4.1 Accurate structures of small, rigid molecules from RDCs
9.07.1.4.2 Measurement of RDCs
9.07.1.4.3 Interpretation of RDCs
9.07.1.4.4 Conclusions
9.07.2 Conformation of Oligosaccharides in the Free and Bound States
9.07.2.1 Introduction
9.07.2.2 The Conformation of Oligosaccharides in Solution
9.07.2.2.1 The use of NOEs for conformational analysis of oligosaccharide molecules
9.07.2.3 The Bound State
9.07.2.4 Conclusions
9.07.3 Bacterial Cell Wall Peptidoglycans and Fragments: Structural Studies and Functions
9.07.3.1 Introduction
9.07.3.2 Peptidoglycan Structure
9.07.3.3 Peptidoglycan Fragments
9.07.3.3.1 Syntheses
9.07.3.3.2 Structural studies
9.07.3.4 Peptidoglycan Recognition Proteins
9.07.3.5 Interactions of PGRPs and Other Proteins with PGN Fragments: Structural Studies
9.07.3.6 Physiological Activities of Muropeptides
9.07.3.7 Conclusions
9.07.4 NMR of Glycopeptide (Vancomycin-Type) Antibiotics: Structure and Interaction with Cell Wall Analogue Peptides
9.07.4.1 Introduction
9.07.4.2 Basics of Mode of Action
9.07.4.2.1 Structure of the cell wall
9.07.4.2.2 Main targets of glycopeptide antibiotics
9.07.4.2.3 Common structural features of the glycopeptide antibiotics
9.07.4.3 NMR Methods for Solution Structure
9.07.4.4 Comparison of Crystal and Solution Structures – Dimerization and Ligand Binding
9.07.4.5 Possible Role of Dynamics Upon Ligand Binding
9.07.4.6 Solid-State NMR of Glycopeptide Antibiotics with Bacterial Cell Wall Complexes
9.07.4.7 Conclusions
References

9.07.1 Scalar and Residual Dipolar Coupling Constants in the Structure Determination of Carbohydrates by NMR

9.07.1.1 Introduction

Interactions of the oligosaccharide moieties of glycolipids and glycoproteins with various receptors are fundamental processes in many biological events, particularly those related to immunology. For such interactions to be efficient, it is vital that the three-dimensional (3D) structure (conformation) of the carbohydrate ligand matches the spatial requirements of the receptor’s binding site. Hence, knowledge of the conformational behavior is indispensable for understanding these processes on a molecular level. NMR is the main source of structural information for biomolecules in solution. However, a serious impediment with conformational analysis based on NMR observables (such as chemical shifts, coupling constants, NOEs, and relaxation times) is that the measured values represent averages in the case of flexible molecules showing rapidly interconverting conformations. Recent developments in NMR spectroscopy, along with advances in computational techniques, have produced new approaches to the interpretation of spin–spin coupling constants extracted from biomolecules. Quantum chemical studies of useful accuracy are now becoming more routine, and are increasingly being used in conjunction with experimental data to map out the expected structural patterns for oligosaccharides, as well as other biomolecules. An understanding of how structure influences coupling constants is still of paramount importance in interpreting the NMR data. During the past 10 years, a variety of systematic attempts have been made to develop databases of coupling constants, allowing the development of empirical rules for their interpretation in terms of molecular structure. At the same time, progress in the field of quantum chemistry has allowed accurate and reliable calculations of these coupling constants using biomolecular fragments containing dozens of atoms. In more propitious instances, a combination of empirical and theoretical studies can now provide valuable information. 
A recent example is the interpretation of spin–spin couplings involving hydroxyl and hydroxymethyl groups.

9.07.1.2 NMR Methodology

Carbohydrate NMR spectroscopy has developed rapidly during the last few years and has been reviewed several times.1–9 This review concentrates on the latest techniques and applications. The focus is on the measurement and application of coupling constants in the conformational analysis of flexible regions of carbohydrates, such as glycosidic linkages, exocyclic hydroxymethyl groups, and hydroxyl groups.

9.07.1.2.1 Assignment of resonances

The first step in any NMR study of carbohydrates is the assignment of proton and carbon resonances. This can be done on the basis of scalar and through-space connectivities, using 2D/3D homo- and heteronuclear correlation experiments.1,4–6,10 Traditionally, homonuclear 2D double quantum filtered correlation spectroscopy (DQF-COSY) and total correlation spectroscopy (TOCSY) spectra are valuable in the identification of resonances of individual monosaccharide units. In the presence of small couplings, through-space connectivities detected by NOESY/ROESY (nuclear Overhauser effect spectroscopy/rotating-frame Overhauser effect spectroscopy) experiments are also useful in completing the resonance assignment. When the 1H NMR spectra of complex oligosaccharides are too crowded to fully elucidate the structure by homonuclear correlation methods, it is efficient to use 2D heteronuclear correlation methods, such as heteronuclear single quantum correlation
(HSQC) or heteronuclear multiple quantum correlation (HMQC). The combined experiments such as 2D HSQC(HMQC)-TOCSY experiments are powerful tools for the assignment of the 13C and 1H resonances belonging to the same sugar residue providing enhanced dispersion of TOCSY correlations in the carbon dimension. More recently, different carbon multiplicity editing methods, for example, DEPT (distortionless enhanced polarization transfer)-HMQC and E-HSQC, have been developed to reduce the complexity of proton–carbon correlation spectra and to enhance the resolution by narrowing the applied spectral window.11 To obtain information about the glycosidic linkage, 1H–1H NOESY/ROESY and/or long-range 1H–13C correlated spectra, heteronuclear multiple bond correlation (HMBC) or CT (constant time)-HMBC,12 are recorded. The combined 2D HSQC(HMQC)-NOESY(ROESY) experiments could also be helpful, but have limited applications due to their low sensitivity in samples with natural abundance of 13C. Several homonuclear 3D NMR experiments, such as TOCSY–NOESY, ROESY–COSY, or TOCSY– COSY, have been employed to reduce spectral overlap, however, with limited success due to the generally poor proton chemical shift dispersion of complex oligosaccharides.3,5,10,13,14 The increased chemical shift dispersion of 13C resonances makes the 3D heteronuclear methods more attractive; however, such experiments on natural abundance samples require very long measurement times. Despite this, several heteronuclear 3D spectra such as HSQC-TOCSY, HMQC-NOESY, and HSQC-HMBC have been reported on natural abundance samples.15–17 With the availability of uniformly 13C-enriched carbohydrates, 13C-edited homonuclear 1H correlation experiments, such as 3D HSQC(HMQC)COSY(TOCSY, NOESY), are the techniques of choice that can overcome spectral overlap. Another strategy for carbohydrate NMR assignment of 13C-labeled samples relies on 1JCC (40 Hz)-mediated magnetization transfer. 
It has been used in the form of 3D HCCH-COSY (3D experiment correlating Ha, Ca, and Hb in an (–HaCa. . .CbHb–) segment), HCCH-TOCSY (3D experiment correlating Ha, Ca, and Hb in an (–HaCa. . .CbHb–) segment), and their constant time variants in the study of high-molecular-weight glycoconjugates.18 Many of these 3D homo- and heteronuclear experiments and their applications for structural characterization of carbohydrates have been recently reviewed, and therefore will not be described here in further detail.5 Selective and/or double-selective analogues of 2D and 3D homo- and heteronuclear experiments are particularly valuable for carbohydrates. Nonoverlapping resonances of anomeric protons (or anomeric carbons) offer a convenient starting point and are ideal for selective excitation with soft-shaped pulses. In addition, recently developed gradient-enhanced chemical-shift-selective filters enable selective excitation of overlapping resonances, which nevertheless differ in their chemical shift by a few hertz.19 This widens the applicability of 1D NMR techniques to the studies of carbohydrates. The resulting reduced dimensionality 1D or 2D spectra show good signal-to-noise and high digital resolution, facilitating the extraction of important information. An arsenal of these selective/double-selective 1D and 2D experiments has been implemented using pulsed field gradients, and their usefulness in carbohydrate structure elucidation has been demonstrated.19–25

9.07.1.2.2 Measurement of proton–proton coupling constants

The simple 1D proton spectrum of small- to medium-sized carbohydrates is often quite suitable for extracting the proton–proton coupling constants. In some circumstances, however, spectrum simulations become necessary to derive accurate values because of second-order effects.26 Signal overlap in the spectra of more complex carbohydrate molecules may hamper the analysis of 1D spectra and therefore also the measurement of coupling constants. Acquisition of pure phase proton multiplets through gradient-enhanced 1D TOCSY spectra via selective excitation of well-resolved (e.g., anomeric proton) resonances may allow the extraction of coupling constants.27–29 However, in the case of severe overlap of resonances, 2D proton–proton correlation spectra, such as DQF-COSY, TOCSY, and exclusive correlation spectroscopy (E.COSY), and/or 2D J-resolved spectra,30,31 recorded with sufficient data points to define the peaks properly, can give the values of such coupling constants. It is important to emphasize that caution should be taken in analyzing the peak separation of the resulting antiphase (DQF-COSY) or in-phase (TOCSY) multiplets, as the apparent splitting may not always correspond to the true value of the coupling constant. As a general rule, the gradient versions of NMR experiments provide enhanced sensitivity and fewer spectral artifacts, require shorter measurement time, and allow for better solvent suppression.


More recently, an intensity-based, quantitative J method has been reported, which enables the determination of all endocyclic 1H homonuclear couplings in natural abundance carbohydrates of any molecular size.32 The proposed 2D 13C-COSMO-HSQC (cosine modulated heteronuclear single quantum correlation) relies on cosine modulation of 1H magnetization with respect to all active homonuclear couplings as a preparation of HSQC. As a result, scalar couplings smaller than the natural line widths can be determined even in the presence of strong signal overlap.

9.07.1.2.3 Measurement of proton–carbon coupling constants

Several NMR pulse sequences have been reported for the accurate measurement of 1,nJC,H values of carbohydrates, including frequency/line-separation-based techniques or quantitative J spectroscopy33 using either carbon or proton detection. These techniques have developed rapidly over the years with the introduction of gradient spectroscopy and with the advent of improved selective excitation schemes. Such methodological advancements have been comprehensively reviewed recently.34 In this section we will concentrate mainly on those methods that are typically applied to carbohydrates. Heteronuclear coupling constants (1,nJC,H) are most commonly measured from heteronuclear 2D experiments. The 1JC,H couplings can be easily extracted from J-resolved spectra as well as from F1 or F2 proton coupled HSQC spectra. The undesired evolution of nJC,H during t1 can be eliminated with use of an appropriate bilinear rotation decoupling (BIRD) pulse, such as BIRDd,X in J-resolved spectroscopy35 and BIRDr in F1-coupled HSQC.36 Spin-state-selective excitation techniques, S3E and S3CT37,38 (spin-state-selective coherence transfer), can also be used for the measurement of 1JC,H. Long-range heteronuclear coupling constants within the same monosaccharide residue can be conveniently extracted from the E.COSY multiplets of the hetero (ω1) half-filtered TOCSY (HETLOC (pulse sequence for determination of heteronuclear long-range couplings))39–42 and also from the proton–carbon correlation-based HECADE43 spectra. The sensitivity-enhanced gradient versions of these experiments allow for an additional scaling in the F1 dimension to avoid accidental overlap of the E.COSY multiplets. The signal displacement of cross-peak doublets (resolved by 1JCH in F1) measured along the F2 dimension provides the corresponding 1JC,H or nJC,H heteronuclear coupling constants. 
In addition to the magnitude of J, the relative sign of long-range versus one-bond couplings can also be determined from the tilt of the E.COSY pattern. Alternatively, comparison of the widths of cross-peak multiplets in the sensitivity- and gradient-enhanced coupled/decoupled HSQC-TOCSY spectra yields the same coupling information, but requires two experiments to be acquired.44 More recently, spin-state-edited variants of HSQC-TOCSY experiment have been proposed to separate the components of E.COSY multiplets into two subspectra, allowing an easy and straightforward measurement of couplings even in the case of crowded spectra of complex oligosaccharides.45 Typically, the J values are measured from direct multiplet analysis, or in the case of complex multiplets, the use of a fitting procedure becomes necessary. In all of the above experiments, the presence of a small proton–proton coupling weakens the signal intensity radically due to inefficiency of the TOCSY transfer. Another inherent limitation of these TOCSY-based experiments is that neither long-range coupling constants between protons and carbons separated by a quaternary carbon or heteroatom, nor those of quaternary carbons can be obtained. Phase-sensitive HMBC46,47 as well as J-resolved HMBC and their variants34,48 can be used as complementary experiments, particularly suited for the measurement of conformationally dependent trans-glycosidic 3JCH coupling constants. Phase-sensitive HMBC experiment can be designed in such a way that the coupling constants are evaluated from the intensity of cross-peaks, while in the J-resolved HMBC spectra the crosspeaks are split by nJCH multiplied with a suitable scaling factor in the F1 dimension. 
A collection of spin-state-edited heteronuclear cross-polarization (HCP) experiments has been reported for the measurement of heteronuclear coupling constants of both protonated and nonprotonated carbons.49,50 Alternatively, 1D analogues of 2D HSQC, HSQC-TOCSY, and HETLOC experiments and a selective version of J-resolved spectroscopy using selective excitation and/or chemical shift filtering of proton or carbon resonances may be used for the measurement of coupling constants.49,51–59 Band-selective decoupling of some of the protons during acquisition leads to reduced multiplicity, and so facilitates the multiplet analysis.56 More recently, a gradient-enhanced heteronuclear single quantum multiple bond correlation (HSQMBC) experiment and its variants34,47,60–62 have been proposed for the measurement of heteronuclear coupling constants from the antiphase splitting of cross-peak multiplets. Unfortunately, proton couplings that generate additional in-phase splittings may complicate the analysis. A proper peak fitting is required for the accurate measurement of these heteronuclear coupling constants. The availability of 13C-labeled carbohydrates has allowed the use of 3D NMR experiments for the measurement of both proton–proton and proton–carbon coupling constants.10,63,64

9.07.1.2.4 Measurement of carbon–carbon coupling constants

13C-labeled carbohydrates have also been used to measure the carbon–carbon coupling constants in carbohydrates. If carbohydrates are labeled at only a few selected positions, 1D INADEQUATE (incredible natural abundance double quantum transfer experiment)65,66 or simple proton-decoupled 1D 13C spectra can provide an array of one-bond and long-range carbon–carbon coupling constants.67–70 Uniform 13C enrichment necessitates the use of more sophisticated NMR techniques, as illustrated by the 1H-detected long-range 13C–13C correlation71,72 or CT 13C–13C COSY experiments73 using carbohydrates. In addition, the signs of long-range 13C–13C coupling constants can be determined by a 13C–13C COSY-45 experiment.74 The increased sensitivity of cryoprobes will likely lead to more frequent measurement of carbon–carbon coupling constants of carbohydrates using samples with the natural abundance of 13C. The use of pulsed field gradients or sophisticated phase-cycling in combination with double-quantum filtration provides excellent suppression of the main 13C signal in either 1H- or 13C-detected experiments. Measurement of 1JCC coupling constants is straightforward; problems arise when the resolution of long-range 13C–13C doublets is compromised by the small sizes of coupling constants or fast spin–spin relaxation. Several approaches have recently been proposed and applied to carbohydrates with the aim of overcoming these limitations.75–78 These include the acquisition of in-phase, rather than antiphase doublets,75 acquisition of doubly J-modulated spectra,76 or the combined use of in-phase and antiphase doublets77,78 as a means of increasing the accuracy of the measured coupling constants.

9.07.1.3 Conformational Analysis of Carbohydrates in Solution

9.07.1.3.1 Conformational analysis of glycosidic linkages in carbohydrates

Until recently, conformational analysis of the exocyclic glycosidic linkages has been based largely on the observation of interresidue 1H–1H nuclear Overhauser effects (NOEs). The information obtained about linkage conformation using NOE data alone may not be sufficient due to the considerable mobility of most oligosaccharides. The difficulties arise from the limited number of available NOEs and their nontrivial interpretation due to the r⁻⁶ dependence on 1H–1H internuclear distances.3,79 A means of increasing the number of conformational constraints is to use NOEs involving hydroxyl protons. These interresidue NOEs are particularly useful in conformational analysis since they are very sensitive to conformational changes of the glycosidic linkage. However, due to rapid exchange with the protons of water, the use of hydroxyl protons for structural studies in aqueous solution is still an experimental challenge (see Section 9.07.1.3.3). To overcome this problem, the measurement of different types of experimental parameters that are sensitive to linkage conformation and are more easily interpreted in flexible systems is required. The use of scalar spin–spin coupling constants for both configurational and conformational analysis of rigid and flexible molecules is well established. The reader is referred to some of the most recent reviews on this topic.80–86 In oligosaccharides the two trans-glycosidic 3JHCOC spin–spin coupling constants provide a direct measure of the torsion angles phi (φ) and psi (ψ), respectively. The φ torsion angle is defined as O5′–C1′–On–Cn, where n is the linkage carbon number, while the ψ angle is related to C1′–On–Cn–C(n–1) (Scheme 1).

Scheme 1 Schematic diagram indicating the glycosidic torsion angles in carbohydrates.


Karplus-type correlation curves that relate these vicinal couplings with the glycosidic dihedral angles are available (Equation (1)).87–91

$$ {}^{3}J_{\mathrm{COCH}} = 5.7\cos^{2}\Theta - 0.6\cos\Theta + 0.5 \qquad (1) $$

The angles determined by this approach are not unique due to the ambiguous nature of the Karplus equation;92 as a consequence, a single 3J value may correspond to up to four different torsion angles. Even in a rigid structure, this ambiguity can only be resolved with the use of molecular modeling, where a single pair of φ/ψ angles can be identified from 16 possible combinations. In flexible systems, however, the measurement of additional interglycosidic coupling constants, which are sensitive to linkage conformation, is required in order to calculate the population of individual low-energy conformers. Serianni and coworkers demonstrated that the interglycosidic C–C scalar coupling constants (2JCOC and 3JCOCC) provide particularly valuable information on linkage conformation.68,90,93 The recent advances of heteronuclear NMR spectroscopy and a wider availability of 13C-labeled saccharides have drawn increasing attention to these experimental parameters. The interglycosidic vicinal 3JCOCC coupling constants show the expected Karplus dependence on the φ and ψ dihedral angles. A 3JCOCC Karplus-type equation (Equation (2)) has been deduced based on experimental and computational studies and parameterized using a wide range of conformationally restricted carbohydrates with particular 13C labels.68,70,89–91

$$ {}^{3}J_{\mathrm{COCC}} = 3.49\cos^{2}\Theta + 0.16 \qquad (2) $$
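The ambiguity of the Karplus relationship can be made concrete with a short numerical sketch. The Python below evaluates Equations (1) and (2) and inverts Equation (1) for a measured coupling; the function names and the example coupling of 3.0 Hz are illustrative assumptions and do not come from the cited studies.

```python
import math


def j_coch(theta_deg):
    """Equation (1): 3J(COCH) in Hz for a torsion angle given in degrees."""
    c = math.cos(math.radians(theta_deg))
    return 5.7 * c * c - 0.6 * c + 0.5


def j_cocc(theta_deg):
    """Equation (2): 3J(COCC) in Hz for a torsion angle given in degrees."""
    c = math.cos(math.radians(theta_deg))
    return 3.49 * c * c + 0.16


def torsions_from_j_coch(j_obs):
    """Return every torsion angle in (-180, 180] consistent with j_obs.

    Equation (1) is a quadratic in cos(theta); each root in [-1, 1]
    gives a +/- pair of angles, hence up to four solutions.
    Angles are rounded to 0.1 degree.
    """
    a, b, c = 5.7, -0.6, 0.5 - j_obs
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return []  # j_obs lies below the minimum of the curve
    angles = set()
    for sign in (1.0, -1.0):
        x = (-b + sign * math.sqrt(disc)) / (2.0 * a)
        if -1.0 <= x <= 1.0:
            t = round(math.degrees(math.acos(x)), 1)
            # 0 and 180 degrees have no distinct mirror image
            angles.update({t, -t} if 0.0 < t < 180.0 else {t})
    return sorted(angles)


if __name__ == "__main__":
    # Hypothetical measured coupling of 3.0 Hz
    print(torsions_from_j_coch(3.0))
```

For the hypothetical 3.0 Hz coupling the sketch returns four angles, roughly ±44° and ±128°, which is the fourfold degeneracy described above; applying the same inversion to both glycosidic torsions gives the 16 possible combinations.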

Nevertheless, it has recently become apparent94 that the two types of trans-glycosidic coupling pathways (CaOCC related to ψ and CCaOC related to φ, where Ca is the anomeric carbon) are not equivalent. Therefore they cannot be treated using a single, generalized Karplus equation. The effect of an internal electronegative substituent on the 3JCC value in the CCaOC pathway should be properly handled and taken into account in the quantitative analysis of linkage conformations. In addition, it is important to consider the effect of terminal electronegative substituents when they lie in the coupling plane and make a positive contribution to the observed coupling constant. Another study has shown that the magnitude and sign of the geminal 2JCOC coupling can also be correlated to the glycosidic linkage conformation.91,95,96 An approximate correlation, relating 2JCOC with the glycosidic angle has been derived using the ‘projection-resultant’ method by Serianni et al. It has been shown that 2JCOC depends on the COC glycosidic bond angle as well, with larger angles producing more negative values.93 Furthermore, 1JCH coupling constants involving the C–H pairs around the glycosidic linkage have been shown to offer information on the glycosidic dihedral angles.97,98 Other intraresidue couplings involving the anomeric carbon, such as 1JC1′,C2′ and 2JC1′,H2′, are also known to be sensitive to the φ angle.99,100 The strength of this new strategy for the analysis of linkage conformation in carbohydrates relies on the redundancy of experimental scalar coupling data. The approach includes the combined use of proton–carbon and carbon–carbon coupling constants, thus providing sufficient experimental data to deduce the glycosidic linkage conformation even for flexible oligosaccharides in which more than one conformation exists in solution. 
The calculation of statistical weights of individual conformers, based on all available scalar coupling constants, is relatively simple, implying linear averages over an ensemble of conformers. Several recent studies have proved the inherent advantages of this strategy, demonstrating that ambiguities of Karplus-type equations can be overcome with the use of multiple coupling data related to the same angle. Although the carbon–carbon coupling constants are mostly measured in 13C-enriched samples, recent progress in biochemical and chemical synthesis of suitably labeled samples has encouraged the use of this J coupling-based approach in conformational studies of flexible oligosaccharides. Moreover, the recent advances in NMR hardware (high field magnets, cryoprobe technology) and methodology make these couplings accessible even in natural abundance samples. Another recent study has proposed the use of C–H dipolar cross-correlated relaxation rates to resolve the ambiguity of 3JHCOC coupling data in deducing linkage conformation.101 The authors developed a new HMBC-type experiment, Γ-HMBC, in which four NMR parameters, including two interglycosidic 3JHCOC couplings and two CiHi,CiHj cross-correlated relaxation rates, are measured and then related to the corresponding torsion angles φ and ψ. The pulse sequence is shorter and therefore more sensitive than the conventional HMBC experiment, making it feasible to obtain data on natural abundance samples with conventional NMR probes.
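The linear ensemble averaging of scalar couplings can be sketched for the simplest case of two interconverting conformers. In the Python below, the two-state model, the torsion angles assigned to each conformer, and the function names are assumptions chosen only for illustration; the Karplus curves are those of Equations (1) and (2), and the "observed" couplings are synthetic.

```python
import math


def j_coch(theta_deg):
    """Equation (1): 3J(COCH) in Hz."""
    c = math.cos(math.radians(theta_deg))
    return 5.7 * c * c - 0.6 * c + 0.5


def j_cocc(theta_deg):
    """Equation (2): 3J(COCC) in Hz."""
    c = math.cos(math.radians(theta_deg))
    return 3.49 * c * c + 0.16


def ensemble_average(populations, per_conformer_j):
    """Population-weighted linear average of one coupling over conformers."""
    return sum(p * j for p, j in zip(populations, per_conformer_j))


def two_state_population(j_obs, j_a, j_b):
    """Least-squares population of conformer A from several couplings.

    Each observed coupling is modelled as p*J_A + (1 - p)*J_B; the
    closed-form least-squares p is clamped to the physical range [0, 1].
    """
    num = sum((a - b) * (o - b) for o, a, b in zip(j_obs, j_a, j_b))
    den = sum((a - b) ** 2 for a, b in zip(j_a, j_b))
    if den == 0.0:
        raise ValueError("couplings do not discriminate the two conformers")
    return min(1.0, max(0.0, num / den))


if __name__ == "__main__":
    # Hypothetical torsions (degrees) along the COCH and COCC pathways
    j_a = [j_coch(40.0), j_cocc(160.0)]   # conformer A
    j_b = [j_coch(170.0), j_cocc(-80.0)]  # conformer B
    # Synthetic "observed" couplings for a 70:30 mixture
    j_obs = [ensemble_average([0.7, 0.3], pair) for pair in zip(j_a, j_b)]
    print(two_state_population(j_obs, j_a, j_b))
```

Because each additional coupling adds one more linear equation in the same populations, redundant coupling data over-determine the problem, which is why several couplings related to the same angle help to overcome the Karplus ambiguity.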


9.07.1.3.2 Conformational analysis of hydroxymethyl groups in carbohydrates

The exocyclic hydroxymethyl (CH2OH) group in monosaccharides and in 1→6 glycosidic linkages of oligosaccharides presents another important conformational domain of carbohydrates. Rotation about the C5–C6 bond influences both the intra- and intermolecular hydrogen bonding characteristics as well as the dipole moment of the molecules. Therefore, the study of the hydroxymethyl group conformation is essential in the determination of the 3D structures of carbohydrates. Several recent experimental and theoretical studies have focused on the conformational properties of unsubstituted CH2OH groups in order to identify the factors that influence their rotamer distribution and to derive new Karplus equations for the interpretation of the measured scalar coupling constants. The conformation of the CH2OH group about the exocyclic C5–C6 bond can be described by the torsional angle ω (O5–C5–C6–O6), but it is more usual to define it by means of the populations of the three staggered rotamers, gauche–gauche (gg), gauche–trans (gt), and trans–gauche (tg) (see Scheme 2). The first letter describes the torsional relationship between O6 and O5, while the second indicates that between O6 and C4. 
Most studies use this three-state staggered rotamer model to analyze the coupling constants and thus to deduce the rotamer distribution, but other treatments of J couplings have also been described.102–104 Traditional analysis of hydroxymethyl group conformation has relied on vicinal proton–proton scalar couplings, where the rotamer populations are calculated from the measured JH5,H6R and JH5,H6S coupling constants.105,106 The stereochemical assignment of the 1H NMR signals of the prochiral protons at C6, H6R and H6S, usually established on the basis of their chemical shifts and vicinal proton–proton coupling constants, is therefore of major importance for this kind of study.107,108 Although this approach is frequently used, inappropriate limiting values for the gauche and trans couplings derived from different types of Karplus equations lead to unrealistic negative populations of tg rotamers. Serianni and coworkers109 have recently proposed new limiting values, which provide a more accurate description of the rotamer populations and, in contrast to earlier Karplus equations, generate positive populations for the tg rotamer. This approach was applied to several mono- and disaccharides110–112 and the dependence of the hydroxymethyl rotamer population on nonbonded interactions, stereoelectronic effects, and/or hydrogen bonds was systematically analyzed. Theoretical and experimental studies have demonstrated that the geminal proton–proton coupling (2JH6R,H6S) is influenced by both the ω and θ (C5–C6–O6–H6) dihedral angles. The latter dependence is particularly useful in the conformational analysis of the 1→6 glycosidic linkage, providing complementary data to other J coupling correlations.109
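The three-state rotamer analysis amounts to solving three linear equations: one for each measured vicinal coupling plus the normalization of the populations. The Python sketch below illustrates this; the limiting coupling values in `PLACEHOLDER_LIMITS` are round numbers invented for the example and are not the published limiting values of ref. 109 or of any Karplus equation.

```python
# Limiting 3J(H5,H6R) and 3J(H5,H6S) values in Hz for the pure gg, gt, and tg
# rotamers, in that order. PLACEHOLDER numbers for illustration only.
PLACEHOLDER_LIMITS = {
    "H6R": (2.0, 10.0, 4.0),
    "H6S": (2.0, 3.0, 10.0),
}


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def rotamer_populations(j_h5_h6r, j_h5_h6s, limits):
    """Solve for (gg, gt, tg) populations by Cramer's rule.

    Equations: p_gg + p_gt + p_tg = 1, and each observed coupling equals
    the population-weighted sum of its limiting values. The result is not
    clamped, so unsuitable limiting values show up as negative populations.
    """
    rows = [[1.0, 1.0, 1.0], list(limits["H6R"]), list(limits["H6S"])]
    rhs = [1.0, j_h5_h6r, j_h5_h6s]
    d = det3(rows)
    if abs(d) < 1e-12:
        raise ValueError("limiting values give a singular system")
    pops = []
    for col in range(3):
        m = [row[:] for row in rows]
        for r in range(3):
            m[r][col] = rhs[r]  # replace one column with the observations
        pops.append(det3(m) / d)
    return dict(zip(("gg", "gt", "tg"), pops))


if __name__ == "__main__":
    # Hypothetical measured couplings in Hz
    print(rotamer_populations(5.4, 3.2, PLACEHOLDER_LIMITS))
    print(rotamer_populations(6.0, 2.2, PLACEHOLDER_LIMITS))
```

With these placeholder limits, the second pair of hypothetical couplings yields a small negative tg population, the kind of unphysical result attributed above to inappropriate limiting values.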

Scheme 2 Schematic representation of a hexopyranose (a) and a 1→6-linked disaccharide (b) showing the ω torsion angle, and schematic diagram of the gt, tg, and gg staggered conformers around the C5–C6 bond.


Vicinal proton–carbon coupling constants (3JH6R,C4 and 3JH6S,C4) also provide valuable information about the hydroxymethyl conformation. Theoretical calculations were applied to establish Karplus-type equations that correlate these couplings with the ω torsion angle, and these equations were then used to estimate C5–C6 rotamer populations.88,113,114 The populations obtained from these couplings, however, differed significantly from those deduced from the vicinal proton–proton couplings and from the results of theoretical calculations. This finding may be partly due to the fact that 3JCH is less sensitive to conformational changes. A useful application of 3JCH is to assist with the stereochemical signal assignment of the diastereotopic C6 methylene protons.104,113,114 The introduction of 13C-labeling into saccharides has offered a unique opportunity for the measurement of different proton–carbon and carbon–carbon coupling constants, extending the arsenal of experimental parameters for the conformational analysis of hydroxymethyl fragments. These one-, two-, and three-bond couplings involving C5 and C6 and their attached protons (18 couplings in total) (Scheme 2) are known to depend on the C5–C6 torsion angle, thus providing complementary conformational constraints. In addition, some of them are also influenced by the C6–O6 torsion angle (θ).109 Based on theoretical and experimental methods, a set of new Karplus equations has been developed to correlate the magnitudes and signs of these couplings with the torsion angles. J couplings that depend on more than one structural parameter are remarkably useful for probing specific conformational features. For example, two-bond proton–carbon coupling constants (2JH6R,C5 and 2JH6S,C5) were shown to be sensitive to both ω and θ. Therefore, together with 2JH6R,H6S, correlated conformations about both torsion angles could be studied.115,116 This approach can be particularly valuable in evaluating linkage conformations of biologically relevant 1→6-linked oligosaccharides. The one-bond 13C–13C coupling constant 1JC5,C6 is also influenced by both ω and θ.115 Likewise, second-order dependence was reported for three-bond 13C–13C coupling constants in aldohexopyranosyl rings.69,115 These recent studies have demonstrated that the collective use of these proton- (JHH) and carbon-based (JHC, JCC) J couplings leads to a more detailed understanding of the conformation and mobility of the hydroxymethyl group, whether free or involved in a glycosidic linkage.

9.07.1.3.3

Conformational analysis of hydroxyl groups in carbohydrates

Intramolecular hydrogen bonds in crystals of carbohydrates are well documented.117 Hydroxyl groups positioned on the surface of saccharides serve as important recognition sites in saccharide–protein interactions and also mediate the interactions with solvent molecules. More recently, the use of hydroxyl protons in NMR studies of the conformation, structure, and interactions of carbohydrates in solution has gained increasing importance. High-resolution NMR offers several different ways to deduce the conformation of hydroxyl groups and to seek evidence for the existence of intra- and/or intermolecular hydrogen bonds.118,119 The most important NMR parameters obtained for the hydroxyl protons are chemical shifts (δ), vicinal proton–proton coupling constants (3JHC,OH), temperature coefficients (Δδ/ΔT), deuterium-induced differential isotope shifts, and exchange rates (kex).119–123 These parameters may provide information on hydrogen bond interactions and hydration as well. Moreover, NOEs and chemical exchange involving hydroxyl groups, observed in NOESY and ROESY experiments, add to the number of distance restraints used in conformational analysis. Until a few years ago, hydroxyl proton resonances were detected in aprotic solvents, such as dimethyl sulfoxide (DMSO) or CDCl3, in order to eliminate the problem of fast exchange with the protons of the solvent. However, under these conditions, the influence of the organic solvent on the conformational equilibrium must also be considered. For example, DMSO is known to enhance the ability of hydroxyl groups to participate in intramolecular hydrogen bonding. It has been shown that H-bonds observed in DMSO do not persist significantly in water and are competed out by intermolecular H-bonds involving water molecules.
However, strong and persistent H-bonds have been reported to exist for simple and complex oligosaccharides in aqueous solution.124–126 Unfortunately, under normal conditions in aqueous solutions of carbohydrates, the hydroxyl protons are in fast chemical exchange with water, which severely limits the utility of these protons as conformational probes in NMR studies. Recently, several groups have had success in detecting these hydroxyl protons by using binary mixtures of water and different organic solvents (such as water/acetone-d6, water/methanol-d4, and water/DMSO-d6).127,128 By lowering the temperature, or using dilute aqueous solutions under supercooled conditions,129 the chemical exchange effects are also reduced. However, for all NMR experiments in water careful sample preparation is required, involving the fine adjustment of pH to 6–7, the removal of traces of metal ion impurities to avoid undesired exchange with water protons, and the application of gradient-based pulse schemes for efficient suppression of the water signal. It has been observed that hydrogen bonding causes a downfield change in the chemical shifts of hydroxyl protons and that the magnitude of deshielding depends on the strength of such H-bonds.119,125,130 Since the chemical shifts are influenced by several factors that are difficult to predict, their use alone as an indicator of H-bonding is not recommended. Solvent accessibility, and thus exchange rates, could be reduced for hydrogen-bonded hydroxyl groups. However, as the exchange rate is also very sensitive to pH, temperature, solvent composition, and metal ion impurities, this parameter also leaves ambiguity in the probing of hydrogen bonds. A small temperature coefficient of <5 ppb K−1 has been commonly accepted as an indicator of reduced interaction with the solvent due to participation in intramolecular H-bonds. Further evidence may come from nuclear Overhauser enhancement (NOESY/ROESY) and exchange measurements to determine whether or not a particular H-bond exists in solution. It has been reported that exchange peaks detected between hydroxyl protons can be an indicator of weak, transient hydrogen bonds in aqueous solution.118,119 Several studies have demonstrated that more direct and reliable evidence for H-bonds comes from scalar spin–spin couplings of hydroxyl protons. Until recent years, proton–proton vicinal couplings involving hydroxyl protons were mostly measured and used for deducing OH conformation in solution.
This is partly due to the sensitivity problem related to the measurement of heteronuclear coupling constants and, additionally, to the lack of an appropriate Karplus equation relating the heteronuclear vicinal coupling constant 3JC,OH to the C–O torsion angle. According to the Karplus equation derived for the vicinal proton–proton coupling constant, 3JH,OH,131–135 a coupling of approximately 5.5 Hz indicates free rotation of the hydroxyl group around the C–O bond. Deviation from this rotationally averaged value may be a sign of participation in an H-bond. A large 3JH,OH of approximately 8–10 Hz indicates a preferred trans/anti orientation with respect to the ring C–H, while a small 3JH,OH of approximately 2–3 Hz indicates a preferred syn conformation. However, since only one 3JH,OH value is available for each OH group, a more accurate description of OH conformation is not feasible. Calculation of rotamer populations about the C–O bond (Scheme 3) requires additional NMR parameters involving heteronuclei. With the advent of cryoprobe technology and the design of new, gradient- and sensitivity-enhanced pulse sequences, the measurement of heteronuclear coupling constants of hydroxyl protons has become more feasible. As a result, the past several years have witnessed a significant effort to extend the use and interpretation of 3JH,OH and 3JC,OH coupling constants. In particular, their combined use is potentially a useful approach for investigating OH conformation and calculating the rotamer distribution around the C–O bond. Unfortunately, until recently, only a few studies have reported the application of 3JC,OH in carbohydrates.126,136,137 A proper Karplus equation describing the relationship between 3JC,OH and the C–O torsion angle was not yet available.
In a recent work,138 we attempted to parameterize the Karplus dependence of 3JC,OH couplings, taking advantage of the redundant set of J couplings measured for the two anomeric forms of a simple monosaccharide, 4,6-O-benzylidene-1-methoxy-D-glucose (Figure 1).

Scheme 3 Definition of hydroxyl rotamers about the C3–O3 bond: g– (Θ = –60°), g+ (Θ = 60°), and anti (Θ = 180°). The Θ torsion angle is defined by the corresponding Hi–Ci–Oi–Hi angle.


Figure 1 Schematic diagram of the α- and β-anomers of 4,6-O-benzylidene-1-methoxy-D-glucose (1, 2) and of the 4,6-O-methylidene-1-methoxy-D-glucose model compounds (3, 4).

A complete set of vicinal proton–proton and proton–carbon J couplings involving four OH protons (i.e., three couplings for each OH, altogether 12 pieces of experimental data) was measured for the α- and β-anomers in CDCl3 using natural abundance samples. The data were simultaneously analyzed using a global fit approach to yield the OH rotamer populations and to derive a Karplus equation for the 3JC,OH coupling. In this iterative procedure, the rotamer populations of OH groups (i.e., eight populations in the two molecules) were adjusted together with the three Karplus parameters describing the angular dependence of 3JC,OH couplings (i.e., 11 variables in all) to obtain the best fit between the experimental and calculated coupling constants. The Karplus equation deduced from this fitting procedure is as follows:

\[ {}^{3}J_{\mathrm{C,OH}} = 5.4939\cos^{2}(\Theta) - 0.5853\cos(\Theta) + 0.1023 \tag{3} \]
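As a hedged illustration of how Equation (3) might be used, the following Python sketch evaluates the fitted Karplus curve and forms a population-weighted average over three staggered rotamers. The idealized angles (−60°, 60°, 180°) follow the rotamer labels of Scheme 3. The function names and the example populations are hypothetical and are not taken from the original study. The angle passed in is assumed to be whichever torsion Equation (3) was parameterized against.

```python
import math

# Coefficients of Equation (3) as quoted in the text (values in Hz).
A, B, C = 5.4939, -0.5853, 0.1023

# Idealized staggered rotamers, following the labels of Scheme 3 (assumption).
ROTAMER_ANGLES = {"g-": -60.0, "g+": 60.0, "anti": 180.0}


def karplus_3j_c_oh(theta_deg):
    """Evaluate Equation (3) for a torsion angle given in degrees."""
    cos_t = math.cos(math.radians(theta_deg))
    return A * cos_t ** 2 + B * cos_t + C


def averaged_coupling(populations):
    """Population-weighted 3J(C,OH) over the idealized rotamers.

    `populations` maps rotamer names to fractions that must sum to 1.
    """
    total = sum(populations.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("rotamer populations must sum to 1")
    return sum(p * karplus_3j_c_oh(ROTAMER_ANGLES[name])
               for name, p in populations.items())


# Hypothetical populations, for illustration only.
example = {"g-": 0.2, "g+": 0.5, "anti": 0.3}
print(round(averaged_coupling(example), 3))
```

In a global fit such as the one described above, the populations and the three coefficients would be adjusted together; this sketch only covers the forward calculation from populations to an averaged coupling.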

It is important to note that deriving a Karplus equation that accounts for the effects of electronegative substituents would require a significantly larger amount of experimental data. Independently, density functional theory (DFT) calculations have been carried out on both anomers of the model compound 4,6-O-methylidene-1-methoxy-D-glucose (Figure 1) to investigate the conformational properties of the hydroxyl groups using a standard basis set (B3LYP/6-311++G(d,p)) as well as functions for the accurate description of the Fermi contact contribution as implemented in Gaussian 03 rev/D.139 The calculated populations of the lowest energy conformers and the calculated conformationally averaged coupling constants were in good agreement with the corresponding NMR data. It was found that in all low-energy conformers ‘dual-type’ hydrogen bonds (Figure 2) stabilize the overall structure. With the increasing availability of 13C-enriched saccharides, additional 13C–1H and 13C–13C coupling constants have provided valuable information to confirm and/or extend structural conclusions based on the traditional proton-based NMR data. In a recent study, a correlation has been established that relates the values of 1JCH in –CHOH– groups to the strength of H-bonds involving the OH hydrogen.140 It was also shown that the C–O conformation in solution

Figure 2 Three-dimensional representation of the (g–, g+) rotamer of the α-anomer and of the (anti, g+) rotamer of the β-anomer of 4,6-O-methylidene-1-methoxy-D-glucose. Both low-energy conformers contain the 3→2 and 2→1 type dual hydrogen bond network stabilizing the overall system.

can be evaluated indirectly through complementary coupling constants, such as 1JCC99 or 2JC,OH. Based on DFT, new Karplus equations have been derived for 3JH,OH and 3JC,OH couplings, which also account for the nature and orientation of internal and terminal electronegative substituents.126 It can be expected that combining J couplings displaying direct or secondary (indirect) dependence on the C–O torsion angle may provide a more detailed picture of the conformational behavior of hydroxyl groups, which has important implications for the chemical and biological reactivity of saccharides.

9.07.1.4

Conformational Analysis of Carbohydrates in Dilute Liquid Crystalline Media

Liquid crystal NMR spectroscopy is a well-established method for obtaining accurate geometries of small and rigid molecules.141–144 Although this method has been applied during the past three decades to numerous molecules, the route from dipolar couplings to molecular structures is not an easy one. The main complication is that the solutes in liquid crystals normally exhibit complex, second-order spectra. The limitation of this method is that beyond 10 interacting spins spectra acquired in strong liquid crystals usually become too complicated to be analyzed properly.145 The introduction of dilute liquid crystalline media in the past decade has brought the possibility of imposing very low order on the solute molecules, resulting in significant reduction of dipolar coupling constants, referred to as residual dipolar couplings (RDCs). Preserving the near first-order character of spectra in such media greatly facilitates the extraction of RDCs, which have become a rich source of structural information for large biomolecules.146 In this chapter we focus on the use of RDCs in the structure elucidation of free carbohydrates. For information on the application of RDCs in the analysis of protein–carbohydrate complexes, see Section 9.07.2. A variety of weak liquid crystalline media have been used to align carbohydrates. Examples include Pf1 phage,147–150 filamentous bacteriophage fd,151 C12E5/n-hexanol,77,152–154 C8E5/n-octanol,155 cetylpyridinium bromide/n-hexanol/NaCl,73,156–158 DMPC/DHPC (dimyristoyl-phosphatidylcholine/dihexanoylphosphatidylcholine) bicelles,149,159–164 and mineral liquid crystals such as aqueous suspension of V2O5 or a lamellar phase composed of covalent rigid planes of H3Sb3P2O14 dispersed in water.165 Several questions arise when making the transition from the strong to the weak alignment: (1) can RDCs provide accurate structures of small molecules comparable to those obtained in strong liquid crystals? 
(2) are the existing methods for the measurement of scalar coupling constants suitable for accurate measurement of RDCs? and (3) perhaps most importantly, given the opportunity to study larger systems, can we handle the inevitable flexibility intrinsic to larger molecules and still obtain meaningful information about their conformations? In this chapter we address the above questions focusing on carbohydrates, noting that parallel efforts are underway in the studies of small organic molecules.166–168 We begin by observing that the utilization of RDCs in the analysis of small molecules, including oligosaccharides, is still ‘under construction’ and that there are no definite answers, in particular to questions (2) and (3). Because of this, rather than providing a detailed description of one or two approaches, we present here a brief survey of the current state of the field, which will hopefully orient readers in their work with the primary literature.

9.07.1.4.1

Accurate structures of small, rigid molecules from RDCs

A solution structure of a simple methylated monosaccharide, methyl β-D-xylopyranoside (I), in a weakly aligned medium was recently elucidated based on 30 RDCs (15 DHH, 4 1DCH, and 11 nDCH).153 These were measured using intensity-based methods, yielding absolute values of less than 6.5 Hz with an estimated precision of 0.02 Hz. The structure of I was refined using vibrationally corrected RDCs against a model in which the distances between the directly bonded atoms were fixed at their ab initio values, while eight bond angles, eight dihedral angles, and five order parameters were optimized. The refined structure of I is very similar to that obtained by ab initio calculations, with 11 bond and dihedral angles differing by 0.8° or less and the remaining five differing by up to 3.3°. Comparison with the neutron diffraction structure showed larger and more numerous differences, which were attributed to crystal packing effects.

This study has shown that, provided a large number of RDCs can be obtained with sufficient accuracy, there is no reason in principle why small RDCs cannot yield very accurate structures of small, rigid molecules at the level of those obtained in strong liquid crystals. A few aspects of this work deserve comment here.

1. The largest potential source of error in the accurate determination of RDCs is higher order effects. These are prevalent in carbohydrates. Fortunately, as the RDCs are determined as the difference between the splittings measured in the aligned and isotropic phases, the errors in the obtained RDCs are reduced substantially. Although they affect the absolute values of the splittings in either sample, the strong coupling effects are to some extent reduced in the readout of the RDCs.151 However, this is only the case when the degree of higher order is similar in the two systems, which implies that the weakest alignment possible should be used. On the other hand, the induced RDCs should be large enough to allow accurate measurement of as many RDCs as required for a particular study, which usually includes those between more distant nuclei. It is imperative that identical methods are used for both isotropic and aligned samples, since this is where the compensation for the higher order effects is most successful.
2. Vibrational corrections have been routinely used to correct RDCs measured in strong liquid crystals,169 while this practice had not been taken up thus far in weakly aligned systems. This work has shown the largest vibrational corrections (8%) for 1DCH coupling constants, that is, above the uncertainty of the measurement in this particular instance.
3. The dihedral angles of the neutron diffraction structure differed by less than 5° from those of the refined solution structure of I, yet some back-calculated RDCs based on the neutron diffraction structure deviated by up to 0.8 Hz from the experimental values. This illustrates the exquisite sensitivity of RDCs to the molecular geometry and raises a question: what kind of structures should be used in the interpretation of RDCs? It is the experience of several groups that ab initio structures generally yield better agreement between the experimental and back-calculated RDCs than those generated by force fields. It is therefore likely that the use of vibrational corrections when analyzing force field generated structures is superfluous, which is the main reason why they have been neglected thus far. One should also be aware that there are small but genuine differences between the solid- and liquid-state structures, which can be identified by RDCs.

9.07.1.4.2

Measurement of RDCs

Presented below is a brief outline of the techniques used for the measurement of various types of RDCs in carbohydrates. These methods can be divided into two categories: frequency- and intensity-based. Frequency-based methods determine the splittings from the frequency difference between spectral lines, while intensity-based methods require a series of spectra to be acquired, each differing in the length of a crucial delay or delays in the pulse sequence. The splitting is then obtained by fitting the intensity of spectral lines to a known function. The latter methods are usually more time consuming, but they also work on unresolved multiplets. This, together with their higher precision, which is conditional on sufficient signal-to-noise ratios, is their main advantage. The most readily measurable RDCs are those between the directly bonded protons and carbons. However, the orientations of one-bond CH vectors in monosaccharide rings are degenerate to a large extent (e.g., all axial CH vectors in hexopyranoses point approximately in the same direction). Some methods of interpretation of RDCs require sampling of at least five unique orientations, which must then be provided by additional types of RDCs, such as DHH, nDCH, and DCC.


Owing to a fourfold degeneracy in the calculation of RDCs from the splittings measured in the isotropic and aligned samples (RDC = |J + D| − |J|), the signs of RDCs are not always obvious. There are, however, several exceptions, most notably the one-bond 1H–13C RDCs, owing to the large positive value of the corresponding scalar couplings. By the same argument, large positive 3JHH and large negative 2JHH coupling constants also usually allow unambiguous determination of the signs of the corresponding RDCs. The E.COSY-based techniques can, albeit mostly for the intraring RDCs, yield the signs of DHH and nDCH couplings. In the absence of experimental data, the signs of RDCs can be inferred by considering all possibilities (four when J ≠ 0 or two when J ≈ 0) and using the fit between the experimental and theoretical RDCs as the criterion.155 As this approach is structure dependent, caution should be exercised and large numbers of couplings need to be used so as not to bias the analysis.
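The sign ambiguity described above can be made concrete with a short Python sketch. It assumes that only the magnitudes of the isotropic splitting, |J|, and of the aligned splitting, |J + D|, are observed, and it enumerates every (J, D) pair consistent with them. The function names and the `zero_tol` threshold for treating J as zero are hypothetical choices for illustration and are not taken from the cited work.

```python
def candidate_rdcs(aligned_splitting, isotropic_splitting, zero_tol=0.05):
    """List the (J, D) pairs, in Hz, consistent with |J + D| and |J|.

    Four pairs result when J differs from zero and two when J is
    (approximately) zero, as stated in the text.
    """
    t = abs(aligned_splitting)
    j = abs(isotropic_splitting)
    j_values = [0.0] if j <= zero_tol else [j, -j]
    pairs = set()
    for j_signed in j_values:
        for t_signed in (t, -t):
            # D follows from the signed total splitting T = J + D.
            pairs.add((j_signed, t_signed - j_signed))
    return sorted(pairs)


def rdc_for_dominant_j(aligned_splitting, isotropic_splitting, j_sign=1):
    """RDC when |D| < |J| and the sign of J is known (e.g. one-bond 1JCH > 0).

    J + D then keeps the sign of J, so D = sign(J) * (|J + D| - |J|).
    """
    return j_sign * (abs(aligned_splitting) - abs(isotropic_splitting))


# Hypothetical splittings in Hz.
for j_value, d_value in candidate_rdcs(9.5, 7.0):
    print(f"J = {j_value:+.1f} Hz, D = {d_value:+.1f} Hz")
```

Each candidate set would then be back-calculated against a structural model, which is why the text cautions that the choice is structure dependent.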

9.07.1.4.2(i)

1H–1H residual dipolar coupling constants

Frequency-based methods. 1H–1H E.COSY spectra have been used to determine the sizes and, in some cases, also the signs of 1H–1H RDCs. Unfortunately, this does not include the interring dipolar couplings; the corresponding cross-peaks do not show the E.COSY pattern.170 1H–1H DQF-COSY in combination with J doubling was used to extract the RDCs from the spectra of a highly deuterated dodecasaccharide.157 COSY spectra, analyzed using the ACME procedure,171 were used to measure RDCs in aligned carbohydrates.155,161,164 E.COSY-style multiplets involving 13C nuclei at natural abundance provide 1H–1H splittings from F2 displacements of the two parts of the multiplet by sampling either 1H41 or 13C172 frequencies in F1. The latter approach benefits from the larger dispersion of 13C chemical shifts. In addition, both methods provide at the same time the one-bond 1H–13C splittings in F1.152,156 Alternatively, separation of the α/β states into two spectra in S3-CT-TOCSY147,173 can be used to reduce the spectral overlap. Geminal 1H–1H couplings of 1→6 linkages were measured using the COS3 pulse sequence148,174 or J-modulated HMQC incorporating a BIRD pulse.175 Signed COSY experiments176 with TOCSY or NOESY mixing times were used by Landersjö164 to determine the signs of some interring RDCs.

Intensity-based methods. 1H–1H RDCs have been obtained by analyzing the intensity ratios of the diagonal and cross-peaks in a series of 2D CT COSY spectra.177 This method can only be applied to resolved resonances, for example, those of anomeric protons.149 Similar limitations apply to J-modulated 1D directed COSY,153,178 which uses selective 180° pulses to produce a series of 1D spectra for each pair of coupled spins.
This approach has recently been extended to include additional selection blocks, yielding a versatile method for the measurement of coupling constants in compounds with severely overlapping proton resonances such as those found in carbohydrates.154 The problem of overlapping resonances can also be resolved by involving 13C nuclei, as demonstrated on natural abundance (13C COSMO HSQC)32 or uniformly 13C isotopically enriched carbohydrates (2D-HSQC-(sel C, sel H)-CT COSY experiment).73,158

9.07.1.4.2(ii)

One-bond 1H–13C residual dipolar coupling constants

Frequency-based methods. One-bond 1H–13C splittings can in principle be measured from 1H–13C HSQC-type spectra, where the decoupling is removed either in the F2150,165,179 or F1149,156,157,159 domain. Variable156,157,159 or constant-time sampling149,158,161 can be employed when sampling in F1. When strong coupling is present, the line shapes of the α and β spin-doublet components are different in F2, which makes the determination of the splitting difficult. To minimize the errors, Almond et al.150 overlaid resonances in the spectra acquired from the aligned and isotropic samples (which are similar except for line broadening), and then measured the distance between the two resonances. This distance is equal to 1DCH but does not require the distance between the α and β spin resonances to be measured. The readout of line frequencies is simpler in F1, but potentially less precise due to limited digital resolution. The digital resolution can be increased if long-range interactions, either proton–carbon or proton–proton, are removed by the action of a BIRD pulse.36,175,180 If required, overlap reduction can be achieved by separating the α/β states into two spectra as demonstrated in S3-CT-HSQC37,147,148 or SPITZE (spin state selective zero overlap)-HSQC.170

Intensity-based methods. A 2D 1JCH-modulated 13C–1H CT-HSQC181 has been used to measure the proton–carbon RDCs in carbohydrates.155,164,182 This method samples the CH splitting using carbon magnetization and a narrow range of evolution intervals (25–30 ms) in order to avoid the interference of long-range proton–carbon splittings. A further increase in the precision of the measured splittings was achieved by using longer evolution intervals (170 ms).151,153 In this method, the interference of 1H–1H or long-range 1H–13C couplings was removed by the application of BIRD pulses.180,183 BIRD pulses work best on weakly aligned samples (1DCH < 10 Hz), where the interference of proton–proton RDCs and the variations in the one-bond splitting are minimal. These highly accurate methods were recently used to measure 1DCH RDCs as a function of the carbohydrate concentration. The observed changes provided evidence for the existence of calcium-mediated interactions between Lewis X-related trisaccharides.184

9.07.1.4.2(iii)

Long-range 1H–13C residual dipolar coupling constants

Frequency-based methods. 2DCH coupling constants have been measured with an E.COSY-type experiment incorporating an S3CT pulse element and producing HSQC-like spectra.149 Homonuclear correlation spectra in the form of ω1 13C-filtered DQF-COSY can also provide both one-bond and two-bond proton–carbon splittings.153 In addition, both methods also yield the signs of 2DCH coupling constants.

Intensity-based methods. Long-range quantitative J spectra have been used to measure 2DCH coupling constants of sucrose.161,163 J-modulated constant-time HMBC experiments153 were used to measure the long-range proton–carbon RDCs of methyl β-D-xylopyranoside. The efficiency of the latter experiments, particularly for aligned samples, is improved dramatically when the proton of interest can be selectively inverted, allowing refocusing of proton–proton and proton–carbon splittings.153

9.07.1.4.2(iv)

One-bond and long-range 13C–13C residual dipolar coupling constants

Frequency-based methods. One-bond 13C–13C RDCs of uniformly 13C isotopically enriched sucrose were measured from 1D spectra simplified through the use of selective 13C decoupling.161 The increased sensitivity of cryoprobes has allowed the measurement of 13C–13C RDCs at the natural abundance of 13C using tens rather than hundreds of milligrams of compound. A recent comparison of 1H- and 13C-detection in INADEQUATE experiments78 showed that 13C detection is a viable alternative, and likely a method of choice for aligned samples.77 Simultaneous determination of one-bond and long-range 13C–13C RDCs of methyl β-D-xylopyranoside by 13C-detected IPAP (in-phase antiphase)-INADEQUATE illustrated this point.77

Intensity-based methods. Intensity-based methods detecting 13C–13C pairs are practical only for 13C isotopically enriched oligosaccharides. 13C–13C CT-COSY was used to measure one-bond 13C–13C RDCs in uniformly 13C isotopically enriched lactose.73

9.07.1.4.3

Interpretation of RDCs

Over the past few years, a number of approaches have emerged for the interpretation of RDCs in terms of carbohydrate structure. A key feature of all these methods is that they require the alignment tensor to be determined. Order matrix analysis uses experimental RDCs, while in other methods molecular properties, such as molecular shape or mass distribution, provide the alignment tensor a priori without the need for the experimental RDCs. In all methods, the interpretation of RDCs in terms of structure comes down to describing the orientations of the internuclear vectors along which the dipolar couplings are measured. This is usually done in a special Cartesian molecular frame, referred to as the principal order (or alignment) frame (PAF).185 In this frame of reference the order matrix (or the alignment tensor) is diagonal, characterized by three order parameters, Sx′x′, Sy′y′, Sz′z′ (or the axial and rhombic components of the alignment tensor), and three Euler angles, which define the orientation of the PAF in the initial molecular frame. From this short description it is obvious why RDCs are referred to as a long-range sensor: it is irrelevant how far apart the two internuclear vectors are. By establishing their orientation relative to a common molecular axis, their mutual orientation can be determined. In the next section individual approaches for the interpretation of RDCs are briefly outlined.

9.07.1.4.3(i)

Order matrix analysis

The order matrix describes the residual orientation of the molecule and the strength of the alignment. Diagonalization of the symmetric order matrix yields the parameters described above: three angles (orientation) and three order parameters (strength). Owing to the fact that an order matrix is traceless (ΣSii = 0),141 a minimum of five, instead of six, unique RDCs are required to calculate it. Singular value decomposition is used to find the best least-squares solution for the order parameters.185 The three order parameters can easily be related to two parameters of the alignment tensor, taking into account the fact that the order matrix is traceless.
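The order matrix analysis outlined above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited program: each RDC is assumed to have been divided by its own maximum dipolar coupling beforehand (called a "reduced" RDC here), so that it equals the quadratic form of the internuclear unit vector with the order matrix. All function and variable names are hypothetical. The traceless condition is imposed by eliminating Sxx, leaving five unknowns, and the least-squares step uses `numpy.linalg.lstsq`, which is SVD-based.

```python
import numpy as np


def design_matrix(unit_vectors):
    """Rows map (Syy, Szz, Sxy, Sxz, Syz) to reduced RDCs; Sxx = -(Syy + Szz)."""
    v = np.asarray(unit_vectors, dtype=float)
    x, y, z = v[:, 0], v[:, 1], v[:, 2]
    return np.column_stack(
        [y * y - x * x, z * z - x * x, 2 * x * y, 2 * x * z, 2 * y * z]
    )


def fit_order_matrix(unit_vectors, reduced_rdcs):
    """Least-squares order matrix from at least five independent RDCs."""
    a = design_matrix(unit_vectors)
    d = np.asarray(reduced_rdcs, dtype=float)
    solution, _, rank, _ = np.linalg.lstsq(a, d, rcond=None)
    if rank < 5:
        raise ValueError("need at least five independent vector orientations")
    syy, szz, sxy, sxz, syz = solution
    sxx = -(syy + szz)  # traceless condition
    return np.array([[sxx, sxy, sxz], [sxy, syy, syz], [sxz, syz, szz]])


def principal_frame(order_matrix):
    """Principal order parameters (sorted by magnitude) and their axes."""
    values, axes = np.linalg.eigh(order_matrix)
    order = np.argsort(np.abs(values))
    return values[order], axes[:, order]


# Synthetic check: build RDCs from a known, hypothetical order matrix.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(8, 3))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
s_true = 1e-3 * np.array([[-0.3, 0.1, 0.2], [0.1, -0.5, -0.4], [0.2, -0.4, 0.8]])
reduced = np.einsum("ni,ij,nj->n", vectors, s_true, vectors)
s_fit = fit_order_matrix(vectors, reduced)
principal_values, principal_axes = principal_frame(s_fit)
```

The columns of `principal_axes` give the orientation of the principal order frame in the molecular frame, from which the three Euler angles mentioned in the text could be derived.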


Similar to the orientation of internuclear vectors, the orientation of rigid molecular fragments can be established via the analysis of RDCs. Suppose individual monosaccharide rings are connected by rigid glycosidic linkages and that it is possible to determine the PAF for each ring using five or more unique RDCs. The rings can then be rotated into their principal order frames and translated to make glycosidic linkages. Programs such as REDCAT186 have been developed to do precisely this. Building the oligosaccharide structure in this way is meaningful only if the monosaccharide rings (1) are rigid and (2) share the same order frame. In other words, RDCs interpreted as outlined above yield the correct structure of an oligosaccharide only if the whole molecule is rigid.149 Even then, complications arise from the cos² dependency of RDCs; four solutions in all quadrants of the circle are equally valid. Some can be eliminated based on steric clashes; some may be contradicted by additional experimental parameters such as NOEs or coupling constants. Alternatively, the ambiguity can be resolved by measuring RDCs in a complementary alignment medium.146 The approximation of rigid oligosaccharides was used in early interpretations of RDCs in carbohydrates.179,182 A relaxed grid search has been used to generate structures of human milk oligosaccharides159 using a consistent valence force field. Individual low-energy structures were then tested against NOEs and RDCs. Even flexible oligosaccharides can be studied in this way by dividing them into rigid sections.156 In this study, restrained simulated annealing was used to refine the whole molecule, but RDCs were used separately to refine the two flexible parts. Interestingly, small changes of the dihedral angles of the hexopyranose rings were observed, accompanied by an improved fit between the experimental and calculated RDCs.
Restrained simulated annealing was also used in the study of a trisaccharide from ganglioside Gm3182 yielding excellent agreement with the experimental RDCs. The raffinose and sucrose structures have been refined in X-Plore effectively using one alignment tensor for the entire molecule.170 However, subsequent studies using RDCs found evidence for flexibility in sucrose.163 It therefore seems possible that the use of restrained MD or simulated annealing can artificially improve the agreement between the experimental and back-calculated RDCs. This fact alone should therefore not be used as evidence for the existence of a single rigid conformer. In a study of mannose oligosaccharides, a single structure fitted the RDC data; however, a dynamic ensemble of structures was required to predict the experimental relaxation data.147 The possibility that a single structure could correspond to a virtual conformer should therefore never be ruled out. Its consistency with experimental data sampling motions on different timescales, that is, 1H–1H NOE, 3JCH and 3JCC scalar couplings, or relaxation data, should always be checked.147,156 A promising approach for detecting the signs of conformational averaging in carbohydrates was proposed by Tian et al.149 and illustrated on the analysis of the conformational and motional properties of a trimannoside. This approach is based on the analysis of order matrices of qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P individual monosaccharide ring. A generalized degree of order (GDO) was introduced as GDO ¼ ð2=3Þ ij Sij2 , where Sij are Saupe order matrix elements. For rigidly connected molecular fragments, GDO values, as well as the individual order parameters, are identical for each monosaccharide ring. In the case of the trimannoside it was found that the GDO for the two rings connected by a 1 ! 3 linkage were similar, while the third ring linked via a 1 ! 
6 linkage had GDO reduced by 40%, indicating flexibility. The advantage of the GDO concept is that it reflects both internal and orientational averaging and it can be expressed in any frame fixed in the molecule.146 The question that remains to be answered is to what extent GDO values can differ between individual rings for them still to be considered rigid. A 1.2-fold difference was deemed sufficient by Tian et al.,149 but was interpreted as a sign of flexibility by Stevensson.160 Studies of motional averaging of RDCs (e.g., Deschamps187) will help to address this problem. The accuracy with which GDO can be determined will also play a role; potential errors can originate in the measurement of RDCs, as well as in the monosaccharide structures used to calculate order matrices.

9.07.1.4.3(ii) Determination of the alignment tensor from the molecular shape or mass distribution

This methodology does not rely on the calculation of the order matrix from experimental data using a static molecular model. Instead, the RDCs are calculated using the alignment tensor derived from a potentially dynamic molecular model and some molecular properties. The agreement between the experimental and calculated RDCs is then used as a criterion for justifying both the model and the approximations used in calculating the alignment tensor. The attraction of this approach is that it does not require five unique RDCs to be measured for each monosaccharide ring. Instead, for rigid structures, a single scaling factor is optimized to obtain a fit between the experimental and theoretical RDCs.
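The single-parameter fit mentioned above has a closed-form least-squares solution. The sketch below assumes that the predicted couplings are known only up to one overall scale and that all couplings carry equal weight; the function names are hypothetical and do not correspond to any published program.

```python
import numpy as np

def fit_scale(d_calc, d_exp):
    """Scale s that minimizes sum((d_exp - s * d_calc)**2)."""
    d_calc = np.asarray(d_calc, dtype=float)
    d_exp = np.asarray(d_exp, dtype=float)
    return float(d_calc @ d_exp / (d_calc @ d_calc))

def rmsd(d_calc, d_exp, scale):
    """Root-mean-square deviation after applying the fitted scale."""
    d_calc = np.asarray(d_calc, dtype=float)
    d_exp = np.asarray(d_exp, dtype=float)
    return float(np.sqrt(np.mean((d_exp - scale * d_calc) ** 2)))
```

A low residual after scaling supports the model and the alignment prediction together; a high residual does not reveal which of the two is at fault.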

In an early example of the steric approach, the 1→6 linkage of the trimannoside, identified as flexible by Tian et al.,149 was studied by Almond.147 To simulate the alignment, it was assumed that the alignment was caused by steric restriction at the phage surface, as described previously in the prediction of the alignment from structure (PALES).188 The alignment tensor was calculated for every point of the MD trajectory and average RDCs were calculated using the gg and the gt portions of the simulation. Analysis of a 50 ns trajectory in water and comparison with the experimental data showed that the sampling of major molecular conformers was not correct, likely due to the shortcomings of the force field. A two-parameter fit to the experimental RDCs found the best agreement for 55% gg and 45% gt conformers. This agreed well with the results provided by other experimental techniques147 and is practically identical to a similar analysis of the same trimannoside performed by Prestegard and Yi.189 The latter study calculated the alignment based on the rmsd overlay of different conformers of the entire molecule. Another example of the use of steric alignment was provided by the work of Almond et al.147 The use of an inertia tensor to calculate the alignment of molecules was initially proposed for thermotropic liquid crystals190 and has recently been applied to studies of carbohydrates in weakly ordered media.
This approach assumes that the ordering coordinate frame corresponds to the principal axis system of the moment of inertia tensor.157,191 Assuming the existence of a rigid molecule, it has been used to identify the conformer yielding the best fit between the experimental and theoretical RDCs from a series of Monte Carlo-generated decasaccharide structures.157 The assumption of a rigid molecule was partly relaxed when a set of structures generated by molecular dynamics runs was used.191 The use of the inertia tensor to calculate the alignment was conveniently formalized by the tracking alignment from the moment of inertia tensor (TRAMITE) program.162 As an alternative to the moment of inertia, the second moment of the atomic distribution, also known as the gyration tensor, was proposed to approximate the molecular alignment.192 Both tensors share a common PAF, while the latter provides more realistic values of the order parameters, in particular for highly elongated molecular shapes. The conformation of lactose has been investigated using TRAMITE and PALES.158 The authors found that a single syn-φ/syn-ψ conformer reproduced the experimental RDCs equally well using either model. Mixing in the anti conformers for either the φ or ψ dihedral angle beyond 3% worsened the fit significantly. This contrasts with the interpretation of NOE data, which required a 10% presence of the anti conformer. In this study, only one anomer of the reducing glucose was investigated. An interesting observation was made by Freedberg et al.,193 who noticed a significant difference between the RDCs of the α- and β-anomers of glucose in lactose, suggesting that the two molecules have a different 3D structure in aqueous solution.
By treating the molecule as a rigid entity and interpreting both the NOE and RDC data, they concluded that one anomer is consistent with the syn conformation also found in the X-ray structure, while the other is not, suggesting some contribution of an anti conformer. Although residual dipolar couplings calculated using the radius of gyration tensor for each frame of the MD simulation were in trend agreement with the experimental data for a pentasaccharide and a hexasaccharide,150 the correlation was weaker than that obtained previously in the study of the trimannoside.148 As the simulations showed the existence of well-defined regions in the φ/ψ space, the RDCs were back calculated using one alignment tensor for the entire molecule. This gave much better agreement with the experimental RDCs, and the simulated data also agreed with the NOESY data. The authors therefore concluded that there are inadequacies in the prediction of the alignment tensor using the mass distribution methodology. This may be particularly the case for highly anisotropic rod-like carbohydrates with a protruding side sugar.

9.07.1.4.3(iii) Other approaches to the interpretation of RDCs in flexible systems

Using order matrix analysis,185 Freedberg161 investigated the dynamics of the furanose ring. This involved evaluating the fit of experimentally determined RDCs to 20 possible structures of sucrose's fructofuranosyl ring, which differed only in their pucker phases. Using solely RDCs, the fit indicated that the ring pucker is localized to the NE quadrant of the pseudorotational wheel, most likely within the 20–70° range. Furthermore, the results obtained were in excellent agreement with the data provided by other methods. The solution structure and dynamics of sucrose were examined using a combination of RDCs and molecular mechanics force fields.163 It was found that the alignment tensors of the glucose and fructose rings were different, indicating internal dynamics. RDCs were fitted to structural models using order matrix analysis185 and algorithms available in NMRPipe. Fitting two structures simultaneously using 35 residual dipolar couplings resulted in a substantial improvement compared to using a single rigid structure. This process is dependent on the force field used. As major disagreements between force fields were found, multiple force fields were used to interpret the NMR data. Assuming rigid monosaccharide rings, the source of flexibility in carbohydrates is the glycosidic linkage characterized by the φ/ψ torsion angles. In this approximation the complete information about the conformation of a disaccharide fragment is embodied in the conformational distribution function P(φ,ψ). It has been proposed to construct such a function as a combination of the additive potential (AP)194 and maximum entropy (ME)195 methods used in the interpretation of RDCs in strong liquid crystals. The authors refer to this method as additive potential maximum entropy (APME) and have shown that it is valid in the low-order limit.160 This method assumes that each rigid segment (monosaccharide ring) makes its own contribution toward the overall ordering of the molecule and that the elements of the order matrix are described as the sum of conformation-independent and conformation-dependent terms. Similar to the singular value decomposition,185 at least five independent RDCs are required for each monosaccharide ring to characterize the orientational order, while additional interresidue RDCs are required for the construction of the conformational distribution function P(φ,ψ). This function can at the same time incorporate interresidue scalar couplings (3JCH, 3JCC) and interresidue NOEs. The APME has thus far been tested on the disaccharide α-L-Rhap-(1→2)-α-L-Rhap-OMe.155,160 In addition to a global energy minimum at a torsion angle of 0°, the analysis uncovered the existence of a weak local minimum at 160° corresponding to the anti conformer. The latter was not found by either the MD or LD molecular simulations, which showed the existence of two minima located about 40° on either side of the 0° angle. The main APME minimum is a broader one encompassing both MD minima.
It is difficult to ascertain whether the differences in the experimental GDO parameters (1.2-fold) for the two rings are due to the transitions between the minima identified by the MD or to the admixture of the anti conformer suggested by the APME analysis. In order to investigate the motion along the glycosidic linkage of a disaccharide, Yi et al.152 modified a 4-O-D-galactopyranosyl--D-mannopyranoside by attaching an n-butyl chain to the reducing end of the molecule. This resulted in a significant increase of RDCs compared to the native disaccharide in the C12E5/hexanol/water aligning medium, indicating that the modified disaccharide has been immobilized and reoriented through some specific association with the medium. The reduction in GDO by approximately 38% for Gal revealed the existence of a significant internal motion between the rigid Gal and Man residues. By transiently anchoring one end of the disaccharide to an aligned bilayer medium using a short alkyl chain, the reference frame was forced to coincide with the frame determined for the reducing end sugar. This eliminated the need for additional data and allowed rigorous interpretation of the differences in the sizes of the order tensor elements of individual rings. Two conformers, S1 (17%) and S2 (83%), were identified by the MD simulation for this disaccharide. To improve the accuracy of the modeling, the MD geometries of S1 and S2 were first submitted to a full geometry optimization using Gaussian 98, which altered the torsion angles slightly compared to the MD and presumably also the ring geometry. Using REDCAT, which can handle multiple-state conformational averaging, and the immobilized rigid Man ring as the reference segment, the authors found that the experimental RDCs were satisfied for 15 ± 10% of S1 and 85 ± 10% of S2.

9.07.1.4.4 Conclusions

Significant advances have been made in incorporating RDCs into the conformational studies of carbohydrates since the appearance of the first reports at the end of the last decade. It quickly became obvious that the interpretation of RDCs in carbohydrates is not straightforward and is model dependent. The inherent flexibility of carbohydrates poses the greatest challenge for the interpretation of RDCs in terms of carbohydrate conformation. The limited dispersion of the orientations of CH vectors between directly bonded carbon and proton atoms in monosaccharide rings means that additional types of RDCs such as DHH, nDCH, and DCC are required for proper analysis of RDCs. The demand on the number and accuracy of measured RDCs increases for flexible carbohydrates. Attention must be paid to higher order effects caused by the narrow range of 1H chemical shifts of carbohydrates. The methods for the measurement of RDCs surveyed in this review are those currently used in the field. It is therefore possible that some already published high-resolution NMR methods provide more efficient alternatives. Further development in this area is expected.

RDCs are very sensitive to small changes in molecular geometry. Force field-generated structures therefore introduce a certain fuzziness into the interpretation of RDCs. It is expected that the use of ab initio structures in the interpretation of RDCs will increase. This will generate the need for the incorporation of vibrational corrections for 1DCH RDCs. The choice of molecular frames used in the interpretation of RDCs is very important. Currently, order matrix analysis of individual rings or calculations of the alignment tensor from the molecular shape or mass distribution are used. Forcing the reference frame to coincide with the frame determined for a part of the molecule is a promising strategy. More studies are needed in order to evaluate the merits of each individual approach. When describing the conformation of flexible molecules using RDCs, models are needed that take into account both the overall molecular tumbling and internal motions. The simplest approach assumes that the molecule can be described by a single average conformation allowing only for small-amplitude motions. This situation can be adequately described by using a single order matrix. However, caution must be exercised, as the existence of a single rigid structure that agrees with the experimental RDCs can hide the presence of conformational averaging, and this structure can in fact turn out to be a virtual conformer. It is vital that other experimental parameters such as scalar coupling constants, NOEs, relaxation parameters, and potentially other NMR parameters currently not used in the conformational analysis of carbohydrates are cross-checked against the RDC-derived structures. Alternatively, these parameters should be incorporated in the process of structure generation, as exemplified by the APME approach. Conformational equilibria can, in principle, be affected by the interaction with the medium.
For a pentasaccharide, strong discrepancies between the NOE- and RDC-based structures were attributed to the interactions between the pentasaccharide and the mesogens, shifting the conformational equilibrium.165 Pure steric alignment is likely to be the safest way of aligning flexible molecules. On the other hand, the electrostatic interactions, or the alignment caused by a transient insertion of a part of the molecule into the oriented phase, should be treated with some caution. It is possible that under these circumstances one could strongly orient a minor member of a preexisting distribution of conformers and the observed RDCs would be heavily weighted by the properties of this conformer. A combined use of NMR parameters reflecting motions on different timescales will undoubtedly lead to a more accurate description of the conformational space occupied by flexible carbohydrates. The long-range orientation information provided by RDCs will play an increasingly important role.
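As a numerical companion to the ring-by-ring order matrix analysis summarized above, the generalized degree of order introduced by Tian et al. can be evaluated directly from a Saupe matrix as the square root of (2/3) times the sum of the squared matrix elements. The sketch below is illustrative only; the function name and any order matrices passed to it are hypothetical.

```python
import numpy as np

def generalized_degree_of_order(S):
    """GDO = sqrt((2/3) * sum_ij S_ij**2) for a 3x3 Saupe order matrix."""
    S = np.asarray(S, dtype=float)
    return float(np.sqrt(2.0 / 3.0 * np.sum(S ** 2)))

def gdo_ratio(S_ring, S_reference):
    """Ratio of the GDO of one ring to that of a reference ring."""
    return (generalized_degree_of_order(S_ring)
            / generalized_degree_of_order(S_reference))
```

For an axially symmetric order matrix the GDO reduces to the magnitude of the largest principal order parameter, and a ratio well below one between rings of the same molecule is the kind of signature that has been read as internal motion.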

9.07.2 Conformation of Oligosaccharides in the Free and Bound States

9.07.2.1 Introduction

In recent years, it has been shown that the interactions between carbohydrates and proteins mediate a broad range of biological activities, starting from fertilization and extending to pathological processes such as tumor metastasis.196 Implications for immunological processes have also been demonstrated.197 In all these processes, the 3D structures of both molecular entities are of paramount importance.198,199 A detailed knowledge of the structure of sugar entities, both free and bound to proteins, is therefore relevant from both basic and applied scientific viewpoints. This information may be extracted by different means, including NMR, and several reviews have addressed this topic.200 X-ray crystallography has also been widely employed for characterizing free and complexed carbohydrate-binding proteins (for instance, Banerji et al.201). Accordingly, examples of the application of X-ray methods to the study of these compounds are of prime interest.202,203 However, carbohydrates are often rather difficult to crystallize, probably because of their inherent flexibility. Furthermore, X-ray crystallography provides only indirect information on the dynamics of biomolecules and, for flexible structures, only one conformation may be analyzed. This section will focus on recent advances in the application of NMR methods, especially those based on relaxation (mainly NOE), to deduce the conformational behavior of saccharides in their free and receptor-bound states. It is not intended to be exhaustive and provides only a few key references for each protocol and methodology for further evaluation.

9.07.2.2 The Conformation of Oligosaccharides in Solution

NMR has been widely applied in this field, since it provides both conformational and dynamic information. Because of the particular features of sugars, it is recognized that relaxation NMR parameters204 should be complemented with computational methods, such as molecular mechanics/dynamics calculations,205 to define the structural and conformational features of the carbohydrate unambiguously. This task is commonly achieved by calculating potential energy surfaces for the glycosidic linkages, using a force field206 or ab initio methods.207 It should be kept in mind, when comparing such calculations to experimental data, especially in water solution, that these methods provide just a first estimate of the conformational regions that are energetically accessible and of the possible presence of different conformational families.208 Protocols based on single-point conformers, or on restrained209 or unrestrained molecular dynamics,210 may also be employed satisfactorily. With molecular mechanics calculations, care needs to be taken when considering the relative energy values provided by the force field. Nevertheless, the calculated geometries are usually very good approximations to those existing in solution211 and in the solid state.212 Scalar coupling constants carry key conformational information that may be used to access the distribution of oligosaccharide (or glycomimetic) conformers in solution.213 Residual dipolar couplings can also be used to this end. These methods have been reviewed in Section 9.07.1. A recent example of the combined application of RDC- and NOE-based experiments to deduce the solution conformation of a tetrasaccharide has been reported.214 It was demonstrated that the inclusion of RDCs permitted further refinement of the actual geometries. In a parallel manner, the detailed analysis of the conformer distributions obtained for lactose using either NOEs or RDCs permitted the detection of minor, but significant, differences between the two methods.
The different time-averaging sensitivity of RDCs and NOEs to motional processes may be at the heart of the distinct results.

9.07.2.2.1 The use of NOEs for conformational analysis of oligosaccharide molecules

The relationship between NOEs and proton–proton distances is well established and can be worked out at least semiquantitatively, and also quantitatively when a full matrix relaxation analysis is considered. The detailed study of the conformation and dynamics of a tetrasaccharide related to the LeX antigen provides a good example of this approach.215 NOE intensities are sensitive to the respective conformer populations; therefore, an indication of the population distribution, both when these molecules are free in solution and even in the protein-bound state, may be obtained by focusing on key interresidual NOEs.216 For carbohydrate molecules, the key distances with conformational information are those between pairs of protons located on different rings.

The distance between hydrogen atoms A and D depends on the torsion angles around the glycosidic linkage, Φ (HA–CA–O–CD) and Ψ (HD–CD–O–CA). Very often, only one or two significant proton–proton distances can be measured.217 From this viewpoint, when adopting the typical restraint-based approach for generating conformations, as usually adopted for other biomolecules, the possibility of generating virtual conformations218 is very high and special care should be taken. Restrained simulations include an additional term in the force field to penalize the deviation from the experimentally deduced distance. Depending on the force constant employed to weight the experimental restraint, deviations from ideal geometries could occur, with concomitant distortions of the sugar rings. In very special cases, for branched sugars, there are a large number of interresidual proton–proton contacts.219 The careful analysis of these contacts may be used to deduce the existence (or not) of single conformers with molecular motion around well-defined geometries for the different glycosidic torsions of the molecule. Otherwise, when a significant number of restraints do appear, a time-averaged restrained molecular dynamics protocol may be adopted, using a memory function in the form of an exponential decay constant.220
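The idea of an exponentially decaying memory can be sketched as a running average of the inverse-cubed distance that is then penalized instead of the instantaneous distance. The discrete update below is one common way of writing such a memory and is an assumption here, not the protocol of the cited work; the function names, the flat-bottom penalty, and the units are hypothetical.

```python
import numpy as np

def time_averaged_distance(r, dt, tau):
    """Running <r^-3>^(-1/3) with an exponential memory of time constant tau."""
    r = np.asarray(r, dtype=float)
    decay = np.exp(-dt / tau)
    averaged = np.empty_like(r)
    acc = r[0] ** -3  # initialise the memory with the first frame
    for k, rk in enumerate(r):
        # Old history is damped; the current frame enters with weight 1 - decay.
        acc = decay * acc + (1.0 - decay) * rk ** -3
        averaged[k] = acc ** (-1.0 / 3.0)
    return averaged

def restraint_energy(r_avg, r_upper, k_force):
    """Flat-bottom harmonic penalty applied to the averaged distance."""
    excess = np.maximum(np.asarray(r_avg, dtype=float) - r_upper, 0.0)
    return 0.5 * k_force * excess ** 2
```

Because short distances dominate the inverse-cubed average, a proton pair that is close for only part of the trajectory can satisfy the restraint without the molecule being forced into a single compromise geometry.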

Simulation lengths approximately one order of magnitude larger than the exponential decay constant should be used to generate reliable estimates of average properties. In this approach, conformational distributions that are able to simultaneously cope with all the deduced proton–proton distances are obtained. NOE-derived distances are included as time-averaged distance restraints and scalar coupling constants (if any) as time-averaged J coupling restraints, related to torsion angles (see Section 9.07.1). In principle, $\langle r^{-3}\rangle^{-1/3}$ or $\langle r^{-6}\rangle^{-1/6}$ averages can be used for the distances, while linear averages are used for the coupling constants. The key point is to employ reasonable force constants for both the NOE- and J-based terms of the force field to prevent the molecule from becoming trapped in high-energy, physically improbable, incorrect minima.221 As an additional means to enlarge the number of interresidual NOEs, those involving hydroxyl protons can also be employed (see Section 9.07.1). Technically, the observation of hydroxyl protons is a difficult task, although a variety of methods have been proposed (as reviewed by Siebert et al.222). Regarding the problem of distinguishing pure NOEs from chemical exchange correlations, the best method is to combine NOESY and ROESY data. In ROESY, the cross-peaks due to the two processes have opposite signs, while in NOESY experiments at low temperature, both processes give rise to cross-peaks with the same sign as the diagonal peaks. Also, for certain mixing periods, cross-peaks can appear that are mediated by water molecules.
Under all circumstances, and due to the $\langle r^{-6}\rangle$ or $\langle r^{-3}\rangle$ dependence of the NOE, minor populations of conformers can be detected, provided that they show exclusive interresidue proton–proton distances.223 In any case, the existence of molecular motion around the glycosidic linkages of oligosaccharides has been firmly established.224 In some special cases, simultaneous negative and positive NOE cross-peaks may even be obtained for the same oligosaccharide, thus indicating motion at different timescales in different regions of the molecule. Aminoglycoside antibiotics225 and a LeX-related saccharide226 provide two examples of this feature. Since NMR parameters are essentially time averaged, the information deduced from NMR experiments generally corresponds to the time-averaged conformation in solution.227 Regarding the relaxation timescale, ratios of transverse and longitudinal cross-relaxation rates obtained through off-resonance ROESY experiments228 and/or through the comparison of data taken from individual NOESY, ROESY, or tilted rotational nuclear Overhauser effect spectroscopy (T-ROESY) experiments229 may be used to extract local correlation times for different pairs of protons in the oligosaccharide. These ratios are independent of interproton distances and may allow the estimation of specific correlation times. From these correlation times, proton–proton intra- and interresidue distances may be quantitatively extracted. This is probably the method of choice for quantitative analysis of conformation and molecular motion, while for a semiquantitative analysis, intraresidue signals may also be taken as internal distance references, as a first approximation, to deduce the unknown distances for the target interresidue proton pairs.
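Two of the points above lend themselves to a short numerical sketch: the strong weighting of short distances in an NOE-type average, and the use of an intraresidue proton pair as an internal distance reference. The code assumes a common correlation time for both proton pairs and initial-rate conditions (the isolated spin-pair approximation); the function names and all numerical inputs are hypothetical.

```python
import numpy as np

def noe_effective_distance(distances, populations, power=6):
    """Population-weighted <r^-power>^(-1/power) over a set of conformers."""
    r = np.asarray(distances, dtype=float)
    p = np.asarray(populations, dtype=float)
    p = p / p.sum()
    return float(np.sum(p * r ** -power) ** (-1.0 / power))

def distance_from_reference(sigma_ij, sigma_ref, r_ref):
    """Unknown distance from a cross-relaxation rate and a reference pair."""
    # Assumes sigma scales as r^-6 with the same correlation time for both pairs.
    return float(r_ref * (sigma_ref / sigma_ij) ** (1.0 / 6.0))
```

For example, a conformer populated at only 10% with a 2.5 Å contact, mixed with a 90% conformer at 4.5 Å, gives an effective distance far below the linear population average, which is why exclusive short contacts can reveal minor conformers.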
13C-NMR relaxation parameters may also be employed to access the rates of overall and internal motions for saccharide molecules.230 Longitudinal and transverse heteronuclear relaxation times as well as heteronuclear NOEs depend on the motion of the molecule, including overall and internal motions. Thus, careful analysis of these parameters can be employed to demonstrate the presence of conformational heterogeneity and/or dynamics, as well as the restriction to motion and the timescale of the existing fluctuations. Different examples of the application of this methodology to trisaccharides,231,232 tetrasaccharides,233 pentasaccharides,234 and polysaccharides235,236 have been described. With regard to the monosaccharide moieties, the average shape of the pyranose rings may be deduced from the vicinal proton–proton coupling values, also assisted by NOE values and, if possible, other NMR parameters (e.g., RDCs, see Section 9.07.1). A very special case within carbohydrate molecules is the iduronic acid ring within glycosaminoglycans.237 This ring usually exchanges between chair and skew boat conformations.238 NMR spectroscopy is particularly appropriate for analyzing the conformational equilibrium of this flexible ring,239 as the interconversion between conformers leads to changes in both the dihedral angles between vicinal protons and the intraring H–H distances, which can be monitored by the measurement of scalar spin–spin coupling constants and NOEs, respectively.240 Proton–proton scalar coupling constants are very sensitive to changes in the dihedral angles and the empirical relationship between both parameters is well known.
In addition, NOESY experiments can clearly reveal the presence of the 2SO conformer, as only in this conformation are the protons at positions 2 and 5 of the ring close enough to give rise to an observable NOE.241 As in all cases, it should always be remembered that the experimental values correspond to the time-averaged conformation in solution. The best approach combines both J and NOE data, and the postulated conformation or conformational equilibrium in solution should correlate with all of the NMR data in an unambiguous manner.242

The conformation of oligosaccharides under particular experimental conditions has also been elucidated. Thus, several studies have dealt with glycolipids or other amphiphilic sugar moieties embedded into membrane-like environments, such as micelles or lipid aggregates,243–247 deriving structural, conformational, and dynamic information248 by using different NMR methods, including relaxation249 and/or dipolar coupling measurements (see Section 9.07.1). The importance of cation binding in modulating the conformational behavior and molecular recognition features of different oligosaccharides, particularly of the aminoglycoside antibiotic family, has also been addressed,250 together with the concomitant implications for the geometric and thermodynamic features of the interaction processes with nucleic acid and protein receptors.251 Also, in the last decade, and following pioneering work in the 1980s,252 many fine details of the conformational and dynamic features of natural253 and designed nonnatural O-glycopeptides254 have been elucidated by using NMR and modeling procedures.255 Conformational differences between peptides glycosylated at either Ser or Thr moieties have been detected.256 Initial ideas on the principles of mucin architecture due to glycol clustering257 or to the existence of MUC1 tandem repeats258 have been presented.
Also, within this context, the essential structural motifs for glycoproteins to show antifreeze activity have been deduced.259 In a parallel manner, the principles underlying the conformational behavior of N-linked glycopeptides in solution have also been studied,260 emphasizing the observed differences in peptide conformation upon glycosylation,261 as well as postulating on the molecular basis for the observed glycosylation-induced conformational switching.262 Finally, the study of the interactions of some glycopeptides with membranes has shown that glycosyl enkephalin analogues adopt turn conformations in amphipathic media.263

9.07.2.3 The Bound State

In some cases, partial264 or complete information265 on the 3D structure of the protein–sugar supramolecule has been derived, using modern NMR methods266 similar to those employed for deriving protein structure in solution, adapted for complexes with carbohydrates. Changes in the dynamic behavior of the protein backbone before and after sugar binding have also been explored using relaxation measurements.267 The requirements are the standard ones for protein structure determination,268 with the use of isotope-labeled receptors. Recent examples include complexes of glycosaminoglycans269 or sulfated analogues270 bound to different receptors,271,272 or of neutral sugars bound to other lectins.273,274 Regarding protein-bound conformations of carbohydrate molecules, although valuable information may be gained by using X-ray crystallography, transferred NOE (TRNOE) experiments can be used for solution studies, provided that the exchange rate between the bound and the free state is fast.275 In complexes of large molecules, cross-relaxation rates of the bound compound are opposite in sign to those of the free ligand and produce negative NOEs.276 Following this methodology, as pioneered by Prestegard277,278 for studying carbohydrate–protein interactions, many cases have been described.279 Notably, the conditions required to monitor TRNOEs appear to be satisfied frequently by sugar receptors.280 The reason for this favorable situation probably rests on several facts: these interactions are not extremely strong, there is fast exchange between the free and the bound states of the ligand, and the perturbations of the conformational equilibrium of a given oligosaccharide upon binding to a protein are accessible to observation by transferred nuclear Overhauser enhancement (TRNOE).281 Different mixing times and protein/ligand molar ratios should be systematically used in order to reach quantitative conclusions.
Comparison with TR-ROESY282 and/or QUIET-trNOESY280 experiments should also be performed to detect spin-diffusion effects. Depending on the architecture of the protein-binding site, the major conformation of the oligosaccharide existing in solution might be recognized by the receptor, as reported in a variety of cases.283,284 In other cases, a conformational selection process takes place with exclusive recognition of one member of the conformational distribution,285 which can be drastically different from the major one in water solution.286 In still other cases, the protein can recognize different geometries of the same ligand, provided that no strong ligand–receptor contacts take place to define one unique selected conformer.287 From the protein perspective, there are also cases in which the protein skeleton is perfectly preorganized to accommodate the carbohydrate ligand, as reported for the hevein family.288,289 However, cases in which a major
reorganization of the protein takes place upon sugar binding have also been reported, as for the CD44 hyaluronan-binding domain290 or the mannan-specific family 35 carbohydrate-binding module.291 The conformation of oligosaccharides bound to other molecular entities, for instance, nucleic acids, has also been derived. A paradigmatic case is that of neamine bound to tRNA(Phe), as deduced by trNOE measurements,292 or the recognition of aminoglycoside analogues by TAR-RNA.293 Recently, saturation transfer difference (STD) NMR methods have also been employed to deduce the bound conformation of different ligands and inhibitors in the binding sites of lectins294,295 and enzymes.296 The interaction of different saccharides with large particles, such as viruses297–299 or living cells, has also been monitored by STD methods.300 Some STD-related NMR-editing methods to discriminate the NMR signals of the ligand and the (isotope-labeled) receptor have also been proposed to efficiently monitor the existence of recognition processes.301 When the off-rate of the dissociation process of the ligand from the bound to the free state is slow, trNOE or STD methods are no longer applicable and alternative NMR methods have to be applied. For instance, the bound conformation of a heparin-derived hexasaccharide to fibroblast growth factor 1 has been derived using half-filtered NOESY experiments, by conducting the experiments on a sample containing 13C-labeled protein.302 In this manner, the protein protons that are bound to 13C atoms are removed from the NOESY spectrum, which now contains only the saccharide cross-peaks. For different heparin fragments, the trNOE method is still valid to get conformational information in the bound state, using antithrombin III or FGF as receptors.303,304 Recently, the possibility of obtaining dihedral angle information from a ligand in the bound state by exchange-transferred cross-correlation spectroscopy has been reported. 
This method has also been employed in the carbohydrate field with partial success.305 More examples are still required to further validate this approach to get bound bioactive conformations. Independently of the use of NOEs, residual dipolar couplings (RDCs), measured after molecular alignment of the target molecule in a strong magnetic field, induced by the use of, for instance, a His-tagged protein with a nickel-chelate-carrying lipid inserted into a lipid bilayer-like306 or a dilute liquid crystalline medium, have been applied to deduce the orientation of oligosaccharides, both free in solution (see Section 9.07.1), partially modified with 13C-labeled acetyl groups,307 and within protein-binding sites.308 At this point, it should be mentioned that the observed RDCs are the linear weighted average of those for the free and bound states and, therefore, unless a noticeable percentage of ligand is bound, the changes in RDCs can be difficult to exploit to get the bound state conformation. Along the same lines,309 systems containing paramagnetic probes310 or paramagnetic ion-binding tags, coupled or not311 to membrane-like environments, have recently been employed successfully to deduce binding features of sugar–receptor interactions.312,313

9.07.2.4

Conclusions

The conformational study of oligosaccharide molecules in solution is still a complex problem, but there are a good number of reported protocols and examples that can be used as models for tackling it. Access to high magnetic fields (900 MHz or more in the near future) may provide better resolved spectra that give access to conformational information in a more straightforward manner.314 The selection of the method to study the bioactive conformation of the saccharides in the receptor site depends on the kinetics of the dissociation process. Thus, for fast processes, trNOE methods can be successfully employed, while for tight binding complexes more sophisticated approaches are needed to access the fine details of the geometrical features.
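As a rough guide to which kinetic regime applies, the off-rate can be estimated from the dissociation constant through the two-state relation KD = koff/kon. The sketch below assumes a diffusion-limited on-rate of 1e8 M⁻¹ s⁻¹, which is a common rule-of-thumb value and not a figure from this chapter; the function name and the example dissociation constants are likewise illustrative assumptions.

```python
def estimated_off_rate(kd_molar, k_on_per_molar_s=1e8):
    """Estimate k_off (s^-1) from K_D (mol/l) using K_D = k_off / k_on.

    The default k_on is an assumed diffusion-limited value; real on-rates
    can be slower by orders of magnitude, which lowers k_off accordingly.
    """
    return kd_molar * k_on_per_molar_s


if __name__ == "__main__":
    # Hypothetical affinities spanning weak to tight binding.
    for kd in (1e-3, 1e-5, 3e-8):
        print(f"K_D = {kd:.0e} M -> k_off ~ {estimated_off_rate(kd):.1e} s^-1")
```

Under this assumption, millimolar binders dissociate many thousands of times per second, whereas nanomolar binders dissociate only a few times per second, in line with the distinction drawn above between fast processes and tight-binding complexes.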

9.07.3 Bacterial Cell Wall Peptidoglycans and Fragments: Structural Studies and Functions

9.07.3.1

Introduction

The main task of the immune system of higher organisms is to detect the presence of invading microorganisms and to combat them. The innate immune system, which is highly conserved, is the only defense in invertebrates and plants. In vertebrates, which are endowed with an adaptive mechanism as well, the innate system constitutes the first line of immune defense. The innate immune system recognizes pathogens using pattern recognition receptors (PRRs) based on unique molecular signatures, called pathogen-associated molecular patterns (PAMPs), which are
absent from the host cells. One of the most important PAMPs is polymeric peptidoglycan (PGN), a supporting constituent present in the cell walls of all bacteria. PGN is, therefore, a main target for the innate immune response. Among the various PRRs on the surface of (such as CD14315) or inside (such as NODs316) the immune cells that are able to detect and recognize PGN, peptidoglycan recognition proteins (PGRPs) constitute an important family. In addition to PGRPs there are other soluble PGN recognition molecules like soluble CD14,317 C-type lectins (e.g., the mannose-binding lectin318), mouse RegIII and human HIP/PAP,319 and lysozyme, a muramidase, whose bactericidal activity has long been known (for an early history, see Phillips320). PGRPs and some PGN-hydrolyzing enzymes deserve special attention because only their interactions with PGNs have been characterized, to date, at the molecular level. Virtually all of our present knowledge on PGRPs refers to proteins of insect or mammalian origin. Although their functions and structures are similar, the modes of action are different for these two classes of immune proteins. In insects, the immune response is usually induced, after the initial binding event, by a complex cascade of intracellular signaling to generate antimicrobial agents, usually small peptides (for an update, see Sahl321). Mammalian PGRPs have, in addition, a direct bactericidal activity on interaction with PGN and some of them possess catalytic properties as well.322–324 The structures of PGNs and the mechanisms of the host’s immune response have been described in greater detail in several recent reviews;322,323,325–327 therefore, only a brief background will be given below.

9.07.3.2

Peptidoglycan Structure

The outer cell wall of Gram-positive bacteria consists of a thick layer of PGN, accounting for approximately 50% of the cell wall mass,328 intertwined with lipoteichoic acid (LTA), and overlaying the cytoplasmic lipid membrane. The cell wall of Gram-negative bacteria is characterized by a thin PGN layer between two lipid membranes, the outer one harboring the highly immunogenic lipopolysaccharide (LPS) (see Figure 3). The main function of PGN is to maintain the integrity of the bacterial cell against external (e.g., antimicrobial agents) or internal (osmosis) challenges. For this reason it is a prime target for antimicrobial chemotherapy. In addition, it participates in ‘classical’ cell wall functions, such as recognition and regulatory activities. The backbone of PGN is a linear glycan polymer made up of alternating N-acetyl-D-glucosamine (GlcNAc) and N-acetylmuramic acid (MurNAc: 3-O-(D-2-carboxy)ethyl-GlcNAc) units linked together with β-(1–4)-glycosidic bonds. Short peptides consisting of four or five alternating D- and L-amino acids are attached to MurNAc via the 3-O-carboxyethyl group. The glycan backbone is highly conserved in all bacteria (small functional group modifications notwithstanding329); the amino acid at position 3 of the peptide chain is, however, variable, being L-lysine in most Gram-positive bacteria and meso-diaminopimelic acid (mDap) in most Gram-negative ones (for exceptions, however, see, e.g., Lee and Hollingsworth330). L-Lysine is replaced by L-ornithine in Lactobacillus fermentum.331 Cross-links between the pending peptide chains are established either by a direct amide bond between mDap in one strand and D-Ala of the other (Gram-negative) or via an oligoglycine interpeptide bridge connecting L-Lys to D-Ala on adjacent strands (Gram-positive) (see Scheme 4). Other interpeptide connections may occur, however, depending on the bacterial strain.332 It is

Figure 3 Schematic cross sections of the cell wall of Gram-positive (left) and Gram-negative (right) microorganisms. The light blue and pink features embedded into the lipid bilayers represent integral membrane proteins and lipopolysaccharide (LPS), respectively.

Scheme 4 Chemical structures of the cell wall peptidoglycans and points of attack by different lytic enzymes.

of note that the peptide moieties display several ‘unnatural’ features, such as the nonproteinogenic amino acids mDap or L-Orn, others with D-configurations, and the unusual peptide bond engaging the γ-carboxyl of Gln (therefore it is often labeled as iGln) rather than the α-one as in proteins. Evidently, these nonconventional features are part of the weaponry the microorganisms use to overcome host defense based, for instance, on proteolytic enzymes. Yet, several enzymes are capable of attacking and cleaving PGN at different sites. Historically, hen egg-white lysozyme, or muramidase, was the first enzyme whose bacteriolytic property was attributed to its hydrolytic activity in cleaving the β-glycosidic bond between MurNAc and GlcNAc (for structure and early history, see Phillips320). Other enzymes involved in the cleavage of PGN are indicated in Scheme 4. For a recent review on PGN hydrolases, see Vollmer.329 Structural aspects of the interactions of proteins involved in the recognition or cleavage of PGNs will be discussed in Section 9.07.3.5. The orientation of the PGN chains in the cell wall is the subject of some controversy. In the so-called scaffold arrangement the glycan chains are arranged perpendicular to the cytoplasmic membrane,333 whereas in the horizontal model the glycans and cross-linked peptides are oriented parallel to it.334 The former would result in a honeycomb-like structure with pores large enough to accommodate proteins that interact with PGN. Recent molecular modeling calculations based on the NMR solution structure of a dimer of the repeating unit of Lys-type PGN (cf. Section 9.07.3.3.2)335 seem to support the former model. Different models for the architecture of PGN are discussed in a recent review by Vollmer.336 Solid-state NMR is a powerful emerging technique for structural studies of giant molecules like the PGNs. 
Very recently, a through-bond 13C-correlation (2D) spectrum of surprisingly good quality could be obtained of a whole, uniformly 13C-labeled, PGN sacculus (estimated molecular weight (MW) 3 × 10⁶ kDa). In addition to chemical shift assignments, various NMR experiments made it possible to establish that the glycan strands are more rigid than the peptide stems and to study PGN–protein interactions.337

9.07.3.3

Peptidoglycan Fragments

9.07.3.3.1

Syntheses

MurNAc-L-Ala-D-iGln (‘muramyl dipeptide’: MDP) was found to be the minimal structure possessing biological activity.338–341 Following this discovery, hundreds of derivatives have been synthesized and tested in vitro and in vivo (cf. Section 9.07.3.6). Recent examples of the chemical syntheses of small PGN sequences or analogues include the following. Classical solution phase methods were used to obtain MurNAc-pentapeptide (MPP: MurNAc-L-Ala-D-iGln-L-Lys-D-Ala-D-Ala, in the form of Me- or SPh glycosides).342 MPP with the anomeric OH phosphorylated was also prepared as an intermediate toward the total synthesis of Lipid I.343,344 Similarly, synthesis of the complete repeating unit of Lys-type PGN (GMPP: GlcNAc-(1–4)MurNAc-L-Ala-D-iGln-L-Lys-D-Ala-D-Ala) was reported (with MurNAc OH-1 phosphorylated) in connection with the total synthesis of Lipid II, an intermediate of the biosynthesis of PGN.345,346 GMPP and MurNAc-tetrapeptide were also synthesized by Hesek et al.347 The same group also reported the synthesis of the tetrasaccharide-containing fragment of Lys-type PGN: a dimer of the above unit (GMPP2: GlcNAc-(1–4)MurNAc-(L-Ala-D-iGln-L-Lys-D-Ala-D-Ala)-GlcNAc-(1–4)MurNAc-(L-Ala-D-iGln-L-Lys-D-Ala-D-Ala)).348 Construction of Dap-type muropeptides required procedures for the synthesis of the unusual amino acid mDap.349 This was accomplished either by a multistep procedure starting from N-benzyloxycarbonyl-L-glutamate350 or by taking advantage of a metathesis cross-coupling reaction.351 The first syntheses of mDap-containing muramyl- and 1,6-anhydromuramyl peptides (TCT analogues) were then accomplished by Kubasch and Schmidt.350 MurNAc-tri- to pentapeptides, either with L-Lys or mDap as the third residue in the peptide stem, were also obtained using solid-phase peptide synthesis.352 For studies of interactions with PGRPs (vide infra), a cross-linked Lys-type PGN model was constructed by interconnecting the D-Ala ends of two identical MurNAc tetrapeptides.353 Di-, tetra-, and octasaccharide fragments containing the full repeating glycan sequence (i.e., GlcNAc-(1–4)MurNAc) and truncated peptide chains (i.e., L-Ala-D-iGln) were prepared using a block synthesis approach.354 The same group recently reported the syntheses of tetra- and octasaccharide fragments of the Lys-type PGN with tri-, tetra-, and pentapeptide chains, respectively, attached to MurNAc.355,356 The tetrasaccharide tetrapeptides proved to be competitive inhibitors of the melanization cascade, an important immune defense mechanism in arthropods.357

9.07.3.3.2

Structural studies

The full repeating unit of a Gram-negative PGN (Dap-GMPP; in earlier literature it was labeled as PGM, for peptidoglycan monomer, but for consistency it will be abbreviated as Dap-GMPP henceforth: GlcNAc-(1–4)-MurNAc-L-Ala-D-iGln-mDap(ωNH2)-D-Ala-D-Ala) was first isolated from the cell wall of Brevibacterium divaricatum358 but, to our knowledge, its chemical synthesis has not been reported to date. The conformation of Dap-GMPP was determined in aqueous solution using NMR-restrained molecular modeling.359 The preferred conformation of the peptide chain is characterized by well-defined torsion angles around rotatable bonds from C3–O(MurNAc) through N–C(Ala1); the C-terminal part, on the other hand, exhibits more conformational flexibility. The Dap-GMPP molecule was found, furthermore, to exhibit an amphipathic character with well-separated lipophilic side chains of the Lac and Ala residues at one side and a hydrophilic domain constituted by the charged Dap side chain and the disaccharide hydroxyl groups.359 This property is certainly important in binding to enzymes, as indicated by a computational model, based on an X-ray structure, of PGN binding to DD-transpeptidase, an enzyme participating in PGN biosynthesis. The model indicated binding of the hydrophilic glycan strands to hydrophilic grooves on the enzyme surface.360 Dap-GMPP is a potent, nontoxic, nonpyrogenic immunostimulator (for references, see Halassy et al.361). 
To investigate the influence of lipophilic versus hydrophilic character on biological activity, lipophilic derivatives of Dap-GMPP bearing either (adamant-1-yl)-acetyl- or Boc-Tyr substituents at the ε-amino group of Dap (Ad-Dap-GMPP and BocTyr-Dap-GMPP, respectively; Scheme 5) were synthesized362 and their solution structures determined by NMR and molecular modeling.363 The conformations of these lipophilic derivatives in DMSO exhibited similar characteristics to those found for Dap-GMPP in water,359 with slightly more disordered C-terminal residues. For Dap-GMPP itself, the C-terminal conformations were different in the two solvents: NOEs measured in DMSO indicated steric proximity between the charged end groups (Ala5–COO− and Dap3–NH3+) because of electrostatic pull. The absence of such ordering in water solution was attributed to effective shielding of the explicit charges through hydration by the polar water molecules.363 Recent solid-state NMR results, albeit measured on intact PGN, seem to be in agreement with this solution-state peptide conformation.364 The structure of the Lys-type dimer, GMPP2 (vide supra), in water solution was recently determined using NMR-restrained molecular modeling computations.335 The general features of this structure, such as relative rigidity of the glycan backbone together with the N-terminal peptide part, and increased flexibility at the peptide C-terminus, are similar to those of Dap-GMPP.359 The glycosidic torsion angles are also very similar, indicating virtually identical glycan backbone conformations in solution for GMPP2 and Dap-GMPP. Surprisingly, however, preferred conformations around the MurNAc–peptide junction (i.e., D-Lac, Ala1) in GMPP2 were found to be markedly different from those in Dap-GMPP, both in water solution. The structural significance of this difference will be discussed in Section 9.07.3.5.
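The glycosidic and D-Lac torsion angles compared above are ordinary four-atom dihedral angles, so they can be recomputed from any set of model coordinates. The following is a minimal sketch of such a calculation using the usual sign convention (positive for a clockwise rotation when viewed along the central bond). The function name and the test coordinates are purely illustrative assumptions; they are not taken from the GMPP2 or Dap-GMPP structures.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle (degrees, range -180..180) defined by four points.

    For a glycosidic angle defined as C1-O1-C4-C3, pass the Cartesian
    coordinates of those four atoms in that order.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical points arranged to give a +90 degree torsion.
    atoms = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
    print(f"torsion = {dihedral_deg(*atoms):.1f} degrees")
```

Applied to the corresponding atoms of each NMR-derived conformer, a routine of this kind would return the torsion values that are compared between structures.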

9.07.3.4

Peptidoglycan Recognition Proteins

PGRPs are recently discovered365–367 pattern recognition factors that play important roles in innate immunity from insects to mammals. The current state of the art regarding PGRPs has been covered in recent reviews.322,323,368 Based on these reviews, only a brief summary will therefore be given here, focusing on mammalian PGRPs mainly for reasons mentioned in Section 9.07.3.1. Insect PGRPs are involved in Toll, Imd, or prophenoloxidase cascade signaling mechanisms to generate an immune response. In mammals, four PGRPs have been identified thus far: PGLYRP1 to PGLYRP4 (formerly: PGRP-S, PGRP-L, PGRP-Iα, and PGRP-Iβ, respectively). The main function of mammalian PGRPs seems to be a direct bactericidal action rather than functioning as PRRs, as is the case for their insect counterparts.326 On the basis of X-ray studies and molecular modeling, it was recently suggested that the bactericidal effect is due to the inhibition of cell wall synthesis by sterically blocking the access of biosynthetic enzymes to the nascent PGN chains and also

Scheme 5 Structures of Dap-GMPP (1), the repeating unit of a Gram-negative PGN from the cell wall of Brevibacterium divaricatum, Ad-Dap-GMPP (2), and BocTyr-Dap-GMPP (3). Modified with permission from K. Fehér; P. Pristovšek; L. Szilágyi; D. Ljevaković; J. Tomasić, Bioorg. Med. Chem. 2003, 11 (14), 3133–3140.

by forcing them into a conformation that prevents formation of cross-links between the peptide stems369 (cf. Section 9.07.3.5). This mechanism is reminiscent of that suggested for glycopeptide antibiotics such as vancomycin, ramoplanin, and actagardine (cf. Section 9.07.4). Nevertheless, several aspects regarding the functions of PGRPs in mammals are still not clear.322,326,370,371 It was suggested, for instance, that PGLYRP2, a catalytic PGRP with N-acetylmuramoyl-L-alanine amidase activity,372,373 might function to turn off excessive immune response325 by hydrolyzing PGN.

9.07.3.5

Interactions of PGRPs and Other Proteins with PGN Fragments: Structural Studies

Considerable effort has recently been devoted to elucidating the structural basis of PGN binding by PGRPs using X-ray crystallography. At least seven crystal structures for PGRPs of insect or human origin have been reported during the last 5 years.322,323 All PGRPs contain at least one C-terminal PGN-binding domain, which is approximately 165 amino acids long and shares approximately 30% sequence homology with bacteriophage T7 lysozyme and other bacterial type 2 amidases (enzymes that hydrolyze the amide bond between the lactyl moiety of MurNAc and L-Ala of PGN).322,323,368 All amidases, including catalytic PGRPs (such as PGLYRP2, cf. Section 9.07.3.4), have a Zn2+ binding site: the metal ion is essential for the catalytic activity. Some insect PGRPs are
transmembrane proteins; all mammalian PGRPs are, however, secreted in the form of homo- or heterodimers, which are disulfide linked. Human PGLYRP3 and PGLYRP4 both contain two PGN-binding domains; this might serve to increase the binding affinity to PGN by a multivalent effect. On the other hand, two binding domains may confer multiple binding specificities on these PGRPs to recognize PAMPs of different pathogens, thereby enhancing their antibacterial efficiencies.322 Insect PGRPs do not form disulfide-linked dimers. All structures exhibit similar overall folds, as determined by X-ray crystallography, comprising a central five-stranded β-sheet and three helices, and are cross-linked by 1–3 disulfide bonds. All PGRPs contain, in addition, an N-terminal chain of 30–50 residues, which is highly variable with respect to both sequence and conformation; the functional significance of this segment is yet to be elucidated. To understand the structural basis for the recognition of PGN by PGRPs, complexes of PGRP with PGN would have to be subjected to structure determination methods. Polymeric PGN is, however, too large and noncrystalline and therefore unsuitable for NMR or X-ray studies. For this reason complexes of noncatalytic PGRPs with small PGN fragments have been subjected to X-ray crystal structure determination. The first structure was reported for the C-terminal binding domain of human PGLYRP-3 (PGRP-Iα) in complex with MurNAc-L-Ala-D-iGln-L-Lys (MTP of Gram-positive bacteria), revealing an extensive network of H-bond and van der Waals contacts of MTP with 16 residues of the protein’s binding cleft. Most contacts were to the peptide part of MTP and only a few to the MurNAc. 
No significant conformational change of the protein was detected as a result of the binding.374 A subsequent structure determination of the same protein complexed with a larger fragment that includes the complete peptide stem of PGN, that is, MurNAc-L-Ala-D-iGln-L-Lys-D-Ala-D-Ala (muramyl pentapeptide, MPP), revealed ligand-induced conformational changes in the binding cleft, and this was suggested to occur in many PGRPs.375 Most of the protein–ligand contacts involve the peptide part of MPP, establishing hydrogen bonds between the main chain atoms of the peptide and 20 PGRP residues. Tracheal cytotoxin (TCT, cf. Section 9.07.3.3.1) induces heterodimerization between PGRP-LCa and PGRP-LCx; this is the first step to trigger immune response in Drosophila by activating the Imd pathway.376 PGRP-LCa and PGRP-LCx are both transmembrane proteins with different PGN-binding domains in their extracellular parts. The crystal structure of the ectodomains of PGRP-LCa and PGRP-LCx bridged by TCT shows that TCT binds to LCx in the ternary LCx–TCT–LCa complex through its peptide chain, in a similar way as MPP binds PGLYRP-3, and exposes the disaccharide part for interaction with LCa.377 This is in line with results obtained by biochemical methods.332 Both proteins undergo induced-fit conformational changes during this process. Bringing together the PGRP-LCx and PGRP-LCa receptors is necessary to trigger the activation step for the Imd pathway.377 Drosophila PGRP-LE, a soluble protein involved in the Imd signaling, also binds TCT.378 A crystallographic study revealed that TCT induces an infinite head-to-tail dimerization in which the disaccharide moiety occupies the dimer interface;379 this is analogous to the TCT-induced heterodimerization of PGRP-LCx and PGRP-LCa. Dimerization of PGRP-LE increases its binding strength to TCT by approximately 30-fold (KD ≈ 30 nmol l−1). 
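To put the approximately 30-fold affinity gain on an energy scale, the corresponding change in binding free energy can be estimated from ΔΔG = RT ln(fold change). The short sketch below does this arithmetic; the temperature of 298.15 K and the function name are assumptions made for illustration and are not stated in the cited study.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def free_energy_gain_kj_per_mol(fold_change, temperature_k=298.15):
    """Return RT*ln(fold_change) in kJ/mol.

    fold_change is the ratio of dissociation constants (weaker / tighter);
    temperature_k defaults to an assumed 298.15 K.
    """
    return GAS_CONSTANT * temperature_k * math.log(fold_change) / 1000.0


if __name__ == "__main__":
    gain = free_energy_gain_kj_per_mol(30.0)
    # Roughly 8.4 kJ/mol (about 2 kcal/mol) at the assumed temperature.
    print(f"30-fold tighter binding ~ {gain:.1f} kJ/mol ({gain / 4.184:.1f} kcal/mol)")
```

On this estimate the dimerization contributes an amount comparable to one or two additional hydrogen bonds, which is a modest but significant stabilization.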
In both cases the Dap carboxylate group engages in a key electrostatic interaction with the guanidino side chain of an Arg residue of the protein: this provides a basis for the discrimination between Gram-negative and Gram-positive (Dap or Lys, respectively) types of PGN by PGRPs.377,379 Other studies have, however, suggested Asn-236 and Phe-237 to be the key residues in PGLYRP-3 responsible for the discriminatory recognition of Dap-type or Lys-type PGNs.352,353 Mutation of just these two residues suffices to change the specificity from one type to the other. This is an indication of the adaptive character of the innate immune system, enabling a quick immune response to new microbial challenges.353 The crystal and molecular structure of the PGLYRP4 C-terminal domain (PGLYRP4-C) in free form and in complex with the complete repeating unit of Gram-positive PGN, GlcNAc–MurNAc-L-Ala-D-iGln-L-Lys-D-Ala-D-Ala (GMPP), was reported recently.369 The bound conformation of GMPP was compared with two reported solution structures, obtained by NMR-constrained molecular modeling: one for the dimer of GMPP (GMPP2)335 and another one for a Dap-type PGN monomer (Dap-GMPP, cf. Section 9.07.3.3.2).359 In the PGLYRP4-C–GMPP complex the glycosidic torsion angles between
GlcNAc and MurNAc change significantly with respect to the solution conformations of either Dap-GMPP or GMPP2. In particular, the ψ (C1–O1(GlcNAc)–C4–C3(MurNAc)) value in the crystal structure (161°) indicates a higher energy conformation for the disaccharide part than in solution (118° in the reference Dap-GMPP359 or 126° in GMPP2335). On the other hand, the peptide orientation in the bound state, as indicated by the D-Lac torsions, is very close to that found for Dap-GMPP in solution359 rather than that for the nonbound GMPP2.335 The peptide stems of other muropeptides (MPP or TCT) are also locked in this conformation when bound to various PGRPs,374,375,377,379 so these PGN-binding proteins appear to consistently favor one of the two possible conformations for D-Lac. Modeling of the PGLYRP4-C–GMPP2 complex indicated that there would be a steric clash between the protein and the peptide stem of GMPP2 should the latter retain its solution conformation. Based on computational modeling of the cell wall PGN structure335 it was suggested that the peptide orientation in this conformation is unfavorable for cross-linking during PGN biosynthesis. On the other hand, distortion of the glycan backbone conformation upon binding to PGRPs is likely to prevent the functions of glycan synthesizing enzymes by steric hindrance. It was hypothesized that the disruptive action of PGRPs on bacterial cell wall formation might be attributed to this double mechanism.369 Muramidases or lytic transglycosylases (LTs) cleave the MurNAc–GlcNAc glycosidic bond and play a role, just as lysozyme (Section 9.07.3.2), in the turnover and recycling of PGN to facilitate cell growth and division. Their mechanism of action is, however, different from that of lysozyme. 
Unlike the latter, and most other glycoside hydrolases, which require participation of two catalytic carboxylates at the active site, LTs have a single catalytic residue and may utilize the N-acetyl group of MurNAc to provide anchimeric assistance in catalysis, similar to the mechanism described for chitinases. As a result, cleavage of the interglycosidic bond between MurNAc and GlcNAc by LTs does not produce a reducing MurNAc end with a free glycosidic OH group; rather, a 1,6-anhydro bond is formed via nucleophilic attack by the 6-OH group of the same MurNAc residue.380 Recent crystallographic studies on the structures of protein complexes with PGN or chitin fragments have significantly contributed to our understanding of the reaction mechanisms, PGN dynamics, and immunological aspects of these systems. LTs are generally membrane-anchored proteins, except the 70 kDa Slt70, which is soluble. Its crystal structure in complex with GlcNAc-(1–4)-MurNAc(1,6-anhydro)-L-Ala-D-iGln-mDap (G(anh)MTP) revealed a shallow groove, adjacent to the PGN-binding site, for the binding of this muropeptide. The structure furthermore confirmed the presence of a specific binding site for the peptide part of G(anh)MTP and it was suggested that Slt70 starts the cleaving reaction at the MurNAc end of the PGN chain.381 A functionally similar enzyme is the 18 kDa lytic transglycosylase from bacteriophage lambda (LaL). Its crystallographic structure, determined in its form complexed with hexa-N-acetylchitohexaose, represents the first example of a lysozyme in which all binding subsites are occupied.382 Slt35 is a fully active, soluble form of the integral membrane transglycosylase MltB. 
Four sugar-binding sites and two peptide-binding sites were identified in this protein by X-ray crystallography of its complex with the muropeptide GMDP.383 It is of note in this context that the minimal structure needed to activate the Toll immune pathway in Drosophila was found to be a muropeptide dimer, GlcNAc–MurNAc-L-Ala-D-iGln-L-Lys(D-Ala-D-Ala)-(Gly)5-L-Lys[(Gly)5]-D-iGln-L-Ala-MurNAc–GlcNAc, that is, a PGN fragment with four sugar residues.384 A recent crystal structure of an inactive mutant (D308A) of MltA in complex with chitohexaose has shown that all six sugar residues are bound in the active site of the enzyme and that binding induces a large reorientation of two structural domains of the enzyme. Although the natural substrate of MltA is PGN rather than chitin, implications for PGN hydrolysis were drawn from a model of the (GlcNAc–MurNAc)3 complex built on the basis of the MltA(D308A)–chitohexaose crystal structure. Based on this model it was suggested that the cleavage of the glycosidic bond is facilitated by a high-energy half-chair conformation of the pyranose ring bound at the active site.385 This distortion is, interestingly, very similar to that proposed for lysozyme catalysis.386,387 Unlike other membrane-bound lytic transglycosylases (MLTs), however, this protein does not possess binding sites for the peptide part of PGN.385 Another enzyme that plays an important role in PGN breakdown in processes like cell wall turnover, cell separation, or sporulation is CwlC amidase from Bacillus subtilis, which hydrolyzes the amide bond between the lactyl group of MurNAc and L-Ala. The solution structure of the C-terminal domain of CwlC was determined by 3D and 4D NMR using uniform 15N- and 13C-labeling by taking advantage of h3JNC′ hydrogen bond restraints and 1DNH dipolar couplings.388 The PGN-binding region was explored by following chemical shift changes in the 1H–15N HSQC
spectra during titration with soluble PGN digests of not well-defined composition. Two equivalent, symmetrically located binding sites were proposed and it was suggested that multivalency effects, extensively studied in lectin–carbohydrate interactions, might play a role in the CwlC–PGN interaction as well.388

9.07.3.6

Physiological Activities of Muropeptides

It has long been known that bacterial cells are capable of boosting the immune response in certain diseases: ‘Freund’s adjuvant’ to treat pulmonary infection consisted of a suspension in oil of killed mycobacteria.389 Later it was recognized that the cell wall PGN was responsible for inflammatory syndromes in several diseases like arthritis, meningitis, or septic shock. In a quest to identify the structural basis of immunogenicity, smaller PGN fragments were synthesized and analyzed (for reviews, see Chedid et al.,390 Azuma,391 and www.curehunter.com/m/keywordSummaryC033575.do). The Dap-type PGN monomer, Dap-GMPP (cf. Section 9.07.3.3.2), isolated from the cell wall of the Gram-negative Brevibacterium divaricatum, is a nontoxic, nonpyrogenic immunostimulator. Its adjuvant activities on the immune system of mice challenged with ovalbumin (OVA) have been thoroughly investigated. For example, Dap-GMPP enhanced the immunogenicity of peptides of measles virus origin.361 Lipophilic derivatives of Dap-GMPP bearing either (adamant-1-yl)-acetyl- or Boc-Tyr substituents at the ε-amino group of Dap were shown by NMR and molecular modeling to assume conformations different from that of the parent Dap-GMPP in solution363 (cf. Section 9.07.3.3). Their immunostimulating activities were, however, comparable to that of the parent Dap-GMPP.362 Tracheal cytotoxin (TCT), a muropeptide derived from Gram-negative PGN, such as the cell wall of Bordetella pertussis, elicits immune responses in Drosophila.392 In addition to immunostimulating activity393 this muropeptide is a very potent somnogenic agent. This activity may be related to the conformation of the 1,6-anhydro-bridged MurNAc, which is very different (¹C₄) from that of the nonbridged, monocyclic glucopyranose ring (⁴C₁). Hundreds of small MW muropeptides, mostly synthetic or synthetically modified PGN fragments, possess multiple biological activities, influencing the immune response from insects to mammals. 
These aspects have been discussed in detail in a recent review.394 Growth of bacteria involves breakdown and resynthesis of the cell wall. Muropeptides released from the cell wall PGN during these processes mediate a much broader range of interactions between bacteria and other organisms than immune signaling alone. For instance, PGN fragments such as TCT play a role in the pathogenesis of several bacterial infections, including those caused by the Gram-negative Helicobacter pylori, the Gram-positive Listeria monocytogenes, and several other bacteria (for a review, see Boneca395). There are indications that PGN fragments might induce immune responses in plants during host–pathogen interactions and mediate various symbiotic interactions among bacteria and between bacteria and eukaryotes. All these intriguing roles of muropeptides have been summarized in a recent review.396

9.07.3.7

Conclusions

Bacterial cell wall peptidoglycans (PGN) and their fragments play important roles in the immune response of higher organisms against bacterial infections, and mediate various symbiotic interactions among bacteria and between bacteria and eukaryotes. Recent structural studies of peptidoglycans and their smaller fragments have significantly enhanced our understanding of the underlying molecular mechanisms and have thereby contributed to the development of improved strategies to fight bacterial infections. Experimentally, X-ray diffraction and NMR spectroscopy play major roles in structure elucidation and in studies of the relevant molecular interactions, in particular those between various recognition proteins and PGN-related carbohydrates. Following the characterization of PGN structural features, this section discussed synthetic approaches to smaller PGN fragments and the determination of their 3D structures by NMR and X-ray techniques. Among the proteins interacting with PGNs, peptidoglycan recognition proteins (PGRPs) have recently emerged as major pattern recognition factors that play important roles in innate immunity from insects to mammals. Their structures, those of some lytic proteins involved in cell wall biosynthesis and breakdown, and those of their complexes with muropeptides have been extensively investigated by X-ray and NMR methods in the last decade. In view of the intriguing physiological activities of a great number of muropeptides and the new knowledge generated by the structural investigations outlined here, the prospects for developing efficient therapeutics against the alarmingly increasing bacterial resistance seem promising.

Biomolecular Recognition by Oligosaccharides and Glycopeptides: The NMR Point of View


9.07.4 NMR of Glycopeptide (Vancomycin-Type) Antibiotics: Structure and Interaction with Cell Wall Analogue Peptides

9.07.4.1

Introduction

For the pessimist it may appear that we are losing the fight against the often lethal superbugs. The prevalence of superbugs such as methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Staphylococcus aureus (VRSA) is increasing rapidly both in hospitals and in the community. Until recently, the glycopeptide vancomycin remained the only antibiotic effective against infections caused by multiresistant Gram-positive bacteria, and it has been considered the last line of defense against MRSA. However, treatment with vancomycin over the past decades has led to the appearance of staphylococcus and enterococcus strains that are resistant to vancomycin and all known glycopeptides. GRE (glycopeptide-resistant enterococci) and GISA (staphylococci with an intermediate resistance to glycopeptides) infections are especially threatening. This section focuses on the molecular basis of the inhibition of bacterial cell wall biosynthesis by glycopeptide antibiotics for two reasons. The first is that vancomycin was in the front line of antibacterial therapy for nearly 30 years before resistance appeared, which is unusually long. The second is that few biomolecular recognition models have been studied at atomic resolution to the same extent as glycopeptides with cell wall analogue molecules or stable isotope labeled cell walls. For solution NMR or X-ray studies the cell wall is often mimicked by oligopeptides terminating in a D-Ala-D-Ala sequence. The interaction with the ‘intact’ cell wall is best studied by solid-state NMR under conditions close to in vivo, and research in the field has shifted in this direction. However, some fine details, especially the dynamics upon ligand binding, still await further experimental and theoretical study.

9.07.4.2

Basics of Mode of Action

9.07.4.2.1

Structure of the cell wall

Surprisingly, the 3D structure of the cell wall of Gram-positive bacteria has only recently been solved335 (see also Section 9.07.3). The basic structural unit of the cell wall has long been known to be a peptidoglycan, N-acetylglucosamine (NAG)–N-acetylmuramic (NAM)-pentapeptide. Meroueh et al., who recently synthesized a NAG–NAM(pentapeptide)–NAG–NAM(pentapeptide) dimer, have now succeeded in determining the structure of this 2 kDa peptidoglycan by solution NMR spectroscopy. Their analysis reveals an ordered, right-handed, helical saccharide conformation, with a set of repeated glycosidic torsion angles that led the authors to suggest an oligomer structure with three NAG–NAM repeats per turn. Computer models of the cell wall structure, assuming such threefold symmetry as well as incomplete cross-linking, show a honeycomb pattern with pores ranging in size from 70 to 120 Å. These pores are large enough to accommodate the cell wall’s own catalytic machinery as well as channel proteins and other macromolecules. Furthermore, contrary to earlier belief, the glycan strand is oriented orthogonal to the membrane and not parallel to it, as many had previously assumed on the basis of the structures of chitin and cellulose, the two other main wall-forming β-1,4-linked glycan biopolymers in nature.

9.07.4.2.2

Main targets of glycopeptide antibiotics

The layer of the bacterial cell wall that provides strength is this covalently cross-linked peptidoglycan. The larger the fraction of adjacent peptide strands connected by the action of transpeptidases, the higher the mechanical resistance to osmotic lysis. Transglycosylases act on the glycan strands to extend the sugar chains by incorporating new peptidoglycan units from N-acetylglucosamine-1,4-N-acetylmuramyl-pentapeptide-pyrophosphoryl-undecaprenol (lipid II). The vancomycin family of glycopeptide antibiotics targets the peptidoglycan layer during cell wall assembly. Briefly, vancomycin antibiotics act by binding peptidoglycan cell wall fragments terminating in a D-Ala-D-Ala sequence in the carboxylate anion-binding pocket of the antibiotic, as suggested by Nieto and Perkins397 and Perkins398 and later confirmed by the Williams group in Cambridge.399–402 Vancomycin ties up the peptide substrate and thereby prevents it from reacting with either the transpeptidases or the transglycosylases. The net effect is the same: failure to make peptidoglycan cross-links leads to a weaker wall that predisposes the treated bacteria to lethal lysis of the cell wall layer. The concave binding pocket of the vancomycin antibiotic makes five hydrogen bonds to the D-Ala-D-Ala dipeptide terminus of each uncross-linked peptidoglycan pentapeptide side chain, which accounts for the high affinity of the antibiotic for its target, both in partially cross-linked walls and in the lipid II intermediate. Interestingly, some semisynthetic derivatives of eremomycin, vancomycin, teicoplanin, and some other glycopeptides were found to exhibit activity against viruses,403 for example, HIV-1, HIV-2, and Moloney sarcoma retrovirus.

9.07.4.2.3

Common structural features of the glycopeptide antibiotics

The first known structure within the glycopeptide family of antibiotics was that of vancomycin. Vancomycin, clinically the most important glycopeptide, was discovered in 1956; however, its structure was disclosed only 25 years later, after the previous X-ray structure406 of CDP-1 had been corrected404,405 by NMR.407 Since that time hundreds of related glycopeptides have been discovered or prepared as semisynthetic compounds. More than 12 000 papers on vancomycin were published within the last 20 years. Some recent reviews408–413 give in-depth accounts of the topic. Here we only summarize some basic structural features of the main representatives (vancomycin, eremomycin, teicoplanin, ristocetin-A) in Scheme 6. All glycopeptides known so far have a heptapeptide aglycon, and the residues are numbered starting from the N terminus (1–7). The polypeptide portion of vancomycin consists of an N-methyl-leucine at position 1, substituted phenylglycines at positions 2, 4, 5, 6, and 7, and an asparagine at position 3. In all cases examined, residues 1–7 of vancomycin antibiotics have the configurations R, R, S, R, R, S, and S, respectively. In many of these glycopeptides, residues 1 and 3 are aliphatic, as in vancomycin. In other cases, for example in ristocetin and teicoplanin, residues 1 and 3 are also aromatic amino acids and, not surprisingly, they are substituted phenylglycines. Aromatic rings 2–4 and 4–6 are cross-linked via ether oxygens, and if residues 1 and 3 are aromatic, they may be cross-linked similarly. The phenyl rings of residues 5 and 7 are directly connected by a carbon–carbon bond. The mesh of cross-linked aromatic rings lends enhanced rigidity to the backbone of the glycopeptides. Moreover, the chlorine-bearing aromatic rings of residues 2 and 6 are prevented from rotating by other atoms in their macrocycles (with residue 4). Thus, these rings give rise to stable and distinct rotational isomers, that is, atropisomers.414,415 Vancomycin is tricyclic; with two possible orientations for each of its three ring systems there are 2³ = 8 possible atropisomers, but the natural product is a single isomer. In vancomycin and other glycopeptides that have one chlorine atom on one or both rings, the chlorine on ring 2 is on the edge facing away from the ligand-binding site, whereas on ring 6 it is on the edge facing toward the ligand-binding site. The peptide bond between residues 5 and 6 adopts the cis configuration, which is normally the less stable one; nevertheless, the cis arrangement dominates among glycopeptide antibiotics.

The role of carbohydrates in glycopeptide antibiotics is somewhat less appreciated: they make the glycopeptides water soluble in spite of the hydrophobic aromatic rings. Glucose units of di- or oligosaccharides are typically bound to the central aromatic ring 4 by a β-glycosidic linkage. They are often linked to positively charged amino sugars (e.g., vancosamine, eremosamine = 4-epi-vancosamine) by an α-(1–2) glycosidic bond. The 180° conformational flips of these oligosaccharides explain the generation of slowly interconverting asymmetric homodimers416 in aqueous solution. In the system of eremomycin complexed with an unnatural ligand, the positively charged amino group of the eremosamine monosaccharide at residue 6 forms an ‘intradimeric’ salt bridge to the bound carboxylate anion of the ligand. Carbohydrate–carbohydrate recognition was also postulated between the two disaccharides on the two halves of the dimer.

9.07.4.3

NMR Methods for Solution Structure

According to known structures, the molecular weight of the monomeric glycopeptides is in the 1.3–2.1 kDa range. In aqueous solution (in vitro conditions) nearly all vancomycin antibiotics except teicoplanin form stronger or weaker noncovalent dimers at concentrations useful for NMR; consequently, the effective molecular weight, especially when they are complexed with ligands, approaches the 4–5 kDa range, which is close to the mass of small proteins. In the beginning, ¹H NMR methods were applied almost exclusively. The early application of negative homonuclear NOEs (nuclear Overhauser effects) in the slow-motion regime399,417 was a landmark in the determination of intermolecular distances. These NOE experiments proved as invaluable as the 2D NOESY experiment introduced for protein structures by Ernst418 and Wüthrich.419
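The sign change of the homonuclear NOE mentioned above can be illustrated with the standard textbook expression for the maximum steady-state enhancement of an isolated two-spin system under isotropic tumbling. The sketch below is not taken from the cited studies; the function name, the 500 MHz proton frequency, and the correlation times are illustrative assumptions.

```python
import math


def homonuclear_noe_max(tau_c_s: float, proton_freq_hz: float) -> float:
    """Maximum steady-state 1H-1H NOE enhancement for an isolated spin pair.

    Assumes a rigid molecule tumbling isotropically with correlation time
    tau_c_s (seconds); proton_freq_hz is the 1H Larmor frequency in Hz.
    """
    x = (2.0 * math.pi * proton_freq_hz * tau_c_s) ** 2  # (omega * tau_c)^2
    return (5.0 + x - 4.0 * x * x) / (10.0 + 23.0 * x + 4.0 * x * x)


if __name__ == "__main__":
    # Hypothetical correlation times, chosen only to span both regimes.
    for tau_c_ns in (0.05, 0.2, 0.5, 1.0, 2.0):
        eta = homonuclear_noe_max(tau_c_ns * 1e-9, 500e6)
        print(f"tau_c = {tau_c_ns:4.2f} ns  ->  eta_max = {eta:+.3f}")
```

In this model the enhancement is +0.5 in the fast-motion limit, passes through zero where ωτc = √5/2 (about 1.12), and tends to −1 for slow tumbling, which is the regime reached by dimeric glycopeptide–ligand complexes.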

Scheme 6 Structure of some important glycopeptide antibiotics.


NMR studies provided NOE-derived distances between ristocetin and Ac-D-Ala-D-Ala that established the bound position of the dipeptide as lying antiparallel to the antibiotic backbone in the concave exterior of the molecular surface.400 Because the 1980s saw the rapid development of the basic 2D NMR methods, COSY, TOCSY, NOESY, ROESY, HETCOR, HSQC, and HMBC techniques could readily be applied to structural work at natural isotope abundance (see also Section 9.07.1). In fact, the general solution NMR methods applicable to oligopeptides and peptide–peptide interactions are equally useful for glycopeptides. A few textbooks can be cited for more details and practical hints.420–422 Isotope labeling has been sparse in solution NMR studies, limited to ¹⁵N labeling of eremomycin423 and ¹³C incorporation at C1 of Ac-D-Ala-D-Ala,424 because the general sensitivity is sufficient for assignment purposes. Among others, ¹³C NMR signal assignments of eremomycin,425 vancomycin,426 and a few aglycons427 were published. Application of 3D techniques would be an advantage for the congested spectra of dimers in the absence of ligands; however, no applications have been reported.

9.07.4.4

Comparison of Crystal and Solution Structures – Dimerization and Ligand Binding

The first two reports of the crystal structure of vancomycin complexed with an acetate anion were published independently404,405 and were milestone contributions. It was concluded that vancomycin dimerizes in a ‘back-to-back’ configuration, with segments of antiparallel structure forming both at the ‘back’ of each monomer, across the dimer interface between heptapeptide backbones, and at the ‘front’ of each monomer, between the drug and its ligand. In the crystal,404 one of the binding pockets is occupied by an acetate ion that mimics the C terminus of the cell wall peptide; the other is closed by the asparagine side chain, which occupies the place of a ligand. The occupied binding pocket exhibits high flexibility, but the closed binding pocket is relatively rigid. X-ray crystallography has since verified this back-to-back antiparallel configuration in balhimycin428,429 crystals, and has also shown that the aromatic rings and the sugar residues in these compounds engage in an extensive array of interactions across the dimer interface. The dimeric structure of ureido-balhimycin was established by NOEs, and distance geometry calculations proved the antiparallel orientation of the two monomers.430 Important reports were also published on vancomycin crystals complexed with NAc-D-Ala, which suggested a ligand-mediated dimerization mode for vancomycin,431 and on crystals with weakly binding ligands.432 Concerning the cooperativity between dimerization and ligand binding in vancomycin, it was suggested433 that hydrogen bonds across the dimer interface may reduce dynamic fluctuations of the heptapeptide backbone and thereby stabilize hydrogen bonds between the backbone and the ligand. The crystallographic data appear to weigh against this explanation because the shape of the macrocyclic rings in vancomycin and balhimycin is tightly conserved in spite of differing hydrogen bond geometries. Loll first interpreted this as evidence that structural rigidity is an inherent property of these molecules, irrespective of whether their backbones are hydrogen bonded or not. However, higher affinity ligands may induce structural change432 in the antibiotic when compared with low-affinity ligands. All of these crystal structures show the essential features of the unsymmetric homodimers first demonstrated in the NMR study416 of eremomycin.425,434 In these structures the H-bond pattern between the two halves of the dimer forms a two-stranded antiparallel β-sheet. The monomers have concave shapes, offering the binding site at the exterior of the dimer.

NMR structures of aglyco-vancomycin (AGV) were published in a D₂O/DMSO (4:1) mixture435 and in neat DMSO.436 While ligand binding is demonstrated in the former, there is no dimer formation or ligand binding in the latter. Significant conformational differences exist between the NMR structure in neat DMSO and the crystal structure,437 most notably in the ligand-binding site and in the aromatic rings of residues 2, 4, and 6. The NMR structure shows the peptide backbone bulging outward in the vicinity of residues 2 and 3 to form the so-called β-pleated sheet conformation, which is unfavorable for both ligand binding and dimerization. However, the aglycon crystals were grown from an aqueous solution that supports dimerization. Moreover, AGV is indeed capable of adopting an active conformation much like that of vancomycin, and therefore the inactive conformation is not a result of the removal of the sugar residues.

A detailed NMR study438 of chloroeremomycin (A82846B) and the pentapeptide ligand Ala-γ-Glu-Lys-D-Ala-D-Ala proved that the complex of A82846B with its cell wall pentapeptide forms an asymmetrical dimer similar to that seen for eremomycin complexed with the unnatural (pyrrole-2-carboxylate) ligand.416 Prowse suggested that the carboxylate group may assume more than one orientation in the binding pocket and that the side chain of Asn-3 is an integral part of the hydrogen-bonding network. On the other hand, multiple binding modes were not found in ¹³C{¹H} heteronuclear NOE studies of vancomycin,424 nor was a role attributed to Asn-3 in eremomycin–ligand complexes.416


Scheme 7 The heptapeptide backbones of glycopeptide antibiotics forming back-to-back and face-to-face complexes with Ac-D-Ala.

A new dimeric form of vancomycin has been found431 in which two monomers are related in a face-to-face configuration, and bound NAc-D-Ala ligands comprise a large portion of the dimer interface. A virtually infinite chain of vancomycin monomers makes up the crystal lattice. These chains consist of alternating back-to-back and face-to-face contacts between monomers (Scheme 7). The biological significance of these new oligomers is not yet clear. Dimerization is believed to promote antimicrobial action because the binding of one monomer to the bacterial cell wall brings a second monomer into proximity with other peptidoglycan ligands, leading to the formation of a ‘chelate’ with the peptidoglycan. Back-to-back dimerization also increases the affinity between ligands and individual monomers. At least three mechanisms have been suggested to explain this allosteric effect. First, dimerization may induce conformational changes, or suppress thermomolecular deformations, and thereby yield a more favorable site for ligand binding. Second, dimerization may position positively charged groups such that the negatively charged carboxyl terminus of a ligand is attracted into the binding site. Third, antiparallel structure in a dimer may polarize the backbone peptide groups involved in ligand binding and strengthen their hydrogen bonding potential. These mechanisms are mutually compatible, and all may contribute to ligand affinity. In the X-ray study439 of balhimycin and degluco-balhimycin complexed with di-, tri-, and pentapeptides, an unexpected variability in the extent of oligomerization and in binding modes was observed. The appearance of face-to-face oligomers (tetramers, hexamers, and octamers), and even of virtually infinite layers, is not due to crystal contacts; it depends on the arrangement of the ligand in the binding pocket and has little impact on the drug backbone conformation.


However, bigger peptide ligands cause enhanced backbone bending of the drug. Face-to-face dimers are formed when the model peptide reaches a critical fraction of the size of the cell wall precursor pentapeptide. The extensive interactions in this interface should enhance the kinetic and thermodynamic stability of the complexes. In the pentapeptide complex, the relative positions of the peptides are close to those required for D-Ala elimination, so this structure may provide a model for the prevention of enzyme-catalyzed cell wall cross-linking by antibiotic binding. Interesting new results were obtained using more realistic cell wall mimics. In the absence of drug dimers (vancomycin concentration 0.081 mmol l⁻¹), the binding affinity of vancomycin for the large glycopeptide (NAG–NAM-peptide) dimer was measured440 by isothermal titration calorimetry (ITC). Two sequential binding events were observed, resulting in a final 2:1 vancomycin/cell wall analogue complex, and the association constant for the second event was twice that of the first. An additional favorable enthalpic increment and a more unfavorable entropic contribution were observed experimentally for the second binding step. Molecular dynamics simulations also displayed reduced motion, thereby supporting the ITC results. A challenging review was published on the structural biology aspects of vancomycin antibiotics.441

9.07.4.5

Possible Role of Dynamics Upon Ligand Binding

There is a consensus on the main structural motifs of molecular recognition in vancomycin antibiotics complexed with cell wall analogues. However, subtle details, for example enthalpy–entropy compensation442–445 and allosterism, remain to be clarified. Following other studies, recent molecular dynamics calculations446,447 suggest that the known cooperativity433,448–451 between antibiotic dimerization and ligand binding could be explained by the nonadditivity of the entropic costs of dimerization and ligand binding. It was suggested that, in the absence of major conformational changes or other enthalpy-driven processes, enhanced internal molecular dynamics up to the picosecond timescale may by itself be responsible for the observed cooperativity. NMR order parameters S² can be derived from ¹⁵N NMR auto- and cross-relaxation, and they sample a similar range of internal timescales (picosecond range). For eremomycin, ¹⁵N relaxation showed that the two sides of the dimer are dynamically equivalent and that the binding pocket is the site least protected from solvent access.423 Experimental NMR evidence was found for a 180° rotation of the peptide group between residues 2 and 3 in vancomycin.452 Recent molecular dynamics calculations suggested453 a breathing mechanism that is able to enhance desolvation of the binding site. According to MD calculations, the barriers to the rotation of two different backbone peptide groups are sufficiently low, and their rotation destabilizes the water captured in the binding pocket. After the water molecule is expelled from the binding cavity, a key first step for molecular recognition, the ligand has a chance to enter.
Isolated examples in proteins have demonstrated that the binding of a small hydrophobic ligand can increase the internal dynamics of the host, which may outweigh the entropic cost of association.454 On the other hand, Williams emphasizes the importance of structural tightening upon cooperative ligand binding.455 According to Williams, the structural model of the ristocetin-A dimer system leads to the conclusion that positively cooperative binding will reduce the dynamic behavior of the receptor system. The importance of structural tightening, as opposed to partially bound states, was underlined to explain chemical shift changes upon binding.456,457 The idea was further extended to support general principles of ligand-induced reduction in motion within receptors and enzymes.458

9.07.4.6

Solid-State NMR of Glycopeptide Antibiotics with Bacterial Cell Wall Complexes

Some promising new compounds have recently been tested successfully against pathogenic MRSA strains. Among them, the chlorobiphenyl derivatives could be easily transformed into fluoro derivatives. The ¹⁹F nucleus is an excellent spy for solid-state NMR, where the weak ¹³C–¹⁹F or ¹⁵N–¹⁹F dipolar interactions detected can ‘see’ over long distances, significantly farther than NOEs in solution. A recent review459 summarizes the potential of contemporary solid-state REDOR460 (rotational echo double resonance) and TEDOR461 (transferred echo double resonance) techniques, with interesting applications to peptide antibiotics and also with the use of ³¹P and ²H nuclei. In general, these techniques can measure internuclear distances between 6 and 20 Å and work on S–I (rare–abundant) spin pairs in which the rare spin S is observed. The advantages of REDOR are that it is independent of the chemical shift tensor of the coupled nuclei and does not require resolution of the S–I dipolar coupling on the S chemical shift scale. On the other hand, in some cases comparison with unlabeled samples requires extra experiments. This problem can be circumvented by the TEDOR461 method, which can select the coupled spins from the background of uncoupled nuclei. TEDOR and REDOR can be used in combination, which allows the measurement of the S–X distance in an I–S–X labeled three-spin system. Since ¹⁹F is present in some of the new-generation vancomycin antibiotics (e.g., fluorobiphenyl chloroeremomycin, LY329332), ¹³C{¹⁹F} REDOR can position the single ¹⁹F nucleus with respect to the natural-abundance ¹³C nuclei (or selectively ¹³C-labeled Gly and D-Ala and/or [¹⁵N]Lys nuclei) of cell wall constituents that are resolved on their chemical shift scale.462 Binding affinity data showed that the enhanced potency of LY329332 and of its chlorinated analogue (both 1000-fold greater than that of vancomycin against vancomycin-resistant enterococci) is not reflected in an increased binding affinity for mature peptidoglycan in the tested normal (not resistant) S. aureus strain (ATCC 6538P). The binding model assumes that the vancomycin cleft binds to a stem terminating in D-Ala-D-Ala. In the model, the fluorine of the biphenyl moiety is not near the L-Ala of the complexed stem, but rather near the L-Ala of a nearest-neighbor stem on an adjacent glycan strand. This nearest-neighbor stem carries a bridge (85% of all stems have bridges), and this arrangement is the source of the ¹⁹F coupling to the carbonyl carbons of D-Ala and Gly. The complex is presumably stabilized by interactions of the sugars of LY329332 with proximate glycans. The model of the mature peptidoglycan–LY329332 complex is putative, since only three distances could be determined with respect to the ¹⁹F spy nucleus. However, a sensible model was built without assuming either dimeric antibiotics or membrane anchoring of the fluorobiphenyl tail.
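The distance information in REDOR and TEDOR comes from the heteronuclear dipolar coupling, which scales with the product of the two gyromagnetic ratios and with r⁻³. The following minimal sketch evaluates the standard dipolar coupling constant for the spin pairs named above; the function name and the table of gyromagnetic ratios (approximate literature values) are assumptions made for illustration and are not part of the cited work.

```python
import math

MU0_OVER_4PI = 1.0e-7      # vacuum permeability / 4*pi, in T m A^-1
HBAR = 1.054571817e-34     # reduced Planck constant, in J s

# Approximate gyromagnetic ratios in rad s^-1 T^-1 (literature values).
GAMMA = {
    "13C": 6.728284e7,
    "15N": -2.712618e7,
    "19F": 2.518148e8,
}


def dipolar_coupling_hz(nucleus_a: str, nucleus_b: str, r_angstrom: float) -> float:
    """Magnitude of the dipolar coupling constant (Hz) for two spins r apart."""
    r_m = r_angstrom * 1.0e-10
    gamma_product = abs(GAMMA[nucleus_a] * GAMMA[nucleus_b])
    return MU0_OVER_4PI * gamma_product * HBAR / (2.0 * math.pi * r_m ** 3)


if __name__ == "__main__":
    for r in (6.0, 10.0, 15.0, 20.0):
        d_cf = dipolar_coupling_hz("13C", "19F", r)
        d_nf = dipolar_coupling_hz("15N", "19F", r)
        print(f"r = {r:4.1f} A   13C-19F: {d_cf:7.1f} Hz   15N-19F: {d_nf:7.1f} Hz")
```

With these constants the ¹³C–¹⁹F coupling is roughly 28 kHz at 1 Å, so it is on the order of a hundred hertz at 6 Å and only a few hertz at 20 Å. The large gyromagnetic ratio of ¹⁹F is what keeps such long distances within reach of the experiment.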
The sample preparation conditions are probably relevant to the picture obtained by solid-state NMR. It must be emphasized that mature, normal (not resistant) S. aureus strains were used in this case. The antibiotic concentrations in these studies are close to the upper limit in human clinical treatment (100–200 μmol l⁻¹). Lyophilization of cell walls and intact cells complexed with LY329332, for example, resulted in anhydrous samples in which trehalose was applied to mimic water, in accordance with the water replacement hypothesis. The results contradicted the generally accepted aqueous solution NMR structures in that neither dimerization nor membrane anchoring was observed in the solid state. The absence of dimers is not too surprising, because reversible dimers are documented exclusively in aqueous solution (e.g., the monomeric vancomycin–D-Ala-D-Ala complex was analyzed in DMSO463). Furthermore, at an antibiotic concentration of 0.1–0.2 mmol l⁻¹ the dimer concentration would be low for a typical Kdim = 500–1000 l mol⁻¹, and one expects only 5–15% dimer, except for eremomycin and chloroeremomycin, where the dimer would exceed 90% in aqueous solution. These two facts, together with the use of mature cells, may at least in part explain why dimers were not detected in solid-state NMR studies. Similar investigations464 determined the effect of vancomycin on cell wall assembly in normal S. aureus during active cell division. It was suggested that at the therapeutic level vancomycin interrupts peptidoglycan synthesis by interfering with transglycosylation. In other work364 the Schaefer group studied ¹⁹F derivatives of vancomycin, eremomycin, and chloroeremomycin, including modifications at the C terminus, that are active against resistant strains. The REDOR technique was used to measure the dipolar couplings between the ¹⁹F of the drugs and ¹³C and ¹⁵N labels incorporated in peptidoglycan (PG) stems and bridging pentaglycyl segments.
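The dimer estimate quoted above for 0.1–0.2 mmol l⁻¹ antibiotic can be checked roughly with a simple monomer–dimer equilibrium, 2 M ⇌ D with Kdim = [D]/[M]² and total concentration C = [M] + 2[D]. The sketch below assumes this ideal two-state model with no ligand present; the function name is illustrative, and the large Kdim used for a strongly dimerizing glycopeptide is a hypothetical value, not one given in the text.

```python
import math


def dimer_fraction(k_dim_l_per_mol: float, total_conc_mol_per_l: float) -> float:
    """Fraction of antibiotic molecules present in dimers for 2 M <=> D.

    Solves 2*K*[M]^2 + [M] - C = 0 for the free monomer concentration [M]
    and returns 2*[D]/C = 1 - [M]/C.
    """
    k, c = k_dim_l_per_mol, total_conc_mol_per_l
    monomer = (-1.0 + math.sqrt(1.0 + 8.0 * k * c)) / (4.0 * k)
    return 1.0 - monomer / c


if __name__ == "__main__":
    for k_dim in (500.0, 1000.0):
        for conc in (1.0e-4, 2.0e-4):
            frac = dimer_fraction(k_dim, conc)
            print(f"Kdim = {k_dim:6.0f} l/mol, C = {conc * 1e3:.1f} mmol/l: "
                  f"{100.0 * frac:.1f}% of molecules in dimers")
    # Hypothetical strongly dimerizing case (value assumed for illustration).
    print(f"{100.0 * dimer_fraction(3.0e6, 1.0e-4):.0f}% for Kdim = 3e6 l/mol")
```

Under these assumptions, roughly 8–23% of the molecules are in dimeric form across the quoted ranges (equivalently, a dimer concentration of about 4–12% of the total), which is of the same order as the estimate in the text, while a Kdim in the 10⁶ l mol⁻¹ range gives well over 90%.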
They improved their TEDOR/molecular dynamics model of [¹⁹F]oritavancin465 in the following respects: the pentaglycyl bridge is now believed to be helical; the pentaglycyl bridging segment is lowered into a protective cleft formed by the 4-eremosamine and the glycopeptide core; the D-iso-Gln of the bound stem is moved up toward the 4-eremosamine moiety; and the unbound neighboring stem is moved away from the C terminus of the glycopeptide core and from proximity to the bound stem. These views are supported by some biological evidence showing that sugars on glycopeptides may improve antimicrobial activity without enhancing binding affinity.466 In the same work, which used bioaffinity mass spectrometry, the beneficial impact of drug self-dimerization was questioned. Very recent studies467 using oritavancin with and without the D-Ala binding pocket suggest that oritavancin has a dual mode of action. First, transglycosylation is inhibited via binding to lipid II. Second, correlation of the model structures with antibiotic activity led to the conclusion that the hydrophobic substituent of the drug disaccharide and components of the aglycon structure form a secondary binding site for pentaglycyl segments in S. aureus. The authors proposed that this secondary binding site compensates for the loss of binding affinity to D-Ala-D-Lac stem termini, and thus allows the disaccharide-modified glycopeptides to maintain their activity against VRSA. Similar studies468 using vancomycin derivatives with a damaged D-Ala binding cleft were initiated. REDOR techniques were used to study the binding modes of des-N-methylleucyl-4-(4-fluorophenyl) benzyl-vancomycin (DFPBV), the fluorinated analogue of des-N-methylleucyl-4-(4-chlorophenyl) benzyl-vancomycin (DCPBV), which is active against both vancomycin-susceptible and vancomycin-resistant bacteria. Importantly, the lack of a hydrophobic side chain in the 4-disaccharide moiety of des-N-methylleucyl-vancomycin (DV) causes loss of activity against both types of bacteria. The proposed mode of action of DFPBV is as follows: it binds to the template peptidoglycan, where it is positioned to inhibit transpeptidase activity by nonspecific steric interference. This mechanism is independent of the interaction with lipid II and requires no specific binding to enzymes. DFPBV may also bind at nascent peptidoglycan sites even though lattice constraints are only partially formed.

In summary, the proposed mode of action of vancomycin and second-generation glycopeptide antibiotics differs significantly from the view suggested until now by the interpretation of in vitro solution NMR results. The lack of drug dimers and membrane anchoring is surprising; however, the anhydrous environment used for solid-state NMR may in part explain these results. The potential of recent solid-state NMR methods can be extended if additional spy nuclei can be introduced into active drugs, and if biological targets from resistant enterococci and staphylococci become available.

9.07.4.7

Conclusions

Glycopeptide antibiotics are believed to represent a last line of defense against often lethal Gram-positive bacterial infections. Vancomycin resistance is now widespread, and new antibiotics must be developed to cope with multiresistant strains. Although the structures of vancomycin antibiotics and the essentials of their mode of action were established decades ago, the fine details of these delicate molecular recognition processes are still not fully understood. Up-to-date NMR methods (both solution and solid-state), X-ray crystallography, in silico calculations, and calorimetry all contribute to a better understanding of the mode of antibacterial action. Surprisingly, some recent semisynthetic glycopeptide ‘antibiotics’ exhibit remarkable antiviral effects, thereby extending the scope of glycopeptide research.

Acknowledgments

For Section 9.07.1: K. E. K. acknowledges support from the Hungarian Scientific Research Fund (OTKA NK 68578 and T-048713). D. U. acknowledges the support of the Wellcome Trust (078780/Z/05/Z). Both authors thank Nicholle G. A. Bell (The University of Edinburgh) for reading the manuscript. For Section 9.07.3: L. S. acknowledges financial support from the Hungarian Scientific Research Fund (grants T-048713 and NK-68578). For Section 9.07.4: G. B. acknowledges financial support from the Hungarian Scientific Research Fund (OTKA NK 68578).

Abbreviations

AGV          aglyco-vancomycin
APME         additive potential maximum entropy
BIRD         bilinear rotation decoupling
C12E5        pentaethylene glycol mono-n-dodecyl ether
C8E5         pentaethylene glycol octyl ether
COSMO-HSQC   cosine modulated heteronuclear single quantum correlation
CT           constant time
DCPBV        des-N-methylleucyl-4-(4-chlorophenyl)benzyl-vancomycin
DEPT         distortionless enhanced polarization transfer
DFPBV        des-N-methylleucyl-4-(4-fluorophenyl)benzyl-vancomycin
DFT          density functional theory
DHPC         dihexanoyl-phosphatidylcholine
DMPC         dimyristoyl-phosphatidylcholine
DMSO         dimethyl-sulfoxide
DQF-COSY     double quantum filtered correlation spectroscopy
DV           des-N-methylleucyl-vancomycin
E.COSY       exclusive correlation spectroscopy
GDO          generalized degree of order
GISA         staphylococci with an intermediate resistance to glycopeptides
GlcNAc       N-acetyl-D-glucosamine
GMPP         GlcNAc-(1-4)MurNAc-pentapeptide
GRE          glycopeptide-resistant enterococci
HCCH-COSY    3D experiment correlating Ha, Ca, and Hb in an (–HaCa...CbHb–) segment
HCCH-TOCSY   3D experiment correlating Ha, Ca, and Hb in an (–HaCa...CbHb–) segment
HCP          heteronuclear cross-polarization
HETLOC       pulse sequence for determination of heteronuclear long-range couplings
HMBC         heteronuclear multiple bond correlation
HMQC         heteronuclear multiple quantum correlation
HSQC         heteronuclear single quantum correlation
HSQMBC       heteronuclear single quantum multiple bond correlation
INADEQUATE   incredible natural abundance double quantum transfer experiment
IPAP         in-phase antiphase
ITC          isothermal titration calorimetry
LPS          lipopolysaccharide
LT           lytic transglycosylase
LTA          lipoteichoic acid
mDap         meso-diaminopimelic acid
MDP          MurNAc-dipeptide
MLT          membrane transglycosylase
MPP          MurNAc-pentapeptide
MRSA         methicillin-resistant Staphylococcus aureus
MurNAc       N-acetylmuramic acid
NAG          N-acetylglucosamine
NAM          N-acetylmuramic acid
NMR          nuclear magnetic resonance
NOD          nucleotide-binding oligomerization domain
NOESY        nuclear Overhauser effect spectroscopy
PAF          principal aligning frame
PALES        prediction of the alignment from structure
PAMP         pathogen-associated molecular pattern
PGM          peptidoglycan monomer
PGN          peptidoglycan
PGRP         peptidoglycan recognition protein
PRR          pattern recognition receptor
RDC          residual dipolar coupling
REDOR        rotational echo double resonance
ROESY        rotational nuclear Overhauser effect spectroscopy
S3CT         spin-state-selective coherence transfer
SPITZE       spin state selective zero overlap
TCT          tracheal cytotoxin
TEDOR        transferred echo double resonance
TOCSY        total correlated spectroscopy
TRAMITE      tracking alignment from the moment of inertia tensor
TRNOE        transferred nuclear Overhauser enhancement
T-ROESY      tilted rotational nuclear Overhauser effect spectroscopy
VRSA         vancomycin-resistant Staphylococcus aureus

References

1. J. Dabrowski, Two-dimensional and Related NMR Methods in Structural Analyses of Oligosaccharides and Polysaccharides. In Two-Dimensional NMR Spectroscopy. Applications for Chemists and Biochemists; W. R. Croasmun, R. M. K. Carlson, Eds.; VCH: New York, 1994; pp 741–783.
2. L. E. Lerner, Carbohydrate Structure and Dynamics from NMR Spectroscopy and Its Application to Biomedical Research. In NMR Spectroscopy and Its Applications to Biomedical Research; S. K. Sarkar, Ed.; Elsevier Science B.V.: Amsterdam, 1996; pp 313–344.
3. C. A. Bush; M. Martin-Pastor; A. Imberty, Annu. Rev. Biophys. Biomol. Struct. 1999, 28 (1), 269–293.
4. A. S. Serianni, Carbohydrates. In Bioorganic Chemistry; S. M. Hecht, Ed.; Oxford University Press: New York, 1999; pp 244–312.
5. J. O. Duus; C. H. Gotfredsen; K. Bock, Chem. Rev. 2000, 100 (12), 4589.
6. W. A. Bubb, J. Magn. Reson. Part A 2003, 19A (1), 1–19.
7. Y. Kajihara; H. Sato, Trends Glycosci. Glycotechnol. 2003, 15 (84), 197–220.
8. M. Hricovini, Curr. Med. Chem. 2004, 11 (19), 2565–2583.
9. J. L. M. Jansson; A. Maliniak; G. Widmalm, Conformational Dynamics of Oligosaccharides: NMR Techniques and Computer Simulations. In NMR Spectroscopy and Computer Modeling of Carbohydrates: Recent Advances; J. F. G. Vliegenthart, R. J. Woods, Eds.; John Wiley & Sons: Chichester, 2006; Vol. 930, pp 20–39.
10. H. van Halbeek, Curr. Opin. Struct. Biol. 1994, 4 (5), 697–709.
11. E. Fukushi, Biosci. Biotechnol. Biochem. 2006, 70 (8), 1803–1812.
12. K. Furihata; H. Seto, Tetrahedron Lett. 1998, 39 (40), 7337–7340.
13. A. Padilla; G. W. Vuister; R. Boelens; G. J. Kleywegt; A. Cave; J. Parello; R. Kaptein, J. Am. Chem. Soc. 1990, 112 (13), 5024–5030.
14. J. N. Breg; R. Boelens; G. W. Vuister; R. Kaptein, J. Magn. Reson. 1990, 87 (3), 646–651.
15. P. Dewaard; R. Boelens; G. W. Vuister; J. F. G. Vliegenthart, J. Am. Chem. Soc. 1990, 112 (8), 3232–3234.
16. G. V. T. Swapna; R. Ramachandran, J. Magn. Reson. 1992, 100 (1), 166–170.
17. D. Uhrin, J. Magn. Reson. 2002, 159 (2), 145–150.
18. L. P. Yu; R. Goldman; P. Sullivan; G. F. Walker; S. W. Fesik, J. Biomol. NMR 1993, 3 (4), 429–441.
19. P. T. Robinson; T. N. Pham; D. Uhrin, J. Magn. Reson. 2004, 170 (1), 97–103.
20. H. Bircher; C. Muller; P. Bigler, Magn. Reson. Chem. 1991, 29 (7), 726–729.
21. L. Poppe; H. Vanhalbeek, J. Magn. Reson. 1992, 96 (1), 185–190.
22. D. Uhrin; J. R. Brisson; D. R. Bundle, J. Biomol. NMR 1993, 3 (3), 367–373.
23. D. Uhrin; J. R. Brisson; G. Kogan; H. J. Jennings, J. Magn. Reson. B 1994, 104 (3), 289–293.
24. M. J. Gradwell; H. Kogelberg; T. A. Frenkiel, J. Magn. Reson. 1997, 124 (1), 267–270.
25. C. Roumestand; C. Delay; J. A. Gavin; D. Canet, Magn. Reson. Chem. 1999, 37 (7), 451–478.
26. R. Laatikainen; M. Niemitz; U. Weber; J. Sundelin; T. Hassinen; J. Vepsalainen, J. Magn. Reson. Ser. A 1996, 120 (1), 1–10.
27. J. J. Titman; J. Keeler, J. Magn. Reson. 1990, 89 (3), 640–646.
28. K. E. Kövér; D. Uhrin; V. J. Hruby, J. Magn. Reson. 1998, 130 (2), 162–168.
29. M. J. Thrippleton; J. Keeler, Angew. Chem. Int. Ed. Engl. 2003, 42 (33), 3938–3941.
30. F. Rastrelli; A. Bagno, J. Magn. Reson. 2006, 182 (1), 29–37.
31. P. Giraudeau; S. Akoka, J. Magn. Reson. 2007, 186 (2), 352–357.
32. C. Zwahlen; S. J. F. Vincent, J. Am. Chem. Soc. 2002, 124 (24), 7235–7239.
33. G. Zhu; A. Bax, J. Magn. Reson. Ser. A 1993, 104 (3), 353–357.
34. B. L. Marquez; W. H. Gerwick; R. T. Williamson, Magn. Reson. Chem. 2001, 39 (9), 499–530.
35. S. Uhrinova; D. Uhrin; T. Liptaj; J. Bella; J. Hirsch, Magn. Reson. Chem. 1991, 29 (9), 912–922.
36. K. Fehér; S. Berger; K. E. Kövér, J. Magn. Reson. 2003, 163 (2), 340–346.
37. M. D. Sorensen; A. Meissner; O. W. Sorensen, J. Magn. Reson. 1999, 137 (1), 237–242.
38. F. Cordier; A. J. Dingley; S. Grzesiek, J. Biomol. NMR 1999, 13 (2), 175–180.
39. M. Kurz; P. Schmieder; H. Kessler, Angew. Chem. Int. Ed. Engl. 1991, 30 (10), 1329–1331.
40. G. Z. Xu; J. S. Evans, J. Magn. Reson. Ser. A 1996, 123 (1), 105–110.
41. D. Uhrin; G. Batta; V. J. Hruby; P. N. Barlow; K. E. Kövér, J. Magn. Reson. 1998, 130 (2), 155–161.
42. G. Z. Xu; B. Zhang; J. S. Evans, J. Magn. Reson. 1999, 138 (1), 127–134.
43. W. Kozminski; D. Nanz, J. Magn. Reson. 1997, 124 (2), 383–392.
44. K. E. Kövér; V. J. Hruby; D. Uhrin, J. Magn. Reson. 1997, 129 (2), 125–129.
45. P. Nolis; T. Parella, J. Magn. Reson. 2005, 176 (1), 15–26.
46. R. A. E. Edden; J. Keeler, J. Magn. Reson. 2004, 166 (1), 53–68.
47. K. Kobzar; B. Luy, J. Magn. Reson. 2007, 186 (1), 131–141.
48. A. Meissner; O. W. Sorensen, Magn. Reson. Chem. 2001, 39 (1), 49–52.
49. T. Parella; J. Belloc; F. Sanchez-Ferrando, Magn. Reson. Chem. 2004, 42 (10), 852–862.
50. P. Nolis; T. Parella, Curr. Anal. Chem. 2007, 3 (1), 47–68.

51. L. Poppe; H. Vanhalbeek, Magn. Reson. Chem. 1991, 29 (8), 848–851.
52. L. Poppe; H. Vanhalbeek, J. Magn. Reson. 1991, 93 (1), 214–217.
53. L. Poppe; S. Q. Sheng; H. Vanhalbeek, Magn. Reson. Chem. 1994, 32 (2), 97–100.
54. D. Uhrin; A. Mele; K. E. Kövér; J. Boyd; R. A. Dwek, J. Magn. Reson. Ser. A 1994, 108 (2), 160–170.
55. K. E. Kövér; D. Jiao; D. Uhrin; P. Forgó; V. J. Hruby, J. Magn. Reson. Ser. A 1994, 106 (1), 119–122.
56. T. Nishida; G. Widmalm; P. Sandor, Magn. Reson. Chem. 1995, 33 (7), 596–599.
57. T. Rundlof; A. Kjellberg; C. Damberg; T. Nishida; G. Widmalm, Magn. Reson. Chem. 1998, 36 (11), 839–847.
58. M. Findeisen; S. Berger, Magn. Reson. Chem. 2003, 41 (6), 431–434.
59. P. Vidal; N. Esturau; T. Parella; J. F. Espinosa, J. Org. Chem. 2007, 72 (9), 3166–3170.
60. R. T. Williamson; B. L. Marquez; W. H. Gerwick; K. E. Kövér, Magn. Reson. Chem. 2000, 38 (4), 265–273.
61. H. Koskela; I. Kilpelainen; S. Heikkinen, J. Magn. Reson. 2003, 164 (2), 228–232.
62. K. E. Kövér; G. Batta; K. Fehér, J. Magn. Reson. 2006, 181 (1), 89–97.
63. R. Gitti; G. X. Long; C. A. Bush, Biopolymers 1994, 34 (10), 1327–1338.
64. Q. W. Xu; S. Mohan; C. A. Bush, Biopolymers 1996, 38 (3), 339–353.
65. J. Wu; A. S. Serianni, Carbohydr. Res. 1992, 226 (2), 209–219.
66. J. M. Duker; A. S. Serianni, Carbohydr. Res. 1993, 249 (2), 281–303.
67. T. E. Walker; R. E. London; T. W. Whaley; R. Barker; N. A. Matwiyoff, J. Am. Chem. Soc. 1976, 98 (19), 5807–5813.
68. B. Bose; S. Zhao; R. Stenutz; F. Cloran; P. B. Bondo; G. Bondo; B. Hertz; I. Carmichael; A. S. Serianni, J. Am. Chem. Soc. 1998, 120 (43), 11158–11173.
69. B. Bose-Basu; T. Klepach; G. Bondo; P. B. Bondo; W. Zhang; I. Carmichael; A. S. Serianni, J. Org. Chem. 2007, 72 (20), 7511–7522.
70. U. Olsson; A. S. Serianni; R. Stenutz, J. Phys. Chem. B 2008, 112 (14), 4447–4453.
71. A. Bax; D. Max; D. Zax, J. Am. Chem. Soc. 1992, 114 (17), 6923–6925.
72. Q. W. Xu; C. A. Bush, Carbohydr. Res. 1998, 306 (3), 335–339.
73. M. Martin-Pastor; A. Canales-Mayordomo; J. Jimenez-Barbero, J. Biomol. NMR 2003, 26 (4), 345–353.
74. S. K. Zhao; G. Bondo; J. Zajicek; A. S. Serianni, Carbohydr. Res. 1998, 309 (2), 145–152.
75. K. E. Kövér; P. Forgó, J. Magn. Reson. 2004, 166 (1), 47–52.
76. T. N. Pham; K. E. Kövér; L. Jin; D. Uhrin, J. Magn. Reson. 2005, 176 (2), 199–206.
77. L. Jin; D. Uhrin, Magn. Reson. Chem. 2007, 45 (8), 628–633.
78. L. Jin; K. E. Kövér; M. R. Lenoir; D. Uhrin, J. Magn. Reson. 2008, 190 (2), 171–182.
79. S. W. Homans, Prog. Nucl. Magn. Reson. Spectrosc. 1990, 22, 55–81.
80. A. S. Serianni, Nuclear Magnetic Resonance Approaches to Oligosaccharide Structure Elucidation. In Glycoconjugates; H. Allen, E. C. Kisalius, Eds.; Marcel Dekker: New York, 1992; pp 71–102.
81. M. Eberstadt; G. Gemmecker; D. F. Mierke; H. Kessler, Angew. Chem. Int. Ed. Engl. 1995, 34 (16), 1671–1695.
82. C. Altona, Vicinal Coupling Constants and Conformation of Biomolecules. John Wiley: London, 1996.
83. W. A. Thomas, Prog. Nucl. Magn. Reson. Spectrosc. 1997, 30 (3–4), 183–207.
84. R. H. Contreras; J. E. Peralta; C. G. Giribet; M. C. De Azua; J. C. Facelli, Annu. Rep. NMR Spectrosc. 2000, 41, 55–184.
85. G. E. Martin, Qualitative and Quantitative Exploitation of Heteronuclear Coupling Constants. In Annual Reports on NMR Spectroscopy; G. A. Webb, Ed.; Academic Press: New York, 2002; Vol. 46, pp 37–100.
86. M. Kraszni; Z. Szakacs; B. Noszal, Anal. Bioanal. Chem. 2004, 378 (6), 1449–1463.
87. B. Mulloy; T. A. Frenkiel; D. B. Davies, Carbohydr. Res. 1988, 184, 39–46.
88. I. Tvaroska; M. Hricovini; E. Petrakova, Carbohydr. Res. 1989, 189, 359–362.
89. M. J. Milton; R. Harris; M. A. Probert; R. A. Field; S. W. Homans, Glycobiology 1998, 8 (2), 147–153.
90. F. Cloran; I. Carmichael; A. S. Serianni, J. Am. Chem. Soc. 1999, 121 (42), 9843–9851.
91. M. Martin-Pastor; C. A. Bush, Biochemistry 1999, 38 (25), 8045–8055.
92. M. Karplus, J. Chem. Phys. 1959, 30 (1), 11–15.
93. F. Cloran; I. Carmichael; A. S. Serianni, J. Am. Chem. Soc. 2000, 122 (2), 396–397.
94. H. Q. Zhao; I. Carmichael; A. S. Serianni, J. Org. Chem. 2008, 73 (8), 3255–3257.
95. A. S. Serianni; P. B. Bondo; J. Zajicek, J. Magn. Reson. B. 1996, 112 (1), 69–74.
96. T. Church; I. Carmichael; A. S. Serianni, Carbohydr. Res. 1996, 280 (2), 177–186.
97. I. Tvaroska; F. R. Taravel, J. Biomol. NMR 1992, 2 (5), 421–430.
98. I. Tvaroska; F. R. Taravel, Adv. Carbohydr. Chem. Biochem. 1995, 51, 15–61.
99. I. Carmichael; D. M. Chipman; C. A. Podlasek; A. S. Serianni, J. Am. Chem. Soc. 1993, 115 (23), 10863–10870.
100. T. E. Klepach; I. Carmichael; A. S. Serianni, J. Am. Chem. Soc. 2005, 127 (27), 9781–9793.
101. S. Ilin; C. Bosques; C. Turner; H. Schwalbe, Angew. Chem. Int. Ed. Engl. 2003, 42 (12), 1394–1397.
102. Z. Dzakula; W. M. Westler; A. S. Edison; J. L. Markley, J. Am. Chem. Soc. 1992, 114 (15), 6195–6199.
103. Z. Dzakula; A. S. Edison; W. M. Westler; J. L. Markley, J. Am. Chem. Soc. 1992, 114 (15), 6200–6207.
104. L. Poppe, J. Am. Chem. Soc. 1993, 115 (18), 8421–8426.
105. K. Bock; J. O. Duus, J. Carbohydr. Chem. 1994, 13 (4), 513–543.
106. G. D. Rockwell; T. B. Grindley, J. Am. Chem. Soc. 1998, 120 (42), 10953–10963.
107. Y. Nishida; H. Hori; H. Ohrui; H. Meguro, J. Carbohydr. Chem. 1988, 7 (1), 239–250.
108. H. Hori; Y. Nishida; H. Ohrui; H. Meguro, J. Carbohydr. Chem. 1990, 9 (5), 601–618.
109. R. Stenutz; I. Carmichael; G. Widmalm; A. S. Serianni, J. Org. Chem. 2002, 67 (3), 949–958.
110. A. Roen; J. I. Padron; J. T. Vazquez, J. Org. Chem. 2003, 68 (12), 4615–4630.
111. C. Nobrega; J. T. Vazquez, Tetrahedron: Asymmetry 2003, 14 (18), 2793–2801.
112. C. Mayato; R. Dorta; J. Vázquez, Tetrahedron: Asymmetry 2004, 15 (15), 2385–2397.
113. I. Tvaroska; J. Gajdos, Carbohydr. Res. 1995, 271 (2), 151–162.
114. I. Tvaroska; F. R. Taravel; J. P. Utille; J. P. Carver, Carbohydr. Res. 2002, 337 (4), 353–367.

115. C. Thibaudeau; R. Stenutz; B. Hertz; T. Klepach; S. Zhao; Q. Q. Wu; I. Carmichael; A. S. Serianni, J. Am. Chem. Soc. 2004, 126 (48), 15668–15685.
116. M. Tafazzoli; M. Grhiasi, Carbohydr. Res. 2007, 342 (14), 2086–2096.
117. G. A. S. W. Jeffrey, Hydrogen Bond in Biological Structures. Springer-Verlag: Berlin, 1991.
118. H. C. Siebert; M. Frank; C. W. Lieth; J. Jimenez-Barbero; H. J. Gabius, Detection of Hydroxyl Protons. In NMR Spectroscopy of Glycoconjugates; J. Jiménez-Barbero, T. Peters, Eds.; Wiley-VCH: Weinheim, 2003.
119. C. Sandstrom; L. Kenne, Hydroxy Protons in Structural Studies of Carbohydrates by NMR Spectroscopy. In NMR Spectroscopy and Computer Modeling of Carbohydrates: Recent Advances; J. F. G. Vliegenthart, R. J. Woods, Eds.; John Wiley & Sons: Chichester, 2006; Vol. 930; pp 114.
120. J. Dabrowski; H. Grosskurth; C. Baust; N. E. Nifant’ev, J. Biomol. NMR 1998, 12 (1), 161–172.
121. C. E. Anderson; A. J. Pickrell; S. L. Sperry; T. E. Vasquez; T. G. Custer; M. B. Fierman; D. C. Lazar; Z. W. Brown; W. S. Iskenderian; D. D. Hickstein; D. J. O’Leary, Heterocycles 2007, 72, 469–495.
122. T. T. Nguyen; T. N. Le; F. Duus; B. K. V. Hansen; P. E. Hansen, Magn. Reson. Chem. 2007, 45 (3), 245–252.
123. P. E. Hansen, J. Labelled Comp. Radiopharm. 2007, 50 (11–12), 967–981.
124. C. Sandstrom; H. Baumann; L. Kenne, J. Chem. Soc., Perkin Trans. 2 1998, (4), 809–815.
125. C. Sandstrom; H. Baumann; L. Kenne, J. Chem. Soc., Perkin Trans. 2 1998, (11), 2385–2393.
126. H. Q. Zhao; Q. F. Pan; W. H. Zhang; I. Carmichael; A. S. Serianni, J. Org. Chem. 2007, 72 (19), 7071–7082.
127. B. Adams; L. Lerner, J. Am. Chem. Soc. 1992, 114 (12), 4827–4829.
128. H. C. Siebert; S. Andre; J. F. G. Vliegenthart; H. J. Gabius; M. J. Minch, J. Biomol. NMR 2003, 25 (3), 197–215.
129. L. Poppe; H. Vanhalbeek, Nat. Struct. Biol. 1994, 1 (4), 215–216.
130. B. Bernet; A. Vasella, Helv. Chim. Acta 2000, 83 (5), 995–1021.
131. R. R. Fraser; M. Kaufman; P. Morand; G. Govil, Can. J. Chem. 1969, 47 (3), 403–409.
132. K. G. R. Pachler, Tetrahedron 1971, 27 (1), 187.
133. C. A. G. Haasnoot; F. Deleeuw; C. Altona, Tetrahedron 1980, 36 (19), 2783–2792.
134. H. Fukui; T. Baba; H. Inomata; K. Miura; H. Matsuda, Mol. Phys. 1997, 92 (1), 161–165.
135. L. Alkorta; J. Elguero, Theor. Chem. Acc. 2004, 111 (1), 31–35.
136. P. Dais; A. S. Perlin, Can. J. Chem. 1982, 60 (13), 1648–1656.
137. G. Batta; K. E. Kövér, Carbohydr. Res. 1999, 320 (3–4), 267–272.
138. K. E. Kövér; A. Lipták; T. Beke; A. Perczel, J. Comput. Chem. 2009, 30 (4), 540–550.
139. M. J. Frisch, et al., Gaussian03, rev. D.01.; Gaussian Inc.: Wallingford, CT: 2004.
140. N. C. Maiti; Y. P. Zhu; I. Carmichael; A. S. Serianni; V. E. Anderson, J. Org. Chem. 2006, 71 (7), 2878–2880.
141. A. Saupe, Angew. Chem. Int. Ed. Engl. 1968, 7 (2), 97.
142. J. W. Emsley; J. C. Lindon, NMR Spectroscopy Using Liquid Crystal Solvents; Pergamon: Oxford, 1975.
143. R. Y. Dong, Nuclear Magnetic Resonance of Liquid Crystals; Springer: New York, 1994.
144. J. W. Emsley, In Encyclopedia of Nuclear Magnetic Resonance; D. M. Grant, R. K. Harris, Eds.; Wiley: Chichester, 1996; pp 2788–2799.
145. C. Algieri; F. Castiglione; G. Celebre; G. De Luca; M. Longeri; J. W. Emsley, Phys. Chem. Chem. Phys. 2000, 2 (15), 3405–3413.
146. J. H. Prestegard; H. M. Al-Hashimi; J. R. Tolman, Q. Rev. Biophys. 2000, 33 (4), 371–424.
147. A. Almond; J. Bunkenborg; T. Franch; C. H. Gotfredsen; J. O. Duus, J. Am. Chem. Soc. 2001, 123 (20), 4792–4802.
148. A. Almond; J. O. Duus, J. Biomol. NMR 2001, 20 (4), 351–363.
149. F. Tian; H. M. Al-Hashimi; J. L. Craighead; J. H. Prestegard, J. Am. Chem. Soc. 2001, 123 (3), 485–492.
150. A. Almond; B. O. Petersen; J. O. Duus, Biochemistry 2004, 43 (19), 5853–5863.
151. T. N. Pham; T. Liptaj; K. Bromek; D. Uhrin, J. Magn. Reson. 2002, 157 (2), 200–209.
152. X. B. Yi; A. Venot; J. Glushka; J. H. Prestegard, J. Am. Chem. Soc. 2004, 126 (42), 13636–13638.
153. T. N. Pham; S. L. Hinchley; D. W. H. Rankin; T. Liptaj; D. Uhrin, J. Am. Chem. Soc. 2004, 126 (40), 13100–13110.
154. L. Jin; T. N. Pham; D. Uhrin, ChemPhysChem 2007, 8 (8), 1228–1235.
155. C. Landersjö; J. L. M. Jansson; A. Maliniak; G. Widmalm, J. Phys. Chem. B 2005, 109 (36), 17320–17326.
156. M. Martin-Pastor; C. A. Bush, J. Biomol. NMR 2001, 19 (2), 125–139.
157. K. Lycknert; A. Maliniak; G. Widmalm, J. Phys. Chem. A 2001, 105 (21), 5119–5122.
158. M. Martin-Pastor; A. Canales; F. Corzana; J. L. Asensio; J. Jimenez-Barbero, J. Am. Chem. Soc. 2005, 127 (10), 3589–3595.
159. M. Martin-Pastor; C. A. Bush, Biochemistry 2000, 39 (16), 4674–4683.
160. B. Stevensson; C. Landersjo; G. Widmalm; A. Maliniak, J. Am. Chem. Soc. 2002, 124 (21), 5946–5947.
161. D. I. Freedberg, J. Am. Chem. Soc. 2002, 124 (10), 2358–2362.
162. H. F. Azurmendi; C. A. Bush, J. Am. Chem. Soc. 2002, 124 (11), 2426–2427.
163. R. M. Venable; F. Delaglio; S. E. Norris; D. I. Freedberg, Carbohydr. Res. 2005, 340 (5), 863–874.
164. C. Landersjö; B. Stevensson; R. Eklund; J. Ostervall; P. Soderman; G. Widmalm; A. Maliniak, J. Biomol. NMR 2006, 35 (2), 89–101.
165. P. Berthault; D. Jeannerat; F. Camerel; F. A. Salgado; Y. Boulard; J. C. P. Gabriel; H. Desvaux, Carbohydr. Res. 2003, 338 (17), 1771–1785.
166. R. M. Gschwind, Angew. Chem. Int. Ed. Engl. 2005, 44 (30), 4666–4668.
167. J. L. Yan; E. R. Zartler, Magn. Reson. Chem. 2005, 43 (1), 53–64.
168. C. M. Thiele, Concepts Magn. Reson. Part A 2007, 30A (2), 65–80.
169. S. Sykora; J. Vogt; H. Bosiger; P. Diehl, J. Magn. Reson. 1979, 36 (1), 53–60.
170. H. Neubauer; J. Meiler; W. Peti; C. Griesinger, Helv. Chim. Acta 2001, 84 (1), 243–258.
171. F. Delaglio; Z. R. Wu; A. Bax, J. Magn. Reson. 2001, 149 (2), 276–281.
172. W. Willker; D. Leibfritz, J. Magn. Reson. 1992, 99 (2), 421–425.
173. M. H. Lerche; A. Meissner; F. M. Poulsen; O. W. Sorensen, J. Magn. Reson. 1999, 140 (1), 259–263.
174. T. S. Untidt; T. Schulte-Herbruggen; O. W. Sorensen; N. C. Nielsen, J. Phys. Chem. A 1999, 103 (45), 8921–8926.
175. K. E. Kövér; K. Fehér, J. Magn. Reson. 2004, 168 (2), 307–313.
176. G. Otting; M. Ruckert; M. H. Levitt; A. Moshref, J. Biomol. NMR 2000, 16 (4), 343–346.

177. F. Tian; P. J. Bolon; J. H. Prestegard, J. Am. Chem. Soc. 1999, 121 (33), 7712–7713.
178. T. N. Pham; T. Liptaj; P. N. Barlow; D. Uhrin, Magn. Reson. Chem. 2002, 40 (11), 729–732.
179. T. Rundlof; C. Landersjo; K. Lycknert; A. Maliniak; G. Widmalm, Magn. Reson. Chem. 1998, 36 (10), 773–776.
180. D. Uhrin; T. Liptaj; K. E. Kövér, J. Magn. Reson. Ser. A 1993, 101 (1), 41–46.
181. N. Tjandra; A. Bax, J. Magn. Reson. 1997, 124 (2), 512–515.
182. G. R. Kiddle; S. W. Homans, FEBS Lett. 1998, 436 (1), 128–130.
183. J. R. Garbow; D. P. Weitekamp; A. Pines, Chem. Phys. Lett. 1982, 93 (5), 504–509.
184. G. Nodet; L. Poggi; D. Abergel; C. Gourmala; D. X. Dong; Y. M. Zhang; J. M. Mallet; G. Bodenhausen, J. Am. Chem. Soc. 2007, 129 (29), 9080–9085.
185. J. A. Losonczi; M. Andrec; M. W. F. Fischer; J. H. Prestegard, J. Magn. Reson. 1999, 138 (2), 334–342.
186. H. Valafar; J. H. Prestegard, J. Magn. Reson. 2004, 167 (2), 228–241.
187. M. Deschamps; I. D. Campbell; J. Boyd, J. Magn. Reson. 2005, 172 (1), 118–132.
188. M. Zweckstetter; A. Bax, J. Am. Chem. Soc. 2000, 122 (15), 3791–3792.
189. J. H. Prestegard; X. B. Yi, Structure and Dynamics of Carbohydrates Using Residual Dipolar Couplings. In NMR Spectroscopy and Computer Modeling of Carbohydrates: Recent Advances; J. F. G. Vliegenthart, R. J. Woods, Eds.; John Wiley & Sons: Chichester, 2006; Vol. 930, pp 40–59.
190. E. T. Samulski; R. Y. Dong, J. Chem. Phys. 1982, 77 (10), 5090–5096.
191. C. Landersjö; C. Hoog; A. Maliniak; G. Widmalm, J. Phys. Chem. B 2000, 104 (23), 5618–5624.
192. A. Almond; J. B. Axelsen, J. Am. Chem. Soc. 2002, 124 (34), 9986–9987.
193. D. I. Freedberg; S. O. Ano; S. E. Norris; R. M. Venable, Carbohydrate Structure from NMR Residual Dipolar Couplings: Is There a Correlation between Lactose’s Anomeric Configuration and Its Three-Dimensional Structure? In NMR Spectroscopy and Computer Modeling of Carbohydrates: Recent Advances; J. F. G. Vliegenthart, R. J. Woods, Eds.; John Wiley & Sons: Chichester, 2006; Vol. 930, pp 220–234.
194. J. W. Emsley; G. R. Luckhurst; C. P. Stockley, Proc. Math. Phys. Eng. Sci. 1982, 381 (1780), 117–138.
195. D. Catalano; L. Dibari; C. A. Veracini; G. N. Shilstone; C. Zannoni, J. Chem. Phys. 1991, 94 (5), 3928–3935.
196. H. J. Gabius; H. C. Siebert; S. Andre; J. Jimenez-Barbero; H. Rudiger, ChemBioChem 2004, 5 (6), 740–764.
197. D. B. Moody, Nature 2007, 448 (7149), 36–37.
198. M. A. Johnson; B. M. Pinto, Carbohydr. Res. 2004, 339 (5), 907–928.
199. H. Kogelberg; D. Solis; J. Jimenez-Barbero, Curr. Opin. Struct. Biol. 2003, 13 (5), 646–653.
200. J. Jiménez-Barbero; T. Peters, NMR Spectroscopy of Glycoconjugates; Wiley-VCH: Weinheim, 2002.
201. S. Banerji; A. J. Wright; M. Noble; D. J. Mahoney; I. D. Campbell; A. J. Day; D. G. Jackson, Nat. Struct. Mol. Biol. 2007, 14 (3), 234–239.
202. M. A. Walti; P. J. Walser; S. Thore; A. Grunler; M. Bednar; M. Kunzler; M. Aebi, J. Mol. Biol. 2008, 379 (1), 146–159.
203. U. Neu; K. Woellner; G. Gauglitz; T. Stehle, Proc. Natl. Acad. Sci. U.S.A. 2008, 105 (13), 5219–5224.
204. M. R. Wormald; A. J. Petrescu; Y. L. Pao; A. Glithero; T. Elliott; R. A. Dwek, Chem. Rev. 2002, 102 (2), 371–386.
205. A. Imberty; S. Perez, Chem. Rev. 2000, 100 (12), 4567–4588.
206. S. Perez; A. Imberty; S. B. Engelsen; J. Gruza; K. Mazeau; J. Jimenez-Barbero; A. Poveda; J. F. Espinosa; B. P. van Eyck; G. Johnson; A. D. French; M. Louise; C. E. Kouwijzer; P. D. J. Grootenuis; A. Bernardi; L. Raimondi; H. Senderowitz; V. Durier; G. Vergoten; K. Rasmussen, Carbohydr. Res. 1998, 314 (3–4), 141–155.
207. O. Coskuner, J. Chem. Phys. 2007, 127 (1), 015101.
208. M. K. Dowd; P. J. Reilly; A. D. French, Biopolymers 1994, 34 (5), 625–638.
209. F. Corzana; I. Cuesta; F. Freire; J. Revuelta; M. Torrado; A. Bastida; J. Jimenez-Barbero; J. L. Asensio, J. Am. Chem. Soc. 2007, 129 (10), 2849–2865.
210. K. N. Kirschner; A. B. Yongye; S. M. Tschampel; J. Gonzalez-Outeirino; C. R. Daniels; B. L. Foley; R. J. Woods, J. Comput. Chem. 2008, 29 (4), 622–655.
211. J. L. Asensio; M. Martin-Pastor; J. Jimenez-Barbero, Int. J. Biol. Macromol. 1995, 17 (3–4), 137–148.
212. A. D. French; A. M. Kelterer; G. P. Johnson; M. K. Dowd; C. J. Cramer, J. Mol. Graph. Model. 2000, 18 (2), 95–107.
213. B. Lopez-Mendez; C. Jia; Y. Zhang; L. H. Zhang; P. Sinay; J. Jimenez-Barbero; M. Sollogoub, Chem. Asian J. 2008, 3 (1), 51–58.
214. A. Silipo; Z. Zhang; F. J. Canada; A. Molinaro; R. J. Linhardt; J. Jimenez-Barbero, ChemBioChem 2008, 9 (2), 240–252.
215. A. Poveda; J. L. Asensio; M. Martin-Pastor; J. Jimenez-Barbero, Carbohydr. Res. 1997, 300 (1), 3–10.
216. J. Jimenez-Barbero; J. L. Asensio; F. J. Canada; A. Poveda, Curr. Opin. Struct. Biol. 1999, 9 (5), 549–555.
217. J. L. Asensio; J. Jimenez-Barbero, Biopolymers 1995, 35 (1), 55–73.
218. D. A. Cumming; J. P. Carver, Biochemistry 1987, 26 (21), 6664–6676.
219. J. L. Asensio; A. Hidalgo; I. Cuesta; C. Gonzalez; J. Canada; C. Vicent; J. L. Chiara; G. Cuevas; J. Jimenez-Barbero, Chem. Commun. (Camb.) 2002 (19), 2232–2233.
220. J. L. Asensio; F. J. Canada; X. Cheng; N. Khan; D. R. Mootoo; J. Jimenez-Barbero, Chemistry 2000, 6 (6), 1035–1041.
221. D. A. Cumming; R. N. Shah; J. J. Krepinsky; A. A. Grey; J. P. Carver, Biochemistry 1987, 26 (21), 6655–6663.
222. H.-C. Siebert; M. Frank; C.-W. von der Lieth; J. Jiménez-Barbero; H.-J. Gabius, Detection of Hydroxyl Protons. In NMR Spectroscopy of Glycoconjugates; J. Jiménez-Barbero, T. Peters, Eds.; Wiley-VCH: Weinheim, 2002; pp 39–57.
223. J. Dabrowski; T. Kozar; H. Grosskurth; N. E. Nifantev, J. Am. Chem. Soc. 1995, 117 (20), 5534–5539.
224. R. Eklund; K. Lycknert; P. Soderman; G. Widmalm, J. Phys. Chem. B 2005, 109 (42), 19936–19945.
225. J. L. Asensio; A. Hidalgo; I. Cuesta; C. Gonzalez; J. Canada; C. Vicent; J. L. Chiara; G. Cuevas; J. Jimenez-Barbero, Chemistry 2002, 8 (22), 5228–5240.
226. A. Poveda; J. L. Asensio; M. Martin-Pastor; J. Jimenez-Barbero, Chem. Commun. 421–422.
227. M. Hricovini; R. N. Shah; J. P. Carver, Biochemistry 1992, 31 (41), 10018–10023.
228. A. Poveda; M. Santamaria; M. Bernabe; A. Prieto; M. Bruix; J. Corzo; J. Jimenez-Barbero, Carbohydr. Res. 1997, 304 (3–4), 209–217.
229. M. Mackeen; A. Almond; I. Cumpstey; S. C. Enis; E. Kupce; T. D. Butters; A. J. Fairbanks; R. A. Dwek; M. R. Wormald, Org. Biomol. Chem. 2006, 4 (11), 2241–2246.
230. A. Andersson; A. Ahl; R. Eklund; G. Widmalm; L. Maler, J. Biomol. NMR 2005, 31 (4), 311–320.
231. A. M. Dixon; R. Venable; G. Widmalm; T. E. Bull; R. W. Pastor, Biopolymers 2003, 69 (4), 448–460.

232. C. Hoog; C. Landersjo; G. Widmalm, Chemistry 2001, 7 (14), 3069–3077.
233. A. Poveda; J. L. Asensio; M. Martin-Pastor; J. Jimenez-Barbero, J. Biomol. NMR 1997, 10 (1), 29–43.
234. L. Maler; G. Widmalm; J. Kowalewski, J. Biomol. NMR 1996, 7 (1), 1–7.
235. A. Poveda; M. Martin-Pastor; M. Bernabe; J. A. Leal; J. Jimenez-Barbero, Glycoconj. J. 1998, 15 (3), 309–321.
236. K. Lycknert; G. Widmalm, Biomacromolecules 2004, 5 (3), 1015–1020.
237. B. Mulloy; M. J. Forster, Glycobiology 2000, 10 (11), 1147–1156.
238. J. Angulo; M. Hricovini; M. Gairi; M. Guerrini; J. L. de Paz; R. Ojeda; M. Martin-Lomas; P. M. Nieto, Glycobiology 2005, 15 (10), 1008–1015.
239. D. R. Ferro; A. Provasoli; M. Ragazzi; B. Casu; G. Torri; V. Bossennec; B. Perly; P. Sinay; M. Petitou; J. Choay, Carbohydr. Res. 1990, 195 (2), 157–167.
240. J. Angulo; P. M. Nieto; M. Martin-Lomas, Chem. Commun. 2003 (13), 1512–1513.
241. D. Mikhailov; R. J. Linhardt; K. H. Mayo, Biochem. J. 1997, 328 (Pt 1), 51–61.
242. G. Torri; B. Casu; G. Gatti; M. Petitou; J. Choay; J. C. Jacquinet; P. Sinay, Biochem. Biophys. Res. Commun. 1985, 128 (1), 134–140.
243. D. Acquotti; L. Poppe; J. Dabrowski; C. W. Vonderlieth; S. Sonnino; G. Tettamanti, J. Am. Chem. Soc. 1990, 112 (21), 7772–7778.
244. L. Poppe; H. van Halbeek; D. Acquotti; S. Sonnino, Biophys. J. 1994, 66 (5), 1642–1652.
245. B. G. Winsborrow; J. R. Brisson; I. C. Smith; H. C. Jarrell, Biophys. J. 1992, 63 (2), 428–437.
246. K. P. Howard; J. H. Prestegard, Biophys. J. 1996, 71 (5), 2573–2582.
247. J. H. Prestegard; J. Glushka, Residual Dipolar Couplings: Structure and Dynamics of Glycolipids. In NMR Spectroscopy of Glycoconjugates; J. Jimenez Barbero, T. Peters, Eds.; Wiley-VCH: Weinheim, 2002; pp 231–245.
248. J. J. Hernandez-Gay; L. Panza; F. Ronchetti; F. J. Canada; F. Compostella; J. Jimenez-Barbero, Carbohydr. Res. 2007, 342 (12–13), 1966–1973.
249. F. Chevalier; J. Lopez-Prados; P. Groves; S. Perez; M. Martin-Lomas; P. M. Nieto, Glycobiology 2006, 16 (10), 969–980.
250. J. Revuelta; T. Vacas; M. Torrado; F. Corzana; C. Gonzalez; J. Jimenez-Barbero; M. Menendez; A. Bastida; J. L. Asensio, J. Am. Chem. Soc. 2008, 130 (15), 5086–5103.
251. F. Freire; I. Cuesta; F. Corzana; J. Revuelta; C. Gonzalez; M. Hricovini; A. Bastida; J. Jimenez-Barbero; J. L. Asensio, Chem. Commun. (Camb.) 174–176.
252. B. N. Rao; C. A. Bush, Biopolymers 1987, 26 (8), 1227–1244.
253. A. Kuhn; H. Kunz, Angew. Chem. Int. Ed. Engl. 2007, 46 (3), 454–458.
254. F. Corzana; J. H. Busto; S. B. Engelsen; J. Jimenez-Barbero; J. L. Asensio; J. M. Peregrina; A. Avenoza, Chemistry 2006, 12 (30), 7864–7871.
255. F. Corzana; J. H. Busto; G. Jimenez-Oses; M. Garcia de Luis; J. L. Asensio; J. Jimenez-Barbero; J. M. Peregrina; A. Avenoza, J. Am. Chem. Soc. 2007, 129 (30), 9458–9467.
256. F. Corzana; J. H. Busto; G. Jimenez-Oses; J. L. Asensio; J. Jimenez-Barbero; J. M. Peregrina; A. Avenoza, J. Am. Chem. Soc. 2006, 128 (45), 14640–14648.
257. D. M. Coltart; A. K. Royyuru; L. J. Williams; P. W. Glunz; D. Sames; S. D. Kuduk; J. B. Schwarz; X. T. Chen; S. J. Danishefsky; D. H. Live, J. Am. Chem. Soc. 2002, 124 (33), 9833–9844.
258. L. Kinarsky; G. Suryanarayanan; O. Prakash; H. Paulsen; H. Clausen; F. G. Hanisch; M. A. Hollingsworth; S. Sherman, Glycobiology 2003, 13 (12), 929–939.
259. Y. Tachibana; G. L. Fletcher; N. Fujitani; S. Tsuda; K. Monde; S. Nishimura, Angew. Chem. Int. Ed. Engl. 2004, 43 (7), 856–862.
260. S. E. O’Connor; J. Pohlmann; B. Imperiali; I. Saskiawan; K. Yamamoto, J. Am. Chem. Soc. 2001, 123 (25), 6187–6188.
261. C. J. Bosques; S. M. Tschampel; R. J. Woods; B. Imperiali, J. Am. Chem. Soc. 2004, 126 (27), 8421–8425.
262. S. E. O’Connor; B. Imperiali, Chem. Biol. 1996, 3 (10), 803–812.
263. M. M. Palian; V. I. Boguslavsky; D. F. O’Brien; R. Polt, J. Am. Chem. Soc. 2003, 125 (19), 5823–5831.
264. C. Vanhaverbeke; J. P. Simorre; R. Sadir; P. Gans; H. Lortat-Jacob, Biochem. J. 2004, 384 (Pt 1), 93–99.
265. N. Aboitiz; M. Vila-Perello; P. Groves; J. L. Asensio; D. Andreu; F. J. Canada; J. Jimenez-Barbero, ChemBioChem 2004, 5 (9), 1245–1255.
266. D. C. Williams, Jr.; M. Cai; J. Y. Suh; A. Peterkofsky; G. M. Clore, J. Biol. Chem. 2005, 280 (21), 20775–20784.
267. A. Canales-Mayordomo; R. Fayos; J. Angulo; R. Ojeda; M. Martin-Pastor; P. M. Nieto; M. Martin-Lomas; R. Lozano; G. Gimenez-Gallego; J. Jimenez-Barbero, J. Biomol. NMR 2006, 35 (4), 225–239.
268. S. C. Tjong; T. S. Chen; W. N. Huang; W. G. Wu, Biochemistry 2007, 46 (35), 9941–9952.
269. C. D. Blundell; A. Almond; D. J. Mahoney; P. L. DeAngelis; I. D. Campbell; A. J. Day, J. Biol. Chem. 2005, 280 (18), 18189–18201.
270. K. W. Hung; T. K. Kumar; K. M. Kathir; P. Xu; F. Ni; H. H. Ji; M. C. Chen; C. C. Yang; F. P. Lin; I. M. Chiu; C. Yu, Biochemistry 2005, 44 (48), 15787–15798.
271. A. P. Herbert; J. A. Deakin; C. Q. Schmidt; B. S. Blaum; C. Egan; V. P. Ferreira; M. K. Pangburn; M. Lyon; D. Uhrin; P. N. Barlow, J. Biol. Chem. 2007, 282 (26), 18960–18968.
272. A. Canales; R. Lozano; B. Lopez-Mendez; J. Angulo; R. Ojeda; P. M. Nieto; M. Martin-Lomas; G. Gimenez-Gallego; J. Jimenez-Barbero, FEBS J. 2006, 273 (20), 4716–4727.
273. L. M. Koharudin; A. R. Viscomi; J. G. Jee; S. Ottonello; A. M. Gronenborn, Structure 2008, 16 (4), 570–584.
274. C. A. Bewley; S. Kiyonaka; I. Hamachi, J. Mol. Biol. 2002, 322 (4), 881–889.
275. A. A. Bothnerb; R. Gassend, Ann. N. Y. Acad. Sci. 1973, 222 (DEC31), 668–676.
276. J. P. Albrand; B. Birdsall; J. Feeney; G. C. K. Roberts; A. S. V. Burgen, Int. J. Biol. Macromol. 1979, 1 (1), 37–41.
277. V. L. Bevilacqua; D. S. Thomson; J. H. Prestegard, Biochemistry 1990, 29 (23), 5529–5537.
278. V. L. Bevilacqua; Y. Kim; J. H. Prestegard, Biochemistry 1992, 31 (39), 9339–9349.
279. A. Poveda; J. Jimenez-Barbero, Chem. Soc. Rev. 1998, 27 (2), 133–143.
280. J. Jiménez-Barbero; T. Peters, TR-NOE Experiments to Study Carbohydrate-Protein Interactions. In NMR Spectroscopy of Glycoconjugates; J. Jiménez-Barbero, T. Peters, Eds.; Wiley-VCH: Weinheim, 2002; pp 289–309.
281. J. Angulo; C. Rademacher; T. Biet; A. J. Benie; A. Blume; H. Peters; M. Palcic; F. Parra; T. Peters, Meth. Enzymol. 2006, 416, 12–30.

Biomolecular Recognition by Oligosaccharides and Glycopeptides: The NMR Point of View


Biographical Sketches

Katalin E. Kövér is a distinguished research scientist at the Department of Inorganic and Analytical Chemistry, University of Debrecen. She obtained her M.S. in chemistry in 1979 from the L. Kossuth University, Debrecen, her Univ.D. in chemistry in 1984 from the L. Kossuth University, Debrecen, and her Ph.D. in chemistry in 1988 from the Hungarian Academy of Sciences, Budapest. She was a postdoctoral fellow (1991–93) in Tucson, Arizona with V. J. Hruby. She was awarded the D.Sc. degree in chemistry in 2002 by the Hungarian Academy of Sciences, Budapest. She has many years of expertise in pulse program development for sensitive and accurate determination of NMR parameters by 1D and 2D methods and in analyzing structural and motional parameters from the measured data for small and large molecules alike. Her current research interests include methodological developments focusing on multidimensional, proton-detected heteronuclear experiments, selective experiments, and gradient-enhanced experiments; NMR structure determination of biologically important oligopeptides/proteins, oligosaccharides, and antibiotics; investigations of receptor–ligand interactions; NMR dynamics studies including heteronuclear relaxation, dipole–dipole (DD/DD) and dipole–chemical shift anisotropy (DD/CSA) relaxation interference measurements and their interpretation in terms of the relevant structural and dynamic parameters; application of the transverse relaxation optimized spectroscopy (TROSY) approach; and measurement and application of residual dipolar coupling constants (RDC).

László Szilágyi has been associated throughout his career with the University of Debrecen (formerly L. Kossuth University), except for two postdoctoral periods in Strasbourg (with J.-M. Lehn) and Stanford (with O. Jardetzky). His research interests include the application of NMR spectroscopy to the structure elucidation of natural products such as carbohydrates, aminoglycoside and macrolide antibiotics, flavonoids, and morphine alkaloids; conformational studies of (glyco)peptides and proteins by NMR; synthesis of novel carbohydrate scaffolds; and studies of carbohydrate–protein interactions.

Gyula Batta is a distinguished research scientist and professor at the Department of Biochemistry, University of Debrecen. He completed his M.S. in physics in 1976 from the L. Kossuth University, Debrecen and his Ph.D. in chemistry in 1988 from the Hungarian Academy of Sciences (HAS), Budapest. He was a postdoc fellow (1993–94) in Tucson, Arizona with Professor J. Gervay. Under the Go West scholarship, he worked with Professor D. H. Williams in 1994 at Cambridge. He also worked with Professor J. Kowalewski under the STINT scholarship in 1999 at Stockholm. He was awarded the D.Sc. degree in chemistry in 2001 by the HAS, Budapest. His research interests are in high-resolution NMR, methodological developments, NMR dynamics from relaxation and relaxation interferences, and diffusion and saturation transfer methods; NMR structure determination of calcium binding and antifungal proteins, glycopeptide antibiotics, oligosaccharides, and antibiotics; and molecular recognition by glycopeptide antibiotics and protein–carbohydrate interactions.

Dušan Uhrín received his M.Sc. in chemistry from the Slovak Technical University in Bratislava, Slovakia in 1982. From 1983 he worked as a research assistant at the Institute of Chemistry, Slovak Academy of Sciences in Bratislava, where he received his Ph.D. in 1990. In 1991 he was awarded the Soros/FCO Scholarship for East European Scientists and worked under the supervision of Professor Raymond Dwek in the Glycobiology Institute, Oxford University, UK. He was a research associate (1992–95) in the Institute for Biological Sciences, NRC, Ottawa, Canada. He returned to the UK in 1995 as the manager of the Edinburgh Biomolecular NMR unit in the School of Chemistry, University of Edinburgh, where he was appointed to a lectureship in 2000. He is currently head of the NMR facility and a reader at the University of Edinburgh. His research interests are in the development of high-resolution NMR spectroscopy (selective techniques, measurement of scalar and residual dipolar coupling constants) and its application to the structure elucidation of molecules. The systems he studies include small organic molecules, complex carbohydrates, and proteins and their complexes with carbohydrates.

Jesús Jiménez-Barbero was born in Madrid in 1960. He studied chemistry at the University of Madrid (UAM) and received his B.Sc. degree in 1982. After completing his military service, he started his Ph.D. work with Manuel Martín-Lomas and Manuel Bernabé at the Institute of Organic Chemistry of the Higher Research Council of Spain (CSIC), Madrid, working on the synthesis and conformational analysis of sugar derivatives. In the last year of his Ph.D. work, he worked with NMR, especially with methods for measuring long-range carbon–proton coupling constants. He received his Ph.D. degree in 1987, after a stay at CERMAV-CNRS, Grenoble, working with Serge Perez on the application of molecular mechanics calculations to polysaccharide molecules. In 1988, after a short period at the University of Zürich, he moved as a postdoctoral fellow to the National Institute for Medical Research at Mill Hill, UK to work with Jim Feeney and Berry Birdsall on NMR of proteins, in particular dihydrofolate reductase. After returning to Madrid he obtained a tenured scientist position, although he was allowed to move to Carnegie Mellon University, Pittsburgh, USA to work with Aksel Bothner-By on NMR methodology and then with Miguel Llinás on protein NMR (1990–92). After returning to Madrid again, he worked on molecular recognition, especially on protein–carbohydrate interactions, with particular emphasis on the application of NMR methods, but also using a variety of other techniques, from organic synthesis to modeling protocols and other biophysical techniques. In 1996, he was promoted to senior research scientist of CSIC at the Institute of Organic Chemistry and in 2002 to research professor of CSIC. Soon after that, he moved to the Centre for Biological Research (CIB-CSIC), Madrid, where he works in the Protein Science Department. He has coauthored more than 280 publications in international journals, has delivered more than 120 lectures at symposia and institutions, and, despite not working at a university, has tutored 14 Ph.D. students. He was awarded the Janssen-Cilag Prize in Organic Chemistry of the Royal Society of Chemistry of Spain (RSEQ) in 2003 and has served as the Secretary General of this institution (RSEQ) since 2004. He is a member of the editorial boards of Chemistry – A European Journal (from 2001 to date), Organic & Biomolecular Chemistry (from 2007 to date), Glycoconjugate Journal (from 2008 to date), Carbohydrate Research (from 2001 to date), Journal of Carbohydrate Chemistry (from 2002 to date), and European Journal of Organic Chemistry (starting in 2009).

9.08 Determination of Three-Dimensional Structures of Nucleic Acids by NMR
Nikolai B. Ulyanov and Thomas L. James, University of California, San Francisco, San Francisco, CA, USA
© 2010 Elsevier Ltd. All rights reserved.

9.08.1 Introduction
9.08.2 Sample Preparation
9.08.2.1 Chemical Synthesis of Oligonucleotides
9.08.2.2 Enzymatic Synthesis of RNA
9.08.2.3 Enzymatic Synthesis of DNA
9.08.2.4 Segmental Isotopic Labeling
9.08.3 Resonance Assignments
9.08.3.1 Spin System Assignments
9.08.3.2 Sequential Assignments
9.08.4 Extracting Structural Information
9.08.4.1 Detection of Hydrogen Bonds
9.08.4.2 Nuclear Overhauser Effects and Interproton Distances
9.08.4.3 Scalar Coupling Data
9.08.4.4 Residual Dipolar Couplings
9.08.4.5 Other Structural Restraints
9.08.5 Three-Dimensional Structure Refinement
References

9.08.1 Introduction

The majority of structures of biological macromolecules are solved in the solid state using X-ray crystallography. Nevertheless, nuclear magnetic resonance (NMR) has been established as a routine alternative method for high-resolution structure determination. As of May 2008, of the almost 4000 nucleic acid structures (3888) deposited in the Protein Data Bank (PDB),1 a formidable 24% had been determined by solution NMR methods (compared to 13% for proteins). Both methods have pros and cons, a detailed comparison of which is beyond the scope of this chapter but has been discussed elsewhere.2–4 The choice of method is dictated by many factors, not the least of which is the specific expertise of a particular research group, but also success or failure in growing crystals suitable for X-ray diffraction versus the availability of the significant time and resources required for structure determination by NMR. A major advantage of solid-state structures determined by crystallography is a defined spatial resolution, which can be used to assess the overall quality of structure determination. Solution structures determined by NMR do not have an intrinsic spatial resolution; their quality can instead be assessed by a number of approaches, some of which are discussed below. On the other hand, solution conditions often approximate the physiological state of a functional biomolecule more closely, while solid-state structures are sometimes distorted by crystal packing forces, a problem that can be especially severe for nucleic acids. In this chapter, we outline typical approaches to solving a DNA or RNA structure in solution, including sample preparation, resonance assignments, extraction of structural information, and refinement.

9.08.2 Sample Preparation

To determine a three-dimensional (3D) structure in solution, one typically needs to prepare several samples of a nucleic acid in milligram quantities. A successful structure determination often depends on a careful design of strategies for labeling the molecule with the isotopes 13C, 15N, and sometimes 2H. Isotopic labeling is necessary to overcome a severe overlap of 1H resonances in larger molecules and to facilitate the assignment of NMR resonances and the extraction of structural information by acquiring heteronuclear multidimensional NMR spectra.5–8 Oligodeoxyribonucleotides (DNA oligonucleotides) or oligoribonucleotides (RNA oligonucleotides) can be synthesized either chemically or enzymatically, both unlabeled and isotopically labeled.7,9,10

9.08.2.1 Chemical Synthesis of Oligonucleotides

The established and widely used method for oligonucleotide synthesis is based on phosphoramidite chemistry.11 Phosphoramidites are nucleotides (nts) with protection groups attached to each reactive group: amines, hydroxyls, and phosphates. In the case of RNA, the additional 2′-hydroxyl group is also protected. Starting with a 3′-terminal nucleoside attached to an insoluble polymeric support, phosphoramidite monomers are sequentially added to the growing oligonucleotide chain in the 3′→5′ direction. The 5′-terminal protection group is specifically removed before the addition of each new monomer. At the end of the synthesis, the oligonucleotide is cleaved from the support and all protection groups are removed. The oligonucleotide is then purified by polyacrylamide gel electrophoresis (PAGE), high-performance liquid chromatography (HPLC), or both. Using the solid support allows automation of each step (adding the reagents and washing out the reactants), which is controlled by a computer in modern DNA/RNA synthesizers. All reagents for the automated synthesis of unlabeled oligonucleotides are readily available. Furthermore, many companies offer reasonably priced custom synthesis of unlabeled DNA oligonucleotides up to 50–60 nts long, while chemical RNA synthesis, although also feasible, is almost an order of magnitude more expensive. The amount of a DNA oligonucleotide synthesized at the 1-µmol scale should be sufficient to prepare an NMR sample with a concentration of 0.5 mmol l⁻¹ in a volume of 0.25 ml. Isotopically labeled oligonucleotides can be prepared in a similar way using automated synthesizers with labeled phosphoramidites. This approach has serious advantages for NMR applications over enzymatic synthesis (see below), because it allows incorporation of labeled residues at specific positions in a DNA or RNA sequence.
However, not all labeled phosphoramidites are commercially available, and even the ones that are available are quite expensive. Labeled phosphoramidites can be prepared chemically from labeled nucleosides, which in turn can be either harvested from bacteria grown on labeled media or prepared by chemical or enzymatic coupling of ribose and nucleobases (see a review by Kojima et al.10 and references therein). One additional advantage of this approach is the possibility of introducing isotopes in specific positions instead of uniform labeling of a nucleoside, which can simplify NMR spectra of such samples significantly. Practically any position on the ribose moiety can be individually labeled with 13C (refs. 10, 12), and certain positions on nucleobases can also be labeled with 13C or 15N.13–15

9.08.2.2 Enzymatic Synthesis of RNA

DNA-dependent RNA polymerases from bacteriophages T3, T7, or SP6 (refs. 16–18) are a family of homologous, relatively small (about 100 kDa) single-subunit RNA polymerases that do not require additional protein factors for any stage of transcription, that is, initiation, elongation, or termination. These polymerases are easy to overexpress in Escherichia coli; they are very active, terminate less frequently (compared to the E. coli RNA polymerase), and initiate transcription very stringently from their own promoters (reviewed in Tabor19). Such properties make these polymerases a very convenient tool for in vitro RNA synthesis using a variety of experimental strategies.20 RNA polymerase from bacteriophage T7 (T7 RNAP) is perhaps the one used most commonly. Although T7 RNAP is available commercially, it is more cost-effective to purify it in-house for the synthesis of large quantities of RNA; a number of protocols for the purification of T7 RNAP overexpressed in E. coli have been published.9,21–23 Nucleoside triphosphates (NTPs) required for in vitro transcription are available commercially, both unlabeled and labeled, including uniformly 15N-labeled, doubly 13C/15N-labeled, and 2H-labeled. It is also possible to produce labeled NTPs in-house from ribosomal RNA extracted from bacterial cells grown in appropriately labeled media.24,25 The procedure involves hydrolyzing RNA down to nucleoside monophosphates (NMPs) and then phosphorylating them to NTPs. A number of strategies have been worked out in which E. coli cells are grown on variously labeled media so that isotopic labels are incorporated into specific positions in nucleosides. For example, using 13C-formate and 12C-glucose as carbon sources produces 13C isotopes incorporated specifically into the C8 positions of purines with more than 85% efficiency; see a review by Latham et al.8 and references therein. Normally, bacteriophage RNAP initiates and terminates RNA transcription at specific sequences (with certain efficiency).26 The most straightforward experimental setup for in vitro preparative RNA production, however, is so-called run-off transcription, in which RNA synthesis starts at a specific T7 promoter and ends when RNAP falls off the physical end of the DNA template.20,27 This setup allows for a convenient preparation of DNA promoter and template sequences, which can be chemically synthesized. Furthermore, it has been found that the DNA template does not have to be fully base paired; most or even the entire coding strand can remain single-stranded.27 The minimum base-paired region spanning positions from –15 to –3 (where +1 denotes the transcription start residue) is still sufficient for fully efficient transcription with T7 RNAP.27 This is consistent with the notion of forming the transcription bubble as a part of the process of promoter recognition: in the crystal structure of T7 RNAP with the open promoter, the template and nontemplate strands are unwound downstream starting with position –4; reviewed in Cheetham and Steitz.28 This property allows preparing a single universal DNA top strand for the transcription of all RNA sequences; only the bottom coding DNA strand needs to be redesigned each time (Figure 1). Natural T7 RNAP promoter sequences are strongly conserved from position –17 to position +6.29 Nevertheless, RNA is also transcribed from promoters with altered positions +1 to +6, though possibly with a decreased yield.
As a rule, the highest yield is achieved for RNA sequences starting with GG or GA.27 To improve the yield, the transcription reaction conditions need to be optimized for each RNA sequence, which includes varying the MgCl2 concentration and the relative amounts of T7 RNAP, template, and NTPs.9 In addition to the correct RNA fragment and several shorter abortive products produced by in vitro transcription, T7 RNAP also often incorporates a nontemplated nucleotide at the 3′-end of the main transcript,27 creating a so-called ‘n + 1 product’. The desired RNA product then needs to be separated from the incorrect-size transcripts and also from the DNA template, unused NTPs, and RNAP. The RNAP is sometimes removed from the reaction mixture by phenol–chloroform extraction; then the RNA is purified most often by denaturing PAGE at single-nucleotide resolution, but also sometimes using HPLC with anion-exchange30 or gel-filtration31 columns. The chromatographic methods can be advantageous both because PAGE purification is the most time-consuming step of RNA preparation and because acrylamide oligomers often contaminate the final RNA sample and complicate the NMR spectrum.31 Also, gel-filtration chromatography does not require denaturing of RNA, which may be critical in some cases. However, column purification does not separate the n from the n + 1 transcripts, which necessitates different approaches to avoid 3′-end heterogeneity. The amount of acrylamide oligomers remaining after the elution of RNA from the polyacrylamide gel can be significantly reduced by repeated (5–6 times) ethanol precipitation (Z. Du, personal communication). The relative amounts of unwanted products strongly depend on the RNA sequence; sometimes the yield of the n + 1 product can be greater than that of the main transcript.
Figure 1 (sequence panel; positions –15, –10, –5, and +1 are marked above the promoter in the original figure):

Top strand:     5′-TAATACGACTCACTATAG
Bottom strand:  3′-ATTATGCTGAGTGATATCTGCCGAACGACATGCGCCGTTCTCCGCAG
RNA product:    pppGACGGCUUGCUGUACGCGGCAAGAGGCGUC

Figure 1 Design of the promoter and template DNA sequences for the in vitro transcription with T7 RNAP. The double-stranded portion of the promoter is numbered relative to the transcription start site (+1). The coding portion of DNA is shown in bold; the RNA product is shown in italics.

The mechanism of the nontemplated nucleotide addition by bacteriophage RNAP is poorly understood; it is likely that the stability of the hybrid between the template DNA and the nascent RNA is implicated. Sometimes, redesigning the sequence at the 3′-end of the RNA can significantly reduce the amount of the n + 1 product. A very useful strategy for reducing the nontemplated nucleotide addition is modification of deoxyribonucleotides near the 5′-end of the template DNA. Kao et al.32 found that introducing 2′-O-methyl groups in the ribose in the penultimate or the last two positions of the template dramatically reduces the amount of the n + 1 transcripts. An alternative approach to produce homogeneous termini in RNA is to chemically synthesize short chimeric oligonucleotides, consisting of DNA residues and residues with 2′-O-methyl ribose modifications, that are complementary to the RNA region just upstream of the desired 3′-terminus. When such a chimeric oligonucleotide is hybridized to RNA, it directs a site-specific cleavage of RNA by RNase H, producing a precise 3′-terminus.33,34 Another method to avoid 3′-end heterogeneity is to use a hammerhead ribozyme (HHR) that cleaves RNA at a specific site.35 This method is gaining popularity, especially for larger RNAs; see, for example, Kim et al.,36 Tzakos et al.,37 and Easton and Lukavsky.38 The HHR folds into a three-way junction structure.39 Its catalytic center has conserved unpaired residues at the junctions, but there is very little sequence requirement for base pairs in the three stems. The autocatalytic cleavage depends on the presence of Mg2+ and occurs downstream of the nucleotide denoted H in Figure 2(a). This could be any residue except G, although the highest cleavage rates are observed in ribozymes with sequences GUG, GUA, AUA, and AUC just upstream of the cleavage site.40 The 5′-fragment produced as a result of the cleavage has a 2′–3′ cyclic phosphate group at its 3′-terminus, and the 3′-fragment has a hydroxyl group at its 5′-terminus.39,41 The RNA substrate can be either added in trans or designed attached to the ribozyme. To produce a homogeneous 3′-end after the cleavage, the HHR can be designed in cis at the 3′-end of the transcript35 (Figure 2(b)). RNA of practically any length and sequence can be prepared via in vitro transcription with T7 RNAP.20 However, for larger RNAs, using chemically synthesized DNA templates becomes less practical because of the exponential decrease in the yield of the template with the oligonucleotide length.
For templates greater than 50–100 nt, an alternative method is to use a fully double-stranded DNA template designed within a linearized high-copy DNA plasmid, for example, pUC18 (ref. 31). Another potential complication in the preparation of large RNAs is the denaturation that RNA undergoes during PAGE purification. The RNA needs to be refolded into its native conformation after such purification, which may sometimes be problematic; see, for example, Uhlenbeck.42 To alleviate this potential problem, several nondenaturing methods of RNA purification have been proposed, including gel-filtration columns31 and various affinity tag purification strategies.43–45 As an example of the latter, one affinity tag purification strategy used the Ffh M-domain protein from the signal recognition particle (SRP) of Thermotoga maritima coupled to an Affigel-10 matrix.44 The designed RNA included at its 3′-terminus a duplicated T. maritima SRP RNA, which forms a high-affinity complex with the M-domain protein. The SRP RNA was separated from the RNA of interest (at the 5′-terminus of the transcript) by the C75U mutant hepatitis delta virus ribozyme, which is activated by imidazole. For the purification, the transcription reaction mixture was loaded onto the M-domain protein affinity column and washed, and then the RNA of interest was released by adding an imidazole-containing buffer. In another variant of affinity tag purification,45 the specific RNA–protein interaction was achieved by using the coat protein of bacteriophage MS2, which binds with high affinity to a short RNA hairpin.46 The MS2 coat protein was fused with a histidine-tagged maltose binding protein, so that a traditional Ni2+-affinity column could be used for the immobilization of the RNA transcript. The cleavage and release of the RNA of interest was achieved by using another ribozyme that is activated by a small molecule, glucosamine-6-phosphate.47
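The exponential decrease in the yield of chemically synthesized templates with length, mentioned above, can be illustrated with a simple stepwise-coupling model. This is only a sketch: the function name and the per-step coupling efficiency of 99% are illustrative assumptions, not values given in this chapter, and real yields also depend on deprotection and purification losses.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float = 0.99) -> float:
    """Fraction of chains that reach full length in stepwise solid-phase synthesis.

    Assumption (not from the chapter): every coupling step succeeds with the
    same probability, and the first nucleoside is already attached to the
    support, so an n-mer requires n - 1 couplings.
    """
    if length_nt < 1:
        raise ValueError("length_nt must be at least 1")
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be in (0, 1]")
    return coupling_efficiency ** (length_nt - 1)


if __name__ == "__main__":
    # Compare a short promoter top strand with progressively longer templates.
    for n in (18, 50, 100, 150):
        print(f"{n:4d} nt: {full_length_fraction(n):.2f} of chains are full length")
```

Under this model the full-length fraction falls geometrically with every added residue, which is why plasmid-based templates become attractive beyond roughly 50–100 nt.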

Figure 2 Hammerhead ribozyme. The filled triangle denotes the cleavage site. (a) An example of the sequence in the catalytic center of the HHR. Stems I, II, and III are numbered; thin lines show Watson–Crick base pairs. ‘H’ stands for any nucleotide except G. (b) Design of the HHR at the 3′-end of the RNA transcript. After the self-cleavage, the RNA of interest (shown in a double line) has a homogeneous 3′-terminus with the 2′–3′ cyclic phosphate. (c) HHR at the 5′-end of the RNA transcript producing a homogeneous 5′-terminus with the terminal OH group in the RNA of interest.


Finally, certain abundant RNA molecules, such as tRNA, can be overexpressed and purified in quantities sufficient for NMR samples directly from E. coli cells; the RNA can be prepared unlabeled or isotopically labeled when grown on appropriate media.8,48
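The run-off transcription design of Figure 1 (one universal top strand, with only the bottom strand redesigned for each RNA) can be sketched in a few lines. The promoter sequence is the top strand shown in Figure 1; the function names are hypothetical, and the sketch assumes that the transcript starts with G at position +1, as in the figure.

```python
# Top (nontemplate) strand from Figure 1, written 5'->3'; the final G is position +1.
T7_TOP_STRAND = "TAATACGACTCACTATAG"

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bottom_strand_3to5(rna: str) -> str:
    """Return the bottom (template) DNA strand, written 3'->5' as in Figure 1."""
    rna = rna.upper()
    if not rna or set(rna) - set("ACGU"):
        raise ValueError("RNA sequence must be non-empty and contain only A, C, G, U")
    if not rna.startswith("G"):
        # Assumption of this sketch: the +1 residue is the G of the top strand.
        raise ValueError("this sketch assumes a transcript starting with G at +1")
    # Sense strand = promoter positions -17..-1 followed by the RNA written as DNA.
    sense = T7_TOP_STRAND[:-1] + rna.replace("U", "T")
    # Complementing without reversing gives the partner strand read 3'->5'.
    return sense.translate(_DNA_COMPLEMENT)


def bottom_strand_5to3(rna: str) -> str:
    """Same strand in the 5'->3' orientation normally used when ordering an oligonucleotide."""
    return bottom_strand_3to5(rna)[::-1]


if __name__ == "__main__":
    figure1_rna = "GACGGCUUGCUGUACGCGGCAAGAGGCGUC"
    print("3'-" + bottom_strand_3to5(figure1_rna) + "-5'")
```

Applied to the RNA product of Figure 1, the function reproduces the bottom strand shown there.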

9.08.2.3 Enzymatic Synthesis of DNA

While enzymatic synthesis of RNA is cost-effective compared to chemical synthesis even for unlabeled molecules, enzymatic synthesis of DNA is more expensive because of the higher costs of deoxyribonucleotide triphosphates (dNTPs), and it is used almost exclusively to produce isotopically labeled DNA samples. Several methods have been proposed and used to enzymatically synthesize labeled DNA for NMR studies. These methods can be divided into two groups: in vitro primer extension methods, and growth of bacterial cells carrying a plasmid with the DNA fragment of interest on isotopically labeled minimal media. The most common setup for the primer extension reaction makes use of the Klenow fragment.49–53 The Klenow fragment is a fragment of the E. coli DNA polymerase I devoid of the 5′→3′ exonuclease activity but retaining the 5′→3′ polymerase and 3′→5′ exonuclease activities.54 The polymerization reaction requires a single-stranded DNA template and either a DNA or RNA primer; the two can be combined in a single chemically synthesized hairpin construct (Figure 3). In contrast to RNA transcription, the DNA product remains covalently attached to the primer. If the 3′-terminal residue of the primer is a ribonucleotide, then the DNA product can be easily cleaved off from the primer by incubation under alkaline conditions. Zimmer and Crothers49 found that the DNA yield is higher when using a mutant Klenow fragment that is additionally devoid of the 3′→5′ exonuclease activity;55 however, this enzyme can produce longer DNA products extending beyond the template for certain sequences.
To remove the nontemplated residues, the wild-type Klenow fragment with the intact 3′→5′ exonuclease activity can be added before the alkaline cleavage of the DNA product from the primer.56 Other DNA polymerases can be used instead of the Klenow fragment with a similar setup, such as Taq DNA polymerase56 and Moloney murine leukemia virus reverse transcriptase.57 The primer extension methods produce a single-stranded DNA product; to prepare a double-stranded DNA, each strand needs to be synthesized separately. The amount of the DNA product is limited by the amount of the template and primer introduced into the reaction, so they also need to be prepared in milligram quantities. In a different setup, a fragment of double-stranded DNA is amplified by DNA polymerase in a polymerase chain reaction (PCR) during repeated thermal cycling.58–61 The DNA sequence must be designed with flanking sites for a restriction enzyme. The procedure starts with preparing a chemically synthesized double-stranded DNA in which the sequence of interest is directly repeated two times. Most often, this tandem repeat is used both as a template and as self-primers in PCR, which leads both to the amplification of the quantity of DNA and to the amplification of the number of repeats in a process called endonuclease-sensitive repeat amplification (ESRA).58 Bidirectional primers can also be used.59 The PCR is run in two steps, with the concentration of dNTPs increased for the second step. Also, adding single-repeat DNA as additional primers for the second step can increase the final DNA yield twofold.60 In the end, the multi-repeat product is cut with the restriction enzyme to produce single-repeat double-stranded DNA (possibly with overhangs, depending on the restriction enzyme used). The above methods require one of the DNA polymerase enzymes and labeled dNTPs; both are available commercially or can be prepared in-house. In any case, the costs of making a labeled DNA sample are higher than for RNA, because of the higher costs of labeled dNTPs compared to NTPs. For example, Feigon and coworkers reported that growing Methylophilus methylotrophus bacteria on labeled media at optimized conditions yielded 1.5 g NMPs and 0.5 g dNMPs from 31 l of culture.56 Finally, double-stranded DNA can be directly amplified by cloning it in a high-copy number plasmid and growing E. coli cells in a medium with 13C-labeled glucose and 15NH4Cl as the only sources of carbon and nitrogen, respectively.58,62 Prior to growing bacteria on the labeled media, the repeat number of the DNA is amplified using the ESRA procedure (see above), and stable clones with multiple repeats are selected. It has been noted that stable cloning of sequences with multiple repeats may not be easily achieved.58 This method does not require preparing or purchasing labeled dNTPs.

Figure 3 (sequence panel; in the original figure the two strands are joined at the left end by a hairpin loop whose residues appear as G, A, A, G):

Primer and product:  5′-GGATCUcctaattataacgaagttagttagtacattagg-3′
Template:            3′-CCTAGAGGATTAATATTGCTTCAATCAATCATGTAATCC-5′

Figure 3 Design of the DNA template and primer for the in vitro primer extension synthesis of DNA with the Klenow fragment. The product DNA (shown in lower case italics) is covalently attached to the 3′-terminus of the primer. For convenience, the template and primer are combined in a single monomolecular hairpin construct in this example. The primer ends with a single RNA residue (boxed); the site of the alkaline cleavage is shown with an arrow. If the RNA residue is placed several positions upstream in the primer, the portion of the primer downstream of that ribonucleotide will remain attached to the product DNA after the alkaline cleavage, and will therefore remain unlabeled.
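The primer extension design of Figure 3, including the effect of moving the ribonucleotide upstream in the primer (Figure 3 legend), can be sketched as follows. The function name and argument layout are hypothetical; the sketch assumes that alkaline cleavage occurs on the 3′ side of the single ribonucleotide, so that the ribonucleotide stays with the primer and any primer residues downstream of it stay with the product.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def primer_extension_product(primer: str, ribo_index: int, template_overhang_3to5: str):
    """Return (unlabeled_primer_tail, labeled_extension), both written 5'->3'.

    primer                  primer sequence, 5'->3'
    ribo_index              0-based position of the single ribonucleotide in the primer
    template_overhang_3to5  single-stranded part of the template, written 3'->5'
    """
    primer = primer.upper()
    overhang = template_overhang_3to5.upper()
    if not 0 <= ribo_index < len(primer):
        raise ValueError("ribo_index must point inside the primer")
    if set(overhang) - set("ACGT"):
        raise ValueError("template overhang must contain only A, C, G, T")
    # Residues downstream of the ribonucleotide remain on the product after cleavage.
    unlabeled_tail = primer[ribo_index + 1:]
    # The polymerase copies the overhang; reading the template 3'->5' and
    # complementing each base gives the new strand 5'->3'.
    labeled_extension = overhang.translate(_DNA_COMPLEMENT)
    return unlabeled_tail, labeled_extension


if __name__ == "__main__":
    tail, product = primer_extension_product(
        "GGATCU", 5, "GGATTAATATTGCTTCAATCAATCATGTAATCC"
    )
    print(repr(tail), product.lower())
```

With the ribonucleotide at the 3′-terminus of the primer, as in Figure 3, the unlabeled tail is empty and the whole product comes from the labeled dNTPs.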

9.08.2.4 Segmental Isotopic Labeling

The main purpose of isotopic labeling is overcoming severe resonance overlap in larger molecules by conducting multidimensional heteronuclear experiments.63 However, for highly repetitive sequences and with increase in molecular weight, the overlap in resonances catches up with these methods and again becomes severe. Preparing multiple samples where different parts of the molecule are labeled one at a time can significantly simplify NMR spectra, because it allows acquiring NMR signals only from the labeled portions. Alternatively, parts of the molecule could be deuterated to make them ‘invisible’ to NMR. Chemical synthesis of oligonucleotides (see above) is the most flexible approach in this respect, because it allows incorporation of labeled residues in arbitrary positions,64,65 although, for large oligonucleotides this method can be prohibitively expensive. With the enzymatic synthesis, a straightforward approach is a type-specific isotopic labeling, for example, preparing nucleic acid molecules with all G’s labeled but the rest of the residues unlabeled, or with only A’s and U’s labeled.66–68 However, this approach only partially solves the problem, because residues of the same type tend to have overlapping resonances. Segmental labeling of enzymatically synthesized RNA molecules involves ligation of two (or potentially more) fragments, one isotopically labeled, and another unlabeled. Two strategies of RNA ligation have been reported for preparing milligram quantities of the product. In one, T4 DNA ligase was used to ligate two RNA fragments annealed to a continuous complementary DNA.34 Creating oligomeric RNA molecules, circularization or joining RNA molecules in incorrect orientation is prevented in this approach, because precise base pairing to the cDNA at the junction is critical for the RNA–RNA ligation with T4 DNA ligase. 
In another approach, T4 RNA ligase is used to join together two RNA oligonucleotides.36 T4 RNA ligase catalyzes formation of the 3′,5′-phosphodiester linkage between one RNA fragment with a monophosphate at the 5′-terminus and another fragment with a hydroxyl group at the 3′-terminus.69 Preparing the correct termini on RNA fragments is possible when using the HHR.36 To prepare the 5′-fragment, the HHR is placed at the 5′-end of the transcript (Figure 2(c)). After the cleavage, this fragment has a hydroxyl group at the 5′-terminus, which is not a substrate for T4 RNA ligase. The 3′-terminus of this fragment, produced by T7 RNAP, also has a hydroxyl group, which is a valid substrate for T4 RNA ligase. To prepare the 3′-fragment, the HHR is placed at the 3′-end of the transcript (Figure 2(b)). After the cleavage, this fragment has a cyclic 2′–3′ phosphate at its 3′-terminus, which is not a substrate for T4 RNA ligase. The 5′-terminus of this fragment normally has a triphosphate, produced by T7 RNAP; however, by priming the transcription reaction with GMP, the 5′-triphosphate is replaced with a monophosphate70 and becomes a valid substrate for T4 RNA ligase. An alternative method to prepare correct termini for the ligation with T4 RNA ligase is to dephosphorylate both ends of the 5′-fragment with E. coli alkaline phosphatase and phosphorylate both ends of the 3′-fragment with T4 polynucleotide kinase.71,72 For the segmental isotopic labeling of DNA, a variant of the primer extension method with the Klenow fragment (see above) has been used.51,52 For example, to label only the 3′-part of the DNA molecule, the ribonucleotide within the chemically synthesized primer is placed not at the 3′-terminus but several residues upstream, such that after the alkaline cleavage, part of the primer (unlabeled) is included in the DNA product (see Figure 3 legend). To label the 5′-part of DNA, the DNA is produced in two steps.
At first, the labeled part is synthesized as usual with the primer extension method, cleaved off, and purified from the primer. Then, it is annealed to another, longer DNA template and extended again with the Klenow fragment using unlabeled dNTPs. Because the product and the template have exactly the same length after the second step, the template can be biotinylated if it needs to be separated from the product.51
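The terminus requirements for T4 RNA ligase described earlier in this section (a 3′-hydroxyl on one fragment and a 5′-monophosphate on the other) can be checked with a small sketch. The class, the function names, and the string labels for the end groups are hypothetical conventions introduced only for this illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RnaFragment:
    name: str
    five_prime: str   # "OH", "P" (monophosphate), or "PPP" (triphosphate)
    three_prime: str  # "OH" or "cyclic-P" (2'-3' cyclic phosphate)


def can_ligate(acceptor: RnaFragment, donor: RnaFragment) -> bool:
    """True if the 3'-end of `acceptor` can be joined to the 5'-end of `donor`."""
    return acceptor.three_prime == "OH" and donor.five_prime == "P"


def possible_junctions(fragments):
    """List every (acceptor, donor) pair that could react, including self-ligation."""
    return [(a.name, d.name) for a, d in product(fragments, repeat=2) if can_ligate(a, d)]


if __name__ == "__main__":
    # 5'-fragment: 5'-OH from hammerhead cleavage, 3'-OH from run-off transcription.
    five = RnaFragment("5'-fragment", five_prime="OH", three_prime="OH")
    # 3'-fragment: 5'-monophosphate from GMP priming, 2'-3' cyclic phosphate at the 3'-end.
    three = RnaFragment("3'-fragment", five_prime="P", three_prime="cyclic-P")
    print(possible_junctions([five, three]))
```

Enumerating all ordered pairs, including a fragment with itself, makes it explicit that the end groups described in the text leave only the desired junction and exclude circularization and oligomerization.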

9.08.3 Resonance Assignments

Assigning resonance frequencies to specific nuclei is a critical step in structure determination. Obviously, errors in resonance assignments lead to errors in the resulting structures, which can sometimes be severe, such as incorrectly folded structures, and sometimes subtle. Unfortunately, there are presently no robust tools for finding possible errors in assignments. Because of that, structure determination must be an iterative process: after the structures are calculated, it is necessary not only to calculate average figures of merit (see below), but also to examine individual violations of structural restraints, as some of them may be due to misassignments. Even though structures of shorter nucleic acids can be solved by purely homonuclear NMR methods, isotopic labeling and heteronuclear NMR methods allow for more reliable and more complete assignments and for measurements of a greater number of, and qualitatively different kinds of, experimental structural restraints, which in turn improves the accuracy of the solution structure. Assignment strategies for nucleic acids have been discussed in great detail in numerous reviews,7,8,63,73–75 so we will cover them briefly, with some emphasis on less discussed topics.

9.08.3.1 Spin System Assignments

The assignment process involves identification of spin systems within each residue and sequential assignments. A number of experiments are available for identifying the spin systems. Proton pairs directly connected via scalar through-bond J-coupling interactions can be revealed in homonuclear two-dimensional (2D) COSY spectra;76 multi-step J-coupling interactions can be detected in 2D TOCSY experiments.77–79 In contrast to the antiphase multiplet structure of COSY peaks, TOCSY peaks have a simple in-phase structure, and therefore they have a better signal-to-noise ratio. While many intra-sugar proton correlations can potentially be detected in 2D COSY and TOCSY spectra, the resonance dispersion is particularly favorable in two spectral regions. One correlates base H6 and H5 protons for cytosines in DNA and for cytosines and uracils in RNA. Both H6 and H5 resonances have good dispersion, so this spectral region is also often used to assess the general quality of a sample, identify possible impurities, conformational species, and so on. Another spectral region correlates anomeric H1′ protons with the rest of the sugar protons. This is especially useful for DNA, where most sugars have predominantly C2′-endo puckers with both H1′–H2′ and H1′–H2″ J-couplings in the 5–10 Hz range.80,81 In contrast, sugars in helical regions of RNA have C3′-endo puckers with J(H1′–H2′) below 3 Hz; therefore, these peaks are only observable for flexible residues and nonstandard conformations (Figure 4(a)). A complete set of such correlations (Figure 4(b)) can be observed in 13C-labeled RNA molecules by taking advantage of relatively large C–H and C–C J-couplings.
The HCCH-COSY, HCCH-COSY-RELAY, and HCCH-TOCSY experiments, with a single, double, and multi-step COSY-type transfer of magnetization, respectively, between neighboring 13C nuclei, can be run either in a 2D version showing only correlations between the protons, or in a 3D version with 13C selection.84–87 The HCCH-TOCSY experiment can also be used to correlate H8 and H2 protons in adenines, taking advantage of relatively small (8–10 Hz) two-bond carbon–carbon couplings,88–91 and its variant, HCCCH-TOCSY, can be used to correlate aromatic H6 protons with methyl groups in thymines in DNA.92 The correlations between H6 and methyl groups in thymines are also usually observed in homonuclear 2D TOCSY spectra (Figure 5(a)), even though this four-bond J-coupling is very small, approximately 1 Hz.92 It is likely that these cross-peaks are observed via residual rotating frame cross-relaxation (ROESY) rather than through-bond J-coupling interactions, despite the fact that the ROESY effect is minimized in modern ‘clean’ TOCSY pulse sequences.79 Indeed, cross-peaks in 2D TOCSY can also be observed between H1′ and H8 in residues with the syn conformation around the glycosidic bond (Figure 5(b)) and even occasionally for sequential H2′(i)–H6/H8(i + 1) correlations in RNA (Figure 5(c)), where the corresponding interproton distance is very short (see below). The signal-to-noise ratio for such peaks is very low; therefore, it is not recommended to rely solely on such data during the assignments, but rather to use them in combination with other assignment methods.
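The relation between the H1′–H2′ coupling and the sugar pucker described above can be turned into a rough screening rule. This is a minimal sketch using only the ranges quoted in the text (below 3 Hz for C3′-endo, 5–10 Hz for C2′-endo); the function name and the treatment of values outside these ranges as unassigned are assumptions, and a real analysis would consider several couplings and conformational averaging.

```python
def classify_sugar_pucker(j_h1p_h2p_hz: float) -> str:
    """Rough pucker class from the H1'-H2' scalar coupling (Hz).

    Thresholds follow the ranges quoted in the text; values that fall outside
    both ranges are reported as unassigned rather than forced into a class.
    """
    if j_h1p_h2p_hz < 0:
        raise ValueError("coupling magnitude must be non-negative")
    if j_h1p_h2p_hz < 3.0:
        return "C3'-endo"
    if 5.0 <= j_h1p_h2p_hz <= 10.0:
        return "C2'-endo"
    return "unassigned (intermediate or averaged)"


if __name__ == "__main__":
    # Hypothetical coupling values for three residues.
    for residue, j in (("G1", 1.2), ("U9", 8.1), ("C12", 4.0)):
        print(residue, j, classify_sugar_pucker(j))
```

Residues that fall in the C3′-endo class are the ones for which the homonuclear COSY H1′–H2′ cross-peak is expected to be weak or absent.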


Figure 4 2D double quantum-filtered homonuclear COSY (a) and 2D version of HCCH-COSY (b) spectra of a 34-nt RNA from the stem-loop IV domain of the Enterovirus internal ribosome entry site;68 the nucleotide sequence is shown in the inset. Both axes, ω1 and ω2, are 1H chemical shifts (ppm). Positive components of the multiplet peaks are shown in red and negative components are shown in green. COSY H2′–H1′ cross-peaks, labeled in (a), are only observed for highly flexible residues associated with a 6-nt internal loop, for the 3′-terminal residue G34 and for U20 from the tetraloop GUGA. In contrast, all residues show H1′–H2′ cross-peaks in the HCCH-COSY spectrum. The spectrum was acquired with the spectral width of 1800 Hz in both dimensions; the symmetric region of the spectrum additionally contains aliased H6–H5 cross-peaks (not shown). All spectra shown in this chapter were acquired on a 600 MHz Varian Inova spectrometer, processed with NMRPipe/NMRDraw82 and annotated with the Sparky program.83

A series of through-bond experiments has been developed to correlate exchangeable protons with aromatic base protons: imino H1 proton with H8 proton in guanines, imino H3 proton with H5 and H6 in uracils, amino protons with H6 in cytosines and amino protons with H2 and H8 in adenines in uniformly 13C,15N-labeled RNA;90,94–99 see also discussion of these experiments in Furtig et al.7 Finally, aromatic protons H6 and H8 can be correlated with anomeric H19 protons within the same residue (Figure 6) by establishing H6/H8–C6/C8–N1/N9 and H19–C19– N1/N9 connectivities via triple-resonance HCN experiments.100,102–106 These experiments can be run either in 2D or 3D versions; in many cases, the N1/N9 resonance dispersion is sufficient to establish unambiguous H6/ H8–H19 correlations by acquiring the 1H,15N plane in the 2D version. The aromatic-to-anomeric proton correlations are very important for establishing sequential assignments using NOESY spectra (see below). 9.08.3.2

Sequential Assignments

Rigorous sequential assignments, that is, correlating nuclei in neighboring residues, require transferring magnetization along the backbone, including the phosphorus nucleus. Various variants of triple-resonance HCP experiments have been developed for this purpose.107–111 The connections between sequential residues are established in these experiments by correlating the C49 and H49 nuclei with phosphorus in the same (n) and the 39-neighboring (n þ 1) residues. Observing these correlations may be sometimes problematic, especially for helical regions, because of a limited resonance dispersion for C49 and H49. However, for residues in ‘unusual’ conformations, such as in internal loops, the dispersion of these nuclei is markedly better. In addition, a related HCP-CCH-TOCSY experiment extends the magnetization transfer to the C19 and H19 nuclei, which are much better resolved.112 Besides, sequential connectivities can be established by correlating phosphorus with protons in unlabeled nucleic acids, using such experiments as

Determination of Three-Dimensional Structures of Nucleic Acids by NMR

Figure 5 Cross-peaks in homonuclear 2D TOCSY spectra arising due to ROESY effects. ‘Clean’ TOCSY spectra were acquired with the MLEV-17 spin-lock sequence. (a) Base proton H6-to-methyl correlations in a 27-nt AT-rich DNA stem-loop structure;93 the spectrum was recorded with the 50-ms mixing sequence. (b) and (c) TOCSY spectra acquired for a 31-nt stem-loop RNA (unpublished data). (b) H5–H6 cross-peaks in pyrimidines and a H1′–H8 cross-peak (boxed) in the syn guanine from the tetraloop UACG; the spectrum was recorded with the 30-ms mixing sequence. (c) Sequential H2′–H6/H8 cross-peaks; the spectrum was recorded with the 90-ms mixing sequence. Axes in all panels: ω1–1H (ppm) versus ω2–1H (ppm).

HETCOR113 or hetero-TOCSY;114,115 see also a review by Pardi87 and references therein. Unfortunately, these methods are not routinely used for structure determination of nucleic acids, except for relatively short oligonucleotides, because of the relatively small chemical shift dispersion of 31P and its fast relaxation via the chemical shift anisotropy mechanism. The main method for establishing sequential assignments still remains the one based on the through-space dipolar interactions between protons within a short distance of each other. Such interactions give rise to nuclear Overhauser effects (NOE), which can be recorded in 2D or 3D NOESY experiments.116–118 This method deals with nonexchangeable protons; therefore, the spectra are recorded with the sample in D2O. If possible, the sample needs to be lyophilized and dissolved in high-grade D2O to avoid the necessity of any water suppression, which can lead to disappearance of cross-peaks for protons resonating close to the water proton resonance frequency (H3′ in DNA, H2′ and H3′ in RNA). Strictly speaking, the NOE-based method of sequential assignments is not rigorous, because it requires some assumptions about the structure of DNA or RNA. In the worst-case scenario, incorrect structural assumptions may lead to seemingly self-consistent, but still erroneous assignments. Fortunately, structured nucleic acids have the majority of residues in right-handed helical conformations, where the presence or absence of certain NOE cross-peaks does not depend on the details of the structure. NOE-based sequential

Figure 6 Correlations between H6 and H1′ protons established via common N1 nitrogens in a 38-nt RNA construct from the consensus stem D of the cloverleaf domain of 5′-untranslated region of enteroviruses; the nucleotide sequence is shown in the inset of Figure 11. Two 2D (1H,15N) versions of multiple-quantum HCN experiments100 were acquired with optimization for the H6/H8–C6/C8–N1/N9 transfer (left panel) and for the H1′–C1′–N1/N9 transfer (right panel). Correlations for pyrimidine residues are shown; purine N9 nuclei resonate downfield between 168 and 172 ppm (not shown). Axes in both panels: ω1–15N (ppm) versus ω2–1H (ppm). Reproduced with permission from Z. Du; J. Yu; N. B. Ulyanov; R. Andino; T. L. James, Biochemistry 2004, 43, 11959–11972, Copyright (2004) American Chemical Society.

assignments for helical regions are based on the fact that many sugar protons are within the NOE distance from H6 or H8 aromatic protons from the same residue and from the downstream residue (Table 1), but not from the upstream residue. Anomeric H1′ protons are most useful for this purpose, because of better resonance dispersion for these protons; this spectral region is sometimes called a fingerprint region for nucleic acids. Figure 7 shows an outline of a fragment of the assignment ‘walk’ (H1′A4, H8A4)–(H1′A4, H8G5)–(H1′G5, H8G5), and so forth. Establishing this walk is greatly facilitated by the HCN correlations (see above), which help distinguish intra- and inter-residue H1′–H6/H8 NOE cross-peaks. These connectivities can be interrupted between neighboring nts lacking stacking interactions or with nonstandard stacking interactions, such as in internal and apical loops. Therefore, it is useful to have several starting points for the assignment walk. In addition to 5′- and 3′-ends of the oligonucleotide, starting points for assignments can be found based on specific features of the nucleotide sequence that give rise to specific patterns in NOESY spectra. For example, all instances of two neighboring pyrimidines can be located with the help of H5 protons. Indeed, in addition to the cross-peaks H1′(n)–H6(n), H1′(n)–H6(n + 1), and H1′(n + 1)–H6(n + 1), common to all residues, the consecutive pyrimidines exhibit two strong cross-peaks H5(n)–H6(n) and H5(n + 1)–H6(n + 1) and a medium cross-peak H5(n + 1)–H6(n) in the same spectral region; two medium-to-strong cross-peaks H5(n)–H5(n + 1) and H1′(n)–H5(n + 1) are observed in the region of the anomeric diagonal, and a weak-to-medium cross-peak H6(n)–H6(n + 1) is observed in the region of the aromatic diagonal. In all cases, the diagonal regions have plenty of NOE cross-peaks that are useful not only for assignments but also for extraction of structural information (see below).
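The logic of the H1′–H6/H8 walk can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peak list, its chemical shifts, the matching tolerance, and the function name `trace_walk` are all hypothetical, and real spectra require handling of overlap and missing peaks. The walk alternates between moving along a shared H1′ frequency (sequential peak H1′(n)–H6/H8(n + 1)) and moving along a shared aromatic frequency (intra-residue peak of residue n + 1).

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (H1' shift, H6/H8 shift), both in ppm


def same(a: float, b: float, tol: float) -> bool:
    """True if two chemical shifts agree within the tolerance."""
    return abs(a - b) <= tol


def trace_walk(peaks: List[Peak], start: Peak, tol: float = 0.01) -> List[Peak]:
    """Follow the walk downstream from an intra-residue start peak.

    Odd steps keep the H1' shift and change the aromatic shift
    (sequential peak); even steps keep the aromatic shift and change
    the H1' shift (intra-residue peak of the next residue).
    """
    path = [start]
    unused = [p for p in peaks if p != start]
    match_h1 = True
    while True:
        cur = path[-1]
        axis = 0 if match_h1 else 1
        candidates = [p for p in unused if same(p[axis], cur[axis], tol)]
        if len(candidates) != 1:  # dead end or ambiguity: stop the walk
            break
        path.append(candidates[0])
        unused.remove(candidates[0])
        match_h1 = not match_h1
    return path


# Invented shifts for a four-residue helical segment (illustration only).
peaks = [
    (6.05, 8.05),  # residue 1 intra-residue
    (6.05, 7.30),  # H1'(1)-H8(2), sequential
    (5.70, 7.30),  # residue 2 intra-residue
    (5.70, 7.75),  # H1'(2)-H6(3), sequential
    (5.55, 7.75),  # residue 3 intra-residue
    (5.55, 7.85),  # H1'(3)-H6(4), sequential
    (5.60, 7.85),  # residue 4 intra-residue
]
walk = trace_walk(peaks, start=peaks[0])
for h1, arom in walk:
    print(f"H1' {h1:.2f} ppm / H6-H8 {arom:.2f} ppm")
```

The sketch stops as soon as a step has no candidate or more than one, which mirrors the practical advice above: interrupted or ambiguous connectivities are the reason several independent starting points are useful.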
It is important to remember that modern spectrometers are more sensitive, and larger molecules tumble in solution more slowly, which leads to more effective spin diffusion. Because of that, NOE cross-peaks can sometimes be observed at an interproton distance well above 5 Å. For example, at higher mixing times, sequential H1′–H1′ cross-peaks and even cross-strand H1′–H1′ cross-peaks can be observed in the region


Table 1 Typical interproton distances (Å) in helical regions

First proton    Second proton     DNA           RNA
H1′ (n)         H6/H8 (n)         3.6–3.9       3.5–3.9
H1′ (n)         H6/H8 (n + 1)     3.2–4.4       4.4–4.9
H2′ (n)         H6/H8 (n)         2.0–2.9       3.7–4.1
H2′ (n)         H6/H8 (n + 1)     2.4–3.7       2.0–2.2
H2″ (n)         H6/H8 (n)         3.5–4.2       n/a
H2″ (n)         H6/H8 (n + 1)     2.1–2.6       n/a
H3′ (n)         H6/H8 (n)         3.6–4.5       2.7–3.2
H3′ (n)         H6/H8 (n + 1)     4.0–5.0       3.1–3.6
H4′ (n)         H6/H8 (n)         4.5–5.0       4.0–4.6
H5′ (n)         H6/H8 (n)         4.0–6.5       3.6–4.1
H5″ (n)         H6/H8 (n)         4.5–5.4       4.0–4.4
H1′ (n)         H5 (n)            5.3–5.4       5.3–5.4
H1′ (n)         H5 (n + 1)        4.9–5.2       5.3–5.7
H2′ (n)         H5 (n)            4.3–4.7       5.2–5.5
H2′ (n)         H5 (n + 1)        2.3–3.0       3.6–3.8
H2″ (n)         H5 (n + 1)        3.7–4.1       n/a
H3′ (n)         H5 (n)            (6.0–6.6)ᵃ    4.6–5.1
H3′ (n)         H5 (n + 1)        4.0–4.5       3.5–4.1
H1′ (n)         H2′ (n – 1)       (5.3–7.0)ᵃ    4.0–4.5
H1′ (n)         H4′ (n)           2.9–3.4       3.2–3.6
H1′ (n)         H4′ (n + 1)       4.4–5.4       (5.6–6.2)ᵃ
H1′ (n)         H5′ (n)           4.5–4.8       4.6–4.8
H1′ (n)         H5′ (n + 1)       2.8–4.0       4.7–4.9
H1′ (n)         H5″ (n)           4.9–5.1       5.1–5.3
H1′ (n)         H5″ (n + 1)       4.6–5.5       (6.3–6.6)ᵃ
H6/H8 (n)       H5 (n + 1)        3.3–3.9       3.8–4.3
H2 A (n)        H1′ (n)           4.3–4.5       4.5–4.7
H2 A (n)        H2′ (n)           (6.5–6.8)ᵃ    4.7–5.1
H2 A (n)        H1′ (n + 1)       3.5–4.8       2.8–3.7
H2 A (n)        H6/H8 (n + 1)     (5.4–6.0)ᵃ    4.7–5.3
H2 A            H1′ᵇ              4.6–5.2       5.2–5.4
H2 A            H1′ᶜ              3.8–5.2       3.7–4.3
H2 A (n)        H2 A (n + 1)      3.5–3.7       4.2–4.7
H6/H8 (n)       H6/H8 (n + 1)     4.5–5.2       4.6–5.4
H5 (n)          H5 (n + 1)        3.8–4.3       3.8–4.0
H1′ (n)         H1′ (n + 1)       4.5–5.4       (5.3–5.8)ᵈ
H1′             H1′ᵉ              (6.0–7.7)ᵃ    (5.6–6.7)ᵈ

ᵃ These NOE cross-peaks are typically not observed.
ᵇ H1′ from the residue base-paired to the adenine.
ᶜ H1′ downstream from the residue base-paired to the adenine.
ᵈ These NOE cross-peaks can be observed at larger mixing times.
ᵉ H1′ two residues downstream in the opposite strand.
n/a, not applicable.

of the anomeric diagonal (see Figure 8 and Table 1). A higher mixing time is recommended for this observation not only to make the cross-peaks stronger (Figure 9), but also to decrease the intensity of the diagonal peaks via spin diffusion, which otherwise could mask weak cross-peaks. Similar connectivities can also be established for the (H2′, H6/H8) and (H3′, H6/H8) cross-peaks, and, in the case of DNA, also for cross-peaks entailing H2″ and H6/H8, although the chemical shift dispersion is less favorable for H2′ and H3′ protons in RNA. Nevertheless, these spectral regions can also be very useful when used in combination with the walk in the anomeric-to-aromatic region. Figure 10 shows examples of 1D slices of 2D NOESY spectra through frequencies of aromatic protons for DNA and RNA, and Table 1 lists the most common cross-peaks expected for residues in helical regions.

Figure 7 Fingerprint region of a 2D NOESY spectrum (a) of a 20-nt RNA hairpin from U4 snRNA acquired with a mixing time of 400 ms.119 The H1′–H6/H8 ‘walk’ is shown for the A4–G5–U6–C7 helical segment; the NMR structure of this segment is shown in (b). Yellow lines connect H1′ protons with aromatic H6 or H8 protons. Axes in (a): ω1–1H (ppm) (H1′/H5) versus ω2–1H (ppm) (H2/H6/H8).

9.08.4 Extracting Structural Information

9.08.4.1

Detection of Hydrogen Bonds

Establishing base pairing patterns in nucleic acids provides perhaps the most important structural information, namely the secondary structure of the molecule. Although base pairing can be predicted reliably in simple cases of duplex DNA or RNA, it is less obvious in an arbitrary case, because nucleobases can pair in a great variety of geometries (see, e.g., Leontis and Westhof121) or even form triples or quadruples. The simplest way to detect Watson–Crick AT, AU, GC, and wobble GU pairs is via a 2D NOESY spectrum recorded in a 90/10% H2O/D2O solvent.122 To slow down the rapid exchange of imino and amino protons with solvent, this spectrum is often acquired at a slightly lower temperature (5–15 °C) and in slightly acidic buffer (pH 6). However, one needs to be careful when refining structures based on data acquired at different temperatures, because a change in temperature usually leads to some changes in chemical shifts and sometimes even to structural alterations (see, e.g., Lefevre et al.123 and Ulyanov et al.124). The AU and AT Watson–Crick base pairs are easily distinguished122 by a strong NOE cross-peak between the hydrogen-bonded imino proton of U or T and the H2 proton of A. GC pairs are distinguished by two strong peaks between the imino proton of G and resolved amino protons of C; the two amino protons of C are also correlated with each other by a strong NOE cross-peak. It has been found that J-scalar couplings can be observed for nuclei connected by hydrogen bonds;125 the nature of such couplings is similar to the J-couplings of nuclei connected by covalent bonds, that is, via interaction of nuclear spins with electron spins.126 A series of heteronuclear experiments has been developed in the past decade for detecting hydrogen bonds between nucleobases by observing scalar coupling across the hydrogen bonds (see reviews of Fürtig et al.,7 Latham et al.,8 Grzesiek et al.,127 Cornish et al.,128 and Dingley et al.,129 and references therein).
Hydrogen bonds NH–N involving imino protons can be detected in the HNN-COSY experiment (Figure 11) due to the J-coupling between the two 15N nuclei, 2hJNN (symbol ‘h’ in the superscript denotes that out of the two bonds separating the 15N nuclei one is actually a hydrogen bond). The

Figure 8 Region of the anomeric diagonal of the 300-ms 2D NOESY spectrum of the 35-nt extended dimer stem-loop SL1 RNA from HIV-1. Peaks labeled only with numbers denote residue numbers for H1′–H1′ cross-peaks; cross-strand cross-peaks are labeled in italics. Axes: ω1–1H (ppm) versus ω2–1H (ppm). Reproduced with permission from N. B. Ulyanov; A. Mujeeb; Z. Du; M. Tonelli; T. G. Parslow; T. L. James, J. Biol. Chem. 2006, 281, 16168–16177. Copyright © 2006 American Society for Biochemistry and Molecular Biology.

2hJNN was found to be in the range of 5–10 Hz for the Watson–Crick and Hoogsteen hydrogen bonds.130–132 In hydrogen bonds involving amino groups NH2–N, the 15N frequencies on donor and acceptor groups are separated by approximately 150 ppm. A pseudo-heteronuclear variant of the HNN-COSY experiment with selective 15N pulses has been developed for the detection of such hydrogen bonds, which are present, for example, in sheared A–A and other purine–purine base pairs.133,134 The N–H···O=C hydrogen bonds can be detected with the selective long-range H(N)CO experiment.134 A H(CN)N(H) pulse sequence has been developed and used to detect the NH2–N7 hydrogen bonds in base tetrads via the H8–(N2,N6) correlations despite the fact that the amino protons were not observed due to conformational exchange broadening.135 The NH–N hydrogen bonds can also be detected in the absence of an observable imino proton, by correlating the imino 15N nucleus with the nonexchangeable proton on the paired residue, adenine H2 for Watson–Crick interactions, or purine H8 for Hoogsteen interactions.136 This experiment, the quantitative 2JHN HNN-COSY, can be conducted even in a D2O solvent, because the magnetization both originates on and is detected on nonexchangeable H2 or H8 protons. A hydrogen bond involving the ribose 2′-hydroxyl group, OH–N, has been detected for a stable tetraloop in the 1H,15N CPMG HSQC experiment,137 even though hydroxyl protons are typically not observed due to the rapid exchange with water. Finally, intra- and inter-molecular hydrogen bonds in symmetric dimers

Figure 9 NOE intensities were simulated for the extended dimer stem-loop SL1 RNA (PDB 2GM0, first structure in the ensemble120) via CORMA using an effective correlation time of 31 ns for a series of mixing times (unpublished data). The curves (solid, dashed, and dotted) show calculated NOE intensities, and the symbols (diamonds, squares, and circles) show normalized experimental NOE intensities for the cross-peaks U8H1′–A27*H1′, A27H1′–G28H1′, and U8H1′–H6, respectively (the asterisk denotes that the residue is from the symmetric strand). The vertical bars show estimated experimental errors in the intensities. The corresponding three distances in the structure used for the simulations are 6.74, 5.45, and 3.59 Å, and the lower and upper distance bounds calculated with RANDMARDI from the experimental data are 4.4–7.0, 4.0–7.7, and 3.1–4.5 Å, respectively. The experimental cross-peaks U8H1′–A27*H1′ and A27H1′–G28H1′ can be seen in Figure 8 at a mixing time of 300 ms. Note that despite the relatively large size of the dimer (22.6 kDa), the intensity of weak cross-peaks can still benefit from further increase in the mixing time, while the medium-strength cross-peak, U8H1′–H6, starts decaying after 300 ms. Axes: NOE intensity versus mixing time (s).

can be discriminated by comparing intensities of the 2hJNN HNN-COSY cross-peaks for fully 15N-labeled samples and for the 1-to-1 mixtures of labeled and unlabeled samples;138 this approach is similar to the asymmetric isotope labeling in combination with NOE measurements.139–143 Once the base pairing pattern is established, the structural restraints are generated for each hydrogen bond, usually as hydrogen bond length, and sometimes as hydrogen bond angle as well. The length and angle parameters for each type of hydrogen bond are usually derived from crystal structures of nucleic acids.
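The arithmetic behind the mixed-labeling comparison can be made explicit with a short Python sketch. It rests on assumptions not stated in the text: labeled and unlabeled strands associate statistically, both samples have the same total strand concentration, and natural-abundance 15N is neglected. The function name `relative_intensity` is hypothetical. Because an HNN-COSY cross-peak requires 15N on both the donor and the acceptor, an intramolecular hydrogen bond needs one labeled strand, whereas an intermolecular one needs both strands of the dimer to be labeled.

```python
def relative_intensity(strands_required: int, labeled_fraction: float = 0.5) -> float:
    """Expected cross-peak intensity relative to a fully 15N-labeled sample.

    strands_required is 1 when donor and acceptor sit on the same strand
    and 2 when they sit on different strands of the dimer.
    """
    return labeled_fraction ** strands_required


intra = relative_intensity(1)  # intramolecular hydrogen bond
inter = relative_intensity(2)  # intermolecular hydrogen bond

# One-bond N-H peaks need only one labeled strand, so they scale like
# `intra` and can serve as an internal intensity reference.
print(f"intramolecular: {intra:.2f}, intermolecular: {inter:.2f}")
print(f"normalized to one-bond peaks: {intra / intra:.2f} vs {inter / intra:.2f}")
```

Under these assumptions an intermolecular cross-peak loses half of its normalized intensity in the 1-to-1 mixture, while an intramolecular one is unchanged, which is the contrast the comparison exploits.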

9.08.4.2

Nuclear Overhauser Effects and Interproton Distances

Interproton distances derived from NOE data are a very important type of structural information and, in general, the most important basis for rigorous structure determination. It became possible to solve high-resolution structures of proteins and short DNA duplexes (see, e.g., Wüthrich73 and Kaptein et al.,144 and references therein) only after the introduction of an experimental technique, homonuclear 2D NOE spectroscopy, or NOESY, that allows measurement of many NOE cross-peaks.116,145 The off-diagonal cross-peaks in the NOESY spectrum arise due to the exchange of magnetization between the nuclei during the mixing period of the experiment via dipole–dipole cross-relaxation. In short, NOE cross-peaks are observed only for relatively short interproton distances, the NOE intensity builds up with increasing mixing time, and the build-up is more efficient for larger molecules. A mathematical framework for calculations of interproton NOEs has been

Figure 10 Portions of NOESY spectra and 1D slices through the frequencies of aromatic protons. (a) A 150-ms 2D NOESY spectrum of a 27-nt DNA stem-loop;93 a slice through the frequency of A5H8 is shown. (b) A 200-ms 2D NOESY spectrum of a 34-nt RNA stem-loop;68 a slice through the frequency of C7H6 is shown. Assignments of H5′ and H5″ protons are tentative. Note that some of the cross-peaks partially overlap with cross-peaks in another slice through the frequency of A8H2. (c) A 150-ms 3D 13C-edited NOESY-HMQC spectrum of the same molecule shown in (b). A slice at the proton and carbon frequencies of H6 and C6 in residue C7 is shown (ω1–13C: 140.0 ppm). Note a significantly lower digital resolution in the indirect ω2 dimension in this spectrum compared to the indirect ω1 dimension in the 2D NOESY spectrum shown in (b).

Figure 11 2D HNN-COSY spectrum of the 38-nt RNA construct from the consensus stem D of the cloverleaf domain of enteroviruses; sequence is shown in the inset. Labeled peaks in the upper part of the spectrum arise from the one-bond correlations of the NH imino groups. Cross-peaks in the lower part of the spectrum (not labeled) arise due to the scalar coupling (2JNN of 5–7 Hz) between 15N nuclei across the NH–N hydrogen bonds in Watson–Crick AU and GC pairs. They have the opposite phase compared with the diagonal NH peaks. Axes: ω1–15N (ppm) versus ω2–1H (ppm). Inset (secondary structure as drawn): G1–C38, G2–C37, G3–C36, C4–G35, U5, A6, G7–C34, C8–G33, A9–U32, C10–G31, U11 o U30, C12 o U29, U13 o U28, G14–C27, G15–C26, U16–A25, A17–U24, U18 o G23, and loop residues U19, A20, C21, G22. Reproduced with permission from Z. Du; J. Yu; N. B. Ulyanov; R. Andino; T. L. James, Biochemistry 2004, 43, 11959–11972. Copyright (2004) American Chemical Society.

established subsequently.146–148 A matrix of NOE intensities A is related to a matrix of dipolar relaxation rates R by an exponential matrix expression:

$$\mathbf{A}(\tau_m) = \exp(-\mathbf{R}\,\tau_m)\,\mathbf{A}(0) \qquad (1)$$

where τm is the experimental mixing time, the length of the mixing period in the three-pulse 2D NOE experiment. The off-diagonal terms in matrix R, the dipolar cross-relaxation rates Rij between protons i and j, are inversely proportional to the sixth power of the interproton distances. The proportionality coefficients depend on the motional characteristics of the molecule; they increase with the rotational correlation time τc, that is, with the size of the molecule. A complete set of expressions for the matrix in Equation (1) for an isotropically tumbling rigid molecule is given in Keepers and James,148 and the rate expressions for spins in rapidly rotating methyl groups are given in Liu et al.149 CORMA148 is a computer program used in our lab to evaluate Equation (1). Although the cross-relaxation rates depend only on the corresponding interproton distances, the resulting NOE intensities depend on the full relaxation network because of the matrix nature of the exponential equation (Equation (1)). This gives rise to the so-called spin diffusion, or indirect magnetization transfer, an effect when the observed NOE intensity is affected by the surrounding protons. Similarly to the


direct magnetization transfer, the spin diffusion is also more effective at higher mixing times τm and higher correlation times τc, that is, for larger molecules. Using Equation (1), it is possible to calculate theoretical NOE intensities for various molecular models and compare them with observed NOE data, see, for example, Keepers and James,148 Massefski and Bolton,150 and Suzuki et al.151 To refine the molecular structure using NOE data, conceptually the most straightforward approach is to incorporate the NOE calculations directly into the refinement program.152–160 However, with the computers available when these techniques were introduced, almost two decades ago, this was a computationally challenging task. An approach used instead in many labs was to estimate interproton distances from the NOE data first, and then use distance restraints to refine the solution structure. This is still the most frequently used approach, although one might anticipate revisiting methods of direct refinement against NOE because of the dramatically increased power of modern computers. The interproton distances can be estimated approximately, using a so-called isolated spin-pair approximation (ISPA), by ignoring the full relaxation network. This can be done either by using the initial slopes of the NOE build-up curves161–163 (i.e., NOE measured at a series of mixing times), or by qualitatively categorizing NOE intensities into weak, medium, and strong groups.2,4 Both work well for the determination of solution structures of globular proteins, because even approximate distances extracted from the long-range NOEs (i.e., coming from non-neighboring residues) efficiently help define the protein fold during the refinement. On the other hand, such approximate distances have a diminished utility for the determination of extended structures with relatively few long-range NOEs, such as DNA or RNA duplexes.
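To make the effect of spin diffusion on ISPA distances concrete, the following Python sketch evaluates Equation (1) for a toy system of three collinear protons and then applies the isolated spin-pair estimate to the resulting intensities. It is a minimal illustration, not a substitute for CORMA: the rate constant `K_RATE`, the slow-tumbling form of the rate matrix (no leakage, equal populations), the hand-written matrix exponential, and all function names are assumptions made here for the example.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, squarings=8, terms=12):
    """Matrix exponential via a Taylor series with scaling and squaring."""
    n = len(m)
    scale = 2.0 ** squarings
    x = [[v / scale for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, x)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def rate_matrix(coords, k_rate):
    """Rate matrix R in the slow-tumbling limit: |R_ij| = k_rate / r_ij**6."""
    n = len(coords)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sigma = k_rate / math.dist(coords[i], coords[j]) ** 6
                r[i][j] = -sigma  # cross-relaxation term
                r[i][i] += sigma  # contribution to the diagonal term
    return r


def noe_matrix(r, tau_m):
    """Equation (1) with A(0) taken as the identity matrix."""
    return expm([[-v * tau_m for v in row] for row in r])


def ispa_distance(a_ij, a_ref, r_ref):
    """Isolated spin-pair estimate: intensity assumed proportional to r**-6."""
    return r_ref * (a_ref / a_ij) ** (1.0 / 6.0)


# Three protons on a line, 2.5 A apart; the true 1-3 distance is 5.0 A.
coords = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
K_RATE = 200.0  # assumed illustrative constant (s^-1 A^6), not a fitted value
R = rate_matrix(coords, K_RATE)

ispa_r13 = {}
for tau_m in (0.05, 0.4):
    A = noe_matrix(R, tau_m)
    # Use the 1-2 pair (2.5 A) as the reference distance.
    ispa_r13[tau_m] = ispa_distance(A[0][2], A[0][1], 2.5)
    print(f"tau_m = {tau_m:.2f} s: ISPA estimate of r13 = {ispa_r13[tau_m]:.2f} A")
```

In this toy system magnetization reaches proton 3 mainly through proton 2, so the ISPA estimate of the 1–3 distance falls below the true 5.0 Å and worsens as the mixing time grows.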
To a large extent, it was the interest in the sequence-dependent conformation of DNA in solution (see, e.g., Schmitz and James164 and Ulyanov and James165) that motivated the development of full relaxation matrix methods for calculation of interproton distances from NOE data. In the ‘modified ISPA’ approach,166,167 the distances are calculated from the NOE intensities using special calibration curves, which take into account that short distances are typically overestimated and long distances underestimated in the classical ISPA due to the spin diffusion, but ignore individual differences between specific pairs of protons. The curves are calibrated based on NOE data for interproton distances with fixed values. Model calculations for a short DNA duplex showed that this approach produces good results;166 however, it is expected that the errors should increase for larger molecules with more prominent spin diffusion. To calculate the interproton distances rigorously from the 2D NOESY cross-peaks, one needs to invert Equation (1), which is mathematically possible only when the complete matrix A of NOE intensities is available. In a typical NOESY experiment, however, many cross-peaks remain unquantified due to spectral overlap and incomplete resonance assignments; in addition, many cross-peaks are not detected because they are below the noise level. To solve this problem approximately, various iterative algorithms have been developed, including MARDIGRAS,168,169 IRMA,170 and MORASS.171 For example, in MARDIGRAS, an algorithm developed in our lab, the iterations start with substituting all missing NOE intensities with intensities calculated from an arbitrary molecular model. Then, the relaxation rates are calculated from the hybrid NOE matrix using the inverted Equation (1), and the rates corresponding to cross-peaks that were not observed experimentally are substituted with the ideal rates calculated from the model.
The hybrid rate matrix is then used to calculate the next approximation of the NOE matrix, and the process is iterated until convergence, after which the interproton distances corresponding to the observed NOEs are calculated from the relaxation rates. The method was found to be relatively insensitive to the model structure.169,172–175 Still, there is some residual dependence of calculated interproton distances on the model structure;176 such dependence is expected to grow for larger molecules with higher correlation time τc. This dependence can be minimized by calculating the distances and refining the molecule in two or more iterations: using a starting model to calculate the first set of distances and refining a preliminary structure, and then using this preliminary structure to calculate the final set of distances.120,143 To obtain meaningful and accurate distances, it is important to accurately integrate the intensities of NOE cross-peaks, which could be achieved, for example, with the line-fitting procedures incorporated in SPARKY software.83 Only nonoverlapped or successfully deconvoluted peaks should be used for distance calculation. Equally important is to account for random errors in experimental NOE intensities. For this purpose we developed a procedure, RANDMARDI, which calculates lower and upper distance bounds based on repeated MARDIGRAS calculations for randomly perturbed experimental intensities.177 The extent of random perturbations must correspond to realistic estimates of experimental errors.


If the experimental errors are underestimated, it can lead to tight but inaccurate distance bounds. Conversely, overestimated errors can lead to unnecessarily imprecise distance bounds. RANDMARDI takes into account two types of experimental errors: relative integration errors and absolute errors due to spectral noise. The first kind can be estimated, for example, by comparing intensities of symmetric peaks below and above the diagonal, and the second kind can be estimated as 50–200% of the lowest quantifiable peak, depending on the spectrum quality. In addition to the arbitrary model, distance calculations with MARDIGRAS require the isotropic rotational correlation time τc as an input parameter. Effective rotational correlation time can be estimated by a number of experimental approaches.176 An approach that usually produces self-consistent results is to estimate τc based on the same NOESY data that are used for distance calculations. MARDIGRAS can be run at a series of correlation times, and a τc range can be selected that best reproduces fixed interproton distances and distances with limited variation, see, for example, Ulyanov et al.120 For that purpose, the experimental NOE intensities (which are integrated in arbitrary units) must be normalized based on the total sum of all observed intensities; if possible, intensities of diagonal peaks must also be integrated and included to make the dependence of calculated distances on τc more apparent, see a discussion in Tonelli.176 Fixed interproton distances and distances with limited variation in nucleic acids are listed in Table 2.
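The idea of deriving distance bounds from randomly perturbed intensities can be sketched as follows. This is not the RANDMARDI implementation: for brevity each trial uses the isolated spin-pair formula in place of a MARDIGRAS calculation, the error distributions are taken as uniform, and the bounds are simply the extremes over all trials. These choices, the function names, and the numerical inputs are all assumptions made for illustration. The two error terms correspond to the relative integration error and the absolute noise error described above.

```python
import random


def ispa_distance(intensity, ref_intensity, ref_distance):
    """Stand-in distance calculation: intensity assumed proportional to r**-6."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def distance_bounds(intensity, ref_intensity, ref_distance,
                    rel_error, abs_noise, n_trials=1000, seed=0):
    """Lower and upper distance bounds from randomly perturbed intensities.

    rel_error is the fractional integration error; abs_noise is the
    absolute error due to spectral noise, in the same (normalized)
    units as the intensities.
    """
    rng = random.Random(seed)
    distances = []
    for _ in range(n_trials):
        perturbed = (intensity * (1.0 + rng.uniform(-rel_error, rel_error))
                     + rng.uniform(-abs_noise, abs_noise))
        if perturbed <= 0.0:
            continue  # peak lost in the noise for this trial
        distances.append(ispa_distance(perturbed, ref_intensity, ref_distance))
    if not distances:
        raise ValueError("peak is below the noise level in every trial")
    return min(distances), max(distances)


# Hypothetical normalized intensities; the reference is a fixed H5-H6
# distance, taken here as 2.45 A (Table 2 lists 2.4-2.5 A).
nominal = ispa_distance(0.02, 0.10, 2.45)
lower, upper = distance_bounds(0.02, 0.10, 2.45, rel_error=0.10, abs_noise=0.005)
print(f"nominal {nominal:.2f} A, bounds {lower:.2f}-{upper:.2f} A")
```

Because the distance depends on the inverse sixth root of the intensity, the absolute noise term dominates for weak peaks and widens mainly the upper bound, which is consistent with the asymmetric bounds quoted for long distances in this chapter.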
The full relaxation matrix approach for distance determination from 2D NOESY data has been used in many groups to refine solution structures of DNA, including mismatched duplexes, variously chemically modified duplexes, complexes with small molecules, and so on.93,174,176,178–195 However, there are fewer RNA structures

Table 2 Fixed distances and distances with limited variation between nonexchangeable protons in nucleic acidsᵃ

Protons              Lower bound    Upper bound
H1′–H2′              2.7            3.0
H1′–H2″ᵇ             2.2            2.4
H1′–H3′              3.8            4.0
H1′–H4′              2.9            4.0
H2′–H2″ᵇ             1.8            1.8
H2′–H3′              2.3            2.5
H2′–H4′              3.8            3.9
H3′–H4′              2.6            3.1
H2″–H3′ᵇ             2.7            3.1
H2″–H4′ᵇ             2.8            4.2
H1′–H5′/H5″ᶜ         4.0            5.3
H2′–H5′/H5″ᶜ         2.3            5.4
H3′–H5′/H5″ᶜ         2.2            3.9
H4′–H5′/H5″ᶜ         2.1            3.0
H2′–H5′/H5″ᵇ,ᶜ       3.8            5.4
H5–H6ᵈ               2.4            2.5
H6–M7ᵉ               2.9            2.9
H2–H8ᶠ               6.4            6.4

ᵃ Interproton distances (Å) are calculated assuming the aliphatic and aromatic C–H distances of 1.09 and 1.08 Å, respectively. The distance variations correspond to the sugar conformation variation covering the range of pseudo-rotation phase angle between 30° and 210° with amplitude of pseudo-rotation between 36° and 41°, that is, excluding only the most unfavorable sugar puckers. The calculations have been carried out with the miniCarlo program.
ᵇ DNA only.
ᶜ To determine the ranges of distances involving H5′/H5″ protons, the backbone torsion angle gamma was additionally varied from 0° to 360°.
ᵈ Cytosine or uracil.
ᵉ Third-root averaged distance between H6 and methyl protons in thymine.
ᶠ Adenine.

Determination of Three-Dimensional Structures of Nucleic Acids by NMR

265

determined using this method,66–68,119,120,196–201 where approximate methods of distance determination are used more often. There may be several reasons for that. One reason is that for many RNAs, the main scientific interest is in its global fold and not in relatively subtle structural features, such as sequence-dependent bending of DNA duplexes, thus justifying the elimination of a time-consuming process of accurate NOE integration and distance calculation. The other reason is that because of the relative ease of synthesizing labeled RNA molecules, accurate interproton distances can be substituted with other types of structural information derived from heteronuclear NMR, such as residual dipolar couplings (RDC) (see below), while most of the DNA structures have been determined using homonuclear data. Finally, for larger RNA molecules, the resonance overlap in homonuclear 2D NOESY spectra can preclude accurate integration of NOE intensities. Nevertheless, for moderate size RNA, extracting accurate distances from NOE data acquired at a series of mixing times can be beneficial for better defining the structure; MARDIGRAS combined with the random error analysis allows accurate if not very precise estimates of bounds for distances well above 6 A˚. For example, intra-residue adenine H2–H8 cross-peaks have been observed at 300 ms for the extended dimer of SL1 RNA from HIV-1 for adenines A13, A14, A21, and A26;120 these cross-peaks correspond to a fixed distance of 6.4 A˚ (Table 2). The RANDMARDI procedure produced distance bounds of 5.3–7.8, 5.4–7.4, 5.7–7.9, and 4.8–7.4 A˚ for these four cross-peaks, respectively, thus justifying the use of distance restraints for such long interproton distances. 
While the intra-residue H2–H8 distances are of no use for structure determination, there are several inter-residue H2–H8 and H2–H6 cross-peaks observed at higher mixing times in the region of the aromatic diagonal of the 2D NOESY spectra (not shown) and H1′–H1′ cross-peaks in the region of the anomeric diagonal (Figure 8). For example, the distance restraints of 4.0–7.7 Å for the sequential distance A27H1′–G28H1′ and especially 4.4–7.0 Å for the cross-strand distance U8H1′–A27H1′ (Figure 9) are very helpful for better defining the RNA conformation, even despite the relatively large error bars. Indeed, even though the sequential H1′–H1′ distance is typically within the range of 5.3–5.8 Å for helical regions of RNA (Table 1), it can be beyond 10 Å for certain RNA conformations, and there is no theoretical upper limit for cross-strand distances.

9.08.4.3 Scalar Coupling Data

The magnitude of scalar (J) couplings between nuclei separated by rotatable bonds depends on the value of the dihedral (torsion) angle,202,203 which can be used in refinements as structural information. In addition to classical J-correlated spectroscopy (COSY),76 a number of experimental techniques have been developed for observing and measuring homonuclear and heteronuclear J-coupling constants, including E.COSY204 and quantitative J correlation,205 reviewed in Bax et al.206 Specific applications of these techniques to nucleic acids are discussed in a very comprehensive review by Wijmenga and van Buuren;167 see also applications of the constant time HSQC technique for measuring ³J(C,P) and constant time COSY for ³J(H,P) couplings207,208 and a discussion of line-fitting of homonuclear COSY peaks using the ACME program for measurements of small proton–proton couplings.209 A discussion of an older approach for measurement of ³J(H,H) couplings using the SPHINX and LINSHA programs210 can be found in Schmitz and James.164 To briefly summarize these approaches, the backbone beta torsion angle (O5′–C5′) can be estimated from the ³J(H5′/5″,P5) or ³J(C4′,P5) couplings, and the epsilon torsion (C3′–O3′) can be estimated from ³J(H3′,P3), ³J(C4′,P3), or ³J(C2′,P3). The parameterizations for the generalized Karplus equations for these couplings are given in Mooren et al.211 Also, the conformations for beta and epsilon torsions can be established qualitatively based on intensities of cross-peaks in the triple-resonance 3D HCP experiment.75,108 The gamma torsion angle (C4′–C5′) can be estimated using a properly parameterized Karplus equation for the ³J(H4′,H5′) and ³J(H4′,H5″) couplings, or qualitatively based on ³J(H5′,C3′), ³J(H5″,C3′), ²J(H5′,C4′), and ²J(H5″,C4′); the same couplings can help establish stereospecific assignments for H5′ and H5″ protons.167,212 The glycosidic torsion angle can be estimated from the ³J(H1′,C4/2) and ³J(H1′,C8/6) couplings; the corresponding Karplus equation parameters have been derived.213 There are many homonuclear and heteronuclear couplings that are sensitive to the conformation, that is, pucker, of the five-membered sugar ring. The parameters for generalized Karplus equations for ³J(H,H) couplings have been given,214,215 and parameters for ³J(H3′,C1′), ³J(H2′,C4′), ²J(H2′,C1′), ²J(H3′,C2′), ²J(H2′,C3′), and ²J(H3′,C4′) have been reported.213


It is possible to use scalar coupling data directly during refinement of NMR structures using appropriately parameterized Karplus equations

$$J(\varphi) = A\cos^2\varphi + B\cos\varphi + C \qquad (2)$$

where φ is the torsion angle, and A, B, C are parameters specific for the given J-coupling. However, the functional form of generalized Karplus equations for J-couplings for the sugar ring is different from Equation (2),214,215 which is typically not implemented in refinement programs. More commonly, torsion angles are first estimated from the experimental J-couplings167 and then used as restraints during refinement. Sugar ring conformational parameters, pseudo-rotation phase angle and pseudo-rotation amplitude, can be estimated from the experimental J-couplings using the PSEUROT program,216 and then used directly as restraints or further converted into exocyclic torsion angles, depending on the refinement program. In RNA, most residues have C3′-endo sugar puckers (N-conformations) with small ³J(H1′,H2′) coupling (<2–3 Hz) and with the corresponding H1′–H2′ cross-peaks not observed in homonuclear 2D COSY or TOCSY spectra. These cross-peaks are observed only for flexible residues and residues locked in the S-conformations; the presence or absence of TOCSY H1′–H2′ cross-peaks is often used for qualitative estimation of sugar puckers.
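As a minimal sketch of how Equation (2) is used in both directions, the following Python fragment evaluates a Karplus curve and inverts it for an observed coupling. The parameter values A = 10, B = −1, C = 1 Hz are illustrative placeholders and not a published parameterization; the function names are likewise hypothetical.

```python
import numpy as np

def karplus(phi_deg, A, B, C):
    """Equation (2): J(phi) = A cos^2(phi) + B cos(phi) + C, phi in degrees."""
    phi = np.radians(phi_deg)
    return A * np.cos(phi) ** 2 + B * np.cos(phi) + C

def torsions_from_coupling(J_obs, A, B, C, tol=1e-9):
    """All torsion angles in [-180, 180] degrees compatible with J_obs.

    Solves A c^2 + B c + (C - J_obs) = 0 for c = cos(phi); every real
    root with |c| <= 1 yields the pair +phi and -phi.
    """
    roots = np.roots([A, B, C - J_obs])
    angles = set()
    for c in roots:
        if abs(c.imag) > tol or abs(c.real) > 1.0 + tol:
            continue  # complex or out-of-range root: no torsion angle
        phi = np.degrees(np.arccos(np.clip(c.real, -1.0, 1.0)))
        angles.update({round(phi, 6), round(-phi, 6)})
    return sorted(angles)

# Illustrative parameters only (Hz); real values depend on the coupling type.
A, B, C = 10.0, -1.0, 1.0
J = karplus(60.0, A, B, C)
print("J at 60 degrees:", J)
print("torsions compatible with that J:", torsions_from_coupling(J, A, B, C))
```

Because Equation (2) is a quadratic in cos φ, a single coupling is generally compatible with up to four torsion angles, which is one reason why torsion restraints derived from J-couplings are usually expressed as allowed ranges and combined with other data.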

9.08.4.4 Residual Dipolar Couplings

Direct through-space interactions between magnetic dipoles depend on the dipole orientations in such a way that the interactions average to zero for molecules tumbling freely in isotropic solutions, giving rise to sharp NMR signals. However, if a molecule is oriented relative to the magnetic field, such interactions no longer average to zero, and the RDC can be observed.217–220 The orientation can be achieved by magnetic field after dissolving the molecule in dilute liquid crystalline media, such as phospholipid bicelles,221,222 filamentous phage Pf1,223,224 n-alkyl-PEG mixture with n-hexanol,225 or even by magnetic field alone.226,227 The RDC values depend on the nature of the nuclei, the distance between the nuclei, the average orientation of the internuclear vector relative to the magnetic field, and the degree of orientation. To keep the NMR signals sharp, the degree of orientation must be very small, such that the dipolar interactions are on the order of 0.1% of their full values.228 Mathematically, the RDC value D can be calculated for the internuclear vector in the matrix form, using a symmetric alignment tensor with zero trace (with five independent matrix elements), see, for example, Tsui et al.229 Alternatively (see, e.g., Clore et al.230 and Bax et al.231), it can be written as

$$D(\theta,\varphi) = D_a\left[\left(3\cos^2\theta - 1\right) + \tfrac{3}{2}\,R\,\sin^2\theta\,\cos 2\varphi\right] \qquad (3)$$

where angles θ and φ describe the orientation of the internuclear vector in the frame of the diagonalized alignment tensor, and Da and R are the alignment tensor characteristics called the magnitude of the residual dipolar coupling tensor and the rhombicity. Three more parameters not included explicitly in Equation (3) are the three Euler angles defining the alignment tensor orientation. The magnitude Da depends not only on the alignment tensor, but also on the nature of the two nuclei; in some refinement programs this parameter is always scaled for the N–H dipolar interactions, so the experimental RDC values must be scaled accordingly.232 Although it is possible to extract the orientations of chemical bonds from the RDC data,233 usually the RDC data are used directly in refinements, either via the explicit matrix expression or via Equation (3). The alignment tensor fitting experimental RDC values can be calculated for a given molecular structure using the singular-value decomposition method.234–236 For an unknown structure, parameters Da and R can be estimated based on analysis of the distribution of experimental RDC values, see, for example, Bax et al.231 However, this method works best only for proteins, where the distribution of internuclear vectors with measured RDCs is relatively uniform. For nucleic acids, Da and R can be estimated using grid search and preliminarily refined structures.221,237 Alternatively, the alignment tensor can be kept unconstrained and optimized together with molecular conformation during structure refinement.120,229 The latter approach has an advantage of not artificially restricting the conformational variability in the ensemble of refined structures by fixing the alignment tensor; also see Tjandra et al.221 for a discussion of this problem.
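The matrix form with five independent tensor elements lends itself to a linear least-squares fit for a known structure. The sketch below assumes that all RDCs have already been scaled to a common nuclear pair, so that the fitted traceless tensor is expressed directly in RDC units; it uses NumPy's SVD-based `lstsq` rather than any particular published program, and all function names are hypothetical.

```python
import numpy as np

def design_matrix(unit_vectors):
    """One row per internuclear unit vector (x, y, z).

    For a traceless symmetric tensor T, the RDC is
    sum_ij T_ij r_i r_j = Tyy (y^2 - x^2) + Tzz (z^2 - x^2)
                          + 2 Txy xy + 2 Txz xz + 2 Tyz yz.
    """
    x, y, z = np.asarray(unit_vectors, dtype=float).T
    return np.column_stack([y**2 - x**2, z**2 - x**2,
                            2 * x * y, 2 * x * z, 2 * y * z])

def fit_alignment_tensor(unit_vectors, rdcs):
    """Least-squares tensor (in RDC units) for a fixed structure."""
    params, *_ = np.linalg.lstsq(design_matrix(unit_vectors),
                                 np.asarray(rdcs, dtype=float), rcond=None)
    tyy, tzz, txy, txz, tyz = params
    return np.array([[-tyy - tzz, txy, txz],
                     [txy, tyy, tyz],
                     [txz, tyz, tzz]])

def magnitude_and_rhombicity(tensor):
    """Da and R of Equation (3) from the principal values of the tensor."""
    vals = np.linalg.eigvalsh(tensor)
    txx, tyy, tzz = vals[np.argsort(np.abs(vals))]  # |Tzz| >= |Tyy| >= |Txx|
    da = tzz / 2.0
    return da, (txx - tyy) / (3.0 * da)

# Synthetic check: tensor with Da = 10 Hz and R = 0.3 in a random orientation.
rng = np.random.default_rng(1)
frame, _ = np.linalg.qr(rng.standard_normal((3, 3)))
true_tensor = frame @ np.diag([-5.5, -14.5, 20.0]) @ frame.T
vectors = rng.standard_normal((30, 3))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
rdcs = np.einsum("ni,ij,nj->n", vectors, true_tensor, vectors)
print(magnitude_and_rhombicity(fit_alignment_tensor(vectors, rdcs)))
```

In the principal frame the fitted tensor reduces to Equation (3) with Da = Tzz/2 and R = (Txx − Tyy)/(3 Da), which is the conversion used in `magnitude_and_rhombicity`. At least five RDCs with sufficiently different vector orientations are needed for the fit to be determined.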


Practically, RDCs are measured using the same methods as for J-couplings, specifically as a difference in couplings observed in the aligned media and couplings observed under isotropic conditions. It is easier to measure RDC for C–H and N–H vectors with large one-bond couplings. However, RDCs can be measured for a variety of one- and two-bond C–C, N–C, N–H, C–H and also for P–H and homonuclear H–H interactions that may not even be coupled under isotropic conditions (reviewed in Latham et al.8). Owing to the orientation dependence, the RDC restraints define the global orientation of internuclear vectors, making them fundamentally different from the distance and torsion angle restraints, which define relative positions of nuclei. This property makes RDC restraints especially valuable for elongated nucleic acid molecules, where the experimental errors in relative restraints tend to propagate along the polynucleotide chains. Incorporation of RDC data in refinement of nucleic acid structures substantially improves the accuracy and precision of global conformations.238–240 During the past decade, using RDC data became routine in structure determination of nucleic acids, both for DNA52,53,191,221,241–247 and RNA.68,101,120,248–278

9.08.4.5 Other Structural Restraints

In this section we briefly mention the rarely used or newly emerging types of structural information that can aid in refinements of solution structures. Only applications to nucleic acids will be listed, even though many of these methods have been previously applied to protein structure determination.

When a paramagnetic molecule is present in solution, the magnetic dipoles of its unpaired electrons cause strong perturbations of chemical shifts of surrounding nuclei, called pseudo-contact shifts. A structure of a DNA duplex in complex with chromomycin A3 and a divalent metal was solved based on pseudo-contact shifts measured as the difference in chemical shifts of the Co2+ and Zn2+ complexes.279,280 The structure was refined together with the magnetic susceptibility tensor, the knowledge of which is necessary to calculate the pseudo-contact shifts. Because of the long-range nature of pseudo-contact shifts, the structure was defined to a much higher degree than is typical for NOE-based refinements.

The chemical shift anisotropy (CSA) tensor of each nucleus is reduced to its isotropic average value for molecules in isotropic solutions. However, if a molecule is partially aligned in a liquid crystalline solution (see above), the incomplete CSA tensor averaging leads to a difference in the chemical shifts observed under isotropic and aligned conditions (Δδ) on the order of a few to tens of parts per billion for the degrees of alignment typically used for RDC measurements. These changes in chemical shifts can be calculated for a given structure based on the molecular alignment tensor (see above), provided that the principal components of the CSA tensor and their orientation relative to the molecular frame are known.281 The data can provide efficient orientation restraints. 31P data were used in refinement of a DNA duplex242 using the 31P CSA tensor measured by single-crystal NMR for the phosphodiester diethyl phosphate.282 The only caveat for this approach is an assumption that the CSA tensor does not depend on molecular conformation. Nevertheless, magnitudes and orientations of CSA tensors for 31P and sugar carbons have fairly uniform values for the helical regions. This has been demonstrated by fitting observed data to the helical residues of a stem-and-loop RNA structure previously refined using RDC data.283,284 Still there may exist a conformation-dependent variability of CSA tensors, even though relatively small; there remains a concern that fixing the CSA tensors to particular values may artificially narrow the conformational envelope of refined structures. The 13C CSA tensor magnitudes are more conformation-dependent for some base carbons, for example, for pyrimidine C6 carbons in B-form DNA vs. A-form RNA.285 Such a dependence is likely even greater for nonhelical residues. When the positions of the downfield components of 1H–13C TROSY HSQC cross-peaks286,287 are compared under the isotropic and aligned conditions, both RDC and CSA effects contribute to the observed difference: Δδ′ = Δδ + RDC/2. These values, referred to as pseudo-CSA, can be used directly in molecular refinements.288 The reason for using combined Δδ′ values, rather than Δδ and RDC separately, is that it is easier to measure accurately the positions of downfield TROSY components for larger molecules, because of the optimized line width of these components.

In addition to using J-scalar coupling data (see above), several other methods have been proposed for estimating sugar conformations in RNA: based on the 13C–1H dipole–dipole cross-correlated relaxation,289 based on the cross-correlated relaxation rates involving 13C CSA and 13C–1H dipolar interactions,290 and based on the 13C chemical shifts of sugar carbons.291


Isotropic chemical shifts of protons are very sensitive to the environment and as a result are very conformation-dependent. It is possible to calculate 1H chemical shifts from the nucleic acid structure.292–295 The observed chemical shift values are usually represented as a sum of ‘random coil’, or reference values, and a conformation-dependent part: δobs = δref + δconf. The conformation-dependent δconf is calculated from a structure as a sum of two components, the ring current effect and the contribution of the electric field created by partial atomic charges of the molecule. The electric field contribution was shown to be minor for nucleic acids,293,294 but this conclusion may be structure-dependent to some extent. The reference values δref are calibrated based on a set of reference structures. The proton chemical shifts could be used as restraints during structural refinements, but more typically they are back-calculated from the refined structures for validation purposes, see, for example, Flodell et al.266 Finally, NMR data can be used in combination with data from other methods to determine solution structures of nucleic acids. For example, the homology model of E. coli tRNAVal based on the X-ray structure of yeast tRNAPhe was refined based on experimental RDC data and small angle X-ray scattering (SAXS) data.296

9.08.5 Three-Dimensional Structure Refinement

The details of computational approaches for determination of NMR structures have been extensively reviewed,164,237,297–302 so only some general considerations will be discussed here. Refinement of nucleic acid structures based on experimental NMR data can be carried out with any molecular simulation or molecular modeling software that has options for calculating NMR parameters for simulated structures. Some examples of such general-purpose programs are AMBER,303 GROMOS,304 XPLOR,305 CNS,306 NIH version of XPLOR,232,307 and DYANA.308 Some programs specialized for nucleic acids modeling, such as miniCarlo309 or JUMNA,310 are also capable of refining structures against NMR data.300,311 Most of these programs are not especially user-friendly, so the selection of software is often dictated by expertise present in a particular lab. However, this choice also depends on the availability of options for calculating specific NMR parameters acquired in the experiment. From this perspective, the NIH version of XPLOR is arguably one of the most advanced for NMR refinement of structure. In addition to the traditional XPLOR interface, it also has a Python wrapper, which allows great flexibility in designing refinement protocols and developing custom potentials. The downside is that it takes a Python programmer to fully utilize this program. Nevertheless, many sample Python scripts are distributed together with the software, which help a novice learn this program. The purpose of the refinement is to find a stereochemically sound structure or a set of structures that satisfy all experimental restraints. This is achieved by optimizing the total energy of the system defined as a sum of conformational energy of the molecule and the pseudo-energy of restraints: E = Econf + Erestr. The conformational energy Econf is calculated according to a general-purpose force field or one specialized for nucleic acids (for a review, see Orozco et al.312).
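The role of the restraint energy Erestr is to enforce the experimental restraints; it can be either in the form of a simple harmonic potential or a flat-well potential313,314

$$E_{\mathrm{restr}} = \begin{cases} k\,(x - x_{\mathrm{lower}})^2, & \text{for } x < x_{\mathrm{lower}} \\ 0, & \text{for } x_{\mathrm{lower}} \le x \le x_{\mathrm{upper}} \\ k\,(x - x_{\mathrm{upper}})^2, & \text{for } x > x_{\mathrm{upper}} \end{cases} \qquad (4)$$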

where x is an observable NMR parameter calculated for the structure, and xlower and xupper are the experimentally determined bounds for this parameter; k is a user-defined force constant. The form of potential described by Equation (4) does not penalize the molecule when the NMR parameter remains within the experimental uncertainty, but quickly builds up when it deviates from the observed values. Sometimes this form is further modified by making the potential linear when the calculated parameters deviate too far from either xlower or xupper,178 to avoid an overly strong build-up of energy, which could interfere with some refinement engines, such as molecular dynamics (MD). When all observed parameters are self-consistent, the flat-well potential has a single (within experimental uncertainty) region of global minimum; a simple harmonic form of the penalty function is intended to simplify the otherwise rugged potential surface of the molecule.
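A minimal sketch of the flat-well penalty of Equation (4), including the optional linear continuation for large violations, might look as follows. The switching distance `linear_beyond` and the requirement that the linear branch match the harmonic one in value and slope are assumptions of this sketch, not a description of any specific refinement program.

```python
import numpy as np

def flat_well_energy(x, x_lower, x_upper, k=1.0, linear_beyond=None):
    """Restraint pseudo-energy of Equation (4).

    Zero inside [x_lower, x_upper], harmonic outside. If linear_beyond is
    given (an assumed switching distance), the penalty continues linearly,
    with matching value and slope, once the violation exceeds it.
    """
    x = np.asarray(x, dtype=float)
    # Violation is the distance from x to the nearest bound (0 inside the well).
    violation = np.maximum(x_lower - x, 0.0) + np.maximum(x - x_upper, 0.0)
    energy = k * violation ** 2
    if linear_beyond is not None:
        d = linear_beyond
        linear = k * d ** 2 + 2.0 * k * d * (violation - d)
        energy = np.where(violation > d, linear, energy)
    return energy

# A distance restraint of 2.5-3.5 A evaluated for several calculated distances.
calculated = np.array([2.0, 3.0, 4.0, 5.5])
print(flat_well_energy(calculated, 2.5, 3.5, k=2.0))
print(flat_well_energy(calculated, 2.5, 3.5, k=2.0, linear_beyond=1.0))
```

The linear branch keeps the restoring force bounded for grossly violated restraints, which is the practical motivation mentioned above for MD-based refinement engines.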


By far the most popular method of optimizing the total energy E is via simulated annealing (SA) protocols within restrained MD.315 In MD simulations, the Newtonian equation of motion is solved for a molecule coupled to a thermal bath.316 During SA, the molecule is simulated at first at high temperature, and then the temperature is slowly reduced; this procedure helps avoid entrapment in local energy minima. The Metropolis Monte Carlo method317 can be used to generate the Boltzmann distribution of molecular conformation at a given temperature; this method is also applicable for setting up the SA procedure.300 The molecule is often refined in the space of atomic Cartesian coordinates. However, it is also possible to refine NMR structures in the space of internal coordinates, either torsion angles,301,308 or helicoidal parameters.300 The helicoidal internal coordinates are only applicable for nucleic acids, while torsion angles can be used for refinement of any molecule. The internal variables module (IVM) in XPLOR allows a flexible setup of simulations using an arbitrary mixture of rigid-body, torsion angles, and Cartesian coordinates.307,318 Using internal coordinates has an advantage of significantly reducing the degrees of freedom in the system, allowing for a more efficient search for the global minimum energy. Even more important, it effectively prevents unintentional distortion of nucleic acid geometry, such as bond lengths and angles and planarity of aromatic bases. 
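The idea of simulated annealing can be illustrated on a toy problem. The sketch below uses Metropolis Monte Carlo moves (not restrained MD) on a one-dimensional rugged function that stands in for the total energy E; the geometric cooling schedule, step size, and the energy function itself are arbitrary choices made for illustration.

```python
import math
import random

def simulated_annealing(energy, x0, t_start=5.0, t_end=0.01,
                        n_steps=20000, step_size=0.5, seed=0):
    """Metropolis Monte Carlo with a geometrically decreasing temperature."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    cooling = (t_end / t_start) ** (1.0 / (n_steps - 1))
    temperature = t_start
    for _ in range(n_steps):
        trial = x + rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves; accept uphill
        # moves with Boltzmann probability exp(-dE / T).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
        temperature *= cooling
    return best_x, best_e

def rugged_energy(x):
    """Toy rugged surface with many local minima; global minimum near x = -0.5."""
    return 0.1 * x ** 2 + math.sin(3.0 * x) + 1.0

# Start far from the global minimum, separated from it by several barriers.
print(simulated_annealing(rugged_energy, x0=8.0))
```

At high temperature nearly all moves are accepted and the barriers between wells are crossed freely; as the temperature drops, the walker is progressively confined to the deepest wells. A plain downhill minimization from the same starting point would instead stop in the nearest local minimum.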
When using Cartesian coordinates-based methods, a sufficient number of improper torsion angle restraints must be used to enforce base planarity and the SHAKE algorithm to constrain N–H and C–H bond lengths.319 Still, the presence of NOE and especially RDC restraints may create strong forces distorting the bond angles involving N–H and C–H bonds; special care must be taken to prevent such distortions.320 Another computational method for the structure refinement is restrained energy minimization; this method is used less frequently and mostly in combination with internal coordinates, because it is less efficient in overcoming energy barriers between local minima, especially in the Cartesian coordinate space. The restrained energy minimization is often used, however, at the end of a SA protocol. All three methods, restrained MD, Metropolis Monte Carlo, and minimization require an initial structure. Such a structure can be either modeled, generated in an extended or random conformation, or calculated using experimental distance restraints utilizing the Distance Geometry algorithm.321 The conformational energy Econf is defined by a chemical force field; its role in the refinement of NMR structures is to make sure that the resulting structures are physically reasonable, that is, they do not contain inter-atomic clashes or unfavorable electrostatic interactions. It is necessary to use this term during the energy optimization, because experimental structural restraints alone are never sufficient to uniquely determine the solution conformation, even with the most complete NMR data. Force fields are often used even during X-ray refinements of high-resolution crystal structures.315 Because of this, there is always a possibility that the resulting structures are somewhat biased toward the force field used. 
In particular, it is still a matter of substantial controversy if the electrostatic component of Econf should be used during the NMR refinements of nucleic acids, see, for example, Zhou et al. and Brünger et al.237,315 With few exceptions (see, e.g., Aramini et al.,191,323 and Schmitz et al.322), the simulations of nucleic acids during the refinements are carried out in vacuo, without explicit water molecules, with the effect of solvent modeled using an effective dielectric constant that scales down the electrostatic interactions. The electrostatic interactions are sometimes scaled down even further during the high-temperature stage of the SA procedure to lower energy barriers between local minima. Sometimes the electrostatic interactions are omitted entirely, and the van der Waals potential is replaced with a simplified repulsion term in an attempt to make the resulting structures less biased toward a particular force field choice. An alternative approach is to choose the force field as realistic as possible in an attempt to compensate for the always insufficient number of experimental restraints. An extreme of this approach is to supplement the chemical force field with a mean force potential describing relative positions of bases derived from a database of crystal structures.324 The question of the force field influence can be addressed directly by comparing structures refined using different force fields;174,237,240,300 this question is interconnected with the number and type of experimental structural restraints used in the refinement.
It has been acknowledged that the degree of definition of nucleic acid structures can be rather low when only the NOE-based restraints are used325 and especially when important cross-strand restraints are missing.158 However, the degree of definition can improve when more NOE-based distance restraints are available.180,326 Also it improves dramatically when long-range RDC restraints are used.238–240 In particular, it has been shown that the force field dependence decreases for structures refined with RDC restraints.240


The question of force field dependence is a part of two more general issues, precision and accuracy of structure determination by NMR, that is, the degree of definition of the structure and how far the determined structure is from the ‘true’ solution structure. An accepted method to determine the precision of structure determination is to repeat the computations with different initial conformations, typically randomly generated. Usually, the resulting conformations are ranked according to either total or restraint energy; the 10–20 best structures are selected to represent the final ‘NMR ensemble’. The differences between the conformations are usually assessed using atomic root-mean-square deviation (RMSD) after the structures are superimposed onto each other, although other measures have also been proposed that are independent of molecular size.327 The precision is expressed as an average RMSD, either an average pair-wise RMSD or an average RMSD between calculated structures and a structure with averaged coordinates. Assessing the accuracy of structure determination is a much more difficult problem, because the ‘true’ structure is not known. Still, certain things can be done to assess the quality of the refined structures. First, the quality of the conformations must be examined, for example, by comparing the conformational energy Econf of refined structure with the energy of the structure minimized in the absence of any restraints, or at least by verifying the absence of van der Waals clashes. Unfortunately, because of the higher intrinsic complexity, there is no convenient equivalent of a protein Ramachandran map for nucleic acids. Nevertheless, some validation tools are available with structure deposition in the PDB.1 Next, the degree to which all experimental restraints are satisfied by the refined structures must be examined. 
If possible, not only derived structural restraints (NOE-derived distances, torsion angles), but the raw NMR data as well (NOE intensities, J-couplings) must be analyzed. Several figures of merit, R- and Q-factors, have been proposed for this purpose (reviewed in James328). Most often, average deviations are calculated for distance restraints, sixth-root weighted R-factor for NOE intensities, and average deviations or RMSD for RDC and J-couplings. Also, individual large deviations need to be examined separately, as they may potentially indicate problems with experimental data, such as integration errors or even mis-assignments. A good indicator of accuracy is the free R-factor,329 which is calculated by repeating the refinement after excluding a randomly chosen 10% of the experimental data, and then calculating the R-factor only for these 10% of the data. The accuracy of a procedure for structure determination can also be assessed using NMR data simulated for model structures.238–240,326 Using this approach, a possible bias in refined structures due to a particular choice of computational procedure, force field, number and type of experimental restraints, and so on can be investigated. However, the source of the bias may also be due to experimental data, such as mis-assignments, incorrectly estimated experimental errors, or conformational averaging. Since nucleic acids are flexible in solution at room temperature, structural restraints derived from NMR data are averaged over the measurement time and over the ensemble of accessible conformations, sometimes with complicated averaging rules (e.g., for NOE-derived distances). Therefore, structures determined by one of the methods outlined above represent average structures, or more exactly, model structures satisfying average experimental restraints.
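The precision measure discussed earlier in this section, the average pair-wise RMSD of an ensemble after superposition, can be sketched with NumPy as follows. The superposition uses the Kabsch algorithm, which is one standard choice and not necessarily what a given refinement package implements; the function names are hypothetical, and all structures are assumed to list the same atoms in the same order.

```python
import numpy as np

def superposed_rmsd(coords_a, coords_b):
    """RMSD between two (N, 3) coordinate sets after optimal superposition
    (Kabsch algorithm: centre both sets, then find the best rotation by SVD)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, s, vt = np.linalg.svd(a.T @ b)
    # Flip the smallest singular value if the best orthogonal transform
    # would be a reflection rather than a proper rotation.
    if np.linalg.det(u @ vt) < 0.0:
        s[-1] = -s[-1]
    msd = (np.sum(a ** 2) + np.sum(b ** 2) - 2.0 * s.sum()) / len(a)
    return float(np.sqrt(max(msd, 0.0)))

def average_pairwise_rmsd(ensemble):
    """Mean superposed RMSD over all pairs of structures in the ensemble."""
    n = len(ensemble)
    values = [superposed_rmsd(ensemble[i], ensemble[j])
              for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(values))

# Synthetic 'ensemble': one random structure plus small coordinate noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal((50, 3)) * 5.0
ensemble = [reference + 0.3 * rng.standard_normal(reference.shape)
            for _ in range(10)]
print("average pair-wise RMSD:", average_pairwise_rmsd(ensemble))
```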
The conformational variations in the ‘NMR ensembles’ must be regarded as reflecting a degree of indetermination of such an average structure by available experimental data, and not the true variability of solution conformations, although sometimes there may be some correlation between the two. When all solution conformations belong to the same energy minimum, the average structure will be close to this minimum and have low energy. However, when distinct conformers contribute to the observed NMR signal, the NMR-derived structural restraints may have intrinsic contradictions, and the resulting refined ‘average’ structures may have a relatively high energy. The best documented example of such a situation is sugar repuckering in DNA. Sugar rings in solution DNA exist in two rapidly interconverting conformations; the major conformer is S, and the minor conformer is N; the minor conformation is often more pronounced for pyrimidine residues, see, for example, Schmitz and James,164 Rinkel et al.,330 Celda et al.,331 and Ulyanov et al.332 Many observable NMR parameters, including interproton distances and J-couplings, depend on the exact sugar conformation; therefore, values for the corresponding experimental restraints are averaged taking into account populations of these two conformers. This leads to conformational averaging artifacts, whenever one attempts to satisfy such restraints during refinement, thus explaining why high-resolution NMR structures of DNA duplexes tend to have sugars with lower pseudo-rotation phase angle than crystal structures (see, e.g., Ulyanov and James165).
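The averaging artifact can be made concrete with a two-state example. The sketch assumes fast exchange between an S and an N conformer, linear population averaging for J-couplings, and r⁻⁶ averaging for NOE-derived distances (one common approximation; the appropriate rule depends on the time scale of the motion). The distances, couplings, and populations are made-up illustrative numbers.

```python
import numpy as np

def noe_averaged_distance(distances, populations):
    """Effective distance <r^-6>^(-1/6) for conformers in exchange."""
    r = np.asarray(distances, dtype=float)
    p = np.asarray(populations, dtype=float)
    mean_inv_r6 = np.sum(p * r ** -6.0) / np.sum(p)
    return float(mean_inv_r6 ** (-1.0 / 6.0))

def averaged_coupling(couplings, populations):
    """Population-weighted linear average of J-couplings."""
    j = np.asarray(couplings, dtype=float)
    p = np.asarray(populations, dtype=float)
    return float(np.sum(p * j) / np.sum(p))

# Hypothetical two-state sugar: 80% S and 20% N.
populations = [0.8, 0.2]
r_s, r_n = 3.9, 2.4          # illustrative interproton distances (A)
j_s, j_n = 10.0, 1.0         # illustrative J-couplings (Hz)

r_eff = noe_averaged_distance([r_s, r_n], populations)
j_eff = averaged_coupling([j_s, j_n], populations)
print("NOE-averaged distance:", r_eff)
print("linear mean distance: ", 0.8 * r_s + 0.2 * r_n)
print("averaged coupling:    ", j_eff)
```

With these numbers the apparent NOE distance is much closer to the short distance of the minor conformer than the linear mean would suggest, while the averaged coupling lies between the two pure-state values. A single structure forced to satisfy both averaged restraints need not correspond to either conformer, which is the origin of the artifacts described above.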


Several approaches have been developed that take into account the time- and ensemble-averaged nature of NMR restraints and allow determination of individual solution conformers. They include MD with time-averaging of restraints (MDtar),333 calculating the populations of individual solution conformers either with a quadratic programming algorithm (PDQPRO)332 or a genetic algorithm (FINGAR),334 and various variants of multiple-copy refinement.335–338 Applications to nucleic acids have been mostly limited to either MDtar or the MDtar/PDQPRO combination.322,323,339–341 The main impediment for successful application of these methods has been a paucity of experimental restraints, because a significantly greater number of restraints are required to define several solution conformers than a single average conformation. However, with many new types of structural restraints introduced recently, we can expect a renewed interest in application of such methods, see, for example, Schwieters and Clore.342

In conclusion, NMR has become a routine and reliable technique for the determination of average solution structures of moderately sized nucleic acids of up to about 30–40 nt; the challenges for structure determination increase more than linearly as molecular size increases. It is expected that development of experimental and computational techniques will lead to rapid progress in two directions: a better understanding of flexibility and dynamics of nucleic acids in solution,342,343 and increasing the size limit of nucleic acids amenable to structure determination.254,259,261,265,275,296

Abbreviations

1D       one-dimensional
2D       two-dimensional
3D       three-dimensional
COSY     classical J-correlated spectroscopy
CSA      chemical shift anisotropy
DNA      deoxyribonucleic acid
dNTP     deoxynucleoside triphosphate
ESRA     endonuclease-sensitive repeat amplification
HHR      hammerhead ribozyme
HPLC     high-performance liquid chromatography
ISPA     isolated spin-pair approximation
IVM      internal variables module
MD       molecular dynamics
MDtar    MD with time-averaging of restraints
NMP      nucleoside monophosphate
NMR      nuclear magnetic resonance
NOE      nuclear Overhauser effect
NTP      nucleoside triphosphate
PAGE     polyacrylamide gel electrophoresis
PCR      polymerase chain reaction
PDB      Protein Data Bank
RDC      residual dipolar coupling
RMSD     root-mean square deviation
ROESY    residual rotating frame cross-relaxation
PEG      polyethylene glycol
RNA      ribonucleic acid
RNAP     RNA polymerase
SA       simulated annealing
SAXS     small-angle X-ray scattering
SRP      signal recognition particle
tRNA     transfer RNA


Nomenclature
Hz      hertz
kDa     kilodalton
ms      millisecond
ns      nanosecond
nt      nucleotide
ppm     parts per million

References
1. H. M. Berman; J. Westbrook; Z. Feng; G. Gilliland; T. N. Bhat; H. Weissig; I. N. Shindyalov; P. E. Bourne, Nucleic Acids Res. 2000, 28, 235–242.
2. G. Wagner; S. G. Hyberts; T. F. Havel, Annu. Rev. Biophys. Biomol. Struct. 1992, 21, 167–198.
3. M. Billeter, Q. Rev. Biophys. 1992, 25, 325–377.
4. K. Wüthrich, Acta Crystallogr. D Biol. Crystallogr. 1995, 51, 249–270.
5. E. P. Nikonowicz; A. Pardi, J. Mol. Biol. 1993, 232, 1141–1156.
6. G. Varani; F. Aboulela; F. H. T. Allain, Prog. Nucl. Magn. Reson. Spectrosc. 1996, 29, 51–127.
7. B. Furtig; C. Richter; J. Wohnert; H. Schwalbe, Chembiochem. 2003, 4, 936–962.
8. M. P. Latham; D. J. Brown; S. A. McCallum; A. Pardi, Chembiochem. 2005, 6, 1492–1505.
9. J. F. Milligan; O. C. Uhlenbeck, Methods Enzymol. 1989, 180, 51–62.
10. C. Kojima; A. Ono; M. Kainosho, Methods Enzymol. 2001, 338, 261–283.
11. M. H. Caruthers; A. D. Barone; S. L. Beaucage; D. R. Dodds; E. F. Fisher; L. J. McBride; M. Matteucci; Z. Stabinsky; J. Y. Tang, Methods Enzymol. 1987, 154, 287–313.
12. P. Wenter; L. Reymond; S. D. Auweter; F. H. Allain; S. Pitsch, Nucleic Acids Res. 2006, 34, e79.
13. A. Ono; S. Tate; Y. Ishido; M. Kainosho, J. Biomol. NMR 1994, 4, 581–586.
14. J. Santalucia; L. X. Shen; Z. P. Cai; H. Lewis; I. Tinoco, Jr., Nucleic Acids Res. 1995, 23, 4913–4921.
15. A. J. Shallop; B. L. Gaffney; R. A. Jones, J. Org. Chem. 2003, 68, 8657–8661.
16. G. A. Kassavetis; E. T. Butler; D. Roulland; M. J. Chamberlin, J. Biol. Chem. 1982, 257, 5779–5788.
17. J. J. Dunn; F. W. Studier, J. Mol. Biol. 1983, 166, 477–535.
18. C. E. Morris; J. F. Klement; W. T. McAllister, Gene 1986, 41, 193–200.
19. S. Tabor, Curr. Protoc. Mol. Biol. 2001, 16, 16.2.1–16.2.11.
20. G. Krupp, Gene 1988, 72, 75–89.
21. P. Davanloo; A. H. Rosenberg; J. J. Dunn; F. W. Studier, Proc. Natl. Acad. Sci. U.S.A. 1984, 81, 2035–2039.
22. J. R. Wyatt; M. Chastain; J. D. Puglisi, Biotechniques 1991, 11, 764–769.
23. B. He; M. Rong; D. Lyakhov; H. Gartenstein; G. Diaz; R. Castagna; W. T. McAllister; R. K. Durbin, Protein Expr. Purif. 1997, 9, 142–151.
24. R. T. Batey; J. L. Battiste; J. R. Williamson, Methods Enzymol. 1995, 261, 300–322.
25. E. Nikonowicz, Methods Enzymol. 2001, 338, 320–341.
26. S. T. Jeng; J. F. Gardner; R. I. Gumport, J. Biol. Chem. 1992, 267, 19306–19312.
27. J. F. Milligan; D. R. Groebe; G. W. Witherell; O. C. Uhlenbeck, Nucleic Acids Res. 1987, 15, 8783–8798.
28. G. M. Cheetham; T. A. Steitz, Curr. Opin. Struct. Biol. 2000, 10, 117–123.
29. D. Imburgio; M. Rong; K. Ma; W. T. McAllister, Biochemistry 2000, 39, 10419–10430.
30. T. P. Shields; E. Mollova; L. Ste Marie; M. R. Hansen; A. Pardi, RNA 1999, 5, 1259–1267.
31. P. J. Lukavsky; J. D. Puglisi, RNA 2004, 10, 889–893.
32. C. Kao; M. Zheng; S. Rudisser, RNA 1999, 5, 1268–1272.
33. Y. Hayase; H. Inoue; E. Ohtsuka, Biochemistry 1990, 29, 8793–8797.
34. J. Xu; J. Lapham; D. M. Crothers, Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 44–48.
35. C. A. Grosshans; T. R. Cech, Nucleic Acids Res. 1991, 19, 3875–3880.
36. I. Kim; P. J. Lukavsky; J. D. Puglisi, J. Am. Chem. Soc. 2002, 124, 9338–9339.
37. A. G. Tzakos; L. E. Easton; P. J. Lukavsky, J. Am. Chem. Soc. 2006, 128, 13344–13345.
38. J. M. Carothers; J. H. Davis; J. J. Chou; J. W. Szostak, RNA 2006, 12, 567–579.
39. C. J. Hutchins; P. D. Rathjen; A. C. Forster; R. H. Symons, Nucleic Acids Res. 1986, 14, 3627–3640.
40. G. Ferbeyre; V. Bourdeau; M. Pageau; P. Miramontes; R. Cedergren, Genome Res. 2000, 10, 1011–1019.
41. G. A. Prody; J. T. Bakos; J. M. Buzayan; I. R. Schneider; G. Bruening, Science 1986, 231, 1577–1580.
42. O. C. Uhlenbeck, RNA 1995, 1, 4–6.
43. H. K. Cheong; E. Hwang; C. Lee; B. S. Choi; C. Cheong, Nucleic Acids Res. 2004, 32, e84.
44. J. S. Kieft; R. T. Batey, RNA 2004, 10, 988–995.
45. R. T. Batey; J. S. Kieft, RNA 2007, 13, 1384–1389.
46. K. A. LeCuyer; L. S. Behlen; O. C. Uhlenbeck, EMBO J. 1996, 15, 6847–6853.
47. W. C. Winkler; A. Nahvi; A. Roth; J. A. Collins; R. R. Breaker, Nature 2004, 428, 281–286.
48. A. Vermeulen; S. A. McCallum; A. Pardi, Biochemistry 2005, 44, 6024–6033.
49. D. P. Zimmer; D. M. Crothers, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 3091–3095.

50. D. E. Smith; J.-Y. Su; F. M. Jucker, J. Biomol. NMR 1997, 10, 245–253.
51. G. Mer; W. J. Chazin, J. Am. Chem. Soc. 1998, 120, 607–608.
52. D. MacDonald; K. Herbert; X. Zhang; T. Polgruto; P. Lu, J. Mol. Biol. 2001, 306, 1081–1098.
53. A. Barbic; D. P. Zimmer; D. M. Crothers, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 2369–2373.
54. H. Klenow; I. Henningsen, Proc. Natl. Acad. Sci. U.S.A. 1970, 65, 168–175.
55. V. Derbyshire; N. D. Grindley; C. M. Joyce, EMBO J. 1991, 10, 17–24.
56. J. E. Masse; P. Bortmann; T. Dieckmann; J. Feigon, Nucleic Acids Res. 1998, 26, 2618–2624.
57. A. Kettani; S. Bouaziz; E. Skripkin; A. Majumdar; W. Wang; R. A. Jones; D. J. Patel, Structure 1999, 7, 803–815.
58. J. M. Louis; R. G. Martin; G. M. Clore; A. M. Gronenborn, J. Biol. Chem. 1998, 273, 2374–2378.
59. X. Chen; S. V. S. Mariappan; J. J. Kelly, III; J. H. Bushweller; E. M. Bradbury; G. Gupta, FEBS Lett. 1998, 436, 372–376.
60. M. H. Werner; V. Gupta; L. J. Lambert; T. Nagata, Methods Enzymol. 2001, 338, 283–304.
61. B. Rene; G. Masliah; L. Zargarian; O. Mauffret; S. Fermandjian, J. Biomol. NMR 2006, 36, 137–146.
62. S. Ramanathan; B. J. Rao; K. V. Chary, Biochem. Biophys. Res. Commun. 2002, 290, 928–932.
63. J. Cromsigt; B. van Buuren; J. Schleucher; S. Wijmenga, Methods Enzymol. 2001, 338, 371–399.
64. S.-I. Yamakage; T. V. Maltseva; F. P. Nilson; A. Földesi; J. Chattopadhyaya, Nucleic Acids Res. 1993, 21, 5005–5011.
65. A. Földesi; S.-I. Yamakage; F. P. R. Nilsson; T. V. Maltseva; J. Chattopadhyaya, Nucleic Acids Res. 1996, 24, 1187–1194.
66. C. Glemarec; J. Kufel; A. Foldesi; T. Maltseva; A. Sandstrom; L. A. Kirsebom; J. Chattopadhyaya, Nucleic Acids Res. 1996, 24, 2022–2035.
67. U. Schmitz; S. Behrens; D. M. Freymann; R. J. Keenan; P. Lukavsky; P. Walter; T. L. James, RNA 1999, 5, 1419–1429.
68. Z. Du; N. B. Ulyanov; J. Yu; R. Andino; T. L. James, Biochemistry 2004, 43, 5757–5771.
69. P. J. Romaniuk; O. C. Uhlenbeck, Methods Enzymol. 1983, 100, 52–59.
70. J. R. Sampson; O. C. Uhlenbeck, Proc. Natl. Acad. Sci. U.S.A. 1988, 85, 1033–1037.
71. T. Ohtsuki; G. Kawai; K. Watanabe, J. Biochem. 1998, 124, 28–34.
72. T. Ohtsuki; G. Kawai; K. Watanabe, FEBS Lett. 2002, 514, 37–43.
73. K. Wüthrich, NMR of Proteins and Nucleic Acids; Wiley: New York, 1986.
74. F. J. M. van de Ven; C. W. Hilbers, Nucleic Acids Res. 1988, 16, 5713–5726.
75. P. J. Lukavsky, Basic Principles of RNA NMR Spectroscopy. In Structure and Biophysics – New Technologies for Current Challenges in Biology and Beyond; J. D. Puglisi, Ed.; Springer: Heidelberg, 2007; pp 65–80.
76. W. P. Aue; E. Bartholdi; R. R. Ernst, J. Chem. Phys. 1976, 64, 2229–2246.
77. L. Braunschweiler; R. R. Ernst, J. Magn. Reson. 1983, 53, 521–528.
78. A. Bax; S. Subramanian, J. Magn. Reson. 1986, 67, 565–569.
79. C. Griesinger; G. Otting; K. Wüthrich; R. R. Ernst, J. Am. Chem. Soc. 1988, 110, 7870–7872.
80. C. A. G. Haasnoot; F. A. A. M. de Leeuw; C. Altona, Tetrahedron 1980, 36, 2783–2792.
81. L. J. Rinkel; C. Altona, J. Biomol. Struct. Dyn. 1987, 4, 621–649.
82. F. Delaglio; S. Grzesiek; G. W. Vuister; G. Zhu; J. Pfeifer; A. Bax, J. Biomol. NMR 1995, 6, 277–293.
83. T. D. Goddard; D. G. Kneller, SPARKY, Ver. 3.0; University of California: San Francisco, 1998.
84. L. E. Kay; M. Ikura; A. Bax, J. Am. Chem. Soc. 1990, 112, 888–889.
85. G. M. Clore; A. Bax; P. C. Driscoll; P. T. Wingfield; A. M. Gronenborn, Biochemistry 1990, 29, 8172–8184.
86. S. W. Fesik; H. L. Eaton; E. T. Olejniczak; E. R. P. Zuiderweg; L. P. McIntosh; F. W. Dahlquist, J. Am. Chem. Soc. 1990, 112, 886–888.
87. A. Pardi, Methods Enzymol. 1995, 261, 350–380.
88. P. Legault; B. T. Farmer; L. Mueller; A. Pardi, J. Am. Chem. Soc. 1994, 116, 2203–2204.
89. J. P. Marino; J. H. Prestegard; D. M. Crothers, J. Am. Chem. Soc. 1994, 116, 2205–2206.
90. J.-P. Simorre; G. R. Zimmermann; L. Mueller; A. Pardi, J. Am. Chem. Soc. 1996, 118, 5316–5317.
91. B. Simon; K. Zanier; M. Sattler, J. Biomol. NMR 2001, 20, 173–176.
92. V. Sklenář; J. Masse; J. Feigon, J. Magn. Reson. 1999, 137, 345–349.
93. N. B. Ulyanov; W. R. Bauer; T. L. James, J. Biomol. NMR 2002, 22, 265–280.
94. J.-P. Simorre; G. R. Zimmermann; A. Pardi; B. T. Farmer, II; L. Mueller, J. Biomol. NMR 1995, 6, 427–432.
95. J.-P. Simorre; G. R. Zimmermann; L. Mueller; A. Pardi, J. Biomol. NMR 1996, 7, 153–156.
96. V. Sklenář; T. Dieckmann; S. E. Butcher; J. Feigon, J. Biomol. NMR 1996, 7, 83–87.
97. R. Fiala; F. Jiang; D. J. Patel, J. Am. Chem. Soc. 1996, 118, 689–690.
98. J. Wohnert; R. Ramachandran; M. Gorlach; L. R. Brown, J. Magn. Reson. 1999, 139, 430–433.
99. J. Wohnert; M. Gorlach; H. Schwalbe, J. Biomol. NMR 2003, 26, 79–83.
100. J. P. Marino; J. L. Diener; P. B. Moore; C. Griesinger, J. Am. Chem. Soc. 1997, 119, 7361–7366.
101. Z. Du; J. Yu; N. B. Ulyanov; R. Andino; T. L. James, Biochemistry 2004, 43, 11959–11972.
102. V. Sklenář; R. D. Peterson; M. R. Rejante; J. Feigon, J. Biomol. NMR 1993, 3, 721–728.
103. B. T. Farmer, II; L. Muller; E. P. Nikonowicz; A. Pardi, J. Biomol. NMR 1994, 4, 129–134.
104. R. Fiala; J. Czernek; V. Sklenar, J. Biomol. NMR 2000, 16, 291–302.
105. R. Riek; K. Pervushin; C. Fernandez; M. Kainosho; K. Wüthrich, J. Am. Chem. Soc. 2001, 123, 658–664.
106. B. Brutscher; J. P. Simorre, J. Biomol. NMR 2001, 21, 367–372.
107. H. A. Heus; S. S. Wijmenga; F. J. M. van de Ven; C. W. Hilbers, J. Am. Chem. Soc. 1994, 116, 4983–4984.
108. J. P. Marino; H. Schwalbe; C. Anklin; W. Bermel; D. M. Crothers; C. Griesinger, J. Am. Chem. Soc. 1994, 116, 6472–6473.
109. G. Varani; F. Aboulela; F. Allain; C. C. Gubser, J. Biomol. NMR 1995, 5, 315–320.
110. S. Tate; A. Ono; M. Kainosho, J. Magn. Reson. B 1995, 106, 89–91.
111. R. Ramachandran; C. Sich; M. Grüne; V. Soskie; L. R. Brown, J. Biomol. NMR 1996, 7, 251–255.
112. J. P. Marino; H. Schwalbe; C. Anklin; W. Bermel; D. M. Crothers; C. Griesinger, J. Biomol. NMR 1995, 5, 87–92.
113. S. A. Schroeder; J. M. Fu; C. R. Jones; D. G. Gorenstein, Biochemistry 1987, 26, 3812–3821.
114. G. W. Kellogg; A. A. Szewczak; P. B. Moore, J. Am. Chem. Soc. 1992, 114, 2727–2728.
115. G. W. Kellogg; B. I. Schweitzer, J. Biomol. NMR 1993, 3, 577–595.

116. J. Jeener; B. H. Meier; P. Bachmann; R. R. Ernst, J. Chem. Phys. 1979, 71, 4546–4553.
117. L. Mueller; P. Legault; A. Pardi, J. Am. Chem. Soc. 1995, 117, 11043–11048.
118. C. Zwahlen; P. Legault; S. J. F. Vincent; J. Greenblatt; R. Konrat; L. E. Kay, J. Am. Chem. Soc. 1997, 119, 6711–6721.
119. L. R. Comolli; N. B. Ulyanov; A. M. Soto; L. A. Marky; T. L. James; W. H. Gmeiner, Nucleic Acids Res. 2002, 30, 4371–4379.
120. N. B. Ulyanov; A. Mujeeb; Z. Du; M. Tonelli; T. G. Parslow; T. L. James, J. Biol. Chem. 2006, 281, 16168–16177.
121. N. B. Leontis; E. Westhof, RNA 2001, 7, 499–512.
122. H. A. Heus; A. Pardi, J. Am. Chem. Soc. 1991, 113, 4360–4361.
123. J. F. Lefevre; A. N. Lane; O. Jardetzky, FEBS Lett. 1985, 190, 37–40.
124. N. Ulyanov; M. H. Sarma; V. B. Zhurkin; R. H. Sarma, Biochemistry 1993, 32, 6875–6883.
125. P. R. Blake; B. Lee; M. F. Summers; M. W. Adams; J. B. Park; Z. H. Zhou; A. Bax, J. Biomol. NMR 1992, 2, 527–533.
126. A. J. Dingley; F. Cordier; S. Grzesiek, Concepts Magn. Reson. 2001, 13, 103–127.
127. S. Grzesiek; F. Cordier; A. Dingley, Methods Enzymol. 2001, 338, 111–133.
128. P. V. Cornish; D. P. Giedroc; M. Hennig, J. Biomol. NMR 2006, 35, 209–223.
129. A. J. Dingley; L. Nisius; F. Cordier; S. Grzesiek, Nat. Protoc. 2008, 3, 242–248.
130. A. J. Dingley; S. Grzesiek, J. Am. Chem. Soc. 1998, 120, 8293–8297.
131. K. Pervushin; A. Ono; C. Fernandez; T. Szyperski; M. Kainosho; K. Wüthrich, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14147–14151.
132. A. J. Dingley; J. E. Masse; R. D. Peterson; M. Barfield; J. Feigon; S. Grzesiek, J. Am. Chem. Soc. 1999, 121, 6019–6027.
133. A. Majumdar; A. Kettani; E. Skripkin, J. Biomol. NMR 1999, 14, 67–70.
134. A. J. Dingley; J. E. Masse; J. Feigon; S. Grzesiek, J. Biomol. NMR 2000, 16, 279–289.
135. A. Majumdar; A. Kettani; E. Skripkin; D. J. Patel, J. Biomol. NMR 1999, 15, 207–211.
136. M. Hennig; J. R. Williamson, Nucleic Acids Res. 2000, 28, 1585–1593.
137. D. P. Giedroc; P. V. Cornish; M. Hennig, J. Am. Chem. Soc. 2003, 125, 4676–4677.
138. H. Sotoya; A. Matsugami; T. Ikeda; K. Ouhashi; S. Uesugi; M. Katahira, Nucleic Acids Res. 2004, 32, 5113–5118.
139. C. H. Arrowsmith; R. Pachter; R. B. Altman; S. B. Iyer; O. Jardetzky, Biochemistry 1990, 29, 6332–6341.
140. P. J. M. Folkers; R. H. A. Folmer; R. N. H. Konings; C. W. Hilbers, J. Am. Chem. Soc. 1993, 115, 3798–3799.
141. M. Burgering; R. Boelens; R. Kaptein, J. Biomol. NMR 1993, 3 (6), 709–714.
142. F. Aboul-ela; E. P. Nikonowicz; A. Pardi, FEBS Lett. 1994, 347, 261–264.
143. N. B. Ulyanov; V. I. Ivanov; E. E. Minyat; E. B. Khomyakov; M. V. Petrova; K. Lesiak; T. L. James, Biochemistry 1998, 37, 12715–12726.
144. R. Kaptein; R. Boelens; R. M. Scheek; W. F. van Gunsteren, Biochemistry 1988, 27, 5389–5394.
145. A. Kumar; R. R. Ernst; K. Wüthrich, Biochem. Biophys. Res. Commun. 1980, 95, 1–6.
146. S. Macura; R. R. Ernst, Mol. Phys. 1980, 41, 95–117.
147. S. Macura; Y. Huang; D. Suter; R. R. Ernst, J. Magn. Reson. 1981, 43, 259–281.
148. J. W. Keepers; T. L. James, J. Magn. Reson. 1984, 57, 404–426.
149. H. Liu; P. D. Thomas; T. L. James, J. Magn. Reson. 1992, 98, 163–175.
150. W. Massefski; P. H. Bolton, J. Magn. Reson. 1985, 65, 526–530.
151. E.-I. Suzuki; N. Pattabiraman; G. Zon; T. L. James, Biochemistry 1986, 25, 6854–6865.
152. G. Gupta; M. H. Sarma; R. H. Sarma, Biochemistry 1988, 27, 7909–7918.
153. A. N. Lane, Biochim. Biophys. Acta 1990, 1049, 205–212.
154. J. D. Baleja; J. Moult; B. D. Sykes, J. Magn. Reson. 1990, 87, 375–384.
155. J. D. Baleja; M. W. Germann; J. H. van de Sande; B. D. Sykes, J. Mol. Biol. 1990, 215, 411–428.
156. A. M. J. J. Bonvin; R. Boelens; R. Kaptein, J. Biomol. NMR 1991, 1, 305–309.
157. H. Robinson; A. H.-J. Wang, Biochemistry 1992, 31, 3524–3533.
158. N. Ulyanov; A. A. Gorin; V. B. Zhurkin; B.-C. Chen; M. H. Sarma; R. H. Sarma, Biochemistry 1992, 31, 3918–3930.
159. S.-G. Kim; B. R. Reid, Biochemistry 1992, 31, 12103–12116.
160. M. Foti; S. Marshalko; E. Schurter; S. Kumar; G. P. Beardsley; B. I. Schweitzer, Biochemistry 1997, 36, 5336–5345.
161. A. Kumar; G. Wagner; R. R. Ernst; K. Wüthrich, J. Am. Chem. Soc. 1981, 103, 3654–3658.
162. P. Cuniasse; L. C. Sowers; R. Eritja; B. Kaplan; M. F. Goodman; J. A. H. Cognet; M. LeBret; W. Guschlbauer; G. V. Fazakerley, Nucleic Acids Res. 1987, 15, 8003–8022.
163. B. Reid; K. Banks; P. Flynn; W. Nerdal, Biochemistry 1989, 28, 10001–10007.
164. U. Schmitz; T. L. James, Methods Enzymol. 1995, 261, 3–44.
165. N. B. Ulyanov; T. L. James, Methods Enzymol. 1995, 261, 90–120.
166. F. J. M. van de Ven; M. J. J. Blommers; R. E. Schouten; C. W. Hilbers, J. Magn. Reson. 1991, 94, 140–151.
167. S. S. Wijmenga; B. N. M. van Buuren, Prog. Nucl. Magn. Reson. Spectrosc. 1998, 32, 287–387.
168. B. A. Borgias; T. L. James, Methods Enzymol. 1989, 176, 169–183.
169. B. A. Borgias; T. L. James, J. Magn. Reson. 1990, 87, 475–487.
170. R. Boelens; T. M. G. Koning; G. A. van der Marel; J. H. van Boom; R. Kaptein, J. Magn. Reson. 1989, 82, 290–308.
171. C. B. Post; R. P. Meadows; D. G. Gorenstein, J. Am. Chem. Soc. 1990, 112, 6796–6803.
172. M. Gochin; T. L. James, Biochemistry 1990, 29, 11172–11180.
173. K. Weisz; R. H. Shafer; W. Egan; T. L. James, Biochemistry 1992, 31, 7477–7487.
174. U. Schmitz; I. Sethson; W. Egan; T. L. James, J. Mol. Biol. 1992, 227, 510–531.
175. A. Mujeeb; S. M. Kerwin; W. Egan; G. L. Kenyon; T. L. James, Biochemistry 1992, 31, 9325–9338.
176. M. Tonelli; E. Ragg; A. M. Bianucci; K. Lesiak; T. L. James, Biochemistry 1998, 37, 11745–11761.
177. H. Liu; H. P. Spielmann; N. B. Ulyanov; D. E. Wemmer; T. L. James, J. Biomol. NMR 1995, 6, 390–402.
178. U. Schmitz; D. A. Pearlman; T. L. James, J. Mol. Biol. 1991, 221, 271–292.
179. A. Mujeeb; S. M. Kerwin; G. L. Kenyon; T. L. James, Biochemistry 1993, 32, 13419–13431.
180. K. Weisz; R. H. Shafer; W. Egan; T. L. James, Biochemistry 1994, 33, 354–366.
181. H. P. Spielmann; T. J. Dwyer; J. E. Hearst; D. E. Wemmer, Biochemistry 1995, 34, 12937–12953.

182. P. V. Sahasrabudhe; R. T. Pon; W. H. Gmeiner, Biochemistry 1996, 35, 13597–13608.
183. Y. Coppel; N. Berthet; C. Coulombeau; C. Coulombeau; J. Garcia; J. Lhomme, Biochemistry 1997, 36, 4817–4830.
184. M. Petersen; J. P. Jacobsen, Bioconjug. Chem. 1998, 9, 331–340.
185. N. B. Ulyanov; V. I. Ivanov; E. E. Minyat; E. B. Khomyakova; M. V. Petrova; K. Lesiak; T. L. James, Biochemistry 1998, 37, 12715–12726.
186. L. Ayadi; M. Jourdan; C. Coulombeau; J. Garcia; R. Lavery, J. Biomol. Struct. Dyn. 1999, 17, 245–257.
187. E. V. Bichenkova; D. Marks; M. I. Dobrikov; V. V. Vlassov; G. A. Morris; K. T. Douglas, J. Biomol. Struct. Dyn. 1999, 17, 193–211.
188. R. J. Isaacs; W. S. Rayens; H. P. Spielmann, J. Mol. Biol. 2002, 319, 191–207.
189. M. Petersen; K. Bondensgaard; J. Wengel; J. P. Jacobsen, J. Am. Chem. Soc. 2002, 124, 5974–5982.
190. H. V. Tommerholt; N. K. Christensen; P. Nielsen; J. Wengel; P. C. Stein; J. P. Jacobsen; M. Petersen, Org. Biomol. Chem. 2003, 1, 1790–1797.
191. J. M. Aramini; S. H. Cleaver; R. T. Pon; R. P. Cunningham; M. W. Germann, J. Mol. Biol. 2004, 338, 77–91.
192. I. Gomez-Pinto; E. Cubero; S. G. Kalko; V. Monaco; G. van der Marel; J. H. van Boom; M. Orozco; C. Gonzalez, J. Biol. Chem. 2004, 279, 24552–24560.
193. Q. Zhang; T. J. Dwyer; V. Tsui; D. A. Case; J. Cho; P. B. Dervan; D. E. Wemmer, J. Am. Chem. Soc. 2004, 126, 7958–7966.
194. H. Baruah; M. W. Wright; U. Bierbach, Biochemistry 2005, 44, 6059–6070.
195. G. Shanmugam; A. K. Goodenough; I. D. Kozekov; F. P. Guengerich; C. J. Rizzo; M. P. Stone, Chem. Res. Toxicol. 2007, 20, 1601–1611.
196. D. J. Kerwood; P. N. Borer, Magn. Reson. Chem. 1996, 34, S136–S146.
197. P. V. Sahasrabudhe; W. H. Gmeiner, Biochemistry 1997, 36, 5981–5991.
198. A. Mujeeb; T. G. Parslow; A. Zarrinpar; C. Das; T. L. James, FEBS Lett. 1999, 458, 387–392.
199. U. Schmitz; T. L. James; P. Lukavsky; P. Walter, Nat. Struct. Biol. 1999, 6, 634–638.
200. D. J. Kerwood; M. J. Cavaluzzi; P. N. Borer, Biochemistry 2001, 40, 14518–14529.
201. Y. Yuan; D. J. Kerwood; A. C. Paoletti; M. F. Shubsda; P. N. Borer, Biochemistry 2003, 42, 5259–5269.
202. M. Karplus, J. Chem. Phys. 1959, 30, 11–31.
203. M. Karplus, J. Am. Chem. Soc. 1963, 85, 2870–2871.
204. C. Griesinger; O. W. Sorensen; R. R. Ernst, J. Magn. Reson. 1987, 75, 474–492.
205. G. W. Vuister; A. Bax, J. Am. Chem. Soc. 1993, 115, 7772–7777.
206. A. Bax; G. W. Vuister; S. Grzesiek; F. Delaglio; A. C. Wang; R. Tschudin; G. Zhu, Methods Enzymol. 1994, 239, 79–105.
207. C. Sich; O. Ohlenschläger; R. Ramachandran; M. Görlach; L. R. Brown, Biochemistry 1997, 36, 13989–14002.
208. T. Carlomagno; M. Hennig; J. R. Williamson, J. Biomol. NMR 2002, 22, 65–81.
209. F. Delaglio; Z. Wu; A. Bax, J. Magn. Reson. 2001, 149, 276–281.
210. H. Widmer; K. Wüthrich, J. Magn. Reson. 1986, 70, 270–279.
211. M. M. Mooren; S. S. Wijmenga; G. A. van der Marel; J. H. van Boom; C. W. Hilbers, Nucleic Acids Res. 1994, 22, 2658–2666.
212. J. P. Marino; H. Schwalbe; S. J. Glaser; C. Griesinger, J. Am. Chem. Soc. 1996, 118, 4388–4395.
213. J. H. Ippel; S. S. Wijmenga; R. de Jong; H. A. Heus; C. W. Hilbers; E. de Vroom; G. A. van der Marel; J. H. van Boom, Magn. Reson. Chem. 1996, 34, S156–S176.
214. J. van Wijk; B. D. Huckriede; J. H. Ippel; C. Altona, Methods Enzymol. 1992, 211, 286–306.
215. C. Altona; R. Francke; R. de Haan; J. H. Ippel; G. H. Daalmans; A. J. H. Westra Hoekzema; J. van Wijk, Magn. Reson. Chem. 1994, 32, 670–678.
216. Y. T. van den Hoogen; C. M. Hilgersom; D. Brozda; K. Lesiak; P. F. Torrence; C. Altona, Eur. J. Biochem. 1989, 182, 629–637.
217. J. R. Tolman; J. M. Flanagan; M. A. Kennedy; J. H. Prestegard, Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 9279–9283.
218. N. Tjandra; A. Bax, Science 1997, 278, 1111–1114.
219. N. Tjandra; J. G. Omichinski; A. M. Gronenborn; G. M. Clore; A. Bax, Nat. Struct. Biol. 1997, 4, 732–738.
220. R. S. Lipsitz; N. Tjandra, Annu. Rev. Biophys. Biomol. Struct. 2004, 33, 387–413.
221. N. Tjandra; S. Tate; A. Ono; M. Kainosho; A. Bax, J. Am. Chem. Soc. 2000, 122, 6190–6200.
222. P. Bayer; L. Varani; G. Varani, J. Biomol. NMR 1999, 14, 149–155.
223. M. R. Hansen; L. Mueller; A. Pardi, Nat. Struct. Biol. 1998, 5, 1065–1074.
224. M. R. Hansen; P. Hanson; A. Pardi, Methods Enzymol. 2000, 317, 220–240.
225. M. Rückert; G. Otting, J. Am. Chem. Soc. 2000, 122, 7793–7797.
226. R. D. Beger; V. M. Marathias; B. F. Volkman; P. H. Bolton, J. Magn. Reson. 1998, 135, 256–259.
227. J. Ying; A. Grishaev; M. P. Latham; A. Pardi; A. Bax, J. Biomol. NMR 2007, 39, 91–96.
228. A. Bax, Protein Sci. 2003, 12, 1–16.
229. V. Tsui; L. Zhu; T. H. Huang; P. E. Wright; D. A. Case, J. Biomol. NMR 2000, 16, 9–21.
230. G. M. Clore; A. M. Gronenborn; N. Tjandra, J. Magn. Reson. 1998, 131, 159–162.
231. A. Bax; G. Kontaxis; N. Tjandra, Methods Enzymol. 2001, 339, 127–174.
232. C. D. Schwieters; J. J. Kuszewski; N. Tjandra; G. M. Clore, J. Magn. Reson. 2003, 160, 66–73.
233. W. J. Wedemeyer; C. A. Rohl; H. A. Scheraga, J. Biomol. NMR 2002, 22, 137–151.
234. J. A. Losonczi; M. Andrec; M. W. Fischer; J. H. Prestegard, J. Magn. Reson. 1999, 138, 334–342.
235. M. Zweckstetter; A. Bax, J. Am. Chem. Soc. 2000, 122, 3791–3792.
236. Y. Wei; M. H. Werner, J. Biomol. NMR 2006, 35, 17–25.
237. H. Zhou; A. Vermeulen; F. M. Jucker; A. Pardi, Biopolymers 1999, 52, 168–180.
238. A. Vermeulen; H. Zhou; A. Pardi, J. Am. Chem. Soc. 2000, 122, 9638–9647.
239. O. Mauffret; G. Tevanian; S. Fermandjian, J. Biomol. NMR 2002, 24, 317–328.
240. K. McAteer; M. A. Kennedy, J. Biomol. Struct. Dyn. 2003, 20, 487–506.
241. P. Padrta; R. Stefl; L. Kralik; L. Zidek; V. Sklenar, J. Biomol. NMR 2002, 24, 1–14.
242. Z. Wu; F. Delaglio; N. Tjandra; V. B. Zhurkin; A. Bax, J. Biomol. NMR 2003, 26, 297–315.
243. K. McAteer; A. Aceves-Gaona; R. Michalczyk; G. W. Buchko; N. G. Isern; L. A. Silks; J. H. Miller; M. A. Kennedy, Biopolymers 2004, 75, 497–511.

244. R. Stefl; H. Wu; S. Ravindranathan; V. Sklenar; J. Feigon, Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 1177–1182.
245. B. Wu; F. Girard; B. van Buuren; J. Schleucher; M. Tessari; S. Wijmenga, Nucleic Acids Res. 2004, 32, 3228–3239.
246. J. G. Renisio; S. Cosquer; I. Cherrak; S. El Antri; O. Mauffret; S. Fermandjian, Nucleic Acids Res. 2005, 33, 1970–1981.
247. F. Alvarez-Salgado; H. Desvaux; Y. Boulard, Magn. Reson. Chem. 2006, 44, 1081–1089.
248. N. Sibille; A. Pardi; J. P. Simorre; M. Blackledge, J. Am. Chem. Soc. 2001, 123, 12135–12146.
249. J. J. Warren; P. B. Moore, J. Biomol. NMR 2001, 20, 311–323.
250. K. Bondensgaard; E. T. Mollova; A. Pardi, Biochemistry 2002, 41, 11532–11542.
251. T. C. Leeper; M. B. Martin; H. Kim; S. Cox; V. Semenchenko; F. J. Schmidt; S. R. Van Doren, Nat. Struct. Biol. 2002, 9, 397–403.
252. L. D. Finger; L. Trantirek; C. Johansson; J. Feigon, Nucleic Acids Res. 2003, 31, 6461–6472.
253. D. C. Lawrence; C. C. Stover; J. Noznitsky; Z. Wu; M. F. Summers, J. Mol. Biol. 2003, 326, 529–542.
254. P. J. Lukavsky; I. Kim; G. A. Otto; J. D. Puglisi, Nat. Struct. Biol. 2003, 10, 1033–1038.
255. S. A. McCallum; A. Pardi, J. Mol. Biol. 2003, 326, 1037–1050.
256. C. A. Theimer; L. D. Finger; L. Trantirek; J. Feigon, Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 449–454.
257. E. O’Neil-Cabello; D. L. Bryce; E. P. Nikonowicz; A. Bax, J. Am. Chem. Soc. 2004, 126, 66–67.
258. P. Vallurupalli; P. B. Moore, J. Mol. Biol. 2003, 325, 843–856.
259. V. D’Souza; A. Dey; D. Habib; M. F. Summers, J. Mol. Biol. 2004, 337, 427–442.
260. D. G. Sashital; G. Cornilescu; C. J. McManus; D. A. Brow; S. E. Butcher, Nat. Struct. Mol. Biol. 2004, 11, 1237–1242.
261. J. H. Davis; M. Tonelli; L. G. Scott; L. Jaeger; J. R. Williamson; S. E. Butcher, J. Mol. Biol. 2005, 351, 371–382.
262. T. C. Leeper; G. Varani, RNA 2005, 9, 394–403.
263. D. W. Staple; S. E. Butcher, J. Mol. Biol. 2005, 349, 1011–1023.
264. C. A. Theimer; C. A. Blois; J. Feigon, Mol. Cell 2005, 17, 671–682.
265. Y. Chen; J. Fender; J. D. Legassie; M. B. Jarstfer; T. M. Bryan; G. Varani, EMBO J. 2006, 25, 3156–3166.
266. S. Flodell; M. Petersen; F. Girard; J. Zdunek; K. Kidd-Ljunggren; J. Schleucher; S. Wijmenga, Nucleic Acids Res. 2006, 34, 4449–4457.
267. Y. Nomura; M. Kajikawa; S. Baba; S. Nakazato; T. Imai; T. Sakamoto; N. Okada; G. Kawai, Nucleic Acids Res. 2006, 34, 5184–5193.
268. R. J. Richards; C. A. Theimer; L. D. Finger; J. Feigon, Nucleic Acids Res. 2006, 34, 816–825.
269. R. J. Richards; H. Wu; L. Trantirek; C. M. O’Connor; K. Collins; J. Feigon, RNA 2006, 12, 1475–1485.
270. S. J. Headey; H. Huang; J. K. Claridge; G. A. Soares; K. Dutta; M. Schwalbe; D. Yang; S. M. Pascal, RNA 2007, 13, 351–360.
271. R. J. Marcheschi; D. W. Staple; S. E. Butcher, J. Mol. Biol. 2007, 373, 652–663.
272. D. G. Sashital; V. Venditti; C. G. Angers; G. Cornilescu; S. E. Butcher, RNA 2007, 13, 328–338.
273. N. Shankar; T. Xia; S. D. Kennedy; T. R. Krugh; D. H. Mathews; D. H. Turner, Biochemistry 2007, 46, 12665–12678.
274. C. A. Theimer; B. E. Jady; N. Chim; P. Richard; K. E. Breece; T. Kiss; J. Feigon, Mol. Cell 2007, 27, 869–881.
275. H. Wu; J. Feigon, Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 6655–6660.
276. J. Zoll; M. Tessari; F. J. Van Kuppeveld; W. J. Melchers; H. A. Heus, RNA 2007, 13, 781–792.
277. N. J. Reiter; L. J. Maher, III; S. E. Butcher, Nucleic Acids Res. 2008, 36, 1227–1236.
278. H. Van Melckebeke; M. Devany; C. Di Primo; F. Beaurain; J. J. Toulme; D. L. Bryce; J. Boisbouvier, Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 9210–9215.
279. K. Tu; M. Gochin, J. Am. Chem. Soc. 1999, 121, 9276–9285.
280. M. Gochin, Structure 2000, 8, 441–452.
281. Z. Wu; N. Tjandra; A. Bax, J. Am. Chem. Soc. 2001, 123, 3617–3618.
282. J. Herzfeld; R. G. Griffin; R. A. Haberkorn, Biochemistry 1978, 17, 2711–2718.
283. E. O’Neil-Cabello; Z. Wu; D. L. Bryce; E. P. Nikonowicz; A. Bax, J. Biomol. NMR 2004, 30, 61–70.
284. D. L. Bryce; A. Grishaev; A. Bax, J. Am. Chem. Soc. 2005, 127, 7387–7396.
285. J. Ying; A. Grishaev; D. L. Bryce; A. Bax, J. Am. Chem. Soc. 2006, 128, 11443–11454.
286. K. Pervushin; R. Riek; G. Wider; K. Wüthrich, Proc. Natl. Acad. Sci. U.S.A. 1997, 94, 12366–12371.
287. B. Brutscher; J. Boisbouvier; A. Pardi; D. Marion; J.-P. Simorre, J. Am. Chem. Soc. 1998, 120, 11845–11851.
288. A. Grishaev; J. Ying; A. Bax, J. Am. Chem. Soc. 2006, 128, 10010–10011.
289. C. Richter; C. Griesinger; I. Felli; P. T. Cole; G. Varani; H. Schwalbe, J. Biomol. NMR 1999, 15, 241–250.
290. J. Boisbouvier; B. Brutscher; A. Pardi; D. Marion; J. P. Simorre, J. Am. Chem. Soc. 2000, 122, 6779–6780.
291. M. Ebrahimi; P. Rossi; C. Rogers; G. S. Harbison, J. Magn. Reson. 2001, 150, 1–9.
292. D. A. Case, J. Biomol. NMR 1995, 6, 341–346.
293. S. S. Wijmenga; M. Kruithof; C. W. Hilbers, J. Biomol. NMR 1997, 10, 337–350.
294. J. Cromsigt; C. W. Hilbers; S. S. Wijmenga, J. Biomol. NMR 2001, 21, 11–29.
295. D. S. Wishart; D. A. Case, Methods Enzymol. 2001, 338, 3–34.
296. A. Grishaev; J. Ying; M. D. Canny; A. Pardi; A. Bax, J. Biomol. NMR 2008, 42, 99–109.
297. G. M. Clore; A. M. Gronenborn, Crit. Rev. Biochem. Mol. Biol. 1989, 24, 479–564.
298. A. T. Brünger; M. Karplus, Acc. Chem. Res. 1991, 24, 54–61.
299. T. L. James; V. J. Basus, Annu. Rev. Phys. Chem. 1991, 42, 501–542.
300. N. B. Ulyanov; U. Schmitz; T. L. James, J. Biomol. NMR 1993, 3, 547–568.
301. E. G. Stein; L. M. Rice; A. T. Brünger, J. Magn. Reson. 1997, 124, 154–164.
302. G. M. Clore; C. D. Schwieters, Curr. Opin. Struct. Biol. 2002, 12, 146–153.
303. D. A. Case; T. E. Cheatham, III; T. Darden; H. Gohlke; R. Luo; K. M. Merz, Jr.; A. Onufriev; C. Simmerling; B. Wang; R. J. Woods, J. Comput. Chem. 2005, 26, 1668–1688.
304. J. de Vlieg; R. M. Scheek; W. F. van Gunsteren; R. Kaptein; J. Thomason, Proteins 1988, 3, 209–218.
305. A. T. Brünger, XPLOR Manual, Ver. 3.1; Yale University Press: New Haven, 1993.
306. A. T. Brünger; P. D. Adams; G. M. Clore; W. L. DeLano; P. Gros; R. W. GrosseKunstleve; J. S. Jiang; J. Kuszewski; M. Nilges; N. S. Pannu; R. J. Read; L. M. Rice; T. Simonson; G. L. Warren, Acta Crystallogr. D Biol. Crystallogr. 1998, 54, 905–921.
307. C. D. Schwieters; J. J. Kuszewski; G. M. Clore, Prog. Nucl. Magn. Reson. Spectrosc. 2006, 48, 47–62.

308. P. Güntert; C. Mumenthaler; K. Wüthrich, J. Mol. Biol. 1997, 273, 283–298.
309. V. B. Zhurkin; N. B. Ulyanov; A. A. Gorin; R. L. Jernigan, Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 7046–7050.
310. R. Lavery; K. Zakrzewska; H. Sklenar, Comput. Phys. Commun. 1995, 91, 135–158.
311. A. Amir-Aslani; O. Mauffret; F. Sourgen; S. Neplaz; R. G. Maroun; E. Lescot; G. Tevanian; S. Fermandjian, J. Mol. Biol. 1996, 263, 776–788.
312. M. Orozco; A. Perez; A. Noy; F. J. Luque, Chem. Soc. Rev. 2003, 32, 350–364.
313. J. D. Baleja; R. T. Pon; B. D. Sykes, Biochemistry 1990, 29, 4828–4839.
314. D. J. Kerwood; G. Zon; T. L. James, Eur. J. Biochem. 1991, 197, 583–595.
315. A. T. Brünger; P. D. Adams; L. M. Rice, Structure 1997, 5, 325–336.
316. H. J. C. Berendsen; J. P. M. Postma; W. F. van Gunsteren; A. Di Nola; J. R. Haak, J. Chem. Phys. 1984, 81, 3684–3690.
317. N. Metropolis; A. W. Rosenbluth; M. N. Rosenbluth; A. H. Teller; E. Teller, J. Chem. Phys. 1953, 21, 1087–1092.
318. C. D. Schwieters; G. M. Clore, J. Magn. Reson. 2001, 152, 288–302.
319. J. P. Ryckaert; G. Cicotti; H. J. C. Berendsen, J. Comput. Phys. 1977, 23, 327–341.
320. N. B. Ulyanov; Z. Du; T. L. James, Refinement of Nucleic Acid Structures with Residual Dipolar Coupling Restraints in Cartesian Coordinate Space. In Modern Magnetic Resonance; G. A. Webb, Ed.; Springer: Netherlands, 2006; pp 665–670.
321. T. F. Havel; K. Wüthrich, J. Mol. Biol. 1985, 182, 281–294.
322. U. Schmitz; N. B. Ulyanov; A. Kumar; T. L. James, J. Mol. Biol. 1993, 234, 373–389.
323. J. M. Aramini; A. Mujeeb; N. B. Ulyanov; M. W. Germann, J. Biomol. NMR 2000, 18, 287–302.
324. J. Kuszewski; C. Schwieters; G. M. Clore, J. Am. Chem. Soc. 2001, 123, 3903–3918.
325. W. J. Metzler; C. Wang; D. Kitchen; R. M. Levy; A. Pardi, J. Mol. Biol. 1990, 214, 711–736.
326. F. H. Allain; G. Varani, J. Mol. Biol. 1997, 267, 338–351.
327. V. N. Maiorov; G. M. Crippen, Proteins 1995, 22, 273–283.
328. T. L. James, Methods Enzymol. 1994, 239, 416–439.
329. A. Brünger, Nature 1992, 355, 472–474.
330. L. J. Rinkel; G. A. van der Marel; J. H. van Boom; C. Altona, Eur. J. Biochem. 1987, 166, 87–101.
331. B. Celda; H. Widmer; W. Leupin; W. J. Chazin; W. A. Denny; K. Wüthrich, Biochemistry 1989, 28, 1462–1470.
332. N. B. Ulyanov; U. Schmitz; A. Kumar; T. L. James, Biophys. J. 1995, 68, 13–24.
333. A. E. Torda; R. M. Scheek; W. F. van Gunsteren, J. Mol. Biol. 1990, 214, 223–235.
334. D. A. Pearlman, J. Biomol. NMR 1996, 8, 49–66.
335. A. M. J. J. Bonvin; A. T. Brünger, J. Mol. Biol. 1995, 250, 80–93.
336. J. Fennen; A. E. Torda; W. F. van Gunsteren, J. Biomol. NMR 1995, 6, 163–170.
337. J. Kemmink; R. M. Scheek, J. Biomol. NMR 1995, 5, 33–40.
338. A. Görler; N. B. Ulyanov; T. L. James, J. Biomol. NMR 2000, 16, 147–164.
339. L. J. Yao; T. L. James; J. T. Kealey; D. V. Santi; U. Schmitz, J. Biomol. NMR 1997, 9, 229–244.
340. U. Schmitz; A. Donati; T. L. James; N. B. Ulyanov; L. Yao, Biopolymers 1998, 46, 329–342.
341. R. J. Isaacs; H. P. Spielmann, J. Am. Chem. Soc. 2004, 126, 583–590.
342. C. D. Schwieters; G. M. Clore, Biochemistry 2007, 46, 1152–1166.
343. M. Getz; X. Sun; A. Casiano-Negroni; Q. Zhang; H. M. Al-Hashimi, Biopolymers 2007, 86, 384–402.

Biographical Sketches

Nikolai B. Ulyanov studied mathematics in Moscow State University and worked on computational modeling of DNA bending as part of his Ph.D. project in the group of Dr. Victor Zhurkin in the Engelhardt Institute of Molecular Biology in Moscow. He was a postdoctoral fellow with Prof. Ramaswamy Sarma studying NMR spectroscopy of DNA at the State University of New York at Albany. Currently he is Associate Adjunct Professor at the University of California at San Francisco. His research interests focus on the structure and dynamics of nucleic acids, studied by computational methods and NMR.

Thomas L. James is Professor of Chemistry, Pharmaceutical Chemistry and Radiology at the University of California, San Francisco, where after 13 years he recently stepped down as Chair of the Department of Pharmaceutical Chemistry. He received his Ph.D. from the University of Wisconsin. After 2 years in industry and 2 years of postdoctoral work with Prof. Mildred Cohn in the emerging area of biological NMR, he joined the UCSF faculty. Most of his research has focused on the development and the use of NMR in biology. Part of this work has involved in vivo NMR, for example, spectroscopic imaging to investigate stroke, prostatic cancer, and drug toxicology. Other NMR-related research has emphasized atomic-level understanding, with major goals to (1) enhance the accuracy and precision of protein and nucleic acid structures determined, (2) develop the means of describing conformational ensembles, (3) apply those methodologies to study biomolecular structure and dynamics and small molecule–macromolecule interactions, and (4) use 3D nucleic acid structures and computational search algorithms to discover novel ligands to serve as drug leads. Professor James has authored about 360 publications, including one book written and eight books edited. He has served as Editor or on the Editorial Board of four journals, four years on an NIH study section, and on several advisory boards.

9.09 Derivation of Peptide and Protein Structure using NMR Spectroscopy

Glenn F. King and Mehdi Mobli, The University of Queensland, St. Lucia, QLD, Australia

© 2010 Elsevier Ltd. All rights reserved.

9.09.1 Introduction
9.09.2 Sample Considerations and Solvent Suppression
9.09.2.1 Heterogeneity of the Nuclear Magnetic Resonance Sample
9.09.2.2 Paramagnetic Ions and Additives
9.09.2.3 pH
9.09.2.4 Temperature
9.09.2.5 Ionic Strength
9.09.2.6 Solvent Suppression
9.09.2.6.1 Overview
9.09.2.6.2 Radiation damping
9.09.2.6.3 Presaturation
9.09.2.6.4 Jump–return and binomial sequences
9.09.2.6.5 WATERGATE and water-flip-back
9.09.2.6.6 Excitation sculpting
9.09.2.6.7 Coherence pathway selection
9.09.2.6.8 Postprocessing methods
9.09.2.6.9 Summary
9.09.3 Data Acquisition for Nonlabeled Peptides
9.09.3.1 Overview
9.09.3.2 Homonuclear Resonance Assignment Strategies
9.09.3.2.1 Overview
9.09.3.2.2 Suitability of a protein for homonuclear assignment techniques
9.09.3.2.3 Spin system identification
9.09.3.2.4 Sequence-specific resonance assignment
9.09.3.2.5 Three-dimensional homonuclear NMR
9.09.3.3 Heteronuclear Resonance Assignment Strategies
9.09.3.3.1 Introduction
9.09.3.3.2 Heteronuclear correlation spectroscopy
9.09.4 Data Acquisition for Isotopically Labeled Proteins
9.09.4.1 Overview
9.09.4.2 Heteronuclear-Edited NMR Experiments
9.09.4.3 Triple Resonance Experiments for Protein Backbone Assignment
9.09.4.4 Triple Resonance Experiments for Protein Side Chain Assignment
9.09.4.5 Summary of Resonance Assignment Strategies
9.09.5 Extraction of Structural Constraints
9.09.5.1 Overview
9.09.5.2 Interproton Distances
9.09.5.3 Backbone Dihedral Angles
9.09.5.4 Side chain Dihedral Angles
9.09.5.5 Hydrogen Bonds
9.09.6 Calculation of Structures from NMR Data
9.09.6.1 Overview
9.09.6.2 Parameterization of NMR-Derived Conformational Restraints
9.09.6.3 Structure Calculation Methods
9.09.6.3.1 Torsion angle dynamics
9.09.6.3.2 Dynamical simulated annealing
9.09.6.3.3 Chemical shifts as structural restraints
9.09.6.4 Assessing the Quality of Structures Derived from NMR Data
9.09.7 Conclusions and Future Prospects
References

9.09.1 Introduction

Nuclear magnetic resonance (NMR) spectroscopy is an invaluable tool for determining the three-dimensional (3D) structure of both small and large biomolecules. For very small molecules, determination of high-resolution structures may not be relevant either because the molecules adopt a single rigid structure or because they sample a vast amount of conformational space due to the presence of multiple rotatable bonds with a low energy barrier for interconversion between rotamers. In such cases, structure determination may be simply a matter of determining the local environment about a stereocenter or the orientation of two molecular fragments. However, many peptides and small proteins adopt specific structural forms, with stable secondary and tertiary structures, and in these cases NMR spectroscopy is the most widely used tool for determining their solution conformation (see Chapter 9.06).

The fundamental limitation of any NMR-based protein structure determination endeavor is the ability to unambiguously identify the nuclear resonance frequencies (or chemical shifts) of each NMR-active nucleus in the protein. Inevitably, as the number of nuclei in a protein increases, so does the spectral complexity. A limit is ultimately reached where peak overlap is so severe that resonance frequencies become ambiguous, thus curtailing further analysis. Resonance overlap can be alleviated in several ways. Since resonance dispersion is proportional to magnetic field strength, spectral complexity can be decreased by performing experiments at higher magnetic field strength, which has the added benefit of improving the sensitivity; however, this can be very costly and it generally offers only limited alleviation of the spectral overlap problem.
A more efficient means of increasing resonance dispersion is to employ multidimensional NMR experiments in which the chemical shift of one nucleus is related to that of its interacting partner(s), resulting in dispersion of resonances into extra frequency dimensions. The extent to which this can be achieved is related to the sensitivity at which the various nuclei can be detected. Moreover, since NMR spectroscopy is a relatively insensitive spectroscopic method, there is an inherent concentration limit below which NMR studies of peptides and proteins are unlikely to be successful (see Chapter 2.15). For unlabeled peptides purified from natural sources, one is generally limited to two-dimensional (2D) NMR experiments (although some 3D experiments are possible as outlined in Section 9.09.3.2.5). Therefore, determination of the structure of even small peptides can become problematic due to spectral overlap. The limit at which we are able to conduct NMR structural studies of peptides depends on the chemical shift dispersion of each individual protein, but generally speaking we can expect unlabeled peptides of up to 10 kDa (90–95 amino acid residues) to be amenable to structure determination using homonuclear NMR. Modern molecular biology approaches often allow recombinant peptides and proteins to be produced in various host cells such as bacteria and yeast. In such cases, the protein/peptide of interest can be uniformly labeled with NMR-active 13C and 15N isotopes by growing the host cells in a defined minimal medium. This provides access to a vastly increased range of multidimensional NMR experiments, and it increases spectral dispersion due to the greater chemical shift range of 13C and 15N nuclei compared with 1H, thereby enabling routine structural studies of proteins up to 30 kDa (270 amino acid residues). 
Thus, in order to distinguish between the homonuclear and heteronuclear NMR approaches, this chapter has been deliberately partitioned into two sections; one section deals with unlabeled peptides (Section 9.09.3) and the other with isotopically labeled proteins (Section 9.09.4). However, it should be noted that while the types of NMR experiments that can be employed to study these two categories of natural products are dramatically different, both approaches yield the same types of NMR-derived structural restraints, which are ultimately used to reconstruct the 3D conformation of the peptide or protein.

9.09.2 Sample Considerations and Solvent Suppression

9.09.2.1 Heterogeneity of the Nuclear Magnetic Resonance Sample

Two key factors affecting the outcome of protein NMR experiments are the purity and oligomeric nature of the sample. Sample impurities will exacerbate spectral overlap problems and they can lead to incorrect peak assignments, particularly in nuclear Overhauser enhancement spectroscopy (NOESY) spectra; thus, a sample purity of >90% is essential, and >95% is desirable. A generally more difficult problem to overcome is aggregation of the peptide or protein sample at the high concentrations required for NMR analysis (typically 0.5 mmol l⁻¹). It is absolutely essential that the oligomeric state of the protein sample is known prior to beginning the NMR investigation as this will determine the types of experiments that are feasible. For example, dimerization of a 25 kDa protein would preclude NMR structural studies by taking the protein beyond the readily accessible molecular weight range. Dimerization of a 10 kDa protein would leave it within the accessible molecular weight range but special experimental and/or computational strategies would be required to unravel the intra- and intermolecular nuclear Overhauser enhancements (NOEs).

There are numerous methods available for monitoring protein self-association. These include sedimentation equilibrium and sedimentation velocity experiments performed using an analytical ultracentrifuge (AUC)1 and multiangle laser light scattering (MALLS).2 Both methods can be used to measure the molecular mass of proteins in solution, with an accuracy of 1–3%, without making any assumptions about the shape of the molecule or its degree of hydration. A variety of NMR-based methods can also be used for monitoring protein self-association; while not as accurate as AUC or MALLS, these methods are very convenient since they can be applied directly to the sample to be used for structure determination and they obviate the need for access to other specialized equipment.
One of the simplest and most accurate NMR-based approaches is to derive an estimate of the protein’s molecular weight by measuring its translational diffusion coefficient using pulsed-field-gradient spin-echo (PFGSE) NMR.3
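As a rough illustration of how a measured translational diffusion coefficient relates to molecular size, the following minimal sketch applies the Stokes–Einstein relation for a rigid sphere. It is not the procedure of reference 3: the spherical-particle assumption, the pure-water viscosity, and the diffusion coefficient `d_monomer` are all illustrative values chosen here.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius(d_m2_s, temp_k=298.15, viscosity_pa_s=0.89e-3):
    """Stokes-Einstein radius (m) of a sphere diffusing at d_m2_s (m^2/s).

    The default viscosity is that of pure water at 25 C (an assumption;
    real NMR buffers will differ slightly).
    """
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * d_m2_s)


def expected_d_ratio(n_subunits):
    """D(oligomer)/D(monomer) for compact spheres, where D scales as M**(-1/3)."""
    return n_subunits ** (-1.0 / 3.0)


d_monomer = 1.2e-10  # m^2/s, hypothetical measured value
r_h = hydrodynamic_radius(d_monomer)
print(f"hydrodynamic radius: {r_h * 1e9:.2f} nm")
print(f"expected D(dimer)/D(monomer): {expected_d_ratio(2):.3f}")
```

Under these assumptions, a diffusion coefficient that falls by roughly a fifth relative to the monomer value would be consistent with dimerization; non-spherical or partially unfolded proteins will deviate from this simple scaling.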

9.09.2.2 Paramagnetic Ions and Additives

Paramagnetic ions such as Cu²⁺, Mn²⁺, high-spin Fe³⁺, and low-spin Co²⁺ can cause contact broadening of the resonances of nearby nuclei. While paramagnetic ions can be used to probe protein structure,4 these are special NMR applications and in general these ions should be excluded from samples to be used in high-resolution structure determinations. This can be achieved by treating samples with a metal-chelating agent such as Chelex, or by adding a small amount of ethylenediaminetetraacetic acid (EDTA) (5–50 mmol l⁻¹) to the NMR sample.5

Microbial contamination is another potential problem as the highly concentrated protein sample represents an excellent growth medium for algae, bacteria, and fungi during the days to weeks over which NMR experiments will be performed. Algal growth can be eliminated by minimizing exposure of the sample to light. Azide and fluoride can be used to prevent microbial contamination, but azide is volatile below pH 7 and fluoride cannot be used in the presence of metal ions.5 Broad-spectrum antibiotics such as chloramphenicol are excellent alternatives; chloramphenicol is chemically inert and is effective against both Gram-positive and Gram-negative bacteria at concentrations of 10–50 mmol l⁻¹.5

Unlabeled additives such as EDTA are not problematic in heteronuclear NMR studies as they do not yield signals in 13C/15N-isotope-filtered experiments. For example, it is common practice in heteronuclear NMR studies to add high concentrations (typically 1–15 mmol l⁻¹) of a reducing agent such as dithiothreitol (DTT) or tris(2-carboxy-ethyl)phosphine (TCEP) to avoid oxidation of cysteine residues. High concentrations of nondenaturing detergents have also been used to prevent protein aggregation.3 However, for homonuclear NMR studies, it might be necessary to use deuterated additives if the additive concentrations are sufficiently high that they would otherwise obscure resonances from the protein.

9.09.2.3 pH

Sample pH is a critical parameter for several reasons. First, it can dramatically affect the solubility of the protein – at high micromolar to millimolar concentrations, many proteins become insoluble when the pH approaches their isoelectric point (pI). Of crucial concern to the NMR experiment is the effect of pH on the rate of exchange of the labile backbone amide protons with solvent protons (most often H2O). The backbone amide protons are usually the starting point for obtaining resonance assignments in homonuclear scalar correlation experiments such as correlated spectroscopy (COSY) and total correlation spectroscopy (TOCSY) (see Section 9.09.3.2), they provide critical NOE connectivities, and they are often one of the correlated nuclei in heteronuclear triple resonance experiments (see Section 9.09.4.3). Thus, in most NMR experiments, it is desirable to observe as many of the backbone amide-proton resonances as possible. The exchange of amide protons with solvent water protons is both acid and base catalyzed, with the rate of exchange being lowest at pH values of 3 and 5 for the backbone amide and the side chain amide protons of Asn and Gln residues, respectively.6 Thus, while in theory it might be desirable to work in the pH range 3–5 to maximize the intensity of amide-proton resonances, the protein may be insoluble or may extensively aggregate at these pH values, or its structure may be perturbed. In these cases, the lowest pH consistent with native conformation, tolerable solubility, and negligible aggregation should be chosen for the NMR study. Even a reduction in pH from 7.5 to 6.5 will lead to a 10-fold reduction in the rate of amide-proton exchange. While many earlier homonuclear protein NMR studies were performed at pH <5 in order to limit amide-proton exchange, modern heteronuclear NMR methods and improved water suppression techniques (Section 9.09.2.6) now allow protein structures to be determined at pH values close to neutral. Note that the amide-proton exchange rate can also be reduced by decreasing the temperature, but this has other consequences as outlined in the next section.
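The pH dependence described above can be sketched with a simple two-term model in which the exchange rate is the sum of an acid-catalyzed and a base-catalyzed contribution. The rate constants `K_ACID` and `K_BASE` below are hypothetical values, chosen only so that the minimum falls at pH 3 as stated in the text for backbone amides; they are not measured constants, and any pH-independent water-catalyzed term is neglected.

```python
import math

PKW = 14.0  # approximate ionic product of water at 25 C (assumption)


def exchange_rate(ph, k_acid, k_base):
    """Amide exchange rate as the sum of acid- and base-catalyzed terms."""
    return k_acid * 10.0 ** (-ph) + k_base * 10.0 ** (ph - PKW)


def ph_of_minimum(k_acid, k_base):
    """pH at which the acid- and base-catalyzed contributions are equal."""
    return 0.5 * (PKW + math.log10(k_acid / k_base))


# Hypothetical rate constants (arbitrary units) giving a minimum at pH 3.
K_ACID = 1.0
K_BASE = 1.0e8

ph_min = ph_of_minimum(K_ACID, K_BASE)
ratio = exchange_rate(7.5, K_ACID, K_BASE) / exchange_rate(6.5, K_ACID, K_BASE)
print(f"pH of slowest exchange: {ph_min:.1f}")
print(f"rate(pH 7.5) / rate(pH 6.5): {ratio:.1f}")
```

Near neutral pH the base-catalyzed term dominates in this model, so each unit decrease in pH lowers the rate about 10-fold, in line with the statement above.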

9.09.2.4 Temperature

Temperature has an important influence on various aspects of the NMR experiment. Increasing the sample temperature will generally increase signal amplitudes (as long as protein aggregation is not induced) because resonances will become narrower due to the decrease in molecular correlation time (τc). This in turn increases the efficiency of coherence transfer through scalar couplings. The rate of exchange of amide protons with solvent water is reduced at lower temperature, making it easier to observe labile amide protons. However, for some experiments, suppression of the water resonance is more efficient at higher temperatures because of its reduced linewidth. The major factors in deciding which temperature to use are the solubility, state of aggregation, and, most importantly, long-term stability of the peptide or protein sample. Thus, there has been a general trend toward decreased sample temperatures in biomolecular NMR studies as the size of protein being studied has increased (see Chapter 9.07). While many early NMR studies of peptides and small proteins were performed above room temperature (30–50 °C), most recent studies of larger proteins have been carried out at lower temperatures (typically 20–25 °C) due to sample instability at higher temperatures.
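The effect of temperature on the correlation time can be estimated with the Stokes–Einstein–Debye relation for a rigid sphere. The sketch below is only an order-of-magnitude illustration: the hydrodynamic radius is hypothetical, and the viscosities are approximate values for pure water rather than for a real NMR buffer.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def correlation_time(radius_m, temp_k, viscosity_pa_s):
    """Stokes-Einstein-Debye rotational correlation time (s) of a sphere."""
    volume = 4.0 / 3.0 * math.pi * radius_m ** 3
    return viscosity_pa_s * volume / (K_B * temp_k)


RADIUS = 2.0e-9  # m, hypothetical hydrodynamic radius

# (temperature in K, approximate viscosity of pure water in Pa s)
CONDITIONS = {
    "25 C": (298.15, 0.890e-3),
    "40 C": (313.15, 0.653e-3),
}

tau = {
    label: correlation_time(RADIUS, temp, eta)
    for label, (temp, eta) in CONDITIONS.items()
}
for label, value in tau.items():
    print(f"{label}: tau_c = {value * 1e9:.1f} ns")
print(f"tau_c(40 C) / tau_c(25 C) = {tau['40 C'] / tau['25 C']:.2f}")
```

Most of the reduction comes from the drop in solvent viscosity rather than from the temperature term itself, which is why even a modest increase in temperature can noticeably narrow resonances.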

9.09.2.5 Ionic Strength

In addition to the sample concentration and specific parameters related to the pulse sequence being used, the spectral signal-to-noise ratio (SNR) depends on various components of the spectrometer hardware, in particular the sensitivity of the probe and preamplifier. The SNR can be related to the temperature of the receiver coil (TC), its resistance (RC), the temperature of the sample (TS), the resistance added to the coil by the sample (RS), and the noise temperature of the amplifier (TA) by the following equation:7

$$\mathrm{SNR} \propto \left[ T_C R_C + T_A (R_C + R_S) + T_S R_S \right]^{-0.5} \qquad (1)$$
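Equation (1) can be explored numerically. The following minimal sketch compares a coil and preamplifier at room temperature with a cooled pair, for a low and a high sample resistance. All temperatures and resistances are illustrative assumptions in arbitrary units, not values from reference 7, and the coil resistance is held constant for simplicity.

```python
def relative_snr(t_coil, r_coil, t_amp, t_sample, r_sample):
    """Relative SNR from Equation (1): inverse square root of the noise terms."""
    noise = t_coil * r_coil + t_amp * (r_coil + r_sample) + t_sample * r_sample
    return noise ** -0.5


T_SAMPLE = 298.0  # K
R_COIL = 1.0      # arbitrary units (assumed unchanged on cooling)
R_LOW, R_HIGH = 0.1, 1.0  # hypothetical low- and high-conductivity samples

# (coil temperature, amplifier noise temperature) in K; illustrative only
PROBES = {"room-temperature": (298.0, 298.0), "cooled": (25.0, 25.0)}

for name, (t_coil, t_amp) in PROBES.items():
    low = relative_snr(t_coil, R_COIL, t_amp, T_SAMPLE, R_LOW)
    high = relative_snr(t_coil, R_COIL, t_amp, T_SAMPLE, R_HIGH)
    print(f"{name}: SNR retained at high sample resistance = {high / low:.2f}")
```

With these assumed numbers the cooled hardware gains close to a factor of three at low sample resistance but loses a much larger fraction of its SNR when the sample resistance rises, because the sample term in Equation (1) is then the largest contribution to the noise.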

The sample resistance, RS, depends on the exact buffer and salt conditions used to solubilize the peptide or protein. Although as a general rule the value of RS will increase, and the SNR will correspondingly decrease, as the sample ionic strength is increased, RS is more strictly related to the sample conductivity (σ). Thus, for two different buffers of equivalent ionic strength, the one with lower conductivity will yield the better SNR. In theory, then, dissolving the protein of interest in H2O would provide the best SNR. However, for proteins and peptides, it is often impractical to use H2O or a salt/buffer combination with very low conductivity since many proteins are ‘salted in’ at moderate salt concentrations (see Chapter 9.12). Thus, the choice of buffer will necessarily be a compromise between maximizing protein solubility and minimizing sample conductivity. Nevertheless, for conventional room temperature NMR probes, salt concentrations as high as 1 mol l⁻¹ can be used without sacrificing SNR to the point where useful data cannot be collected.8

However, high-conductivity buffers are more problematic for modern cryogenically cooled (CC) NMR probes, which have been the most important hardware innovation in biomolecular NMR in the past decade. In CC probes, the receiver coil (TC) and preamplifier (TA) are cryogenically cooled to 15–30 K, leading to dramatic improvements in SNR, as predicted by Equation (1). For most multidimensional biomolecular NMR experiments, CC probes provide a 3–4-fold improvement in SNR over conventional probes.7 However, one consequence of the dramatic decrease in TC and TA in CC probes is that the final TSRS term in Equation (1) dominates the SNR relationship, and hence the conductivity of the sample can have a dramatic impact on spectral SNR.7 In general, for CC probes, there will be a very significant deterioration in SNR once the salt concentration exceeds 200 mmol l⁻¹. Thus, if one is using a CC probe, it becomes very important to minimize the conductivity of the salt and buffer without compromising the solubility and stability of the protein sample. The optimization of buffers for use with CC probes is an active area of research and will not be covered in depth here. The reader is referred to Tables 1 and 2 in Kelly et al.7 for valuable information regarding the choice of buffers for NMR studies. An interesting recent development is the use of arginine–glutamate salt (typically 50 mmol l⁻¹ L-arginine + 50 mmol l⁻¹ L-glutamate) for NMR studies of proteins using CC probes. This zwitterionic salt not only has much lower conductivity than NaCl but has also been shown to help solubilize proteins that are prone to aggregation.9,10

9.09.2.6 Solvent Suppression

9.09.2.6.1 Overview

One of the principal advantages of NMR is that molecular information can be probed near physiological conditions. In practice, this equates to dissolving the molecule of interest (such as a peptide or protein) in aqueous solution, often with the addition of appropriate buffers and salts. This unfortunately results in a very strong 1H resonance signal from the protons of the solvent water. This is not surprising considering that the water concentration is near 55 mol l⁻¹ whereas that of the solute is generally 1 mmol l⁻¹ or less, resulting in a dynamic range of 10⁵:1. In NMR applications, the solvent signal is most efficiently suppressed by the use of a deuterated solvent (in this case 2H2O, often referred to as D2O); in addition, this provides a deuterium lock signal that is used to correct for magnetic field fluctuations. Unfortunately, many NMR experiments relevant to peptide and protein structure determination require detection of the signals from the backbone amide protons; since these labile protons undergo chemical exchange with the solvent, deuteration renders them invisible in the (1H) NMR spectrum (see also Section 9.09.2.3 on the pH dependence of backbone amide-proton exchange). Consequently, for NMR structure determination studies, the peptide/protein must be dissolved in nondeuterated water (although a small amount of D2O, typically 5–10%, is still added to the sample for the deuterium lock). Thus, one of the first practical considerations in setting up an NMR experiment in H2O is determining the optimal method for suppressing the water signal. Methods developed for this purpose are termed solvent suppression or water suppression techniques. The extent of activity in this field bears witness to its importance and complexity (see Croasmun and Carlson,11 Price,12 and Levitt13 and citations therein). In the following sections, we first discuss the concept of radiation damping before introducing popular methods for solvent suppression in NMR studies of proteins.

9.09.2.6.2 Radiation damping

The signal detected in an NMR experiment is the result of a current generated in the receiver coil due to the bulk magnetization (M0) of the irradiated nuclei precessing from the transverse (xy) plane toward its equilibrium state along the z-axis due to various relaxation processes.14 The rate at which this decay of the transverse magnetization (Mxy) occurs is manifested in an NMR spectrum by the linewidth of the observed signal. Large molecules generally have fast relaxation rates, which results in broad signals, whereas one would intuitively expect the water signal to be much sharper than those of the solute due to its comparatively much slower relaxation rate. In practice, however, the water signal is significantly broader than anticipated on the basis of its intrinsic relaxation rate. This phenomenon, termed radiation damping, is observed when a very strong NMR signal is present (such as when a peptide or protein is dissolved in >90% 1H2O) and it is exacerbated when the sensitivity of the receiver coil is very high (as is the case for CC probes). The observed line broadening can be explained by considering that the current induced in the receiver coil due to the strong water signal in turn produces a radiofrequency (rf) magnetic field with the same frequency that rotates the water magnetization back to equilibrium. The extent of this effect can be represented by the time constant TRD (Equation (2)), which describes the rate at which the water magnetization is driven back to equilibrium:12,15

$$T_{RD}^{-1} = 2\pi \eta Q \gamma M_0 \qquad (2)$$

where γ is the gyromagnetic ratio (also referred to as the magnetogyric ratio), η and Q are the filling and quality factors describing the sensitivity of the probe, and M0 is the equilibrium magnetization per unit volume (assuming the water is irradiated by a 90° pulse). The above relationship shows that an increase in any of the terms on the right-hand side of Equation (2) will shorten TRD and hence lead to increased linewidth. In highly sensitive CC probes, the decay time of the water signal is reduced from seconds to milliseconds due to radiation damping.16 Although several techniques involving both pulse sequence17 and hardware18,19 modifications have been proposed to specifically reduce radiation damping, the most effective way to achieve this is by suppressing its initiation through solvent suppression. In the following sections, a variety of solvent suppression methods are discussed that also ameliorate the effects of radiation damping by virtue of reducing the intensity of the solvent signal.
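To give a feel for the magnitude of the radiation damping time constant, the sketch below evaluates it for water protons at 14.1 T (a 600 MHz spectrometer). It assumes the SI form of the relationship, T_RD = 2/(μ0 η Q γ M0), which corresponds to the Gaussian-unit expression commonly quoted in the literature, and it uses the high-temperature (Curie) approximation for M0. The filling factor and quality factor are hypothetical probe parameters.

```python
import math

MU_0 = 4.0e-7 * math.pi     # vacuum permeability, T m/A (approximate)
H_BAR = 1.054571817e-34     # reduced Planck constant, J s
K_B = 1.380649e-23          # Boltzmann constant, J/K
N_A = 6.02214076e23         # Avogadro constant, 1/mol
GAMMA_1H = 2.6752218744e8   # proton gyromagnetic ratio, rad s^-1 T^-1


def equilibrium_magnetization(b0_tesla, temp_k, proton_molar):
    """Curie-law magnetization (A/m) of spin-1/2 protons at molar concentration."""
    spins_per_m3 = proton_molar * 1000.0 * N_A
    return spins_per_m3 * GAMMA_1H ** 2 * H_BAR ** 2 * b0_tesla / (4.0 * K_B * temp_k)


def radiation_damping_time(filling, quality, m0):
    """Radiation damping time constant (s), SI form (assumed)."""
    return 2.0 / (MU_0 * filling * quality * GAMMA_1H * m0)


# Water is about 55 mol/l, i.e. about 110 mol/l in protons.
m0 = equilibrium_magnetization(b0_tesla=14.1, temp_k=298.0, proton_molar=110.0)
t_rd = radiation_damping_time(filling=0.1, quality=300.0, m0=m0)  # hypothetical probe
print(f"M0 = {m0:.4f} A/m, T_RD = {t_rd * 1e3:.1f} ms")
```

For these assumed probe parameters the time constant comes out in the millisecond range, consistent with the qualitative statement above; a larger quality factor shortens it further.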

9.09.2.6.3 Presaturation

The most direct form of solvent suppression is presaturation of the solvent signal through the application of a frequency-selective, low-power rf pulse at the water resonance over a relatively long period of time (typically seconds) prior to execution of the pulse sequence. This has the effect of equalizing the populations of nuclear spins in the low-energy (α) and high-energy (β) states (i.e., the two available energy states for a spin-1/2 nucleus) with the phases of the individual spins being randomly distributed in the transverse plane. Care should be taken when choosing the resonance frequency of water as errors in tuning can cause a frequency shift.20 The presaturation method can in some cases leave an unwanted distortion of the baseline (residual hump) due to magnetic field inhomogeneities across the sample volume. This may be reduced by combining presaturation with the first increment of a NOESY experiment in which the NOESY mixing time is set to zero. The phase cycling of the NOESY experiment reduces signals from parts of the sample where the magnetic field (both external field, B0, and applied field, B1) is inhomogeneous. A drawback of this approach is that it restricts the experiment to the use of 90° pulses, which may not be optimal. This restriction is lifted in the FLIPSY21 approach, making it more generally applicable. Although commonly used in the past, presaturation is becoming much less popular for biomolecular applications mainly due to the advent of more efficient methods (as discussed below) and also due to a number of shortcomings such as those listed below:

1. Careful minimization of B0 field inhomogeneity (i.e., shimming) is required.
2. There is bleaching of resonances close to the water irradiation frequency (which is where the Hα resonances of peptides and proteins are often found).
3. There is saturation of amide-proton resonances due to chemical exchange and transfer of this saturation throughout the peptide/protein via dipolar interactions (spin diffusion).
4. Solvent signals originating outside the main sample volume are not efficiently suppressed and can lead to poor solvent suppression.

However, due to its simple setup and generally efficient solvent suppression, presaturation is still a useful method for acquiring an exploratory one-dimensional (1D) experiment in order to gauge properties of the NMR sample.

9.09.2.6.4 Jump–return and binomial sequences

The jump–return sequence differs from presaturation in that instead of saturating the water resonance, it aims to ensure that no net magnetization is produced at this frequency at the end of the pulse sequence. In its simplest form, this sequence consists of two 90° pulses with opposite phases separated by a delay τ. For example, a 90°x pulse rotates the equilibrium (z) magnetization about the x-axis onto the y-axis. Since signals are detected only in the xy-plane, a strong signal would be observed at this point. If the delay is ignored, the magnetization is simply rotated back to the z-axis (from the transverse plane) by the second 90°−x pulse, due to the opposing phases of the two pulses, thus resulting in no observable signal. However, during the delay, all resonances at frequencies different from that of the carrier frequency (i.e., the frequency at which the rf field is applied) acquire a frequency-dependent component along the x-axis. This frequency dependence is sine-modulated and dependent on the time delay according to τ = 1/(4Δν), where Δν is the difference (in Hz) between the carrier frequency and the frequency at which the maximum intensity will be found along the x-axis. The component of resonances along the x-axis at the end of the τ period will be left completely unperturbed by the 90°−x pulse and thus observable. A very similar sequence is the 11̄ sequence, which actually belongs to a family of binomial pulse sequences. Of these sequences, the 13̄31̄ (often referred to as 1–3) is the most popular due to its wider water suppression window (see Figure 1).22 Due to the frequency-dependent sine modulation of the resonance amplitudes (the so-called ‘excitation profile’), the resonances on either side of the water signal (carrier frequency) have opposite signs. In addition, the binomial sequences suffer from baseline distortions due to a strong linear phase gradient (see Figure 1).
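The relationship between the jump–return delay and the position of maximum excitation can be illustrated with a short calculation. The sketch below assumes an idealized profile in which the excited amplitude varies as sin(2πΔν·τ), with Δν the offset from the carrier; the 600 MHz spectrometer frequency and the choice of 8.3 ppm as the target for maximum excitation are illustrative assumptions.

```python
import math


def jump_return_delay(offset_hz):
    """Delay (s) placing the excitation maximum at offset_hz from the carrier."""
    return 1.0 / (4.0 * offset_hz)


def excitation(offset_hz, delay_s):
    """Idealized sine-modulated excitation amplitude at a given offset."""
    return math.sin(2.0 * math.pi * offset_hz * delay_s)


SPECTROMETER_MHZ = 600.0  # 1H frequency; illustrative
WATER_PPM = 4.7           # carrier placed on the water resonance
TARGET_PPM = 8.3          # hypothetical centre of the amide region

offset = (TARGET_PPM - WATER_PPM) * SPECTROMETER_MHZ  # offset in Hz
tau = jump_return_delay(offset)

print(f"offset = {offset:.0f} Hz, delay = {tau * 1e6:.1f} microseconds")
for label, nu in (("water", 0.0), ("downfield", offset), ("upfield", -offset)):
    print(f"{label:>9}: relative amplitude {excitation(nu, tau):+.2f}")
```

The null at the carrier and the sign change across it reproduce the two features noted above: no net excitation of water, and opposite signs for resonances on either side of the water signal.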
This baseline distortion can be particularly troublesome in multidimensional experiments and therefore the binomial sequences have not proved popular for multidimensional NMR studies. 9.09.2.6.5

WATERGATE and water-flip-back The water suppression methods discussed above have lost much of their popularity in biomolecular NMR due to the advent of improved methods that utilize pulsed-field gradients (PFGs). PFGs (often referred to simply as gradients) apply a nonuniform spatial encoding to the NMR sample along the z-axis (only z gradients are discussed as these are available on most modern NMR spectrometers). After the application of a gradient pulse, transverse magnetization (i.e., in the xy-plane) will be dephased. If a spectrum is recorded at this point, no net magnetization will be observed. However, this dephasing is not random (cf. presaturation) and the original signal can be recovered by applying a gradient pulse with the opposite sign. In the meantime, any magnetization along the z-axis will be left unperturbed. One of the earliest gradient-based solvent suppression techniques, and perhaps still the most widely used, is the WATERGATE sequence.23 This sequence starts with a hard 90 x pulse, which places the equilibrium magnetization on the y-axis; note that since this is generally the point at which data acquisition commences in most experiments, the WATERGATE sequence can be appended to most sequences in a modular fashion. The initial hard pulse is followed by a gradient pulse (Gz), which dephases both the solute and solvent signals. This is followed by a selective 90 pulse (i.e., one that affects only a narrow frequency range) on the water resonance. At this point, the water magnetization will be dephased and located on the z-axis due to the cumulative effect of

(middle), and 133 1 (right) pulse sequences used for suppressing Figure 1 Excitation profiles of the jump–return (left), 11 the water resonance.

286 Derivation of Peptide and Protein Structure using NMR Spectroscopy

Figure 2 Gradient-echo-based water suppression pulse sequences. (a) WATERGATE; (b) water-flip-back; (c) excitation sculpting; (d and e) examples of the 'S' pulse train that is sandwiched between the gradient echo: (d) water-selective inversion pulse; (e) excitation tailoring using a binomial series.

the hard and soft 90°x pulses, whereas the solute signals will be unaffected. The next hard 180°x pulse shifts the water magnetization back onto the +z-axis, whereas the solute magnetization is rotated to the +y-axis. A final selective pulse rotates the water magnetization back onto the −y-axis (where initially it was dephased by Gz). The effect of this pulse train is thus to place the solvent and solute magnetization at opposite ends of the y-axis. When the final Gz pulse is applied, the solute signals are rephased whereas the water resonances are further dephased (see Figure 2(a)). One of the drawbacks of WATERGATE is that the water molecules have some transverse magnetization at the end of the pulse sequence, albeit dephased. This saturation can be transferred onto labile amide protons through chemical exchange, which, in a similar manner to presaturation, results in reduced sensitivity. The water-flip-back sequence24 differs from the WATERGATE sequence in that the initial hard 90°x pulse is preceded by a selective 90°x pulse at the water resonance frequency (see Figure 2(b)). If we follow through the steps of the WATERGATE sequence as outlined above, we find that, at the end of the pulse sequence, the water magnetization is located along the z-axis and thus is not saturated. This has the advantage of improving the sensitivity of the experiment and further reducing the onset of radiation damping, as the gradient pulses remove transverse water magnetization. Since the water is located along the z-axis when the pulsed-field gradients are applied in the water-flip-back sequence, a lower field strength (compared to WATERGATE) can be used for the gradient pulses.

9.09.2.6.6

Excitation sculpting

In the gradient methods discussed above, a pulse train (S) is placed within two gradients G of equal intensity and duration; such a sequence is a special case of a pulsed-field gradient spin echo (PFGSE), defined by the sequence G–π–G. The effect of a PFGSE on a frequency-selective sequence (G–S–G) as in the above examples is to refocus all frequencies experiencing a net inversion (180° rotation) and to dephase stationary components. A drawback of such a PFGSE (apart from


those discussed above) is that it will introduce a frequency-dependent phase error determined by the properties of the sandwiched pulse train S, leading to undesired baseline distortions. It has been shown that these phase errors can be removed by the addition of a second gradient echo, resulting in a double PFG spin echo (DPFGSE). The second gradient echo 'chips away' at the unwanted magnetization to produce the desired excitation profile, hence the name 'excitation sculpting' (see Figure 2(c)). One of the requirements of this sequence is that the first set of gradients should be different from the second set (i.e., G1–S–G1–G2–S–G2) so that no dephased magnetization is accidentally refocused by the second set of gradients. Since any sequence S can be used regardless of phase properties, it allows for more freedom in designing the sequence S to give the desired excitation profile. It should be noted that when using these sequences there is a dead time between the last pulse and the start of the acquisition (due to instrumental limitations). This can, in some cases, lead to baseline distortions as some of the initial points of the free induction decay (FID) may be lost. Such distortions can effectively be removed by (1) introducing a short delay before the final 180° pulse, (2) incorporating a spin echo at the end of the sequence,25 or (3) spectral processing methods.26

9.09.2.6.7

Coherence pathway selection

The methods outlined above all achieve solvent suppression by manipulating (tailoring) the excitation profile of the observed spectrum. A different approach is to take advantage of the fact that the water signal is a singlet that contains no observable homo- or heteronuclear scalar (spin–spin) coupling. Most multidimensional NMR experiments inherently select for a given coherence pathway, generally involving the transfer of magnetization from one nucleus to another through scalar couplings. Thus, the water signal should inherently be suppressed. However, in practice, unwanted coherences are often suppressed by phase cycling and the large (and broad) water signal can still deteriorate the spectral quality because of extreme dynamic range issues (see above). In such cases, PFGs can, as in the above experiments, be used to effectively suppress signals that do not follow the desired coherence pathway. Generally, this is done by applying PFGs at a point during the pulse sequence where the desired magnetization is aligned along the z-axis and the water is in the transverse plane. The principles are exactly as discussed above, with the difference being that the water magnetization is manipulated based on its coherence pathway rather than its chemical shift. This principle can be applied to both homonuclear27 and heteronuclear28 NMR experiments. In practice, if the coherence selection is not performed at the beginning of the pulse sequence, dynamic range issues and radiation damping may still cause a deterioration in the results. An advantage of this approach is that signals with frequencies similar to the water resonance can be detected since the suppression is not frequency-based, but instead suppresses uncoupled resonances.

9.09.2.6.8

Postprocessing methods

In all of the solvent suppression methods discussed above, there will almost always be a residual water signal remaining at the end of the pulse sequence due to one or a combination of factors such as field inhomogeneities, pulse imperfections, and frequency-dependent delays. Such residual signals can be removed by various postprocessing methods, and these can result in much improved baseline properties that can be important when performing quantitative analyses or when analyzing resonances near the water frequency. It should be noted that to some extent all postprocessing methods are cosmetic and do not address issues such as radiation damping and other experimental issues caused by the solvent. The most popular postprocessing method for suppression of the solvent signal is the use of a low-pass frequency filter.29 This method simply filters out frequencies outside a certain bandwidth in the time domain. The filter is usually applied to the central frequency, which generally corresponds to the water resonance. Thus, only the water resonance and nearby resonances remain when the filter is applied. This filtered signal is then subtracted from the original time-domain signal, which results in a visually appealing flat baseline in place of the water signal.

9.09.2.6.9

Summary

The above introduction to solvent suppression serves to elucidate some of the complications imposed by the solvent water signal and common measures for ameliorating them. Note that many modern pulse


Figure 3 Excitation profiles of the WATERGATE sequence obtained using either a selective 180° pulse sandwiched between two hard 90° pulses (left panel) or a 3-9-19 binomial sequence in place of the 'S' element of the gradient echo (right panel) (see also Figure 2).

sequences employ a combination of the above methods. For example, we can conclude that if the pulse angle of the individual pulses in the binomial series is doubled (precluding those that have a period when aligned along the z-axis), the net effect of the series is a selective 180° pulse on all resonances except that of the solvent, which is left unperturbed. The same result is achieved by the selective pulse train S in the gradient-echo sequences. Thus, the binomial series can be inserted into any of the PFGSE methods (WATERGATE or excitation sculpting) in place of the S sequence between the gradients. In addition, the effect of the two gradients is to remove the linear phase distortion found when solely using the binomial series for water suppression. This combination of techniques results in an excitation profile consistent with the binomial series but with the advantageous phase properties of the gradient-echo methods, allowing for more elaborate excitation profiles (e.g., the 3-9-19 sequence;30 see Figures 2 and 3). Much recent progress31–33 in this area has involved combinations of such methods to improve solvent suppression.
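
The gradient-echo principle that underlies WATERGATE and excitation sculpting (a G–S–G element refocuses resonances that experience a net inversion and dephases those that do not) can be illustrated numerically. The following is a minimal sketch, assuming that an ideal 180° rotation simply reverses the sign of the phase accumulated during the first gradient, and that the gradient imposes a phase proportional to position along z; the function names, the number of spins, and the gradient strength are illustrative assumptions.

```python
import cmath
import math

def net_signal(phases):
    """Magnitude of the vector sum of unit spins with the given phases."""
    total = sum(cmath.exp(1j * p) for p in phases)
    return abs(total) / len(phases)

def gradient_echo(inverted, n_spins=1001, max_phase=20 * math.pi):
    """Net signal after G-S-G for a resonance that S either inverts or leaves alone."""
    # Phase imposed by one gradient pulse, proportional to position z in [-1, 1].
    grad = [max_phase * (2 * i / (n_spins - 1) - 1) for i in range(n_spins)]
    phases = list(grad)                               # after the first gradient
    if inverted:
        phases = [-p for p in phases]                 # ideal 180 degree rotation
    phases = [p + g for p, g in zip(phases, grad)]    # identical second gradient
    return net_signal(phases)

print("solute (inverted by S):  ", round(gradient_echo(True), 3))
print("water (not inverted by S):", round(gradient_echo(False), 3))
```

In this simplified picture the inverted resonances are fully refocused by the second gradient, whereas the non-inverted resonance accumulates twice the gradient phase and averages to nearly zero across the sample.
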

9.09.3 Data Acquisition for Nonlabeled Peptides

9.09.3.1

Overview

The general strategy for determination of the 3D structure of proteins and peptides using NMR spectroscopy comprises three distinct stages:

1. The assignment of NMR resonances (1H together with 15N and 13C when possible) to specific atoms or atom groups in the protein.
2. The extraction of experimental constraints from the NMR data, which provide information on the relative spatial positions of these atoms in the protein (see Section 9.09.5).
3. The use of these constraints as input into a computer program that attempts to derive a family of structures for the protein, each of which 'satisfies' the experimental constraints (see Section 9.09.6).

There are two general approaches through which stage (1), resonance assignment, can be accomplished, and the choice between them is determined essentially by the molecular mass of the protein. The first approach, which was pioneered in the laboratory of Kurt Wüthrich34 and which led to his award of the 2002 Nobel Prize in Chemistry, involves the use of 2D 1H–1H (homonuclear) NMR experiments such as COSY, NOESY, and TOCSY (see below). This approach is still widely used today, but only for proteins smaller than 10 kDa that cannot be isotopically labeled. Nevertheless, the homonuclear NMR strategy is suitable for studying peptides purified from natural sources, and this will be the focus of Section 9.09.3.


9.09.3.2


Homonuclear Resonance Assignment Strategies

9.09.3.2.1

Overview

The advent of 2D NMR techniques in the early 1980s was the key breakthrough that allowed detailed structural information to be extracted from proteins in solution. If information concerning the interactions between spins is to be extracted from 1D experiments, pulses must be selectively applied to particular resonances and their effect(s) on other spins gauged from changes to the 1D spectrum. This method is adequate in the absence of significant spectral overlap, but soon becomes impractical even for molecules of modest size. By contrast, extending such measurements into a second dimension alleviates the overlap problem to a significant degree. These 2D experiments35 consist of discrete elements – a preparation period; an evolution period (t1) where spins are 'labeled' as they precess in the xy-plane according to their chemical shift; a mixing period, during which correlations are made with other spins; and a detection period (t2) where an FID is recorded (Figure 4). Note that these elements may be combined to create more complex experiments, which is the basis of the higher dimensionality (3D and 4D) NMR experiments outlined in Section 9.09.4. The FID is signal-averaged as usual (as required for both signal-to-noise and phase cycling considerations)35 and then the process is repeated a number of times with incremented values of t1. After Fourier transformation of the series of t1-incremented experiments, the amplitude of each signal through the series is found to be modulated according to both its intrinsic resonance frequency and the frequency of the proton(s) to which it is correlated during the mixing period (see Figure 5). In addition, transverse relaxation, which occurs during the pulse sequence, will result in smaller peak intensities for increasing values of t1, so that cross sections of the t2-transformed data (F2) have the form of exponentially decreasing sinusoids (cf. FIDs).
A Fourier transformation of these cross sections (i.e., with respect to t1) thus yields a planar spectrum with two frequency dimensions (F2 and F1; termed the directly detected and indirectly detected dimensions, respectively) where, in most homonuclear 2D experiments, the 1D spectrum

Figure 4 (a) The generic elements of a 2D NMR experiment: preparation, evolution (t1), mixing, and detection (t2). The basic pulse sequences shown for (b) DQFCOSY, (c) NOESY, and (d) TOCSY experiments illustrate that the mixing period determines the type of correlation observed in the spectrum. Black rectangles represent 90° pulses. τm is the mixing time in the NOESY experiment and the spin-lock time in the TOCSY experiment.


Figure 5 A Fourier-transformed signal in F2, at frequency ω2, is modulated in the indirect dimension (t1) by the incremental delay. This 'interferogram' demonstrates how a sinusoid is created in the indirect dimension when a series of such spectra are collected. Fourier transformation of this sinusoid (along t1) would thus yield a peak with a frequency ω1 in F1.

(sometimes simplified) is present as a diagonal, and correlations between spins are represented by off-diagonal elements known as crosspeaks. Experiments are distinguished by the nature of the correlations that are probed during the mixing period. Scalar couplings between protons up to three bonds apart are revealed using correlated spectroscopy (COSY)35 or preferably double-quantum-filtered COSY (DQFCOSY),36 which has superseded the basic COSY as the experiment of choice for elucidating these couplings due to the narrower lineshapes it produces. NOESY37,38 connects protons that are close in space (<5.5 Å; see Section 9.09.5.2 for an explanation of the nuclear Overhauser effect). The basic pulse sequences for the DQFCOSY and NOESY experiments are given in Figures 4(b) and 4(c), respectively, and it can be seen that an important difference between them lies in the nature of the mixing period. In DQFCOSY, this consists of two 90° pulses separated by a brief delay (3 ms), while in NOESY two 90° pulses sandwich an extended mixing time (typically 50–300 ms) during which the NOEs are allowed to build up (there are also phase cycling differences). A further invaluable experiment that also yields scalar connectivities is TOCSY39 (also called HOHAHA for homonuclear Hartmann–Hahn spectroscopy;40 Figure 4(d)). In TOCSY spectra, correlations are observed between (potentially) all protons within a spin system (i.e., a group of protons that share mutual coupling partners, such as the HN, Hα, and β-methyl protons of an alanine residue) whether or not they are directly coupled to each other. The coupling is developed during the application of a spin-locking pulse (termed the isotropic mixing period), and the extent to which magnetization is propagated along a spin system depends on the duration of this spin-lock pulse (typically 30–100 ms) and the magnitude of the scalar couplings involved.
These three experiments form the basis of the sequential assignment method proposed originally by Wüthrich34 for the complete assignment of resonances in 1H NMR spectra of polypeptides. In the first stage of this procedure, the individual spin systems of each amino acid are identified from the scalar-coupled 2D experiments, that is, DQFCOSY and TOCSY. Note that all crosspeaks in these experiments correspond to intraresidue connectivities, since there are no interresidue pairs of protons within three bonds of each other. This procedure is greatly aided by inspecting the average chemical shift values for the side chain protons of assigned proteins (Figure 6) in the Biological Magnetic Resonance Data Bank (BioMagResBank or BMRB). This procedure allows residue types to be distinguished, but no information is provided on the positions of these residues within the polypeptide sequence. This information comes from the second stage of the approach, where the characteristic patterns of through-space correlations generated in the NOESY experiment are used to connect sequential pairs of residues and thereby achieve sequence-specific resonance assignment.
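
The way a 2D spectrum arises from a series of t1-incremented FIDs, as described earlier in this section, can be illustrated with a small synthetic data set. The following is a minimal sketch, assuming a single phase-modulated complex signal that precesses at frequency f1 during t1 and at f2 during t2 with a common exponential decay (real experiments are usually amplitude-modulated and use quadrature detection schemes); the function names, grid size, and frequencies are illustrative assumptions.

```python
import cmath
import math

def dft(values):
    """Plain discrete Fourier transform of a list of complex numbers."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * j / n)
                for j, v in enumerate(values)) for k in range(n)]

def simulate_2d(f1, f2, n=16, dwell=1.0 / 16, relax=0.5):
    """Synthetic data s(t1, t2): one FID (row) per t1 increment."""
    data = []
    for i in range(n):
        t1 = i * dwell
        row = []
        for j in range(n):
            t2 = j * dwell
            phase = 2 * math.pi * (f1 * t1 + f2 * t2)
            row.append(cmath.exp(1j * phase) * math.exp(-(t1 + t2) / relax))
        data.append(row)
    return data

def transform_2d(data):
    """Transform along t2 first (giving F2), then along t1 (giving F1)."""
    after_t2 = [dft(row) for row in data]
    n1, n2 = len(after_t2), len(after_t2[0])
    # Each column of the t2-transformed data is an interferogram in t1.
    columns = [dft([after_t2[i][k] for i in range(n1)]) for k in range(n2)]
    return [[columns[k][m] for k in range(n2)] for m in range(n1)]

def find_peak(spectrum):
    """Return (F1 index, F2 index) of the largest-magnitude point."""
    best = max((abs(v), m, k) for m, row in enumerate(spectrum)
               for k, v in enumerate(row))
    return best[1], best[2]

# With a 16 Hz spectral width and 16 points, bin k corresponds to k Hz.
spectrum = transform_2d(simulate_2d(f1=3, f2=5))
print("peak at (F1, F2) bins:", find_peak(spectrum))
```

A signal with f1 equal to f2 would fall on the diagonal of the resulting spectrum, whereas the example above, with different frequencies in the two dimensions, corresponds to a crosspeak.
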

9.09.3.2.2

Suitability of a protein for homonuclear assignment techniques

It has generally been found that homonuclear resonance assignment cannot be applied successfully to proteins larger than around 10 kDa. There are two reasons for this. First, the complexity (number of crosspeaks) in 2D spectra increases in an approximately linear fashion with the number of chemically inequivalent protons in the molecule. Thus, for proteins larger than around 10 kDa, spectral overlap will generally prevent the exhaustive assignment of resonances necessary for structure determination. The second, more fundamental, reason arises

Figure 6 Average chemical shift values for protein backbone and side chain protons extracted from BioMagResBank (http://www.bmrb.wisc.edu). Error bars indicate standard deviations. [Plot: 1H chemical shift (ppm) of the α, β, γ, δ, and ε protons for each of the 20 amino acid types.]

Figure 7 Schematic plot of the relationship between T1 (longitudinal relaxation time), T2 (transverse relaxation time), and the molecular correlation time (τc). In general, small molecules have short correlation times, whereas large molecules have longer correlation times. T1 and T2 are equal in small molecules, whereas T2 is the dominant relaxation mechanism for large molecules. [Plot: T1 and T2 versus molecular correlation time (τc), with the fast-, intermediate-, and slow-motion regimes marked.]

from the dependence of the transverse relaxation time, T2 (and hence the linewidth, Δν1/2, which equals 1/(πT2) in the absence of field inhomogeneity), on the molecular correlation time τc (see Figure 7). τc is a measure of how rapidly a molecule tumbles in solution (actually the time taken for a molecule to rotate through one radian), and is given for a spherical molecule by the Stokes–Einstein equation:

$$\tau_c = \frac{4\pi\eta r_h^{3}}{3kT} \qquad (3)$$

where η is the solvent viscosity, rh is the hydrodynamic radius of the molecule, k is Boltzmann's constant, and T is the temperature. Note that the derived correlation time is an approximate upper limit.
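
As a worked example of the Stokes–Einstein relation τc = 4πη rh³/(3kT), the following sketch evaluates the correlation time for an illustrative case. The hydrodynamic radius of 1.5 nm is an assumed value chosen only for illustration, and the viscosity of 0.89 mPa·s is an approximate value for water near 298 K; neither number is taken from the text.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K

def correlation_time(radius_m, viscosity_pa_s, temperature_k):
    """Rotational correlation time (s) of a sphere from the Stokes-Einstein relation."""
    return (4.0 * math.pi * viscosity_pa_s * radius_m ** 3
            / (3.0 * BOLTZMANN * temperature_k))

# Assumed illustrative inputs: 1.5 nm radius, water at 298 K.
tau_c = correlation_time(radius_m=1.5e-9, viscosity_pa_s=0.89e-3, temperature_k=298.0)
print(f"tau_c = {tau_c * 1e9:.2f} ns")

# The cubic dependence on radius: doubling the radius gives an eightfold increase.
ratio = correlation_time(3.0e-9, 0.89e-3, 298.0) / tau_c
print(f"ratio for doubled radius = {ratio:.1f}")
```

The result is on the order of a few nanoseconds, and the cubic dependence on rh shows why the correlation time, and with it the linewidth, grows quickly with molecular size.
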


It can be seen from Figure 7 that, as the molecule tumbles more slowly, T2 relaxation becomes more efficient compared to T1 relaxation. (A small T2 implies efficient relaxation and broad lines, since T2 is the inverse of the relaxation rate.) This increase in linewidth with molecular size causes two problems: (1) spectral overlap will clearly be worse for broader signals and (2) the efficiency of information transfer between spins in the scalar-coupled experiments (coherence transfer) becomes very poor when the resonance linewidths start to exceed the magnitude of the spin–spin coupling constants. For example, a 7 Hz coupling (an average value for three-bond 1H–1H couplings, 3JHH) between protons with 20 Hz linewidths gives a COSY-type transfer efficiency of only 2%.41 Note, however, that the 10-kDa size limit is only a rough guide, and the exact limit depends on the shape of the protein (which influences the tumbling rate) and the chemical shift dispersion; for example, α-helical domains generally display less dispersion than β-sheets, so that the size limit for predominantly helical proteins will be somewhat lower than that for other proteins. In order to ascertain whether or not these homonuclear methods will provide complete resonance assignments, a DQFCOSY spectrum of the protein dissolved in H2O should be recorded. Note that it will be necessary to attenuate the huge signal arising from the solvent (as discussed in Section 9.09.2.6). The majority of the crosspeaks in the so-called fingerprint region of this spectrum (F2 10.0–6.0 ppm, F1 3.0–6.5 ppm) will be due to correlations between amide protons and the Hα proton of the same residue. The number of HN–Hα crosspeaks should be at least 90% of the number of residues in the protein for homonuclear methods to be adequate.
However, crosspeaks may be lost for a number of reasons, including transfer of saturation due to the solvent suppression technique used, and the cancellation of antiphase components of the crosspeaks when the linewidths are large compared to the coupling constant. These problems can be circumvented to some degree (see Chapter 4 in Roberts42) and attempts should be made to do so before casting the homonuclear assignment strategy aside.

9.09.3.2.3

Spin system identification

Once it has been decided to employ the homonuclear strategy for resonance assignment, good quality DQFCOSY and TOCSY spectra should be acquired in H2O. The TOCSY spectrum should be acquired with two different mixing times, since the intensity profile for TOCSY crosspeaks is complex, with the intensity of each correlation depending on each of the individual J-couplings if a multistep transfer is involved.43 Thus, a mixing time that is optimal for long-range transfer from, for example, the amide proton of an Ile residue to its side chain methyl groups may be quite nonoptimal for short-range transfer from HN to Hα.43 It may also be useful to record either one or both of these experiments in D2O, especially if it is suspected that artifacts resulting from incomplete solvent suppression are obscuring crosspeaks involving Hα protons near the water signal. The two spectra complement each other in information content, with TOCSY skewers providing connectivities between most (or often all) protons in the same spin system, and the DQFCOSY distinguishing between direct and indirect connectivities (Figure 8). These spectra are used to identify the type of spin system associated with each HN–Hα crosspeak, and these spin system types can be matched either to specific amino acids or to groups of amino acids. Note that complete assignment of the longer spin systems such as Lys and Arg is not crucial for sequential assignment, or frequently even for the generation of NOE constraints. Because these residues generally lie on the surface of proteins with their side chains oriented toward the solvent, they are generally very mobile and often exhibit very few structurally useful NOEs.

9.09.3.2.4

Sequence-specific resonance assignment

Once all traceable spin systems have been delineated as described above, they need to be matched to specific residues in the sequence. This is achieved using through-space connectivities derived from a NOESY spectrum (or in the case of smaller polypeptides – those with MW 1000–3000 Da – a ROESY spectrum45), which correlates pairs of protons less than 5.5 Å apart in space, regardless of their relative positions in the primary structure. In general, the shortest mixing time that yields a good quality spectrum is preferable, since indirect effects are observed at longer values. That is, at longer mixing times, magnetization may effectively be transferred between protons that are separated by >5.5 Å (see Section 9.09.5.2). The sequential assignment procedure relies on the observation of connections between the HN, Hα, and Hβ protons of adjacent residues in the sequence. It has been demonstrated that, irrespective of secondary structure,


Figure 8 Residue-specific resonance assignment using TOCSY spectra. Portion of the amide region of a 2D 1H–1H TOCSY spectrum (τm = 80 ms) of the 37-residue spider toxin -atracotoxin-Hv1c.44 Intraresidue scalar correlations from the backbone amide proton to each of the side chain protons (so-called amide 'skewers') are shown for residues Ala6, Cys10, and Val29. The side chain protons corresponding to each correlation are indicated on the spectrum. [Plot axes: 1H chemical shift (ppm) in both dimensions.]

Figure 9 Intra- and interresidue NOE connectivities used to make sequential assignments using the homonuclear strategy. A two-residue protein segment (residue i and residue i + 1) is shown. Blue dotted lines represent intraresidue scalar couplings, which are used to identify the residue type. Purple dashed lines represent intraresidue NOEs, which may assist in this process. Solid red lines show interresidue NOEs, which are used to connect individual spin systems and thereby make sequential assignments.

at least one (generally more) of these pairs of protons will be less than 3.5 Å apart, and thus should give rise to an NOE with medium-to-strong intensity.34 The most useful of these are the dαN(i, i + 1) (that is, the Hα of residue i to the HN of residue i + 1), dNN(i, i + 1), and dβN(i, i + 1) correlations (see Figure 9).


Before commencing this stage of the procedure, it is useful to record a NOESY spectrum in D2O. In this spectrum, the only signals downfield of 6 ppm correspond to carbon-bound protons from His, Phe, Trp, and Tyr, so these crosspeaks may be marked as such on the H2O-NOESY to simplify the assignment task. Note that complete exchange of the backbone amide protons for deuterons may take days, weeks, or even months for some protons, depending on sample temperature and pH (high temperature and pH favor exchange). In fact, partial hydrogen–deuterium exchange may be used to simplify (edit) both scalar-coupled and NOESY spectra if spectral overlap proves to be a problem (as is often the case for predominantly α-helical proteins). DQFCOSY/TOCSY and NOESY spectra collected soon after dissolution of the protein in D2O will exhibit only a subset of the HN protons, together with their associated connectivities. Similarly, a sample that has been quantitatively exchanged with D2O may be freeze-dried and reconstituted in H2O, and data collected on this so-called reverse-exchanged sample will contain a complementary subset of correlations. Thus, the presence in the H2O-NOESY of any of the three classes of NOE listed above is used to infer a sequential juxtaposition of the two amino acids concerned. It should be realized, however, that these types of NOEs can (and often will) be observed between residues that are not neighbors in the sequence, so that caution, as always, should be exercised. Consequently, it is more reliable to base a sequential connection on the evidence of more than one of these three types of NOEs. Breaks in the sequential assignment will inevitably occur – these can be the result of spectral overlap (e.g., two amide protons with identical chemical shifts will prevent the observation of sequential NOEs) or of the structure itself (e.g., the presence of a proline, which lacks an amide proton).
Sometimes the Hδ protons of Pro can be used to continue the assignment sequence (i.e., using dαδ(i, i + 1) connectivities), although this requires prior assignment of the proline spin system, usually a difficult task in the early stages of the procedure. Once short sequences of spin systems (3–4 residues) have been picked out, these can be mapped onto the polypeptide sequence. In some cases, a unique match may be found, but often there will be several possible assignments, and the segment must be extended in either or both directions as described above, until (hopefully) all but one of the possibilities can be excluded. For a 200-residue protein containing all 20 types of amino acids, there is a 99% probability that a tetrapeptide segment will be unique.34 This process is repeated until all possible assignments have been made (Figure 10). Although homonuclear resonance assignment is almost exclusively carried out using the sequential assignment method, one other approach has found use in some applications. The main-chain-directed (MCD) method46 is based on the identification of cyclic patterns of NOEs, which are characteristic of the different types of secondary structure. Because of this, it is less suitable for the assignment of unstructured or irregularly structured sections of a protein.
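
The mapping of short, NOE-connected segments of spin systems onto the polypeptide sequence, as used in the sequential assignment method, can be sketched as a simple pattern search. The following is a minimal illustration, assuming that each spin system has been narrowed down to a set of candidate residue types; the sequence, the grouping of residues into an 'AMX' class, and the function name are hypothetical and serve only as an example.

```python
def match_segment(sequence, segment):
    """Return 1-based start positions where a segment of spin systems fits.

    `sequence` is a string of one-letter residue codes; `segment` is a list
    of sets, each holding the residue types compatible with one spin system.
    """
    hits = []
    for start in range(len(sequence) - len(segment) + 1):
        window = sequence[start:start + len(segment)]
        if all(res in allowed for res, allowed in zip(window, segment)):
            hits.append(start + 1)
    return hits

# Hypothetical 12-residue sequence used purely for illustration.
sequence = "GASCKAVGADTL"

# Residues whose side chains give similar three-spin patterns are treated
# here as one ambiguous class (an assumption made for this example).
AMX = set("CSDNFYHW")

tripeptide = [{"G"}, {"A"}, AMX]
print(match_segment(sequence, tripeptide))      # two possible placements

# Extending the segment by one sequentially connected spin system
# removes the ambiguity.
tetrapeptide = tripeptide + [{"T"}]
print(match_segment(sequence, tetrapeptide))    # a single placement
```

In this toy case the tripeptide fits at two positions, and extending it by one residue leaves a single match, mirroring the extension step described above.
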

9.09.3.2.5

Three-dimensional homonuclear NMR

In the late 1980s, a further increase in dimensionality of NMR spectra was proposed and realized: the extension of 2D experiments to a third dimension. A 3D experiment may be considered to be a combination of two 2D experiments in which the detection period (t2) of the first experiment is replaced by that of a second experiment (of which at least the first 90° pulse of the preparation period is removed). Thus the 3D experiment entails two evolution times (t1 and t2), two mixing periods, and a detection period (t3). The two evolution times are incremented independently, and a 3D Fourier transformation, analogous to the 2D transform described previously, yields three orthogonal frequency axes in a cubic arrangement. Two implementations of this technique involve the combination of the 2D NOESY and TOCSY experiments to give the 3D NOESY–HOHAHA47 (and the closely related HOHAHA–NOESY48) experiments, and of two NOESY sequences, giving a NOESY–NOESY.49 The extension into the third dimension offers a potential increase in resolution since crosspeaks are now characterized by three frequencies. If, for example, both the HN and Hα signals for two Ala residues were coincident, their scalar correlations would still be distinguishable in a HOHAHA–NOESY if their Hβ protons (to which they would show NOEs) had distinct chemical shifts. Note that this chemical shift difference needs to be larger than the resolution of the spectrum afforded in that dimension – this is less likely than in the corresponding 2D spectra, as the increase in dimensionality effectively results in a decrease in resolution for the same length experiment. Thus, in some cases, this approach can partially alleviate the spectral overlap problem, which hampers the use of homonuclear assignment techniques for larger proteins, at a cost of longer experiments.48,50 However, these experiments have not been used widely as they add little additional information to the


Figure 10 Sequence-specific resonance assignment using interresidue NOEs. A portion of the fingerprint region of a 2D 1H–1H NOESY spectrum (τm = 250 ms) of the 37-residue spider toxin -atracotoxin-Hv1c is shown.44 Intraresidue Hα–HN NOEs are highlighted in green. Shown are two sets of sequence-specific resonance assignments obtained by connecting adjacent residues (i and i + 1) via interresidue Hα(i)–HN(i + 1) NOEs. The red lines illustrate the backbone 'walk' from Ile2 to Arg8, while the blue lines illustrate the backbone 'walk' from Gly19 to Ala24. [Plot axes: 1H chemical shift (ppm) in both dimensions.]

2D experiments. Consequently, it appears that for proteins that are not amenable to a straightforward 2D homonuclear approach, isotopic labeling is probably the method of choice (see Section 9.09.4).

9.09.3.3 Heteronuclear Resonance Assignment Strategies

9.09.3.3.1 Introduction

Traditionally, only homonuclear experiments are used to determine the structure of peptides when isotopic labels cannot be introduced. The reason is primarily the low sensitivity of heteronuclear experiments at natural abundance, since the natural abundances of the NMR-relevant isotopes of carbon and nitrogen are only 1.1 and 0.4%, respectively (not to mention the unfavorable gyromagnetic ratios of these nuclei compared to 1H). Moreover, although the assignment of these nuclei can aid in obtaining sequence-specific resonance assignments, they do not provide any additional structural restraints. In addition, the low abundance of these isotopes does not allow for heteronuclear-edited experiments (see Section 9.09.4). Due to these drawbacks, heteronuclear experiments were in the past rarely pursued for peptide structure determination. However, with the introduction of residual dipolar couplings (RDCs; see Chapter 9.07) as a source of structural restraints, these experiments may prove useful in structure calculations.51 The method for acquiring RDC information involves recording a set of two spectra for each experiment, one where the sample is isotropically tumbling (normal solution conditions) and one where the sample is in an aligned medium (generally a phage or liquid crystal solution). The spectral quality for the second sample is often inferior to that of the first, due to the restricted tumbling. In addition, the set of 2D spectra used to extract the coupling constants is recorded without decoupling the heteronucleus, resulting in a splitting that further reduces sensitivity (at least halving the SNR) compared to the traditional decoupled spectrum. These experiments are therefore of interest only when large sample quantities
are available (several millimoles per sample). Furthermore, recovery of the peptide from the sample containing the alignment medium is generally not trivial and is therefore often not attempted. Thus, if sample quantities are scarce, this experiment should be left as the final experiment. The assignment of heteronuclei generally requires acquisition of a heteronuclear single quantum coherence (HSQC) experiment and a heteronuclear multiple bond correlation (HMBC) experiment, and these are described in more detail below.

9.09.3.3.2 Heteronuclear correlation spectroscopy

Although multibond heteronuclear correlation experiments can to some extent aid in the assignment process, these experiments are most easily analyzed once the sequential assignment process is complete. The simplest experiment is the 1H–15N HSQC (see also Section 9.09.4), which correlates each backbone amide proton with its directly attached 15N nucleus. For the purpose of assignment, it is best to acquire this experiment under ideal conditions (i.e., not in an aligned medium and with decoupling of the 15N nuclei during acquisition). If assignments have already been obtained for each of the amide protons, then it is simply a matter of collecting an HSQC experiment with enough resolution and sensitivity to assign all of the directly attached nitrogen atoms. Depending on sample conditions and hardware available, this experiment can take from a few hours up to a day to run (Figure 11). The second heteronuclear one-bond experiment to acquire is the 1H–13C HSQC experiment. This experiment correlates the chemical shift of each 1H resonance with the 13C chemical shift of the carbon atom to which it is attached. In comparison with the 1H–15N HSQC experiment, the 1H–13C HSQC spectrum is more crowded, simply due to the large number of one-bond C–H correlations; these correlations include 1Hα–13Cα and side chain 1H–13C methyl/methylene one-bond correlations. The 1Hα–13Cα correlations are generally the easiest to assign as they appear in a distinct spectral region (1H = 3.5–5 ppm), they have a simple splitting pattern, and they typically have good signal dispersion. In contrast, the crowded and more complex splitting patterns found in the side chain region (0.5–3 ppm) will make the assignment of anything but the methyl groups rather difficult. One method for improving the level of assignments is to record a 13C HMBC spectrum.
This experiment correlates 1H atoms with 13C atoms that share a small coupling constant, thus providing multiple bond correlations. By using the assigned data for the 1H and 13C atoms found by the methods described above, additional assignments may be possible. Apart from aiding in the assignment of side chains, which may be ambiguous from the homonuclear and one-bond heteronuclear assignments, the experiments can provide sequential information as they show correlations to quaternary atoms (such as the C=O). Such assignments

[Figure 11: 2D spectrum; axes are 1H chemical shift (ppm) and 15N chemical shift (ppm).]

Figure 11 Natural abundance 1H–15N HSQC spectrum of the 37-residue spider toxin κ-atracotoxin-Hv1c.44 The experiment was acquired at 900 MHz using a 0.5 mmol l⁻¹ unlabeled peptide sample and an acquisition time of 12 h.

may be particularly useful for prolines (which lack HN atoms). Although it is theoretically possible to extract coupling information from HMBC spectra, the low sensitivity of this experiment rarely allows for this. The heteronuclear NMR experiments discussed above highlight how much extra resonance dispersion can be gained via this approach. The power of this added dimension becomes clear if, for example, the 1H–15N HSQC experiment shown above, where each HN atom is essentially resolved, were to be combined with a TOCSY or NOESY experiment to provide a third frequency dimension. The resulting 3D 15N-HSQC-TOCSY/NOESY spectrum would contain virtually no overlap of interresidue resonances. Such experiments are indeed possible and have been the driving force in producing uniformly 15N- and/or 13C-labeled proteins. This field has been the most intensely researched area of NMR in the past 20 years, and the strategies employed to determine protein and peptide structures using heteronuclear NMR experiments are discussed in the next section (see Chapter 9.19).

9.09.4 Data Acquisition for Isotopically Labeled Proteins

9.09.4.1 Overview

As mentioned in Section 9.09.3, the homonuclear NMR strategy will fail to provide complete and unambiguous assignments for larger proteins: the longer molecular correlation time, with its consequent increase in the efficiency of transverse relaxation, translates to broader lines and poor coherence transfer via 1H–1H scalar couplings. The increased number of protons in larger proteins also increases the resonance overlap problem. Homonuclear 3D NMR techniques, while providing some relief, still rely heavily on inefficient (for large proteins) 1H–1H scalar couplings. The gain in resolution from the added dimension is also tempered by both the limited frequency range of 1H and the large increase in the number of crosspeaks generated in such experiments (compared to either of the constituent 2D experiments). The advent of recombinant DNA technology has allowed the relatively facile production of proteins bearing isotopic labels in a variety of arrangements. For example, specific amino acid types may be labeled (e.g., 100% 13C labeling of all carbons in all Leu residues) or the whole protein may be labeled uniformly with 13C and/or 15N. In general, either uniform 15N labeling or 15N/13C labeling is used. The magnetic properties of these nuclei (both with spin quantum number I = 1/2) allow them to be utilized in high-resolution NMR, most commonly by exploiting their often large one-bond and two-bond scalar couplings to each other and to directly attached protons (Figure 12). These large couplings constitute a major advantage of heteronuclear over homonuclear multidimensional NMR, as magnetization transfer is very efficient in comparison with the homonuclear case (where 3JHH ≈ 3–14 Hz).
Thus, the 1H–15N HSQC experiment (discussed in Section 9.09.3.3.2),52 which correlates the chemical shifts of 15N nuclei (both backbone and side chain) to their directly attached proton(s), has very high sensitivity because magnetization is transferred via a very large one-bond J-coupling of about 90 Hz (Figure 12). The HSQC pulse sequence (see Figure 13(a)) involves the initial transfer of 1H magnetization to 15N through the one-bond coupling (using a sequence known as ‘insensitive nuclei enhanced by polarization transfer’ (INEPT)53), an evolution period (t1) where the magnetization is labeled with the 15N chemical shift, and transfer back to 1H (with reverse-INEPT) for 1H chemical shift detection during t2. Double Fourier transformation yields a 2D spectrum with no diagonal and a single in-phase crosspeak representing each 1HN–15N correlation.

[Figure 12: schematic of a polypeptide backbone segment annotated with J-coupling values of 140, 130, 90, 55, 35, 15, 11, and 7 Hz.]

Figure 12 Segment of a polypeptide chain showing the magnitude of the scalar J-couplings used in heteronuclear NMR experiments.

[Figure 13: (a) pulse sequence diagram with 1H and 15N channels, showing the preparation, evolution (t1), mixing, and detection (t2) periods, with 15N decoupling during detection; (b) 2D spectrum with 1H and 15N chemical shift (ppm) axes and one side chain correlation labeled.]

Figure 13 (a) Pulse sequence for the 2D 1H–15N HSQC experiment. Unfilled and filled rectangles represent 90° and 180° pulses, respectively. The delays are tuned to 1/(4J) to allow magnetization transfer between 1H and 15N. The gray rectangle on the 15N line indicates decoupling of that nucleus during signal acquisition. (b) 1H–15N HSQC spectrum of a 41-residue peptide toxin (0.5 mmol l⁻¹) from the spider Agelena orientalis that has been uniformly labeled with 15N. The experiment shows all amide proton–15N correlations; these arise mainly from the backbone amides and also from side chain amides of Asn and Gln residues. Side chain amide correlations can be readily identified because (1) the 15N nucleus is correlated to two 1H chemical shifts arising from each of the two directly attached protons and (2) each correlation has a weak partner 0.5–0.6 ppm upfield in the 15N dimension that results from the deuterium isotope effect produced by the 10% semideuterated NHD moieties present in 90% H2O/10% D2O solution. One pair of side chain amide correlations is labeled.

Figure 13(b) shows a 1H–15N HSQC spectrum acquired from a 0.5 mmol l⁻¹ sample of a 41-residue peptide toxin from the spider Agelena orientalis. The toxin was produced recombinantly and uniformly labeled with 15N. This HSQC spectrum was collected in 30 min, compared with the 12 h required to acquire a natural abundance spectrum from an unlabeled sample of equivalent concentration (see Figure 11). The HSQC, together with the related heteronuclear multiple quantum coherence (HMQC)54 experiment, forms the cornerstone of a wide range of 2D, 3D, and 4D experiments that are designed to facilitate sequence-specific resonance assignment and determination of protein structure. Note that the HSQC is the technique of choice for correlation of 1H and 15N shifts due to generally narrower linewidths in the 15N dimension.55,56 Furthermore, because these and most of the other heteronuclear experiments described below are designed to observe amide protons, the sample must be in H2O (rather than D2O). Consequently, a means of suppressing the H2O resonance is required (for details see Section 9.09.2.6). Because of the complex multistep nature of multidimensional experiments, their main adversary is transverse relaxation, T2. First, T2 (and hence linewidth) for a particular nucleus determines how efficient coherence
transfer via scalar couplings will be for that nucleus. As described above, when linewidths become larger than the magnitude of the scalar couplings concerned, transfer efficiency declines markedly. The linewidths for a 20 kDa globular protein at 25 °C will be approximately 12 Hz for HN, 7 Hz for N when proton-coupled (4 Hz when decoupled), 15 Hz for 13Cα, and 25 Hz for Hα (attached to 13Cα).57 These are in general smaller than the couplings used in these experiments, although it is clear that the small 1JCαN coupling is a primary determinant of which experiments may be carried out with reasonable sensitivity as the protein size increases. A second problem is that the transverse magnetization associated with a particular nucleus loses phase coherence (and therefore intensity) at a rate characterized by T2 for that nucleus. Thus it is important to minimize the length of time spent on nuclei such as Cα, which have comparatively short T2 times (about 20 ms for Cα). Consequently, experiments that correlate backbone amide nuclei with side chain nuclei of the ‘preceding’ residue are significantly more sensitive than the corresponding intraresidue experiments, as the former avoid the need for magnetization transfer via the 1JCαN coupling. For INEPT-type transfer between two nuclei, magnetization must be resident on each nucleus for around 1/(2J) s. This corresponds to 50 ms spent on Cα for Cα → N transfer, but only 9 ms for Cα → C′ transfer. Clearly the latter pathway allows less transverse relaxation. For similar reasons, most of the experiments described below will decrease rapidly in efficiency for proteins larger than 20 kDa. Obviously, it is also important to limit the total length of the pulse sequence so as to minimize T2 relaxation prior to signal acquisition.
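The residence-time argument above can be checked numerically. The following sketch computes the approximate 1/(2J) residence time for the two transfers and a crude single-exponential estimate of how much transverse magnetization survives on a nucleus with T2 of about 20 ms. The coupling values of 11 Hz and 55 Hz are taken from the values shown in Figure 12; assigning them to the Cα–N and Cα–C′ bonds, and treating the loss as a simple exp(−t/T2) decay, are assumptions made here for illustration.

```python
import math


def residence_time_s(j_hz):
    """Approximate time magnetization spends on a nucleus for INEPT-type transfer."""
    return 1.0 / (2.0 * j_hz)


def t2_from_linewidth_s(linewidth_hz):
    """T2 implied by a Lorentzian linewidth (full width at half height)."""
    return 1.0 / (math.pi * linewidth_hz)


def surviving_fraction(time_s, t2_s):
    """Crude estimate of the magnetization left after time_s of T2 decay."""
    return math.exp(-time_s / t2_s)


T2_CA_S = 0.020  # about 20 ms for C-alpha, as quoted in the text

# Assumed bond assignments of the Figure 12 coupling values.
couplings_hz = {"CA -> N": 11.0, "CA -> C'": 55.0}

for label, j_hz in couplings_hz.items():
    t = residence_time_s(j_hz)
    kept = surviving_fraction(t, T2_CA_S)
    print(f"{label}: {t * 1e3:.1f} ms on CA, about {kept:.0%} retained")

# A 15 Hz C-alpha linewidth corresponds to a T2 close to the quoted 20 ms.
print(f"T2 from 15 Hz linewidth: {t2_from_linewidth_s(15.0) * 1e3:.1f} ms")
```

With these inputs the transfer through the small coupling retains roughly 10% of the magnetization, against roughly 63% for the transfer through the larger coupling, which is consistent with the sensitivity ranking described in the text.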
Simple concatenation of magnetization transfer and free precession (frequency labeling) periods as described above produced the first triple resonance experiments, but it is possible to overlay such periods so that both free precession and magnetization transfer occur during the same time interval. This gives rise to so-called ‘constant time’ experiments with significantly shortened pulse sequences.58,59 This is one of the many ‘tricks’ employed to increase SNR in heteronuclear multidimensional NMR experiments (others include the use of PFGs60 and sensitivity enhancement61,62). The increase in sensitivity gained by application of such tricks can then be used to either improve resolution (or shorten experiment time) through the use of nonuniform sampling63 or study larger systems (or both in favorable cases64). In addition, for very large proteins, various sequences incorporating the TROSY principle have been developed, which often also greatly benefit from 2H labeling, since relaxation by 2H is less efficient than by 1H, leading to longer T2.65 These strategies are, however, not discussed in this chapter, as they are methods that only improve the applicability and sensitivity of the experiments covered here. The principles covered here are those that are fundamental to NMR structure determination and valid regardless of pulse sequence elaborations designed to improve sensitivity or resolution. As mentioned above, the most common labeling strategies for NMR studies of proteins are either uniform 15N labeling or uniform double labeling (15N/13C). 13C enrichment is generally more expensive, and in some expression systems more difficult, than 15N labeling. The experiments that employ single-labeled samples are often referred to as heteronuclear-edited experiments and those that employ double-labeled samples are known as triple resonance experiments. This section is thus split into two segments, which describe the assignment strategy for each of these two cases.

9.09.4.2 Heteronuclear-Edited NMR Experiments

Concatenation of the 1H–15N HSQC (or HMQC) sequence with a 1H–1H NOESY gives rise to the 3D 15N-edited NOESY–HSQC (or 3D NOESY–HMQC) experiment.66–68 Here, two of the frequency dimensions represent the amide 1H and 15N chemical shifts, while the third dimension provides information about the chemical shift of protons with which each amide proton is dipolar coupled (i.e., separated by <5.5 Å). The spectrum is routinely viewed as narrow 2D (1H–1H) strips taken at the 15N chemical shift of each crosspeak in the 1H–15N HSQC spectrum (see Figure 14). As exemplified in Figure 14, the increase in resolution compared to a simple 2D NOESY is dramatic, due in part to the lack of a straightforward correlation between 15N chemical shift and the secondary structure in which a residue is located (in contrast to the case of HN, Hα, and Cα chemical shifts). An analogous combination of TOCSY and HMQC/HSQC yields 3D TOCSY–HMQC/HSQC,69,70 where the third dimension as described above shows the chemical shifts of protons to which the amide protons would exhibit correlations in a conventional TOCSY (i.e., those protons in the same spin system). Thus, when satisfactory NOESY–HSQC and TOCSY–HSQC spectra are obtained, a semiclassical route to resonance assignment

[Figure 14: a 2D NOESY strip (left) and three 3D NOESY–HSQC strips (right) labeled Lys17, Trp13, and Gly33, annotated with δ(15N) values of 119, 120, and 109; both axes show 1H chemical shift (ppm).]

Figure 14 Comparison of 2D NOESY and 3D 15N-edited NOESY–HSQC spectra of a 41-residue peptide toxin from the Australian funnel-web spider Hadronyche infensa. A strip from the 2D NOESY spectrum is shown on the far left and it illustrates overlapping NOE correlations from three different amide protons (those of Trp13, Lys17, and Gly33). Fortunately, the 15N nuclei for these three amide groups have unique chemical shifts and hence they appear on different 2D planes in the 3D NOESY–HSQC experiment. Strips from these three planes are shown on the right, and they demonstrate that all of the NOE correlations are perfectly resolved in the 3D experiment.
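The way 15N editing separates overlapped NOEs into strips can be sketched in a few lines. In the example below, three amide protons share the same 1H shift, so a 2D NOESY would place all of their NOEs in one column, whereas grouping the 3D peaks by both the HN and 15N shifts of each HSQC crosspeak recovers one strip per residue. The residue labels echo Figure 14, but every shift value, the pairing of 15N shifts with residues, the tolerances, and the function name `extract_strips` are hypothetical and serve only as illustration.

```python
def extract_strips(hsqc_peaks, noesy_peaks, tol_h=0.02, tol_n=0.3):
    """Group 3D NOESY-HSQC peaks into strips keyed by HSQC peak label.

    hsqc_peaks: label -> (HN shift, 15N shift), in ppm.
    noesy_peaks: list of (NOE partner 1H shift, 15N shift, HN shift), in ppm.
    """
    strips = {label: [] for label in hsqc_peaks}
    for h_noe, n_shift, hn_shift in noesy_peaks:
        for label, (hn_ref, n_ref) in hsqc_peaks.items():
            if abs(hn_shift - hn_ref) <= tol_h and abs(n_shift - n_ref) <= tol_n:
                strips[label].append(h_noe)
    return strips


# Hypothetical peak positions for illustration only (not measured data).
hsqc_peaks = {
    "Trp13": (8.00, 120.0),
    "Lys17": (8.00, 119.0),
    "Gly33": (8.00, 109.0),
}
noesy_peaks = [
    (4.62, 120.0, 8.00),
    (3.10, 120.0, 8.00),
    (4.25, 119.0, 8.01),
    (1.75, 119.1, 8.00),
    (3.95, 109.0, 7.99),
]

# In a 2D NOESY, every peak near HN = 8.00 ppm falls in the same column.
overlapped_2d = [h for h, _, hn in noesy_peaks if abs(hn - 8.00) <= 0.02]
print("2D column at 8.00 ppm:", overlapped_2d)

# In the 3D spectrum, the 15N shift separates them into one strip per amide.
strips = extract_strips(hsqc_peaks, noesy_peaks)
for label, shifts in strips.items():
    print(label, shifts)
```

The tolerances would in practice depend on the digital resolution and linewidths of the spectra concerned.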

can be followed. TOCSY skewers from the TOCSY–HSQC are used to identify spin system types and to account for intraresidue NOEs in the NOESY–HSQC. Sequential NOEs can then be identified from the latter spectrum and used to deduce interresidue connectivities as usual. Information from 2D DQF-COSY and 2D NOESY spectra may aid assignment, as many direct scalar correlations and NOEs should be distinguishable, even for large proteins. 2D versions of these two 3D experiments, consisting of 15N shifts in one dimension (F1) and skewers of NOE/TOCSY correlations to the directly attached amide proton in the other (F2), have been described.55,71 These 2D experiments have the advantage of smaller demands on spectrometer time and easier implementation, although their effective resolution compared to the 3D experiments is obviously much poorer. Note that analogous experiments, such as the 13C-edited HSQC–NOESY,72 can be performed on 13C-labeled proteins. For labeled proteins, this latter experiment provides the largest number of conformational restraints for protein structure calculations (see Section 9.09.5.2). The 15N-edited NOESY–HSQC only provides distance information for protons that are close in space to amide protons (since magnetization originates and/or terminates
on an amide proton). Thus, it provides no information about distances between pairs of carbon-bound protons, which represent by far the largest group of close interproton distances in proteins. A further implementation of heteronuclear editing of homonuclear spectra can be carried out when specifically labeled samples are available (e.g., 15N labeling of all Leu residues). Normal 2D homonuclear pulse sequences to which a so-called difference echo is appended yield 1H–1H spectra where only the residues carrying the labels appear.73,74 This can be useful for resolving ambiguities that may be present even in the 3D experiments, or to study interactions of two differently labeled proteins.75 Some researchers have made full chemical shift assignments by generating many such specifically labeled samples and applying these techniques to each one.76,77 However, this is a very labor-intensive approach that requires a well-behaved expression system, and it is not expected to be widely applicable.77 From a resonance assignment viewpoint, the most significant limitation of the experiments described in this section is that they rely on the transfer of magnetization via small homonuclear couplings (3JHNα can be as low as 3 Hz for α-helical regions of a protein), and hence they will fail for larger proteins, as noted earlier. Moreover, the sequential assignment process requires the use of NOEs, which do not provide unambiguous connections as readily as scalar couplings; assignment consequently involves a pattern matching process that is very time consuming and prone to error. The next section describes an approach that largely circumvents these problems and allows routine resonance assignment for proteins up to 20 kDa (and larger in favorable cases, particularly when deuteration is applied).78

9.09.4.3 Triple Resonance Experiments for Protein Backbone Assignment

As previously discussed, the large size of one- and two-bond heteronuclear (and homonuclear JCC) couplings (Figure 12) results in very efficient magnetization transfer (using either HSQC or HMQC sequences) relative to homonuclear scalar transfer through either COSY- or TOCSY-type techniques. Therefore, resonance assignment using NMR spectra that make use of these large couplings represents an appealing alternative. Although two of these couplings (1JNCα and 1JNC′) are quite small, the heteronuclear experiments have been (and still are being) carefully designed and optimized so as to minimize the problems presented by these couplings (see below). The concept underlying this class of experiments is that magnetization is transferred between nuclei via scalar couplings, such that the frequencies of some or all of the atoms involved in the transfer pathway are sampled. Thus a 3D (or 4D) spectrum is obtained that correlates the chemical shifts of three (or four) nuclei as defined by the coherence pathway chosen. In this way, a number of different interresidue correlations can be made, providing unambiguous sequential assignments (cf. NOE-based connections). The size of these one-bond J-couplings is generally insensitive to conformation, allowing the delays in the pulse sequences to be accurately tuned to the coupling constants. Although the names of triple resonance NMR experiments appear rather esoteric, they are in fact rationally derived on the basis of the nuclei that are involved in the coherence transfer pathway; nuclei that are used in the transfer pathway, but whose chemical shift is not sampled, appear in parentheses. Thus, the HNCO experiment provides interresidue correlations between the HN and N nuclei of residue i and the carbonyl carbon of residue i − 1; in the pulse sequence name, the term ‘HN’ implies that both the amide proton and its attached nitrogen are frequency-labeled.
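The naming convention can be made concrete with a small parser. The sketch below splits an experiment name into nucleus tokens and marks those inside parentheses as relay nuclei whose shifts are not sampled. The token set (HN, NH, HA, HB, CA, CB, CO, H, N, C) and the function names are assumptions chosen to cover the experiments named in this section; suffixes such as ‘–TOCSY’ are not handled.

```python
TWO_LETTER = ("HN", "NH", "HA", "HB", "CA", "CB", "CO")
ONE_LETTER = ("H", "N", "C")


def parse_experiment(name):
    """Split an experiment name into (token, sampled) pairs, in transfer order."""
    steps = []
    in_parens = False
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "(":
            in_parens = True
            i += 1
        elif ch == ")":
            in_parens = False
            i += 1
        elif name[i:i + 2] in TWO_LETTER:
            # Two-letter tokens are tried first so that "CA" is not read as "C".
            steps.append((name[i:i + 2], not in_parens))
            i += 2
        elif ch in ONE_LETTER:
            steps.append((ch, not in_parens))
            i += 1
        else:
            raise ValueError(f"unrecognized character {ch!r} in {name!r}")
    return steps


def sampled_nuclei(name):
    """Tokens whose chemical shifts are frequency-labeled (outside parentheses)."""
    return [token for token, sampled in parse_experiment(name) if sampled]


for experiment in ("HNCO", "HN(CA)CO", "HN(CO)CA", "CBCA(CO)NH", "HCA(CO)N"):
    print(experiment, "->", sampled_nuclei(experiment))
```

Note that the ‘HN’ (or ‘NH’) token stands for two sampled frequencies, the amide proton and its attached nitrogen, so an experiment such as the HNCO yields a 3D spectrum even though only two tokens are listed as sampled.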
Note that CO refers to the carbonyl carbon, and that CA, CB, HA, and HB refer to the Cα, Cβ, Hα, and Hβ nuclei, respectively. The order in which the atoms are listed indicates the direction of the magnetization transfer. In virtually all cases, magnetization starts on 1H and is transferred to a heteronucleus (using either an INEPT or HMQC transfer), so that the sensitivity of the experiment is increased relative to starting with the magnetization on the heteronucleus.79 In addition, 1H magnetization is always detected in the direct dimension, again for sensitivity reasons, and therefore experiments such as the HNCO are termed ‘out-and-back’ experiments. That is, after transfer to CO, the reverse pathway is traced, so that the entire experiment is described by HN(i) → N(i) → CO(i − 1) → N(i) → HN(i). The HN(CA)CO,80 also an out-and-back experiment, selects a symmetrically related intraresidue pathway: HN(i) → N(i) → Cα(i) → CO(i) → Cα(i) → N(i) → HN(i). The Cα is not frequency labeled and hence it appears in parentheses. Note that the Cα nuclei (together with Cβ nuclei in other experiments) are treated separately from the carbonyl carbons in these sequences. To make this possible, specialized pulses must often be generated that excite specific spectral regions; for example, a pulse may be required that excites the CO region of the carbon spectrum but not the

[Figure 15: three schematic backbone panels showing the magnetization transfer pathways of each experiment pair.]

Figure 15 Schematic illustration of three different pairs of triple resonance NMR experiments that can be used for making sequence-specific resonance assignments. Left panel: HN(CA)CO and HNCO; middle panel: HNCA and HN(CO)CA; right panel: HNCACB and CBCA(CO)NH. In each case, the experiment listed first, which is shown in red, provides intraresidue correlations (and sometimes also interresidue correlations), whereas the experiment listed second, shown in blue, provides only interresidue correlations.

aliphatic region. This requires either a spectrometer with four separate amplifiers or, if this is not available, the use of off-resonance frequency-selective pulses (for a review of selective pulses, see Kessler et al.81). A formidable array of triple resonance experiments has been developed since the concept was first introduced.82–84 Consequently, a number of different triple resonance strategies are available for making sequence-specific resonance assignments, and the strategy that works best will depend on the size, and hence the relaxation properties, of the protein concerned (in particular of the Cα and Hα atoms, see below). These experiments mostly consist of complementary pairs (see Figure 15), with one providing exclusively interresidue connectivities and the other providing both intra- and interresidue connections. This situation arises because Cα(i) is coupled to both N(i) (1JCαN) and N(i + 1) (2JCαN), so that any sequence that transfers magnetization between Cα and N will branch off in two directions. Crosspeaks resulting from these two pathways are often distinguishable in the final spectrum due to the lower intensity of the interresidue signals (see Figure 16), which arise from the smaller 2JCαN coupling constant. Conversely, the interresidue pathway can be selected exclusively by routing magnetization (originating from either N(i) or Cα(i − 1)) through C′(i − 1). Thus, the basic strategy is to create clusters of nuclei and then to link these clusters (preferably in two or more independent ways) to generate fragments of sequentially linked amino acid residues. These fragments can then be matched with the known amino acid sequence of the protein using spin system information, generally with the help of chemical shift information. The HNCA58,83 and HN(CO)CA58,85 are one pair of experiments that can be used for sequence-specific resonance assignment (see Figures 15 and 16).
The HNCA correlates the N(i)–H(i) unit with both Cα(i) and Cα(i − 1), while the HN(CO)CA, by virtue of the transfer through CO(i − 1), provides only the interresidue N–H(i) → Cα(i − 1) correlation. In theory, these two experiments should suffice to elucidate sequential assignments for all HN, N, and Cα nuclei. However, in practice, overlap and/or missing signals preclude this, and a second linkage between residues, involving an atom other than Cα, is required. One possibility is to use Hα; the HN(CA)HA86 and HN(COCA)HA80 experiments achieve this connection. Another method uses CO as the linking nucleus. The HNCO58,87 is one of the most sensitive triple resonance experiments (see below), correlating the H(i)–N(i) unit with CO(i − 1), while the complementary experiment HN(CA)CO gives the intraresidue H(i)–N(i) → CO(i) correlation (and often a weaker interresidue crosspeak; see Figure 16). Unfortunately, the latter experiment suffers from lower sensitivity because of a long residence time on Cα, which has a fast transverse relaxation rate, and hence it is not suitable for large proteins. Thus, two of these sets of two 3D spectra can provide a pair of interresidue links; the same result can be arrived at by recording two complementary 4D experiments, each of which samples an extra chemical shift during the magnetization transfer pathway. For example, the HNCAHA88–91 and HN(CO)CAHA88,91 combine the first four 3D triple resonance experiments described above, resulting in two connections between residues (Hα and Cα chemical shifts) in each experiment.
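The need for a second, independent linkage can be illustrated with a toy matching routine. Each amide ‘cluster’ below carries its own Cα and CO shifts (as from the HNCA and HN(CA)CO) and those of the preceding residue (as from the HN(CO)CA and HNCO); two clusters are linked when the preceding-residue shifts of one match the intraresidue shifts of the other within a tolerance. The cluster names, shift values, single shared tolerance, and the function `link_residues` are all hypothetical and are not taken from the data discussed in this chapter.

```python
def link_residues(spin_systems, nuclei=("CA", "CO"), tolerance_ppm=0.1):
    """Return (predecessor, successor) pairs that match on every listed nucleus."""
    links = []
    for succ_name, succ in spin_systems.items():
        for pred_name, pred in spin_systems.items():
            if pred_name == succ_name:
                continue
            if all(abs(succ["prev"][n] - pred["own"][n]) <= tolerance_ppm
                   for n in nuclei):
                links.append((pred_name, succ_name))
    return links


# Hypothetical clusters: "own" = shifts of residue i, "prev" = shifts of i - 1 (ppm).
spin_systems = {
    "a": {"own": {"CA": 53.20, "CO": 175.1}, "prev": {"CA": 58.40, "CO": 176.8}},
    "b": {"own": {"CA": 62.00, "CO": 174.3}, "prev": {"CA": 53.20, "CO": 175.1}},
    "c": {"own": {"CA": 53.25, "CO": 177.9}, "prev": {"CA": 62.00, "CO": 174.3}},
}

# With C-alpha alone, cluster b has two candidate predecessors (a and c).
ca_only = link_residues(spin_systems, nuclei=("CA",))
print("CA only:", ca_only)

# Adding the CO linkage removes the ambiguity.
ca_and_co = link_residues(spin_systems, nuclei=("CA", "CO"))
print("CA and CO:", ca_and_co)
```

In this toy case the Cα shifts of clusters a and c are nearly degenerate, so only the combined Cα and CO comparison yields the unique chain a, b, c.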
An alternative and powerful method for providing two interresidue links is achieved using two 3D experiments, the CBCANH90 (or the very similar, but more sensitive, HNCACB)92 and the CBCA(CO)NH (see Figures 15–17).93,94 These experiments connect H(i)–N(i) units of one residue with the Cα and Cβ atoms of both the same and the preceding residues, in a manner similar to the HNCA/HN(CO)CA pair (i.e., up to four crosspeaks are seen at each combination of N/HN frequencies). Thus, the HNCACB yields the frequencies of up to six nuclei from a single 3D data set (HN(i), N(i), Cα(i), Cβ(i), Cα(i − 1), and Cβ(i − 1)). Interpretation of the spectra is simplified, however, by the opposite signs of the

[Figure 16: 2D planes from the HN(CA)CO/HNCO, HNCA/HN(CO)CA, and HNCACB/CBCA(CO)NH experiment pairs; axes are 1H chemical shift (ppm) and 13C chemical shift (ppm).]

Figure 16 2D planes taken from pairs of 3D triple resonance NMR experiments designed to obtain sequence-specific resonance assignments for a 41-residue peptide. The 2D planes were extracted along the 15N dimension. The planes in red show intraresidue correlations, while the spectra in blue show interresidue correlations. The vertical lines show the frequency positions at which a 1D trace was extracted along the 13C dimension. These 1D traces are shown on the right-hand side of each 2D plane, and they provide an indication of the relative sensitivity of each of the experiments. Note that the HNCACB spectrum contains peaks of opposite sign for the Cα correlations (shown in red) and the Cβ correlations (shown in green). The horizontal dotted lines highlight the correlations that are obtained in both spectra, which enable sequence-specific assignment (also see Figure 17).

Cα and Cβ correlations. Two additional features add to the utility of these experiments. First, the chemical shifts of Cα and Cβ enable facile identification of the spin systems of several residues (Ala, Thr, Ser, and also Gly due to the absence of a Cβ correlation), providing entry points for sequence-specific assignment. Second, the measured Cα and Cβ chemical shifts overlap with correlations obtained in the experiments used for obtaining side chain assignments (Section 9.09.4.4). A closely related experiment that correlates the amide unit of residue i with the side chain protons (rather than the carbons) of residue i − 1 is the HBHA(CBCACO)NH, which forms an assignment pair with the HNHAHB.93,94 The 3D HCACO59,83,95,96 and 3D HCA(CO)N59,83,95,96 are also very useful experiments. The HCACO has high sensitivity (because the small JCαN couplings are avoided) and correlates three intraresidue atoms; it is therefore used widely to complement the other triple resonance experiments. The HCA(CO)N is one of the less sensitive experiments, but its particular benefit lies in its ability to provide correlations to amide nitrogens that are connected to broadened amide protons,97 and/or to assign prolines (which do not have an HN atom). Such protons may occur in flexible regions (such as loops and the N- and C-termini) and may fail to provide crosspeaks with sufficient intensity in many of the other triple resonance experiments, which both begin with and detect HN magnetization. Because both of these experiments detect Hα in the direct dimension, the spectra must be recorded in D2O. This may give rise to a small isotope shift of the 13C resonances (C′ and Cα), although

304 Derivation of Peptide and Protein Structure using NMR Spectroscopy

[Figure 17: paired strips from the CBCA(CO)NH and HNCACB spectra for Asn11, Val12, and Trp13, extracted at 1H/15N frequencies of 8.35/117.9, 7.58/119.1, and 7.89/120.4 ppm. Labeled peaks: N11 CA, N11 CB, V12 CA, V12 CB, Y13 CA, Y13 CB.]

Figure 17 An example of the pattern matching process used to obtain sequence-specific resonance assignments from pairs of triple resonance NMR spectra. The spectra were acquired from a 41-residue peptidic spider toxin (0.5 mmol l⁻¹) at 900 MHz. Shown are pairs of strips from the CBCA(CO)NH (gray) and HNCACB (green) spectra taken at the same 1H and 15N frequencies. Note that the interresidue HN(i)N(i) → CαCβ(i − 1) correlations appear in both strips (as indicated by the horizontal dotted lines). These correlations need to be matched with intraresidue correlations in another pair of strips to provide a set of sequence-specific resonance assignments. For example, note how the two ‘weak’ interresidue correlations in the HNCACB strip for Val12 can be matched with the two ‘strong’ intraresidue correlations in the HNCACB strip for Asn11, indicating that these residues are adjacent.

such shifts are readily accounted for. The suppression of residual water can also saturate Hα protons that lie directly underneath the water resonance, thus preventing the observation of correlations for these residues.98

9.09.4.4 Triple Resonance Experiments for Protein Side Chain Assignment

There are two further experiments that together provide multiple interresidue connectivities; however, they are primarily used to complete the side chain assignment process. In the 3D H(CCO)NH–TOCSY99,100 and H(C)NH–TOCSY,99,101 magnetization begins on the side chain protons of residue i − 1 and is first transferred to the attached carbon using INEPT. It is then propagated along the side chain via an isotropic (TOCSY-like) mixing sequence. In the former experiment, the magnetization that arrives at Cα(i − 1) at the end of the mixing period is transferred to CO(i − 1) and then to N(i) and finally HN(i) for detection. In the latter sequence, direct transfer from Cα(i − 1) to N(i − 1) and N(i) yields, as for the HNCA, both intra- and interresidue connectivities (although the latter are rather weak). During these experiments, the chemical shifts of the side chain protons Hx(i − 1), N(i), and HN(i) are sampled (and also N(i − 1) and HN(i − 1) in the H(C)NH–TOCSY). In principle, the information provided by the H(C)NH–TOCSY is available in a 15N-edited TOCSY–HSQC spectrum.


In practice, this is often not the case since magnetization is transferred along the side chain via small and inefficient homonuclear couplings in the case of the TOCSY–HSQC compared with the large 1JCH and 1JCC couplings in the H(C)NH–TOCSY. These experiments can be recorded in four dimensions with the side chain carbons comprising the fourth dimension (i.e., HCNH–TOCSY and HC(CO)NH–TOCSY),99 in order to provide 13C chemical shifts and a potential increase in resolution if ambiguities still remain. Note also that the H(CCO)NH–TOCSY is very similar in concept to the HBHA(CBCACO)NH described above. The main difference is that transfer from Cβ to Cα in the latter case is through a COSY-type step. This excludes magnetization that may have originated on Cγ from being observed, in contrast to the isotropic mixing sequence used in the two TOCSY-type experiments. 3D HCCH experiments,102–106 that is HC(C)H–COSY and HC(C)H–TOCSY, provide the ability to obtain assignments for side chain protons and carbons in larger proteins (up to 30 kDa) where the 3D TOCSY–HSQC fails because of the large 1H linewidths. In these experiments, proton magnetization is first transferred to carbon (1JCH ≈ 125–150 Hz) from where it is propagated along the carbon skeleton of the residue via the one-bond 13C–13C couplings (1JCC ≈ 30–55 Hz) using either HOHAHA- or COSY-type methods. Finally, 1H magnetization is detected during the acquisition period following transfer of magnetization from 13C to directly attached protons via the large 1JCH coupling. In this way, indirect proton correlations (such as are normally found in TOCSY) are observed using only larger heteronuclear couplings (>30 Hz). Each crosspeak is characterized by the frequencies of two (directly or indirectly coupled) protons and the carbon to which the magnetization was first transferred. Consequently, there is much redundant information in these spectra, since the transfer will proceed in both directions.
The HCCH–COSY works in an analogous manner, except that a single-step 13C → 13C transfer is used, such that correlations are seen only between directly coupled protons (i.e., protons separated by three bonds). The HCCH experiments avoid the pitfalls caused by proton linewidths being greater than 3JHH couplings by transferring magnetization exclusively via large one-bond couplings. These experiments have proven extremely useful in the assignment of both proton and carbon side chain resonances in medium- to large-sized proteins. Note also that the HCCH experiments are best carried out in D2O, in order to avoid the large water signal that could otherwise obscure a number of Hα resonances and their associated correlations. Finally, a number of more specialized experiments have been proposed that deal with specific problems in the side chain assignment process. For example, although the aromatic protons of the Phe and Tyr residues have traditionally been assigned using NOE connectivities between Hβ and Hδ protons, heteronuclear NMR experiments have been developed that allow these aromatic protons to be unambiguously assigned using scalar rather than dipolar correlations. For example, 2D (HB)CB(CGCD)HD and (HB)CB(CGCDCE)HE experiments can be used to correlate the 13Cβ chemical shift of Phe and Tyr residues with the 1H chemical shift of the Hδ and Hε protons, respectively, using magnetization transfer solely via scalar couplings.107

9.09.4.5 Summary of Resonance Assignment Strategies

With such an array of possible NMR methods for resonance assignment, it must be decided which is the most suitable approach for the protein of interest. Although peptides and many smaller proteins may be amenable to the homonuclear approach described in Section 9.09.3, there are a number of advantages in using a heteronuclear approach employing 15N- or 15N/13C-labeled protein. As long as a suitable expression system is available, labeling is relatively straightforward and (at least for 15N) not excessively expensive. For smaller proteins, 15N labeling will significantly simplify the resonance assignment process by allowing the use of 15N-edited NOESY–HSQC and TOCSY–HSQC experiments, and a number of medium-sized proteins have been assigned using this 15N-directed strategy.69,76,108,109 Note, however, that the 15N-edited NOESY–HSQC only yields NOEs involving at least one amide proton, and NOEs between carbon-bound protons will have to be obtained from analysis of 2D NOESY spectra. Nevertheless, the additional advantages of 15N labeling, such as the ability to measure amide-proton exchange rates conveniently and probe backbone dynamics, make it an attractive strategy even for proteins smaller than 10 kDa. For proteins smaller than 10 kDa, the 15N-only approach may prove adequate. If not, double labeling with both 15N and 13C will be necessary, and assignment will be most readily achieved using triple resonance


experiments. A key feature of the triple resonance strategy is that nearly all resonance assignments can be made on the basis of scalar couplings. A further advantage is that they provide 15N and 13C chemical shifts; the former are used in amide exchange and backbone dynamics studies, while the latter contain information on secondary structure (see below) and together these shifts can be used to derive estimates of the backbone φ and ψ dihedral angles (see Section 9.09.5.3). A decision on which of the armory of triple resonance experiments to use will be partly based on the relaxation properties (i.e., size) of the protein. This is essentially due to the rapid transverse relaxation of Cα and Hα and its steep dependence on molecular correlation time (see Figure 7). Experiments that involve long residence times on these nuclei (e.g., HNCACB and HN(CA)CO) will often be of limited use for proteins larger than 20 kDa. Although several discrete triple resonance strategies that in theory can yield complete assignments have been outlined above, in practice a whole battery of these experiments is often applied to a protein.110 This is generally the result of problems that arise during the assignment process and which could not be predicted ab initio, such as the chemical shift coincidence of several atoms in a cluster with those from another cluster. Alternatively, some crosspeaks may be weak or absent in a given spectrum, for one of a number of reasons. For example, amide protons in less ordered regions may exchange rapidly with solvent protons; this will broaden their NMR signal, reducing the obtainable SNR for the corresponding crosspeak(s) in multidimensional spectra. Conversely, the carbon resonances of such mobile regions will be much sharper than those of the remainder of the protein (since these atoms are effectively rotating independently of the bulk of the protein, and therefore exhibit longer T2 values).
The observation of weaker correlations in the presence of such narrow, intense ones can be hampered further by artifacts such as t1 noise associated with the stronger crosspeaks.97 The decision as to which triple resonance experiments to use therefore remains to some degree empirical, and often needs to be determined separately for individual cases and for individual spectrometers. As a starting point, however, the more sensitive ‘out-and-back’ experiments should be attempted first (e.g., HNCO, HNCA, HN(CO)CA, and CBCA(CO)NH).

9.09.5 Extraction of Structural Constraints

9.09.5.1 Overview

Once resonance assignment is complete, the next step in the structure determination process can be tackled, namely the extraction of structural restraints from the NMR data. These restraints can then be used as input to a computer algorithm that attempts to calculate a 3D structure of the protein (or generally a family of structures) that is consistent with these restraints. The most important source of conformational information is homonuclear 1H–1H NOEs, which are observed between protons that are spatially separated by less than 5.5 Å. Scalar coupling constants (especially 3J-couplings) provide information about torsion (dihedral) angles that can be used to derive angle restraints for structure calculations. Hydrogen bonds can be inferred or observed directly using NMR (see Section 9.09.5.4) and distance restraints defining these hydrogen bonds can also be used in structure calculations. Finally, the chemical shifts of various nuclei have been shown in many cases to be reliable indicators of protein secondary structure. In the sections below, we consider the information content of each of these classes of NMR data. Residual dipolar couplings (RDCs) have been briefly mentioned in Section 9.09.3.3.1 and they are discussed in more detail in Chapter 9.07.

9.09.5.2 Interproton Distances

When two protons are close in space, they are said to be dipolar-coupled (as opposed to scalar-coupled, which is a through-bond coupling mechanism). The modulation of this dipolar coupling as a result of molecular tumbling allows relaxation of the protons, and this relaxation may be manifested as an NOE, which can be observed as a crosspeak between the two protons in a two- or higher-dimensional NOE spectroscopy (NOESY) experiment. For large molecules such as proteins, this relaxation occurs predominantly through coupling of the dipole modulation to a simultaneous mutual spin flipping of the protons, which is essentially a zero frequency process. As the protein gets larger, its tumbling (and hence the dipole modulation) occurs at a lower frequency (Equation (3)). The prevalence (spectral density) of low-frequency processes35 therefore increases, and


coupling to the zero frequency proton–proton cross-relaxation process becomes more efficient. As relaxation becomes more efficient, the rate of buildup of an NOE during the mixing period of a NOESY increases. The buildup rate (σ) also depends on the strength of the dipolar interaction between the two protons, so that

$$\sigma \propto \frac{\tau_c}{r^{6}} \qquad (4)$$

The 1/r⁶ dependence of σ causes the buildup rate to fall off very rapidly with internuclear distance, with the result that NOEs are short-range interactions that are typically not observed between protons separated by more than 5.5 Å (but see below). Nevertheless, this provides extremely valuable structural information since spatially proximal protons will yield a crosspeak in NOESY spectra regardless of how distal they are in the amino acid sequence. Since several thousand NOEs will be observed for even a protein of modest size, NOEs provide the most important structural restraints for structure calculations. But first they need to be assigned to specific proton pairs, quantified, and converted into distance information. For small proteins (about 5 kDa or smaller), complete or nearly complete assignment of NOE connectivities can usually be accomplished by visual inspection of 2D NOESY spectra (see Clore and Gronenborn,111 and references therein). Many of these NOEs would have been identified as a matter of course if the homonuclear assignment procedure described in Section 9.09.3 was employed. Any ambiguities in NOE assignments may be resolved by an iterative back-calculation procedure,112–114 whereby structure calculations are first carried out using only unambiguously assigned NOEs. The resulting preliminary structures are used to resolve multiple assignment possibilities for unassigned NOEs by excluding those possibilities that are grossly inconsistent with the calculated structures. The newly assigned NOEs are included in a second round of calculations, and the new, better defined structures are used to assign NOEs that remained ambiguous after the first iteration. This procedure can be repeated until no further ambiguities are resolved. For large proteins, severe resonance overlap precludes a simple 2D approach, and resolution enhancement by incorporation of an extra dimension (or two) into the experiment is required.
This can potentially be achieved in a 3D homonuclear experiment (3D NOESY–HOHAHA or NOESY–NOESY), but it is far preferable to edit the NOESY according to the frequencies of attached heteronuclei using an isotopically labeled sample. Thus a range of experiments is available, depending on the labeling pattern present in the protein: 3D 15N-edited NOESY, 3D 13C-edited NOESY, 4D 13C,13C-edited NOESY, and 4D 13C,15N-edited NOESY. These experiments, in particular the 4D versions, provide dramatic increases in resolution (see, for example, Clore et al.115), despite restricted digital resolution, because so few crosspeaks appear in each plane. A disadvantage of the 15N-edited experiments is that they only yield NOEs involving at least one amide proton. Thus for a high-resolution structure, 13C-edited NOESY experiments are essential so that NOEs between pairs of carbon-bound protons can be detected.116,117 Once the identity of all or most of the observable NOEs is established, the proximity of each proton pair must be gauged. As noted above, the rate of buildup of an NOE is proportional to r⁻⁶, where r is the distance between the two protons. However, because all other proton pairs in a molecule give rise to oscillating fields at similar frequencies (as a result of molecular tumbling), these pairs can contribute to the cross-relaxation of a proton. This phenomenon is termed spin diffusion (SD) since it results from a stepwise (diffusive) transfer of magnetization away from a given proton pair via other neighboring protons. The observable results of spin diffusion are (1) a change in the shape of the buildup curve for direct NOEs (see below) and (2) the appearance of crosspeaks between spatially distal protons. Obviously, the latter effect reduces the useful information content of an NOE experiment and SD should therefore be minimized as much as possible.
Since SD is an indirect phenomenon, its buildup has an initial lag phase in comparison to direct NOEs, and thus the use of short mixing times allows its effects to be neglected to a first approximation. Note that since cross-relaxation is more efficient for larger values of τc, SD becomes more of a problem the larger a protein is, necessitating the use of shorter and shorter mixing times. The buildup of direct NOEs is approximately linear at short mixing times. In this regime (the isolated spin-pair approximation (ISPA)),118 it is assumed that the intensity of the observed crosspeak is directly proportional to r⁻⁶. The proportionality can be estimated by measuring the intensities (Iref) of NOEs between protons that


are separated by a fixed, conformation-independent distance (dref), such as geminal methylene (1.7 Å) or ortho-aromatic (2.45 Å) protons. Unknown distances, dij, can therefore be calculated as

$$d_{ij} = d_{ref}\left(\frac{I_{ref}}{I_{ij}}\right)^{1/6} \qquad (5)$$

where Iij is the intensity of the crosspeak of interest. These derived distances represent ‘upper limits’ for the interproton distance, since a number of mechanisms may operate to reduce the observed NOE intensity and lead to the estimation of an artificially longer distance. In addition, the use of a single reference distance can introduce systematic errors.119 Consequently, a better way to derive less biased distance estimates is to use two different types of reference distances in combination.119 For structure calculations, error ranges need to be placed on the derived distances. The most conservative method is to simply assign an upper distance bound of 5–6 Å to all proton pairs that yield an NOE, irrespective of NOE intensity. This can be useful for rapidly determining the overall fold of a protein, but it clearly discards useful information. An alternative approach is to partition the distances into broad categories: for example, 1.8–2.8, 1.8–3.5, and 1.8–5.0 Å for strong, medium, and weak NOEs, respectively, where 1.8 Å is the van der Waals contact distance between two hydrogen atoms.120 If stereospecific assignments are not available for pairs of methylene protons or Leu/Val methyl groups, a so-called pseudoatom is created midway between each pair in the structure calculations, and the upper distance bound is relaxed for those NOEs (by 1 Å for a methylene pair).121 This categorization procedure is still rather conservative, but it is better to underinterpret than overinterpret the data. In any case, the most important factor in determining the final quality of an NMR structure is the total ‘number’ of distance restraints, not their ‘precision’.34,122 Because ISPA is an approximation, care should be taken not to overinterpret the interproton distances derived from Equation (5).
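The calibration in Equation (5) and the coarse distance categories described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the rule used here to place a calibrated distance into the strong/medium/weak categories (the first category whose upper bound is not exceeded) is an assumption, since the text does not specify how intensities are assigned to categories.

```python
# Sketch of ISPA distance calibration (Equation (5)) followed by coarse binning.
# Names and the binning rule are illustrative assumptions, not a published protocol.

LOWER_BOUND = 1.8  # van der Waals contact distance between two hydrogens (angstroms)
UPPER_BOUNDS = (("strong", 2.8), ("medium", 3.5), ("weak", 5.0))  # from the text


def ispa_distance(i_ij, i_ref, d_ref):
    """Return d_ij = d_ref * (I_ref / I_ij) ** (1/6), in the units of d_ref."""
    if i_ij <= 0 or i_ref <= 0:
        raise ValueError("NOE intensities must be positive")
    return d_ref * (i_ref / i_ij) ** (1.0 / 6.0)


def restraint_bounds(distance):
    """Map a calibrated distance to (category, lower bound, upper bound)."""
    for category, upper in UPPER_BOUNDS:
        if distance <= upper:
            return category, LOWER_BOUND, upper
    # Anything longer is still treated as a weak NOE with the loosest bound.
    return "weak", LOWER_BOUND, UPPER_BOUNDS[-1][1]


# Reference: ortho-aromatic proton pair (2.45 angstroms) with intensity 100.
# A crosspeak 64 times weaker corresponds to twice the reference distance.
d = ispa_distance(i_ij=100.0 / 64.0, i_ref=100.0, d_ref=2.45)
print(round(d, 2), restraint_bounds(d))
```

Because intensity scales as the inverse sixth power of distance, a 64-fold drop in intensity only doubles the estimated distance, which is why broad categories lose relatively little information.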
The isolated spin-pair approximation is really only applicable to backbone protons such as Hα and HN for which the effective correlation time for modulation of the dipolar coupling is equivalent to, or very close to, the molecular correlation time τc. This will not be the case for protons found in more dynamic regions of the protein, such as flexible loops, or protons at the tip of long side chains such as those of Arg and Lys residues. A ‘uniform averaging model’ developed to account for this flexibility123 shows that the relationship between NOE intensity and internuclear distance for protons in more dynamic regions of the protein is closer to r⁻⁴ than the r⁻⁶ in the ISPA model. Thus, structure calculation programs such as CYANA114 use a more complicated NOE calibration model that is based on an r⁻⁶ dependence for NOEs involving only backbone protons and an r⁻⁴ dependence for backbone–side chain and side chain–side chain NOEs. Regardless of whether automatic or manual calibration is used, care should be taken to ensure that an appropriate calibration model is used for converting NOE intensities into interproton distances.

9.09.5.3 Backbone Dihedral Angles

The backbone conformation of a peptide or protein can be completely defined by specifying the value of the φ, ψ, and ω dihedral angles for each amino acid residue. Since peptide bonds almost invariably assume the trans conformation (ω = 180°), experimental determination of φ and ψ would obviously be extremely useful for defining a protein’s 3D structure. NMR can be used to obtain estimates of the value of dihedral angles (although not with a high degree of precision) by taking advantage of the fact that the magnitude of three-bond coupling constants has a characteristic dependence on the dihedral angle θ between the two coupled atoms. This dependence is described by a Karplus equation124 of the type

$$J(\theta) = A\cos^{2}\theta - B\cos\theta + C \qquad (6)$$

where the constants A, B, and C have been determined empirically for various types of dihedral angle.125 The Karplus relationship has the form shown in Figure 18, which shows the dependence of the 3J(HNHα) coupling constant on the protein backbone dihedral angle φ. Although there are potentially multiple solutions for φ at a given value of 3J(HNHα), in practice φ in proteins is mostly restricted to the range φ = −30° to −180°,126 such that unique solutions are possible for many values of 3J(HNHα). In particular, regular secondary structural elements


[Figure 18: plot of 3J(HNHα) (Hz) against φ (degrees), with the regions corresponding to α-helix and β-sheet marked.]

Figure 18 Plot of the coupling constant 3J(HNHα) as a function of the associated backbone dihedral angle φ (based on the Karplus parameterization described by Billeter et al.130). The approximate regions of φ space associated with α-helix and β-sheet are indicated and it can be seen that these two secondary structural elements give rise to distinct 3J(HNHα) values of <6 and >8 Hz, respectively.

have characteristic 3J(HNHα) values of 3–6 Hz for α-helices and >8 Hz for β-strands. Intermediate values between these two ranges may represent rigid structures with a well-defined φ angle, but more often indicate averaging of the torsion angle through internal motion (e.g., in flexible loops or flexible N- and C-terminal regions). Thus, although values of 3J(HNHα) can often be measured for most or all residues in a peptide or protein, typically only those values that correspond to these secondary structure elements are converted to φ angle restraints for use in structure calculations. For example, it is common in structure calculations to restrain φ to −120 ± 30° and −60 ± 20° for 3J(HNHα) > 8 Hz and 3J(HNHα) < 6 Hz, respectively.127 These dihedral angle restraints are extremely useful in structure calculations since a single dihedral angle restraint typically constrains the solution conformation of a protein much more than a single NOE-derived interproton distance restraint. The φ dihedral angle is associated with as many as six different coupling constants: 3J(HNHα), 3J(HNCβ), 3J(HNC′), 3J(C′i−1Hα), 3J(C′i−1Cβ), and 3J(C′i−1C′). However, 3J(HNHα) is the most experimentally accessible coupling and the most widely used for estimation of φ, and hence we will focus our discussion on methods for estimating this coupling constant. For small unlabeled peptides, 3J(HNHα) can often be measured directly from the separation of the two components of the amide-proton doublet in a high-resolution 1D spectrum (or from the three components of the amide-proton triplet in the case of glycine residues, where HN is coupled to two Hα protons).
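The Karplus relationship of Equation (6) and the restraint rule quoted above can be illustrated with a short Python sketch. The chapter does not list values for A, B, and C; the coefficients below (A = 6.4, B = 1.4, C = 1.9 Hz, with θ = φ − 60°) are one widely quoted parameterization for 3J(HNHα), usually attributed to Pardi et al. (1984), and are used here purely as an assumption. The function names are hypothetical.

```python
import math

# Assumed Karplus coefficients for 3J(HN-Ha) in Hz; not taken from this chapter.
A, B, C = 6.4, 1.4, 1.9
PHASE_DEG = -60.0  # theta = phi - 60 degrees for the HN-N-Ca-Ha torsion


def karplus_hn_ha(phi_deg, a=A, b=B, c=C, phase_deg=PHASE_DEG):
    """Evaluate J(theta) = A cos^2(theta) - B cos(theta) + C for a given phi."""
    cos_theta = math.cos(math.radians(phi_deg + phase_deg))
    return a * cos_theta ** 2 - b * cos_theta + c


def phi_restraint(j_hz):
    """Return (phi centre, tolerance) in degrees, or None for intermediate couplings."""
    if j_hz > 8.0:
        return (-120.0, 30.0)  # beta-strand-like
    if j_hz < 6.0:
        return (-60.0, 20.0)   # alpha-helix-like
    return None                # likely conformational averaging; no restraint


for phi in (-120.0, -60.0):
    j = karplus_hn_ha(phi)
    print(phi, round(j, 1), phi_restraint(j))
```

With these assumed coefficients, φ = −120° gives a coupling of about 9.7 Hz and φ = −60° gives about 4.2 Hz, consistent with the >8 Hz and <6 Hz ranges quoted for β-strands and α-helices.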
Alternatively, 3J(HNHα) can be measured from the magnitude of the F2 antiphase splitting of the HN–Hα crosspeaks in a high-resolution DQFCOSY spectrum.128 However, this method is generally not suitable for large proteins,129 since, irrespective of the coupling constant, the minimum separation of the antiphase components in a DQFCOSY crosspeak is 0.576 times the linewidth.119 Thus, for proteins with linewidths of about 10 Hz or more, 3J(HNHα) couplings smaller than 5 Hz will be overestimated. It might still be possible to broadly group the couplings into those that are <6 Hz and those that are >8 Hz, but even these estimates become unreliable for proteins larger than 15 kDa. A number of heteronuclear NMR methods have been developed for measuring 3J(HNHα) in proteins that can be isotopically labeled with 15N. These methods utilize the large one-bond 15N–1H coupling constant, rather than relying on the measurement of antiphase splittings, and they allow measurement of 3J(HNHα) in proteins as large as 20 kDa. The J-modulated 1H–15N HSQC129–131 consists essentially of a normal HSQC with an extra delay period τ2 appended prior to signal acquisition. During this delay, the amide-proton magnetization, which has been labeled during τ1 with the attached 15N frequency, evolves according to its coupling to Hα.


Consequently, the intensity of the observed crosspeaks is modulated by both 3J(HNHα) and τ2, according to the equation

$$V(\tau_2) = A\left[\cos(\pi J \tau_1)\cos(\pi J \tau_2) - 0.5\,\sin(\pi J \tau_1)\sin(\pi J \tau_2)\right] e^{-\tau_2/T_2'} \qquad (7)$$

where V(τ2) is the crosspeak volume as a function of the delay time τ2, A is the crosspeak volume at τ2 = 0, J is the 3J(HNHα) coupling constant, and T2′ is the apparent 1H transverse relaxation time. A number of spectra with incremented values of τ2 are recorded, and the change in crosspeak intensity with τ2 can be fitted with Equation (7). A modification of the basic pulse sequence for this experiment has also been proposed,130 which is optimized for larger proteins where fast transverse relaxation is especially problematic. 3J(HNHα) can also be estimated from the in-phase splitting of the HMQC-type crosspeaks in the 15N dimension in a 1H–15N HMQC-J experiment.132 Because the linewidths in this dimension are significantly narrower than in the homonuclear DQFCOSY, small splittings are resolvable even for relatively large proteins.132 The most popular heteronuclear method for measuring 3J(HNHα) is the 3D HNHA experiment.133 In comparison with the J-modulated HSQC, the HNHA further alleviates spectral overlap by dispersing signals into a third frequency dimension that reports the chemical shift of Hα. In the 3D HNHA spectrum, 1HN–1Hα correlations are observed as crosspeaks with opposite phase to the diagonal 1HN–1HN peaks. The intensity ratio of the crosspeaks and diagonal peaks yields an accurate estimate of the 3J(HNHα) coupling constant, namely,

$$I_{cross}/I_{diagonal} = -\tan^{2}\left[\pi\,{}^{3}J(\mathrm{H^{N}H^{\alpha}})\,2\zeta\right] \qquad (8)$$

where 2ζ is the length of the transfer period in the pulse sequence. This method is very efficient for small proteins and peptides but begins to fail for proteins larger than 10 kDa due to the unfavorable relaxation properties of Hα. The backbone dihedral angle ψ is much less experimentally accessible than φ since the coupling constants related to this angle are typically small (almost zero in the case of 3J(NiNi+1)). Thus, although a number of experiments such as the HCACO[N]–ECOSY134 have been designed to measure values of 3J(HαNi+1), they are rarely used. Instead, for isotopically labeled proteins, an alternative approach is available for estimation of both φ and ψ backbone dihedral angles that completely obviates the need to measure coupling constants. This approach relies on the fact that the secondary chemical shifts of the 1Hα, 13C′, 13Cα, 13Cβ, and amide nitrogen nuclei are conformation dependent135–137 and that these shifts have been determined as a matter of course during the sequence-specific assignment process. The TALOS program takes advantage of this relationship by taking chemical shift information for triplets of residues from the protein being studied and searching for the best match in a database of proteins for which both a high-resolution X-ray crystal structure and complete NMR chemical shift assignments are available.138 On average, TALOS provides confident estimates of the φ and ψ dihedral angles for 70% of residues in a protein, and less than 2% of these predictions are likely to be incorrect. Hence, the TALOS-derived angle estimates are essentially ‘free’ structural information and they should be used whenever possible. The PREDITOR web-server (http://wishart.biology.ualberta.ca/preditor) provides a similar function; however, it is also capable of providing estimates of side chain χ1 and backbone ω
angles in favorable circumstances, and its accuracy can be improved by reference to the structures of homologous proteins if they are available in the Protein Data Bank (PDB).139

9.09.5.4 Side chain Dihedral Angles

The side chain dihedral angle χ1 is important for high-resolution definition of protein structures as it determines the angle at which each amino acid side chain branches out from the protein backbone. Moreover, in combination with certain types of NOEs, it can allow stereospecific assignment of prochiral β-methylene protons, which improves the precision of NMR structures by obviating the need to include pseudoatom distance corrections.140–142 χ1 can be inferred from the measurement of 1H–1H couplings, and it generally corresponds to one of the three possible staggered rotamers, where each atom attached to Cβ is either in one of the two gauche positions relative to Hα or in the trans position (Figure 19). For unlabeled proteins, χ1 determination relies on the measurement of 3J(HαHβ), whereas for 15N-labeled proteins the measurement of 3J(NHβ) is often more useful.


[Figure 19: Newman projections of the g2g3, g2t3, and t2g3 rotamers, showing the relative positions of N, C′, Hβ2, Hβ3, and R, with the following table of expected couplings and NOE intensities.]

Rotamer         g2g3            g2t3            t2g3
χ1              60°             180°            −60°
3J(HαHβ2)       <5 Hz           <5 Hz           >10 Hz
3J(HαHβ3)       <5 Hz           >10 Hz          <5 Hz
3J(NHβ2)        ∼5 Hz           ∼1 Hz           ∼1 Hz
3J(NHβ3)        ∼1 Hz           ∼1 Hz           ∼5 Hz
NOE(HαHβ2)      Strong          Strong          Weak
NOE(HαHβ3)      Strong          Weak            Strong
NOE(HNHβ2)      Weak            Medium/strong   Strong
NOE(HNHβ3)      Medium/strong   Strong          Weak

Figure 19 Newman projections of the three possible staggered conformers about the χ1 dihedral angle. The combination of 3J(NHβ) and 3J(HαHβ) coupling constants can be used to define χ1 and obtain stereospecific assignment of the β-methylene protons. Alternatively, either the 3J(NHβ) or 3J(HαHβ) coupling constants can be used in combination with the intensities of the HN–Hβ and Hα–Hβ NOEs to obtain this information.
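The coupling patterns tabulated in Figure 19 lend themselves to a simple lookup. The Python sketch below transcribes the large/small coupling pattern for each rotamer and returns the rotamers consistent with whatever couplings have been measured. It assumes that the β-protons have already been stereospecifically labeled; the function names, the dictionary layout, and the 3 Hz cutoff used to separate the ∼5 Hz and ∼1 Hz values of 3J(NHβ) are illustrative assumptions.

```python
# Expected coupling pattern for each staggered chi1 rotamer, transcribed from Figure 19.
# "L" = large coupling, "S" = small coupling.
FIGURE_19 = {
    "g2g3": {"chi1": 60,  "HaHb2": "S", "HaHb3": "S", "NHb2": "L", "NHb3": "S"},
    "g2t3": {"chi1": 180, "HaHb2": "S", "HaHb3": "L", "NHb2": "S", "NHb3": "S"},
    "t2g3": {"chi1": -60, "HaHb2": "L", "HaHb3": "S", "NHb2": "S", "NHb3": "L"},
}


def classify_hh(j_hz):
    """3J(HaHb): >10 Hz is large and <5 Hz is small; anything else is undetermined."""
    if j_hz > 10.0:
        return "L"
    if j_hz < 5.0:
        return "S"
    return None


def classify_nh(j_hz, cutoff=3.0):
    """3J(NHb): about 5 Hz is large, about 1 Hz is small (the cutoff is an assumption)."""
    return "L" if j_hz >= cutoff else "S"


def consistent_rotamers(**couplings):
    """Return rotamers whose Figure 19 pattern matches every supplied coupling (Hz)."""
    observed = {}
    for name, value in couplings.items():
        label = classify_hh(value) if name.startswith("Ha") else classify_nh(value)
        if label is None:
            return []  # intermediate coupling: probably rotamer averaging
        observed[name] = label
    return [rotamer for rotamer, pattern in FIGURE_19.items()
            if all(pattern[key] == label for key, label in observed.items())]


print(consistent_rotamers(HaHb2=3.0, HaHb3=12.0))
print(consistent_rotamers(HaHb2=3.0))
```

Supplying only one coupling generally leaves more than one candidate rotamer, which mirrors the need to combine a coupling constant with NOE intensities or a second coupling.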

For amino acids with a β-methine proton (Val, Thr, Ile), the magnitude of 3J(HαHβ) determines whether Hβ is trans or gauche.143 For amino acids with a β-methylene group, there are two relevant homonuclear couplings, 3J(HαHβ2) and 3J(HαHβ3). Figure 19 shows their values for each of the three staggered rotamers. For amino acids with a single Hβ proton, a DQFCOSY spectrum recorded in D2O (to eliminate any multiplet structure arising from the 3J(HNHα) coupling and to reduce loss of signals under the solvent peak) may be used to measure 3J(HαHβ). However, for amino acids with a β-methylene group, modified COSY experiments such as E.COSY144 or P.E.COSY,145 which simplify the multiplet structure of the Hα–Hβ crosspeaks, are better suited to this task. These experiments rely on a splitting of the crosspeak of interest by a further large passive coupling; for the Hα–Hβ2/3 system, the splitting is by the geminal 2J(Hβ2Hβ3) coupling (≈14 Hz). As a consequence, couplings can be measured between signals with substantially broader lines than is possible in a simple DQFCOSY.119 However, spectral overlap, exacerbated by broader lines, still thwarts these experiments for large proteins, and heteronuclear experiments are an attractive alternative. A modified 3D HCCH–TOCSY experiment, where proton decoupling is not applied during the 13C chemical shift evolution time, has been used for this purpose.146 2D planes showing 13C and 1H are taken through the indirect proton dimension, and the crosspeaks in the resulting 1H–13C correlation spectra have an E.COSY format. That is, the crosspeaks consist of two in-phase signals, separated in the 13C dimension by the passive 1J(CH) coupling and in the 1H dimension by the 3J(HαHβ) coupling, which can be measured directly. Alternatively, Clore et al.140 have shown that the size of the 3J(HαHβ) coupling for each β-proton is directly reflected in the intensity of HN → Hβ correlations in a 3D HOHAHA–HMQC experiment. That is, for the g2t3 and t2g3 conformations, only one HN → Hβ crosspeak is generally visible (or else one is much more intense), corresponding to the Hβ with the large 3J(HαHβ) coupling. For g2g3, both correlations are absent, while disordered side chains sampling some or all of the possible rotamers display two crosspeaks of similar intensity. For labeled proteins, the 3D HNHB experiment147 has become the most popular approach for determining χ1 and for providing stereospecific assignment of prochiral β-methylene protons. In contrast with the

312 Derivation of Peptide and Protein Structure using NMR Spectroscopy

experiments described above, it allows measurement of the three-bond heteronuclear coupling, 3J(NHβ), between the amide nitrogen and the Hβ protons rather than the homonuclear 3J(HαHβ) coupling; however, as for 3J(HαHβ), the magnitude of this coupling is related to χ1. The intensity of the Hβ crosspeak in the HNHB spectrum is a reflection of the magnitude of 3J(NHβ); however, the original version of this experiment required an additional 2D reference spectrum to be acquired for proper quantification.147,148 Thus, it is more common to run a modified version of this experiment that allows the magnitude of 3J(NHβ) to be extracted directly from the ratio of the diagonal and crosspeak intensities in a single 3D HNHB spectrum according to the equation149

$$\frac{I_{\mathrm{cross}}}{I_{\mathrm{diag}}} = -\tan^{2}\left(\pi\,{}^{3}J(\mathrm{NH}^{\beta})\,T\right) \tag{9}$$

where T is a fixed delay in the pulse sequence. More often than not, however, the HNHB is analyzed qualitatively, much as described above for the 3D HOHAHA–HMQC experiment. This is possible because 3J(NHβ) is only 1 Hz or less when Hβ is proximal to HN, leading to a weak or unobservable crosspeak. Thus, for the g2g3 and t2g3 conformations, one often observes only a single HN–Hβ crosspeak that corresponds to the Hβ with the large 3J(NHβ) trans coupling (≈5 Hz). For the g2t3 rotamer, both crosspeaks are absent, while disordered side chains that sample some or all possible rotamers display two crosspeaks of similar intensity. Thus, for amino acids with prochiral β-methylene protons, χ1 must be either 60° or −60° if only one Hβ crosspeak is visible, whereas both crosspeaks will be absent if χ1 = 180°. It is not possible to distinguish between the g2g3 and t2g3 rotamers on the basis of the 3J(NHβ) coupling alone (unless the identities of the pro-R and pro-S β-methylene protons are already known). The 3J(HαHβ) coupling has a similar ‘blindspot’ with respect to discriminating between the g2t3 and t2g3 rotamers. As outlined in Figure 19, a combination of 3J(NHβ) and 3J(HαHβ) would allow resolution of the three possible rotamers, but these two couplings are rarely both available. One usually measures 3J(HαHβ) from an E.COSY experiment if the protein is unlabeled, whereas 3J(NHβ) is typically determined using an HNHB experiment if 15N-labeled protein is available. Fortunately, by combining measurement of either the 3J(HαHβ) or 3J(NHβ) coupling constant with knowledge of the relative intensities of the Hα–Hβ and HN–Hβ crosspeaks in the NOESY spectra, it is possible to determine χ1 and stereospecifically assign the β-methylene protons (see Figure 19).

The strategy for making stereospecific assignments is best explained with an example. Let us imagine a 15N/13C-labeled peptide containing a single Asn residue with magnetically inequivalent β-methylene protons at 2.72 and 2.83 ppm. The HNHB reveals an intense Hβ crosspeak at 2.83 ppm (3J(NHβ) ≈ 5 Hz) but no crosspeak at 2.72 ppm (3J(NHβ) < 1 Hz). Thus, χ1 must be 60° or −60°, but at this stage we cannot tell which. Now let us imagine that the 15N-edited NOESY–HSQC spectrum reveals a very strong HN–Hβ crosspeak for the Hβ proton at 2.72 ppm but only a weak HN–Hβ crosspeak for the Hβ proton at 2.83 ppm. These NOE intensities are consistent with the assignment of χ1 to 60° or −60°, but we still cannot distinguish between them. Finally, the 13C-edited NOESY–HSQC spectrum reveals a weak Hα–Hβ crosspeak for the Hβ proton at 2.72 ppm but an intense Hα–Hβ crosspeak for the Hβ proton at 2.83 ppm. Thus, χ1 must be −60°, as both Hβ protons would yield intense Hα–Hβ crosspeaks if χ1 was 60° (compare the Newman projections for the g2g3 and t2g3 rotamers in Figure 19). Moreover, if χ1 = −60°, then the Hβ proton at 2.72 ppm with a small 3J(NHβ) coupling constant must be Hβ2, while the Hβ proton at 2.83 ppm with a large 3J(NHβ) coupling constant must be Hβ3. Using this approach, one can typically obtain stereospecific assignments (and associated χ1 values) for 50% or more of the pairs of β-methylene protons in a peptide or small protein.
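The qualitative decision logic of the Asn example can be written down compactly. The following Python sketch is only an illustration of that reasoning, not part of any published assignment program; the function name `assign_chi1`, the boolean encoding of crosspeak intensities, and the `HB2`/`HB3` labels are assumptions made for the example. Stereospecific labels are returned only for the χ1 = −60° case, which is the one worked through in the text.

```python
from typing import Dict, Optional, Tuple


def assign_chi1(
    hnhb_visible: Dict[float, bool],
    ha_hb_noe_strong: Dict[float, bool],
) -> Tuple[Optional[int], Dict[float, str]]:
    """Qualitative chi1 assignment for a beta-methylene group.

    hnhb_visible:      H-beta shift (ppm) -> True if an HNHB crosspeak is
                       seen, i.e. 3J(N,Hbeta) is large.
    ha_hb_noe_strong:  H-beta shift (ppm) -> True if the Halpha-Hbeta NOE
                       is intense.
    Returns (chi1 in degrees or None, stereospecific labels keyed by shift).
    """
    n_hnhb = sum(hnhb_visible.values())
    n_strong = sum(ha_hb_noe_strong.values())

    if n_hnhb == 2:
        # Two HNHB crosspeaks of similar intensity: rotamer averaging.
        return None, {}
    if n_hnhb == 0:
        # Both 3J(N,Hbeta) couplings are small: chi1 = 180 (g2t3).
        return 180, {}

    # Exactly one large 3J(N,Hbeta): chi1 is +60 or -60; the NOEs decide.
    if n_strong == 2:
        # Both Hbeta protons are gauche to Halpha (g2g3).
        return 60, {}
    if n_strong == 1:
        # t2g3: the proton with the large 3J(N,Hbeta) is Hbeta3.
        labels = {
            shift: ("HB3" if visible else "HB2")
            for shift, visible in hnhb_visible.items()
        }
        return -60, labels
    # Data are inconsistent with a single staggered rotamer.
    return None, {}


chi1, labels = assign_chi1(
    hnhb_visible={2.72: False, 2.83: True},
    ha_hb_noe_strong={2.72: False, 2.83: True},
)
print(chi1, labels)
```

Applied to the Asn example, the sketch reproduces the conclusion reached in the text: χ1 = −60°, with the 2.72 ppm proton labeled Hβ2 and the 2.83 ppm proton labeled Hβ3.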

9.09.5.5 Hydrogen Bonds

Linus Pauling and Robert Corey surmised in the early 1950s (i.e., well before any protein structures had been experimentally determined) that the most stable protein folds would be those that maximized hydrogen bond formation, while still maintaining normal bond lengths and bond angles, and avoiding unfavorable steric overlap. They showed that there are only two polypeptide folds that adhere to this rule, and they christened them α-helices and β-sheets. The subsequent determination of over 56 000 protein structures using NMR spectroscopy and X-ray crystallography has confirmed that α-helices and β-sheets are indeed the major secondary structure elements in folded proteins. Thus, the experimental identification of hydrogen bonds


can be extremely helpful both for defining elements of secondary structure in folded proteins and for use as conformational restraints in protein structure calculations (see Chapter 9.03). Due to their small atomic mass, hydrogen atoms diffract X-rays poorly and hence they can only be resolved in crystal structures solved at extremely high resolution (<1.2 Å); thus, hydrogen bonds are ‘inferred’ in most protein crystal structures. Similarly, in NMR studies, hydrogen bonds have historically been inferred from the presence of ‘slowly exchanging amide protons’. The rate of exchange of amide protons with solvent can be slowed by many orders of magnitude in folded proteins compared with unstructured peptides,150,151 and this slow exchange is largely due to the existence of hydrogen bonds involving the amide proton, often in regular elements of secondary structure. Amide-proton exchange rates are typically measured by monitoring the change in intensity with time of amide-proton crosspeaks in a 2D spectrum following dissolution of the protein in 100% D2O. Several types of 2D spectra can be used for this purpose, including TOCSY, COSY, or best of all, a 1H–15N HSQC spectrum if 15N-labeled protein is available. The advantages of the HSQC spectrum are that dispersion is generally better and good signal-to-noise can be achieved much faster than with homonuclear 2D experiments. These experiments are usually analyzed qualitatively, with an amide proton being declared as ‘slowly exchanging’ if its corresponding crosspeak is still apparent in the spectrum after a certain period of time following dissolution of the protein in D2O. However, this must be done with caution since, even in an unstructured peptide, there are intrinsic differences in the exchange rates for different types of amino acid residues.
A more quantitative approach involves acquisition of a series of spectra following dissolution of the protein in D2O, after which a single exponential function can be fitted to the change in peak intensity with time in order to derive a pseudo-first-order rate constant for the exchange process. The rate constants for the exchange of each residue in an unstructured peptide152–154 can then be divided by the observed rate constants for that residue in the protein under investigation to give a so-called protection factor. Large protection factors (>1000), when observed for amide protons that exhibit NOEs and coupling constants characteristic of regular secondary structure, can be used to generate restraints for structure calculations.

More recently, it has been realized that it is possible to use NMR to measure scalar couplings ‘across’ hydrogen bonds in both proteins155–159 and nucleic acids160,161 (see Chapter 9.08). This has the dual advantage of providing direct proof for existence of the hydrogen bond while simultaneously revealing the identity of the donor and acceptor atoms. The h3JNN couplings in nucleic acids are relatively easy to access experimentally as they range from 2.5 to 11 Hz. (The notation hnJAB indicates a trans scalar coupling between nuclei A and B in which one of the n bonds is actually a hydrogen bond.)161 However, the through-hydrogen-bond scalar couplings in proteins are much smaller and therefore more difficult to measure: the typical ranges for h3JNC′ and h2JHC′ in proteins are 0.2 to 0.9 and 0.6 to 1.3 Hz, respectively.162,163 For peptides and small proteins (<10 kDa) that can be isotopically labeled, h3JNC′ couplings can be visualized using a ‘long-range HNCO’ experiment. This is essentially a conventional HNCO experiment (discussed in Section 9.09.4.3) in which the time for magnetization transfer from N → C′ via INEPT is substantially increased in order to favor transfer via the small three-bond h3JNC′ coupling as opposed to the larger one-bond 1JN(i)C′(i−1) coupling.162–164 The experiment can be acquired as a 3D HNCO or, more commonly, as a 2D H(N)CO, in which the chemical shift of the 15N nucleus is not recorded (see Figure 20). The long INEPT delays required for magnetization transfer via h3JNC′ mean that this experiment becomes very inefficient for larger proteins with short T2 values. However, in these cases, a TROSY version of the experiment can be used with perdeuterated protein in which the amide deuterons have been converted to protons by exchange in H2O buffer.162–164 In favorable cases, this can extend the size range for this experiment to 30 kDa.165 Note that, as shown in Figure 20, the long INEPT transfer time in the long-range HNCO also allows observation of intraresidue N → C′ correlations via 2JN(i)C′(i), which is of similar magnitude to h3JNC′ (i.e., 1 Hz).
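The quantitative amide-exchange analysis described in this section (a single-exponential fit followed by division of the intrinsic rate by the observed rate) can be illustrated with a short Python sketch. It assumes that peak intensities decay to zero and are all positive, so that a straight-line fit of ln(I) against time gives the rate constant; the function names and the numerical values in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def fit_exchange_rate(times: Sequence[float], intensities: Sequence[float]) -> float:
    """Estimate k_obs from I(t) = I0 * exp(-k_obs * t).

    Uses an unweighted least-squares line through ln(I) versus t, which
    assumes positive intensities that decay toward zero. The rate is
    returned in reciprocal units of `times`.
    """
    logs = [math.log(i) for i in intensities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return -sxy / sxx  # the slope of ln(I) vs t is -k_obs


def protection_factor(k_intrinsic: float, k_obs: float) -> float:
    """Intrinsic (unstructured peptide) rate divided by the observed rate."""
    return k_intrinsic / k_obs


# Hypothetical data: time points in minutes and synthetic intensities.
times = [0.0, 60.0, 120.0, 240.0, 480.0]
intensities = [100.0 * math.exp(-0.005 * t) for t in times]

k_obs = fit_exchange_rate(times, intensities)
pf = protection_factor(k_intrinsic=10.0, k_obs=k_obs)  # k_intrinsic is made up
print(f"k_obs = {k_obs:.4f} per min, protection factor = {pf:.0f}")
```

With real data, noisy late time points dominate a log-linear fit, so a nonlinear fit with a baseline term may be preferable; the linear form is used here only to keep the sketch dependency-free.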

9.09.6 Calculation of Structures from NMR Data

9.09.6.1 Overview

The final step in protein structure determination using NMR is to use a computer program that combines the NMR-derived conformational restraints with additional restraints resulting from the covalent structure of the protein (i.e., bond lengths and bond angles) in order to calculate a 3D structure that is consistent with all of these


Figure 20 Long-range 2D H(N)CO experiment acquired at 600 MHz using a 1 mmol l−1 sample of the 37-residue spider toxin ω-atracotoxin-Hv1a.127 (Axes: 1H chemical shift (ppm) and 13C chemical shift (ppm).) The long INEPT transfer time allows observation of sequential interresidue correlations (inter) via 1JN(i)C′(i−1) (15 Hz), intraresidue correlations (intra) via 2JN(i)C′(i) (1 Hz), and through-hydrogen-bond correlations (H-bond) via h3JNC′ (1 Hz). For example, the 1HN nucleus of Lys25 shows a correlation to its own C′ as well as the C′ of its neighbor Phe24 and its hydrogen bond partner Val33.

restraints. The primary experimental restraints are interproton distances derived from NOESY crosspeak intensities (Section 9.09.5.2), dihedral angle restraints derived from either J coupling constants or database searches based on chemical shift information (Sections 9.09.5.3 and 9.09.5.4), and hydrogen bond restraints based on either measurement of amide-proton exchange rates or a long-range HNCO experiment (Section 9.09.5.5). Residual dipolar couplings can also be used to provide orientational restraints (see Chapter 9.07) but these are rarely used for peptides and small proteins that are the focus of this chapter. The direct use of chemical shifts in structure calculations has not yet become ‘mainstream’ but nevertheless this is a promising area of investigation that we discuss in Section 9.09.6.3.3. We shall first briefly consider how to parameterize NMR-derived conformational restraints and then examine the various types of computational approaches that can be used to derive protein structures based on NMR data.

9.09.6.2 Parameterization of NMR-Derived Conformational Restraints

The primary experimental aim in any protein structure determination via NMR is to collect enough conformational restraints so that the 3D structure can be uniquely reconstructed using a computer algorithm. In trying to parameterize the NMR-derived information about interproton distances, dihedral angles, and hydrogen bonds, it is important to remember that it is the quantity rather than the precision of the restraints that is important.34,122 Hence the parameterization should be conservative; overrestraining the distances and angle estimates is more likely to lead to errors than conservatively applied restraints. Our intention here is to provide a rough guide to such parameterization. Hydrogen bonds are usually parameterized using a pair of distance restraints, one between the amide proton and its acceptor carbonyl oxygen (typically 1.8–2.0 Å), and the other between the amide nitrogen and the carbonyl oxygen (typically 2.7–3.0 Å).166 However, hydrogen bonds in proteins can be considerably longer, as well as shorter,167–169 than implied by these restraints and hence they are overly restrictive. Thus, we recommend setting hydrogen bond restraints of 1.7–2.2 and 2.7–3.2 Å for the HN–O and N–O distances,


respectively. (If one is using the program CYANA114 to automatically generate hydrogen bond restraints, then these distances will need to be edited in the H-bond.cya macro.) The parameterization of dihedral angle restraints will depend on the source of the angle estimates. As discussed in Section 9.09.5.3, it is common practice in homonuclear NMR studies to restrain the backbone φ dihedral angle to −120° ± 30° and −60° ± 20° for 3J(HNHα) > 8 Hz and 3J(HNHα) < 6 Hz, respectively.127 However, more fine-grained angle estimates can be obtained in heteronuclear NMR studies when programs such as TALOS138 are used to match chemical shifts against a database. In these cases, each estimate of the φ and ψ dihedral angles has an associated error, which is simply the standard deviation of the set of dihedral angles derived from the database matches. This error can range from as little as a few degrees to 50° or more. Our experience has shown that doubling the error estimates from TALOS (i.e., using an error of 2 rather than 1 standard deviation) produces reliable results and avoids over-restraining the structures.2,170 In both homonuclear and heteronuclear NMR studies, it is usual to restrain the side chain χ1 dihedral angle to ±20° or ±30° around the preferred rotamer determined from analysis of coupling constants and NOEs as outlined in Section 9.09.5.4. Interproton distances are the dominant conformational restraints derived from NMR experiments, and hence their parameterization is important. As discussed in Section 9.09.5.2, it is still relatively common practice to ‘manually’ partition interproton distance restraints into broad categories such as 1.8–2.8, 1.8–3.5, and 1.8–5.0 Å for strong, medium, and weak NOEs. However, this approach is not recommended since it provides only a very coarse-grained set of restraints and it is both time consuming and somewhat arbitrary.
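The hydrogen bond and dihedral angle guidelines given above translate directly into restraint bounds. The Python sketch below is a minimal illustration; the function and constant names are hypothetical and do not correspond to CYANA or TALOS input formats. It assumes that the backbone angle restrained from 3J(HNHα) is φ, with targets of −120° ± 30° and −60° ± 20°.

```python
from typing import Dict, Optional, Tuple

# Recommended hydrogen bond distance bounds in angstroms (lower, upper),
# taken from the values quoted in the text.
HBOND_BOUNDS: Dict[str, Tuple[float, float]] = {
    "HN-O": (1.7, 2.2),
    "N-O": (2.7, 3.2),
}


def phi_restraint(j_hnha_hz: float) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) phi bounds in degrees from 3J(HN,Halpha).

    > 8 Hz -> -120 +/- 30; < 6 Hz -> -60 +/- 20; otherwise no restraint.
    """
    if j_hnha_hz > 8.0:
        return (-150.0, -90.0)
    if j_hnha_hz < 6.0:
        return (-80.0, -40.0)
    return None


def talos_restraint(angle: float, sd: float, scale: float = 2.0) -> Tuple[float, float]:
    """Widen a database-derived angle estimate to +/- scale * sd.

    scale=2.0 follows the suggestion to double the reported error.
    """
    return (angle - scale * sd, angle + scale * sd)


print(phi_restraint(9.1))             # large coupling, extended region
print(phi_restraint(4.8))             # small coupling, helical region
print(talos_restraint(-65.0, 10.0))   # hypothetical TALOS estimate
```

Couplings between 6 and 8 Hz return no restraint in this sketch, which is consistent with the conservative approach advocated in this section.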
Regardless of whether restraints are to be derived from 2D or 3D NOESY spectra, it is better to integrate the crosspeaks and use a structure calculation program such as CYANA114 to automatically derive interproton distance estimates from these crosspeak intensities using a uniform-averaging-type model that uses an internal calibration and takes account of the type of NOE (i.e., backbone–backbone, backbone–side chain, or side chain–side chain). Adjustments to the derived parameterization can then be easily made if it is believed that the derived distances are too tight or too loose (although the latter is much less of a problem than the former). Finally, it is important to add pseudoatom corrections121 for pairs of methylene protons and Leu/Val methyl groups that have not been stereospecifically assigned. This can be done automatically by programs such as CYANA114 during the process of converting NOE intensities into interproton distance restraints.

9.09.6.3 Structure Calculation Methods

Although the first protein structure determined using NMR was reported in 1985,171 there is still no consensus method for deriving a 3D structure from NMR-derived conformational restraints. Indeed, a comprehensive overview of the variety of approaches and computer programs available for calculating protein structures using NMR-derived restraints is beyond the scope of this chapter. Rather, we will provide a brief overview of the two most commonly used structure calculation methods, torsion angle dynamics and simulated annealing.

9.09.6.3.1 Torsion angle dynamics

Torsion angle dynamics (TAD) programs, such as DISMAN172 and its refined descendants DIANA,173 DYANA,174 and CYANA,114,175 operate by minimizing a variable target function in torsion angle space. The programs begin with a random 3D structure generated on the basis of the known amino acid sequence of the protein and standard bond lengths and angles. The starting structure is then refined by varying the torsion (dihedral) angles in order to minimize a variable target function that includes terms for the various types of experimental restraints. For example, the part of the target function (T) dealing with violations of upper distance bounds in DIANA is173

$$T = w_u \sum \left[ \Theta_u \left( \frac{d_{ij}^{2} - u_{ij}^{2}}{2u_{ij}} \right) \right]^{2} \tag{10}$$

where dij is the distance between atoms i and j in the current structure, uij is the upper bound on this distance, wu is a weighting factor for upper-bound violations, and Θu is the Heaviside (step) function, which equals 0 if dij ≤ uij or 1 if dij > uij. The target function contains similar terms for experimentally derived lower distance


bounds, a term for dihedral angle restraints, and a van der Waals’ repulsion term that places a lower limit on interatomic distances in order to avoid unfavorable steric clashes; the latter term is a ‘soft’ model for the more computationally expensive repulsive term in the Lennard-Jones 6–12 potential (see Equation (11)). The problem with trying to minimize the target function by introducing all restraints simultaneously is that the function will have many local minima. The variable target function approach was introduced by Braun and Go172 in an attempt to alleviate this problem. Instead of introducing all restraints simultaneously, one first optimizes using only local restraints (such as intraresidue and sequential restraints), and then introduces sequentially more long-range restraints until they have all been added into the calculation. This has the effect of optimizing the local conformation prior to determining the overall fold of the protein. Since torsion angles are the only independent variables in these calculations, TAD is less computationally intensive than the metric–matrix distance geometry approach176 that was popular in the early days of protein NMR but which has now fallen out of favor.
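The upper-bound term of the target function can be made concrete with a few lines of Python. This is a sketch of that single term only, written in Cartesian coordinates for clarity; it is not the DIANA implementation, which works in torsion angle space, and the function name, data layout, and example coordinates are hypothetical. The sum is taken over restrained atom pairs, with each violated restraint contributing the square of (d² − u²)/(2u).

```python
import math
from typing import Dict, Sequence, Tuple

Coord = Tuple[float, float, float]


def upper_bound_target(
    coords: Sequence[Coord],
    upper_bounds: Dict[Tuple[int, int], float],
    w_u: float = 1.0,
) -> float:
    """Sum the upper-bound violation terms for all restrained atom pairs.

    coords:        Cartesian coordinates, indexed by atom number.
    upper_bounds:  (i, j) atom index pair -> upper distance bound u_ij.
    w_u:           weighting factor for upper-bound violations.
    """
    total = 0.0
    for (i, j), u in upper_bounds.items():
        d = math.dist(coords[i], coords[j])
        if d > u:  # the step function is 1 only when the bound is violated
            total += ((d * d - u * u) / (2.0 * u)) ** 2
    return w_u * total


# Hypothetical three-atom example: one violated and one satisfied restraint.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
bounds = {(0, 1): 4.0, (0, 2): 4.0}
print(upper_bound_target(coords, bounds))
```

For small violations (d² − u²)/(2u) is close to d − u, so each term behaves like a squared distance violation while avoiding a square root in the derivative.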

9.09.6.3.2 Dynamical simulated annealing

Dynamical simulated annealing (DSA)177 is a variant of restrained molecular dynamics (RMD).178 There are numerous programs available for performing molecular dynamics (MD) simulations, including GROMOS,178 AMBER,179 CHARMM,180 X-PLOR/CNS,181 and OPLS.182 In MD simulations, Newton’s equations of motion are solved for all atoms under the influence of a physical force field (Vphysical), which for a protein has the form183

$$
\begin{aligned}
V_{\mathrm{physical}} ={}& \sum_{\mathrm{bonds}} \tfrac{1}{2} K_b (b - b_0)^2 + \sum_{\mathrm{angles}} \tfrac{1}{2} K_\theta (\theta - \theta_0)^2 \\
&+ \sum_{\substack{\mathrm{improper}\\ \mathrm{dihedrals}}} \tfrac{1}{2} K_\xi (\xi - \xi_0)^2 + \sum_{\mathrm{dihedrals}} K_\varphi \left[ 1 + \cos(n\varphi - \delta) \right] \\
&+ \sum_{\mathrm{pairs}(i,j)} \left[ \frac{C_{12}(i,j)}{r_{ij}^{12}} - \frac{C_{6}(i,j)}{r_{ij}^{6}} + \frac{q_i q_j}{4\pi\varepsilon_0 \varepsilon_r r_{ij}} \right]
\end{aligned}
\tag{11}
$$

where the K terms are force constants. The first term is a harmonic potential representing covalent bond stretching along bond b; the force constant Kb and minimum-energy bond length b0 vary with the type of covalent bond. A similar term is used to describe bending of bond angles (θ). Two forms are used to describe distortions of dihedral angles: a harmonic term is used for dihedral angles that are not allowed to make transitions (e.g., dihedral angles within aromatic rings), whereas a cosinusoidal term is used for dihedral angles φ that may make 360° turns. The final term is a sum over all pairs of nonbonded interatomic interactions: the first part sums the van der Waals interactions (a typical Lennard-Jones 6–12 potential) and the second part sums all electrostatic (Coulombic) interactions. There are numerous variants of this physical force field, including explicit inclusion of terms for hydrogen bonds. The general RMD strategy for refining protein structures based on NMR-derived restraints is to add restraining potentials to the force field so that the structure can be refined against both the covalent geometry and nonbonded interactions (i.e., Vphysical) as well as terms representing the experimentally derived distance (Vdistances) and dihedral angle (Vdihedral) restraints, namely,

$$V_{\mathrm{total}} = V_{\mathrm{physical}} + V_{\mathrm{distances}} + V_{\mathrm{dihedral}} \tag{12}$$
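To make the individual terms of Equations (11) and (12) concrete, the following Python sketch evaluates each type of term for a single interaction. It is an illustration only: the function names are hypothetical, the flat-bottomed harmonic form used for the distance restraint is an assumption (the text does not specify the form of Vdistances), and the default Coulomb prefactor is the approximate value of 1/(4πε0) in kJ mol−1 nm e−2, which presumes GROMOS-style units.

```python
import math


def harmonic(k: float, x: float, x0: float) -> float:
    """Harmonic term used for bonds, angles, and improper dihedrals."""
    return 0.5 * k * (x - x0) ** 2


def proper_dihedral(k: float, n: int, phi: float, delta: float) -> float:
    """Cosinusoidal term for dihedrals that may rotate through 360 degrees.

    phi and delta are in radians.
    """
    return k * (1.0 + math.cos(n * phi - delta))


def nonbonded(c12: float, c6: float, qi: float, qj: float, r: float,
              eps_r: float = 1.0, coulomb_prefactor: float = 138.935) -> float:
    """Lennard-Jones 6-12 plus Coulomb energy for one atom pair.

    coulomb_prefactor approximates 1/(4*pi*eps0) in kJ mol^-1 nm e^-2
    (an assumption about units; adjust for other unit systems).
    """
    lennard_jones = c12 / r ** 12 - c6 / r ** 6
    coulomb = coulomb_prefactor * qi * qj / (eps_r * r)
    return lennard_jones + coulomb


def distance_restraint(k: float, d: float, upper: float) -> float:
    """Assumed flat-bottomed harmonic penalty: zero unless d exceeds upper."""
    return 0.0 if d <= upper else 0.5 * k * (d - upper) ** 2


def v_total(v_physical: float, v_distances: float, v_dihedral: float) -> float:
    """Equation (12): physical force field plus experimental restraint terms."""
    return v_physical + v_distances + v_dihedral


print(harmonic(2.0, 1.5, 1.0))
print(proper_dihedral(1.0, 3, 0.0, 0.0))
print(distance_restraint(10.0, 0.55, 0.50))
```

A full force field simply sums such terms over every bond, angle, dihedral, and nonbonded pair in the molecule.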

A restrained molecular dynamics simulation using this expanded force field is performed using random or TAD-generated structures as the starting point for the simulation. The motion of the molecule is simulated for sufficient time to enable it to sample large regions of conformational space with a view toward converging on the structure with the global energy minimum or somewhere close to it by the end of the simulation. In the final stage of the RMD simulation, the structures are energy-minimized. DSA is similar to high-temperature RMD except that the nonbonded van der Waals’ and Coulombic terms in the force field (i.e., the last term in Equation (11)) are replaced with a simple quadratic van der Waals’ repulsive term (Vrepel) with repulsive force constant krepel. The first stage of the simulation is performed at very high temperature (1000 K is typical) with krepel set to a very low value so that atoms can move freely under the


influence of the experimental terms, even being allowed to pass ‘through’, or very near to, each other. The value of krepel is gradually incremented to reduce nonbonded contacts and then the temperature is lowered once krepel reaches its maximum value. Finally, the structures are energy-minimized in a full RMD-type force field (i.e., the Vrepel term is replaced by the full Lennard-Jones and Coulombic terms given in Equation (11)). The advantage of DSA over RMD is that the initial high temperature, combined with weak constraints on nonbonded interactions, enables the molecule to sample regions of conformational space that would be energetically inaccessible in classical room temperature RMD. Thus, the molecules are more likely to reach the global energy minimum corresponding to a structure with good covalent geometry, favorable nonbonded interactions, and minimal violations of the experimental constraints. Of course, one could use randomly generated initial conformations to calculate structures using RMD or DSA. However, the disadvantage of this approach is that much computational time is wasted in calculating atomic trajectories for structures that are far removed from those that will ultimately satisfy the experimental constraints. The structures derived via the comparatively rapid TAD approach, on the other hand, usually satisfy the large majority of experimental restraints, thus minimizing the amount of computationally intensive RMD/DSA that is necessary to produce final refined structures. Hence, it is fairly common nowadays to generate initial structures via TAD and then to refine these using DSA. Note, however, that the computational time for the DSA approach is reduced relative to RMD due to the simplified force field, and consequently many groups have found it profitable to generate final conformations from random starting structures using only DSA.
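The two-stage DSA protocol described above (ramp krepel at high temperature, then cool once krepel has reached its maximum) can be summarized as a schedule of (temperature, krepel) pairs. The Python sketch below is purely schematic: apart from the 1000 K starting temperature mentioned in the text, every number, the geometric ramp, and the linear cooling are assumptions chosen for illustration, and the function does not correspond to any particular program's protocol.

```python
from typing import List, Tuple


def dsa_schedule(
    t_high: float = 1000.0,   # high-temperature stage (K), as in the text
    t_final: float = 300.0,   # assumed final temperature (K)
    k_min: float = 0.001,     # assumed initial (very weak) repulsion
    k_max: float = 1.0,       # assumed final repulsion
    ramp_steps: int = 5,      # must be at least 2
    cool_steps: int = 5,
) -> List[Tuple[float, float]]:
    """Return (temperature, k_repel) pairs for a schematic DSA run."""
    schedule: List[Tuple[float, float]] = []

    # Stage 1: constant high temperature, k_repel increased geometrically.
    ratio = (k_max / k_min) ** (1.0 / (ramp_steps - 1))
    for step in range(ramp_steps):
        schedule.append((t_high, k_min * ratio ** step))

    # Stage 2: k_repel held at its maximum while the system is cooled.
    for step in range(1, cool_steps + 1):
        temperature = t_high + (t_final - t_high) * step / cool_steps
        schedule.append((temperature, k_max))

    return schedule


for temperature, k_repel in dsa_schedule():
    print(f"T = {temperature:7.1f} K   k_repel = {k_repel:.4f}")
```

Each pair would parameterize one block of restrained dynamics; the final energy minimization in the full force field is a separate step that the schedule does not cover.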

9.09.6.3.3 Chemical shifts as structural restraints

The resonance frequencies, or chemical shifts, of various atoms are typically used only for assignment of individual atoms so that structural parameters specific to those atoms can be extracted from the NMR data. The chemical shift itself however is a measure of the local magnetic field experienced by the nucleus, which is dependent on

1. the electronic configuration of the atom itself, which is influenced by neighboring atoms (e.g., partial atomic charges, steric interactions, hydrogen bonding),
2. local magnetic fields due to anisotropic fields generated by nearby atoms (e.g., currents from aromatic rings and carbonyl groups), and
3. bulk solvent properties and direct intermolecular interactions.

Thus, the chemical shift itself is an incredibly rich and precise source of information. Indeed, these shifts can be thought of as a unique fingerprint of the molecule under the conditions of the measurement. The obvious question that arises is why a 3D structure cannot be derived from chemical shifts. In theory, there is no reason why this should not be possible; however, in practice, the ab initio models required for accurately calculating the above contributions in a dynamic and complex system such as a protein, in particular when solvent effects are considered, are beyond our current computational capabilities. All is not lost however, as the fact that proteins consist of a limited number of residue types that are connected through the same repetitive bonding structure allows for derivation of vastly simplified empirical models for approximating the abovementioned complex contributions. The earliest models simply correlated the HN and Hα chemical shift to hydrogen bonding and secondary structure of the protein.136,137,184–189 For example, the Hα protons in α-helices display a marked upfield shift relative to random coil peptides,34 while the opposite is true for residues in β-sheets.
Subsequent studies correlated the secondary chemical shift of 13C′, 13Cα, 13Cβ, and amide nitrogen nuclei to various backbone torsion angles.135,136,190–198 Recently, by coupling this approach with some form of empirical Monte Carlo structure calculation protocol (force field or de novo structure determination), several investigators have been able to determine protein structures from chemical shift data alone.199–202 These approaches require only assignment of the backbone nuclei plus Cβ in order to predict the 3D fold of the protein. Side chain assignments, which are generally more difficult to obtain, are not required. However, at present, these approaches cannot produce high-resolution protein structures that would be useful for applications such as structure-based drug design.
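The simplest of the empirical correlations mentioned above, the upfield shift of Hα in helices and the downfield shift in sheets, amounts to comparing the observed shift with a random coil reference. The Python sketch below illustrates the idea for a single residue; the function name, the 0.1 ppm threshold, and the example shifts are assumptions made for illustration, and a real analysis would use residue-specific random coil values from a published table and would consider runs of consecutive residues rather than single values.

```python
def classify_ha_shift(observed_ppm: float, random_coil_ppm: float,
                      threshold: float = 0.1) -> str:
    """Classify one residue from its H-alpha secondary chemical shift.

    The secondary shift is observed minus random coil. An upfield
    (negative) shift suggests helix; a downfield (positive) shift
    suggests sheet. The threshold is an assumed cutoff in ppm.
    """
    secondary_shift = observed_ppm - random_coil_ppm
    if secondary_shift <= -threshold:
        return "helix-like"
    if secondary_shift >= threshold:
        return "sheet-like"
    return "coil-like"


# Hypothetical shifts; 4.35 ppm stands in for a random coil reference value.
print(classify_ha_shift(4.00, 4.35))  # upfield of random coil
print(classify_ha_shift(4.80, 4.35))  # downfield of random coil
print(classify_ha_shift(4.36, 4.35))  # within the threshold
```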


9.09.6.4 Assessing the Quality of Structures Derived from NMR Data

Since NMR-based structure calculation methods generate a family of structures that ‘satisfy’ the NMR-derived structural restraints, it has become common practice to assess the quality of structures by measuring the root mean squared deviation (RMSD) of individual structures from the mean structure. However, this is a measure of the precision of the structures rather than their accuracy. Moreover, the global RMSD is not a particularly good measure of precision as it does not discriminate between a structure that is poorly reproduced on average and one that is accurately reproduced except for a single ill-defined segment. Residue-by-residue, segment-by-segment, or domain-by-domain RMSD comparisons are often better indicators of precision, although they lack the visual impact of an overlay of an ensemble of structures based on minimization of the global RMSD. Measurement of the accuracy of NMR-derived structures is a much more difficult task than estimating their precision. An absolute measure of the accuracy of an NMR-derived structure is not possible in the absence of any knowledge about the ‘true’ structure and therefore it has to be measured by some statistic.203 One advantage of iterative relaxation matrix analysis (IRMA),204,205 in which the structure is iteratively refined by comparison of the experimental NOESY spectrum with a synthetic spectrum back-calculated from the coordinates of the current structural model, is that it enables an NMR ‘R factor’ to be calculated,203,205 which is analogous to the R factor (or reliability index) used in crystallography. However, IRMA is not widely used for structure calculations and hence NMR R factors are rarely reported.
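The difference between a global RMSD and a position-by-position RMSD is easy to see numerically. The Python sketch below assumes that the ensemble has already been superimposed (no fitting is performed) and uses one coordinate per position for brevity; the function names and the toy two-model ensemble are hypothetical, and the global value is computed here by pooling all positions and models, which is only one of several conventions in use.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Model = Sequence[Coord]


def mean_structure(ensemble: Sequence[Model]) -> List[Coord]:
    """Average the coordinates of each position over all models."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def per_atom_rmsd(ensemble: Sequence[Model]) -> List[float]:
    """RMSD from the mean structure, reported separately for each position."""
    mean = mean_structure(ensemble)
    n_models = len(ensemble)
    return [
        math.sqrt(sum(math.dist(model[a], mean[a]) ** 2 for model in ensemble) / n_models)
        for a in range(len(mean))
    ]


def global_rmsd(ensemble: Sequence[Model]) -> float:
    """Single RMSD pooled over all positions and models."""
    per_atom = per_atom_rmsd(ensemble)
    return math.sqrt(sum(r * r for r in per_atom) / len(per_atom))


# Toy ensemble: position 0 is perfectly defined, position 1 is not.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(per_atom_rmsd(ensemble))
print(global_rmsd(ensemble))
```

In the toy ensemble the per-position values immediately identify which position is ill-defined, whereas the single global number does not.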
The most reliable indicator of the quality of an NMR-derived structure is its stereochemical merit as judged by programs such as PROCHECK-NMR,206,207 WHAT IF,208 and MolProbity.209 PROCHECK reports numerous measures of stereochemical merit, including Ramachandran plot quality, deviations of bond lengths, bond angles, and dihedral angles from ideality, unfavorable side chain rotamers, and bad nonbonded interactions. An added bonus is that the program cleans up the coordinate files for submission to the PDB by ensuring that the atom labels conform to IUPAC-IUB nomenclature and by performing some basic stereochemical checks on the file. MolProbity, which can be accessed online at http://molprobity.biochem.duke.edu, additionally offers all-atom contact analysis and more detailed Ramachandran and side chain rotamer analysis. It also provides an overall ‘MolProbity score’ that allows the structure to be ranked on a percentile basis against other structures in the PDB. A MolProbity score that caused a structure to be ranked in the 25th percentile or lower would be a cause for concern, and it should provoke a detailed analysis of the MolProbity output, including any bad steric clashes, poor rotamer distributions, or less than 80% of residues in the most favored region of the Ramachandran plot. A word of caution, however, is warranted when using these programs. In contrast with X-ray crystallography, where highly dynamic regions of the protein do not appear in the electron density maps and thus are omitted from the final coordinate file, all regions of the protein are modeled in NMR structure calculations. Highly dynamic regions of the protein, in which multiple conformations are accessed during the timescale of the NMR experiment, will have either a completely ill-defined conformation due to the lack of NOE information or an unrealistic one due to time- and population-weighted averaging of the NOEs and coupling constants. 
These regions of the protein are likely to have poor Ramachandran plot quality and bad side chain rotamer distributions, but these analyses are meaningless when applied to such mobile regions. Thus, if these regions are included in the PROCHECK or MolProbity analysis, they will reduce the overall stereochemical quality of the structural ensemble and may give a false indication of the quality of the well-structured region of the protein or peptide. Thus, these regions should be ‘omitted’ from the stereochemical analysis, just as they effectively are in the analysis of X-ray crystal structures.

9.09.7 Conclusions and Future Prospects

NMR is unrivaled in its ability to provide structural information on peptides and small proteins, as evidenced by the fact that it accounts for 75% of the structures with mass less than 5 kDa deposited in the PDB. Moreover, it has the distinct advantage of also being able to provide information about protein dynamics and intermolecular interactions, as detailed elsewhere in this volume. Perhaps its major disadvantage relative to X-ray crystallographic approaches is that it is relatively slow and involves a great deal of user intervention.


Thus, much effort is currently being devoted to speeding up data acquisition and automating the process of spectral assignment and structure determination.210 There are numerous methods that have been developed for expediting data acquisition, including projection reconstruction,211,212 multiway decomposition,213 GFT NMR,214 and nonuniform sampling (NUS).63,215 We routinely use the last approach in our laboratory and have found that it can reduce the time for acquiring 3D triple resonance experiments such as the CBCA(CO)HN from a few days to a few hours. Although the raw data must be processed via maximum entropy reconstruction (MaxEnt)63,216,217 rather than a conventional Fourier transform, it yields conventional spectra that are amenable to either classical manual analysis or automated assignment approaches; for example, the spectra shown in Figure 16 were collected using NUS and processed using MaxEnt. A distinct advantage of these rapid data acquisition approaches is that they often allow data to be collected with higher digital resolution in the indirect dimensions, which facilitates automated spectral assignment. For example, using the NUS/MaxEnt approach, it takes less than 20 h to acquire a set of 2D HNCO, 3D CBCA(CO)HN, and 3D HNCACB spectra that are of sufficient quality for the online PINE server (http://pine.nmrfam.wisc.edu)218 to routinely achieve 100% sequence-specific backbone assignment. Programs such as CYANA114,175,219 and ARIA220,221 also now provide the ability to automatically assign NOESY spectra and calculate structures, which dramatically improves the speed of the NMR structure determination process since, particularly for homonuclear NMR, much more time is usually spent analyzing the data than collecting the data.
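The time saving offered by nonuniform sampling can be made concrete with a toy schedule generator. This is a minimal sketch under stated assumptions, not the sampling scheme used by the authors: it draws a uniformly random subset of increments in a single indirect dimension (practical schedules are often weighted toward early increments), the function name `nus_schedule` and the 48 h reference duration are hypothetical, and acquisition time is assumed to scale linearly with the number of increments collected.

```python
import random


def nus_schedule(n_increments, fraction, seed=0):
    """Pick a sorted random subset of indirect-dimension increments.

    The first increment (index 0) is always kept; the rest are drawn
    uniformly at random without replacement.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(n_increments * fraction))
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    others = rng.sample(range(1, n_increments), n_keep - 1)
    return sorted([0] + others)


schedule = nus_schedule(n_increments=128, fraction=0.25)
full_hours = 48.0  # hypothetical duration of the uniformly sampled experiment
nus_hours = full_hours * len(schedule) / 128
print(len(schedule), "of 128 increments;", nus_hours, "h instead of", full_hours, "h")
```

Because the sampled points no longer lie on a regular grid, data collected this way cannot be processed with a conventional Fourier transform, which is why a reconstruction method such as MaxEnt is needed.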
In many ways, the CANDID module222 employed in CYANA mimics the iterative manual approach: a set of initial NOESY assignments is made based on various criteria (including the possibility of a crosspeak being assigned to multiple NOEs), a structure is calculated based on these assignments plus any dihedral angle and hydrogen bond information input to the program, the NOEs are adjusted based on the initial set of structures, and then the process is repeated through a cycle of seven iterations to produce the final ensemble of structures. In contrast with the manual approach, which can take weeks or even months, the automated process performed by CYANA takes 40 min on a laptop for a peptide of 5 kDa, or just a few minutes on even a modest server. We strongly recommend this approach if the speed of structure determination is a concern. In conclusion, while NMR remains the dominant technique for determining the structures of peptides and small proteins, there are numerous developments that promise to dramatically improve the rate at which a protein structure can be determined using NMR. We predict that it will not be long before peptide/protein structure determination within 1 week becomes relatively routine.
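The assign, calculate, and refine cycle described above for CANDID can be sketched as a simple loop. The code below shows only the control flow and is not the CANDID algorithm or the CYANA interface: `iterative_assignment`, the callback signatures, the 5 Å cutoff, and the toy distance table standing in for a structure calculation are all assumptions made for illustration.

```python
def iterative_assignment(peaks, calculate_structure, distance, n_cycles=7, cutoff=5.0):
    """Iteratively prune ambiguous NOESY assignments against a model.

    peaks: dict mapping peak id -> list of candidate atom pairs.
    calculate_structure(assignments) -> model (any object).
    distance(model, pair) -> interatomic distance in angstroms.
    """
    # Start with every candidate allowed for every peak (fully ambiguous).
    assignments = {pid: list(cands) for pid, cands in peaks.items()}
    model = None
    for _ in range(n_cycles):
        model = calculate_structure(assignments)
        for pid, cands in assignments.items():
            kept = [pair for pair in cands if distance(model, pair) <= cutoff]
            if kept:  # never discard every candidate for a peak
                assignments[pid] = kept
    return assignments, model


# Toy stand-in for a structure calculation: a fixed table of distances.
toy_distances = {
    ("HA3", "HN4"): 2.8,
    ("HA3", "HN17"): 12.0,
    ("HB7", "HN8"): 3.5,
}
peaks = {
    "peak1": [("HA3", "HN4"), ("HA3", "HN17")],  # ambiguous crosspeak
    "peak2": [("HB7", "HN8")],
}
final, _ = iterative_assignment(
    peaks,
    calculate_structure=lambda assignments: toy_distances,
    distance=lambda model, pair: model[pair],
)
print(final)
```

In a real calculation the model changes from cycle to cycle as assignments are pruned, which is what makes repeated iterations worthwhile; the fixed table here converges after the first pass.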

Acknowledgments

Peptide studies reported from this laboratory were supported by Australian Research Council Discovery Grant DP0774245 to G.F.K. We thank Dr. Scott Robson for proofreading an earlier version of this chapter and for making numerous helpful suggestions.

Abbreviations

2D: two-dimensional
3D: three-dimensional
AUC: analytical ultracentrifugation
BMRB: Biological Magnetic Resonance Data Bank
CC: cryogenically cooled
COSY: correlated spectroscopy
DQFCOSY: double-quantum-filtered correlated spectroscopy
DSA: dynamical simulated annealing
DTT: dithiothreitol
HMBC: heteronuclear multiple bond correlation
HMQC: heteronuclear multiple quantum coherence
HOHAHA: homonuclear Hartmann–Hahn spectroscopy
HSQC: heteronuclear single quantum coherence
INEPT: insensitive nuclei enhanced by polarization transfer
IRMA: iterative relaxation matrix analysis
ISPA: isolated spin-pair approximation
MALLS: multiangle laser light scattering
MaxEnt: maximum entropy reconstruction
MD: molecular dynamics
NMR: nuclear magnetic resonance
NOE: nuclear Overhauser enhancement
NOESY: nuclear Overhauser enhancement spectroscopy
NUS: nonuniform sampling
PFG: pulsed-field gradient
PFGSE: pulsed-field-gradient spin-echo
RDC: residual dipolar coupling
RMD: restrained molecular dynamics
RMSD: root mean squared deviation
SNR: signal-to-noise ratio
TAD: torsion angle dynamics
TCEP: tris(2-carboxy-ethyl)phosphine
TOCSY: total correlation spectroscopy

Nomenclature

T1: longitudinal relaxation time
T2: transverse relaxation time
γ: gyromagnetic ratio
τc: molecular correlation time

References

1. G. J. Howlett; A. P. Minton; G. Rivas, Curr. Opin. Chem. Biol. 2006, 10, 430–436.
2. S. L. Rowland; W. F. Burkholder; K. A. Cunningham; M. W. Maciejewski; A. D. Grossman; G. F. King, Mol. Cell 2004, 13, 689–701.
3. A. J. Dingley; J. P. Mackay; B. E. Chapman; M. B. Morris; P. W. Kuchel; B. D. Hambly; G. F. King, J. Biomol. NMR 1995, 6, 321–328.
4. G. Otting, J. Biomol. NMR 2008, 42, 1–9.
5. W. U. Primrose, Sample Preparation. In NMR of Macromolecules; G. C. K. Roberts, Ed.; IRL Press: Oxford, 1993.
6. Y. Bai; J. S. Milne; L. Mayne; S. W. Englander, Proteins Struct. Funct. Genet. 1993, 17, 75–86.
7. A. E. Kelly; H. D. Ou; R. Withers; V. Dotsch, J. Am. Chem. Soc. 2002, 124, 12013–12019.
8. B. Pan; Z. Deng; D. Liu; S. Ghosh; G. P. Mullen, Protein Sci. 1997, 6, 1237–1247.
9. A. P. Golovanov; G. M. Hautbergue; S. A. Wilson; L. Y. Lian, J. Am. Chem. Soc. 2004, 126, 8933–8939.
10. G. M. Hautbergue; A. P. Golovanov, J. Magn. Reson. 2008, 191, 335–339.
11. W. R. Croasmun; R. M. K. Carlson, Two-Dimensional NMR Spectroscopy: Applications for Chemists and Biochemists, 2nd ed.; VCH Publishers: New York, 1994.
12. W. S. Price, Annu. Rep. NMR Spectrosc. 1999, 38, 289–354.
13. M. H. Levitt, Concept Magn. Reson. A 1996, 8, 77–103.
14. P. Luginbuhl; K. Wüthrich, Prog. Nucl. Magn. Reson. Spectrosc. 2002, 40, 199–247.
15. N. Bloembergen; R. V. Pound, Phys. Rev. 1954, 95, 8–12.
16. X. Mao; C. Ye, Sci. China C Life Sci. 1997, 40, 345–350.
17. V. Sklenář, J. Magn. Reson. A 1995, 114, 132–135.

18. D. Abergel; C. Carlotti; J. Magn. Reson. B 1995, 109, 218–222.
19. C. Anklin; M. Rindlisbacher; G. Otting; F. H. Laukien, J. Magn. Reson. B 1995, 106, 199–201.
20. S. Y. Huang; C. Anklin; J. D. Walls; Y. Y. Lin, J. Am. Chem. Soc. 2004, 126, 15936–15937.
21. D. Neuhaus; I. M. Ismail; C. W. A. Chung, J. Magn. Reson. A 1996, 118, 256–263.
22. P. J. Hore, J. Magn. Reson. 1983, 55, 283–300.
23. M. Piotto; V. Saudek; V. Sklenář, J. Biomol. NMR 1992, 2, 661–665.
24. S. Grzesiek; A. Bax, J. Am. Chem. Soc. 1993, 115, 12593–12594.
25. J. Cavanagh; W. J. Fairbrother; A. G. Palmer, III; N. J. M. Skelton, Protein NMR Spectroscopy: Principles and Practice; Academic Press: San Diego, CA, 1996.
26. H. Barkhuijsen; R. de Beer; W. M. M. J. Bovée; D. van Ormondt, J. Magn. Reson. 1985, 61, 465–481.
27. R. E. Hurd, J. Magn. Reson. 1990, 87, 422–428.
28. G. Wider; K. Wüthrich, J. Magn. Reson. B 1993, 102, 239–241.
29. D. Marion; M. Ikura; A. Bax, J. Magn. Reson. 1989, 84, 425–430.
30. V. Sklenář; M. Piotto; R. Leppik; V. Saudek, J. Magn. Reson. A 1993, 102, 241–245.
31. E. Prost; P. Sizun; M. Piotto; J.-M. Nuzillard, J. Magn. Reson. 2002, 159, 76–81.
32. A. J. Simpson; S. A. Brown, J. Magn. Reson. 2005, 175, 340–346.
33. B. D. Nguyen; X. Meng; K. J. Donovan; A. J. Shaka, J. Magn. Reson. 2007, 184, 263–274.
34. K. Wüthrich, NMR of Proteins and Nucleic Acids; John Wiley & Sons: New York, 1986.
35. R. R. Ernst; G. Bodenhausen; A. Wokaun, Principles of Nuclear Magnetic Resonance in One and Two Dimensions; Clarendon Press: Oxford, 1987.
36. U. Piantini; O. W. Sorensen; R. R. Ernst, J. Am. Chem. Soc. 1982, 104, 6800–6801.
37. J. Jeener; B. H. Meier; P. Bachmann; R. R. Ernst, J. Chem. Phys. 1979, 71, 4546–4553.
38. A. Kumar; R. R. Ernst; K. Wüthrich, Biochem. Biophys. Res. Commun. 1980, 95, 1–6.
39. L. Braunschweiler; R. R. Ernst, J. Magn. Reson. 1983, 53, 521–528.
40. D. G. Davis; A. Bax, J. Am. Chem. Soc. 1985, 107, 2820–2821.
41. E. R. P. Zuiderweg; S. R. Van Doren, Trends Analyt. Chem. 1994, 13, 24–36.
42. G. C. K. Roberts, NMR of Macromolecules: A Practical Approach; Oxford University Press: New York, 1993.
43. J. Cavanagh; W. J. Chazin; M. Rance, J. Magn. Reson. 1990, 87, 110–131.
44. X.-H. Wang; M. Connor; R. Smith; M. W. Maciejewski; M. E. H. Howden; G. M. Nicholson; M. J. Christie; G. F. King, Nat. Struct. Biol. 2000, 7, 505–513.
45. A. Bax; D. G. Davis, J. Magn. Reson. 1985, 63, 207–213.
46. S. W. Englander; A. J. Wand, Biochemistry 1987, 26, 5953–5958.
47. G. W. Vuister; R. Boelens; R. Kaptein, J. Magn. Reson. 1988, 80, 176–185.
48. S. S. Wijmenga; C. P. M. Mierlo, Eur. J. Biochem. 1991, 195, 807–822.
49. R. Boelens; G. W. Vuister; T. M. G. Koning; R. Kaptein, J. Am. Chem. Soc. 1989, 111, 8525–8526.
50. G. W. Vuister; R. Boelens; A. Padilla; G. J. Kleywegt; R. Kaptein, Biochemistry 1990, 29, 1829–1839.
51. S. Meier; D. Haussinger; E. Pokidysheva; H. P. Bachinger; S. Grzesiek, FEBS Lett. 2004, 569, 112–116.
52. G. Bodenhausen; D. J. Ruben, Chem. Phys. Lett. 1980, 69, 185–189.
53. G. A. Morris; R. Freeman, J. Am. Chem. Soc. 1979, 101, 760–762.
54. A. Bax; R. H. Griffey; B. L. Hawkins, J. Am. Chem. Soc. 1983, 105, 7188–7190.
55. T. J. Norwood; J. Boyd; J. E. Heritage; N. Soffe; I. D. Campbell, J. Magn. Reson. 1990, 87, 488–501.
56. A. Bax; M. Ikura; L. E. Kay; D. A. Torchia; R. Tschudin, J. Magn. Reson. 1990, 86, 304–318.
57. A. D. Bax; S. Grzesiek, Acc. Chem. Res. 1993, 26, 1–138.
58. S. Grzesiek; A. Bax, J. Magn. Reson. 1992, 96, 432–440.
59. R. Powers; A. M. Gronenborn; G. M. Clore; A. Bax, J. Magn. Reson. 1991, 94, 209–213.
60. L. E. Kay, Curr. Opin. Struct. Biol. 1995, 5, 674–681.
61. A. G. Palmer, III; J. Cavanagh; P. E. Wright; M. Rance, J. Magn. Reson. 1991, 93, 151–170.
62. J. Cavanagh; M. Rance, J. Magn. Reson. 1990, 88, 72–85.
63. M. Mobli; J. C. Hoch, Concept Magn. Reson. A 2008, 32, 436–448.
64. T. Luan; V. Jaravine; A. Yee; C. Arrowsmith; V. Orekhov, J. Biomol. NMR 2005, 33, 1–14.
65. K. Pervushin, Q. Rev. Biophys. 2000, 33, 161–197.
66. E. R. P. Zuiderweg; S. W. Fesik, Biochemistry 1989, 28, 2387–2391.
67. D. Marion; L. E. Kay; S. W. Sparks; D. A. Torchia; A. Bax, J. Am. Chem. Soc. 1989, 111, 1515–1517.
68. S. W. Fesik; E. R. P. Zuiderweg, J. Magn. Reson. 1988, 78, 588–593.
69. D. Marion; P. C. Driscoll; L. E. Kay; P. T. Wingfield; A. Bax; A. M. Gronenborn; G. M. Clore, Biochemistry 1989, 28, 6150–6156.
70. S. W. Fesik; E. R. P. Zuiderweg, Q. Rev. Biophys. 1990, 23, 97–131.
71. A. M. Gronenborn; A. Bax; P. T. Wingfield; G. M. Clore, FEBS Lett. 1989, 243, 93–98.
72. M. Ikura; L. E. Kay; R. Tschudin; A. Bax, J. Magn. Reson. 1990, 86, 204–209.
73. M. R. Bendall; D. T. Pegg; D. M. Doddrell; J. Field, J. Am. Chem. Soc. 1981, 103, 934–936.
74. R. Freeman; T. H. Mareci; G. A. Morris, J. Magn. Reson. 1981, 42, 341–345.
75. A. P. Golovanov; R. T. Blankley; J. M. Avis; W. Bermel, J. Am. Chem. Soc. 2007, 129, 6528–6535.
76. Y. Muto; K. Yamasaki; Y. Ito; S. Yajima; H. Masaki; T. Uozumi; M. Wälchli; S. Nishimura; T. Miyazawa; S. Yokoyama, J. Biomol. NMR 1993, 3, 165–184.
77. L. P. McIntosh; A. J. Wand; D. F. Lowry; A. G. Redfield; F. W. Dahlquist, Biochemistry 1990, 29, 6341–6362.
78. V. Kanelis; J. D. Forman-Kay; L. E. Kay, IUBMB Life 2001, 52, 291–302.
79. A. E. Derome, Modern NMR Techniques for Chemistry Research; Pergamon Press: Oxford, 1987.
80. R. T. Clubb; V. Thanabal; G. Wagner, J. Magn. Reson. 1992, 97, 213–217.
81. H. Kessler; S. Mronga; G. Gemmecker, Magn. Reson. Chem. 1991, 29, 527–557.
82. L. E. Kay; G. M. Clore; A. Bax; A. M. Gronenborn, Science 1990, 249, 411–414.

83. M. Ikura; L. E. Kay; A. Bax, Biochemistry 1990, 29, 4659–4667.
84. M. Sattler; J. Schleucher; C. Griesinger, Prog. Nucl. Magn. Reson. Spectrosc. 1999, 34, 93–158.
85. A. Bax; M. Ikura, J. Biomol. NMR 1991, 1, 99–104.
86. R. T. Clubb; V. Thanabal; G. Wagner, J. Biomol. NMR 1992, 2, 203–210.
87. L. E. Kay; M. Ikura; A. Bax, J. Magn. Reson. 1991, 91, 84–92.
88. L. E. Kay; M. Wittekind; M. A. McCoy; M. S. Friedrichs; L. Mueller, J. Magn. Reson. 1992, 98, 443–450.
89. W. Boucher; E. D. Laue; S. L. Campbell-Burk; P. J. Domaille, J. Biomol. NMR 1992, 2, 631–637.
90. S. Grzesiek; A. Bax, J. Magn. Reson. 1992, 99, 201–207.
91. E. T. Olejniczak; R. X. Xu; A. M. Petros; S. W. Fesik, J. Magn. Reson. 1992, 100, 444–450.
92. M. Wittekind; L. Mueller, J. Magn. Reson. B 1993, 101, 201–205.
93. S. Grzesiek; A. Bax, J. Am. Chem. Soc. 1992, 114, 6291–6293.
94. S. Grzesiek; A. Bax, J. Biomol. NMR 1993, 3, 185–204.
95. S. Grzesiek; A. Bax, J. Magn. Reson. B 1993, 102, 103–106.
96. A. Palmer; W. Fairbrother; J. Cavanagh; P. E. Wright; M. Rance, J. Biomol. NMR 1992, 2, 103–108.
97. S. Seip; J. Balbach; S. Behrens; H. Kessler; K. Flukiger; De R. Meyer; B. Erni, Biochemistry 1994, 33, 7174–7183.
98. M. L. Remerowski; T. Domke; A. Groenewegen; H. A. M. Pepermans; C. W. Hilbers; F. J. M. Ven, J. Biomol. NMR 1994, 4, 257–278.
99. T. M. Logan; E. T. Olejniczak; R. X. Xu; S. W. Fesik, FEBS Lett. 1992, 314, 413–418.
100. G. T. Montelione; B. A. Lyons; S. D. Emerson; M. Tashiro, J. Am. Chem. Soc. 1992, 114, 10974–10975.
101. B. A. Lyons; G. T. Montelione, J. Magn. Reson. B 1993, 101, 206–209.
102. S. W. Fesik; H. L. Eaton; E. T. Olejniczak; E. R. P. Zuiderweg; L. P. McIntosh; F. W. Dahlquist, J. Am. Chem. Soc. 1990, 112, 886–888.
103. A. Bax; G. M. Clore; A. M. Gronenborn, J. Magn. Reson. 1990, 88, 425–431.
104. L. E. Kay; M. Ikura; A. Bax, J. Am. Chem. Soc. 1990, 112, 888–889.
105. A. Majumdar; H. Wang; R. C. Morshauser; E. R. P. Zuiderweg, J. Biomol. NMR 1993, 3, 387–397.
106. E. T. Olejniczak; R. X. Xu; S. W. Fesik, J. Biomol. NMR 1992, 2, 655–659.
107. T. Yamazaki; J. D. Forman-Kay; L. E. Kay, J. Am. Chem. Soc. 1993, 115, 11054–11055.
108. B. J. Stockman; N. R. Nirmala; G. Wagner; T. J. Delcamp; M. T. DeYarman; J. H. Freisheim, Biochemistry 1992, 31, 218–229.
109. C. Redfield; L. J. Smith; J. Boyd; G. M. P. Lawrence; R. G. Edwards; R. A. G. Smith; C. M. Dobson, Biochemistry 1991, 30, 11029–11035.
110. J. Anglister; S. Grzesiek; H. Ren; C. B. Klee; A. Bax, J. Biomol. NMR 1993, 3, 121–126.
111. G. M. Clore; A. M. Gronenborn, Crit. Rev. Biochem. Mol. Biol. 1989, 24, 479–564.
112. C. Eccles; P. Güntert; M. Billeter; K. Wüthrich, J. Biomol. NMR 1991, 1, 111–130.
113. P. Güntert; K. D. Berndt; K. Wüthrich, J. Biomol. NMR 1993, 3, 601–606.
114. P. Güntert, Methods Mol. Biol. 2004, 278, 353–378.
115. G. M. Clore; L. E. Kay; A. Bax; A. M. Gronenborn, Biochemistry 1991, 30, 12–18.
116. P. J. Kraulis; P. J. Domaille; S. L. Campbell-Burk; T. Van Aken; E. D. Laue, Biochemistry 1994, 33, 3515–3531.
117. M. J. M. Burgering; R. Boelens; D. E. Gilbert; J. N. Breg; K. L. Knight; R. T. Sauer; R. Kaptein, Biochemistry 1994, 33, 15036–15045.
118. A. M. Gronenborn; G. M. Clore, Prog. Nucl. Magn. Reson. Spectrosc. 1985, 17, 1–32.
119. I. L. Barsukov; L.-Y. Lian, Structure Determination from NMR Data I. In NMR of Macromolecules; G. C. K. Roberts, Ed.; Oxford University Press: New York, 1993; pp 315–357.
120. I. D. Kuntz; J. F. Thomason; C. M. Oshiro, Methods Enzymol. 1989, 177, 159–204.
121. K. Wüthrich; M. Billeter; W. Braun, J. Mol. Biol. 1983, 169, 949–961.
122. G. M. Clore; M. A. Robien; A. M. Gronenborn, J. Mol. Biol. 1993, 231, 82–102.
123. W. Braun; C. Bösch; L. R. Brown; N. Gō; K. Wüthrich, Biochim. Biophys. Acta 1981, 667, 377–396.
124. M. Karplus, J. Am. Chem. Soc. 1963, 85, 2870–2871.
125. A. Pardi; M. Billeter; K. Wüthrich, J. Mol. Biol. 1984, 180, 741–751.
126. J. S. Richardson, Adv. Protein Chem. 1981, 34, 167–339.
127. J. I. Fletcher; R. Smith; S. I. O’Donoghue; M. Nilges; M. Connor; M. E. H. Howden; M. J. Christie; G. F. King, Nat. Struct. Biol. 1997, 4, 559–566.
128. D. Marion; K. Wüthrich, Biochem. Biophys. Res. Commun. 1983, 113, 967–974.
129. D. Neuhaus; G. Wagner; M. Vasak; J. Kägi; K. Wüthrich, Eur. J. Biochem. 1985, 151, 257–273.
130. M. Billeter; D. Neri; G. Otting; Y. Q. Qian; K. Wüthrich, J. Biomol. NMR 1992, 2, 257–274.
131. D. Neri; G. Otting; K. Wüthrich, J. Am. Chem. Soc. 1990, 112, 3663–3665.
132. L. E. Kay; A. Bax, J. Magn. Reson. 1990, 86, 110–126.
133. G. W. Vuister; A. Bax, J. Am. Chem. Soc. 1993, 115, 7772–7777.
134. A. C. Wang; A. Bax, J. Am. Chem. Soc. 1995, 117, 1810–1813.
135. S. Spera; A. Bax, J. Am. Chem. Soc. 1991, 113, 5490–5492.
136. D. S. Wishart; B. D. Sykes; F. M. Richards, J. Mol. Biol. 1991, 222, 311–333.
137. D. S. Wishart; B. D. Sykes, J. Biomol. NMR 1994, 4, 171–180.
138. G. Cornilescu; F. Delaglio; A. Bax, J. Biomol. NMR 1999, 13, 289–302.
139. M. V. Berjanskii; S. Neal; D. S. Wishart, Nucleic Acids Res. 2006, 34, 63–69.
140. G. M. Clore; A. Bax; A. M. Gronenborn, J. Biomol. NMR 1991, 1, 13–22.
141. J. D. Forman-Kay; G. M. Clore; P. T. Wingfield; A. M. Gronenborn, Biochemistry 1991, 30, 2685–2698.
142. H. J. Dyson; G. P. Gippert; D. A. Case; A. Holmgren; P. E. Wright, Biochemistry 1990, 29, 4129–4136.
143. A. Demarco; M. Llinas; K. Wüthrich, Biopolymers 1978, 17, 637–650.
144. C. Griesinger; O. W. Soerensen; R. R. Ernst, J. Am. Chem. Soc. 1985, 107, 6394–6396.
145. L. Mueller, J. Magn. Reson. 1987, 72, 191–196.

146. A. Bax; D. Max; D. Zax, J. Am. Chem. Soc. 1992, 114, 6923–6925.
147. S. J. Archer; M. Ikura; D. A. Torchia; A. Bax, J. Magn. Reson. 1991, 95, 636–641.
148. A. Bax; G. W. Vuister; S. Grzesiek; F. Delaglio; A. C. Wang; R. Tschudin; G. Zhu, Methods Enzymol. 1994, 239, 79–105.
149. P. Düx; B. Whitehead; R. Boelens; R. Kaptein; G. W. Vuister, J. Biomol. NMR 1997, 10, 301–306.
150. A. Hvidt; S. O. Nielsen, Adv. Protein Chem. 1966, 21, 287–386.
151. S. W. Englander; N. R. Kallenbach, Q. Rev. Biophys. 1983, 16, 521–655.
152. F. K. Junius; J. P. Mackay; W. A. Bubb; S. A. Jensen; A. S. Weiss; G. F. King, Biochemistry 1995, 34, 6164–6174.
153. S. Grzesiek; H. Doebeli; R. Gentz; G. Garotta; A. M. Labhardt; A. Bax, Biochemistry 1992, 31, 8180–8190.
154. R. S. Molday; S. W. Englander; R. G. Kallen, Biochemistry 1972, 11, 150–158.
155. P. R. Blake; B. Lee; M. F. Summers; M. W. Adams; J. B. Park; Z. H. Zhou; A. Bax, J. Biomol. NMR 1992, 2, 527–533.
156. P. R. Blake; J. B. Park; M. W. W. Adams; M. F. Summers, J. Am. Chem. Soc. 1992, 114, 4931–4933.
157. F. Cordier; S. Grzesiek, J. Am. Chem. Soc. 1999, 121, 1601–1602.
158. F. Cordier; M. Rogowski; S. Grzesiek; A. Bax, J. Magn. Reson. 1999, 140, 510–512.
159. G. Cornilescu; J.-S. Hu; A. Bax, J. Am. Chem. Soc. 1999, 121, 2949–2950.
160. A. J. Dingley; S. Grzesiek, J. Am. Chem. Soc. 1998, 120, 8293–8297.
161. K. Pervushin; A. Ono; C. Fernández; T. Szyperski; M. Kainosho; K. Wüthrich, Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 14147–14151.
162. S. Grzesiek; F. Cordier; A. J. Dingley, Methods Enzymol. 2001, 338, 111–133.
163. A. J. Dingley; F. Cordier; V. A. Jaravine; S. Grzesiek, Scalar Couplings Across Hydrogen Bonds. In BioNMR in Drug Research; O. Zerbe, Ed.; Wiley-VCH: Weinheim, 2003; pp 207–226.
164. F. Cordier; L. Nisius; A. J. Dingley; S. Grzesiek, Nat. Protoc. 2008, 3, 235–241.
165. Y. X. Wang; J. Jacob; F. Cordier; P. Wingfield; S. J. Stahl; S. Lee-Huang; D. Torchia; S. Grzesiek; A. Bax, J. Biomol. NMR 1999, 14, 181–184.
166. G. Wagner; W. Braun; T. F. Havel; T. Schaumann; N. Gō; K. Wüthrich, J. Mol. Biol. 1987, 196, 611–639.
167. E. N. Baker; R. E. Hubbard, Prog. Biophys. Mol. Biol. 1984, 44, 97–179.
168. I. K. McDonald; J. M. Thornton, J. Mol. Biol. 1994, 238, 777–793.
169. R. S. Lipsitz; Y. Sharma; B. R. Brooks; N. Tjandra, J. Am. Chem. Soc. 2002, 124, 10621–10626.
170. V. Y. Gorbatyuk; N. J. Nosworthy; S. A. Robson; N. P. S. Bains; M. W. Maciejewski; C. G. dos Remedios; G. F. King, Mol. Cell 2006, 24, 511–522.
171. M. P. Williamson; T. F. Havel; K. Wüthrich, J. Mol. Biol. 1985, 182, 295–315.
172. W. Braun; N. Gō, J. Mol. Biol. 1985, 186, 611–626.
173. P. Güntert; W. Braun; K. Wüthrich, J. Mol. Biol. 1991, 217, 517–530.
174. P. Güntert; C. Mumenthaler; K. Wüthrich, J. Mol. Biol. 1997, 273, 283–298.
175. P. Güntert, Prog. Nucl. Magn. Reson. Spectrosc. 2003, 43, 105–125.
176. T. F. Havel; K. Wüthrich, Bull. Math. Biol. 1984, 46, 673–698.
177. M. Nilges; G. M. Clore; A. M. Gronenborn, FEBS Lett. 1988, 229, 317–324.
178. R. M. Scheek; W. F. van Gunsteren; R. Kaptein, Methods Enzymol. 1989, 177, 204–218.
179. W. D. Cornell; P. Cieplak; C. I. Bayly; I. R. Gould; K. M. J. Merz; D. M. Ferguson; D. C. Spellmeyer; T. Fox; J. W. Caldwell; P. A. Kollman, J. Am. Chem. Soc. 1995, 117, 5179–5197.
180. B. R. Brooks; R. E. Bruccoleri; B. D. Olafson; D. J. States; S. Swaminathan; M. Karplus, J. Comput. Chem. 1983, 4, 187–217.
181. A. T. Brunger, Nat. Protoc. 2007, 2, 2728–2733.
182. W. L. Jorgensen; J. Tirado-Rives, J. Am. Chem. Soc. 1988, 110, 1657–1666.
183. W. F. van Gunsteren; H. J. C. Berendsen, Angew. Chem. Int. Ed. Engl. 1990, 29, 992–1023.
184. A. Pardi; G. Wagner; K. Wüthrich, Eur. J. Biochem. 1983, 137, 445–454.
185. G. Wagner; A. Pardi; K. Wüthrich, J. Am. Chem. Soc. 1983, 105, 5948–5949.
186. A. Pastore; V. Saudek, J. Magn. Reson. 1990, 90, 165–176.
187. M. P. Williamson, Biopolymers 1990, 29, 1423–1431.
188. K. Ösapay; D. A. Case, J. Am. Chem. Soc. 1991, 113, 9436–9444.
189. L. Szilagyi, Prog. Nucl. Magn. Reson. Spectrosc. 1995, 27, 325–443.
190. I. Ando; H. Saito; R. Tabeta; A. Shoji; T. Ozaki, Macromolecules 1984, 17, 457–461.
191. H. Saito, Magn. Reson. Chem. 1986, 24, 835–852.
192. J. Glushka; M. Lee; S. Coffin; D. Cowburn, J. Am. Chem. Soc. 1989, 111, 7716–7722.
193. A. C. De Dios; J. G. Pearson; E. Oldfield, Science 1993, 260, 1491–1496.
194. H. Le; E. Oldfield, J. Biomol. NMR 1994, 4, 341–348.
195. M. Iwadate; T. Asakura; M. P. Williamson, J. Biomol. NMR 1999, 13, 199–211.
196. D. S. Wishart; D. A. Case, Methods Enzymol. 2001, 338, 3–34.
197. S. Neal; A. M. Nip; H. Zhang; D. S. Wishart, J. Biomol. NMR 2003, 26, 215–240.
198. Y. Wang; O. Jardetzky, J. Biomol. NMR 2004, 28, 327–340.
199. A. Cavalli; X. Salvatella; C. M. Dobson; M. Vendruscolo, Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 9615–9620.
200. Y. Shen; O. Lange; F. Delaglio; P. Rossi; J. M. Aramini; G. Liu; A. Eletsky; Y. Wu; K. K. Singarapu; A. Lemak; A. Ignatchenko; C. H. Arrowsmith; T. Szyperski; G. T. Montelione; D. Baker; A. Bax, Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 4685–4690.
201. D. S. Wishart; D. Arndt; M. Berjanskii; P. Tang; J. Zhou; G. Lin, Nucleic Acids Res. 2008, 36, W496–W502.
202. J. A. Vila; J. M. Aramini; P. Rossi; A. Kuzin; M. Su; J. Seetharaman; R. Xiao; L. Tong; G. T. Montelione; H. A. Scheraga, Proc. Natl. Acad. Sci. U.S.A. 2008, 105, 14389–14394.
203. A. T. Brunger; G. M. Clore; A. M. Gronenborn; R. Saffrich; M. Nilges, Science 1993, 261, 328–331.
204. B. A. Borgias; T. L. James, Methods Enzymol. 1989, 176, 169–183.
205. P. D. Thomas; V. J. Basus; T. L. James, Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 1237–1241.
206. R. A. Laskowski; M. W. MacArthur; D. S. Moss; J. M. Thornton, J. Appl. Crystallogr. 1993, 26, 283–291.
207. R. A. Laskowski; J. A. C. Rullmann; M. W. MacArthur; R. Kaptein; J. M. Thornton, J. Biomol. NMR 1996, 8, 477–486.

208. G. Vriend, J. Mol. Graph. 1990, 8, 52–56.
209. I. W. Davis; A. Leaver-Fay; V. B. Chen; J. N. Block; G. J. Kapral; X. Wang; L. W. Murray; W. B. Arendall, III; J. Snoeyink; J. S. Richardson; D. C. Richardson, Nucleic Acids Res. 2007, 35, 375–383.
210. A. S. Altieri; R. A. Byrd, Curr. Opin. Struct. Biol. 2004, 14, 547–553.
211. E. Kupče; R. Freeman, J. Am. Chem. Soc. 2006, 128, 6020–6021.
212. S. Hiller; F. Fiorito; K. Wüthrich; G. Wider, Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 10876–10881.
213. D. Malmodin; M. Billeter, J. Am. Chem. Soc. 2005, 127, 13486–13487.
214. S. Kim; T. Szyperski, J. Am. Chem. Soc. 2003, 125, 1385–1393.
215. M. W. Maciejewski; A. S. Stern; G. F. King; J. C. Hoch, Nonuniform Sampling in Biomolecular NMR. In Handbook of Modern Magnetic Resonance, Part II; G. A. Webb, Ed.; Springer: Dordrecht, 2006; pp 1287–1293.
216. M. Mobli; M. W. Maciejewski; M. R. Gryk; J. C. Hoch, J. Biomol. NMR 2007, 39, 133–139.
217. M. Mobli; M. W. Maciejewski; M. R. Gryk; J. C. Hoch, Nat. Methods 2007, 4, 467–468.
218. A. Bahrami; A. Assadi; J. L. Markley; H. Eghbalnia, PLoS Comp. Biol. 2009, e1000307.
219. P. Güntert, Eur. Biophys. J. 2009, 38, 129–143.
220. J. P. Linge; M. Habeck; W. Rieping; M. Nilges, Bioinformatics 2003, 19, 315–316.
221. M. Habeck; W. Rieping; J. P. Linge; M. Nilges, Methods Mol. Biol. 2004, 278, 379–402.
222. T. Herrmann; P. Güntert; K. Wüthrich, J. Mol. Biol. 2002, 319, 209–227.

Biographical Sketches

Glenn King graduated with a BSc and a Ph.D. from the University of Sydney before undertaking postdoctoral studies with Professor Iain Campbell, FRS, at the University of Oxford. Glenn was a faculty member in the Department of Biochemistry at the University of Sydney from 1989 to 1998 before joining the University of Connecticut as Professor of Biochemistry and Microbiology in 1999. He returned to Australia in 2007 to take up a position as Professorial Research Fellow at the Institute for Molecular Bioscience at the University of Queensland. One of his major interests over the past 10 years has been the structure, function, and potential applications of peptide toxins expressed in spider venoms. Glenn recently founded Vestaron, an agricultural biotechnology company based in the United States, which aims to develop environmentally friendly insecticides based on natural insecticidal peptides. Glenn serves on the Scientific Advisory Boards of several companies and is a Fellow of the American Academy of Microbiology.


Mehdi Mobli is an ARC Senior Research Associate at the Institute for Molecular Bioscience (IMB) at The University of Queensland, where he is currently managing an NMR structural genomics project. Mehdi received his undergraduate degree in Chemical Engineering from Chalmers University of Technology in Gothenburg, Sweden, and did his graduate work on calculation of chemical shifts in organic molecules in the laboratory of Professor Raymond Abraham at The University of Liverpool, UK. After a brief stint with Professor Jeffrey Hoch at the University of Connecticut, USA, working on methods for processing nonuniformly sampled multidimensional NMR data, Mehdi returned to the University of Manchester, UK, to work on the structure and dynamics of heparan sulfate derived from the capsular polysaccharides of pathogenic Escherichia coli strains. Mehdi is the coauthor (along with Professor Abraham) of the recently published monograph Modelling 1H NMR spectra of Organic Compounds.


9.10 Mass Spectrometry: An Essential Tool for Trace Identification and Quantification

Charles H. Hocart, Australian National University, Canberra, ACT, Australia

© 2010 Elsevier Ltd. All rights reserved.

9.10.1 Introduction
9.10.1.1 Overview
9.10.1.2 Scope of the Present Work
9.10.2 Components of a Mass Spectrometer
9.10.2.1 The Mass Spectrometer – Overview
9.10.2.2 Ion Source and Ionization Methods
9.10.2.2.1 Electron ionization
9.10.2.2.2 Chemical ionization (positive and negative) and electron capture ionization
9.10.2.2.3 Ionization by proton transfer reaction
9.10.2.2.4 Electrospray ionization
9.10.2.2.5 Atmospheric pressure chemical ionization
9.10.2.2.6 Atmospheric pressure photoionization
9.10.2.2.7 Matrix-assisted laser desorption ionization
9.10.2.2.8 Secondary-ion mass spectrometry
9.10.2.2.9 Ambient ionization methods
9.10.2.3 Mass Analyzers
9.10.2.3.1 Resolution and accuracy
9.10.2.3.2 Magnetic and electric
9.10.2.3.3 Quadrupole
9.10.2.3.4 Quadrupole 3D-ion trap
9.10.2.3.5 Linear 2D-ion trap
9.10.2.3.6 Orbitrap
9.10.2.3.7 Time-of-flight
9.10.2.3.8 Fourier transform ion cyclotron resonance
9.10.2.3.9 Ion mobility spectrometry
9.10.3 Tandem Mass Spectrometry
9.10.3.1 Analyzers
9.10.3.1.1 Tandem-in-space
9.10.3.1.2 Tandem-in-time
9.10.3.2 Fragmentation
9.10.3.2.1 Collision-induced dissociation
9.10.3.2.2 Photon-induced dissociation
9.10.3.2.3 Electron capture dissociation
9.10.3.2.4 Electron detachment dissociation
9.10.3.2.5 Electron transfer dissociation
9.10.3.2.6 Combined use of dissociation techniques
9.10.4 Experimental Use of Mass Spectrometry
9.10.4.1 Spoilt for Choice – Which Ionization Method to Choose?
9.10.4.2 MS Scan Modes
9.10.4.2.1 Single MS analyzer (nontrapping) scan modes
9.10.4.2.2 Tandem MS scan modes
9.10.4.3 Identification – Unknown Small Molecules
9.10.4.3.1 An LC/MS approach
9.10.4.3.2 GC/MS approach
9.10.4.3.3 Determination of elemental formula
9.10.4.3.4 Database searching and interpretation of fragmentation from first principles
9.10.4.4 Criteria for Identification of a Known Compound
9.10.4.4.1 FDA Guidance for Industry 118
9.10.4.4.2 EU performance of analytical methods
9.10.4.5 Quantification
9.10.4.5.1 Components of an MS-based metabolite assay
9.10.4.5.2 Sample preparation
9.10.4.5.3 Fractionation and extraction of sample
9.10.4.5.4 Internal standards
9.10.4.5.5 Standard addition
9.10.4.5.6 External standards
9.10.4.5.7 Optimization of the MS assay
9.10.4.5.8 Chemical noise and contamination
9.10.4.6 MS Imaging
9.10.4.7 Future Prospects
References

9.10.1 Introduction

9.10.1.1 Overview

Mass spectrometry (MS) is an essential tool in the identification and quantification of natural products, primarily because of its speed, sensitivity, selectivity, and its versatility in analyzing solids, liquids, and gases. Indeed, there are reports of viable viruses being collected after passage through a mass spectrometer.1 MS has become an interdisciplinary methodology, impacting very many areas of science from physics, through chemistry, to biology. The first mass spectrometer was constructed in the 1890s and was critical to the discovery of the electron by Sir Joseph John Thomson (winner of the Nobel Prize for Physics in 1906). Since then, MS has proven to be a technique of immense importance to scientific endeavors in a variety of fields, initially physics with the discovery of the electron and then stable isotopes and later, biology where it has been an essential tool for the high-throughput identification of proteins and their posttranslational modifications (PTMs). It is interesting to note that Thomson2 observed in his book Rays of Positive Electricity and Their Application to Chemical Analysis that the new technique could be profitably used for chemical analysis. However, this potential was largely ignored until World War II when MS came to be used to monitor the cracking process in oil refineries and to separate ²³⁵U and ²³⁸U for use in the atomic bomb. The last century has also seen considerable innovation and development of the technique and three further Nobel Prizes have been awarded for the discovery of isotopes of nonradioactive elements (1922, Francis Aston), development of new analyzers (1989, Wolfgang Paul – quadrupoles (Q’s) and ion trap), and soft desorption ionization methods (2002, Koichi Tanaka and John B. Fenn – laser desorption ionization and electrospray ionization (ESI), respectively).
MS continues to evolve and innovations in hardware and software are being driven by demands from medicine and biology for instruments with better mass accuracy, better mass resolution, increased dynamic range, faster data acquisition, and enhanced tandem MS capabilities. Samples of increasing complexity and diminishing size are being presented for analysis and entirely new fields of endeavor, such as proteomics and metabolomics, have been established, based on modern and continuing developments in MS. Those interested in the history of MS are referred to Grayson,3 Griffiths,4 and to Watson and Sparkman.5 Today, MS instruments are used in identifying and quantifying, for example, drugs, pollutants, products of chemical syntheses, planetary atmospheric components, biopolymers, and metabolites from microorganisms, plants, and animals. These analytes range in size from a few mass units (e.g., elemental gases) to hundreds of kilodaltons (kDa) (e.g., proteins and protein complexes) and cover a large range of polarities (e.g., hydrocarbons

Mass Spectrometry: An Essential Tool for Trace Identification and Quantification


to sulfated carbohydrates). In addition to the wide applicability, another attractive feature of mass spectrometric analyses is that they can potentially be performed with a large degree of specificity and sensitivity (e.g., zeptomolar concentrations, $10^{-21}$ mol l$^{-1}$). Thus these instruments are used by a multitude of research disciplines and regulatory authorities (e.g., drug testing in sport,6,7 Olympic Games,8 space exploration,9 geological dating,10 biological tissue imaging,11 wine industry,12 metabolomics,13,14 proteomics,15–17). Although mass spectrometers are of widespread utility, it is also important to understand their limitations. Particular instruments are usually designed and dedicated to a narrow range of tasks dictated by their linkage to specific modes of sample presentation (e.g., solids probe, liquid chromatograph, gas chromatograph, or a proton transfer reaction drift tube) and methods of ionization (e.g., electrospray or electron impact). A well-equipped MS laboratory will therefore contain a variety of instruments with different capabilities.

9.10.1.2 Scope of the Present Work

MS is most commonly applied to problems of identification and quantification, particularly in the area of natural products chemistry. I hope in this brief chapter to give the nonspecialist chemist or biologist some basic background in MS and its capabilities so that they can sensibly engage with the MS specialist or MS literature in seeking solutions to their particular analytical problems. To this end, the chapter looks specifically at the components of a mass spectrometer, the presentation of samples, the ionization processes available, and how the data generated from an analysis can be used for identification and quantification. Readers should also refer to complementary chapters in this volume on chromatography (chromatographically separated components of mixtures may be fed directly into the MS source for analysis) and proteomics (a high-throughput technique for identification and quantification of large sets of proteins by MS; see Chapters 9.11–9.13). In keeping with the philosophy of this series, only selective references to the literature have been made and wherever possible these have been review and tutorial style articles.

9.10.2 Components of a Mass Spectrometer

9.10.2.1 The Mass Spectrometer – Overview

The mass spectrometer may be divided into a number of discrete components: a sample inlet, an ion source, one or more analyzers, a detector, and finally, a computer to both collect data and control the operational parameters of the instrument (Figure 1). In principle, gas-phase neutral molecules are ionized so that they may be separated by the electric and/or magnetic fields of the analyzer according to their mass (m) to charge (z) ratios (m/z). The ions are then detected and recorded as a mass spectrum, graphing the ion abundance against the m/z ratio of the individual ions (Figure 2). To enhance the passage of the ion stream, the ion source, analyzer region, and detector are held under vacuum. At atmospheric pressure (760 torr), there is a density of some $10^{19}$ molecules ml$^{-1}$, yielding a mean free path of $10^{-4}$ cm. However, in an evacuated region at $10^{-6}$ torr, the density drops to $10^{10}$ molecules ml$^{-1}$ and the mean free path is extended out to $10^{3}$ cm, increasing the probability that an ion will be able to physically traverse the instrument without collision with a residual gas molecule. This requirement for a maximal mean free path is particularly important for the beam-type instruments (magnetic sectors and multiple analyzer instruments) and for ion cyclotron resonance (ICR) cells. In the latter, ions may literally travel many kilometers over an observation period of 1 s.18 In some respects, the linear and Q ion traps are the exception, in that although the analyzers are held within a vacuum system, the traps themselves contain a helium buffer gas, which is required to collisionally cool the trapped ions.

9.10.2.2 Ion Source and Ionization Methods

Prior to analysis, the sample must be volatilized and ionized.19 These processes can be separate or linked, depending on the nature of the sample and the ionization process being used. Samples may be presented for MS analysis in solid, liquid, or gaseous form and, furthermore, they may be a mixture of components. In the case of mixtures, separation is usually necessary for unambiguous identification or quantification because the simultaneous presence of two or more components in the source region will result in an overlapping or mixed spectrum. Mixtures are therefore often separated by gas chromatography (GC), capillary electrophoresis (CE), supercritical fluid chromatography (SFC), or liquid chromatography (LC), with the eluted and separated components being supplied directly into the MS source.20 These hyphenated approaches are known as GC/MS, CE/MS, SFC/MS, and LC/MS, respectively. If chromatography is not required, samples may be introduced directly into the ion source. In the case of gaseous samples, volatilization is of course unnecessary, and the sample can be introduced into the source using appropriate gas handling techniques. Nonpolar, thermally stable, low-molecular-weight solids and liquids can be placed in metal or glass crucibles (solids probe or direct insertion probe) or may be directly applied to a wire loop (direct exposure probe). The crucible or wire loop is then heated to thermally desorb or volatilize the sample. Some polar low-molecular-weight compounds may also be directly analyzed after being chemically derivatized21–27 to mask the polar functional groups and thereby increase volatility (e.g., by alkylation or silylation) and improve thermal stability (Section 9.10.4.3.2). Otherwise, samples may be dissolved in an appropriate solvent and subjected to either an atmospheric pressure ionization process or be laser desorbed from a solid matrix as described below.

Figure 1 Principal components of a mass spectrometer. [Block diagram: sample introduction (chromatography: LC or CE for soluble samples, GC or SFC for volatilized or gaseous samples; or solids probe, MALDI plate, or direct infusion for liquid or solid samples), followed by the source (ionization of sample), the mass analyzer (separate ions by m/z ratio), and the detector (detect and measure abundance of ions), all under vacuum, with a computer for data collection and for MS operation and control.] For mass spectrometry (MS) analysis of a sample, the neutral analyte(s) must first be ionized, positively or negatively, to allow manipulation by the magnetic and/or electric fields in the MS analyzer. Ions are sorted according to their mass to charge ratio (m/z), which is then plotted against their intensity to generate a mass spectrum. The flight path of the ions is evacuated to maximize the mean free path of the ions and to reduce the possibility of unfavorable interactions with residual air molecules.

9.10.2.2.1 Electron ionization

Electron ionization (EI), originally developed by Dempster,28 is widely used in MS for relatively volatile samples that are thermally stable and have relatively low molecular weight. Samples are typically presented in the effluent from a GC or are volatilized from a solids probe inserted into the high vacuum source. Ionization is effected by interaction between the gas-phase analyte molecules and a stream of high-energy electrons (typically 70 eV) drawn from a filament. Ionization occurs by removal of an electron to form an odd-electron ion, $\mathrm{M^{+\bullet}}$ (Equation (1)). EI generally creates a singly charged positive ion, and any doubly or triply charged ions are of very low abundance. EI is also a high-energy process and excess energy remaining after ionization can be dissipated by fragmentation (possibly with rearrangement) of covalent bonds in the molecular ion, to lose either a radical (e.g., $\mathrm{CH_3^{\bullet}}$) (Equation (2)) or a neutral species (e.g., H2O or CH3OH) (Equation (3)).

Ionization:

$$\mathrm{M + e^{-} \rightarrow M^{+\bullet} + 2e^{-}} \tag{1}$$

Figure 2 Mass spectrum of cholesterol (MW 386) generated by electron ionization (EI). [Plot of relative abundance (%) against m/z; labeled ions include the molecular ion $\mathrm{M^{+\bullet}}$ at m/z 386 and the fragments M–CH3, M–H2O, and M–CH3–H2O.] The EI mass spectrum of cholesterol is characterized by the presence of a molecular ion at m/z 386 and by extensive fragmentation and contains information on the steroid nucleus and side chain. There is no indication as to the position of the double bond in this spectrum but the 3-hydroxy-Δ5 structure can be identified after conversion to an ester.29,30

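Fragmentation:

$$\mathrm{M^{+\bullet} \rightarrow [M - R]^{+} + R^{\bullet}} \quad \text{(radical loss)} \tag{2}$$

$$\mathrm{M^{+\bullet} \rightarrow [M - R]^{+\bullet} + R} \quad \text{(neutral loss)} \tag{3}$$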
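As a small illustration of Equations (2) and (3), the nominal m/z of a singly charged fragment ion follows from subtracting the nominal mass of the lost radical or neutral from that of the molecular ion. The sketch below applies this to the cholesterol molecular ion at m/z 386 (Figure 2). The helper name `fragment_mz` and the small table of losses are illustrative assumptions, and only nominal (integer) masses are used.

```python
# Nominal (integer) masses of some common EI losses; an illustrative table only.
LOSSES = {"CH3": 15, "H2O": 18, "CH3OH": 32}

def fragment_mz(molecular_ion_mz, *lost_species):
    """Nominal m/z of a singly charged fragment after losing the named species."""
    return molecular_ion_mz - sum(LOSSES[name] for name in lost_species)

cholesterol = 386  # molecular ion of cholesterol (Figure 2)
fragments = {
    "M-CH3": fragment_mz(cholesterol, "CH3"),             # radical loss, Equation (2)
    "M-H2O": fragment_mz(cholesterol, "H2O"),             # neutral loss, Equation (3)
    "M-CH3-H2O": fragment_mz(cholesterol, "CH3", "H2O"),  # both losses in sequence
}
print(fragments)
```

The resulting values (m/z 371, 368, and 353) correspond to peaks that appear in the cholesterol spectrum of Figure 2.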

The fragmentation observed during EI is defined by the chemical structure of the analyte and the resulting highly reproducible pattern of fragmentation may be used for structural elucidation and identification of unknowns5,31–34 (Figures 2 and 3(a)). This reproducibility has been exploited to develop user-generated and commercial libraries of spectra (some containing several hundred thousand spectra), which can be rapidly searched for comparable spectra. For some compounds, fragmentation may be so extensive that the molecular ion does not appear in the EI spectrum (e.g., Figure 3(a)). If this molecular mass information is required, then the analyst will have to resort to one of the softer or less energetic ionization processes such as chemical ionization (CI, Figures 3(b), 3(c), and 3(e)) or ESI (Figure 3(g)) as outlined below. The displaced electron is generally assumed to be the electron with the lowest ionization energy. In order of probability, this will be a nonbonding electron followed by a π-bond electron and then a σ-bond electron. Thus EI yields, in the first instance, a molecular ion which is a radical cation with an unpaired electron. In principle, any remaining energy will then be dissipated by bond cleavages that result in the formation of the most stable cation with a paired electron (even-electron ion). These even-electron ions may be formed by homolytic or heterolytic cleavages. This whole process happens very rapidly ($<10^{-8}$ s) and is the reason for the close similarity of EI spectra produced across all different instruments. It is important to remember that mass spectral reactions in the EI source are unimolecular. This is because the pressure in the EI source is too low for bimolecular (ion–molecule) reactions to occur.

9.10.2.2.2 Chemical ionization (positive and negative) and electron capture ionization

Like EI, CI is also typically applied to samples presented via a GC interface or volatilized from a solids probe. It is a less energetic or soft form of ionization and is designed to minimize fragmentation.35–38 CI is usually carried out in a source similar to that used for EI except that a reagent gas, commonly methane, isobutane, or ammonia, is added at a pressure of 0.3–1 torr. The electron beam then interacts with the reagent gas to produce reagent ions (Table 1) and thermal electrons. The neutral analyte molecules are then ionized by ion–molecule reactions to produce positive and negative analyte ions (Figures 3(b), 3(c), and 3(e)). The thermal electrons are also available for electron capture by electrophilic analytes, yielding negative analyte ions.

Figure 3 Mass spectra of the amino acid threonine (MW 119). [Panels (a)–(h) each plot relative abundance (%) against m/z.] (a) Electron ionization (EI) mass spectrum generated at an ionization energy of 70 eV. No molecular ion is observed. For an interpretation of the EI fragmentation, see Biemann and McCloskey39 and Junk and Svec.40 (b) Chemical ionization (CI) spectrum generated using methane as the reagent gas. A prominent $\mathrm{[M + H]^{+}}$ ion is observed at m/z 120. A discussion of the CI fragmentation may be found in Milne et al.41 and Solovev et al.35 (c) CI spectrum generated using ammonia as the reagent gas. The spectrum is very similar to that in (b). (d) Tandem MS2 experiment selecting m/z 120 from the ammonia CI spectrum. (e) Negative ion chemical ionization (NICI) spectrum (ammonia reagent gas) demonstrating proton abstraction, $\mathrm{[M - H]^{-}}$ at m/z 118, and adduct formation with chloride, $\mathrm{[M + Cl]^{-}}$ at m/z 154 and 156. Threonine HCl was dissolved in 50% EtOH/water. (f) Tandem MS2 of m/z 118, in the negative mode. (g) Positive electrospray ionization (ESI) spectrum, again featuring a prominent $\mathrm{[M + H]^{+}}$ ion at m/z 120. (h) Tandem MS2 experiment of m/z 120 from the ESI spectrum.

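Table 1 Common reagent and analyte ions in chemical ionization (CI)

| Reagent gas | Major reagent ions | Product ions |
|---|---|---|
| Positive CI | | |
| Methane, CH4 | $\mathrm{CH_5^{+}}$, $\mathrm{C_2H_5^{+}}$, $\mathrm{C_3H_5^{+}}$ | $\mathrm{[M + H]^{+}}$, $\mathrm{[M + C_2H_5]^{+}}$ |
| Isobutane, C4H10 | $\mathrm{C_4H_9^{+}}$ | $\mathrm{[M + H]^{+}}$, $\mathrm{[M + C_4H_9]^{+}}$ |
| Ammonia, NH3 | $\mathrm{NH_4^{+}}$ | $\mathrm{[M + H]^{+}}$, $\mathrm{[M + NH_4]^{+}}$ |
| Negative CI | | |
| Chloroform | $\mathrm{Cl^{-}}$ | $\mathrm{[M - H]^{-}}$, $\mathrm{[M + Cl]^{-}}$ |
| Ammonia | $\mathrm{NH_2^{-}}$ | $\mathrm{[M - H]^{-}}$, $\mathrm{[M + NH_2]^{-}}$ |
| N2O/CH4 (1:1) | $\mathrm{OH^{-}}$ | $\mathrm{[M - H]^{-}}$ |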

It is important to remember that these reactions are all occurring simultaneously in the source and that either the positive or negative ions can be selectively extracted from the source into the mass analyzer by placing the appropriate voltages on the extracting and focusing lenses. In the case of Q analyzers (Section 9.10.2.3.3), the switching between positive and negative polarity can be accomplished very rapidly so that positive and negative ions may be analyzed from a single GC peak. This technique is known as pulsed positive ion/negative ion CI (PPINICI). In positive ion chemical ionization (PICI), the neutral analyte is most commonly ionized by proton transfer (Equation (4)) or adduct formation (Equations (5) and (6)). When, for example, methane is used as a reagent gas, the $\mathrm{[M + 1]^{+}}$, $\mathrm{[M + 29]^{+}}$, and $\mathrm{[M + 41]^{+}}$ series of ions (Equations (4)–(6)) is good confirmation of the analyte molecular mass.

$$\mathrm{M + CH_5^{+} \rightarrow [M + H]^{+} + CH_4} \quad \text{(proton transfer)} \tag{4}$$

$$\mathrm{M + C_2H_5^{+} \rightarrow [M + C_2H_5]^{+}} \quad \text{(adduct formation)} \tag{5}$$

$$\mathrm{M + C_3H_5^{+} \rightarrow [M + C_3H_5]^{+}} \quad \text{(adduct formation)} \tag{6}$$
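The use of the methane CI ion series as a check on molecular mass can be sketched in a few lines. The example below computes the nominal m/z expected from Equations (4)–(6) and searches a peak list for a mass M whose M + 1, M + 29, and M + 41 ions are all present. The function names and the peak list are hypothetical, chosen only for illustration; threonine (MW 119, Figure 3) supplies the example mass.

```python
# Nominal mass added to the analyte by each methane CI pathway, Equations (4)-(6).
METHANE_CI_SHIFTS = {"[M+H]+": 1, "[M+C2H5]+": 29, "[M+C3H5]+": 41}

def expected_ci_ions(molecular_mass):
    """Nominal m/z of the ions expected for an analyte of the given mass."""
    return {label: molecular_mass + shift for label, shift in METHANE_CI_SHIFTS.items()}

def candidate_molecular_masses(observed_mz):
    """Masses M for which M+1, M+29 and M+41 all occur in the observed peaks."""
    peaks = set(observed_mz)
    candidates = []
    for mz in sorted(peaks):
        mass = mz - METHANE_CI_SHIFTS["[M+H]+"]
        if all(mass + shift in peaks for shift in METHANE_CI_SHIFTS.values()):
            candidates.append(mass)
    return candidates

threonine_ions = expected_ci_ions(119)
hypothetical_peaks = [74, 102, 120, 148, 160]  # invented peak list for illustration
print(threonine_ions, candidate_molecular_masses(hypothetical_peaks))
```

Requiring all three ions is a deliberately strict choice for this sketch; in practice the adduct ions can be weak, so a real assignment would weigh their intensities as well.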

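Less commonly, charge transfer (Equation (7)) and hydride abstraction (Equation (8)) may be observed.

$$\mathrm{M + CH_4^{+\bullet} \rightarrow M^{+\bullet} + CH_4} \quad \text{(charge transfer)} \tag{7}$$

$$\mathrm{M + C_2H_5^{+} \rightarrow [M - H]^{+} + C_2H_6} \quad \text{(hydride abstraction)} \tag{8}$$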

Under CI conditions, negative reagent ions are also formed (Table 1) and these can effect analyte ionization by proton abstraction (Equation (9)) or anion attachment (Equation (10)) (Figure 3(e)).

$$\mathrm{M + NH_2^{-} \rightarrow [M - H]^{-} + NH_3} \quad \text{(proton abstraction)} \tag{9}$$

$$\mathrm{M + NH_2^{-} \rightarrow [M + NH_2]^{-}} \quad \text{(anion attachment)} \tag{10}$$
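The negative ions reported for threonine in Figure 3(e) can be reproduced with the same nominal-mass bookkeeping. The sketch below computes the proton abstraction ion and the chloride adduct ions for both chlorine isotopes; the function name is hypothetical, and the isotope abundances are approximate natural abundances supplied here as an assumption rather than taken from the chapter.

```python
# Approximate natural abundances of the chlorine isotopes (assumed values).
CHLORINE_ISOTOPES = {35: 0.7576, 37: 0.2424}

def negative_ci_ions(molecular_mass):
    """Nominal m/z of [M-H]- and of the [M+Cl]- adducts for each Cl isotope."""
    ions = {"[M-H]-": molecular_mass - 1}
    for isotope_mass in CHLORINE_ISOTOPES:
        ions[f"[M+{isotope_mass}Cl]-"] = molecular_mass + isotope_mass
    return ions

threonine_negative_ions = negative_ci_ions(119)
# Expected intensity ratio of the two chloride adduct peaks (35Cl : 37Cl).
adduct_ratio = CHLORINE_ISOTOPES[35] / CHLORINE_ISOTOPES[37]
print(threonine_negative_ions, round(adduct_ratio, 1))
```

The pair of adduct peaks two mass units apart, in a ratio of roughly 3:1, is what identifies the m/z 154 and 156 ions in Figure 3(e) as chloride adducts.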

As mentioned above, thermal electrons are also generated in the CI source, along with the reagent ions. These can be exploited for electron capture ionization (ECI), particularly in the case of molecules containing electrophilic moieties such as F, Cl, NO2, and CN and this may confer advantages of increased sensitivity and selectivity for a particular analyte. These electron-capturing groups can of course be added into the target analyte by appropriate derivatization prior to analysis, to selectively enhance the possibility of electron capture.21–27 It should be noted that this electron capture process is not, strictly speaking, negative CI as the analyte molecules are interacting with the thermal electrons and not the reagent ions derived from the CI gas. There are three different mechanisms for ECI:

$$\mathrm{M + e^{-}\,(0.1\ eV) \rightarrow M^{-\bullet}} \quad \text{(resonance electron capture)} \tag{11}$$

$$\mathrm{M + e^{-}\,(0\text{–}15\ eV) \rightarrow [M - A]^{-} + A^{\bullet}} \quad \text{(dissociative electron capture)} \tag{12}$$

$$\mathrm{M + e^{-}\,(>10\ eV) \rightarrow [M - B]^{-} + B^{+} + e^{-}} \quad \text{(ion pair formation)} \tag{13}$$

The sensitivity of ECI analysis is generally two to three orders of magnitude greater than that of CI or EI analysis. Little fragmentation occurs during ECI, and this mode of ionization is generally employed for quantification of trace amounts of known compounds.

9.10.2.2.3 Ionization by proton transfer reaction

Recently, a variant of CI has been specifically developed to monitor in real time low concentrations of volatile organic compounds (VOCs).42 VOCs are normally present in complex mixtures that could be separated by GC; however, these separations are relatively slow (15–60 min) and are not suitable for real-time monitoring. Proton transfer reaction mass spectrometry (PTR-MS) uses CI based on proton transfer from hydroxonium ions ($\mathrm{H_3O^{+}}$). These hydroxonium ions are produced in an external glow discharge ion source operating in pure water vapor. The reagent ions are then passed into a drift tube that is continuously flushed with the ambient air containing the VOCs of interest. The $\mathrm{H_3O^{+}}$ ion does not react with any of the common constituents of the atmosphere (N2, O2, Ar, or CO2) as their proton affinities are lower than those of water. However, most VOCs have proton affinities higher than water (>166.5 kcal mol$^{-1}$), and so proton transfers to the VOCs occur exothermically as a consequence of ion–molecule reactions in the drift tube. For the most part, these proton transfers are nondissociative and the mass analyzer can monitor a single ion species for each individual VOC (Equation (14)).

$$\mathrm{H_3O^{+} + M_{VOC} \rightarrow MH^{+}_{VOC} + H_2O} \tag{14}$$

However, some dissociation to $\mathrm{[M - OH]^{+}}$ (Equation (15)) or $\mathrm{[M - OR]^{+}}$ (Equation (16)), depending on the chemical class of the analyte, can occur.

$$\mathrm{MH^{+} \rightarrow [M - OH]^{+} + H_2O} \tag{15}$$

$$\mathrm{MH^{+} \rightarrow [M - OR]^{+} + ROH} \tag{16}$$
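The selectivity of PTR-MS described above reduces to a comparison of proton affinities: a species is protonated by $\mathrm{H_3O^{+}}$ only if its proton affinity exceeds that of water. The sketch below makes that comparison explicit. The threshold of 166.5 kcal mol$^{-1}$ is the value quoted in the text; the proton affinities in the table are approximate literature values supplied here as assumptions for illustration, and the function name is hypothetical.

```python
WATER_PROTON_AFFINITY = 166.5  # kcal/mol, threshold quoted in the text

# Approximate proton affinities in kcal/mol (assumed, rounded literature values).
APPROX_PROTON_AFFINITY = {
    "N2": 118, "O2": 101, "Ar": 88, "CO2": 129,     # common constituents of air
    "methanol": 180, "benzene": 179, "acetone": 194,  # example VOCs
}

def protonated_by(reagent_proton_affinity, proton_affinities):
    """Names of species whose proton affinity exceeds that of the reagent base."""
    return sorted(
        name for name, affinity in proton_affinities.items()
        if affinity > reagent_proton_affinity
    )

detectable = protonated_by(WATER_PROTON_AFFINITY, APPROX_PROTON_AFFINITY)
print(detectable)
```

Passing a higher threshold to `protonated_by` models a more selective reagent ion, since only compounds that are more basic than the reagent's conjugate base accept a proton.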

Some further selectivity can be introduced into the process by using ammonia as the reagent gas ($\mathrm{NH_4^{+}}$ reagent ions) so that proton transfers occur only with compounds with a proton affinity >204 kcal mol$^{-1}$. A more sophisticated, though less common version of PTR-MS is selected ion flow tube mass spectrometry (SIFT-MS). In this technique, a mixture of reagent ions is generated in a gas discharge ion source and then a Q mass filter is used to select one reagent ion, which is then injected into an inert carrier gas (usually He), for reaction with the gaseous sample, which is also injected into the carrier gas. The products of the ion–molecule reactions are then analyzed by a second mass analyzer. Recently, a triple cell PTR Fourier transform ion cyclotron resonance MS (FTICR MS) has been built to encompass the whole process with the advantage of high mass resolution and accuracy to characterize the ion–molecule reaction products43 (see also Section 9.10.2.3.8). This, however, is achieved at the cost of sensitivity (1 ppm compared with 0.1 ppb). The most common reagent ions used in SIFT-MS are $\mathrm{H_3O^{+}}$, $\mathrm{NO^{+}}$, and $\mathrm{O_2^{+\bullet}}$, and their reactions with many different classes of volatile organics have been well documented.44 The $\mathrm{NO^{+}}$ reagent ion can react with the VOCs, depending on their chemistry, in one or two of several different ways: charge transfer (Equation (17)), hydride ion transfer (Equation (18)), hydroxide ion transfer (Equation (19)), alkoxide ion transfer, and ion–molecule association (Equation (20)).

$$\mathrm{M + NO^{+} \rightarrow M^{+\bullet} + NO^{\bullet}} \tag{17}$$

$$\mathrm{M + NO^{+} \rightarrow [M - H]^{+} + HNO} \tag{18}$$

$$\mathrm{M + NO^{+} \rightarrow [M - OH]^{+} + HNO_2} \tag{19}$$

$$\mathrm{M + NO^{+} \rightarrow [M + NO]^{+}} \tag{20}$$

VOCs mostly react with $\mathrm{O_2^{+\bullet}}$ via charge transfer (Equation (21)) or dissociative charge transfer (Equation (22)); however, this reagent ion has found most use in monitoring NO, NO2, and CS2 as $\mathrm{NO^{+}}$, $\mathrm{NO_2^{+}}$, and $\mathrm{CS_2^{+\bullet}}$, respectively.

$$\mathrm{M + O_2^{+\bullet} \rightarrow M^{+\bullet} + O_2} \tag{21}$$

$$\mathrm{M + O_2^{+\bullet} \rightarrow [M - R]^{+} + R^{\bullet} + O_2} \tag{22}$$

Reactions of different chemical classes with the $\mathrm{H_3O^{+}}$, $\mathrm{NO^{+}}$, and $\mathrm{O_2^{+\bullet}}$ reagent ions may be found in Smith and Španěl.44

9.10.2.2.4 Electrospray ionization

Electrospray is a process of transferring solution ions, typically large, nonvolatile polar molecules such as proteins, peptides, and carbohydrates, into the gas phase by ion desorption or ion evaporation.45 Samples are supplied to the source directly via a syringe or, most commonly, as the eluent from an LC column. The liquid is passed through a metal needle held at high voltage (1–3 kV with respect to the sample cone or MS inlet) and sprayed into the ionization chamber at atmospheric pressure. A coaxial nebulizer gas may assist spray formation in the case of high solvent flow rates. As the charged droplets evaporate and shrink in size, the charge concentration in the droplets increases to the point where like-charge repulsion overcomes surface tension and the droplets explode to form microdroplets. The process is repeated and ultimately ions are ejected (desorbed) into the gas phase. These ions are then attracted into the off-axis or orthogonal sample inlet (counterelectrode) of the mass spectrometer. This off-axis geometry has the advantage of excluding neutral molecules and solvent clusters from the mass spectrometer.

ESI is a very ‘soft’ process, inducing little fragmentation, but in the case of molecules with a number of chargeable sites, a distribution of charge states is generated (Figures 3(g) and 6). The distribution and nature of the charges is very much a function of the sample solvent. In protic solvents such as water or mixtures of water and methanol or acetonitrile, sample ions will form a protonated, $\mathrm{[M + nH]}^{n+}$, or deprotonated, $\mathrm{[M - nH]}^{n-}$, series of multicharged ions. If alkali metals or ammonia is present in solution, then cationization will also be observed. The number of charges that can be carried on an electrosprayed molecule depends on a number of factors including the size of the molecule, the tertiary structure (e.g., some charge-carrying sites – basic amino acids in +ve ESI – may be physically removed from exposure to the solvent at the center of a folded protein), the number of sites on which a charge may be localized (acidic and basic sites), and the nature of the solvent (pH and presence of salts). The effect of solvent pH on the abundance and distribution of the charges on myoglobin is illustrated in Figure 4. The multicharging phenomenon means that ions of very large mass can be detected with conventional analyzers with mass ranges up to 3000 u. As a general rule, there will be one charge for every 8–10 amino acids (1000 mass units). Thus, for example, a protein or protein complex of 200 000 Da can be readily analyzed if it can accommodate 100 charges. Hence,

$$\frac{200\,000\ \mathrm{Da}}{100\,z} = 2000\ m/z$$

This distribution of charges, especially when there may be more than one molecular species, can represent a very confusing picture. This situation may be further compounded by the presence of additional ion series that can occur when protonation competes with cations such as sodium and potassium. However, it is possible to deconvolute the multiple charge states and to calculate the mass of the molecule in question, by application of simple algebra. First, it is reasonable to assume when looking at the multicharged envelope of an unknown that adjacent peaks differ by one charge. For the most part, this will represent a proton, as the multicharged envelopes due to sodium and potassium tend to be much less abundant. In the myoglobin spectrum (Figure 4), two adjacent ions have been labeled M1 (higher value) and M2 (lower value) and these will carry n1 and n2 charges (protons), respectively. Thus

$$n_2 = n_1 + 1 \tag{23}$$

Second, the observed m/z values of each of the peaks can be written as

$$M_1 = \frac{M_r + n_1 H}{n_1} \tag{24}$$

where $M_r$ is the mass of the unknown, $n$ is the number of charges, $H$ is the mass of a proton, $M_1$ is the m/z experimental value, and

$$M_2 = \frac{M_r + n_2 H}{n_2} = \frac{M_r + (n_1 + 1)H}{n_1 + 1} \tag{25}$$

The charge state, $n_1$, can then be calculated from

$$n_1 = \frac{M_2 - H}{M_1 - M_2} \tag{26}$$

The mass of the unknown, $M_r$, can then be determined from

$$M_r = n_1 (M_1 - H) \tag{27}$$
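Equations (24), (26), and (27) translate directly into code. The sketch below is a minimal illustration, not the algorithm used by any particular data system: the function names are hypothetical, the proton mass is an assumed constant, and the two m/z values are the adjacent peaks at 942.80 and 893.33 in the pH 2 myoglobin spectrum of Figure 4(a). The computed charge is rounded to the nearest integer because measured m/z values carry experimental error.

```python
PROTON_MASS = 1.00728  # Da; the quantity H in Equations (24)-(27) (assumed constant)

def mz_of_charge_state(mass, n, adduct_mass=PROTON_MASS):
    """Equation (24): m/z observed for a molecule of the given mass carrying n charges."""
    return (mass + n * adduct_mass) / n

def charge_state(m1, m2, adduct_mass=PROTON_MASS):
    """Equation (26): charge n1 on the higher-m/z peak m1, given the adjacent lower peak m2."""
    return round((m2 - adduct_mass) / (m1 - m2))

def neutral_mass(m1, n1, adduct_mass=PROTON_MASS):
    """Equation (27): mass Mr of the neutral molecule."""
    return n1 * (m1 - adduct_mass)

m1, m2 = 942.80, 893.33  # adjacent peaks from Figure 4(a)
n1 = charge_state(m1, m2)
mr = neutral_mass(m1, n1)
print(n1, round(mr, 1))
```

For this pair of peaks the sketch gives 18 charges and a mass of about 16 952 Da, within 1 Da of the value quoted for horse heart myoglobin in Figure 4. Passing the mass of Na+ or K+ as `adduct_mass` adapts the same functions to a cationized series.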

Where the multicharged series is due to cationization, the mass of H should be replaced by that of the cation (e.g., $\mathrm{Na^{+}}$ or $\mathrm{K^{+}}$). Fortunately, most modern ESI-MS data systems have computer-based deconvolution algorithms to automate this process (Figure 4(c)). ESI is most commonly associated with the analysis of large biomolecules of medium to high polarity, and it is a major tool for proteomic analyses,17 but it can also be used for the MS analysis of small molecules provided they contain basic groups (e.g., amino, amide) for positive ESI or acidic groups (e.g., carboxylic acid, hydroxyl) for negative ESI.

Figure 4 ESI mass spectrum of horse heart myoglobin (Mr, 16 951.49 Da) illustrating the multiple charge phenomenon. [Panels: (a) ESI of myoglobin at pH 2, relative abundance (%) against m/z (400–2000), with adjacent peaks labeled M1 (m/z 942.80, n1 = 18 charges) and M2 (m/z 893.33, n2 = 19 charges); (b) ESI of myoglobin at pH 7, same axes; (c) deconvoluted ESI spectrum of myoglobin at pH 2, relative abundance (%) against mass (16 900–17 400), with the main peak at 16 952.0.] Note that protonation is most effective at acid pH (a) rather than neutral pH (b), significantly altering the abundance and distribution of charge on myoglobin. Determination of the charge state can be made from first principles, using adjacent pairs of ions, from Equation (26), and the mass of the protein from Equation (27). The average mass determined from using all the data is 16 952.0 Da (c).

One of the great advantages of ESI is that generally it is very successful without the added complications of derivatization. Derivatization is often carried out under harsh conditions and the risk of sample degradation or the formation of multiple derivatives is very real. Nevertheless, derivatization can be a useful adjunct to ESI and there are many reports of derivatization being used to improve the ionization efficiency (and hence the sensitivity of an assay) by increasing the hydrophobicity or adding a group with a fixed charge to the analyte (see the review by Zaikin and Halket46). Although ESI can be performed at quite high flow rates (up to 1–2 ml min$^{-1}$), the trend has been to run at lower and lower flow rates. Low flow rates mean that the coaxial nebulizer gas and the heated drying gases are no longer required, simplifying the construction and operation of the source. However, the most attractive feature of low flow rates is the dramatic improvement in the ESI efficiency, with nano-ESI (20–50 nl min$^{-1}$) producing smaller initial droplets (200 nm diameter compared with 1–2 µm, a 100–1000-fold reduction in volume), allowing a much greater proportion of the sample to pass into the gas phase and then into the MS analyzer. Consequently, smaller amounts of sample are required, allowing more sophisticated biological experiments to be attempted on smaller samples. The second advantage of using low flow rates in ESI is that the problem of ion suppression is reduced. Analytes and other components in the spray compete for charge so that analytes with the lowest ionization energy will be preferentially ionized at the expense, for example, of more abundant analytes with higher ionization energy. Therefore, when using ESI, caution should be exercised in extrapolating from the observed spectrum ion abundance to the concentration of the neutral analyte in solution.
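The quoted reduction in droplet volume follows from the cubic dependence of a sphere's volume on its diameter. The short sketch below checks the arithmetic, assuming spherical droplets and reading the conventional ESI droplet diameter as 1–2 µm against 200 nm for nano-ESI; the function names are hypothetical.

```python
import math

def droplet_volume(diameter_m):
    """Volume in cubic meters of a spherical droplet of the given diameter in meters."""
    return math.pi * diameter_m ** 3 / 6

def volume_ratio(large_diameter_m, small_diameter_m):
    """How many times larger the volume of the large droplet is."""
    return droplet_volume(large_diameter_m) / droplet_volume(small_diameter_m)

nano_esi = 200e-9  # 200 nm initial droplet in nano-ESI
ratio_low = volume_ratio(1e-6, nano_esi)   # against a 1 µm conventional droplet
ratio_high = volume_ratio(2e-6, nano_esi)  # against a 2 µm conventional droplet
print(round(ratio_low), round(ratio_high))
```

A five- to tenfold smaller diameter therefore corresponds to a 125- to 1000-fold smaller volume, in line with the 100–1000-fold range given above.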

9.10.2.2.5 Atmospheric pressure chemical ionization

The atmospheric pressure chemical ionization (APCI) source is similar in design to the ESI source but the process of ionization is quite different.47 The liquid sample solution is sprayed through a heated nebulizer into the source at atmospheric pressure. A corona discharge acts to ionize the atmospheric gases and solvent molecules to generate a series of reagent ions, in a manner similar to CI. Ionization of the analyte molecules then occurs by ion–molecule reactions, with minimal fragmentation. In most cases, only singly charged ions are generated and these are then extracted out of the source into the MS analyzer. Unlike ESI, APCI actively generates ions from neutrals, making small (up to 1000–2000 Da), low to medium polarity analytes amenable to MS analysis. However, APCI is not as readily adaptable to low flow conditions as ESI because it is reliant on a concentrated cloud of solvent molecules to generate the necessary reagent ions.

9.10.2.2.6 Atmospheric pressure photoionization

Atmospheric pressure photoionization (APPI) is a relatively new technique48–51 but the source design is almost identical to that used for APCI except that the corona discharge needle is replaced by a krypton discharge lamp, which irradiates the hot vaporized plume from the heated nebulizer with photons (10 and 10.6 eV). The mechanism of direct photoionization is quite simple. Where the ionization energy of the molecule is less than the energy of the photon, absorption of a photon is followed by ejection of an electron to form the molecular radical ion $\mathrm{M^{+\bullet}}$ (Equation (28)).

$$\mathrm{M} + h\nu \rightarrow \mathrm{M^{+\bullet} + e^{-}} \quad \text{(direct APPI)} \tag{28}$$

However, in an atmospheric pressure environment, the major ion observed is $\mathrm{[M + H]^{+}}$, the result of ion–molecule reactions abstracting a proton from protic solvents to yield $\mathrm{[M + H]^{+}}$ (Equation (29)).50 Charge may also be lost by proton transfer or electron attachment.

$$\mathrm{M^{+\bullet} + S \rightarrow [M + H]^{+} + [S - H]^{\bullet}} \tag{29}$$

It should be noted that direct photoionization is not a very efficient process due to the strong absorption by the nebulizing gases and the solvent. Ionization efficiencies may be significantly enhanced by the use of a dopant such as toluene or acetone or anisole, which is added in excess to the vaporized solvent plume.50,51 These dopants can be photoionized (Equation (30)) and the resultant reagent ions are then available to ionize the analyte by ion–molecule reactions, resulting in proton transfer (Equation (31)) and charge exchange (Equation (32)).

$$\mathrm{D} + h\nu \rightarrow \mathrm{D^{+\bullet} + e^{-}} \quad \text{(dopant APPI)} \tag{30}$$

$$\mathrm{D^{+\bullet} + M \rightarrow [M + H]^{+} + [D - H]^{\bullet}} \quad \text{(proton transfer)} \tag{31}$$

$$\mathrm{D^{+\bullet} + M \rightarrow M^{+\bullet} + D} \quad \text{(charge exchange)} \tag{32}$$
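The condition for direct photoionization in Equations (28) and (30), an ionization energy below the photon energy, can be illustrated with a simple comparison. In the sketch below the photon energies (10 and 10.6 eV) are those given in the text, while the ionization energies are approximate literature values supplied as assumptions for illustration; the function name is hypothetical.

```python
PHOTON_ENERGIES_EV = (10.0, 10.6)  # krypton lamp lines quoted in the text

# Approximate ionization energies in eV (assumed, rounded literature values).
APPROX_IONIZATION_ENERGY_EV = {
    "toluene": 8.83, "anisole": 8.20, "acetone": 9.70,        # dopants named in the text
    "methanol": 10.84, "acetonitrile": 12.20, "water": 12.62,  # common LC solvents
    "N2": 15.58, "O2": 12.07,                                  # atmospheric gases
}

def photoionizable(photon_energy_ev, ionization_energies):
    """Species whose ionization energy lies below the photon energy."""
    return sorted(
        name for name, energy in ionization_energies.items()
        if energy < photon_energy_ev
    )

ionized_at_10_6 = photoionizable(10.6, APPROX_IONIZATION_ENERGY_EV)
ionized_at_10_0 = photoionizable(10.0, APPROX_IONIZATION_ENERGY_EV)
print(ionized_at_10_6, ionized_at_10_0)
```

With these assumed values only the dopants fall below either photon energy, which is consistent with their role as the photoionized species that go on to ionize the analyte by proton transfer or charge exchange.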

All the reactions are dependent on the ionization energies and proton affinities of the analyte, solvent, and dopant. Thus there are three possibilities for ionization in the positive mode, direct photoionization, proton transfer, and charge exchange. The APPI source is also an effective generator of thermal electrons and is thus well suited to the generation of negatively charged analyte ions by ECI. Thermal electrons are readily generated by the 10 eV photons striking a metal surface (3–5 eV electron binding energy) and as can be seen from Equations (28) and (30), a thermal electron is generated for every photoionization event. The great advantage of APPI is that it can be used to ionize nonpolar classes of compounds such as alkanes, alkenes, and aromatics that are not ionized by ESI or APCI and it can be interfaced with normal-phase chromatography,49,51 where the corona discharge (APCI) and the high-voltage discharge (ESI) present a potential explosion hazard. The full potential of APPI, particularly in the context of combined APCI/APPI or ESI/APPI sources, has yet to be explored. The photoionization and fragmentation of peptides/proteins is not well characterized and may represent another method, along with electron caphere dissociation (ECD) (Section 9.10.3.2.3) and electron transfer dissociation (ETD) (Section 9.10.3.2.5), of generating sequence information.51,52 Also, unlike APCI, photoionization can be applied to very low solvent flow rates (less than 5 ml min1) relying on the analyte interacting with a photon of sufficient energy, and not on the solvent as a charge carrier. This alleviates the ESI and APCI problem of ion suppression where some analytes are unable to compete for charge from the charge carriers. 9.10.2.2.7

Matrix-assisted laser desorption ionization

Matrix-assisted laser desorption ionization (MALDI), like ESI, is capable of ionizing and launching very large molecules (e.g., polysaccharides, synthetic polymers, peptides, and proteins) into the gas phase and is a major analytical tool for high-throughput proteomic studies.17,53 In many respects, MALDI is a complementary technique to ESI and both techniques are often applied to the same sample when determining protein identity. ESI produces macromolecular ions from solution, whereas MALDI produces them from the solid state. In principle, the sample is cocrystallized with a matrix onto a stainless-steel target or a target with a hydrophilic spot surrounded by a hydrophobic surface designed to concentrate the sample into a small area.54–57 The dried sample is then illuminated with a pulse of laser light (usually UV but also infrared (IR)) that is absorbed by the matrix chromophore. The photon energy is then transferred from the matrix to the embedded analyte, which in turn is ionized and desorbed from the target. Singly charged ions, [M + H]⁺, are typically produced and because this is another ‘soft’ ionization process, little fragmentation occurs. This makes for a relatively simple interpretation of the spectra; however, it must be noted that the lower end of the mass scale (<800 m/z) is dominated by a plethora of intense matrix-derived ions. The lack of multiple charging of large analytes means that MS analyzers with an extended m/z range, such as time-of-flight (ToF) (see below), must be used. Successful MALDI analysis is dependent on a number of factors, not the least of which is selection of an appropriate matrix. The matrix must be soluble in solvents compatible with the analyte (usually an aqueous/organic solvent mixture) and it must be possible to cocrystallize the analyte and matrix onto the target. The matrix must also be vacuum stable and be able to absorb at the emission wavelength of the laser.
In addition, it must be able to cause codesorption of the analyte and promote analyte ionization. See Table 2 for a list of commonly used MALDI matrices. Other important factors that need to be optimized for MALDI analysis include the molar ratio of analyte to matrix (1:10⁴ is a good starting value) and the power or fluence (energy per unit area) of each laser shot. The best spectra, in terms of minimizing fragmentation and achieving the best resolution, are acquired at just above the laser fluence for ion formation. However, at low laser power, few ions are generated by a single laser pulse, so MALDI spectra are typically accumulated over tens or even hundreds of laser pulses. One of the drawbacks with MALDI is that the quality of the spectra generated is very dependent on good sample preparation and

Table 2 Common UV-absorbing MALDI matrices (nitrogen laser, λ = 337 nm) and their area of application

| Matrix | Analyte |
| --- | --- |
| Picolinic acid (PA) | DNA, RNA |
| 3-Hydroxypicolinic acid (HPA) | DNA, RNA |
| 3-Aminopicolinic acid (APA) | DNA, RNA |
| Dihydroxybenzoic acid (DHB) | Oligosaccharides |
| α-Cyano-4-hydroxycinnamic acid (CHCA) | Peptides, lipids, oligonucleotides |
| Sinapinic acid (SA) | Proteins |
| 2-(4-Hydroxyphenylazo)benzoic acid (HABA) | Polymers |
| 2,4,6-Trihydroxyacetophenone (THAP) | Polymers, glycopeptides, oligonucleotides |
| 6,7-Dihydroxycoumarin | Lipids |

even then some parts of the sample surface, the so-called ‘sweet spots’, will generate better quality spectra than others. Practice and automated sample preparation, however, go some way in reducing this problem. The sensitivity of the MALDI technique is generally comparable with that achieved by ESI but any advantage is offset, where automation is not available, by the work required in sample preparation and the difficulty of reproducibility. However, MALDI has a clear advantage over ESI in that targets holding a successful sample preparation can be stored and exploited repeatedly, at leisure. By comparison, ESI samples are nebulized and the sample consumed. While MALDI is reputed to be relatively insensitive to contaminants (e.g., buffers, detergents, and salts), it must be said that the cleaner the sample, the better the sensitivity and the better the coverage of analytes because ionization suppression is reduced. A recent and exciting development of the MALDI technique has seen it adapted to molecular imaging of biological tissue sections (see discussion in Section 9.10.4.5).

9.10.2.2.8

Secondary-ion mass spectrometry

Secondary-ion mass spectrometry (SIMS) is an ionization technique that, with the advent of ESI and MALDI, had largely fallen out of favor with chemists and biologists. However, it has undergone something of a revival as its ability to chemically characterize a surface is now being applied to MS imaging of biological tissues (see Section 9.10.4.5). In this technique, a solid surface is bombarded with a continuous beam of highly focused, high-energy ions such as gold (Au₃⁺), cesium (Cs⁺), or bismuth (Bi₃⁺) from a liquid metal ion gun (LMIG) or ions of buckminsterfullerene (C₆₀⁺).11,58–60 These ions penetrate the sample surface to a certain depth, depositing their energy through nuclear collisions and generating secondary ions (protonated or cationized) along the way. These secondary ions (<500 m/z) are sputtered or emitted from the surface and are then directed to the entrance of the mass spectrometer for analysis.

9.10.2.2.9

Ambient ionization methods

Recently, a new family of ionization techniques that are distinguished by their ability to ionize analytes from surfaces under ambient conditions has been developed.61 These methods are also characterized by the fact that no prior separation or extraction of the sample is required. Of these methods two have so far been well characterized: desorption electrospray ionization (DESI)62 and direct analysis in real time (DART).63 DESI is closely related to ESI, with surface samples being ionized by a stream of charged solvent droplets to produce low-energy intact molecular ions. This technique has been successfully applied to a wide range of analytes (e.g., proteins, peptides, oligosaccharides, amino acids, terpenes, steroids, and lipids) that have been desorbed from a variety of surfaces, including paper, fabric, plastic, skin, and plant tissues. Ionization has also been demonstrated at up to 3 m away from the MS analyzer using an extended heated ion transfer capillary64 and sensitivities down to attomole levels have also been reported.62 DART uses a glow discharge plasma to excite a heated stream of inert gas, usually nitrogen or helium, which is directed onto the surface to be analyzed. These excited state atoms and molecules have been shown to effect, like DESI, low-energy ionization of a variety of analytes (e.g., chemical warfare agents, pharmaceuticals, explosives, peptides) from a range of different surfaces (currency, concrete, skin, plant tissue, fabric, and glass). Again, like DESI, excellent sensitivities have been reported.

9.10.2.3

Mass Analyzers

After sample ionization, the ions are passed to the mass analyzer(s) where they are separated according to their mass to charge ratio (m/z). This separation can be based on a number of different ion properties, including momentum (magnetic sectors), kinetic energy (electrostatic analyzer), path stability (linear Q’s), resonance frequencies (Q ion traps, linear ion traps), orbital frequencies (ion cyclotrons), velocity (ToF), or axial frequency (Orbitrap), as ions transit or are contained by combinations of electric and/or magnetic fields. The principle of operation, compatibility with different ionization sources, mass accuracy, mass resolution, and utility for tandem MS experiments of these different analyzers will be briefly discussed. Other factors that can be used to compare the performance of mass analyzers include the mass range limit, scan speed, efficiency of ion transmission, mass accuracy, and mass resolution (Table 3). More prosaic considerations include, of course, cost and vendor support. A more in-depth discussion of this subject matter will be found in Gross,33 McLuckey and Wells,65 Tarantin,66 and Wollnik.67 Finally, it is important to realize that there is no one analyzer that is superior to all others. The choice of analyzer, therefore, must be based on the information required from the particular type of sample, remembering that analyses based on different mass analyzers can provide complementary information.

9.10.2.3.1

Resolution and accuracy

No discussion of MS data or comparison of mass analyzers would be complete without including some definition of the data quality, particularly, the accuracy of the data and the resolving power at which they were obtained.

Table 3 Common mass analyzers: their attributes and typical specifications

| Mass analyzer | Measures | Upper mass | Resolving power | Accuracy (ppm) | Dynamic rangeᵃ | Costᵇ |
| --- | --- | --- | --- | --- | --- | --- |
| Eᶜ | Kinetic energy | | | | | |
| Bᶜ | Momentum | | | | | |
| EB or BEᵈ | | 10⁴ | 10²–10⁵ | 1–5 | 10⁹ | ++++ |
| Qᶜ | Path stability | 10⁴ | 10²–10⁴ | 100 | 10⁷ | + |
| ToFᵉ | Flight time | >10⁴ | >10⁴ | 5–50ᶠ | 10²–10⁴ | +++ |
| QITᵍ | Resonance frequency | >10³ | 10³–10⁴ | 50–100 | 10²–10³ | ++ |
| LITʰ | Resonance frequency | >10³ | 10³–10⁴ | 100 | 10²–10⁴ | +++ |
| FTICRⁱ | Orbital frequency | >10⁴ | >10⁶ at m/z 100 | <1 | 10²–10⁵ | +++++ |
| Orbitrapʲ | Axial frequency | >10⁴ | 6 × 10⁴ at m/z 400 | 2–5ᵏ | 10³–10⁴ | ++++ |

a Linear dynamic range.
b + = low cost; +++++ = high cost.
c May be configured with other analyzers for tandem-in-space experiments (e.g., QqQ, QqLIT, QqToF, and QqFTICR).
d Double-focusing BE or EB analyzer may be configured with other analyzers for tandem-in-space experiments (e.g., EBE).
e ToF combined with reflectron may be configured with other analyzers for tandem-in-space experiments (e.g., ToF–ToF, QIT-ToF, and QqToF).
f 1–5 ppm with a lock mass.
g Trapping-type instrument capable of tandem-in-time experiments and can be linked to ToF analyzer (QIT-ToF).
h Trapping-type instrument capable of tandem-in-time experiments and can be linked to Q, FTICR, or Orbitrap analyzers (e.g., QqLIT-FTICR, and LIT-Orbitrap).
i Trapping-type instrument capable of tandem-in-time experiments and can be configured to analyze fragments generated externally (e.g., QqFTICR or LIT-FTICR).
j Trapping-type instrument configured to analyze fragments generated externally (e.g., LIT-Orbitrap).
k <1 ppm with a lock mass.

[Figure 5: two panels plotting ion abundance against mass. (a) Two overlapping peaks of equal height at mass M, separated by ΔM, with the valley height marked; R = M/ΔM at x% valley height, x usually 5, 10, or 50%. (b) A single peak of height h with its width ΔM measured at h/2 (FWHM); R = M/ΔM at x% height, x usually 50%.]

Figure 5 There are two definitions of mass resolution. These are based on either two overlapping peaks of equal intensity separated by ΔM (a) or a single well-defined peak with ΔM defined as the full-width at half-maximum height (FWHM) (b).

There are two commonly used definitions for mass resolution (R). The first, used with magnetic sector instruments, is defined as the ability to separate two neighboring ions in a mass spectrum, where $\Delta M_x$ is the difference in m/z between the two peaks. The two peaks should be of equal size and similar shape and the degree of overlap (x) should be specified (Figure 5(a)). The latter is often specified as 10 or 50% of the valley height. M is the average of the two masses.

$$R = \frac{M}{\Delta M_x} \tag{33}$$

A more convenient definition, commonly used with trapping and ToF analyzers, pertains to a single well-resolved peak where $\Delta M_x$ is the peak width at a specified height x, usually half-maximum height (full-width at half-maximum height, FWHM) (Figure 5(b)). It should be noted that this FWHM definition of resolution equates to about twice that calculated from the 10% valley definition. Resolution can vary over the mass range and this should also be specified. For example, Q mass filters and ion traps are usually operated at ‘unit mass resolution’ ($\Delta M_x = 1$) constant over the whole mass range. Thus the peaks at 100 m/z and 101 m/z will be separated at a resolution of 100 and the peaks at 1000 m/z and 1001 m/z will be separated with a resolution of 1000. Mass accuracy is the difference ($\Delta M$) between the measured accurate mass M and the calculated exact mass. It can be stated in absolute units of mass (differences of so many millimass units, mmu, 10⁻³ u) or as a relative mass accuracy in parts per million.

$$\text{Relative mass accuracy} = \frac{\Delta M}{M} \times 10^{6}\ \text{ppm} \tag{34}$$
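Equations (33) and (34) translate directly into two small helper functions. The sketch below is illustrative only: the function names and the 400.0020 versus 400.0000 example masses are assumptions, while the unit-resolution cases reproduce the 100 m/z and 1000 m/z figures quoted above. Following Equation (34) literally, the error is divided by the measured mass M; dividing by the calculated exact mass instead changes the result only negligibly.

```python
def resolving_power(mass, delta_m):
    """Equation (33): R = M / delta_M, for either definition of delta_M."""
    return mass / delta_m


def relative_accuracy_ppm(measured_mass, exact_mass):
    """Equation (34): (measured - exact) / measured, scaled to ppm."""
    return (measured_mass - exact_mass) / measured_mass * 1e6


# 'Unit mass resolution' (delta_M = 1) gives a resolution that grows with m/z.
print(resolving_power(100.0, 1.0))
print(resolving_power(1000.0, 1.0))

# Hypothetical measurement: a 2 mmu error at m/z 400 is about 5 ppm.
print(relative_accuracy_ppm(400.0020, 400.0000))
```

The same absolute error therefore corresponds to a smaller ppm value at higher mass, which is why a mass accuracy figure should be quoted together with the m/z at which it was measured.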

Mass accuracy is also closely bound up with mass resolution as failure to achieve sufficient resolution of the ion of interest, away from interfering isobaric ions, will seriously impinge on the attainable mass accuracy. An appropriate level of mass accuracy and mass resolution in a mass spectrum can enable the determination of the elemental composition of the ions and can allow the analyst to distinguish, for example, glutamine from lysine ($\Delta M$ = 0.036 u) and phenylalanine from oxidized methionine ($\Delta M$ = 0.033 u) (see Section 9.10.4.3.3).

9.10.2.3.2

Magnetic and electric

The use of magnetic and electric fields to separate ions was introduced by Thomson2 in his parabola mass spectrometer. The many developments that followed on from this culminated in the modern ‘double-focusing’ mass spectrometer that is available today. In principle, ions may be deflected by magnetic (B, momentum analyzer) or electric fields (E, kinetic energy analyzer). An ion, extracted with accelerating voltage (V) from the ion source and introduced orthogonally into a magnetic field, will follow a circular trajectory, the radius (r) of which will be dependent on the ion’s m/z value, its velocity v, and the magnetic field strength B. The magnetic force zvB will be balanced by the centrifugal force $mv^2/r$. Thus

$$zvB = \frac{mv^2}{r} \quad \text{or} \quad \frac{mv}{z} = Br \tag{35}$$

Hence it can be seen that the magnetic sector separates ions according to their momentum-to-charge ratio. If the velocity (v) of the ion, as calculated from the kinetic energy ($E_k$) of the ion emerging from the source

$$E_k = zV = \frac{mv^2}{2} \tag{36}$$

is substituted into Equation (35), we derive

$$\frac{m}{z} = \frac{B^2 r^2}{2V} \tag{37}$$

from which it can be seen that changing the magnetic field (B) as a function of time will allow the successive passage of ions with varying m/z values. Ions with the same m/z value and the same kinetic energy will follow the same trajectory through the magnetic field. However, the process of ionization in the source results in ions being created with a small spread of kinetic energy. This energy dispersion then acts to limit the resolution achievable by the magnetic analyzer. This limitation can be countered by the addition of an electrostatic analyzer set to pass ions of a defined kinetic energy. An ion entering an electrostatic field travels in a circular path of radius r such that the centrifugal force is balanced by the electrostatic field strength (E).

For ions carrying z charges

$$\frac{mv^2}{r} = zE \tag{38}$$

Substituting for the ion’s kinetic energy (Equation (36))

$$r = \frac{2E_k}{zE} \tag{39}$$
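Equations (37) and (39) can be evaluated numerically to see how the two sectors behave. The following is a minimal sketch in SI units; the function names and the instrument values (1 T field, 0.30 m radius, 8 kV accelerating voltage, 40 kV m⁻¹ electrostatic field) are hypothetical and chosen only for illustration. The factor e/u converts the SI mass-to-charge ratio (kg per coulomb) into the familiar m/z scale, and the electrostatic analyzer radius uses $E_k = zV$ from Equation (36), so that r = 2V/E.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # unified atomic mass unit, kg


def sector_mz(b_tesla, radius_m, accel_volts):
    """Equation (37): m/z (u per elementary charge) passed by the magnet."""
    mass_per_charge = b_tesla ** 2 * radius_m ** 2 / (2.0 * accel_volts)  # kg/C
    return mass_per_charge * E_CHARGE / AMU


def esa_radius(energy_volts, field_v_per_m):
    """Equation (39) with E_k = z * energy_volts: r = 2 * E_k / (z * E).

    The charge z cancels, and the mass never appears, so the radius depends
    only on the kinetic energy per charge.
    """
    return 2.0 * energy_volts / field_v_per_m


# Hypothetical magnet: roughly m/z 543 is transmitted at 1 T.
print(round(sector_mz(1.0, 0.30, 8000.0), 1))
# Doubling B moves the transmitted m/z up by a factor of four.
print(sector_mz(2.0, 0.30, 8000.0) / sector_mz(1.0, 0.30, 8000.0))
# Electrostatic analyzer: an ion with 1% more energy follows a 1% larger radius.
print(esa_radius(8000.0, 40000.0), esa_radius(8080.0, 40000.0))
```

Scanning B therefore sweeps m/z quadratically, while the electrostatic analyzer sorts ions by energy alone, so a slit placed after it can remove the energy spread that limits the magnet's resolution.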

It can be seen from Equation (39) that the ion path is independent of the mass and that the electric field is a kinetic energy analyzer. The combination of the magnetic sector’s directional focusing and the electrostatic analyzer’s energy focusing results in a dramatic increase in the overall mass resolution and accuracy of the instrument. However, high resolving power is achieved at the cost of sensitivity because ions are selected within an increasingly narrow spread of energy and direction, with the rest being discarded. This double-focusing characteristic can be obtained with the magnetic and electric analyzers arranged in either of the so-called forward (EB) or reverse (BE) geometries. These types of mass spectrometers are today rarely used for biological applications, primarily because of their expense, size, and the relatively slow scan speed, which is incompatible with the trend toward fast, high-resolution LC and GC. The relatively low transmission efficiency also serves to limit the sensitivity of these instruments. The polarity of the magnetic field cannot be rapidly changed to perform, for example, PPNICI, and rapid switching to selectively monitor a discrete number of ions (selected ion monitoring, SIM) is possible only over a narrow mass range. In addition, the high-voltage sources lend themselves to discharges when interfaced to liquid chromatographs or the relatively high pressures in CI sources. The most important attribute of the double-focusing BE and EB instruments has been the acquisition of high mass accuracy and high mass resolution measurements; however, much of this demand is increasingly being met by ToF analyzers and by FTICR and Orbitrap instruments. Samples are usually introduced into these types of mass spectrometers by either a solids probe or GC.

9.10.2.3.3

Quadrupole

The Q mass filter consists of four parallel rods of circular or hyperbolic cross section (10 cm long), extending in the z direction (direction of the ion beam). A high-frequency oscillating electric field is created in the space between the rods by rapidly switching the voltages applied to the rods, with adjacent rods having opposite polarity. The voltages are made up of a DC component (U) and a radio frequency (RF) component ($V\cos\omega t$). The forces acting on ions within the central volume (radius r) of the rods are given by

$$F_x = m a_x = z\,(U + V\cos\omega t)\,\frac{2x}{r^2} \tag{40}$$

$$F_y = m a_y = -z\,(U + V\cos\omega t)\,\frac{2y}{r^2} \tag{41}$$

Ions are thus alternately attracted and repelled by the rod voltages as they pass through these quadrupolar fields along the central axis of the rods. The equations of motion are complex (Mathieu equations33), but in principle, only ions with a narrow range of m/z values will be able to traverse the field for particular values of U and V. Other ions will undergo unstable oscillations and be ejected. From these equations, it can be seen that mass and charge are the only factors describing the ion trajectories. Scanning of the mass spectrum is achieved by varying U and V while maintaining the ratio of U/V constant. Q performance is dependent on the number of RF cycles experienced by the ion as it traverses the rods, so the accelerating voltage (and thus ion velocity) applied to ions entering the rods is limited to approximately 10–20 eV. These low accelerating voltages mean that Q analyzers can tolerate higher pressures than the high-voltage sources of magnetic analyzers and are more suited to interfacing with atmospheric pressure sources (e.g., ESI and APCI). Q’s are compact, robust, and inexpensive. They have high ion transmission properties and because scanning is achieved by sweeping electric potentials, the mass range can be rapidly scanned, so they are readily adapted to interfacing with fast chromatography. The ability to rapidly change electric potentials in the source means that it is possible to rapidly switch between analyzing positive and negative ions in alternate scans, something that is impossible with BE-or EB-type instruments, which would require a change in the direction of the magnetic field. The potentials on the Q rods can also be rapidly switched to allow the selective monitoring of a discrete number of ions (SIM). Most importantly, Q’s are readily interfaced to a variety of ion sources and methods of ionization. However, the mass range is limited (2000–4000) and they are not generally capable of high mass resolution. 
The circular cross-section rods only approximate the required quadrupolar trapping fields and higher mass resolution can be achieved by the use of the more expensive hyperbolic rods. Q’s are also used in the so-called ‘RF-only’ mode (DC voltage set to zero) allowing transmission of ions with a wide range of m/z values and characteristically focusing them into the central region between the rods. This latter property means that RF-only Q’s have found wide use as ion guides or collision cells, to focus an ion beam or to improve the transmission of collision products. The amplitude of the RF voltage determines the low mass cutoff and, theoretically, all ions of m/z greater than the low cutoff value are transmitted. However, there is some discrimination against ions of high mass. Hexapoles and octapoles are used in a similar manner but have better wide band pass characteristics. All these RF-only multipole devices are designated ‘q’ in the shorthand used to describe instrumental configurations. These RF-only multipoles are commonly found in hybrid mass spectrometers used for tandem MS (Section 9.10.3) serving as collision cells and to efficiently transport ions between differentially pumped regions of the instrument.
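The stable and unstable trajectories described above, and the low mass cutoff of an RF-only Q, can be illustrated by integrating Equation (40) numerically. This is a sketch under stated assumptions rather than a simulation of a real instrument. Substituting ξ = ωt/2 into Equation (40) gives the dimensionless form d²x/dξ² = (a + 2q cos 2ξ)x, with a = 8zU/(mr²ω²) and q = 4zV/(mr²ω²); these are the standard Mathieu parameters, which the text does not define explicitly. The function name, step size, and integration length are arbitrary choices. For the RF-only case (a = 0) the standard Mathieu stability result is that motion is bounded only for q below about 0.908.

```python
import math


def max_excursion(q, a=0.0, xi_end=50.0, step=0.005):
    """Integrate d2x/dxi2 = (a + 2*q*cos(2*xi)) * x with a fixed-step RK4
    scheme, starting from x = 1 at rest, and return the largest |x| reached."""

    def accel(xi, x):
        return (a + 2.0 * q * math.cos(2.0 * xi)) * x

    x, v, xi = 1.0, 0.0, 0.0
    peak = abs(x)
    for _ in range(int(round(xi_end / step))):
        k1x, k1v = v, accel(xi, x)
        k2x = v + 0.5 * step * k1v
        k2v = accel(xi + 0.5 * step, x + 0.5 * step * k1x)
        k3x = v + 0.5 * step * k2v
        k3v = accel(xi + 0.5 * step, x + 0.5 * step * k2x)
        k4x = v + step * k3v
        k4v = accel(xi + step, x + step * k3x)
        x += step * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += step * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        xi += step
        peak = max(peak, abs(x))
    return peak


# RF-only mode (a = 0): q = 0.4 lies inside the stable region, q = 2.0 outside.
stable_peak = max_excursion(0.4)
unstable_peak = max_excursion(2.0)
print(stable_peak, unstable_peak)
```

Because q is proportional to V/(m/z), raising the RF amplitude pushes the lightest ions past the stability limit first, which is why the RF amplitude sets a low mass cutoff while heavier ions continue to be transmitted.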

Q mass spectrometers may be found interfaced with most of the sample introduction and ionization methods described above, with the exception of MALDI.

9.10.2.3.4

Quadrupole 3D-ion trap

The quadrupole ion trap (QIT) is about the size of a small fist and consists of a ring electrode and two hyperbolic end electrodes (see March and Todd68 for a detailed theory of operation and history of development). Like the linear ion trap (LIT, see below), the QIT operates at relatively high pressure (10⁻³ torr) with a helium buffer gas that assists the ions to maintain a stable orbital frequency. The buffer gas also serves as the collision gas for collision-induced dissociation (CID) during MS/MS experiments. Ions may be created inside the QIT or, more commonly, externally. An oscillating saddle field inside the trapping volume contains and focuses the ions into the center of the trap. From here the operator can scan the ions out of the trap to create a classic full mass spectral scan of the ions in the trap. Alternatively, a particular ion can be selected (isolated) and collisionally fragmented, and a scan of all the product ions generated (MS² scan). This whole process can be repeated with any one of these fragment ions (MS³ scan) and, as long as there are sufficient ions remaining in the trap to provide an adequate signal-to-noise ratio (S/N), the process can be repeated again (Figure 6). The number of ions that can be retained in the QIT, or indeed in any trapping-type instrument, is limited by space charging effects. Space charging occurs when the cloud of ions becomes sufficiently dense that coulombic repulsion between the like-charged ions starts to overcome the trapping potential, resulting in degraded mass resolution and accuracy. Limiting the number of ions in the trap at any one time normally controls this effect. The QIT is compatible for use with the full range of methods for introducing solids, liquids, and gases (solids probe, GC, and LC) and with all the ionization methods described above including MALDI.

[Figure 6: four-panel schematic of the ion trap, showing the ring electrode, the two end cap electrodes, the detector, the helium buffer gas, and the analyte ions. (a) Ion accumulation, with ions injected from the source; (b) precursor ion isolation; (c) collision-induced dissociation of selected ion; (d) mass analysis of product ions.]

Figure 6 Schematic of collision-induced dissociation (CID) in the quadrupole ion trap (QIT) (MS² experiment). In separate events, ions from the source are accumulated and trapped in the space at the center of the electrodes (a). Ions with a specified m/z value are retained in the trap and all others ejected (b). The specified ions are then collisionally fragmented by axial excitation between the two end caps (c). The resulting product ions are then sequentially ejected to generate the product ion spectrum (d). In an MS³ experiment, one of these product ions may be selectively retained in the trap, excited, and fragmented.

9.10.2.3.5

Linear 2D-ion trap

The two-dimensional linear ion trap (2D-LIT) is a logical development of the Q mass filter, described above, in that by the imposition of appropriate potentials at the entrance and exit of the Q’s, ions with a range of m/z values can be trapped within the axial quadrupolar field (see March and Todd69 for a detailed theory of operation and history of development). In common with the QIT, the LIT operates at relatively high pressure (10⁻³ torr) with a helium buffer gas. The buffer gas collisionally cools the ions and also acts as a collision gas for MS/MS experiments.70,71 The LIT has several advantages over the QIT. The larger volume means that more ions can be contained within the LIT before space charging becomes evident. This results in a greater dynamic range and improved sensitivity that can translate into lower detection limits for MS/MS analysis. Trapping efficiencies are also enhanced, as ions entering the trap have to overcome the trapping potential only on the front section. Once in the trap, the ions are collisionally cooled by interaction with the helium buffer gas and thereafter lack the energy to escape the trapping potential on the front and back sections. This is in contrast to the QIT, where there is only a narrow time window in which the amplitude and phase of the RF voltage are such that ions can pass through the end cap to enter the trap. At other phases and amplitudes, ions will have either too little or too much momentum, so that the ions do not experience a sufficient number of collisions with the QIT buffer gas to be cooled and trapped.70 This limits the trapping efficiency for the QIT to <5% compared to 29% for the LIT. In summary, the LIT has a significant sensitivity advantage over the QIT.
Using mass selective instability with resonance ejection, ions are scanned out of the trap through slits in the center of two opposite center section rods and focused onto two separate conversion dynodes. In the case of the QIT, where ions are scanned out of both end cap electrodes, the only place for a detector is behind the end cap opposite the ion entrance, so that only half of the ions scanned out of the trap are detected. Both the QIT and LIT operate at unit mass resolution with similar scan rates and both have the capacity to generate higher resolution spectra at slower scan rates. In theory, the LIT should have the same universal utility as the QIT in terms of the types of samples and in being interfaced with GC or LC but to date only the LC interface is commercially available.

9.10.2.3.6

Orbitrap

A new mass analyzer, the Orbitrap, is a modified development of the ‘Knight-style’ Kingdon trap.68,72–73 The Orbitrap radially traps ions about a central spindle electrode that is contained by an outer barrel-like electrode maintained at a vacuum of better than 3 × 10⁻¹⁰ torr. The m/z values of the ions are then measured from the frequency of the ions’ harmonic oscillations along the axis of the central electrode. These axial frequencies are independent of the energy and spatial spread of the ions and they are detected as a broadband image current, a time-domain signal that is converted to a mass spectrum by fast Fourier transform algorithms.74 The Orbitrap is available as a stand-alone instrument and as a hybrid consisting of a linear ion trap coupled to the Orbitrap via a C-trap, which is responsible for focusing and injecting ions tangentially into the Orbitrap (LTQ-Orbitrap). The performance characteristics of this analyzer are quite remarkable, with mass accuracies of <2 ppm at a resolving power of 60 000 using an external calibration75 and of <1 ppm with internal calibration.76 As such, it has attracted the attention of analysts and instrument developers alike. New features have included ETD (see Section 9.10.3.2.5) and options for higher energy collisions in the C-trap or in an additional octapole collision cell.77

9.10.2.3.7

Time-of-flight

Conceptually, the ToF analyzer is very simple, in that ions of the same kinetic energy, $E_k$ (extracted from the ion source with accelerating voltage V), but differing m/z values take different times t to traverse a fixed distance d. Thus lighter ions travel the fastest and are detected before the heavier ones. For an ion of mass m, the electric charge q is equal to the number z of electron charges e (ez).

$$E_k = ezV = \frac{mv^2}{2} \tag{42}$$

$$t = \frac{d}{v} \tag{43}$$

Substituting for velocity into Equation (43) yields

$$t^2 = \frac{m}{z}\,\frac{d^2}{2Ve} \tag{44}$$
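Equation (44) can be turned into a short calculation of flight times. The sketch below is illustrative: the function name and the instrument values (1 m flight path, 20 kV accelerating voltage) are hypothetical, and the model is the idealized one of the text, in which every ion starts at the same place with exactly the kinetic energy ezV.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # unified atomic mass unit, kg


def flight_time(mz, distance_m, accel_volts):
    """Equation (44): t = d * sqrt((m/z) / (2 * V * e)), with m/z in u."""
    return distance_m * math.sqrt(mz * AMU / (2.0 * accel_volts * E_CHARGE))


# Hypothetical instrument: 1 m flight path, 20 kV accelerating voltage.
for mz in (100.0, 1000.0, 4000.0):
    print(mz, round(flight_time(mz, 1.0, 20000.0) * 1e6, 2), "microseconds")
```

Flight times scale with the square root of m/z, so a fourfold increase in m/z only doubles the flight time. A whole spectrum is therefore collected within tens of microseconds, and neighbouring masses arrive only nanoseconds apart, which is why fast timing electronics matter.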

To measure the flight time, the ions must be accelerated from the source in discrete packets. The resolving power of this simple experimental setup, linear ToF, was not good and after some initial popularity, the technique languished. The resolution was limited by the fact that at the time when ions are accelerated out of the source, they are not neatly lined up at the starting line. Rather they are positioned throughout the source and have a range of different kinetic energies. For the ultimate resolution, ions of the same mass (isobaric ions) positioned anywhere within the ion source need to arrive simultaneously at the detector.78,79 When the laser-induced ion plume is formed, there is no immediate application of the source extraction field and the plume is allowed to expand as if in a field-free region. If we consider just a group of isobaric ions, the more energetic ions fly faster and reach further into the source region than less energetic ones. Then at a chosen time, one of the electrodes of the extraction region is appropriately pulsed with high voltage to create the extraction potential. The ions in the tailing end of the plume (the originally less energetic) find themselves in a higher potential than the rest, and eventually acquire slightly higher velocity, enough to catch up with the leading-end ions by the time they reach the detector position. Variations in the longitudinal velocity of isobaric ions can also be corrected by the use of a reflectron. This is basically an electric field that initially slows the ions and then accelerates, or reflects, them back out toward the detector. The more energetic ions will penetrate deeper into the decelerating field than less energetic ions of the same m/z value and experience a longer flight path and a longer flight time. The end result is that ions of a given m/z value will arrive at the detector in a much narrower time span (time focusing). 
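The time focusing produced by a reflectron can be checked with an idealized model. This is a sketch under simplifying assumptions that are not taken from the text: a single field-free drift length, a single-stage mirror with a uniform retarding field, and no contribution from the source region. In this model the total flight time is t = L/v + 2v/a, where a is the deceleration in the mirror, and setting dt/dv = 0 shows that first-order focusing is obtained when the drift length L equals four times the penetration depth of the nominal ion. All names and numbers below are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
AMU = 1.66053906660e-27      # unified atomic mass unit, kg


def speed(mz, energy_volts):
    """Ion speed for a kinetic energy of energy_volts per elementary charge."""
    return math.sqrt(2.0 * E_CHARGE * energy_volts / (mz * AMU))


def linear_time(mz, energy_volts, drift_m):
    """Flight time through a field-free drift tube only."""
    return drift_m / speed(mz, energy_volts)


def reflectron_time(mz, energy_volts, drift_m, mirror_field_v_per_m):
    """Drift time plus the turnaround time in a uniform retarding field."""
    v = speed(mz, energy_volts)
    decel = E_CHARGE * mirror_field_v_per_m / (mz * AMU)
    return drift_m / v + 2.0 * v / decel


def relative_spread(times):
    """Arrival-time spread as a fraction of the shortest flight time."""
    return (max(times) - min(times)) / min(times)


MZ, U0, DRIFT = 1000.0, 20000.0, 2.0
FIELD = 4.0 * U0 / DRIFT           # first-order focus: drift = 4 * (U0 / field)
energies = [0.99 * U0, U0, 1.01 * U0]   # isobaric ions with a +/-1% energy spread

linear = relative_spread([linear_time(MZ, u, DRIFT) for u in energies])
mirror = relative_spread([reflectron_time(MZ, u, DRIFT, FIELD) for u in energies])
print(linear, mirror)
```

In this model the more energetic ions spend less time in the drift region but longer in the mirror, and the two effects cancel to first order. The ±1% energy spread produces an arrival-time spread of about 1% in the linear case and several hundred times less with the mirror.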
The combination of delayed extraction, to compensate for positional differences of the ions, the addition of one or more reflectrons in the flight path, to compensate for different ion kinetic energies, and fast digital electronics, can boost the mass resolution of the ToF analyzer to better than 10⁴ (FWHM). At the start of the ToF renaissance, these analyzers were associated with MALDI sources as the discontinuous laser pulses are ideally suited to the pulsed nature of the ToF analyzer. However, continuous ion beams (e.g., EI and ESI) have also been coupled with ToF analyzers.78 This has been achieved by locating the ToF analyzer orthogonal to the continuous ion beam axis. An orthogonal accelerating voltage is then applied to the beam and a discrete linear ion packet can then be pulsed into the ToF. During the time that the ions are moving in the drift region, and in the reflectron, the orthogonal acceleration volume is refilled by the continuous beam, hence the high mass analyzer efficiency that is characteristic of ToF analyzers. For illustrative purposes, Guilhaus et al.78 compared the approximate mass analyzer efficiency of a Q scanning over a 1000 u mass range and a ToF analyzer, with calculations of 0.025 and 25% maximum efficiencies, respectively. As Guilhaus et al.78 have stated, ‘In scanning instruments some of the mass range is detected all of the time while in TOF instruments all of the mass range is detected some of the time.’

ToF analyzers are relatively small and of medium expense and so represent a good alternative to magnetic sector and Q analyzers, especially when their speed and sensitivity advantages are considered. Their mass accuracy and ease of calibration are also well established. ToF analyzers also have the highest practical mass range of all mass analyzers. However, the digitizer speed may place limitations on the instrumental dynamic range. The very fast acquisition rates that are achieved in ToF analyzers mean that they are also ideally suited to analyze fast GC separations, with the added benefit that coeluting components are much more readily deconvoluted than when a slower analyzer such as a Q is used.
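The timing demands behind these resolution figures can be made concrete with a short sketch. It uses the standard energy balance zeU = ½mv² for an idealized linear ToF with a single field-free drift region; the 20 kV accelerating voltage, the 1 m drift length, and the function names are assumptions chosen for illustration and do not describe any particular instrument.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # unified atomic mass unit, kg


def flight_time(mz, accel_voltage, drift_length):
    """Drift time in seconds for an ion of given m/z (u per elementary charge)."""
    # z*e*U = (1/2)*m*v**2  =>  v = sqrt(2*e*U / (m/z))
    velocity = math.sqrt(2.0 * E_CHARGE * accel_voltage / (mz * DALTON))
    return drift_length / velocity


def resolving_power(arrival_time, peak_width):
    """m/dm = t / (2*dt), because m is proportional to t**2."""
    return arrival_time / (2.0 * peak_width)


if __name__ == "__main__":
    # Assumed values for illustration only: 20 kV acceleration, 1 m drift tube.
    t = flight_time(1000.0, 20_000.0, 1.0)
    print(f"m/z 1000 arrives after {t * 1e6:.2f} microseconds")
    print(f"peak width needed for m/dm = 10^4: {t / (2 * 1e4) * 1e9:.2f} ns")
```

Under these assumptions the arrival-time spread of isobaric ions must be held to roughly a nanosecond, which is why delayed extraction, reflectrons, and fast digitizers all matter.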

9.10.2.3.8

Fourier transform ion cyclotron resonance

The FTICR is a trapping-type instrument with the ICR cell being held within the field of a superconducting magnet.18,33,80 The cell itself consists of three pairs of opposing plates in the form of a cube or a cylinder. Ions are injected into the cell along the axis of the magnetic field and are then electrostatically trapped within the cell by the trapping potential placed on the two trapping plates that are orthogonal to the direction of travel. These ions are then subjected to an excitation pulse from the excitation plates and they will then, under the direction of the Lorentz force, spiral out from the center of the cell into a circular orbit. As noted above (Section 9.10.2.3.2), ions introduced orthogonally into a magnetic field will, under the direction of the Lorentz force, follow a circular trajectory, the radius (r) of which will be dependent on the ion’s m/z value, its velocity v, and the magnetic field strength B. The Lorentz force, qvB (q, charge; v, velocity), can be equated to the centripetal force

$$\frac{mv^{2}}{r} = qvB \tag{45}$$

and the angular frequency (ω) of the ions trapped in these circular orbits (cyclotron motion) is given by

$$\omega = \frac{v}{r} \tag{46}$$

so that substituting for v from Equation (46) into Equation (45) yields

$$m\omega^{2}r = q\omega rB, \qquad \omega = \frac{qB}{m} \tag{47}$$

the cyclotron equation. From Equation (47) it can be seen that while the ion cyclotron frequency (ω) of an ion is a function of its mass, charge, and the magnetic field, it is independent of the ion’s initial velocity. The cyclotron orbits of thermal energy ions when they first enter the ICR cell are both too small and incoherent to be detected. However, if an excitation pulse is applied at the cyclotron frequency, the resonant ions will absorb energy and be brought into phase with the excitation pulse. They will have a larger orbital radius and the ion packets will orbit coherently. The ions may then be detected as an image current induced in the receiver plates. Additionally, this excitation pulse increases the kinetic energy of the trapped ions to the extent that fragmentation can be collisionally induced by ion–molecule reactions. Alternatively, the excitation pulse may be used to increase the cyclotron radius so that ions are ejected from the ICR cell. Normally, many different ions will be present within the cell but they may all be excited by a rapid frequency sweep. The m/z values of the ions present in the ICR cell, and their abundance, may then be extracted mathematically from the resultant complex image current using a Fourier transformation to generate the mass spectrum of the ions. An important feature of FTICR is that the ions are detected nondestructively and that longer acquisition times over a narrower m/z range may be used to increase the measured mass resolution and the S/N. FTICR instruments have a stringent requirement for a very low background pressure (10⁻¹⁰ torr) to minimize ion–molecule reactions and for this reason most analytical experiments are accessed through a variety of external ion sources that are separated from the cell by several stages of differential pumping. This vacuum requirement and the cryogenic cooling needed to run the superconducting magnet make this form of MS very capital intensive and expensive to run.
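As a numerical illustration of Equation (47), the following sketch converts an m/z value and a field strength into a cyclotron frequency, and uses Equation (45) to show that the orbit radius, unlike the frequency, depends on the ion's speed. The 7 T field and the ion speed are assumed example values, and the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # unified atomic mass unit, kg


def cyclotron_frequency_hz(mz, b_tesla):
    """f = omega / (2*pi) = q*B / (2*pi*m), from Equation (47)."""
    return E_CHARGE * b_tesla / (2.0 * math.pi * mz * DALTON)


def orbit_radius_m(mz, b_tesla, speed):
    """r = m*v / (q*B), Equation (45) rearranged."""
    return mz * DALTON * speed / (E_CHARGE * b_tesla)


if __name__ == "__main__":
    b_field = 7.0  # assumed magnet strength in tesla
    for mz in (100.0, 1000.0):
        f = cyclotron_frequency_hz(mz, b_field)
        print(f"m/z {mz:6.0f}: f = {f / 1e3:8.1f} kHz")
    # Doubling the speed doubles the radius but leaves the frequency unchanged.
    print(orbit_radius_m(1000.0, b_field, 1000.0))
    print(orbit_radius_m(1000.0, b_field, 2000.0))
```

Because each m/z maps to its own frequency regardless of ion velocity, a Fourier transform of the image current can separate the contributions of all ions excited by the frequency sweep.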
However, this is offset by the extraordinary mass accuracy (sub-ppm with internal calibration), mass resolution (>10⁶ at 100 u), and sensitivity (able to detect a few hundred ions at a time) that may be achieved.18 FTICR instruments can also serve as platforms for a variety of unique dissociation techniques (e.g., IRMPD, infrared multiphoton dissociation; ECD; EDD, electron detachment dissociation; see Section 9.10.3.2) and this combined with their high mass accuracy and high mass resolution means they are ideally suited to identify and characterize large intact biomolecules – the ‘top-down’ approach.81

FTICR instruments that are designed to analyze low-molecular-weight molecules, such as VOCs, do not require superconducting cryogenic magnets and can be built using structured permanent magnets. Dehon et al.43 built a dedicated PTR-FTICR (proton-transfer reaction Fourier transform ion cyclotron resonance) containing a cascade of three differentially pumped cells within the same magnetic field. The first cell is used as an ion source (10⁻⁵ torr), from which the selected H₃O⁺ ions are drifted via the second cell into the third cell where they react with the sample (10⁻⁷–10⁻⁵ torr). After the reaction, the ions are drifted back to the second cell for FTICR analysis. Although this instrumental approach is not as sensitive as PTR-MS instruments (1 ppm compared with 0.1 ppb), the mass resolution and mass accuracy of the FTICR mean that molecular formulas may be readily determined for the VOCs.

9.10.2.3.9

Ion mobility spectrometry

In drift tube ion mobility spectrometry (IMS), a packet of ions is drawn through an inert gas under the influence of a weak electric field. The extent of interaction with the inert gas and the rate of progress through the drift tube are dependent on the collisional cross section (shape and size) of the ion and on the number of charges carried by the ion. The requirement to gate the packets of ions entering the IMS and the need to wait for the ions to clear the drift tube result in a low duty cycle. If the sample is being supplied in a continuous flow, as in, for example, an ESI source, then much of the sample will be lost to the analysis. When this is combined with losses through radial diffusion, the overall sensitivity of the technique is poor. Nevertheless, the prospect of a technique to preprocess ions prior to MS analysis has proved attractive and in recent times two variations of this technique, circumventing these disadvantages, have been successfully developed for combination with MS.

In high-field asymmetric waveform ion mobility spectrometry (FAIMS), a continuous stream of ions is fed into the device inlet in a stream of dry carrier or bath gas.82,83 The ions are then exposed to alternating strong and weak electric fields of opposite polarity across the carrier gas flow. The differential collisional interaction of ions with the carrier gas in the oscillating asymmetric fields results in different ions experiencing a net movement to one or the other wall electrode. If no other voltage is applied, the ions will eventually collide with one of the wall electrodes and be lost. However, if a low compensation voltage (CV) of correct magnitude and polarity is applied, then selected subsets of ions will be passed to the mass analyzer with a concomitant increase in their S/N and improved detection limits. For mixtures of ions, the CV can also be scanned.
The ion separation achieved in the FAIMS device can also be refined by the use of different carrier gases.82

In the traveling wave IMS (TWIMS),84 ions are initially accumulated in a trap ion guide and then released as an ion packet into the ion mobility ion guide. Here axial motion through the stack is generated by a repeating sequence of transient DC voltages providing a continuous series of ‘traveling waves’. Ions are then separated as they are driven ahead of these potential hills through the stacked ring ion guides before transfer to the MS analyzer. Although a relatively new adjunct to MS, IMS, whether of the FAIMS or the TWIMS variety, has demonstrated a wide-ranging usefulness, particularly with respect to analyzing complex mixtures. It has been used, for example, to separate positional isomers of small molecules, to remove chemical noise and thereby improve detection limits and sensitivities of assays, and to examine conformational forms of multicharged protein ions. In proteomic experiments, IMS can be used to select out triply charged ions for ETD and doubly charged ions for CID, ignoring the singly charged peptides, solvent ion clusters, and other chemical noise (e.g., phthalate ions). Both FAIMS and TWIMS can be used with existing LC techniques to separate ions in a continuous stream and are in principle compatible with all types of ion sources and analyzers.

9.10.3 Tandem Mass Spectrometry

As you will see in the following section, it is quite common for an instrument to contain more than one analyzer. A shorthand nomenclature has been adopted to describe such instrumental configurations, using the analyzer abbreviations outlined above (Section 9.10.2.3), in which the order of the abbreviations represents the order of
the analyzers traversed by the ion beam. For example, QqQ designates the very common triple Q instrument with the two scanning Q’s separated by an RF-only Q that acts as the collision chamber. Other examples will be discussed below. The ‘soft’ ionization processes described above (Section 9.10.2.2) typically generate single- or multicharged molecular ions with little accompanying fragmentation. To obtain structurally informative fragments, these ions must be subject to a second round of mass spectral analysis. This is known as MS/MS or tandem MS. In the first MS stage, an ion is selected or isolated in the mass spectrometer, activated and fragmented, most commonly by CID, and the product ions mass analyzed in the second MS stage. Depending on the instrument being used, it is possible to perform multistage mass spectrometry (MSn) and to construct ion fragmentation pathways as part of an exercise in structural elucidation. It is also possible to use tandem MS to add a large degree of selectivity and to improve sensitivity in an assay by removing background chemical noise (see discussion below, Sections 9.10.4.2.2 and 9.10.4.5.7). With the demand for the analysis of increasingly complex samples, often coupled with a ‘soft’ ionization process, tandem MS along with mass determination with high accuracy and resolution has become an essential feature of modern biological mass spectrometers.

9.10.3.1

Analyzers

9.10.3.1.1

Tandem-in-space

For the beam-type mass analyzers (sector, ToF, and Q), each stage of mass analysis is performed in discrete mass analyzers usually separated by a collision cell. This arrangement is called tandem-in-space. The use of multiple analyzers means that analyzers can be independently selected for the different stages of analysis based on the desired performance characteristics. Two common instrumental configurations for tandem-in-space experiments are the so-called QqQs, which consist of two Q mass filters, Q1 and Q3, separated by an RF-only Q collision cell (q) (Figure 7), and the QqToF class of instruments, which use a ToF analyzer in place of the third Q. The QqQ analyzer arrangement has the advantages of cost and ease of operation associated with Qs but leaves the analyst with limited mass resolution and mass accuracy with which to select and analyze ions. The replacement of the third Q by a ToF analyzer, although representing an increase in cost, gives the operator access to high-resolution/high mass accuracy data, in addition to greatly improved full scan sensitivity.85 Today’s generation of collision cells use RF-only multipoles (hexapoles and octapoles) or ring guides, which have improved transmission characteristics over the RF-only Q, but these are still commonly denoted as ‘q’ in instrumental shorthand. Recent developments in instrumentation have seen the commercial release of traps combined with ToF analyzers (QIT-ToF), quadrupoles with traps (QqLIT), and traps with traps (LIT-FTICR, LIT-ToF, and LIT-Orbitrap), all taking advantage of the MSn capabilities of the ion trap mass analyzers. The ability to select ions in a separate analyzer, prior to the final stage of MS analysis, serves to enhance the dynamic range and sensitivity of the final MS analysis.
The development and characteristics of these hybrid combinations have been reviewed by Glish and Burinsky86 and Hager.87 The ToF–ToF combination with high mass accuracy and high mass resolution in both MS stages is also commercially available.88

9.10.3.1.2

Tandem-in-time

Tandem MS may also be performed in time using a trapping-type analyzer (e.g., LIT, QIT, and FTICR) (Figure 6). The experimental efficiency of this arrangement is usually higher than that of tandem-in-space instruments as ions do not have to be transferred between analyzers; however, experiments take longer to complete, and any sample presented to the mass analyzer from a continuous source while the trap is in the analysis mode will be lost. The different stages of the tandem-in-time experiment all take place in a temporal sequence within the same physical space. In these experiments, the selected precursor ion is retained in the trap and all other ions are expelled. The selected ion is then activated and fragmented and the fragments are analyzed to generate the MS/MS (MS2) spectrum of the precursor ion. As long as there are sufficient ions still available in the trap, this process may be extended by selectively retaining one of the fragment ions and repeating the fragmentation process to generate the MS/MS/MS or MS3 spectrum.

Figure 7 Scan modes for a tandem-in-space instrument, the triple quadrupole (QqQ). (a) Full scan: all source ions are passed through to Q3 while Q1 and q (collision cell) are set to the RF-only mode. (b) Product ion scan: Q1 is set to pass a selected ion (precursor ion). This is fragmented in the collision cell and the products are analyzed by scanning Q3. (c) Precursor scan: Q1 scans all the source ions into the collision cell for collision-induced dissociation (CID). Q3 is set to pass a selected product ion. A signal recorded at Q3 is correlated with the corresponding precursor ion passing through Q1. (d) Neutral loss scan: Q1 is set to scan ions into the collision cell for CID. The Q3 scan is offset by a specified mass, equal to the mass of the neutral, relative to Q1. (e) Selected reaction monitoring (SRM): an ion selected in Q1 is fragmented and a specific fragment is then recorded after selection by Q3. SRM is commonly used in quantitative work to improve assay selectivity and sensitivity.
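The scan modes in Figure 7 differ only in which of Q1 and Q3 is held fixed and which is scanned. One way to see this is to treat each CID event as a (precursor m/z, product m/z, intensity) record and express each mode as a filter over those records. The sketch below is a toy model with invented transition data and hypothetical function names; it says nothing about how real instrument control software works.

```python
from typing import List, Tuple

# (precursor m/z, product m/z, intensity); all values are invented examples.
Transition = Tuple[float, float, float]
TOL = 0.5  # assumed unit-resolution selection window, in m/z


def product_ion_scan(data: List[Transition], precursor: float):
    """Q1 fixed on one precursor, Q3 scanned: report its products."""
    return [(prod, i) for prec, prod, i in data if abs(prec - precursor) <= TOL]


def precursor_scan(data: List[Transition], product: float):
    """Q1 scanned, Q3 fixed on one product: report the precursors giving it."""
    return [(prec, i) for prec, prod, i in data if abs(prod - product) <= TOL]


def neutral_loss_scan(data: List[Transition], loss: float):
    """Q1 and Q3 scanned together at a fixed offset equal to the neutral mass."""
    return [(prec, i) for prec, prod, i in data if abs(prec - prod - loss) <= TOL]


def srm(data: List[Transition], precursor: float, product: float) -> float:
    """Q1 and Q3 both fixed: summed intensity of a single transition."""
    return sum(i for prec, prod, i in data
               if abs(prec - precursor) <= TOL and abs(prod - product) <= TOL)


if __name__ == "__main__":
    events = [(500.0, 402.0, 10.0), (500.0, 300.0, 5.0),
              (450.0, 352.0, 8.0), (450.0, 300.0, 2.0), (600.0, 420.0, 7.0)]
    print(product_ion_scan(events, 500.0))
    print(precursor_scan(events, 300.0))
    print(neutral_loss_scan(events, 98.0))
    print(srm(events, 500.0, 402.0))
```

In this picture SRM is the most restrictive filter, which is consistent with its use for improving selectivity in quantitative assays.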

9.10.3.2

Fragmentation

9.10.3.2.1

Collision-induced dissociation

Fragmentation of ions in tandem experiments requires an input of energy to break internal covalent bonds. This is most commonly achieved by converting the kinetic energy of a collision, between the selected ion and an inert collision gas such as helium or argon, into vibrational energy. Fragmentation then occurs when the internal energy exceeds the activation energy required to cleave a particular bond. It is also important to note that bond cleavage may be preceded by an internal rearrangement, such as, for example, hydrogen scrambling or the McLafferty rearrangement. For tandem-in-space experiments, CID occurs in a collision cell, physically located in the field-free region between the mass analyzers (Figure 7). The cell is differentially pumped and the flow of gas into the cell is carefully controlled. Increasing the collision gas pressure attenuates the main beam and, at the same time, the probability of ions undergoing single, double, triple, etc. collisions will increase, as will the scattering of the ion beam. Modern gas cells are usually either an RF-only multipole or a set of ring guides that are designed to contain and refocus, as much as possible, ions scattered from the direction of travel of the main beam. In high-energy collisions (keV), the collision gas is usually helium as its high ionization energy reduces the risk of charge exchange. In low collision energy systems (1–200 eV), heavier gases such as argon or xenon have been used to improve the effectiveness of the CID process. The extent of fragmentation obtained from CID in tandem-in-space configurations is dependent on both the energy of the ions entering the collision cell and the pressure of the collision gas. Higher energy ions will be able to access fragmentations with higher activation energies, and a higher pressure of the collision gas will result in multiple collisions that produce more extensive fragmentation, as the fragment ions themselves fragment further. For small ions, a single collision may be sufficient to induce the dissociation of a covalent bond; however, as the size of an ion increases, so does the number of vibrational degrees of freedom over which the collisional energy may be distributed.
Thus the effectiveness of CID decreases with mass. Nevertheless, CID is quite effective for relatively large, multicharged polypeptides (up to 5 kDa), where the bond cleavage may be assisted by the coulombic repulsion of the multiple charges. The CID spectra generated in traps are qualitatively different from those generated by tandem-in-space experiments. In traps, only the selected ion is activated and once fragmentation of this ion has occurred, no further collisions can take place. Thus the fragment ion spectrum generated in a trap will be simpler and less informative than that from a tandem-in-space experiment. Where the precursor ion undergoes, for example, the simple loss of a neutral such as water, the product ion spectrum will be relatively uninformative, consisting mainly of [MH − H₂O]⁺. However, this may be readily overcome by a broadband activation, which is applied to all ions in a range 20 m/z below the precursor ion. FTICR cells are typically operated under very high vacuum (10⁻¹⁰ torr) and CID within the cell must be initiated by injecting a collision gas after an ion packet of a designated m/z value has been selected by expulsion of all other ions from the cell. The selected ions are then activated by sustained off-resonance irradiation (SORI). This results in the ion orbit expanding and contracting with time and in the process the ions undergo multiple, low-energy collisions with the injected collision gas. The collision gas is then pumped away and the fragment ion spectrum measured. Alternatively, if the FTICR is part of a tandem-in-space instrument (e.g., QqFTICR or LIT-FTICR), then ions can be fragmented by CID outside of the cell, with the product ions presented to the FTICR for mass analysis.
Fragmentation in the FTICR cell may also be accomplished by photon-induced dissociation (PID, Section 9.10.3.2.2), ECD (Section 9.10.3.2.3), and EDD (Section 9.10.3.2.4) (see also discussion in Section 9.10.3.2.6 on the use of CID/IRMPD and ECD/ETD for protein/peptide sequencing, and Table 4). Finally, when using ESI or APCI, it is also possible to perform in-source CID at atmospheric pressure. This is sometimes referred to as ‘pseudo-MS/MS’. By increasing the entrance cone voltage, newly formed ions can be accelerated toward the entrance cone, colliding with other molecules, mostly atmospheric nitrogen, and fragmenting. It is important to note that there is no mass selection for a precursor ion and that selection is entirely based on chromatographic separation.
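The preference for heavier collision gases at low collision energies, and part of the fall-off in CID efficiency for large ions, can be rationalized with the standard two-body kinematic result that only the center-of-mass fraction of the laboratory collision energy is available for conversion into internal energy in a single collision. This relation is not stated in the text above and is brought in here as an assumption; the ion mass and the 50 eV laboratory energy are example values only.

```python
# Approximate average atomic masses of common collision gases, in u.
GAS_MASS_U = {"He": 4.0026, "Ar": 39.948, "Xe": 131.293}


def center_of_mass_energy(e_lab_ev, ion_mass_u, gas_mass_u):
    """Upper limit (eV) on energy convertible to internal energy per collision.

    E_com = E_lab * m_gas / (m_gas + m_ion), assuming a stationary target gas.
    """
    return e_lab_ev * gas_mass_u / (gas_mass_u + ion_mass_u)


if __name__ == "__main__":
    e_lab = 50.0  # assumed laboratory-frame collision energy, eV
    for ion_mass in (200.0, 1000.0, 5000.0):
        row = ", ".join(
            f"{gas}: {center_of_mass_energy(e_lab, ion_mass, m):5.2f} eV"
            for gas, m in GAS_MASS_U.items()
        )
        print(f"ion {ion_mass:6.0f} u -> {row}")
```

For a fixed laboratory energy the available energy shrinks as the ion grows and rises with the mass of the target gas, in line with the trends described for low-energy CID.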

Table 4 Comparison of CID/IRMPD with ECD/ETD for peptide sequencing

CID/IRMPD
- Molecules are vibrationally excited by either physical collision with a neutral gas (CID) or by absorption of an IR photon (IRMPD).
- Vibrational energy can be distributed over the whole molecule. Larger analytes need more energy and efficiency drops off with increasing size.
- The weakest, most labile bonds break first (loss of water, PTMs, etc.).
- In peptides, CID/IRMPD generates y- and b-series ions.
- Low-energy interactions do not allow isomeric Leu/Ile to be distinguished.

ECD/ETD
- An electron is directly (ECD) or indirectly, via an anion (M⁻• from fluoranthene) (ETD), transferred to a cation (positively charged peptide). Not applicable to singly charged ions.
- The electron is accepted by an amide-associated proton on the peptide backbone. This very unstable radical reacts very quickly to cleave the peptide bond at the site of reaction.
- The rapid process does not allow time to distribute energy over the whole molecule. Can be applied to whole proteins in ‘top-down’ proteomic analysis.
- Bonds cleaved are those that accept the electron, not simply the weakest bonds. PTMs are preserved.
- Can distinguish isomeric Leu/Ile via secondary fragmentation of radical z• ions.
- In peptides and proteins, ETD generates c- and z-series ions but some y ions may also be observed. Masses of c and z ions may be 1 Da lighter or heavier, respectively, because of extensive hydrogen rearrangement.

In modern instrumentation, these techniques may both be applied to the same peptide to generate complementary sequence data and PTM data (IRMPD and ECD in FTICR MS,89 CID and ETD in QIT and LIT,90 and CID and ETD in QToF91).

9.10.3.2.2

Photon-induced dissociation

Gas-phase ions may be fragmented by photoexcitation (PID), particularly by IR photons tuned to the vibrational frequency of covalent bonds. The cross section of an ion for photon absorption is low compared with its collisional cross section, so PID is most commonly associated with the use of intense light sources (lasers) and FTICR, where the period the ion is exposed to the photons can be lengthened to increase the chance of absorption and subsequent dissociation. For IRMPD, these IR photons may be supplied by a laser (10.6 μm from a CO₂ laser) or they may be radiated from a heated blackbody (blackbody infrared dissociation, BIRD). IRMPD, BIRD, and CID all induce fragmentation by the addition of excess vibrational energy to covalent bonds and consequently yield very similar patterns of fragmentation. For example, when applied to protonated peptides and proteins, CID and IRMPD cleave the weakest bonds, the PTMs (e.g., phosphorylation, sulfation, γ-carboxylation, and N- and O-glycosylation) and the backbone peptide amide C–N bonds, to yield the characteristic series of N-terminal b ions and C-terminal y ions (Figure 8). The advantage of IRMPD and BIRD over CID is that no pump-down time is required to remove the collision gas and thus high-resolution detection can be effected immediately.
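The ‘multiphoton’ in IRMPD follows directly from the small energy carried by a single infrared photon. The sketch below applies E = hc/λ to the 10.6 μm CO₂ laser line and counts how many such photons would have to be absorbed to reach a dissociation threshold. The 1 eV threshold is a purely hypothetical round number used for illustration, not a value taken from the text.

```python
import math

PLANCK = 6.62607015e-34      # Planck constant, J s
LIGHT_SPEED = 299_792_458.0  # speed of light, m/s
E_CHARGE = 1.602176634e-19   # J per eV


def photon_energy_ev(wavelength_m):
    """Energy of one photon, E = h*c / wavelength, in eV."""
    return PLANCK * LIGHT_SPEED / wavelength_m / E_CHARGE


def photons_needed(threshold_ev, wavelength_m):
    """Minimum whole number of photons whose summed energy reaches the threshold."""
    return math.ceil(threshold_ev / photon_energy_ev(wavelength_m))


if __name__ == "__main__":
    co2_line = 10.6e-6  # CO2 laser wavelength in metres
    print(f"photon energy: {photon_energy_ev(co2_line):.3f} eV")
    # Hypothetical 1 eV dissociation threshold, for illustration only.
    print("photons for 1 eV:", photons_needed(1.0, co2_line))
```

Each photon delivers only about a tenth of an electronvolt, so energy has to be accumulated over many absorption events, which fits the need for long irradiation periods in a trapping analyzer.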

[Figure 8: backbone of a tetrapeptide (AA1–AA4, side chains R1–R4) drawn from the amino (N-) terminus to the carboxy (C-) terminus, with cleavage sites labeled a1–a3, b1–b3, c1–c3 and x1–x3, y1–y3, z1–z3.]

Figure 8 Roepstorff and Fohlman92 notation for peptide fragmentation. For the x-, y-, and z-ion series, charge is retained on the C-terminus fragment and for the a-, b-, and c-ion series, charge is retained on the N-terminus fragment. Cleavage of the Cα–C bond gives rise to the a and x ions (e.g., by EDD); cleavage of the C–N amide bond, the b and y ions (e.g., CID); and cleavage of the N–Cα amine bond (e.g., by ECD or ETD), the c and z ions.
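To make the b/y notation concrete, the sketch below computes singly protonated b- and y-ion m/z values from monoisotopic residue masses. Only a handful of residues are tabulated, the peptide GASK is an arbitrary example, and the function names are hypothetical; c and z ions are omitted because their masses depend on hydrogen bookkeeping conventions.

```python
# Monoisotopic residue masses (u) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841,
    "L": 113.08406, "I": 113.08406, "K": 128.09496,
}
PROTON = 1.007276  # mass of a proton, u
WATER = 18.010565  # monoisotopic mass of H2O, u


def b_ions(peptide):
    """m/z of singly charged b1..b(n-1): N-terminal residues plus a proton."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    return [sum(masses[:i]) + PROTON for i in range(1, len(masses))]


def y_ions(peptide):
    """m/z of singly charged y1..y(n-1): C-terminal residues + H2O + proton."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    return [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]


if __name__ == "__main__":
    for name, series in (("b", b_ions("GASK")), ("y", y_ions("GASK"))):
        print(name, [round(mz, 4) for mz in series])
    # Leu and Ile have identical residue masses, so b/y series cannot tell them apart.
    print(b_ions("GLK") == b_ions("GIK"), y_ions("GLK") == y_ions("GIK"))
```

Each b ion and its complementary y ion sum to the mass of the doubly protonated peptide, a relationship that is useful for checking fragment assignments.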

9.10.3.2.3

Electron capture dissociation

An alternative method for achieving covalent bond cleavage in the FTICR cell, and one that has been mostly applied to sequencing peptides and proteins, is ECD.93,94 The multicharged peptide and protein ions from ESI are an ideal target for ECD as the cross section for electron capture increases by approximately the square of the ionic charge. The capture of a thermal electron (0 eV) is an exothermic reaction and, in protonated peptides and proteins, results in cleavage of disulfide (S–S) bonds along with cleavage of the backbone N–Cα amine bond, yielding the characteristic complementary pairs of c and z• (90%) or a• and y (10%) fragment ions used for sequencing.

$$[\mathrm{M} + n\mathrm{H}]^{n+} + e^{-} \rightarrow \left([\mathrm{M} + n\mathrm{H}]^{(n-1)+\bullet}\right)_{\mathrm{transient}} \rightarrow \text{fragments} \tag{48}$$

$$\mathrm{R_1{-}S{-}S{-}R_2} + e^{-} \rightarrow \mathrm{R_1{-}SH} + {}^{\bullet}\mathrm{S{-}R_2} \tag{49}$$

A unique feature of ECD is that the N-terminal fragment ions, the c ions, contain an extra hydrogen atom from the proton neutralized by the electron capture. The complementarity of the c/z• pair can thus be confirmed by the fact that their mass sum is 1 u greater than the Mr of the protein. The ECD process, by its nature, is a very rapid process and bond dissociation occurs faster than the redistribution of intramolecular vibrational energy that occurs with CID. This explains the dissociation of the strong N–Cα amine bonds in the presence of the weaker C–N amide bonds in peptides and proteins.93,94 Consequently, any labile PTMs (e.g., phosphorylation, sulfation, γ-carboxylation, N- and O-glycosylation) are preserved and may be unequivocally located in the peptide/protein sequence. See also discussion in Section 9.10.3.2.6 on the use of ECD/ETD and CID/IRMPD for protein/peptide sequencing, and Table 4.

Recently, it has been observed that ECD can also occur for electrons with energies in the range of 3–13 eV,95 the so-called hot ECD (HECD), with the excess energy going into secondary fragmentation, including cleavage of the C–N amide bonds (b- and y-ion series) in multicharged peptides. Significantly, the isobaric isoleucine and leucine residues were reported as losing •C₂H₅ and •C₃H₇, respectively, allowing these isomeric amino acids to be distinguished.96 Unlike CID, where the applied intramolecular vibrational energy can be redistributed and dissipated across the whole molecule, so that the efficiency of CID is diminished with increasing analyte size, ECD can be used to sequence large, undigested proteins. This has enabled the development of the ‘top-down’ approach to proteomics, which has the advantage of directly sequencing a protein, along with its PTMs, rather than having to infer the sequence following an enzymic digestion (e.g., trypsin or Lys-C) and an in silico reassembly from the MS/MS data on the enzymic peptides. It has also been noted that there are a number of side chain losses in ECD that can aid, for example, in distinguishing the isobaric amino acid residues, leucine and isoleucine. However, the ECD technique is accessible only in the expensive FTICR instruments (see also discussion in Section 9.10.3.2.6 on the use of CID/IRMPD and ECD/ETD for protein/peptide sequencing, and Table 4).

9.10.3.2.4

Electron detachment dissociation

EDD is a promising new FTICR technique, and is the negative ion complement to ECD. Both these electron-mediated techniques involve a radical ion intermediate, produced by either electron attachment to multiply charged cations (ECD) (Equations (48) and (49)) or electron removal from multiply charged anions (EDD) (Equation (50)).

$$[\mathrm{M} - n\mathrm{H}]^{n-} + e^{-}_{20\,\mathrm{eV}} \rightarrow \left([\mathrm{M} - n\mathrm{H}]^{(n-1)-\bullet}\right)_{\mathrm{transient}} + 2e^{-} \rightarrow \text{fragments} \tag{50}$$

Many compounds such as glycosaminoglycans (GAGs), nucleic acids, acidic peptides, or peptides with acidic PTMs such as phosphorylation or sulfation do not readily form positive ions, especially in mixtures where ion formation is favored by the more basic mixture components. When positive ions can be formed, then the spectra are usually characterized by the abundant loss of the PTM (e.g., sulfate from GAG) or, in the case of oligonucleotides, a proton from the sugar–phosphate backbone.

However, these acidic compounds do readily form negative ions. Exposure of these anions to energetic electrons (20 eV) is reported to produce a ‘positive radical charge (hole)’ which is exothermically neutralized by an electron.97 For peptides, this results in Cα–C bond cleavage, to form complementary a• and x fragment ions, with retention of the acidic PTM.94,98 Thus like ECD, EDD can also cleave covalent bonds without affecting weaker noncovalent interactions. EDD has also been shown to preferentially cleave S–S and C–S bonds.99 For GAGs, structurally informative glycosidic bond cleavages and cross ring cleavages can be generated without loss of the labile sulfate group,100,101 and the glucuronic acid and iduronic acid epimers in heparan sulfate tetrasaccharides can also be distinguished.102 EDD has also been used to partly characterize synthetic polyamidoamine dendrimers103 and fragmentation was found to complement that obtained with CID. Complete sequences of short oligonucleotides of both DNA and RNA have been determined with EDD104,105 and EDD may also be used to probe the tertiary structure of nucleic acids.106

9.10.3.2.5

Electron transfer dissociation

With respect to peptide/protein sequencing, ECD has some very attractive features, producing random cleavages along the peptide backbone (c- and z-type ions) and at the same time preserving labile PTMs such as phosphorylation. Unfortunately, this technique is not readily transferable from FTICR instruments to the relatively low-cost and more common instruments that trap ions by RF electrostatic fields (QIT and LIT), where the bulk of this work is performed, as these analyzers are unable to trap the required dense cloud of thermal electrons. However, the Hunt group107,108 have developed an alternative method of delivering electrons to multiply charged cations, using anion–cation interactions to effect ETD. Radical anions (M⁻•) of a polyaromatic hydrocarbon, usually fluoranthene, are generated by methane CI, externally to the trap.

$$\mathrm{C_{16}H_{10}} + e^{-}_{\mathrm{thermal}} \rightarrow \mathrm{C_{16}H_{10}}^{-\bullet} \quad (m/z\ 202) \tag{51}$$

These radical anions are then injected into the trap where they are mixed with the multiply charged peptide cations, to which an electron is then transferred, leading to their direct dissociation into c- and z-type ions by the same mechanism responsible for ECD. The process is rapid (milliseconds) and quite compatible with the chromatographic timescale of LC–MS.

$$[\mathrm{M} + 3\mathrm{H}]^{3+} + \mathrm{C_{16}H_{10}}^{-\bullet} \rightarrow [\mathrm{M} + 3\mathrm{H}]^{2+\bullet} + \mathrm{C_{16}H_{10}} \tag{52}$$

$$[\mathrm{M} + 3\mathrm{H}]^{2+\bullet} \rightarrow [\mathrm{c} + 2\mathrm{H}]^{+} + [\mathrm{z} + \mathrm{H}]^{+\bullet} \tag{53}$$

ETD is also applicable to large intact proteins but the multiply charged fragments are difficult to interpret because of the limited resolving power of the Q traps. However, it is possible to deprotonate the multiply charged fragment ions and to reduce their charge by a further round of cation–anion interactions with even-electron benzoate anions.108

$$[\mathrm{M} + 7\mathrm{H}]^{7+} + 6\,\mathrm{C_6H_5COO^{-}} \rightarrow [\mathrm{M} + \mathrm{H}]^{+} + 6\,\mathrm{C_6H_5COOH} \tag{54}$$
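The benefit of the charge reduction in Equation (54) can be illustrated numerically. For a protonated ion, m/z = (M + n × 1.007276)/n, and neighboring isotopic peaks are separated by roughly 1.003/n on the m/z axis, so high charge states crowd the isotope pattern beyond what a low-resolution trap can separate. The neutral mass of 7000 u below is a hypothetical example, and the isotope spacing is taken as the ¹³C–¹²C mass difference, which is an approximation.

```python
PROTON = 1.007276            # mass of a proton, u
ISOTOPE_SPACING = 1.003355   # 13C - 12C mass difference, u (approximation)


def mz_of(neutral_mass, charge):
    """m/z of an [M + nH]n+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def isotope_peak_spacing(charge):
    """Approximate m/z distance between adjacent isotopic peaks."""
    return ISOTOPE_SPACING / charge


if __name__ == "__main__":
    mass = 7000.0  # hypothetical neutral fragment mass, u
    for n in (7, 3, 1):
        print(f"{n}+ : m/z {mz_of(mass, n):9.3f}, "
              f"isotope spacing {isotope_peak_spacing(n):.3f}")
```

Reducing the charge from 7+ to 1+ widens the isotope spacing sevenfold, at the cost of moving the ion to a much higher m/z.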

More recently, it has been demonstrated that the reagent anions used for either ETD or proton transfer can be derived from the same neutral compound. The radical anions used for ETD, [M]⁻•, are converted into even-electron proton transfer reagent anions, [M + H]⁻, by changing the potential on the methane CI source.109 This voltage switch can be achieved in milliseconds, allowing for rapid sequential ion–ion reactions, and opens up the possibility of top-down sequencing of intact proteins in RF ion traps. It is unusual for either CID or ETD to provide complete sequence information from any one peptide but the use of both techniques provides complementary information, which can greatly extend the sequence coverage (Table 4). In addition, because the energy from the ETD process is directed into cleaving the N–Cα bond, the labile PTMs are preserved and their location in the peptide sequence can then be determined90 (see also discussion in Section 9.10.3.2.6 on the use of CID/IRMPD and ECD/ETD for protein/peptide sequencing, and Table 4).

356 Mass Spectrometry: An Essential Tool for Trace Identification and Quantification

It should be noted that ETD is a relatively inefficient process for doubly protonated peptide precursors, [M + 2H]2+, which are the ions most commonly found in ‘bottom-up’ proteomics experiments. This situation may be retrieved, however, by using a supplemental low-energy CID method (ETciD) to target the nondissociated electron transfer (ET) product, [M + 2H]+•. CID of the ET product then yields c- and z-type fragment ions. Swaney et al.110 have reported that in a large-scale analysis of doubly charged tryptic peptides, the use of ETciD resulted in a median sequence coverage of 89% compared to 63 and 77% for ETD and CID, respectively.

9.10.3.2.6

Combined use of dissociation techniques

In proteomics experiments, neither CID/IRMPD (b- and y-series ions) nor ECD/ETD (c- and z-series ions) fragmentation, when used on its own, is capable of generating a complete set of sequence ions from which an unambiguous primary structure of a peptide may be derived (IRMPD and ECD in FTICR MS,86 CID and ETD in QIT and LIT,90 CID and ETD in QToF91). Thus database searching, which is at the heart of MS/MS-based proteomics, is vulnerable to misidentification of peptides (false positives).111 The respective characteristics of both these processes are summarized in Table 4. In CID/IRMPD, the C–N bond is cleaved to generate the well-defined b- and y-series ions and, in ECD/ETD, the N–Cα bond is cleaved to generate the c- and z-series ions. However, the latter series is not so well defined as the masses of the c and z ions may be 1 Da lighter or heavier, respectively, due to hydrogen rearrangements.112 The other important difference between these two processes is the retention of PTMs (phosphorylation of serine, threonine, and histidine, along with the N- and O-glycosylations) in ECD/ETD and the ability to distinguish the isomeric amino acids leucine and isoleucine. On the basis of the complementarity of CID/IRMPD and ECD/ETD, Zubarev et al.111 have concluded that de novo sequencing of peptides using these two fragmentation techniques in conjunction with high mass accuracy113,114 can be achieved with >95% reliability. They have furthermore stated that it is ‘. . . only de novo sequencing which can guarantee error-free sequence identification.’
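The mass relationships between the two ion series can be made concrete with a short calculation. The sketch below is illustrative only: the names `RESIDUE` and `fragment_ions` are invented here, the residue table is a small subset of standard monoisotopic residue masses rounded to five decimals, and only singly charged, unmodified ions without hydrogen rearrangement are considered. In the usual convention, a c ion is the corresponding b ion plus NH3, and a z• ion is the corresponding y ion minus NH3 plus one hydrogen atom.

```python
# Sketch: singly charged b/y (CID-type) and c/z-radical (ECD/ETD-type)
# m/z values for every backbone cleavage site of a short peptide.
RESIDUE = {  # monoisotopic residue masses in u (subset, for illustration)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "K": 128.09496, "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276   # u
H_ATOM = 1.007825   # u
H2O = 18.010565     # u
NH3 = 17.026549     # u


def fragment_ions(seq):
    """Return one dict per cleavage site with b, y, c and z-radical m/z (1+)."""
    ions = []
    for i in range(1, len(seq)):
        n_part = sum(RESIDUE[a] for a in seq[:i])   # N-terminal residues
        c_part = sum(RESIDUE[a] for a in seq[i:])   # C-terminal residues
        b = n_part + PROTON
        y = c_part + H2O + PROTON
        ions.append({
            "site": i,               # cleavage between residue i and i + 1
            "b": b,
            "y": y,
            "c": b + NH3,            # c = b + NH3
            "z": y - NH3 + H_ATOM,   # z-radical = y - NH3 + H
        })
    return ions


if __name__ == "__main__":
    for ion in fragment_ions("GASK"):
        print(ion)
```

Because leucine and isoleucine have identical residue masses, the backbone ions computed this way are the same for both; telling the two apart depends on further side-chain fragmentation, which this sketch does not model.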

The complementarity of ECD/ETD has also been confirmed in a comprehensive comparison by Molina et al.115 on some 19 000 peptides. They found that by combining the respective peptide fragmentation data they could achieve a 92% sequence coverage for an average tryptic peptide. ECD may also be used in a complementary manner with the newly developed EDD technique. Although the fragmentation efficiency of EDD (average 3.6%) was low compared with ECD (average 15.7%), Kjeldsen et al.116 have recently demonstrated that the combination of the two techniques could increase the overall amino acid sequence coverage of proteins and PTM characterization. Further developments in the use of these complementary combinations of dissociation techniques will aid in generating a more comprehensive and reliable system of identifying and characterizing proteins and their PTMs. Such progress is likely to be based on a comprehensive understanding of gas-phase peptide chemistry and fragmentation.117
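One simple way to see why combining fragmentation data raises sequence coverage is to treat each technique as covering a set of backbone cleavage sites and to take the union. The sketch below assumes coverage is defined as the fraction of backbone bonds for which at least one fragment ion is observed (definitions vary between studies), and the cleavage-site sets are invented purely for illustration.

```python
# Sketch: sequence coverage from one technique versus two combined.
def sequence_coverage(n_residues, cleaved_sites):
    """Fraction of the n_residues - 1 backbone bonds with an observed cleavage."""
    n_bonds = n_residues - 1
    valid = {s for s in cleaved_sites if 1 <= s <= n_bonds}
    return len(valid) / n_bonds


# Hypothetical cleavage sites observed for a 10-residue peptide.
cid_sites = {1, 2, 3, 5, 8}
etd_sites = {3, 4, 6, 7, 9}
combined_sites = cid_sites | etd_sites

if __name__ == "__main__":
    for name, sites in [("CID", cid_sites), ("ETD", etd_sites),
                        ("combined", combined_sites)]:
        print(f"{name}: {sequence_coverage(10, sites):.0%}")
```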

9.10.4

Experimental Use of Mass Spectrometry

Whether you are using MS for identification or quantification, it is important to first have as much information about the sample as possible, particularly the matrix, and whether the sample is a mixture or a pure compound, and to clearly identify the data that need to be obtained. This will influence decisions on

• type of instrument required (e.g., do you require exact mass data – high accuracy and resolution, or selected reaction monitoring (SRM) for a complex sample),
• method of presentation to the mass spectrometer (e.g., solids probe, GC, or LC),
• most appropriate method of ionization, and finally
• the scan mode to be used.


In this section, we will look briefly at factors to be considered in selecting an ionization method, the choice of a scan mode, how mass spectral data can be used to identify an unknown compound or a known compound, and the factors to be considered in setting up a quantitative mass spectral assay.

9.10.4.1

Spoilt for Choice – Which Ionization Method to Choose?

The general range of application for each of the ionization methods described above (Section 9.10.2.2) is illustrated in Figure 9. For low-molecular-weight samples (<1000 mass units) of moderate polarity, there will be several options. If analyte identification is required, then GC/MS will be a good option as the GC will permit high-resolution separation of mixture components and the use of EI will generate spectra that can be searched against large databases of EI spectra (e.g., NIST-Wiley). Also molecular weight can be confirmed by the use of CI. However, samples for GC/MS analysis will generally need to have polar functional groups (e.g., carboxylic acids, hydroxyls, amines) derivatized prior to analysis to improve volatility and thermal stability. There are a very large number of possibilities for derivatization and the reader is best referred to the very comprehensive literature that is available (e.g., Knapp,21 Blau and Halket,22 Halket and Zaikin,23–27,118 Zaikin and Halket,46,119). Identification may be further aided by acquiring high-resolution, high mass accuracy data from which elemental formulas may be derived, particularly if constraints can be introduced with respect to the presence and number of particular elements (see Section 9.10.4.3.3). If samples are to run without derivatization, then recourse may be had to the ‘soft’ liquid spray ionization processes, APCI, APPI, and ESI. ESI is the best choice for polar molecules (Figure 9) such as drugs and their metabolites and is by far the best choice of these three for peptides and proteins. At the relatively nonpolar end of the spectrum, APCI and APPI will be the preferred choice. However, APPI will have an advantage in terms of operability at low flow rates and its ready application to normal-phase chromatography and to lower polarity compounds than APCI. 
All three of these ionization methods produce molecular ions, and perhaps some adduct ions, yielding molecular weight information, but little fragmentation. In these cases, CID must be used in a tandem MS experiment to generate structurally informative fragmentation. Unfortunately, there is little in the

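[Figure 9: plot of relative molecular mass (Mr, logarithmic scale from 10 to 10^5) against analyte polarity (neutral, polar, ionic), with regions marked for GC/MS (EI and CI), LC/MS APCI, LC/MS APPI, LC/MS ESI, and MALDI.]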

Figure 9 Approximate ranges of analyte polarity and size that may be suited to different ionization techniques. With respect to the surface desorption techniques, DESI and DART, they are comparable in their range of application to ESI and APCI, respectively.


way of MS/MS libraries available for searching, and structural elucidation will have to rely on user-generated libraries and on an interpretation of the spectra from first principles.5,31–33 Interpretation will be greatly assisted by the acquisition of high mass accuracy, high mass resolution data to generate elemental formulas (Section 9.10.4.3.3). It is worth noting that despite the very widespread use of tandem mass spectrometers, there are no standard conditions for the acquisition of MS/MS data (see Hopley et al.,120 for a discussion on the attempts to develop tandem MS/MS libraries). This is because the amount of energy that can be put into the fragmentation process is dependent on both instrument design and the experimental conditions. For beam instruments (tandem-in-space), the variable experimental conditions will include the energy of the ions entering the collision cell, the collision gas (e.g., Ar, He, N2), the pressure of the collision gas, and the dimensions of the collision cell. Similarly, for trap-type instruments, the extent of fragmentation will be dependent on the collision gas, energy of the ions, type of fragmentation (CID, PID, ECD, EDD, ETD), and duration of the process. This is in contrast to EI-generated spectra that are normally acquired with electrons of 70 eV energy. The induced EI fragmentation is readily reproduced across all brands of MS instruments with some variation in the intensity of fragment ions. In summary, where samples require chromatography prior to MS, the separation technique as well as the size and polarity of the analyte will influence which ionization technique will be most appropriate. Thus GC/ MS will use EI or CI and LC/MS will use ESI, APCI, APPI, or a combined source (e.g., APCI/APPI or ESI/ APPI) to ionize the chromatographic eluant. In the case of LC/MS, the most effective form of ionization may not be easily predicted and some experimentation may be required. 
Table 5 Comparison of ESI-MS with MALDI-MS

Advantages of ESI-MS
• Typical sensitivity in the range of femtomole to low picomole or attomole concentration. Best sensitivity combining capillary LC and nano-ESI.
• Can be readily interfaced to LC outlet, permitting multidimensional chromatography (MudPIT analysis of whole proteomes).
• Desalting and sample cleanup can be performed online.
• Soft ionization with little fragmentation; however, labile posttranslational modifications such as phosphates are often lost.
• Soft ionization permits observation of noncovalently bound protein complexes.
• Multiple charging of proteins and peptides permits analysis of high-mass ions on low-mass range analyzers.
• Multicharged ions fragment more efficiently in CID than singly charged ions.

Advantages of MALDI-MS
• Typical sensitivity in the range of femtomole to low picomole or attomole concentration.
• Sample analysis is very rapid – a few seconds to analyze mixture of peptides.
• Sample can be stored and reanalyzed at leisure.
• Sample may be further purified in situ if required.
• Soft ionization with little fragmentation. Phosphate groups may be retained at very low laser power.
• Proteins and peptides usually ionized with a single charge, simplifying interpretation.
• High practical mass limit but at low resolution.

Disadvantages of ESI-MS
• Sample injected onto LC column, or directly injected into source, is totally consumed.
• Multiple charging of proteins and peptides complicates interpretation of MS and MS/MS data.
• Mixture analysis requires use of LC interface to minimize problems of ion suppression.
• Gradient chromatography of a single sample can require tens of minutes to hours to complete.

Disadvantages of MALDI-MS
• Intense matrix background below 800 Da.
• Can only be interfaced with LC in an off-line mode.
• Cannot analyze noncovalently bound protein complexes.
• High-throughput productivity requires automation of sample preparation.
• Reflectron required for good mass resolution.

These are complementary techniques and while many analytes, including proteins and peptides, may be equally well ionized by either method, some will only be ionizable by ESI and not MALDI and vice versa.

For high-molecular-weight samples, most commonly proteins and peptides, but also polysaccharides and synthetic polymers, the choice of an ionization method will be limited to ESI (Section 9.10.2.2.4) and/or MALDI (Section 9.10.2.2.7) (see also Table 5 for a comparison of ESI and MALDI). As mentioned above, MALDI is a solid-phase-based ionization technique and ESI is a flow-based liquid technique. Both readily generate gas-phase ions, with MALDI-generated ions being singly charged and ESI-generated ions carrying multiple charges. Samples for ESI are readily, and conveniently, analyzed in an online mode accepting the separated analytes from an interfaced LC column. However, this means that the MS analysis must be completed in the time it takes to elute each individual component. In very complex samples, where separation is incomplete, analysis must be completed very quickly and it is possible that minor components, or poorly ionized components, may not be analyzed at all. It should be noted that capillary LC, while yielding enhanced chromatographic separation, also results in narrower chromatographic peaks and a requirement for even faster MS analysis, particularly where quantification is desired. MALDI, on the other hand, is not readily interfaced with chromatography and is usually performed off-line in conjunction with an automated plate spotter, yielding highly reproducible mixtures of sample and matrix and consequently very consistent high-quality MALDI spectra. However, the size of the fractions collected compromises, to some extent, the chromatographic separation but this disadvantage is offset by the ability to archive the samples stored on the MALDI target so that the MS analysis can be repeated at leisure. When coupled with an automated off-line plate spotter, MALDI is faster than ESI and is often used for high-throughput proteomic analysis. Spectra generated by ESI are more complex than MALDI because of the multicharging phenomenon. However, multiply charged ions are more amenable to CID because the coulombic repulsion of like charges aids the fragmentation process. CID of MALDI-generated singly charged ions lacks this advantage. As the size of the analyte ions increases, the efficiency of CID diminishes, as there is a corresponding greater capacity for the ion to absorb and redistribute the impact energy.
Nevertheless, this has proven to be a very effective method of amino acid sequencing for ‘bottom-up’ proteomics on peptides (typically 800–5000 Da) from trypsin digests. In general, MALDI and ESI are of comparable sensitivity (femtomolar to low picomolar levels of peptides/ proteins); however, it is impossible to make definitive comparisons for two reasons. First, there have been, and continue to be, technological advances improving the sensitivity of both techniques whereby, for example, the use of special hydrophobic surfaces on MALDI targets has been matched by the development of nano-ESI. Second, it has been observed that in complex proteomic analyses, perhaps only 30–50% of all proteins are adequately ionized by both ESI and MALDI, with the remainder being best ionized by either ESI or MALDI alone. Thus there is a good case to be made for the use of both techniques in a comprehensive proteomic analysis (Table 5).
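The multiple charging that complicates ESI spectra also carries the information needed to recover the molecular mass. As a minimal sketch, assuming two adjacent peaks in a charge envelope arise from purely protonated forms of the same molecule differing by one charge, the charge state and neutral mass follow from the two m/z values alone. The function names and the example mass are hypothetical.

```python
# Sketch: charge-state assignment from two adjacent ESI peaks of the same
# analyte, assuming [M + zH]z+ and [M + (z+1)H](z+1)+ with proton adducts only.
PROTON = 1.007276  # u


def mz_protonated(neutral_mass, z):
    """m/z of [M + zH]z+."""
    return (neutral_mass + z * PROTON) / z


def deconvolute_pair(mz_low, mz_high):
    """Return (z, M): charge of the higher-m/z peak and the neutral mass.

    mz_high carries z charges and mz_low carries z + 1, so
    (mz_high - p) / (mz_low - p) = (z + 1) / z, which rearranges to
    z = (mz_low - p) / (mz_high - mz_low).
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass


if __name__ == "__main__":
    true_mass = 16951.5  # hypothetical protein mass in u
    low, high = mz_protonated(true_mass, 11), mz_protonated(true_mass, 10)
    print(deconvolute_pair(low, high))
```

Real spectra contain measurement error and other adducts, so practical deconvolution software fits many charge states at once rather than relying on a single pair.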

9.10.4.2

MS Scan Modes

9.10.4.2.1

Single MS analyzer (nontrapping) scan modes In the case of beam-type mass analyzers (B, BE, Q, ToF, and hybrids), the analyzer is commonly operated to scan over a defined mass range, generating a mass spectrum of all ions generated in the ion source. This is often referred to as a full scan (Figure 10(a)). Alternatively, the Q, B, or BE analyzers can be set to pass a selected ion or a selected series of ions (Figure 10(b)). This is known as SIM. This scan mode is commonly used for single Q analyzers (Section 9.10.2.3.3) because they can be rapidly switched between different ions over a large mass range. The SIM mode is designed to enhance the sensitivity of an assay by concentrating the analyzer time onto only the ions of interest. For example, if instead of scanning a mass range of 500, the analyzer is set to monitor just 5 m/z values, then the number of ions counted in each of those channels will be 100 times that observed in the scanning mode. This improvement in ion statistics translates directly into improved sensitivity; however, this must be offset against the loss of a great deal of analytical information. It should also be noted that SIM is inappropriate for ToF analyzers because instead of scanning, they sample the entire mass range at any one time (Section 9.10.2.3.7). However, postacquisition processing of ToF data can be used to extract the time-based intensity trace for any of the ions in the mass range monitored. SIM is also problematic for B analyzers except over a narrow mass range; however, a double-focusing instrument (BE or EB) (Section 9.10.2.3.2) does have the possibility to further enhance selectivity by performing SIM with high mass resolution, as does the more common ToF analyzer.
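The factor of 100 quoted above follows directly from the ratio of the scanned range to the number of monitored channels. The sketch below reproduces that arithmetic under stated simplifications: the cycle time is the same in both modes, time is shared equally between channels, and switching overhead is ignored. The square-root estimate of the signal-to-noise gain additionally assumes that the noise is dominated by ion counting (Poisson) statistics, which is an assumption of this example rather than a statement from the text.

```python
# Sketch: gain in ion counts per channel when switching from a full scan
# to selected ion monitoring (SIM) at a fixed cycle time.
import math


def sim_count_gain(scan_width_mz, n_sim_channels):
    """Ratio of time (and hence ions counted) per m/z channel, SIM vs full scan."""
    return scan_width_mz / n_sim_channels


def shot_noise_snr_gain(count_gain):
    """S/N improvement if noise follows counting statistics (assumption)."""
    return math.sqrt(count_gain)


if __name__ == "__main__":
    gain = sim_count_gain(500, 5)
    print(f"ion count gain: {gain:.0f}x, S/N gain: {shot_noise_snr_gain(gain):.0f}x")
```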


[Figure 10: schematic of ions passing through a single analyzer to the detector. (a) Scan. (b) Selected ion monitoring (SIM), with the analyzer set to select ions.]

Figure 10 Scan modes for a single beam-type analyzer (e.g., Q, B, E). (a) Full scan. (b) Selected ion monitoring scan, commonly used in quantitative work to improve assay sensitivity.

9.10.4.2.2

Tandem MS scan modes

There are five main scan modes possible using MS/MS and these will be described and illustrated using the QqQ as an example (Figures 7(a)–7(e)). Double-focusing BE or EB instruments are also capable of tandem MS using linked scans to monitor ion transitions; however, they suffer from the disadvantage of having to either select ions with low mass resolution or detect ions with low mass resolution. Consequently, these instruments are rarely used today for tandem MS (see Gross,33 for further discussion).

The first scan type to consider is the full scan of all the ions generated in the ion source using no mass selection. This is done by setting both Q1 and Q2 to pass all ions (RF-only mode) through to Q3, which is then scanned in the normal way (Figure 7(a)) to generate a spectrum of ions present in the source.

The product ion scan entails the mass selection of a precursor ion in the first stage (Q1), fragmentation (CID or ETD) in the collision cell, and then mass analysis of all resultant fragment masses in the second stage of mass analysis (Q3) (Figure 7(b)). This experiment can be performed by beam (tandem-in-space) or trap (tandem-in-time) instruments. It is commonly performed to identify transitions used for quantification by tandem MS or as part of an exercise in structural elucidation.

In the precursor ion scan, the first mass analyzer (Q1) sequentially scans all precursor ions into the collision cell (Figure 7(c)) for fragmentation. The second analyzer (Q3) is then set to transmit a single specified product ion. The resulting mass spectrum is then a record of all the precursor ions that give rise to the specified common product ion, such as, for example, the metabolites of a particular drug, or class of compounds, which can be fragmented to a common structural moiety. The precursor ion scan can be carried out only with tandem-in-space instruments.

For the neutral loss scan, the first mass analyzer (Q1) scans all the masses (Figure 7(d)).
The second mass analyzer (Q3) also scans, but at a fixed offset from the first mass analyzer. This offset corresponds to a neutral loss that is commonly observed for a particular class of compounds; for example, the loss of 44 u (CO2) from [M − H]− ions will be indicative of carboxylic acids. Alkyl loss (CnH2n+1) will be seen in the loss of 15, 29, or 43, etc. and the loss of 18 u (H2O) will be indicative of a primary alcohol. A comprehensive table of common neutral fragments may be found in McLafferty and Tureček.32 The mass spectrum is then a record of all precursor ions that lose the specified neutral fragment. Again, neutral loss scans cannot be performed with trap-type MS instruments or with ToF analyzers. However, postacquisition analysis software can be used to search for the specified neutral loss.

SRM is a version of the product ion scan and is used in experiments designed to identify and quantify targeted analytes (Figure 7(e)). Both mass analyzers, Q1 and Q3, are set to pass predetermined masses. These correspond, first, to a specific precursor ion (Q1) and, second, to a fragmentation or transition (Q3) that is characteristic of the selected analyte. Typically, the MS will be rapidly switched between several sets of such transitions representing different analytes, internal standards (ISs), or possibly an alternative confirmatory transition. Thus SRM adds a considerable degree of selectivity to an assay in that the fragmentation monitored is specific to the target analyte and is unlikely to also occur with any background chemical noise that is also selected by Q1. If the first mass analyzer can be operated with high mass accuracy and high mass resolution, this will further enhance selectivity. Sensitivity in SRM is also concomitantly improved, because by removing all the background chemical noise, the S/N of the monitored ion is improved. SRM may be performed by both tandem-in-space and trap-type instruments.

9.10.4.3

Identification – Unknown Small Molecules

Identification or structural elucidation of an unknown compound is one of the most challenging tasks that can be undertaken by an analytical chemist. Where the analyst has milligram or more amounts of the unknown, MS is often used in conjunction with other techniques such as nuclear magnetic resonance (NMR) and IR spectroscopy. However, when there are only limited quantities of sample, the sensitivity of MS makes this the technique of choice for assembling structural information and, where no definitive conclusion can be reached on the mass spectral data alone, serves to limit the search to a particular class of chemicals or set of isomers.

9.10.4.3.1

An LC/MS approach

A useful first step is to analyze the sample using LC interfaced to ESI or APCI on a tandem mass spectrometer capable of accurate mass measurement at high resolution. The LC will serve to separate the unknown away from the matrix components and will reduce the potential for ion suppression. A short linear gradient of acetonitrile against 0.05 mol l−1 ammonium acetate on a reverse-phase C-18 column represents a good starting point (see, e.g., Eckers et al.,121 Arthur et al.,122 Tozuka et al.,123 and Wolff et al.124). Spectra obtained from ESI or APCI will generally yield molecular ions of the type [M + H]+ and little, if any, fragmentation. However, fragmentation can be induced by CID (Section 9.10.3.2.1) followed by an MS/MS analysis of the product ions (Section 9.10.4.2.2). High mass accuracy/high mass resolution measurement of the molecular ion can, with appropriate constraints, generate an elemental formula (see discussion in Section 9.10.4.3.3). The MS/MS analysis of the molecular ion will also yield a structurally informative set of fragment ions. Again, if exact mass data can be obtained for these, a further set of elemental formulas may be obtained. Unfortunately, because there are only limited collections of library spectra generated by MS/MS (see Section 9.10.3.2), the experimenter will generally have to resort directly to a first-principles interpretation. Considerable structural information is to be found in the fragmentation patterns of gas-phase ions. Although this discussion is often held in the context of EI-generated spectra, it is important to remember that the reactions of gas-phase ions are not dependent on their method of formation but rather on their intrinsic structural properties and their internal energy.
Thus structural information can be obtained from fragmentation that is induced by a high-energy ionization process, such as EI, as well as from collisionally induced fragmentation of a [M + H]+ ion that may have been generated by a ‘soft’ ionization process. Instructive examples of structural elucidation of drug metabolites using MSn fragmentation trees and exact mass data are described by Eckers et al.,121 Arthur et al.,122 Tozuka et al.,123 and Wolff et al.,124 and of complex lipids by Hsu and Turk.125,126 In brief, the drugs and their metabolites are exhaustively fragmented and the fragments compared to locate the biologically induced structural changes – for example, oxidations, cleavages, alkylations, and conjugations. The relationship between the product ions and the precursor ions, and their elemental composition, are key elements in assembling the structural features of the unknowns. One must also be aware of the possibility of isomers, and the use of an appropriate separation technique may be required in addition to the MS and MSn data.

9.10.4.3.2

GC/MS approach

If the unknown is sufficiently volatile or can be made volatile by derivatization, then the LC/MS approach can be complemented by the use of GC/MS. A useful starting point in this regard is to make a trimethylsilyl derivative. Silylation is applicable to a wide range of nonsterically hindered functional groups, including alcohols, phenols, thiols, amines, oximes, and carboxylic acids (Table 6).22,127 The dried sample (1 mg maximum) is dissolved in pyridine (10 ml) to which is added an equal volume of a strong silylation reagent such as N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) plus 1% trimethylchlorosilane (TMCS) or N-methyl-N-trimethylsilyltrifluoroacetamide (MSTFA). It is important that the silylation reagent is present in an excess of at least 2:1 molar ratio to active hydrogens and that the sample is dry. Unhindered moieties will be quickly silylated, but derivatization times and the need for heat vary widely depending on the degree of steric hindrance. Unless determined otherwise, heating at 70 °C for 20–30 min will ensure the reaction is driven to completion for most active hydrogens.

Table 6 Trimethylsilylation reagents

Silylation reagent                                   Abbreviation
Strong silyl donors
  N,O-Bis(trimethylsilyl)acetamide                   BSA
  N,O-Bis(trimethylsilyl)trifluoroacetamide          BSTFA
  N-Methyl-N-trimethylsilyltrifluoroacetamide        MSTFA
Moderate strength silyl donors
  Trimethylsilyldiethylamine                         TMSDEA
Weak selective donors
  Trimethylsilylimidazole (hydroxyl groups)          TMSIM
  Hexamethyldisilazane (hydroxyl groups)             HMDS
  Trimethylchlorosilane (hydroxyl groups)            TMCS

A range of reagents, with differing degrees of reactivity, are commonly available to make trimethylsilyl (TMS) derivatives. The silylating potential can be increased by the choice of an appropriate solvent (e.g., pyridine, DMF, acetonitrile) or by the addition of a catalyst (e.g., 1–20% TMCS).22,127

A variation on this approach, widely used in metabolomics experiments where a large range of chemical classes need to be derivatized, is to initially protect any carbonyl moieties by methoximation (10 ml of a 20 mg ml−1 solution of methoxyamine HCl in pyridine at 40 °C for 90 min) (see, e.g., Fiehn128). The silylation reagent can then be directly added at the end of the methoximation reaction. After silylation, the sample may be injected onto a general purpose capillary column such as one coated with 5% phenyl-95% methylpolysiloxane. The column temperature is then increased to drive off the less volatile components. Although silylation is a good general derivatization reaction, it should be remembered that there are many other possibilities available, especially if a particular class of analyte is being assayed (e.g., Knapp,21 Blau and Halket,22 Halket and Zaikin,23–37 Zaikin and Halket46,119). If assay sensitivity is important then derivatization with electron-capturing groups for ECI should be considered (Section 9.10.2.2.2). The advantage of GC/MS over LC/MS is that extensive libraries of EI data are available for searching (see Section 9.10.4.3.4). Where necessary, this can be complemented by molecular weight data from CI.
Library identification then requires confirmation by comparing the column retention time and MS and MSn data with those of a standard. If no library match is found, then a similar process of determining elemental formulas (see Section 9.10.4.3.3) and interpretation of the fragmentation data from first principles must be followed (see Section 9.10.4.3.4). A more sophisticated method of comparing retention times, and one that is applicable across different column phases and temperature programs, is by way of retention time indices or Kováts indices (KIs).129 The KI for a particular analyte is calculated against a homologous series of n-alkanes, coinjected with the sample.

$$\mathrm{KI} = 100n + \frac{100\,(t_x - t_n)}{(t_{n+1} - t_n)} \tag{55}$$

where n is the number of carbon atoms in the n-alkane standard that elutes immediately prior to the analyte of interest, tx is the retention time of the analyte, tn is the retention time of the n-alkane standard that elutes immediately prior to the analyte of interest, and tn+1 is the retention time of the n-alkane standard that elutes immediately after the analyte of interest. Thus in metabolomic experiments, the KI for individual metabolites is an important piece of confirmatory information where the mass spectral differences between isomers are minimal or nonexistent (Figure 11). Unfortunately, no such comparative set of indices is available for LC.
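Equation (55) is straightforward to apply once the retention times of the coinjected n-alkanes are known. The sketch below is one possible implementation; the function name, the dictionary layout, and the retention times in the example are hypothetical, and the alkane ladder is assumed to elute in order of increasing carbon number.

```python
# Sketch: Kovats retention index from Equation (55), using a ladder of
# coinjected n-alkanes given as {carbon number: retention time}.
from bisect import bisect_right


def kovats_index(t_x, alkane_rt):
    """KI = 100 n + 100 (t_x - t_n) / (t_{n+1} - t_n)."""
    carbons = sorted(alkane_rt)
    times = [alkane_rt[n] for n in carbons]
    # Index of the last alkane eluting at or before the analyte.
    i = bisect_right(times, t_x) - 1
    if i < 0 or i >= len(times) - 1:
        raise ValueError("analyte must elute between two n-alkane standards")
    if carbons[i + 1] != carbons[i] + 1:
        raise ValueError("bracketing alkanes must differ by one carbon")
    t_n, t_n1 = times[i], times[i + 1]
    return 100 * carbons[i] + 100 * (t_x - t_n) / (t_n1 - t_n)


if __name__ == "__main__":
    ladder = {10: 8.0, 11: 10.0, 12: 12.0}  # hypothetical times in minutes
    print(kovats_index(9.0, ladder))   # halfway between C10 and C11
    print(kovats_index(11.5, ladder))
```

Two isomers with near-identical mass spectra can then be compared by index rather than by raw retention time, which varies with column and temperature program.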


[Figure 11: (a) GC/MS chromatogram, relative abundance (0–100) against time (6.0–14.0 min), with peaks labeled Fuc, Rhm, Ara, Xyl, Man, Gal, Glc, and IS (Ino). (b) Mass spectrum of glucitol (OAc)6, Rt = 11.28 min. (c) Mass spectrum of galactitol (OAc)6, Rt = 10.85 min. Panels (b) and (c) plot relative abundance against m/z.]

Figure 11 GC/MS assay of alditol hexa-acetates quantified against inositol internal standard (IS). (a) In the chromatogram shown here the monosaccharides making up a plant cell wall are being quantified as their alditol acetates, using inositol (Ino) as the IS. The GC separation of these reduced sugars is essential for their identification. The mass spectra of the alditol acetates of the hexoses, glucose (Glc) (b), galactose (Gal) (c), and mannose (Man), are essentially identical, as are the mass spectra of the alditol acetates of the pentoses, xylose (Xyl) and arabinose (Ara), and the deoxysugars, rhamnose (Rhm) and fucose (Fuc).

9.10.4.3.3

Determination of elemental formula

Accurate mass data can be a significant aid in identifying compounds as it can yield the elemental composition of the molecular ion and the associated fragment ions.130 The theoretical mass of a compound can be readily calculated from tables of elemental masses (Table 7) and there are software packages to automate this procedure. Although it is possible to accurately measure mass at low resolution, the analyst runs the risk of including extraneous isobaric ions in the measurement. This will result in the mass measurement being skewed or shifted by the interfering ion(s), producing an erroneous elemental formula.131 Note that as higher mass ions are analyzed, the number of possible elemental formulas consistent with the measured mass also increases along with the requirement to resolve away isobaric ions. This situation is nicely summarized by the editor of the JASMS132 in the journal’s guidance on the use of accurate mass data: When valence rules and candidate compositions encompassing C0-100, H3-74, O0-4, and N0-4 are considered at nominal parent mass 118, there are no candidate formulae closer together than 34 ppm. At nominal parent mass 500, there are

Table 7 Stable isotopic masses and abundances [a]

Isotope          Mass (u)                          Natural abundance (%)
1H               1.007 825 031 9(6) [b]            99.988 5(70) [b]
2H               2.014 101 777 9(6)                0.011 5(70)
12C              12 (exactly, by definition) [c]   98.93(8)
13C              13.003 354 838(5)                 1.07(8)
14N              14.003 074 007 4(18)              99.636(20)
15N              15.000 108 973(12)                0.364(20)
16O              15.994 914 622 3(25)              99.757(16)
17O              16.999 131 50(22)                 0.038(1)
18O              17.999 160 4(9)                   0.205(14)
19F              18.998 403 20(7)                  100
23Na             22.989 769 66(26)                 100
28Si             27.976 926 49(22)                 92.223(19)
29Si             28.976 494 68(22)                 4.685(8)
30Si             29.973 770 18(22)                 3.092(11)
31P              30.973 761 49(27)                 100
32S              31.972 070 73(15)                 94.99(26)
33S              32.971 458 54(15)                 0.75(2)
34S              33.967 866 87(14)                 4.25(24)
36S              35.967 080 88(25)                 0.01(1)
35Cl             34.968 852 71(4)                  75.76(10)
37Cl             36.965 902 60(5)                  24.24(10)
39K              38.963 706 9(3)                   93.258 1(44)
40K              39.963 998 67(29)                 0.011 7(1)
41K              40.961 825 97(28)                 6.730 2(44)
79Br             78.918 337 9(20)                  50.69(7)
81Br             80.916 291(3)                     49.31(7)

Electron (e−) [d]   5.485 799 09(27) × 10−4        –
Proton (H+) [d]     1.007 276 452                  –

[a] Data are derived from an IUPAC Technical Report by deLaeter et al.133
[b] The uncertainty of the measurement is indicated in parentheses.
[c] In mass spectrometry, the unit of measurement is the unified atomic mass unit (u), which is defined as 1/12 the mass of a 12C atom (1 u = 1.660 540 29 × 10−27 kg).
[d] The very high mass resolution and accuracy that are available from BE, EB, FTICR, Orbitraps, and ToF analyzers mean that calculations of exact masses need to also account for electrons in the analyte ion.134,135 Thus, for example, an electron is lost in the formation of a radical cation (M+•) and a protonated molecule ([M + H]+) gains a proton not a hydrogen atom (m = 1H − e−). An electron is gained in the formation of a radical anion (M−•) and a deprotonated molecule ([M − H]−) loses a proton not a hydrogen atom. While this error will be insignificant for large molecules such as proteins, for small molecules, the error can be as large as several ppm. For example, ignoring the mass of three electrons in triply charged GluFib, [M + 3H]3+, leads to an error of 1 ppm in the exact mass calculation.134

These are the isotopes most commonly encountered in natural product chemistry. Silicon is encountered in generic derivatives for GC (e.g., trimethylsilyl and tert-butyldimethylsilyl derivatives).

five compositions that have a neighbouring candidate less than 5 ppm away. Using C0-100, H25-110, O0-15, and N0-15 at mass 750.4, there are 626 candidate formulae that have a neighbouring possibility less than 5 ppm away. Thus, for a measurement at m/z 118, an error of only 34 ppm uniquely defines a particular formula. At m/z 750.4 an error of 0.018 ppm would be required to eliminate all extraneous possibilities.
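The exact mass arithmetic described above can be sketched in a few lines. This is a minimal illustration only, using the Table 7 values for C, H, N, and O and the electron and proton masses; the function names and the dictionary representation of a formula are assumptions made here for illustration and are not taken from any particular software package.

```python
# Monoisotopic masses (u) of the lightest stable isotopes, taken from Table 7.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.0078250319,
    "N": 14.0030740074,
    "O": 15.9949146223,
}
ELECTRON_MASS = 5.48579909e-4  # u, Table 7
PROTON_MASS = 1.007276452      # u, Table 7


def monoisotopic_mass(formula):
    """Mass of the neutral molecule; formula is e.g. {"C": 2, "H": 6, "O": 1}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def radical_cation_mz(formula):
    """m/z of the singly charged radical cation: one electron is lost."""
    return monoisotopic_mass(formula) - ELECTRON_MASS


def protonated_mz(formula):
    """m/z of [M + H]+: a proton is gained, not a hydrogen atom."""
    return monoisotopic_mass(formula) + PROTON_MASS


def ppm_error(measured, theoretical):
    """Mass error in parts per million relative to the theoretical value."""
    return (measured - theoretical) / theoretical * 1e6


ethanol = {"C": 2, "H": 6, "O": 1}
neutral = monoisotopic_mass(ethanol)
# Error introduced by ignoring the lost electron in the radical cation.
electron_error = ppm_error(neutral, radical_cation_mz(ethanol))
print(f"{neutral:.7f} u, error from ignoring the electron: {electron_error:.1f} ppm")
```

For a molecule as small as C2H6O, neglecting the single electron shifts the calculated mass by roughly 12 ppm, which illustrates why the correction matters most at low mass.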

In practice, it is important to be able to restrict the type and number of elements in any possible formula so as to improve the degree of confidence in selecting the most appropriate formula and to eliminate impossible or unlikely combinations of elements. Further information to help with this may be gleaned from an examination of the isotope pattern of the molecular ion. Most of the elements present in organic compounds (C, H, N, O, P, and S; Si must obviously be included if silyl derivatives were used for GC/MS) have two or more stable isotopes (Table 7). This information can be used, for example, to estimate the number of carbons present in an unknown from knowing that the 13C isotope has an abundance equal to 1.1% that of the 12C isotope. Thus an ion containing 10 carbons will have a 13C abundance ratio of 11% and, by extension, an ion containing 20 carbons will have a 13C abundance ratio of 22%. A number of other elements such as chlorine (35Cl:37Cl, 3:1), bromine (79Br:81Br, 1:1), sulfur (32S:33S:34S, 100:1:5), and silicon (28Si:29Si:30Si, 100:5:3) also have distinctive isotope patterns, recognition of which will aid in restricting the possible elemental formulae of the unknown (for further discussion, see McLafferty and Tureček,32 Gross,33 and Watson and Sparkman5). Algorithms such as those described by Pickup and McPherson136 and Hsu137 can be used to model isotope distributions in elemental formulae, and the comparison between the experimental and theoretical isotopic distribution can be assigned a goodness-of-fit score.

Further constraints may be identified by application of the nitrogen rule,5,32,33 which states that a compound containing the common elements, C, H, O, S, Si, P, and the halogens, will have an odd nominal molecular weight if it contains an odd number of nitrogens. A compound with zero or an even number of nitrogen atoms will have an even nominal molecular weight. This is because every element with an odd mass has an odd valence and every element with an even mass has an even valence, with nitrogen being an exception, having an odd valence and an even mass. In addition, a consideration of the valency of the constituent elements leads to the derivation of a general algorithm for the number of rings and double bonds (R + DB) present in an ion.32,33,138 Thus, for the elemental formula $\mathrm{C}_c\mathrm{H}_h\mathrm{N}_n\mathrm{O}_o$,

$$\mathrm{R + DB} = c - 0.5h + 0.5n + 1 \tag{56}$$

Other monovalent elements (F, Cl, Br, and I) are counted as hydrogens, trivalent elements (P) are counted as nitrogen, and tetravalent elements (Si) are included with carbon. For chemically possible formulae, R + DB > −1.5. Odd-electron ions (M+•) will have an integer value, whereas even-electron ions will have a value 0.5 higher than expected, so round down to the next lowest integer.32,33 By way of example, Kind and Fiehn139 have described an integrated application of accurate mass data to metabolite identification, constrained by isotope abundance information and valence rules, in addition to the KI (Section 9.10.4.3.2). In the ideal case, the high mass accuracy and high mass resolution determination of the molecular ion will yield an unambiguous formula, but this says nothing about the connectivity of the constituent atoms. For the trivial case of C2H6O, the exact mass is 46.041 864 8, but this does not distinguish ethanol (CH3CH2OH) from dimethyl ether (CH3OCH3). However, fragmentation occurs in a mostly predictable fashion, and an examination of the molecular ion fragments will often reveal a distinctive 'fingerprint' including structurally diagnostic ions (m/z 31 for ethanol and m/z 29 for dimethyl ether).
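The three formula constraints discussed in this section (the 13C-based carbon estimate, the nitrogen rule, and the rings-plus-double-bonds count of Equation (56)) are simple enough to sketch in code. The following is an illustrative sketch only; the function names and argument layout are assumptions made here, and the nitrogen rule check is written for a neutral molecule or its odd-electron molecular ion.

```python
import math


def rings_plus_double_bonds(c, h, n=0, halogens=0, p=0, si=0):
    """Equation (56): halogens count as H, P counts as N, Si counts as C."""
    return (c + si) - 0.5 * (h + halogens) + 0.5 * (n + p) + 1


def true_rdb(value):
    """Even-electron ions give a value ending in 0.5; take the next lowest integer."""
    return math.floor(value)


def nitrogen_rule_consistent(nominal_mass, nitrogen_count):
    """Odd nominal mass goes with an odd nitrogen count, even with even or zero."""
    return nominal_mass % 2 == nitrogen_count % 2


def estimate_carbons(m_plus_1_percent, c13_ratio_percent=1.1):
    """Rough carbon count from the (M+1)/M abundance ratio, both in percent."""
    return round(m_plus_1_percent / c13_ratio_percent)


print(rings_plus_double_bonds(c=6, h=6))            # benzene: three C=C plus one ring
print(rings_plus_double_bonds(c=2, h=6))            # ethanol or dimethyl ether: saturated
print(true_rdb(rings_plus_double_bonds(c=2, h=5)))  # even-electron ethyl cation
print(nitrogen_rule_consistent(79, 1))              # pyridine, C5H5N, nominal mass 79
print(estimate_carbons(11.0))                       # an (M+1) peak at 11% of M
```

The carbon estimate ignores the small contributions of 2H, 15N, 17O, and 33S to the M+1 peak, so it should be read as an upper-bound guide rather than an exact count.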

9.10.4.3.4

Database searching and interpretation of fragmentation from first principles

For the EI spectra of unknowns, a very valuable first step toward identification is to perform a simple spectral comparison with an EI library. As noted above, EI spectra (Section 9.10.2.2.1) are highly reproducible and are not instrument dependent. The widely available NIST-Wiley Library, for example, contains several hundred thousand spectra. A satisfactory match of the unknown and reference spectra can be confirmed experimentally against a reference standard. However, it should be noted that the EI mass spectra of stereoisomers and geometric isomers are often very similar, exhibiting the same fragmentation pattern and similar abundances of fragments. This can be seen, for example, in the spectra of glucitol hexaacetate and galactitol hexaacetate (Figures 11(b) and (c)). In these cases, the ambiguity of the MS identification can be overcome by a comparison of the GC retention times (Rt) (Figure 11(a)) (the use of KIs is described in Section 9.10.4.3.2). Judicious selection of the phase coating the inside of the capillary column (a wide range of different chemistries and polarities are available) that is interfaced with the MS instrument will permit the separation of many of these stereoisomers and geometric isomers. An alternative approach is to make a chemical derivative that the MS can distinguish. For example, mass spectra of fatty acid methyl esters (FAMEs) containing two double bonds are essentially identical regardless of the location of the double bonds; however, if instead a dimethyloxazoline derivative is made, the location of the double bonds can be readily determined (Figure 12).

Figure 12 The mass spectrum of the fatty acid methyl ester (FAME) of linoleic acid (C18:2Δ9,12, MW 294) contains no readily discernible structural information beyond the molecular ion (a). However, the dimethyloxazoline (DMOX) derivative (MW 333), in which the charge is retained by the heterocyclic ring, can undergo charge remote fragmentation yielding a mass spectrum from which the location of the double bonds, but not their geometry (cis versus trans), can be readily determined (b). The latter stereochemistry can usually be distinguished by the GC retention time on an appropriate column. [Spectra not reproduced: both panels are plotted against m/z over the range 60–340, and two mass differences of ΔM = 12 are marked in panel (b).]

Where no satisfactory comparable spectra can be found by a database search, the more laborious process of interpreting the spectra from first principles must be attempted. As mentioned before, this process will be considerably aided if elemental compositions of the molecular ion and the EI fragments are available.
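The library comparison described in this section amounts to scoring the unknown spectrum against each reference spectrum and ranking the results. The sketch below uses a plain cosine similarity between centroided spectra as the score. This is an assumption chosen for illustration: commercial library search software uses its own weighting and scoring schemes, and the peak lists shown are hypothetical values, not data from any library.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine of the angle between two spectra given as {nominal m/z: intensity}."""
    shared_axis = set(spectrum_a) | set(spectrum_b)
    dot = sum(spectrum_a.get(mz, 0.0) * spectrum_b.get(mz, 0.0) for mz in shared_axis)
    norm_a = math.sqrt(sum(v * v for v in spectrum_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spectrum_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(unknown, library):
    """Return (name, score) of the highest scoring reference spectrum."""
    scored = ((name, cosine_similarity(unknown, ref)) for name, ref in library.items())
    return max(scored, key=lambda item: item[1])


# Hypothetical peak lists for illustration only.
library = {
    "compound A": {31: 100.0, 45: 50.0, 46: 20.0},
    "compound B": {29: 40.0, 45: 100.0, 46: 60.0},
}
unknown = {31: 95.0, 45: 55.0, 46: 18.0}
print(best_match(unknown, library))
```

As the text notes, a high score does not resolve stereoisomers or geometric isomers, whose spectra can be nearly identical; retention data are still needed in those cases.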


Much effort has been expended on providing rational mechanisms for fragmentation and these are well summarized by, for example, Budzikiewicz et al.,31 McLafferty and Tureček,32 Gross,33 de Hoffmann and Stroobant,34 and Watson and Sparkman.5 In EI, an odd-electron ion (M+•) is generated and the subsequent bond cleavages result in the formation of the most stable cation with paired electrons (even-electron ion). The soft ionization techniques such as CI, ESI, APCI, and MALDI produce molecular species by the addition or abstraction of a proton, yielding an ion with an even number of electrons (e.g., [M + H]+). These ions are more stable than radical cations and their fragmentation is more likely to reflect steric effects, so isomers with essentially identical EI spectra often give rise to different soft ionization spectra and may fragment differently following CID.32 A comprehensive description of these processes is beyond the scope of this chapter and the reader is referred to one of many texts on the interpretation of fragmentation and to tables of common neutral losses and of common ion series for particular classes of compounds (see, e.g., Budzikiewicz et al.,31 McLafferty and Tureček,32 Gross,33 de Hoffmann and Stroobant,34 Dass,140 and Watson and Sparkman5). As mentioned above, mass spectral interpretation will be greatly aided if high mass accuracy data at high mass resolution are available to determine the elemental formula of the unknown and its fragments. There is also increasing use of gas-phase ion/molecule reactions that can be exploited for class and functional group identification.141

9.10.4.4

Criteria for Identification of a Known Compound

Forensic laboratories and regulatory authorities responsible for monitoring food and drug quality and environmental pollution are major users, directly and indirectly, of MS for the purpose of identification and quantification. The major question they face is: How much information is required to support their claim, within the specified confidence limits, for the presence of known specified compounds in their samples? This is not an easy question to answer and is usually dealt with by defining the core analytical technology and a set of minimal performance criteria for acceptable identification and by reserving the right to assess methods on a case-by-case basis. While there may be a general consensus on the broad issues of what is required for confirmation of identity, there is no general agreement on specifics, and a number of different approaches and specific requirements are used around the world. By way of example, we will look at the requirements of two such regulatory authorities, the US Food and Drug Administration (FDA) and the European Union (EU), with respect to residues of banned substances, mostly veterinary drugs, in animal products. The identification criteria set by both these regulatory authorities are quite stringent, and similar types of criteria are also required by other regulatory authorities and by editors of research journals. In addition to the mass spectral aspects of these assays, which are outlined below, there may also be extensive requirements to be met by the analyst with respect to compliance with good laboratory practice, which governs the operations of analytical laboratories and includes sampling regimes, assay validation procedures (e.g., limits of detection, limits of quantification, accuracy, reproducibility, and ruggedness), and laboratory accreditation (e.g., staff training, laboratory equipment, documentation, quality assurance, and quality control).142–145

9.10.4.4.1

FDA Guidance for Industry 118

In its Guidance for Industry 118,144 the FDA requires that methods for confirmation of identity include the use of

• a comparison standard,
• chromatography interfaced to MS (GC/MS or LC/MS), and
• mass spectral matching.

The use of a standard is a fairly obvious requirement, but where matrix effects alter either the chromatography or the spectrum, the authority will allow the use of a control extract spiked with the standard instead of a pure standard. However, the analyst must then be able to demonstrate the absence of interference in a control extract containing no standard. The FDA asks that the use of MS be combined with chromatography, but specifications are only listed for GC/MS and LC/MS. The omission of interfaces such as CE/MS and SFC/MS is a reflection of the conservative nature of regulatory authorities with respect to the unproven reliability of these techniques to robustly deliver reproducible chromatograms, not only on a day-to-day basis but also over an extended period of time. There is, however, flexibility in the type of chromatogram that may be used: total ion currents (TICs), reconstructed ion currents (RICs), SIM, and SRM are all acceptable with the provision that the retention times for the standard and the analyte should be within 2% for GC/MS and 5% for LC/MS. With respect to mass spectral matching, the criteria for identification vary depending on the technique used for mass spectral data acquisition (see summary of requirements in Table 8). It is interesting to note that while the FDA does not rule out the use of exact mass measurements, it views these data as problematical because there are no generally accepted specific standards for their use. The problem here is that it is difficult to be definitive about the resolving power required, particularly when analytes have masses greater than m/z 500. Clearly the resolving power and accuracy must be sufficient to exclude all reasonable alternative elemental compositions, and the FDA recommends that if exact mass measurements are to be used then multiple structurally specific ions should be measured.

Table 8 FDA criteria for mass spectral matching

| MS scan | Requirements |
|---|---|
| Full scan | At least three structurally specific ions that completely define the molecule are present above a specified level. General correspondence between relative abundance of sample and standard ions (within the range of 20%). Prominent ions, not from analyte, can be explained. |
| SIM | Relative abundance of three structurally specific ions of sample and standard should be within 10%. Relative abundance of four or more structurally specific ions of sample and standard should be within 15%. |
| MSn full scan | All structurally specific ions present in standard spectra should be present in sample spectra. General correspondence between relative abundance of sample and standard ions (within the range of 20%). Prominent ions, not from analyte, can be explained. |
| MSn SRM | If precursor ion is completely dissociated and only two structurally specific ions are monitored, the relative abundance of sample and standard ions should match within 10%. If three or more structurally specific ions are monitored, the relative abundance of sample and standard ions should match within 20%. |

Summary of FDA requirements for identifying animal drug residues.144
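The retention time and SIM abundance criteria summarized above lend themselves to a simple automated check. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: the retention time tolerance is taken relative to the retention time of the standard, and the abundance tolerance is applied as an absolute difference in relative abundance (percentage points). All function names and peak values are hypothetical.

```python
# Retention time tolerances quoted in the text, as fractions.
RT_TOLERANCE = {"GC/MS": 0.02, "LC/MS": 0.05}


def retention_time_ok(rt_sample, rt_standard, technique):
    """Assumes the tolerance is relative to the retention time of the standard."""
    return abs(rt_sample - rt_standard) <= RT_TOLERANCE[technique] * rt_standard


def sim_tolerance(n_ions):
    """Table 8, SIM: three ions within 10%, four or more ions within 15%."""
    if n_ions < 3:
        raise ValueError("at least three structurally specific ions are required")
    return 10.0 if n_ions == 3 else 15.0


def abundances_match(sample, standard, tolerance):
    """Assumes the tolerance is an absolute difference in relative abundance (%)."""
    return all(
        mz in sample and abs(sample[mz] - standard[mz]) <= tolerance
        for mz in standard
    )


# Hypothetical relative abundances (% of base peak) for three monitored ions.
standard = {267: 100.0, 209: 55.0, 193: 18.0}
sample = {267: 100.0, 209: 62.0, 193: 20.0}
print(retention_time_ok(15.62, 15.70, "GC/MS"))
print(abundances_match(sample, standard, sim_tolerance(len(standard))))
```

If a laboratory interprets the tolerances differently (for example, as a percentage of the standard's abundance rather than percentage points), only `abundances_match` needs to change.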

9.10.4.4.2

EU performance of analytical methods

The EU takes a slightly different approach from the FDA in setting the criteria for identification (Tables 9–11) but agrees with the FDA in accepting only GC/MS and LC/MS methods and in its requirement for an analyte standard.145 The EU tolerance for chromatographic performance is more stringent than that of the FDA, requiring GC and LC retention times for standards and samples to be within 0.05 and 2.5%, respectively. In addition to outlining a set of performance criteria for the different types of MS data (Tables 9 and 10), the EU uses a system of identification points to score the MS data (Table 11). Identification under this system is acceptable only if a certain number of identification points have been accumulated. So, for example, identification using GC/MS2 for one precursor ion and two product ions will earn four identity points, and identification using GC/MS and LC/MS, monitoring two ions with each technique, will also accrue four identity points. This level of identification is deemed sufficient for identification of their Group A banned substances (veterinary drug residues in meat for human consumption). It is interesting to note that the EU has set no qualifications around the acceptability of exact mass data, save that resolution should be greater than 10 000 (10% valley) for the entire mass range, and indeed it assigns two identity points for each measured ion (Table 11). This lack of qualification ignores the fact that the number of candidate elemental compositions increases markedly with mass (Section 9.10.4.3.3). This point is well illustrated by Nielen et al.,146 who, using the anabolic steroid stanozolol and the β-agonist clenbuterol-R as models, demonstrate that the current EU mass accuracy criteria can yield false negative results.

Table 9 EU criteria for mass spectral matching

| MS scan | Requirements |
|---|---|
| Full scan | A minimum of four diagnostic ions (molecular ion, adducts, fragments, and isotope ions) with an intensity >10% in the standard must be observed in the sample. The molecular ion must be included if the relative intensity is ≥10% of the base peak. The relative intensities of the sample diagnostic ions are required to match those of the standard, within specified tolerances (Table 10). |
| SIM | The molecular ion shall be one of the selected diagnostic ions. The S/N for each diagnostic ion shall be ≥3:1. A minimum of four identity points (Group A, banned substances) (Table 11) must be accumulated and these must be derived from at least one ion ratio measurement, meet the specified intensity tolerances (Table 10), and no more than three techniques can be used to achieve the minimum number of identity points. |

Summary of EU requirements for identifying animal drug residues.145

Table 10 EU maximum permitted tolerances for relative ion intensities

| Relative intensity (% base peak) | GC–EI–MS (%) | GC/CI-MS, GC/MSn, LC/MS, LC/MSn (%) |
|---|---|---|
| >50 | 10 | 20 |
| >20–50 | 15 | 25 |
| >10–20 | 20 | 30 |
| ≤10 | 50 | 50 |

Table 11 EU identification points earned

| MS technique | Identity points/ion |
|---|---|
| Low-resolution (LR)-MS | 1.0 |
| LR-MSn precursor ion | 1.0 |
| LR-MSn product ions | 1.5 |
| High-resolution (HR)-MS | 2.0 |
| HR-MSn precursor ion | 2.0 |
| HR-MSn product ions | 2.5 |

A minimum of four identity points are required to confirm the presence of a Group A substance (banned veterinary products).
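The identification point arithmetic of Table 11 and the intensity tolerances of Table 10 can be expressed compactly. The following is a hedged sketch: the dictionary keys, function names, and the handling of values exactly on a band boundary are assumptions made here for illustration, with the lowest band of Table 10 read as covering relative intensities of 10% or less.

```python
# Identity points per ion, from Table 11.
POINTS_PER_ION = {
    "LR-MS": 1.0,
    "LR-MSn precursor": 1.0,
    "LR-MSn product": 1.5,
    "HR-MS": 2.0,
    "HR-MSn precursor": 2.0,
    "HR-MSn product": 2.5,
}
GROUP_A_MINIMUM = 4.0  # minimum points to confirm a Group A substance


def identification_points(ion_counts):
    """ion_counts maps a Table 11 technique name to the number of ions measured."""
    return sum(POINTS_PER_ION[kind] * count for kind, count in ion_counts.items())


def max_permitted_tolerance(relative_intensity, gc_ei_ms):
    """Table 10 tolerance (%) for an ion of the given relative intensity (% base peak)."""
    if relative_intensity > 50:
        return 10 if gc_ei_ms else 20
    if relative_intensity > 20:
        return 15 if gc_ei_ms else 25
    if relative_intensity > 10:
        return 20 if gc_ei_ms else 30
    return 50


# The two worked examples from the text.
gc_ms2 = identification_points({"LR-MSn precursor": 1, "LR-MSn product": 2})
gc_plus_lc = identification_points({"LR-MS": 4})  # two ions by GC/MS, two by LC/MS
print(gc_ms2, gc_ms2 >= GROUP_A_MINIMUM)
print(gc_plus_lc, gc_plus_lc >= GROUP_A_MINIMUM)
```

Both worked examples reach exactly the four points needed for a Group A substance, matching the totals given in the text.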

9.10.4.5

Quantification

As noted by the Reverend Stephen Hales as long ago as 1727, scientific insight into the processes of nature can be obtained only through the discipline of measurement.

Since we are assured that the all-wise Creator has observed the most exact proportions, of number, weight and measure, in the make of all things, the most likely way therefore, to get any insight into the nature of those parts of the creation, which come within our observation, must in all reason be to number, weigh and measure.
Vegetable Staticks, Stephen Hales (1677–1761)

It should be no surprise therefore that mass spectrometers are most commonly used for quantification. In addition to quantitative applications by regulatory authorities and industry (e.g., petrochemical, pharmaceutical, food, forensic, and environmental areas), the postgenomic era has witnessed an explosion in the use of mass spectrometers to determine and quantify gene function as exhibited in the gene products, namely proteins and metabolites. This has given rise to two new and unique areas of endeavor, proteomics and metabolomics. Both of these aim to analyze the complete respective sets of proteins or metabolites present in a cell, tissue, or organism at any one time point. The importance of these analyses lies in the fact that they provide information that is not directly attainable from the genomic sequence, including, for example, insight into developmental processes and responses to environmental stimuli and pathogens at the cellular level. These data can then be linked to genomic and transcriptomic data to present the scientist with a holistic or systems biology view of an organism (see, e.g., Weckwerth et al.147 and Trauger et al.148). The challenge of proteomic and metabolomic analysis lies in the complexity (e.g., PTMs of proteins and the array of different chemical classes of metabolites) and the large range of concentrations of the components present in the sample, and in the need for high-throughput and reproducible methodologies for their identification and quantification. A detailed discussion of protein and peptide analysis by MS may be found elsewhere in this volume (see Chapter 9.12).

9.10.4.5.1

Components of an MS-based metabolite assay

Although techniques such as NMR and IR spectroscopy have found some utility in metabolite analysis, the most common approach has been to draw upon the versatility, speed, and high degree of specificity and sensitivity inherent in tandem MS.149 In the case of complex samples, this specificity and sensitivity can be enhanced by interfacing the mass spectrometer to some form of high-resolution chromatography such as GC, nano-LC, or CE. Using MS for quantification is no different in principle to using any other detector, and generally encompasses sample quenching, homogenization to break down tissue and cell structure, extraction, separation, sample analysis, calibration standard analysis, and finally data processing (Figure 13).143 Also included will be assay validation: determining the limits of quantitation, selectivity, accuracy, precision, and linear dynamic range of the assay (see FDA Guidance for Industry,144 Bioanalytical Method Validation,150 Pritchard and Barwick,151 and Boyd et al.,143 for a detailed discussion on validation and quality assurance in analytical chemistry). In addition, a mass spectral assay will include specific consideration of the following items:

• Optimizing the quenching, extraction, and purification processes, being cognisant of reagents that may be incompatible with MS (e.g., nonvolatile salts and detergents; see also discussion on chemical noise and contamination in Section 9.10.4.5.8).152
• Selecting the method of sample introduction to the MS, for example, GC/MS (need to consider analyte derivatization; Section 9.10.4.3.2) or LC/MS.152
• Choosing the best ionization method (Section 9.10.4.1). For a metabolomics experiment, the most complete metabolite coverage may require analysis with multiple ionization methods.153
• Choice of quantitative standard: IS (isotopically labeled or not), external standard, or standard addition.154
• Selecting the most appropriate method of ion analysis: full scan, SIR, or SRM.

It is important to remember, however, that in any quantitative assay, MS is just one part of a closely integrated overall procedure and that failure and compromise in any one step will invalidate the entire procedure. Some of the above points are illustrated in the development of a mass spectral assay for salicylic acid in tomato (Figures 14–17).


Figure 13 Flow chart for a quantitative assay using an internal standard. The most critical steps are the selection of a representative sample, the accurate preparation of the standards, and finally the addition of the standard to the sample (planning, weighing, making up to volume, and pipetting). It is sobering to remember that failure in any one of these will invalidate the entire assay no matter how sophisticated the instrumentation or how powerful the statistics applied to data analysis. [Flow chart steps: representative sample selected; addition of IS; sample "quenched", homogenized, IS equilibrated with sample; fractionation/extraction of analyte(s), e.g., solvent or SPE; preparation of calibration solutions; possible derivatization of analyte(s); GC/MS or LC/MS analysis; use calibration curve to calculate analyte concentration(s); data analysis.]
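The final steps of the flow chart, using a calibration curve to calculate the analyte amount from the response relative to the internal standard, can be sketched as follows. This is a minimal illustration assuming a linear, unweighted least-squares calibration of response ratio against amount ratio; real assays may need weighting or a nonlinear model. The function names and all calibration values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def analyte_amount(response_analyte, response_is, amount_is, slope, intercept):
    """Convert the measured response ratio into an amount of analyte."""
    response_ratio = response_analyte / response_is
    amount_ratio = (response_ratio - intercept) / slope
    return amount_ratio * amount_is


# Hypothetical calibration solutions: amount ratio (analyte/IS) vs response ratio.
amount_ratios = [0.1, 0.5, 1.0, 2.0]
response_ratios = [0.2, 1.0, 2.0, 4.0]
slope, intercept = fit_line(amount_ratios, response_ratios)

# A sample spiked with 50 ng of IS gives peak areas of 300 (analyte) and 100 (IS).
print(analyte_amount(300.0, 100.0, 50.0, slope, intercept))
```

Because only the ratio of responses enters the calculation, losses that affect the analyte and the internal standard equally cancel out, which is the reason for adding the standard as early as possible in the procedure.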

9.10.4.5.2

Sample preparation

Cellular processes are dynamic, and the level of a particular metabolite at any one time will represent the balance of biosynthesis, biochemical transformation into other metabolites, degradation, transportation into and out of the cell, and sequestration into and out of storage forms. Depending on the rates of these respective processes, the level of a metabolite can be subject to large and rapid change during quenching. Similarly, subtle changes introduced by developmental processes or genetic manipulation can also induce large changes in the level of metabolites (see, e.g., Schwab155). In any metabolite analysis, it is important that the analytical sample accurately represents the cellular or tissue status at the time the sample is taken. This means that quenching must very rapidly terminate all biological processes and that chemical degradation must be minimized.13,152,156 Typically, quenching is achieved by extremes of temperature (<−20 or >80 °C) or acidity (pH < 2 or >10), possibly in the presence of an organic solvent and/or an antioxidant, and in conjunction with homogenization. If the analysis is directed at a particular metabolite or class of metabolites (targeted analysis), the optimization of quenching is readily monitored. However, if, as in a metabolomic analysis, the objective is to analyze as many metabolites as possible, then it is inevitable that some compromise will have to be made, in which case reproducibility of the process becomes very important (see the review by Villas-Bôas,157 and two case studies of optimizing metabolomic assays in blood plasma158 and plant tissue159).

Figure 14 Quantifying salicylic acid in tomato. Full-scan mass spectra of the per-trimethylsilyl derivatives of 3-hydroxybenzoic acid internal standard (IS) (a) and salicylic acid (b). In both cases, the [M − CH3]+ ion (m/z 267), a structurally significant ion, was chosen for selected reaction monitoring. [Spectra not reproduced: both panels are plotted against m/z over the range 80–300.]

9.10.4.5.3

Fractionation and extraction of sample

For a targeted assay, considerable effort is typically devoted to extracting the analyte or analyte class away from the sample matrix. Depending on the matrix, this may be as simple as a liquid–liquid extraction, selecting appropriate solid-phase extraction (SPE) chemistry (e.g., C-18 or ion exchange), or using affinity chromatography (specific lectins or antibodies bound to an inert matrix).13,152,156 This step is designed, first, to reduce the possibility of interference by isobaric ions and, second, to reduce the possibility of ion suppression in the ionization process (ESI and APCI). This sample purification step may then be complemented by a high-resolution chromatographic separation interfaced to the MS source (e.g., nano-LC or GC). ESI suppression has been correlated with high concentrations of nonvolatile matrix materials present in the spray, and it is thought that this acts by inhibiting the formation of smaller droplets. Salts (e.g., phosphates and sulfates) and ion-pairing reagents (e.g., trifluoroacetic acid) are also implicated in ion suppression.160 Matrix effects may also be minimized by a simple process of sample dilution.161 As mentioned above, in a competitive ionization process, molecules with the lowest ionization potentials will be preferentially ionized, and it is quite possible that this competition, in addition to matrix suppression, will result in the relative abundance of sample metabolites not being reflected in the MS data. Any reduction in the number of analyte ions available for analysis will reduce the sensitivity of the assay (higher limits of detection and quantitation).

In the case of an untargeted metabolomic experiment, the issue of sample cleanup is complicated by the need to retain as many of the metabolites as possible and to avoid bias against any particular group or class of components. This is often resolved by fractionation into a number of subsamples, for example, by retention and analysis of the remaining aqueous phase after solvent extraction or fractionation by mixed mode SPE. The trade-off here is that more analytes are potentially available for assay but at the expense of time devoted to running many more analyses on the different sample fractions.157,162

Figure 15 Quantifying salicylic acid in tomato. The full-scan mass spectra of the trimethylsilylated tomato extract contain too much background chemical noise for the salicylic acid to be satisfactorily assayed. Neither the total ion current chromatogram (a) nor the extracted ion chromatogram of m/z 267 (b) contains a discrete peak for salicylic acid (SA, Rt = 15.62 min), although m/z 267 is observable in the mass spectrum corresponding to the retention time of the analyte (c). [Traces not reproduced: panels (a) and (b) are plotted against time (min) over the range 10–20, and panel (c) against m/z over the range 100–550.]

9.10.4.5.4

Internal standards In any analytical procedure, it is inevitable that there will be variations in instrumental parameters and in compliance with analytical protocols. It is also important to remember that in a mass spectrometer equimolar

374 Mass Spectrometry: An Essential Tool for Trace Identification and Quantification

(a) 100

223

80 3HBA, MS2 of m/z 267

60

40

179

207 193 251

20 267 165

149

195

237

225

249

252

0 (b) 100

209

80 SA, MS2 of m/z 267

60

40 251 233

193

20

267

181

249

149 175

223

195

0 140

160

180

200

220

225

239 247

240

252

260

m/z Figure 16 Quantifying salicylic acid in tomato. MS2 of m/z 267 for both 3-hydroxybenzoic acid (a) and salicylic acid (b). The salicylic acid ion at m/z 209 was chosen for quantification against the m/z 223 ion from the internal standard, 3-hydroxybenzoic acid.

amounts of different compounds do not give an equal response because of variation in the ionization efficiency, which is in part dependent on the molecular structure and in part the result of competition (ion suppression) from other analytes present in the source. These procedural and instrumental variations will affect the accuracy and precision of the assay; however, they may be compensated for by the inclusion of a standard in the assay. There are four possible ways in which a standard may be incorporated into an MS-based assay. The first is the use of a stable isotope-labeled standard (isotopomer) of the target analyte. The most common isotopes available for use include deuterium (2H), 13C, 15N, and 18O. The advantage of this approach is that the labeled standard will have identical chemical properties to the analyte and will be partitioned with the analyte throughout the analytical procedure, eliminating extraction and instrument bias and compensating for any ionization suppression by matching the ionization properties of the analyte. Thus the ratio of the amount of IS to analyte will remain constant up to the point of analysis. The mass spectrometer will then be able to independently detect the isotopically labeled standard by virtue of the heavier mass of its parent ion and fragment ions containing the labeled moiety. Quantification is then achieved by measuring the ratio of ions from the analyte and the IS, rather than an absolute value as in the use of an external standard. Then knowing the amount of standard added, the amount of the analyte present in the sample can be calculated from a comparison of the determined ion ratios (Figure 18). Some caution needs to be exercised in using such a standard; the label should ideally be nonexchangeable and the number of incorporated isotopes must be sufficient so that there is minimal cross talk from the naturally


[Panels: (a) SA, SRM of m/z 267, m/z 209; (b) 3HBA, SRM of m/z 267, m/z 223; x-axis: Time (min).]

Figure 17 Quantifying salicylic acid in tomato. Selected reaction monitoring of the transition from m/z 267 to m/z 209 and from m/z 267 to m/z 223 from a tomato extract. Salicylic acid (Rt 6.45 min) (a) and 3-hydroxybenzoic acid (Rt 6.66 min) (b) can be observed at an S/N of 222 and 95, respectively.

[Axes: response ratio RA/RIS versus amount ratio A/IS; curves labeled a, b, and c.]

Figure 18 Calibration curves using an internal standard (IS). Analytes are quantified against an IS that has been added as early as possible in the analytical procedure. The ratios of detector responses for the analyte (RA) and IS (RIS) are plotted against the ratio of known amounts of analyte (A) and IS. When a sample is analyzed, the ratio RA/RIS is measured. Then knowing the amount of IS added into the sample, the amount of analyte present in the sample can be estimated. Curves that do not pass through the origin of the graph or which are nonlinear are diagnostic of (a) chemical interference or sample carryover, (b) sample loss during the assay due to adsorption, and (c) saturation or cross-contribution between the IS and the analyte.
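The calibration summarized in Figure 18 can be illustrated with a minimal Python sketch. It assumes a linear relationship between RA/RIS and A/IS over the calibrated range, which the caption notes is not always the case. The function names (`fit_line`, `analyte_amount`), the calibration points, and the sample values are hypothetical and are not taken from the salicylic acid assay.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def analyte_amount(resp_analyte, resp_is, is_amount, slope, intercept):
    """Invert the calibration line to estimate the analyte amount in a sample."""
    response_ratio = resp_analyte / resp_is          # measured RA/RIS
    amount_ratio = (response_ratio - intercept) / slope  # corresponding A/IS
    return amount_ratio * is_amount


# Hypothetical calibration standards: known A/IS ratios and measured RA/RIS.
cal_amount_ratios = [0.1, 0.25, 0.5, 1.0, 2.0]
cal_response_ratios = [0.12, 0.30, 0.60, 1.20, 2.40]

slope, intercept = fit_line(cal_amount_ratios, cal_response_ratios)

# Hypothetical sample: peak areas for analyte and IS, with 100 ng of IS added.
estimate = analyte_amount(45000.0, 50000.0, 100.0, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} analyte={estimate:.1f} ng")
```

In a sketch like this, an intercept that differs clearly from zero would be the numerical counterpart of curve (a) or (b) in Figure 18 and would prompt a check for interference, carryover, or adsorptive loss before any sample is quantified.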

occurring levels of 13C and from isotopes of chlorine, bromine, and sulfur, when present. This also means that the highest degree of isotopic incorporation should be sought. Any large degree of cross talk will pose a limitation on the ultimate sensitivity of the assay. Nevertheless, isotopically labeled standards are usually regarded as approaching ideal although they are costly and of limited availability. Where a stable isotope-labeled standard is unavailable, the analyst can use either a chemically similar homologue (e.g., incorporating an additional methylene; different m/z values to monitor) or a chemically similar analogue (e.g., geometric isomer; same m/z values to monitor) that will need to be chromatographically separated


from the analyte. Obviously, it is also important that this chosen standard is not present in the sample (see, e.g., the use of 3-hydroxybenzoic acid as an IS for the quantification of salicylic acid in tomato in Figures 11–17). In metabolomic experiments, where hundreds of analytes are to be quantified, a number of ISs representing different chemical classes of analytes are generally used (see, e.g., Jiye et al.158 and Gullberg et al.159). These experiments are primarily comparative in nature as the experimenter is seeking to identify relative changes in metabolite levels and relative changes in metabolite fluxes as they occur in different experimental states.

9.10.4.5.5

Standard addition

Where there is no appropriate standard for an analyte, quantification can be made by standard addition (spiking). In this procedure, the sample is divided into several aliquots of equal volume and a series of known but increasing amounts of the analyte standard are then added to each aliquot. The samples are then diluted to the same volume yielding a series of solutions with equal concentrations of matrix but increasing concentrations of analyte. These samples are then analyzed individually for the analyte of interest and the concentration of the unknown can then be calculated from where the regression curve of the responses versus the standard additions intercepts the abscissa (y = 0) (Figure 19). The advantage of this method is the elimination of any chemical or physical bias between the standards and samples but this is achieved at the cost of a six- or sevenfold increase in the number of determinations required for each sample.

9.10.4.5.6

External standards

External standards, so named because they are not added to the sample, are also occasionally used but are generally only applicable to samples requiring limited preparation and for which a consistent high degree of reproducibility and good recovery can be attained. Experiments should also be completed as quickly as possible to minimize instrumental variations (e.g., ion source contamination). In brief, instrument response is plotted against the concentration (or amount) of standard analyzed and this response curve is then used to calculate analyte concentration (or amount). However, unless the matrix is well characterized, this method can be subject to matrix effects (ion suppression) and to interference from isobaric matrix components.

9.10.4.5.7

Optimization of the MS assay

In quantitative mass spectrometric assays, sources of error can be reduced to those associated with sample handling and processing, and to instrumental variation, for example, source contamination and stability of mass calibration. In general, the largest component of error is associated with sample handling and processing.163 To a large degree, variations in protocols for purification and derivatization, poor technique, and even gross spillage of sample can be

[Panels: (a) Response versus [Analyte] standard added, with extrapolation to the x intercept at y = 0; (b) Δ Response (y – yo) versus [Analyte] standard added, with interpolation at yo.]

Figure 19 Standard addition calibration curves. Equal volumes of solvent containing varying amounts of standard are added (spiked) into the sample. The samples are analyzed and the analyzer response (e.g., area under the TIC or selected ion chromatogram) is plotted against the amount of standard added. The analyte concentration is estimated by extrapolating a linear least-squares regression to y = 0 (a). An alternative approach is to plot the difference between the spiked samples and the unspiked sample. The same calibration curve now passes through the origin and the sample analyte concentration can now be determined by interpolation with improved confidence limits164 (b).
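The two readings of a standard addition curve described in Figure 19 can be sketched in Python as follows. This is an illustration under stated assumptions only: the response is taken to be linear in concentration, the six spiking levels and responses are invented, and the helper names (`fit_line`, `by_extrapolation`, `by_difference`) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def by_extrapolation(added, responses):
    """Approach (a): extrapolate the regression line to y = 0.

    The line crosses the abscissa at x = -intercept / slope; the analyte
    concentration is the magnitude of that intercept.
    """
    slope, intercept = fit_line(added, responses)
    return intercept / slope


def by_difference(added, responses):
    """Approach (b): subtract the unspiked response y0, then interpolate y0.

    The difference curve is fitted as a line through the origin.
    """
    y0 = responses[added.index(0.0)]
    deltas = [y - y0 for y in responses]
    slope = sum(x * d for x, d in zip(added, deltas)) / sum(x * x for x in added)
    return y0 / slope


# Hypothetical data: six aliquots, concentration of standard added
# (arbitrary units) and the corresponding detector responses.
added = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0]
responses = [800.0, 1300.0, 1800.0, 2300.0, 2800.0, 3300.0]

conc_a = by_extrapolation(added, responses)
conc_b = by_difference(added, responses)
print(f"extrapolation: {conc_a:.2f}  difference/interpolation: {conc_b:.2f}")
```

With noise-free data the two estimates coincide; with real data they would differ slightly, and the confidence limits referred to in the caption would need to be computed separately.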


obviated by the use of an IS, as outlined above. Once the ratio of internal standard to analyte has been established, it will remain unchanged as long as the standard and the analyte have the same chemical properties. The integrity of the assay then depends almost entirely on the analyst’s ability to accurately weigh, dissolve, dilute, and dispense the IS into the sample as required. Any errors associated with the IS will be propagated throughout the entire assay. This equates to the analyst having a basic knowledge and understanding of the analytical capabilities of balances (milligram quantities measured on analytical balance), volumetric flasks (clean and temperature equilibrated), and pipettes (calibrated, serviced, and used appropriately).143 It is thus critical that the IS be added to the sample at the earliest possible stage of the assay, usually the quenching or homogenization steps, and that it be allowed to equilibrate with the analyte in the sample matrix over some defined period of time (Figure 13). This is particularly important where there is nonspecific binding of the analyte to proteins or other cellular debris and where complete (100%) recovery cannot be achieved. The equilibration time can be established by a time-course study.163 Adding too much or too little IS can also limit the dynamic range of the assay, as the comparison of very large ion currents (detector saturation) with very small ion currents (poor ion statistics) will greatly increase the variance of the assay. A good guide is a threefold excess of the IS over the analyte but this may take a few trials to establish. Other errors may be introduced into an MS assay by interference from isobaric ions. 
There are a number of possible remedies for this, including revising the sample preparation, changing the GC or LC column to separate away the interference, selecting an alternative structurally specific ion for the assay, and increasing the assay specificity by increasing mass resolution to monitor ions of selected elemental compositions. It is important to remember that the analyte spectrum should not be examined in isolation when choosing a set of ions for quantification; the choice should also take into account the ‘background’ ions that are likely to be present, for example, ions from GC column bleed or solvent/reagent adduct ions from ESI (see Section 9.10.4.5.8). Any change in retention time and/or the shape of the chromatographic peak is likely to be indicative of interference, which is to say a lack of assay specificity. Selected ions should be structurally specific to the analyte and should be abundant in order to maximize the assay sensitivity. In the example of the assay for salicylic acid in tomato (Figure 14), the ion selected for MS2 was the structurally specific [M − CH3]+ ion for both the analyte and the IS. The ion at m/z 91, although intense, is a tropylium ion (C7H7+) and would be an inappropriate selection as it would be present in most analytes containing a benzyl moiety. The ions at m/z 223 and m/z 209 in the product spectra (Figure 16) were chosen for the quantification because they were the most intense. For assays based on a full-scan MS, the specificity and sensitivity can be increased by

• Careful selection of ions for quantification. In general, higher m/z values are less subject to interference. Using a different analyte derivative might assist in this. It must be borne in mind, however, that regulatory authorities will require these to be structurally specific ions.144,145 Note that moving to SIM will not improve selectivity over that of a full scan but will improve sensitivity.
• Moving to high-resolution SIM and targeting a specific elemental composition may remove interference except when the interfering compound has the same elemental composition as the analyte. In this case, one should suspect an analyte isomer and change the sample chromatography accordingly.
• Using SRM. It is unlikely that the interfering ion will fragment in the same way as the analyte and the elimination of background chemical ‘noise’ by SRM will also improve sensitivity. Again, the ions selected should be structurally specific.144,145 See, for example, the S/Ns in the TIC for the salicylate assay (Figure 15) and compare them with those realized in the SRM traces (Figure 17).
• Using high-resolution SRM and targeting specific elemental compositions in the precursor and product ions.
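Whether high-resolution monitoring can separate an analyte ion from an isobaric interference depends on the exact-mass difference between the two elemental compositions. The following Python sketch estimates the resolving power (taken here simply as m/Δm, one of several definitions in use) needed to distinguish two ions of the same nominal mass. The example pair, CO and N2, is a generic illustration and not part of the salicylic acid assay; the function names are hypothetical and the electron mass is ignored.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def exact_mass(composition):
    """Sum monoisotopic masses for a composition such as {"C": 1, "O": 1}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def required_resolving_power(mass_a, mass_b):
    """Approximate m / delta-m needed to distinguish two ions.

    Raises ZeroDivisionError for identical masses, which corresponds to the
    same-composition (isomer) case that mass resolution cannot solve.
    """
    mean_mass = (mass_a + mass_b) / 2
    return mean_mass / abs(mass_a - mass_b)


mass_co = exact_mass({"C": 1, "O": 1})   # nominal mass 28
mass_n2 = exact_mass({"N": 2})           # nominal mass 28
resolving_power = required_resolving_power(mass_co, mass_n2)
print(f"CO {mass_co:.4f} u, N2 {mass_n2:.4f} u, m/dm ~ {resolving_power:.0f}")
```

For this pair the requirement is roughly 2500. As the mass difference shrinks the requirement grows, and for isomers it is unbounded, which is why the text recommends changing the chromatography in that case.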

A comprehensive discussion of trace quantitation using MS, including error calculations, confidence limits, limits of detection (LoD), limits of quantitation (LoQ), and method validation may be found in Boyd et al.,143 but also see a more general discussion of these issues as they pertain to analytical chemistry by Pritchard and Barwick.151 An example of a method validated according to the FDA and EU guidelines is described by Hermo et al.165 These authors used LC–ToF MS to determine the levels of multiresidue antibiotic quinolones in pig livers below the maximum residue limits. They describe the optimization of their method, which is then


comprehensively characterized by the determination of the linearity, the decision limit, LoD, LoQ, the precision, the accuracy, and finally the recoveries for the different residues.

9.10.4.5.8

Chemical noise and contamination

As the sensitivity of mass spectral-based assays has improved and the interest in quantifying trace analytes has increased, the problems associated with chemical noise and sample contamination have also increased. Chemical noise and contamination in an assay have the effect of reducing the S/N of the analyte signal. This places an immediate restriction on achieving the full potential of the instrumental sensitivity with the assay LoD and LoQ set higher than they might otherwise have been. While it is unlikely that chemical noise and contamination can ever be completely eliminated, they can be minimized if care is taken to avoid known sources of contamination when the assay protocols are being planned (see reviews by Ende and Spiteller166 and Keller et al.,167 on mass spectral contaminants and their origins; the supplementary data in the latter review includes a literature compilation of contaminants in an Excel spreadsheet). In general, sample contamination can be sourced to almost every part of the assay, including

• the person of the analyst (e.g., keratin proteins, fatty acids, amino acids, and cosmetic residues from hair and skin),
• solvents (e.g., degradation products, antioxidants, and stabilizers),
• reagents used in sample preparation (e.g., proteins, detergents, antioxidants, chemical bleed from ‘dip sticks’),
• laboratory ware (e.g., detergent residues, plasticizers, lubricants),
• chromatography (GC or LC column degradation or bleed, and late eluting components of previous samples),
• ionization process (matrix clusters from MALDI and solvent clusters from ESI, APCI, APPI, and DESI; clusters may also include common alkali metal cations, Na+ and K+, in addition to other cations from the assay reagents), and
• sample carryover and cross-contamination (inadequate washing of components that are reused for each batch of samples, e.g., pipettors, recycled sample vials, GC and LC autosamplers).

Some of the precautions that can be exercised should be a normal part of good laboratory practice, and include the appropriate use of personal protection. Covering the hair and using gloves will minimize the possibility of contamination from skin and hair-derived keratin proteins, as well as amino acids, fatty acids, and cosmetic residues from the skin surface. It is also obvious that, unless determined otherwise, the highest quality solvents should be used. This is particularly so in the case of water, which, as the initial solvent in reverse-phase chromatography, can concentrate impurities at the head of the column. It is worth remembering that with laboratory-prepared water, the outlet conductivity meter provides an estimation of residual ions in the water and does not provide a measure of any neutral organic contaminants, should they be present. Again good laboratory practice, as exemplified by the recommended periodic changes of the purification cartridges, is the best way to prevent this water becoming a source of contamination. Other potential contaminants are, however, less obvious and these include lubricants (e.g., silicone grease), plasticizers (e.g., phthalates, phenyl phosphates, sebacates, and bisphenol A), slip agents (e.g., oleamide, erucamide, and stearamide), biocides (e.g., quaternary ammonium compounds) and polymers extracted from laboratory consumables (e.g., silicones from laboratory tubing) and membrane filters (e.g., cyclic oligomers and Nylon 66).168,169 Contaminants can also be sourced to reagents used in sample protocols. For example, in the extraction and purification of proteins, it is common to use detergents, which, unless they are removed from the sample that is presented for MS analysis, can represent a very persistent form of contamination that is not readily removed except by long periods of washing or by replacement of the LC column and other associated components.
Detergents can also be inadvertently introduced from laboratory glassware or sample vials that have been inadequately rinsed after washing. In addition to analyst-derived keratins, proteins/peptides can also be introduced, for example, in ‘bottom-up’ proteomics where the sample is digested by a proteolytic enzyme (most commonly trypsin).


This will give rise to a set of autolysis peptides from the self-digestion of the enzyme. These autolysis peptides are impossible to eliminate but can be minimized by using the highest quality autolysis-resistant enzyme. Other proteins such as bovine serum albumin (BSA) may be used in the immunopurification of specific proteins. Again, if this is an unavoidable part of the protocol, then the analyst should expect to observe peptide ions derived from these proteins. Chromatographic materials (e.g., solid-phase extraction tubes, LC and GC columns, TLC plates) including single-use materials must be thoroughly, and appropriately, washed or conditioned to remove contaminants originating from the manufacturing process or those that may have been acquired by exposure to packing materials or to the laboratory atmosphere. In some cases, contamination is unavoidable and it is important for the analyst to be able to recognize this and to plan accordingly. For example, all GC columns and septa continuously shed volatile siloxanes, a process known as ‘bleeding’, as the temperature is raised. The amount of bleed is proportional to the temperature and to the amount of phase on the column. Thus columns with thicker phase coatings generally bleed more, and especially so at higher temperatures. Although modern column phase chemistry is extremely robust, columns are inevitably degraded over time with a concomitant increase in bleed. This phase degradation is accelerated by trace levels of oxygen in the carrier gas at high temperatures, so it is important to ensure that a functional oxygen trap is part of the in-line gas purification process and that column temperature limits are not exceeded. LC columns can also bleed in the presence of the eluting solvent but for modern columns operated within their pH range this problem is generally minimal but will be exacerbated when chromatographing at high temperatures. 
Of greater concern are the solvent and adduct clusters generated by the atmospheric pressure ionization techniques (ESI, APCI, APPI, and DESI).170 The many combinations and permutations of solvent clusters, complicated by the inclusion of solvent modifiers such as acetic acid, formic acid, or triethylamine, along with the ever-present sodium and potassium cations, form a very complex chemical background against which analyses must be performed. Moreover, in the case of a solvent gradient, this chemical noise will be changing, over time, in accord with the gradient. In addition to solvent clusters, clusters also form around the eluting analytes and any contaminants picked up during the assay. A number of hardware approaches have been used to minimize the impact of this chemical noise, including orthogonal and ‘z’ geometries for the spray outlet and MS inlet, the use of nebulizing and curtain gases, and ion mobility interfaces (FAIMS and TWIMS in Section 9.10.2.3.9). Tandem MS (Section 9.10.3) can be used to remove much of the remaining chemical noise and this approach can be enhanced by the use of curved collision cells to eliminate the transmission of fast neutrals to subsequent stages of tandem MS. A considerable amount of chemical noise (mostly <m/z 1000) in the form of matrix clusters is also generated by MALDI and this has generally precluded MALDI from being used to analyze small molecules. Some reduction in the occurrence and intensity of matrix clusters can be obtained by minimizing salt contamination by on-target washing and/or by sample purification (e.g., use of Zip-Tips)167 but this must be offset against potential loss of hydrophilic analytes.
Various attempts have been made to find a substitute for the MALDI matrix but to date these have lacked universal acceptance, often related to the cost, difficulty of preparation, ease of contamination, or lack of long-term stability of the alternative target surfaces (e.g., porous silica, sol gels, graphite, carbon nanotubes, fullerenes, and polymers171). Finally, contamination of sample spectra can also occur by cross-contamination during sample preparation and by carryover of residual analyte from a sample analyzed earlier in the run.172,173 Essentially, any component of the assay that is reused for each sample or batch of samples can be a source of cross-contamination or carryover. These include, for example, evaporators, pipettors, automated liquid handlers, recycled sample vials, and LC and GC autosamplers. Care needs to be taken in the selection of appropriate wash solvents that will readily solubilize the sample and analytes. This will usually be a combination of high percentage of organic solvents that may include a volatile acidic or basic modifier (e.g., formic acid or aqueous ammonia). Failure to properly wash all sample components from a chromatographic column can result in late eluting components appearing in the next, or later, analytical runs. Unless care is exercised by the analyst, both these forms of contamination can go unnoticed and erroneous results may be reported for individual samples. Problems with cross-contamination should normally be identified during the validation phase of method development by the judicious use of blanks to test for problems with general laboratory contamination, sample preparation, and the autosampler. Carryover is


assessed by injecting one or more blanks after a high concentration sample, normally at the upper limit of quantitation. If the carryover is less than 20% of the lower limit of quantitation, then this is normally deemed to be acceptable. If possible, the analyst should order the analysis from low-concentration samples to high, with high-concentration samples followed by a blank and/or additional cycles of sample syringe washing. Trace analysis and the move to the use of smaller sample sizes represent particular challenges in that the ratio of surface area exposure to sample volume, or quantity of analyte, is increased, multiplying the possible effect and level of contamination. While mass spectral identification of contaminants will aid in identifying their source (see the literature-derived Excel database of contaminant mass spectra in the supplementary data of Keller et al.172), this is not essential. The key tool to their elimination is the appropriate use of sample blanks at each step of the analytical protocol during method development and validation.

9.10.4.6

MS Imaging

Traditional histological studies of tissue sections have been limited to either light or electron microscopy. Both these techniques have been used to obtain limited amounts of chemical information from the examined tissue. For the most part, this has been achieved through the use of a small number of specific chemical, radiographic, autoradiographic, and immunological stains. More recently, organisms have been genetically engineered to incorporate fluorescent tags into proteins (e.g., green fluorescent protein, GFP); however, these can potentially interfere with the normal functioning of the tagged protein. In general, although microscopy can yield excellent images at high resolution, there is little direct chemical information on the imaged components of the tissue surface. Since the original idea to generate chemical images of tissue sections using MALDI-MS (Section 9.10.2.2.7),174–177 two additional ionization techniques, SIMS (Section 9.10.2.2.8)11,60,178 and DESI (Section 9.10.2.2.9),179 have been added to the imaging repertoire but these have yet to gain the relative popularity of MALDI imaging. Unlike the traditional histological stains, these three MS imaging techniques require no prior assumptions about chemical identity and they are capable of sensitively visualizing a large range of small (e.g., metabolites) and large molecules (e.g., proteins) provided that they are ionizable for subsequent MS analysis, including direct molecular identification using tandem spectrometry (MSn). The great challenge in preparing a tissue sample for MALDI-MS imaging is that two contradictory processes must occur.176,180 First, tissue sections are frozen in liquid nitrogen to avoid delocalization and degradation of the peptide and protein analytes. Sections are then prepared by cryosectioning and these are then mounted on a cold MALDI target. 
Next, matrix, either sinapinic acid for high-molecular-weight proteins or α-cyano-4-hydroxycinnamic acid for low-molecular-weight peptides and proteins (<3 kDa), is applied. For best image resolution and reproducibility, the matrix is usually uniformly sprayed directly onto the tissue surface. At this stage, it is important for the matrix solution to be able to solubilize, extract, and cocrystallize with the protein and peptide analytes while minimizing their delocalization. Several coatings of matrix are usually applied with a short interval between applications for the solvent to dry. This avoids the problem of large quantities of solvent potentially mobilizing the surface analytes with concomitant loss of image resolution. Images are then obtained by rastering a laser across the tissue surface, desorbing and ionizing the analytes, and generating spectra from specific locations (pixels). Virtual images based on the location of specific ions can then be generated and matched to the images of other specific ions and to light microscope images of the section. The intrinsic value of these MALDI images can be greatly enhanced if they can be precisely aligned with the image generated by traditional histopathological stains. Several approaches to this goal have been investigated and include rinsing the matrix from the tissue surface before applying the histological stain, staining the consecutive section,181 and the use of MALDI-compatible stains.182 Most recently, Caprioli’s group183 have reported a novel method of dry coating tissue sections with MALDI matrix, thus minimizing the problem of the matrix solvent mobilizing the surface analytes. The dry coating procedure proved to be simple and rapid and yielded high-quality images of phospholipids.
Where it may not be convenient, or possible, to immediately flash freeze a tissue sample, ethanol-preserved paraffin-embedded specimens may also be used for MALDI imaging.184 Thinner microtome sections can be cut from the frozen tissue following this treatment and this is an advantage where comparisons need to be made with traditional histological stains for light microscopy.
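The generation of a virtual image from the spectra collected at each raster position can be sketched in Python as follows. This is only an illustration of the idea: the data layout (a dictionary mapping pixel coordinates to peak lists), the function name `ion_image`, the m/z window, and all peak values are hypothetical, and real imaging software would also handle normalization and spectral alignment.

```python
def ion_image(pixel_spectra, target_mz, tolerance, width, height):
    """Build a 2D intensity map for one ion from per-pixel peak lists.

    pixel_spectra maps (x, y) raster coordinates to a list of
    (m/z, intensity) pairs; intensities within +/- tolerance of
    target_mz are summed for each pixel.
    """
    image = [[0.0] * width for _ in range(height)]
    for (x, y), peaks in pixel_spectra.items():
        image[y][x] = sum(
            intensity for mz, intensity in peaks if abs(mz - target_mz) <= tolerance
        )
    return image


# Hypothetical 2 x 2 raster with a few peaks per pixel.
spectra = {
    (0, 0): [(760.58, 120.0), (782.57, 30.0)],
    (1, 0): [(760.60, 80.0)],
    (0, 1): [(782.57, 55.0)],
    (1, 1): [(760.57, 10.0), (760.59, 5.0)],
}

image = ion_image(spectra, target_mz=760.58, tolerance=0.05, width=2, height=2)
for row in image:
    print(row)
```

Repeating the call with a different `target_mz` gives a second image on the same pixel grid, which is what allows images of different ions to be matched to one another and to a light microscope image of the section.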


Neither DESI nor SIMS requires any special treatment of the sample surface and images are generated by rastering a microprobe spray of solvent (DESI) or a beam of energetic ions (SIMS) over the tissue section. Virtual images of desorbed secondary ions are then generated as for MALDI imaging. SIMS and MALDI imaging must both be carried out in a high vacuum. They are also complementary techniques in that SIMS is applicable to small molecules (<500 m/z) and MALDI to large molecules (>800 m/z to avoid matrix cluster ions) with the SIMS analysis being performed prior to the application of matrix.59 SIMS imaging has been reported as being able to achieve lateral resolutions down to 50 nm11 and MALDI imaging has achieved resolutions down to 10–25 µm. Unlike MALDI (>800 Da) and SIMS (<500 Da), the new DESI technique can usefully image both small metabolite molecules and large proteins but the spatial resolution is limited to only slightly better than 400 µm179 when sampling from tissue sections. The image resolution that DESI can achieve is determined by the cross-sectional area of the applied solvent spray as it strikes the target and this in turn is determined by solvent flow rate, solvent composition, applied voltage, and size of spray orifice. The height of the spray orifice above the surface, the angle of the incident spray (55°), and the angle at which the desorbed and ionized analytes are sampled (0–20°) are also critical. When sampling from printed patterns on paper and thin-layer chromatography plates, image resolutions of 40 µm have been reported.185 Although DESI imaging will require further development before it can compete with the resolution achieved to date by SIMS and MALDI, it does offer the advantage of being applicable to surfaces and samples not readily brought within the vacuum system of the mass spectrometer.
With the release of the first generation of commercial MS imaging instruments, MS imaging is being actively applied to problems in biology and human health. If the availability of chemical images proves useful, particularly where they can be correlated with the images from the traditional histopathology stains, this will feed back to promote further technical development of the technique. However, future developments in a clinical or diagnostic setting will have to meet the challenge of quality assurance and information validation.18