Beyond structural bioinformatics for genomics with dynamics characterization of an expanded KRAS mutational landscape

Current capabilities in genomic sequencing outpace functional interpretations. Our previous work showed that 3D protein structure calculations enhance mechanistic understanding of genetic variation in sequenced tumors and patients with rare diseases. The KRAS GTPase is among the critical genetic factors driving cancer and germline conditions. Because KRAS-altered tumors frequently harbor one of three classic hotspot mutations, nearly all studies have focused on these mutations, leaving significant functional ambiguity across the broader KRAS genomic landscape observed in cancer and non-cancer diseases. Herein, we extend structural bioinformatics with molecular simulations to study an expanded landscape of 86 KRAS mutations. We identify multiple coordinated changes strongly associated with experimentally established KRAS biophysical and biochemical properties. The patterns we observe span hotspot and non-hotspot alterations, which can all dysregulate Switch regions, producing mutation-restricted conformations with different effector binding propensities. We experimentally measured mutation thermostability and identified shared and distinct patterns with simulations. Our results indicate mutation-specific conformations, which show potential for future research into how these alterations reverberate into different molecular and cellular functions. The data we present is not predictable using current genomic tools, demonstrating the added functional information derived from molecular simulations for interpreting human genetic variation.


Introduction
In research and clinical genomics, determining which uninterpreted genetic variants (variants of unknown significance, VUS) carry biomedical significance is a significant unmet need.The need is acute in proto-oncogenes, which can drive malignant and non-malignant diseases alike.Currently, variants are classified using guidelines that mainly rely on genetic observations and linear models of DNA and protein sequences [1,2].Appropriate incorporation of mechanistic biophysical and biochemical features of the encoded 3D molecule will significantly improve the accuracy of interpreting disease-associated genomic variation compared to existing genomics tools.Current genomics tools predict missense mutations in the proto-oncogene KRAS (the rat sarcoma viral oncogene homolog discovered by Kirsten) to be uniformly damaging.Yet, experiments show that different mutations have distinct properties.KRAS enzymatic activity is conformationally regulated by specific loops called the Switch regions, whose positions also determine whether KRAS binds to other effector proteins.Switch conformations differ between the GDP and GTP-bound states, such that the GTP-bound form binds the effectors that activate the downstream cell signaling cascade.Thus, enzymatic and cell signaling activities interchange, with enzyme activity turning off cellular activity, while compromised GTPase activity leads to greater cellular activity.A previous study of KRAS G12D compared to G13D [3], for example, demonstrated that mutations lead to alternate Switch positions that define mutation-specific conformations and change binding to effector proteins.However, only the most common mutations have been characterized in depth, leaving significant ambiguity about the intrinsic effects of the expanded mutational landscape.Thus, KRAS is an excellent model system to advance biomedical knowledge and develop specific methodologies for using molecular simulations to interpret inter-individual genetic variation.
The current study illuminates specific details about KRAS mutant proteins.We recently used experimental structures across the RAS GTPase family to increase the possible scale, leveraging structure bioinformatics to interpret human genetic variation [4,5].Recent computational work demonstrated the energetic coupling between critical structural features of KRAS [6], namely the p-loop and Switches.Then, simulations can reveal the detailed molecular underpinnings of novel KRAS mutations [7].The most extensive KRAS study of this type analyzed six different G12 position variants [8].Herein, we use physics-based simulations and experimental biophysical validation to bring new and uniform information across a broad landscape of 86 KRAS mutations observed in cancer and non-cancer diseases.Our simulations capture dynamic movements across the protein, with the largest movements observed for the Switch regions.We demonstrate that each mutation produces shared and distinct protein structural and dynamic features.Differences across intrinsic features of KRAS mutant proteins will set the stage for how the enzyme behaves within human cells and across body tissues, shaping cellular responses.Therefore, the data generated and knowledge derived will transfer to different contexts, such as across tumors from other body tissues, empowering the interpretation of existing and future studies.

Developing and generating molecular dynamics simulation data
The initial conformations were taken from Chain A of the X-ray crystallography structure from the Protein Data Bank [9] of human WT KRAS bound to GDP-Mg 2+ (4OBE [10]).FoldX v4 [11] was used for computational mutagenesis, generating 86 variant structures.The 86 variants were selected according to incidence across human cancers and inherited germline RASopathies, concordant with our previous studies [5].Each mutant KRAS structure was modeled bound to GTP-Mg 2+ by fitting the 6OB2 [12] GDP-PnP to GDP and converting it into GTP.After these initial steps, we had two versions of each mutation, plus WT, wherein one was GTP bound, and the other was GDP bound.
All structures were prepared for simulations with a uniform initial environment.The CHARMM36 force field [13][14][15][16], CHARMM-modified TIP3P water [17][18][19] (including Lennard-Jones terms of hydrogen bonds), and standard CHARMM ion parameters were used [20].Our KRAS-WT model was centered in a cubic cell with a box-solute minimum distance of 10 Å, filled with TIP3P and neutralizing counter-ions up to 150 mM KCl.Each mutant KRAS model was placed in this initial environment.
Each system was independently energy-minimized using NAMD [21] with 10,000 steps of steepest decent minimization.Three independent replicates were produced for each KRAS mutation by generating different random velocities at the outset of equilibration.All protein and ligand atoms were restrained to their initial positions using a 10 kcal mol − 1 Å − 2 harmonic constraint.The energy was initialized at 10 K and following the Boltzmann distribution.The energy was slowly increased to 300 K over 0.7 ns with position restraints applied to all nonhydrogen protein atoms.Then, restraints were released; initially lowered to 5 kcal mol − 1 Å − 2 for 0.1 ns, then 2 kcal mol − 1 Å − 2 for 0.1 ns, and gradually released by increments of 0.05 kcal mol − 1 Å − 2 for 20 ps each until they reached zero.Then, free equilibration was performed for 12 ns under the NPT ensemble, implemented via the Langevin thermostat method at 300 K with a friction coefficient of 5 ps − 1 .Water molecules and ions were allowed to diffuse freely during equilibration time.The Langevin piston method [22] was applied to keep the pressure at 1 atm with an oscillation period of 400fs, a decay time of 100fs, and an isotopically scaled box.Periodic Boundary Conditions were applied in all three spatial directions.The particle mesh Ewald method [23,24] was used to calculate electrostatic interactions with a real-space cutoff of 12 Å and a grid spacing of ~1 Å.Then, the short-range van der Waals forces were switched from 10 to 12 Å.All bonds involving hydrogen atoms were constrained using the SHAKE algorithm [25], and all water molecules were constrained with SETTLE [26], with an integration timestep of 1fs.
After these system preparations, NAMD was also used to perform triplicate unrestrained simulations that extended from each equilibration run.These were completed in the NVT ensemble for 40 ns each (120 ns total) for each variant in both GDP and GTP bound forms.The full simulation time between all variants and nucleotides was 21 μs.The temperature in production simulations was maintained using the Anderson thermostat [27] with a 1 ps − 1 collision frequency.The pressure was isotopically regulated with the Monte-Carlo barostat, with box scaling attempted every 20 integration steps.Other parameters were the same as during equilibration.

Structural calculations and statistical comparisons
The root-mean-square deviation (RMSD) and root-mean-square fluctuation (RMSF) values were calculated using C α atoms after structurally aligning each trajectory to the initial WT conformation, using geometric matching [28] with default parameters and ignoring the two mobile Switch loops.Principal Component (PC) analysis summarized the dominant conformational changes across trajectories.PCs were calculated in Cartesian coordinates and using C α atoms.An observed Free Energy Landscape (FEL) was calculated for each PC-based subspace using the approach of Karamzadeh et al. [29].Differences between each mutation and WT were quantified using a distributional comparison.In this approach, the sampling of PC1 by a mutation was scored as the number of standard deviations in its mean, from the WT-mean, and WT-variation along PC1.This procedure was done for each of the top 3 PCs, providing standardized scores for each mutation.
Amino acid pairwise distances from the literature are used as markers for key experimentally determined biophysical or biochemical properties of KRAS and other RAS-family proto-oncogenes [3,7,30,31].Given the robust data supporting each, they were used to interpret our dynamics-based data.When stretching or compressing the distances between the amino acid pairs was used, the changes observed in the simulation were interpreted according to the biophysically and biochemically derived precedent.Thermostability measurements (described below) were also used for validation, which in this study have an unprecedented advantage over previous studies of being uniformly generated across the mutational landscape.Significance was scored according to how many standard deviations (denoted with σ) from WT-median values, and each mutation's median was observed.Multiple structure-and dynamics-based scores and calculations were combined.We generated cross-correlation matrices to understand their inter-relationships.All correlations are Spearman's unless stated otherwise.Then, we made UMAP embeddings [32] with default parameters to visualize the relative similarities of KRAS mutations across the dataset.

Establishing meta classifications of KRAS mutant proteins
Multiple calculations were made, and each was used to classify mutations based on their similarity to WT. Atomic interaction energy scores (E int ) were calculated using the NAMD Energy plugin and default parameters, with the same force field as the simulations were generated.The GTP and GDP E int scores were split into nine categories: Variants within one standard deviation (1σ) of the WT mean are considered neutral variants; variants within 1σ of WT GDP but more than 1σ away from WT GTP are referred to as "GDP Stable" or "GTP Unstable."Conversely, variants within 1σ from WT GTP but more than 1σ away from WT GDP are referred to as "GTP Stable" or "GDP Unstable."Variants that do not have GDP or GTP values within 1σ of WT will either be overly stable or unstable for both nucleotides.As such, variants were categorized as Stable, Neutral, and Unstable in two dimensions (GDP and GTP energy) to produce nine groups of variants.RMSD scores, calculated from RAS NF1 (PDB: 6OB2) and SOS1 (PDB: 6EPM) bound conformations, were split according to WT-like or deviated, according to their relative difference from WT in both GDP and GTP states.To score changes in dynamics sampling as measured by PC analysis, E int , RMSD, and distance monitor sampling, we split variants into three categories eachthose that exhibited significantly lower, WT-like, or considerably higher values; variants ≥ 1σ from the mean classified as high, while variants ≤ − 1σ are classified as low.To score changes in dynamics sampling as measured by PC analysis, RMSD, and distance monitor sampling, we split variants into three categories eachthose that exhibited significantly lower, WT-like, or considerably higher values; variants ≥ 1σ from the mean classified as high, while variants ≤ − 1σ are classified as low.This procedure was applied to 20 scores, producing 108 meta-classifications.Of these 108 meta-classifications, we filtered those not among the top 10 defining features of any k-means groups to generate a signature for each k-means group across the many metaclassifications to account for multiple protein features concurrently.Twelve K-means clustering groups of meta-class signatures were developed, with six groups in each GTP and GDP state.The statistical significance of group-wise mean differences was calculated using two-sided t-tests.Analyses were conducted using custom scripts in the R programming language [37].The entire matrix of scores across mutations was visualized using the pheatmap package [38].

Expression and purification of KRAS mutant proteins
The KRAS expression plasmid was obtained from Addgene (Plasmid #111849) as an E. Coli codon optimized plasmid encoding A.A. 1-169.Plasmids were transformed into E. Coli strain BL21(DE3) and grown at 37 • C in 50 mL of M9 media [39] with 15 N ammonium chloride as the sole Nitrogen source to express 15 N-labeled KRAS uniformly.Cells were grown at 37 • C until the optical density at 600 nm was 0.7, then 1 mM IPTG was added, and the temperature was adjusted to 18 • C for overnight induction.The next day, the cells were pelleted by centrifugation at 5000xg and stored at − 20 • C until further processing.
KRAS pellets were resuspended in 50 mM HEPES, 300 mM NaCl, 10 mM imidazole, 5% glycerol, 0.1% beta mercaptoethanol, 0.1 mg/mL Lysozyme, 1% Halt protease inhibitor (Thermo Fisher #7842), and 1 mM MgCl 2 .Then, the pellets were lysed on ice by sonication, centrifuged for 30 min at 15,000 g and 4 C, and the soluble fraction was kept for nickel affinity purification.KRAS proteins were purified in workgroups of 12, using the Maxwell 16 (Promega), utilizing magnetic nickel affinity beads (Promega #V8565) in a semi-automated nickel affinity purification strategy.The Maxwell 16 has a cassette with seven wells; the clarified cell lysate is placed in well 1, wash buffers (50 mM HEPES, pH 7.5, 300 mM NaCl, 10 mM imidazole, and 5% glycerol) in wells 2-6, and 150uL of magnetic resin was also in placed in well 2. The instrument will move the magnetic resin to well 1 to bind the protein of interest, then wash the resin in wells 3-6.The final step involves the instrument eluting the protein from the magnetic resin with the same buffer plus 500 mM Imidazole.After purification, elutions are dialyzed against 50 mM HEPES, pH 7.0, 50 mM NaCl, 10 mM MgCl 2 and 1 mM Dtt.

Nucleotide exchange experiments for producing GDP and GTPanalog-bound proteins
To prepare enough KRAS with the desired nucleotide, about 100uL of 100 µM KRAS was placed into a spin concentrator, 5 kDa MWCO vivaspin 500 (Sartorius #VS0112).Then 300uL of 'low magnesium' buffer (10 mM HEPES, pH 7.5, 50 mM NaCl, 500 µM MgCl 2 and 2 mM Dtt) was added and concentrated to 100uL, then the buffer addition and concentration to 100uL was repeated two more times.To exchange the nucleotide, 100uL of Nucleotide exchange buffer (20 mM HEPES, pH 7.0, 50 mM NaCl, 10 mM EDTA, 20-fold molar excess concentration of nucleotide, GTP, or GPPnP) was added and left on ice for 90 min.Then 20 mM MgCl 2 was added to 'lock' the nucleotide in place and left on ice for another 30 min before using 7 kDa MWCO spin desalting column (Thermo Fisher #89882) to exchange the protein into 10 mM HEPES pH 7.0, 20 mM NaCl, 5 mM MgCl 2 .

Measurement of KRAS thermostabilities
The NanoTemper Prometheus (NanoTemper Technologies) is a nano-DSF that can determine Thermal unfolding temperatures using the intrinsic fluorescence of the protein over a temperature gradient.The instrument uses glass capillaries requiring only 10uL of 1 mg/mL for KRAS.After nucleotide exchange, samples of each protein in both the GDP and GPPnP state were diluted to 1 mg/mL.Then, a capillary is placed in the sample and filled with capillary action.Temperature gradients were 20-95 • C at a speed of 0.5 • C/min.The melting temperature is determined to be the peak maximum of the first derivative of the 350/ 330 nm fluorescence ratio, fit with the software provided with the instrument (PR.ThermControl).

Results and discussion
The intrinsic hydrolytic activity of KRAS is primarily regulated at a conformational level by two loops designated Switch 1 and Switch 2 (Fig. 1A).Specific conformational changes, differently stabilized by GTP and GDP, coincide with binding different nucleotide exchange factors (GEFs), activating proteins (GAPs), or additional signaling proteins (effectors).We generate and analyze Molecular Dynamics (MD) to observe the different Switch conformations across an expanded KRAS mutational landscape.These calculations allow us to distinguish different Switch dynamics for each distinct mutation and group mutations by their similarity in dynamic behavior.Our goal is to interpret better the nuanced effects of each genetic variation on the encoded protein.Indeed, there is even potential to use these dynamics-based scores to develop a robust model capable of providing direct insight on how KRAS mutations cause disease, spanning classic hotspot, nonclassic hotspot, and non-hotspot genetic variants.

Experimental KRAS structural characterization has focused on a narrow field of mutations
We generated an experimental data corpus of KRAS Switch dynamics across multiple functional and ligand-bound states to quantify the current scope of KRAS structural studies.This curated dataset also serves to measure how extensive our simulations sample the experimental landscape.We identify 194 experimentally solved KRAS structures (Fig. 1B).We find that 73 (38%) do not have Switch 1 or Switch 2 fully resolved, meaning these loops were too dynamic in the protein crystal to assign coordinates.Therefore, current experimental data can be extended using simulations to enhance our understanding of Switch functional conformations and their mutational dependencies.Additionally, only 18 distinct mutants have experimentally solved structures, further demonstrating the utility of MD in providing unique, complementary, and highly detailed information on a large scale (Table S1).Of these 18 mutants, WT, G12C, and G12D comprise 137 (71%) of the 194 experimental structures (Table S1).Only 36 (19%) are with KRAS in complex with other proteins (detailed in Table S1), of which 22/36 (61%) of these complexes feature a nanodisc with an engineered RAS binding protein (DARPins and affirmers).Only 7/36 (19%) conformations are in complex with physiologic binding partners such as NF1 (GAP), SOS1 (GEF), and RAF1 (effector).In summary, there is a lack of uniformity across the experimentally determined ensemble of KRAS structures, demonstrating the need for a uniform and systematic approach to obtaining structural and dynamic information across a broad KRAS mutational landscape.

Coordinate PC landscape differentiates among groups of mutations
From our MD simulations, we uniformly and systematically quantified differences in conformational sampling among mutant KRAS proteins by first performing Principal Component (PC) analyses.We focused on the top 3 PCs, which capture the most dominant movements, collectively describing 37.7% of the conformational variation across all simulations.PC1 describes the overall movement of Switch 1, while PC2 and PC3 represent different Switch 2 movements (Fig. S1).We used a free energy landscape approach to summarize the sampling of the top PCs from GTP (Fig. 2A) and GDP (Fig. 2B) bound simulations.A critical difference between the two nucleotide states is that GTP-bound simulations sample around one energy well, while the GDP-simulations sample between two energy wells.These two wells indicate conformations with Switch 1 folded into the protein (positive direction, predominantly occupied by WT) or folded out where it will no longer stabilize the active conformation.This result is concordant with the inactive nature of GDP-bound KRAS.Yet, it is variably sampled by both hotspot and non-hotspot mutations.Switch 1 is stabilized in the GTPbound conformation more than in GDP-bound, due to interactions with the gamma-phosphate and Mg +2 .Further, GDP-bound PC2 correlates with E int while GTP-bound PCs do not (Fig. S2).These data indicate that multiple distinct changes across different mutations can result in similar molecular mechanical phenomena like Switch unfolding.
We quantified each mutation's time spent within these two energy wells (Fig. 2C) and used the occupancy ratio to scale the mutations.For example, we found that WT, K5N, G12V, G13I, L23R, D30E, Q61K, and K117N exhibit low Switch 1 dynamics that favor active RAS conformations.Alternatively, variants such as G12D, Q22R, H27Y, F28L, P34Q, Q61R, and Y71D sample an extended Switch 1, favoring inactive RAS conformations.Thus, we find utility in the PC metric as a variant that overly favors the flexibility or rigidity of Switch 1 to be conducive to dysregulating RAS pathway signaling.
The utility of PCs in distinguishing between specific variants is further demonstrated by comparing G12D, G12R, and WT between GTP (Fig. 2A) and GDP states (Fig. 2B).We highlight these specific mutations because G12D is the most studied mutation, while G12R is gaining attention as commonly observed in gastric cancers.The WT simulations exclusively sample stable closed Switch 1 conformations; G12D simulations exclusively sample extended Switch 1 out conformations, while G12R balances between them.Following this phenomenon is the Switch 1 RMSF of these three variants, which differs between these variants on a sequence basis (Fig. S1C).G12R has high fluctuation values for amino acid positions 34-40, WT has high fluctuation values from 31 to 40, and G12D features high fluctuation values for amino acid positions 27-40.These specific details for how the phosphate-binding loop (p-loop) and Switches have altered dynamics are likely the underlying features for differences in intrinsic activities.
In summary, mutations with high Switch 1 flexibility are less WT-like and more G12D-like.This result is important because the flexibility of Switch 1 is indicative of a decrease in interaction between RAS and the nucleotide, a phenomenon that previous literature [31] describes as having the potential to cause nucleotide expulsion or NF1 binding.As such, there is potential to estimate how much each variant favors the NF1 binding conformation by observing how similar Switch 1 dynamics are to the G12D simulations (Fig. 4C).Therefore, we quantify ligand interactions and their associations with GAP and GEF binding conformations across mutations in the following sections.

Switch conformational dynamic differences group KRAS mutations by effector binding compatibility
We next compared the total difference between our MD ensembles and different solved KRAS structures in complex with physiologic binding partners.Because the Switch conformations regulate binding to GAPs and GEFs, we calculated their structural match to the GAP, NF1 (Fig. 3A) and the GEF, SOS1 (Fig. 3B).We calculated the RMSD between evenly selected conformations from our simulations to compare these reference conformations.Our analyses identified consistent trends with mutations remaining near WT or favoring one of the binding competent conformations.Furthermore, the variants that favor the NF1 competent binding conformation over WT generally disfavor the SOS1 competent Fig. 2. Principal Component Analysis reveals a duality in Switch 1 conformational dynamics.A) Free energy landscape of PC1 and PC2 of C α coordinates within GTP simulations.The landscape is colored such that the dark blue region represents a higher density of coordinates, and the darker red represents a lower density of coordinates.Above and to the right of the landscape are violin plots representing the distributions of coordinate space for PC1 and PC2, respectively.The violin plots are colored such that WT is blue, G12D is green, and G12R is red.B) Free energy landscape of PC1 and PC2 of C α coordinates within GDP simulations.The landscape is colored such that the dark blue region represents a higher density of coordinates, and the darker red represents a lower density of coordinates.There are two different regions of blue level density where the right well is labeled energy well I and the left is marked energy well II.Above and to the right of the landscape are violin plots representing the distributions of coordinate space for PC1 and PC2, respectively.The violin plots are colored such that WT is blue, G12D is green, and G12R is red.C) Ratio between GDP wells I and II labeled by variant and sorted according to the ratio.A ratio of 0 indicates a variant does not sample well II, and a ratio of 1 indicates a variant does not sample well I. conformation and vice versa.The most extreme example is A18D, which favors the SOS1 binding conformation and disfavors the NF1 binding conformations greater than all other variants.By observing these trends across the variants, we separated the variants into groups based on deviation from WT.There are neutral variants which do not favor or disfavor any of the three conformations, compared to WT, comprised by 14/86 variants when comparing to NF1 (G12V, G13D, G13V, G15V, Q25H, N26Y, D33E, T35A, T35I, A59T, Q70P, R135T, A156V) and 14/ 86 variants for when comparing to SOS1 (G13R, G13I, V14I, G15V, Q22L, L23I, T35I, N26S, A59T, G63M, Y71C, H95N, A146V, L159S).The other variants, more than 1σ from WT, favor or disfavor one or more reference conformations depending on which nucleotide is bound.This offers insight into how these variants cause disease.For example, K5N has a similar deviation to WT from the SOS1 binding conformation when GDP is bound but is 3σ from WT when GTP is bound, indicating that K5N lacks affinity for the SOS1 binding conformation only when GTP is bound.Mutant proteins in these alternative groups have a higher propensity to maintain specific binding competent conformations instead of changing between conformations, predicting the mechanism by which these variants dysregulate the RAS pathway to cause disease.
RMSD is an excellent scoring method for assessing overall conformational change but does not provide a complete picture regarding detailed trends in conformational dynamics.Therefore, we first quantified per-mutation differences in Switch and allosteric (C-terminal) region flexibilities (Fig. S2).Globally, Switch 1 and 2 have the most flexibility in our system, concordant with the conventional understanding of KRAS dynamics and data across all available experimental structures (Figs.1C and 1D).We used the different degrees of flexibility Fig. 3. KRAS mutations exhibit significant differences in effector binding conformational dynamics.A) The replicate-averaged mobility of each amino acid displays the most extensive flexibility for the two Switches, yet with high mobility variable across mutations in the allosteric lobe.Dotted lines mark the WT values.B) The replicate-average difference from solved NF1 binding competent KRAS pose (PDB: 6OB2).C) Sum of Square Residuals (SSR) from the average RMSF between all variants on a per amino acids basis.D) Variants feature differing stabilities concerning the presence of the nucleotide in the KRAS binding pocket.The horizontal and vertical lines represent the average nucleotide interaction energy value for WT.The concentric circles represent how many standard deviations away variants are from the WT.The inner circle represents one standard deviation, the middle ring represents two standard deviations, and the outer circle represents three standard deviations.Each variant is also colored according to its distance from the WT.Variants colored in blue feature similar E int values to the WT.Red color variants represent variants that are far from the WT.
among mutations to subdivide mutations by their flexibility patterns, within and between Switches, and across the C-terminal allosteric lobe.Differences in dynamic patterns were calculated using the SSR from the WT profile (Fig. 3C).This approach identified variants with similar dynamics to WT and variants that deviate.In GTP simulations (Fig. 3C), 42/86 have |SSR MT -SSR WT | < 0.5, indicating similar flexibility profiles.Likewise, in GDP simulations (Fig. 3C), 41/86 are within 0.5 of SSR WT .There is an overlap of 21/86 variants (24%) within 0.5 SSR in both the GDP and GTP bound states and 45 (52%) within 0.5 in either.Reciprocally, 41/86 (48%) variants demonstrate increased variability in conformational dynamics compared to WT.Also, mutants with a high sum of square residuals (SSR) in both GDP and GTP simulations, such as N26I, Q61L, and A146T, overly favor the SOS1 binding conformation regardless of the nucleotide-bound state.SSR values and mutational groups are listed in Table S2.We anticipate these mutations alter cellular signaling primarily through changes to their intrinsic dynamics, Fig. 4. PC movements from simulations have specific biophysical interpretations.A) Leveraging the deep history of the RAS study, we identified ten distance measurements that have been experimentally characterized and used as monitors to detect four different biophysical phenomena (Table 2).The first three PCs are strongly associated with four biophysical phenomena.B) Display of each distance monitor used to describe the conformational change.Amino acids in each monitor are rendered in sticks, while the remainder of the KRAS structure is in cartoon representation.A black line is drawn between the R-groups of relevant amino acids to display a potential interaction between these monitors.C) Full correlation plot between the PCs and the distance monitors.Red circles represent negative correlations, while blue circles represent positive correlations.The size represents the magnitude of correlation strength, where a larger circle is a stronger correlation.

B.D. Ratnasinghe et al.
which dysregulates coupling to effector proteins and thereby defines their role in human diseases.

Interaction energy elucidates differing nucleoside binding responses on a multi-variant basis
Interactions between the nucleotide and the KRAS switch regions are important for proper protein function.One crucial use of MD is calculating the specific properties concerning the non-bonded interactions throughout the protein.As such, we leveraged this critical aspect of MD and calculated the interaction energy between the nucleotide and KRAS protein (Fig. 3D).The distribution of values on a per-variant basis divided the variants into distinct groups based on the difference from the WT.Most variants [41] are within 1σ of WT, which indicates little change in ligand interaction energy.The variants between 3σ and 4σ from WT are T3A, G12S, Q13I, A18D, Q25H, Q61K, Q70P, and K117E.The variants ≥ 4σ from WT include G12R, G13D, G15V, Q60R, and K117N.Thus, KRAS mutations fall into a stabilizing, neutral, and destabilizing spectrum.For example, E31Q is 1.5σ from the WT interaction energy for both GTP and GDP, indicating the protein has a more energetically favorable interaction with both nucleotides.Alternatively, Q25H has a similar GDP interaction energy to WT and a less stable GTP interaction energy at almost 3σ from WT.These results indicate that Q25H has favorable interactions with GDP but less favorable interactions with GTP, which likely functionally complement each other to reduce intrinsic GTPase activity.The energetic scores for all proteins are available in Table S2.In summary, relative nucleotide stability informs which variants have overly unfavorable or favorable magnitudes of nucleotide interactions, which have a strong potential to predict how variants can mishandle the nucleotide and dysregulate the RAS signaling pathway.

Biophysical interpretation of molecular dynamics scores describes mutational functions
Above, we assessed individual MD-based metrics that identify differences in conformational sampling across KRAS mutant proteins and predict features of their intrinsic differences.Now, we biophysically annotate the PC scores and interpret our simulations in the context of precise observations established from the extensive experimental literature on RAS.First, we identified a set of ten pairwise distances within KRAS that are either markers for biochemical activities or critical structural features of distinct RAS conformations according to biophysical measurements, including from NMR spectroscopy (Fig. 4B) [3,30,31,40].These ten distance pairs fall into four themes associated with different hotspot mutations (Table 1).They quantify the extent of opening Switch 1 and Switch 2 from the p-loop, the coordination of movements between Switches, and the electrostatic coordination of the magnesium cation accompanying GTP.MD-based PCs capture switch motions, the basis for the four themes (Fig. 4B), and strongly correlate with the ten distance monitors (Fig. 4C).For example, the distance monitor Y32 to Y59, which describes the interconnectedness between Switch 1 and Switch 2, correlates (ρ = − 0.75) with both PC1 and PC2, demonstrating that when variants favor the negative directions of PC1 and PC2, the Switch regions separate.Additionally, there is nearly a reciprocal relationship between PC1 in the GDP and GTP states, where Switch 1 has the opposite movement direction as the enzyme shifts between active and inactive conformations.In the GTP simulations, PC1 is anticorrelated (ρ = − 0.8) with distance monitors G12 to T35 and Q61 to T35.This trend is reversed in the GTP simulations demonstrating fundamental differences in how the nucleotide affects KRAS dynamics.These data confirm our simulations' functional and biophysical nature and establish their biophysical interpretation.
To determine relationships among all MD-based metrics (spanning RMSF, PCs, interaction energies, and distance monitors), we computed a complete correlation matrix (Fig. S2A and Fig. S2B), revealing relationships among these metrics.As previously discussed, PC scores  [3] Hbond within switch as well as ring stacking.G13D favors intermediates not used by WT G12D, G13D, WT [3,40] Describes the interconnection between switch 1 and switch We also generated a heat map of the MD metrics per-variant basis to gain insight into which variants have similar score profiles (Fig. S2C).Kmeans clusters of the heatmap data divide the mutations with similar SSR profiles, with about half of the mutations demonstrating significant changes in multiple scores simultaneously.For example, a slight change in Switch 1 or 2 has the potential to cascade and cause larger conformational changes, which are observed in unison by our MD metrics.The correlation matrix provides a complete summary of the interconnectedness between our MD metrics and the multi-faceted effects of each mutation on KRAS intrinsic dynamics.

Biophysical measurements confirm the uniqueness of each mutated protein
We developed a uniform high-throughput platform for recombinantly expressing and purifying human KRAS mutants from E. coli (see Methods).We successfully purified 65 of the 86 mutant proteins; the 21 not purified are K5E, G13I/V, V14I, G15V, S17G, A18D, L19F, Q22L, L23R, N26I, T35I, K117E/N/R, and A146V.We used thermostability measurements to determine relative similarities among the 65 KRAS mutations and their relative propensity for activating (GTP-stabilized) or inactivating (GDP-stabilized) conformations.After optimizing buffer conditions, WT KRAS measurements were highly reproducible with the GDP-bound form melting transition at 62.8 ± 0.9 • C, and 55.6 ± 1.4 • C when bound to the non-hydrolyzable the GTP-analog, GMP-PNP (referred to as "GTP state" herein).Thus, WT-KRAS is more stable in the inactive GDP-bound form by 7.2 • C. Temperatures of unfolding were highly similar between the two ligands and across mutations (p-value = 26 ×10 − 7 ), demonstrating a consistent pattern and system balance.
Thermal melting temperatures of KRAS mutations spanned a wide range of values from highly destabilizing (40 • C) to stabilizing (65 • C; Fig. 5A).Differences between GDP and PNP were bookended on one end by mutations with minor differences such as Q61R (ΔΔT m GDP-PNP =0. ).Q61R was expected to have the greatest difference in activated compared to inactivated KRAS stability, yet both nucleotide-bound forms were destabilized.Conversely, compared to G60V, with the greatest destabilization of the activated KRAS and little effect on the inactivated form.Mutations such as G12V stabilize both states but stabilize the GTP state much more than the GDP state.Other mutations, such as G13C/R/S and G12C, destabilize both states equally.At certain non-hotspot sites, such as G60, loss of the native amino acid has a larger effect on stability than differences in the alternate amino acids; G60A/R/V all have minor effects on GDP state (ΔΔT m = 0.5 ± 0.7 • C), but significantly destabilize the GTP state (ΔΔT m = − 5.8 ± 3.4 • C).The most extreme destabilization occurs for A146T, but the change is greater for the GDP state (− 16.7 • C) than for the GTP state (− 11.5 • C).Across our thermostability measurements, the mutational landscape of KRAS produces proteins with a spectrum of nucleotide-dependent stabilities that we anticipate alter intrinsic activity.

Computed features from molecular simulations predict biophysically meaningful values
Again, using our full complement of scores, UMAP embedding was used to compare the relative similarities across all variants (Fig. 5B).The results are promising because the UMAP captures the distinction Fig. 5. Using combined metrics allows dimensionality reduction that mirrors T m Data.A) Mean T m difference from WT on a per variant basis sorted by difference from WT. Symbols are as follows: circles represent GDP-bound variants, and triangles represent GTP-bound variants.Each symbol is colored according to their difference from WT, where blue symbols feature a lower value than WT and red symbols feature a higher value than WT.B) UMAP dimensionality reduction calculated using all MD metrics discussed in this study.Each variant is colored according to T m, as seen in A. Variants colored in grey have simulation data but do not have T m data.
between GTP and GDP states, with GDP states occupying one cluster, GTP states occupying a second cluster, and little overlap between clusters.We projected our melting temperature measurements onto the UMAP space, revealing that the GDP variants have a higher melting temperature compared to the GTP variants, yet each following a gradient; within GTP-bound simulations, we observe a linear spectrum with the most stabilized mutations on one side and least stable on the other.These data affirm that, even though melting temperature is a single composite measurement, our collection of MD-based scores captures features relevant to explaining experimental results.Additionally, we took the PC standard scores and colored the UMAPs by the values of those scores.Interestingly, we found a split in the PC1 standard scores closely resembling the melting temperature data split (Fig. S3).Conversely, the PC2 and PC3 standard scores appear unrelated to the melting temperature values.As such, we conclude that Switch 2 confirmation has a lower effect on overall KRAS conformational stability when compared to Switch 1 conformation.

Grouping of variants elucidates the utility of MD scores in categorizing variants by their conformational dynamics
So far, we have described several dynamics-based scores that follow established KRAS biophysical metrics and have utility in identifying critical differences between variants.Using diverse metrics simultaneously, we can more robustly subdivide and characterize groups of KRAS mutations.To better accomplish this, we defined six groups using all calculations (Fig. S2) to generate meta-classifications, which we define as clusters of KRAS mutations that share characteristics across all scores.Individual scores were used to classify mutations as WT-like (within 1σ), less than WT (< − 1σ), and greater than WT (> 1σ), making 90 total meta-classifications (Fig. 6).Every group has at least one hotspot variant except for GTP Group 5, the smallest group with five variants.GDP Groups 3 and 4, as well as GTP group 6, all have one hotspot variant.GDP Group 6 and GTP Group 3 have the most hotspot variants, with six and seven, respectively.All the other groups have three or four hotspot variants (Table 2).The spread of hotspot mutations across these profile groups indicates the importance of characterizing non-hotspot genetic variants and their role in congenital diseases and human cancers.
Each profile has unique features.Indeed, the different groups can readily be distinguished by their meta-classifications.For instance, GTP Group 4 has a higher overlap with distance monitors than GTP Group 2, even though the two groups have a similar number of variants.Investigating the top 5 meta classifications for each profile described the types of dynamics seen within each variant group (Table 3).For example, observing the patterns of meta-classes within GTP Group 4, we discern several points of crucial information.Primarily, the RMSD classification indicates that the variants are conformationally close to the NF1 binding competent conformation and far from the SOS1 binding competent conformation.Three distance monitors explain the precise mechanism underlying this conformational restriction that favors the NF1 binding conformation.E62-H95 and Q61-D92 are monitors for Switch 2, and both indicate greater flexibility in this group of variants.The A11-Q61 monitor describes Switch 2 movement away from the nucleotide.This is because A11 is within the p-loop involved in nucleotide interactions, and Q61 is within Switch 2. As such, we can summarize GTP Group 4 as a group of variants that favors NF1 binding and flexibility of Switch 2. Using our meta-classification information (Table 3 and Fig. 6), similar descriptions are evident for every group.Therefore, we display the capacity of molecular dynamic data to provide a systematic interpretation of genetic variations, enabling a functional understanding of the KRAS mutational landscape.

Patterns from dynamics calculations across mutations match existing experimental data
We consulted our established RAS resource [3,30,31] for groups of mutations that share experimentally derived features and compared them to our results described above.Two major hotspots, G12D and G13D, have been studied extensively via biochemical, biophysical, and cellular experiments.These previous studies revealed that both favor non-WT-like conformations, with G12D favoring the GTP-bound, Raf-bound state.In contrast, G13D favors this state to a lesser degree due to its destabilization of the binding pocket and the development of additional intermediates.G12D and G13D lie on nearly opposite sides of our UMAP space (for GTP and GDP forms), supporting their divergent effects on structure and dynamics in simulation.Additionally, the G12V and G13V classic hotspot mutations lie close in UMAP space to G12V in the active GTP-bound state yet diverge in the GDP-bound state.Heterogeneity in fibroblast and epithelial cell response to distinct Q61 mutations [41] match our results in that Q61E/P displays a different pattern of dynamics-based scores compared to Q61H/L/R, the latter of which more closely resembles G12D.Yet, in our analysis and previous experiments, Q61R remains closer to Q61P in character than Q61H/L.Non-hotspot mutations are less studied experimentally; in our assessment, non-hotspot mutations exhibited an even more diverse array of structure-and dynamics-based changes than hotspot mutations.

Conclusions
We have uniformly characterized a large and diverse repertoire of 86 KRAS mutations, spanning the alterations observed in congenital diseases and human tumors.Our simulation-based approach to representing the effects of genomic mutations on encoded 3D protein structures and their dynamics is scalable to hundreds of mutations, sensitive to details of the molecular environment (e.g., GDP versus GTP), and captures patterns identified through experimentation.This level of mechanistic specificity and molecular detail is currently lacking in genomics.Thus, our approach extends our previous work and achieves increased resolution for genomic data interpretation.
In previous work, we advanced the field of translational genomics by using calculations of 3D translated gene products, rather than the gene sequence, for better interpreting cancer-causing gene mutations and inter-individual genetic variation [4].We also used experimental structures across the RAS GTPase family to scale the interpretation of human genetic variation by leveraging structure bioinformatics into a thousand variants [5].In the current study, we explored the utility of 3D dynamic information for extending structural bioinformatics for genomics.We developed a series of dynamics scores that capture established experimental biophysical markers and used them to define groups of variants with related effects.Our approach of standardizing PC-based scores to WT dynamics was particularly helpful in matching with experiments.The strong correlation between the PC standard scores and biophysical markers allows for an interpretation of how different types of conformational change will cause different effects on the intrinsic activity of this key enzyme and signaling regulator.High-level trends across the structure and dynamics-based calculations also correlated with melting temperature values.As such, we have increased confidence in the utility of dynamics scores in differentiating between genomic variations.Indeed, in future studies, these results establish the potential to use dynamics scores on a genomic basis to distinguish disease variants and their potentially divergent mechanisms.
While the field now acknowledges heterogeneity in enzymatic and cellular outcomes across KRAS mutations, the ability for computational approaches to elucidate details of intrinsic dynamics may not be universally accepted.As data science tools continue to advance, their role in hypothesis generation, testing, and functional inferences similarly advances.Given the increasing accessibility of high-performance computing, ingenuity in algorithms, and complexity of analytic resources, studies such as the current work will establish a new level of mechanistic information about mutant proteins.This data applies to all human body tissues and cellular contexts, which can be applied to systems modeling to increase predictive and inferential power.
With any simulation-based study naturally comes the question of adequate sampling.The computational power used for simulations must be realistically balanced with the scope of molecular phenomena expected for such methods to scale on a genomic basis.In this study, we performed medium-length simulations to assess a baseline for the comprehensive study of inter-individual KRAS variation and as a pilot for their application on a more extensive basis.Indeed, we could observe important dynamic features from the current extent of sampling.Yet, we expect our future studies to focus on enhanced sampling techniques over more extended classic MD simulations to enumerate better the mutation-specific conformations and dynamics, which will likely elucidate additional details and nuance for the specific intrinsic molecular dysregulation.Likewise, the current simulations carry clear biophysical information yet do not precisely recapitulate changes in T m .Enhanced sampling will better elucidate the ensemble properties that reflect thermostability.While thermostability is important, especially for KRAS, and as it applies to stabilizing or destabilizing the active and inactive states, thermostability may provide limited information regarding the precise dimensions of enzyme function.Protein dynamics are too complex to be adequately described by a single value in an experiment or simulation.The strong correlations between our analysis and established experiments indicate merit in medium-length simulations for genomics and patterns of KRAS dysregulation that span hotspot and non-hotspot mutations.
As we continue optimizing our approach, we anticipate that dynamics-based scoring can be broadly applied to describe the differences and similarities between damaging and tolerated genetic variation.Further, more layers of information, such as protein complexation, membrane interactions, and more, can be incorporated to predict better when damaging variants will be pathogenic.This is a major opportunity  because of the variable penetrance and expressivity associated with many disease-causing mutationscomputational modeling and biophysical approaches can add context that may help explain these differences.Dynamics-based differences can produce mutation-specific pockets, which afford new drug development opportunities.The most recurrent somatic hotspot variants now have clinical trials for mutationspecific inhibitors.Mutations responsible for congenital diseases and less common in cancer will likely gain meaningful insight into their dynamic similarities to the classic recurrent oncogenic mutations.We believe the insight gained from molecular simulations will provide the foundational knowledge to develop new treatments and systematically interpret genetic variation.In summary, we successfully developed a new scalable workflow capable of functionally grouping genomic variants using MD scores.In using established experimental findings, we contextualized and validated our data.MD has been used mainly for studies that are biophysical in nature.Our workflow presents an expanded scale of MD using the method for genomics interpretation.Furthermore, MD is uniquely advantageous for scoring variants uniformly, from the most recurrent to one-of-a-kind.As such, we will continue improving our workflow to increase the ease and commonality of using MD to explain interindividual genetic variation, which may one day be applied to elucidate disease mechanisms and inform treatment with mutation-specific therapeutics targeting mutationally restricted conformations.

Fig. 1 .
Fig. 1.RAS Structural Overview.A) KRAS Structure colored in grey bound to GDP with carbons in tan and Mg 2+ colored Green (PDB: 4OBE).The p-loop is colored red, Switch 1 is colored pink, and Switch 2 is colored yellow.Additionally, α helices 1, 2, and 3 are labeled where α helix 1 is the first helix in the KRAS sequence and α helix 3 is the third helix in the sequence.The same KRAS structure with the same coloring rotated 60 • .Here, β sheets 1, 2, and 3 are labeled where β sheet 1 is the first sheet in the KRAS sequence and β sheet 3 is the third sheet in the sequence.B) Complete representation of all 194 PDB deposited KRAS structures (chain A).The GDP bound WT structure is used as the starting point for simulations (PDB: 4OBE) and is shown in cartoon representation with the other 193 conformations displayed in ribbon.The p-loop is colored red, Switch 1 is colored pink, and Switch 2 is colored yellow.

Fig. 6 .
Fig. 6.Rankings of Different Meta Classifications by Occupancy Across Clustering Groups.We define a Meta-class as the K-means groups across the 90 metric-specific classifications, which were then filtered down to 37 classifications where each classification is in at least one of the top 10 classifications for each K-means Group.The bars are ordered by the type of score that each Meta Classification comes from.The score type is indicated by the lengthwise bars below each plot.Green indicates RMSD, Purple indicates Interaction Energy, Magenta Indicates PC1 fraction, Teal indicates Switch 1 Distance Monitor, and Orange indicates a Switch 2 Distance Monitor.The Distance monitor Meta-classes that either.

Table 1
Description of previously characterized structural hallmarks of KRAS activity.
with groups of distance monitors.Furthermore, the PCs and these distance monitors correlate with groups of RMSD and RMSF metrics depending on the binding competent conformations and effector regions the PCs correspond to.This correlation structure is subdivided by metrics that strongly affect Switch 1 and metrics that affect Switch 2. correlate

Table 2
Variant Identity of each Kmeans Group.

Table 3
Summary of Top 5 Input Features for Distinguishing each Meta Classification.