DataXflow: Synergizing data-driven modeling with best parameter fit and optimal control – An efficient data analysis for cancer research

Building data-driven models is an effective strategy for information extraction from empirical data. Adapting model parameters specifically to data with a best fitting approach encodes the relevant information into a mathematical model. Subsequently, an optimal control framework extracts the most efficient targets to steer the model into desired changes via external stimuli. The DataXflow software framework integrates three software pipelines, D2D for model fitting, a framework solving optimal control problems including external stimuli and JimenaE providing graphical user interfaces to employ the other frameworks lowering the barriers for the need of programming skills, and simultaneously automating reoccurring modeling tasks. Such tasks include equation generation from a graph and script generation allowing also to approach systems with many agents, like complex gene regulatory networks. A desired state of the model is defined, and therapeutic interventions are modeled as external stimuli. The optimal control framework purposefully exploits the model-encoded information by providing those external stimuli that effect the desired changes most efficiently. The implementation of DataXflow is available under https://github.com/MarvelousHopefull/DataXflow. We showcase its application by detecting specific drug targets for a therapy of lung cancer from measurement data to lower proliferation and increase apoptosis. By an iterative modeling process refining the topology of the model, the regulatory network of the tumor is generated from the data. An application of the optimal control framework in our example reveals the inhibition of AURKA and the activation of CDH1 as the most efficient drug target combination. DataXflow paves the way to an agile interplay between data generation and its analysis potentially accelerating cancer research by an efficient drug target identification, even in complex networks.


Supplement
We provide further informa2on about how to use the graphical user interfaces (GUIs) of JimenaE, how to use our soAware for fiCng models adap2ng topologies and use the op2mal control framework for iden2fying efficient drug targets.
JimenaE GUI explana/on to set up D2D scripts and their descrip/on Once a graph, represen2ng the regula2on topology, is set up, we use this informa2on to generate the SQUAD equa2ons that model a dynamic gene regulatory network automa2cally with the JimenaE framework.In order to generate the D2D script to determine parameters of the SQUAD model fiPed according to the data, we use the following JimenaE interface: First, we have to load a network file (graph) into JimenaE.In the menu "Network", we can select "Import yEd File" to choose a ".graphml" file that represents the network topology (reg-ula2on informa2on).
AAerwards, we open the GUI by selec2ng "D2D" in the "Analysis" menu.The GUI is next explained step by step.The fields D1 to D3 allow us to select specific nodes for different purposes.D1: Nodes that are up or down regulated by an external s2mulus.By "posi2ve regula2on", we add the term + ! !$1 −  " ( to the corresponding j-th equa2on (E2).The mapping between gene names and the index  is provided in the D2D script that is output by pressing "Generate".Analogously, when marking a gene in the "Nega2ve regula2on" column, the term − ! ! " is added to the corresponding -th equa2on (E2).We remark that any list of nodes, equa2ons or any numerical representa2on of nodes follows the same order as in the original network file.This might differ from the representa2on order used in other parts of JimenaE, as some2mes an alphabe2c listening of nodes is used.the op2miza2on procedure to fit the parameters best to the data, as selected in the ini2al values sec2on (D4).For each parameter, an arSetPars ('label', init_Value, qFit, qLog, lb, ub) func2on is listed.With 'label' holding the parameter name, init_Value the ini2al assignment, qFit is fixed, fiPed or constant, here always fiPed, qLog is normal or logarithmic, here always normal, lb the lower bound and ub the upper bound of values the corresponding parameter can take during the op2miza2on process.To execute the "arSetPars" func2on for all parameters at once, we provide the script "arInitValues" in the folder "D2D_extended" within our git repository.Ensure that in the script "arInitValues" the accurate path to the output from JimenaE "name_InitValues.txt" is set.Data nodes list: "name_DataNodeNames.csv": Lis2ng all nodes that were selected in the Experimental Data Nodes sec2on (D2).This file is an easy way to later find and extract the corresponding data from the measurements with scripts where this list can be read in.The script we used to read in our data can be found in the "D2D_extended" folder of the corresponding git repository.

F1:
The PREDICTOR holds the informa2on about the 2me dura2on of the experiment as assigned in the GUI (D5) (Fig. sup1).F2: The STATES are the variables for the ac2va2on levels of the nodes in the network, auto-ma2cally derived from the network file loaded into JimenaE.The first column provides the mathema2cal expression while the seventh column provides the corresponding gene name.F3: The INPUTS define the mathema2cal expressions for the external s2muli.The defini2on in the data.defoverwrites the defini2on in the model.def.This part can be set by the GUI (D1) (Fig. sup1).F4: The ODES are the equa2ons for each node (E1).They are listed in the same order as their original node counterpart was listed in the network file.F5: The OBSERVABLES define the states for which there is measurement data as selected in the GUI (D2) (Fig. sup1).The measurement data consist of a variable represen2ng the best value for each gene ending with "_obs" and a variable for the measurement error (standard error) ending with "_std".The data is provided with the data.csvfile.We provide an example of a data.csv in folder "Example" in the provided git repository.A data.def file is needed for each experiment.An experiment is defined as one seCng of ini2al values of ac2vity levels of each node of the network and one seCng of external s2muli.In each run of that experiment, the ini2al values are the same and the external s2muli are set up the same, meaning, e.g., same intensity/concentra2on and 2me curves.For this fixed seCng, the experiment is repeated several 2mes and measurements are done aAer the same 2me aAer the star2ng point each, called  % .For each  % , best values and standard errors can be calculated from the corresponding measurement points.The ini2al values of the model are included into the fiCng parameters and adapted during the best fit procedure such that the 2me curves of the model fit best to the 2me curves consis2ng of best values and standard errors.In case, there are experiments that start in different ini2al values, we need to define different ini2al values such that the op2miza2on procedure can account for that circumstance.One important further applica2on for different ini2al states is the switch from one steady state to another like demonstrated in (Breitenbach et al. 2019).In such a use case, a cell is in a stable steady state as ini2al state and is perturbed with the external s2muli such that it transits to a different steady state.In order to change ini2al values for different experiments, we can use the CONDITIONS in the data.def to overwrite the standard name of the ini2al value "init_gene-Name" as follows.Each experiment, where the system starts in the "ini2al value 1", we write "init_geneName "init_geneName_1"" (without the outer quota2on marks) under CONDI-TIONS in the corresponding data.def.Analogously, we write "init_geneName "init_gene-Name_2"" (without the outer quota2on marks) under CONDITIONS in those data.deffiles where the nodes of the network ini2ally start with the "ini2al values 2".Then the op2miza2on algorithm can take different ini2al states/values into account for different experiments.However, we remark, that for each repe22on of such an experiment, the ini2al value should be the same.Otherwise, we might get high standard errors for the corresponding data points.
JimenaE explana/on to set up scripts for the external s/muli framework and their descrip/on The corresponding MATLAB scripts for the external s2muli framework are generated with the following JimenaE GUI (Fig. sup4).This GUI can be opened aAer loading the network file that has been used for the fiCng process by clicking on the slider "Analysis" and then "D2D Exter-nalS2muli".AAer opening this GUI, we have to first load two files (S1) and (S2) that have been generated by the previous GUI (Fig. sup1) and the D2D framework.The first file (S1) contains the results outpuPed by the D2D fiCng procedure with the PEtab rou2ne.The second file is the mapping file that is generated with the generate-buPon from the first GUI (Fig. sup1).AAerwards, we can specify which nodes should be a target and change their value by the influence of external s2muli (S4-S6).S1: To enable the system to work with the parameters calculated by D2D, load the corresponding ".tsv" file.We can find this file name_parameter.tsv in the parent folder within in the D2D folder with the name PEtab.S2: Loading in the mapping "name_D2DMapping.tsv",ensures that the parameter values from D2D are mapped correctly to their corresponding parameters in the JimenaE framework.S3: The "Use Parameters" buPon allows us to set the parameters of the network to the parameter values calculated by D2D and is helpful if we plan to use other func2ons of JimenaE, like for example the switch analyzer (name of the method to calculate op2mal external s2muli to switch between steady states, see Methods sec2on for details) or node centrality (weights nodes with respect to their influence on the network state, see (Kaltdorf et al. 2023)), where without pushing the buPon, standard values are used.AAer pushing the buPons, just navigate to the corresponding analysis with the "Analysis"-slider.S4: Here we can select nodes that should aPain a certain target state the network is supposed to get by the influence of the external s2muli (e.g., high apoptosis value for cancer cells by the drugs or treatment of the drug targets).The target value  " should be between 0 and 1.The weight  " models how important it is for the user that the corresponding ac2vity level of this node reaches the target state compared to the others.If the weight is zero, the values of the corresponding nodes are not considered in the objec2ve accoun2ng for devia2on from the target states and is minimized subject to corresponding constraints.S5: This list displays all nodes that were selected as Regulated Nodes in the first GUI (D2D script genera2on (D1)) and if they were selected as up or down regulated by an external s2mulus.Unchecking a node effects the corresponding external s2mulus such that it is not included into the genera2on of the external s2muli script.S6: Here we can select further nodes for equipping them with external s2muli to steer the network to the desired state (defined in (S4)).The nodes selected work in the same way as described in the D2D script generator under (D1).However, if no data-fiPed coupling constants (s) are available, these addi2onal parameters are set to 1.We remark that any other choice can be made here and is as good just by the fact that we have no data.Maybe, there is an educated guess how to set these values inspired by similar drugs.In case it turns out that results are sensi2ve to the concrete choice, we recommend to purposefully perform corresponding experiments to generate the data.In case of drug target iden2fica2on where there is no drug available yet to influence the network accordingly, we can look at the state affected by the corresponding external s2mulus and see the changes by the ac2on of the external s2mulus.Since the coupling constant and the external s2mulus are a product, the op2miza2on procedure sets the ac2vity level of the external s2mulus accordingly such that the product itself has op2mal values.From the analysis and the corresponding change of the state between with and without the ac2on of the external s2mulus, although we do not have a data-determined coupling constant, we have an es2ma2on of the effect strength a poten2al drug needs to have in terms of the ability to change the ac2vity level as calculated to have the desired effect on the network.S7: AAer loading the two files (S1) and (S2), defining target states (S4) and external s2muli (S5) and (S6), we can now generate the MATLAB scripts for the external s2muli framework.The buPon will open a window allowing us to select the loca2on and name of the newly created file in the same way as explained in the prior GUI under point (D6).The outpuPed file needs to be included into a folder with auxiliary func2ons.These auxiliary func2ons are provided in our git repository under the folder "external_s2muli".The main components of these scripts are explained in the following and depicted in Fig. sup4.The file name ends with "_main.m".

SF1:
The list of parameters with their values calculated by D2D to fill in the variables in the equa2ons below.Any deltas for which no fiPed values are available (e.g., added as the effect of a new poten2al drug onto the corresponding target node) are set to 1.This value is used in the equa2ons for both up and down regulated nodes as selected in the GUI under point (S6).Also see there for a short discussion on the choice of the values for the coupling constant in case of no data to fit them accordingly.SF2: Here all nodes selected as nodes of interest under point (S4) will be listed as long as their set weight  " is greater than 0, as elsewise the targeted value wouldn't have any effect on the objec2ve to be minimized anyway.For each node there are three values separated by "," while different nodes are separated by ";".The three values are the node number represen2ng the node in the external s2muli framework, the targeted value the ac2vity level is supposed to have, and the set weight to weight the corresponding devia2on of an ac2vity level from its desired value.The node numbering follows the same order as they were listed in the original network file and the order of the equa2ons.SF3: Here the number of nodes, the number of external s2muli/controls (both set automa2cally by JimenaE), the 2me delta for the discre2za2on of the 2me for solving the underlying system of equa2ons (E1) (standard 0.1, can be decreased in case of numerical instabili2es solving the system of equa2ons), the total 2me horizon which should coincide with predictor in the D2D script (automa2cally set), the parameter alpha, coincides with  introduced in the op2mal control problem above, weights costs of the ac2ve external s2muli (values greater than zero) compared to nodes of interest that they are close to their target values (for details see ((Breitenbach, Lorenz, and Dandekar 2019)), and the ini2al states of the nodes that fit best to the data (filled automa2cally from the D2D results) are listed.We remark that these scripts are for a scenario where each experiment starts in the same ini2al state.For a mul2-ini2al state scenario, see next paragraph.SF4: The equa2ons for each node as in (E1) are listed here.
Un2l now, we have considered the case where the objec2ve is to hold the network in a state by the external s2muli that is not necessarily permanent meaning if the external s2muli decay, the targeted node values go back to the ini2al ones.In the following, we describe how we can use the framework to calculate op2mal external s2muli to switch between to steady states meaning to transit the network from the ini2al state into a new state where it remains even when the external s2muli decay, see (Breitenbach et al. 2019) for details or the beginning of the Methods sec2on.For this purpose, we extended the original Switch Analyzer (Breitenbach et al. 2019) for the usage with D2D.To fit the network to different ini2al states, please see the Supplement "Jime-naE GUI explana2on to set up D2D scripts and their descrip2on" in the end about using the CONDITIONS.AAer having found the fiCng parameters, we make use of the parameters in the Switch analyzer as follows.We click in the slider "Analysis" on "D2D External s2muli" and in the GUI that opens in the boPom right corner "Use parameters" where we need to provide a path to the mapping file and the parameter file, all output from the fiCng process.Now, the parameters are deposited in the JimenaE.To generate the needed file to find efficient external s2muli to trigger the switch, we first have to search for steady states of our model in JimenaE.This func2on is listed in the Analysis menu under "Find Stable States".Click again into the slider "Analysis" and then "Find stable states".AAer the search is done, a GUI will pop up showing all found steady states and different op2ons for how to con2nue.Note that not all networks have stable steady states, or even if they have, they might not be found due to limita2ons of a used search method.To con2nue, we select the "D2D Switch Analyzer" buPon in the lower right corner of the GUI.A GUI is generated aAer selec2ng this op2on looking like Fig. sup6.SA1: Here we can select any pair of states we want to analyze for their switching behavior.One ini2al and one target state.The external s2muli will be op2mized to find the most effec2ve combina2on to steer the network from the ini2al state to the target state.SA2: This part allows us to select nodes we intend to perturb with the ac2va2ng or inhibi2ng external s2muli to perform the transi2on of the network from the ini2al to the target state.SA3: Loading in the parameters as calculated with the D2D framework.This part is the main benefit over the regular Switch Analyzer, as it allows us to include the finetuned/adapted parameters.SA4: To ensure the values in the parameter file are mapped to their corresponding parameters in the equa2ons, we addi2onally have to load the mapping file, which was created with the buPon under (D6).We find that file at the same loca2on where all other files created by the D2D GUI triggered by (D6) are located.SA5: This buPon allows us to generate the op2mal external s2muli MATLAB script needed for the analysis of switching a network between two steady states.Its genera2on is carried out in the same way as the file crea2on in the other described GUIs.The name of the files created with that buPon end with "_main.m".In Fig. sup7, we explain the main parts.

Fi?ng tutorial
In this sec2on, we present how to adapt the gene interac2on as depicted in Fig. sup8(A) to the single cell data.There are 20 nodes shown in the topology and display the pathway regula2on in a typical cancer (Fig. sup8(A)).The op2mal state to which the D2D best fiCng procedure converges shows oscilla2ons in almost all curves.We see that the oscilla2ng curves go through the data points, however, we favor a solu2on that has constant expression values which might fit bePer the real situa2on.We assume that the oscilla2ons rather come from the fact that the number of our measurement points are too few compared to the number of free parameters that we fit.Together with the fact of too few measurement points, we would like to remark that we cannot use the output of the Chi2Test func2on of D2D to evaluate if the model fits the data based on a level of significance, meaning how likely it is that devia2ons between data and model are rather explainable with noise instead of systema2c model errors.The reason is that the corresponding p-value is not defined due to a too liPle number of measurement points compared to the number of parameters that are fiPed.For details about the ra2onale about u2lizing the chi-square test for model fiCng, please see the Methods sec2on.For more details about the background of the chi-square test, please see (Breitenbach et al. 2022).We explain in the Discussion how a procedure can look like to increase measurement points purposefully in accordance with the presented framework.We remark that in terms of an effec2ve data evalua2on the measurement of data points needs to be considered with the research ques2on and the methods used for analysis.One of our aims of this work is to lower the effort for the applica2on of data fiCng and model analysis regarding efficient drug targets to facilitate data genera2on and its evalua2on at the same 2me.In the presented work, we limit ourselves to a visual evalua2on for fiCng which showcases just as well the applica2on of the total framework and at the same 2me demonstrates that our method can be used in the case of a sparse data founda2on.Under sec2on "Enlarged image of fiCng process" all images of the fiCng results are displayed solely to view them at a larger size.To improve the fiCng of the model with the data, we assume that the oscilla2on could be caused by the missing inhibi2ng input of the node FBXW7 (Fig. sup8(B)) since it seems too highly expressed causing a too strong inhibi2on of AURKA and an inhibi2on could damp the oscilla2ons.Based on data, a yellow node symbolizes the node under treatment condi+ons (KRAS-Inhibitor).The primary func+onal endpoints within this network are represented by apoptosis nodes in red and prolifera+on nodes in blue, highligh+ng important cellular processes.Apoptosis (red node) and prolifera+on (blue node) serve as output nodes.Grey nodes are without any data.B) Fi[ng results of 16 measured genes (18 nodes serve as regula+on nodes, two as output).For each gene, four data points are visualized in the graph depending on +me points (0h(purple), 4h (red), 24h (blue), and 72h (yellow)).In the protein-protein interac+on network, each gene is accompanied by a chi-square value, total Chi2 = 97.8907with 64 data points and 91 free parameters.Created with BioRender.com.
To overcome this problem of oscillations, the model was modified by extending an inhibitory fictive node with a constant expression value selected in the GUI (Fig. sup1 (D3)) named F-Box And WD Repeat Domain Containing 7 (FBXW7) (Fig. sup9(A); green).The idea of a fic2ve node is that we can test the effect of a poten2al gene with its inhibitory effect (in general with the required property of ac2va2ng or inhibi2ng) if it pushes the model in our favor and if yes, one can search for a corresponding gene with similar expression values.In other words, we can postulate the ac2on of another gene, focusing on a targeted search for such a gene and as we see later in our demonstra2on, we found a gene with the postulated influence on the network.This is a further example, how our proposed pipeline can accelerate research by giving guided hints for improvements from a data-driven evalua2on.Technically, we change the ini2al value in the updated_arSet.tsvfile, explicitly the arSetPars ('init_x21','ini2al value',1,0,0.0,1.0), which is the func2on to set the ini2al value for the fic2ve node for the parameter op2miza2on method, to 0.3 ('init_x21','ini2al value',0.3,0,0.0,1.0).Such varia2ons in the ini2al guess star2ng from 0.3 instead of 1.0 can cause the gradient method to converge to a local op2mum with bePer fiCng 2mer curves.We remark that finding well working ini2al guesses is some2mes trial and error work.However, with our framework, we can narrow down this trial-and-error work to par2cular single cases, e.g., a single fic2ve node that acts on a node that has a high devia2on from its data measured with regard to its contribu2on to the chi-square value.This value change provides an improvement of the model (Fig. sup9(B)) regarding fiCng to the data.While the best-fit plots are generally in good agreement with the data, the plot of Aurora Kinase A (AURKA) stands out with its devia2on from the data, as evidenced by a high chi-square value of 7.25452 (Fig. sup(B); red box), compared to the genes as one of the major contribu2ons to the total chi-square value of the model.This discrepancy might result from the weak ac2va2on of AURKA.)) features an extension with a fic+ve node highlighted in green.This fic+ve node exerts an inhibitory influence on FBXW7 (green) within the network.The node highlighted in yellow, represents the node KRAS under treatment condi+ons with a KRAS inhibitor.Major func+onal endpoints are represented by apoptosis (red) and prolifera+on (blue) nodes.Nodes for which no data is available are shown in gray.B) This figure illustrates the fi[ng results for 16 measured genes within the context of a protein-protein interac+on network.In the network, 18 nodes serve as regulatory nodes, while two nodes func+on as output nodes.Each of the 16 genes is depicted by four data points, corresponding to dis+nct +me points: 0 hours (purple), 4 hours (red), 24 hours (blue), and 72 hours (yellow).Addi+onally, accompanying each gene in the network, a chi-square value is provided to gauge the goodness of fit for the respec+ve gene's expression data, total Chi2 = 30.4133(64 data points, 91 free parameters).The red box symbolizes the node AURKA, which has the highest chi-square value (Chi2 = 7.25452).Ini+al value for x21= 0.3.Created with BioRender.com.
To improve the fiCng of AURKA, a fic2ve node is inserted to strengthen the ac2va2on of AURKA and all best-fit parameters from the previous run were taken to improve the fit to the data (Fig. sup10(A); orange; (B)).The ac2vity of AURKA exhibited a negligible change, resul2ng in a slight improvement in the chi-square value, which now stands at 18,947.This outcome occurred aAer adap2ng the op2mal fit parameters from the preceding run and ini2alizing the ini2al value of the fic2ve node x22 (fic2ve2) with 1 and of the fic2ve node x21 (fic2ve) with 0.16.These fic2ve nodes are modeled as nodes with a constant ac2vity level.In Fig. sup10, we see the node "fic2ve" in green and the node fic2ve2 in orange associated with AURKA and FBXW7.These changes are made in the file UpdateSetPars.txt.The "arBestFit_SetPars" does not have to be performed repeatedly (can be commented with a % in the advanced_script (Fig. 3) and only "arUpdateSetPars" with the changed values is used.The total chi-square value of 68.9107 is increased mainly by the Hepatocyte Growth Factor Receptor (MET) node, which has a chi-square value of 40.5377 (Fig. sup10 (B); red box).These fic+ve nodes play different roles within the network and postulate the ac+on of genes that we later iden+fy in our gene set.The green node exerts an inhibitory influence on FBXW7, while the orange node serves as an ac+vator for AURKA.In addi+on, a node highlighted in yellow indicates KRAS, which is under treatment condi+ons with a KRAS inhibitor based on the data.The primary func+onal endpoints within this network are represented by apoptosis nodes in red and prolifera+on nodes in blue, highligh+ng important cellular processes.Nodes for which no data is available are shown in gray.B) The fi[ng results for 16 measured genes within the context of a protein-protein interac+on network are visualized.In this network, there are 18 nodes serving as regulatory nodes, and two nodes specifically func+oning as output nodes.Each of the 16 genes is represented by four data points, corresponding to dis+nct +me points: 0 hours (purple), 4 hours (red), 24 hours (blue), and 72 hours (yellow).In addi+on to each gene's representa+on in the network, a chi-square value is provided to assess the goodness of fit for the respec+ve gene's expression data.The total chi-square value is calculated as 68.9107, considering 64 data points and 95 free parameters.Notably, the node MET is highlighted with a red box due to its associa+on with the highest chi-square value (Chi2=40.5377).For the op+miza+on run, the start value for both fic+ve nodes are set up as arSetPars('init_x22',1.0,1,0,0.0,1.0) and ('init_x21',0.16,1,0,0.0,1.0).Created with Bio-Render.com.
To improve the fit to the data, an addi2onal third node with a constant expression value, which inhibits MET, is added (Fig. sup11 (A)) and all previous best fit parameters are taken from the calcula2on before.AAer fiCng, the total chi-square value is 10.7794.This fiCng is performed by upda2ng the ini2al values for x22 (fic2ve2) and x23 (fic2ve3) to 0.5 for both in the "arUpdateSetPars" func2on.This result of a low total chi-square value in a sum over all 16 nodes compared to our first version of the topology indicates a good fit to the data.Also visually, the curves fit well to the data points within their standard errors.Our next step is to find representa2ve biologically corresponding nodes for the fic2ve nodes in the model to make our model closer to the modeled system.The search for genes to replace the fic2ve nodes leads to 3 genes.For two of these nodes, Microtubule Nuclea2on Factor (TPX2; orange) and Dual Specificity Tyrosine Phosphoryla2on Regulated Kinase 2 (DYRK2; green), measurements were performed and included as nodes with a constant expression level into the fiCng processes, please see Fig. sup1 (D3) how to set nodes to a constant expression level.The fic2ve node on MET is replaced by TP53 as an inhibitory influence on MET, reported in (Zhou et al. 2023).However, we do not have measurement data to examine if the fiPed constant expression value matches the measurements within their standard errors, as we see it for the other two genes.The op2miza2on method then only op-2mizes the ini2al value, which the corresponding node constantly takes for the whole 2me, such that the chi-square value is minimized.The node for inhibi2on of MET, the Tumor Protein P53 (TP53) gene, serves here as a poten2al node that can be verified in further experiments with corresponding data.We see that how our proposed framework can guide data acquisi2on purposefully to genes that seem meaningful to improve and extend a model targeted.In a future study, we could conduct a more in-depth inves2ga2on to iden2fy genes that poten2ally exhibit suitable interac2ons.In any case, the fic2ve node provides the informa2on that an inhibi2ng effect from some gene is probably missing but could be postulated based on our currently available data.The fiCng process with a total chi-square value of 19.951 is slightly increased by the new data input of TPX2 and DYRK2.The main reason for this increasement is that DYRK2 shows a higher ac2va2on in the data than needed in the model.The result is shown in Fig. 4.

Tutorial for finding most efficient interven/on points
We showcase the applica2on of the external s2muli part of JimenaE for iden2fying drug targets based on real data from a H358 cell line.The best fiCng model is used, which is given in our git repository in the "Example" folder in the PEtab, imported as yED graph (yEd 2019) (the topology file is available as H358_topology in our git repository in the "Example" folder), and analyzed with "D2D ExternalS2muli" (Fig. sup4).The following configura2ons have been made: The nodes of interest, namely apoptosis and prolifera2on, were established as target nodes with desired values.Both were assigned a weight of 1, with a target value of 1 for apoptosis and 0 for prolifera2on.In this context, the primary focus is on apoptosis to combat cancer, but also prolifera2on should s2ll be downregulated.Possible drug-target nodes are selected taking into considera2on whether they should receive a posi2ve (ac2va2ng) or nega2ve (inhibitory) regula2on (Fig. sup4).Further, a parameter file from the best fit (in the folder PEtab à parameter) and mapping file (output JimenaE à generated via 1.Step (Fig. 1)) are selected, both given in our git repository.AAer loading the files, the previous external s2muli are automa2cally set up in the "Set controls" (Fig. sup4).AAerward, a MATLAB script is generated using the "generate" func2on (Fig. sup4) and given a specific name.The file is saved in the directory where it will be executed alongside the external s2muli helper func2ons (folder "exter-nal_s2muli" in the git repository).For star2ng the analysis, the generated script is opened in MATLAB.The execu2on of this script in MATLAB requires the "Symbolic Math Toolbox".The toolbox "Parallel Compu2ng Toolbox" package is op2onal.If the "Parallel Compu2ng Toolbox" is not available, just replace "parfor" by "for" in the corresponding scripts switching off the parallel computa2on of the corresponding for-loops.To calculate external s2muli, at least one of the flags "combi_method" or "local_op2miza-2on_method" must be set to 1.For expedited calcula2ons, the tolerance parameter in line 17 of the JimenaE output file name_main.mhas been adjusted from tol2=10^-14 to tol2=10^-7.For our use case, this accuracy turned out sufficient meaning that a higher accuracy did not influence if an external s2mulus is constant zero or non-zero.Before the script is executed, the alpha parameter must be configured.The parameter alpha is used to weigh the ini2a2on of an external s2mulus to a complete inac2vity of the corresponding external s2mulus.The external s2mulus is some2mes called control as a synonym, which is used in, e.g., Fig. sup4 or Fig. sup5.Essen2ally, a higher alpha value means that the external s2mulus must be more ef-fec2ve in terms of steering the nodes of interest to the desired state not to be set to constant zero.The higher alpha, the more effec2ve the contribu2ons of the external s2mulus have to be such that it is non-zero.Consequently, increasing alpha for each run solving the op2mal control problem leaves only the most effec2ve external s2muli non-zero achieving a high apoptosis and a low prolifera2on.We choose alpha equal to 0.1 as a star2ng point.As a result, two plots are shown, Fig. sup12 and Fig. sup13.The first diagram depicts the observed external s2muli that are most effec2ve in achieving the desired state, which involves downregula2ng pro-lifera2on and upregula2ng apoptosis.For the calcula2on, 14 external s2muli were used affecting genes such that they are down-regulated (KRAS, MYC, NFE2L2, EP300, MET, TGFBR1, EGFR, STK11, AURKA, CD44, PIK3CA, FOS) and up-regulated (CDH1, GSK3B).The inhibiting external stimulus on KRAS results from the data since the data was generated by administrating the KRAS inhibitor.Consequently, the corresponding external stimulus needed to be set active in the fitting process of the model and is automatically set in JimenaE (see Fig. sup1(D1) and its explanations for details).The exact 2me curves of the external s2muli influencing the corresponding nodes are not essen2al for our study but the informa2on which external s2muli are set to constant zero since they are not effec2ve enough compared to the non-zero ones.However, for a model fiPed to experimental data, our presented framework can be used with the exact 2me curves of the external s2muli, which could mean a drug applica2on in terms of amount or dosage over 2me.The second plot is a 2me-dependent graph showing the output of the "nodes of interest" (Fig. sup12(B)).The nodes of interest are states supposed to have a certain value that deviates from the scenario where no external s2mulus is ac2ve.By the ac-2on of the external s2muli, we obtain a low prolifera2on and a high apoptosis as desired.By increasing the alpha parameter, it becomes possible to iden2fy more effec2ve drug targets regarding influencing our nodes of interest and to reduce the number of non-zero external s2muli.It is essen2al to take the changes of the nodes of interest into account caused by the drug target selec2on to ensure that the chances of the external s2muli regarding the nodes of interest are not minor due to a too high alpha.A high alpha hinders an extensive use of external s2muli because their ac2on cannot provide a corresponding benefit regarding pushing the nodes of interest to the desired values.If alpha is too high, all external s2muli are eventually constantly zero.It is always necessary to consider both plots, e.g., Fig. 5 and Fig. sup13.The values of the target nodes also allow a quan2fica2on of the effect of the ac2ve external s2muli.Consequently, we can see, for example, if there are non-linear effects in terms of the number of external s2muli meaning that a second ac2ve external s2mulus can bring much more effect regarding the desired nodes being close to their desired values compared to just a single drug.Increasing alpha results into the best combina2on of drug target nodes according to our data and model presented (Fig. 6).To investigate the influence of external stimuli, we established an alpha value of 0.5 and conducted experiments on 14 external stimuli.These external stimuli encompassed genes that were either down-regulated (e.g., KRAS, MYC, NFE2L2, EP300, MET, TGFBR1, EGFR, STK11, AURKA, CD44, PIK3CA, FOS) or up-regulated (such as CDH1 and GSK3B).A) The findings are visually depicted in a plot, highlighting specific drug-regulated nodes namely CDH1, STK11, and AURKA.B) In terms of the study's outcomes related to therapeutic efficacy, there is an observed decrease in cell proliferation and a heightened apoptotic effect.

Figure sup1 :
Figure sup1: GUI for crea+ng the D2D relevant files, see text for an explana+on of the purpose of those files (GUI1)

Figure sup2 :
Figure sup2: The model.def file defines important model specifica+ons for the D2D framework.A detailed de-scrip+on is given in the text.

Figure sup3 :
Figure sup3: The data.def file defines specifica+ons about the data to which the model parameters are supposed to be fiGed.A detailed descrip+on is given in the text.

Figure sup4 :
Figure sup4: JimenaE GUI to generate MATLAB scripts to apply the external s+muli framework (GUI2).A detailed descrip+on is in the text.

Figure sup5 :
Figure sup5: MATLAB script for the external s+muli framework generated by our JimenaE pipeline.Details can be found in the text.

Figure sup6 :
Figure sup6: GUI to generate the MATLAB script to calculate op9mal external s9muli.Details are given in the text.

Figure sup7 :
Figure sup7: MATLAB script Main.m generated with JimenaE for calcula+ng op+mal external s+muli for switching between two steady states.M1:The parameters as calculated by D2D, with any new regula2ons (SA2) added as corresponding deltas.M2: The state selected as the ini2al one (SA1) where the other parameters have the same meaning as in the other external s2muli MATLAB script.M3: The state selected as targeted state (SA1).M4: The equa2ons determining the network behavior for each node.

Figure sup. 8 :
Figure sup.8:Best parameter Fi7ng: Ini:al Approxima:on to the data.A) A topology of a protein-protein inter-ac+on of the H358 cell line (modified aSer (Peindl et al. 2022)).Based on data, a yellow node symbolizes the node under treatment condi+ons (KRAS-Inhibitor).The primary func+onal endpoints within this network are represented by apoptosis nodes in red and prolifera+on nodes in blue, highligh+ng important cellular processes.Apoptosis (red node) and prolifera+on (blue node) serve as output nodes.Grey nodes are without any data.B) Fi[ng results of 16 measured genes (18 nodes serve as regula+on nodes, two as output).For each gene, four data points are visualized in the graph depending on +me points (0h(purple), 4h (red), 24h (blue), and 72h (yellow)).In the protein-protein interac+on network, each gene is accompanied by a chi-square value, total Chi2 = 97.8907with 64 data points and 91 free parameters.Created with BioRender.com.

Figure sup9 :
Figure sup9: Best Parameter Fi7ng: Extension with a fic:ve node.A) The protein-protein interac+on network for the H358 cell line (modified from (Peindl et al. 2022)) features an extension with a fic+ve node highlighted in green.This fic+ve node exerts an inhibitory influence on FBXW7 (green) within the network.The node highlighted in yellow, represents the node KRAS under treatment condi+ons with a KRAS inhibitor.Major func+onal endpoints are represented by apoptosis (red) and prolifera+on (blue) nodes.Nodes for which no data is available are shown in gray.B) This figure illustrates the fi[ng results for 16 measured genes within the context of a protein-protein interac+on network.In the network, 18 nodes serve as regulatory nodes, while two nodes func+on as output nodes.Each of the 16 genes is depicted by four data points, corresponding to dis+nct +me points: 0 hours (purple), 4 hours (red), 24 hours (blue), and 72 hours (yellow).Addi+onally, accompanying each gene in the network, a chi-square value is provided to gauge the goodness of fit for the respec+ve gene's expression data, total Chi2 = 30.4133(64 data points, 91 free parameters).The red box symbolizes the node AURKA, which has the highest chi-square value (Chi2 = 7.25452).Ini+al value for x21= 0.3.Created with BioRender.com.

Figure sup10 :
Figure sup10: Best Parameter Fi7ng: 2 fic:ve nodes.A) The protein-protein interac+on network for the H358 cell line (aSer (Peindl et al. 2022)) was extended by a second fic+ve node, both marked in green and orange.These fic+ve nodes play different roles within the network and postulate the ac+on of genes that we later iden+fy in our gene set.The green node exerts an inhibitory influence on FBXW7, while the orange node serves as an ac+vator for AURKA.In addi+on, a node highlighted in yellow indicates KRAS, which is under treatment condi+ons with a KRAS inhibitor based on the data.The primary func+onal endpoints within this network are represented by apoptosis nodes in red and prolifera+on nodes in blue, highligh+ng important cellular processes.Nodes for which no data is available are shown in gray.B) The fi[ng results for 16 measured genes within the context of a protein-protein interac+on network are visualized.In this network, there are 18 nodes serving as regulatory nodes, and two nodes specifically func+oning as output nodes.Each of the 16 genes is represented by four data points, corresponding to dis+nct +me points: 0 hours (purple), 4 hours (red), 24 hours (blue), and 72 hours (yellow).In addi+on to each gene's representa+on in the network, a chi-square value is provided to assess the goodness of fit for the respec+ve gene's expression data.The total chi-square value is calculated as 68.9107, considering 64 data points and 95 free parameters.Notably, the node MET is highlighted with a red box due to its associa+on with the highest chi-square value (Chi2=40.5377).For the op+miza+on run, the start value for both fic+ve nodes are set up as arSetPars('init_x22',1.0,1,0,0.0,1.0) and ('init_x21',0.16,1,0,0.0,1.0).Created with Bio-Render.com.

Figure sup11 :
Figure sup11: Best Parameter Fi7ng: 3 fic:ve nodes.A) An extension of the protein-protein interac+on network (modified aSer (Peindl et al. 2022)) for cell line H358 by a third fic+ve node (turquoise).The turquoise node exerts a suppressive effect on MET, which is also shown in turquoise and represents a tyrosine kinase receptor.Nodes for which no data is available are shown in gray, whereas the yellow node corresponds to the KRAS node under certain treatment condi+ons with a KRAS inhibitor.The major output nodes are color-coded, with red representing apoptosis and blue represen+ng prolifera+on.B) Fi[ng results for 16 measured genes within a protein-protein interac+on network are displayed.This network consists of 18 regulatory nodes and two output nodes.Each gene is represented by four data points at varying +me intervals: 0 hours (purple), 4 hours (red), 24 hours (blue), and 72 hours (yellow).Chi-square values are provided for each gene to assess the goodness of fit for their expression data.The total chi-square value is 10.7794, considering 64 data points and 95 free parameters.The ini+al values in the op+miza+on rou+ne for x21, x22, and x23 are 0.16, 0.5, and 0.5, respec+vely and are adapted during the op+miza+on run such that the ac+vity levels of the nodes fit the data best.Created with BioRender.com.

Figure sup12 :
Figure sup12: External s:muli first run.To identify the impact of external stimuli, we set the alpha value to 0.1 and tested 14 external stimuli that down-regulated genes (KRAS, MYC, NFE2L2, EP300, MET, TGFBR1, EGFR, STK11, AURKA, CD44, PIK3CA, FOS) and up-regulated genes (CDH1, GSK3B).A) The plots show the activity level of the corresponding drugs influencing CDH1, NFE2L2, STK11, AURKA, and PIK3CA.B) The outcomes of this study regarding therapeutic efficacy show a downregulation in proliferation and an increase in apoptotic effects.

Figure sup13 :
Figure sup13: External s:muli second run.To investigate the influence of external stimuli, we established an alpha value of 0.5 and conducted experiments on 14 external stimuli.These external stimuli encompassed genes that were either down-regulated (e.g., KRAS, MYC, NFE2L2, EP300, MET, TGFBR1, EGFR, STK11, AURKA, CD44, PIK3CA, FOS) or up-regulated (such as CDH1 and GSK3B).A) The findings are visually depicted in a plot, highlighting specific drug-regulated nodes namely CDH1, STK11, and AURKA.B) In terms of the study's outcomes related to therapeutic efficacy, there is an observed decrease in cell proliferation and a heightened apoptotic effect.