An Integrative Genome-Scale Metabolic Modeling and Machine Learning Framework for Predicting and Optimizing Single-Cell Protein Production in Saccharomyces cerevisiae
arXiv:2603.25561v2 Announce Type: replace
Abstract: Saccharomyces cerevisiae is increasingly recognised as a key source for single-cell protein (SCP) production, an emerging solution to global protein-supply challenges. This study presents a computational framework that combines the Yeast9 genome-scale metabolic model (GEM) with machine learning and optimisation to predict and enhance biomass flux for SCP yield. The Yeast9 GEM, comprising 4,131 reactions, 2,806 metabolites, and 1,161 genes, was simulated using flux balance analysis (FBA) across 2,000 Latin Hypercube-sampled flux profiles. Random Forest and XGBoost regressors achieved R^2 values of 0.9999760 and 0.9997702, respectively. A variational autoencoder (VAE) identified four metabolic clusters with mean biomass fluxes of 0.472, 0.493, 0.527, and 0.505 gDW/hr. SHAP-based feature attribution identified twenty key reactions in glycolysis, the TCA cycle, and amino-acid biosynthesis; 18 of the 20 (90%) were confirmed essential by in silico knockout. Bayesian optimisation produced a 12.13-fold improvement in biomass flux (from 0.0858 to 1.041 gDW/hr) at exchange fluxes of glucose = -20.0, oxygen = -20.0, and ammonium = -8.9 mmol/gDW/hr. A generative adversarial network (GAN) generated novel flux configurations (variance = 0.124); however, none of the 100 generated profiles passed stoichiometric feasibility verification, which is attributed to incomplete generator convergence and reported as a limitation. Pareto front analysis identified an optimal SCP operating point at 0.0858 gDW/hr biomass flux with an amino-acid biosynthesis score of 1000.029 mmol/gDW/hr.
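To make the sampling step concrete, the following is a minimal sketch of Latin Hypercube sampling over three exchange-flux bounds, using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the function name `latin_hypercube`, the `BOUNDS` ranges, and the seed are hypothetical, and the abstract does not say which ranges or software were used. The last line simply rechecks the reported fold improvement from the two biomass fluxes given in the abstract.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension is stratified into
    n_samples equal-width bins with exactly one point per bin."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = high - low
        # One uniform draw inside each stratum of this dimension.
        column = [low + width * (i + rng.random()) / n_samples
                  for i in range(n_samples)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose columns into a list of per-sample tuples.
    return list(zip(*columns))


# Hypothetical bound ranges in mmol/gDW/hr (negative = uptake).
# These ranges are assumptions for illustration, not values from the paper.
BOUNDS = [
    (-20.0, -1.0),   # glucose exchange
    (-20.0, -1.0),   # oxygen exchange
    (-10.0, -0.5),   # ammonium exchange
]

# 2,000 profiles, matching the sample count stated in the abstract.
samples = latin_hypercube(2000, BOUNDS, seed=42)

# Recheck the reported fold improvement from the two reported fluxes.
fold_improvement = 1.041 / 0.0858
```

In a full pipeline, each sampled tuple would presumably be applied as exchange-reaction bounds on the GEM before solving FBA. That step needs the model file and a linear-programming solver, so it is not shown here.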