Biotechnology — Predictive computational models in personalized medicine research — Part 1: Constructing, verifying and validating models

This document specifies requirements and recommendations for the design, development and implementation of predictive computational models for research purposes in the field of personalized medicine and health product development. This document addresses the set-up, formatting, validation, simulation, storing and sharing of computational models used for personalized medicine. Requirements and recommendations for data used to construct or required for validating such models are also specified. This includes rules for formatting, descriptions, annotations, interoperability, integration, access and provenance of such data. This document does not apply to computational models used for standard routine clinical, diagnostic or therapeutic purposes.

Biotechnologie — Modèles informatiques prédictifs dans la recherche sur la médecine personnalisée — Partie 1: Construction, vérification et validation des modèles

General Information

Status
Published
Publication Date
17-Jun-2026
Technical Committee
ISO/TC 276 - Biotechnology
Drafting Committee
ISO/TC 276 - Biotechnology
Current Stage
6060 - International Standard published
Start Date
18-Jun-2026
Due Date
20-Jun-2026
Completion Date
18-Jun-2026

Buy Documents

Standard

ISO 9491-1:2026 - Biotechnology — Predictive computational models in personalized medicine research — Part 1: Constructing, verifying and validating models

Release Date: 18-Jun-2026
English language (33 pages)

Relations

Effective Date
24-Jun-2023

Overview

ISO 9491-1:2026 is an international standard developed by ISO’s Technical Committee on Biotechnology, focusing on the design, development, and implementation of predictive computational models specifically for research in personalized medicine. This standard provides requirements and recommendations to ensure computational models are effectively constructed, verified, and validated for use in health-related research and health product development.

ISO 9491-1:2026 addresses the entire model lifecycle, from set-up and formatting to simulation, storing, and sharing, so that research processes are transparent, data-driven, and aligned with best practices for interoperability and data provenance. Models used in routine clinical, diagnostic, or therapeutic settings fall outside its scope; the standard targets research and development environments where innovation and reproducibility are crucial.

Key Topics

  • Predictive Computational Models: Guidance on building and curating in silico models for personalized medicine, including mathematical, data-driven, and AI-based approaches.
  • Data Requirements: Specification of data formatting, description, annotation, integration, and access protocols. Emphasis is placed on the importance of high-quality, interoperable data for robust model development.
  • Model Validation and Verification: Recommendations for calibrating, validating, and verifying models using independent and quality-controlled datasets to ensure accuracy and reproducibility.
  • Simulation and Sharing: Requirements for documenting simulation set-ups, capturing results, and sharing model outputs within the research community.
  • Data Provenance and Metadata: Ensuring traceability and reliability through robust documentation of data sources, processing, and ownership.
  • Ethical Considerations: Outlining ethical requirements, such as privacy and data protection, in computational modelling for personalized medicine.

Applications

ISO 9491-1:2026 is especially valuable for organizations and researchers involved in:

  • Personalized Medicine Research: Facilitating translational research where computational models are used to predict disease risk, therapeutic responses, and disease progression.
  • Biotechnology and Pharmaceutical Development: Supporting the simulation and validation of new health products or interventions through standardized model building and data integration.
  • Clinical Trials: Enhancing the design and execution of in silico clinical trials by providing a framework for virtual patient simulations, pharmacokinetic/dynamic modelling, and quantitative systems pharmacology.
  • Collaborative Projects and Research Consortia: Improving data exchange, model reusability, and comparability between institutions and research teams through harmonized standards.
  • AI and Machine Learning in Healthcare: Establishing guidelines for the use of data-driven models, including artificial intelligence and machine learning systems, with a focus on model quality, transparency, and result reproducibility.

Related Standards

ISO 9491-1:2026 aligns with and references several other international standards crucial for data management and modelling in biotechnology, including:

  • ISO 20691: Biotechnology - Requirements for data formatting and description in the life sciences
  • ISO 20387:2026: Biotechnology - Biobanking - General requirements for biobanks
  • ISO 23494-1: Provenance information model for biological material and data - Part 1: Design concepts and general requirements
  • ISO 23494-2: Common Provenance Model

These standards collectively support the FAIR (Findable, Accessible, Interoperable, Reusable) and ALCOA (Attributable, Legible, Contemporaneous, Original, Accurate) principles integral to data and model management in biomedical research.

Practical Value

Implementing ISO 9491-1:2026 enables:

  • Enhanced Reproducibility: By standardizing processes, models, and data, research outcomes become easier to reproduce and validate across the global scientific community.
  • Greater Data Interoperability: Reducing heterogeneity and fostering integration of disparate health data sources.
  • Improved Collaboration: Supporting multi-site studies and data sharing while safeguarding data integrity and provenance.
  • Robust Model Development for R&D: Accelerating the advancement of innovative therapies and personalized medicine approaches through validated and reliable predictive models.

By integrating the requirements and recommendations of ISO 9491-1:2026, organizations and researchers position themselves at the forefront of standardized, impactful biomedical innovation.


Frequently Asked Questions

ISO 9491-1:2026 is a standard published by the International Organization for Standardization (ISO). Its full title is "Biotechnology — Predictive computational models in personalized medicine research — Part 1: Constructing, verifying and validating models". This standard covers: This document specifies requirements and recommendations for the design, development and implementation of predictive computational models for research purposes in the field of personalized medicine and health product development. This document addresses the set-up, formatting, validation, simulation, storing and sharing of computational models used for personalized medicine. Requirements and recommendations for data used to construct or required for validating such models are also specified. This includes rules for formatting, descriptions, annotations, interoperability, integration, access and provenance of such data. This document does not apply to computational models used for standard routine clinical, diagnostic or therapeutic purposes.


ISO 9491-1:2026 is classified under the following ICS (International Classification for Standards) categories: 07.080 - Biology. Botany. Zoology. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO 9491-1:2026 has the following relationships with other standards: it is linked to ISO/TS 9491-1:2023, the first edition, which it cancels and replaces. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.


Standards Content (Sample)


International
Standard
ISO 9491-1
Second edition
2026-06
Biotechnology — Predictive
computational models in
personalized medicine research —
Part 1:
Constructing, verifying and
validating models
Biotechnologie — Modèles informatiques prédictifs dans la
recherche sur la médecine personnalisée —
Partie 1: Construction, vérification et validation des modèles
Reference number
© ISO 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Principles
4.1 General
4.2 Computational models in personalized medicine
4.2.1 General
4.2.2 Cellular systems biology models
4.2.3 Risk prediction for common diseases
4.2.4 Disease course and therapy response prediction
4.2.5 Pharmacokinetic/pharmacodynamic modelling and in silico trial simulations
4.2.6 Artificial intelligence systems (AI systems)
4.3 Standardization needs for computational models
4.3.1 General
4.3.2 Challenges
4.3.3 Common standards relevant for personalized medicine
4.4 Data preparation for integration into computer models
4.4.1 General
4.4.2 Pre-examination data
4.4.3 Data formatting
4.4.4 Data description
4.4.5 Data annotation (semantics)
4.4.6 Data interoperability requirements across subdomains
4.4.7 Data integration
4.4.8 Data provenance information
4.4.9 Data access
4.5 Model formatting
4.6 Model validation
4.6.1 General
4.6.2 Specific recommendations for model validation
4.7 Model simulation
4.7.1 General
4.7.2 Requirements for capturing and sharing simulation set-ups
4.7.3 Requirements for capturing and sharing simulation results
4.8 Requirements for model storing and sharing
4.9 Application of models in clinical trials and research
4.9.1 General
4.9.2 Specific recommendations
4.10 Ethical requirements for modelling in personalized medicine
Annex A (informative) Common standards relevant for personalized medicine and in silico approaches
Annex B (informative) Information on modelling approaches relevant for personalized medicine
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 276, Biotechnology.
This second edition cancels and replaces the first edition (ISO/TS 9491-1:2023), which has been technically
revised.
The main changes are as follows:
— normative references in Clause 2 have been consolidated, updated and revised;
— update and clarification of terminology including the alignment with the terminology of ISO/TS 9491-2;
— updated to match the latest developments in the domain;
— bibliography has been revised and updated;
— editorial revision and clarification of wording.
A list of all parts in the ISO 9491 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
The capacity to generate data in life sciences and health research has greatly increased in the last decade.
In combination with patient/personal-derived data, such as electronic health records, patient registries and
databases, as well as lifestyle information, this big data holds an immense potential for clinical applications,
especially for computer-based models with predictive capacities in personalized medicine. However, and
despite the ever-progressing technological advances in producing data, the exploitation of big data to
generate new knowledge for medical benefits, while guaranteeing data privacy and security, is lagging
behind its full potential. One reason for this is the inherent heterogeneity of big data and the lack
of broadly accepted standards allowing interoperable integration of heterogeneous health data to perform
analysis and interpretation for predictive modelling approaches in health research, such as personalized
medicine.
Common standards lead to a mutual understanding and improve information exchange within and across
research communities and are indispensable for collaborative work. To set up computer models in
personalized medicine, data integration from heterogeneous and different sources at different times plays a
key role. Consistent documentation of data, models and simulation results based on basic guiding principles
for data management practices, such as FAIR [6] (findable, accessible, interoperable, reusable) or ALCOA
(attributable, legible, contemporaneous, original, accurate), and standards can ensure that the data and
the corresponding metadata (data describing the data and its context), as well as the models, methods and
visualizations, are of reliable high quality.
Hence, standards for biomedical and clinical data, simulation models and data exchange are a prerequisite
for reliable integration of health-related data [7]. Such standards, together with harmonized ways to describe
their metadata, ensure the interoperability of tools used for data integration and modelling, as well as the
reproducibility of the simulation results. In this sense, modelling standards are agreed ways of consistently
structuring, describing, and associating models and data, their respective parts and their graphical
visualization, as well as the information about applied methods and the outcome of model simulations. Such
standards also assist in describing how constituent parts interact, or are linked together, and how they are
embedded in their physiological context.
Major challenges in the field of personalized medicine are to:
a) harmonize the standardization efforts that refer to different data types, approaches and technologies;
b) make the standards interoperable, so that the data can be compared and integrated into models.
An overall goal is to FAIRify data and processes in order to improve data integration and reuse. An additional
challenge is to ensure a legal and ethical framework enabling interoperability.
This document presents computational modelling requirements and recommendations for research in
the field of personalized medicine, especially with focus on collaborative research, such that health-
related data can be optimally used for translational research and personalized medicine worldwide.
The recommendations are primarily oriented towards the application of computational modelling in
the biotechnology domain (e.g. biomolecular and cellular research, as well as in clinical trials and drug
development), but also can be applied in other fields of personalized medicine research.

International Standard ISO 9491-1:2026(en)
Biotechnology — Predictive computational models in
personalized medicine research —
Part 1:
Constructing, verifying and validating models
1 Scope
This document specifies requirements and recommendations for the design, development and
implementation of predictive computational models for research purposes in the field of personalized
medicine and health product development.
This document addresses the set-up, formatting, validation, simulation, storing and sharing of computational
models used for personalized medicine. Requirements and recommendations for data used to construct
or required for validating such models are also specified. This includes rules for formatting, descriptions,
annotations, interoperability, integration, access and provenance of such data.
This document does not apply to computational models used for standard routine clinical, diagnostic or
therapeutic purposes.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 20691 1), Biotechnology — Requirements for data formatting and description in the life sciences
ISO 20387:2026 2), Biotechnology — Biobanking — General requirements for biobanks
ISO 23494-1, Biotechnology — Provenance information model for biological material and data — Part 1: Design
concepts and general requirements
ISO 23494-2, Biotechnology — Provenance information model for biological material and data — Part 2:
Common Provenance Model
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
1) https://fairsharing.org/3533
2) Under preparation. Stage at the time of publication: ISO/FDIS 20387:2026.

3.1
artificial intelligence
AI
research and development of mechanisms and applications of AI systems (3.2)
Note 1 to entry: Research and development can take place across any number of fields such as computer science, data
science, humanities, mathematics and natural sciences.
[SOURCE: ISO/IEC 22989:2022, 3.1.3]
3.2
artificial intelligence system
AI system
engineered system that generates outputs such as content, forecasts, recommendations or decisions for a
given set of human-defined objectives
Note 1 to entry: The engineered system can use various techniques and approaches related to artificial intelligence to
develop a model to represent data, knowledge, processes, etc. which can be used to conduct tasks.
Note 2 to entry: AI systems are designed to operate with varying levels of automation.
[SOURCE: ISO/IEC 22989:2022, 3.1.4]
3.3
big data
extensive datasets — primarily in the data characteristics of volume, variety, velocity, and/or variability —
that require a scalable technology for efficient storage, manipulation, management, and analysis
Note 1 to entry: Big data is commonly used in many different ways, for example as the name of the scalable technology
used to handle big data extensive datasets.
EXAMPLE High volume, high diversity biological, clinical, environmental, and lifestyle information collected from
single individuals to large cohorts, in relation to their health and wellness status, at one or several time points (see
reference [8] for additional information).
[SOURCE: ISO/TR 24291:2021, 3.2, modified — EXAMPLE added.]
3.4
community consensus standard
standard that reflects the results of a consensus standardization effort from a specific domain-specific
expert group outside of recognized standard defining organizations and their technical committees
Note 1 to entry: Created by domain-specific professional societies, scientific standardization initiatives, individual
organizations or research communities (often in collaboration with industry partners).
Note 2 to entry: Often publicly available, open and not proprietary.
3.5
computational model
in silico model
description of a biological system in either a mathematical expression or graphical form, or both, that is
implemented and studied with a computer highlighting objects and their interactions
Note 1 to entry: An object distributed processing (ODP) concept.
[SOURCE: ISO/IEC 16500-8:1999, 3.6, modified — Admitted term added. “biological”, “mathematical
expression or”, “, or both, that is implemented and studied with a computer” added, “interfaces” changed
to “interactions” and “as such it is similar to the OMT and UML notion of a class diagram” deleted from the
definition. “An object distributed processing (ODP) concept” moved to Note 1 to entry.]

3.6
data-driven model
model developed through the use of data derived from tests or from the output of investigated process or
from real world data or routinely acquired primary care data
[SOURCE: ISO 15746-1:2015, 2.4, modified — “or from real world data or routinely acquired primary care
data” added]
3.7
harmonization of data concepts
data harmonization
process of reconciling differences in semantics, structure and syntax of similar data concepts
Note 1 to entry: Harmonization can include the establishment of a single pervasive definition for each data concept
(i.e. standardization), but can also encompass flexible approaches in which definitions can be understood to grow
closer without becoming identical.
[SOURCE: ISO/TR 25100:2012, 2.1.4, modified — “harmonisation” replaced by “harmonization”, “may” in
Note 1 to entry replaced by “can”.]
3.8
data integration
systematic combining of data from different independent and potentially heterogeneous sources, to create a
more compatible, unified view of these data for research purpose
[SOURCE: ISO 5127:2017, 3.1.11.24]
3.9
genome-wide association studies
GWAS
testing of genetic variants across the genomes of many individuals to identify genotype–phenotype
associations
3.10
in silico clinical trial
use of computer modelling and simulation(s) to mimic human experimentation in the development or
regulatory evaluation process of a medicinal product (e.g. medical device) or medical intervention, under
defined conditions using verified and validated models
Note 1 to entry: It is a subdomain of ‘in silico medicine’, the discipline that encompasses the use of individualised
computer simulations in all aspects of the prevention, diagnosis, prognostic assessment, and treatment of disease.
[SOURCE: Reference [9], modified — Note 1 to entry added.]
3.11
in silico approach
computer-executable analyses of mathematical model(s) (3.13) to study and simulate a biological system
3.12
machine learning
ML
computer technology with the ability to automatically learn and improve from experience without being
explicitly programmed
EXAMPLE Speech recognition, predictive text, spam detection, or optimizing model parameters through
computational techniques, such that the model's behaviour reflects the data or experience.
[SOURCE: ISO 20252:2019, 3.52, modified — Abbreviated term “ML” added and EXAMPLES changed to
“Speech recognition, predictive text, spam detection, or optimizing model parameters through computational
techniques, such that the model's behaviour reflects the data or experience.”.]

3.13
mathematical model
set of equations that describes the behaviour of a physical system
[SOURCE: ISO 16730-1:2015, 3.11]
3.14
mechanism-based
approach in computational modelling that aims for a structural representation
3.15
model validation
comparison between the output of the calibrated model and the measured data, independent of the data set
used for calibration
[SOURCE: ISO 14837-1:2005, 3.7]
3.16
model verification
confirmation that the mathematical elements of the model behave as intended
[SOURCE: ISO 14837-1:2005, 3.8]
3.17
molecular biomarker
biomarker
molecular marker
detectable and/or quantifiable molecule or group of molecules used to indicate a biological condition, state,
identity or characteristic of an organism (e.g. an individual)
EXAMPLE Nucleic acid sequences, proteins, small molecules such as metabolites, other molecules such as lipids
and polysaccharides.
[SOURCE: ISO 16577:2022, 3.4.28, modified — “or an” changed to “of an” and “(e.g. an individual)” added to
definition.]
3.18
personalized medicine
precision medicine
medical model using characterization of individuals’ phenotypes and genotypes for tailoring the right
therapeutic strategy for the right person at the right time, and/or to determine the predisposition to disease
and/or to deliver timely and targeted prevention
Note 1 to entry: Examples for individuals’ phenotypes and genotypes are molecular profiling, medical imaging and
lifestyle data.
Note 2 to entry: Medical decisions, prevention strategies and therapies in personalized medicine are based on this
individuality.
[SOURCE: EU 2015/C 421/03 [10], modified — Notes 1 and 2 to entry added and “(e.g. molecular profiling,
medical imaging, lifestyle data)” deleted from definition.]
3.19
phenotype
set of observable characteristics of an organism resulting from the interaction of its genotype with the
environment
[SOURCE: ISO 4454:2022, 3.14, modified — Note 1 to entry deleted.]

3.20
raw data
data in its originally acquired, direct form from its source before subsequent processing
[SOURCE: ISO 5127:2017, 3.1.10.04]
4 Principles
4.1 General
Research in the field of personalized medicine is highly dependent on the exchange of data from different
sources, as well as harmonized integrative analysis of large-scale personalized medicine data (big data in
health research). Computational modelling approaches play a key role for understanding, simulating and
predicting the molecular processes and pathways that characterize human biology. Modelling approaches in
biomedical research also lead to a more profound understanding of the mechanisms and factors that drive
diseases, and consequently allow for adapting personalized treatment strategies that are guided by central
clinical questions. Patients can greatly benefit from this development in research that equips personalized
medicine with predictive capabilities to simulate in silico clinically relevant questions, such as the effect of
therapies, the response to drug treatments or the progression of disease.
4.2 Computational models in personalized medicine
4.2.1 General
Computational models have the potential to translate in vitro, non-clinical and clinical results (and their
related uncertainty) into descriptive or predictive expressions. The added value of such models in medicine
and pharmacology has increasingly been recognized by the scientific community [11][12][13][14], as well as by
regulatory bodies such as the European Medicines Agency (e.g. EMA guideline on PBPK reporting [15]), or
the US Food and Drug Administration (FDA) [16][17]. Computational models are integrated in different fields
in medicine as well as in the development of drugs and other health products, expanding from disease
modelling, molecular and physiological biomarker research to assessment of drug and medical device efficacy
and safety. In silico approaches are also expanding in neighbouring fields, such as pharmacoeconomics [18],
analytical chemistry [19] and biology [20][21] that are out of scope of this document [22][23].
Model creation starts with a clinical question and the collection of data (see Figure 1). The data employed
need harmonized approaches for data integration to start the model construction. The initial model usually
undergoes several refinement and improvement iterations to enhance predictive capabilities. Common
standards (see 4.3.3) should be used for the model building and curation process. Accuracy measurements
and validation processes are key, and should be transparent, while model output and function should ideally
be interpretable or explainable.
A number of computational modelling approaches in pre-clinical and clinical research already address
these questions in detail (see 4.2.2 to 4.2.6) and, therefore, play a leading role for the future development of
personalized medicine.
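The cycle outlined in this subclause (collect data, construct the model, then check its accuracy transparently) can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure taken from ISO 9491-1: the straight-line model, the function names `calibrate_linear` and `rmse`, and all numbers are hypothetical. The only point it makes is that the validation error is computed on data that were not used for calibration.

```python
from math import sqrt
from statistics import mean


def calibrate_linear(xs, ys):
    """Least-squares fit of y = a*x + b (hypothetical stand-in for model calibration)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def rmse(model, xs, ys):
    """Root-mean-square error between model output and measured data."""
    a, b = model
    return sqrt(mean((a * x + b - y) ** 2 for x, y in zip(xs, ys)))


# Hypothetical measurements, split into a calibration set and an
# independent validation set that plays no part in fitting.
calibration_x = [1.0, 2.0, 3.0, 4.0]
calibration_y = [2.1, 3.9, 6.2, 7.8]
validation_x = [5.0, 6.0]
validation_y = [10.1, 11.8]

model = calibrate_linear(calibration_x, calibration_y)
calibration_error = rmse(model, calibration_x, calibration_y)
validation_error = rmse(model, validation_x, validation_y)

print(f"slope={model[0]:.3f} intercept={model[1]:.3f}")
print(f"calibration RMSE={calibration_error:.3f}")
print(f"validation RMSE={validation_error:.3f}")
```

Reporting both errors side by side is one simple way to keep the accuracy assessment transparent: a validation error much larger than the calibration error would suggest that another refinement iteration is needed.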
Figure 1 — Modelling approach for personalized medicine
4.2.2 Cellular systems biology models
4.2.2.1 General
For the simulation of complex dynamic biological processes and networks, models can be either data-driven
(“bottom-up”) or mechanism-based (“top-down”).
Mechanism-based concepts aim for a structural representation of the governing physiological processes based
on model equations with limited amount of data [24], which are required for the base model establishment or,
alternatively, on static interacting networks [25][26]. Data-driven approaches [11][27] require sufficiently rich
and quantitative (e.g. time-course) data to train and to validate the model. Due to the occasional black-box
nature of data-driven approaches, the model validation process relies on performance tests against known
results.
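The contrast between the two approaches can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the first-order elimination equation, the parameter values, the interpolation-based predictor and the function names are hypothetical and do not come from ISO 9491-1. The mechanism-based function encodes a governing equation and needs only two parameters, whereas the data-driven function knows nothing about the process and relies entirely on time-course data, so it is checked with a performance test against a known result.

```python
import math


def mechanistic_concentration(c0, k, t):
    """Mechanism-based: closed-form solution of dC/dt = -k*C (assumed process)."""
    return c0 * math.exp(-k * t)


def data_driven_predict(training_points, t):
    """Data-driven: linear interpolation between neighbouring time-course points.

    No equation for the underlying process is assumed; the prediction is only
    as good as the coverage of the training data.
    """
    points = sorted(training_points)
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            weight = (t - t0) / (t1 - t0)
            return c0 + weight * (c1 - c0)
    raise ValueError("t lies outside the range covered by the training data")


# Synthetic time-course data (hypothetical), generated from the mechanistic
# model so that the "known result" for the performance test is available.
training = [(t, mechanistic_concentration(10.0, 0.5, t)) for t in (0.0, 1.0, 2.0, 4.0)]

held_out_time = 3.0
known_result = mechanistic_concentration(10.0, 0.5, held_out_time)
prediction = data_driven_predict(training, held_out_time)
absolute_error = abs(prediction - known_result)

print(f"known={known_result:.3f} predicted={prediction:.3f} error={absolute_error:.3f}")
```

The interpolation error at the held-out time point shows why data-driven approaches need sufficiently rich data: with denser sampling between 2.0 and 4.0 the same predictor would track the known result more closely.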
4.2.2.2 Challenges
The challenges are as follows:
a) the creation of models that balance the level of abstraction with comprehensiveness to make modelling
efforts reproducible and reusable (abstraction versus size);
b) the development of prediction models that can be adapted easily to individual patient profiles;
c) efficient parameter estimation tools to cope with population and disease heterogeneity;

d) overfitting of the model to the experimental/patient data, and optimization methods for model
predictions under realistic parametric uncertainty;
e) flexibility in models to cope with missing data (e.g. diverse patient profiles);
f) scaling from cellular to organ and to organism levels (e.g. high clinical relevance, high hurdles for
regulatory acceptance).
4.2.3 Risk prediction for common diseases
4.2.3.1 General
Predictive models stratify patients into distinct subgroups at different levels of risk for clinical outcomes
(risk prediction for disease). By training the algorithm on clinical data, phenotypic or genotypic subgroups
can be identified which have identifiably different patterns of clinical markers. By then identifying which
patterns a patient fits best, the model can place a particular patient within the most similar trajectory,
thereby also stratifying the patient to a particular level of risk. Clinical markers used in such models can
be any health feature that can be tokenized to be analysable by the model. These health features range
from disease history, symptoms, treatment and other exposure data, family history, laboratory data, etc., to
genetic data.
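The placement of a patient within the most similar pattern can be illustrated with a minimal sketch. It assumes that subgroup patterns have already been learned and summarized as centroids of two standardized clinical markers; the centroid values, the subgroup names and the function `stratify` are hypothetical and serve only as an illustration, not as a recommended method.

```python
import math

# Hypothetical subgroup centroids: mean values of two standardized clinical
# markers per subgroup, each labelled with an illustrative risk level.
CENTROIDS = {
    "low_risk": (0.0, 0.0),
    "intermediate_risk": (1.0, 1.0),
    "high_risk": (2.5, 2.0),
}


def stratify(markers, centroids=CENTROIDS):
    """Return the subgroup whose centroid is closest to the patient's markers."""
    return min(centroids, key=lambda name: math.dist(markers, centroids[name]))


patient = (2.2, 1.8)  # illustrative standardized marker values
risk_group = stratify(patient)
print(risk_group)
```

Euclidean distance to a centroid is only one possible similarity measure. Models of the kind described above typically use richer pattern definitions, for example trajectories over time.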
4.2.3.2 Challenges
The challenges are as follows:
a) understanding the possible implication to patients at an individual level:
1) What can be inferred?
2) How to test the inference made?
b) limited replication of measurements and analyses (e.g. genetic associations) and poor applicability to
diverse populations (e.g. too poorly represented to be of interest for specific analyses), specifically those of
mixed or non-European ancestry;
c) varying transparency of methodological choices and reproducibility;
d) limited cellular/tissue context and harmonized functional data availability across populations/studies;
e) missing environmental information coupled to genetic data.
4.2.4 Disease course and therapy response prediction
4.2.4.1 General
Prediction of the disease behaviour (mild versus severe, stable versus progressive) early in the disease
course based on specific molecular biomarkers can allow an improved timing of therapy introduction, as
well as the choice of therapy scheme (targeted therapy) [28]. Ideally, these models can provide a prediction
of multi-factorial diseases at unprecedented resolution, in a way that clinicians can use the information in
their daily decision-making.
4.2.4.2 Challenges
The challenges are as follows:
a) harmonization and standardization of clinical information for measuring the disease of interest;
b) developing transparent and quality-controlled workflows for data generation and interpretation in
clinical settings;
c) harmonization and application of existing and upcoming pre-examination workflow standards
(including specimen collection, storage and nucleic acid isolation), as well as developing feasible ring
trial formats and external quality assurance (EQA) schemes for given molecular analysis types;
d) transparent reduction of contents and definition of appropriate marker sets and dynamic models to
foster clinical translation;
e) developing intuitive visualization results and insights into molecular analyses, as well as critical
appraisal of limitations of models by physicians.
4.2.5 Pharmacokinetic/pharmacodynamic modelling and in silico trial simulations
4.2.5.1 General
Pharmacokinetic/pharmacodynamic (PK/PD) models [29][30] can translate in vitro, non-clinical and clinical
PK/PD data into meaningful information to support decision-making. At the individual level, substance
PKs can either be described by non-compartmental analysis and compartmental PK modelling or by
physiologically-based PK (PBPK) modelling. PBPK models are commonly used for interspecies extrapolations
and drug-drug interactions modelling. At the population level, population PK models have become the
most commonly used top-down models that derive a pharmaco-statistical model from observed systemic
concentrations. PK/PD modelling involves on the one hand a quantification of drug absorption, disposition,
metabolism and excretion (PK) and on the other hand a description of the drug-induced effect (PD). PK/PD
models and quantitative systems pharmacology (QSP) both aim for mechanistic and quantitative analyses of
the interactions between a substance such as a drug and a specific biological system [31].
PK and PBPK modelling are currently used for simulations for virtual patient populations in in silico clinical
trials. The concept is that computer simulations are proposed as an alternative source of evidence to support
drug development to reduce, refine, complement or replace the established data sources including in vitro
experiments, in vivo animal studies and clinical trials in healthy volunteers and patients.
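As an illustration of compartmental PK modelling and of a virtual patient population, the following minimal sketch uses a one-compartment model with an intravenous bolus dose, C(t) = (dose / V) · exp(−(CL / V) · t). The typical parameter values, the log-normal between-subject variability and the function names are assumptions chosen for illustration; they are not taken from this document.

```python
import math
import random


def concentration(dose_mg, clearance_l_per_h, volume_l, t_h):
    """Plasma concentration (mg/L) for a one-compartment IV bolus model."""
    k_el = clearance_l_per_h / volume_l  # elimination rate constant (1/h)
    return dose_mg / volume_l * math.exp(-k_el * t_h)


def simulate_population(n, dose_mg, t_h, seed=0):
    """Simulate concentrations at time t_h for n virtual patients."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    results = []
    for _ in range(n):
        # Log-normal between-subject variability around illustrative
        # typical values (clearance 5 L/h, volume 50 L).
        clearance = 5.0 * math.exp(rng.gauss(0.0, 0.3))
        volume = 50.0 * math.exp(rng.gauss(0.0, 0.2))
        results.append(concentration(dose_mg, clearance, volume, t_h))
    return results


typical = concentration(dose_mg=100.0, clearance_l_per_h=5.0, volume_l=50.0, t_h=0.0)
population = simulate_population(n=1000, dose_mg=100.0, t_h=4.0)
print(typical, min(population), max(population))
```

Fixing the random seed makes the virtual population reproducible, which supports the reporting and evaluation of simulation results.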
4.2.5.2 Challenges
The challenges are as follows:
a) reliable data sources for systems-related parameters are currently limited;
b) methods for data generation, collection and integration are not standardized;
c) the reporting of results is very heterogeneous and inconsistent [32];
d) tools to be used and criteria for model evaluation are very variable across projects;
e) a very limited number of platforms (systems models) are currently considered reliable and qualified for
regulatory submission.
4.2.6 Artificial intelligence systems (AI systems)
4.2.6.1 General
Data-driven approaches utilizing AI systems and machine learning (ML) treat the mechanism as unknown
and aim to model a function that operates on data input to predict the outcome, regardless of the unknown
physiological processes. The mechanisms operating in the complex systems being modelled, i.e. which
factors together drive outcomes, are considered too complex to be determined (e.g. black-box models). The
quality of AI systems is assessed through the accuracy of their predictions, tested in a variety of ways. These
data-driven models can be applied in a hypothesis-naive way, i.e. without assumptions being made as to which
factors drive the causal mechanism.
ML approaches learn the theory automatically from the data through a process of inference, model fitting or
learning from examples [33]. ML can be supervised, unsupervised or partially supervised (see Annex B).

4.2.6.2 Challenges
The challenges are as follows:
a) imprecise reporting, which makes it difficult to obtain the full benefit of results, navigate biomedical
literature and generate clinically actionable findings;
b) data standardization, since most in silico methods require comparable input data;
c) data based on group associations, or pre-determined understanding of clinical relationships, can bias
and limit AI/ML predictions (inappropriately pre-processed data);
d) different proprietary systems in healthcare information technology (IT) make data extraction, labelling,
interpretation and standardization highly complex procedures (data lockdown).
4.3 Standardization needs for computational models
4.3.1 General
Major challenges in the field of personalized medicine are to harmonize the standardization efforts that
refer to different data types, approaches and technologies, as well as to make the standards interoperable,
so that the data can be compared and integrated into models. Reproducible modelling in personalized
medicine requires a basic understanding of the modelled system, as well as of its biological and physiological
background, and finally of the applied virtual experiments.
Because of the heterogeneous nature of the data in personalized medicine, harmonized strategies for data
integration are required that utilize broadly applicable standards to allow for reproducible data exploitation
to generate new knowledge for medical benefits. Whereas the model simulation process itself can vary
greatly or be even partially unknown, e.g. in AI systems, making it hard to standardize, the integration of
data into the model (input), as well as the outcome of the model (output) can be standardized and validated.
Extensive model validation, e.g. with a set of standardized and high-quality validation data as input, can
be used to validate the whole modelling process, even if the model simulation itself is not standardized.
The two key components for which broad standardization efforts make most sense in the model building
process are thus data integration and model validation (see Figure 2).
Figure 2 — Data integration and model validation as key factors for standardization requirements
for computational models
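Validation of a model whose simulation step is not standardized can be sketched as a comparison of model output against reference outcomes for standardized input. In the following minimal example, the function `validate`, the stand-in model, the reference data and the tolerance are all hypothetical. The sketch only illustrates that the check needs access to input and output, not to the internals of the model.

```python
def validate(model, validation_set, tolerance):
    """Compare black-box model outputs with reference outputs.

    validation_set is a list of (input, expected_output) pairs and model is
    any callable. Returns the fraction of cases within the tolerance.
    """
    if not validation_set:
        raise ValueError("validation set must not be empty")
    passed = sum(
        1 for x, expected in validation_set if abs(model(x) - expected) <= tolerance
    )
    return passed / len(validation_set)


# Hypothetical stand-in for an opaque model; its internals are never inspected.
def black_box_model(x):
    return 2.0 * x + 0.1


# Illustrative reference data: inputs paired with known outcomes.
reference = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]
pass_rate = validate(black_box_model, reference, tolerance=0.2)
print(pass_rate)
```

Whether a given pass rate is acceptable depends on the intended purpose of the model and would need to be defined before the validation is run.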
4.3.2 Challenges
Although for many different data types used in personalized medicine there are domain-specific annotation
standards and terminologies available (see Tables A.1 to A.4), the process of model building poses a
variety of challenges, as follows:
a) high degree of variability regarding data types (structured versus unstructured, molecular, clinical,
laboratory, patient-reported, etc.);
b) differences in coding and calculation within data types (between-machine variability, different
measurements, etc.);
c) heterogeneous utilization of existing data and lack of domain- and data-specific standard methods for
data pre-processing;
d) high effort of harmonization of data concepts in terms of time, resources and cost;
e) models relevant for clinical use need to be fit for purpose;
f) differences in IT systems used in data generation, e.g. enterprise resource planning systems and
laboratory result software or hardware, at national, regional or clinical centre level;
g) lack of standard workflows (compliant with national and regional regulations and laws) for personal
health data access and processing;
h) lack of training, awareness and empowerment regarding existing standards and workflows;
i) adoption of different domain-specific terminology standards for health data such as SNOMED CT, NPU
(Nomenclature for Properties and Units) or LOINC (Logical Observation Identifier Names and Codes);
j) differences in implementation of international terminologies such as the International Classification of
Diseases (ICD);
k) long-term variety and dynamics of data and standards;
l) language differences in unstructured text, and other factors.
4.3.3 Common standards relevant for personalized medicine
The use of common standards developed by specific user communities and different stakeholders, as well
as by standard-defining organizations, has increased as these standards have been coupled to tools that have
spread in the respective fields of research. Tables A.1 to A.4 provide an overview of some of these standards
currently in use by different communities.
4.4 Data preparation for integration into computer models
4.4.1 General
Computational models in the life sciences in general as well as in healthcare and personalized medicine
research in particular are increasingly incorporating rich and varied data sets to capture multiple aspects
of the modelled phenomenon. Data types are encoded in technology- and subdomain-specific formats, and the
variety and incompatibility, as well as the lack of interoperability, of such data formats have been noted as
major hurdles for data preparation.
To allow for seamless integration of data used for the construction of predictive computational models in
personalized medicine, these data shall:
a) include or be annotated with sampling and specimen data that follow the requirements and
recommendations in accordance with the relevant domain-specific standards;
b) be formatted using generally accepted and interoperable standard data formats commonly used for the
corresponding data types in accordance with ISO 20691;
c) include or be annotated with descriptive metadata that consider generally accepted domain-specific
minimum information guidelines and describe the metadata attributes and entities using semantic
standards, standard terminologies, controlled vocabularies and ontologies as specified in ISO 20691;
d) follow best practice requirements and recommendations of generally accepted domain-specific data
interoperability frameworks;
e) be structured in a way that allows integration of the data into a model, together with other data;
f) include or be annotated with data provenance information that allows for tracking of the data and
source material throughout the whole data processing and modelling;

g) be made accessible via harmonized data access agreements (hDAAs) for controlled access data, if open
access to the data is not possible.
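A simple completeness check against items a) to g) can be sketched as follows. The annotation keys, the example record and the function `missing_annotations` are hypothetical. An actual implementation would take its attributes and permitted values from the applicable domain-specific standards and from ISO 20691, and would check the content of each annotation rather than only its presence.

```python
# Hypothetical annotation keys, one per list item a) to g).
REQUIRED_ANNOTATIONS = {
    "specimen_data": "a) sampling and specimen data",
    "data_format": "b) standard data format",
    "descriptive_metadata": "c) descriptive metadata",
    "interoperability_framework": "d) data interoperability framework",
    "structure_description": "e) structure allowing integration",
    "provenance": "f) data provenance information",
    "access": "g) access conditions (open access or hDAA)",
}


def missing_annotations(record):
    """Return the list items whose annotation is absent or empty."""
    return [
        label for key, label in REQUIRED_ANNOTATIONS.items() if not record.get(key)
    ]


# Illustrative data set record; all values are placeholders.
record = {
    "specimen_data": {"specimen_type": "plasma"},
    "data_format": "text/csv",
    "descriptive_metadata": {"assay": "metabolite panel"},
    "interoperability_framework": "FAIR",
    "structure_description": "one row per patient and time point",
    "provenance": None,  # not yet documented
    "access": "hDAA",
}
missing = missing_annotations(record)
print(missing)
```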
4.4.2 Pre-examination data
NOTE 1 Generally, dedicated measures need to be taken for collecting, stabilizing, transporting, storing and
processing of biological specimens/samples, to ensure that profiles of analytes of interest (e.g. gene sequence,
transcript, protein, metabolite) for examination are not changed ex vivo. Without these measures, analyte profiles
can change drastically during and after specimen collection, thus making the outcome from diagnostics or research
unreliable or even impossible, because the subsequent examination cannot determine the situation in the patient, but
determines an artificial profile generated during the pre-examination process.
NOTE 2 Important measures include, for example, times and temperatures of sample transportation not exceeding
the specifications provided in relevant International Standards (e.g. ISO 20916, ISO 20186-1, ISO 20658), giving
guidelines on all steps of the pre-examination workflow.
Measurement methods for analysing specim
...