CRDC-H Draft Model Presentation to Nodes

June 2020
These slides: bit.ly/ccdh-prototype-june2020

Community Development (lead: Volchenboum; co-lead: Vasilevsky)
Data Model Harmonization (lead: Chute; co-lead: Furner)
Ontology & Terminology Ecosystem (lead: Solbrig)
Tools & Data Quality (lead: Balhoff)
Program Management and Operations (lead: Haendel; co-lead: Munoz-Torres)
Programmatic oversight:
CBIIT: Sherri De Coronado, Allen Dearry
FNL: Todd Pihl, Resham Kulkarni

From Practice-based Evidence to Evidence-based Practice
[Figure labels: Patient Encounters; Clinical Databases, Registries, et al.; Data Inference; Medical Knowledge; Knowledge Management; Clinical Guidelines; Expert Systems; Decision support; Terminologies; Data models]
Terminologies and data models provide the consistency and comparability
essential for a Learning Health System

Role of CCDH in the CRDC ecosystem
Facilitate retrospective and
prospective semantic harmonization of
data across nodes of the CRDC
Coordinate the community to ensure
quality “fit for purpose” design and
implementation of standards that will
facilitate interoperability of
heterogeneous data types and CRDC
resources
Find agreement across the
communities built around CRDC
- match and extend data models
- annotation, harmonization
- quality assurance

Data Model Harmonization (lead: Chute; co-lead: Furner)
Ontology & Terminology Ecosystem (lead: Solbrig)
Tools & Data Quality (lead: Balhoff)
Schema to schema: OMOP to FHIR
Term to term: Oncotree to NCIt
Data records to data records: “Smoking status >7 packs per day” to NCIT:C154510 [Heavy Smoker]
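As a rough illustration of record-level harmonization, the sketch below maps a raw source value to a coded term. Only the pairing of the smoking-status text with NCIT:C154510 [Heavy Smoker] comes from the slide; the lookup table, the function name `harmonize_value`, and the normalization rule are assumptions made for illustration.

```python
# Hypothetical lookup from free-text source values to NCIt codes.
# Only the C154510 / Heavy Smoker pairing is taken from the slide.
VALUE_TO_NCIT = {
    "smoking status >7 packs per day": ("NCIT:C154510", "Heavy Smoker"),
}


def harmonize_value(raw: str):
    """Return (code, label) for a raw source value, or None if unmapped."""
    key = " ".join(raw.lower().split())  # normalise case and whitespace
    return VALUE_TO_NCIT.get(key)


print(harmonize_value("Smoking status  >7 packs per day"))
print(harmonize_value("never smoker"))
```

A real implementation would need curated mappings rather than exact string matching; the point here is only that the target of the mapping is a terminology code, not another free-text value.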

Data model harmonization
Structure: Syntactic
Concept: Representation
Ontology: Meaning in context
Relationships: Connections

● Goal is to support harmonization of equivalent data elements in disparate models to
enable cross-node querying and data aggregation
● Node models have developed somewhat independently to fit specific use cases
○ Overall modeling space is broad: there is overlap, but each model covers unique semantic space
○ Divergence in modeling approach: equivalent entities and properties are not always captured in
syntactically equivalent ways
○ Heterogeneity of source data model artifacts
● The CCDH Data Model Harmonization group is defining a shared data model for use
across the CRDC, leveraging existing standards (e.g. FHIR, BRIDG) where possible.
● This harmonized model (CRDC-H) and terminological infrastructure are being designed to
meet the needs of systems like the Cancer Data Aggregator (CDA) that support integrated
search and metadata-based analyses across datasets in the CRDC ecosystem.
Data Model Harmonization: Overview

Data Model Harmonization: Overview
● Phase 1 has focused on foundational effort necessary to support more nuanced work in
additional phases
○ Phase 1 work was exploratory and the modeling abstract
● Phase 2 will provide a more concrete model useful for implementation
○ Converge on a modeling and implementation approach that will work for CRDC

Five steps in the CRDC-H Model Development Workflow
An iterative process through which content of source models is evaluated, aggregated,
mapped, and refactored into a standards-aligned and harmonized data model.
CRDC-H Model Development Workflow
[Figure axis: from abstract specification, low harmonization, not standards-aligned to concrete specification, deep harmonization, standards-aligned]

(1) Standardized concept map and spreadsheet representations of source node models
provide a consistent, comparable, and computable substrate for harmonization efforts
Step 1: Standardize Source Data Model Documentation
Links:
- Source model cmaps
- Standardized data dictionaries

(2) Equivalent elements are merged across sources to produce a single aggregated model,
providing a unified view of all information that the final CRDC-H model must represent.
Step 2: Generate an Aggregated Data Model (ADM)
Links:
- ADM cmap
- ADM data dictionary
- February Progress Report and Slide Deck

(3) Mappings of ADM elements to standard models like BRIDG and FHIR facilitate
understanding of source models, and development of a standards-aligned model.
Step 3: Map the ADM to Community Standard Data Models

(4) Deeper harmonization is achieved as ADM elements are refactored into a more
normalized and standards-aligned conceptual domain model (CDM)
Step 4: Refactor the ADM into a Conceptual Domain Model (CDM)

(5) Mature elements of the CDM are refined into a concrete logical model, the CRDC-H,
which will support implementation by CRDC nodes and the CDA
Step 5: Refactor the CDM into a Logical Data Model (CRDC-H)

Lessons learned from this initial deep dive will inform subsequent iterations that
incorporate new data sources and domains.
First Iteration: Biospecimen and Administrative entities from GDC, PDC, ICDC, and HTAN

1. The Aggregated Data Model (ADM) (Step 2)
2. Mapping the ADM to Standard Data Models (Step 3)
3. Refactoring the ADM into a CDM Prototype (Step 4)
4. Next Steps and Future Directions
Outline

The Aggregated Data Model
(ADM)
The substrate for mapping and refactoring efforts

The Aggregated Data Model (ADM)
The ADM represents the union of all elements
across our set of source data models, where
‘equivalent’ entities and attributes are merged.
● Provides a unified view of all information that
the final CRDC-H model must represent.
● Captures an initial set of entity and property
mappings across sources.
● Serves as a base data model that can be
evolved incrementally into a final CRDC-H

An excerpt from the ADM Data Dictionary showing the ADM.Program.name property, which
aggregates and deprecates equivalent properties from GDC, PDC, and ICDC models.
Content from standardized source dictionaries is merged and reorganized in a single sheet
1. Equivalent entities are collapsed into a single record, with source definitions retained (rows 1-5)
2. Within an aggregated entity (EA), properties are ordered to group those that are equivalent (rows 8-10)
3. A new ADM row is created for each unique property in the aggregated entity (‘PA’, green row 7)
4. Rows for the source properties it aggregates are marked as deprecated (‘PD’, yellow rows 8-10)
The Aggregated Data Model (ADM): Data Dictionary
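The aggregation steps above can be sketched in code. This is a minimal illustration, not the spreadsheet workflow the team actually used: the source property names, the `equivalence key` column (standing in for a curator's judgement that properties are equivalent), and the output field names are all hypothetical. Only the 'PA' and 'PD' row markers and the GDC, PDC, and ICDC sources for Program.name come from the slide.

```python
from collections import defaultdict

# Hypothetical source rows: (source model, entity, property, equivalence key).
SOURCE_ROWS = [
    ("GDC", "Program", "name", "Program.name"),
    ("PDC", "Program", "name", "Program.name"),
    ("ICDC", "Program", "program_name", "Program.name"),
]


def aggregate(rows):
    """Emit one 'PA' row per equivalence key, followed by its 'PD' source rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[3]].append(row)

    out = []
    for key, members in groups.items():
        # New aggregated ADM property row
        out.append({
            "status": "PA",
            "model": "ADM",
            "property": key,
            "aggregates": [f"{m}.{e}.{p}" for m, e, p, _ in members],
        })
        # Source rows are retained but marked as deprecated
        out.extend(
            {"status": "PD", "model": m, "property": f"{e}.{p}"}
            for m, e, p, _ in members
        )
    return out


for row in aggregate(SOURCE_ROWS):
    print(row["status"], row["model"], row["property"])
```

Keeping the deprecated rows next to the aggregated row is what preserves the mapping from each ADM property back to its source node properties.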

The Aggregated Data Model (ADM)
Node models are not very well aligned at the outset
● e.g. ICDC and GDC: ~30% entity equivalence, <5% attribute equivalence
Source Model Alignment
Property aggregation in the ADM was based on superficial analysis and strict
aggregation criteria
● Only strictly equivalent elements within strictly equivalent entities are merged
Deeper aggregation and harmonization of elements will be achieved as the
ADM is refactored into the CDM.
The ADM as a whole is large and flat (55 entities, 984 attributes)

Harmonization in the ADM is Minimal
Examples:
(1) ICDC.birthdate vs GDC.birthyear: capture the same concept at different levels of precision - not aggregated in ADM
(2) GDC.gender vs ICDC.sex: capture related but distinct concepts, using the same values (M, F) - not aggregated in ADM

Harmonization in the ADM is Minimal
Examples:
(3) ADM.freezing_method and ADM.preservation_method: separate properties for different types of
specimen processing methods - not aggregated in ADM
While harmonization achieved in the ADM is minimal, it will serve as a substrate for
mapping and refactoring toward a much more deeply harmonized CDM prototype,
and maintain mappings back to elements in source node models.

Mapping the ADM to
Domain Standard Models
BRIDG and FHIR

● The BRIDG (Biomedical Research Integrated Domain Group) Model is a UML-based
Conceptual Model covering the domains of clinical and translational research
● A collaborative effort engaging stakeholders from CDISC, HL7, ISO, NCI, and the FDA
● Not an implementation model, but can be refined into a logical data model to support
application in data systems.
● One common use is as a 'hub' supporting cross-model mapping between any two models
that have individually been mapped to BRIDG
● Supporting infrastructure maintains computable mappings of BRIDG to community
models, and links to common data elements in semantic standards like the caDSR.
The BRIDG Conceptual Model
http://bridgmodel.nci.nih.gov
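The 'hub' use of BRIDG can be illustrated with a small sketch: if two models are each mapped to BRIDG, a cross-model mapping falls out by joining on the shared BRIDG element. The ADM.Sample to BiologicSpecimen pairing appears elsewhere in these slides; the FHIR entry and the function name `map_via_hub` are assumptions for illustration, and real BRIDG mappings are paths rather than simple class pairs.

```python
# Hypothetical class-level mappings from two models into the hub model.
ADM_TO_BRIDG = {"ADM.Sample": "BRIDG.BiologicSpecimen"}
FHIR_TO_BRIDG = {"FHIR.Specimen": "BRIDG.BiologicSpecimen"}  # assumed mapping


def map_via_hub(source_to_hub, target_to_hub):
    """Pair elements of two models that land on the same hub element."""
    hub_to_target = {}
    for target, hub in target_to_hub.items():
        hub_to_target.setdefault(hub, []).append(target)
    return {
        source: hub_to_target.get(hub, [])
        for source, hub in source_to_hub.items()
    }


print(map_via_hub(ADM_TO_BRIDG, FHIR_TO_BRIDG))
```

With N models, this needs N mappings to the hub rather than a mapping for every pair of models.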

The BRIDG Concept Map shows scope of the model and high-level concepts it covers
https://cbiit.github.io/bridg-model/HTML/BRIDG5.3.1/EARoot/EA1.htm
The BRIDG Conceptual Model: Coverage and Scope

The Comprehensive BRIDG UML Diagram shows attributes and relationships of all classes in the model
https://cbiit.github.io/bridg-model/HTML/BRIDG5.3.1/EARoot/EA3.htm
The BRIDG Conceptual Model: Full Model

BRIDG Biospecimen View shows only modeling related to this subdomain
https://cbiit.github.io/bridg-model/HTML/BRIDG5.3.1/EARoot/EA2/EA51.htm
The BRIDG Conceptual Model: Biospecimen Subdomain

● Analysts from Samvit Solutions on loan from NCI CBIIT assisted in the mapping
process (Smita Hastak, Wendy Ver Hoef, Charles Yaghmour)
● Utilized a standard spreadsheet-based mapping template, widely used for other
BRIDG mapping efforts (e.g. OMOP, Sentinel, i2b2, mCODE)
● Mappings are defined as ‘paths’, rooted at the BRIDG equivalent of the mapped
ADM class (e.g. BRIDG.BiologicSpecimen for ADM.Sample)
● Mapping path for ADM.Sample.freezing_method:
BiologicSpecimen <--beAFunctionPerformedBy-- Subject <--beParticipatedInBy--
PerformedMaterialProcessStep.methodCode
WHERE PerformedMaterialProcessStep--instantiate-->DefinedMaterialProcessStep.nameCode="freeze"
● Full mapping spreadsheet located here (‘Mappings’ sheet, column K)
ADM -> BRIDG Mapping: Process and Tools

ADM -> BRIDG Mapping: Covering Model Diagrams
‘Covering’ views show all the classes and patterns in the BRIDG model needed to represent the content
of a single ADM entity (shown here for ADM.Sample)

The yellow path traces the BRIDG mapping for ADM.Sample.preinvasive_morphology, from the
PerformedDiagnosis.value field holding the data, to the BiologicSpecimen class rooting the mapping.
Start
End

ADM -> BRIDG Mapping: Applications and Benefits
1. Provides Semantic Clarity to Source Models
a. Forces us to deeply understand the meaning and utility of each ADM element
b. Highlights areas where node models or documentation are unclear or duplicative
2. Enables Cross-Model Mappings
a. Facilitates mappings to other models mapped to BRIDG
(e.g. OMOP, Sentinel, ACT/i2b2, PCORNet, HL7 FHIR mCODE IG, ...)
b. Provides a connection to the NCI semantic infrastructure and standards
(e.g. caDSR, EVS)
3. Informs ADM -> CDM Refactoring
a. Represents a hyper-normalized counterpoint to the flat node models in the ADM,
ensuring our harmonized model is grounded in reality.

● FHIR is a data exchange model and API framework
● Primary domain is patient-level healthcare data from EHRs
● Provides a set of core resources, and a profiling mechanism that allows
implementations to add custom constraints and extensions to core resources
● Implementation Guides instruct implementers on how to assemble profiles into
exchange schemas tailored for a specific community, application, or use case.
● Widely used in healthcare settings, with developing coverage of research
concepts, making it an attractive candidate for re-use or alignment in our work.
Fast Healthcare Interoperability Resources (FHIR) Model
https://www.hl7.org/fhir/index.html

Catalog and Example Specification of FHIR Resources
https://www.hl7.org/fhir/index.html, https://www.hl7.org/fhir/specimen.html
Fast Healthcare Interoperability Resources (FHIR) Model

Structure: Syntactic
Code/Value Set: Representation
Ontology: Meaning in context
Relationships: Connections

● Adapted the BRIDG-Mapping template to accommodate FHIR mappings
● Applied the BRIDG mapping path syntax to FHIR Resource model
(so mappings expressed with same language and level of granularity)
● FHIR mapping paths are typically shorter/simpler than those for the more
highly normalized BRIDG model
● Mapping path for ADM.Sample.freezing_method:
Specimen --processing--> Processing.procedure(CodeableConcept)
● Full mapping spreadsheet located here (‘Mappings’ sheet, column S)
ADM -> FHIR Mapping: Process and Tools
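To make the FHIR mapping path for ADM.Sample.freezing_method concrete, the sketch below builds a minimal Specimen resource as a plain Python dictionary, assuming FHIR R4, where Specimen.processing.procedure is a CodeableConcept. The identifier, code, display text, and code system URL are placeholders, not values from the CCDH mapping spreadsheet.

```python
import json


def specimen_with_processing(specimen_id: str, system: str, code: str, display: str) -> dict:
    """Build a minimal FHIR R4 Specimen carrying one processing step."""
    return {
        "resourceType": "Specimen",
        "id": specimen_id,
        "processing": [
            {
                # Specimen.processing.procedure is a CodeableConcept in R4
                "procedure": {
                    "coding": [
                        {"system": system, "code": code, "display": display}
                    ]
                }
            }
        ],
    }


# Placeholder values, for illustration only.
resource = specimen_with_processing(
    "example-1", "http://example.org/processing-methods", "freeze", "Freezing"
)
print(json.dumps(resource, indent=2))
```

The freezing method sits one level below the Specimen itself, which is consistent with the slide's point that FHIR paths are shorter than the corresponding BRIDG paths.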

ADM -> FHIR Mapping: Covering Model Diagrams
‘Covering’ views show all the classes and patterns in the FHIR models needed to
represent the content of a single ADM entity (shown here for ADM.Sample)

ADM -> FHIR Mapping: Applications and Benefits
1. Target for Model Alignment and Re-Use
a. FHIR provided a pragmatic target to guide CDM modeling - a middle ground
between the ADM and BRIDG
2. Interoperability with Clinical Data Systems
a. Alignment may facilitate broader interoperability with clinical systems that have
adopted FHIR
3. Potential to Leverage FHIR Infrastructure and Tooling
a. Use of the FHIR metamodel and/or Resource models can let us leverage tools
supporting API implementation, data validation, and automated documentation

ADM Models Represented using FHIR Metamodel, and generated documentation
https://fhir.hotecosystem.org/ccdh/fhir/, https://fhir.hotecosystem.org/ccdh/fhir/aliquot.html
FHIR as a Modeling Framework

FHIR Resources Models For CCDH Data Harmonization
Model in Google Sheets FHIR Resource Model (Spreadsheet)
FHIR Resource
https://fhir.hotecosystem.org/ccdh/fhir/
FHIR Publish Process
caDSR identifiers
https://github.com/HOT-Ecosystem/cadsr-from-gdrive

Structure: Syntactic
Code/Value Set: Representation
Ontology: Meaning in context
Relationships: Connections
ISO 11179-3

Structure: Syntactic
Code/Value Set: Representation
Ontology: Meaning in context
Relationships: Connections
ISO 11179-3
CTS2

The CCDH Conceptual Domain
Model (CDM) Prototype
A Standards-Informed Refactoring of the ADM

Scope of Phase 1 Effort
The CCDH Conceptual Domain Model
Subdomains:
● Biospecimen: Sample, Portion, Analyte, Aliquot, Slide
● Administrative: Case, Project, Program, Tissue Source Site, Center
Sources:
● CRDCs: GDC, PDC, ICDC, HTAN
● Standards: BRIDG, FHIR
Model Components Harmonized:
● Yes: Entities, Relationships, Properties
● No: Data Types, Value Sets and Terminologies
Level of Formalization:
● An abstract conceptual model exploring different modeling approaches.
● Formalization into a concrete implementation model to follow in Phase 2.

Entity-Level View of Model Refactoring
Model structure before and after refactoring of the ADM into the more normalized CDM
(Administrative (blue) and Biospecimen (orange) subdomains only)
ADM (before refactoring): 144 specimen properties in total
CDM (after refactoring): 74 specimen properties in total

Property-Level View of Model Refactoring
Harmonization of properties capturing
specimen processing methods, as
source models are aggregated and
refactored into the CDM.
● During aggregation, five separate
properties found across source node
models are merged into two
properties in the ADM.
● During refactoring of the ADM into
the CDM, these two properties get
merged into a single ‘method’
property.
● The CDM ‘method’ element provides
a more flexible and generic structure
that will accommodate any type of
method, where some semantics get
pushed into the terminology.
Node Models (5 properties) -> aggregation -> ADM (2 properties) -> refactoring -> CDM (1 property)
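The last refactoring step, where the two ADM method properties collapse into one CDM 'method' property, might look roughly like the sketch below. The two ADM property names come from the slides; treating 'method' as a multi-valued list, the function name `refactor_methods`, and the sample values are assumptions.

```python
# ADM properties that are merged into the single CDM 'method' property.
ADM_METHOD_PROPERTIES = ("freezing_method", "preservation_method")


def refactor_methods(adm_specimen: dict) -> dict:
    """Return a CDM-style record with method-specific properties merged."""
    # Carry over every property that is not a method-specific one
    cdm = {k: v for k, v in adm_specimen.items() if k not in ADM_METHOD_PROPERTIES}
    # Collect whichever method values are present into one generic property
    cdm["method"] = [
        adm_specimen[p] for p in ADM_METHOD_PROPERTIES if adm_specimen.get(p)
    ]
    return cdm


record = {"id": "s1", "freezing_method": "snap frozen", "preservation_method": None}
print(refactor_methods(record))
```

In this form the model no longer says whether a value is a freezing or a preservation method; that distinction would have to come from the terminology behind the values, which is the trade-off the slide describes.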

Detailed View of CDM Entities and Attributes
Entities in the CDM prototype, and the attributes held by each.
Attribute count shown in parentheses.

CDM Data Dictionary (link)
● The CDM prototype is presently specified as a spreadsheet-based data dictionary
● Entities and their Attributes are each described in a separate sheet
● Cardinality of attributes is specified to be as permissive as possible initially
● Data Types are minimally specified
○ Simple: declared only at a high level (limited to literal, boolean)
○ Complex: proposals for Identifier, Coding, DateTime, Quantity, . . .
● A ‘Referenced Entities’ sheet lists entities that are referenced in CDM relationships,
but are not in scope to model in this phase of work.
○ e.g. Organization, Visit, ConditionDiagnosis
● A ‘Data Containers’ sheet holds placeholders for objects that will be defined to group
sets of related properties (specific structures for these t.b.d.)
● Mappings of several types are also provided in the main Entity sheets:
○ ADM attributes that map to each CDM attribute (column L)
○ Source node attributes aggregated by these ADM attributes (column M)
○ CDM to FHIR mappings (column N)

1. Use of Complex Data Types
Key Features and Design Decisions
We explore the use of several complex data types to represent certain kinds of
related information
1. Identifier: groups an external identifier value with info about its source
a. avoids need for multiple source-specific identifier properties
2. Coding: formal structure for enumerated values that groups a code with its label and info
about its source
a. avoids need for separate properties for label and id
3. DateTime: supports different ways to represent a date or time (precise vs offset)
a. avoids need for different properties to capture dates in different representations or formats
Pros: concise way to represent specific types of information using fewer properties
Cons: may add level of nesting that needs to be traversed to find data
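The three complex data types could be sketched as simple data classes. This is only an illustration of the idea: the CDM prototype lists these types as proposals, so every field name below is an assumption, as is the rule that a DateTime carries exactly one of its two representations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    value: str
    system: str  # who issued the identifier


@dataclass(frozen=True)
class Coding:
    code: str
    label: str
    system: str  # terminology the code comes from


@dataclass(frozen=True)
class DateTime:
    precise: Optional[str] = None      # e.g. an ISO 8601 date string
    offset_days: Optional[int] = None  # days relative to an index event

    def __post_init__(self):
        # Exactly one representation is expected to be set.
        if (self.precise is None) == (self.offset_days is None):
            raise ValueError("set exactly one of precise or offset_days")


submitter_id = Identifier(value="ABC-123", system="GDC")
collected = DateTime(offset_days=30)
print(submitter_id, collected)
```

The nesting cost mentioned under 'Cons' is visible here: a consumer reads `submitter_id.value` rather than a top-level property.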

2. Collapsing Specimen Entity Subtypes
● A single CDM.Specimen entity covers entities distinguished at the class level in
some node models (Sample, Portion, Aliquot, Analyte, Slide)
● The Specimen.specimen_type property is used to indicate which of these more
specific types a particular instance represents.
● The goal here is to keep the initial prototype simpler, and reduce the redundancy of
properties that appear across specimen subtypes in the ADM
● This decision can be reversed if challenges are encountered, or we conclude that the
differences between these subtypes warrant an explicit entity-level distinction
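A minimal sketch of the collapsed Specimen entity follows. The five subtype names and the specimen_type property come from the slide; the class layout, the `id` field, and the helper `from_node_record` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SpecimenType(Enum):
    SAMPLE = "Sample"
    PORTION = "Portion"
    ALIQUOT = "Aliquot"
    ANALYTE = "Analyte"
    SLIDE = "Slide"


@dataclass
class Specimen:
    id: str
    specimen_type: SpecimenType  # replaces a class-level distinction


def from_node_record(entity_name: str, record: dict) -> Specimen:
    """Convert a node record of any specimen subtype into a generic Specimen."""
    # SpecimenType(...) raises ValueError for an unknown entity name
    return Specimen(id=record["id"], specimen_type=SpecimenType(entity_name))


print(from_node_record("Aliquot", {"id": "a1"}))
```

Shared properties are declared once on Specimen instead of once per subtype, which is the redundancy reduction the slide is aiming for.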

3. Location of Domain Semantics: ‘In the Model’ vs ‘In the Data’
Where node models in the ADM lean heavily toward hard coding domain semantics in the model
itself, the CDM explores several approaches to capturing more of the semantics in the data.
Consider how Specimen composition measurements are represented:
The CompositionMeasurement object is an example of what we call 'Data Containers' in the CDM
● placeholders that will be formalized once we accrue the requirements needed to commit to
a specific type of structure.
Approaches like this let us achieve a deeper level of aggregation and harmonization, and better
accommodate future data and use cases.
Value Set = ‘non tumor tissue area’, ‘tumor tissue area’, ‘percentage tumor’,
‘percentage stroma’, ‘analysis area’, . . .
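The difference between semantics 'in the model' and 'in the data' can be shown with composition measurements. In the sketch below, the flat column names and the numeric values are hypothetical; the value-set terms are taken from the slide, and the shape of each measurement entry is an assumption, since the CompositionMeasurement structure is still to be defined.

```python
# 'In the model': one hard-coded column per kind of measurement.
flat_record = {"percentage_tumor": 80, "percentage_stroma": 15}

# Hypothetical mapping from flat column names to value-set terms.
COLUMN_TO_TERM = {
    "percentage_tumor": "percentage tumor",
    "percentage_stroma": "percentage stroma",
}


def to_composition_measurements(record: dict) -> list:
    """'In the data': one generic entry per measurement, typed by a term."""
    return [
        {"type": COLUMN_TO_TERM[column], "value": value}
        for column, value in record.items()
        if column in COLUMN_TO_TERM
    ]


for measurement in to_composition_measurements(flat_record):
    print(measurement)
```

Adding a new kind of measurement then means adding a term to the value set, not a property to the model.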

Future Directions and Next Steps
Continued Evolution Toward the CRDC-H

● Phase 1 has focused on foundational effort necessary to support more nuanced work in additional
phases
○ Phase 1 work was exploratory and the modeling abstract
● Phase 2 will provide a concrete model useful for implementation
○ Converge on a modeling and implementation approach that will work for CRDC
Continued Evolution of the CDM

Multiple streams of activity in Phase 2
● Stream One: Incorporate additional CRDC
source nodes/models into the ADM (Steps 1
and 2)
○ HTAN
○ IDC
● Stream Two: Incorporate additional ADM
entities into the CDM (Steps 3 and 4)
○ Clinical subdomain entities
○ Input from stakeholders critical in
guiding this evolution

Multiple streams of activity in Phase 2
● Stream Three: Evolve the existing CDM into
an implementable logical model (Step 5)
○ Further exploration of FHIR meta-
modeling language and biolinkml as
candidate languages for representing
CRDC-H with input from nodes and
CDA
● Test / validate the current CDM prototype
○ against feedback from nodes
○ against source node data
○ against competency queries
○ against requirements from other stakeholders
● Terminology / value set harmonization

Acknowledgements
Center for Cancer Data Harmonization
● Melissa Haendel
● Christopher Chute
● Sam Volchenboum
● Jim Balhoff
● Nicole Vasilevsky
● Harold Solbrig
● Brian Furner
● Monica Munoz-Torres
● Anne Thessen
● Bill Duncan
● Davera Gabriel
● Dazhi Jiao
● Gaurav Vaidya
● Julie McMurry
● Kat Blumhardt
● Maura Kush
● Matt Brush
● Monica Palese
● Richard Zhu
● Steven Cox
● Shahim Essaid
● Shalki Shrivastava
● Tricia Francis
Center for Biomedical Informatics & Information Technology
● Allen Dearry
● Sherri de Coronado
● Melissa Cook
Samvit Solutions
● Smita Hastak
● Wendy Ver Hoef
● Charles Yaghmour
Frederick National Laboratory for Cancer Research
● Todd Pihl
● Resham Kulkarni


Source Node (aka 'Source', 'Node'): sources of data that our data models are being built to support/accommodate. Most are proper Data Commons,
some are Data Coordinating Centers, and some are related data collection efforts like HTAN.
CRDC-H = Cancer Research Data Commons Harmonized Model. This is the final, fully harmonized, implementable specification.
● Status: Not yet being developed, but the CDM will evolve into the CRDC-H as its modeling matures and we commit to a formal modeling
language/framework to specify the model.
CDM = Conceptual Domain Model. A prototype that will evolve into the final CRDC-H model. Created by refactoring the ADM into a more deeply
harmonized model, aligned with standards like BRIDG and FHIR where possible.
● Sources: Currently covers models from GDC, PDC, ICDC, HTAN.
● Scope: Currently covers only the Biospecimen and Administrative subdomains
● Status: Actively evolving. Parts are incomplete, and defined at a more abstract/conceptual level - so not suitable for implementation at this time.
ADM = Aggregated Data Model. Simple aggregation of content from source node models into a single artifact. Strictly equivalent entities and
properties are collapsed, but overall harmonization provided by the ADM is minimal.
● Sources: Currently incorporates GDC, PDC, ICDC models, and the Level 1 Biospecimen model from HTAN
● Scope: Currently covers all subdomains
● Status: Not actively evolving, but will grow as we tackle new sources and elements of their model are incorporated
Key Terms and Definitions

The Aggregated Data Model (ADM): Concept Map
GDC
PDC
ICDC
Aggregated Data
Model (ADM)

The yellow path traces the BRIDG mapping for ADM.Sample.freezing_method, from the
PerformedMaterialProcessStep.methodCode field holding the data, to the BiologicSpecimen root of the mapping.

Patient vs Research Subject Roles
● ADM.Case entity refactored into CDM.Patient and CDM.ResearchSubject
● Provides support for the use case of a single individual being a research subject on more than one study
○ Assumes there are mechanisms in place to de-duplicate patients who may exist in multiple different
repositories (e.g. USI in pediatric cancer)
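The Patient and ResearchSubject split can be sketched as follows. The two entity names come from the slide; the fields, the `patient_key` used to de-duplicate individuals, and the function `split_cases` are hypothetical, and the sketch assumes such a shared key is already available.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchSubject:
    id: str
    study: str


@dataclass
class Patient:
    id: str
    research_subjects: list = field(default_factory=list)


def split_cases(cases: list) -> dict:
    """Group ADM-style case rows into one Patient per shared patient key."""
    patients = {}
    for case in cases:
        key = case["patient_key"]  # assumed de-duplicated identifier
        patient = patients.setdefault(key, Patient(id=key))
        patient.research_subjects.append(
            ResearchSubject(id=case["case_id"], study=case["study"])
        )
    return patients


cases = [
    {"patient_key": "P1", "case_id": "C-001", "study": "Study A"},
    {"patient_key": "P1", "case_id": "C-002", "study": "Study B"},
]
print(split_cases(cases))
```

Without a reliable shared key, the same individual would appear as two Patients, which is why the slide flags de-duplication as a precondition.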

ADM Attributes Mapping to CDM Entities
Entities in the CDM prototype, holding attributes from the ADM that map into each.
Counts of mapped ADM attributes in parentheses.

● Concept maps support
high-level understanding and
comparison of scope and
structure
● Entities in each cmap are
annotated with a count of
properties and relationships
they contain.
● Entities are color-coded
according to the subdomain
they cover.
● Diagrams for all node models
can be found here.
I. Standardized Data Model Documentation: Concept Maps
GDC Concept Map

An excerpt of the GDC.Case entry in the Google Sheets format used to standardize documentation
across all source nodes. Complete dictionaries for GDC, PDC, and ICDC models are here.
I. Standardized Data Model Documentation: Data Dictionaries

I. Standardized Data Model Documentation: Metrics
Analysis of standardized documentation quantifies size and coverage of each model
Element Density (average P + R per E)
Model   Density
GDC     21.6
PDC     23.8
ICDC    9.8
Element Counts in Source Data Models
Model   Entity   Relationship   Property
GDC     26       34             527
PDC     21       27             473
ICDC    27       34             231
AD-Administrative, BP-Biospecimen Processing, BA-Biospecimen Analysis, CC-Cross-sectional Clinical, LC-Longitudinal Clinical, ST-Study, FI-File, BI-Biological.
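The element density figures follow directly from the element counts: density is the number of properties plus relationships, divided by the number of entities. A short check, using only the counts reported on this slide (the function and variable names are illustrative):

```python
# (entities, relationships, properties) per source model, from the slide
COUNTS = {
    "GDC": (26, 34, 527),
    "PDC": (21, 27, 473),
    "ICDC": (27, 34, 231),
}


def element_density(entities: int, relationships: int, properties: int) -> float:
    """Average number of properties and relationships per entity."""
    return (properties + relationships) / entities


for model, (e, r, p) in COUNTS.items():
    print(model, round(element_density(e, r, p), 1))
```

Rounded to one decimal place, this gives 21.6 for GDC, 23.8 for PDC, and 9.8 for ICDC, matching the reported densities.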

II. Aggregated Data Model: Initial Mapping Metrics
● Metrics reflect mappings based on very strict criteria (full equivalence within an aggregated entity)
● GDC-PDC models show significant similarity (~50% E mapping and 35% P+R mapping)
● The ICDC model is very different from GDC/PDC (~30% E mapping and <5% P+R mapping).
● Many differences are related to the distinct biology and privacy considerations of the species the nodes
cover (dog vs human), and the differences in scope of the models (e.g. ICDC focus on clinical studies).
AD-Administrative, BP-Biospecimen Processing, BA-Biospecimen Analysis, CC-Cross-sectional Clinical, LC-Longitudinal Clinical, ST-Study, FI-File, BI-Biological.

II. Aggregated Data Model: Early Outcomes and Insights
We have identified, and will continue to identify, many categories of differences to address in
harmonization efforts:
● Differences in Scope - e.g. the ICDC model covers aspects of clinical trial design and execution not in
scope for GDC and PDC, but lacks the rich representation of biospecimen processing found in other models.
● Differences in Granularity - e.g. the GDC model goes into much finer detail about specific tumor staging
systems and evidence than does the ICDC.
● Differences in Structure - e.g. the ICDC defines a larger set of more specialized entities to capture
clinical metadata than do GDC and PDC.
● Differences in Semantics - e.g. different elements or values are used for representing the same type of
information (gender vs sex, birth_date vs birth_year)
● Differences in Terminology - e.g. use of the same term in different ways ('Study' in PDC vs ICDC), and use
of different terms for the same concept (‘Treatment’ vs ‘Agent Administration’)
