
Research and Applications

Piloting a model-to-data approach to enable predictive analytics in health care through patient mortality prediction

Timothy Bergquist1,*, Yao Yan2,*, Thomas Schaffter3, Thomas Yu3, Vikas Pejaver1, Noah Hammarlund1, Justin Prosser4, Justin Guinney1,3, and Sean Mooney1

1Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA, 2Molecular Engineering and Sciences Institute, University of Washington, Seattle, Washington, USA, 3Sage Bionetworks, Seattle, Washington, USA, 4Institute for Translational Health Sciences, University of Washington, Seattle, Washington, USA

*These authors contributed equally.

Corresponding Author: Sean Mooney, PhD, Biomedical Informatics and Medical Education, University of Washington, Seattle,

WA 98195, USA; sdmooney@uw.edu

Received 11 December 2019; Revised 16 April 2020; Editorial Decision 20 April 2020; Accepted 6 May 2020

ABSTRACT

Objective: The development of predictive models for clinical application requires the availability of electronic

health record (EHR) data, which is complicated by patient privacy concerns. We showcase the “Model to Data”

(MTD) approach as a new mechanism to make private clinical data available for the development of predictive

models. Under this framework, we eliminate researchers’ direct interaction with patient data by delivering con-

tainerized models to the EHR data.

Materials and Methods: We operationalize the MTD framework using the Synapse collaboration platform and

an on-premises secure computing environment at the University of Washington hosting EHR data. Containerized mortality prediction models developed by a model developer were delivered to the University of Washington via Synapse, where the models were trained and evaluated. Model performance metrics were returned to

the model developer.

Results: The model developer was able to develop 3 mortality prediction models under the MTD framework us-

ing simple demographic features (area under the receiver-operating characteristic curve [AUROC], 0.693), dem-

ographics and 5 common chronic diseases (AUROC, 0.861), and the 1000 most common features from the

EHR’s condition/procedure/drug domains (AUROC, 0.921).

Discussion: We demonstrate the feasibility of the MTD framework to facilitate the development of predictive

models on private EHR data, enabled by common data models and containerization software. We identify chal-

lenges that both the model developer and the health system information technology group encountered and

propose future efforts to improve implementation.

Conclusions: The MTD framework lowers the barrier of access to EHR data and can accelerate the development

and evaluation of clinical prediction models.

Key words: electronic health records, clinical informatics, data sharing, privacy, data science

© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Journal of the American Medical Informatics Association, 27(9), 2020, 1393–1400

doi: 10.1093/jamia/ocaa083

Advance Access Publication Date: 8 July 2020


Downloaded from https://academic.oup.com/jamia/article/27/9/1393/5868591 by guest on 29 March 2022

INTRODUCTION

Electronic health records and the future of data-driven health care

Healthcare providers substantially increased their use of electronic

health record (EHR) systems in the past decade.1 While the primary

drivers of EHR adoption were the 2009 Health Information Tech-

nology for Economic and Clinical Health Act and the data exchange

capabilities of EHRs,2 secondary use of EHR data to improve clini-

cal decision support and healthcare quality also contributed to

large-scale adoption.3 EHRs contain a rich set of information about

patients and their health history, including doctors’ notes, medica-

tions prescribed, and billing codes.4 The prevalence of EHR systems

in hospitals enables the accumulation and utilization of large clinical

data to address specific clinical questions. Given the size and com-

plexity of these data, machine learning approaches provide insights

in a more automated and scalable manner.5,6 Healthcare providers

have already begun to implement predictive analytics solutions to

optimize patient care, including models for 30-day readmissions,

mortality, and sepsis.7 As hospitals improve data capture quality

and quantity, opportunities for more granular and impactful predic-

tion questions will become more prevalent.

Hurdles to clinical data access

Healthcare institutions face the challenge of balancing patient privacy and EHR data utilization.8 Regulatory policies such as the Health Insurance Portability and Accountability Act and the Health Information Technology for Economic and Clinical Health Act place the

onus and financial burden of ensuring the security and privacy of

the patient records on the healthcare institutions hosting the data. A

consequence of these regulations is the difficulty of sharing clinical

data within the research community. Research collaborations are of-

ten bound by highly restrictive data use agreements or business asso-

ciate agreements limiting the scope, duration, quantities, and types

of EHR data that can be shared.9 This friction has slowed, if not im-

peded, researchers’ abilities to build and test clinical models.9 While

these data host-researcher relationships are important and lead to

impactful collaborations, they are often limited to intrainstitution

collaborations, relegating many researchers with no healthcare insti-

tution connections to smaller public datasets or inferior synthetic

data. One exception to this is the patient-level prediction (PLP) working

group in the Observational Health Data Sciences and Informatics

community, which developed a framework for building and exter-

nally validating machine learning models.10 While the PLP group

has successfully streamlined the process to externally validate model

performance, there is still an assumption that the model developers

have direct access to an EHR dataset that conforms to the Observational Medical Outcomes Partnership (OMOP) Common Data

Model (CDM),11,12 on which they can develop their models. In or-

der to support model building and testing more widely in the re-

search community, new governance models and technological

systems are needed to minimize the risk of reidentification of

patients, while maximizing the ease of access and use of the clinical

data.

Methods for sharing clinical data

De-identification of EHR data and the generation of synthetic EHR

data are 2 solutions to enable clinical data sharing. De-identification

methods focus on removing or obfuscating the 18 identifiers that

make up the protected health information as defined by the Health

Insurance Portability and Accountability Act.13 De-identification

reduces the risk of information leakage but may still leave a unique

fingerprint of information that is susceptible to reidentification.13,14

De-identified datasets like MIMIC-III are available for research and

have led to innovative research studies.15–17 However, these datasets are limited in size (MIMIC-III [Medical Information Mart for Intensive Care-III] only includes 38 597 distinct adult patients and 49 785 hospital admissions), scope (MIMIC-III is specific to intensive care unit patients), and availability (data use agreements are required to use MIMIC-III).

Generated synthetic data attempt to preserve the structure, for-

mat, and distributions of real EHR datasets but do not contain iden-

tifiable information about real patients.18 Synthetic data generators,

such as medGAN,16 can generate EHR datasets consisting of high-

dimensional discrete variables (both binary and count features), al-

though the temporal information of each EHR entry is not main-

tained. Methods such as OSIM2 are able to maintain the temporal

information but only simulate a subset of the data specific to a use-

case (eg, drug and treatment effects).19 Synthea uses publicly avail-

able data to generate synthetic EHR data but is limited to the 10

most common reasons for primary care encounters and 10 chronic

diseases that have the highest morbidity in the United States.20 To

our knowledge, no existing method can generate an entire synthetic

repository while preserving complete longitudinal and correlational

aspects of all features from the original clinical repository.

“Model to data” framework

The “Model to Data” (MTD) framework, a method designed to allow machine learning research on private biomedical data, was described by Guinney et al21 as an alternative to traditional data sharing methods. The focus of MTD is to enable the development of

analytic tools and predictive models without granting researchers di-

rect, physical access to the data. Instead, a researcher sends a con-

tainerized model to the data hosts who are then responsible for

running the model on the researcher’s behalf. In contrast to the

methods previously described, in which the shared or synthetic data

were limited in both scope and size, an MTD approach grants a re-

searcher the ability to use all available data from identified datasets,

even as those data stay at the host sites, while not giving direct ac-

cess to the researcher. This strategy enables the protection of confi-

dential data while allowing researchers to leverage complete clinical

datasets. The MTD framework relies on modern containerization

software such as Docker22 or Singularity23 for model portability,

which serves as a “vehicle,” sending models designed by a model de-

veloper to a secure, isolated, and controlled computing environment

where it can be executed on sensitive data. The use of containeriza-

tion software not only facilitates the secure delivery and execution

of models, but it opens up the ability for integration into cloud envi-

ronments (eg, Amazon Web Services, Google Cloud) for cost-

effective and scalable data analysis.

The MTD approach has been successful in a series of recent community challenges but has not yet been shown to work with large EHR datasets.24 Here, we present a pilot study of an MTD framework implementation enabling the intake and ingestion of containerized clinical prediction models by a large healthcare institution

(the University of Washington health system, UW Medicine) to their

on-premises secure computing infrastructure. The main goals of this

pilot are to demonstrate (1) the operationalization of the MTD ap-

proach within a large health system, (2) the ability of the MTD

framework to facilitate predictive model development by a re-

searcher (here referred to as the model developer) who does not

have direct access to UW Medicine EHR data, and (3) the feasibility

of an MTD community challenge for evaluating clinical algorithms

on remotely stored and protected patient data.

MATERIALS AND METHODS

Pilot data description

The UW Medicine enterprise data warehouse (EDW) includes patient records from medical sites across the UW Medicine system, including the University of Washington Medical Center, Harborview

Medical Center, and Northwest Hospital and Medical Center. The

EDW gathers data from over 60 sources across these institutions in-

cluding laboratory results, microbiology reports, demographic data,

diagnosis codes, and reported allergies. An analytics team at the Uni-

versity of Washington transformed the patient records from 2010 to

the present day into a standardized data format, OMOP CDM v5.0.

For this pilot study, we selected all patients who had at least 1 visit

in the UW OMOP repository, which represented 1.3 million

patients, 22 million visits, 33 million procedures, 5 million drug ex-

posure records, 48 million condition records, 10 million observa-

tions, and 221 million measurements.

Scientific question for the pilot of the “model to data” approach

For this MTD demonstration, the scientific question we asked the

model developer to address was the following: Given the past elec-

tronic health records of each patient, predict the likelihood that he/

she will pass away within the next 180 days following his/her last

visit. Patients who had a death record and whose last visit records

were within 180 days of the death date were defined as positives.

Negatives were defined as patients whose death records were more

than 180 days away from the last visit or who did not have a death

record and whose last visit was at least 180 days prior to the end of

the available data.
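The labeling rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the function name, the inclusive 180-day boundary, and the choice to return `None` for patients whose outcome cannot yet be determined are all assumptions made for the example.

```python
from datetime import date
from typing import Optional

HORIZON_DAYS = 180  # prediction horizon stated in the text


def label_180_day_mortality(
    last_visit: date,
    death_date: Optional[date],
    end_of_data: date,
) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (outcome not determinable).

    Hypothetical helper: boundary handling (<= vs <) is an assumption.
    """
    if death_date is not None:
        # Patient has a death record: positive if death falls within the horizon.
        days_to_death = (death_date - last_visit).days
        return 1 if days_to_death <= HORIZON_DAYS else 0
    # No death record: negative only if enough follow-up time has elapsed.
    if (end_of_data - last_visit).days >= HORIZON_DAYS:
        return 0
    return None
```

A patient without a death record whose last visit is too close to the end of the data gets no label, because 180 days of follow-up have not yet been observed.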

We selected all-cause mortality as the scientific question due to

the abundance and availability of patient outcomes from the Wash-

ington state death registry. As UW has linked patient records with

state death records, the gold standard benchmarks are not con-

strained to events happening within the clinic. Moreover, the mor-

tality prediction question has been thoroughly studied.25–27 For

these reasons, patient mortality prediction represents a well-defined

proof-of-concept study to showcase the potential of the MTD evalu-

ation platform.

Defining the training and evaluation datasets

For the purpose of this study, we split the data into 2 sets: the training and evaluation datasets. In a live healthcare setting, EHR data are

constantly changing and evolving along with clinical practice, and

prospective evaluation of predictive models is important to ensure

that the clinical decision support recommendations generated from

model predictions are robust to these changes. We defined the evalu-

ation dataset as patients who had more recently visited the clinic

prior to our last death record and the training dataset as all the other

patients. This way the longitudinal properties of the data would be

approximately maintained.

The last death record in the available UW OMOP repository at the

time of this study was February 24, 2019. Any record or measurement

that was found after this date was excluded from the pilot dataset and

this date was defined as “end of data.” When building the evaluation

dataset, we considered the date 180 days prior to the end of data (August 24, 2018) the end of the “evaluation window” and the beginning of the evaluation window to be 9 months prior to the evaluation window end (November 24, 2017). We chose a 9-month evaluation window size because this resulted in an 80/20 split between the training

and evaluation datasets. We defined the evaluation window as the pe-

riod of time in which, if a patient had a visit, we included that patient

and all their records in the evaluation dataset. Patients who had visits

outside the window, but none within the window, were included in

the training data. Visit records that fell after the evaluation window

end were removed from the evaluation dataset (Figure 1, patient 7)

and from the training dataset for patients who did not have a con-

firmed death (Figure 1, patient 3). We only defined the true positives

for the evaluation dataset and created a gold standard of these

patients’ mortality status based on their last visit date and the death ta-

ble. However, we gave the model developer the flexibility to select

prediction dates for patients in the training dataset and to create corre-

sponding true positives and true negatives for training purposes. See

the Supplementary Appendix for additional information.
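The window-based split described in this section can be illustrated with a short Python sketch. The window dates are taken as stated in the text; the function name, the inclusive window bounds, and the representation of a patient as a list of visit dates are assumptions made for illustration and do not come from the study's code.

```python
from datetime import date
from typing import List, Tuple

# Dates as stated in the text.
EVAL_WINDOW_START = date(2017, 11, 24)
EVAL_WINDOW_END = date(2018, 8, 24)


def split_patient(
    visits: List[date], has_confirmed_death: bool
) -> Tuple[str, List[date]]:
    """Assign a patient to a dataset and drop visits that must be excluded.

    Hypothetical helper; inclusive window bounds are an assumption.
    """
    in_window = any(EVAL_WINDOW_START <= v <= EVAL_WINDOW_END for v in visits)
    if in_window:
        # Evaluation patients: visits after the window end are removed.
        return "evaluation", [v for v in visits if v <= EVAL_WINDOW_END]
    if has_confirmed_death:
        # Training patients with a confirmed death keep all of their visits.
        return "training", list(visits)
    # Training patients without a confirmed death lose post-window visits.
    return "training", [v for v in visits if v <= EVAL_WINDOW_END]
```

Any visit inside the window places the patient in the evaluation dataset, and visits after the window end are kept only for training patients with a confirmed death.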

Figure 1. Defining the evaluation dataset. Any patient with at least 1 visit within the evaluation window was included in the evaluation dataset (gold). All other patient records were added to the training dataset (blue). Visits that were after the evaluation window end were excluded from the evaluation dataset and from the training dataset for patients who did not have a confirmed death (light/transparent blue). A 9-month evaluation window was chosen as the timeframe as that resulted in an 80/20 split between the training dataset and the evaluation dataset.

Model evaluation pipeline

Docker containerized models

Docker is a tool designed to facilitate the sharing of software and dependencies in a single unit called an image.22 These images make package dependency, language compilation, and environmental variables easier to manage. This technology enables the simulation of

an operating system that can be run on any computer that has the

Docker engine or compatible container runtime installed. These con-

tainers can also be completely isolated from the Internet or the

server on which they are hosted, an important feature when bringing

unknown code to process protected data. For this study, the model

developer built mortality prediction Docker images, which included

dependencies and instructions for running models in the Docker

container.

Synapse collaboration platform

Synapse is an open-source software platform developed by Sage

Bionetworks (Seattle, WA) for researchers to share data, compare

and communicate their methodologies, and seek collaboration.28

Synapse is composed of a set of shared REST (representational state

transfer)-based web services that support both a website to facilitate

collaboration among scientific teams and integration with analysis

tools and programming languages to allow computational interactions.29 The Synapse platform provides services that enable submissions of files or Docker images to an evaluation queue; such queues have previously been used to manage containerized models submitted to DREAM challenges.28 We used an evaluation queue to manage the model developer’s Docker image submissions.

Submission processing pipeline

To manage the Docker images submitted to the Synapse Collabora-

tion Platform, we used a Common Workflow Language (CWL)

pipeline, developed at Sage Bionetworks. The CWL pipeline monitors an evaluation queue on Synapse for new submissions, automatically downloading and running the Docker image when a submission is detected. Executed commands are isolated from network access by Docker containers run on UW servers.

UW on-premises server infrastructure

We installed this workflow pipeline in a UW Medicine environment

running Docker v1.13.1. UW Research Information Technology

uses CentOS 7 (Red Hat Linux) for their platforms. The OMOP

data were stored in this environment and were completely isolated

behind UW’s firewalls. The workflow pipeline was configured to

run up to 4 models in parallel. Each model had access to 70 GB of

RAM, 4 vCPUs, and 50 GB of SSD.

Institutional review board considerations

We received an institutional review board (IRB) nonhuman subjects

research designation from the University of Washington Human

Subjects Research Division to construct a dataset derived from all

patient records from the EDW that had been converted to the

OMOP v5.0 Common Data Model (institutional review board num-

ber: STUDY00002532). Data were extracted by an honest broker,

the UW Medicine Research IT data services team, and no patient

identifiers were available to the research team. The model developer

had no access to the UW data.

RESULTS

Model development, submission, and evaluation

For this demonstration, a model developer built a dockerized mortality prediction model. The model developer was a graduate student

from the University of Washington who did not have access to the

UW OMOP clinical repository. This model was first tested on a synthetic dataset (SynPUF)30 by the model developer to ensure that the model did not fail when accessing data, training, and making predictions. The model developer submitted the model as a Docker image

to Synapse, via a designated evaluation queue, in which the Docker

image was uploaded to a secure Docker Hub cloud storage service

managed by Sage Bionetworks. The CWL pipeline at the UW secure

environment detected this submission and pulled the image into the

UW computing environment. Once in the secure environment, the

pipeline verified, built, and ran the image through 2 stages, the train-

ing and inference stages. During the training stage, a model was

trained and saved to the mounted volume “model” and during the

inference stage a “predictions.csv” file was written to the mounted

volume “output” with mortality probability scores (between 0 and

1) for each patient in the evaluation dataset (Figure 2). Each stage had a mounted volume “scratch” available for storing intermediate files such as selected features (Figure 2). The model developer specified commands and dependencies (eg, Python packages) for the 2 stages in the Dockerfile, train.sh, and infer.sh. The training and evaluation datasets were mounted to read-only volumes designated “train” and “infer” (Figure 2).

Figure 2. Schema showing the Docker container structure for the training stage and inference stage of running the Docker image.
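To make the two-stage container layout concrete, the following Python sketch assembles a `docker run` command line for either stage. It is only an illustration of the idea and not the CWL pipeline used in the study. The container-side mount points (`/train`, `/infer`, `/model`, `/output`, `/scratch`), the `/app/` script location, the image name, and mounting the trained model read-only during inference are all assumptions. The flags `--network none`, `--memory`, `--cpus`, and `-v host:container:ro` are standard Docker CLI options.

```python
from typing import Dict, List


def build_stage_command(
    image: str,
    stage: str,
    host_dirs: Dict[str, str],
    memory_gb: int = 70,
    cpus: int = 4,
) -> List[str]:
    """Build (but do not execute) a docker run command for one stage.

    Hypothetical helper: mount points and script paths are assumptions.
    """
    if stage not in ("train", "infer"):
        raise ValueError("stage must be 'train' or 'infer'")
    data_volume = stage  # "train" or "infer" dataset, mounted read-only
    result_volume = "model" if stage == "train" else "output"
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",          # no network access from the container
        "--memory", f"{memory_gb}g",  # per-model RAM limit
        "--cpus", str(cpus),          # per-model CPU limit
        "-v", f"{host_dirs[data_volume]}:/{data_volume}:ro",
        "-v", f"{host_dirs['scratch']}:/scratch",
        "-v", f"{host_dirs[result_volume]}:/{result_volume}",
    ]
    if stage == "infer":
        # Assumption: the trained model is made available read-only at inference.
        cmd += ["-v", f"{host_dirs['model']}:/model:ro"]
    cmd += [image, "bash", f"/app/{stage}.sh"]
    return cmd
```

Returning the argument list rather than executing it keeps the sketch side-effect free; a caller could pass the list to `subprocess.run` in an environment where Docker is installed.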

After checking that the “predictions.csv” file had the proper for-

mat and included all the patients in the evaluation dataset, the pipe-

line generated an area under the receiver-operating characteristic

curve (AUROC) score and returned this to the model developer

through Synapse. When the Docker model failed, a UW staff mem-

ber would look into the saved log files to assess the errors. Filtered

error messages were sent to the model developer for debugging pur-

poses. See Figure 3 for the full workflow diagram.
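The format check and scoring step can be sketched as follows, using only the Python standard library. This is a hedged illustration, not the evaluation script used at UW: the column names `person_id` and `score` are assumptions, and AUROC is computed with the rank-sum (Mann-Whitney) formulation, with tied scores given their average rank.

```python
import csv
import io
from typing import Dict, Iterable, List


def load_predictions(csv_text: str, expected_ids: Iterable[int]) -> Dict[int, float]:
    """Parse a predictions file and verify format and patient coverage.

    Hypothetical helper: the header names are assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["person_id", "score"]:
        raise ValueError("unexpected header")
    preds: Dict[int, float] = {}
    for row in reader:
        pid, score = int(row["person_id"]), float(row["score"])
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for patient {pid}")
        if pid in preds:
            raise ValueError(f"duplicate patient {pid}")
        preds[pid] = score
    if set(preds) != set(expected_ids):
        raise ValueError("predictions do not cover the evaluation dataset")
    return preds


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC via the rank-sum statistic; ties receive their average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

Rejecting a malformed file before scoring mirrors the order of operations described above: the AUROC is computed only once the file covers every evaluation patient.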

Model developer’s perspective

The model developer built and submitted models using 3 sets of features: (1) basic demographic information (age on the last visit date,

gender, and race); (2) basic demographic information and binary

indicators for 5 common chronic diseases (cancer, heart disease,

type 2 diabetes, chronic obstructive pulmonary disease, and

stroke)31; and (3) the 1000 most common concept_ids selected from

the procedure_occurrence, condition_occurrence, and drug_exposure domains in the OMOP dataset. For model 2, the developer used the OMOP vocabulary search engine, Athena (“Athena” n.d.), to identify 404 clinical condition_concept_ids associated with cancer, 76 condition_concept_ids with heart disease, 104 condition_concept_ids with type 2 diabetes, 11 condition_concept_ids with chronic obstructive pulmonary disease, and 153 condition_concept_ids with stroke (Table 1). A logistic regression model was fit to each of the 3 feature sets. All model scripts are available online (https://github.com/yy6linda/Jamia_ehr_predictive_model).
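The feature construction for models 2 and 3 might look roughly like the sketch below, which selects the most frequent concept IDs and turns them into per-patient binary indicators. This is not the code from the linked repository. The record layout as `(person_id, concept_id)` pairs, the function names, and the concept IDs used in any example are placeholders, not real OMOP identifiers.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Record = Tuple[int, int]  # (person_id, concept_id); layout is an assumption


def top_concepts(records: List[Record], n: int) -> List[int]:
    """Return the n most frequently occurring concept IDs."""
    counts = Counter(concept_id for _, concept_id in records)
    return [concept_id for concept_id, _ in counts.most_common(n)]


def indicator_matrix(
    records: List[Record], vocabulary: List[int]
) -> Dict[int, List[int]]:
    """Per-patient 0/1 vector over the chosen vocabulary (model 3 style)."""
    index = {concept_id: i for i, concept_id in enumerate(vocabulary)}
    rows: Dict[int, List[int]] = {}
    for person_id, concept_id in records:
        row = rows.setdefault(person_id, [0] * len(vocabulary))
        if concept_id in index:
            row[index[concept_id]] = 1
    return rows


def disease_flags(
    records: List[Record], disease_sets: Dict[str, Set[int]]
) -> Dict[int, Dict[str, int]]:
    """Per-patient binary flag for each disease concept set (model 2 style)."""
    flags: Dict[int, Dict[str, int]] = {}
    for person_id, concept_id in records:
        row = flags.setdefault(person_id, {name: 0 for name in disease_sets})
        for name, concept_ids in disease_sets.items():
            if concept_id in concept_ids:
                row[name] = 1
    return flags
```

The resulting vectors could then be passed to any classifier. The text reports logistic regression, but the fitting step is omitted here to keep the sketch dependency free.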

Model performance

The submitted models were evaluated at the University of Washington by comparing the output predictions of the models to the true

180-day mortality status of all the patients in the evaluation dataset.

The implementation of the logistic regression model, Model 1, using

only demographic information, had an AUROC of 0.693. Model 2,

using demographic information and 5 common chronic diseases,

yielded an AUROC of 0.861. Model 3, using demographic informa-

tion and the most common 1000 condition/drug/procedure con-

cepts, yielded an AUROC of 0.921 (Figure 4).

Benchmarking the capacity of fixed computing resources for running predictive models

We tested the capability of running models through the pipeline on

increasingly large feature sets using 2 machine learning algorithms:

logistic regression and neural networks. The models ran under fixed

computational resources: 70 GB of RAM and 4 vCPUs (a quarter of

the total available UW resources made available for this project).

This tested the feasibility of running multiple (here, 4) concurrent,

high-performance models on UW infrastructure for a community

challenge. A total of 6934 features were selected for this scalability test from condition_concept_ids that had more than 20 occurrences within 360 days of the last visit dates of patients in the training dataset. The 2 selected algorithms were applied to subsets of 1000, 2000, 3000, 4000, 5000, and 6000 of the selected condition_concept_ids. We used the Python scikit-learn (sklearn) package to build a logistic regression model and the Keras framework to build a 3-layer neural network model (dimension 25 * 12 * 2). For both models, we trained and inferred using the 6 different feature set sizes. We report here the run times and maximum memory usage (Figure 5). While run times scale linearly with the number of features, maximum memory usage scales in a slightly superlinear fashion.

Figure 3. Diagram for submitting and distributing containerized prediction models in a protected environment. Dockerized models were submitted to Synapse by a model developer to an evaluation queue. The Synapse Workflow Hook pulled in the submitted Docker image and built it inside the protected University of Washington (UW) environment. The model trained on the available electronic health record data and then made inferences on the evaluation dataset patients, outputting a prediction file with mortality probability scores for each patient. The prediction file was compared with a gold standard benchmark. The model’s performance, measured by area under the receiver-operating characteristic curve, was returned to the model developer. CWL: Common Workflow Language.
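A benchmarking loop of the kind reported for the scalability test could be sketched as below. The text does not say how run time and memory were measured, so this is an assumption throughout: the sketch uses `time.perf_counter` and `tracemalloc` from the standard library, and `fake_training` is a hypothetical stand-in for fitting a model. Note that `tracemalloc` tracks only memory allocated through Python's allocator, so it would under-report memory used by native libraries.

```python
import time
import tracemalloc
from typing import Any, Callable, Dict, List, Tuple


def measure(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float, int]:
    """Run fn and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def fake_training(n_features: int, n_rows: int = 200) -> List[List[float]]:
    """Hypothetical stand-in for model training: allocates a dense matrix."""
    return [[0.0] * n_features for _ in range(n_rows)]


def benchmark(feature_counts: List[int]) -> List[Dict[str, float]]:
    """Measure run time and peak memory for each feature-set size."""
    results = []
    for n in feature_counts:
        _, seconds, peak = measure(fake_training, n)
        results.append({"n_features": n, "seconds": seconds, "peak_bytes": peak})
    return results
```

Calling `benchmark([1000, 2000, 3000, 4000, 5000, 6000])` would mirror the six feature-set sizes in the text, with a real training function substituted for the placeholder.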

DISCUSSION

In this pilot project, we implemented the MTD framework in the

context of an institutional enterprise data warehouse and demon-

strated how a model developer can develop clinical predictive mod-

els without having direct access to patient data. This MTD

evaluation platform relied on a mutually agreed-upon set of expecta-

tions between the data-hosting institution and the model developer,

including the use of (1) a common data model (here, OMOP), (2) a

standard containerization platform (here, Docker), (3) predeter-

mined input and output file formats, (4) standard evaluation metrics

and scripts, and (5) a feedback exchange mechanism (here, Synapse).

While we focused on the specific task of mortality status prediction

in this pilot study, our platform would naturally be generalizable to

other prediction questions or data models. A well-documented com-

mon data model (here, OMOP v5.0) is essential to the successful op-

eration of the MTD approach. This framework, however, is not

limited to the designated OMOP version, nor the OMOP CDM, and

could be expanded to the PCORnet Common Data Model,32 i2b2,33

or any other clinical data model. The focus of the MTD framework

is to deliver containerized algorithms to private data, of any stan-

dardized form, without exposing the data. With increased computa-

tional resources, our platform could scale up to handle submissions

of multiple prediction models from multiple researchers. Our scal-

ability tests show that complex models on wide feature sets can be

trained and evaluated in this framework even with limited resources

(70 GB per submission). These resources, including more RAM,

CPUs, and GPUs, could be expanded in a cloud environment and

parallelized across multiple models. This scalability makes the MTD

approach particularly appealing in certain contexts as discussed in

the following sections.

MTD as a mechanism to standardize sharing, testing, and evaluation of clinical prediction models

Most clinical prediction models have been developed and

evaluated in isolation on site-specific or network-specific datasets,

without additional validation on external health record data from

other sites.27 By implementing an evaluation platform for common

clinical prediction problems, it would be possible to compare the

performance of models implementing different algorithms on the

same data and to test the robustness of the same model across differ-

ent sites, assuming those sites are using the same common data

model. This framework also motivates researchers to containerize

models for future reproduction. In the long term, we envision that

the MTD approach will enable researchers to test their predictive

models on protected health data without worrying about identification of patients and will inspire the ubiquitous use of dockerized containers as a standard means to deliver and customize predictive models across institutions. The proposed pipeline is not dependent on the clinical question under investigation nor on whether the study question involves all steps of model development (training, inference/test, open-ended prospective, and non-performance-related evaluation).

Table 1. Number of patients in the University of Washington Medicine Observational Medical Outcomes Partnership repository who have been diagnosed with cancer, heart disease, type 2 diabetes, chronic obstructive pulmonary disease, or stroke

                                                      Training set (n = 956 212)   Evaluation set (n = 336 548)
Patients with cancer                                  66 203 (6.9)                 42 195 (12.5)
Patients with heart disease                           31 352 (3.3)                 23 108 (6.9)
Patients with type 2 diabetes                         40 938 (4.3)                 28 234 (8.4)
Patients with chronic obstructive pulmonary disease   13 777 (1.4)                 8302 (2.5)
Patients with stroke                                  5216 (0.6)                   3927 (1.2)
Other patients                                        834 591 (87.3)               257 884 (76.6)

Values are n (%).

Figure 4. A comparison of the receiver-operating characteristic curves for the 3 mortality prediction models submitted, trained, and evaluated using the “Model to Data” framework. AUC: area under the curve; cdp: condition/procedure/drug.

Figure 5. Runtime and max memory usage for training predictive models in the benchmarking test.

MTD as a mechanism for enabling community challenges

Community challenges are a successful research model where groups

of researchers develop and apply their prediction models in response

to a challenge question(s), for which the gold standard truth is

known only to the challenge organizers. There have been a large

number of successful biomedical community challenges including

DREAM,28 CAFA,34,35 CAGI,36 and CASP.37 A key feature of some

of these challenges is the prospective evaluation of prediction mod-

els, an often unmet need in clinical applications. The MTD ap-

proach uniquely enables such an evaluation on live EDW data.

Based on our observations in this pilot study, we will scale up our

platform to initiate an EHR mortality community challenge at the

next stage, in which participants from different backgrounds will

join us in developing mortality prediction models.

Lessons learned and limitations

During the iterative process of model training and feedback exchange with the model developer, we discovered issues that will

have to be addressed in future implementations. First, the model de-

veloper had multiple failed submissions due to discrepancies be-

tween the synthetic data and real data. Devoting effort toward

improving the similarity between the synthetic data and the UW

data will help alleviate this barrier. Correcting differences in data

type, column names, and concept availability would allow model

developers to catch common bugs early in the development process.

Second, providing manually filtered log files (filtered by UW staff)

that are generated by the submitted models when running on UW

data as an iterative process can be cumbersome. We propose that

prior to running submitted models on the UW data, models should

first be run on the synthetic dataset hosted in an identical remote en-

vironment that would allow the return of all log files to support

debugging. This would allow any major errors or bugs to be caught

prior to the model running on the real data. Third, inefficiently writ-

ten prediction models and their containers burdened servers and sys-

tem administrators. The root cause of this issue was the model

developer’s difficulty in estimating the computing resources (RAM,

CPU) and time needed to run the submitted models. We can use the

same synthetic data environment as solution 2 to estimate run time

and RAM usage on the full dataset prior to running the model on

the real data. Fourth, the model developer was unaware of the data

distributions or even the terminologies for certain variables making

feature engineering difficult. Making a data dictionary with the

more commonly used concept codes from the UW data available to

the model developer will enable smarter feature engineering.
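The first lesson, catching differences in data types and column names between the synthetic data and the UW data, could in principle be partly automated with a schema comparison run before any submission. The Python sketch below is illustrative only and is not part of the pilot pipeline: the SchemaReport class, the compare_schemas function, and the example column names and type labels are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaReport:
    """Differences found when comparing a synthetic schema to a real one."""
    missing_in_synthetic: list = field(default_factory=list)  # columns only in the real data
    missing_in_real: list = field(default_factory=list)       # columns only in the synthetic data
    type_mismatches: dict = field(default_factory=dict)       # column -> (synthetic type, real type)

    @property
    def ok(self) -> bool:
        """True when no discrepancy of any kind was found."""
        return not (self.missing_in_synthetic
                    or self.missing_in_real
                    or self.type_mismatches)


def compare_schemas(synthetic: dict, real: dict) -> SchemaReport:
    """Compare two {column_name: type_label} mappings (hypothetical helper)."""
    report = SchemaReport()
    report.missing_in_synthetic = sorted(set(real) - set(synthetic))
    report.missing_in_real = sorted(set(synthetic) - set(real))
    # For columns present on both sides, record any disagreement in type label.
    for column in sorted(set(synthetic) & set(real)):
        if synthetic[column] != real[column]:
            report.type_mismatches[column] = (synthetic[column], real[column])
    return report


if __name__ == "__main__":
    # Illustrative schemas only; these are not the actual UW or synthetic schemas.
    synthetic_schema = {"person_id": "int", "year_of_birth": "int", "gender": "str"}
    real_schema = {"person_id": "int", "year_of_birth": "float", "death_date": "str"}
    report = compare_schemas(synthetic_schema, real_schema)
    print("schemas match:", report.ok)
    print("missing in synthetic:", report.missing_in_synthetic)
    print("missing in real:", report.missing_in_real)
    print("type mismatches:", report.type_mismatches)
```

Because only column names and type labels are compared, a report of this kind would reveal schema structure but no patient-level values; whether even that may be shared would still be a decision for the data host.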

The presented pilot predictive models are relatively simple. However, the MTD framework is also compatible with more complicated machine learning algorithms and feature engineering. Future researchers can dockerize more complicated predictive models with more advanced feature engineering and send them through our pipeline as Docker images. Our pipeline is able to execute these Docker images on real data and return scores. Model interpretation, such as feature importance scores, is also feasible under this framework if the feature importance calculation is embedded in the Docker models and written to a designated directory in the Docker container. After checks for information leakage, UW IT would be able to share that information with the model developer to further interpret their models. However, the remote nature of the MTD framework limits the opportunities for manual hyperparameter tuning, which usually requires direct interaction with the data. Hyperparameters are model parameters that are set before the model’s training stage (eg, learning rate, number of layers in a neural network). Automated methods for tuning hyperparameters, in contrast, work with the proposed pipeline. The emergence of AutoML, as well as other algorithms including grid and random search, reinforcement learning, evolutionary algorithms, and Bayesian optimization, allows hyperparameter optimization to be automated and efficient.38
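To illustrate why automated tuning fits the MTD setting, the following Python sketch shows a minimal random search that could run entirely inside a submitted container, with no manual inspection of the data. It is a sketch under stated assumptions, not the method used in this pilot: random_search, toy_validation_score, and EXAMPLE_SPACE are hypothetical names, and the toy scoring function stands in for a real train-and-validate step on data the developer never sees.

```python
import random

# Hypothetical search space: each hyperparameter maps to its candidate values.
EXAMPLE_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "n_layers": [1, 2, 3],
}


def toy_validation_score(params: dict) -> float:
    """Stand-in for training a model and scoring it on held-out data.

    In a real container this would fit the model with `params` and return
    a validation metric; here it is a synthetic function for illustration.
    """
    return -abs(params["learning_rate"] - 0.1) - 0.01 * abs(params["n_layers"] - 2)


def random_search(train_and_score, space: dict, n_trials: int = 20, seed: int = 0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)  # fixed seed so a rerun in the container is reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one candidate value per hyperparameter, independently.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    params, score = random_search(toy_validation_score, EXAMPLE_SPACE, n_trials=50)
    print("best hyperparameters:", params)
    print("best validation score:", score)
```

Only the selected hyperparameters and the final score would need to leave the secure environment, which is consistent with the pipeline's practice of returning performance metrics rather than data.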

CONCLUSION

We demonstrate the potential impact of the MTD framework to bring clinical predictive models to private data by operationalizing this framework to enable a model developer to build mortality prediction models using protected UW Medicine EHR data without gaining access to the dataset or the clinical environment. This work serves as a demonstration of the MTD approach in a real-world clinical analytics environment. We believe this enables future predictive analytics sandboxing activities and the safe development of new clinical predictive methods. We are extending this work to enable the EHR DREAM Challenge: Patient Mortality Prediction as a further demonstration.

FUNDING

This work was supported by the Clinical and Translational Science Awards Program National Center for Data to Health, with funding from the National Center for Advancing Translational Sciences at the National Institutes of Health (grant numbers U24TR002306 and UL1 TR002319). Any opinions expressed in this document are those of the Center for Data to Health community and the Institute for Translational Health Sciences and do not necessarily reflect the views of the National Center for Advancing Translational Sciences, team members, or affiliated organizations and institutions. TB, YY, SM, JG, TY, TS, and JP were supported by grant number U24TR002306. TB, JP, and SM were supported by grant number UL1 TR002319. VP is supported by the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery and the Moore/Sloan Data Science Environments Project at the University of Washington and the National Institutes of Health grant number K99 LM012992.

AUTHOR CONTRIBUTIONS

TB managed and curated the Observational Medical Outcomes Partnership repository, implemented the “Model to Data” (MTD) framework, and was a major contributor in writing the paper. YY built the mortality prediction models, tested the MTD framework, and was a major contributor in writing the paper. TS was a major contributor to the design and management of the study and in writing the paper. VP and NH were contributors in writing the paper. JP and TY were contributors in maintaining and implementing the MTD pipeline and were contributors in writing the paper. SM and JG conceived of the project with TB and YY as well as funding and overseeing scientific progress. All authors read and approved the final paper.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American

Medical Informatics Association online.

ACKNOWLEDGMENTS

We would like to thank Drs. Gang Luo, Kari Stephens, Martin

Gunn, Aaron Lee, Meliha Yetisgen, and Su-In Lee for their advice

and efforts in planning this project.

CONFLICT OF INTEREST STATEMENT

The authors have no competing interest to declare.

REFERENCE LIST

1. Charles D, Gabriel M, Searcy T, et al. Adoption of electronic health record systems among US non-federal acute care hospitals: 2008–2014. ONC data brief 2015; 23. https://www.healthit.gov/sites/default/files/data-brief/2014HospitalAdoptionDataBrief.pdf Accessed September 05, 2018.

2. Heisey-Grove D, Patel V. Physician Motivations for Adoption of Electronic Health Records. Washington, DC: Office of the National Coordinator for Health Information Technology; 2014.

3. Birkhead GS, Klompas M, Shah NR. Uses of electronic health records for public health surveillance to advance public health. Annu Rev Public Health 2015; 36 (1): 345–59.

4. Jones SS, Rudin RS, Perry T, et al. Health information technology: an updated systematic review with a focus on meaningful use. Ann Intern Med 2014; 160 (1): 48–54.

5. Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc 2018; 25 (10): 1419–28.

6. Miotto R, Wang F, Wang S, et al. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 2018; 19 (6): 1236–46.

7. Kaji DA, Zech JR, Kim JS, et al. An attention based deep learning model of clinical events in the intensive care unit. PLoS One 2019; 14 (2): e0211057.

8. Abouelmehdi K, Beni-Hessane A, Khaloufi H. Big healthcare data: preserving security and privacy. J Big Data 2018; 5: 1. doi:10.1186/s40537-017-0110-7.

9. Allen C, Des Jardins TR, Heider A, et al. Data governance and data sharing agreements for community-wide health information exchange: lessons from the beacon communities. EGEMS (Wash DC) 2014; 2 (1): 1057.

10. Reps JM, Schuemie MJ, Suchard MA, et al. Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data. J Am Med Inform Assoc 2018; 25: 969–75.

11. Hripcsak G, Duke JD, Shah NH, et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform 2015; 216: 574–8.

12. Klann JG, Joss MAH, Embree K, et al. Data model harmonization for the All Of Us Research Program: Transforming i2b2 data into the OMOP common data model. PLoS One 2019; 14 (2): e0212463.

13. Garfinkel SL. De-Identification of Personal Information. Gaithersburg, MD: National Institute of Standards and Technology; 2015. doi:10.6028/NIST.IR.8053.

14. Malin B, Sweeney L, Newton E. Trail re-identification: learning who you are from where you have been. Pittsburgh, PA: Carnegie Mellon University, Laboratory for International Data Privacy; 2003.

15. Desautels T, Calvert J, Hoffman J, et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR Med Inform 2016; 4 (3): e28.

16. Choi E, Biswal S, Malin B, et al. Generating multi-label discrete patient records using generative adversarial networks. arXiv:1703.06490. 2017.

17. Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data 2016; 3 (1): 160035.

18. Foraker R, Mann DL, Payne P. Are synthetic data derivatives the future of translational medicine? J Am Coll Cardiol Basic Trans Science 2018; 3 (5): 716–8.

19. Murray RE, Ryan PB, Reisinger SJ. Design and validation of a data simulation model for longitudinal healthcare data. AMIA Annu Symp Proc 2011; 2011: 1176–85.

20. Walonoski J, Kramer M, Nichols J, et al. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc 2018; 25: 230–8.

21. Guinney J, Saez-Rodriguez J. Alternative models for sharing confidential biomedical data. Nat Biotechnol 2018; 36 (5): 391–2.

22. Docker. https://www.docker.com/ Accessed August 9, 2019.

23. Sylabs.io. Singularity. https://sylabs.io/ Accessed November 18, 2019.

24. Ellrott K, Buchanan A, Creason A, et al. Reproducible biomedical benchmarking in the cloud: lessons from crowd-sourced data challenges. Genome Biol 2019; 20 (1): 195.

25. Ge W, Huh J-W, Park YR, et al. An interpretable ICU mortality prediction model based on logistic regression and recurrent neural networks with LSTM units. AMIA Annu Symp Proc 2018; 2018: 460–9.

26. Avati A, Jung K, Harman S, et al. Improving palliative care with deep learning. BMC Med Inform Decis Mak 2018; 18 (Suppl 4): 122.

27. Goldstein BA, Navar AM, Pencina MJ, et al. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc 2017; 24 (1): 198–208.

28. Saez-Rodriguez J, Costello JC, Friend SH, et al. Crowdsourcing biomedical research: leveraging communities as innovation engines. Nat Rev Genet 2016; 17 (8): 470–86.

29. Omberg L, Ellrott K, Yuan Y, et al. Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas. Nat Genet 2013; 45 (10): 1121–6.

30. Lambert CG, Amritansh Kumar P. Transforming the 2.33M-patient Medicare synthetic public use files to the OMOP CDMv5: ETL-CMS software and processed data available and feature-complete. Albuquerque, NM: Center for Global Health, University of New Mexico; 2016.

31. Weng SF, Vaz L, Qureshi N, et al. Prediction of premature all-cause mortality: A prospective general population cohort study comparing machine-learning and standard epidemiological approaches. PLoS One 2019; 14 (3): e0214365.

32. Fleurence RL, Curtis LH, Califf RM, et al. Launching PCORnet, a national patient-centered clinical research network. J Am Med Inform Assoc 2014; 21 (4): 578–82.

33. Murphy SN, Weber G, Mendis M, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Inform Assoc 2010; 17 (2): 124–30.

34. Radivojac P, Clark WT, Oron TR, et al. A large-scale evaluation of computational protein function prediction. Nat Methods 2013; 10 (3): 221–7.

35. Jiang Y, Oron TR, Clark WT, et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol 2016; 17 (1): 184.

36. Cai B, Li B, Kiga N, et al. Matching phenotypes to whole genomes: Lessons learned from 4 iterations of the personal genome project community challenges. Hum Mutat 2017; 38 (9): 1266–76.

37. Moult J. A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 2005; 15 (3): 285–9.

38. He X, Zhao K, Chu X. AutoML: a survey of the state-of-the-art. arXiv:1908.00709. 2019.
