How to Build a Health Data AI Strategy

An AI strategy is a department-wide approach to realizing the value of AI and establishing policies and practices that facilitate its growth across that specific sector. The Executive Order on Maintaining American Leadership in Artificial Intelligence outlined a number of strategic objectives to ensure that the United States stays at the forefront of AI development. Those key directives include:

Promote Investment

Promote sustained investment in AI R&D in collaboration with industry, academia, international partners and allies, and other non-federal entities.

Enhance Access

Enhance access to high-quality and fully traceable federal data, models, and computing resources to increase the value of such resources for AI R&D, while maintaining safety, security, privacy, and confidentiality protections consistent with applicable laws and policies.

Reduce Barriers 

Reduce barriers to the use of AI technologies to promote their innovative application while protecting American technology, economic and national security, civil liberties, privacy, and values.

Ensure Technical Standards

Ensure that technical standards minimize vulnerability to attacks from malicious actors and reflect federal priorities for innovation, public trust, and public confidence in systems that use AI technologies; and develop international standards to promote and protect those priorities.

Train the Next Generation

Train the next generation of American AI researchers and users through apprenticeships; skills programs; and education in science, technology, engineering, and mathematics (STEM), with an emphasis on computer science, to ensure that American workers, including federal workers, are capable of taking full advantage of the opportunities of AI.

Develop an Action Plan

Develop and implement an action plan, in accordance with the National Security Presidential Memorandum of February 11, 2019 (Protecting the United States Advantage in Artificial Intelligence and Related Critical Technologies) (the NSPM), to protect the advantage of the United States in AI and technology critical to United States economic and national security interests against strategic competitors and foreign adversaries.

Invest in IT Infrastructure and Expertise to Support AI

AI demands a robust information technology (IT) infrastructure, including data infrastructure, and staff with the skills to apply it. Both infrastructure and expertise must be able to manage the large amounts of data needed to support AI as well as the development of advanced AI applications.

Executive Order: 

Actionable Opportunities: 

Develop comprehensive technology investment plans to support organizational AI strategies. Administrators should improve legacy systems for managing the data that will fuel AI applications and pursue IT modernization overall. These plans may also include public-private collaboration to reduce the cost of technical improvements.

Build expertise in designing and implementing AI applications. Most federal departments have limited organizational knowledge of, and experience in, developing AI applications with their data. A department should bridge this gap through hiring programs, public-private collaborations, or fellowship programs that bring AI experts into government on a temporary basis.

Create national testbeds for AI development. Industry has led the development of AI applications because commercial companies collect massive amounts of data and have the resources, expertise, and technical capacity to apply it. Federal agencies can help lower the resulting barriers to entry by creating collaborative environments where data and code for AI applications can be tested, stored, and shared.

Ensure Access to Data for AI While Protecting Privacy

Privacy concerns are paramount when handling data for AI applications, especially in healthcare. While the use of health data in EHRs and other medical records is governed by federal and state legislation, other data types, like IoT data, are regulated only through “terms of service” agreements developed by the private sector. An agency seeking to develop an AI strategy will need to ensure that sensitive information is not disclosed or misused when these data sources are applied. At the same time, researchers need to be able to use sensitive data appropriately to develop new insights, diagnostic methods, and treatments.

Actionable Opportunities:

Provide guidance for de-identifying sensitive data. In order to protect privacy, health data can be de-identified in different ways before researchers analyze it. Data scientists, for example, have utilized codes that make it possible to link data about an individual from different sources without revealing the person’s identity. An agency should provide additional guidance on de-identification methods to protect data privacy and security while encouraging its use for AI applications.
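As one illustration of the linkage codes described above, the following minimal sketch uses a keyed hash (HMAC-SHA256) to derive the same code for the same person in two different sources. This is an assumption about one possible technique, not the method any particular agency or research team uses. The function name, the sample records, and the secret key are all hypothetical, and pseudonymization of this kind does not by itself satisfy any legal de-identification standard.

```python
import hashlib
import hmac


def linkage_code(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Return a keyed hash that stands in for a person's identity."""
    # Normalize so trivial formatting differences do not break the link.
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be held by a trusted party, not hard-coded.
KEY = b"example-secret-key"

# Hypothetical records describing the same person in two different sources.
clinic_record = {"first": "Ana", "last": "Diaz", "dob": "1980-02-14", "a1c": 6.9}
claims_record = {"first": " ana", "last": "DIAZ ", "dob": "1980-02-14", "claims": 3}

clinic_code = linkage_code(KEY, clinic_record["first"], clinic_record["last"], clinic_record["dob"])
claims_code = linkage_code(KEY, claims_record["first"], claims_record["last"], claims_record["dob"])

# De-identified rows keep only the code and the analytic fields.
deidentified = {clinic_code: {"a1c": clinic_record["a1c"]}}
deidentified.setdefault(claims_code, {}).update({"claims": claims_record["claims"]})
```

Because both sources produce the same code, the analytic fields can be joined without names or birth dates ever appearing in the research dataset. Anyone without the key cannot recompute the codes from known identities, which is why a keyed hash is used here rather than a plain hash.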

Develop credentialing systems for controlled access to sensitive health data. Some sensitive data maintained by the federal government, such as collections of genomic data, are now available only to qualified researchers, under agreements that prohibit them from sharing the data more widely. Agencies will want to explore the possibility of developing credentialing systems to determine who should have access to what kinds of data and under what conditions.
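The question of who should have access to what kinds of data, and under what conditions, can be sketched as a simple rule check. The example below is a hypothetical illustration only: the `Credential` fields, the tier names, and the `may_access` function are assumptions made for this sketch and do not describe any existing federal system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Credential:
    researcher_id: str
    approved_tiers: frozenset  # data tiers the researcher is cleared for
    agreement_signed: bool     # has accepted the no-redistribution agreement
    expires: date              # last day the credential is valid


def may_access(cred: Credential, dataset_tier: str, today: date) -> bool:
    """Grant access only if the credential is signed, current, and covers the tier."""
    return (
        cred.agreement_signed
        and today <= cred.expires
        and dataset_tier in cred.approved_tiers
    )


# Hypothetical researcher cleared for public and controlled genomic data.
cred = Credential(
    researcher_id="r-001",
    approved_tiers=frozenset({"public", "controlled-genomic"}),
    agreement_signed=True,
    expires=date(2030, 1, 1),
)
```

A real credentialing system would also need identity verification, audit logging, and revocation, which are outside the scope of this sketch.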

Use Standards to Improve Data Quality and Interoperability 

Data for AI applications should be clean, timely, accurate, and standardized. Roundtable participants identified numerous challenges related to integrating data and metadata from multiple sources. Common standards for data collection and management can ensure that data and metadata are accurate and consistent across healthcare applications, using a shared library of variables that are applied across datasets. Standardization also ensures that datasets will be interoperable between an organization and its external partners.

Actionable Opportunity: 

Adopt and expand existing common data models. Many participants noted the value of adopting existing common data models for data and metadata wherever possible. Common data models standardize the way information is structured and make it easier to use in combination with other data. Examples of existing common data models include the National Patient-Centered Clinical Research Network (PCORnet) model and the Observational Medical Outcomes Partnership (OMOP) model. Participants also mentioned ICD-10, a widely used coding system that could be expanded to improve health data quality and interoperability nationwide.
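To show what standardizing the structure of information can look like in practice, the sketch below maps a hypothetical source patient record onto a few fields named after the OMOP person table. The source field names (`patient_id`, `sex`, `birth_date`) and the `to_person_row` function are assumptions for illustration. The gender concept IDs are the commonly used OMOP standard values, but they should be checked against the current OMOP vocabulary before any real use.

```python
# OMOP gender concept IDs; 0 is the convention for "no matching concept".
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def to_person_row(source: dict) -> dict:
    """Map a hypothetical source patient record to OMOP-style person fields."""
    sex = str(source.get("sex", "")).strip().lower()
    return {
        "person_id": int(source["patient_id"]),
        "gender_concept_id": GENDER_CONCEPTS.get(sex, 0),
        # Assumes birth_date is an ISO string such as "1975-03-09".
        "year_of_birth": int(source["birth_date"][:4]),
        # Keep the original value so the mapping can be audited later.
        "gender_source_value": source.get("sex", ""),
    }


row = to_person_row({"patient_id": "42", "sex": "Female", "birth_date": "1975-03-09"})
```

Once every contributing source is mapped to the same fields and codes, datasets from different organizations can be combined without per-source translation logic.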

Remove Administrative Barriers to Data Sharing

AI applications are most effective when they can integrate large amounts of data about diverse facets of health, yet researchers often have difficulty accessing the data they need from government sources. To obtain data from other sources, researchers must have Data Use Agreements (DUAs) in place to govern data sharing between different federal agencies and other independent units. Drawing up and approving separate DUAs takes time and administrative resources, placing a burden on researchers and slowing down the research process.

Actionable Opportunity: 

Update and standardize data use agreements. A set of standard DUAs, using common terms and conditions, could accelerate and simplify data sharing between operating agencies within HHS. Revised DUAs could substantially reduce the time it takes for HHS researchers to request and receive important, time-sensitive data; finalizing a DUA can currently take up to 12 months. Standard DUAs for internal use within HHS could also become a model for agreements between HHS and outside partners.

Clarify Appropriate Use of Patient-Generated Data

Increasingly, patients are generating data about themselves that can complement research and clinical data. Patient-generated data includes data collected through sensors and wearables, and through social media and mobile applications. Large amounts of this data are collected under “terms of service” agreements and are being used by entities that are not covered by HIPAA. As interest in patient-generated data increases, there is a need for clearer rules around its appropriate use, particularly in the context of AI development.

Actionable Opportunities: 

Develop specific guidelines for entities not covered by HIPAA. HIPAA applies to traditional entities, such as health plans and healthcare providers, but does not apply to software development and social media companies that may be collecting patient-generated data with sensitive health information. While the HHS Office for Civil Rights has developed several informational resources for health app developers, both the entities that are not covered by HIPAA and the individuals whose data they collect would benefit from further guidance and best practices on appropriate uses of patient-generated data.

Address Concerns About Accountability and Bias

Many AI applications that use health data are being developed as a “black box” without clear information about the algorithms and data being used to make decisions. AI strategies should include steps to address concerns about accountability, bias, and oversight. This will require improved transparency for both AI algorithms and the data that supports them.

Actionable Opportunities:

Develop guidelines for mitigating bias in health-related AI applications. Some Roundtable participants expressed interest in having HHS and its partners develop guidance for identifying and reducing bias in AI applications. Participants also suggested including an internal HHS review function to enforce such guidelines and help increase transparency.

Pilot and implement an FDA regulation for health-related AI applications. Roundtable participants expressed similar concerns about a lack of quality assurance and oversight for AI development in healthcare. The Food and Drug Administration (FDA) has established a set of iterative, agile guidelines to precertify the rapid development of Software as a Medical Device (SaMD). The FDA should continue its efforts to adopt a revised regulatory framework for AI applications in which proposed changes to algorithms must be disclosed to the FDA prior to market release. This framework should take into account the ability of AI applications to adapt in real time and provide ways to assess any risks from those changes.

Publish metadata about data sources. Metadata provides information about the structure of a dataset, the meaning of each variable within the data, and other important characteristics. It can also describe the source of the data, the way it was collected, and other factors that may indicate potential causes of bias. Publishing metadata will make it easier to assess whether the data and the algorithms they support are at risk of bias.
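A published metadata record covering the elements listed above might look like the following minimal sketch. The field names, the required-field list, and all values are hypothetical placeholders chosen for illustration; an agency would more likely adopt an established metadata schema than define its own.

```python
import json

# All values are placeholders for a hypothetical dataset.
metadata = {
    "title": "Example outpatient visits dataset",
    "source": "Hypothetical regional clinic network",
    "collection_method": "Extracted from EHR encounter records",
    "collection_period": {"start": "2018-01-01", "end": "2018-12-31"},
    # Notes like this help reviewers spot potential causes of bias.
    "population_notes": "Adults only; rural clinics under-represented",
    "variables": [
        {"name": "age", "type": "integer", "unit": "years", "description": "Age at visit"},
        {"name": "a1c", "type": "number", "unit": "%", "description": "Hemoglobin A1c result"},
    ],
}

# Hypothetical minimum set of fields a record must carry before publication.
REQUIRED = ("title", "source", "collection_method", "variables")


def missing_fields(record: dict) -> list:
    """List required metadata fields that are absent or empty."""
    return [name for name in REQUIRED if not record.get(name)]


# Serialize to JSON so the record can be published alongside the dataset.
published = json.dumps(metadata, indent=2, sort_keys=True)
```

A check such as `missing_fields` could run before release so that datasets lacking source or collection details are flagged rather than published without context.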

Case Study Example: The National Artificial Intelligence Institute

“The NAII seeks to develop AI research and development capabilities in VA as a means to support Veterans, their families, survivors, and caregivers. The NAII designs and collaborates on large-scale AI R&D initiatives, national AI policy, and partnerships across agencies, industries, and academia. The NAII is a joint initiative by the Office of Research and Development and the Office of the Secretary’s Center for Strategic Partnerships in VA. The NAII is dedicated to advancing AI research and development for real-world impact and outcomes to ensure Veteran health and well-being.”

For more information: