How can providers practice proper data use and sharing of patient data?

What are the common risks across data types?

Many of the benefits of accessing datasets that include sensitive PHI are well documented, including improved preventive care, clinical outcomes, and coordination of care. Similarly, many of the major health data types pose shared challenges and issues that can negatively impact patients and communities. 

Incomplete Data. Healthcare providers frequently make clinical decisions based on data that is incomplete or lacks context from complementary sources. Physicians often base healthcare decisions solely on a patient’s EHR or claims data, missing the additional insights and context that genomic or social determinants of health (SDOH) data could provide.

Possibility of Re-Identification Through Other Datasets. De-identification is a key strategy for preserving the anonymity of individuals and safeguarding sensitive protected health information (PHI). Although data scientists have made progress in de-identifying data by removing key identifiers and anonymizing records, these safeguards are sometimes insufficient. Recent research has shown that re-identifying patient data is a real possibility: machine learning can combine anonymized data with other third-party data in a process called the “mosaic effect.” 

Inappropriate Data Sharing and Use. Personal health data can be misused by third-party providers that violate informed consent, or by data brokers that illegally obtain PHI without a patient’s knowledge or permission. While data sharing between covered entities is common, patients are rarely aware when it occurs. Most are unaware of the limits of approved data use and may not immediately realize when their right to privacy has been breached.

Lack of Knowledge Around Proper Data Safeguards and Protection Guidelines. Many patients are unaware that the mobile applications, medical devices, voice assistants, and other technologies that collect their data are sold and managed by companies not covered under HIPAA. A recent survey showed that nearly one third of healthcare providers do not have a HIPAA compliance plan, and the same proportion are uncertain about security safeguards for personal data. Moreover, many patients are unaware of the specific situations in which a healthcare entity covered by HIPAA’s rules is allowed to access PHI. Patients and providers alike may be misled by the privacy policies of third-party companies that are not bound by HIPAA’s data safeguards.


How can healthcare providers reduce risk? 

Data Minimization. Healthcare providers should be familiar with the data minimization principle: collect and use only the data necessary to accomplish the task at hand. Limiting collection in this way reduces the possibility of unnecessarily gathering sensitive information about an individual. Providers should also be aware that research analysis using machine learning and AI often requires larger amounts of data to be meaningful, and that patient consent for data use may become increasingly complex as data needs broaden.
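As a minimal sketch of the principle, the snippet below filters a record down to only the fields a given task requires before the data leaves the provider’s system. The field names and the billing use case are hypothetical, chosen purely for illustration.

```python
# Minimal illustration of data minimization: share only the fields a
# task requires. All field names here are hypothetical examples.

REQUIRED_FIELDS_FOR_BILLING = {"patient_id", "procedure_code", "service_date"}

def minimize_record(record: dict, required_fields: set) -> dict:
    """Return a copy of the record containing only the required fields."""
    return {k: v for k, v in record.items() if k in required_fields}

full_record = {
    "patient_id": "P-1042",
    "procedure_code": "99213",
    "service_date": "2023-05-01",
    "diagnosis": "hypertension",   # sensitive, not needed for this task
    "home_address": "123 Elm St",  # sensitive, not needed for this task
}

shared = minimize_record(full_record, REQUIRED_FIELDS_FOR_BILLING)
# 'shared' now contains only the three billing fields
```

The same filtering step can be applied per purpose, so a billing workflow and a research workflow each receive only their own minimum-necessary slice of the record.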


De-identification and anonymization strategies. These strategies seek to remove sensitive personally identifiable information (PII) from individual and population-level data, or to make it difficult to trace data back to its source. They can include:

  • Providing Anonymized Identifiers: These identifiers allow researchers to connect disparate datasets while preserving the privacy of individuals.
  • Removing Non-Critical Information: Researchers can remove identifying variables such as ZIP code digits, Social Security numbers, account information, and other identifying details.
  • Leveraging Synthetic Data: Synthetic data is produced by “a complex statistical model that generates a simulated population that has the same general features as the original data.”
  • Applying Differential Privacy: Differential privacy places constraints on algorithms that rely on inputs from a database of information. This masks the personal information so an external user cannot determine if an individual’s information was used in the computation process.
  • Generalized Statistical Approaches: Statistical approaches often include adding “noise” to the data to obscure specific variables such as age range or location.
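Two of the strategies above, anonymized identifiers and statistical noise of the kind used in differential privacy, can be sketched in a few lines of Python. This is an illustrative sketch only: the secret key, epsilon value, and sixteen-character pseudonym length are arbitrary assumptions, not recommended parameters.

```python
import hashlib
import hmac
import random

# Illustrative sketch, not a production implementation.
# (1) A keyed hash yields a stable pseudonym, letting researchers link
#     records across datasets without seeing the raw identifier.
# (2) Laplace noise added to an aggregate count masks any individual's
#     contribution, as in differential privacy.

SECRET_KEY = b"replace-with-a-securely-stored-key"  # hypothetical key

def anonymized_id(patient_id: str) -> str:
    """Stable pseudonym; only holders of SECRET_KEY can reproduce it."""
    digest = hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

def laplace_noise(scale: float) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

true_count = 128   # e.g., patients in a study cohort
epsilon = 1.0      # privacy budget: smaller epsilon means more noise
noisy_count = true_count + laplace_noise(1 / epsilon)
```

Note that pseudonymization alone does not prevent the re-identification risk described above; these techniques are typically combined with the other safeguards on this list.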

Healthcare providers should be aware that care coordination may become difficult if certain aspects of a patient’s PHI are restricted. Additionally, interoperability may be impacted if databases follow different standards, making data sharing between them more difficult.


Patient-Based Differential Access. This approach enables individuals to grant access to their personal data for the benefit of public research. Patients may opt in and provide consent to use their personal data for a specific purpose, such as studying a rare disease or identifying genetic trends. Researchers may then access the data within the parameters of the patient’s original consent. Providers should be aware that this process requires them to share clear and detailed information about how their patients’ data is intended to be used, and that a patient’s initial consent may not cover future legitimate uses of data and can be revoked.


Incidental Use and Disclosure: HIPAA permits certain incidental uses and disclosures that may occur as a by-product of another, permissible use of data. They are allowed as long as the covered entity has instituted a reasonable set of technical, administrative, and physical safeguards. However, poor definitions of incidental and secondary use can create confusion and hinder accountability for inappropriate uses of health data.

What are appropriate data use techniques to avoid discrimination?

The misuse of PHI can leave patients at greater risk of discrimination and financial exploitation. Such harms can compound at the community level, as the analysis of large-scale health datasets may benefit one group at the expense of others. Community redlining and individual financial discrimination are potential consequences of the misuse of health data. As new kinds of data analysis are developed, comprehensive guidelines and measures are needed to reduce the possibility of data misuse and resulting discrimination.

HIPAA includes several key nondiscrimination measures to ensure that insurance companies cannot increase premiums or exclude members based on their health status. Health status includes a series of factors including medical conditions, claims experience, receipt of healthcare, disability, and evidence of insurability. This provision is complemented by the Genetic Information Nondiscrimination Act (GINA), which similarly aims to prohibit employer- or insurance-based discrimination based on an individual’s genetic information.

HIPAA also builds on the data minimization principle with its “Minimum Necessary” use clause, which discourages covered entities from gathering noncritical information about a person. Specifically, a covered entity “must make reasonable efforts to use, disclose, and request only the minimum amount of protected health information needed to accomplish the intended purpose of the use, disclosure, or request.”

The E-Government Act of 2002 mandates that any agency that collects PII must evaluate the security of its systems to ensure adequate data protection. Most federal agencies achieve this by conducting a privacy impact assessment (PIA) of their operational and developmental systems. HHS publishes all of the PIAs from its various operating divisions and also shares the PIAs from its third-party websites.