Why is my data needed in clinical or genomic research, and what should I consider before participating?

When researchers and doctors pool your health data with data from people with similar conditions, they can look for patterns that improve diagnosis and care in a number of ways. Your health data will help researchers:

  • improve diagnostics and clinical care
  • understand causes of diseases 
  • develop new treatments and preventive care methods 
  • improve patient safety
  • improve population health management by payers and providers
  • evaluate government health policy

In all of these ways, your data can contribute to the advancement of health research and care for everyone. If you are a member of a historically vulnerable or underserved community, your data is all the more relevant because clinical research in the past has not always included people like you.

To understand de-identification, it is important to first understand the concept of ‘protected health information’ or ‘PHI’. PHI is information about a person's health, their health condition, or the payment for health services rendered to them, when that information can be tied to the person. For example, your date of birth is not PHI on its own. However, when it is linked with any of your health information (for example, the fact that you were a patient at X hospital), it becomes PHI. Please note that the standards applied by providers vary: for example, some researchers will take the position that any information about a person’s participation in a study is PHI.

The HIPAA regulations apply to your data only when it is considered PHI. De-identified data is not considered PHI under the law. Therefore, once your provider has de-identified your data and shared it with researchers, HIPAA no longer protects it. However, when the provider shares data in the form of a limited data set, it is still considered PHI.

You should take steps to ensure that you understand the benefits and risks of participating in research being conducted by your healthcare provider or any other research program attempting to enroll you. Research institutes tend to treat the process of your consent as a regulatory compliance exercise or a one-off discussion. However, thanks to work by patient advocates and other groups, there is a concerted push towards adopting better models of patient engagement through, for example, ‘dynamic consent’. You might also want to ensure that the research aims do not conflict with your moral or other values.

You can use the following framework (informed by the recommendations of the Clinical Trials Transformation Initiative) to evaluate whether the research institute conducting the trial is using ethical processes and ensuring informed consent from its subjects:

  • Ongoing and interactive conversation: Do the researchers treat the process of obtaining your consent as an ongoing and interactive communication? Ideally, the researchers should not simply administer a consent form at the beginning of the trial with limited to no engagement afterwards.
  • Customized consent: Have the researchers taken care to customize the consent process to the particular needs of the patient group you are part of?
  • Communication and responsiveness: Are the people taking your consent able to effectively communicate information specific to your trial? Are they responsive to the needs of your patient group?
  • Discussion tools and interactive techniques: Do the researchers use discussion tools and interactive techniques to ensure that your consent is well informed? Some factors you might consider are whether the researchers respond well to your education level, health literacy, disabilities, and other factors that may impact your ability to provide consent.
  • Plain language explanations: Do they explain in plain language important considerations in a clinical trial such as its purpose, how long it will last, compensation for injury, and how you can withdraw? Do they explain in plain language what steps they are taking to protect your data and what de-identification techniques they employ while sharing your data?

Population genomic data projects have been launched around the world to make advances in precision medicine. These are often launched by national governments to learn more about complex diseases caused by a combination of genetic and environmental factors. These programs are also increasingly embracing the new wave of ‘dynamic consent’ models, which are based on personalized communication platforms that aim to make the consent process a continuous two-way communication between researchers and participants.

‘Dynamic consent’ models have many features that you can use to evaluate how much control they will give you over how your data is used:

  • Does the program include an online portal to deliver information and allow you to change permissions or preferences?
  • Does the program allow you the opportunity to accept or decline participation in new research opportunities or studies?
  • Does the program allow you to receive individually interpreted results (personal medical information) or raw sequence data?
  • Does the program allow you to select between different levels of consent types (broad consent, per-study consent, etc.)?
  • Are you able to opt out of the database at any time?

Beyond control over your data, a number of other features make a consent model dynamic, including a dynamic education component, timely updates regarding the research, and more.

Example: The All of Us Research Program

The National Institutes of Health’s ‘All of Us Research Program’ aims to collect genomic data from more than 1 million people with the goal of speeding up health research breakthroughs. Patient data from the All of Us Research Program is stored in a cloud-based repository. Authorized researchers are allowed access to individual-level data, while a public browser presents aggregate data. The program allows those who sign up to choose to provide access to their EHR data. It also allows users to submit data from wearables and may link that data with geolocation data and other datasets such as insurance claims data. The program, which specifically aims to include participants from historically underrepresented groups, employs a dynamic model to engage patients. The model:

  • Presents information about the study on a dynamic online portal
  • Allows individual medical results to be returned to participants if they are medically actionable
  • Makes genetic sequencing data and wearable sensor data available to participants without interpretation
  • Takes steps to verify that participants understand the concepts communicated before signing the form

The All of Us Research Program does not allow the user to select different consent models. However, it scores well on other metrics by enabling dynamic education, the timely release of research information, and up-to-date research progress.

Genomic research programs like All of Us raise important questions about the collection of non-health data by the government: How will that data be used to determine eligibility for services under Medicaid and Medicare? Will it be integrated with mobile health data to create risk profiles? Will aggregated de-identified data be used for non-health policy making?

There are some existing legal protections for federally funded research programs. For example, the NIH Certificate of Confidentiality protects researchers from being forced to disclose identifying information about you in legal proceedings. The Genetic Information Nondiscrimination Act (GINA) prohibits discrimination in health insurance and employment based on genetic information. However, many of the issues raised by genomic research programs are not addressed through these laws.

Given the limitations in law, some questions to ask yourself when evaluating these programs include:

  • Do these programs recognize the importance of implementing robust security and privacy practices?
  • How do they communicate these practices?
  • Who is responsible for implementing them?
  • What steps are taken to de-identify data? What other steps are taken in presenting aggregate data?
  • Do they allow public access to participant-level data?

The underlying mandates of programs like All of Us require the organizations that sign up to access the data to have privacy protection and security frameworks that are in line with NIST standards. However, since these practices are expected to be implemented by the organization conducting the research, there is limited visibility into how they are carried out in practice.

Example: The Million Veteran Program

Another research program that has an online platform to enable voluntary participation is the ‘Million Veteran Program’ (MVP). MVP collects genomic data to research cancer risk, diabetes complications, mental illnesses including PTSD and depression, and other issues among veterans. It communicates its efforts to guarantee confidentiality and privacy through a well-drafted FAQ page. For example, it explains its de-identification techniques in plain language. It also provides information on situations in which research can be shared, the risk of genetic discrimination, and the inability of insurance companies or employers to access this information.

The law (in this case, HIPAA) protects your data prior to de-identification but not once it has been de-identified. It prescribes certain standards or technical approaches that must be met in order for data to be considered de-identified. Once these standards have been met, the law does not apply to your data.

It is important to understand that there will always be a risk of re-identification of de-identified data. With the evolution of data science and technology, it is possible to match de-identified datasets with other third-party data to re-identify individuals with a high degree of accuracy. As a patient, you typically do not have visibility into the companies with whom your data is being shared in such de-identified datasets.
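
For readers curious about how this kind of matching works, the following is a minimal Python sketch. All records, names, and field names are made up for illustration, and real linkage techniques are considerably more sophisticated. It only shows how a few shared attributes (a partial ZIP code, a birth year, and sex) can single out one person when a de-identified dataset is compared with a separate dataset that contains names.

```python
# Toy illustration of a linkage ("matching") attack.
# Every record and field name here is hypothetical.

# A de-identified health dataset: no names, only coarse attributes.
deidentified_records = [
    {"zip3": "021", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "021", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip3": "941", "birth_year": 1985, "sex": "F", "diagnosis": "migraine"},
]

# A separate third-party dataset that does contain names.
public_records = [
    {"name": "Alex Example", "zip3": "021", "birth_year": 1990, "sex": "M"},
    {"name": "Sam Sample", "zip3": "941", "birth_year": 1985, "sex": "F"},
    {"name": "Pat Placeholder", "zip3": "100", "birth_year": 1972, "sex": "F"},
]

# Attributes that appear in both datasets and can be used for matching.
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def link_records(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for every named person whose
    attributes match exactly one de-identified record."""
    matches = []
    for person in public:
        signature = tuple(person[k] for k in keys)
        candidates = [
            record for record in deidentified
            if tuple(record[k] for k in keys) == signature
        ]
        # A unique match means the "anonymous" record now has a name.
        if len(candidates) == 1:
            matches.append((person["name"], candidates[0]["diagnosis"]))
    return matches


reidentified = link_records(deidentified_records, public_records)
print(reidentified)
```

In this toy example, two of the three named people match exactly one health record each, so their diagnoses are revealed even though the health dataset contains no names. The more attributes two datasets share, the more likely such unique matches become.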

Keep yourself informed by reviewing your healthcare provider’s privacy policies to understand how they are sharing de-identified data. If such information is not easily accessible on their website, ask your provider directly whether, and with whom, they share such data.

The law allows hospitals and other providers to share your protected health information with researchers and public health personnel if they have signed a ‘data use agreement’ with the provider and the provider has removed a number of identifiers to create what is called a ‘limited data set’. While the primary researchers conducting the study would have obtained your permission under the Common Rule, a data use agreement will be used when the primary researcher in the hospital or clinical setting wants to share the limited data set with a colleague at another institution not involved in the trial, or with a private registry not involved in the study.

Identifiers are data that are unique to an individual and include information like your name, street address, telephone number, fax number, email address, social security number, or medical record number. For a limited data set, details like your date of birth, date of admission or discharge, city, state, or ZIP code are not considered identifiers because they are not unique.
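
The distinction described above can be sketched in a few lines of Python. This is an illustration only: the record is made up, the field names are hypothetical, and the set of identifiers includes only the ones named in this section, whereas the list in the regulations is longer. The sketch shows direct identifiers being removed while dates and general location are kept.

```python
# Illustrative sketch of producing a "limited data set" from one record.
# Field names are hypothetical; the identifier list is limited to the
# examples named in the text and is not the complete legal list.

DIRECT_IDENTIFIERS = {
    "name",
    "street_address",
    "telephone",
    "fax_number",
    "email",
    "social_security_number",
    "medical_record_number",
}


def to_limited_data_set(record):
    """Return a copy of the record without the direct identifiers."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }


# A made-up patient record.
patient_record = {
    "name": "Alex Example",
    "street_address": "1 Sample Street",
    "telephone": "555-0100",
    "email": "alex@example.org",
    "medical_record_number": "MRN-0000",
    "date_of_birth": "1985-04-12",
    "admission_date": "2021-03-02",
    "city": "Springfield",
    "state": "MA",
    "zip_code": "01101",
    "diagnosis": "asthma",
}

limited = to_limited_data_set(patient_record)
print(sorted(limited))
```

Note that the result still contains the date of birth, admission date, city, state, and ZIP code. This is why a limited data set is still treated as PHI and why a data use agreement is required before it is shared.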

Different providers will have different data use agreements and procedures to govern the sharing of limited data sets. For example, some will explicitly require that the IRB be notified if the limited data set is shared with someone who was not named in the authorization. Some will require that a short-form version of a data use agreement be signed if the data set is shared with another researcher in the same institute.

Keep yourself informed by reviewing your provider’s policy on sharing of limited data sets. If such information is not easily accessible on their website, ask your provider directly whether, and with whom, they share limited data sets.