Why is my data needed in clinical or genomic research and what should I consider before participating?

When researchers and doctors pool your health data along with data of people with similar conditions, they can look for patterns in the data to improve diagnosis and care in a number of ways. Your health data will help researchers:

improve diagnostics and clinical care
understand causes of diseases
develop new treatments and preventive care methods
improve patient safety
improve population health management by payers and providers
evaluate government health policy

In all of these ways, your data can contribute to the advancement of health research and care for everyone. If you are a member of a historically vulnerable or underserved community, your data is all the more relevant because clinical research in the past has not always included people like you.

How do researchers access my health data? When do they need my consent to access my data?

Researchers can access your health information in multiple ways. This access is regulated by the Health Insurance Portability and Accountability Act of 1996 (HIPAA) as well as the regulations governing human subject research (the Federal Policy for the Protection of Human Subjects, known as the Common Rule). Your express consent is required by law only in certain situations.

Your consent is needed when:

Researchers associated with healthcare providers, pharmaceutical companies, research institutes, patient banks, genomic banks, and others can access your ‘personally identifiable information’ or PII after you have authorized them to.

Your consent is not needed when:

Researchers can access your information without your consent when your health data has been ‘de-identified’. De-identified information is information from which details that could help identify a specific person have been removed.
Researchers can access your information without your consent when your health data is part of what is known as a ‘limited data set’. A limited data set is a data set from which many important identifiers have been removed - however, it is not a de-identified data set and it is still considered to be protected health information (PHI).
Researchers can access your information without your consent if a special review board (commonly known as an ‘Institutional Review Board’ or IRB) has allowed them to do so. An IRB is a body that has to be set up under the Common Rule by any research institution conducting human subject research.

Regulations in the United States generally allow you to opt in to data sharing for clinical research only when a provider or researcher is collecting PHI directly from you. As a result, you may unknowingly contribute your personal health data to research when it has been de-identified, or when it is shared as part of a limited data set that remains protected by HIPAA.

What is protected health information? And how does it relate to medical research?

To understand de-identification, it is important to first understand the concept of ‘protected health information’ or ‘PHI’. PHI is information about a person's health, their health condition, or the payment for health services rendered to them. For example, your date of birth is not PHI on its own. However, when it is linked with any of your health information - if you were a patient at X hospital, for example - it becomes PHI. Please note that the standards applied by providers vary: for example, some researchers will take the position that any information about a person’s participation in a study is PHI.

The HIPAA regulations only apply to your data when it is considered PHI. De-identified data is not considered PHI for the purpose of law - therefore, when your data is shared by your provider with researchers after it has been de-identified, the law does not protect it. However, when the provider shares data in the form of a limited dataset, it is still considered PHI.

What can I do to ensure that I understand the consequences of participation in clinical research from a privacy and confidentiality perspective?

You should take steps to ensure that you understand the benefits and risks of participating in research being conducted by your healthcare provider or any other research program attempting to enroll you. Research institutes tend to treat the process of your consent as a regulatory compliance exercise or a one-off discussion. However, thanks to work by patient advocates and other groups, there is a concerted push towards adopting better models of patient engagement through, for example, ‘dynamic consent’. You might also want to ensure that the research aims do not conflict with your moral or other values.

You can use the following framework (informed by the recommendations of the Clinical Trials Transformation Initiative) to evaluate if the research institute conducting the trial is using ethical processes and ensuring informed consent from its subjects:

Ongoing and interactive conversation: Do the researchers treat the process of obtaining your consent as an ongoing and interactive communication? Ideally, the researchers should not simply administer a consent form at the beginning of the trial with limited to no engagement afterwards.
Customized consent: Have the researchers taken care to customize the consent process to the particular needs of the patient group you are part of?
Communication and responsiveness: Are the people taking your consent able to effectively communicate information specific to your trial? Are they responsive to the needs of your patient group?
Discussion tools and interactive techniques: Do the researchers use discussion tools and interactive techniques to ensure that your consent is well informed? Some factors you might consider are whether the researchers respond well to your education level, health literacy, disabilities, and other factors that may impact your ability to provide consent.
Plain language explanations: Do they explain in plain language important considerations in a clinical trial such as its purpose, how long it will last, compensation for injury, and how you can withdraw? Do they explain in plain language what steps they are taking to protect your data and what de-identification techniques they employ while sharing your data?

How is my data shared when I participate in population genomic data research?

Population genomic data projects have been launched around the world to make advances in precision medicine. These are often launched by national governments to learn more about complex diseases that are often caused by a combination of genetic and environmental factors. These programs are also increasingly embracing the new wave of ‘dynamic consent’ models, which are based on personalized communication platforms that aim to make the consent process a continuous two-way communication between researchers and participants.

‘Dynamic consent’ models have many features that you can use to evaluate how much control they will give you over how your data is used:

Does the program include an online portal to deliver information and allow you to change permissions or preferences?
Does the program allow you the opportunity to accept or decline participation in new research opportunities or studies?
Does the program allow you to receive individually interpreted results (personal medical information) or raw sequence data?
Does the program allow you to select between different levels of consent types (broad consent, per-study consent, etc.)?
Are you able to opt out of the database at any time?

In addition to consent with respect to control of data, there are a number of other features that make a model dynamic including a dynamic education component, timely updates regarding the research, and more.

Example: The All of Us Research Program

The National Institutes of Health’s ‘All of Us Research Program’ aims to collect genomic data from more than 1 million people with the goal of speeding up health research breakthroughs. Patient data from the All of Us Research Program is put in a cloud-based repository which researchers will have access to. Authorized researchers are allowed access to individual-level data, while a public browser presents aggregate data. The program allows those who sign up to choose to provide access to their electronic health record (EHR) data. The program also allows users to submit data from wearables and may link that data with geolocation data and other datasets such as insurance claims data. The program, which specifically aims to include participants from historically underrepresented groups, employs a dynamic model to engage patients. The model:

Presents information about the study on a dynamic online portal
Allows individual medical results to be returned to participants if they are medically actionable
Makes genetic sequencing data and wearable sensor data available to participants without interpretation
Takes steps to verify that participants understand the concepts communicated before signing the form

The All of Us Research Program does not allow the user to select different consent models. However, it scores well on other metrics by enabling dynamic education, the timely release of research information, and up-to-date research progress.

How is my data protected if I submit my data to a genomic research project? How can I evaluate their privacy and security practices?

Genomic research programs like All of Us raise important questions about the collection of non-health data by the government: How will that data be used to determine eligibility for services under Medicaid and Medicare? Will it be integrated with mobile health data to create risk profiles? Will aggregated de-identified data be used for non-health policy making?

There are some existing legal protections for federally funded research programs. For example, the NIH Certificate of Confidentiality protects researchers from being forced to disclose identifying information about you in legal proceedings. The Genetic Information Nondiscrimination Act (GINA) prohibits discrimination in health insurance and employment based on genetic information. However, many of the issues raised by genomic research programs are not addressed through these laws.

Given the limitations in law, some questions to ask yourself when evaluating these programs include:

Do these programs recognize the importance of implementing robust security and privacy practices?
How do they communicate these practices?
Who is responsible for implementing them?
What steps are taken to de-identify data? What other steps are taken in presenting aggregate data?
Do they allow public access to participant level data?

The underlying mandates of programs like All of Us require the organizations that sign up to access the data to have privacy protection and security frameworks that are in line with standards developed by the National Institute of Standards and Technology (NIST). However, since these practices are expected to be implemented by the organization conducting the research, there is limited visibility on how they are carried out in practice.

Example: The Million Veteran Program

Another research program which has an online platform to enable voluntary participation is the ‘Million Veteran Program’ (MVP). MVP collects genomic data to research cancer risk, diabetes complications, mental illnesses including PTSD and depression, and other issues among veterans. It communicates its efforts to guarantee confidentiality and privacy through a well-drafted FAQ page. For example, it explains its de-identification techniques in plain language. It also provides information on situations in which research can be shared, the risk of genetic discrimination, and the inability of insurance companies or employers to access this information.

Is my health data protected if it has been de-identified? What protections exist in law?

The law (in this case, HIPAA) protects your data prior to de-identification but not once it has been de-identified. It prescribes certain standards or technical approaches that must be met in order for data to be considered de-identified. Once these standards have been met, the law does not apply to your data.

It is important to understand that there will always be a risk of re-identification of de-identified data. With the evolution of data science and technology, it is possible to match de-identified datasets with other third-party data to re-identify individuals with a high degree of accuracy. As a patient, you typically do not have visibility into the companies with whom your data is being shared in such de-identified datasets.
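For readers who want to see how such matching can work, here is a minimal sketch in Python. It assumes a dataset with names removed that still carries a few indirect details, and a separate third-party list that includes names. The field names, the records, and the choice of matching fields are all made up for illustration and do not reflect what any particular standard permits.

```python
# All records and field names below are hypothetical.
deidentified = [
    {"birth_date": "1980-02-14", "zip_code": "02139", "sex": "F", "diagnosis": "asthma"},
    {"birth_date": "1975-07-30", "zip_code": "60614", "sex": "M", "diagnosis": "diabetes"},
]
third_party = [
    {"name": "Jane Roe", "birth_date": "1980-02-14", "zip_code": "02139", "sex": "F"},
    {"name": "John Doe", "birth_date": "1990-01-01", "zip_code": "60614", "sex": "M"},
]

# Indirect details shared by both datasets (an assumption for this sketch).
MATCH_FIELDS = ("birth_date", "zip_code", "sex")


def match_key(record):
    """Combine the shared fields into a single comparable key."""
    return tuple(record[field] for field in MATCH_FIELDS)


def link(health_records, named_records):
    """Pair each health record with a name when exactly one person matches."""
    names_by_key = {}
    for person in named_records:
        names_by_key.setdefault(match_key(person), []).append(person["name"])

    matches = []
    for record in health_records:
        names = names_by_key.get(match_key(record), [])
        if len(names) == 1:  # only a unique match points to one individual
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(deidentified, third_party)
print(reidentified)
```

In this made-up example, one of the two health records lines up with exactly one named person, which is enough to attach a diagnosis to a name. The fewer indirect details a shared dataset retains, the harder this kind of matching becomes.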

Keep yourself informed by reviewing your healthcare provider’s privacy policies to understand how they are sharing de-identified data. If such information is not easily accessible on their website, ask your provider directly whether they share such data and with whom.

Who can access my information if it is part of a limited data set? What protections exist in law?

The law allows hospitals and other providers to share your protected health information with researchers and public health personnel if they have signed a ‘data use agreement’ with the provider and the provider has removed a number of identifiers to create this ‘limited data set’. While the primary researchers conducting the study would have obtained your permission under the Common Rule, a data use agreement will be used when the primary researcher in the hospital or clinical setting wants to share the limited data set with a colleague at another institution not involved in the trial, or with a private registry not involved in the study.

Identifiers are data that are unique to an individual and include information like your name, street address, telephone and fax numbers, email address, social security number, or medical record number. In this instance, information like your date of birth, date of admission or discharge, city, state, or zip code is not considered an identifier because it is not unique to you.
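To make the distinction concrete, here is a minimal sketch in Python of how a single record might be reduced to a limited data set. The field names and the sample record are hypothetical, and the list of identifiers covers only the examples named above, not the full list in the regulations.

```python
# Hypothetical field names covering only the identifiers named in the text above.
DIRECT_IDENTIFIERS = {
    "name",
    "street_address",
    "telephone",
    "fax_number",
    "email",
    "social_security_number",
    "medical_record_number",
}


def to_limited_data_set(record):
    """Return a copy of the record with the direct identifiers dropped."""
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }


# A made-up patient record for illustration.
patient = {
    "name": "Jane Roe",
    "street_address": "1 Example Street",
    "email": "jane@example.org",
    "social_security_number": "000-00-0000",
    "birth_date": "1980-02-14",
    "admission_date": "2021-03-01",
    "zip_code": "02139",
    "diagnosis": "asthma",
}

limited = to_limited_data_set(patient)
print(sorted(limited))
```

The dates and zip code survive in the output, which is why a limited data set is still treated as PHI rather than as de-identified data.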

Different providers will have different data use agreements and procedures to govern the sharing of limited data sets. For example, some will explicitly require that the IRB be notified if the limited data set is shared with someone who was not named in the authorization. Some will require that a short-form version of a data use agreement be signed if the limited data set is shared with another researcher at the same institution.

Keep yourself informed by reviewing your provider’s policy on sharing of limited data sets. If such information is not easily accessible on their website, ask your provider directly whether they share limited data sets and with whom.