1 Introduction

Noncommunicable diseases (NCDs) are the leading causes of mortality globally and a major global health challenge that affects people in all countries, regardless of their socioeconomic status (World Health Organization 2018; GBD 2017 Disease and Injury Incidence and Prevalence Collaborators 2018).

Approximately 30% of NCD-related deaths are considered premature, occurring before the age of 70 years, and approximately 85% of premature deaths occur in low- and middle-income countries (World Health Organization 2018). However, the prevalence of many NCDs increases with age (GBD 2017 Disease and Injury Incidence and Prevalence Collaborators 2018; Global Burden of Disease Study Collaborators 2015). The continued growth of the aging population, which is often pronounced in developed economies with relatively advanced healthcare systems, will increase the prevalence of NCDs and accentuate disease burden.

The World Health Organization (WHO) has identified 4 main types of NCDs that contribute the greatest burden. They are cardiovascular diseases (CVDs), cancers, chronic respiratory diseases, and diabetes (Fig. 1). In 2016, NCDs accounted for 71% (41 million) of deaths worldwide, with 44% of these deaths attributable to CVD, 22% to cancer, 9% to chronic respiratory disease, and 4% to diabetes (World Health Organization 2018). In the US, CVD accounted for approximately 1 of every 3 deaths that year (Benjamin et al. 2019). In addition, neurological conditions and mental health disorders, such as anxiety disorders, migraine, major depressive disorder, bipolar disorder, and Alzheimer’s disease, have emerged as major causes of disability (Global Burden of Disease Study Collaborators 2015). Indeed, mental health and well-being have been highlighted by the United Nations as important components of the goal to reduce premature mortality from NCDs by 33% over the next 10 years (United Nations 2020).

Fig. 1

Global annual deaths by key NCDs (source: World Health Organization 2018)

NCDs are characterized by their long duration or continual recurrence (i.e., chronic) and slow progression, and many NCDs have high prevalence. For example, hypertension alone is diagnosed in over 20% of adults in the vast majority of countries across North America, Latin America, Europe, Asia, Africa, and Oceania (Clarivate Analytics 2020). NCDs are also major causes of morbidity and disability, including anxiety, depression, pain, and mobility impairment (Lisy et al. 2018). Data from the Global Burden of Disease study showed that the most common chronic sequelae (i.e., lasting consequences of disease) are largely attributable to NCDs (Global Burden of Disease Study Collaborators 2015).

People who are the most socioeconomically disadvantaged are often at high risk of developing NCDs (Murray et al. 2005; Marmot and Bell 2019; Nulu 2017). Furthermore, in the era of COVID-19, NCDs pose an even greater threat. According to the Centers for Disease Control and Prevention, “people of any age with the following conditions are at increased risk for COVID-19: cancer; chronic kidney disease; chronic obstructive pulmonary disease (COPD); immunocompromised state from solid organ transplant; obesity (body mass index [BMI] of 30 or higher); CVD; sickle cell disease; and Type 2 diabetes mellitus” (Centers for Disease Control and Prevention 2020).

The economic costs of NCDs are burdensome to countries around the world (Bloom et al. 2011). Beyond the direct costs of NCD treatment and control, the economic impact includes reduced productivity at work, more days absent from work (absenteeism), and early retirement, arising both among individuals living with NCDs and from premature death. These costs can lower family economic status as well as national economic output. Furthermore, national funds deployed toward the treatment of NCDs are funds that cannot be invested in infrastructure, research, and education (Chen et al. 2018). Studies also suggest that the medical and economic burdens of NCDs, which are already high, continue to increase, especially in less-developed economies but also in middle- and high-income countries (Global Burden of Disease Risk Factors Collaborators 2016; Timmis et al. 2020). Therefore, understanding how the social determinants of health (SDOH) affect NCDs is an increasingly important area of focus (Marmot and Bell 2019).

Governments around the world and international organizations, such as the World Bank and WHO, are increasingly committed to NCD prevention and control (World Health Organization 2013). Efforts to reduce the burden of NCDs are increasingly shifting from treatment to prevention, because the majority of NCDs result from modifiable risk factors. Reducing and controlling these risk factors is therefore an effective means of reducing the burden of NCDs.

The WHO has called for NCD prevention efforts to focus on modifiable behavioral risk factors (tobacco use, physical inactivity, the harmful use of alcohol, and unhealthy diets; Fig. 1) and metabolic risk factors (raised blood pressure, overweight/obesity, hyperglycemia, and hyperlipidemia). Prevention not only reduces NCD-related suffering among patients but is also more cost-effective for societies. For example, programs targeted at preventing or treating NCDs can have a substantial beneficial effect: recent estimates suggest that every US $1 spent on tackling NCDs will yield a return of at least US $7 over the following 10 years (World Health Organization 2014).

A comprehensive global approach is needed to reduce the burden of NCDs, which requires collaboration across sectors, including academia, industry, and governments (Upjohn 2020). This approach should reduce the risk factors for NCDs and promote interventions to prevent and control NCDs, and it is especially important during the COVID-19 pandemic (Hassan et al. 2020). Organizations such as the WHO have recognized the importance of innovative data visualization in helping to educate people both on reducing the global burden of NCDs and on stopping the spread of COVID-19 during 2020. Real-world data (RWD) and real-world evidence (RWE) play an important role for international organizations, governments, and societies, helping them to make informed decisions regarding NCD prevention and control.

2 Harness RWE to prevent and control NCDs

2.1 RWD and RWE

According to the US Food and Drug Administration (FDA), RWD means data collected outside the framework of randomized controlled trials (RCTs) (US Food and Drug Administration 2019). RWE is generated through the analysis of RWD and is used to answer specific clinical and research questions. Because RCTs designed to assess the burden and degree of the comorbid conditions occurring with NCDs are lacking, RWE is the greatest resource available to study this important area. Global advances in information technologies and telecommunication infrastructures have enabled massive amounts of RWD to be generated from diverse data sources. These sources include pharmacovigilance databases; electronic medical records (EMRs), including medical images/imaging data and free-text notes from healthcare providers; electronic health records (EHRs); administrative insurance claims; patient registries; population health surveys; medical research, including genomic studies; data collected from digital apps and digital recording devices, including wearable devices; and various other sources (Fig. 2).

Fig. 2

RWD, RWE, and big data

RWD can include claims and transactions for healthcare resource utilization, electronic health records, surveys, linked datasets, and other digital data collected outside a traditional clinical trial. Because RWD can be generated at low cost (relative to RCTs) and rapidly (e.g., streamed from wearable devices), it is often stored and processed in considerable quantity. Thus, it can qualify as so-called big data owing to its “volume” (there is a lot of it), “variety” (the data take many different forms), “velocity” (the data change or are updated frequently), and “veracity” (the data may be of poor or unknown quality) (IBM 2020; Seth 2014). The immense potential value of data in the modern world has led to it being described as “the New Oil” (The Economist 2017). Pharmaceutical companies are now investing in their RWE programs to increase their capabilities in this arena, across all aspects of the drug development and approval process (Deloitte 2017; Davis et al. 2018; Morgan et al. 2020).

2.2 Opportunities

RWE is the foundation for understanding disease epidemiology, including rates of disease incidence and prevalence, awareness, diagnosis, treatment, and control. RWE plays a critical role in quantifying disease burden, which can be measured in terms of lives saved or lost; gain or loss of daily function, work productivity, and income; quality of life; and healthcare resource use. RWE can also be used to detect vulnerable populations (e.g., elderly people and persons at high risk of NCDs) and to identify the most influential risk factors, which may lead to new and non-traditional solutions to clinical problems (Batra and Cheung 2019). Recently, RWE has been used increasingly in the regulatory arena to support label expansions and to accelerate drug approvals, as regulatory authorities have become more receptive to reviewing RWE (Katkade et al. 2018; Zou et al. 2020).
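
As a concrete illustration of the incidence and prevalence measures mentioned above, the Python sketch below computes period prevalence and an incidence rate per 1,000 person-years from a toy cohort. The `Person` record layout and the function names are hypothetical, and the sketch deliberately ignores issues that real RWD analyses must handle, such as look-back periods, the validity of diagnosis codes, and gaps in observation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical record layout for illustration only.
    enrolled: date               # start of observation in the data source
    disenrolled: date            # end of observation
    diagnosed: Optional[date]    # date of first NCD diagnosis, if any


def period_prevalence(people: List[Person], start: date, end: date) -> float:
    """Share of people observed during [start, end) who have the disease by end."""
    observed = [p for p in people if p.enrolled < end and p.disenrolled > start]
    cases = [p for p in observed if p.diagnosed is not None and p.diagnosed < end]
    return len(cases) / len(observed) if observed else 0.0


def incidence_rate_per_1000_py(people: List[Person], start: date, end: date) -> float:
    """New diagnoses per 1,000 person-years at risk during [start, end)."""
    events = 0
    person_days = 0
    for p in people:
        t0 = max(p.enrolled, start)
        t1 = min(p.disenrolled, end)
        if t1 <= t0:
            continue  # not observed during the window
        if p.diagnosed is not None and p.diagnosed < t0:
            continue  # prevalent case: not at risk of a *new* diagnosis
        if p.diagnosed is not None and p.diagnosed < t1:
            events += 1
            t1 = p.diagnosed  # stop accruing person-time at diagnosis
        person_days += (t1 - t0).days
    if person_days == 0:
        return 0.0
    return 1000 * events / (person_days / 365.25)
```

Prevalent cases are excluded from the incidence denominator because they are no longer at risk of a first diagnosis; counting their person-time would understate the incidence rate.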

There is an opportunity to use RWE to study diverse populations that are frequently underrepresented in both clinical and observational studies. For example, patients who have certain diseases or clinical/demographic features (e.g., advanced age) are often excluded from RCTs (Kennedy-Martin et al. 2015). Nevertheless, understanding how these patients respond to treatment is important, especially for NCDs, which affect many patients. For example, several observational RWD studies have identified commonly occurring NCDs as frequent comorbidities of COVID-19 (Yang and Jin 2020; Docherty et al. 2020; Richardson et al. 2020; Onder et al. 2020; Hassan et al. 2020). These comorbid NCDs, such as CVD, chronic pulmonary disease, and diabetes, have also been shown to worsen clinical outcomes and increase the risk of death in people infected with COVID-19 (Docherty et al. 2020; Bergman et al. 2020). This understanding can inform preventive strategies that identify the people at greatest risk of severe COVID-19, which in turn can help avoid overburdening the healthcare system. Finally, RWE can be used to measure and evaluate the effectiveness and cost-effectiveness of treatments, of programs targeting SDOH (Li et al. 2020), and of public policies for NCDs.

Global and societal efforts to reduce populations’ risk of NCDs (World Health Organization 2013, 2018) can be supported by using RWE to monitor trends in the incidence and prevalence of NCDs and their risk factors, and to target prevention measures at populations that are vulnerable to NCDs. On a larger scale, risk reduction and increased access to interventions can be attained through collaboration among public health policymakers, payors, healthcare providers, and patient groups. Disease management can improve as primary care providers adjust their treatment approaches, as access to screening, detection, and treatment services expands, and as more people gain access to, and benefit from, palliative care. The rationale for these changes will rely heavily on the availability and intelligent use of RWE. Only when stakeholders clearly understand the context and implications of NCDs can they begin to effect and direct change. Another important aspect to consider is regional and racial variation in the prevalence of NCDs and of their risk factors (World Health Organization 2013, 2016). This variation can provide key information about the effectiveness of local policy initiatives targeted at NCDs, as well as about the influence of factors and behaviors, such as smoking and obesity, that affect NCDs (Kontis et al. 2015; Office of Disease Prevention and Health Promotion 2020).

Although RCTs remain critical in determining treatment safety, efficacy, and mechanisms of action (Collins et al. 2020), their selective patient populations and controlled clinical settings make it difficult to generalize their findings to real-world clinical practice, in which many confounding factors affect treatment effectiveness (Sherman et al. 2016). In addition, RCTs often focus on the effects of a single disease on outcomes rather than on multiple related conditions, which can be a limitation given how frequently NCDs co-occur. Furthermore, RCTs are time-consuming and increasingly costly to conduct (Baumfeld Andre et al. 2020). This is especially true for chronic diseases such as NCDs, which require long periods of observation; for example, patient follow-up in RCTs is often too short to characterize long-term safety clearly. In contrast, RWE allows a treatment’s effectiveness and tolerability to be evaluated in real-world practice; RWE is therefore important for assessing the long-term safety of medications and can help identify rare adverse events (Collins et al. 2020). Furthermore, RWE provides tremendous opportunities to develop a more holistic understanding of patients and more effective approaches to comprehensive disease management. This is key to modifying patient behaviors, such as improving adherence with treatment, which in turn helps optimize outcomes (see below). While it is premature to suggest that RWE will replace RCTs, RWE provides an important complementary mechanism to RCTs for healthcare professionals seeking novel solutions to address the burden of NCDs.

For example, RWE has been used for many years to study adherence with medications to treat NCDs and to evaluate methods of improving adherence with these medications. Poor adherence to medications for NCDs is a major global issue. Low adherence with treatment increases the morbidity and mortality burden of NCDs, even in high-income countries (Khan and Socha-Dietrich 2018; Brown and Bussell 2011; Cutler et al. 2018). Where effective therapies are available, disease burden can only be reduced if patients adhere to treatment for the prescribed duration (World Health Organization 2003; Shau et al. 2019). Improving patient adherence to existing interventions increases treatment effectiveness, resulting in substantial overall cost savings associated with disease burden. Therefore, an additional key use of RWE in NCD prevention and control is to better understand how to address the complex issues surrounding treatment adherence and persistence (Cramer et al. 2008). To address this challenge, RWE can be used to estimate adherence and persistence rates and to assess the factors associated with these rates (Chen et al. 2019). Such information can be translated into realistic plans, grounded in authentic insights, to increase the proportion of patients who adhere to their therapeutic regimens.
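
One widely used way to estimate adherence from pharmacy claims is the proportion of days covered (PDC): the share of days in an observation window on which the patient had medication on hand. The sketch below is a simplified, hypothetical implementation. It counts overlapping supplies once per day instead of shifting early refills forward, and the 80% cut-off in `is_adherent` is a convention often used in adherence research, treated here as an adjustable assumption rather than a value taken from the sources cited above.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def proportion_of_days_covered(
    fills: Iterable[Tuple[date, int]], start: date, end: date
) -> float:
    """PDC over the half-open window [start, end).

    fills: (dispensing date, days' supply) pairs, e.g. from pharmacy claims.
    Overlapping supplies are counted once per day; this simplified version
    does not carry surplus supply forward ("stockpiling" adjustment).
    """
    window_days = (end - start).days
    if window_days <= 0:
        raise ValueError("end must be after start")
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day < end:
                covered.add(day)
    return len(covered) / window_days


def is_adherent(pdc: float, threshold: float = 0.8) -> bool:
    # The 0.8 default is an assumed, commonly used cut-off; pass another if needed.
    return pdc >= threshold
```

Whether overlapping supplies are shifted forward is exactly the kind of definitional choice that should be reported alongside adherence estimates, because it can change the resulting rates.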

2.3 Challenges

The availability of RWD may depend on technology, digitization, data capture systems, and data flow regulations (US Food and Drug Administration 2013). RWD is usually not collected for research purposes and thus can be messy and come in many different formats. Data accuracy, reliability, and quality must be taken into account when using RWD for research (Sherman et al. 2016). To make the most of these research opportunities, innovative digital and analytic capabilities and technologies must be developed and made available globally to medical researchers in industry, government, and academia. Given the variety of data sources for RWD, important issues arise around data integrity, integration, access, interoperability, standardization, quality control, security, privacy protection, and ethical standards for data use.

Informed consent is a key consideration in RWD research. In its guidance for Institutional Review Boards and Clinical Investigators, the US Food and Drug Administration states that “no investigator may involve a human being as a subject in research covered by these regulations unless the investigator has obtained the legally effective informed consent of the subject or the subject’s legally authorized representative. An investigator seeks such consent only under circumstances that provide the prospective subject or the representative sufficient opportunity to consider whether or not to participate and that minimize the possibility of coercion or undue influence. The information that is given to the subject or the representative shall be in language understandable to the subject or the representative. No informed consent, whether oral or written, may include any exculpatory language through which the subject or the representative is made to waive or appear to waive any of the subject’s rights, or releases or appears to release the investigator, the sponsor, the institution, or its agents from liability for negligence” (US Food and Drug Administration 1998). How informed consent applies to RWD therefore needs to be carefully examined and discussed, because consent requirements can limit access to certain types of data or information from some regions. Advances in blockchain technology can also enable dynamic informed consent (Mamo et al. 2020).

Even when data are available in aggregate, cutting-edge innovations may be needed to address these challenges. To be of use, structured data (i.e., encoded in a standardized data format) and unstructured data (e.g., free-text physician notes) from various sources often need to be combined and curated into databases using interoperable systems, which can be complicated. This is further complicated when databases obtained from divergent sources are housed in separate repositories and data warehouses, which creates a major barrier to accessing and collating the relevant data.

Language differences between countries also present a common challenge in this regard. Furthermore, in many countries, data are collected and stored in siloed patient registries in non-standardized ways. These registries are often not connected with each other and contain diverse sets of information, which complicates analysis further.

Even within the same country, such as the United States, linking patient-level data across sources can be problematic. Sometimes linkage can be accomplished using tokens generated from subsets of patients’ protected health information (PHI), as defined under the Health Insurance Portability and Accountability Act of 1996 (HIPAA) rules (US Department of Health & Human Services 2013, 2020; Centers for Disease Control and Prevention 2018). At other times, linkage can be based on the dates or locations of treatment (Curtis et al. 2014). In many instances, some level of probabilistic matching, such as propensity score matching, may be required across multiple data sources or datasets (Desai et al. 2017). Sensitivity analyses based on different probabilistic matches can be used to examine the robustness of the matching algorithm.
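
Token-based linkage generally works by having each data holder apply the same keyed, one-way hash to normalized identifiers, so that records can be matched on the resulting tokens without exchanging the identifiers themselves. The sketch below illustrates the idea with HMAC-SHA256 from Python’s standard library. The identifier fields, normalization rules, record layout, and shared key are assumptions for illustration only; the sketch does not describe any particular commercial tokenization product or a HIPAA-compliant de-identification process, and it performs exact (deterministic) matching only.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def make_token(first: str, last: str, dob: str, sex: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of normalized identifiers.

    Field choice and normalization (trim + uppercase) are illustrative assumptions.
    """
    normalized = "|".join(
        [first.strip().upper(), last.strip().upper(), dob.strip(), sex.strip().upper()]
    )
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records: List[Dict[str, str]], id_field: str, key: bytes) -> Dict[str, str]:
    """Map token -> local record ID; only tokens and IDs leave the data holder."""
    return {
        make_token(r["first"], r["last"], r["dob"], r["sex"], key): r[id_field]
        for r in records
    }


def link(tokens_a: Dict[str, str], tokens_b: Dict[str, str]) -> List[Tuple[str, str]]:
    """Deterministic exact-match linkage on tokens present in both datasets."""
    return sorted((tokens_a[t], tokens_b[t]) for t in tokens_a.keys() & tokens_b.keys())
```

Because exact matching fails on typos or name changes, real linkage pipelines often combine several tokens or fall back on probabilistic methods, which is one reason the sensitivity analyses described above are useful.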

Databases can also be difficult to standardize; different diagnostic and treatment codes need to be harmonized (Observational Health Data Sciences and Informatics 2020). Similarly, it is important to establish and use standard, well-specified analytical methods and algorithms. For example, the Observational Health Data Sciences and Informatics (OHDSI) group is at the forefront of harmonizing and standardizing the analysis of observational data and has created a common data model (Hripcsak et al. 2015). This and similar initiatives will improve the reproducibility of analyses and the potential for groups to collaborate toward a common aim.
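
Harmonizing codes typically means mapping source codes from different vocabularies onto a shared set of concepts before any analysis is run. The sketch below maps a few ICD-9-CM and ICD-10-CM codes onto study-specific labels and counts distinct patients per concept. The crosswalk and labels are hypothetical: they are not the OHDSI/OMOP vocabulary, which uses its own standard concept identifiers and far more extensive mappings.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical crosswalk from (vocabulary, code) to a study-specific concept label.
# The codes are real ICD-9-CM / ICD-10-CM codes, but the labels are not OMOP
# standard concepts, and a production mapping would be larger and more nuanced
# (e.g., ICD-9-CM 250.00 also covers diabetes of "unspecified type").
CROSSWALK = {
    ("ICD9CM", "4019"): "essential_hypertension",
    ("ICD10CM", "I10"): "essential_hypertension",
    ("ICD9CM", "25000"): "type_2_diabetes",
    ("ICD10CM", "E119"): "type_2_diabetes",
}


def harmonize(vocabulary: str, code: str) -> Optional[str]:
    """Normalize formatting differences (dots, case, whitespace) before lookup."""
    key = (vocabulary.strip().upper(), code.replace(".", "").strip().upper())
    return CROSSWALK.get(key)


def count_patients_by_concept(rows: Iterable[Tuple[str, str, str]]) -> Counter:
    """rows: (patient_id, vocabulary, code); counts distinct patients per concept."""
    seen = set()
    for patient_id, vocabulary, code in rows:
        concept = harmonize(vocabulary, code)
        if concept is not None:
            seen.add((patient_id, concept))
    return Counter(concept for _, concept in seen)
```

Normalizing code formatting before lookup matters in practice because the same code may be stored as "E11.9" in one source and "E119" in another.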

An additional complication in the analysis of RWD can be data and privacy protection laws, such as the California Consumer Privacy Act of 2018 (CCPA) (State of California Department of Justice 2018) or the European Union General Data Protection Regulation (GDPR) (European Commission 2019). The GDPR will need to be considered carefully when using data from the European Union, even when the data do not contain personally identifiable information.

Moreover, when using data from digital devices, it is necessary to consider how best to combine data accurately. For example, different glucose monitors may not calculate summary measures in exactly the same way, which necessitates obtaining the raw data and using a software package such as “cgmanalysis” in R to compute values consistently (Vigers 2020).
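
The sketch below shows why computing summary measures from raw readings helps: every metric is derived from one explicit set of definitions. It computes mean glucose, the coefficient of variation, the percentage of readings within 70 to 180 mg/dL, and the glucose management indicator (GMI = 3.31 + 0.02392 × mean glucose in mg/dL). It is not the algorithm implemented in “cgmanalysis”; it assumes evenly spaced readings without gaps, and the use of the sample rather than the population standard deviation is one of the definitional details on which tools can differ.

```python
import statistics
from typing import Dict, Sequence


def cgm_summary(
    glucose_mg_dl: Sequence[float], low: float = 70, high: float = 180
) -> Dict[str, float]:
    """Summary metrics from raw CGM readings taken at a fixed sampling interval.

    Assumes evenly spaced readings with no gaps, so the share of readings
    in range approximates the share of time in range.
    """
    if len(glucose_mg_dl) < 2:
        raise ValueError("need at least two readings")
    mean = statistics.fmean(glucose_mg_dl)
    sd = statistics.stdev(glucose_mg_dl)  # sample SD; some tools use population SD
    in_range = sum(low <= g <= high for g in glucose_mg_dl)
    return {
        "mean_glucose": mean,
        "cv_percent": 100 * sd / mean,
        "time_in_range_percent": 100 * in_range / len(glucose_mg_dl),
        # Glucose management indicator (%), estimated from mean glucose in mg/dL.
        "gmi_percent": 3.31 + 0.02392 * mean,
    }
```

Running the same function over raw exports from every device yields metrics that are comparable across sources, which is the consistency problem described above.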

Once these initial challenges are addressed, robust data curation and analytic algorithms tailored to specific patient populations must be developed. These algorithms will aid the identification of persons with, or at risk of, NCDs and help address the challenge of NCDs throughout a patient’s NCD journey, including detection, screening, diagnosis, treatment, and monitoring. To achieve this, the current shortage of data translators (who act as shepherds with subject-matter expertise) and data scientists will need to be addressed.
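
Algorithms for identifying persons with an NCD in RWD are often written as rule-based “computable phenotypes”. The sketch below shows the general shape of such a rule for type 2 diabetes. The code list, drug list, and the rule itself (at least two qualifying diagnoses on different dates, or one qualifying diagnosis plus a glucose-lowering medication) are hypothetical and would need validation against a reference standard, such as chart review, before use.

```python
from datetime import date
from typing import Iterable, Set, Tuple

# Hypothetical code and drug lists for an illustrative type 2 diabetes phenotype.
T2D_DX_CODES = {"E11.9", "E11.65", "250.00"}
GLUCOSE_LOWERING_DRUGS = {"metformin", "glipizide", "sitagliptin"}


def meets_t2d_phenotype(
    diagnoses: Iterable[Tuple[date, str]],
    dispensings: Iterable[Tuple[date, str]],
    dx_codes: Set[str] = T2D_DX_CODES,
    drugs: Set[str] = GLUCOSE_LOWERING_DRUGS,
) -> bool:
    """Illustrative rule: >= 2 qualifying diagnoses on different dates,
    or >= 1 qualifying diagnosis plus a glucose-lowering medication."""
    # Distinct dates, so repeated codes from a single visit count once.
    dx_dates = {d for d, code in diagnoses if code in dx_codes}
    on_drug = any(drug.lower() in drugs for _, drug in dispensings)
    return len(dx_dates) >= 2 or (len(dx_dates) >= 1 and on_drug)
```

Requiring corroborating evidence (a second visit or a dispensing) is a common way to reduce false positives from rule-out diagnoses, at some cost in sensitivity.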

The medical and scientific insights gained through RWE generation can potentially influence global policies to reduce the burden of NCDs. In addition, RWE provides opportunities to analyze both cross-sectional and longitudinal data to assess and monitor the effectiveness of interventions designed to target NCDs and their risk factors.

3 Conclusions

There is a critical need to address the substantial burden associated with NCDs. This is even more urgent in the COVID-19 era, given the increased risk of poor clinical outcomes in patients with NCDs who become infected with COVID-19. Some of the limitations of RCTs, notably their often narrow focus, can be addressed by using RWD. Therefore, over the coming years, we anticipate seeing the power of RWD and RWE increasingly harnessed to address the immense healthcare burden associated with NCDs. Innovative techniques for both capturing and analyzing data will be used; for example, a distributed research network is useful for generating RWE. To enhance the effectiveness and efficiency of healthcare delivery, it is important to understand the risk factors for disease progression, treatment patterns, and healthcare utilization. Educational initiatives on the potential of RWD and RWE to deliver patient-centric, value-based healthcare will lead to further improvements in the management of NCDs. However, it is also important to understand the challenges and limitations of RWD and RWE to ensure that they are used appropriately.
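
A distributed research network can be organized in many ways. The sketch below assumes the simplest design, in which each data partner runs the same analysis locally and shares only aggregate counts with a coordinating center that pools them, so patient-level data never leave the partner sites. The `SiteSummary` structure and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SiteSummary:
    site: str
    cases: int        # patients meeting the case definition at this site
    population: int   # patients eligible for the analysis at this site


def summarize_site(site: str, has_condition: Iterable[bool]) -> SiteSummary:
    """Runs locally at each data partner; patient-level flags stay on site."""
    flags = list(has_condition)
    return SiteSummary(site=site, cases=sum(flags), population=len(flags))


def pooled_prevalence(summaries: List[SiteSummary]) -> float:
    """Coordinating center combines only the aggregate counts."""
    total_cases = sum(s.cases for s in summaries)
    total_population = sum(s.population for s in summaries)
    return total_cases / total_population
```

Pooling counts rather than averaging site-level proportions weights each site by its size; real networks typically also harmonize data to a common data model first, so that the same local query means the same thing at every site.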

Finally, partnerships can play a key role in improving the use of RWE. Fruitful opportunities exist for collaborative research across healthcare stakeholders, including academia, industry, and government, drawing on health information technology and innovation to gain valuable insights. This will, in turn, have a substantial beneficial impact on the prevention and management of NCDs.