Quality of Care in Performance-Based Financing: How It Is Incorporated in 32 Programs Across 28 Countries

Structural aspects of quality such as equipment and infrastructure were the most frequently measured, with some measurement of processes of clinical care. Further examination is warranted to assess whether variations in how quality of care is incorporated into performance-based financing programs lead to differential effects.

INTRODUCTION

Performance-based financing (PBF), a mechanism by which health care providers or facilities earn incentives on the basis of achieving specific performance criteria, is emerging as an important tool to encourage providers and facilities to become more efficient and responsive to their clients. 1 Because PBF allows narrow targeting of health services and requires measurement and verification of progress, it is increasingly appealing to implementers and policy makers as a path to making progress toward the health-related Sustainable Development Goals (SDGs). In recent years, PBF programs have proliferated in many low- and middle-income countries (LMICs), often with technical and financial support from donors and international agencies. 2 For example, in 2015 the World Bank's Health Results Innovation Trust Fund supported 36 PBF programs on maternal and child health, associated with US$400 million in grants and US$2.2 billion in concessional loans. 3

In addition to paying providers and facilities for the quantity of services provided, PBF programs also often explicitly address quality of care in their payment formulas. Quality is included either directly, by paying for specific indicators, or indirectly, by modifying the overall bonus payment according to a broader measure of quality. There are several reasons to account for quality. First, providers may compromise quality when increasing the volume of services in response to the payment incentives. 2 Second, quality is increasingly recognized as a priority area in its own right. Third, to the extent that demand responds to quality, increasing quality can also help achieve desired increases in service utilization. Existing evidence indicates substantial gaps and variations in quality in many settings, 4-6 which has contributed to the inclusion of quality in the global development agenda. For example, one of the targets for SDG 3 (ensure healthy lives and promote well-being) is to achieve "... access to quality essential health care services and ... quality and affordable essential medicines and vaccines ...". 7 PBF programs can potentially contribute to achieving these goals.
However, there is little systematic evidence on the design and implementation aspects of how existing PBF programs account for quality of care. While many studies focus on an individual PBF program's impact, there appears to be substantial heterogeneity in design and operational features of such programs, 2,8,9 reflecting the fact that PBF comprises a range of approaches rather than a uniform method. This variation has led to calls for better documentation of programs to better interpret impact estimates and provide practical guidance to policy makers. 8,9 In this article, we review how 32 PBF programs in 28 countries integrate quality of care within the programs' designs. Drawing on PBF program documents, we describe existing practice for how quality enters into the PBF payment formula, what quality indicators are being used, and how these measures are verified. This allows us to provide a deeper review of program parameters, describe both commonalities and variations across programs, and identify areas for further research and program development.

METHODS
This study employed an exploratory scoping methodology to characterize the full range of quality components in PBF and potential gaps that require further research. For our purposes, we focus on the supply-side performance-based incentives that are targeted at individual health facilities, and the payments that are linked to outputs and possibly modified by quality indicators. 8

Identifying Programs
First, we compiled a list of known existing supply-side, health facility-based PBF programs in LMICs based on a document review of published analyses in both the peer-reviewed and gray literature. We also identified existing programs through expert consultation with a number of key donor representatives from the World Bank, Kreditanstalt für Wiederaufbau (KfW), the U.S. Centers for Disease Control and Prevention (CDC), and the U.S. Agency for International Development (USAID). Each donor provided a list of their PBF programs and a key contact person for each, if available. We solicited programmatic information from implementers and donors primarily through email. Our research team collected and organized program manuals and accompanying tools used to measure quality performance for all facility levels (primary, secondary, and tertiary).
All programs identified were included if sufficient program information could be obtained (Figure 1). Programs were not excluded based on year of implementation, size, or phase (i.e., from small-scale pilots to national implementations).

Abstracting and Coding
We systematically reviewed each of the current (most recent) program manuals and entered information into a Microsoft Excel database that captured key quality of care program attributes, including PBF scheme (purchaser, regulator, provider), payment formula, quality assessment processes (e.g., checklists or direct observations), geographic coverage, funder, level of facility, targeted health services, and the verification process. If the information in the manual was unclear, we followed up with the implementer or donor to obtain clarification. For countries with multiple programs, we included all for which we had sufficient information.
We also collected quality checklists for all levels of care and entered the quality indicators contained in the checklists in a distinct database, including revised checklists for the same program. We copied indicators verbatim from the original checklist documents and pasted them into the database. We translated indicators in languages other than English, primarily French, and confirmed the translation with a proficient speaker.
For the purposes of this study, we defined an indicator as any measure with an associated point value in a PBF quality checklist, i.e., an indicator that could affect PBF payments. Some checklists included criteria for fulfilling an indicator that did not carry their own score; these criteria were therefore not considered to be indicators. Because checklists varied in their maximum point value, we also transformed each point value into a weight that could be compared across checklists. The weight for each indicator was calculated as a percentage of the entire checklist of its particular PBF scheme, so that the weights of all indicators within an individual PBF checklist summed to 100. Data from the manuals and checklists were entered by 3 researchers in multiple phases. Predetermined definitions were used to classify each programmatic component and indicator. After half of the indicators had been entered, a second researcher reviewed the database for consistency. Once all indicators were entered, the third researcher reviewed all entries. Difficult classifications were resolved through team discussions.
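The weighting step described above can be sketched in a few lines of code. This is an illustrative sketch only; the indicator names and point values below are hypothetical and not drawn from any actual checklist.

```python
def normalize_weights(points_by_indicator):
    """Convert raw checklist points into weights that sum to 100,
    so that indicators can be compared across checklists with
    different maximum point values."""
    total = sum(points_by_indicator.values())
    return {name: 100.0 * pts / total
            for name, pts in points_by_indicator.items()}

# Hypothetical checklist with a 20-point maximum.
checklist = {
    "delivery room has sterile gloves": 5,
    "partograph used for all deliveries": 10,
    "vaccine fridge temperature logged daily": 5,
}
weights = normalize_weights(checklist)
# The 10-point indicator carries half the checklist weight.
print(weights["partograph used for all deliveries"])  # 50.0
```

A 10-point indicator on a 20-point checklist and a 50-point indicator on a 100-point checklist both receive a weight of 50, which is what makes the cross-checklist comparisons in the analysis possible.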

Analysis
We primarily used Microsoft Excel pivot tables to compare basic characteristics across the PBF programs, including regional distribution, funding source, geographic coverage, and bonus recipient categories. Several specific analyses were conducted on the classification of payment types, verification and means of assessments, and service types.
Health facilities receiving PBF payments typically use one of two performance payment types. The first is a "carrot-and-stick" approach that uses a combination of rewards and punishment to induce behavior change: the "carrot" refers to the quantity payment, and the "stick" is a deflator associated with quality performance, i.e., the bonus is reduced if the quality score is less than the maximum. 2 The second is a "carrot-and-carrot" approach, consisting of a bonus payment for quality performance that is added to the quantity payment. 2 This dichotomy indicates whether the program rewards or penalizes a health facility based on quality performance. However, penalties and rewards can also be calculated using additional measures and in different ways. We therefore classified programs into 7 different payment types. The taxonomy for the type of payment used was developed by coding all programs on the basis of (1) the relationship between quality and quantity, and (2) the presence of a threshold performance score. Each of the payment types is defined and visually displayed in Table 1. We retained the distinction between penalty and reward but further specified whether the payment's calculation was determined by quality performance thresholds.
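The two basic payment types can be expressed as stylized formulas. The functions below are a simplification for illustration, assuming a quality score expressed as a fraction of the maximum; actual programs layer on the thresholds and caps described in Table 1.

```python
def carrot_and_stick(quantity_payment, quality_score):
    """Deflator: the quality score scales down the quantity payment,
    so any score below the maximum (1.0) acts as a penalty."""
    return quantity_payment * quality_score

def carrot_and_carrot(quantity_payment, quality_score, bonus_rate):
    """Inflator: a quality bonus is added on top of the quantity
    payment; bonus_rate is the maximum bonus as a share of it."""
    return quantity_payment * (1 + bonus_rate * quality_score)

# A facility earning 1,000 for quantity, with a quality score of 80%:
print(carrot_and_stick(1000, 0.8))         # deflated to 800.0
print(carrot_and_carrot(1000, 0.8, 0.25))  # inflated to 1200.0
```

Under the stick, imperfect quality costs the facility part of its quantity payment; under the carrot, the same score earns an extra bonus on top of it.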
Most PBF programs purchase services conditional on the verified quality of those services. Verification is the process by which the reported quantity of services provided and the quality scores are verified externally. Many programs verify performance at multiple levels of the health system; this assessment is concerned with the health facility performance. Counter-verification, or ex-post verification, is a supplemental verification process undertaken after the PBF payment has been distributed to assess that quality services were actually received by patients, typically through patient surveys and community assessments.
Programs also vary in their means of assessment and service types. We distinguished 7 means of assessment: checklists, register review, patient record review, direct observation, staff surveys, patient surveys, and exit interviews.

Selected payment type definitions from Table 1:

Conditional inflator: Quality score dictates the amount of quality bonus received, contingent upon achievement of a minimum quality score (threshold) required to receive any of the bonus.

Unconditional inflator: Quality bonus pool available; quality score dictates the amount of quality bonus received.

Hybrid: Quality score can act as both an inflator and a deflator depending on the quality score thresholds.

Quality only: Payment is provided only for quality; no quantity payment.

Limitations
Our analysis has several limitations. First, we obtained program information from a small set of donors. As a result, our analytic sample is skewed toward programs with the involvement of these donors, and programs of a particular donor may share design commonalities across countries. Relatedly, there is no database of PBF programs worldwide that could help us establish the relative size or representativeness of our sample. Second, we were unable to obtain complete information on all PBF programs identified, and those programs for which complete information could not be obtained were excluded from the analysis.

Analytical Sample
The final analytic sample includes 32 PBF programs initiated between 2008 and 2015 in 28 LMICs. Collectively, these interventions used 68 quality tools and 8,490 quality indicators. Comprehensive information (a programmatic manual and a set of quality tools) was available for 23 PBF programs; for 9 programs we received only quality tools without manuals (Supplementary Table). Results on PBF program components are limited to those for which we received a manual. For 6 countries, we received multiple versions of revised checklists from different years. Three countries, the Democratic Republic of the Congo (DRC), Malawi, and Rwanda, had 2 concurrent PBF programs in distinct geographic regions supported by different donors.

Primary Characteristics of the PBF Programs
The PBF programs included in the analysis were heavily concentrated in sub-Saharan Africa (n=21), followed by Europe and Central Asia (n=3), East Asia and the Pacific (n=2), South Asia (n=1), and Latin America and the Caribbean (n=1). The World Bank was the primary donor for 84% of the PBF programs (n=27), while a handful of programs were either partially or solely supported by other donors, including USAID (n=5), the CDC (n=2), the Global Fund to Fight AIDS, Tuberculosis and Malaria (n=2), Gavi, the Vaccine Alliance (n=2), the United Nations Children's Fund (UNICEF) (n=1), and KfW (n=1). A small number of programs are cofinanced by country governments. Table 2 summarizes the geographic coverage, funding sources, payment typologies, and incentive allocation formulas for each of the 23 programs with manuals. Program coverage was predominantly subnational, with just 4 of the 23 programs achieving national coverage and significant variation in the geographic coverage of the remaining programs. For each PBF program, the incentive payments were disbursed according to allocation formulas to 3 potential facility-based recipient categories: (1) the health facility, for reinvestment in supplies, infrastructure, and related items; (2) providers, as bonuses or salary top-ups; and (3) in some cases, facility management and administrative staff, also as bonuses or salary top-ups. The median percentage allocated to health facilities was 60%, ranging from 10% in Armenia to 100% in Burundi. The median percentage allocated to health care providers was 55%, ranging from 0% in Lesotho to 80% in Burkina Faso. In Armenia, Benin, and the DRC (USAID), a portion (10% to 20%) of the total PBF payment was distributed to facility-based managerial or administrative teams. Typically, the payments were allocated to all facility-based workers or to facility-based workers responsible for PBF indicators.
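The allocation formulas described above amount to splitting each payment across recipient categories. The sketch below uses hypothetical shares chosen only to illustrate the mechanics; as noted, the actual facility share varied from 10% (Armenia) to 100% (Burundi) across programs.

```python
def split_incentive(total_payment, shares):
    """Split a PBF payment across recipient categories.

    shares: fraction of the payment going to each category;
    the fractions must sum to 1.
    """
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return {recipient: total_payment * s for recipient, s in shares.items()}

# Hypothetical split loosely mirroring the patterns described above:
# 60% reinvested in the facility, 30% provider bonuses, 10% management.
amounts = split_incentive(5000, {"facility": 0.6,
                                 "providers": 0.3,
                                 "management": 0.1})
print(amounts["facility"])  # 3000.0
```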

Payment Type
In over half of the programs (n=13), performance on the quality checklists inflated the payments received by health facilities for their quantitative outputs (the carrot-and-carrot approach). Six of the programs were inflators without thresholds, meaning that health facilities earned a quality bonus for any score above 0%. The other 7 programs were conditional inflators, with threshold scores ranging from 50% to 70% on quality checklists; facilities had to exceed this threshold in order to increase the quantity payment.
Deflators or penalties tended to be unconditional (4 programs), meaning that the quantity payment could be deflated from 100% to 0% depending on the quality score. One program used a conditional deflator approach in which the quantity payment could deflate from 100% down to the minimum threshold of 50%, and quantity payments were then discontinuously reduced to zero if the quality score fell below the threshold. Otherwise put, a minimum quality score of 50% was required to receive any PBF payment for the quarter. Three programs were hybrids, meaning that the quality score could serve as either a bonus or a penalty depending on the facility's quality score (range from +30% to -30%). The program in Armenia paid for performance solely based on quality checklists. Five programs included what is called an "equity bonus" for certain health facilities, calculated based on eligibility criteria and ranging from 0% to 30% of the quantity payment. The allocation of the equity bonus was irrespective of the facility's quantity or quality performance and was intended to ensure that incentives were sufficient for health facilities in rural or hard-to-reach areas with low population densities.
Countries with concurrent PBF programs (the DRC and Malawi) demonstrated variability in payment formulas. In the DRC, the USAID-funded project payment was based on achievement against the target, with a cap for each indicator; the quality payment was based on a quality assessment score alongside the quantity indicator score and was subject to its own target (i.e., quality score multiplied by the quality payment cap). By contrast, the program funded by the World Bank, UNICEF, and the Global Fund set the quality bonus at 25% of the quantity payment only after a facility scored 50% or above.

Notes to Table 2: The Quality Performance Threshold (%) is the performance threshold that health facilities must achieve in order to receive any quality bonus or incentive. For instance, in Burkina Faso, health facilities must score at least 50% on the quality checklist to receive a quality bonus; the amount of the bonus is the quantity bonus amount ($) multiplied by the quality score (50% or higher). The Quality Payment Proportion (%) column details the quality payment proportion if the program provides further stipulations to the quality payment calculation beyond the Quality Performance Threshold. For instance, in Djibouti, if a health facility scores above 80% on the quality checklist, it receives a quality bonus equal to 30% of its calculated quantity bonus; however, if a health facility scores 45% on the quality checklist, it loses 10% of its anticipated quantity bonus. The percentages in the Quality Payment Proportion column are the proportion of the quantity payment that is allocated due to the quality score.

In general, using descriptive analysis, our study did not find a relationship between payment type and related programmatic functions, for instance, allocation of incentives (between providers and facilities).
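The Djibouti example above illustrates how a single quality score can act as either an inflator or a deflator. The sketch below encodes only the two stated cases (a score above 80%, and a low score such as 45%); the exact boundaries of the middle band are assumptions for illustration and are not taken from the program manual.

```python
def djibouti_style_adjustment(quantity_bonus, quality_score):
    """Illustrative hybrid rule based on the Djibouti example:
    scoring above 80% earns a bonus of 30% of the quantity bonus,
    while a low score forfeits 10% of it. The 50% cutoff for the
    penalty band is an assumption, not a documented threshold."""
    if quality_score > 0.80:
        return quantity_bonus * 1.30   # +30% quality bonus
    if quality_score < 0.50:
        return quantity_bonus * 0.90   # 10% penalty
    return quantity_bonus              # assumed unchanged in between

print(djibouti_style_adjustment(1000, 0.85))  # 1300.0
print(djibouti_style_adjustment(1000, 0.45))  # 900.0
```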

Health Facility Verification
Nearly all programs (91%, n=29) verified quality scores quarterly (every 3 months); the remaining 3 verified the scores biannually (Table 3). Verifiers were commonly regional government management teams, i.e., provincial, district, or health zone teams. In approximately half (8 of 15) of the programs operating at the secondary and tertiary levels, hospitals were verified using a peer-to-peer technique, in which a team of providers from one hospital verified another nearby hospital. The team composition and sampling of hospitals differed by program. All programs with complete information (n=23) included some type of counter-verification, usually by an independent third party.

Means of Assessment
The means of assessment for quality indicators varied widely among PBF programs and between health facility levels (Table 3). On average, 78% of the indicators collected were measured via checklists (6,656 of 8,490) and largely (over 90%) measured structural aspects including equipment, beds, and infrastructure. Record and register reviews each accounted for 9%, which, given the settings of these programs, required the verifier to page through multiple register books or paper-based patient records. The other assessment mechanisms included direct observation (3%) and patient surveys, staff interviews, and exit interviews (each <1%).

Figure 2 and Figure 3 show the percentage of indicators, in the most recent checklists, that measure specific types of health services; Figure 2 focuses on primary health facilities and Figure 3 on secondary and tertiary facilities. General trends were similar for primary and secondary/tertiary health facilities. Checklists emphasized maternal care and facility management, followed by newborn and child care and facility equipment. On average, maternal, newborn, and child care indicators accounted for 34% of the weight of checklist points (of the maximum number of points per checklist), varying between 37% for primary facilities and 28% for secondary/tertiary facilities. There was a 15% increase in the weight of points for inpatient and outpatient services from the primary facility level to the secondary/tertiary level. The increase was predominantly in structural attributes for inpatient services, such as surgical equipment and supplies. Table 4 lists 54 of the most common PBF program indicators across 10 service delivery categories. A majority (76%) of the indicators measured structural (physical) aspects of the health facility environment, while 24% measured processes of care delivered by the health worker.
Indicators categorized in facility management, infrastructure, and maternal, newborn, and child health were more common (shared) across all checklists, compared with the other service categories.

DISCUSSION
This study found that the quality components of PBF programs are implemented in many contexts and with high variability and complexity. 8,11 Generally, the functional components for measuring and paying for quality (measurement tools, verification, payment formula) are consistent across programs, but the design and implementation differ.
For the programs included in this study, the quality payment formulas are split between bonuses and penalties. Within the same country (the DRC and Malawi), multiple PBF programs employ different payment formulas and allocation divisions for health care providers and facilities. Approximately half of the programs allocate 60% or more of the PBF payment (including the quality and quantity payments) to health facilities for reinvestment, while the other half allocates more than 60% to health care providers or splits it evenly between providers and facilities.
Notes to Table 3: Quarterly equates to a 3-month period of time. Only the most recent quality checklist per program was included in this analysis, amounting to a total of 50 checklists. Definitions for means of assessment: checklist, a verifier physically observes and assigns a point value; direct observation, observation of a clinical consultation by the verifier; facility register, the register into which detailed patient contacts with the health facility are entered; patient record, in which consultation and treatment information is recorded by providers; patient survey, assessing the quality of care through a survey of patients; staff interview, obtaining information and knowledge from staff; exit interview, a formal meeting with a patient who is leaving the facility.

Note to Figures 2 and 3: Classification of service types was guided by international standards into 10 categories to ease comparison. Inpatient and outpatient services were grouped together because the types of indicators and items being measured consisted of similar equipment and services.

Table 4 excerpt (No. (%) of checklists including each indicator):

Infrastructure
Existence of well-kept fencing around health facility buildings: 30 (44%)
A communication system (radio or telephone) is effective 24/7 between health facility and next referral center(s): 21 (31%)
Health map of the geographical area is available (and displayed): 20 (29%)
Plan detailing the maintenance activities to be performed: 8 (12%)
General inventory of all furniture and equipment available: 8 (12%)
Availability of electricity 24/7 (electricity, generator, or solar power): 8 (12%)

Facility Management
Performance or activity reports submitted on time: 32 (47%)
Financial and accounting documents (including for RBF) available and well kept (bank statements, receipts, invoices, etc.): (46%)
Waste is treated and disposed of properly in accordance with regulations of health care waste management (e.g., waste pit, placental pit, incinerator): (40%)
Meeting minutes or documentation available from management or governing committee meetings: 24 (35%)
HMIS data analysis report for the quarter being assessed concerning priority problems: 24 (35%)
Business plan exists and is up to date: 20 (29%)

Maternal Care
All deliveries are carried out by qualified personnel: 27 (40%)

The justifications for the differences in allocation remain unexplained by donors and program designers. Moreover, it is difficult to discern what implications each payment type has on the quality of care provided and on provider behavior. PBF offers autonomy in quality investments through health facility managerial teams. However, determining the amount of a quality bonus requires knowledge of both the quantity and quality scores and the application of a complex formula. The implications of variable levels of fund allocation to facilities or staff, and of facility-level fiscal autonomy and strategic investment in quality dimensions such as infrastructure and equipment, clinician competency, and patient satisfaction efforts, remain understudied.
Notably, the verification process is the most consistent across regions, with similar teams carrying out the verification on quarterly schedules. This is likely due to the availability of regional management teams already on the government payroll or of donor-supported NGOs contracted to undertake the verification. This heavy reliance on regional management teams points to a common challenge faced by many countries: the additional burden placed on District Health Management Teams or equivalent regional teams to undertake regular verification of PBF facilities on an average of 125 indicators per facility. In addition, deploying district officials to conduct verifications could generate a conflict with their other roles, for example, to constructively monitor and support providers. 12 In the case of hospitals, peers may not be effective at verification, but in some contexts they may be among the few experts qualified to assess quality. These issues point to the logistical and operational constraints in which PBF programs operate and also affirm the need for a well-articulated theory (or set of theories) of change for PBF. 13 Moreover, results from Table 3 point to a reliance on the checklist for assessment, limiting the utility of the PBF program to effect improvements in certain aspects of quality. Mixed modalities of assessment can address quality of care more holistically. For example, exit interviews and direct observation can inform the experience and provision of care, while provider interviews can shed light on motivation. (Additional detail about indicator typology and measurement can be found in a related paper. 10 ) Selection of assessment methods is likely informed by trade-offs between cost and quality of data. For instance, register and patient record reviews may be less costly, but the quality of the data may vary.
In Rwanda, patient record review, verified by qualified supervisors, was considered a valuable quality criterion, resulting in systemic improvements in data collection, monitoring, and supervision that contributed far more to the quality improvements than service delivery improvements. 14 Direct observations may yield good-quality data at relatively higher cost. One potential solution is to always conduct register reviews and supplement them with direct observations for a random sample of facilities, maintaining thorough measurement at a lower overall cost. Moreover, the findings from Table 3 suggest that more cost-effective methods of assessment may need to be developed and/or employed, such as clinical vignettes and tablet- or smartphone-based verification. Indeed, the cost-effectiveness of different verification methods should itself be assessed to inform the selection of one method over another, or to justify using mixed methods.

There is also consistency in the quality assessment of service types. Figure 2 and Figure 3 demonstrate a clear preference for incentivizing maternal, newborn, and child health services and inpatient and outpatient services, suggesting a focus on burden of disease (mortality and morbidity). This could reflect homogeneity in the policy priorities of the countries or donors, including the maternal, newborn, and child health focus of the World Bank's Health Results Innovation Trust Fund, which is involved in these programs. Community engagement, non-communicable diseases, and pharmacy appear to have the fewest associated indicators, suggesting that these may be the hardest to measure (community engagement), represent a relatively low burden of disease or surveillance priority in the included countries (non-communicable diseases), or be the hardest areas in which to effect systemic improvements using PBF (supply chain in pharmacy).
Our study also highlights the need for more systematic documentation. Theoretically, PBF should offer a wealth of data on quality of care given the length and frequency of measurement; however, this information remains hard for all actors to access. For policy makers and PBF practitioners, there is no comprehensive central repository for PBF program manuals and quality tools. The current structure of PBF manuals and quality checklists, long documents in PDF format, is not conducive to information sharing and aggregation, so the current state of practice has been unknown up to this point. Performance data from quality tools are inaccessible at a country or health facility level, with the notable exception of the PBF portal, an online platform that displays quantity and quality data at the facility level for select countries. 15 Although the portal is an important first step, sharing of health facility performance per quality indicator is required to better understand what types of quality measures are well suited for PBF. The growing PBF Community of Practice could be a good place to house both programmatic documentation and available performance data. 16 While our findings shed light on the current and past state of practice of addressing quality in PBF, they raise further questions. The observed differences in payment formula and allocation, service types, and length of the tools call for further examination of why each program is unique, the justification for the differences, and, most importantly, whether differences in design are associated with differential program impacts. Future foundational research could model the various incentives we identified in real-life PBF programs to characterize which approaches may be most effective, at least in theory.
Specific research gaps related to program operations include detailed performance data and the percentage of incentives paid based on quality, which would allow assessing the cost-benefit to management and providers of completing the quality tool and investing in quality improvement measures. There is also the black box of PBF costs: calculating time costs to facility staff and quality-specific costs, predominantly verification costs. These costs and benefits should be compared with those of other quality assessment methods already in use, such as supportive supervision, accreditation, and independent quality evaluations by NGOs.

CONCLUSIONS
PBF is a potentially appealing instrument to address shortfalls in quality of care and, ultimately, to help meet policy priorities at the country and global levels, including the ambitious goals set forth in the SDGs. As our review of 32 PBF programs highlights, there is substantial variation and complexity in how programs incorporate quality of care considerations. There are differences in how quality is incorporated in the payment formula, how many and what indicators are included in checklists, and how they are measured. While PBF programs should be aligned with local conditions and must primarily focus on executing payments, the heterogeneity and similarities between programs suggest scope for learning how these programs can more effectively incentivize and support providers to address gaps in quality. 11 More research and policy effort is urgently needed to make the best use of PBF as a targeted supply-side intervention.