Close to Home: Evidence on the Impact of Community-Based Girl Groups

Available evidence, though limited, shows that programs can use community-based girl groups to help adolescent girls improve attitudes toward gender roles and norms, early pregnancy, and child marriage; evaluations indicate they have suboptimal performance on health behavior and health status.


INTRODUCTION
Governments in countries that have populations of median age under 25 1 face demographic pressure as the result of infant mortality gains and high birth rates. Their young age structures offer an unprecedented opportunity for progress, which has stimulated global commitment to adolescents and, in particular, adolescent girls. Although attention to adolescent girls in low- and middle-income countries (LMICs) has increased dramatically, 2,3 hundreds of millions of adolescent girls still lack access to essential services and basic human rights. Despite progress, globally 12 million girls are still married as children annually, 4 and in sub-Saharan Africa, 35% of girls (versus 30% of boys) are not in school. 5 Girls at the highest risk of the worst outcomes, like child marriage, early pregnancy, and HIV infection, often miss the benefits of social sector programs because of their socially isolated and marginalized status. Girls who lack contact with schools, where youth programs often take place, also may be excluded from formal health and financial services and labor markets. Adolescent girls with access to health facilities rarely receive adolescent-friendly services; providers may overlook their specific health needs or treat them insensitively. 6
Some programs use community-based girl groups (CBGG) to address risk for girls who are hard to reach through formal delivery channels like schools and health services. In CBGG programs, girls and young women meet regularly with a leader (e.g., a mentor) who uses a variety of pedagogical methods to address sexual and reproductive health (SRH), HIV prevention, life skills, economic and financial outcomes, and other topics.
a Poverty, Gender, and Youth Research Program, Population Council, New York.
Global Health: Science and Practice 2020 | Volume 8 | Number 2
CBGGs are proliferating across geographic regions. For example, under the Determined, Resilient, Empowered, AIDS-free, Mentored, and Safe (DREAMS) Partnership to reduce HIV infections among adolescent girls and young women, implementing partners in 14 countries in sub-Saharan Africa and Haiti use CBGGs to build adolescent girls' and young women's social and other assets (e.g., cognitive, economic, health assets). 7 Often, these are called "safe space" programs because they meet in community-based venues that girls and parents perceive as safe and private, which can reduce barriers to attendance and enable discussion of sensitive issues. The Population Council tests the CBGG model based on a theory of change that posits that when multisectoral programs address girls holistically, content is tailored to respond to heterogeneous girl segments, and group meetings are accessible and mentor-led, they can build girls' protective assets and empower them to reduce risk and increase opportunity in the right environment. 8
Increasingly, randomized controlled trial (RCT) evidence joins the body of quasi-experimental studies of CBGG programs, expanding both the amount and type of evidence available. However, this evidence is not always available to funders and implementers in an accessible form they can use to inform decision making. One explanation is that there has been little analysis of the evaluation evidence specific to CBGG programs, although they are included in broader reviews. 9-11 The time is right to consolidate what is known about CBGGs to help donors, researchers, policy makers, and implementers make informed decisions regarding funding, research, policy, and practice. 12
To help fill the gap between evidence generation and evidence use, we conducted the first-ever literature review focused on the evidence on CBGG programs. We explored how programs with CBGGs were designed and their effects.
We also identified questions that merit further research to inform programming to empower girls and advance their well-being. By critically reviewing impact evaluation evidence on CBGGs in LMICs, we aimed to answer 4 questions:
1. What design features do CBGGs with impact evaluations have?
2. What did those evaluations measure?
3. What were the program effects on girls?
4. What type of study designs generated which results?
The literature on CBGG programs was subjected to rigorous selection, search, abstraction, and analysis methods to produce a holistic, informed assessment of this program delivery model.

Study Selection
We reviewed literature in search of evaluations of programs that used group-based methods to deliver content to adolescent girls to build their life skills and empower them. To be considered for our analysis, the program had to include: (1) a group of 10- to 19-year-old girls who met regularly (i.e., more than once); (2) a female mentor who received dedicated training for the role; and (3) a meeting venue located in a community setting rather than a formal institution (e.g., not hospitals or schools during formal classroom hours). We considered group leaders as "mentors" if they were at least slightly older than participants, consistent with the majority of programs in our sample; peer educators also were considered if they fit our criteria.
Programs underwent 2 levels of screening to be included in our analysis. The first screening assessed if the evaluated program included the elements described above. The second screening focused on the rigor of the evaluation methodology. To clear this screening, evaluations had to have: an experimental or quasi-experimental impact evaluation design, data collected at a minimum of 2 time points, an intervention and control/comparison group, and quantitative program effects and probability values (P values). We also included descriptive publications (i.e., those that did not report P values) if they provided supporting information about programs that were described in other papers in our sample. Evaluations that constructed a post hoc comparison group using statistical methods, such as propensity score matching, did not pass this screening.
We limited our search to peer-reviewed and non-peer-reviewed (gray) literature in English published between 2000 and 2017.
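
The two-level screening described above can be read as a pair of yes/no checks applied to each candidate evaluation. The following is only an illustrative sketch: the `Evaluation` record and all of its field names are hypothetical, introduced here for the example, and the exception for supporting descriptive publications without P values is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    # Hypothetical record; every field name is illustrative, not from the review.
    girls_10_to_19_meet_regularly: bool  # group met more than once
    trained_female_mentor: bool
    community_venue: bool                # not a hospital or formal classroom hours
    experimental_or_quasi: bool
    time_points: int                     # number of data collection rounds
    has_comparison_group: bool
    reports_p_values: bool
    post_hoc_comparison_group: bool      # e.g., built by propensity score matching
    in_english: bool
    year: int


def passes_first_screening(e: Evaluation) -> bool:
    """Program-level check: girl group, trained female mentor, community venue."""
    return (e.girls_10_to_19_meet_regularly
            and e.trained_female_mentor
            and e.community_venue)


def passes_second_screening(e: Evaluation) -> bool:
    """Rigor check on the evaluation design."""
    return (e.experimental_or_quasi
            and e.time_points >= 2
            and e.has_comparison_group
            and e.reports_p_values
            and not e.post_hoc_comparison_group)


def is_included(e: Evaluation) -> bool:
    """Apply the language/date limits, then both screenings."""
    in_scope = e.in_english and 2000 <= e.year <= 2017
    return in_scope and passes_first_screening(e) and passes_second_screening(e)


# Two made-up records that differ only in how the comparison group was built.
example = Evaluation(True, True, True, True, 2, True, True, False, True, 2012)
matched = Evaluation(True, True, True, True, 2, True, True, True, True, 2012)
print(is_included(example), is_included(matched))
```

Keeping the two screenings as separate functions mirrors the text, where a document could clear the program-level check and still be removed on design grounds.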

Participants
In our review, we sought evaluations of programs that targeted adolescent girls aged 10 to 19 years who were married or unmarried. Programs with young women (i.e., aged 20-24 years) were included only if adolescent girls also were enrolled. For programs with older participants (i.e., aged over 24 years), the analyses had to be stratified by or controlled for age to pass our screening. Programs that included adolescent boys and young men also passed the screening if their analyses controlled for sex or disaggregated results.

Outcomes
To understand the programs' operations and reported effects, we assessed both implementation science and impact evaluation findings. The evaluations used a large variety of impact measures across programs that encompassed both proximal and distal effects on outcomes. Program evaluations relied heavily on self-reported data, and a few used objective methods to measure the effects (e.g., biomarker testing for HIV, herpes simplex virus 2 [HSV-2], pregnancy status; banking information about savings amounts; problem sets to gauge numeracy and literacy levels).

Search Strategy
We searched for related publications and captured them based on a review of titles, abstracts, and summaries. To identify papers for our sample, we consulted systematic and other reviews of evidence on interventions for adolescents 11,13-16 and 3ie's evidence gap map on adolescent SRH. 17 We also consulted research and journal databases (e.g., Google Scholar, JSTOR, EBSCO's Academic Search Complete, POPLINE, and DeepDyve) using key words including "girl-centered," "safe spaces," and "mentor." We also reviewed web sites of relevant implementing organizations with a history of programming for adolescent girls in LMICs. Programs outside LMICs were excluded.

Data Extraction
We extracted program details including: design features (country, setting); program aims; descriptions of participant details (girls' characteristics, mentor qualifications); group characteristics (group size, meeting frequency, program duration, topics covered including health services and male engagement activities); and evaluation details (sample size, program effects).

Data Analysis and Synthesis
For reporting purposes, we created and defined effect categories based on the description in the evaluations and the stated program goals. To enable the interpretation of the wide range of evaluation results, we constructed 8 outcome domains that aggregated the range of effects evaluated. The outcome domains are: (1) health beliefs and attitudes, (2) gender beliefs and attitudes, (3) education-related outcomes, (4) psychosocial outcomes, (5) health and gender knowledge and awareness (6 of 7 on health), (6) economic and financial outcomes, (7) health-related behavior, and (8) health status. If evaluations used multiple indicators to assess the same outcome, we combined them into 1 aggregated effect per study. For example, in the psychosocial outcome domain, social support is a composite of numerous indicators: sociability, number of friends, ability to go to girl/youth groups, has at least 1 social safety net, social inclusion index, and others (Table 1).
Within each domain, we report beneficial measures (statistically significant [α=0.05] changes in the intended direction, i.e., the protective direction [<null value] for detrimental outcomes and the positive direction [>null value] for advantageous outcomes) and null (nonsignificant) measures for each effect. We also assessed the total number of times that evaluations measured effects in each outcome domain across the programs. Analyzing effect sizes was beyond the scope of the review. We considered unintended effects as a statistically significant change in the detrimental direction but excluded them from the analysis.
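
The classification rule above can be sketched in a few lines of code. This is a minimal illustration, not the authors' procedure: the function names and the sample measures are invented for the example, and treating "significant" as P < 0.05 (rather than P ≤ 0.05) is an assumption.

```python
from collections import defaultdict

ALPHA = 0.05  # significance threshold stated in the text


def classify(p_value, in_intended_direction):
    """Label one effect measure as beneficial, null, or unintended."""
    if p_value < ALPHA:  # assumption: strict inequality
        return "beneficial" if in_intended_direction else "unintended"
    return "null"


def share_beneficial(measures):
    """Return, per domain, beneficial / (beneficial + null).

    `measures` is an iterable of (domain, p_value, in_intended_direction).
    Unintended effects are dropped, as in the review's analysis.
    """
    counts = defaultdict(lambda: {"beneficial": 0, "null": 0})
    for domain, p_value, in_intended_direction in measures:
        label = classify(p_value, in_intended_direction)
        if label == "unintended":
            continue
        counts[domain][label] += 1
    return {d: c["beneficial"] / (c["beneficial"] + c["null"])
            for d, c in counts.items()}


# Invented measures, for illustration only.
sample = [
    ("health status", 0.01, True),    # beneficial
    ("health status", 0.30, True),    # null
    ("health status", 0.20, False),   # null (nonsignificant)
    ("health status", 0.04, False),   # unintended, excluded
    ("psychosocial outcomes", 0.001, True),
]
shares = share_beneficial(sample)
```

With these invented inputs, the health status domain has 1 beneficial and 2 null measures, so its share is one-third, and the excluded unintended effect does not enter the denominator.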

Ethics
Since this study did not involve human subjects research, we did not seek institutional review board approval.

Literature Search Results
The initial review produced 183 manuscripts, articles, and reports. The first screening eliminated 73 documents; we subjected the remaining 110 publications to the second screening and removed an additional 62 whose evaluation design did not meet our requirements. This left 48 publications that reported on evaluations of 30 programs: 14 RCTs and 16 using quasi-experimental design (Figure 1). The program details and reported findings for these programs are found in Table 2.
b Evaluation used cross-sectional surveys to collect baseline and end line data, although the methodology report didn't contain details on matching or follow-up. Based on the assumption that baseline and end line samples covered different people, we aggregated the number of respondents across both in the calculation.
c Though female genital mutilation/cutting (FGM/C) significantly increased for participants in control group compared to intervention arm, the study cites differing FGM traditions as a possible reason, e.g., the ages at which villages traditionally perform FGM/C. The difference between baseline and end line prevalence shows most girls (>50%) in program villages entered the program already circumcised, while most girls in control villages (<40%) were not. This suggests control villages perform FGM/C at later ages than program villages and the statistically significant difference-in-difference calculation between program and control villages might not be attributable to the intervention.
d Evaluation reports more female than male participants (405 versus 303) but doesn't report numbers of females/males in each arm.
e Evaluation provides sex-stratified demographic information/analyses; doesn't report numbers of females/males in each arm.
f Total sample size for both intervention and control/comparison arm; evaluation doesn't specify numbers for each.
g Evaluation controls for sex in multivariable models but doesn't report numbers of females/males in each arm.
Evidence on the Impact of Community-Based Girl Groups www.ghspjournal.org

Implementation Science Findings
Program design and the quality of implementation influence program effects. The replication of a program with proven efficacy may fail to have the same real-world effect if not implemented with fidelity to the original design. Despite this, research on design features is largely missing from the literature. 10 We sought to fill this gap by collecting information on selected design features of programs in our sample. To note, not every publication provided the same amount of program design, planning, and implementation details. In addition, information was insufficient to compare the attributes of individual programs in our sample and rigorously assess success factors.
The amount of information on design features varied considerably. Of 30 programs, 16 reported on the size and 21 on the frequency of group meetings (Figure 2). The most common group size was 15 to 25 girls, who typically met in groups weekly for 1 to 3 hours. Although no clear pattern emerged on program lifespan, nearly half of those reporting this information operated for more than a year. Information on girls' actual participation is needed to assess exposure; however, less than half the programs reported this. According to that information, programs retained an average of 75% of participants (definitions of retention varied from 50% to 100% of sessions).
The information provided about coverage revealed that the largest number of programs targeted unmarried girls aged 13-18 years who were both in school and not in school; more programs occurred in rural than in urban areas (14 rural, 9 urban, 7 in both; Figure 3). The limited details about which girls the programs tried to reach made it difficult to determine if they targeted girls at highest risk of the outcomes they sought to address. For example, for HIV prevention, were the girls who learned about condom self-efficacy the same girls having unprotected sex with an older partner? For child marriage prevention, were the girls who learned about the risks of early marriage the girls most likely to be married off?
Around one-third of programs reported that they adapted aspects of program design to different girl segments. Underscoring the importance of recognizing adolescent girls' heterogeneity, participation and program effects varied between types of girls. The subset of evaluations that disaggregated participation rates by girl segment found that younger girls attended more frequently than older girls and unmarried girls attended more frequently than married girls, whose responsibilities and social expectations differ. The variation in participation points to the importance of disaggregating design features and evaluation results for programs that target large, diverse groups of girls (for instance, girls aged 10 to 19 years, or both girls in school and not in school), which characterized around half the programs in the sample.
CBGG programs used a variety of interventions to deliver content to girls (Figure 4). In addition to serving as a base for referrals and community engagement, enhancements may have influenced outcomes for girls. All but 4 programs included content on life skills. Only 2 of the 30 programs restricted themselves to a single content area; in 17 programs, mentors combined life skills training with activities related to economic and financial outcomes, like income generation skills, financial literacy training, and access to microsavings or cash transfers. Nearly one-third of programs included activities to strengthen access to and/or quality of health services, such as health vouchers. Programs also included recreational activities such as sports and games. Across different content areas, regular group meetings built social support with mentors and peers to reduce social isolation. To complement the girl-centered content and promote an enabling environment, program staff used varied tactics to engage community members, local leaders, families, and male partners.
Programs recruited female mentors who often were local to the program community. Although most mentors were lay people, 4 programs recruited professionals from relevant fields, such as teachers and program staff. A schooling qualification was common, primarily secondary school graduation or the local equivalent. The mentors received specific training for their role; among those reporting this information, mentor training lasted 5 days or longer, and a few programs conducted refresher training following the initial mentor training. Despite the central role of mentors in this program model, reports rarely included details like selection criteria, job descriptions, and training strategies.

Distribution of Program Effects by Outcome Domain
Assessment of Evidence Base. Table 1 presents the total number of times that evaluations measured the effects in each outcome domain across programs. Figure 5 shows the amount of evidence available for each domain and the number of times those outcomes were measured; a program contributes 1 "time reported" (i.e., the y-axis) per effect (e.g., increased mobility). Evaluations measured multiple outcomes and, therefore, could be counted more than once per domain. For example, an evaluation could contribute 2 times reported to the psychosocial outcomes domain if it measured both mobility and social support. Health-related behavior was the most frequently measured domain, followed by knowledge and awareness on health and gender, then psychosocial outcomes and health status.
Figure 5 also shows the reported beneficial and null effect measures. In absolute terms, evaluations reported the largest number of beneficial measures for knowledge and awareness on health and gender, followed by psychosocial outcomes, then health-related behavior, economic and financial outcomes, gender beliefs and attitudes, education-related outcomes, health beliefs and attitudes, and health status. The number of beneficial effect measures in each outcome domain is not strictly comparable because the quantity of reported measures varied between domains. For example, programs had more opportunity to display changes in health-related behaviors than education-related outcomes because more reported on the former than the latter. To avoid biased interpretation, it is more informative to compare the number of beneficial measures with the overall number of measures (beneficial+null) within each domain.
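
The relative comparison can be written compactly. As a sketch, using notation introduced here rather than taken from the evaluations, let $b_d$ and $n_d$ be the numbers of beneficial and null measures reported in domain $d$:

```latex
s_d = \frac{b_d}{b_d + n_d}
```

The same ratio applies to a single effect: an effect that was measured 8 times and found beneficial 3 times has a share of $3/8 = 37.5\%$.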
In relative terms, programs reported more (i.e., >50%) beneficial measures than null ones for beliefs and attitudes about health and gender, education-related outcomes, psychosocial outcomes, knowledge and awareness on health and gender, and economic and financial outcomes. Programs reported fewer (<50%) beneficial measures for health-related behaviors and health status. Results for each domain are detailed below in order of the proportion of beneficial effect measures, from most to least relative benefits.
Review of Evaluations and Their Effects. Figure 5 differentiates effects by study design. It indicates the likelihood that results are generalizable given that results of RCTs are more robust than other designs, although all impact evaluations in the sample met our criteria for rigor (as described above). To note, most effects on health status, health-related behavior, and knowledge and awareness were measured in RCTs, and quasi-experimental studies focused heavily on psychosocial outcomes. Across all outcome domains, quasi-experimental studies reported more beneficial measures than RCTs.

Health Attitudes and Beliefs
Programs focused on topics that threaten girls' growth and development, such as early pregnancy and female genital mutilation/cutting, to shift their attitudes about their health. Seven programs sought to change girls' health beliefs and attitudes; in total, 91% of the effect measures reported a significant change in the intended direction, making this the domain with the highest proportion of beneficial measures (Table 1).
The evaluation of Ishraq, a program in Upper Egypt to empower adolescent girls and improve their knowledge and attitudes to promote healthy and safe transitions to adulthood, reported that it improved girls' attitudes toward performing female genital mutilation/cutting on their daughters in the future. 22 Regai Dzive Shiri was a cluster RCT to reduce HIV among Zimbabwean youth who were in school and not in school through work with community members, clinic staff, and young people. Its evaluation reported that it increased girls' concerns about unprotected sex (Table 2). 63,64

Gender Attitudes and Beliefs
Programs aimed to shift participants' beliefs and attitudes toward a more egalitarian stance by addressing practices like child marriage and gender-based violence (GBV). Twelve programs aspired to change girls' attitudes and beliefs regarding gender; collectively, 72% of this domain's effect measures were beneficial (Table 1).
Program evaluations reported improvements in girls' attitudes or perceptions toward GBV, child marriage, and gender roles and norms. For example, an evaluation of Choices, a curriculum-based program to shift gender-related attitudes and behaviors in rural Nepal, reported that the program reduced girls' acceptance of GBV. 42 An evaluation of Better Life Options, a life skills education program in Uttar Pradesh, India, reported that it improved girls' attitudes toward child marriage (Table 2). 30

Education-Related Outcomes
Programs aimed to improve education-related behaviors (e.g., school enrollment) and skills (e.g., numeracy). Evaluations of 10 programs assessed education-related effects and reported beneficial effects 65% of the time they were measured (Table 1).
Overall, program evaluations reported improvements in girls' numeracy skills and increases in school enrollment. In Ethiopia, Biruh Tesfa worked with marginalized girls to improve education-related outcomes. Among participants with no formal schooling, the evaluation reported that the program increased girls' numeracy and literacy scores. 27 An evaluation of the scale-up of Ishraq reported that girls' reading comprehension and multiplication skills improved (Table 2). 23

Psychosocial Outcomes
Evaluations used a variety of indicators to track psychosocial outcomes, which include self-efficacy, mobility, autonomy, and social support, as well as experience of gender discrimination. Evaluations of 19 programs reported psychosocial outcomes, and 64% of these effect measures were statistically significant. Proportionally, more than half of the measures of social support, assertiveness, and girls' self-efficacy regarding SRH behaviors (such as condom use and HIV testing) were beneficial (Table 1).
The evaluation of BRAC's Employment and Livelihood for Adolescents program, which aimed to reduce child marriage, keep girls in school, and increase girls' peer socialization in Bangladesh through income generation and group activities, reported that it increased girls' mobility. 19 The Young Citizens Program in Tanzania used education and community mobilization to strengthen very young adolescents' agency in planning and implementing health promotion activities related to HIV. The evaluation reported that it increased girls' efficacy to assert their thoughts and opinions with peers and adults. 50 Also in Tanzania, the evaluation of Mabinti Tushike Hatamu!, a program to reduce the vulnerability of girls who were not in school, reported that it increased the number of girls who said that community leaders requested their opinion (Table 2). 49

Knowledge and Awareness about Health and Gender
Seventeen programs aspired to improve girls' knowledge about health topics, like HIV and marriage-related rights. Their evaluations reported beneficial effect measures 62% of the time, with more success on knowledge measures related to health (63%) than to gender (50%). Evaluations reported more beneficial effects regarding HIV and reproductive health knowledge than regarding sexually transmitted infection (STI) and menstrual regulation knowledge and awareness of marriage-related rights (Table 1).
An evaluation of the Suubi & Bridges Project, a Ugandan peer mentorship program to protect AIDS-orphaned adolescents against HIV and STIs by providing culturally appropriate HIV information, reported that the program increased HIV knowledge. 53 In India, Promoting Change in Reproductive Behavior (known as PRACHAR) in Bihar aimed to increase contraceptive use and delay pregnancy. Although it reportedly increased reproductive health knowledge, it did not succeed in delaying first pregnancy (Table 2). 32

Economic and Financial Outcomes
Evaluations of 15 programs measured economic and financial outcomes and reported beneficial measures 60% of the time. The effects with the highest proportion of beneficial measures were increasing girls' employment, savings accounts, and household assets, as well as decreasing food insecurity. The results related to girls' earnings were mixed (i.e., 50% beneficial), and according to the evaluations, no program reduced dowry practices (Table 1).
The evaluation of the Shaping the Health of Adolescents in Zimbabwe (known as SHAZ!) Project, which aimed to prevent HIV among adolescent girls through structural interventions, reported that it increased girls' receipt of their own income. 65 Siyakha Nentsha was a 2-armed intervention in South Africa to improve girls' and boys' economic well-being that provided training on life skills, HIV/STI prevention, and social capital building. One arm also received household financial management and small business planning (financial education arm) and another received training in sexuality, reproductive rights, and stress and violence reduction (stress management arm). The evaluation reported that Siyakha Nentsha increased the number of savings accounts (stress management arm) and girls' interaction with banks (financial education arm) (Table 2). 45

Health-Related Behavior
Nineteen programs sought to improve behaviors, especially those related to SRH (e.g., transactional sex, condom use). Collectively, 38% of the effect measures reported for this domain were beneficial. Effects that were beneficial every time they were measured included: increased secondary abstinence; menstrual hygiene management; and violence treatment, support, and/or prevention services. One-third of the programs included complementary activities to improve access to and quality of health services; however, evaluations reported that health service utilization significantly increased only 50% of the time it was measured. Child marriage significantly decreased nearly 40% of the times it was measured according to evaluation reports. Most program evaluations reported null effects for girls' number of sex partners, transactional sex, condom use, sexual debut, and contraceptive use (Table 1).
Although well under half of this domain's measures were beneficial (38%), individual programs reported notable changes in health behavior. The Bangladeshi Association for Life Skills, Income, and Knowledge for Adolescents (known as BALIKA) program aimed to reduce child marriage using weekly girl-only meetings combined with different topics across 3 study arms. The program's evaluation reported that it decreased the odds of child marriage across all 3 arms; girls in the tutoring arm had the lowest odds of marriage before the age of 18. 18 The evaluation of Networks of Hope, a multi-arm South African program to reduce HIV risk by improving psychological and behavioral outcomes, reported that it increased girls' consistent condom use. 44 In a rare example of a longitudinal effect measure, the evaluation of a Mexican program, Cuídate! Promueve tu Salud, reported that it increased participants' age at first sex in a 4-year follow-up survey (Table 2). 41

Health Status
Evaluations of 11 programs (8 were RCTs) assessed changes in health status using self-reports and biomarkers. Few evaluations reported statistically significant improvements in health status effects, such as experience of physical violence and HSV-2 incidence; only 26% of the health status measures were beneficial. The 4 programs that measured HIV incidence did not report a decrease. Evaluations reported that measures of decreasing girls' experience of sexual and physical violence were null more often than beneficial. No programs reported mental health improvements or STI reductions (Table 1).
Stepping Stones is a program to improve sexual health with participatory learning to build knowledge, risk awareness, and communication skills. Its evaluation reported that the program reduced HSV-2 incidence. 46 The evaluation of Growing Up Safe & Healthy in Bangladesh, which used a multipronged delivery model including male groups, female groups, and community mobilization, reported it decreased girls' experience of physical and/or sexual violence (Table 2). 20

DISCUSSION
The expanding evidence base on CBGGs enables an analysis of their effects across programs and countries. Notably, the size of the evidence base varies for each outcome domain and limits comparability between the summaries of impact. The variation reflects funding patterns for CBGGs, which are dominated by HIV prevention, explaining the preponderance of health behavior measurement. The results only describe what was measured, which may or may not encompass all the changes resulting from the programs. For these reasons, the relative assessment, which indicates how the program did in relation to its aims, is more informative than the absolute assessment.
Different types of study designs in our sample yielded different types of results. In general, the RCTs emphasized outcomes that could be objectively measured in the domains of health status and behavior (albeit mostly self-reported). The quasi-experimental evaluations tended to emphasize outcomes that are more complex to measure, such as psychosocial outcomes and attitudes.
Evaluations of programs using CBGGs reported improvements in girls' attitudes and beliefs about gender and health; boosts in education-related outcomes, such as numeracy and school enrollment; and increases in girls' economic and psychosocial assets. They also reported positive effects on knowledge and awareness about health and gender. In general, these results suggest that CBGGs appear to have more potential to impact individual outcomes than outcomes that rely on a group. Theoretically, all of these are along the causal pathway to good health.
Despite the reported boost that programs gave to mediating factors that theoretically improve health behavior and health status, reports of program performance on behavior and health status are mixed. For instance, condom use increased less than half the times measured (5 of 11) and contraceptive use increased one-third of the times measured (3 of 9). Only one-quarter of reported measures of girls' health status (e.g., experience of physical or sexual violence, fertility, STI incidence) were statistically significant, and child marriage practices improved just under half the time that evaluations measured them (3 of 8 times). These results are not unexpected given that attitudes and knowledge change faster than behavior and, ultimately, health status. 66 The theoretical pathway to health behavior change is well-documented and offers possible reasons that changes in mediating factors did not consistently translate into behavior change and better health within evaluation time frames. Explanations relate to girls' locus of control and program and study designs. 67,68
First, the main benefits of CBGG programs reflect changes that are internal to girls (for example, attitudes toward child marriage, demand for health services, self-esteem, and literacy). In general, effects are weaker on outcomes that rely on factors external to girls, such as condom use, HIV testing, child marriage, and health service utilization. This difference may reflect inequitable interpersonal relationships; weak access to transport, finances, and services; and other socioeconomic factors that impede girls' ability to exercise their voice, choice, and control over behaviors and, consequently, their health and well-being. Notably, most programs with CBGGs included activities to engage community members that theoretically have the potential to reduce barriers to behavior change. However, details on community engagement and its influence on girl-level outcomes were rarely reported in the impact evaluations in our sample.
Second, related to study and program design, participation rates varied between different subpopulations of girls. This may have led to mixed effects for different girl segments that reported results may have masked. For instance, if younger girls participate more in meetings than older girls, they may derive more benefits that may not appear in a summary effect measure. 61 Zambia's Adolescent Girls' Empowerment Program documented more participation among younger and rural participants than older and urban ones; not surprisingly, the evaluation found that younger unmarried girls benefited more than older married girls. 61 Given mentors' central role in delivering content in CBGGs, their performance is another important mediator of effects masked by aggregated results. The scant evidence available on mentor quality indicates that mentors' own characteristics and the quality of their performance are a major source of variability in girls' participation and impact. 61 Aggregated results of impact evaluations of programs for diverse groups of girls (e.g., girls aged 10-19 years in school and not in school) and mentors risk eclipsing effects for some subsets of participants in the absence of disaggregation.
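The masking effect of aggregation described above can be illustrated with a small sketch. The subgroup labels, sizes, and effect sizes below are invented purely for illustration and are not drawn from any evaluation in the review; the sketch assumes only that a pooled effect behaves like a participant-weighted average of subgroup effects.

```python
# Hypothetical illustration only: subgroup sizes and effects are invented
# and do not come from any evaluation discussed in this review.
subgroups = [
    # (label, number of girls, effect in percentage points)
    ("younger, high attendance", 400, 8.0),
    ("older, low attendance", 600, 0.0),
]


def pooled_effect(groups):
    """Participant-weighted average of the subgroup effects."""
    total = sum(n for _, n, _ in groups)
    return sum(n * effect for _, n, effect in groups) / total


pooled = pooled_effect(subgroups)

for label, n, effect in subgroups:
    print(f"{label}: n={n}, effect={effect:.1f} points")
print(f"pooled: effect={pooled:.1f} points")
```

Under these assumed numbers, the pooled estimate falls well below the effect in the high-attendance subgroup, which is the sense in which a summary measure may understate benefits for some girl segments unless results are disaggregated.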
Third, related to study design, when and what outcomes the impact evaluations measured influenced our results. The types of outcome measures that dominated impact evaluations and the data collection instruments used may not have been adequate to capture the types of changes that CBGGs are most likely to bring about. In addition, most evaluations captured short-term effects after programs ended; they rarely returned to measure long-term impact. A few notable exceptions include Mexico's Cuídate! Promueve tu Salud, where researchers returned 4 years after activities ended to assess the durability of effects. Most young adolescents are not yet sexually active; given the possibility that younger participants attend more regularly than older ones, it is conceivable that the most active CBGG participants faced the least behavioral risk within evaluation time frames. This would limit the likelihood of evaluations finding sexual behavioral and health effects. Long-term follow-up would reveal if benefits endure and these girls reduce behavioral and health risks as they age or if benefits wash out over time.
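The counts quoted in this discussion (5 of 11 for condom use, 3 of 9 for contraceptive use, and 3 of 8 for child marriage) can be restated as shares of the times each outcome was measured. The following is a minimal sketch of that arithmetic; the dictionary keys and the helper name are illustrative choices, and the tally is a simple count of improvements, not a pooled effect estimate.

```python
from fractions import Fraction

# Counts quoted in the discussion: (times improved, times measured).
counts = {
    "condom use": (5, 11),
    "contraceptive use": (3, 9),
    "child marriage": (3, 8),
}


def share_improved(improved, measured):
    """Exact share of measurements that showed improvement."""
    return Fraction(improved, measured)


shares = {name: share_improved(*pair) for name, pair in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share.numerator}/{share.denominator} ({float(share):.1%})")
```

Each share is below one-half, consistent with the characterization of behavioral results as mixed.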

Limitations
The summary of CBGG effects is informative. However, limited evidence and the lack of comparability between studies make these results preliminary. The small size of the evidence base, as well as the tremendous variability in the study designs, implementation features, and outcomes measured, prevented us from conducting a meta-analysis, which would have enabled us to assess effects across programs. More evidence, including from implementation science research, would shed light on the most promising design features, making the practical implications of impact evaluation results clearer. In addition, too few multicomponent studies compared different combinations of interventions and content to enable a detailed assessment of attribution. For example, we could not assess the effect of group-level changes resulting from community engagement activities that may have influenced girl-level effects.
Although the literature review was comprehensive, it was not a systematic review; as a result, we may have missed relevant evidence. The tendency to favor positive results in publications may have led us to overestimate the benefits of CBGGs. Additionally, evaluations relied heavily on self-reported information, which introduces the possibility of social-desirability and recall biases. Finally, although the RCTs were designed to reduce the risk of selection bias, it is possible that girls who joined CBGG programs and participated regularly differed from nonparticipants and dropouts in ways that influenced the likelihood of impact.

SUMMARY AND IMPLICATIONS FOR PROGRAMS, POLICIES, AND RESEARCH
Most CBGGs in our sample included 20 (±5) girls, met weekly for more than an hour, and lasted for a year or longer; they frequently combined life skills training with content to promote economic and financial outcomes, such as financial literacy or access to microfunds/bank accounts. Providing girls with an opportunity to build social connections with peers and mentors in a safe space has intrinsic value. Furthermore, the evaluations in this review indicate that programs with these characteristics can use locally recruited female mentors to build girls' economic and psychosocial assets; improve their attitudes, beliefs, knowledge, and awareness on health and gender; and enhance education-related outcomes. Enhancements found in many programs, such as community engagement and health services strengthening, may have influenced the impact of the CBGGs on girls. These results suggest that CBGGs have more potential for benefits that may contribute to girls' empowerment than to their health in the near term. Girls' empowerment, which encompasses their voice, choice, and control over key aspects of their lives, can increase their likelihood of growing into successful, healthy adults. 69,70 Empowerment is a critical development goal in itself that can position girls to make decisions and affect outcomes of importance to themselves, their families, and their communities, especially when the social environment supports these changes. Beyond direct benefits, a girl's empowerment can affect other aspects of her health and well-being. As girls gain voice, choice, and control, in the context of an enabling environment, over time they may benefit from improved outcomes, including delayed marriage and pregnancy, reduced violence, better health, more education, and greater learning.
Ultimately, these positive shifts may improve girls' and women's well-being and life chances and reduce the intergenerational transmission of poverty.
These results have implications for research. As the evidence on CBGGs grows, future studies should assess the types of girl-level changes CBGG programming is most likely to bring about, including neglected outcomes such as mental health and nutrition. More evidence would enable a rigorous comparison (such as a meta-analysis) of how this program model performs on key outcomes, like child marriage, relative to other interventions, which would make an important contribution to the evidence base. Ensuring impact evaluations are robust and illuminate program methodology and outcome measurement is paramount; using comprehensive research reporting standards and guidelines can help. 71,72 Future evaluations also should consider using triangulation techniques (i.e., comparing self-reported information to records) or supplemental data collection methods (e.g., direct observation) to validate self-reported responses.
Questions remain about how to use the platform that CBGGs provide to best protect and empower adolescent girls in their communities. How do effects vary between different girl segments, and which girl segments are the most important to target (e.g., unmarried, younger girls) for broad changes over time and into the next generation? This program delivery model has salience for married girls, who often are socially isolated and face high risk, but few impact evaluations included them. Other questions on the effects of CBGGs include how durable effects are and whether they wash out over time.
Given increased investment in CBGGs, evidence is needed on their scalability, such as the minimum package of elements required to have an effect. Evaluations of layered combinations of interventions would be informative. Other questions on designing for scale relate to the optimal design model in real-world conditions: the ideal dosage or level of exposure; duration; group size and composition; mentor qualifications and skills; and the cost of retaining quality, effectiveness, and cost-effectiveness as coverage expands. For an enabling environment, how can girl programs effectively engage and mobilize boys, men, and other community members? What are effective tactics for institutionalizing CBGGs within existing government systems, including health systems, for sustainability?
Community-based programming can offer a way to reach adolescents who are out of school, are disengaged from formal labor markets, and rarely use health services. Given that excluded adolescents often face the highest risks of the worst outcomes, assessing the potential of targeted CBGG programs to reach these subpopulations is vital to understand their potential for equity and cost-effectiveness. More impact evaluations should disaggregate results to reflect adolescent heterogeneity and determine what add-ons are required to reach and retain the most excluded girls.