Medical profiling: improving standards and risk adjustments using hierarchical models

https://doi.org/10.1016/S0167-6296(99)00034-X

Abstract

The conclusions from a profile analysis to identify performance extremes can be affected substantially by the standards and statistical methods used and by the adequacy of risk adjustment. Medically meaningful standards are proposed to replace common statistical standards. Hierarchical regression methods can handle several levels of random variation, make risk adjustments for the providers' case-mix differences, and address the proposed standards. These methods determine probabilities needed to make meaningful profiles of medical units based on standards set by all appropriate parties.

Introduction

Measuring and understanding differences in health care provider performance are drawing increasing attention from government agencies providing care or subsidies to purchase care, from firms providing health care benefits to employees, from managed care organizations and other insurers, and from individual consumers selecting health care providers. Health providers and insurance companies, in particular, have increased interest in profiling to assist with contractual arrangements and with choosing integration partners. We hope this paper will help improve statistical methods for addressing these issues and will encourage the use of interval standards to assess providers.

Much of the profile evaluation literature addresses the difficulties of measuring health care provider quality, including database accuracy, patient and provider confidentiality, and making risk adjustments to handle case-mix differences (Kassirer, 1994). Comparing providers is complicated by differences in hospital volumes and in the risks of the patients they treat. Ideally, risk adjustment would account for all variation outside a provider's control without adjusting away any variation within its control. Such ideal risk adjustment is unrealistic, however, and researchers therefore face a significant challenge (Newhouse, 1996).

For example, profile analyses must account for unequal patient volumes across providers. Garnick et al. (1989) review literature on establishing minimum volume requirements for surgeries and find that the large unexplained variance of provider outcomes makes establishing a minimum volume difficult. Sloan et al. (1986a, 1986b) report an inadequate basis for setting minimum volume standards because the variance of mortality rates was much higher for low volume hospitals than for high volume hospitals. These studies, and others like them, demonstrate the need for better ways to account for unequal sample sizes across providers. Hierarchical models, like the one considered here, can accomplish this goal.

The main example here concerns the US Department of Veterans Affairs (VA), which operates over 170 hospitals as part of one of the largest US health care delivery systems. Intense interest has arisen in performance monitoring and profiling VA facilities because of needs to allocate the VA's $17 billion budget appropriately (Lehner et al., 1996), to improve accountability of providers insulated from competitive market forces, and to integrate health care service networks within and outside the VA system.

Self-reporting by facilities in the spirit of Continuous Quality Improvement (CQI) and Total Quality Management (TQM) is desirable, but obtaining consistent reporting patterns from all VA facilities has proved difficult or impossible (Burgess, 1995). Consequently, the VA has developed a set of 15 annual performance monitors that it has used since 1988 to measure the outcomes and processes of its hospitals. Our example focuses on Fiscal Year 1995 for the particular monitor that records the fraction of patients returned to Intensive Care Units (ICUs) at the VA within 3 days of transferring out, when the patient's ICU return was part of the same admission. This return rate measures the ability of a hospital staff to assess patient needs for continuing ICU care; ceteris paribus, lower rates indicate better care. A related issue concerns whether reductions in ICU days induced by managed care foster too many readmissions.

Risk adjustments for the return rate monitor are based on average return rates for more than 400 Diagnosis Related Groups (the DRG associated with the original ICU stay) and two age groups (over and under 65), estimated from over 100,000 VA patients nationally. Each ICU's expected number of returns (denoted ei for Hospital i in later sections) was computed based on these patient level variables and compared by a statistical test to the ICU's observed returns. The VA perceived that the result produced an excessive number of outliers, thereby threatening the continued viability of profiling.
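The risk adjustment just described amounts to indirect standardization: a hospital's expected returns e_i come from applying national per-cell return rates (one rate per DRG and age-group combination) to that hospital's own case mix. A minimal Python sketch with hypothetical DRG codes, rates, and counts (not the VA's actual figures):

```python
# Illustrative sketch of indirect standardization for one hospital's
# expected ICU returns, e_i. All rates and counts below are hypothetical.
national_rate = {  # (DRG, over_65) -> national 3-day ICU return rate
    ("DRG-127", False): 0.04,
    ("DRG-127", True): 0.07,
    ("DRG-475", False): 0.06,
    ("DRG-475", True): 0.09,
}

# Hospital i's case mix: at-risk ICU discharges in each (DRG, age) cell.
hospital_casemix = {
    ("DRG-127", False): 120,
    ("DRG-127", True): 80,
    ("DRG-475", False): 50,
    ("DRG-475", True): 40,
}

# e_i = sum over cells of (cell count) * (national rate for that cell).
e_i = sum(n * national_rate[cell] for cell, n in hospital_casemix.items())
print(round(e_i, 2))  # -> 17.0
```

The observed returns y_i are then compared with e_i; the hierarchical model below replaces the simple statistical test with probability statements.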

The methods offered here can help the VA, and other organizations, to calibrate outliers better and to design better performance standards. The VA might use the results to identify outlying hospitals so as to commend exemplary hospitals and to encourage underperforming hospitals to improve. Whether these measures also should influence budgeting or curtailing programs in hospitals remains an open question, partly because difficulties persist in achieving adequate risk adjustment. Nevertheless, as feedback between information and providers develops, and as profiling methods improve, such uses probably will be considered.

The key contributions here include using interval standards for providers and addressing these standards with probability statements derived from hierarchical models. Section 2 develops the main ideas at a less technical level and, using ICU data, compares them with a commonly used procedure based on P-values. Section 3 provides the technical description of the model, while Section 4 explains the statistical method and interprets the numerical results for ICU returns. Section 5 discusses the methods presented in the context of broader health economics research. Section 6 concludes by reviewing the VA's hospital monitoring system, the advantages of the recommended approach, and some needed extensions.

Section snippets

Improving the approach to medical profiling

This section introduces and illustrates two improvements on P-value profiles. One improvement employs a hierarchical model to assess inter-hospital information and uses that information to evaluate performances. The other improvement replaces point standards with interval standards.
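The contrast between point and interval standards can be made concrete: instead of testing whether a hospital's risk-adjusted rate multiplier equals 1 exactly, one reports the probability that it lies above or below an interval chosen by the appropriate parties. A minimal sketch, assuming a Gamma posterior with hypothetical parameters (our illustration, not the paper's exact computation):

```python
# Sketch of an interval standard. Suppose a hospital's rate multiplier
# lambda has an (assumed) posterior Gamma distribution. With an interval
# standard [0.8, 1.2] around the system-wide norm of 1.0, we report the
# probability of being substandard (above 1.2) or exemplary (below 0.8).
import random

random.seed(0)
shape, rate = 40.0, 36.0   # hypothetical posterior parameters
lower, upper = 0.8, 1.2    # interval standard set by stakeholders

# Monte Carlo over posterior draws (gammavariate takes shape and scale).
draws = [random.gammavariate(shape, 1.0 / rate) for _ in range(100_000)]
p_substandard = sum(d > upper for d in draws) / len(draws)
p_exemplary = sum(d < lower for d in draws) / len(draws)
print(f"P(substandard) = {p_substandard:.3f}, P(exemplary) = {p_exemplary:.3f}")
```

These probabilities, rather than a P-value against the point null lambda = 1, are what the interval-standard profile reports for each hospital.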

Hierarchical models as an approach

This section further discusses the hierarchical models introduced in Section 2.2. As described in Section 3.1, these models account for regression-to-the-mean, unequal sample sizes, and risk adjustment. Section 3.2 details the specific Poisson hierarchical model needed for the ICU data analyses in Section 4. Hierarchical models with other distributional assumptions (e.g., Normal or Binomial) are available for use, with health care examples in Goldstein and Spiegelhalter (1996), Normand et al.
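The shrinkage behavior that lets hierarchical models handle unequal sample sizes can be illustrated with the conjugate Poisson-Gamma setup (an assumed simplified form; Section 3.2 gives the paper's exact specification). With y_i ~ Poisson(lambda_i * e_i) and lambda_i ~ Gamma(a, b), the posterior for lambda_i is Gamma(a + y_i, b + e_i):

```python
def posterior_mean(y_i, e_i, a, b):
    """Posterior mean of lambda_i under y_i ~ Poisson(lambda_i * e_i),
    lambda_i ~ Gamma(a, b) with b a rate: posterior Gamma(a + y_i, b + e_i)."""
    return (a + y_i) / (b + e_i)

a, b = 20.0, 20.0  # hypothetical prior with mean a / b = 1.0
# Low-volume hospital: raw ratio y/e = 2.0, heavily shrunk toward 1.0.
print(posterior_mean(4, 2.0, a, b))     # (20 + 4) / (20 + 2), about 1.09
# High-volume hospital: same raw ratio 2.0, far less shrinkage.
print(posterior_mean(100, 50.0, a, b))  # (20 + 100) / (20 + 50), about 1.71
```

The low-volume hospital's extreme raw ratio is pulled strongly toward the system-wide mean, which is exactly the regression-to-the-mean adjustment that P-value profiles lack.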

Analysis, profile results and model checking

The analyses that follow combine data from all 148 VA hospitals to fit the hierarchical Poisson model presented in Section 3.2. In Section 4.1, we discuss estimating the unknown Gamma distribution parameters in Eq. (2) and the inclusion of covariates. In Fig. 1, in Section 2, we already have demonstrated how the choice of exemplary and substandard criteria affects conclusions about hospital performance. In this section, we propose and illustrate how all of the information needed by hospital
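One rough way to see how the Gamma parameters might be estimated from all hospitals jointly is moment matching on the raw ratios y_i / e_i (a crude sketch with fabricated data; the paper fits the model by more refined likelihood-based methods):

```python
def fit_gamma_moments(ys, es):
    """Moment-matching estimates (a, b) for lambda_i ~ Gamma(a, b)."""
    ratios = [y / e for y, e in zip(ys, es)]
    n = len(ratios)
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / (n - 1)
    # Var(y_i/e_i) is roughly Var(lambda_i) + E[lambda_i]/e_i, so subtract
    # the average Poisson sampling contribution to isolate the
    # between-hospital variance of the lambda_i.
    sampling_var = sum(r / e for r, e in zip(ratios, es)) / n
    between_var = max(var - sampling_var, 1e-6)
    b = mean / between_var  # Gamma rate
    a = mean * b            # Gamma shape
    return a, b

# Fabricated (y_i, e_i) pairs for five hospitals.
ys = [40, 90, 20, 130, 55]
es = [50.0, 60.0, 40.0, 80.0, 70.0]
a_hat, b_hat = fit_gamma_moments(ys, es)
print(round(a_hat, 2), round(b_hat, 2))
```

The fitted prior mean a_hat / b_hat estimates the system-wide average multiplier, and a_hat + b_hat governs how strongly individual hospitals are shrunk toward it.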

Discussion

We have chosen to provide in this paper a finely honed example of the recommended profiling methods, one that illustrates the gains obtained by deriving probability statements from hierarchical models using interval standards. Nevertheless, the methods discussed here have much wider implications and possibilities for health economists facing a variety of other healthcare market issues. We briefly present two of those application issues here for discussion. The first issue highlights a fundamental

Conclusions

Since 1988, the VA has been building a national information system with process and outcome quality monitors, like this ICU monitor, accessible to managers and researchers throughout the VA system. This system includes the observed and expected frequency counts, at-risk sample sizes, and probabilities of high and low extremes for each monitor (as determined by hierarchical models similar to the one described here). In the case of high extremes, this information system and its abstracts are used by

Acknowledgements

The especially detailed comments of the editors as well as comments from two anonymous referees are gratefully acknowledged. Dr. Morris gratefully acknowledges the support of NSF grant DMS-9705156. The conclusions and opinions in this paper are those of the authors and do not necessarily reflect official positions of the US Department of Veterans Affairs.

References

  • Sloan, F.A., et al. (1986). Diffusion of surgical technology: an exploratory study. Journal of Health Economics.
  • Burgess, J.F., Jr. (1995). Comments on "Measuring and improving quality in health care" by M.R. Chassin. In: Abbott, T.A., ...
  • Christiansen, C.L., Morris, C.N. (1996a). Fitting and checking a two-level Poisson model: modeling patient mortality ...
  • Christiansen, C.L., Morris, C.N. (1996b). Poisson Regression Interactive Multilevel Modeling (PRIMM), ©1996. Online. ...
  • Christiansen, C.L., et al. (1997). Hierarchical Poisson regression modeling. Journal of the American Statistical Association.
  • Christiansen, C.L., et al. (1997). Improving the statistical approach to health care provider profiling. Annals of Internal Medicine.
  • Garnick, D.W., et al. (1989). Surgeon volume vs. hospital volume: which matters more? Journal of the American Medical Association.
  • Goldstein, H., Spiegelhalter, D.J. (1996). League tables and their limitations: statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society A.
  • Iezzoni, L.I., et al. (1994). Using administrative data to screen hospitals for high complication rates. Inquiry.
  • Kassirer, J.P. (1994). The use and abuse of practice profiles. The New England Journal of Medicine.
  • Lehner, L.A., et al. (1996). Data and information requirements for VA resource allocation systems. Medical Care.
  • Luft, H.S., et al. (1993). Calculating the probability of rare events: why settle for an approximation? Health Services Research.
  • Ma, C.A., et al. (1993). Quality competition, welfare, and regulation. Journal of Economics.
  • Morris, C.N. (1983). Parametric empirical Bayes inference: theory and applications. Journal of the American Statistical Association.