**DOI:** 10.1128/CDLI.12.5.640-643.2005

## ABSTRACT

Our objective was to develop data-based algorithms for definition of immunologic response to AIDS therapies in pediatric patients, taking account of T-cell subset measurement errors. The study design involved cross-protocol analysis of 2,148 enrollees in six completed Pediatric AIDS Clinical Trials Group trials. We used standard quantitation of T-cell subsets; linear modeling with mean-dependent measurement error variance was used to develop 95% tolerance limits for change in CD4%. For individuals with a CD4% of approximately 25%, the measurement error-based 95% tolerance interval ranges from 15% to 35%, whereas for individuals with a CD4% of approximately 5%, the tolerance interval ranges from 3% to 7%. When pairs of CD4% measures taken within a time interval of less than 30 days are averaged to estimate steady-state CD4%, tolerance interval width decreases by approximately 30%. A simple graphical tool that provides a data-based criterion for immunologic response over and above variation ascribable to T-cell measurement error is provided. Variability in CD4% due to measurement error is substantial, increases with level of CD4%, and complicates assessment of immunologic response to therapy. Replicates of CD4% measures could be used to improve precision of interpretation of CD4% measures.

Methods of managing human immunodeficiency virus (HIV) disease are evolving rapidly. The most mature and well-established markers of disease progression are CD4 T-helper cell count and RNA copy number (2, 3, 5, 7). While these parameters are straightforwardly used for cross-sectional classification of patients with respect to clinical state, much uncertainty exists regarding evaluation and interpretation of changes in these markers over time (4). Recovery of CD4 T cells is an important criterion for immune reconstitution in patients who are given antiretroviral therapy. This report focuses on the short-term variability of repeated measures of CD4% in HIV-infected children enrolled in the Pediatric AIDS Clinical Trials Group (PACTG) treatment trials. Data from six clinical trials conducted in the past decade are combined for approximate estimation of measurement error variability. The impact of using replicated CD4% measures (i.e., averaging two measures obtained simultaneously or separated by a very short time interval) to reduce effects of measurement error is illustrated.

## MATERIALS AND METHODS

Participants. Data were collected from six clinical trials conducted in the PACTG: protocols 152, 190, 300, 338, 377, and 382. Of these, protocols 152, 190, and 300 were the earlier treatment trials, consisting of one- or two-nucleoside analog reverse transcriptase inhibitor therapies, with protocols 152 and 190 completing by 1996 and 300 completing in 2001. Protocols 338, 377, and 382 were instituted later (ending by 2002) and evaluated triple-antiretroviral therapies, including protease inhibitors and/or nonnucleoside analogues.

Measurement protocols. The six trials enumerated above used the same protocol for T-cell subset measurement. Immunophenotyping for T-cell subsets was performed by two-color standard flow cytometry according to the PACTG consensus protocol (http://pactg.s-3.com/immeth.html). Laboratories that performed these assays were certified by the PACTG immunology quality assurance program. All trials used “pre-entry” and “entry” timed T-cell subset measurements to characterize patient baseline state. These pairs of T-cell subset measurements usually occurred over a period of about one week, during which the patient is reasonably assumed to be clinically stable and on a stable treatment regimen preparatory to starting a new trial.

Statistical methods. Standard descriptive statistics and scatterplots were derived using the R statistical computing environment (www.r-project.org). Variance component analysis was conducted using the linear mixed effects models software package for R by Pinheiro and Bates (6).

The basic model for response variable *y* (e.g., CD4%) measured on subjects from study *s* is

$$y_{ij} = \mu_{s} + (\mathrm{age}_{ij} - c)\,\beta_{s} + a_{i} + e_{ij}$$

where *i* indexes patients in study *s*, *j* = 1, 2 indexes the pre-entry and entry measures, $\mu_{s}$ is the overall mean value of response for a subject aged *c* years, $\beta_{s}$ is the study-specific slope of mean response on age at time of measurement, $a_{i}$ is a subject-specific random effect with distribution $N(0,\ \sigma_{b}^{2})$, and $e_{ij}$ is a residual error term.

The residuals $e_{ij}$ are of central interest in this study. Let $m_{i}$ denote the true mean CD4% for subject *i*. If $m_{i}$ is very small (say 5% or less), then the measurement error variability for measures on subject *i* tends also to be small, in part because negative values of CD4% are not possible. Among subjects for whom $m_{i}$ is larger, measurement error variability tends to be larger as well. We therefore adopt a heteroskedastic measurement error model. Conditionally on the value of $a_{i}$, $e_{ij}$ has distribution $N(0,\ \sigma_{e}^{2} \cdot m_{i}^{2\delta})$. Note that as $m_{i}$ tends to zero, so does the measurement error variance. As $m_{i}$ increases, measurement error variability increases proportionally to $m_{i}^{2\delta}$. The parameter δ can be used to control the growth of measurement error variability with magnitude of mean CD4%.

Given the generally short time elapsed between repeated measures and the likely clinical and therapeutic stability of the patient in the pre-entry to entry interval, the $e_{ij}$ are reasonably regarded as data on irreducible measurement error in the subset measurement process. For a specified value of *m*, “intraclass correlation coefficient”

$$\rho_{icc}(m) = \frac{\sigma_{b}^{2}}{\sigma_{b}^{2} + \sigma_{e}^{2} \cdot m^{2\delta}}$$

varies between 0 and 1 and is a dimensionless measure of repeatability, high values reflecting good repeatability.
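As a rough illustration of the variance function and the repeatability coefficient defined above, the following Python sketch evaluates both for given parameter values. The function and constant names are illustrative and are not taken from the original analysis, which was carried out in R. The values of σ_e² and δ are those reported in the Discussion for the refit that excluded putative outliers; the between-subject variance `SIGMA_B2` is a purely hypothetical value, since the text does not report it.

```python
import math

# sigma_e^2 and delta: values reported in the Discussion for the refit that
# excluded putative outliers.
SIGMA_E2 = 0.35
DELTA = 0.56
# HYPOTHETICAL between-subject variance (not reported in the text).
SIGMA_B2 = 100.0


def error_variance(m, sigma_e2=SIGMA_E2, delta=DELTA):
    """Measurement error variance sigma_e^2 * m^(2*delta) at true mean CD4% m."""
    if m < 0:
        raise ValueError("mean CD4% must be non-negative")
    return sigma_e2 * math.pow(m, 2.0 * delta)


def icc(m, sigma_b2=SIGMA_B2, sigma_e2=SIGMA_E2, delta=DELTA):
    """Intraclass correlation: sigma_b^2 / (sigma_b^2 + error variance at m)."""
    return sigma_b2 / (sigma_b2 + error_variance(m, sigma_e2, delta))


if __name__ == "__main__":
    for m in (5.0, 15.0, 25.0):
        sd = math.sqrt(error_variance(m))
        print(f"m = {m:4.1f}  error SD = {sd:.2f}  ICC = {icc(m):.3f}")
```

Because the error variance grows with *m* while the between-subject variance is held fixed, the computed repeatability falls as mean CD4% rises, whatever positive value is assumed for `SIGMA_B2`.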

## RESULTS

Demographics and baseline characteristics. Table 1 (columns 2 to 6) presents basic descriptive statistics regarding the patients in the various protocols. This table includes information on all patients who supplied valid data on pre-entry and entry CD4 counts, regardless of treatment assignment or reappearance of patients in multiple protocols. Individuals whose pre-entry and entry measures were separated in time by more than 30 days were excluded.

Median baseline patient age varied from 2.2 to 7.1 years. Protocols 152, 300, and 382 provided data on patients as young as six months at baseline; minimum ages on other protocols ranged from 9 months on 377 to 15 months on 338. The overall median age was 3.35 years.

In general, HIV RNA loads were uncontrolled, with medians well above 10,000 copies in all protocols for which viral loads were measured.

Immunosuppression as measured by median CD4% varied among the cohorts, with a median of 23% for protocol 300 and a median of 29% for protocol 382. Median CD8 counts were above 1,000 for all protocols (data not shown), and CD4/CD8 ratios ranged from 0.53 (protocol 190) to 0.79 (protocol 382). Forty-nine percent of patients were Centers for Disease Control and Prevention (CDC) category I (CD4% of >25), 32% were category II (15 < CD4% < 24), and 19% were category III (CD4% of <15%). A total of 116 patients (5.4%) had a CD4% of <5%.

After excluding individuals whose time elapsed from pre-entry to entry T-cell counts exceeded 30 days, median lag between pre-entry and entry measurements was generally considerably less than two weeks.

Graphical depiction of CD4% repeatability. Figure 1 depicts on a study-specific basis the dispersion in differences between entry and pre-entry CD4% measures. If CD4% measurements were perfect, we would expect these scatterplots of individual-level changes to be tightly concentrated around the line *y* = 0, as in general there is no biological basis for variation in CD4% in this pretreatment interval. For very short lags (up to three days), variability is very modest, but it appears that short-term variability is substantial at five days and that longer lags are not associated with greater variability. The appearance of these figures, in conjunction with the numerical analyses to be discussed, led us to conclude that amalgamation of the data across these protocols was a reasonable step in learning about CD4% repeatability in pediatric clinical trials.

Modeling of CD4% repeatability. Table 1 (columns 7 to 11) presents results of fitting the variance component model of the “Statistical methods” section in two basic forms: to data stratified by study and to the amalgamated complete data set. The data show that study-specific mean CD4% varied from 24.5% to 30.4%. Despite the fact that intra- and interpatient measures varied considerably among studies, the parameter δ, which describes the association between CD4% measurement error and CD4% mean, varied remarkably less, between 0.59 and 0.77. Furthermore, the repeatability of CD4% measurements at means of 15% and 25% was remarkably stable across studies, with values of 0.91 to 0.95 and 0.84 to 0.90, respectively.

Tolerance limits for CD4% change. Figure 2 depicts 95% tolerance limits for change in true mean value when comparing two CD4% measures. To use the figure, find the baseline or other reference value on the *x* axis and draw a vertical line from that point. Note where the vertical line intersects the solid lines and project to the *y* axis. The interval thus defined on the *y* axis is the tolerance limit. This identifies the central 95% of the distribution of CD4% measures consistent with fluctuation due to measurement error alone, with no change in underlying mean between baseline and follow-up. Follow-up measures that lie outside the interval are highly unlikely under a hypothesis of no change in underlying mean CD4%. Note that the choice of 95% limits is conventional. Other choices of threshold can be used.
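The graphical look-up described above can be approximated numerically. The sketch below is one plausible construction under stated assumptions and is not necessarily the computation behind Fig. 2: it treats the baseline and follow-up measures as carrying independent normal errors with variance σ_e² · m^{2δ}, plugs the observed baseline in for the true mean *m*, and uses a normal quantile for the 95% limits. The function name and the `n_avg` parameter (number of measures averaged at each time point) are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def change_tolerance_limits(baseline, sigma_e2, delta, n_avg=1, z=Z_95):
    """Return (lower, upper) limits for follow-up CD4% under no true change.

    Assumptions (not confirmed by the source): errors are independent and
    normal with variance sigma_e2 * m**(2*delta); the observed baseline is
    used in place of the true mean m; baseline and follow-up are each the
    mean of n_avg measures.
    """
    if baseline < 0 or n_avg < 1:
        raise ValueError("baseline must be >= 0 and n_avg must be >= 1")
    per_measure_var = sigma_e2 * math.pow(baseline, 2.0 * delta)
    # Variance of (follow-up mean - baseline mean) when the true mean is unchanged.
    diff_sd = math.sqrt(2.0 * per_measure_var / n_avg)
    half_width = z * diff_sd
    # CD4% is a percentage, so clip the limits to the range [0, 100].
    return max(0.0, baseline - half_width), min(100.0, baseline + half_width)


if __name__ == "__main__":
    # sigma_e^2 = 0.35 and delta = 0.56 are the outlier-excluded refit values
    # reported in the Discussion.
    for b in (5.0, 15.0, 25.0):
        lo, hi = change_tolerance_limits(b, 0.35, 0.56)
        print(f"baseline {b:4.1f}%: follow-up within [{lo:.1f}, {hi:.1f}] fits no change")
```

With these inputs the limits at a baseline of 25% come out close to the 15% to 35% range quoted in the abstract, whereas at a baseline of 5% the sketch gives a wider interval than the abstract's 3% to 7%. It should therefore be read as an approximation rather than a reproduction of the published bands.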

Figure 2 also provides information on the effect of reducing measurement error variability by averaging pairs of CD4% measures. The error decreases by approximately 30% when the CD4% result is derived from the average of two measurements performed within a time interval of ≤30 days.
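A reduction of about this size is what one would expect if the two averaged measures carry independent errors with a common variance. The short derivation below makes that assumption explicit; the symbol $\bar{e}_{i}$ for the error of the averaged measure is introduced here for illustration only.

$$\operatorname{Var}(\bar{e}_{i}) = \operatorname{Var}\!\left(\frac{e_{i1} + e_{i2}}{2}\right) = \frac{\sigma_{e}^{2} \cdot m_{i}^{2\delta}}{2}, \qquad \operatorname{SD}(\bar{e}_{i}) = \frac{1}{\sqrt{2}}\,\sigma_{e}\, m_{i}^{\delta} \approx 0.707\,\sigma_{e}\, m_{i}^{\delta}$$

Since the width of a tolerance interval scales with the error standard deviation, averaging two measures would narrow it by a factor of $1 - 1/\sqrt{2} \approx 0.29$, in line with the reduction of approximately 30% reported above.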

## DISCUSSION

Loss of CD4 T cells is a hallmark of HIV infection. This trend is reversed in patients given antiretroviral therapy who successfully achieve virologic suppression, but it is now well recognized that gains in CD4 counts may also occur in the absence of durable virologic suppression. Clear criteria for designating an individual as an immunologic responder are lacking, and many studies have relied upon arbitrary measures of CD4 increase from baseline values. The goal of this study was to establish how to interpret a change in CD4 from an initial value. By estimating measurement error variability using repeated CD4% measures taken over a short time interval, we provide a basis for identifying changes that are likely to represent true immunologic response or deterioration.

Based on the variance analysis, we constructed a model depicting 95% tolerance limits for change in true mean value of CD4%. As shown in Fig. 2, the model indicates that the variance is greater at higher levels of CD4% than at lower CD4%. Thus using a standard definition of gain in CD4 either in absolute number or in percentages could be misleading, failing to identify clinically meaningful CD4 changes at lower initial CD4 percentages or overinterpreting changes at higher initial CD4 percentages. The data support the idea of computing averages of two measurements performed within a time interval of <30 days.

This study is the first model-based analysis of pediatric patients for assessing CD4 measurement variability. While we assembled records on over 2,000 participants in PACTG trials conducted over the past ten years, we acknowledge that our cohort does not represent a systematic or random sample of HIV-positive children. The protocols selected for this study included infants and young children <8 years of age with various degrees of immunosuppression, and our basic model included an adjustment for an age effect. The size and diversity of the cohort on which our model is constructed contribute to, but do not guarantee, the face validity of our inferences on measurement variability. It was noted by a referee that some of the observations shown in Fig. 1 appear to be outliers. A formal test for outliers in the marginal distribution of all CD4% measures used in the model did not reject the null hypothesis of no outliers. However, a test applied to the marginal distribution of all entry-pre-entry differences did identify 15 putative outliers, with magnitude of the difference exceeding 21 percentage units. When the model was refit, excluding these observations, the estimate of δ was 0.56, that of $\sigma_{e}^{2}$ was 0.35, and the impact on the analog of Fig. 2 was slight, with a slight narrowing of the bands. In the absence of frank evidence of data error, we prefer to employ the entire data set to fit the model of interest. Robustness to data anomalies is a concern with any application of sophisticated statistical modeling. A related analysis with HIV-1 RNA measures was conducted by Brambilla and colleagues (1), who obtained robust estimates of assay standard deviation by rescaling empirical quantiles.

The construct validity of the tolerance limits presented in Fig. 2 can be assessed by evaluating the association between declared “genuine immunologic response” (changes beyond the bounds of the tolerance limit for given initial value) and other clinical, immunologic, or virologic events. Such analyses will be important steps toward formation of an evidence-based criterion for immunologic response to antiretroviral therapy.

## ACKNOWLEDGMENTS

This work was supported by NIH/NIAID contract/grant number 2 U01 AI41110-06.

## FOOTNOTES

- Received 24 November 2004.
- Returned for modification 14 January 2005.
- Accepted 23 February 2005.

- Copyright © 2005 American Society for Microbiology