
Between-Group Variance (BGV)


The variance is a commonly used statistic that summarizes the squared deviations from a population average.  For grouped data, the corresponding measure is the Between-Group Variance (BGV), which is calculated as:

$$\widehat{BGV} = {\sum}_{j=1}^Jp_j(y_j-\mu)^2,$$

where \(y_j\) is group \(j\)'s health status, \(p_j\) is the population share of group \(j\), and \(\mu\) is the average health status of the population:  \(\mu = {\sum}_{j=1}^Jp_jy_j\).

One way to interpret the BGV is as the variance that would exist in the population if each individual had the mean health of their social group (i.e., no within-group variation) (2).  The BGV may be a useful indicator of absolute disparity for unordered group data because it weights each group by its population share and is sensitive to the magnitude of larger deviations from the population average (3).
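The BGV formula above can be sketched in a few lines of Python. This is only an illustration, not HD*Calc's implementation: the function name `between_group_variance` and the example inputs are hypothetical, and the population shares are assumed to already sum to 1.

```python
def between_group_variance(rates, shares):
    """BGV = sum_j p_j * (y_j - mu)^2, with mu = sum_j p_j * y_j.

    Assumes `shares` are population shares that already sum to 1.
    """
    if len(rates) != len(shares):
        raise ValueError("rates and shares must have the same length")
    # Population-average health status, weighted by population share.
    mu = sum(p * y for p, y in zip(shares, rates))
    # Share-weighted squared deviations of each group from that average.
    return sum(p * (y - mu) ** 2 for p, y in zip(shares, rates))


# Hypothetical example: two equally sized groups with rates 10 and 20.
bgv_example = between_group_variance([10.0, 20.0], [0.5, 0.5])
```

In this example \(\mu = 15\), each group deviates from it by 5, and the BGV is \(0.5 \times 25 + 0.5 \times 25 = 25\).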

Variance of \(\widehat{BGV}\)

The variance of \(\widehat{BGV}\) based on the quadratic form approximation method can be estimated as:

$$\mathrm{var}_{TL}(\widehat{BGV})=4{\sum}_{j=1}^Jp^2_j\hat{\sigma}^2_j(y_j-\mu)^2 + 2\left[S_2^2-S_4+{\sum}_{j=1}^Jp^2_j(1-p_j)^2\hat{\sigma}^4_j\right],$$

where \(S_4 = {\sum}_{j=1}^Jp^4_j\hat{\sigma}^4_j\), \(S_2 = {\sum}_{j=1}^Jp^2_j\hat{\sigma}^2_j\), and \(\hat{\sigma}^2_j\) is the estimated variance of \(y_j\).  The standard error of \(\widehat{BGV}\) based on the quadratic form approximation method is:  \(\sqrt{\mathrm{var}_{TL}(\widehat{BGV})}\).  See Ahn et al. 2018 (4) for details on how \(\mathrm{var}_{TL}(\widehat{BGV})\) was derived.  We use the notation \(\mathrm{var}_{TL}\) for consistency.
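As a rough sketch, the quadratic form approximation can be evaluated term by term as follows. The function name `bgv_variance_tl` and the example inputs are hypothetical. The argument `variances` is assumed to hold \(\hat{\sigma}^2_j\), so \(\hat{\sigma}^4_j\) is computed as its square.

```python
import math


def bgv_variance_tl(rates, shares, variances):
    """Quadratic form approximation of var(BGV), following the formula above.

    `variances` holds sigma_j^2 for each group; sigma_j^4 is its square.
    """
    groups = list(zip(shares, rates, variances))
    mu = sum(p * y for p, y, _ in groups)
    # First term: 4 * sum_j p_j^2 * sigma_j^2 * (y_j - mu)^2
    first = 4.0 * sum(p ** 2 * v * (y - mu) ** 2 for p, y, v in groups)
    # S_2 = sum_j p_j^2 * sigma_j^2 and S_4 = sum_j p_j^4 * sigma_j^4
    s2 = sum(p ** 2 * v for p, _, v in groups)
    s4 = sum(p ** 4 * v ** 2 for p, _, v in groups)
    # Remaining sum: sum_j p_j^2 * (1 - p_j)^2 * sigma_j^4
    third = sum(p ** 2 * (1.0 - p) ** 2 * v ** 2 for p, _, v in groups)
    return first + 2.0 * (s2 ** 2 - s4 + third)


# Hypothetical example: rates 10 and 20, equal shares, unit variances.
var_tl_example = bgv_variance_tl([10.0, 20.0], [0.5, 0.5], [1.0, 1.0])
se_tl_example = math.sqrt(var_tl_example)
```

For these inputs the first term is 50, \(S_2^2 = 0.25\), \(S_4 = 0.125\), and the remaining sum is 0.125, so the variance is \(50 + 2 \times 0.25 = 50.5\).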

Variance of \(\widehat{BGV}\) Based on Monte Carlo Simulation (MCS) Method

Randomly generate \(M\) age-adjusted rates \(y_{j}^{(m)},m=1,...,M,\) for each social group using the distribution:  \(y_{j}^{(m)}\sim Gamma(mean=y_{j},var=\widehat{\sigma}_{j}^{2})\).  A Gamma distribution, rather than a truncated normal distribution, was recommended for simulating age-adjusted rates (1).  Then calculate \(\widehat{BGV}^{(m)}\) using \(y_{j}^{(m)}\).  Thus:  \(\mathrm{var}_{MCS}(\widehat{BGV}) = (M-1)^{-1}{\sum}_{m=1}^M(\widehat{BGV}^{(m)}-\overline{\widehat{BGV}})^2\), where \(\overline{\widehat{BGV}}=M^{-1}{\sum}_{m=1}^M\widehat{BGV}^{(m)}\).  The standard error of \(\widehat{BGV}\) based on the MCS method is:  \(\sqrt{\mathrm{var}_{MCS}(\widehat{BGV})}\).  \(M=1,000\) is used in HD*Calc.
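The simulation steps can be sketched with the Python standard library. This is an illustration under stated assumptions, not HD*Calc's code: the function name `simulate_bgv`, the seed, and the example inputs are hypothetical. A Gamma distribution with mean \(y_j\) and variance \(\hat{\sigma}^2_j\) is parameterized here by shape \(y_j^2/\hat{\sigma}^2_j\) and scale \(\hat{\sigma}^2_j/y_j\), which requires positive rates and variances.

```python
import random
import statistics


def simulate_bgv(rates, shares, variances, n_draws=1000, seed=None):
    """Return n_draws simulated BGV values from Gamma-distributed rates."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_draws):
        # Draw one rate per group; shape * scale = mean and
        # shape * scale^2 = variance, so the draw has mean y and variance v.
        draws = [rng.gammavariate(y * y / v, v / y)
                 for y, v in zip(rates, variances)]
        mu = sum(p * y for p, y in zip(shares, draws))
        replicates.append(sum(p * (y - mu) ** 2 for p, y in zip(shares, draws)))
    return replicates


# Hypothetical example with M = 1,000 draws and a fixed seed.
replicates = simulate_bgv([10.0, 20.0], [0.5, 0.5], [1.0, 1.0], seed=12345)
# statistics.variance uses the (M - 1) denominator, as in the formula above.
var_mcs = statistics.variance(replicates)
se_mcs = var_mcs ** 0.5
```

Fixing the seed makes a run reproducible. Different seeds will give slightly different values of `var_mcs`.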

95% Confidence Interval of \(\widehat{BGV}\)

The 95% confidence interval of \(\widehat{BGV}\) based on the quadratic form approximation method is:  \(\widehat{BGV} \pm 1.96 \times \sqrt{\mathrm{var}_{TL}(\widehat{BGV})}.\)

The lower and upper bounds of the 95% confidence interval of \(\widehat{BGV}\) based on the MCS method are the 2.5th and 97.5th percentiles of the 1,000 \(\widehat{BGV}^{(m)}\) values.
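Both intervals can be sketched as below. The function names `normal_ci` and `percentile_ci` and the example inputs are hypothetical. The percentile interpolation rule is an assumption, since the text does not say which rule HD*Calc applies: this sketch uses the default of `statistics.quantiles`.

```python
import math
import statistics


def normal_ci(bgv, variance, z=1.96):
    """Interval BGV +/- z * sqrt(variance), as in the quadratic form method."""
    half_width = z * math.sqrt(variance)
    return bgv - half_width, bgv + half_width


def percentile_ci(replicates):
    """2.5th and 97.5th percentiles of the simulated BGV values.

    Splitting into 40 equal-probability intervals puts the first cut point
    at 2.5% and the last (39th) at 97.5%.
    """
    cuts = statistics.quantiles(replicates, n=40)
    return cuts[0], cuts[-1]


# Hypothetical inputs: a BGV of 25 with variance 50.5, and placeholder
# "replicates" 1..1000 standing in for simulated BGV values.
ci_tl = normal_ci(25.0, 50.5)
ci_mcs = percentile_ci([float(i) for i in range(1, 1001)])
```

The symmetric interval can extend below zero even though the BGV itself cannot be negative, whereas the percentile interval always stays within the range of the simulated values.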

 
