如何為$ \ hat {Y} $生成置信帶


5

假設我運行線性回歸模型。我對生成預測間隔感興趣。預測值易於計算,但是如何計算每個$ \ hat {Y} $ s的標準偏差?

在R中,我使用predict.lm函數並使用interval='prediction'參數。我會將CI的最高範圍減去實際數字,然後除以1.96以獲得95%的CI,但是我想直接得到它以確保。

我嘗試了se.fit = TRUE和類似的東西,但是沒有用。

-2

we have $$ \hat{\beta}\pm t_{\alpha/2,n-2} \sqrt{\frac{MSE}{\sum(x_i-\bar{x})^2}} $$ then

  l=lm(y~x)
  MSE=mean ( (l$residuals)^2) 
      SSX=sum ( (x-mean(x))^2 )
      U= l$coefficients + qt(1-alpha/2,n-2) * sqrt(MSE/SSX)
  L= l$coefficients - qt(1-alpha/2,n-2) * sqrt(MSE/SSX)

3

By fitting an lm object you obtain all the necessary components to do this. Mathematically you have estimates:

$$\hat{\beta} = \left( \mathbf{X}^T\mathbf{X} \right) ^{-1} \left( \mathbf{X}^T y \right) $$

and and estimate:

$$\mbox{vcov}\left(\hat{\beta} \right) = \hat{\sigma}^2 \left( \mathbf{X}^T\mathbf{X} \right) ^{-1} $$

the beta-hats are obtained by calling coef to the lm object and the variance estimate, vcov to the lm object.

Mathematically, for any $\mathbf{X}_{pred}$ observation you wish to predict the fitting $\hat{Y} = E \left[ Y | \mathbf{X} = \mathbf{X}_{pred} \right]$ then since the $\hat{Y}$ is given by: $\mathbf{X}_{pred}^T \hat{\beta}$ it is a simple mathematical manipulation to find that:

$$\mbox{var} \left( \hat{Y} \right) = \mathbf{X}_{pred}^T \mbox{vcov}\left(\hat{\beta} \right) \mathbf{X}_{pred} = \hat{\sigma}^2 \mathbf{X}_{pred}^T \left( \mathbf{X}^T\mathbf{X} \right) ^{-1} \mathbf{X}_{pred} $$

It is a simple rule of quadratic forms that the farther $\mathbf{X}_{pred}$ is from the sample mean for each covariate (in a Euclidean sense), the greater $\left( \mathbf{X}_{pred}^T\mathbf{X}_{pred} \right)$ will be and, hence, the greater the variance of $\hat{Y}$.

Simply, the variance only differs as a function of the cross product of your predicted $X$. An illustrating example in R since you seem to be interested in both the theoretical and computational aspects...

x <- 1:100
y <- rnorm(100, x, 100)
plot(x, y)
f <- lm(y ~ x)
X <- model.matrix(f)
pred.se <- apply(X, 1, function(Xrow) t(Xrow) %*% vcov(f) %*% Xrow)
lines(1:100, 1:100 + 1.96*sqrt(pred.se))
lines(1:100, 1:100 - 1.96*sqrt(pred.se))
## "conf band is for uncertainty in predicted ys, should be substantially 
## tighter than observed vales

enter image description here