Using a sample size of n=20 from Exemple 3.1.6 from Bolfarine, build an approximate confidence interval for \(\theta\), where \(f(x|\theta)\) is given by:
$$
f(x | \theta) = \frac{1}{2}(1 + \theta{}x); -1 \leq x \leq 1, \quad -1 \leq \theta \leq 1
$$
To build a confidence interval we need in the first place a pivotal quantity. From the asymptotic distribution of maximum likelihood estimators, we know that
$$
Q(X,\theta) = \frac{\hat{\theta} – \theta}{\sqrt{(nI_F(\hat{\theta}))^{-1}}}
$$
follows an approximate standard normal distribution.
Therefore,
$$
P\left(-z_{\alpha/2} \leq \frac{\hat{\theta} -\theta}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \leq z_{\alpha/2} \right) \approx \gamma = 1 – \alpha
$$
Developing this expression:
$$
\begin{eqnarray*}
\gamma &\approx& P\left( \frac{-z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \leq \hat{\theta} – \theta \leq \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \right) \\
&=& P\left( \frac{-z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} – \hat{\theta} \leq – \theta \leq \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} – \hat{\theta} \right) \\
&=& P\left( \hat{\theta} + \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \geq \theta \geq \hat{\theta} – \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \right) \\
&=& P\left( \hat{\theta} – \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \leq \theta \leq \hat{\theta} + \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \right)
\end{eqnarray*}
$$
And therefore an approximate confidence interval for \(\theta\) will be given by
$$
\begin{equation}
\left[ \hat{\theta} - \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} ; \hat{\theta} + \frac{z_{\alpha/2}}{\sqrt{(nI_F(\hat{\theta}))^{-1}}} \right]
\end{equation}
$$
We won’t use the sample size of 20 given by the Bolfarine example because the sample has an invalid value. One of the observations is -3.4919, but the random variable is defined only within [-1,1].
Let us implement a set of R functions which will enable us to work with this density function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | # This funcion simulates values from this random variable rbolfarine <- function (n, theta) { sample <- runif(n) (-1 + 2 * sqrt(0.25 - theta * (0.5 - 0.25 * theta - sample)))/theta } # Fisher Information calculation inffisher <- function (theta.j) { 1/(2 * theta.j^3) * (log((1 + theta.j)/(1 - theta.j)) - 2 * theta.j) } #Approximation of theta's value by sucessive iterations: mmvapprox2 <- function (x, teta.0 = mean(x), maxeps = 1e-04) { eps <- maxeps + 1 while (eps > maxeps) { teta.i <- teta.0 + sum(x/(1 + teta.0 * x))/sum(x^2/(1 + teta.0 * x)^2) eps <- abs(teta.i - teta.0) teta.0 <- teta.i } teta.i } |
Finnaly, the following function generates a confidence interval, based on the expression we found above. The arguments for the function define the confidence level we want and the sample to be taken into account. There is also an additional argument, teta, which defines the functions behaviour: if specified the functions calculates the confidence interval and returns 1 if teta is within this interval, 0 otherwise. If teta is left as NULL, the function returns the confidence interval.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | IC.complete <- function (x, conf = 0.95, teta = NULL) { teta.hat <- mmvapprox2(x, maxeps = 1e-07) lower <- teta.hat + qnorm((1 - conf)/2)/sqrt(length(x) * inffisher(teta.hat)) upper <- teta.hat - qnorm((1 - conf)/2)/sqrt(length(x) * inffisher(teta.hat)) if (is.null(teta)) { c(lower, upper) } else { if (teta >= lower && teta <= upper) { 1 } else { 0 } } } |
We then generate a sample of size \(n=50\) and \(\theta=0.4\) and will build a confidence interval with \(\gamma = 0.95\):
We will now run a simulation in R, to check whether these confidence intervals are correct.
1 2 3 | > sum(apply(matrix(rbolfarine(1000 * 1000, 0.4), ncol = 1000), + 2, IC.complete, conf = 0.25, teta = 0.4))/1000 [1] 0.258 |
On the first example we simulated 1000 samples of size 1000 and built 1000 confidence intervals, with confidence level of 0.25. The result shown is the observed proportion of confidence intervals which contained the \(\theta\) parameter.
1 2 3 | > sum(apply(matrix(rbolfarine(1000 * 1000, 0.4), ncol = 1000), + 2, IC.complete, conf = 0.95, teta = 0.4))/1000 [1] 0.957 |
Same simulation, but with confidence level of 0.95. As the values are close, we conclude that our confidence intervals were built correctly.