10.1 Inferences About the Difference Between Two Population Means: σ1 and σ2 Known
The point estimator of the difference between the two population means is the difference
between the two sample means.
POINT ESTIMATOR OF THE DIFFERENCE BETWEEN TWO POPULATION MEANS
$$\bar{x}_1 - \bar{x}_2 \tag{10.1}$$
The standard error of x̄1 − x̄2 is the standard deviation of the sampling distribution of x̄1 − x̄2.
Figure 10.1 provides an overview of the process used to estimate the difference between
two population means based on two independent simple random samples.
As with other point estimators, the point estimator x̄1 − x̄2 has a standard error that describes the variation in the sampling distribution of the estimator. With two independent simple random samples, the standard error of x̄1 − x̄2 is as follows:
STANDARD ERROR OF x̄1 − x̄2
$$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.2}$$
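Expression (10.2) is a single line of arithmetic. The following is a minimal Python sketch of it using only the standard library; the function name `se_diff_known` is our own label, not something defined in the text, and the function assumes σ1 and σ2 are known and the samples are independent.

```python
import math


def se_diff_known(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """Standard error of x̄1 − x̄2, expression (10.2).

    Assumes the population standard deviations are known and the two
    samples are independent simple random samples.
    """
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


# Greystone values used later in this section: σ1 = 9, n1 = 36, σ2 = 10, n2 = 49
print(round(se_diff_known(9, 36, 10, 49), 4))  # 2.0714
```

Each sample contributes its own variance term σ²/n, so the larger or noisier sample does not simply dominate; both terms shrink as their sample sizes grow.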
If both populations have a normal distribution, or if the sample sizes are large enough that the central limit theorem enables us to conclude that the sampling distributions of x̄1 and x̄2 can be approximated by a normal distribution, the sampling distribution of x̄1 − x̄2 will have a normal distribution (exactly in the first case, approximately in the second) with mean given by μ1 − μ2.
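The claim about the sampling distribution can be checked by simulation. This sketch draws many pairs of independent samples from two normal populations and records x̄1 − x̄2 for each pair. The population means μ1 = 40 and μ2 = 35 are hypothetical values chosen only for illustration (in practice they are unknown); σ1, σ2, n1, and n2 reuse the Greystone values that appear later in this section.

```python
import math
import random
import statistics


def simulate_diffs(mu1, sigma1, n1, mu2, sigma2, n2, reps=10_000, seed=1):
    """Return x̄1 − x̄2 for `reps` pairs of independent normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


# Hypothetical population means (assumed for the simulation only)
diffs = simulate_diffs(40, 9, 36, 35, 10, 49)

theoretical_se = math.sqrt(9 ** 2 / 36 + 10 ** 2 / 49)  # expression (10.2)
print("simulated mean of x̄1 − x̄2:", round(statistics.fmean(diffs), 3), "(μ1 − μ2 = 5)")
print("simulated std dev:", round(statistics.stdev(diffs), 3),
      "| expression (10.2):", round(theoretical_se, 3))
```

The simulated mean should sit close to 5 and the simulated standard deviation close to the value from (10.2); the agreement improves as `reps` grows.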
As we showed in Chapter 8, an interval estimate is given by a point estimate ± a margin of error. In the case of estimation of the difference between two population means, an interval estimate will take the following form:
$$\bar{x}_1 - \bar{x}_2 \pm \text{Margin of error}$$
FIGURE 10.1 ESTIMATING THE DIFFERENCE BETWEEN TWO POPULATION MEANS
[Figure: Population 1 is inner-city store customers (μ1 = mean age of inner-city store customers); Population 2 is suburban store customers (μ2 = mean age of suburban store customers); μ1 − μ2 = difference between the mean ages. Two independent simple random samples, of n1 inner-city customers and n2 suburban customers, give sample mean ages x̄1 and x̄2; x̄1 − x̄2 is the point estimator of μ1 − μ2.]
Chapter 10
Inference About Means and Proportions with Two Populations
With the sampling distribution of x̄1 − x̄2 having a normal distribution, we can write the margin of error as follows:
$$\text{Margin of error} = z_{\alpha/2}\,\sigma_{\bar{x}_1-\bar{x}_2} = z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.3}$$
The margin of error is given by multiplying the standard error by zα/2.
Thus the interval estimate of the difference between two population means is as follows:
INTERVAL ESTIMATE OF THE DIFFERENCE BETWEEN TWO POPULATION MEANS: σ1 AND σ2 KNOWN
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.4}$$
where 1 − α is the confidence coefficient.
Let us return to the Greystone example. Based on data from previous customer demographic studies, the two population standard deviations are known with σ1 = 9 years and σ2 = 10 years. The data collected from the two independent simple random samples of Greystone customers provided the following results.
| | Inner-City Store | Suburban Store |
|---|---|---|
| Sample Size | n1 = 36 | n2 = 49 |
| Sample Mean | x̄1 = 40 years | x̄2 = 35 years |
Using expression (10.1), we find that the point estimate of the difference between the mean ages of the two populations is x̄1 − x̄2 = 40 − 35 = 5 years. Thus, we estimate that the customers at the inner-city store have a mean age five years greater than the mean age of the suburban store customers. We can now use expression (10.4) to compute the margin of error and provide the interval estimate of μ1 − μ2. Using 95% confidence and zα/2 = z.025 = 1.96, we have
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$40 - 35 \pm 1.96\sqrt{\frac{9^2}{36} + \frac{10^2}{49}}$$
$$5 \pm 4.06$$
Thus, the margin of error is 4.06 years and the 95% confidence interval estimate of the difference between the two population means is 5 − 4.06 = .94 years to 5 + 4.06 = 9.06 years.
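The Greystone calculation can be reproduced with a short sketch of expressions (10.1), (10.3), and (10.4). The function name `z_interval_diff` is hypothetical. One assumption differs slightly from the text: the sketch takes zα/2 from `statistics.NormalDist().inv_cdf` (about 1.95996) instead of the rounded table value 1.96, which does not change the result at two decimals.

```python
import math
from statistics import NormalDist


def z_interval_diff(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Interval estimate (10.4) of μ1 − μ2 when σ1 and σ2 are known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)                      # zα/2
    margin = z * math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # (10.3)
    point = xbar1 - xbar2                                        # (10.1)
    return point, margin, (point - margin, point + margin)


point, margin, (lower, upper) = z_interval_diff(40, 35, 9, 10, 36, 49)
print(f"{point} ± {margin:.2f} -> ({lower:.2f}, {upper:.2f})")
```

This prints `5 ± 4.06 -> (0.94, 9.06)`, matching the interval above. Passing a different `confidence` (for example 0.90 or 0.99) changes only zα/2.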
Hypothesis Tests About μ1 − μ2
Let us consider hypothesis tests about the difference between two population means. Using
D0 to denote the hypothesized difference between μ1 and μ2, the three forms for a hypothesis test are as follows:
$$H_0: \mu_1 - \mu_2 \ge D_0 \qquad H_0: \mu_1 - \mu_2 \le D_0 \qquad H_0: \mu_1 - \mu_2 = D_0$$
$$H_a: \mu_1 - \mu_2 < D_0 \qquad H_a: \mu_1 - \mu_2 > D_0 \qquad H_a: \mu_1 - \mu_2 \ne D_0$$
In many applications, D0 = 0. Using the two-tailed test as an example, when D0 = 0 the null hypothesis is H0: μ1 − μ2 = 0. In this case, the null hypothesis is that μ1 and μ2 are equal. Rejection of H0 leads to the conclusion that Ha: μ1 − μ2 ≠ 0 is true; that is, μ1 and μ2 are not equal.
The steps for conducting hypothesis tests presented in Chapter 9 are applicable here.
We must choose a level of significance, compute the value of the test statistic and find the
p-value to determine whether the null hypothesis should be rejected. With two independent
simple random samples, we showed that the point estimator x̄1 − x̄2 has a standard error given by expression (10.2) and, when the sample sizes are large enough, the distribution of x̄1 − x̄2 can be described by a normal distribution. In this case, the test statistic for the difference between two population means when σ1 and σ2 are known is as follows.
TEST STATISTIC FOR HYPOTHESIS TESTS ABOUT μ1 − μ2: σ1 AND σ2 KNOWN
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{10.5}$$
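Expression (10.5), together with the three forms of hypotheses listed earlier, can be sketched as one function that returns the test statistic and its p-value. The function name and the `alternative` labels ("less", "greater", "two-sided") are our own choices for illustration; the p-value rules follow the usual lower tail, upper tail, and two-tailed computations from Chapter 9.

```python
import math
from statistics import NormalDist


def two_sample_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0,
                      alternative="two-sided"):
    """Test statistic (10.5) and p-value for hypotheses about μ1 − μ2, σ1 and σ2 known.

    alternative: "less"      -> Ha: μ1 − μ2 < D0 (lower tail test)
                 "greater"   -> Ha: μ1 − μ2 > D0 (upper tail test)
                 "two-sided" -> Ha: μ1 − μ2 ≠ D0 (two-tailed test)
    """
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # expression (10.2)
    z = ((xbar1 - xbar2) - d0) / se
    cdf = NormalDist().cdf  # standard normal cumulative probability
    if alternative == "less":
        p_value = cdf(z)                   # area to the left of z
    elif alternative == "greater":
        p_value = 1 - cdf(z)               # area to the right of z
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))    # double the tail area beyond |z|
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# Training-center data from the example that follows (D0 = 0, two-tailed)
z, p = two_sample_z_test(82, 78, 10, 10, 30, 40)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

Because z is not rounded here, the p-value (about .0977) differs slightly from the table-based .0970 in the text; the conclusion at α = .05 is the same.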
Let us demonstrate the use of this test statistic in the following hypothesis testing example.
As part of a study to evaluate differences in education quality between two training centers, a standardized examination is given to individuals who are trained at the centers. The
difference between the mean examination scores is used to assess quality differences
between the centers. The population means for the two centers are as follows.
μ1 = the mean examination score for the population of individuals trained at center A
μ2 = the mean examination score for the population of individuals trained at center B
We begin with the tentative assumption that no difference exists between the training
quality provided at the two centers. Hence, in terms of the mean examination scores, the
null hypothesis is that μ1 − μ2 = 0. If sample evidence leads to the rejection of this hypothesis, we will conclude that the mean examination scores differ for the two populations.
This conclusion indicates a quality differential between the two centers and suggests that a
follow-up study investigating the reason for the differential may be warranted. The null and
alternative hypotheses for this two-tailed test are written as follows.
$$H_0: \mu_1 - \mu_2 = 0$$
$$H_a: \mu_1 - \mu_2 \ne 0$$
WEB file: ExamScores
The standardized examination given previously in a variety of settings always resulted in
an examination score standard deviation near 10 points. Thus, we will use this information
to assume that the population standard deviations are known with σ1 = 10 and σ2 = 10. An α = .05 level of significance is specified for the study.
Independent simple random samples of n1 = 30 individuals from training center A and n2 = 40 individuals from training center B are taken. The respective sample means are x̄1 = 82 and x̄2 = 78. Do these data suggest a significant difference between the population means at the two training centers? To help answer this question, we compute the test statistic using equation (10.5).
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(82 - 78) - 0}{\sqrt{\dfrac{10^2}{30} + \dfrac{10^2}{40}}} = 1.66$$
Next let us compute the p-value for this two-tailed test. Because the test statistic z is in the upper tail, we first compute the area under the curve to the right of z = 1.66. Using the standard normal distribution table, the area to the left of z = 1.66 is .9515. Thus, the area in the upper tail of the distribution is 1.0000 − .9515 = .0485. Because this test is a two-tailed test, we must double the tail area: p-value = 2(.0485) = .0970. Following the usual rule to reject H0 if p-value ≤ α, we see that the p-value of .0970 does not allow us to reject H0 at the .05 level of significance. The sample results do not provide sufficient evidence to conclude that the training centers differ in quality.
In this chapter we will use the p-value approach to hypothesis testing as described in Chapter 9. However, if you prefer, the test statistic and the critical value rejection rule may be used. With α = .05 and zα/2 = z.025 = 1.96, the rejection rule employing the critical value approach would be: reject H0 if z ≤ −1.96 or if z ≥ 1.96. With z = 1.66, we reach the same conclusion: do not reject H0.
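As a check that the p-value approach and the critical value approach agree for the training-center test, here is a short standard-library sketch. It also recomputes the p-value with z rounded to two decimals, which mimics the table lookup; the unrounded z gives a p-value of roughly .0977 instead of .0970. The variable names are our own.

```python
import math
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# Training-center data from the text: x̄1 = 82, x̄2 = 78, σ1 = σ2 = 10, n1 = 30, n2 = 40, D0 = 0
z = ((82 - 78) - 0) / math.sqrt(10 ** 2 / 30 + 10 ** 2 / 40)

# p-value approach: reject H0 if p-value <= α
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_by_p = p_value <= alpha

# Critical value approach: reject H0 if z <= -zα/2 or z >= zα/2
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_by_crit = z <= -z_crit or z >= z_crit

# Table-style p-value using z rounded to two decimals, as in the text
p_table = 2 * (1 - std_normal.cdf(round(z, 2)))

print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject by p-value: {reject_by_p}")
print(f"critical value = {z_crit:.4f}, reject by critical value: {reject_by_crit}")
print(f"p-value with z rounded to {round(z, 2)}: {p_table:.4f}")
```

Both rules are equivalent by construction: p-value ≤ α exactly when |z| ≥ zα/2, so they always lead to the same decision.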
In the preceding example, we demonstrated a two-tailed hypothesis test about the difference between two population means. Lower tail and upper tail tests can also be considered. These tests use the same test statistic as given in equation (10.5). The procedure for
computing the p-value and the rejection rules for these one-tailed tests are the same as those
presented in Chapter 9.
Practical Advice
In most applications of the interval estimation and hypothesis testing procedures presented
in this section, random samples with n1 ≥ 30 and n2 ≥ 30 are adequate. In cases where either or both sample sizes are less than 30, the distributions of the populations become important considerations. In general, with smaller sample sizes, it is more important for the
analyst to be satisfied that it is reasonable to assume that the distributions of the two populations are at least approximately normal.
Exercises
Methods
SELF test
1. The following results come from two independent random samples taken from two populations.

| Sample 1 | Sample 2 |
|---|---|
| n1 = 50 | n2 = 35 |
| x̄1 = 13.6 | x̄2 = 11.6 |
| σ1 = 2.2 | σ2 = 3.0 |

a. What is the point estimate of the difference between the two population means?
b. Provide a 90% confidence interval for the difference between the two population means.
c. Provide a 95% confidence interval for the difference between the two population means.
SELF test
2. Consider the following hypothesis test.
$$H_0: \mu_1 - \mu_2 \le 0$$
$$H_a: \mu_1 - \mu_2 > 0$$
The following results are for two independent samples taken from the two populations.

| Sample 1 | Sample 2 |
|---|---|
| n1 = 40 | n2 = 50 |
| x̄1 = 25.2 | x̄2 = 22.8 |
| σ1 = 5.2 | σ2 = 6.0 |

a. What is the value of the test statistic?
b. What is the p-value?
c. With α = .05, what is your hypothesis testing conclusion?
3. Consider the following hypothesis test.
$$H_0: \mu_1 - \mu_2 = 0$$
$$H_a: \mu_1 - \mu_2 \ne 0$$
The following results are for two independent samples taken from the two populations.

| Sample 1 | Sample 2 |
|---|---|
| n1 = 80 | n2 = 70 |
| x̄1 = 104 | x̄2 = 106 |
| σ1 = 8.4 | σ2 = 7.6 |

a. What is the value of the test statistic?
b. What is the p-value?
c. With α = .05, what is your hypothesis testing conclusion?
Applications
SELF test
4. Condé Nast Traveler conducts an annual survey in which readers rate their favorite cruise
ship. All ships are rated on a 100-point scale, with higher values indicating better service.
A sample of 37 ships that carry fewer than 500 passengers resulted in an average rating of
85.36, and a sample of 44 ships that carry 500 or more passengers provided an average rating of 81.40 (Condé Nast Traveler, February 2008). Assume that the population standard
deviation is 4.55 for ships that carry fewer than 500 passengers and 3.97 for ships that carry
500 or more passengers.
a. What is the point estimate of the difference between the population mean rating for
ships that carry fewer than 500 passengers and the population mean rating for ships
that carry 500 or more passengers?
b. At 95% confidence, what is the margin of error?
c. What is a 95% confidence interval estimate of the difference between the population
mean ratings for the two sizes of ships?
5. The average expenditure on Valentine’s Day was expected to be $100.89 (USA Today,
February 13, 2006). Do male and female consumers differ in the amounts they spend?
The average expenditure in a sample survey of 40 male consumers was $135.67, and the
average expenditure in a sample survey of 30 female consumers was $68.64. Based on past
surveys, the standard deviation for male consumers is assumed to be $35, and the standard
deviation for female consumers is assumed to be $20.
a. What is the point estimate of the difference between the population mean expenditure for males and the population mean expenditure for females?
b. At 99% confidence, what is the margin of error?
c. Develop a 99% confidence interval for the difference between the two population means.

WEB file: Hotel
6. Suppose that you are responsible for making arrangements for a business convention. Because of budget cuts due to the recent recession, you have been charged with choosing a
city for the convention that has the least expensive hotel rooms. You have narrowed your
choices to Atlanta and Houston. The file named Hotel contains samples of prices for rooms
in Atlanta and Houston that are consistent with the results reported by Smith Travel Research (SmartMoney, March 2009). Because considerable historical data on the prices of
rooms in both cities are available, the population standard deviations for the prices can be
assumed to be $20 in Atlanta and $25 in Houston. Based on the sample data, can you conclude that the mean price of a hotel room in Atlanta is lower than one in Houston?
7. During the 2003 season, Major League Baseball took steps to speed up the play of baseball games in order to maintain fan interest (CNN Headline News, September 30, 2003).
The following results come from a sample of 60 games played during the summer of 2002
and a sample of 50 games played during the summer of 2003. The sample mean shows the
mean duration of the games included in each sample.
| | 2002 Season | 2003 Season |
|---|---|---|
| Sample size | n1 = 60 | n2 = 50 |
| Sample mean | x̄1 = 2 hours, 52 minutes | x̄2 = 2 hours, 46 minutes |

a. A research hypothesis was that the steps taken during the 2003 season would reduce the population mean duration of baseball games. Formulate the null and alternative hypotheses.
b. What is the point estimate of the reduction in the mean duration of games during the 2003 season?
c. Historical data indicate a population standard deviation of 12 minutes is a reasonable assumption for both years. Conduct the hypothesis test and report the p-value. At a .05 level of significance, what is your conclusion?
d. Provide a 95% confidence interval estimate of the reduction in the mean duration of games during the 2003 season.
e. What was the percentage reduction in the mean time of baseball games during the 2003 season? Should management be pleased with the results of the statistical analysis? Discuss. Should the length of baseball games continue to be an issue in future years? Explain.

8. Will improving customer service result in higher stock prices for the companies providing
the better service? “When a company’s satisfaction score has improved over the prior
year’s results and is above the national average (currently 75.7), studies show its shares
have a good chance of outperforming the broad stock market in the long run” (BusinessWeek, March 2, 2009). The following satisfaction scores of three companies for the 4th
quarters of 2007 and 2008 were obtained from the American Customer Satisfaction Index.
Assume that the scores are based on a poll of 60 customers from each company. Because
the polling has been done for several years, the standard deviation can be assumed to equal
6 points in each case.
| Company | 2007 Score | 2008 Score |
|---|---|---|
| Rite Aid | 73 | 76 |
| Expedia | 75 | 77 |
| J.C. Penney | 77 | 78 |
a. For Rite Aid, is the increase in the satisfaction score from 2007 to 2008 statistically significant? Use α = .05. What can you conclude?
b. Can you conclude that the 2008 score for Rite Aid is above the national average of 75.7? Use α = .05.
c. For Expedia, is the increase from 2007 to 2008 statistically significant? Use α = .05.
d. When conducting a hypothesis test with the values given for the standard deviation, sample size, and α, how large must the increase from 2007 to 2008 be for it to be statistically significant?
e. Use the result of part (d) to state whether the increase for J.C. Penney from 2007 to 2008 is statistically significant.

10.2 Inferences About the Difference Between Two Population Means: σ1 and σ2 Unknown
In this section we extend the discussion of inferences about the difference between two
population means to the case when the two population standard deviations, σ1 and σ2 , are
unknown. In this case, we will use the sample standard deviations, s1 and s2 , to estimate the
unknown population standard deviations. When we use the sample standard deviations, the
interval estimation and hypothesis testing procedures will be based on the t distribution
rather than the standard normal distribution.
Interval Estimation of μ1 − μ2
In the following example we show how to compute a margin of error and develop an interval estimate of the difference between two population means when σ1 and σ2 are unknown.
Clearwater National Bank is conducting a study designed to identify differences between
checking account practices by customers at two of its branch banks. A simple random
sample of 28 checking accounts is selected from the Cherry Grove Branch and an independent simple random sample of 22 checking accounts is selected from the Beechmont
Branch. The current checking account balance is recorded for each of the checking accounts. A summary of the account balances follows:
WEB file: CheckAcct

| | Cherry Grove | Beechmont |
|---|---|---|
| Sample Size | n1 = 28 | n2 = 22 |
| Sample Mean | x̄1 = $1025 | x̄2 = $910 |
| Sample Standard Deviation | s1 = $150 | s2 = $125 |
Clearwater National Bank would like to estimate the difference between the mean
checking account balance maintained by the population of Cherry Grove customers and the
population of Beechmont customers. Let us develop the margin of error and an interval estimate of the difference between these two population means.
In Section 10.1, we provided the following interval estimate for the case when the
population standard deviations, σ1 and σ2 , are known.
$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
When σ1 and σ2 are estimated by s1 and s2, the t distribution is used to make inferences about the difference between two population means.

With σ1 and σ2 unknown, we will use the sample standard deviations s1 and s2 to estimate σ1 and σ2 and replace zα/2 with tα/2. As a result, the interval estimate of the difference between two population means is given by the following expression:

INTERVAL ESTIMATE OF THE DIFFERENCE BETWEEN TWO POPULATION MEANS: σ1 AND σ2 UNKNOWN
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{10.6}$$
where 1 − α is the confidence coefficient.
In this expression, the use of the t distribution is an approximation, but it provides excellent
results and is relatively easy to use. The only difficulty that we encounter in using expression
(10.6) is determining the appropriate degrees of freedom for tα/2. Statistical software packages
compute the appropriate degrees of freedom automatically. The formula used is as follows:
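DEGREES OF FREEDOM: t DISTRIBUTION WITH TWO INDEPENDENT RANDOM SAMPLES
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2} \tag{10.7}$$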
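Expression (10.7) is the degrees-of-freedom approximation commonly known as the Welch-Satterthwaite formula. A minimal sketch follows; the function name `welch_df` is our own, and the check uses the Clearwater National Bank summary statistics given in this section.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Degrees of freedom from expression (10.7) for two independent samples."""
    v1 = s1 ** 2 / n1  # s1²/n1
    v2 = s2 ** 2 / n2  # s2²/n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Clearwater National Bank: s1 = 150, n1 = 28, s2 = 125, n2 = 22
df = welch_df(150, 28, 125, 22)
print(round(df, 1))  # 47.8
```

The result is generally not an integer. It always lies between min(n1, n2) − 1 and n1 + n2 − 2, so it never exceeds the degrees of freedom of a pooled two-sample t procedure.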
Let us return to the Clearwater National Bank example and show how to use expression
(10.6) to provide a 95% confidence interval estimate of the difference between the population
mean checking account balances at the two branch banks. The sample data show n1 = 28, x̄1 = $1025, and s1 = $150 for the Cherry Grove branch, and n2 = 22, x̄2 = $910, and s2 = $125 for the Beechmont branch. The calculation for degrees of freedom for tα/2 is as follows:
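$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2} = \frac{\left(\dfrac{150^2}{28} + \dfrac{125^2}{22}\right)^2}{\dfrac{1}{28 - 1}\left(\dfrac{150^2}{28}\right)^2 + \dfrac{1}{22 - 1}\left(\dfrac{125^2}{22}\right)^2} = 47.8$$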
We round the noninteger degrees of freedom down to 47 to provide a larger t-value and a
more conservative interval estimate. Using the t distribution table with 47 degrees of freedom, we find t.025 = 2.012. Using expression (10.6), we develop the 95% confidence interval estimate of the difference between the two population means as follows.
$$\bar{x}_1 - \bar{x}_2 \pm t_{.025}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$1025 - 910 \pm 2.012\sqrt{\frac{150^2}{28} + \frac{125^2}{22}}$$
$$115 \pm 78$$
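The whole Clearwater calculation (degrees of freedom from (10.7), rounding down, and the interval from (10.6)) fits in a short sketch. To stay within the standard library, the sketch takes tα/2 as an input instead of computing it; the value 2.012 is the table value quoted in the text for 47 degrees of freedom. If SciPy happens to be available, `scipy.stats.t.ppf(0.975, df)` gives essentially the same critical value. The function name `t_interval_diff` is hypothetical.

```python
import math


def t_interval_diff(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """Interval estimate (10.6) of μ1 − μ2 with σ1 and σ2 unknown.

    t_crit is tα/2 for the degrees of freedom from (10.7); the caller
    supplies it (for example, read from a t table).
    """
    point = xbar1 - xbar2
    margin = t_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return point, margin, (point - margin, point + margin)


# Degrees of freedom from (10.7), rounded down as in the text
v1, v2 = 150 ** 2 / 28, 125 ** 2 / 22
df = math.floor((v1 + v2) ** 2 / (v1 ** 2 / (28 - 1) + v2 ** 2 / (22 - 1)))

t_025 = 2.012  # t.025 with 47 degrees of freedom, from the text's t table
point, margin, (lower, upper) = t_interval_diff(1025, 910, 150, 125, 28, 22, t_025)
print(f"df = {df}, {point} ± {margin:.2f}")
```

Rounding the degrees of freedom down (here from 47.8 to 47) slightly enlarges tα/2, which is what makes the interval conservative.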
The point estimate of the difference between the population mean checking account balances
at the two branches is $115. The margin of error is $78, and the 95% confidence interval