Applied Statistical Methods 1000
Extra Credit Exercises
Rules
- All extra credit should be your individual work; otherwise, points
will be deducted. [Students who wish to work together on these
problems must request my permission to do so in advance.]
- Hand them in to me in lecture, in a separate pile from regular
assignments.
Extra Credit 1
Due in lecture February 2. Worth 5 pts.
Students in a class were classified according to whether their major was
undecided or not, and whether they lived on or off campus. 40 students
lived off campus and had a decided major; 10 students lived off campus and
had an undecided major. 24 students lived on campus and had a decided
major; 26 students lived on campus and had an undecided major.
- First analyze the relationship:
- Complete a two-way table for the data.
- Which group has a higher proportion living on campus---the decided
or the undecided majors?
- Compute a table of counts expected if there were no relationship
between living situation and major decided or not.
- Calculate the chi-squared statistic.
- Which one of the following is the best way to summarize the
situation? (i) There is no statistically significant relationship between
living situation and major being decided or not. (ii) Year at Pitt is a
confounding variable in the relationship between living situation and
major decided or not. (iii) Living on campus prevents students from
deciding on a major. (iv) Deciding on a major causes students to move
off campus.
- [This is the challenging part!]
Now create two separate two-way tables for "underclassmen" and
"upperclassmen", whose counts together total to those in the original
table, but neither of which shows a significant relationship between
living situation and major being decided or not. In other words,
create a scenario which demonstrates Simpson's Paradox.
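The expected-count and chi-squared steps in the first part can be checked with a short script. This is only a sketch in Python (the course itself uses MINITAB and hand calculation); the function names `expected_counts` and `chi_squared` are made up for illustration.

```python
# Observed counts from the exercise: rows = living situation, columns = major status.
observed = {
    "off": {"decided": 40, "undecided": 10},
    "on": {"decided": 24, "undecided": 26},
}

def expected_counts(table):
    """Counts expected under no relationship: row total * column total / grand total."""
    row_totals = {r: sum(cols.values()) for r, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    grand = sum(row_totals.values())
    return {r: {c: row_totals[r] * col_totals[c] / grand for c in col_totals}
            for r in table}

def chi_squared(table):
    """Sum of (observed - expected)^2 / expected over all four cells."""
    exp = expected_counts(table)
    return sum((table[r][c] - exp[r][c]) ** 2 / exp[r][c]
               for r in table for c in table[r])

print(expected_counts(observed))
print(round(chi_squared(observed), 2))
```

A 2-by-2 table has 1 degree of freedom, so the statistic is usually compared with 3.84 at the 5% level. The same two functions can be applied to each of the underclassmen and upperclassmen tables you construct for the challenging part.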
Extra Credit 2
Due in lecture March 12. Worth 5 pts.
Extra Credit Exercises 2 through 10 are based on student survey data
survey9-21-03.txt, which is taken to be our population. To load it into
MINITAB, type ctrl A to highlight, ctrl C to copy, start up MINITAB, and
type ctrl V to paste it. If it asks about delimiters, click OK.
The purpose of this exercise is to explore how sample size affects the
distribution of sample proportion.
- First verify that the population of categorical values for the
variable "live" has approximately equal proportions of the two possible
values "off" and "on":
- Stat>Tables>Tally
- Variables Live
- Display Counts and Percents
- Next take repeated small samples (size 10) from the population of values
for the variable "live", for which the population proportion p living off
campus you have verified to be approximately 50% or .5. [About half of
our population of students live off campus, the other half on campus.]
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Live
- Store samples in Livesmallsample1
- Stat>Tables>Tally
- Variables Livesmallsample1
- Display Counts and Percents
- Create a column called "phatliven=10" and type in the sample proportion
living off campus (for example, .6 if the sample proportion is 60%)
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Live
- Store samples in Livesmallsample2 [The easiest way to do this
is to simply change the "1" in the variable name to a "2".]
- Stat>Tables>Tally
- Variables Livesmallsample2 [Again, just change the "1" to a
"2".]
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phatliven=10"
- Repeat the process above 20 times altogether, finishing with
"Livesmallsample20", for which the proportion living off campus will be
the 20th entry in "phatliven=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phatliven=10
- Graph>Stem-and-Leaf
- Enter the variable phatliven=10
- Summarize the distribution of sample proportion for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large samples (size 40) from the population of values
for the variable "live" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livelargesample1
- Stat>Tables>Tally
- Variables Livelargesample1
- Display Counts and Percents
- Create a column called "phatliven=40" and type in the sample proportion
living off campus (for example, .525 if the sample proportion is 52.5%)
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livelargesample2
- Stat>Tables>Tally
- Variables Livelargesample2
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phatliven=40"
- Repeat the process above 20 times altogether, finishing with
"Livelargesample20", for which the proportion living off campus will be
the 20th entry in "phatliven=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phatliven=40
- Graph>Stem-and-Leaf
- Enter the variable phatliven=40
- Summarize the distribution of sample proportion for samples of size
40 by reporting center, spread, and shape.
- Lastly, and most importantly, compare the centers, spreads, and shapes
for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables phatliven=10 and
phatliven=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables phatliven=10 and
phatliven=40
- Check Graph>Boxplots of Data
Write a few sentences to compare the distribution of sample proportion
for small vs. large samples, including mention of center, spread, and shape.
Are your results consistent with the theory presented in Chapter 9?
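The repeated-sampling procedure above can also be sketched outside MINITAB. The Python below is an illustration under stated assumptions: the 400-student `population` list is a made-up stand-in for the "live" column (half "off", half "on"), and sampling is done without replacement. The function name `sample_proportions` is hypothetical.

```python
import random
import statistics

def sample_proportions(population, n, reps, value="off", seed=None):
    """Draw `reps` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    props = []
    for _ in range(reps):
        sample = rng.sample(population, n)  # sampling without replacement
        props.append(sample.count(value) / n)
    return props

# Hypothetical stand-in for the survey column: 200 "off" and 200 "on".
population = ["off"] * 200 + ["on"] * 200

for n in (10, 40):
    phats = sample_proportions(population, n, reps=20, seed=1)
    print(f"n={n}: mean={statistics.mean(phats):.3f}, "
          f"sd={statistics.stdev(phats):.3f}")
```

With only 20 repetitions the summaries will vary noticeably from run to run; raising `reps` shows the long-run pattern more clearly.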
Extra Credit 3
Prerequisite: Extra Credit 2. Due in lecture March 12. Worth 5 pts.
The purpose of this exercise is to explore how population shape
affects the distribution of sample proportion.
- First verify that the population of categorical values for the
variable "handed", for which we are interested in the proportion who are
ambidextrous, is very skewed: only about 3% (.03) are ambidextrous; the
remaining 97% favor either the right or the left hand.
- Stat>Tables>Tally
- Variables Handed
- Display Counts and Percents
- Next take repeated small samples (size 10) from the population of values
for the variable "handed", for which the population proportion p who are
ambidextrous you have verified to be approximately 3% or .03.
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Handed
- Store samples in Handedsmallsample1
- Stat>Tables>Tally
- Variables Handedsmallsample1
- Display Counts and Percents
- Create a column called "phathandedn=10" and type in the sample proportion
ambidextrous (for example, .1 if the sample proportion is 10%, or 0 if
the sample only contains right- and left-handed people)
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Handed
- Store samples in Handedsmallsample2 [The easiest way to do this
is to simply change the "1" in the variable name to a "2".]
- Stat>Tables>Tally
- Variables Handedsmallsample2 [Again, just change the "1" to a
"2".]
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phathandedn=10"
- Repeat the process above 20 times altogether, finishing with
"Handedsmallsample20", for which the proportion who are ambidextrous will be
the 20th entry in "phathandedn=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phathandedn=10
- Graph>Stem-and-Leaf
- Enter the variable phathandedn=10
- Summarize the distribution of sample proportion for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large samples (size 40) from the population of values
for the variable "handed" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Handed
- Store samples in Handedlargesample1
- Stat>Tables>Tally
- Variables Handedlargesample1
- Display Counts and Percents
- Create a column called "phathandedn=40" and type in the sample proportion
who are ambidextrous (for example, .075 if the sample proportion is 7.5%)
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Handed
- Store samples in Handedlargesample2
- Stat>Tables>Tally
- Variables Handedlargesample2
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phathandedn=40"
- Repeat the process above 20 times altogether, finishing with
"Handedlargesample20", for which the proportion who are ambidextrous will be
the 20th entry in "phathandedn=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phathandedn=40
- Graph>Stem-and-Leaf
- Enter the variable phathandedn=40
- Summarize the distribution of sample proportion for samples of size
40 by reporting center, spread, and shape.
- Next compare the centers, spreads, and shapes for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables phathandedn=10 and
phathandedn=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables phathandedn=10 and
phathandedn=40
- Check Graph>Boxplots of Data
Are your results consistent with the theory presented in Chapter 9?
- Lastly, and most importantly, compare the shapes of the distributions
of sample proportion for samples of size 10 coming from "Live" vs. from
"Handed" and for samples of size 40 coming from "Live" vs. from "Handed".
For which population do the distributions of sample proportion for a given
sample size tend to be more normal, for the variable "Live" or for the
variable "Handed"?
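One way to anticipate the answer is to look at the expected counts n*p and n*(1-p) for each variable and sample size. The sketch below assumes a common rule of thumb (both expected counts at least 10; your textbook may use a different threshold) and treats the draws as independent, which is only approximately true when sampling from a finite column. The function names are made up for illustration.

```python
def normal_approx_check(p, n, threshold=10):
    """Rule of thumb (threshold is an assumption): n*p and n*(1-p) both large enough."""
    expected_yes = n * p
    expected_no = n * (1 - p)
    return expected_yes >= threshold and expected_no >= threshold

def prob_phat_is_zero(p, n):
    """Chance that a sample of size n contains nobody in the category of interest."""
    return (1 - p) ** n

for label, p in (("Live", 0.5), ("Handed", 0.03)):
    for n in (10, 40):
        print(label, n,
              "approx normal" if normal_approx_check(p, n) else "not yet normal",
              round(prob_phat_is_zero(p, n), 4))
```

For "Handed" a large share of small samples contain no ambidextrous students at all, so the sample proportions pile up at 0 and the stem-and-leaf plot is strongly right-skewed.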
Extra Credit 4
Due in lecture March 12. Worth 5 pts.
The purpose of this exercise is to explore how sample size affects the
distribution of sample mean.
- First verify that our population of quantitative values for the
variable "math" has mean mu=610.44 and standard deviation sigma=72.14, and
that the shape is quite normal:
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Math
- Graph>Histogram
- Variables Math
- Now take repeated small samples (size 10) from the population of
quantitative values for the variable "math".
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Math
- Store samples in Mathsmallsample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathsmallsample1
- Create a column called "xbarmathn=10" and type in the sample mean
Math SAT score
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Math
- Store samples in Mathsmallsample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathsmallsample2
- Type the second sample mean as the second entry in the column
"xbarmathn=10"
- Repeat the process above 20 times altogether, finishing with
"Mathsmallsample20", for which the sample mean Math SAT score will be
the 20th entry in "xbarmathn=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarmathn=10
- Graph>Stem-and-Leaf
- Enter the variable xbarmathn=10
- Summarize the distribution of sample mean for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large samples (size 40) from the population of values
for the variable "math" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathlargesample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathlargesample1
- Create a column called "xbarmathn=40" and type in the sample mean
Math SAT score
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathlargesample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathlargesample2
- Type the second sample mean as the second entry in the column
"xbarmathn=40"
- Repeat the process above 20 times altogether, finishing with
"Mathlargesample20", for which the sample mean Math SAT score will be
the 20th entry in "xbarmathn=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarmathn=40
- Graph>Stem-and-Leaf
- Enter the variable xbarmathn=40
- Summarize the distribution of sample mean for samples of size
40 by reporting center, spread, and shape.
- Lastly, and most importantly, compare the centers, spreads, and shapes
for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables xbarmathn=10 and
xbarmathn=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables xbarmathn=10 and
xbarmathn=40
- Check Graph>Boxplots of Data
Are your results consistent with the theory presented in Chapter 9? Write
a paragraph to explain your answer.
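For comparison with your simulated results, the usual formula for the standard deviation of sample mean is sigma divided by the square root of n. The sketch below applies it to the population values stated in this exercise; it ignores any finite-population correction, which is an assumption on my part.

```python
import math

MU = 610.44     # population mean of Math, as stated in the exercise
SIGMA = 72.14   # population standard deviation of Math, as stated in the exercise

def sd_of_sample_mean(sigma, n):
    """Standard deviation of sample mean for random samples of size n."""
    return sigma / math.sqrt(n)

for n in (10, 40):
    print(f"n={n}: center {MU}, spread {sd_of_sample_mean(SIGMA, n):.2f}")
```

Quadrupling the sample size from 10 to 40 halves the spread, while the center stays at mu.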
Extra Credit 5
Prerequisite: Extra Credit 4. Due in lecture March 12. Worth 5 pts.
The purpose of this exercise is to explore how population shape
affects the distribution of sample mean.
- First verify that our population of quantitative values for the
variable "Earned" has mean mu=3.776 thousand and standard deviation
sigma=6.503 thousand, and that the shape is quite skewed to the right:
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earned
- Graph>Histogram
- Variables Earned
- Now take repeated small samples (size 10) from the population of
quantitative values for the variable "earned".
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Earned
- Store samples in Earnedsmallsample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedsmallsample1
- Create a column called "xbarearnedn=10" and type in the sample mean
of Earned
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Earned
- Store samples in Earnedsmallsample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedsmallsample2
- Type the second sample mean as the second entry in the column
"xbarearnedn=10"
- Repeat the process above 20 times altogether, finishing with
"Earnedsmallsample20", for which the sample mean of Earned will be
the 20th entry in "xbarearnedn=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarearnedn=10
- Graph>Stem-and-Leaf
- Enter the variable xbarearnedn=10
- Summarize the distribution of sample mean for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large samples (size 40) from the population of values
for the variable "earned" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Earned
- Store samples in Earnedlargesample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedlargesample1
- Create a column called "xbarearnedn=40" and type in the sample mean
of Earned
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Earned
- Store samples in Earnedlargesample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedlargesample2
- Type the second sample mean as the second entry in the column
"xbarearnedn=40"
- Repeat the process above 20 times altogether, finishing with
"Earnedlargesample20", for which the sample mean of Earned will be
the 20th entry in "xbarearnedn=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarearnedn=40
- Graph>Stem-and-Leaf
- Enter the variable xbarearnedn=40
- Summarize the distribution of sample mean for samples of size
40 by reporting center, spread, and shape.
- Next compare the centers, spreads, and shapes for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables xbarearnedn=10 and
xbarearnedn=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables xbarearnedn=10 and
xbarearnedn=40
- Check Graph>Boxplots of Data
Are your results consistent with the theory presented in Chapter 9? Write
a paragraph to explain your answer.
- Lastly, and most importantly, compare shapes of the distributions of
sample mean for samples of size 10 coming from "Math" vs. from "Earned",
and for samples of size 40 coming from "Math" vs. from "Earned". For which
population do the distributions of sample mean for a given sample size
tend to be more normal, for the variable "Math" or for the variable
"Earned"?
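The effect of population shape can be illustrated with a small simulation. The sketch below does not use the survey data: as stand-ins it draws from a normal distribution (symmetric, like "Math") and an exponential distribution (right-skewed, like "Earned"), and it measures shape with a simple skewness statistic. All names here are hypothetical, and the stand-in distributions do not match the survey's means or standard deviations.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: near 0 for a symmetric shape, positive for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n independent draws."""
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(reps)]

def draw_symmetric(rng):
    return rng.gauss(0.0, 1.0)      # stand-in for a roughly normal population

def draw_skewed(rng):
    return rng.expovariate(1.0)     # stand-in for a right-skewed population

rng = random.Random(2003)
for name, draw in (("symmetric", draw_symmetric), ("right-skewed", draw_skewed)):
    for n in (10, 40):
        means = sample_means(draw, n, reps=2000, rng=rng)
        print(name, n, round(skewness(means), 2))
```

For the skewed stand-in, the skewness of the sample means should shrink toward 0 as n grows, which is the pattern to look for in your own stem-and-leaf plots.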
Extra Credit 6
Due in lecture March 30. Worth 5 pts.
The purpose of this exercise is to understand the long-run behavior of
confidence intervals.
- First verify that the population proportion p of all survey
respondents living off campus is almost exactly .5:
- Stat>Tables>Tally
- Variables Live, Check "Counts and Percents".
- Next, take repeated samples (20 altogether) of size 40 from the
population of categorical values for the variable "live", obtaining a 90%
confidence interval each time for the "unknown" population proportion,
based on each sample proportion.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livesample1
- Stat>Basic Statistics>1 proportion
- Samples in columns Livesample1
- Options>Confidence Level>90 and Check "Use test and interval based
on normal distribution"
- The first confidence interval is shown in the session window; you will
need to examine all 20 intervals together once they've been produced.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livesample2
- Stat>Basic Statistics>1 proportion
- Samples in columns Livesample2
- [No need to specify 90% confidence and normal distribution, since these
continue to be enabled by default.]
- Repeat the process above 20 times altogether, finishing with
"Livesample20".
- Now examine all 20 confidence intervals. How many of them contain
the actual population proportion .5? In the long run, what percentage of
the 90% confidence intervals should contain p? [Note: keep all the
results in your session window handy if you intend to do Extra Credit 7,
which will focus on p-values rather than on confidence intervals.]
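A normal-based interval of the kind requested above can be reproduced by hand or with a few lines of code. The sketch below assumes the usual formula, sample proportion plus or minus z times sqrt(phat(1-phat)/n); the count of 24 out of 40 is a made-up example, not a result from the survey.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.90):
    """Normal-based confidence interval for a population proportion."""
    phat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z * math.sqrt(phat * (1 - phat) / n)
    return phat - margin, phat + margin

# Hypothetical sample: 24 of 40 sampled students live off campus.
low, high = proportion_interval(24, 40)
print(round(low, 3), round(high, 3), "contains .5:", low <= 0.5 <= high)
```

Counting how many of your 20 intervals contain .5 is then a matter of repeating the last check for each sample.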
Extra Credit 7
Prerequisite: Extra Credit 6.
The purpose of this exercise is to understand the long-run behavior of
hypothesis tests.
- First recall that the population proportion p of all survey
respondents living off campus is almost exactly .5.
- Next examine all 20 p-values obtained in Extra Credit 6. [These were
produced for the two-sided test about the null hypothesis that the population
proportion p living off campus is .5] How many of them reject the null
hypothesis at the 10% level? In the long run, what percentage of
the tests should reject against the two-sided alternative at the 10% level,
when the null hypothesis is in fact true?
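The two-sided p-values can be recomputed from the sample counts if needed. This is a sketch assuming the standard z test for a proportion, with the standard error based on the null value .5; the counts 24 and 26 out of 40 are invented examples.

```python
import math
from statistics import NormalDist

def two_sided_p_value(successes, n, p0=0.5):
    """Two-sided p-value for the z test of H0: p = p0."""
    phat = successes / n
    z = (phat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical samples of size 40 with 24 and 26 students living off campus.
for count in (24, 26):
    p = two_sided_p_value(count, 40)
    print(count, round(p, 4), "reject" if p < 0.10 else "do not reject")
```

Because the null hypothesis is true for this population, any rejection among your 20 tests is a Type I error.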
Extra Credit 8
Due in lecture April 13. Worth 5 pts.
The purpose of this exercise is to understand the long-run behavior of
confidence intervals.
- First verify that the population mean mu of all survey
respondents' Math SAT scores is 610.44, and the standard deviation sigma
is 72.14.
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Math
- Next, take repeated samples (20 altogether) of size 40 from the
population of quantitative values for the variable "math", obtaining a 90%
confidence interval each time for the "unknown" population mean, based on
each sample mean.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathsample1
- Stat>Basic Statistics>1-sample z
- Variables Mathsample1; Test Mean 610.44 [This will
be needed for Extra Credit 9] and enter Sigma as 72.14
- Options>Confidence Level>90
- The first confidence interval is shown in the session window; you will
need to examine all 20 intervals together once they've been produced.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathsample2
- Stat>Basic Statistics>1-sample z
- Variables Mathsample2; continue to Test Mean 610.44, given sigma
72.14, and opt for Confidence Level 90.
- Repeat the process above 20 times altogether, finishing with
"Mathsample20".
- Now examine all 20 confidence intervals. How many of them contain
the actual population mean 610.44? In the long run, what percentage of
the 90% confidence intervals should contain mu?
[Note: keep all the
results in your session window handy if you intend to do Extra Credit 9,
which will focus on p-values rather than on confidence intervals.]
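The z interval for each sample can be checked with a short calculation. The sketch below assumes the usual formula, sample mean plus or minus z times sigma/sqrt(n), with sigma taken from this exercise; the sample mean 625.0 is a made-up value for illustration.

```python
import math
from statistics import NormalDist

MU = 610.44     # population mean, as stated in the exercise
SIGMA = 72.14   # population standard deviation, as stated in the exercise

def z_interval(xbar, n, sigma=SIGMA, confidence=0.90):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample mean from one sample of size 40.
low, high = z_interval(625.0, 40)
print(round(low, 2), round(high, 2), "contains mu:", low <= MU <= high)
```

Since sigma and n are the same for every sample, all 20 intervals have the same width; only their centers move.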
Extra Credit 9
Prerequisite: Extra Credit 8.
The purpose of this exercise is to understand the long-run behavior of
hypothesis tests.
- First recall that the population mean Math SAT score mu of all
survey respondents was 610.44.
- Next examine all 20 p-values obtained in Extra Credit 8. [These were
produced for the two-sided z test about the null hypothesis that the population
mean mu was 610.44.] How many of them reject the null
hypothesis at the 10% level? In the long run, what percentage of
the tests should reject against the two-sided alternative at the 10% level,
when the null hypothesis is in fact true?
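For the z procedures, the test and the interval are two views of the same calculation: the two-sided test rejects at the 10% level exactly when the 90% interval misses the hypothesized mean. The sketch below checks this for a few invented sample means; the function names are hypothetical.

```python
import math
from statistics import NormalDist

MU0 = 610.44    # hypothesized (and actual) population mean
SIGMA = 72.14   # known population standard deviation
N = 40          # sample size used in the exercise

def z_test_p_value(xbar, mu0=MU0, sigma=SIGMA, n=N):
    """Two-sided p-value for the z test of H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def interval_misses(xbar, mu0=MU0, sigma=SIGMA, n=N, confidence=0.90):
    """True if the z interval centered at xbar does not contain mu0."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(xbar - mu0) > z_star * sigma / math.sqrt(n)

# Made-up sample means, chosen only to show both outcomes.
for xbar in (595.0, 605.0, 625.0, 632.0):
    p = z_test_p_value(xbar)
    print(xbar, round(p, 4), "reject:", p < 0.10, "interval misses:", interval_misses(xbar))
```

The number of rejections among your 20 tests should therefore equal the number of intervals from Extra Credit 8 that failed to contain 610.44.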
Extra Credit 10
Due in lecture April 13. Worth 5 pts.
The purpose of this exercise is to learn about chi-square goodness of fit
tests.
- Read Section 15.3 of your textbook, pages 652 to 656 (2nd ed. 544 to 547).
- Find the counts of surveyed fall stats students in years 1, 2, 3, 4
by accessing the survey results: survey9-21-03.txt. Carry out a chi-square
goodness of fit test by hand to determine if the population of undergraduate
stats students may be evenly divided among years 1, 2, 3, 4, or if the
proportions in the various years differ significantly. ["Other" students
should be excluded from your calculations.]
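After working the test by hand, the arithmetic can be checked with a few lines of code. This sketch assumes equal expected counts across the four years, as the exercise describes; the counts in `made_up_counts` are invented and are not the survey results.

```python
def goodness_of_fit_equal(counts):
    """Chi-square statistic and degrees of freedom for H0: all categories equally likely."""
    total = sum(counts)
    expected = total / len(counts)          # same expected count in every category
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    df = len(counts) - 1
    return statistic, df

# Invented counts for years 1, 2, 3, 4 (NOT the survey data).
made_up_counts = [30, 50, 40, 20]
statistic, df = goodness_of_fit_equal(made_up_counts)

# Table value for a chi-square distribution with 3 degrees of freedom, 5% level.
CRITICAL_5_PERCENT_DF3 = 7.815
print(round(statistic, 2), df, "significant:", statistic > CRITICAL_5_PERCENT_DF3)
```

Replace the invented counts with the tallies you find in the survey file, leaving out the "Other" students.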