Applied Statistical Methods 1000
Assignments
Rules
- All assignments should be your individual work; otherwise, points
will be deducted. [Students who wish to work together on homework
must request my permission to do so in advance.]
- Because answer keys are made available after homework is turned
in, late homeworks will not be accepted. In a valid emergency,
your recitation instructor may make an exception.
- Your homework should be neat and well-organized. Show your work
and circle your answers. Your recitation instructor is a student
like you and will not take time to decipher poor handwriting, put
pages in order, or read notes scrawled in margins.
- Be sure to write or print your name at the top of the first page
of your homework. Put your name or initials at the top of each
additional sheet of paper or computer output. Staple your pages
together.
- Answer keys are placed on file in the Math-Stat Library
(4th floor Thackeray)
on Mondays after assignments are handed in. They are
on two-hour reserve so that you can take them out to be copied.
- Computer output must be circled/underlined and explained in order to receive full credit. Hand in printout of session window and/or graphs, not the worksheet of data values.
- If an Exercise requires you to find an article or internet report,
hand in a copy of it with your work.
Homework 0
Due in lecture January 9. Points shown total 2.
Exercise: Hand in an article or report about a statistical study;
tell what variable or variables are involved and whether they are quantitative
or categorical. If there are two variables, tell which is explanatory and
which is response.
Homework 00
Due in lecture January 16. Points shown total 6.
Exercise: Pick a quantitative variable from those in the survey.
Use MINITAB to display the variable's values with all three graphs discussed:
a dotplot, a histogram, and a stemplot. Report the median for center,
range for spread, and describe the shape. Be sure to mention if there
are outliers.
Exercise: Consider the values of one quantitative variable
in our survey compared for two categorical groups. First, state your
expectations about how the quantitative values would compare for the two
groups. Then use MINITAB to get
side-by-side boxplots and report the Five Number Summary for each. Tell
how their centers, spreads, and shapes compare. Use the 1.5*IQR Rule to
report the boundaries for low and high outliers in both groups, and tell
whether there are any outliers according to the Rule.
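The Five Number Summary and the 1.5*IQR Rule can also be checked outside MINITAB. The following is a minimal sketch, not the required method; the data values and the function names are hypothetical, and the quartile convention is an assumption (software packages and textbooks differ slightly for small data sets).

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # statistics.quantiles defaults to the "exclusive" method, which places
    # quartiles at positions i*(n+1)/4; other conventions may give slightly
    # different quartiles for small data sets.
    q1, median, q3 = statistics.quantiles(data, n=4)
    return data[0], q1, median, q3, data[-1]

def outlier_fences(values):
    """Return the (low, high) boundaries given by the 1.5*IQR Rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values):
    """List the values falling outside the 1.5*IQR boundaries."""
    low, high = outlier_fences(values)
    return [v for v in values if v < low or v > high]

# Hypothetical values for one group (not taken from the class survey).
group_a = [2, 4, 4, 5, 6, 6, 7, 8, 9, 10, 25]
print(five_number_summary(group_a))
print(outlier_fences(group_a))
print(find_outliers(group_a))
```

Running the same functions on the second group gives the numbers needed to compare centers, spreads, and outliers side by side.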
Exercise: Find an article or report about an experiment. Tell
what the variables of interest are, whether they are quantitative or
categorical, and which is explanatory and which is response. Describe the subjects,
treatments, whether or not the study was blind, etc.
Homework 1
Due in lecture January 23. Points shown total 20.5.
[1 pt.] | 1.7(a)(b) | (page 9)
[1 pt.] | 1.10 | (page 9)
[1.5 pts.] | 1.12 | (page 9)
[1.5 pts.] | 1.17 | (page 10)
[.5 pt.] | 1.18(a) | (page 10)
[1.5 pts.] | 2.5(b)(c)(d) | (page 48)
[2 pts.] | 2.7 | (page 48)
[2.5 pts.] | 2.16 | (page 49)
[1 pt.] | 2.26 | (page 51)
[2.5 pts.] | 2.41 | (page 52)
[.5 pt.] | 2.54(c) | (page 53)
[1.5 pts.] | 2.56(b)(c)(d) | (page 53)
[1 pt.] | 2.57(b)(c) | (page 53)
[.5 pt.] | 2.84(b) | (page 55)
[2 pts.] Exercise: Find an article or report about an observational study.
Tell what the variables of interest are, whether they are quantitative or
categorical, and which is explanatory and which is response (if there are two
variables). Are there any potential confounding variables that should have been
controlled for? Are there any other pitfalls of concern?
Homework 2
Due in lecture January 30. Points shown total 17.5.
[.5 pt.] | 3.9 | (page 82)
[.5 pt.] | 3.10 (refer to 3.9(b)) | (page 82)
[.5 pt.] | 3.11 (refer to 3.9(b)) | (page 82)
[.5 pt.] | 3.27(b) | (page 83)
[.5 pt.] | 3.39 | (page 84)
[.5 pt.] | 3.40 | (page 84)
[1 pt.] | 3.54 | (page 85)
[.5 pt.] | 3.59 | (page 86)
[1 pt.] | 3.62 | (page 86)
[1 pt.] | 4.4 | (page 121)
[1.5 pts.] | 4.7 | (page 121)
[.5 pt.] | 4.10 | (page 122)
[.5 pt.] | 4.20 | (page 123)
[1 pt.] | 4.31 | (page 124)
[1 pt.] | 4.38 | (page 124)
[1 pt.] | 4.57(a)(d) | (page 125)
[1.5 pts.] | 4.81 | (page 128)
[2 pts.] Exercise: Find an article or internet report about a sample
survey. Tell whether the variable(s) of interest are quantitative or categorical.
Then tell how the individuals were selected and whether or not you believe
they adequately represent the population of interest. Discuss whether any
of the 5 common problems in the selection process (using the wrong sampling
frame, etc.) or any of the 7 pitfalls in the surveying process (deliberate
bias, etc.) apply. Were the questions open or closed?
[2 pts.] Exercise: Pick two quantitative variables from our survey,
decide on roles of explanatory and response, and tell what you expect
to see in terms of their relationship. Use MINITAB to explore
the relationship between them: start by assessing the scatterplot.
Be sure to mention direction, form, strength, and outliers. Your
summary should tell the value of the correlation r and, if the form
appeared linear, the equation of the regression line. Summarize your
findings in the context of the specific variables chosen.
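For reference, the correlation r and the least-squares regression line that MINITAB reports can be reproduced from the standard formulas. This is a sketch under stated assumptions: the paired values below are hypothetical (not survey data), and the function name is made up for illustration.

```python
from math import sqrt

def correlation_and_line(x, y):
    """Return (r, slope, intercept) for paired quantitative values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical explanatory (x) and response (y) values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r, slope, intercept = correlation_and_line(x, y)
print(f"r = {r:.3f}; response = {intercept:.2f} + {slope:.2f} * explanatory")
```

The numbers only summarize a linear form; the scatterplot still has to be examined first for curvature and outliers.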
Homework 3
Due in lecture February 6. Points shown total 22.
[1 pt.] | 5.1(c)(d) | (page 161)
[1.5 pts.] | 5.3(b)(c)(d) | (page 161)
[1 pt.] | 5.4 | (page 162)
[1 pt.] | 5.10 | (page 163)
[.5 pt.] | 5.23 | (page 164)
[.5 pt.] | 5.32 | (page 165)
[6 pts.] | 5.55 | (page 167) | Use MINITAB; mark and hand in relevant output along with specific answers to textbook questions.
[1 pt.] | 6.3(d)(e) | (page 193)
[2.5 pts.] | 6.7 | (page 194)
[.5 pt.] | 6.15 | (page 194)
[1.5 pts.] | 6.25 | (page 195)
[1 pt.] | 6.50 | (page 199)
[2 pts.] Exercise: Read "Boys spur marriage" and complete a two-way
table for this study consistent with all the numbers reported. Assume
that the 600 children are equally divided between boys and girls, and
assume that half of the fathers of girls ended up marrying the mother.
If the proportion marrying the mother is 42% higher in the case of boys,
how many would that be?
Use a chi-square procedure to tell whether the difference observed is
statistically significant.
[2 pts.] Exercise: Pick two categorical variables from our survey, decide
which should be explanatory and which response, and discuss if and how you
expect them to be related. Then analyze the relationship
between them: compare conditional percentages in the response category of
interest and tell whether the observed difference seems to you to be
significant. Then compute a table of counts expected if the variables were
not related, and compute the chi-square statistic. Use Table A.5 to tell
whether there is a statistically significant relationship.
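The expected counts and the chi-square statistic follow directly from the row and column totals. Below is a minimal sketch of that arithmetic; the 2-by-2 counts are hypothetical and the function names are illustrative only. The comparison value 3.84 is the standard chi-square cutoff for 1 degree of freedom at the .05 level.

```python
def expected_counts(table):
    """Counts expected if the two variables were not related."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

# Hypothetical counts: rows are explanatory groups, columns are responses.
observed = [[30, 20],
            [20, 30]]
statistic = chi_square(observed)
print(expected_counts(observed))
print(statistic, "significant at .05" if statistic > 3.84 else "not significant at .05")
```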
Extra Credit 1
Due in lecture February 6. Worth 5 pts.
Students in a class were classified according to whether their major was
undecided or not, and whether they lived on or off campus. 40 students
lived off campus and had a decided major; 10 students lived off campus and
had an undecided major. 24 students lived on campus and had a decided
major; 26 students lived on campus and had an undecided major.
- First analyze the relationship:
- Complete a two-way table for the data.
- Which group has a higher proportion living on campus---the decided
or the undecided majors?
- Compute a table of counts expected if there were no relationship
between living situation and major decided or not.
- Calculate the chi-squared statistic.
- Which one of the following is the best way to summarize the
situation? (i) There is no statistically significant relationship between
living situation and major being decided or not. (ii) Year at Pitt is a
confounding variable in the relationship between living situation and
major decided or not. (iii) Living on campus prevents students from
deciding on a major. (iv) Deciding on a major causes students to move
off campus.
- Now create two separate two-way tables for "underclassmen" and
"upperclassmen", whose counts together total to those in the original
table, but neither of which shows a significant relationship between
living situation and major being decided or not. In other words,
create a scenario which demonstrates Simpson's Paradox.
Homework 4
Due in lecture February 13. Points shown total 13.5.
[1.5 pts.] | 7.17 | (page 241)
[1 pt.] | 7.18 | (page 241)
[1 pt.] | 7.25(c)(d) | (page 242)
[2 pts.] | 7.34 | (page 242)
[1.5 pts.] | 7.78 | (page 246)
[1 pt.] | 7.83 | (page 246)
[.5 pt.] | 7.85 | (page 246)
[.5 pt.] | 7.93 | (page 247)
[.5 pt.] | 7.94 | (page 247)
[2 pts.] Exercise: Write up and email me (in the body of the message, not
as an attachment) a personal coincidence story that happened to you. Were
the occurrences really so unlikely?
[2 pts.] Exercise: Use the class survey to report the probability
distribution of year for the surveyed undergraduates (years 1, 2, 3, and
4). [You will need to tally the years and adjust the total to exclude
"other" students.] Find the mean, variance, and standard deviation. Use mean and
standard deviation in a sentence about the distribution of year in order
to tell what is typical for surveyed students.
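Once the years are tallied, the probability distribution and its mean, variance, and standard deviation come from weighting each value by its probability. The sketch below assumes hypothetical tallies (the real counts come from the class survey), and the function name is illustrative.

```python
from math import sqrt

def distribution_summary(tallies):
    """Turn counts per value into probabilities, then mean, variance, sd."""
    total = sum(tallies.values())
    probabilities = {value: count / total for value, count in tallies.items()}
    mean = sum(value * p for value, p in probabilities.items())
    variance = sum((value - mean) ** 2 * p for value, p in probabilities.items())
    return probabilities, mean, variance, sqrt(variance)

# Hypothetical tallies of year, with "other" students already excluded.
hypothetical_tallies = {1: 40, 2: 30, 3: 20, 4: 10}
probabilities, mean, variance, sd = distribution_summary(hypothetical_tallies)
print(probabilities)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
```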
Homework 5
Due in lecture February 20. Points shown total 17.5.
[1 pt.] | 8.6 | (page 285)
[1 pt.] | 8.7(b)(c) | (page 285)
[1.5 pts.] | 8.18 | (page 286)
[.5 pt.] | 8.27 | (page 287)
[1 pt.] | 8.34(a)(b) | (page 287)
[1.5 pts.] | 8.43(a)(c)(d) | (page 288)
[1 pt.] | 8.44(f)(g) | (page 288)
[1.5 pts.] | 8.45 | (page 288)
[1.5 pts.] | 8.49(b)(c)(d) | (page 288)
[1.5 pts.] | 8.50 | (page 288)
[1 pt.] | 8.51(b)(c) | (page 288)
[1 pt.] | 8.53(c)(d) | (page 289)
[1 pt.] | 8.54(d)(e) | (page 289)
[.5 pt.] | 8.56 | (page 289)
[1 pt.] | 8.60(a)(b) | (page 289)
[1 pt.] | 8.62 | (page 289)
Homework 6
Due in lecture February 27. Points shown total 17.5.
[.5 pt.] | 9.6(a) | (page 319)
[1 pt.] | 9.12(c)(d) | (page 319)
[.5 pt.] | 9.13 | (page 319)
[1 pt.] | 9.17(b)(c) | (page 320)
[1 pt.] | 9.28(b)(d) | (page 321)
[.5 pt.] | 9.30 | (page 321)
[1 pt.] | 9.45(c)(d) | (page 322)
[1 pt.] | 9.47(c)(d) | (page 323)
[1.5 pts.] | 9.55(b)(c)(d) | (page 323)
[.5 pt.] | 9.56 | (page 323)
[1.5 pts.] | 9.69 | (page 324)
[1.5 pts.] | 9.70 | (page 324)
[2 pts.] Exercise: Assume the proportion of females in all intro Stat classes
is p=.5.
What are the mean and standard deviation of sample
proportion, if population proportion were indeed .5?
Use our class survey responses to find the sample proportion of
females in the survey.
Then use a normal approximation to find the
probability of a sample proportion as high as the one observed, if the
population proportion
were truly .5. Characterize the results, based on your probability, in words
such as "not unusual", "unlikely", "almost impossible", etc. Finally,
tell whether you believe p is .5.
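The calculation can be sketched as follows. The sample size and observed proportion used here are hypothetical placeholders, not the class survey results, and the function name is illustrative. The sample proportion has mean p and standard deviation sqrt(p(1-p)/n), and the upper-tail probability comes from the standard normal curve.

```python
from math import sqrt
from statistics import NormalDist

def sample_proportion_tail(p, n, p_hat):
    """Mean and sd of sample proportion, and P(sample proportion >= p_hat)."""
    mean = p
    sd = sqrt(p * (1 - p) / n)
    z = (p_hat - mean) / sd
    upper_tail = 1 - NormalDist().cdf(z)
    return mean, sd, upper_tail

# Hypothetical survey: 400 respondents, 55% female.
mean, sd, upper_tail = sample_proportion_tail(p=0.5, n=400, p_hat=0.55)
print(f"mean = {mean}, sd = {sd:.4f}, probability = {upper_tail:.4f}")
```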
[2 pts.] Exercise: If students each picked a number truly at random from 1
to 20, then their responses would follow a "uniform distribution", with
each of the numbers appearing with probability 1/20=.05. It can
be shown that the
mean of all the numbers between 1 and 20 is 10.5, and the standard deviation
is 5.77. What are the mean and standard deviation of sample
mean selection for a sample of 400? students, if their selections are truly
random?
Use our class survey responses to find the sample mean "random" number
selected. Then use a normal approximation to find the
probability of a sample mean as high as the one observed, if the
population mean
were truly 10.5. Characterize the results, based on your probability, in words
such as "not unusual", "unlikely", "almost impossible", etc. Finally,
tell whether you have statistical evidence of bias in favor of higher numbers.
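A short sketch of this reasoning: the first lines confirm the stated mean and standard deviation of the numbers 1 to 20, and the function applies the normal approximation to the sample mean. The sample size 400 is the tentative figure from the exercise, and the observed sample mean of 11.0 is a hypothetical placeholder, not the survey result.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

# Mean and standard deviation of the whole population of numbers 1..20.
numbers = list(range(1, 21))
population_mean = mean(numbers)   # 10.5
population_sd = pstdev(numbers)   # about 5.77

def sample_mean_tail(observed_mean, mu, sigma, n):
    """Sd of sample mean, and P(sample mean >= observed_mean) by normal approximation."""
    sd_of_mean = sigma / sqrt(n)
    z = (observed_mean - mu) / sd_of_mean
    return sd_of_mean, 1 - NormalDist().cdf(z)

# Hypothetical observed sample mean of 11.0 from 400 students.
sd_of_mean, upper_tail = sample_mean_tail(11.0, population_mean, population_sd, 400)
print(f"sd of sample mean = {sd_of_mean:.4f}, probability = {upper_tail:.4f}")
```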
[2 pts.] Exercise: Find an article or report that includes mention of
sample size and summarizes values of a categorical variable with a count,
proportion, or percentage. Based on that information, set up a 95%
confidence interval for population proportion in the category of interest.
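As an illustration of the interval, the sketch below uses the usual form: sample proportion plus or minus a multiplier times sqrt(p-hat(1-p-hat)/n). The count and sample size are hypothetical, and the multiplier is taken as the exact normal value (about 1.96); a course that rounds the multiplier to 2 will get a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(count, n, confidence=0.95):
    """Approximate confidence interval for a population proportion."""
    p_hat = count / n
    # Multiplier leaving (1 - confidence)/2 in each tail of the normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical report: 520 of 1000 respondents fall in the category of interest.
low, high = proportion_ci(520, 1000)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```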
Extra Credit 2
Due in lecture March 15. Worth 5 pts.
Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data
survey9-21-03.txt, which is taken to be
our population. To
download it into MINITAB, type ctrl A to highlight, ctrl C to copy, start up
MINITAB, type ctrl V to paste it. If it asks about delimiters, click OK.
Do not hand in all the details of the Session Window; please just include
the relevant descriptive summaries, graphs, and your answers to all questions
posed.
The purpose of this exercise is to explore how sample size affects the
distribution of sample proportion.
- First verify that the population of categorical values for the
variable "live" is very symmetric, with
equal proportions of the two possible values "off"
and "on":
- Stat>Tables>Tally
- Variables Live
- Display Counts and Percents
- Next take repeated small
samples (size 10) from the population of values
for the variable "live", for which the population proportion p living off
campus you have verified to be approximately 50% or .5. [About half of
our population of students live off campus, the other half on campus.]
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Live
- Store samples in Livesmallsample1
- Stat>Tables>Tally
- Variables Livesmallsample1
- Display Counts and Percents
- Create a column called "phatliven=10" and type in the sample proportion
living off campus (for example, .6 if the sample proportion is 60%)
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Live
- Store samples in Livesmallsample2 [The easiest way to do this
is to simply change the "1" in the variable name to a "2".]
- Stat>Tables>Tally
- Variables Livesmallsample2 [Again, just change the "1" to a
"2".]
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phatliven=10"
- Repeat the process above 20 times altogether, finishing with
"Livesmallsample20", for which the proportion living off campus will be
the 20th entry in "phatliven=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phatliven=10
- Graph>Stem-and-Leaf
- Enter the variable phatliven=10
- Summarize the distribution of sample proportion for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large samples (size 40) from the population of values
for the variable "live" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livelargesample1
- Stat>Tables>Tally
- Variables Livelargesample1
- Display Counts and Percents
- Create a column called "phatliven=40" and type in the sample proportion
living off campus (for example, .525 if the sample proportion is 52.5%)
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livelargesample2
- Stat>Tables>Tally
- Variables Livelargesample2
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phatliven=40"
- Repeat the process above 20 times altogether, finishing with
"Livelargesample20", for which the proportion living off campus will be
the 20th entry in "phatliven=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phatliven=40
- Graph>Stem-and-Leaf
- Enter the variable phatliven=40
- Summarize the distribution of sample proportion for samples of size
40 by reporting center, spread, and shape.
- Lastly, and most importantly,
compare the centers, spreads, and shapes for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables phatliven=10 and
phatliven=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables phatliven=10 and
phatliven=40
- Check Graph>Boxplots of Data
Write a few sentences to compare the distribution of sample proportion
for small vs. large samples, including mention of center, spread, and shape.
Are your results consistent with the theory presented in Chapter 9?
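The repeated-sampling procedure above can be sketched in code to see what the theory predicts. This is a rough stand-in, not a replacement for the MINITAB work: the population list is hypothetical (200 "off" and 200 "on" rather than the actual survey file), the function names are illustrative, and sampling is assumed to be without replacement.

```python
import random
from math import sqrt

def sample_proportions(population, category, sample_size, repetitions=20, seed=None):
    """Draw repeated random samples and record the proportion in `category`."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)  # without replacement
        proportions.append(sample.count(category) / sample_size)
    return proportions

def theoretical_sd(p, n):
    """Standard deviation of sample proportion predicted by the theory."""
    return sqrt(p * (1 - p) / n)

# Hypothetical stand-in for the "live" column, with p = .5 living off campus.
population = ["off"] * 200 + ["on"] * 200
phat_n10 = sample_proportions(population, "off", sample_size=10, seed=1)
phat_n40 = sample_proportions(population, "off", sample_size=40, seed=1)
print(sorted(phat_n10))
print(sorted(phat_n40))
print(theoretical_sd(0.5, 10), theoretical_sd(0.5, 40))
```

The two theoretical standard deviations differ by a factor of 2 because the sample size differs by a factor of 4; with only 20 repetitions the observed spreads will match this only roughly.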
Extra Credit 3
Prerequisite: Extra Credit 2. Due in lecture March 15. Worth 5 pts.
The purpose of this exercise is to explore how population shape
affects the
distribution of sample proportion.
- First verify that the population of categorical values for the
variable "handed", for which we are interested in the proportion who are
ambidextrous, is very skewed, with only about 3% (.03) who are ambidextrous; the remaining
97% favor either the right or the left hand.
- Stat>Tables>Tally
- Variables Handed
- Display Counts and Percents
- Next take repeated small
samples (size 10) from the population of values
for the variable "handed", for which the population proportion p who are
ambidextrous you have verified to be approximately 3% or .03.
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Handed
- Store samples in Handedsmallsample1
- Stat>Tables>Tally
- Variables Handedsmallsample1
- Display Counts and Percents
- Create a column called "phathandedn=10" and type in the sample proportion
who are ambidextrous (for example, .1 if the sample proportion is 10%, or 0 if
the sample only contains right- and left-handed people)
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Handed
- Store samples in Handedsmallsample2 [The easiest way to do this
is to simply change the "1" in the variable name to a "2".]
- Stat>Tables>Tally
- Variables Handedsmallsample2 [Again, just change the "1" to a
"2".]
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phathandedn=10"
- Repeat the process above 20 times altogether, finishing with
"Handedsmallsample20", for which the proportion who are ambidextrous will be
the 20th entry in "phathandedn=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phathandedn=10
- Graph>Stem-and-Leaf
- Enter the variable phathandedn=10
- Summarize the distribution of sample proportion for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large
samples (size 40) from the population of values
for the variable "handed" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Handed
- Store samples in Handedlargesample1
- Stat>Tables>Tally
- Variables Handedlargesample1
- Display Counts and Percents
- Create a column called "phathandedn=40" and type in the sample proportion
who are ambidextrous (for example, .075 if the sample proportion is 7.5%)
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Handed
- Store samples in Handedlargesample2
- Stat>Tables>Tally
- Variables Handedlargesample2
- Display Counts and Percents
- Type the second sample proportion as the second entry in the column
"phathandedn=40"
- Repeat the process above 20 times altogether, finishing with
"Handedlargesample20", for which the proportion who are ambidextrous will be
the 20th entry in "phathandedn=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable phathandedn=40
- Graph>Stem-and-Leaf
- Enter the variable phathandedn=40
- Summarize the distribution of sample proportion for samples of size
40 by reporting center, spread, and shape.
- Next
compare the centers, spreads, and shapes for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables phathandedn=10 and
phathandedn=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables phathandedn=10 and
phathandedn=40
- Check Graph>Boxplots of Data
Are your results consistent with the theory presented in Chapter 9?
- Lastly, and most importantly, compare the shapes of the distributions
of sample proportion for samples of size 10 coming from "Live" vs. from
"Handed" and for samples of size 40 coming from "Live" vs. from "Handed".
For which population do the distributions of sample proportion for a given
sample size tend to be
more normal, for the variable "Live" or for the variable "Handed"?
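To see why a population proportion near .03 produces skewed sample proportions, it helps to look at the probabilities of each possible count of ambidextrous people in a sample. The sketch below uses the binomial formula, which assumes sampling with replacement (an approximation to sampling from a finite column); the function names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k individuals in the category out of n sampled."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def sample_proportion_distribution(n, p, max_count=3):
    """Map each possible sample proportion k/n (k up to max_count) to its probability."""
    return {k / n: binomial_pmf(k, n, p) for k in range(max_count + 1)}

for n in (10, 40):
    print(f"n = {n}")
    for p_hat, prob in sample_proportion_distribution(n, 0.03).items():
        print(f"  sample proportion {p_hat:.3f}: probability {prob:.4f}")
```

For both sample sizes most of the probability is piled up at or near a sample proportion of 0, with a tail stretching to the right, so neither distribution is close to normal.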
Extra Credit 4
Due in lecture March 15. Worth 5 pts.
Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data
survey9-21-03.txt, which is taken to be
our population. To
download it into MINITAB, type ctrl A to highlight, ctrl C to copy, start up
MINITAB, type ctrl V to paste it. If it asks about delimiters, click OK.
Do not hand in all the details of the Session Window; please just include
the relevant descriptive summaries, graphs, and your answers to all questions
posed.
The purpose of this exercise is to explore how sample size affects the
distribution of sample mean.
- First verify that our population of quantitative values for the
variable "math" has mean mu=610.44 and standard deviation sigma=72.14, and
that the shape is quite normal:
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Math
- Graph>Histogram
- Variables Math
- Now take repeated small samples (size 10) from the population of
quantitative values for the variable "math".
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Math
- Store samples in Mathsmallsample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathsmallsample1
- Create a column called "xbarmathn=10" and type in the sample mean
Math SAT score
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Math
- Store samples in Mathsmallsample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathsmallsample2
- Type the second sample mean as the second entry in the column
"xbarmathn=10"
- Repeat the process above 20 times altogether, finishing with
"Mathsmallsample20", for which the sample mean Math SAT score will be
the 20th entry in "xbarmathn=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarmathn=10
- Graph>Stem-and-Leaf
- Enter the variable xbarmathn=10
- Summarize the distribution of sample mean for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large
samples (size 40) from the population of values
for the variable "math" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathlargesample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathlargesample1
- Create a column called "xbarmathn=40" and type in the sample mean
Math SAT score
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathlargesample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Mathlargesample2
- Type the second sample mean as the second entry in the column
"xbarmathn=40"
- Repeat the process above 20 times altogether, finishing with
"Mathlargesample20", for which the sample mean Math SAT score will be
the 20th entry in "xbarmathn=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarmathn=40
- Graph>Stem-and-Leaf
- Enter the variable xbarmathn=40
- Summarize the distribution of sample mean for samples of size
40 by reporting center, spread, and shape.
- Lastly, and most importantly,
compare the centers, spreads, and shapes for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables xbarmathn=10 and
xbarmathn=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables xbarmathn=10 and
xbarmathn=40
- Check Graph>Boxplots of Data
Are your results consistent with the theory presented in Chapter 9? Write
a paragraph to explain your answer.
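For comparison with your MINITAB results, the theory says the sample mean has standard deviation sigma divided by the square root of the sample size. The sketch below computes that for both sample sizes and mimics the repeated sampling; the stand-in population is randomly generated from a normal curve with the stated mu and sigma (an assumption, since the real "math" column is not reproduced here), and the function names are illustrative.

```python
import random
import statistics
from math import sqrt

MU, SIGMA = 610.44, 72.14  # population values stated in the exercise

def standard_error(sigma, n):
    """Standard deviation of sample mean for samples of size n."""
    return sigma / sqrt(n)

def simulate_sample_means(population, sample_size, repetitions=20, seed=None):
    """Draw repeated random samples (without replacement) and record each mean."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]

# Hypothetical stand-in population of 400 roughly normal Math SAT scores.
generator = random.Random(0)
stand_in_population = [generator.gauss(MU, SIGMA) for _ in range(400)]

xbar_n10 = simulate_sample_means(stand_in_population, 10, seed=1)
xbar_n40 = simulate_sample_means(stand_in_population, 40, seed=1)
print(f"theory, n=10: sd = {standard_error(SIGMA, 10):.2f}; observed = {statistics.stdev(xbar_n10):.2f}")
print(f"theory, n=40: sd = {standard_error(SIGMA, 40):.2f}; observed = {statistics.stdev(xbar_n40):.2f}")
```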
Extra Credit 5
Prerequisite: Extra Credit 4. Due in lecture March 15. Worth 5 pts.
The purpose of this exercise is to explore how population shape affects the
distribution of sample mean.
- First verify that our population of quantitative values for the
variable "Earned" has mean mu=3.776 thousand
and standard deviation sigma=6.503, and
that the shape is quite skewed to the right:
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earned
- Graph>Histogram
- Variables Earned
- Now take repeated small samples (size 10) from the population of
quantitative values for the variable "earned".
Our theory about the behavior of sampling distributions is for an infinite
number of repetitions, but for practical purposes you will take 20
random samples altogether.
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Earned
- Store samples in Earnedsmallsample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedsmallsample1
- Create a column called "xbarearnedn=10" and type in the sample mean
amount earned
- Calc>Random Data>Sample from Columns
- Sample 10 rows from Column Earned
- Store samples in Earnedsmallsample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedsmallsample2
- Type the second sample mean as the second entry in the column
"xbarearnedn=10"
- Repeat the process above 20 times altogether, finishing with
"Earnedsmallsample20", for which the sample mean amount earned will be
the 20th entry in "xbarearnedn=10"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarearnedn=10
- Graph>Stem-and-Leaf
- Enter the variable xbarearnedn=10
- Summarize the distribution of sample mean for samples of size
10 by reporting center, spread, and shape.
- Now take repeated large
samples (size 40) from the population of values
for the variable "earned" (20 samples altogether):
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Earned
- Store samples in Earnedlargesample1
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedlargesample1
- Create a column called "xbarearnedn=40" and type in the sample mean
amount earned
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Earned
- Store samples in Earnedlargesample2
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Earnedlargesample2
- Type the second sample mean as the second entry in the column
"xbarearnedn=40"
- Repeat the process above 20 times altogether, finishing with
"Earnedlargesample20", for which the sample mean amount earned will be
the 20th entry in "xbarearnedn=40"
- Finally, obtain summaries and display:
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variable xbarearnedn=40
- Graph>Stem-and-Leaf
- Enter the variable xbarearnedn=40
- Summarize the distribution of sample mean for samples of size
40 by reporting center, spread, and shape.
- Next
compare the centers, spreads, and shapes for samples of size 10 vs. 40.
- Stat>Basic Statistics>Display Descriptive Statistics
- Enter the variables xbarearnedn=10 and
xbarearnedn=40
- Stat>Basic Statistics>2-sample t
- Activate "Samples in Different Columns"
- Enter the variables xbarearnedn=10 and
xbarearnedn=40
- Check Graph>Boxplots of Data
Are your results consistent with the theory presented in Chapter 9? Write
a paragraph to explain your answer.
- Lastly, and most importantly, compare shapes of the distributions
of sample mean for samples of size 10 coming from "Math" vs. from "Earned"
and for samples of size 40 coming from "Math" vs. from "Earned". For
which population do the distributions of sample mean for a given sample
size tend to be more normal, for the variable "Math" or for the variable
"Earned"?
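The effect of a skewed population on the shape of the sample mean's distribution is easier to see with many more repetitions than the 20 used above. The sketch below is an illustration under loud assumptions: it draws from an exponential distribution with mean 3.776 as a right-skewed stand-in (this does not match the real "Earned" column, whose standard deviation is 6.503), it uses 5000 repetitions, and the skewness measure and function names are chosen for illustration only.

```python
import random
import statistics

def skewness(values):
    """Average cubed standardized deviation; positive means right-skewed."""
    center = statistics.mean(values)
    spread = statistics.pstdev(values)
    return sum(((v - center) / spread) ** 3 for v in values) / len(values)

def simulate_skewed_sample_means(sample_size, repetitions, seed):
    """Sample means from a right-skewed (exponential) stand-in population."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        draws = [rng.expovariate(1 / 3.776) for _ in range(sample_size)]
        means.append(sum(draws) / sample_size)
    return means

means_n10 = simulate_skewed_sample_means(10, repetitions=5000, seed=1)
means_n40 = simulate_skewed_sample_means(40, repetitions=5000, seed=2)
print(f"skewness of sample means, n=10: {skewness(means_n10):.2f}")
print(f"skewness of sample means, n=40: {skewness(means_n40):.2f}")
```

Sample means from a skewed population keep some right skew at small sample sizes and move closer to a normal shape as the sample size grows, whereas sample means from an already normal population look normal at any sample size.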
Homework 7
Due in lecture March 5. Points shown total 10.
[1 pt.] | 10.1(a)(b) | (page 349)
[.5 pt.] | 10.11(e) | (page 350)
[1.5 pts.] | 10.13 | (page 350)
[.5 pt.] | 10.14 | (page 350)
[.5 pt.] | 10.19(a) | (page 350)
[.5 pt.] | 10.24 | (page 351)
[1 pt.] | 10.27(a)(c) | (page 351)
[1 pt.] | 10.28 | (page 351)
[.5 pt.] | 10.31 | (page 351)
[.5 pt.] | 10.33 | (page 351)
[.5 pt.] | 10.42 | (page 352)
[2 pts.] Exercise: Here is an excerpt from a Pittsburgh Post-Gazette article
entitled "Criminal pasts cited for many city school bus drivers":
"State auditors checking the records of a random sample of 100 city bus drivers
have found that more than a quarter of them had criminal histories.
The audit also found that 26 of the drivers were never checked for child abuse
histories---in Pennsylvania schools, a mandate for all employees and even some
volunteers.
In all, the auditors discovered 80 convictions for various offenses among the
100 sampled. Thirty-four of those incidents occurred more than ten years ago,
including one rape and four drug offenses.
In Pennsylvania, it's perfectly legal for school officials to hire a bus driver
with certain convictions that are more than five years old---but that doesn't
mean they should, state Auditor General Robert P. Casey Jr. said yesterday
in releasing the report.
'No one convicted of rape should be driving a school bus full of children,'
said Casey, who also said he was disappointed with the school district's
initial response to the audit. 'The General Assembly needs to look at this
law,' he said.
A series of problems last year with school bus drivers---including a February
accident that was nearly fatal to an 8-year-old Elliott girl---prompted Casey
to take a closer look at Pittsburgh's staff of 750 drivers, he said.
When his office presented their results to school officials about eight months
ago, Casey said, 'they were very reluctant to do anything about it,' and sent
him only a brief response outlining what steps were being taken to remedy the
problems..."
Note that the article states that about 25% in a sample of Pittsburgh school
bus drivers had criminal records. Report a 98% confidence interval for the
proportion of all Pittsburgh school bus drivers with criminal records. One
of the conditions for our approximation is not quite met; what is it?
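The arithmetic for an interval at a confidence level other than 95%, together with a check of commonly used conditions, can be sketched as below. The thresholds in `check_conditions` are common rules of thumb and are an assumption here; the course text may state its conditions differently. The function names are illustrative, and the inputs (25% of 100 sampled, 750 drivers in all) come from the excerpt.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Approximate confidence interval for a population proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def check_conditions(p_hat, n, population_size):
    """Rule-of-thumb checks (assumed thresholds) for the normal approximation."""
    return {
        "at least 10 in the category": n * p_hat >= 10,
        "at least 10 outside the category": n * (1 - p_hat) >= 10,
        "population at least 10 times the sample": population_size >= 10 * n,
    }

low, high = proportion_interval(p_hat=0.25, n=100, confidence=0.98)
print(f"98% confidence interval: ({low:.3f}, {high:.3f})")
for condition, met in check_conditions(0.25, 100, 750).items():
    print(f"{condition}: {'met' if met else 'not met'}")
```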
Homework 8
Due in lecture March 19. Points shown total 14.5.
[1 pt.] | 11.1(c)(d) | (page 382)
[.5 pt.] | 11.6 | (page 383)
[1 pt.] | 11.10(a)(b) | (page 383)
[1.5 pts.] | 11.19(a)(b)(c) | (page 384)
[3 pts.] | 11.27 | (page 384)
[.5 pt.] | 11.29 | (page 385)
[.5 pt.] | 11.36 | (page 385)
[.5 pt.] | 11.45 | (page 386)
[2 pts.] | 11.55 | (page 387)
[2 pts.] Exercise: In a previous Exercise, we explored the sampling
distribution
of sample proportion of females, when random samples are taken from a
population where the proportion of females is .5. We noted the sample
proportion of females among surveyed Stats students, and calculated by hand
the probability of observing such a high sample proportion, if population
proportion were really only .5. We used this probability to decide whether
we were willing to believe that population proportion is in fact .5.
For this Exercise, address the same question by carrying out a formal
hypothesis test using MINITAB. Be sure to specify the appropriate
alternative hypothesis. State your conclusions clearly in context.
[2 pts.] |
Exercise |
|
Refer to the article "How not to catch a spy: Use a lie detector",
which reports at the bottom of the first column,
"Even if the test were designed to catch eight of every 10 spies, it would
produce false results for large numbers of people. For every 10,000 employees
screened, Fienberg said, eight real spies would be singled out, but 1,598
innocent people would be singled out with them, with no hint of who's a spy
and who isn't." Based on this information, set up a two-way table,
classifying 10,000 employees as actually being spies or not, and being singled
out as a spy by the lie detector or not.
Report the probability of a Type I Error and of a Type II Error.
If someone is identified by the lie detector as being a spy, what is the
probability that he or she is actually a spy?
|
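One way to organize the counts from the quoted passage is sketched below in
Python. It assumes that the null hypothesis is "this employee is not a spy",
so a Type I Error is singling out an innocent person and a Type II Error is
failing to single out a real spy; the variable names are illustrative.

```python
# Counts quoted in the article
total = 10_000
spies_flagged = 8            # real spies singled out
innocent_flagged = 1_598     # innocent people singled out
catch_rate = 8 / 10          # "catch eight of every 10 spies"

spies = round(spies_flagged / catch_rate)     # all real spies among the 10,000
innocent = total - spies
spies_missed = spies - spies_flagged
innocent_cleared = innocent - innocent_flagged

# Assumption: Ho is "this employee is not a spy"
type_1 = innocent_flagged / innocent          # innocent person singled out
type_2 = spies_missed / spies                 # real spy not singled out
p_spy_given_flagged = spies_flagged / (spies_flagged + innocent_flagged)

print(f"{'':10}{'flagged':>10}{'cleared':>10}")
print(f"{'spy':10}{spies_flagged:>10}{spies_missed:>10}")
print(f"{'not spy':10}{innocent_flagged:>10}{innocent_cleared:>10}")
print(f"Type I: {type_1:.3f}  Type II: {type_2:.3f}  "
      f"P(spy | flagged): {p_spy_given_flagged:.4f}")
```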
Extra Credit 6
Due in lecture March 29. Worth 5 pts.
Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data
survey9-21-03.txt, which is taken to be our population. To download it into
MINITAB, type ctrl A to highlight, ctrl C to copy, start up MINITAB, then type
ctrl V to paste it. If it asks about delimiters, click OK.
The purpose of this exercise is to understand the long-run behavior of
confidence intervals.
- First verify that the population proportion p of all Fall 2003 survey
respondents living off campus is almost exactly .5:
- Stat>Tables>Tally
- Variables Live, Check "Counts and Percents".
- Next, take repeated samples (20 altogether) of size 40 from the population of
categorical values for the variable "live", obtaining a 90% confidence interval
each time for the "unknown" population proportion, based on each sample
proportion.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livesample1
- Stat>Basic Statistics>1 proportion
- Samples in columns Livesample1
- Options>Confidence Level>90 and Check "Use test and interval based
on normal distribution"
- The first confidence interval is shown in the session window; you will
need to examine all 20 intervals together once they've been produced.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Live
- Store samples in Livesample2
- Stat>Basic Statistics>1 proportion
- Samples in columns Livesample2
- [No need to specify 90% confidence and normal distribution, since these
continue to be enabled by default.]
- Repeat the process above 20 times altogether, finishing with
"Livesample20".
- Now examine all 20 confidence intervals. How many of them contain
the actual population proportion .5? In the long run, what percentage of
the 90% confidence intervals should contain p? [Note: keep all the
results in your session window handy if you intend to do Extra Credit 7,
which will focus on p-values rather than on confidence intervals.]
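The repeated-sampling steps above can also be imitated outside MINITAB. The
Python sketch below is a rough stand-in, not a replacement for the session
window output: the population of 400 values (half "off", half "on") is made
up rather than read from the survey file, and sampling without replacement
is an assumption about how the 40 rows are drawn.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is repeatable

# Stand-in population with p = .5 exactly; not the real survey data
population = ["off"] * 200 + ["on"] * 200
p_true = 0.5
z_star = NormalDist().inv_cdf(0.95)  # 90% confidence leaves 5% in each tail

covered = 0
for _ in range(20):
    sample = random.sample(population, 40)   # 40 rows, without replacement
    p_hat = sample.count("off") / 40
    margin = z_star * sqrt(p_hat * (1 - p_hat) / 40)
    if p_hat - margin <= p_true <= p_hat + margin:
        covered += 1

print(f"{covered} of 20 intervals contain p = {p_true}")
```

Changing the seed gives a different set of 20 intervals, which is the point
of the exercise: the count that contain p varies from run to run.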
Extra Credit 7
The purpose of this exercise is to understand the long-run behavior of
hypothesis tests.
- First recall that the population proportion p of all Fall 2003 survey
respondents living off campus is almost exactly .5.
- Next examine all 20 p-values obtained in Extra Credit 6. [These were
produced for the two-sided test about the null hypothesis that the population
proportion p living off campus is .5.] How many of them lead to rejecting the
null hypothesis at the 10% level? In the long run, what percentage of
the tests should reject against the two-sided alternative at the 10% level,
when the null hypothesis is in fact true?
Homework 9
Due in lecture March 26. Points shown total 22.5.
[1 pt.] |
12.2(b)(c) |
(page 425) |
[1 pt.] |
12.10(a)(b) |
(page 426) |
[1 pt.] |
12.13(b)(c) |
(page 426) |
[.5 pt.] |
12.26(e) |
(page 427) |
[1.5 pts.] |
12.28(a)(b)(c) |
(page 427) |
[1 pt.] |
12.32 |
(page 428) |
[.5 pt.] |
12.36(c) |
(page 428) |
[3.5 pts.] |
12.42 |
(page 429) |
Use MINITAB; mark and hand in relevant output along with specific
answers to textbook questions. |
[.5 pt.] |
12.69(c) |
(page 432) |
[2 pts.] |
Exercise |
|
In a previous Exercise, we explored the sampling distribution
of sample mean number selected, when random samples are taken from a
population where all numbers between 1 and 20 are equally likely, so
population mean is 10.5. We noted the sample mean selection by surveyed Stats
students, and calculated by hand
the probability of observing such a high sample mean, if population
mean were really only 10.5. We used this probability to decide whether
we were willing to believe that population mean was in fact 10.5, or if
students were rather biased towards higher numbers.
For this Exercise, address the same question by using MINITAB to set up a
confidence interval for unknown population mean selection, given that
population standard deviation is 5.77. Does your interval contain 10.5?
What do you conclude?
|
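The stated values 10.5 and 5.77 follow from the population of whole numbers
1 through 20, and the z interval is x-bar +/- z* sigma/sqrt(n). A minimal
Python sketch is given here as a check on MINITAB; the 95% level, the sample
mean 11.6, and n = 400 are placeholders, not the survey's values.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

# Population: each whole number from 1 to 20 equally likely
values = list(range(1, 21))
mu = mean(values)        # population mean
sigma = pstdev(values)   # population standard deviation

def z_interval(x_bar, n, sigma, confidence=0.95):
    """Confidence interval for a mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# x_bar and n are made up, for illustration only
low, high = z_interval(x_bar=11.6, n=400, sigma=sigma)
print(f"mu = {mu}, sigma = {sigma:.2f}, interval = ({low:.2f}, {high:.2f})")
```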
[2 pts.] |
Exercise |
|
For this Exercise, address the same question again by using MINITAB to set up a
confidence interval for unknown population mean selection, but this time
assume population standard deviation is unknown. Does your interval contain
10.5? What do you conclude?
|
[2 pts.] |
Exercise |
|
Find paired data in our survey, such as math and verbal
SATs, ages of mothers and fathers, heights of females and their mothers,
or heights of males and their fathers. Use MINITAB to test Ho: mu(d)=0
against an appropriate Ha. State your conclusion in terms of the
variable chosen.
|
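The paired t statistic is the mean of the differences divided by its
standard error, with n - 1 degrees of freedom. The Python sketch below works
through that arithmetic on five made-up pairs of SAT scores; the function
name and data are illustrative only, and the p-value would still come from
MINITAB or the t table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """t statistic and df for Ho: mu(d) = 0, where d = first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Made-up Math and Verbal SAT scores for five students
math_sat = [620, 580, 700, 640, 600]
verbal_sat = [600, 590, 650, 640, 570]
t, df = paired_t(math_sat, verbal_sat)
print(f"t = {t:.2f} with df = {df}")
```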
[2 pts.] |
Exercise |
|
Compare values of a quantitative survey variable for
two categorical groups, such as males and females or on and off campus
students, by testing Ho: mu1-mu2=0 against an appropriate Ha.
State your conclusion in terms of the variable chosen.
|
[2 pts.] |
Exercise |
|
Read the article "The most important meal", which
reports that in a study of American eighth-graders in 96 public schools in
San Diego, New Orleans, Minneapolis, and Austin, overweight students were
more likely to skip breakfast than students who were not overweight.
Unstack the data in our class survey according to gender, then for each
gender group test the null hypothesis of equal weights for students who
did and did not eat breakfast, according to their survey responses. Make
sure to formulate the correct alternative hypothesis.
|
[2 pts.] |
Exercise |
|
Read "Science lifts 'mummy's curse'" and use the
means for Age at death, exposed vs. unexposed, along with the sample
sizes n and standard deviations (in parentheses) to test for a significant
difference in age at death between those who were and were not exposed to
the "mummy's curse". State your conclusions clearly.
|
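When only means, standard deviations, and sample sizes are available, the
two-sample t statistic can be computed directly from those summaries. The
Python sketch below uses the unpooled standard error and the conservative
df = min(n1, n2) - 1, which is one common choice and an assumption here. The
numbers are placeholders; substitute the values printed in the article.

```python
from math import sqrt

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic with conservative degrees of freedom."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    t = (mean1 - mean2) / se
    return t, min(n1, n2) - 1

# Placeholder summaries, not the article's values
t, df = two_sample_t(mean1=68.0, sd1=10.0, n1=30,
                     mean2=72.0, sd2=12.0, n2=20)
print(f"t = {t:.2f} with df = {df}")
```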
Homework 10
Due in lecture April 9. Points shown total 23.
[1 pt.] |
13.2(c)(d) |
(page 482) |
[2 pts.] |
13.5 |
(page 482) |
Instead of n=28 or n=81, use n=65 for (a)(b)(c)(d) |
[3.5 pts.] |
13.7 |
(page 482) |
[2.5 pts.] |
13.11 |
(page 482) |
[1.5 pts.] |
13.14 |
(page 483) |
[1 pt.] |
13.16(d) and add (e) |
(page 483) |
(e) If you were using Table A.2, keeping in mind Ha, the value of t,
and the df, what range would you report for the p-value? |
[.5 pt.] |
13.33(b) |
(page 485) |
[1.5 pts.] |
13.70 |
(page 489) |
MINITAB is optional |
[1.5 pts.] |
14.2 |
(page 518) |
[1 pt.] |
14.11(b)(c) |
(page 519) |
[1.5 pts.] |
14.13(a)(b)(c) |
(page 519) |
[1 pt.] |
14.17(c)(d) |
(page 520) |
[.5 pt.] |
14.33 |
(page 522) |
[2 pts.] |
Exercise |
|
Find two quantitative variables from our survey,
summarize their relationship as in Chapter 5 (see Exercise end of HW2), and
then test Ho: beta1=0.
State your conclusions in terms of the variables of interest.
|
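The test of Ho: beta1=0 compares the least-squares slope b1 to its standard
error, with n - 2 degrees of freedom. The Python sketch below shows the
arithmetic behind the t value in MINITAB's regression output; the function
name and the six height/weight pairs are made up for illustration.

```python
from math import sqrt
from statistics import mean

def slope_t(x, y):
    """Least-squares slope b1, its t statistic for Ho: beta1 = 0, and df."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))      # standard deviation of the residuals
    se_b1 = s / sqrt(sxx)        # standard error of the slope
    return b1, b1 / se_b1, n - 2

# Made-up heights (inches) and weights (pounds)
heights = [62, 64, 66, 68, 70, 72]
weights = [115, 130, 128, 150, 160, 172]
b1, t, df = slope_t(heights, weights)
print(f"b1 = {b1:.2f}, t = {t:.2f}, df = {df}")
```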
[2 pts.] |
Exercise |
|
Compare values of a quantitative survey variable for
more than two categorical groups by carrying out an ANOVA test in MINITAB.
State your conclusions in terms of the particular variables chosen.
|
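The F statistic in MINITAB's ANOVA table is the ratio of the mean square
between groups to the mean square within groups. A minimal Python sketch of
that calculation follows; the function name and the three small groups of
values are made up for illustration.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic with (k - 1, N - k) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the overall mean
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f = (ss_groups / (k - 1)) / (ss_error / (n_total - k))
    return f, k - 1, n_total - k

# Made-up hours of sleep for students in three class years
groups = [[6, 7, 8], [7, 8, 9], [5, 6, 7]]
f, df_groups, df_error = one_way_anova_f(groups)
print(f"F = {f:.2f} with df = ({df_groups}, {df_error})")
```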
Extra Credit 8
Due in lecture April 12. Worth 5 pts.
Extra Credit Exercises 2 through 10 are based on Fall 2003 Survey data
survey9-21-03.txt, which is taken to be our population. To download it into
MINITAB, type ctrl A to highlight, ctrl C to copy, start up MINITAB, then type
ctrl V to paste it. If it asks about delimiters, click OK.
The purpose of this exercise is to understand the long-run behavior of
confidence intervals.
- First verify that the population mean mu of all Fall 2003 survey
respondents' Math SAT scores is 610.44, and the standard deviation sigma
is 72.14.
- Stat>Basic Statistics>Display Descriptive Statistics
- Variables Math
- Next, take repeated samples (20 altogether) of size 40 from the population of
quantitative values for the variable "math", obtaining a 90% confidence
interval each time for the "unknown" population mean, based on each sample
mean.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathsample1
- Stat>Basic Statistics>1-sample z
- Variables Mathsample1; Test Mean 610.44 [This will
be needed for Extra Credit 9] and enter Sigma as 72.14
- Options>Confidence Level>90
- The first confidence interval is shown in the session window; you will
need to examine all 20 intervals together once they've been produced.
- Calc>Random Data>Sample from Columns
- Sample 40 rows from Column Math
- Store samples in Mathsample2
- Stat>Basic Statistics>1-sample z
- Variables Mathsample2; Continue to Test Mean 610.44, given sigma 72.14,
and opt for Confidence Level 90.
- Repeat the process above 20 times altogether, finishing with
"Mathsample20".
- Now examine all 20 confidence intervals. How many of them contain
the actual population mean 610.44? In the long run, what percentage of
the 90% confidence intervals should contain mu?
[Note: keep all the results in your session window handy if you intend to do
Extra Credit 9, which will focus on p-values rather than on confidence intervals.]
Extra Credit 9
The purpose of this exercise is to understand the long-run behavior of
hypothesis tests.
- First recall that the population mean Math SAT score mu of all Fall 2003
survey respondents was 610.44.
- Next examine all 20 p-values obtained in Extra Credit 8. [These were
produced for the two-sided z test about the null hypothesis that the population
mean mu was 610.44.] How many of them lead to rejecting the null
hypothesis at the 10% level? In the long run, what percentage of
the tests should reject against the two-sided alternative at the 10% level,
when the null hypothesis is in fact true?
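The behavior of the 20 p-values can be imitated with a short simulation. The
Python sketch below is an illustration under stated assumptions: the
population of 400 scores is generated at random to resemble the stated mean
and standard deviation rather than read from the survey file, and the null
value is set to that stand-in population's own mean so that Ho is true.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pstdev

random.seed(2)  # fixed seed so the run is repeatable

# Stand-in population; not the real Math SAT scores
population = [random.gauss(610.44, 72.14) for _ in range(400)]
mu0 = mean(population)       # null value equals the true population mean
sigma = pstdev(population)   # population standard deviation, treated as known

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for the one-sample z test of Ho: mu = mu0."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

p_values = [z_test_p_value(random.sample(population, 40), mu0, sigma)
            for _ in range(20)]
rejections = sum(p < 0.10 for p in p_values)
print(f"{rejections} of 20 tests reject Ho at the 10% level")
```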
Extra Credit 10
Due in lecture April 12. Worth 5 pts.
The purpose of this exercise is to learn about chi-square goodness of fit
tests.
- Read Section 15.3 of your textbook, pages 544 to 547.
- Find the counts of surveyed fall stats students in years 1, 2, 3, 4
by accessing the survey results:
survey9-21-03.txt. Carry out a chi-square
goodness of fit test by hand to determine if the population of stats students
may be evenly divided among years 1, 2, 3, 4, or if the proportions in the
various years differ significantly. ["Other" students should be excluded
from your calculations.]
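A by-hand goodness of fit calculation can be checked with a few lines of
Python. In the sketch below the four counts are made up, the function name is
illustrative, and 7.81 is the standard .05 critical value for a chi-square
distribution with 3 degrees of freedom.

```python
def chi_square_gof(observed):
    """Chi-square statistic and df for Ho: all categories equally likely."""
    expected = sum(observed) / len(observed)   # same expected count in each year
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, len(observed) - 1

# Made-up counts for years 1, 2, 3, 4 ("Other" already excluded)
counts = [30, 50, 40, 20]
stat, df = chi_square_gof(counts)

# Compare with the .05 critical value for df = 3
reject = stat > 7.81
print(f"chi-square = {stat:.2f}, df = {df}, reject Ho at 5%: {reject}")
```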
Homework 11
Due in lecture Monday, April 12. Points shown total 19.
[1 pt.] |
15.1(c)(d) |
(page 550) |
[1 pt.] |
15.2(c)(d) |
(page 550) |
Use Table A.5. |
[1 pt.] |
15.14(a)(b) |
(page 552) |
[2 pts.] |
15.41(a)(b)(d) |
(page 556) |
[1 pt.] |
16.1(b)(c) |
(page 584) |
[1.5 pts.] |
16.3 |
(page 584) |
[1.5 pts.] |
16.6(b)(c)(d) |
(page 585) |
Your conclusion should state whether or not population means could
be equal. |
[2 pts.] |
16.7 |
(page 585) |
[5 pts.] |
16.9(a)(b) and add on (c)(d) |
(page 585) |
(c) State Ho and Ha. (d) Use MINITAB to carry out a test, then state
your conclusions. |
[1 pt.] |
16.11 |
(page 586) |
[2 pts.] |
Exercise |
|
Pick two categorical variables from our survey. Decide
which should be explanatory (row variable) and which response. Use MINITAB
to compare conditional percentages in each row [explanatory variable must
be entered before response] and carry out a chi-square test for a
relationship. Use Table A.5 to give a range for the P-value.
|
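The conditional row percentages and the chi-square statistic that MINITAB
prints for a two-way table can be reproduced as follows. This Python sketch
uses a made-up 2-by-2 table of counts, and the function names are
illustrative; expected counts are (row total)(column total)/(overall total).

```python
def row_percents(table):
    """Conditional percentages within each row (explanatory category)."""
    return [[100 * count / sum(row) for count in row] for row in table]

def chi_square_independence(table):
    """Chi-square statistic and df for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Made-up counts: rows = gender (explanatory), columns = on/off campus (response)
table = [[30, 20],
         [20, 30]]
percents = row_percents(table)
stat, df = chi_square_independence(table)
print(percents)
print(f"chi-square = {stat:.2f} with df = {df}")
```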
|