Location: Anthony Tanbakuchi / Courses / MAT167 / MAT167 Introduction to Statistics

MAT167 Introduction to Statistics

An introduction to statistics. Includes sampling, data display, measures of central tendency, variability, and position; random variables, probability, probability distributions; sampling distributions, assessing normality, confidence intervals, hypothesis testing, ANOVA, and regression. Use of the statistics software R is taught throughout the course.

Announcements

Final grades have been posted. The solutions to the final exam are now on the website. Have a good summer and thanks for all your hard work this semester. –Anthony

Course Info

• Spring 2009: section 22684, 3 credit hours.
• 9:10 am - 10:25 am Tuesday & Thursday, Santa Rita A102, Jan 20 through May 19 2009, West Campus, Pima Community College.
• Syllabus

Exam Dates

• Feb 24: MIDTERM I
• April 16: MIDTERM II
• May 19: Final Exam Ch 1-12 (2 hours)

Resources

R Statistics Software

Using R on Campus:
R is installed on a few computers in the Academic Commons Computer Lab (2nd floor of the Santa Catalina building). Just go to the computer lab and ask one of the lab managers, Jody or Dennis, where those computers are. Printers are also available in this lab.
To install R:
Visit the R Resources page.
Basic R usage and examples:
R Basics page
R Data Sets:
• Triola Book Data Page
• Class Survey Data Page
• See the quick reference sheet or the R intro lecture for information on how to use the data sets.
Technical Notes on R:
When you start a new problem, it’s best to delete all your variables so you don’t accidentally use old data. Just type `rm(list=ls())`. Note that this also deletes the book data, so you will need to reload it if you need it.
When you close R, you do not need to save the workspace if it asks. Saving the workspace just saves the variables you have defined.

Lectures and Homework

All homework is due at the beginning of class on Tuesday. Thus, homework assigned on Tuesday and Thursday is due at the beginning of class on the following Tuesday.

1. Tue, Jan 20

FOUNDATIONS

Introductory Material. (Sections 1.1-1.4)

2. Thur, Jan 22

Introduction to R.

3. Tue, Jan 27

DESCRIPTIVE STATISTICS

Summarizing & graphing data. (Sections 2.1-2.4)

• Lecture: Handout

• Homework:

• From this point forward if the book has the TI symbol next to a problem, use R to do it.
• As always, make sure to include your plots made with R in the HW.
• If you get stuck using R, take a look at these R examples.
1. Sec 2.2: 1-17 odds (do 17 by hand)

2. Sec 2.3: 1, 3.

3. Additional Problem for 2.3: Use R to make two histograms, one of the male heights and one of the female heights in Appendix B Data Set 1 (the `Mhealth` and `Fhealth` tables in R). Include both histograms in your HW. Then write a paragraph discussing the differences between the male and female heights that you can see from the histograms (i.e., center, variation, shape, outliers, min, max). Does either histogram have a distribution that is approximately normal?

HINTS: If you are having trouble getting the book data, see the R intro lecture or look at the back of the quick reference sheet. See the top part of this page to download the data sets.

4. Sec 2.4: 1-4, 9 (use R), 13 (just sketch by hand), 17 (use R), 19 (use R)

Hint for 19: to make a plot with both lines and points rather than a scatter plot, use the optional argument `type="b"` for the `plot` function, e.g. `plot(t, y, type="b")`. The time vector `t` goes from 1990 to 2000; a quick way to make it is the shortcut `t=1990:2000`.
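A minimal sketch of the plots for problem 3 and problem 19. Simulated numbers stand in for the real data so the code runs by itself. With the book data loaded you would use the height columns of `Mhealth` and `Fhealth` instead (the column name `HT` in the comment is an assumption; check it with `names(Mhealth)`), and the values from problem 19 for `y`. Using the same `breaks` for both histograms puts them on a common scale, which makes center and spread easier to compare.

```r
# Placeholder data (simulated, NOT the book's values) so the sketch runs alone.
# With the book data loaded, use something like Mhealth$HT and Fhealth$HT
# instead (HT is an assumed column name; check with names(Mhealth)).
set.seed(1)
male   <- rnorm(40, mean = 69, sd = 3)
female <- rnorm(40, mean = 64, sd = 3)

# Same bin edges for both groups so the two histograms are comparable
brks <- pretty(range(c(male, female)), n = 12)

par(mfrow = c(2, 1))                     # stack the two plots vertically
h.m <- hist(male,   breaks = brks, main = "Male heights",   xlab = "Height")
h.f <- hist(female, breaks = brks, main = "Female heights", xlab = "Height")
par(mfrow = c(1, 1))                     # back to one plot per page

# Line-and-point plot for problem 19: t has 11 years, so y needs 11 values
t <- 1990:2000
y <- seq(10, 20, by = 1)                 # placeholder values, not the book's
plot(t, y, type = "b", xlab = "Year", ylab = "y")
```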

4. Thur, Jan 29

Summation Notation.

• Lecture: Handout

• Homework: (If you need more explanation and practice with summation notation, see this page.)
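Summation notation translates directly into R’s `sum` function. A small sketch with made-up values, showing that the sum of the squares, the square of the sum, and the sum of squared deviations are three different quantities:

```r
x <- c(1, 2, 3, 4)          # made-up values

sum(x)                      # sum of x:              1 + 2 + 3 + 4 = 10
sum(x^2)                    # sum of x squared:      1 + 4 + 9 + 16 = 30
sum(x)^2                    # square of the sum:     10^2 = 100
sum((x - mean(x))^2)        # sum of squared deviations from the mean 2.5:
                            # 2.25 + 0.25 + 0.25 + 2.25 = 5
```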

5. Tue, Feb 3

Measures of center. (Sections 3.1-3.2)

• Lecture: Handout

• Homework:

1. Sec 3.2: 1-9 odds, 13, 15, 21, 23, 25, 29 (From now on, use R if the problem has the TI symbol next to it.)

Hint for 21: to get the first set of differences, use:

`x=WEATHER$HIGH-WEATHER$PREDICTE`

You can find the second set of differences the same way once you figure out the correct column name. (I admit the author’s column names are not that good.)

Hint for 23: to get the weights of the pre-1983 pennies, use:

`x=Coins$WEIGHT[Coins$TYPE=="Pre-1983 Pennies"]`

To get the post-1983 pennies, use the same method, but look at the `Coins` table to see what they are called and modify the statement above.

Hint for 29b: use `mean(x, trim=0.10)`, which trims 10% of the values from each end before averaging.
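The hints above use two R ideas: selecting values with a logical condition, and the `trim` argument of `mean`. The sketch below uses a small made-up data frame that only imitates the `WEIGHT` and `TYPE` columns of the `Coins` table; the weights and the label `"Other"` are invented for illustration and are not the book’s data.

```r
# Made-up stand-in for the Coins table (values are NOT the book's data)
coins <- data.frame(
  WEIGHT = c(3.10, 3.08, 3.12, 2.50, 2.49, 2.52),
  TYPE   = c(rep("Pre-1983 Pennies", 3), rep("Other", 3))   # "Other" is hypothetical
)
unique(coins$TYPE)          # list the category names before subsetting

# Keep only the weights whose TYPE matches the label exactly
x <- coins$WEIGHT[coins$TYPE == "Pre-1983 Pennies"]
x                           # 3.10 3.08 3.12

# Trimmed mean: trim = 0.10 drops 10% of the values from EACH end
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
mean(v)                     # 14.5, pulled up by the outlier 100
mean(v, trim = 0.10)        # drops 1 and 100, then averages 2..9: 5.5
median(v)                   # 5.5
```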

6. Thur, Feb 5

Measures of variation. (Section 3.3)

• Lecture: Handout

• Homework:

1. Sec 3.3: 1-9 odds, 15, 21
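A quick way to check hand calculations for this section: compute the sample variance and standard deviation from the definitions (dividing by n - 1) and compare them with R’s `var` and `sd`. The values are made up.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)      # made-up values, mean = 5
n <- length(x)

# Sample variance and standard deviation (divide by n - 1)
s2 <- sum((x - mean(x))^2) / (n - 1)
s  <- sqrt(s2)

s2               # 32/7, about 4.571
var(x)           # same value
s                # about 2.138
sd(x)            # same value
diff(range(x))   # range = max - min = 9 - 2 = 7
```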
7. Tue, Feb 10

Relative standing and exploratory data analysis. (Sections 3.4-3.5)

• Lecture: Handout

• Homework

1. Sec 3.4: 1, 5, 7, 9, 11, 13-27 odds

2. Sec 3.5: 1, 3, 5, 9

3. Additional problem: Use the following code to make two boxplots comparing bear length and bear weight by sex. Then use the boxplots to discuss and compare the distributions of bear lengths and weights for each sex. (Make sure the book data is loaded into R first.)

```r
boxplot(Bears$LENGTH ~ Bears$SEX, main="Comparison of bear length")
boxplot(Bears$WEIGHT ~ Bears$SEX, main="Comparison of bear weight")
```
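A sketch of the Section 3.4-3.5 tools in R, using made-up values. R’s `quantile` has several methods, and its default may give quartiles slightly different from the book’s procedure, so small differences are expected. The second part shows the `data =` form of `boxplot`, which is equivalent to writing `Bears$WEIGHT ~ Bears$SEX`; the `bears` data frame here is a hypothetical stand-in, not the book’s table.

```r
x <- c(12, 15, 17, 20, 22, 25, 31, 40)   # made-up values

z <- (x - mean(x)) / sd(x)     # z score of every value
quantile(x)                    # min, Q1, median, Q3, max (R's default method)
fivenum(x)                     # Tukey's five-number summary (used by boxplot)
IQR(x)                         # Q3 - Q1, using quantile()'s default method

# The formula form with data = gives the same plot as Bears$WEIGHT ~ Bears$SEX.
# 'bears' is a made-up stand-in, not the book's Bears table.
bears <- data.frame(
  WEIGHT = c(80, 120, 150, 200, 90, 110, 130, 160),
  SEX    = factor(rep(c("A", "B"), each = 4))   # hypothetical group labels
)
b <- boxplot(WEIGHT ~ SEX, data = bears, main = "Comparison of bear weight")
b$stats                        # whisker ends, hinges, and median for each group
```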
8. Thur, Feb 12

Descriptive Statistics: Case Study.

PROBABILITY

Probability I: Addition rule. (Sections 4.1-4.3)

• Lecture: Handout

• Homework:

1. Sec 4.2: 1-25 odds, 29
2. Sec 4.3: 1-23 odds
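The addition rule can be checked by counting. This sketch builds a standard 52-card deck (an illustrative example, not a homework problem) and compares P(heart or face card) from the rule with a direct count:

```r
# Build a standard 52-card deck: 13 ranks x 4 suits
deck <- expand.grid(rank = c("A", 2:10, "J", "Q", "K"),
                    suit = c("hearts", "diamonds", "clubs", "spades"),
                    stringsAsFactors = FALSE)

heart <- deck$suit == "hearts"
face  <- deck$rank %in% c("J", "Q", "K")

p.heart <- mean(heart)               # 13/52
p.face  <- mean(face)                # 12/52
p.both  <- mean(heart & face)        # 3/52, counted in both events

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p.or <- p.heart + p.face - p.both    # 22/52, about 0.423
p.or
mean(heart | face)                   # direct count gives the same answer
```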
9. Tue, Feb 17

Probability II: Multiplication rule. (Sections 4.4-4.5)

• Lecture: Handout

• Homework:

1. Sec 4.4: 1-21 odds
2. Sec 4.5: 1-25 odds
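A sketch of the multiplication rule for dependent events, using the illustrative example of drawing two aces without replacement. The simulation is only a sanity check; its answer varies slightly with the random seed.

```r
# Exact: P(first ace AND second ace) = P(A) * P(B | A), without replacement
p.exact <- (4/52) * (3/51)        # about 0.00452

# Simulation check: draw 2 cards without replacement many times
set.seed(2009)
deck <- rep(c("ace", "other"), times = c(4, 48))
both.aces <- replicate(50000, all(sample(deck, 2) == "ace"))
p.sim <- mean(both.aces)          # should be close to p.exact

c(exact = p.exact, simulated = p.sim)
```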
10. Thur, Feb 19

Random variables (Sections 5.1-5.2)

• Lecture: Handout (Also print out the next lecture on counting; we may cover part of it if we have time.)

• Homework:

1. Sec 5.2: 1-19 odds
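For a discrete random variable, the mean is μ = Σ x·P(x) and the standard deviation is σ = √(Σ (x - μ)²·P(x)). A minimal sketch using the roll of one fair die as an illustrative distribution:

```r
# Probability distribution of X = number shown on one fair die
x <- 1:6
p <- rep(1/6, 6)
stopifnot(isTRUE(all.equal(sum(p), 1)))   # a valid distribution sums to 1

mu    <- sum(x * p)                   # mean: sum of x * P(x) = 3.5
sigma <- sqrt(sum((x - mu)^2 * p))    # standard deviation, about 1.708
mu
sigma
```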
11. Tue, Feb 24

MIDTERM I (Chapters 1-4)

12. Thur, Feb 26

Rodeo Holiday (No Classes)

13. Tue, Mar 3

Counting & Binomial distribution. (Sections 4.7, 5.3-5.4)

• Lecture: Handout A, Handout B

• Homework:

1. Sec 4.7: 1, 5, 7, 9, 13
2. Sec 5.3: 1, 3, every other odd 5-33, 35 (If the book says to use a table in the appendix, use `dbinom` in R instead.)
3. Sec 5.4: 1, 3, every other odd 5-17, 19
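The counting functions and binomial probabilities in R, sketched with small illustrative numbers (n = 5 trials, p = 0.5). `dbinom` gives P(X = x), and `pbinom` gives the cumulative probability P(X ≤ x):

```r
factorial(5)                   # 5! = 120
choose(5, 3)                   # combinations: 5C3 = 10
factorial(5) / factorial(2)    # permutations: 5P3 = 5!/(5-3)! = 60

# Binomial: n = 5 trials, p = 0.5 chance of success on each
dbinom(3, size = 5, prob = 0.5)       # P(X = 3) = 10 * 0.5^5 = 0.3125
pbinom(3, size = 5, prob = 0.5)       # P(X <= 3) = 26/32 = 0.8125
1 - pbinom(2, size = 5, prob = 0.5)   # P(X >= 3) = 16/32 = 0.5

n <- 5; p <- 0.5
mu    <- n * p                        # mean of a binomial = 2.5
sigma <- sqrt(n * p * (1 - p))        # standard deviation, about 1.118
```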
14. Thur, Mar 5

Intro to the normal distribution. (Sections 6.1-6.2)

• Lecture: Handout

• Homework

1. Sec 6.2: 1-4, 5-39 odds (most of these are easy if you use the R functions `pnorm` and `qnorm`). You must make sketches showing the area.

NOTE: From this point onward, if the book says to use a lookup table in Appendix A, use R instead. (You won’t be given tables on the tests.)

HINT: If you use the technique I used in class, you don’t need to find z scores OR use the table in the back of the book. If the question refers to data that has a standard normal distribution, then the data has a normal distribution with mean 0 and standard deviation 1.

For example, 6.2 #10 asks for the probability that a thermometer has a reading less than -2.50 if the readings have a standard normal distribution. Thus we want to find P(x < -2.50), where x has a standard normal distribution. In R you would type:

```r
> pnorm(-2.50, mean=0, sd=1)
[1] 0.006209665
```

So the probability is only 0.00621!
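A few more patterns with the same functions, using the standard normal defaults (mean 0, sd 1). `pnorm` always returns the area to the left, so right-tail and in-between areas are built from it, and `qnorm` goes the other way, from an area to a value:

```r
# Area to the left of a value: P(z < -2.50)
pnorm(-2.50)                        # 0.006209665 (mean = 0, sd = 1 by default)

# Area to the right: P(z > 1.50), two equivalent ways
1 - pnorm(1.50)
pnorm(1.50, lower.tail = FALSE)

# Area between two values: P(-1.96 < z < 1.96), about 0.95
pnorm(1.96) - pnorm(-1.96)

# Going backwards from an area to a value: qnorm
qnorm(0.95)                         # z with 95% of the area to its left, about 1.645
qnorm(0.975)                        # about 1.960
```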

15. Tue, Mar 10

Normal distribution cont. (Section 6.3)

• Lecture: Continuation of previous lecture.

• Homework

1. Sec 6.3: 1, 2, 4, 5-23 odds. Make sketches to show the area.
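For a normal distribution that is not standard, pass the mean and standard deviation to `pnorm` and `qnorm` directly. A sketch with a hypothetical distribution (mean 100, sd 15), showing that the z-score route gives the same answer:

```r
# Hypothetical example: values that are normal with mean 100 and sd 15
mu <- 100; sigma <- 15

# P(X > 130): give pnorm the mean and sd directly, no z table needed
p.above <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)   # about 0.0228

# Same answer through the z score z = (130 - 100) / 15 = 2
z <- (130 - mu) / sigma
pnorm(z, lower.tail = FALSE)

# Value that separates the bottom 90% (the 90th percentile)
qnorm(0.90, mean = mu, sd = sigma)   # about 119.2
```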
16. Thur, Mar 12

INFERENTIAL STATISTICS

Sampling distributions, estimators, and the Central Limit Theorem (CLT). (Sections 6.4-6.5)

• Lecture: Handout

• Homework:

1. Sec 6.4: 1-7 odds, 11
2. Sec 6.5: 1-17 odds. Make sketches.
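A small simulation can make the Central Limit Theorem concrete. This sketch (an illustration, not a homework problem) draws many samples of size 30 from a skewed exponential population with mean 1 and standard deviation 1, and looks at the distribution of the sample means:

```r
set.seed(167)
n <- 30                                   # size of each sample
# Population: exponential with mean 1 and sd 1 (strongly skewed, not normal)
xbar <- replicate(10000, mean(rexp(n, rate = 1)))

mean(xbar)        # close to the population mean, 1
sd(xbar)          # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183
hist(xbar, main = "Sample means (n = 30)", xlab = "x-bar")   # roughly bell-shaped
```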
17. Tue, Mar 17 & Thur, Mar 19

Spring Break (No class)

18. Tue, Mar 24

Normal as approx. to the binomial and assessing normality. (Sections 6.6-6.7)

• Lecture: Handout A, Handout B

• Homework:

1. Sec 6.6: 1-23 odds (use R, not the appendix tables!). Make sketches.
2. Sec 6.7: 1, 3, 9 & 13, 11 & 15
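A sketch of both topics with illustrative numbers. The first part compares an exact binomial probability with the normal approximation using the continuity correction; the second makes a normal quantile plot of a simulated sample. With real data, replace `x` with your data vector.

```r
# Binomial with n = 100, p = 0.5: exact P(X <= 45) versus the normal approximation
n <- 100; p <- 0.5
mu    <- n * p                       # 50
sigma <- sqrt(n * p * (1 - p))       # 5

exact  <- pbinom(45, size = n, prob = p)        # about 0.1841
approx <- pnorm(45.5, mean = mu, sd = sigma)    # continuity correction: 45 -> 45.5

# Assessing normality: points close to the line suggest an approximately normal sample
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)    # simulated sample, for illustration only
qqnorm(x)
qqline(x)
```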
19. Thur, Mar 26

Estimating a population proportion (Sections 7.1-7.2)

• Lecture: Handout

• Homework: (Yes, there are many problems for this HW, but these problems require practice.)

1. Sec. 7.2: 1-35 odds
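A minimal sketch of the confidence interval p̂ ± E, where E = z·√(p̂q̂/n), and of the sample size formula, using hypothetical numbers. Note that R’s `prop.test` reports a different kind of interval (a score interval with a continuity correction), so its interval will generally not match the book’s formula exactly.

```r
# Hypothetical survey: 40 successes out of n = 100
x <- 40; n <- 100
p.hat <- x / n
q.hat <- 1 - p.hat

z <- qnorm(0.975)                        # critical value for 95% confidence
E <- z * sqrt(p.hat * q.hat / n)         # margin of error, about 0.096
c(lower = p.hat - E, upper = p.hat + E)  # about 0.304 to 0.496

# Sample size for E = 0.03 when p-hat is unknown (use 0.5 * 0.5); always round up
n.needed <- ceiling(z^2 * 0.25 / 0.03^2) # 1068
```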
20. Tue, Mar 31

Estimating a population mean. (Sections 7.3-7.4)

• Lecture: Handout

• Homework: (Yes, there are many problems for this HW, but these problems require practice.)

1. Sec. 7.3: 1-23 odds, 27, 29, 33
2. Sec. 7.4: 1-13 odds, 19, 21, 23
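A sketch of both cases with a made-up sample: the t interval when σ is unknown, checked against `t.test`, and the z interval when σ is known (the σ value here is assumed for illustration):

```r
x <- c(98.2, 98.6, 97.9, 98.4, 99.0, 98.1, 98.7, 98.3)   # made-up sample
n <- length(x); xbar <- mean(x); s <- sd(x)

# sigma unknown: t interval, x-bar +/- t * s / sqrt(n), with df = n - 1
t.crit <- qt(0.975, df = n - 1)
E.t  <- t.crit * s / sqrt(n)
ci.t <- c(xbar - E.t, xbar + E.t)
ci.t
t.test(x)$conf.int        # R's built-in 95% t interval, same numbers

# sigma known (suppose sigma = 0.6, a made-up value): z interval
sigma <- 0.6
E.z <- qnorm(0.975) * sigma / sqrt(n)
c(xbar - E.z, xbar + E.z)
```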
21. Thur, April 2

HYPOTHESIS TESTING

Intro to hypothesis testing (Sections 8.1-8.2)

• Lecture: Handout

• Homework:

1. Sec. 8.2: 1-43 odds (skip 17-23). You don’t need to find critical values. However, if the book asks you to find the test statistic, find that using the equation.

Hint for 29-36: If you have the test statistic and it’s a z score, use `pnorm`, the cumulative distribution function of the standard normal, to find the tail area. See the section on p-values in the notes, which also discusses what to do.
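The three kinds of tail areas, sketched for a hypothetical test statistic z = 2.00:

```r
z <- 2.00                     # hypothetical test statistic (a z score)

pnorm(z, lower.tail = FALSE)  # right-tailed test: area to the right, about 0.0228
pnorm(z)                      # left-tailed test: area to the left, about 0.9772
2 * pnorm(-abs(z))            # two-tailed test: twice the smaller tail, about 0.0455
```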

22. Tue, April 7

Testing a claim about a proportion (Section 8.3)

• Lecture: Continuation of last lecture handout.

• Homework:

1. Sec. 8.3: 1-3 odds, 5(c,d,e), 9, 15, 19, 23

Note 1: You do not need to find the test statistic or critical values. We are using the p-values.

Note 2: R uses the continuity correction for more accurate p-values, so your p-values and test statistics will differ from the book’s answers by a few percent. Here are a few of the p-values you should get with R to help you verify your work: Q5: p-value = 0.9114, Q9: p-value < 2.2e-16, Q15: p-value = 0.5395.
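A sketch of where the difference comes from, using hypothetical data (60 successes in 100 trials, H0: p = 0.5). Turning the correction off with `correct = FALSE` reproduces the p-value from the book’s z formula:

```r
# Hypothetical data: 60 successes in n = 100 trials; claim H0: p = 0.5
x <- 60; n <- 100; p0 <- 0.5
p.hat <- x / n

# Test statistic from the book's formula (no continuity correction)
z <- (p.hat - p0) / sqrt(p0 * (1 - p0) / n)    # = 2
p.manual <- 2 * pnorm(-abs(z))                  # two-tailed p-value, about 0.0455

# prop.test with correct = FALSE reproduces the formula's p-value
prop.test(x, n, p = p0, correct = FALSE)$p.value

# Default prop.test uses the continuity correction, so its p-value is a bit larger
prop.test(x, n, p = p0)$p.value                 # about 0.057
```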

23. Thur, April 9

Testing a claim about a mean (Sections 8.4-8.5)

• Lecture: Handout

• Homework:

1. Sec. 8.4: 1-7 odds, 13, 15
2. Sec. 8.5: 3-13 odds, 21, 25, 27, 31
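A sketch of the t test for a mean, with a made-up sample and a hypothetical claimed mean. The manual test statistic and p-value match what `t.test` reports:

```r
x   <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.4)   # made-up sample
mu0 <- 5                                           # claimed mean in H0

# Test statistic t = (x-bar - mu0) / (s / sqrt(n)), with df = n - 1
n <- length(x)
t.stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p.two  <- 2 * pt(-abs(t.stat), df = n - 1)        # two-tailed p-value

# t.test does the same computation; alternative = "greater" or "less" gives one tail
res <- t.test(x, mu = mu0)
res$statistic
res$p.value
```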
24. Tue, April 14

Understanding tests and estimates

• Lecture: Handout
• Homework: Study for the exam. I won’t accept any email questions after 5 pm the night before the exam. Don’t start studying the night before the test.
25. Thur, April 16

MIDTERM II (Chapters 5-8 and 4.7)

26. Tue, April 21

Inferences about two proportions (Sections 9.1-9.2)

• Lecture: Handout

• Homework:

1. Sec. 9.2: 1-7 odds, 15, 17, 19, 21, 25

Note that R uses the continuity correction, so your p-values will differ by a few percent from the book’s.
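A sketch with hypothetical counts showing the pooled z test for two proportions next to `prop.test`. For two groups, `prop.test` reports a chi-squared statistic equal to z², and with `correct = FALSE` its p-value matches the formula:

```r
# Hypothetical data: 45 of 100 in group 1, 30 of 100 in group 2
x <- c(45, 30); n <- c(100, 100)
p.hat <- x / n
p.bar <- sum(x) / sum(n)                 # pooled proportion, 0.375
q.bar <- 1 - p.bar

z <- (p.hat[1] - p.hat[2]) / sqrt(p.bar * q.bar / n[1] + p.bar * q.bar / n[2])
p.manual <- 2 * pnorm(-abs(z))           # two-tailed p-value

# prop.test reports chi-squared = z^2; correct = FALSE matches the formula
res <- prop.test(x, n, correct = FALSE)
res$statistic
res$p.value
```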

27. Thur, April 23

Inferences about two means & matched pairs (Sections 9.3-9.4)

• Lecture: Handout

• Homework:

1. Sec. 9.3: 1-7 odds, 23, 25, 27, 28

Hint for 27: Use the `Coins` table. To get the pre-1964 quarters, use:

`pre=Coins$WEIGHT[Coins$TYPE=="Pre-1964 Quarters"]`

To get the post-1964 quarters, use the same method, but look at the `Coins` table to see what they are called and modify the statement above. The command `summary(Coins)` is helpful for finding the categories.

Hint for 28: Use the `Cola` table. Just figure out which two columns you need.

2. Sec. 9.4: 1, 3, 5 (manually find the test statistic & p-value only), 13, 15, 17 (b-c), 19
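A sketch with made-up numbers. By default `t.test` uses Welch’s test for independent samples (variances not assumed equal) with a fractional number of degrees of freedom, so if the book’s hand method uses a simpler degrees-of-freedom rule, R’s p-value may differ slightly. For matched pairs, `paired = TRUE` is the same as a one-sample t test on the differences:

```r
a <- c(5.6, 5.9, 6.1, 5.7, 6.0, 5.8)   # made-up group 1
b <- c(5.2, 5.5, 5.4, 5.6, 5.3, 5.1)   # made-up group 2

# Independent samples: R's default is Welch's t test (variances not assumed equal)
t.test(a, b)

# Matched pairs: a paired test is a one-sample t test on the differences
paired <- t.test(a, b, paired = TRUE)
diffs  <- t.test(a - b)
paired$p.value
diffs$p.value                          # same value
```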

28. Tue, April 28

MODELING AND TESTING RELATIONSHIPS

Correlation (Sections 10.1-10.2)

• Lecture: Handout

• Homework: Include a scatter plot for each data set for which you find r.

1. Sec. 10.2: 1-11 odds, 21, 23, 27, 29, 31, 33, 35
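A minimal sketch with made-up data: the scatter plot, r from `cor`, and the test statistic t = r / √((1 - r²)/(n - 2)), which is what `cor.test` reports:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)                       # made-up paired data
y <- c(2.1, 3.9, 6.2, 7.8, 9.7, 12.4, 13.8, 16.1)

plot(x, y, main = "Scatter plot")      # always look at the plot first
r <- cor(x, y)                         # linear correlation coefficient

# Test statistic for H0: rho = 0 is t = r / sqrt((1 - r^2) / (n - 2))
n <- length(x)
t.stat <- r / sqrt((1 - r^2) / (n - 2))
res <- cor.test(x, y)                  # same t, with df = n - 2
res$statistic
res$p.value
```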
29. Thur, April 30

Regression (Section 10.3)

• Lecture: Handout

• Homework: First determine whether r is significant (via a hypothesis test) and SHOW YOUR WORK. Include scatter plots with the regression line, and residual plots.

1. Sec. 10.3: 1-11 odds, 21, 23, 27, 29, 31, 33

Hint for 5 and 7: you will need to determine whether the linear correlation coefficient is significant. Since r and n are already given, just use the test statistic equation to find the p-value manually.
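A sketch of both parts with hypothetical numbers: the p-value computed from a given r and n (with n - 2 degrees of freedom), and a regression line from `lm` drawn on the scatter plot, followed by a residual plot:

```r
# p-value when only r and n are given (hypothetical values r = 0.5, n = 10)
r <- 0.5; n <- 10
t.stat <- r / sqrt((1 - r^2) / (n - 2))      # about 1.633
p.val  <- 2 * pt(-abs(t.stat), df = n - 2)   # two-tailed p-value

# Regression line, scatter plot with the line, and residual plot (made-up data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 9.7, 12.4, 13.8, 16.1)
fit <- lm(y ~ x)                 # least-squares line y-hat = b0 + b1 * x
coef(fit)                        # intercept b0 and slope b1

plot(x, y); abline(fit)                        # scatter plot with the regression line
plot(x, resid(fit)); abline(h = 0, lty = 2)    # residual plot: look for patterns
```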

Variation and prediction intervals, multiple regression (Sections 10.4-10.5)

• Lecture: Continuation of regression lecture.
• Homework: No HW. These sections are optional course material. However, I highly recommend you read them.
30. Tue, May 5

Contingency tables (Section 11.3)

• Lecture: Handout

• Homework: Note that R uses Yates’s continuity correction for 2×2 tables, so your p-values may differ slightly from the book’s.

1. Sec. 11.3: 1-5, 7, 11, 13, 17, 21
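A sketch with a made-up 2×2 table, computing the chi-square statistic from expected counts and comparing it with `chisq.test`. With `correct = FALSE` the two agree; the default applies Yates’s correction to 2×2 tables:

```r
# Made-up 2 x 2 contingency table of observed counts
obs <- matrix(c(20, 30, 25, 25), nrow = 2,
              dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))

# Expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
chi.sq   <- sum((obs - expected)^2 / expected)   # statistic from the formula

chisq.test(obs, correct = FALSE)   # matches the formula above
chisq.test(obs)                    # default: Yates's correction for a 2 x 2 table
```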
31. Thur, May 7

ANOVA I (Section 12.1)

• Lecture: Handout

• Homework:

1. Sec. 12.2: 1-4, 5 (skip d), 9

ANOVA II (Section 12.2)

• Lecture: Continuation of last lecture handout.

• Homework: Don’t type in the data for 11-14 manually. Instead, download the Chapter 12 Data File and load it into R (just like the book data); it has the data for each problem, and the table name is listed next to each problem. Also, don’t forget to include the boxplots.

1. Sec. 12.2: 11 `car.crash` (p-val=0.421), 12 `car.crash` (p-val=0.296), 13 `stress` (p-val=0.091), 14 `skulls` (p-val=0.0305), 16 (p-val=0.0369)
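A sketch of a one-way ANOVA with made-up data in long format (one column of values, one column of group labels). The layout of the Chapter 12 data tables isn’t described here, so the last lines show `stack()`, which converts a table with one column per group into long format, in case you need it:

```r
# Made-up data in long format: one column of values, one column of group labels
d <- data.frame(
  value = c(12, 14, 11, 13,  15, 17, 16, 18,  11, 12, 10, 13),
  group = factor(rep(c("g1", "g2", "g3"), each = 4))   # hypothetical group names
)

boxplot(value ~ group, data = d)     # compare the groups visually first

fit <- aov(value ~ group, data = d)  # one-way ANOVA
summary(fit)                         # F statistic and p-value in the 'group' row

# Same F test; var.equal = TRUE matches the classic ANOVA assumption
oneway.test(value ~ group, data = d, var.equal = TRUE)

# If a table has one column per group instead, stack() converts it to long format
wide <- data.frame(g1 = c(12, 14), g2 = c(15, 17))
stack(wide)                          # columns 'values' and 'ind'
```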
32. Tue, May 12

Review / Questions

• Lecture: Handout
• Homework: Study for the final exam.
33. Thur, May 14

Review / Questions

34. Tue, May 19

FINAL EXAM Chapters 1-12 (2 hours - early class start time)

8:10 am to 10:10 am

If you would like your final exam back, turn in a self-addressed envelope with 2 first-class stamps affixed along with your final exam. Once the exams are graded, I will mail back those I have envelopes for. If you don’t provide an envelope, your exam will be shredded for your privacy.

Updated on:
Thu May 21 2009 at 12 PM