MAT167 Introduction to Statistics

An introduction to statistics. Includes sampling, data display, measures of central tendency, variability, and position; random variables, probability, probability distributions; sampling distributions, assessing normality, confidence intervals, hypothesis testing, ANOVA, and regression. Use of the statistics software R is taught throughout the course.
Announcements
Final grades have been posted. The solutions to the final exam are now on the website. Have a good summer and thanks for all your hard work this semester. –Anthony
Course Info
- Spring 2009: section 22684, 3 credit hours.
- 9:10 am - 10:25 am Tuesday & Thursday, Santa Rita A102, Jan 20 through May 19 2009, West Campus, Pima Community College.
- Syllabus
Instructor Info
- Instructor: Anthony Tanbakuchi
- Office: Radiology Research Labs, U of A, (520) 626-4500 (Map To Office)
- Easiest to contact me via email: mat167@tanbakuchi.com
Exam Dates
- Feb 24: MIDTERM I
- April 16: MIDTERM II
- May 19: Final Exam Ch 1-12 (2 hours)
Resources
Quick Reference Sheet:
R Statistics Software
- Using R on Campus:
- R is on a few computers in the Academic Commons Computer Lab (2nd floor Santa Catalina building). Just go into the computer lab and ask Jody or Dennis (one of the lab managers) where the computers are. Printers are also available in this lab.
- To install R:
- Visit the R Resources page.
- Basic R usage and examples:
- R Basics page
- R Data Sets:
-
- Triola Book Data Page
-
- Class Survey Data Page
-
- See quick reference sheet or R intro lecture for info on how to use the data sets.
- Technical Notes On R:
- When you start a new problem, it’s best to delete all the variables to make sure you don’t accidentally use old data. Just type
rm(list=ls()). Note that you will need to reload the book data if you need it.
- When you close R, you do not need to save the workspace if it asks. Saving the workspace just saves the variables you have defined.
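Here is a short illustration of what rm(list=ls()) does. ls() returns the names of every variable in the workspace (including any book tables you loaded), and rm(list=...) deletes everything named in that list. The variables x and y are just examples.

```r
# Create a couple of variables, as you might while working a problem.
x <- c(1, 2, 3)
y <- mean(x)
ls()              # lists the variables currently defined: "x" "y"

# Delete every variable in the workspace before starting the next problem.
rm(list = ls())
ls()              # character(0): the workspace is now empty
```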
Solutions / Exams
-
Spring 2009
-
Fall 2008
-
Summer 2008
-
Spring 2008
-
Fall 2007
Lectures and Homework
All homework is due at the beginning of class on Tuesday. Thus, homework assigned on Tuesday and Thursday is due at the beginning of class on the following Tuesday.
-
Tue, Jan 20
FOUNDATIONS
Introductory Material. (Sections 1.1-1.4)
-
In Class Survey: Sexual Partners Survey (encrypted connection) (Do not submit this until instructed to do so.)
-
Lecture: Handout
-
Special Homework (complete within 24 hours):
- CRITICAL A: Student Information (encrypted connection)
- CRITICAL B: Student Survey (encrypted connection)
-
Homework (due next Tuesday)
- CRITICAL C: Return syllabus student contract signed (last page).
- Sec 1.2: odds 1-25, 26
- Sec 1.3: every other odd 1-17, odds 21-27
- Sec 1.4: odds 1-29
- If you plan on using your own computer for homework, try to install R on it using these installation instructions. If you have problems getting it to install, email me. If you don’t have a home computer, you can use R in the academic computer commons on campus.
-
-
Thur, Jan 22
Introduction to R.
-
Lecture: Handout
-
Homework:
- R Worksheet
- R New York Times Article: Read the New York Times article on R. (Use the PDF of the article if the link does not work.)
-
-
Tue, Jan 27
DESCRIPTIVE STATISTICS
Summarizing & graphing data. (Sections 2.1-2.4)
-
Lecture: Handout
-
Homework:
- From this point forward, if the book has the TI symbol next to a problem, use R to do it.
- As always, make sure to include your plots made with R in the HW.
- If you get stuck using R take a look at these R examples.
-
Sec 2.2: 1-17 odds (do 17 by hand)
-
Sec 2.3: 1, 3.
-
Additional Problem for 2.3: Use R to make two histograms: one of the male heights and one of the female heights in the Appendix B Data Set 1 (Mhealth and Fhealth tables in R). Include both histograms in your HW. Then write a paragraph discussing the differences between the male and female heights that you can see from the histograms (i.e., center, variation, shape, outliers, min, max). Does either histogram have a distribution that is approximately normal?
HINTS: If you are having trouble getting the book data, see the R intro lecture or look at the back of the quick reference sheet. See the top part of this page to download the data sets.
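A sketch of one way to draw two histograms side by side on the same bins, which makes center, spread, and shape easier to compare. compare_hist() is a hypothetical helper written for this example, not part of R or the book data. Because the book's Mhealth and Fhealth tables are not bundled with R, the demonstration uses R's built-in iris data; with the book data loaded, use names(Mhealth) to find the height column.

```r
# compare_hist() is a hypothetical helper. It draws two histograms side by
# side using the same bin edges (and therefore the same x-axis).
compare_hist <- function(x, y, xname = "Group 1", yname = "Group 2",
                         xlab = "Value") {
  brks <- pretty(range(c(x, y)), n = 12)  # shared bin edges covering both samples
  op <- par(mfrow = c(1, 2))              # one row, two panels
  on.exit(par(op))                        # restore the old layout when done
  hx <- hist(x, breaks = brks, main = xname, xlab = xlab)
  hy <- hist(y, breaks = brks, main = yname, xlab = xlab)
  invisible(list(x = hx, y = hy))         # return the histogram objects quietly
}

# Demonstration with the built-in iris data (50 flowers per species).
setosa    <- iris$Sepal.Length[iris$Species == "setosa"]
virginica <- iris$Sepal.Length[iris$Species == "virginica"]
h <- compare_hist(setosa, virginica, "setosa", "virginica", "Sepal length (cm)")
```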
-
Sec 2.4: 1-4, 9 (use R), 13 (just sketch by hand), 17 (use R), 19 (use R)
Hint for 19: to make a plot with both lines and points rather than a scatter plot, use the optional argument type="b" for the plot function, e.g. plot(t, y, type="b"). The time vector t goes from 1990 to 2000. A quick way to make t is to use this shortcut: t=1990:2000.
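A minimal sketch of the hint above. The y values are made up and stand in for the data given in problem 19; the main point is that 1990:2000 has 11 entries, so y needs 11 values too.

```r
t <- 1990:2000                     # shortcut for the years 1990, 1991, ..., 2000
# Made-up yearly values standing in for the problem's data (one per year).
y <- c(12, 14, 13, 17, 18, 21, 20, 24, 26, 25, 29)
length(t) == length(y)             # TRUE: 11 years and 11 values
plot(t, y, type = "b",             # "b" = both points and connecting lines
     xlab = "Year", ylab = "Value", main = "Time-series plot")
```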
-
-
Thur, Jan 29
Summation Notation.
-
Lecture: Handout
-
Homework: (If you need more explanation and practice with summation notation, see this page.)
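You can check summation-notation arithmetic with R's sum(). The data vector below is made up; pay attention to the difference between the sum of the squares and the square of the sum.

```r
x <- c(2, 4, 5, 9)        # made-up data values
n <- length(x)            # n = 4

sum(x)                    # sum of x             = 2 + 4 + 5 + 9     = 20
sum(x^2)                  # sum of x^2           = 4 + 16 + 25 + 81  = 126
sum(x)^2                  # (sum of x)^2         = 20^2              = 400
sum(x) / n                # xbar = (sum of x)/n                      = 5
sum((x - mean(x))^2)      # sum of (x - xbar)^2  = 9 + 1 + 0 + 16    = 26
```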
-
-
Tue, Feb 3
Measures of center. (Sections 3.1-3.2)
-
Lecture: Handout
-
Homework:
-
Sec 3.2: 1-9 odds, 13, 15, 21, 23, 25, 29 (Use R if the problem has TI by it from now on.)
Hint for 21: to get the first set of differences use:
x=WEATHER$HIGH-WEATHER$PREDICTE
You can find the second set of differences in the same way once you figure out the correct column name. (I admit the author’s column names are not that good.)
Hint for 23: to get the pennies from before 1983:
x=Coins$WEIGHT[Coins$TYPE=="Pre-1983 Pennies"]
To find the post-1983 pennies, use the same method, but take a look at the Coins table to see what they are called and then modify the above statement.
Hint for 29b:
mean(x, trim=0.10)
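The three hints above use three R ideas: subtracting two columns, picking out the rows of one category with a logical condition inside [ ], and the trimmed mean. The small table toy is made up to illustrate them without the book data; its column names are hypothetical.

```r
# A small made-up table standing in for a book table such as Coins or WEATHER.
toy <- data.frame(
  TYPE   = c("A", "A", "A", "B", "B"),
  HIGH   = c(70, 75, 80, 72, 68),
  PRED   = c(68, 77, 80, 70, 69),
  WEIGHT = c(2.5, 2.6, 2.4, 3.1, 3.0)
)

# Row-by-row differences between two columns (like the Hint for 21).
d <- toy$HIGH - toy$PRED            # 2 -2 0 2 -1

# Keep only the weights whose TYPE is "A" (like the Hint for 23).
unique(toy$TYPE)                    # shows the category names you can use
wA <- toy$WEIGHT[toy$TYPE == "A"]   # 2.5 2.6 2.4

# Mean vs. 10% trimmed mean: trim = 0.10 drops the lowest 10% and the
# highest 10% of the sorted values before averaging.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
mean(x)                 # 14.5 (pulled up by the outlier 100)
mean(x, trim = 0.10)    # 5.5  (averages 2..9 after dropping 1 and 100)
median(x)               # 5.5
```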
-
-
-
Thur, Feb 5
Measures of variation. (Section 3.3)
-
Lecture: Handout
-
Homework:
- Sec 3.3: 1-9 odds, 15, 21
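A quick way to check hand calculations of variation: compute the sample variance from its formula (dividing by n - 1) and compare it with R's var() and sd(). The data are made up.

```r
x <- c(2, 4, 5, 9)                      # made-up sample
n <- length(x)

s2 <- sum((x - mean(x))^2) / (n - 1)    # sample variance by the formula: 26 / 3
s  <- sqrt(s2)                          # sample standard deviation

c(s2, var(x))        # the formula and var() agree
c(s, sd(x))          # the formula and sd() agree
diff(range(x))       # range = max - min = 7
```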
-
-
Tue, Feb 10
Relative standing and exploratory data analysis. (Sections 3.4-3.5)
-
Lecture: Handout
-
Homework
-
Sec 3.4: 1, 5, 7, 9, 11, 13-27 odds
-
Sec 3.5: 1, 3, 5, 9
-
Additional problem: Use the following code to make two boxplots for comparing gender against bear weight and length. Then use the boxplots to discuss and compare the distribution of lengths and weights of bears in terms of their gender. (Make sure the book data is loaded into R first)
boxplot(Bears$LENGTH ~ Bears$SEX, main="Comparison of bear length")
boxplot(Bears$WEIGHT ~ Bears$SEX, main="Comparison of bear weight")
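A sketch of the Section 3.4-3.5 tools in R: z scores, the five-number summary that boxplot() draws from, and the formula form value ~ group used in the boxplot commands above. The vectors are made up, and the boxplot uses the built-in iris data instead of Bears. R's fivenum() uses Tukey's hinges, so its quartiles may differ slightly from ones computed with the book's method.

```r
x <- c(2, 4, 5, 9)                 # made-up sample
z <- (x - mean(x)) / sd(x)         # z score of each value: (x - xbar) / s
round(z, 3)

# Five-number summary: min, lower hinge, median, upper hinge, max.
y <- c(1, 3, 4, 6, 7, 8, 10, 15)   # made-up sample
fivenum(y)                         # 1.0 3.5 6.5 9.0 15.0

# Side-by-side boxplots with the formula interface (value ~ group).
boxplot(Sepal.Length ~ Species, data = iris, main = "Sepal length by species")
```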
-
-
-
Thur, Feb 12
Descriptive Statistics: Case Study.
- Lecture: Handout
PROBABILITY
Probability I: Addition rule. (Sections 4.1-4.3)
-
Lecture: Handout
-
Homework:
- Sec 4.2: 1-25 odds, 29
- Sec 4.3: 1-23 odds
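One way to see the addition rule at work is to list the equally likely outcomes of a single die roll in R and compare P(A or B) computed directly with P(A) + P(B) - P(A and B). The events A and B are made-up examples.

```r
die <- 1:6                    # equally likely outcomes of one die roll
A <- die %% 2 == 0            # event A: roll is even   {2, 4, 6}
B <- die > 3                  # event B: roll exceeds 3 {4, 5, 6}

# With equally likely outcomes, P(event) = proportion of TRUE values.
p_or  <- mean(A | B)                      # directly: {2, 4, 5, 6} gives 4/6
p_add <- mean(A) + mean(B) - mean(A & B)  # addition rule: 3/6 + 3/6 - 2/6
c(p_or, p_add)                            # both about 0.6667
```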
-
Tue, Feb 17
Probability II: Multiplication rule. (Sections 4.4-4.5)
-
Lecture: Handout
-
Homework:
- Sec 4.4: 1-21 odds
- Sec 4.5: 1-25 odds
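A sketch of the multiplication rule for independent and dependent events, including the "at least one" complement trick. The numbers p = 0.1 and n = 5 are made up; the card example uses a standard 52-card deck.

```r
# Independent events: P(A and B) = P(A) * P(B).
# "At least one" via the complement: P(at least one) = 1 - P(none).
p <- 0.1                      # chance of the event on one trial (made up)
n <- 5                        # number of independent trials
p_none <- (1 - p)^n           # multiplication rule: 0.9^5 = 0.59049
p_at_least_one <- 1 - p_none  # 0.40951

# Dependent events (drawing without replacement):
# P(two aces) = P(first ace) * P(second ace | first ace) = 4/52 * 3/51
p_two_aces <- (4 / 52) * (3 / 51)
p_two_aces                    # 1/221, about 0.004525
choose(4, 2) / choose(52, 2)  # the same answer by counting
```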
-
-
Thur, Feb 19
Random variables (Sections 5.1-5.2)
-
Lecture: Handout (Also print out the next lecture, on counting; we may cover part of it if we have time.)
-
Homework:
- Sec 5.2: 1-19 odds
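For a discrete random variable, the mean is the sum of x·P(x) and the variance is the sum of (x - mu)^2·P(x). A minimal R version, using the number of heads in two fair coin tosses as the example distribution:

```r
# X = number of heads in two fair coin tosses.
x <- c(0, 1, 2)
p <- c(0.25, 0.50, 0.25)
sum(p)                            # must equal 1 for a valid distribution

mu     <- sum(x * p)              # mean:      sum of x * P(x)          = 1
sigma2 <- sum((x - mu)^2 * p)     # variance:  sum of (x - mu)^2 * P(x) = 0.5
sigma  <- sqrt(sigma2)            # std. dev.: about 0.7071
```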
-
-
Tue, Feb 24
MIDTERM I (Chapters 1-4)
-
Thur, Feb 26
Rodeo Holiday (No Classes)
-
Tue, Mar 3
Counting & Binomial distribution. (Sections 4.7, 5.3-5.4)
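The counting rules and binomial probabilities from these sections all have R functions. A sketch with made-up values (n = 5 trials, p = 0.5), checking dbinom() against the binomial formula:

```r
factorial(4)                          # 4! = 24 arrangements of 4 distinct items
factorial(5) / factorial(5 - 2)       # permutations 5P2 = 20
choose(5, 2)                          # combinations 5C2 = 10

# Binomial with n = 5 trials and p = 0.5 (made-up values):
n <- 5; p <- 0.5
dbinom(2, size = n, prob = p)                 # P(X = 2) = 10/32 = 0.3125
choose(n, 2) * p^2 * (1 - p)^(n - 2)          # same value from the binomial formula
pbinom(2, size = n, prob = p)                 # P(X <= 2) = (1 + 5 + 10)/32 = 0.5
c(mean = n * p, sd = sqrt(n * p * (1 - p)))   # mean np = 2.5, sd = sqrt(npq)
```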
-
Thur, Mar 5
Intro to the normal distribution. (Sections 6.1-6.2)
-
Lecture: Handout
-
Homework
-
Sec 6.2: 1-4, 5-39 odds (most of these are easy if you use the pnorm and qnorm R functions). You must make sketches to show the area.
NOTE: From this point onward, if the book says to use a lookup table in Appendix A, use R instead. (You won’t be given tables on the tests.)
HINT: If you use the technique I used in class, you don’t need to find z scores OR use the table in the back of the book. If the question refers to data that has a standard normal distribution, then it has a normal distribution with a mean=0 and a standard deviation=1.
For example, problem 10 in Section 6.2 asks for the probability that a thermometer has a reading less than -2.50 if the readings have a standard normal distribution. Thus we want to find P(x<-2.50) where x has a standard normal distribution. In R you would type:
> pnorm(-2.50, mean=0, sd=1)
[1] 0.006209665
So the probability is only 0.00621!
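The same approach covers the other kinds of standard normal areas. pnorm() gives the area to the left of a value (its defaults are mean = 0, sd = 1), lower.tail = FALSE gives the area to the right, a difference of two pnorm() calls gives the area between, and qnorm() goes from an area back to a z value. The z values below are arbitrary examples.

```r
pnorm(-2.50)                          # area to the LEFT of -2.50: 0.006209665
pnorm(1.25, lower.tail = FALSE)       # area to the RIGHT of 1.25 (same as 1 - pnorm(1.25))
pnorm(1) - pnorm(-1)                  # area BETWEEN -1 and 1: about 0.6827

# qnorm() goes the other way: from an area to a z value.
qnorm(0.975)                          # z with 97.5% of the area to its left: about 1.96
```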
-
-
-
Tue, Mar 10
Normal distribution cont. (Section 6.3)
-
Lecture: Continuation of previous lecture.
-
Homework
- Sec 6.3: 1, 2, 4, 5-23 odds. Make sketches to show the area.
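For a normal distribution that is not standard, give pnorm() and qnorm() the mean and sd directly, so there is no need to convert to z scores first. The mean of 100 and standard deviation of 15 below are made-up values.

```r
# Made-up example: X is normal with mean 100 and standard deviation 15.
mu <- 100; sigma <- 15

pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 130), same as P(Z > 2)
pnorm(115, mu, sigma) - pnorm(85, mu, sigma)           # P(85 < X < 115), within 1 sd
qnorm(0.95, mean = mu, sd = sigma)                     # 95th percentile: about 124.67
```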
-
-
Thur, Mar 12
INFERENTIAL STATISTICS
Sampling distributions, estimators, and the Central limit theorem (CLT). (Sections 6.4-6.5)
-
Lecture: Handout
-
Homework:
- Sec 6.4: 1-7 odds, 11
- Sec 6.5: 1-17 odds. Make sketches.
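A simulation sketch of the Central Limit Theorem: draw many samples of size 30 from a strongly skewed population (an exponential with mean 1 and sd 1, chosen only for illustration) and look at the distribution of the sample means. The last line applies the CLT result that the sample mean has standard deviation sigma/sqrt(n).

```r
set.seed(1)                        # makes the simulation reproducible
n    <- 30                         # size of each sample
reps <- 10000                      # number of samples to draw

# Population: exponential with mean 1 and sd 1 (strongly right-skewed).
xbars <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbars)      # close to the population mean, 1
sd(xbars)        # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183
hist(xbars, main = "Sample means, n = 30")   # roughly bell-shaped

# Probability about a sample mean: P(xbar > 1.2) when mu = 1, sigma = 1, n = 30.
pnorm(1.2, mean = 1, sd = 1 / sqrt(n), lower.tail = FALSE)
```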
-
-
Tue, Mar 17 & Thur, Mar 19
Spring Break (No class)
-
Tue, Mar 24
Normal as approx. to the binomial and assessing normality. (Sections 6.6-6.7)
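A sketch of both topics. The first part compares an exact binomial probability with the normal approximation using the continuity correction (55 becomes 55.5); n = 100 and p = 0.5 are made-up values. The second part draws a normal quantile (Q-Q) plot of simulated data; points close to the line suggest the data are approximately normal.

```r
# Normal approximation to a binomial with n = 100, p = 0.5.
n <- 100; p <- 0.5
mu    <- n * p                      # 50
sigma <- sqrt(n * p * (1 - p))      # 5

exact  <- pbinom(55, n, p)                      # exact P(X <= 55)
approx <- pnorm(55.5, mean = mu, sd = sigma)    # continuity correction: use 55.5
c(exact, approx)                                # both about 0.864

# Assessing normality with a normal quantile (Q-Q) plot.
set.seed(2)
y <- rnorm(40, mean = 10, sd = 2)   # simulated data, for illustration only
qqnorm(y)
qqline(y)
```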
-
Thur, Mar 26
Estimating a population proportion (Sections 7.1-7.2)
-
Lecture: Handout
-
Homework: (Yes, there are many problems for this HW, but these problems require practice.)
- Sec. 7.2: 1-35 odds
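A sketch of the textbook confidence interval for a proportion, p-hat ± E with E = z·sqrt(p-hat(1 - p-hat)/n), plus the sample-size formula when no prior estimate is available. The counts (40 of 100) and the 0.03 margin are made up. R's prop.test() reports a different kind of interval (a Wilson score interval, with a continuity correction by default), so its limits will not match this formula exactly.

```r
# Made-up data: 40 successes in n = 100 trials, 95% confidence.
x <- 40; n <- 100
phat <- x / n                                  # 0.40
z    <- qnorm(0.975)                           # critical value for 95%: about 1.96
E    <- z * sqrt(phat * (1 - phat) / n)        # margin of error: about 0.096
c(phat - E, phat + E)                          # about (0.304, 0.496)

# Sample size for a margin of error of 0.03 with no prior estimate of p:
ceiling(z^2 * 0.25 / 0.03^2)                   # always round UP
```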
-
-
Tue, Mar 31
Estimating a population mean. (Sections 7.3-7.4)
-
Lecture: Handout
-
Homework: (Yes, there are many problems for this HW, but these problems require practice.)
- Sec. 7.3: 1-23 odds, 27, 29, 33
- Sec. 7.4: 1-13 odds, 19, 21, 23
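A sketch of the confidence interval for a mean when sigma is unknown, computed from the formula with the t critical value and then checked against t.test(). The sample is made up, and sigma = 3 in the last line is an assumed value used only to show the sigma-known (z) version.

```r
x <- c(2, 4, 5, 9)                  # made-up sample
n <- length(x)

# Sigma unknown: use the t distribution with n - 1 degrees of freedom.
tcrit <- qt(0.975, df = n - 1)      # t critical value for 95%, df = 3
E  <- tcrit * sd(x) / sqrt(n)       # margin of error
ci <- c(mean(x) - E, mean(x) + E)
ci
t.test(x)$conf.int                  # t.test() reports the same interval

# Sigma known (assumed sigma = 3): use qnorm() instead of qt().
mean(x) + c(-1, 1) * qnorm(0.975) * 3 / sqrt(n)
```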
-
-
Thur, April 2
HYPOTHESIS TESTING
Intro to hypothesis testing (Sections 8.1-8.2)
-
Lecture: Handout
-
Homework:
-
Sec. 8.2: 1-43 odds (skip 17-23). You don’t need to find critical values. However, if the book asks you to find the test statistic, find that using the equation.
Hint for 29-36: If you have the test statistic and it’s a z-score, then use the cumulative distribution function for the standard normal, pnorm, and find the tail area. See the section on p-values in the notes; it also discusses what to do.
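A sketch of turning a z test statistic into a p-value with pnorm(). Which tail you use depends on the alternative hypothesis; the z value here is made up.

```r
z <- -1.75                                 # made-up z test statistic

left  <- pnorm(z)                          # left-tailed test: area to the left of z
right <- pnorm(z, lower.tail = FALSE)      # right-tailed test: area to the right of z
two   <- 2 * pnorm(-abs(z))                # two-tailed test: twice the smaller tail
c(left = left, right = right, two = two)
```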
-
-
-
Tue, April 7
Testing a claim about a proportion (Section 8.3)
-
Lecture: Continuation of last lecture handout.
-
Homework:
-
Sec. 8.3: 1-3 odds, 5(c,d,e), 9, 15, 19, 23
Note 1: You do not need to find the test statistic or critical values. We are using the p-values.
Note 2: R uses the continuity correction for more accurate p-values. Your p-values and test statistics will differ from the book’s answers by a few percent. The following are a few of the p-values you will get with R to help you verify your work: Q5: p-value = 0.9114, Q9: p-value < 2.2e-16, Q15: p-value = 0.5395.
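A sketch of how prop.test() relates to the book's z test for one proportion, using made-up counts (40 of 100, claim p = 0.5). With correct = FALSE, the reported X-squared statistic is the square of the book's z and the p-value matches the pnorm() calculation; with the default continuity correction it differs slightly, as Note 2 says.

```r
# Made-up data: 40 successes in n = 100; H0: p = 0.5, H1: p != 0.5.
x <- 40; n <- 100; p0 <- 0.5

# Test statistic from the textbook formula (no continuity correction).
phat <- x / n
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)   # -2
2 * pnorm(-abs(z))                           # two-tailed p-value, about 0.0455

prop.test(x, n, p = p0)$p.value              # default: with continuity correction
res <- prop.test(x, n, p = p0, correct = FALSE)
res$statistic                                # X-squared = z^2 = 4
res$p.value                                  # same as 2 * pnorm(-abs(z))

# One-tailed alternatives: alternative = "less" or "greater".
prop.test(x, n, p = p0, alternative = "less", correct = FALSE)$p.value   # pnorm(z)
```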
-
-
-
Thur, April 9
Testing a claim about a mean (Sections 8.4-8.5)
-
Lecture: Handout
-
Homework:
- Sec. 8.4: 1-7 odds, 13, 15
- Sec. 8.5: 3-13 odds, 21, 25, 27, 31
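A sketch of testing a claim about a mean both ways: a z test by hand when sigma is known (sigma = 3 is an assumed value here), and a t test by hand when sigma is unknown, checked against t.test(). The sample and the claimed mean of 3 are made up.

```r
x   <- c(2, 4, 5, 9)               # made-up sample
mu0 <- 3                           # claimed mean under H0
n   <- length(x)

# Sigma known (assumed sigma = 3): z test by hand.
z <- (mean(x) - mu0) / (3 / sqrt(n))
2 * pnorm(-abs(z))                 # two-tailed p-value

# Sigma unknown: t test by hand and with t.test().
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
2 * pt(-abs(t_stat), df = n - 1)
res <- t.test(x, mu = mu0)         # two-sided by default
res$statistic
res$p.value                        # matches the hand calculation
```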
-
-
Tue, April 14
Understanding tests and estimates
- Lecture: Handout
- Homework: Study for the exam. I won’t accept any email questions after 5 pm the night before the exam. Don’t start studying the night before the test.
-
Thur, April 16
MIDTERM II (Chapters 5-8 and 4.7)
-
Tue, April 21
Inferences about two proportions (Sections 9.1-9.2)
-
Lecture: Handout
-
Homework:
-
Sec. 9.2: 1-7 odds, 15, 17, 19, 21, 25
Note that R uses the continuity correction so the p-values will differ by a few percent from the book’s.
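A sketch comparing the book's pooled z test for two proportions with prop.test(). The counts are made up. Turning off the continuity correction makes the X-squared statistic equal to the square of the pooled z statistic, which is a handy way to check your hand calculation.

```r
# Made-up data: 30 of 100 in group 1, 45 of 100 in group 2.
x <- c(30, 45)
n <- c(100, 100)

# Textbook z statistic using the pooled proportion.
pbar <- sum(x) / sum(n)                                    # 0.375
z <- (x[1] / n[1] - x[2] / n[2]) /
  sqrt(pbar * (1 - pbar) * (1 / n[1] + 1 / n[2]))
2 * pnorm(-abs(z))                                         # two-tailed p-value

prop.test(x, n)                        # default: with continuity correction
res <- prop.test(x, n, correct = FALSE)
res$statistic                          # X-squared = z^2 = 4.8
```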
-
-
-
Thur, April 23
Inferences about two means & matched pairs (Sections 9.3-9.4)
-
Lecture: Handout
-
Homework:
-
Sec. 9.3: 1-7 odds, 23, 25, 27, 28
Hint for 27: Use the Coins table. To get the quarters from before 1964:
pre=Coins$WEIGHT[Coins$TYPE=="Pre-1964 Quarters"]
To find the post-1964 quarters, use the same method, but take a look at the Coins table to see what they are called and then modify the above statement. The command summary(Coins) is helpful for finding the categories.
Hint for 28: Use the Cola table. Just figure out which two columns you need.
-
Sec. 9.4: 1, 3, 5 (manually find the test statistic & p-value only), 13, 15, 17 (b-c), 19
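A sketch of the two t.test() forms for these sections, using R's built-in sleep data (extra hours of sleep for the same 10 patients under two drugs), since the book tables are not bundled with R. The sleep data are really paired, so the independent-samples call only shows the syntax. R's two-sample test is the Welch test with Welch-Satterthwaite degrees of freedom, so its p-value may differ slightly from a hand calculation that uses a more conservative df.

```r
# Built-in sleep data: rows 1-10 are group 1, rows 11-20 are group 2,
# with the patients (ID) in the same order in both groups.
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Two independent samples (Section 9.3): Welch t test, R's default.
t.test(g1, g2)

# Matched pairs (Section 9.4): the same patients measured twice.
res_paired <- t.test(g1, g2, paired = TRUE)
diffs      <- g1 - g2
res_one    <- t.test(diffs, mu = 0)       # a one-sample test on the differences
c(res_paired$p.value, res_one$p.value)    # identical
```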
-
-
-
Tue, April 28
MODELING AND TESTING RELATIONSHIPS
Correlation (Sections 10.1-10.2)
-
Lecture: Handout
-
Homework: Include a scatter plot for each data set for which you find r.
- Sec. 10.2: 1-11 odds, 21, 23, 27, 29, 31, 33, 35
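A sketch of the correlation workflow in R: make the scatter plot first, then compute r with cor() and test H0: rho = 0 with cor.test(). The built-in cars data (speed and stopping distance of 50 cars) stand in for the homework data.

```r
# Built-in cars data: speed (mph) and stopping distance (ft) of 50 cars.
x <- cars$speed
y <- cars$dist

plot(x, y, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")  # scatter plot first
r   <- cor(x, y)                 # linear correlation coefficient
res <- cor.test(x, y)            # H0: rho = 0, two-tailed by default
res$estimate                     # same r as cor()
res$parameter                    # degrees of freedom, n - 2 = 48
res$p.value
```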
-
-
Thur, April 30
Regression (Section 10.3)
-
Lecture: Handout
-
Homework: Make sure to first determine whether r is significant (via a hypothesis test). SHOW WORK. Include scatter plots with the regression line, and residual plots.
-
Sec. 10.3: 1-11 odds, 21, 23, 27, 29, 31, 33
Hint for 5 and 7: you will need to determine whether the linear correlation coefficient is significant. Since r and n are already given, just use the test statistic equation to find the p-value manually.
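A sketch of the hint and the plots this homework asks for. p_from_r() is a hypothetical helper that applies the test statistic t = r·sqrt((n - 2)/(1 - r^2)) with n - 2 degrees of freedom; the rest fits a regression line with lm() on the built-in cars data and draws the scatter plot with the line and a residual plot.

```r
# Hypothetical helper: two-tailed p-value for H0: rho = 0 from r and n only.
p_from_r <- function(r, n) {
  t <- r * sqrt((n - 2) / (1 - r^2))  # t statistic with n - 2 degrees of freedom
  2 * pt(-abs(t), df = n - 2)         # two-tailed p-value
}

# Regression on the built-in cars data.
fit <- lm(dist ~ speed, data = cars)  # fits y-hat = b0 + b1 * x
coef(fit)                             # intercept b0 and slope b1

plot(cars$speed, cars$dist, xlab = "Speed", ylab = "Distance")
abline(fit)                           # add the regression line to the scatter plot

plot(cars$speed, resid(fit), xlab = "Speed", ylab = "Residual")  # residual plot
abline(h = 0, lty = 2)                # residuals should scatter around 0 with no pattern
```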
-
Variation and prediction intervals, multiple regression (Sections 10.4-10.5)
- Lecture: Continuation of regression lecture.
- Homework: No HW. These sections are optional course material. However, I highly recommend you read them.
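For those reading the optional sections, here is a sketch of a 95% prediction interval from predict() and a multiple regression with two predictors, using the built-in cars and mtcars data. The speed of 21 mph and the choice of predictors are arbitrary examples.

```r
fit <- lm(dist ~ speed, data = cars)
new <- data.frame(speed = 21)                          # predict at a speed of 21 mph
pi_21 <- predict(fit, newdata = new, interval = "prediction", level = 0.95)
pi_21                                                  # columns: fit, lwr, upr

# Multiple regression: list several predictors after the ~.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit2)$adj.r.squared                            # adjusted R-squared
```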
-
-
Tue, May 5
Contingency tables (Section 11.3)
-
Lecture: Handout
-
Homework: Note that R uses the Yates continuity correction (for 2 x 2 tables), so your p-values may differ slightly from the book’s.
- Sec. 11.3: 1-5, 7, 11, 13, 17, 21
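A sketch showing where the difference comes from: for a 2 x 2 table, chisq.test() applies the Yates correction by default, and correct = FALSE gives the uncorrected statistic that matches the book's formula. The table of counts is made up.

```r
# Made-up 2 x 2 table of observed counts.
tab <- matrix(c(20, 30,
                30, 20), nrow = 2, byrow = TRUE)

chisq.test(tab)$statistic                   # with Yates correction (default): 3.24
chisq.test(tab, correct = FALSE)$statistic  # without it: 4
chisq.test(tab, correct = FALSE)$expected   # expected counts: all 25 here
```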
-
-
Thur, May 7
ANOVA I (Section 12.1)
-
Lecture: Handout
-
Homework:
- Sec. 12.2: 1-4, 5 (skip d), 9
ANOVA II (Section 12.2)
-
Lecture: Continuation of last lecture handout.
-
Homework: Don’t type in the data for 11-14 manually. Instead, download the Chapter 12 Data File and load it into R (just like the book data); it has data for each problem, and the table name is listed next to each problem. Also, don’t forget to include the boxplots.
- Sec. 12.2: 11 (car.crash, p-value = 0.421), 12 (car.crash, p-value = 0.296), 13 (stress, p-value = 0.091), 14 (skulls, p-value = 0.0305), 16 (p-value = 0.0369)
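A sketch of a one-way ANOVA in R on the built-in PlantGrowth data (plant weights under a control and two treatments), since the Chapter 12 tables are not bundled with R. Draw the boxplots first, then read the F statistic and p-value from summary(aov(...)); anova(lm(...)) gives the same test.

```r
# Built-in PlantGrowth data: weight under a control and two treatments.
boxplot(weight ~ group, data = PlantGrowth, main = "Weight by group")  # look first

fit <- aov(weight ~ group, data = PlantGrowth)   # one-way ANOVA
summary(fit)                                     # F statistic and p-value

# The same test via lm() and anova():
anova(lm(weight ~ group, data = PlantGrowth))
```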
-
-
Tue, May 12
Review / Questions
- Lecture: Handout
- Homework: Study for the final exam.
-
Thur, May 14
Review / Questions
-
Tue, May 19
FINAL EXAM Chapters 1-12 (2 hours - early class start time)
8:10 am to 10:10 am
If you would like your final exam back, turn in a self-addressed envelope with two first-class stamps affixed along with your final exam. Once the exams are graded, I will mail back those I have envelopes for. If you don’t provide an envelope, your exam will be shredded for your privacy.