MAT167 Introduction to Statistics
An introduction to statistics. Includes sampling, data display, measures of central tendency, variability, and position; random variables, probability, probability distributions; sampling distributions, assessing normality, confidence intervals, hypothesis testing, ANOVA, and regression. Use of the statistics software R is taught throughout the course.
Announcements
Final grades have been posted. The solutions to the final exam are now on the website. Have a good summer and thanks for all your hard work this semester. –Anthony
Course Info
 Spring 2009: section 22684, 3 credit hours.
 9:10 am - 10:25 am Tuesday & Thursday, Santa Rita A102, Jan 20 through May 19, 2009, West Campus, Pima Community College.
 Syllabus
Instructor Info
 Instructor: Anthony Tanbakuchi
 Office: Radiology Research Labs, U of A, (520) 626-4500 (Map To Office)
 Easiest to contact me via email: mat167@tanbakuchi.com
Exam Dates
 Feb 24: MIDTERM I
 April 16: MIDTERM II
 May 19: Final Exam Ch 1-12 (2 hours)
Resources
Quick Reference Sheet:
R Statistics Software
 Using R on Campus:
 R is on a few computers in the Academic Commons Computer Lab (2nd floor Santa Catalina building). Just go into the computer lab and ask Jody or Dennis (one of the lab managers) where the computers are. Printers are also available in this lab.
 To install R:
 Visit the R Resources page.
 Basic R usage and examples:
 R Basics page
 R Data Sets:

 Triola Book Data Page

 Class Survey Data Page

 See quick reference sheet or R intro lecture for info on how to use the data sets.
 Technical Notes On R:
 When you start a new problem, it’s best to delete all the variables so you don’t accidentally use old data. Just type
rm(list=ls())
Note that you will need to reload the book data if you need it. When you close R, you do not need to save the workspace if it asks. Saving the workspace just saves the variables you have defined.
Solutions / Exams

Spring 2009

Fall 2008

Summer 2008

Spring 2008

Fall 2007
Lectures and Homework
All homework is due at the beginning of class on Tuesday. Thus, homework assigned on Tuesday and Thursday is due at the beginning of class on the following Tuesday.

Tue, Jan 20
FOUNDATIONS
Introductory Material. (Sections 1.1-1.4)

In Class Survey: Sexual Partners Survey (encrypted connection) (Do not submit this until instructed to do so.)

Lecture: Handout

Special Homework (complete within 24 hours):
 CRITICAL A: Student Information (encrypted connection)
 CRITICAL B: Student Survey (encrypted connection)

Homework (due next Tuesday):
 CRITICAL C: Return syllabus student contract signed (last page).
 Sec 1.2: odds 1-25, 26
 Sec 1.3: every other odd 1-17, odds 21-27
 Sec 1.4: odds 1-29
 If you plan on using your own computer for homework, try to install R on it using these installation instructions. If you have problems getting it to install, email me. If you don’t have a home computer, you can use R in the academic computer commons on campus.


Thur, Jan 22
Introduction to R.

Lecture: Handout

Homework:
 R Worksheet
 R New York Times Article Read the New York Times article on R. (PDF of article if link does not work.)


Tue, Jan 27
DESCRIPTIVE STATISTICS
Summarizing & graphing data. (Sections 2.1-2.4)

Lecture: Handout

Homework:
 From this point forward if the book has the TI symbol next to a problem, use R to do it.
 As always, make sure to include your plots made with R in the HW.
 If you get stuck using R take a look at these R examples.

Sec 2.2: 1-17 odds (do 17 by hand)

Sec 2.3: 1, 3.

Additional Problem for 2.3: Use R to make two histograms: one of the male heights and one of the female heights in Appendix B Data Set 1 (the Mhealth and Fhealth tables in R). Include both histograms in your HW. Then write a paragraph discussing the differences between the male and female heights that you can see from the histograms (i.e., center, variation, shape, outliers, min, max). Does either histogram have a distribution that is approximately normal?
HINTS: If you are having trouble getting the book data, see the R intro lecture or look at the back of the quick reference sheet. See the top part of this page to download the data sets.
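As a rough sketch of the plotting mechanics only, the code below draws two side-by-side histograms on a shared x-axis. The heights here are simulated, made-up numbers (not the book's Mhealth or Fhealth data), and the variable names are placeholders; substitute the real height columns once the book data is loaded.

```r
# Simulated heights in inches (ASSUMPTION: made-up data, not the book tables)
set.seed(1)
male   <- rnorm(40, mean = 69, sd = 3)
female <- rnorm(40, mean = 64, sd = 2.5)

# Use the same x-axis limits so the two histograms can be compared by eye
lims <- range(c(male, female))

par(mfrow = c(1, 2))  # two plots side by side
hist(male,   xlim = lims, main = "Male heights",   xlab = "Height (in)")
hist(female, xlim = lims, main = "Female heights", xlab = "Height (in)")
par(mfrow = c(1, 1))  # restore the single-plot layout
```

Sharing the axis limits is a design choice: it makes differences in center and spread visible without reading the tick labels.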

Sec 2.4: 1-4, 9 (use R), 13 (just sketch by hand), 17 (use R), 19 (use R)
Hint for 19: to make a plot with both lines and points rather than a scatter plot, use the optional argument type="b" for the plot function, e.g.
plot(t, y, type="b")
The time vector t goes from 1990 to 2000. A quick way to make t is to use this shortcut:
t=1990:2000
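A minimal illustration of the type="b" argument, assuming made-up y values (they are not the data from problem 19):

```r
# Years 1990 through 2000 (11 values)
t <- 1990:2000

# ASSUMPTION: made-up measurements, one per year
y <- c(5, 7, 6, 9, 12, 11, 14, 13, 16, 18, 17)

# type = "b" draws both points and connecting line segments
plot(t, y, type = "b", xlab = "Year", ylab = "Value",
     main = "Points and lines with type = \"b\"")
```

The vectors t and y must have the same length, otherwise plot stops with an error.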


Thur, Jan 29
Summation Notation.

Lecture: Handout

Homework: (If you need more explanation and practice with summation notation, see this page.)
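For practice, here is a small sketch of how common summation expressions translate into R. The vector x is a made-up example.

```r
x <- c(2, 4, 6)    # made-up data: x_1 = 2, x_2 = 4, x_3 = 6

sum(x)             # sum of x_i:            2 + 4 + 6   = 12
sum(x^2)           # sum of x_i squared:    4 + 16 + 36 = 56
sum(x)^2           # (sum of x_i) squared:  12^2        = 144
sum(x - mean(x))   # deviations from the mean always sum to 0
```

Note that sum(x^2) and sum(x)^2 are different quantities; mixing them up is a common source of errors when computing variance by hand.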


Tue, Feb 3
Measures of center. (Sections 3.1-3.2)

Lecture: Handout

Homework:

Sec 3.2: 1-9 odds, 13, 15, 21, 23, 25, 29 (Use R if the problem has TI by it from now on.)
Hint for 21: to get the first set of differences use:
x=WEATHER$HIGH-WEATHER$PREDICTE
You can find the second set of differences in the same way once you figure out the correct column name. (I admit the author’s column names are not that good.)
Hint for 23: to get the pennies for before 1983:
x=Coins$WEIGHT[Coins$TYPE=="Pre1983 Pennies"]
To find the post-1983 pennies, use the same method, but take a look at the Coins table to see what they are called and then modify the above statement.
Hint for 29 b:
mean(x, trim=0.10)
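The two techniques in these hints (picking rows by category and taking a trimmed mean) can be sketched on a small made-up table. The data frame coins and its category names below are hypothetical stand-ins, not the book's Coins table.

```r
# ASSUMPTION: made-up table with the same column layout as the hint
coins <- data.frame(
  WEIGHT = c(3.1, 3.0, 3.2, 2.5, 2.4, 2.6),
  TYPE   = c("old", "old", "old", "new", "new", "new")
)

# Keep only the weights whose TYPE matches the category we want
x <- coins$WEIGHT[coins$TYPE == "old"]
mean(x)

# Trimmed mean: trim = 0.10 drops the lowest 10% and highest 10% of values
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
mean(v)               # 14.5, pulled up by the outlier 100
mean(v, trim = 0.10)  # 5.5, the mean of 2 through 9
```

With 10 values, trimming 10% removes one value from each end, which is why the outlier no longer affects the result.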



Thur, Feb 5
Measures of variation. (Section 3.3)

Lecture: Handout

Homework:
 Sec 3.3: 1-9 odds, 15, 21


Tue, Feb 10
Relative standing and exploratory data analysis. (Sections 3.4-3.5)

Lecture: Handout

Homework

Sec 3.4: 1, 5, 7, 9, 11, 13-27 odds

Sec 3.5: 1, 3, 5, 9

Additional problem: Use the following code to make two boxplots comparing bear length and bear weight by gender. Then use the boxplots to discuss and compare the distributions of lengths and weights of bears in terms of their gender. (Make sure the book data is loaded into R first.)
boxplot(Bears$LENGTH ~ Bears$SEX, main="Comparison of bear length")
boxplot(Bears$WEIGHT ~ Bears$SEX, main="Comparison of bear weight")



Thur, Feb 12
Descriptive Statistics: Case Study.
 Lecture: Handout
PROBABILITY
Probability I: Addition rule. (Sections 4.1-4.3)

Lecture: Handout

Homework:
 Sec 4.2: 1-25 odds, 29
 Sec 4.3: 1-23 odds

Tue, Feb 17
Probability II: Multiplication rule. (Sections 4.4-4.5)

Lecture: Handout

Homework:
 Sec 4.4: 1-21 odds
 Sec 4.5: 1-25 odds


Thur, Feb 19
Random variables. (Sections 5.1-5.2)

Lecture: Handout (Print out the next lecture on counting; we may cover part of it if we have time.)

Homework:
 Sec 5.2: 1-19 odds


Tue, Feb 24
MIDTERM I (Chapters 1-4)

Thur, Feb 26
Rodeo Holiday (No Classes)

Tue, Mar 3
Counting & Binomial distribution. (Sections 4.7, 5.3-5.4)

Thur, Mar 5
Intro to the normal distribution. (Sections 6.1-6.2)

Lecture: Handout

Homework

Sec 6.2: 1-4, 5-39 odds (most of these are easy if you use the pnorm and qnorm R functions). You must make sketches to show the area.
NOTE: From this point onward, if the book says to use a lookup table in Appendix A, use R instead. (You won’t be given tables on the tests.)
HINT: If you use the technique I used in class, you don’t need to find z-scores OR use the table in the back of the book. If the question refers to data that has a standard normal distribution, then it has a normal distribution with mean=0 and standard deviation=1.
For example, 6.2 #10 says to find the probability that a thermometer has a reading less than -2.50 if the readings have a standard normal distribution. Thus we want to find P(x < -2.50) where x has a standard normal distribution. In R you would type:
> pnorm(-2.50, mean=0, sd=1)
[1] 0.006209665
So the probability is only 0.00621!
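A few more patterns with the same two functions, sketched for a standard normal variable. The cutoffs 1.5, -1, 1, and 0.95 are arbitrary illustration values, not taken from any homework problem.

```r
# pnorm gives the area to the LEFT of a value (cumulative probability)
p_left <- pnorm(1.5, mean = 0, sd = 1)      # P(x < 1.5)

# Area to the RIGHT: subtract the left area from 1
p_right <- 1 - pnorm(1.5, mean = 0, sd = 1) # P(x > 1.5), about 0.0668

# Area BETWEEN two values: difference of two left areas
p_between <- pnorm(1) - pnorm(-1)           # P(-1 < x < 1), about 0.6827

# qnorm goes the other way: given a left area, it returns the cutoff value
cutoff <- qnorm(0.95, mean = 0, sd = 1)     # about 1.645
```

Since pnorm and qnorm are inverses, pnorm(qnorm(0.95)) returns 0.95. For a non-standard normal distribution, change the mean and sd arguments instead of converting to z-scores.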



Tue, Mar 10
Normal distribution cont. (Section 6.3)

Lecture: Continuation of previous lecture.

Homework
 Sec 6.3: 1, 2, 4, 5-23 odds. Make sketches to show the area.


Thur, Mar 12
INFERENTIAL STATISTICS
Sampling distributions, estimators, and the Central Limit Theorem (CLT). (Sections 6.4-6.5)

Lecture: Handout

Homework:
 Sec 6.4: 1-7 odds, 11
 Sec 6.5: 1-17 odds. Make sketches.


Tue, Mar 17 & Thur, Mar 19
Spring Break (No class)

Tue, Mar 24
Normal as approx. to the binomial and assessing normality. (Sections 6.6-6.7)

Thur, Mar 26
Estimating a population proportion. (Sections 7.1-7.2)

Lecture: Handout

Homework: (Yes, there are many problems for this HW, but these problems require practice.)
 Sec. 7.2: 1-35 odds


Tue, Mar 31
Estimating a population mean. (Sections 7.3-7.4)

Lecture: Handout

Homework: (Yes, there are many problems for this HW, but these problems require practice.)
 Sec. 7.3: 1-23 odds, 27, 29, 33
 Sec. 7.4: 1-13 odds, 19, 21, 23


Thur, April 2
HYPOTHESIS TESTING
Intro to hypothesis testing. (Sections 8.1-8.2)

Lecture: Handout

Homework:

Sec. 8.2: 1-43 odds (skip 17-23). You don’t need to find critical values. However, if the book asks you to find the test statistic, find that using the equation.
Hint for 29-36: If you have the test statistic and it’s a z-score, then use the cumulative probability distribution for the standard normal, pnorm, and find the tail area. See the section on p-values in the notes; it also discusses what to do.
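As a sketch of the tail-area idea, assuming a made-up z test statistic of 1.85 (not from any homework problem), the three kinds of p-value can be computed like this:

```r
z <- 1.85   # ASSUMPTION: hypothetical z test statistic

# Left-tailed test: area to the left of z
p_left <- pnorm(z)

# Right-tailed test: area to the right of z
p_right <- 1 - pnorm(z)

# Two-tailed test: twice the area in the tail beyond |z|
p_two <- 2 * pnorm(-abs(z))
```

Which of the three applies depends on the alternative hypothesis (less than, greater than, or not equal). Using -abs(z) in the two-tailed case means the same line works whether z is positive or negative.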



Tue, April 7
Testing a claim about a proportion (Section 8.3)

Lecture: Continuation of last lecture handout.

Homework:

Sec. 8.3: 1-3 odds, 5(c,d,e), 9, 15, 19, 23
Note 1: You do not need to find the test statistic or critical values. We are using the p-values.
Note 2: R uses the continuity correction for more accurate p-values. Your p-values and test statistics will differ from the book’s answers by a few percent. The following are a few of the p-values you will get with R to help you verify your work: Q5: p-value = 0.9114, Q9: p-value < 2.2e-16, Q15: p-value = 0.5395.
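To see the effect of the continuity correction, here is a sketch using R's prop.test function with made-up counts (58 successes in 100 trials, testing against p = 0.5). This assumes prop.test is the function being used for these problems; check the lecture handout if you were shown a different one.

```r
x <- 58    # ASSUMPTION: made-up number of successes
n <- 100   # ASSUMPTION: made-up sample size

# Default: continuity correction is applied (correct = TRUE)
with_cc <- prop.test(x, n, p = 0.5, alternative = "greater")

# Same test with the correction turned off, closer to a hand calculation
no_cc <- prop.test(x, n, p = 0.5, alternative = "greater", correct = FALSE)

with_cc$p.value   # slightly larger
no_cc$p.value     # matches 1 - pnorm(1.6) for these counts
```

The correction shrinks the distance between the observed and expected counts by 0.5 before computing the statistic, so the corrected p-value comes out a little larger.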



Thur, April 9
Testing a claim about a mean. (Sections 8.4-8.5)

Lecture: Handout

Homework:
 Sec. 8.4: 1-7 odds, 13, 15
 Sec. 8.5: 3-13 odds, 21, 25, 27, 31


Tue, April 14
Understanding tests and estimates
 Lecture: Handout
 Homework: Study for the exam. I won’t accept any email questions after 5 pm the night before the exam. Don’t start studying the night before the test.

Thur, April 16
MIDTERM II (Chapters 5-8 and 4.7)

Tue, April 21
Inferences about two proportions. (Sections 9.1-9.2)

Lecture: Handout

Homework:

Sec. 9.2: 1-7 odds, 15, 17, 19, 21, 25
Note that R uses the continuity correction, so the p-values will differ by a few percent from the book’s.



Thur, April 23
Inferences about two means & matched pairs. (Sections 9.3-9.4)

Lecture: Handout

Homework:

Sec. 9.3: 1-7 odds, 23, 25, 27, 28
Hint for 27: Use the Coins table. To get the quarters for before 1964:
pre=Coins$WEIGHT[Coins$TYPE=="Pre1964 Quarters"]
To find the post-1964 quarters, use the same method, but take a look at the Coins table to see what they are called and then modify the above statement. The command summary(Coins) is helpful to find the categories.
Hint for 28: Use the Cola table. Just figure out which two columns you need.
Sec. 9.4: 1, 3, 5 (manually find the test statistic & p-value only), 13, 15, 17 (bc), 19
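The workflow for comparing two groups stored in one table can be sketched as follows. The table coins, its weights, and its category names are made up for illustration; they are not the book's Coins table. The use of t.test here is also an assumption, so follow the lecture handout if it shows a different function.

```r
# ASSUMPTION: made-up weights in grams and made-up category names
coins <- data.frame(
  WEIGHT = c(6.2, 6.1, 6.3, 6.0, 6.2, 5.7, 5.6, 5.7, 5.6, 5.8),
  TYPE   = factor(rep(c("old", "new"), each = 5))
)

levels(coins$TYPE)   # lists the category names stored in the table

# Pull out the weights for each category
old_w <- coins$WEIGHT[coins$TYPE == "old"]
new_w <- coins$WEIGHT[coins$TYPE == "new"]

# Two-sample t test; by default R does not assume equal variances
res <- t.test(old_w, new_w)
res$p.value    # p-value for H0: the two population means are equal
res$conf.int   # confidence interval for the difference in means
```

For matched pairs, t.test also accepts paired = TRUE, in which case the two vectors must have the same length and the same ordering.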



Tue, April 28
MODELING AND TESTING RELATIONSHIPS
Correlation. (Sections 10.1-10.2)

Lecture: Handout

Homework: Include a scatter plot for each set of data for which you find r.
 Sec. 10.2: 1-11 odds, 21, 23, 27, 29, 31, 33, 35


Thur, April 30
Regression (Section 10.3)

Lecture: Handout

Homework: Make sure to determine if r is significant first (via hypothesis test) SHOW WORK. Include scatter plots with regression line and residual plots.

Sec. 10.3: 1-11 odds, 21, 23, 27, 29, 31, 33
Hint for 5 and 7: you will need to determine if the linear correlation coefficient is significant. Since r and n are already given, just use the test statistic equation to manually find the p-value.
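A sketch of that manual calculation, assuming the usual t test statistic for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. Confirm that this matches the equation in the lecture notes. The values of r and n below are made up.

```r
r <- 0.6   # ASSUMPTION: made-up sample correlation coefficient
n <- 20    # ASSUMPTION: made-up number of data pairs

# Test statistic for H0: the population correlation is zero
t_stat <- r * sqrt((n - 2) / (1 - r^2))

# Two-tailed p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

t_stat    # about 3.18
p_value   # below 0.05, so r would be judged significant at that level
```

When the raw data are available, cor.test(x, y) reports the same test statistic and p-value directly.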

Variation and prediction intervals, multiple regression. (Sections 10.4-10.5)
 Lecture: Continuation of regression lecture.
 Homework: No HW. These sections are optional course material. However, I highly recommend you read them.


Tue, May 5
Contingency tables (Section 11.3)

Lecture: Handout

Homework: Note that R uses Yates’ continuity correction, so your p-values may differ slightly from the book’s.
 Sec. 11.3: 1-5, 7, 11, 13, 17, 21
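To illustrate the correction, here is a sketch with a made-up 2x2 contingency table passed to R's chisq.test function. The counts and labels are hypothetical, and using chisq.test is an assumption about how these problems are done in class.

```r
# ASSUMPTION: made-up 2x2 table of counts
tab <- matrix(c(30, 20,
                15, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"), Outcome = c("Yes", "No")))

# Default for 2x2 tables: Yates' continuity correction is applied
p_corrected <- chisq.test(tab)$p.value

# Correction turned off, which is closer to a textbook hand calculation
p_uncorrected <- chisq.test(tab, correct = FALSE)$p.value

p_corrected     # slightly larger
p_uncorrected
```

In chisq.test the correction only applies to 2x2 tables, so for larger tables the correct argument makes no difference.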


Thur, May 7
ANOVA I (Section 12.1)

Lecture: Handout

Homework:
 Sec. 12.2: 1-4, 5 (skip d), 9
ANOVA II (Section 12.2)

Lecture: Continuation of last lecture handout.

Homework: Don’t type in the data for 11-14 manually; download the Chapter 12 Data File and load it into R (just like the book data). It has data for each problem. The table name is listed next to each problem. Also, don’t forget to include the boxplots.
 Sec. 12.2: 11 car.crash (pval=0.421), 12 car.crash (pval=0.296), 13 stress (pval=0.091), 14 skulls (pval=0.0305), 16 (pval=0.0369)
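A sketch of a one-way ANOVA with boxplots on made-up data. The table d and its columns are hypothetical and do not come from the Chapter 12 data file. Fitting with lm and anova is one common way to run the test in R, not necessarily the one shown in the lecture handout.

```r
# ASSUMPTION: made-up measurements for three groups of three
d <- data.frame(
  value = c(10, 12, 11, 14, 15, 13, 20, 19, 21),
  group = factor(rep(c("a", "b", "c"), each = 3))
)

# Side-by-side boxplots, one per group
boxplot(value ~ group, data = d, main = "Comparison of groups")

# One-way ANOVA table: tests H0 that all group means are equal
tab <- anova(lm(value ~ group, data = d))
tab

f_stat  <- tab[["F value"]][1]   # F test statistic (63 for this data)
p_value <- tab[["Pr(>F)"]][1]    # p-value for the F test
```

A small p-value indicates that at least one group mean differs, but it does not say which one; the boxplots help with that.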


Tue, May 12
Review / Questions
 Lecture: Handout
 Homework: Study for the final exam.

Thur, May 14
Review / Questions

Tue, May 19
FINAL EXAM Chapters 1-12 (2 hours - early class start time)
8:10 am to 10:10 am
If you would like your final exam back, turn in a self-addressed stamped envelope, with 2 first-class stamps affixed, along with your final exam. Once the exams are graded I will mail back those I have envelopes for. If you don’t provide an envelope, your exam will be shredded for your privacy.