# Final Exam

Date submitted: 12/15/2015

## 25. Computing the sample correlation coefficient and the coefficients for the least-squares regression line

In ongoing economic analyses, the federal government compares per capita incomes not only among different states but also for the same state at different times. Typically, what the federal government finds is that “poor” states tend to stay poor and “wealthy” states tend to stay wealthy. Would we have gotten information about the […] per capita income for a state (denoted by $y$) from its […] per capita income (denoted by $x$)? The following bivariate data give the per capita income (in thousands of dollars) for a sample of sixteen states in the years […] and […] (source: U.S. Bureau of Economic Analysis, Survey of Current Business, May […]). The data are plotted in the scatter plot in Figure 1.

| State | […] per capita income, $x$ (in thousands of dollars) | […] per capita income, $y$ (in thousands of dollars) |
|---|---|---|
| Illinois | 11.1 | 31.3 |
| Idaho | 8.7 | 23.4 |
| South Carolina | 7.8 | 23.5 |
| Louisiana | 8.8 | 22.8 |
| Arkansas | 7.6 | 22.1 |
| Hawaii | 11.5 | 27.8 |
| South Dakota | 8.1 | 25.1 |
| Arizona | 9.6 | 25.3 |
| Mississippi | 7.1 | 20.5 |
| North Dakota | 8.1 | 23.5 |
| Utah | 8.5 | 23.4 |
| Georgia | 8.5 | 27.2 |
| West Virginia | 8.2 | 20.9 |
| Virginia | 10.2 | 29.5 |
| Tennessee | 8.3 | 25.6 |
| Kansas | 10.0 | 26.6 |

[Figure 1: scatter plot of the data, with $x$ on the horizontal axis and $y$ on the vertical axis]

The least-squares regression line for these data has a slope of approximately […]. Answer the following. Carry your intermediate computations to at least four decimal places. (If necessary, consult a list of formulas.)

What is the value of the y-intercept of the least-squares regression line for these data? Round your answer to at least two decimal places.

What is the value of the sample correlation coefficient for these data? Round your answer to at least three decimal places.

Additional Resources: Elementary Statistics (A Brief Version), 6th Ed., Bluman. Chapter 10: Correlation and Regression, Section 10.2: Regression.

Bivariate data give insight into the relationship between two quantitative variables.
In this scenario, the two variables are the […] per capita income, denoted by $x$, and the […] per capita income, denoted by $y$. Looking at the sample data for these two variables plotted in the scatter plot in Figure 1, we see evidence of a positive linear relationship between the variables: as the $x$ per capita incomes increase, the corresponding $y$ per capita incomes tend to increase, and this relationship looks like it could be modeled nicely by a straight line.

The value of the y-intercept: A method called simple linear regression can be used to estimate and analyze the linear relationship between the two variables. The linear relationship is estimated via a line that “best fits” the sample data points. This “best-fitting” line is the one that minimizes the sum of the squares of the vertical distances of the data points from the line and is called the least-squares regression line for the data. As with any other line, the least-squares regression line is completely specified by its slope and its y-intercept. The slope of the least-squares regression line is often denoted by $b_1$, and the y-intercept is often denoted by $b_0$. The relationship between the slope and the y-intercept of the least-squares regression line is

$$b_0 = \bar{y} - b_1 \bar{x}.$$

(In this equation, $\bar{x}$ is the sample mean of the $x$ values and $\bar{y}$ is the sample mean of the $y$ values.) For our data, $\bar{x}$ and $\bar{y}$ are computed to be approximately 8.8813 and 24.9063, respectively, and the slope of the least-squares regression line is given as […]. Thus, we have

$$b_0 = \bar{y} - b_1 \bar{x} \approx 7.50.$$

The value of the sample correlation coefficient: A statistic that is used to quantify the strength and direction of the linear relationship between two variables is the sample correlation coefficient, usually denoted $r$. The sample correlation coefficient depends on the sample data but is always a number from $-1$ to $1$.
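
As a hedged illustration of the intercept relationship $b_0 = \bar{y} - b_1 \bar{x}$ described above, the following Python sketch recomputes the sample means from the sixteen data pairs in the table. The variable names are illustrative. The slope value is an assumption: the slope stated in the problem is not preserved in this copy, so the sketch uses the least-squares slope computed from the data and rounded to two decimal places.

```python
from statistics import mean

# Per capita incomes (in thousands of dollars) for the sixteen states in the table.
x = [11.1, 8.7, 7.8, 8.8, 7.6, 11.5, 8.1, 9.6,
     7.1, 8.1, 8.5, 8.5, 8.2, 10.2, 8.3, 10.0]
y = [31.3, 23.4, 23.5, 22.8, 22.1, 27.8, 25.1, 25.3,
     20.5, 23.5, 23.4, 27.2, 20.9, 29.5, 25.6, 26.6]

x_bar = mean(x)
y_bar = mean(y)

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope_exact = s_xy / s_xx

# Assumption: the "given" slope is the exact slope rounded to two decimals.
b1 = round(slope_exact, 2)

# Intercept from b0 = y_bar - b1 * x_bar.
b0 = y_bar - b1 * x_bar
b0_unrounded_slope = y_bar - slope_exact * x_bar

print(f"b1 = {b1}, b0 = {b0:.2f}, b0 with unrounded slope = {b0_unrounded_slope:.2f}")
```

Under this assumption the intercept rounds to 7.50, which agrees with the stated answer. Using the unrounded slope would give roughly 7.47, so the answer key appears to work from the rounded slope.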
If $r$ is positive, then the sample data indicate a positive linear relationship between the two variables; if $r$ is negative, then the sample data indicate a negative linear relationship between the two variables. The closer that $r$ is to the extreme values $-1$ and $1$, the stronger the linear relationship indicated by the sample data. When $r = 0$, the sample data indicate no linear relationship between the two variables. These facts can be summarized as follows: $r = -1$ indicates a perfect negative linear relationship, $r = 0$ indicates no linear relationship, and $r = 1$ indicates a perfect positive linear relationship; the negative linear relationship strengthens as $r$ moves from $0$ toward $-1$, and the positive linear relationship strengthens as $r$ moves from $0$ toward $1$.

The value of the sample correlation coefficient $r$ is related to the slope $b_1$ of the regression line by the formula

$$r = b_1 \frac{s_x}{s_y},$$

in which $s_x$ is the sample standard deviation of the $x$ values and $s_y$ is the sample standard deviation of the $y$ values. For the data in this problem, with $b_1$ given as […] and $s_x$ and $s_y$ computed to be approximately 1.2550 and 3.0048, respectively, we have

$$r = b_1 \frac{s_x}{s_y} \approx 0.819.$$

[Figure 2: the scatter plot of Figure 1 with the least-squares regression line drawn through the data]

The least-squares regression line for these data (the line having slope […] and y-intercept 7.50) is shown in Figure 2. The positive relationship between the $x$ per capita incomes and the $y$ per capita incomes is clearly seen in the slope of the line. That the relationship is positive is also seen in the value of $r$, which is greater than zero.

Note: It may seem in Figure 2 that the y-intercept is not 7.50. This is because the horizontal axis in the figure has been shifted to account for the large numbers on the axis.

The answer is:

| Question | Answer |
|---|---|
| What is the value of the y-intercept of the least-squares regression line for these data? Round your answer to at least two decimal places. | 7.50 |
| What is the value of the sample correlation coefficient for these data? Round your answer to at least three decimal places. | 0.819 |

## 1. Discrete versus continuous variables

Which of the following variables are best thought of as continuous, and which as discrete?
Indicate your choice for each by checking the appropriate column.

| Variable | Discrete | Continuous |
|---|---|---|
| (a) The number of occupied tables at Zito’s Cafe at […] p.m. next Friday | | |
| (b) The body temperature measurement of a participant in a lie-detector test | | |
| (c) The number of students, in a class of […], who improve their score from the first midterm to the second midterm | | |
| (d) The number of tickets purchased by a caller on a Rose Bowl ticket hotline | | |

Additional Resources: Elementary Statistics (A Brief Version), 6th Ed., Bluman. Chapter 1: The Nature of Probability and Statistics, Section 1.2: Variables and Types of Data. Lecture: Identify types of data.

Variables are classified as categorical or quantitative depending on the values that they can take. Categorical variables are variables whose values are categories, and quantitative variables are variables whose values are numbers. Quantitative variables are further classified as discrete or continuous, and in this problem we focus on this latter classification.

A discrete variable is one whose possible values can be counted, meaning that its possible values correspond to some subset of the whole numbers. Thus, if the variable can take on only whole-numbered values, then it is discrete. The number of keystrokes used to type an email message is a discrete variable because it can take on only whole-numbered values.

A continuous variable is one whose possible values make up an interval of real numbers. For any continuous variable, the possible values include some range of numbers without any gaps. The time spent studying for the next statistics test is a continuous variable because it could take on any value in a range of numbers, with no numbers in the range excluded. The time spent studying could be any real number of minutes within such a range. It is not necessary that the values be whole numbers.
Note also that it doesn’t matter whether time is measured in minutes or some other units. The possible values will make up an entire interval of numbers no matter the units.

Discrete variables often arise when quantities are being counted (“the number of”), and continuous variables often arise when quantities are being measured. With these considerations in mind, we can classify the four variables in the problem:

(a) The number of occupied tables at Zito’s Cafe at […] p.m. next Friday: The number of occupied tables must be a whole number, anywhere from zero up to the total number of tables at the cafe. It couldn’t be a number in between two whole numbers. Thus, the variable is discrete.

(b) The body temperature measurement of a participant in a lie-detector test: The value of the temperature could be any number of degrees Fahrenheit within a range. In fact, there are no real numbers in that range that are excluded as possible values for the temperature in degrees Fahrenheit. (If some other temperature units are chosen, a similar comment applies.) Thus, the variable is continuous.

(c) The number of students, in a class of […], who improve their score from the first midterm to the second midterm: The number of students who improve must be a whole number. In particular, it must be one of the whole numbers from zero up to the class size. (It can’t be a number between two whole numbers.) This means that the variable is discrete.

(d) The number of tickets purchased by a caller on a Rose Bowl ticket hotline: The number of tickets purchased is necessarily a whole number. (It is impossible to purchase a fractional number of tickets.) This means that the variable is discrete.

The answer is:

| Variable | Discrete | Continuous |
|---|---|---|
| (a) The number of occupied tables at Zito’s Cafe at […] p.m. next Friday | ✓ | |
| (b) The body temperature measurement of a participant in a lie-detector test | | ✓ |
| (c) The number of students, in a class of […], who improve their score from the first midterm to the second midterm | ✓ | |
| (d) The number of tickets purchased by a caller on a Rose Bowl ticket hotline | ✓ | |

## 4. Interpreting relative frequency histograms

A realtor is displaying the histogram below, which summarizes the percentage of appreciation in value (over the past five years) for each of a sample of houses in Newburg Park.

[Histogram: frequency against appreciation (in percent), with classes 0 up to 10, 10 up to 20, 20 up to 30, 30 up to 40, and 40 up to 50 having frequencies 3, 3, 5, 9, and 5]

Based on the histogram, find the proportion of appreciation percentages in the sample that are less than […] percent. Write your answer as a decimal, and do not round your answer.

Additional Resources: Elementary Statistics (A Brief Version), 6th Ed., Bluman. Chapter 2: Frequency Distributions and Graphs, Section 2.2: Histograms, Frequency Polygons, and Ogives.

A histogram is a useful device for displaying information regarding the number of observations in a data set that fall within certain classes of values. Some histograms, such as the one in the problem, display the frequencies of the classes, and others display the relative frequencies. A relative frequency is a frequency expressed as a proportion (or a percentage) of the total number of measurements. The relative frequency for a class is found by dividing the frequency for the class by the total number of measurements. The relative frequency for a class can be thought of as the proportion (or percentage) of measurements that fall into that class.

The relative frequencies for the classes in the histogram in the problem are shown in the far-right column of Table 1. The sample contains 25 measurements, so the relative frequencies for the classes are simply the class frequencies divided by 25.
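
The division just described can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionary layout and names are assumptions, and because the cutoff in the question is not preserved in this copy, the sketch tabulates the proportion below every class boundary rather than assuming one.

```python
# Class frequencies read from the histogram (lower class limit -> frequency).
frequencies = {0: 3, 10: 3, 20: 5, 30: 9, 40: 5}
class_width = 10

total = sum(frequencies.values())  # total number of measurements in the sample

# Relative frequency of each class: class frequency divided by the total.
relative = {lower: count / total for lower, count in frequencies.items()}

# Proportion of measurements below each upper class boundary, built from
# running counts so that no floating-point error accumulates.
proportion_below = {}
running = 0
for lower in sorted(frequencies):
    running += frequencies[lower]
    proportion_below[lower + class_width] = running / total

for lower, rel in relative.items():
    print(f"{lower} up to {lower + class_width}: {rel:.2f}")
print(proportion_below)
```

Summing relative frequencies and dividing a summed frequency by the total give the same proportion; the sketch uses the second route because it keeps the arithmetic exact until the final division.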
| Appreciation (percent) | Frequency | Relative frequency |
|---|---|---|
| 0 up to 10 | 3 | 3/25 = 0.12 |
| 10 up to 20 | 3 | 3/25 = 0.12 |
| 20 up to 30 | 5 | 5/25 = 0.20 |
| 30 up to 40 | 9 | 9/25 = 0.36 |
| 40 up to 50 | 5 | 5/25 = 0.20 |

Table 1

We’re asked to find the proportion of appreciation percentages in the sample that are less than […] percent. This proportion is the sum of the relative frequencies of the classes containing measurements less than […] percent. Four classes in Table 1 contain such measurements, and the sum of the relative frequencies for these classes is […]. Thus, that proportion of the appreciation percentages in the sample is less than […] percent.

Remark: We could obtain this answer in another way. Namely, we could find the number of appreciation percentages in the sample that are less than […] percent and then express this frequency as a relative frequency. The number of appreciation percentages that are less than […] percent is the sum of the frequencies of the classes containing measurements less than […] percent. These are the same four classes as before. The sum of their frequencies is […], so that many of the 25 appreciation percentages in the sample are less than […] percent. Dividing this count by 25 gives the same proportion.

The answer is […].

## 14. Central limit theorem: Sample mean

According to records, the amount of precipitation in a certain city on a November day has a mean of […] inches, with a standard deviation of […] inches. What is the probability that the mean daily precipitation will be […] inches or more for a random sample of […] November days (taken over many years)? Carry your intermediate computations to at least four decimal places. Round your answer to at least three decimal places.

Additional Resources: Elementary Statistics (A Brief Version), 6th Ed., Bluman. Chapter 6: The Normal Distribution, Section 6.3: The Central Limit Theorem. Exercise: Use the central limit theorem to solve problems involving sample means for large samples.

We’re asked about a probability involving the mean of a sample of precipitation amounts, given that the sample was drawn from a population having a specified mean and standard deviation. We’ll use $\bar{x}$ (as we always do for sample means) to denote the mean of a sample of $n$ precipitation amounts taken from such a population. Note that the value of $\bar{x}$ depends on the particular sample of precipitation amounts chosen: different samples of $n$ precipitation amounts may give different values of $\bar{x}$. (As such, $\bar{x}$ is a random variable that depends on the sample.) We must find the probability that $\bar{x}$ is at least […], which is the probability that the mean precipitation amount for […] randomly chosen November days is […] inches or more.

The sample is drawn from a population with mean $\mu$ = […] and standard deviation $\sigma$ = […]. We do not know the shape of the population distribution, though, so it would seem that we have too little information about the distribution of $\bar{x}$ to calculate this probability. However, there is an important result called the central limit theorem that applies in this situation. Because the sample size is large enough, regardless of the shape of the population distribution, the distribution of $\bar{x}$ is approximately normal with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. (Note: we will use the exact value of $\sigma_{\bar{x}}$ in the calculations below rather than a rounded approximation in order to minimize rounding errors.) Therefore, the variable

$$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$$

follows approximately the standard normal distribution. Thus, writing $c$ for the cutoff of […] inches,

$$P(\bar{x} \geq c) = P\!\left(Z \geq \frac{c - \mu}{\sigma / \sqrt{n}}\right).$$

The answer is […].

## 16. Confidence interval for the population mean: Use of the standard normal

The lifetime of a certain brand of battery is known to have a standard deviation of 19.8 hours. Suppose that a random sample of 100 such batteries has a mean lifetime of 34.1 hours.
Based on this sample, find a 95% confidence interval for the true mean lifetime of all batteries of this brand. Then complete the table below. Carry your intermediate computations to at least three decimal places. Round your answers to one decimal place. (If necessary, consult a list of formulas.)

What is the lower limit of the 95% confidence interval?

What is the upper limit of the 95% confidence interval?

The correct answer is:

| Question | Answer |
|---|---|
| What is the lower limit of the 95% confidence interval? | 30.2 |
| What is the upper limit of the 95% confidence interval? | 38.0 |

## 21. Confidence interval for the difference of population means: Use of the standard normal

The human resources department of a consulting firm gives a standard creativity test to a randomly selected group of new hires every year. This year, […] new hires took the test and scored a mean of […] points with a standard deviation of […]. Last year, […] new hires took the test and scored a mean of […] points with a standard deviation of […]. Assume that the population standard deviations of the test scores of all new hires in the current year and the test scores of all new hires last year can be estimated by the sample standard deviations, as the samples used were quite large. Construct a 99% confidence interval for $\mu_1 - \mu_2$, the difference between the mean test score of new hires from the current year and the mean test score of new hires from last year. Then complete the table below. Carry your intermediate computations to at least three decimal places. Round your answers to at least two decimal places. (If necessary, consult a list of formulas.)

What is the lower limit of the 99% confidence interval?

What is the upper limit of the 99% confidence interval?
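
Both confidence-interval questions above follow the same recipe when the standard normal distribution applies: point estimate, plus or minus a critical value times a standard error. The following Python sketch is one way to express that recipe. The function names `z_interval` and `two_sample_std_error` are hypothetical, and only the battery data from question 16 are used as a check, since the inputs for question 21 are not preserved in this copy.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(estimate, std_error, confidence):
    """Return (lower, upper) for estimate +/- z * std_error."""
    # Two-sided critical value, for example about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error
    return estimate - margin, estimate + margin


def two_sample_std_error(s1, n1, s2, n2):
    """Standard error of the difference of two independent sample means."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


# Question 16: sigma = 19.8 hours, n = 100 batteries, sample mean = 34.1 hours.
lower, upper = z_interval(34.1, 19.8 / sqrt(100), 0.95)
print(f"{lower:.1f} to {upper:.1f}")  # prints "30.2 to 38.0"
```

For a difference of means, the same `z_interval` call would take the difference of the two sample means as the estimate and the value returned by `two_sample_std_error` as the standard error, with a confidence of 0.99.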
The correct answer is:

| Question | Answer |
|---|---|
| What is the lower limit of the 99% confidence interval? | −12.28 |
| What is the upper limit of the 99% confidence interval? | 0.68 |

## 25. Computing the sample correlation coefficient and the coefficients for the least-squares regression line

An advertising firm wishes to demonstrate to its clients the effectiveness of the advertising campaigns it has conducted. The following bivariate data on twelve recent campaigns, including the cost of each campaign (denoted by $x$, in millions of dollars) and the resulting percentage increase in sales (denoted by $y$) following the campaign, were presented by the firm. A scatter plot of the data is shown in Figure 1.

| Campaign cost, $x$ (in millions of dollars) | Increase in sales, $y$ (percent) |
|---|---|
| 2.20 | 6.78 |
| 1.60 | 6.27 |
| 3.19 | 6.50 |
| 4.03 | 6.98 |
| 1.24 | 6.38 |
| 2.23 | 6.46 |
| 2.29 | 6.58 |
| 1.64 | 6.62 |
| 3.60 | 6.86 |
| 2.88 | 6.93 |
| 2.96 | 6.52 |
| 3.67 | 6.77 |

[Figure 1: scatter plot of the data, with $x$ on the horizontal axis and $y$ on the vertical axis]

The least-squares regression line for these data has a slope of approximately […]. Answer the following. Carry your intermediate computations to at least four decimal places. (If necessary, consult a list of formulas.)

What is the value of the y-intercept of the least-squares regression line for these data? Round your answer to at least two decimal places.

What is the value of the sample correlation coefficient for these data? Round your answer to at least three decimal places.
The correct answer is:

| Question | Answer |
|---|---|
| What is the value of the y-intercept of the least-squares regression line for these data? Round your answer to at least two decimal places. | 6.16 |
| What is the value of the sample correlation coefficient for these data? Round your answer to at least three decimal places. | 0.718 |
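
The two answers for the advertising data can be checked with a short Python sketch that applies the relationships $b_0 = \bar{y} - b_1 \bar{x}$ and $r = b_1 s_x / s_y$ to the twelve data pairs. The variable names are illustrative. One assumption is made and marked in the code: the slope stated in the problem is not preserved in this copy, so the sketch takes it to be the least-squares slope rounded to two decimal places.

```python
from math import sqrt
from statistics import mean, stdev

# Campaign cost x (millions of dollars) and sales increase y (percent).
x = [2.20, 1.60, 3.19, 4.03, 1.24, 2.23, 2.29, 1.64, 3.60, 2.88, 2.96, 3.67]
y = [6.78, 6.27, 6.50, 6.98, 6.38, 6.46, 6.58, 6.62, 6.86, 6.93, 6.52, 6.77]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Assumption: the problem's "given" slope is the exact slope rounded to 2 places.
b1 = round(s_xy / s_xx, 2)

# Intercept from b0 = y_bar - b1 * x_bar.
b0 = y_bar - b1 * x_bar

# Correlation from r = b1 * s_x / s_y, using sample standard deviations.
r_from_slope = b1 * stdev(x) / stdev(y)

# Direct definition of r, with no rounding of the slope, for comparison.
r_exact = s_xy / sqrt(s_xx * s_yy)

print(f"b1={b1}, b0={b0:.2f}, r={r_from_slope:.3f}, r_exact={r_exact:.3f}")
```

Under the stated assumption the sketch gives an intercept of 6.16 and a correlation of 0.718, in agreement with the answer key. The direct computation without rounding the slope gives about 0.705, which shows how sensitive the final digits are to rounding the slope before it is used.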