Advanced Statistical Analysis

Confidence Intervals

The confidence intervals for our most important variables are as follows. We use Z-table values for our intervals because both samples are large: 58 students and 57 residents responded to our survey.

Proportion of students whose favorite entertainment activity is movies:

P (+ / -) Z * S.E.

95% Confidence Level Z = 1.96
S.E. = sq rt (P * (1 - P)) / n = sq rt (0.4655 * (1 - 0.4655)) / 58 = 0.06550
Confidence Interval = 0.4655 (+ / -) 1.96 * (0.06550)
Confidence Interval = 33.712% to 59.388%

This interval shows, with a 95% confidence level, that between 33.712% and 59.388% of Kalamazoo College students consider movies to be their favorite type of entertainment that they spend money on.

Proportion of residents whose favorite entertainment activity is movies:

P (+ / -) Z * S.E.

95% Confidence Level Z = 1.96
S.E. = sq rt (P * (1 - P)) / n = sq rt (0.2807 (1 - 0.2807)) / 57 = 0.05952
Confidence Interval = 0.2807 (+ / -) 1.96 * (0.05952)
Confidence Interval = 16.404% to 39.736%

This interval shows, with a 95% confidence level, between 16.404% and 39.736% of Kalamazoo residents consider movies to be their favorite type of entertainment that they spend money on.
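The two proportion intervals above can be reproduced with a short sketch. The function name `proportion_ci` is our own choice for illustration, and the printed values may differ from the hand calculations in the last digit because the standard error is not rounded before it is used.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Return (lower, upper) for p +/- z * sqrt(p * (1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Sample proportions and sample sizes taken from the report.
students = proportion_ci(0.4655, 58)
residents = proportion_ci(0.2807, 57)

print(f"students:  {students[0]:.3%} to {students[1]:.3%}")
print(f"residents: {residents[0]:.3%} to {residents[1]:.3%}")
```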

Confidence interval for average income of students:

X (+ / -) Z * S.E.

95% Confidence Level Z = 1.96
S.E. = S / sq rt n = 55,542.204 / sq rt 58 = 7,293.04868
Confidence Interval = 118,103.4483 (+ / -) 1.96 * (7,293.04868)
Confidence Interval = $103,809.07289 to $132,397.82371

This interval shows, with a 95% confidence level, that the average income of the households in which Kalamazoo College students reside is between $103,809.07289 and $132,397.82371.

Confidence interval for average income of residents:

X (+ / -) Z * S.E.

95% Confidence Level Z = 1.96
S.E. = S / sq rt n = 32443.154/ sq rt 57 = 4,297.20072
Confidence Interval = 60,631.5790 (+ / -) 1.96 * (4,297.20072)
Confidence Interval = $52,209.0656 to $69,054.0924

This interval shows, with a 95% confidence level, that the average household income for Kalamazoo residents will be between $52,209.0656 and $69,054.0924.

Confidence interval for student expenditures in the past week on favorite entertainment activity:

X (+ / -) Z * S.E.

95% Confidence Level Z = 1.96
S.E. = S / sq rt n =16.55744 / sq rt 58 = 2.17410
Confidence Interval = 9.28879 (+ / -) 1.96 * (2.17410)
Confidence Interval = $5.0276 to $13.5500

This interval shows, with a 95% confidence level, that the average entertainment expenditures (on favorite entertainment activity) within the past week of Kalamazoo College students will be between $5.0276 and $13.5500.

Confidence interval for resident expenditures in the past week on favorite entertainment activity:

X (+ / -) Z * S.E.

95% Confidence Level Z = 1.96
S.E. = S / sq rt n = 331.27972 / sq rt 57 = 43.87907
Confidence Interval = 76.17544 (+ / -) 1.96 * (43.87907)
Confidence Interval = $-9.82754 to $162.17842

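This interval shows, with a 95% confidence level, that the average entertainment expenditures (on favorite entertainment activity) within the past week of Kalamazoo residents will be between $-9.82754 and $162.17842. Because spending cannot be negative, the lower bound is effectively $0. The interval is this wide because the standard deviation of resident spending ($331.27972) is much larger than the mean ($76.17544).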
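The four intervals for means follow the same pattern, so they can be checked together. The sketch below uses only the sample means, standard deviations, and sample sizes quoted above. The helper name `mean_ci` and the dictionary labels are ours, not part of the original analysis.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Return (lower, upper) for mean +/- z * sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

# (sample mean, sample standard deviation, sample size) from the report
samples = {
    "student income": (118103.4483, 55542.204, 58),
    "resident income": (60631.5790, 32443.154, 57),
    "student weekly spending": (9.28879, 16.55744, 58),
    "resident weekly spending": (76.17544, 331.27972, 57),
}

intervals = {label: mean_ci(*stats) for label, stats in samples.items()}
for label, (low, high) in intervals.items():
    print(f"{label}: ${low:,.2f} to ${high:,.2f}")
```

The resident spending interval has a negative lower bound, which is a sign that the normal approximation is strained by a very skewed sample.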

 

Hypothesis Testing of Means Or Proportions:

1) The first study we looked at was conducted by the U.S. Department of Labor: Bureau of Labor Statistics [3]. It stated that non-college graduates account for 60.5% of total expenditures on entertainment. Information for this hypothesis test can be found in Appendix C.

Null Hypothesis (Ho): Our non-college graduates make up 60.5% of the average total expenditures on entertainment.
Alternative Hypothesis (Ha): Our non-college graduates do not make up 60.5% of the average total expenditures on entertainment.

95% confidence level Z = 1.96
S.E. = sq rt (P * (1 - P)) / n = sq rt (0.605 * (0.395)) / 115 = 0.045586
Critical Region = P (+ / - ) Z * (S.E.)
Critical Region = 0.605 (+ / -) 1.96 * 0.045586
Critical Region = 51.5652% to 69.4348%

With a 95% confidence level we can safely reject our null hypothesis. In our sample, non-college graduates account for only about 4.1935% of the average total expenditures, far below the critical region. This rejection can be attributed primarily to the $100,000 motor home purchased by a college graduate. If this outlier is removed from our statistics, our result is almost exactly what the U.S. Department of Labor: Bureau of Labor Statistics stated in its report: without the outlier, non-college graduates make up 60.6407% of the average total expenditures on entertainment.
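As a rough check of this decision, the sketch below rebuilds the interval around the hypothesized proportion and tests both sample shares against it. It follows the report's use of the term "critical region" for the interval inside which the null hypothesis is kept. The function and variable names are our own.

```python
import math

def critical_region(p0, n, z=1.96):
    """Interval p0 +/- z * S.E. around the hypothesized proportion p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return p0 - z * se, p0 + z * se

low, high = critical_region(0.605, 115)

# Sample shares reported in the text, with and without the motor home.
shares = {"with motor home": 0.041935, "without motor home": 0.606407}
decisions = {
    label: ("do not reject" if low <= share <= high else "reject")
    for label, share in shares.items()
}

print(f"region: {low:.4%} to {high:.4%}")
for label, decision in decisions.items():
    print(f"{label}: {decision} Ho")
```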

 

2) Our next hypothesis test also comes from the U.S. Department of Labor: Bureau of Labor Statistics [3]. This study stated that 22% of total entertainment spending came from those younger than 35 years. Information for this hypothesis test can be found in Appendix C.

Null Hypothesis (Ho): Our subjects younger than 35 years make up 22.0% of the average total expenditures on entertainment.
Alternative Hypothesis (Ha): Our subjects younger than 35 years do not make up 22.0% of the average total expenditures on entertainment.

95% confidence level Z = 1.96
S.E. = sq rt(P * (1 - P)) / n = sq rt (0.220 * (0.78)) / 115 = 0.038629
Critical Region = P (+ / - ) Z * (S.E.)
Critical Region = 0.22 (+ / -) 1.96 * 0.038629
Critical Region = 14.4287% to 29.5713%

We can safely reject this hypothesis as well, as the population younger than 35 years spent only $3,390 of the $107,429 total. This is equivalent to 3.156%, well below our critical region at 95% confidence. Even if the motor home is taken out and we recalculate our percentage, the result of 45.632% is far above the top end of the critical region. There are many possible reasons for this discrepancy. We do not know where the subjects surveyed by the U.S. Department of Labor: Bureau of Labor Statistics live, whereas all of our subjects currently live in the Kalamazoo area, and spending habits differ across the nation. Also, the students at Kalamazoo College tend to have higher budgets than the average American student, which could explain why the hypothesis is rejected even when the motor home is not factored into our totals.
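The effect of the single outlier on the under-35 share can be seen directly from the totals quoted above. This is a minimal sketch with variable names of our own choosing. It assumes, as the text implies, that the motor home purchase is removed from the overall total only and not from the under-35 spending.

```python
total_spending = 107_429    # total entertainment spending in the sample
under_35_spending = 3_390   # spending by respondents younger than 35
motor_home = 100_000        # the single outlier purchase

share_with = under_35_spending / total_spending
share_without = under_35_spending / (total_spending - motor_home)

print(f"with motor home:    {share_with:.3%}")
print(f"without motor home: {share_without:.3%}")
```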

 

3) Another study we looked at pointed out that 25.685% of entertainment expenditures go towards fees and admissions [4]. This percentage applies only to people in the Midwest part of the United States, as other parts of the country have their own percentages. This hypothesis test will be very useful to the movie stores and cinemas in the Kalamazoo area. For our test we are going to look at both the students at Kalamazoo College and the residents of Kalamazoo.

Null Hypothesis (Ho): Kalamazoo College students and residents spend 25.685% of their average entertainment expenditures on fees and admissions.
Alternative Hypothesis (Ha): Kalamazoo College students and residents do not spend 25.685% of their average entertainment expenditures on fees and admissions.

95% confidence level Z = 1.96
S.E. = sq rt (P * (1 - P)) / n = sq rt (0.25685 * (0.74315)) / 115 = 0.040741
Critical Region = P (+ / - ) Z * (S.E.)
Critical Region = 0.25685 (+ / -) 1.96 * 0.040741
Critical Region = 17.6998% to 33.6702%

It appears that our sample population, with a 95% confidence level, will reject our null hypothesis. Only 1.8235% of the average total expenditures went towards fees and admissions. Like many of the results of this project, this hypothesis test is affected by the $100,000 motor home. If the motor home is taken out, our sample population allocates 26.3696% of average expenditures to fees and admissions. This percentage is well within our critical region, so the null hypothesis would not be rejected. This suggests that, apart from the outlier, our 115 respondents are representative of the larger survey we compared them to.

4) Our final hypothesis test will test fees and admissions again, but this time against people younger than 25 years from the Midwest. The study we are comparing to [4] stated that 23.6063% of entertainment expenditures went towards fees and admissions for this age group. This hypothesis test will show whether our sample college population corresponds with a larger survey. This time our test will not factor in the Kalamazoo residents and will look just at the students.

Null Hypothesis (Ho): Kalamazoo College students spend 23.6063% of their entertainment expenditures on fees and admissions.
Alternative Hypothesis (Ha): Kalamazoo College students do not spend 23.6063% of their entertainment expenditures on fees and admissions.

95% confidence level Z = 1.96
S.E. = sq rt (P * (1 - P)) / n = sq rt (0.236063 * (0.763937)) / 58 = 0.055761
Critical Region = P (+ / - ) Z * (S.E.)
Critical Region = 0.236063 (+ / -) 1.96 * 0.055761
Critical Region = 12.6772% to 34.5354%

This null hypothesis cannot be rejected, as our sample student population spends about 17.2989% of their expenditures on fees and admissions, which falls within the critical region. There is no $100,000 motor home to throw off any of our calculations, so our sample population should be representative of the population in the survey we were comparing to.
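The same decision can be expressed as a z-statistic instead of an interval: the observed share is converted to a number of standard errors away from the hypothesized value and compared with 1.96. This formulation is not used in the report itself. It is an equivalent sketch using the figures from this test, with variable names of our own.

```python
import math

p0 = 0.236063     # hypothesized proportion from the comparison study
p_hat = 0.172989  # share observed in the student sample
n = 58            # number of students surveyed

se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Two-tailed test at the 95% level: reject only if |z| exceeds 1.96.
reject = abs(z) > 1.96
print(f"z = {z:.3f}, reject Ho: {reject}")
```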

 

Hypothesis Testing of the Difference of Means or Proportions

With this hypothesis test we will assess the difference in entertainment spending on fees and admissions between male and female Kalamazoo College students. Data for this hypothesis can be found in Appendix C. Typically males pay for their dates when going to the movies or pay for the movie rental, so we predict that they will spend more on entertainment.

 

1) Null Hypothesis (Ho): Average entertainment spending of males - average entertainment spending of females = 0.
Alternative Hypothesis (Ha): Average entertainment spending of males - average entertainment spending of females ≠ 0.

95% confidence level Z = 1.96
S.E. = sq rt ((S1^2 / n1) + (S2^2 / n2)) = sq rt ((26.11288^2 / 52) + (30.38031^2 / 63)) = 5.26909
Critical Region = P ( + / - ) Z * (S.E.)
Critical Region = 0 ( + / - )1.96 * 5.26909
Critical Region = $-10.3274 to $10.3274

If the average entertainment spending on fees and admissions for males minus the average entertainment spending on fees and admissions for females is within this critical area, then we cannot reject our null hypothesis. The difference, $16.024 - $17.869 = $-1.845, is within our critical area.

We found that women actually spend more on average than men at Kalamazoo College. However, at the 95% confidence level the difference is not statistically significant, so we cannot reject our null hypothesis that male and female students spend the same amount on average. Our prediction that male students spend more is not supported by the data.
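The test of the male and female difference can be sketched as follows. It reads the two standard deviations in the S.E. line as 26.11288 (males, n = 52) and 30.38031 (females, n = 63), an interpretation that reproduces the stated S.E. of 5.26909. The function name `diff_se` is ours.

```python
import math

def diff_se(s1, n1, s2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

se = diff_se(26.11288, 52, 30.38031, 63)
margin = 1.96 * se

# Observed difference: male average minus female average.
diff = 16.024 - 17.869
within = -margin <= diff <= margin

print(f"S.E. = {se:.5f}, region = +/- ${margin:.4f}")
print(f"difference = ${diff:.3f}, inside region: {within}")
```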

 

2) Our second hypothesis test will be to see if the average total student spending is equal to the average total non-student spending. This test is the heart of our whole study, so we will look at the spending with the motor home and without the motor home. We predicted that the students of Kalamazoo College would have more disposable income available to them. See Appendix E for data sheet.

 

Null Hypothesis (Ho): Average entertainment spending of students - average entertainment spending of residents = 0.
Alternative Hypothesis (Ha): Average entertainment spending of students - average entertainment spending of residents ≠ 0.

95% confidence level Z = 1.96
S.E. = sq rt ((S1^2 / n1) + (S2^2 / n2)) = sq rt ((93.82778^2 / 58) + (6622.003^2 / 57)) = 877.1922
Critical Region = P (+ / -) Z * (S.E.)
Critical Region = 0 (+ / -) 1.96 * 877.1922
Critical Region = $-1719.297 to $1719.297

If the average total spending of students minus the average total spending of the residents is within our critical area, then we cannot reject this hypothesis.

With the motor home included in the residents' expenditures, the difference in spending is $2,873 - $104,556 = $-101,683. This is nowhere close to our critical region. Without the motor home the difference is $2,873 - $4,556 = $-1,683. This value is within our critical area, so the null hypothesis cannot be rejected at a 95% confidence level. These results are surprising to us because the average household income of Kalamazoo residents is much lower than that of Kalamazoo College students. One possible explanation is that Kalamazoo College is such a demanding school that students cannot spend as much of their income on entertainment as they would like.
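It may help to see how much of the standard error in this test comes from each group. The sketch below splits the variance of the difference into its student and resident terms, reading the standard deviations in the S.E. line as 93.82778 and 6622.003. The variable names are ours, and the breakdown is an illustration and was not part of the original analysis.

```python
import math

student_term = 93.82778**2 / 58    # S1^2 / n1
resident_term = 6622.003**2 / 57   # S2^2 / n2

se = math.sqrt(student_term + resident_term)
resident_share = resident_term / (student_term + resident_term)

print(f"S.E. = {se:.4f}")
print(f"share of variance from residents: {resident_share:.2%}")
```

Almost all of the standard error comes from the resident term, which is consistent with the report's point that one very large purchase drives the results of this test.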

 

Multiple regressions:

1) Our first regression examines which variables affect the amount of total spending (our Y variable). Our explanatory variables (X variables) are age, number of people over the age of 18 in the household, hours of work per week, and income. We predict that age will have a positive coefficient, as we believe that college students (represented by the younger age group in our sample) will be less likely to spend higher amounts on entertainment. Second, we expect that the number of people over the age of 18 living in the household will have a positive coefficient: the more adults in a household, the more people there are who will spend on entertainment. Third, we believe that hours of work in a week will have a positive coefficient, since this variable is directly related to income. Finally, we predict that income will have a positive coefficient, as the more money a person or household has, the more money they have to spend on entertainment.

We found the following to be our regression equation:

Total Entertainment Spending = -15872.9594 + 185.2459 (Age) + 2090.8573 (People Over 18) + 98.2199 (Hours of work) + 0.02455 (Income)

We can assess our regression through an analysis of the equation and the t-stats for each variable. Our predictions were correct for the signs of all of the coefficients. The regression equation shows, through a positive coefficient, that the older a person is, the more they will spend on entertainment purchases. This variable provided a t-stat of 3.0644, the highest in this regression. The next variable, number of people over 18 in the household, also provided a significant t-stat of 2.5477. Hours of work per week showed a t-stat of 1.8783, which is close to the threshold, but upon referring to a t-table we found that it was not high enough to be considered statistically significant at the 95% confidence level. The t-stat for income was 1.3673, which is also not statistically significant. Based on our data points, as well as a very low R squared value (0.12), we can conclude that most of the variation within our data has not been explained. The regression chart can be found in our appendix section.
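To make the regression equation concrete, the sketch below evaluates it for one respondent. The coefficients are those reported above. The respondent (age 20, two adults in the household, 10 hours of work per week, $100,000 household income) is hypothetical and chosen only for illustration, as is the function name.

```python
def predict_total_spending(age, people_over_18, hours_of_work, income):
    """Evaluate the fitted equation for total entertainment spending."""
    return (-15872.9594
            + 185.2459 * age
            + 2090.8573 * people_over_18
            + 98.2199 * hours_of_work
            + 0.02455 * income)

# Hypothetical respondent, not taken from the survey data.
prediction = predict_total_spending(
    age=20, people_over_18=2, hours_of_work=10, income=100_000
)
print(f"predicted spending: ${prediction:,.2f}")
```

For this hypothetical student-like profile the equation returns a negative amount, which fits the observation that a model with an R squared of 0.12 leaves most of the variation unexplained.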

 

2) In our second regression we looked at the variables that affected the spending on fees and admissions (our Y variable). Our explanatory variables are gender, education, income, and whether the person is a Kalamazoo College student or Kalamazoo resident. We predict that gender will have a negative coefficient, as our sample is composed of more women than men (63/115) and females are coded as 0 and males as 1 in our dummy variable. As a result of this overrepresentation (compared to actual population proportions), women are more likely to dominate the spending in this category. Second, we believe that education will have a positive coefficient, as those with higher levels of education will have more of an opportunity to be involved in ticketed events. Third, we expect that income will have a positive coefficient, as the more money a person or household has, the more they will be able to spend on fees and admissions. Lastly, we predict that our citizen/student variable will have a positive coefficient, citizens being represented by 1 and students by 0 in our dummy variable. We believe that citizens are more likely to spend on fees and admissions because they have greater access to off-campus events, as well as events on the Kalamazoo College campus. Many students do not have cars, and as a result they are limited to on-campus events, which are often free of charge.

We found the following to be our regression equation:

Spending on fees and admissions = -14.0639 - 3.6444 (Gender) + 3.8940 (Education) + 0.0001 (Income) + 16.5593 (Citizen/Student)

We can assess our regression through an analysis of the equation and the t-stats for each variable. Our predictions were correct for the signs of all of the coefficients. The t-stat for gender was -0.7246, which is not statistically significant. Our next variable, education, has a t-stat of 1.5114, which is also not statistically significant. Household income has a t-stat of 1.782. This is very close to the level that we are looking for, but is not high enough to be considered significant for this study. Our final variable, Citizen/Student, was the only variable that proved to be significant, with a t-stat of 2.3724. Additionally, the R square value of 0.13 suggests that the variation within our data has not been adequately explained. The regression chart can be found in our appendix section.
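The significance decisions above amount to comparing the absolute value of each t-stat with a critical value. The sketch below assumes a two-tailed critical value of about 1.98, which is roughly what a t-table gives at the 95% level for 110 degrees of freedom (115 observations minus 5 estimated coefficients). The report does not state the exact value it used, so this threshold is an assumption.

```python
# Assumed two-tailed critical value at the 95% level (about 110 degrees of freedom).
T_CRITICAL = 1.98

# t-stats reported for the fees and admissions regression
t_stats = {
    "Gender": -0.7246,
    "Education": 1.5114,
    "Income": 1.782,
    "Citizen/Student": 2.3724,
}

significant = [name for name, t in t_stats.items() if abs(t) > T_CRITICAL]
for name, t in t_stats.items():
    label = "significant" if name in significant else "not significant"
    print(f"{name}: t = {t:.4f} ({label})")
```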

 

3) In our third regression we looked at the variables that affected household income (our Y variable). Our explanatory variables are Citizen/Student, education, number of people over 18, and hours of work per week. We predict that Citizen/Student will have a negative coefficient, citizens being represented by 1 and students by 0 in our dummy variable. Kalamazoo College students tend to come from households with higher incomes than those of Kalamazoo citizens. Based on our sample, the average household income for Kalamazoo College students is $118,103.45, while the average household income for Kalamazoo citizens is $60,631.58. We expect that our second variable, education, will have a positive coefficient, as people with higher education have more opportunities to receive higher paying jobs, many of which require a bachelor's degree or higher. We predict that our third variable, hours of work per week, will also have a positive coefficient, as the more hours a person works per week, the more money they are able to earn. Lastly, we believe that the number of people over the age of 18, our fourth variable, will have a positive coefficient, as the more people in a household who are able to enter the workforce, the higher the household income is likely to be.

We found the following to be our regression equation:

Household Income = 57003.43 - 61818.77 (Citizen/Student) + 11090.63 (Education) + 139.31 (Hours of work) + 7454.11 (Number of people over 18)

We can assess our regression through an analysis of the equation and the t-stats for each variable. Our predictions were correct for the signs of all of the coefficients. The t-stat for citizen/student is -4.4615, which shows a strong statistical significance. This shows that Kalamazoo College students are likely to come from households with a higher income than those of Kalamazoo residents. In absolute value, this is the largest t-stat in all of our regressions. The t-stat for education is 2.6340, which also shows a strong statistical significance and suggests that the more education one has achieved, the higher their income is likely to be. Our third variable, hours of work per week, has a t-stat of 0.4592, which is not high enough to be considered significant. Lastly, the number of people over 18 in the household has a t-stat of 1.5652, which is also not high enough to be considered statistically significant. Additionally, the R square value of 0.34, although it is the highest of all of our regressions, suggests that much of the variation within our data has not been explained. The regression chart can be found in our appendix section.
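The coefficient on the Citizen/Student dummy variable can be read as the predicted gap in household income between a citizen and a student who are otherwise identical. The sketch below illustrates this with the reported coefficients. The education code, hours, and household size are hypothetical placeholders, since the gap does not depend on them, and the function name is ours.

```python
def predict_income(citizen, education, hours_of_work, people_over_18):
    """Evaluate the fitted household income equation (citizen: 1, student: 0)."""
    return (57003.43
            - 61818.77 * citizen
            + 11090.63 * education
            + 139.31 * hours_of_work
            + 7454.11 * people_over_18)

# Hypothetical values for the other variables, held fixed for both cases.
others = dict(education=3, hours_of_work=20, people_over_18=2)

student_income = predict_income(citizen=0, **others)
citizen_income = predict_income(citizen=1, **others)
gap = citizen_income - student_income

print(f"predicted gap (citizen - student): ${gap:,.2f}")
```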
