Here is a description of the assignment, and my answers are below the assignment.
Statistics for Social Research SOC310/SWK310
Fall 2014
Final Examination
DUE DATE: This exam is due on Thursday, December 18, by 5:00 p.m. Remember, exams will not be accepted late, unless you are struck by an incapacitating illness or injury. Even then, I will want to see evidence of just how far you got on the exam before you were stricken.
As with the midterm, you will be fine provided you plan ahead carefully and leave yourself plenty of time to complete the exam. The computer analysis may take anywhere from fifteen minutes to two or three hours, while it will take several more hours to answer all the questions.
OUTSIDE HELP: The exam is open book, open notes, and has no time limit other than the due date. You may discuss the exam with anyone before you begin the computer analysis. Once you have started, you may only ask questions of the instructor or the TA.
GENERAL INSTRUCTIONS: For this exam you will be working with four or more variables from the 2012 General Social Survey. A portion of the data from this survey is available on the class web page in a file named gss12.sav. At least two of your variables must be categorical in nature (i.e., either nominal or ordinal), while at least two others must be interval or ratio.
Before anything else, you should check your variables to make sure that values that should be treated as missing have indeed been defined as such in SPSS. You will then describe your variables and test the relationships between certain of them based on information obtained from the FREQUENCIES, CROSSTABS, TTEST, MEANS, SCATTERPLOT and REGRESSION commands.
In addition to your answers to the questions that follow, you will need to turn in all of the relevant output generated by SPSS. Be sure to get rid of any output that is not relevant to the analyses that you have conducted to answer the exam questions. Take care also to document which specific section(s) of your output you are referring to as you answer each question.
QUESTIONS: As you work through this exam, you will be writing a report on the data and your findings. Such a report typically includes graphs, statistical tables, and text. The text is critical insofar as it highlights, interprets, and summarizes what the graphs and statistics reveal. Answer the following questions in order and in prose form. Report and interpret fully the values of relevant statistics in the text. Try to come as close to a real-world, nontechnical explanation of the meaning of each number as possible. And as always, do not forget to label your numbers wherever appropriate.
- For starters:
  - What population does your sample represent?
  - How large is your sample?
  - How was the sample drawn?
- For each of your variables:
  - What is the name of each variable in SPSS?
  - What does each variable aim to measure and how is that operationalized?
  - What level of measurement does each variable achieve? Explain.
- For one interval/ratio level variable of your choosing:
  - Generate a frequency distribution and a bar chart or histogram for your chosen variable. If appropriate, describe the shape of the distribution.
  - Report and interpret the values for all of the measures of central tendency that are appropriate for your variable.
  - Report and interpret the values for all of the measures of dispersion that are appropriate for your variable.
- For your two categorical variables:
  - State a research hypothesis concerning the relationship between the two categorical variables you have chosen. In doing so, specify theoretically which variable is the independent variable and which is the dependent variable, as well as why you expect there to be a relationship between the two of them.
  - Generate a cross-classification table for the two variables.
  - Describe the pattern and size of any observed relationship between your two variables by making the appropriate percentage comparisons.
  - Test the hypothesis that there is a relationship between the two variables in the population from which the sample is drawn.
  - Report and interpret an appropriate chi-square-based measure of association assessing the strength of the observed relationship. Explain why you chose the measure of association you did.
  - Report and interpret an appropriate PRE measure of association assessing the strength of the observed relationship. Explain why you chose the measure of association you did.
- For one of your categorical variables and one of your interval/ratio level variables:
  - State a research hypothesis concerning the relationship between the two variables that you have chosen. Be sure to state clearly which is the independent variable, which is the dependent variable, and why you expect there to be a relationship between them.
    [NOTE: You have a couple of choices here: if the categorical variable has just two categories, or if you wish to focus on just two of its categories, then your research hypothesis can be of the sort expected in a standard difference of means test (t-test); if the categorical variable has more than two categories (and you wish to treat it that way), then your research hypothesis should be of the sort appropriate for an ANOVA test.]
  - Report the overall mean on the interval/ratio level variable as well as the means on that same variable for each of the subgroups defined by the categorical variable. What do those subgroup means suggest about whether there is a relationship between your two variables?
  - Test the hypothesis that there is a relationship between the two variables in the population from which the samples are drawn.
  - Assess the strength of this relationship using statistics appropriate to the sort of analysis you have conducted.
- For your two interval/ratio level variables:
  - Briefly explain which variable is the independent variable, which variable is the dependent variable, and why you expect a relationship between the two.
  - Produce a scatterplot summarizing the relationship between the two variables. Does it look like there is a relationship there? If so, how would you describe what that relationship looks like?
  - Report the coefficients of the OLS regression line describing the relationship between the two variables. Explain what these coefficients tell you.
  - Calculate, by hand, predicted values of the dependent variable for two nonzero values of the independent variable.
  - Test the hypothesis that there is a relationship between the two variables in the population from which the sample is drawn.
  - Give and interpret the values of Pearson’s r and R-squared for this relationship.
1.
a. Adults (18 and older) living in households in the United States
b. 1974 people
c. multistage cluster sampling
2.
a. What is the name of each variable in SPSS?
abany, advfront, adults, and age
b. What does each variable aim to measure?
· abany aims to measure whether the respondent thinks a woman should be able to obtain a legal abortion if she wants one for any reason. abany is operationalized by asking respondents to answer “yes” or “no” to that question.
· advfront aims to measure how much people think science research is necessary and how much it should be funded by the government. advfront is operationalized by having respondents answer either “strongly agree”, “agree”, “disagree”, or “strongly disagree”, according to their corresponding opinion.
· adults aims to measure the number of adults (age 18 and older) living in the respondent’s household. adults is operationalized by having the respondent report how many people 18 and older live in the household.
· age aims to measure the age of the respondent in years. age is operationalized by having the respondent enter a number for their age.
c.
· abany achieves nominal measurement by having answers without associated magnitude: yes and no.
· advfront achieves ordinal measurement by having answers that measure magnitude, but the distance between these measurements is not numeric; there is no numeric difference between agree and disagree.
· adults achieves ratio measurement by having a meaningful zero value (no adults 18 years and above), and by having numeric differences between answers.
· age achieves ratio level measurement by having a meaningful zero value (zero means that there is no age, and the person is not yet one year old), and by having numeric differences between answers.
3.
a.
Here is a frequency distribution followed by a bar chart for the abany variable.
ABORTION IF WOMAN WANTS FOR ANY REASON
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | YES | 554 | 28.1 | 44.4 | 44.4 |
| Valid | NO | 694 | 35.2 | 55.6 | 100.0 |
| Valid | Total | 1248 | 63.2 | 100.0 | |
| Missing | IAP | 666 | 33.7 | | |
| Missing | DK | 40 | 2.0 | | |
| Missing | NA | 20 | 1.0 | | |
| Missing | Total | 726 | 36.8 | | |
| Total | | 1974 | 100.0 | | |
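As a quick check on the table above, here is a minimal sketch of how the Percent and Valid Percent columns follow from the counts. The variable names are my own, and the only inputs are the frequencies copied from the SPSS output.

```python
# Counts copied from the abany frequency table above.
valid_counts = {"YES": 554, "NO": 694}
missing_counts = {"IAP": 666, "DK": 40, "NA": 20}

n_valid = sum(valid_counts.values())
n_total = n_valid + sum(missing_counts.values())

# Percent uses every case in the file as the base.
percent = {k: round(100 * v / n_total, 1) for k, v in valid_counts.items()}

# Valid Percent uses only the cases with a non-missing answer as the base.
valid_percent = {k: round(100 * v / n_valid, 1) for k, v in valid_counts.items()}

print(n_valid, n_total)
print(percent)
print(valid_percent)
```

The difference between the two columns is only the denominator: all 1974 cases versus the 1248 valid answers.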
b.
Statistics
ABORTION IF WOMAN WANTS FOR ANY REASON
| N | Valid | 1248 |
| N | Missing | 726 |
| Mode | | 2 |
The mode is a measure of central tendency computed by determining which response is most common. A mode of 2 (no) for abany means that the response “no” was given most frequently.
c.
Statistics
ABORTION IF WOMAN WANTS FOR ANY REASON
| N | Valid | 1248 |
| N | Missing | 726 |
| Minimum | | 1 |
| Maximum | | 2 |
| Percentiles | 25 | 1.00 |
| Percentiles | 50 | 2.00 |
| Percentiles | 75 | 2.00 |
Percentile is a measure of dispersion. Here the 25th percentile is 1 (yes), calculated by finding the case 25% of the way up through a list of cases ordered from smallest magnitude to greatest magnitude. 50th percentile and 75th percentile can be calculated the same way. A 25th percentile of 1 (yes) means that 25% of the data has a value of 1 (yes) or less. A 50th percentile of 2 (no) means that 2 (no) is a value equal to or greater than 50% of the data.
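The mode and percentiles reported above can be reproduced from the valid counts alone. This is a rough sketch that assumes a simple nearest-rank rule for percentiles (the `percentile_value` helper is my own and SPSS may use a different interpolation rule), with 1 = yes and 2 = no.

```python
import math
import statistics

# Rebuild the 1248 valid answers from the frequency table: 1 = yes, 2 = no.
responses = [1] * 554 + [2] * 694

mode = statistics.mode(responses)
median = statistics.median(responses)


def percentile_value(sorted_values, pct):
    """Nearest-rank percentile: the value at position ceil(pct% of n)."""
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[rank - 1]


ordered = sorted(responses)
p25 = percentile_value(ordered, 25)
p50 = percentile_value(ordered, 50)
p75 = percentile_value(ordered, 75)

print(mode, median, p25, p50, p75)
```

Because 44.4% of the valid answers are 1 (yes), the 25th percentile falls among the yes answers while the 50th and 75th fall among the no answers.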
4. Two Categorical Variables: abany and advfront.
a. abany and advfront are my two categorical variables. abany is the independent variable. This is because a person’s view on whether or not abortion ought to be legalized is more likely to affect their opinion as to how important science research is. The logic here is that if someone wants to have abortions legalized, they will want better science in order to have smoother abortions, therefore, more money ought to be put into science research.
b. Crosstab between advfront and abany:
SCI RSCH IS NECESSARY AND SHOULD BE SUPPORTED BY FEDERAL GOVT * ABORTION IF WOMAN WANTS FOR ANY REASON Crosstabulation
(Columns: ABORTION IF WOMAN WANTS FOR ANY REASON)
| SCI RSCH IS NECESSARY AND SHOULD BE SUPPORTED BY FEDERAL GOVT | | YES | NO | Total |
| Strongly agree | Count | 80 | 68 | 148 |
| Strongly agree | % within ABORTION IF WOMAN WANTS FOR ANY REASON | 27.9% | 20.7% | 24.0% |
| Agree | Count | 175 | 215 | 390 |
| Agree | % within ABORTION IF WOMAN WANTS FOR ANY REASON | 61.0% | 65.3% | 63.3% |
| Disagree | Count | 23 | 37 | 60 |
| Disagree | % within ABORTION IF WOMAN WANTS FOR ANY REASON | 8.0% | 11.2% | 9.7% |
| Strongly disagree | Count | 9 | 9 | 18 |
| Strongly disagree | % within ABORTION IF WOMAN WANTS FOR ANY REASON | 3.1% | 2.7% | 2.9% |
| Total | Count | 287 | 329 | 616 |
| Total | % within ABORTION IF WOMAN WANTS FOR ANY REASON | 100.0% | 100.0% | 100.0% |
c. 61.0 percent of respondents who support legal abortion for any reason agree that science research is necessary, compared with 65.3 percent of those who do not. This begins to lead me to reject my anticipated results, because a larger share of those who oppose legal abortion for any reason agree. However, when looking at the strongly agree row, my hypothesis begins to look true again: 27.9 percent of abany “yes” respondents strongly agree versus 20.7 percent of “no” respondents, a difference of 7.2 percentage points.
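The percentage comparisons in part c come from dividing each cell count by its column total. A minimal sketch using the counts from the crosstab (the dictionary layout and names are my own):

```python
# Cell counts from the crosstab: (abany = YES, abany = NO) for each advfront row.
counts = {
    "Strongly agree": (80, 68),
    "Agree": (175, 215),
    "Disagree": (23, 37),
    "Strongly disagree": (9, 9),
}

# Column totals: everyone who answered YES, everyone who answered NO.
col_totals = [sum(col) for col in zip(*counts.values())]

# Column percentages: share of each abany group giving each advfront answer.
col_pct = {
    row: tuple(round(100 * c / t, 1) for c, t in zip(cells, col_totals))
    for row, cells in counts.items()
}

# Gap between the YES and NO groups in the "Strongly agree" row, in percentage points.
gap_strongly_agree = round(col_pct["Strongly agree"][0] - col_pct["Strongly agree"][1], 1)

print(col_totals)
print(col_pct)
print(gap_strongly_agree)
```

Percentaging within the independent variable (abany) and comparing across its categories is what makes the rows directly comparable.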
d. Test the hypothesis that there is a relationship between the two variables in the population from which the sample is drawn
I. Assumptions
a. L of M: 2 nominal
b. Sampling: random
c. sample size: 616, 4 x 2 table
d. Population distributions: N/A
II. Hypotheses:
a. Ho = no relationship between groups abany and advfront
b. Ha = relationship between the two variables
III. Test Statistic
a. chi-squared
b. df = 3
c. X^2 = 5.504
Chi-Square Tests
| | Value | df | Asymp. Sig. (2-sided) |
| Pearson Chi-Square | 5.504 (a) | 3 | .138 |
| Likelihood Ratio | 5.515 | 3 | .138 |
| Linear-by-Linear Association | 3.177 | 1 | .075 |
| N of Valid Cases | 616 | | |
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 8.39.
IV. p-value and interpretation
a. p = .138
b. If Ho were true, the probability of getting a chi-square value at least as large as the one we got is .138.
V. Conclusion
a. Given p = .138, we fail to reject the Ho and do not find support for Ha.
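The chi-square test above can be retraced by hand from the crosstab counts. The sketch below is not SPSS's own routine: it computes expected counts under Ho, sums (observed - expected)^2 / expected, and gets the p-value from a closed-form expression that holds only for df = 3 (my assumption for this 4 x 2 table).

```python
import math

# Observed counts from the crosstab: rows are advfront answers, columns are abany YES / NO.
observed = [
    [80, 68],    # Strongly agree
    [175, 215],  # Agree
    [23, 37],    # Disagree
    [9, 9],      # Strongly disagree
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
min_expected = float("inf")
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / n
        min_expected = min(min_expected, expected)
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of a chi-square variable; this formula is valid for df = 3 only.
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

print(round(chi2, 3), df, round(p_value, 3), round(min_expected, 2))
```

The minimum expected count also confirms the footnote in the SPSS table that no cell falls below 5.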
e. Cramer’s V, an X^2-based measure of association. I chose this measure because it is based on X^2 and is suitable for tables bigger than 2×2, where it corrects a problem that comes up with Phi. Here we observe a very weak relationship, because the value of Cramer’s V (.095) is on the low end of the “weak relationship” scale, which extends from 0 to 0.3.
Symmetric Measures
| | | Value | Approx. Sig. |
| Nominal by Nominal | Phi | .095 | .138 |
| Nominal by Nominal | Cramer’s V | .095 | .138 |
| N of Valid Cases | | 616 | |
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
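Cramer’s V can be recovered from the chi-square value and the sample size. A small sketch, using the standard formula V = sqrt(X^2 / (N * min(rows - 1, columns - 1))) with the values reported above:

```python
import math

chi2 = 5.504      # Pearson chi-square from the Chi-Square Tests table
n = 616           # N of valid cases
rows, cols = 4, 2 # advfront has 4 categories, abany has 2

# Cramer's V rescales chi-square by sample size and table dimensions.
cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

print(round(cramers_v, 3))
```

Because the smaller dimension of this table is 2, min(rows - 1, columns - 1) equals 1, which is why Phi and Cramer’s V show the same value in the SPSS output.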
f. Gamma PRE measure of association test. The value of gamma between abany and advfront is .156, which means that we do 15.6% better predicting values of advfront when we know whether the woman respondent thinks abortion should be legalized for any reason than if we don’t. I chose this measure of association because I have an ordinal level variable as well as a nominal level variable. Ordinal measures of association are appropriate for a mixed test like this (one ordinal and one nominal level variable), unless there are more than two options for the nominal level variable (Agresti and Finlay 246).
Symmetric Measures
| | | Value | Asymp. Std. Error (a) | Approx. T (b) | Approx. Sig. |
| Ordinal by Ordinal | Gamma | .156 | .074 | 2.088 | .037 |
| N of Valid Cases | | 616 | | | |
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
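Gamma compares concordant pairs (cases ordered the same way on both variables) with discordant pairs (ordered in opposite ways). Here is a sketch of that calculation from the crosstab counts; the function name is my own, and it assumes the rows and columns are listed in their coded order (strongly agree to strongly disagree, yes then no).

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table whose rows and columns are in rank order."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only rows ranked below row i
                for m in range(n_cols):
                    if m > j:                   # lower row and later column: same ordering
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # lower row but earlier column: opposite ordering
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Rows: strongly agree, agree, disagree, strongly disagree. Columns: abany YES, NO.
crosstab = [
    [80, 68],
    [175, 215],
    [23, 37],
    [9, 9],
]

gamma = goodman_kruskal_gamma(crosstab)
print(round(gamma, 3))
```

Pairs tied on either variable are ignored, which is part of why gamma can look larger than other measures on the same table.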
5.
a. research hypothesis between abany (dependent variable) and age (independent variable):
Ho: There is no relationship between abany and age. μ1 – μ2 ≤ 0
Ha: There is a relationship between abany and age. μ1 – μ2 > 0
I expect there to be a relationship between these two variables because as someone gets older, their perspectives tend to change, and there may be a common direction in which the beliefs tilt. Another, more realistic reason is that the older respondents matured during a time when the common belief was against abortion, while younger respondents matured when pro-abortion was a more common belief.
b. The overall mean for age is 48.19 years. For Yes the mean age is 46.63, and for No the mean age is 48.88. There may be a slight relationship between the two variables, but age doesn’t seem to have much effect on abany, because the mean age for No was only 2.25 years different from the mean age for Yes.
Statistics
| | | ABORTION IF WOMAN WANTS FOR ANY REASON | AGE OF RESPONDENT |
| N | Valid | 1248 | 1969 |
| N | Missing | 726 | 5 |
| Mean | | 1.56 | 48.19 |
Descriptives
AGE OF RESPONDENT, by ABORTION IF WOMAN WANTS FOR ANY REASON
| ABORTION IF WOMAN WANTS FOR ANY REASON | | Statistic | Std. Error |
| YES | Mean | 46.63 | .691 |
| YES | 95% Confidence Interval for Mean, Lower Bound | 45.27 | |
| YES | 95% Confidence Interval for Mean, Upper Bound | 47.99 | |
| YES | 5% Trimmed Mean | 46.18 | |
| YES | Median | 46.00 | |
| YES | Variance | 263.871 | |
| YES | Std. Deviation | 16.244 | |
| YES | Minimum | 18 | |
| YES | Maximum | 89 | |
| YES | Range | 71 | |
| YES | Interquartile Range | 24 | |
| YES | Skewness | .310 | .104 |
| YES | Kurtosis | -.531 | .207 |
| NO | Mean | 48.88 | .689 |
| NO | 95% Confidence Interval for Mean, Lower Bound | 47.52 | |
| NO | 95% Confidence Interval for Mean, Upper Bound | 50.23 | |
| NO | 5% Trimmed Mean | 48.45 | |
| NO | Median | 49.00 | |
| NO | Variance | 327.119 | |
| NO | Std. Deviation | 18.086 | |
| NO | Minimum | 18 | |
| NO | Maximum | 89 | |
| NO | Range | 71 | |
| NO | Interquartile Range | 30 | |
| NO | Skewness | .236 | .093 |
c.
I. Assumptions
a. L of M: 1 nominal (abany), 1 ratio (age).
b. M of S: random
c. Sample size: 1248 valid responses = n (variables from same sample, so there’s only one sample size to report)
d. Population distribution: N/A
II. Hypotheses
a. Ho: there is no relationship between abany and age, μ1 – μ2 ≤ 0
b. Ha: there is a relationship between abany and age, μ1 – μ2 > 0
III. Test
a.
One-Sample Test (Test Value = 0)
| | t | df | Sig. (2-tailed) | Mean Difference | 95% Confidence Interval of the Difference: Lower | Upper |
| ABORTION IF WOMAN WANTS FOR ANY REASON | 110.598 | 1247 | .000 | 1.556 | 1.53 | 1.58 |
| AGE OF RESPONDENT | 120.908 | 1968 | .000 | 48.193 | 47.41 | 48.98 |
IV. p-value and interpretation
a. p < .001 (SPSS displays this as .000)
b. If Ho were true (μ1 – μ2 ≤ 0), then the probability of getting a sample difference of means as far above 0 as we got (2.25 years) is less than .001.
V. Conclusion
a. We reject Ho, therefore, by proof of contradiction, we find support for the alternative Ha (μ1 – μ2 > 0).
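The One-Sample Test output above compares each variable's mean with zero. As a rough cross-check on the 2.25-year gap between the two groups' mean ages, here is a sketch computed only from the group means and standard errors in the Descriptives table. It assumes the two groups are independent, does not pool their variances, and uses a normal approximation for the p-value (reasonable with over 1,200 cases). None of these numbers come from SPSS output.

```python
import math

# Group means and standard errors of the mean, from the Descriptives table.
mean_yes, se_yes = 46.63, 0.691
mean_no, se_no = 48.88, 0.689

# Difference of means (No minus Yes) and its standard error for independent groups.
diff = mean_no - mean_yes
se_diff = math.sqrt(se_yes ** 2 + se_no ** 2)

# Test statistic for Ho: the two population means are equal.
t_stat = diff / se_diff

# Two-sided p-value from the standard normal distribution (large-sample approximation).
p_two_sided = math.erfc(abs(t_stat) / math.sqrt(2))
p_one_sided = p_two_sided / 2

print(round(diff, 2), round(se_diff, 3), round(t_stat, 2), round(p_two_sided, 3))
```

Under these assumptions the statistic comes out at about 2.31, which points the same way as the conclusion above, though with a much larger p-value than .000.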
d. This relationship is very weak, because the gamma value is .067.
Symmetric Measures
| | | Value | Asymp. Std. Error (a) | Approx. T (b) | Approx. Sig. |
| Ordinal by Ordinal | Gamma | .067 | .033 | 2.021 | .043 |
| N of Valid Cases | | 1243 | | | |
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
6.
a. My two variables are age (ratio) and adults (ratio). Age is the independent variable, because it affects the number of adults age 18 and over living in one’s household.
b. scatter plot of age (x) vs. adults (y)
The younger the respondent the higher the likelihood that they have a greater number of members in their household.
c. The unstandardized coefficient of the age of respondent is -.012, which tells us that there is a negative slope, where for every increase in year of age there is a decrease of .012 in number of people over 18 in one’s household. The unstandardized coefficient under column b, in row constant, has a value of 2.473. This is the y intercept of the graph.
Coefficients (a)
| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
| 1 | (Constant) | 2.473 | .053 | | 47.022 | .000 |
| 1 | AGE OF RESPONDENT | -.012 | .001 | -.258 | -11.771 | .000 |
a. Dependent Variable: HOUSEHOLD MEMBERS 18 YRS AND OLDER
d. y = -.012x + 2.473.
y = -.012(21) + 2.473
y = 2.221 people. The number of people over 18 most likely to be in a 21 year old’s household is 2.221.
y = -.012(55) + 2.473
y = 1.813 people. The number of people over 18 most likely to be in a 55 year old’s household is 1.813.
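The hand calculations above can be double-checked with a few lines of code. This sketch simply plugs the two ages into the fitted line; the function name is my own, and the coefficients are the rounded values from the Coefficients table.

```python
# Rounded coefficients from the SPSS Coefficients table.
INTERCEPT = 2.473   # (Constant)
SLOPE = -0.012      # AGE OF RESPONDENT


def predict_adults(age):
    """Predicted number of household members 18 and older for a respondent of this age."""
    return INTERCEPT + SLOPE * age


predictions = {age: round(predict_adults(age), 3) for age in (21, 55)}
print(predictions)
```

Since the slope is rounded to three decimals, predictions for other ages will be slightly off from what SPSS would give with the unrounded coefficient.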
e.
I. assumptions
a. L of M: two ratio (from the same sample): adults and age
b. M of S: random
c. sample size: n = 1953 people with valid values on both variables (1958 valid cases for adults, 1969 for age)
d. assumptions: N/A, b/c of C.L.T. (sample size over 30).
II. hypotheses test:
a. Ho: there is no relationship, population slope = 0
b. Ha: there is a relationship, population slope ≠ 0
III. test statistic
a. t = -11.771 (the t value for the AGE OF RESPONDENT slope in the Coefficients table in part c)
One-Sample Test (Test Value = 0)
| | t | df | Sig. (2-tailed) | Mean Difference | 95% Confidence Interval of the Difference: Lower | Upper |
| AGE OF RESPONDENT | 120.908 | 1968 | .000 | 48.193 | 47.41 | 48.98 |
| HOUSEHOLD MEMBERS 18 YRS AND OLDER | 101.007 | 1957 | .000 | 1.891 | 1.85 | 1.93 |
IV. interpret p-value
a. p < .001 (SPSS displays this as .000)
b. If Ho were true (population slope = 0), then the probability of getting a slope at least as far from 0 as the one we got (-.012 household members over 18 per year of age) is less than .001.
V. conclusion
a. We reject Ho, therefore, by proof of contradiction, we find support for the alternative Ha (population slope ≠ 0).
f. R square is .066, meaning that age explains about 6.6% of the variation in the number of household members 18 and older. Pearson’s r is -.258, meaning there is a negative linear relationship between the two variables, but it is weak and far from perfect.
Model Summary and Parameter Estimates
Dependent Variable: HOUSEHOLD MEMBERS 18 YRS AND OLDER
| Equation | R Square | F | df1 | df2 | Sig. | Constant | b1 |
| Linear | .066 | 138.559 | 1 | 1951 | .000 | 2.473 | -.012 |
The independent variable is AGE OF RESPONDENT.
Correlations
| | | HOUSEHOLD MEMBERS 18 YRS AND OLDER | AGE OF RESPONDENT |
| HOUSEHOLD MEMBERS 18 YRS AND OLDER | Pearson Correlation | 1 | -.258** |
| HOUSEHOLD MEMBERS 18 YRS AND OLDER | Sig. (2-tailed) | | .000 |
| HOUSEHOLD MEMBERS 18 YRS AND OLDER | Sum of Squares and Cross-products | 1342.611 | -7374.530 |
| HOUSEHOLD MEMBERS 18 YRS AND OLDER | Covariance | .686 | -3.778 |
| HOUSEHOLD MEMBERS 18 YRS AND OLDER | N | 1958 | 1953 |
| AGE OF RESPONDENT | Pearson Correlation | -.258** | 1 |
| AGE OF RESPONDENT | Sig. (2-tailed) | .000 | |
| AGE OF RESPONDENT | Sum of Squares and Cross-products | -7374.530 | 615657.277 |
| AGE OF RESPONDENT | Covariance | -3.778 | 312.834 |
| AGE OF RESPONDENT | N | 1953 | 1969 |
**. Correlation is significant at the 0.01 level (2-tailed).
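The slope, Pearson’s r, and R square all follow from the variances and covariance in the Correlations table. A minimal sketch of those relationships (variable names are my own; the variances come from slightly different sets of cases than the covariance, so small rounding differences are expected):

```python
import math

# Values from the Correlations table.
cov_age_adults = -3.778   # covariance of age and adults
var_age = 312.834         # variance of age (independent variable)
var_adults = 0.686        # variance of adults (dependent variable)

# OLS slope: covariance divided by the variance of the independent variable.
slope = cov_age_adults / var_age

# Pearson's r: covariance divided by the product of the two standard deviations.
r = cov_age_adults / math.sqrt(var_age * var_adults)

# R square: the proportion of variation in adults accounted for by age.
r_squared = r ** 2

print(round(slope, 3), round(r, 3), round(r_squared, 3))
```

This shows why the sign of r always matches the sign of the slope: both inherit it from the covariance.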
I wrote this report for the Gordon College course Statistics for Social Research. Find the syllabus below.