Here is a description of the assignment, and my answers are below the assignment.
Statistics for Social Research SOC310/SWK310
Fall 2014
Final Examination
DUE DATE: This exam is due on Thursday, December 18, by 5:00 p.m. Remember, exams will not be accepted late, unless you are struck by an incapacitating illness or injury. Even then, I will want to see evidence of just how far you got on the exam before you were stricken.
As with the midterm, you will be fine provided you plan ahead carefully and leave yourself plenty of time to complete the exam. The computer analysis may take anywhere from fifteen minutes to two or three hours, while it will take several more hours to answer all the questions.
OUTSIDE HELP: The exam is open book, open notes, and has no time limit other than the due date. You may discuss the exam with anyone before you begin the computer analysis. Once you have started, you may only ask questions of the instructor or the TA.
GENERAL INSTRUCTIONS: For this exam you will be working with four or more variables from the 2012 General Social Survey. A portion of the data from this survey is available on the class web page in a file named gss12.sav. At least two of your variables must be categorical in nature (i.e., either nominal or ordinal), while at least two others must be interval or ratio.
Before anything else, you should check your variables to make sure that values that should be treated as missing have indeed been defined as such in SPSS. You will then describe your variables and test the relationships between certain of them based on information obtained from the FREQUENCIES, CROSSTABS, TTEST, MEANS, SCATTERPLOT and REGRESSION commands.
In addition to your answers to the questions that follow, you will need to turn in all of the relevant output generated by SPSS. Be sure to get rid of any output that is not relevant to the analyses that you have conducted to answer the exam questions. Take care also to document which specific section(s) of your output you are referring to as you answer each question.
QUESTIONS: As you work through this exam, you will be writing a report on the data and your findings. Such a report typically includes graphs, statistical tables, and text. The text is critical insofar as it highlights, interprets, and summarizes what the graphs and statistics reveal. Answer the following questions in order and in prose form. Report and interpret fully the values of relevant statistics in the text. Try to come as close to a real-world, nontechnical explanation of the meaning of each number as possible. And as always, do not forget to label your numbers wherever appropriate.

For starters:

What population does your sample represent?

How large is your sample?

How was the sample drawn?


For each of your variables:

What is the name of each variable in SPSS?

What does each variable aim to measure and how is that operationalized?

What level of measurement does each variable achieve? Explain.


For one interval/ratio level variable of your choosing:

Generate a frequency distribution and a bar chart or histogram for your chosen variable. If appropriate, describe the shape of the distribution.

Report and interpret the values for all of the measures of central tendency that are appropriate for your variable.

Report and interpret the values for all of the measures of dispersion that are appropriate for your variable.


For your two categorical variables:

State a research hypothesis concerning the relationship between the two categorical variables you have chosen. In doing so, specify theoretically which variable is the independent variable and which is the dependent variable, as well as why you expect there to be a relationship between the two of them.

Generate a cross-classification table for the two variables.

Describe the pattern and size of any observed relationship between your two variables by making the appropriate percentage comparisons.

Test the hypothesis that there is a relationship between the two variables in the population from which the sample is drawn.

Report and interpret an appropriate chi-square-based measure of association assessing the strength of the observed relationship. Explain why you chose the measure of association you did.

Report and interpret an appropriate PRE measure of association assessing the strength of the observed relationship. Explain why you chose the measure of association you did.


For one of your categorical variables and one of your interval/ratio level variables:

State a research hypothesis concerning the relationship between the two variables that you have chosen. Be sure to state clearly which is the independent variable, which is the dependent variable, and why you expect there to be a relationship between them.
[NOTE: You have a couple of choices here: if the categorical variable has just two categories, or if you wish to focus on just two of its categories, then your research hypothesis can be of the sort expected in a standard difference of means test (t-test); if the categorical variable has more than two categories (and you wish to treat it that way), then your research hypothesis should be of the sort appropriate for an ANOVA test.]

Report the overall mean on the interval/ratio level variable as well as the means on that same variable for each of the subgroups defined by the categorical variable. What do those subgroup means suggest about whether there is a relationship between your two variables?

Test the hypothesis that there is a relationship between the two variables in the population from which the samples are drawn.

Assess the strength of this relationship using statistics appropriate to the sort of analysis you have conducted.


For your two interval/ratio level variables:

Briefly explain which variable is the independent variable, which variable is the dependent variable, and why you expect a relationship between the two.

Produce a scatterplot summarizing the relationship between the two variables. Does it look like there is a relationship there? If so, how would you describe what that relationship looks like?

Report the coefficients of the OLS regression line describing the relationship between the two variables. Explain what these coefficients tell you.

Calculate, by hand, predicted values of the dependent variable for two nonzero values of the independent variable.

Test the hypothesis that there is a relationship between the two variables in the population from which the sample is drawn.

Give and interpret the values of Pearson’s r and R-squared for this relationship.

1.
a. Adults (age 18 and older) living in households in the United States
b. 1974 people
c. multistage cluster sampling
2.
a. What is the name of each variable in SPSS?
abany, advfront, adults, and age
b. What does each variable aim to measure?
· abany aims to measure whether the respondent thinks abortion should be legal when a woman wants one for any reason. abany is operationalized by asking respondents to answer “yes” or “no” to whether a pregnant woman should be able to obtain a legal abortion if she wants it for any reason.
· advfront aims to measure how strongly respondents agree that science research is necessary and should be supported by the federal government. advfront is operationalized by having respondents answer “strongly agree”, “agree”, “disagree”, or “strongly disagree”, according to their opinion.
· adults aims to measure the number of household members 18 years and older. adults is operationalized by having the respondent report how many people aged 18 and over live in the household.
· age aims to measure the age of the respondent in years. age is operationalized by having the respondent enter a number for their age.
c.
· abany achieves nominal measurement by having answers without associated magnitude: yes and no.
· advfront achieves ordinal measurement by having answers that can be ranked by strength of agreement, but the distances between the categories are not numeric; there is no measurable distance between agree and disagree.
· adults achieves ratio measurement by having a meaningful zero value (no adults 18 years and above), and by having equal numeric differences between answers.
· age achieves ratio level measurement by having a meaningful zero value (zero means no time has passed since birth), and by having equal numeric differences between answers.
3.
a.
Here is a frequency distribution followed by a bar chart for the abany variable.
ABORTION IF WOMAN WANTS FOR ANY REASON

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid     YES         554        28.1        44.4              44.4
          NO          694        35.2        55.6             100.0
          Total      1248        63.2       100.0
Missing   IAP         666        33.7
          DK           40         2.0
          NA           20         1.0
          Total       726        36.8
Total                1974       100.0
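As a quick check on how the Percent and Valid Percent columns relate, here is a minimal Python sketch. It only re-derives the percentages from the counts in the frequency table; the variable and function names are hypothetical and are not part of the SPSS output.

```python
# Counts copied from the abany frequency table.
valid = {"YES": 554, "NO": 694}
missing = {"IAP": 666, "DK": 40, "NA": 20}

n_valid = sum(valid.values())               # cases with a usable answer
n_total = n_valid + sum(missing.values())   # everyone in the sample


def pct(count, base):
    """Percentage of `base` represented by `count`, rounded to one decimal."""
    return round(100 * count / base, 1)


# "Percent" uses the whole sample; "Valid Percent" drops the missing cases.
percent = {answer: pct(count, n_total) for answer, count in valid.items()}
valid_percent = {answer: pct(count, n_valid) for answer, count in valid.items()}

# The mode is the answer with the largest count.
mode = max(valid, key=valid.get)

print(n_valid, n_total, percent, valid_percent, mode)
```

The gap between the two percentage columns comes entirely from the 726 missing cases being left out of the Valid Percent base.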


b.
Statistics
ABORTION IF WOMAN WANTS FOR ANY REASON

N      Valid      1248
       Missing     726
Mode                 2
The mode is a measure of central tendency computed by determining which response is most common. A mode of 2 (no) for abany means that the response “no” was given most frequently.
c.
Statistics
ABORTION IF WOMAN WANTS FOR ANY REASON

N             Valid      1248
              Missing     726
Minimum                     1
Maximum                     2
Percentiles   25         1.00
              50         2.00
              75         2.00
Percentile is a measure of dispersion. Here the 25th percentile is 1 (yes), calculated by finding the case 25% of the way up through a list of cases ordered from smallest magnitude to greatest magnitude. The 50th and 75th percentiles can be calculated the same way. A 25th percentile of 1 (yes) means that 25% of the data has a value of 1 (yes) or less. A 50th percentile of 2 (no) means that 2 (no) is a value equal to or greater than 50% of the data.
4. Two Categorical Variables: abany and advfront.
a. abany and advfront are my two categorical variables. abany is the independent variable and advfront is the dependent variable, because a person’s view on whether or not abortion ought to be legal is more likely to affect their opinion of how important science research is than the other way around. The logic here is that if someone wants abortion to be legal, they will want better science in order to have smoother abortions, and will therefore want more money put into science research.
b. Crosstab between advfront and abany:
SCI RSCH IS NECESSARY AND SHOULD BE SUPPORTED BY FEDERAL GOVT * ABORTION IF WOMAN WANTS FOR ANY REASON Crosstabulation
(Rows: SCI RSCH IS NECESSARY AND SHOULD BE SUPPORTED BY FEDERAL GOVT. Columns: ABORTION IF WOMAN WANTS FOR ANY REASON. Percentages are % within ABORTION IF WOMAN WANTS FOR ANY REASON.)

                                  YES       NO       Total
Strongly agree      Count          80        68        148
                    % within     27.9%     20.7%     24.0%
Agree               Count         175       215        390
                    % within     61.0%     65.3%     63.3%
Disagree            Count          23        37         60
                    % within      8.0%     11.2%      9.7%
Strongly disagree   Count           9         9         18
                    % within      3.1%      2.7%      2.9%
Total               Count         287       329        616
                    % within    100.0%    100.0%    100.0%
c. 61.0 percent of respondents who want abortion legal for any reason agree that science research is necessary, while 65.3 percent of those who do not want it legal agree. This begins to lead me to reject my anticipated results, because a larger share of the respondents who do not want abortion legal agree with supporting science. However, when looking at the strongly agree row, my hypothesis begins to look true again: 27.9 percent of respondents who said yes to abany strongly agree, compared with 20.7 percent of those who said no, a difference of 7.2 percentage points.
d. Test the hypothesis that there is a relationship between the two variables in the population from which the sample is drawn
I. Assumptions
a. L of M: 1 nominal (abany), 1 ordinal (advfront), both treated as categorical
b. Sampling: random
c. sample size: 616, 4 x 2 table
d. Population distributions: N/A
II. Hypotheses:
a. Ho = no relationship between groups abany and advfront
b. Ha = relationship between the two variables
III. Test Statistic
a. chi-squared
b. df = 3
c. X^2 = 5.504
Chi-Square Tests

                                Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square              5.504 (a)    3   .138
Likelihood Ratio                5.515        3   .138
Linear-by-Linear Association    3.177        1   .075
N of Valid Cases                616

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 8.39.
IV. p-value and interpretation
a. p = .138
b. If Ho were true, the probability of getting a chi-squared value at least as large as the one we got (5.504) is .138.
V. Conclusion
a. Given p = .138, we fail to reject the Ho and do not find support for Ha.
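The chi-squared test in part d can be reproduced from the crosstab counts. The sketch below is a hedged illustration in plain Python rather than SPSS output: it computes expected counts under Ho, sums the squared deviations, and uses the closed-form upper-tail probability that holds only for 3 degrees of freedom. The function name `chi2_sf_df3` is hypothetical.

```python
import math

# Observed counts: rows are advfront (strongly agree ... strongly disagree),
# columns are abany (yes, no).
observed = [
    [80, 68],
    [175, 215],
    [23, 37],
    [9, 9],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
min_expected = float("inf")
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / n
        min_expected = min(min_expected, expected)
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(chi2)
print(round(chi2, 3), df, round(p_value, 3), round(min_expected, 2))
```

Because p is larger than the conventional .05 cutoff, the sketch leads to the same decision as the conclusion above: Ho is not rejected.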
e. Cramer’s V is an X^2-based measure of association. I chose it because it is based on X^2 and is also suitable for tables bigger than 2×2, because it corrects a problem that comes up with Phi in tables bigger than 2×2 (Phi can exceed 1). Here we observe a very weak relationship, because the value of Cramer’s V (.095) is on the low end of the “weak relationship” scale, which extends from 0 to 0.3.
Symmetric Measures

                                    Value   Approx. Sig.
Nominal by Nominal   Phi            .095    .138
                     Cramer’s V     .095    .138
N of Valid Cases                    616

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
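For reference, Cramer’s V can be recovered from the chi-squared value and the table dimensions. This is a small sketch under the standard textbook formula, V = sqrt(X^2 / (N × min(rows − 1, columns − 1))); the variable names are illustrative only.

```python
import math

chi2 = 5.504        # Pearson chi-squared from the Chi-Square Tests table
n = 616             # N of valid cases
rows, cols = 4, 2   # advfront has 4 categories, abany has 2

# Phi ignores the table dimensions; Cramer's V rescales by the smaller dimension.
phi = math.sqrt(chi2 / n)
cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

print(round(phi, 3), round(cramers_v, 3))
```

With only two columns, min(rows − 1, columns − 1) is 1, which is why Phi and Cramer’s V take the same value in this table.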
f. Gamma is a PRE measure of association. The value of gamma between abany and advfront is .156, which means that we make 15.6% fewer errors predicting the order of a pair of respondents on advfront when we know their order on abany than when we don’t. I chose this measure of association because I have an ordinal level variable as well as a nominal level variable. Ordinal measures of association are appropriate for a mixed test like this (one ordinal and one nominal level variable), unless there are more than two options for the nominal level variable (Agresti and Finlay 246).
Symmetric Measures

                              Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Ordinal by Ordinal   Gamma    .156    .074                    2.088           .037
N of Valid Cases              616

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
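Gamma compares concordant pairs (two respondents ordered the same way on both variables) with discordant pairs (ordered in opposite ways). The following Python sketch counts both kinds of pairs from the crosstab, assuming the category order used in the table (strongly agree first, yes before no); it is an illustration, not the SPSS routine.

```python
# Rows: advfront from strongly agree (1) to strongly disagree (4).
# Columns: abany, yes (1) then no (2).
table = [
    [80, 68],
    [175, 215],
    [23, 37],
    [9, 9],
]

n_rows, n_cols = len(table), len(table[0])
concordant = 0
discordant = 0

for i in range(n_rows):
    for j in range(n_cols):
        # Pair each cell only with cells in later rows so no pair is counted twice.
        for k in range(i + 1, n_rows):
            for m in range(n_cols):
                if m > j:      # higher on both variables
                    concordant += table[i][j] * table[k][m]
                elif m < j:    # higher on one variable, lower on the other
                    discordant += table[i][j] * table[k][m]

gamma = (concordant - discordant) / (concordant + discordant)
print(concordant, discordant, round(gamma, 3))
```

Pairs tied on either variable are ignored, which is one reason gamma tends to come out larger than other ordinal measures computed on the same table.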

5.
a. Research hypothesis for abany (dependent variable) and age (independent variable), where μ1 is the mean age of respondents who answered no and μ2 is the mean age of respondents who answered yes:
Ho: There is no relationship between abany and age. μ1 – μ2 ≤ 0
Ha: There is a relationship between abany and age. μ1 – μ2 > 0
I expect there to be a relationship between these two variables because as someone gets older, their perspectives tend to change, and there may be a common direction in which the beliefs tilt. Another, more realistic reason is that the older respondents matured during a time when the common belief was against abortion, while younger respondents matured when pro-abortion views were more common.
b. The mean for age is 48.19. For Yes the mean age is 46.63, and for No the mean age is 48.88. There may be a slight relationship between the two variables, but age doesn’t seem to have much effect on abany, because the mean age for No was only 2.25 years different from the mean age for Yes.
Statistics

                 ABORTION IF WOMAN WANTS FOR ANY REASON   AGE OF RESPONDENT
N      Valid     1248                                     1969
       Missing   726                                      5
Mean             1.56                                     48.19

Descriptives
AGE OF RESPONDENT, by ABORTION IF WOMAN WANTS FOR ANY REASON

YES                                               Statistic   Std. Error
Mean                                              46.63       .691
95% Confidence Interval for Mean   Lower Bound    45.27
                                   Upper Bound    47.99
5% Trimmed Mean                                   46.18
Median                                            46.00
Variance                                          263.871
Std. Deviation                                    16.244
Minimum                                           18
Maximum                                           89
Range                                             71
Interquartile Range                               24
Skewness                                          .310        .104
Kurtosis                                          .531        .207

NO                                                Statistic   Std. Error
Mean                                              48.88       .689
95% Confidence Interval for Mean   Lower Bound    47.52
                                   Upper Bound    50.23
5% Trimmed Mean                                   48.45
Median                                            49.00
Variance                                          327.119
Std. Deviation                                    18.086
Minimum                                           18
Maximum                                           89
Range                                             71
Interquartile Range                               30
Skewness                                          .236        .093
c.
I. Assumptions
a. L of M: 1 nominal (abany), 1 ratio (age).
b. M of S: random
c. Sample size: 1248 valid responses = n (variables from same sample, so there’s only one sample size to report)
d. Population distribution: N/A
II. Hypotheses
a. Ho: there is no relationship between abany and age, μ1 – μ2 ≤ 0
b. Ha: there is a relationship between abany and age, μ1 – μ2 > 0
III. Test
a.
One-Sample Test (Test Value = 0)

                                                                                       95% Confidence Interval of the Difference
                                          t         df     Sig. (2-tailed)   Mean Difference   Lower    Upper
ABORTION IF WOMAN WANTS FOR ANY REASON    110.598   1247   .000              1.556             1.53     1.58
AGE OF RESPONDENT                         120.908   1968   .000              48.193            47.41    48.98
IV. p-value and interpretation
a. p < .001 (SPSS reports .000)
b. If Ho were true (μ1 – μ2 ≤ 0), then the probability of getting a sample difference of means as far above 0 as we got (2.25 years) is less than .001.
V. Conclusion
a. We reject Ho and therefore find support for the alternative Ha (μ1 – μ2 > 0).
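The one-sample output tests each variable’s mean against 0 rather than comparing the mean ages of the two abany groups. As a hedged cross-check, a two-group statistic can be sketched from the group means and standard errors reported in the Descriptives table. The sketch assumes the yes and no groups are independent samples and uses a normal approximation for the p-value because the group sizes are large; it is not SPSS’s T-TEST output.

```python
import math

# Group means and standard errors of the mean, from the Descriptives table.
mean_yes, se_yes = 46.63, 0.691
mean_no, se_no = 48.88, 0.689

# Difference in mean age (no minus yes) and its standard error,
# without assuming equal variances in the two groups.
diff = mean_no - mean_yes
se_diff = math.sqrt(se_yes ** 2 + se_no ** 2)

t_stat = diff / se_diff

# Two-sided p-value from the standard normal distribution (large-sample assumption).
p_two_sided = math.erfc(abs(t_stat) / math.sqrt(2))
p_one_sided = p_two_sided / 2   # for the directional Ha: mu1 - mu2 > 0

print(round(diff, 2), round(se_diff, 3), round(t_stat, 2))
```

Under these assumptions the statistic is far smaller than the one-sample t values, so the size of the evidence against Ho should be read from a two-group comparison like this one rather than from the test against 0.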
d. This relationship is very weak, because the gamma value is .067.
Symmetric Measures

                              Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Ordinal by Ordinal   Gamma    .067    .033                    2.021           .043
N of Valid Cases              1243

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

6.
a. My two variables are age (ratio) and adults (ratio). Age is the independent variable and adults is the dependent variable, because a respondent’s age affects the number of adults 18 and older living in their household.
b. scatter plot of age (x) vs. adults (y)
The younger the respondent the higher the likelihood that they have a greater number of members in their household.
c. The unstandardized coefficient for age of respondent is -.012, which tells us that there is a negative slope: for every one-year increase in age there is a decrease of .012 in the predicted number of people 18 and older in one’s household. The unstandardized coefficient in column B, row (Constant), has a value of 2.473. This is the y-intercept of the line, the predicted number of household adults at age 0.
Coefficients (a)

                            Unstandardized Coefficients   Standardized Coefficients
Model                       B        Std. Error           Beta                        t         Sig.
1   (Constant)              2.473    .053                                             47.022    .000
    AGE OF RESPONDENT       -.012    .001                 -.258                       -11.771   .000

a. Dependent Variable: HOUSEHOLD MEMBERS 18 YRS AND OLDER


d. y = -.012x + 2.473.
y = -.012(21) + 2.473
y = 2.221 people. The predicted number of people 18 and older in a 21-year-old’s household is 2.221.
y = -.012(55) + 2.473
y = 1.813 people. The predicted number of people 18 and older in a 55-year-old’s household is 1.813.
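The hand calculations can be checked with a few lines of Python. This is a minimal sketch using the reported intercept of 2.473 and a slope of -.012 (negative, as described in part c); the function name `predict_adults` is hypothetical.

```python
INTERCEPT = 2.473   # (Constant) from the Coefficients table
SLOPE = -0.012      # change in household adults per additional year of age


def predict_adults(age):
    """Predicted number of household members 18 and older for a given age."""
    return INTERCEPT + SLOPE * age


for age in (21, 55):
    print(age, round(predict_adults(age), 3))
```

The two predictions differ by 34 years × .012, or about 0.41 adults, which shows how shallow the fitted line is.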
e.
I. assumptions
a. L of M: two ratio (from the same sample): adults and age
b. M of S: random
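c. sample size: n = 1953 people with valid values on both variables
d. Population distributions: N/A because of the central limit theorem (sample size over 30).
II. hypotheses:
a. Ho: there is no relationship, population slope = 0
b. Ha: there is a relationship, population slope ≠ 0
III. test statistic
a. t = -11.771 for the slope of age, taken from the Coefficients table in part c. The one-sample output below tests each mean against 0, not the slope.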
One-Sample Test (Test Value = 0)

                                                                                   95% Confidence Interval of the Difference
                                      t         df     Sig. (2-tailed)   Mean Difference   Lower    Upper
AGE OF RESPONDENT                     120.908   1968   .000              48.193            47.41    48.98
HOUSEHOLD MEMBERS 18 YRS AND OLDER    101.007   1957   .000              1.891             1.85     1.93
IV. interpret p-value
a. p < .001 (SPSS reports .000)
b. If Ho were true (population slope = 0), then the probability of getting a slope at least as far from 0 as the one we got (-.012 household members 18 and older per year of age) is less than .001.
V. conclusion
a. We reject Ho and therefore find support for the alternative Ha (population slope ≠ 0).
f. R square is .066, meaning that age explains 6.6% of the variation in the number of household members 18 and older. Pearson’s r is -.258, meaning there is a weak negative linear relationship: as age goes up, the number of adults in the household tends to go down.
Model Summary and Parameter Estimates
Dependent Variable: HOUSEHOLD MEMBERS 18 YRS AND OLDER

            Model Summary                              Parameter Estimates
Equation    R Square   F         df1   df2    Sig.    Constant   b1
Linear      .066       138.559   1     1951   .000    2.473      -.012

The independent variable is AGE OF RESPONDENT.
Correlations

                                                                        HOUSEHOLD MEMBERS    AGE OF
                                                                        18 YRS AND OLDER     RESPONDENT
HOUSEHOLD MEMBERS 18 YRS AND OLDER   Pearson Correlation                1                    -.258 (**)
                                     Sig. (2-tailed)                                         .000
                                     Sum of Squares and Cross-products  1342.611             -7374.530
                                     Covariance                         .686                 -3.778
                                     N                                  1958                 1953
AGE OF RESPONDENT                    Pearson Correlation                -.258 (**)           1
                                     Sig. (2-tailed)                    .000
                                     Sum of Squares and Cross-products  -7374.530            615657.277
                                     Covariance                         -3.778               312.834
                                     N                                  1953                 1969

**. Correlation is significant at the 0.01 level (2-tailed).
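The slope, intercept, Pearson’s r, and R square are tied together through the variances and covariance in the Correlations table. The sketch below illustrates those links in Python. It assumes the covariance between age and adults is negative (-3.778), consistent with the negative slope described in part c, and it borrows the two means from the one-sample output, which rest on slightly different numbers of cases, so the results are approximate.

```python
import math

cov_xy = -3.778     # covariance of age and adults (sign assumed negative)
var_x = 312.834     # variance of age (the independent variable)
var_y = 0.686       # variance of adults (the dependent variable)
mean_x = 48.193     # mean age
mean_y = 1.891      # mean number of household adults

# OLS slope and intercept for a one-predictor regression.
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

# Pearson's r standardizes the covariance; R square is its square.
r = cov_xy / math.sqrt(var_x * var_y)
r_squared = r ** 2

print(round(slope, 3), round(intercept, 3), round(r, 3), round(r_squared, 3))
```

Since r is only about a quarter of the way from 0 to -1, squaring it leaves a small share of explained variation, which matches the weak relationship reported in part f.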
I wrote this report for the Gordon College course Statistics for Social Research. Find the syllabus below.