### The Department of Psychology

Lab 4: Two-Sample Independent and Dependent t-Tests

Consider the question, "Are females better students than males?" Let's assume you have collected information on the scholastic performance of 35 females and 28 males currently enrolled at USC. How would you answer the question above?

A good first step is to look at descriptive statistics for your data; these should give you some ideas.

- Retrieve file "**lab4_ind.sav**". Set the path to **C:\MYDOCUMENTS** to find the file.

Given the way the data file is structured, we cannot describe males and females separately unless we tell SPSS to temporarily split the data file into two segments based on gender. Then we will be able to describe differences between genders on the variable GPA.

**Splitting the file**

- Make sure you are in the "Data Editor" window.

- Click on "Data" and then "Split File".

- Select "Organize Output by Groups".

- Select variable "Gender" and move it into "Groups Based On".

- Click on the PASTE button.

- Go to the Syntax window and run the pasted commands.

**Describing data within the split file environment**

As you'll see, this is no different from describing a variable in the un-split environment.

- Click on "Statistics", select "Summarize", then click on "Descriptives".

- Select variable "GPA" and move it into "Variables" field.

- Click on the OPTION button.

- Select "Mean" and "S. E. Mean"; deselect "Minimum", "Maximum", and "Std. Deviation".

- Click on the CONTINUE button, then the PASTE button.

- Run the pasted commands.

Table 1 shows the output. Examining the means, notice that the mean GPA for the females (2.28) is higher than the mean GPA for the males (2.10). What does this mean? Are females scholastically more apt than males? Well, as you have learned, estimates of population parameters from sample data fluctuate from sample to sample as a product of random sampling. To answer our question about differences in academic performance between genders, we have to compare the difference between the average performance of males and females to the standard error of that difference. We can get a rough idea of whether the difference is likely to be significant by examining confidence intervals built from the standard errors (given in the output of Table 1).
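To make "compare the difference to its standard error" concrete, here is a minimal Python sketch. The GPA values are hypothetical toy numbers, not the contents of lab4_ind.sav, and the standard error of the difference uses the unpooled form (square root of the summed squared standard errors), which is one of two common choices; the function names are our own.

```python
import math
import statistics


def standard_error(scores):
    """Standard error of the mean: sample SD (n - 1 denominator) / sqrt(n)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


def se_of_difference(group1, group2):
    """Unpooled SE of the difference between two independent means."""
    return math.sqrt(standard_error(group1) ** 2 + standard_error(group2) ** 2)


# Hypothetical toy GPA values (NOT the lab4_ind.sav data).
males = [2.0, 2.4, 1.8, 2.2]
females = [2.5, 2.1, 2.3, 2.4]

# Difference is males minus females, the same order SPSS uses later.
diff = statistics.mean(males) - statistics.mean(females)
print(f"difference = {diff:.3f}")
print(f"SE of difference = {se_of_difference(males, females):.3f}")
```

Roughly speaking, the larger the difference is relative to its standard error, the less plausible it is that sampling fluctuation alone produced it.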

Table 1. Descriptive statistics (mean and standard error of the mean) for GPA, split by gender (Gender Of Students = Male; Gender Of Students = Female).

**Getting a rough idea using 95% CI**

Before we can do this, let's un-split the data file.

- Make sure you are in the "Data Editor" window.

- Click on "Data", then "Split File".

- Select "Analyze All Cases" and then click on "PASTE" button.

- Run the pasted commands.

Now let's build the CIs.

- Make sure you are in the "Data Editor" window.

- Click on "Graph", then "Error Bar".

- Select "Simple" and "Summaries for Groups of Cases", then click on DEFINE.

- Select variable "GPA" and move it into the "Variable" field.

- Select variable "Gender" and move it into the "Category Axis" field.

- Notice that by default we have a 95% CI for the mean.

- Click on the PASTE button, then run the pasted command.

Examine the Graph window. Notice that the CIs for the two means overlap. That is an indication that the difference between the means is probably not statistically significant. This is **not** equivalent to conducting the t-test. However, in many cases, this strategy will produce results that agree with the conclusions obtained from the t-test.
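The overlap check can be sketched in a few lines of Python. This is a minimal illustration, assuming each CI is mean ± t-critical × SE; the means come from Table 1, but the standard errors below are hypothetical placeholders (Table 1's values are not reproduced here), and the critical values are the two-tailed .05 table values for df = 27 and df = 34 (28 males, 35 females).

```python
def confidence_interval(mean, se, t_crit):
    """CI for a mean: mean +/- t_crit * SE."""
    return (mean - t_crit * se, mean + t_crit * se)


def intervals_overlap(ci_a, ci_b):
    """True if the two (lower, upper) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Means from Table 1; the SEs (0.15, 0.13) are hypothetical placeholders.
male_ci = confidence_interval(2.10, 0.15, 2.052)    # t critical, df = 27
female_ci = confidence_interval(2.28, 0.13, 2.032)  # t critical, df = 34
print(male_ci, female_ci)
print(intervals_overlap(male_ci, female_ci))  # True
```

As the text says, overlap is only a heuristic: two intervals can overlap slightly even when a t-test on the difference would be significant.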

Since we now expect no difference in GPA between the two groups, let's conduct a two-sample t-test to check this expectation.

1. Formulating the hypotheses (null, $H_0$, and alternative, $H_1$)

The alternative hypothesis ($H_1$) states what the experimenter hopes to show. In other words, we hope to demonstrate that there is a difference between males and females in their scholastic performance as measured by their GPA.

The null hypothesis ($H_0$) reflects the situation that the experimenter hopes to disprove: in this case, that there is no difference between males and females in their scholastic performance as measured by their GPA.

Notice that the two hypotheses, combined, express competing ideas about the state of the world.

2. Selecting a significance level $\alpha$

For this lab, we are going to use $\alpha = .05$. We could be more conservative, accepting fewer Type I errors, by using $\alpha = .01$.

3. Setting the Decision Stage

A. Choosing a statistical test

In this problem, we are comparing two sample means. Since the population means and standard deviations are unknown, the t-test is the correct choice. However, there are two possibilities: the **independent** two-sample t-test and the **dependent** two-sample t-test.

Dependent means that there is a relationship between pairs of scores collected for the two groups. For example, if we had equal numbers of males and females, where brothers and sisters were selected from the same family, we would have dependent scores. This assumes that a common family genetic heritage contributes to GPA. Another possibility is to have the same subjects participate in both conditions of our experiment. Then each subject's scores in the two conditions would be dependent, because the subject's ability contributes to both scores in the pair obtained on our dependent measure. Scores for two groups can be dependent only if they are paired on the basis of their common dependency; consequently, both conditions of the independent variable must have the same number of scores.

Independent means there is no relationship between the scores collected on the two groups. That is, there is no common influence on the scores from the two groups.

Because the two groups in our case have unequal sample sizes (35 females and 28 males), the scores cannot be paired, so we know we are dealing with independent groups. Thus, we select the **two-sample independent t-test**.

B. Finding a critical value or values

This step becomes unnecessary when using SPSS, because the information in tables such as Table A-2 (page 519 of your text) is built into the program. SPSS determines the probability of obtaining a t-value at least as extreme as the one observed, assuming $H_0$ is true. If that probability is less than or equal to .05, we reject $H_0$. However, to keep SPSS honest, and to keep you familiar with Table A-2, you should look up the critical value in your text and write it on your printouts.

C. Locating the rejection region

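Figure 1. Two-tailed rejection regions of the t distribution, bounded by the lower bound (LB) and upper bound (UB), with the observed t (t-obs = -.85) and its mirror image (.85) marked.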

In a two-tailed test, the area to the left of the lower bound (LB) and the area to the right of the upper bound (UB) in Figure 1 are each equal to one half of alpha ($\alpha/2$). In our case, each region contains .025 of the area. The P-value given by SPSS is the one associated with the observed value of t computed from the sample data. It is the combined area from the observed t (-.85) leftward and from its mirror image (+.85) rightward, just as the two $\alpha/2$ regions combine to give $\alpha$.
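Both quantities in Figure 1 (the bounds LB and UB, and the two-tailed P-value) are areas under the t curve, so they can be sketched numerically. This minimal Python version is only an illustration of what Table A-2 and SPSS compute; it is not how SPSS does it. It assumes Simpson's rule on the t density is accurate enough for teaching purposes, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def central_area(t, df, steps=2000):
    """Area between -|t| and +|t| by Simpson's rule (steps must be even)."""
    b = abs(t)
    if b == 0:
        return 0.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The curve is symmetric, so double the area from 0 to |t|.
    return 2 * total * h / 3


def two_tailed_p(t_obs, df):
    """Area beyond -|t_obs| plus area beyond +|t_obs|."""
    return max(0.0, 1 - central_area(t_obs, df))


def critical_t(alpha, df):
    """UB of Figure 1 (LB = -UB): alpha/2 of the area lies beyond each bound."""
    lo, hi = 0.0, 1000.0  # assumes alpha is not tiny, so UB < 1000
    for _ in range(60):   # bisection on the central area
        mid = (lo + hi) / 2
        if central_area(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(critical_t(0.05, 10))   # compare with Table A-2, df = 10
print(two_tailed_p(1.0, 1))
```

Plugging in the observed t of -.85 with the df printed on SPSS's UNEQUAL line should give a value close to the reported .397; since t is rounded in the text, expect a small discrepancy.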

D. Formulating the decision rules

Knowing that the **fail to reject $H_0$** region lies between LB and UB, and that the **reject $H_0$** region is the remainder of the area under the curve ($\alpha$), we can formulate the decision rule in terms of the P-value and the Type I error rate ($\alpha$).

We reject $H_0$ if the P-value associated with the observed statistic is less than or equal to $\alpha$.

We fail to reject $H_0$ if the P-value associated with the observed statistic is greater than $\alpha$.
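The decision rule above fits in one small Python function. This is a minimal sketch; the function name `decide` is our own, and the default $\alpha = .05$ matches the level chosen for this lab.

```python
def decide(p_value, alpha=0.05):
    """Decision rule: reject H0 when the P-value is <= alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.397))  # fail to reject H0
print(decide(0.05))   # reject H0 (equality counts as rejection)
```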

4. Calculating the observed statistic

- Click on "Statistics", select "Compare Means", and click on "Independent Sample T-test".

- Select variable "GPA" and move it into the "Test Variable" field.

- Select variable "Gender" and move it into the "Grouping Variable" field.

- Click on the now-active DEFINE GROUPS button.

- In the "Group 1" field, type 1 (the code for the male group).

- In the "Group 2" field, type 2 (the code for the female group).

- Click on the CONTINUE button.

- Click on the OPTION button.

- Since we chose alpha = .05, we leave value 95 in the field "Confidence interval".

- Click on the CONTINUE button, then the PASTE button.

- Go to the syntax window and run the t-test command.

- Click on the rectangle (maximize button) in the upper right corner of the output window to enlarge it.

Table 2 shows the SPSS output. The first part of the output provides the descriptive statistics, as we have seen in Table 1.

The middle part of the output provides the mean difference between males and females. (This difference is computed by subtracting the group 2 mean from the group 1 mean. Since we labeled males as group 1 and females as group 2, the difference was obtained by subtracting the female mean GPA from the male mean GPA.) As you can see, the difference is very small, only -.1737.

Ignore Levene's test for the equality of variances.

The bottom part of the output provides the observed t statistic (called t-value), df, P-value for the two tail hypothesis (called Two Tail Sig), standard error of the difference, and the 95% CI on the difference. For reasons beyond what you will learn in this class, always use the line of the output labeled **UNEQUAL**. (In general, this version of the t-test will produce results closer to the alpha level we chose to use).
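For readers curious about what the UNEQUAL line computes, here is a minimal Python sketch. We assume that line is the separate-variance (Welch) t-test with Welch-Satterthwaite degrees of freedom, which is why its df is usually not a whole number; the data are hypothetical toy values, not the lab4_ind.sav scores, and `welch_t` is our own name.

```python
import math
import statistics


def welch_t(group1, group2):
    """Separate-variance t statistic and Welch-Satterthwaite df.

    The difference is group1 minus group2, matching the lab's
    convention of subtracting group 2 (females) from group 1 (males).
    """
    v1 = statistics.variance(group1) / len(group1)  # squared SE of mean 1
    v2 = statistics.variance(group2) / len(group2)  # squared SE of mean 2
    t = (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(group1) - 1)
                           + v2 ** 2 / (len(group2) - 1))
    return t, df


# Hypothetical toy data, not the lab4_ind.sav scores.
t, df = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)  # -2.0 8.0
```

With equal sample sizes and equal variances, as in this toy example, df equals $n_1 + n_2 - 2$; when the variances differ, df drops below that value.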

Table 2

5. Making a decision

See Figure 1. The P-value (.397) is greater than $\alpha = .05$. Recall that the P-value is the area in the two outlying tails of the sampling distribution. In this case, .397 of the area lies in the tails beyond the observed t (-.85) and its mirror image (.85). This area is greater than the combined area beyond LB and UB ($\alpha$). That can happen only when the observed t lies between LB and UB, in the fail-to-reject $H_0$ region, so we conclude that there is not enough evidence to suggest that the null hypothesis is incorrect, and we fail to reject $H_0$. What does that mean? We have found no evidence that males and females differ in their scholastic performance as measured by GPA; this is not the same as proving that they perform equally.

Regarding Type I and Type II errors: a Type I error results when the null hypothesis is true and we mistakenly reject it (the probability of this happening is $\alpha$, the significance level). A Type II error occurs when we fail to reject the null hypothesis even though it is false. Often, the probability of this happening is unknown.

In our case, we have failed to reject the null hypothesis, so the only error we could be making is a Type II error: there is an unknown probability that the null hypothesis is false.

Assignment:

1. Turn to the data on page 214 of your textbook. You have two teaching conditions. Enter the data into a new data window in SPSS, using a format similar to that of the GPA data analyzed in our class example: enter all of the data into one column, and enter a coding variable (1 or 2) indicating the teaching condition in a second column. Then, test the hypothesis that the computer stimulated teaching produces different results from the conventional teaching approach. Use the steps described in the lab example above.

2. Now, consider the following problem. You have just concluded an experiment on human recall. Your experiment had two conditions: (1) 25 subjects were allowed to study a list of unrelated words for 1 minute. Then, they waited for 5 minutes in silence and were asked to recall as many words from the list as possible. (2) These same subjects were again presented with a list of unrelated words (a different word list) for 1 minute. This time, however, the subjects waited for 5 minutes listening to the deafening roar of heavy metal music. Then, again, the subjects were asked to recall as many words as possible from the second list. About half of your subjects experienced condition 1 before condition 2, and the other half experienced the reverse order. This was to ensure that order effects did not bias your findings. The question you are trying to answer is: "Is there a difference in recall between these two conditions?" **Do not forget to discuss the possibility of committing an error (be careful here)!**

The data are in the file "**lab6_ass.sav**".

Hint: Are these groups dependent or independent? If you conclude that the scores are dependent, the commands you will need to analyze the data are presented below.

**Example for Conducting a Two-Sample t-Test for Dependent Scores**

- Click on "Statistics", select "Compare Means", and click on "Paired Samples T Test".

- Select two variables ("Metal" and "Quiet") and move them into the "Paired Variables" field.

- Click on the OPTION button.

- Since alpha = .05, leave value 95 in the field "Confidence interval".

- Click on the CONTINUE button, then the PASTE button.

- Go to the syntax window and run the t-test command.

Sample output from the Dependent t-test:

The first part of the output gives basic descriptive statistics for each variable separately. It also includes a measure of association called the correlation; a correlation of .688 indicates a strong positive relationship between the two variables.

The second part gives you an idea of how dependent the two samples are, using the Pearson correlation coefficient.

The third part of the output (its left side) provides descriptive statistics for the individual differences, computed by subtracting the second variable (QUIET) from the first (HEAVY METAL).

The right side of this part provides the observed t (t-value), df, and P-value (2-tailed significance).
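The dependent t-test works on the individual differences: t is the mean difference divided by its standard error, with df = (number of pairs) - 1. The Python sketch below shows this, together with the Pearson correlation that measures how strongly the paired scores are related. The scores are hypothetical (four made-up subjects, not lab6_ass.sav), and the function names are our own; differences are taken first minus second (Metal minus Quiet), following the order described above.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t-test on differences first - second; returns (t, df)."""
    if len(first) != len(second):
        raise ValueError("paired scores need equal-length lists")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical recall scores for four subjects (not lab6_ass.sav).
metal = [5, 7, 6, 4]
quiet = [6, 9, 7, 6]
print(paired_t(metal, quiet))
print(pearson_r(metal, quiet))
```

Because each subject contributes to both scores, the differences remove much of the between-subject variability; the stronger the correlation, the smaller the standard error of the mean difference tends to be.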