LIN 351 Week 8 HW: Distributional analysis of your (r) data

You have coded a lot of data about how tokens of (r) are pronounced. Now it's time to see what differences there are between groups of speakers and between linguistic contexts. What differences do you expect? That is, which groups or contexts will have more [r-1] than others? Which do you think will have less?

State two hypotheses indicating your expectations. Make sure they are clear and objective, with no hedging. One should be about a predicted social factor effect and the other about a predicted linguistic factor effect.

To prepare your data to test your hypotheses, first, open your token file (the one with all your speakers' data) in Excel.

Choose how to sort your data to show the distribution of each variant of the dependent variable across each level of the independent variable you are testing first.

The best way to do this is with Pivot Tables, a valuable Excel tool.

Select all your data.
Choose "Summarize with Pivot Table" from Excel's Data menu.
When prompted, choose "New worksheet" for the location of your Pivot Table. You shouldn't need to fill in anything else in that pop-up window. Click "OK."
Either experiment to see whether you can work out how to build the tables you need, or follow the instructions below.

In the new sheet, you'll see "PivotTable Fields" on the right.
From the "Field Name" list at the top, select the name of your dependent variable column and drag it to the "Columns" field. Also drag it to the "Values" field.
In the "Values" field, click the little "i" to the right of the variable name and make sure that "Count" is selected.
Select the name of one independent variable column and drag it to the "Rows" field.
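
If you want to check your pivot table against an independent count, the same cross-tabulation can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the assignment: the token rows, the factor name (speaker sex), and the variant labels are made-up placeholders, so substitute the columns from your own token file.

```python
from collections import Counter

# Hypothetical token rows: (speaker_sex, r_variant). These values are
# invented for illustration and do not come from any real token file.
tokens = [
    ("F", "r-1"), ("F", "r-0"), ("F", "r-1"),
    ("M", "r-0"), ("M", "r-0"), ("M", "r-1"),
]

# Count tokens for each (independent level, dependent variant) pair,
# which mirrors the pivot table's "Count" setting.
counts = Counter(tokens)

levels = sorted({level for level, _ in tokens})        # pivot table rows
variants = sorted({variant for _, variant in tokens})  # pivot table columns

# Build the table as a nested dict: table[level][variant] -> count.
table = {lvl: {var: counts[(lvl, var)] for var in variants} for lvl in levels}

# Print one row per level, with a row total at the end.
print("level", *variants, "total", sep="\t")
for lvl in levels:
    row = table[lvl]
    print(lvl, *(row[var] for var in variants), sum(row.values()), sep="\t")
```

The rows correspond to the "Rows" field and the columns to the "Columns" field, so the numbers should match what Excel shows for the same data.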

Your new pivot table displays a list of how many tokens for each variant of your dependent variable exist in each level of your independent variable. [See sample pivot table for a small subset of data.] If this looks good, hide any "blank" rows/columns, then copy the table and "Paste Special > Values" to a different (new) sheet in your Excel file.
In the copied version, add a column to calculate the "Percent [r]".
In the row for the first level of the independent variable, in that new column, type a formula that gives you the count of [r-1] / total for that row.
Fill that formula down to every row of the table.
Choose the "%" format from the Number formatting options. Here is an illustration of the copied table and then an improved version of it below.
You can now see how often [r-1], rather than [r-0], was produced for each level of the independent variable, across all your speakers. This is your index score for that group of speakers.
What does it show you? Do the numbers seem to support your hypothesis?
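
The percentage calculation described above (count of [r-1] divided by the row total) can also be written out as a short Python sketch. The counts here are hypothetical stand-ins for a copied pivot table, and the function name `percent_r1` is just an illustrative choice.

```python
# Hypothetical counts copied from a pivot table: level -> {variant: count}.
# These numbers are invented for illustration only.
table = {
    "F": {"r-0": 1, "r-1": 2},
    "M": {"r-0": 2, "r-1": 1},
}


def percent_r1(row):
    """Return the share of [r-1] tokens in one row, as a percentage."""
    total = sum(row.values())
    if total == 0:
        return None  # no tokens at this level, so no percentage to report
    return 100 * row.get("r-1", 0) / total


# One percentage per level of the independent variable.
percents = {level: percent_r1(row) for level, row in table.items()}

for level, pct in percents.items():
    label = "n/a" if pct is None else f"{pct:.1f}%"
    print(f"{level}: {label} [r-1]")
```

This is the same arithmetic as the Excel formula you fill down the new column; the zero-total check simply avoids dividing by zero for a level that has no tokens.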

Repeat these steps to make a similar table for each linguistic and each social independent variable, always reporting the dependent variable in your columns.

Submit these tables for your Week 8 homework.
For two of them, include your hypothesis and a sentence indicating how the data do or don't support the hypothesis.

Note: So far, you have been doing univariate analysis: looking at only one independent variable at a time. Later, you will learn about analysis with more than one independent variable included. This is very important when your data set does not have a balanced distribution of every combination of every independent variable. That is, when you are dealing with real-world data.
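
To see what an unbalanced distribution looks like, you could count how many tokens fall into each combination of two independent variables. The sketch below is illustrative only: the two factors (speaker sex and following context) and all the token rows are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical tokens: (speaker_sex, following_context). Invented data.
tokens = [
    ("F", "pause"), ("F", "pause"), ("F", "consonant"),
    ("M", "consonant"), ("M", "consonant"),
]

# Tokens per combination of the two independent variables.
cells = Counter(tokens)

sexes = sorted({sex for sex, _ in tokens})
contexts = sorted({context for _, context in tokens})

# Combinations that never occur in the data are a sign of imbalance.
empty = [(s, c) for s, c in product(sexes, contexts) if cells[(s, c)] == 0]

for s, c in product(sexes, contexts):
    print(f"{s} / {c}: {cells[(s, c)]} tokens")
print("Empty combinations:", empty)
```

In this made-up data, one combination has no tokens at all, so a table built from one independent variable at a time could hide the effect of the other.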


Updated February 17, 2025