Downloading Goldvarb
You can download Goldvarb in three versions (depending on your operating system)
from http://individual.utoronto.ca/tagliamonte/goldvarb.htm. There are versions for Macintosh, Windows, and Linux. For an (older) DOS version, see: http://www.crm.umontreal.ca/~sankoff/Varbrul_MS-DOS.html.
For manuals, see:
- Mac
- Windows
- You will need the Neapean.Tok token file (0.293 Kb) if you want to follow the steps of the tutorial in the manuals. You *might* have to rename the file "Neapean.tkn" to get it to work.
Naomi's instructions for using Goldvarb
- Preparing the Token file
- Preparing the Conditions file
- Create cell and result files
- The Results file
- Preparing for a statistical analysis
- Statistical analysis
- Cross-tabulation
Preparing the Token file
This may be done in Word or Excel or directly in Goldvarb. These
instructions illustrate the latter method. It is also possible to use Goldsearch
to prepare token files for large corpora, if you are feeling computer savvy.
- Open Goldvarb
- Select "New" in the "File" menu.
- Add ".tkn" as a suffix to whatever title you choose.
- Select "Tokens" as the type of new file.
- When it asks you to select number of groups, type the number of
elements you will have in each token. Then hit return or
"OK". You can always change the number of elements, so
just click "OK" and go on if you aren't sure.
- Enter your tokens, one per line.
- Each token line must begin with an open parenthesis "("
- Separate the token itself from following material by at least 3
spaces or a tab.
- Anything in a line that doesn't begin with "(" will be
ignored by the program, so you can label different parts of your
file by writing whatever you want on a line that doesn't begin with
"(".
- There is a "search & replace" command in the
"Tokens" menu which you can use to automate repetitive
tasks. For example, assume every line needs to have an "N"
as the second element of the token (representing the speaker
"Naomi"), and you have used "y" and "n"
as the first element of your token. Select "search &
replace" and replace each "y" with "yN" and
then each "n" with "nN".
- You can cut and paste to and from a token file for editing, just
like in Word.
It will look something like:
(N 1 good morning 1 1
(Y 1 appreciate your 1 2
(Y 1 about your 1 2
(Y 1 breakfast pause 1 2
- Save the token file.
- Checking your token file.
- In the Token menu, choose "Generate factor
specifications." This will make a list of all the characters
you used in each column. You can see these in the little window
below your token file. Click through the factor groups and make sure
there are no anomalous characters (likely indicating typos or
missing one element in a token).
- Alternatively, you can enter the factor specifications into the
"Factor specification" window and then select "Check
tokens," and the program will look for lines that have an
anomalous character (i.e., one that you didn't specify for that
group).
- If it asks you to "Set fill character," just type
"/" and click "OK." That means that if you have any token
string that isn't long enough (you specified the length at the
beginning with "Select number of groups"), it will add
"/" characters to fill out the line. Then you'll see them and
can check what you missed.
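The token-file rules and the two checks above can be sketched outside Goldvarb as follows. This is only an illustration of the logic, not Goldvarb's own code: the function names, the sample lines, and the factor codes (y/n, N, C/V) are hypothetical, and the sketch assumes the coding string sits directly after the "(" and ends at a tab or a run of three or more spaces.

```python
import re
from collections import defaultdict

def read_tokens(lines):
    """Collect the coding string of every line that begins with '('."""
    tokens = []
    for line in lines:
        if not line.startswith("("):
            continue  # label/comment lines are ignored, as in Goldvarb
        # The coding string ends at a tab or at 3 or more spaces.
        coding = re.split(r"\t| {3,}", line[1:].rstrip("\n"), maxsplit=1)[0]
        tokens.append(coding)
    return tokens

def pad_tokens(tokens, n_groups, fill="/"):
    """Fill out short coding strings, like the 'Set fill character' step."""
    return [t.ljust(n_groups, fill) for t in tokens]

def factor_specifications(tokens):
    """List the characters used in each column (factor group)."""
    specs = defaultdict(set)
    for token in tokens:
        for group, factor in enumerate(token, start=1):
            specs[group].add(factor)
    return {group: "".join(sorted(chars)) for group, chars in specs.items()}

# Hypothetical sample file: one label line and three token lines.
sample = [
    "; speaker Naomi",
    "(yNC   good morning",
    "(nNV\tabout your",
    "(yN   appreciate your",  # one element missing
]
tokens = pad_tokens(read_tokens(sample), n_groups=3)
specs = factor_specifications(tokens)
print(tokens)
print(specs)
```

A "/" showing up in the specifications for a group (here group 3) points to a token that is missing an element.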
Preparing the Conditions file
This is where you choose how to sort your data. For starters, you will just
do a general sort of all your data, showing the distribution of each
variant of the dependent variable with respect to each factor of the independent variables.
- In the Token menu, choose "No recode." This will give you a
general overview of the tokens and patterns you have.
- Save your new Conditions file.
- Name it something that will make sense like "all" or
"first" or "Vanessa."
- Add the suffix ".cnd" to the name.
- Note that the program suggests an appropriate name, but it will
always suggest a name that is the same as your token file name, and
you will probably make several different condition files from your
token file.
- Watch it generate a list of conditions, written in the Lisp programming
language. Your condition file will look something like this:
(
; Identity recode: All groups included as is.
(1)
(2)
(3)
)
This means: "Don't do anything to any of the groups. Just use them
all (elements (1), (2) and (3) of each token) as they are, with the
stuff in column (1) as the dependent variable." You can also ignore certain
factors or factor groups, combine groups, add new elements to your token
file to indicate something like "y" in column 1 and
"N" in column 2, etc. See "Preparing
for a statistical analysis."
Create cell and result files
- In the "Cell" menu select "Load cells to memory."
- Click "OK" when it asks whether to use the tokens and
condition files that you see on the screen.
- This won't work if there is anything wrong with your token file or
conditions file.
- A cell file is created, which you can ignore.
- A Results file is also created.
- You will be asked to select the application values. This means,
"how will the dependent variable be examined?" The possible
variants of the dependent variable are listed. You can rearrange their
order and/or erase some of them. Assume the window shows you the string
"YN?".
- If you select "Y" it will do a binary comparison,
comparing tokens coded as "Y" for the dependent variable
to all tokens with any other variant.
- If you select "YN" it will do a binary comparison,
comparing tokens coded with "Y" for the dependent variable
to tokens coded with "N", and ignoring any tokens with
other variants (e.g., those coded with "?").
- If you select 3 or more characters, all tokens with those codes
for the dependent variable will be examined, and the others ignored.
- For a first run-through, select the whole string, but put them in an
order with the more relevant ones first (i.e., put "?" at the
end, as you won't really get much information from it).
- To go on to the statistical analysis, you must select one of the
binary comparison options (listing either one or two variants).
- Save the Results file under an appropriate name. It should match your conditions
file so that you can look back at the Conditions under which the
results were generated.
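The two binary options described above can be sketched as a small filter. This is an illustration of the selection logic only, under the assumption that the dependent variable is the first character of each coding string; the function name and the sample tokens are hypothetical.

```python
def binary_outcomes(tokens, application_values):
    """Pair each retained token with True if it shows the application value.

    One value  ("Y"):  that value vs. every other variant.
    Two values ("YN"): first value vs. second; other variants are dropped.
    """
    if len(application_values) not in (1, 2):
        raise ValueError("a binary comparison needs one or two values")
    application = application_values[0]
    if len(application_values) == 1:
        kept = list(tokens)
    else:
        kept = [t for t in tokens if t[0] in application_values]
    return [(t, t[0] == application) for t in kept]

# Hypothetical coding strings: dependent variable, then following segment.
tokens = ["YC", "NV", "?C", "YV"]
one_value = binary_outcomes(tokens, "Y")    # "?C" counts as a non-application
two_values = binary_outcomes(tokens, "YN")  # "?C" is ignored
print(one_value)
print(two_values)
```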
The Results file
This file shows you the distribution of each variant of the dependent variable with respect
to each factor of the independent variables. It looks like this:
CELL CREATION 11/21/96 12:01 pm
Name of token file: Pat.Tkn
Name of condition file: first.Cnd
(
(1)
(2)
(3)
)
Number of cells: 3
Application value(s): ab
Total no. of factors: 4
Group       Y      N  Total     %
----------------------------------
 1 (2)
   C  N     2      1      3    75
      %    67     33
   V  N     0      1      1    25
      %     0    100   * KnockOut *
 Total N    2      2      4
      %    50     50
----------------------------------
 2 (3)
   N  N     1      2      3    75
      %    33     67
   P  N     1      0      1    25
      %   100      0   * KnockOut *
 Total N    2      2      4
      %    50     50
----------------------------------
 TOTAL N    2      2      4   (extremely small sample for
      %    50     50   demonstration purposes only)
Name of new cell file: Untitled.Cel
This table lists the possible variants of the dependent variable (Y and
N) across the top of the table and the possible factors of each independent
variable, one per row. So this Results file compares tokens with
"Y" vs. "N" as the dependent variable, i.e., deleted
vs. non-deleted (t,d).
The first independent variable examined is following segment, with
"C" for following consonant and "V" for following vowel.
It shows that 67% of the words with a following consonant had deleted (t,d),
but 0% of the words with a following vowel had deleted (t,d).
The second independent variable examined is Speaker, with "P"
for Pat and "N" for Naomi (pseudonyms, of course). We see that
Speaker N deleted (t,d) in 33% of her tokens and Speaker P in 100%.
The word * KnockOut * appears in every line that has a "0" in it.
Finally, we see that overall, for the whole token set,
(t,d) deletion occurred 50% of the time, or in 2 out of 4 tokens.
Note: You can copy and paste this table into a Word document, such as a
research paper. The columns will line up if you choose the Courier font. It
will have strings of spaces rather than tabs, which can be a pain to edit,
but you can fix it up as necessary. You can also edit it right in the
Results window, which can be dangerous. But if you do mess it up, you can
always generate a new Results file by going back
to the "Cell" menu and selecting "Load cells to memory."
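To make the arithmetic behind the table concrete, here is a minimal sketch that recomputes the counts, percentages, and knockouts for one factor group. The four coding strings are hypothetical; they were chosen only so that the counts match the sample table (column 1 is the dependent variable, column 2 the following segment, column 3 the speaker). The function name is likewise an assumption.

```python
from collections import Counter

def distribution(tokens, group, variants="YN"):
    """Counts and percentages of each variant for every factor in a group."""
    counts = Counter((t[group - 1], t[0]) for t in tokens if t[0] in variants)
    rows = {}
    for factor in sorted({f for f, _ in counts}):
        n = [counts[(factor, v)] for v in variants]
        total = sum(n)
        rows[factor] = {
            "n": n,
            "pct": [round(100 * c / total) for c in n],
            "knockout": 0 in n,  # a zero count is flagged as a knockout
        }
    return rows

# Hypothetical tokens reproducing the counts of the sample table.
tokens = ["YCP", "YCN", "NCN", "NVN"]
following = distribution(tokens, group=2)
speaker = distribution(tokens, group=3)
print(following)
print(speaker)
```

Factor V (following vowel) and speaker P each contain a zero count, so both are flagged, just as in the table.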
Preparing for a statistical analysis
In order to find out which variables are significant, you must have a results
file with no "Knockouts," i.e., no
"0"'s. This may mean combining or deleting certain factors or
factor groups. This process is done by creating a new
conditions file.
- Make sure you have principled reasons for the changes you make. For
example, it's ok to combine following stop and following fricative if
there was 95% deletion for stops and 100% for fricatives, because (a)
stops and fricatives are somewhat similar phonetically and (b) the
numbers were fairly similar.
- Select "Recode setup" from the "Tokens" menu.
- For any factor groups that you wish to leave intact, select them
on the left side (clicking on their factor group number) and then
click "Copy."
- To exclude a certain factor, click on it on the left side, then
click "Exclude" and say "ok." Then copy the
factor group. The excluded factor will still show up, but tokens
containing it will be ignored in the analysis.
- To recode a factor group, select it, choose recode, and then type
over the letters on the right to show how you want to recode them.
- Make sure it worked. Do this by going back through steps 3 (Create
cell and result files) and 4 (The Results file)
and making sure that there are no knock-outs. If it didn't work, try making
a new condition file. This may get tedious, so you might want to
copy down your coding as it shows up in the Conditions window. You
should be able to figure out most of the Lisp code.
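The effect of a recode (combining some factors and excluding others) can be sketched as follows. This mimics what the new conditions file does to the data; it is not Goldvarb's Lisp syntax. The function name and the factor codes ("s" for stop, "f" for fricative, "o" for the combined class, "v" for vowel, "?" for unclear) are hypothetical.

```python
def recode(tokens, group, mapping=None, exclude=""):
    """Rewrite one factor group: merge factors via `mapping`, drop `exclude`."""
    mapping = mapping or {}
    recoded = []
    for t in tokens:
        factor = t[group - 1]
        if factor in exclude:
            continue  # tokens with an excluded factor are ignored
        new_factor = mapping.get(factor, factor)
        recoded.append(t[:group - 1] + new_factor + t[group:])
    return recoded

# Hypothetical tokens: dependent variable, then following segment.
tokens = ["Ys", "Yf", "Nv", "Y?"]
combined = recode(tokens, group=2, mapping={"s": "o", "f": "o"}, exclude="?")
print(combined)
```

After a recode like this, the distribution has to be regenerated to confirm that no knockouts remain.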
Statistical analysis
- Choose "Binomial, one-level" from the "Cells"
menu.
- This will create a table showing the weights of each factor. It
looks like all the tables of weights and probabilities you've seen
in various articles. The weights are the values for p1, p2, etc., in
the equations we looked at, representing the effect that each factor
has on whether the rule applies.
- It will also show the frequency of each factor.
- It also gives an input value, which is the p0 in the equation.
- So for any one combination of factors (e.g., following consonant,
Pat as speaker) we could calculate the probability of deletion by
combining the p0 value with the appropriate p1, p2, etc., for each factor
group. But we don't have to do it because Goldvarb did it for us:
those weights combined will equal the probability of a certain type
of token undergoing the rule application.
- Choose "Binomial, up & down" for analysis of which
factors are significant.
- This analysis takes longer than the one-level, especially if you
have a big token file.
- It spits out a lot of text and numbers, but indicates which
factors were found to be significant.
- You will notice that, generally, the factor groups it finds to be
significant are those that have the biggest spread in values in the
one-level analysis. If there is a lot of overlap between two
different factor groups (such as all the tokens with a following
consonant having been produced by Pat), the groups found to be
significant may not match what the one-level spread suggests.
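How the input value and the factor weights combine can be sketched as below. The equations themselves are not reproduced on this page, so the sketch assumes the logistic form of the variable rule model, in which the log-odds of the input value and of each applicable weight are added. The function names and the example weights are hypothetical, not output from a real run.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))

def predicted_probability(input_value, weights):
    """Combine p0 with one weight per factor group (logistic model assumed)."""
    total = logit(input_value) + sum(logit(w) for w in weights)
    return 1 / (1 + math.exp(-total))

# Hypothetical values: p0 = 0.5, following consonant = 0.8, speaker = 0.2.
neutral = predicted_probability(0.5, [0.5, 0.5])    # weights of 0.5 change nothing
favoring = predicted_probability(0.5, [0.8])        # a weight above 0.5 favors the rule
cancelling = predicted_probability(0.5, [0.8, 0.2]) # opposite weights cancel out
print(neutral, favoring, cancelling)
```

Under this assumption a weight of 0.5 is neutral, weights above 0.5 favor rule application, and weights below 0.5 disfavor it.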
Cross-tabulation
If you want to look for that type of overlap, or any interaction between 2
independent variables (factor groups), choose "Cross-tab" from the Cells menu and
select the 2 groups you are interested in. A new table will be created which
shows you their distribution. You can save it as Text or Picture. The
"Picture" one looks nicer when copied into another document, but
can't be edited, and takes up more disk space. The "Text" one can
be edited, and has to be, in order to be legible. Use Courier font to make
the columns line up.
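The idea behind a cross-tabulation can be sketched like this: count applications and totals for every combination of two factor groups, and look for empty or lopsided cells. The function name and the four coding strings are hypothetical (dependent variable, following segment, speaker), and the output layout is not Goldvarb's.

```python
from collections import defaultdict

def cross_tab(tokens, group_a, group_b, application="Y"):
    """Map (factor_a, factor_b) to [applications, total tokens]."""
    table = defaultdict(lambda: [0, 0])
    for t in tokens:
        cell = table[(t[group_a - 1], t[group_b - 1])]
        cell[0] += int(t[0] == application)
        cell[1] += 1
    return dict(table)

# Hypothetical tokens: every following-vowel token comes from speaker N.
tokens = ["YCP", "YCN", "NCN", "NVN"]
table = cross_tab(tokens, group_a=2, group_b=3)
for cell, (applied, total) in sorted(table.items()):
    print(cell, f"{applied}/{total}")
```

The missing ("V", "P") cell is the kind of overlap that makes the effects of two factor groups hard to separate.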
NOTES:
- The various files shown as examples are nonsense, un-related, and not
created from real data.
- A different set of instructions, written by the program's creators, can
also be examined.
This page was last updated by Naomi Nagy on March 24, 2009.