Naomi Nagy

Linguistics at U of T

Goldvarb

Downloading Goldvarb

You can download Goldvarb in 3 formats (depending on your operating system) from http://individual.utoronto.ca/tagliamonte/goldvarb.htm. There are versions for Macintosh, Windows, and Linux. For an (older) DOS version, see: http://www.crm.umontreal.ca/~sankoff/Varbrul_MS-DOS.html.

For manuals, see:

  • Mac
  • Windows
  • You will need the Neapean.Tok token file (0.293 Kb) if you want to follow the steps of the tutorial in the manuals. You *might* have to rename the file "Neapean.tkn" to get it to work.

Naomi's instructions for using Goldvarb

  1. Preparing the Token file
  2. Preparing the Conditions file
  3. Create cell and result files
  4. The Results file
  5. Preparing for a statistical analysis
  6. Statistical analysis
  7. Cross-tabulation

 

  1. Preparing the Token file

    This may be done in Word or Excel or directly in Goldvarb. These instructions illustrate the latter method. It is also possible to use Goldsearch to prepare token files for large corpora, if you are feeling computer savvy.

     

    1. Open Goldvarb
    2. Select "New" in the "File" menu.
      1. Add ".tkn" as a suffix to whatever title you choose.
      2. Select "Tokens" as the type of new file.
      3. When it asks you to select number of groups, type the number of elements you will have in each token. Then hit return or "OK". You can always change the number of elements, so just click "OK" and go on if you aren't sure.
    3. Enter your tokens, one per line.
      1. Each token line must begin with an open parenthesis "("
      2. Separate the token itself from following material by at least 3 spaces or a tab.
      3. Anything in a line that doesn't begin with "(" will be ignored by the program, so you can label different parts of your file by writing whatever you want on a line that doesn't begin with "(".
      4. There is a "search & replace" command in the "Tokens" menu which you can use to automate repetitive tasks. For example, assume every line needs to have an "N" as the second element of the token (representing the speaker "Naomi"), and you have used "y" and "n" as the first element of your token. Select "search & replace" and replace each "y" with "yN" and then each "n" with "nN".
      5. You can cut and paste to and from a token file for editing, just like in Word.

      It will look something like:

      
      (N       1       good            morning         1       1
      (Y       1       appreciate      your            1       2
      (Y       1       about           your            1       2
      (Y       1       breakfast       pause           1       2
      

       

    4. Save the token file.
    5. Checking your token file.
      1. In the Token menu, choose "Generate factor specifications." This will make a list of all the characters you used in each column. You can see these in the little window below your token file. Click through the factor groups and make sure there are no anomalous characters (likely indicating typos or missing one element in a token).
      2. Alternatively, you can enter the factor specifications into the "Factor specification" window and then select "Check tokens," and the program will look for lines that have an anomalous character (i.e., one that you didn't specify for that group).
      3. If it asks you to "Set fill character," just type "/" and say ok. That means that if you have any token string that isn't long enough (you specified the length at the beginning with "Select number of groups") it will fill in "/"'s to fill out the line-- then you'll see them and check what you missed.
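If you would like to see what these checks amount to, here is a minimal Python sketch of the rules described above. It is not Goldvarb's own code: the function names and the sample tokens are made up for illustration, and it only assumes what this page states (token lines begin with "(", the coding string ends at a tab or at 3 or more spaces, and short tokens are padded with the fill character "/").

```python
import re

def read_tokens(text):
    """Return the coding strings from the token lines of a token file."""
    tokens = []
    for line in text.splitlines():
        if not line.startswith("("):
            continue  # label or comment line: ignored
        # The coding string ends at the first tab or run of 3+ spaces.
        coding = re.split(r"\t| {3,}", line[1:], maxsplit=1)[0]
        tokens.append(coding)
    return tokens

def factor_specifications(tokens, n_groups, fill="/"):
    """List the characters used in each column, padding short tokens."""
    padded = [t.ljust(n_groups, fill) for t in tokens]
    return [sorted({t[i] for t in padded}) for i in range(n_groups)]

# Hypothetical token file: one label line and three token lines.
sample = "Speaker: Naomi\n(YCN   west side\n(NVN\tfast enough\n(YC   last night\n"
tokens = read_tokens(sample)
specs = factor_specifications(tokens, 3)
print(tokens)
print(specs)
```

The third token is one element short, so "/" turns up among the characters for group 3, which is the kind of anomaly you would look for when clicking through the factor groups.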

  2. Preparing the Conditions file

    This is where you choose how to sort your data. For starters, you will just do a general sort of all your data, showing the distribution of each variant of the dependent variable with respect to each factor of the independent variables.
    1. In the Token menu, choose "No recode." This will give you a general overview of the tokens and patterns you have.
    2. Save your new Conditions file.
      1. Name it something that will make sense like "all" or "first" or "Vanessa."
      2. Add the suffix ".cnd" to the name.
      3. Note that the program suggests an appropriate name, but it will always suggest a name that is the same as your token file name, and you will probably make several different condition files from your token file.
    3. Watch it generate a list of conditions, written in the Lisp programming language. Your condition file will look something like this:
      
      (
      ; Identity recode:  All groups included as is.
      (1)
      (2)
      (3)
      )
      
      This means: "Don't do anything to any of the groups. Just use them all (elements (1), (2) and (3) of each token) as they are, with the stuff in column (1) as the dependent variable." You can also ignore certain factors or factor groups, combine groups, add new elements to your token file to indicate something like "y" in column 1 and "N" in column 2, etc. See "Preparing for a statistical analysis."

  3. Create cell and result files

    1. In the "Cell" menu select "Load cells to memory."
    2. Click "OK" when it asks whether to use the tokens and condition files that you see on the screen.
    3. This won't work if there is anything wrong with your token file or conditions file.
    4. A cell file is created, which you can ignore.
    5. A Results file is also created.
    6. You will be asked to select the application values. This means, "how will the dependent variable be examined?" The possible variants of the dependent variable are listed. You can rearrange their order and/or erase some of them. Assume the window shows you the string "YN?".
      1. If you select "Y" it will do a binary comparison, comparing tokens coded as "Y" for the dependent variable to all tokens with any other variant.
      2. If you select "YN" it will do a binary comparison, comparing tokens coded with "Y" for the dependent variable to tokens coded with "N", and ignoring any tokens with other variants (e.g., those coded with "?").
      3. If you select 3 or more characters, all tokens with those codes for the dependent variable will be examined, and the others ignored.
    7. For a first run-through, select the whole string, but put the variants in order with the more relevant ones first (i.e., put "?" at the end, as you won't really get much information from it).
    8. To go on to the statistical analysis, you must select one of the binary comparison options (listing either one or two variants).
    9. Save the Results file under an appropriate name. It should match your conditions file so that you can look back at the Conditions under which the results were generated.
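The effect of the different application value choices can be sketched in Python. This only mirrors the behaviour described in step 6 and is not taken from Goldvarb itself; the function names and the five sample tokens are hypothetical.

```python
def examined_tokens(tokens, app_values):
    """Return the tokens that take part in the comparison."""
    if len(app_values) == 1:
        return list(tokens)  # one value: compared against everything else
    # Two or more values: tokens with any other variant are ignored.
    return [t for t in tokens if t[0] in app_values]

def count_applications(tokens, app_values):
    """Return (applications, non-applications) for a binary comparison."""
    if len(app_values) > 2:
        raise ValueError("a binary comparison lists one or two variants")
    kept = examined_tokens(tokens, app_values)
    applied = sum(1 for t in kept if t[0] == app_values[0])
    return applied, len(kept) - applied

# Hypothetical tokens; the first character is the dependent variable.
tokens = ["YCN", "NCN", "?VP", "YCP", "NVN"]
print(count_applications(tokens, "Y"))    # "Y" vs. every other variant
print(count_applications(tokens, "YN"))   # "Y" vs. "N", "?" ignored
print(len(examined_tokens(tokens, "YN?")))
```

With "Y" alone the "?" token counts as a non-application; with "YN" it drops out of the comparison altogether.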

  4. The Results file

    This file shows you the distribution of each variant of the dependent variable with respect to each factor of the independent variables. It looks like this:
    
     CELL CREATION   11/21/96 12:01 pm 
    Name of token file:  Pat.Tkn	
    Name of condition file:  first.Cnd
    	(
    (1)	
    (2)
    (3)
    )
           Number of cells:  3
      Application value(s):  ab
      Total no. of factors:  4
    
     Group        Y      N   Total   %
    ----------------------------------
     1 (2)
       C   N      2      1       3  75
           %     67     33
    
       V   N      0      1       1  25
           %      0    100          * KnockOut *
    
     Total N      2      2       4
           %     50     50
    ----------------------------------
     2 (3)
       N   N      1      2       3  75
           %     33     67
    
       P   N      1      0       1  25
           %    100      0          * KnockOut *
    
     Total N      2      2       4
           %     50     50
    ----------------------------------
     TOTAL N      2      2       4		(extremely small sample for
           %     50     50			demonstration purposes only)
    
     Name of new cell file:  Untitled.Cel
    

    This table lists the possible variants of the dependent variable (Y and N) across the top of the table and the possible variants (factors) of each independent variable, one per row. So this Results file compares tokens with "Y" vs. "N" as the dependent variable, i.e., deleted vs. non-deleted (t,d).

    The first independent variable examined is following segment, with "C" for following consonant and "V" for following vowel. It shows that 67% of the words with a following consonant had deleted (t,d), but 0% of the words with a following vowel had deleted (t,d).

    The second independent variable examined is Speaker, with "P" for Pat and "N" for Naomi (pseudonyms, of course). We see that Speaker N deleted (t,d) in 33% of her tokens and Speaker P in 100%.

    The word * KnockOut * appears in every line that has a "0" in it. Finally, we see that overall, for the whole token set, (t,d) deletion occurred 50% of the time, or in 2 out of 4 tokens.
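The counts, percentages, and KnockOut flags in a table like this can be reproduced with a short Python sketch. The four tokens below are invented so that they match the counts in the sample table above (column 1 is deletion, column 2 is following segment, column 3 is speaker); the function name and the output layout are assumptions, not Goldvarb's.

```python
from collections import Counter

def distribution(tokens, group, variants="YN"):
    """Tabulate the dependent variable (column 1) for one factor group.

    group is the 1-based column number of the factor group.
    """
    kept = [t for t in tokens if t[0] in variants]
    counts = Counter((t[group - 1], t[0]) for t in kept)
    table = {}
    for factor in sorted({t[group - 1] for t in kept}):
        row = [counts[(factor, v)] for v in variants]
        total = sum(row)
        table[factor] = {
            "counts": row,
            "percents": [round(100 * n / total) for n in row],
            "knockout": 0 in row,  # a zero cell is what gets flagged
        }
    return table

tokens = ["YCN", "YCP", "NCN", "NVN"]  # hypothetical, matches the table
for group in (2, 3):
    for factor, row in distribution(tokens, group).items():
        print(group, factor, row)
```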

    Note: You can copy and paste this table into a Word document, such as a research paper. The columns will line up if you choose the Courier font. It will have strings of spaces rather than tabs, which can be a pain to edit, but you can fix it up as necessary. You can also edit it right in the Results window, which can be dangerous. But, if you do mess it up, you can also reconstruct a new results file by going back to the "Cell" menu and selecting "Load cells to memory."

  5. Preparing for a statistical analysis

    In order to find out which variables are significant, you must have a results file with no "Knockouts," i.e., no "0"'s. This may mean combining or deleting certain factors or factor groups. This process is done by creating a new conditions file.

     

    1. Make sure you have principled reasons for the changes you make. For example, it's ok to combine following stop and following fricative if there was 95% deletion for stops and 100% for fricatives, because (a) stops and fricatives are somewhat similar phonetically and (b) the numbers were fairly similar.
    2. Select "Recode setup" from the "Tokens" menu.
      1. For any factor groups that you wish to leave intact, select them on the left side (clicking on their factor group number) and then click "Copy."
      2. To exclude a certain factor, click on it on the left side, then click "Exclude" and say "ok." Then copy the factor group. The excluded factor will still show up, but tokens containing it will be ignored in the analysis.
      3. To recode a factor group, select it, choose recode, and then type over the letters on the right to show how you want to recode them.
    3. Make sure it worked. Do this by going back through steps 3 (Create cell and result files) and 4 (The Results file) and making sure that there are no knock-outs. If it didn't work, try making a new condition file. This may get tedious, so you might want to copy down your coding as it shows up in the Conditions window. You should be able to figure out most of the Lisp code.
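To make the effect of a recode concrete, here is a minimal Python sketch of combining and excluding factors. It imitates the outcome only; it does not generate or read Goldvarb's Lisp conditions, and the function name, the factor codes ("S" for stop, "F" for fricative, "O" for the combined group), and the tokens are all hypothetical.

```python
def recode(tokens, group, mapping=None, exclude=""):
    """Return tokens with one factor group recoded.

    group   -- 1-based column number of the factor group
    mapping -- dict of old factor character -> new factor character
    exclude -- factor characters whose tokens are left out of the analysis
    """
    mapping = mapping or {}
    result = []
    for t in tokens:
        factor = t[group - 1]
        if factor in exclude:
            continue  # tokens with an excluded factor are ignored
        new_factor = mapping.get(factor, factor)
        result.append(t[:group - 1] + new_factor + t[group:])
    return result

tokens = ["YSN", "YFN", "NVP", "YSP"]  # hypothetical tokens
combined = recode(tokens, 2, mapping={"S": "O", "F": "O"})
without_vowels = recode(tokens, 2, exclude="V")
print(combined)
print(without_vowels)
```

After a recode like this you would tabulate the results again and confirm that no factor is left with a zero cell.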

  6. Statistical analysis

    1. Choose "Binomial, one-level" from the "Cells" menu.
      1. This will create a table showing the weights of each factor. It looks like all the tables of weights and probabilities you've seen in various articles. The weights are the values for p1, p2, etc., in the equations we looked at, representing the effect that each factor has on whether the rule applies.
      2. It will also show the frequency of each factor.
      3. It also gives an input value, which is the p0 in the equation.
      4. So for any one combination of factors (e.g., following consonant, Pat as speaker) we could calculate the probability of deletion by combining the p0 value with the appropriate p1, p2, etc., for each factor group. But we don't have to do it because Goldvarb did it for us-- those weights combined will equal the probability of a certain type of token undergoing the rule application.
    2. Choose "Binomial, up & down" for analysis of which factors are significant.
      1. This analysis takes longer than the one-level, especially if you have a big token file.
      2. It spits out a lot of text and numbers, but indicates which factors were found to be significant.
      3. You will notice that, generally, the factor groups it finds to be significant are those that have the biggest spread in values in the one-level analysis. If there is a lot of overlap between two different factor groups (such as all the tokens with a following consonant having been produced by Pat), there may be differences.
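This page does not reproduce the equations mentioned in step 1, so the sketch below assumes the logistic form commonly used in variable rule analysis, in which the input value p0 and the factor weights p1, p2, etc. are combined as odds: p/(1-p) = p0/(1-p0) × p1/(1-p1) × p2/(1-p2) and so on. The function name and the numbers are made up for illustration.

```python
def combine(input_value, weights):
    """Combine the input value p0 with one weight per factor group.

    Assumes the logistic model: the odds p/(1-p) are multiplied together.
    Weights of exactly 0 or 1 are not allowed (they would be knockouts).
    """
    odds = input_value / (1 - input_value)
    for w in weights:
        odds *= w / (1 - w)
    return odds / (1 + odds)  # convert the odds back to a probability

# Hypothetical values: input 0.6, following consonant 0.7, speaker 0.4.
print(combine(0.6, [0.7, 0.4]))
# A weight of 0.5 neither favours nor disfavours the rule.
print(combine(0.6, [0.5, 0.5]))
```

Under this assumption, weights above 0.5 raise the predicted probability above the input value and weights below 0.5 lower it.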

  7. Cross-tabulation

    If you want to look for that type of overlap, or any interaction between 2 independent variables, choose "Cross-tab" from the Cells menu and select the 2 groups you are interested in. A new table will be created which shows you their distribution. You can save it as Text or Picture. The "Picture" one looks nicer when copied into another document, but can't be edited, and takes up more disk space. The "Text" one can be edited, and has to be, in order to be legible. Use Courier font to make the columns line up.
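A cross-tabulation of two factor groups is simple to sketch in Python if you want to check for overlap outside the program. The function name and the four tokens are hypothetical (column 2 is following segment, column 3 is speaker), and the layout is not Goldvarb's.

```python
from collections import Counter

def cross_tab(tokens, group_a, group_b):
    """Count tokens for every pairing of factors from two factor groups.

    group_a and group_b are 1-based column numbers.
    """
    return Counter((t[group_a - 1], t[group_b - 1]) for t in tokens)

tokens = ["YCN", "YCP", "NCN", "NVN"]  # hypothetical tokens
table = cross_tab(tokens, 2, 3)
for segment in "CV":
    for speaker in "NP":
        print(segment, speaker, table[(segment, speaker)])
```

An empty cell, such as following vowel with speaker P here, is the kind of overlap that can make the significance results differ from what the one-level weights suggest.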

NOTES:

  1. The various files shown as examples are nonsense, un-related, and not created from real data.
  2. A different set of instructions, written by the program's creators, can also be examined.

This page was last updated by Naomi Nagy on March 24, 2009


email: naomi dot nagy at utoronto dot ca