Frequently asked questions about -gologit-

Last modified 21 December 2004 VKF

  1. Why doesn't this command work under Stata 7?
  2. Are there any references for this model?
  3. How do I test the proportional odds assumption?
  4. How can I generate predicted values?
  5. How do I interpret the coefficients?

1. Why doesn't this command work under Stata 7?

Many people have reported the following error when trying to run -gologit- under Stata 7.

. gologit
gologit: ado-file does not define command
r(162);

Changes to Stata's internal code in version 7 prevent the original -gologit- from running. To use -gologit- under Stata 7, you need to replace the file gologit.ado on your computer with a new one. An updated version of -gologit- is available on the Boston College IDEAS Statistical Software Components website. To install the update, run the following command from within Stata:

. net install http://fmwww.bc.edu/RePEc/bocode/g/gologit, replace
checking gologit consistency and verifying not already installed...

the following files will be replaced:
    c:\ado\stbplus\g\gologit.ado
    c:\ado\stbplus\g\goll.ado
    c:\ado\stbplus\g\gologit.hlp

installing into c:\ado\stbplus\...
installation complete.

2. Are there any references for this model?

A number of people have noted that this model is theoretically possible (Agresti 1984:113; Agresti 1990:330; Armstrong and Sloan 1989:194; Brant 1990:1172; Clogg and Shihadeh 1994:146-147; Fahrmeir and Tutz 1994:91; McCullagh and Nelder 1989:155; Maddala 1983:46), but they usually pass over it in favor of the more restrictive proportional odds model. Peterson and Harrell (1990) discuss a model similar to this one but conceptualize it slightly differently.

Agresti, A. 1984. Analysis of Ordinal Categorical Data. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons.

Agresti, A. 1990. Categorical Data Analysis. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons.

Armstrong, B. and M. Sloan. 1989. “Ordinal Regression Models for Epidemiologic Data.” American Journal of Epidemiology 129:191-204.

Brant, R. 1990. “Assessing Proportionality in the Proportional Odds Model for Ordinal Logistic Regression.” Biometrics 46:1171-1178.

Clogg, C. and E. Shihadeh. 1994. Statistical Models for Ordinal Variables. Advanced Quantitative Techniques in the Social Sciences Series Volume 4. Thousand Oaks, California: Sage Publications.

Fahrmeir, L. and G. Tutz. 1994. Multivariate Statistical Modeling Based on Generalized Linear Models. Springer Series in Statistics. New York: Springer-Verlag.

Maddala, G. 1983. Limited-dependent and Qualitative Variables in Econometrics. Econometric Society Monographs no. 3. New York: Cambridge University Press.

McCullagh, P. and J. Nelder. 1989. Generalized Linear Models. Second Edition. Monographs on Statistics and Applied Probability Number 37. New York: Chapman and Hall.

Peterson, B. and F. Harrell, Jr. 1990. “Partial Proportional Odds Models for Ordinal Response Variables.” Applied Statistics 39:205-217.

3. How do I test the proportional odds assumption?

Consider the familiar auto data. These data contain information on 1978 repair records of automobiles. Here is a tabulation of the repair record variable, rep78:

. tab rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

Since small cell sizes are a big problem for -gologit-, let us combine the lowest category (poor) with the second lowest category (fair). The new variable we will use has four categories: poor/fair, average, good, and excellent.

. recode rep78 1=2
(2 changes made)

. tab rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |         10       14.49       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

Suppose we wanted to determine if repair records are related to where the car was manufactured (foreign or domestic) and we wanted to determine if the proportional odds assumption holds for our model.

The easiest way to do this is to use the command -omodel- (STB-42: sg76).

. omodel logit rep78 foreign

Iteration 0:   log likelihood = -88.688037
Iteration 1:   log likelihood = -74.691845
Iteration 2:   log likelihood = -74.040907
Iteration 3:   log likelihood = -74.025242
Iteration 4:   log likelihood = -74.025218

Ordered logit estimates                           Number of obs   =         69
                                                  LR chi2(1)      =      29.33
                                                  Prob > chi2     =     0.0000
Log likelihood = -74.025218                       Pseudo R2       =     0.1653

------------------------------------------------------------------------------
   rep78 |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 foreign |    2.98155   .6203637      4.806   0.000        1.76566    4.197441
---------+--------------------------------------------------------------------
   _cut1 |  -1.362642   .3557343             (Ancillary parameters)
   _cut2 |   1.232161   .3431227  
   _cut3 |   3.246209   .5556646  
------------------------------------------------------------------------------

Approximate likelihood-ratio test of proportionality of odds
across response categories:
         chi2(2) =      0.60
       Prob > chi2 =    0.7415

Another way to do this is a likelihood-ratio test that compares the log likelihoods of a proportional odds model and a generalized ordered logit model.

. ologit rep78 foreign

Iteration 0:   log likelihood = -88.688037
Iteration 1:   log likelihood = -74.691845
Iteration 2:   log likelihood = -74.040907
Iteration 3:   log likelihood = -74.025242
Iteration 4:   log likelihood = -74.025218

Ordered logit estimates                           Number of obs   =         69
                                                  LR chi2(1)      =      29.33
                                                  Prob > chi2     =     0.0000
Log likelihood = -74.025218                       Pseudo R2       =     0.1653

------------------------------------------------------------------------------
   rep78 |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
 foreign |    2.98155   .6203637      4.806   0.000        1.76566    4.197441
---------+--------------------------------------------------------------------
   _cut1 |  -1.362642   .3557343             (Ancillary parameters)
   _cut2 |   1.232161   .3431227  
   _cut3 |   3.246209   .5556646  
------------------------------------------------------------------------------

. gologit rep78 foreign
Iteration 0:  Log Likelihood = -88.688037
Iteration 1:  Log Likelihood = -74.819066
Iteration 2:  Log Likelihood = -73.867875
Iteration 3:  Log Likelihood = -73.768875
Iteration 4:  Log Likelihood = -73.736757
Iteration 5:  Log Likelihood = -73.732887
Iteration 6:  Log Likelihood = -73.732218
Iteration 7:  Log Likelihood = -73.732011
Iteration 8:  Log Likelihood = -73.731985
Iteration 9:  Log Likelihood =  -73.73198
Iteration 10:  Log Likelihood = -73.731979
Iteration 11:  Log Likelihood = -73.731979

Generalized Ordered Logit Estimates                 Number of obs    =      69
                                                    Model chi2(3)    =   29.91
                                                    Prob > chi2      =  0.0000
Log Likelihood =    -73.7319786                     Pseudo R2        =  0.1686

------------------------------------------------------------------------------
   rep78 |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
mleq1    |
 foreign |   18.83547   5340.977      0.004   0.997      -10449.29    10486.96
   _cons |   1.334995   .3554083      3.756   0.000       .6384077    2.031583
---------+--------------------------------------------------------------------
mleq2    |
 foreign |   3.004804   .7119183      4.221   0.000        1.60947    4.400138
   _cons |  -1.213014   .3434171     -3.532   0.000      -1.886099   -.5399287
---------+--------------------------------------------------------------------
mleq3    |
 foreign |   2.847834   .8462725      3.365   0.001       1.189171    4.506498
   _cons |  -3.135478   .7223112     -4.341   0.000      -4.551182   -1.719774
------------------------------------------------------------------------------

The log likelihood computed by -ologit- is -74.025218, and the log likelihood computed by -gologit- is -73.7319786. Under the null hypothesis of proportional odds, twice the difference between these two values is asymptotically chi-square distributed with 2 degrees of freedom (the -gologit- model has 2 more parameters than the -ologit- model).

. display 2*(74.025218-73.7319786)
.5864788

. display chiprob(2,0.5864788)
.74584357

From these Stata commands, we see that the chi-square value we observe is not unlikely (p=0.7458) under the null hypothesis that the -ologit- model fits as well as the -gologit- model. In this case, then, we find no evidence that the data violate the proportional odds assumption.
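
The same arithmetic can be checked outside Stata. What follows is a minimal Python sketch, not part of -gologit-; the function name lr_test_df2 is made up for illustration. It relies on the fact that the upper-tail probability of a chi-square variable with exactly 2 degrees of freedom is exp(-x/2), so it applies only to a 2-df comparison like this one.

```python
import math

def lr_test_df2(ll_restricted, ll_unrestricted):
    """Likelihood-ratio test for two nested models that differ by 2 parameters.

    Returns (statistic, p_value). The closed form exp(-x/2) is the
    chi-square upper tail for 2 degrees of freedom only.
    """
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value

# Log likelihoods copied from the -ologit- and -gologit- output above.
stat, p_value = lr_test_df2(-74.025218, -73.7319786)
print(stat, p_value)
```

For a different number of degrees of freedom, the closed form no longer holds and a general chi-square tail function is needed.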

A third way to test the proportional odds assumption is to use a Wald test of whether the coefficients are the same across panels. After estimating the -gologit- model, we can use Stata’s -test- command.

. test [mleq2=mleq3], notest

 ( 1)  [mleq2]foreign - [mleq3]foreign = 0.0

. test [mleq1=mleq2], accumulate 

 ( 1)  [mleq2]foreign - [mleq3]foreign = 0.0
 ( 2)  [mleq1]foreign - [mleq2]foreign = 0.0

           chi2(  2) =    0.03
         Prob > chi2 =    0.9854

The final -test- command evaluates the joint hypothesis that the coefficient from panel 2 is the same as that in panel 3 and that the coefficient in panel 1 is the same as that in panel 2. Again, the data are consistent with the proportional odds assumption.

4. How can I generate predicted values?

Using the same data and model we used to test the proportional odds assumption, we can also generate predicted values.

. clear

. use auto
(1978 Automobile Data)

. recode rep78 1=2
(2 changes made)

. gologit rep78 foreign
Iteration 0:  Log Likelihood = -88.688037
Iteration 1:  Log Likelihood = -74.819066
Iteration 2:  Log Likelihood = -73.867875
Iteration 3:  Log Likelihood = -73.768877
Iteration 4:  Log Likelihood = -73.736757
Iteration 5:  Log Likelihood = -73.732888
Iteration 6:  Log Likelihood = -73.732218
Iteration 7:  Log Likelihood = -73.732011
Iteration 8:  Log Likelihood = -73.731985
Iteration 9:  Log Likelihood =  -73.73198
Iteration 10:  Log Likelihood = -73.731979
Iteration 11:  Log Likelihood = -73.731979

Generalized Ordered Logit Estimates                 Number of obs    =      69
                                                    Model chi2(3)    =   29.91
                                                    Prob > chi2      =  0.0000
Log Likelihood =    -73.7319786                     Pseudo R2        =  0.1686

------------------------------------------------------------------------------
   rep78 |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
mleq1    |
 foreign |   18.84205   5295.333      0.004   0.997      -10359.82     10397.5
   _cons |   1.334995   .3554083      3.756   0.000       .6384077    2.031583
---------+--------------------------------------------------------------------
mleq2    |
 foreign |   3.004804   .7119183      4.221   0.000        1.60947    4.400138
   _cons |  -1.213014   .3434171     -3.532   0.000      -1.886099   -.5399287
---------+--------------------------------------------------------------------
mleq3    |
 foreign |   2.847834   .8462726      3.365   0.001       1.189171    4.506498
   _cons |  -3.135478   .7223113     -4.341   0.000      -4.551182   -1.719774
------------------------------------------------------------------------------

Because -gologit- is a multiple-equation estimation command, we must use the multiple-equation version of -predict-.

. predict xb1, equation(mleq1) xb

. predict xb2, equation(mleq2) xb

. predict xb3, equation(mleq3) xb

These three commands give you the sum of the products of the coefficients and their associated variables for the three panels of coefficients. We can interpret these sums the same way we would interpret ordinary binary logits. Recall that the four categories for our dependent variable are: poor/fair, average, good, and excellent. -xb1- is the log odds that a car has a better than fair repair record vs. a fair or poor repair record. -xb2- is the log odds that a car has a better than average repair record vs. an average or worse repair record. -xb3- is the log odds that a car has an excellent repair record vs. a good or worse repair record.

We can convert these log odds to probabilities using the following commands:

. gen p1 = 1/(1+exp(-xb1))

. gen p2 = 1/(1+exp(-xb2))

. gen p3 = 1/(1+exp(-xb3))

-p1- now contains the probability that a car has an average or better repair record. -p2- is the probability that a car has a better than average repair record. -p3- is the probability that a car has a better than good repair record.

. list foreign p1-p3 in 40/50

      foreign         p1         p2         p3 
 40. Domestic   .7916657   .2291682   .0416673  
 41. Domestic   .7916657   .2291682   .0416673  
 42.  Foreign          1   .8571466   .4285809  
 43. Domestic   .7916657   .2291682   .0416673  
 44.  Foreign          1   .8571466   .4285809  
 45.  Foreign          1   .8571466   .4285809  
 46.  Foreign          1   .8571466   .4285809  
 47. Domestic   .7916657   .2291682   .0416673  
 48.  Foreign          1   .8571466   .4285809  
 49. Domestic   .7916657   .2291682   .0416673  
 50.  Foreign          1   .8571466   .4285809  
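
To see where the listed values come from, the conversion can be reproduced by hand from the coefficient table. This is a minimal Python sketch rather than anything -gologit- provides; the names COEFS and cumulative_probs are illustrative, and the coefficients are typed in from the output above, so the results agree with the listing only up to rounding.

```python
import math

# Coefficients copied from the -gologit- output above (one entry per panel).
COEFS = {
    "mleq1": {"foreign": 18.84205, "_cons": 1.334995},
    "mleq2": {"foreign": 3.004804, "_cons": -1.213014},
    "mleq3": {"foreign": 2.847834, "_cons": -3.135478},
}

def cumulative_probs(foreign):
    """Return [p1, p2, p3] for a car with foreign = 0 (domestic) or 1 (foreign).

    Each entry is the inverse logit of that panel's linear predictor,
    the same calculation as gen p = 1/(1+exp(-xb)).
    """
    probs = []
    for eq in ("mleq1", "mleq2", "mleq3"):
        xb = COEFS[eq]["foreign"] * foreign + COEFS[eq]["_cons"]
        probs.append(1.0 / (1.0 + math.exp(-xb)))
    return probs

domestic = cumulative_probs(0)
foreign = cumulative_probs(1)
print(domestic)
print(foreign)
```

The first probability for foreign cars is essentially 1 because the mleq1 coefficient on foreign is very large.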

From these cumulative probabilities we can calculate the probabilities for the individual categories.

. gen prob1 = 1-p1

. gen prob2 = p1-p2

. gen prob3 = p2-p3

. gen prob4 = p3

. list foreign prob1-prob4 in 40/50

      foreign      prob1      prob2      prob3      prob4 
 40. Domestic   .2083343   .5624975   .1875009   .0416673  
 41. Domestic   .2083343   .5624975   .1875009   .0416673  
 42.  Foreign          0   .1428534   .4285658   .4285809  
 43. Domestic   .2083343   .5624975   .1875009   .0416673  
 44.  Foreign          0   .1428534   .4285658   .4285809  
 45.  Foreign          0   .1428534   .4285658   .4285809  
 46.  Foreign          0   .1428534   .4285658   .4285809  
 47. Domestic   .2083343   .5624975   .1875009   .0416673  
 48.  Foreign          0   .1428534   .4285658   .4285809  
 49. Domestic   .2083343   .5624975   .1875009   .0416673  
 50.  Foreign          0   .1428534   .4285658   .4285809  
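
The differencing step generalizes to any number of categories. The sketch below is illustrative Python, not part of -gologit-, and category_probs is a made-up name. It pads the cumulative probabilities with 1 at the bottom and 0 at the top and takes successive differences, which is the same calculation as the four -gen- commands above.

```python
def category_probs(cumulative):
    """Convert [P(Y > c1), P(Y > c2), ...] into one probability per category."""
    bounds = [1.0] + list(cumulative) + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(bounds) - 1)]

# Cumulative probabilities for a domestic car, copied from the listing of p1-p3.
domestic = category_probs([0.7916657, 0.2291682, 0.0416673])
print(domestic)
```

The results always sum to 1. A negative entry would mean the fitted cumulative probabilities cross for that observation, which this model does not rule out, so it is worth checking for.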

5. How do I interpret the coefficients?

You can interpret -gologit- coefficients as coefficients from binary logit models in which the categories of your outcome variable have been collapsed into two.

Suppose your categories are numbered 1, 2, and 3. The first panel of coefficients can be interpreted as those from a binary logit regression where your dependent variable is recoded as 1 vs. 2+3. The second panel of coefficients can be interpreted as those from a binary logit regression where your dependent variable is recoded 1+2 vs. 3. Positive coefficients mean that higher values on the covariates make higher values on the dependent variable more likely.
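
The collapsing can be written out explicitly. This small Python sketch is only an aid to interpretation, not an estimation recipe, and the function name collapse is hypothetical. It shows the binary outcome that each panel implicitly refers to in the three-category case.

```python
def collapse(y, cut):
    """Binary outcome for one panel: 1 if y is above category `cut`, else 0."""
    return 1 if y > cut else 0

outcomes = [1, 2, 3]
panel1 = [collapse(y, 1) for y in outcomes]  # first panel: 1 vs. 2+3
panel2 = [collapse(y, 2) for y in outcomes]  # second panel: 1+2 vs. 3
print(panel1, panel2)
```

In both panels the higher categories are coded 1, which is why a positive coefficient points toward higher values of the dependent variable.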