Many people have reported the following error when trying to run -gologit- under Stata 7.
. gologit
gologit: ado-file does not define command
r(162);
Changes to Stata's internal code in version 7 no longer allow -gologit- to run. To use -gologit- under Stata 7, you need to replace the file gologit.ado on your computer with a new one. There is an updated version of -gologit- on the Boston College IDEAS Statistical Software Components website. To install the update, run the following command from within Stata:
. net install http://fmwww.bc.edu/RePEc/bocode/g/gologit, replace
checking gologit consistency and verifying not already installed...
the following files will be replaced:
c:\ado\stbplus\g\gologit.ado
c:\ado\stbplus\g\goll.ado
c:\ado\stbplus\g\gologit.hlp
installing into c:\ado\stbplus\...
installation complete.
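One way to confirm that Stata now finds the updated file is the built-in -which- command, which reports where an ado-file is located and echoes any version comment lines it contains. The following is only a sketch, assuming the installation above completed without error; the -discard- step is a precaution in case an older copy of the program is still loaded in the current session.

```stata
* Drop any automatically loaded programs so the new ado-file is read from disk
discard
* Report the path of gologit.ado and any version comment lines it contains
which gologit
```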
A number of people have noted that this model is theoretically possible (Agresti 1984:113; Agresti 1990:330; Armstrong and Sloan 1989:194; Brant 1990:1172; Clogg and Shihadeh 1994:146-147; Fahrmeir and Tutz 1994:91; McCullagh and Nelder 1989:155; Maddala 1983:46), but they usually pass over it in favor of the more restrictive proportional odds model. Peterson and Harrell (1990) discuss a model similar to this one but conceptualize it slightly differently.
Agresti, A. 1984. Analysis of Ordinal Categorical Data. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons.
Agresti, A. 1990. Categorical Data Analysis. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons.
Armstrong, B. and M. Sloan. 1989. “Ordinal Regression Models for Epidemiologic Data.” American Journal of Epidemiology 129:191-204.
Brant, R. 1990. “Assessing Proportionality in the Proportional Odds Model for Ordinal Logistic Regression.” Biometrics 46:1171-1178.
Clogg, C. and E. Shihadeh. 1994. Statistical Models for Ordinal Variables. Advanced Quantitative Techniques in the Social Sciences Series Volume 4. Thousand Oaks, California: Sage Publications.
Fahrmeir, L. and G. Tutz. 1994. Multivariate Statistical Modeling Based on Generalized Linear Models. Springer Series in Statistics. New York: Springer-Verlag.
McCullagh, P. and J. Nelder. 1989. Generalized Linear Models. Second Edition. Monographs on Statistics and Applied Probability Number 37. New York: Chapman and Hall.
Maddala, G. 1983. Limited-dependent and Qualitative Variables in Econometrics. Econometric Society Monographs no. 3. New York: Cambridge University Press.
Peterson, B. and F. Harrell, Jr. 1990. “Partial Proportional Odds Models for Ordinal Response Variables.” Applied Statistics 39:205-217.
Consider the familiar auto data. These data contain information on 1978 repair records of automobiles. Here is a table of the data:
. tab rep78
Repair |
Record 1978 | Freq. Percent Cum.
------------+-----------------------------------
1 | 2 2.90 2.90
2 | 8 11.59 14.49
3 | 30 43.48 57.97
4 | 18 26.09 84.06
5 | 11 15.94 100.00
------------+-----------------------------------
Total | 69 100.00
Since small cell sizes are a big problem for gologit, let us combine the lowest category (poor) with the second lowest category (fair). The new variable we will use has four categories: poor/fair, average, good, and excellent.
. recode rep78 1=2
(2 changes made)
. tab rep78
Repair |
Record 1978 | Freq. Percent Cum.
------------+-----------------------------------
2 | 10 14.49 14.49
3 | 30 43.48 57.97
4 | 18 26.09 84.06
5 | 11 15.94 100.00
------------+-----------------------------------
Total | 69 100.00
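The -recode- command above overwrites rep78 in place. If you would rather keep the original variable, a copy can be collapsed instead. This is a sketch only: the variable name rep4 and the label name rep4lbl are hypothetical and do not appear elsewhere in this example, which continues to use rep78.

```stata
* rep4 and rep4lbl are hypothetical names; rep78 itself is left untouched
generate byte rep4 = rep78
* Fold the lowest category (1) into the second lowest (2)
replace rep4 = 2 if rep78 == 1
* Attach labels matching the four categories described in the text
label define rep4lbl 2 "poor/fair" 3 "average" 4 "good" 5 "excellent"
label values rep4 rep4lbl
tabulate rep4
```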
Suppose we wanted to determine if repair records are related to where the car was manufactured (foreign or domestic) and we wanted to determine if the proportional odds assumption holds for our model.
The easiest way to do this is to use the command -omodel- (STB-42: sg76).
. omodel logit rep78 foreign
Iteration 0: log likelihood = -88.688037
Iteration 1: log likelihood = -74.691845
Iteration 2: log likelihood = -74.040907
Iteration 3: log likelihood = -74.025242
Iteration 4: log likelihood = -74.025218
Ordered logit estimates Number of obs = 69
LR chi2(1) = 29.33
Prob > chi2 = 0.0000
Log likelihood = -74.025218 Pseudo R2 = 0.1653
------------------------------------------------------------------------------
rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
foreign | 2.98155 .6203637 4.806 0.000 1.76566 4.197441
---------+--------------------------------------------------------------------
_cut1 | -1.362642 .3557343 (Ancillary parameters)
_cut2 | 1.232161 .3431227
_cut3 | 3.246209 .5556646
------------------------------------------------------------------------------
Approximate likelihood-ratio test of proportionality of odds
across response categories:
chi2(2) = 0.60
Prob > chi2 = 0.7415
Another way to do this would be to use a likelihood-ratio test that compares the log likelihoods of a proportional odds model and a generalized ordered logit model.
. ologit rep78 foreign
Iteration 0: log likelihood = -88.688037
Iteration 1: log likelihood = -74.691845
Iteration 2: log likelihood = -74.040907
Iteration 3: log likelihood = -74.025242
Iteration 4: log likelihood = -74.025218
Ordered logit estimates Number of obs = 69
LR chi2(1) = 29.33
Prob > chi2 = 0.0000
Log likelihood = -74.025218 Pseudo R2 = 0.1653
------------------------------------------------------------------------------
rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
foreign | 2.98155 .6203637 4.806 0.000 1.76566 4.197441
---------+--------------------------------------------------------------------
_cut1 | -1.362642 .3557343 (Ancillary parameters)
_cut2 | 1.232161 .3431227
_cut3 | 3.246209 .5556646
------------------------------------------------------------------------------
. gologit rep78 foreign
Iteration 0: Log Likelihood = -88.688037
Iteration 1: Log Likelihood = -74.819066
Iteration 2: Log Likelihood = -73.867875
Iteration 3: Log Likelihood = -73.768875
Iteration 4: Log Likelihood = -73.736757
Iteration 5: Log Likelihood = -73.732887
Iteration 6: Log Likelihood = -73.732218
Iteration 7: Log Likelihood = -73.732011
Iteration 8: Log Likelihood = -73.731985
Iteration 9: Log Likelihood = -73.73198
Iteration 10: Log Likelihood = -73.731979
Iteration 11: Log Likelihood = -73.731979
Generalized Ordered Logit Estimates Number of obs = 69
Model chi2(3) = 29.91
Prob > chi2 = 0.0000
Log Likelihood = -73.7319786 Pseudo R2 = 0.1686
------------------------------------------------------------------------------
rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
mleq1 |
foreign | 18.83547 5340.977 0.004 0.997 -10449.29 10486.96
_cons | 1.334995 .3554083 3.756 0.000 .6384077 2.031583
---------+--------------------------------------------------------------------
mleq2 |
foreign | 3.004804 .7119183 4.221 0.000 1.60947 4.400138
_cons | -1.213014 .3434171 -3.532 0.000 -1.886099 -.5399287
---------+--------------------------------------------------------------------
mleq3 |
foreign | 2.847834 .8462725 3.365 0.001 1.189171 4.506498
_cons | -3.135478 .7223112 -4.341 0.000 -4.551182 -1.719774
------------------------------------------------------------------------------
The log likelihood computed by -ologit- is -74.025218, and the one computed by -gologit- is -73.7319786. Under the null hypothesis of proportional odds, twice the difference between these two values approximately follows a chi-square distribution with 2 degrees of freedom (the -gologit- model has 2 more parameters than the -ologit- model).
. display 2*(74.025218-73.7319786)
.5864788
. display chiprob(2,0.5864788)
.74584357
From these Stata commands, we see that the chi-square value we observe is not unlikely (p=0.7458) under the null hypothesis that the -ologit- model fits as well as the -gologit- model. In this case, then, the data provide no evidence against the proportional odds assumption.
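The same calculation can be written with scalars so that the statistic is not retyped by hand. This sketch assumes a Stata release that provides chi2tail(), the function that later releases use in place of chiprob(); the scalar names are hypothetical. The last line relies on the fact that, for 2 degrees of freedom, the upper-tail chi-square probability equals exp(-x/2), which gives an independent check on the p-value.

```stata
* Log likelihoods copied from the -ologit- and -gologit- output above
scalar ll_ologit  = -74.025218
scalar ll_gologit = -73.7319786
* Likelihood-ratio statistic: twice the difference in log likelihoods
scalar lr = 2*(ll_gologit - ll_ologit)
display lr
* Upper-tail probability with 2 degrees of freedom
display chi2tail(2, lr)
* Closed form for the 2-df case, as a cross-check
display exp(-lr/2)
```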
A third way to test the proportional odds assumption is to use a Wald test and test whether or not the coefficients in each panel are the same. After estimating the -gologit- model, we can use Stata’s -test- command.
. test [mleq2=mleq3], notest
( 1) [mleq2]foreign - [mleq3]foreign = 0.0
. test [mleq1=mleq2], accumulate
( 1) [mleq2]foreign - [mleq3]foreign = 0.0
( 2) [mleq1]foreign - [mleq2]foreign = 0.0
chi2( 2) = 0.03
Prob > chi2 = 0.9854
The final -test- command evaluates the joint hypothesis that the coefficient from panel 2 is the same as that in panel 3 and that the coefficient in panel 1 is the same as that in panel 2. Again, the data are consistent with the proportional odds assumption.
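The joint test can also be split into its two pieces to see which pair of panels, if either, departs from equality. This is a sketch, assuming the -gologit- estimates are still the active results and that the equations are named mleq1 to mleq3 as in the output above.

```stata
* Each command tests one equality constraint (1 degree of freedom)
test [mleq2]foreign = [mleq3]foreign
test [mleq1]foreign = [mleq2]foreign
```

Keep in mind that the standard error of the mleq1 coefficient in the output above is very large (5340.977), so a Wald test involving that coefficient has little ability to detect a difference.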
Using the same data and model we used to test the proportional odds assumption, we can also generate predicted values.
. clear
. use auto
(1978 Automobile Data)
. recode rep78 1=2
(2 changes made)
. gologit rep78 foreign
Iteration 0: Log Likelihood = -88.688037
Iteration 1: Log Likelihood = -74.819066
Iteration 2: Log Likelihood = -73.867875
Iteration 3: Log Likelihood = -73.768877
Iteration 4: Log Likelihood = -73.736757
Iteration 5: Log Likelihood = -73.732888
Iteration 6: Log Likelihood = -73.732218
Iteration 7: Log Likelihood = -73.732011
Iteration 8: Log Likelihood = -73.731985
Iteration 9: Log Likelihood = -73.73198
Iteration 10: Log Likelihood = -73.731979
Iteration 11: Log Likelihood = -73.731979
Generalized Ordered Logit Estimates Number of obs = 69
Model chi2(3) = 29.91
Prob > chi2 = 0.0000
Log Likelihood = -73.7319786 Pseudo R2 = 0.1686
------------------------------------------------------------------------------
rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
mleq1 |
foreign | 18.84205 5295.333 0.004 0.997 -10359.82 10397.5
_cons | 1.334995 .3554083 3.756 0.000 .6384077 2.031583
---------+--------------------------------------------------------------------
mleq2 |
foreign | 3.004804 .7119183 4.221 0.000 1.60947 4.400138
_cons | -1.213014 .3434171 -3.532 0.000 -1.886099 -.5399287
---------+--------------------------------------------------------------------
mleq3 |
foreign | 2.847834 .8462726 3.365 0.001 1.189171 4.506498
_cons | -3.135478 .7223113 -4.341 0.000 -4.551182 -1.719774
------------------------------------------------------------------------------
Because -gologit- is a multiple-equation estimation command, we must use the multiple-equation version of -predict-.
. predict xb1, equation(mleq1) xb
. predict xb2, equation(mleq2) xb
. predict xb3, equation(mleq3) xb
These three commands give you the sum of the products of the coefficients and their associated variables for the three panels of coefficients. We can interpret these sums the same way we would interpret ordinary binary logits. Recall that the four categories for our dependent variable are: poor/fair, average, good, and excellent. -xb1- is the log odds that a car has a better than fair repair record vs. a fair or poor repair record. -xb2- is the log odds that a car has a better than average repair record vs. an average or worse repair record. -xb3- is the log odds that a car has an excellent repair record vs. a good or worse repair record.
We can convert these log odds to probabilities using the following commands:
. gen p1 = 1/(1+exp(-xb1))
. gen p2 = 1/(1+exp(-xb2))
. gen p3 = 1/(1+exp(-xb3))
-p1- now contains the probability that a car has an average or better repair record. -p2- is the probability that a car has a better than average repair record. -p3- is the probability that a car has a better than good repair record.
. list foreign p1-p3 in 40/50
foreign p1 p2 p3
40. Domestic .7916657 .2291682 .0416673
41. Domestic .7916657 .2291682 .0416673
42. Foreign 1 .8571466 .4285809
43. Domestic .7916657 .2291682 .0416673
44. Foreign 1 .8571466 .4285809
45. Foreign 1 .8571466 .4285809
46. Foreign 1 .8571466 .4285809
47. Domestic .7916657 .2291682 .0416673
48. Foreign 1 .8571466 .4285809
49. Domestic .7916657 .2291682 .0416673
50. Foreign 1 .8571466 .4285809
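The conversion from log odds to probabilities can also be done with the invlogit() function, assuming a Stata release that includes it (it may not be available in the version used for this example). The variable names p1b, p2b, and p3b are hypothetical, chosen so they do not clash with p1-p3.

```stata
* invlogit(x) returns 1/(1+exp(-x)) in releases that provide it
generate p1b = invlogit(xb1)
generate p2b = invlogit(xb2)
generate p3b = invlogit(xb3)
* The two versions should agree up to rounding error
summarize p1 p1b p2 p2b p3 p3b
```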
From these cumulative probabilities we can calculate the probabilities for the individual categories.
. gen prob1 = 1-p1
. gen prob2 = p1-p2
. gen prob3 = p2-p3
. gen prob4 = p3
. list foreign prob1-prob4 in 40/50
foreign prob1 prob2 prob3 prob4
40. Domestic .2083343 .5624975 .1875009 .0416673
41. Domestic .2083343 .5624975 .1875009 .0416673
42. Foreign 0 .1428534 .4285658 .4285809
43. Domestic .2083343 .5624975 .1875009 .0416673
44. Foreign 0 .1428534 .4285658 .4285809
45. Foreign 0 .1428534 .4285658 .4285809
46. Foreign 0 .1428534 .4285658 .4285809
47. Domestic .2083343 .5624975 .1875009 .0416673
48. Foreign 0 .1428534 .4285658 .4285809
49. Domestic .2083343 .5624975 .1875009 .0416673
50. Foreign 0 .1428534 .4285658 .4285809
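Two quick checks are worth running on the category probabilities. They should sum to 1 for every observation, and none should be negative. Because -gologit- lets the coefficients differ across panels, the cumulative probabilities are not guaranteed to be ordered, so a negative difference is possible in principle. The sketch below assumes prob1-prob4 were created as above; total is a hypothetical variable name and the tolerance is an arbitrary allowance for float rounding.

```stata
* total is a hypothetical name for the row sum of the category probabilities
generate total = prob1 + prob2 + prob3 + prob4
* Stop with an error if any observation fails to sum to 1 within tolerance
assert abs(total - 1) < 1e-5
* Count observations with a negative category probability
count if prob1 < 0 | prob2 < 0 | prob3 < 0 | prob4 < 0
```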
You can interpret gologit coefficients as coefficients from binary logit models
where you have collapsed the categories of your outcome variable into two categories.
Suppose your categories are numbered 1, 2, and 3. The first panel of coefficients can
be interpreted as those from a binary logit regression where your dependent variable is
recoded as 1 vs. 2+3. The second panel of coefficients can be interpreted as those from
a binary logit regression where your dependent variable is recoded 1+2 vs. 3. Positive
coefficients mean that higher values on the covariates make higher values on the dependent
variable more likely.
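To see this interpretation at work with the auto data used above, you can collapse rep78 at one cut point and fit an ordinary binary logit. The sketch below uses the cut between average and good, which corresponds to the mleq2 panel; better_avg is a hypothetical variable name, and missing() is assumed to be available in your Stata release.

```stata
* better_avg is a hypothetical name: 1 if rep78 is 4 or 5 (good or excellent),
* 0 if rep78 is 3 or lower, and missing where rep78 is missing
generate byte better_avg = rep78 > 3 if !missing(rep78)
logit better_avg foreign
```

The estimates for foreign and the constant should be close to those in the mleq2 panel of the -gologit- output. The first cut point (poor/fair vs. everything else) is less suitable for this check, because the predicted probability p1 is 1 for every foreign car listed above, which is why the mleq1 coefficient is so large and so imprecisely estimated.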