STATA - Chow test
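The Chow test is a statistical test proposed in 1960 by Gregory C. Chow (邹至庄; G. C. Chow, 1960), the well-known Chinese-American economist and professor at Princeton University. The idea behind Chow's breakpoint test is to fit the equation separately to each subsample and check whether the estimated equations differ significantly. The null hypothesis is that the equations fitted to the two subsamples do not differ significantly; a significant difference indicates a structural change in the relationship. Before testing, the data are split into two or more subsamples. One equation is fitted to the full sample and separate equations are fitted to the subsamples; the breakpoint test is based on comparing the residual sums of squares of these two sets of equations.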
The Chow test provides a test of whether the set of linear regression parameters (i.e.,
the intercepts and slopes) is equal across groups. Hence the Chow test can be applied not only to
time-series data but also to other data. However, Stata does not do it directly. Here are some
discussions.
I ran into this problem while setting up dummy variables, when I wanted to examine whether different groups differ significantly. Below is a method provided by the user minixi, kept here for reference:
A good and simple example from minixi.
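/*
There are three main indirect ways to carry out a Chow test in Stata:
(1) an F test,
(2) introducing dummy variables (relatively simple),
(3) a likelihood-ratio test. The example below uses the data from page 77 of Li Zinai (李子奈), 2nd edition.
*/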
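. use lzn77, clear
. /* Chow test of model stability (lrtest) */
. * Chow test via the likelihood ratio; null hypothesis: no structural change between the two periods
. * Estimate the model for the first period (1981-1994)
. qui reg lnq lnxc lnp0 lnp1 in 1/14
. est store A
. * Estimate the model for the second period (1995-2001)
. qui reg lnq lnxc lnp0 lnp1 in 15/21
. est store C
. * Store the estimates for the whole period as All
. qui reg lnq lnxc lnp0 lnp1 in 1/21
. est store All
. * Likelihood-ratio test of the constraint that the structure did not change
. lrtest (All) (A C), stats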
Likelihood-ratio test                                 LR chi2(4)  =     45.00
                                                      Prob > chi2 =    0.0000
Assumption: (All) nested in (A, C)
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
         All |     21   -19.49077    47.11958      4    -86.23917   -82.06108
           A |     14   -7.845462    38.61595      4    -69.23191   -66.67568
           C |      7    12.72088    31.00136      4    -54.00272   -54.21908
-----------------------------------------------------------------------------
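The LR statistic reported above can be checked by hand from the ll(model) column. The following is a minimal sketch, not part of minixi's original code; the scalar names (ll_all, ll_a, ll_c, lr_stat, lr_df) are my own and purely illustrative. It assumes the usual definition of the statistic as twice the gap between the unrestricted and restricted log likelihoods.

```stata
* Log likelihoods copied from the ll(model) column of the table above.
scalar ll_all = 47.11958
scalar ll_a   = 38.61595
scalar ll_c   = 31.00136

* LR = 2 * (unrestricted log likelihood - restricted log likelihood).
* The unrestricted model is A and C together; the restricted model is All.
scalar lr_stat = 2 * ((ll_a + ll_c) - ll_all)

* Number of restrictions = parameters in A and C combined minus parameters in All.
scalar lr_df = (4 + 4) - 4

display "LR chi2(" lr_df ") = " %6.2f lr_stat
display "Prob > chi2 = " %6.4f chi2tail(lr_df, lr_stat)
```

Up to rounding, this should agree with the chi2(4) value and p-value that lrtest printed.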
Here are several discussions from www.stata.com.
1)http://www.stata.com/support/faqs/stat/chow.html
How can I compute the Chow test statistic?
Title Computing the Chow statistic
Author William Gould, StataCorp
Date January 1999; minor revisions July 2005
------------------------------------------------------------------------------
You can include the dummy variables in a regression of the full model and then use the test
command on those dummies. You could also run each of the models and then write down the
appropriate numbers and calculate the statistic by hand; you also have access to functions
to get appropriate p-values.
------------------------------------------------------------------------------
Here is a longer answer:
Let’s start with the Chow test to which many refer. Consider the model
y = a + b*x1 + c*x2 + u
and say that we have two groups of data. We could estimate that model on the two groups separately:
y = a1 + b1*x1 + c1*x2 + u for group == 1
y = a2 + b2*x1 + c2*x2 + u for group == 2
and we could estimate a single, pooled regression
y = a + b*x1 + c*x2 + u for both groups
In the last regression, we are asserting that a1==a2, b1==b2, and c1==c2. The formula for
the “Chow test” of this constraint is
$$F = \frac{\bigl(\mathrm{ess}_c - (\mathrm{ess}_1 + \mathrm{ess}_2)\bigr)/k}{(\mathrm{ess}_1 + \mathrm{ess}_2)/(N_1 + N_2 - 2k)}$$
and this is the formula to which people refer. ess_1 and ess_2 are the error sums of squares
from the separate regressions, ess_c is the error sum of squares from the pooled (constrained)
regression, k is the number of estimated parameters (k=3 in our case), and N_1 and N_2 are the
numbers of observations in the two groups.
The resulting test statistic is distributed F(k, N_1+N_2-2*k).
Let’s try this. I have created small datasets:
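* Group 1: 100 observations
clear
set obs 100
set seed 1234
* uniform() is the older name for runiform()
generate x1 = uniform()
generate x2 = uniform()
generate y = 4*x1 - 2*x2 + 2*invnormal(uniform())
generate group = 1
save one, replace

* Group 2: 80 observations, different coefficients and a larger error variance
clear
set obs 80
generate x1 = uniform()
generate x2 = uniform()
generate y = -2*x1 + 3*x2 + 8*invnormal(uniform())
generate group = 2
save two, replace

* Stack the two groups into one dataset
use one, clear
append using two
save combined, replace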
The models are different in the two groups, the residual variances are different, and so are
the numbers of observations. With this dataset, I can carry out the Chow test. First, I run
the separate regressions:
. regress y x1 x2 if group==1
  Source |       SS       df       MS              Number of obs =     100
---------+------------------------------           F(  2,    97) =   36.10
   Model |  328.686307     2  164.343154           Prob > F      =  0.0000
Residual |  441.589627    97  4.55247038           R-squared     =  0.4267
---------+------------------------------           Adj R-squared =  0.4149
   Total |  770.275934    99  7.78056499           Root MSE      =  2.1337
------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   5.121087    .728493     7.03   0.000      3.67523    6.566944
      x2 |  -3.227026   .7388209    -4.37   0.000    -4.693381   -1.760671
   _cons |  -.1725655   .5698273    -0.30   0.763    -1.303515    .9583839
------------------------------------------------------------------------------
. regress y x1 x2 if group==2
  Source |       SS       df       MS              Number of obs =      80
---------+------------------------------           F(  2,    77) =    5.02
   Model |   544.11726     2   272.05863           Prob > F      =  0.0089
Residual |  4169.24211    77  54.1460014           R-squared     =  0.1154
---------+------------------------------           Adj R-squared =  0.0925
   Total |  4713.35937    79  59.6627768           Root MSE      =  7.3584
------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   -1.21464     2.9578    -0.41   0.682    -7.104372    4.675092
      x2 |    8.49714   2.688249     3.16   0.002     3.144152    13.85013
   _cons |    -2.2591    1.91076    -1.18   0.241     -6.06391    1.545709
------------------------------------------------------------------------------
and then I run the combined regression:
. regress y x1 x2
  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F(  2,   177) =    2.93
   Model |  176.150454     2  88.0752272           Prob > F      =  0.0559
Residual |  5316.21341   177   30.035104           R-squared     =  0.0321
---------+------------------------------           Adj R-squared =  0.0211
   Total |  5492.36386   179   30.683597           Root MSE      =  5.4804
------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   2.692373    1.41842     1.90   0.059    -.1068176    5.491563
      x2 |   2.061004   1.370448     1.50   0.134    -.6435156    4.765524
   _cons |  -1.380331   1.017322    -1.36   0.177    -3.387973      .62731
------------------------------------------------------------------------------
For the Chow test,
$$F = \frac{\bigl(\mathrm{ess}_c - (\mathrm{ess}_1 + \mathrm{ess}_2)\bigr)/k}{(\mathrm{ess}_1 + \mathrm{ess}_2)/(N_1 + N_2 - 2k)}$$
here are the relevant numbers copied from the output above:
ess_c = 5316.21341 (from combined regression)
ess_1 = 441.589627 (from group==1 regression)
ess_2 = 4169.24211 (from group==2 regression)
k = 3 (we estimate 3 parameters)
N_1 = 100 (from group==1 regression)
N_2 = 80 (from group==2 regression)
So, plugging in, we get
$$F = \frac{\bigl(5316.21341 - (441.589627 + 4169.24211)\bigr)/3}{(441.589627 + 4169.24211)/(100 + 80 - 2 \cdot 3)} = \frac{705.38167/3}{4610.8317/174} = \frac{235.12722}{26.499033} = 8.8730491$$
The Chow test is F(k,N_1+N_2-2*k) = F(3,174), so our test statistic is F(3,174) = 8.8730491.
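The hand calculation above can also be scripted. The following is a minimal sketch, assuming the combined dataset created earlier; the scalar names (ess_1, n_1, k_par, df_den, chow_F, and so on) are my own and chosen only for illustration. It relies on the e(rss), e(N), and e(df_m) results that regress stores, and on Ftail() for the upper-tail p-value.

```stata
* Assumes the dataset "combined" built above (variables y, x1, x2, group).
use combined, clear

* Separate regressions: keep each residual sum of squares and sample size.
quietly regress y x1 x2 if group == 1
scalar ess_1 = e(rss)
scalar n_1   = e(N)

quietly regress y x1 x2 if group == 2
scalar ess_2 = e(rss)
scalar n_2   = e(N)

* Pooled (constrained) regression.
quietly regress y x1 x2
scalar ess_c = e(rss)
scalar k_par = e(df_m) + 1      // slopes plus the intercept

* Chow statistic and its p-value.
scalar df_den = n_1 + n_2 - 2*k_par
scalar chow_F = ((ess_c - (ess_1 + ess_2)) / k_par) / ((ess_1 + ess_2) / df_den)

display "Chow F(" k_par ", " df_den ") = " chow_F
display "Prob > F = " Ftail(k_par, df_den, chow_F)
```

Taking the inputs from stored results rather than retyping them avoids copying errors when transferring numbers from the regression output.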
Now, I will do the same problem by running one regression and using test to test whether certain
coefficients are equal to zero. What I want to do is estimate the model
y = a3 + b3*x1 + c3*x2 + a3′*g2 + b3′*g2*x1 + c3′*g2*x2 + u
where g2=1 if group==2 and g2=0 otherwise. I can do this by typing
. generate g2 = (group==2)
. generate g2x1 = g2*x1
. generate g2x2 = g2*x2
. regress y x1 x2 g2 g2x1 g2x2
Think about the predictions from this model. The model says
y = a3 + b3*x1 + c3*x2 + u when g2==0
y = (a3+a3′) + (b3+b3′)*x1 + (c3+c3′)*x2 + u when g2==1
Thus the model is equivalent to estimating the separate models
y = a1 + b1*x1 + c1*x2 + u for group == 1
y = a2 + b2*x1 + c2*x2 + u for group == 2
the relationship being
a1 = a3        a2 = a3 + a3′
b1 = b3        b2 = b3 + b3′
c1 = c3        c2 = c3 + c3′
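One way to check this mapping is to recover the group 2 coefficients from the interacted fit as linear combinations. This is only a sketch under the assumption that g2, g2x1, and g2x2 have been generated as shown; lincom is a standard postestimation command, but its use here is my addition rather than part of the original FAQ.

```stata
* Assumes g2, g2x1 and g2x2 already exist in the combined dataset.
quietly regress y x1 x2 g2 g2x1 g2x2

* Group 2 coefficients implied by the interacted model.
lincom _cons + g2      // a2 = a3 + a3'
lincom x1 + g2x1       // b2 = b3 + b3'
lincom x2 + g2x2       // c2 = c3 + c3'
```

The point estimates should match the coefficients of the separate group 2 regression. In the output printed in this post, for example, 5.121087 + (-6.335727) = -1.21464, which is the x1 coefficient for group 2.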
Some of you may be concerned that in the pooled model (the one estimating a3, b3, etc.),
we are constraining the var(u) to be the same for each group, whereas, in the separate-equation
model, we estimate different variances for group 1 and group 2. This does not matter, because
the model is fully interacted. That is probably not convincing, but what should be convincing
is that I am about to obtain the same F(3,174) = 8.87 answer and, in my concocted data, I have
different variances in each group.
So, here is the result of the alternative, testing coefficients against 0 in a pooled specification:
. generate g2 = (group==2)
. generate g2x1 = g2*x1
. generate g2x2 = g2*x2
. regress y x1 x2 g2 g2x1 g2x2
  Source |       SS       df       MS              Number of obs =     180
---------+------------------------------           F(  5,   174) =    6.65
   Model |  881.532123     5  176.306425           Prob > F      =  0.0000
Residual |  4610.83174   174   26.499033           R-squared     =  0.1605
---------+------------------------------           Adj R-squared =  0.1364
   Total |  5492.36386   179   30.683597           Root MSE      =  5.1477
------------------------------------------------------------------------------
       y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
      x1 |   5.121087   1.757587     2.91   0.004     1.652152    8.590021
      x2 |  -3.227026   1.782504    -1.81   0.072    -6.745139    .2910877
      g2 |  -2.086535   1.917507    -1.09   0.278    -5.871102    1.698032
    g2x1 |  -6.335727   2.714897    -2.33   0.021     -11.6941   -.9773583
    g2x2 |   11.72417    2.59115     4.52   0.000     6.610035     16.8383
   _cons |  -.1725655   1.374785    -0.13   0.900    -2.885966    2.540835
------------------------------------------------------------------------------
. test g2 g2x1 g2x2
( 1) g2 = 0
( 2) g2x1 = 0
( 3) g2x2 = 0
F( 3, 174) = 8.87
Prob > F = 0.0000
Same answer.
This definition of the “Chow test” is equivalent to pooling the data, estimating the fully
interacted model, and then testing the group 2 coefficients against 0.
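In Stata 11 or later, the same "pool the data, interact, and test" procedure can be written with factor-variable notation, so the g2 variables need not be generated by hand. This is a sketch rather than part of the original FAQ, and it assumes the combined dataset and its variable names from above.

```stata
* Assumes the dataset "combined" built above (variables y, x1, x2, group).
use combined, clear

* i.group##c.(x1 x2) expands to the group indicator, x1, x2,
* and the group-by-x1 and group-by-x2 interactions.
regress y i.group##c.(x1 x2)

* Jointly test the group shift in the intercept and in both slopes.
testparm i.group i.group#c.x1 i.group#c.x2
```

Because this is the same fully interacted model under a different parameterization, the joint test should reproduce the F(k, N_1+N_2-2*k) statistic obtained from the hand-built dummies.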
That is why I said, “Chow Test is a term I have heard used by economists in the context of
testing a set of regression coefficients being equal to 0.”
Admittedly, that leaves a lot unsaid.
The issue of the variance of u being equal in the two groups is subtle, but I do not want that
to get in the way of understanding that the Chow test is equivalent to the “pool the data,
interact, and test” procedure. They are equivalent.
Concerning variances, the Chow test itself is testing against a pooled, uninteracted model and
so has buried in it an assumption of equal variances. It is really a test that the coefficients
are equal and that variance(u) is equal across the groups. It is, however, a weak test of the equality
of variances because that assumption manifests itself only in how the pooled coefficient estimates
are manufactured. Since the Chow test and the “pool the data, interact, and test” procedure
are the same, the same is true of both procedures.
Your second concern might be that in the “pool the data, interact, and test” procedure there
is an extra assumption of equality of variances because everything comes from the pooled model.
As shown, that is not true. It is not true because the model is fully interacted and so the
assumption of equal variances never makes a difference in the calculation of the coefficients.
===============================================
Source: http://oukai8459.blog.163.com/blog/static/14031750020119332257830/
Original link: http://tezhengku.com/topic/?p=496