Preventing Subsequent Births: The Evaluators Reply

Michael Camasso, School of Social Work and Center for Urban Policy Research, Rutgers University
Carol Harvey, Center for State Health Policy, Rutgers University
Radha Jagannathan, Woodrow Wilson School of Public and International Affairs, Princeton University
Mark R. Killingsworth, Department of Economics, Rutgers University
Peter Rossi (see chapter X) points to three main "deficiencies" that, according to him, are "serious enough to cast strong doubts on the validity of the findings" in our work on New Jersey's Family Development Program (FDP). In our view, however, we do indeed know what happened as a result of FDP.
Are the Statistical Models Inappropriate?
For both our experimental-control and pre-post analyses, we presented numerous tables of logit and probit estimates. The final results highlighted in our reports are based on probit models. Rossi nevertheless finds it "surprising that the researchers present OLS results and appear to regard them as valid." Our conclusions are invariant to the statistical method used. This should not be surprising to any experienced analyst: OLS and other simple estimators are often far more robust than Rossi appears to realize. Rossi would have the reader believe that our results are highly sensitive to the statistical technique used, and he presents a comparison of our birth and abortion estimates derived from OLS, logit, and probit models. The numbers differ, as would be expected, but do they differ by much? For example, the difference between our highest and lowest estimates of the family cap's effect on abortions (2,064 vs. 1,329) is about 1.7 percent of all actual abortions occurring during the treatment period (approximately 41,000). Differences of this size should not surprise (or disturb) anyone who understands that even statistically significant point estimates come with a standard error.
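The robustness claim is easy to verify independently. The sketch below (Python with statsmodels; the data, coefficient values, and variable names are invented for illustration and are not drawn from the FDP records) fits an OLS "linear probability" model and a probit to the same simulated binary outcome; the OLS coefficient and the probit average marginal effect typically agree to two or three decimal places.

```python
# Minimal simulation, not the FDP data: for a binary outcome, an OLS
# linear-probability coefficient and a probit average marginal effect
# are usually very close, which is why the choice of estimator rarely
# changes substantive conclusions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
treated = rng.integers(0, 2, n)         # hypothetical treatment indicator
x = rng.normal(size=n)                  # hypothetical covariate
latent = -1.0 + 0.25 * treated + 0.5 * x + rng.normal(size=n)
birth = (latent > 0).astype(float)      # binary outcome

X = sm.add_constant(np.column_stack([treated, x]))
ols = sm.OLS(birth, X).fit()
probit = sm.Probit(birth, X).fit(disp=0)

print("OLS coefficient on treatment:   %.4f" % ols.params[1])
print("Probit average marginal effect: %.4f"
      % probit.get_margeff().margeff[0])
```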
Rossi also argues that because we used longitudinal data in which the same people appear at numerous dates, the observations may be serially dependent. Because of this, he contends, even the logit and probit procedures we used are invalid. He also conjectures that a more "valid" (to him) estimator could yield different results.
We have performed further analyses of the experimental-control and pre-post data using the statistical methodology that (at least at this point) Rossi seems to prefer: probit and logit with robust (Huber-sandwich) standard errors. Table 1 compares estimated treatment coefficients and standard errors for our experimental-design models for two outcomes (own births and abortions) using two estimation methods: probit and Huber-adjusted probit. Table 2 provides analogous information from our pre-post analysis.
The different estimation methods do not yield appreciably different coefficients or standard errors, and they therefore warrant no change in our inferences regarding the impact of FDP on births or abortions. There is simply no empirical support for Rossi's speculations about the possible impact of serial dependence among the observations.
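For readers who wish to see the mechanics of the correction, the following sketch (again Python with statsmodels, on a simulated panel with hypothetical variable names such as case_id; it is not the FDP data or our actual specification) re-fits a probit with Huber-sandwich standard errors clustered on the case. The point estimates are identical by construction; only the standard errors are recomputed to allow for dependence across a person's repeated observations.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel: each case contributes several quarterly observations,
# with a person-level random effect inducing serial dependence.
rng = np.random.default_rng(1)
n_people, n_quarters = 800, 8
panel = pd.DataFrame({
    "case_id": np.repeat(np.arange(n_people), n_quarters),
    "time": np.tile(np.arange(n_quarters), n_people),
    "treatment": np.repeat(rng.integers(0, 2, n_people), n_quarters),
})
person_effect = np.repeat(rng.normal(size=n_people), n_quarters)
latent = (-1.5 + 0.1 * panel["treatment"]
          - 0.02 * panel["treatment"] * panel["time"]
          + person_effect + rng.normal(size=len(panel)))
panel["birth"] = (latent > 0).astype(int)

model = smf.probit("birth ~ treatment + time + treatment:time", data=panel)
plain = model.fit(disp=0)                      # conventional standard errors
huber = model.fit(disp=0, cov_type="cluster",  # Huber-sandwich, clustered on case
                  cov_kwds={"groups": panel["case_id"]})

print("coefficient:", plain.params["treatment"], huber.params["treatment"])
print("std. error: ", plain.bse["treatment"], huber.bse["treatment"])
```

This is exactly the pattern visible in Tables 1 and 2: the probit and Huber-adjusted probit coefficients coincide, and the standard errors barely move.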
Rather than withdraw his speculation, Rossi now resorts to even more speculation, asserting that the "other deficiencies" in our work "are sufficiently serious" that even estimates based on his own preferred statistical method, Huber-corrected probit, "are not credible." This remarkable claim, which Rossi did not make in any of the previous drafts of his critique, raises an interesting question: if even the Huber-corrected estimates are "not credible," why did Rossi argue so strenuously for computing them?
Does the Pre-Post Analysis Suffer from Omitted-Variables Bias?
Rossi's second major criticism is that our pre-post analysis "cannot take into account the effects of time or other events that might affect outcomes." However, Rossi conspicuously neglects to identify any variables that we should have included but did not. He does note that the period from 1991 to 1996 saw increases in the proportion of Aid to Families with Dependent Children (AFDC) cases with either an ineligible adult caretaker or a never-married woman, and he speculates that "the effects estimated by the pre-post analysis are likely confounded with those changes." Yet he apparently does not realize that we included variables that explicitly identify the presence of ineligible adult caretakers and never-married women. Are there other variables that we should have included but did not? Rossi says nothing about what these other variables might be.
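To make the specification issue concrete, here is a schematic pre-post probit (Python with statsmodels; simulated data, hypothetical variable names and phase cutoffs, not our actual model) in which period indicators, period-specific time trends, and caseload-composition controls, including the ineligible-caretaker and never-married indicators mentioned above, all enter simultaneously.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated caseload for illustration only; variable names are hypothetical
# stand-ins for the administrative-record measures described in the text.
rng = np.random.default_rng(2)
n = 6000
df = pd.DataFrame({
    "time": rng.integers(0, 20, n),                 # quarter index (hypothetical)
    "ineligible_caretaker": rng.integers(0, 2, n),  # caseload-composition controls
    "never_married": rng.integers(0, 2, n),
})
df["middle"] = ((df["time"] >= 8) & (df["time"] < 14)).astype(int)  # hypothetical cutoffs
df["post"] = (df["time"] >= 14).astype(int)
latent = (-1.3 + 0.2 * df["middle"] - 0.04 * df["post"]
          + 0.1 * df["never_married"] + rng.normal(size=n))
df["birth"] = (latent > 0).astype(int)

# Period effects, period-specific trends, and the composition variables
# Rossi claimed were confounders all enter the same probit.
formula = ("birth ~ middle + post + time:middle + time:post"
           " + ineligible_caretaker + never_married")
result = smf.probit(formula, data=df).fit(disp=0)
print(result.params)
```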
Was the Experiment Flawed in Its Implementation?
The experimental design was developed by the Department of Human Services,
State of New Jersey, and the Administration for Children and Families, U.S.
Department of Health and Human Services. The evaluators were not involved in
either the development or the implementation of this experiment. We did,
however, monitor the experimental sample to detect any evidence of
experimental-control crossover or contamination.
Rossi offers two main reasons to support his conclusion that the implementation of FDP was flawed. First, he argues that "the integrity of the control group was problematic." He notes that "about 20" control-group recipients did not receive benefit increases appropriate to their status during the first year or so of FDP. He then assumes that "only a small proportion of recipients will have children in that period of time." He therefore asserts that this error must mean that "a much larger number of women were mistakenly treated as if they were under FDP rules." Of course, Rossi does not actually know that this was the case; it is merely speculation. We believe that there is no support in the data for such speculation. Although we uncovered several cases of questionable Medicaid extension, our monitoring of more than a dozen other FDP treatment blocks revealed no additional instances of contamination. The 21 cases we identified as having been erroneously denied benefit increases never grew in number. Of these, 16 were handled by a single field office within a large urban county. Rossi's speculation is appropriate only if it can be assumed that such benefit denials were random and widespread, rather than limited and idiosyncratic, as we strongly believe was the case.
Rossi also argues that many experimental and control subjects did not know which rules applied to them. For several reasons, we are perplexed by this argument. First, although Rossi claims that no conclusions can be drawn from our other analyses because of their methodological flaws, he bases this conclusion on the responses to what he concedes is one "poorly worded question" in a 275-question survey that had a 41-percent response rate. Second, although Rossi now feels able to draw conclusions from this survey, two years ago he argued that the "serious methodological flaws" in this survey are so severe that "no one may ever know whether New Jersey's 'family cap' had any impact on the birth rates of mothers on welfare" (see Besharov, Germanis, and Rossi 1997, 20-21). Of course, Rossi may have changed his mind and may now truly believe that the survey provides credible evidence. If so, however, we are surprised that he ignored results in this survey showing, for example, that experimental cases were more likely than control cases to have decided to put off having more children, advised a friend to have an abortion, begun to use contraception more consistently, begun to use different contraception methods, sought family-planning advice, received birth-control or abortion counseling, and tried harder to get off welfare. We would nevertheless suggest that this client survey is much more "problematic" (to use Rossi's term) than Rossi seems to realize. For example, Jagannathan (1999) found that only 29 percent of actual abortions were reported in the survey. She also found that many women reported an actual abortion as a birth.
Rossi also says that the FDP evaluation should have done follow-up studies of births and abortions for women (in both the experimental and the control groups) after leaving AFDC. We endorse this idea, particularly because the evaluation contract did not provide for follow-up data collection of this nature and because, so far as we are aware, no such follow-up analysis has ever been funded. We would also note that, by its very nature, any such follow-up analysis of post-AFDC behavior in New Jersey would have to rely on respondents' self-reporting (which can be quite unreliable) instead of on administrative records and might well suffer from attrition. Thus, such an analysis could prove even more "problematic" than Rossi seems to realize.
The evaluation of New Jersey's FDP is the only completed evaluation of a family-cap policy that includes analysis of births, abortions, family-planning visits, and contraception use. Many of Rossi's criticisms are based solely on speculation; the rest are resoundingly rejected by the empirical evidence. Perhaps most important, Rossi never explains why two distinctly different evaluation designs have led to similar results. To accept Rossi's conjectures, one must be willing to believe that this confluence of findings is a mere artifact of design flaws and differences in estimation methods.
We can argue whether New Jersey's family cap was a good policy. We can discuss whether one can generalize from New Jersey to other states. We cannot, however, avoid the conclusion that women on welfare in New Jersey responded to a family cap in an entirely predictable way: by reducing pregnancies and births and by (temporarily) increasing abortions.
References

Besharov, D. J.; Germanis, P.; and Rossi, P. H. 1997. Evaluating welfare reform: A guide for scholars and practitioners. College Park: University of Maryland School of Public Affairs.

Jagannathan, R. 1999. Who tells the truth when reporting abortions: A study of women on AFDC. Princeton, NJ: Princeton University, Office of Population Research.
Table 1. Treatment-Effect Coefficients: Experimental Design

                          Ongoing cases                         New cases
                          Treatment (SE)   Time * Status (SE)   Treatment (SE)    Time * Status (SE)
Own births
  Probit                  0.029 (0.053)    -0.011* (0.006)      -0.098* (0.035)   NC
  Huber-adjusted probit   0.029 (0.052)    -0.011* (0.005)      -0.098* (0.040)   NC
Abortions
  Probit                  -0.037 (0.046)   0.007 (0.005)        0.170* (0.065)    -0.010 (0.008)
  Huber-adjusted probit   -0.037 (0.052)   0.007 (0.005)        0.170* (0.070)    -0.010 (0.008)

SE = standard error.
* p ≤ .05
Table 2. Treatment-Effect Coefficients: Pre-Post Analysis

                          Middle (SE)      Post (SE)          Time * Middle (SE)   Time * Post (SE)
Own births
  Probit                  0.202* (0.054)   -0.041* (0.019)    -0.027* (0.006)      -0.011* (0.003)
  Huber-adjusted probit   0.201* (0.054)   -0.041* (0.018)    -0.027* (0.006)      -0.011* (0.003)
Abortions
  Probit                  -0.067 (0.050)   0.040* (0.018)     0.009 (0.006)        -0.001 (0.003)
  Huber-adjusted probit   -0.065 (0.048)   0.039* (0.018)     0.009 (0.006)        -0.001 (0.003)

SE = standard error.
* p ≤ .05