ANALYSIS OF EVALUATIONS OF THE NEW JERSEY FAMILY DEVELOPMENT PROGRAM1
Gene Falk and Christine Devere
Congressional Research Service
This memorandum is in response to your request for an analysis of evaluations of the New Jersey Family Development Program (FDP). You specifically asked that we discuss conclusions concerning the effect of the family cap on abortions.
We have analyzed two reports currently available on the New Jersey FDP completed by a team of researchers from Rutgers University. These two reports reach conflicting conclusions. The first report, based on evidence from a randomized experiment, found that the family cap had no effect on abortions.2 The second report, a statistical analysis of trends in abortions among the state's welfare population, found a statistical relationship between implementation of the family cap and abortions. Specifically, this report concluded that abortions were higher than expected after implementing the family cap.3
We have reviewed the findings and the methodology presented in these two reports. In our judgment, the results presented in both reports are inconclusive. While both techniques are appropriate, there were problems in implementation. The finding from the randomized experiment that the family cap did not affect abortions might be due to confusion of members of the "control" group as to whether or not they were subject to the family cap, which could have "contaminated" the experiment. The finding from the statistical analysis that implementation of the family cap was associated with an increase in abortions could be an artifact of the particular statistical model chosen for presentation in the study. These issues are discussed at length below. Additionally, given the inconclusive body of literature that exists surrounding the relationship between welfare benefits and fertility, in general, it is not possible to determine whether either finding is consistent with other studies.4
Every evaluation technique has its strengths and weaknesses. Experimental designs suffer from limited generalizability: a result found in the first report, with implications for the state of New Jersey, cannot be generalized to other welfare programs throughout the nation. Statistical analysis, on the other hand, is inherently sensitive to the particular statistical model chosen. Because models differ in how their estimates are calculated, the results may simply be a function of the model selected. In both instances, therefore, replication and further testing of the data, as well as continued development of the model, are recommended.
Family Development Program (FDP)
The FDP, which was signed into law in 1992 and officially implemented in October of that year, represented New Jersey's effort to experiment with welfare reform under Section 1115 of the Social Security Act. The "family cap" provision of this reform has attracted the most attention. It precludes an Aid to Families with Dependent Children (AFDC) recipient from receiving additional cash benefits for a child conceived while the parent was on welfare. The loss in cash benefits amounts to $102 per month for a second child and $64 per month for each additional child.5 The child, however, remains eligible for Medicaid coverage, food stamps, and other benefits. Further, the parent may offset the lost cash through earnings (via a special disregard of earnings used to calculate the family benefit). The FDP experiment also includes some enriched social services for welfare families, special work requirements and sanctions, and more liberal treatment of stepparent families.
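The benefit arithmetic above can be sketched in a few lines. This is a hypothetical illustration (the function name and its arguments are invented for exposition); only the dollar increments, $102 for a second child and $64 for each additional child, come from the text.

```python
# Hypothetical helper illustrating the family-cap benefit arithmetic:
# a capped second child forgoes $102/month; each additional capped child
# forgoes $64/month. The function name and arguments are illustrative.

def capped_monthly_loss(children_conceived_on_aid, existing_children):
    """Monthly AFDC cash benefit withheld under the family cap."""
    loss = 0
    child_rank = existing_children  # birth order of the next capped child
    for _ in range(children_conceived_on_aid):
        child_rank += 1
        if child_rank == 2:
            loss += 102  # second child on the grant
        elif child_rank > 2:
            loss += 64   # third or later child
    return loss

# One child already on the grant, one more conceived on aid: $102/month withheld.
print(capped_monthly_loss(1, existing_children=1))  # -> 102
# Two more conceived on aid: $102 + $64 = $166/month withheld.
print(capped_monthly_loss(2, existing_children=1))  # -> 166
```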
As a condition of federal approval for New Jersey's FDP, the state agreed to evaluate the program. New Jersey conducted an evaluation based, in part, on a randomized experiment. As there were issues of possible contamination within this experimental design, a nonexperimental, multivariate statistical analysis was also completed by the research team to evaluate FDP. It is this second report (currently in draft form) that has identified a relationship between the FDP and the number of abortions among the AFDC population in New Jersey.
Evaluating the Family Cap
The process by which welfare recipients (and others) receive, assimilate, and then act on policy "signals" sent by lawmakers is not well understood. Evaluating policies such as the family cap will pose special problems for researchers. There is a large body of literature about evaluating different types of benefits and services provided by government programs, such as education or job training, and how they affect outcomes. However, for the family cap to affect outcomes (e.g., out-of-wedlock births and abortions) it must influence highly personal behavior. The change in policy must be communicated and the information about the consequences of certain types of behavior understood by welfare recipients (and others in the population). Then, this information must be acted on by the individual recipients.
Welfare reform has had a high profile in the media and in political debate. Much of the publicity surrounding welfare reform likely occurs during legislative consideration of proposals, often well in advance of formal implementation dates and consequently well in advance of the selection of recipients into experimental and control groups for evaluation. It is not known how much recipients act on information gleaned from the media versus information given to them by caseworkers or heard through the neighborhood. This leads evaluators to ask when behavioral change can be expected to occur: while reform proposals are receiving heavy publicity during legislative consideration, at the actual implementation date, or only after implementation? Moreover, recipients might act on general impressions of welfare policy changes (e.g., tougher or laxer) rather than program specifics.
Evaluations of this type of policy change must be able to capture changes in behavior. However, the models and data used to evaluate policies such as the family cap often are not structured to provide this information, because they are not designed to measure individual behavioral changes. Therefore, at best, any relationship illustrated is simply a statistical association; it cannot be proven causal. With the family cap, the models and data simply indicate whether or not the FDP affected the number of abortions. Neither the experimental design nor the statistical analysis used in the two reports is structured to explain why or how family planning outcomes changed, issues that are important in evaluating the full impact of policies such as the family cap.
Results from the Experiment
The first report evaluated an experimental design where AFDC recipients from eight New Jersey counties were randomly assigned to either a control group or an experimental group.6 The control group received only JOBS/REACH services and benefits and was subject to AFDC program regulations. The experimental group, however, was subject to all provisions and waivers under the FDP. The purpose of this experimental design was to ascertain any impact that the FDP program might be having on the AFDC population. The randomization process is designed to create groups that are equivalent in their observed and unobserved characteristics. Therefore, in the absence of the policy change, it is expected that these two groups will behave in the same manner. Any difference in behavior or outcomes measured between the control group and the experimental group is assumed to be an effect of difference in program treatment.
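The logic of that comparison can be sketched with a two-proportion test on hypothetical counts. The numbers below are invented for illustration and are not taken from the report; with successful randomization, the program effect is estimated as the simple treatment-minus-control difference in outcome rates, and a z statistic within plus or minus 1.96 is conventionally read as no detectable effect.

```python
# Illustrative sketch of the experimental comparison: difference in
# outcome rates between the experimental and control groups, with a
# standard two-proportion z statistic. All counts below are invented.
from math import sqrt

def difference_in_rates(events_t, n_t, events_c, n_c):
    """Treatment-minus-control rate difference and two-proportion z statistic."""
    p_t, p_c = events_t / n_t, events_c / n_c
    p_pool = (events_t + events_c) / (n_t + n_c)  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return p_t - p_c, (p_t - p_c) / se

diff, z = difference_in_rates(events_t=55, n_t=2000, events_c=50, n_c=2000)
print(f"difference = {diff:.4f}, z = {z:.2f}")  # |z| < 1.96: no detectable effect
```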
The experiment was supplemented by a two-part survey. The first component of this study included a survey of case managers to determine how implementation, responsibilities and workers' views differed across the counties. In general, the researchers found that there were differences across the counties in the sample in terms of implementation and that these differences may have some impact on the outcomes of the program. The second component was a survey of the AFDC recipients selected in the sample to ascertain their perspectives on various program components such as incentives, job training, fertility, and family planning.
The researchers found no difference in abortions between the control and experimental groups of women. All experimental designs have strengths and weaknesses, and while all experimental results are subject to interpretation, the results in the first report warrant even greater scrutiny because of contamination. There were essentially two types of contamination: media contamination and treatment contamination. Of primary concern in evaluating the results of the first report is media contamination.7
New Jersey was one of the first states to institute a family cap, and the provision therefore received a great deal of media coverage and exposure. The researchers may have succeeded in randomizing the population, but they most likely did not succeed in shielding those not subject to the family cap provision from knowledge of it. Specifically, while individuals in the control group were not subject to any provisions of the FDP, given their probable exposure to the program through the media, there is no way to be sure they did not behave as though they were subject to it. Indeed, results of the AFDC recipient survey indicated that 62% of control group members did not recognize their control group status and therefore most likely did not realize that they were exempt from the family cap. If individuals in the control group behaved on the belief that they were subject to the family cap, the results obtained from this group would not be reliable.8
Given the difficulties with the experimental design used in the first report, the second analysis completed by the research team used non-experimental multivariate statistical techniques to examine the effect of the family cap on abortions. Multivariate statistical analysis is an appropriate technique for examining whether a relationship exists between the family cap and abortions, and given the media contamination in the FDP experiment, it is legitimate to resort to it. However, an inherent limitation of such analysis is that findings can be very sensitive to the exact statistical model used. The results of such studies are therefore often best viewed in the context of a body of literature addressing the same or similar issues. The literature examining the statistical relationship between AFDC and fertility, however, is inconclusive, so it is not possible to determine whether the findings are consistent with other studies.
This study used administrative data from New Jersey's Division of Family Development, the Division of Medical Assistance and the Department of Health to create a database of 306,544 cases. Each of these cases experienced at least one quarter of AFDC recipiency between October 1990 and December 1996 and was not included as a treatment or control case in the five-year evaluation of the FDP.
The study estimated the effect of AFDC benefit levels on abortion rates while controlling for other individual characteristics. Individual characteristics include age, race, education, the number of children on the AFDC grant, time on AFDC, the number of children under age 6, and the number of children age 16 and older. In addition, as the first report found implementation and program outcome differences by county, any possible regional effects were controlled for over time.
In sum, the second report, based on statistical analysis, found that the FDP appears to have increased the number of abortions by 2.4 for every 1,000 "women at risk." As the research team assumes there are approximately 100,000 women at risk (which excludes non-needy parent person cases), they report an increase of 240 abortions in response to the family cap.9 This is in comparison to any increase that might have been expected due to changes in population and changes over time.
As with any statistical analysis, the model used for the second report depends on the assumptions and restrictions inherent in its design. With this in mind, a primary shortcoming of the report is that it does not illustrate the sensitivity of the model to changes in assumptions. The purpose of a sensitivity analysis is to vary some of the parameters defined in the model to show how those changes affect the reported results. An inherent problem with statistical analysis is the possibility that the reported results are simply a function of the chosen estimation model. Because the second report contains no sensitivity analysis, we cannot rule out that the results are an artifact of the particular model chosen by the research team. The report also does not address the following issues:
- The selection of the statistical model. The outcome of primary interest, whether or not a recipient had an abortion, is a discrete (binary) variable: a recipient either had an abortion or did not. Three types of estimation models are commonly used for such outcomes: linear probability models, logit models, and probit models. The research team used a linear probability model. In many instances, the three models produce similar results, but logit or probit models are more frequently used for discrete outcomes because of concerns with how estimates are calculated under a linear probability model; for example, a linear probability model can produce predicted probabilities outside the range of 0 to 1. Given this concern, it remains unclear whether the relationships presented in the report are significant, and the direction of any bias is not clear: relationships identified as significant may not exist, and vice versa. There was little discussion of why the linear probability model was used, and given the concern with its estimation technique, we question whether the reported relationship between the family cap and the number of abortions actually exists. The research team expressed concern over estimating the model by logit or probit, but this concern was not sufficiently supported, and the logit and probit models were not explored. Additional testing and analysis should be completed to ensure that the results are not simply an artifact of the linear probability estimation.
- The timing of the variable to measure the intervention effect. The FDP was implemented in October of 1992. However, to allow for the lag between decisions affected by the FDP and observed outcomes, any abortions after March 1993 (approximately 5 months after implementation) are defined as post-program abortions. It is debatable whether this timing correctly captures the FDP effect, since many pregnancies could have begun before program implementation. Given the media attention the family cap received prior to implementation, the point at which the policy began affecting individual behavior is unclear. The research team, however, assumed only one possible intervention point. At the very least, alternative intervention points should have been explored, since the chosen point is open to much debate and criticism.
- The selection of included variables. There is little discussion of the variables chosen for inclusion in the model. The included variables appear to have been limited to those available from the administrative data set, which is not a complete profile of the relevant characteristics of AFDC recipients. Omitted relevant variables might have biased the results presented in the report; omitted variable bias is a common methodological concern that was not addressed.
- Interaction of the variables. This is another aspect of the need for sensitivity analysis. Some of the included variables might interact to produce the effect presented in the results. For example, the number of children a recipient has could be related to the recipient's age or education, so it could be the combined effect of age (or education) and the number of children, rather than the number of children alone, that affects the decision to have an abortion. These issues may have been explored, but again there is no discussion of any additional testing or consideration given to them.
- Selection bias. The population receiving welfare benefits changes over time as individuals self-select into (and out of) the welfare system. The data used in the second report cover 1990 to 1996, a period during which individuals both entered and exited the welfare system. Estimating the effect of the FDP on abortions requires controlling for the changing characteristics of this population over time. While many of these characteristics can be observed, some cannot. The research team included a measure of time to control for variation in the characteristics of the welfare population, but this time variable will not control for all unobservable characteristics that change over time. The results of the report may therefore be biased (incorrectly estimated).
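Several of the modeling concerns listed above, particularly the choice of a linear probability model and the role of interaction terms, can be illustrated with a small simulation. The sketch below uses invented data, not the report's data or model: a binary outcome is generated by a logistic process and then fit by ordinary least squares (a linear probability model) with an interaction term. Because the linear fit is unconstrained, some fitted "probabilities" fall outside the 0-to-1 range, which is the estimation concern that motivates logit or probit alternatives.

```python
# Illustrative simulation (invented data) of the linear probability model
# concern: OLS on a 0/1 outcome can yield fitted "probabilities" above 1.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated covariates: age and number of children, plus their interaction
# (the interaction concern raised in the list above).
age = rng.uniform(18, 45, n)
kids = rng.integers(0, 5, n)
X = np.column_stack([np.ones(n), age, kids, age * kids])

# True data-generating process is logistic, so true probabilities lie in (0, 1).
logit_index = -8.0 + 0.3 * age + 0.2 * kids - 0.005 * age * kids
p_true = 1 / (1 + np.exp(-logit_index))
y = rng.binomial(1, p_true)

# Linear probability model: regress the 0/1 outcome on X by least squares.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ beta

print("fitted probability range:", round(p_hat.min(), 3), round(p_hat.max(), 3))
# The unconstrained linear fit overshoots where the logistic curve saturates,
# so the maximum fitted value exceeds 1.
```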
Social science experiments and research depend on the assumptions and parameters defined by the researcher, so the results presented are subject to interpretation and further testing. The first report suffers from contamination within the experiment. The second report fails to address several common methodological issues and does not sufficiently caution the reader about context and interpretation. Although additional investigation of these data might produce the same results, the findings of both reports should be interpreted with a great deal of caution until further testing or exploration of the data can be completed.
1 This paper is a July 9, 1998, memo from the authors to Ron Haskins of the House Ways and Means Committee.