Development of a WIC Participant and Program Characteristics Longitudinal Data Set

The goal of this study was to pilot creating a Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) participant and program characteristics (PC) longitudinal data set with one WIC state agency. This report describes the process of working with one WIC state agency to create a pilot WIC PC longitudinal data set of infant and child participants and the challenges and successes of this effort.

Seven of eight WIC state agencies surveyed had “ideal” or “sufficient” capabilities to provide a longitudinal WIC PC data set.

Three had “ideal” capabilities and only one scored “insufficient” for any of the necessary capabilities (Figure 1; see Table 1 for definitions). The ability to provide longitudinal WIC PC data was based on whether a state agency included data elements that could be used to link records across data sets (e.g., participant identification (ID), last and first name), the type and amount of retrospective data available, availability of key WIC PC variables, and whether participant IDs are retained when participants re-enroll.

Table 1. Summary of management information system (MIS) criteria for state agency selection.
MIS Criteria | MIS Capabilities: Ideal | MIS Capabilities: Sufficient | MIS Capabilities: Insufficient
Data elements that could be used to link records across data sets | Consistent participant identification (ID), household ID, and first and last name | Consistent participant ID only | No data elements that could be used to link records across data sets
Type of available data | All instances of updates and changes to each participant’s record | All certification and recertification visits but no records from other visit types | Inconsistent or uncertain frequency of records across caseload
Years of available data | 5 years of retrospective data | Between 4 and 5 years of retrospective data and capability to provide periodic prospective submissions | Fewer than 5 years of data
Scope of available data | All infants and children enrolled at any time during previous 5 years with an indicator for currently enrolled | All infants and children enrolled at any time during the previous 5 years without an indicator for currently enrolled or records for currently enrolled infants and children only | The state agency can provide records for only a subset of enrolled infants and children
Available variables | Five key supplemental variables (i.e., date of first WIC certification, education level of parent, number in household on WIC, birth weight, and birth length) available for all available years of data | Fewer than five key supplemental variables (i.e., date of first WIC certification, education level of parent, number in household on WIC, birth weight, and birth length) available for all available years of data | N/A
ID persistence | ID retained for infants and children who re-enroll in WIC after a period of nonparticipation | New ID assigned for infants and children who re-enroll in WIC | N/A
Chart of number of state agencies with ideal, sufficient, or insufficient data provision capabilities for six categories. Seven state agencies had ideal data elements for linking records across data sets and one had sufficient; six had ideal type of data, one had sufficient, and one had insufficient; seven had ideal years of data and one had sufficient; seven had ideal scope of data and one had sufficient; five had ideal available variables and three had sufficient; all eight had ideal ID persistence.
Figure 1. State agency management information systems (MIS) data provision capabilities survey results, N=8 state agencies.
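
As an illustration of how per-criterion ratings could roll up into an overall rating, the sketch below assumes an agency's overall capability is the lowest rating it receives on any of the six criteria. That roll-up rule, the criterion keys, and the example agency are assumptions made for illustration; the report does not state the exact scoring rule.

```python
# Ordering of the three capability levels used in Table 1.
RANK = {"insufficient": 0, "sufficient": 1, "ideal": 2}


def overall_capability(ratings: dict) -> str:
    """Return the lowest rating across all criteria (assumed roll-up rule)."""
    return min(ratings.values(), key=lambda rating: RANK[rating])


# Hypothetical agency, not taken from the survey results.
example_agency = {
    "linking_elements": "ideal",
    "type_of_data": "ideal",
    "years_of_data": "ideal",
    "scope_of_data": "ideal",
    "available_variables": "sufficient",
    "id_persistence": "ideal",
}

print(overall_capability(example_agency))
```

Under this rule, a single "insufficient" criterion makes the agency insufficient overall, no matter how the other five criteria are rated.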

Matching techniques correctly linked nearly all (99%) WIC participant records over time.

  • Deterministic matching uses participant ID to link records belonging to the same person. It cannot link records of participants with more than one ID. Nearly all (99.9%) infants and children had only one participant ID and could therefore be linked by participant ID (i.e., through deterministic matching). As with all state agencies, the pilot state agency had processes in place to minimize the number of participants with multiple participant IDs over time in its management information system (MIS). However, probabilistic matching is still helpful because participants could receive more than one participant ID over time due to data entry mistakes or imperfect data cleaning processes. Although women were not included in this pilot, we expect that probabilistic matching is especially important for women, who are more likely to leave WIC and return if they have multiple pregnancies and thus may be more likely to be assigned multiple participant IDs.
  • Probabilistic matching uses other variables to link records likely to belong to the same person. It can link records of participants with more than one ID. To inform decisions about whether records with different IDs belonged to the same participant, the researchers used variables that should not change over time and are available in all state agencies (i.e., first name, last name, date of birth, sex, race, and ethnicity). The researchers then developed a similarity score that used those variables to quantify the similarity between records. During testing, the probabilistic matching similarity score correctly matched records longitudinally for 99.97% of tested cases.
  • The matching procedure for this study resulted in a high-quality longitudinal WIC PC data set containing records for all infants and children over a six-year period for one state agency. This study suggests that similar results could be achieved with other state agencies.
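
A similarity score of the kind described above might be sketched as follows. This is a minimal illustration, not the study's actual scoring method: the field names, the weights, and the use of Python's `difflib.SequenceMatcher` for string comparison are all assumptions. Only the list of variables (first name, last name, date of birth, sex, race, and ethnicity) comes from the report.

```python
from difflib import SequenceMatcher

# Hypothetical weights (summing to 1); the report does not publish its weighting.
WEIGHTS = {
    "first_name": 0.25,
    "last_name": 0.25,
    "date_of_birth": 0.30,
    "sex": 0.10,
    "race": 0.05,
    "ethnicity": 0.05,
}


def field_similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity between two string values, ignoring case."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0  # a missing value gives no evidence of a match
    return SequenceMatcher(None, a, b).ratio()


def similarity_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of per-field similarities, between 0 and 1."""
    return sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


# Hypothetical records: rec_b differs from rec_a by one letter in the last name.
rec_a = {"first_name": "Maria", "last_name": "Lopez", "date_of_birth": "2014-03-02",
         "sex": "F", "race": "White", "ethnicity": "Hispanic"}
rec_b = dict(rec_a, last_name="Lopes")
rec_c = {"first_name": "John", "last_name": "Smith", "date_of_birth": "2015-11-20",
         "sex": "M", "race": "Black", "ethnicity": "Non-Hispanic"}

print(similarity_score(rec_a, rec_b) > similarity_score(rec_a, rec_c))
```

A near-duplicate record with a small spelling difference scores close to 1, while a record for a different child scores much lower, which is what allows a threshold to separate likely matches from non-matches.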

The pilot longitudinal WIC PC data set was successfully used to determine retention and anemia resolution among WIC participants.

  • Analysis from the WIC PC longitudinal data set showed that among infants and children, 85% first enrolled before the age of one and 56% were last certified for WIC benefits before turning 3 years old (Figure 2). Children are eligible for WIC until age 5, yet the pilot data analysis found that most participants drop out by age 3. Future analysis using a longitudinal WIC PC data set could determine factors associated with later enrollment in WIC and dropping out of WIC early.

Age at First Certification

Bar chart shows age at first certification. Eighty-five percent were first certified at less than 1 year of age, 5% at 1 year, 4% at 2 years, 3% at 3 years, and 3% at 4 years.
Figure 2a. Age at first certification among infants/children born in 2014.

Age at Last Certification

Bar chart shows age at last certification. Twenty-five percent were last certified at less than 1 year of age, 17% at 1 year, 13% at 2 years, 15% at 3 years, and 30% at 4 years.
Figure 2b. Age at last certification among infants/children born in 2014.
  • The longitudinal WIC PC data were used to determine that 21% of infants and children had a normal hemoglobin level within 12 months of first being identified as anemic. Although further analysis on resolution of anemia status was not conducted for this study, future analyses could allow for a better understanding of which WIC participants are most likely to experience poor health outcomes such as anemia or unhealthy weight.
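
An anemia-resolution measure like the one in the last bullet could be computed along these lines. This is a sketch under stated assumptions: each participant's hematology history is represented as hypothetical `(test_date, is_anemic)` pairs, "within 12 months" is approximated as 365 days, and the denominator is taken to be participants ever identified as anemic. The report does not specify these details.

```python
from datetime import date, timedelta
from typing import Optional


def resolved_within_12_months(tests: list) -> Optional[bool]:
    """tests: (test_date, is_anemic) pairs for one participant.

    Returns None if the participant was never anemic; otherwise True when a
    non-anemic test falls within 365 days after the first anemic test.
    """
    tests = sorted(tests)
    first_anemic = next((d for d, anemic in tests if anemic), None)
    if first_anemic is None:
        return None
    window_end = first_anemic + timedelta(days=365)
    return any(
        not anemic and first_anemic < d <= window_end for d, anemic in tests
    )


def resolution_rate(participants: dict) -> float:
    """Percent of ever-anemic participants whose anemia resolved in the window."""
    outcomes = [resolved_within_12_months(t) for t in participants.values()]
    ever_anemic = [o for o in outcomes if o is not None]
    if not ever_anemic:
        raise ValueError("no participant was ever identified as anemic")
    return 100 * sum(ever_anemic) / len(ever_anemic)


# Hypothetical histories keyed by linked participant.
histories = {
    "p1": [(date(2015, 1, 10), True), (date(2015, 7, 10), False)],   # resolved
    "p2": [(date(2015, 1, 10), True), (date(2016, 6, 1), False)],    # too late
    "p3": [(date(2015, 3, 5), False)],                               # never anemic
}
print(resolution_rate(histories))
```

The calculation depends on linked records: without a consistent participant key, the first anemic test and the follow-up test could not be attributed to the same child.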

Why We Did This Study

At the time of this study, we had no participant-level longitudinal data sets containing WIC PC data. Since 1992, we have collected WIC PC data every two years. The WIC PC data are a census of WIC participants and describe participant-level information on demographics, income, nutritional risks, anthropometrics, hematology, breastfeeding status, and food package prescriptions during the month of April for each WIC PC year.

Currently, WIC PC data can be used only to identify population-level trends in the program over time. Because the WIC PC data lack identifiers to link participants over time, individual-level longitudinal analyses, such as measuring participant retention, cannot be conducted.

How We Did This Study

Eight state agencies with varying MIS platforms and high-quality WIC PC 2020 data completed a survey about their MIS and longitudinal data provision capabilities. The survey assessed these state agencies’ MIS capabilities, as measured in six areas, to provide a longitudinal data set.

Next, we chose one state agency and worked with state agency staff and their MIS contractor to extract longitudinal data that included one record per infant or child per week from January 2014 to December 2019.

Once the data were collected, we used two matching approaches to link the records for each infant or child over time. First, we used deterministic matching, in which the participant ID provided by the state agency was used to link records belonging to the same infant or child. Second, for participants who may have been assigned more than one participant ID, we used probabilistic matching, in which other identifying variables (e.g., date of birth, first name, and last name) were used to match records that likely belonged to the same participant. Each potential probabilistic match was rated with a similarity score. We reviewed the probabilistic matches and compared their similarity scores to a predetermined threshold to determine whether to accept each match.
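
The two-step linkage described above can be sketched roughly as follows. This is an illustration under assumptions, not the study's implementation: the record fields, the averaged string-similarity score, and the 0.9 threshold are hypothetical, and the manual review of probabilistic matches is not modeled.

```python
from difflib import SequenceMatcher
from itertools import combinations

MATCH_FIELDS = ("first_name", "last_name", "date_of_birth")
THRESHOLD = 0.9  # hypothetical cutoff; the study's threshold is not stated


def similarity(a: dict, b: dict) -> float:
    """Average string similarity across the identifying fields (stand-in score)."""
    ratios = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio()
        for f in MATCH_FIELDS
    ]
    return sum(ratios) / len(ratios)


def link_records(records: list) -> dict:
    """Map each participant ID to the ID chosen to represent the linked person."""
    # Step 1 (deterministic): records that share a participant ID are one person,
    # so keep one representative record per ID.
    by_id = {}
    for rec in records:
        by_id.setdefault(rec["participant_id"], rec)

    # Step 2 (probabilistic): merge IDs whose records meet the threshold,
    # tracking merged groups with a small union-find structure.
    parent = {pid: pid for pid in by_id}

    def find(pid: str) -> str:
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]  # path compression
            pid = parent[pid]
        return pid

    for id_a, id_b in combinations(by_id, 2):
        if similarity(by_id[id_a], by_id[id_b]) >= THRESHOLD:
            parent[find(id_b)] = find(id_a)

    return {pid: find(pid) for pid in by_id}


# Hypothetical records: "B7" is the same child as "A1" with a misspelled last name.
records = [
    {"participant_id": "A1", "first_name": "Maria", "last_name": "Lopez", "date_of_birth": "2014-03-02"},
    {"participant_id": "A1", "first_name": "Maria", "last_name": "Lopez", "date_of_birth": "2014-03-02"},
    {"participant_id": "B7", "first_name": "Maria", "last_name": "Lopes", "date_of_birth": "2014-03-02"},
    {"participant_id": "C3", "first_name": "John", "last_name": "Smith", "date_of_birth": "2015-01-01"},
]
print(link_records(records))
```

Comparing every pair of IDs is practical only for small inputs. A full caseload would typically need a blocking step, such as comparing only records that share a date of birth, to keep the number of comparisons manageable.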

Finally, we conducted example analyses to demonstrate the type of information that could be gained from longitudinal WIC PC data.
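
As one example of such an analysis, the distribution of age at first and last certification could be derived from linked records roughly as shown below. The input layout (a birth date and a list of certification dates per participant) and the function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def age_in_years(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - int(before_birthday)


def certification_age_shares(participants: list) -> tuple:
    """Return percent of participants by age at first and at last certification."""
    first, last = Counter(), Counter()
    for p in participants:
        dates = sorted(p["certification_dates"])
        first[age_in_years(p["birth_date"], dates[0])] += 1
        last[age_in_years(p["birth_date"], dates[-1])] += 1

    n = len(participants)

    def to_percent(counts: Counter) -> dict:
        return {age: 100 * k / n for age, k in sorted(counts.items())}

    return to_percent(first), to_percent(last)


# Hypothetical linked participants born in 2014.
cohort = [
    {"birth_date": date(2014, 5, 10),
     "certification_dates": [date(2014, 6, 1), date(2016, 4, 1)]},
    {"birth_date": date(2014, 1, 15),
     "certification_dates": [date(2015, 2, 1), date(2018, 3, 1)]},
]
first_shares, last_shares = certification_age_shares(cohort)
print(first_shares, last_shares)
```

Restricting the input to a single birth cohort whose full eligibility period falls inside the data window keeps the last-certification ages from being cut short by the end of the extract.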

Next Steps

  • We have taken lessons learned from this pilot (see Key Findings) to inform the first national collection of longitudinal WIC PC data as part of the 2024 WIC PC data collection. These findings helped shape guidance and technical assistance for state agencies.
  • The WIC PC 2024 collection built on recommendations from the pilot to include women in the longitudinal data set and to link household members via household ID. We will use these data for new analyses to better understand retention and changes in other key measures over time.

Suggested Citation

Beckerman-Hsu, J., Huret, N., & Zvavitch, P. (2025). Development of a WIC Participant and Program Characteristics Longitudinal Data Set. Prepared by Insight Policy Research, Inc., Contract No. GS-10F-0136X, Order No. 12319820F0078. Alexandria, VA: U.S. Department of Agriculture, Food and Nutrition Service, Project Officer: Amanda Reat. Available online at: www.fns.usda.gov/research/wic/pc-longitudinal-dataset.

Page updated: April 15, 2025