What the study found
The study found that knockoff-based variable selection can be extended to settings with missing data by combining multiple imputation with a knockoff filter. The authors report that this approach is feasible, flexible, and effective, including for predictors with unordered categories.
Why the authors say this matters
The authors say this matters because large-scale assessment data often contain many variables with missing values, and traditional knockoff methods do not consider predictors with unordered categories. The findings indicate that the proposed framework can address both missing data and categorical predictors in this setting.
What the researchers tested
The researchers extended the knockoffs method for selecting predictors to missing-data settings by first applying multiple imputation, which fills in missing values several times to create complete versions of the data. Each imputed dataset was then analyzed with a suitable knockoff filter. They evaluated the method in simulation studies and applied it to INVALSI data on test scores of Italian grade 5 students and background variables.
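The impute-then-filter workflow described above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that do not come from the study: the predictors are continuous (the study's unordered categorical predictors are not handled here), the imputation step is a crude random draw from each column's observed distribution standing in for a proper multiple-imputation model, the knockoffs are equicorrelated Gaussian model-X knockoffs, and the results from the imputed datasets are combined by selection frequency. The summary does not say how the authors combine results, and all function names and parameters below are hypothetical.

```python
import numpy as np

def impute_once(X, rng):
    """Crude stochastic imputation (assumption: a placeholder for a real MI model).
    Each missing entry is drawn from a normal fitted to its column's observed values."""
    X = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        obs = X[~miss, j]
        X[miss, j] = rng.normal(obs.mean(), obs.std(ddof=1), size=miss.sum())
    return X

def gaussian_knockoffs(X, rng):
    """Standardize X and sample equicorrelated Gaussian model-X knockoffs."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Sigma = np.corrcoef(Z, rowvar=False)
    # Equicorrelated choice: s just below min(1, 2 * smallest eigenvalue).
    s = 0.99 * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma)[0])
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)
    cond_mean = Z - Z @ Sigma_inv_D          # E[knockoff | Z]
    cond_cov = 2.0 * D - D @ Sigma_inv_D     # Cov[knockoff | Z]
    cond_cov = (cond_cov + cond_cov.T) / 2.0
    L = np.linalg.cholesky(cond_cov + 1e-8 * np.eye(p))
    return Z, cond_mean + rng.standard_normal((n, p)) @ L.T

def knockoff_statistics(Z, Z_ko, y):
    """W_j = |Z_j' y| - |knockoff_j' y|; large positive values suggest a real signal."""
    yc = y - y.mean()
    return np.abs(Z.T @ yc) - np.abs(Z_ko.T @ yc)

def knockoff_plus_threshold(W, q):
    """Smallest t whose estimated false discovery proportion is at most q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold qualifies, so nothing is selected

def mi_knockoff_select(X_missing, y, q=0.2, n_imputations=10, min_freq=0.5, seed=0):
    """Run the knockoff filter on each imputed dataset and keep predictors
    selected in at least min_freq of them (aggregation rule is an assumption)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X_missing.shape[1])
    for _ in range(n_imputations):
        X_imp = impute_once(X_missing, rng)
        Z, Z_ko = gaussian_knockoffs(X_imp, rng)
        W = knockoff_statistics(Z, Z_ko, y)
        counts += W >= knockoff_plus_threshold(W, q)
    freq = counts / n_imputations
    return np.flatnonzero(freq >= min_freq), freq

# Synthetic demonstration: 20 predictors, the first 8 truly related to y,
# with about 10% of predictor values removed at random.
rng = np.random.default_rng(1)
n, p = 500, 20
X = rng.standard_normal((n, p))
y = X[:, :8] @ np.full(8, 2.0) + rng.standard_normal(n)
X_missing = X.copy()
X_missing[rng.random((n, p)) < 0.1] = np.nan

selected, freq = mi_knockoff_select(X_missing, y)
print("selected columns:", selected)
print("selection frequency:", np.round(freq, 2))
```

The frequency cut-off `min_freq` is a tuning choice in this sketch, not a rule taken from the study. A real analysis would also replace the placeholder imputation with a model suited to the data and use a knockoff construction that handles categorical predictors.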
What worked and what didn't
In the simulation studies, the proposed method performed satisfactorily and produced results consistent with a recently advocated cutting-edge method. In the case study, the approach was applied to data in which most predictors had unordered categories and some key predictors had missing values. The abstract reports that the method was feasible, flexible, and effective in this application.
What to keep in mind
The abstract does not describe detailed limitations or failure cases. The reported results come from simulation studies and a single applied case study in large-scale assessment data, so they do not show how the method performs in other settings.
Key points
- The study extends knockoff-based variable selection to missing-data settings using multiple imputation.
- Simulation studies showed satisfactory performance, consistent with a recently advocated cutting-edge method.
- The method was applied to INVALSI data on Italian grade 5 students with many background variables.
- The case study involved mostly unordered categorical predictors and some key predictors with missing values.
- The authors describe the approach as feasible, flexible, and effective.
Disclosure
- Research title: Knockoff variable selection works with missing categorical data


