What the study found
The prediction accuracy of hydrological machine learning models depended on both the quantity and the quality of information in the training data.
Why the authors say this matters
The authors conclude that a theory-driven approach can improve data-driven modeling by supplying high-quality information about the system of interest.
What the researchers tested
The researchers trained three machine learning models to predict flow discharge, sediment, total nitrogen, and total phosphorus loads in four watersheds. They increased the information in the training data by adding weather data and outputs from uncalibrated and calibrated mechanistic (theory-driven) models. They used Shannon's information theory, including marginal and transfer entropy, to quantify the amount of information in the training data.
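The marginal and transfer entropy measures mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width binning, the history length of one time step, and the toy `rain` and `flow` series are all assumptions made for illustration. Marginal entropy measures how much information a single series carries, while transfer entropy measures how much knowing the past of one series reduces uncertainty about the next value of another.

```python
import math
from collections import Counter

def discretize(values, n_bins=4):
    """Map continuous values to equal-width bin indices 0..n_bins-1 (assumed binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(symbols):
    """Shannon (marginal or joint) entropy, in bits, of a sequence of hashable symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def transfer_entropy(source, target):
    """Transfer entropy from source to target, in bits, with history length 1.

    T = H(Y_next, Y_now) - H(Y_now) - H(Y_next, Y_now, X_now) + H(Y_now, X_now)
    """
    y_next = target[1:]
    y_now = target[:-1]
    x_now = source[:-1]
    h_yy = entropy(list(zip(y_next, y_now)))
    h_y = entropy(list(y_now))
    h_yyx = entropy(list(zip(y_next, y_now, x_now)))
    h_yx = entropy(list(zip(y_now, x_now)))
    return h_yy - h_y - h_yyx + h_yx

# Hypothetical toy series: "flow" simply repeats "rain" one step later.
rain = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
flow = [0] + rain[:-1]

h_rain = entropy(rain)                       # information carried by rain alone
te_rain_to_flow = transfer_entropy(rain, flow)  # information rain adds about next flow
```

In this toy case the next flow value is fully determined by the current rain value, so the transfer entropy from `rain` to `flow` is positive. With real, continuous data the series would first be passed through something like `discretize`, and the result would depend on the bin count chosen.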
What worked and what didn't
Using all types of training data together gave the best prediction accuracy for the hydrological machine learning models. The abstract also says that models trained only with weather data and calibrated theory-driven model outputs could improve accuracy most efficiently in terms of information use. Accuracy statistics were used to assess how reliable the uncalibrated and calibrated theory-driven outputs were as training data.
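The abstract does not name the accuracy statistics that were used. As an assumption for illustration only, the sketch below computes two statistics commonly reported in hydrology, the Nash-Sutcliffe efficiency (NSE) and the root-mean-square error (RMSE), on hypothetical observed and simulated values.

```python
import math

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance_sum

def rmse(observed, simulated):
    """Root-mean-square error, in the same units as the observations."""
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared_error / len(observed))

# Hypothetical discharge values, not taken from the study.
observed = [1.0, 2.0, 3.0]
simulated = [1.0, 2.5, 2.5]

score_nse = nse(observed, simulated)
score_rmse = rmse(observed, simulated)
```

A statistic like NSE makes it possible to compare an uncalibrated and a calibrated model output against the same observations before deciding whether either is worth adding to the training data.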
What to keep in mind
The abstract does not describe specific limitations beyond the scope of four watersheds, the four prediction targets, and the tested training-data combinations.
Key points
- Prediction accuracy improved when training data contained more information and higher-quality information.
- The study used Shannon's information theory, including marginal and transfer entropy, to measure information in the training data.
- Three machine learning models were tested on four prediction targets (flow discharge, sediment, total nitrogen, and total phosphorus loads) in four watersheds.
- All types of training data together produced the best hydrological machine learning accuracy.
- Weather data plus calibrated theory-driven model outputs were described as the most efficient combination for improving accuracy in terms of information use.
Disclosure
- Research title: Hydrological ML accuracy depends on training data quantity and quality
- Authors: Minhyuk Jeung, Younggu Her, Sang-Soo Baek, Kwangsik Yoon
- Institutions: Chonnam National University, Florida Department of Education, Yeungnam University
- Publication date: 2026-02-24


