Hydrological ML accuracy depends on training data quantity and quality

Research area: Machine learning, Hydrological Forecasting Using AI, Hydrological modelling

What the study found

Hydrological machine learning prediction accuracy depended on both the quantity and quality of information in the training data.

Why the authors say this matters

The authors conclude that a theory-driven approach can help improve data-driven modeling by providing quality information about a system of interest.

What the researchers tested

The researchers trained three machine learning models to predict flow discharge, sediment, total nitrogen, and total phosphorus loads in four watersheds. They increased the information in the training data by adding weather data and outputs from uncalibrated and calibrated mechanistic (theory-driven) models. They used Shannon's information theory, including marginal and transfer entropy, to quantify the amount of information.
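To make the two information measures concrete, here is a minimal sketch of how marginal entropy and transfer entropy can be estimated from discretized time series. It is not the authors' implementation: the equal-width binning, the history length of one time step, the function names (`discretize`, `entropy`, `transfer_entropy`), and the toy series are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=4):
    """Map continuous values to integer bin labels using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(symbols):
    """Shannon (marginal) entropy, in bits, of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def transfer_entropy(source, target):
    """Transfer entropy from source to target, in bits, with history length 1.

    Uses the identity
    T = H(Y_next, Y_now) - H(Y_now) - H(Y_next, Y_now, X_now) + H(Y_now, X_now),
    where X is the source series and Y is the target series.
    """
    y_next = target[1:]
    y_now = target[:-1]
    x_now = source[:-1]
    h_yy = entropy(list(zip(y_next, y_now)))
    h_y = entropy(y_now)
    h_yyx = entropy(list(zip(y_next, y_now, x_now)))
    h_yx = entropy(list(zip(y_now, x_now)))
    return h_yy - h_y - h_yyx + h_yx


# Toy data (hypothetical): the response repeats the driver one step later.
random.seed(0)
driver = [random.randint(0, 1) for _ in range(2000)]
response = [0] + driver[:-1]

print("H(driver) in bits:", entropy(driver))
print("T(driver -> response) in bits:", transfer_entropy(driver, response))
print("T(response -> driver) in bits:", transfer_entropy(response, driver))
```

In this toy case the transfer entropy from driver to response should come out near 1 bit and the reverse direction near 0, because the response is a lagged copy of an independent binary driver. Real weather and load series would first pass through something like `discretize`, and the estimates would depend on the bin count and history length chosen.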

What worked and what didn't

Using all types of training data gave the best prediction accuracy for hydrological machine learning models. According to the abstract, models trained only with weather data and calibrated theory-driven model outputs could improve accuracy most efficiently relative to the amount of information used. The authors also used accuracy statistics to assess how reliable the uncalibrated and calibrated theory-driven outputs were as training data.
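The abstract does not name the accuracy statistics the authors used. As an illustration only, the sketch below computes the Nash-Sutcliffe efficiency, a statistic commonly used in hydrology to compare simulated and observed series. The choice of statistic, the function name `nash_sutcliffe`, and the numbers are assumptions, not values from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of a simulated series against observations.

    A value of 1 means a perfect match; 0 means the simulation is no better
    than always predicting the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Hypothetical observed loads and two hypothetical theory-driven model outputs.
observed = [1.0, 2.0, 3.0, 4.0]
uncalibrated = [2.5, 2.5, 2.5, 2.5]  # flat output equal to the observed mean
calibrated = [1.1, 1.9, 3.2, 3.8]    # output that tracks the observations

print("uncalibrated NSE:", nash_sutcliffe(observed, uncalibrated))
print("calibrated NSE:", nash_sutcliffe(observed, calibrated))
```

A score like this gives one way to judge whether a theory-driven output tracks observations closely enough to be worth adding as a training feature.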

What to keep in mind

The abstract does not describe specific limitations beyond the scope of four watersheds, the four prediction targets, and the tested training-data combinations.

Key points

Prediction accuracy improved when training data contained more information and higher-quality information.
The study used Shannon's information theory, including marginal and transfer entropy, to measure information in the training data.
Three machine learning models were tested on four prediction targets (flow discharge, sediment, total nitrogen, and total phosphorus loads) in four watersheds.
All types of training data together produced the best hydrological machine learning accuracy.
Weather data plus calibrated theory-driven model outputs were described as the most efficient combination for improving accuracy in terms of information use.

Disclosure

Research title: Hydrological ML accuracy depends on training data quantity and quality
Authors: Minhyuk Jeung, Younggu Her, Sang-Soo Baek, Kwangsik Yoon
Institutions: Chonnam National University, Florida Department of Education, Yeungnam University
Publication date: 2026-02-24
DOI: 10.5194/hess-30-1077-2026

AI provenance: This post was generated by OpenAI. The original authors did not write or review this post.
