Category: Machine learning
Training Data
Training data is the labeled examples used to fit a machine learning model.
Also known as: labeled data
Expanded definition
Training data teaches a model what patterns to associate with each class or output.
In Earth observation (EO), training data is sensitive to geography, season, sensor, preprocessing, and label definitions. A model trained in one region often fails elsewhere unless the training data covers that variability.
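One way to check whether training data covers geographic variability is to hold out whole regions during evaluation, so that the model is always scored on a region it never saw. The sketch below is a minimal leave-one-region-out split. It assumes each labeled sample carries a region tag. The function name `leave_one_region_out` and the example region names are hypothetical and used only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_region_out(
    regions: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_region, train_indices, test_indices) for each region.

    Assumes regions[i] is the region tag of sample i (hypothetical tagging).
    Every sample from the held-out region goes to the test set, so training
    and test indices never share a region.
    """
    by_region: Dict[str, List[int]] = defaultdict(list)
    for i, region in enumerate(regions):
        by_region[region].append(i)

    # Iterate in sorted order so the folds are deterministic.
    for held_out in sorted(by_region):
        test_idx = by_region[held_out]
        train_idx = [i for i, r in enumerate(regions) if r != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical region tags for six labeled samples.
regions = ["alps", "alps", "sahel", "sahel", "amazon", "amazon"]
for held_out, train_idx, test_idx in leave_one_region_out(regions):
    print(held_out, train_idx, test_idx)
```

A large gap between the scores on random splits and on these region-held-out splits suggests that the training data does not yet cover the conditions found in the held-out regions.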
For operational use, it helps to track label provenance, time period, and any known biases.
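The fields mentioned above can be stored as a small metadata record alongside each batch of labels. The sketch below is one possible layout, not a standard schema. The class name `LabelProvenance`, its fields, and the `covers` helper are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class LabelProvenance:
    """Hypothetical metadata record attached to a batch of training labels."""

    source: str                     # who or what produced the labels
    start: date                     # first date the labels describe
    end: date                       # last date the labels describe
    known_biases: Tuple[str, ...] = ()  # free-text notes on known biases

    def covers(self, when: date) -> bool:
        """Return True if `when` falls inside the labeled time period."""
        return self.start <= when <= self.end


# Illustrative values only.
prov = LabelProvenance(
    source="field survey",
    start=date(2021, 4, 1),
    end=date(2021, 9, 30),
    known_biases=("plots cluster near roads",),
)
print(prov.covers(date(2021, 7, 15)))  # True
print(prov.covers(date(2022, 1, 10)))  # False
```

Using a frozen dataclass with a tuple for the bias notes keeps each record immutable, so provenance cannot be silently edited after the labels are collected.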
Related terms
Ground Truth
Ground truth is reference information about real conditions on the ground used to train or validate models.
Validation
Validation tests model performance on data not used for training, to estimate how well it will generalize.
Harmonization
Harmonization reduces differences between scenes or sensors so values are more comparable across time.