Neural-ODE reproducibility and fitted latent-ODE lane
The neural-ODE lane now has two closed FAST workflows:
- a deterministic dataset/baseline/calibration bundle; and
- a fitted random-feature latent ODE with train/validation/test metrics.
This is still claim_level = "validation" because the data are FAST
trajectories, not production reconnection campaigns. The lane writes model
parameters, predictions, metrics, plots, and hashes.
Dataset contract
Generate the FAST deterministic dataset with:
mhx neural-ode dataset \
--outdir outputs/neural_ode/seed_qi_fast \
--seeds 0,1,2,3,4,5 \
--nx 16 --ny 16 \
--steps 24 \
--dt 1e-2
Expected files:
- dataset.npz
- splits.json
- baseline_metrics.json
- calibration.json
- experiment_spec.json
- validation.json
- figures/dataset_targets.png
- figures/baseline_rmse.png
- figures/calibration_coverage.png
- manifest.json
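A quick way to confirm that a run produced the full bundle is to compare the output directory against the list above. The following is a minimal sketch, not part of MHX: the helper name `missing_files` is hypothetical, and only the file names are taken from this page.

```python
from pathlib import Path

# File names copied from the expected-files list for the dataset bundle.
EXPECTED_DATASET_FILES = [
    "dataset.npz",
    "splits.json",
    "baseline_metrics.json",
    "calibration.json",
    "experiment_spec.json",
    "validation.json",
    "figures/dataset_targets.png",
    "figures/baseline_rmse.png",
    "figures/calibration_coverage.png",
    "manifest.json",
]


def missing_files(outdir, expected=EXPECTED_DATASET_FILES):
    """Return the expected relative paths that are not files under outdir.

    Hypothetical helper for illustration; MHX may ship its own checker.
    """
    root = Path(outdir)
    return [name for name in expected if not (root / name).is_file()]
```

For example, `missing_files("outputs/neural_ode/seed_qi_fast")` should return an empty list after a successful dataset run.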
The dataset arrays are:
| Array | Shape | Meaning |
|---|---|---|
|  |  | Deterministic sample identifiers. |
|  |  | Saved simulation times. |
|  |  | Diagnostic histories used as model inputs. |
|  |  | Forecast targets selected from the feature tensor. |
Default features are mode amplitude, magnetic energy, kinetic energy, total energy, magnetic-divergence error, \(\|\psi\|_2\), and \(\|\omega\|_2\). Default targets are mode amplitude, total energy, and magnetic-divergence error.
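The sketch below shows one way to inspect the saved arrays and to see how a target tensor can be sliced out of a feature tensor. It rests on assumptions: the feature labels are hypothetical identifiers invented here to match the names in the text, the feature axis is assumed to be the last axis, and both helper functions are illustrative rather than MHX API.

```python
import numpy as np

# Hypothetical labels mirroring the default features named in the text.
FEATURES = [
    "mode_amplitude", "magnetic_energy", "kinetic_energy", "total_energy",
    "divergence_error", "psi_l2", "omega_l2",
]
# Hypothetical labels mirroring the default targets named in the text.
TARGETS = ["mode_amplitude", "total_energy", "divergence_error"]


def describe_npz(path):
    """Map every array name stored in an .npz archive to its shape."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


def select_targets(features, feature_names=FEATURES, target_names=TARGETS):
    """Pick target channels out of a feature tensor.

    Assumes the last axis of `features` indexes the feature channels.
    """
    idx = [feature_names.index(name) for name in target_names]
    return features[..., idx]
```

Calling `describe_npz` on the generated `dataset.npz` lists the actual array names and shapes without having to know them in advance.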
Baselines
The lane evaluates three no-training baselines:
- persistence: \(\hat y(t)=y(t_\mathrm{obs})\);
- linear-prefix extrapolation: fit a two-point slope from the observed prefix;
- train-mean history: use the mean target history over training seeds.
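The three baselines can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the MHX implementation: targets are assumed to have shape `(n_samples, n_times, n_targets)`, `n_obs` is a hypothetical name for the observed-prefix length, and the two-point slope is assumed to use the last two observed points.

```python
import numpy as np


def persistence(y, n_obs):
    """Hold the last observed value over the whole forecast window."""
    last = y[:, n_obs - 1 : n_obs, :]
    return np.repeat(last, y.shape[1] - n_obs, axis=1)


def linear_prefix(y, t, n_obs):
    """Extrapolate linearly with the slope through the last two observed points.

    Which two prefix points define the slope is an assumption made here.
    """
    t0, t1 = t[n_obs - 2], t[n_obs - 1]
    slope = (y[:, n_obs - 1, :] - y[:, n_obs - 2, :]) / (t1 - t0)
    dt = (t[n_obs:] - t1)[None, :, None]
    return y[:, n_obs - 1, :][:, None, :] + slope[:, None, :] * dt


def train_mean_history(y_train, n_obs, n_eval):
    """Predict the mean training trajectory for each of n_eval samples."""
    mean = y_train[:, n_obs:, :].mean(axis=0, keepdims=True)
    return np.repeat(mean, n_eval, axis=0)
```

Each function returns forecasts of shape `(n_samples, n_times - n_obs, n_targets)`, so the outputs can be compared directly against the held-out part of each trajectory.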
For each baseline and split, MHX writes MAE, RMSE, maximum absolute error, and target-wise scores.
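As a rough illustration of how such scores are computed, the sketch below evaluates one forecast against its reference. The dictionary keys are hypothetical and need not match the keys in `baseline_metrics.json`; arrays are assumed to have shape `(n_samples, n_times, n_targets)`.

```python
import numpy as np


def forecast_metrics(y_true, y_pred):
    """Aggregate and per-target error scores for one baseline on one split.

    Key names are illustrative assumptions, not the MHX schema.
    """
    err = np.asarray(y_pred) - np.asarray(y_true)
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err**2))),
        "max_abs_error": float(np.max(np.abs(err))),
        # Reduce over samples and times, keeping one score per target.
        "target_rmse": np.sqrt(np.mean(err**2, axis=(0, 1))).tolist(),
    }
```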
The calibration file estimates train residual standard deviations and reports empirical one- and two-sigma coverage on train/validation/test splits. These checks are not a probabilistic model; they are a minimum benchmark a later trainable latent or neural ODE must beat.
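The calibration check can be sketched as follows, assuming residuals are pooled over samples and times for each target; the function and key names are hypothetical. For Gaussian residuals the nominal one- and two-sigma coverages are about 68% and 95%, so large departures from those values indicate that the train-set spread does not describe the split being evaluated.

```python
import numpy as np


def train_residual_std(y_true, y_pred):
    """Per-target standard deviation of train-set residuals."""
    return np.std(np.asarray(y_true) - np.asarray(y_pred), axis=(0, 1))


def sigma_coverage(y_true, y_pred, sigma):
    """Fraction of residuals within one and two sigma, per target.

    `sigma` is the per-target spread estimated on the training split.
    """
    resid = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return {
        "one_sigma": np.mean(resid <= sigma, axis=(0, 1)).tolist(),
        "two_sigma": np.mean(resid <= 2.0 * sigma, axis=(0, 1)).tolist(),
    }
```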
Fitted latent ODE
Train the deterministic CI-scale model with:
mhx neural-ode train \
--outdir outputs/neural_ode/latent_ode_fast \
--seeds 0,1,2,3,4,5 \
--nx 16 --ny 16 \
--steps 24 \
--hidden-size 8
The fitted model is an autonomous ODE whose state \(z\) contains the target
diagnostics. The random feature matrix \(R\) and bias \(b\) are deterministic
from --model-seed; \(W\) is fitted by ridge regression to train-set finite
differences.
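To make the fitting procedure concrete, here is a minimal sketch of a random-feature ODE fitted by ridge regression. Several details are assumptions not stated on this page: the model form \(\dot z = W\tanh(Rz+b)\), the Gaussian draw of \(R\) and \(b\), forward finite differences, the ridge strength, and the forward-Euler rollout. The function names are hypothetical.

```python
import numpy as np


def random_features(z, R, b):
    """phi(z) = tanh(R z + b); the tanh nonlinearity is an assumption."""
    return np.tanh(z @ R.T + b)


def fit_ridge_ode(z_train, dt, hidden_size=8, model_seed=0, ridge=1e-6):
    """Fit W in dz/dt ~ W phi(z) from forward finite differences.

    z_train has shape (n_samples, n_times, n_targets).
    """
    rng = np.random.default_rng(model_seed)  # R and b depend only on the seed
    n_targets = z_train.shape[-1]
    R = rng.normal(size=(hidden_size, n_targets))
    b = rng.normal(size=hidden_size)
    dz = (z_train[:, 1:, :] - z_train[:, :-1, :]) / dt
    phi = random_features(z_train[:, :-1, :], R, b)
    X = phi.reshape(-1, hidden_size)
    Y = dz.reshape(-1, n_targets)
    # Ridge normal equations: (X^T X + lambda I) W^T = X^T Y.
    W = np.linalg.solve(X.T @ X + ridge * np.eye(hidden_size), X.T @ Y).T
    return W, R, b


def rollout(z0, W, R, b, dt, n_steps):
    """Integrate the fitted ODE with forward Euler (integrator is an assumption)."""
    traj = [z0]
    for _ in range(n_steps):
        traj.append(traj[-1] + dt * random_features(traj[-1], R, b) @ W.T)
    return np.stack(traj, axis=-2)
```

Because \(R\) and \(b\) are frozen, only the linear readout \(W\) is learned, which is why the fit reduces to a single linear solve and is reproducible for a fixed seed.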
Standalone mhx neural-ode train first writes the dataset bundle when it is
not provided, then writes the fitted-model artifacts. Expected files therefore
include the dataset contract plus:
- latent_ode_model.json
- latent_ode_metrics.json
- latent_ode_predictions.npz
- failure_modes.json
- figures/latent_ode_predictions.png
- figures/latent_ode_rmse_comparison.png
- figures/latent_ode_failure_modes.png
- manifest.json
The metric file reports train/validation/test MAE, RMSE, target-wise errors, the best baseline test RMSE, and the latent-ODE test-RMSE ratio to that baseline. The ratio is reported rather than hidden; this keeps the current model honest and makes future neural-ODE improvements directly comparable.
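The baseline comparison amounts to dividing the latent-ODE test RMSE by the smallest baseline test RMSE. A small sketch, with hypothetical function and key names:

```python
def rmse_ratio(latent_test_rmse, baseline_test_rmse):
    """Compare the latent-ODE test RMSE with the best (lowest) baseline.

    `baseline_test_rmse` maps baseline name to test RMSE. A ratio below 1
    means the latent ODE beats every baseline on the test split.
    """
    best_name = min(baseline_test_rmse, key=baseline_test_rmse.get)
    best = baseline_test_rmse[best_name]
    return {
        "best_baseline": best_name,
        "best_baseline_test_rmse": best,
        "latent_to_best_baseline_ratio": latent_test_rmse / best,
    }
```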


The failure-mode report is a deliberately skeptical artifact, not a pass/fail claim that the latent ODE is production-ready. It records train-vs-test RMSE, late-vs-early forecast drift, and latent-vs-best-baseline ratios.
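Two of these diagnostics, the train-vs-test gap and the late-vs-early drift, can be sketched as below. The split of the forecast window into halves, the use of differences rather than ratios, and all key names are assumptions made for illustration; `failure_modes.json` may define them differently.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error over all entries."""
    diff = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(diff**2)))


def failure_mode_summary(y_train, p_train, y_test, p_test):
    """Summarize overfitting and forecast drift for one fitted model.

    Arrays have shape (n_samples, n_times, n_targets). "Early" and "late"
    are the first and second halves of the test forecast window.
    """
    y_test, p_test = np.asarray(y_test), np.asarray(p_test)
    half = y_test.shape[1] // 2
    train, test = rmse(y_train, p_train), rmse(y_test, p_test)
    early = rmse(y_test[:, :half], p_test[:, :half])
    late = rmse(y_test[:, half:], p_test[:, half:])
    return {
        "train_rmse": train,
        "test_rmse": test,
        "test_minus_train_rmse": test - train,  # positive suggests overfitting
        "early_rmse": early,
        "late_rmse": late,
        "late_minus_early_rmse": late - early,  # positive suggests drift
    }
```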

Claim boundary
The manifest records claim_level = "validation". The current lane supports claims
that the dataset/split/baseline/calibration contract is deterministic and that
the fitted latent-ODE experiment is reproducible and schema-valid. It does
not support claims that the model generalizes to production nonlinear
reconnection until it is trained and tested on production-quality trajectories.
validation.json uses schema mhx.neural_ode.reproducibility.gates.v1 and
gates four prerequisites together: the source seed-QI validation passed, the
split manifest is disjoint and complete, all baseline arrays are finite, and
the calibration report was generated from the same target tensor.
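Two of these gates, split disjointness/completeness and array finiteness, are easy to express directly. The sketch below assumes the split manifest maps split names to lists of sample identifiers; that layout and the function names are hypothetical, not the `splits.json` schema.

```python
import numpy as np


def splits_disjoint_and_complete(splits, all_ids):
    """True if no identifier appears twice and every identifier is assigned.

    `splits` maps a split name (for example "train") to a list of ids.
    """
    seen = [i for ids in splits.values() for i in ids]
    return len(seen) == len(set(seen)) and set(seen) == set(all_ids)


def all_finite(arrays):
    """True if every array contains only finite values (no NaN or inf)."""
    return all(bool(np.isfinite(np.asarray(a)).all()) for a in arrays)
```

With seeds 0 through 5, a split such as `{"train": [0, 1, 2, 3], "validation": [4], "test": [5]}` passes the first check, while a split that reuses or omits a seed fails it.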
mhx neural-ode train uses schema mhx.neural_ode.training.gates.v1 and adds
gates for finite fitted coefficients, finite predictions, matching prediction
shapes, and held-out test forecasts.