# Neural-ODE reproducibility and fitted latent-ODE lane

The neural-ODE lane now has two closed FAST workflows:

1. a deterministic dataset/baseline/calibration bundle; and
2. a fitted random-feature latent ODE with train/validation/test metrics.

This is still `claim_level = "validation"` because the data are FAST
trajectories, not production reconnection campaigns. The lane writes model
parameters, predictions, metrics, plots, and hashes.

## Dataset contract

Generate the FAST deterministic dataset with:

```bash
mhx neural-ode dataset \
  --outdir outputs/neural_ode/seed_qi_fast \
  --seeds 0,1,2,3,4,5 \
  --nx 16 --ny 16 \
  --steps 24 \
  --dt 1e-2
```

Expected files:

- `dataset.npz`
- `splits.json`
- `baseline_metrics.json`
- `calibration.json`
- `experiment_spec.json`
- `validation.json`
- `figures/dataset_targets.png`
- `figures/baseline_rmse.png`
- `figures/calibration_coverage.png`
- `manifest.json`

The dataset arrays are:

| Array | Shape | Meaning |
| --- | --- | --- |
| `seeds` | `(n_seed,)` | Deterministic sample identifiers. |
| `times` | `(n_time,)` | Saved simulation times. |
| `features` | `(n_seed, n_time, n_feature)` | Diagnostic histories used as model inputs. |
| `targets` | `(n_seed, n_time, n_target)` | Forecast targets selected from the feature tensor. |

Default features are mode amplitude, magnetic energy, kinetic energy, total
energy, magnetic-divergence error, $\|\psi\|_2$, and $\|\omega\|_2$.  Default
targets are mode amplitude, total energy, and magnetic-divergence error.

## Baselines

The lane evaluates no-training baselines:

- persistence: $\hat y(t)=y(t_\mathrm{obs})$;
- linear-prefix extrapolation: fit a two-point slope from the observed prefix;
- train-mean history: use the mean target history over training seeds.

For each baseline and split, MHX writes MAE, RMSE, maximum absolute error, and
target-wise scores:

$$
\mathrm{MAE}=\langle |y-\hat y|\rangle,\qquad
\mathrm{RMSE}=\sqrt{\langle (y-\hat y)^2\rangle}.
$$

The calibration file estimates train residual standard deviations and reports
empirical one- and two-sigma coverage on train/validation/test splits.  These
checks are not a probabilistic model; they are a minimum benchmark a later
trainable latent or neural ODE must beat.

## Fitted latent ODE

Train the deterministic CI-scale model with:

```bash
mhx neural-ode train \
  --outdir outputs/neural_ode/latent_ode_fast \
  --seeds 0,1,2,3,4,5 \
  --nx 16 --ny 16 \
  --steps 24 \
  --hidden-size 8
```

The fitted model is the autonomous ODE

$$
\frac{dz}{dt}=W\,\phi(z),\qquad
\phi(z)=\left[z,\tanh(zR+b),1\right],
$$

where $z$ contains the target diagnostics.  The random feature matrix $R$ and
bias $b$ are deterministic from `--model-seed`; $W$ is fitted by ridge
regression to train-set finite differences,

$$
W=\arg\min_W \|XW-\dot Z\|_2^2+\lambda\|W\|_2^2.
$$

Standalone `mhx neural-ode train` first writes the dataset bundle when it is
not provided, then writes the fitted-model artifacts. Expected files therefore
include the dataset contract plus:

- `latent_ode_model.json`
- `latent_ode_metrics.json`
- `latent_ode_predictions.npz`
- `failure_modes.json`
- `figures/latent_ode_predictions.png`
- `figures/latent_ode_rmse_comparison.png`
- `figures/latent_ode_failure_modes.png`
- `manifest.json`

The metric file reports train/validation/test MAE, RMSE, target-wise errors,
the best baseline test RMSE, and the latent-ODE test-RMSE ratio to that
baseline. The ratio is reported rather than hidden; this keeps the current
model honest and makes future neural-ODE improvements directly comparable.

![Latent-ODE predictions](_static/validation/neural_ode_latent_fit/latent_ode_predictions.png)

![Latent-ODE RMSE comparison](_static/validation/neural_ode_latent_fit/latent_ode_rmse_comparison.png)

The failure-mode report is a deliberately skeptical artifact, not a pass/fail
claim that the latent ODE is production-ready. It records train-vs-test RMSE,
late-vs-early forecast drift, and latent-vs-best-baseline ratios:

$$
R_\mathrm{drift}
=
\frac{\mathrm{RMSE}_\mathrm{late,test}}
{\max(\mathrm{RMSE}_\mathrm{early,test},\epsilon)} ,
\qquad
R_\mathrm{seed}
=
\frac{\mathrm{RMSE}_\mathrm{test}}
{\max(\mathrm{RMSE}_\mathrm{train},\epsilon)} .
$$

![Latent-ODE failure-mode probes](_static/validation/neural_ode_latent_fit/latent_ode_failure_modes.png)

## Claim boundary

The manifest is `claim_level = "validation"`.  The current lane supports claims
that the dataset/split/baseline/calibration contract is deterministic and that
the fitted latent-ODE experiment is reproducible and schema-valid.  It does
**not** support claims that the model generalizes to production nonlinear
reconnection until it is trained and tested on production-quality trajectories.

`validation.json` uses schema `mhx.neural_ode.reproducibility.gates.v1` and
gates four prerequisites together: the source seed-QI validation passed, the
split manifest is disjoint and complete, all baseline arrays are finite, and
the calibration report was generated from the same target tensor.

`mhx neural-ode train` uses schema `mhx.neural_ode.training.gates.v1` and adds
gates for finite fitted coefficients, finite predictions, matching prediction
shapes, and held-out test forecasts.

## Source links

- [Dataset and baseline implementation](https://github.com/uwplasma/MHX/blob/main/src/mhx/neural_ode/reproducibility.py)
- [Public exports](https://github.com/uwplasma/MHX/blob/main/src/mhx/neural_ode/__init__.py)
- [CLI entrypoint](https://github.com/uwplasma/MHX/blob/main/src/mhx/cli/main.py)
- [Example script](https://github.com/uwplasma/MHX/blob/main/examples/make_neural_ode_reproducibility.py)
- [Latent-ODE training example](https://github.com/uwplasma/MHX/blob/main/examples/train_latent_ode_fast.py)
- [Tests](https://github.com/uwplasma/MHX/blob/main/tests/test_neural_ode_reproducibility.py)