Campaign runner operations
This page describes how MHX should move from FAST validation artifacts to long nonlinear reconnection campaigns. It is intentionally operational: reviewers should be able to see what was run, why it was long enough, what files were written, and why a result is or is not a production physics claim.
Current runner status
MHX currently ships six campaign-level commands:
mhx campaign rutherford-template --outdir outputs/campaigns/rutherford_template
mhx campaign rutherford-run-fast --outdir outputs/campaigns/rutherford_fast
mhx campaign rutherford-plan-production --outdir outputs/campaigns/rutherford_production_plan
mhx campaign rutherford-resume-plan outputs/campaigns/rutherford_production_plan
mhx campaign rutherford-execute outputs/campaigns/rutherford_production_plan --max-steps 128
mhx campaign rutherford-promotion-check outputs/campaigns/rutherford_production_plan
The first command writes a duration-guarded production template. The second
command runs a tiny deterministic nonlinear trajectory and writes the same
diagnostic vocabulary that the future Rutherford campaign will use. Its output
stays at claim_level = "validation" unless the run is explicitly configured as
smoke; it is not a production nonlinear result.
The third and fourth commands write a duration-guarded production plan, runbook,
scheduler-neutral walltime chunks, checkpoint index, required output list, and
resume selection contract. The fifth command is the actual restartable executor:
it advances a real reduced-MHD chunk, writes a state checkpoint, appends
production-history arrays, refreshes the resume plan, and writes figures. A
partial chunk remains claim_level = "validation"; a completed target run can
only emit claim_level = "production" if the explicit production-claim gate is
enabled, all execution checks pass, and a passing promotion-readiness report is
attached under <run-dir>/promotion/. The sixth command writes that
promotion-readiness report and exits nonzero until the missing evidence is
attached.
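The claim-level rule described above can be sketched as a single predicate. This is a minimal illustration, not the MHX implementation: the `RunStatus` class and its field names are hypothetical, and only the two claim-level strings come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStatus:
    """Hypothetical summary of an executor run; field names are illustrative."""

    target_complete: bool          # the full target duration has been reached
    allow_production_claim: bool   # the explicit production-claim gate is enabled
    execution_checks_pass: bool    # all execution checks passed
    promotion_report_passes: bool  # passing report attached under <run-dir>/promotion/


def claim_level(status: RunStatus) -> str:
    """Return "production" only when every condition holds, else "validation"."""
    if (
        status.target_complete
        and status.allow_production_claim
        and status.execution_checks_pass
        and status.promotion_report_passes
    ):
        return "production"
    return "validation"
```

Because every condition is joined with `and`, a partial chunk or a missing promotion report always yields "validation" in this sketch, regardless of how the other flags are set.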
Duration gate
For a mode with linear growth rate \(\gamma\), any growth or island-formation claim must satisfy
\[ t_{\mathrm{end}} \ge s_f \, \frac{N_e}{\gamma}. \]
Here \(N_e\) is the number of required e-folds and \(s_f\) is a safety factor. The default Harris reference used by MHX is \(\gamma\simeq0.0131\), so:
The first number is a minimum linear-growth observation window. The second is the default Rutherford-template window, leaving room for a resolved linear phase plus nonlinear island tracking. A shorter run may still be useful as a code-validity gate, but it must not be labeled as nonlinear reconnection evidence.
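A minimal sketch of how such a duration gate could be evaluated, assuming it compares the run's end time with \(s_f N_e/\gamma\). The function names are hypothetical, and the values of \(N_e\) and \(s_f\) are left as arguments because the MHX defaults are not restated here.

```python
def minimum_duration(gamma: float, n_efolds: float, safety_factor: float) -> float:
    """Shortest end time that covers n_efolds e-folds with a safety margin.

    Assumes the gate has the form t_end >= safety_factor * n_efolds / gamma.
    """
    if gamma <= 0.0:
        raise ValueError("gamma must be positive for a growing mode")
    return safety_factor * n_efolds / gamma


def passes_duration_gate(
    t_end: float, gamma: float, n_efolds: float, safety_factor: float
) -> bool:
    """True when the run is long enough under the assumed gate."""
    return t_end >= minimum_duration(gamma, n_efolds, safety_factor)
```

With the Harris reference \(\gamma\simeq0.0131\), one e-folding time is \(1/\gamma\approx76.3\) in the code's time units, so `minimum_duration(0.0131, n_efolds, safety_factor)` scales that figure by the chosen e-fold count and safety factor.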
FAST runner artifacts
The FAST runner writes:
rutherford_fast_histories.npz
diagnostics.json
validation.json
campaign_template.json
manifest.json
figures/rutherford_fast_histories.png
optionally figures/flux_movie.gif
The NPZ history keys are:
| Key | Meaning |
|---|---|
| | Saved times for each trajectory. |
| | Seed associated with each saved trajectory row. |
| | Reconnecting-mode flux proxy. |
| | $W=4\sqrt{ |
| | Finite-difference time derivative of reconnecting flux. |
| | Reduced-MHD energy diagnostics. |
| | Spectral solenoidal-field check. |
| | Current-density magnitude proxy. |
These names are intentionally the same names that should appear in a future production Rutherford campaign. That lets plotting, manifests, and reviewers reuse one schema.
Production planning and execution artifacts
The production planner writes these files before running the expensive PDE:
campaign_plan.json with schema mhx.campaign.rutherford_production_plan.v1;
campaign_config.toml with the effective long-run configuration;
validation.json with duration, resolution, walltime, and artifact gates;
runbook.md with the launch/restart checklist;
job_array.json with scheduler-neutral walltime chunks;
checkpoints/checkpoint_index.json with schema mhx.campaign.rutherford_checkpoint_index.v1;
manifest.json with claim_level = "production_template".
The executor then writes:
production_history.npz with schema mhx.campaign.rutherford_history.v1;
diagnostics.json with schema mhx.campaign.rutherford_execution.v1;
validation.json with schema mhx.campaign.rutherford_execution.gates.v1;
checkpoints/state_step_*.npz with schema mhx.campaign.rutherford_state.v1;
checkpoints/step_*.json checkpoint metadata with hashes;
resume_plan.json;
figures/production_histories.png;
figures/current_sheet_aspect_ratio.png;
optional figures/fixed_scale_flux_movie.gif and figures/fixed_scale_current_density_movie.gif;
artifact_manifest.json and an updated manifest.json.
The promotion checker then writes:
promotion/promotion_readiness.json with schema mhx.campaign.rutherford_promotion.v1;
promotion/validation.json with schema mhx.campaign.rutherford_promotion.gates.v1;
promotion/figures/promotion_matrix.png;
promotion/artifact_manifest.json and promotion/manifest.json.
The promotion gate is deliberately stricter than the executor. It requires a
completed target, finite histories, current-sheet geometry, detected X/O
critical-point counts, fixed-scale movies unless explicitly disabled,
convergence evidence, seed-QI evidence, and tolerances on energy-budget
residuals and magnetic-divergence error. It also requires positive nonlinear
response: by default, both the peak/initial reconnecting-flux amplification and
the peak/initial Rutherford-width amplification must exceed 1.05. These
thresholds are configurable with
--min-reconnected-flux-amplification and
--min-island-width-amplification on mhx campaign rutherford-promotion-check.
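The nonlinear-response requirement can be illustrated with a small peak/initial ratio check. This is a sketch under stated assumptions: the function names are hypothetical, histories are taken to be positive one-dimensional sequences, and "exceed" is read as a strict inequality. The actual checker may define the ratio differently.

```python
from typing import Sequence


def amplification(history: Sequence[float]) -> float:
    """Peak/initial ratio of a positive diagnostic history."""
    if len(history) == 0:
        raise ValueError("history must contain at least one sample")
    initial = history[0]
    if initial <= 0.0:
        raise ValueError("initial value must be positive to form a ratio")
    return max(history) / initial


def passes_response_gate(
    flux_history: Sequence[float],
    width_history: Sequence[float],
    min_flux_amplification: float = 1.05,
    min_width_amplification: float = 1.05,
) -> bool:
    """True only when both amplifications strictly exceed their thresholds."""
    return (
        amplification(flux_history) > min_flux_amplification
        and amplification(width_history) > min_width_amplification
    )
```

A monotonically decaying history has its peak at the first sample, so its ratio is exactly 1 and it fails the default gate.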
For production-candidate Rutherford/island runs, plan the executor with the same
unstable periodic double-Harris equilibrium used by the nonlinear validation
lane:
mhx campaign rutherford-plan-production \
  --outdir outputs/campaigns/rutherford_production \
  --equilibrium periodic_double_harris \
  --width 0.36 --perturbation-amplitude 0.004 \
  --mode-x 2 --mode-y 1 --eta 0.0045 --nu 0.0045 \
  --nx 128 --ny 128 --dt 0.02 --target-saved-frames 121
The older cosine_tearing default is retained as a decaying
executor/checkpoint schema sanity path. It is not sufficient for a nonlinear
response promotion because its reconnecting-flux and island-width histories are
expected to fail the amplification gates.
For a laptop-safe closed-lane example:
mhx campaign rutherford-plan-production \
  --outdir outputs/campaigns/rutherford_executor_demo \
  --nx 8 --ny 8 --dt 1e-2 --target-saved-frames 120 \
  --min-production-resolution 8
mhx campaign rutherford-execute \
  outputs/campaigns/rutherford_executor_demo \
  --max-steps 8 --movies
mhx campaign rutherford-promotion-check \
  outputs/campaigns/rutherford_executor_demo \
  --no-require-movies --min-history-samples 2 || true
The final command is expected to fail for this tiny demo because the target is
not complete and no convergence/seed-QI bundles are attached. The purpose is to
write promotion/figures/promotion_matrix.png so the missing evidence is
explicit rather than implicit.



The checkpoint index starts empty. Long-run executors register restartable state files with:
from mhx.campaigns import write_checkpoint_metadata

write_checkpoint_metadata(
    "outputs/campaigns/rutherford_production_plan",
    step=1000,
    time=100.0,
    state_path="checkpoints/state_0000001000.npz",
    history_path="histories.npz",
    metrics={
        "total_energy": 0.997,
        "magnetic_divergence_linf": 2.0e-13,
    },
)
Each checkpoint record stores the step, physical time, walltime spent, metrics,
artifact paths, file sizes, and SHA-256 hashes. mhx campaign rutherford-resume-plan <run-dir> selects the latest valid checkpoint at or
before the target step. If files are missing or hashes change, the resume plan
marks the checkpoint invalid and falls back to step zero rather than silently
continuing from a corrupted state.
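The resume-selection rule above can be sketched with the standard library alone. The record layout (`{"step": ..., "files": {relative_path: sha256}}`) and both function names are hypothetical and do not reflect the MHX checkpoint schema. In this sketch an invalid record is skipped and an older valid one is tried before falling back to step zero; whether MHX tries older checkpoints first is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Sequence


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def select_resume_step(
    run_dir: Path, records: Sequence[Mapping], target_step: int
) -> int:
    """Return the newest valid checkpoint step <= target_step, or 0 if none."""
    eligible = [r for r in records if r["step"] <= target_step]
    for record in sorted(eligible, key=lambda r: r["step"], reverse=True):
        valid = True
        for relative_path, expected_hash in record["files"].items():
            path = run_dir / relative_path
            # A missing file or a changed hash invalidates the whole record.
            if not path.is_file() or sha256_of(path) != expected_hash:
                valid = False
                break
        if valid:
            return record["step"]
    return 0  # no trustworthy checkpoint: restart from the beginning
```

Hashing every referenced file before resuming costs one read of each checkpoint, but it means a truncated or overwritten state file is detected before any new steps are appended to it.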
Production campaign acceptance criteria
A production Rutherford or plasmoid campaign should pass all of the following:
| Requirement | Reviewer reason |
|---|---|
| Duration guard passes for the declared \(\gamma\), \(N_e\), and \(s_f\). | Prevents short runs from being interpreted as nonlinear evolution. |
| At least two spatial resolutions and two time steps are archived. | Separates physics from discretization artifacts. |
| Energy-budget residual remains below the documented tolerance. | Checks bracket cancellation and dissipation signs during the long run. |
| Magnetic divergence remains near spectral roundoff or a documented tolerance. | Catches projection/derivative mistakes. |
| Seed-robust QI is run on the production diagnostic family. | Checks that the reported metrics are not seed accidents. |
| | Blocks production claims until convergence, seed-QI, current-sheet geometry, X/O point counts, fixed-scale media, tolerances, and positive reconnecting-flux/island-width response are present. |
| Flux/current movies use fixed color limits and include timestamps. | Makes visual comparisons honest across resolutions and seeds. |
| Artifact manifests include hashes, config, git commit, API version, and dependencies. | Makes reviewer reruns and diffs possible. |
Dry-run production automation lane
tools/run_nonlinear_production_campaign.py is the dry-run-first launcher for a
true nonlinear production lane. By default it does not run the solver; it writes
production_campaign_manifest.json and run_commands.sh with exact commands,
timeouts, expected outputs, and claim boundaries:
python tools/run_nonlinear_production_campaign.py \
  --outdir outputs/campaigns/nonlinear_production_campaign \
  --dry-run
The manifest includes Rutherford duration planning/execution, a seeded
double-Harris production-duration movie run, two convergence bundles, seed-QI,
sheet-width plus aspect-ratio evidence, eta/Lundquist sweeps, fixed-scale movie
gates, double-Harris validation promotion, Rutherford promotion, and a final
--allow-production-claim refresh command. The double-Harris promotion command
can only promote media to convergence-backed validation. Rutherford artifacts
remain claim_level = "validation" unless the target duration completes, the
promotion report passes with convergence/seed/movie/response evidence, and the
final zero-step production-claim refresh succeeds.
By default the campaign fits the early growth rate only up to the larger of
three saved samples or two declared Harris e-folding times, capped at the end
of the run: fit_stop = min(t_end, max(3*save_interval, 2/gamma)). This keeps
the growth fit out of nonlinear saturation for bounded GPU campaigns. Use
--fit-stop when a case-specific linear window is known from an eigenmode or
pilot run.
When --save-interval is omitted, the convergence and sweep stages inherit the
long-run saved-frame cadence, save_interval = save_every*dt, so the default
campaign uses the same physical sampling cadence across long-run and auxiliary
gates. The seed-QI command also receives the same --save-every stride; this
keeps long seed ensembles from storing every RK4 step on GPU.
Execution is intentionally opt-in and bounded per command:
python tools/run_nonlinear_production_campaign.py \
  --outdir outputs/campaigns/nonlinear_production_campaign \
  --execute \
  --timeout-seconds 43200 \
  --gate-timeout-seconds 1800
Use the generated shell script only when an external scheduler owns walltime.
The Python --execute path enforces portable subprocess timeouts and updates
the manifest with pass/fail/timeout records after each command.
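A portable per-command timeout of the kind described above can be built on `subprocess.run`. The sketch below is not the launcher's code: the `run_bounded` name and the record layout are hypothetical, and only the pass/fail/timeout vocabulary comes from this page.

```python
import subprocess
from typing import Dict, Sequence


def run_bounded(command: Sequence[str], timeout_seconds: float) -> Dict[str, object]:
    """Run one command with a wall-clock limit and classify the outcome."""
    record: Dict[str, object] = {
        "command": list(command),
        "status": "fail",
        "returncode": None,
    }
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        completed = subprocess.run(command, timeout=timeout_seconds, check=False)
    except subprocess.TimeoutExpired:
        record["status"] = "timeout"
        return record
    record["returncode"] = completed.returncode
    record["status"] = "pass" if completed.returncode == 0 else "fail"
    return record
```

Writing the returned record to the manifest after each command, rather than once at the end, means a campaign that is interrupted still leaves a record of which stages finished.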
Proposed production directory
outputs/campaigns/rutherford_production/
  campaign_config.toml
  campaign_plan.json
  runbook.md
  job_array.json
  diagnostics.json
  validation.json
  production_history.npz
  resume_plan.json
  checkpoints/
    checkpoint_index.json
    state_step_*.npz
    step_*.json
  convergence/
    resolution_sweep/
    timestep_sweep/
  seed_qi/
  figures/
    production_histories.png
    current_sheet_aspect_ratio.png
    fixed_scale_flux_movie.gif
    fixed_scale_current_density_movie.gif
  promotion/
    promotion_readiness.json
    validation.json
    figures/promotion_matrix.png
    artifact_manifest.json
    manifest.json
  artifact_manifest.json
  manifest.json
This layout is now automated for chunked execution through
mhx campaign rutherford-execute. The remaining hard boundary is not the
executor itself; it is running enough chunks at production resolution, then
attaching convergence sweeps, seed-QI checks, fixed-scale movies, and a passing
promotion report before claiming a paper-grade nonlinear result.
Reviewer questions to answer before claiming production
Which growth rate set the duration?
How many e-folds were covered before nonlinear island tracking?
Did island width show the expected algebraic phase after the linear phase?
Did the energy budget remain closed over the full run?
Did the result persist under resolution, time-step, and seed changes?
Are flux/current movies plotted with fixed ranges?
Are all files reproducible from the checked-in command sequence?
If the answer to any question is unknown, the result should remain a validation artifact, not a paper claim.