Hyperparameter Tuning

Training a strategy involves choosing hyperparameters (learning rate, batch size, bout offset, regularisation strength) that sit above the strategy parameters themselves. Poor hyperparameters can cause good strategies to underperform or overfit. This tutorial shows how to tune training hyperparameters using out-of-sample walk-forward analysis (WFA) results as the objective.

This tutorial covers:

The three-level optimisation hierarchy
Basic hyperparameter tuning
Multi-period SGD as an alternative trainer
Multi-objective tuning (Pareto fronts)
Custom search spaces
Production workflow

The Three-Level Hierarchy

quantammsim’s optimisation stack has three nested levels:

Level 3: HyperparamTuner
    │  Optuna/TPE — varies (lr, batch_size, bout_offset, ...)
    │  Objective: OOS Sharpe, WFE, or Rademacher-adjusted Sharpe
    ▼
Level 2: TrainingEvaluator
    │  Walk-forward cycles — trains & evaluates on rolling windows
    │  Computes: WFE, IS-OOS gap, Rademacher complexity
    ▼
Level 1: Trainer (train_on_historic_data or multi_period_sgd)
    │  Gradient descent — optimises strategy params (λ, k, weights)
    │  Objective: daily_log_sharpe (or other return_val)
    ▼
Level 0: Forward pass
    Simulate pool → arbitrage → compute financial metric

The key insight: each level optimises something the level below cannot see.

Level 1 optimises strategy parameters for a given training window.
Level 2 evaluates whether Level 1’s output generalises across windows.
Level 3 finds the hyperparameters that make Level 2’s evaluation best.
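To make the nesting concrete, here is a toy, self-contained sketch of the same three levels on a one-parameter problem. None of it is quantammsim code. The "strategy" is a constant forecast fitted by gradient descent (Level 1), a hand-rolled walk-forward loop plays the role of TrainingEvaluator (Level 2), and a plain grid search over the learning rate stands in for Optuna/TPE (Level 3). All function names are hypothetical.

```python
# Toy three-level optimisation: NOT quantammsim code, only the same nesting.
from statistics import fmean


def forward_loss(theta, window):
    """Level 0: mean squared error of a constant forecast `theta` on a window."""
    return fmean((theta - x) ** 2 for x in window)


def train(window, lr, n_iterations=20):
    """Level 1: plain gradient descent on forward_loss for one training window."""
    theta = 0.0
    target = fmean(window)
    for _ in range(n_iterations):
        grad = 2.0 * (theta - target)  # derivative of the mean squared error
        theta -= lr * grad
    return theta


def walk_forward(data, lr, train_len=50, test_len=25, n_cycles=3):
    """Level 2: train on rolling in-sample windows, score on the next OOS window."""
    oos_scores = []
    for c in range(n_cycles):
        start = c * test_len
        is_window = data[start:start + train_len]
        oos_window = data[start + train_len:start + train_len + test_len]
        theta = train(is_window, lr)
        oos_scores.append(-forward_loss(theta, oos_window))  # higher is better
    return fmean(oos_scores)


def tune(data, lr_grid):
    """Level 3: pick the learning rate whose walk-forward score is best."""
    scores = {lr: walk_forward(data, lr) for lr in lr_grid}
    best_lr = max(scores, key=scores.get)
    return best_lr, scores


# Deterministic synthetic series with mean 1.0 (period-5 pattern around it).
data = [1.0 + 0.1 * ((7 * i) % 5 - 2) for i in range(200)]
best_lr, scores = tune(data, lr_grid=[0.001, 0.25, 1.5])
print(best_lr, {lr: round(s, 3) for lr, s in scores.items()})
```

Level 1 never chooses its own learning rate. Only the outer loop, scoring each candidate on out-of-sample windows, can tell that `lr=1.5` diverges and `lr=0.001` underfits within 20 iterations.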

Optimising for OOS metrics at the outer level avoids the fundamental trap of tuning hyperparameters on in-sample performance.

Basic Hyperparameter Tuning

The HyperparamTuner wraps a TrainingEvaluator inside an Optuna study:

from quantammsim.runners.hyperparam_tuner import HyperparamTuner

run_fingerprint = {
    "tokens": ["BTC", "ETH"],
    "rule": "mean_reversion_channel",
    "startDateString": "2022-06-01 00:00:00",
    "endDateString": "2024-06-01 00:00:00",
    "initial_pool_value": 1_000_000.0,
    "fees": 0.003,
    "do_arb": True,
    "return_val": "daily_log_sharpe",
    "chunk_period": 1440,
    "optimisation_settings": {
        "method": "gradient_descent",
        "optimiser": "adam",
        "n_parameter_sets": 4,
        "use_gradient_clipping": True,
        "clip_norm": 10.0,
    },
}

tuner = HyperparamTuner(
    runner_name="train_on_historic_data",
    n_trials=30,               # Optuna trials
    n_wfa_cycles=3,            # WFA cycles per trial
    objective="mean_oos_sharpe",
)

result = tuner.tune(run_fingerprint)

print(f"Best OOS Sharpe: {result.best_value:.3f}")
print(f"Best params: {result.best_params}")

# Apply best hyperparameters for final training
run_fingerprint["optimisation_settings"].update(result.best_params)

The tuner varies training hyperparameters (learning rate, batch size, bout offset, LR schedule, early stopping, weight decay) while keeping strategy structure and data fixed.

Default Search Space

The default search space covers:

Parameter	Range	Scale	Notes
`base_lr`	[1e-5, 0.1]	Log	Adam/AdamW range; SGD uses [1e-3, 1.0]
`batch_size`	[8, 64]	Log	Powers of 2 preferred
`n_iterations`	[50, 5000]	Log	Training epochs
`bout_offset_days`	[7, ~90% of cycle]	Log	Converted to minutes internally
`clip_norm`	[0.5, 50]	Log	Gradient clipping threshold
`lr_schedule_type`	constant / cosine / warmup_cosine / exponential	Categorical	Conditional parameters follow
`use_early_stopping`	True / False	Categorical	Patience and val_fraction are conditional
`use_weight_decay`	True / False	Categorical	Decay value is conditional
`noise_scale`	[0.01, 0.5]	Log	Initialisation diversity
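The `bout_offset_days` row combines three ideas: a log-scaled search, an upper bound tied to the cycle length, and a conversion to minutes. A minimal sketch of that arithmetic follows, assuming the upper bound is exactly 90% of the cycle and the conversion is the plain 1,440 minutes per day. The helpers `bout_offset_bounds` and `sample_log_int` are hypothetical, not quantammsim functions.

```python
import math
import random

MINUTES_PER_DAY = 24 * 60  # 1440


def bout_offset_bounds(cycle_days, low_days=7, frac=0.9):
    """Hypothetical helper: (low, high) search bounds in days for one WFA cycle."""
    high = max(low_days, int(frac * cycle_days))  # assumption: exactly 90% of cycle
    return low_days, high


def sample_log_int(rng, low, high):
    """Log-uniform integer in [low, high]: uniform in log space, then rounded."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    return min(high, max(low, round(value)))


rng = random.Random(0)
for cycle_days in (90, 365):
    low, high = bout_offset_bounds(cycle_days)
    days = sample_log_int(rng, low, high)
    print(f"cycle={cycle_days}d range=[{low}, {high}]d "
          f"sampled={days}d = {days * MINUTES_PER_DAY} minutes")
```

Sampling in log space gives each doubling of the offset (7 to 14 days, 40 to 80 days) the same probability mass, which suits parameters whose effect scales multiplicatively.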

Objective Functions

Available objectives:

"mean_oos_sharpe" (default): Average OOS Sharpe across cycles. Best for maximising expected performance.
"worst_oos_sharpe": Worst-case OOS Sharpe. Best for robustness across market regimes.
"mean_wfe": Average Walk-Forward Efficiency. Optimises for consistency rather than magnitude.
"adjusted_mean_oos_sharpe": Rademacher-adjusted Sharpe. Requires compute_rademacher=True on the inner evaluator.
"multi": Multi-objective (see below).

Also available: "mean_oos_calmar", "mean_oos_sterling", "worst_oos_calmar", "mean_oos_daily_log_sharpe", and others.
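The scalar objectives reduce per-cycle results to one number per trial. The sketch below shows that reduction for three of them, assuming WFE is the ratio of OOS Sharpe to IS Sharpe in each cycle (a common definition; quantammsim's exact formula may differ). `CycleResult` is a hypothetical stand-in for a cycle record, and the numbers are illustrative only.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class CycleResult:
    """Hypothetical stand-in for one walk-forward cycle's metrics."""
    is_sharpe: float
    oos_sharpe: float


def walk_forward_efficiency(cycle):
    """Assumed WFE definition: OOS Sharpe as a fraction of IS Sharpe."""
    if cycle.is_sharpe <= 0:
        return 0.0  # assumption: treat a non-positive IS Sharpe as zero efficiency
    return cycle.oos_sharpe / cycle.is_sharpe


def scalar_objectives(cycles):
    oos = [c.oos_sharpe for c in cycles]
    return {
        "mean_oos_sharpe": fmean(oos),
        "worst_oos_sharpe": min(oos),
        "mean_wfe": fmean(walk_forward_efficiency(c) for c in cycles),
    }


# Illustrative numbers only, not real results.
cycles = [CycleResult(2.0, 1.2), CycleResult(1.5, 0.3), CycleResult(1.0, 0.9)]
print(scalar_objectives(cycles))
```

The second cycle barely moves the mean but sets the worst-case value, which is why "mean_oos_sharpe" and "worst_oos_sharpe" can rank the same trials differently.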

Trial Pruning

Unpromising trials are pruned early to save compute. After each walk-forward cycle completes, the tuner reports the intermediate OOS metric to Optuna. If that value falls below the 25th percentile of the values that earlier trials reported at the same cycle, the trial is terminated:

tuner = HyperparamTuner(
    runner_name="train_on_historic_data",
    n_trials=50,
    n_wfa_cycles=4,
    objective="mean_oos_sharpe",
    enable_pruning=True,  # Default
    pruner="percentile",  # Default: prune bottom 25%
)

Available pruners:

"percentile" (default) — Prune bottom 25%. Good for WFA where cycles are independent market regimes.
"median" — Prune below median. More aggressive.
"hyperband" / "successive_halving" — Multi-fidelity pruners. Use cautiously with WFA since cycles are not true fidelity levels.
None — Disable pruning.
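The percentile rule can be sketched in a few lines. This follows the idea behind Optuna's `PercentilePruner` (compare a trial's intermediate value with earlier trials at the same step) but is not its implementation; `should_prune` and its `min_trials` warm-up guard are hypothetical.

```python
from statistics import quantiles


def should_prune(value, history_at_step, percentile=25, min_trials=4):
    """
    Prune if `value` is below the given percentile of the values that earlier
    trials reported at the same walk-forward cycle (higher is better).
    `percentile` must be an integer between 1 and 99.
    """
    if len(history_at_step) < min_trials:
        return False  # too little evidence yet; let the trial continue
    # quantiles(..., n=100) returns the 1st..99th percentile cut points.
    cut = quantiles(history_at_step, n=100, method="inclusive")[percentile - 1]
    return value < cut


# OOS Sharpe values earlier trials reported after cycle 1 (illustrative).
history = [0.1, 0.4, 0.8, 1.0, 1.3]
print(should_prune(0.3, history), should_prune(0.5, history))
```

Because the comparison is per cycle, a trial is judged only against others that saw the same market regime, which is why the percentile pruner suits WFA better than the multi-fidelity pruners.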

Multi-Period SGD

multi_period_sgd is an alternative Level-1 trainer that divides the training data into multiple periods and optimises across all of them simultaneously. It is aimed at strategies that need to work across different market regimes:

tuner = HyperparamTuner(
    runner_name="multi_period_sgd",
    n_trials=30,
    n_wfa_cycles=3,
    objective="mean_oos_sharpe",
)

result = tuner.tune(run_fingerprint)

The multi-period search space automatically includes:

n_periods: Number of sub-periods (2-8)
max_epochs: Training epochs (50-300)
aggregation: How to combine period losses (mean, worst, softmin)
softmin_temperature: Temperature for softmin (conditional on aggregation)

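The "worst" aggregation trains to maximise the minimum performance across all periods, a minimax objective that produces conservative but robust strategies. "softmin" is a smooth relaxation of "worst": its temperature controls how strongly the weakest periods dominate the combined objective.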
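A minimal sketch of splitting data into periods and combining per-period scores with the three aggregations. It assumes contiguous, near-equal periods and a log-mean-exp form of softmin; quantammsim's splitting scheme and exact softmin formula may differ, and both function names are hypothetical.

```python
import math
from statistics import fmean


def split_periods(series, n_periods):
    """Split a sequence into n_periods contiguous chunks of near-equal length."""
    size, extra = divmod(len(series), n_periods)
    chunks, start = [], 0
    for i in range(n_periods):
        end = start + size + (1 if i < extra else 0)
        chunks.append(series[start:end])
        start = end
    return chunks


def aggregate(scores, how="mean", temperature=1.0):
    """Combine per-period scores (higher is better) into one training objective."""
    if how == "mean":
        return fmean(scores)
    if how == "worst":
        return min(scores)  # minimax: only the weakest period counts
    if how == "softmin":
        # Smooth minimum -T * log(mean(exp(-s / T))), shifted by min(s) for
        # numerical stability. Tends to min(scores) as T -> 0, mean as T -> inf.
        m = min(scores)
        inner = fmean(math.exp(-(s - m) / temperature) for s in scores)
        return m - temperature * math.log(inner)
    raise ValueError(f"unknown aggregation: {how}")


period_scores = [0.9, 1.4, -0.2, 1.1]  # illustrative per-period Sharpe values
for how, t in [("mean", 1.0), ("worst", 1.0), ("softmin", 0.01), ("softmin", 1.0)]:
    print(how, t, round(aggregate(period_scores, how, t), 4))
```

Lowering `softmin_temperature` moves the objective from "mean" towards "worst" while keeping gradients flowing to every period, not just the single weakest one.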

Multi-Objective Tuning

Sometimes you want to optimise multiple objectives simultaneously rather than collapsing them into a single scalar. Multi-objective tuning returns a Pareto front:

tuner = HyperparamTuner(
    runner_name="train_on_historic_data",
    n_trials=50,
    n_wfa_cycles=3,
    objective="multi",
    multi_objectives=["mean_oos_sharpe", "mean_wfe"],
)

result = tuner.tune(run_fingerprint)

# Inspect the Pareto front
for trial in result.pareto_front:
    print(
        f"OOS Sharpe={trial['values'][0]:.3f}, "
        f"WFE={trial['values'][1]:.3f}, "
        f"params={trial['params']}"
    )

The Pareto front contains all non-dominated solutions: configurations for which no other trial is at least as good on every objective and strictly better on at least one. You then choose from the front based on your priorities:

If deployment risk tolerance is low, pick the trial with the highest WFE (most consistent generalisation).
If absolute performance matters more, pick the trial with the highest OOS Sharpe, even if its WFE is lower.
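The dominance test and the two selection rules above fit in a short sketch. Trial dicts mirror the `trial['values']` shape used in the loop above; the `number` values and metrics are made up for illustration, and `pareto_front` here is a hypothetical helper, not the library's implementation.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere (maximising)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(trials):
    """Keep trials whose objective vector no other trial dominates."""
    return [t for t in trials
            if not any(dominates(o["values"], t["values"]) for o in trials)]


# values = [mean OOS Sharpe, mean WFE]; illustrative numbers only.
trials = [
    {"number": 0, "values": [1.2, 0.5]},
    {"number": 1, "values": [0.9, 0.8]},
    {"number": 2, "values": [0.8, 0.6]},  # dominated by trial 1
    {"number": 3, "values": [1.2, 0.4]},  # dominated by trial 0 (equal Sharpe, lower WFE)
]
front = pareto_front(trials)
most_consistent = max(front, key=lambda t: t["values"][1])  # highest WFE
highest_sharpe = max(front, key=lambda t: t["values"][0])   # highest OOS Sharpe
print([t["number"] for t in front], most_consistent["number"], highest_sharpe["number"])
```

Trial 3 shows why the definition needs "at least as good" rather than "strictly better on all": it ties trial 0 on Sharpe yet is still dominated.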

Custom Search Spaces

Create a custom search space using HyperparamSpace:

from quantammsim.runners.hyperparam_tuner import HyperparamSpace

# Minimal space for quick exploration
space = HyperparamSpace.create(minimal=True)
# Only tunes: base_lr ∈ [0.01, 0.5], n_iterations ∈ [50, 200]

# Full space with custom cycle duration
space = HyperparamSpace.create(
    runner="train_on_historic_data",
    cycle_days=90,                  # Shorter cycles → smaller bout_offset range
    optimizer="adam",
    include_lr_schedule=True,       # Include LR schedule choices
    include_early_stopping=True,    # Include early stopping
    include_weight_decay=True,      # Include weight decay
    objective_metric="mean_oos_sharpe",
)

# Pass to tuner
tuner = HyperparamTuner(
    runner_name="train_on_historic_data",
    n_trials=30,
    n_wfa_cycles=3,
    hyperparam_space=space,
)

You can also define the space manually for full control:

space = HyperparamSpace(params={
    "base_lr": {"low": 0.001, "high": 0.1, "log": True},
    "batch_size": {"low": 8, "high": 32, "log": True, "type": "int"},
    "n_iterations": {"low": 100, "high": 1000, "log": True, "type": "int"},
    "bout_offset_days": {"low": 7, "high": 60, "log": True, "type": "int"},
})

Conditional parameters are supported:

space = HyperparamSpace(params={
    "base_lr": {"low": 0.001, "high": 0.1, "log": True},
    "use_weight_decay": {"choices": [True, False]},
    "weight_decay": {
        "low": 0.0001, "high": 0.1, "log": True,
        "conditional_on": "use_weight_decay",
        "conditional_value": True,
    },
})
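To see what "conditional" means for the sampler, here is a hypothetical sampler for spec dicts shaped like the ones above. It is not how HyperparamSpace resolves parameters internally (that is presumably done through Optuna); it assumes a parent such as `use_weight_decay` appears before its dependants in dict order.

```python
import math
import random


def sample_params(spec, rng):
    """
    Draw one configuration from a spec shaped like the HyperparamSpace(params=...)
    dicts above. A parameter whose condition is not met is omitted entirely.
    Assumes parents appear before their dependants in dict order.
    """
    sampled = {}
    for name, p in spec.items():
        cond = p.get("conditional_on")
        if cond is not None and sampled.get(cond) != p.get("conditional_value"):
            continue  # parent switch is off, so this parameter does not exist
        if "choices" in p:
            sampled[name] = rng.choice(p["choices"])
            continue
        low, high = p["low"], p["high"]
        if p.get("log", False):
            value = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            value = rng.uniform(low, high)
        if p.get("type") == "int":
            value = round(value)
        sampled[name] = min(high, max(low, value))  # guard against float edge cases
    return sampled


space_spec = {
    "base_lr": {"low": 0.001, "high": 0.1, "log": True},
    "batch_size": {"low": 8, "high": 32, "log": True, "type": "int"},
    "use_weight_decay": {"choices": [True, False]},
    "weight_decay": {
        "low": 0.0001, "high": 0.1, "log": True,
        "conditional_on": "use_weight_decay",
        "conditional_value": True,
    },
}
print(sample_params(space_spec, random.Random(7)))
```

Omitting inactive parameters, rather than sampling and ignoring them, keeps TPE from learning spurious relationships between `weight_decay` and the objective in trials where decay was switched off.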

Persistent Studies

For long-running tuning, persist the Optuna study to a database so it survives interruptions:

tuner = HyperparamTuner(
    runner_name="train_on_historic_data",
    n_trials=100,
    n_wfa_cycles=4,
    study_name="btc_eth_mean_reversion_tune",
    storage="sqlite:///tuning_results.db",
    total_timeout=3600 * 4,  # Stop after 4 hours
)

result = tuner.tune(run_fingerprint)

Rerunning with the same study_name and storage resumes from where it left off.

Production Workflow

A recommended end-to-end workflow:

from quantammsim.runners.hyperparam_tuner import HyperparamTuner, HyperparamSpace
from quantammsim.runners.training_evaluator import TrainingEvaluator
from quantammsim.runners.jax_runners import train_on_historic_data

# ── Step 1: Quick exploration with minimal space ──
tuner_quick = HyperparamTuner(
    runner_name="train_on_historic_data",
    n_trials=15,
    n_wfa_cycles=3,
    objective="mean_oos_sharpe",
    hyperparam_space=HyperparamSpace.create(minimal=True),
)
quick_result = tuner_quick.tune(run_fingerprint)
print(f"Quick pass best: {quick_result.best_value:.3f}")

# ── Step 2: Full tuning over the default search space ──
tuner_full = HyperparamTuner(
    runner_name="train_on_historic_data",
    n_trials=50,
    n_wfa_cycles=4,
    objective="mean_oos_sharpe",
    enable_pruning=True,
)
full_result = tuner_full.tune(run_fingerprint)
print(f"Full tuning best: {full_result.best_value:.3f}")

# ── Step 3: Validate best hyperparameters with more cycles ──
run_fingerprint["optimisation_settings"].update(full_result.best_params)

evaluator = TrainingEvaluator.from_runner(
    "train_on_historic_data",
    n_cycles=6,           # More cycles for final validation
    compute_rademacher=True,
)
validation = evaluator.evaluate(run_fingerprint)
evaluator.print_report(validation)

# ── Step 4: Final training on full data ──
if validation.is_effective:
    print("Strategy validated — running final training")
    train_on_historic_data(run_fingerprint, verbose=True)
else:
    print("Strategy did not pass validation:")
    for reason in validation.effectiveness_reasons:
        print(f"  - {reason}")

Analysing Results

The TuningResult provides full trial-level data for post-hoc analysis:

result = tuner.tune(run_fingerprint)

# Summary statistics
print(f"Trials: {result.n_completed} completed, "
      f"{result.n_pruned} pruned, {result.n_failed} failed")
print(f"Total time: {result.total_time_seconds:.0f}s")

# Per-trial data
for trial in result.all_trials[:5]:
    print(f"  Trial {trial['number']}: "
          f"value={trial['value']:.3f}, params={trial['params']}")

# Best trial's full WFA evaluation
if result.best_evaluation is not None:
    print(f"\nBest trial WFE: {result.best_evaluation.mean_wfe:.3f}")
    for c in result.best_evaluation.cycles:
        print(f"  Cycle {c.cycle_number}: "
              f"IS={c.is_sharpe:.2f}, OOS={c.oos_sharpe:.2f}")
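A common post-hoc question is which value of a categorical hyperparameter tended to do best. The sketch below groups trial dicts shaped like the `result.all_trials` entries above by one parameter. It assumes pruned or failed trials carry `value=None`, which may not match the library; the trial data are made up for illustration.

```python
from collections import defaultdict
from statistics import fmean


def mean_value_by(trials, param):
    """Average final objective value per setting of one hyperparameter."""
    groups = defaultdict(list)
    for t in trials:
        if t.get("value") is None:  # assumption: pruned/failed trials have no value
            continue
        groups[t["params"].get(param, "<absent>")].append(t["value"])
    return {setting: fmean(values) for setting, values in groups.items()}


# Illustrative trials only, shaped like result.all_trials entries.
trials = [
    {"number": 0, "value": 0.8, "params": {"lr_schedule_type": "cosine", "base_lr": 0.01}},
    {"number": 1, "value": 0.4, "params": {"lr_schedule_type": "constant", "base_lr": 0.05}},
    {"number": 2, "value": 1.0, "params": {"lr_schedule_type": "cosine", "base_lr": 0.003}},
    {"number": 3, "value": None, "params": {"lr_schedule_type": "exponential", "base_lr": 0.08}},
    {"number": 4, "value": 0.6, "params": {"lr_schedule_type": "constant", "base_lr": 0.02}},
]
summary = mean_value_by(trials, "lr_schedule_type")
print(summary)
```

With TPE, later trials concentrate on promising settings, so group sizes are uneven; treat these averages as a rough guide rather than a controlled comparison.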