Mastering Hyperparameter Optimization with Optuna: A Comprehensive Guide
In this guide, we walk through an advanced hyperparameter tuning workflow that combines pruning, multi-objective optimization, custom callbacks, and insightful visualizations. Step by step, we demonstrate how Optuna lets us craft more intelligent search spaces, accelerate experimentation, and extract actionable insights to enhance model performance. Using real-world datasets, we develop efficient search strategies and analyze trial outcomes interactively.
Setting Up Pruning for Efficient Gradient Boosting Optimization
import optuna
from optuna.pruners import MedianPruner
from optuna.samplers import TPESampler
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier
import matplotlib.pyplot as plt

def objective_with_pruning(trial):
    X, y = load_breast_cancer(return_X_y=True)
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 200),
        'max_depth': trial.suggest_int('max_depth', 2, 10),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
        'max_features': trial.suggest_categorical('max_features', ['sqrt', 'log2', None]),
    }
    model = GradientBoostingClassifier(**params, random_state=42)
    kf = KFold(n_splits=3, shuffle=True, random_state=42)
    scores = []
    for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]
        model.fit(X_train, y_train)
        score = model.score(X_val, y_val)
        scores.append(score)
        # Report the running mean accuracy so the pruner can act per fold
        trial.report(np.mean(scores), fold)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return np.mean(scores)

study1 = optuna.create_study(
    direction='maximize',
    sampler=TPESampler(seed=42),
    pruner=MedianPruner(n_startup_trials=5, n_warmup_steps=1)
)
study1.optimize(objective_with_pruning, n_trials=30, show_progress_bar=True)
print("Best Accuracy:", study1.best_value)
print("Optimal Parameters:", study1.best_params)
Here, we initialize essential libraries and define an objective function that incorporates pruning. As the Gradient Boosting model undergoes hyperparameter tuning, Optuna dynamically halts underperforming trials, focusing computational resources on promising configurations. This adaptive pruning accelerates the search process and enhances optimization efficiency.
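The median rule that MedianPruner applies can be sketched in plain Python. The function and variable names below are illustrative, not Optuna internals: at each reported step, a running trial is pruned when its intermediate value is worse than the median of what earlier trials reported at the same step.

```python
def should_prune_median(intermediate_value, history_at_step, direction="maximize"):
    """history_at_step: intermediate values earlier trials reported at this step."""
    if not history_at_step:  # no baseline yet (cf. n_startup_trials)
        return False
    median = sorted(history_at_step)[len(history_at_step) // 2]
    if direction == "maximize":
        return intermediate_value < median
    return intermediate_value > median

# Earlier trials reported these fold-1 accuracies:
history = [0.91, 0.94, 0.95]
print(should_prune_median(0.80, history))  # below the median -> prune
print(should_prune_median(0.96, history))  # above the median -> keep going
```

This is why pruned trials cluster among the weak configurations: a trial must keep pace with the median of its predecessors at every checkpoint to survive.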
Balancing Accuracy and Complexity with Multi-Objective Optimization
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def multi_objective(trial):
    X, y = load_breast_cancer(return_X_y=True)
    n_estimators = trial.suggest_int('n_estimators', 10, 200)
    max_depth = trial.suggest_int('max_depth', 2, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 20)
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        random_state=42,
        n_jobs=-1
    )
    accuracy = cross_val_score(model, X, y, cv=3, scoring='accuracy', n_jobs=-1).mean()
    complexity = n_estimators * max_depth
    return accuracy, complexity

study2 = optuna.create_study(
    directions=['maximize', 'minimize'],
    sampler=TPESampler(seed=42)
)
study2.optimize(multi_objective, n_trials=50, show_progress_bar=True)
print("Top 3 Pareto-optimal Trials:")
for trial in study2.best_trials[:3]:
    print(f"Trial #{trial.number}: Accuracy={trial.values[0]:.4f}, Complexity={trial.values[1]}")
Transitioning to a multi-objective framework, we simultaneously optimize for model accuracy and complexity. Optuna constructs a Pareto front, enabling us to evaluate trade-offs between competing goals rather than focusing on a single metric. This approach offers a nuanced perspective on model selection, balancing performance with resource efficiency.
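The Pareto front Optuna returns in study2.best_trials can be reproduced by hand, which makes the dominance criterion concrete. In this sketch (the helper name and the sample points are illustrative), a trial is Pareto-optimal if no other trial is at least as accurate and at least as simple, with a strict improvement in one of the two objectives.

```python
def pareto_front(points):
    """points: list of (accuracy, complexity); maximize accuracy, minimize complexity."""
    front = []
    for acc, comp in points:
        # Dominated: some other point is >= in accuracy, <= in complexity,
        # and strictly better in at least one of the two.
        dominated = any(
            (a >= acc and c <= comp) and (a > acc or c < comp)
            for a, c in points
        )
        if not dominated:
            front.append((acc, comp))
    return front

trials = [(0.95, 800), (0.96, 1200), (0.93, 300), (0.96, 900), (0.90, 1500)]
print(pareto_front(trials))  # (0.96, 1200) and (0.90, 1500) are dominated
```

The surviving points are exactly the trade-off curve plotted later: each one can only be improved on one objective by giving something up on the other.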
Implementing Custom Early Stopping for Regression Tasks
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

class EarlyStoppingCallback:
    def __init__(self, early_stopping_rounds=10, direction='maximize'):
        self.early_stopping_rounds = early_stopping_rounds
        self.direction = direction
        self.best_value = float('-inf') if direction == 'maximize' else float('inf')
        self.counter = 0

    def __call__(self, study, trial):
        if trial.state != optuna.trial.TrialState.COMPLETE:
            return
        current_value = trial.value
        if self.direction == 'maximize':
            if current_value > self.best_value:
                self.best_value = current_value
                self.counter = 0
            else:
                self.counter += 1
        else:
            if current_value < self.best_value:
                self.best_value = current_value
                self.counter = 0
            else:
                self.counter += 1
        if self.counter >= self.early_stopping_rounds:
            study.stop()

def objective_regression(trial):
    X, y = load_diabetes(return_X_y=True)
    alpha = trial.suggest_float('alpha', 1e-3, 10.0, log=True)
    max_iter = trial.suggest_int('max_iter', 100, 2000)
    model = Ridge(alpha=alpha, max_iter=max_iter, random_state=42)
    score = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error', n_jobs=-1).mean()
    return -score

early_stopping = EarlyStoppingCallback(early_stopping_rounds=15, direction='minimize')
study3 = optuna.create_study(direction='minimize', sampler=TPESampler(seed=42))
study3.optimize(objective_regression, n_trials=100, callbacks=[early_stopping], show_progress_bar=True)
print("Lowest MSE:", study3.best_value)
print("Best Hyperparameters:", study3.best_params)
We craft a tailored early stopping callback to halt the optimization when improvements plateau, conserving computational resources. Applied to a Ridge regression task, this mechanism ensures the study terminates once the mean squared error ceases to improve over a set number of trials, reflecting practical training dynamics.
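A quick dry run shows the callback's patience logic in isolation, decoupled from Optuna. This helper (its name and the sample values are illustrative) feeds a sequence of per-trial MSE values through the same counter logic and returns the index at which the study would be stopped, with direction='minimize'.

```python
def stop_index(values, patience=3):
    """Return the trial index at which patience runs out, or None if it never does."""
    best = float("inf")
    counter = 0
    for i, v in enumerate(values):
        if v < best:          # improvement: reset the patience counter
            best, counter = v, 0
        else:                 # no improvement: burn one round of patience
            counter += 1
        if counter >= patience:
            return i          # study.stop() would be called here
    return None               # budget exhausted without triggering

mse_per_trial = [5.0, 4.2, 4.5, 4.3, 4.6, 4.4, 3.9]
print(stop_index(mse_per_trial))  # the third non-improving trial triggers the stop
```

Note that the 3.9 at the end is never reached: once patience is exhausted, later trials that might have improved are forfeited, which is the trade-off any early-stopping rule accepts.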
Comprehensive Visualization for Insightful Analysis
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot optimization history for Study 1
ax = axes[0, 0]
values = [t.value for t in study1.trials if t.value is not None]
ax.plot(values, marker='o', markersize=3)
ax.axhline(y=study1.best_value, color='red', linestyle='--')
ax.set_title('Optimization Progress - Study 1')

# Display parameter importance for Study 1
ax = axes[0, 1]
importance = optuna.importance.get_param_importances(study1)
top_params = list(importance.keys())[:5]
importance_values = [importance[param] for param in top_params]
ax.barh(top_params, importance_values)
ax.set_title('Parameter Importance - Study 1')

# Visualize Pareto front from Study 2
ax = axes[1, 0]
for trial in study2.trials:
    if trial.values:
        ax.scatter(trial.values[0], trial.values[1], alpha=0.3)
for trial in study2.best_trials:
    ax.scatter(trial.values[0], trial.values[1], color='red', s=90)
ax.set_xlabel('Accuracy')
ax.set_ylabel('Model Complexity')
ax.set_title('Pareto Front - Study 2')

# Correlate max_depth with accuracy in Study 1
ax = axes[1, 1]
depth_accuracy_pairs = [(t.params.get('max_depth', 0), t.value) for t in study1.trials if t.value]
if depth_accuracy_pairs:
    depths, accuracies = zip(*depth_accuracy_pairs)
    ax.scatter(depths, accuracies, alpha=0.6)
ax.set_xlabel('max_depth')
ax.set_ylabel('Accuracy')
ax.set_title('max_depth vs Accuracy - Study 1')

plt.tight_layout()
plt.savefig('optuna_visualization.png', dpi=150)
plt.show()
To better understand our experiments, we generate multiple plots: optimization trajectories, parameter importance rankings, Pareto fronts illustrating trade-offs, and relationships between hyperparameters and performance metrics. These visual tools provide a holistic view of the tuning process, revealing key factors driving model success.
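The fourth panel's scatter can also be summarized as a single number with a Pearson correlation between a hyperparameter and the objective. This is only a rough linear proxy, not the fANOVA-style evaluator behind get_param_importances, but it is a quick numeric sanity check on what the plot shows; the helper name and sample pairs below are illustrative.

```python
import numpy as np

def param_score_correlation(pairs):
    """pairs: list of (param_value, objective_value) from completed trials."""
    params, scores = zip(*pairs)
    # Off-diagonal entry of the 2x2 correlation matrix is the Pearson r
    return float(np.corrcoef(params, scores)[0, 1])

pairs = [(2, 0.90), (4, 0.93), (6, 0.95), (8, 0.94), (10, 0.96)]
r = param_score_correlation(pairs)
print(f"correlation: {r:.2f}")  # strongly positive for this toy data
```

A correlation near zero does not prove a parameter is unimportant (the relationship may be non-monotonic), which is exactly why the importance ranking in the second panel uses a more sophisticated evaluator.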
Summary of Optimization Outcomes
pruned_trials = len([t for t in study1.trials if t.state == optuna.trial.TrialState.PRUNED])
total_trials_study1 = len(study1.trials)
print(f"Study 1 - Best Accuracy: {study1.best_value:.4f}")
print(f"Study 1 - Percentage of Pruned Trials: {pruned_trials / total_trials_study1 * 100:.2f}%")
print(f"Study 2 - Number of Pareto-optimal Solutions: {len(study2.best_trials)}")
print(f"Study 3 - Best Mean Squared Error: {study3.best_value:.4f}")
print(f"Study 3 - Total Trials Conducted: {len(study3.trials)}")
We conclude by reviewing the highlights from each study: the peak accuracy and pruning efficiency in the first, the count of Pareto-optimal configurations in the second, and the minimal regression error alongside trial volume in the third. This concise summary encapsulates the effectiveness and depth of our hyperparameter optimization journey.
Final Thoughts: Building Robust and Adaptive Hyperparameter Tuning Pipelines
This tutorial has equipped you with a versatile framework for hyperparameter optimization that transcends traditional single-metric tuning. By integrating pruning strategies, multi-objective optimization, custom early stopping, and comprehensive visualization, you can construct flexible and powerful workflows tailored to diverse machine learning challenges. Whether optimizing classical models or deep learning architectures, this blueprint offers a practical and scalable approach to achieving superior model performance with Optuna.

