Auto Grad Functions

Smart Predict-then-Optimize+ Loss (SPO+)

SPO+ Loss function [1] is a surrogate loss function of SPO Loss (Regret), which measures the decision error of optimization problem. For SPO/SPO+ Loss, the objective function is linear and constraints are known and fixed, but the cost vector need to be predicted from contextual data. The SPO+ Loss is convex with non-zero subgradient.

class pyepo.func.SPOPlus(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for SPO+ Loss, as a surrogate loss function of SPO (regret) Loss, which measures the decision error of the optimization problem.

For SPO/SPO+ Loss, the objective function is linear and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

The SPO+ Loss is convex with subgradient. Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://doi.org/10.1287/mnsc.2020.3922>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data

forward(pred_cost, true_cost, true_sol, true_obj): Forward pass

pyepo.func.SPOPlus supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

spo = pyepo.func.SPOPlus(optmodel, processes=2)

Differentiable Black-box Optimizer (DBB)

Diffenretiable black-box (DBB) optimizer function [2] estimates gradients from interpolation, replacing the zero gradients. For differentiable block-box, the objective function is linear and constraints are known and fixed, but the cost vector need to be predicted from contextual data.

class pyepo.func.blackboxOpt(optmodel, lambd=10, processes=1, solve_ratio=1, dataset=None)

An autograd module for differentiable black-box optimizer, which yield an optimal solution and derive a gradient.

For differentiable block-box, the objective function is linear and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

The block-box approximates the gradient of the optimizer by interpolating the loss function. Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://arxiv.org/abs/1912.02175>

Parameters:

optmodel (optModel) – an PyEPO optimization model
lambd (float) – a hyperparameter for differentiable block-box to control interpolation degree
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

pyepo.func.blackboxOpt supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores. lambd is a hyperparameter for function smoothing. The range of lambd should be 10 to 20.

import pyepo

dbb = pyepo.func.blackboxOpt(optmodel, lambd=10, processes=2)

Negative Identity Backpropagation (NID)

Negative Identity Backpropagation (NID) [6] treats the solver as a negative identity mapping during the backward pass, which is equivalent to DBB with certain hyperparameter. It is hyperparameter-free and does not require any additional computationally expensive call to the solver on the backward pass.

class pyepo.func.negativeIdentity(optmodel, processes=1, solve_ratio=1, dataset=None)

An autograd module for the differentiable optimizer, which yields optimal a solution and use negative identity as a gradient on the backward pass.

For negative identity backpropagation, the objective function is linear and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

If the interpolation hyperparameter λ aligns with an appropriate step size, then the identity update is equivalent to DBB. However, the identity update does not require an additional call to the solver during the backward pass and tuning an additional hyperparameter λ.

Reference: <https://arxiv.org/abs/2205.15213>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

pyepo.func.negativeIdentity supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

nid = pyepo.func.negativeIdentity(optmodel, processes=2)

Differentiable Perturbed Optimizer (DPO)

Differentiable perturbed Optimizer (DPO) [3] uses Monte-Carlo samples to estimate solutions, in which randomly perturbed costs are sampled to optimize. For the perturbed optimizer, the cost vector needs to be predicted from contextual data and are perturbed with Gaussian noise. The perturbed optimizer is differentiable in its inputs with non-zero Jacobian.

class pyepo.func.perturbedOpt(optmodel, n_samples=10, sigma=1.0, processes=1, seed=135, solve_ratio=1, dataset=None)

An autograd module for Fenchel-Young loss using perturbation techniques. The use of the loss improves the algorithmic by the specific expression of the gradients of the loss.

For the perturbed optimizer, the cost vector needs to be predicted from contextual data and are perturbed with Gaussian noise.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://papers.nips.cc/paper/2020/hash/6bb56208f672af0dd65451f869fedfd9-Abstract.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – the amplitude of the perturbation
processes (int) – number of processors, 1 for single-core, 0 for all of cores
seed (int) – random state seed
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

pyepo.func.perturbedOpt supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores. n_samples is the number of Monte-Carlo samples to estimate solutions, and sigma is the variance of Gaussian noise perturbation.

import pyepo

dpo = pyepo.func.perturbedOpt(optmodel, n_samples=10, sigma=0.5, processes=2)

Perturbed Fenchel-Young Loss (PYFL)

Perturbed Fenchel-Young loss (PYFL) function [3] uses perturbation techniques with Monte-Carlo samples. The use of the loss improves the algorithmic by the specific expression of the gradients of the loss. For the perturbed optimizer, the cost vector need to be predicted from contextual data and are perturbed with Gaussian noise. The Fenchel-Young loss allows to directly optimize a loss between the features and solutions with less computation.

class pyepo.func.perturbedFenchelYoung(optmodel, n_samples=10, sigma=1.0, processes=1, seed=135, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for Fenchel-Young loss using perturbation techniques. The use of the loss improves the algorithmic by the specific expression of the gradients of the loss.

For the perturbed optimizer, the cost vector need to be predicted from contextual data and are perturbed with Gaussian noise.

The Fenchel-Young loss allows to directly optimize a loss between the features and solutions with less computation. Thus, allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://papers.nips.cc/paper/2020/hash/6bb56208f672af0dd65451f869fedfd9-Abstract.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – the amplitude of the perturbation
processes (int) – number of processors, 1 for single-core, 0 for all of cores
seed (int) – random state seed
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data

forward(pred_cost, true_sol): Forward pass

pyepo.func.perturbedFenchelYoung supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores. n_samples is the number of Monte-Carlo samples to estimate solutions, and sigma is the variance of Gaussian noise perturbation.

import pyepo

pfy = pyepo.func.perturbedFenchelYoung(optmodel, n_samples=10, sigma=0.5, processes=2)

Implicit Maximum Likelihood Estimator (I-MLE)

Implicit Maximum Likelihood Estimator (I-MLE) [7] use the perturb-and-MAP framework. They sample noise from a Sum-of-Gamma distribution and interpolate the loss function to approximate finite difference.

class pyepo.func.implicitMLE(optmodel, n_samples=10, sigma=1.0, lambd=10, distribution=<pyepo.func.utlis.sumGammaDistribution object>, two_sides=False, processes=1, solve_ratio=1, dataset=None)

An autograd module for Implicit Maximum Likelihood Estimator, which yield an optimal solution in a constrained exponential family distribution via Perturb-and-MAP.

For I-MLE, it works as black-box combinatorial solvers, in which constraints are known and fixed, but the cost vector need to be predicted from contextual data.

The I-MLE approximate gradient of optimizer smoothly. Thus, allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.neurips.cc/paper_files/paper/2021/hash/7a430339c10c642c4b2251756fd1b484-Abstract.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – noise temperature for the input distribution
lambd (float) – a hyperparameter for differentiable block-box to control interpolation degree
distribution (distribution) – noise distribution
two_sides (bool) – approximate gradient by two-sided perturbation or not
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

pyepo.func.implicitMLE supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

imle = pyepo.func.implicitMLE(optmodel, n_samples=10, sigma=1.0, lambd=10, processes=2)

Adaptive Implicit Maximum Likelihood Estimator (AI-MLE)

Adaptive Implicit Maximum Likelihood Estimator (AI-MLE) [8] use the adaptive interpolation step and the perturb-and-MAP framework. They sample noise from a Sum-of-Gamma distribution and interpolate the loss function to approximate finite difference.

class pyepo.func.adaptiveImplicitMLE(optmodel, n_samples=10, sigma=1.0, distribution=<pyepo.func.utlis.sumGammaDistribution object>, two_sides=False, processes=1, solve_ratio=1, dataset=None)

An autograd module for Adaptive Implicit Maximum Likelihood Estimator, which adaptively choose hyperparameter λ and yield an optimal solution in a constrained exponential family distribution via Perturb-and-MAP.

For AI-MLE, it works as black-box combinatorial solvers, in which constraints are known and fixed, but the cost vector need to be predicted from contextual data.

The AI-MLE approximate gradient of optimizer smoothly. Thus, allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://ojs.aaai.org/index.php/AAAI/article/view/26103>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – noise temperature for the input distribution
distribution (distribution) – noise distribution
two_sides (bool) – approximate gradient by two-sided perturbation or not
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

pyepo.func.adaptiveImplicitMLE supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

aimle = pyepo.func.adaptiveImplicitMLE(optmodel, n_samples=10, sigma=1.0, processes=2)

Noise Contrastive Estimation (NCE)

Noise Contrastive Estimation (NCE) [4] serve as surrogate loss function based on negative examples. The key idea is to work with a small set of non-optimal solutions as negative samples. Thus, we can maximizes the difference of the probability between optimal solution and others.

class pyepo.func.NCE(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for noise contrastive estimation as surrogate loss functions, based on viewing suboptimal solutions as negative examples.

For the NCE, the cost vector needs to be predicted from contextual data and maximizes the separation of the probability of the optimal solution.

Thus allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://www.ijcai.org/proceedings/2021/390>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data, usually this is simply the training set

forward(pred_cost, true_sol): Forward pass

pyepo.func.NCE supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

nce = pyepo.func.NCE(optmodel, processes=2, solve_ratio=0.05, dataset=dataset_train)

Contrastive Maximum A Posterior Estimation (CMAP)

Contrastive Maximum A Posteriori (CMAP) Loss function [4] is a special case of NCE where only samples the best one. It is simple but efficient.

class pyepo.func.contrastiveMAP(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for Maximum A Posterior contrastive estimation as surrogate loss functions, which is an efficient self-contrastive algorithm.

For the MAP, the cost vector needs to be predicted from contextual data and maximizes the separation of the probability of the optimal solution.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://www.ijcai.org/proceedings/2021/390>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data, usually this is simply the training set

forward(pred_cost, true_sol): Forward pass

pyepo.func.contrastiveMAP supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

cmap = pyepo.func.contrastiveMAP(optmodel, processes=2, solve_ratio=0.05, dataset=dataset_train)

Learning to Rank (LTR)

LTR Loss function [5] is to learn an objective function that ranks a pool of feasible solutions correctly. LTR methods assign scores to the disparate solutions in pool, then establish surrogate loss functions predicated on these scores with the intention of ranking the optimal solution best.

Pointwise loss calculates the ranking scores of the items.

class pyepo.func.pointwiseLTR(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for pointwise learning to rank, where the goal is to learn an objective function that ranks a pool of feasible solutions correctly.

For the pointwise LTR, the cost vector needs to be predicted from contextual data, and calculates the ranking scores of the items.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.mlr.press/v162/mandi22a.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (optDataset) – the training data

forward(pred_cost, true_cost): Forward pass

Pairwise loss learns the relative ordering of pairs of items.

class pyepo.func.pairwiseLTR(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for pairwise learning to rank, where the goal is to learn an objective function that ranks a pool of feasible solutions correctly.

For the pairwise LTR, the cost vector needs to be predicted from the contextual data and the loss learns the relative ordering of pairs of items.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.mlr.press/v162/mandi22a.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (optDataset) – the training data

forward(pred_cost, true_cost): Forward pass

Listwise loss measures the scores of the whole ranked lists.

class pyepo.func.listwiseLTR(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for listwise learning to rank, where the goal is to learn an objective function that ranks a pool of feasible solutions correctly.

For the listwise LTR, the cost vector needs to be predicted from the contextual data and the loss measures the scores of the whole ranked lists.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.mlr.press/v162/mandi22a.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (optDataset) – the training data, usually this is simply the training set

forward(pred_cost, true_cost): Forward pass

pyepo.func.pointwiseLTR, pyepo.func.pairwiseLTR, and pyepo.func.listwiseLTR supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

# pointwise
ltr = pyepo.func.pointwiseLTR(optmodel, processes=2, solve_ratio=0.05, dataset=dataset_train)
# pairwise
ltr = pyepo.func.pairwiseLTR(optmodel, processes=2, solve_ratio=0.05, dataset=dataset_train)
# listwise
ltr = pyepo.func.listwiseLTR(optmodel, processes=2, solve_ratio=0.05, dataset=dataset_train)

Perturbation Gradient Loss (PG)

PG Loss function [9] a surrogate loss function of objective value, which measures the decision quality of the optimization problem. For PG Loss, the objective function is linear, and constraints are known and fixed, but the cost vector needs to be predicted from contextual data. According to Danskin’s Theorem, the PG Loss is derived from different zeroth order approximations and has the informative gradient. Thus, it allows us to design an algorithm based on stochastic gradient descent.

class pyepo.func.perturbationGradient(optmodel, sigma=0.1, two_sides=False, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for PG Loss, as a surrogate loss function of objective value, which measures the decision quality of the optimization problem.

For PG Loss, the objective function is linear, and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

According to Danskin’s Theorem, the PG Loss is derived from different zeroth order approximations and has the informative gradient. Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://arxiv.org/abs/2402.03256>

Parameters:

optmodel (optModel) – an PyEPO optimization model
sigma (float) – the amplitude of the finite difference width used for loss approximation
two_sides (bool) – approximate gradient by two-sided perturbation or not
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data

forward(pred_cost, true_cost): Forward pass

pyepo.func.perturbationGradient supports to solve optimization problems in parallel, parameter processes is the number of processors, 0 for using all available cores.

import pyepo

pg = pyepo.func.perturbationGradient(optmodel, sigma=0.1, two_sides=False, processes=2)

Parallel Computation

PyEPO supports parallel computation for solving optimization problems in training, where the parameter processes is the number of processors to be used.

The figure shows that the increasing of processes reduces the runtime.

Footnotes