pyepo.func

Pytorch autograd function for end-to-end training

Surrogate Loss

Smart Predict-then-Optimize Loss+ (SPO+)

SPO+ Loss function [1] is a surrogate loss function of SPO Loss (Regret), which measures the decision error of optimization problem. For SPO/SPO+ Loss, the objective function is linear and constraints are known and fixed, but the cost vector need to be predicted from contextual data. The SPO+ Loss is convex with non-zero subgradient.

class pyepo.func.SPOPlus(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for SPO+ Loss, as a surrogate loss function of SPO (regret) Loss, which measures the decision error of the optimization problem.

For SPO/SPO+ Loss, the objective function is linear and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

The SPO+ Loss is convex with subgradient. Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://doi.org/10.1287/mnsc.2020.3922>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data

forward(pred_cost, true_cost, true_sol, true_obj): Forward pass

Perturbation Gradient Loss (PG)

PG Loss function [9] a surrogate loss function of objective value, which measures the decision quality of the optimization problem. For PG Loss, the objective function is linear, and constraints are known and fixed, but the cost vector needs to be predicted from contextual data. According to Danskin’s Theorem, the PG Loss is derived from different zeroth order approximations and has the informative gradient. Thus, it allows us to design an algorithm based on stochastic gradient descent.

class pyepo.func.perturbationGradient(optmodel, sigma=0.1, two_sides=False, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for PG Loss, as a surrogate loss function of objective value, which measures the decision quality of the optimization problem.

For PG Loss, the objective function is linear, and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

According to Danskin’s Theorem, the PG Loss is derived from different zeroth order approximations and has the informative gradient. Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://arxiv.org/abs/2402.03256>

Parameters:

optmodel (optModel) – an PyEPO optimization model
sigma (float) – the amplitude of the finite difference width used for loss approximation
two_sides (bool) – approximate gradient by two-sided perturbation or not
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data

forward(pred_cost, true_cost): Forward pass

Black-box Methods

Differentiable Black-box Optimizer (DBB)

Diffenretiable black-box (DBB) optimizer function [2] estimates gradients from interpolation, replacing the zero gradients. For differentiable block-box, the objective function is linear and constraints are known and fixed, but the cost vector need to be predicted from contextual data.

class pyepo.func.blackboxOpt(optmodel, lambd=10, processes=1, solve_ratio=1, dataset=None)

An autograd module for differentiable black-box optimizer, which yield an optimal solution and derive a gradient.

For differentiable block-box, the objective function is linear and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

The block-box approximates the gradient of the optimizer by interpolating the loss function. Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://arxiv.org/abs/1912.02175>

Parameters:

optmodel (optModel) – an PyEPO optimization model
lambd (float) – a hyperparameter for differentiable block-box to control interpolation degree
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

Negative Identity Backpropagation (NID)

Negative Identity Backpropagation (NID) [6] treats the solver as a negative identity mapping during the backward pass, which is equivalent to DBB with certain hyperparameter. It is hyperparameter-free and does not require any additional computationally expensive call to the solver on the backward pass.

class pyepo.func.negativeIdentity(optmodel, processes=1, solve_ratio=1, dataset=None)

An autograd module for the differentiable optimizer, which yields optimal a solution and use negative identity as a gradient on the backward pass.

For negative identity backpropagation, the objective function is linear and constraints are known and fixed, but the cost vector needs to be predicted from contextual data.

If the interpolation hyperparameter λ aligns with an appropriate step size, then the identity update is equivalent to DBB. However, the identity update does not require an additional call to the solver during the backward pass and tuning an additional hyperparameter λ.

Reference: <https://arxiv.org/abs/2205.15213>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

Perturbed Methods

Differentiable Perturbed Optimizer (DPO)

Differentiable perturbed Optimizer (DPO) [3] uses Monte-Carlo samples to estimate solutions, in which randomly perturbed costs are sampled to optimize. For the perturbed optimizer, the cost vector needs to be predicted from contextual data and are perturbed with Gaussian noise. The perturbed optimizer is differentiable in its inputs with non-zero Jacobian.

class pyepo.func.perturbedOpt(optmodel, n_samples=10, sigma=1.0, processes=1, seed=135, solve_ratio=1, dataset=None)

An autograd module for Fenchel-Young loss using perturbation techniques. The use of the loss improves the algorithmic by the specific expression of the gradients of the loss.

For the perturbed optimizer, the cost vector needs to be predicted from contextual data and are perturbed with Gaussian noise.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://papers.nips.cc/paper/2020/hash/6bb56208f672af0dd65451f869fedfd9-Abstract.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – the amplitude of the perturbation
processes (int) – number of processors, 1 for single-core, 0 for all of cores
seed (int) – random state seed
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

Perturbed Fenchel-Young Loss (PYFL)

Perturbed Fenchel-Young loss (PYFL) function [3] uses perturbation techniques with Monte-Carlo samples. The use of the loss improves the algorithmic by the specific expression of the gradients of the loss. For the perturbed optimizer, the cost vector need to be predicted from contextual data and are perturbed with Gaussian noise. The Fenchel-Young loss allows to directly optimize a loss between the features and solutions with less computation.

class pyepo.func.perturbedFenchelYoung(optmodel, n_samples=10, sigma=1.0, processes=1, seed=135, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for Fenchel-Young loss using perturbation techniques. The use of the loss improves the algorithmic by the specific expression of the gradients of the loss.

For the perturbed optimizer, the cost vector need to be predicted from contextual data and are perturbed with Gaussian noise.

The Fenchel-Young loss allows to directly optimize a loss between the features and solutions with less computation. Thus, allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://papers.nips.cc/paper/2020/hash/6bb56208f672af0dd65451f869fedfd9-Abstract.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – the amplitude of the perturbation
processes (int) – number of processors, 1 for single-core, 0 for all of cores
seed (int) – random state seed
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data

forward(pred_cost, true_sol): Forward pass

Implicit Maximum Likelihood Estimator (I-MLE)

Implicit Maximum Likelihood Estimator (I-MLE) [7] use the perturb-and-MAP framework. They sample noise from a Sum-of-Gamma distribution and interpolate the loss function to approximate finite difference.

class pyepo.func.implicitMLE(optmodel, n_samples=10, sigma=1.0, lambd=10, distribution=<pyepo.func.utlis.sumGammaDistribution object>, two_sides=False, processes=1, solve_ratio=1, dataset=None)

An autograd module for Implicit Maximum Likelihood Estimator, which yield an optimal solution in a constrained exponential family distribution via Perturb-and-MAP.

For I-MLE, it works as black-box combinatorial solvers, in which constraints are known and fixed, but the cost vector need to be predicted from contextual data.

The I-MLE approximate gradient of optimizer smoothly. Thus, allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.neurips.cc/paper_files/paper/2021/hash/7a430339c10c642c4b2251756fd1b484-Abstract.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – noise temperature for the input distribution
lambd (float) – a hyperparameter for differentiable block-box to control interpolation degree
distribution (distribution) – noise distribution
two_sides (bool) – approximate gradient by two-sided perturbation or not
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

Adaptive Implicit Maximum Likelihood Estimator (AI-MLE)

Adaptive Implicit Maximum Likelihood Estimator (AI-MLE) [8] use the adaptive interpolation step and the perturb-and-MAP framework. They sample noise from a Sum-of-Gamma distribution and interpolate the loss function to approximate finite difference.

class pyepo.func.adaptiveImplicitMLE(optmodel, n_samples=10, sigma=1.0, distribution=<pyepo.func.utlis.sumGammaDistribution object>, two_sides=False, processes=1, solve_ratio=1, dataset=None)

An autograd module for Adaptive Implicit Maximum Likelihood Estimator, which adaptively choose hyperparameter λ and yield an optimal solution in a constrained exponential family distribution via Perturb-and-MAP.

For AI-MLE, it works as black-box combinatorial solvers, in which constraints are known and fixed, but the cost vector need to be predicted from contextual data.

The AI-MLE approximate gradient of optimizer smoothly. Thus, allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://ojs.aaai.org/index.php/AAAI/article/view/26103>

Parameters:

optmodel (optModel) – an PyEPO optimization model
n_samples (int) – number of Monte-Carlo samples
sigma (float) – noise temperature for the input distribution
distribution (distribution) – noise distribution
two_sides (bool) – approximate gradient by two-sided perturbation or not
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
dataset (None/optDataset) – the training data

forward(pred_cost): Forward pass

Contrastive Methods

Noise Contrastive Estimation (NCE)

Noise Contrastive Estimation (NCE) [4] serve as surrogate loss function based on negative examples. The key idea is to work with a small set of non-optimal solutions as negative samples. Thus, we can maximizes the difference of the probability between optimal solution and others.

class pyepo.func.NCE(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for noise contrastive estimation as surrogate loss functions, based on viewing suboptimal solutions as negative examples.

For the NCE, the cost vector needs to be predicted from contextual data and maximizes the separation of the probability of the optimal solution.

Thus allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://www.ijcai.org/proceedings/2021/390>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data, usually this is simply the training set

forward(pred_cost, true_sol): Forward pass

Contrastive Maximum A Posterior Estimation (CMAP)

Contrastive Maximum A Posteriori (CMAP) Loss function [4] is a special case of NCE where only samples the best one. It is simple but efficient.

class pyepo.func.contrastiveMAP(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for Maximum A Posterior contrastive estimation as surrogate loss functions, which is an efficient self-contrastive algorithm.

For the MAP, the cost vector needs to be predicted from contextual data and maximizes the separation of the probability of the optimal solution.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://www.ijcai.org/proceedings/2021/390>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (None/optDataset) – the training data, usually this is simply the training set

forward(pred_cost, true_sol): Forward pass

Learning-to-Rank Methods

LTR Loss function [5] is to learn an objective function that ranks a pool of feasible solutions correctly. LTR methods assign scores to the disparate solutions in pool, then establish surrogate loss functions predicated on these scores with the intention of ranking the optimal solution best.

Pointwise Learning-to-Rank (LTR)

Pointwise loss calculates the ranking scores of the items.

class pyepo.func.pointwiseLTR(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for pointwise learning to rank, where the goal is to learn an objective function that ranks a pool of feasible solutions correctly.

For the pointwise LTR, the cost vector needs to be predicted from contextual data, and calculates the ranking scores of the items.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.mlr.press/v162/mandi22a.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (optDataset) – the training data

forward(pred_cost, true_cost): Forward pass

Pairwise Learning-to-Rank (LTR)

Pairwise loss learns the relative ordering of pairs of items.

class pyepo.func.pairwiseLTR(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for pairwise learning to rank, where the goal is to learn an objective function that ranks a pool of feasible solutions correctly.

For the pairwise LTR, the cost vector needs to be predicted from the contextual data and the loss learns the relative ordering of pairs of items.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.mlr.press/v162/mandi22a.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (optDataset) – the training data

forward(pred_cost, true_cost): Forward pass

Listwise Learning-to-Rank (LTR)

Listwise loss measures the scores of the whole ranked lists.

class pyepo.func.listwiseLTR(optmodel, processes=1, solve_ratio=1, reduction='mean', dataset=None)

An autograd module for listwise learning to rank, where the goal is to learn an objective function that ranks a pool of feasible solutions correctly.

For the listwise LTR, the cost vector needs to be predicted from the contextual data and the loss measures the scores of the whole ranked lists.

Thus, it allows us to design an algorithm based on stochastic gradient descent.

Reference: <https://proceedings.mlr.press/v162/mandi22a.html>

Parameters:

optmodel (optModel) – an PyEPO optimization model
processes (int) – number of processors, 1 for single-core, 0 for all of cores
solve_ratio (float) – the ratio of new solutions computed during training
reduction (str) – the reduction to apply to the output
dataset (optDataset) – the training data, usually this is simply the training set

forward(pred_cost, true_cost): Forward pass

Footnotes