Layered-Least-Squares
Interior Point Methods
a combinatorial optimization perspective
László Végh
London School of Economics
joint work with
Daniel Dadush (CWI), Sophie Huiberts (CWI), and Bento Natura (LSE)
Aussois, 9 Jan 2020

## Linear Programming

In standard form, for $$A \in \mathbb{Q}^{m \times n}$$, $$b\in \mathbb{Q}^m$$, $$c\in \mathbb{Q}^n$$,
$$\begin{aligned} \min \; &c^\top x & \max \; & y^\top b \\ Ax& =b & \qquad A^\top y + s &= c \\ x &\geq 0 & s & \geq 0 \end{aligned}$$

## Timeline

• Origins: Fourier
• Simplex Method: Dantzig
• Ellipsoid Method: Khachiyan
• Interior point methods: Karmarkar
• Strongly polynomial algorithm for LP? Smale's 9th question

## Weakly vs Strongly Polynomial Algorithms for LP

LP with $$n$$ variables, $$m$$ constraints; $$L$$: encoding length of the input.

Weakly polynomial
• $$\mathrm{poly}(m,n,L)$$ basic arithmetic operations.

Strongly polynomial
• $$\mathrm{poly}(m,n)$$ basic arithmetic operations.
• PSPACE: all numbers occurring in the algorithm must remain polynomially bounded in the input size.

Ellipsoid and interior point methods: the running time bound relies heavily on $$L$$.

## Strongly Polynomial Algorithms for LP

Network flow problems
• Maximum flow: Edmonds–Karp–Dinitz '70–'72
• Min-cost flow: Tardos '85

Special classes of LP
• Feasibility of 2-variable-per-inequality systems: Megiddo '83
• Discounted Markov Decision Processes: Ye '05, Ye '11
• Maximum generalized flow problem: V. '17, Olver–V. '18
• ...

## Dependence on the constraint matrix only

$$\min c^\top x,\ Ax=b,\ x\ge 0$$
Running time depends only on the constraint matrix $$A$$, but not on $$b$$ and $$c$$.

'Combinatorial LPs'
• If $$A$$ is integral and $$|\mathrm{det}(B)| \leq \Delta$$ for every square submatrix $$B$$ of $$A$$, then the LP is solvable in $$\mathrm{poly}(m,n,\Delta)$$ arithmetic operations: Tardos '86

General LP
• 'Layered-least-squares (LLS) Interior Point Method': LP solvable in $$O(n^{3.5}\log\bar\chi_A)$$ linear system solves: Vavasis–Ye '96

Condition number $$\bar\chi_A$$
• $$\bar\chi_A = 2^{O(L_A)}$$ for integer $$A$$, where $$L_A$$ is the encoding length of $$A$$.
• Governs the stability of layered-least-squares solutions.
• Depends only on the subspace $$\ker(A)$$.
• NP-hard to approximate within a factor $$2^{\mathrm{poly}(\mathrm{rank}(A))}$$: Tunçel '99

## Scale invariance

$$\begin{aligned} \min \; &c^\top x & \max \; & y^\top b \\ Ax& =b & \qquad A^\top y + s &= c \\ x &\geq 0 & s & \geq 0 \end{aligned}$$

$$A\in \mathbb{R}^{m\times n}$$. Let $$\mathbf{D}$$ denote the set of $$n\times n$$ positive diagonal matrices.
• Since $$\bar\chi_A$$ only depends on $$\ker(A)$$, we have $$\bar\chi_A=\bar\chi_{RA}$$ for any nonsingular $$R\in\mathbb{R}^{m\times m}$$.
• However, we may have $$\bar\chi_A\neq \bar\chi_{AD}$$ for $$D\in\mathbf{D}$$.
• Diagonal rescaling of the problem: replace $$A'=AD$$, $$c'=Dc$$, $$b'=b$$ in the LP, for $$D\in \mathbf{D}$$.
• Corresponding variable transformation: $$x'=D^{-1} x$$, $$y'=y$$, $$s'=D s$$ (a small numerical sketch follows the list below).
• The central path is invariant under rescaling.
• Standard interior point methods are also invariant under rescaling.
• The Vavasis–Ye algorithm is not scaling invariant: $$O(n^{3.5} \log\bar\chi_A)$$ linear system solves [VY96].

## Is there a scaling invariant LLS interior point method?

$$\bar\chi_A^* := \inf\{\bar\chi_{AD}: D\in \mathbf{D}\}.$$
• Vavasis–Ye (VY): $$O(n^{3.5} \log\bar\chi_A)$$ linear system solves.
• A scaling invariant version of VY would automatically give $$O(n^{3.5} \log\bar\chi_A^*)$$.
• The (scaling invariant) Mizuno–Todd–Ye Predictor-Corrector algorithm finds an $$\varepsilon$$-approximate solution within $$O(n^{3.5}\log\bar\chi_A^* + n^2\log\log(1/\varepsilon))$$ iterations: Monteiro–Tsuchiya '05.
• $$O(n^{3.5}\log\bar\chi_A^*)$$ iterations with a scaling invariant algorithm, but each iteration depends on the bit-complexity of $$b$$ and $$c$$: Lan–Monteiro–Tsuchiya '09.
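To make the rescaling transformation above concrete, here is a minimal numpy sketch (illustrative data, not from the paper) checking that $$A'=AD$$, $$c'=Dc$$, $$x'=D^{-1}x$$, $$s'=Ds$$ preserves primal and dual feasibility and the objective value.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
x = rng.random(n) + 0.1            # a positive primal point
b = A @ x                          # choose b so that Ax = b
y = rng.standard_normal(m)
s = rng.random(n) + 0.1            # positive dual slacks
c = A.T @ y + s                    # choose c so that A^T y + s = c
D = np.diag(rng.random(n) + 0.1)   # positive diagonal rescaling

A2, c2 = A @ D, D @ c                          # rescaled data A' = AD, c' = Dc
x2, y2, s2 = np.linalg.solve(D, x), y, D @ s   # transformed variables

print(np.allclose(A2 @ x2, b))           # primal feasibility is preserved
print(np.allclose(A2.T @ y2 + s2, c2))   # dual feasibility is preserved
print(np.isclose(c2 @ x2, c @ x))        # objective value is unchanged
```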

## Our contributions

A scaling invariant algorithm: we give a scaling invariant LLS interior point method, settling the open question posed by Monteiro and Tsuchiya in '03.

Improved running time: we show a bound of $$O(n^{2.5}\log(\bar\chi_A^\ast +n)\log n)$$ linear system solves, achieved via an improved analysis that also applies to the original VY algorithm.

Finding a nearly optimal rescaling of $$A$$
• Tunçel '99: it is NP-hard to approximate $$\log \bar\chi_A$$ within an additive $$\mathrm{poly}(\mathrm{rank}(A))$$ error.
• Nevertheless, a nearly optimal rescaling can be computed via a minimum-mean cycle computation (see 'Approximating $$\kappa^*_A$$' below).

## The central path

• For $$\mu>0$$, the central path point $$w(\mu)=(x(\mu),y(\mu),s(\mu))$$ is the unique solution to $$Ax=b,\ A^\top y+s=c,\ x,s>0$$ with $$x_i s_i=\mu$$ for all $$i\in[n]$$.
• The central path is the algebraic curve formed by $$\{w(\mu): \mu>0\}$$.
• For $$\mu\to 0$$, the central path converges to an optimal solution $$w^*=(x^*,y^*,s^*)$$.

## The Mizuno–Todd–Ye Predictor-Corrector algorithm

• Start from a point $$w^0=(x^0,y^0,s^0)$$ 'near' the central path at some $$\mu^0>0$$.
• Alternate between two types of Newton steps.
• Predictor steps: 'shoot down' the central path, decreasing $$\mu$$ by a factor of at least $$1- \beta/\sqrt{n}$$; move slightly 'farther' from the central path.
• Corrector steps: do not change the parameter $$\mu$$, but move back 'closer' to the central path.
• Within $$O(\sqrt n)$$ iterations, $$\mu$$ decreases by at least a factor of 2.

## Predictor Newton step

We obtain the step direction $$\Delta w=(\Delta x,\Delta y,\Delta s)$$ as the solutions to
$$\min\, \sum_{i=1}^n\left(\frac{x_i+\Delta x_i}{x_i}\right)^2 \quad \text{s.t. } A\Delta x = 0$$

and
$$\min\, \sum_{i=1}^n\left(\frac{s_i+\Delta s_i}{s_i}\right)^2 \quad \text{s.t. } A^\top \Delta y + \Delta s = 0$$
The next iterate is obtained as $$w'=w+\alpha \Delta w = (x+\alpha\Delta x,\,y+\alpha \Delta y,\, s+\alpha\Delta s)$$ for some $$\alpha\in [0,1]$$.
$$\alpha=1$$ would terminate with an optimal solution.
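A minimal numpy sketch of one way to solve the two least-squares problems above at a strictly positive iterate $$(x,y,s)$$; this is an illustration of the formulas on the slide, with a made-up helper name, not the authors' implementation.

```python
import numpy as np

def predictor_direction(A, x, s):
    """Predictor direction from the two least-squares problems on the slide:
       min ||(x + dx)/x||  s.t.  A dx = 0
       min ||(s + ds)/s||  s.t.  A^T dy + ds = 0  (i.e. ds in range(A^T))."""
    n = A.shape[1]
    X, S = np.diag(x), np.diag(s)
    ones = np.ones(n)

    # With u = X^{-1} dx, the first problem is  min ||1 + u||  s.t. (AX) u = 0,
    # solved by projecting -1 orthogonally onto ker(AX).
    AX = A @ X
    P_ker = np.eye(n) - np.linalg.pinv(AX) @ AX
    dx = X @ (P_ker @ (-ones))

    # With v = S^{-1} ds, the second problem is  min ||1 + v||  s.t. v in range(S^{-1} A^T),
    # solved by projecting -1 orthogonally onto that range.
    M = np.linalg.solve(S, A.T)                      # S^{-1} A^T
    P_range = M @ np.linalg.pinv(M)
    ds = S @ (P_range @ (-ones))
    dy = np.linalg.lstsq(A.T, -ds, rcond=None)[0]    # recover dy from A^T dy + ds = 0
    return dx, dy, ds
```

The next iterate is then $$w+\alpha\Delta w$$ with a step length $$\alpha\in[0,1]$$ chosen so that the new point stays in the required neighbourhood of the central path.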

## Changes of $$x_i$$ and $$s_i$$ variables in the MTY Predictor-Corrector algorithm

## Recent weakly polynomial IPM successes

### A convenient characterization

Let $$W=\ker(A)$$. Recall that $$\bar\chi_A$$ only depends on the subspace $$W$$.
Define the lifting map $$L_I^W : \mathrm{proj}_I(W) \to W$$ by $$L_I^W(v) := \arg\min\left\{\|w\| : w_I = v,\ w \in W\right\}.$$ Then
$$\bar{\chi}_A =\max\left\{ \|L_I^W\| : \emptyset \neq I\subseteq [n]\right\}.$$
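For small instances this characterization can be evaluated directly. The sketch below is a brute-force illustration (exponential in $$n$$, assuming $$\ker(A)\neq\{0\}$$; helper name ours): with $$K$$ an orthonormal basis of $$W=\ker(A)$$, the minimum-norm lift of $$v$$ from the coordinates $$I$$ is $$K K_I^+ v$$, so $$\|L_I^W\|$$ is a spectral norm.

```python
import itertools
import numpy as np
from scipy.linalg import null_space

def chi_bar_bruteforce(A):
    """chi_bar = max over nonempty I of ||L_I^W||, with W = ker(A)."""
    K = null_space(A)                  # columns form an orthonormal basis of ker(A)
    n = A.shape[1]
    best = 0.0
    for r in range(1, n + 1):
        for I in itertools.combinations(range(n), r):
            KI = K[list(I), :]
            # L_I^W acts as the matrix K @ pinv(K_I) on proj_I(W)
            best = max(best, np.linalg.norm(K @ np.linalg.pinv(KI), 2))
    return best
```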

## Properties of $$\bar\chi_A$$

$$\bar\chi_A=\sup\left\{\|A^\top \left(A D A^\top\right)^{-1}AD\| : D\in {\mathbf D}\right\}$$
We also use $$\bar \chi_W=\bar\chi_A$$ for the subspace $$W=\ker(A)$$. The following hold:
1. If the entries of $$A$$ are all integers, then $$\bar\chi_A$$ is bounded by $$2^{O(L_A)}$$, where $$L_A$$ is the input bit length of $$A$$.
2. $$\bar\chi_A = \max\left\{ \|B^{-1} A\| : B \text{ non-singular } m \times m \text{ submatrix of } A\right\}.$$
3. $$\bar\chi_W=\bar\chi_{W^\perp}.$$

## Application: Final rounding step in standard IPMs

$$\begin{aligned} \min \; &c^\top x & \max \; & y^\top b \\ Ax& =b & \qquad A^\top y + s &= c \\ x &\geq 0 & s & \geq 0 \end{aligned}$$
• There exists a partition $$[n]=B^*\cup N^*$$ and optimal $$(x^*,y^*,s^*)$$ such that $$B^*=\mathrm{supp}(x^*),\, N^*=\mathrm{supp}(s^*)$$.
• Given $$w=(x,y,s)$$ close to the central path with 'small enough' $$\mu=x^\top s/n$$, so that for each $$i\in [n]$$, either $$x_i\ll s_i$$ or $$x_i\gg s_i$$.
• We 'guess' $$B:=\{i: x_i\gg s_i\},\quad N:=\{i: x_i\ll s_i\}.$$
• We try to move to $$w'=w+\Delta w$$ such that $$\mathrm{supp}(x')\subseteq B$$, $$\mathrm{supp}(s')\subseteq N$$, $$x',s'\ge 0$$.
• We need $$A\Delta x=0,\ \Delta x_N=-x_N,\ A^\top \Delta y+\Delta s=0,\ \Delta s_B=-s_B,$$ and for $$x+\Delta x\ge0,\ s+\Delta s\ge 0$$ it suffices that $$\|\Delta x_B\|\leq \min_{i \in B} x_i,\ \|\Delta s_N\|\leq \min_{j \in N} s_j.$$
• Natural choices are
$$\begin{aligned} \Delta x&:=\arg\min\left\{ \|u\| : u_N=-x_N,\ A u=0\right\}\\ \Delta s&:=\arg\min\left\{\|v\| : v_B=-s_B,\ A^\top z +v=0\right\} \end{aligned}$$
• By the lifting-map characterization (and $$\bar\chi_W=\bar\chi_{W^\perp}$$), $$\|\Delta x\|\le \bar\chi_A\|x_N\|$$ and $$\|\Delta s\|\le \bar\chi_A\|s_B\|$$, so the rounding succeeds once $$\mu$$ is small enough relative to $$\bar\chi_A$$.
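As an illustration, here is a numpy sketch of this rounding attempt, assuming the guessed partition is correct so that both minimum-norm problems are feasible; the simple comparison $$x_i>s_i$$ stands in for '$$x_i\gg s_i$$', and the helper name is made up for this example.

```python
import numpy as np
from scipy.linalg import null_space

def try_rounding(A, x, y, s):
    """Guess B = {i : x_i > s_i}, N = {i : x_i <= s_i} and compute the
    minimum-norm corrections dx, ds from the slide."""
    n = A.shape[1]
    B = np.where(x > s)[0]
    N = np.where(x <= s)[0]

    # dx: set dx_N = -x_N; then A_B dx_B = A_N x_N, solved in minimum norm.
    dx = np.zeros(n)
    dx[N] = -x[N]
    dx[B] = np.linalg.pinv(A[:, B]) @ (A[:, N] @ x[N])

    # ds = -A^T dy with ds_B = -s_B, i.e. A_B^T dy = s_B; among all such dy,
    # minimize ||ds_N|| = ||A_N^T dy|| over the remaining freedom.
    dy = np.linalg.lstsq(A[:, B].T, s[B], rcond=None)[0]
    Q = null_space(A[:, B].T)
    if Q.size:
        w = np.linalg.lstsq(A[:, N].T @ Q, -(A[:, N].T @ dy), rcond=None)[0]
        dy = dy + Q @ w
    ds = -A.T @ dy

    xn, yn, sn = x + dx, y + dy, s + ds
    ok = np.all(xn >= -1e-9) and np.all(sn >= -1e-9)
    return (xn, yn, sn) if ok else None
```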

## Approximating $$\kappa^*_A$$

For $$i,j\in[n]$$, let $$\kappa_{ij}:=\max\left\{|g_j/g_i| : g\in \ker(A),\ \mathrm{supp}(g)\in\mathcal{C},\ i,j\in\mathrm{supp}(g)\right\}$$, where $$\mathcal{C}$$ denotes the set of circuits (inclusion-wise minimal supports) of $$\ker(A)$$; let $$\kappa_A:=\max_{i,j}\kappa_{ij}$$ and $$\kappa^*_A:=\inf\{\kappa_{AD}: D\in\mathbf{D}\}$$.

[DHNV19+] $$\kappa^*_A=\max\left\{\prod_{(i,j)\in H} \kappa_{ij}^{1/|H|}:\, H \textrm{ is a cycle in the complete graph on }[n]\right\}$$
• Algorithmically, the optimal rescaling can be obtained via a minimum-mean cycle computation for the costs $$c_{ij}=-\log\kappa_{ij}$$.
• Minor caveat: computing the $$\kappa_{ij}$$ values is NP-hard.
• Luckily, for any $$g\in \ker(A)$$ with $$\mathrm{supp}(g)\in \mathcal{C}$$ and $$i,j\in \mathrm{supp}(g)$$, we have $$|g_j/g_i|\ge \kappa_{ij}/(\kappa^*_A)^2.$$
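The minimum-mean cycle step can be carried out with Karp's classical algorithm; the sketch below is a generic implementation on the complete digraph, taking a matrix of $$\kappa_{ij}$$ values (or lower bounds) as input. The helper names are ours, not the paper's.

```python
import numpy as np

def min_mean_cycle(C):
    """Karp's algorithm on the complete digraph with edge costs C[u, v], u != v.
    D[k, v] = minimum cost of a walk with exactly k edges from node 0 to v."""
    n = C.shape[0]
    D = np.full((n + 1, n), np.inf)
    D[0, 0] = 0.0
    for k in range(1, n + 1):
        for v in range(n):
            D[k, v] = min(D[k - 1, u] + C[u, v] for u in range(n) if u != v)
    means = []
    for v in range(n):
        if np.isfinite(D[n, v]):
            means.append(max((D[n, v] - D[k, v]) / (n - k)
                             for k in range(n) if np.isfinite(D[k, v])))
    return min(means)

def kappa_star_estimate(kappa):
    """kappa* as the maximum geometric-mean cycle of the kappa_ij values:
    exp(-mu), where mu is the minimum mean cycle for costs c_ij = -log kappa_ij.
    Diagonal entries of kappa are ignored."""
    K = np.array(kappa, dtype=float).copy()
    np.fill_diagonal(K, 1.0)
    return float(np.exp(-min_mean_cycle(-np.log(K))))
```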

## The Vavasis–Ye algorithm

### Learning the optimal partition of variables

• Assume the Predictor-Corrector method has already 'found' the partition $$[n]=B^*\cup N^*$$: $$x_i\gg x_j$$ and $$s_i\ll s_j$$ if $$i\in B^*$$, $$j\in N^*$$.
• A simple projection would find the optimal solution, but the usual predictor step does not:
$$\min\, \sum_{i=1}^n\left(\frac{x_i+\Delta x_i}{x_i}\right)^2 \quad \text{s.t. } A\Delta x = 0$$
• What does the Vavasis–Ye algorithm do here?
• First, solve
$$\min\, \sum_{\textcolor{red}{i\in N^*}}\left(\frac{x_i+\Delta \bar x_i}{x_i}\right)^2 \quad \text{s.t. } A\Delta \bar x = 0$$
The $$N^*$$ components of the optimal $$\Delta\bar x$$ give $$\Delta x_{N^*}$$; in this case, $$\Delta x_{N^*}=-x_{N^*}$$, i.e., the $$N^*$$ variables are driven exactly to $$0$$.
• Next, solve (sketched in code below)
$$\min\, \sum_{\textcolor{red}{i\in B^*}}\left(\frac{x_i+\Delta \bar x_i}{x_i}\right)^2 \quad \text{s.t. } A\Delta \bar x = 0,\ \Delta \bar x_{N^*}=\Delta x_{N^*}$$
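For the two-layer situation above, the layered least-squares direction can be computed as in the following numpy sketch, parametrizing $$\ker(A)$$ by an orthonormal basis; the structure and names are illustrative, not the paper's code.

```python
import numpy as np
from scipy.linalg import null_space

def two_layer_lls_direction(A, x, Nstar):
    """Two-layer LLS step: first fit the N* coordinates over ker(A); then, with
    dx_{N*} fixed, fit the B* coordinates over the remaining freedom."""
    n = A.shape[1]
    Bstar = np.setdiff1d(np.arange(n), Nstar)
    K = null_space(A)                      # columns span ker(A); write dx = K z

    # Stage 1:  min sum_{i in N*} ((x_i + dx_i)/x_i)^2   s.t.  dx in ker(A)
    M_N = K[Nstar, :] / x[Nstar][:, None]
    z = np.linalg.lstsq(M_N, -np.ones(len(Nstar)), rcond=None)[0]
    # (K @ z)[Nstar] = dx_{N*} is now fixed.

    # Stage 2: among dx = K z' with (K z')_{N*} = dx_{N*}, minimize the B* objective.
    # The remaining freedom is z' = z + Q w, with Q spanning ker(K_{N*}).
    Q = null_space(K[Nstar, :])
    if Q.size:
        M_B = K[Bstar, :] / x[Bstar][:, None]
        w = np.linalg.lstsq(M_B @ Q, -np.ones(len(Bstar)) - M_B @ z, rcond=None)[0]
        z = z + Q @ w
    return K @ z                           # the layered least-squares direction dx
```

If the guessed partition is correct, stage 1 drives the $$N^*$$ variables exactly to zero, and stage 2 perturbs the $$B^*$$ variables as little as possible in the rescaled norm.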

## Layering and crossover events in the Vavasis–Ye algorithm

• Arrange the variables into layers $$(J_1,J_2,\ldots,J_k)$$ as follows:
• Order $$x_1\ge x_2\ge\ldots\ge x_n$$.
• Start a new layer after $$x_i$$ whenever $$x_{i}>c\bar \chi_A x_{i+1}$$.
• Variables on lower layers 'barely influence' those on higher layers.
• Not scaling invariant!
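For concreteness, a tiny sketch of the layering rule described in the list above; $$\bar\chi_A$$ (or an upper estimate of it) and the constant $$c$$ are assumed to be given.

```python
import numpy as np

def vy_layers(x, chi_bar, c=1.0):
    """Sort variables by decreasing x_i and start a new layer whenever two
    consecutive values differ by more than a factor of c * chi_bar."""
    order = np.argsort(-x)             # indices sorted by decreasing x_i
    layers, current = [], [order[0]]
    for prev, cur in zip(order[:-1], order[1:]):
        if x[prev] > c * chi_bar * x[cur]:
            layers.append(current)     # big gap: close the current layer
            current = []
        current.append(cur)
    layers.append(current)
    return layers                      # [J_1, ..., J_k], largest values first
```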

### Progress measure

The variables $$x_i$$ and $$x_j$$ cross over between $$\mu$$ and $$\mu'$$ for $$\mu>\mu'>0$$, if
1. $$x_j\ge x_i/(\bar \chi_A)^n$$ at the iterate at $$\mu$$.
2. $$x_j(\mu'') < x_i(\mu'')/(\bar \chi_A)^n$$ at the central path point $$w(\mu'')$$, for every $$0<\mu''\le \mu'$$.
In the Vavasis–Ye algorithm, a crossover event happens within every $$O(n\log \bar\chi_A)$$ iterations.
We do not know which two variables cross over!

## Scale-invariant layering algorithm

#### [DHNV19+]

Instead of $$x_i/x_j$$, we look at the scale invariant $$\kappa_{ij} x_i/x_j$$.

• Layers are identified as the strongly connected components of the digraph formed by the edges $$(i,j)$$ such that $$\kappa_{ij}x_i/x_j\ge 1/\mathrm{poly}(n)$$.
• We do not know the $$\kappa_{ij}$$ values, but maintain gradually improving lower bounds on them.
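A sketch of how these layers could be formed from the current iterate and the maintained lower bounds on $$\kappa_{ij}$$, using scipy's strongly connected components; the threshold parameter stands in for $$1/\mathrm{poly}(n)$$, and the helper name is ours.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def scale_invariant_layers(x, kappa_lb, threshold):
    """Layers = strongly connected components of the digraph with an edge (i, j)
    whenever kappa_lb[i, j] * x_i / x_j >= threshold."""
    adj = (kappa_lb * x[:, None] / x[None, :]) >= threshold
    np.fill_diagonal(adj, False)
    n_comp, labels = connected_components(csr_matrix(adj), directed=True,
                                          connection='strong')
    return [np.where(labels == k)[0] for k in range(n_comp)]
```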

### Improved convergence analysis

The variables $$x_i$$ and $$x_j$$ cross over between $$\mu$$ and $$\mu'$$ for $$\mu>\mu'>0$$, if
1. $$\mathrm{poly}(n)(\bar \chi_A)^n>\kappa_{ij} x_i/x_j$$ at the iterate at $$\mu$$.
2. $$\kappa_{ij} x_i(\mu'')/x_j(\mu'') > \mathrm{poly}(n)(\bar \chi_A)^n$$ at the central path point $$w(\mu'')$$, for every $$0<\mu''\le \mu'$$.
• We do not use crossover events directly, but a more fine-grained potential function.
• This improves the overall number of iterations from $$O(n^{3.5}\log(\bar\chi_A+n))$$ to $$O(n^{2.5}\log(\bar\chi_A^*+n)\log n)$$.
• The analysis is also applicable to the original Vavasis–Ye algorithm.

## Conclusions and open questions

• There is scope to further improve the running time, both in the iteration count and by using faster linear algebra.
• Understand and further explore the combinatorics of $$\bar\chi_A$$ and $$\kappa_A$$.
• Practical advice: preprocessing the problem with a near-optimal column rescaling may not hurt.
• Strongly polynomial LP: need to 'leave' the central path.