STOCHASTIC ORIGIN FRANK-WOLFE FOR
TRAFFIC ASSIGNMENT

Igor Ignashin

Moscow Institute of Physics and Technology

9, Institutskiy per., Dolgoprudny 141700, Russia

ignashin.in@phystech.edu

Demyan Yarmoshik

Moscow Institute of Physics and Technology

9, Institutskiy per., Dolgoprudny 141700, Russia

yarmoshik.dv@phystech.edu

Andrei Raigorodskii

Moscow Institute of Physics and Technology

9, Institutskiy per., Dolgoprudny 141700, Russia

mraigor@yandex.ru

In this paper, we present the Stochastic Origin Frank-Wolfe (SOFW) method, which is a special case of the block-coordinate Frank-Wolfe algorithm, applied to the problem of finding equilibrium flow distributions. By significantly reducing the computational complexity of the minimization oracle, the method improves overall efficiency at the cost of increased memory consumption. Its key advantage lies in minimizing the number of shortest path computations.

We refer to existing theoretical convergence guarantees for generalized coordinate Frank-Wolfe methods and, in addition, extend the analysis by providing a convergence proof for a batched version of the Block-Coordinate Frank-Wolfe algorithm, which was not covered in the original work. We also demonstrate the practical effectiveness of our approach through experimental results. In particular, our findings show that the proposed method significantly outperforms the classical Frank-Wolfe algorithm and its variants on large-scale datasets. On smaller datasets, SOFW also remains effective, though the performance gap relative to classical methods becomes less pronounced. In such cases, there is a trade-off between solution quality, iteration time complexity, and memory usage.

1 Introduction

Predicting flows in urban transportation networks is an essential problem for urban planning and traffic management. Short-term (seconds to hours) flow prediction is usually approached with machine-learning techniques [9]. For making decisions on mid- and long-term timescales (days to decades), engineers apply mathematical models, which can roughly be classified into static models, dynamic models, and multi-agent simulation models. Static equilibrium transportation models play a special role due to their simplicity, computational efficiency, and the abundant research and civil practice related to them [5]. They are still the default choice for multistage models of large networks, where traffic demand is also estimated within the model, resulting in multiple evaluations of the flow distribution in a single model run [13].

Static equilibrium models are mainly represented by Beckmann’s model [4] and a less popular model of Nesterov and De-Palma [17]. The first model is equivalent to a nonlinear convex min-cost multicommodity flow problem (MCF), while the latter is a linear capacitated min-cost MCF.

For both models, much attention has been devoted to the design of efficient algorithms for finding equilibria. One of the effective methods for finding equilibrium in Beckmann’s model is the link-based Frank-Wolfe method [6, 15]. Existing modifications, such as Conjugate Frank-Wolfe [16] or N-Conjugate Frank-Wolfe [8], provide a significant advantage over the basic variant. More advanced path-based and bush-based algorithms were designed to achieve solutions with higher accuracy, but come with drawbacks such as increased memory usage or complicated implementation [18, 3]. Distinct from these are approaches based on solving the dual problem for both Beckmann’s and Nesterov–De-Palma’s models [7, 12] and the approach based on a saddle-point formulation [20, 21].

All algorithmic approaches except the saddle-point approach require finding the shortest path between each origin-destination pair at each iteration. This is typically done by calling Dijkstra’s algorithm from each origin node. Thus the complexity of each iteration grows with the network size as O(sn\log n), where n is the number of nodes of the road graph and s is the number of origin vertices, which makes it prohibitive to consider highly detailed models where the number of origins approaches the total number of nodes in the graph.

Note that a similar difficulty in machine learning was bypassed by the stochastic gradient descent (SGD) method, which generates randomized step directions using small portions of a (large) dataset instead of computing the exact gradient over all training samples at each iteration. The idea of using stochastic dual gradient methods for Beckmann’s model was described in [7], but was never validated experimentally: derivative works with numerical results [12, 13] used only deterministic algorithms.

This paper studies stochastic variants of the Frank-Wolfe and universal primal-dual algorithms for finding equilibria in Beckmann’s model. We propose the Stochastic Origin Frank-Wolfe (SOFW) method, a special case of the Block-Coordinate Frank-Wolfe algorithm proposed in [14]. The idea behind this method is to perform a Frank-Wolfe step on a subset of correspondences only. For comparison, the idea of the N-Conjugate Frank-Wolfe method [8] is to find a new direction conjugate to the N previous directions.

The key findings of this paper are as follows:

  1. We propose the SOFW method, a block-coordinate variant of Frank-Wolfe with reduced oracle complexity via sparse flow decompositions.

  2. A weighted variant (SOFW-w) improves initial convergence by prioritizing important updates.

  3. The method introduces a memory–computation trade-off: more memory allows fewer shortest path calls.

  4. Experiments on real-world datasets show that SOFW significantly outperforms Frank-Wolfe on large graphs.

  5. On small graphs, SOFW remains competitive, though the performance gap is smaller.

  6. SOFW-w converges faster initially, but aligns with SOFW over time.

2 Problem statement

In this paper we consider the problem of finding an equilibrium distribution of flows in Beckmann’s model, which is equivalent to the following convex optimization problem:

\Psi(f)=\sum_{e\in\mathscr{E}}\underbrace{\int_{0}^{f_{e}}\tau_{e}(z)\,dz}_{\sigma_{e}(f_{e})}\;\longrightarrow\;\min_{f\in\mathscr{F}}.

Let us introduce the notation. The network is represented by a directed graph \mathscr{G}=(\mathscr{E},\mathscr{V}), where edges correspond to roads.

Beckmann’s model assumes that travel time on a road is a monotonically increasing function of the flow on that road. Usually, the Bureau of Public Roads (BPR) congestion function is used [19]:

\tau_{e}(f_{e})=\overline{t}_{e}\left(1+\rho\left(\frac{f_{e}}{\overline{f}_{e}}\right)^{\frac{1}{\mu}}\right),\qquad\tau(f)\equiv\{\tau_{e}(f_{e})\}_{e\in\mathscr{E}},

where \rho,\mu,\overline{t}_{e},\overline{f}_{e} are road parameters of the model.
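As an illustration, the BPR function is straightforward to vectorize over all edges. The Python sketch below uses the common textbook defaults \rho=0.15 and \mu=0.25 (i.e., the fourth-power BPR curve); these particular parameter values are our assumption, not fixed by the model.

```python
import numpy as np

def bpr_travel_time(f, t_free, f_cap, rho=0.15, mu=0.25):
    """BPR congestion function: tau_e(f_e) = t_e * (1 + rho * (f_e / f_cap_e)^(1/mu)).

    f, t_free, f_cap are arrays over edges: current flow, free-flow travel
    time, and capacity. The defaults rho=0.15, mu=0.25 give the classic
    fourth-power curve.
    """
    return t_free * (1.0 + rho * (f / f_cap) ** (1.0 / mu))

# At exactly capacity flow, travel time is t_free * (1 + rho):
tau = bpr_travel_time(np.array([100.0]), np.array([10.0]), np.array([100.0]))
print(tau)  # [11.5]
```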

The model is given a predefined set of origin-destination pairs w=(i,j) and the amount of traffic (the correspondence) d_{w} between each origin-destination pair over the considered period of time:

O,\,D — sets of origin and destination vertices, O,D\subset\mathscr{V}
OD\equiv\{w=(i,j)~|~i\in O,\,j\in D\}
\mathscr{P}\equiv\{\mathscr{P}_{w}\}_{w\in OD}, where \mathscr{P}_{w} is the set of all paths from i to j for w=(i,j)

The correspondences of each origin-destination pair are additively partitioned into flows along the paths connecting these pairs:

\mathscr{X}=\Big\{x\;\Big|\;d_{w}=\sum_{p\in\mathscr{P}_{w}}x_{p},\;x_{p}\geqslant 0\;\;\forall w\in OD\Big\}.

Flows on all paths passing through a given road add up to the total flow on that road. The set of feasible edge flows is given by the image of \mathscr{X} under the linear operator defined by the path-edge incidence matrix \Theta\in\{0,1\}^{|\mathscr{E}|\times|\mathscr{P}|}:

\mathscr{F}=\left\{f\in\mathbb{R}^{|\mathscr{E}|}\,\middle|\,\exists\,x\in\mathscr{X}\text{ such that }f=\Theta x\right\}.
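A tiny concrete instance may help: with three edges and two paths for a single OD pair of demand d_w = 10, the edge flows are obtained by one matrix-vector product (the numbers are illustrative only).

```python
import numpy as np

# Toy instance: one OD pair with demand d_w = 10 and two paths.
# Path 1 traverses edges {0, 1}; path 2 traverses edge {2}.
Theta = np.array([[1, 0],
                  [1, 0],
                  [0, 1]], dtype=float)   # |E| x |P| path-edge incidence
x = np.array([7.0, 3.0])                  # path flows, x >= 0, summing to d_w
f = Theta @ x                             # induced edge flows
print(f)  # [7. 7. 3.]
```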

2.1 Frank-Wolfe

Algorithm 1 Frank–Wolfe
1: f_{0}\in\mathscr{F} — starting point
2: k:=0
3: repeat
4:   s_{k}^{FW}:=\arg\min_{s\in\mathscr{F}}\langle\nabla\Psi(f_{k}),s\rangle
5:   d_{k}:=s_{k}^{FW}-f_{k}
6:   \gamma_{k}:=\arg\min_{\gamma\in[0,1]}\Psi(f_{k}+\gamma d_{k})
7:   f_{k+1}:=f_{k}+\gamma_{k}d_{k}
8:   k:=k+1
9: until k\geqslant\text{max iter}

The main loop of the algorithm begins by applying the linear minimization oracle in line 4. Next, the step direction is obtained in line 5. In lines 6–7 the step size is found by line search and the step is performed.

Since \nabla\Psi(f_{k})=\tau(f_{k}), the linear minimization oracle minimizes the total travel time experienced by all users, summed over all correspondences:

\min_{s\in\mathscr{F}}\langle\nabla\Psi(f_{k}),s\rangle=\min_{s\in\mathscr{F}}\langle\tau(f_{k}),s\rangle=\sum_{w\in OD}T_{p_{w}^{*}}(f_{k})\cdot d_{w},

where p_{w}^{*} is one of the shortest paths between origin i and destination j of the correspondence w=(i,j)\in OD, and T_{p_{w}^{*}}(f_{k}) is the travel time along the shortest path p_{w}^{*}. Therefore, the solution to this linear subproblem is to distribute all traffic along the shortest paths. The variable s\in\mathscr{F} is then efficiently calculated from the output of Dijkstra’s (or a similar one-to-all shortest-path) algorithm by summing flows over the shortest-path trees of all origins.

The key observation is that the Frank-Wolfe step requires finding shortest paths for all correspondences. On large road graphs this can become computationally expensive. Our main interest is therefore in improving the method by making a single iteration more computationally efficient.

Note that the whole algorithm can also be formulated and implemented in a path-based fashion, i.e., in the space of flows on paths \mathscr{X}. The present link-based form with flows on edges is preferable due to lower memory usage: in the worst case a new shortest path is found at each iteration, so a path-based implementation would require storing hundreds of paths for each correspondence. However, we will use path-based notation in the next section to make the reduction to the block-coordinate Frank-Wolfe algorithm. Despite that, in the actual implementation we instead use an origin-based representation to store intermediate results.
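To make the link-based scheme concrete, here is a minimal self-contained Python sketch of Algorithm 1. It simplifies under our own assumptions: the graph is a small adjacency list, the linear oracle is a textbook Dijkstra plus a walk back along the shortest-path tree, every destination is assumed reachable, and the predefined step size 2/(k+2) stands in for the line search of line 6.

```python
import heapq
import numpy as np

def shortest_path_tree(n_nodes, adj, cost, src):
    """Dijkstra from `src`. adj[u] = list of (v, edge_id); cost[e] = edge cost.
    Returns pred[v] = id of the tree edge entering v (-1 for root/unreached)."""
    dist = np.full(n_nodes, np.inf)
    pred = np.full(n_nodes, -1, dtype=int)
    dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, e in adj[u]:
            nd = d + cost[e]
            if nd < dist[v]:
                dist[v], pred[v] = nd, e
                heapq.heappush(pq, (nd, v))
    return pred

def all_or_nothing(n_nodes, adj, edges, cost, demands):
    """Linear minimization oracle: route every correspondence along a
    current shortest path (one Dijkstra call per origin)."""
    s = np.zeros(len(edges))
    for origin in {i for (i, _) in demands}:
        pred = shortest_path_tree(n_nodes, adj, cost, origin)
        for (i, j), d in demands.items():
            if i != origin:
                continue
            v = j
            while v != origin:          # walk back along the tree
                e = pred[v]
                s[e] += d
                v = edges[e][0]         # tail of edge e
    return s

def frank_wolfe(n_nodes, adj, edges, demands, tau, n_iter=50):
    """Link-based Frank-Wolfe with the simple 2/(k+2) step size."""
    f = all_or_nothing(n_nodes, adj, edges, tau(np.zeros(len(edges))), demands)
    for k in range(n_iter):
        s = all_or_nothing(n_nodes, adj, edges, tau(f), demands)
        f += (2.0 / (k + 2.0)) * (s - f)
    return f

# Toy network: edges 0->1, 1->2, 0->2 with constant costs 1, 1, 3.
edges = [(0, 1), (1, 2), (0, 2)]
adj = {0: [(1, 0), (2, 2)], 1: [(2, 1)], 2: []}
f = frank_wolfe(3, adj, edges, {(0, 2): 10.0}, lambda g: np.array([1.0, 1.0, 3.0]))
print(f)  # all 10 units on the cheaper path 0->1->2: [10. 10.  0.]
```

With a congestion-dependent tau the iterates split flow between competing paths; the constant-cost example only exercises the oracle.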

3 Stochastic Origin Frank-Wolfe

The proposed Stochastic Origin Frank-Wolfe (SOFW) method is a structured variant of the Block-Coordinate Frank-Wolfe algorithm [14], adapted to the setting of equilibrium flow distributions in transportation networks.

We consider the path flow vector x\in\mathscr{X}\subseteq\mathbb{R}^{|\mathscr{P}|} as a concatenation of blocks:

x=\big(x^{(i,j)}\big)_{(i,j)\in OD},\quad x^{(i,j)}\in\mathbb{R}^{|\mathscr{P}_{i,j}|},

where each block corresponds to the set of paths \mathscr{P}_{i,j} between a given origin-destination (OD) pair (i,j). These blocks satisfy the demand constraints:

\sum_{p\in\mathscr{P}_{i,j}}x_{p}=d_{ij},\quad x_{p}\geqslant 0.

At each iteration, SOFW samples a random subset of OD pairs \mathscr{I}\subset OD and updates only the corresponding sub-blocks x^{(i,j)}, while freezing the rest. The optimization is thus restricted to the subspace:

\mathscr{X}_{\mathscr{I}}=\left\{x\in\mathscr{X}\,\middle|\,x^{(i,j)}=\overline{x}^{(i,j)}\text{ for }(i,j)\notin\mathscr{I}\right\},

and the linear oracle is solved over the set F(\mathscr{X}_{\mathscr{I}})=\{\Theta x\mid x\in\mathscr{X}_{\mathscr{I}}\}\subset\mathscr{F}.

To make efficient use of shortest-path algorithms, we sample a random subset by sampling a fraction of origin vertices and including all corresponding OD pairs. This results in a computationally efficient approximation of the classical Frank-Wolfe direction that requires only a few calls to Dijkstra’s (or a similar shortest-path) algorithm, while preserving the convergence guarantees established in the block-coordinate framework [14]. The complete method is presented in Algorithm 2.

Algorithm 2 Stochastic Origin Frank-Wolfe
1: Choose an initial flow f_{0}\in\mathscr{F} and a distribution P((i,j))=w_{ij}
2: Set iteration counter k:=0
3: repeat
4:   Sample a random subset \mathscr{I}_{k}\subset OD with probabilities \propto w_{ij} (selected OD pairs)
5:   Define the subspace \mathscr{X}_{k}=\{x\in\mathscr{X}\mid x^{(i,j)}=x_{k}^{(i,j)}\text{ for all }(i,j)\notin\mathscr{I}_{k}\}
6:   s_{k}:=\arg\min_{s\in F(\mathscr{X}_{k})}\langle\nabla\Psi(f_{k}),s\rangle
7:   Compute the direction
     d_{k}:=\frac{1}{|\mathscr{I}_{k}|}W_{k}\sum_{(i,j)\in\mathscr{I}_{k}}\frac{1}{w_{ij}}\left(F(s_{k}^{[i,j]})-F(x_{k}^{[i,j]})\right),\quad W_{k}:=\Big(\sum_{(i,j)\in\mathscr{I}_{k}}\frac{1}{w_{ij}}\Big)^{-1}
8:   Choose a step size \gamma_{k}\in[0,1], e.g., \gamma_{k}=\frac{2}{k+1} or via line search
9:   Update the flow: f_{k+1}:=f_{k}+\gamma_{k}d_{k}
10:  Increment k:=k+1
11: until stopping criterion is satisfied
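A simplified sketch of one unweighted SOFW iteration may clarify the block structure. This is our own illustration, not the paper's implementation: the flow is kept as per-origin bundles, and only the bundles of the sampled origins are moved toward their all-or-nothing flows, each by the same step \gamma; the importance weights of line 7 are omitted.

```python
import numpy as np

def sofw_step(f_total, f_by_origin, sampled, aon_from_origin, tau, gamma):
    """One unweighted SOFW update.

    f_by_origin[a] -- edge-flow vector induced by origin a (they sum to f_total);
    aon_from_origin(a, cost) -- all-or-nothing flow from origin a under `cost`
    (one Dijkstra call). Only sampled origins move; the rest stay frozen.
    """
    cost = tau(f_total)            # edge costs are fixed within the iteration
    for a in sampled:
        d_a = aon_from_origin(a, cost) - f_by_origin[a]
        f_by_origin[a] = f_by_origin[a] + gamma * d_a
        f_total = f_total + gamma * d_a    # keep the aggregate flow in sync
    return f_total, f_by_origin

# Tiny check on a 2-edge network with two origins and a fixed, fake oracle:
f_by_origin = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
f_total = f_by_origin[0] + f_by_origin[1]
aon = lambda a, cost: np.array([0.0, 2.0])   # pretend edge 1 is always shortest
f_total, f_by_origin = sofw_step(f_total, f_by_origin, [0], aon,
                                 lambda f: f, gamma=0.5)
print(f_total)  # origin 0 moved halfway toward [0, 2]: [0.5 2. ]
```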

In the weighted variant, origin-destination (OD) pairs are sampled by first selecting sources with probabilities proportional to their total demand, and then uniformly sampling correspondences associated with the chosen sources. Formally, the probability of sampling a correspondence (i,j) is given by

\mathbb{P}((i,j))=\frac{d_{i}}{\sum_{k}d_{k}}\cdot\frac{1}{|\mathscr{C}_{i}|},

where d_{i}=\sum_{j}d_{ij} is the total demand of source i, and |\mathscr{C}_{i}| is the number of correspondences originating from source i. This approach ensures that sources with higher total demand are sampled more frequently, while correspondences within each source are equally weighted. An appropriate importance-weighted correction is applied to guarantee an unbiased estimate of the Frank-Wolfe gap. For simplicity, in our experiments we do not sample individual correspondences directly. Instead, we perform batched sampling over sources: at each iteration, we randomly (with weights \sim\frac{d_{i}}{\sum_{k}d_{k}}) select a fraction \alpha of all sources. Then, for each selected source i, we include all of its associated correspondences (i,j) without additional subsampling. This approach is natural and efficient, as it allows computing shortest paths from a given source to all destinations in a single pass, thereby reducing overhead while preserving the sparsity and scalability benefits of the SOFW method.
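The batched source sampling just described can be sketched in a few lines. This is a sketch under our assumptions; NumPy's weighted choice without replacement stands in for whatever sampler an implementation actually uses.

```python
import numpy as np

def sample_sources(d_origin, alpha, rng):
    """Pick a fraction `alpha` of sources, without replacement, with
    probabilities proportional to the total demand d_i of each source."""
    n = len(d_origin)
    k = max(1, int(alpha * n))
    p = d_origin / d_origin.sum()
    return rng.choice(n, size=k, replace=False, p=p)

rng = np.random.default_rng(0)
d_origin = np.array([10.0, 1.0, 1.0, 1.0])  # source 0 carries most demand
picks = [sample_sources(d_origin, 0.25, rng)[0] for _ in range(1000)]
print(np.mean([a == 0 for a in picks]))  # close to 10/13, approx 0.77
```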

4 Method convergence

The convergence of the proposed SOFW method follows from the general analysis of the Block-Coordinate Frank-Wolfe algorithm on product domains [14]. We consider a non-batched setting, where at each iteration the algorithm selects a source uniformly at random and selects all blocks corresponding to it. This corresponds to the unweighted variant of SOFW with the sample size equal to the number of destinations, i.e., a single source is selected and all corresponding OD pairs originating from it are updated simultaneously. While this analysis focuses on that regime, it can be extended to batched updates, where several blocks are sampled and updated simultaneously; we provide such an extension below.

In our setting, the feasible set of path flows is a Cartesian product:

\mathscr{X}=\prod_{(i,j)\in OD}\mathscr{X}^{(i,j)},

and the set of feasible edge flows is given by the linear mapping F:\mathscr{X}\to\mathscr{F}, defined by F(x)=\Theta x.

Block curvature constant.

Let \Psi:\mathbb{R}^{|\mathscr{E}|}\to\mathbb{R} be a convex and differentiable objective (e.g., the Beckmann potential). The curvature of \Psi\circ F over block (i,j)\in OD is defined as:

C^{(i,j)}_{\Psi}:=\sup_{\substack{x\in\mathscr{X},\ s^{(i,j)}\in\mathscr{X}^{(i,j)},\\ \gamma\in[0,1],\ y=x+\gamma(s^{[i,j]}-x^{[i,j]})}}\frac{2}{\gamma^{2}}\left[\Psi(F(y))-\Psi(F(x))-\left\langle\nabla\Psi(F(x)),\,F(y^{[i,j]})-F(x^{[i,j]})\right\rangle\right],

where s^{[i,j]},x^{[i,j]}\in\mathscr{X} denote the liftings of the block variables to the product space, with nonzero entries only in the components indexed by \mathscr{P}_{i,j}.

The total curvature over the product domain is then given by:

C^{\otimes}_{\Psi}:=\sum_{(i,j)\in OD}C^{(i,j)}_{\Psi}.

Primal convergence.

Consider the variant of SOFW where at each iteration a single correspondence (i,j)\in\mathrm{OD} is sampled uniformly at random and updated. Let f_{k}=F(x_{k}) denote the iterate at step k. Then, for this exact variant, the expected suboptimality satisfies [14]:

\mathbb{E}[\Psi(f_{k})]-\Psi(f^{*})\;\leqslant\;\frac{2|\mathrm{OD}|}{k+2|\mathrm{OD}|}\left[C^{\otimes}_{\Psi}+\Psi(f_{0})-\Psi(f^{*})\right].

Therefore, to obtain an \varepsilon-accurate solution in expectation, it suffices to run \mathscr{O}\!\left(\frac{|\mathrm{OD}|}{\varepsilon}\right) iterations.
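The iteration count follows from a one-line rearrangement of the bound above: writing H:=C^{\otimes}_{\Psi}+\Psi(f_{0})-\Psi(f^{*}),

```latex
\frac{2|\mathrm{OD}|}{k+2|\mathrm{OD}|}\,H \;\leqslant\; \varepsilon
\quad\Longleftrightarrow\quad
k \;\geqslant\; 2|\mathrm{OD}|\left(\frac{H}{\varepsilon}-1\right)
\;=\;\mathscr{O}\!\left(\frac{|\mathrm{OD}|}{\varepsilon}\right)
\quad\text{for fixed } H.
```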

This result can be extended to the batched setting, where multiple blocks are sampled and updated simultaneously, although the theoretical rate in terms of the iteration count remains of the same order.

Batched primal convergence.

We consider the batched SOFW method that, at iteration k, samples (without replacement) a mini-batch \mathscr{I}_{k}\subset\mathrm{OD} of size m uniformly at random, solves the usual Frank–Wolfe linear subproblem on each sampled block, and moves along the averaged direction:

d_{k}=\frac{1}{m}\sum_{(i,j)\in\mathscr{I}_{k}}\bigl(s_{k}^{[i,j]}-x_{k}^{[i,j]}\bigr),\qquad x_{k+1}=x_{k}+\gamma_{k}d_{k},\quad\gamma_{k}\in[0,1].

Let the full Frank–Wolfe gap decompose across blocks as

g(x):=\sum_{(i,j)\in\mathrm{OD}}g^{(i,j)}(x),\qquad g^{(i,j)}(x):=\Big\langle\nabla\Psi(F(x)),\,F\bigl(x^{[i,j]}-s^{[i,j]}(x)\bigr)\Big\rangle,

where s^{[i,j]}(x) denotes an optimal Frank–Wolfe atom for block (i,j) at x (lifted to the product space).

Recall the block curvature constants C_{\Psi}^{(i,j)} and their sum C_{\Psi}^{\otimes}:=\sum_{(i,j)\in\mathrm{OD}}C_{\Psi}^{(i,j)}.

We first record two auxiliary facts.

Remark 4.1 (Single-block progress inequality).

For any x\in\mathscr{X}, block (i,j)\in\mathrm{OD}, atom s^{[i,j]}(x), and step \gamma\in[0,1], by the definition of the block curvature constant C_{\Psi}^{(i,j)}, we have

\Psi\bigl(F\bigl(x+\gamma(s^{[i,j]}(x)-x^{[i,j]})\bigr)\bigr)\;\leqslant\;\Psi(F(x))-\gamma\,g^{(i,j)}(x)+\frac{\gamma^{2}}{2}\,C_{\Psi}^{(i,j)}.
Lemma 4.1 (Convex averaging across a mini-batch).

Let \{v_{r}\}_{r=1}^{m} be points in \mathscr{X} and v=\frac{1}{m}\sum_{r=1}^{m}v_{r}. By convexity of \Psi\circ F,

\Psi(F(v))\leqslant\frac{1}{m}\sum_{r=1}^{m}\Psi(F(v_{r})).

Applying this with v_{r}=x_{k}+\gamma_{k}(s_{k}^{[i_{r},j_{r}]}-x_{k}^{[i_{r},j_{r}]}) for (i_{r},j_{r})\in\mathscr{I}_{k} and v=x_{k}+\gamma_{k}d_{k} gives

\Psi(F(x_{k+1}))=\Psi\!\Bigl(F\!\Bigl(x_{k}+\gamma_{k}\frac{1}{m}\sum_{(i,j)\in\mathscr{I}_{k}}(s_{k}^{[i,j]}-x_{k}^{[i,j]})\Bigr)\Bigr)\leqslant\frac{1}{m}\sum_{(i,j)\in\mathscr{I}_{k}}\Psi\!\bigl(F(x_{k}+\gamma_{k}(s_{k}^{[i,j]}-x_{k}^{[i,j]}))\bigr).

Invoking the single-block progress inequality term-wise gives, for any realized mini-batch \mathscr{I}_{k},

\Psi(F(x_{k+1}))\leqslant\Psi(F(x_{k}))-\gamma_{k}\frac{1}{m}\sum_{(i,j)\in\mathscr{I}_{k}}g^{(i,j)}(x_{k})+\frac{\gamma_{k}^{2}}{2}\frac{1}{m}\sum_{(i,j)\in\mathscr{I}_{k}}C_{\Psi}^{(i,j)}.

Taking the conditional expectation with respect to the random mini-batch \mathscr{I}_{k} (uniform sampling without replacement), we have

\mathbb{E}\!\bigl[\Psi(F(x_{k+1}))\,\big|\,x_{k}\bigr]\leqslant\Psi(F(x_{k}))-\gamma_{k}\frac{1}{m}\,\mathbb{E}\!\Bigl[\sum_{(i,j)\in\mathscr{I}_{k}}g^{(i,j)}(x_{k})\Bigr]+\frac{\gamma_{k}^{2}}{2}\frac{1}{m}\,\mathbb{E}\!\Bigl[\sum_{(i,j)\in\mathscr{I}_{k}}C_{\Psi}^{(i,j)}\Bigr]
=\Psi(F(x_{k}))-\gamma_{k}\frac{1}{m}\cdot m\cdot\frac{1}{|\mathrm{OD}|}\sum_{(i,j)\in\mathrm{OD}}g^{(i,j)}(x_{k})+\frac{\gamma_{k}^{2}}{2}\frac{1}{m}\cdot m\cdot\frac{1}{|\mathrm{OD}|}\sum_{(i,j)\in\mathrm{OD}}C_{\Psi}^{(i,j)}
=\Psi(F(x_{k}))-\gamma_{k}\frac{1}{|\mathrm{OD}|}\,g(x_{k})+\frac{\gamma_{k}^{2}}{2}\,\frac{1}{|\mathrm{OD}|}\,C_{\Psi}^{\otimes}.

Observation: the expected update recursion is identical to the non-batched case. Thus, batching does not change the theoretical convergence rate in terms of iteration count, but reduces the variance and may improve wall-clock time in practice.

Theorem 4.1 (Expected primal convergence of batched SOFW).

Let \{x_{k}\}_{k\geqslant 0} be the iterates of the batched SOFW method defined above with any batch size m and step sizes

\gamma_{k}=\frac{2}{k+2|\mathrm{OD}|}.

Then, for all k\geqslant 0,

\mathbb{E}\bigl[\Psi(F(x_{k}))\bigr]-\Psi(F(x^{*}))\;\leqslant\;\frac{2|\mathrm{OD}|}{k+2|\mathrm{OD}|}\Bigl[C_{\Psi}^{\otimes}+\Psi(F(x_{0}))-\Psi(F(x^{*}))\Bigr].

Consequently, an \varepsilon-accurate solution in expectation is obtained after \mathscr{O}\!\left(\frac{|\mathrm{OD}|}{\varepsilon}\right) iterations.

Proof.

Define the suboptimality gap \Delta_{k}:=\mathbb{E}[\Psi(F(x_{k}))]-\Psi(F(x^{*})). From the conditional expectation above, we have

\Delta_{k+1}\leqslant\Delta_{k}-\gamma_{k}\frac{1}{|\mathrm{OD}|}\,\mathbb{E}[g(x_{k})]+\frac{\gamma_{k}^{2}}{2}\,\frac{1}{|\mathrm{OD}|}\,C_{\Psi}^{\otimes}.

Since \mathbb{E}[g(x_{k})]\geqslant\Delta_{k} (the Frank–Wolfe gap dominates the primal gap in expectation), we obtain

\Delta_{k+1}\leqslant\Delta_{k}\left(1-\frac{\gamma_{k}}{|\mathrm{OD}|}\right)+\frac{\gamma_{k}^{2}}{2|\mathrm{OD}|}C_{\Psi}^{\otimes}.

Choosing \gamma_{k}=\tfrac{2}{k+2|\mathrm{OD}|}, the standard induction argument used in the proof of Theorem C.2 of [14] (with \nu=1) applies directly and yields

\Delta_{k}\leqslant\frac{2|\mathrm{OD}|}{k+2|\mathrm{OD}|}\Bigl[C_{\Psi}^{\otimes}+\Psi(F(x_{0}))-\Psi(F(x^{*}))\Bigr].

This concludes the proof. ∎

Remarks on batching.

The above analysis extends the standard Block-Coordinate Frank–Wolfe framework [14] to the batched setting, which was not explicitly covered in the original paper. In our experiments, batching arises naturally because we sample all OD-pairs corresponding to one or several sources at each iteration.

It is important to note that batching, as analyzed here, does not improve the theoretical iteration complexity: the expected progress per iteration remains the same due to the normalization by the batch size m. To achieve an improved rate through batching, one would need to adjust certain assumptions or redefine the curvature constants (e.g., taking advantage of block-wise structure or stronger smoothness). Such refinements are beyond the current scope and would require a more careful theoretical treatment.

Remarks on stochastic dual methods for traffic assignment.

We identified two difficulties in applying stochastic primal-dual methods in this setup. First, there is no ready-to-use variant of USTM (the only known primal-dual method whose performance is comparable to Frank-Wolfe) with an inexact function value oracle [12, 13]: by default, universal methods require comparing the objective values at the current and next candidate points [22], which requires finding all shortest paths. Second, the convergence guarantees for primal-dual methods hold only when the primal variable is restored using special averaging formulas, which, in the case of stochastic steps, yield infeasible approximate solutions (i.e., flows that do not satisfy the demand constraints), making them far less favorable for practical applications.

5 Experiments

We evaluate the performance of the proposed Stochastic Origin Frank-Wolfe (SOFW) method against the classical Frank-Wolfe (FW) algorithm, as well as several variants involving different sampling strategies for origin-destination correspondences. In particular, we compare the standard SOFW variant, which uniformly samples a subset of correspondences with varying fractions \alpha of the total number of origin-destination pairs |OD|, and the weighted variant (SOFW-w) described in the algorithm section, which samples correspondences proportionally to the weights of their respective sources, where each source weight equals the sum of demands over all correspondences originating from it. For both variants, each correspondence linked to a given source is assigned an equal weight. We also include the classical FW method as a baseline. Experiments are conducted on large-scale transportation datasets from the TransportationNetworks repository: Chicago-Regional, GoldCoast, Birmingham-England, Chicago-Sketch, and Philadelphia. Additionally, we evaluate on smaller benchmark instances: SiouxFalls, Anaheim, Terrassa-Asymmetric, and Winnipeg.

The SOFW algorithm is designed as follows: suppose there are n origin nodes and m destination nodes (i.e., n\times m OD pairs). At each iteration, we sample a fixed fraction of sources, e.g., 10\% of the total origins, and compute shortest paths from these selected sources to all destinations. This reduces the computational cost of the shortest-path search by roughly an order of magnitude.

To implement this scheme, we decompose the total flow into components corresponding to each origin node. At each iteration, only the flows associated with a sampled subset of origins \mathscr{A}\subseteq\{1,\dots,n\} are updated, while flows corresponding to the remaining origins are fixed at their current values. Formally, the total edge flow f can be expressed as

f=\sum_{a=1}^{n}f^{(a)},

where f^{(a)} denotes the flow induced by origin a. During the algorithm, we update only the components f^{(a)} for a\in\mathscr{A}, keeping the others fixed.

Although maintaining separate flow components increases memory usage by a factor of approximately \frac{n}{|\mathscr{A}|}, where n is the total number of origins and |\mathscr{A}| is the number of sampled origins per iteration, this approach significantly reduces the per-iteration computational cost by limiting shortest path computations to the sampled subset.

Surprisingly, this memory-compute tradeoff leads to substantial practical improvements. Despite increased memory usage, SOFW achieves nearly an order-of-magnitude better Frank-Wolfe gap for the same computational budget compared to the classical FW method.

An important factor behind the growing performance gap between SOFW and the classical FW method on larger datasets lies in the relationship between the stochastic update quality and the computational complexity of shortest-path calculations.

Empirically, the quality of the stochastic update remains relatively stable even when only a small fraction of origins is selected, i.e., the accuracy of the descent direction does not degrade in proportion to the sampling rate. In contrast, the computational complexity of a full Frank-Wolfe step scales as

\mathscr{O}(n\cdot(E+V\log V)),

where n is the number of origin nodes, V=|\mathscr{V}| is the number of nodes in the network, and E=|\mathscr{E}| is the number of edges. This reflects the cost of solving single-source shortest path problems from all origins.

By sampling only a fraction \alpha of the origins (e.g., \alpha=0.1), SOFW reduces the per-iteration cost to \mathscr{O}(\alpha\cdot n\cdot(E+V\log V)), leading to substantial savings in large-scale settings.
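The saving is easy to quantify; the network sizes in the snippet below are hypothetical round numbers, not measurements from any particular dataset.

```python
import math

def lmo_cost(n_origins, V, E):
    """Shortest-path work per iteration: one Dijkstra, O(E + V log V), per origin."""
    return n_origins * (E + V * math.log(V))

V, E, n = 12_000, 39_000, 1_800          # hypothetical network sizes
full = lmo_cost(n, V, E)                 # classical FW: all origins
sofw = lmo_cost(int(0.1 * n), V, E)      # SOFW with alpha = 0.1
print(round(full / sofw, 6))  # 10.0 -- the LMO cost drops linearly in alpha
```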

As the number of origins and the network size increase, this computational advantage becomes more pronounced, resulting in faster convergence and better practical performance compared to the classical FW algorithm.

We summarize key experimental findings:

  1. SOFW significantly outperforms FW on large-scale graphs.

  2. The advantage of SOFW increases with the size of the network.

  3. On small networks, SOFW still performs competitively, although the gap between FW and SOFW becomes less pronounced due to the overhead from flow decomposition.

  4. A clear memory-quality tradeoff is observed: increasing memory usage via flow decomposition yields better convergence rates.

  5. The weighted variant SOFW-w demonstrates faster initial progress compared to standard SOFW, but the performance of both methods converges in the long run.

Figure 1: Dataset: Chicago-Sketch.
Figure 2: Dataset: Birmingham-England.
Figure 3: Comparison of SOFW and FW on low-scale datasets.
Figure 4: Dataset: Chicago-Regional.
Figure 5: Dataset: GoldCoast.
Figure 6: SOFW effectiveness on medium-size transportation networks.
Figure 7: Dataset: Philadelphia. Comparison of SOFW and FW on a large-scale dataset.
Figure 8: Dataset: SiouxFalls. SOFW vs FW on a small network.
Figure 9: Dataset: Anaheim. SOFW vs FW on a small network.
Figure 10: Dataset: Terrassa-Asymmetric. SOFW vs FW on a small network.
Figure 11: Dataset: Winnipeg. SOFW vs FW on a small network.
Figure 12: Comparison of SOFW and FW on small-scale transportation datasets.

6 Conclusion

Stochastic approaches demonstrate clear advantages in large-scale transportation modeling, particularly when working with massive datasets typical of large metropolitan areas. In such settings, solving the linear subproblem exactly becomes computationally prohibitive due to the vast number of origin-destination (OD) pairs.

Reducing the problem size by either limiting the number of OD pairs or introducing stochasticity in the selection of origins for shortest path computations provides a practical trade-off between solution quality and computational effort. By sampling a subset of origins at each iteration, it is possible to significantly reduce computational overhead while maintaining satisfactory convergence behavior.

These stochastic schemes enable scalable optimization on mega-scale networks, facilitating efficient and timely traffic assignment and flow estimation in real-world urban systems.

Moreover, it is essential to explore ways of combining stochastic methods with other algorithmic techniques. In particular, an interesting direction for future research is to incorporate stochasticity, such as block-coordinate sampling, into benchmark methods like NFW, potentially yielding more efficient and flexible hybrid algorithms.

In future work, we aim to extend the theoretical analysis to the batched version of the algorithm, which samples and updates multiple sources and their corresponding blocks simultaneously. This extension is expected to reduce the constant factor in the convergence bounds proportionally to the batch size, thereby improving iteration complexity and practical performance.

References

  • 3. Babazadeh, A., Javani, B., Gentile, G., Florian, M.: Reduced gradient algorithm for user equilibrium traffic assignment problem. Transportmetrica A: Transport Science 16(3), 1111–1135 (2020)
  • 4. Beckmann, M., McGuire, C.B., Winsten, C.B.: Studies in the economics of transportation. Tech. rep. (1956)
  • 5. Boyles, S.D., Lownes, N.E., Unnikrishnan, A.: Transportation Network Analysis, vol. 1 (2023), edition 0.91
  • 6. Frank, M., Wolfe, P.: An algorithm for quadratic programming. Naval Research Logistics Quarterly 3(1-2), 95–110 (1956)
  • 7. Gasnikov, A.V., Dvurechenskii, P., Dorn, Y.V., Maksimov, Y.V.: Numerical methods for the problem of traffic flow equilibrium in the Beckmann and the stable dynamic models. Matematicheskoe modelirovanie 28(10), 40–64 (2016)
  • 8. Ignashin, I., Yarmoshik, D.: Modifications of the Frank-Wolfe algorithm in the problem of finding the equilibrium distribution of traffic flows. Mathematical Modeling and Numerical Simulation 10(1), 10–25 (2024)
  • 9. Jiang, W., Luo, J.: Graph neural network for traffic forecasting: A survey. Expert systems with applications 207, 117921 (2022)
  • 10. Kerdreux, T.: Accelerating Frank-Wolfe methods. Ph.D. thesis, Université Paris sciences et lettres (2020)
  • 11. Kerdreux, T., Pedregosa, F., d'Aspremont, A.: Frank-Wolfe with subsampling oracle. In: International Conference on Machine Learning. pp. 2591–2600. PMLR (2018)
  • 12. Kubentayeva, M., Gasnikov, A.: Finding equilibria in the traffic assignment problem with primal-dual gradient methods for stable dynamics model and Beckmann model. Mathematics 9(11), 1217 (2021)
  • 13. Kubentayeva, M., Yarmoshik, D., Persiianov, M., Kroshnin, A., Kotliarova, E., Tupitsa, N., Pasechnyuk, D., Gasnikov, A., Shvetsov, V., Baryshev, L., et al.: Primal-dual gradient methods for searching network equilibria in combined models with nested choice structure and capacity constraints. Computational Management Science 21(1),  15 (2024)
  • 14. Lacoste-Julien, S., Jaggi, M., Schmidt, M., Pletscher, P.: Block-coordinate Frank-Wolfe optimization for structural SVMs. In: International Conference on Machine Learning. PMLR (2013)
  • 15. Levitin, E.S., Polyak, B.T.: Constrained minimization methods. USSR Computational mathematics and mathematical physics 6(5), 1–50 (1966)
  • 16. Mitradjieva, M., Lindberg, P.O.: The stiff is moving—conjugate direction Frank-Wolfe methods with applications to traffic assignment. Transportation Science 47(2), 280–293 (2013)
  • 17. Nesterov, Y., De Palma, A.: Stationary dynamic solutions in congested transportation networks: summary and perspectives. Networks and spatial economics 3, 371–395 (2003)
  • 18. Perederieieva, O., Ehrgott, M., Raith, A., Wang, J.Y.: A framework for and empirical study of algorithms for traffic assignment. Computers & Operations Research 54, 90–107 (2015)
  • 19. Transportation Networks for Research Core Team: Transportation networks for research. https://github.com/bstabler/TransportationNetworks (2024), accessed: 2024-02-29
  • 20. Yarmoshik, D., Persiianov, M.: On the application of saddle-point methods for combined equilibrium transportation models. In: International Conference on Mathematical Optimization Theory and Operations Research. pp. 432–448. Springer (2024)
  • 21. Zhang, F., Boyd, S.: Solving large multicommodity network flow problems on gpus. arXiv preprint arXiv:2501.17996 (2025)
  • 22. Gasnikov, A.V., Nesterov, Y.E.: Universal method for stochastic composite optimization problems. Computational Mathematics and Mathematical Physics 58(1), 48–64 (2018)