Tree-based formulation for the multi-commodity flow problem

Simon Spoorendonk
Flowty ApS
Denmark
simon@flowty.ai
&Bjørn Petersen
Flowty ApS
Denmark
bjorn@flowty.ai

Abstract

We introduce a tree-based formulation for the minimum-cost multi-commodity flow problem that addresses large-scale instances. The method decomposes the source-based model by representing flows as convex combinations of trees rooted at source nodes, and solves the resulting formulation with column generation. The number of demand constraints now depends on the number of sources $|S|$ , not commodities $|K|$ , yielding a compact master problem when $|S|\ll|K|$ . We conduct a computational study comparing tree-based decomposition against path-based column generation and direct LP solving. The results show speed-ups of up to one order of magnitude over direct LP solving, and improved scalability compared to path-based formulations. Tree-based decomposition enables solving instances with millions of commodities and hundreds of thousands of nodes. This makes it well-suited for applications in transportation and logistics networks where multiple demands often share common origins.

Keywords Multi-commodity flow problem $\cdot$ Column generation

1 Introduction

The minimum-cost multi-commodity flow (MCF) problem is a fundamental network optimization problem with applications in logistics, passenger transportation, telecommunication, and energy systems [1, 2]. It routes multiple origin–destination demands through a capacitated network at minimum cost. The MCF problem can be solved as a linear programming model, but the number of variables and constraints grows rapidly, making large-scale instances computationally challenging.

The classical edge-based formulation introduces $O(|K||E|)$ variables and $O(|K||V|+|E|)$ constraints. This becomes intractable when either the commodity set $K$ or the network size grows. Source-based formulations aggregate by source nodes, reducing the model to $O(|S||E|)$ variables and $O(|S||V|+|E|)$ constraints. When $|S|\ll|K|$ this yields savings, but commodity-level flows must then be reconstructed [1]. Path-based formulations, obtained via Dantzig–Wolfe decomposition, replace edge variables by path variables. Each commodity $k\in K$ has an exponential path set $\mathcal{P}_{k}$ , and column generation dynamically adds paths [3, 4]. However, the master problem still contains $O(|K|)$ demand constraints, and restricted master problems grow prohibitively large when $|K|$ reaches millions.

Research has therefore focused on decomposition and relaxation approaches. Stabilized column generation methods [5], analytic center cutting-plane algorithms with active sets, and Lagrangian relaxation [6] have extended the tractable instance size. Recently, Zhang and Boyd [7] proposed a GPU–compatible primal–dual hybrid gradient method for the all-pair max-flow variant of the MCF problem, achieving orders-of-magnitude speed-ups over interior-point solvers. It remains to be investigated whether this is also true for the min-cost MCF problem. Destination-aggregated formulations have also been investigated in communication and optical networks [8]. These advances illustrate that scalability is based on exploiting the problem structure and avoiding the explicit introduction of unnecessary flow variables.

To address the master problem bottleneck, we study a tree-based decomposition of the MCF problem that are derived from the source-based formulation by representing flows as combinations of trees rooted at source nodes. This approach drastically reduces the size of the master problem in column generation while retaining the ability to model individual commodity demands. The master problem has now $O(|S|)$ demand constraints and $O(|E|)$ capacity constraints. The number of variables is still exponentially large but in practice much smaller than the path-based formulation since fewer tree variables are needed in an optimal basis. The approach pushes the scalability frontier of exact MCF optimization, solving instances with tens of thousands of nodes and millions of commodities.

The remainder of the paper is organized as follows. section 2 introduces the formulations of the MCF problem including the tree-based formulation introduced in this paper. section 3 describes the column generation algorithm used for the path- and tree-based formulations. section 4 presents computational experiments on large-scale instances, and section 5 concludes with directions for future research.

2 Problem statement

We present four formulations: the edge-based formulation, the source-based formulation, the path-based formulation commonly solved via column generation, and our novel tree-based formulation.

We consider a directed network $G=(V,E)$ with node set $V$ and edge set $E$ . Each edge $e\in E$ has capacity $u_{e}\geq 0$ . The set of commodities is $K$ , with each commodity $k\in K$ defined by source $s_{k}\in V$ , sink $t_{k}\in V$ , and demand $d_{k}>0$ . Sending one unit of commodity over edge $e$ incurs cost $c_{e}$ . The objective of the MCF problem is to route all demands at minimum cost without exceeding edge capacities.

2.1 Edge-based formulation

The classical edge-based linear program introduces variables $f^{k}_{e}$ for the flow of commodity $k$ on edge $e$ . The model is

$\displaystyle\min\quad$	$\displaystyle\sum_{k\in K}\sum_{e\in E}c_{e}f^{k}_{e}$		(1)
s.t.	$\displaystyle\sum_{e\in\delta^{+}(i)}f^{k}_{e}-\sum_{e\in\delta^{-}(i)}f^{k}_{e}=\begin{cases}d_{k},&i=s_{k},\\ -d_{k},&i=t_{k},\\ 0,&\text{otherwise},\end{cases}$	$\displaystyle\forall i\in V,\;k\in K,$	(2)
	$\displaystyle\sum_{k\in K}f^{k}_{e}\leq u_{e},$	$\displaystyle\forall e\in E,$	(3)
	$\displaystyle f^{k}_{e}\geq 0,$	$\displaystyle\forall e\in E,\;\forall k\in K.$	(4)

where $\delta^{+}(i)$ and $\delta^{-}(i)$ denote the outgoing and incoming edges of node $i$ , respectively. This formulation has $O(|K||E|)$ variables and $O(|K||V|+|E|)$ constraints.

2.2 Source-based formulation

Source-based formulations aggregate flows by source node. For each source $s\in S$ , let $f^{s}_{e}$ denote the total flow on edge $e$ originating at $s$ , and let $d^{s}_{i}=\sum_{k\in K:s_{k}=s}d_{k}$ denote the total demand from $s$ to node $i$ . The formulation is

$\displaystyle\min\quad$	$\displaystyle\sum_{s\in S}\sum_{e\in E}c_{e}f^{s}_{e}$		(5)
s.t.	$\displaystyle\sum_{e\in\delta^{+}(i)}f^{s}_{e}-\sum_{e\in\delta^{-}(i)}f^{s}_{e}=\begin{cases}d^{s}_{i}&\text{if }i=s\\ -d_{k}&\text{if }i=t_{k}\forall k\in K:s_{k}=s\\ 0&\text{otherwise}\end{cases}$	$\displaystyle\forall i\in V,\;\forall s\in S,$	(6)
	$\displaystyle\sum_{s\in S}f^{s}_{e}\leq u_{e},$	$\displaystyle\forall e\in E,$	(7)
	$\displaystyle f^{s}_{e}\geq 0,$	$\displaystyle\forall e\in E,\;\forall s\in S.$	(8)

This model has $O(|S||E|)$ variables and $O(|S||V|+|E|)$ constraints, where $|S|$ is the number of unique source nodes. When $|S|\ll|K|$ , this yields a significant reduction in model size compared to the edge-based formulation. The trade-off is that the commodity-level flow decomposition is lost and must be reconstructed in a post-processing step.

Remark (on reconstruction).

The source-aggregated flows can be disaggregated back into individual commodity flows through standard flow decomposition algorithms. For each source-sink pair $(s,t)$ , we iteratively extract $s$ – $t$ paths via augmenting-path search until the required demand is satisfied. This procedure yields a path-based decomposition with at most $|E|$ paths per source-sink pair and runs in $O(|E||V|)$ time per pair [1]. The resulting paths directly correspond to feasible commodity routings that respect capacity constraints [2].

2.3 Path-based formulation

Path-based formulations rely on a Dantzig–Wolfe decomposition of the edge-based formulation. For each commodity $k\in K$ , let $\mathcal{P}_{k}$ be the set of all feasible paths from $s_{k}$ to $t_{k}$ . Introduce variables $x^{k}_{p}\geq 0$ denoting the flow of commodity $k$ on path $p\in\mathcal{P}_{k}$ . The model is

$\displaystyle\min\quad$	$\displaystyle\sum_{k\in K}\sum_{p\in\mathcal{P}_{k}}c_{p}x^{k}_{p}$		(9)
s.t.	$\displaystyle\sum_{p\in\mathcal{P}_{k}}x^{k}_{p}=d_{k},$	$\displaystyle\forall k\in K,$	(10)
	$\displaystyle\sum_{k\in K}\sum_{p\in\mathcal{P}_{k}:e\in p}x^{k}_{p}\leq u_{e},$	$\displaystyle\forall e\in E,$	(11)
	$\displaystyle x^{k}_{p}\geq 0,$	$\displaystyle\forall k\in K,\;p\in\mathcal{P}_{k}.$	(12)

where $c_{p}=\sum_{e\in p}c_{e}$ is the cost of path $p$ .

This formulation arises as a Dantzig–Wolfe decomposition of the edge-based model. Each commodity $k$ defines a sub-problem whose feasible region is the set of all $s_{k}$ – $t_{k}$ flows of value $d_{k}$ . The extreme points of this polytope correspond to simple $s_{k}$ – $t_{k}$ paths. Thus, path variables $x^{k}_{p}$ play the role of convex combination weights over extreme points in the decomposition [9, 10, 3, 4]. The formulation has $O(|K|+|E|)$ constraints. The size of the path set $|\mathcal{P}_{k}|$ is exponential in $|V|$ and can be solved with column generation, see section 3.

2.4 Tree-based formulation

The tree-based formulation is the equivalent of a Dantzig–Wolfe decomposition of the source-based formulation by representing flows as convex combinations of shortest-path trees rooted at each source. Instead of sending flow independently along paths, we aggregate commodities with the same source into tree structures that simultaneously provide feasible routes to all sinks of that source. This yields a compact master problem, since the number of trees required can be much smaller than the number of paths.

For each source $s\in S$ , let $\mathcal{T}_{s}$ denote the set of trees rooted at $s$ that contain all sinks of commodities with $s_{k}=s$ . We denote the sinks $K_{s}=\{k\in K:s_{k}=s\}$ . A decision variable $x^{s}_{\tau}\geq 0$ is associated with each tree $\tau\in\mathcal{T}_{s}$ , indicating the fraction of flow routed through $\tau$ . Let $e\in\tau$ be an edge in the tree $\tau$ and let $p_{k}$ be the path from $s_{k}$ to $t_{k}$ in the tree $\tau$ for $k\in K_{s}$ . The model is

$\displaystyle\min\quad$	$\displaystyle\sum_{s\in S}\sum_{\tau\in\mathcal{T}_{s}}c^{\tau}x^{s}_{\tau}$		(13)
s.t.	$\displaystyle\sum_{\tau\in\mathcal{T}_{s}}x^{s}_{\tau}=1,$	$\displaystyle\forall s\in S,$	(14)
	$\displaystyle\sum_{s\in S}\sum_{\tau\in\mathcal{T}_{s}:\,e\in\tau}\bar{f}^{e}_{\tau}x^{s}_{\tau}\leq u_{e},$	$\displaystyle\forall e\in E,$	(15)
	$\displaystyle x^{s}_{\tau}\geq 0,$	$\displaystyle\forall s\in S,\;\tau\in\mathcal{T}_{s}.$	(16)

Here $\bar{f}^{e}_{\tau}=\sum_{k\in K_{s}}\sum_{p_{k}\in\tau:e\in p_{k}}d_{k}$ is the total flow on edge $e$ in tree $\tau$ , and $c_{\tau}=\sum_{e\in\tau}\bar{f}^{e}_{\tau}c_{e}$ denotes the cost of tree $\tau$ . The formulation has $O(|S|+|E|)$ constraints. The tree set $|\mathcal{T}_{s}|$ is exponential in size.

3 Column Generation

Column generation is a powerful decomposition technique for solving large-scale linear programming problems that would be intractable to solve directly. The method has been extensively studied and successfully applied to many problem classes including vehicle routing, crew scheduling, and network flow problems [3, 4].

The approach decomposes the original problem into a restricted master problem (RMP) containing only a subset of variables, and one or more pricing sub-problems that generate new variables (columns) as needed. The algorithm iterates between solving the RMP to obtain dual values, and using these to find columns with negative reduced cost in the pricing problems. When no such columns exist, the current solution is optimal.

For the MCF problem, we apply column generation to both the path-based and tree-based formulations described above. The key difference lies in the structure of the pricing problems - finding shortest paths versus shortest path trees. Below we detail the specific column generation procedures for each formulation, followed by implementation techniques that exploit the network structure to accelerate convergence.

3.1 Path generation

For the path-based formulation the column generation procedure alternates between:

•

Restricted master problem (RMP): solve (9)–(12) using only a subset of paths from $\cup_{k\in K}\mathcal{P}_{k}$ .
•

Pricing sub-problem: for each commodity $k$ , compute a shortest $s_{k}$ – $t_{k}$ path with respect to the reduced costs given by the dual variables of the RMP. If a path of negative reduced cost is found, it is added as a new column.

Let $\pi_{k}$ denote the dual variable associated with the demand constraint of commodity $k$ (10), and let $\mu_{e}$ denote the dual variable associated with the capacity constraint on edge $e$ (11). Then the reduced cost of a path $p\in\mathcal{P}_{k}$ is

\displaystyle\bar{c}_{p}=\sum_{e\in p}\big(c_{e}-\mu_{e}\big)-\pi_{k}.

(17)

Thus, finding a column of negative reduced cost for commodity $k$ amounts to computing an $s_{k}$ – $t_{k}$ path that minimizes $\sum_{e\in p}(c_{e}-\mu_{e})$ and then subtracting the demand dual $\pi_{k}$ . Since all edge weights $c_{e}\geq 0$ and $\mu_{e}\leq 0$ , the modified costs $c_{e}-\mu_{e}$ remain bounded below, so the pricing problem reduces to a shortest path problem in a weighted graph.

This allows the use of efficient single-source shortest path algorithms such as Dijkstra’s method. In particular, when many commodities share the same source $s$ , a single run of Dijkstra’s algorithm yields shortest paths to all sinks $t_{k}$ with $s_{k}=s$ , producing one pricing candidate for each commodity simultaneously. This reduces the number of Dijkstra runs per iteration from $|K|$ to $|S|$ .

3.2 Tree generation

For the tree-based formulation the procedure is:

•

Restricted master problem (RMP): solve (13)-(16) using only a subset of trees from $\cup_{k\in K}\mathcal{T}_{s}$ .
•

Pricing sub-problem: for each source $s\in S$ , compute a shortest path tree rooted at $s$ that covers all sinks $t_{k}$ where $s_{k}=s$ .

Let $\pi_{s}$ denote the dual variable of (14) and $\mu_{e}$ the dual variable of (15). The reduced cost of a tree $\tau\in\mathcal{T}_{s}$ is

\displaystyle\bar{c}_{\tau}=\sum_{e\in\tau}\bar{f}^{e}_{\tau}\big(c_{e}-\mu_{e}\big)-\pi_{s}.

(18)

Hence, pricing reduces to finding a minimum-cost tree rooted at $s$ with respect to modified edge costs $c^{s}_{e}-\mu_{e}$ . Since all edge weights are non-negative, this corresponds to computing a shortest-path tree (SPT). A single-source shortest path computation from source $s$ yields a set of shortest paths $p_{k}$ to all sinks $t_{k}$ with $s_{k}=s$ where the unique set of edges $\tau=\cap_{e\in p_{k},k\in K_{s}}e$ is the tree. The cost of the paths corresponds to sending one unit of flow from $s$ to all sinks $t_{k}$ with $s_{k}=s$ . The edge flow values $\bar{f}^{e}_{\tau}$ are calculated in a post-processing step as described in subsection 2.4 which allows us to calculate the reduced cost of the tree.

Observe that every path in the tree is the minimum reduced cost path from $s$ to any sink $t_{k}$ with $s_{k}=s$ . Hence, the tree is the minimum reduced cost tree even when scaled by demands $d_{k}$

3.3 Implementation details

Bounded single-source pricing.

For a fixed source $s$ , define $\Pi_{s}:=\max_{k:\,s_{k}=s}\pi_{k}$ . During a single run of Dijkstra on adjusted edge weights $\tilde{c}_{e}:=c_{e}-\mu_{e}$ , extract-min keys are tentative distances $d(v)$ from $s$ . If the current minimum key satisfies $d(v)\geq\Pi_{s}$ , then for every remaining destination $t_{k}$ with $s_{k}=s$ we have

\underbrace{d(t_{k})}_{\text{shortest}}-\pi_{k}\;\geq\;d(v)-\pi_{k}\;\geq\;\Pi_{s}-\pi_{k}\;\geq\;0,

so no path with negative reduced cost exists and the pricing for source $s$ can terminate early.

A* with reverse distance bounds.

Let $h(v)$ be an admissible and consistent lower bound on the remaining distance from $v$ to any destination $t_{k}$ with $s_{k}=s$ under the adjusted costs $\tilde{c}_{e}=c_{e}-\mu_{e}$ (e.g., obtained from a reverse single-source shortest path). Run A* on $\tilde{c}$ with keys $f(v)=g(v)+h(v)$ , where $g(v)$ is the tentative distance from $s$ . The same global stop test as above applies: if $\min_{v}f(v)\geq\Pi_{s}$ , no path of negative reduced cost exists for source $s$ . Exact reverse distances tighten $h$ but may be memory-intensive to store for large graphs when $|S|$ is large.

Precomputing reverse multi-target bounds.

We compute a lower bound $h(v)$ from any node $v$ to any destination $t_{k}$ with $s_{k}\in S$ . Running multi-source Dijkstra on the reverse graph yields

h(v)\;=\;\min_{k:\,s_{k}\in S}\;\text{dist}_{\text{rev}}(v,t_{k}),

which is an admissible and consistent heuristic for all commodities with source $s$ . This provides lower bounds that are worse than if we considered a single fixed source $s$ . However, in this scenario only $|V|$ lower bounds are stored which is much more memory efficient.

Lazy addition of capacity constraints.

In the RMP, the demand constraints, (10) and (14) respectively, are always enforced, while the capacity constraints, (11) and (15) respectively, constraints can be introduced lazily. At optimality, only the binding edge constraints matter, since the dual multiplier of a non-binding edge constraint is zero. Hence, adding all $|E|$ capacity constraints upfront is unnecessary. Instead, one begins with a subset (potentially empty) and monitors edge utilizations during column capacity constraints upfront is unnecessary. Instead, one begins with a subset (potentially empty) and monitors edge utilizations during column generation. Whenever an edge $e$ becomes violated, the corresponding capacity constraint is added to the RMP (row generation). This significantly reduces the initial model size and is essential for scalability on large networks [5, 6].

This mechanism is closely related to the active set strategy proposed in [6], where only those arc constraints that are likely to be saturated are kept in the RMP. Non-congested arcs are temporarily ignored until they become active, at which point their capacity constraints are inserted. Such strategies exploit the empirical fact that in optimal solutions of large multi-commodity problems, only a small fraction of edges are congested. By combining column generation (to handle the exponential number of paths) with active set row management (to handle the potentially huge set of edge capacities), the master problem is kept compact in both dimensions. This dual stabilization on rows and columns is crucial for solving instances with millions of commodities and tens of thousands of edges.

Balancing master and pricing problems.

The computational effort between solving the master problem and pricing problem can vary significantly based on instance characteristics and network structure. We adapt our strategy accordingly.

When the master RMP is computationally easier:

•

Prioritize re-optimizing the master problem whenever lazy cuts need to be added, before attempting any pricing
•

Apply the pricing filter technique from [11] - only price commodities whose previously generated columns use edges from newly added capacity constraints
•

Only remove the filter if fewer than $\epsilon$ columns are found
•

In the final iteration, solve all pricing problems to prove optimality

When pricing is computationally easier:

•

In each iteration, attempt both lazy cut separation and pricing problem solutions
•

Continue solving pricing problems until finding $N$ columns with negative reduced cost
•

If fewer than $N$ columns are found, solve all remaining pricing problems to obtain valid lower bounds via reduced cost calculations

Slack variables.

To improve convergence we avoid infeasibility of the master problem by adding slack variables. When the number of demand constraints are less than the number of edges we add a slack variable to demand constraints (10) and (14) respectively. Otherwise we add a slack variable to edge capacity constraints (11) and (15). The cost of the slack variables is a sufficiently large value, e.g., sum of all edges for the path-based formulation or sum of all edges times the sum of demand for tree-based formulation.

Note, that since capacity constraints are added dynamically, adding edge slack may result in small infeasibilities until all relevant constraints are added. In this case restore feasibility through a phase 1 by adding artificial variables. This can mostly be avoided by initially adding a column for each pricing problem. We do this when edge slack variables are used.

4 Experiments

We evaluate the performance of the path-based and tree-based formulations on a diverse set of instances. For comparison, we compare against a commercial LP solver (MOSEK) using the source-based formulation.

All experiments were performed on a machine with an AMD Ryzen 9 3950X 16-Core Processor and 128GB RAM running Ubuntu 24.04. The algorithms were implemented in C++. We used MOSEK 11.0.27 interior point method without crossover as the LP solver for both the full formulation and the restricted master problems. We use up to 32 threads.

Throughout the experiments we set a relative optimality tolerance of 1e-4 and use a timeout of 7200 seconds. For grid, planar and transportation networks the pricing problem is considered "easy" and for intermodal transport networks the master problem is considered "easy".

4.1 Instances

We consider 4 families of instances, grid, planar, transportation networks, and intermodal transport networks. The grid and planar instances are from the MCF instances benchmark [12]. The transportation networks are from the transportation networks benchmark [13]. The intermodal transport networks are from the intermodal transport networks benchmark [11].

The provided transportation networks are non-linear maximum flow MFC problems, and as in [6] we convert them to min-cost MCF problems. The data is converted such that "free flow time" is used as the unit edge cost, and we divide the demands of a problem by a coefficient. As previously proposed, we increase the coefficient until the problems becomes feasible. Table 3 in Appendix A shows the coefficients used for the transportation network instances.

[6] addresses these problems (with less accuracy 1e-3 vs 1e-4 relative). However, we were not able to reproduce their result using their coefficient values with the current version of transportation network instances [13]. Hence, we have slightly different coefficients and optimal objective values.

The intermodal transport networks are derived from the Munich public transport network. We have used the code of [11] to generate the instances with seed 0. Preliminary tests showed that instances with seeds 1 to 9 were computationally equivalent to seed 0, so due to lack of difference in behaviors we only consider seed 0. Additionally we have filtered out any trips that were not connected in the generated network.

Table 1 shows the characteristics of the instance families. Grid and planar are relatively small but have many commodity sources per vertices. The transportation networks have a large number of commodities with many shared sources. The intermodal transport networks are large in number of vertices and edges but have a relatively small number of commodities with no shared sources. See the Table 4 in Appendix A for a full overview over instance characteristics.

Family	Nodes	Edges	Commodities	Sources	Variables	Constraints	Non-zeros
Grid	523	2,011	4,967	463	1.6e+06	4.1e+05	4.7e+06
Planar	607	3,230	13,920	607	4.8e+06	9.2e+05	1.4e+07
Transportation	10,833	27,108	943,682	1,114	5.0e+07	2.0e+07	1.5e+08
Intermodal	177,179	1,893,620	22,195	22,195	6.4e+10	4.6e+09	1.9e+11

Table 1: Instance family characteristics (mean values). Number of variables, constraints, and non-zeroes are based on the source-based formulation.

4.2 Performance

The performance profile of the path-based, tree-based formulations, and the source-based formulation is shown in Figure 1. The tree-based formulation outperforms the path-based formulation on most instances, the exception being planar2500. Both column generation formulations solves all instances (43) within the time limit compared to the source-based formulation (26) that cannot solve the largest instances due to memory limitations. It is noteworthy, that the source-based formulation with MOSEK did not timeout on any of the instances if it was possible to construct the instance in memory. Detailed results for the instances are shown in Table 5 in Appendix A.

Refer to caption — Figure 1: Performance profile of the path-based and tree-based formulations against the source-based formulation

The cactus plot of the path-based, tree-based formulations, and the source-based formulation is shown in Figure 2 and Figure 3 for runtime and memory respectively. From the plots it is seen that both column generation algorithms are on par with each other, with the tree-based formulation being slightly faster and using a little less memory. The source-based formulation is significantly slower and as expected it uses much more memory.

Table 2 summarize the geometric mean results for instances for runtime and memory. Runtime means are shifted by 10 seconds to reduce the impact of very small runtimes. Memory means are not shifted. From the scaled results it is clear that both column generation formulations outperform the source-based formulation both in runtime and memory.

Solver	Time		Memory		Solved of 43
Solver	Unscaled	Scaled	Unscaled	Scaled
Tree	15.0	1.00	0.7	1.00	43
Path	21.2	1.42	0.9	1.39	43
Source	313.4	20.96	12.6	18.57	26

Table 2: Shifted geometric mean runtime and memory results

Comparison to other work.

We compare our implementation against several recent approaches from the literature. For the planar2500 instance, which was the largest instance solved in [5], our implementation is approximately 7 times faster, solving it in under 900 seconds compared to their 6169 seconds on the same machine.

The transportation network instances from [6] include challenging cases like ChicagoRegional and Philadelphia that could not be solved to an accuracy of 1e-4 in their work. While our problem coefficients differ slightly, we successfully solve all instances, including the larger Sydney network, to 1e-4 accuracy in under 700 seconds.

For the intermodal transport networks, we compare against [11] who solved their largest instance (SBT-56295) in 2856 seconds on an Intel Core i9-9900 (3.1 GHz). Accounting for processor speed differences based on benchmark data, their solution time normalizes to approximately 1203 seconds on our AMD Ryzen 9 3950X (3.5 GHz). In comparison, our implementation solves the same instance in just 115 seconds on average - more than a 10-fold improvement in performance.

4.3 Path and tree-based formulations comparison

Figure 4 and Figure 5 show the runtime and memory usage on the y-axis and size of the instance (in the number of non-zeroes in the LP) on the x-axis. The source-based formulation is included for completeness. The transportation network instances are located in the middle of the plots and the intermodal transport network instances are located most right in the plots.

It is seen that the tree- and path-based behave identical on the intermodal transport network instances. This is to be expected since all commodities have a unique source and destination with a demand of 1, hence the formulations are equivalent as all shortest paths trees are paths.

For the transportation network instances the tree-based formulation is faster than the path-based formulation. The plots indicate that high memory usage is correlated with high runtime, and the tree-based formulation is expected to use less memory than the path-based formulation due to fewer columns in the master problem.

The scatter plot of the path-based and tree-based formulations is shown in Figure 6. The formulations behave identical on the grid, planar, and intermodal transport network instances, except for the instance planar2500. The tree-based formulation is faster on the transportation network instances that have many shared sources.

The relation between the number of commodities and the number of shared sources is explored via the heatmap in Figure 7. It shows the relations between the number of commodities and the fraction of shared sources and is color coded according to the speed-up. A large speed-up (yellow) indicates that the tree-based formulation is faster than the path-based formulation. It is seen that the tree-based formulation is faster than the path-based formulation for instances with a large number of commodities and a small fraction of shared sources. This is expected as the number of demand constraints in the master problem is only $O(|S|)$ for the tree-based formulation and $O(|K|)$ for the path-based formulation.

We observe that the path-based formulation is faster on the largest planar instances. We expect this to be due to too slow column generation convergence and many pricing iterations for the tree-based formulation. The tree-based formulation generates at most $|S|$ columns per iteration, while the path-based formulation generates up to $|K|$ columns per iteration. This may stabilize the column generation process faster for the path-based formulation even if each master problem iteration is slower.

5 Conclusion

In this paper, we introduced a novel tree-based formulation for the multi-commodity flow problem. Rather than decomposing by individual commodity paths, our key innovation is to aggregate flows by shared sources and represent them using trees. This reduces the master problem size from O(|K|) to O(|S|) constraints, where |S| << |K| in many practical applications.

We demonstrated the advantages of this formulation through extensive computational experiments comparing it against traditional path-based decomposition and direct LP solving with MOSEK across diverse instance types including grid, planar, transportation, and intermodal networks. Our implementation using column generation showed that:

•

The tree-based formulation achieves up to 5x reduction in memory usage compared to path-based approaches by maintaining a more compact master problem
•

On large instances with many shared sources, the tree-based method scales significantly better, solving problems with millions of commodities with 5-30x speed-ups over the path-based formulation
•

Both decomposition approaches substantially outperform direct LP solving, with the tree-based formulation showing particular advantages on transportation networks where source sharing is common

The results establish that our tree-based formulation represents a significant advance in solving large-scale multi-commodity flow problems. By exploiting the natural structure of shared sources, it overcomes key scalability limitations of existing methods. The dramatic reduction in memory requirements and improved solution times make it possible to tackle problem sizes that were beyond the reach of previous approaches.

This innovation is particularly impactful for real-world applications in transportation and logistics networks where multiple commodities frequently share origins. The ability to efficiently handle millions of commodities while maintaining individual demand constraints opens new possibilities for detailed network optimization at scale.

Several promising directions could further enhance this approach. First-order optimization methods and GPU acceleration could improve master problem solve times. The pricing problem could benefit from edge filtering, path generation heuristics, and warm-starting techniques. Strategic selection of source nodes through pattern analysis and partitioning could also yield additional performance gains. These extensions could help realize the full potential of the tree-based paradigm for extremely large-scale network optimization.

References

[1] Ravindra K. Ahuja, Thomas L. Magnanti, and James B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, NJ, 1993.
[2] Khodakaram Salimifard and Sara Bigharaz. The multicommodity network flow problem: state of the art classification, applications, and solution methods. Operational Research, 22:1–47, 2022.
[3] Jacques Desrosiers, Marco Lübbecke, Guy Desaulniers, and Jean Bertrand Gauthier. Branch-and-price. Les Cahiers du GERAD G-2024-36, Groupe d’études et de recherche en analyse des décisions, GERAD, Montréal QC H3T 2A7, Canada, June 2024.
[4] Eduardo Uchoa, Artur Pessoa, and Lorenza Moreno. Optimizing with Column Generation: Advanced branch-cut-and-price algorithms (Part I). Technical Report L-2024-3, Cadernos do LOGIS-UFF, Universidade Federal Fluminense, Engenharia de Produção, August 2024.
[5] Jacek Gondzio, Pablo González-Brevis, and Pedro Munari. Large-scale optimization with the primal–dual column generation method. Mathematical Programming Computation, 8:47–82, 2016.
[6] Frédéric Babonneau, Olivier du Merle, and Jean-Philippe Vial. Solving large-scale linear multicommodity flow problems with an active set strategy and proximal-accpm. Operations Research, 54(1):184–197, 2006.
[7] Fangzhao Zhang and Stephen Boyd. Solving large multicommodity network flow problems on gpus. arXiv preprint, 2025.
[8] Ping Yin, Steven Diamond, Bill Lin, and Stephen Boyd. Network optimization for unified packet and circuit switched networks. Optimization and Engineering, 21:159–180, 2020.
[9] Cynthia Barnhart, Charles A. Hane, Ellis L. Johnson, and George Sigismondi. A column generation and partitioning approach for multi-commodity flow problems. Telecommunication Systems, 3:239–258, 1994.
[10] Cynthia Barnhart, Charles A. Hane, and Patrick H. Vance. Using branch-and-price-and-cut to solve origin-destination integer multicommodity flow problems. Operations Research, 48(2):318–326, 2000.
[11] Benedikt Lienkamp and Maximilian Schiffer. Column generation for solving large scale multi-commodity flow problems for passenger transportation. European Journal of Operational Research, 314(2):703–717, 2024.
[12] University of Pisa CommadLab. Multicommodity flow problems, 2025. Accessed 2025-08-27, https://commalab.di.unipi.it/datasets/mmcf/.
[13] Transportation Networks for Research Core Team. Transportation networks for research, 2025. GitHub repository. commit #d1639b4, Accessed 2025-01-27, https://github.com/bstabler/TransportationNetworks.

Appendix A Appendix

A.1 Instance details

Table 3 shows the coefficients used for the generation of min-cost transportation network instances. Table 4 shows the instance details.

Instance	Coefficient
Transportation
Austin	6.0
Barcelona	5050.0
BerlinCenter	0.5
Birmingham	0.9
ChicagoRegional	4.1
ChicagoSketch	2.4
Philadelphia	7.0
Sydney	1.9
Winnipeg	2000.0

Table 3: Coefficients used for the generation of min-cost transportation network instances

Instance	Nodes	Edges	Commodities	Sources	Variables	Constraints	Non-zeros
Grid
grid1	25	80	50	22	1.8e+03	6.3e+02	5.4e+03
grid2	25	80	100	25	2.0e+03	7.0e+02	6.1e+03
grid3	100	360	50	40	1.4e+04	4.4e+03	4.3e+04
grid4	100	360	100	64	2.3e+04	6.8e+03	6.9e+04
grid5	225	840	100	82	6.9e+04	1.9e+04	2.1e+05
grid6	225	840	200	138	1.2e+05	3.2e+04	3.5e+05
grid7	400	1,520	400	251	3.8e+05	1.0e+05	1.1e+06
grid8	625	2,400	500	356	8.5e+05	2.2e+05	2.6e+06
grid9	625	2,400	1,000	495	1.2e+06	3.1e+05	3.6e+06
grid10	625	2,400	2,000	603	1.4e+06	3.8e+05	4.3e+06
grid11	625	2,400	4,000	625	1.5e+06	3.9e+05	4.5e+06
grid12	900	3,480	6,000	898	3.1e+06	8.1e+05	9.4e+06
grid13	900	3,480	12,000	900	3.1e+06	8.1e+05	9.4e+06
grid14	1,225	4,760	16,000	1,225	5.8e+06	1.5e+06	1.8e+07
grid15	1,225	4,760	32,000	1,225	5.8e+06	1.5e+06	1.8e+07
Planar
planar30	30	150	92	29	4.4e+03	1.0e+03	1.3e+04
planar80	80	440	543	80	3.5e+04	6.8e+03	1.1e+05
planar100	100	532	1,085	100	5.3e+04	1.1e+04	1.6e+05
planar150	150	850	2,239	150	1.3e+05	2.3e+04	3.8e+05
planar300	300	1,680	3,584	300	5.0e+05	9.2e+04	1.5e+06
planar500	500	2,842	3,525	500	1.4e+06	2.5e+05	4.3e+06
planar800	800	4,388	12,756	800	3.5e+06	6.4e+05	1.1e+07
planar1000	1,000	5,200	20,026	1,000	5.2e+06	1.0e+06	1.6e+07
planar2500	2,500	12,990	81,430	2,500	3.2e+07	6.3e+06	9.8e+07
Transportation
Austin	7,388	18,956	1,080,603	1,117	2.1e+07	8.3e+06	6.5e+07
Barcelona	1,020	2,522	7,922	108	2.7e+05	1.1e+05	8.3e+05
BerlinCenter	12,981	28,370	49,688	862	2.4e+07	1.1e+07	7.3e+07
Birmingham	14,639	33,937	470,805	898	3.0e+07	1.3e+07	9.2e+07
ChicagoRegional	12,982	39,018	2,296,227	1,768	6.9e+07	2.3e+07	2.1e+08
ChicagoSketch	933	2,950	93,135	386	1.1e+06	3.6e+05	3.5e+06
Philadelphia	13,389	40,003	1,149,795	1,489	6.0e+07	2.0e+07	1.8e+08
Sydney	33,113	75,379	3,340,619	3,257	2.5e+08	1.1e+08	7.4e+08
Winnipeg	1,052	2,836	4,344	138	3.9e+05	1.5e+05	1.2e+06
Intermodal
BUS-2632	119,857	397,233	2,628	2,628	1.0e+09	3.2e+08	3.1e+09
BUS-7896	130,367	710,464	7,883	7,883	5.6e+09	1.0e+09	1.7e+10
BUS-13160	140,871	1,022,654	13,135	13,135	1.3e+10	1.9e+09	4.0e+10
BUS-18424	151,377	1,336,774	18,388	18,388	2.5e+10	2.8e+09	7.4e+10
BUS-23688	161,887	1,651,215	23,643	23,643	3.9e+10	3.8e+09	1.2e+11
SBT-6255	163,479	809,333	6,251	6,251	5.1e+09	1.0e+09	1.5e+10
SBT-18765	188,467	1,782,451	18,745	18,745	3.3e+10	3.5e+09	1.0e+11
SBT-31275	213,481	2,760,176	31,252	31,252	8.6e+10	6.7e+09	2.6e+11
SBT-43785	238,491	3,741,011	43,757	43,757	1.6e+11	1.0e+10	4.9e+11
SBT-56295	263,515	4,724,888	56,269	56,269	2.7e+11	1.5e+10	8.0e+11

Table 4: Instance details. Number of variables, constraints, and non-zeroes are based on the source-based formulation.

A.2 Runtime details

Table 5 shows the runtime details for the instances.

Grid
Instance	Tree		Path		MOSEK		Objective
	Time(s)	Mem(GB)	Time(s)	Mem(GB)	Time(s)	Mem(GB)
grid1	0.0	0.0	0.0	0.0	-	-	8.2732e+05
grid2	0.0	0.1	0.0	0.0	-	-	1.7054e+06
grid3	0.0	0.0	0.0	0.0	-	-	1.5246e+06
grid4	0.0	0.1	0.0	0.0	-	-	3.0317e+06
grid5	0.0	0.1	0.1	0.1	-	-	5.0497e+06
grid6	0.3	0.1	0.2	0.1	-	-	1.0402e+07
grid7	0.4	0.1	0.3	0.1	-	-	2.5864e+07
grid8	1.6	0.2	1.3	0.2	-	-	4.1711e+07
grid9	3.4	0.2	2.8	0.2	-	-	8.2653e+07
grid10	4.3	0.3	4.1	0.3	-	-	1.6411e+08
grid11	3.9	0.2	4.4	0.3	-	-	3.2926e+08
grid12	4.4	0.2	4.2	0.3	-	-	5.7719e+08
grid13	10.2	0.3	12.4	0.4	-	-	1.1593e+09
grid14	5.5	0.3	8.2	0.4	-	-	1.8027e+09
grid15	10.6	0.4	18.8	0.6	-	-	3.5935e+09
Planar
planar30	0.0	0.0	0.0	0.0	-	-	4.4351e+07
planar80	0.4	0.1	0.3	0.1	-	-	1.8244e+08
planar100	0.3	0.1	0.5	0.1	-	-	2.3134e+08
planar150	4.4	0.2	4.1	0.3	-	-	5.4809e+08
planar300	1.6	0.2	1.9	0.2	-	-	6.8998e+08
planar500	0.7	0.2	1.0	0.2	-	-	4.8198e+08
planar800	5.5	0.3	8.6	0.4	-	-	1.1674e+09
planar1000	75.3	0.6	53.5	0.9	-	-	3.4496e+09
planar2500	5894.7	10.0	866.9	19.3	-	-	1.2662e+10
Transportation
Austin	670.6	14.5	4967.0	48.1	-	-	5.4329e+06
Barcelona	0.1	0.1	2.5	0.2	-	-	8.8588e+01
BerlinCenter	10.3	0.5	19.0	0.7	-	-	2.7529e+07
Birmingham	196.2	3.3	399.3	16.7	-	-	2.1565e+05
ChicagoRegional	68.2	6.1	981.8	33.4	-	-	5.5131e+06
ChicagoSketch	0.5	0.2	3.1	0.6	-	-	6.7059e+06
Philadelphia	380.6	7.4	1471.2	17.3	-	-	2.5578e+07
Sydney	66.4	11.1	2520.9	59.7	-	-	4.9653e+06
Winnipeg	0.0	0.1	2.9	0.2	-	-	2.3767e+02
Intermodal
BUS-2632	1.5	0.7	1.6	0.7	-	-	7.1027e+04
BUS-7896	4.7	2.3	4.3	2.3	-	-	2.1060e+05
BUS-13160	8.4	4.6	8.2	4.6	-	-	3.4836e+05
BUS-18424	12.9	7.8	13.7	7.8	-	-	4.8503e+05
BUS-23688	20.9	11.7	20.5	11.7	-	-	6.2488e+05
SBT-6255	3.7	2.2	3.6	2.2	-	-	1.6061e+05
SBT-18765	15.4	10.4	15.5	10.4	-	-	4.8111e+05
SBT-31275	38.0	24.5	37.3	24.5	-	-	8.0133e+05
SBT-43785	68.5	44.5	68.5	44.5	-	-	1.1236e+06
SBT-56295	114.3	70.6	112.6	70.6	-	-	1.4479e+06

Table 5: Detailed runtime comparison