Fundamentals of quantum information theory

Physics Reports 369 (2002) 431 – 548
www.elsevier.com/locate/physrep

Michael Keyl
TU-Braunschweig, Institute of Mathematical Physics, Mendelssohnstraße 3, D-38106 Braunschweig, Germany
Received 3 June 2002
editor: J. Eichler

Abstract
In this paper we give a self-contained introduction to the conceptual and mathematical foundations of quantum information theory. In the first part we introduce basic notions such as entanglement, channels and teleportation, together with their mathematical description. The second part focuses on the quantitative aspects of the theory. Topics discussed in this context include entanglement measures, channel capacities, the relations between the two, additivity and continuity properties, and asymptotic rates of quantum operations. Finally, we give an overview of some recent developments and open questions.
© 2002 Elsevier Science B.V. All rights reserved.
PACS: 03.67.−a; 03.65.−w

Contents
1. Introduction
1.1. What is quantum information?
1.2. Tasks of quantum information
1.3. Experimental realizations
2. Basic concepts
2.1. Systems, states and effects
2.1.1. Operator algebras
2.1.2. Quantum mechanics
2.1.3. Classical probability
2.1.4. Observables
2.2. Composite systems and entangled states
2.2.1. Tensor products
2.2.2. Compound and hybrid systems
2.2.3. Correlations and entanglement
2.2.4. Bell inequalities
E-mail address: m.keyl@tu-bs.de (M. Keyl).
0370-1573/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0370-1573(02)00266-1




2.3. Channels
2.3.1. Completely positive maps
2.3.2. The Stinespring theorem
2.3.3. The duality lemma
2.4. Separability criteria and positive maps
2.4.1. Positivity
2.4.2. The partial transpose
2.4.3. The reduction criterion
3. Basic examples
3.1. Entanglement
3.1.1. Maximally entangled states
3.1.2. Werner states
3.1.3. Isotropic states
3.1.4. OO-invariant states
3.1.5. PPT states
3.1.6. Multipartite states
3.2. Channels
3.2.1. Quantum channels
3.2.2. Channels under symmetry
3.2.3. Classical channels
3.2.4. Observables and preparations
3.2.5. Instruments and parameter-dependent operations
3.2.6. LOCC and separable channels
3.3. Quantum mechanics in phase space
3.3.1. Weyl operators and the CCR
3.3.2. Gaussian states
3.3.3. Entangled Gaussians
3.3.4. Gaussian channels
4. Basic tasks
4.1. Teleportation and dense coding
4.1.1. Impossible machines revisited: classical teleportation
4.1.2. Entanglement enhanced teleportation
4.1.3. Dense coding
4.2. Estimating and copying
4.2.1. Quantum state estimation
4.2.2. Approximate cloning
4.3. Distillation of entanglement
4.3.1. Distillation of pairs of qubits
4.3.2. Distillation of isotropic states
4.3.3. Bound entangled states
4.4. Quantum error correction
4.5. Quantum computing
4.5.1. The network model of classical computing
4.5.2. Computational complexity
4.5.3. Reversible computing
4.5.4. The network model of a quantum computer
4.5.5. Simon's problem
4.6. Quantum cryptography
5. Entanglement measures
5.1. General properties and definitions
5.1.1. Axiomatics


5.1.2. Pure states
5.1.3. Entanglement measures for mixed states
5.2. Two qubits
5.2.1. Pure states
5.2.2. EOF for Bell diagonal states
5.2.3. Wootters formula
5.2.4. Relative entropy for Bell diagonal states
5.3. Entanglement measures under symmetry
5.3.1. Entanglement of formation
5.3.2. Werner states
5.3.3. Isotropic states
5.3.4. OO-invariant states
5.3.5. Relative entropy of entanglement
6. Channel capacity
6.1. The general case
6.1.1. The definition
6.1.2. Simple calculations
6.2. The classical capacity
6.2.1. Classical channels
6.2.2. Quantum channels
6.2.3. Entanglement assisted capacity
6.2.4. Examples
6.3. The quantum capacity
6.3.1. Alternative definitions
6.3.2. Upper bounds and achievable rates
6.3.3. Relations to entanglement measures
7. Multiple inputs
7.1. The general scheme
7.1.1. Figures of merit
7.1.2. Covariant operations
7.1.3. Group representations
7.1.4. Distillation of entanglement
7.2. Optimal devices
7.2.1. Optimal cloning
7.2.2. Purification
7.2.3. Estimating pure states
7.2.4. The UNOT gate
7.3. Asymptotic behaviour
7.3.1. Estimating mixed state
7.3.2. Purification and cloning
References


1. Introduction
Quantum information and quantum computation have recently attracted a lot of interest. The promise of new technologies like secure cryptography and new “super computers”, capable of handling otherwise intractable problems, has excited not only researchers from many different fields, such as physics, mathematics and computer science, but also a large public audience. On a practical level, all these new visions are based on the ability to control the quantum states of (a small number of) microsystems individually and to use them for information transmission and processing. From a more fundamental point of view, the crucial point is a reconsideration of the foundations of quantum mechanics in an information theoretical context. The purpose of this work is to follow the second path and to guide physicists into the theoretical foundations of quantum information and some of the most relevant topics of current research.
The outline of this paper is as follows. The rest of this introduction gives a rough and informal overview of the field, discussing some of its tasks and experimental realizations. In Section 2 we then consider the basic formalism needed to present more detailed results; typical keywords in this context are systems, states, observables, correlations, entanglement and quantum channels. We clarify these concepts (in particular entanglement and channels) with several examples in Section 3, and in Section 4 we discuss the most important tasks of quantum information in greater detail. The last three sections are devoted to a more quantitative analysis, where we make closer contact with current research: in Section 5 we discuss how entanglement can be measured; Section 6 is devoted to channel capacities, i.e. the maximal amount of information that can be transmitted over a noisy channel; and in Section 7 we consider state estimation, optimal cloning and related tasks.
Quantum information is a rapidly developing field, and the present work can of course reflect only a small part of it. An incomplete list of other general sources the reader should consult includes the books of Lo [111], Gruska [76], Nielsen and Chuang [122], Bouwmeester et al. [23] and Alber et al. [3], the lecture notes of Preskill [130] and the collection of references by Cabello [37], which in particular contains many references to other reviews.
1.1. What is quantum information?
Classical information is, roughly speaking, everything which can be transmitted from a sender to a receiver with “letters” from a “classical alphabet”, e.g. the two digits “0” and “1” or any other finite set of symbols. In the context of classical information theory, it is completely irrelevant which type of physical system is used to perform the transmission. This abstract approach is successful because it is easy to transfer information between different types of carriers, like electric currents in a wire, laser pulses in an optical fiber, or symbols on a piece of paper, without loss of data; and even if there are losses, they are well understood and it is known how to deal with them. Quantum information theory, however, breaks with this point of view. It studies, loosely speaking, the kind of information (“quantum information”) which is transmitted by microparticles from a preparation device (sender) to a measuring apparatus (receiver) in a quantum mechanical experiment; in other words, the distinction between carriers of classical and quantum information becomes essential. This approach is justified by the observation that a lossless conversion of quantum information into classical information is not possible in the above sense. Therefore, quantum information is a new kind of information.
In order to explain why there is no way from quantum to classical information and back, let us discuss what such a conversion would look like. To convert quantum to classical information we need a device which takes quantum systems as input and produces classical information as output; this is nothing else than a measuring apparatus. The converse translation from classical to quantum information can be rephrased similarly as “parameter-dependent preparation”, i.e. the classical input to such a device is used to control the state (and possibly the type of system) in which the microparticles should be prepared. These two elements can be combined in two ways. Let us first consider a device which goes from classical to quantum to classical information. This is possible and, in fact, already technically realized. A typical example is the transmission of classical information via an optical fiber. The information transmitted through the fiber is carried by microparticles (photons) and is therefore quantum information (in the sense of our preliminary definition). To send classical information we first have to prepare photons in a certain state, send them through the channel and measure an appropriate observable at the output side. This is exactly the combination of a classical → quantum with a quantum → classical device just described.

Fig. 1.1. Schematic representation of classical teleportation. Here and in the following diagrams a curly arrow stands for quantum systems and a straight one for the flow of classical information.
Fig. 1.2. A teleportation process should not affect the results of a statistical experiment with quantum systems. A more precise explanation of the diagram is given in the text.
The crucial point is now that the converse composition, performing the measurement M first and the preparation P afterwards (cf. Fig. 1.1), is more problematic. Such a process is called classical teleportation if the particles produced by P are “indistinguishable” from the input systems. We will show the impossibility of such a device via a hierarchy of other “impossible machines”, which traces the problem back to the fundamental structure of quantum mechanics. This will finally prove our statement that quantum information is a new kind of information.¹
To start with, we have to clarify the precise meaning of “indistinguishable” in this context. This has to be done in a statistical way, because the only way to compare quantum mechanical systems is in terms of statistical experiments. Hence, we need an additional preparation device P' and an additional measuring apparatus M'. Indistinguishable now means that it does not matter whether we perform M' measurements directly on the outputs of P' or whether we switch a teleportation device in between; cf. Fig. 1.2. In both cases we should get the same distribution of measuring results for a large number of repetitions of the corresponding experiment. This requirement should hold for any preparation P' and any measurement M', but for fixed M and P. The latter means that we are not allowed to use a priori knowledge about P' or M' to adapt the teleportation process (otherwise we could, in the most extreme case, always choose P' for P, and the whole discussion would become meaningless).
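A minimal numerical sketch of this criterion, in Python with numpy (our own illustration, not taken from [168]): the hypothetical function `measure_and_prepare` measures a qubit in the basis {|0⟩, |1⟩} and re-prepares the basis state it found, i.e. it is a measurement M followed by a preparation P. As the additional test devices we assume a preparation P' of |+⟩ = (|0⟩ + |1⟩)/√2 and a measurement M' in the basis {|+⟩, |−⟩}. Applied directly, M' yields “+” with probability 1; with the device switched in between, only with probability 1/2. The same device would pass the test for P' = |0⟩, which is why the requirement has to hold for every P' and M'.

```python
import numpy as np

# Qubit basis states and the test state |+> (all names here are illustrative).
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)


def density(psi):
    """Density matrix |psi><psi| of a pure state vector."""
    return np.outer(psi, psi.conj())


def measure_and_prepare(rho):
    """Hypothetical classical teleporter: measure in the {|0>, |1>} basis (M),
    then prepare the basis state that was found (P). Returns the output state."""
    out = np.zeros((2, 2), dtype=complex)
    for k in (ket0, ket1):
        p = np.real(k.conj() @ rho @ k)   # probability of finding |k>
        out += p * density(k)             # P prepares |k> in that case
    return out


def outcome_prob(rho, phi):
    """Probability that the test measurement M' yields the outcome projecting onto |phi>."""
    return float(np.real(phi.conj() @ rho @ phi))


rho_in = density(plus)                                          # P' prepares |+>
direct = outcome_prob(rho_in, plus)                             # M' applied directly
via_device = outcome_prob(measure_and_prepare(rho_in), plus)    # M' after the device
print(f"P(+) directly: {direct:.3f}, via measure-and-prepare: {via_device:.3f}")
```

The device turns |+⟩ into the maximally mixed state, which is why the two statistics differ.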
¹ The following chain of arguments is taken from [168], where it is presented in greater detail. This concerns, in particular, the construction of Bell's telephone from a joint measurement, which we have omitted here.

Fig. 1.3. Constructing a quantum copying machine from a teleportation device.
Fig. 1.4. Constructing a joint measurement for the observables A and B from a quantum copying machine.

The second impossible machine we have to consider is a quantum copying machine. This is a device C which takes one quantum system p as input and produces two systems p_1, p_2 of the same type as output. The defining condition on C is that p_1 and p_2 are indistinguishable from the input, where “indistinguishable” has to be understood in the same way as above: any statistical experiment performed with one of the output particles (i.e. always with p_1 or always with p_2) yields the same result as when applied directly to the input p. To get such a device from teleportation is easy: we just have to perform an M measurement on p, make two copies of the classical data obtained, and run the preparation P on each of them; cf. Fig. 1.3. Hence, if teleportation is possible, copying is possible as well.
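How restrictive the copying condition is can be made tangible with a short numpy sketch (our own illustration; the CNOT-based `candidate_copier` is a hypothetical attempt at C, not a construction from the text). It attaches a blank qubit in |0⟩ and applies a CNOT gate. For the inputs |0⟩ and |1⟩ both outputs agree with the input, but for |+⟩ = (|0⟩ + |1⟩)/√2 each output p_1, p_2 is maximally mixed and has overlap only 1/2 with the input, so a measurement in the {|+⟩, |−⟩} basis on either output already distinguishes it from p.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus = (ket0 + ket1) / np.sqrt(2)

# CNOT in the basis |00>, |01>, |10>, |11>; the first qubit is the input p,
# the second is a blank qubit prepared in |0>.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)


def candidate_copier(psi):
    """Hypothetical copier C: attach a blank |0> and apply CNOT.
    Returns the two-qubit output state vector (outputs p_1 and p_2)."""
    return CNOT @ np.kron(psi, ket0)


def reduced_states(state):
    """Reduced density matrices of both qubits of a two-qubit pure state."""
    m = state.reshape(2, 2)        # m[i, j] = amplitude of |i>|j>
    rho1 = m @ m.conj().T          # trace out the second qubit
    rho2 = m.T @ m.conj()          # trace out the first qubit
    return rho1, rho2


def copy_fidelities(psi):
    """Overlap <psi|rho_k|psi> of each output p_k with the input state psi."""
    return tuple(float(np.real(psi.conj() @ rho @ psi))
                 for rho in reduced_states(candidate_copier(psi)))


for name, psi in [("|0>", ket0), ("|1>", ket1), ("|+>", plus)]:
    print(name, [round(f, 3) for f in copy_fidelities(psi)])
```

Any other fixed unitary candidate fails on some input in a similar way, which is the content of the no-cloning theorem.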
According to the “no-cloning theorem” of Wootters and Zurek [173], however, a quantum copying machine does not exist, and this basically concludes our proof. Still, we will give an easy argument for this theorem in terms of a third impossible machine: a joint measuring device M_AB for two arbitrary observables A and B. This is a measuring apparatus which, each time it is invoked, produces a pair (a, b) of classical outputs, where a is a possible output of A and b a possible output of B. The crucial requirement for M_AB is again of a statistical nature: the statistics of the a outcomes is the same as for device A, and similarly for B. It is known from elementary quantum mechanics that many quantum observables are not jointly measurable in this way. The most famous examples are position and momentum, or different components of angular momentum. Nevertheless, a device M_AB could be constructed for arbitrary A and B from a quantum copy machine C. We simply have to operate with C on the input system p, producing two outputs p_1 and p_2, and to perform an A measurement on p_1 and a B measurement on p_2; cf. Fig. 1.4. Since the outputs p_1, p_2 are, by assumption, indistinguishable from the input p, the overall device constructed this way would give a joint measurement for A and B. Hence, a quantum copying machine cannot exist, as stated by the no-cloning theorem. This in turn implies that classical teleportation is impossible, and therefore we cannot transform quantum information losslessly into classical information and back. This concludes our chain of arguments.
1.2. Tasks of quantum information
So we have seen that quantum information is something new, but what can we do with it? There are three answers to this question which we want to present here. First of all, let us remark that in fact all information in a modern data processing environment is carried by microparticles (e.g. electrons or photons). Hence, quantum information comes into play automatically. Currently, it is safe to ignore this and to use classical information theory to describe all relevant processes. If the size of the structures on a typical circuit decreases below a certain limit, however, this is no longer true and quantum information will become relevant.
This leads us to the second answer. Although it is far too early to say which concrete technologies will emerge from quantum information in the future, several interesting proposals show that devices based on quantum information can solve certain practical tasks much better than classical ones. The best known and most exciting one is, without doubt, quantum computing. The basic idea is, roughly speaking, that a quantum computer can operate not only on one number per register but on superpositions of numbers. For some computations this possibility leads to an “exponential speedup”, which makes problems feasible that are considered intractable for any classical algorithm. This is most impressively demonstrated by Shor's factoring algorithm [139,140]. A second example, which is quite close to a concrete practical realization (i.e. outside the laboratory; see next section), is quantum cryptography. Here, the fact that it is impossible to perform a quantum mechanical measurement without disturbing the state of the measured system is used for the secure transmission of a cryptographic key (i.e. each eavesdropping attempt can be detected with certainty). Together with a subsequent application of a classical encryption method known as the “one-time pad”, this leads to a cryptographic scheme with provable security, in contrast to currently used public key systems, whose security relies on possibly doubtful assumptions about (pseudo) random number generators and prime numbers. We will come back to both subjects, quantum computing and quantum cryptography, in Sections 4.5 and 4.6.
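The classical half of this scheme can be sketched in a few lines of Python. The sketch assumes that sender and receiver already share a secret random key; in the scheme described above that key would come from the quantum key distribution step, whereas here it is simply drawn with Python's standard `secrets` module as a stand-in. Encryption and decryption are the same bytewise XOR; the function names are our own.

```python
import secrets


def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """One-time pad: XOR every message byte with the corresponding key byte.
    The key must be uniformly random, at least as long as the message,
    and must never be reused."""
    if len(key) < len(message):
        raise ValueError("key must be at least as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))


# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

message = b"quantum key demo"
# Stand-in for a key established by quantum key distribution (assumption).
key = secrets.token_bytes(len(message))
ciphertext = otp_encrypt(message, key)
print(ciphertext.hex())
print(otp_decrypt(ciphertext, key))
```

Reusing a key voids the guarantee: the XOR of two ciphertexts produced with the same key equals the XOR of the two plaintexts.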
The third answer to the above question is of a more fundamental nature. The discussion of questions from information theory in the context of quantum mechanics leads to a deeper and, in many cases, more quantitative understanding of quantum theory. Maybe the most relevant example for this statement is the study of entanglement, i.e. non-classical correlations between quantum systems, which lead to violations of the Bell inequalities.² Entanglement is a fundamental aspect of quantum mechanics and demonstrates the differences between quantum and classical physics in the most drastic way, as can be seen from Bell-type experiments, like the one of Aspect et al. [5], and the discussion about them. Nevertheless, for a long time it was only considered an exotic feature of the foundations of quantum mechanics which is not so relevant from a practical point of view.
Since quantum information attained broader interest, however, this has changed completely. It has
turned out that entanglement is an essential resource whenever classical information processing is
outperformed by quantum devices. One of the most remarkable examples is the experimental realization of “entanglement enhanced” teleportation [24,22]. We have argued in Section 1.1 that classical
teleportation, i.e. transmission of quantum information through a classical information channel, is
impossible. If sender and receiver share, however, an entangled pair of particles (which can be used
as an additional resource) the impossible task becomes, most surprisingly, possible [11]! (We will
discuss this fact in detail in Section 4.1.) The study of entanglement, and in particular the question of how it can be quantified, is therefore a central topic within quantum information theory (cf. Section 5). Further examples of fields where quantum information has led to a deeper and in particular more quantitative insight include “capacities” of quantum information channels and “quantum cloning”. A detailed discussion of these topics will be given in Sections 6 and 7. Finally, let us remark that classical information theory benefits in a similar way from the synthesis with quantum mechanics. Besides the channel capacities just mentioned, this concerns, for example, the theory of computational complexity, which analyzes how the time and space consumed by an algorithm scale with the size of the input data. Here quantum information challenges, in particular, the fundamental Church–Turing hypothesis [45,152], which claims that each computation can be simulated “efficiently” on a Turing machine; we come back to this topic in Section 4.5.

² This is only a very rough characterization. A more precise one will be given in Section 2.2.
1.3. Experimental realizations
Although this is a theoretical paper, it is of course necessary to say something about experimental
realizations of the ideas of quantum information. Let us consider quantum computing first. Whichever way we go here, we need systems which can be prepared very precisely in a few distinct states (i.e. we need “qubits”), which can afterwards be manipulated individually (we have to realize “quantum gates”) and which can finally be measured with an appropriate observable (we have to “read out” the result).
One of the most highly developed approaches to quantum computing is the ion trap technique (see Sections 4.3 and 5.3 in [23] and Section 7.6 of [122] for an overview and further references). A “quantum register” is realized here by a string of ions kept by electromagnetic fields in high vacuum inside a Paul trap, and two long-lived states of each ion are chosen to represent “0” and “1”. A single ion can be manipulated by laser beams, and this allows the implementation of all “one-qubit gates”. To get two-qubit gates as well (for a quantum computer we need at least one two-qubit gate together with all one-qubit operations; cf. Section 4.5) the collective motional state of the ions has to be used. A “program” on an ion trap quantum computer now starts with the preparation of the register in an initial state, usually the ground state of the ions. This is done by optical pumping and laser cooling (which is in fact one of the most difficult parts of the whole procedure, in particular if many ions are involved). Then the “network” of quantum gates is applied, in terms of a (complicated) sequence of laser pulses. The readout is finally done by laser beams which illuminate the ions successively. The beams are tuned to a fast transition which affects only one of the qubit states, and the fluorescent light is detected. Concrete implementations (see e.g. [118,102]) are currently restricted to two qubits; however, there is some hope that we will be able to control up to 10 or 12 qubits in the not too distant future.
A second quite successful technique is NMR quantum computing (see Section 5.4 of [23] and Section 7.7 of [122] together with the references therein for details). NMR stands for “nuclear magnetic resonance”, the study of transitions between Zeeman levels of an atomic nucleus in a magnetic field. The qubits are in this case different spin states of the nuclei in an appropriate molecule, and quantum gates are realized by high-frequency oscillating magnetic fields in pulses of controlled duration. In contrast to ion traps, however, we do not use one molecule but a whole cup of liquid containing some $10^{20}$ of them. This causes a number of problems, concerning in particular the preparation of an initial state, fluctuations in the free time evolution of the molecules, and the readout. There are several ways to overcome these difficulties, and we refer the reader again to [23,122] for details. Concrete implementations of NMR quantum computers are capable of using up to five qubits [113]. Other realizations include the implementation of several known quantum algorithms on two and three qubits; see e.g. [44,96,109].



The fundamental problem of the two methods for quantum computation discussed so far is their lack of scalability. It is realistic to assume that NMR and ion-trap quantum computers with up to tens of qubits will exist at some point in the future, but not with the thousands of qubits which are necessary for “real-world” applications. There are, however, many alternative proposals available, and some of them might be capable of avoiding this problem. The following is a small (not at all exhaustive) list: atoms in optical lattices [28], semiconductor nanostructures such as quantum dots (there are many works in this area, some recent ones are [149,30,21,29]) and arrays of Josephson junctions [112].
A second circle of experiments we want to mention here is grouped around quantum communication and quantum cryptography (for a more detailed overview we refer to [163,69]). Realizations of quantum cryptography are fairly advanced, and it is currently possible to span up to 50 km with optical fibers (e.g. [93]). Potentially greater distances can be bridged by “free space cryptography”, where the quantum information is transmitted through the air (e.g. [34]). With this technology satellites can be used as some sort of “relays”, thus enabling quantum key distribution over arbitrary distances. In the meantime there are quite a lot of successful implementations. For a detailed discussion we refer the reader to the review of Gisin et al. [69] and the references therein.
Other experiments concern the usage of entanglement in quantum communication. The creation and
detection of entangled photons is here a fundamental building block. Nowadays this is no problem
and the most famous experiment in this context is the one of Aspect et al. [5], where the maximal violation of Bell inequalities was demonstrated with polarization correlated photons. Another
spectacular experiment is the creation of entangled photons over a distance of 10 km using standard telecommunication optical fibers by the Geneva group [151]. Among the most exciting applications of entanglement are the realization of entanglement-based quantum key distribution [95], the first successful “teleportation” of a photon [24,22] and the implementation of “dense coding” [115];
cf. Section 4.1.

2. Basic concepts
After we have got a first, rough impression of the basic ideas and most relevant subjects of quantum information theory, let us start with a more detailed presentation. First, we have to introduce the fundamental notions of the theory and their mathematical description. Fortunately, much of the material we would have to present here, like Hilbert spaces, tensor products and density matrices, is already known from quantum mechanics, and we can focus our discussion on those concepts which are less familiar, like POV measures, completely positive maps and entangled states.
2.1. Systems, states and effects
Like classical probability theory, quantum mechanics is a statistical theory. Hence, its predictions are of a probabilistic nature and can only be tested if the same experiment is repeated very often and the relative frequencies of the outcomes are calculated. In more operational terms this means: the experiment has to be repeated according to the same procedure, as it can be set out in a detailed laboratory manual. If we consider a somewhat idealized model of such a statistical experiment we get, in fact, two different types of procedures: first, preparation procedures, which prepare a certain kind of physical system in a distinguished state, and second, registration procedures, measuring a particular observable.
A mathematical description of such a setup basically consists of two sets $\mathcal{S}$ and $\mathcal{E}$ and a map $\mathcal{S} \times \mathcal{E} \ni (\rho, A) \mapsto \rho(A) \in [0, 1]$. The elements of $\mathcal{S}$ describe the states, i.e. preparations, while the $A \in \mathcal{E}$ represent all yes/no measurements (effects) which can be performed on the system. The probability (i.e. the relative frequency for a large number of repetitions) to get the result “yes”, if we are measuring the effect $A$ on a system prepared in the state $\rho$, is given by $\rho(A)$. This is a very general scheme, applicable not only to quantum mechanics but also to a very broad class of statistical models, containing, in particular, classical probability. In order to make use of it we have to specify, of course, the precise structure of the sets $\mathcal{S}$ and $\mathcal{E}$ and the map $\rho(A)$ for the types of systems we want to discuss.
2.1.1. Operator algebras
Throughout this paper we will encounter three different kinds of systems: quantum and classical systems, and hybrid systems which are half classical, half quantum (cf. Section 2.2.2). In this subsection we will describe a general way to define states and effects which is applicable to all three cases and which therefore provides a handy way to discuss all three cases simultaneously (this will become most useful in Sections 2.2 and 2.3).
The scheme we are going to discuss is based on an algebra $\mathcal{A}$ of bounded operators acting on a Hilbert space $\mathcal{H}$. More precisely, $\mathcal{A}$ is a (closed) linear subspace of $\mathcal{B}(\mathcal{H})$, the algebra of bounded operators on $\mathcal{H}$, which contains the identity ($\mathbb{1} \in \mathcal{A}$) and is closed under products ($A, B \in \mathcal{A} \Rightarrow AB \in \mathcal{A}$) and adjoints ($A \in \mathcal{A} \Rightarrow A^* \in \mathcal{A}$). For simplicity we will refer to each such $\mathcal{A}$ as an observable algebra. The key observation is now that each type of system we will study in the following can be completely characterized by its observable algebra $\mathcal{A}$, i.e. once $\mathcal{A}$ is known there is a systematic way to derive the sets $\mathcal{S}$ and $\mathcal{E}$ and the map $(\rho, A) \mapsto \rho(A)$ from it. We frequently make use of this fact by referring to systems in terms of their observable algebra $\mathcal{A}$, or even by identifying them with their algebra and saying that $\mathcal{A}$ is the system.
Although $\mathcal{A}$ and $\mathcal{H}$ can be infinite dimensional in general, we will consider only finite-dimensional Hilbert spaces, as long as nothing else is explicitly stated. Since most research in quantum information has so far been done for finite-dimensional systems (the only exception in this work is the discussion of Gaussian systems in Section 3.3), this is not too severe a loss of generality. Hence we can choose $\mathcal{H} = \mathbb{C}^d$, and $\mathcal{B}(\mathcal{H})$ is just the algebra of complex $d \times d$ matrices. Since $\mathcal{A}$ is a subalgebra of $\mathcal{B}(\mathcal{H})$, it operates naturally on $\mathcal{H}$ and it inherits from $\mathcal{B}(\mathcal{H})$ the operator norm $\|A\| = \sup_{\|\psi\| = 1} \|A\psi\|$ and the operator ordering $A \geq B \Leftrightarrow \langle\psi, A\psi\rangle \geq \langle\psi, B\psi\rangle\ \forall \psi \in \mathcal{H}$. Now we can define

$$\mathcal{S}(\mathcal{A}) = \{\rho \in \mathcal{A}^* \mid \rho \geq 0,\ \rho(\mathbb{1}) = 1\}, \tag{2.1}$$

where $\mathcal{A}^*$ denotes the dual space of $\mathcal{A}$, i.e. the set of all linear functionals on $\mathcal{A}$, and $\rho \geq 0$ means $\rho(A) \geq 0$ for all $A \geq 0$. Elements of $\mathcal{S}(\mathcal{A})$ describe the states of the system in question, while effects are given by

$$\mathcal{E}(\mathcal{A}) = \{A \in \mathcal{A} \mid A \geq 0,\ A \leq \mathbb{1}\}. \tag{2.2}$$

The probability to measure the effect $A$ in the state $\rho$ is $\rho(A)$. More generally, we can look at $\rho(A)$ for an arbitrary $A$ as the expectation value of $A$ in the state $\rho$. Hence, the idea behind Eq. (2.1) is to define states in terms of their expectation value functionals.
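As a minimal sketch, assuming the quantum case $\mathcal{A} = \mathcal{B}(\mathbb{C}^d)$ of Section 2.1.2 (where functionals are represented by matrices via $\rho(A) = \mathrm{tr}(\rho A)$), the conditions in Eqs. (2.1) and (2.2) can be checked numerically with NumPy. The helper names `is_state`, `is_effect` and `prob` and the tolerance are illustrative assumptions, not notation from the text.

```python
import numpy as np

def is_state(rho, tol=1e-10):
    """Eq. (2.1) for A = B(C^d): rho self-adjoint, rho >= 0 and tr(rho) = 1."""
    rho = np.asarray(rho, dtype=complex)
    if not np.allclose(rho, rho.conj().T, atol=tol):
        return False
    eigs = np.linalg.eigvalsh(rho)
    return bool(eigs.min() >= -tol and abs(np.trace(rho).real - 1) < tol)

def is_effect(A, tol=1e-10):
    """Eq. (2.2): 0 <= A <= 1, i.e. A self-adjoint with all eigenvalues in [0, 1]."""
    A = np.asarray(A, dtype=complex)
    if not np.allclose(A, A.conj().T, atol=tol):
        return False
    eigs = np.linalg.eigvalsh(A)
    return bool(eigs.min() >= -tol and eigs.max() <= 1 + tol)

def prob(rho, A):
    """rho(A) = tr(rho A): probability of the outcome 'yes' for the effect A."""
    return float(np.trace(np.asarray(rho) @ np.asarray(A)).real)

rho = np.array([[0.75, 0.25], [0.25, 0.25]])   # a mixed qubit state
A = np.diag([0.9, 0.1])                        # a "fuzzy" effect (not a projection)
print(is_state(rho), is_effect(A), prob(rho, A))
```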



Both spaces are convex, i.e. $\rho, \sigma \in \mathcal{S}(\mathcal{A})$ and $0 \leq \lambda \leq 1$ imply $\lambda\rho + (1 - \lambda)\sigma \in \mathcal{S}(\mathcal{A})$, and similarly for $\mathcal{E}(\mathcal{A})$. The extremal points of $\mathcal{S}(\mathcal{A})$, respectively $\mathcal{E}(\mathcal{A})$, i.e. those elements which do not admit a proper convex decomposition ($x = \lambda y + (1 - \lambda)z \Rightarrow \lambda = 1$ or $\lambda = 0$ or $y = z = x$), play a distinguished role: the extremal points of $\mathcal{S}(\mathcal{A})$ are pure states and those of $\mathcal{E}(\mathcal{A})$ are the propositions of the system in question. The latter represent those effects which register a property with certainty, in contrast to non-extremal effects, which admit some “fuzziness”. As a simple example of the latter consider a detector which registers particles not with certainty but only with a probability which is smaller than one.
Finally, let us note that the complete discussion of this section can easily be generalized to infinite-dimensional systems if we replace $\mathcal{H} = \mathbb{C}^d$ by an infinite-dimensional Hilbert space (e.g. $\mathcal{H} = L^2(\mathbb{R})$). This would require, however, more material about C*-algebras and measure theory than we want to use in this paper.
2.1.2. Quantum mechanics
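For quantum mechanics we have

$$\mathcal{A} = \mathcal{B}(\mathcal{H}), \tag{2.3}$$

where we have chosen again $\mathcal{H} = \mathbb{C}^d$. The corresponding systems are called $d$-level systems, or qubits if $d = 2$ holds. To avoid clumsy notation we frequently write $\mathcal{S}(\mathcal{H})$ and $\mathcal{E}(\mathcal{H})$ instead of $\mathcal{S}[\mathcal{B}(\mathcal{H})]$ and $\mathcal{E}[\mathcal{B}(\mathcal{H})]$. From Eq. (2.2) we immediately see that an operator $A \in \mathcal{B}(\mathcal{H})$ is an effect iff it is positive and bounded from above by $\mathbb{1}$. An element $P \in \mathcal{E}(\mathcal{H})$ is a proposition iff $P$ is a projection operator ($P^2 = P$).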
States are usually described in quantum mechanics by density matrices, i.e. positive and normalized trace class³ operators. To make contact with the general definition in Eq. (2.1), note first that $\mathcal{B}(\mathcal{H})$ is a Hilbert space with the Hilbert–Schmidt scalar product $\langle A, B\rangle = \mathrm{tr}(A^* B)$. Hence, each linear functional $\rho \in \mathcal{B}(\mathcal{H})^*$ can be expressed in terms of a (trace class) operator $\tilde\rho$ by⁴ $A \mapsto \rho(A) = \mathrm{tr}(\tilde\rho A)$. It is obvious that each $\tilde\rho$ defines a unique functional $\rho$. If, on the other hand, we start with $\rho$, we can recover the matrix elements of $\tilde\rho$ from $\rho$ by $\tilde\rho_{kj} = \mathrm{tr}(\tilde\rho\,|j\rangle\langle k|) = \rho(|j\rangle\langle k|)$, where $|j\rangle\langle k|$ denotes the canonical basis of $\mathcal{B}(\mathcal{H})$ (i.e. $(|j\rangle\langle k|)_{ab} = \delta_{ja}\delta_{kb}$). More generally, we get for $\psi, \phi \in \mathcal{H}$ the relation $\langle\phi, \tilde\rho\psi\rangle = \rho(|\psi\rangle\langle\phi|)$, where $|\psi\rangle\langle\phi|$ now denotes the rank one operator which maps $\eta \in \mathcal{H}$ to $\langle\phi, \eta\rangle\psi$. In the following we drop the $\sim$ and use the same symbol for the operator and the functional whenever confusion can be avoided. By the same abuse of language we will frequently interpret elements of $\mathcal{B}(\mathcal{H})^*$ as (trace class) operators instead of linear functionals (and write $\mathrm{tr}(\rho A)$ instead of $\rho(A)$). However, we do not identify $\mathcal{B}(\mathcal{H})^*$ with $\mathcal{B}(\mathcal{H})$ in general, because the two different notations help to keep track of the distinction between spaces of states and spaces of observables. In addition, we equip $\mathcal{B}^*(\mathcal{H})$ with the trace norm $\|\rho\|_1 = \mathrm{tr}|\rho|$ instead of the operator norm.
Positivity of the functional $\rho$ implies positivity of the operator $\rho$ due to $0 \leq \rho(|\psi\rangle\langle\psi|) = \langle\psi, \rho\psi\rangle$, and the same holds for normalization: $1 = \rho(\mathbb{1}) = \mathrm{tr}(\rho)$. Hence, we can identify the state space from Eq. (2.1) with the set of density matrices, as expected for quantum mechanics. Pure states of a quantum system are the one-dimensional projectors. As usual, we will frequently identify the density matrix $|\psi\rangle\langle\psi|$ with the wave function $\psi$ and call the latter, in abuse of language, a state.

³ On a finite-dimensional Hilbert space this attribute is of course redundant, since each operator is of trace class in this case. Nevertheless, we will frequently use this terminology, due to greater consistency with the infinite-dimensional case.
⁴ If we consider infinite-dimensional systems this is not true. In this case the dual space of the observable algebra is much larger and Eq. (2.1) leads to states which are not necessarily given by trace class operators. Such “singular states” play an important role in theories which admit an infinite number of degrees of freedom, like quantum statistics and quantum field theory; cf. [25,26]. For applications of singular states within quantum information see [97].
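The reconstruction rule $\tilde\rho_{kj} = \rho(|j\rangle\langle k|)$ can be illustrated in a few lines of NumPy. This is a sketch with a hypothetical helper `matrix_from_functional`, which only assumes that the functional can be evaluated on $d \times d$ matrices; here it is a “black box” built from a known density matrix so the result can be compared.

```python
import numpy as np

def matrix_from_functional(rho_fn, d):
    """Rebuild the matrix of a linear functional on B(C^d) via rho_kj = rho(|j><k|)."""
    mat = np.zeros((d, d), dtype=complex)
    for j in range(d):
        for k in range(d):
            E_jk = np.zeros((d, d), dtype=complex)
            E_jk[j, k] = 1.0            # matrix unit |j><k|, entries delta_ja delta_kb
            mat[k, j] = rho_fn(E_jk)
    return mat

# A functional known only through its values: A -> tr(rho_true A)
rho_true = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
rho_fn = lambda A: np.trace(rho_true @ A)
print(np.allclose(matrix_from_functional(rho_fn, 2), rho_true))
```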
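To get a useful parameterization of the state space, consider again the Hilbert–Schmidt scalar product $\langle\rho, \sigma\rangle = \mathrm{tr}(\rho^* \sigma)$, but now on $\mathcal{B}^*(\mathcal{H})$. The space of trace-free matrices in $\mathcal{B}^*(\mathcal{H})$ (alternatively the functionals with $\rho(\mathbb{1}) = 0$) is the corresponding orthocomplement $\mathbb{1}^\perp$ of the unit operator. If we choose a basis $\sigma_1, \dots, \sigma_{d^2-1}$ with $\langle\sigma_j, \sigma_k\rangle = 2\delta_{jk}$ in $\mathbb{1}^\perp$, we can write each self-adjoint (trace class) operator $\rho$ with $\mathrm{tr}(\rho) = 1$ as

$$\rho = \frac{\mathbb{1}}{d} + \frac{1}{2}\sum_{j=1}^{d^2-1} x_j \sigma_j =: \frac{\mathbb{1}}{d} + \frac{1}{2}\vec{x} \cdot \vec{\sigma}, \quad \text{with } \vec{x} \in \mathbb{R}^{d^2-1}. \tag{2.4}$$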
If $d = 2$ or $d = 3$ holds, it is most natural to choose the Pauli matrices, respectively the Gell-Mann matrices (cf. e.g. [48], Section 13.4), for the $\sigma_j$. In the qubit case it is easy to see that $\rho \geq 0$ holds iff $|\vec{x}| \leq 1$. Hence the state space $\mathcal{S}(\mathbb{C}^2)$ coincides with the Bloch ball $\{\vec{x} \in \mathbb{R}^3 \mid |\vec{x}| \leq 1\}$, and the set of pure states with its boundary, the Bloch sphere $\{\vec{x} \in \mathbb{R}^3 \mid |\vec{x}| = 1\}$. This shows in a very geometric way that the pure states are the extremal points of the convex set $\mathcal{S}(\mathcal{H})$. If, more generally, $\rho$ is a pure state of a $d$-level system, we get

$$1 = \mathrm{tr}(\rho^2) = \frac{1}{d} + \frac{1}{2}|\vec{x}|^2 \quad\Rightarrow\quad |\vec{x}| = \sqrt{2(1 - 1/d)}. \tag{2.5}$$

This implies that all states are contained in the ball with radius $2^{1/2}(1 - 1/d)^{1/2}$; however, not all operators in this set are positive. A simple example is $d^{-1}\mathbb{1} \pm 2^{-1/2}(1 - 1/d)^{1/2}\sigma_j$ (i.e. $\vec{x} = \pm|\vec{x}|\,\vec{e}_j$ in Eq. (2.4), with $|\vec{x}|$ from Eq. (2.5)), which is positive for both signs only if $d = 2$ holds.
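A small numerical illustration of Eqs. (2.4) and (2.5), assuming NumPy. For a qubit, the Bloch vector follows from $x_j = \mathrm{tr}(\rho\sigma_j)$, which is a consequence of Eq. (2.4) and $\mathrm{tr}(\sigma_j\sigma_k) = 2\delta_{jk}$. For $d = 3$ the sketch builds the operator with $\vec{x} = \pm\sqrt{2(1 - 1/d)}\,\vec{e}_3$ in Eq. (2.4) from the Gell-Mann matrix $\lambda_3 = \mathrm{diag}(1, -1, 0)$ and shows that it has a negative eigenvalue for either sign. The helper names are illustrative.

```python
import numpy as np

# Pauli matrices: a basis of the trace-free 2x2 matrices with tr(s_j s_k) = 2 delta_jk
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def bloch_vector(rho):
    """x_j = tr(rho sigma_j), inverting rho = 1/2 + (1/2) x.sigma (Eq. (2.4), d = 2)."""
    return np.array([np.trace(rho @ s).real for s in PAULI])

def from_bloch(x, basis, d):
    """rho = 1/d + (1/2) sum_j x_j sigma_j for the given trace-free basis elements."""
    return np.eye(d) / d + 0.5 * sum(xj * s for xj, s in zip(x, basis))

psi = np.array([1, 1j]) / np.sqrt(2)          # a pure qubit state
rho = np.outer(psi, psi.conj())
x = bloch_vector(rho)
print(x, np.linalg.norm(x))                    # pure state: |x| = 1, on the Bloch sphere

# d = 3: radius from Eq. (2.5) along the Gell-Mann direction lambda_3
d = 3
lam3 = np.diag([1.0, -1.0, 0.0]).astype(complex)
r = np.sqrt(2 * (1 - 1 / d))
for sign in (+1, -1):
    sigma = from_bloch([sign * r], [lam3], d)
    print(sign, np.linalg.eigvalsh(sigma).min())   # negative: not a state
```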
2.1.3. Classical probability
Since the difference between classical and quantum systems is an important issue in this work, let us reformulate classical probability theory according to the general scheme from Section 2.1.1. The restriction to finite-dimensional observable algebras now leads to the assumption that all systems we are considering admit a finite set $X$ of elementary events. Typical examples are: throwing a die, $X = \{1, \dots, 6\}$; tossing a coin, $X = \{\text{head}, \text{number}\}$; or classical bits, $X = \{0, 1\}$. To simplify the notation we write (as in quantum mechanics) $\mathcal{S}(X)$ and $\mathcal{E}(X)$ for the spaces of states and effects. The observable algebra $\mathcal{A}$ of such a system is the space

$$\mathcal{A} = C(X) = \{f : X \to \mathbb{C}\} \tag{2.6}$$

of complex-valued functions on $X$. To interpret this as an operator algebra acting on a Hilbert space $\mathcal{H}$ (as indicated in Section 2.1.1), choose an arbitrary but fixed orthonormal basis $|x\rangle$, $x \in X$, in $\mathcal{H}$ and identify the function $f \in C(X)$ with the operator $f = \sum_x f_x |x\rangle\langle x| \in \mathcal{B}(\mathcal{H})$ (we use the same symbol for the function and the operator, provided confusion can be avoided). Most frequently we have $X = \{1, \dots, d\}$, and we can choose $\mathcal{H} = \mathbb{C}^d$ and the canonical basis for $|x\rangle$. Hence, $C(X)$ becomes the algebra of diagonal $d \times d$ matrices. Using Eq. (2.2) we immediately see that $f \in C(X)$ is an effect iff $0 \leq f_x \leq 1$ for all $x \in X$. Physically, we can interpret $f_x$ as the probability that the effect $f$ registers the elementary event $x$. This makes the distinction between propositions and “fuzzy” effects very transparent: $P \in \mathcal{E}(X)$ is a proposition iff we have either $P_x = 1$ or $P_x = 0$ for all $x \in X$. Hence, the propositions $P \in C(X)$ are in one-to-one correspondence with the subsets $\omega_P = \{x \in X \mid P_x = 1\} \subset X$, which in turn describe the events of the system. Thus, $P$ registers the event $\omega_P$ with certainty, while a fuzzy effect $f < P$ does this only with a probability less than one.
Since $C(X)$ is finite dimensional and admits the distinguished basis $|x\rangle\langle x|$, $x \in X$, it is naturally isomorphic to its dual $C^*(X)$. More precisely: each linear functional $\rho \in C^*(X)$ defines, and is uniquely defined by, the function $x \mapsto \rho_x = \rho(|x\rangle\langle x|)$, and we have $\rho(f) = \sum_x f_x \rho_x$. As in the quantum case we will identify the function $\rho$ with the linear functional and use the same symbol for both, although we keep the notation $C^*(X)$ to indicate that we are talking about states rather than observables. Positivity of $\rho \in C^*(X)$ is given by $\rho_x \geq 0$ for all $x$, and normalization leads to $1 = \rho(\mathbb{1}) = \rho(\sum_x |x\rangle\langle x|) = \sum_x \rho_x$. Hence, to be a state, $\rho \in C^*(X)$ must be a probability distribution on $X$, and $\rho_x$ is the probability that the elementary event $x$ occurs during statistical experiments with systems in the state $\rho$. More generally, $\rho(f) = \sum_j \rho_j f_j$ is the probability to measure the effect $f$ on systems in the state $\rho$. If $P$ is, in particular, a proposition, $\rho(P)$ gives the probability for the event $\omega_P$. The pure states of the system are the Dirac measures $\delta_x$, $x \in X$, with $\delta_x(|y\rangle\langle y|) = \delta_{xy}$. Hence, each $\rho \in \mathcal{S}(X)$ can be decomposed in a unique way into a convex linear combination of pure states.
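The die example can be made concrete by storing states and effects of $C(X)$ as arrays indexed by $X$, or equivalently as diagonal matrices as in Section 2.1.1. A minimal NumPy sketch; the 90% detector efficiency and the function names are illustrative assumptions.

```python
import numpy as np

# X = {1, ..., 6}: entry i of each array refers to the die face i + 1
rho = np.full(6, 1 / 6)                          # state: a probability distribution on X
P = np.array([0, 0, 0, 0, 1, 1], dtype=float)    # proposition for the event omega_P = {5, 6}
g = 0.9 * P                                      # fuzzy effect g < P (illustrative 90% efficiency)

def expectation(rho, f):
    """rho(f) = sum_x f_x rho_x, with rho and f stored as functions on X (arrays)."""
    return float(np.dot(f, rho))

def expectation_matrix(rho, f):
    """Same value as tr(rho f) with rho and f embedded as diagonal matrices."""
    return float(np.trace(np.diag(rho) @ np.diag(f)))

print(expectation(rho, P), expectation(rho, g))
print(np.isclose(expectation(rho, g), expectation_matrix(rho, g)))
```

Storing $C(X)$ as plain arrays is the natural choice here, since the diagonal-matrix embedding carries no extra information.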
2.1.4. Observables
Up to now we have discussed only effects, i.e. yes/no experiments. In this subsection we will have a first short look at more general observables. We will come back to this topic in Section 3.2.4 after we have introduced channels. We can think of an observable $E$ taking its values in a finite set $X$ as a map which associates to each possible outcome $x \in X$ the effect $E_x \in \mathcal{E}(\mathcal{A})$ (if $\mathcal{A}$ is the observable algebra of the system in question) which is true if $x$ is measured and false otherwise. If the measurement is performed on systems in the state $\rho$ we get for each $x \in X$ the probability $p_x = \rho(E_x)$ to measure $x$. Hence, the family of the $p_x$ should be a probability distribution on $X$, and this implies that $E$ should be a positive operator-valued measure (POV measure) on $X$.
Definition 2.1. Consider an observable algebra $\mathcal{A} \subset \mathcal{B}(\mathcal{H})$ and a finite⁵ set $X$. A family $E = (E_x)_{x \in X}$ of effects in $\mathcal{A}$ (i.e. $0 \leq E_x \leq \mathbb{1}$) is called a POV measure on $X$ if $\sum_{x \in X} E_x = \mathbb{1}$ holds. If all $E_x$ are projections, $E$ is called a projection-valued measure (PV measure).
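Definition 2.1 can be checked mechanically. The following NumPy sketch assumes the quantum case $\mathcal{A} = \mathcal{B}(\mathbb{C}^2)$ and uses, as an illustrative choice not taken from the text, the three-outcome “trine” POV measure $E_k = \tfrac{2}{3}|\psi_k\rangle\langle\psi_k|$ whose Bloch vectors are 120 degrees apart; it is a POV measure but not a PV measure. The function names are hypothetical.

```python
import numpy as np

def is_povm(effects, tol=1e-10):
    """Definition 2.1: every E_x is an effect (0 <= E_x <= 1) and sum_x E_x = 1."""
    d = effects[0].shape[0]
    for E in effects:
        if not np.allclose(E, E.conj().T, atol=tol):
            return False                          # effects must be self-adjoint
        eigs = np.linalg.eigvalsh(E)
        if eigs.min() < -tol or eigs.max() > 1 + tol:
            return False                          # violates 0 <= E_x <= 1
    return bool(np.allclose(sum(effects), np.eye(d), atol=tol))

def is_pvm(effects, tol=1e-10):
    """PV measure: a POV measure whose elements are projections (E_x^2 = E_x)."""
    return is_povm(effects, tol) and all(np.allclose(E @ E, E, atol=tol) for E in effects)

# "Trine" POV measure on a qubit: Bloch vectors at angles 0, 120 and 240 degrees
trine = []
for k in range(3):
    theta = 2 * np.pi * k / 3
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    trine.append(2 / 3 * np.outer(psi, psi))

rho = np.array([[1.0, 0.0], [0.0, 0.0]])          # the state |0><0|
p = [np.trace(rho @ E).real for E in trine]       # p_x = rho(E_x)
print(is_povm(trine), is_pvm(trine), p, sum(p))
```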
From basic quantum mechanics we know that observables are described by self-adjoint operators on a Hilbert space $\mathcal{H}$. But how does this point of view fit into the previous definition? The answer is given by the spectral theorem [134, Theorem VIII.6]: each self-adjoint operator $A$ on a finite-dimensional Hilbert space $\mathcal{H}$ has the form $A = \sum_{\lambda \in \sigma(A)} \lambda P_\lambda$, where $\sigma(A)$ denotes the spectrum of $A$, i.e. the set of eigenvalues, and $P_\lambda$ denotes the projection onto the corresponding eigenspace. Hence, there is a unique PV measure $P = (P_\lambda)_{\lambda \in \sigma(A)}$ associated to $A$, which is called the spectral measure of $A$. It is uniquely characterized by the property that the expectation value $\sum_\lambda \lambda\,\rho(P_\lambda)$ of $P$ in the state $\rho$ is given for any state $\rho$ by $\rho(A) = \mathrm{tr}(\rho A)$, as is well known from quantum mechanics. Hence, the traditional way to define observables within quantum mechanics fits perfectly into the scheme just outlined; however, it only covers the projection-valued case and therefore admits no fuzziness. For this reason POV measures are sometimes called generalized observables.
⁵ This is of course an artificial restriction and in many situations not justified (cf. in particular the discussion of quantum state estimation in Section 4.2 and Section 7). However, it helps us to avoid measure-theoretical subtleties; cf. Holevo’s book [79] for a more general discussion.
Finally, note that the eigenprojections $P_\lambda$ of $A$ are elements of an observable algebra $\mathcal{A}$ iff $A \in \mathcal{A}$. This shows two things. First of all, we can consider self-adjoint elements of any *-subalgebra $\mathcal{A}$ of $\mathcal{B}(\mathcal{H})$ as observables of $\mathcal{A}$-systems, and this is precisely the reason why we have called $\mathcal{A}$ an observable algebra. Secondly, we see why it is essential that $\mathcal{A}$ is really a subalgebra of $\mathcal{B}(\mathcal{H})$: if it is only a linear subspace of $\mathcal{B}(\mathcal{H})$, the relation $A \in \mathcal{A}$ does not imply $P_\lambda \in \mathcal{A}$.
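The spectral measure of a Hermitian matrix, as given by the spectral theorem above, can be computed numerically from its eigendecomposition. In this sketch (NumPy assumed, helper name hypothetical) eigenvalues are grouped after rounding, so that degenerate eigenvalues share one projection $P_\lambda$. The checks confirm $A = \sum_\lambda \lambda P_\lambda$ and $\sum_\lambda \lambda\,\rho(P_\lambda) = \mathrm{tr}(\rho A)$ for one test state.

```python
import numpy as np

def spectral_measure(A, decimals=10):
    """Return {eigenvalue: projection onto its eigenspace} for a self-adjoint matrix A.
    Eigenvalues are rounded before grouping, so a degenerate eigenvalue gets one projection."""
    eigvals, eigvecs = np.linalg.eigh(A)
    measure = {}
    for lam, v in zip(np.round(eigvals, decimals), eigvecs.T):
        measure[lam] = measure.get(lam, 0) + np.outer(v, v.conj())
    return measure

A = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]])   # spectrum {0, 2}
P = spectral_measure(A)
rho = np.diag([0.5, 0.25, 0.25])

print(sorted(P))                                                     # the spectrum sigma(A)
print(np.allclose(sum(lam * Pl for lam, Pl in P.items()), A))        # A = sum_lam lam P_lam
print(np.isclose(sum(lam * np.trace(rho @ Pl) for lam, Pl in P.items()),
                 np.trace(rho @ A)))                                 # expectation values agree
```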
2.2. Composite systems and entangled states
Composite systems occur in many places in quantum information theory. A typical example is a
register of a quantum computer, which can be regarded as a system consisting of N qubits (if N is
the length of the register). The crucial point is that this opens the possibility for correlations and
entanglement between subsystems. In particular, entanglement is of great importance, because it is
a central resource in many applications of quantum information theory like entanglement enhanced
teleportation or quantum computing—we already discussed this in Section 1.2 of the Introduction.
To explain entanglement in greater detail and to introduce some necessary formalism we have to
complement the scheme developed in the last section by a procedure which allows us to construct
states and observables of the composite system from its subsystems. In quantum mechanics this is
done, of course, in terms of tensor products, and we will review in the following some of the most
relevant material.
2.2.1. Tensor products
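Consider two (finite-dimensional) Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$. To each pair of vectors $\psi_1 \in \mathcal{H}$, $\psi_2 \in \mathcal{K}$ we can associate a bilinear form $\psi_1 \otimes \psi_2$, called the tensor product of $\psi_1$ and $\psi_2$, by $\psi_1 \otimes \psi_2(\phi_1, \phi_2) = \langle\psi_1, \phi_1\rangle\langle\psi_2, \phi_2\rangle$. For two product vectors $\psi_1 \otimes \psi_2$ and $\eta_1 \otimes \eta_2$ their scalar product is defined by $\langle\psi_1 \otimes \psi_2, \eta_1 \otimes \eta_2\rangle = \langle\psi_1, \eta_1\rangle\langle\psi_2, \eta_2\rangle$, and it can be shown that this definition extends in a unique way to the span of all $\psi_1 \otimes \psi_2$, which therefore defines the tensor product $\mathcal{H} \otimes \mathcal{K}$. If we have more than two Hilbert spaces $\mathcal{H}_j$, $j = 1, \dots, N$, their tensor product $\mathcal{H}_1 \otimes \dots \otimes \mathcal{H}_N$ can be defined similarly.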
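The tensor product $A_1 \otimes A_2$ of two bounded operators $A_1 \in \mathcal{B}(\mathcal{H})$, $A_2 \in \mathcal{B}(\mathcal{K})$ is defined first for product vectors $\psi_1 \otimes \psi_2 \in \mathcal{H} \otimes \mathcal{K}$ by $A_1 \otimes A_2(\psi_1 \otimes \psi_2) = (A_1\psi_1) \otimes (A_2\psi_2)$ and then extended by linearity. The space $\mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ coincides with the span of all $A_1 \otimes A_2$. If $\rho \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ is not of product form (and of trace class for infinite-dimensional $\mathcal{H}$ and $\mathcal{K}$) there is nevertheless a way to define “restrictions” to $\mathcal{H}$, respectively $\mathcal{K}$, called the partial trace of $\rho$. It is defined by the equation

$$\mathrm{tr}[\mathrm{tr}_{\mathcal{K}}(\rho) A] = \mathrm{tr}(\rho\, A \otimes \mathbb{1}) \quad \forall A \in \mathcal{B}(\mathcal{H}), \tag{2.7}$$

where the trace on the left-hand side is over $\mathcal{H}$ and on the right-hand side over $\mathcal{H} \otimes \mathcal{K}$.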
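For $\mathcal{H} = \mathbb{C}^{d_H}$ and $\mathcal{K} = \mathbb{C}^{d_K}$ the partial trace amounts to an index contraction. The sketch below assumes NumPy's `np.kron` ordering for $\mathcal{H} \otimes \mathcal{K}$ and checks the defining property (2.7) on a random state and a random test operator; the function name is hypothetical.

```python
import numpy as np

def partial_trace_K(rho, dH, dK):
    """tr_K of an operator on C^dH (x) C^dK (np.kron ordering): view rho as the
    4-index tensor rho[a, b, a', b'] and sum over b = b'."""
    return np.einsum('ibjb->ij', rho.reshape(dH, dK, dH, dK))

rng = np.random.default_rng(0)
dH, dK = 2, 3
M = rng.normal(size=(dH * dK, dH * dK)) + 1j * rng.normal(size=(dH * dK, dH * dK))
rho = M @ M.conj().T
rho /= np.trace(rho)                            # a random density matrix on H (x) K
A = rng.normal(size=(dH, dH))                   # an arbitrary test operator on H

lhs = np.trace(partial_trace_K(rho, dH, dK) @ A)
rhs = np.trace(rho @ np.kron(A, np.eye(dK)))     # A (x) 1 in the np.kron ordering
print(np.isclose(lhs, rhs))                      # Eq. (2.7)
```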
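If two orthonormal bases $\phi_1, \dots, \phi_n$ and $\psi_1, \dots, \psi_m$ are given in $\mathcal{H}$, respectively $\mathcal{K}$, we can consider the product basis $\phi_1 \otimes \psi_1, \dots, \phi_n \otimes \psi_m$ in $\mathcal{H} \otimes \mathcal{K}$, and we can expand each $\Psi \in \mathcal{H} \otimes \mathcal{K}$ as $\Psi = \sum_{jk} \Psi_{jk}\, \phi_j \otimes \psi_k$ with $\Psi_{jk} = \langle\phi_j \otimes \psi_k, \Psi\rangle$. This procedure works for an arbitrary number of tensor factors. However, if we have exactly a twofold tensor product, there is a more economical way to expand $\Psi$, called the Schmidt decomposition, in which only diagonal terms of the form $\phi_j \otimes \psi_j$ appear.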
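Proposition 2.2. For each element $\Psi$ of the twofold tensor product $\mathcal{H} \otimes \mathcal{K}$ there are orthonormal systems $\phi_j$, $j = 1, \dots, n$, and $\psi_k$, $k = 1, \dots, n$ (not necessarily bases; i.e. $n$ can be smaller than $\dim\mathcal{H}$ and $\dim\mathcal{K}$) of $\mathcal{H}$ and $\mathcal{K}$, respectively, such that $\Psi = \sum_j \sqrt{\lambda_j}\, \phi_j \otimes \psi_j$ holds. The $\lambda_j$ are uniquely determined by $\Psi$, and so are the $\phi_j$ and $\psi_j$ up to phases, provided the $\lambda_j$ are non-degenerate. The expansion is called the Schmidt decomposition and the numbers $\sqrt{\lambda_j}$ are the Schmidt coefficients.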
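Proof. Consider the partial trace $\rho_1 = \mathrm{tr}_{\mathcal{K}}(|\Psi\rangle\langle\Psi|)$ of the one-dimensional projector $|\Psi\rangle\langle\Psi|$ associated to $\Psi$. It can be decomposed in terms of its eigenvectors $\phi_j$, and we get $\mathrm{tr}_{\mathcal{K}}(|\Psi\rangle\langle\Psi|) = \rho_1 = \sum_j \lambda_j |\phi_j\rangle\langle\phi_j|$. Now we can choose an orthonormal basis $\psi'_k$, $k = 1, \dots, m$, in $\mathcal{K}$ and expand $\Psi$ with respect to $\phi_j \otimes \psi'_k$. Carrying out the $k$ summation we get a family of vectors $\psi''_j = \sum_k \langle\phi_j \otimes \psi'_k, \Psi\rangle\, \psi'_k$ with the property $\Psi = \sum_j \phi_j \otimes \psi''_j$. Now we can calculate the partial trace and get for any $A \in \mathcal{B}(\mathcal{H})$:

$$\sum_j \lambda_j \langle\phi_j, A\phi_j\rangle = \mathrm{tr}(\rho_1 A) = \langle\Psi, (A \otimes \mathbb{1})\Psi\rangle = \sum_{j,k} \langle\phi_j, A\phi_k\rangle \langle\psi''_j, \psi''_k\rangle. \tag{2.8}$$

Since $A$ is arbitrary we can compare the left- and right-hand side of this equation term by term, and we get $\langle\psi''_j, \psi''_k\rangle = \delta_{jk}\lambda_j$. Hence, $\psi''_j = 0$ whenever $\lambda_j = 0$, and $\psi_j = \lambda_j^{-1/2}\psi''_j$ (for $\lambda_j > 0$) is the desired orthonormal system.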
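Numerically, the Schmidt decomposition is usually obtained from a singular value decomposition of the coefficient matrix $\Psi_{jk}$ rather than via the eigenvectors of $\rho_1$ as in the proof; both routes yield the same $\lambda_j$, which the sketch checks. NumPy, the swapped-in SVD route and the helper name `schmidt` are assumptions of this illustration.

```python
import numpy as np

def schmidt(Psi, dH, dK, tol=1e-12):
    """Schmidt decomposition of Psi in C^dH (x) C^dK via the SVD of the coefficient
    matrix Psi_jk: Psi = sum_j s_j phi_j (x) psi_j with s_j = sqrt(lambda_j)."""
    U, s, Vh = np.linalg.svd(Psi.reshape(dH, dK), full_matrices=False)
    keep = s > tol                                   # drop vanishing coefficients
    return s[keep], U[:, keep].T, Vh[keep, :]        # coefficients, rows phi_j, rows psi_j

rng = np.random.default_rng(1)
dH, dK = 2, 3
Psi = rng.normal(size=dH * dK) + 1j * rng.normal(size=dH * dK)
Psi /= np.linalg.norm(Psi)

coeffs, phis, psis = schmidt(Psi, dH, dK)
rebuilt = sum(c * np.kron(p, q) for c, p, q in zip(coeffs, phis, psis))
rho1 = np.einsum('ibjb->ij', np.outer(Psi, Psi.conj()).reshape(dH, dK, dH, dK))
print(np.allclose(rebuilt, Psi))                                   # Psi recovered
print(np.allclose(np.sort(coeffs**2), np.linalg.eigvalsh(rho1)))   # lambda_j = spec(rho_1)
```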
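As an immediate application of this result we can show that each mixed state $\rho \in \mathcal{B}^*(\mathcal{H})$ (of the quantum system $\mathcal{B}(\mathcal{H})$) can be regarded as a pure state on a larger Hilbert space $\mathcal{H} \otimes \mathcal{H}'$. We just have to consider the eigenvalue expansion $\rho = \sum_j \lambda_j |\phi_j\rangle\langle\phi_j|$ of $\rho$ and to choose an arbitrary orthonormal system $\psi_j$, $j = 1, \dots, n$, in $\mathcal{H}'$. Using Proposition 2.2 (with $\Psi = \sum_j \sqrt{\lambda_j}\, \phi_j \otimes \psi_j$) we get
Corollary 2.3. Each state $\rho \in \mathcal{B}^*(\mathcal{H})$ can be extended to a pure state $\Psi$ on a larger system with Hilbert space $\mathcal{H} \otimes \mathcal{H}'$ such that $\mathrm{tr}_{\mathcal{H}'}|\Psi\rangle\langle\Psi| = \rho$ holds.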
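The construction behind Corollary 2.3 can be sketched directly: diagonalize $\rho$, pick the canonical basis of $\mathcal{H}' = \mathbb{C}^d$ as the orthonormal system $\psi_j$ (one arbitrary choice, as the text allows), and form $\Psi = \sum_j \sqrt{\lambda_j}\,\phi_j \otimes \psi_j$. NumPy and the helper name `purify` are assumptions.

```python
import numpy as np

def purify(rho):
    """Psi = sum_j sqrt(lambda_j) phi_j (x) e_j in H (x) H', with the eigenvectors phi_j
    of rho and (an assumed choice) the canonical basis e_j of H' = C^d."""
    lam, phi = np.linalg.eigh(rho)
    d = rho.shape[0]
    lam = np.clip(lam, 0, None)                    # remove tiny negative rounding errors
    return sum(np.sqrt(lam[j]) * np.kron(phi[:, j], np.eye(d)[j]) for j in range(d))

rho = np.array([[0.7, 0.1j], [-0.1j, 0.3]])        # a mixed qubit state
Psi = purify(rho)
d = 2
reduced = np.einsum('ibjb->ij', np.outer(Psi, Psi.conj()).reshape(d, d, d, d))  # tr_H'
print(np.isclose(np.linalg.norm(Psi), 1), np.allclose(reduced, rho))
```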

2.2.2. Compound and hybrid systems
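To discuss the composition of two arbitrary (i.e. classical or quantum) systems it is very convenient to use the scheme developed in Section 2.1.1 and to talk about the two subsystems in terms of their observable algebras $\mathcal{A} \subset \mathcal{B}(\mathcal{H})$ and $\mathcal{B} \subset \mathcal{B}(\mathcal{K})$. The observable algebra of the composite system is then simply given by the tensor product of $\mathcal{A}$ and $\mathcal{B}$, i.e.

$$\mathcal{A} \otimes \mathcal{B} := \mathrm{span}\{A \otimes B \mid A \in \mathcal{A},\ B \in \mathcal{B}\} \subset \mathcal{B}(\mathcal{H} \otimes \mathcal{K}). \tag{2.9}$$

The dual of $\mathcal{A} \otimes \mathcal{B}$ is generated by product states, $(\rho \otimes \sigma)(A \otimes B) = \rho(A)\sigma(B)$, and we therefore write $\mathcal{A}^* \otimes \mathcal{B}^*$ for $(\mathcal{A} \otimes \mathcal{B})^*$.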
The interpretation of the composed system $\mathcal{A} \otimes \mathcal{B}$ in terms of states and effects is straightforward and therefore postponed to the next subsection. We will first consider the special cases arising from different choices for $\mathcal{A}$ and $\mathcal{B}$. If both systems are quantum ($\mathcal{A} = \mathcal{B}(\mathcal{H})$ and $\mathcal{B} = \mathcal{B}(\mathcal{K})$) we get

$$\mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathcal{K}) = \mathcal{B}(\mathcal{H} \otimes \mathcal{K}) \tag{2.10}$$

as expected. For two classical systems $\mathcal{A} = C(X)$ and $\mathcal{B} = C(Y)$ recall that elements of $C(X)$ (respectively $C(Y)$) are complex-valued functions on $X$ (on $Y$). Hence, the tensor product $C(X) \otimes C(Y)$ consists of complex-valued functions on $X \times Y$, i.e. $C(X) \otimes C(Y) = C(X \times Y)$. In other words, states and observables of the composite system $C(X) \otimes C(Y)$ are, in accordance with classical probability theory, given by probability distributions and random variables on the Cartesian product $X \times Y$.
If only one subsystem is classical and the other is quantum, e.g. a microparticle interacting with a classical measuring device, we have a hybrid system. The elements of its observable algebra $C(X) \otimes \mathcal{B}(\mathcal{H})$ can be regarded as operator-valued functions on $X$, i.e. $X \ni x \mapsto A_x \in \mathcal{B}(\mathcal{H})$, and $A$ is an effect iff $0 \leq A_x \leq \mathbb{1}$ holds for all $x \in X$. The elements of the dual $C^*(X) \otimes \mathcal{B}^*(\mathcal{H})$ are in a similar way $\mathcal{B}^*(\mathcal{H})$-valued functions $X \ni x \mapsto \rho_x \in \mathcal{B}^*(\mathcal{H})$, and $\rho$ is a state iff each $\rho_x$ is a positive trace class operator on $\mathcal{H}$ and $\sum_x \mathrm{tr}(\rho_x) = 1$. The probability to measure the effect $A$ in the state $\rho$ is $\sum_x \rho_x(A_x)$.
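A minimal sketch of a hybrid system with $X = \{0, 1\}$ and $\mathcal{H} = \mathbb{C}^2$, assuming NumPy. States and effects are stored as dictionaries $x \mapsto \rho_x$ and $x \mapsto A_x$ (an illustrative layout with hypothetical function names). The last check embeds both as block-diagonal matrices $\sum_x |x\rangle\langle x| \otimes M_x$, which reproduces the same probability as an ordinary trace.

```python
import numpy as np

# Hybrid system C(X) (x) B(C^2) with X = {0, 1}, stored as dicts x -> matrix
rho = {0: np.array([[0.4, 0.1], [0.1, 0.2]]),     # rho_x >= 0 with sum_x tr(rho_x) = 1
       1: np.array([[0.1, 0.0], [0.0, 0.3]])}
A = {0: np.diag([1.0, 0.0]),                      # effects: 0 <= A_x <= 1 for each x
     1: np.diag([0.5, 0.5])}

def is_hybrid_state(rho, tol=1e-10):
    """Each rho_x is positive and sum_x tr(rho_x) = 1."""
    positive = all(np.linalg.eigvalsh(r).min() >= -tol for r in rho.values())
    return positive and abs(sum(np.trace(r) for r in rho.values()) - 1) < tol

def hybrid_prob(rho, A):
    """Probability of the effect A in the state rho: sum_x tr(rho_x A_x)."""
    return sum(np.trace(rho[x] @ A[x]) for x in rho)

def block_diag(ops):
    """Embed x -> M_x as the block-diagonal matrix sum_x |x><x| (x) M_x."""
    d = ops[0].shape[0]
    out = np.zeros((len(ops) * d, len(ops) * d), dtype=ops[0].dtype)
    for i, op in enumerate(ops):
        out[i * d:(i + 1) * d, i * d:(i + 1) * d] = op
    return out

p = hybrid_prob(rho, A)
p_block = np.trace(block_diag([rho[0], rho[1]]) @ block_diag([A[0], A[1]]))
print(is_hybrid_state(rho), p, np.isclose(p, p_block))
```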
2.2.3. Correlations and entanglement
Let us now consider two effects $A \in \mathcal{A}$ and $B \in \mathcal{B}$; then $A \otimes B$ is an effect of the composite system $\mathcal{A} \otimes \mathcal{B}$. It is interpreted as the joint measurement of $A$ on the first and $B$ on the second subsystem, where the “yes” outcome means “both effects give yes”. In particular, $A \otimes \mathbb{1}$ means to measure $A$ on the first subsystem and to ignore the second one completely. If $\rho$ is a state of $\mathcal{A} \otimes \mathcal{B}$ we can define its restrictions by $\rho^A(A) = \rho(A \otimes \mathbb{1})$ and $\rho^B(B) = \rho(\mathbb{1} \otimes B)$. If both systems are quantum, the restrictions of $\rho$ are the partial traces, while in the classical case we have to sum over the $\mathcal{B}$, respectively $\mathcal{A}$, variables. For two states $\rho_1 \in \mathcal{S}(\mathcal{A})$ and $\rho_2 \in \mathcal{S}(\mathcal{B})$ there is always a state $\rho$ of $\mathcal{A} \otimes \mathcal{B}$ such that $\rho_1 = \rho^A$ and $\rho_2 = \rho^B$ holds: we just have to choose the product state $\rho_1 \otimes \rho_2$. However, in general we have $\rho \neq \rho^A \otimes \rho^B$, which means nothing else than that $\rho$ also contains correlations between the two subsystems.
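Definition 2.4. A state $\rho$ of a bipartite system $\mathcal{A} \otimes \mathcal{B}$ is called correlated if there are some $A \in \mathcal{A}$, $B \in \mathcal{B}$ such that $\rho(A \otimes B) \neq \rho^A(A)\rho^B(B)$ holds.
We immediately see that $\rho = \rho_1 \otimes \rho_2$ implies $\rho(A \otimes B) = \rho_1(A)\rho_2(B) = \rho^A(A)\rho^B(B)$; hence $\rho$ is not correlated. If, on the other hand, $\rho(A \otimes B) = \rho^A(A)\rho^B(B)$ holds for all $A$ and $B$, we get $\rho = \rho^A \otimes \rho^B$. Hence, the definition of correlations just given fits perfectly into our intuitive considerations.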
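For two quantum systems, the observation that $\rho$ is uncorrelated iff $\rho = \rho^A \otimes \rho^B$ gives a direct numerical test: compute the restrictions by partial traces and compare. In this NumPy sketch (function names hypothetical) a product state is uncorrelated, while both the entangled vector $(|00\rangle + |11\rangle)/\sqrt{2}$ and a classically correlated mixture of the form (2.11) are correlated.

```python
import numpy as np

def marginals(rho, dA, dB):
    """Restrictions rho^A = tr_B(rho) and rho^B = tr_A(rho) of a state on C^dA (x) C^dB."""
    t = rho.reshape(dA, dB, dA, dB)
    return np.einsum('ibjb->ij', t), np.einsum('aiaj->ij', t)

def is_correlated(rho, dA, dB, tol=1e-10):
    """rho is uncorrelated iff rho = rho^A (x) rho^B."""
    rA, rB = marginals(rho, dA, dB)
    return not np.allclose(rho, np.kron(rA, rB), atol=tol)

product = np.kron(np.diag([0.8, 0.2]), np.diag([0.5, 0.5]))
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)          # (|00> + |11>)/sqrt(2)
bell_state = np.outer(bell, bell)
e0, e1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
mix = 0.5 * np.kron(e0, e0) + 0.5 * np.kron(e1, e1)  # classically correlated, cf. Eq. (2.11)

print(is_correlated(product, 2, 2), is_correlated(bell_state, 2, 2), is_correlated(mix, 2, 2))
```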
An important issue in quantum information theory is the comparison of correlations between quantum systems on the one hand and classical systems on the other. Hence, let us have a closer look at the state space of a system consisting of at least one classical subsystem.
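Proposition 2.5. Each state of a composite system $\mathcal{A} \otimes \mathcal{B}$ consisting of a classical system ($\mathcal{A} = C(X)$) and an arbitrary system ($\mathcal{B}$) has the form

$$\rho = \sum_{j \in X} \lambda_j\, \rho_j^A \otimes \rho_j^B \tag{2.11}$$

with positive weights $\lambda_j \geq 0$ and $\rho_j^A \in \mathcal{S}(\mathcal{A})$, $\rho_j^B \in \mathcal{S}(\mathcal{B})$.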

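Proof. Since $\mathcal{A} = C(X)$ is classical, there is a basis $|j\rangle\langle j| \in \mathcal{A}$, $j \in X$, of mutually orthogonal one-dimensional projectors, and we can write each $A \in \mathcal{A}$ as $\sum_j a_j |j\rangle\langle j|$ (cf. Section 2.1.3). For each state $\rho \in \mathcal{S}(\mathcal{A} \otimes \mathcal{B})$ we can now define $\rho_j^A \in \mathcal{S}(\mathcal{A})$ with $\rho_j^A(A) = \mathrm{tr}(A|j\rangle\langle j|) = a_j$ and $\rho_j^B \in \mathcal{S}(\mathcal{B})$ with $\rho_j^B(B) = \lambda_j^{-1}\rho(|j\rangle\langle j| \otimes B)$ and $\lambda_j = \rho(|j\rangle\langle j| \otimes \mathbb{1})$ (if $\lambda_j = 0$, positivity of $\rho$ implies $\rho(|j\rangle\langle j| \otimes B) = 0$ for all $B$, so such terms can be dropped). Hence we get $\rho = \sum_{j \in X} \lambda_j\, \rho_j^A \otimes \rho_j^B$ with positive $\lambda_j$, as stated.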

If $\mathcal{A}$ and $\mathcal{B}$ are two quantum systems it is still possible for them to be correlated in the way
just described. We can simply prepare them with a classical random generator which triggers two
preparation devices to produce systems in the states $\rho_j^A, \rho_j^B$ with probability $\lambda_j$. The overall state
produced by this setup is obviously the $\rho$ from Eq. (2.11). However, the crucial point is that not all
correlations of quantum systems are of this type! This is an immediate consequence of the definition
of pure states $\rho = |\psi\rangle\langle\psi| \in S(\mathcal{H})$: since there is no proper convex decomposition of $\rho$, it can be
written as in Proposition 2.5 iff $\psi$ is a product vector, i.e. $\psi = \phi \otimes \chi$. This observation motivates
the following definition.
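Definition 2.6. A state $\rho$ of the composite system $\mathcal{B}(\mathcal{H}_1) \otimes \mathcal{B}(\mathcal{H}_2)$ is called separable or classically
correlated if it can be written as
$$\rho = \sum_j \lambda_j\, \rho_j^{(1)} \otimes \rho_j^{(2)} \tag{2.12}$$
with states $\rho_j^{(k)}$ of $\mathcal{B}(\mathcal{H}_k)$ and weights $\lambda_j \geq 0$. Otherwise $\rho$ is called entangled. The set of all
separable states is denoted by $D(\mathcal{H}_1 \otimes \mathcal{H}_2)$, or just $D$ if $\mathcal{H}_1$ and $\mathcal{H}_2$ are understood.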
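For pure states the criterion just stated is easy to apply. The sketch below is restricted to two qubits, which is an assumption of the example. It relies on the elementary fact that $\psi = \sum_{jk} c_{jk}|jk\rangle$ is a product vector iff the coefficient matrix $(c_{jk})$ has rank one, i.e. iff $c_{00}c_{11} - c_{01}c_{10} = 0$. Equivalently, the Schmidt decomposition has a single term. The function names and the tolerance are illustrative choices.

```python
import math

def is_product_vector(psi, tol=1e-12):
    """Decide whether a two-qubit vector psi is a product vector phi (x) chi.

    psi lists c00, c01, c10, c11 in the basis |00>, |01>, |10>, |11>.
    psi is a product vector iff the coefficient matrix [[c00, c01], [c10, c11]]
    has rank one, i.e. iff its determinant vanishes.
    """
    c00, c01, c10, c11 = psi
    return abs(c00 * c11 - c01 * c10) < tol

def product_vector(phi, chi):
    """Coefficients of phi (x) chi for two single-qubit vectors."""
    return [p * c for p in phi for c in chi]

s = 1 / math.sqrt(2)
print(is_product_vector(product_vector([s, s], [0.6, 0.8j])))  # True
print(is_product_vector([s, 0, 0, s]))                         # False: Bell vector
```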

2.2.4. Bell inequalities
We have just seen that it is quite easy to check for pure states whether they are entangled or
not. In the mixed case, however, this is a much harder and, in general, unsolved problem. In this
subsection we will take a short look at the Bell inequalities, which are perhaps the oldest criterion
for entanglement (for a more detailed review see [169]). Today more powerful methods, most of
them based on positivity properties, are available. We will postpone the corresponding discussion to
the end of the following section, after we have studied (completely) positive maps (cf. Section 2.4).
Bell inequalities are traditionally discussed in the framework of "local hidden variable
theories". More precisely, we will say that a state $\rho$ of a bipartite system $\mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ admits a
hidden variable model if there is a probability space $(X, \mu)$ and (measurable) response functions
$X \ni x \mapsto F_A(x, k), F_B(x, l) \in \mathbb{R}$ for all discrete PV measures $A = A_1, \ldots, A_N \in \mathcal{B}(\mathcal{H})$, respectively
$B = B_1, \ldots, B_M \in \mathcal{B}(\mathcal{K})$, such that
$$\int_X F_A(x, k)\, F_B(x, l)\, \mu(dx) = \operatorname{tr}(\rho\, A_k \otimes B_l) \tag{2.13}$$
holds for all $k, l$ and $A, B$. The value of the function $F_A(x, k)$ is interpreted as the probability
to get the value $k$ during an $A$ measurement with known "hidden parameter" $x$. The set of states
admitting a hidden variable model is a convex set and as such it can be described by an (infinite)
hierarchy of correlation inequalities. Any one of these inequalities is usually called a (generalized)
Bell inequality. The best known one is the inequality given by Clauser et al. [47]: the state $\rho$ satisfies
the CHSH inequality if
$$\rho\bigl(A \otimes (B + B') + A' \otimes (B - B')\bigr) \leq 2 \tag{2.14}$$
holds for all $A, A' \in \mathcal{B}(\mathcal{H})$, respectively $B, B' \in \mathcal{B}(\mathcal{K})$, with $-\mathbb{1} \leq A, A' \leq \mathbb{1}$ and $-\mathbb{1} \leq B, B' \leq \mathbb{1}$. For
the special case of two dichotomic observables the CHSH inequalities are sufficient to characterize
the states with a hidden variable model. In the general case the CHSH inequalities are a necessary
but not a sufficient condition, and a complete characterization is not known.
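The bound 2 in Eq. (2.14) can be understood inside a hidden variable model as follows. For dichotomic observables with outcomes $\pm 1$, each response function yields, for fixed $x$, a local expectation value in $[-1, 1]$. The CHSH combination is affine in each of these four numbers separately, so its extreme values occur at deterministic assignments $\pm 1$. The plain-Python sketch below (function names hypothetical) checks that deterministic assignments give only $\pm 2$ and that randomly chosen local values never exceed 2. Averaging over $\mu$ then preserves the bound.

```python
import itertools
import random

def chsh(a, a2, b, b2):
    """CHSH combination a(b + b') + a'(b - b') of four local expectation values."""
    return a * (b + b2) + a2 * (b - b2)

# Deterministic response values +-1 for the four dichotomic observables.
deterministic = {chsh(*v) for v in itertools.product((-1, 1), repeat=4)}
print(sorted(deterministic))  # [-2, 2]

# Local expectation values in [-1, 1] never exceed the bound 2, because the
# CHSH combination is affine in each argument separately.
rng = random.Random(0)
worst = max(abs(chsh(*(rng.uniform(-1, 1) for _ in range(4))))
            for _ in range(10_000))
print(worst <= 2)  # True
```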
It is now easy to see that each separable state $\rho = \sum_{j=1}^n \lambda_j\, \rho_j^{(1)} \otimes \rho_j^{(2)}$ admits a hidden variable
model: we have to choose $X = \{1, \ldots, n\}$, $\mu(\{j\}) = \lambda_j$, $F_A(x, k) = \rho_x^{(1)}(A_k)$ and $F_B$ analogously. Hence,
we immediately see that each state of a composite system with at least one classical subsystem
satisfies the Bell inequalities (in particular the CHSH version), while this is not the case for purely
quantum systems. The most prominent examples are "maximally entangled states" (cf. Subsection
3.1.1), which violate the CHSH inequality (for appropriately chosen $A, A', B, B'$) with the maximal value
$2\sqrt{2}$. This observation is the starting point for many discussions concerning the interpretation of
quantum mechanics, in particular because the maximal violation of $2\sqrt{2}$ was observed experimentally in 1982
by Aspect and coworkers [5]. We do not want to follow this path (see [169] and
the references therein instead). Of interest for us is the fact that Bell inequalities, in particular the
CHSH case in Eq. (2.14), provide a necessary condition for a state to be separable. However,
there exist entangled states admitting a hidden variable model [165]. Hence, Bell inequalities are not
sufficient for separability.
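The violation $2\sqrt{2}$ can be checked directly. This sketch uses the Bell vector $(|00\rangle + |11\rangle)/\sqrt{2}$ and the frequently used settings $A = \sigma_3$, $A' = \sigma_1$, $B = (\sigma_3 + \sigma_1)/\sqrt{2}$, $B' = (\sigma_3 - \sigma_1)/\sqrt{2}$. These settings are one convenient choice and are not taken from the text. With them the CHSH operator becomes $\sqrt{2}(\sigma_3 \otimes \sigma_3 + \sigma_1 \otimes \sigma_1)$, and its expectation value in this state is $2\sqrt{2}$. The helper names are hypothetical.

```python
import math

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def add(a, b, c=1.0):
    """Return a + c*b for matrices of equal size."""
    return [[x + c * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def expval(psi, op):
    """Expectation value <psi, op psi>."""
    n = len(psi)
    return sum(psi[i].conjugate() * op[i][j] * psi[j]
               for i in range(n) for j in range(n)).real

X = [[0, 1], [1, 0]]   # sigma_1
Z = [[1, 0], [0, -1]]  # sigma_3
r = 1 / math.sqrt(2)
A, A2 = Z, X
B = [[r * (z + x) for z, x in zip(rz, rx)] for rz, rx in zip(Z, X)]   # (Z + X)/sqrt(2)
B2 = [[r * (z - x) for z, x in zip(rz, rx)] for rz, rx in zip(Z, X)]  # (Z - X)/sqrt(2)

# CHSH operator A (x) (B + B') + A' (x) (B - B') from Eq. (2.14)
bell_op = add(kron(A, add(B, B2)), kron(A2, add(B, B2, -1.0)))
phi = [complex(r), 0j, 0j, complex(r)]  # (|00> + |11>)/sqrt(2)
value = expval(phi, bell_op)
print(value, 2 * math.sqrt(2))  # both approximately 2.8284
```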
2.3. Channels

Assume now that we have a number of quantum systems, e.g. a string of ions in a trap.
To "process" the quantum information they carry we have to perform, in general, many steps
of a quite different nature. Typical examples are: free time evolution, controlled time evolution
(e.g. the application of a "quantum gate" in a quantum computer), preparations and measurements.
The purpose of this section is to provide a unified framework for the description of all these different
operations. The basic idea is to represent each processing step by a "channel", which converts input
systems, described by an observable algebra $\mathcal{A}$, into output systems described by a possibly different
algebra $\mathcal{B}$. Henceforth we will call $\mathcal{A}$ the input and $\mathcal{B}$ the output algebra. If we consider e.g. the
free time evolution, we need quantum systems of the same type on the input and the output side;
hence, in this case we have $\mathcal{A} = \mathcal{B} = \mathcal{B}(\mathcal{H})$ with an appropriately chosen Hilbert space $\mathcal{H}$. If, on
the other hand, we want to describe a measurement, we have to map quantum systems (the measured system) to classical information (the measuring result). Therefore, we need in this example
$\mathcal{A} = \mathcal{B}(\mathcal{H})$ for the input and $\mathcal{B} = C(X)$ for the output algebra, where $X$ is the set of possible
outcomes of the measurement (cf. Section 2.1.4).
Our aim is now to get a mathematical object which can be used to describe a channel. To this
end consider an effect $A \in \mathcal{B}$ of the output system. If we first invoke a channel which transforms
$\mathcal{A}$ systems into $\mathcal{B}$ systems, and measure $A$ afterwards on the output systems, we end up with a
measurement of an effect $T(A)$ on the input systems. Hence, we get a map $T: \mathcal{E}(\mathcal{B}) \to \mathcal{E}(\mathcal{A})$
which completely describes the channel.⁶ Alternatively, we can look at the states and interpret a
channel as a map $T^*: S(\mathcal{A}) \to S(\mathcal{B})$ which transforms $\mathcal{A}$ systems in the state $\rho \in S(\mathcal{A})$ into
$\mathcal{B}$ systems in the state $T^*(\rho)$. To distinguish between both maps we can say that $T$ describes the
channel in the Heisenberg picture and $T^*$ in the Schrödinger picture. On the level of the statistical
interpretation both points of view should of course coincide, i.e. the probabilities⁷ $(T^*\rho)(A)$ and
$\rho(TA)$ to get the result "yes" during an $A$ measurement on $\mathcal{B}$ systems in the state $T^*\rho$, respectively
a $TA$ measurement on $\mathcal{A}$ systems in the state $\rho$, should be the same. Since $(T^*\rho)(A)$ is linear in
$A$, we see immediately that $T$ must be an affine map, i.e. $T(\lambda_1 A_1 + \lambda_2 A_2) = \lambda_1 T(A_1) + \lambda_2 T(A_2)$ for
each convex linear combination $\lambda_1 A_1 + \lambda_2 A_2$ of effects in $\mathcal{B}$, and this in turn implies that $T$ can be
extended naturally to a linear map, which we will identify in the following with the channel itself,
i.e. we say that $T$ is the channel.

⁶ Note that the direction of the mapping arrow is reversed compared to the natural ordering of processing.
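The consistency condition $(T^*\rho)(A) = \rho(TA)$ can be checked numerically for a concrete channel. The sketch uses a qubit depolarizing channel with parameter $p$. This channel is a hypothetical example and is not discussed in the text. In the Schrödinger picture it reads $T^*\rho = p\rho + (1 - p)\operatorname{tr}(\rho)\mathbb{1}/2$, and in the Heisenberg picture $TA = pA + (1 - p)\operatorname{tr}(A)\mathbb{1}/2$. For this particular channel the two formulas happen to have the same shape. States are identified with density matrices via $\rho(A) = \operatorname{tr}(\rho A)$.

```python
def trace(m):
    """Trace of a square matrix given as nested lists."""
    return sum(m[i][i] for i in range(len(m)))

def matmul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

ID2 = [[1, 0], [0, 1]]

def depolarize_schroedinger(rho, p):
    """T* rho = p rho + (1 - p) tr(rho) 1/2 (hypothetical example channel)."""
    t = trace(rho)
    return [[p * rho[i][j] + (1 - p) * t * ID2[i][j] / 2 for j in range(2)]
            for i in range(2)]

def depolarize_heisenberg(a, p):
    """T A = p A + (1 - p) tr(A) 1/2, the dual of the map above."""
    t = trace(a)
    return [[p * a[i][j] + (1 - p) * t * ID2[i][j] / 2 for j in range(2)]
            for i in range(2)]

rho = [[0.75, 0.25 - 0.25j], [0.25 + 0.25j, 0.25]]  # a qubit density matrix
effect = [[0.9, 0.1j], [-0.1j, 0.2]]                # an effect, 0 <= A <= 1
p = 0.6
lhs = trace(matmul(depolarize_schroedinger(rho, p), effect))  # (T* rho)(A)
rhs = trace(matmul(rho, depolarize_heisenberg(effect, p)))    # rho(T A)
print(abs(lhs - rhs) < 1e-12)  # True
print(depolarize_heisenberg(ID2, p))  # T(1) = 1: this channel is unital
```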
2.3.1. Completely positive maps
Let us now change our point of view slightly and start with a linear operator $T: \mathcal{A} \to \mathcal{B}$. To be a
channel, $T$ must map effects to effects, i.e. $T$ has to be positive, $T(A) \geq 0$ for all $A \geq 0$, and bounded from
above by $\mathbb{1}$, i.e. $T(\mathbb{1}) \leq \mathbb{1}$. In addition it is natural to require that two channels in parallel are again a
channel. More precisely, if two channels $T: \mathcal{A}_1 \to \mathcal{B}_1$ and $S: \mathcal{A}_2 \to \mathcal{B}_2$ are given, we can consider
the map $T \otimes S$ which associates to each $A \otimes B \in \mathcal{A}_1 \otimes \mathcal{A}_2$ the tensor product $T(A) \otimes S(B) \in \mathcal{B}_1 \otimes \mathcal{B}_2$.
It is natural to assume that $T \otimes S$ is a channel which converts composite systems of type $\mathcal{A}_1 \otimes \mathcal{A}_2$
into $\mathcal{B}_1 \otimes \mathcal{B}_2$ systems. Hence $T \otimes S$ should be positive as well [125].
Definition 2.7. Consider two observable algebras $\mathcal{A}, \mathcal{B}$ and a linear map $T: \mathcal{A} \to \mathcal{B} \subset \mathcal{B}(\mathcal{H})$.
1. $T$ is called positive if $T(A) \geq 0$ holds for all positive $A \in \mathcal{A}$.
2. $T$ is called completely positive (cp) if $T \otimes \mathrm{Id}: \mathcal{A} \otimes \mathcal{B}(\mathbb{C}^n) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathbb{C}^n)$ is positive for all
$n \in \mathbb{N}$. Here $\mathrm{Id}$ denotes the identity map on $\mathcal{B}(\mathbb{C}^n)$.
3. $T$ is called unital if $T(\mathbb{1}) = \mathbb{1}$ holds.
Consider now the map $T^*: \mathcal{B}^* \to \mathcal{A}^*$ which is dual to $T$, i.e. $T^*\rho(A) = \rho(TA)$ for all $\rho \in \mathcal{B}^*$ and
$A \in \mathcal{A}$. It is called the Schrödinger picture representation of the channel $T$, since it maps states to
states provided $T$ is unital. (Complete) positivity can be defined in the Schrödinger picture as in the
Heisenberg picture, and we immediately see that $T$ is (completely) positive iff $T^*$ is.
It is natural to ask whether the distinction between positivity and complete positivity is really
necessary, i.e. whether there are positive maps which are not completely positive. If at least one of
the algebras $\mathcal{A}$ or $\mathcal{B}$ is classical the answer is no: each positive map is completely positive in this
case. If both algebras are quantum, however, complete positivity is not implied by positivity alone.
We will discuss explicit examples in Section 2.4.2.
If item 2 holds only for a fixed $n \in \mathbb{N}$ the map $T$ is called $n$-positive. This is obviously a weaker
condition than complete positivity. However, $n$-positivity implies $m$-positivity for all $m \leq n$, and for
$\mathcal{A} = \mathcal{B}(\mathbb{C}^d)$ complete positivity is implied by $n$-positivity, provided $n \geq d$ holds.
Let us now consider the question whether a channel should be unital or not. We have already
mentioned that $T(\mathbb{1}) \leq \mathbb{1}$ must hold since effects should be mapped to effects. If $T(\mathbb{1})$ is not equal
to $\mathbb{1}$, we get $\rho(T\mathbb{1}) = T^*\rho(\mathbb{1}) < 1$ (for suitable states $\rho$) for the probability to measure the effect $\mathbb{1}$ on systems in the state
$T^*\rho$. This is impossible for channels which produce an output with certainty, because $\mathbb{1}$ is the
effect which is always true. In other words: if a cp map is not unital, it describes a channel which
sometimes produces no output at all, and $T(\mathbb{1})$ is the effect which measures whether we have got an
output. We will assume in the future that channels are unital if nothing else is explicitly stated.

⁷ To keep notations more readable we will frequently follow the usual convention of dropping the parentheses around
arguments of linear operators. Hence, we will write $TA$ and $T^*\rho$ instead of $T(A)$ and $T^*(\rho)$. Similarly, we will simply
write $TS$ instead of $T \circ S$ for compositions.
2.3.2. The Stinespring theorem
Consider now channels between quantum systems, i.e. $\mathcal{A} = \mathcal{B}(\mathcal{H}_1)$ and $\mathcal{B} = \mathcal{B}(\mathcal{H}_2)$. A fairly
simple example (not necessarily unital) is given in terms of an operator $V: \mathcal{H}_1 \to \mathcal{H}_2$ by
$\mathcal{B}(\mathcal{H}_1) \ni A \mapsto VAV^* \in \mathcal{B}(\mathcal{H}_2)$. A second example is the restriction to a subsystem, which is given in the
Heisenberg picture by $\mathcal{B}(\mathcal{H}) \ni A \mapsto A \otimes \mathbb{1}_\mathcal{K} \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$. Finally, the composition $S \circ T = ST$ of
two channels is again a channel. The following theorem, which is the most fundamental structural
result about cp maps,⁸ says that each channel can be represented as a composition of these two
examples [147].
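Theorem 2.8 (Stinespring dilation theorem). Every completely positive map $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$
has the form
$$T(A) = V^*(A \otimes \mathbb{1}_\mathcal{K})V, \tag{2.15}$$
with an additional Hilbert space $\mathcal{K}$ and an operator $V: \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$. Both (i.e. $\mathcal{K}$ and
$V$) can be chosen such that the span of all $(A \otimes \mathbb{1})V\phi$ with $A \in \mathcal{B}(\mathcal{H}_1)$ and $\phi \in \mathcal{H}_2$ is dense
in $\mathcal{H}_1 \otimes \mathcal{K}$. This particular decomposition is unique (up to unitary equivalence) and is called the
minimal decomposition. If $\dim \mathcal{H}_1 = d_1$ and $\dim \mathcal{H}_2 = d_2$, the minimal $\mathcal{K}$ satisfies $\dim \mathcal{K} \leq d_1^2 d_2$.
By introducing a family $|\chi_j\rangle\langle\chi_j|$ of one-dimensional projectors with $\sum_j |\chi_j\rangle\langle\chi_j| = \mathbb{1}$ we can define
the "Kraus operators" $\langle\psi, V_j\phi\rangle = \langle\psi \otimes \chi_j, V\phi\rangle$. In terms of them we can rewrite Eq. (2.15) in the
following form [105]:
Corollary 2.9 (Kraus form). Every completely positive map $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ can be written
in the form
$$T(A) = \sum_{j=1}^N V_j^* A V_j \tag{2.16}$$
with operators $V_j: \mathcal{H}_2 \to \mathcal{H}_1$ and $N \leq \dim(\mathcal{H}_1)\dim(\mathcal{H}_2)$.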
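Eqs. (2.15) and (2.16) can be compared numerically. The plain-Python sketch below uses the amplitude damping channel with Kraus operators $V_0 = \operatorname{diag}(1, \sqrt{1-\gamma})$ and $V_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$. This is a standard example and is not taken from the text. The sketch chooses the $\chi_j$ as the canonical basis of $\mathcal{K} = \mathbb{C}^N$ and builds $V\phi = \sum_j V_j\phi \otimes \chi_j$, so $V^*(A \otimes \mathbb{1}_\mathcal{K})V$ and $\sum_j V_j^* A V_j$ should agree. All function names are hypothetical.

```python
import math

def dagger(m):
    """Conjugate transpose of a (possibly rectangular) matrix."""
    return [[m[j][i].conjugate() for j in range(len(m))]
            for i in range(len(m[0]))]

def matmul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kraus_heisenberg(kraus, a):
    """T(A) = sum_j V_j^* A V_j, Eq. (2.16)."""
    d2 = len(kraus[0][0])
    out = [[0j] * d2 for _ in range(d2)]
    for v in kraus:
        term = matmul(dagger(v), matmul(a, v))
        out = [[x + y for x, y in zip(ro, rt)] for ro, rt in zip(out, term)]
    return out

def stinespring_isometry(kraus):
    """Matrix of V: H2 -> H1 (x) K with V phi = sum_j V_j phi (x) chi_j.

    Row i*N + j belongs to e_i (x) chi_j, chi_j the canonical basis of C^N."""
    n_k, d1, d2 = len(kraus), len(kraus[0]), len(kraus[0][0])
    return [[kraus[j][i][k] for k in range(d2)]
            for i in range(d1) for j in range(n_k)]

def stinespring_heisenberg(kraus, a):
    """T(A) = V^* (A (x) 1_K) V, Eq. (2.15)."""
    n_k = len(kraus)
    v = stinespring_isometry(kraus)
    n = len(v)
    a_ext = [[a[r // n_k][c // n_k] * (r % n_k == c % n_k) for c in range(n)]
             for r in range(n)]  # A (x) 1_K
    return matmul(dagger(v), matmul(a_ext, v))

gamma = 0.3  # damping parameter of the example channel
V0 = [[1 + 0j, 0j], [0j, complex(math.sqrt(1 - gamma))]]
V1 = [[0j, complex(math.sqrt(gamma))], [0j, 0j]]
kraus = [V0, V1]

A = [[0.2 + 0j, 0.5j], [-0.5j, 0.7 + 0j]]
print(kraus_heisenberg(kraus, A))
print(stinespring_heisenberg(kraus, A))   # same matrix up to rounding
IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
print(kraus_heisenberg(kraus, IDENTITY))  # identity: the channel is unital
```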
2.3.3. The duality lemma
We will consider a fundamental relation between positive maps and bipartite systems, which will
allow us later on to translate properties of entangled states to properties of channels and vice versa.
The basic idea originates from elementary linear algebra: a bilinear form $\beta$ on a $d$-dimensional
vector space $V$ can be represented by a $d \times d$ matrix, just as an operator on $V$. Hence, we can
transform $\beta$ into an operator simply by reinterpreting the matrix elements. In our situation things
are more difficult, because the positivity constraints for states and channels should match up in the
right way. Nevertheless, we have the following theorem.

⁸ Basically, there is a more general version of this theorem which works with arbitrary output algebras. It needs, however,
some material from the representation theory of C*-algebras which we want to avoid here. See e.g. [125,83].
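Theorem 2.10. Let $\rho$ be a density operator on $\mathcal{H} \otimes \mathcal{H}_1$. Then there is a Hilbert space $\mathcal{K}$, a pure
state $\sigma$ on $\mathcal{H} \otimes \mathcal{K}$ and a channel $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{K})$ with
$$\rho = (\mathrm{Id} \otimes T^*)\sigma, \tag{2.17}$$
where $\mathrm{Id}$ denotes the identity map on $\mathcal{B}^*(\mathcal{H})$. The pure state $\sigma$ can be chosen such that $\operatorname{tr}_\mathcal{H}(\sigma)$
has no zero eigenvalue. In this case $T$ and $\sigma$ are uniquely determined (up to unitary equivalence)
by Eq. (2.17); i.e. if $\tilde\sigma, \tilde T$ with $\rho = (\mathrm{Id} \otimes \tilde T^*)\tilde\sigma$ are given, we have $\tilde\sigma = (\mathbb{1} \otimes U)^*\sigma(\mathbb{1} \otimes U)$ and
$\tilde T(\cdot) = U^* T(\cdot) U$ with an appropriate unitary operator $U$.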
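Proof. The state $\sigma$ is obviously the purification of $\operatorname{tr}_{\mathcal{H}_1}(\rho)$. Hence, if $\lambda_j$ and $\phi_j$ are eigenvalues and
eigenvectors of $\operatorname{tr}_{\mathcal{H}_1}(\rho)$, we can set $\sigma = |\Psi\rangle\langle\Psi|$ with $\Psi = \sum_j \lambda_j^{1/2} \phi_j \otimes \psi_j$, where $\psi_j$ is an (arbitrary)
orthonormal basis in $\mathcal{K}$. It is clear that $\sigma$ is uniquely determined up to a unitary. Hence, we only
have to show that a unique $T$ exists if $\Psi$ is given. To satisfy Eq. (2.17) we must have
$$\rho\bigl(|\phi_j \otimes \eta_k\rangle\langle\phi_l \otimes \eta_p|\bigr) = \bigl\langle\Psi, (\mathrm{Id} \otimes T)\bigl(|\phi_j \otimes \eta_k\rangle\langle\phi_l \otimes \eta_p|\bigr)\Psi\bigr\rangle \tag{2.18}$$
$$= \bigl\langle\Psi, |\phi_j\rangle\langle\phi_l| \otimes T(|\eta_k\rangle\langle\eta_p|)\,\Psi\bigr\rangle \tag{2.19}$$
$$= \lambda_j^{1/2}\lambda_l^{1/2}\,\bigl\langle\psi_j, T(|\eta_k\rangle\langle\eta_p|)\,\psi_l\bigr\rangle, \tag{2.20}$$
where $\eta_k$ is an (arbitrary) orthonormal basis in $\mathcal{H}_1$. Hence $T$ is uniquely determined by $\rho$ in terms
of its matrix elements, and we only have to check complete positivity. To this end it is useful to note
that the map $\rho \mapsto T$ is linear if the $\phi_j$ are fixed. Hence, it is sufficient to consider the case $\rho = |\chi\rangle\langle\chi|$.
Inserting this into Eq. (2.20) we immediately see that $T(A) = V^*AV$ with $\langle\eta_k, V\psi_j\rangle = \lambda_j^{-1/2}\langle\phi_j \otimes \eta_k, \chi\rangle$
holds. Hence $T$ is completely positive. Since the normalization $T(\mathbb{1}) = \mathbb{1}$ follows from the choice of the
$\lambda_j$, the theorem is proved.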
2.4. Separability criteria and positive maps
We have already stated in Section 2.3.1 that positive but not completely positive maps exist
whenever input and output algebra are quantum. No such map represents a valid quantum operation;
nevertheless, they are of great importance in quantum information theory due to their deep relations
to entanglement properties. Hence, this section is a continuation of the study of separability criteria
which we started in Section 2.2.4. In contrast to Section 2.3, all maps in this section are considered
in the Schrödinger rather than in the Heisenberg picture.
2.4.1. Positivity
Let us now consider an arbitrary positive, but not necessarily completely positive map
$T^*: \mathcal{B}^*(\mathcal{K}) \to \mathcal{B}^*(\mathcal{H})$. If $\mathrm{Id}$ again denotes the identity map, it is easy to see that
$(\mathrm{Id} \otimes T^*)(\rho_1 \otimes \rho_2) = \rho_1 \otimes T^*(\rho_2) \geq 0$ holds for each product state $\rho_1 \otimes \rho_2 \in S(\mathcal{H} \otimes \mathcal{K})$. Hence $(\mathrm{Id} \otimes T^*)\rho \geq 0$
for each positive $T^*$ is a necessary condition for $\rho$ to be separable. The following theorem, proved
in [86], shows that sufficiency holds as well.

Theorem 2.11. A state $\rho \in \mathcal{B}^*(\mathcal{H} \otimes \mathcal{K})$ is separable iff for any positive map $T^*: \mathcal{B}^*(\mathcal{K}) \to \mathcal{B}^*(\mathcal{H})$
the operator $(\mathrm{Id} \otimes T^*)\rho$ is positive.
Proof. We will only give a sketch of the proof; see [86] for details. The condition is obviously
necessary since $(\mathrm{Id} \otimes T^*)\rho_1 \otimes \rho_2 \geq 0$ holds for any product state provided $T^*$ is positive. The proof
of sufficiency relies on the fact that it is always possible to separate a point (an entangled state)
from a convex set $D$ (the set of separable states) by a hyperplane. A precise formulation of this
idea leads to the following proposition.
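Proposition 2.12. For any entangled state $\rho \in S(\mathcal{H} \otimes \mathcal{K})$ there is an operator $A$ on $\mathcal{H} \otimes \mathcal{K}$,
called an entanglement witness for $\rho$, with the property $\rho(A) < 0$ and $\sigma(A) \geq 0$ for all separable
$\sigma \in S(\mathcal{H} \otimes \mathcal{K})$.
Proof. Since $D \subset \mathcal{B}^*(\mathcal{H} \otimes \mathcal{K})$ is a closed convex set, for each $\rho \in S \subset \mathcal{B}^*(\mathcal{H} \otimes \mathcal{K})$ with $\rho \notin D$
there exists a linear functional $\alpha$ on $\mathcal{B}^*(\mathcal{H} \otimes \mathcal{K})$ such that $\alpha(\rho) < \gamma \leq \alpha(\sigma)$ for each $\sigma \in D$, with
a constant $\gamma$. This holds as well in infinite-dimensional Banach spaces and is a consequence of the
Hahn–Banach theorem (cf. [135, Theorem 3.4]). Without loss of generality we can assume that $\gamma = 0$
holds; otherwise we just have to replace $\alpha$ by $\alpha - \gamma\operatorname{tr}$. Hence, the result follows from the fact that
each continuous linear functional on $\mathcal{B}^*(\mathcal{H} \otimes \mathcal{K})$ has the form $\alpha(\sigma) = \operatorname{tr}(A\sigma)$ with $A \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$.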
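As an illustration of Proposition 2.12, a commonly used witness for $\Phi = (|00\rangle + |11\rangle)/\sqrt{2}$ is $W = \mathbb{1}/2 - |\Phi\rangle\langle\Phi|$. This particular operator is an assumption of the sketch and is not a construction from the text. For unit vectors the Cauchy–Schwarz inequality gives $|\langle\Phi, \phi \otimes \chi\rangle|^2 \leq 1/2$, so $\sigma(W) \geq 0$ for product states and, by convexity, for all separable $\sigma$. At the same time $\langle\Phi, W\Phi\rangle = -1/2$. The plain-Python check below samples random product vectors. The sampling scheme is ad hoc and is only meant for testing.

```python
import cmath
import math
import random

def witness_value(psi):
    """<psi, W psi> for W = 1/2 - |Phi><Phi|, Phi = (|00> + |11>)/sqrt(2)."""
    overlap = (psi[0] + psi[3]) / math.sqrt(2)  # <Phi, psi>; Phi has real entries
    return 0.5 - abs(overlap) ** 2

def random_qubit(rng):
    """Random normalized qubit vector (not Haar-distributed; for testing only)."""
    a, b = rng.random(), rng.random()
    theta = rng.uniform(0, 2 * math.pi)
    norm = math.hypot(a, b)
    return [a / norm, b / norm * cmath.exp(1j * theta)]

def product(phi, chi):
    """Coefficients of phi (x) chi in the basis |00>, |01>, |10>, |11>."""
    return [p * c for p in phi for c in chi]

rng = random.Random(1)
phi_plus = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
print(witness_value(phi_plus))  # -0.5 up to rounding: Phi is detected
samples = [witness_value(product(random_qubit(rng), random_qubit(rng)))
           for _ in range(2000)]
print(min(samples) >= -1e-12)   # True: no product state is detected
```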
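To continue the proof of Theorem 2.11, associate now to any operator $A \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ the map
$T_A^*: \mathcal{B}^*(\mathcal{K}) \to \mathcal{B}^*(\mathcal{H})$ with
$$\operatorname{tr}(A\, \rho_1 \otimes \rho_2) = \operatorname{tr}\bigl(\rho_1^T\, T_A^*(\rho_2)\bigr), \tag{2.21}$$
where $(\cdot)^T$ denotes the transposition in an arbitrary but fixed orthonormal basis $|j\rangle$, $j = 1, \ldots, d$, of $\mathcal{H}$. It
is easy to see that $T_A^*$ is positive if $\operatorname{tr}(A\, \rho_1 \otimes \rho_2) \geq 0$ for all product states $\rho_1 \otimes \rho_2 \in S(\mathcal{H} \otimes \mathcal{K})$
[94]. A straightforward calculation [86] shows in addition that
$$\operatorname{tr}(A\rho) = d\,\operatorname{tr}\bigl(|\Psi\rangle\langle\Psi|\,(\mathrm{Id} \otimes T_A^*)(\rho)\bigr) \tag{2.22}$$
holds, where $\Psi = d^{-1/2}\sum_j |j\rangle \otimes |j\rangle$. Assume now that $(\mathrm{Id} \otimes T^*)\rho \geq 0$ for all positive $T^*$.
If $\operatorname{tr}(A\sigma) \geq 0$ holds for all separable $\sigma$, then $T_A^*$ is positive, so the right-hand side of (2.22) is non-negative;
hence $\operatorname{tr}(A\rho) \geq 0$. By Proposition 2.12, $\rho$ therefore cannot be entangled, and the statement follows.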
2.4.2. The partial transpose
The most typical example of a positive non-cp map is the transposition $\Theta A = A^T$ of $d \times d$
matrices, which we have just used in the proof of Theorem 2.11. $\Theta$ is obviously a positive map,
but the partial transpose
$$\mathcal{B}^*(\mathcal{H} \otimes \mathcal{K}) \ni \rho \mapsto (\mathrm{Id} \otimes \Theta)(\rho) \in \mathcal{B}^*(\mathcal{H} \otimes \mathcal{K}) \tag{2.23}$$
is not. The latter can easily be checked with the maximally entangled state (cf. Section 3.1.1)
$$\Omega = \frac{1}{\sqrt{d}}\sum_j |j\rangle \otimes |j\rangle, \tag{2.24}$$
where $|j\rangle \in \mathbb{C}^d$, $j = 1, \ldots, d$, denote the canonical basis vectors. In low dimensions the transposition
is basically the only positive map which is not cp. Due to results of Størmer [148] and Woronowicz
[174] we have: $\dim \mathcal{H} = 2$ and $\dim \mathcal{K} = 2, 3$ imply that each positive map $T^*: \mathcal{B}^*(\mathcal{H}) \to \mathcal{B}^*(\mathcal{K})$ has
the form $T^* = T_1^* + T_2^*\Theta$ with two cp maps $T_1^*, T_2^*$ and the transposition $\Theta$ on $\mathcal{B}(\mathcal{H})$. This immediately
implies that positivity of the partial transpose is necessary and sufficient for separability of a state
$\rho \in S(\mathcal{H} \otimes \mathcal{K})$ (cf. [86]):
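Theorem 2.13. Consider a bipartite system $\mathcal{B}(\mathcal{H} \otimes \mathcal{K})$ with $\dim \mathcal{H} = 2$ and $\dim \mathcal{K} = 2, 3$. A state
$\rho \in S(\mathcal{H} \otimes \mathcal{K})$ is separable iff its partial transpose is positive.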
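The failure of positivity of the partial transpose on $|\Omega\rangle\langle\Omega|$ can be made explicit. One has $(\mathrm{Id} \otimes \Theta)|\Omega\rangle\langle\Omega| = F/d$, where $F$ is the flip operator used again in Section 3.1.2. The antisymmetric vector $(|01\rangle - |10\rangle)/\sqrt{2}$ is an eigenvector of $F$ with eigenvalue $-1$. The plain-Python sketch below computes the corresponding expectation value, which is $-1/d$. The helper names are hypothetical, and the basis index convention $i d_2 + j$ for $|i\rangle \otimes |j\rangle$ is an assumption of the sketch.

```python
import math

def partial_transpose(rho, d1, d2):
    """(Id (x) Theta) rho: transpose the second tensor factor.

    The basis vector |i> (x) |j> has index i*d2 + j."""
    n = d1 * d2
    out = [[0.0] * n for _ in range(n)]
    for i in range(d1):
        for j in range(d2):
            for k in range(d1):
                for l in range(d2):
                    out[i * d2 + j][k * d2 + l] = rho[i * d2 + l][k * d2 + j]
    return out

def max_entangled_projector(d):
    """|Omega><Omega| for Omega = d^{-1/2} sum_j |j> (x) |j>, Eq. (2.24)."""
    omega = [0.0] * (d * d)
    for j in range(d):
        omega[j * d + j] = 1 / math.sqrt(d)
    return [[x * y for y in omega] for x in omega]

def pt_expectation(d):
    """<v, (Id (x) Theta)(|Omega><Omega|) v> for v = (|01> - |10>)/sqrt(2)."""
    pt = partial_transpose(max_entangled_projector(d), d, d)
    v = [0.0] * (d * d)
    v[1], v[d] = 1 / math.sqrt(2), -1 / math.sqrt(2)  # components |0>|1>, |1>|0>
    n = d * d
    return sum(v[a] * pt[a][b] * v[b] for a in range(n) for b in range(n))

for d in (2, 3):
    print(d, pt_expectation(d))  # approximately -1/d: the operator is not positive
```

A negative expectation value in any single vector already shows that the partial transpose is not positive, i.e. that the state is npt and hence entangled.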
The use of positivity of the partial transpose as a separability criterion was first proposed
by Peres [127], and he conjectured that it is a necessary and sufficient condition in arbitrary
finite dimension. Although it has turned out in the meantime that this conjecture is wrong in general
(cf. Section 3.1.5), partial transposition has become a crucial tool within entanglement theory and
we define:
Definition 2.14. A state $\rho \in \mathcal{B}^*(\mathcal{H} \otimes \mathcal{K})$ of a bipartite quantum system is called a ppt-state if $(\mathrm{Id} \otimes \Theta)\rho \geq 0$
holds, and an npt-state otherwise (ppt = "positive partial transpose" and npt = "negative partial
transpose").
2.4.3. The reduction criterion
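Another frequently used example of a non-cp but positive map is $\mathcal{B}^*(\mathcal{H}) \ni \rho \mapsto T^*(\rho) = (\operatorname{tr}\rho)\mathbb{1} - \rho \in \mathcal{B}^*(\mathcal{H})$.
The eigenvalues of $T^*(\rho)$ are given by $\operatorname{tr}\rho - \lambda_i$, where the $\lambda_i$ are the eigenvalues of $\rho$. If
$\rho \geq 0$ we have $\lambda_i \geq 0$ and therefore $\sum_j \lambda_j - \lambda_k \geq 0$. Hence $T^*$ is positive. That $T^*$ is not completely
positive follows if we consider again the example $|\Omega\rangle\langle\Omega|$ from Eq. (2.24). Since $T^*$ is positive, the necessity part of Theorem 2.11 (applied on either tensor factor) gives
$$\mathbb{1} \otimes \operatorname{tr}_2(\rho) - \rho \geq 0, \qquad \operatorname{tr}_1(\rho) \otimes \mathbb{1} - \rho \geq 0 \tag{2.25}$$
for any separable state $\rho \in \mathcal{B}^*(\mathcal{H} \otimes \mathcal{K})$. These equations are another non-trivial separability criterion,
which is called the reduction criterion [85,42]. It is closely related to the ppt criterion, due to the
following proposition (see [85] for a proof).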
Proposition 2.15. Each ppt-state $\rho \in S(\mathcal{H} \otimes \mathcal{K})$ satisfies the reduction criterion. If $\dim \mathcal{H} = 2$
and $\dim \mathcal{K} = 2, 3$, both criteria are equivalent.
Hence we see with Theorem 2.13 that a state $\rho$ in $2 \times 2$ or $2 \times 3$ dimensions is separable iff it
satisfies the reduction criterion.
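The failure of complete positivity of $T^*(\rho) = (\operatorname{tr}\rho)\mathbb{1} - \rho$ can be checked numerically. For $\rho = |\Omega\rangle\langle\Omega|$ the operator $(\mathrm{Id} \otimes T^*)\rho$ equals $\mathbb{1}/d - |\Omega\rangle\langle\Omega|$, so its expectation value in $\Omega$ is $1/d - 1 < 0$. The plain-Python sketch below computes this value from the general formula $(\mathrm{Id} \otimes T^*)\rho = (\text{reduced state on the first factor}) \otimes \mathbb{1} - \rho$. All helper names are hypothetical.

```python
import math

def reduced_first(rho, d1, d2):
    """Reduced operator on the first factor (partial trace over the second)."""
    return [[sum(rho[i * d2 + j][k * d2 + j] for j in range(d2))
             for k in range(d1)] for i in range(d1)]

def reduction_map_output(rho, d1, d2):
    """(Id (x) T*)(rho) for T*(s) = tr(s) 1 - s, i.e. (reduced state) (x) 1 - rho."""
    red = reduced_first(rho, d1, d2)
    n = d1 * d2
    return [[red[a // d2][b // d2] * (a % d2 == b % d2) - rho[a][b]
             for b in range(n)] for a in range(n)]

def omega_vector(d):
    """Omega = d^{-1/2} sum_j |j> (x) |j>, Eq. (2.24)."""
    v = [0.0] * (d * d)
    for j in range(d):
        v[j * d + j] = 1 / math.sqrt(d)
    return v

def reduction_test_value(d):
    """<Omega, (Id (x) T*)(|Omega><Omega|) Omega>, which equals 1/d - 1 < 0."""
    om = omega_vector(d)
    rho = [[x * y for y in om] for x in om]
    m = reduction_map_output(rho, d, d)
    n = d * d
    return sum(om[a] * m[a][b] * om[b] for a in range(n) for b in range(n))

for d in (2, 3, 4):
    print(d, reduction_test_value(d))  # approximately 1/d - 1
```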

3. Basic examples
After the somewhat abstract discussion in the last section we will now become more concrete. In
the following, we will present a number of examples which, on the one hand, help to understand the
structures just introduced and, on the other hand, are of fundamental importance within quantum
information.



3.1. Entanglement
Although our definition of entanglement (Definition 2.6) is applicable in arbitrary dimensions,
detailed knowledge about entangled states is available only for low-dimensional systems or for
states with very special properties. In this section we will discuss some of the most basic examples.
3.1.1. Maximally entangled states
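Let us start with a look at pure states of a composite system $\mathcal{A} \otimes \mathcal{B}$ and their possible correlations.
If one subsystem is classical, i.e. $\mathcal{A} = C(\{1, \ldots, d\})$, the state space is given according to Section
2.2.2 by $S(\mathcal{B})^d$, and $\rho \in S(\mathcal{B})^d$ is pure iff $\rho = (\delta_{j1}\tau, \ldots, \delta_{jd}\tau)$ with $j = 1, \ldots, d$ and a pure state $\tau$ of
the $\mathcal{B}$ system. Hence, the restrictions of $\rho$ to $\mathcal{A}$, respectively $\mathcal{B}$, are the Dirac measure $\delta_j \in S(X)$
or $\tau \in S(\mathcal{B})$; in other words, both restrictions are pure. This is completely different if $\mathcal{A}$ and $\mathcal{B}$ are
quantum, i.e. $\mathcal{A} \otimes \mathcal{B} = \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$: consider $\rho = |\Psi\rangle\langle\Psi|$ with $\Psi \in \mathcal{H} \otimes \mathcal{K}$ and Schmidt decomposition
(Proposition 2.2) $\Psi = \sum_j \lambda_j^{1/2}\, \phi_j \otimes \psi_j$. Calculating the $\mathcal{A}$ restriction, i.e. the partial trace over $\mathcal{K}$,
we get
$$\operatorname{tr}[\operatorname{tr}_\mathcal{K}(\rho)A] = \operatorname{tr}[|\Psi\rangle\langle\Psi|\, A \otimes \mathbb{1}] = \sum_{jk} \lambda_j^{1/2}\lambda_k^{1/2}\,\langle\phi_j, A\phi_k\rangle\,\delta_{jk}, \tag{3.1}$$
hence $\operatorname{tr}_\mathcal{K}(\rho) = \sum_j \lambda_j |\phi_j\rangle\langle\phi_j|$ is mixed iff $\Psi$ is entangled. The most extreme case arises if $\mathcal{H} = \mathcal{K} = \mathbb{C}^d$
and $\operatorname{tr}_\mathcal{K}(\rho)$ is maximally mixed, i.e. $\operatorname{tr}_\mathcal{K}(\rho) = \mathbb{1}/d$. In this case we get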
$$\Psi = \frac{1}{\sqrt{d}}\sum_{j=1}^d \phi_j \otimes \psi_j \tag{3.2}$$
with two orthonormal bases $\phi_1, \ldots, \phi_d$ and $\psi_1, \ldots, \psi_d$. In $2n \times 2n$ dimensions these states violate
the CHSH inequalities maximally, with appropriately chosen operators $A, A', B, B'$. Such states are
therefore called maximally entangled. The most prominent examples of maximally entangled states
are the four "Bell states" for two-qubit systems, i.e. $\mathcal{H} = \mathcal{K} = \mathbb{C}^2$, where $|1\rangle, |0\rangle$ denotes the canonical
basis and
$$\Phi_0 = \frac{1}{\sqrt{2}}\bigl(|11\rangle + |00\rangle\bigr), \qquad \Phi_j = i(\mathbb{1} \otimes \sigma_j)\Phi_0, \quad j = 1, 2, 3, \tag{3.3}$$
where we have used the shorthand notation $|jk\rangle$ for $|j\rangle \otimes |k\rangle$ and the $\sigma_j$ denote the Pauli matrices.
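The Bell states of Eq. (3.3) are easy to generate and check. The sketch takes the standard Pauli matrices written in the ordered basis $|0\rangle, |1\rangle$ and the vector index $2a + b$ for $|ab\rangle$; both conventions are assumptions of the example. The Gram matrix of the four vectors should be the identity, which confirms that they form an orthonormal basis of $\mathbb{C}^2 \otimes \mathbb{C}^2$. Function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
PAULI = [
    [[0, 1], [1, 0]],      # sigma_1
    [[0, -1j], [1j, 0]],   # sigma_2
    [[1, 0], [0, -1]],     # sigma_3
]

def apply_to_second(sigma, psi):
    """(1 (x) sigma) psi for a two-qubit vector with index 2*a + b for |ab>."""
    out = [0j] * 4
    for a in range(2):
        for b in range(2):
            for b_new in range(2):
                out[2 * a + b_new] += sigma[b_new][b] * psi[2 * a + b]
    return out

def inner(u, v):
    """<u, v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

phi0 = [S + 0j, 0j, 0j, S + 0j]  # (|00> + |11>)/sqrt(2)
bell = [phi0] + [[1j * x for x in apply_to_second(sig, phi0)] for sig in PAULI]
gram = [[inner(u, v) for v in bell] for u in bell]
for row in gram:
    print([round(abs(z), 12) for z in row])  # rows of the 4x4 identity matrix
```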
The Bell states, which form an orthonormal basis of $\mathbb{C}^2 \otimes \mathbb{C}^2$, are the best studied and most
relevant examples of entangled states within quantum information. A mixture of them, i.e. a density
matrix $\rho \in S(\mathbb{C}^2 \otimes \mathbb{C}^2)$ with eigenvectors $\Phi_j$ and eigenvalues $0 \leq \lambda_j \leq 1$, $\sum_j \lambda_j = 1$, is called a
Bell diagonal state. It can be shown [16] that $\rho$ is entangled iff $\max_j \lambda_j > 1/2$ holds. We omit the
proof of this statement here, but we will come back to this point in Section 5 within the discussion
of entanglement measures.
Let us now come back to the general case and consider an arbitrary $\rho \in S(\mathcal{H} \otimes \mathcal{H})$. Using
maximally entangled states, we can introduce another separability criterion in terms of the maximally
entangled fraction (cf. [16])
$$F(\rho) = \sup_{\Psi \text{ max. ent.}} \langle\Psi, \rho\Psi\rangle. \tag{3.4}$$

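If $\rho$ is separable, the reduction criterion (2.25) implies $\langle\Psi, [\operatorname{tr}_1(\rho) \otimes \mathbb{1} - \rho]\Psi\rangle \geq 0$ for any maximally
entangled state $\Psi$. Since the partial trace of $|\Psi\rangle\langle\Psi|$ is $d^{-1}\mathbb{1}$, we get
$$\langle\Psi, \rho\Psi\rangle \leq \langle\Psi, (\operatorname{tr}_1(\rho) \otimes \mathbb{1})\Psi\rangle = d^{-1}, \tag{3.5}$$
hence $F(\rho) \leq 1/d$. This condition is not very sharp, however. Using the ppt criterion it can be
shown that $\rho = \lambda|\Phi_1\rangle\langle\Phi_1| + (1 - \lambda)|00\rangle\langle 00|$ (with the Bell state $\Phi_1$) is entangled for all $0 < \lambda \leq 1$,
but a straightforward calculation shows that $F(\rho) \leq 1/2$ holds for $\lambda \leq 1/2$.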
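The ppt statement about $\rho = \lambda|\Phi_1\rangle\langle\Phi_1| + (1 - \lambda)|00\rangle\langle 00|$ can be checked directly. With the convention of Eq. (3.3), $\Phi_1 = i(|01\rangle + |10\rangle)/\sqrt{2}$. The partial transpose is then block diagonal. It has entries $\lambda/2$ on $|01\rangle$ and $|10\rangle$, and on $\operatorname{span}\{|00\rangle, |11\rangle\}$ it has the block $\begin{pmatrix}1-\lambda & \lambda/2\\ \lambda/2 & 0\end{pmatrix}$, whose determinant $-\lambda^2/4$ is negative for every $\lambda > 0$. The plain-Python sketch below computes the smallest eigenvalue; helper names are hypothetical.

```python
import math

def partial_transpose_2x2(rho):
    """Transpose the second qubit of a 4x4 matrix (index 2*a + b for |ab>)."""
    out = [[0j] * 4 for _ in range(4)]
    for a in range(2):
        for b in range(2):
            for c in range(2):
                for e in range(2):
                    out[2 * a + b][2 * c + e] = rho[2 * a + e][2 * c + b]
    return out

def example_state(lam):
    """lam |Phi_1><Phi_1| + (1 - lam) |00><00| with Phi_1 = i(|01> + |10>)/sqrt(2)."""
    s = 1 / math.sqrt(2)
    phi1 = [0j, 1j * s, 1j * s, 0j]
    rho = [[lam * x * y.conjugate() for y in phi1] for x in phi1]
    rho[0][0] += 1 - lam
    return rho

def min_pt_eigenvalue(lam):
    """Smallest eigenvalue of the (block diagonal) partial transpose."""
    pt = partial_transpose_2x2(example_state(lam))
    a, c = pt[0][0].real, pt[3][3].real  # block on span{|00>, |11>}
    b = abs(pt[0][3])
    block_min = (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return min(block_min, pt[1][1].real, pt[2][2].real)

for lam in (0.0, 0.1, 0.5, 1.0):
    print(lam, min_pt_eigenvalue(lam))  # negative for every lam > 0
```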
Finally, we have to mention here a very useful parameterization of the set of pure states on
$\mathcal{H} \otimes \mathcal{H}$ in terms of maximally entangled states: if $\Omega$ is an arbitrary but fixed maximally entangled
state, each $\psi \in \mathcal{H} \otimes \mathcal{H}$ admits (uniquely determined) operators $X_1, X_2$ such that
$$\psi = (X_1 \otimes \mathbb{1})\Omega = (\mathbb{1} \otimes X_2)\Omega \tag{3.6}$$
holds. This can easily be checked in a product basis.
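Eq. (3.6) can be made concrete for the particular choice $\Omega = d^{-1/2}\sum_j |jj\rangle$ from Eq. (2.24). This choice is an assumption; the text allows any fixed maximally entangled state. Writing $\psi = \sum_{jk} c_{jk}|jk\rangle$ and $C = (c_{jk})$, one finds $X_1 = \sqrt{d}\,C$ and $X_2 = \sqrt{d}\,C^T$. The plain-Python sketch below checks both identities for $d = 3$ and an arbitrary, hypothetical coefficient matrix.

```python
import math

def from_coefficients(c):
    """Vector sum_{jk} c[j][k] |jk> (index j*d + k) from a d x d matrix."""
    d = len(c)
    return [c[j][k] for j in range(d) for k in range(d)]

def apply_first(x, vec, d):
    """(X (x) 1) vec."""
    return [sum(x[j][i] * vec[i * d + k] for i in range(d))
            for j in range(d) for k in range(d)]

def apply_second(x, vec, d):
    """(1 (x) X) vec."""
    return [sum(x[k][i] * vec[j * d + i] for i in range(d))
            for j in range(d) for k in range(d)]

d = 3
omega = from_coefficients([[1 / math.sqrt(d) if j == k else 0.0 for k in range(d)]
                           for j in range(d)])
c = [[0.1, 0.2j, 0.0], [0.3, -0.4, 0.5j], [0.0, 0.6, 0.1 - 0.2j]]  # example data
x1 = [[math.sqrt(d) * c[j][k] for k in range(d)] for j in range(d)]  # sqrt(d) C
x2 = [[math.sqrt(d) * c[k][j] for k in range(d)] for j in range(d)]  # sqrt(d) C^T
psi = from_coefficients(c)

print(max(abs(a - b) for a, b in zip(apply_first(x1, omega, d), psi)))   # ~0
print(max(abs(a - b) for a, b in zip(apply_second(x2, omega, d), psi)))  # ~0
```

The relation is linear in $\psi$, so normalization of $\psi$ plays no role here.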
3.1.2. Werner states
If we consider entanglement of mixed states rather than pure ones, the analysis becomes quite
difficult, even if the dimensions of the underlying Hilbert spaces are low. The reason is that the state
space $S(\mathcal{H}_1 \otimes \mathcal{H}_2)$ of a bipartite system with $\dim \mathcal{H}_i = d_i$ is a geometric object in a $(d_1^2 d_2^2 - 1)$-dimensional
space. Hence even in the simplest non-trivial case (two qubits) the dimension of the
state space becomes very high (15 dimensions), and naive geometric intuition can be misleading.
Therefore, it is often useful to look at special classes of model states, which can be characterized by
only a few parameters. A quite powerful tool is the study of symmetry properties, i.e. to investigate the
set of states which is invariant under a group of local unitaries. A general discussion of this scheme
can be found in [159]. In this paper we will present only three of the most prominent examples.
Consider first a state $\rho \in S(\mathcal{H} \otimes \mathcal{H})$ (with $\mathcal{H} = \mathbb{C}^d$) which is invariant under the group of all
$U \otimes U$ with a unitary $U$ on $\mathcal{H}$, i.e. $[U \otimes U, \rho] = 0$ for all $U$. Such a $\rho$ is usually called a Werner
state [165,128], and its structure can be analyzed quite easily using a well-known result of group
theory which goes back to Weyl [171] (see also [142, Theorem IX.11.5]), and which we will state
in detail for later reference:
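Theorem 3.1. Each operator $A$ on the $N$-fold tensor product $\mathcal{H}^{\otimes N}$ of the (finite-dimensional)
Hilbert space $\mathcal{H}$ which commutes with all unitaries of the form $U^{\otimes N}$ is a linear combination of
permutation operators, i.e. $A = \sum_\pi \lambda_\pi V_\pi$, where the sum is taken over all permutations $\pi$ of $N$
elements, $\lambda_\pi \in \mathbb{C}$, and $V_\pi$ is defined by
$$V_\pi\, \phi_1 \otimes \cdots \otimes \phi_N = \phi_{\pi^{-1}(1)} \otimes \cdots \otimes \phi_{\pi^{-1}(N)}. \tag{3.7}$$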

In our case ($N = 2$) there are only two permutations: the identity $\mathbb{1}$ and the flip $F(\phi \otimes \psi) = \psi \otimes \phi$.
Hence $\rho = a\mathbb{1} + bF$ with appropriate coefficients $a, b$. Since $\rho$ is a density matrix, $a$ and $b$ are not
independent. To get a transparent way to express these constraints, it is reasonable to consider the
eigenprojections $P_\pm$ of $F$ rather than $\mathbb{1}$ and $F$, i.e. $FP_\pm = \pm P_\pm$ and $P_\pm = (\mathbb{1} \pm F)/2$. The $P_\pm$ are
the projections onto the subspaces $\mathcal{H}^{\otimes 2}_\pm \subset \mathcal{H} \otimes \mathcal{H}$ of symmetric, respectively antisymmetric, tensor
products (Bose-, respectively Fermi-subspace). If we write $d_\pm = d(d \pm 1)/2$ for the dimensions of
$\mathcal{H}^{\otimes 2}_\pm$, we get for each Werner state $\rho$
$$\rho = \frac{\lambda}{d_+}P_+ + \frac{1 - \lambda}{d_-}P_-, \qquad \lambda \in [0, 1]. \tag{3.8}$$

On the other hand, it is obvious that each state of this form is U ⊗ U invariant, hence a Werner
state.
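Eq. (3.8) and the relation $\operatorname{tr}(\rho F) = 2\lambda - 1$ used below can be checked numerically. The plain-Python sketch builds $F$, the Werner state for given $\lambda$ and $d \geq 2$, and evaluates the trace and the flip expectation. The helper names are hypothetical, and the basis index convention $jd + k$ for $|jk\rangle$ is an assumption of the sketch.

```python
def flip(d):
    """Flip operator F(phi (x) psi) = psi (x) phi on C^d (x) C^d."""
    n = d * d
    f = [[0.0] * n for _ in range(n)]
    for j in range(d):
        for k in range(d):
            f[k * d + j][j * d + k] = 1.0  # F |jk> = |kj>
    return f

def werner_state(lam, d):
    """rho = lam P_+/d_+ + (1 - lam) P_-/d_- with P_+- = (1 +- F)/2 (needs d >= 2)."""
    f = flip(d)
    d_plus, d_minus = d * (d + 1) / 2, d * (d - 1) / 2
    n = d * d
    return [[lam * ((i == j) + f[i][j]) / (2 * d_plus)
             + (1 - lam) * ((i == j) - f[i][j]) / (2 * d_minus)
             for j in range(n)] for i in range(n)]

def trace_product(a, b):
    """tr(A B) for square matrices."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))

d, lam = 3, 0.3
rho = werner_state(lam, d)
print(sum(rho[i][i] for i in range(d * d)))       # 1 up to rounding
print(trace_product(rho, flip(d)), 2 * lam - 1)   # both approximately -0.4
```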
If $\rho$ is given, it is very easy to calculate the parameter $\lambda$ from the expectation value of $\rho$ and the
flip: $\operatorname{tr}(\rho F) = 2\lambda - 1 \in [-1, 1]$. Therefore, we can write for an arbitrary state $\rho \in S(\mathcal{H} \otimes \mathcal{H})$
$$P_{UU}(\rho) = \frac{\operatorname{tr}(\rho F) + 1}{2d_+}P_+ + \frac{1 - \operatorname{tr}(\rho F)}{2d_-}P_-, \tag{3.9}$$
and this defines a projection from the full state space onto the set of Werner states, which is called
the twirl operation. In many cases it is quite useful that it can alternatively be written as a group
average of the form
$$P_{UU}(\rho) = \int_{U(d)} (U \otimes U)\rho(U^* \otimes U^*)\, dU, \tag{3.10}$$
where $dU$ denotes the normalized, left invariant Haar measure on $U(d)$. To check this identity, note
first that its right-hand side is indeed $U \otimes U$ invariant, due to the invariance of the volume element
$dU$. Hence, we only have to check that the trace of $F$ times the integral coincides with $\operatorname{tr}(F\rho)$:
$$\operatorname{tr}\Bigl[F\int_{U(d)} (U \otimes U)\rho(U^* \otimes U^*)\, dU\Bigr] = \int_{U(d)} \operatorname{tr}\bigl[F(U \otimes U)\rho(U^* \otimes U^*)\bigr]\, dU \tag{3.11}$$
$$= \operatorname{tr}(F\rho)\int_{U(d)} dU = \operatorname{tr}(F\rho), \tag{3.12}$$
where we have used the fact that $F$ commutes with $U \otimes U$ and the normalization of $dU$. We
can obviously apply $P_{UU}$ to arbitrary operators $A \in \mathcal{B}(\mathcal{H} \otimes \mathcal{H})$ and, as an integral over unitarily
implemented operations, we get a channel. Substituting $U \to U^*$ in (3.10) and cycling the trace
$\operatorname{tr}(AP_{UU}(\rho))$, we find $\operatorname{tr}(P_{UU}(A)\rho) = \operatorname{tr}(AP_{UU}(\rho))$; hence $P_{UU}$ has the same form in the Heisenberg
and the Schrödinger picture (i.e. $P_{UU}^* = P_{UU}$).
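The equality of Eqs. (3.9) and (3.10) can be probed by Monte Carlo integration for $d = 2$. The sketch samples Haar-random matrices from $SU(2)$ (uniform unit quaternions) instead of $U(2)$. This substitution is harmless because a global phase cancels in $(U \otimes U)\rho(U \otimes U)^*$. The sample size, the seed and the helper names are illustrative assumptions. The estimate only approaches the closed form at a rate of roughly $1/\sqrt{\text{samples}}$.

```python
import math
import random

def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(m):
    """Conjugate transpose."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def haar_su2(rng):
    """Haar-random SU(2) matrix built from a uniformly random unit quaternion."""
    x = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(t * t for t in x))
    a = complex(x[0], x[1]) / norm
    b = complex(x[2], x[3]) / norm
    return [[a, -b.conjugate()], [b, a.conjugate()]]

def twirl_monte_carlo(rho, samples, seed=0):
    """Estimate Eq. (3.10) by averaging (U (x) U) rho (U (x) U)^* over samples."""
    rng = random.Random(seed)
    acc = [[0j] * 4 for _ in range(4)]
    for _ in range(samples):
        uu = kron(*(2 * [haar_su2(rng)]))  # U (x) U with the same U twice
        term = matmul(uu, matmul(rho, dagger(uu)))
        acc = [[x + y for x, y in zip(ra, rt)] for ra, rt in zip(acc, term)]
    return [[x / samples for x in row] for row in acc]

def twirl_closed_form(rho):
    """Eq. (3.9) for d = 2, where d_+ = 3 and d_- = 1."""
    flip = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
    t = sum(rho[i][k] * flip[k][i] for i in range(4) for k in range(4)).real
    p_plus = [[((i == j) + flip[i][j]) / 2 for j in range(4)] for i in range(4)]
    p_minus = [[((i == j) - flip[i][j]) / 2 for j in range(4)] for i in range(4)]
    return [[(t + 1) / 6 * p_plus[i][j] + (1 - t) / 2 * p_minus[i][j]
             for j in range(4)] for i in range(4)]

rho = [[0j] * 4 for _ in range(4)]
rho[1][1] = 1 + 0j  # the product state |01><01|
exact = twirl_closed_form(rho)
estimate = twirl_monte_carlo(rho, samples=4000)
err = max(abs(exact[i][j] - estimate[i][j]) for i in range(4) for j in range(4))
print(err)  # small; decreases roughly like 1/sqrt(samples)
```

In `twirl_monte_carlo`, one matrix $U$ is drawn per sample and used in both tensor factors. Drawing the two factors independently would compute a different average.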
If $\rho \in S(\mathcal{H} \otimes \mathcal{H})$ is a separable state, the integrand of $P_{UU}(\rho)$ in Eq. (3.10) consists entirely of
separable states, hence $P_{UU}(\rho)$ is separable. Since each Werner state is the twirl of itself, we see
that $\rho$ is separable iff it is the twirl $P_{UU}(\sigma)$ of a separable state $\sigma \in S(\mathcal{H} \otimes \mathcal{H})$. To determine the
set of separable Werner states we therefore only have to calculate the set of all $\operatorname{tr}(F\sigma) \in [-1, 1]$
with separable $\sigma$. Since each such $\sigma$ admits a convex decomposition into pure product states, it is
sufficient to look at
$$\langle\psi \otimes \phi, F\, \psi \otimes \phi\rangle = |\langle\psi, \phi\rangle|^2, \tag{3.13}$$
which ranges from 0 to 1. Hence $\rho$ from Eq. (3.8) is separable iff $1/2 \leq \lambda \leq 1$, and entangled otherwise
(due to $\lambda = (\operatorname{tr}(F\rho) + 1)/2$). If $\mathcal{H} = \mathbb{C}^2$ holds, each Werner state is Bell diagonal and we recover
the result from Section 3.1.1 (separable iff the highest eigenvalue is less than or equal to $1/2$).
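Eq. (3.13) and the resulting range $1/2 \leq \lambda \leq 1$ can be checked by sampling pure product states. The sketch computes $\langle\psi \otimes \phi, F\,\psi \otimes \phi\rangle$ directly from the vector components, compares it with $|\langle\psi, \phi\rangle|^2$, and converts it to $\lambda = (\operatorname{tr}(F\sigma) + 1)/2$. The random sampling (normalized complex Gaussian vectors with $d = 3$) is an illustrative choice, and the helper names are hypothetical.

```python
import math
import random

def random_unit_vector(d, rng):
    """Random unit vector in C^d (normalized complex Gaussian entries)."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]

def flip_expectation(psi, phi):
    """<psi (x) phi, F psi (x) phi>, computed from the vector components."""
    d = len(psi)
    v = [psi[j] * phi[k] for j in range(d) for k in range(d)]
    return sum(v[j * d + k].conjugate() * v[k * d + j]
               for j in range(d) for k in range(d)).real

rng = random.Random(7)
records = []
for _ in range(500):
    psi, phi = random_unit_vector(3, rng), random_unit_vector(3, rng)
    overlap = abs(sum(x.conjugate() * y for x, y in zip(psi, phi))) ** 2
    t = flip_expectation(psi, phi)
    records.append((t, overlap, (t + 1) / 2))  # lambda = (tr(F sigma) + 1)/2

print(max(abs(t - o) for t, o, _ in records))  # ~0, as in Eq. (3.13)
print(min(lam for *_, lam in records) >= 0.5)   # True: 1/2 <= lambda <= 1
```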
3.1.3. Isotropic states
To derive a second class of states, consider the partial transpose $(\mathrm{Id} \otimes \Theta)\rho$ (with respect to a
distinguished basis $|j\rangle \in \mathcal{H}$, $j = 1, \ldots, d$) of a Werner state $\rho$. Since $\rho$ is, by definition, $U \otimes U$
invariant, it is easy to see that $(\mathrm{Id} \otimes \Theta)\rho$ is $U \otimes \bar U$ invariant, where $\bar U$ denotes componentwise
complex conjugation in the basis $|j\rangle$ (we just have to use that $U^* = \bar U^T$ holds). Each state with
this kind of symmetry is called an isotropic state [132], and our previous discussion shows that such a $\rho$
is a linear combination of $\mathbb{1}$ and the partial transpose of the flip, which is the rank one operator
$$\tilde F = (\mathrm{Id} \otimes \Theta)F = |\tilde\Omega\rangle\langle\tilde\Omega| = \sum_{jk=1}^d |jj\rangle\langle kk|, \tag{3.14}$$
where $\tilde\Omega = \sum_j |jj\rangle$ is, up to normalization, a maximally entangled state. Hence, each isotropic $\rho$ can
be written as
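$$\rho = \frac{\lambda}{d^2}\mathbb{1} + \frac{1 - \lambda}{d}\tilde F, \qquad \lambda \in \Bigl[0, \frac{d^2}{d^2 - 1}\Bigr], \tag{3.15}$$
where the bounds on $\lambda$ follow from normalization and positivity. As above, we can determine the
parameter $\lambda$ from the expectation value
$$\operatorname{tr}(\tilde F\rho) = \frac{1 - d^2}{d}\lambda + d, \tag{3.16}$$
which ranges from 0 to $d$, and this again leads to a twirl operation: for an arbitrary state $\rho \in S(\mathcal{H} \otimes \mathcal{H})$
we can define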
$$P_{U\bar U}(\rho) = \frac{1}{d(1 - d^2)}\Bigl(\bigl[\operatorname{tr}(\tilde F\rho) - d\bigr]\mathbb{1} + \bigl[1 - d\operatorname{tr}(\tilde F\rho)\bigr]\tilde F\Bigr), \tag{3.17}$$
and, as for Werner states, $P_{U\bar U}$ can be rewritten in terms of a group average
$$P_{U\bar U}(\rho) = \int_{U(d)} (U \otimes \bar U)\rho(U \otimes \bar U)^*\, dU. \tag{3.18}$$
Now we can proceed in the same way as above: $P_{U\bar U}$ is a channel with $P_{U\bar U}^* = P_{U\bar U}$, its fixed points
$P_{U\bar U}(\rho) = \rho$ are exactly the isotropic states, and the image of the set of separable states under $P_{U\bar U}$
coincides with the set of separable isotropic states. To determine the latter we have to consider the
expectation values (cf. Eq. (3.13))
$$\langle\phi \otimes \psi, \tilde F\, \phi \otimes \psi\rangle = \Bigl|\sum_{j=1}^d \phi_j\psi_j\Bigr|^2 = |\langle\bar\phi, \psi\rangle|^2 \in [0, 1]. \tag{3.19}$$
This implies that $\rho$ is separable iff
$$\frac{d(d - 1)}{d^2 - 1} \leq \lambda \leq \frac{d^2}{d^2 - 1} \tag{3.20}$$
holds, and entangled otherwise. For $\lambda = 0$ we recover the maximally entangled state. For $d = 2$
we recover again the special case of Bell diagonal states already encountered in the last subsection.
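Eqs. (3.15), (3.16) and (3.20) can be checked in the same way for isotropic states. The plain-Python sketch below builds $\tilde F$ and $\rho$ and compares the directly computed $\operatorname{tr}(\tilde F\rho)$ with Eq. (3.16). It also confirms that the lower bound in Eq. (3.20) is exactly where $\operatorname{tr}(\tilde F\rho) = 1$. The helper names are hypothetical, and the basis index convention $jd + k$ for $|jk\rangle$ is an assumption of the sketch.

```python
def tilde_flip(d):
    """F~ = sum_{jk} |jj><kk| on C^d (x) C^d, Eq. (3.14)."""
    n = d * d
    f = [[0.0] * n for _ in range(n)]
    for j in range(d):
        for k in range(d):
            f[j * d + j][k * d + k] = 1.0
    return f

def isotropic_state(lam, d):
    """rho = lam 1/d^2 + (1 - lam) F~/d, Eq. (3.15)."""
    ft = tilde_flip(d)
    n = d * d
    return [[lam * (i == j) / d ** 2 + (1 - lam) * ft[i][j] / d for j in range(n)]
            for i in range(n)]

def expectation_tilde_flip(rho, d):
    """tr(F~ rho) = sum_{jk} <kk| rho |jj>."""
    return sum(rho[k * d + k][j * d + j] for j in range(d) for k in range(d))

def separable_range(d):
    """Interval of lambda for separable isotropic states, Eq. (3.20)."""
    return d * (d - 1) / (d ** 2 - 1), d ** 2 / (d ** 2 - 1)

d, lam = 3, 0.4
rho = isotropic_state(lam, d)
print(expectation_tilde_flip(rho, d), (1 - d ** 2) / d * lam + d)  # equal, Eq. (3.16)
lo, hi = separable_range(d)
print(lo, hi)  # 0.75 1.125
print(expectation_tilde_flip(isotropic_state(lo, d), d))  # approximately 1
```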
3.1.4. OO-invariant states
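Let us now combine Werner states with isotropic states, i.e. we look for density matrices which
can be written as $\rho = a\mathbb{1} + bF + c\tilde F$. If we introduce the three mutually orthogonal projection
operators
$$p_0 = \frac{1}{d}\tilde F, \qquad p_1 = \frac{1}{2}(\mathbb{1} - F), \qquad p_2 = \frac{1}{2}(\mathbb{1} + F) - \frac{1}{d}\tilde F, \tag{3.21}$$
we can equivalently write such a $\rho$ as a convex linear combination of $\operatorname{tr}(p_j)^{-1}p_j$, $j = 0, 1, 2$:
$$\rho = (1 - \lambda_1 - \lambda_2)p_0 + \lambda_1\frac{p_1}{\operatorname{tr}(p_1)} + \lambda_2\frac{p_2}{\operatorname{tr}(p_2)}, \qquad \lambda_1, \lambda_2 \geq 0, \quad \lambda_1 + \lambda_2 \leq 1. \tag{3.22}$$

[Figure: plot with horizontal axis $\operatorname{tr}(F\rho)$ and vertical axis $\operatorname{tr}(\tilde F\rho)$, both ranging from $-1$ to $3$.]
Fig. 3.1. State space of OO-invariant states (upper triangle) and its partial transpose (lower triangle) for $d = 3$. The special
cases of isotropic and Werner states are drawn as thin lines.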

Each such operator is invariant under all transformations of the form $U \otimes U$ if $U$ is a unitary with
$\bar U = U$; in other words, $U$ should be a real orthogonal matrix. A little representation theory of
the orthogonal group shows that in fact all operators with this invariance property have the form
given in (3.22); cf. [159]. The corresponding states are therefore called OO-invariant, and we can
apply basically the same machinery as in Section 3.1.2 if we replace the unitary group $U(d)$ by the
orthogonal group $O(d)$. This includes, in particular, the definition of a twirl operation as an average
over $O(d)$ (for an arbitrary $\rho \in S(\mathcal{H} \otimes \mathcal{H})$):
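$$P_{OO}(\rho) = \int_{O(d)} (U \otimes U)\rho(U \otimes U)^*\, dU, \tag{3.23}$$
which we can express alternatively in terms of the expectation values $\operatorname{tr}(F\rho)$, $\operatorname{tr}(\tilde F\rho)$ by
$$P_{OO}(\rho) = \frac{\operatorname{tr}(\tilde F\rho)}{d}p_0 + \frac{1 - \operatorname{tr}(F\rho)}{2\operatorname{tr}(p_1)}p_1 + \Bigl(\frac{1 + \operatorname{tr}(F\rho)}{2} - \frac{\operatorname{tr}(\tilde F\rho)}{d}\Bigr)\frac{p_2}{\operatorname{tr}(p_2)}. \tag{3.24}$$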

˜
The range of allowed values for tr(F ), tr(F ) is given by
− 1 6 tr(F ) 6 1;

˜
0 6 tr(F ) 6 d;

tr(F ) ¿

For d = 3 this is the upper triangle in Fig. 3.1.

˜
2tr(F )
−1 :
d

(3.25)
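The projector algebra behind (3.21)–(3.24) is easy to check numerically. The following minimal NumPy sketch builds $F$ and $\tilde F$ in the computational basis and verifies that $p_0, p_1, p_2$ are mutually orthogonal projections summing to $\mathbb{1}$. It also confirms that the right-hand side of (3.24) reproduces an OO-invariant state of the form (3.22). The choices $d = 3$, $\lambda_1 = 0.2$, $\lambda_2 = 0.5$ and all helper names are ours and purely illustrative.

```python
import numpy as np

def flip_operators(d):
    # F|ij> = |ji>  and  Ft = d|Omega><Omega| = sum_ij |ii><jj|
    F = np.zeros((d * d, d * d))
    Ft = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[j * d + i, i * d + j] = 1.0
            Ft[i * d + i, j * d + j] = 1.0
    return F, Ft

def oo_projectors(d):
    # Eq. (3.21)
    F, Ft = flip_operators(d)
    one = np.eye(d * d)
    return Ft / d, (one - F) / 2, (one + F) / 2 - Ft / d

def oo_state(d, lam1, lam2):
    # Convex combination, Eq. (3.22)
    p0, p1, p2 = oo_projectors(d)
    return (1 - lam1 - lam2) * p0 + lam1 * p1 / np.trace(p1) + lam2 * p2 / np.trace(p2)

def twirl_formula(rho, d):
    # Right-hand side of Eq. (3.24), built from tr(F rho) and tr(Ft rho)
    F, Ft = flip_operators(d)
    p0, p1, p2 = oo_projectors(d)
    f, ft = np.trace(F @ rho), np.trace(Ft @ rho)
    return (ft / d) * p0 + (1 - f) / 2 * p1 / np.trace(p1) + ((1 + f) / 2 - ft / d) * p2 / np.trace(p2)

d = 3
p = oo_projectors(d)
for a in range(3):
    for b in range(3):
        expected = p[a] if a == b else np.zeros_like(p[a])
        assert np.allclose(p[a] @ p[b], expected)      # orthogonal projections
assert np.allclose(sum(p), np.eye(d * d))              # resolution of the identity

rho = oo_state(d, 0.2, 0.5)
assert np.allclose(twirl_formula(rho, d), rho)         # OO-invariant states are fixed by (3.24)

# Invariance under O (x) O for a random real orthogonal O
O, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(d, d)))
OO = np.kron(O, O)
assert np.allclose(OO @ rho @ OO.T, rho)
```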



The values in the lower (dotted) triangle belong to partial transpositions of OO-invariant states. The intersection of both, i.e. the gray-shaded square $Q = [0, 1]\times[0, 1]$, therefore represents the set of OO-invariant ppt states. It is at the same time the set of separable OO-invariant states, since each OO-invariant ppt state is separable. To see the latter, note that the separable OO-invariant states form a convex subset of $Q$. Hence, we only have to show that the corners of $Q$ are separable. To do this, note (1) that $P_{OO}(\rho)$ is separable whenever $\rho$ is, and (2) that $\mathrm{tr}(FP_{OO}(\rho)) = \mathrm{tr}(F\rho)$ and $\mathrm{tr}(\tilde FP_{OO}(\rho)) = \mathrm{tr}(\tilde F\rho)$ hold (cf. Eq. (3.12)). We can take pure product states $\rho = |\phi\otimes\psi\rangle\langle\phi\otimes\psi|$ and get $(|\langle\phi, \psi\rangle|^2, |\langle\bar\phi, \psi\rangle|^2)$ for the tuple $(\mathrm{tr}(F\rho), \mathrm{tr}(\tilde F\rho))$. The point $(1, 1)$ in $Q$ is obtained if $\phi = \psi$ is real, and the point $(0, 0)$ is obtained for real and orthogonal $\phi, \psi$. The point $(1, 0)$ belongs to the case $\phi = \psi$ and $\langle\bar\phi, \phi\rangle = 0$. Symmetrically, we get $(0, 1)$ with the same $\phi$ and $\psi = \bar\phi$.
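The four product states used in this argument can be checked directly. The sketch below computes $(\mathrm{tr}(F\rho), \mathrm{tr}(\tilde F\rho))$ for $\rho = |\phi\otimes\psi\rangle\langle\phi\otimes\psi|$ in $d = 3$. The concrete vectors are our illustrative choices: the real basis vectors $e_0, e_1$, and the complex unit vector $\chi = (e_0 + ie_1)/\sqrt2$, which satisfies $\langle\bar\chi, \chi\rangle = \sum_j\chi_j^2 = 0$.

```python
import numpy as np

def flip_operators(d):
    F = np.zeros((d * d, d * d))
    Ft = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[j * d + i, i * d + j] = 1.0   # F|ij> = |ji>
            Ft[i * d + i, j * d + j] = 1.0  # Ft = sum_ij |ii><jj|
    return F, Ft

def point_in_plane(phi, psi):
    """(tr(F rho), tr(Ft rho)) for rho = |phi (x) psi><phi (x) psi|."""
    F, Ft = flip_operators(len(phi))
    v = np.kron(phi, psi)
    rho = np.outer(v, v.conj())
    return np.real(np.trace(F @ rho)), np.real(np.trace(Ft @ rho))

e0, e1 = np.eye(3)[0], np.eye(3)[1]
chi = (e0 + 1j * e1) / np.sqrt(2)

corners = {
    (1, 1): point_in_plane(e0, e0),            # phi = psi real
    (0, 0): point_in_plane(e0, e1),            # real and orthogonal
    (1, 0): point_in_plane(chi, chi),          # phi = psi, <conj(phi), phi> = 0
    (0, 1): point_in_plane(chi, chi.conj()),   # psi = conj(phi)
}
for target, value in corners.items():
    print(target, np.round(value, 12))
```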
3.1.5. PPT states
We have seen in Theorem 2.13 that separable states and ppt states coincide in $2\times2$ and $2\times3$ dimensions. The OO-invariant states just studied are another class of examples with this property. Nevertheless, separability and a positive partial transpose are not equivalent in general. An easy way to produce examples of states which are entangled and ppt uses unextendible product bases [14]. An orthonormal family $\psi_j \in \mathcal{H}_1\otimes\mathcal{H}_2$, $j = 1, \ldots, N < d_1d_2$ (with $d_k = \dim\mathcal{H}_k$), is called an unextendible product basis$^{9}$ (UPB) iff (1) all $\psi_j$ are product vectors and (2) there is no product vector orthogonal to all $\psi_j$. Let us denote the projector onto the span of all $\psi_j$ by $E$ and its orthocomplement by $E^\perp$, i.e. $E^\perp = \mathbb{1} - E$, and define the state $\rho = (d_1d_2 - N)^{-1}E^\perp$. It is entangled because, by construction, there is no product vector in the support of $\rho$. It is also ppt, which can be seen as follows. The projector $E$ is a sum of the one-dimensional projectors $|\psi_j\rangle\langle\psi_j|$, $j = 1, \ldots, N$. Since all $\psi_j$ are product vectors, the partial transposes of the $|\psi_j\rangle\langle\psi_j|$ are of the form $|\tilde\psi_j\rangle\langle\tilde\psi_j|$ with another UPB $\tilde\psi_j$, $j = 1, \ldots, N$. Hence the partial transpose $(\mathbb{1}\otimes\Theta)E$ of $E$ is the sum of the $|\tilde\psi_j\rangle\langle\tilde\psi_j|$. It follows that $(\mathbb{1}\otimes\Theta)E^\perp = \mathbb{1} - (\mathbb{1}\otimes\Theta)E$ is a projector and therefore positive.
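To construct entangled ppt states we have to find UPBs. The following two examples are taken from [14]. Consider first the five vectors
$$\phi_j = N\big(\cos(2\pi j/5), \sin(2\pi j/5), h\big), \qquad j = 0, \ldots, 4, \tag{3.26}$$
with $N = 2/\sqrt{5 + \sqrt5}$ and $h = \frac12\sqrt{1 + \sqrt5}$. They point from the apex of a regular pentagonal pyramid of height $h$ to the corners of its base. The height is chosen such that non-adjacent vectors are orthogonal. It is now easy to show that the five vectors
$$\psi_j = \phi_j\otimes\phi_{2j \bmod 5}, \qquad j = 0, \ldots, 4, \tag{3.27}$$
form a UPB in the Hilbert space $\mathcal{H}\otimes\mathcal{H}$, $\dim\mathcal{H} = 3$ (cf. [14]). A second example, again in a $(3\times3)$-dimensional Hilbert space, is given by the following five vectors (called "Tiles" in [14]):
$$\frac{1}{\sqrt2}|0\rangle\otimes(|0\rangle - |1\rangle), \quad \frac{1}{\sqrt2}|2\rangle\otimes(|1\rangle - |2\rangle), \quad \frac{1}{\sqrt2}(|0\rangle - |1\rangle)\otimes|2\rangle,$$
$$\frac{1}{\sqrt2}(|1\rangle - |2\rangle)\otimes|0\rangle, \quad \frac13(|0\rangle + |1\rangle + |2\rangle)\otimes(|0\rangle + |1\rangle + |2\rangle), \tag{3.28}$$
where $|k\rangle$, $k = 0, 1, 2$ denotes the standard basis in $\mathcal{H} = \mathbb{C}^3$.

$^{9}$ This name is somewhat misleading because the $\psi_j$ are not a basis of $\mathcal{H}_1\otimes\mathcal{H}_2$.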
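Both UPBs can be turned into bound entangled states with a few lines of NumPy. The sketch below builds the Pyramid family (3.26)–(3.27) and the Tiles family (3.28) and checks that each is orthonormal. It then forms $\rho = (d_1d_2 - N)^{-1}(\mathbb{1} - E)$ and confirms that its partial transpose has no negative eigenvalues, as argued above. The code does not test unextendibility (and hence entanglement), which is taken from [14]. The helper names are ours.

```python
import numpy as np

def partial_transpose(rho, d1, d2):
    # Transpose the second factor of an operator on C^d1 (x) C^d2
    return rho.reshape(d1, d2, d1, d2).transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

def upb_state(vectors, d1, d2):
    """rho = (d1 d2 - N)^{-1} (1 - E) for an orthonormal family of product vectors."""
    E = sum(np.outer(v, v.conj()) for v in vectors)
    return (np.eye(d1 * d2) - E) / (d1 * d2 - len(vectors))

# Pyramid, Eqs. (3.26)-(3.27)
norm = 2 / np.sqrt(5 + np.sqrt(5))
h = 0.5 * np.sqrt(1 + np.sqrt(5))
phi = [norm * np.array([np.cos(2 * np.pi * j / 5), np.sin(2 * np.pi * j / 5), h]) for j in range(5)]
pyramid = [np.kron(phi[j], phi[(2 * j) % 5]) for j in range(5)]

# Tiles, Eq. (3.28)
k = np.eye(3)
s = (k[0] + k[1] + k[2]) / np.sqrt(3)
tiles = [
    np.kron(k[0], (k[0] - k[1]) / np.sqrt(2)),
    np.kron(k[2], (k[1] - k[2]) / np.sqrt(2)),
    np.kron((k[0] - k[1]) / np.sqrt(2), k[2]),
    np.kron((k[1] - k[2]) / np.sqrt(2), k[0]),
    np.kron(s, s),
]

for name, family in (("Pyramid", pyramid), ("Tiles", tiles)):
    gram = np.array([[np.vdot(a, b) for b in family] for a in family])
    rho = upb_state(family, 3, 3)
    pt_min = np.linalg.eigvalsh(partial_transpose(rho, 3, 3)).min()
    print(name, np.allclose(gram, np.eye(5)), pt_min >= -1e-12)
```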
3.1.6. Multipartite states
In many applications of quantum information, rather big systems consisting of a large number of subsystems occur (e.g. a quantum register of a quantum computer), and it is necessary to study the corresponding correlation and entanglement properties. Since this is a fairly difficult task, not much is known about it (much less than in the bipartite case, which we mainly consider in this paper). Nevertheless, in this subsection we will give a rough outline of some of the most relevant aspects.
At the level of pure states the most significant difficulty is the lack of an analog of the Schmidt decomposition [126]. More precisely, there are elements in an $N$-fold tensor product $\mathcal{H}^{(1)}\otimes\cdots\otimes\mathcal{H}^{(N)}$ (with $N > 2$) which cannot be written as$^{10}$
$$\psi = \sum_{j=1}^{d}\lambda_j\,\phi_j^{(1)}\otimes\cdots\otimes\phi_j^{(N)} \tag{3.29}$$
with $N$ orthonormal bases $\phi_1^{(k)}, \ldots, \phi_d^{(k)}$ of $\mathcal{H}^{(k)}$, $k = 1, \ldots, N$. To get examples of such states in the tripartite case, note first that any partial trace of $|\psi\rangle\langle\psi|$ with $\psi$ from Eq. (3.29) has separable eigenvectors. Hence, no purification (Corollary 2.3) of an entangled bipartite mixed state with inseparable eigenvectors (e.g. a Bell diagonal state) admits a Schmidt decomposition. On the one hand, this implies that there are interesting new properties to be discovered. On the other hand, many techniques developed for bipartite pure states can be generalized in a straightforward way only for states which are Schmidt decomposable in the sense of Eq. (3.29). The most well-known representative of this class for a tripartite qubit system is the GHZ state [73]
$$\psi = \frac{1}{\sqrt2}\big(|000\rangle + |111\rangle\big), \tag{3.30}$$
which has a special property: contradictions between local hidden variable theories and quantum mechanics occur even for non-statistical predictions (as opposed to maximally entangled states of bipartite systems [73,117,116]).
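The statement that partial traces of Schmidt-decomposable states have separable eigenvectors can be illustrated with the GHZ state (3.30). The NumPy sketch below traces out the third qubit and checks that the reduced state is $\frac12(|00\rangle\langle00| + |11\rangle\langle11|)$, which is diagonal in a product basis. The index bookkeeping via `einsum` is our choice of implementation.

```python
import numpy as np

ket0, ket1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ghz = (np.kron(np.kron(ket0, ket0), ket0) + np.kron(np.kron(ket1, ket1), ket1)) / np.sqrt(2)

# |psi><psi| with indices (i1, i2, i3, j1, j2, j3); summing i3 = j3 traces out qubit 3
rho = np.outer(ghz, ghz.conj()).reshape(2, 2, 2, 2, 2, 2)
rho_12 = np.einsum("abkcdk->abcd", rho).reshape(4, 4)

expected = 0.5 * (np.outer(np.kron(ket0, ket0), np.kron(ket0, ket0))
                  + np.outer(np.kron(ket1, ket1), np.kron(ket1, ket1)))
print(np.allclose(rho_12, expected))
```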
A second new aspect arising in the discussion of multiparty entanglement is the fact that several different notions of separability occur. A state $\rho$ of an $N$-partite system $\mathcal{B}(\mathcal{H}_1)\otimes\cdots\otimes\mathcal{B}(\mathcal{H}_N)$ is called $N$-separable if
$$\rho = \sum_J\lambda_J\,\rho_{j_1}\otimes\cdots\otimes\rho_{j_N} \tag{3.31}$$
with states $\rho_{j_k} \in \mathcal{B}^*(\mathcal{H}_k)$ and multi-indices $J = (j_1, \ldots, j_N)$. Alternatively, however, we can decompose $\mathcal{B}(\mathcal{H}_1)\otimes\cdots\otimes\mathcal{B}(\mathcal{H}_N)$ into two subsystems (or even into $M$ subsystems if $M < N$) and call $\rho$ biseparable if it is separable with respect to this decomposition. Obviously, $N$-separability implies biseparability with respect to all possible decompositions. Not very surprisingly, the converse is not true. One way to construct a corresponding counterexample is to use an unextendible product basis (cf. Section 3.1.5). In [14] it is shown that the tripartite qubit state complementary to the UPB
$$|0, 1, +\rangle, \quad |1, +, 0\rangle, \quad |+, 0, 1\rangle, \quad |-, -, -\rangle \qquad\text{with}\quad |\pm\rangle = \frac{1}{\sqrt2}\big(|0\rangle \pm |1\rangle\big) \tag{3.32}$$
is entangled (i.e. tri-inseparable) but biseparable with respect to any decomposition into two subsystems (cf. [14] for details).

$^{10}$ There is, however, the possibility to choose the bases $\phi_1^{(k)}, \ldots, \phi_d^{(k)}$ such that the number of summands becomes minimal. For tripartite systems this "minimal canonical form" is studied in [1].
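The tripartite UPB (3.32) can be checked in the same way as the bipartite examples. The sketch below verifies that the four vectors are orthonormal product vectors. It then forms the complementary state $\rho = (8 - 4)^{-1}(\mathbb{1} - E)$ and tests the partial transpose across each single-qubit cut. A positive partial transpose across every cut is only a necessary condition for the biseparability shown in [14], not a proof of it. The helper names are ours.

```python
import numpy as np

k0, k1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
plus, minus = (k0 + k1) / np.sqrt(2), (k0 - k1) / np.sqrt(2)

def ket(*factors):
    # Tensor product of single-qubit vectors
    out = np.array([1.0])
    for f in factors:
        out = np.kron(out, f)
    return out

shifts = [ket(k0, k1, plus), ket(k1, plus, k0), ket(plus, k0, k1), ket(minus, minus, minus)]
gram = np.array([[a @ b for b in shifts] for a in shifts])

E = sum(np.outer(v, v) for v in shifts)
rho = (np.eye(8) - E) / (8 - len(shifts))   # state complementary to the UPB

def partial_transpose_qubit(rho, k):
    # Transpose qubit k (0, 1 or 2): swap its row and column indices
    t = rho.reshape([2] * 6)
    return np.swapaxes(t, k, k + 3).reshape(8, 8)

print(np.allclose(gram, np.eye(4)))
for k in range(3):
    print(k, np.linalg.eigvalsh(partial_transpose_qubit(rho, k)).min() >= -1e-12)
```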
Another, perhaps more systematic, way to find examples of multipartite states with interesting properties is to generalize the methods used for Werner states (Section 3.1.2), i.e. to look for density matrices $\rho \in \mathcal{B}^*(\mathcal{H}^{\otimes N})$ which commute with all unitaries of the form $U^{\otimes N}$. Applying Theorem 3.1 again, we see that each such $\rho$ is a linear combination of permutation unitaries. Hence, the structure of the set of all $U^{\otimes N}$-invariant states can be derived from the representation theory of the symmetric group (which can be tedious for large $N$!). For $N = 3$ this program is carried out in [61], and it turns out that the corresponding set of invariant states is a five-dimensional (real) manifold. We skip the details here and refer to [61] instead.
3.2. Channels
In Section 2.3 we have introduced channels as very general objects transforming arbitrary types
of information (i.e. classical, quantum and mixtures of them) into one another. In the following, we
will consider some of the most important special cases.
3.2.1. Quantum channels
Many tasks of quantum information theory require the transmission of quantum information over long distances, using devices like optical fibers, or the storage of quantum information in some sort of memory. Both situations can be described by a channel or quantum operation $T: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$. Here $T^*(\rho)$ is the quantum information which will be received when $\rho$ was sent or, alternatively, which will be read off the quantum memory when $\rho$ was written. Ideally, we would prefer channels which do not affect the information at all, i.e. $T = \mathrm{Id}$. As the next best choice, we would take a $T$ whose action can be undone by a physical device, i.e. $T$ should be invertible and $T^{-1}$ should again be a channel. The Stinespring theorem (Theorem 2.8) immediately shows that this implies $T^*\rho = U\rho U^*$ with a unitary $U$; in other words, the systems carrying the information do not interact with the environment. We will call such a channel an ideal channel. In real situations, however, interaction with the environment (i.e. with additional, unobservable degrees of freedom) cannot be avoided. The general structure of such a noisy channel is given by
$$T^*(\rho) = \mathrm{tr}_{\mathcal{K}}\big(U(\rho\otimes\rho_0)U^*\big), \tag{3.33}$$
where $U: \mathcal{H}\otimes\mathcal{K} \to \mathcal{H}\otimes\mathcal{K}$ is a unitary operator describing the common evolution of the system (Hilbert space $\mathcal{H}$) and the environment (Hilbert space $\mathcal{K}$), and $\rho_0 \in \mathcal{S}(\mathcal{K})$ is the initial state of the environment (cf. Fig. 3.2). Obviously, the quantum information originally stored in $\rho \in \mathcal{S}(\mathcal{H})$ cannot, in general, be completely recovered from $T^*(\rho)$ if only one system is available. It is an easy consequence of the Stinespring theorem that each channel can be expressed in this form:

Fig. 3.2. Noisy channel.

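Corollary 3.2 (Ancilla form). Assume that $T: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ is a channel. Then there is a Hilbert space $\mathcal{K}$, a pure state $\rho_0$ and a unitary map $U: \mathcal{H}\otimes\mathcal{K} \to \mathcal{H}\otimes\mathcal{K}$ such that Eq. (3.33) holds. It is always possible to choose $\mathcal{K}$ such that $\dim(\mathcal{K}) = \dim(\mathcal{H})^3$ holds.
Proof. Consider the Stinespring form $T(A) = V^*(A\otimes\mathbb{1})V$ of $T$, with $V: \mathcal{H} \to \mathcal{H}\otimes\mathcal{K}$. Choose a vector $\psi \in \mathcal{K}$ such that $U(\phi\otimes\psi) = V(\phi)$ can be extended to a unitary map $U: \mathcal{H}\otimes\mathcal{K} \to \mathcal{H}\otimes\mathcal{K}$ (this is always possible since $T$ is unital and $V$ is therefore isometric). If $e_j \in \mathcal{H}$, $j = 1, \ldots, d_1$ and $f_k \in \mathcal{K}$, $k = 1, \ldots, d_2$ are orthonormal bases with $f_1 = \psi$, we get
$$\mathrm{tr}[T(A)\rho] = \mathrm{tr}[\rho V^*(A\otimes\mathbb{1})V] = \sum_j\langle V\sqrt\rho\,e_j, (A\otimes\mathbb{1})V\sqrt\rho\,e_j\rangle \tag{3.34}$$
$$= \sum_{jk}\big\langle U(\sqrt\rho\otimes|\psi\rangle\langle\psi|)(e_j\otimes f_k), (A\otimes\mathbb{1})U(\sqrt\rho\otimes|\psi\rangle\langle\psi|)(e_j\otimes f_k)\big\rangle \tag{3.35}$$
$$= \mathrm{tr}\big[\mathrm{tr}_{\mathcal{K}}[U(\rho\otimes|\psi\rangle\langle\psi|)U^*]A\big], \tag{3.36}$$
which proves the statement.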
Note that there are, in general, many ways to express a channel in this way. For example, if $T$ is an ideal channel $\rho \mapsto T^*\rho = U\rho U^*$, we can rewrite it with an arbitrary unitary $U_0: \mathcal{K} \to \mathcal{K}$ as $T^*\rho = \mathrm{tr}_2\big((U\otimes U_0)(\rho\otimes\rho_0)(U^*\otimes U_0^*)\big)$. This is the weakness of the ancilla form compared to the Stinespring representation of Theorem 2.8. Nevertheless, Corollary 3.2 shows that each channel which is not an ideal channel is noisy in the described way.
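The most prominent example of a noisy channel is the depolarizing channel for $d$-level systems (i.e. $\mathcal{H} = \mathbb{C}^d$)
$$\mathcal{S}(\mathcal{H}) \ni \rho \mapsto \vartheta\rho + (1 - \vartheta)\frac{\mathbb{1}}{d} \in \mathcal{S}(\mathcal{H}), \qquad 0 \le \vartheta \le 1, \tag{3.37}$$
or in the Heisenberg picture
$$\mathcal{B}(\mathcal{H}) \ni A \mapsto \vartheta A + (1 - \vartheta)\frac{\mathrm{tr}(A)}{d}\mathbb{1} \in \mathcal{B}(\mathcal{H}). \tag{3.38}$$
A Stinespring dilation of $T$ is given by $\mathcal{K} = \mathcal{H}\otimes\mathcal{H}\oplus\mathbb{C}$ and $V: \mathcal{H} \to \mathcal{H}\otimes\mathcal{K} = \mathcal{H}^{\otimes3}\oplus\mathcal{H}$ with
$$|j\rangle \mapsto V|j\rangle = \left[\sqrt{\frac{1 - \vartheta}{d}}\sum_{k=1}^{d}|k\rangle\otimes|k\rangle\otimes|j\rangle\right] \oplus \big[\sqrt\vartheta\,|j\rangle\big]. \tag{3.39}$$
This dilation is not the minimal one, as can be checked by counting dimensions.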


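Here $|k\rangle$, $k = 1, \ldots, d$ again denotes the canonical basis in $\mathcal{H}$. An ancilla form of $T$ with the same $\mathcal{K}$ is given by the (pure) environment state
$$\psi = \left[\sqrt{\frac{1 - \vartheta}{d}}\sum_{k=1}^{d}|k\rangle\otimes|k\rangle\right] \oplus \big[\sqrt\vartheta\,|0\rangle\big] \in \mathcal{K} \tag{3.40}$$
and the unitary operator $U: \mathcal{H}\otimes\mathcal{K} \to \mathcal{H}\otimes\mathcal{K}$ with
$$U(\phi_1\otimes\phi_2\otimes\phi_3 \oplus \chi) = \phi_2\otimes\phi_3\otimes\phi_1 \oplus \chi, \tag{3.41}$$
i.e. $U$ is the direct sum of a permutation unitary and the identity.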
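The dilation (3.39) can be verified numerically. The sketch below stores $V$ as a $(d^3 + d)\times d$ matrix. The first $d^3$ rows index $\mathcal{H}\otimes\mathcal{H}\otimes\mathcal{H}$ with the system in the first factor, and the last $d$ rows index the summand $\mathcal{H}\otimes\mathbb{C}$; this layout is our own bookkeeping convention. The sketch checks that $V$ is an isometry and that $V^*(A\otimes\mathbb{1})V$ reproduces the Heisenberg-picture map (3.38) for a random $A$.

```python
import numpy as np

def depolarizing_dilation(d, theta):
    """Isometry V of Eq. (3.39) as a (d**3 + d) x d matrix (row layout described above)."""
    V = np.zeros((d**3 + d, d))
    for j in range(d):
        for k in range(d):
            V[k * d * d + k * d + j, j] = np.sqrt((1 - theta) / d)   # |k>|k>|j> component
        V[d**3 + j, j] = np.sqrt(theta)                              # H (x) C component
    return V

def heisenberg(V, A, d):
    # V^*(A (x) 1_K)V, with A acting on the system factor of both direct summands
    AK = np.zeros((d**3 + d, d**3 + d), dtype=complex)
    AK[:d**3, :d**3] = np.kron(A, np.eye(d * d))
    AK[d**3:, d**3:] = A
    return V.conj().T @ AK @ V

d, theta = 3, 0.4
V = depolarizing_dilation(d, theta)
rng = np.random.default_rng(1)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

print(np.allclose(V.T @ V, np.eye(d)))                                      # isometry
print(np.allclose(heisenberg(V, A, d),
                  theta * A + (1 - theta) * np.trace(A) / d * np.eye(d)))   # Eq. (3.38)
```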
3.2.2. Channels under symmetry
As in Section 3.1, it is often useful to consider channels with special symmetry properties. To be more precise, consider a group $G$ and two unitary representations $\pi_1, \pi_2$ on the Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. A channel $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ is called covariant (with respect to $\pi_1$ and $\pi_2$) if
$$T[\pi_1(U)A\pi_1(U)^*] = \pi_2(U)T[A]\pi_2(U)^* \qquad \forall A \in \mathcal{B}(\mathcal{H}_1),\ \forall U \in G \tag{3.42}$$
holds. The general structure of covariant channels is governed by a fairly powerful variant of Stinespring's theorem, which we will state below (and which will be very useful for the study of the cloning problem in Section 7). Before we do this, let us have a short look at a particular class of examples which is closely related to OO-invariant states.
Consider therefore a channel $T: \mathcal{B}(\mathcal{H}) \to \mathcal{B}(\mathcal{H})$ which is covariant with respect to the orthogonal group, i.e. $T(UAU^*) = UT(A)U^*$ for all unitaries $U$ on $\mathcal{H}$ with $\bar U = U$ in a distinguished basis $|j\rangle$, $j = 1, \ldots, d$. The maximally entangled state $\psi = d^{-1/2}\sum_j|jj\rangle$ is OO-invariant, i.e. $(U\otimes U)\psi = \psi$ for all these $U$. Therefore, each state $\rho = (\mathrm{Id}\otimes T^*)|\psi\rangle\langle\psi|$ is OO-invariant as well. By the duality lemma (Theorem 2.10), $T$ and $\rho$ determine each other uniquely (up to unitary equivalence). This means we can use the structure of OO-invariant states derived in Section 3.1.4 to characterize all orthogonally covariant channels. As a first step, consider the linear maps $X_1(A) = d\,\mathrm{tr}(A)\mathbb{1}$, $X_2(A) = dA^T$ and $X_3(A) = dA$. They are not channels (they are not unital and $X_2$ is not cp), but they have the correct covariance property. It is easy to see that they correspond to the operators $\mathbb{1}, F, \tilde F \in \mathcal{B}(\mathcal{H}\otimes\mathcal{H})$, i.e.
$$(\mathrm{Id}\otimes X_1)|\psi\rangle\langle\psi| = \mathbb{1}, \qquad (\mathrm{Id}\otimes X_2)|\psi\rangle\langle\psi| = F, \qquad (\mathrm{Id}\otimes X_3)|\psi\rangle\langle\psi| = \tilde F. \tag{3.43}$$

Using Eq. (3.21), we can therefore determine the channels which belong to the three extremal OO-invariant states (the corners of the upper triangle in Fig. 3.1):
$$T_0(A) = A, \qquad T_1(A) = \frac{\mathrm{tr}(A)\mathbb{1} - A^T}{d - 1}, \tag{3.44}$$
$$T_2(A) = \frac{2}{d(d+1) - 2}\left[\frac{d}{2}\big(\mathrm{tr}(A)\mathbb{1} + A^T\big) - A\right]. \tag{3.45}$$
Each OO-invariant channel is a convex linear combination of these three. Special cases are the channels corresponding to Werner and isotropic states. The latter lead to depolarizing channels $T(A) = \vartheta A + (1 - \vartheta)d^{-1}\mathrm{tr}(A)\mathbb{1}$ with $\vartheta \in [-1/(d^2-1), 1]$, i.e. $1 - \vartheta \in [0, d^2/(d^2-1)]$; cf. Eq. (3.15). Werner states correspond to
$$T(A) = \frac{\vartheta}{d+1}\big[\mathrm{tr}(A)\mathbb{1} + A^T\big] + \frac{1 - \vartheta}{d-1}\big[\mathrm{tr}(A)\mathbb{1} - A^T\big], \qquad \vartheta \in [0, 1]; \tag{3.46}$$
cf. Eq. (3.8).
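The correspondence between (3.44)–(3.45) and the projectors (3.21) can be checked numerically. The maps $T_0, T_1, T_2$ are self-dual with respect to the Hilbert–Schmidt inner product, so the sketch below applies them directly to $|\psi\rangle\langle\psi|$. It checks that each map is unital and that $(\mathrm{Id}\otimes T_j)|\psi\rangle\langle\psi| = p_j/\mathrm{tr}(p_j)$. The choice $d = 3$ and the helper names are ours.

```python
import numpy as np

d = 3

def choi(T):
    """(Id (x) T)|psi><psi| for psi = d^{-1/2} sum_j |jj>."""
    out = np.zeros((d * d, d * d))
    for j in range(d):
        for k in range(d):
            Ejk = np.zeros((d, d))
            Ejk[j, k] = 1.0
            out += np.kron(Ejk, T(Ejk)) / d
    return out

def T0(A):
    return A

def T1(A):
    return (np.trace(A) * np.eye(d) - A.T) / (d - 1)

def T2(A):
    return 2 / (d * (d + 1) - 2) * (d / 2 * (np.trace(A) * np.eye(d) + A.T) - A)

# Projectors of Eq. (3.21)
F = np.zeros((d * d, d * d))
Ft = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        F[j * d + i, i * d + j] = 1.0
        Ft[i * d + i, j * d + j] = 1.0
one = np.eye(d * d)
p = [Ft / d, (one - F) / 2, (one + F) / 2 - Ft / d]

for T, pj in zip((T0, T1, T2), p):
    print(np.allclose(T(np.eye(d)), np.eye(d)),        # unital
          np.allclose(choi(T), pj / np.trace(pj)))     # corresponds to p_j / tr(p_j)
```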
Let us now come back to the general case. We will state here the covariant version of the Stinespring theorem (see [98] for a proof). The basic idea is that all covariant channels are parameterized by representations on the dilation space.
Theorem 3.3. Let $G$ be a group with finite-dimensional unitary representations $\pi_j: G \to U(\mathcal{H}_j)$, and let $T: \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ be a $\pi_1, \pi_2$-covariant channel. Then there is a finite-dimensional unitary representation $\tilde\pi: G \to U(\mathcal{K})$ and an operator $V: \mathcal{H}_2 \to \mathcal{H}_1\otimes\mathcal{K}$ with $V\pi_2(U) = (\pi_1(U)\otimes\tilde\pi(U))V$ and $T(A) = V^*(A\otimes\mathbb{1})V$.
To get an explicit example, consider the dilation of a depolarizing channel given in Eq. (3.39). In this case we have $\pi_1(U) = \pi_2(U) = U$ and $\tilde\pi(U) = (\bar U\otimes U)\oplus\mathbb{1}$. The check that the map $V$ indeed has the intertwining property $V\pi_2(U) = (\pi_1(U)\otimes\tilde\pi(U))V$ stated in the theorem is left as an exercise to the reader.
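For readers who prefer a numerical spot check of this exercise, the sketch below tests $VU = (U\otimes\tilde\pi(U))V$ for a random unitary $U$, assuming $\tilde\pi(U) = (\bar U\otimes U)\oplus\mathbb{1}$ acts on the second and third tensor factors of $\mathcal{H}^{\otimes3}$. It uses the same row layout for $V$ as a $(d^3 + d)\times d$ matrix, with the system in the first factor. The random unitary comes from a QR decomposition; it is not Haar distributed, which does not matter for an identity that must hold for every unitary.

```python
import numpy as np

def depolarizing_dilation(d, theta):
    V = np.zeros((d**3 + d, d))
    for j in range(d):
        for k in range(d):
            V[k * d * d + k * d + j, j] = np.sqrt((1 - theta) / d)   # |k>|k>|j> component
        V[d**3 + j, j] = np.sqrt(theta)                              # H (x) C component
    return V

def random_unitary(d, rng):
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(Z)
    return Q

def dilated_rep(U):
    """pi_1(U) (x) tilde_pi(U) on H (x) K, assuming tilde_pi(U) = (conj(U) (x) U) + 1."""
    d = U.shape[0]
    out = np.zeros((d**3 + d, d**3 + d), dtype=complex)
    out[:d**3, :d**3] = np.kron(U, np.kron(U.conj(), U))
    out[d**3:, d**3:] = U            # pi_1(U) (x) 1 on the summand H (x) C
    return out

d, theta = 3, 0.25
V = depolarizing_dilation(d, theta)
U = random_unitary(d, np.random.default_rng(2))
print(np.allclose(V @ U, dilated_rep(U) @ V))
```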
3.2.3. Classical channels
The classical analog of a quantum operation is a channel $T: C(X) \to C(Y)$ which describes the transmission or manipulation of classical information. As mentioned in Section 2.3.1, positivity and complete positivity are equivalent in this case. Hence, we only have to assume that $T$ is positive and unital. Obviously, $T$ is characterized by its matrix elements $T_{xy} = \delta_y(T|x\rangle\langle x|)$. Here $\delta_y \in C^*(Y)$ denotes the Dirac measure at $y \in Y$, and the $|x\rangle\langle x| \in C(X)$, $x \in X$, form the canonical basis of $C(X)$ (cf. Section 2.1.3). Positivity and normalization of $T$ imply that $0 \le T_{xy} \le 1$ and
$$1 = \delta_y(\mathbb{1}) = \delta_y(T(\mathbb{1})) = \delta_y\Big(T\Big(\sum_x|x\rangle\langle x|\Big)\Big) = \sum_x T_{xy} \tag{3.47}$$
hold. Hence, the family $(T_{xy})_{x\in X}$ is a probability distribution on $X$, and $T_{xy}$ is therefore the probability to get the information $x \in X$ at the output side of the channel if $y \in Y$ was sent. Each classical channel is uniquely determined by its matrix of transition probabilities. For $X = Y$ we see that the information is transmitted without error iff $T_{xy} = \delta_{xy}$, i.e. $T$ is an ideal channel if $T = \mathrm{Id}$ holds and noisy otherwise.
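In matrix form, a classical channel is a column-stochastic matrix $T_{xy}$ acting on probability vectors over the inputs. The sketch below uses the binary symmetric channel with flip probability $\epsilon$ as an illustrative example of our own choosing (it is not discussed in the text). It checks the normalization (3.47) and computes the output distribution $p'(x) = \sum_y T_{xy}p(y)$.

```python
import numpy as np

def binary_symmetric_channel(eps):
    # T[x, y] = probability of output x given input y; columns sum to one, Eq. (3.47)
    return np.array([[1 - eps, eps],
                     [eps, 1 - eps]])

T = binary_symmetric_channel(0.1)
p_in = np.array([0.7, 0.3])        # distribution of the sent symbols y
p_out = T @ p_in                   # p_out[x] = sum_y T[x, y] p_in[y]
print(T.sum(axis=0), p_out)        # [1. 1.] [0.66 0.34]
```

Setting `eps = 0` gives the identity matrix, i.e. the ideal channel $T_{xy} = \delta_{xy}$.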
3.2.4. Observables and preparations
Let us now consider a channel which transforms quantum information $\mathcal{B}(\mathcal{H})$ into classical information $C(X)$. Since positivity and complete positivity are again equivalent, we just have to look at a positive and unital map $E: C(X) \to \mathcal{B}(\mathcal{H})$. With the canonical basis $|x\rangle\langle x|$, $x \in X$, of $C(X)$ we get a family $E_x = E(|x\rangle\langle x|)$, $x \in X$, of positive operators $E_x \in \mathcal{B}(\mathcal{H})$ with $\sum_{x\in X}E_x = \mathbb{1}$. Hence the $E_x$ form a POV measure, i.e. an observable. If, on the other hand, a POV measure $E_x \in \mathcal{B}(\mathcal{H})$, $x \in X$, is given, we can define a quantum to classical channel $E: C(X) \to \mathcal{B}(\mathcal{H})$ by $E(f) = \sum_x f(x)E_x$. This shows that the observable $E_x$, $x \in X$, and the channel $E$ can be identified, and we say that $E$ is the observable.
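As a concrete illustration of this identification, the sketch below uses the "trine" POVM on a qubit, $E_x = \frac23|v_x\rangle\langle v_x|$ with $v_x = (\cos(2\pi x/3), \sin(2\pi x/3))$. This example is our own choice and does not appear in the text. The sketch builds the channel $E(f) = \sum_x f(x)E_x$, checks unitality $E(\mathbb{1}) = \mathbb{1}$, and computes the outcome probabilities $\mathrm{tr}(E_x\rho)$ for $\rho = |0\rangle\langle0|$.

```python
import numpy as np

# Trine POVM on a qubit (illustrative choice): E_x = (2/3)|v_x><v_x|
angles = 2 * np.pi * np.arange(3) / 3
vs = [np.array([np.cos(a), np.sin(a)]) for a in angles]
E = [2 / 3 * np.outer(v, v) for v in vs]

def observable(f):
    # Quantum-to-classical channel E(f) = sum_x f(x) E_x
    return sum(f[x] * E[x] for x in range(3))

rho = np.array([[1.0, 0.0], [0.0, 0.0]])           # the state |0><0|
probs = [np.trace(Ex @ rho) for Ex in E]

print(np.allclose(observable(np.ones(3)), np.eye(2)))   # E(1) = 1
print(np.round(probs, 6))                               # [0.666667 0.166667 0.166667]
```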
With this interpretation in mind, we can take a short look at continuous observables without the need for abstract measure theory. We only have to define the classical algebra $C(X)$ for a set $X$ which is not finite or discrete. For simplicity, we assume that $X = \mathbb{R}$ holds; the generalization to other locally compact spaces is straightforward. We choose for $C(\mathbb{R})$ the space of continuous, complex-valued functions vanishing at infinity, i.e. for each $\epsilon > 0$ we have $|f(x)| < \epsilon$ provided $|x|$ is large enough. $C(\mathbb{R})$ can be equipped with the sup-norm and becomes an Abelian C*-algebra (cf. [25]). To interpret it as an operator algebra as assumed in Section 2.1.1, we have to identify $f \in C(\mathbb{R})$ with the corresponding multiplication operator on $L^2(\mathbb{R})$. An observable taking arbitrary real values can now be defined as a positive map $E: C(\mathbb{R}) \to \mathcal{B}(\mathcal{H})$. The probability to get a result in the interval $[a, b] \subset \mathbb{R}$ during an $E$ measurement on systems in the state $\rho$ is$^{11}$
$$\mu_\rho([a, b]) = \sup\{\mathrm{tr}(E(f)\rho) \mid f \in C(\mathbb{R}),\ 0 \le f \le \mathbb{1},\ \mathrm{supp}\,f \subset [a, b]\}, \tag{3.48}$$
where $\mathrm{supp}\,f$ denotes the support of $f$. The most well-known examples of $\mathbb{R}$-valued observables are of course position $Q$ and momentum $P$ of a free particle in one dimension. In this case we have $\mathcal{H} = L^2(\mathbb{R})$. The channels corresponding to $Q$ and $P$ are (in position representation) given by $C(\mathbb{R}) \ni f \mapsto E_Q(f) \in \mathcal{B}(\mathcal{H})$ with $E_Q(f)\psi = f\psi$ and by $C(\mathbb{R}) \ni f \mapsto E_P(f) \in \mathcal{B}(\mathcal{H})$ with $E_P(f)\psi = (f\hat\psi)^\vee$, respectively, where $\wedge$ and $\vee$ denote the Fourier transform and its inverse.

$^{11}$ Due to the Riesz–Markov theorem (cf. [134, Theorem IV.18]) the set function $\mu_\rho$ extends in a unique way to a probability measure on the real line.

Let us now return to a finite set $X$ and exchange the roles of $C(X)$ and $\mathcal{B}(\mathcal{H})$. In other words, let us consider a channel $R: \mathcal{B}(\mathcal{H}) \to C(X)$ with a classical input and a quantum output algebra. In the Schrödinger picture we get a family of density matrices $\rho_x := R^*(\delta_x) \in \mathcal{B}^*(\mathcal{H})$, $x \in X$, where the $\delta_x \in C^*(X)$ again denote the Dirac measures (cf. Section 2.1.3). Hence, we get a parameter-dependent preparation which can be used to encode the classical information $x \in X$ into the quantum information $\rho_x \in \mathcal{B}^*(\mathcal{H})$.
3.2.5. Instruments and parameter-dependent operations
An observable describes only the statistics of measuring results; it does not contain information about the state of the system after the measurement. To get a description which fills this gap, we have to consider channels which operate on quantum systems and produce hybrid systems as output, i.e. $T: \mathcal{B}(\mathcal{H})\otimes C(X) \to \mathcal{B}(\mathcal{K})$. Following Davies [50], we will call such an object an instrument. From $T$ we can derive the subchannel
$$C(X) \ni f \mapsto T(\mathbb{1}\otimes f) \in \mathcal{B}(\mathcal{K}), \tag{3.49}$$
which is the observable measured by $T$, i.e. $\mathrm{tr}[T(\mathbb{1}\otimes|x\rangle\langle x|)\rho]$ is the probability to measure $x \in X$ on systems in the state $\rho$. On the other hand, we get for each $x \in X$ a quantum channel (which is not unital)
$$\mathcal{B}(\mathcal{H}) \ni A \mapsto T_x(A) = T(A\otimes|x\rangle\langle x|) \in \mathcal{B}(\mathcal{K}). \tag{3.50}$$
It describes the operation performed by the instrument $T$ if $x \in X$ was measured. More precisely, if a measurement on systems in the state $\rho$ gives the result $x \in X$, we get (up to normalization) the state $T_x^*(\rho)$ after the measurement (cf. Fig. 3.3), while
$$\mathrm{tr}(T_x^*(\rho)) = \mathrm{tr}(T_x^*(\rho)\mathbb{1}) = \mathrm{tr}(\rho T(\mathbb{1}\otimes|x\rangle\langle x|)) \tag{3.51}$$
is (again) the probability to measure $x \in X$ on $\rho$. The instrument $T$ can be expressed in terms of the operations $T_x$ by
$$T(A\otimes f) = \sum_x f(x)T_x(A); \tag{3.52}$$
hence, we can identify $T$ with the family $T_x$, $x \in X$. Finally, we can consider the second marginal of $T$,
$$\mathcal{B}(\mathcal{H}) \ni A \mapsto T(A\otimes\mathbb{1}) = \sum_{x\in X}T_x(A) \in \mathcal{B}(\mathcal{K}). \tag{3.53}$$
It describes the operation we get if the outcome of the measurement is ignored.

Fig. 3.3. Instrument.
Fig. 3.4. Parameter-dependent operation.
The most well-known example of an instrument is a von Neumann–Lüders measurement associated with a PV measure given by a family of projections $E_x$, $x = 1, \ldots, d$, e.g. the eigenprojections of a self-adjoint operator $A \in \mathcal{B}(\mathcal{H})$. It is defined as the channel
$$T: \mathcal{B}(\mathcal{H})\otimes C(X) \to \mathcal{B}(\mathcal{H}) \quad\text{with}\quad X = \{1, \ldots, d\} \quad\text{and}\quad T_x(A) = E_xAE_x. \tag{3.54}$$
Hence, we get the final state $\mathrm{tr}(E_x\rho)^{-1}E_x\rho E_x$ if we measure the value $x \in X$ on systems initially in the state $\rho$; this is well known from quantum mechanics.
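The Lüders instrument makes the abstract quantities (3.49)–(3.53) concrete. The sketch below uses the eigenprojections of $\sigma_z$ on a qubit and the input state $|+\rangle\langle+|$, both illustrative choices of ours. It checks that the measured observable is normalized, computes outcome probabilities and normalized post-measurement states, and forms the non-selective operation (3.53) in which the outcome is ignored. Since $E_xAE_x$ is self-dual, the same formula serves in the Schrödinger picture.

```python
import numpy as np

# Luders instrument for the eigenprojections of sigma_z (illustrative choice)
E = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

def T_x(A, x):
    # Heisenberg-picture operation of Eq. (3.54): T_x(A) = E_x A E_x
    return E[x] @ A @ E[x]

def measure(rho):
    """Outcome probabilities and normalized post-measurement states."""
    out = []
    for x in range(len(E)):
        unnormalized = E[x] @ rho @ E[x]           # T_x^*(rho); the map is self-dual
        p = float(np.real(np.trace(unnormalized)))
        out.append((p, unnormalized / p if p > 0 else None))
    return out

plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(plus, plus)
results = measure(rho)
nonselective = sum(E[x] @ rho @ E[x] for x in range(2))   # Eq. (3.53), outcome ignored

print(np.allclose(sum(T_x(np.eye(2), x) for x in range(2)), np.eye(2)))
print([round(p, 6) for p, _ in results])                   # [0.5, 0.5]
print(nonselective)                                        # diag(0.5, 0.5)
```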
Let us now exchange the roles of $\mathcal{B}(\mathcal{H})\otimes C(X)$ and $\mathcal{B}(\mathcal{K})$. In other words, consider a channel $T: \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})\otimes C(X)$ with hybrid input and quantum output. It describes a device which changes the state of a system depending on additional classical information. As for an instrument, $T$ decomposes into a family of (unital!) channels $T_x: \mathcal{B}(\mathcal{K}) \to \mathcal{B}(\mathcal{H})$ such that we get $T^*(\rho\otimes p) = \sum_x p_xT_x^*(\rho)$ in the Schrödinger picture. Physically, $T$ describes a parameter-dependent operation: depending on the classical information $x \in X$, the quantum information $\rho \in \mathcal{B}^*(\mathcal{H})$ is transformed by the operation $T_x$ (cf. Fig. 3.4).
Finally, we can consider a channel $T: \mathcal{B}(\mathcal{H})\otimes C(X) \to \mathcal{B}(\mathcal{K})\otimes C(Y)$ with hybrid input and output to get a parameter-dependent instrument (cf. Fig. 3.5). As in the last paragraph, we can define a family of instruments $T_y: \mathcal{B}(\mathcal{H})\otimes C(X) \to \mathcal{B}(\mathcal{K})$, $y \in Y$, by the equation $T^*(\rho\otimes p) = \sum_y p_yT_y^*(\rho)$. Physically, $T$ describes the following device. It receives as input the classical information $y \in Y$ and a quantum system in the state $\rho \in \mathcal{B}^*(\mathcal{K})$. Depending on $y$, a measurement with the instrument $T_y$ is performed. This measurement produces the measuring value $x \in X$ and leaves the quantum system in the state (up to normalization) $T_{y,x}^*(\rho)$, with $T_{y,x}$ given as in Eq. (3.50) by $T_{y,x}(A) = T_y(A\otimes|x\rangle\langle x|)$.

Fig. 3.5. Parameter-dependent instrument.
Fig. 3.6. One-way LOCC operation; cf. Fig. 3.7 for an explanation.
3.2.6. LOCC and separable channels
Let us now consider channels acting on finite-dimensional bipartite systems: $T: \mathcal{B}(\mathcal{H}_1\otimes\mathcal{H}_2) \to \mathcal{B}(\mathcal{K}_1\otimes\mathcal{K}_2)$. In this case we can ask whether a channel preserves separability. Simple examples are local operations (LOs), i.e. $T = T^A\otimes T^B$ with two channels $T^{A,B}: \mathcal{B}(\mathcal{H}_j) \to \mathcal{B}(\mathcal{K}_j)$. Physically, we think of such a $T$ in terms of two physicists, Alice and Bob, who both perform operations on their own particles without any information transmission, classical or quantum. The next step up in difficulty is LOs with one-way classical communication (one-way LOCC). This means that Alice operates on her system with an instrument and communicates the classical measuring result $j \in X = \{1, \ldots, N\}$ to Bob, who selects an operation depending on these data. We can write such a channel as a composition $T = (T^A\otimes\mathrm{Id})(\mathrm{Id}\otimes T^B)$ of the instrument $T^A: \mathcal{B}(\mathcal{H}_1)\otimes C(X) \to \mathcal{B}(\mathcal{K}_1)$ and the parameter-dependent operation $T^B: \mathcal{B}(\mathcal{H}_2) \to C(X)\otimes\mathcal{B}(\mathcal{K}_2)$ (cf. Fig. 3.6):
$$\mathcal{B}(\mathcal{H}_1\otimes\mathcal{H}_2) \xrightarrow{\ \mathrm{Id}\otimes T^B\ } \mathcal{B}(\mathcal{H}_1)\otimes C(X)\otimes\mathcal{B}(\mathcal{K}_2) \xrightarrow{\ T^A\otimes\mathrm{Id}\ } \mathcal{B}(\mathcal{K}_1\otimes\mathcal{K}_2). \tag{3.55}$$
It is of course possible to continue the chain in Eq. (3.55). Instead of just operating on his system, Bob can invoke a parameter-dependent instrument depending on Alice's data $j_1 \in X_1$, send the corresponding measuring results $j_2 \in X_2$ to Alice, and so on. Writing down the corresponding chain of maps (as in Eq. (3.55)) is simple but not very illuminating and is therefore omitted; cf. Fig. 3.7 instead. If we allow Alice and Bob to drop some of their particles, i.e. the operations they perform need not be unital, we get an LOCC channel ("local operations and classical communication"). It represents the most general physical process which can be performed on a bipartite system if only classical communication (in both directions) is available.
LOCC channels play a significant role in entanglement theory (we will see this in Section 4.3), but they are difficult to handle. Fortunately, it is often possible to replace them by closely related operations with a simpler structure. A not necessarily unital channel $T: \mathcal{B}(\mathcal{H}_1\otimes\mathcal{H}_2) \to \mathcal{B}(\mathcal{K}_1\otimes\mathcal{K}_2)$ is called separable if it is a sum of (in general non-unital) local operations, i.e.
$$T = \sum_{j=1}^{N}T_j^A\otimes T_j^B. \tag{3.56}$$
It is easy to see that a separable $T$ maps separable states to separable states (up to normalization) and that each LOCC channel is separable (cf. [13]). Somewhat surprisingly, however, the converse is not true: there are separable channels which are not LOCC; see [13] for a concrete example.

Fig. 3.7. LOCC operation. The upper and lower curly arrows represent Alice's and Bob's quantum systems, respectively, while the straight arrows in the middle stand for the classical information Alice and Bob exchange. The boxes symbolize the channels applied by Alice and Bob.
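A simple one-way LOCC protocol shows how the separable form (3.56) arises. In the illustrative protocol below (our own choice), Alice measures her qubit in the computational basis and sends the result $x$ to Bob, who then applies the Pauli operator $X^x$. In the Schrödinger picture the channel has the Kraus operators $K_x = |x\rangle\langle x|\otimes X^x$, each of which is a product operator, so the channel is a sum of local operations. The sketch checks trace preservation and shows that the protocol maps a Bell state to the product state $\frac12\mathbb{1}\otimes|0\rangle\langle0|$.

```python
import numpy as np

# Kraus operators K_x = |x><x| (x) X^x of the one-way protocol described above
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
X = np.array([[0.0, 1.0], [1.0, 0.0]])
kraus = [np.kron(P[0], np.eye(2)), np.kron(P[1], X)]

def apply(rho):
    # Schrodinger picture: rho -> sum_x K_x rho K_x^*
    return sum(K @ rho @ K.conj().T for K in kraus)

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
out = apply(np.outer(bell, bell))

print(np.allclose(sum(K.conj().T @ K for K in kraus), np.eye(4)))    # trace preserving
print(np.allclose(out, np.kron(np.eye(2) / 2, np.diag([1.0, 0.0]))))  # product output
```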
3.3. Quantum mechanics in phase space
Up to now we have considered only finite-dimensional systems, and even in this extremely idealized situation it is not easy to get non-trivial results. At first sight, the discussion of continuous quantum systems therefore seems hopeless. If, however, we restrict our attention to small classes of states and channels with sufficiently simple structure, many problems become tractable. Phase space quantum mechanics, which will be reviewed in this section (see [79, Chapter 5] for details), provides a very powerful tool in this context.
Before we start, let us add some remarks to the discussion of Section 2, which we restricted to finite-dimensional Hilbert spaces. Most of the material considered there can be generalized in a straightforward way, as long as topological issues like continuity and convergence arguments are treated carefully enough. There are of course some caveats (cf. in particular footnote 4 of Section 2); however, they do not lead to problems in the framework we are going to discuss and can therefore be ignored.
3.3.1. Weyl operators and the CCR
The kinematical structure of a quantum system with $d$ degrees of freedom is usually described by a separable Hilbert space $\mathcal{H}$ and $2d$ self-adjoint operators $Q_1, \ldots, Q_d, P_1, \ldots, P_d$ satisfying the canonical commutation relations $[Q_j, Q_k] = 0$, $[P_j, P_k] = 0$, $[Q_j, P_k] = i\delta_{jk}\mathbb{1}$. The latter can be rewritten in a more compact form as
$$R_{2j-1} = Q_j, \quad R_{2j} = P_j, \quad j = 1, \ldots, d, \qquad [R_j, R_k] = -i\sigma_{jk}. \tag{3.57}$$
Here $\sigma$ denotes the symplectic matrix
$$\sigma = \mathrm{diag}(J, \ldots, J), \qquad J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \tag{3.58}$$
which plays a crucial role in the geometry of classical mechanics. Henceforth we will call the pair $(V, \sigma)$, consisting of $\sigma$ and the $2d$-dimensional real vector space $V = \mathbb{R}^{2d}$, the classical phase space.
The relations in Eq. (3.57) are, however, not sufficient to fix the operators $R_j$ up to unitary equivalence. The best way to remove the remaining physical ambiguities is to study the unitaries
$$W(x) = \exp(i\,x\cdot\sigma\cdot R), \quad x \in V, \qquad x\cdot\sigma\cdot R = \sum_{j,k=1}^{2d}x_j\sigma_{jk}R_k, \tag{3.59}$$
instead of the $R_j$ directly. Suppose the family $W(x)$, $x \in V$, is irreducible (i.e. $[W(x), A] = 0$ $\forall x \in V$ implies $A = \lambda\mathbb{1}$ with $\lambda \in \mathbb{C}$) and satisfies$^{12}$
$$W(x)W(x') = \exp\Big(-\frac{i}{2}\,x\cdot\sigma\cdot x'\Big)W(x + x'). \tag{3.60}$$
Then it is called an (irreducible) representation of the Weyl relations (on $(V, \sigma)$), and the operators $W(x)$ are called Weyl operators. By the well-known Stone–von Neumann uniqueness theorem, all these representations are mutually unitarily equivalent. That is, for any two of them, $W_1(x)$ and $W_2(x)$, there is a unitary operator $U$ with $UW_1(x)U^* = W_2(x)$ $\forall x \in V$. Hence it does not matter from a physical point of view which representation we use. The most well-known one is of course the Schrödinger representation, where $\mathcal{H} = L^2(\mathbb{R}^d)$ and $Q_j$, $P_k$ are the usual position and momentum operators.
3.3.2. Gaussian states
A density operator $\rho \in \mathcal{S}(\mathcal{H})$ has finite second moments if the expectation values $\mathrm{tr}(\rho Q_j^2)$ and $\mathrm{tr}(\rho P_j^2)$ are finite for all $j = 1, \ldots, d$. In this case we can define the mean $m \in \mathbb{R}^{2d}$ and the correlation matrix $\alpha$ by
$$m_j = \mathrm{tr}(\rho R_j), \qquad \alpha_{jk} + i\sigma_{jk} = 2\,\mathrm{tr}[(R_j - m_j)\rho(R_k - m_k)]. \tag{3.61}$$
The mean $m$ can be arbitrary, but the correlation matrix $\alpha$ must be real and symmetric, and the positivity condition
$$\alpha + i\sigma \ge 0 \tag{3.62}$$
must hold (this is an easy consequence of the canonical commutation relations (3.57)).
Our aim is now to single out exactly one state among all others with the same mean and correlation matrix. This is the point where the Weyl operators come into play. Each state $\rho \in \mathcal{S}(\mathcal{H})$ can be characterized uniquely by its quantum characteristic function $V \ni x \mapsto \mathrm{tr}[W(x)\rho] \in \mathbb{C}$. This function should be regarded as the quantum Fourier transform of $\rho$, and it is in fact the Fourier transform of the Wigner function of $\rho$ [164]. We call $\rho$ Gaussian if
$$\mathrm{tr}[W(x)\rho] = \exp\Big(i\,m\cdot x - \frac14\,x\cdot\alpha\cdot x\Big) \tag{3.63}$$
holds. By differentiation it is easy to check that $\rho$ indeed has mean $m$ and covariance matrix $\alpha$.

$^{12}$ Note that the CCR (3.57) are implied by the Weyl relations (3.60), but, contrary to popular belief, the converse is not true. There are representations of the CCR which are unitarily inequivalent to the Schrödinger representation; cf. [134, Section VIII.5] for particular examples. Hence, uniqueness can only be achieved on the level of Weyl operators, which is one major reason to study them.
The most prominent examples of Gaussian states are the ground state $\rho_0$ of a system of $d$ harmonic oscillators (where the mean is 0 and $\alpha$ is given by the corresponding classical Hamiltonian) and its phase space translates $\rho_m = W(m)\rho_0W(-m)$ (with mean $m$ and the same $\alpha$ as $\rho_0$), which are known from quantum optics as coherent states. $\rho_0$ and $\rho_m$ are pure states, and it can be shown that a Gaussian state is pure iff $(\sigma^{-1}\alpha)^2 = -\mathbb{1}$ holds (see [79, Chapter 5]). Examples of mixed Gaussians are temperature states of harmonic oscillators. In one degree of freedom such a state is
$$\rho_N = \frac{1}{N+1}\sum_{n=0}^{\infty}\Big(\frac{N}{N+1}\Big)^n|n\rangle\langle n|, \tag{3.64}$$
where $|n\rangle$ denotes the number basis and $N$ is the mean photon number. The characteristic function of $\rho_N$ is
$$\mathrm{tr}[W(x)\rho_N] = \exp\Big[-\frac12\Big(N + \frac12\Big)|x|^2\Big], \tag{3.65}$$
and its correlation matrix is simply $\alpha = 2(N + 1/2)\mathbb{1}$.
3.3.3. Entangled Gaussians
Let us now consider bipartite systems. Then the phase space $(V, \sigma)$ decomposes into a direct sum $V = V_A\oplus V_B$ (where A stands for "Alice" and B for "Bob"), and the symplectic matrix $\sigma = \sigma_A\oplus\sigma_B$ is block diagonal with respect to this decomposition. Let $W_A(x)$ and $W_B(y)$ denote Weyl operators acting on the Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$ and corresponding to the phase spaces $V_A$ and $V_B$, respectively. It is easy to see that the tensor product $W_A(x)\otimes W_B(y)$ satisfies the Weyl relations with respect to $(V, \sigma)$. Hence, by the Stone–von Neumann uniqueness theorem, we can identify $W(x\oplus y)$, $x\oplus y \in V_A\oplus V_B = V$, with $W_A(x)\otimes W_B(y)$. This immediately shows that a state on $\mathcal{H} = \mathcal{H}_A\otimes\mathcal{H}_B$ is a product state iff its characteristic function factorizes. Separability$^{13}$ is characterized as follows (we omit the proof; see [170] instead).
Theorem 3.4. A Gaussian state with covariance matrix $\alpha$ is separable iff there are covariance matrices $\alpha_A$, $\alpha_B$ such that
$$\alpha \ge \begin{pmatrix} \alpha_A & 0 \\ 0 & \alpha_B \end{pmatrix} \tag{3.66}$$
holds.
This theorem is somewhat similar to Theorem 2.1: it provides a useful criterion for abstract considerations, but not for explicit calculations. In contrast to finite-dimensional systems, however, separability of Gaussian states can be decided by an operational criterion in terms of nonlinear maps between matrices [65]. To state it, we first have to introduce some terminology.

$^{13}$ In infinite dimensions we have to define separable states (in a slight generalization of Definition 2.5) as trace-norm convergent convex sums of product states.
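The key tool is a sequence of $(2n + 2m)\times(2n + 2m)$ matrices $\gamma_N$, $N \in \mathbb{N}$, where $n$ and $m$ denote the numbers of Alice's and Bob's degrees of freedom. In block matrix notation they are written as
$$\gamma_N = \begin{pmatrix} A_N & C_N \\ C_N^T & B_N \end{pmatrix}. \tag{3.67}$$
Given $\gamma_0$, the other $\gamma_N$ are recursively defined by
$$A_{N+1} = B_{N+1} = A_N - \mathrm{Re}(X_N) \quad\text{and}\quad C_{N+1} = -\mathrm{Im}(X_N) \tag{3.68}$$
if $\gamma_N - i\sigma \ge 0$, and by $\gamma_{N+1} = 0$ otherwise. Here we have set $X_N = C_N(B_N - i\sigma_B)^{-1}C_N^T$, where the inverse denotes the pseudoinverse$^{14}$ if $B_N - i\sigma_B$ is not invertible. Now we can state the following theorem (see [65] for a proof).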

Theorem 3.5. Consider a Gaussian state ρ of a bipartite system with correlation matrix α and the sequence $\alpha_N$, $N \in \mathbb{N}$, just defined.
1. If for some $N \in \mathbb{N}$ we have $A_N - i\sigma_A \not\geq 0$, then ρ is not separable.
2. If, on the other hand, there is an $N \in \mathbb{N}$ such that $A_N - \|C_N\|\,\mathbb{1} - i\sigma_A \geq 0$, then the state ρ is separable ($\|C_N\|$ denotes the operator norm of $C_N$).

To check whether a Gaussian state is separable or not we have to iterate through the sequence $\alpha_N$ until either condition 1 or 2 holds. In the first case we know that ρ is entangled, in the second that it is separable. Hence, the only remaining question is whether the whole procedure terminates after a finite number of iterations. This problem is treated in [65], and it turns out that the set of α for which separability is decidable after a finite number of steps is the complement of a measure-zero set (in the set of all separable states). Numerical calculations indicate in addition that the method usually converges very fast (typically in fewer than five iterations).
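The iteration (3.68) together with the two stopping rules of Theorem 3.5 is easy to run numerically. The following Python/NumPy function is a minimal sketch, not the implementation of [65]. The name `gklc_test`, the ordering $(q_1, p_1, q_2, p_2, \dots)$ of the phase-space coordinates, the explicit block form of σ, the tolerance and the iteration cap are our assumptions. It tests $\alpha_N - i\sigma \geq 0$ as in (3.68); for real α this is equivalent to $\alpha_N + i\sigma \geq 0$, because the two matrices are complex conjugates of each other and therefore have the same spectrum. `np.linalg.pinv` covers the non-invertible case of footnote 14.

```python
import numpy as np


def symplectic(n_modes):
    """Block-diagonal sigma for n modes, ordering (q1, p1, q2, p2, ...) (assumed convention)."""
    return np.kron(np.eye(n_modes), np.array([[0.0, 1.0], [-1.0, 0.0]]))


def is_psd(m, tol=1e-9):
    """True if the Hermitian matrix m has no eigenvalue below -tol."""
    return np.linalg.eigvalsh(m).min() >= -tol


def gklc_test(alpha, n_a, max_iter=50, tol=1e-9):
    """Iterate Eq. (3.68) until condition 1 or 2 of Theorem 3.5 applies."""
    k = 2 * n_a
    a, b, c = alpha[:k, :k], alpha[k:, k:], alpha[:k, k:]
    for _ in range(max_iter):
        sig_a = symplectic(a.shape[0] // 2)
        sig_b = symplectic(b.shape[0] // 2)
        if not is_psd(a - 1j * sig_a, tol):                    # condition 1
            return "entangled"
        c_norm = np.linalg.norm(c, 2)                          # operator norm ||C_N||
        if is_psd(a - c_norm * np.eye(k) - 1j * sig_a, tol):   # condition 2
            return "separable"
        full = np.block([[a, c], [c.T, b]])
        sig = np.block([[sig_a, np.zeros(c.shape)], [np.zeros(c.T.shape), sig_b]])
        if not is_psd(full - 1j * sig, tol):
            return "entangled"        # alpha_{N+1} = 0, so condition 1 fires next
        x = c @ np.linalg.pinv(b - 1j * sig_b) @ c.T           # X_N
        a = b = a - x.real                                     # A_{N+1} = B_{N+1}
        c = -x.imag                                            # C_{N+1}
    return "undecided"


# two-mode squeezed vacuum with squeezing 2r = 1 (vacuum has alpha = identity here)
z = np.diag([1.0, -1.0])
ch, sh = np.cosh(1.0), np.sinh(1.0)
tmsv = np.block([[ch * np.eye(2), sh * z], [sh * z, ch * np.eye(2)]])
print(gklc_test(tmsv, 1))   # entangled
```

For this two-mode squeezed state neither condition holds at N = 0, and condition 1 fires at N = 1, in line with the fast convergence reported in [65].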
To consider ppt states we first have to characterize the transpose for infinite-dimensional systems. There are different ways to do that. We will use the fact that the adjoint of a matrix can be regarded as transposition followed by componentwise complex conjugation. Hence, we define for any (possibly unbounded) operator $A^T = CA^*C$, where $C: H \to H$ denotes complex conjugation of the wave function in position representation. This implies $Q_j^T = Q_j$ for position and $P_j^T = -P_j$ for momentum operators. If we insert the partial transpose of a bipartite state ρ into Eq. (3.61) we see that the correlation matrix $\tilde\alpha_{jk}$ of $\rho^T$ picks up a minus sign whenever one of the indices belongs to one of Alice’s momentum operators. To be a state, $\rho^T$ should satisfy $\tilde\alpha + i\sigma \geq 0$, but this is equivalent to $\alpha + i\tilde\sigma \geq 0$, where in $\tilde\sigma$ the corresponding components are reversed, i.e. $\tilde\sigma = (-\sigma_A) \oplus \sigma_B$. Hence we have shown

¹⁴ $A^{-1}$ is the pseudoinverse of a matrix A if $AA^{-1} = A^{-1}A$ is the projector onto the range of A. If A is invertible, $A^{-1}$ is the usual inverse.
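Proposition 3.6. A Gaussian state is ppt iff its correlation matrix α satisfies
$$\alpha + i\tilde\sigma \geq 0 \quad\text{with}\quad \tilde\sigma = \begin{pmatrix} -\sigma_A & 0 \\ 0 & \sigma_B \end{pmatrix}. \tag{3.69}$$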

The interesting question is now whether the ppt criterion is (for a given number of degrees of freedom) equivalent to separability or not. The following theorem, which was proved in [144] for 1 × 1 systems and in [170] for the 1 × d case, gives a complete answer.
Theorem 3.7. A Gaussian state of a quantum system with 1 × d degrees of freedom (i.e. $\dim V_A = 2$ and $\dim V_B = 2d$) is separable iff it is ppt; in other words iff the condition of Proposition 3.6 holds.
For other kinds of systems the ppt criterion may fail, which means that there are entangled Gaussian states which are ppt. A systematic way to construct such states can be found in [170]. Roughly speaking, it is based on the idea of going to the boundary of the set of ppt covariance matrices, i.e. α has to satisfy Eqs. (3.62) and (3.69), and it has to be a minimal matrix with this property. Using this method, explicit examples of ppt entangled Gaussians are constructed for 2 × 2 degrees of freedom (cf. [170] for details).
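Proposition 3.6 turns the ppt property into a single eigenvalue test, and by Theorem 3.7 this test decides separability for 1 × 1 and 1 × d systems. The sketch below (Python/NumPy) assumes the coordinate ordering $(q_1, p_1, q_2, p_2, \dots)$ with σ built from 2 × 2 blocks $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$; the function names are ours. For larger systems a `True` result only certifies the ppt property, not separability.

```python
import numpy as np


def symplectic(n_modes):
    """sigma for n modes, ordering (q1, p1, q2, p2, ...) (assumed convention)."""
    return np.kron(np.eye(n_modes), np.array([[0.0, 1.0], [-1.0, 0.0]]))


def is_ppt(alpha, n_a, tol=1e-9):
    """Condition (3.69): alpha + i*sigma_tilde >= 0 with sigma_tilde = (-sigma_A) ⊕ sigma_B."""
    n_b = alpha.shape[0] // 2 - n_a
    sig_tilde = np.zeros_like(alpha, dtype=float)
    sig_tilde[:2 * n_a, :2 * n_a] = -symplectic(n_a)
    sig_tilde[2 * n_a:, 2 * n_a:] = symplectic(n_b)
    return np.linalg.eigvalsh(alpha + 1j * sig_tilde).min() >= -tol


z = np.diag([1.0, -1.0])
ch, sh = np.cosh(1.0), np.sinh(1.0)   # two-mode squeezed vacuum, 2r = 1
tmsv = np.block([[ch * np.eye(2), sh * z], [sh * z, ch * np.eye(2)]])
print(is_ppt(tmsv, 1))       # False: npt, hence entangled
print(is_ppt(np.eye(4), 1))  # True: the two-mode vacuum is a product state
```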
3.3.4. Gaussian channels
Finally, we want to give a short review of a special class of channels for infinite-dimensional quantum systems (cf. [84] for details). To explain the basic idea, first note that each finite set of Weyl operators ($W(x_j)$, $j = 1, \dots, N$, $x_j \neq x_k$ for $j \neq k$) is linearly independent. This can be checked easily using expectation values of $\sum_j \lambda_j W(x_j)$ in Gaussian states. Hence, linear maps on the space of finite linear combinations of Weyl operators can be defined by $T[W(x)] = f(x)W(Ax)$, where f is a complex-valued function on V and A is a $2d \times 2d$ matrix. If we choose A and f carefully enough, such that some continuity properties match, T can be extended in a unique way to a linear map on B(H), which is, however, in general not completely positive.
This means we have to consider special choices for A and f. The simplest case arises if $f \equiv 1$ and A is a symplectic isomorphism, i.e. $A^T\sigma A = \sigma$. If this holds, the map $V \ni x \mapsto W(Ax)$ is a representation of the Weyl relations and therefore unitarily equivalent to the representation we have started with. In other words, there is a unitary operator U with $T[W(x)] = W(Ax) = UW(x)U^*$, i.e. T is unitarily implemented, hence completely positive and, in fact, well known as a Bogolubov transformation.
If A does not preserve the symplectic matrix, $f \equiv 1$ is no option. Instead, we have to choose f such that the matrices
$$M_{jk} = f(x_j - x_k)\exp\Bigl(-\frac{i}{2}\, x_j \cdot \sigma x_k + \frac{i}{2}\, Ax_j \cdot \sigma Ax_k\Bigr) \tag{3.70}$$
are positive for all finite families $x_1, \dots, x_n \in V$. Complete positivity of the corresponding T is then a standard result of abstract C*-algebra theory (cf. [51]). If the factor f is in addition a Gaussian, i.e. $f(x) = \exp(-\frac{1}{2}\, x \cdot \beta x)$ for a positive definite matrix β, the cp-map T is called a Gaussian channel.
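Positivity of the matrices (3.70) can be spot-checked numerically on finitely many phase-space points. The sketch below (Python/NumPy, one degree of freedom, σ = [[0, 1], [−1, 0]] assumed) uses the non-symplectic map $A = k\mathbb{1}$ with $k = 1/2$ and compares $f \equiv 1$ with the Gaussian $f(x) = \exp(-(1-k^2)\, x \cdot x/4)$, which is what Eq. (3.74) gives for a vacuum environment. The constant `K_ATT` and all function names are our own. A finite check of this kind can refute positivity, but it can never prove it for all families.

```python
import numpy as np

SIGMA = np.array([[0.0, 1.0], [-1.0, 0.0]])   # one degree of freedom (assumed convention)
K_ATT = 0.5                                    # attenuation parameter k (illustrative)
A = K_ATT * np.eye(2)                          # Ax = kx, not symplectic for k != 1


def f_one(x):
    return 1.0


def f_gauss(x):
    """Gaussian factor of Eq. (3.74) with N_c = 0 and k < 1."""
    return np.exp(-0.25 * (1 - K_ATT**2) * (x @ x))


def m_matrix(points, a, f):
    """The matrix M_jk of Eq. (3.70) for the phase-space points given as rows of `points`."""
    n = len(points)
    m = np.empty((n, n), dtype=complex)
    for j in range(n):
        for l in range(n):
            xj, xl = points[j], points[l]
            phase = -0.5j * (xj @ SIGMA @ xl) + 0.5j * ((a @ xj) @ SIGMA @ (a @ xl))
            m[j, l] = f(xj - xl) * np.exp(phase)
    return m


def min_eigenvalue(m):
    return np.linalg.eigvalsh(m).min()


pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
print(min_eigenvalue(m_matrix(pts, A, f_one)))    # negative: f ≡ 1 is not allowed
print(min_eigenvalue(m_matrix(pts, A, f_gauss)))  # nonnegative up to rounding
```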



A simple way to construct a Gaussian channel is in terms of an ancilla representation. More precisely, if $A: V \to V$ is an arbitrary linear map, we can extend it to a symplectic map $V \ni x \mapsto Ax \oplus A'x \in V \oplus V'$, where the symplectic vector space $(V', \sigma')$ now refers to the environment. Consider now the Weyl operator $W(x) \otimes W(x') = W(x, x')$ on the Hilbert space $H \otimes H'$ associated to the phase space element $x \oplus x' \in V \oplus V'$. Since $A \oplus A'$ is symplectic, it admits a unitary Bogolubov transformation $U: H \otimes H' \to H \otimes H'$ with $U^* W(x, 0) U = W(Ax, A'x)$. If ρ′ now denotes a Gaussian density matrix on H′ describing the initial state of the environment, we get a Gaussian channel by
$$\operatorname{tr}[T^*(\rho)W(x)] = \operatorname{tr}[(\rho \otimes \rho')\, U^* W(x, 0) U] = \operatorname{tr}[\rho W(Ax)]\,\operatorname{tr}[\rho' W(A'x)]. \tag{3.71}$$
Hence $T[W(x)] = f(x)W(Ax)$ with $f(x) = \operatorname{tr}[\rho' W(A'x)]$.
Particular examples of Gaussian channels in the case of one degree of freedom are attenuation and amplification channels [81,84]. They are given in terms of a real parameter $k \neq 1$ by $\mathbb{R}^2 \ni x \mapsto Ax = kx \in \mathbb{R}^2$ and
$$\mathbb{R}^2 \ni x \mapsto A'x = \sqrt{1-k^2}\,x \in \mathbb{R}^2 \tag{3.72}$$
for $k < 1$, and
$$\mathbb{R}^2 \ni (q, p) \mapsto A'(q, p) = (\kappa q, -\kappa p) \in \mathbb{R}^2 \quad\text{with}\quad \kappa = \sqrt{k^2 - 1} \tag{3.73}$$
for $k > 1$. If the environment is initially in a thermal state $\rho_{\tilde N}$ (cf. Eq. (3.64)), this leads to
$$T[W(x)] = \exp\Bigl[-\frac{1}{2}\Bigl(\frac{|k^2-1|}{2} + N_c\Bigr)x^2\Bigr]\, W(kx), \tag{3.74}$$
where we have set $N_c = |k^2-1|\,\tilde N$. If we start initially with a thermal state $\rho_N$, it is mapped by T again to a thermal state $\rho_{N'}$ with mean photon number N′ given by
$$N' = k^2 N + \max\{0, k^2 - 1\} + N_c. \tag{3.75}$$

If $N_c = 0$, this means that T amplifies ($k > 1$) or damps ($k < 1$) the mean photon number, while $N_c > 0$ leads to additional classical Gaussian noise. We will reconsider this channel in greater detail in Section 6.
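Two claims of this paragraph can be checked with a few lines of Python/NumPy: that $x \mapsto Ax \oplus A'x$ with A′ from (3.72) or (3.73) preserves σ, i.e. $A^T\sigma A + A'^T\sigma A' = \sigma$, and that composing (3.74) with a thermal input reproduces (3.75). For the second check we assume the thermal characteristic function $\chi_N(x) = \exp(-(N + 1/2)\, x \cdot x/2)$. Eq. (3.64) is not reproduced here; this normalization is our assumption, chosen because it makes the vacuum (N = 0) a fixed point of the attenuator and is consistent with (3.74) and (3.75). All function names are hypothetical. The minus sign in (3.73) matters: with $A' = \kappa\mathbb{1}$ one would get $(2k^2 - 1)\sigma$ instead of σ.

```python
import numpy as np

SIGMA = np.array([[0.0, 1.0], [-1.0, 0.0]])   # one degree of freedom (assumed convention)


def environment_map(k):
    """A' of Eq. (3.72) for k < 1 and of Eq. (3.73) for k > 1."""
    if k < 1:
        return np.sqrt(1 - k**2) * np.eye(2)
    kappa = np.sqrt(k**2 - 1)
    return np.diag([kappa, -kappa])


def preserves_sigma(k, a_env=None):
    """Check A^T sigma A + A'^T sigma A' = sigma for A = k * 1."""
    a = k * np.eye(2)
    a_env = environment_map(k) if a_env is None else a_env
    return np.allclose(a.T @ SIGMA @ a + a_env.T @ SIGMA @ a_env, SIGMA)


def output_photon_number(k, n_in, n_env):
    """N' read off from f(x) * chi_N(kx), assuming chi_N(x) = exp(-(N + 1/2) x.x / 2)."""
    n_c = abs(k**2 - 1) * n_env
    width_f = abs(k**2 - 1) / 2 + n_c           # Gaussian width of f in Eq. (3.74)
    width_out = width_f + k**2 * (n_in + 0.5)   # widths add when Gaussians are multiplied
    return width_out - 0.5


def photon_number_eq_375(k, n_in, n_env):
    return k**2 * n_in + max(0.0, k**2 - 1) + abs(k**2 - 1) * n_env


print(preserves_sigma(1.5))                                   # True
print(preserves_sigma(1.5, np.sqrt(1.5**2 - 1) * np.eye(2)))  # False: the sign flip matters
print(output_photon_number(0.5, 2.0, 1.0), photon_number_eq_375(0.5, 2.0, 1.0))  # 1.25 1.25
```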
4. Basic tasks
After having discussed the conceptual foundations of quantum information, we will now consider some of its basic tasks. The spectrum ranges here from elementary processes like teleportation (Section 4.1) or error correction (Section 4.4), which are building blocks for more complex applications, up to possible future technologies like quantum cryptography (Section 4.6) and quantum computing (Section 4.5).
4.1. Teleportation and dense coding
Perhaps the most striking feature of entanglement is the fact that otherwise impossible machines become possible if entangled states are used as an additional resource. The most prominent examples are teleportation and dense coding, which we discuss in this section.



4.1.1. Impossible machines revisited: classical teleportation
We have already pointed out in the introduction that classical teleportation, i.e. transmission of quantum information over a classical information channel, is impossible. With the material introduced in the last two chapters it is now possible to reconsider this subject in a slightly more mathematical way, which makes the following treatment of entanglement enhanced teleportation more transparent. To “teleport” the state $\rho \in B^*(H)$, Alice performs a measurement (described by a POV measure $E_1, \dots, E_N \in B(H)$) on her system and gets a value $x \in X = \{1, \dots, N\}$ with probability $p_x = \operatorname{tr}(\rho E_x)$. She communicates these data to Bob, and he prepares a B(H) system in the state $\rho_x$. Hence the overall state Bob gets if the experiment is repeated many times is $\tilde\rho = \sum_{x \in X} \operatorname{tr}(\rho E_x)\rho_x$ (cf. Fig. 1.1). The latter can be rewritten as the composition
$$B^*(H) \xrightarrow{E^*} C^*(X) \xrightarrow{D^*} B^*(H) \tag{4.1}$$
of the channels
$$C(X) \ni f \mapsto E(f) = \sum_{x \in X} f(x)E_x \in B(H) \tag{4.2}$$
and
$$C^*(X) \ni p \mapsto D^*(p) = \sum_{x \in X} p_x \rho_x \in B^*(H), \tag{4.3}$$
i.e. $\tilde\rho = D^*E^*(\rho)$, and this equation makes sense even if X is not finite. The teleportation is successful if the output state $\tilde\rho$ cannot be distinguished from the input state ρ by any statistical experiment, i.e. if $D^*E^*(\rho) = \rho$. Hence the impossibility of classical teleportation can be rephrased simply as $ED \neq \mathrm{Id}$ for all observables E and all preparations D.
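A minimal numerical illustration of Eqs. (4.1) to (4.3), assuming a qubit, a measurement in the computational basis ($E_x = |x\rangle\langle x|$) and the preparations $\rho_x = |x\rangle\langle x|$: the composition $D^*E^*$ leaves diagonal states untouched but destroys all coherences, so it is not the identity. This exhibits only one particular pair E, D; the impossibility statement above covers all of them.

```python
import numpy as np


def measure_and_prepare(rho, effects, preparations):
    """D^*E^*(rho) = sum_x tr(rho E_x) rho_x, cf. Eqs. (4.1)-(4.3)."""
    return sum(np.trace(rho @ e) * prep for e, prep in zip(effects, preparations))


basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # E_x = rho_x = |x><x| (assumed choice)

plus = np.full((2, 2), 0.5)                           # the pure state |+><+|
out = measure_and_prepare(plus, basis, basis)
trace_distance = np.abs(np.linalg.eigvalsh(out - plus)).sum()   # ||out - plus||_1
print(out)             # the maximally mixed state 1/2
print(trace_distance)  # approximately 1.0: the coherences of |+><+| are lost
```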
4.1.2. Entanglement enhanced teleportation
Let us now change our setup slightly. Assume that Alice wants to send a quantum state $\rho \in B^*(H)$ to Bob and that she shares an entangled state $\omega \in B^*(K \otimes K)$ and an ideal classical communication channel $C(X) \to C(X)$ with him. Alice can perform a measurement $E: C(X) \to B(H \otimes K)$ on the composite system $B(H \otimes K)$ consisting of the particle to teleport (B(H)) and her part of the entangled system (B(K)). Then she communicates the classical data $x \in X$ to Bob, and he operates with the parameter-dependent operation $D: B(H) \to B(K) \otimes C(X)$ appropriately on his particle (cf. Fig. 4.1). Hence, the overall procedure can be described by the channel $T = (E \otimes \mathrm{Id})D$, or, in analogy to (4.1),
$$B^*(H \otimes K^{\otimes 2}) \xrightarrow{E^* \otimes \mathrm{Id}} C^*(X) \otimes B^*(K) \xrightarrow{D^*} B^*(H). \tag{4.4}$$
The teleportation of ρ is successful if
$$T^*(\rho \otimes \omega) := D^*\bigl((E^* \otimes \mathrm{Id})(\rho \otimes \omega)\bigr) = \rho \tag{4.5}$$
holds, in other words if there is no statistical measurement which can distinguish the final state $T^*(\rho \otimes \omega)$ of Bob’s particle from the initial state ρ of Alice’s input system. The two channels E and D and the entangled state ω form a teleportation scheme if Eq. (4.5) holds for all states ρ of the B(H) system, i.e. if each state of a B(H) system can be teleported without loss of quantum information.

Fig. 4.1. Entanglement enhanced teleportation.
Assume now that $H = K = \mathbb{C}^d$ and $X = \{0, \dots, d^2 - 1\}$. In this case we can define a teleportation scheme as follows: The entangled state shared by Alice and Bob is a maximally entangled state $\omega = |\Omega\rangle\langle\Omega|$, and Alice performs a measurement which is given by the one-dimensional projections $E_j = |\Phi_j\rangle\langle\Phi_j|$, where $\Phi_j \in H \otimes H$, $j = 0, \dots, d^2 - 1$, is a basis of maximally entangled vectors. If her result is $j \in \{0, \dots, d^2 - 1\}$, Bob has to apply the operation $\tau \mapsto U_j \tau U_j^*$ to his partner of the entangled pair, where the $U_j \in B(H)$, $j = 0, \dots, d^2 - 1$, are an orthonormal family of unitary operators, i.e. $\operatorname{tr}(U_j^* U_k) = d\,\delta_{jk}$. Hence, the parameter-dependent operation D has the form (in the Schrödinger picture)
$$C^*(X) \otimes B^*(H) \ni (p, \tau) \mapsto D^*(p, \tau) = \sum_{j=0}^{d^2-1} p_j\, U_j \tau U_j^* \in B^*(H). \tag{4.6}$$
Therefore, we get for $T^*(\rho \otimes \omega)$ from Eq. (4.5)
\begin{align}
\operatorname{tr}[T^*(\rho \otimes \omega)A] &= \operatorname{tr}[(E \otimes \mathrm{Id})^*(\rho \otimes \omega)\, D(A)] \tag{4.7}\\
&= \operatorname{tr}\Bigl[\sum_{j=0}^{d^2-1} \operatorname{tr}_{12}\bigl[(|\Phi_j\rangle\langle\Phi_j| \otimes \mathbb{1})(\rho \otimes \omega)\bigr]\, U_j^* A U_j\Bigr] \tag{4.8}\\
&= \sum_{j=0}^{d^2-1} \operatorname{tr}\bigl[(\rho \otimes \omega)\bigl(|\Phi_j\rangle\langle\Phi_j| \otimes U_j^* A U_j\bigr)\bigr]. \tag{4.9}
\end{align}
Here $\operatorname{tr}_{12}$ denotes the partial trace over the first two tensor factors (= Alice’s particles). If ω, the $\Phi_j$ and the $U_j$ are related by the equation
$$\Phi_j = (U_j \otimes \mathbb{1})\Omega, \tag{4.10}$$
it is a straightforward calculation to show that $T^*(\rho \otimes \omega) = \rho$ holds, as expected [167]. If d = 2 there is basically a unique choice: the $\Phi_j$, $j = 0, \dots, 3$, are the four Bell states (cf. Eq. (3.3)), $\Omega = \Phi_0$, and the $U_j$ are the identity and the three Pauli matrices. In this way, we recover the standard example of teleportation, published for the first time in [11]. The first experimental realizations are [24,22].
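The scheme (4.6) to (4.10) can be verified numerically beyond the qubit case. The sketch below (Python/NumPy) chooses the $U_j$ as the clock-and-shift operators $X^aZ^b$, which form one possible orthonormal unitary family (the text does not prescribe this choice), orders the tensor factors as input ⊗ Alice’s half ⊗ Bob’s half, and takes $\Omega = d^{-1/2}\sum_m |mm\rangle$. Bob’s correction is implemented as $\tau \mapsto U_j\tau U_j^*$, the Schrödinger-picture form dual to the operators $U_j^* A U_j$ in (4.8) and (4.9). The function names are ours.

```python
import numpy as np


def weyl_unitaries(d):
    """Clock-and-shift operators X^a Z^b; they satisfy tr(U_j^* U_k) = d delta_jk."""
    shift = np.roll(np.eye(d), 1, axis=0)                    # |m> -> |m+1 mod d>
    clock = np.diag(np.exp(2j * np.pi * np.arange(d) / d))
    return [np.linalg.matrix_power(shift, a) @ np.linalg.matrix_power(clock, b)
            for a in range(d) for b in range(d)]


def teleport(rho, d):
    """Bob's final state T^*(rho ⊗ omega), summed over all measurement outcomes j."""
    omega_vec = np.eye(d).reshape(d * d) / np.sqrt(d)        # Omega = sum_m |m m> / sqrt(d)
    state = np.kron(rho, np.outer(omega_vec, omega_vec.conj()))   # input ⊗ Alice ⊗ Bob
    out = np.zeros((d, d), dtype=complex)
    for u in weyl_unitaries(d):
        phi = np.kron(u, np.eye(d)) @ omega_vec              # Phi_j = (U_j ⊗ 1) Omega, Eq. (4.10)
        proj = np.kron(np.outer(phi, phi.conj()), np.eye(d)) # E_j ⊗ 1
        post = (proj @ state @ proj).reshape(d * d, d, d * d, d)
        bob = np.einsum("iaib->ab", post)                    # partial trace over Alice's systems
        out += u @ bob @ u.conj().T                          # Bob applies tau -> U_j tau U_j^*
    return out


rng = np.random.default_rng(1)
g = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = g @ g.conj().T
rho /= np.trace(rho)                                         # random qutrit density matrix
print(np.allclose(teleport(rho, 3), rho))                    # True
```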



Fig. 4.2. Dense coding.

4.1.3. Dense coding
We have just shown how quantum information can be transmitted via a classical channel if entanglement is available as an additional resource. Now we look at the dual procedure: transmission of classical information over a quantum channel. To send the classical information $x \in X = \{1, \dots, n\}$ to Bob, Alice can prepare a d-level quantum system in the state $\rho_x \in B^*(H)$ and send it to Bob, who measures an observable given by positive operators $E_1, \dots, E_m$. The probability for Bob to receive the signal $y \in X$ if Alice has sent $x \in X$ is $\operatorname{tr}(\rho_x E_y)$, and this defines a classical information channel by (cf. Section 3.2.3)
$$C^*(X) \ni p \mapsto \Bigl(\sum_{x \in X} p(x)\operatorname{tr}(\rho_x E_1), \dots, \sum_{x \in X} p(x)\operatorname{tr}(\rho_x E_m)\Bigr) \in C^*(X). \tag{4.11}$$
To get an ideal channel we just have to choose mutually orthogonal pure states $\rho_x = |\psi_x\rangle\langle\psi_x|$, $x = 1, \dots, d$, on Alice’s side and the corresponding one-dimensional projections $E_y = |\psi_y\rangle\langle\psi_y|$, $y = 1, \dots, d$, on Bob’s. If d = 2 and $H = \mathbb{C}^2$ it is possible to send one bit of classical information via one qubit. The crucial point is now that the amount of classical information can be increased (doubled in the qubit case) if Alice shares an entangled state $\omega \in S(H \otimes H)$ with Bob. To send the classical information $x \in X = \{1, \dots, n\}$ to Bob, Alice operates on her particle with an operation $D_x: B(H) \to B(H)$ and sends it through an (ideal) quantum channel to Bob, who performs a measurement $E_1, \dots, E_n \in B(H \otimes H)$ on both particles. The probability for Bob to measure $y \in X$ if Alice has sent $x \in X$ is given by
$$\operatorname{tr}[(D_x \otimes \mathrm{Id})^*(\omega)E_y], \tag{4.12}$$
and this defines the transition matrix of a classical communication channel T. If T is an ideal channel, i.e. if the transition matrix (4.12) is the identity, we will call E, D and ω a dense coding scheme (cf. Fig. 4.2).
In analogy to Eq. (4.4) we can rewrite the channel T defined by (4.12) in terms of the composition
$$C^*(X) \otimes B^*(H) \otimes B^*(H) \xrightarrow{D^* \otimes \mathrm{Id}} B^*(H) \otimes B^*(H) \xrightarrow{E^*} C^*(X) \tag{4.13}$$
of the parameter-dependent operation
$$D^*: C^*(X) \otimes B^*(H) \to B^*(H), \quad p \otimes \tau \mapsto \sum_{j=1}^{n} p_j D_j^*(\tau) \tag{4.14}$$
and the observable
$$E: C(X) \to B(H \otimes H), \quad p \mapsto \sum_{j=1}^{n} p_j E_j, \tag{4.15}$$
i.e. $T^*(p) = E^* \circ (D^* \otimes \mathrm{Id})(p \otimes \omega)$. The advantage of this point of view is that it works as well for infinite-dimensional Hilbert spaces and continuous observables.
Finally, let us again consider the case where $H = \mathbb{C}^d$ and $X = \{1, \dots, d^2\}$. If we choose, as in the last paragraph, a maximally entangled vector $\Omega \in H \otimes H$, an orthonormal basis $\Phi_x \in H \otimes H$, $x = 1, \dots, d^2$, of maximally entangled vectors and an orthonormal family $U_x \in B(H)$, $x = 1, \dots, d^2$, of unitary operators, we can construct a dense coding scheme as follows: $E_x = |\Phi_x\rangle\langle\Phi_x|$, $D_x(A) = U_x^* A U_x$ and $\omega = |\Omega\rangle\langle\Omega|$. If Ω, the $\Phi_x$ and the $U_x$ are related by Eq. (4.10), it is easy to see that we really get a dense coding scheme [167]. If d = 2, we again have to take the Bell basis for the $\Phi_x$, $\Omega = \Phi_0$, and the identity and the Pauli matrices for the $U_x$. We recover in this case the standard example of dense coding proposed in [19], and we see that we can transfer two bits via one qubit, as stated above.
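For d = 2 the transition matrix (4.12) and the resulting amount of transmitted information can be computed directly. In the sketch below (Python/NumPy) the form $\Omega = (|00\rangle + |11\rangle)/\sqrt{2}$ of $\Phi_0$ is our assumption, Alice’s operation is applied in the Schrödinger picture as $\tau \mapsto U_x\tau U_x^*$ (dual to $D_x(A) = U_x^*AU_x$), and the mutual information is evaluated for uniformly distributed messages. The names are ours.

```python
import numpy as np

PAULIS = [np.eye(2, dtype=complex),
          np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]]),
          np.array([[1, 0], [0, -1]], dtype=complex)]
OMEGA_VEC = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # assumed form of Phi_0
BELL = [np.kron(u, np.eye(2)) @ OMEGA_VEC for u in PAULIS]        # Phi_x = (U_x ⊗ 1) Omega


def transition_matrix():
    """trans[x, y] = tr[(D_x ⊗ Id)^*(omega) E_y], Eq. (4.12)."""
    omega = np.outer(OMEGA_VEC, OMEGA_VEC.conj())
    trans = np.empty((4, 4))
    for x, u in enumerate(PAULIS):
        ux = np.kron(u, np.eye(2))                # Alice acts on her particle only
        encoded = ux @ omega @ ux.conj().T
        for y, phi in enumerate(BELL):
            trans[x, y] = np.real(phi.conj() @ encoded @ phi)
    return trans


def entropy_bits(p):
    p = p[p > 1e-12]
    return -np.sum(p * np.log2(p))


trans = transition_matrix()
p_y = trans.mean(axis=0)                          # output distribution for uniform input
mutual_info = entropy_bits(p_y) - np.mean([entropy_bits(row) for row in trans])
print(np.round(trans, 12))   # identity matrix: an ideal channel
print(mutual_info)           # approximately 2.0 bits per transmitted qubit
```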
4.2. Estimating and copying
The impossibility of classical teleportation can be rephrased as follows: It is impossible to get complete information about the state ρ of a quantum system by one measurement on one system. However, if we have many systems, say N, all prepared in the same state ρ, it should be possible to get (with a clever measuring strategy) arbitrarily accurate information on ρ, provided N is large enough. In this way, we can circumvent the impossibility of devices like classical teleportation or quantum copying, at least in an approximate way.
4.2.1. Quantum state estimation
To discuss this idea in a more detailed way, consider a number N of d-level quantum systems, all of them prepared in the same (unknown) state $\rho \in B^*(H)$. Our aim is to estimate the state ρ by measurements on the compound system $\rho^{\otimes N}$. This is described in terms of an observable $E^N: C(X_N) \to B(H^{\otimes N})$ with values in a finite subset¹⁵ $X_N \subset S(H)$ of the quantum state space S(H). According to Section 3.2.4 each such $E^N$ is given in terms of a tuple $E^N_\sigma$, $\sigma \in X_N$, by $E^N(f) = \sum_\sigma f(\sigma)E^N_\sigma$; hence, we get for the expectation value of an $E^N$ measurement on systems in the state $\rho^{\otimes N}$ the density matrix $\hat\rho_N \in S(H)$ with matrix elements
$$\langle\phi, \hat\rho_N \psi\rangle = \sum_{\sigma \in X_N} \langle\phi, \sigma\psi\rangle\, \operatorname{tr}(E^N_\sigma \rho^{\otimes N}). \tag{4.16}$$
We will call the channel $E^N$ an estimator, and the criterion for a good estimator $E^N$ is that for any one-particle density operator ρ the value measured on a state $\rho^{\otimes N}$ is likely to be close to ρ, i.e. that the probability
$$K^N(\omega) := \operatorname{tr}(E^N(\omega)\rho^{\otimes N}) \quad\text{with}\quad E^N(\omega) = \sum_{\sigma \in X_N \cap \omega} E^N_\sigma \tag{4.17}$$
is small if $\omega \subset S(H)$ is the complement of a small ball around ρ. Of course, we will look at this problem for large N. So the task is to find a whole sequence of observables $E^N$, $N = 1, 2, \dots$, making error probabilities like (4.17) go to zero as $N \to \infty$.

¹⁵ This is a severe restriction at this point and physically not very well motivated. There might be more general (i.e. continuous) observables taking their values in the whole state space S(H) which lead to much better estimates. However, we do not discuss this possibility in order to keep the mathematics more elementary.
The most direct way to get a family $E^N$, $N \in \mathbb{N}$, of estimators with this property is to perform a sequence of measurements on each of the N input systems separately. A finite set of observables which leads to a successful estimation strategy is usually called a “quorum” (cf. e.g. [107,162]). E.g. for d = 2 we can perform alternating measurements of the three spin components. If $\rho = \frac{1}{2}(\mathbb{1} + \vec{x} \cdot \vec{\sigma})$ is the Bloch representation of ρ (cf. Section 2.1.2), we see that the probability of obtaining the outcome +1 in a measurement of the jth spin component is $\frac{1}{2}(1 + x_j)$. Hence we get an arbitrarily good estimate if N is large enough. A similar procedure is possible for arbitrary d if we consider the generalized Bloch representation for ρ (see again Section 2.1.2). There are, however, more efficient strategies based on “entangled” measurements (i.e. the $E^N_\sigma$ cannot be decomposed into pure tensor products) on the whole input system $\rho^{\otimes N}$ (e.g. [156,99]). Somewhat in between are “adaptive schemes” [63], consisting of separate measurements where the jth measurement depends on the results of the (j − 1)th. We will reconsider this circle of questions in a more quantitative way in Section 7.
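The quorum strategy for d = 2 can be simulated in a few lines. The sketch below (Python/NumPy) measures each spin component on `n_per_axis` fresh copies (3 · `n_per_axis` copies in total), estimates $x_j$ as 2 · (relative frequency of +1) − 1, and rescales the estimate into the Bloch ball if sampling noise pushes it outside. That rescaling is our choice, made so that the estimate is always a density matrix; it is not discussed in the text. Outcomes are drawn with NumPy’s binomial sampler, and the function names are ours.

```python
import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def bloch_vector(rho):
    return np.array([np.real(np.trace(rho @ s)) for s in PAULI])


def from_bloch(x):
    return 0.5 * (np.eye(2) + sum(c * s for c, s in zip(x, PAULI)))


def estimate_qubit(rho, n_per_axis, rng):
    """Measure each spin component on n_per_axis fresh copies of rho; return rho_hat."""
    p_plus = (1 + bloch_vector(rho)) / 2                  # probability of outcome +1
    freq = rng.binomial(n_per_axis, p_plus) / n_per_axis
    x_hat = 2 * freq - 1
    norm = np.linalg.norm(x_hat)
    if norm > 1:                                          # keep the estimate inside the Bloch ball
        x_hat = x_hat / norm
    return from_bloch(x_hat)


def trace_distance(a, b):
    """Trace-norm distance ||a - b||_1."""
    return np.abs(np.linalg.eigvalsh(a - b)).sum()


rho = from_bloch(np.array([0.3, -0.4, 0.5]))
rng = np.random.default_rng(0)
for n in (10, 1000, 100000):
    print(n, trace_distance(estimate_qubit(rho, n, rng), rho))   # shrinks roughly like 1/sqrt(n)
```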
4.2.2. Approximate cloning
By virtue of the no-cloning theorem [173], it is impossible to produce M perfect copies of a d-level quantum system if N < M input systems in the common (unknown) state $\rho^{\otimes N}$ are given. More precisely, there is no channel $T_{MN}: B(H^{\otimes M}) \to B(H^{\otimes N})$ such that $T_{MN}^*(\rho^{\otimes N}) = \rho^{\otimes M}$ holds for all $\rho \in S(H)$. Using state estimation, however, it is easy to find a device $T_{MN}$ which produces at least approximate copies which become exact in the limit $N, M \to \infty$: If $\rho^{\otimes N}$ is given, we measure the observable $E^N$ and get the classical data $\sigma \in X_N \subset S(H)$, which we use subsequently to prepare M systems in the state $\sigma^{\otimes M}$. In other words, $T_{MN}$ has the form
$$B^*(H^{\otimes N}) \ni \rho \mapsto \sum_{\sigma \in X_N} \operatorname{tr}(E^N_\sigma \rho)\, \sigma^{\otimes M} \in B^*(H^{\otimes M}). \tag{4.18}$$
We immediately see that the probability of getting wrong copies coincides exactly with the error probability of the estimator given in Eq. (4.17). This shows, first, that we get exact copies in the limit $N \to \infty$ and, second, that the quality of the copies does not depend on the number M of output systems, i.e. the asymptotic rate $\lim_{N,M\to\infty} M/N$ of output systems per input system can be arbitrarily large.
The fact that we get classical data at an intermediate step allows a further generalization of this scheme. Instead of just preparing M systems in the state σ detected by the estimator, we can first apply an arbitrary transformation $F: S(H) \to S(H)$ to the density matrix σ and prepare $F(\sigma)^{\otimes M}$ instead of $\sigma^{\otimes M}$. In this way, we get the channel (cf. Fig. 4.3)
$$B^*(H^{\otimes N}) \ni \rho \mapsto \sum_{\sigma \in X_N} \operatorname{tr}(E^N_\sigma \rho)\, F(\sigma)^{\otimes M} \in B^*(H^{\otimes M}), \tag{4.19}$$
i.e. a physically realizable device which approximates the impossible machine F. The probability of getting a bad approximation of the state $F(\rho)^{\otimes M}$ (if the input state was $\rho^{\otimes N}$) is again given by the error probability of the estimator, and we get a perfect realization of F at arbitrary rate as $M, N \to \infty$.

Fig. 4.3. Approximating the impossible machine F by state estimation.

There are in particular two interesting tasks which become possible this way. The first is the “universal not gate”, which associates to each pure state of a qubit the unique pure state orthogonal to it [36]. This is a special example of an antiunitarily implemented symmetry operation and therefore not completely positive. The second example is the purification of states [46,100]. Here it is assumed that the input states were once pure but have later passed through a depolarizing channel $|\psi\rangle\langle\psi| \mapsto \vartheta|\psi\rangle\langle\psi| + (1 - \vartheta)\mathbb{1}/d$. If ϑ > 0 this map is invertible, but its inverse does not describe an allowed quantum operation because it maps some density operators to operators with negative eigenvalues. Hence the reversal of noise is not possible with a one-shot operation, but it can be done with high accuracy if enough input systems are available. We will discuss this topic again in Section 7.
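The failure of the inverse depolarizing map is easy to make explicit. For a pure input $|\psi\rangle\langle\psi|$ the linear inverse $\rho \mapsto (\rho - (1-\vartheta)\mathbb{1}/d)/\vartheta$ produces the eigenvalue $(1 - (1-\vartheta)/d)/\vartheta$ on ψ and $-(1-\vartheta)/(d\vartheta) < 0$ on the orthogonal complement. The short Python/NumPy sketch below checks this for a qubit with ϑ = 1/2 (an illustrative value); the function names are ours.

```python
import numpy as np


def depolarize(rho, theta):
    """|psi><psi| -> theta |psi><psi| + (1 - theta) 1/d, extended linearly."""
    d = rho.shape[0]
    return theta * rho + (1 - theta) * np.eye(d) / d


def inverse_depolarize(rho, theta):
    """Linear inverse of depolarize for theta > 0; it is not a quantum channel."""
    d = rho.shape[0]
    return (rho - (1 - theta) * np.eye(d) / d) / theta


theta = 0.5
pure = np.diag([1.0, 0.0])                                             # |0><0|
print(np.allclose(inverse_depolarize(depolarize(pure, theta), theta), pure))  # True
print(np.linalg.eigvalsh(inverse_depolarize(pure, theta)))  # eigenvalues -0.5 and 1.5: not a state
```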
4.3. Distillation of entanglement
Let us now return to entanglement. We have seen in Section 4.1 that maximally entangled states play a crucial role for processes like teleportation and dense coding. In practice, however, entanglement is a rather fragile property: If Alice produces a pair of particles in a maximally entangled state $|\Omega\rangle\langle\Omega| \in S(H_A \otimes H_B)$ and distributes one of them over a great distance to Bob, both end up with a mixed state ρ which contains much less entanglement than the original and which can no longer be used for teleportation. The latter can be seen quite easily if we try to apply the qubit teleportation scheme (Section 4.1.2) with an isotropic state (Eq. (3.15)) that is not maximally entangled instead of $|\Omega\rangle\langle\Omega|$.
Hence the question arises whether it is possible to recover $|\Omega\rangle\langle\Omega|$ from ρ or, following the reasoning from the last section, at least a small number of (almost) maximally entangled states from a large number N of copies of ρ. However, since the distance between Alice and Bob is large (and quantum communication therefore impossible), only LOCC operations (Section 3.2.6) are available for this task (Alice and Bob can only operate on their respective particles, drop some of them and communicate classically with one another). This excludes procedures like the purification scheme just sketched, because we would need “entangled” measurements to get an asymptotically exact estimate for the state ρ. Hence, we need a sequence of LOCC channels
$$T_N: B(\mathbb{C}^{d_N} \otimes \mathbb{C}^{d_N}) \to B(H_A^{\otimes N} \otimes H_B^{\otimes N}) \tag{4.20}$$
such that
$$\bigl\| T_N^*(\rho^{\otimes N}) - |\Omega_N\rangle\langle\Omega_N| \bigr\|_1 \to 0 \quad\text{for } N \to \infty \tag{4.21}$$
holds, with a sequence of maximally entangled vectors $\Omega_N \in \mathbb{C}^{d_N} \otimes \mathbb{C}^{d_N}$. Note that we have to use here the natural isomorphism $H_A^{\otimes N} \otimes H_B^{\otimes N} \cong (H_A \otimes H_B)^{\otimes N}$, i.e. we have to reshuffle $\rho^{\otimes N}$ such that the first N tensor factors belong to Alice ($H_A$) and the last N to Bob ($H_B$). If confusion can be avoided, we will use this isomorphism in the following without further comment. We will call a sequence of LOCC channels $T_N$ satisfying (4.21) with a state $\rho \in S(H_A \otimes H_B)$ a distillation scheme for ρ, and ρ is called distillable if it admits a distillation scheme. The asymptotic rate with which maximally entangled states can be distilled with a given protocol is
$$\liminf_{N \to \infty} \frac{\log_2(d_N)}{N}. \tag{4.22}$$
This quantity will become relevant in the framework of entanglement measures (Section 5).
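The reshuffling isomorphism used in (4.20) and (4.21) is just a permutation of tensor factors. The Python/NumPy sketch below implements it for pure state vectors (for density matrices the same permutation has to be applied to the row and to the column indices); the function name and the local dimensions are illustrative assumptions.

```python
import numpy as np


def reshuffle(psi, n, d_a, d_b):
    """Map a vector in (H_A ⊗ H_B)^{⊗N} to H_A^{⊗N} ⊗ H_B^{⊗N}.

    Input factor order: A1 B1 A2 B2 ...; output factor order: A1 ... AN B1 ... BN.
    """
    t = psi.reshape([d_a, d_b] * n)                            # one tensor axis per factor
    order = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    return t.transpose(order).reshape(-1)


# the basis vector |a1 b1 a2 b2> = |1 1 0 0> is sent to |a1 a2 b1 b2> = |1 0 1 0>
e = np.zeros(16)
e[0b1100] = 1.0
print(np.flatnonzero(reshuffle(e, 2, 2, 2)))   # [10]
```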
4.3.1. Distillation of pairs of qubits
Concrete distillation protocols are in general rather complicated procedures. We will sketch in the following how any pair of entangled qubits can be distilled. The first step is a scheme proposed for the first time by Bennett et al. [12]. It can be applied if the maximally entangled fraction F (Eq. (3.4)) is greater than 1/2. As indicated above, we assume that Alice and Bob share a large number of pairs in the state ρ, so that the total state is $\rho^{\otimes N}$. To obtain a smaller number of pairs with a higher F they proceed as follows:
1. First they take two pairs (let us call them pairs 1 and 2), i.e. ρ ⊗ ρ, and apply to each of them the twirl operation $P_{U\bar U}$ associated to isotropic states (cf. Eq. (3.18)). This can be done by LOCC operations in the following way: Alice selects at random (respecting the Haar measure on U(2)) a unitary operator U, applies it to her qubits and tells Bob which transformation she has chosen; then he applies $\bar U$ to his particles. They end up with two isotropic states $\tilde\rho \otimes \tilde\rho$ with the same maximally entangled fraction as ρ.
2. Each party performs the unitary transformation
$$U_{\mathrm{XOR}}: |a\rangle \otimes |b\rangle \mapsto |a\rangle \otimes |a + b \bmod 2\rangle \tag{4.23}$$
on his/her members of the pairs.
3. Finally, Alice and Bob perform local measurements in the basis |0⟩, |1⟩ on pair 1 and discard it afterwards. If the measurements agree, pair 2 is kept and has a higher F. Otherwise pair 2 is discarded as well.
If this procedure is repeated over and over again, it is possible to get states with an arbitrarily high F, but we have to sacrifice more and more pairs and the asymptotic rate is zero. To overcome this problem we can apply the scheme above until F(ρ) is high enough that $1 + \operatorname{tr}(\rho \log_2 \rho) > 0$ holds, and then continue with another scheme called hashing [16], which leads to a non-vanishing rate.
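The interplay between the recurrence steps 1 to 3 and the hashing condition can be followed numerically. The fidelity map used below, $F' = [F^2 + (1-F)^2/9]\,/\,[F^2 + 2F(1-F)/3 + 5(1-F)^2/9]$, whose denominator is the probability that the two measurement results agree, is the closed form reported by Bennett et al. [12] for twirled qubit inputs. It is quoted here, not derived in the text, so treat it as an assumption of this Python/NumPy sketch. The loop starts from an illustrative F = 0.6, stops once $1 + \operatorname{tr}(\rho\log_2\rho) > 0$ for the isotropic state with eigenvalues F and (1 − F)/3 (threefold), and tracks the expected number of input pairs consumed per surviving pair.

```python
import numpy as np


def recurrence_step(f):
    """Steps 1-3 for two isotropic qubit pairs with fidelity f (closed form quoted from [12]).

    Returns the new fidelity and the probability that Alice's and Bob's results agree.
    """
    g = (1 - f) / 3                        # weight of each of the three other Bell states
    p_agree = f**2 + 2 * f * g + 5 * g**2
    return (f**2 + g**2) / p_agree, p_agree


def hashing_yield(f):
    """1 + tr(rho log2 rho) for the isotropic qubit state with eigenvalues f, g, g, g."""
    eig = np.array([f] + [(1 - f) / 3] * 3)
    eig = eig[eig > 0]
    return 1 + np.sum(eig * np.log2(eig))


f = 0.6                                    # illustrative initial maximally entangled fraction
pairs_per_output = 1.0                     # expected input pairs consumed per surviving pair
rounds = 0
while hashing_yield(f) <= 0:
    f, p_agree = recurrence_step(f)
    pairs_per_output *= 2 / p_agree        # two pairs in, at most one out
    rounds += 1
print(rounds, f, hashing_yield(f), pairs_per_output)
```

The rapidly growing `pairs_per_output` illustrates why the recurrence alone has rate zero and why one switches to hashing as soon as its condition is met.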



If, finally, F(ρ) ≤ 1/2 but ρ is entangled, Alice and Bob can increase F for some of their particles by filtering operations [9,67]. The basic idea is that Alice applies an instrument $T: C(X) \otimes B(H) \to B(H)$ with two possible outcomes ($X = \{1, 2\}$) to her particles. Hence, the state becomes $\rho \mapsto p_x^{-1}(T_x \otimes \mathrm{Id})^*(\rho)$, $x = 1, 2$, with probability $p_x = \operatorname{tr}[(T_x \otimes \mathrm{Id})^*(\rho)]$ (cf. Section 3.2.5, in particular Eq. (3.50) for the definition of $T_x$). Alice communicates her measuring result x to Bob; if x = 1 they keep the particles, otherwise (x = 2) they discard them. If the instrument T was correctly chosen, Alice and Bob end up with a state $\tilde\rho$ with higher maximally entangled fraction. To find an appropriate T, first note that there are $\psi \in H \otimes H$ with $\langle\psi, (\mathrm{Id} \otimes \Theta)(\rho)\psi\rangle < 0$, where Θ denotes the transposition (this follows from Theorem 2.4.3 since ρ is by assumption entangled), and second that we can write each vector $\psi \in H \otimes H$ as $(X \otimes \mathbb{1})\Phi_0$ with the Bell state $\Phi_0$ and an appropriately chosen operator X (see Section 3.1.1). Now we can define T in terms of the two operations $T_1$, $T_2$ (cf. Eq. (3.52)) with
$$T_1(A) = X^* A X^{-1}, \qquad \mathrm{Id} - T_1 = T_2. \tag{4.24}$$
It is straightforward to check that we end up with
$$\tilde\rho = \frac{(T_x \otimes \mathrm{Id})^*(\rho)}{\operatorname{tr}[(T_x \otimes \mathrm{Id})^*(\rho)]} \tag{4.25}$$
such that $F(\tilde\rho) > 1/2$ holds, and we can continue with the scheme described in the previous paragraph.
4.3.2. Distillation of isotropic states
Consider now an entangled isotropic state ρ in d dimensions, i.e. we have $H = \mathbb{C}^d$ and $0 \leq \operatorname{tr}(\tilde F\rho) \leq 1$ (with the operator $\tilde F$ of Section 3.1.3). Each such state is distillable via the following scheme [27,85]: First, Alice and Bob apply a filter operation $T: C(X) \otimes B(H) \to B(H)$ to their respective particles, given by $T_1(A) = PAP$, $T_2 = \mathrm{Id} - T_1$, where P is the projection onto a two-dimensional subspace. If both measure the value 1, they get a qubit pair in the state $\tilde\rho \propto (T_1 \otimes T_1)^*(\rho)$. Otherwise they discard their particles (this requires classical communication). The state $\tilde\rho$ is again entangled (this can be easily checked), hence they can proceed as in the previous subsection.
The scheme just proposed can be used to show that each state ρ which violates the reduction criterion (cf. Section 2.4.3) can be distilled [85]. The basic idea is to project with the twirl $P_{U\bar U}$ (which is LOCC, as we have seen above; cf. Section 4.3.1) onto an isotropic state $P_{U\bar U}(\rho)$ and to apply the procedure from the last paragraph afterwards. We only have to guarantee that $P_{U\bar U}(\rho)$ is entangled. To this end we use a vector $\psi \in H \otimes H$ with $\langle\psi, (\mathbb{1} \otimes \operatorname{tr}_1(\rho) - \rho)\psi\rangle < 0$ (which exists by assumption, since ρ violates the reduction criterion) and apply the filter operation given by ψ via Eq. (4.24).
4.3.3. Bound entangled states
It is obvious that separable states are not distillable, because LOCC operations map separable states to separable states. However, is each entangled state distillable? The answer, maybe somewhat surprisingly, is no, and an entangled state which is not distillable is called bound entangled [87] (distillable states are sometimes called free entangled, in analogy to thermodynamics). Examples of bound entangled states are all ppt entangled states [87]: This is an easy consequence of the fact that each separable channel (and therefore each LOCC channel as well) maps ppt states to ppt states (this is easy to check), but a maximally entangled state is never ppt. It is not yet known whether bound entangled npt states exist; however, there are at least some partial results: (1) It is sufficient to solve this question for Werner states, i.e. if we can show that each npt Werner state is distillable, it follows that all npt states are distillable [85]. (2) Each npt Gaussian state is distillable [64]. (3) For each $N \in \mathbb{N}$ there is an npt Werner state ρ which is not “N-copy distillable”, i.e. $\langle\psi, [(\mathrm{Id} \otimes \Theta)\rho]^{\otimes N}\psi\rangle \geq 0$ holds for each pure state ψ with exactly two Schmidt summands, where Θ denotes the transposition [55,58]. This gives some evidence for the existence of bound entangled npt states, because ρ is distillable iff it is N-copy distillable for some N [87,55,58].
Since bound entangled states cannot be distilled, they cannot be used for teleportation. Nevertheless, bound entanglement can produce a non-classical effect, called “activation of bound entanglement” [92]. To explain the basic idea, assume that Alice and Bob share one pair of particles in a distillable state $\rho_f$ and many particles in a bound entangled state $\rho_b$. Assume in addition that $\rho_f$ cannot be used for teleportation, or, in other words, if $\rho_f$ is used for teleportation, the particle Bob receives is in a state $\tilde\sigma$ which differs from the state σ Alice has sent. This problem cannot be solved by distillation, since Alice and Bob share only one pair of particles in the state $\rho_f$. Nevertheless, they can try to apply an appropriate filter operation to $\rho_f$ to get, with a certain probability, a new state which leads to a better quality of the teleportation (or, if the filtering fails, to get nothing at all). It can be shown, however [88], that there are states $\rho_f$ such that the error occurring in this process (e.g. measured by the trace norm distance of σ and $\tilde\sigma$) is always above a certain threshold. This is the point where the bound entangled states $\rho_b$ come into play: If Alice and Bob operate with an appropriate protocol on $\rho_f$ and many copies of $\rho_b$, the distance between σ and $\tilde\sigma$ can be made arbitrarily small (although the probability of success goes to zero). Another example of an activation of bound entanglement is related to distillability of npt states: If Alice and Bob share a certain ppt entangled state as an additional resource, each npt state ρ becomes distillable (even if ρ is bound entangled) [60,104]. For a more detailed survey of the role of bound entanglement and further references see [91].
4.4. Quantum error correction
If we try to distribute quantum information over large distances or store it for a long time in some sort of “quantum memory”, we always have to deal with “decoherence effects”, i.e. unavoidable interactions with the environment. This results in a significant information loss, which is particularly bad for the functioning of a quantum computer. Similar problems arise in a classical computer as well, but the methods used there to circumvent them cannot be transferred to the quantum regime. E.g. the simplest strategy to protect classical information against noise is redundancy: instead of storing the information once, we make three copies and decide during readout by a majority vote which bit to take. It is easy to see that this reduces the probability of an error from order ε to order ε². Quantum mechanically, however, such a procedure is forbidden by the no-cloning theorem.
Nevertheless, quantum error correction is possible, although we have to do it in a more subtle way than just copying; this was observed for the first time independently in [39,146]. Let us first consider the general scheme and assume that $T: B(K) \to B(K)$ is a noisy quantum channel. To send quantum systems of type B(H) undisturbed through T we need an encoding channel $E: B(K) \to B(H)$ and a decoding channel $D: B(H) \to B(K)$ such that $ETD = \mathrm{Id}$ holds, respectively $D^*T^*E^* = \mathrm{Id}$ in the Schrödinger picture; cf. Fig. 4.4.
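The classical majority-vote claim can be made quantitative under the assumption that each of the three copies is flipped independently with probability ε: the vote fails iff at least two copies are flipped, which happens with probability $3\varepsilon^2(1-\varepsilon) + \varepsilon^3 = 3\varepsilon^2 - 2\varepsilon^3$. The Python/NumPy sketch below compares this formula with a Monte Carlo simulation; the function names are ours.

```python
import numpy as np


def majority_error_exact(eps):
    """P(at least two of three independent copies are flipped) = 3 eps^2 - 2 eps^3."""
    return 3 * eps**2 - 2 * eps**3


def majority_error_sampled(eps, trials, rng):
    flips = rng.random((trials, 3)) < eps          # True where a copy is corrupted
    return np.mean(flips.sum(axis=1) >= 2)         # the majority vote picks the wrong bit


rng = np.random.default_rng(0)
for eps in (0.1, 0.01):
    print(eps, majority_error_exact(eps), majority_error_sampled(eps, 10**6, rng))
```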



Fig. 4.4. Five-bit quantum code: encoding one qubit into five and correcting one error.

A powerful error correction scheme should not be restricted to one particular type of error, i.e. one particular noisy channel $T$. Assume instead that $\mathcal{E} \subset B(\mathcal{K})$ is a linear subspace of “error operators” and $T$ is any channel given by
$$T_*(\rho) = \sum_j F_j \rho F_j^*, \quad F_j \in \mathcal{E}. \qquad (4.26)$$

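An isometry $V : \mathcal{H} \to \mathcal{K}$ is called an error correcting code for $\mathcal{E}$ if for each $T$ of the form (4.26) there is a decoding channel $D : B(\mathcal{H}) \to B(\mathcal{K})$ with $D_*(T_*(V \rho V^*)) = \rho$ for all $\rho \in \mathcal{S}(\mathcal{H})$. By the theory of Knill and Laflamme [103] this is equivalent to the factorization condition
$$\langle V\phi, F_j^* F_k V\psi \rangle = \omega(F_j^* F_k)\, \langle \phi, \psi \rangle, \qquad (4.27)$$
where $\omega(F_j^* F_k)$ is a factor which does not depend on the arbitrary vectors $\phi, \psi \in \mathcal{H}$.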
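To make (4.27) concrete: for an isometry $V$ written as a matrix, (4.27) holds for all $\phi, \psi$ exactly when $V^* F_j^* F_k V$ is a multiple of the identity. The sketch below checks this numerically for a standard textbook example that is not taken from the text, the three-qubit repetition code $|0\rangle \mapsto |000\rangle$, $|1\rangle \mapsto |111\rangle$, with the error operators $\mathbb{1}, X_1, X_2, X_3$ (single bit flips). All names are hypothetical; since (4.27) is linear in the error operator, checking products of a spanning set suffices.

```python
import itertools
import numpy as np

I2 = np.eye(2)
PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])    # bit flip
PAULI_Z = np.array([[1.0, 0.0], [0.0, -1.0]])   # phase flip

def single_site(op, site, n):
    """op acting on qubit `site` of an n-qubit register, identity elsewhere."""
    out = np.array([[1.0]])
    for j in range(n):
        out = np.kron(out, op if j == site else I2)
    return out

def satisfies_kl(V, errors, tol=1e-10):
    """Check Eq. (4.27) in matrix form: V* F_j* F_k V = omega(F_j* F_k) * 1 for all j, k."""
    dim = V.shape[1]
    for Fj, Fk in itertools.product(errors, repeat=2):
        G = V.conj().T @ Fj.conj().T @ Fk @ V
        omega = np.trace(G) / dim
        if not np.allclose(G, omega * np.eye(dim), atol=tol):
            return False
    return True

n = 3
V = np.zeros((2 ** n, 2))                  # isometry |0> -> |000>, |1> -> |111>
V[0b000, 0] = 1.0
V[0b111, 1] = 1.0

bit_flips = [np.eye(2 ** n)] + [single_site(PAULI_X, s, n) for s in range(n)]
print(satisfies_kl(V, bit_flips))                                # True
print(satisfies_kl(V, bit_flips + [single_site(PAULI_Z, 0, n)]))  # False
```

The second call fails because a phase flip acts on the code space as a non-trivial logical operation, so no $\omega$ with the required property exists.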
The most relevant examples of error correcting codes are those which, in a certain sense, generalize the classical idea of sending multiple copies. This means we encode a small number $N$ of $d$-level systems into a large number $M \gg N$ of systems of the same type, which are then transmitted and afterwards decoded back into $N$ systems. During the transmission $K < M$ arbitrary errors are allowed. Hence, we have $\mathcal{H} = \mathcal{H}_1^{\otimes N}$, $\mathcal{K} = \mathcal{H}_1^{\otimes M}$ with $\mathcal{H}_1 = \mathbb{C}^d$, and $T$ is an arbitrary tensor product of $K$ noisy channels $S_j$, $j = 1, \dots, K$, and $M - K$ ideal channels $\mathrm{Id}$. The most well-known code for this type of error is the “five-bit code”, where one qubit is encoded into five and one error is corrected [16] (cf. Fig. 4.4 for $N = 1$, $M = 5$ and $K = 1$). To define the corresponding error space $\mathcal{E}$, consider the finite sets $X = \{1, \dots, N\}$ and $Y = \{1 + N, \dots, M + N\}$ and first define for each subset $Z \subset Y$
$$\mathcal{E}(Z) = \operatorname{span}\{A_1 \otimes \dots \otimes A_M \in B(\mathcal{K}) \mid A_j \in B(\mathcal{H}_1) \text{ arbitrary for } j + N \in Z,\ A_j = \mathbb{1} \text{ otherwise}\}. \qquad (4.28)$$
$\mathcal{E}$ is now the span of all $\mathcal{E}(Z)$ with $|Z| \le K$ (i.e. the number of elements of $Z$ is less than or equal to $K$). We say that an error correcting code for this particular $\mathcal{E}$ corrects $K$ errors.
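Definition (4.28) can be checked by brute force for small systems. The sketch below (helper names hypothetical) builds the spanning operators of all $\mathcal{E}(Z)$ with $|Z| \le K$, using the matrix units $|a\rangle\langle b|$ as a basis of $B(\mathcal{H}_1)$, and compares the numerical dimension with the count $\sum_{k \le K} \binom{M}{k} (d^2 - 1)^k$, which holds because each of the $k$ sites in $Z$ contributes the $d^2 - 1$ directions not proportional to $\mathbb{1}$. For simplicity the output sites are indexed $0, \dots, M-1$ instead of $N+1, \dots, N+M$.

```python
import itertools
from math import comb
import numpy as np

def matrix_units(d):
    """The d^2 matrix units |a><b|, a basis of B(C^d)."""
    units = []
    for a, b in itertools.product(range(d), repeat=2):
        e = np.zeros((d, d))
        e[a, b] = 1.0
        units.append(e)
    return units

def error_space_dim(M, K, d=2):
    """Dimension of the span of all E(Z) with |Z| <= K, cf. Eq. (4.28), by brute force."""
    vectors = []
    for k in range(K + 1):
        for Z in itertools.combinations(range(M), k):        # output sites 0..M-1
            for choice in itertools.product(matrix_units(d), repeat=k):
                factors = [np.eye(d)] * M                      # identity outside Z
                for site, A in zip(Z, choice):
                    factors[site] = A                          # arbitrary A_j on Z
                op = factors[0]
                for factor in factors[1:]:
                    op = np.kron(op, factor)
                vectors.append(op.ravel())
    return int(np.linalg.matrix_rank(np.array(vectors)))

def error_space_dim_formula(M, K, d=2):
    """Closed form: sum over k <= K of C(M, k) (d^2 - 1)^k."""
    return sum(comb(M, k) * (d * d - 1) ** k for k in range(K + 1))

print(error_space_dim(5, 1), error_space_dim_formula(5, 1))   # 16 16
```

For the five-bit setting ($d = 2$, $M = 5$, $K = 1$) the error space is 16-dimensional.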
Fig. 4.5. Two graphs belonging to (equivalent) five bit codes. The input node can be chosen in both cases arbitrarily.
Fig. 4.6. Symbols and definition for the three elementary gates AND, OR and NOT.

There are several ways to construct error correcting codes (see e.g. [70,38,4]). Most of these methods are somewhat involved, however, and require knowledge of classical error correction, which we want to skip. Therefore, we will only present the scheme proposed in [137], which is quite easy to describe and admits a simple way to check the error correction condition. Let us first sketch the general scheme. We start with an undirected graph $\Gamma$ with two kinds of vertices: a set of input vertices, labeled by $X$, and a set of output vertices, labeled by $Y$. The links of the graph are given by the adjacency matrix, i.e. an $(N + M) \times (N + M)$ matrix $\Gamma$ with $\Gamma_{jk} = 1$ if the nodes $k$ and $j$ are linked and $\Gamma_{jk} = 0$ otherwise. With respect to $\Gamma$ we can now define an isometry $V : \mathcal{H}_1^{\otimes N} \to \mathcal{H}_1^{\otimes M}$ by
$$\langle j_{N+1} \dots j_{N+M} | V | j_1 \dots j_N \rangle = \exp\left(\frac{i\pi}{d}\, \vec{j} \cdot \Gamma \vec{j}\right) \qquad (4.29)$$
with $\vec{j} = (j_1, \dots, j_{N+M}) \in \mathbb{Z}_d^{N+M}$ (where $\mathbb{Z}_d$ denotes the cyclic group with $d$ elements). There is an easy condition under which $V$ is an error correcting code. To write it down we need the following additional terminology: we say that an error correcting code $V : \mathcal{H}_1^{\otimes N} \to \mathcal{H}_1^{\otimes M}$ detects the error configuration $Z \subset Y$ if
$$\langle V\phi, F V\psi \rangle = \omega(F)\, \langle \phi, \psi \rangle \quad \forall F \in \mathcal{E}(Z) \qquad (4.30)$$
holds. With Eq. (4.27) it is easy to see that $V$ corrects $K$ errors iff it detects all error configurations of length $2K$ or less. Now we have the following theorem:
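Theorem 4.1. The quantum code $V$ defined in Eq. (4.29) detects the error configuration $Z \subset Y$ if the system of equations
$$\sum_{l \in X \cup Z} \Gamma_{kl}\, g_l = 0, \quad k \in Y \setminus Z, \quad g_l \in \mathbb{Z}_d \qquad (4.31)$$
implies that
$$g_l = 0, \ l \in X \quad \text{and} \quad \sum_{l \in Z} \Gamma_{kl}\, g_l = 0, \ k \in X \qquad (4.32)$$
holds.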
We omit the proof; see [137] instead. Two particular examples (which are equivalent!) are given in Fig. 4.5. In both cases we have $N = 1$, $M = 5$ and $K = 1$, i.e. one input node, which can be chosen arbitrarily, and five output nodes, and the corresponding codes correct one error. For a more detailed survey of quantum error correction, in particular for more examples, we refer to [20].
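Theorem 4.1 turns error detection into a finite linear problem over $\mathbb{Z}_d$, which can be checked by brute force for small graphs. The sketch below (function names are ours, for illustration only) enumerates all $g \in \mathbb{Z}_d^{X \cup Z}$, keeps the solutions of (4.31) and tests (4.32). Together with the remark that $V$ corrects $K$ errors iff it detects all configurations of length at most $2K$, this gives a sufficient test for correcting $K$ errors. The example graph, one input node linked to five output nodes which form a pentagon, is our own choice and is not claimed to be one of the graphs drawn in Fig. 4.5.

```python
import itertools
import numpy as np

def criterion_holds(gamma, X, Y, Z, d=2):
    """Brute-force test of the condition in Theorem 4.1 for the error configuration Z.

    gamma: (N+M) x (N+M) adjacency matrix, X: input nodes, Y: output nodes, Z a subset of Y.
    """
    unknowns = list(X) + sorted(Z)            # the g_l with l in X ∪ Z
    rows = [k for k in Y if k not in Z]       # equations (4.31) are indexed by k in Y \ Z
    for values in itertools.product(range(d), repeat=len(unknowns)):
        g = dict(zip(unknowns, values))
        # skip g unless it solves (4.31): sum_{l in X ∪ Z} Gamma_kl g_l = 0 mod d
        if any(sum(gamma[k][l] * g[l] for l in unknowns) % d for k in rows):
            continue
        # every solution must satisfy g_l = 0 for l in X ...
        if any(g[l] for l in X):
            return False
        # ... and (4.32): sum_{l in Z} Gamma_kl g_l = 0 mod d for k in X
        if any(sum(gamma[k][l] * g[l] for l in Z) % d for k in X):
            return False
    return True

def corrects(gamma, X, Y, K, d=2):
    """Sufficient test for correcting K errors: every Z with |Z| <= 2K passes Theorem 4.1."""
    return all(criterion_holds(gamma, X, Y, set(Z), d)
               for r in range(2 * K + 1)
               for Z in itertools.combinations(Y, r))

# Example graph (our choice): input node 0 linked to all output nodes 1..5,
# which themselves form a cycle 1-2-3-4-5-1.
gamma = np.zeros((6, 6), dtype=int)
for j in range(1, 6):
    gamma[0, j] = gamma[j, 0] = 1
    k = j % 5 + 1
    gamma[j, k] = gamma[k, j] = 1
X, Y = [0], [1, 2, 3, 4, 5]
print(corrects(gamma, X, Y, K=1))                # True
print(criterion_holds(gamma, X, Y, {1, 2, 3}))   # False
```

For this graph every configuration with $|Z| \le 2$ passes, so the associated code corrects one error. The failure for $Z = \{1, 2, 3\}$ only means that the sufficient criterion does not apply to that configuration.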


Fig. 4.7. Half-adder circuit as an example for a Boolean network (inputs x and y; outputs x + y mod 2 and the carry bit c).

4.5. Quantum computing
Quantum computing is without a doubt the most prominent and most far-reaching application of quantum information theory, since it promises, on the one hand, an “exponential speedup” for some problems which are “hard to solve” with a classical computer and, on the other hand, gives completely new insights into classical computing and complexity theory. Unfortunately, an exhaustive discussion would require its own review article. Hence, we are only able to give a short overview (see Part II of [122] for a more complete presentation and for further references).
4.5.1. The network model of classical computing
Let us start with a brief (and very informal) introduction to classical computing (for a more complete review and hints for further reading see [122, Chapter 3]). What we need first is a mathematical model for computation. There are, in fact, several different choices, and the Turing machine [152] is the most prominent one. More appropriate for our purposes, however, is the so-called network model, since it allows an easier generalization to the quantum case. The basic idea is to interpret a classical (deterministic) computation as the evaluation of a map $f : B^N \to B^M$ (where $B = \{0, 1\}$ denotes the field with two elements) which maps $N$ input bits to $M$ output bits. If $M = 1$ holds, $f$ is called a Boolean function, and for many purposes it is sufficient to consider this special case: each general $f$ is in fact a Cartesian product of Boolean functions. Particular examples are the three elementary gates AND, OR and NOT defined in Fig. 4.6 and arbitrary algebraic expressions constructed from them, e.g. the XOR gate $(x, y) \mapsto x + y \bmod 2$, which can be written as $(x \vee y) \wedge \neg(x \wedge y)$. It is a standard result of Boolean algebra that each Boolean function can be represented in this way, and there are in general many possibilities to do this. A special case is the disjunctive normal form of $f$; cf. [161]. Writing such an expression down in the form of equations is, however, somewhat confusing. Therefore $f$ is most conveniently expressed in graphical form as a circuit or network, i.e. a graph $C$ with nodes representing elementary gates and edges (“wires”) which determine how the gates should be composed; cf. Fig. 4.7 for an example. A classical computation can now be defined as a circuit applied to a specified string of input bits.
Variants of this model arise if we replace AND, OR and NOT by another (finite) set $G$ of elementary gates. We only have to guarantee that each function $f$ can be expressed as a composition of elements from $G$. A typical example for $G$ is the set which contains only the NAND gate $(x, y) \mapsto x \uparrow y = \neg(x \wedge y)$. Since AND, OR and NOT can be rewritten in terms of NAND (e.g. $\neg x = x \uparrow x$), we can calculate each Boolean function by a circuit of NAND gates.
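Both the NAND construction and the half-adder of Fig. 4.7 can be checked exhaustively on truth tables. In the sketch below (names hypothetical) AND, OR and NOT are built from NAND alone, XOR is built as $(x \vee y) \wedge \neg(x \wedge y)$, and the half-adder returns the sum bit $x + y \bmod 2$ together with a carry bit, which we assume to be $x \wedge y$ as in a standard half-adder.

```python
from itertools import product

def nand(x, y):
    """x ↑ y = ¬(x ∧ y) on bits 0/1."""
    return 1 - (x & y)

def not_(x):
    return nand(x, x)                          # ¬x = x ↑ x

def and_(x, y):
    return not_(nand(x, y))                    # x ∧ y = ¬(x ↑ y)

def or_(x, y):
    return nand(not_(x), not_(y))              # x ∨ y = ¬x ↑ ¬y (De Morgan)

def xor(x, y):
    return and_(or_(x, y), not_(and_(x, y)))   # (x ∨ y) ∧ ¬(x ∧ y)

def half_adder(x, y):
    """Sum bit x + y mod 2 and (assumed) carry bit x ∧ y, built from NAND gates only."""
    return xor(x, y), and_(x, y)

for x, y in product((0, 1), repeat=2):
    print(x, y, half_adder(x, y))
```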



4.5.2. Computational complexity
One of the most relevant questions within classical computing, and the central subject of computational complexity, is whether a given problem is easy to solve or not, where “easy” is defined in terms of how the resources needed scale with the size of the input data. In the following we will give a rough survey of the most basic aspects of this field, while we refer the reader to [124] for a detailed presentation.
To start with, let us specify the basic question in greater detail. First of all, the problems we want to analyze are decision problems, which only give the two possible values “yes” and “no”. They are mathematically described by Boolean functions acting on bit strings of arbitrary size. A well-known example is the factoring problem given by the function $\mathrm{fac}$ with $\mathrm{fac}(m, l) = 1$ if $m$ (more precisely the natural number represented by $m$) has a non-trivial divisor less than $l$, and $\mathrm{fac}(m, l) = 0$ otherwise. Note that many tasks of classical computation can be reformulated this way, so that we do not suffer a severe loss of generality. The second crucial point we have to clarify is what exactly the resources mentioned above are and how we have to quantify them. A natural physical quantity which comes to mind immediately is the time needed to perform the computation (space is another candidate, which we do not discuss here, however). Hence, the question we have to discuss is how the computation time $t$ depends on the size $L$ of the input data $x$ (i.e. the length $L$ of the smallest register needed to represent $x$ as a bit string).
However, a precise definition of “computation time” is still model dependent. For a Turing machine we can simply take the number of head movements needed to solve the problem, and in the network model we choose the number of steps needed to execute the whole circuit, if gates which operate on different bits are allowed to work simultaneously. 16 Even with a fixed type of model the functional behavior of $t$ depends on the set of elementary operations we choose, e.g. the set of elementary gates in the network model. It is therefore useful to divide computational problems into complexity classes whose definitions do not suffer from model-dependent aspects. The most fundamental one is the class P, which contains all problems which can be computed in “polynomial time”, i.e. $t$ is, as a function of $L$, bounded from above by a polynomial. The model independence of this class is basically the content of the strong Church–Turing hypothesis, which states, roughly speaking, that each model of computation can be simulated in polynomial time on a probabilistic Turing machine.
Problems of class P are considered “easy”, everything else is “hard”. However, even if a (decision) problem is hard, the situation is not hopeless. E.g. consider the factoring problem $\mathrm{fac}$ described above. It is generally believed (although not proved) that this problem is not in class P. But if somebody gives us a divisor $p < l$ of $m$, it is easy to check whether $p$ is really a factor, and if the answer is yes we have computed $\mathrm{fac}(m, l)$. This example motivates the following definition: a decision problem $f$ is in class NP (“non-deterministic polynomial time”) if there is a Boolean function $f'$ in class P such that $f(x) = 1$ holds if and only if $f'(x, y) = 1$ for some $y$ (of length polynomial in $L$). In our example $\mathrm{fac}'$ is obviously defined by $\mathrm{fac}'(m, l, p) = 1 \Leftrightarrow p < l$ and $p$ is a non-trivial divisor of $m$. It is obvious that P is a subset of NP; whether the other inclusion holds, however, is a highly non-trivial question. The conjecture is that $\mathrm{P} \neq \mathrm{NP}$ holds, and large parts of

16

Note that we have glossed over a lot of technical problems at this point. The crucial difficulty is that each circuit $C_N$ allows only the computation of a Boolean function $f_N : B^N \to B$ which acts on input data of length $N$. Since we are interested in answers for inputs of arbitrary finite length, a sequence $C_N$, $N \in \mathbb{N}$, of circuits with appropriate uniformity properties is needed; cf. [124] for details.



complexity theory are based on it. Its proof (or disproof), however, represents one of the biggest open questions of theoretical computer science.
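The point of the NP definition is that checking a proposed witness is cheap while finding one may not be. The sketch below contrasts the verifier $\mathrm{fac}'$ with a naive search over every candidate $p < l$, whose number of steps grows exponentially in the bit length $L$. We take “divisor” to mean a non-trivial divisor $1 < p < m$, since otherwise $p = 1$ would always be a witness; the function names are hypothetical.

```python
def fac_verify(m: int, l: int, p: int) -> bool:
    """fac'(m, l, p): is p a non-trivial divisor of m with p < l?
    One comparison and one division, i.e. polynomial in the bit length of the input."""
    return 1 < p < min(l, m) and m % p == 0

def fac(m: int, l: int) -> int:
    """fac(m, l) by exhaustive search over all witnesses p; the number of candidates
    grows exponentially with the bit length of l."""
    return int(any(fac_verify(m, l, p) for p in range(2, l)))

print(fac(91, 10), fac(91, 7), fac(97, 50))   # 1 0 0  (91 = 7 * 13, 97 is prime)
```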
To introduce a third complexity class we have to generalize our point of view slightly. Instead of a function $f : B^N \to B^M$ we can look at a noisy classical channel $T$ which sends the input value $x \in B^N$ to a probability distribution $T_{xy}$, $y \in B^M$, on $B^M$ (i.e. $T_{xy}$ is the transition matrix of the classical channel $T$; cf. Section 3.2.3). Roughly speaking, we can interpret such a channel as a probabilistic computation which can be realized as a circuit consisting of “probabilistic gates”. This means there are several different ways to proceed at each step, and we use a classical random number generator to decide which of them we have to choose. If we run our device several times on the same input data $x$, we get different results $y$ with probability $T_{xy}$. The crucial point is now that we can allow some of the outcomes to be wrong as long as there is an easy way (i.e. a class P algorithm) to check the validity of the results. Hence, we define BPP (“bounded error probabilistic polynomial time”) as the class of all decision problems which admit a polynomial time probabilistic algorithm with error probability less than $1/2 - \delta$ (for fixed $\delta > 0$). It is obvious that $\mathrm{P} \subset \mathrm{BPP}$ holds, but the relation between BPP and NP is not known.
4.5.3. Reversible computing
In the last subsection we have discussed the time needed to perform a certain computation. Other physical quantities which seem to be important are space and energy. Space can be treated in a similar way as time, and there are in fact space-related complexity classes (e.g. PSPACE, which stands for “polynomial space”). Energy, however, is different, because it surprisingly turns out that it is possible to do any calculation without expending any energy! One source of energy consumption in a usual computer is the intrinsic irreversibility of the basic operations. E.g. a basic gate like AND maps two input bits to one output bit, which obviously implies that the input cannot be reconstructed from the output. In other words: one bit of information is erased during the operation of the AND gate; hence a small amount of energy is dissipated to the environment. A thermodynamic analysis, known as Landauer’s principle, shows that this energy loss is at least $k_B T \ln 2$, where $T$ is the temperature of the environment [106].
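To get a feeling for the size of Landauer’s bound $k_B T \ln 2$, the snippet below evaluates it at an assumed room temperature of 300 K, using the exact SI value of Boltzmann’s constant. The function name is hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact since the 2019 SI revision)

def landauer_bound(temperature: float, erased_bits: float = 1.0) -> float:
    """Minimal dissipated energy in joules: erased_bits * k_B * T * ln 2."""
    return erased_bits * K_B * temperature * math.log(2)

print(f"{landauer_bound(300.0):.3e} J")   # 2.871e-21 J per erased bit at 300 K
```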
If we want to avoid this kind of energy dissipation we are restricted to reversible processes, i.e. it should be possible to reconstruct the input data from the output data. This is called reversible computation, and it is performed in terms of reversible gates, which in turn can be described by invertible functions $f : B^N \to B^N$. This does not restrict the class of problems which can be solved, however: we can repackage a non-invertible function $f : B^N \to B^M$ into an invertible one $f' : B^{N+M} \to B^{N+M}$ simply by $f'(x, 0) = (x, f(x))$ and an appropriate extension to the rest of $B^{N+M}$. It can even be shown that a reversible computer performs as well as a usual one, i.e. an “irreversible” network can be simulated in polynomial time by a reversible one. This will be of particular importance for quantum computing, because a reversible computer is, as we will see soon, a special case of a quantum computer.
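One concrete choice for the “appropriate extension” of $f'(x, 0) = (x, f(x))$ is $f'(x, y) = (x, y + f(x))$ with componentwise addition mod 2. It is its own inverse and hence invertible. The text does not fix the extension, so this choice is an assumption of the sketch (helper names are hypothetical). Applied to the AND gate it yields the Toffoli gate.

```python
from itertools import product

def reversible_embedding(f):
    """Given f: B^N -> B^M (bit tuples), return f'(x, y) = (x, y + f(x) mod 2).
    In particular f'(x, 0) = (x, f(x)); applying f' twice gives back (x, y)."""
    def f_rev(x, y):
        return x, tuple(yi ^ fi for yi, fi in zip(y, f(x)))
    return f_rev

def is_bijection(g, N, M):
    """Check that g permutes B^(N+M) by counting distinct images."""
    images = {g(x, y) for x in product((0, 1), repeat=N) for y in product((0, 1), repeat=M)}
    return len(images) == 2 ** (N + M)

and_gate = lambda x: (x[0] & x[1],)          # irreversible: B^2 -> B^1
toffoli = reversible_embedding(and_gate)     # (x1, x2, y) -> (x1, x2, y + x1 x2)
print(is_bijection(toffoli, 2, 1))           # True
print(toffoli((1, 1), (0,)))                 # ((1, 1), (1,))
```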
4.5.4. The network model of a quantum computer
Now we are ready to introduce a mathematical model for quantum computation. To this end we
will generalize the network model discussed in Section 4.5.1 to the network model of quantum
computation.



Fig. 4.8. Universal sets of quantum gates.

Fig. 4.9. Quantum circuit for the discrete Fourier transform on a 4-qubit register.

A classical computer operates by a network of gates on a finite number of classical bits. A quantum computer operates on a finite number of qubits in terms of a network of quantum gates; this is the rough idea. To be more precise, consider the Hilbert space $\mathcal{H}^{\otimes N}$ with $\mathcal{H} = \mathbb{C}^2$, which describes a quantum register consisting of $N$ qubits. In $\mathcal{H}$ there is a preferred set $|0\rangle, |1\rangle$ of orthogonal states, describing the two values a classical bit can have. Hence, we can describe each possible value $x$ of a classical register of length $N$ in terms of the computational basis $|x\rangle = |x_1\rangle \otimes \dots \otimes |x_N\rangle$, $x \in B^N$. A quantum gate is now nothing else but a unitary operator acting on a small number of qubits (preferably 1 or 2), and a quantum network is a graph representing the composition of elementary gates taken from a small set $G$ of unitaries. A quantum computation can now be defined as the application of such a network to an input state $\psi$ of the quantum register (cf. Fig. 4.9 for an example). Similar to the classical case, the set $G$ should be universal, i.e. each unitary operator on a quantum register of arbitrary length can be represented as a composition of elements from $G$. Since the group of unitaries on a Hilbert space is continuous, it is not possible to do this exactly with a finite set $G$. However, we can at least find suitably small sets which have a chance to be realizable technically (e.g. in an ion trap) in the future. Particular examples are, on the one hand, the controlled-$U$ operations and, on the other hand, the set consisting of CNOT and all one-qubit gates (cf. Fig. 4.8; for a proof of universality see Section 4.5 of [122]).
Basically, we could have considered arbitrary quantum operations instead of only unitaries as gates. However, in Section 3.2.1 we have seen that we can implement each operation unitarily if we add an ancilla to the system. Hence, this kind of generalization is already covered by the model. (This applies as long as non-unitarily implemented operations are a desired feature. Decoherence effects due to unavoidable interaction with the environment are a completely different story; we come back to this point at the end of the subsection.) The same holds for measurements at intermediate steps and subsequent conditioned operations. In this case we get basically the same result with a different network where all measurements are postponed to the end. (Often, however, it is very useful to allow measurements at intermediate steps, as we will see in the next subsection.)
Having a mathematical model of quantum computers in mind, we are now ready to discuss how it would work in principle.
1. The first step is in most cases preprocessing of the input data on a classical computer. E.g. the Shor algorithm for the factoring problem does not work if the input number $m$ is a pure prime power. However, in this case there is an efficient classical algorithm. Hence, we have to check first whether $m$ is of this particular form and use this classical algorithm where appropriate.
2. In the next step we have to prepare the quantum register based on these preprocessed data. In the simplest case this means writing classical data, i.e. preparing the state $|x\rangle \in \mathcal{H}^{\otimes N}$ if the (classical) input is $x \in B^N$. In many cases, however, it might be more intelligent to use a superposition of several $|x\rangle$, e.g. the state
$$\psi = \frac{1}{\sqrt{2^N}} \sum_{x \in B^N} |x\rangle, \qquad (4.33)$$
which actually represents the superposition of all numbers the register can represent; this is indeed the crucial point of quantum computing, and we come back to it below.
3. Now we can apply the quantum circuit $C$ to the input state $\psi$, and after the calculation we get the output state $U\psi$, where $U$ is the unitary represented by $C$.
4. To read out the data after the calculation we perform a von Neumann measurement in the computational basis, i.e. we measure the observable given by the one-dimensional projectors $|x\rangle\langle x|$, $x \in B^N$. Hence, we get $x \in B^N$ with probability $p_x = |\langle x, U\psi \rangle|^2$.
5. Finally, we have to postprocess the measured value $x$ on a classical computer to end up with the final result $x'$. If, however, the output state $U\psi$ is a proper superposition of basis vectors $|x\rangle$ (and not just one $|x\rangle$), the probability $p_x$ to get this particular $x$ is less than 1. In other words, we have performed a probabilistic calculation as described in the last paragraph of Section 4.5.2. Hence, we have to check the validity of the results (with a class P algorithm on a classical computer), and if they are wrong we have to go back to step 2.
So, why is quantum computing potentially useful? First of all, a quantum computer can perform at least as well as a classical computer. This follows immediately from our discussion of reversible computing in Section 4.5.3 and the fact that any invertible function $f : B^N \to B^N$ defines a unitary by $U_f : |x\rangle \mapsto |f(x)\rangle$ (the quantum CNOT gate in Fig. 4.8 arises exactly in this way from the classical CNOT). On the other hand, there is strong evidence which indicates that a quantum computer can solve problems in polynomial time which a classical computer cannot. The most striking example of this is the Shor algorithm, which provides a way to solve the factoring problem (which is most probably not in class P) in polynomial time. If we introduce the new complexity class BQP of decision problems which can be solved with high probability and in polynomial time by a quantum computer, we can express this conjecture as $\mathrm{BPP} \neq \mathrm{BQP}$.
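The mechanism which gives a quantum computer its potential power is the ability to operate not just on one value $x \in B^N$ but on whole superpositions of values, as already mentioned in step 2 above. E.g. consider a, not necessarily invertible, map $f : B^N \to B^M$ and the unitary operator $U_f$ on $\mathcal{H}^{\otimes N} \otimes \mathcal{H}^{\otimes M}$ given by
$$|x\rangle \otimes |0\rangle \mapsto U_f(|x\rangle \otimes |0\rangle) = |x\rangle \otimes |f(x)\rangle \in \mathcal{H}^{\otimes N} \otimes \mathcal{H}^{\otimes M}. \qquad (4.34)$$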



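If we let $U_f$ act on a register in the state $\psi \otimes |0\rangle$ with $\psi$ from Eq. (4.33), we get the result
$$U_f(\psi \otimes |0\rangle) = \frac{1}{\sqrt{2^N}} \sum_{x \in B^N} |x\rangle \otimes |f(x)\rangle. \qquad (4.35)$$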

Hence, a quantum computer can evaluate the function $f$ on all possible arguments $x \in B^N$ at the same time! To benefit from this feature, usually called quantum parallelism, is, however, not as easy as it looks. If we perform a measurement on $U_f(\psi \otimes |0\rangle)$ in the computational basis, we get the value of $f$ for exactly one argument, and the rest of the information originally contained in $U_f(\psi \otimes |0\rangle)$ is destroyed. In other words, it is not possible to read out all pairs $(x, f(x))$ from $U_f(\psi \otimes |0\rangle)$ and to fill a (classical) lookup table with them. To take advantage of quantum parallelism we have to use a clever algorithm within the quantum computation step (step 3 above). In the next subsection we will consider a particular example of this.
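The following sketch simulates Eqs. (4.33) to (4.35) with a state vector for three qubits per register and then measures both registers in the computational basis: exactly one pair $(x, f(x))$ comes out. The example function $f(x) = x^2 \bmod 8$ and all helper names are hypothetical; only the action (4.34) of $U_f$ on $|x\rangle \otimes |0\rangle$ is needed.

```python
import numpy as np

def uniform_superposition(N):
    """psi from Eq. (4.33), obtained as a Hadamard gate on each qubit of |0...0>."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    HN = np.array([[1.0]])
    for _ in range(N):
        HN = np.kron(HN, H)
    zero = np.zeros(2 ** N)
    zero[0] = 1.0
    return HN @ zero

def apply_oracle(f, N, M, psi):
    """Amplitudes of U_f(psi ⊗ |0>) as a 2^N x 2^M array, using only Eq. (4.34)."""
    out = np.zeros((2 ** N, 2 ** M))
    for x in range(2 ** N):
        out[x, f(x)] = psi[x]        # |x> ⊗ |0> is mapped to |x> ⊗ |f(x)>
    return out

N, M = 3, 3
f = lambda x: (x * x) % 2 ** M       # hypothetical example f: B^3 -> B^3
state = apply_oracle(f, N, M, uniform_superposition(N))   # Eq. (4.35)

# measure both registers in the computational basis: probability |amplitude|^2
probs = (state ** 2).ravel()
rng = np.random.default_rng(seed=1)
outcome = int(rng.choice(probs.size, p=probs / probs.sum()))
x, fx = divmod(outcome, 2 ** M)
print(x, fx)   # a single pair (x, f(x)); the other 2^N - 1 pairs are lost
```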
Before we come to this point, let us give some additional comments which link this section to other parts of quantum information. The first point concerns entanglement. The state $U_f(\psi \otimes |0\rangle)$ is in general highly entangled (although $\psi$ is separable, since $\psi = [2^{-1/2}(|0\rangle + |1\rangle)]^{\otimes N}$), and this fact is essential for the “exponential speedup” of computations we could gain with a quantum computer. In other words, to outperform a classical computer, entanglement is the most crucial resource; this will become more transparent in the next subsection. The second remark concerns error correction. Up to now we have implicitly assumed that all components of a quantum computer work perfectly without any error. In reality, however, decoherence effects make it impossible to realize perfectly unitary operations, and we have to deal with noisy channels. Fortunately, it is possible within quantum information to correct at least a certain amount of errors, as we have seen in Section 4.4. Hence, unlike an analog computer 17 a quantum computer can be designed to be fault tolerant, i.e. it can work with imperfectly manufactured components.
4.5.5. Simon's problem
We will now consider a particular problem (known as Simon's problem; cf. [143]) which shows explicitly how a quantum computer can speed up a problem which is hard to solve with a classical computer. It does not fit exactly into the general scheme sketched in the last subsection, however, because a quantum “oracle” is involved, i.e. a black box which performs an (a priori unknown) unitary transformation on an input state given to it. The term “oracle” indicates here that we are not interested in the time the black box needs to perform the calculation but only in the number of times we have to access it. Hence, this example does not prove the conjecture $\mathrm{BPP} \neq \mathrm{BQP}$ stated above. Other quantum algorithms which we do not have the room to discuss here include the Deutsch [52] and Deutsch–Jozsa problems [53], the Grover search algorithm [74,75] and, of course, Shor’s factoring algorithm [139,140].
Hence, let us assume that our black box calculates the unitary $U_f$ from Eq. (4.34) with a map $f : B^N \to B^N$ which is two-to-one and has period $a$, i.e. for $x \neq y$ we have $f(x) = f(y)$ iff $y = x + a$ (addition mod 2). The task is to find $a$. Classically, this problem is hard, i.e. we have to query the oracle exponentially often. To see this, note first that we have to find a pair $(x, y)$ with $f(x) = f(y)$, and the probability to get it with two random queries is about $2^{-N}$ (since for each $x$ there is exactly one $y \neq x$ with $f(x) = f(y)$).
17

If an analog computer works reliably only with a certain accuracy, we can rewrite the algorithm into a digital one.



If we use the box $2^{N/4}$ times, we get fewer than $2^{N/2}$ different pairs. Hence, the probability to get the correct solution is at most of order $2^{-N/2}$, i.e. arbitrarily small even with exponentially many queries.
Assume now that we let our box act on a quantum register $\mathcal{H}^{\otimes N} \otimes \mathcal{H}^{\otimes N}$ in the state $\psi \otimes |0\rangle$ with $\psi$ from Eq. (4.33) to get $U_f(\psi \otimes |0\rangle)$ from (4.35). Now we measure the second register. The outcome is one of $2^{N-1}$ possible values (say $f(x)$), each of which occurs with equal probability. Hence, after the measurement the first register is in the state $2^{-1/2}(|x\rangle + |x + a\rangle)$. Now we let a Hadamard gate $H$ (cf. Fig. 4.9) act on each qubit of the first register, and the result is (this follows with a short calculation)
$$\frac{1}{\sqrt{2}} H^{\otimes N}(|x\rangle + |x + a\rangle) = \frac{1}{\sqrt{2^{N-1}}} \sum_{a \cdot y = 0} (-1)^{x \cdot y} |y\rangle, \qquad (4.36)$$
where the dot denotes the ($B$-valued) scalar product in the vector space $B^N$. Now we perform a measurement on the first register (in the computational basis) and we get a $y \in B^N$ with the property $y \cdot a = 0$. If we repeat this procedure and obtain $N - 1$ linearly independent values $y_j$, we can determine $a$ as the unique non-zero solution of the system of equations $y_1 \cdot a = 0, \dots, y_{N-1} \cdot a = 0$. For each $y$ with $y \cdot a = 0$, the probability for $y$ to appear as an outcome of the second measurement is $2^{1-N}$. Therefore, the success probability can be made arbitrarily large while the number of times we have to access the box is linear in $N$.
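For a handful of qubits, the quantum part of Simon’s algorithm can be simulated by tracking the state vector through the steps above: prepare (4.35), measure the second register, apply $H^{\otimes N}$ as in (4.36) and measure again. In the sketch below bit strings are encoded as integers, so $x + a$ becomes the bitwise XOR `x ^ a`. The period $a = 1011$ and the function $f(x) = \min(x, x + a)$ are hypothetical examples. For brevity the final linear system is solved by trying all candidates rather than by Gaussian elimination over $B$, which would keep that step polynomial.

```python
import numpy as np

def hadamard_n(N):
    """Matrix of a Hadamard gate on each of N qubits."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    HN = np.array([[1.0]])
    for _ in range(N):
        HN = np.kron(HN, H)
    return HN

def dot2(u, v):
    """B-valued scalar product u·v of bit strings encoded as integers."""
    return bin(u & v).count("1") % 2

def simon_round(f, N, rng):
    """One use of the box: returns the outcome y of the final measurement."""
    dim = 2 ** N
    amps = np.zeros((dim, dim))           # amps[x, z]: amplitude of |x> ⊗ |z>
    for x in range(dim):
        amps[x, f(x)] = 1 / np.sqrt(dim)  # U_f(psi ⊗ |0>), Eq. (4.35)
    p_second = (amps ** 2).sum(axis=0)    # measure the second register
    z = rng.choice(dim, p=p_second / p_second.sum())
    first = amps[:, z] / np.linalg.norm(amps[:, z])   # (|x> + |x + a>)/sqrt(2)
    first = hadamard_n(N) @ first         # Eq. (4.36)
    p_first = first ** 2                  # measure the first register
    return int(rng.choice(dim, p=p_first / p_first.sum()))

def simon(f, N, rng, max_rounds=1000):
    """Repeat until exactly one non-zero a with y·a = 0 for all outcomes y remains."""
    ys = []
    for _ in range(max_rounds):
        ys.append(simon_round(f, N, rng))
        # brute force over candidates for brevity (Gaussian elimination over B
        # would solve the linear system in polynomial time)
        candidates = [a for a in range(1, 2 ** N) if all(dot2(y, a) == 0 for y in ys)]
        if len(candidates) == 1:
            return candidates[0], ys
    raise RuntimeError("period not determined")

N, secret = 4, 0b1011                     # hypothetical example period a
f = lambda x: min(x, x ^ secret)          # two-to-one with f(x) = f(x + a)
a, ys = simon(f, N, np.random.default_rng(seed=7))
print(format(a, "04b"), len(ys))
```

Every outcome satisfies $y \cdot a = 0$, so the true period always survives the candidate filter; the loop stops as soon as the outcomes span the $(N-1)$-dimensional space orthogonal to $a$.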
4.6. Quantum cryptography
Finally, we want to have a short look at quantum cryptography, another, more practical, application of quantum information, which has the potential to emerge into technology in the not so distant future (see e.g. [95,93,34] for some experimental realizations and [69] for a more detailed overview).
Hence, let us assume that Alice has a message $x \in B^N$ which she wants to send secretly to Bob over a public communication channel. One way to do this is the so-called “one-time pad”: Alice randomly generates a second bit string $y \in B^N$ of the same length as $x$ and sends $x + y$ instead of $x$. Without knowledge of the key $y$ it is completely impossible to recover the message $x$ from $x + y$. Hence, this is a perfectly secure method to transmit secret data. Unfortunately, it is completely useless without a secure way to transmit the key $y$ to Bob, because Bob needs $y$ to decrypt the message $x + y$ (simply by adding $y$ again). What makes the situation even worse is the fact that the key $y$ can be used only once (hence the name one-time pad). If two messages $x_1$, $x_2$ are encrypted with the same key, we can use $x_1$ as a key to decrypt $x_2$ and vice versa: $(x_1 + y) + (x_2 + y) = x_1 + x_2$; hence both messages are partly compromised.
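A minimal sketch of the one-time pad on byte strings, where XOR of bytes is exactly addition mod 2 of the underlying bit strings. It also shows why the key must not be reused: adding two ciphertexts produced with the same key yields $x_1 + x_2$. The messages are made up for illustration; the key comes from Python’s standard `secrets` module, and the helper name is hypothetical.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Componentwise addition mod 2 of two equally long bit strings (stored as bytes)."""
    if len(a) != len(b):
        raise ValueError("the one-time pad needs a key as long as the message")
    return bytes(u ^ v for u, v in zip(a, b))

x1 = b"ATTACK AT DAWN"                    # made-up example messages
x2 = b"RETREAT AT TEN"
y = secrets.token_bytes(len(x1))          # random key, meant to be used only once

c1 = xor_bytes(x1, y)                     # Alice sends x1 + y
print(xor_bytes(c1, y) == x1)             # Bob adds y again: True

c2 = xor_bytes(x2, y)                     # key reused for a second message
print(xor_bytes(c1, c2) == xor_bytes(x1, x2))   # key cancels: True
```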
Due to these problems, completely different approaches, namely “public key systems” like DSA and RSA, are used today for cryptography. The idea is to use two keys instead of one: a private key, which is used for decryption and only known to its owner, and a public key used for encryption, which is publicly available (we do not discuss the algorithms needed for key generation, encryption and decryption here; see [145] and the references therein instead). To use this method, Bob generates a key pair $(z, y)$, keeps his private key $y$ in a secure place and sends the public one, $z$, to Alice over a public channel. Alice encrypts her message with $z$ and sends the result to Bob, who can decrypt it with $y$. The security of this scheme (in the case of RSA) relies on the assumption that the factoring problem is computationally hard, i.e. not in class P, because calculating $y$ from $z$ requires the factorization of large integers. Since the latter is tractable on quantum computers via Shor’s algorithm, the security of public key systems breaks down if quantum computers become available in the future. Another problem of a more fundamental nature is the unproven status of the conjecture that factorization is not solvable in polynomial time. Consequently, the security of public key systems is not proven either.
The crucial point is now that quantum information provides a way to distribute a cryptographic key $y$ in a secure way, such that $y$ can be used as a one-time pad afterwards. The basic idea is to use the no cloning theorem to detect possible eavesdropping attempts. To make this more transparent, let us consider a particular example here, namely probably the most prominent protocol, proposed by Bennett and Brassard in 1984 [10].
1. Assume that Alice wants to transmit bits from the (randomly generated) key $y \in B^N$ through an ideal quantum channel to Bob. Before they start, they settle upon two orthonormal bases $e_0, e_1 \in \mathcal{H}$, respectively $f_0, f_1 \in \mathcal{H}$, which are mutually non-orthogonal, i.e. $|\langle e_j, f_k \rangle| > \delta > 0$ with $\delta$ big enough for each $j, k = 0, 1$. If photons are used as information carriers, a typical choice are linearly polarized photons with polarization directions rotated by $45^\circ$ against each other.
2. To send one bit $j \in B$, Alice now selects at random one of the two bases, say $e_0, e_1$, and then sends a qubit in the state $|e_j\rangle\langle e_j|$ through the channel. Note that neither Bob nor a potential eavesdropper knows which basis she has chosen.
3. When Bob receives the qubit he selects, as Alice did before, a basis at random and performs the corresponding von Neumann measurement to get one classical bit $k \in B$, which he records together with the measurement method.
4. Both repeat this procedure until the whole string $y \in B^N$ is transmitted, and then Bob tells Alice (through a classical, public communication channel), bit by bit, which basis he has used for the measurement (but not the result of the measurement). If he has used the same basis as Alice, both keep the corresponding bit; otherwise they discard it. They end up with a bit string $y' \in B^M$ of a reduced length $M$. If this is not sufficient, they have to continue sending random bits until the key is long enough. For large $N$ the rate of successfully transmitted bits per bit sent is obviously $1/2$. Hence, Alice has to send approximately twice as many bits as they need.
To see why this procedure is secure, assume now that the eavesdropper Eve can listen to and modify the information sent through the quantum channel, and that she can listen to the classical channel but cannot modify it (we come back to this restriction in a minute). Hence, Eve can intercept the qubits sent by Alice and make two copies of each. One she forwards to Bob and the other she keeps for later analysis. Due to the no cloning theorem, however, she has produced errors in both copies, and the quality of her own copy decreases if she tries to make the error in Bob’s as small as possible. Even if Eve knows about the two bases $e_0, e_1$ and $f_0, f_1$, she does not know which one Alice uses to send a particular qubit 18. Hence, Eve has to decide randomly which basis to choose (as Bob does). If $e_0, e_1$ and $f_0, f_1$ are chosen optimally, i.e. $|\langle e_j, f_k \rangle|^2 = 0.5$, it is easy to see that the error rate (in the sifted key) which Eve necessarily produces if she randomly measures in one of the bases is $1/4$ for large $N$. To detect this error, Alice and Bob simply have to sacrifice portions of the generated key and compare randomly selected bits using their classical channel. If the error rate they detect is too big, they can decide to drop the whole key and restart from the beginning.
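The error rate of $1/4$ can be reproduced with a purely classical simulation of an intercept-resend attack. The sketch assumes ideal channels and optimally chosen bases, so that a measurement in the matching basis returns the encoded bit and a measurement in the other basis returns a uniformly random bit; Eve measures every qubit in a randomly chosen basis and resends her result in that basis. The function name and parameters are hypothetical.

```python
import random

def bb84(n_bits, eavesdrop, rng):
    """Classical simulation of BB84 with optimally chosen bases, |<e_j, f_k>|^2 = 1/2.
    Returns (fraction of bits kept after sifting, error rate in the sifted key)."""
    kept = errors = 0
    for _ in range(n_bits):
        bit, alice_basis = rng.randint(0, 1), rng.randint(0, 1)
        sent_bit, sent_basis = bit, alice_basis
        if eavesdrop:
            # intercept-resend: Eve measures in a random basis and resends her result
            eve_basis = rng.randint(0, 1)
            if eve_basis != sent_basis:
                sent_bit = rng.randint(0, 1)        # wrong basis: random outcome
            sent_basis = eve_basis
        bob_basis = rng.randint(0, 1)
        bob_bit = sent_bit if bob_basis == sent_basis else rng.randint(0, 1)
        if bob_basis == alice_basis:                 # bases compared publicly (sifting)
            kept += 1
            errors += int(bob_bit != bit)
    return kept / n_bits, errors / kept

rng = random.Random(2002)
print(bb84(20000, eavesdrop=False, rng=rng))   # about half the bits kept, no errors
print(bb84(20000, eavesdrop=True, rng=rng))    # error rate close to 1/4
```

In the kept rounds Eve picks Alice’s basis half of the time (no error) and the wrong one otherwise, in which case Bob’s result is random (error probability 1/2), which gives the overall rate $1/2 \cdot 1/2 = 1/4$.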
18

If Alice and Bob use only one basis to send the data and Eve knows about it, she can, of course, produce ideal copies of the qubits. This is actually the reason why two non-orthogonal bases are necessary.



So let us finally discuss a situation where Eve is able to intercept both the quantum and the classical channel. This would imply that she can play Bob’s part for Alice and Alice’s for Bob. As a result she shares a key with Alice and one with Bob. Hence, she can decode all secret data Alice sends to Bob, read it, and finally encode it again to forward it to Bob. To secure against such a “woman in the middle attack”, Alice and Bob can use classical authentication protocols which ensure that the correct person is at the other end of the line. This implies that they need a small amount of initial secret material, which can, however, be renewed from the new key they have generated through quantum communication.
5. Entanglement measures
In the last section we have seen that entanglement is an essential resource for many tasks of quantum information theory, like teleportation or quantum computation. This means that entangled states are needed for the functioning of many processes and that they are consumed during operation. It is therefore necessary to have measures which tell us whether the entanglement contained in a number of quantum systems is sufficient to perform a certain task. What makes this subject difficult is the fact that we cannot restrict the discussion to systems in a maximally, or at least highly, entangled pure state. Due to unavoidable decoherence effects, realistic applications have to deal with imperfect systems in mixed states, and exactly in this situation the question of the amount of available entanglement is interesting.
5.1. General properties and definitions
The difficulties arising if we try to quantify entanglement can be divided, roughly speaking, into two parts: firstly, we have to find a reasonable quantity which describes exactly those properties we are interested in, and secondly, we have to calculate it for a given state. In this section we will discuss the first problem and consider several different possibilities to define entanglement measures.
5.1.1. Axiomatics
First of all, we will collect some general properties which a reasonable entanglement measure should have (cf. also [16,154,153,155,89]). To quantify entanglement means nothing else but to associate a positive real number with each state of a (finite-dimensional) bipartite system.
Axiom E0. An entanglement measure is a function $E$ which assigns to each state $\rho$ of a finite-dimensional bipartite system a positive real number $E(\rho) \in \mathbb{R}^+$.

Note that we have glossed over some mathematical subtleties here, because $E$ is not just defined on the state space of $\mathcal{B}(\mathcal{H}\otimes\mathcal{K})$ for particularly chosen Hilbert spaces $\mathcal{H}$ and $\mathcal{K}$; $E$ is defined on any state space for arbitrary finite-dimensional $\mathcal{H}$ and $\mathcal{K}$. This is expressed mathematically most conveniently by a family of functions which behaves naturally under restrictions (i.e. the restriction to a subspace $\mathcal{H}'\otimes\mathcal{K}'$ coincides with the function belonging to $\mathcal{H}'\otimes\mathcal{K}'$). However, we will soon see that we can safely ignore this problem.



The next point concerns the range of $E$. If $\rho$ is unentangled, $E(\rho)$ should of course be zero, and it should be maximal on maximally entangled states. But what happens if we allow the dimensions of $\mathcal{H}$ and $\mathcal{K}$ to grow? To get an answer, consider first a pair of qubits in a maximally entangled state $\chi$. It should contain exactly one bit of entanglement, i.e. $E(\chi) = 1$, and $N$ pairs in the state $\chi^{\otimes N}$ should contain $N$ bits. If we interpret $\chi^{\otimes N}$ as a maximally entangled state of an $\mathcal{H}\otimes\mathcal{H}$ system with $\mathcal{H} = \mathbb{C}^{2^N}$, we get $E(\chi^{\otimes N}) = \log_2(\dim\mathcal{H}) = N$, where we have to reshuffle the tensor factors in $\chi^{\otimes N}$ such that $(\mathbb{C}^2\otimes\mathbb{C}^2)^{\otimes N}$ becomes $(\mathbb{C}^2)^{\otimes N}\otimes(\mathbb{C}^2)^{\otimes N}$ (i.e. "all Alice particles to the left and all Bob particles to the right"; cf. Section 4.3). This observation motivates the following.
Axiom E1 (Normalization). $E$ vanishes on separable states and takes its maximum on maximally entangled states. More precisely, this means that $E(\rho) \leq E(\chi) = \log_2(d)$ for $\rho, \chi \in \mathcal{S}(\mathcal{H}\otimes\mathcal{H})$, $d = \dim\mathcal{H}$, and $\chi$ maximally entangled.
One thing an entanglement measure should tell us is how much quantum information can at most be teleported with a certain amount of entanglement, where this maximum is taken over all possible teleportation schemes and distillation protocols; hence it cannot be increased further by additional LOCC operations on the entangled systems in question. This consideration motivates the following axiom.
Axiom E2 (LOCC monotonicity). $E$ cannot increase under LOCC operations, i.e. $E[T^*(\rho)] \leq E(\rho)$ for all states $\rho$ and all LOCC channels $T$.
A special case of LOCC operations are, of course, local unitary operations $U\otimes V$. Axiom E2 now implies that $E(U\otimes V\rho U^*\otimes V^*) \leq E(\rho)$, and on the other hand $E(U^*\otimes V^*\tilde\rho U\otimes V) \leq E(\tilde\rho)$. Hence, with $\tilde\rho = U\otimes V\rho U^*\otimes V^*$ we get $E(\rho) \leq E(U\otimes V\rho U^*\otimes V^*)$, and therefore $E(\rho) = E(U\otimes V\rho U^*\otimes V^*)$. We fix this property as a weakened version of Axiom E2.
Axiom E2a (Local unitary invariance). $E$ is invariant under local unitaries, i.e. $E(U\otimes V\rho U^*\otimes V^*) = E(\rho)$ for all states $\rho$ and all unitaries $U, V$.
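This axiom shows why we do not have to bother about families of functions as mentioned above. If $E$ is defined on $\mathcal{S}(\mathcal{H}\otimes\mathcal{H})$, it is automatically defined on $\mathcal{S}(\mathcal{H}_1\otimes\mathcal{H}_2)$ for all Hilbert spaces $\mathcal{H}_k$ with $\dim(\mathcal{H}_k) \leq \dim(\mathcal{H})$, because under this condition we can embed $\mathcal{H}_1\otimes\mathcal{H}_2$ unitarily into $\mathcal{H}\otimes\mathcal{H}$.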
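Consider now a convex linear combination $\lambda\rho + (1-\lambda)\sigma$ with $0 \leq \lambda \leq 1$. Entanglement cannot be "generated" by mixing two states, i.e. $E(\lambda\rho + (1-\lambda)\sigma) \leq \lambda E(\rho) + (1-\lambda)E(\sigma)$.
Axiom E3 (Convexity). $E$ is a convex function, i.e. $E(\lambda\rho + (1-\lambda)\sigma) \leq \lambda E(\rho) + (1-\lambda)E(\sigma)$ for two states $\rho, \sigma$ and $0 \leq \lambda \leq 1$.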

The next property concerns the continuity of $E$, i.e. if we perturb $\rho$ slightly, the change of $E(\rho)$ should be small. This can be expressed most conveniently as continuity of $E$ in the trace norm. At this point, however, it is not quite clear how we have to handle the fact that $E$ is defined for arbitrary Hilbert spaces. The following version is motivated basically by the fact that it is a crucial assumption in Theorems 5.2 and 5.3.
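Axiom E4 (Continuity). Consider a sequence of Hilbert spaces $\mathcal{H}_N$, $N \in \mathbb{N}$, and two sequences of states $\rho_N, \sigma_N \in \mathcal{S}(\mathcal{H}_N\otimes\mathcal{H}_N)$ with $\lim_{N\to\infty}\|\rho_N - \sigma_N\|_1 = 0$. Then we have
$$\lim_{N\to\infty}\frac{E(\rho_N) - E(\sigma_N)}{1 + \log_2(\dim\mathcal{H}_N)} = 0. \tag{5.1}$$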
The last point we have to consider here are additivity properties: since we are looking at entanglement as a resource, it is natural to assume that we can do twice as much with two pairs in the state $\rho$ as with one, or more precisely $E(\rho\otimes\rho) = 2E(\rho)$ (in $\rho\otimes\rho$ we have to reshuffle tensor factors again; see above).
Axiom E5 (Additivity). For any pair of bipartite states $\rho, \sigma \in \mathcal{S}(\mathcal{H}\otimes\mathcal{K})$ we have $E(\rho\otimes\sigma) = E(\rho) + E(\sigma)$.
Unfortunately, this rather natural-looking axiom seems to be too strong (it excludes reasonable candidates). It should, however, always be true that entanglement cannot increase if we put two pairs together.
Axiom E5a (Subadditivity). For any pair of states $\rho, \sigma$ we have $E(\rho\otimes\sigma) \leq E(\rho) + E(\sigma)$.

There are further modifications of additivity available in the literature. The most frequently used one is the following, which restricts Axiom E5 to the case $\rho = \sigma$.
Axiom E5b (Weak additivity). For any state $\rho$ of a bipartite system we have $N^{-1}E(\rho^{\otimes N}) = E(\rho)$.

Finally, the weakest version of additivity only deals with the behavior of $E$ for large tensor products, i.e. $\rho^{\otimes N}$ for $N \to \infty$.
Axiom E5c (Existence of a regularization). For each state $\rho$ the limit
$$E^\infty(\rho) = \lim_{N\to\infty}\frac{E(\rho^{\otimes N})}{N} \tag{5.2}$$
exists.

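5.1.2. Pure states
Let us now consider a pure state $\rho = |\psi\rangle\langle\psi| \in \mathcal{S}(\mathcal{H}\otimes\mathcal{K})$. If it is entangled, its partial trace $\sigma = \mathrm{tr}_\mathcal{H}|\psi\rangle\langle\psi|$ (which has the same nonzero spectrum as $\mathrm{tr}_\mathcal{K}|\psi\rangle\langle\psi|$) is mixed, and for a maximally entangled state it is maximally mixed. This suggests using the von Neumann entropy¹⁹ of $\sigma$, which measures how much a state is mixed, as an entanglement measure for pure states, i.e. we define [9,16]
$$E_{vN}(\psi) = -\mathrm{tr}\left[\mathrm{tr}_\mathcal{H}|\psi\rangle\langle\psi|\,\log_2\left(\mathrm{tr}_\mathcal{H}|\psi\rangle\langle\psi|\right)\right]. \tag{5.3}$$
19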

We assume here and in the following that the reader is sufficiently familiar with entropies. If this is not the case, we refer to [123].
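To illustrate Eq. (5.3) together with the normalization (Axiom E1) and weak additivity (Axiom E5b), the following sketch computes $E_{vN}$ directly from Schmidt coefficients. It assumes that the Schmidt coefficients of a product $\psi_a\otimes\psi_b$, after reshuffling into Alice and Bob parts, are all pairwise products of the individual coefficients; the helper names are hypothetical.

```python
import math
from itertools import product


def evn_from_schmidt(coeffs):
    """Reduced von Neumann entropy in bits, Eq. (5.3), of a pure state whose
    Schmidt coefficients (eigenvalues of the reduced state) are coeffs."""
    return -sum(c * math.log2(c) for c in coeffs if c > 0)


def tensor_schmidt(a, b):
    """Schmidt coefficients of psi_a (x) psi_b after reshuffling the factors
    into an 'Alice' part and a 'Bob' part: all pairwise products."""
    return [x * y for x, y in product(a, b)]


bell = [0.5, 0.5]        # Schmidt coefficients of a maximally entangled qubit pair
chi_n = bell
for _ in range(3):       # build chi^{(x)4}
    chi_n = tensor_schmidt(chi_n, bell)
print(evn_from_schmidt(chi_n))   # 4 bits, as required by Axiom E1

psi = [0.8, 0.2]         # a less entangled pure state
print(evn_from_schmidt(tensor_schmidt(psi, psi)), 2 * evn_from_schmidt(psi))  # Axiom E5b
```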



It is easy to deduce from the properties of the von Neumann entropy that $E_{vN}$ satisfies Axioms E0, E1, E3 and E5b. Somewhat more difficult is only Axiom E2, which follows, however, from a nice theorem of Nielsen [119] relating LOCC operations (on pure states) to the theory of majorization. To state it here we first need some terminology. Consider two probability distributions $\lambda = (\lambda_1, \ldots, \lambda_M)$ and $\mu = (\mu_1, \ldots, \mu_N)$, both given in decreasing order (i.e. $\lambda_1 \geq \cdots \geq \lambda_M$ and $\mu_1 \geq \cdots \geq \mu_N$). We say that $\lambda$ is majorized by $\mu$, in symbols $\lambda \prec \mu$, if
$$\sum_{j=1}^{k}\lambda_j \leq \sum_{j=1}^{k}\mu_j \quad \forall k = 1, \ldots, \min(M, N) \tag{5.4}$$
holds. Now we have the following result (see [119] for a proof).
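Theorem 5.1. A pure state $\psi = \sum_j \lambda_j^{1/2} e_j\otimes e_j \in \mathcal{H}\otimes\mathcal{K}$ can be transformed into another pure state $\phi = \sum_j \mu_j^{1/2} f_j\otimes f_j \in \mathcal{H}\otimes\mathcal{K}$ via an LOCC operation iff the Schmidt coefficients of $\psi$ are majorized by those of $\phi$, i.e. $\lambda \prec \mu$.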

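The von Neumann entropy of the restriction $\mathrm{tr}_\mathcal{H}|\psi\rangle\langle\psi|$ can be calculated immediately from the Schmidt coefficients of $\psi$: $E_{vN}(|\psi\rangle\langle\psi|) = -\sum_j\lambda_j\log_2(\lambda_j)$. Axiom E2 therefore follows from the fact that the entropy $S(\lambda) = -\sum_j\lambda_j\log_2(\lambda_j)$ of a probability distribution $\lambda$ is a Schur concave function, i.e. $\lambda \prec \mu$ implies $S(\lambda) \geq S(\mu)$; see [121].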
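The majorization test (5.4) and Theorem 5.1 are easy to state in code. The sketch below (function names are illustrative) pads the shorter vector with zeros, which for normalized distributions is equivalent to stopping the comparison at $\min(M, N)$. It also exhibits a pair of states that cannot be converted into each other in either direction, and checks the Schur concavity statement on an example.

```python
import math
from itertools import accumulate


def majorized_by(lam, mu, tol=1e-12):
    """True if lam is majorized by mu in the sense of Eq. (5.4).

    Both vectors are sorted decreasingly and padded with zeros to a common
    length; for normalized distributions this gives the same answer as
    comparing the partial sums only up to min(M, N)."""
    n = max(len(lam), len(mu))
    a = sorted(lam, reverse=True) + [0.0] * (n - len(lam))
    b = sorted(mu, reverse=True) + [0.0] * (n - len(mu))
    return all(x <= y + tol for x, y in zip(accumulate(a), accumulate(b)))


def locc_convertible(schmidt_psi, schmidt_phi):
    """Theorem 5.1: psi -> phi by LOCC iff lambda (of psi) is majorized by mu (of phi)."""
    return majorized_by(schmidt_psi, schmidt_phi)


def entropy_bits(p):
    """S(lambda) = -sum_j lambda_j log2(lambda_j)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


singlet, weak = [0.5, 0.5], [0.8, 0.2]
print(locc_convertible(singlet, weak), locc_convertible(weak, singlet))  # True False
x, y = [0.4, 0.4, 0.2], [0.5, 0.25, 0.25]
print(locc_convertible(x, y), locc_convertible(y, x))                      # False False
print(entropy_bits(singlet) >= entropy_bits(weak))                         # True
```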
Hence, we have seen so far that $E_{vN}$ is one possible candidate for an entanglement measure on pure states. In the following we will see that it is in fact the only candidate which is physically reasonable. There are basically two reasons for this. The first one deals with distillation of entanglement. It was shown by Bennett et al. [9] that each state $\phi \in \mathcal{H}\otimes\mathcal{K}$ of a bipartite system can be prepared out of (a possibly large number of) systems in an arbitrary entangled state $\psi$ by LOCC operations. To be more precise, we can find a sequence of LOCC operations
$$T_N : \mathcal{B}[(\mathcal{H}\otimes\mathcal{K})^{\otimes M(N)}] \to \mathcal{B}[(\mathcal{H}\otimes\mathcal{K})^{\otimes N}] \tag{5.5}$$
such that
$$\lim_{N\to\infty}\left\|T_N^*\left(|\psi\rangle\langle\psi|^{\otimes N}\right) - |\phi\rangle\langle\phi|^{\otimes M(N)}\right\|_1 = 0 \tag{5.6}$$
holds with a non-vanishing rate $r = \lim_{N\to\infty}M(N)/N$. This is done either by distillation ($r < 1$ if $\phi$ is more entangled than $\psi$) or by "diluting" entanglement, i.e. creating many less entangled states from a few highly entangled ones ($r > 1$). All this can be performed in a reversible way: we can start with some maximally entangled qubits, dilute them to get many less entangled states, which can be distilled afterwards to get the original states back (again only in an asymptotic sense). The crucial point is that the asymptotic rate $r$ of these processes is given in terms of $E_{vN}$ by $r = E_{vN}(|\psi\rangle\langle\psi|)/E_{vN}(|\phi\rangle\langle\phi|)$. Hence, we can say, roughly speaking, that $E_{vN}(|\psi\rangle\langle\psi|)$ describes exactly the amount of maximally entangled qubits which is contained in $|\psi\rangle\langle\psi|$.
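A small numerical illustration of the rate formula, assuming only $r = E_{vN}(|\psi\rangle\langle\psi|)/E_{vN}(|\phi\rangle\langle\phi|)$ as stated above; the Schmidt coefficients $(0.8, 0.2)$ are an arbitrary example and the function names are hypothetical.

```python
import math


def evn_bits(schmidt):
    """E_vN in bits from Schmidt coefficients (eigenvalues of the reduced state)."""
    return -sum(x * math.log2(x) for x in schmidt if x > 0)


def asymptotic_rate(schmidt_psi, schmidt_phi):
    """r = E_vN(psi) / E_vN(phi): copies of phi obtained per copy of psi.
    Both states are assumed to be entangled, so E_vN(phi) > 0."""
    return evn_bits(schmidt_psi) / evn_bits(schmidt_phi)


singlet = [0.5, 0.5]
weak = [0.8, 0.2]
r_dilute = asymptotic_rate(singlet, weak)   # r > 1: dilution
r_distil = asymptotic_rate(weak, singlet)   # r < 1: distillation
print(f"dilution rate {r_dilute:.3f}, distillation rate {r_distil:.3f}")
print("round trip:", r_dilute * r_distil)   # 1 up to rounding: the process is reversible
```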
A second, somewhat more formal reason is that $E_{vN}$ is the only entanglement measure on the set of pure states which satisfies the axioms formulated above. In other words, the following "uniqueness theorem for entanglement measures" holds [129,155,57].
Theorem 5.2. The reduced von Neumann entropy $E_{vN}$ is the only entanglement measure on pure states which satisfies Axioms E0–E5.
5.1.3. Entanglement measures for mixed states
To find reasonable entanglement measures for mixed states is much more difficult. There are in fact many possibilities (e.g. the maximally entangled fraction introduced in Section 3.1.1 can be regarded as a simple measure), and we therefore want to present only four of the most reasonable candidates. Among those measures which we do not discuss here are negativity quantities ([158] and the references therein), the "best separable approximation" [108], the base norm associated with the set of separable states [157,136] and ppt-distillation rates [133].
The first measure we want to present is oriented along the discussion of pure states: we define, roughly speaking, the entanglement of distillation $E_D(\rho)$ of $\rho$ as the maximal asymptotic rate with which maximally entangled qubits can be distilled out of a state $\rho \in \mathcal{S}(\mathcal{H}\otimes\mathcal{K})$; cf. [12]. To be more precise, consider all possible distillation protocols for $\rho$ (cf. Section 4.3), i.e. all sequences of LOCC channels
$$T_N : \mathcal{B}(\mathbb{C}^{d_N}\otimes\mathbb{C}^{d_N}) \to \mathcal{B}(\mathcal{H}^{\otimes N}\otimes\mathcal{K}^{\otimes N}) \tag{5.7}$$
such that
$$\lim_{N\to\infty}\left\|T_N^*(\rho^{\otimes N}) - |\chi_N\rangle\langle\chi_N|\right\|_1 = 0 \tag{5.8}$$
holds with a sequence of maximally entangled states $\chi_N \in \mathbb{C}^{d_N}\otimes\mathbb{C}^{d_N}$. Now we can define
$$E_D(\rho) = \sup_{(T_N)_{N\in\mathbb{N}}}\limsup_{N\to\infty}\frac{\log_2(d_N)}{N}, \tag{5.9}$$
where the supremum is taken over all possible distillation protocols $(T_N)_{N\in\mathbb{N}}$. It is not very difficult to see that $E_D$ satisfies Axioms E0, E1, E2 and E5b. It is not known whether continuity (Axiom E4) and convexity (Axiom E3) hold. It can be shown, however, that $E_D$ is not convex (and not additive; Axiom E5) if npt bound entangled states exist (see [141], cf. also Section 4.3.3).
For pure states we have discussed, besides distillation, the "dilution" of entanglement, and we can use, similar to $E_D$, the asymptotic rate with which bipartite systems in a given state $\rho$ can be prepared out of maximally entangled singlets [78]. Hence, consider again a sequence of LOCC channels
$$S_N : \mathcal{B}(\mathcal{H}^{\otimes N}\otimes\mathcal{K}^{\otimes N}) \to \mathcal{B}(\mathbb{C}^{d_N}\otimes\mathbb{C}^{d_N}) \tag{5.10}$$
and a sequence of maximally entangled states $\chi_N \in \mathbb{C}^{d_N}\otimes\mathbb{C}^{d_N}$, $N \in \mathbb{N}$, but now with the property
$$\lim_{N\to\infty}\left\|\rho^{\otimes N} - S_N^*(|\chi_N\rangle\langle\chi_N|)\right\|_1 = 0. \tag{5.11}$$
Then we can define the entanglement cost $E_C(\rho)$ of $\rho$ as
$$E_C(\rho) = \inf_{(S_N)_{N\in\mathbb{N}}}\liminf_{N\to\infty}\frac{\log_2(d_N)}{N}, \tag{5.12}$$
where the infimum is taken over all dilution protocols $S_N$, $N \in \mathbb{N}$. It is again easy to see that $E_C$ satisfies Axioms E0, E1, E2 and E5b. In contrast to $E_D$, however, it can be shown that $E_C$ is convex (Axiom E3), while it is not known whether $E_C$ is continuous (Axiom E4); cf. [78] for proofs.



$E_D$ and $E_C$ are directly based on operational concepts. The remaining two measures we want to discuss here are defined in a more abstract way. The first can be characterized as the minimal convex extension of $E_{vN}$ to mixed states: we define the entanglement of formation $E_F$ of $\rho$ as [16]
$$E_F(\rho) = \inf_{\rho = \sum_j p_j|\psi_j\rangle\langle\psi_j|}\sum_j p_j E_{vN}(|\psi_j\rangle\langle\psi_j|), \tag{5.13}$$
where the infimum is taken over all decompositions of $\rho$ into a convex sum of pure states. $E_F$ satisfies E0–E4 and E5a (cf. [16] for E2 and [120] for E4; the rest follows directly from the definition). Whether $E_F$ is (weakly) additive (Axiom E5b) is not known. Furthermore, it is conjectured that $E_F$ coincides with $E_C$. However, only the identity $E_F^\infty = E_C$ has been proven, where the existence of the regularization $E_F^\infty$ of $E_F$ follows directly from subadditivity.
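The infimum in Eq. (5.13) really matters: different decompositions of the same $\rho$ can give very different average entanglement. The sketch below (helper names are hypothetical) evaluates $\sum_j p_j E_{vN}$ for two decompositions of $\rho = \tfrac{1}{2}(|00\rangle\langle 00| + |11\rangle\langle 11|)$, computing $E_{vN}$ of each two-qubit pure state from the eigenvalues of its reduced density matrix.

```python
import math


def evn_two_qubit(amps):
    """E_vN in bits of the normalized pure state a|00> + b|01> + c|10> + d|11>,
    computed from the eigenvalues of the reduced density matrix of qubit 1."""
    a, b, c, d = (complex(x) for x in amps)
    r00 = abs(a) ** 2 + abs(b) ** 2
    r11 = abs(c) ** 2 + abs(d) ** 2
    r01 = a * c.conjugate() + b * d.conjugate()
    det = r00 * r11 - abs(r01) ** 2            # trace is 1, so det fixes the spectrum
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    eigs = ((1 + disc) / 2, (1 - disc) / 2)
    return -sum(x * math.log2(x) for x in eigs if x > 1e-15)


def average_entanglement(decomposition):
    """sum_j p_j E_vN(psi_j) for a list of (p_j, amplitudes_j). Any such
    decomposition gives an upper bound on E_F of the mixture, Eq. (5.13)."""
    return sum(p * evn_two_qubit(amps) for p, amps in decomposition)


s = 1 / math.sqrt(2)
# Two decompositions of rho = (|00><00| + |11><11|) / 2:
bell_pairs = [(0.5, (s, 0, 0, s)), (0.5, (s, 0, 0, -s))]   # (|00> +/- |11>)/sqrt(2)
products = [(0.5, (1, 0, 0, 0)), (0.5, (0, 0, 0, 1))]      # |00>, |11>
print(average_entanglement(bell_pairs))   # about 1: a poor decomposition
print(average_entanglement(products))     # 0: optimal, since rho is separable
```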
Another idea to quantify entanglement is to measure the "distance" of the (entangled) state $\rho$ from the set of separable states $\mathcal{D}$. It has turned out [154] that among all possible distance functions the relative entropy is physically the most reasonable. Hence, we define the relative entropy of entanglement as
$$E_R(\rho) = \inf_{\sigma\in\mathcal{D}}S(\rho|\sigma), \quad S(\rho|\sigma) = \mathrm{tr}(\rho\log_2\rho - \rho\log_2\sigma), \tag{5.14}$$
where the infimum is taken over all separable states. It can be shown that $E_R$ satisfies, like $E_F$, the Axioms E0–E4 and E5a, where E1 and E2 are shown in [154] and E4 in [56]; the rest follows directly from the definition. It is shown in [159] that $E_R$ does not satisfy E5b; cf. also Section 5.3. Hence, the regularization $E_R^\infty$ of $E_R$ differs from $E_R$.
Finally, let us now give some comments on the relation between the measures just introduced. On pure states all measures just discussed coincide with the reduced von Neumann entropy; this follows from Theorem 5.2 and the properties stated in the last subsection. For mixed states the situation is more difficult. It can be shown, however, that $E_D \leq E_C$ holds and that all "reasonable" entanglement measures lie in between [89].
Theorem 5.3. For each entanglement measure $E$ satisfying E0, E1, E2 and E5b and each state $\rho \in \mathcal{S}(\mathcal{H}\otimes\mathcal{K})$ we have $E_D(\rho) \leq E(\rho) \leq E_C(\rho)$.
Unfortunately, no measure we have discussed in the last subsection satisfies all the assumptions of the theorem. It is possible, however, to get a similar statement for the regularization $E^\infty$ with weaker assumptions on $E$ itself (in particular, without assuming additivity); cf. [57].
5.2. Two qubits
Even more difficult than finding reasonable entanglement measures are explicit calculations. All measures we have discussed above involve optimization processes over spaces which grow exponentially with the dimension of the Hilbert space. A direct numerical calculation for a general state $\rho$ is therefore hopeless. There are, however, some attempts either to get bounds on entanglement measures or to get explicit results for special classes of states. We will restrict this discussion to some relevant special cases. On the one hand, we will concentrate on $E_F$ and $E_R$, and on the other we will look at two special classes of states where explicit calculations are possible: two-qubit systems in this section and states with symmetry properties in the next one.



5.2.1. Pure states
Assume for the rest of this section that $\mathcal{H} = \mathbb{C}^2$ holds and consider first a pure state $\psi \in \mathcal{H}\otimes\mathcal{H}$. To calculate $E_{vN}(\psi)$ is of course not difficult, and it is straightforward to see that (cf. [16] for all material of this and the following subsection)
$$E_{vN}(\psi) = H\left[\tfrac{1}{2}\left(1 + \sqrt{1 - C(\psi)^2}\right)\right] \tag{5.15}$$
holds, with
$$H(x) = -x\log_2(x) - (1-x)\log_2(1-x) \tag{5.16}$$
and the concurrence $C(\psi)$ of $\psi$, which is defined by
$$C(\psi) = \left|\sum_{j=0}^{3}\alpha_j^2\right| \quad\text{with}\quad \psi = \sum_{j=0}^{3}\alpha_j\Phi_j, \tag{5.17}$$

where $\Phi_j$, $j = 0, \ldots, 3$ denotes the Bell basis (3.3). Since $C$ becomes rather important in the following, let us re-express it as $C(\psi) = |\langle\psi, \Theta\psi\rangle|$, where $\psi \mapsto \Theta\psi$ denotes complex conjugation in the Bell basis. Hence, $\Theta$ is an antiunitary operator, and it can be written as the tensor product $\Theta = \vartheta\otimes\vartheta$ of the map $\mathcal{H} \ni \phi \mapsto \sigma_2\bar\phi$, where $\phi \mapsto \bar\phi$ denotes complex conjugation in the canonical basis and $\sigma_2$ is the second Pauli matrix. Hence, local unitaries (i.e. those of the form $U_1\otimes U_2$) commute with $\Theta$, and it can be shown that this is not only a necessary but also a sufficient condition for a unitary to be local [160].
We see from Eqs. (5.15) and (5.17) that $C(\psi)$ ranges from 0 to 1 and that $E_{vN}(\psi)$ is a monotone function of $C(\psi)$. The latter can therefore be considered as an entanglement quantity in its own right. For a Bell state we get in particular $C(\Phi_j) = 1$, while a separable state $\phi_1\otimes\phi_2$ leads to $C(\phi_1\otimes\phi_2) = 0$; this can be seen easily with the factorization $\Theta = \vartheta\otimes\vartheta$.
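A minimal sketch of the concurrence $C(\psi) = |\langle\psi, \Theta\psi\rangle|$ for two qubits, written in the canonical basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ using the factorization $\Theta = \vartheta\otimes\vartheta$ with $\vartheta\phi = \sigma_2\bar\phi$. It checks the two special cases just mentioned and compares Eq. (5.15) with the entropy of the Schmidt coefficients of $\cos\theta|00\rangle + \sin\theta|11\rangle$. The function names are illustrative.

```python
import math


def theta_overlap(amps):
    """<psi, Theta psi> for psi = a|00> + b|01> + c|10> + d|11>, where
    Theta = (sigma_2 (x) sigma_2) composed with complex conjugation."""
    psi = [complex(x) for x in amps]
    conj = [x.conjugate() for x in psi]
    # sigma_2 (x) sigma_2 maps |00> -> -|11>, |01> -> |10>, |10> -> |01>, |11> -> -|00>
    theta_psi = [-conj[3], conj[2], conj[1], -conj[0]]
    # inner product, antilinear in the first argument
    return sum(x.conjugate() * y for x, y in zip(psi, theta_psi))


def concurrence(amps):
    return abs(theta_overlap(amps))


def binary_entropy(x):
    """H(x) from Eq. (5.16), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def evn_from_concurrence(c):
    """Eq. (5.15)."""
    return binary_entropy(0.5 * (1 + math.sqrt(max(0.0, 1 - c * c))))


s = 1 / math.sqrt(2)
print(concurrence((s, 0, 0, s)))       # Bell state: C = 1 up to rounding
print(concurrence((0.6, 0.8, 0, 0)))   # |0> (x) (0.6|0> + 0.8|1>): C = 0
theta = 0.3
state = (math.cos(theta), 0, 0, math.sin(theta))
print(evn_from_concurrence(concurrence(state)), binary_entropy(math.cos(theta) ** 2))
```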
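Assume now that one of the $\alpha_j$, say $\alpha_0$, satisfies $|\alpha_0|^2 > 1/2$. This implies that $C(\psi)$ cannot be zero, since
$$\left|\sum_{j=1}^{3}\alpha_j^2\right| \leq 1 - |\alpha_0|^2 \tag{5.18}$$
must hold. Hence, $C(\psi)$ is at least $2|\alpha_0|^2 - 1$, and this implies for $E_{vN}$ and arbitrary $\psi$
$$E_{vN}(\psi) \geq h(|\langle\Phi_0, \psi\rangle|^2) \quad\text{with}\quad h(x) = \begin{cases} H\left[\tfrac{1}{2} + \sqrt{x(1-x)}\right], & x \geq \tfrac{1}{2},\\ 0, & x < \tfrac{1}{2}.\end{cases} \tag{5.19}$$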
This inequality remains valid if we replace $\Phi_0$ by any other maximally entangled state $\Phi \in \mathcal{H}\otimes\mathcal{H}$. To see this, note that two maximally entangled states $\Phi, \Psi \in \mathcal{H}\otimes\mathcal{H}$ are related (up to a phase) by a local unitary transformation $U_1\otimes U_2$ (this follows immediately from their Schmidt decomposition; cf. Section 3.1.1). Hence, if we replace the Bell basis in Eq. (5.17) by $\Phi'_j = U_1\otimes U_2\Phi_j$, $j = 0, \ldots, 3$, we get for the corresponding $C'$ the equation $C'(\psi) = |\langle U_1^*\otimes U_2^*\psi, \Theta U_1^*\otimes U_2^*\psi\rangle| = C(\psi)$, since $\Theta$ commutes with local unitaries. We can even replace $|\langle\Phi_0, \psi\rangle|^2$ with the supremum over all maximally entangled states and therefore get
$$E_{vN}(\psi) \geq h[F(|\psi\rangle\langle\psi|)], \tag{5.20}$$



where $F(|\psi\rangle\langle\psi|)$ is the maximally entangled fraction of $|\psi\rangle\langle\psi|$, which we have introduced in Section 3.1.1.
To see that even equality holds in Eq. (5.20), note first that it is sufficient to consider the case $\psi = a|00\rangle + b|11\rangle$ with $a, b \geq 0$, $a^2 + b^2 = 1$, since each pure state can be brought into this form (this follows again from the Schmidt decomposition) by a local unitary transformation, which on the other hand does not change $E_{vN}$. The maximally entangled state which maximizes $|\langle\Phi, \psi\rangle|^2$ is in this case $\Phi_0$, and we get $F(|\psi\rangle\langle\psi|) = (a+b)^2/2 = 1/2 + ab$. Straightforward calculations now show that $h[F(|\psi\rangle\langle\psi|)] = h(1/2 + ab) = E_{vN}(\psi)$ holds as stated (use $(1/2 + ab)(1/2 - ab) = (a^2 - b^2)^2/4$, so that $h(1/2 + ab) = H(\max(a^2, b^2))$).
5.2.2. EOF for Bell diagonal states
It is easy to extend inequality (5.20) to mixed states if we use the convexity of $E_F$ and the fact that $E_F$ coincides with $E_{vN}$ on pure states. Hence, (5.20) becomes
$$E_F(\rho) \geq h[F(\rho)]. \tag{5.21}$$
For general two-qubit states this bound is not achieved, however. This can be seen with the example $\rho = \tfrac{1}{2}(|\Phi_1\rangle\langle\Phi_1| + |00\rangle\langle 00|)$, which we have already considered in the last paragraph of Section 3.1.1. It is easy to see that $F(\rho) = 1/2$ holds, hence $h[F(\rho)] = 0$, but $\rho$ is entangled. Nevertheless, we can show that equality holds in Eq. (5.21) if we restrict it to the Bell diagonal states $\rho = \sum_{j=0}^{3}\lambda_j|\Phi_j\rangle\langle\Phi_j|$.
To prove this statement we have to find a convex decomposition $\rho = \sum_j\mu_j|\Psi_j\rangle\langle\Psi_j|$ of such a $\rho$ into pure states $|\Psi_j\rangle\langle\Psi_j|$ such that $h[F(\rho)] = \sum_j\mu_j E_{vN}(|\Psi_j\rangle\langle\Psi_j|)$ holds. Since $E_F(\rho)$ cannot be smaller than $h[F(\rho)]$ due to inequality (5.21), this decomposition must be optimal and equality is proven.
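To find such $\Psi_j$, assume first that the biggest eigenvalue of $\rho$ is greater than $1/2$, and let, without loss of generality, $\lambda_0$ be this eigenvalue. A good choice for the $\Psi_j$ are then the eight pure states
$$\sqrt{\lambda_0}\,\Phi_0 + i\sum_{j=1}^{3}\left(\pm\sqrt{\lambda_j}\right)\Phi_j. \tag{5.22}$$
Each of them has $\sum_j\alpha_j^2 = \lambda_0 - \lambda_1 - \lambda_2 - \lambda_3 = 2\lambda_0 - 1$, and mixing them with equal weights $1/8$ cancels all off-diagonal terms. Hence the reduced von Neumann entropy of all these states equals $h(\lambda_0)$, so $\sum_j\mu_j E_{vN}(|\Psi_j\rangle\langle\Psi_j|) = h(\lambda_0)$ and therefore $E_F(\rho) = h(\lambda_0)$. Since the maximally entangled fraction of $\rho$ is obviously $\lambda_0$, we see that (5.21) holds with equality.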
Assume now that the highest eigenvalue is at most $1/2$. Then we can find phase factors $\exp(i\phi_j)$ such that $\exp(i\phi_0)\lambda_0 - \sum_{j=1}^{3}\exp(i\phi_j)\lambda_j = 0$ holds, and $\rho$ can be expressed as a convex linear combination of the states
$$e^{i\phi_0/2}\sqrt{\lambda_0}\,\Phi_0 + i\sum_{j=1}^{3}\left(\pm e^{i\phi_j/2}\sqrt{\lambda_j}\right)\Phi_j. \tag{5.23}$$
The concurrence $C$ of all these states is 0, hence their entanglement is 0 by Eq. (5.15), which in turn implies $E_F(\rho) = 0$. Again, we see that equality is achieved in (5.21), since the maximally entangled fraction of $\rho$ is at most $1/2$. Summarizing this discussion we have shown (cf. Fig. 5.1)
Proposition 5.4. A Bell diagonal state $\rho$ is entangled iff its highest eigenvalue $\lambda$ is greater than $1/2$. In this case the entanglement of formation of $\rho$ is given by
$$E_F(\rho) = H\left[\tfrac{1}{2} + \sqrt{\lambda(1-\lambda)}\right]. \tag{5.24}$$
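The following sketch checks the construction behind Proposition 5.4 for one example spectrum (chosen arbitrarily, with $\lambda_0$ the largest eigenvalue). It represents the eight states (5.22) by their coefficients in the Bell basis, verifies that their equal-weight mixture is $\rho$, and compares their average entanglement, computed via (5.15) and (5.17), with (5.24). Function names are hypothetical.

```python
import itertools
import math


def binary_entropy(x):
    """H(x) from Eq. (5.16), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def ef_bell_diagonal(lams):
    """Eq. (5.24): E_F of sum_j lams[j] |Phi_j><Phi_j| (zero if max <= 1/2)."""
    lam = max(lams)
    if lam <= 0.5:
        return 0.0
    return binary_entropy(0.5 + math.sqrt(lam * (1 - lam)))


def decomposition_5_22(lams):
    """Bell-basis coefficient vectors of the eight states in Eq. (5.22),
    assuming lams[0] is the largest eigenvalue."""
    states = []
    for signs in itertools.product((1, -1), repeat=3):
        alpha = [complex(math.sqrt(lams[0]))]
        alpha += [1j * sign * math.sqrt(lam_j) for sign, lam_j in zip(signs, lams[1:])]
        states.append(alpha)
    return states


def evn_bell_coeffs(alpha):
    """E_vN via Eqs. (5.15) and (5.17): C = |sum_j alpha_j^2|."""
    c = abs(sum(a * a for a in alpha))
    return binary_entropy(0.5 * (1 + math.sqrt(max(0.0, 1 - c * c))))


lams = [0.7, 0.15, 0.1, 0.05]
states = decomposition_5_22(lams)
# Equal-weight mixture of the eight states, as a matrix in the Bell basis:
mix = [[sum(a[j] * a[k].conjugate() for a in states) / 8 for k in range(4)]
       for j in range(4)]
avg = sum(evn_bell_coeffs(a) for a in states) / 8
print(avg, ef_bell_diagonal(lams))   # the two numbers agree
```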



[Figure: $E_F(\rho)$ (entanglement of formation) and $E_R(\rho)$ (relative entropy) plotted against the highest eigenvalue $\lambda$ of $\rho$, for $\lambda$ from 0.5 to 1; vertical axis from 0 to 1.]
Fig. 5.1. Entanglement of formation and relative entropy of entanglement for the Bell diagonal states, plotted as a function of the highest eigenvalue of $\rho$.

5.2.3. Wootters formula
If we have a general two-qubit state, there is a formula of Wootters [172] which allows an easy calculation of $E_F$. It is based on a generalization of the concurrence $C$ to mixed states. To motivate it, rewrite $C^2(\psi) = |\langle\psi, \Theta\psi\rangle|^2$ as
$$C^2(\psi) = \mathrm{tr}(|\psi\rangle\langle\psi|\Theta|\psi\rangle\langle\psi|\Theta) = \mathrm{tr}(\rho\Theta\rho\Theta) = \mathrm{tr}(R^2) \tag{5.25}$$
with
$$R = \sqrt{\sqrt{\rho}\,\Theta\rho\Theta\sqrt{\rho}}. \tag{5.26}$$
Here we have set $\rho = |\psi\rangle\langle\psi|$. The definition of the Hermitian matrix $R$, however, makes sense for arbitrary $\rho$ as well. If we write $\lambda_j$, $j = 1, \ldots, 4$ for the eigenvalues of $R$ and $\lambda_1$ is, without loss of generality, the biggest one, we can define the concurrence of an arbitrary two-qubit state $\rho$ as [172]
$$C(\rho) = \max(0, 2\lambda_1 - \mathrm{tr}(R)) = \max(0, \lambda_1 - \lambda_2 - \lambda_3 - \lambda_4). \tag{5.27}$$
It is easy to see that $C(|\psi\rangle\langle\psi|)$ coincides with $C(\psi)$ from (5.17). The crucial point is now that Eq. (5.15) holds for $E_F(\rho)$ if we insert $C(\rho)$ instead of $C(\psi)$:
Theorem 5.5 (Wootters formula). The entanglement of formation of a two-qubit system in a state $\rho$ is given by
$$E_F(\rho) = H\left[\tfrac{1}{2}\left(1 + \sqrt{1 - C(\rho)^2}\right)\right], \tag{5.28}$$
where the concurrence of $\rho$ is given in Eq. (5.27) and $H$ denotes the binary entropy from (5.16).



To prove this theorem, we first have to find a convex decomposition $\rho = \sum_j\mu_j|\Psi_j\rangle\langle\Psi_j|$ of $\rho$ into pure states $\Psi_j$ such that the average reduced von Neumann entropy $\sum_j\mu_j E_{vN}(\Psi_j)$ coincides with the right-hand side of Eq. (5.28). Secondly, we have to show that we have really found the minimal decomposition. Since this is much more involved than the simple case discussed in Section 5.2.2, we omit the proof and refer to [172] instead. Note, however, that Eq. (5.28) really coincides with the special cases we have derived for the pure and the Bell diagonal states. Finally, let us add the remark that there is no analog of Wootters' formula for higher dimensional Hilbert spaces. It can be shown [160] that the essential properties of the Bell basis $\Phi_j$, $j = 0, \ldots, 3$, which would be necessary for such a generalization, are available only in $2\times 2$ dimensions.
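For a Bell diagonal state, $\Theta$ (complex conjugation in the Bell basis) leaves $\rho$ invariant, so $\Theta\rho\Theta = \rho$ and $R = \rho$; the eigenvalues of $R$ are then just the $\lambda_j$, and (5.27) gives $C(\rho) = \max(0, 2\lambda - 1)$. The sketch below uses this to check numerically that the Wootters formula reproduces (5.24). It does not implement (5.27) for general states, which would require the eigenvalues of $R$; the function names are illustrative.

```python
import math


def binary_entropy(x):
    """H(x) from Eq. (5.16), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def ef_from_concurrence(c):
    """Wootters formula (5.28) as a function of C(rho)."""
    c = min(max(c, 0.0), 1.0)
    return binary_entropy(0.5 * (1 + math.sqrt(1 - c * c)))


def concurrence_bell_diagonal(lams):
    """Eq. (5.27), assuming the eigenvalues of R are the lams themselves
    (true for Bell diagonal states, as explained above)."""
    r = sorted(lams, reverse=True)
    return max(0.0, r[0] - r[1] - r[2] - r[3])


def ef_prop_5_4(lams):
    """Eq. (5.24) from Proposition 5.4."""
    lam = max(lams)
    return binary_entropy(0.5 + math.sqrt(lam * (1 - lam))) if lam > 0.5 else 0.0


for top in (0.3, 0.5, 0.6, 0.8, 0.95, 1.0):
    lams = [top] + [(1 - top) / 3] * 3
    print(top, ef_from_concurrence(concurrence_bell_diagonal(lams)), ef_prop_5_4(lams))
```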
5.2.4. Relative entropy for Bell diagonal states
To calculate the relative entropy of entanglement $E_R$ for two-qubit systems is more difficult. However, there is at least an easy formula for the Bell diagonal states, which we will give in the following [154]:
Proposition 5.6. The relative entropy of entanglement for a Bell diagonal state $\rho$ with highest eigenvalue $\lambda$ is given by (cf. Fig. 5.1)
$$E_R(\rho) = \begin{cases} 1 - H(\lambda), & \lambda > \tfrac{1}{2},\\ 0, & \lambda \leq \tfrac{1}{2}.\end{cases} \tag{5.29}$$

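Proof. For a Bell diagonal state
$$\rho = \sum_{j=0}^{3}\lambda_j|\Phi_j\rangle\langle\Phi_j| \tag{5.30}$$
we have to calculate
$$E_R(\rho) = \inf_{\sigma\in\mathcal{D}}\mathrm{tr}(\rho\log_2\rho - \rho\log_2\sigma) = \mathrm{tr}(\rho\log_2\rho) + \inf_{\sigma\in\mathcal{D}}\left(-\sum_{j=0}^{3}\lambda_j\langle\Phi_j, \log_2(\sigma)\Phi_j\rangle\right). \tag{5.31}$$
Since log is a concave function, we have $-\log_2\langle\Phi_j, \sigma\Phi_j\rangle \leq \langle\Phi_j, -\log_2(\sigma)\Phi_j\rangle$ and therefore
$$E_R(\rho) \geq \mathrm{tr}(\rho\log_2\rho) + \inf_{\sigma\in\mathcal{D}}\left(-\sum_{j=0}^{3}\lambda_j\log_2\langle\Phi_j, \sigma\Phi_j\rangle\right). \tag{5.32}$$
Hence, only the diagonal elements of $\sigma$ in the Bell basis enter the minimization on the right-hand side of this inequality, and this implies that we can restrict the infimum to the set of separable Bell diagonal states. Since a Bell diagonal state is separable iff all its eigenvalues are at most $1/2$ (Proposition 5.4), we get
$$E_R(\rho) \geq \mathrm{tr}(\rho\log_2\rho) + \inf_{p_j\in[0,1/2]}\left(-\sum_{j=0}^{3}\lambda_j\log_2 p_j\right) \quad\text{with}\quad \sum_{j=0}^{3}p_j = 1. \tag{5.33}$$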

This is an optimization problem (with constraints) over only four real parameters and easy to solve. If the highest eigenvalue $\lambda$ of $\rho$ is greater than $1/2$, we get $p_1 = 1/2$ and $p_j = \lambda_j/(2 - 2\lambda)$ for $j \neq 1$, where we have chosen without loss of generality $\lambda = \lambda_1$; inserting these values into (5.33) gives $1 - H(\lambda)$. This lower bound on $E_R(\rho)$ is achieved if we insert the corresponding $\sigma$ in Eq. (5.31). Hence, we have proven the statement for $\lambda > 1/2$, which completes the proof, since we have already seen that $\lambda \leq 1/2$ implies that $\rho$ is separable (Proposition 5.4).
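A numerical sketch of the last step of the proof, with an arbitrarily chosen spectrum: it evaluates the right-hand side of (5.33) at the quoted minimizer and at two other admissible choices of $p_j$, and compares the result with (5.29) and with $E_F$ from (5.24), as in Fig. 5.1. The helper names are illustrative.

```python
import math


def binary_entropy(x):
    """H(x) from Eq. (5.16), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def er_bell_diagonal(lams):
    """Eq. (5.29)."""
    lam = max(lams)
    return 1.0 - binary_entropy(lam) if lam > 0.5 else 0.0


def ef_bell_diagonal(lams):
    """Eq. (5.24), for comparison (cf. Fig. 5.1)."""
    lam = max(lams)
    return binary_entropy(0.5 + math.sqrt(lam * (1 - lam))) if lam > 0.5 else 0.0


def bound_5_33(lams, p):
    """Right-hand side of (5.33) for a trial separable spectrum p
    (all p_j <= 1/2, sum p_j = 1)."""
    neg_entropy = sum(x * math.log2(x) for x in lams if x > 0)
    cross = sum(x * math.log2(q) for x, q in zip(lams, p) if x > 0)
    return neg_entropy - cross


def optimal_p(lams):
    """Minimizer from the proof: 1/2 on the largest eigenvalue, the rest
    distributed proportionally to the remaining eigenvalues."""
    lam = max(lams)
    k = lams.index(lam)
    return [0.5 if j == k else x / (2 - 2 * lam) for j, x in enumerate(lams)]


lams = [0.1, 0.7, 0.15, 0.05]
print(bound_5_33(lams, optimal_p(lams)), er_bell_diagonal(lams))   # equal
print(bound_5_33(lams, [0.25] * 4))                                # larger
print(ef_bell_diagonal(lams) >= er_bell_diagonal(lams))            # True
```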
5.3. Entanglement measures under symmetry
The problems occurring if we try to calculate quantities like $E_R$ or $E_F$ for general density matrices arise from the fact that we have to solve optimization problems over very high dimensional spaces. One possible strategy to get explicit results is therefore parameter reduction by symmetry arguments. This can be done if the state in question admits some invariance properties, like Werner, isotropic or OO-invariant states; cf. Section 3.1. In the following, we will give some particular examples of such calculations, while a detailed discussion of the general idea (together with many more examples and further references) can be found in [159].
5.3.1. Entanglement of formation
Consider a compact group of unitaries $G \subset \mathcal{B}(\mathcal{H}\otimes\mathcal{H})$ (where $\mathcal{H}$ is again arbitrary finite dimensional), the set of $G$-invariant states, i.e. all $\rho$ with $[V, \rho] = 0$ for all $V \in G$, and the corresponding twirl operation $P_G\rho = \int_G V\rho V^*\,dV$. Particular examples we are looking at are: (1) Werner states, where $G$ consists of all unitaries $U\otimes U$, (2) isotropic states, where each $V \in G$ has the form $V = U\otimes\bar U$, and finally (3) OO-invariant states, where $G$ consists of unitaries $U\otimes U$ with real matrix elements ($U = \bar U$) and the twirl is given in Eq. (3.24).
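One way to calculate $E_F$ for a $G$-invariant state $\rho$ now consists of the following steps: (1) Determine the set $M_\rho$ of pure states $\Phi$ such that $P_G|\Phi\rangle\langle\Phi| = \rho$ holds. (2) Calculate the function
$$P_G\mathcal{S} \ni \rho \mapsto \varepsilon_G(\rho) = \inf\{E_{vN}(\Phi) \mid \Phi \in M_\rho\} \in \mathbb{R}, \tag{5.34}$$
where we have denoted the set of $G$-invariant states by $P_G\mathcal{S}$. (3) Then determine $E_F(\rho)$ in terms of the convex hull of $\varepsilon_G$, i.e.
$$E_F(\rho) = \inf\left\{\sum_j\lambda_j\varepsilon_G(\rho_j) \,\middle|\, \rho_j \in P_G\mathcal{S},\ 0 \leq \lambda_j \leq 1,\ \rho = \sum_j\lambda_j\rho_j,\ \sum_j\lambda_j = 1\right\}. \tag{5.35}$$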
The equality in the last equation is of course a non-trivial statement which has to be proved. We skip this point, however, and refer the reader to [159]. The advantage of this scheme lies in the fact that spaces of $G$-invariant states are in general very low dimensional (if $G$ is not too small). Hence, the optimization problem contained in step 3 has a much better chance to be tractable than the one we have to solve for the original definition of $E_F$. There is of course no guarantee that any of these three steps can be carried out in a concrete situation. For the three examples mentioned above, however, there are results available, which we will present in the following.
5.3.2. Werner states
Let us start with Werner states [159]. In this case $\rho$ is uniquely determined by its flip expectation value $\mathrm{tr}(\rho F)$ (cf. Section 3.1.2). To determine $\Phi \in \mathcal{H}\otimes\mathcal{H}$ such that $P_{UU}|\Phi\rangle\langle\Phi| = \rho$ holds, we therefore have to solve the equation
$$\langle\Phi, F\Phi\rangle = \sum_{jk}\bar\Phi_{jk}\Phi_{kj} = \mathrm{tr}(F\rho), \tag{5.36}$$
where $\Phi_{jk}$ denote the components of $\Phi$ in the canonical basis. On the other hand, the reduced density matrix $\sigma = \mathrm{tr}_1|\Phi\rangle\langle\Phi|$ has the matrix elements $\sigma_{jk} = \sum_l\Phi_{jl}\bar\Phi_{kl}$. By exploiting $U\otimes U$ invariance we can assume without loss of generality that $\sigma$ is diagonal. Hence, to get the function $\varepsilon_{UU}$ we have to minimize
$$E_{vN}(|\Phi\rangle\langle\Phi|) = \sum_j S\left(\sum_k|\Phi_{jk}|^2\right) \tag{5.37}$$
under constraint (5.36), where $S(x) = -x\log_2(x)$ denotes the von Neumann entropy function. We skip these calculations here (see [159] instead) and only state the results. For $\mathrm{tr}(F\rho) \geq 0$ we get $\varepsilon_{UU}(\rho) = 0$ (as expected, since $\rho$ is separable in this case), and with $H$ from (5.16)
$$\varepsilon_{UU}(\rho) = H\left[\tfrac{1}{2}\left(1 - \sqrt{1 - \mathrm{tr}(F\rho)^2}\right)\right] \tag{5.38}$$
for $\mathrm{tr}(F\rho) < 0$. The minima are attained for $\Phi$ where all $\Phi_{jk}$ except one diagonal element are zero in the case $\mathrm{tr}(F\rho) \geq 0$, and for $\Phi$ with only two (non-diagonal) coefficients $\Phi_{jk}, \Phi_{kj}$, $j \neq k$, non-zero if $\mathrm{tr}(F\rho) < 0$. The function $\varepsilon_{UU}$ is convex and therefore coincides with its convex hull, such that we get
Proposition 5.7. For any Werner state $\rho$ the entanglement of formation is given by (cf. Fig. 5.2)
$$E_F(\rho) = \begin{cases} H\left[\tfrac{1}{2}\left(1 - \sqrt{1 - \mathrm{tr}(F\rho)^2}\right)\right], & \mathrm{tr}(F\rho) < 0,\\ 0, & \mathrm{tr}(F\rho) \geq 0.\end{cases} \tag{5.39}$$
[Figure: $E_F(\rho)$ plotted against $\mathrm{tr}(\rho F)$ from $-1$ to $0$; vertical axis from 0 to 1.]
Fig. 5.2. Entanglement of formation for Werner states plotted as a function of the flip expectation.
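The following sketch checks (5.38) numerically within the two-coefficient family which, according to the text, contains the minimizers for $\mathrm{tr}(F\rho) < 0$. It is a plausibility check of the closed form, not a proof of optimality; the names and the grid size are illustrative.

```python
import math


def binary_entropy(x):
    """H(x) from Eq. (5.16), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def ef_werner(t):
    """Eq. (5.39); t = tr(rho F) lies in [-1, 1]."""
    if t >= 0:
        return 0.0
    return binary_entropy(0.5 * (1 - math.sqrt(1 - t * t)))


def two_coefficient_scan(t, steps=100_000):
    """Grid search for (5.38) restricted to vectors with only Phi_jk and
    Phi_kj (j != k) non-zero, |Phi_jk|^2 = p, |Phi_kj|^2 = 1 - p.
    Then E_vN = H(p) by (5.37), and the flip expectation
    2 Re(conj(Phi_jk) Phi_kj) can be tuned (via the relative phase) to any
    value of modulus at most 2 sqrt(p (1 - p)); so (5.36) is solvable iff
    2 sqrt(p (1 - p)) >= |t|."""
    best = math.inf
    for i in range(steps + 1):
        p = i / steps
        if 2 * math.sqrt(p * (1 - p)) >= abs(t):
            best = min(best, binary_entropy(p))
    return best


for t in (-1.0, -0.8, -0.5, -0.2):
    print(t, ef_werner(t), two_coefficient_scan(t))
```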



5.3.3. Isotropic states
Let us now consider isotropic, i.e. $U\otimes\bar U$ invariant, states. They are determined by the expectation value $\mathrm{tr}(\rho\tilde F)$ with $\tilde F$ from Eq. (3.14). Hence, we have to look first for pure states $\Phi$ with $\langle\Phi, \tilde F\Phi\rangle = \mathrm{tr}(\rho\tilde F)$ (since this determines, as for Werner states above, those $\Phi$ with $P_{U\bar U}(|\Phi\rangle\langle\Phi|) = \rho$). To this end assume that $\Phi$ has the Schmidt decomposition $\Phi = \sum_j\phi_j f_j\otimes f'_j = U_1\otimes U_2\sum_j\phi_j e_j\otimes e_j$ with appropriate unitary matrices $U_1, U_2$ and the canonical basis $e_j$, $j = 1, \ldots, d$. Exploiting the $U\otimes\bar U$ invariance of $\rho$ we get
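$$\mathrm{tr}(\rho\tilde F) = \left\langle(\mathbb{1}\otimes V)\sum_j\phi_j e_j\otimes e_j, \tilde F(\mathbb{1}\otimes V)\sum_k\phi_k e_k\otimes e_k\right\rangle \tag{5.40}$$
$$= \sum_{j,k,l,m}\phi_j\phi_k\langle e_j\otimes Ve_j, e_l\otimes e_l\rangle\langle e_m\otimes e_m, e_k\otimes Ve_k\rangle \tag{5.41}$$
$$= \left|\sum_j\phi_j\langle e_j, Ve_j\rangle\right|^2 \tag{5.42}$$
with $V = U_1^TU_2$ and after inserting the definition of $\tilde F$. Following our general scheme, we have to minimize $E_{vN}(|\Phi\rangle\langle\Phi|)$ under the constraint given in Eq. (5.42). This is explicitly done in [150].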
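We will only state the result here, which leads to the function
$$\varepsilon_{U\bar U}(\rho) = \begin{cases} H(\gamma) + (1-\gamma)\log_2(d-1), & \mathrm{tr}(\rho\tilde F) \geq 1,\\ 0, & \mathrm{tr}(\rho\tilde F) < 1,\end{cases} \tag{5.43}$$
with
$$\gamma = \frac{1}{d^2}\left(\sqrt{\mathrm{tr}(\rho\tilde F)} + \sqrt{[d-1][d - \mathrm{tr}(\rho\tilde F)]}\right)^2. \tag{5.44}$$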

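For $d \geq 3$ this function is not convex (cf. Fig. 5.3), hence we get
Proposition 5.8. For any isotropic state $\rho$ the entanglement of formation is given as the convex hull
$$E_F(\rho) = \inf\left\{\sum_j\lambda_j\varepsilon_{U\bar U}(\rho_j) \,\middle|\, \rho = \sum_j\lambda_j\rho_j,\ P_{U\bar U}\rho_j = \rho_j\right\} \tag{5.45}$$
of the function $\varepsilon_{U\bar U}$ in Eq. (5.43).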
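Because isotropic states form an interval parametrized linearly by $t = \mathrm{tr}(\rho\tilde F) \in [0, d]$, the convex hull in (5.45) can be approximated numerically: sample $\varepsilon_{U\bar U}$ on a grid of $t$ and replace it by the lower convex envelope of the samples. The envelope is built with a standard monotone-chain construction (our choice of technique, not the method of [150]); the grid size and function names are assumptions. The printed gap measures how far $\varepsilon_{U\bar U}$ lies above its convex hull, i.e. where Proposition 5.8 differs from $\varepsilon_{U\bar U}$ itself.

```python
import math


def binary_entropy(x):
    """H(x) from Eq. (5.16), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def eps_isotropic(t, d):
    """Eqs. (5.43)-(5.44); t = tr(rho F~) lies in [0, d]."""
    if t < 1:
        return 0.0
    gamma = (math.sqrt(t) + math.sqrt((d - 1) * (d - t))) ** 2 / d ** 2
    gamma = min(max(gamma, 0.0), 1.0)          # guard against rounding
    return binary_entropy(gamma) + (1 - gamma) * math.log2(d - 1)


def lower_convex_hull(xs, ys):
    """Lower convex envelope of the points (xs[i], ys[i]), xs increasing,
    evaluated back on xs (monotone-chain construction + interpolation)."""
    hull = []
    for p in zip(xs, ys):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop hull[-1] if it lies on or above the chord hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    values, k = [], 0
    for x in xs:
        while hull[k + 1][0] < x:
            k += 1
        (x1, y1), (x2, y2) = hull[k], hull[k + 1]
        values.append(y1 + (y2 - y1) * (x - x1) / (x2 - x1))
    return values


def hull_gap(d, n=3001):
    """Largest amount by which eps exceeds its convex hull on the grid."""
    ts = [d * i / (n - 1) for i in range(n)]
    eps = [eps_isotropic(t, d) for t in ts]
    hull = lower_convex_hull(ts, eps)
    return max(e - h for e, h in zip(eps, hull))


for d in (2, 3, 4):
    print(d, hull_gap(d))   # about 0 for d = 2, clearly positive for d = 3, 4
```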

5.3.4. OO-invariant states
The results derived for isotropic and Werner states can now be extended to a large part of the set of OO-invariant states without solving new minimization problems. This is possible because the definition of $E_F$ in Eq. (5.13) allows, under some conditions, an easy extension to a suitable set of non-symmetric states. More precisely, if a non-trivial, minimizing decomposition $\rho = \sum_j p_j|\Psi_j\rangle\langle\Psi_j|$ of $\rho$ is known, all states $\rho'$ which are a convex linear combination of the same $|\Psi_j\rangle\langle\Psi_j|$ but with arbitrary weights $p'_j$ have the same $E_F$ as $\rho$ (see [159] for a proof of this statement). For the general scheme we have presented in Section 5.3.1 this implies the following: if we know the pure states $\Phi \in M_\rho$ which solve the minimization problem for $\varepsilon(\rho)$ in Eq. (5.34), we get a minimizing decomposition of $\rho$ in terms of $U \in G$ translated copies of $\Phi$. This follows from the fact that $\rho$ is, by definition of $M_\rho$, the twirl of $\Phi$. Hence any convex linear combination of pure states $U|\Phi\rangle\langle\Phi|U^*$ with $U \in G$ has the same $E_F$ as $\rho$.
A detailed analysis of the corresponding optimization problems in the case of Werner and isotropic states (which we have omitted here; see [159,150] instead) therefore leads to the following results about OO-invariant states: the space of OO-invariant states decomposes into four regions, the separable square and three triangles $A$, $B$, $C$; cf. Fig. 5.4. For all states $\rho$ in triangle $A$ we can calculate $E_F(\rho)$ as for Werner states in Proposition 5.7, and in triangle $B$ we have to apply the result for isotropic states from Proposition 5.8. This implies in particular that $E_F$ depends in $A$ only on $\mathrm{tr}(\rho F)$ and in $B$ only on $\mathrm{tr}(\rho\tilde F)$ and the dimension.
[Figure: $\varepsilon_{U\bar U}$ plotted against $\mathrm{tr}(\rho\tilde F)$ from 1 to 4 for $d = 2, 3, 4$; vertical axis from 0 to 2.]
Fig. 5.3. $\varepsilon$-function for isotropic states plotted as a function of the flip expectation. For $d > 2$ it is not convex near the right endpoint.
[Figure: state space of OO-invariant states, showing the separable square and the triangles A, B and C.]
Fig. 5.4. State space of OO-invariant states.


Fig. 5.5. Relative entropy of entanglement for Werner states, plotted as a function of the flip expectation tr(ρF).

5.3.5. Relative entropy of entanglement
Calculating E_R(ρ) for a symmetric state ρ is even easier than computing E_F(ρ), because we can restrict the minimization in the definition of E_R(ρ) in Eq. (5.14) to G-invariant separable states, provided G is a group of local unitaries. To see this, assume that σ ∈ D minimizes S(ρ|σ) for a G-invariant state ρ. Then we get S(ρ|UσU*) = S(ρ|σ) for all U ∈ G, since the relative entropy S is invariant under unitary transformations of both arguments and UρU* = ρ. Since P_Gσ is an average of the states UσU*, the convexity of S in its second argument even gives S(ρ|P_Gσ) ≤ S(ρ|σ). Hence P_Gσ minimizes S(ρ|·) as well, and since P_Gσ ∈ D holds for a group G of local unitaries, we get E_R(ρ) = S(ρ|P_Gσ) as stated.
The sets of Werner and isotropic states are just intervals, and the corresponding separable states form subintervals over which we have to perform the optimization. Due to the convexity of the relative entropy in both arguments, however, it is clear that the minimum is attained exactly at the boundary between entangled and separable states. For Werner states this is the state ρ_0 with tr(Fρ_0) = 0, i.e. it gives equal weight to both minimal projections. To get E_R(ρ) for a Werner state we therefore only have to calculate the relative entropy with respect to this state. Since all Werner states can be simultaneously diagonalized this is easily done and we get (cf. Fig. 5.5)
$$E_R(\rho) = 1 - H\Big(\frac{1 + \operatorname{tr}(F\rho)}{2}\Big) \qquad (5.46)$$
for each entangled Werner state ρ (i.e. tr(Fρ) < 0), where H(p) = S(p, 1 − p) denotes the binary entropy. Similarly, the boundary point ρ_1 for isotropic states is given by tr(F̃ρ_1) = 1, which leads to (cf. Fig. 5.6)
$$E_R(\rho) = \log_2 d - \Big(1 - \frac{\operatorname{tr}(\tilde F\rho)}{d}\Big)\log_2(d-1) - S\Big(\frac{\operatorname{tr}(\tilde F\rho)}{d},\ 1 - \frac{\operatorname{tr}(\tilde F\rho)}{d}\Big) \qquad (5.47)$$
for each entangled isotropic state ρ, and 0 if ρ is separable. (S(p_1, p_2) denotes here the entropy of the probability vector (p_1, p_2).)

Fig. 5.6. Relative entropy of entanglement for isotropic states and d = 2, 3, 4, plotted as a function of tr(ρF̃).
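Because all Werner (respectively isotropic) states are mixtures of the same two orthogonal projections, P_+ and P_− (respectively the maximally entangled projection |Ω⟩⟨Ω| and 𝟙 − |Ω⟩⟨Ω|), the relative entropy to the boundary state reduces to a two-term classical sum, and (5.46), (5.47) can be checked against it. The sketch below assumes F̃ = d|Ω⟩⟨Ω|, so that λ = tr(F̃ρ)/d is the weight of the maximally entangled projection; the function names are ours.

```python
import numpy as np

def binary_entropy(p):
    """H(p) = S(p, 1 - p) in bits, with 0 log 0 = 0."""
    return float(-sum(q * np.log2(q) for q in (p, 1.0 - p) if q > 0))

def er_werner(f):
    """Eq. (5.46); f = tr(F rho) in [-1, 1], entangled iff f < 0."""
    return 0.0 if f >= 0 else 1.0 - binary_entropy((1.0 + f) / 2.0)

def er_isotropic(ft, d):
    """Eq. (5.47); ft = tr(F~ rho) in [0, d], entangled iff ft > 1."""
    if ft <= 1:
        return 0.0
    lam = ft / d
    return float(np.log2(d) - (1.0 - lam) * np.log2(d - 1) - binary_entropy(lam))

def rel_entropy_blocks(w, r):
    """S(rho|sigma) for rho = sum_k w_k P_k/tr(P_k), sigma = sum_k r_k P_k/tr(P_k)
    with the same orthogonal projections P_k: it equals sum_k w_k log2(w_k / r_k)."""
    return float(sum(wk * np.log2(wk / rk) for wk, rk in zip(w, r) if wk > 0))

# Werner: weights ((1+f)/2, (1-f)/2) on (P_+, P_-); boundary state rho_0 has (1/2, 1/2)
print(er_werner(-0.6), rel_entropy_blocks([0.2, 0.8], [0.5, 0.5]))
```

The two printed numbers coincide, as do the isotropic values when the boundary state ρ_1 with weights (1/d, 1 − 1/d) is used as reference.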
Let us now consider OO-invariant states. As for the entanglement of formation, we divide the state space into the separable square and the three triangles A, B, C; cf. Fig. 5.4 (points are labeled by their coordinates (tr(ρF), tr(ρF̃))). The state at the coordinates (1, d) is a maximally entangled state, and all separable states on the line connecting (0, 1) with (1, 1) minimize the relative entropy for this state. Now consider a particular state σ on this line. The convexity property of the relative entropy immediately shows that σ is a minimizer for all states on the line connecting σ with the state at (1, d). In this way it is easy to calculate E_R(ρ) for all ρ in A. In a similar way we can treat the triangle B: We just have to draw a line from ρ to the state at (−1, 0) and find the minimizer for ρ at the intersection with the separable border between (0, 0) and (0, 1). For all states in the triangle C the relative entropy is minimized by the separable state at (0, 1).
An application of the scheme just reviewed is a proof that E_R is not additive, i.e. it does not satisfy Axiom E5b. To see this consider the state ρ = tr(P_−)^{−1}P_−, where P_− denotes the projector onto the antisymmetric subspace. It is a Werner state with flip expectation −1 (i.e. it corresponds to the point (−1, 0) in Fig. 5.4). According to our discussion above, S(ρ|·) is minimized in this case by the separable state ρ_0, and we get E_R(ρ) = 1 independently of the dimension d. The tensor product ρ^{⊗2} can be regarded as a state in S(H^{⊗2} ⊗ H^{⊗2}) with U ⊗ U ⊗ V ⊗ V symmetry, where U, V are unitaries on H. Note that the corresponding state space of UUVV-invariant states can be parameterized by the expectations of the three operators F ⊗ 𝟙, 𝟙 ⊗ F and F ⊗ F (cf. [159]), and we can apply the machinery just described to get the minimizer ρ̃ of S(ρ^{⊗2}|·). If d > 2 holds it turns out that
$$\tilde\rho = \frac{d+1}{2d\,\operatorname{tr}(P_+)^2}\,P_+\otimes P_+ + \frac{d-1}{2d\,\operatorname{tr}(P_-)^2}\,P_-\otimes P_- \qquad (5.48)$$
holds (where P_± denote the projections onto the symmetric and antisymmetric subspaces of H ⊗ H), and not ρ̃ = ρ_0 ⊗ ρ_0 as one would expect. Since ρ^{⊗2} = tr(P_−)^{−2} P_− ⊗ P_− is supported on P_− ⊗ P_−, where ρ̃ has weight (d − 1)/(2d), we get the inequality
$$E_R(\rho^{\otimes 2}) = S(\rho^{\otimes 2}|\tilde\rho) = 2 - \log_2\frac{2(d-1)}{d} < 2 = S(\rho^{\otimes 2}|\rho_0^{\otimes 2}) = 2E_R(\rho). \qquad (5.49)$$
The case d = 2 is special: there ρ_0^{⊗2} and ρ̃ (and all their convex linear combinations) give the same value 2. Hence for d > 2 the relative entropy of entanglement is, as stated, not additive.
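The values in (5.49) can be checked numerically. The sketch below builds P_± from the flip operator, forms ρ^{⊗2}, the state ρ̃ of Eq. (5.48) and ρ_0^{⊗2}, and evaluates both relative entropies by diagonalization. It takes the minimizing property of ρ̃ from the text rather than verifying it; the helper names are ours.

```python
import numpy as np

def flip(d):
    """Flip operator F on C^d ⊗ C^d: F(x ⊗ y) = y ⊗ x."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[j * d + i, i * d + j] = 1.0
    return F

def rel_entropy(rho, sigma, tol=1e-12):
    """S(rho|sigma) = tr rho (log2 rho - log2 sigma), assuming supp rho ⊆ supp sigma."""
    lr = np.linalg.eigvalsh(rho)
    s_rho = float(np.sum([l * np.log2(l) for l in lr if l > tol]))
    ls, vs = np.linalg.eigh(sigma)
    keep = ls > tol                                   # log2 sigma on its support only
    log_sigma = (vs[:, keep] * np.log2(ls[keep])) @ vs[:, keep].conj().T
    return s_rho - float(np.real(np.trace(rho @ log_sigma)))

def er_two_copies(d):
    """Return (S(rho⊗rho | rho_tilde), S(rho⊗rho | rho0⊗rho0)) for rho = P_-/tr(P_-)."""
    F = flip(d)
    one = np.eye(d * d)
    Pp, Pm = (one + F) / 2, (one - F) / 2
    tp, tm = np.trace(Pp), np.trace(Pm)
    rho = Pm / tm
    rho2 = np.kron(rho, rho)
    rho_tilde = ((d + 1) / (2 * d * tp**2)) * np.kron(Pp, Pp) \
        + ((d - 1) / (2 * d * tm**2)) * np.kron(Pm, Pm)       # Eq. (5.48)
    rho0 = (Pp / tp + Pm / tm) / 2                              # tr(F rho0) = 0
    return rel_entropy(rho2, rho_tilde), rel_entropy(rho2, np.kron(rho0, rho0))

for d in (2, 3, 4):
    print(d, er_two_copies(d))
```

For d = 2 both numbers equal 2, while for d = 3 the first one is log2 3 ≈ 1.585, strictly below 2E_R(ρ) = 2.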

6. Channel capacity
In Section 4.4 we have seen that it is possible to send (quantum) information undisturbed through a noisy quantum channel, if we encode one qubit into a (possibly long and highly entangled) string of qubits. This process is wasteful, since we have to use many instances of the channel to send just one qubit of quantum information. It is therefore natural to ask which resources we need at least if we are using the best possible error correction scheme. More precisely the question is: At which maximal rate, i.e. information sent per channel usage, can we transmit quantum information undisturbed through a noisy channel? This question naturally leads to the concept of channel capacities, which we will review in this section.
6.1. The general case
We are mainly interested in classical and quantum capacities. The basic ideas behind both situations are however quite similar. In this section we will therefore consider a general definition of capacity which applies to arbitrary channels and both kinds of information. (See also [168] as a general reference for this section.)
6.1.1. The definition
Consider two observable algebras A_1, A_2 and an arbitrary channel T: A_1 → A_2. To send systems described by a third observable algebra B undisturbed through T we need an encoding channel E: A_2 → B and a decoding channel D: B → A_1 such that ETD equals the ideal channel B → B, i.e. the identity on B. Note that the algebra B describing the systems to send and the input, respectively output, algebra of T need not be of the same type; e.g. B can be classical while A_1, A_2 are quantum (or vice versa).
In general (i.e. for arbitrary T and B) it is of course impossible to find such a pair E and D. In this case we are interested at least in encodings and decodings which make the error produced during the transmission as small as possible. To make this statement precise we need a measure for this error, and there are in fact many good choices for such a quantity (all of them leading to equivalent results, cf. Section 6.3.1). In the following we will use the “cb-norm difference” ‖ETD − Id‖_cb, where Id is the identity (i.e. ideal) channel on B and ‖·‖_cb denotes the norm of complete boundedness (“cb-norm” for short)
$$\|T\|_{cb} = \sup_{n\in\mathbb N}\|T\otimes\mathrm{Id}_n\|, \qquad \mathrm{Id}_n: B(\mathbb C^n)\to B(\mathbb C^n). \qquad (6.1)$$

The cb-norm remedies the sometimes annoying property of the usual operator norm that quantities like ‖T ⊗ Id_{B(C^d)}‖ may increase with the dimension d. On infinite-dimensional observable algebras ‖T‖_cb can be infinite although each term in the supremum is finite. A particular example of a map with such a behavior is the transposition on an infinite-dimensional Hilbert space. A map with finite cb-norm is called completely bounded. In a finite-dimensional setup each linear map is completely bounded. For the transposition Θ on C^d we have in particular ‖Θ‖_cb = d. The cb-norm has some nice features which we will use frequently; these include its multiplicativity ‖T_1 ⊗ T_2‖_cb = ‖T_1‖_cb ‖T_2‖_cb and the fact that ‖T‖_cb = 1 holds for each (unital) channel. Another useful relation is ‖T‖_cb = ‖T ⊗ Id_{B(H)}‖, which holds if T is a map B(H) → B(H). For more properties of the cb-norm let us refer to [125].
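The value ‖Θ‖_cb = d shows the gap between the norm and the cb-norm concretely: Θ alone preserves the operator norm, but Θ ⊗ Id_d maps the flip operator F (norm 1) to Σ_ij |jj⟩⟨ii| = d|Ω⟩⟨Ω| (norm d), with |Ω⟩ = d^{−1/2} Σ_i |ii⟩. The sketch below checks both facts numerically; it only exhibits the lower bound ‖Θ ⊗ Id_d‖ ≥ d, the matching upper bound is taken from the text.

```python
import numpy as np

def flip(d):
    """Flip operator F on C^d ⊗ C^d, F(x ⊗ y) = y ⊗ x."""
    F = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            F[j * d + i, i * d + j] = 1.0
    return F

def transpose_first_factor(X, d, n):
    """(Θ ⊗ Id_n)(X) for an operator X on C^d ⊗ C^n."""
    X4 = X.reshape(d, n, d, n)            # X4[i, k, j, l] = <i k| X |j l>
    return X4.transpose(2, 1, 0, 3).reshape(d * n, d * n)

def norm_ratio_plain(d, seed=0):
    """||Θ(X)|| / ||X|| for a random complex X: Θ alone preserves the operator norm."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return np.linalg.norm(X.T, 2) / np.linalg.norm(X, 2)

def norm_ratio_flip(d):
    """||(Θ ⊗ Id_d)(F)|| / ||F||, where (Θ ⊗ Id_d)(F) = sum_ij |jj><ii| has norm d."""
    F = flip(d)
    return np.linalg.norm(transpose_first_factor(F, d, d), 2) / np.linalg.norm(F, 2)

for d in (2, 3, 4):
    print(d, norm_ratio_plain(d), norm_ratio_flip(d))   # ratios ≈ 1 and ≈ d
```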
Now we can define the quantity
$$\Delta(T, B) = \inf_{E,D}\|ETD - \mathrm{Id}_B\|_{cb}, \qquad (6.2)$$
where the infimum is taken over all channels E: A_2 → B and D: B → A_1, and Id_B is again the ideal B-channel. Δ describes, as indicated above, the smallest possible error we have to take into account if we try to transmit one B system through one copy of the channel T using any encoding E and decoding D. In Section 4.4, however, we have seen that we can reduce the error if we take M copies of the channel instead of just one. More generally we are interested in the transmission of “codewords of length” N, i.e. B^{⊗N} systems, using M copies of the channel T. Encodings and decodings are in this case channels of the form E: A_2^{⊗M} → B^{⊗N} respectively D: B^{⊗N} → A_1^{⊗M}. If we increase the number M of channels, the error Δ(T^{⊗M}, B^{⊗N(M)}) decreases provided the rate with which N grows as a function of M is not too large. A more precise formulation of this idea leads to the following definition.
Definition 6.1. Let T be a channel and B an observable algebra. A number c ≥ 0 is called an achievable rate for T with respect to B if for any pair of sequences M_j, N_j, j ∈ N, with M_j → ∞ and lim sup_{j→∞} N_j/M_j < c we have
$$\lim_{j\to\infty}\Delta(T^{\otimes M_j}, B^{\otimes N_j}) = 0. \qquad (6.3)$$
The supremum of all achievable rates is called the capacity of T with respect to B and denoted by C(T, B).
Note that by definition c = 0 is an achievable rate, hence C(T, B) ≥ 0. If on the other hand each c ≥ 0 is achievable, we write C(T, B) = ∞. At first sight it seems cumbersome to check all pairs of sequences with given upper ratio when testing c. Due to some monotonicity properties of Δ, however, it can be shown that it is sufficient to check only one sequence, provided the M_j satisfy the additional condition M_j/M_{j+1} → 1.
6.1.2. Simple calculations
We see that there are in fact many different capacities of a given channel, depending on the type of information we want to transmit. However, there are only two different cases we are interested in: B can be either classical or quantum. We will discuss both special cases in greater detail in the next two sections. Before we do this, however, we will have a short look at some simple calculations which can be done in the general case. To this end it is convenient to introduce the shorthand notations
$$M_d = B(\mathbb C^d) \quad\text{and}\quad C_d = C(\{1,\dots,d\}), \qquad (6.4)$$
since some formulas would otherwise become a little clumsy. First of all let us have a look at capacities of ideal channels. If Id_{M_f} and Id_{C_f} denote the identity channels on the quantum algebra M_f, respectively the classical algebra C_f, we get
$$C(\mathrm{Id}_{C_f}, M_d) = 0, \qquad C(\mathrm{Id}_{C_f}, C_d) = C(\mathrm{Id}_{M_f}, M_d) = C(\mathrm{Id}_{M_f}, C_d) = \frac{\log_2 f}{\log_2 d}. \qquad (6.5)$$
The first equation is the channel capacity version of the no-teleportation theorem: It is impossible to transfer quantum information through a classical channel. The other equations follow simply by counting dimensions.
For the next relation it is convenient to associate to a pair of channels T, S the quantity C(T, S) which arises if we replace in Definition 6.1 and Eq. (6.2) the ideal channel Id_B by an arbitrary channel S. Hence C(T, S) is a slight generalization of the channel capacity which describes with which asymptotic rate the channel S can be approximated by T (and appropriate encodings and decodings). These generalized capacities satisfy the two-step coding inequality, i.e. for three channels T_1, T_2, T_3 we have
$$C(T_3, T_1) \ge C(T_2, T_1)\,C(T_3, T_2). \qquad (6.6)$$
To prove it consider the relations
$$\|T_1^{\otimes M} - E_1E_2T_3^{\otimes K}D_2D_1\|_{cb} = \|T_1^{\otimes M} - E_1T_2^{\otimes N}D_1 + E_1T_2^{\otimes N}D_1 - E_1E_2T_3^{\otimes K}D_2D_1\|_{cb} \qquad (6.7)$$
$$\le \|T_1^{\otimes M} - E_1T_2^{\otimes N}D_1\|_{cb} + \|E_1\|_{cb}\,\|T_2^{\otimes N} - E_2T_3^{\otimes K}D_2\|_{cb}\,\|D_1\|_{cb} \qquad (6.8)$$
$$\le \|T_1^{\otimes M} - E_1T_2^{\otimes N}D_1\|_{cb} + \|T_2^{\otimes N} - E_2T_3^{\otimes K}D_2\|_{cb}, \qquad (6.9)$$
where we have used for the last inequality the fact that the cb-norm of a channel is one. If c_1 is an achievable rate for T_2 with respect to T_1 (i.e. in the sense of C(T_2, T_1)) such that lim sup_{j→∞} M_j/N_j < c_1, and c_2 is an achievable rate for T_3 with respect to T_2 such that lim sup_{j→∞} N_j/K_j < c_2, we see that
$$\limsup_{j\to\infty}\frac{M_j}{K_j} = \limsup_{j\to\infty}\frac{M_jN_j}{N_jK_j} \le \limsup_{j\to\infty}\frac{M_j}{N_j}\ \limsup_{k\to\infty}\frac{N_k}{K_k}. \qquad (6.10)$$
If we choose the sequences M_j, N_j and K_j cleverly enough (cf. the remark following Definition 6.1), this implies that c_1c_2 is an achievable rate for T_3 with respect to T_1, and this proves Eq. (6.6).
As a first application of (6.6) we can relate all capacities C(T, M_d) (and C(T, C_d)) for different d to one another. If we choose T_3 = T, T_1 = Id_{M_d} and T_2 = Id_{M_f}, we get with (6.5) C(T, M_d) ≥ (log2 f/log2 d) C(T, M_f), and exchanging d with f shows that even equality holds.

A similar relation can be shown for C(T, C_d). Hence the dimension of the observable algebra B describing the type of information to be transmitted enters only via a multiplicative constant, i.e. it is only a choice of units, and we define the classical capacity Cc(T) and the quantum capacity Cq(T) of a channel T as
$$C_c(T) = C(T, C_2), \qquad C_q(T) = C(T, M_2). \qquad (6.11)$$
A second application of Eq. (6.6) is a relation between the classical and the quantum capacity of a channel. Setting T_3 = T, T_1 = Id_{C_2} and T_2 = Id_{M_2}, we get, again with (6.5),
$$C_q(T) \le C_c(T). \qquad (6.12)$$
Note that it is now not possible to interchange the roles of C_2 and M_2, since C(Id_{C_2}, M_2) = 0 by (6.5). Hence no equality follows here; indeed, for the ideal classical channel Id_{C_2} we have Cc = 1 but Cq = 0.
Another useful relation concerns concatenated channels: We transmit information of type B first through a channel T_1 and then through a second channel T_2. It is reasonable to assume that the capacity of the composition T_2T_1 cannot be bigger than the capacity of the channel with the smaller bandwidth. This conjecture is indeed true and known as the “bottleneck inequality”:
$$C(T_2T_1, B) \le \min\{C(T_1, B), C(T_2, B)\}. \qquad (6.13)$$
To see this consider an encoding and a decoding channel E, respectively D, for (T_2T_1)^{⊗M}, i.e. in the definition of C(T_2T_1, B) we look at
$$\|\mathrm{Id}_B^{\otimes N} - E(T_2T_1)^{\otimes M}D\|_{cb} = \|\mathrm{Id}_B^{\otimes N} - (ET_2^{\otimes M})\,T_1^{\otimes M}D\|_{cb}. \qquad (6.14)$$
This implies that ET_2^{⊗M} and D are an encoding and a decoding channel for T_1^{⊗M}. Similarly, E and T_1^{⊗M}D are an encoding and a decoding channel for T_2^{⊗M}. Hence each achievable rate for T_2T_1 is also an achievable rate for T_2 and for T_1, and this proves Eq. (6.13).
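Finally, we want to consider two channels T_1, T_2 in parallel, i.e. we consider the tensor product T_1 ⊗ T_2. If E_j, D_j, j = 1, 2, are encoding, respectively decoding, channels for T_1^{⊗M} and T_2^{⊗M} such that ‖Id_B^{⊗N_j} − E_jT_j^{⊗M}D_j‖_cb ≤ ε holds, we get (identifying (T_1 ⊗ T_2)^{⊗M} with T_1^{⊗M} ⊗ T_2^{⊗M})
$$\|\mathrm{Id} - E_1\otimes E_2\,(T_1\otimes T_2)^{\otimes M}\,D_1\otimes D_2\|_{cb} = \|\mathrm{Id} - \mathrm{Id}\otimes(E_2T_2^{\otimes M}D_2) + \mathrm{Id}\otimes(E_2T_2^{\otimes M}D_2) - E_1\otimes E_2\,(T_1\otimes T_2)^{\otimes M}\,D_1\otimes D_2\|_{cb} \qquad (6.15)$$
$$\le \|\mathrm{Id}\otimes(\mathrm{Id} - E_2T_2^{\otimes M}D_2)\|_{cb} + \|(\mathrm{Id} - E_1T_1^{\otimes M}D_1)\otimes E_2T_2^{\otimes M}D_2\|_{cb} \qquad (6.16)$$
$$\le \|\mathrm{Id} - E_2T_2^{\otimes M}D_2\|_{cb} + \|\mathrm{Id} - E_1T_1^{\otimes M}D_1\|_{cb} \le 2\varepsilon, \qquad (6.17)$$
where the last line uses the multiplicativity of the cb-norm and ‖E_2T_2^{⊗M}D_2‖_cb = 1. Hence c_1 + c_2 is achievable for T_1 ⊗ T_2 if c_j is achievable for T_j. This implies the inequality
$$C(T_1\otimes T_2, B) \ge C(T_1, B) + C(T_2, B). \qquad (6.18)$$
When all channels are ideal, or when all systems involved are classical, even equality holds, i.e. channel capacities are additive in this case. However, if quantum channels are considered, it is one of the big open problems of the field to decide under which conditions additivity holds.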


6.2. The classical capacity
In this section we will discuss the classical capacity Cc(T) of a channel T. There are in fact three different cases to consider: T can be either classical or quantum, and in the quantum case we can use either ordinary encodings and decodings or a dense coding scheme (cf. Section 4.1.3).
6.2.1. Classical channels
Let us first consider a classical-to-classical channel T: C(Y) → C(X). This is basically the situation of classical information theory, and we will only have a short look here, mainly to show how this (well known) situation fits into the general scheme described in the last section (see footnote 20).
First of all we have to calculate the error quantity Δ(T, C_2) defined in Eq. (6.2). As stated in Section 3.2.3, T is completely determined by its transition probabilities T_xy, (x, y) ∈ X × Y, describing the probability to receive y ∈ Y when x ∈ X was sent (so that Σ_y T_xy = 1). Since the cb-norm for a classical algebra coincides with the ordinary norm we get (we have set X = Y for this calculation)
$$\|\mathrm{Id} - T\|_{cb} = \|\mathrm{Id} - T\| = \sup_{x,f}\Big|\sum_y(\delta_{xy} - T_{xy})f_y\Big| \qquad (6.19)$$
$$= 2\sup_x(1 - T_{xx}), \qquad (6.20)$$
where the supremum in the first equation is taken over all f ∈ C(X) with ‖f‖ = sup_y |f_y| ≤ 1. We see that the quantity in Eq. (6.20) is exactly twice the maximal error probability, i.e. the maximal probability of sending x and getting anything different. Inserting this quantity for Δ in Definition 6.1, applied to a classical channel T and the “bit-algebra” B = C_2, we get exactly Shannon's classical definition of the capacity of a discrete memoryless channel [138].
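Equation (6.20) can be confirmed by brute force for small alphabets: the supremum in (6.19) is over a linear expression in f, so it is attained at a vertex f ∈ {−1, 1}^X of the unit cube. A minimal sketch, using the convention T[x, y] = probability to receive y when x was sent (rows sum to one); the function names are ours.

```python
import itertools
import numpy as np

def cb_error_bruteforce(T):
    """||Id - T|| for a classical channel, maximizing |((Id - T) f)(x)| over
    all x and all vertices f of the unit cube {-1, +1}^n."""
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    A = np.eye(n) - T
    return float(max(np.max(np.abs(A @ np.array(f)))
                     for f in itertools.product((-1.0, 1.0), repeat=n)))

def cb_error_formula(T):
    """Eq. (6.20): twice the maximal error probability."""
    T = np.asarray(T, dtype=float)
    return float(2 * np.max(1 - np.diag(T)))

bsc = [[0.9, 0.1], [0.1, 0.9]]                 # binary symmetric channel
print(cb_error_bruteforce(bsc), cb_error_formula(bsc))
```

Both functions return 0.2 (up to rounding) for the binary symmetric channel with flip probability 0.1, i.e. twice its error probability.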
Hence we can apply Shannon's noisy channel coding theorem to calculate Cc(T) for a classical channel. To state it we first have to introduce some terminology. Consider therefore a state p ∈ C*(X) of the classical input algebra C(X) and its image q = T*(p) ∈ C*(Y) under the channel. p and q are probability distributions on X, respectively Y, and p_x can be interpreted as the probability that the “letter” x ∈ X was sent. Similarly q_y = Σ_x T_xy p_x is the probability that y ∈ Y was received, and P_xy = T_xy p_x is the probability that x ∈ X was sent and y ∈ Y was received. The family of all P_xy can be interpreted as a probability distribution P on X × Y, and the T_xy can be regarded as conditional probabilities of P under the condition x. Now we can introduce the mutual information
$$I(p, T) = S(p) + S(q) - S(P) = \sum_{(x,y)\in X\times Y}P_{xy}\log_2\frac{P_{xy}}{p_xq_y}, \qquad (6.21)$$
where S(p), S(q) and S(P) denote the entropies of p, q and P. The mutual information describes, roughly speaking, the information that p and q contain about each other. E.g. if p and q are completely uncorrelated (i.e. P_xy = p_x q_y) we get I(p, T) = 0. If on the other hand T is an ideal bit-channel and p is equally distributed, we have I(p, T) = 1. Now we can state Shannon's theorem, which expresses the classical capacity of T in terms of mutual informations [138]:
Theorem 6.2 (Shannon). The classical capacity Cc(T) of a classical communication channel T: C(Y) → C(X) is given by
$$C_c(T) = \sup_p I(p, T), \qquad (6.22)$$
where the supremum is taken over all states p ∈ C*(X).

Footnote 20: Please note that this implies in particular that we do not give a complete review of the foundations of classical information theory here; cf. [101,62,49] instead.
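Theorem 6.2 turns the capacity into a finite-dimensional concave maximization over p. The text does not say how to solve it; a standard choice, swapped in here, is the Blahut–Arimoto iteration, which repeatedly rescales p_x by 2^{D_x}, where D_x is the relative entropy (in bits) of the row T[x, :] with respect to the current output distribution q. A sketch with T[x, y] = Prob(y received | x sent); names and the iteration count are our choices.

```python
import numpy as np

def mutual_information(p, T):
    """Eq. (6.21) with T[x, y] = Prob(y received | x sent)."""
    P = p[:, None] * T                     # joint distribution P_xy
    q = P.sum(axis=0)                      # output distribution q_y
    prod = p[:, None] * q[None, :]
    m = P > 0
    return float(np.sum(P[m] * np.log2(P[m] / prod[m])))

def blahut_arimoto(T, iters=5000):
    """Approximate Cc(T) = sup_p I(p, T) (Theorem 6.2) by Blahut-Arimoto iteration."""
    T = np.asarray(T, dtype=float)
    p = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(iters):
        q = p @ T
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(T > 0, np.log2(T / q), 0.0)
        D = np.sum(T * ratio, axis=1)      # D_x = relative entropy of T[x, :] w.r.t. q, in bits
        p = p * np.exp2(D)
        p /= p.sum()
    return mutual_information(p, T), p

# Z-channel: input 0 is always received correctly, input 1 is flipped with probability 1/2
cap, p_opt = blahut_arimoto([[1.0, 0.0], [0.5, 0.5]])
```

For this Z-channel the iteration converges to log2(1.25) ≈ 0.3219 bits with optimal input distribution (0.6, 0.4); for symmetric channels such as the binary symmetric channel the uniform start is already optimal.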
6.2.2. Quantum channels
If we transmit classical data through a quantum channel T: B(H) → B(H), the encoding E: B(H) → C_2 is a parameter-dependent preparation and the decoding D: C_2 → B(H) is an observable. Hence the composition ETD is a channel C_2 → C_2, i.e. a purely classical channel, and we can calculate its capacity in terms of Shannon's theorem (Theorem 6.2). This observation leads to the definition of the “one-shot” classical capacity of T:
$$C_{c,1}(T) = \sup_{E,D}C_c(ETD), \qquad (6.23)$$
where the supremum is taken over all encodings and decodings of classical bits. The term “one-shot” in this definition arises from the fact that we apparently need only one invocation of the channel T. However, many uses of the channel are hidden in the definition of the classical capacity on the right-hand side. Hence Cc,1(T) can alternatively be defined in the same way as Cc(T) except that no entanglement is allowed during encoding and decoding; more precisely, in Definition 6.1 we consider only encodings E: B(H)^{⊗M} → C_2^{⊗N} which prepare separable states and only decodings D: C_2^{⊗N} → B(H)^{⊗M} which lead to separable observables. It is not yet known whether entangled codings can help to increase the transmission rate. Therefore we only know that
$$C_{c,1}(T) \le C_c(T) = \sup_{M\in\mathbb N}\frac{1}{M}C_{c,1}(T^{\otimes M}) \qquad (6.24)$$
holds. One reason why Cc,1(T) is an interesting quantity is that we have, due to the following theorem by Holevo [80], a computable expression for it.
Theorem 6.3. The one-shot classical capacity Cc,1(T) of a quantum channel T: B(H) → B(H) is given by
$$C_{c,1}(T) = \sup_{p_j,\rho_j}\Big[S\Big(\sum_j p_jT^*[\rho_j]\Big) - \sum_j p_jS(T^*[\rho_j])\Big], \qquad (6.25)$$
where the supremum is taken over all probability distributions p_j and collections of density operators ρ_j.
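The bracket in (6.25) is easy to evaluate for any fixed ensemble {p_j, ρ_j}; each ensemble therefore gives a lower bound on Cc,1(T), while the supremum itself requires an optimization over ensembles. A minimal sketch (function names are ours), tested with the depolarizing channel ρ ↦ (1 − ϑ)ρ + ϑ tr(ρ)𝟙/d discussed in Section 6.2.4:

```python
import numpy as np

def von_neumann_entropy(rho, tol=1e-12):
    """S(rho) = -tr rho log2 rho from the eigenvalues (0 log 0 := 0)."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]
    return float(-np.sum(ev * np.log2(ev)))

def holevo_chi(probs, states, channel):
    """Bracket of Eq. (6.25) for one ensemble: S(sum_j p_j T*[rho_j]) - sum_j p_j S(T*[rho_j])."""
    outputs = [channel(r) for r in states]
    avg = sum(p * o for p, o in zip(probs, outputs))
    return von_neumann_entropy(avg) - sum(p * von_neumann_entropy(o)
                                          for p, o in zip(probs, outputs))

def depolarizing(theta, d):
    """Schrödinger picture: rho -> (1 - theta) rho + theta tr(rho) 1/d."""
    return lambda rho: (1 - theta) * rho + theta * np.trace(rho) * np.eye(d) / d

basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]       # equiprobable qubit basis states
chi = holevo_chi([0.5, 0.5], basis, depolarizing(0.5, 2))
```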
6.2.3. Entanglement assisted capacity
Another classical capacity of a quantum channel arises if we use dense coding schemes instead of simple encodings and decodings to transmit the data through the channel T. In other words, we can define the entanglement enhanced classical capacity Ce(T) in the same way as Cc(T), but by replacing the encoding and decoding channels in Definition 6.1 and Eq. (6.2) by dense coding protocols. Note that this implies that the sender Alice and the receiver Bob share an (arbitrary) amount of (maximally) entangled states prior to the transmission.


For this quantity a coding theorem was recently proven by Bennett and others [18], which we want to state in the following. To this end assume that we are transmitting systems in the state ρ ∈ B*(H) through the channel and that ρ has the purification ψ ∈ H ⊗ H, i.e. ρ = tr_1|ψ⟩⟨ψ| = tr_2|ψ⟩⟨ψ|. Then we can define the entropy exchange
$$S(\rho, T) = S\big[(T\otimes\mathrm{Id})(|\psi\rangle\langle\psi|)\big]. \qquad (6.26)$$
The density operator (T ⊗ Id)(|ψ⟩⟨ψ|) has the output state T*(ρ) and the input state ρ as its partial traces. It can therefore be regarded as the quantum analog of the joint input/output probability distribution P_xy defined in Section 6.2.1. Another way to look at S(ρ, T) is in terms of an ancilla representation of T: If T*(ρ) = tr_K(U(ρ ⊗ ρ_K)U*) with a unitary U on H ⊗ K and a pure environment state ρ_K, it can be shown [7] that S(ρ, T) = S[T_K*(ρ)], where T_K is the channel describing the information transfer into the environment, i.e. T_K*(ρ) = tr_H(U(ρ ⊗ ρ_K)U*). In other words, S(ρ, T) is the final entropy of the environment. Now we can define
$$I(\rho, T) = S(\rho) + S(T^*\rho) - S(\rho, T), \qquad (6.27)$$
which is the quantum analog of the mutual information given in Eq. (6.21). It has a number of nice properties, in particular positivity, concavity with respect to the input state, and additivity [2], and its maximum with respect to ρ actually coincides with Ce(T) [18].
Theorem 6.4. The entanglement assisted capacity Ce(T) of a quantum channel T: B(H) → B(H) is given by
$$C_e(T) = \sup_\rho I(\rho, T), \qquad (6.28)$$
where the supremum is taken over all input states ρ ∈ B*(H).
Due to the nice additivity properties of the quantum mutual information I(ρ, T), the capacity Ce(T) is known to be additive as well. This implies that it coincides with the corresponding “one-shot” capacity, and this is an essential simplification compared to the classical capacity Cc(T).
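The two descriptions of the entropy exchange, via a purification as in (6.26) and as the final entropy of the environment, can be compared numerically when T* is given by Kraus operators K_i, T*(ρ) = Σ_i K_i ρ K_i*; the environment state is then the matrix W_ij = tr(K_i ρ K_j*). The Kraus representation and the amplitude-damping channel used in the check are assumptions of this sketch, not taken from the text.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]
    return float(-np.sum(ev * np.log2(ev)))

def purification(rho):
    """psi = sum_k sqrt(lambda_k) v_k ⊗ e_k, so that tr_2 |psi><psi| = rho."""
    lam, vecs = np.linalg.eigh(rho)
    d = rho.shape[0]
    ref = np.eye(d)
    return sum(np.sqrt(max(lam[k], 0.0)) * np.kron(vecs[:, k], ref[k]) for k in range(d))

def entropy_exchange(rho, kraus):
    """Eq. (6.26): entropy of (T ⊗ Id)(|psi><psi|) with T*(rho) = sum_i K_i rho K_i^*."""
    d = rho.shape[0]
    psi = purification(rho)
    vecs = [np.kron(K, np.eye(d)) @ psi for K in kraus]
    return entropy(sum(np.outer(v, v.conj()) for v in vecs))

def environment_entropy(rho, kraus):
    """Final entropy of the environment, W_ij = tr(K_i rho K_j^*)."""
    W = np.array([[np.trace(Ki @ rho @ Kj.conj().T) for Kj in kraus] for Ki in kraus])
    return entropy(W)

def quantum_mutual_information(rho, kraus):
    """Eq. (6.27): I(rho, T) = S(rho) + S(T*rho) - S(rho, T)."""
    out = sum(K @ rho @ K.conj().T for K in kraus)
    return entropy(rho) + entropy(out) - entropy_exchange(rho, kraus)

gamma = 0.3                                        # amplitude damping (example channel)
kraus = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
         np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
```

By Theorem 6.4, maximizing `quantum_mutual_information` over ρ would give Ce(T); the sketch only evaluates it at a given input.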
6.2.4. Examples
Although the expressions in Theorems 6.3 and 6.4 are much easier than the original definitions, they still involve optimization problems over possibly large parameter spaces. Nevertheless, there are special cases which allow explicit calculations. As a first example we will consider the “quantum erasure channel”, which transmits the d-dimensional input state intact with probability 1 − ϑ, while with probability ϑ it replaces the state by an “erasure symbol”, i.e. a (d + 1)th pure state ψ_e which is orthogonal to all others [72]. In the Schrödinger picture this is
$$B^*(\mathbb C^d) \ni \rho \mapsto T^*(\rho) = (1-\vartheta)\rho + \vartheta\operatorname{tr}(\rho)\,|\psi_e\rangle\langle\psi_e| \in B^*(\mathbb C^{d+1}). \qquad (6.29)$$
This example is unusual, because all capacities discussed up to now (including the quantum capacity, as we will see in Section 6.3.2) can be calculated explicitly: We get Cc,1(T) = Cc(T) = (1 − ϑ) log2(d) for the classical and Ce(T) = 2Cc(T) for the entanglement enhanced classical capacity [15,17]. Hence the gain by entanglement assistance is exactly a factor of two; cf. Fig. 6.1.
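A quick consistency check of Ce(T) = 2(1 − ϑ) log2 d: the sketch writes (6.29) with Kraus operators (our choice: √(1 − ϑ) times the embedding C^d → C^{d+1}, and √ϑ |ψ_e⟩⟨i| for each basis vector i) and evaluates I(ρ, T) from (6.27) at the maximally mixed input ρ = 𝟙/d. That this input is optimal is assumed here, in line with the covariance-and-concavity argument given for the depolarizing channel.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]
    return float(-np.sum(ev * np.log2(ev)))

def erasure_kraus(theta, d):
    """Kraus operators C^d -> C^(d+1) for Eq. (6.29); basis index d is the erasure state."""
    V = np.vstack([np.eye(d), np.zeros((1, d))])     # embedding C^d -> C^(d+1)
    kraus = [np.sqrt(1 - theta) * V]
    for i in range(d):
        K = np.zeros((d + 1, d))
        K[d, i] = np.sqrt(theta)                      # |psi_e><i|
        kraus.append(K)
    return kraus

def mutual_info_maximally_mixed(theta, d):
    """I(1/d, T) = S(1/d) + S(T*(1/d)) - S(1/d, T) for the erasure channel."""
    kraus = erasure_kraus(theta, d)
    rho = np.eye(d) / d
    psi = np.eye(d).reshape(d * d) / np.sqrt(d)      # maximally entangled purification
    out = sum(K @ rho @ K.T for K in kraus)
    joint = sum(np.outer(np.kron(K, np.eye(d)) @ psi, np.kron(K, np.eye(d)) @ psi)
                for K in kraus)
    return entropy(rho) + entropy(out) - entropy(joint)

print(mutual_info_maximally_mixed(0.3, 2))           # 2 * 0.7 * log2(2) = 1.4
```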

Fig. 6.1. Capacities of the quantum erasure channel (classical capacity Cc(T), entanglement enhanced classical capacity Ce(T), quantum capacity Cq(T)) plotted as a function of the error probability ϑ.

Our next example is the depolarizing channel
$$B^*(\mathbb C^d) \ni \rho \mapsto T^*(\rho) = (1-\vartheta)\rho + \vartheta\operatorname{tr}(\rho)\frac{\mathbb 1}{d} \in B^*(\mathbb C^d), \qquad (6.30)$$
already discussed in Section 3.2. It is more interesting, and more difficult, to study. In particular, it is not known whether Cc and Cc,1 coincide in this case (i.e. the value of Cc is not known). Therefore we can compare Ce(T) only with Cc,1(T). Using the unitary covariance of T (cf. Section 3.2.2) we see first that I(UρU*, T) = I(ρ, T) holds for all unitaries U (to calculate S(UρU*, T) note that (U ⊗ U)ψ is a purification of UρU* if ψ is a purification of ρ). Due to the concavity of I(ρ, T) in the first argument we can average over all unitaries and see that the maximum in Eq. (6.28) is achieved on the maximally mixed state. A straightforward calculation therefore shows that
$$C_e(T) = \log_2(d^2) + \Big(1 - \vartheta\frac{d^2-1}{d^2}\Big)\log_2\Big(1 - \vartheta\frac{d^2-1}{d^2}\Big) + \vartheta\frac{d^2-1}{d^2}\log_2\frac{\vartheta}{d^2} \qquad (6.31)$$
holds, while we have
$$C_{c,1}(T) = \log_2(d) + \Big(1 - \vartheta\frac{d-1}{d}\Big)\log_2\Big(1 - \vartheta\frac{d-1}{d}\Big) + \vartheta\frac{d-1}{d}\log_2\frac{\vartheta}{d}, \qquad (6.32)$$
where the maximum in Eq. (6.25) is achieved for an ensemble of equiprobable pure states taken from an orthonormal basis in H [82]. This is plausible, since the first term under the sup in Eq. (6.25) becomes maximal and the second becomes minimal: Σ_j p_j T*ρ_j is maximally mixed in this case and its entropy is therefore maximal, while the entropies of the T*ρ_j are minimal if the ρ_j are pure. In Fig. 6.2 we have plotted both capacities as a function of the noise parameter ϑ, and in Fig. 6.3 we have plotted the quotient Ce(T)/Cc,1(T), which gives an upper bound on the gain we get from entanglement assistance.
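Formulas (6.31) and (6.32) can be cross-checked against the spectra they come from: (T ⊗ Id)(|Ω⟩⟨Ω|) = (1 − ϑ)|Ω⟩⟨Ω| + ϑ𝟙/d² for the maximally entangled purification of 𝟙/d, and T*(|0⟩⟨0|) = (1 − ϑ)|0⟩⟨0| + ϑ𝟙/d for a basis state. A sketch (names are ours) that also reproduces the behavior of the ratio in Fig. 6.3, which starts at 2 for ϑ = 0 and approaches 3 as ϑ → 1 for qubits:

```python
import numpy as np

def xlog2x(x):
    """x log2 x with 0 log 0 = 0 (tiny negative round-off also mapped to 0)."""
    return x * np.log2(x) if x > 0 else 0.0

def ce_formula(theta, d):
    """Eq. (6.31)."""
    a = theta * (d * d - 1) / (d * d)
    return np.log2(d * d) + xlog2x(1 - a) + (d * d - 1) * xlog2x(theta / (d * d))

def cc1_formula(theta, d):
    """Eq. (6.32)."""
    b = theta * (d - 1) / d
    return np.log2(d) + xlog2x(1 - b) + (d - 1) * xlog2x(theta / d)

def entropy(rho):
    return float(-sum(xlog2x(v) for v in np.linalg.eigvalsh(rho)))

def ce_direct(theta, d):
    """I(1/d, T) = 2 log2 d - S((T ⊗ Id)(|Omega><Omega|))."""
    v = np.eye(d).reshape(-1)
    exchange = (1 - theta) * np.outer(v, v) / d + theta * np.eye(d * d) / (d * d)
    return 2 * np.log2(d) - entropy(exchange)

def cc1_direct(theta, d):
    """Holevo quantity of the equiprobable basis ensemble: log2 d - S(T*(|0><0|))."""
    e0 = np.zeros((d, d))
    e0[0, 0] = 1.0
    return np.log2(d) - entropy((1 - theta) * e0 + theta * np.eye(d) / d)

for th in (0.0, 0.5, 0.999):
    print(th, ce_formula(th, 2) / cc1_formula(th, 2))
```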


Fig. 6.2. Entanglement enhanced and one-shot classical capacity (Ce(T) and Cc,1(T)) of a depolarizing qubit channel.

Fig. 6.3. Gain Ce(T)/Cc,1(T) of using entanglement assisted versus unassisted classical capacity for a depolarizing qubit channel.

As a third example we want to consider the Gaussian channels defined in Section 3.3.4. Hence consider the Hilbert space H = L²(R) describing a one-dimensional harmonic oscillator (or one mode of the electromagnetic field) and the amplification/attenuation channel T defined in Eq. (3.74). The results we want to state concern a slight modification of the original definitions of Cc,1(T) and Ce(T): We will consider capacities for channels with constrained input. This means that only a restricted class of states on the input Hilbert space of the channel is allowed for encoding. In our case this means that we will consider the constraint tr(ρaa*) ≤ N for a positive real number N > 0 and with the usual creation and annihilation operators a*, a. This can be rewritten as an energy constraint for a quadratic Hamiltonian; hence this is a physically realistic restriction.

Fig. 6.4. One-shot and entanglement enhanced classical capacity of a Gaussian amplification/attenuation channel with N_c = 0 and input noise N = 10.
For the entanglement enhanced capacity it can be shown that the maximum in Eq. (6.28) is attained on Gaussian states. To get Ce(T) it is therefore sufficient to calculate the quantum mutual information I(ρ_N, T) for the Gaussian state ρ_N from Eq. (3.64). The details can be found in [84,18]; we will only state the results here. With the abbreviation
$$g(x) = (x+1)\log_2(x+1) - x\log_2 x \qquad (6.33)$$
we get S(ρ_N) = g(N) and S(T*[ρ_N]) = g(N') with N' = k²N + max{0, k² − 1} + N_c (cf. Eq. (3.75)) for the entropies of the input and output states, and
$$S(\rho_N, T) = g\Big(\frac{D + N' - N - 1}{2}\Big) + g\Big(\frac{D - N' + N - 1}{2}\Big) \qquad (6.34)$$
with
$$D = \sqrt{(N + N' + 1)^2 - 4k^2N(N+1)} \qquad (6.35)$$
for the entropy exchange. Combining the three terms according to Eq. (6.27) gives Ce(T) = g(N) + g(N') − S(ρ_N, T), which we have plotted in Fig. 6.4 as a function of k.
To calculate the one-shot capacity Cc,1(T), the optimization in Eq. (6.25) has to be carried out over probability distributions p_j and collections of density operators ρ_j such that Σ_j p_j tr(aa*ρ_j) ≤ N holds. It is conjectured but not yet proven [84] that the maximum is achieved on coherent states with Gaussian probability distribution p(x) = (πN)^{−1} exp(−|x|²/N). If this is true we get
$$C_{c,1}(T) = g(N') - g(N_0) \quad\text{with}\quad N_0 = \max\{0, k^2 - 1\} + N_c. \qquad (6.36)$$
The result is plotted as a function of k in Fig. 6.4, and the ratio G = Ce/Cc,1 in Fig. 6.5. G gives an upper bound on the gain of using entanglement assisted versus unassisted classical capacity.

Fig. 6.5. Gain of using entanglement assisted versus unassisted classical capacity for a Gaussian amplification/attenuation channel with N_c = 0 and input noise N = 0.1, 1, 10.
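Formulas (6.33) to (6.36) are straightforward to evaluate; note that the Cc,1 value relies on the coherent-state conjecture mentioned above. A sketch (function and parameter names are ours; N_c defaults to 0 as in Figs. 6.4 and 6.5). For k = 1 and N_c = 0 the channel is ideal, D = 1, the entropy exchange vanishes and the gain is exactly 2; for k = 0 both capacities vanish.

```python
import numpy as np

def g(x):
    """Eq. (6.33); the clamp guards against tiny negative round-off."""
    x = max(float(x), 0.0)
    return (x + 1) * np.log2(x + 1) - (x * np.log2(x) if x > 0 else 0.0)

def gaussian_capacities(k, N, Nc=0.0):
    """Return (Ce, Cc1) from Eqs. (6.33)-(6.36) for amplification/attenuation k,
    input constraint N and additional noise Nc."""
    n_out = k**2 * N + max(0.0, k**2 - 1) + Nc                          # N'
    D = np.sqrt((N + n_out + 1) ** 2 - 4 * k**2 * N * (N + 1))          # Eq. (6.35)
    s_exchange = g((D + n_out - N - 1) / 2) + g((D - n_out + N - 1) / 2)  # Eq. (6.34)
    ce = g(N) + g(n_out) - s_exchange                                   # Eq. (6.27)
    n0 = max(0.0, k**2 - 1) + Nc
    cc1 = g(n_out) - g(n0)                                              # Eq. (6.36), conjectural
    return float(ce), float(cc1)

for k in (0.5, 1.0, 1.5):
    ce, cc1 = gaussian_capacities(k, 10.0)
    print(k, ce, cc1, ce / cc1)           # gain G = Ce / Cc,1
```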
6.3. The quantum capacity
The quantum capacity of a quantum channel T: B(H) → B(H) is more difficult to treat than the classical capacities discussed in the last section. There is, in particular, no coding theorem available which would allow explicit calculations. Nevertheless, there are partial results available, which we will review in the following.
6.3.1. Alternative definitions
Let us start with two alternative definitions of Cq(T). The first one, proposed by Bennett [16], differs only in the error quantity which should go to zero. Instead of the cb-norm the minimal fidelity is used. For a channel T: B(H) → B(H) and a subspace H' ⊂ H it is defined as
$$F_p(H', T) = \inf_{\psi\in H',\,\|\psi\|=1}\langle\psi, T[|\psi\rangle\langle\psi|]\,\psi\rangle, \qquad (6.37)$$
and if H' = H holds we simply write F_p(T). Hence a number c is an achievable rate if
$$\lim_{j\to\infty}F_p(E_jT^{\otimes M_j}D_j) = 1 \qquad (6.38)$$

holds for sequences
$$E_j: B(H)^{\otimes M_j}\to M_2^{\otimes N_j}, \qquad D_j: M_2^{\otimes N_j}\to B(H)^{\otimes M_j}, \qquad j\in\mathbb N \qquad (6.39)$$
of encodings and decodings and sequences of integers M_j, N_j, j ∈ N, satisfying the same constraints as in Definition 6.1 (in particular lim sup_{j→∞} N_j/M_j < c). The equivalence to our version of Cq(T) now follows from the estimates [168]
$$\|T - \mathrm{Id}\| \le \|T - \mathrm{Id}\|_{cb} \le 4\sqrt{\|T - \mathrm{Id}\|}, \qquad (6.40)$$
$$\|T - \mathrm{Id}\| \le 4\sqrt{1 - F_p(T)} \le 4\sqrt{\|T - \mathrm{Id}\|}. \qquad (6.41)$$

A second version of Cq(T) is given in [7]. To state it, let us first define a quantum source as a sequence ρ_N, N ∈ N, of density operators ρ_N ∈ B*(K^{⊗N}) (with an appropriate Hilbert space K), and the entropy rate of this source as lim sup_{N→∞} S(ρ_N)/N. In addition we need the entanglement fidelity of a state ρ (with respect to a channel T)
$$F_e(\rho, T) = \langle\psi, (T\otimes\mathrm{Id})[|\psi\rangle\langle\psi|]\,\psi\rangle, \qquad (6.42)$$
where ψ is the purification of ρ. Now we define c ≥ 0 to be achievable if there is a quantum source ρ_N, N ∈ N, with entropy rate c such that
$$\lim_{N\to\infty}F_e(\rho_N, E_NT^{\otimes N}D_N) = 1 \qquad (6.43)$$
holds with encodings and decodings
$$E_N: B(H)^{\otimes N}\to B(K^{\otimes N}), \qquad D_N: B(K^{\otimes N})\to B(H)^{\otimes N}, \qquad N\in\mathbb N. \qquad (6.44)$$
Note that these E_N, D_N play a slightly different role than the E_j, D_j in Eq. (6.39) (and in Definition 6.1), because the number of tensor factors of the input and the output algebra is always identical, while in Eq. (6.39) the quotients of these numbers lead to the achievable rate. To relate both definitions we have to derive from the ρ_N an appropriately chosen family of subspaces H_N ⊂ K^{⊗N} such that the minimal fidelities F_p(H_N, E_NT^{⊗N}D_N) of these subspaces go to 1 as N → ∞. If we identify the H_N with tensor products of C² and the E_j, D_j of Eq. (6.39) with restrictions of E_N, D_N to these tensor products, we recover Eq. (6.38). A precise implementation of this rough idea can be found in [6], and it shows that both definitions just discussed are indeed equivalent.
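For the depolarizing channel (6.30) both fidelities have closed forms: by unitary covariance every pure input has the same fidelity 1 − ϑ + ϑ/d, so F_p(T) = 1 − ϑ(d − 1)/d, and for the maximally mixed input F_e(𝟙/d, T) = 1 − ϑ + ϑ/d². The sketch below (names are ours) checks both numerically; the infimum in (6.37) is only estimated by sampling random unit vectors, which suffices here because the integrand is constant.

```python
import numpy as np

def depolarize(rho, theta):
    """Schrödinger picture of Eq. (6.30)."""
    d = rho.shape[0]
    return (1 - theta) * rho + theta * np.trace(rho) * np.eye(d) / d

def min_pure_fidelity_sample(theta, d, samples=200, seed=0):
    """Sampled estimate of F_p(T), Eq. (6.37), over random unit vectors psi."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(samples):
        psi = rng.normal(size=d) + 1j * rng.normal(size=d)
        psi /= np.linalg.norm(psi)
        rho = np.outer(psi, psi.conj())
        best = min(best, float(np.real(psi.conj() @ depolarize(rho, theta) @ psi)))
    return best

def entanglement_fidelity_maximally_mixed(theta, d):
    """Eq. (6.42) for rho = 1/d, purified by psi = sum_i e_i ⊗ e_i / sqrt(d)."""
    psi = np.eye(d).reshape(d * d) / np.sqrt(d)
    X = np.outer(psi, psi)
    tr1 = np.einsum("ijil->jl", X.reshape(d, d, d, d))   # partial trace over the first factor
    out = (1 - theta) * X + theta * np.kron(np.eye(d) / d, tr1)   # (T ⊗ Id)(X)
    return float(psi @ out @ psi)

print(min_pure_fidelity_sample(0.3, 2), entanglement_fidelity_maximally_mixed(0.3, 2))
```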
6.3.2. Upper bounds and achievable rates
Although there is no coding theorem for the quantum capacity Cq (T ), there is a fairly good
candidate which is related to the coherent information
J ( ; T ) = S(T ∗ ) − S( ; T ) :

(6.45)

Here S(T ∗ ) is the entropy of the output state and S( ; T ) is the entropy exchange deÿned in
Eq. (6.26). It is argued [7] that J ( ; T ) plays a role in quantum information theory which is analogous


521

to that of the (classical) mutual information (6.21) in classical information theory. J ( ; T ) has some
nasty properties, however: it can be negative [41] and it is known to be not additive [54]. To relate
it to Cq (T ) it is therefore not su cient to consider a one-shot capacity as in the Shannons Theorem
(Theorem 6.2). Instead, we have to deÿne
$$C_s(T) = \sup_N \frac{1}{N}\, C_{s,1}(T^{\otimes N}) \quad\text{with}\quad C_{s,1}(T) = \sup_\rho J(\rho, T). \tag{6.46}$$

In [7,8] it is shown that Cs (T ) is an upper bound on Cq (T ). Equality, however, is conjectured but
not yet proven, although there are good heuristic arguments [110,90].
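For finite-dimensional channels the single-letter quantity $J(\rho, T)$ from (6.45) can be computed directly. A minimal NumPy sketch, assuming (as an illustration, not as the method of the text) a Kraus representation $T^*\rho = \sum_i K_i\rho K_i^*$: the output entropy comes from $\sum_i K_i\rho K_i^*$, and the entropy exchange is evaluated via the standard identity $S(\rho, T) = S(W)$ with $W_{ij} = \operatorname{tr}(K_i\rho K_j^*)$. Only $C_{s,1}$ is accessible this way; the regularization over $N$ in (6.46) is not.

```python
import numpy as np

def entropy(m):
    """von Neumann entropy (base 2) of a positive Hermitian matrix."""
    ev = np.linalg.eigvalsh(m)
    ev = ev[ev > 1e-12]                    # drop numerically zero eigenvalues
    return float(-np.sum(ev * np.log2(ev)))

def coherent_information(rho, kraus):
    """J(rho, T) = S(T* rho) - S(rho, T) for T* rho = sum_i K_i rho K_i^*."""
    output = sum(K @ rho @ K.conj().T for K in kraus)
    w = np.array([[np.trace(Ki @ rho @ Kj.conj().T) for Kj in kraus] for Ki in kraus])
    return entropy(output) - entropy(w)

# Identity channel on a maximally mixed qubit: J = S(rho) = 1.
print(coherent_information(np.eye(2) / 2, [np.eye(2)]))
# Completely depolarizing qubit channel (Kraus operators P/2 for the Paulis P): J = 1 - 2 = -1.
paulis = [np.eye(2), np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.diag([1, -1])]
print(coherent_information(np.eye(2) / 2, [P / 2 for P in paulis]))
```

The second example shows concretely that $J$ can be negative, as remarked above.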
A second interesting quantity which provides an upper bound on the quantum capacity uses the transposition operation $\Theta$ on the output systems. More precisely it is shown in [84] that
$$C_q(T) \leq C_\theta(T) = \log_2 \|T\Theta\|_{cb} \tag{6.47}$$
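holds for any channel. In contrast to many other calculations in this field it is particularly easy to derive this relation from properties of the cb-norm. Hence we are able to give a proof here. We start with the fact that $\|\Theta\|_{cb} = d$ if $d$ is the dimension of the Hilbert space on which $\Theta$ operates. Assume that $N_j/M_j \to c \leq C_q(T)$ and $j$ large enough such that $\|\mathrm{Id}_2^{\otimes N_j} - E_j T^{\otimes M_j} D_j\|_{cb} \leq \epsilon_j$ (with $\epsilon_j \to 0$) for appropriate encodings and decodings $E_j$, $D_j$. We get
$$\begin{aligned}
2^{N_j} = \|\mathrm{Id}_2^{\otimes N_j}\Theta\|_{cb} &\leq \|(\mathrm{Id}_2^{\otimes N_j} - E_j T^{\otimes M_j} D_j)\Theta\|_{cb} + \|E_j T^{\otimes M_j} D_j\Theta\|_{cb} &\qquad(6.48)\\
&\leq 2^{N_j}\,\|\mathrm{Id}_2^{\otimes N_j} - E_j T^{\otimes M_j} D_j\|_{cb} + \|E_j (T\Theta)^{\otimes M_j}\,\Theta D_j\Theta\|_{cb} &\qquad(6.49)\\
&\leq 2^{N_j}\epsilon_j + \|T\Theta\|_{cb}^{M_j}, &\qquad(6.50)
\end{aligned}$$
where we have used for the last inequality the fact that $\Theta D_j\Theta$ and $E_j$ are channels and that the cb-norm is multiplicative. Taking logarithms on both sides we get
$$\frac{N_j}{M_j} + \frac{\log_2(1-\epsilon_j)}{M_j} \leq \log_2\|T\Theta\|_{cb}. \tag{6.51}$$
In the limit $j \to \infty$ this implies $c \leq \log_2\|T\Theta\|_{cb}$ and therefore $C_q(T) \leq \log_2\|T\Theta\|_{cb} = C_\theta(T)$, as stated.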
Since $C_\theta(T)$ is an upper bound on $C_q(T)$, it is particularly useful to check whether the quantum capacity of a particular channel is zero. If, e.g., $T$ is classical we have $T\Theta = T$, since the transposition coincides on a classical algebra $\mathbb{C}^d$ with the identity (elements of $\mathbb{C}^d$ are just diagonal matrices). This implies $C_\theta(T) = \log_2\|T\Theta\|_{cb} = \log_2\|T\|_{cb} = 0$, because the cb-norm of a channel is 1. We see therefore that the quantum capacity of a classical channel is 0; this is just another proof of the no-teleportation theorem. A slightly more general result concerns channels $T = RS$ which are the composition of a preparation $R: M_d \to \mathbb{C}^f$ and a subsequent measurement $S: \mathbb{C}^f \to M_d$. It is easy to see that $T\Theta$ is again a channel: composing the preparation or the measurement with $\Theta$ yields again a preparation, respectively a measurement (transposed density matrices and effects are again density matrices and effects), and $\Theta$ acts as the identity on the classical algebra $\mathbb{C}^f$. Again we get $C_\theta(T) = 0$.
Let us now consider some examples. The simplest case is again the quantum erasure channel from Eq. (6.29). As for the classical capacities, its quantum capacity can be explicitly calculated [15] and we have $C_q(T) = \max(0, (1-2\vartheta)\log_2 d)$; cf. Fig. 6.1.
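The erasure-channel formula can be tabulated directly. A small standard-library sketch (the function name is hypothetical; $\vartheta$ is taken to be the erasure probability of (6.29)), which makes visible that the quantum capacity vanishes as soon as $\vartheta \geq 1/2$:

```python
import math

def erasure_quantum_capacity(theta, d):
    """C_q(T) = max(0, (1 - 2 theta) log2 d) for the d-level erasure channel with erasure probability theta."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be a probability")
    return max(0.0, (1.0 - 2.0 * theta) * math.log2(d))

for theta in (0.0, 0.25, 0.5, 0.75):
    print(theta, erasure_quantum_capacity(theta, 2))
```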

[Figure: curves labelled "one-shot coherent information" ($C_{s,1}(T)$), "transposition bound" ($C_\theta(T)$) and "Hamming bound", plotted for $\vartheta$ from 0 to 1 on a vertical scale from 0 to 1.]
Fig. 6.6. $C_\theta(T)$, $C_s(T)$ and the Hamming bound of a depolarizing qubit channel plotted as a function of the noise parameter $\vartheta$.

For the depolarizing channel (6.30) precise calculations of $C_q(T)$ are not available. Hence let us first consider the coherent information. $J(\rho, T)$ inherits from $T$ its unitary covariance, i.e. we have $J(U\rho U^*, T) = J(\rho, T)$. In contrast to the mutual information, however, it does not have nice concavity properties, which makes the optimization over all input states more difficult. Nevertheless, the calculation of $J(\rho, T)$ is straightforward and we get in the qubit case (if $\vartheta$ is the noise parameter of $T$ and $\lambda$ is the highest eigenvalue of $\rho$):
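$$J(\rho, T) = S\Big((1-\vartheta)\lambda + \frac{\vartheta}{2}\Big) + S\Big((1-\vartheta)(1-\lambda) + \frac{\vartheta}{2}\Big) - S\Big(\frac{1-\vartheta/2+A}{2}\Big) - S\Big(\frac{1-\vartheta/2-A}{2}\Big) - S\Big(\frac{\lambda\vartheta}{2}\Big) - S\Big(\frac{(1-\lambda)\vartheta}{2}\Big), \tag{6.52}$$
where $S(x) = -x\log_2(x)$ denotes again the entropy function and
$$A = \sqrt{(2\lambda-1)^2(1-\vartheta/2)^2 + 4\lambda(1-\lambda)(1-\vartheta)^2}. \tag{6.53}$$
The first two terms make up the output entropy $S(T^*\rho)$, the remaining four the entropy exchange $S(\rho, T)$.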

Optimization over $\lambda$ can be performed at least numerically (the maximum is attained at the left boundary ($\lambda = 1/2$) if $J$ is positive there, and at the right boundary ($\lambda = 1$, where $J = 0$) otherwise). The result is plotted together with $C_\theta(T)$ in Fig. 6.6 as a function of $\vartheta$. The quantity $C_\theta(T)$ is much easier to compute and we get
$$C_\theta(T) = \max\Big\{0, \log_2\Big(2 - \frac{3\vartheta}{2}\Big)\Big\}. \tag{6.54}$$
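The closed form (6.52) and (6.53) can be cross-checked against a direct density-matrix computation, and then optimized over $\lambda$ on a grid. The NumPy sketch below assumes the depolarizing qubit channel in the form $T^*\rho = (1-\vartheta)\rho + \vartheta\mathbb{1}/2$ and the input $\rho = \operatorname{diag}(\lambda, 1-\lambda)$ (allowed by unitary covariance); all function names are hypothetical, and the grid maximum is only a numerical approximation of $C_{s,1}(T)$.

```python
import numpy as np

def s_term(x):
    """Entropy function S(x) = -x log2 x, extended by S(x) = 0 for x <= 0."""
    return float(-x * np.log2(x)) if x > 0 else 0.0

def coherent_info_closed_form(lam, theta):
    """Eqs. (6.52) and (6.53): J(rho, T) for the depolarizing qubit channel."""
    a = np.sqrt((2 * lam - 1) ** 2 * (1 - theta / 2) ** 2 + 4 * lam * (1 - lam) * (1 - theta) ** 2)
    output = s_term((1 - theta) * lam + theta / 2) + s_term((1 - theta) * (1 - lam) + theta / 2)
    exchange = (s_term((1 - theta / 2 + a) / 2) + s_term((1 - theta / 2 - a) / 2)
                + s_term(lam * theta / 2) + s_term((1 - lam) * theta / 2))
    return output - exchange

def entropy(m):
    """von Neumann entropy in bits."""
    return sum(s_term(x) for x in np.linalg.eigvalsh(m))

def coherent_info_matrices(lam, theta):
    """The same quantity from density matrices, for rho = diag(lam, 1 - lam)."""
    rho = np.diag([lam, 1 - lam])
    psi = np.array([np.sqrt(lam), 0.0, 0.0, np.sqrt(1 - lam)])  # purification of rho
    output = (1 - theta) * rho + theta * np.eye(2) / 2             # T* rho
    joint = (1 - theta) * np.outer(psi, psi) + theta * np.kron(np.eye(2) / 2, rho)
    return entropy(output) - entropy(joint)

def transposition_bound(theta):
    """Eq. (6.54)."""
    return max(0.0, float(np.log2(2 - 1.5 * theta)))

def one_shot(theta, lams=np.linspace(0.5, 1.0, 201)):
    """Grid approximation of C_{s,1}(T) = max over lambda of J."""
    return max(coherent_info_closed_form(lam, theta) for lam in lams)

for theta in (0.0, 0.1, 0.2, 0.3):
    print(theta, one_shot(theta), transposition_bound(theta))
```

At $\lambda = 1/2$ the entropy exchange has eigenvalues $1 - 3\vartheta/4$ and $\vartheta/4$ (three times), so $J$ reduces there to $1 - H(3\vartheta/4) - (3\vartheta/4)\log_2 3$.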
To get a lower bound on $C_q(T)$ we have to show that a certain rate $r \leq C_q(T)$ can be achieved with an appropriate sequence
$$E_M : M_d^{\otimes M} \to M_2^{\otimes N(M)}, \qquad M, N(M) \in \mathbb{N} \tag{6.55}$$
of error correcting codes and corresponding decodings $D_M$, i.e. we need
$$\lim_{M\to\infty} N(M)/M = r \quad\text{and}\quad \lim_{M\to\infty}\|E_M T^{\otimes M} D_M - \mathrm{Id}\|_{cb} = 0. \tag{6.56}$$

To find such a sequence, note first that we can look at the depolarizing channel as a device which produces an error with probability $\vartheta$ and leaves the quantum information intact otherwise. If more and more copies of $T$ are used in parallel, i.e. if $M$ goes to infinity, the number of errors therefore approaches $\vartheta M$. In other words, the probability to have substantially more than $\vartheta M$ errors vanishes asymptotically. To see this consider
$$T^{\otimes M} = \big((1-\vartheta)\mathrm{Id} + \vartheta d^{-1}\operatorname{tr}(\cdot)\mathbb{1}\big)^{\otimes M} = \sum_{K=0}^{M} \vartheta^K(1-\vartheta)^{M-K}\, T^{(M)}_K, \tag{6.57}$$
where $T^{(M)}_K$ denotes the sum of all $M$-fold tensor products with $d^{-1}\operatorname{tr}(\cdot)\mathbb{1}$ on $K$ places and $\mathrm{Id}$ on the $M-K$ remaining ones, i.e. $\binom{M}{K}^{-1}T^{(M)}_K$ is a channel which produces exactly $K$ errors on $M$ transmitted systems. Now we have
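$$\begin{aligned}
&\Big\|T^{\otimes M} - \sum_{K\leq\vartheta M}\vartheta^K(1-\vartheta)^{M-K}\,T^{(M)}_K\Big\|_{cb} &\qquad(6.58)\\
&\quad= \Big\|\sum_{K>\vartheta M}\vartheta^K(1-\vartheta)^{M-K}\,T^{(M)}_K\Big\|_{cb} &\qquad(6.59)\\
&\quad\leq \sum_{K>\vartheta M}\vartheta^K(1-\vartheta)^{M-K}\,\big\|T^{(M)}_K\big\|_{cb} &\qquad(6.60)\\
&\quad\leq \sum_{K>\vartheta M}\binom{M}{K}\vartheta^K(1-\vartheta)^{M-K} = R. &\qquad(6.61)
\end{aligned}$$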

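The quantity $R$ is the tail of a binomial series and therefore vanishes in the limit $M \to \infty$ (cf. e.g. [131, Appendix B]); strictly speaking this requires replacing the threshold $\vartheta M$ by $(\vartheta+\delta)M$ with an arbitrary $\delta > 0$, since the binomial weight beyond the mean alone tends to $1/2$, but this margin does not affect the resulting rates in the limit $\delta \to 0$. This shows that for $M \to \infty$ only terms $T^{(M)}_K$ with $K \leq \vartheta M$ are relevant in Eq. (6.57); in other words at most $\vartheta M$ errors occur asymptotically, as stated. This implies that we need a sequence of codes $E_M$ which encode $N(M)$ qubits and correct $\vartheta M$ errors on $M$ places.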
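The decay of the binomial tail $R$ can be checked numerically. This standard-library sketch (function name hypothetical) sums the tail in log space to avoid overflow of the binomial coefficients. It illustrates that the probability of more than $(\vartheta+\delta)M$ errors decays quickly for a fixed margin $\delta > 0$, whereas the tail beyond exactly $\vartheta M$ stays close to $1/2$, so the threshold $\vartheta M$ in the argument above has to be read with such a margin.

```python
import math

def binomial_tail(M, theta, threshold):
    """Probability of more than `threshold` errors among M uses, each failing independently with probability theta."""
    total = 0.0
    for K in range(math.floor(threshold) + 1, M + 1):
        log_term = (math.lgamma(M + 1) - math.lgamma(K + 1) - math.lgamma(M - K + 1)
                    + K * math.log(theta) + (M - K) * math.log1p(-theta))
        total += math.exp(log_term)
    return total

theta, delta = 0.1, 0.05
for M in (100, 1000, 4000):
    # tail beyond theta*M (stays near 1/2) versus tail beyond (theta + delta)*M (decays)
    print(M, binomial_tail(M, theta, theta * M), binomial_tail(M, theta, (theta + delta) * M))
```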
One way to get such a sequence is "random coding"; the classical version of this method is well known from the proof of Shannon's theorem. The idea is, basically, to generate error correcting codes of a certain type randomly. E.g. we can generate a sequence of random graphs with $N(M)$ input and $M$ output vertices (cf. Section 4.4). If we can show that the corresponding codes correct (asymptotically) $\vartheta M$ errors, the corresponding rate $r = \lim_{M\to\infty} N(M)/M$ is achievable. For the depolarizing channel$^{21}$ such an analysis, using randomly generated stabilizer codes, shows [16,71]
$$C_q(T) \geq 1 - H(\vartheta) - \vartheta\log_2 3, \tag{6.62}$$
where $H$ is the binary entropy from Eq. (5.16). This bound can be further improved using a more clever coding strategy; cf. [54].

[Figure: curves labelled "one-shot coherent information" ($C_{s,1}(T)$) and "transposition bound" ($C_\theta(T)$), plotted for $k$ from 0 to 2 on a vertical scale from 0 to 3.5.]
Fig. 6.7. $C_\theta(T)$ and $C_s(T)$ of a Gaussian amplification/attenuation channel as a function of the amplification parameter $k$.

21
With a more thorough discussion, similar results can be obtained for a much more general class of channels, e.g. all $T$ in a neighborhood of the identity channel; cf. [114].
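The random-coding lower bound (6.62) and the transposition upper bound (6.54) can be compared directly. A standard-library sketch (function names hypothetical) locates, by bisection, the noise level at which (6.62) stops giving a positive rate; presumably this is the curve labelled "Hamming bound" in Fig. 6.6. The gap between this point (near $\vartheta \approx 0.189$) and $\vartheta = 2/3$, where (6.54) vanishes, is the region in which the sign of $C_q(T)$ is not settled by these two bounds alone.

```python
import math

def binary_entropy(p):
    """H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def random_coding_rate(theta):
    """Unclipped right-hand side of Eq. (6.62)."""
    return 1.0 - binary_entropy(theta) - theta * math.log2(3)

def transposition_bound(theta):
    """Eq. (6.54)."""
    return max(0.0, math.log2(2 - 1.5 * theta))

def bisect_root(f, lo, hi, tol=1e-12):
    """Find a zero of f on [lo, hi], assuming f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

root = bisect_root(random_coding_rate, 1e-9, 0.5)
print(f"bound (6.62) vanishes near theta = {root:.4f}; bound (6.54) vanishes at theta = 2/3")
```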
As a third example let us consider again the Gaussian channel studied already in Section 6.2.4. For $C_\theta(T)$ we have (the corresponding calculation is not trivial and uses properties of Gaussian channels which we have not discussed; cf. [84])
$$C_\theta(T) = \max\{0, \log_2(k^2+1) - \log_2(|k^2-1| + 2N_c)\} \tag{6.63}$$
and we see that $C_\theta(T)$ and therefore $C_q(T)$ become zero if $N_c$ is large enough (by Eq. (6.63) this happens exactly when $N_c \geq \min\{1, k^2\}$, in particular whenever $N_c \geq \max\{1, k^2\}$).
The coherent information for the Gaussian state $\rho_N$ from Eq. (3.64) has the form
$$J(\rho_N, T) = g(N') - g\Big(\frac{D + N' - N - 1}{2}\Big) - g\Big(\frac{D - N' + N - 1}{2}\Big) \tag{6.64}$$
with $N'$, $D$ and $g$ as in Section 6.2.4. It increases with $N$ and we can therefore calculate the maximum over all Gaussian states (which might differ from $C_s(T)$) as
$$C_G(T) = \lim_{N\to\infty} J(\rho_N, T) = \log_2 k^2 - \log_2|k^2-1| - g\Big(\frac{N_c}{|k^2-1|}\Big). \tag{6.65}$$

We have plotted both quantities in Fig. 6.7 as a function of k.
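Both Gaussian bounds are explicit enough to evaluate. The sketch below is hedged in one important respect: it assumes that $g$ from Section 6.2.4 is the thermal-state entropy $g(x) = (x+1)\log_2(x+1) - x\log_2 x$, which is not restated here. As a consistency check of that assumption, the $k \to 1$ limit of (6.65) reproduces the value $-\log_2(N_c e)$ quoted for the special case $k = 1$. The example $k = 1/2$ also shows that (6.63) already vanishes at $N_c = k^2 = 1/4$. Function names are hypothetical.

```python
import math

def g(x):
    """Assumed form of g: entropy (bits) of a thermal oscillator state with mean photon number x."""
    if x <= 0.0:
        return 0.0
    return math.log2(1.0 + x) + x * math.log2(1.0 + 1.0 / x)  # = (x+1)log2(x+1) - x log2 x

def transposition_bound(k, nc):
    """Eq. (6.63)."""
    return max(0.0, math.log2(k ** 2 + 1) - math.log2(abs(k ** 2 - 1) + 2 * nc))

def gaussian_coherent_info(k, nc):
    """Eq. (6.65); only defined for k != 1."""
    return math.log2(k ** 2) - math.log2(abs(k ** 2 - 1)) - g(nc / abs(k ** 2 - 1))

for k in (0.5, 0.9, 1.5, 2.0):
    print(k, gaussian_coherent_info(k, 0.1), transposition_bound(k, 0.1))
```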
Finally let us have a short look at the special case $k = 1$, i.e. $T$ describes in this case only the influence of classical Gaussian noise on the transmitted systems. If we set $k = 1$ in Eq. (6.64) and take the limit $N \to \infty$ we get $C_G(T) = -\log_2(N_c e)$, and $C_\theta(T)$ becomes $C_\theta(T) = \max\{0, -\log_2(N_c)\}$; both quantities are plotted in Fig. 6.8. This special case is interesting because the one-shot coherent information $C_G(T)$ is achievable, provided the noise parameter $N_c$ satisfies certain conditions$^{22}$ [77]. Hence there is strong evidence that the quantum capacity lies between the two lines in Fig. 6.8.

[Figure: curves labelled "one-shot coherent information" ($C_{s,1}(T)$) and "transposition bound" ($C_\theta(T)$), plotted for $N_c$ from 0 to 1 on a vertical scale from 0 to 8.]
Fig. 6.8. $C_\theta(T)$ and $C_s(T)$ of a Gaussian amplification/attenuation channel as a function of the noise parameter $N_c$ (and with $k = 1$).
6.3.3. Relations to entanglement measures
The duality lemma proved in Section 2.3.3 provides an interesting way to derive bounds on channel capacities and capacity-like quantities from entanglement measures (and vice versa) [16,90]: To derive a state of a bipartite system from a channel $T$ we can take a maximally entangled state $\Psi \in \mathcal{H}\otimes\mathcal{H}$, send one particle through $T$ and get a less entangled pair in the state $\rho_T = (\mathrm{Id}\otimes T^*)|\Psi\rangle\langle\Psi|$. If on the other hand an entangled state $\rho \in \mathcal{S}(\mathcal{H}\otimes\mathcal{H})$ is given, we can use it as a resource for teleportation and get a channel $T_\rho$. The two maps $T \mapsto \rho_T$ and $\rho \mapsto T_\rho$ are, however, not inverse to one another. This can be seen easily from the duality lemma (Theorem 2.10): For each state $\rho \in \mathcal{S}(\mathcal{H}\otimes\mathcal{H})$ there is a channel $T$ and a pure state $\psi \in \mathcal{H}\otimes\mathcal{H}$ such that $\rho = (\mathrm{Id}\otimes T^*)|\psi\rangle\langle\psi|$ holds; but $\psi$ is in general not maximally entangled (and is uniquely determined by $\rho$). Nevertheless, there are special cases in which the state derived from $T_\rho$ coincides with $\rho$: a particular class of examples is given by teleportation channels derived from a Bell-diagonal state.
On $\rho_T$ we can evaluate an entanglement measure $E(\rho_T)$ and get in this way a quantity which is related to the capacity of $T$. A particularly interesting candidate for $E$ is the "one-way LOCC" distillation rate $E_{D,\to}$. It is defined in the same way as the entanglement of distillation $E_D$, except that only one-way LOCC operations are allowed in Eq. (5.8). According to [16], $E_{D,\to}$ is related to $C_q$ by the inequalities $E_{D,\to}(\rho) \geq C_q(T_\rho)$ and $E_{D,\to}(\rho_T) \leq C_q(T)$. Hence if $\rho_{T_\rho} = \rho$ we can calculate $E_{D,\to}(\rho)$ in terms of $C_q(T_\rho)$ and vice versa.
22
It is only shown that $\log_2\lfloor 1/(N_c e)\rfloor$ can be achieved, where $\lfloor x\rfloor$ denotes the largest integer not exceeding $x$. It is very likely, however, that this is only a restriction of the methods used in the proof and not of the result.


A second interesting example is the transposition bound $C_\theta(T)$ introduced in the last subsection. It is related to the logarithmic negativity [158]
$$E_\theta(\rho_T) = \log_2\|(\mathrm{Id}\otimes\Theta)\rho_T\|_1, \tag{6.66}$$
which measures the degree to which the partial transpose of $\rho_T$ fails to be positive. $E_\theta$ can be regarded as an entanglement measure, although it has some drawbacks: it is not LOCC monotone (Axiom E2), it is not convex (Axiom E3) and, most severely, it does not coincide with the reduced von Neumann entropy on pure states, which we have considered as "the" entanglement measure for pure states. On the other hand, it is easy to calculate and it gives bounds on distillation rates and teleportation capacities [158]. In addition $E_\theta$ can be used together with the relation between depolarizing channels and isotropic states to derive Eq. (6.54) in a very simple way.
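The isotropic-state route to (6.54) can be checked numerically. A hedged NumPy sketch with hypothetical helper names: it builds $\rho_T = (\mathrm{Id}\otimes T^*)|\Omega\rangle\langle\Omega|$ for the depolarizing qubit channel $T^*\rho = (1-\vartheta)\rho + \vartheta\mathbb{1}/2$ and a maximally entangled $\Omega$ (this gives an isotropic state), takes the partial transpose and evaluates the logarithmic negativity (6.66). For this channel the values coincide with $C_\theta(T)$ from (6.54); the code computes no cb-norms and makes no claim about general channels.

```python
import numpy as np

def partial_transpose_second(rho, d):
    """(Id (x) Theta) rho for a density matrix on C^d (x) C^d."""
    r = rho.reshape(d, d, d, d)                    # r[i, j, k, l] = <i j| rho |k l>
    return r.transpose(0, 3, 2, 1).reshape(d * d, d * d)

def log_negativity(rho, d):
    """E_theta(rho) = log2 of the trace norm of the partial transpose, cf. Eq. (6.66)."""
    eigenvalues = np.linalg.eigvalsh(partial_transpose_second(rho, d))
    return float(np.log2(np.sum(np.abs(eigenvalues))))

def depolarizing_choi_state(theta):
    """rho_T = (Id (x) T*)|Omega><Omega| for the qubit depolarizing channel."""
    omega = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
    return (1 - theta) * np.outer(omega, omega) + theta * np.eye(4) / 4

for theta in (0.0, 0.3, 0.6, 0.9):
    print(theta, log_negativity(depolarizing_choi_state(theta), 2))
```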

7. Multiple inputs
We have seen in Section 4 that many tasks of quantum information which are impossible with one-shot operations can be approximated by channels which operate on a large number of equally prepared inputs. Typical examples are approximate cloning, undoing noise and distillation of entanglement. There are basically two questions which are interesting for a quantitative analysis: First, we can search for the optimal solutions for a fixed number $N$ of input systems, and second we can ask for the asymptotic behavior in the limit $N \to \infty$. In the latter case the asymptotic rate, i.e. the number of outputs (of a certain quality) per input system, is of particular interest.
7.1. The general scheme
Both types of questions just mentioned can be treated (up to a certain degree) independently of the (impossible) task we are dealing with. In the following we will study the corresponding general scheme. Hence consider a channel $T: \mathcal{B}(\mathcal{H}^{\otimes M}) \to \mathcal{B}(\mathcal{H}^{\otimes N})$ which operates on $N$ input systems and produces $M$ outputs of the same type. Our aim is to optimize a "figure of merit" $\mathcal{F}(T)$ which measures the deviation of $T^*(\rho^{\otimes N})$ from the target functional we want to approximate. The particular type of device we are considering is mainly fixed by the choice of $\mathcal{F}(T)$, and we will discuss in the following the most relevant examples. (Note that we have considered them already on a qualitative level in Section 4; cf. in particular Sections 4.2 and 4.3.)
7.1.1. Figures of merit
Let us start with pure state cloning [68,31,32,35,166,98], i.e. for each (unknown) pure input state $\rho = |\psi\rangle\langle\psi|$, $\psi \in \mathcal{H}$, the $M$ clones $T^*(\rho^{\otimes N})$ produced by the channel $T$ should approximate $M$ copies of the input in the common state $\rho^{\otimes M}$ as well as possible. There are in fact two different possibilities to measure the distance of $T^*(\rho^{\otimes N})$ to $\rho^{\otimes M}$. We can either check the quality of each clone separately or we can test in addition the correlations between output systems. With the notation
$$\rho^{(j)} = \mathbb{1}^{\otimes(j-1)}\otimes\rho\otimes\mathbb{1}^{\otimes(M-j)} \in \mathcal{B}(\mathcal{H}^{\otimes M}) \tag{7.1}$$

a figure of merit for the first case is given by
$$\mathcal{F}_{c,1}(T) = \inf_{j=1,\dots,M}\ \inf_{\rho\ \text{pure}}\operatorname{tr}\big(\rho^{(j)}\,T^*(\rho^{\otimes N})\big). \tag{7.2}$$
It measures the worst one-particle fidelity of the output state $T^*(\rho^{\otimes N})$. If we are interested in correlations too, we have to choose
$$\mathcal{F}_{c,\text{all}}(T) = \inf_{\rho\ \text{pure}}\operatorname{tr}\big(\rho^{\otimes M}\,T^*(\rho^{\otimes N})\big), \tag{7.3}$$
which is again a "worst case" fidelity, but now of the full output with respect to $M$ uncorrelated copies of the input $\rho$.
Instead of fidelities we can consider other error quantities like trace-norm distances or relative entropies. In general, however, we do not get significantly different results from such alternative choices; hence, we can safely ignore them. Real variants arise if we consider, instead of the infima over all pure states, quantities which prefer a (possibly discrete or even finite) class of states. Such a choice leads to "state-dependent cloning", because the corresponding optimal devices perform better than "universal" ones (i.e. those described by the figures of merit above) on some states but much worse on the rest. We ignore state-dependent cloning in this work, because the universal case is physically more relevant and technically more challenging. Other cases which we do not discuss either include "asymmetric cloning", which arises if we trade in Eq. (7.2) the quality of one particular output system against the rest (see [40]), and cloning of mixed states. The latter is much more difficult than the pure state case, and non-trivial even for classical systems, where it is related to the so-called "bootstrap" technique [59].
Closely related to cloning is purification, i.e. undoing noise. This means we are considering $N$ systems originally prepared in the same (unknown) pure state $\rho$ but which have afterwards passed a depolarizing channel
$$R^*\rho = \vartheta\rho + (1-\vartheta)\mathbb{1}/d. \tag{7.4}$$
The task is now to find a device $T$ acting on $N$ of the decohered systems such that $T^*[(R^*\rho)^{\otimes N}]$ is as close as possible to the original pure state. We have the same basic choices for a figure of merit as in the cloning problem. Hence, we define
$$\mathcal{F}_{R,1}(T) = \inf_{j=1,\dots,M}\ \inf_{\rho\ \text{pure}}\operatorname{tr}\big(\rho^{(j)}\,T^*[(R^*\rho)^{\otimes N}]\big) \tag{7.5}$$
and
$$\mathcal{F}_{R,\text{all}}(T) = \inf_{\rho\ \text{pure}}\operatorname{tr}\big(\rho^{\otimes M}\,T^*[(R^*\rho)^{\otimes N}]\big). \tag{7.6}$$
These quantities can be regarded as generalizations of $\mathcal{F}_{c,1}$ and $\mathcal{F}_{c,\text{all}}$, which we recover if $R^*$ is the identity.
Another task we can consider is the approximation of a map $\Theta$ which is positive but not completely positive, like the transposition. Positivity and normalization imply that $\Theta^*$ maps states to states, but it cannot be realized by a physical device. An explicit example is the universal not gate (UNOT), which maps each pure qubit state $\psi$ to its orthocomplement $\psi^\perp$ [36]. It is given by the anti-unitary operator
$$\theta:\ \psi = \alpha|0\rangle + \beta|1\rangle \ \mapsto\ \theta\psi = \bar\beta|0\rangle - \bar\alpha|1\rangle. \tag{7.7}$$


Since $\Theta^*\rho = \theta\rho\theta^*$ is a state if $\rho$ is, we can ask again for a channel $T$ such that $T^*(\rho^{\otimes N})$ approximates $(\Theta^*\rho)^{\otimes M}$. As in the two previous examples we have the choice to allow arbitrary correlations in the output or not, and we get the following figures of merit:
$$\mathcal{F}_{\theta,1}(T) = \inf_{j=1,\dots,M}\ \inf_{\rho\ \text{pure}}\operatorname{tr}\big((\Theta^*\rho)^{(j)}\,T^*(\rho^{\otimes N})\big) \tag{7.8}$$
and
$$\mathcal{F}_{\theta,\text{all}}(T) = \inf_{\rho\ \text{pure}}\operatorname{tr}\big((\Theta^*\rho)^{\otimes M}\,T^*(\rho^{\otimes N})\big). \tag{7.9}$$
Note that we can plug in for $\Theta$ basically any functional which maps states to states. In addition we can combine Eqs. (7.5) and (7.6) on the one hand with (7.8) and (7.9) on the other. As a result we would get a measure for devices which undo an operation $R$ and approximate an impossible machine at the same time.
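A small NumPy sketch of the UNOT (7.7), with hypothetical function names. Besides checking that $\theta\psi$ is orthogonal to $\psi$ and that $\theta$ is anti-linear, it shows that the induced map on qubit density matrices can be written as $\rho \mapsto \sigma_y\rho^T\sigma_y = \mathbb{1} - \rho$, i.e. a transposition followed by a unitary rotation. This is one way to see why the UNOT is positive but not completely positive, and hence only approximately realizable.

```python
import numpy as np

def unot(psi):
    """Anti-unitary UNOT of Eq. (7.7): alpha|0> + beta|1> -> conj(beta)|0> - conj(alpha)|1>."""
    alpha, beta = psi
    return np.array([np.conj(beta), -np.conj(alpha)])

def unot_on_state(rho):
    """Induced map on qubit density matrices, written as sigma_y rho^T sigma_y."""
    sigma_y = np.array([[0, -1j], [1j, 0]])
    return sigma_y @ rho.T @ sigma_y

psi = np.array([0.6, 0.8j])
phi = unot(psi)
print(abs(np.vdot(psi, phi)))  # overlap with the input: 0 up to rounding
```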
7.1.2. Covariant operations
All the functionals just defined give rise to optimization problems which we will study in greater detail in the next sections. This means we are interested in two things: First of all the maximal value of $\mathcal{F}_{\sharp,\flat}$ (with $\sharp = c, R, \theta$ and $\flat = 1, \text{all}$) given by
$$\mathcal{F}_{\sharp,\flat}(N, M) = \sup_T \mathcal{F}_{\sharp,\flat}(T), \tag{7.10}$$
where the supremum is taken over all channels $T: \mathcal{B}(\mathcal{H}^{\otimes M}) \to \mathcal{B}(\mathcal{H}^{\otimes N})$, and second the particular channel $\hat T$ where the optimum is attained. At first sight a complete solution of these problems seems to be impossible, due to the large dimension of the space of all $T$, which scales exponentially in $M$ and $N$. Fortunately, all $\mathcal{F}_{\sharp,\flat}(T)$ admit a large symmetry group which allows in many cases the explicit calculation of the optimal values $\mathcal{F}_{\sharp,\flat}(N, M)$ and the determination of optimizers $\hat T$ with a certain covariance behavior. Note that this is an immediate consequence of our decision to restrict the discussion to "universal" procedures, which do not prefer any particular input state.
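Let us consider permutations of the input systems first: If $p \in S_N$ is a permutation on $N$ places and $V_p$ the corresponding unitary on $\mathcal{H}^{\otimes N}$ (cf. Eq. (3.7)), we obviously get $T^*(V_p\rho^{\otimes N}V_p^*) = T^*(\rho^{\otimes N})$, hence
$$\mathcal{F}_{\sharp,\flat}[\alpha_p(T)] = \mathcal{F}_{\sharp,\flat}(T)\quad\forall p\in S_N\quad\text{with}\quad[\alpha_p(T)](A) = V_p^*\,T(A)\,V_p. \tag{7.11}$$
In other words: $\mathcal{F}_{\sharp,\flat}(T)$ is invariant under permutations of the input systems. Similarly, we can show that $\mathcal{F}_{\sharp,\flat}(T)$ is invariant under permutations of the output systems:
$$\mathcal{F}_{\sharp,\flat}[\beta_p(T)] = \mathcal{F}_{\sharp,\flat}(T)\quad\forall p\in S_M\quad\text{with}\quad[\beta_p(T)](A) = T(V_p^*AV_p). \tag{7.12}$$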

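To see this consider e.g. $\sharp = c$ and $\flat = \text{all}$:
$$\operatorname{tr}\big[\rho^{\otimes M}V_pT^*(\rho^{\otimes N})V_p^*\big] = \operatorname{tr}\big[V_p^*\rho^{\otimes M}V_pT^*(\rho^{\otimes N})\big] = \operatorname{tr}\big[\rho^{\otimes M}T^*(\rho^{\otimes N})\big]. \tag{7.13}$$
For the other cases similar calculations apply.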

Finally, none of the $\mathcal{F}_{\sharp,\flat}(T)$ singles out a preferred direction in the one-particle Hilbert space $\mathcal{H}$. This implies that we can rotate $T$ by local unitaries of the form $U^{\otimes N}$, respectively $U^{\otimes M}$, without changing $\mathcal{F}_{\sharp,\flat}(T)$. More precisely we have
$$\mathcal{F}_{\sharp,\flat}[\gamma_U(T)] = \mathcal{F}_{\sharp,\flat}(T)\quad\forall U\in U(d) \tag{7.14}$$
with
$$[\gamma_U(T)](A) = U^{*\otimes N}\,T(U^{\otimes M}AU^{*\otimes M})\,U^{\otimes N}. \tag{7.15}$$
The validity of Eq. (7.14) can be proven in the same way as (7.11) and (7.12). The details are therefore left to the reader.
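Now we can average over the groups $S_N$, $S_M$ and $U(d)$. Instead of the operation $T$ we consider
$$\bar T = \frac{1}{N!\,M!}\sum_{p\in S_N}\sum_{q\in S_M}\int_{U(d)}\alpha_p\beta_q\gamma_U(T)\,dU, \tag{7.16}$$
where $dU$ denotes the normalized, left invariant Haar measure on $U(d)$. We see immediately that $\bar T$ has the following symmetry properties:
$$\alpha_p(\bar T) = \bar T,\quad \beta_q(\bar T) = \bar T,\quad \gamma_U(\bar T) = \bar T\qquad\forall p\in S_N,\ \forall q\in S_M,\ \forall U\in U(d) \tag{7.17}$$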

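and we will call each operation $T$ fully symmetric if it satisfies this equation. The concavity of $\mathcal{F}_{\sharp,\flat}$ implies immediately that it cannot decrease if we replace $T$ by $\bar T$:
$$\begin{aligned}
\mathcal{F}_{\sharp,\flat}(\bar T) &= \mathcal{F}_{\sharp,\flat}\Big(\frac{1}{N!\,M!}\sum_{p\in S_N}\sum_{q\in S_M}\int_{U(d)}\alpha_p\beta_q\gamma_U(T)\,dU\Big) &\qquad(7.18)\\
&\geq \frac{1}{N!\,M!}\sum_{p\in S_N}\sum_{q\in S_M}\int_{U(d)}\mathcal{F}_{\sharp,\flat}[\alpha_p\beta_q\gamma_U(T)]\,dU = \mathcal{F}_{\sharp,\flat}(T). &\qquad(7.19)
\end{aligned}$$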

To calculate the optimal value $\mathcal{F}_{\sharp,\flat}(N, M)$ it is therefore completely sufficient to search for a maximizer of $\mathcal{F}_{\sharp,\flat}(T)$ only among fully symmetric $T$ and to evaluate $\mathcal{F}_{\sharp,\flat}(T)$ for this particular operation. This simplifies the problem significantly because the size of the parameter space is drastically reduced. Of course, we do not learn from this argument whether the optimum is also attained on non-symmetric operations; however, this information is in general less important (and for some problems, like optimal cloning, a uniqueness result is available).
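The group averages in (7.16) are concrete operations. The NumPy sketch below (hypothetical helper names, dense matrices assumed) implements only the finite average over input permutations: applying $\alpha_p$ to the output operator $T(A)$ and averaging over $S_N$ amounts to the twirl $X \mapsto \frac{1}{N!}\sum_p V_p^*XV_p$. It checks that the twirled operator is invariant under every $V_p$, the operator-level counterpart of $\alpha_p(\bar T) = \bar T$ in (7.17). The Haar integral over $U(d)$ is not implemented.

```python
import itertools
import numpy as np

def permutation_unitary(perm, d):
    """Unitary on (C^d)^{(x)n} moving tensor factor k to position perm[k]."""
    n = len(perm)
    shape = (d,) * n
    v = np.zeros((d ** n, d ** n))
    for idx in itertools.product(range(d), repeat=n):
        new = [0] * n
        for k, target in enumerate(perm):
            new[target] = idx[k]
        v[np.ravel_multi_index(tuple(new), shape), np.ravel_multi_index(idx, shape)] = 1.0
    return v

def symmetrize_output(x, d, n):
    """(1/n!) sum_p V_p^* x V_p: the S_n part of the average (7.16) applied to one operator T(A)."""
    unitaries = [permutation_unitary(p, d) for p in itertools.permutations(range(n))]
    return sum(v.T @ x @ v for v in unitaries) / len(unitaries)   # V_p is real, so V_p^* = V_p.T
```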
7.1.3. Group representations
To get an idea how this parameter reduction can be exploited in practice, let us reconsider Theorem 3.1: The two representations $U \mapsto U^{\otimes N}$ and $p \mapsto V_p$ of $U(d)$, respectively $S_N$, on $\mathcal{H}^{\otimes N}$ are "commutants" of each other, i.e., any operator on $\mathcal{H}^{\otimes N}$ commuting with all $U^{\otimes N}$ is a linear combination of the $V_p$, and conversely. This knowledge can be used to decompose the representation $U^{\otimes N}$ (and $V_p$ as well) into irreducible components. To reduce the group theoretic overhead, we will discuss this procedure for qubits first and come back to the general case afterwards.


Hence assume that $\mathcal{H} = \mathbb{C}^2$ holds. Then $\mathcal{H}^{\otimes N}$ is the Hilbert space of $N$ (distinguishable) spin-1/2 particles and it can be decomposed in terms of eigenspaces of total angular momentum. More precisely consider
$$L_k = \frac{1}{2}\sum_j\sigma_k^{(j)},\qquad k = 1, 2, 3, \tag{7.20}$$
the $k$-component of total angular momentum (i.e. $\sigma_k$ is the $k$th Pauli matrix and $\sigma_k^{(j)} \in \mathcal{B}(\mathcal{H}^{\otimes N})$ is defined according to Eq. (7.1)), and $\vec L^2 = \sum_k L_k^2$. The eigenvalue expansion of $\vec L^2$ is well known to be
$$\vec L^2 = \sum_s s(s+1)P_s\quad\text{with}\quad s = \begin{cases}0, 1, \dots, N/2, & N\ \text{even},\\ 1/2, 3/2, \dots, N/2, & N\ \text{odd},\end{cases} \tag{7.21}$$
where the $P_s$ denote the projections onto the eigenspaces of $\vec L^2$. It is easy to see that both representations $U \mapsto U^{\otimes N}$ and $p \mapsto V_p$ commute with $\vec L^2$. Hence the eigenspaces $P_s\mathcal{H}^{\otimes N}$ of $\vec L^2$ are invariant subspaces of $U^{\otimes N}$ and $V_p$, and this implies that the restrictions of $U^{\otimes N}$ and $V_p$ to them are representations of SU(2), respectively $S_N$. Since $\vec L^2$ is constant on $P_s\mathcal{H}^{\otimes N}$, the SU(2) representation we get in this way must be (naturally isomorphic to) a multiple of the irreducible spin-$s$ representation $\pi_s$.
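It is defined by
$$\pi_s\Big(\exp\Big(\frac{i}{2}\sum_k x_k\sigma_k\Big)\Big) = \exp\Big(i\sum_k x_kL_k^{(s)}\Big)\quad\text{with}\quad L_k^{(s)} = \frac{1}{2}\sum_{j=1}^{2s}\sigma_k^{(j)}, \tag{7.22}$$
on the representation space
$$\mathcal{H}_s = \mathcal{H}_+^{\otimes 2s} \tag{7.23}$$
(the Bose subspace of $\mathcal{H}^{\otimes 2s}$). Hence we get
$$P_s\mathcal{H}^{\otimes N} \cong \mathcal{H}_s\otimes\mathcal{K}_{N,s},\qquad U^{\otimes N}\psi = (\pi_s(U)\otimes\mathbb{1})\psi\quad\forall\psi\in P_s\mathcal{H}^{\otimes N}. \tag{7.24}$$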

Since $V_p$ and $U^{\otimes N}$ commute, the Hilbert space $\mathcal{K}_{N,s}$ carries a representation $\hat\pi_{N,s}(p)$ of $S_N$ which is irreducible as well. Note that $\mathcal{K}_{N,s}$ depends, in contrast to $\mathcal{H}_s$, on the number $N$ of tensor factors, and its dimension is (see [100], or [142] for general $d$)
$$\dim\mathcal{K}_{N,s} = \frac{2s+1}{N/2+s+1}\binom{N}{N/2-s}. \tag{7.25}$$
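Equation (7.25) can be checked against the decomposition (7.24): the spin-$s$ block contributes $(2s+1)\dim\mathcal{K}_{N,s}$ dimensions, so these numbers must add up to $2^N$. A short standard-library sketch using exact rational arithmetic (function names hypothetical; the spin is passed as the integer $2s$ to avoid half-integers):

```python
from fractions import Fraction
from math import comb

def dim_k(n, two_s):
    """dim K_{N,s} from Eq. (7.25), with the spin passed as the integer 2s."""
    if two_s < 0 or two_s > n or (n - two_s) % 2:
        return 0
    value = Fraction(2 * (two_s + 1), n + two_s + 2) * comb(n, (n - two_s) // 2)
    assert value.denominator == 1          # (7.25) always yields an integer
    return int(value)

def allowed_two_s(n):
    """2s = N mod 2, N mod 2 + 2, ..., N, cf. Eq. (7.21)."""
    return range(n % 2, n + 1, 2)

for n in range(1, 7):
    print(n, {t / 2: dim_k(n, t) for t in allowed_two_s(n)})
```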

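Summarizing the discussion we get
$$\mathcal{H}^{\otimes N}\cong\bigoplus_s\mathcal{H}_s\otimes\mathcal{K}_{N,s},\qquad U^{\otimes N}\cong\bigoplus_s\pi_s(U)\otimes\mathbb{1},\qquad V_p\cong\bigoplus_s\mathbb{1}\otimes\hat\pi_{N,s}(p). \tag{7.26}$$
Let us now consider a fully symmetric operation $T$. Permutation invariance ($\alpha_p(T) = T$ and $\beta_p(T) = T$) implies together with Eq. (7.26) that
$$T(A_j\otimes B_j) = \bigoplus_s\frac{\operatorname{tr}(B_j)}{\dim\mathcal{K}_{N,j}}\,T_{sj}(A_j)\otimes\mathbb{1}\quad\text{with}\quad T_{sj}:\mathcal{B}(\mathcal{H}_j)\to\mathcal{B}(\mathcal{H}_s) \tag{7.27}$$
holds if $A_j\otimes B_j\in\mathcal{B}(\mathcal{H}_j\otimes\mathcal{K}_{N,j})$. The operations $T_{sj}$ are unital and have, according to $\gamma_U(T) = T$, the following covariance properties:
$$\pi_s(U)\,T_{sj}(A_j)\,\pi_s(U^*) = T_{sj}[\pi_j(U)A_j\pi_j(U^*)]\qquad\forall U\in SU(2). \tag{7.28}$$
The classification of all fully symmetric channels $T$ is therefore reduced to the study of all these $T_{sj}$.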
We can now apply the covariant version of Stinespring's theorem (Theorem 3.3) to find that
$$T_{sj}(A_j) = V^*(A_j\otimes\mathbb{1})V,\qquad V:\mathcal{H}_s\to\mathcal{H}_j\otimes\tilde{\mathcal{H}},\qquad V\pi_s(U) = (\pi_j(U)\otimes\tilde\pi(U))V, \tag{7.29}$$
where $\tilde\pi$ is a representation of SU(2) on $\tilde{\mathcal{H}}$. If $\tilde\pi$ is irreducible with total angular momentum $l$, the "intertwining operator" $V$ is well known: its components in a particularly chosen basis coincide with certain Clebsch–Gordan coefficients. Hence, the corresponding operation is uniquely determined (up to unitary equivalence) and we write
$$T_{sjl}(A_j) = V_l^*(A_j\otimes\mathbb{1})V_l,\qquad V_l\pi_s(U) = (\pi_j(U)\otimes\pi_l(U))V_l, \tag{7.30}$$
where $l$ can range from $|j-s|$ to $j+s$. Since in general a representation $\tilde\pi$ can be decomposed into irreducible components, we see that each covariant $T_{sj}$ is a convex linear combination of the $T_{sjl}$, and we get with Eq. (7.27)
$$T(A_j\otimes B_j) = \bigoplus_s\sum_l c_{jl}\big[T_{sjl}(A_j)\otimes(\operatorname{tr}(B_j)\mathbb{1})\big], \tag{7.31}$$
where the $c_{jl}$ are constrained by $c_{jl}\geq 0$ and $\sum_j c_{jl} = (\dim\mathcal{K}_{N,j})^{-1}$. In this way we have parameterized the set of fully symmetric operations completely in terms of group theoretical data, and we can rewrite $\mathcal{F}_{\sharp,\flat}(T)$ accordingly. This leads to an optimization problem for a quantity depending only on $s$, $j$ and $l$, which is solvable at least in some cases.
To generalize the scheme just presented to the case $\mathcal{H} = \mathbb{C}^d$ with arbitrary $d$ we only have to find a replacement for the decomposition in Eq. (7.26). This, however, is well known from group theory:
$$\mathcal{H}^{\otimes N}\cong\bigoplus_Y\mathcal{H}_Y\otimes\mathcal{K}_Y,\qquad U^{\otimes N}\cong\bigoplus_Y\pi_Y(U)\otimes\mathbb{1},\qquad V_p\cong\bigoplus_Y\mathbb{1}\otimes\hat\pi_Y(p), \tag{7.32}$$
where $\pi_Y: U(d)\to\mathcal{B}(\mathcal{H}_Y)$ and $\hat\pi_Y: S_N\to\mathcal{B}(\mathcal{K}_Y)$ are irreducible representations. The summation index $Y$ runs over all Young frames with $d$ rows and $N$ boxes, i.e. over the arrangements of $N$ boxes into $d$ rows of lengths $Y_1\geq Y_2\geq\dots\geq Y_d\geq 0$ with $\sum_k Y_k = N$. The relation to the total angular momentum $s$ used as the parameter for $d = 2$ is given by $Y_1 - Y_2 = 2s$, which determines $Y$ completely together with $Y_1 + Y_2 = N$. The rest of the arguments applies without significant changes; this is in particular the case for Eq. (7.31), which holds for general $d$ if we replace $s$, $j$ and $l$ by Young frames. However, the representation theory of $U(d)$ becomes much more difficult. The generalization of results available for qubits ($d = 2$) to $d > 2$ is therefore not straightforward.
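The decomposition (7.32) implies the dimension count $\sum_Y \dim\mathcal{H}_Y\,\dim\mathcal{K}_Y = d^N$. The standard-library sketch below checks this for small $d$ and $N$, using two textbook formulas that the text does not state: the hook length formula for $\dim\mathcal{K}_Y$ and the hook-content formula $\dim\mathcal{H}_Y = \prod_{\text{boxes}}(d + j - i)/h$ for $\dim\mathcal{H}_Y$. For $d = 2$ and $Y = (N/2+s, N/2-s)$ the hook length formula reproduces (7.25). Function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def young_frames(n, rows, largest=None):
    """Young frames with n boxes and at most `rows` non-empty rows (Y_1 >= Y_2 >= ...)."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    if rows == 0:
        return
    for first in range(min(n, largest), 0, -1):
        for rest in young_frames(n - first, rows - 1, first):
            yield (first,) + rest

def hook_lengths(frame):
    """Yield (row i, column j, hook length) for every box of the frame."""
    for i, length in enumerate(frame):
        for j in range(length):
            leg = sum(1 for r in frame[i + 1:] if r > j)   # boxes below (i, j)
            yield i, j, (length - j - 1) + leg + 1          # arm + leg + 1

def dim_symmetric_group(frame):
    """dim K_Y by the hook length formula N! / prod h."""
    product = 1
    for _, _, h in hook_lengths(frame):
        product *= h
    return factorial(sum(frame)) // product

def dim_unitary_group(frame, d):
    """dim H_Y by the hook-content formula prod (d + j - i) / h."""
    value = Fraction(1)
    for i, j, h in hook_lengths(frame):
        value *= Fraction(d + j - i, h)
    return int(value)

d, n = 3, 4
print(sum(dim_unitary_group(y, d) * dim_symmetric_group(y) for y in young_frames(n, d)), d ** n)
```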
Finally, let us give a short comment on Gaussian states. Obviously, the methods just described do not apply in this case. However, we can consider, instead of $U^{\otimes N}$-covariance, covariance with respect to phase-space translations. Following this idea some results concerning optimal cloning of Gaussian states have been obtained (see [43] and the references therein), but the corresponding general theory is not as far developed as in the finite-dimensional case.


7.1.4. Distillation of entanglement
Finally, let us have another look at distillation of entanglement. The basic idea is quite the same as for optimal cloning: use multiple inputs to approximate a task which is impossible with one-shot operations. From a more technical point of view, however, it does not fit into the general scheme proposed up to now. Nevertheless, some of the arguments can be adopted in an easy way. First of all we have to replace the "one-particle" Hilbert space $\mathcal{H}$ with a twofold tensor product $\mathcal{H}_A\otimes\mathcal{H}_B$, and the channels we have to look at are LOCC operations
$$T:\mathcal{B}(\mathcal{H}_A^{\otimes M}\otimes\mathcal{H}_B^{\otimes M})\to\mathcal{B}(\mathcal{H}_A^{\otimes N}\otimes\mathcal{H}_B^{\otimes N}), \tag{7.33}$$
cf. Section 4.3. Our aim is to determine $T$ such that, for each distillable (mixed) state $\rho\in\mathcal{B}^*(\mathcal{H}_A\otimes\mathcal{H}_B)$, $T^*(\rho^{\otimes N})$ is close to the $M$-fold tensor product $|\Psi\rangle\langle\Psi|^{\otimes M}$ of a maximally entangled state $\Psi\in\mathcal{H}_A\otimes\mathcal{H}_B$. A figure of merit with a similar structure as the $\mathcal{F}_{\sharp,\text{all}}$ studied above can be derived directly from the definition of the entanglement measure $E_D$ in Section 5.1.3: We define (replacing the trace-norm distance with a fidelity)
$$\mathcal{F}_D(T) = \inf_\Psi\inf_\rho\big\langle\Psi^{\otimes M}, T^*(\rho^{\otimes N})\,\Psi^{\otimes M}\big\rangle, \tag{7.34}$$
where the infima are taken over all maximally entangled states $\Psi$ and all distillable states $\rho$. Alternatively, we can look at state-dependent measures, which seem to be particularly important if we try to calculate $E_D(\rho)$ for some state $\rho$. In this case we simply get
$$\mathcal{F}_{D,\rho}(T) = \inf_\Psi\big\langle\Psi^{\otimes M}, T^*(\rho^{\otimes N})\,\Psi^{\otimes M}\big\rangle. \tag{7.35}$$

Translating the group theoretical analysis of the last two subsections to this setting is somewhat more difficult. As in the case of $\mathcal{F}_{\sharp,\flat}$ we can restrict the search for optimizers to permutation invariant operations, i.e. $\alpha_p(T) = T$ and $\beta_p(T) = T$ in the terminology of Section 7.1.2. Unitary covariance
$$U^{\otimes N}T(A)U^{*\otimes N} = T(U^{\otimes M}AU^{*\otimes M}), \tag{7.36}$$
however, cannot be assumed for all unitaries $U$ of $\mathcal{H}_A\otimes\mathcal{H}_B$, but only for local ones ($U = U_A\otimes U_B$) in the case of $\mathcal{F}_D$, or only for local $U$ which leave $\rho$ invariant in the case of $\mathcal{F}_{D,\rho}$. This makes the analog of the decomposition scheme from Section 7.1.3 more difficult, and such a study has (to my knowledge) not yet been done. A related subproblem arises if we consider $\mathcal{F}_{D,\rho}$ from Eq. (7.35) for a state $\rho$ with special symmetry properties, e.g. an OO-invariant state. The corresponding optimization might be simpler, and a solution would be relevant for the calculation of $E_D$.
7.2. Optimal devices
Now we can consider the optimization problems associated with the figures of merit discussed in the last section. This means that we are searching for those devices which approximate the impossible tasks in question in the best possible way. As pointed out at the beginning of this section, this can be done for finite $N$ and in the limit $N \to \infty$. The latter is postponed to the next section.
7.2.1. Optimal cloning
The quality of an optimal pure state cloner is defined by the figures of merit $\mathcal{F}_{c,\flat}$ in Eqs. (7.2) and (7.3), and the group theoretic ideas sketched in Section 7.1.3 allow the complete solution of this problem. We will demonstrate some of the basic ideas in the qubit case first and state the final result afterwards in full generality.
The solvability of this problem relies in part on the special structure of the figures of merit $\mathcal{F}_{c,\flat}$, which allows further simplifications of the general scheme sketched in Section 7.1.3. If we consider e.g. $\mathcal{F}_{c,1}(T)$ (the other case works similarly) we get
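$$\begin{aligned}
\mathcal{F}_{c,1}(T) &= \inf_{j=1,\dots,M}\ \inf_{\rho\ \text{pure}}\operatorname{tr}\big(\rho^{(j)}\,T^*(\rho^{\otimes N})\big) &\qquad(7.37)\\
&= \inf_{j=1,\dots,M}\ \inf_{\rho\ \text{pure}}\operatorname{tr}\big(T(\rho^{(j)})\,\rho^{\otimes N}\big) &\qquad(7.38)\\
&= \inf_{j=1,\dots,M}\ \inf_{\psi}\big\langle\psi^{\otimes N}, T(\rho^{(j)})\,\psi^{\otimes N}\big\rangle,\quad\rho = |\psi\rangle\langle\psi|. &\qquad(7.39)
\end{aligned}$$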

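Hence $\mathcal{F}_{c,\flat}$ only depends on the $\mathcal{B}(\mathcal{H}_+^{\otimes N})$ component of $T$ (where $\mathcal{H}_+^{\otimes N}$ denotes again the Bose subspace of $\mathcal{H}^{\otimes N}$), and we can assume without loss of generality that $T$ is of the form
$$T:\mathcal{B}(\mathcal{H}^{\otimes M})\to\mathcal{B}(\mathcal{H}_+^{\otimes N}). \tag{7.40}$$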

The restriction of $U^{\otimes N}$ to $\mathcal{H}_+^{\otimes N}$ is an irreducible representation (for any $d$), and in the qubit case ($d = 2$) we have $U^{\otimes N}\psi = \pi_s(U)\psi$ with $s = N/2$ for all $\psi\in\mathcal{H}_+^{\otimes N}$. The decomposition of $T$ from Eq. (7.27) therefore contains only those summands with $s = N/2$. This simplifies the optimization problem significantly, since the number of variables needed to parametrize all relevant cloning maps according to Eq. (7.31) is reduced from 3 to 2. A more detailed (and non-trivial) analysis shows that the maximum for $\mathcal{F}_{c,1}$ and $\mathcal{F}_{c,\text{all}}$ is attained if all terms in (7.31) vanish except the one with $s = N/2$, $j = M/2$ and $l = (M-N)/2$. The precise result is stated in the following theorem ([68,31,32] for qubits and [166,98] for general $d$).
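Theorem 7.1. For each $\mathcal{H} = \mathbb{C}^d$ both figures of merit $\mathcal{F}_{c,1}$ and $\mathcal{F}_{c,\text{all}}$ are maximized by the cloner
$$\hat T^*(\rho) = \frac{d[N]}{d[M]}\,S_M(\rho\otimes\mathbb{1})S_M, \tag{7.41}$$
where $d[N]$, $d[M]$ denote the dimensions of the symmetric tensor products $\mathcal{H}_+^{\otimes N}$, respectively $\mathcal{H}_+^{\otimes M}$, and $S_M$ is the projection from $\mathcal{H}^{\otimes M}$ to $\mathcal{H}_+^{\otimes M}$. This implies for the optimal fidelities
$$\mathcal{F}_{c,1}(N, M) = \frac{d-1}{d}\,\frac{N}{N+d}\,\frac{M+d}{M} + \frac{1}{d} \tag{7.42}$$
and
$$\mathcal{F}_{c,\text{all}}(N, M) = \frac{d[N]}{d[M]}. \tag{7.43}$$
$\hat T$ is the unique solution for both optimization problems, i.e. there is no other operation $T$ of the form (7.40) which maximizes $\mathcal{F}_{c,1}$ or $\mathcal{F}_{c,\text{all}}$.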
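The optimal fidelities (7.42) and (7.43) are easy to evaluate exactly. A sketch with rational arithmetic (function names hypothetical; $d[n] = \binom{n+d-1}{n}$ is the dimension of the symmetric subspace): for the $1 \to 2$ qubit cloner it gives the familiar values $5/6$ and $2/3$, perfect fidelity for $M = N$, and for $M \to \infty$ the one-particle fidelity approaches $(N+1)/(N+d)$, the quantity that enters the relation to state estimation mentioned below.

```python
from fractions import Fraction
from math import comb

def sym_dim(d, n):
    """d[n] = dimension of the symmetric tensor product of n copies of C^d."""
    return comb(n + d - 1, n)

def fidelity_single(n, m, d):
    """Optimal one-particle fidelity of an n -> m cloner, Eq. (7.42)."""
    return Fraction(d - 1, d) * Fraction(n, n + d) * Fraction(m + d, m) + Fraction(1, d)

def fidelity_all(n, m, d):
    """Optimal global fidelity of an n -> m cloner, Eq. (7.43)."""
    return Fraction(sym_dim(d, n), sym_dim(d, m))

print(fidelity_single(1, 2, 2), fidelity_all(1, 2, 2))  # 5/6 2/3 for the 1 -> 2 qubit cloner
```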
There are two aspects of this result which deserve special attention. One is the relation to state estimation, which is postponed to Section 7.2.3. The second concerns the role of correlations: it does not matter whether we are looking only at the quality of each single clone ($\mathcal{F}_{c,1}$) or whether correlations are taken into account ($\mathcal{F}_{c,\text{all}}$). In both cases we get the same optimal solution. This is a special feature of pure states, however. Although there are no concrete results for quantum systems, it can be checked quite easily in the classical case that taking correlations into account changes the optimal cloner for arbitrary mixed states drastically.
7.2.2. Purification
Finding an optimal purification device, i.e. one maximizing $\mathcal{F}_{R,\flat}$, is more difficult than the cloning problem, because the simplification from Eq. (7.40) does not apply. Hence we have to consider all the summands in the direct sum decomposition of $T$ from Eq. (7.31), and solutions are available only for qubits. Therefore we will assume for the rest of this subsection that $\mathcal{H} = \mathbb{C}^2$ holds. The SU(2) symmetry of the problem allows us to assume without loss of generality that the pure initial state $\psi$ coincides with one of the basis vectors. Hence we get for the (noisy) input states of the purifier
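$$\rho(\beta) = \frac{1}{2\cosh(\beta)}\exp\Big(2\beta\,\frac{\sigma_3}{2}\Big) = \frac{1}{e^\beta + e^{-\beta}}\begin{pmatrix}e^\beta & 0\\ 0 & e^{-\beta}\end{pmatrix} = \tanh(\beta)|\psi\rangle\langle\psi| + (1-\tanh(\beta))\frac{\mathbb{1}}{2}, \tag{7.44}$$
$$\psi = |0\rangle. \tag{7.45}$$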

The parameterization of $\rho$ in terms of the "pseudo-temperature" $\beta$ is chosen here because it simplifies some calculations significantly (as we will see soon). The relation to the form $\rho = R^*|\psi\rangle\langle\psi|$ initially given in Eq. (7.4) is obviously $\vartheta = \tanh(\beta)$.
To state the main result of this subsection we have to decompose the product state $\rho(\beta)^{\otimes N}$ into spin-s components. This can be done in terms of Eq. (7.26). Of course, ρ(β) is not unitary. However, we can apply (7.26) by analytic continuation, i.e. we treat ρ(β) in the same way as we would treat $\exp(i\beta\sigma_3)$. It is then straightforward to get
$$\rho(\beta)^{\otimes N} = \bigoplus_s w_N(s)\,\rho_s(\beta)\otimes\frac{\mathbb{1}}{\dim\mathcal{K}_{N,s}} \tag{7.46}$$
with
$$w_N(s) = \frac{\sinh((2s+1)\beta)}{\sinh(\beta)\,(2\cosh\beta)^N}\,\dim\mathcal{K}_{N,s} \tag{7.47}$$
and
$$\rho_s(\beta) = \frac{\sinh(\beta)}{\sinh((2s+1)\beta)}\exp\bigl(2\beta L_3^{(s)}\bigr),$$
where $L_3^{(s)}$ is the 3-component of angular momentum in the spin-s representation and the dimension of $\mathcal{K}_{N,s}$ is given in Eq. (7.25). By (7.23) the representation space of $\rho_s$ coincides with the symmetric tensor product $\mathcal{H}_+^{\otimes 2s}$. Hence we can interpret $\rho_s(\beta)$ as a state of 2s (indistinguishable) particles. In other words, the decomposition of $\rho(\beta)^{\otimes N}$ leads in a natural way to a family of operations
$$Q_s: B(\mathcal{H}_+^{\otimes 2s}) \to B(\mathcal{H}^{\otimes N}) \quad\text{with}\quad Q_s^*\bigl[\rho(\beta)^{\otimes N}\bigr] = \rho_s(\beta). \tag{7.48}$$
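As a numerical sanity check of the decomposition (7.46), the sketch below evaluates the weights $w_N(s)$ of Eq. (7.47) for qubits in log space (to avoid overflow of sinh and cosh) and lets one verify that they sum to one. It assumes the standard multiplicity $\dim\mathcal{K}_{N,s} = \binom{N}{N/2-s} - \binom{N}{N/2-s-1}$ of spin s in N spin-1/2 systems, which we take to agree with Eq. (7.25). It also computes the one-particle fidelity of $\rho_s(\beta)$ with respect to ψ = |0⟩ from the Boltzmann weights $e^{2\beta m}$ of $\exp(2\beta L_3^{(s)})$ and permutation symmetry. Spins are passed as the integer 2s; all helper names are hypothetical.

```python
import math


def dim_K(N: int, two_s: int) -> int:
    """Multiplicity of spin s = two_s/2 in N qubits: C(N, N/2-s) - C(N, N/2-s-1)."""
    k = (N - two_s) // 2  # k = N/2 - s
    return math.comb(N, k) - (math.comb(N, k - 1) if k >= 1 else 0)


def spins(N: int) -> range:
    """All values of 2s occurring in N qubits (same parity as N)."""
    return range(N % 2, N + 1, 2)


def log_sinh(y: float) -> float:
    """ln sinh(y) for y > 0, written to avoid overflow for large y."""
    return y + math.log1p(-math.exp(-2.0 * y)) - math.log(2.0)


def weight(N: int, two_s: int, beta: float) -> float:
    """w_N(s) from Eq. (7.47), evaluated in log space; requires beta > 0."""
    log_2cosh = beta + math.log1p(math.exp(-2.0 * beta))
    log_w = (math.log(dim_K(N, two_s)) + log_sinh((two_s + 1) * beta)
             - log_sinh(beta) - N * log_2cosh)
    return math.exp(log_w)


def one_particle_fidelity(two_s: int, beta: float) -> float:
    """<0| (one-particle marginal of rho_s(beta)) |0>, for two_s >= 1."""
    ms = [two_s / 2 - j for j in range(two_s + 1)]                # m = s, s-1, ..., -s
    boltz = [math.exp(2.0 * beta * (m - two_s / 2)) for m in ms]  # e^{2 beta m}, rescaled
    mean_L3 = sum(m * b for m, b in zip(ms, boltz)) / sum(boltz)
    bloch = 2.0 * mean_L3 / two_s  # <sigma_3> of one particle, by permutation symmetry
    return (1.0 + bloch) / 2.0


if __name__ == "__main__":
    beta = 0.4
    print(sum(weight(20, t, beta) for t in spins(20)))  # total weight, should be close to 1
    print([round(one_particle_fidelity(t, beta), 4) for t in range(1, 7)])
```

For 2s = 1 the one-particle fidelity reduces to the input value $(1+\tanh\beta)/2$, and for larger 2s it grows, which is the mechanism exploited by the purifier below.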

We can think of the family $Q_s$ of operations as an instrument Q which measures the number of output systems and transforms $\rho(\beta)^{\otimes N}$ to the appropriate $\rho_s(\beta)$. The crucial point is now that the purity of $\rho_s(\beta)$, measured in terms of fidelities with respect to ψ, exceeds that of the input ρ(β) provided s > 1/2. Hence we can think of Q as a purifier which arises naturally by reduction to irreducible spin components [46]. Unfortunately, Q does not produce a fixed number of output systems. The most obvious way to construct a device which always produces the same number M of outputs is to run the optimal 2s → M cloner $\hat T_{2s\to M}$ if 2s < M, or to drop 2s − M particles if M ≤ 2s. More precisely, we can define $\hat Q: B(\mathcal{H}^{\otimes M}) \to B(\mathcal{H}^{\otimes N})$ by
$$\hat Q^*\bigl[\rho(\beta)^{\otimes N}\bigr] = \sum_s w_N(s)\,\hat T^*_{2s\to M}\bigl[\rho_s(\beta)\bigr] \tag{7.49}$$
with
$$\hat T^*_{2s\to M}(\sigma) = \begin{cases}\dfrac{d[2s]}{d[M]}\,S_M(\sigma\otimes\mathbb{1})S_M, & M\ge 2s,\\[2ex] \operatorname{tr}_{2s-M}\sigma, & M\le 2s.\end{cases} \tag{7.50}$$
Here $\operatorname{tr}_{2s-M}$ denotes the partial trace over the first 2s − M tensor factors. Applying the general scheme of Section 7.1.3 shows that this is the best way to get exactly M purified qubits [100]:
Theorem 7.2. The operation $\hat Q$ defined in Eq. (7.49) maximizes $F_{R,1}$ and $F_{R,all}$; it is therefore called the optimal purifier. The maximal values for $F_{R,1}$ and $F_{R,all}$ are given by
$$F_{R,1}(N,M) = \sum_s w_N(s)\,f_1(M,\beta,s), \qquad F_{R,all}(N,M) = \sum_s w_N(s)\,f_{all}(M,\beta,s) \tag{7.51}$$
with
$$2f_1(M,\beta,s) - 1 = \begin{cases}\dfrac{2s+1}{2s}\coth\bigl((2s+1)\beta\bigr) - \dfrac{1}{2s}\coth\beta, & 2s\ge M,\\[2ex] \dfrac{1}{2s+2}\,\dfrac{M+2}{M}\Bigl((2s+1)\coth\bigl((2s+1)\beta\bigr) - \coth\beta\Bigr), & 2s\le M,\end{cases} \tag{7.52}$$
and
$$f_{all}(M,\beta,s) = \begin{cases}\dfrac{2s+1}{M+1}\,\dfrac{1-e^{-2\beta}}{1-e^{-(4s+2)\beta}}, & M\ge 2s,\\[2ex] \dfrac{1-e^{-2\beta}}{1-e^{-(4s+2)\beta}}\binom{2s}{M}^{-1}\sum_{K=M}^{2s}\binom{K}{M}e^{2\beta(K-2s)}, & M\le 2s.\end{cases} \tag{7.53}$$

The expressions for the optimal fidelities given here look rather complicated and are not very illuminating. We have therefore plotted both quantities as a function of ϑ (Fig. 7.1), of N (Fig. 7.2) and of M (Fig. 7.3). While the first two plots look quite similar, the functional behavior in dependence of M seems to be very different. The study of the asymptotic behavior in the next section will give a precise analysis of this observation.
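Although the expressions (7.51)–(7.53) are unwieldy, they are straightforward to evaluate. The following Python sketch (helper names are ours) recomputes the weights $w_N(s)$ of Eq. (7.47), assuming the usual multiplicity formula for $\dim\mathcal{K}_{N,s}$, and sums them against $f_1$ and $f_{all}$; spins are again encoded by the integer 2s. In the branch M ≤ 2s of $f_{all}$ the exponent is written as $e^{2\beta(K-2s)}$, the form for which both branches agree at M = 2s. Useful checks: for N = M = 1 the purifier can only pass the input through, giving fidelity $(1+\tanh\beta)/2$; for M = 1 the one- and all-qubit fidelities must coincide; and for large β (almost pure input) the results approach the qubit cloning fidelities of Theorem 7.1.

```python
import math


def dim_K(N: int, two_s: int) -> int:
    """Multiplicity of spin s = two_s/2 in N qubits (assumed to match Eq. (7.25))."""
    k = (N - two_s) // 2
    return math.comb(N, k) - (math.comb(N, k - 1) if k >= 1 else 0)


def weight(N: int, two_s: int, beta: float) -> float:
    """w_N(s) from Eq. (7.47), evaluated in log space; requires beta > 0."""
    def log_sinh(y: float) -> float:
        return y + math.log1p(-math.exp(-2.0 * y)) - math.log(2.0)
    log_2cosh = beta + math.log1p(math.exp(-2.0 * beta))
    return math.exp(math.log(dim_K(N, two_s)) + log_sinh((two_s + 1) * beta)
                    - log_sinh(beta) - N * log_2cosh)


def f_one(M: int, beta: float, two_s: int) -> float:
    """f_1(M, beta, s) from Eq. (7.52); M >= 1."""
    c = (two_s + 1) / math.tanh((two_s + 1) * beta) - 1.0 / math.tanh(beta)
    if two_s >= M:   # 2s >= M: keep M of the 2s particles
        bloch = c / two_s
    else:            # 2s < M: optimal 2s -> M cloner
        bloch = (M + 2) / (M * (two_s + 2)) * c
    return (1.0 + bloch) / 2.0


def f_all(M: int, beta: float, two_s: int) -> float:
    """f_all(M, beta, s) from Eq. (7.53)."""
    ratio = math.expm1(-2.0 * beta) / math.expm1(-(2 * two_s + 2) * beta)
    if M >= two_s:   # cloning branch
        return (two_s + 1) / (M + 1) * ratio
    total = sum(math.comb(K, M) * math.exp(2.0 * beta * (K - two_s))
                for K in range(M, two_s + 1))   # partial-trace branch
    return ratio * total / math.comb(two_s, M)


def purifier_fidelities(N: int, M: int, beta: float) -> tuple:
    """(F_{R,1}(N, M), F_{R,all}(N, M)) of the optimal purifier, Eq. (7.51)."""
    spins = range(N % 2, N + 1, 2)  # all values of 2s
    F1 = sum(weight(N, t, beta) * f_one(M, beta, t) for t in spins)
    Fall = sum(weight(N, t, beta) * f_all(M, beta, t) for t in spins)
    return F1, Fall


if __name__ == "__main__":
    beta = math.atanh(0.5)  # noise parameter theta = tanh(beta) = 0.5
    for M in (1, 5, 10, 20):
        print(M, purifier_fidelities(10, M, beta))
```

The loop over M reproduces, at least qualitatively, the kind of M-dependence shown in Fig. 7.3.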
7.2.3. Estimating pure states
We have already seen in Section 4.2 that the cloning problem and state estimation are closely related, because we can construct an approximate cloner T from an estimator E simply by running E on the N input systems and preparing M systems according to the attained classical information. In this section we want to go the other way round and show that the optimal cloner derived in Theorem 7.1 leads immediately to an optimal pure state estimator; cf. [33].

Fig. 7.1. One- and all-qubit fidelities of the optimal purifier for N = 100 and M = 10, plotted as a function of the noise parameter ϑ.

Fig. 7.2. One- and all-qubit fidelities of the optimal purifier for ϑ = 0.5 and M = 10, plotted as a function of N.
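To this end let us assume that E has the form (cf. Section 4.2)
$$C(X) \ni f \mapsto E(f) = \sum_{\sigma\in X} f(\sigma)\,E_\sigma \in B(\mathcal{H}^{\otimes N}), \tag{7.54}$$
where $X \subset B^*(\mathcal{H})$ is a finite set²³ of pure states. The quality of E can be measured, in analogy to Section 7.1.1, by a fidelity-like quantity
$$F_s(E) = \inf_{\psi\in\mathcal{H}}\langle\psi, \bar\sigma\,\psi\rangle = \inf_{\psi\in\mathcal{H}}\sum_{\sigma\in X}\langle\psi^{\otimes N}, E_\sigma\psi^{\otimes N}\rangle\,\langle\psi,\sigma\psi\rangle, \tag{7.55}$$
where $\bar\sigma = \sum_{\sigma\in X}\langle\psi^{\otimes N}, E_\sigma\psi^{\otimes N}\rangle\,\sigma$ is the (density-matrix-valued) expectation value of E, and the infimum is taken over all pure states ψ. Hence $F_s(E)$ measures the worst fidelity of $\bar\sigma$ with respect to the input state ψ.

Fig. 7.3. One- and all-qubit fidelities of the optimal purifier for ϑ = 0.5 and N = 10, plotted as a function of M.

If we now construct a cloner $T_E$ from E by
$$T_E^*\bigl(|\psi\rangle\langle\psi|^{\otimes N}\bigr) = \sum_{\sigma\in X}\langle\psi^{\otimes N}, E_\sigma\psi^{\otimes N}\rangle\,\sigma^{\otimes M}, \tag{7.56}$$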

its one-particle fidelity $F_{c,1}(T_E)$ obviously coincides with $F_s(E)$. Since we can produce in this way arbitrarily many clones of the same quality, we see that $F_s(E)$ is not larger than $F_{c,1}(N,M)$ for all M, and therefore
$$F_s(E) \le F_{c,1}(N,\infty) = \lim_{M\to\infty}F_{c,1}(N,M) = \frac{d-1}{d}\,\frac{N}{N+d} + \frac{1}{d}, \tag{7.57}$$
where we can regard $F_{c,1}(N,\infty)$ as the optimal quality of a cloner which produces arbitrarily many outputs from N input systems.
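Carrying out the limit in (7.57) shows that the bound simplifies to $(N+1)/(N+d)$. The short Python check below (function names are ours) confirms this in exact rational arithmetic and also exhibits the finite-M excess $F_{c,1}(N,M) - F_{c,1}(N,\infty) = (d-1)N/\bigl((N+d)M\bigr)$, i.e. how much better the optimal cloner is than any estimation-based construction of the form (7.56) when only M copies are needed.

```python
from fractions import Fraction


def cloning_fidelity_one(N: int, M: int, d: int) -> Fraction:
    """F_{c,1}(N, M) of Eq. (7.42) in exact rational arithmetic."""
    return Fraction(d - 1, d) * Fraction(N, M) * Fraction(M + d, N + d) + Fraction(1, d)


def estimation_bound(N: int, d: int) -> Fraction:
    """F_{c,1}(N, infinity) = lim_{M -> infinity} F_{c,1}(N, M), right-hand side of (7.57)."""
    return Fraction(d - 1, d) * Fraction(N, N + d) + Fraction(1, d)


if __name__ == "__main__":
    N, d = 3, 2
    for M in (3, 10, 100, 10_000):
        print(M, float(cloning_fidelity_one(N, M, d)))
    print("limit:", estimation_bound(N, d))  # prints "limit: 4/5"
```

Using `Fraction` avoids any rounding, so the identities can be compared with `==` rather than with a tolerance.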
To see that this bound can be saturated, consider an asymptotically exact family
$$C(X_M) \ni f \mapsto E^M(f) = \sum_{\sigma\in X_M} f(\sigma)\,E^M_\sigma \in B(\mathcal{H}^{\otimes M}), \qquad X_M \subset \mathcal{S}(\mathcal{H}), \tag{7.58}$$
of estimators, i.e. the error probabilities (4.17) vanish in the limit M → ∞. If the $E^M_\sigma$ are pure tensor products (i.e. the $E^M$ are realized by a “quorum” of observables as described in Section 4.2.1), they cannot distinguish between the output state $\hat T^*(|\psi\rangle\langle\psi|^{\otimes N})$ (which is highly correlated) and the product state $\tilde\sigma^{\otimes M}$, where $\tilde\sigma \in B^*(\mathcal{H})$ denotes the partial trace over M − 1 tensor factors (due to permutation invariance it does not matter which factors we trace away here). Hence, if we apply $E^M$ to the output of the optimal N to M cloner $\hat T_{N\to M}$, we get an estimate for $\tilde\sigma$, and in the limit M → ∞ this estimate is exact. The fidelity $\langle\psi,\tilde\sigma\psi\rangle$ of $\tilde\sigma$ with respect to the pure input state ψ of $\hat T_{N\to M}$, however, coincides with $F_{c,1}(N,M)$. Hence the composition of $\hat T_{N\to M}$ with $E^M$ converges²⁴ to an estimator E with $F_s(E) = F_{c,1}(N,\infty)$. We can rephrase this result roughly in the form: “producing infinitely many optimal clones of a pure state is the same as estimating optimally”.

²³ The generalization of the following considerations to continuous sets and a measure theoretic setup is straightforward and does not lead to a different result, i.e. we cannot improve the estimation quality with continuous observables.
7.2.4. The UNOT gate
The discussion of the last subsection shows that the optimal cloner $\hat T_{N\to M}$ produces better clones than any estimation-based scheme (as in Eq. (7.56)), as long as we are interested only in finitely many copies. Loosely speaking, the detour via classical information is wasteful and destroys too much quantum information. The same is true for the optimal purifier: we can first run an estimator on the mixed input state $\rho(\beta)^{\otimes N}$, apply the inverse $(R^*)^{-1}$ of the channel map to the attained classical data and reprepare arbitrarily many purified qubits accordingly. The quality of the output systems attained this way is, however, worse than that of the optimal purifier from Eq. (7.49) as long as the number M of output systems is finite; this can easily be seen from Fig. 7.3. The UNOT gate is different, and in this sense a harder task than cloning and purification: there is no quantum operation which performs better than the estimation-based strategy. The following theorem can again be proved with the group theoretical scheme from Section 7.1.3 [36].
Theorem 7.3. Let $\mathcal{H} = \mathbb{C}^2$. Among all channels $T: B(\mathcal{H}) \to B(\mathcal{H}_+^{\otimes N})$ the estimation-based scheme just described attains the biggest possible value for the fidelity $F_{\Theta,\#}$, namely
$$F_{\Theta,1}(N,1) = F_{\Theta,all}(N,1) = 1 - \frac{1}{N+2}. \tag{7.59}$$
The dependence on the number M of outputs is not interesting here, because the optimal device
produces arbitrarily many copies of the same quality.
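For qubits the value (7.59) can be traced back to optimal pure-state estimation. The following short derivation assumes (our reading of “the estimation-based scheme”) that the device estimates a pure state σ and then prepares its orthogonal complement $\mathbb{1}-\sigma$. Since ψ and $\psi^\perp$ form an orthonormal basis of $\mathbb{C}^2$ and tr σ = 1, the output fidelity with respect to $\psi^\perp$ equals the estimation fidelity with respect to ψ; for an optimal estimator E this gives:

```latex
\langle\psi^\perp,(\mathbb{1}-\sigma)\,\psi^\perp\rangle
  = 1-\langle\psi^\perp,\sigma\,\psi^\perp\rangle
  = \langle\psi,\sigma\,\psi\rangle ,
\qquad\text{hence}\qquad
F_{\Theta,1}(N,1) = F_s(E) = \frac{N+1}{N+2} = 1-\frac{1}{N+2}.
```

Here (N+1)/(N+2) is the optimal pure-state estimation fidelity for d = 2, i.e. the value of $F_{c,1}(N,\infty)$ at d = 2.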
7.3. Asymptotic behaviour
If a device, such as the optimal cloner, is given which produces M output systems from N inputs, it is interesting to ask for the maximal rate, i.e. the maximal ratio M(N)/N in the limit N → ∞ such that the asymptotic fidelity $\lim_{N\to\infty}F(N,M(N))$ is above a certain threshold (preferably equal to one). Note that this type of question was very important as well for distillation of entanglement and for channel capacities, but there the corresponding rates are in most cases very hard to compute. In the current context this type of question is somewhat easier to answer. This relies on the one hand on the group theoretical structure presented in the last section and on the other hand on the close relation to quantum state estimation. We therefore start this section with a look at some aspects of the asymptotics of mixed state estimation.

²⁴ Basically, convergence must be shown here. It follows, however, easily from the corresponding property of the $E^M$.
7.3.1. Estimating mixed states
If we do not know a priori that the input systems are in a pure state, much less is known about estimating and cloning. It is, in particular, almost impossible to say anything about optimality for finitely many input systems (except when N is very small, see e.g. [156]). Nevertheless, some strong results are available for the behavior in the limit N → ∞, and we will give a short review of some of them here.
One quantity which is interesting to analyze for a family of estimators $E^N$ in the limit N → ∞ is the variance of the $E^N$. To state some results in this context it is convenient to parameterize the state space $\mathcal{S}(\mathcal{H})$, or parts of it, in terms of n real parameters $x = (x_1,\dots,x_n)\in\Omega\subset\mathbb{R}^n$ and to write ρ(x) for the corresponding state. If we want to cover all states, one particular parameterization is e.g. the generalized Bloch ball from Section 2.1.2. An estimator taking N input systems is now a (discrete) observable $E^N_x \in B(\mathcal{H}^{\otimes N})$, $x\in X_N$, with values in a (finite) subset $X_N$ of Ω. The expectation value of $E^N$ in the state $\rho(x)^{\otimes N}$ is therefore the vector $\langle E^N\rangle_x$ with components $\langle E^N\rangle_{x,j}$, $j = 1,\dots,n$, given by
$$\langle E^N\rangle_{x,j} = \sum_{y\in X_N} y_j\,\operatorname{tr}\bigl(E^N_y\,\rho(x)^{\otimes N}\bigr), \tag{7.60}$$
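and the mean quadratic error is described by the matrix
$$V^N_{jk}(x) = \sum_{y\in X_N}\bigl(\langle E^N\rangle_{x,j} - y_j\bigr)\bigl(\langle E^N\rangle_{x,k} - y_k\bigr)\operatorname{tr}\bigl(E^N_y\,\rho(x)^{\otimes N}\bigr). \tag{7.61}$$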
For a good estimation strategy we expect that $V^N_{jk}(x)$ decreases as 1/N, i.e.
$$V^N_{jk}(x) \approx \frac{W_{jk}(x)}{N}, \tag{7.62}$$
where the scaled mean quadratic error matrix $W_{jk}(x)$ does not depend on N. The task is now to find bounds on this matrix. We will state here one result taken from [66]. To this end we need the Helstrom quantum information matrix
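$$H_{jk}(x) = \operatorname{tr}\Bigl(\rho(x)\,\frac{\lambda_j(x)\lambda_k(x) + \lambda_k(x)\lambda_j(x)}{2}\Bigr), \tag{7.63}$$
which is defined in terms of the symmetric logarithmic derivatives $\lambda_j$, which in turn are implicitly given by
$$\frac{\partial\rho(x)}{\partial x_j} = \frac{\lambda_j(x)\rho(x) + \rho(x)\lambda_j(x)}{2}. \tag{7.64}$$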
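To illustrate (7.63) and (7.64), take a single qubit with the standard Bloch parameterization $\rho(x) = (\mathbb{1} + x\cdot\sigma)/2$ (our choice for this example) and evaluate at $x = (0,0,r)$ with $0 \le r < 1$. Solving (7.64) in the eigenbasis of ρ and inserting the result into (7.63), with the symmetrized product of the $\lambda_j$, gives a diagonal information matrix:

```latex
\rho(x) = \tfrac12(\mathbb{1} + x\cdot\sigma), \quad x = (0,0,r):
\qquad
\lambda_1 = \sigma_1, \quad \lambda_2 = \sigma_2, \quad
\lambda_3 = \frac{\sigma_3 - r\,\mathbb{1}}{1-r^2},
\qquad
H(0,0,r) = \operatorname{diag}\Bigl(1,\; 1,\; \frac{1}{1-r^2}\Bigr).
```

One checks (7.64) directly: $\sigma_k\rho + \rho\sigma_k = \sigma_k$ for k = 1, 2, because the two eigenvalues (1 ± r)/2 of ρ sum to one, while $\lambda_3$ commutes with ρ and satisfies $\lambda_3\rho = \sigma_3/2$.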

Now we have the following theorem [66]:
Theorem 7.4. Consider a family of estimators $E^N$, N ∈ ℕ, as described above such that the following conditions hold:
1. The scaled mean quadratic error matrix $N V^N_{jk}(x)$ converges uniformly in x to $W_{jk}(x)$ as N → ∞.
2. $W_{jk}(x)$ is continuous at a point $x_0$ of the parameter domain.
3. $H_{jk}(x)$ and its derivatives are bounded in a neighborhood of $x_0$.
Then we have
$$\operatorname{tr}\bigl[H^{-1}(x_0)\,W^{-1}(x_0)\bigr] \le d - 1. \tag{7.65}$$

For qubits this bound can be attained by a particular estimation strategy which measures each qubit separately. We refer to [66] for details.
A second quantity which is interesting to study in the limit N → ∞ is the error probability defined in Section 4.2; cf. Eq. (4.17). For a good estimation strategy it should of course go to zero; an additional question, however, concerns the rate at which this happens. We will review here a result from [99] which concerns the subproblem of estimating the spectrum. Hence we are now looking at a family of observables $E^N: C(X_N)\to B(\mathcal{H}^{\otimes N})$, N ∈ ℕ, taking their values in a finite subset $X_N$ of the set
$$\Sigma = \Bigl\{(x_1,\dots,x_d)\in\mathbb{R}^d \Bigm| x_1\ge\cdots\ge x_d\ge 0,\ \sum_j x_j = 1\Bigr\} \tag{7.66}$$
of ordered spectra of density operators on $\mathcal{H} = \mathbb{C}^d$. Our aim is to determine the behavior of the error probabilities (cf. Eq. (4.17))
$$K_N(\Delta) = \sum_{x\in\Delta\cap X_N}\operatorname{tr}\bigl(E^N_x\,\rho^{\otimes N}\bigr) \tag{7.67}$$
in the limit N → ∞. Following the general arguments in Section 7.1.2, we can restrict our attention here to covariant observables, i.e. we can assume without loss of estimation quality that the $E^N_x$ commute with all permutation unitaries $V_p$, $p\in S_N$, and all local unitaries $U^{\otimes N}$, $U\in U(d)$. If we restrict our attention in addition to projection-valued measures, which is a natural way of ruling out unnecessary fuzziness, we see that each $E^N_x$ must coincide with a (sum of) projections $P_Y$ from $\mathcal{H}^{\otimes N}$ onto the U(d), respectively $V_p$, invariant subspace $\mathcal{H}_Y\otimes\mathcal{K}_Y$ defined in Eq. (7.32), where $Y = (Y_1,\dots,Y_d)$ refers here to Young frames with d rows and N boxes. The only remaining freedom for the $E^N$ is the assignment $x(Y)\in\Sigma$ of Young frames (and therefore of projections $P_Y$) to points in Σ. Since the Young frames themselves have, up to normalization, the same structure as the elements of Σ, one possibility for x(Y) is just x(Y) = Y/N. Written as a quantum to classical channel this is
$$C(X_N) \ni f \mapsto \sum_Y f(Y/N)\,P_Y \in B(\mathcal{H}^{\otimes N}), \tag{7.68}$$
where $X_N\subset\Sigma$ is the set of normalized Young frames, i.e. all Y/N where Y has d rows and N boxes. It turns out, somewhat surprisingly, that this choice indeed leads to an asymptotically exact estimation strategy with exponentially decaying error probability (7.67). The following theorem can be proven with methods from the theory of large deviations:
Theorem 7.5. The family of estimators $E^N$, N ∈ ℕ, given in Eq. (7.68) is asymptotically exact, i.e. the error probabilities $K_N(\Delta)$ vanish in the limit N → ∞ if Δ is the complement of a ball around the spectrum $r\in\Sigma$ of ρ. If Δ is a set (possibly containing r) whose interior is dense in its closure, we have the asymptotic estimate
$$\lim_{N\to\infty}\frac{1}{N}\ln K_N(\Delta) = -\inf_{s\in\Delta} I(s), \tag{7.69}$$
where the “rate function” $I:\Sigma\to\mathbb{R}$ is just the relative entropy between the two probability vectors s and r:
$$I(s) = \sum_j s_j\,(\ln s_j - \ln r_j). \tag{7.70}$$

To make this statement more transparent, note that we can rephrase (7.69) as
$$K_N(\Delta) \approx \exp\Bigl(-N\inf_{s\in\Delta} I(s)\Bigr). \tag{7.71}$$
Since the rate function I vanishes only for s = r, we see that the probability measures $K_N$ converge (weakly) to a point measure concentrated at r. The rate of this convergence is exponential and is measured exactly by the function I.
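Theorem 7.5 can be checked numerically for qubits. The sketch below (helper names are hypothetical) assumes ρ has spectrum $(r_1, 1-r_1)$ with $r_1 > 1/2$ and uses $\operatorname{tr}(P_Y\rho^{\otimes N}) = \dim\mathcal{K}_Y\,s_Y(r_1,r_2)$, where $s_Y$ is the two-variable Schur polynomial and $\dim\mathcal{K}_Y = \binom{N}{Y_2} - \binom{N}{Y_2-1}$. These are standard Schur–Weyl facts rather than formulas quoted from the text. For $\Delta = \{x : x_1 \ge a\}$ with $a > r_1$, the infimum in (7.69) is attained at $s = (a, 1-a)$, so the empirical exponent $-\frac1N\ln K_N(\Delta)$ should approach $I(a,1-a)$, up to corrections of order $(\ln N)/N$.

```python
import math


def log_prob_young(N: int, Y1: int, r1: float) -> float:
    """ln tr(P_Y rho^{(x)N}) for a qubit with spectrum (r1, 1 - r1), r1 > 1/2, Y = (Y1, N - Y1)."""
    r2 = 1.0 - r1
    Y2 = N - Y1
    n = Y1 - Y2
    # multiplicity dim K_Y of the corresponding irrep of the permutation group
    dim_K = math.comb(N, Y2) - (math.comb(N, Y2 - 1) if Y2 >= 1 else 0)
    # s_Y(r1, r2) = sum_{k=Y2}^{Y1} r1^k r2^(N-k), summed in closed form and taken in logs
    log_schur = (Y2 * math.log(r1 * r2) + (n + 1) * math.log(r1)
                 + math.log1p(-(r2 / r1) ** (n + 1)) - math.log(r1 - r2))
    return math.log(dim_K) + log_schur


def log_sum_exp(values: list) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def error_exponent(N: int, r1: float, a: float) -> float:
    """-(1/N) ln K_N(Delta) for Delta = {x : x1 >= a}, using the estimate x(Y) = Y/N of (7.68)."""
    frames = [Y1 for Y1 in range((N + 1) // 2, N + 1) if Y1 / N >= a]
    return -log_sum_exp([log_prob_young(N, Y1, r1) for Y1 in frames]) / N


def rate_function(s1: float, r1: float) -> float:
    """I(s) of Eq. (7.70) for s = (s1, 1 - s1), r = (r1, 1 - r1), with 0 < s1 < 1."""
    return s1 * math.log(s1 / r1) + (1 - s1) * math.log((1 - s1) / (1 - r1))


if __name__ == "__main__":
    r1, a = 0.7, 0.85
    for N in (100, 1000, 4000):
        print(N, error_exponent(N, r1, a), rate_function(a, r1))
```

Working with logarithms throughout is what makes N in the thousands feasible; the probabilities themselves underflow long before that.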
7.3.2. Purification and cloning
Let us now come back to the discussion of purification started in Section 7.2.2 (consequently we have $\mathcal{H} = \mathbb{C}^2$ again). Our aim is now to calculate the fidelities $F_{R,\#}(N,M(N))$ in the limit N → ∞ for a sequence M(N), N ∈ ℕ, such that M(N)/N converges to a value c ∈ ℝ. The crucial step is the application of Theorem 7.5. The density matrices $\rho_s(\beta)$ from Eq. (7.46) can alternatively be defined by
$$\rho_s(\beta)\otimes\frac{\mathbb{1}}{\dim\mathcal{K}_{N,s}} = w_N(s)^{-1}P_s\,\rho(\beta)^{\otimes N}P_s, \qquad w_N(s) = \operatorname{tr}\bigl(\rho(\beta)^{\otimes N}P_s\bigr), \tag{7.72}$$
where $P_s$ is the projection from $\mathcal{H}^{\otimes N}$ onto $\mathcal{H}_s\otimes\mathcal{K}_{N,s}$. In other words, $P_s$ is equal to $P_Y$ from Eq. (7.68) if we apply the reparametrization
$$(Y_1, Y_2)\mapsto(s, N) = \bigl((Y_1 - Y_2)/2,\ Y_1 + Y_2\bigr). \tag{7.73}$$
In a similar way we can rewrite the set of ordered spectra by $(x_1,x_2)\mapsto x_1 - x_2\in[0,1]$, and $K_N(\Delta)$ becomes a measure on [0,1] (i.e. $\Delta\subset[0,1]$):
$$K_N(\Delta) = \sum_{2s/N\in\Delta}\operatorname{tr}\bigl(\rho(\beta)^{\otimes N}P_s\bigr) = \sum_{2s/N\in\Delta} w_N(s), \tag{7.74}$$
and the sum
$$F_{R,\#}(N,M(N)) = \sum_s w_N(s)\,f_\#(M(N),\beta,s) \tag{7.75}$$
can be rephrased as the integral of a function $[0,1]\ni x\mapsto\tilde f_\#(N,\beta,x)\in\mathbb{R}$ with respect to this measure, provided $\tilde f_\#$ is related to $f_\#$ by $\tilde f_\#(N,\beta,2s/N) = f_\#(M(N),\beta,s)$. According to Theorem 7.5 the $K_N$ converge to a point measure concentrated at the ordered spectrum of ρ(β); but the latter corresponds, according to the reparametrization above, to the noise parameter ϑ = tanh β. Hence, if the sequence of functions $\tilde f_\#(N,\beta,\cdot)$ converges for N → ∞ uniformly (or at least uniformly on a neighborhood of ϑ) to $\tilde f_\#(\beta,\cdot)$, we get
$$\lim_{N\to\infty}F_{R,\#}(N,M(N)) = \lim_{N\to\infty}\sum_s w_N(s)\,\tilde f_\#(N,\beta,2s/N) = \tilde f_\#(\beta,\vartheta) \tag{7.76}$$
for the limit of the fidelities.

Fig. 7.4. Asymptotic all-qubit fidelity Φ(c) plotted as a function of the rate c, for ϑ = 0.25, 0.50, 0.75 and 1.00.

A precise formulation of this idea leads to the following theorem [100].
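Theorem 7.6. The two purification fidelities $F_{R,\#}$ have the following limits:
$$\lim_{N\to\infty}\lim_{M\to\infty}F_{R,1}(N,M) = 1 \tag{7.77}$$
and
$$\Phi(c) = \lim_{\substack{N\to\infty\\ M/N\to c}}F_{R,all}(N,M) = \begin{cases}\dfrac{2\vartheta^2}{2\vartheta^2 + (1-\vartheta)\,c}, & c\le\vartheta,\\[2ex] \dfrac{2\vartheta^2}{(1+\vartheta)\,c}, & c\ge\vartheta.\end{cases} \tag{7.78}$$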

If we are only interested in the quality of each qubit separately, we can produce arbitrarily good purified qubits at any rate. If, on the other hand, the correlations between the output systems should vanish in the limit, i.e. the all-qubit fidelity should tend to one, the rate is always zero (for ϑ < 1). This can be seen from the function Φ, which is the asymptotic all-qubit fidelity that can be reached at a given rate c. We have plotted it in Fig. 7.4. Note finally that the results just stated contain the rates of optimal cloning machines as a special case; we only have to set ϑ = 1.
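The limit (7.78) is simple enough to tabulate directly. The Python sketch below (function name is ours) evaluates Φ(c) for the noise parameters shown in Fig. 7.4. It also illustrates the special case ϑ = 1, where Φ reduces to min(1, 1/c). This matches the large-N limit of the qubit cloning fidelity $d[N]/d[M] = (N+1)/(M+1)$ at rate c = M/N ≥ 1, while for c ≤ 1 no copies have to be made.

```python
def asymptotic_all_fidelity(c: float, theta: float) -> float:
    """Phi(c) from Eq. (7.78): asymptotic all-qubit fidelity at rate c = lim M/N.

    theta is the noise parameter (0 < theta <= 1), c >= 0 the rate.
    """
    if c <= theta:
        return 2 * theta**2 / (2 * theta**2 + (1 - theta) * c)
    return 2 * theta**2 / ((1 + theta) * c)


if __name__ == "__main__":
    for theta in (0.25, 0.5, 0.75, 1.0):  # the values of theta used in Fig. 7.4
        print(theta, [round(asymptotic_all_fidelity(c, theta), 3) for c in (0.0, 0.5, 1.0, 2.0)])
```

Both branches give 2ϑ/(1 + ϑ) at c = ϑ, so Φ is continuous, and for ϑ < 1 it drops below one as soon as c > 0.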



References
[1] A. Acín, A. Andrianov, L. Costa, E. Jané, J.I. Latorre, R. Tarrach, Schmidt decomposition and classification of
three-quantum-bit states, Phys. Rev. Lett. 85 (7) (2000) 1560–1563.
[2] C. Adami, N.J. Cerf, Von Neumann capacity of noisy quantum channels, Phys. Rev. A 56 (5) (1997) 3470–3483.
[3] G. Alber, T. Beth, M. Horodecki, R. Horodecki, M. Rötteler, H. Weinfurter, R. Werner, A. Zeilinger (Eds.),
Quantum Information, Springer, Berlin, 2001.
[4] A. Ashikhmin, E. Knill, Nonbinary quantum stabilizer codes, IEEE Trans. Inf. Theory 47 (7) (2001) 3065–3072.
[5] A. Aspect, J. Dalibard, G. Roger, Experimental test of Bell’s inequalities using time-varying analyzers, Phys. Rev.
Lett. 49 (1982) 1804–1807.
[6] H. Barnum, E. Knill, M.A. Nielsen, On quantum fidelities and channel capacities, IEEE Trans. Inf. Theory 46
(2000) 1317–1329.
[7] H. Barnum, M.A. Nielsen, B. Schumacher, Information transmission through a noisy quantum channel, Phys. Rev.
A 57 (6) (1998) 4153–4175.
[8] H. Barnum, J.A. Smolin, B.M. Terhal, Quantum capacity is properly defined without encodings, Phys. Rev. A 58
(5) (1998) 3496–3501.
[9] C.H. Bennett, H.J. Bernstein, S. Popescu, B. Schumacher, Concentrating partial entanglement by local operations,
Phys. Rev. A 53 (4) (1996) 2046–2052.
[10] C.H. Bennett, G. Brassard, Quantum key distribution and coin tossing, in: Proceedings of the IEEE International
Conference on Computers, Systems, and Signal Processing, Bangalore, India, IEEE, New York, 1984, pp. 175 –179.
[11] C.H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres, W.K. Wootters, Teleporting an unknown quantum state
via dual classical and Einstein–Podolsky–Rosen channels, Phys. Rev. Lett. 70 (1993) 1895–1899.
[12] C.H. Bennett, G. Brassard, S. Popescu, B. Schumacher, J.A. Smolin, W.K. Wootters, Purification of noisy
entanglement and faithful teleportation via noisy channels, Phys. Rev. Lett. 76 (5) (1996) 722–725;
C.H. Bennett, G. Brassard, S. Popescu, B. Schumacher, J.A. Smolin, W.K. Wootters, Erratum, Phys. Rev. Lett.
78 (10) (1997) 2031.
[13] C.H. Bennett, D.P. DiVincenzo, C.A. Fuchs, T. Mor, E.M. Rains, P.W. Shor, J.A. Smolin, W.K. Wootters, Quantum
nonlocality without entanglement, Phys. Rev. A 59 (2) (1999) 1070–1091.
[14] C.H. Bennett, D.P. DiVincenzo, T. Mor, P.W. Shor, J.A. Smolin, B.M. Terhal, Unextendible product bases and
bound entanglement, Phys. Rev. Lett. 82 (26) (1999) 5385–5388.
[15] C.H. Bennett, D.P. DiVincenzo, J.A. Smolin, Capacities of quantum erasure channels, Phys. Rev. Lett. 78 (16)
(1997) 3217–3220.
[16] C.H. Bennett, D.P. DiVincenzo, J.A. Smolin, W.K. Wootters, Mixed-state entanglement and quantum error
correction, Phys. Rev. A 54 (4) (1996) 3824–3851.
[17] C.H. Bennett, P.W. Shor, J.A. Smolin, A.V. Thapliyal, Entanglement-assisted classical capacity of noisy quantum
channels, Phys. Rev. Lett. 83 (15) (1999) 3081–3084.
[18] C.H. Bennett, P.W. Shor, J.A. Smolin, A.V. Thapliyal, Entanglement-assisted capacity of a quantum channel and
the reverse Shannon theorem, 2001, quant-ph/0106052.
[19] C.H. Bennett, S.J. Wiesner, Communication via one- and two-particle operators on Einstein–Podolsky–Rosen states,
Phys. Rev. Lett. 20 (1992) 2881–2884.
[20] T. Beth, M. Rötteler, Quantum algorithms: applicable algebra and quantum physics, in: G. Alber, et al. (Eds.),
Quantum Information, Springer, Berlin, 2001, pp. 97–150.
[21] E. Biolatti, R.C. Iotti, P. Zanardi, F. Rossi, Quantum information processing with semiconductor macroatoms, Phys.
Rev. Lett. 85 (26) (2000) 5647–5650.
[22] D. Boschi, S. Branca, F. De Martini, L. Hardy, S. Popescu, Experimental realization of teleporting an unknown pure
quantum state via dual classical and Einstein–Podolsky–Rosen channels, Phys. Rev. Lett. 80 (6) (1998) 1121–1125.
[23] D. Bouwmeester, A.K. Ekert, A. Zeilinger (Eds.), The Physics of Quantum Information: Quantum Cryptography,
Quantum Teleportation, Quantum Computation, Springer, Berlin, 2000.
[24] D. Bouwmeester, J.-W. Pan, K. Mattle, M. Eibl, H. Weinfurter, A. Zeilinger, Experimental quantum teleportation,
Nature 390 (1997) 575–579.
[25] O. Bratteli, D.W. Robinson, Operator Algebras and Quantum Statistical Mechanics I, Springer, New York, 1979.



[26] O. Bratteli, D.W. Robinson, Operator Algebras and Quantum Statistical Mechanics II, Springer, Berlin, 1997.
[27] S.L. Braunstein, C.M. Caves, R. Jozsa, N. Linden, S. Popescu, R. Schack, Separability of very noisy mixed states
and implications for NMR quantum computing, Phys. Rev. Lett. 83 (5) (1999) 1054–1057.
[28] G.K. Brennen, C.M. Caves, I.H. Deutsch, F.S. Jessen, Quantum logic gates in optical lattices, Phys. Rev. Lett. 82
(5) (1999) 1060–1063.
[29] K.R. Brown, D.A. Lidar, K.B. Whaley, Quantum computing with quantum dots on linear supports, 2001,
quant-ph/0105102.
[30] T.A. Brun, H.L. Wang, Coupling nanocrystals to a high-Q silica microsphere: entanglement in quantum dots via
photon exchange, Phys. Rev. A 61 (2000) 032307.
[31] D. Bruß, D.P. DiVincenzo, A. Ekert, C.A. Fuchs, C. Macchiavello, J.A. Smolin, Optimal universal and
state-dependent cloning, Phys. Rev. A 57 (4) (1998) 2368–2378.
[32] D. Bruß, A.K. Ekert, C. Macchiavello, Optimal universal quantum cloning and state estimation, Phys. Rev. Lett.
81 (12) (1998) 2598–2601.
[33] D. Bruß, C. Macchiavello, Optimal state estimation for d-dimensional quantum systems, Phys. Lett. A 253 (1999)
249–251.
[34] W.T. Buttler, R.J. Hughes, S.K. Lamoreaux, G.L. Morgan, J.E. Nordholt, C.G. Peterson, Daylight quantum key
distribution over 1.6 km, Phys. Rev. Lett. 84 (2000) 5652–5655.
[35] V. Bužek, M. Hillery, Universal optimal cloning of qubits and quantum registers, Phys. Rev. Lett. 81 (22) (1998)
5003–5006.
[36] V. Bužek, M. Hillery, R.F. Werner, Optimal manipulations with qubits: universal-NOT gate, Phys. Rev. A 60 (4)
(1999) R2626–R2629.
[37] A. Cabello, Bibliographic guide to the foundations of quantum mechanics and quantum information, 2000,
quant-ph/0012089.
[38] A.R. Calderbank, E.M. Rains, P.W. Shor, N.J.A. Sloane, Quantum error correction and orthogonal geometry, Phys.
Rev. Lett. 78 (3) (1997) 405–408.
[39] A.R. Calderbank, P.W. Shor, Good quantum error-correcting codes exist, Phys. Rev. A 54 (1996) 1098–1105.
[40] N.J. Cerf, Asymmetric quantum cloning in any direction, J. Mod. Opt. 47 (2) (2000) 187–209.
[41] N.J. Cerf, C. Adami, Negative entropy and information in quantum mechanics, Phys. Rev. Lett. 79 (26) (1997)
5194–5197.
[42] N.J. Cerf, C. Adami, R.M. Gingrich, Reduction criterion for separability, Phys. Rev. A 60 (2) (1999) 898–909.
[43] N.J. Cerf, S. Iblisdir, G. van Assche, Cloning and cryptography with quantum continuous variables, 2001,
quant-ph/0107077.
[44] I.L. Chuang, L.M.K. Vandersypen, X.L. Zhou, D.W. Leung, S. Lloyd, Experimental realization of a quantum
algorithm, Nature 393 (1998) 143–146.
[45] A. Church, An unsolved problem of elementary number theory, Am. J. Math. 58 (1936) 345–363.
[46] J.I. Cirac, A.K. Ekert, C. Macchiavello, Optimal purification of single qubits, Phys. Rev. Lett. 82 (1999) 4344–4347.
[47] J.F. Clauser, M.A. Horne, A. Shimony, R.A. Holt, Proposed experiment to test local hidden-variable theories,
Phys. Rev. Lett. 23 (15) (1969) 880–884.
[48] J.F. Cornwell, Group Theory in Physics II, Academic Press, London, 1984.
[49] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, Chichester, 1991.
[50] E.B. Davies, Quantum Theory of Open Systems, Academic Press, London, 1976.
[51] B. Demoen, P. Vanheuverzwijn, A. Verbeure, Completely positive maps on the CCR-algebra, Lett. Math. Phys. 2
(1977) 161–166.
[52] D. Deutsch, Quantum theory, the Church–Turing principle and the universal quantum computer, Proc. R. Soc.
London A 400 (1985) 97–117.
[53] D. Deutsch, R. Jozsa, Rapid solution of problems by quantum computation, Proc. R. Soc. London A 439 (1992)
553–558.
[54] D.P. DiVincenzo, P.W. Shor, J.A. Smolin, Quantum-channel capacity of very noisy channels, Phys. Rev. A 57 (2)
(1998) 830–839;
D.P. DiVincenzo, P.W. Shor, J.A. Smolin, Erratum, Phys. Rev. A 59 (2) (1999) 1717.
[55] D.P. DiVincenzo, P.W. Shor, J.A. Smolin, B.M. Terhal, A.V. Thapliyal, Evidence for bound entangled states with
negative partial transpose, Phys. Rev. A 61 (6) (2000) 062312.



[56] M.J. Donald, M. Horodecki, Continuity of relative entropy of entanglement, Phys. Lett. A 264 (4) (1999) 257–260.
[57] M.J. Donald, M. Horodecki, O. Rudolph, The uniqueness theorem for entanglement measures, 2001,
quant-ph/0105017.
[58] W. Dür, J.I. Cirac, M. Lewenstein, D. Bruss, Distillability and partial transposition in bipartite systems, Phys. Rev.
A 61 (6) (2000) 062313.
[59] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, 1993.
[60] T. Eggeling, K.G.H. Vollbrecht, R.F. Werner, M.M. Wolf, Distillability via protocols respecting the positivity of
the partial transpose, Phys. Rev. Lett. 87 (2001) 257902.
[61] T. Eggeling, R.F. Werner, Separability properties of tripartite states with U × U × U -symmetry, Phys. Rev. A 63
(4) (2001) 042111.
[62] A. Feinstein, Foundations of Information Theory, McGraw-Hill, New York, 1958.
[63] D.G. Fischer, M. Freyberger, Estimating mixed quantum states, Phys. Lett. A 273 (2000) 293–302.
[64] G. Giedke, L.-M. Duan, J.I. Cirac, P. Zoller, Distillability criterion for all bipartite gaussian states, Quant. Inf.
Comput. 1 (3) (2001).
[65] G. Giedke, B. Kraus, M. Lewenstein, J.I. Cirac, Separability properties of three-mode gaussian states, Phys. Rev.
A 64 (5) (2001) 052303.
[66] R.D. Gill, S. Massar, State estimation for large ensembles, Phys. Rev. A 61 (2000) 2312–2327.
[67] N. Gisin, Hidden quantum nonlocality revealed by local filters, Phys. Lett. A 210 (3) (1996) 151–156.
[68] N. Gisin, S. Massar, Optimal quantum cloning machines, Phys. Rev. Lett. 79 (11) (1997) 2153–2156.
[69] N. Gisin, G. Ribordy, W. Tittel, H. Zbinden, Quantum Cryptography, 2001, quant-ph/0101098.
[70] D. Gottesman, Class of quantum error-correcting codes saturating the quantum Hamming bound, Phys. Rev. A 54
(1996) 1862–1868.
[71] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. Thesis, California Institute of Technology,
1997, quant-ph/9705052.
[72] M. Grassl, T. Beth, T. Pellizzari, Codes for the quantum erasure channel, Phys. Rev. A 56 (1) (1997) 33–38.
[73] D.M. Greenberger, M.A. Horne, A. Zeilinger, Going beyond Bell’s theorem, in: M. Kafatos (Ed.), Bell’s Theorem,
Quantum Theory, and Conceptions of the Universe, Kluwer Academic Publishers, Dordrecht, 1989, pp. 69–72.
[74] L.K. Grover, Quantum computers can search arbitrarily large databases by a single query, Phys. Rev. A 56 (23)
(1997) 4709–4712.
[75] L.K. Grover, Quantum mechanics helps in searching for a needle in a haystack, Phys. Rev. Lett. 79 (2) (1997)
325–328.
[76] J. Gruska, Quantum Computing, McGraw-Hill, New York, 1999.
[77] J. Harrington, J. Preskill, Achievable rates for the gaussian quantum channel, Phys. Rev. A 64 (6) (2001) 062301.
[78] P.M. Hayden, M. Horodecki, B.M. Terhal, The asymptotic entanglement cost of preparing a quantum state, J. Phys.
A. Math. Gen. 34 (35) (2001) 6891–6898.
[79] A.S. Holevo, Probabilistic and Statistical Aspects of Quantum Theory, North-Holland, Amsterdam, 1982.
[80] A.S. Holevo, Coding theorems for quantum channels, Tamagawa University Research Review no. 4, 1998,
quant-ph/9809023.
[81] A.S. Holevo, Sending quantum information with gaussian states, in: Proceedings of the Fourth International
Conference on Quantum Communication, Measurement and Computing, Evanston, 1998, quant-ph/9809022.
[82] A.S. Holevo, On entanglement-assisted classical capacity, 2001, quant-ph/0106075.
[83] A.S. Holevo, Statistical Structure of Quantum Theory, Springer, Berlin, 2001.
[84] A.S. Holevo, R.F. Werner, Evaluating capacities of bosonic gaussian channels, Phys. Rev. A 63 (3) (2001) 032312.
[85] M. Horodecki, P. Horodecki, Reduction criterion of separability and limits for a class of distillation protocols, Phys.
Rev. A 59 (6) (1999) 4206–4216.
[86] M. Horodecki, P. Horodecki, R. Horodecki, Separability of mixed states: necessary and sufficient conditions, Phys.
Lett. A 223 (1–2) (1996) 1–8.
[87] M. Horodecki, P. Horodecki, R. Horodecki, Mixed-state entanglement and distillation: is there a “bound”
entanglement in nature? Phys. Rev. Lett. 80 (24) (1998) 5239–5242.
[88] M. Horodecki, P. Horodecki, R. Horodecki, General teleportation channel, singlet fraction, and quasidistillation,
Phys. Rev. A 60 (3) (1999) 1888–1898.



[89] M. Horodecki, P. Horodecki, R. Horodecki, Limits for entanglement measures, Phys. Rev. Lett. 84 (9) (2000)
2014–2017.
[90] M. Horodecki, P. Horodecki, R. Horodecki, Unified approach to quantum capacities: towards quantum noisy coding
theorem, Phys. Rev. Lett. 85 (2) (2000) 433–436.
[91] M. Horodecki, P. Horodecki, R. Horodecki, Mixed-state entanglement and quantum communication, in: G. Alber,
et al., (Eds.), Quantum Information, Springer, Berlin, 2001, pp. 151–195.
[92] P. Horodecki, M. Horodecki, R. Horodecki, Bound entanglement can be activated, Phys. Rev. Lett. 82 (5) (1999)
1056–1059.
[93] R.J. Hughes, G.L. Morgan, C.G. Peterson, Quantum key distribution over a 48 km optical fibre network, J. Mod.
Opt. 47 (2–3) (2000) 533–547.
[94] A. Jamiolkowski, Linear transformations which preserve trace and positive semidefiniteness of operators, Rep. Math.
Phys. 3 (1972) 275–278.
[95] T. Jennewein, C. Simon, G. Weihs, H. Weinfurter, A. Zeilinger, Quantum cryptography with entangled photons,
Phys. Rev. Lett. 84 (2000) 4729–4732.
[96] J.A. Jones, M. Mosca, R.H. Hansen, Implementation of a quantum search algorithm on a quantum computer,
Nature 393 (1998) 344–346.
[97] M. Keyl, D. Schlingemann, R.F. Werner, Infinitely entangled states, in preparation.
[98] M. Keyl, R.F. Werner, Optimal cloning of pure states, testing single clones, J. Math. Phys. 40 (1999) 3283–3299.
[99] M. Keyl, R.F. Werner, Estimating the spectrum of a density operator, Phys. Rev. A 64 (5) (2001) 052311.
[100] M. Keyl, R.F. Werner, The rate of optimal purification procedures, Ann. H. Poincaré 2 (2001) 1–26.
[101] A.I. Khinchin, Mathematical Foundations of Information Theory, Dover Publications, New York, 1957.
[102] B.E. King, C.S. Wood, C.J. Myatt, Q.A. Turchette, D. Leibfried, W.M. Itano, C. Monroe, D.J. Wineland, Cooling
the collective motion of trapped ions to initialize a quantum register, Phys. Rev. Lett. 81 (7) (1998) 1525–1528.
[103] E. Knill, R. Laflamme, Theory of quantum error-correcting codes, Phys. Rev. A 55 (2) (1997) 900–911.
[104] B. Kraus, M. Lewenstein, J.I. Cirac, Characterization of distillable and activable states using entanglement witnesses,
2001, quant-ph/0110174.
[105] K. Kraus, States, Effects and Operations, Springer, Berlin, 1983.
[106] R. Landauer, Irreversibility and heat generation in the computing process, IBM J. Res. Dev. 5 (1961) 183.
[107] U. Leonhardt, Measuring the Quantum State of Light, Cambridge University Press, Cambridge, 1997.
[108] M. Lewenstein, A. Sanpera, Separability and entanglement of composite quantum systems, Phys. Rev. Lett. 80 (11)
(1998) 2261–2264.
[109] N. Linden, H. Barjat, R. Freeman, An implementation of the Deutsch–Jozsa algorithm on a three-qubit NMR
quantum computer, Chem. Phys. Lett. 296 (1–2) (1998) 61–67.
[110] S. Lloyd, Capacity of the noisy quantum channel, Phys. Rev. A 55 (3) (1997) 1613–1622.
[111] H.-K. Lo, T. Spiller, S. Popescu (Eds.), Introduction to Quantum Computation and Information, World Scientific,
Singapore, 1998.
[112] Y. Makhlin, G. Schön, A. Shnirman, Quantum-state engineering with Josephson-junction devices, Rev. Mod. Phys.
73 (2) (2001) 357–400.
[113] R. Marx, A.F. Fahmy, J.M. Myers, W. Bermel, S.J. Glaser, Approaching five-bit NMR quantum computing,
Phys. Rev. A 62 (1) (2000) 012310.
[114] R. Matsumoto, T. Uyematsu, Lower bound for the quantum capacity of a discrete memoryless quantum channel,
2001, quant-ph/0105151.
[115] K. Mattle, H. Weinfurter, P.G. Kwiat, A. Zeilinger, Dense coding in experimental quantum communication,
Phys. Rev. Lett. 76 (25) (1996) 4656–4659.
[116] N.D. Mermin, Quantum mysteries revisited, Am. J. Phys. 58 (8) (1990) 731–734.
[117] N.D. Mermin, What’s wrong with these elements of reality? Phys. Today 43 (6) (1990) 9–11.
[118] H.C. Nägerl, W. Bechter, J. Eschner, F. Schmidt-Kaler, R. Blatt, Ion strings for quantum gates, Appl. Phys. B 66
(5) (1998) 603–608.
[119] M.A. Nielsen, Conditions for a class of entanglement transformations, Phys. Rev. Lett. 83 (2) (1999) 436–439.
[120] M.A. Nielsen, Continuity bounds for entanglement, Phys. Rev. A 61 (6) (2000) 064301.
[121] M.A. Nielsen, Characterizing mixing and measurement in quantum mechanics, Phys. Rev. A 63 (2) (2001) 022114.
[122] M.A. Nielsen, I.L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press,
Cambridge, 2000.
[123] M. Ohya, D. Petz, Quantum Entropy and its Use, Springer, Berlin, 1993.
[124] C.M. Papadimitriou, Computational Complexity, Addison-Wesley, Reading, MA, 1994.
[125] V.I. Paulsen, Completely Bounded Maps and Dilations, Longman Scientific & Technical, New York, 1986.
[126] A. Peres, Higher order Schmidt decompositions, Phys. Lett. A 202 (1) (1995) 16–17.
[127] A. Peres, Separability criterion for density matrices, Phys. Rev. Lett. 77 (8) (1996) 1413–1415.
[128] S. Popescu, Bell’s inequalities versus teleportation: what is nonlocality? Phys. Rev. Lett. 72 (6) (1994) 797–799.
[129] S. Popescu, D. Rohrlich, Thermodynamics and the measure of entanglement, Phys. Rev. A 56 (5) (1997)
R3319–R3321.
[130] J. Preskill, Lecture notes for the course ‘Information for Physics 219/Computer Science 219, Quantum Computation,’
Caltech, Pasadena, California, 1999, www.theory.caltech.edu/people/preskill/ph229.
[131] M. Purser, Introduction to Error-Correcting Codes, Artech House, Boston, 1995.
[132] E.M. Rains, Bound on distillable entanglement, Phys. Rev. A 60 (1) (1999) 179–184;
E.M. Rains, Erratum, Phys. Rev. A 63 (1) (2001) 019902(E).
[133] E.M. Rains, A semidefinite program for distillable entanglement, IEEE Trans. Inf. Theory 47 (7) (2001) 2921–2933.
[134] M. Reed, B. Simon, Methods of Modern Mathematical Physics I, Academic Press, San Diego, 1980.
[135] W. Rudin, Functional Analysis, McGraw-Hill, New York, 1973.
[136] O. Rudolph, A separability criterion for density operators, J. Phys. A 33 (21) (2000) 3951–3955.
[137] D. Schlingemann, R.F. Werner, Quantum error-correcting codes associated with graphs, 2000, quant-ph/0012111.
[138] C.E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J. 27 (1948) 379–423, 623–656.
[139] P.W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in: S. Goldwasser (Ed.),
Proceedings of the 35th Annual Symposium on the Foundations of Computer Science, IEEE Computer Society Press,
Los Alamitos, CA, 1994, pp. 124–134.
[140] P.W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,
SIAM J. Comput. 26 (1997) 1484–1509.
[141] P.W. Shor, J.A. Smolin, B.M. Terhal, Nonadditivity of bipartite distillable entanglement follows from a conjecture
on bound entangled Werner states, Phys. Rev. Lett. 86 (12) (2001) 2681–2684.
[142] B. Simon, Representations of Finite and Compact Groups, American Mathematical Society, Providence, RI, 1996.
[143] D. Simon, On the power of quantum computation, in: Proceedings of the 35th Annual Symposium on Foundations
of Computer Science, IEEE Computer Society Press, Los Alamitos, CA, 1994, pp. 116–123.
[144] R. Simon, Peres-Horodecki separability criterion for continuous variable systems, Phys. Rev. Lett. 84 (12) (2000)
2726–2729.
[145] S. Singh, The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography, Fourth Estate,
London, 1999.
[146] A.M. Steane, Multiple particle interference and quantum error correction, Proc. Roy. Soc. London A 452 (1996)
2551–2577.
[147] W.F. Stinespring, Positive functions on C*-algebras, Proc. Am. Math. Soc. 6 (1955) 211–216.
[148] E. Størmer, Positive linear maps of operator algebras, Acta Math. 110 (1963) 233–278.
[149] T. Tanamoto, Quantum gates by coupled asymmetric quantum dots and controlled-not-gate operation, Phys. Rev.
A 61 (2000) 022305.
[150] B.M. Terhal, K.G.H. Vollbrecht, Entanglement of formation for isotropic states, Phys. Rev. Lett. 85 (12) (2000)
2625–2628.
[151] W. Tittel, J. Brendel, H. Zbinden, N. Gisin, Violation of Bell inequalities by photons more than 10 km apart,
Phys. Rev. Lett. 81 (17) (1998) 3563–3566.
[152] A.M. Turing, On computable numbers, with an application to the Entscheidungsproblem, Proc. London Math. Soc.
Ser. 2 42 (1936) 230–265.
[153] V. Vedral, M.B. Plenio, Entanglement measures and purification procedures, Phys. Rev. A 57 (3) (1998) 1619–1633.
[154] V. Vedral, M.B. Plenio, M.A. Rippin, P.L. Knight, Quantifying entanglement, Phys. Rev. Lett. 78 (12) (1997)
2275–2279.
[155] G. Vidal, Entanglement monotones, J. Mod. Opt. 47 (2–3) (2000) 355–376.
[156] G. Vidal, J.I. Latorre, P. Pascual, R. Tarrach, Optimal minimal measurements of mixed states, Phys. Rev. A 60
(1999) 126–135.
[157] G. Vidal, R. Tarrach, Robustness of entanglement, Phys. Rev. A 59 (1) (1999) 141–155.
[158] G. Vidal, R.F. Werner, A computable measure of entanglement, 2001, quant-ph/0102117.
[159] K.G.H. Vollbrecht, R.F. Werner, Entanglement measures under symmetry, 2000, quant-ph/0010095.
[160] K.G.H. Vollbrecht, R.F. Werner, Why two qubits are special, J. Math. Phys. 41 (10) (2000) 6772–6782.
[161] I. Wegener, The Complexity of Boolean Functions, Teubner, Stuttgart, 1987.
[162] S. Weigert, Reconstruction of quantum states and its conceptual implications, in: H.D. Doebner, S.T. Ali, M. Keyl,
R.F. Werner (Eds.), Trends in Quantum Mechanics, World Scientific, Singapore, 2000, pp. 146–156.
[163] H. Weinfurter, A. Zeilinger, Quantum communication, in: G. Alber, et al., (Eds.), Quantum Information, Springer,
Berlin, 2001, pp. 58–95.
[164] R.F. Werner, Quantum harmonic analysis on phase space, J. Math. Phys. 25 (1984) 1404–1411.
[165] R.F. Werner, Quantum states with Einstein–Podolsky–Rosen correlations admitting a hidden-variable model, Phys.
Rev. A 40 (8) (1989) 4277–4281.
[166] R.F. Werner, Optimal cloning of pure states, Phys. Rev. A 58 (1998) 980–1003.
[167] R.F. Werner, All teleportation and dense coding schemes, 2000, quant-ph/0003070.
[168] R.F. Werner, Quantum information theory—an invitation, in: G. Alber, et al., (Eds.), Quantum Information, Springer,
Berlin, 2001, pp. 14–59.
[169] R.F. Werner, M.M. Wolf, Bell inequalities and entanglement, Quant. Inf. Comput. 1 (3) (2001) 1–25.
[170] R.F. Werner, M.M. Wolf, Bound entangled Gaussian states, Phys. Rev. Lett. 86 (16) (2001) 3658–3661.
[171] H. Weyl, The Classical Groups, Princeton University Press, Princeton, NJ, 1946.
[172] W.K. Wootters, Entanglement of formation of an arbitrary state of two qubits, Phys. Rev. Lett. 80 (10) (1998)
2245–2248.
[173] W.K. Wootters, W.H. Zurek, A single quantum cannot be cloned, Nature 299 (1982) 802–803.
[174] S.L. Woronowicz, Positive maps of low dimensional matrix algebras, Rep. Math. Phys. 10 (1976) 165–183.