Getting AV1/SVC to work in the Janus WebRTC Server

1. Getting AV1/SVC to work in the Janus WebRTC Server Lorenzo Miniero @lminiero@fosstodon.org FOSDEM 2024 Real Time Communications 3rd February 2024, Brussels

2. Who am I? Lorenzo Miniero • Ph.D @ UniNA • Chairman @ Meetecho • Main author of Janus Contacts and info • lorenzo@meetecho.com • https://fosstodon.org/@lminiero • https://www.meetecho.com • https://lminiero.it

3. AOMedia Video 1 (AV1) • Open, royalty free, video codec • Developed by Alliance for Open Media (AOM) • https://aomedia.org/av1-features/get-started/ • Specifically designed for real-time applications • Support for higher resolutions • Natively conceived to support SVC as well • Scalable Video Coding

7. Why is SVC important? • Simulcast • Same source, same m-line • Streams of different “quality” are separate tracks • Each stream uses a different SSRC • Each stream can be decoded independently of the others • SVC • Same source, same m-line • Streams of different “quality” are layers of the same “thing” • All streams share the same SSRC (since they’re layers) • Each stream depends on the previous one to be decoded • Less bandwidth, but more CPU intensive • Fun fact: Simulcast in browsers also enables temporal scalability, which allows dropping to a lower framerate without sacrificing quality

10. Simulcast in a nutshell https://webrtchacks.com/sfu-simulcast/

11. SVC as a different way to encode multiple tracks https://webrtchacks.com/chrome-vp9-svc/

12. Enter the Janus WebRTC Server Janus General purpose, open source WebRTC server • https://github.com/meetecho/janus-gateway • Demos and documentation: https://janus.conf.meetecho.com • Community: https://janus.discourse.group/

13. Adding AV1 support to Janus • A few different (incremental) requirements 1 Negotiate AV1 in the SDP 2 Detect keyframes when receiving packets (useful for different reasons) 3 Copy AV1 frames from Janus recordings to a playable format 4 Negotiate Dependency Descriptor extension in the SDP (for SVC) 5 Parse Dependency Descriptor extension format 6 Use Dependency Descriptor for SVC in different plugins • Negotiating AV1 is easy • a=rtpmap:XX AV1/90000 (note: was AV1X/90000 in first Chrome integration) • Detecting keyframes and supporting AV1 in recordings is trickier • Requires understanding of RTP packetization rules for AV1 • Let’s start there, before diving into SVC!
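As a rough illustration of the negotiation step, the sketch below scans an SDP for an AV1 rtpmap line and accepts the older AV1X name too. The function name `find_av1_payload_types` is made up for this example and is not part of Janus.

```python
import re
from typing import List

# Matches "a=rtpmap:<pt> AV1/90000" and the legacy "AV1X/90000" name.
_AV1_RTPMAP = re.compile(r"^a=rtpmap:(\d+) AV1X?/90000$", re.IGNORECASE)


def find_av1_payload_types(sdp: str) -> List[int]:
    """Return the payload types that the SDP maps to AV1 (hypothetical helper)."""
    payload_types = []
    for line in sdp.splitlines():
        match = _AV1_RTPMAP.match(line.strip())
        if match:
            payload_types.append(int(match.group(1)))
    return payload_types


if __name__ == "__main__":
    offer = (
        "m=video 9 UDP/TLS/RTP/SAVPF 96 45\r\n"
        "a=rtpmap:96 VP8/90000\r\n"
        "a=rtpmap:45 AV1/90000\r\n"
    )
    print(find_av1_payload_types(offer))
```

The payload type numbers in the sample offer are arbitrary; real values are chosen dynamically by the endpoints.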

22. Starting from the basics: AV1 and RTP • All codecs need packetization rules, to be used in RTP • Especially true for video, since data may be split across multiple packets • Need to know what’s what, how to split, and how to stitch back together • Activity usually carried out within the IETF (Internet Engineering Task Force) • AVTCORE Working Group (Audio/Video Transport Core Maintenance) • https://datatracker.ietf.org/wg/avtcore/about/ • RTP payload format for AV1 developed by AOM, though • https://aomediacodec.github.io/av1-rtp-spec/ • Two main concepts introduced in the document 1 AV1 aggregation header (for the RTP payload) 2 Dependency descriptor (for SVC)

26. AV1 and RTP [Figure: RTP packet layout, 32 bits per row. RTP header: V=2, P, X, CC, M, PT, sequence number; timestamp; synchronization source (SSRC) identifier. RTP extension(s), optional: 0x100, 0x0, extension length; ID, header length, AV1 Dependency Descriptor; other extensions. RTP payload: AV1 aggregation header, then bytes 2..N of the AV1 payload.]
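The fixed 12-byte RTP header in this layout can be decoded with a few bit operations. The following is a minimal sketch, not Janus code; `parse_rtp_header` and the dictionary keys are names chosen for the example.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (illustrative helper)."""
    if len(packet) < 12:
        raise ValueError("RTP packet too short")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # V, expected to be 2
        "padding": bool(b0 & 0x20),    # P
        "extension": bool(b0 & 0x10),  # X, set when header extensions follow
        "csrc_count": b0 & 0x0F,       # CC
        "marker": bool(b1 & 0x80),     # M
        "payload_type": b1 & 0x7F,     # PT
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


if __name__ == "__main__":
    # Hand-built header: V=2, X=1, M=1, PT=45, plus made-up seq/ts/SSRC values.
    raw = bytes([0x90, 0xAD]) + struct.pack("!HII", 19076, 9393030, 0x1234)
    print(parse_rtp_header(raw))
```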

27. AV1 aggregation header • Fundamental concept to map OBUs (Open Bitstream Units) to RTP packets • How to split an OBU across multiple packets • How to aggregate different OBUs in the same packet • How to regenerate original OBUs from RTP packets • Header byte, bits 0 to 7: Z | Y | W W | N | - | - | - • Z: 1 if the first OBU element continues from the previous packet • Y: 1 if the last OBU element will continue in the next packet • W: number of OBU elements in packet (2 bits; 0 means each element carries its own size) • N: 1 if first packet of a coded video sequence • Other bits are currently reserved (and unused)
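The bit layout of the aggregation header maps directly to masks and shifts on the first payload byte. A small sketch follows; the `AggregationHeader` type and function name are invented for illustration.

```python
from typing import NamedTuple


class AggregationHeader(NamedTuple):
    z: bool  # first OBU element continues from the previous packet
    y: bool  # last OBU element continues in the next packet
    w: int   # number of OBU elements (0 = every element has a size field)
    n: bool  # first packet of a coded video sequence


def parse_aggregation_header(first_byte: int) -> AggregationHeader:
    """Split the first byte of the RTP payload into Z, Y, W and N."""
    return AggregationHeader(
        z=bool(first_byte & 0x80),
        y=bool(first_byte & 0x40),
        w=(first_byte >> 4) & 0x03,
        n=bool(first_byte & 0x08),
    )


if __name__ == "__main__":
    # 0b01101000: Z=0, Y=1, W=2, N=1
    print(parse_aggregation_header(0b01101000))
```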

28. AV1 aggregation header example (W=0) [Figure: payload layout. Aggregation header byte: Z Y 0 0 N - - -, followed by OBU element 1 size (leb128), OBU element 1 data, OBU element 2 size (leb128), OBU element 2 data, ..., OBU element N size (leb128), OBU element N data.]

29. AV1 aggregation header example (W!=0) [Figure: payload layout. Aggregation header byte: Z Y 1 0 N - - - (W=2), followed by OBU element 1 size (leb128), OBU element 1 data, then OBU element 2 data with no size field.]
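The two layouts (W=0 and W!=0) can be handled by one loop: every element is preceded by a leb128 size, except the last one when W is non-zero, which simply runs to the end of the payload. This is a sketch under that reading of the payload format; the helper names are hypothetical and error handling is minimal.

```python
from typing import List, Tuple


def read_leb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a leb128 value at pos; return (value, position after it)."""
    value = 0
    for i in range(8):  # AV1 limits leb128 values to 8 bytes
        if pos + i >= len(buf):
            raise ValueError("truncated leb128")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("leb128 too long")


def split_obu_elements(payload: bytes) -> List[bytes]:
    """Split an AV1 RTP payload into its OBU elements."""
    if not payload:
        raise ValueError("empty payload")
    w = (payload[0] >> 4) & 0x03
    pos = 1  # skip the aggregation header byte
    elements: List[bytes] = []
    while pos < len(payload):
        # With W != 0, the W-th (last) element has no size field.
        if w != 0 and len(elements) == w - 1:
            size = len(payload) - pos
        else:
            size, pos = read_leb128(payload, pos)
        if pos + size > len(payload):
            raise ValueError("OBU element exceeds payload")
        elements.append(payload[pos:pos + size])
        pos += size
    return elements


if __name__ == "__main__":
    print(split_obu_elements(bytes([0x00, 2, 0xAA, 0xBB, 1, 0xCC])))  # W=0
    print(split_obu_elements(bytes([0x20, 2, 0xAA, 0xBB, 0xCC, 0xDD])))  # W=2
```

The sample payloads are made-up bytes, not real AV1 data.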

30. Using the Aggregation Header • Detecting a keyframe can be approximated • Z=0 and N=1 (not a continuation + first packet) −→ keyframe • Simple and quick enough to use while relaying traffic • Smarter process when post-processing recordings, instead • Reconstructing OBUs performed following rules of Aggregation Header • Parse RTP payloads and OBUs until we have a full one • When one is ready, write it to file • Also need to know width/height of video (for MP4 metadata) • Resolution information is in a specific OBU (Sequence Header, type=1) • All this is enough for “normal” AV1 usage (no SVC)
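Both uses of the aggregation header can be sketched briefly: the quick keyframe approximation (Z=0 and N=1), and the stitching of OBU elements across packets for recordings. The class below assumes packets arrive in order and that the payload has already been split into OBU elements; all names are invented for the example, and this is not the Janus post-processor.

```python
from typing import List, Optional

OBU_SEQUENCE_HEADER = 1  # OBU type that carries the resolution


def is_probable_keyframe_start(payload: bytes) -> bool:
    """Approximation from the slide: Z=0 and N=1."""
    if not payload:
        return False
    z = bool(payload[0] & 0x80)
    n = bool(payload[0] & 0x08)
    return n and not z


def obu_type(obu: bytes) -> int:
    """OBU type: the 4 bits after the forbidden bit in the OBU header."""
    return (obu[0] >> 3) & 0x0F


class ObuReassembler:
    """Rebuild full OBUs from per-packet OBU elements using Z and Y."""

    def __init__(self) -> None:
        self._pending: Optional[bytearray] = None

    def push(self, z: bool, y: bool, elements: List[bytes]) -> List[bytes]:
        complete: List[bytes] = []
        for index, element in enumerate(elements):
            continues_previous = z and index == 0
            continues_next = y and index == len(elements) - 1
            if continues_previous:
                if self._pending is None:
                    continue  # never saw the start of this OBU: skip it
                self._pending += element
            else:
                self._pending = bytearray(element)
            if continues_next:
                continue  # wait for the rest in the next packet
            complete.append(bytes(self._pending))
            self._pending = None
        return complete


if __name__ == "__main__":
    r = ObuReassembler()
    print(r.push(False, True, [b"ab", b"cd"]))
    print(r.push(True, True, [b"ef"]))
    print(r.push(True, False, [b"gh", b"ij"]))
```

A recorder would write each completed OBU to file and check `obu_type(...) == OBU_SEQUENCE_HEADER` to find the one holding width and height.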

33. SVC and the Dependency Descriptors (DD) • With SVC, the video stream can have multiple layers • One or more spatial layers (different resolutions/bitrates) • One or more temporal layers (different framerates/bitrates) • Layers may have dependencies • e.g., can’t decode spatial layer X without layer Y • Has an impact on which packets can be dropped • Need to figure out these dependencies at runtime • SFU needs to know this info to decide when/what to drop/relay • Can’t parse the payload (too heavy; impossible when E2EE is used) Dependency Descriptor Custom RTP extension with dynamic info on SVC/layers
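In its simplest form, the drop/relay decision is a comparison between the layers a packet belongs to and the layers a subscriber asked for. The sketch below assumes a fully scalable structure where each layer depends only on lower ones; other structures may need the finer-grained information in the DD. The function name is made up.

```python
def should_relay(spatial: int, temporal: int,
                 target_spatial: int, target_temporal: int) -> bool:
    """Relay a packet only if it is at or below the subscriber's target layers."""
    return spatial <= target_spatial and temporal <= target_temporal


if __name__ == "__main__":
    # Subscriber wants spatial layer 1, temporal layer 2.
    for sl, tl in [(0, 0), (1, 2), (2, 0), (1, 1)]:
        print(sl, tl, should_relay(sl, tl, target_spatial=1, target_temporal=2))
```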

37. Example of decode targets/chains (L2T3) https://aomediacodec.github.io/av1-rtp-spec/#a10-examples

38. Adding support for DD in Janus • Continuing the requirements from before 4 Negotiate Dependency Descriptor extension in the SDP (for SVC) 5 Parse Dependency Descriptor extension format 6 Use Dependency Descriptor for SVC in different plugins • Negotiating was supposed to be the easy part • a=extmap:X https://aomediacodec.github.io/av1-rtp-spec/[..] • ... but we also needed to add support for two-byte headers (DD can be BIG!) • a=extmap-allow-mixed (RFC8285) • Parsing the format proved to be a painful task, though... • https://www.meetecho.com/blog/av1-svc/ • https://www.meetecho.com/blog/vp9-av1-simulcast-svc/
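To show why two-byte headers matter, here is a sketch of locating one extension element by its negotiated ID, covering both the one-byte form (profile 0xBEDE, at most 16 bytes per element) and the two-byte form (profile 0x100 plus four application bits, up to 255 bytes) from RFC 8285. The function name is hypothetical and malformed input is only lightly checked.

```python
import struct
from typing import Optional


def find_rtp_extension(packet: bytes, ext_id: int) -> Optional[bytes]:
    """Return the data of the header extension with the given ID, if present."""
    if len(packet) < 12 or not packet[0] & 0x10:  # X bit not set
        return None
    offset = 12 + 4 * (packet[0] & 0x0F)  # skip the CSRC list
    if len(packet) < offset + 4:
        return None
    profile, words = struct.unpack("!HH", packet[offset:offset + 4])
    data = packet[offset + 4:offset + 4 + 4 * words]
    one_byte = profile == 0xBEDE
    two_byte = (profile & 0xFFF0) == 0x1000
    if not (one_byte or two_byte):
        return None
    i = 0
    while i < len(data):
        if data[i] == 0:  # padding byte between elements
            i += 1
            continue
        if one_byte:
            eid, length = data[i] >> 4, (data[i] & 0x0F) + 1
            if eid == 15:  # reserved ID: stop parsing
                break
            i += 1
        else:
            if i + 1 >= len(data):
                break
            eid, length = data[i], data[i + 1]
            i += 2
        if eid == ext_id:
            return data[i:i + length]
        i += length
    return None


if __name__ == "__main__":
    header = bytes([0x90, 45]) + struct.pack("!HII", 1, 2, 3)
    two_byte_ext = struct.pack("!HH", 0x1000, 1) + bytes([5, 2, 0xAA, 0xBB])
    print(find_rtp_extension(header + two_byte_ext, 5))
```

The ID used here (5) is arbitrary; the real one is whatever the a=extmap line negotiated.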

45. Spoiler alert... DD is a f***ing mess!! • Closer to a codec bitstream, than a protocol format • Almost all fields are variable length (often at the bit level!) • Very hard to parse (even harder to validate parsing, when debugging) • Has a few mandatory fields • start_of_frame • end_of_frame • frame_number • frame_dependency_template_id • Everything else is optional • Allows DD to describe full chain, or just indicate scope of current frame
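The four mandatory fields occupy the first three bytes of the DD: one bit each for start_of_frame and end_of_frame, six bits for frame_dependency_template_id, and sixteen bits for frame_number. Since everything after that is read at the bit level, the sketch uses a tiny bit reader; `BitReader` and `parse_mandatory_fields` are illustrative names, and the input bytes are constructed by hand rather than taken from a capture.

```python
class BitReader:
    """Read big-endian bit fields from a byte string (illustrative helper)."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        if self._pos + nbits > 8 * len(self._data):
            raise ValueError("not enough bits")
        value = 0
        for _ in range(nbits):
            byte = self._data[self._pos >> 3]
            bit = (byte >> (7 - (self._pos & 7))) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


def parse_mandatory_fields(dd: bytes) -> dict:
    """Parse the mandatory part of a Dependency Descriptor."""
    reader = BitReader(dd)
    return {
        "start_of_frame": reader.read(1),
        "end_of_frame": reader.read(1),
        "frame_dependency_template_id": reader.read(6),
        "frame_number": reader.read(16),
    }


if __name__ == "__main__":
    # Hand-built bytes: s=1, e=1, template id 6, frame number 2.
    print(parse_mandatory_fields(bytes([0xC6, 0x00, 0x02])))
```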

48. Example of DD content (full list of templates) [lminiero@lminiero tests]$ ./parse-dd dd-l3t3.hex Opening file ’dd-l3t3.hex’... Read 95 bytes (760 bits) -- s=1, e=1, t=1, f=1 -- tdeps=1, adt=0, dtis=0, fdiffs=0, chains=0 -- -- tioff=0, dtcnt=9 -- -- Layers -- -- -- [0] spatial=0, temporal=0 -- -- -- [1] spatial=0, temporal=0 -- -- -- [2] spatial=0, temporal=1 -- -- -- [3] spatial=0, temporal=2 -- -- -- [4] spatial=0, temporal=2 -- -- -- [5] spatial=1, temporal=0 -- -- -- [6] spatial=1, temporal=0 -- -- -- [7] spatial=1, temporal=1 -- -- -- [8] spatial=1, temporal=2 -- -- -- [9] spatial=1, temporal=2 -- -- -- [10] spatial=2, temporal=0 -- -- -- [11] spatial=2, temporal=0 -- -- -- [12] spatial=2, temporal=1 -- -- -- [13] spatial=2, temporal=2 -- -- -- [14] spatial=2, temporal=2 -- -- DTIs -- -- -- [0][0] tdti=2 -- -- -- [0][1] tdti=2 -- -- -- [0][2] tdti=2 -- -- -- [0][3] tdti=3

49. Not done yet -- -- -- [0][4] tdti=3 -- -- -- [0][5] tdti=3 -- -- -- [0][6] tdti=3 -- -- -- [0][7] tdti=3 -- -- -- [0][8] tdti=3 -- -- -- -- [0] dti=SSSRRRRRR -- -- -- [1][0] tdti=2 -- -- -- [1][1] tdti=2 -- -- -- [1][2] tdti=2 -- -- -- [1][3] tdti=2 -- -- -- [1][4] tdti=2 -- -- -- [1][5] tdti=2 -- -- -- [1][6] tdti=2 -- -- -- [1][7] tdti=2 -- -- -- [1][8] tdti=2 -- -- -- -- [1] dti=SSSSSSSSS -- -- -- [2][0] tdti=0 -- -- -- [2][1] tdti=1 -- -- -- [2][2] tdti=2 -- -- -- [2][3] tdti=0 -- -- -- [2][4] tdti=3 -- -- -- [2][5] tdti=3 -- -- -- [2][6] tdti=0 -- -- -- [2][7] tdti=3 -- -- -- [2][8] tdti=3 -- -- -- -- [2] dti=-DS-RR-RR -- -- -- [3][0] tdti=0 -- -- -- [3][1] tdti=0

50. Maybe now? -- -- -- [3][2] tdti=1 -- -- -- [3][3] tdti=0 -- -- -- [3][4] tdti=0 -- -- -- [3][5] tdti=3 -- -- -- [3][6] tdti=0 -- -- -- [3][7] tdti=0 -- -- -- [3][8] tdti=3 -- -- -- -- [3] dti=--D--R--R -- -- -- [4][0] tdti=0 -- -- -- [4][1] tdti=0 -- -- -- [4][2] tdti=1 -- -- -- [4][3] tdti=0 -- -- -- [4][4] tdti=0 -- -- -- [4][5] tdti=3 -- -- -- [4][6] tdti=0 -- -- -- [4][7] tdti=0 -- -- -- [4][8] tdti=3 -- -- -- -- [4] dti=--D--R--R -- -- -- [5][0] tdti=0 -- -- -- [5][1] tdti=0 -- -- -- [5][2] tdti=0 -- -- -- [5][3] tdti=2 -- -- -- [5][4] tdti=2 -- -- -- [5][5] tdti=2 -- -- -- [5][6] tdti=3 -- -- -- [5][7] tdti=3 -- -- -- [5][8] tdti=3 -- -- -- -- [5] dti=---SSSRRR

51. Nope -- -- -- [6][0] tdti=0 -- -- -- [6][1] tdti=0 -- -- -- [6][2] tdti=0 -- -- -- [6][3] tdti=2 -- -- -- [6][4] tdti=2 -- -- -- [6][5] tdti=2 -- -- -- [6][6] tdti=2 -- -- -- [6][7] tdti=2 -- -- -- [6][8] tdti=2 -- -- -- -- [6] dti=---SSSSSS -- -- -- [7][0] tdti=0 -- -- -- [7][1] tdti=0 -- -- -- [7][2] tdti=0 -- -- -- [7][3] tdti=0 -- -- -- [7][4] tdti=1 -- -- -- [7][5] tdti=2 -- -- -- [7][6] tdti=0 -- -- -- [7][7] tdti=3 -- -- -- [7][8] tdti=3 -- -- -- -- [7] dti=----DS-RR -- -- -- [8][0] tdti=0 -- -- -- [8][1] tdti=0 -- -- -- [8][2] tdti=0 -- -- -- [8][3] tdti=0 -- -- -- [8][4] tdti=0 -- -- -- [8][5] tdti=1 -- -- -- [8][6] tdti=0 -- -- -- [8][7] tdti=0

52. Not even close -- -- -- [8][8] tdti=3 -- -- -- -- [8] dti=-----D--R -- -- -- [9][0] tdti=0 -- -- -- [9][1] tdti=0 -- -- -- [9][2] tdti=0 -- -- -- [9][3] tdti=0 -- -- -- [9][4] tdti=0 -- -- -- [9][5] tdti=1 -- -- -- [9][6] tdti=0 -- -- -- [9][7] tdti=0 -- -- -- [9][8] tdti=3 -- -- -- -- [9] dti=-----D--R -- -- -- [10][0] tdti=0 -- -- -- [10][1] tdti=0 -- -- -- [10][2] tdti=0 -- -- -- [10][3] tdti=0 -- -- -- [10][4] tdti=0 -- -- -- [10][5] tdti=0 -- -- -- [10][6] tdti=2 -- -- -- [10][7] tdti=2 -- -- -- [10][8] tdti=2 -- -- -- -- [10] dti=------SSS -- -- -- [11][0] tdti=0 -- -- -- [11][1] tdti=0 -- -- -- [11][2] tdti=0 -- -- -- [11][3] tdti=0 -- -- -- [11][4] tdti=0 -- -- -- [11][5] tdti=0

53. This is exhausting! -- -- -- [11][6] tdti=2 -- -- -- [11][7] tdti=2 -- -- -- [11][8] tdti=2 -- -- -- -- [11] dti=------SSS -- -- -- [12][0] tdti=0 -- -- -- [12][1] tdti=0 -- -- -- [12][2] tdti=0 -- -- -- [12][3] tdti=0 -- -- -- [12][4] tdti=0 -- -- -- [12][5] tdti=0 -- -- -- [12][6] tdti=0 -- -- -- [12][7] tdti=1 -- -- -- [12][8] tdti=2 -- -- -- -- [12] dti=-------DS -- -- -- [13][0] tdti=0 -- -- -- [13][1] tdti=0 -- -- -- [13][2] tdti=0 -- -- -- [13][3] tdti=0 -- -- -- [13][4] tdti=0 -- -- -- [13][5] tdti=0 -- -- -- [13][6] tdti=0 -- -- -- [13][7] tdti=0 -- -- -- [13][8] tdti=1 -- -- -- -- [13] dti=--------D -- -- -- [14][0] tdti=0 -- -- -- [14][1] tdti=0 -- -- -- [14][2] tdti=0 -- -- -- [14][3] tdti=0

54. How are you guys doing? -- -- -- [14][4] tdti=0 -- -- -- [14][5] tdti=0 -- -- -- [14][6] tdti=0 -- -- -- [14][7] tdti=0 -- -- -- [14][8] tdti=1 -- -- -- -- [14] dti=--------D -- -- FDiffs -- -- -- [0][0] 12 -- -- -- -- [0] --> 1 -- -- -- -- [1] --> 0 -- -- -- [2][1] 6 -- -- -- -- [2] --> 1 -- -- -- [3][2] 3 -- -- -- -- [3] --> 1 -- -- -- [4][3] 3 -- -- -- -- [4] --> 1 -- -- -- [5][4] 12 -- -- -- [5][4] 1 -- -- -- -- [5] --> 2 -- -- -- [6][6] 1 -- -- -- -- [6] --> 1 -- -- -- [7][7] 6 -- -- -- [7][7] 1 -- -- -- -- [7] --> 2 -- -- -- [8][9] 3 -- -- -- [8][9] 1 -- -- -- -- [8] --> 2 -- -- -- [9][11] 3

55. Wanna play chess? We do have time! -- -- -- [9][11] 1 -- -- -- -- [9] --> 2 -- -- -- [10][13] 12 -- -- -- [10][13] 1 -- -- -- -- [10] --> 2 -- -- -- [11][15] 1 -- -- -- -- [11] --> 1 -- -- -- [12][16] 6 -- -- -- [12][16] 1 -- -- -- -- [12] --> 2 -- -- -- [13][18] 3 -- -- -- [13][18] 1 -- -- -- -- [13] --> 2 -- -- -- [14][20] 3 -- -- -- [14][20] 1 -- -- -- -- [14] --> 2 -- -- -- FDiffs count=22 -- -- Chains -- -- -- [0] dtpb=0 -- -- -- [1] dtpb=0 -- -- -- [2] dtpb=0 -- -- -- [3] dtpb=1 -- -- -- [4] dtpb=1 -- -- -- [5] dtpb=1 -- -- -- [6] dtpb=2 -- -- -- [7] dtpb=2 -- -- -- [8] dtpb=2 -- -- -- [0][0] tcfdiff=12

56. Surely we’re almost done, right? -- -- -- [0][1] tcfdiff=11 -- -- -- [0][2] tcfdiff=10 -- -- -- [1][0] tcfdiff=0 -- -- -- [1][1] tcfdiff=0 -- -- -- [1][2] tcfdiff=0 -- -- -- [2][0] tcfdiff=6 -- -- -- [2][1] tcfdiff=5 -- -- -- [2][2] tcfdiff=4 -- -- -- [3][0] tcfdiff=3 -- -- -- [3][1] tcfdiff=2 -- -- -- [3][2] tcfdiff=1 -- -- -- [4][0] tcfdiff=9 -- -- -- [4][1] tcfdiff=8 -- -- -- [4][2] tcfdiff=7 -- -- -- [5][0] tcfdiff=1 -- -- -- [5][1] tcfdiff=1 -- -- -- [5][2] tcfdiff=1 -- -- -- [6][0] tcfdiff=1 -- -- -- [6][1] tcfdiff=1 -- -- -- [6][2] tcfdiff=1 -- -- -- [7][0] tcfdiff=7 -- -- -- [7][1] tcfdiff=6 -- -- -- [7][2] tcfdiff=5 -- -- -- [8][0] tcfdiff=4 -- -- -- [8][1] tcfdiff=3 -- -- -- [8][2] tcfdiff=2 -- -- -- [9][0] tcfdiff=10 -- -- -- [9][1] tcfdiff=9

57. You wish! -- -- -- [9][2] tcfdiff=8 -- -- -- [10][0] tcfdiff=2 -- -- -- [10][1] tcfdiff=1 -- -- -- [10][2] tcfdiff=1 -- -- -- [11][0] tcfdiff=2 -- -- -- [11][1] tcfdiff=1 -- -- -- [11][2] tcfdiff=1 -- -- -- [12][0] tcfdiff=8 -- -- -- [12][1] tcfdiff=7 -- -- -- [12][2] tcfdiff=6 -- -- -- [13][0] tcfdiff=5 -- -- -- [13][1] tcfdiff=4 -- -- -- [13][2] tcfdiff=3 -- -- -- [14][0] tcfdiff=11 -- -- -- [14][1] tcfdiff=10 -- -- -- [14][2] tcfdiff=9 -- -- Decode target layers -- -- -- [0] spatial=0, temporal=0 -- -- -- [1] spatial=0, temporal=1 -- -- -- [2] spatial=0, temporal=2 -- -- -- [3] spatial=1, temporal=0 -- -- -- [4] spatial=1, temporal=1 -- -- -- [5] spatial=1, temporal=2 -- -- -- [6] spatial=2, temporal=0 -- -- -- [7] spatial=2, temporal=1 -- -- -- [8] spatial=2, temporal=2 -- -- Resolutions -- -- -- [0] w=80, h=45

58. Oh, finally made it! -- -- -- [1] w=160, h=90 -- -- -- [2] w=320, h=180 -- -- Active_decode_targets_bitmask (1) -- -- -- adtb=511 -- spatial=0, temporal=0 (tindex 1) -- -- Resolution -- -- -- 80x45 -- Padding=0 Bye!
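Two pieces of the dump above are easy to reproduce in a few lines: turning the per-target tdti values into the compact dti strings, and expanding the active decode targets bitmask. The mapping 0 to "-", 1 to "D", 2 to "S", 3 to "R" is the one the dump itself exhibits; the bitmask helper assumes bit i, counted from the least significant bit, refers to decode target i. Function names are made up.

```python
from typing import List

DTI_SYMBOLS = "-DSR"  # not present, discardable, switch, required


def dti_string(tdti_values: List[int]) -> str:
    """Render one template's decode target indications as a string."""
    return "".join(DTI_SYMBOLS[value] for value in tdti_values)


def frame_needed_for(dti: str, decode_target: int) -> bool:
    """A frame is irrelevant to a decode target when its indication is '-'."""
    return dti[decode_target] != "-"


def active_decode_targets(bitmask: int, count: int) -> List[int]:
    """List the decode targets whose bit is set (assumes LSB = target 0)."""
    return [i for i in range(count) if (bitmask >> i) & 1]


if __name__ == "__main__":
    print(dti_string([0, 1, 2, 0, 3, 3, 0, 3, 3]))
    print(frame_needed_for("---SSSRRR", 0))
    print(active_decode_targets(511, 9))
```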

59. Luckily, not all DD messages are this big [lminiero@lminiero tests]$ ./parse-dd dd-l3t3.hex dd-l3t3-2.hex Opening file ’dd-l3t3.hex’... Read 95 bytes (760 bits) -- s=1, e=1, t=1, f=1 -- tdeps=1, adt=0, dtis=0, fdiffs=0, chains=0 -- -- tioff=0, dtcnt=9 [..] [..] -- Padding=0 Opening file ’dd-l3t3-2.hex’... Read 7 bytes (56 bits) -- s=1, e=1, t=6, f=2 -- tdeps=0, adt=0, dtis=0, fdiffs=0, chains=1 -- spatial=1, temporal=0 (tindex 6) -- -- Frame Chains -- -- -- [0] fcfdiff=1 -- -- -- [1] fcfdiff=0 -- -- -- [2] fcfdiff=0 -- Padding=3 Bye!
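The small 7-byte descriptor only makes sense if the templates from the big one were stored. A minimal sketch of that state follows: it keeps the template id offset and the per-template layers, and maps a later frame_dependency_template_id to a template index using the modulo-64 rule from the AV1 RTP specification. The class and method names are invented, and the layer list is copied from the dump above.

```python
from typing import List, Optional, Tuple

Layer = Tuple[int, int]  # (spatial, temporal)


class DdTemplateState:
    """Remember the template layers announced by a full DD."""

    def __init__(self) -> None:
        self._offset: Optional[int] = None
        self._layers: List[Layer] = []

    def store(self, template_id_offset: int, layers: List[Layer]) -> None:
        self._offset = template_id_offset
        self._layers = list(layers)

    def lookup(self, frame_dependency_template_id: int) -> Optional[Layer]:
        """Return the layers for a frame, or None if templates are unknown."""
        if self._offset is None:
            return None
        index = (frame_dependency_template_id + 64 - self._offset) % 64
        if index >= len(self._layers):
            return None
        return self._layers[index]


if __name__ == "__main__":
    state = DdTemplateState()
    state.store(0, [(0, 0), (0, 0), (0, 1), (0, 2), (0, 2),
                    (1, 0), (1, 0), (1, 1), (1, 2), (1, 2),
                    (2, 0), (2, 0), (2, 1), (2, 2), (2, 2)])
    print(state.lookup(6))  # the small DD above has t=6
```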

60. Using DD for SVC in Janus • Dependency nature of DD means keeping a state • We need to store the full state of DD templates (from RTP packet X) • DD in other RTP packets will reference existing DD templates • Once we know which spatial (SL) and temporal (TL) layer a packet belongs to... • ... we can decide whether to relay or drop it • Depends on what subscriber wants/needs • Don’t forget to relay the DD along with the video data! • We may be done with it, but the receiver will need it too • Of course, outgoing RTP headers need to be updated accordingly • We’ll drop packets, but the subscriber must see no gaps in sequence numbers • Last packet of sequence must have marker bit set to 1 (super-important!!)
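Hiding the dropped packets from the subscriber can be sketched as a running count of drops subtracted from each relayed sequence number. This assumes packets are processed in order with no reordering or retransmissions, which a real SFU has to handle as well; `SeqRewriter` is a made-up name.

```python
class SeqRewriter:
    """Close sequence number gaps left by dropped packets (in-order sketch)."""

    def __init__(self) -> None:
        self._dropped = 0

    def drop(self) -> None:
        """Record that one incoming packet was not relayed."""
        self._dropped = (self._dropped + 1) & 0xFFFF

    def relay(self, seq: int) -> int:
        """Return the outgoing sequence number for a relayed packet."""
        return (seq - self._dropped) & 0xFFFF  # 16-bit wraparound


if __name__ == "__main__":
    rewriter = SeqRewriter()
    # (incoming seq, relay?) pairs: two packets dropped in the middle.
    for seq, keep in [(19073, True), (19074, True), (19075, False),
                      (19076, False), (19077, True), (19078, True)]:
        if keep:
            print(seq, "->", rewriter.relay(seq))
        else:
            rewriter.drop()
```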

64. Why are marker bits important, here?
All packets, as received:
[SVC] 0/2, m=0, seq=19073, ts= 9393030
[SVC] 1/2, m=0, seq=19074, ts= 9393030
[SVC] 2/2, m=0, seq=19075, ts= 9393030
[SVC] 2/2, m=1, seq=19076, ts= 9393030
[SVC] 0/1, m=0, seq=19077, ts= 9395910
[SVC] 1/1, m=0, seq=19078, ts= 9395910
[SVC] 2/1, m=0, seq=19079, ts= 9395910
[SVC] 2/1, m=0, seq=19080, ts= 9395910
[SVC] 2/1, m=1, seq=19081, ts= 9395910
[SVC] 0/2, m=0, seq=19082, ts= 9398790
[SVC] 1/2, m=0, seq=19083, ts= 9398790
[SVC] 1/2, m=0, seq=19084, ts= 9398790
[SVC] 2/2, m=0, seq=19085, ts= 9398790
[SVC] 2/2, m=0, seq=19086, ts= 9398790
[SVC] 2/2, m=0, seq=19087, ts= 9398790
[SVC] 2/2, m=1, seq=19088, ts= 9398790
Same packets after dropping spatial layer 2 (SL2):
[SVC] 0/2, m=0, seq=19073, ts= 9393030
[SVC] 1/2, m=0, seq=19074, ts= 9393030
dropped SL2
dropped SL2 (m=1 lost!)
[SVC] 0/1, m=0, seq=19077, ts= 9395910
[SVC] 1/1, m=0, seq=19078, ts= 9395910
dropped SL2
dropped SL2
dropped SL2 (m=1 lost!)
[SVC] 0/2, m=0, seq=19082, ts= 9398790
[SVC] 1/2, m=0, seq=19083, ts= 9398790
[SVC] 1/2, m=0, seq=19084, ts= 9398790
dropped SL2
dropped SL2
dropped SL2
dropped SL2 (m=1 lost!)
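The effect of the fix can be illustrated offline on the first two frames of this log: after filtering out SL2, the last remaining packet of each timestamp gets the marker bit. This sketch looks ahead at the next packet, which a live SFU cannot do; in real time the decision needs a per-packet signal instead (the DD end_of_frame flag is one candidate, stated here as an assumption rather than as what Janus does). The function and field names are invented.

```python
from typing import Dict, List


def drop_layer_and_fix_markers(packets: List[Dict], max_spatial: int) -> List[Dict]:
    """Keep packets up to max_spatial and mark the last one of each frame."""
    kept = [dict(p) for p in packets if p["spatial"] <= max_spatial]
    for i, packet in enumerate(kept):
        last_of_frame = i + 1 == len(kept) or kept[i + 1]["ts"] != packet["ts"]
        packet["marker"] = last_of_frame
    return kept


if __name__ == "__main__":
    received = [
        {"spatial": 0, "seq": 19073, "ts": 9393030, "marker": False},
        {"spatial": 1, "seq": 19074, "ts": 9393030, "marker": False},
        {"spatial": 2, "seq": 19075, "ts": 9393030, "marker": False},
        {"spatial": 2, "seq": 19076, "ts": 9393030, "marker": True},
        {"spatial": 0, "seq": 19077, "ts": 9395910, "marker": False},
        {"spatial": 1, "seq": 19078, "ts": 9395910, "marker": False},
        {"spatial": 2, "seq": 19079, "ts": 9395910, "marker": False},
        {"spatial": 2, "seq": 19080, "ts": 9395910, "marker": False},
        {"spatial": 2, "seq": 19081, "ts": 9395910, "marker": True},
    ]
    for p in drop_layer_and_fix_markers(received, max_spatial=1):
        print(p["seq"], "m=1" if p["marker"] else "m=0")
```

Sequence numbers are left untouched here on purpose, to keep the marker logic separate from gap closing.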

65. To test all this, we’ll need a browser that supports AV1...

66. ... and the Dependency Descriptor extension --force-fieldtrials=WebRTC-DependencyDescriptorAdvertised/Enabled/

67. Eureka! https://janus.conf.meetecho.com/demos/echotest.html?vcodec=av1&svc=L3T3

68. Thanks! Questions? Comments? Contacts • https://fosstodon.org/@lminiero • https://twitter.com/elminiero • https://twitter.com/meetecho • https://www.meetecho.com/blog/

69. JanusCon is back, see you soon in Napoli! April 29-30, 2024, Napoli — https://januscon.it
