RTP payload formats explained

The Real-time Transport Protocol (RTP) specifies a general-purpose data format and network protocol for transmitting digital media streams on Internet Protocol (IP) networks. The details of media encoding, such as signal sampling rate, frame size and timing, are specified in an RTP payload format. The format parameters of the RTP payload are typically communicated between transmission endpoints with the Session Description Protocol (SDP), but other protocols, such as the Extensible Messaging and Presence Protocol (XMPP), may be used.
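As a rough illustration of how these parameters travel in SDP, the sketch below parses an rtpmap attribute line of the form a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]. The names RtpMap and parse_rtpmap and the sample lines are assumptions made for this example, and the parser skips the validation a real SDP implementation would need.

```python
from typing import NamedTuple, Optional


class RtpMap(NamedTuple):
    payload_type: int
    encoding_name: str
    clock_rate: int
    channels: Optional[int]  # the channel count is an optional trailing field


def parse_rtpmap(line: str) -> RtpMap:
    """Parse 'a=rtpmap:<pt> <name>/<clock rate>[/<channels>]' (hypothetical helper)."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    # Split the payload type number from the encoding description.
    pt_text, _, encoding = line[len(prefix):].strip().partition(" ")
    parts = encoding.strip().split("/")
    if len(parts) not in (2, 3):
        raise ValueError("expected <name>/<clock rate>[/<channels>]")
    channels = int(parts[2]) if len(parts) == 3 else None
    return RtpMap(int(pt_text), parts[0], int(parts[1]), channels)


if __name__ == "__main__":
    print(parse_rtpmap("a=rtpmap:0 PCMU/8000"))
    print(parse_rtpmap("a=rtpmap:96 opus/48000/2"))
```

The second sample line shows a dynamic payload type (96) being bound to an encoding name for the duration of a session.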

Audio and video payload types

RFC 3551, entitled RTP Profile for Audio and Video Conferences with Minimal Control, defines the RTP/AVP profile and specifies the technical parameters of payload formats for audio and video streams.

The standard also describes the process of registering new payload types with IANA; additional payload formats and payload types are defined in the specifications listed in the References column of the table below.

Payload type identifiers 96–127 are used for payloads defined dynamically during a session. Port numbers should preferably also be assigned dynamically, although ports 5004 and 5005 have been registered (for RTP and RTCP, respectively) for use with the profile when a dynamically assigned port is not required.
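The number ranges can be sketched in code. The example below classifies a payload type using only the ranges stated in this article (dynamic 96–127, reserved 72–76, unassigned 77–95) and shows why 72–76 are reserved: the second octet of an RTP header carries the marker bit followed by the 7-bit payload type, so payload type 72 with the marker bit set yields the same byte value as RTCP packet type 200. The function names are hypothetical, and everything outside the three named ranges is lumped together as "static or unassigned" rather than checked against the IANA registry.

```python
def classify_payload_type(pt: int) -> str:
    """Classify a 7-bit RTP payload type using the ranges given in this article."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits (0-127)")
    if 96 <= pt <= 127:
        return "dynamic"
    if 72 <= pt <= 76:
        return "reserved"
    if 77 <= pt <= 95:
        return "unassigned"
    return "static or unassigned"


def second_octet(marker: bool, pt: int) -> int:
    """Second byte of an RTP header: marker bit (MSB) followed by the payload type."""
    if not 0 <= pt <= 127:
        raise ValueError("payload type must fit in 7 bits (0-127)")
    return (0x80 if marker else 0x00) | pt


if __name__ == "__main__":
    for pt in (0, 72, 79, 96):
        print(pt, classify_payload_type(pt), second_octet(True, pt))
```

With the marker bit set, payload types 72 through 76 map to byte values 200 through 204, and payload type 79 maps to 207, matching the RTCP packet types named in the table.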

Applications should always support PCMU (payload type 0); previously, DVI4 (payload type 5) was also recommended, but this was removed in 2013 by RFC 7007.

| Payload type (PT) | Name | Type | No. of channels | Clock rate (Hz)[1] | Frame size (ms) | Default packet interval (ms) | Description | References |
|---|---|---|---|---|---|---|---|---|
| 0 | PCMU | audio | 1 | 8000 | any | 20 | ITU-T G.711 PCM μ-Law audio 64 kbit/s | RFC 3551 |
| 1 | reserved (previously FS-1016 CELP) | audio | 1 | 8000 | | | reserved, previously FS-1016 CELP audio 4.8 kbit/s | RFC 3551, previously RFC 1890 |
| 2 | reserved (previously G721 or G726-32) | audio | 1 | 8000 | | | reserved, previously ITU-T G.721 ADPCM audio 32 kbit/s or ITU-T G.726 audio 32 kbit/s | RFC 3551, previously RFC 1890 |
| 3 | GSM | audio | 1 | 8000 | 20 | 20 | European GSM Full Rate audio 13 kbit/s (GSM 06.10) | RFC 3551 |
| 4 | G723 | audio | 1 | 8000 | 30 | 30 | ITU-T G.723.1 audio | RFC 3551 |
| 5 | DVI4 | audio | 1 | 8000 | any | 20 | IMA ADPCM audio 32 kbit/s | RFC 3551 |
| 6 | DVI4 | audio | 1 | 16000 | any | 20 | IMA ADPCM audio 64 kbit/s | RFC 3551 |
| 7 | LPC | audio | 1 | 8000 | any | 20 | Experimental Linear Predictive Coding audio 5.6 kbit/s | RFC 3551 |
| 8 | PCMA | audio | 1 | 8000 | any | 20 | ITU-T G.711 PCM A-Law audio 64 kbit/s | RFC 3551 |
| 9 | G722 | audio | 1 | 8000 | any | 20 | ITU-T G.722 audio 64 kbit/s | RFC 3551, Page 14 |
| 10 | L16 | audio | 2 | 44100 | any | 20 | Linear PCM 16-bit stereo audio 1411.2 kbit/s,[2][3][4] uncompressed | RFC 3551, Page 27 |
| 11 | L16 | audio | 1 | 44100 | any | 20 | Linear PCM 16-bit audio 705.6 kbit/s, uncompressed | RFC 3551, Page 27 |
| 12 | QCELP | audio | 1 | 8000 | 20 | 20 | Qualcomm Code Excited Linear Prediction | RFC 2658, RFC 3551 |
| 13 | CN | audio | 1 | 8000 | | | Comfort noise. Payload type used with audio codecs that do not support comfort noise as part of the codec itself, such as G.711, G.722.1, G.722, G.726, G.727, G.728, GSM 06.10, Siren, and RTAudio. | RFC 3389 |
| 14 | MPA | audio | 1, 2 | 90000 | 8–72 | | MPEG-1 or MPEG-2 audio only | RFC 3551, RFC 2250 |
| 15 | G728 | audio | 1 | 8000 | 2.5 | 20 | ITU-T G.728 audio 16 kbit/s | RFC 3551 |
| 16 | DVI4 | audio | 1 | 11025 | any | 20 | IMA ADPCM audio 44.1 kbit/s | RFC 3551 |
| 17 | DVI4 | audio | 1 | 22050 | any | 20 | IMA ADPCM audio 88.2 kbit/s | RFC 3551 |
| 18 | G729 | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 and G.729a audio 8 kbit/s; Annex B is implied unless the annexb=no parameter is used | RFC 3551, Page 20; RFC 3555, Page 15 |
| 19 | reserved (previously CN) | audio | | | | | reserved, previously comfort noise | RFC 3551 |
| 25 | CELLB | video | | 90000 | | | Sun CellB video[5] | RFC 2029 |
| 26 | JPEG | video | | 90000 | | | JPEG video | RFC 2435 |
| 28 | nv | video | | 90000 | | | Xerox PARC's Network Video (nv)[6][7] | RFC 3551, Page 32 |
| 31 | H261 | video | | 90000 | | | ITU-T H.261 video | RFC 4587 |
| 32 | MPV | video | | 90000 | | | MPEG-1 and MPEG-2 video | RFC 2250 |
| 33 | MP2T | audio/video | | 90000 | | | MPEG-2 transport stream | RFC 2250 |
| 34 | H263 | video | | 90000 | | | H.263 video, first version (1996) | RFC 3551, RFC 2190 |
| 72–76 | reserved | | | | | | reserved because RTCP packet types 200–204 would otherwise be indistinguishable from RTP payload types 72–76 with the marker bit set | RFC 3550, RFC 3551 |
| 77–95 | unassigned | | | | | | note that RTCP packet type 207 (XR, Extended Reports) would be indistinguishable from RTP payload type 79 with the marker bit set | RFC 3551, RFC 3611 |
| dynamic | H263-1998 | video | | 90000 | | | H.263 video, second version (1998) | RFC 3551, RFC 4629, RFC 2190 |
| dynamic | H263-2000 | video | | 90000 | | | H.263 video, third version (2000) | RFC 4629 |
| dynamic (or profile) | H264 AVC | video | | 90000 | | | H.264 video (MPEG-4 Part 10) | RFC 6184, previously RFC 3984 |
| dynamic (or profile) | H264 SVC | video | | 90000 | | | H.264 video | RFC 6190 |
| dynamic (or profile) | H265 | video | | 90000 | | | H.265 video (HEVC) | RFC 7798 |
| dynamic (or profile) | theora | video | | 90000 | | | Theora video | draft-barbato-avt-rtp-theora |
| dynamic | iLBC | audio | 1 | 8000 | 20, 30 | 20, 30 | Internet low Bitrate Codec 13.33 or 15.2 kbit/s | RFC 3952 |
| dynamic | PCMA-WB | audio | 1 | 16000 | 5 | | ITU-T G.711.1 A-law | RFC 5391 |
| dynamic | PCMU-WB | audio | 1 | 16000 | 5 | | ITU-T G.711.1 μ-law | RFC 5391 |
| dynamic | G718 | audio | | 32000 (placeholder) | 20 | | ITU-T G.718 | draft-ietf-payload-rtp-g718 |
| dynamic | G719 | audio | (various) | 48000 | 20 | | ITU-T G.719 | RFC 5404 |
| dynamic | G7221 | audio | | 16000, 32000 | 20 | | ITU-T G.722.1 and G.722.1 Annex C | RFC 5577 |
| dynamic | G726-16 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 16 kbit/s | RFC 3551 |
| dynamic | G726-24 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 24 kbit/s | RFC 3551 |
| dynamic | G726-32 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 32 kbit/s | RFC 3551 |
| dynamic | G726-40 | audio | 1 | 8000 | any | 20 | ITU-T G.726 audio 40 kbit/s | RFC 3551 |
| dynamic | G729D | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 Annex D | RFC 3551 |
| dynamic | G729E | audio | 1 | 8000 | 10 | 20 | ITU-T G.729 Annex E | RFC 3551 |
| dynamic | G7291 | audio | | 16000 | 20 | | ITU-T G.729.1 | RFC 4749 |
| dynamic | GSM-EFR | audio | 1 | 8000 | 20 | 20 | ETSI GSM-EFR (GSM 06.60) | RFC 3551 |
| dynamic | GSM-HR-08 | audio | 1 | 8000 | 20 | | ETSI GSM-HR (GSM 06.20) | RFC 5993 |
| dynamic (or profile) | AMR | audio | (various) | 8000 | 20 | | Adaptive Multi-Rate audio | RFC 4867 |
| dynamic (or profile) | AMR-WB | audio | (various) | 16000 | 20 | | Adaptive Multi-Rate Wideband audio (ITU-T G.722.2) | RFC 4867 |
| dynamic (or profile) | AMR-WB+ | audio | 1, 2 or omit | 72000 | 13.3–40 | | Extended Adaptive Multi Rate – WideBand audio | RFC 4352 |
| dynamic (or profile) | vorbis | audio | (various) | (various) | | | Vorbis audio | RFC 5215 |
| dynamic (or profile) | opus | audio | 1, 2 | 48000 | 2.5–60 | 20 | Opus audio | RFC 7587 |
| dynamic (or profile) | speex | audio | 1 | 8000, 16000, 32000 | 20 | | Speex audio | RFC 5574 |
| dynamic | mpa-robust | audio | 1, 2 | 90000 | 24–72 | | Loss-Tolerant MP3 audio | RFC 5219 (previously RFC 3119) |
| dynamic (or profile) | MP4A-LATM | audio | | 90000 or others | | | MPEG-4 Audio (includes AAC) | RFC 6416 (previously RFC 3016) |
| dynamic (or profile) | MP4V-ES | video | | 90000 or others | | | MPEG-4 Visual | RFC 6416 (previously RFC 3016) |
| dynamic (or profile) | mpeg4-generic | audio/video | | 90000 or other | | | MPEG-4 Elementary Streams | RFC 3640 |
| dynamic | VP8 | video | | 90000 | | | VP8 video | RFC 7741 |
| dynamic | VP9 | video | | 90000 | | | VP9 video | draft-ietf-payload-vp9 |
| dynamic | AV1 | video | | 90000 | | | AV1 video | av1-rtp-spec |
| dynamic | L8 | audio | (various) | (various) | any | 20 | Linear PCM 8-bit audio with 128 offset | RFC 3551 Section 4.5.10 and Table 5 |
| dynamic | DAT12 | audio | (various) | (various) | any | 20 (by analogy with L16) | IEC 61119 12-bit nonlinear audio | RFC 3190 Section 3 |
| dynamic | L16 | audio | (various) | (various) | any | 20 | Linear PCM 16-bit audio | RFC 3551 Section 4.5.11, RFC 2586 |
| dynamic | L20 | audio | (various) | (various) | any | 20 (by analogy with L16) | Linear PCM 20-bit audio | RFC 3190 Section 4 |
| dynamic | L24 | audio | (various) | (various) | any | 20 (by analogy with L16) | Linear PCM 24-bit audio | RFC 3190 Section 4 |
| dynamic | raw | video | | 90000 | | | Uncompressed video | RFC 4175 |
| dynamic | ac3 | audio | (various) | 32000, 44100, 48000 | | | Dolby AC-3 audio | RFC 4184 |
| dynamic | eac3 | audio | (various) | 32000, 44100, 48000 | | | Enhanced AC-3 audio | RFC 4598 |
| dynamic | t140 | text | | 1000 | | | Text over IP | RFC 4103 |
| dynamic | EVRC, EVRC0, EVRC1 | audio | | 8000 | | | EVRC audio | RFC 4788 |
| dynamic | EVRCB, EVRCB0, EVRCB1 | audio | | 8000 | | | EVRC-B audio | RFC 4788 |
| dynamic | EVRCWB, EVRCWB0, EVRCWB1 | audio | | 16000 | | | EVRC-WB audio | RFC 5188 |
| dynamic | jpeg2000 | video | | 90000 | | | JPEG 2000 video | RFC 5371 |
| dynamic | UEMCLIP | audio | | 8000, 16000 | | | UEMCLIP audio | RFC 5686 |
| dynamic | ATRAC3 | audio | | 44100 | | | ATRAC3 audio | RFC 5584 |
| dynamic | ATRAC-X | audio | | 44100, 48000 | | | ATRAC3+ audio | RFC 5584 |
| dynamic | ATRAC-ADVANCED-LOSSLESS | audio | | (various) | | | ATRAC Advanced Lossless audio | RFC 5584 |
| dynamic | DV | video | | 90000 | | | DV video | RFC 6469 (previously RFC 3189) |
| dynamic | BT656 | video | | | | | ITU-R BT.656 video | RFC 3555 |
| dynamic | BMPEG | video | | | | | Bundled MPEG-2 video | RFC 2343 |
| dynamic | SMPTE292M | video | | | | | SMPTE 292M video | RFC 3497 |
| dynamic | RED | audio | | | | | Redundant Audio Data | RFC 2198 |
| dynamic | VDVI | audio | | | | | Variable-rate DVI4 audio | RFC 3551 |
| dynamic | MP1S | video | | | | | MPEG-1 Systems Streams video | RFC 2250 |
| dynamic | MP2P | video | | | | | MPEG-2 Program Streams video | RFC 2250 |
| dynamic | tone | audio | | 8000 (default) | | | tone | RFC 4733 |
| dynamic | telephone-event | audio | | 8000 (default) | | | DTMF tone | RFC 4733 |
| dynamic | aptx | audio | 2–6 | (equal to sampling rate) | 4000 ÷ sample rate | 4[8] | aptX audio | RFC 7310 |
| dynamic | jxsv | video | | 90000 | | | JPEG XS video | RFC 9134 |
| dynamic | scip | audio/video | | 8000 or 90000 | | | SCIP | RFC 9607 |
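
The clock rate and default packet interval columns above combine into a simple piece of arithmetic: the RTP timestamp advances by clock rate × packet interval between consecutive packets. The sketch below works that out with exact fractions; the function name timestamp_increment is an assumption made for this illustration, and the sample values are taken from the table.

```python
from fractions import Fraction


def timestamp_increment(clock_rate_hz: int, interval_ms) -> Fraction:
    """RTP timestamp ticks covered by one packet of the given duration.

    interval_ms may be an int, a Fraction, or a decimal string such as "2.5".
    """
    return Fraction(clock_rate_hz) * Fraction(interval_ms) / 1000


if __name__ == "__main__":
    print(timestamp_increment(8000, 20))    # PCMU at its default 20 ms interval
    print(timestamp_increment(48000, 20))   # opus at its default 20 ms interval
    print(timestamp_increment(90000, Fraction(1000, 30)))  # one video frame at 30 frames/s
```

For PCMU this gives 160 ticks per packet, and for a 90000 Hz video clock at 30 frames per second it gives 3000 ticks per frame.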

Text messaging payload

MIDI payload

Notes and References

  1. The "clock rate" is the rate at which the timestamp in the RTP header is incremented, which need not be the same as the codec's sampling rate. For instance, video codecs typically use a clock rate of 90000 so their frames can be more precisely aligned with the RTCP NTP timestamp, even though video sampling rates are typically in the range of 1 - 60 samples per second.
  2. RFC 2586, The Audio/L16 MIME content type. May 1999. Retrieved 2010-03-16.
  3. RFC 3108, Conventions for the use of the Session Description Protocol (SDP) for ATM Bearer Connections. May 2001. Retrieved 2010-03-16.
  4. RFC 4856, Media Type Registration of Payload Formats in the RTP Profile for Audio and Video Conferences: Registration of Media Type audio/L16. March 2007. Retrieved 2010-03-16.
  5. XIL Programmer's Guide. https://docs.oracle.com/cd/E19504-01/802-5863/802-5863.pdf
  6. nv - network video, on Henning Schulzrinne's website. https://www.cs.columbia.edu/~hgs/rtp/nv.html
  7. Ron Frederick's GitHub repository with the nv source code. https://github.com/ronf/nv
  8. For aptX, the packetization interval must be rounded down to the nearest interval that can contain an integer number of samples. At sampling rates of 11025, 22050, or 44100 Hz, a packetization interval of 4 ms is therefore rounded down to about 3.99 ms.
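
The rounding described in note 8 can be checked numerically. The sketch below assumes that "rounded down" means taking the floor of the number of samples that fit in the requested interval and converting that count back to milliseconds; RFC 7310 remains the normative source for the exact rule, and the function name aptx_packet_interval_ms is hypothetical.

```python
def aptx_packet_interval_ms(sample_rate_hz: int, requested_ms: int = 4) -> float:
    """Largest interval <= requested_ms that holds a whole number of samples."""
    # Integer division floors the sample count without floating-point error.
    samples = (sample_rate_hz * requested_ms) // 1000
    return samples * 1000 / sample_rate_hz


if __name__ == "__main__":
    for rate in (11025, 22050, 44100, 48000):
        print(rate, round(aptx_packet_interval_ms(rate), 2))
```

At 44100 Hz, 4 ms would span 176.4 samples, so the interval shrinks to 176 samples, or roughly 3.99 ms; at 48000 Hz exactly 192 samples fit and the interval stays at 4 ms.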