IPTV technology delivers video streams over UDP, a fast but unreliable, connection-less protocol that guarantees neither delivery nor retransmission of missing packets. We either have to accept low video quality on some networks or add another layer on top of the basic protocol that lets us verify completeness of delivery. For this purpose we use the RTP (packet ordering, completeness detection) and RTCP (retransmission requests) protocols.
In this post I'm going to show how to use Linux command line tools effectively to analyse client-server cooperation regarding retransmissions, using a complicated real-world example. Basic RTP metadata collected by tcpdump is used as the input.
A real case that came in for analysis: one of the content delivery servers cannot deliver retransmissions properly for HD streams (we see video decoding errors on the client), despite the fact that everything is configured properly and should work.
As input I collected tcpdump output of RTP headers for a streaming session with simulated data loss (some percentage of packets is dropped):
$ tcpdump -qpnli eth0 -T rtp
(...)
18:42:16.178074 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17825 3991516661
18:42:16.179670 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17826 3991516802
18:42:16.181178 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17827 3991516934
18:42:16.182627 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17828 3991517065
18:42:16.184174 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17829 3991517207
18:42:16.185628 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17830 3991517338
18:42:16.187169 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17831 3991517480
18:42:16.188619 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17832 3991517611
18:42:16.190225 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17833 3991517753
18:42:16.191690 IP 213.75.112.54.10000 > 10.215.38.151.3544: udp/rtp 1316 c96 17834 3991517884
(...)
You can observe the following fields in the tcpdump output:
- current time
- protocol type (IP)
- source IP address with port
- destination IP address with port
- packet type (udp/rtp)
- packet length in bytes
- RTP payload type (c96; 96 is a dynamic payload type)
- RTP sequence number (17825 .. 17834)
- RTP timestamp value (3991516661 .. 3991517884)
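Before running any full analysis it's worth eyeballing the deltas between consecutive packets. A minimal one-liner sketch (assuming the capture above was saved to a hypothetical tcpdump.txt):

```shell
# Print per-packet sequence number and timestamp deltas for our stream;
# a healthy stream shows dseq: 1 and a roughly constant dts
awk '/> 10.215.38.151.3544/ {
    if (p) print $1, "dseq:", $9 - p, "dts:", $10 - t
    p = $9; t = $10
}' tcpdump.txt
```

A dseq jump greater than 1 means lost packets; a negative dseq means the packet is a retransmission arriving after newer packets.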
Based on automated analysis of the sequence numbers and timestamps we can easily tell retransmission packets apart from regular ones and diagnose what might be wrong with the whole process. Let's analyse the differences between consecutive sequence number and timestamp fields:
$ zcat tcpdump.txt.gz | sh lost-repair-analysis.sh | head
18:42:12.673207 packets loss start
18:42:12.788811 packets lost: 75 range: 15521 .. 15595
18:42:12.919069 repair burst started
18:42:13.030555 repair burst finished, packets: 32 range: 15532 .. 15594
18:42:14.836849 packets loss start
18:42:14.939092 packets lost: 66 range: 16944 .. 17009
18:42:15.104764 repair burst started
18:42:15.177257 repair burst finished, packets: 42 range: 16968 .. 17009
18:42:16.985232 packets loss start
18:42:17.087424 packets lost: 67 range: 18356 .. 18422
As you can see, the script detected a loss of 75 packets, but the repair delivered only the last 32 of them. The repair problem was caused by the server doing the retransmissions: its retransmission buffer was too small to hold the whole requested range.
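The per-burst numbers can also be aggregated over the whole session to quantify how bad the problem is. A minimal sketch that post-processes the script's output (field positions assumed from the output format shown above):

```shell
# Summarise the analysis: total packets lost vs. total packets repaired
zcat tcpdump.txt.gz | sh lost-repair-analysis.sh | awk '
/packets lost:/         { lost += $4 }      # "... packets lost: N range: ..."
/repair burst finished/ { repaired += $6 }  # "... packets: N range: ..."
END { printf "lost: %d repaired: %d coverage: %.0f%%\n",
             lost, repaired, 100 * repaired / lost }'
```

On the sample output above this reports a repair coverage well below 100%, which points the investigation at the server side rather than at the client's retransmission requests.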
The script source code:
#!/bin/sh
sort | uniq | awk '
/> 10.215.38.151.3544/ {
    dp = $9 - p    # sequence number delta
    dt = $10 - t   # timestamp delta
    if (dp > 3 && p) {
        # Gap in sequence numbers: packet loss
        print last_regular_time, "packets loss start"
        print $1, "packets lost:", dp - 1, "range:", p + 1, "..", $9 - 1
        p = $9
        t = $10
        lost = dp - 1
    } else if (dp < 0) {
        # Sequence number went backwards: one of the repairs
        if (!repair) {
            print $1, "repair burst started"
            first_repair = $9
            first_repair_time = $1
        }
        repair++
        last_repair = $9
        normal = 0
    } else {
        # Regular packet
        p = $9
        t = $10
        normal++
        if (normal > 4 && repair) {
            print $1, "repair burst finished, packets:", repair, \
                "range:", first_repair, "..", last_repair
            repair = 0
        }
        last_regular_time = $1
    }
}'
To analyse deployment problems carefully you don't need hardware protocol monitors. A simple packet dump plus careful automated offline analysis is sufficient in most cases.