My current customer develops embedded devices used by many end users in the Netherlands. To reduce server load, the devices download data over multicast: every device registers itself on a multicast channel using IGMP and listens for UDP packets. With no connections to manage, the overhead is lower.
However, some data (or some requests) cannot be downloaded from multicast channels, so HTTP/HTTPS must be used to interact with the server. As the number of devices is very high, special measures have been taken to avoid overloading the backend servers (randomised delays, client software optimization).
Consequently, a small bug in the client software that triggers more events than usual can be very dangerous to the stability of the whole system: multiplied across thousands of devices, it becomes a perfect DDoS that may kill the back-end. To catch such errant behaviour as early as possible, I’ve developed a daily report that monitors server usage at my customer’s office.
First of all, we need to locate the most “interesting” device by IP address in the logs. The script lists the 20 busiest IP and URL combinations by request count and saves the IP with the most requests for the next step:
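# Count requests per (IP, URL) pair and per IP, then print the 20 busiest pairs
ssh "$server" "cat /path-to-logs/localhost_access_log.$date.log" | awk '
{
    t[$1 " " $7]++   # requests per client IP ($1) and URL ($7)
    ip[$1]++         # total requests per client IP
}
END {
    for (a in t) { print t[a], a }
    # Find the IP with the most requests and save it for the next step
    max = 0
    max_ip = ""
    for (a in ip) { if (ip[a] > max) { max = ip[a]; max_ip = a } }
    print max_ip > "/tmp/max_ip"
}
' | sort -n | tail -20
IP="$(cat /tmp/max_ip)"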
Then the selected IP is examined hour by hour to locate patterns in its behavior:
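A minimal sketch of such an hourly breakdown is given here as an illustration, not as the script actually used in the report. It assumes the access log is in common log format, so that field 1 is the client IP and field 4 looks like "[10/Oct/2023:13:55:36". The function name hourly_counts is hypothetical.

```shell
# Hypothetical helper: print "HOUR COUNT" for every hour in which the given
# IP made requests. Reads access log lines from stdin.
# Assumption: common log format, where $1 is the client IP and $4 is
# "[dd/Mon/yyyy:HH:MM:SS".
hourly_counts() {
    awk -v ip="$1" '
    $1 == ip {
        split($4, ts, ":")   # ts[2] is the two-digit hour
        hits[ts[2]]++
    }
    END {
        for (h in hits) { print h, hits[h] }
    }
    ' | sort -n
}

# Possible usage with the variables from the previous step:
#   ssh "$server" "cat /path-to-logs/localhost_access_log.$date.log" | hourly_counts "$IP"
```

A device that behaves normally should show a fairly flat or randomised distribution, while a sudden jump in one hour or a steady high count across all hours points to a client-side problem.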


