These notes summarize an analysis of data collected from 19 data centers, which support a wide range of applications such as search, streaming, and MapReduce. Two sets of data were collected: (1) SNMP data from every network device, polled at 5-minute intervals over 10 days; and (2) tcpdump packet traces from 5 switches in one of the data centers.
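
As a rough illustration of how 5-minute SNMP polls translate into link utilization, here is a minimal sketch. It assumes the polled values are interface octet counters (for example `ifHCInOctets`) that may wrap around; the notes above do not say which counters the paper used, and the function name and parameters are hypothetical.

```python
def link_utilization(prev_octets, curr_octets, interval_s, link_bps, counter_bits=64):
    """Estimate average link utilization between two SNMP polls.

    Assumption: prev_octets and curr_octets are successive readings of a
    monotonically increasing octet counter that wraps at 2**counter_bits.
    """
    # Modular subtraction handles a single counter wrap between polls.
    delta_octets = (curr_octets - prev_octets) % (2 ** counter_bits)
    # Convert octets to bits and divide by the link's capacity over the interval.
    return delta_octets * 8 / (interval_s * link_bps)


if __name__ == "__main__":
    # 37.5 GB transferred in 300 s on a 10 Gbit/s link.
    print(link_utilization(0, 37_500_000_000, 300, 10_000_000_000))
```

Because this is an average over the whole polling interval, short bursts within the 5 minutes are invisible to it, which is why the packet traces are needed for the fine-grained findings.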

The data center architecture is tiered (2-3 layers), with the core busier than the edge. Findings according to the paper:

  • At any moment, fewer than 60% of core and edge links, and 73% of aggregation links, are in use
  • Utilization measured from the 5-min SNMP data is higher in the core, due to multiplexing
    • The DCN under examination has 4x as many aggregation links as core links, so it is not a fat-tree
  • Links are unlikely to stay idle for long. Over the 10-day period, fewer than 5% of links were idle for more than 2 hours cumulatively
  • Packet sizes in the network are bimodal, with peaks at around 40B (TCP ACK?) and 1470B (near the 1500B Ethernet MTU)
  • Core links see the least loss and edge links the most: traffic is bursty at the edge and aggregation layers
  • A large number of links are unused in any 5-min period, but the exact set of idle links is constantly changing
  • Viewed at a microscopic scale, traffic shows an ON-OFF pattern
  • The lengths of ON periods, the lengths of OFF periods, and the packet interarrival times within ON periods all follow a lognormal distribution
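
To make the ON-OFF finding concrete, the sketch below splits a packet trace into ON and OFF periods and estimates lognormal parameters for a set of samples. It is not the paper's procedure: the fixed `gap_threshold` that separates the two kinds of period is an assumption, and the function names are hypothetical. The lognormal fit is simply the mean and standard deviation of the log-transformed samples.

```python
import math
from statistics import fmean, pstdev


def split_on_off(timestamps, gap_threshold):
    """Split sorted packet arrival times into ON/OFF period lengths.

    Assumption: a gap longer than gap_threshold ends the current ON period
    and counts as an OFF period; shorter gaps are interarrival times
    within an ON period.
    """
    on_periods, off_periods, interarrivals = [], [], []
    start = prev = timestamps[0]
    for t in timestamps[1:]:
        gap = t - prev
        if gap > gap_threshold:
            on_periods.append(prev - start)  # close the current ON period
            off_periods.append(gap)          # the silence is an OFF period
            start = t
        else:
            interarrivals.append(gap)
        prev = t
    on_periods.append(prev - start)          # close the final ON period
    return on_periods, off_periods, interarrivals


def fit_lognormal(samples):
    """Return (mu, sigma) of log(samples); non-positive samples are skipped."""
    logs = [math.log(x) for x in samples if x > 0]
    return fmean(logs), pstdev(logs)


if __name__ == "__main__":
    trace = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]   # made-up arrival times in seconds
    on, off, gaps = split_on_off(trace, gap_threshold=1.0)
    print(on, off, gaps)
    print(fit_lognormal(off))
```

The estimated parameters alone do not show that the data is lognormal; a goodness-of-fit check against competing distributions would still be needed to support that claim.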

Bibliographic data

@inproceedings{
   title = "Understanding Data Center Traffic Characteristics",
   author = "Theophilus Benson and Ashok Anand and Aditya Akella and Ming Zhang",
   booktitle = "Proc. WREN'09",
   month = "Aug 21",
   year = "2009",
   address = "Barcelona, Spain",
}