Designing Networks for VoIP and Real-Time Communications
Many networks are designed for data traffic. Real-time traffic like voice and video adds new requirements for network design, operation, and troubleshooting.
Was your network designed for data traffic and then modified to handle real-time communications traffic? If so, you’re not alone. Most networks are designed primarily for data, and support for real-time traffic like voice, video, and collaboration is added after the data design is complete. Unfortunately, this approach can result in suboptimal operation of both data and real-time traffic because of the interaction between the two types of applications. This was one of the most common issues we had to troubleshoot for our clients. The early indicators were always vague, generic “quality” problems with VoIP and RTC video. What makes these issues even harder to deal with is that they tend to develop as the client’s network grows, which results in claims of “this used to work just fine.”
Data traffic typically uses TCP as its transport to achieve reliable delivery of the user data. While TCP includes congestion avoidance mechanisms, it will ramp up to use as much bandwidth as it can for large transfers. The slow-start mechanism helps avoid congestion by starting each transfer with a small transmit window rather than sending too much data in a short time. However, it doubles the transmit window every round-trip time until it detects loss or reaches the slow-start threshold, and the amount of data in flight is capped by the maximum that the receiver can handle. With round-trip times measured in a few milliseconds in most enterprises, the ramp-up happens quite quickly, which can cause problems for real-time traffic.
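To make the ramp-up concrete, here is a minimal sketch of the doubling behavior described above. It models only the doubling phase and ignores loss and congestion avoidance. The function name, the initial window of 10 segments, and the limit of 640 segments are illustrative assumptions, not measurements from any network.

```python
from typing import List


def slow_start_windows(initial_segments: int, limit_segments: int) -> List[int]:
    """Return the transmit window (in segments) for each round trip.

    The window doubles every round trip until it reaches the limit,
    which stands in for the receiver's maximum window (an assumption).
    """
    if initial_segments <= 0 or limit_segments <= 0:
        raise ValueError("window sizes must be positive")
    windows = []
    cwnd = initial_segments
    while cwnd < limit_segments:
        windows.append(cwnd)
        cwnd *= 2
    windows.append(limit_segments)  # the window is capped at the limit
    return windows


# Hypothetical values: 10-segment initial window, 640-segment limit, 2 ms RTT.
windows = slow_start_windows(10, 640)
round_trips_to_limit = len(windows) - 1
ramp_up_ms = round_trips_to_limit * 2
```

With these assumed values the window reaches its limit after six round trips, or about 12 ms at a 2 ms round-trip time, which is why a large transfer can fill a queue before a voice stream has sent its next packet.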
Add UC&C Traffic
The bulk of real-time traffic on enterprise networks is voice and video, perhaps with some highly interactive collaboration applications. There are two types of real-time traffic. The first type is streaming media, such as training videos and YouTube. Streaming media is typically delivered by TCP, and the application buffers data for playback, as evidenced by the “Buffering” message that some applications display while they wait for more data to be received. Because streaming media uses TCP, it will adapt to network congestion.
The second type of real-time traffic is interactive voice and video, such as a voice call or a video conference. Collaboration might include voice, video, and multi-author, interactive document editing. Interactive voice and video typically use UDP for transport. UDP does not retransmit, so any packet the network drops is lost to the application. Codec vendors have addressed minor packet loss by approximating the content of individual lost packets. Our ears and eyes also tend to ignore a few glitches due to packet loss. The problem occurs when the network is so congested that bursts of packets are dropped. UDP packets that are significantly delayed by a burst of big data packets are similar to dropped packets: they arrive at the destination application, phone, or video conferencing system too late for playback, so they look like packet loss. UDP packet loss can occur on high-bandwidth links as well as low-bandwidth links, although it occurs much more frequently on low-bandwidth infrastructure.
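The point that late packets count as lost packets can be sketched as follows. The function name, the 60 ms playout budget, and the sample delays are hypothetical values chosen for illustration. A real endpoint's jitter buffer is adaptive and more sophisticated than this fixed threshold.

```python
from typing import Optional, Sequence


def effective_loss(delays_ms: Sequence[Optional[float]],
                   playout_budget_ms: float) -> float:
    """Fraction of packets unusable for playback.

    A delay of None means the packet never arrived. A packet that
    arrives after the playout budget is too late to play, so it is
    counted exactly like a dropped packet.
    """
    if not delays_ms:
        raise ValueError("need at least one packet")
    lost = sum(1 for d in delays_ms if d is None)
    late = sum(1 for d in delays_ms
               if d is not None and d > playout_budget_ms)
    return (lost + late) / len(delays_ms)


# Hypothetical one-way delays in milliseconds for ten voice packets.
# One packet was dropped and two were stuck behind a data burst.
sample_delays = [20, 22, None, 150, 21, 19, 160, 23, 20, 21]
loss_fraction = effective_loss(sample_delays, playout_budget_ms=60)
```

In this sample the network dropped only one packet in ten, but the application experiences three in ten as lost.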
Controlling Traffic: QoS and Policy Routing
The tool we have for controlling UDP congestion loss is Quality of Service (QoS). There are three steps to QoS: classification, marking, and forwarding. All traffic is classified (prioritized), with voice getting the highest priority and video getting the next highest priority. TCP-based applications are typically given medium or low (best effort) priority. There is also a traffic class known as “scavenger” or “less than best effort,” because its traffic has lower priority than all other traffic. Once packets are classified, they are marked, typically by setting a DSCP value in the IP header. Finally, router and switch policy configurations ensure that high-priority traffic is selected and forwarded before lower-priority traffic.
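As a small illustration of the marking step, the sketch below shows how an application on a host could request a DSCP value on its own UDP packets using the standard socket API. The class-to-DSCP mapping shown (EF for voice, AF41 for video, CS1 for scavenger) is a commonly used convention and an assumption here, not a statement of this network's policy. The operating system may ignore the request, and network devices may re-mark the traffic.

```python
import socket

# Commonly used DSCP values (assumed mapping, not from the article).
DSCP_EF = 46    # Expedited Forwarding, often used for voice
DSCP_AF41 = 34  # Assured Forwarding 41, often used for interactive video
DSCP_CS1 = 8    # Class Selector 1, often used for scavenger traffic


def dscp_to_tos(dscp: int) -> int:
    """Convert a 6-bit DSCP value to the 8-bit TOS/DS byte.

    DSCP occupies the upper six bits of the byte, so shift left by two.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Open an IPv4 UDP socket that asks the OS to mark outgoing packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

A voice application would call open_marked_udp_socket(DSCP_EF) and send its RTP packets on the returned socket. Routers and switches usually still classify and mark at the network edge rather than trusting host markings.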
The configuration of QoS can be daunting if you haven’t done it before. There is configuration for each step: classification, marking, and forwarding. It is a good idea to specify bandwidth limits on each type of traffic so that there is bandwidth left for other traffic types. Ideally, QoS would be implemented across the network, so that the desired handling happens over all links. In some cases, implementing QoS on just one specific link can have very positive results.
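The interaction between priority forwarding and per-class bandwidth limits can be sketched with a toy scheduler. Everything here is a simplified assumption: the class names, packet sizes, caps, and the idea of a fixed byte budget per interval. Real routers and switches implement this in hardware queues with vendor-specific configuration.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def schedule_interval(queues: Dict[str, Deque[int]],
                      priority_order: List[str],
                      caps_bytes: Dict[str, int],
                      link_budget_bytes: int) -> List[Tuple[str, int]]:
    """Send packets for one interval.

    Classes are served in priority order. Each class may use at most
    its own cap, so a busy high-priority class cannot starve the rest.
    Queues hold packet sizes in bytes; the result lists what was sent.
    """
    sent = []
    remaining = link_budget_bytes
    for traffic_class in priority_order:
        allowance = min(caps_bytes[traffic_class], remaining)
        queue = queues[traffic_class]
        while queue and queue[0] <= allowance:
            size = queue.popleft()
            allowance -= size
            remaining -= size
            sent.append((traffic_class, size))
    return sent


# Hypothetical queues: packet sizes in bytes waiting in each class.
queues = {
    "voice": deque([200] * 5),
    "video": deque([1000] * 3),
    "data": deque([1500] * 10),
}
caps = {"voice": 600, "video": 2000, "data": 10000}
sent = schedule_interval(queues, ["voice", "video", "data"], caps, 5000)
```

In this run voice is served first but stops at its 600-byte cap, video follows up to its cap, and data still gets a share of the remaining budget instead of being starved.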
Policy routing is an alternative mechanism for handling real-time traffic. The idea is to route traffic over dedicated links or links that you know have good characteristics for the traffic type. This approach requires complex routing configurations to identify the real-time traffic and modify the next-hop address to direct it to a dedicated network path. Policy routing can also be applied to low-priority traffic to route it over low-cost, high-latency paths that are less acceptable for real-time traffic. We employed this tactic to establish priority routing for our integrated service customers’ VoIP RTP traffic from the CPE to our switches, essentially giving voice traffic an express lane through our wireless backhaul network.
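The core decision in policy routing, choosing a next hop from the traffic type instead of only the destination, can be sketched as a simple lookup. The DSCP values, the table, and the addresses (taken from the ranges reserved for documentation) are hypothetical. On a real router this logic lives in vendor-specific route maps or policies, not in application code.

```python
from typing import Dict

# Hypothetical next hops using documentation-only addresses.
DEFAULT_NEXT_HOP = "192.0.2.1"
POLICY_NEXT_HOPS: Dict[int, str] = {
    46: "198.51.100.1",  # EF (voice): dedicated low-latency path
    8: "203.0.113.1",    # CS1 (scavenger): low-cost, high-latency path
}


def select_next_hop(dscp: int) -> str:
    """Pick the next hop for a packet based on its DSCP marking.

    Traffic with no matching policy follows the normal routed path.
    """
    return POLICY_NEXT_HOPS.get(dscp, DEFAULT_NEXT_HOP)
```

Matching on the DSCP value is only one option. A policy could equally match on source address, port range, or ingress interface to identify the real-time traffic.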
Monitoring and Troubleshooting
Finally, the network must be monitored to identify when it isn’t functioning correctly or when traffic has changed in a way that the design didn’t anticipate. A new application’s traffic may need special handling, or the volume of an existing application’s traffic may be greater or less than the design assumed. Monitoring QoS queue drops, interface drops, and interface utilization provides useful information and is supported by many network monitoring applications.
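Utilization and drop figures are usually derived from two readings of interface counters taken some seconds apart. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the counters are assumed to be 64-bit, and how the readings are collected (SNMP, streaming telemetry, or a vendor API) is left out.

```python
COUNTER_MODULUS = 2 ** 64  # assumes 64-bit interface counters


def utilization_pct(octets_start: int, octets_end: int,
                    interval_s: float, link_bps: float) -> float:
    """Percent of link capacity used between two octet-counter readings."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # The modulo handles a counter that wrapped between the two readings.
    octets = (octets_end - octets_start) % COUNTER_MODULUS
    return 100.0 * octets * 8 / (interval_s * link_bps)


def drop_pct(dropped_packets: int, forwarded_packets: int) -> float:
    """Percent of packets dropped out of all packets offered to a queue."""
    offered = dropped_packets + forwarded_packets
    if offered == 0:
        return 0.0
    return 100.0 * dropped_packets / offered
```

For example, 37,500,000 octets in 60 seconds on a 10 Mbps link works out to 50 percent utilization. Averages over long intervals can hide the short bursts that hurt voice, so shorter polling intervals on critical links are worth considering.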
Also plan your troubleshooting tools and methodology for those times when something breaks. A problem may be caused by a configuration mistake, or it may be a device or link failure. It is important to be able to quickly identify and resolve problems. For ongoing monitoring and analysis of real-time traffic, I like to deploy application probes that generate synthetic traffic or place real voice/video calls, measuring the results and reporting when changes are detected. Cisco Voice Endpoints also have built-in mechanisms to report call quality metrics, which we collected for our business customers.
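One metric a synthetic probe might compute is interarrival jitter. The sketch below uses the running estimator defined for RTP in RFC 3550, chosen here as an assumption; it is not necessarily what any particular probe or endpoint reports. The function name and the sample timestamps are hypothetical.

```python
from typing import Sequence


def interarrival_jitter(send_times_ms: Sequence[float],
                        recv_times_ms: Sequence[float]) -> float:
    """Running jitter estimate in the style of RFC 3550.

    For each consecutive pair of packets, D is the change in transit
    time. The estimate moves 1/16 of the way toward |D| per packet.
    """
    if len(send_times_ms) != len(recv_times_ms):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        spacing_sent = send_times_ms[i] - send_times_ms[i - 1]
        spacing_received = recv_times_ms[i] - recv_times_ms[i - 1]
        d = spacing_received - spacing_sent
        jitter += (abs(d) - jitter) / 16
    return jitter


# Hypothetical probe: packets sent every 20 ms, third one delayed by 16 ms.
sent_at = [0, 20, 40, 60]
received_at = [50, 70, 106, 126]
jitter_ms = interarrival_jitter(sent_at, received_at)
```

A probe would track this value over time and raise an alert when it departs from its baseline, which matches the approach of reporting when changes are detected.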