To design a VOIP network properly, it is important to have an in-depth understanding of how VOIP networks work.  This lesson explains the challenges of integrating voice and data networks and suggests solutions for avoiding problems when designing VOIP networks for optimal voice quality.

IP Networking and Audio Clarity – Due to the inherent nature of IP networking, voice packets sent over IP are subject to transmission issues in the network such as echo, jitter, and delay, which need to be addressed using quality of service (QoS) mechanisms.

The clarity of the audio signal is of prime importance.  The listener must be able to recognize the speaker and sense the speaker's mood.  The following factors affect the clarity of voice:


Fidelity is the degree to which a system, or a portion of a system, accurately reproduces at its output the essential characteristics of the signal impressed upon its input.  The bandwidth of the transmission limits the total bandwidth of the spoken voice that can be carried.  Typically, the human voice requires a bandwidth ranging from 100 to 10,000 Hz.

Echo results from an electrical impedance mismatch on the transmission path.  Echo is always present, whether the telephone network is traditional or VOIP based, but it is not always audible to the human ear.  How noticeable an echo is depends on its amplitude (the loudness of the echo) and its delay (the time between the spoken voice and the echoed sound).  Echo can be controlled using echo suppressors or echo cancellers.

Jitter is the variation in the arrival time of coded speech packets at the far end of a VOIP network.  The differing arrival times of packets cause gaps in the re-creation and playback of the voice signal, and these gaps are undesirable.  Variable delay is introduced in the network when packets take different routes or encounter contention or congestion.  This variable delay can be resolved by using a dejitter buffer.  Figure 1 illustrates an example of jitter.

Figure 1: Jitter in VOIP communication networks

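To make the idea of jitter concrete, the following minimal sketch computes the change in transit time between consecutive packets. The function name and the timestamps are illustrative assumptions, not values taken from a real capture or from any Cisco tool.

```python
def interarrival_jitter(send_times_ms, arrival_times_ms):
    """Return the change in transit time between consecutive packets (ms).

    Hypothetical helper for illustration: transit time is arrival minus send,
    and jitter is how much that transit time differs from packet to packet.
    """
    transit = [a - s for s, a in zip(send_times_ms, arrival_times_ms)]
    return [abs(transit[i] - transit[i - 1]) for i in range(1, len(transit))]


# Assumed example: packets sent every 20 ms, arriving with uneven delay.
send = [0, 20, 40, 60]
arrive = [50, 75, 90, 130]
print(interarrival_jitter(send, arrive))  # [5, 5, 20]
```

A perfectly steady network would yield a list of zeros; the larger the values, the more buffering the receiver needs to play the audio smoothly.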

When a router receives an audio stream for VOIP, it must compensate for any jitter that has occurred.  To do so, it uses a playout delay buffer, also called a dejitter buffer.  The playout delay buffer holds the packets and then plays them out in a steady stream to the DSPs, which convert them back to an analog audio stream.
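The behavior of a playout delay buffer can be sketched as follows. This is a simplified fixed-size buffer written under stated assumptions: the function name, the timestamps, and the rule that late packets are discarded are illustrative, and real dejitter buffers on routers are typically adaptive and more sophisticated.

```python
def playout_schedule(send_times_ms, arrival_times_ms, buffer_ms):
    """Return the playout time of each packet, or None if it arrived too late.

    Assumption for illustration: the first packet is held for buffer_ms,
    and every later packet is played at the same fixed offset from its
    send time, which gives the DSP a steady stream.
    """
    offset = arrival_times_ms[0] - send_times_ms[0] + buffer_ms
    schedule = []
    for sent, arrived in zip(send_times_ms, arrival_times_ms):
        play_at = sent + offset
        # A packet that arrives after its playout slot cannot be played.
        schedule.append(play_at if arrived <= play_at else None)
    return schedule


send = [0, 20, 40, 60]
arrive = [50, 75, 90, 130]
print(playout_schedule(send, arrive, buffer_ms=10))  # [60, 80, 100, None]
print(playout_schedule(send, arrive, buffer_ms=20))  # [70, 90, 110, 130]
```

The two calls show the trade-off: a larger buffer absorbs more jitter and discards fewer packets, but it adds to the overall delay.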

Delay is the time between the spoken voice and the arrival of the electronically delivered voice at the destination.  Delay results from multiple factors such as propagation, coding, compression, serialization, and buffering.

Packet loss occurs when voice packets are dropped under network conditions such as an unstable connection, congestion, or excessive variable delay.  Lost voice packets cannot be recovered and can cause gaps in the conversation.  Figure 2 illustrates sources of delay.

Figure 2: Sources of delay in a VOIP network environment


A voice network design must account for all potential delays to keep overall network performance at an acceptable level.  Voice quality is a function of many factors, including the compression algorithm, errors and frame loss, echo cancellation, and delay.  Delay can be fixed or variable.  Fixed delay components include coding (the time needed to translate audio into a digital signal), packetization (the time required to put digital voice information into packets and to remove it from packets), serialization (the time needed to insert the bits onto a link), and propagation (the time needed for a packet to traverse a link).
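The fixed delay components listed above can be added up in a rough budget. The sketch below is illustrative only: the function names, the 200-byte frame, the 64 kbps link, the 1,000 km distance, the coding and packetization times, and the assumed signal speed of about 200,000 km/s are all assumptions chosen to keep the arithmetic simple.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time to clock a frame's bits onto a link, in milliseconds."""
    return frame_bytes * 8 * 1000 / link_bps


def propagation_delay_ms(distance_km, speed_km_per_s=200_000):
    """Time for a signal to traverse a link, in milliseconds.

    The default speed is an assumed approximation for a cable medium.
    """
    return distance_km * 1000 / speed_km_per_s


def fixed_delay_ms(coding_ms, packetization_ms, frame_bytes, link_bps, distance_km):
    """Sum of the fixed delay components for one hop (hypothetical helper)."""
    return (coding_ms + packetization_ms
            + serialization_delay_ms(frame_bytes, link_bps)
            + propagation_delay_ms(distance_km))


# Assumed example values, not measurements.
print(serialization_delay_ms(200, 64_000))        # 25.0
print(propagation_delay_ms(1_000))                # 5.0
print(fixed_delay_ms(5, 20, 200, 64_000, 1_000))  # 55.0
```

On slow links, serialization tends to dominate this budget, which is why the frame size and link speed matter when planning a WAN for voice.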

Variable delay arises from queuing delays in the trunk buffers located on the serial port connected to the WAN.  These buffers create variable delays, known as jitter.

The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) recommends network delay limits for voice applications in the G.114 standard.  Table 1 lists the three delay bands defined by the G.114 standard.

Table 1: Acceptable delay as per G.114 standard

Range (milliseconds)    Description
0 to 150                Acceptable for user applications
150 to 400              Acceptable, provided that administrators are aware of the transmission time and its impact on the quality of user applications
Above 400               Unacceptable for general network planning purposes
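The three bands in Table 1 can be expressed as a small lookup, which is handy when checking a computed delay budget. This is a sketch under assumptions: the function name and the return strings are hypothetical, and because the table ranges share the values 150 and 400, the choice to treat those boundary values as belonging to the lower band is an assumption as well.

```python
def g114_band(delay_ms):
    """Classify a delay in milliseconds against the Table 1 bands.

    Assumption: boundary values (150 and 400) fall in the lower band.
    """
    if delay_ms < 0:
        raise ValueError("delay cannot be negative")
    if delay_ms <= 150:
        return "acceptable"
    if delay_ms <= 400:
        return "acceptable if the impact is understood"
    return "unacceptable for general network planning"


for delay in (55, 150, 250, 450):
    print(delay, "ms:", g114_band(delay))
```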


Side tone is a deliberate design feature that lets speakers hear their own voice in the earpiece.  Without side tone, the speaker gets the impression that the phone is not working.

Background noise is low-volume audio that can be heard from the far end of the connection.  Certain bandwidth-saving technologies, such as voice activity detection (VAD), eliminate background noise.  With VAD, the audio path from the speaker to the listener is open while the audio path from the silent listener back to the speaker is closed.  As a result, the speaker hears nothing from the other end and may get the impression that the connection is broken.  VAD can be combined with comfort noise generation (CNG) to give the impression that the call is still connected.
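The interplay of VAD and CNG can be sketched with a simple energy threshold. This is a deliberately naive illustration: the function names, the mean-square energy measure, the threshold, and the sample values are all assumptions, and production VAD algorithms in codecs and DSPs are considerably more elaborate.

```python
def frame_energy(samples):
    """Mean-square energy of one audio frame (list of integer samples)."""
    return sum(s * s for s in samples) / len(samples)


def apply_vad(frames, threshold):
    """Sender side: suppress frames whose energy is below the threshold.

    A suppressed frame is represented by None, meaning nothing is sent.
    """
    return [f if frame_energy(f) >= threshold else None for f in frames]


def fill_comfort_noise(received, noise_frame):
    """Receiver side: play locally generated noise where nothing was sent."""
    return [f if f is not None else noise_frame for f in received]


# Assumed example: one loud frame, one near-silent frame, one loud frame.
frames = [[100, -100], [1, -1], [50, 60]]
sent = apply_vad(frames, threshold=100)
print(sent)                              # [[100, -100], None, [50, 60]]
print(fill_comfort_noise(sent, [2, -2]))  # [[100, -100], [2, -2], [50, 60]]
```

Without the second step, the suppressed frame would play as dead silence, which is what makes a caller think the line has dropped.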

Lost packets cannot be recovered unless the endpoints request retransmission.  Voice packets can be lost under several conditions: the network is unstable, the network is congested, or there is too much variable delay.  Packet loss causes voice clipping and skipping.  Cisco DSPs correct 20 ms to 50 ms of lost voice by using packet loss concealment (PLC) algorithms.  PLC analyzes the missing packets and generates a reasonable replacement packet to improve the voice quality.  By default, Cisco VOIP technologies use 20 ms samples of voice payload per VOIP packet.
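One very simple way to conceal a lost packet is to replay the last packet that arrived. The sketch below illustrates that idea only; it is not the PLC algorithm used by Cisco DSPs. The function name, the use of None to mark a lost frame, and the limit of two repeats (roughly 40 ms if each frame carries 20 ms of voice) are assumptions.

```python
def conceal_losses(frames, max_repeat=2, silence=b""):
    """Replace lost frames (None) with the last good frame.

    After max_repeat consecutive losses, fall back to silence, because
    repeating the same audio for too long sounds unnatural.
    """
    output = []
    last_good = None
    repeats = 0
    for frame in frames:
        if frame is not None:
            last_good = frame
            repeats = 0
            output.append(frame)
        elif last_good is not None and repeats < max_repeat:
            repeats += 1
            output.append(last_good)
        else:
            output.append(silence)
    return output


# Assumed example: one isolated loss, then a burst of three losses.
stream = [b"A", None, b"B", None, None, None, b"C"]
print(conceal_losses(stream))
# [b'A', b'A', b'B', b'B', b'B', b'', b'C']
```

The isolated loss is hidden completely, while the burst exceeds the repeat limit and produces a gap, which matches the point that only short losses can be concealed.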

This CCNA Voice lesson reviewed the environmental concerns to consider when designing a converged voice and data VOIP network.  We learned about the environmental factors that affect the quality of voice transmission in VOIP networks and the means used to handle them.