Whenever we hear the phrase ‘big data’ we usually have connotations with texts or numbers, yet data in the form of audio or video has never been so widespread. One requirement for real-time communication is to ensure data availability for audio files. Various adaptation models are used to adjust the processing of received audio packets based on the delay conditions within the network. Current models perform delay adaptation by mostly using a buffer to store received packets. Therefore, there is an increasing need for understanding the use of buffer algorithms. Yet, before proceeding with them, let’s have a look at the definition of a buffer.
A buffer is used to compensate for varying network delays and varying playout request. In an ideal setting, a packet always arrives at the time for the corresponding playout event. In a real setting, this is not the case due to varying network delays as seen in Figure 1. A buffer is needed to store a certain amount of incoming packets and delay their playout such that there is always data available for playout. In addition to this, the rate of incoming audio packets and playout events might be slightly different due to clock drift.
A mechanism is needed to adjust the buffer level by generating or removing data. Optimally this algorithm has to keep the amount of data in the buffer as low as possible and to keep the delay as low as possible, but also to minimize the probability of a loss of packets due to late arrival. A buffer level adjustment algorithm is based on the following three components:
1. Inter-Arrival Time Prediction:
The algorithm implements the delay adaptation by estimating a time-dependent histogram of audio packet inter-arrival times over past inter-arrival times, including the current inter-arrival time. An exponentially decaying weighting window is used to lower the influence of further away samples. The inter-arrival times are represented in terms of a number of packet lengths, as an integer:
- An inter-arrival time of 1, therefore, corresponds to the packet’s length up to, but not including, two times the packet’s length.
- An inter-arrival time of 0 means the packet arrived earlier than expected and an inter-arrival time greater or equal than 2 means the packet arrived at least one packet length later than expected.
2. Target Buffer Level Decision:
Based on the histogram of inter-arrival times, the target buffer level is chosen, such that at least 95% of packets arrive on time. This is done by calculating the accumulative probability of receiving a packet later than a certain inter-arrival time and choosing the target buffer level, such that the probability is at most 5%. Additionally, peak detection is used to account for peaks that occur too infrequent to have an impact on the histogram but occur regularly. If such peaks are detected, a higher buffer level is chosen.
3. Buffer Level Adjustment
The actual buffer level is driven by the received packets and the playout events. A received packet increases the actual buffer level while a playout event decreases the buffer level. To adjust the actual buffer level towards the target buffer level, additional modifications of the buffer level can be done during the playout events. If the buffer level is below the target buffer level, a so-called preemptive expansion is done by expanding the audio in the buffer by at most 100ms per second. If the buffer level is above the target buffer level, acceleration of the audio is applied by shrinking the audio in the buffer by at most 100ms per second.
This is a very basic explanation of the behavior of the buffer level adjustment algorithm. The actual implementation can be more complex given the amount of late-arriving packets and missing packets.
Design of a Buffer Algorithm
In order to design a buffer algorithm, the following aspects should be taken into account:
- Database preparation, including data download, cleanup and preprocessing
- Implementation of an oracle algorithm
- Implementation of a machine learning algorithm
- Evaluation of the results
Let’s have a look at each of these aspects in more detail:
Database creation, cleanup, preprocessing and visualization to allow data driven training:
To train a machine learning algorithm, training and evaluation data are needed. A fixed database is necessary to allow reproducible and comparable experiments on the same data set with different setups and algorithms. Therefore, downloading of log files from a database are suggested. In order to process these log files, data parsing functions will also need to be implemented. To allow easier analysis and improve understanding of these log file, data visualization would be necessary along with transformation functions to calculate other features.
Implement an oracle algorithm to estimate an optimal buffer level as a baseline measure to evaluate and optimize the buffer level control algorithm:
The oracle algorithm to determine an optimal buffer level is based on finding an optimal sequence of actions to control the buffer level, based on a complete sequence of events extracted from a log file. This can be achieved by searching the shortest path in a graph whose nodes represent the buffer levels for a given event and whose edges represent buffer level changes, based on events and actions in order to arrive at the desired buffer level. The cost of transitions between buffer levels is calculated as the sum of costs resulting from the next buffer level, as determined by an event and an additional action plus the cost for performing the action.
A machine learning framework for automated training, evaluation and model export to reduce the complexity of implementing new models:
A framework should enable to more easily use the provided setup and workflow to implement own models and experiments. It also provides helper functions to simplify model handling, for example, model saving and inference. It also provides helper functions to execute training remotely on a cluster and optimize parameters.
Implement a machine learning-based algorithm to control the buffer level to reduce overall delay and audio distortion:
The goal of a machine learning-based algorithm is to optimize the buffer size. The size has to be adjusted, such that the end-to-end delay, or the delay induced by the buffer, is as low as possible, yet with the constraint of losing as little packets as possible and modifying the as least as possible. Various algorithms and neural network structures, especially feedforward and recurrent neural networks can be investigated to find an optimal implementation. The oracle buffer level and the estimated buffer control actions can be used as training targets.
Evaluation during implementation to monitoring progress and of the new buffer level control algorithm to evaluate the resulting delay and audio quality:
Evaluation can be done by comparing the resulting buffer level with the oracle buffer level, distortions induced by the actions to modify the buffer level and total delay.
A machine learning framework to apply data-driven machine learning to event logs can be implemented while factoring out commonly used code constructs for running an experiment, setting up and data reading.
Such a framework can be used to develop a machine learning system from scratch for the delay adaptation. Its components for data pre-processing, feature extraction or target calculation are of course not without any flaws, yet they would still provide a useful perceptual understanding for those interested in designing buffer algorithms.