Asterisk was written from the ground up with a set of assumptions about how media is represented and negotiated. These assumptions have worked to get Asterisk where it is today, but unfortunately they have also put in place a set of limitations that must be overcome before Asterisk can meet the demands of the future. While these limitations are built into the foundation of Asterisk's design, the necessary changes required to lift these constraints can be made. This document outlines the required changes and breaks them up into a layered approach. Each section addresses a specific problem with Asterisk's current media architecture and proposes a solution. As the sections progress each new section uses the foundation outlined in the previous sections to address an increasingly complex set of problems. By attacking this issue from the foundation up it is possible to generate a complete solution that exceeds the current development constraints opening Asterisk up to an entire new set of possibilities.
2. Table of Contents
- 1. Introduction
- 2. Table of Contents
- 3. Project Requirements
- 4. Representation of Media Formats
- 4.1. Problem Overview
- 4.2. Introducing ast_format, The New and Improved format_t
- 4.3. The Ast Format API
- 4.4. Introducing the Format Attribute Structure
- 4.5. The Ast Format Attribute API
- 4.6. The New Format Unique Identifier
- 4.7. Format Unique Identifier Organization
- 4.8. New Format Unique Id Changes to frame.h
- 4.9. New Format Representation Code Examples and Use cases.
- 5. Representation of Format Capabilities
- 6. IAX2 Ast Format API Compatibility
- 7. Revised Format Translation
- 7.1. Problem Overview
- 7.2. Building Translation Paths
- 7.3. Translator Redundancy and Failover
- 7.4. Redefining The Translator Interface
- 8. Handling Multiple Media Streams
- 9. Media Format with Attributes User Configuration
- 10. Enhancing Format Negotiation During Call Setup
- 11. Format Renegotiation After Call Setup
- 12. Implementation Phases
3. Project Requirements
- Lift the limit placed on the number of media formats Asterisk can support.
- Add the ability for Asterisk to represent media formats with attributes.
- Support for SILK with attributes
- Support for H.264 with attributes
- Add the ability for Asterisk to negotiate media formats with attributes.
- Allow translation paths to be built between all media types, not just audio.
- Allow translation paths to be built in a way that takes into account both media quality and translation cost for all media formats.
- Allow a channel to process multiple media streams, even of the same media type, with translation.
- Support the ability to renegotiate media formats after call setup is complete.
- Support the ability to pass-through media Asterisk does not yet understand.
- Support the ability to for users to specify media formats with attributes in .conf files.
4. Representation of Media Formats
4.1. Problem Overview
One of the key problems the new media architecture must address is how to represent a media format that does not have statically defined parameters. In the past, simply defining a media format type as uLaw or g722 posed no problem as these formats have a very specific set of parameters associated with them. For example uLaw is always 8khz, each sample is always the exact same size, and there is really nothing more required to describe a uLaw payload other than now large it is. Everything else can be calculated because parameters for uLaw payloads never change. Unfortunately the assumption that media formats do not need to be defined beyond their format type has proven to be a limitation in the ability to adopt modern media formats. The problems prohibiting integration of feature complete SILK codec support into Asterisk offers a prime example of how this limitation is hindering development. SILK is an audio codec that may adjust the sample rate used in a stream based upon the capabilities of the network. Right now Asterisk assumes every media format will always contain the same sample rate. Without the ability to define a format's sample rate outside of the hard coded rate defined at compile time, implementing SILK into Asterisk without limiting the codec's functionality is not currently possible.
In order to address this limitation, media formats will have the ability to be further defined using format specific attribute structures. These structures along with usage examples are outlined in below.
4.2. Introducing ast_format, The New and Improved format_t
The ast_format structure completely replaces format_t everywhere in the code. This new structure allows for a format to be represented not only by a unique ID, but with an attribute structure as well. This means if a channel's read format is SILK and it understands 8khz->16khz without the need of translation, this can be now represented using only a single format identifier. In this case the ast_format's uid would be AST_FORMAT_SILK, and the attribute structure would be configured to further define this format as having a possible dynamic sample rate between 8khz and 16khz.
The ast_format structure on an ast_frame has a slightly different behavior than representing a read and write format though. When on a frame the attributes structure must be used only to further define the frame's payload. In the SILK read format example discussed above, the attribute structure is used to represent a sample rate range the channel's read format is capable of understanding without translation, but when the attribute structure is used on a frame it must represent a very precise set of parameters directly related to the media payload being transported. In the case of SILK, the attribute structure on a frame would represent precisely what sample rate the payload contains.
4.3. The Ast Format API
4.4. Introducing the Format Attribute Structure
The attribute structure is present on every ast_format object. This attribute structure is an opaque buffer that can be used in anyway necessary by the format it represents. Since it will be necessary for Asterisk to perform a few generic operations on these attribute structures, every format requiring the use of the attribute structure must implement and register a format attribute interface with Asterisk. These registered interfaces are used by the Ast Format API allowing for attributes on an ast_format structure to be set, removed, and compared using a single set of API functions for all format types. The Ast Format API does all the work of finding the correct interface to use and calling the correct interface functions.
The size of the buffer in the attribute structure was determined by researching the media format with the largest number of attributes expected to be present in Asterisk 10. In this case the H.264 SVC draft was used, which is an expanded form of RFC 3984 allowing for some additional functionality. The attributes required by H.264 SVC are determined based upon the SDP parameters defined in the draft. The SDP parameters used by the draft do not all have fixed sizes, but it was determined that an attribute buffer of ~70 bytes will easily suffice for representing the most common use cases. In order to account for undefined future development, this buffer is initially set at 128 bytes which satisfies the current estimated attribute size requirements.
4.5. The Ast Format Attribute API
4.6. The New Format Unique Identifier
Media formats in Asterisk are currently defined using a bit field, format_t, where every format is uniquely identified by a single bit. While this makes comparing media format capabilities extremely simple using bitwise operations, this representation limits the number of media formats that can be represented due to the limited size of the bit field in use. Even if a bit field could represent an infinite number of bits, this representation has no concept of how to compare format capability attributes.
In order to remove the limitation of the number of unique formats that can be represented the identifier will change from a single bit representation to a numeric representation. This means that #define AST_FORMAT_ULAW (1 << 0) now becomes #define AST_FORMAT_ULAW 1. By changing the way media formats are identified from a bit in a bit field to a numeric value, the limit on the number of formats that can be represented goes from 64 to 4,294,967,296. Altering this representation completely removes the ability to use bitwise operations on a bit field containing multiple media format capabilities, but since these bitwise operations lack the ability to process format attributes, they must be replaced by a more robust system anyway. The new system for computing joint media capabilities between peers hinted at here is discussed in detail in the Representation of Format Capabilities section.
4.7. Format Unique Identifier Organization
The old system of using a single bit in a bit field to represent a single format also allows for bitmasks to be used to determine what type of media a format is categorized as. For example, there is a bitmask for determining if a format is an audio format, video format, or text format. By changing the unique id to a number the ability to use bitmasks to determine the category is no longer possible. Instead, a new convention of organizing these formats into media categories must be set in place.
Since the number of formats that can be represented will likely never be exhausted using the new system, formats can be uniquely identified and categorized using a system that sections off each category into a range of numbers. Since it is unlikely any category will ever have even close to 100,000 unique formats associated with it, each category will be sectioned off by increments of 100,000. For example, all audio formats will be uniquely identified in a category between 100,000-199,999, all video formats will be uniquely identified in a category between 200,000-299,999, and so on for every category. This new system allows for each format's unique id to be overloaded with a category as well just like the previous system did. Instead of using a bitmask to determine if a format is video or audio, a function or macro can be used to do the comparison consistently across the code base.
4.8. New Format Unique Id Changes to frame.h
4.9. New Format Representation Code Examples and Use cases.
This section shows example usage of the ast_format structure and how it replaces existing functionality in Asterisk. It also outlines other highlevel use cases that can not easilly be represented by a code example.
Example 1: One to one mapping of old format_t usage with ast_format structure and its API.
Example 2: Set an optional format attribute structure for a SILK ast_format structure capable of a dynamic sample rate.
Example 3: Set the sample rate of SILK ast_frame representing the sample rate of the frame's payload. Then compare the format of the ast_frame with a read format determine if translation is required.
Example 4. Determine if a format is of type audio.
Example 5: Media format seamlessly changes parameters midstream.
1. A channel is defined to have a write format of SILK with the capabilities of understanding 8khz and 16khz without translation.
2. A stream of SILK audio ast_frames containing 16khz frame attributes begin to be written to the channel.
3. During the call the audio stream's SILK frame attributes change to 8khz.
4. ast_write() determines this change is still within the channel's write format capabilities and continues without translation.
Example 6: Media format changes parameters requiring translation midstream.
1. A channel is defined to have a write format of SILK with the capabilities of understanding 8khz and 16khz without translation.
2. A stream of SILK audio ast_frames containing 16khz frame attributes begin to be written to the channel.
3. During the call the audio stream's SILK frame attributes change to 24khz.
4. ast_write() determines this change is not within the bounds of the channel's write format capabilities and builds a translation path from 24khz SILK to 16khz SILK.
5. Representation of Format Capabilities
5.1. Problem Overview
The new way of handling format capabilities must address two issues. First, formats are no longer represented by the format_t bit field and are replaced by the ast_format structure. This means that the old system of representing format capability sets with a bit field must be replaced as well. Second, even if we could use a bit field to represent format capability sets, the bitwise operators used to compare capabilities and calculate joint capabilities are incapable of processing the new format attribute structures. In order to handle both of these changes, an opaque capabilities container must be created to manipulate sets of ast_format structures. This container must also be coupled with an API that abstracts all the work required to compare sets of ast_formats and their internal format attributes.
5.2. Introducing ast_cap, The Format Capability Container.
The Format Capability API introduces a new container type, struct ast_cap, which acts as the opaque capabilities container discussed in the overview. Like an ao2_container holds astobj2 objects, the ast_cap container holds ast_format objects. The thing that sets the ast_cap container apart from other generic containers in Asterisk is that it is designed specifically for the purpose of comparing and manipulating sets of ast_format structures. API functions for adding/removing ast_formats, computing joint capabilities, and retrieving all capabilities for a specific media type are present. The best way to communicate the big picture for how this new container and API replaces the current architecture is by providing some examples. These examples will walk through the sections discussed so far and provide a better understanding for how the ast_format and ast_cap containers interact with each other using the new API. All the examples below take code from the existing media architecture in Asterisk and show how the new architecture replaces it.
Example 1: Add format capabilities to a peer.
Example 2: Find joint capabilities between a peer and remote endpoint.
Example 3: Separate audio, video, and text capabilities.
5.3. Ast Format Capability API Defined
6. IAX2 Ast Format API Compatibility
IAX2 represents media formats the same way Asterisk currently does using a bit field. This allows Asterisk to communicate format capabilities over IAX2 using the exact same representation Asterisk uses internally. This relationship between Asterisk and IAX2 breaks with the introduction of the ast_format and ast_cap structures though. In order for Asterisk to maintain compatiblity with IAX2 a conversion layer must exist between the previous format representation and the new format representation. This conversion layer will be limited to the formats defined at the moment the media format representation in Asterisk changes to use the ast_format structure. As new media formats are introduced, they must be added to this conversion layer in order to be transported over IAX2. Any media formats requiring the use of media attributes may have to be excluded from this conversion depending on their complexity. Eventually the number of media formats that can be represented in IAX2 will be exhasted. At that point it must be decided to either accept that limitation or alter the protocol in a way that will expand it to take advantage of Asterisk's new format capabilities. This proposal is not defining a change any changes to the IAX2 protocol.
7. Revised Format Translation
7.1. Problem Overview
There are two sets of problems that must be addressed in regards to media translation in Asterisk. The first set of problems is a ripple effect caused by the changes surrounding the new representation of media formats with attributes. Translators must gain the ability to process these attributes and build translation paths between formats requiring the use of them. The other set of problems involves the ability to translate between media types other than just audio. The current translation architecture is very audio specific. It assumes that all translators are audio format translators of some kind, and that no other media type will ever be translated. This assumption is not only within the translation code, it is also deeply rooted throughout the code base. The ability to translate between media other than audio is a concept Asterisk completely lacks at the moment.
This section builds upon the foundation established by the new ast_format media format representation and uses it to redefine what translators look like and how translation paths are built. After these changes are made Asterisk will still not be able to translate video or other types of media even if translation paths actually exist between them. This problem is a result of limitations set in place by the Ast Channel API. The changes required to lift that limitation are discussed in the "Handling Multiple Media Streams" section.
7.2. Building Translation Paths
The current method of calculating translation cost by using the computational time required to translate between formats is no longer effective. When all the formats in Asterisk were 8khz audio, picking the best translation path based upon computational cost made sense. The problem with this system now is that it does not take into account the quality of the translation. It may be computationally quicker to translated from one 16khz audio format to another 16khz audio format using 8khz signed linear audio even when 16khz signed linear is an option. Regardless of the computational costs, down sampling an audio stream unless it is absolutely necessary is a bad idea from a quality perspective. Provisions were made in the current code base to account for the down sampling issue just described, but the introduction of that fix was merely a hack to sustain the current system until a more robust architecture could be set in place. The new system must be aware of quality changes between all forms of media, not just sample rate.
Instead of building a translation cost table out based on computational complexity, the table should be built based on what kind of translation is taking place. For example categorizing a translator as a "lossless to lossy translation with a down sampling of quality" gives quite a bit more information about what kind of translation is actually taking place than simply knowing the translation is between two formats and it takes x amount of time to compute 1 second of sample data. As new formats are introduced, knowing how all the different translators affect media during translation allows the path builder algorithm to consistently produce the best quality available.
7.2.1. Computing Translation Costs
The new translation cost table is built on a scale between 400 and 9999. Notice that the lowest cost is 400 and the next cost after that is 600. These two numbers add up to 1000, which guarantees that a direct translation path will always take precedence over any path containing multiple translation steps. The only exception to this rule is a multiple step translation path between lossless formats of the same quality, which does not exist in Asterisk yet but may in the future.
Every one of these cost categories can be thought of as a range starting at the number listed and ranging all the way up to the next category. If a format is capable of multiple translators for any single category listed below, the cost associated with those translators should not fall onto the same cost number as each other. Instead each translator for a single format calling into the same cost table category should be given a weighted cost within the category's range. For example, siren17 is a 32khz audio codec with translators capable of down sampling to both 16khz signed linear and 8khz signed linear. Both of these translators fall under the [[email protected]:lossy -> lossless] downsample" category which starts at cost 960. In order to make this work the 16khz conversion would be 960 and the 8khz conversion would be 961. This gives the translator that loses the least amount of information priority over the the one that loses more if a tie occurs.
This cost table is weighted in a way that assigns lower cost to translators with the most ideal outcome. For example, translating between a lossless format to a lossy format is aways more ideal that converting a lossy format to a lossless format, translating between two lossy formats of the same quality is always more ideal than translating to a lossy format of lesser quality, and translating to a format of equivalent in quality to the original format is more ideal than any translation that requires some sort of re-sampling. The costs are computed based on these principles and more.
7.2.2. Translation Cost Table
Up Sample: The original format is translated to a format capable of representing more detailed information than the original one. Examples of this term would be audio codec being translated to a higher sample rate, a video codec being translated to a higher resolution/frame rate, or an image being translated to a higher resolution.
Down Sample: The original format is translated to a format of lesser quality. Examples of this term would be audio codec being translated to a lower sample rate, a video codec being translated to a lower resolution/frame rate, or an image being translated to a lower resolution.
Original Sampling: The original format is translated to a format of similar quality with little to no loss of information. Examples of this term would be an audio codec being translated to a format equivalent in quality of the original one, a video codec being translated to a format which preserves all the original information present, and an image being translated to another format preserving the same resolution and color depth.
7.2.3. Translation Path Examples
Example 1: Downsampling g722 to ulaw using signed linear as an intermediary step. Notice that using two lossless conversions is more expensive than downsampling g722 directly to 8khz slin.
Example 2: Direct lossy to loss translation using ulaw to alaw. Notice how the direct path between uLaw and aLaw beats using the intermediary slin step.
Example 3: Complex resampling of siren14 to siren7 using g722 as an intermediary step. Notice how downsamping all the way to 8khz signed linear loses to the path that only requires downsampling to 16khz signed linear.
Example 4: Complex resampling using siren14 to a fake 32khz lossy codec. Notice how siren14->slin16 has a 830 cost while siren14-slin8 has 831. This allows translations within the same category to be weighted against each other to produce the best quality.
7.2.4. Translator Costs Defined.
Note that the table costs actually defined in the code are larger than the ones discussed so far by a factor of 1000.
7.2.5. Creation of Translation Path Matrix
Most least cost algorithms take a matrix as input. The current code's translation path matrix is represented by a 2 dimensional array of translation path structures. The current matrix will not change structurally, but there are some complications involved. The current code accesses translation paths from the matrix using index values which represent individual formats. The index values are computed by converting the format's bit representation to a numeric value. Since the numeric representation of a format bit has to be between 1 and 64, the maximum size of the bit field in use, the numeric representation works as an index for the current two dimensional matrix. With the introduction of the ast_format structure, this conversion between a format's unique id and the a matrix index value is not clean. To account for this complication a hash table mapping every format id to a matrix index value will be used.
7.2.6. Computing Least Cost Translation Paths
The Floyd-Warshall algorithm will be the least cost algorithm in use. At its core, the current translation path building code uses this algorithm but has a few layers of complexity added on top of the base algorithm to deal with translation paths between audio codecs of differing sample rates. With the introduction of the new translation cost table, this additional complexity is completely stripped away from the algorithm. Now the translation costs are computed with translation quality and efficiency in mind, which abstracts these concepts away from least cost algorithm in use.
7.3. Translator Redundancy and Failover
It is possible that multiple redundant translators may exist for a single translation path. A common example of this would be a hardware translator with limited capacity coupled with a software translator. Both of these translators perform the exact same task, but the hardware translator is much faster. In this case the hardware translator would be used until it reached capacity and then it would failover to the software translator. There is however a complication involved with this. Only one of these translators can exist in the translation path matrix at a time. This means that when multiple translators with the same source and destination formats are present, some sort of priority must be used to pick which one is used. If the translator in use reaches capacity it then must deactivate itself allowing the matrix to be rebuilt in order to take advantage of the redundant translator.
In order to prioritize redundant translators, computational cost will be used. Formats requiring the use of redundant translators must supply a set of sample data to translate. This data is already present for most audio formats because it is required by the current architecture to compute translation cost. Translation cost in the new architecture is replaced by the translation cost table, but computational cost is still important when choosing between redundant translators.
7.4. Redefining The Translator Interface
Translators are currently defined by a simple set of functions (constructor, destructor, framein, frameout) coupled with a source and destination media format to translate between. There is not much that needs to be changed about this interface except that the source and destination formats must be converted to be ast_format structures in all the existing code, and each translator must provide a cost value. There will be a table available to guide exactly what cost value to use. In order to make any future changes to the cost table effortless, defined values will be used when assigning cost to a translator. Otherwise this interface is in great shape for the changes ahead.
8. Handling Multiple Media Streams
8.1. Problem Overview
Asterisk was designed from the ground up with the idea of only one audio media path being passed between channels. The code that handles this media path is done in such a way that makes expanding it to multiple media paths very difficult, especially media that is not audio. Asterisk has gotten away with being able to support very limited video functionality by not treating it as a media path at all. Instead of putting all media in the same media path as audio, video and other forms of media are just passed through similar to the way signalling is done. In order to bring all media into the same code path as audio, several fundamental design changes must be made to the way channels represent media streams. This section discusses those changes and how they affect channel drivers and other applications requiring access to media streams.
8.2. Defining a Media Stream in Asterisk
The first step in improving Asterisk's ability to represent multiple media streams is to actually define what a media stream is. At the moment, a stream in Asterisk is a very abstract idea. There is no tangible representation of a stream, no stream object or structure. The best representation of a stream Asterisk has now is the ast_channel structure which is capable of representing a single set of audio tx/rx streams through the use of a bunch disjoint elements. Lets start this discussion by breaking out the elements of the ast_channel structure that allow it to represent these streams.
In order for the ast_channel structure to represent a single set of audio tx/rx streams it needs the following things.
- Read translator - Translates stream on the read path going into the Asterisk Core.
- Write translator - Translates stream on the write path going out to the channel driver.
- Native Format Capabilities - Native capabilities the channel driver is capable of understanding without translation for this stream.
- Read Format - Requested Read format after translation on the read path.
- Raw Read Format - Expected read format before translation.
- Write Format - Requested write format after translation on the write path.
- Raw Read Format - Expected write format before translation.
The combination of all these items represent everything Asterisk needs to make channels compatible with one another and build translation paths between one another for a single set of corresponding tx/rx streams. The problem with this architecture is that all these disjoint elements make it impossible to replicate this functionality allowing for multiple tx/rx streams to exist on a single channel. In order for channels in Asterisk to gain the ability to process multiple tx/rx stream sets on a single channel all of theses stream elements must be organized into an isolated structure that can be easily replicated and manipulated. This new structure is called the ast_channel_stream structure and is discussed in detail in the next section.
8.3. Introducing ast_channel_stream, Making Sense out of Madness
The ast_channel_stream structure is made up of all the individual elements required to represent single set of tx/rx streams on an ast_channel structure. This allows all the disjoint translators and formats on the ast_channel structure associated with the audio streams go away and be replaced by a single ast_channel_stream structure. Everyplace in the current code base that directly accesses any of the stream elements on a channel such as nativeformats, readformat, and writeformat will be replaced by a set of API functions provided by the new Ast Channel Stream API. This API contains all the common operations channel drivers and applications need to perform on a stream, such as setting the native format capabilities, initializing the read/write formats, retrieving the current read/write formats, and setting the read/write formats. By using this API, channels also gain the ability to contain more than one media stream set. This is done through the concept of stream identifiers which is further discussed in the next section.
8.4. Stream Identifiers
The ast_channel_stream structure isolates the concept of tx/rx streams to a single entity allowing channels to represent multiple streams through the use of multiple ast_channel_stream structures. Since it is prohibited for any part of Asterisk except channel.c to directly access the ast_channel_stream structures on a channel, the rest of Asterisk needs a way access these individual streams through the use of the Ast Channel Stream API. This introduces the concept of stream identifiers. Stream identifiers completely abstract away the concept of the ast_channel_stream structure from the rest of Asterisk. Every ast_channel_stream structure on a channel will have a unique stream id assigned to it. This stream id is required by every function in the Ast Channel Stream API to access and manipulate the individual streams on a channel.
In order to separate ast_frames belonging to separate streams, a stream id will also be present on each frame. This will involve placing a new value on the ast_frame structure to represent what stream the frame belongs to. By default the current code will not use the stream id on the ast_frame even though it will be present. This concept is discussed in more detail in the "Default Streams" section.
Steam identifiers are organized into three categories. For the sake of organization and ABI compatibility each of these categories are given a range of unique stream identifiers available to them. Separating the default streams from the auxiliary and dynamic streams also makes it much easier to filter out auxiliary and dynamic streams for applications and modules that do not support them. Once a new stream identifier is defined, its unique id must remain consistent.
- default streams: Unique id defined between 1 - 99999
- auxiliary streams: Unique id defined between 100000 - 199999
- dynamic streams: Unique id defined between 200000 - 299999
8.4.1. Default Streams
Since Asterisk was designed with the concept of a single audio tx/rx stream set existing on a channel, some provisions must be made to allow for a smooth transition into the concept of multiple stream sets. This is where default streams come into play. Every ast_channel structure will contain a set of default streams associated with it, each with a predefined consistent stream id.
Default Audio Streams - The first default tx/rx stream set present on every channel is the default audio streams. This is the stream set all of Asterisk already knows about. It is the one that used to be made of individual elements in the ast_channel structure but was stripped out after defining the ast_channel_stream structure. Every channel driver built so far already knows how to manipulate these streams and many applications require access to them as well. All ast_frames of type AST_FRAME_VOICE with a stream id of 0 will automatically match this default stream set on a channel. Since 0 is the default initialization value for the stream id on a frame, all the channel drivers and applications already making use of these streams do not have to be modified.
It should be noted that while additional audio streams will be possible in the future, it is likely the default audio stream will be the only one that any kind of tone detection is performed on for DTMF, FAX, etc. This document does not attempt to alter this limitation in any way.
Default Video Streams - It is currently impossible to do translation between two channels transmitting different video formats because the channel has no way of representing video translators. This changes with the introduction of the default video rx/tx stream set. Similar to the default audio streams, any video frames containing a stream Id of 0 is automatically matched to the default video stream set on a channel.
As more media types are introduced, it may be beneficial to define additional default stream sets. Initially only audio and video will present.
8.4.2. Auxiliary Streams
If a channel driver is capable of negotiating more streams than can be represented by the default rx/tx stream sets on a channel, the auxiliary media stream sets can be used. These stream sets work the exact same way as the default stream sets except they require the use of the media stream id on frames. With auxiliary streams the stream id must be present on every ast_frame created for the stream. This allows channels and applications not capable of processing auxiliary streams to filter out the frames they don't understand.
Since Asterisk supports multiple protocols with various capabilities, all the auxiliary streams that can be used anywhere in Asterisk must be defined at compile time. This means when a channel driver is extended to make use of a new type of auxiliary stream, that stream must be defined with a stream id that uniquely represents it across the entire code base. This is the only way to keep the different types of auxiliary streams and what they are used for consistent across all modules.
8.4.3. Dynamic Streams
It is possible that Asterisk will need the ability to pass through streams containing media it does not understand. This can only be accomplished if both the channel negotiating the unknown media type and whatever that channel is bridged too can agree that they both understand the unknown media type and are assigning it a dynamic stream id that they both agree upon. This document does not define the negotiation of dynamic streams in Asterisk.
8.5. Ast Channel Stream API Defined
8.6. Code Change Examples
This sections shows how the Ast Channel Stream API replaces existing usage in Asterisk.
Example 1: A channel driver creating a new channel and initializing the default audio stream's formats and capabilities.
Example 2: Setting the read format on a channel.
9. Media Format with Attributes User Configuration
With the addtion of media formats with attributes, users will need a way to represent these new formats and their attributes in their config files. This will be accomplished by the ability to define custom media types that can be used in the format allow/disallow fields. These custom media type definitions will live in codecs.conf. For people familiar with Asterisk's config structure, the best way to present this concept is with some examples.
Example 1. SILK is capable of several different sample rates. If a peer wants to negotiate only using SILK in a narrow band format, a custom format must be created to represent this.
Example 2. H.264 is capable of negotiating a wide range of attributes. If specific attributes are to be negotiated, a custom format must be created to represent this.
Notice from these examples that both the SILK and H264 custom formats are defined using fields specific to their format. Each format will define what fields are applicable to them. If there are common fields used for several different media formats, those fields should be named in a consistent way across all the media formats that use them. Every format allowing custom media formats to be defined must be documented in codecs.conf along with all the available fields.
10. Enhancing Format Negotiation During Call Setup
11. Format Renegotiation After Call Setup
11.1. Problem Overview
When it is at all possible, it is always better to have two bridged channels share the same media formats for their audio streams than to have to perform translation. Translation for audio is expensive, but translation for video is exponentially more expensive that audio. Because of the computational complexity involved with translating video, the concept of being able to renegotiate media after a call is estabilshed in an attempt to get the device to do the translation for us is very important. Right now Asterisk lacks this ability.
11.2. Making ast_channel_make_compatible() Smarter
Every time a channel is bridged with another channel a call to ast_channel_make_compatible() is made. This function takes the two channels to be bridged as input and figures out all the translation paths and intermediate media formats that need to be set in order for the two channels to talk to each other. With protocols like SIP, it is possible to renegotiate the call parameters after call setup has taken place. By placing a feature in ast_channel_make_compatible() that can make the two channels aware of each other's native media format before translation takes place, it is possible for one side to re-negotiate its session to switch to the same media format used by the other side. When this is possible, Asterisk is able to avoid translation completely.
11.2.1. How Renegotiation Works
At the heart of renegotiation is the introduction of a channel option called AST_OPTION_FORMAT_RENEGOTIATE and a structure called ast_option_renegotiate_param. The ast_format_renegotiate_param structure is passed as the data for the AST_OPTION_FORMAT_RENEGOTIATE's query and set actions. This structure contains both a format to renegotiate for each stream renegotation must take place on, a function pointer containing the place a channel tech must report the result of its renegotiation attempt, and an internal structure used to determine what action to take next after a channel tech reports the renegotiation attempt.
On query, the ast_option_renegotiate_param structure is passed down to the channel tech pvt containing information about all the formats and streams to renegotiate. The result of a query request indicates whether or not the channel tech is capable of attempting renegotiation with the formats provided or not. Queries are performed synchronously, meaning the result of a query request must never block for a network transaction to take place.
On set, the ast_option_renegotiate_param structure is passed down to the channel tech pvt containing both the formats and streams to renegotiate along with a place to report the result of the renegotiation. Renegotiation is event driven, meaning that the channel tech pvt is given the renegotation parameters and it must report back at a later time the result of the renegotiation attempt. This allows the set operation to avoid blocking the bridge code by performing the renegotation asynchronously.
During ast_channel_make_compatible(), if it is determined that translation is required to make two channels compatible both channels are queried using the AST_OPTION_FORMAT_RENEGOTIATE option and ast_option_renegotiate_param structures. After the queries, if either of the two channels are capable of renegotiating the set action is used on best candidate to attempt the renegotiation. If the channel used for the first renegotiation attempt reports a failure, a second attempt at renegotiation may take place for the bridged channel if neither channel has hung up.
11.2.2. Renegotiation with Intermediary Translation
- Make Compatible Flow of Events
- ast_channel_make_compatible() is invoked
- read and write formats are different between channels for at least one stream
- translation paths are built for streams requiring translation
- query to AST_OPTION_FORMAT_RENEGOTIATE is made on both channels
- if candidate for renegotation exists, renegotiation parameters are set to the channel using AST_OPTION_FORMAT_RENEGOTIATE
- channels are bridged
- Asynchronous Renegotation Flow of Events
- channel tech is set with renegotation paramters using AST_OPTION_FORMAT_RENEGOTIATE
- channel tech attempts renegotiation and reports result to renegotiation parameter result function
- on SUCCESS: new format is set for renegotated stream and translation path goes away
- on FAILURE: result function attempts renegotation with bridged channel if possible, else translation must remain
11.2.3. Renegotiation with no Intermediary Translation
- Make Compatible Flow of Events
- ast_channel_make_compatible() is invoked
- channel's read and write formats are different for at least one stream
- NO translation path is possible to make channels compatible
- query to AST_OPTION_FORMAT_RENEGOTIATE is made to both channels
- if best candidate for renegotiation is found, renegotiation parameters are set to the channel using AST_OPTION_FORMAT_RENEGOTIATE
- channel is bridged
- media for incompatible streams are blocked for a period of time while renegotiation takes place
- Asynchronous Renegotiation Flow of Events
- channel tech is set with renegotiation parameters using AST_OPTION_FORMAT_RENEGOTIATE.
- channel tech attempts renegotiation and reports result to renegotiation parameter result function
- on SUCCESS: new format is set for renegotiated stream and translation path goes away
- on FAILURE: result function attempts renegotiation with bridged channel if possible
- if renegotiation fails on both channels, depending on the stream in question media is either indefinitely blocked or both channels are hung up
12. Implementation Phases
With a project of this size, it is important to break down the implementation into manageable phases. Each phase of development contains a set of steps which act as milestones. These steps must be small enough to be attainable within a week to two week period but complete enough to not break any Asterisk functionality once they are introduced. Once a step is complete, it should be reviewed and committed into trunk. This allows progress to be made in a maintainable way.
12.1. Phase 1: Re-architect how media is represented and how translation paths are built
From the user perspective, no functionality changes will be present during this phase.
- Step 1
- Define new format unique ID system using numbers rather than bits. Allow this definition to remain unused during this step except by the new APIs.
- Create Ast Format API + unit tests.
- Create Ast Capibility API + unit tests.
- Create IAX2 Conversion layer for ast_format and ast_cap objects. Create unit tests and leave this layer inactive until conversion to new APIs takes place.
- Step 2
- Define translation cost table.
- Replace current matrix algorithm with new matrix algorithm using predefined costs from table.
- Continue to use computational costs for tie breaking translators with identical src and dst formats.
- Create table for mapping format ids to matrix index values. This is required once the conversion from the format bit field representation to a numeric value takes place and will allow for a smoother transition.
- Step 3
- Replace old format unique ID system with the new system. This will temporarily break all asterisk media functionality.
- Add media functionality back into Asterisk by replacing all instances of format_t with ast_format and ast_cap.
- Completely remove format_t type def.
12.2. Phase 2: Exercise the functionality introduced by formats with attributes
This is done by introducing the SILK codec and allowing H.264 to be negotiated with format attributes.
- Step 1
- Define SILK format in Asterisk.
- Create SILK format attribute interface.
- Make SILK translators to and from signed linear.
- Add the ability to define custom media formats with attributes in user configuration.
- Extend the rtp mapping code to allow chan_sip to advertise SILK appropriately in SDPs.
- Step 2
- Create H.264 format attribute interface.
- Extend codecs.conf to allow custom H.264 format definitions.
- Extend chan_sip to be able to correctly advertise and negotiate H.264 with attributes in SDPs.
12.3. Phase 3: Extend Asterisk to handle multiple media streams
- Step 1
- Create Ast Channel Stream API
- Define default audio stream by replacing current audio stream formats and translators on a channel with an ast_channel_stream structure.
- Define default video stream by introducing a new ast_channel_stream structure used solely for negotiating the primary video stream.
- Step 2
- Add the stream id field to the ast_frame structure.
- Block the ability to read anything other than the default streams with all current channel drivers and applications.
- Introduce new ast_read functionality for reading auxiliary streams when it is explicitly requested.
- Step 3
- Exercise the new ability to build video translation paths using an FFMPEG addon translation module.
12.4. Phase 4: Format Renegotiation after call setup
Allowing calls to renegotiate their media formats after call setup is perhaps the most practical functionality introduced by this project. Due to the way multiple media streams will be represented in Asterisk, this ability to represent multiple streams is prerequisite for format renegotiation be implemented correctly. That is the primary reasoning for pushing back the introduction of this functionality to a later phase.
- Step 1
- Re-evaluate design. Define use cases and prove concept with a set of sequence diagrams.
- Test interoperability of renegotiation use cases using sipp scenarios against common SIP devices.
- Step 2
- Implement core functionality changes required to detect and attempt format renegotiation with channel techs.
- Implement chan_sip configuration options and functionality required to allow format renegotiation triggered by the Asterisk core to occur after call setup.