IEC 62379 Common Control Interface for networked audio and video equipment
Part 5 specifies control of transmission of live media and other data over each individual network technology. It includes network-specific management interfaces along with network-specific control elements that integrate into the control framework.
Within part 5, sub-part 1 specifies management of aspects which are common to all network technologies, and sub-part 2 specifies protocols which can be used between networking equipment to enable the setting up of calls which are routed across different networking technologies. Sub-parts 3 onwards specify management of aspects which are particular to individual networking technologies.
IEC 62379 is mainly concerned with two kinds of service, one suitable for live media and one suitable for management messages and other “best effort” packet data.
The service for live media (including status broadcasts) is intended to be similar to the service provided by a cross-point router. Thus it is suitable for carrying a stream of data at a constant rate from a source to one or more destinations. Important Quality of Service (QoS) parameters include the maximum and minimum delay between source and destination and the likelihood of parts of the data being lost.
The concept of a “call” or virtual circuit is used. Setting up and tearing down calls corresponds to making and breaking routes in a cross-point router. The normal procedure in router control is for a destination to “take” the signal from a source, and the model used in IEC 62379 is that the management terminal sends commands to the destination unit and the destination unit then asks the network to make the connection.
An interface unit may implement calls in very different ways, depending on the type of network over which the call passes; network-specific details are in the Sub-part of Part 5 which applies to each type of network. If the network does not support multicasting, the source may send a separate copy of the stream to each destination unit.
If the network offers a connection-oriented service, calls map naturally onto connections and the network can be expected to offer guarantees for the QoS parameters when a call is set up. If the network only offers a connectionless service, calls map onto sessions at a higher layer (for instance, RTP flows which are carried over the connectionless service provided by UDP over IP; note that there is still a “call set-up” process, using a protocol such as SIP), in which case it may only be possible to give an estimate of the QoS parameter values.
The service for management messages may be either connection-oriented or connectionless. In either case, it is assumed to be a “best effort” service which may lose messages or buffer them for an unlimited length of time. On IP networks, the connectionless protocol UDP is used (as specified in RFC 1157), but on other networking technologies there may well be advantages in setting up a call, for instance if authentication can be done at call set-up time rather than separately for each message.
The physical connection to the network is described by a “block”. The block has an input for each media flow going to the network, and an output for each media flow coming from it. The number of inputs and outputs will usually change dynamically as calls are connected and cleared down. This model is used in switches within the network as well as in end equipment.
To “take” the media stream from a remote source, a new entry is first created in the “unit destination list” of the unit that is to receive the stream; note that at this stage it may not be known on which of its network ports the call will be connected, if it has more than one. When the management terminal has written all the necessary information in this entry, including identifying which input to which block is to receive the incoming media stream, it requests the unit to make the connection on the network. If the connection is successfully made, a new output is created (or an existing, currently unused, output is assigned) on the network port block representing the physical port on which the call was made, and the input that is to receive the media stream (on a block which may represent a media output port or an internal processing function) is connected to it.
If the remote source is already transmitting on the network, and the network supports multicasting, the network will simply copy the existing stream to the new unit. Otherwise, the source unit will receive an incoming call requesting a source which it identifies from information in the call set-up (or INVITE) message; it creates (or assigns) a new input to the relevant network port block and connects it to the source.
A “call” as defined here only carries one media stream. If the network supports calls which carry a bundle of several media streams, each stream has its own entry in the unit source or destination list. In this case, the “call identifier” will probably consist of a part that identifies the call (i.e. the bundle) and a part that distinguishes individual streams within the bundle. Sub-part 2 specifies a call identifier in this form.
“Identity” information may be associated with any call, to provide a user with information that may not be apparent from, say, the network addresses of the endpoints. In a broadcast environment this could include an indication of whether it is part of an on-air programme chain, and if so which programme.
For some applications, including radio and TV broadcasting, calls are required to exist for very long periods with very high reliability. IEC 62379 includes facilities to assist that.
Where equipment is duplicated to increase reliability, the control system may request that calls between the two sets of equipment follow different paths in the network, so that no part of the network can become a single point of failure. Destination equipment may receive two copies of the media by different routes.
The ability to receive two copies also allows a call to be “replaced”. The replacement call is connected, then the destination equipment switches from using the data from the original call to the replacement, then the original call is torn down. The destination equipment needs to be able to align the two data streams so that there is no discontinuity when switching; the means for doing so (such as the timing specified in AES53) are outside the scope of Part 5.
One use for call replacement is to allow networking equipment to be taken out of service for scheduled maintenance or relocation. This uses another facility, whereby the equipment can be “barred” from accepting new calls. The existing calls through it are then “replaced”, and the replacement calls will take a different route. Once all calls through it have been replaced, the equipment can be taken out of service.
If the network supports it, calls to carry live media can be pre-scheduled. If a programme requires a feed from a remote studio or other remote location, the network can be requested in advance to reserve the necessary resources. A maximum duration may be specified for a call that is not pre-booked, so that it can use resources that will be required later for a pre-booked call.
Sub-part 2 specifies protocols which can be used between networking equipment to enable the setting up of calls which are routed across different networking technologies.
A network is composed of end equipment, switches, and links. End equipment units convey media flows between media ports (such as analogue or digital audio or video connectors) and network ports, the ports being part of the unit. Switches similarly convey media data between network ports.
A link connects network ports on different units together. It may be a dedicated point-to-point link, such as Ethernet over Cat5e cabling or a leased line connection over a telecommunications network, but it may also be a subnetwork (switched or shared-media) that connects more than two ports, or a larger network such as a switched telecommunications network or the Internet.
The route followed by the media data from the source to the destination may need to traverse more than one kind of network. For instance, audio being transmitted from a studio in one location to a studio in another location will probably travel over the local infrastructure in both studios and a wide area network that links them; these three networks may well all use different technologies.
This sub-part of Part 5 specifies protocols which allow calls to be connected across heterogeneous networks which may be modelled in any level of detail, from individual switches and point-to-point links to a single “cloud” with no internal structure visible, even if they use very different addressing schemes.
Different networking technologies also vary widely in the quality of service they are able to provide, from circuit-switched networks offering fixed latency and guaranteed delivery, through managed packet networks which can provide a certain level of assurance, to “best effort” packet networks (such as the Internet) which do not offer any guarantees at all.
Addresses may perform one or both of two functions: location and identification. An identifier selects a particular physical unit or a particular service. A locator shows where the required unit or service is to be found.
Fixed-line telephone numbers are locators. They include a country code and an area code, which help the system to route the call to a specific telephone line. The call is answered by whoever happens to be near the phone at the time, and the caller then has to ask for the person they wish to speak to by name (which is, of course, an identifier).
The 48-bit MAC addresses used in Ethernet and other IEEE 802 networks (apart from group addresses and locally-administered addresses) are identifiers which uniquely identify a particular interface but do not contain any information as to where on the network the interface is to be found.
The 20-byte NSAP addresses used in ATM networks include a locator (the prefix) and an identifier (the ESI) in separate fields.
IPv4 addresses act as both locator and identifier, although (unlike MAC addresses) the relationship between this identifier and a particular piece of equipment is often not permanent, and Network Address Translation means that a piece of equipment may appear to have different identifiers in different parts of the network and several different pieces of equipment may appear to have the same identifier. Various schemes have been proposed to adapt IP addressing so as to separate the locator from the identifier.
The principal form of unique identifier used in this standard is the IEEE 64-bit extended unique identifier (EUI-64). An EUI-64 for any piece of equipment that has an interface which has a MAC address can be generated by inserting FFFE hex into the middle of the MAC address. There are two bits in an EUI-64 which are always zero; future versions will support other forms of globally-unique 64-bit identifier that have a nonzero value in these two bits. Identifiers that are not unique may also be used; for instance, the name of a service may be used as the identifier for any piece of equipment that offers that service.
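The derivation of an EUI-64 from a MAC address can be sketched as follows. This is a minimal illustration of the insertion described above; the function name mac_to_eui64 and the textual output format are assumptions made for the example and are not taken from the standard. No other bits of the address are altered.

```python
def mac_to_eui64(mac: str) -> str:
    """Insert FF-FE between the third and fourth octets of a 48-bit MAC address.

    Hypothetical helper: accepts colon- or hyphen-separated hex and returns
    hyphen-separated upper-case hex.
    """
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]
    return "-".join(f"{b:02X}" for b in eui64)


print(mac_to_eui64("00:1B:44:11:3A:B7"))  # 00-1B-44-FF-FE-11-3A-B7
```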
For calls within a subnetwork, only the identifier is required.
An address may consist of a series of addresses such that the call is routed to the first address, then from there to the next, etc. Each address is interpreted in the context of the equipment or location specified by the previous address, which thus acts as a locator. For instance, the address for an audio call may consist of the address of a piece of audio equipment followed by the identifier of a particular output port on that equipment. The address of a gateway (or even of a specific port on a gateway or switch) may be used as the locator for equipment accessed through that gateway; this kind of locator should not, however, be included in any kind of identification for the destination unit, because that would prevent calls being (re-)routed if the specified gateway unit failed.
A call is defined in Part 1 as conveying information either from a source unit to one or more destination units or (in both directions) between two units.
A path is a contiguous set of links along which the information might be conveyed from one unit to another. We use the term “path” when describing the process of discovering how to get from one unit to another across the network. A path may branch, in which case the different branches are different attempts at finding a way through; the eventual data flow will only be along one of the branches.
Once a set of links leading from one unit to the other has been discovered, it is called a “route”. A route may also branch, but only as required to deliver multicast data to several different destinations.
A route exists only in the control plane. It does not carry data, apart from signalling messages which are copied from one link to another (often being modified in the process) by software running in a computer which controls the unit through which they pass.
A flow is a single stream of information which is conveyed along a route.
A flow exists in the forwarding plane. In the case of a circuit-switched network, data are copied from one link to another by the switching fabric with no intervention from the software.
A call may carry several different flows. For instance, a connection between two studios may carry programme audio, talkback, and signalling (e.g. red light) data as separate flows. Compressed data may be partitioned into a flow carrying a base layer and one or more flows carrying enhancement layers that improve the quality or add, for instance, surround sound channels; all destinations will receive the base layer, but the enhancement layer(s) would not be received by destinations that were not able to use the extra data, or did not have high enough bandwidth connections.
A call that needs high reliability (such as one that is part of the programme chain of a live broadcast, or in a safety-critical public address system) may have more than one route. For the highest reliability, all flows will be sent on two or more routes and the destination unit will use one copy and discard the other(s). Alternatively, a backup route might only carry the base layer, or it might be set up with no flows at all, with the flows only being connected in the event of failure of the main route.
A data structure for identifying calls, routes, and flows is defined. Because it is not transmitted with the data (unlike, for instance, IP addresses and port numbers), this structure can be larger than would otherwise be acceptable. Call identification consists of a globally-unique identification of the “owner”, which is either the originator of the call or the source of the data, and the “call reference” which is a 32-bit handle chosen by the owner. Thus the call identifier is unique across the whole network. We use 32 bits so that the size of a call reference is unlikely to be a limitation on the number of calls that a unit can make, and the time before a call reference value is re-used can be made adequately long.
We use 7 bits for the “route reference”, a handle on the route which a call follows. Most calls will only have one or two routes, but there will be cases where a call needs to set up new backup routes when existing routes fail and, again, the size of the field allows adequate time to elapse before handle values are re-used. Also, there is no particular benefit in shaving a small number of bits off this field.
The network is expected to ensure as far as possible that different routes for the same call do not pass through the same piece of equipment, which would then be a single point of failure.
The “flow reference” is a handle on the flow within the call, and is 24 bits. Again, most calls will only have a small number of flows, but this field has been made large enough that a call can carry a “tunnel” through which a large number of flows are routed (similar to an ATM VPC). There is also 1 bit which shows whether the flow is towards or away from the owner of the call.
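The field sizes given above (a 64-bit owner identifier, a 32-bit call reference, a 7-bit route reference, a 24-bit flow reference, and a 1-bit direction flag) can be collected into one structure, sketched below. The class name FlowId, the field names, and in particular the bit ordering used by pack() are assumptions made for illustration; the actual encoding is defined by the standard and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowId:
    owner: int           # 64-bit globally-unique identifier of the call's owner
    call_ref: int        # 32-bit handle chosen by the owner
    route_ref: int       # 7-bit handle on the route
    flow_ref: int        # 24-bit handle on the flow within the call
    towards_owner: bool  # 1-bit direction flag

    def __post_init__(self):
        widths = (("owner", 64), ("call_ref", 32), ("route_ref", 7), ("flow_ref", 24))
        for name, bits in widths:
            if not 0 <= getattr(self, name) < (1 << bits):
                raise ValueError(f"{name} does not fit in {bits} bits")

    def pack(self) -> bytes:
        # Assumed layout: owner (8 octets), call_ref (4 octets), then one
        # 32-bit word holding direction (1), route_ref (7), flow_ref (24).
        word = (int(self.towards_owner) << 31) | (self.route_ref << 24) | self.flow_ref
        return (self.owner.to_bytes(8, "big")
                + self.call_ref.to_bytes(4, "big")
                + word.to_bytes(4, "big"))

    @classmethod
    def unpack(cls, data: bytes) -> "FlowId":
        if len(data) != 16:
            raise ValueError("expected 16 octets")
        word = int.from_bytes(data[12:16], "big")
        return cls(owner=int.from_bytes(data[0:8], "big"),
                   call_ref=int.from_bytes(data[8:12], "big"),
                   route_ref=(word >> 24) & 0x7F,
                   flow_ref=word & 0xFFFFFF,
                   towards_owner=bool(word >> 31))


fid = FlowId(owner=0x001B44FFFE113AB7, call_ref=1, route_ref=2, flow_ref=3, towards_owner=True)
assert FlowId.unpack(fid.pack()) == fid
```

Under this assumed layout the route reference, flow reference, and direction flag together fill exactly 32 bits.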
The flow reference is independent of the route, so flows that carry the same data by different routes can be easily identified. Where the same material is transmitted by different end units (for instance, where an audio input to the network is duplicated), the same call identifier (but, of course, different route references) may be used by the two units. If the two streams are identical (which can only be the case with digital inputs), they will use the same flow reference, but otherwise (for instance where the same analogue signal is digitised separately in each unit) they may need to use different flow references.
Note that a route has an existence independent of the flows that follow it. A route may be established without any flows, in which case signalling messages can still be passed along it. In a circuit-switched network, a flow will be set up in the switching fabric whereas the route will exist only in software records held by the switches. Thus there are two distinct phases in connecting a flow: establishing the route, and setting up the switching fabric so that the data will be forwarded along it.
Also note that a flow passes through the same set of links and switches as the route it follows. This is different from the situation with SIP (RFC 3261), in which the media flow will in many cases not pass through the SIP server(s) that connected it.
The process of establishing a route begins with the caller creating a new route identifier and sending a “FindRoute request” signalling message on one or more of its network ports. Each recipient checks whether it is the called party, in which case it responds to the request, or is directly connected to it, in which case it sends the request on to it; otherwise it either rejects the request or sends it on via one or more of its network ports to units which are in some sense “closer” to the called party.
This standard does not specify how a switch chooses on which ports to forward a FindRoute request; the mechanism will usually be network-specific. In a small network a switch may simply flood the request to all ports; if a loop is formed it can easily be detected, by comparing the route identifier in an incoming request with those of requests that have already been forwarded, and this will occur before any data begin to flow.
As the request progresses through the network, it builds up a path which may have many tentacles reaching out towards the called party in different directions. When the request reaches the called party, a “FindRoute response” message is sent back along the path, and as it makes its way back towards the caller, the message accumulates “route metric” information which includes a measure of the “cost” of the path in terms of congestion and number of hops, and more literally the cost of the call if it passes over a public network. Each such response message is an offer to connect the route by the path over which it has travelled; the calling party chooses one offer and sends a “FindRoute confirmation” message which also causes all the other paths to be cleared down.
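The request, response, and confirmation steps can be illustrated with a small simulation. This is only a sketch under stated assumptions: the network is a dictionary of link costs, every switch floods the request to all ports, a unit drops a request it has already seen (the loop detection described above), and the route metric is reduced to a single additive cost. The function names and the example topology are hypothetical.

```python
from collections import deque


def flood_find_route(links, caller, called):
    """Flood a FindRoute request; return every path that reaches `called`.

    links maps each unit to a dict of {neighbour: link cost}.
    """
    seen = {caller}              # units that have already handled this route identifier
    queue = deque([[caller]])
    paths = []
    while queue:
        path = queue.popleft()
        for neighbour in links[path[-1]]:
            if neighbour == called:
                paths.append(path + [called])    # called party will respond on this path
            elif neighbour not in seen:
                seen.add(neighbour)              # duplicate requests are dropped
                queue.append(path + [neighbour])
    return paths


def response_metric(links, path):
    """Accumulate the cost hop by hop as the response travels back to the caller."""
    cost = 0
    for here, back in zip(reversed(path), reversed(path[:-1])):
        cost += links[back][here]
    return cost


links = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "C": 3, "D": 5},
    "C": {"A": 2, "B": 3, "D": 1},
    "D": {"B": 5, "C": 1},
}
offers = sorted((response_metric(links, p), p) for p in flood_find_route(links, "A", "D"))
best_cost, best_path = offers[0]   # the caller confirms this offer; the others are cleared down
print(best_cost, best_path)        # 3 ['A', 'C', 'D']
```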
A ClearDown request may be issued at any time by any of the parties involved in the call; if issued by a switch it propagates in both directions. In the case of a multicast, in the direction towards the source it stops when it reaches a branching point.
A ClearDown received in reply to a FindRoute request is a negative response which may be a “refusal” or a “backtrack”. A refusal indicates that the called party has been found but is not willing to accept the call, or that the called party's location has been found and the called party is not there; any resources reserved for the call are relinquished as the refusal propagates back towards the caller. A backtrack indicates that the called party's location was not reachable via the path the request followed (maybe because the required resources are not available along that path, or because a link has been lost and the routing tables have not yet been updated to reflect the new topology), or that the called party is a service which was not available at the location the request reached but may be available at a different location; a switch receiving a backtrack may send out a further request along a different path, or if all possible paths have been tried pass the negative response back towards the caller.
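The difference between a refusal and a backtrack, as seen by a switch, can be sketched as below. The names Reply and try_paths are hypothetical, and the sketch assumes the switch tries its candidate ports one at a time; a real switch may have requests outstanding on several paths at once.

```python
from enum import Enum


class Reply(Enum):
    ACCEPT = "accept"
    REFUSAL = "refusal"      # called party found, but the call is not accepted
    BACKTRACK = "backtrack"  # called party not reachable via this path


def try_paths(candidate_ports, send_request):
    """Try onward ports in turn; send_request(port) returns a Reply."""
    for port in candidate_ports:
        reply = send_request(port)
        if reply is Reply.BACKTRACK:
            continue                  # this path failed; another may still succeed
        return port, reply            # ACCEPT or REFUSAL goes back towards the caller
    return None, Reply.BACKTRACK      # every path tried: pass the negative response back


replies = {"p1": Reply.BACKTRACK, "p2": Reply.ACCEPT, "p3": Reply.REFUSAL}
print(try_paths(["p1", "p2", "p3"], replies.get))
```

A refusal ends the search immediately because the called party has already been located, whereas a backtrack only rules out one path.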
If the call is charged for, charging begins when the confirmation is received by the service provider; inclusion of the new flow in routing tables etc may be delayed until the confirmation has been received, to prevent the user consuming the media without paying. Also, a call may be disconnected if no positive confirmation has been received within a specified time (which should be long enough for a user to reply to a “do you want to pay for this call?” alert).
The ClearDown request may be used as a negative FindRoute confirmation, for instance if the user does not wish to pay for the call or has got a better offer via another path.
A switch may treat a positive response with a reduced capacity or high cost as a backtrack and try another path.
In any circumstances where a switch has forwarded a FindRoute request along more than one path, it propagates the first positive response, and any subsequent response that is a significant improvement on it, towards the caller; all such responses except the last include an “interim offer” indication which contains the switch's (EUI-64) identifier and a serial number. If the last response is not one that it would forward to the caller, it repeats the last response without the “interim offer” indication. A response message received by the caller may contain “interim offer” indications from several switches; one with no “interim offer” indications is final. A negative response is always final.
In the case where more than one response has been sent, a positive confirmation must be forwarded along the correct onward path and ClearDown requests sent on the other paths. The caller must include in the confirmation message all the “interim offer” indications from the response message, so that the correct path can be chosen at each switch.
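The rule for forwarding several responses can be sketched for a single switch as follows. This is a simplified illustration, not the normative behaviour: it assumes all responses are positive, that each is summarised by one cost figure, that the switch knows how many responses to expect, and that a “significant improvement” means a cost below 80% of the best offer so far. The function name and the threshold are assumptions.

```python
def forward_offers(switch_id, costs, improvement=0.8):
    """Return the responses a switch would send towards the caller.

    costs lists the cost of each positive response in arrival order,
    one per onward path on which the request was forwarded.
    """
    sent = []
    best = None
    serial = 0
    for i, cost in enumerate(costs):
        is_last = i == len(costs) - 1
        if best is None or cost < best * improvement:
            best = cost
            serial += 1
            # Every forwarded response except the last carries an interim offer.
            interim = None if is_last else (switch_id, serial)
            sent.append({"cost": cost, "interim": interim})
        elif is_last:
            # The last response is not worth forwarding: repeat the previous
            # offer without the interim indication so the caller knows it is final.
            sent.append({"cost": best, "interim": None})
    return sent


for message in forward_offers("sw1", [10, 9, 5, 6]):
    print(message)
```

With the costs shown, the switch forwards the offers costing 10 and 5 as interim offers, ignores 9 and 6, and finally repeats the offer costing 5 as the final one.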
The request message may include an upper limit on the capacity required and also a preferred and a minimum capacity. For instance, in the case of linear PCM audio, the caller might request 16-bit 96 kHz but be prepared to accept 24-bit if that is what the source is outputting, or 48 kHz if the network is busy. Usually the source will have advertised the formats it supports, so the caller can be reasonably sure what the options are. As the request message propagates through the network, link capacity may be reserved based on the parameters in the message (reserving the maximum at this stage, if available, else adjusting the message to reflect what is available).
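The capacity negotiation can be made concrete with the PCM example. The sketch below assumes capacity is expressed in bit/s of raw sample data for a single channel, ignoring any package or framing overhead, and that each link either reserves as much of the requested maximum as it can or fails if it cannot meet the minimum. The names CapacityRequest, pcm_rate, and reserve are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CapacityRequest:
    maximum: int    # upper limit, bit/s
    preferred: int  # preferred capacity, bit/s
    minimum: int    # lowest acceptable capacity, bit/s


def pcm_rate(bits_per_sample, sample_rate_hz, channels=1):
    """Raw linear PCM payload rate in bit/s (no package overhead)."""
    return bits_per_sample * sample_rate_hz * channels


def reserve(request, available):
    """Reserve capacity on one link and adjust the request to what was available."""
    granted = min(request.maximum, available)
    if granted < request.minimum:
        return None                 # this link cannot carry the call
    request.maximum = granted       # later links need not reserve more than this
    return granted


request = CapacityRequest(maximum=pcm_rate(24, 96_000),    # 2 304 000 bit/s
                          preferred=pcm_rate(16, 96_000),  # 1 536 000 bit/s
                          minimum=pcm_rate(16, 48_000))    #   768 000 bit/s
reserved = [reserve(request, free) for free in (3_000_000, 2_000_000, 5_000_000)]
print(reserved)   # [2304000, 2000000, 2000000]
```

In this example the second link can offer only 2 000 000 bit/s, which is still above the preferred rate, so the request continues with a reduced maximum.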
The identifier for a multicast flow is owned by the source. The FindRoute messages use a “temporary” flow identifier owned by the caller; the response and confirmation messages include both identifiers.
Multicast calls include a specification (by the source) of the action to be taken by a switch that finds it is already carrying the flow requested by the caller. This action may be (a) to connect the caller to the flow without propagating the request on further, (b) to connect the caller and also inform the source that it has done so, or (c) to forward the request to the source. Case (c) will be appropriate for “private” flows where connection of the caller requires approval from the source. Except for case (c), switches must store some of the “route metric” information in case they later want to copy the flow to other callers; the figure for the whole flow starting from the source, and maybe including processing delays upstream of the sending interface, is stored, and the figures for the part of the flow downstream of the branching point are added as the FindRoute response message passes along the path.
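The three actions, and the way stored metric information is reused, can be sketched as below. The sketch assumes the only route metric of interest is delay in microseconds; the enumeration, the function names, and the returned step descriptions are all hypothetical and serve only to show the control flow.

```python
from enum import Enum


class ExistingFlowAction(Enum):
    CONNECT = "a"             # connect the caller, do not propagate the request
    CONNECT_AND_INFORM = "b"  # connect the caller and inform the source
    FORWARD_TO_SOURCE = "c"   # the source must approve the caller


def on_request_for_carried_flow(action, stored_upstream_delay_us):
    """Decide what a switch does when it already carries the requested flow.

    Returns (steps, starting metric for the FindRoute response); the metric
    is None when the request is forwarded and the source will supply it.
    """
    if action is ExistingFlowAction.FORWARD_TO_SOURCE:
        return ["forward request to source"], None
    steps = ["connect caller to flow"]
    if action is ExistingFlowAction.CONNECT_AND_INFORM:
        steps.append("inform source")
    # The stored figure covers the flow from the source to this branching point.
    return steps, stored_upstream_delay_us


def add_downstream(metric_us, downstream_hop_delays_us):
    """Add the figures for the part of the flow downstream of the branching point."""
    return metric_us + sum(downstream_hop_delays_us)


steps, metric = on_request_for_carried_flow(ExistingFlowAction.CONNECT_AND_INFORM, 1200)
print(steps, add_downstream(metric, [150, 200]))
```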
The messages used to implement the protocol are structured in a similar way to those in ITU-T Q.2931, with a fixed-format header followed by information elements in tag-length-value (TLV) form. As with a number of protocols and file formats that use TLV (including the ASN.1 coding used by SNMP), information elements may be nested within other information elements.
The TLV format makes it easy for the software in end equipment and switches to parse the messages, extracting the information they need and ignoring information which they do not need. The recipient of a text format, such as that used in SIP, must scan the whole message to find the newline characters, remove white space, and recognise keywords whether they are in upper case, lower case, or a combination of the two. This adds unnecessary complexity which may not be particularly onerous for PCs etc but can be more so for the embedded code in interface units.
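The ease of parsing nested TLV elements can be seen in a short sketch. The element layout used here is an assumption made for illustration and is not the coding defined by the standard: a 1-octet tag, a 2-octet big-endian length, and a convention that a tag with its top bit set contains further elements.

```python
import struct


def parse_tlv(data: bytes):
    """Parse a sequence of TLV elements into a list of (tag, value) pairs.

    value is bytes for a simple element or a nested list for a container.
    Assumed layout: 1-octet tag, 2-octet big-endian length, then the value.
    """
    elements = []
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated TLV header")
        tag, length = struct.unpack_from(">BH", data, offset)
        offset += 3
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        value = data[offset:offset + length]
        offset += length
        if tag & 0x80:                       # assumed marker for a container element
            elements.append((tag, parse_tlv(value)))
        else:
            elements.append((tag, value))
    return elements


inner = bytes([0x01, 0x00, 0x02]) + b"\x12\x34"
message = bytes([0x81, 0x00, len(inner)]) + inner + bytes([0x02, 0x00, 0x01]) + b"\xff"
print(parse_tlv(message))
```

Because every element carries its own length, a recipient can step over an element whose tag it does not recognise without examining the value.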
The coding also provides various extension mechanisms that allow manufacturer-specific and application-specific information to be carried transparently and identified unambiguously.
To minimise the amount of transcoding required when transmitting audio and video over heterogeneous networks, and to increase the likelihood that equipment designed for different applications will interoperate successfully, the standard defines formats that can be used with any link technology.
The format for pulse-code modulated (PCM) audio in the current draft is based on that specified in IEC 62365, which is in turn based on that specified in IEC 60958-4. Whereas IEC 62365 uses the fixed 48-octet cell size of ATM networks, the specification in IEC 62379 allows for variable package sizes. It supports networks in which it is efficient to send packages containing one sample per channel (to minimise latency) which, if the number of channels is small, as in mono or stereo, will be smaller than ATM cells. It also supports those with large per-package overheads (such as RTP/UDP/IP) for which larger packages should be sent.