RFC 4695 - RTP Payload Format for MIDI



Network Working Group                                         J. Lazzaro
Request for Comments: 4695                                  J. Wawrzynek
Category: Standards Track                                    UC Berkeley
                                                           November 2006

                      RTP Payload Format for MIDI

Status of This Memo

   This document specifies an Internet standards track protocol for the
   Internet community, and requests discussion and suggestions for
   improvements.  Please refer to the current edition of the "Internet
   Official Protocol Standards" (STD 1) for the standardization state
   and status of this protocol.  Distribution of this memo is unlimited.

Copyright Notice

   Copyright (C) The IETF Trust (2006).

Abstract

   This memo describes a Real-time Transport Protocol (RTP) payload
   format for the MIDI (Musical Instrument Digital Interface) command
   language.  The format encodes all commands that may legally appear on
   a MIDI 1.0 DIN cable.  The format is suitable for interactive
   applications (such as network musical performance) and content-
   delivery applications (such as file streaming).  The format may be
   used over unicast and multicast UDP and TCP, and it defines tools for
   graceful recovery from packet loss.  Stream behavior, including the
   MIDI rendering method, may be customized during session setup.  The
   format also serves as a mode for the mpeg4-generic format, to support
   the MPEG 4 Audio Object Types for General MIDI, Downloadable Sounds
   Level 2, and Structured Audio.

Table of Contents

   1. Introduction
      1.1. Terminology
      1.2. Bitfield Conventions
   2. Packet Format
      2.1. RTP Header
      2.2. MIDI Payload
   3. MIDI Command Section
      3.1. Timestamps
      3.2. Command Coding
   4. The Recovery Journal System
   5. Recovery Journal Format
   6. Session Description Protocol
      6.1. Session Descriptions for Native Streams
      6.2. Session Descriptions for mpeg4-generic Streams
      6.3. Parameters
   7. Extensibility
   8. Congestion Control
   9. Security Considerations
   10. Acknowledgements
   11. IANA Considerations
      11.1. rtp-midi Media Type Registration
           11.1.1. Repository Request for "audio/rtp-midi"
      11.2. mpeg4-generic Media Type Registration
           11.2.1. Repository Request for Mode rtp-midi for
                   mpeg4-generic
      11.3. asc Media Type Registration
   A. The Recovery Journal Channel Chapters
      A.1. Recovery Journal Definitions
      A.2. Chapter P: MIDI Program Change
      A.3. Chapter C: MIDI Control Change
           A.3.1. Log Inclusion Rules
           A.3.2. Controller Log Format
           A.3.3. Log List Coding Rules
           A.3.4. The Parameter System
      A.4. Chapter M: MIDI Parameter System
           A.4.1. Log Inclusion Rules
           A.4.2. Log Coding Rules
                 A.4.2.1. The Value Tool
                 A.4.2.2. The Count Tool
      A.5. Chapter W: MIDI Pitch Wheel
      A.6. Chapter N: MIDI NoteOff and NoteOn
           A.6.1. Header Structure
           A.6.2. Note Structures
      A.7. Chapter E: MIDI Note Command Extras
           A.7.1. Note Log Format
           A.7.2. Log Inclusion Rules
      A.8. Chapter T: MIDI Channel Aftertouch
      A.9. Chapter A: MIDI Poly Aftertouch
   B. The Recovery Journal System Chapters
      B.1. System Chapter D: Simple System Commands
           B.1.1. Undefined System Commands
      B.2. System Chapter V: Active Sense Command
      B.3. System Chapter Q: Sequencer State Commands
           B.3.1. Non-compliant Sequencers
      B.4. System Chapter F: MIDI Time Code Tape Position
           B.4.1. Partial Frames
      B.5. System Chapter X: System Exclusive
           B.5.1. Chapter Format
           B.5.2. Log Inclusion Semantics
           B.5.3. TCOUNT and COUNT Fields
   C. Session Configuration Tools
      C.1. Configuration Tools: Stream Subsetting
      C.2. Configuration Tools: The Journalling System
           C.2.1. The j_sec Parameter
           C.2.2. The j_update Parameter
                 C.2.2.1. The anchor Sending Policy
                 C.2.2.2. The closed-loop Sending Policy
                 C.2.2.3. The open-loop Sending Policy
           C.2.3. Recovery Journal Chapter Inclusion Parameters
      C.3. Configuration Tools: Timestamp Semantics
           C.3.1. The comex Algorithm
           C.3.2. The async Algorithm
           C.3.3. The buffer Algorithm
      C.4. Configuration Tools: Packet Timing Tools
           C.4.1. Packet Duration Tools
           C.4.2. The guardtime Parameter
      C.5. Configuration Tools: Stream Description
      C.6. Configuration Tools: MIDI Rendering
           C.6.1. The multimode Parameter
           C.6.2. Renderer Specification
           C.6.3. Renderer Initialization
           C.6.4. MIDI Channel Mapping
                 C.6.4.1. The smf_info Parameter
                 C.6.4.2. The smf_inline, smf_url, and smf_cid
                          Parameters
                 C.6.4.3. The chanmask Parameter
           C.6.5. The audio/asc Media Type
      C.7. Interoperability
           C.7.1. MIDI Content Streaming Applications
           C.7.2. MIDI Network Musical Performance Applications
   D. Parameter Syntax Definitions
   E. A MIDI Overview for Networking Specialists
      E.1. Commands Types
      E.2. Running Status
      E.3. Command Timing
      E.4. AudioSpecificConfig Templates for MMA Renderers
   References
   Normative References
   Informative References

1.  Introduction

   The Internet Engineering Task Force (IETF) has developed a set of
   focused tools for multimedia networking ([RFC3550] [RFC4566]
   [RFC3261] [RFC2326]).  These tools can be combined in different ways
   to support a variety of real-time applications over Internet Protocol
   (IP) networks.

   For example, a telephony application might use the Session Initiation
   Protocol (SIP, [RFC3261]) to set up a phone call.  Call setup would
   include negotiations to agree on a common audio codec [RFC3264].
   Negotiations would use the Session Description Protocol (SDP,
   [RFC4566]) to describe candidate codecs.

   After a call is set up, audio data would flow between the parties
   using the Real-time Transport Protocol (RTP, [RFC3550]) under any
   applicable profile (for example, the Audio/Visual Profile (AVP,
   [RFC3551])).  The tools used in this telephony example (SIP, SDP,
   RTP) might be combined in a different way to support a content
   streaming application, perhaps in conjunction with other tools, such
   as the Real Time Streaming Protocol (RTSP, [RFC2326]).

   The MIDI (Musical Instrument Digital Interface) command language
   [MIDI] is widely used in musical applications that are analogous to
   the examples described above.  On stage and in the recording studio,
   MIDI is used for the interactive remote control of musical
   instruments, an application similar in spirit to telephony.  On web
   pages, Standard MIDI Files (SMFs, [MIDI]) rendered using the General
   MIDI standard [MIDI] provide a low-bandwidth substitute for audio
   streaming.

   This memo is motivated by a simple premise: if MIDI performances
   could be sent as RTP streams that are managed by IETF session tools,
   a hybridization of the MIDI and IETF application domains may occur.

   For example, interoperable MIDI networking may foster network music
   performance applications, in which a group of musicians, located at
   different physical locations, interact over a network to perform as
   they would if they were located in the same room [NMP].  As a second
   example, the streaming community may begin to use MIDI for low-
   bitrate audio coding, perhaps in conjunction with normative sound
   synthesis methods [MPEGSA].

   To enable MIDI applications to use RTP, this memo defines an RTP
   payload format and its media type.  Sections 2-5 and Appendices A-B
   define the RTP payload format.  Section 6 and Appendices C-D define
   the media types identifying the payload format, the parameters needed
   for configuration, and how the parameters are utilized in SDP.

   Appendix C also includes interoperability guidelines for the example
   applications described above: network musical performance using SIP
   (Appendix C.7.2) and content-streaming using RTSP (Appendix C.7.1).

   Another potential application area for RTP MIDI is MIDI networking
   for professional audio equipment and electronic musical instruments.
   We do not offer interoperability guidelines for this application in
   this memo.  However, RTP MIDI has been designed with stage and studio
   applications in mind, and we expect that efforts to define a stage
   and studio framework will rely on RTP MIDI for MIDI transport
   services.

   Some applications may require MIDI media delivery at a certain
   service quality level (latency, jitter, packet loss, etc.).  RTP
   itself does not provide service guarantees.  However, applications
   may use lower-layer network protocols to configure the quality of the
   transport services that RTP uses.  These protocols may act to reserve
   network resources for RTP flows [RFC2205] or may simply direct RTP
   traffic onto a dedicated "media network" in a local installation.
   Note that RTP and the MIDI payload format do provide tools that
   applications may use to achieve the best possible real-time
   performance at a given service level.

   This memo normatively defines the syntax and semantics of the MIDI
   payload format.  However, this memo does not define algorithms for
   sending and receiving packets.  An ancillary document [RFC4696]
   provides informative guidance on algorithms.  Supplemental
   information may be found in related conference publications [NMP]
   [GRAME].

   Throughout this memo, the phrase "native stream" refers to a stream
   that uses the rtp-midi media type.  The phrase "mpeg4-generic stream"
   refers to a stream that uses the mpeg4-generic media type (in mode
   rtp-midi) to operate in an MPEG 4 environment [RFC3640].  Section 6
   describes this distinction in detail.

1.1.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119
   [RFC2119].

1.2.  Bitfield Conventions

   In this document, the packet bitfields that share a common name often
   have identical semantics.  As most of these bitfields appear in
   Appendices A-B, we define the common bitfield names in Appendix A.1.

   However, a few of these common names also appear in the main text of
   this document.  For convenience, we list these definitions below:

     o R flag bit.  R flag bits are reserved for future use.  Senders
       MUST set R bits to 0.  Receivers MUST ignore R bit values.

     o LENGTH field.  Each field named LENGTH (as distinct from LEN)
       codes the number of octets in the structure that contains it,
       including the header it resides in and all hierarchical levels
       below it.  If a structure contains a LENGTH field, a receiver
       MUST use the LENGTH field value to advance past the structure
       during parsing, rather than use knowledge about the internal
       format of the structure.

2.  Packet Format

   In this section, we introduce the format of RTP MIDI packets.  The
   description includes some background information on RTP, for the
   benefit of MIDI implementors new to IETF tools.  Implementors should
   consult [RFC3550] for an authoritative description of RTP.

   This memo assumes that the reader is familiar with MIDI syntax and
   semantics.  Appendix E provides a MIDI overview, at a level of detail
   sufficient to understand most of this memo.  Implementors should
   consult [MIDI] for an authoritative description of MIDI.

   The MIDI payload format maps a MIDI command stream (16 voice channels
   + systems) onto an RTP stream.  An RTP media stream is a sequence of
   logical packets that share a common format.  Each packet consists of
   two parts: the RTP header and the MIDI payload.  Figure 1 shows this
   format (vertical space delineates the header and payload).

   We describe RTP packets as "logical" packets to highlight the fact
   that RTP itself is not a network-layer protocol.  Instead, RTP
   packets are mapped onto network protocols (such as unicast UDP,
   multicast UDP, or TCP) by an application [ALF].  The interleaved mode
   of the Real Time Streaming Protocol (RTSP, [RFC2326]) is an example
   of an RTP mapping to TCP transport, as is [RFC4571].

2.1.  RTP Header

   [RFC3550] provides a complete description of the RTP header fields.
   In this section, we clarify the role of a few RTP header fields for
   MIDI applications.  All fields are coded in network byte order (big-
   endian).

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | V |P|X|  CC   |M|     PT      |        Sequence number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           Timestamp                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             SSRC                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                     MIDI command section ...                  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                       Journal section ...                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 1 -- Packet format

   The behavior of the 1-bit M field depends on the media type of the
   stream.  For native streams, the M bit MUST be set to 1 if the MIDI
   command section has a non-zero LEN field, and MUST be set to 0
   otherwise.  For mpeg4-generic streams, the M bit MUST be set to 1 for
   all packets (to conform to [RFC3640]).
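
   As an illustration, the fixed header of Figure 1 and the M bit rule
   above might be sketched as follows.  This is a minimal sketch, not a
   normative encoder: the function names are hypothetical, the version
   value of 2 is taken from [RFC3550], and the P, X, and CC fields are
   assumed to be zero (no padding, no extension, no CSRC list).

```python
import struct

RTP_VERSION = 2  # RTP version number, per RFC 3550


def marker_bit(midi_list_len, mpeg4_generic=False):
    """Return the M bit for a packet.

    Native streams set M to 1 only when LEN is non-zero;
    mpeg4-generic streams set M to 1 for every packet.
    """
    return 1 if (mpeg4_generic or midi_list_len > 0) else 0


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker):
    """Pack the 12-octet fixed RTP header in network byte order.

    Assumes P = 0, X = 0, and CC = 0 (an assumption of this sketch).
    """
    octet0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    octet1 = ((marker & 1) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", octet0, octet1,
                       seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)
```

   For a native stream, a sender would pass marker_bit(LEN) as the
   marker argument; the "!" format prefix selects the big-endian coding
   required by this section.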

   In an RTP MIDI stream, the 16-bit sequence number field is
   initialized to a randomly chosen value and is incremented by one
   (modulo 2^16) for each packet sent in the stream.  A related
   quantity, the 32-bit extended packet sequence number, may be computed
   by tracking rollovers of the 16-bit sequence number.  Note that
   different receivers of the same stream may compute different extended
   packet sequence numbers, depending on when the receiver joined the
   session.
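
   One way a receiver might track rollovers is sketched below.  The
   class name and the half-range heuristic for telling a rollover from
   a late packet are assumptions of this sketch, not requirements of
   this memo; [RFC3550] describes a more complete algorithm.  The
   rollover count starts at zero when the receiver joins, which is why
   receivers that join at different times may compute different values.

```python
class ExtendedSeqTracker:
    """Track 16-bit RTP sequence numbers and derive 32-bit extended ones."""

    def __init__(self):
        self.cycles = 0       # rollover count, already shifted left by 16
        self.max_seq = None   # highest sequence number seen (modulo 2^16)

    def update(self, seq):
        """Return the extended sequence number for a received packet."""
        if self.max_seq is None:
            # First packet seen by this receiver.
            self.max_seq = seq
            return seq
        delta = (seq - self.max_seq) & 0xFFFF
        if delta < 0x8000:
            # Packet is at or ahead of max_seq; a smaller raw value
            # means the 16-bit counter wrapped.
            if seq < self.max_seq:
                self.cycles += 1 << 16
            self.max_seq = seq
            return (self.cycles + seq) & 0xFFFFFFFF
        # Packet is behind max_seq (late or reordered).  A larger raw
        # value means it belongs to the previous cycle.
        if seq > self.max_seq:
            return (self.cycles - (1 << 16) + seq) & 0xFFFFFFFF
        return (self.cycles + seq) & 0xFFFFFFFF
```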

   The 32-bit timestamp field sets the base timestamp value for the
   packet.  The payload codes MIDI command timing relative to this
   value.  The timestamp units are set by the clock rate parameter.  For
   example, if the clock rate has a value of 44100 Hz, two packets whose
   base timestamp values differ by 2 seconds have RTP timestamp fields
   that differ by 88200.
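
   The arithmetic in this example can be sketched as follows.  The
   helper names are hypothetical; the only assumption beyond the text
   is that timestamp arithmetic wraps modulo 2^32, as befits a 32-bit
   field.

```python
def seconds_to_ticks(seconds, clock_rate_hz):
    """Convert a duration in seconds to RTP timestamp units."""
    return round(seconds * clock_rate_hz)


def timestamp_after(base_timestamp, seconds, clock_rate_hz):
    """Return the RTP timestamp that lies `seconds` after base_timestamp."""
    ticks = seconds_to_ticks(seconds, clock_rate_hz)
    return (base_timestamp + ticks) & 0xFFFFFFFF


def timestamp_diff(later, earlier):
    """Return (later - earlier) in timestamp units, modulo 2^32."""
    return (later - earlier) & 0xFFFFFFFF
```

   With a clock rate of 44100 Hz, seconds_to_ticks(2, 44100) yields
   88200, matching the example above, and the difference survives a
   wrap of the 32-bit timestamp field.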

   Note that the clock rate parameter is not encoded within each RTP
   MIDI packet.  A receiver of an RTP MIDI stream becomes aware of the
   clock rate as part of the session setup process.  For example, if a
   session management tool uses the Session Description Protocol (SDP,
   [RFC4566]) to describe a media session, the clock rate parameter is
   set using the rtpmap attribute.  We show examples of session setup in
   Section 6.
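
   As a rough sketch of how a receiver might learn the clock rate, the
   function below reads the payload type, encoding name, and clock rate
   from an rtpmap attribute line.  The function name and the sample
   line in the usage note are illustrative assumptions; the attribute
   layout (payload type, a space, then encoding name and clock rate
   separated by "/") follows [RFC4566].

```python
def parse_rtpmap(line):
    """Parse 'a=rtpmap:<pt> <encoding>/<clock rate>[/<params>]'.

    Returns a (payload_type, encoding_name, clock_rate) tuple.
    """
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    pt_text, _, encoding = line[len(prefix):].strip().partition(" ")
    parts = encoding.strip().split("/")
    if len(parts) < 2:
        raise ValueError("rtpmap attribute lacks a clock rate")
    return int(pt_text), parts[0], int(parts[1])
```

   For instance, parse_rtpmap("a=rtpmap:96 rtp-midi/44100") returns
   (96, "rtp-midi", 44100).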

   For RTP MIDI streams destined to be rendered into audio, the clock
   rate SHOULD be an audio sample rate of 32 kHz or higher.  This
   recommendation is due to the sensitivity of human musical perception
   to small timing errors in musical note sequences, and due to the
   timbral changes that occur when two near-simultaneous MIDI NoteOns
   are rendered with a different timing than that desired by the content
   author due to clock rate quantization.  RTP MIDI streams that are not
   destined for audio rendering (such as MIDI streams that control stage
   lighting) MAY use a lower clock rate but SHOULD use a clock rate high
   enough to avoid timing artifacts in the application.

   For RTP MIDI streams destined to be rendered into audio, the clock
   rate SHOULD be chosen from rates in common use in professional audio
   applications or in consumer audio distribution.  At the time of this
   writing, these rates include 32 kHz, 44.1 kHz, 48 kHz, 64 kHz, 88.2
   kHz, 96 kHz, 176.4 kHz, and 192 kHz.  If the RTP MIDI session is a
   part of a synchronized media session that includes another (non-MIDI)
   RTP audio stream with a clock rate of 32 kHz or higher, the RTP MIDI
   stream SHOULD use a clock rate that matches the clock rate of the
   other audio stream.  However, if the RTP MIDI stream is destined to
   be rendered into audio, the RTP MIDI stream SHOULD NOT use a clock
   rate lower than 32 kHz, even if this second stream has a clock rate
   less than 32 kHz.

   Timestamps of consecutive packets do not necessarily increment at a
   fixed rate, because RTP MIDI packets are not necessarily sent at a
   fixed rate.  The degree of packet transmission regularity reflects
   the underlying application dynamics.  Interactive applications may
   vary the packet sending rate to track the gestural rate of a human
   performer, whereas content-streaming applications may send packets at
   a fixed rate.

   Therefore, the timestamps for two sequential RTP packets may be
   identical, or the second packet may have a timestamp arbitrarily
   larger than the first packet (modulo 2^32).  Section 3 places
   additional restrictions on the RTP timestamps for two sequential RTP
   packets, as does the guardtime parameter (Appendix C.4.2).

   We use the term "media time" to denote the temporal duration of the
   media coded by an RTP packet.  The media time coded by a packet is
   computed by subtracting the last command timestamp in the MIDI
   command section from the RTP timestamp (modulo 2^32).  If the MIDI
   list of the MIDI command section of a packet is empty, the media time
   coded by the packet is 0 ms.  Appendix C.4.1 discusses media time
   issues in detail.

   We now define RTP session semantics, in the context of sessions
   specified using the session description protocol [RFC4566].  A
   session description media line ("m=") specifies an RTP session.  An
   RTP session has an independent space of 2^32 synchronization sources.
   Synchronization source identifiers are coded in the SSRC header field
   of RTP session packets.  The payload types that may appear in the PT
   header field of RTP session packets are listed at the end of the
   media line.

   Several RTP MIDI streams may appear in an RTP session.  Each stream
   is distinguished by a unique SSRC value and has a unique sequence
   number and RTP timestamp space.  Multiple streams in the RTP session
   may be sent by a single party.  Multiple parties may send streams in
   the RTP session.  An RTP MIDI stream encodes data for a single MIDI
   command name space (16 voice channels + Systems).

   Streams in an RTP session may use different payload types, or they
   may use the same payload type.  However, each party may send, at
   most, one RTP MIDI stream for each payload type mapped to an RTP MIDI
   payload format in an RTP session.  Recall that dynamic binding of
   payload type numbers in [RFC4566] lets a party map many payload type
   numbers to the RTP MIDI payload format; thus a party may send many
   RTP MIDI streams in a single RTP session.  Pairs of streams (unicast
   or multicast) that communicate between two parties in an RTP session
   and that share a payload type have the same association as a MIDI
   cable pair that cross-connects two devices in a MIDI 1.0 DIN network.

   The RTP session architecture described above is efficient in its use
   of network ports, as one RTP session (using a port pair per party)
   supports the transport of many MIDI name spaces (16 MIDI channels +
   systems).  We define tools for grouping and labelling MIDI name
   spaces across streams and sessions in Appendix C.5 of this memo.

   The RTP header timestamps for each stream in an RTP session have
   separately and randomly chosen initialization values.  Receivers use
   the timing fields encoded in the RTP control protocol (RTCP,
   [RFC3550]) sender reports to synchronize the streams sent by a party.
   The SSRC values for each stream in an RTP session are also separately
   and randomly chosen, as described in [RFC3550].  Receivers use the
   CNAME field encoded in RTCP sender reports to verify that streams
   were sent by the same party, and to detect SSRC collisions, as
   described in [RFC3550].

   In some applications, a receiver renders MIDI commands into audio (or
   into control actions, such as the rewind of a tape deck or the
   dimming of stage lights).  In other applications, a receiver presents
   a MIDI stream to software programs via an Application Programmer
   Interface (API).  Appendix C.6 defines session configuration tools to
   specify what receivers should do with a MIDI command stream.

   If a multimedia session uses different RTP MIDI streams to send
   different classes of media, the streams MUST be sent over different
   RTP sessions.  For example, if a multimedia session uses one MIDI
   stream for audio and a second MIDI stream to control a lighting
   system, the audio and lighting streams MUST be sent over different
   RTP sessions, each with its own media line.

   Session description tools defined in Appendix C.5 let a sending party
   split a single MIDI name space (16 voice channels + systems) over
   several RTP MIDI streams.  Split transport of a MIDI command stream
   is a delicate task, because correct command stream reconstruction by
   a receiver depends on exact timing synchronization across the
   streams.

   To support split name spaces, we define the following requirements:

     o  A party MUST NOT send several RTP MIDI streams that share a MIDI
        name space in the same RTP session.  Instead, each stream MUST
        be sent from a different RTP session.

     o  If several RTP MIDI streams sent by a party share a MIDI name
        space, all streams MUST use the same SSRC value and MUST use the
        same randomly chosen RTP timestamp initialization value.

   These rules let a receiver identify streams that share a MIDI name
   space (by matching SSRC values) and also let a receiver accurately
   reconstruct the source MIDI command stream (by using RTP timestamps
   to interleave commands from the two streams).  Care MUST be taken by
   senders to ensure that SSRC changes due to collisions are reflected
   in both streams.  Receivers MUST regularly examine the RTCP CNAME
   fields associated with the linked streams, to ensure that the assumed
   link is legitimate and not the result of an SSRC collision by another
   sender.

   Except for the special cases described above, a party may send many
   RTP MIDI streams in the same session.  However, it is sometimes
   advantageous for two RTP MIDI streams to be sent over different RTP
   sessions.  For example, two streams may need different values for RTP
   session-level attributes (such as the sendonly and recvonly
   attributes).  As a second example, two RTP sessions may be needed to
   send two unicast streams in a multimedia session that originate on
   different computers (with different IP numbers).  Two RTP sessions
   are needed in this case because transport addresses are specified on
   the RTP-session or multimedia-session level, not on a payload type
   level.

   On a final note, in some uses of MIDI, parties send bidirectional
   traffic to conduct transactions (such as file exchange).  These
   commands were designed to work over MIDI 1.0 DIN cable networks,
   which may be configured in a multicast topology that uses pure
   "party-line" signalling.  Thus, if a multimedia session ensures a
   multicast connection between all parties, bidirectional MIDI
   commands will work without additional support from the RTP MIDI
   payload format.

2.2.  MIDI Payload

   The payload (Figure 1) MUST begin with the MIDI command section.  The
   MIDI command section codes a (possibly empty) list of timestamped
   MIDI commands, and provides the essential service of the payload
   format.

   The payload MAY also contain a journal section.  The journal section
   provides resiliency by coding the recent history of the stream.  A
   flag in the MIDI command section codes the presence of a journal
   section in the payload.

   Section 3 defines the MIDI command section.  Sections 4-5 and
   Appendices A-B define the recovery journal, the default format for
   the journal section.  Here, we describe how these payload sections
   operate in a stream in an RTP session.

   The journalling method for a stream is set at the start of a session
   and MUST NOT be changed thereafter.  A stream may be set to use the
   recovery journal, to use an alternative journal format (none are
   defined in this memo), or not to use a journal.

   The default journalling method of a stream is inferred from its
   transport type.  Streams that use unreliable transport (such as UDP)
   default to using the recovery journal.  Streams that use reliable
   transport (such as TCP) default to not using a journal.  Appendix
   C.2.1 defines session configuration tools for overriding these
   defaults.  For all types of transport, a sender MUST transmit an RTP
   packet stream with consecutive sequence numbers (modulo 2^16).

   If a stream uses the recovery journal, every payload in the stream
   MUST include a journal section.  If a stream does not use
   journalling, a journal section MUST NOT appear in a stream payload.
   If a stream uses an alternative journal format, the specification for
   the journal format defines an inclusion policy.
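
   The default selection and the inclusion rules above might be
   sketched as follows.  The constant and function names are
   hypothetical, and the sketch covers only the defaults; it does not
   model the session configuration tools of Appendix C.2.1.

```python
RECOVERY_JOURNAL = "recovery-journal"   # hypothetical labels for this sketch
NO_JOURNAL = "none"
ALTERNATIVE = "alternative"


def default_journalling(reliable_transport):
    """Infer the default journalling method from the transport type.

    Reliable transport (such as TCP) defaults to no journal;
    unreliable transport (such as UDP) defaults to the recovery journal.
    """
    return NO_JOURNAL if reliable_transport else RECOVERY_JOURNAL


def journal_presence_ok(method, has_journal_section):
    """Check one payload against the inclusion rule for its stream."""
    if method == RECOVERY_JOURNAL:
        return has_journal_section        # every payload MUST carry one
    if method == NO_JOURNAL:
        return not has_journal_section    # a journal MUST NOT appear
    # Alternative formats define their own inclusion policy.
    return True
```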

   If a stream is sent over UDP transport, the Maximum Transmission Unit
   (MTU) of the underlying network limits the practical size of the
   payload section (for example, an Ethernet MTU is 1500 octets), for
   applications where predictable and minimal packet transmission
   latency is critical.  A sender SHOULD NOT create RTP MIDI UDP packets
   whose size exceeds the MTU of the underlying network.  Instead, the
   sender SHOULD take steps to keep the maximum packet size under the
   MTU limit.
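
   A sender-side size check might look like the sketch below.  The
   overhead figures are assumptions of this sketch: a 20-octet IPv4
   header without options, an 8-octet UDP header, and the 12-octet
   fixed RTP header of Figure 1 with no CSRC list or header extension.
   Other network configurations would need different values.

```python
IPV4_HEADER = 20   # assumed: IPv4 without options
UDP_HEADER = 8
RTP_HEADER = 12    # fixed RTP header, no CSRCs, no extension


def max_payload_size(mtu=1500):
    """Return the largest MIDI payload that fits in one packet."""
    return mtu - IPV4_HEADER - UDP_HEADER - RTP_HEADER


def fits_in_mtu(payload_len, mtu=1500):
    """Return True if a payload of payload_len octets stays within the MTU."""
    return payload_len <= max_payload_size(mtu)
```

   Under these assumptions, an Ethernet MTU of 1500 octets leaves 1460
   octets for the MIDI command section and journal section combined.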

   These steps may take many forms.  The default closed-loop recovery
   journal sending policy (defined in Appendix C.2.2.2) uses RTP control
   protocol (RTCP, [RFC3550]) feedback to manage the RTP MIDI packet
   size.  In addition, Section 3.2 and Appendix B.5.2 provide specific
   tools for managing the size of packets that code MIDI System
   Exclusive (0xF0) commands.  Appendix C.5 defines session
   configuration tools that may be used to split a dense MIDI name space
   into several UDP streams (each sent in a different RTP session, per
   Section 2.1) so that the payload fits comfortably into an MTU.
   Another option is to use TCP.  Section 4.3 of [RFC4696] provides
   non-normative advice for packet size management.

3.  MIDI Command Section

   Figure 2 shows the format of the MIDI command section.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |B|J|Z|P|LEN... |  MIDI list ...                                |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 2 -- MIDI command section

   The MIDI command section begins with a variable-length header.

   The header field LEN codes the number of octets in the MIDI list that
   follow the header.  If the header flag B is 0, the header is one
   octet long, and LEN is a 4-bit field, supporting a maximum MIDI list
   length of 15 octets.

   If B is 1, the header is two octets long, and LEN is a 12-bit field,
   supporting a maximum MIDI list length of 4095 octets.  LEN is coded
   in network byte order (big-endian): the 4 bits of LEN that appear in
   the first header octet code the most significant 4 bits of the 12-bit
   LEN value.

   A LEN value of 0 is legal, and it codes an empty MIDI list.

   If the J header bit is set to 1, a journal section MUST appear after
   the MIDI command section in the payload.  If the J header bit is set
   to 0, the payload MUST NOT contain a journal section.

   We define the semantics of the P header bit in Section 3.2.
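
   As a non-normative illustration, the header layout in Figure 2 can be
   parsed as sketched below.  The function name and the keys of the
   returned dictionary are assumptions made for this example; they are
   not defined by this memo.

```python
def parse_command_section_header(payload: bytes, offset: int = 0) -> dict:
    """Parse the B, J, Z, P flags and the LEN field shown in Figure 2."""
    if offset >= len(payload):
        raise ValueError("payload too short for MIDI command section header")
    first = payload[offset]
    b = (first >> 7) & 1  # bit 0 of the diagram is the most significant bit
    j = (first >> 6) & 1
    z = (first >> 5) & 1
    p = (first >> 4) & 1
    length = first & 0x0F  # 4-bit LEN when B = 0
    header_size = 1
    if b:
        if offset + 1 >= len(payload):
            raise ValueError("payload too short for two-octet header")
        # B = 1: the low nibble of the first octet holds the most
        # significant 4 bits of the 12-bit LEN value (big-endian).
        length = (length << 8) | payload[offset + 1]
        header_size = 2
    return {"B": b, "J": j, "Z": z, "P": p,
            "LEN": length, "header_size": header_size}
```

   The MIDI list, if any, occupies the LEN octets that start at
   offset + header_size.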

   If the LEN header field is nonzero, the MIDI list has the structure
   shown in Figure 3.

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time 0     (1-4 octets long, or 0 octets if Z = 0)     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command 0   (1 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time 1     (1-4 octets long)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command 1   (1 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                              ...                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  Delta Time N     (1-4 octets long)                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  MIDI Command N   (0 or more octets long)                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure 3 -- MIDI list structure

   If the header flag Z is 1, the MIDI list begins with a complete MIDI
   command (coded in the MIDI Command 0 field, in Figure 3) preceded by
   a delta time (coded in the Delta Time 0 field).  If Z is 0, the Delta
   Time 0 field is not present in the MIDI list, and the command coded
   in the MIDI Command 0 field has an implicit delta time of 0.

   The MIDI list structure may also optionally encode a list of N
   additional complete MIDI commands, each coded in a MIDI Command K
   field.  Each additional command MUST be preceded by a Delta Time K
   field, which codes the command's delta time.  We discuss exceptions
   to the "command fields code complete MIDI commands" rule in Section
   3.2.

   The final MIDI command field (i.e., the MIDI Command N field, shown
   in Figure 3) in the MIDI list MAY be empty.  Moreover, a MIDI list
   MAY consist of a single delta time (encoded in the Delta Time 0
   field) without an associated command (which would have been encoded
   in the MIDI Command 0 field).  These rules enable MIDI coding
   features that are explained in Section 3.1.  We delay the
   explanations because an understanding of RTP MIDI timestamps is
   necessary to describe the features.

3.1.  Timestamps

   In this section, we describe how RTP MIDI encodes a timestamp for
   each MIDI list command.  Command timestamps have the same units as
   RTP packet header timestamps (described in Section 2.1 and
   [RFC3550]).  Recall that RTP timestamps have units of seconds, whose
   scaling is set during session configuration (see Section 6.1 and
   [RFC4566]).

   As shown in Figure 3, the MIDI list encodes time using a compact
   delta-time format.  The RTP MIDI delta time syntax is a modified form
   of the MIDI File delta time syntax [MIDI].  RTP MIDI delta times use
   1-4 octet fields to encode 32-bit unsigned integers.  Figure 4 shows
   the encoded and decoded forms of delta times.  Note that delta time
   values may be legally encoded in multiple formats; for example, there
   are four legal ways to encode the zero delta time (0x00, 0x8000,
   0x808000, 0x80808000).

   RTP MIDI uses delta times to encode a timestamp for each MIDI
   command.  The timestamp for MIDI Command K is the summation (modulo
   2^32) of the RTP timestamp and decoded delta times 0 through K.  This
   cumulative coding technique, borrowed from MIDI File delta time
   coding, is efficient because it reduces the number of multi-octet
   delta times.
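
   The cumulative summation described above can be sketched as follows.
   This is a non-normative illustration; the function name is an
   assumption, and the delta times are taken to be already decoded
   integers.

```python
def command_timestamps(rtp_timestamp: int, delta_times: list) -> list:
    """Return the timestamp of each MIDI command in a MIDI list.

    The timestamp of command K is the RTP timestamp plus decoded delta
    times 0 through K, reduced modulo 2^32.
    """
    timestamps = []
    current = rtp_timestamp & 0xFFFFFFFF
    for delta in delta_times:
        current = (current + delta) & 0xFFFFFFFF  # wrap at 2^32
        timestamps.append(current)
    return timestamps
```

   For example, an RTP timestamp of 1000 and delta times (0, 10, 0, 5)
   yield the command timestamps 1000, 1010, 1010, and 1015.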

   All command timestamps in a packet MUST be less than or equal to the
   RTP timestamp of the next packet in the stream (modulo 2^32).

   This restriction ensures that a particular RTP MIDI packet in a
   stream is uniquely responsible for encoding time starting at the
   moment after the RTP timestamp encoded in the RTP packet header, and
   ending at the moment before the final command timestamp encoded in
   the MIDI list.  The "moment before" and "moment after" qualifiers
   acknowledge the "less than or equal" semantics (as opposed to
   "strictly less than") in the sentence above this paragraph.

   Note that it is possible to "pad" the end of an RTP MIDI packet with
   time that is guaranteed to be void of MIDI commands, by setting the
   "Delta Time N" field of the MIDI list to the end of the void time,
   and by omitting its corresponding "MIDI Command N" field (a syntactic
   construction the preamble of Section 3 expressly made legal).

   In addition, it is possible to code an RTP MIDI packet to express
   that a period of time in the stream is void of MIDI commands.  The
   RTP timestamp in the header would code the start of the void time.
   The MIDI list of this packet would consist of a "Delta Time 0" field
   that coded the end of the void time.  No other fields would be
   present in the MIDI list (a syntactic construction the preamble of
   Section 3 also expressly made legal).

   By default, a command timestamp indicates the execution time for the
   command.  The difference between two timestamps indicates the time
   delay between the execution of the commands.  This difference may be
   zero, coding simultaneous execution.  In this memo, we refer to this
   interpretation of timestamps as "comex" (COMmand EXecution)
   semantics.  We formally define comex semantics in Appendix C.3.

   The comex interpretation of timestamps works well for transcoding a
   Standard MIDI File (SMF) into an RTP MIDI stream, as SMFs code a
   timestamp for each MIDI command stored in the file.  To transcode an
   SMF that uses metric time markers, use the SMF tempo map (encoded in
   the SMF as meta-events) to convert metric SMF timestamp units into
   seconds-based RTP timestamp units.

   The comex interpretation also works well for MIDI hardware
   controllers that are coding raw sensor data directly onto an RTP MIDI
   stream.  Note that this controller design is preferable to a design
   that converts raw sensor data into a MIDI 1.0 cable command stream
   and then transcodes the stream onto an RTP MIDI stream.

   The comex interpretation of timestamps is usually not the best
   timestamp interpretation for transcoding a MIDI source that uses
   implicit command timing (such as MIDI 1.0 DIN cables) into an RTP
   MIDI stream.  Appendix C.3 defines alternatives to comex semantics
   and describes session configuration tools for selecting the timestamp
   interpretation semantics for a stream.

        One-Octet Delta Time:

           Encoded form: 0ddddddd
           Decoded form: 00000000 00000000 00000000 0ddddddd

        Two-Octet Delta Time:

           Encoded form: 1ccccccc 0ddddddd
           Decoded form: 00000000 00000000 00cccccc cddddddd

        Three-Octet Delta Time:

           Encoded form: 1bbbbbbb 1ccccccc 0ddddddd
           Decoded form: 00000000 000bbbbb bbcccccc cddddddd

        Four-Octet Delta Time:

           Encoded form: 1aaaaaaa 1bbbbbbb 1ccccccc 0ddddddd
           Decoded form: 0000aaaa aaabbbbb bbcccccc cddddddd

                  Figure 4 -- Decoding delta time formats
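
   The formats in Figure 4 can be decoded and encoded as sketched below.
   This is a non-normative illustration; the function names are
   assumptions.  The encoder shown always emits the shortest legal form,
   which is a choice made for this example and not a requirement of this
   memo.

```python
from typing import Tuple


def decode_delta_time(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Return (value, octets_consumed) for the delta time at offset."""
    value = 0
    for count in range(1, 5):  # a delta time is 1-4 octets long
        octet = data[offset + count - 1]
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80 == 0:  # a clear top bit marks the final octet
            return value, count
    raise ValueError("delta time is longer than 4 octets")


def encode_delta_time(value: int) -> bytes:
    """Encode value using the shortest of the forms in Figure 4."""
    if not 0 <= value < (1 << 28):
        raise ValueError("delta time must fit in 28 bits")
    octets = [value & 0x7F]  # final octet has its top bit clear
    value >>= 7
    while value:
        octets.append(0x80 | (value & 0x7F))  # leading octets set the top bit
        value >>= 7
    return bytes(reversed(octets))
```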

3.2.  Command Coding

   Each non-empty MIDI Command field in the MIDI list codes one of the
   MIDI command types that may legally appear on a MIDI 1.0 DIN cable.
   Standard MIDI File meta-events do not fit this definition and MUST
   NOT appear in the MIDI list.  As a rule, each MIDI Command field
   codes a complete command, in the binary command format defined in
   [MIDI].  In the remainder of this section, we describe exceptions to
   this rule.

   The first MIDI channel command in the MIDI list MUST include a status
   octet.  Running status coding, as defined in [MIDI], MAY be used for
   all subsequent MIDI channel commands in the list.  As in [MIDI],
   System Common and System Exclusive messages (0xF0 ... 0xF7) cancel
   the running status state, but System Real-time messages (0xF8 ...
   0xFF) do not affect the running status state.  All System commands in
   the MIDI list MUST include a status octet.

   As we note above, the first channel command in the MIDI list MUST
   include a status octet.  However, the corresponding command in the
   original MIDI source data stream might not have a status octet (in
   this case, the source would be coding the command using running
   status).  If the status octet of the first channel command in the
   MIDI list does not appear in the source data stream, the P (phantom)
   header bit MUST be set to 1.  In all other cases, the P bit MUST be
   set to 0.

   Note that the P bit describes the MIDI source data stream, not the
   MIDI list encoding; regardless of the state of the P bit, the MIDI
   list MUST include the status octet.
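
   The P bit rule can be sketched as follows.  This is a non-normative
   illustration under the assumption that the sender tracks the running
   status of the MIDI source; the function name and its arguments are
   hypothetical.

```python
def code_first_channel_command(source_octets: bytes, running_status: int):
    """Return (midi_list_octets, p_bit) for the first channel command.

    source_octets holds the command as it appeared in the MIDI source;
    running_status is the status octet currently in effect there.
    """
    if not source_octets:
        raise ValueError("empty command")
    if source_octets[0] & 0x80:
        # The source coded an explicit status octet: copy it, P = 0.
        return bytes(source_octets), 0
    if not 0x80 <= running_status <= 0xEF:
        raise ValueError("running status must be a channel status octet")
    # The source relied on running status: the MIDI list still carries
    # the status octet, and P = 1 records that it is a phantom.
    return bytes([running_status]) + bytes(source_octets), 1
```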

   As receivers MUST be able to decode running status, sender
   implementors should feel free to use running status to improve
   bandwidth efficiency.  However, senders SHOULD NOT introduce timing
   jitter into an existing MIDI command stream through an inappropriate
   use or removal of running status coding.  This warning primarily
   applies to senders whose RTP MIDI streams may be transcoded onto a
   MIDI 1.0 DIN cable [MIDI] by the receiver: both the timestamps and
   the command coding (running status or not) must comply with the
   physical restrictions of implicit time coding over a slow serial
   line.

   On a MIDI 1.0 DIN cable [MIDI], a System Real-time command may be
   embedded inside of another "host" MIDI command.  This syntactic
   construction is not supported in the payload format: a MIDI Command
   field in the MIDI list codes exactly one MIDI command (partially or
   completely).

   To encode an embedded System Real-time command, senders MUST extract
   the command from its host and code it in the MIDI list as a separate
   command.  The host command and System Real-time command SHOULD appear
   in the same MIDI list.  The delta time of the System Real-time
   command SHOULD result in a command timestamp that encodes the System
   Real-time command placement in its original embedded position.

   Two methods are provided for encoding MIDI System Exclusive (SysEx)
   commands in the MIDI list.  A SysEx command may be encoded in a MIDI
   Command field verbatim: a 0xF0 octet, followed by an arbitrary number
   of data octets, followed by a 0xF7 octet.

   Alternatively, a SysEx command may be encoded as multiple segments.
   The command is divided into two or more SysEx command segments; each
   segment is encoded in its own MIDI Command field in the MIDI list.

   The payload format supports segmentation in order to encode SysEx
   commands that encode information in the temporal pattern of data
   octets.  By encoding these commands as a series of segments, each
   data octet may be associated with a distinct delta time.
   Segmentation also supports the coding of large SysEx commands across
   several packets.

   To segment a SysEx command, first partition its data octet list into
   two or more sublists.  The last sublist MAY be empty (i.e., contain
   no octets); all other sublists MUST contain at least one data octet.
   To complete the segmentation, add the status octets defined in Figure
   5 to the head and tail of the first, last, and any "middle" sublists.
   Figure 6 shows example segmentations of a SysEx command.

   A sender MAY cancel a segmented SysEx command transmission that is in
   progress, by sending the "cancel" sublist shown in Figure 5.  A
   "cancel" sublist MAY follow a "first" or "middle" sublist in the
   transmission, but MUST NOT follow a "last" sublist.  The cancel MUST
   be empty (thus, 0xF7 0xF4 is the only legal cancel sublist).

   The cancellation feature is needed because Appendix C.1 defines
   configuration tools that let session parties exclude certain SysEx
   commands in the stream.  Senders that transcode a MIDI source onto an
   RTP MIDI stream under these constraints have the responsibility of
   excluding undesired commands from the RTP MIDI stream.

   The cancellation feature lets a sender start the transmission of a
   command before the MIDI source has sent the entire command.  If a
   sender determines that the command whose transmission is in progress
   should not appear on the RTP stream, it cancels the command.  Without
   a method for cancelling a SysEx command transmission, senders would
   be forced to use a high-latency store-and-forward approach to
   transcoding SysEx commands onto RTP MIDI packets, in order to
   validate each SysEx command before transmission.

   The recommended receiver reaction to a cancellation depends on the
   capabilities of the receiver.  For example, a sound synthesizer that
   is directly parsing RTP MIDI packets and rendering them to audio will
   be aware of the fact that SysEx commands may be cancelled in RTP
   MIDI.  These receivers SHOULD detect a SysEx cancellation in the MIDI
   list and act as if they had never received the SysEx command.

   As a second example, a synthesizer may be receiving MIDI data from an
   RTP MIDI stream via a MIDI DIN cable (or a software API emulation of
   a MIDI DIN cable).  In this case, an RTP-MIDI-aware system receives
   the RTP MIDI stream and transcodes it onto the MIDI DIN cable (or its
   emulation).  Upon the receipt of the cancel sublist, the RTP-MIDI-
   aware transcoder might have already sent the first part of the SysEx
   command on the MIDI DIN cable to the receiver.

   Unfortunately, the MIDI DIN cable protocol cannot directly code
   "cancel SysEx in progress" semantics.  However, MIDI DIN cable
   receivers begin SysEx processing after the complete command arrives.
   The receiver checks to see if it recognizes the command (coded in the
   first few octets) and then checks to see if the command is the
   correct length.  Thus, in practice, a transcoder can cancel a SysEx
   command by sending an 0xF7 to (prematurely) end the SysEx command --
   the receiver will detect the incorrect command length and discard the
   command.

   Appendix C.1 defines configuration tools that may be used to prohibit
   SysEx command cancellation.

   The relative ordering of SysEx command segments in a MIDI list must
   match the relative ordering of the sublists in the original SysEx
   command.  By default, commands other than System Real-time MIDI
   commands MUST NOT appear between SysEx command segments (Appendix C.1
   defines configuration tools to change this default, to let other
   command types appear between segments).  If the command segments of
   a SysEx command are placed in the MIDI lists of two or more RTP
   packets, the segment ordering rules apply to the concatenation of all
   affected MIDI lists.

          -----------------------------------------------------------
         | Sublist Position |  Head Status Octet | Tail Status Octet |
         |-----------------------------------------------------------|
         |    first         |       0xF0         |       0xF0        |
         |-----------------------------------------------------------|
         |    middle        |       0xF7         |       0xF0        |
         |-----------------------------------------------------------|
         |    last          |       0xF7         |       0xF7        |
         |-----------------------------------------------------------|
         |    cancel        |       0xF7         |       0xF4        |
          -----------------------------------------------------------

               Figure 5 -- Command segmentation status octets

   [MIDI] permits 0xF7 octets that are not part of a (0xF0, 0xF7) pair
   to appear on a MIDI 1.0 DIN cable.  Unpaired 0xF7 octets have no
   semantic meaning in MIDI, apart from cancelling running status.

   Unpaired 0xF7 octets MUST NOT appear in the MIDI list of the MIDI
   Command section.  We impose this restriction to avoid interference
   with the command segmentation coding defined in Figure 5.

   SysEx commands carried on a MIDI 1.0 DIN cable may use the "dropped
   0xF7" construction [MIDI].  In this coding method, the 0xF7 octet is
   dropped from the end of the SysEx command, and the status octet of
   the next MIDI command acts both to terminate the SysEx command and
   start the next command.  To encode this construction in the payload
   format, follow these steps:

     o  Determine the appropriate delta times for the SysEx command and
        the command that follows the SysEx command.

     o  Insert the "dropped" 0xF7 octet at the end of the SysEx command,
        to form the standard SysEx syntax.

     o  Code both commands into the MIDI list using the rules above.

     o  Replace the 0xF7 octet that terminates the verbatim SysEx
        encoding or the last segment of the segmented SysEx encoding
        with a 0xF5 octet.  This substitution informs the receiver of
        the original dropped 0xF7 coding.

   [MIDI] reserves the undefined System Common commands 0xF4 and 0xF5
   and the undefined System Real-time commands 0xF9 and 0xFD for future
   use.  By default, undefined commands MUST NOT appear in a MIDI
   Command field in the MIDI list, with the exception of the 0xF5 octets
   used to code the "dropped 0xF7" construction and the 0xF4 octets used
   by SysEx "cancel" sublists.

   During session configuration, a stream may be customized to transport
   undefined commands (Appendix C.1).  For this case, we now define how
   senders encode undefined commands in the MIDI list.

   An undefined System Real-time command MUST be coded using the System
   Real-time rules.

   If the undefined System Common commands are put to use in a future
   version of [MIDI], the command will begin with an 0xF4 or 0xF5 status
   octet, followed by an arbitrary number of data octets (i.e., zero or
   more data bytes).  To encode these commands, senders MUST terminate
   the command with an 0xF7 octet and place the modified command into
   the MIDI Command field.

   Unfortunately, non-compliant uses of the undefined System Common
   commands may appear in MIDI implementations.  To model these
   commands, we assume that the command begins with an 0xF4 or 0xF5
   status octet, followed by zero or more data octets, followed by zero
   or more trailing 0xF7 status octets.  To encode the command, senders
   MUST first remove all trailing 0xF7 status octets from the command.
   Then, senders MUST terminate the command with an 0xF7 octet and place
   the modified command into the MIDI Command field.
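
   The two encoding rules above reduce to the same operation, sketched
   below as a non-normative illustration.  The function name is an
   assumption; the input is taken to be the command as observed in the
   MIDI source.

```python
def encode_undefined_system_common(command: bytes) -> bytes:
    """Code an undefined System Common command for the MIDI list."""
    if not command or command[0] not in (0xF4, 0xF5):
        raise ValueError("command must begin with 0xF4 or 0xF5")
    # Remove any trailing 0xF7 status octets, then terminate the
    # command with exactly one 0xF7 octet.
    return command.rstrip(b"\xf7") + b"\xf7"
```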

   Note that we include the trailing octets in our model as a cautionary
   measure: if such octets appeared in a non-compliant use of an
   undefined System Common command, an RTP MIDI encoding of the command
   that did not remove trailing octets could be mistaken for an encoding
   of a "middle" or "last" sublist of a segmented SysEx command (Figure
   5) under certain packet loss conditions.

          Original SysEx command:

              0xF0 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

          A two-segment segmentation:

              0xF0 0x01 0x02 0x03 0x04 0xF0

              0xF7 0x05 0x06 0x07 0x08 0xF7

          A different two-segment segmentation:

              0xF0 0x01 0xF0

              0xF7 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0xF7

          A three-segment segmentation:

              0xF0 0x01 0x02 0xF0

              0xF7 0x03 0x04 0xF0

              0xF7 0x05 0x06 0x07 0x08 0xF7

         The segmentation with the largest number of segments:

              0xF0 0x01 0xF0

              0xF7 0x02 0xF0

              0xF7 0x03 0xF0

              0xF7 0x04 0xF0

              0xF7 0x05 0xF0

              0xF7 0x06 0xF0

              0xF7 0x07 0xF0

              0xF7 0x08 0xF0

              0xF7 0xF7

                     Figure 6 -- Example segmentations
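
   The segmentations in Figure 6 can be reproduced with the following
   non-normative sketch, which applies the status octets of Figure 5 to
   a partition of the data octet list.  The function name and the
   cut_points argument (indices into the data octet list at which a new
   sublist begins) are assumptions made for this example.

```python
def segment_sysex(command: bytes, cut_points: list) -> list:
    """Split a verbatim SysEx command into SysEx command segments."""
    if len(command) < 2 or command[0] != 0xF0 or command[-1] != 0xF7:
        raise ValueError("expected a verbatim 0xF0 ... 0xF7 command")
    data = command[1:-1]
    if not cut_points:
        raise ValueError("segmentation needs two or more sublists")
    if any(c < 0 or c > len(data) for c in cut_points):
        raise ValueError("cut point outside the data octet list")
    bounds = [0] + list(cut_points) + [len(data)]
    sublists = [data[a:b] for a, b in zip(bounds, bounds[1:])]
    # Only the last sublist may be empty.
    if any(len(sub) == 0 for sub in sublists[:-1]):
        raise ValueError("only the last sublist may be empty")
    last = len(sublists) - 1
    segments = []
    for index, sub in enumerate(sublists):
        head = 0xF0 if index == 0 else 0xF7     # Figure 5, head column
        tail = 0xF7 if index == last else 0xF0  # Figure 5, tail column
        segments.append(bytes([head]) + sub + bytes([tail]))
    return segments
```

   With the original command of Figure 6, cut_points of [4] gives the
   first two-segment example, and cut_points of [1, 2, 3, 4, 5, 6, 7, 8]
   gives the nine-segment example that ends with an empty last sublist.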

4.  The Recovery Journal System

   The recovery journal is the default resiliency tool for unreliable
   transport.  In this section, we normatively define the roles that
   senders and receivers play in the recovery journal system.

   MIDI is a fragile code.  A single lost command in a MIDI command
   stream may produce an artifact in the rendered performance.  We
   normatively classify rendering artifacts into two categories:

     o Transient artifacts.  Transient artifacts produce immediate but
       short-term glitches in the performance.  For example, a lost
       NoteOn (0x9) command produces a transient artifact: one note
       fails to play, but the artifact does not extend beyond the end of
       that note.

     o Indefinite artifacts.  Indefinite artifacts produce long-lasting
       errors in the rendered performance.  For example, a lost NoteOff
       (0x8) command may produce an indefinite artifact: the note that
       should have been ended by the lost NoteOff command may sustain
       indefinitely.  As a second example, the loss of a Control Change
       (0xB) command for controller number 7 (Channel Volume) may
       produce an indefinite artifact: after the loss, all notes on the
       channel may play too softly or too loudly.

   The purpose of the recovery journal system is to satisfy the recovery
   journal mandate: the MIDI performance rendered from an RTP MIDI
   stream sent over unreliable transport MUST NOT contain indefinite
   artifacts.

   The recovery journal system does not use packet retransmission to
   satisfy this mandate.  Instead, each packet includes a special
   section, called the recovery journal.

   The recovery journal codes the history of the stream, back to an
   earlier packet called the checkpoint packet.  The range of coverage
   for the journal is called the checkpoint history.  The recovery
   journal codes the information necessary to recover from the loss of
   an arbitrary number of packets in the checkpoint history.  Appendix
   A.1 normatively defines the checkpoint packet and the checkpoint
   history.

   When a receiver detects a packet loss, it compares its own knowledge
   about the history of the stream with the history information coded in
   the recovery journal of the packet that ends the loss event.  By
   noting the differences in these two versions of the past, a receiver
   is able to transform all indefinite artifacts in the rendered
   performance into transient artifacts, by executing MIDI commands to
   repair the stream.

   We now state the normative role for senders in the recovery journal
   system.

   Senders prepare a recovery journal for every packet in the stream.
   In doing so, senders choose the checkpoint packet identity for the
   journal.  Senders make this choice by applying a sending policy.
   Appendix C.2.2 normatively defines three sending policies: "closed-
   loop", "open-loop", and "anchor".

   By default, senders MUST use the closed-loop sending policy.  If the
   session description overrides this default policy, by using the
   parameter j_update defined in Appendix C.2.2, senders MUST use the
   specified policy.

   After choosing the checkpoint packet identity for a packet, the
   sender creates the recovery journal.  By default, this journal MUST
   conform to the normative semantics in Section 5 and Appendices A-B in
   this memo.  In Appendix C.2.3, we define parameters that modify the
   normative semantics for recovery journals.  If the session
   description uses these parameters, the journal created by the sender
   MUST conform to the modified semantics.

   Next, we state the normative role for receivers in the recovery
   journal system.

   A receiver MUST detect each RTP sequence number break in a stream.
   If the sequence number break is due to a packet loss event (as
   defined in [RFC3550]), the receiver MUST repair all indefinite
   artifacts in the rendered MIDI performance caused by the loss.  If
   the sequence number break is due to an out-of-order packet (as
   defined in [RFC3550]), the receiver MUST NOT take actions that
   introduce indefinite artifacts (ignoring the out-of-order packet is a
   safe option).
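
   A receiver's first step, detecting a sequence number break, might be
   sketched as follows.  This is a simplified, non-normative
   illustration: [RFC3550] remains the reference for loss and
   out-of-order definitions, and the half-range window used here to
   separate "ahead" from "behind" is an assumption of this example, as
   are the function name and the returned labels.

```python
def classify_sequence_number(highest_seen: int, received: int) -> str:
    """Classify a 16-bit RTP sequence number against the highest seen."""
    delta = (received - highest_seen) & 0xFFFF  # distance modulo 2^16
    if delta == 1:
        return "in_order"
    if delta == 0:
        return "duplicate"
    if delta < 0x8000:
        # The packet is ahead of the expected one: packets were skipped,
        # so the recovery journal of this packet should be consulted.
        return "loss"
    # The packet is behind the highest seen: ignoring it is a safe option.
    return "out_of_order"
```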

   Receivers take special precautions when entering or exiting a
   session.  A receiver MUST process the first received packet in a
   stream as if it were a packet that ends a loss event.  Upon exiting a
   session, a receiver MUST ensure that the rendered MIDI performance
   does not end with indefinite artifacts.

   Receivers are under no obligation to perform indefinite artifact
   repairs at the moment a packet arrives.  A receiver that uses a
   playout buffer may choose to wait until the moment of rendering
   before processing the recovery journal, as the "lost" packet may be a
   late packet that arrives in time to use.

   Next, we state the normative role for the creator of the session
   description in the recovery journal system.  Depending on the
   application, the sender, the receivers, and other parties may take
   part in creating or approving the session description.

   A session description that specifies the default closed-loop sending
   policy and the default recovery journal semantics satisfies the
   recovery journal mandate.  However, these default behaviors may not
   be appropriate for all sessions.  If the creators of a session
   description use the parameters defined in Appendix C.2 to override
   these defaults, the creators MUST ensure that the parameters define a
   system that satisfies the recovery journal mandate.

   Finally, we note that this memo does not specify sender or receiver
   recovery journal algorithms.  Implementations are free to use any
   algorithm that conforms to the requirements in this section.  The
   non-normative [RFC4696] discusses sender and receiver algorithm
   design.

5.  Recovery Journal Format

   This section introduces the structure of the recovery journal and
   defines the bitfields of recovery journal headers.  Appendices A-B
   complete the bitfield definition of the recovery journal.

   The recovery journal has a three-level structure:

     o Top-level header.

     o Channel and system journal headers.  These headers encode
       recovery information for a single voice channel (channel journal)
       or for all System commands (system journal).

     o Chapters.  Chapters describe recovery information for a single
       MIDI command type.

   Figure 7 shows the top-level structure of the recovery journal.  The
   recovery journal consists of a 3-octet header, followed by an
   optional system journal (labeled S-journal in Figure 7) and an
   optional list of channel journals.  Figure 8 shows the recovery
   journal header format.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |            Recovery journal header            | S-journal ... |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                      Channel journals ...                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 7 -- Top-level recovery journal format

              0                   1                   2
              0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
             |S|Y|A|H|TOTCHAN|   Checkpoint Packet Seqnum    |
             +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 8 -- Recovery journal header

   If the Y header bit is set to 1, the system journal appears in the
   recovery journal, directly following the recovery journal header.

   If the A header bit is set to 1, the recovery journal ends with a
   list of (TOTCHAN + 1) channel journals (the 4-bit TOTCHAN header
   field is interpreted as an unsigned integer).

   A MIDI channel MAY be represented by (at most) one channel journal in
   a recovery journal.  Channel journals MUST appear in the recovery
   journal in ascending channel-number order.

   If A and Y are both zero, the recovery journal only contains its 3-
   octet header and is considered to be an "empty" journal.
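
   The header in Figure 8 can be parsed as sketched below.  This is a
   non-normative illustration; the function name and the dictionary keys
   (including the derived "channel_journals" and "empty" entries) are
   assumptions made for this example.

```python
import struct


def parse_recovery_journal_header(data: bytes, offset: int = 0) -> dict:
    """Parse the 3-octet recovery journal header shown in Figure 8."""
    if len(data) < offset + 3:
        raise ValueError("recovery journal header needs 3 octets")
    flags = data[offset]
    # Checkpoint Packet Seqnum is a 16-bit big-endian field.
    (checkpoint,) = struct.unpack_from("!H", data, offset + 1)
    y = (flags >> 6) & 1
    a = (flags >> 5) & 1
    totchan = flags & 0x0F
    return {
        "S": (flags >> 7) & 1,
        "Y": y,
        "A": a,
        "H": (flags >> 4) & 1,
        "TOTCHAN": totchan,
        # TOTCHAN + 1 channel journals follow only when A = 1.
        "channel_journals": totchan + 1 if a else 0,
        "checkpoint_seqnum": checkpoint,
        "empty": a == 0 and y == 0,
    }
```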

   The S (single-packet loss) bit appears in most recovery journal
   structures, including the recovery journal header.  The S bit helps
   receivers efficiently parse the recovery journal in the common case
   of the loss of a single packet.  Appendix A.1 defines S bit
   semantics.

   The H bit indicates if MIDI channels in the stream have been
   configured to use the enhanced Chapter C encoding (Appendix A.3.3).

   By default, the payload format does not use enhanced Chapter C
   encoding.  In this default case, the H bit MUST be set to 0 for all
   packets in the stream.

   If the stream has been configured so that controller numbers for one
   or more MIDI channels use enhanced Chapter C encoding, the H bit MUST
   be set to 1 in all packets in the stream.  In Appendix C.2.3, we show
   how to configure a stream to use enhanced Chapter C encoding.

   The 16-bit Checkpoint Packet Seqnum header field codes the sequence
   number of the checkpoint packet for this journal, in network byte
   order (big-endian).  The choice of the checkpoint packet sets the
   depth of the checkpoint history for the journal (defined in Appendix
   A.1).

   Receivers may use the Checkpoint Packet Seqnum field of the packet
   that ends a loss event to verify that the journal checkpoint history
   covers the entire loss event.  The checkpoint history covers the loss
   event if the Checkpoint Packet Seqnum field is less than or equal to
   one plus the highest RTP sequence number previously received on the
   stream (modulo 2^16).
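
   The coverage test can be sketched as follows.  This is a
   non-normative illustration.  The text above specifies a comparison
   modulo 2^16 but not how a wrapped comparison is resolved; the
   half-range window used here is an assumption of this example, as is
   the function name.

```python
def checkpoint_covers_loss(checkpoint_seqnum: int,
                           highest_received: int) -> bool:
    """Return True if the checkpoint history covers the loss event."""
    limit = (highest_received + 1) & 0xFFFF
    # checkpoint_seqnum <= limit, evaluated modulo 2^16: the forward
    # distance from the checkpoint to the limit must fall in the lower
    # half of the sequence number space (assumed window).
    return ((limit - checkpoint_seqnum) & 0xFFFF) < 0x8000
```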

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S| CHAN  |H|      LENGTH       |P|C|M|W|N|E|T|A|  Chapters ... |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 9 -- Channel journal format

   Figure 9 shows the structure of a channel journal: a 3-octet header,
   followed by a list of leaf elements called channel chapters.  A
   channel journal encodes information about MIDI commands on the MIDI
   channel coded by the 4-bit CHAN header field.  Note that CHAN uses
   the same bit encoding as the channel nibble in MIDI Channel Messages
   (the cccc field in Figure E.1 of Appendix E).

   The 10-bit LENGTH field codes the length of the channel journal.  The
   semantics for LENGTH fields are uniform throughout the recovery
   journal, and are defined in Appendix A.1.

   The third octet of the channel journal header is the Table of
   Contents (TOC) of the channel journal.  The TOC is a set of bits that
   encode the presence of a chapter in the journal.  Each chapter
   contains information about a certain class of MIDI channel command:

      o  Chapter P: MIDI Program Change (0xC)
      o  Chapter C: MIDI Control Change (0xB)
      o  Chapter M: MIDI Parameter System (part of 0xB)
      o  Chapter W: MIDI Pitch Wheel (0xE)
      o  Chapter N: MIDI NoteOff (0x8), NoteOn (0x9)
      o  Chapter E: MIDI Note Command Extras (0x8, 0x9)
      o  Chapter T: MIDI Channel Aftertouch (0xD)
      o  Chapter A: MIDI Poly Aftertouch (0xA)

   Chapters appear in a list following the header, in order of their
   appearance in the TOC.  Appendices A.2-9 describe the bitfield format
   for each chapter, and define the conditions under which a chapter
   type MUST appear in the recovery journal.  If any chapter types are
   required for a channel, an associated channel journal MUST appear in
   the recovery journal.

   The H bit indicates if controller numbers on a MIDI channel have been
   configured to use the enhanced Chapter C encoding (Appendix A.3.3).

   By default, controller numbers on a MIDI channel do not use enhanced
   Chapter C encoding.  In this default case, the H bit MUST be set to 0
   for all channel journal headers for the channel in the recovery
   journal, for all packets in the stream.

   However, if at least one controller number for a MIDI channel has
   been configured to use the enhanced Chapter C encoding, the H bit for
   its channel journal MUST be set to 1, for all packets in the stream.

   In Appendix C.2.3, we show how to configure a controller number to
   use enhanced Chapter C encoding.
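
   As an illustration of the header layout in Figure 9, the Python
   sketch below unpacks the three header octets.  The class and
   function names are hypothetical.  The sketch reads the bitfields
   only; it does not interpret LENGTH (Appendix A.1) or parse the
   chapters that follow.

```python
from typing import NamedTuple, Tuple

TOC_ORDER = "PCMWNETA"  # chapter letters, most significant TOC bit first


class ChannelJournalHeader(NamedTuple):
    s: int                     # S bit
    chan: int                  # 4-bit MIDI channel
    h: int                     # 1 if enhanced Chapter C encoding is configured
    length: int                # 10-bit LENGTH field
    chapters: Tuple[str, ...]  # chapters flagged in the TOC, in list order


def parse_channel_journal_header(data: bytes) -> ChannelJournalHeader:
    if len(data) < 3:
        raise ValueError("channel journal header needs 3 octets")
    b0, b1, toc = data[0], data[1], data[2]
    s = b0 >> 7
    chan = (b0 >> 3) & 0x0F
    h = (b0 >> 2) & 0x01
    length = ((b0 & 0x03) << 8) | b1
    chapters = tuple(
        letter for i, letter in enumerate(TOC_ORDER) if toc & (0x80 >> i)
    )
    return ChannelJournalHeader(s, chan, h, length, chapters)
```

   The chapters tuple lists the chapter letters in the order in which
   the chapters follow the header.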

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|D|V|Q|F|X|      LENGTH       |  System chapters ...          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 10 -- System journal format

   Figure 10 shows the structure of the system journal: a 2-octet
   header, followed by a list of system chapters.  Each chapter codes
   information about a specific class of MIDI Systems command:

      o  Chapter D: Song Select (0xF3), Tune Request (0xF6), Reset
                    (0xFF), undefined System commands (0xF4, 0xF5, 0xF9,
                    0xFD)
      o  Chapter V: Active Sense (0xFE)
      o  Chapter Q: Sequencer State (0xF2, 0xF8, 0xFA, 0xFB, 0xFC)
      o  Chapter F: MTC Tape Position (0xF1, 0xF0 0x7F 0xcc 0x01 0x01)
      o  Chapter X: System Exclusive (all other 0xF0)

   The 10-bit LENGTH field codes the size of the system journal and
   conforms to semantics described in Appendix A.1.

   The D, V, Q, F, and X header bits form a Table of Contents (TOC) for
   the system journal.  A TOC bit that is set to 1 codes the presence of
   a chapter in the journal.  Chapters appear in a list following the
   header, in the order of their appearance in the TOC.

   Appendix B describes the bitfield format for the system chapters and
   defines the conditions under which a chapter type MUST appear in the
   recovery journal.  If any system chapter type is required to appear
   in the recovery journal, the system journal MUST appear in the
   recovery journal.
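
   The system journal header in Figure 10 can be unpacked in the same
   bitwise manner.  The Python sketch below uses a hypothetical
   function name and returns a plain dictionary; it does not parse the
   system chapters themselves (Appendix B).

```python
SYSTEM_TOC_ORDER = "DVQFX"  # TOC bits that follow the S bit, in header order


def parse_system_journal_header(data: bytes) -> dict:
    if len(data) < 2:
        raise ValueError("system journal header needs 2 octets")
    b0, b1 = data[0], data[1]
    # The D bit is the second most significant bit of the first octet.
    chapters = [
        letter for i, letter in enumerate(SYSTEM_TOC_ORDER) if b0 & (0x40 >> i)
    ]
    return {
        "S": b0 >> 7,
        "chapters": chapters,
        "length": ((b0 & 0x03) << 8) | b1,  # 10-bit LENGTH field
    }
```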

6.  Session Description Protocol

   RTP does not perform session management.  Instead, RTP works together
   with session management tools, such as the Session Initiation
   Protocol (SIP, [RFC3261]) and the Real Time Streaming Protocol (RTSP,
   [RFC2326]).

   RTP payload formats define media type parameters for use in session
   management (for example, this memo defines "rtp-midi" as the media
   type for native RTP MIDI streams).

   In most cases, session management tools use the media type parameters
   via another standard, the Session Description Protocol (SDP,
   [RFC4566]).

   SDP is a textual format for specifying session descriptions.  Session
   descriptions specify the network transport and media encoding for RTP
   sessions.  Session management tools coordinate the exchange of
   session descriptions between participants ("parties").

   Some session management tools use SDP to negotiate details of media
   transport (network addresses, ports, etc.).  We refer to this use of
   SDP as "negotiated usage".  One example of negotiated usage is the
   Offer/Answer protocol ([RFC3264] and Appendix C.7.2 in this memo) as
   used by SIP.

   Other session management tools use SDP to declare the media encoding
   for the session but use other techniques to negotiate network
   transport.  We refer to this use of SDP as "declarative usage".  One
   example of declarative usage is RTSP ([RFC2326] and Appendix C.7.1 in
   this memo).

   Below, we show session description examples for native (Section 6.1)
   and mpeg4-generic (Section 6.2) streams.  In Section 6.3, we
   introduce session configuration tools that may be used to customize
   streams.

6.1.  Session Descriptions for Native Streams

   The session description below defines a unicast UDP RTP session (via
   a media ("m=") line) whose sole payload type (96) is mapped to a
   minimal native RTP MIDI stream.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP4 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP4 192.0.2.94
   a=rtpmap:96 rtp-midi/44100

   The rtpmap attribute line uses the "rtp-midi" media type to specify
   an RTP MIDI native stream.  The clock rate specified on the rtpmap
   line (in the example above, 44100 Hz) sets the scaling for the RTP
   timestamp header field (see Section 2.1, and also [RFC3550]).

   Note that this document does not specify a default clock rate value
   for RTP MIDI.  When RTP MIDI is used with SDP, parties MUST use the
   rtpmap line to communicate the clock rate.  Guidance for selecting
   the RTP MIDI clock rate value appears in Section 2.1.
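
   To illustrate how a receiver might obtain the clock rate, the
   Python sketch below reads an rtpmap line such as the one in the
   example above and converts an RTP timestamp difference to seconds.
   The function names are hypothetical, and the parser handles only
   the simple "name/rate" form shown in this section.

```python
def parse_rtpmap(line: str):
    """Return (payload_type, encoding_name, clock_rate) for an rtpmap line."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute line")
    payload_type, encoding = line[len(prefix):].split(None, 1)
    parts = encoding.strip().split("/")
    if len(parts) < 2:
        # No default clock rate exists, so the rate must be present.
        raise ValueError("rtpmap line must carry a clock rate")
    return int(payload_type), parts[0], int(parts[1])


def ticks_to_seconds(ticks: int, clock_rate: int) -> float:
    """Scale an RTP timestamp difference by the negotiated clock rate."""
    return ticks / clock_rate
```

   With the example line, a timestamp difference of 44100 ticks
   corresponds to one second.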

   We consider the RTP MIDI stream shown above to be "minimal" because
   the session description does not customize the stream with
   parameters.  Without such customization, a native RTP MIDI stream has
   these characteristics:

     1. If the stream uses unreliable transport (unicast UDP, multicast
        UDP, etc.), the recovery journal system is in use, and the RTP
        payload contains both the MIDI command section and the journal
        section.  If the stream uses reliable transport (such as TCP),
        the stream does not use journalling, and the payload contains
        only the MIDI command section (Section 2.2).

     2. If the stream uses the recovery journal system, the recovery
        journal system uses the default sending policy and the default
        journal semantics (Section 4).

     3. In the MIDI command section of the payload, command timestamps
        use the default "comex" semantics (Section 3).

     4. The recommended temporal duration ("media time") of an RTP
        packet ranges from 0 to 200 ms, and the RTP timestamp difference
        between sequential packets in the stream may be arbitrarily
        large (Section 2.1).

     5. If more than one minimal rtp-midi stream appears in a session,
        the MIDI name spaces for these streams are independent: channel
        1 in the first stream does not reference the same MIDI channel
        as channel 1 in the second stream (see Appendix C.5 for a
        discussion of the independence of minimal rtp-midi streams).

     6. The rendering method for the stream is not specified.  What the
        receiver "does" with a minimal native MIDI stream is "out of
        scope" of this memo.  For example, in content creation
        environments, a user may manually configure client software to
        render the stream with a specific software package.

   As is standard in RTP, RTP sessions managed by SIP are sendrecv by
   default (parties send and receive MIDI), and RTP sessions managed by
   RTSP are recvonly by default (server sends and client receives).

   In sendrecv RTP MIDI sessions for the session description shown
   above, the 16 voice channel + systems MIDI name space is unique for
   each sender.  Thus, in a two-party session, the voice channel 0 sent
   by one party is distinct from the voice channel 0 sent by the other
   party.

   This behavior corresponds to what occurs when two MIDI 1.0 DIN
   devices are cross-connected with two MIDI cables (one cable routing
   MIDI Out from the first device into MIDI In of the second device, a
   second cable routing MIDI In from the first device into MIDI Out of
   the second device).  We define this "association" formally in Section
   2.1.

   MIDI 1.0 DIN networks may be configured in a "party-line" multicast
   topology.  For these networks, the MIDI protocol itself provides
   tools for addressing specific devices in transactions on a multicast
   network, and for device discovery.  Thus, apart from providing a 1-
   to-many forward path and a many-to-1 reverse path, IETF protocols do
   not need to provide any special support for MIDI multicast
   networking.

6.2.  Session Descriptions for mpeg4-generic Streams

   An mpeg4-generic [RFC3640] RTP MIDI stream uses an MPEG 4 Audio
   Object Type to render MIDI into audio.  Three Audio Object Types
   accept MIDI input:

     o General MIDI (Audio Object Type ID 15), based on the General MIDI
       rendering standard [MIDI].

     o Wavetable Synthesis (Audio Object Type ID 14), based on the
       Downloadable Sounds Level 2 (DLS 2) rendering standard [DLS2].

     o Main Synthetic (Audio Object Type ID 13), based on Structured
       Audio and the programming language SAOL [MPEGSA].

   The primary service of an mpeg4-generic stream is to code Access
   Units (AUs).  We define the mpeg4-generic RTP MIDI AU as the MIDI
   payload shown in Figure 1 of Section 2.1 of this memo: a MIDI command
   section optionally followed by a journal section.

   Exactly one RTP MIDI AU MUST be mapped to one mpeg4-generic RTP MIDI
   packet.  The mpeg4-generic options for placing several AUs in an RTP
   packet MUST NOT be used with RTP MIDI.  The mpeg4-generic options for
   fragmenting and interleaving AUs MUST NOT be used with RTP MIDI.  The
   mpeg4-generic RTP packet payload (Figure 1 in [RFC3640]) MUST contain
   empty AU Header and Auxiliary sections.  These rules yield mpeg4-
   generic packets that are structurally identical to native RTP MIDI
   packets, an essential property for the correct operation of the
   payload format.

   The session description that follows defines a unicast UDP RTP
   session (via a media ("m=") line) whose sole payload type (96) is
   mapped to a minimal mpeg4-generic RTP MIDI stream.  This example uses
   the General MIDI Audio Object Type under Synthesis Profile @ Level 2.

   v=0
   o=lazzaro 2520644554 2838152170 IN IP6 first.example.net
   s=Example
   t=0 0
   m=audio 5004 RTP/AVP 96
   c=IN IP6 2001:DB80::7F2E:172A:1E24
   a=rtpmap:96 mpeg4-generic/44100
   a=fmtp:96 streamtype=5; mode=rtp-midi; profile-level-id=12;
   config=7A0A0000001A4D546864000000060000000100604D54726B0000
   000600FF2F000

   (The a=fmtp line has been wrapped to fit the page to accommodate memo
   formatting restrictions; it comprises a single line in SDP.)

   The fmtp attribute line codes the four parameters (streamtype, mode,
   profile-level-id, and config) that are required in all mpeg4-generic
   session descriptions [RFC3640].  For RTP MIDI streams, the streamtype
   parameter MUST be set to 5, the "mode" parameter MUST be set to
   "rtp-midi", and the "profile-level-id" parameter MUST be set to the
   MPEG-4 Profile Level for the stream.  For the Synthesis Profile,
   legal profile-level-id values are 11, 12, and 13, coding low (11),
   medium (12), or high (13) decoder computational complexity, as
   defined by MPEG conformance tests.
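
   The rules above can be checked mechanically.  The Python sketch
   below splits an fmtp line into parameters and reports violations.
   The function names are hypothetical, the parser assumes the simple
   "name=value; name=value" layout of the example (after the wrapped
   line is rejoined), and the profile-level-id check assumes that the
   stream uses the Synthesis Profile.

```python
REQUIRED = ("streamtype", "mode", "profile-level-id", "config")
SYNTHESIS_PROFILE_LEVELS = {"11", "12", "13"}  # low, medium, high


def parse_fmtp(line: str):
    """Return (payload_type, parameters) for an fmtp attribute line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute line")
    payload_type, _, rest = line[len(prefix):].partition(" ")
    params = {}
    for item in rest.split(";"):
        item = item.strip()
        if item:
            name, _, value = item.partition("=")
            params[name.strip()] = value.strip()
    return int(payload_type), params


def check_required_parameters(params: dict) -> list:
    """Return a list of problems; an empty list means no rule was broken."""
    problems = [f"missing {name}" for name in REQUIRED if name not in params]
    if "streamtype" in params and params["streamtype"] != "5":
        problems.append("streamtype must be 5")
    if "mode" in params and params["mode"] != "rtp-midi":
        problems.append("mode must be rtp-midi")
    if (
        "profile-level-id" in params
        and params["profile-level-id"] not in SYNTHESIS_PROFILE_LEVELS
    ):
        problems.append("profile-level-id is not a Synthesis Profile level")
    return problems
```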

   In a minimal RTP MIDI session description, the config value MUST be a
   hexadecimal encoding [RFC3640] of the AudioSpecificConfig data block
   [MPEGAUDIO] for the stream.  AudioSpecificConfig encodes the Audio
   Object Type for the stream and also encodes initialization data (SAOL
   programs, DLS 2 wave tables, etc.).  Standard MIDI Files encoded in
   AudioSpecificConfig in a minimal session description MUST be ignored
   by the receiver.

   Receivers determine the rendering algorithm for the session by
   interpreting the first 5 bits of AudioSpecificConfig as an unsigned
   integer that codes the Audio Object Type.  In our example above, the
   leading config string nibbles "7A" yield the Audio Object Type 15
   (General MIDI).  In Appendix E.4, we derive the config string value
   in the session description shown above; the starting point of the
   derivation is the MPEG bitstreams defined in [MPEGSA] and
   [MPEGAUDIO].
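
   The 5-bit extraction described above can be sketched in Python as
   follows.  The function name and the lookup table are hypothetical;
   the table holds only the three Audio Object Types listed earlier in
   this section.

```python
AUDIO_OBJECT_TYPES = {
    13: "Main Synthetic",
    14: "Wavetable Synthesis",
    15: "General MIDI",
}


def audio_object_type(config_hex: str) -> int:
    """Read the first 5 bits of a hexadecimal config value as an integer."""
    first_octet = int(config_hex[:2], 16)  # two nibbles form one octet
    return first_octet >> 3                # keep the 5 most significant bits
```

   For the example config value, the leading nibbles "7A" are the bits
   0111 1010, and the first 5 bits (01111) give the value 15.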

   We consider the stream to be "minimal" because the session
   description does not customize the stream through the use of
   parameters, other than the 4 required mpeg4-generic parameters
   described above.  In Section 6.1, we describe the behavior of a
   minimal native stream, as a numbered list of characteristics.  Items
   1-4 on that list also describe the minimal mpeg4-generic stream, but
   items 5 and 6 require restatements, as listed below:

     5. If more than one minimal mpeg4-generic stream appears in a
        session, each stream uses an independent instance of the Audio
        Object Type coded in the config parameter value.

     6. A minimal mpeg4-generic stream encodes the AudioSpecificConfig
        as an inline hexadecimal constant.  If a session description is
        sent over UDP, it may be impossible to transport large
        AudioSpecificConfig blocks within the Maximum Transmission Unit
        (MTU) of the underlying network (for Ethernet, the MTU is 1500
        octets).  In some cases, the AudioSpecificConfig block may
        exceed the maximum size of the UDP packet itself.
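
   As a rough illustration of the size concern in item 6, the Python
   sketch below checks whether a session description would fit in one
   unfragmented UDP datagram.  The function name is hypothetical.  The
   default values assume Ethernet and IPv4 (a 20-octet IP header
   without options and an 8-octet UDP header); other networks need
   other values.

```python
def fits_in_one_datagram(
    sdp_text: str, mtu: int = 1500, ip_header: int = 20, udp_header: int = 8
) -> bool:
    """Return True if the encoded session description fits under the MTU."""
    payload_octets = len(sdp_text.encode("utf-8"))
    return payload_octets <= mtu - ip_header - udp_header
```

   Note that the inline hexadecimal encoding spends two characters on
   each octet of the AudioSpecificConfig block.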

   The comments in Section 6.1 on SIP and RTSP stream directional
   defaults, sendrecv MIDI channel usage, and MIDI 1.0 DIN multicast
   networks also apply to mpeg4-generic RTP MIDI sessions.

   In sendrecv sessions, each party's session description MUST use
   identical values for the mpeg4-generic parameters (including the
   required streamtype, mode, profile-level-id, and config parameters).
   As a consequence, each party uses an identically configured MPEG 4
   Audio Object Type to render MIDI commands into audio.  The preamble
   to Appendix C discusses a way to create "virtual sendrecv" sessions
   that do not have this restriction.

6.3.  Parameters

   This section introduces parameters for session configuration for RTP
   MIDI streams.  In session descriptions, parameters modify the
   semantics of a payload type.  Parameters are specified on an fmtp
   attribute line.  The session description in Section 6.2 includes an
   example of an fmtp attribute line.

   The parameters add features to the minimal streams described in
   Sections 6.1-2, and support several types of services:

     o  Stream subsetting.  By default, all MIDI commands that are legal
        to appear on a MIDI 1.0 DIN cable may appear in an RTP MIDI
        stream.  The cm_unused parameter overrides this default by
        prohibiting certain commands from appearing in the stream.  The
        cm_used parameter is used in conjunction with cm_unused, to
        simplify the specification of complex exclusion rules.  We
        describe cm_unused and cm_used in Appendix C.1.

     o  Journal customization.  The j_sec and j_update parameters
        configure the use of the journal section.  The ch_default,
        ch_never, and ch_anchor parameters configure the semantics of
        the recovery journal chapters.  These parameters are described
        in Appendix C.2 and override the default stream behaviors 1 and
        2, listed in Section 6.1 and referenced in Section 6.2.

     o  MIDI command timestamp semantics.  The tsmode, octpos, mperiod,
        and linerate parameters customize the semantics of timestamps in
        the MIDI command section.  These parameters let RTP MIDI
        accurately encode the implicit time coding of MIDI 1.0 DIN
        cables.  These parameters are described in Appendix C.3 and
        override default stream behavior 3, listed in Section 6.1 and
        referenced in Section 6.2.

     o  Media time.  The rtp_ptime and rtp_maxptime parameters define
        the temporal duration ("media time") of an RTP MIDI packet.  The
        guardtime parameter sets the minimum sending rate of stream
        packets.  These parameters are described in Appendix C.4 and
        override default stream behavior 4, listed in Section 6.1 and
        referenced in Section 6.2.

     o  Stream description.  The musicport parameter labels the MIDI
        name space of RTP streams in a multimedia session.  Musicport is
        described in Appendix C.5.  The musicport parameter overrides
        default stream behavior 5, in Sections 6.1 and 6.2.

     o  MIDI rendering.  Several parameters specify the MIDI rendering
        method of a stream.  These parameters are described in Appendix
        C.6 and override default stream behavior 6, in Sections 6.1 and
        6.2.

   In Appendix C.7, we specify interoperability guidelines for two RTP
   MIDI application areas: content-streaming using RTSP (Appendix C.7.1)
   and network musical performance using SIP (Appendix C.7.2).

7.  Extensibility

   The payload format defined in this memo exclusively encodes all
   commands that may legally appear on a MIDI 1.0 DIN cable.

   Many worthy uses of MIDI over RTP do not fall within the narrow scope
   of the payload format.  For example, the payload format does not
   support the direct transport of Standard MIDI File (SMF) meta-event
   and metric timing data.  As a second example, the payload format does
   not define transport tools for user-defined commands (apart from
   tools to support System Exclusive commands [MIDI]).

   The payload format does not provide an extension mechanism to support
   new features of this nature, by design.  Instead, we encourage the
   development of new payload formats for specialized musical
   applications.  The IETF session management tools [RFC3264] [RFC2326]
   support codec negotiation, to facilitate the use of new payload
   formats in a backward-compatible way.

   However, the payload format does provide several extensibility tools,
   which we list below:

     o  Journalling.  As described in Appendix C.2, new token values for
        the j_sec and j_update parameters may be defined in IETF
        standards-track documents.  This mechanism supports the design
        of new journal formats and the definition of new journal sending
        policies.

     o  Rendering.  The payload format may be extended to support new
        MIDI renderers (Appendix C.6.2).  Certain general aspects of the
        RTP MIDI rendering process may also be extended, via the
        definition of new token values for the render (Appendix C.6) and
        smf_info (Appendix C.6.4.1) parameters.

     o  Undefined commands.  [MIDI] reserves 4 MIDI System commands for
        future use (0xF4, 0xF5, 0xF9, 0xFD).  If updates to [MIDI]
        define the reserved commands, IETF standards-track documents may
        be defined to provide resiliency support for the commands.

        Opaque LEGAL fields appear in System Chapter D for this purpose
        (Appendix B.1.1).

   A final form of extensibility involves the inclusion of the payload
   format in framework documents.  Framework documents describe how to
   combine protocols to form a platform for interoperable applications.
   For example, a stage and studio framework might define how to use SIP
   [RFC3261], RTSP [RFC2326], SDP [RFC4566], and RTP [RFC3550] to
   support media networking for professional audio equipment and
   electronic musical instruments.

8.  Congestion Control

   The RTP congestion control requirements defined in [RFC3550] apply to
   RTP MIDI sessions, and implementors should carefully read the
   congestion control section in [RFC3550].  As noted in [RFC3550], all
   transport protocols used on the Internet need to address congestion
   control in some way, and RTP is not an exception.

   In addition, the congestion control requirements defined in [RFC3551]
   apply to RTP MIDI sessions run under applicable profiles.  The
   basic congestion control requirement defined in [RFC3551] is that RTP
   sessions that use UDP transport should monitor packet loss (via RTCP
   or other means) to ensure that the RTP stream competes fairly with
   TCP flows that share the network.

   Finally, RTP MIDI has congestion control issues that are unique for
   an audio RTP payload format.  In applications such as network musical
   performance [NMP], the packet rate is linked to the gestural rate of
   a human performer.  Senders MUST monitor the MIDI command source for
   patterns that result in excessive packet rates and take actions
   during RTP transcoding to reduce the RTP packet rate.  [RFC4696]
   offers implementation guidance on this issue.
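
   One possible way to detect an excessive packet rate is a sliding
   window count, sketched below in Python.  The class name, the limit,
   and the window length are assumptions made for illustration; this
   memo does not define them.  The sketch only detects the condition;
   what the sender does with the held commands is left open here.

```python
from collections import deque


class PacketRateMonitor:
    """Permit at most max_packets packets in any window_seconds span."""

    def __init__(self, max_packets: int, window_seconds: float):
        self.max_packets = max_packets
        self.window_seconds = window_seconds
        self._sent = deque()  # send times of recently permitted packets

    def allow(self, now: float) -> bool:
        # Forget send times that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_packets:
            self._sent.append(now)
            return True
        return False  # caller should hold the commands for a later packet
```

   A sender would call allow() with the current time before it emits a
   packet, and would delay the packet when the call returns False.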

9.  Security Considerations

   Implementors should carefully read the Security Considerations
   sections of the RTP [RFC3550], AVP [RFC3551], and other RTP profile
   documents, as the issues discussed in these sections directly apply
   to RTP MIDI streams.  Implementors should also review the Secure
   Real-time Transport Protocol (SRTP, [RFC3711]), an RTP profile that
   addresses the security issues discussed in [RFC3550] and [RFC3551].

   Here, we discuss security issues that are unique to the RTP MIDI
   payload format.

   When using RTP MIDI, authentication of incoming RTP and RTCP packets
   is RECOMMENDED.  Per-packet authentication may be provided by SRTP or
   by other means.  Without the use of authentication, attackers could
   forge MIDI commands into an ongoing stream, damaging speakers and
   eardrums.  An attacker could also craft RTP and RTCP packets to
   exploit known bugs in the client and take effective control of a
   client machine.

   Session management tools (such as SIP [RFC3261]) SHOULD use
   authentication during the transport of all session descriptions
   containing RTP MIDI media streams.  For SIP, the Security
   Considerations section in [RFC3261] provides an overview of possible
   authentication mechanisms.  RTP MIDI session descriptions should use
   authentication because the session descriptions may code
   initialization data using the parameters described in Appendix C.  If
   an attacker inserts bogus initialization data into a session
   description, the attacker can corrupt the session or mount an
   attack on the client.

   Session descriptions may also code renderer initialization data by
   reference, via the url (Appendix C.6.3) and smf_url (Appendix
   C.6.4.2) parameters.  If the coded URL is spoofed, both session and
   client are open to attack, even if the session description itself is
   authenticated.  Therefore, URLs specified in url and smf_url
   parameters SHOULD use HTTP over TLS [RFC2818].
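
   [RFC2818] describes HTTP over TLS, so a receiver can apply a simple
   scheme check before it fetches initialization data.  The Python
   sketch below is one way to do so; the function name is hypothetical,
   and the check covers the URL scheme only, not certificate
   validation.

```python
from urllib.parse import urlsplit


def uses_https(url: str) -> bool:
    """Return True if the URL names the https scheme."""
    return urlsplit(url).scheme.lower() == "https"
```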

   Section 2.1 allows streams sent by a party in two RTP sessions to
   have the same SSRC value and the same RTP timestamp initialization
   value, under certain circumstances.  Normally, these values are
   randomly chosen for each stream in a session, to make plaintext
   guessing harder to do if the payloads are encrypted.  Thus, Section
   2.1 weakens this aspect of RTP security.

10.  Acknowledgements

   We thank the networking, media compression, and computer music
   community members who have commented or contributed to the effort,
   including Kurt B, Cynthia Bruyns, Steve Casner, Paul Davis, Robin
   Davies, Joanne Dow, Tobias Erichsen, Nicolas Falquet, Dominique
   Fober, Philippe Gentric, Michael Godfrey, Chris Grigg, Todd Hager,
   Michel Jullian, Phil Kerr, Young-Kwon Lim, Jessica Little, Jan van
   der Meer, Colin Perkins, Charlie Richmond, Herbie Robinson, Larry
   Rowe, Eric Scheirer, Dave Singer, Martijn Sipkema, William Stewart,
   Kent Terry, Magnus Westerlund, Tom White, Jim Wright, Doug Wyatt, and
   Giorgio Zoia.  We also thank the members of the San Francisco Bay
   Area music and audio community for creating the context for the work,
   including Don Buchla, Chris Chafe, Richard Duda, Dan Ellis, Adrian
   Freed, Ben Gold, Jaron Lanier, Roger Linn, Richard Lyon, Dana Massie,
   Max Mathews, Keith McMillen, Carver Mead, Nelson Morgan, Tom
   Oberheim, Malcolm Slaney, Dave Smith, Julius Smith, David Wessel, and
   Matt Wright.

11.  IANA Considerations

   This section makes a series of requests to IANA.  The IANA has
   completed registration/assignments of the below requests.

   The sub-sections that follow hold the actual, detailed requests.  All
   registrations in this section are in the IETF tree and follow the
   rules of [RFC4288] and [RFC3555], as appropriate.

   In Section 11.1, we request the registration of a new media type:
   "audio/rtp-midi".  Paired with this request is a request for a
   repository for new values for several parameters associated with
   "audio/rtp-midi".  We request this repository in Section 11.1.1.

   In Section 11.2, we request the registration of a new value ("rtp-
   midi") for the "mode" parameter of the "mpeg4-generic" media type.
   The "mpeg4-generic" media type is defined in [RFC3640], and [RFC3640]
   defines a repository for the "mode" parameter.  However, we believe
   we are the first to request the registration of a "mode" value, so we
   believe the registry for "mode" has not yet been created by IANA.

   Paired with our "mode" parameter value request for "mpeg4-generic" is
   a request for a repository for new values for several parameters we
   have defined for use with the "rtp-midi" mode value.  We request this
   repository in Section 11.2.1.

   In Section 11.3, we request the registration of a new media type:
   "audio/asc".  No repository request is associated with this request.

11.1.  rtp-midi Media Type Registration

   This section requests the registration of the "rtp-midi" subtype for
   the "audio" media type.  We request the registration of the
   parameters listed in the "optional parameters" section below (both
   the "non-extensible parameters" and the "extensible parameters"
   lists).  We also request the creation of repositories for the
   "extensible parameters"; the details of this request appear in
   Section 11.1.1, below.

   Media type name:

       audio

   Subtype name:

       rtp-midi

   Required parameters:

       rate: The RTP timestamp clock rate.  See Sections 2.1 and 6.1
       for usage details.

   Optional parameters:

       Non-extensible parameters:

          ch_anchor:    See Appendix C.2.3 for usage details.
          ch_default:   See Appendix C.2.3 for usage details.
          ch_never:     See Appendix C.2.3 for usage details.
          cm_unused:    See Appendix C.1 for usage details.
          cm_used:      See Appendix C.1 for usage details.
          chanmask:     See Appendix C.6.4.3 for usage details.
          cid:          See Appendix C.6.3 for usage details.
          guardtime:    See Appendix C.4.2 for usage details.
          inline:       See Appendix C.6.3 for usage details.
          linerate:     See Appendix C.3 for usage details.
          mperiod:      See Appendix C.3 for usage details.
          multimode:    See Appendix C.6.1 for usage details.
          musicport:    See Appendix C.5 for usage details.
          octpos:       See Appendix C.3 for usage details.
          rinit:        See Appendix C.6.3 for usage details.
          rtp_maxptime: See Appendix C.4.1 for usage details.
          rtp_ptime:    See Appendix C.4.1 for usage details.
          smf_cid:      See Appendix C.6.4.2 for usage details.
          smf_inline:   See Appendix C.6.4.2 for usage details.
          smf_url:      See Appendix C.6.4.2 for usage details.
          tsmode:       See Appendix C.3 for usage details.
          url:          See Appendix C.6.3 for usage details.

       Extensible parameters:

          j_sec:        See Appendix C.2.1 for usage details.  See
                        Section 11.1.1 for repository details.
          j_update:     See Appendix C.2.2 for usage details.  See
                        Section 11.1.1 for repository details.
          render:       See Appendix C.6 for usage details.  See
                        Section 11.1.1 for repository details.
          subrender:    See Appendix C.6.2 for usage details.  See
                        Section 11.1.1 for repository details.
          smf_info:     See Appendix C.6.4.1 for usage details.  See
                        Section 11.1.1 for repository details.

   Encoding considerations:

       The format for this type is framed and binary.

   Restrictions on usage:

       This type is only defined for real-time transfers of MIDI
       streams via RTP.  Stored-file semantics for rtp-midi may
       be defined in the future.

   Security considerations:

       See Section 9 of this memo.

   Interoperability considerations:

       None.

   Published specification:

       This memo and [MIDI] serve as the normative specification.  In
       addition, references [NMP], [GRAME], and [RFC4696] provide
       non-normative implementation guidance.

   Applications that use this media type:

       Audio content-creation hardware, such as MIDI controller piano
       keyboards and MIDI audio synthesizers.  Audio content-creation
       software, such as music sequencers, digital audio workstations,
       and soft synthesizers.  Computer operating systems, for network
       support of MIDI Application Programmer Interfaces.  Content
       distribution servers and terminals may use this media type for
       low bit-rate music coding.

   Additional information:

       None.

   Person & email address to contact for further information:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Intended usage:

       COMMON.

   Author:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Change controller:

       IETF Audio/Video Transport Working Group delegated
       from the IESG.

11.1.1.  Repository Request for "audio/rtp-midi"

   For the "rtp-midi" subtype, we request the creation of repositories
   for extensions to the following parameters (which are those listed as
   "extensible parameters" in Section 11.1).

      j_sec:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.1
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "none":  Defined in Appendix C.2.1 of this memo.
         "recj":  Defined in Appendix C.2.1 of this memo.

      j_update:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.2
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "anchor":  Defined in Appendix C.2.2 of this memo.
         "open-loop":  Defined in Appendix C.2.2 of this memo.
         "closed-loop":  Defined in Appendix C.2.2 of this memo.

      render:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in the preamble of Appendix C.6 for details
         (the paragraph that begins "Other render token ...").

         Initial values for this repository appear below:

         "unknown":  Defined in Appendix C.6 of this memo.
         "synthetic":  Defined in Appendix C.6 of this memo.
         "api":  Defined in Appendix C.6 of this memo.
         "null":  Defined in Appendix C.6 of this memo.

      subrender:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.2 for details (the
         paragraph that begins "Other subrender token ...").

         Initial values for this repository appear below:

         "default":  Defined in Appendix C.6.2 of this memo.

      smf_info:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.4.1 for details (the
         paragraph that begins "Other smf_info token ...").

         Initial values for this repository appear below:

         "ignore":  Defined in Appendix C.6.4.1 of this memo.
         "sdp_start":  Defined in Appendix C.6.4.1 of this memo.
         "identity":  Defined in Appendix C.6.4.1 of this memo.

11.2.  mpeg4-generic Media Type Registration

   This section requests the registration of the "rtp-midi" value for
   the "mode" parameter of the "mpeg4-generic" media type.  The "mpeg4-
   generic" media type is defined in [RFC3640], and [RFC3640] defines a
   repository for the "mode" parameter.  We are registering mode rtp-
   midi to support the MPEG Audio codecs [MPEGSA] that use MIDI.

   In conjunction with this registration request, we request the
   registration of the parameters listed in the "optional parameters"
   section below (both the "non-extensible parameters" and the
   "extensible parameters" lists).  We also request the creation of
   repositories for the "extensible parameters"; the details of this
   request appear in Section 11.2.1, below.

   Media type name:

       audio

   Subtype name:

       mpeg4-generic

   Required parameters:

       The "mode" parameter is required by [RFC3640].  [RFC3640]
       requests a repository for "mode", so that new values for mode
       may be added.  We request that the value "rtp-midi" be
       added to the "mode" repository.

       In mode rtp-midi, the mpeg4-generic parameter rate is
       a required parameter.  Rate specifies the RTP timestamp
       clock rate.  See Sections 2.1 and 6.2 for usage details
       of rate in mode rtp-midi.

   Optional parameters:

       We request registration of the following parameters
       for use in mode rtp-midi for mpeg4-generic.

       Non-extensible parameters:

          ch_anchor:    See Appendix C.2.3 for usage details.
          ch_default:   See Appendix C.2.3 for usage details.
          ch_never:     See Appendix C.2.3 for usage details.
          cm_unused:    See Appendix C.1 for usage details.
          cm_used:      See Appendix C.1 for usage details.
          chanmask:     See Appendix C.6.4.3 for usage details.
          cid:          See Appendix C.6.3 for usage details.
          guardtime:    See Appendix C.4.2 for usage details.
          inline:       See Appendix C.6.3 for usage details.
          linerate:     See Appendix C.3 for usage details.
          mperiod:      See Appendix C.3 for usage details.
          multimode:    See Appendix C.6.1 for usage details.
          musicport:    See Appendix C.5 for usage details.
          octpos:       See Appendix C.3 for usage details.
          rinit:        See Appendix C.6.3 for usage details.
          rtp_maxptime: See Appendix C.4.1 for usage details.
          rtp_ptime:    See Appendix C.4.1 for usage details.
          smf_cid:      See Appendix C.6.4.2 for usage details.
          smf_inline:   See Appendix C.6.4.2 for usage details.
          smf_url:      See Appendix C.6.4.2 for usage details.
          tsmode:       See Appendix C.3 for usage details.
          url:          See Appendix C.6.3 for usage details.

       Extensible parameters:

          j_sec:        See Appendix C.2.1 for usage details.  See
                        Section 11.2.1 for repository details.
          j_update:     See Appendix C.2.2 for usage details.  See
                        Section 11.2.1 for repository details.
          render:       See Appendix C.6 for usage details.  See
                        Section 11.2.1 for repository details.
          subrender:    See Appendix C.6.2 for usage details.  See
                        Section 11.2.1 for repository details.
          smf_info:     See Appendix C.6.4.1 for usage details.  See
                        Section 11.2.1 for repository details.

   Encoding considerations:

       The format for this type is framed and binary.

   Restrictions on usage:

       Only defined for real-time transfers of audio/mpeg4-generic
       RTP streams with mode=rtp-midi.

   Security considerations:

       See Section 9 of this memo.

   Interoperability considerations:

       Except for the marker bit (Section 2.1), the packet formats
       for audio/rtp-midi and audio/mpeg4-generic (mode rtp-midi)
       are identical.  The formats differ in use: audio/mpeg4-generic
       is for MPEG work, and audio/rtp-midi is for all other work.

   Published specification:

       This memo, [MIDI], and [MPEGSA] are the normative references.
       In addition, references [NMP], [GRAME], and [RFC4696] provide
       non-normative implementation guidance.

   Applications that use this media type:

       MPEG 4 servers and terminals that support [MPEGSA].

   Additional information:

       None.

   Person & email address to contact for further information:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Intended usage:

       COMMON.

   Author:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Change controller:

       IETF Audio/Video Transport Working Group delegated
       from the IESG.

11.2.1.  Repository Request for Mode rtp-midi for mpeg4-generic

   For mode rtp-midi of the mpeg4-generic subtype, we request the
   creation of repositories for extensions to the following parameters
   (which are those listed as "extensible parameters" in Section 11.2).

      j_sec:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.1
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "none":  Defined in Appendix C.2.1 of this memo.
         "recj":  Defined in Appendix C.2.1 of this memo.

      j_update:

         Registrations for this repository may only occur
         via an IETF standards-track document.  Appendix C.2.2
         of this memo describes appropriate registrations for this
         repository.

         Initial values for this repository appear below:

         "anchor":  Defined in Appendix C.2.2 of this memo.
         "open-loop":  Defined in Appendix C.2.2 of this memo.
         "closed-loop":  Defined in Appendix C.2.2 of this memo.

      render:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in the preamble of Appendix C.6 for details
         (the paragraph that begins "Other render token ...").

         Initial values for this repository appear below:

         "unknown":  Defined in Appendix C.6 of this memo.
         "synthetic":  Defined in Appendix C.6 of this memo.
         "null":  Defined in Appendix C.6 of this memo.

      subrender:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.2 for details (the paragraph
         that begins "Other subrender token ..." and
         subsequent paragraphs).  Note that the text in
         Appendix C.6.2 contains restrictions on subrender
         registrations for mpeg4-generic ("Registrations
         for mpeg4-generic subrender values ...").

         Initial values for this repository appear below:

         "default":  Defined in Appendix C.6.2 of this memo.

      smf_info:

         Registrations for this repository MUST include a
         specification of the usage of the proposed value.
         See text in Appendix C.6.4.1 for details (the
         paragraph that begins "Other smf_info token ...").

         Initial values for this repository appear below:

         "ignore":  Defined in Appendix C.6.4.1 of this memo.
         "sdp_start":  Defined in Appendix C.6.4.1 of this memo.
         "identity":  Defined in Appendix C.6.4.1 of this memo.

11.3.  asc Media Type Registration

   This section registers "asc" as a subtype for the "audio" media type.
   We register this subtype to support the remote transfer of the
   "config" parameter of the mpeg4-generic media type [RFC3640] when it
   is used with mpeg4-generic mode rtp-midi (registered in Section 11.2
   above).  We explain the mechanics of using "audio/asc" to set the
   config parameter in Section 6.2 and Appendix C.6.5 of this document.

   Note that this registration is a new subtype registration and is not
   an addition to a repository defined by MPEG-related memos (such as
   [RFC3640]).  Also note that this request for "audio/asc" does not
   register parameters, and does not request the creation of a
   repository.

   Media type name:

       audio

   Subtype name:

       asc

   Required parameters:

       None.

   Optional parameters:

       None.

   Encoding considerations:

       The native form of the data object is binary data,
       zero-padded to an octet boundary.

   Restrictions on usage:

       This type is only defined for data object (stored file)
       transfer.  The most common transports for the type are
       HTTP and SMTP.

   Security considerations:

       See Section 9 of this memo.

   Interoperability considerations:

       None.

   Published specification:

       The audio/asc data object is the AudioSpecificConfig
       binary data structure, which is normatively defined in
       [MPEGAUDIO].

   Applications that use this media type:

       MPEG 4 Audio servers and terminals that support
       audio/mpeg4-generic RTP streams for mode rtp-midi.

   Additional information:

       None.

   Person & email address to contact for further information:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Intended usage:

       COMMON.

   Author:

       John Lazzaro <lazzaro@cs.berkeley.edu>

   Change controller:

       IETF Audio/Video Transport Working Group delegated
       from the IESG.

A.  The Recovery Journal Channel Chapters

A.1.  Recovery Journal Definitions

   This appendix defines the terminology and the coding idioms that are
   used in the recovery journal bitfield descriptions in Section 5
   (journal header structure), Appendices A.2 to A.9 (channel journal
   chapters) and Appendices B.1 to B.5 (system journal chapters).

   We assume that the recovery journal resides in the journal section of
   an RTP packet with sequence number I ("packet I") and that the
   Checkpoint Packet Seqnum field in the top-level recovery journal
   header refers to a previous packet with sequence number C (an
   exception is the self-referential C = I case).  Unless stated
   otherwise, algorithms are assumed to use modulo 2^16 arithmetic for
   calculations on 16-bit sequence numbers and modulo 2^32 arithmetic
   for calculations on 32-bit extended sequence numbers.
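
   As an informal illustration of the modular arithmetic described
   above, the following Python sketch counts and lists the packets from
   C through I - 1 across a sequence number wraparound.  The function
   names are hypothetical and are not defined by this memo, and the
   sketch assumes that packet C precedes packet I by fewer than 2^16
   packets.

```python
from typing import List

SEQ_MOD = 1 << 16      # 16-bit RTP sequence numbers
EXT_MOD = 1 << 32      # 32-bit extended sequence numbers


def seq_distance(c: int, i: int, modulus: int = SEQ_MOD) -> int:
    """Return (i - c) modulo `modulus`: the number of packets from C up
    to, but not including, packet I."""
    return (i - c) % modulus


def checkpoint_seqnums(c: int, i: int) -> List[int]:
    """List the 16-bit sequence numbers C, C+1, ..., I-1 (empty if C == I)."""
    return [(c + k) % SEQ_MOD for k in range(seq_distance(c, i))]
```

   For example, with C = 65534 and I = 2, the span covers packets
   65534, 65535, 0, and 1.  In the self-referential case C = I, the
   span is empty.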

   Several bitfield coding idioms appear throughout the recovery journal
   system, with consistent semantics.  Most recovery journal elements
   begin with an "S" (Single-packet loss) bit.  S bits are designed to
   help receivers efficiently parse through the recovery journal
   hierarchy in the common case of the loss of a single packet.

   As a rule, S bits MUST be set to 1.  However, an exception applies if
   a recovery journal element in packet I encodes data about a command
   stored in the MIDI command section of packet I - 1.  In this case,
   the S bit of the recovery journal element MUST be set to 0.  If a
   recovery journal element has its S bit set to 0, all higher-level
   recovery journal elements that contain it MUST also have S bits that
   are set to 0, including the top-level recovery journal header.
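
   The S bit rule can be pictured as a bottom-up pass over the journal
   hierarchy.  The Python sketch below is a simplified model, not an
   implementation of the wire format: the JournalElement class and its
   fields are hypothetical names, and individual chapters may define
   additional S bit semantics that this sketch does not cover.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JournalElement:
    """Hypothetical node in the recovery journal hierarchy."""
    name: str
    codes_prev_packet: bool = False   # encodes a command from packet I - 1
    children: List["JournalElement"] = field(default_factory=list)
    s_bit: int = 1


def assign_s_bits(element: JournalElement) -> int:
    """Set S bits bottom-up and return the S bit of `element`."""
    child_bits = [assign_s_bits(child) for child in element.children]
    # An element is cleared if it codes packet I - 1 data itself, or if
    # any element it contains has an S bit of 0.
    clear = element.codes_prev_packet or any(bit == 0 for bit in child_bits)
    element.s_bit = 0 if clear else 1
    return element.s_bit
```

   A receiver that has lost only packet I - 1 can then skip any element
   whose S bit is 1 without descending into it.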

   Other consistent bitfield coding idioms are described below:

     o R flag bit.  R flag bits are reserved for future use.  Senders
       MUST set R bits to 0.  Receivers MUST ignore R bit values.

     o LENGTH field.  All fields named LENGTH (as distinct from LEN)
       code the number of octets in the structure that contains it,
       including the header it resides in and all hierarchical levels
       below it.  If a structure contains a LENGTH field, a receiver
       MUST use the LENGTH field value to advance past the structure
       during parsing, rather than use knowledge about the internal
       format of the structure.
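
   The LENGTH rule above can be sketched as a parsing loop that never
   inspects the interior of a structure.  In the Python sketch below,
   the width and position of the LENGTH field are supplied by the
   caller, because they differ between structures.  The low10_length
   helper uses a layout assumed purely for illustration (LENGTH in the
   low 10 bits of the first two octets); consult the bitfield diagrams
   for the layout of each real structure.

```python
from typing import Callable, List


def split_structures(buf: bytes,
                     read_length: Callable[[bytes, int], int]) -> List[bytes]:
    """Split `buf` into consecutive structures using only LENGTH fields.

    `read_length(buf, offset)` is a caller-supplied (hypothetical) helper
    that returns the LENGTH value of the structure starting at `offset`.
    """
    out = []
    offset = 0
    while offset < len(buf):
        length = read_length(buf, offset)
        # LENGTH counts the header itself, so it can never be zero, and
        # the structure must fit inside the buffer.
        if length <= 0 or offset + length > len(buf):
            raise ValueError("malformed LENGTH field")
        out.append(buf[offset:offset + length])
        offset += length          # advance without inspecting the contents
    return out


def low10_length(buf: bytes, offset: int) -> int:
    """Assumed layout for illustration: LENGTH is the low 10 bits of the
    first two octets of the structure."""
    if offset + 2 > len(buf):
        raise ValueError("truncated header")
    return ((buf[offset] << 8) | buf[offset + 1]) & 0x03FF
```

   Because the loop relies only on LENGTH, it keeps working if a
   structure carries content that the receiver does not understand.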

   We now define normative terms used to describe recovery journal
   semantics.

     o Checkpoint history.  The checkpoint history of a recovery journal
       is the concatenation of the MIDI command sections of packets C
       through I - 1.  The final command in the MIDI command section for
       packet I - 1 is considered the most recent command; the first
       command in the MIDI command section for packet C is the oldest
       command.  If command X is less recent than command Y, X is
       considered to be "before Y".  A checkpoint history with no
       commands is considered to be empty.  The checkpoint history never
       contains the MIDI command section of packet I (the packet
       containing the recovery journal), so if C == I, the checkpoint
       history is empty by definition.

     o Session history.  The session history of a recovery journal is
       the concatenation of MIDI command sections from the first packet
       of the session up to packet I - 1.  The definitions of command
       recency and history emptiness follow those in the checkpoint
       history.  The session history never contains the MIDI command
       section of packet I, and so the session history of the first
       packet in the session is empty by definition.

     o Finished/unfinished commands.  If all octets of a MIDI command
       appear in the session history, the command is defined as being
       finished.  If some but not all octets of a command appear in the
       session history, the command is defined as being unfinished.
       Unfinished commands occur if segments of a SysEx command appear
       in several RTP packets.  For example, if a SysEx command is coded
       as 3 segments, with segment 1 in packet K, segment 2 in packet K
       + 1, and segment 3 in packet K + 2, the session histories for
       packets K + 1 and K + 2 contain unfinished versions of the
       command.  A session history contains a finished version of a
       cancelled SysEx command if the history contains the cancel
       sublist for the command.

     o Reset State commands.  Reset State (RS) commands reset renderers
       to an initialized "powerup" condition.  The RS commands are:
       System Reset (0xFF), General MIDI System Enable (0xF0 0x7E 0xcc
       0x09 0x01 0xF7), General MIDI 2 System Enable (0xF0 0x7E 0xcc
       0x09 0x03 0xF7), General MIDI System Disable (0xF0 0x7E 0xcc 0x09
       0x00 0xF7), Turn DLS On (0xF0 0x7E 0xcc 0x0A 0x01 0xF7), and Turn
       DLS Off (0xF0 0x7E 0xcc 0x0A 0x02 0xF7).  Registrations of
       subrender parameter token values (Appendix C.6.2) and IETF
       standards-track documents MAY specify additional RS commands.

     o Active commands.  Active commands are MIDI commands that do
       not appear before a Reset State command in the session history.

     o N-active commands.  N-active commands are MIDI commands that do
       not appear before one of the following commands in the session
       history:  MIDI Control Change numbers 123-127 (numbers with All
       Notes Off semantics) or 120 (All Sound Off), and any Reset State
       command.

     o C-active commands.  C-active commands are MIDI commands that do
       not appear before one of the following commands in the session
       history:  MIDI Control Change number 121 (Reset All Controllers)
       and any Reset State command.

     o Oldest-first ordering rule.  Several recovery journal chapters
       contain a list of elements, where each element is associated with
       a MIDI command that appears in the session history.  In most
       cases, the chapter definition requires that list elements be
       ordered in accordance with the "oldest-first ordering rule".
       Below, we normatively define this rule:

       Elements associated with the most recent command in the session
       history coded in the list MUST appear at the end of the list.

       Elements associated with the oldest command in the session
       history coded in the list MUST appear at the start of the list.

       All other list elements MUST be arranged with respect to these
       boundary elements, to produce a list ordering that strictly
       reflects the relative session history recency of the commands
       coded by the elements in the list.

     o Parameter system.  A MIDI feature that provides two sets of
       16,384 parameters to expand the 0-127 controller number space.
       The Registered Parameter Names (RPN) system and the Non-
       Registered Parameter Names (NRPN) system each provide 16,384
       parameters.

     o Parameter system transaction.  The values of RPNs and NRPNs are
       changed by a series of Control Change commands that form a
       parameter system transaction.  A canonical transaction begins
       with two Control Change commands to set the parameter number
       (controller numbers 99 and 98 for NRPNs, controller numbers 101
       and 100 for RPNs).  The transaction continues with an arbitrary
       number of Data Entry (controller numbers 6 and 38), Data
       Increment (controller number 96), and Data Decrement (controller
       number 97) Control Change commands to set the parameter value.
       The transaction ends with a second pair of (99, 98) or (101, 100)
       Control Change commands that specify the null parameter (MSB
       value 0x7F, LSB value 0x7F).

       Several variants of the canonical transaction sequence are
       possible.  Most commonly, the terminal pair of (99, 98) or (101,
       100) Control Change commands may specify a parameter other than
       the null parameter.  In this case, the command pair terminates
       the first transaction and starts a second transaction.  The
       command pair is considered to be a part of both transactions.
       This variant is legal and recommended in [MIDI].  We refer to
       this variant as a "type 1 variant".

       Less commonly, the MSB (99 or 101) or LSB (98 or 100) command of
       a (99, 98) or (101, 100) Control Change pair may be omitted.

       If the MSB command is omitted, the transaction uses the MSB value
       of the most recent C-active Control Change command for controller
       number 99 or 101 that appears in the session history.  We refer
       to this variant as a "type 2 variant".

       If the LSB command is omitted, the LSB value 0x00 is assumed.  We
       refer to this variant as a "type 3 variant".  The type 2 and type
       3 variants are defined as legal, but are not recommended, in
       [MIDI].

       System real-time commands may appear at any point during a
       transaction (even between octets of individual commands in the
       transaction).  More generally, [MIDI] does not forbid the
       appearance of unrelated MIDI commands during an open transaction.
       As a rule, these commands are considered to be "outside" the
       transaction and do not affect the status of the transaction in
       any way.  Exceptions to this rule are commands whose semantics
       act to terminate transactions:  Reset State commands, and Control
       Change (0xB) for controller number 121 (Reset All Controllers)
       [RP015].

     o Initiated parameter system transaction.  A canonical parameter
       system transaction whose (99, 98) or (101, 100) initial Control
       Change command pair appears in the session history is considered
       to be an initiated parameter system transaction.  This definition
       also holds for type 1 variants.  For type 2 variants (dropped
       MSB), a transaction whose initial LSB Control Change command
       appears in the session history is an initiated transaction.  For
       type 3 variants (dropped LSB), a transaction is considered to be
       initiated if at least one transaction command follows the initial
       MSB (99 or 101) Control Change command in the session history.
       The completion of a transaction does not nullify its "initiated"
       status.

     o Session history reference counts.  Several recovery journal
       chapters include a reference count field, which codes the total
       number of commands of a type that appear in the session history.
       Examples include the Reset and Tune Request command logs (Chapter
       D, Appendix B.1) and the Active Sense command (Chapter V,
       Appendix B.2).  Upon the detection of a loss event, reference
       count fields let a receiver deduce if any instances of the
       command have been lost, by comparing the journal reference count
       with its own reference count.  Thus, a reference count field
       makes sense, even for command types in which knowing the NUMBER
       of lost commands is irrelevant (as is true with all of the
       example commands mentioned above).
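
   The definitions of active, N-active, and C-active commands given
   above can be checked with a single backward scan of the session
   history.  The Python sketch below is a non-normative illustration
   under several assumptions: the history is modeled as a list of
   complete MIDI commands (one bytes object per command), the 0xcc
   octet in the Reset State SysEx commands is treated as a wildcard,
   and any additional Reset State commands defined by registrations
   are not recognized.  All function names are hypothetical.

```python
from typing import List, Optional, Tuple

RS_SYSEX_TAILS = {
    (0x09, 0x01),  # General MIDI System Enable
    (0x09, 0x03),  # General MIDI 2 System Enable
    (0x09, 0x00),  # General MIDI System Disable
    (0x0A, 0x01),  # Turn DLS On
    (0x0A, 0x02),  # Turn DLS Off
}


def is_reset_state(cmd: bytes) -> bool:
    """True for the Reset State commands listed in this appendix."""
    if cmd == b"\xFF":                      # System Reset
        return True
    return (len(cmd) == 6 and cmd[0] == 0xF0 and cmd[1] == 0x7E
            and cmd[5] == 0xF7 and (cmd[3], cmd[4]) in RS_SYSEX_TAILS)


def control_number(cmd: bytes, channel: int) -> Optional[int]:
    """Controller number if `cmd` is a Control Change on `channel`."""
    if len(cmd) == 3 and cmd[0] == (0xB0 | channel):
        return cmd[1]
    return None


def classify(history: List[bytes],
             channel: int) -> List[Tuple[bool, bool, bool]]:
    """Return (active, n_active, c_active) per command, oldest first."""
    flags = []
    seen_rs = seen_n = seen_c = False       # terminators seen later on
    for cmd in reversed(history):
        flags.append((not seen_rs,
                      not (seen_rs or seen_n),
                      not (seen_rs or seen_c)))
        number = control_number(cmd, channel)
        if is_reset_state(cmd):
            seen_rs = True
        elif number is not None and (number == 120 or 123 <= number <= 127):
            seen_n = True
        elif number == 121:
            seen_c = True
    flags.reverse()
    return flags
```

   The scan runs from the most recent command to the oldest, so each
   command is judged only against the terminating commands that follow
   it in the session history.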

   The chapter definitions in Appendices A.2 to A.9 and B.1 to B.5
   reflect the default recovery journal behavior.  The ch_default,
   ch_never, and ch_anchor parameters modify these definitions, as
   described in Appendix C.2.3.

   The chapter definitions specify if data MUST be present in the
   journal.  Senders MAY also include non-required data in the journal.
   This optional data MUST comply with the normative chapter definition.
   For example, if a chapter definition states that a field codes data
   from the most recent active command in the session history, the
   sender MUST NOT code inactive commands or older commands in the
   field.

   Finally, we note that a channel journal only encodes information
   about MIDI commands appearing on the MIDI channel the journal
   protects.  All references to MIDI commands in Appendices A.2 to A.9
   should be read as "MIDI commands appearing on this channel."

A.2.  Chapter P: MIDI Program Change

   A channel journal MUST contain Chapter P if an active Program Change
   (0xC) command appears in the checkpoint history.  Figure A.2.1 shows
   the format for Chapter P.

                0                   1                   2
                0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               |S|   PROGRAM   |B|   BANK-MSB  |X|  BANK-LSB   |
               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure A.2.1 -- Chapter P format

   The chapter has a fixed size of 24 bits.  The PROGRAM field indicates
   the data value of the most recent active Program Change command in
   the session history.  By default, the B, BANK-MSB, X, and BANK-LSB
   fields MUST be set to 0.  Below, we define exceptions to this default
   condition.

   If an active Control Change (0xB) command for controller number 0
   (Bank Select MSB) appears before the Program Change command in the
   session history, the B bit MUST be set to 1, and the BANK-MSB field
   MUST code the data value of the Control Change command.

   If B is set to 1, the BANK-LSB field MUST code the data value of the
   most recent Control Change command for controller number 32 (Bank
   Select LSB) that preceded the Program Change command coded in the
   PROGRAM field and followed the Control Change command coded in the
   BANK-MSB field.  If no such Control Change command exists, the BANK-
   LSB field MUST be set to 0.

   If B is set to 1, and if a Control Change command for controller
   number 121 (Reset All Controllers) appears in the MIDI stream between
   the Control Change command coded by the BANK-MSB field and the
   Program Change command coded by the PROGRAM field, the X bit MUST be
   set to 1.

   Note that [RP015] specifies that Reset All Controllers does not reset
   the values of controller numbers 0 (Bank Select MSB) and 32 (Bank
   Select LSB).  Thus, the X bit does not affect how receivers will use
   the BANK-LSB and BANK-MSB values when recovering from a lost Program
   Change command.  The X bit serves to aid recovery in MIDI
   applications where controller numbers 0 and 32 are used in a non-
   standard way.
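
   The 24-bit layout of Figure A.2.1 can be packed and unpacked as
   shown in the Python sketch below.  This is a non-normative
   illustration: the ChapterP type and the function names are
   hypothetical, and the sketch handles only the bit layout, not the
   rules above that decide which values a sender places in each field.
   Bit 0 in the figure is taken to be the most significant bit of the
   first octet.

```python
from typing import NamedTuple


class ChapterP(NamedTuple):
    s: int
    program: int
    b: int
    bank_msb: int
    x: int
    bank_lsb: int


def pack_chapter_p(ch: ChapterP) -> bytes:
    """Encode the 24-bit Chapter P layout of Figure A.2.1."""
    for value in (ch.program, ch.bank_msb, ch.bank_lsb):
        if not 0 <= value <= 0x7F:
            raise ValueError("7-bit field out of range")
    return bytes([
        ((ch.s & 1) << 7) | ch.program,     # S, PROGRAM
        ((ch.b & 1) << 7) | ch.bank_msb,    # B, BANK-MSB
        ((ch.x & 1) << 7) | ch.bank_lsb,    # X, BANK-LSB
    ])


def unpack_chapter_p(data: bytes) -> ChapterP:
    """Decode exactly three octets into the Chapter P fields."""
    if len(data) != 3:
        raise ValueError("Chapter P is exactly 3 octets")
    return ChapterP(data[0] >> 7, data[0] & 0x7F,
                    data[1] >> 7, data[1] & 0x7F,
                    data[2] >> 7, data[2] & 0x7F)
```

   In the default case with no Bank Select history, the B, BANK-MSB, X,
   and BANK-LSB fields are all 0, so the last two octets are zero.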

A.3.  Chapter C: MIDI Control Change

   Figure A.3.1 shows the format for Chapter C.

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |S|     LEN     |S|   NUMBER    |A|  VALUE/ALT  |S|   NUMBER    |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |A|  VALUE/ALT  |  ....                                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                       Figure A.3.1 -- Chapter C format

   The chapter consists of a 1-octet header, followed by a variable
   length list of 2-octet controller logs.  The list MUST contain at
   least one controller log.  The 7-bit LEN field codes the number of
   controller logs in the list, minus one.  We define the semantics of
   the controller log fields in Appendix A.3.2.
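
   The framing described above (a 1-octet header followed by LEN + 1
   two-octet logs) can be sketched in Python as follows.  The function
   name is hypothetical, and the sketch returns the raw 7-bit VALUE/ALT
   field without interpreting it, as that interpretation depends on the
   A bit semantics defined in Appendix A.3.2.

```python
from typing import List, Tuple

ControllerLog = Tuple[int, int, int, int]   # (S, NUMBER, A, VALUE/ALT)


def parse_chapter_c(data: bytes) -> Tuple[int, List[ControllerLog], int]:
    """Parse Chapter C starting at data[0].

    Returns (header S bit, controller logs, octets consumed).
    """
    if not data:
        raise ValueError("empty buffer")
    header_s = data[0] >> 7
    count = (data[0] & 0x7F) + 1            # LEN codes the log count minus one
    size = 1 + 2 * count
    if len(data) < size:
        raise ValueError("truncated Chapter C")
    logs = []
    for k in range(count):
        first, second = data[1 + 2 * k], data[2 + 2 * k]
        logs.append((first >> 7, first & 0x7F, second >> 7, second & 0x7F))
    return header_s, logs, size
```

   Because LEN stores the count minus one, a LEN value of 0 denotes a
   list with exactly one controller log, and the chapter can hold at
   most 128 logs.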

   A channel journal MUST contain Chapter C if the rules defined in this
   appendix require that one or more controller logs appear in the list.

A.3.1.  Log Inclusion Rules

   A controller log encodes information about a particular Control
   Change command in the session history.

   In the default use of the payload format, list logs MUST encode
   information about the most recent active command in the session
   history for a controller number.  Logs encoding earlier commands MUST
   NOT appear in the list.

   Also, as a rule, the list MUST contain a log for the most recent
   active command for a controller number that appears in the checkpoint
   history.  Below, we define exceptions to this rule:

     o  MIDI streams may transmit 14-bit controller values using paired
        Most Significant Byte (MSB, controller numbers 0-31, 99, 101)
        and Least Significant Byte (LSB, controller numbers 32-63, 98,
        100) Control Change commands [MIDI].

        If the most recent active Control Change command in the session
        history for a 14-bit controller pair uses the MSB number,
        Chapter C MAY omit the controller log for the most recent active
        Control Change command for the associated LSB number, as the
        command ordering makes this LSB value irrelevant.  However, this
        exception MUST NOT be applied if the sender is not certain that
        the MIDI source uses 14-bit semantics for the controller number
        pair.  Note that some MIDI sources ignore 14-bit controller
        semantics and use the LSB controller numbers as independent 7-
        bit controllers.

     o  If active Control Change commands for controller numbers 0 (Bank
        Select MSB) or 32 (Bank Select LSB) appear in the checkpoint
        history, and if the command instances are also coded in the
        BANK-MSB and BANK-LSB fields of Chapter P (Appendix A.2),
        Chapter C MAY omit the controller logs for the commands.

     o  Several controller number pairs are defined to be mutually
        exclusive.  Controller numbers 124 (Omni Off) and 125 (Omni On)
        form a mutually exclusive pair, as do controller numbers 126
        (Mono) and 127 (Poly).

        If active Control Change commands for one or both members of a
        mutually exclusive pair appear in the checkpoint history, a log
        for the controller number of the most recent command for the
        pair in the checkpoint history MUST appear in the controller
        list.  However, the list MAY omit the controller log for the
        most recent active command for the other number in the pair.

        If active Control Change commands for one or both members of a
        mutually exclusive pair appear in the session history, and if a
        log for the controller number of the most recent command for the
        pair does not appear in the controller list, a log for the most
        recent command for the other number of the pair MUST NOT appear
        in the controller list.

     o  If an active Control Change command for controller number 121
        (Reset All Controllers) appears in the session history, the
        controller list MAY omit logs for Control Change commands that
        precede the Reset All Controllers command in the session
        history, under certain conditions.

        Namely, a log MAY be omitted if the sender is certain that a
        command stream follows the Reset All Controllers semantics
        defined in [RP015], and if the log codes a controller number for
        which [RP015] specifies a reset value.

        For example, [RP015] specifies that controller number 1
        (Modulation Wheel) is reset to the value 0, and thus a
        controller log for Modulation Wheel MAY be omitted from the
        controller log list.  In contrast, [RP015] specifies that
        controller number