Sophie: libnut-devel-0-0.274.2mdv2008.1 x86

libnut-devel-0-0.274.2mdv2008.1.x86_64.rpm



!!! DRAFT DRAFT DRAFT !!!

DRAFT USAGE / SEMANTICS / RATIONALE SECTIONS FOR NUT SPEC


Overview of NUT

Unlike many popular containers, a NUT file can largely be viewed as a
byte stream, as opposed to having a global block structure. NUT files
consist of a sequence of packets, which can contain global headers,
file metadata, stream headers for the individual media streams,
optional index data to accelerate seeking, and, of course, the actual
encoded media frames. Aside from frames, all packets begin with a
64-bit startcode, the first byte of which is 0x4E, the ASCII character
'N'. In addition to identifying the type of packet to follow, these
startcodes (combined with CRC) allow for reliable resynchronization
when reading damaged or incomplete files. Packets have a common
structure that enables a process reading the file both to verify
packet contents and to bypass uninteresting packets without having to
be aware of the specific packet type.

In order to facilitate identification and playback of NUT files,
strict rules are imposed on the location and order of packets and
streams. Streams can be of class video, audio, subtitle, or
user-defined data. Additional classes may be added in a later version
of the NUT specification. Streams must be numbered consecutively
beginning from 0. This allows simple and compact reference to streams
in packet types where overhead must be kept to a minimum.

Packet Structure

Every NUT packet has a packet_header and a packet_footer.
The packet_header consists of a 64-bit startcode, a forward_ptr, and an
optional CRC on the packet header if the forward_ptr is larger than 4k.
The forward_ptr gives the size of the packet from after the packet_header
until the end of the packet_footer. The optional CRC is to prevent a
demuxer from seeing a damaged startcode and forward_ptr with a high size,
causing it to skip or buffer a large part of the file, only to find it is
not really a NUT packet.
The packet_footer consists of reserved_bytes, which is room for any
reserved fields which can be skipped by old demuxers, and a CRC covering
the packet from after the packet_header until the CRC itself.

Variable Length Coding

Almost all fields in NUT are coded using VLCs. VLCs allow for compact
storage of small integers, while still being extendible to infinitely
large integers. The syntax is of a VLC is, per byte, a 1 bit flag stating
if there are more bits to the integer, and 7 bits to be prepended to the
lsb of the integer, shifting the previous value to the left.
Stuffing is allowed in VLCs by adding 0x80 bytes before the actual value,
but a maximum of 8 bytes of stuffing is allowed in any VLC outside of a
NUT packet. The only such fields are forward_ptr and fields in frame
headers. This is to prevent demuxers from looping on a large of amount of
damaged bytes of 0x80. Fields inside a NUT packet are protected by a CRC
which can be checked before decoding.

Header Structure

A NUT file must begin with a magic identification string, followed by
the main header and a stream header for each stream, ordered by stream
id. No other packets may intervene between these header packets. For
robustness, a NUT file needs to include backup copies of the headers.
In the absence of valid headers at the beginning of the file,
processes attempting to read a NUT file are recommended to search for
backup headers beginning at each power-of-two byte offset in the file,
and before end of file.
Simple stop conditions are provided to ensure that this search
algorithm is bounded logarithmically in file length. This stop condition
is finding any valid NUT packet (such as a syncpoint) during the search,
as no packets are allowed between a search start until a reapeted header
set.

Metadata - Info Packets

The NUT main header and stream headers may be followed by metadata
"info" packets, which contain (mostly textual, but other formats are
possible) information on the file, on particular streams, or on
particular time intervals ("chapters") of the file, such as: title,
author, language, etc. One should note that info packets may occur at
other locations in a file, particulatly in a file that is being
generated/transmitted in real time; however, a process interpreting a
NUT file should not make any attempt to search for info packets except
in their usual location, i.e. following the headers. It is intended
that processes presenting the contents of a NUT file will make
automated responses to information stored in these packets, e.g.
selecting a subtitle language based on the user's preferred list of
languages, or providing a visual list of chapters to the user.
Therefore, the format of info packets and the data they are to contain
has been carefully specified and is aligned with International
Standards for language codes and so forth. For this reason it is also
important that info packets be stored in the correct locations, so
that processes making automated responses to these packets can operate
correctly.

Index

An index packet to facilitate O(1) seek-to-time operations may follow
the headers. If an index packet does exist here, it should be placed
after info packets, rather than before. Since the contents of the
index depend on knowing the complete contents of the file, most
processes generating NUT files are not expected to store an index with
the headers. This option is merely provided for applications where it
makes sense, to allow the index to be read without any seek operations
on the underlying media when it is available.

On the other hand, all NUT files except live streams (which have no
concept of "end of file") must include an index at the end of the
file, followed by a fixed-size 64-bit integer that is an offset
backwards from end-of-file at which the final index packet begins.
This is the only fixed-size field specified by NUT, and makes it
possible to locate an index stored at the end of the file without
resorting to unreliable heuristics.

Streams

A NUT file consists of one or more streams, intended to be presented
simultaneously in synchronization with one another. Use of streams as
independent entities is discouraged, and the nature of NUT's ordering
requirements on frames makes it highly disadvantageous to store
anything except the audio/video/subtitle/etc. components of a single
presentation together in a single NUT file. Nonlinear playback order,
scripting, and such are topics outside the scope of NUT, and should be
handled at a higher protocol layer should they be desired (for
example, using several NUT files with an external script file to
control their playback in combination).

With each stream, a single media encoding format is associated. The
stream headers convey properties of the encoding, such as video frame
dimensions, sample rates, and the compression standard ("codec") used
(if any). Stream headers may also carry with them an opaque, binary
object in a codec-specific format, containing global parameters for
the stream such as codebooks. Both the compression format and whatever
parameters are stored in the stream header (including NUT fields and
the opaque global header object) are constant for the duration of the
stream.

Each stream has a last_pts context. For compression, every frame's
pts is coded relatively to the last_pts. In order for demuxing to resume
from arbitrary points in the file, all last_pts contexes are reset by
syncpoints.

Frames

NUT is built on the model that video, audio, and subtitle streams all
consist of a sequence of "frames", where the specific definition of
frame is left partly to the codec, but should be roughly interpreted
as the smallest unit of data which can be decoded (not necessarily
independently; it may depend on previously-decoded frames) to a
complete presentation unit occupying an interval of time. In
particular, video frames correspond to the usual idea of a frame as a
picture that is displayed beginning at its assigned timestamp until it
is replaced by a subsequent picture with a later timestamp. Subtitle
frames should be thought of as individual subtitles in the case of
simple text-only streams, or as events that alter the presentation in
the case of more advanced subtitle formats. Audio frames are merely
intervals of samples; their length is determined by the compression
format used.

Frames need not be decoded in their presentation order. NUT allows for
arbitrary out-of-order frame systems, from classic MPEG-1-style B
frames to H.264 B pyramid and beyond, using a simple notion of "delay"
and an implicitly-determined "decode timestamp" (dts). Out-of-order
decoding is not limited to video streams; it is available to audio
streams as well, and, given the right conditions, even subtitle
streams, should a subtitle format choose to make use of such a
capability.

Central to NUT is the notion that EVERY frame has a timestamp. This
differs from other major container formats which allow timestamps to
be omitted for some or even most frames. The decision to explicitly
timestamp each frame allows for powerful high-level seeking and
editing in applications without any interaction with the codec level.
This makes it possible to develop applications which are completely
unaware of the codecs used, and allows applications which do need to
perform decoding to be more properly factored.

Keyframes

NUT defines a "key frame" as any frame such that the frame itself and
all subsequent (with regard to presentation time) frames of the stream
can be decoded successfully without reference to prior (with regard to
storage/decoding order) frames in the stream. This definition may
sometimes be bent on a per-codec basis, particularly with audio
formats where there is MDCT window overlap or similar.

The concept of key frames is central to seeking, and key frames will
be the targets of the seek-to-time operation.

Representation of Time

NUT represents all timestamps as exact integer multiples of a rational
number "time base". Files can have multiple time bases in order to
accurately represent the time units of each stream. The set of
available time bases is defined in the main header, while each stream
header indicates which time base the corresponding stream will use.

Effective use of time bases both allows for compact representation of
timestamps, minimizing overhead, and enriches the information
contained in the file. For example, a process interpreting a NUT file
with a video time base of 1/25 second knows it can convert the video
to fixed-framerate 25 fps content or present it faithfully on a PAL
display.

The scope of the media contained in a NUT file is a single contiguous
interval of time. Timestamps need not begin at zero, but they may not
jump backwards. Any large forward jump in timestamps must be
interpreted as a frame with a large presentation interval, not as a
discontinuity in the presentation. Without conditions such as these,
NUT could not guarantee correct seeking in efficient time bounds.

Aside from provisions made for out-of-order decoding, all frames in a
NUT file must be strictly ordered by timestamp. For the purpose of
sorting frames, all timestamps are treated as rational numbers derived
from a coded integer timestamp and the associated time base, and
compared under the standard ordering on the rational numbers.

Frame Coding

Each frame begins with a "framecode", a single byte which indexes a
table in the main header. This table can associate properties such as
stream id, size, relative timestamp, keyframe flag, etc. with the
frame that follows, or allow the values to be explicitly coded
following the framecode byte. By careful construction of the framecode
table in the main header, an average overhead of significantly less
than 2 bytes per frame can be achieved for single-stream files at low
bitrates.
Framecodes can also be flagged as invalid, and seeing such a framecode
indicates a damaged file. The frame code 0x4E ('N') is a special invalid
framecode which marks that the next packet is a NUT packet, and not a
frame. The following 7 bytes, combined with 'N', is the full startcode of
the NUT packet.

Syncpoints

Syncpoints are mini-NUT packets, which serve for seeking, error recovery,
and error checking. They contain a startcode like all NUT packets, a
timestamp, a back_ptr, and a CRC on the packet itself. Syncpoints must be
placed every 32kb (or whatever max_distance is set to in the main header,
64kb at most), unless between the 2 syncpoints is a single frame.
Syncpoints must be followed by a frame, and must be placed after headers
(except those at end of file).
The timestamp coded in the syncpoint is a global timestamp, which is used
to reset the last_pts context of all streams, and to find the appropriate
syncpoint when seeking. Demuxing can only begin at syncpoints for proper
last_pts context across all streams, including after seeking.
A back_ptr points to a previous syncpoint in the file. The area between
the previous syncpoint and this one must contain a keyframe for every
stream, with a pts lower than or equal to the timestamp of this syncpoint.
This back_ptr is used for optimal seeking in files without an index.
For compression, the back_ptr is relative to this syncpoint, and is
divided by 16. The reason for this is that the minimum size for a
syncpoint is 16 bytes:
8 startcode + 1 forward_ptr + 1 timestamp + 1 back_ptr + 4 checksum + 1 frame_code

End of Relevance
EOR is a flag that can be attributed to a frame in any stream, and marks
the end of relavence of a stream on presentation, such as a subtitle
stream currently showing no subtitles on screen. EOR flag can only be
given to zero byte frames, and must be set as keyframe as well. Once EOR
is seen on such a stream, the stream is set EOR until the next keyframe on
that stream. Streams which are set EOR are ignored by back_ptr in
syncpoints until EOR is unset. The significance of EOR is to set the
stream as irrelevant when seeking and searching for optimal keyframes to
begin demuxing.

Error Checking

There are several ways to detect a damaged stream in NUT during demuxing:
1. Invalid framecode - If a framecode which has been marked as invalid in
   the main header is found as the framecode in a frame header, then the
   stream is damaged. For this reason, 0x00 and 0xFF are recommended to be
   set as invalid frame codes in NUT.
2. Bad CRC on a NUT packet, packet_header, or frame header - fairly
   obvious.
3. Decoded frame size causes a distance from the last syncpoint to be
   bigger than max_distance, and frame does not follow a syncpoint.
4. Decoded frame size is bigger than max_distance*2, and frame header does
   not have a CRC.
5. Decoded frame pts is more than max_pts_distance higher than last_pts,
   and frame header does not have a CRC.
6. A VLC is found with more than 8 bytes of stuffing in a frame header or
   forward_ptr.
7. Streams are found to not be strictly interleaved by comparing dts and
   pts - a precise formula for this check can be found in the spec.

All these conditions make it impossible for a demuxer to read, skip or
buffer a large amount of data from a file because of damaged data. Also
max_pts_distance prevents an overly large pts caused by damaged data to
cause a player to get stuck.

Error Recovery

The recommended method for recovering from errors once damage has been
detected, is to linear search the file from current position to the
closest syncpoint startcode found, and resume demuxing from there. If
possible, before the linear search, rewind to the last syncpoint seen, in
case a syncpoint was already skipped due to demuxing damaged data.

Seeking

An in depth explanation of an optimal seeking algorithm can be found in
http://wiki.multimedia.cx/index.php?title=NUT