README for the Dirac video codec ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ by the BBC R&D Dirac team (diracinfo@rd.bbc.co.uk) 1. Executive Summary ~~~~~~~~~~~~~~~~~~~~ Dirac is an open source video codec. It uses a traditional hybrid video codec architecture, but with the wavelet transform instead of the usual block transforms. Motion compensation uses overlapped blocks to reduce block artefacts that would upset the transform coding stage. Dirac can code just about any size of video, from streaming up to HD and beyond, although certain presets are defined for different applications and standards. These cover the parameters that need to be set for the encoder to work, such as block sizes and temporal prediction structures, which must otherwise be set by hand. Dirac is intended to develop into real coding and decoding software, capable of plugging into video processing applications and media players that need compression. It is intended to develop into a simple set of reliable but effective coding tools that work over a wide variety of content and formats, using well-understood compression techniques, in a clear and accessible software structure. It is not intended as a demonstration or reference coder. 2. Documentation ~~~~~~~~~~~~~~~~ Documentation can be found at http://diracvideo.org/wiki/index.php/Main_Page#Documentation 3. Building and installing ~~~~~~~~~~~~~~~~~~~~~~~~~~ GNU/Linux, Unix, MacOS X, Cygwin, Mingw --------------------------------------- ./configure --enable-debug (to enable extra debug compile options) OR ./configure --enable-profile (to enable the g++ profiling flag -pg) OR ./configure --disable-mmx (to disable MMX optimisation which is enabled by default) OR ./configure --enable-debug --enable-profile (to enable extra debug compile options and profiling options) OR ./configure By default, both shared and static libraries are built. To build all-static libraries use ./configure --disable-shared To build shared libraries only use ./configure --disable-static make make install The INSTALL file documents arguments to ./configure such as --prefix=/usr/local (specify the installation location prefix). MSYS and Microsoft Visual C++ ----------------------------- Download and install the no-cost Microsoft Visual C++ 2008 Express Edition from http://msdn.microsoft.com/vstudio/express/visualc/ Download and install MSYS (the MinGW Minimal SYStem), MSYS-1.0.10.exe, from http://www.mingw.org/download.shtml. An MSYS icon will be available on the desktop. Click on the MSYS icon on the desktop to open a MSYS shell window. Create a .profile file to set up the environment variables required. vi .profile Include the following four lines in the .profile file. PATH=/c/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/Common7/IDE:/c/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/VC/BIN:/c/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/Common7/Tools:/c/WINDOWS/Microsoft.NET/Framework/v3.5:/c/WINDOWS/Microsoft.NET/Framework/v2.0.50727:/c/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/VC/VCPackages:$PATH INCLUDE=/c/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/VC/INCLUDE:$INCLUDE LIB=/c/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/VC/LIB:$LIB LIBPATH=/c/WINDOWS/Microsoft.NET/Framework/v3.5:/c/WINDOWS/Microsoft.NET/Framework/v2.0.50727:/c/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/VC/LIB:$LIBPATH (Replace /c/Program\ Files/Microsoft\ Visual\ Studio\ 9/ with the location where VC++ 2008 is installed if necessary) Exit from the MSYS shell and click on the MSYS icon on the desktop to open a new MSYS shell window for the .profile to take effect. Change directory to the directory where Dirac was unpacked. By default only the dynamic libraries are built. ./configure CXX=cl LD=cl --enable-debug (to enable extra debug compile options) OR ./configure CXX=cl LD=cl --disable-shared (to build static libraries) OR ./configure CXX=cl LD=cl make make install The INSTALL file documents arguments to ./configure such as --prefix=/usr/local (specify the installation location prefix). Microsoft Visual C++ .NET 2008 ------------------------------ Download and install the no-cost Microsoft Visual C++ 2008 Express Edition from http://www.microsoft.com/express/download/ The MS VC++ 2008 solution and project files are in win32/VisualStudio directory. Double-click on the solution file, dirac.sln, in the win32/VisualStudio directory. The target 'Everything' builds the codec libraries and utilities. Four build-types are supported Debug - builds unoptimised encoder and decoder dlls with debug symbols Release - builds optimised encoder and decoder dlls Debug-mmx - builds unoptimised encoder and decoder dlls with debug symbols and mmx optimisations enabled. Release-mmx - builds optimised encoder and decoder dlls with mmx optimisations enabled. Static-Debug - builds unoptimised encoder and decoder static libraries with debug symbols Static-Release - builds optimised encoder and decoder static libraries Static-Debug-mmx - builds unoptimised encoder and decoder static libraries with debug symbols and mmx optmisations enabled. Static-Release-mmx - builds optimised encoder and decoder static libraries with mmx optmisations enabled. Static libraries are created in the win32/VisualStudio/build/lib/<build-type> directory. Encoder and Decoder dlls and import libraries, encoder and decoder apps are created in the win32/VisualStudio/build/bin/<build-type> directory. The "C" public API is exported using the _declspec(dllexport) mechanism. Conversion utilites are created in the win32/VisualStudio/build/utils/conversion/<build-type> directory. Only static versions are built. Instrumentation utility is created in the win32/VisualStudio/build/utils/instrumentation/<build-type> directory. Only static versions are built. Older editions of Microsoft Visual C++ (e.g. 2003 and 2005) ----------------------------------------------------------- NOTE: Since Visual C++ 2008 Express edition is freely available to download, older versions of the Visual C++ editions are no longer supported. So it is suggested that the users upgrade their VC++ environment to VC++ 2008. 4. Running the example programs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 4.1 Command-line parameters At the moment there is a simple command-line parser class which is used in all the executables. The general procedure for running a program is to type: prog_name -<flag_name> flag_val ... param1 param2 ... In other words, options are prefixed by a dash; some options take values, while others are boolean options that enable specific features. For example: When running the encoder, the -qf options requires a numeric argument specifying the "quality factor" for encoding. The -verbose option enables detailed output and does not require an argument. Running any program without arguments will display a list of parameters and options. 4.2 File formats The example coder and decoder use raw 8-bit planar YUV data. This means that data is stored bytewise, with a frame of Y followed by a frame of U followed by a frame of V, all scanned in the usual raster order. The video dimensions , frame rate and chroma are passed to the encoder via command line arguments. Other file formats are supported by means of conversion utilities that may be found in the subdirectory util/conversion. These will convert to and from raw RGB format, and support all the standard raw YUV formats as well as bitmaps. Raw RGB can be obtained as an output from standard conversion utilities such as ImageMagick. Example. Compress an image sequence of 100 frames of 352x288 video in tiff format. Step 1. Use your favourite conversion routine to produce a single raw RGB file of all the data. If your routine converts frame-by-frame then you will need to concatenate the output. Step 2. Convert from RGB to the YUV format of your choice. For example, to do 420, type RGBtoYUV420 <file.rgb >file.yuv 352 288 100 Note that this uses stdin and stdout to read and write the data. We have provided a script create_test_data.pl to help convert rgb format files into all the input formats supported by Dirac. The command line arguments it supports can be listed using create_test_data.pl -use Sample usage is create_test_data.pl -width=352 -height=288 -num_frames=100 file.rgb (This assumes that the RGBtoYUV utilities are in a directory specified in PATH variable. If not in the path, then use options -convutildir and to set the directories where the script can find the conversion utilities.) The scripts then outputs files in all chroma formats (420, 422, 444) supported by Dirac to the current directory. Step 4. Run the encoder. This will produce a locally decoded output in the same format if the locally decoded output is enabled using the -local flag. Step 5. Convert back to RGB. YUV420toRGB <file.yuv >file.rgb 352 288 100 Step 6. Use your favourite conversion utility to convert to the format of your choice. You can also use the transcode utility to convert data to and from Dirac's native formats (see http://zebra.fh-weingarten.de/~transcode/): This example uses a 720x576x50 DV source, and transcodes to 720x576 YUV in 4:2:0 chroma format. Cascading codecs (DV + Dirac) is generally a bad idea - use this only if you don't have any other source of uncompressed video. transcode -i source.dv -x auto,null --dv_yuy2_mode -k -V -y raw,null -o file.avi tcextract -i test.avi -x rgb > file.yuv Viewing and playback utilities for uncompressed video include MPlayer and ImageMagick's display command. Continuing the 352x288 4:2:0 example above, to display a single frame of raw YUV with ImageMagick use the following (use <spacebar> to see subsequent frames): display -size 352x288 test.yuv Raw YUV 420 data can also be played back in MPlayer - use the following MPlayer command: mplayer -fps 15 -rawvideo on:size=152064:w=352:h=288 test.yuv (at the time of writing MPlayer could not playback 4:2:2 or 4:4:4 YUV data) 4.3 Encoding The basic encoding syntax is to type dirac_encoder [options] file_in file_out This will compress file_in and produce an output file_out of compressed data. A locally decoded output file_out.local-dec.yuv and instrumentation data file_out.imt (for debugging the encoder and of interest to developers only) are also produced if the -local flag is enabled on the command line. There are a large number of optional parameters that can be used to run the encoder, all of which are listed below. To encode video you need three types of parameter need to be set: a) quality factor or target bit rate b) source parameters (width, height, frame rate, chroma format) c) encoding parameters (motion compensation block sizes, preferred viewing distance) In practice you don't have to set all these directly because presets can be used to use appropriate default values. a) The most important parameters are the quality factor or target bit rate. The quality factor is specified by using the option qf : Overall quality factor (>0) This value is greater than 0, the higher the number, the better the quality. Typical high quality is 8-10, but it will vary from sequence to sequence, sometimes higher and sometimes lower. The target bit rate is set using the option targetrate : Target bit rate in Kb/s This will attempt to maintain constant bit rate over the sequence. It works reasonably well, but actual bit rate, especially over short sequences, may be slightly different from the target. Setting -targetrate overrides -qf, in that CBR will still be applied, although the initial quality will be set by the given qf value. This might help the CBR algorithm to adapt faster. Setting -lossless overrides both -qf and -targetrate, and enforces lossless coding. b) Source parameters need to be set as the imput is just a raw YUV file and the encoder doesn't have any information about it. The best way to set source parameters is to use a preset for different video formats. The available preset options are: QSIF525 : width=176; height=120; 4:2:0 format; 14.98 frames/sec QCIF : width=176; height=144; 4:2:0 format; 12.5 frames/sec SIF525 : width=352; height=240; 4:2:0 format; 14.98 frames/sec CIF : width=352; height=288; 4:2:0 format; 12.5 frames/sec 4SIF525 : width=704; height=480; 4:2:0 format; 14.98 frames/sec 4CIF : width=704; height=576; 4:2:0 format; 12.5 frames/sec SD480I60 : width=720; height=480; 4:2:2 format; 29.97 frames/sec SD576I50 : width=720; height=576; 4:2:2 format; 25 frames/sec HD720P60 : width=1280; height=720; 4:2:2 format; 60 frames/sec HD720P50 : width=1280; height=720; 4:2:2 format; 50 frames/sec HD1080I60 : width=1920; height=1080; 4:2:2 format; 29,97 frames/sec HD1080I50 : width=1920; height=1080; 4:2:2 format; 25 frames/sec HD1080P60 : width=1920; height=1080; 4:2:2 format; 59.94 frames/sec HD1080P50 : width=1920; height=1080; 4:2:2 format; 50 frames/sec DC2K24 : width=2048; height=1080; 4:2:2 format; 24 frames/sec DC4K24 : width=4096; height=2160; 4:2:2 format; 24 frames/sec UHDTV4K60 : width=3840; height=2160; 4:2:2 format; 59.94 frames/sec UHDTV4K50 : width=3840; height=2160; 4:2:2 format; 50 frames/sec UHDTV8K60 : width=7680; height=4320; 4:2:2 format; 59.94 frames/sec UHDTV8K50 : width=7680; height=4320; 4:2:2 format; 50 frames/sec The default format used is CUSTOM format which has the following preset values width=640; height=480; 4:2:0 format; 23.97 frames/sec. If your video is not one of these formats, you should pick the nearest preset and override the parameters that are different. Example 1 Simple coding example. Code a 720x576 sequence in Planar 420 format to high quality. Solution. dirac_encoder -cformat YUV420P -SD576I50 -qf 9 test.yuv test_out.drc Example 2. Code a 720x486 sequence at 29.97 frames/sec in 422 format to medium quality Solution dirac_encoder -SD576I50 -width 720 -height 486 -fr 29.97 -cformat YUV422P -qf 5.5 test.yuv test_out.drc Source parameters that affect coding are: width : Width of video frame height : Height of video frame cformat : Chroma Sampling format. Acceptable values are YUV444P, YUV422P and YUV420P. fr : Frame rate. Can be a decimal number or a fraction. Examples of acceptable values are 25, 29.97, 12.5, 30000/1001. source_sampling : Source material type - 0 - progressive or 1 - interlaced For a complete list of source parameters, refer to Annex C of the Dirac Specification. WARNING!! If you use a preset but don't override source parameters that are different, then Dirac will still compress, but the bit rate will be much, much higher and there may well be serious artefacts. The encoder prints out the parameters it's actually using before starting encoding (in verbose mode only), so that you can abort at this point. c) The presets ALSO set encoding parameters. That's why it's a very good idea to use presets, as the encoding parameters are a bit obscure. They're still supported for those who want to experiment, but use with care. Encoding parameters are: L1_sep : the separation between L1 frames (frames that are predicted but also used as reference frames, like P frames in MPEG-2) num_L1 : the number of L1 frames before the next intra frame xblen : the width of blocks used for motion compensation yblen : the height of blocks used for motion compensation xbsep : the horizontal separation between blocks. Always <xblen ybsep : the vertical separation between blocks. Always <yblen cpd : normalised viewing distance parameter, in cycles per degree. iwlt_filter : transform filter to use when encoding INTRA frames, Valid values are DD9_7, LEGALL5_3, DD13_7, HAAR0, HAAR1, FIDELITY, DAUB97. Default value is DD13_7. rwlt_filter : transform filter to use when encoding INTER frames, Valid values are DD9_7, LEGALL5_3, DD13_7, HAAR0, HAAR1, FIDELITY, DAUB97. Default value is DD13_7. wlt_depth : transform depth, i.e number of times the component is split while applying the wavelet transform no_spartition : Do not split a subband into coefficient blocks before entropy coding multi_quants : If subbands are split into multiple coefficient blocks before entropy coding, assign different quantisers to each block within the subband. prefilter : Prefilter to apply to input video before encoding. The name of the filter to be used and the filter strength have to be supplied. Valid filter names are NO_PF, CWM, RECTLP and DIAGLP. Filter strenth range should be in the range 0-10. (note PSNR statistics will be computed relative to the filtered video if -local is enabled) lossless : Lossless coding.(overrides -qf and -targetrate) mv_prec : Motion vector precision. Valid values are 1 (Pixel precision), 1/2 (half-pixel precision), 1/4 (quarter pixel precision which is the default), 1/8 ( Eighth pixel precision). full_search : Use full search motion estimation combined_me : Use combination of all three components to do motion estimation field_coding : Code the input video as fields instead of frames. Default coding is frames. use_vlc : Use VLC for entropy coding of coefficients instead of arithmetic coding. Modifying L1_sep and num_L1 allows for new GOP structures to be used, and should be entirely safe. There are two non-GOP modes that can also be used for encoding: setting num_L1=0 gives I-frame only coding, and setting L1_sep to 1 will do IP-only coding (no B-pictures). P-only coding isn't possible, but num_L1=very large and L1_sep=1 will approximate it. Modifying the block parameters is strongly deprecated: it's likely to break the encoder as there are many constraints. Modifying cpd will not break anything, but will change the way noise is distributed which may be more (or less) suitable for your application. Setting cpd equal zero turns off perceptual weighting altogether. For more information, see the algorithm documentation on the website: http://diracvideo.org/wiki/index.php/Dirac_Algorithm Other options. The encoder also supports some other options, which are verbose : turn on verbosity (if you don't, you won't see the final bitrate!) start : code from this frame number stop : code up until this frame number local : Generate diagnostics and locally decoded output (to avoid running a decoder to see your video) Using -start and -stop allows a small section to be coded, rather than the whole thing. If the -local flag is present in the command line, the encoder produces diagnostic information about motion vectors that can be used to debug the encoder algorithm. It also produces a locally decoded picture so that you don't have to run the decoder to see what the pictures are like. 4.4 Decoding Decoding is much simpler. Just point the decoder input at the bitstream and the output to a file: dirac_decoder -verbose test_enc test_dec will decode test_enc into test_dec with running commentary.