$Id: musicmatch.txt,v 1.4 2000/09/09 23:02:37 eldamitri Exp $

MusicMatch (TM) tag format description

Status of this document

This document is a description of a deprecated tagging format. The
information contained herein is not a specification; its intent is to
interpret the format based on hundreds of examples. It also relies
heavily on information obtained by others who have done similar
investigations. It is not based on any official documentation of the
format, as such documentation is not publicly available. Therefore the
contents of this document may change to adjust for newly-discovered
information, but the format itself is unlikely to change due to its
deprecation. Distribution of this document is unlimited.

Abstract

This document describes the MusicMatch tagging format present in some
digital audio files. This format, like other tagging specifications,
provides a method for storing information about an audio file within
the file itself to document its contents. This format was developed by
MusicMatch and used exclusively by older versions of Jukebox, their
popular, "all-in-one" MP3 application.

1. Table of contents

     Status of this document
     Abstract
     1. Table of contents
     2. Introduction
     3. Conventions in this document
     4. Tag overview
        4.1. Header
        4.2. Image extension
        4.3. Image binary
        4.4. Unused
        4.5. Version information
        4.6. Audio meta-data
             4.6.1. Single-line text fields
             4.6.2. Non-text fields
             4.6.3. Multi-line text fields
             4.6.4. Internet addresses
             4.6.5. Padding
        4.7. Data offsets
        4.8. Footer
     5. Identifying and parsing a MusicMatch tag
     6. Converting to ID3v2
     7. Copyright
     8. References
     9. Author's Address

2. Introduction

The following document describes the structure of the tagging format
used by MusicMatch (TM) Jukebox, prior to version 4.0 of that
application. This program is a so-called "All-In-One" MP3 program and
provides a CD-Ripper, WAV-to-MP3 converter, database, and MP3 player.

The MusicMatch tagging format has gone through several incremental
iterations, although the basic structure has remained fairly constant
throughout its history. Each variant of the MusicMatch tag has been
tightly coupled with the version of Jukebox that created it. As such,
this document will refer to the Jukebox version and tagging format
version interchangeably.

As of version 4.0, MusicMatch has deprecated the use of this format in
their own Jukebox application, transitioning instead to ID3v2, an open
standard for tagging digital audio.

Unfortunately, despite repeated requests, MusicMatch has not provided
to the public any documents describing this format, and the MusicMatch
Jukebox is the only widely-distributed software application that can
read and write these tags. As such, this text may not be completely
accurate and is surely incomplete, but it covers enough of the format
to enable one to write robust software to find and parse tags in this
format. For example, the id3lib tagging library's MusicMatch parsing
routines were written solely based on the information found in this
document. However, the author cannot be held responsible for any
inaccuracies or any harm caused by using this information.

One can assume that the format is unlikely to change, given
MusicMatch's own abandonment of it. It should also be noted that
incorporating functionality into applications to write tags in this
format is discouraged, as the format has been officially deprecated by
MusicMatch themselves.

3. Conventions in this document

This document borrows heavily from specifications written by Martin
Nilsson, author of the ID3v2 tagging standard. Much of the structure,
formatting, and other such conventions used in the ID3v2
specifications are carried over into this document.

Text within "" is a text string exactly as it appears in a tag.
Numbers preceded with $ are hexadecimal and numbers preceded with %
are binary.
$xx is used to indicate a byte with unknown content.

4. Tag overview

The MusicMatch Tagging Format was designed to store specific types of
audio meta-data inside the audio file itself. As the format was used
exclusively by the MusicMatch Jukebox application, it is found only in
MPEG-1/2 layer III files encoded with that program. However, the
tagging format is not inherently tied to that encoding, and could
conceivably be used with other types of audio.

MusicMatch tags were originally designed to come at the very end of
MP3 files, after all of the MP3 audio frames. Starting with Jukebox
version 3.1, the application became more ID3-friendly and started
placing ID3v1 tags after the MusicMatch tag as well. In practice,
since very few applications outside of the MusicMatch Jukebox are
capable of reading and understanding this format, it is not unusual to
find MusicMatch tags "buried" within MP3 files, coming before other
types of tagging formats in a file, such as Lyrics3 or ID3v2.4.0. Such
"relocations" are not uncommon, and therefore any software application
that intends to find, read, and parse MusicMatch tags should be
flexible in this endeavor, despite the apparent intentions of the
original specification.

Although various sections of a MusicMatch tag are fixed in length,
other sections are not, and so tag lengths can vary from one file to
another. A valid MusicMatch tag will be at least 8 kilobytes (8192
bytes) in length. Those tags with image data will often be much
larger.

The byte-order in 4-byte pointers and multibyte numbers for MusicMatch
tags is least-significant byte (LSB) first, also known as "little
endian". For example, $12345678 is encoded as $78 56 34 12.

Overall tag structure:

     +-----------------------------+
     |           Header            |
     |    (256 bytes, OPTIONAL)    |
     +-----------------------------+
     |  Image extension (4 bytes)  |
     +-----------------------------+
     |        Image binary         |
     |  (var. length >= 4 bytes)   |
     +-----------------------------+
     |      Unused (4 bytes)       |
     +-----------------------------+
     |  Version info (256 bytes)   |
     +-----------------------------+
     |       Audio meta-data       |
     | (var. length >= 7868 bytes) |
     +-----------------------------+
     |   Data offsets (20 bytes)   |
     +-----------------------------+
     |      Footer (48 bytes)      |
     +-----------------------------+

This document will describe the various sections of the tag in the
order listed above (that is, in the sequential order that they appear
when reading the tag from beginning to end). However, due to the
nature of the tag's format, in practice the tag's sections will often
be parsed in the reverse order. A robust parsing algorithm will be
suggested and described later in the document.

4.1. Header

An optional tag header often precedes the tag data in a MusicMatch
tag. Although the rules that determine this header's required presence
are unknown, the header is usually found in tag versions up to and
including 2.50, and is usually lacking otherwise. Luckily, its format
is rigid and therefore its presence is easy to determine. The data in
the header are not vital to the correct parsing of the rest of the tag
and can thus be discarded.

The header is the only optional section in a MusicMatch tag. All other
sections are required for the tag to be considered valid.

The header section is always 256 bytes in length. It begins with three
10-byte subsections, and ends with 226 bytes of space ($20) padding.
Each of the first three subsections contains an 8-byte ASCII text
string followed by two bytes of null ($00) padding.

The first subsection serves as a sync string: its 8-byte string is
always "18273645".

The second subsection's 8-byte string is the version of the Xing
encoder used to encode the MP3 file. The last four bytes of this
string are usually '0' ($30). An example of this string is "1.010000".

The third and final 10-byte subsection is the version of the
MusicMatch Jukebox used to encode the MP3 file.
The last four bytes of this string are usually '0' ($30). An example
of this string is "2.120000".

     Sync string               "18273645"
     Null padding              $00 00
     Xing encoder version      <8-byte numerical ASCII string>
     Null padding              $00 00
     MusicMatch version        <8-byte numerical ASCII string>
     Null padding              $00 00
     Space padding             226 * $20

4.2. Image extension

MusicMatch tags can contain at most one image. This first required
section is the extension of the image when saved as a file (for
example, "jpg" or "bmp"). This section is 4 bytes in length, and the
data is padded with spaces ($20) if the extension doesn't use all 4
bytes (in practice, 3-byte extensions are the most prevalent).
Likewise, tags without images have all spaces for this section
(4 * $20).

     Picture extension         $xx xx xx xx

4.3. Image binary

When an image is present in the tag, the image binary section consists
of two fields. The first field is the size of the image data, in
bytes. The second is the actual image data.

     Image size                $xx xx xx xx
     Image data                <binary data>

If no image is present, the image binary section consists of exactly
four null bytes ($00 00 00 00).

4.4. Unused

This section is never used, to the best of the author's knowledge. It
is always 4 null ($00) bytes.

     Null padding              $00 00 00 00

4.5. Version information

This section of the tag has exactly the same format as the header.
Unlike the header, this section is required for the tag to be
considered valid.

     Sync string               "18273645"
     Null padding              $00 00
     Xing encoder version      <8-byte numerical ASCII string>
     Null padding              $00 00
     MusicMatch version        <8-byte numerical ASCII string>
     Null padding              $00 00
     Space padding             226 * $20

4.6. Audio meta-data

The audio meta-data is the heart of the MusicMatch tag. It contains
most of the pertinent information found in other tagging formats (song
title, album title, artist, etc.) and some that are unique to this
format (mood, preference, situation).

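As an aside before the individual fields are described, the fixed
256-byte layout shared by the header (4.1) and the version information
section (4.5) can be illustrated with a minimal Python sketch. The
names parse_version_block, SYNC, and BLOCK_SIZE are hypothetical and
chosen only for this example. The strict check of the null and space
padding is an assumption drawn from the layout tables above; a more
tolerant reader might test only the sync string.

```python
SYNC = b"18273645"   # sync string from sections 4.1 and 4.5
BLOCK_SIZE = 256     # three 10-byte subsections + 226 bytes of padding


def parse_version_block(block: bytes):
    """Return (xing_version, musicmatch_version), or None on mismatch.

    Assumption: every padding byte must match the documented layout.
    """
    if len(block) != BLOCK_SIZE:
        return None
    # Three 10-byte subsections: 8 bytes of text + 2 null bytes each.
    fields = [block[i:i + 10] for i in (0, 10, 20)]
    if any(f[8:] != b"\x00\x00" for f in fields):
        return None
    if fields[0][:8] != SYNC:
        return None
    # The remaining 226 bytes are space ($20) padding.
    if block[30:] != b" " * 226:
        return None
    xing = fields[1][:8].decode("ascii", errors="replace")
    musicmatch = fields[2][:8].decode("ascii", errors="replace")
    return xing, musicmatch


# Build a block by hand using the example strings from the text.
example = (SYNC + b"\x00\x00" + b"1.010000" + b"\x00\x00"
           + b"2.120000" + b"\x00\x00" + b" " * 226)
result = parse_version_block(example)
```

Because the same function accepts either section, a reader could apply
it to the first 256 bytes of a candidate tag to decide whether the
optional header is present.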
In all versions of the MusicMatch format up to and including 3.00,
this section is always 7868 bytes in length. All subsequent versions
allowed three possible lengths for this section: 7936, 8004, and 8132
bytes. The conditions under which a particular length from these three
possibilities was used are unknown. In all cases, this section is
padded with dashes ($2D) to achieve this constant size.

Due to the great number of fields in this portion of the tag, they are
divided amongst the next four sections of the document: single-line
text fields, non-text fields, multi-line text fields, and internet
addresses. This classification is somewhat arbitrary and somewhat
inaccurate (some of the fields described as "non-text" are indeed
ASCII strings). However, the classification does allow for easier
description of the meta-data as a whole. At any rate, the actual
fields in this section of the tag appear sequentially in the order
presented.

4.6.1. Single-line text fields

The first group of entries in this section of the tag are
variable-length ASCII text strings. Each of these strings is preceded
by a two-byte field describing the size of the following string
(again, in LSB order). Multiple entries in a text field are separated
by a semicolon ($3B). An empty (and non-existent) text field is
indicated by a size field of 0 ($00 00).

The first three of these entries are fairly self-explanatory: song
title, album title, and artist name.

The final five entries are a little less common: Genre, Tempo, Mood,
Situation, and Preference. These fields can contain any information,
but due to the interface and default set-up of the Jukebox
application, they are typically limited to a subset of possibilities.

The Genre entry differs from the ID3v1 tagging format in that it
allows a full-text genre description, whereas ID3v1 maps a number to a
list of genres. Again, the genre description could be anything, but
the interface in Jukebox typically limited most users to the standard
ID3v1 genres.

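The size-prefixed layout described at the start of this section can be
sketched in Python as follows. This is only an illustration under
stated assumptions: the helper names read_text_field and split_entries
are hypothetical, and decoding as Latin-1 rather than strict ASCII is a
choice made here so that stray high bytes do not raise an error.

```python
import struct


def read_text_field(buf: bytes, pos: int):
    """Read one size-prefixed text field starting at pos.

    Returns (text, next_pos), where next_pos is the offset of the
    field that follows.
    """
    if pos + 2 > len(buf):
        raise ValueError("truncated size field")
    (size,) = struct.unpack_from("<H", buf, pos)  # two bytes, LSB first
    start = pos + 2
    end = start + size
    if end > len(buf):
        raise ValueError("text field runs past end of buffer")
    # Assumption: Latin-1 decoding, which accepts every byte value.
    return buf[start:end].decode("latin-1"), end


def split_entries(text: str):
    """Split a field holding multiple entries separated by ';' ($3B)."""
    return text.split(";") if text else []


# Three consecutive fields: a title, an empty field, and two genres.
sample = b"\x05\x00Hello" + b"\x00\x00" + b"\x09\x00Rock;Jazz"
title, pos = read_text_field(sample, 0)
album, pos = read_text_field(sample, pos)
genre, pos = read_text_field(sample, pos)
genres = split_entries(genre)
```

Returning the next position alongside the text lets a caller walk the
fields in the order in which they are stored, since no field has a
fixed offset of its own.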
The Tempo entry is intended to describe the general tempo of the song.
The Jukebox application provided the following defaults: None, Fast,
Pretty fast, Moderate, Pretty slow, and Slow.

The Mood entry describes what type of mood the audio establishes.
Typical values include the following: None, Wild, Upbeat, Morose,
Mellow, Tranquil, and Comatose.

The Situation entry describes in which situation this music is best
played. Expect the following: None, Dance, Party, Romantic, Dinner,
Background, Seasonal, Rave, and Drunken Brawl.

The Preference entry allows the user to rate the song. Possible values
include the following: None, Excellent, Very Good, Good, Fair, Poor,
and Bad Taste.

     Song title length         $xx xx
     Song title                <ASCII string>
     Album title length        $xx xx
     Album title               <ASCII string>
     Artist name length        $xx xx
     Artist name               <ASCII string>
     Genre length              $xx xx
     Genre                     <ASCII string>
     Tempo length              $xx xx
     Tempo                     <ASCII string>
     Mood length               $xx xx
     Mood                      <ASCII string>
     Situation length          $xx xx
     Situation                 <ASCII string>
     Preference length         $xx xx
     Preference                <ASCII string>

4.6.2. Non-text fields

The next group of fields is described here as "non-text". They are
probably better described as entries that are auto-created (i.e., not
entered by a user), although this isn't entirely accurate, either, as
the track number field is determined by user input. At any rate,
they've been separated to clarify the presentation of the material.

The "Song duration" entry consists of two fields: a size and text. The
text is formatted as "minutes:seconds", and thus the size field is
typically 4 ($04 00).

The only field that is neither a string nor an LSB numerical value is
the creation date. It is an 8-byte floating-point value. It can be
interpreted as a TDateTime in the Delphi programming language, where
the integral portion is the number of elapsed days since 1899-12-30,
and the fractional portion represents the fraction of that day that
has elapsed, where .0 would be midnight, .5 would be noon, and
.99999...
would be just before midnight of the next day. In practice, this field
is typically unused and will be filled with 8 null ($00) bytes.

The next field is the play counter, presumably maintained by the
Jukebox application. Most of the time this field is unused, and is
typically 0 ($00 00 00 00).

The next entry is a size/text combo and represents the original
filename and path. As these tags were created almost universally on
Windows machines, the entries are typically in the form of
"C:\path\to\file.mp3".

The next size/text entry is the album serial number fetched from the
online CDDB when a track is ripped with MusicMatch.

The final field is the track number, usually entered automatically
when ripping, encoding, and tagging the audio from a CD using CDDB.

     Song duration length      $xx xx
     Song duration             <ASCII string>
     Creation date             <8-byte IEEE-64 float>
     Play counter              $xx xx xx xx
     Original filename length  $xx xx
     Original filename         <ASCII string>
     Serial number length      $xx xx
     Serial number             <ASCII string>
     Track number              $xx xx

4.6.3. Multi-line text fields

The next three entries are typically multi-line entries. All line
separators use the Windows-standard carriage return/line feed pair
($0D 0A). As with the single-line text entries, the text fields are
preceded by LSB size fields which indicate their length.

     Notes length              $xx xx
     Notes                     <ASCII string>
     Artist bio length         $xx xx
     Artist bio                <ASCII string>
     Lyrics length             $xx xx
     Lyrics                    <ASCII string>

4.6.4. Internet addresses

The final group of meta-data consists of internet addresses. As with
other text entries, the text fields are preceded by LSB size fields.

     Artist URL length         $xx xx
     Artist URL                <ASCII string>
     "Buy CD" URL length       $xx xx
     "Buy CD" URL              <ASCII string>
     Artist email length       $xx xx
     Artist email              <ASCII string>

4.6.5. Padding

The data fields are then followed by 16 null ($00) bytes. Presumably
these were intended for (up to 8) future text fields. The remainder of
this section is padded with '-' ($2D) characters.

4.7. Data offsets

This section of the tag was intended to give offsets into the file for
each of the five major required sections of the tag. The stored
offsets are one-based: when addressing a file whose first byte is at
offset 0, each offset given here must be reduced by 1.

In practice, however, these offsets can often be invalid, since the
data that comes before the tag may grow or shrink (such as when an
ID3v2 tag is prepended to the file). Therefore these offsets are best
used to calculate the size of the sections by finding the difference
of two consecutive offsets. The audio meta-data section has no
following offset, so its size must be calculated in a different
manner.

     Image extension offset    $xx xx xx xx
     Image binary offset       $xx xx xx xx
     Unused offset             $xx xx xx xx
     Version info offset       $xx xx xx xx
     Audio meta-data offset    $xx xx xx xx

4.8. Footer

Unlike the header, the footer is a required section of any MusicMatch
tag, and checking for its existence is an easy way to determine if a
file has a MusicMatch tag. It is always 48 bytes in length.

The first 19 bytes are the company name "Brava Software Inc.",
followed by 13 bytes of space ($20) padding. (It seems that the
company name has officially changed to MusicMatch, as "Brava Software"
is not mentioned anywhere on their website.) The next 4 bytes are the
tag version as a numerical ASCII string (e.g., "3.05"), which should
match the version string found in the version information section and
the (optional) header. This is followed by 12 bytes of space ($20)
padding.

     Signature                 "Brava Software Inc."
     Space padding             13 * $20
     Tag version               <4-byte numerical ASCII string>
     Space padding             12 * $20

5. Identifying and parsing a MusicMatch tag

Finding and parsing a MusicMatch tag is not difficult to do, but due
to lack of foresight and questionable design decisions by MusicMatch,
care must be taken to ensure it is done correctly.

<unfinished />

6. Converting to ID3v2

As of Jukebox 4.0, MusicMatch has abandoned the MusicMatch tagging
format in favor of the open standard ID3v2. The Jukebox application
will convert old tags to ID3v2 upon request, but as this is a closed
application that serves a limited number of platforms (currently only
Windows and Macintosh), having a public specification for performing
this mapping is necessary. As ID3v2 can encapsulate all of the
information found in the original MusicMatch format while being far
more flexible, the decision to convert shouldn't be a difficult one.

<unfinished />

7. Copyright

Copyright (C) Scott Thomas Haug 2000. All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it
or assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that a reference to this document is included on all such
copies and derivative works. However, this document itself may not be
modified in any way and reissued as the original document.

The limited permissions granted above are perpetual and will not be
revoked.

This document and the information contained herein is provided on an
'AS IS' basis and THE AUTHORS DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

8. References

[MMTrailer] Peter "The Videoripper" Luijer, 'Description of the
MusicMatch trailer in MP3 files'
<url:http://members.xoom.com/videoripper/warez/mmtrailer.txt>

[ID3v2] Martin Nilsson, 'ID3v2 informal standard'.
<url:http://www.id3.org/id3v2.3.0.txt>

[id3lib] Scott Thomas Haug, 'The ID3v1/ID3v2 Tagging Library'
<url:http://www.id3lib.org>

[ISO-8859-1] ISO/IEC DIS 8859-1.
'8-bit single-byte coded graphic character sets, Part 1: Latin
alphabet No. 1.'
Technical committee / subcommittee: JTC 1 / SC 2

[JFIF] 'JPEG File Interchange Format, version 1.02'
<url:http://www.w3.org/Graphics/JPEG/jfif.txt>

[MPEG] ISO/IEC 11172-3:1993.
'Coding of moving pictures and associated audio for digital storage
media at up to about 1,5 Mbit/s, Part 3: Audio.'
Technical committee / subcommittee: JTC 1 / SC 29
and
ISO/IEC 13818-3:1995
'Generic coding of moving pictures and associated audio information,
Part 3: Audio.'
Technical committee / subcommittee: JTC 1 / SC 29
and
ISO/IEC DIS 13818-3
'Generic coding of moving pictures and associated audio information,
Part 3: Audio (Revision of ISO/IEC 13818-3:1995)'

[URL] T. Berners-Lee, L. Masinter & M. McCahill, 'Uniform Resource
Locators (URL)', RFC 1738, December 1994.
<url:ftp://ftp.isi.edu/in-notes/rfc1738.txt>

[UTF-8] F. Yergeau, 'UTF-8, a transformation format of ISO 10646',
RFC 2279, January 1998.
<url:ftp://ftp.isi.edu/in-notes/rfc2279.txt>

9. Author's Address

Written by

     Scott Thomas Haug
     Seattle, WA
     USA