MCF specification, technical part

development version (0x00), latest update: 2002-07-17

Progress: can't really estimate (it has been almost done for the last six months)

NOTE: This is a development version; the specs are so unstable that we don't want to increase the version number every time we change something. MCF-files of different 0x00-versions might not be compatible. This also means that 0x00-version MCF-files won't work with 0x01-version (or later) software.

Please don't mirror this specification - we don't want old versions floating around the net. This specification lives at http://mcf.sourceforge.net/ and is very unlikely to disappear. Even if it does, developers have backup copies of it and will set up a mirror somewhere. In such a case you can find the new location by looking at your favorite message board.

Vocabulary

Track contains one type of media (like video or subtitles) and controls some parameters needed for playing it.

Track Type defines what type of media a Track contains.

Track Entry defines a Track (gives it parameters such as Track Type and Format (compression) used).

Track Number - tracks get sequential numbers (in order of Track Entries), which are used for referring to a certain Track.

Canvas is the video window of the player. Canvas is normally the same size as the video, but for files without video (but still with some graphics) it may differ.

Element is a top-level unit of our main file structure. The elements are the items listed under Structure below.

Part is a part of an element (for example, Main Header consists of four parts).

One octet is 8 bits.

Filenames

One important thing for a cross-platform format is the naming of files. Not all operating systems support all characters, and even if the OS supports something, there will definitely be programs which don't support it. That's why we have defined two levels for naming: strict and relaxed. Filenames are limited to 64 characters in length. This really should be enough for everybody. If you think that YOU need more, send mail to the mcf-devel list at SourceForge, telling us why.

Strict names can only contain A-Z, a-z, 0-9, "_", "-", "(", ")" and ".". Strict names can contain several dots, but may not start or end with one or contain two or more in a row (".."). Relaxed names can also contain spaces (but only one in a row and not as the first or last char), several dots in a row and commas (if you need more chars, let us know).

When encoding, use of relaxed names should give a warning to the user (saying that it might not work reliably on all systems), and use of names outside the relaxed set should give an error.

Fields which contain [parts of] filenames, and therefore must follow these rules, are labelled "FILENAME".
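A minimal Python sketch of the two naming levels. It assumes that relaxed names, like strict ones, may not start or end with a dot (the rules above only lift the ".." restriction for relaxed names). The function name classify_filename is hypothetical, not part of any MCF library; an encoder could warn on "relaxed" and refuse "invalid".

```python
import re

# Sketch: hypothetical helper name, not an official MCF API.
MAX_LEN = 64  # "Filenames are limited to 64 characters in length."

_STRICT_CHARS = re.compile(r"[A-Za-z0-9_\-().]+")    # A-Z a-z 0-9 _ - ( ) .
_RELAXED_CHARS = re.compile(r"[A-Za-z0-9_\-()., ]+")  # ... plus comma and space


def classify_filename(name: str) -> str:
    """Return "strict", "relaxed" or "invalid" for a FILENAME field."""
    if not name or len(name) > MAX_LEN:
        return "invalid"
    # Assumption: even relaxed names may not start or end with a dot.
    if name[0] == "." or name[-1] == ".":
        return "invalid"
    if _STRICT_CHARS.fullmatch(name) and ".." not in name:
        return "strict"
    if not _RELAXED_CHARS.fullmatch(name):
        return "invalid"
    # Spaces: never the first or last char, never two in a row.
    if name[0] == " " or name[-1] == " " or "  " in name:
        return "invalid"
    return "relaxed"
```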

Structure

Main Header consists of four logical parts (divided into parts for humans, not for parsers):
- Type Header (ASCII info on file type, size, etc; not protected by checksum or signature)
- Actual Header (the most important part of Main Header; among other things, links to all other elements)
- Extended Info (a bit like ID3v2 with MP3)
- Content-specific Info (movie actors, CG-clip modelers, documentary/news interviewers, etc)
[TrkE] Track Entries (settings of tracks (format, language, ..). Each Track can have a Codec Header)
[CodH] Codec Headers (variable-sized storage of additional data, usually some information for the codec)
[Chap] Chapter Definitions (provides a menu for jumping to important positions in the movie)
- Edition Entries (defines editions/cuts)
- Chapter Entries (for each edition, defines all chapters)
[Seek] Seek Entries (contains position (link), timecode and some flags of each Block)
[CHdr] Clusters: actual data interleaved with checksums and other structures
[File] Attached Files (.nfo, .html, pictures or something else, but nothing big)
- Attach Entries (descriptions, filenames, MIME-types, etc)
- Files (pure data)
[SigH] Signature Header (allows verifying that the file is untouched and coming from specific source)

The four-character string in square brackets is an identifier put in front of that element. It is normally NOT used for anything, but it can be used for identifying elements when using a hex editor. Other possible uses are file validation and recovery.

Note that the format allows adding new elements between current elements and reorganizing elements, without losing compatibility. There are many ways to extend current headers too, but more about that later ..

Main Header is the only mandatory header, but if you don't have any Track Entries (no Tracks), the file can't contain Blocks (=data) either. Attached Files element only exists if you attach one or more files (doh). Codec Headers only exist if those are required. Signature Header and Chapter Definitions are optional.

Some people have speculated that the Main Header (Extended Info) will be our Achilles heel, but I don't really think so. If something in it becomes outdated, the field can be abandoned and we still have compatibility with old software. The space wasted by that is minimal, and we can easily add new sub-parts to Main Header. There is also some space reserved inside, to allow adding new fields without adding new sub-parts.

General things

All integers are big-endian (Motorola/network-style, NOT byte-swapped).
All strings are in UTF-8 (Unicode) charset (it is variable width and eats ASCII without conversion).
All fixed-length strings are padded with 0x00.
Bit 0 is the lowest bit (has value of 1) and bit 7 or 15 is the highest (depends on field width).

In our mission to confuse you, we [may] have some exceptions to those rules.
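The general rules translate into a few small helpers. A minimal Python sketch; the helper names are hypothetical, and the encoding parameter only exists because the Type Header uses ASCII/Latin-1 and 0x20 padding instead of UTF-8 and 0x00.

```python
# Sketch: hypothetical helper names, not an official MCF API.

def read_uint(buf: bytes, offset: int, width: int) -> int:
    """Read a big-endian (network order) unsigned integer of `width` octets."""
    field = buf[offset:offset + width]
    if len(field) != width:
        raise ValueError(f"need {width} octets at offset {offset:#x}")
    return int.from_bytes(field, "big")


def read_padded_string(buf: bytes, offset: int, length: int,
                       pad: int = 0x00, encoding: str = "utf-8") -> str:
    """Read a fixed-length string and strip its trailing padding octets."""
    raw = buf[offset:offset + length].rstrip(bytes([pad]))
    return raw.decode(encoding)


def bit(value: int, n: int) -> bool:
    """True if bit n is set; bit 0 is the lowest bit (value 1)."""
    return ((value >> n) & 1) == 1
```

For example, read_padded_string(main_header, 0x680, 128) would read the "Encoded by" field of Extended Info.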

Player field

This field (used in the tables of this specification) can have the following values:

"must": player must check that data and support/do whatever it indicates (if the player supports the feature this belongs to)
"check": optional feature in player: used for checking file validity (highly recommended)
"feat": optional feature in player: data that can be used for some feature etc.
"info": optional (but strongly suggested) feature: information that can be shown to the user and sometimes even modified by the player, but not used for anything technical.
"-": present because of the MCF structure or for some other reason; no use for the player (or defined at bit level, see description)

This field doesn't apply very well in all situations, though.

Main Header

Size: 0x1400 (5120) octets (might increase)

Main Header is the anchor point of everything in MCF. It must be the first thing in MCF-file and must start at offset 0. For streams and MCF-CDs the stream handling code must be able to figure out the MCF addressing space on the actual stream.

All positions in Main Header are relative to beginning of the file (as described in previous paragraph).

Note: in Type Header all strings are padded with 0x20, not with 0x00 (like elsewhere).

Type Header (160 octets)
Position	Player	Description
0x000	check	ASCII-string: "MCF - more info: http://mcf.sourceforge.net/". If you need to identify an MCF-file, primarily use the file extension (or MIME-type); if that is not possible, you can use the first 6 chars of this string ("MCF - "). To be sure, you can validate the checksum of Actual Header (or other parts).
0x037	-	0x0D (CR), 0x0A (LF), ASCII: "Site: ["
0x040	info	Site ("passed thru"-ads; ads ONLY allowed in this field), full URL or text (64 octets ASCII/Latin-1)
0x080	-	ASCII: "]", 0x0D (CR), 0x0A (LF), ASCII: "Size: ["
0x08A	check	File size, in octets, for easy detection of incomplete downloads (LW) (18 octets, ASCII-string, decimal, no formatting, left-aligned)
0x09C	-	ASCII: "]", 0x0D (CR), 0x0A (LF), 0x1A (EOF)

(LW) means Linear Write: when some properties of the file are not known at the beginning of write and it is impossible to seek back and fix those fields later, LW-mode can be used. In that mode, Main Header fields marked with (LW) can be omitted and filled with zero (binary, not a number). Never use LW, unless you really have to.
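A sketch of checking the Type Header in Python: the "MCF - " prefix and the File size field, which is left-aligned decimal ASCII padded with 0x20, or binary zeros in LW mode. Helper names are hypothetical, and only the fields from the Type Header table are covered.

```python
from typing import Optional

# Sketch: hypothetical helper names, not an official MCF API.
SIZE_FIELD = slice(0x08A, 0x08A + 18)   # File size: 18 octets, decimal ASCII


def looks_like_mcf(type_header: bytes) -> bool:
    """Fallback identification when no file extension or MIME type is available."""
    return type_header[:6] == b"MCF - "


def declared_file_size(type_header: bytes) -> Optional[int]:
    """File size from the Type Header, or None if zero-filled (LW mode)."""
    field = type_header[SIZE_FIELD]
    if len(field) != 18:
        raise ValueError("Type Header is truncated")
    if field == bytes(18):               # binary zeros: size unknown (LW)
        return None
    # Left-aligned decimal, padded with 0x20 like every Type Header string.
    return int(field.rstrip(b" ").decode("ascii"))
```

Comparing declared_file_size() with the real file size is the incomplete-download check the field is meant for; None means the size was not known when the file was written.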

All elements, except Main Header, must be specified either in "big elements" or "small elements". Non-existent elements have zero as position. Elements must not have gaps between them, overlap other elements or extend beyond the end of the file. Any MCF-file with empty space between elements will fail the validation.
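A sketch of the layout rule in Python. It assumes the first element starts directly after the Main Header (at "Size of Main Header", currently 0x1400) and leaves Main Header copies and the Main Footer out of the picture. check_layout is a hypothetical name.

```python
from typing import Iterable, Tuple

# Sketch: hypothetical helper name, not an official MCF API.

def check_layout(elements: Iterable[Tuple[int, int]],
                 main_header_size: int, file_size: int) -> None:
    """Raise ValueError on gaps, overlaps or elements past the end of file.

    `elements` holds (position, size) pairs of big and small elements;
    pairs with position 0 are non-existent elements and are ignored.
    """
    spans = sorted((pos, size) for pos, size in elements if pos != 0)
    expected = main_header_size   # assumption: first element follows Main Header
    for pos, size in spans:
        if pos < expected:
            raise ValueError(f"element at {pos:#x} overlaps previous data")
        if pos > expected:
            raise ValueError(f"gap of {pos - expected} octets before {pos:#x}")
        expected = pos + size
    if expected > file_size:
        raise ValueError("an element extends beyond the end of the file")
```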

The lowest 4 bits of a "small element" type denote the error tolerance of the element. Lower values denote more important elements; Main Header should be considered level 0. This value is used for determining how strong error correction different parts of MCF need (on MCF-CD, for example). Hardcode the values listed here; don't change any. The remaining bits of the type are still undefined.
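A small Python sketch of using the type field: the six types come from the Actual Header table, and the ordering is simply what the lowest 4 bits give. SMALL_ELEMENT_TYPES and error_tolerance are hypothetical names; how strong the error correction for each level should be is not specified here.

```python
# Sketch: hypothetical names, not an official MCF API.
# Element types as listed in the Actual Header table (4 octets each).
SMALL_ELEMENT_TYPES = {
    "Track Entries": bytes([0x00, 0x00, 0x00, 0x01]),
    "Codec Headers": bytes([0x00, 0x00, 0x00, 0x02]),
    "Chapter Definitions": bytes([0x00, 0x00, 0x00, 0x08]),
    "Attached Files": bytes([0x00, 0x00, 0x00, 0x09]),
    "Signature Header": bytes([0x00, 0x00, 0x00, 0x0C]),
    "Seek Entries": bytes([0x00, 0x00, 0x00, 0x0E]),
}


def error_tolerance(element_type: bytes) -> int:
    """Lowest 4 bits of the type field; lower values mean more important."""
    if len(element_type) != 4:
        raise ValueError("element type must be 4 octets")
    return int.from_bytes(element_type, "big") & 0x0F


# Most important first (Main Header itself would be level 0).
by_importance = sorted(SMALL_ELEMENT_TYPES,
                       key=lambda name: error_tolerance(SMALL_ELEMENT_TYPES[name]))
```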

Actual Header (864 octets)

Position	Player	Description
0x0A0	info	Version (uint8, currently 0x00)
0x0A1	must	Minimum read version (to use all features) (uint8, currently 0x00)
0x0A2	must	Absolute minimum read version (the lowest version with which reading this file still makes some sense) (uint8, currently 0x00)
0x0A3	-	Flags:
  Bit 7	-	Linear Write mode. This flag only affects one Main Header copy, not all copies.
  Bits 6-0	-	Reserved (set to 0)
0x0A4	must	Total length (uint32, in milliseconds)
0x0A8	must	Size of Main Header (uint32)
0x0AC-	-	Reserved (fill with 0x00)
0x0B0	must	Big element #1: Position of Clusters (uint64)
0x0B8	info	Big element #1: Total size of Clusters (LW) (uint64)
0x0C0-	info	Reserved for up to 4 other big elements (uint64 position, uint64 size) (fill with 0x00)
0x100	must	Small element #1: Position of Track Entries (uint64)
0x108	must	Small element #1: Total size of Track Entries (uint32)
0x10C	must	Small element #1: Element type (set to 0x00, 0x00, 0x00, 0x01) (4 octets)
0x110	must	Small element #2: Position of Codec Headers (uint64)
0x118	info	Small element #2: Total size of Codec Headers (uint32)
0x11C	info	Small element #2: Element type (set to 0x00, 0x00, 0x00, 0x02) (4 octets)
0x120	must	Small element #3: Position of Seek Entries (uint64)
0x128	must	Small element #3: Total size of Seek Entries (uint32)
0x12C	info	Small element #3: Element type (set to 0x00, 0x00, 0x00, 0x0E) (4 octets)
0x130	feat	Small element #4: Position of Chapter Definitions (uint64)
0x138	feat	Small element #4: Total size of Chapter Definitions (uint32)
0x13C	info	Small element #4: Element type (set to 0x00, 0x00, 0x00, 0x08) (4 octets)
0x140	feat	Small element #5: Position of Attached Files (uint64)
0x148	info	Small element #5: Total size of Attached Files (uint32)
0x14C	info	Small element #5: Element type (set to 0x00, 0x00, 0x00, 0x09) (4 octets)
0x150	feat	Small element #6: Position of Signature Header (uint64)
0x158	feat	Small element #6: Total size of Signature Header (uint32)
0x15C	info	Small element #6: Element type (set to 0x00, 0x00, 0x00, 0x0C) (4 octets)
0x160-	info	Reserved for up to 10 other small elements (uint64 position, uint32 size, 4 octets type) (fill with 0x00)
0x200	feat	Original filename, including file extension (64 octets FILENAME)
0x240	feat	Next part: Filename of the next part (64 octets FILENAME) (loaded from the same directory as the current file, or from other player-side configured paths; no path allowed)
0x280	feat	Next part: Timecode of the next part to start from (to allow seamless playback even with overlapping files) (uint32)
0x284	info	Muxing application or library (20 octets) ("libmcf-0.1.0", for example)
0x298	info	Writing application (24 octets) ("VirtualDub 2.1", for example)
0x2B0	check	Position of this Main Header copy (0 for the primary copy, other values for backups) (uint64)
0x2B8-	-	Reserved (fill with 0x00)
0x300	info	Clusters: Number of Blocks = number of Block Headers (uint32)
0x304	must	Clusters: Size of a Cluster Header (currently 16) (uint8)
0x305	must	Clusters: Size of a Block Header (currently 10) (uint8)
0x306	must	Clusters: Size of a Cluster Footer (currently 4) (uint8)
0x307	must	Size of a Seek Entry (currently 12) (uint8)
0x308	check	Adler-32 checksum of Seek Entries
0x30C	must	Size of a Track Entry (uint16)
0x30E	must	Size of a Subtrack Entry (lives in the Codec Header of an Audio Track) (uint16)
0x310	check	Attach: Adler-32 checksum of Attach Entries
0x314	must	Attach: Size of an Attach Entry (uint16)
0x316	-	Reserved (set to 0)
0x317	feat	Chapter Definitions: Number of editions = number of Edition Entries (uint8)
0x318	check	Chapter Definitions: Adler-32 checksum of the whole Chapter Definitions
0x31C	feat	Chapter Definitions: Size of a Chapter Block (uint16)
0x31E	feat	Chapter Definitions: Size of an Edition Entry (uint16)
0x320-	-	Reserved (fill with 0x00)
0x3FC	check	Adler-32 checksum of [0x0A0,0x3FB] (Actual Header, excluding this field)
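The Main Header checksums all follow one pattern: Adler-32 over a region, stored in the 4 octets right after it. A Python sketch using zlib.adler32 (the standard Adler-32 of RFC 1950), assuming the checksum is stored as a big-endian uint32 per the general rules. The region table covers the Actual Header plus the Extended Info and Content-specific Info ranges given in their own tables; helper names are hypothetical.

```python
import zlib

# Sketch: hypothetical names, not an official MCF API.
# (first octet, last octet); the Adler-32 is stored right after `last`.
MAIN_HEADER_REGIONS = {
    "Actual Header": (0x0A0, 0x3FB),
    "Extended Info": (0x400, 0xFFB),
    "Content-specific Info": (0x1000, 0x13FB),
}


def region_checksum_ok(buf: bytes, first: int, last: int) -> bool:
    """Compare the Adler-32 of buf[first..last] with the uint32 stored after it."""
    if len(buf) < last + 5:
        raise ValueError("buffer too short for this region")
    stored = int.from_bytes(buf[last + 1:last + 5], "big")
    return zlib.adler32(buf[first:last + 1]) == stored


def store_region_checksum(buf: bytearray, first: int, last: int) -> None:
    """Writer side: compute the Adler-32 and store it right after the region."""
    checksum = zlib.adler32(bytes(buf[first:last + 1]))
    buf[last + 1:last + 5] = checksum.to_bytes(4, "big")
```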

In the normal copy of Main Header (in the beginning of file), set Multiheader values to zero.

Linear mode should only be used in situations where it is impossible to know all values at the beginning of write and when it is impossible to seek back and insert that data later. One example of this would be directly capturing video to a CD-R.

Fields in Extended Info and Content-specific Info are fixed-length for many reasons. One is easy reading - to read a specific field (say, "Encoded by") I only need to seek once (to 0x680), read a fixed amount of data (128 octets) and then figure out what is inside. This makes things a lot easier. Another reason for fixed length is to allow changing data without rewriting the whole MCF-file. Imagine fixing a typo in one of these fields and then reading and writing 700 Mo of data, just to reposition it by one octet.

Comments-field can be used when other fields don't fit for the information you need.

Some strings are ASCII (and not UTF-8) for obvious reasons: URLs, e-mails, etc. cannot contain non-ASCII chars.

Extended Info (3072 octets)
Position	Player	Description
0x400	info	Title (192 octets)
0x4C0	info	Edition or episode ("director's cut" or "1x08: Firewall", for example) (128 octets)
0x540	info	Name of the original author (company, team or individual) (128 octets)
0x5C0	info	URL of the original author (movie's website) (128 octets ASCII)
0x640	info	E-mail or secondary URL of the author (64 octets ASCII)
0x680	info	Encoded by (whoever compressed it into MCF) (128 octets)
0x700	info	Encoder's URL (128 octets ASCII)
0x780	info	Encoder's e-mail (64 octets ASCII)
0x7C0	info	Comments/description (2048 octets)
0xFC0	info	RSACI Rating (parental guidance) (6 octets)
0xFC6	info	Flags (not yet defined, set to 0) (1 octet)
0xFC7	info	Content Type (Content Type page) (uint8)
0xFC8	info	Encoding date (seconds since 1970-01-01 00:00 UTC, aka UNIXTIME) (uint32, 0=undefined)
0xFCC	info	Last editing date (seconds since 1970-01-01 00:00 UTC, aka UNIXTIME) (uint32, 0=undefined)
0xFD0	info	Production year (uint16, 0 means undefined)
0xFD2	info	Country code (same as in Internet domains) (2 octets ASCII)
0xFD4	info	Language code (4 octets)
0xFD8-	-	Reserved (fill with 0x00)
0xFFC	check	Adler-32 checksum of [0x400,0xFFB] (Extended Info, excluding this field)

Like the name of Content-specific Info suggests, the info depends on Content Type. This means that we only have generic fields (Textfield 1, for example), but once Content Type is selected (say, the user selects "Movie"), fields get names (Actors, Director, Writer, ..).

Content-specific Info (1024 octets)
Position	Player	Description
0x1000	info	Textfield 1 (128 octets)
0x1080	info	Textfield 2 (128 octets)
0x1100	info	Textfield 3 (128 octets)
0x1180	info	Textfield 4 (128 octets)
0x1200	info	Textfield 5 (128 octets)
0x1280	info	Textfield 6 (128 octets)
0x1300	info	Textfield 7 (128 octets)
0x1380	info	SmallText (64 octets)
0x13C0-	-	Reserved (fill with 0x00)
0x13F8	info	Checkboxes 1-8 (highest bit is checkbox 1) (1 octet)
0x13F9	info	RadioA selection (RadioA 1 = 0x00; maximum is 0x07) (uint8)
0x13FA	info	RadioB selection (RadioB 1 = 0x00; maximum is 0x07) (uint8)
0x13FB	info	Content subtype (uint8)
0x13FC	check	Adler-32 checksum of [0x1000,0x13FB] ("content-specific info", excluding this field)

See also: Extended info explained.

Track Entry

Size: 0x240 (576) octets (might increase)

Track Entry helps the player decide what it needs for presenting the AV-data, but the main reason for it is to allow any combination of audio/video/other media, like dubs for different languages, or maybe alternative angles, ..

Tracks are numbered in defining order, starting from 1.

Track Entry

Offset	Player	Description
0x00	check	"TrkE" (ASCII, 4 octets)
0x04	must	Track Type (uint8)
0x05	-	Flags:
  Bit 7	must	Enabled (should only be 0 if the track is broken or something)
  Bit 6	must	Preferred (=this track should be used (enabled) by default)
  Bit 5	info	Constant Block size (=all Blocks are equal in size)
  Bit 4	must	Lacing used (note: currently only supported for audio)
  Bits 3-0	-	Reserved (set to 0)
0x06-	-	Reserved (fill with 0x00)
0x0C	info	Language code (4 octets)
0x10	must	Format (16 chars ASCII; see video formats)
0x20	info	Format Version1 (encoding) (uint16)
0x22	must	Format Version2 (reading) (uint16)
0x24	must	Format Version3 (absolute minimum for reading) (uint16)
0x26-	-	Reserved (fill with 0x00)
0x28	must	Size of the Codec Header (uint32)
0x2C	check	Adler-32 checksum of the Codec Header
0x30	info	Codec (name and version) used for compressing (64 octets ASCII)
0x70	info	URL of the codec's website (64 octets ASCII)
0xB0	info	Alternative URL (of a free, preferably multi-platform, codec that can at least decompress it) (64 octets ASCII)
0xF0	info	Nanoseconds per Block (if not constant, set to 0) (uint64)
0xF8-	-	Reserved (fill with 0x00)
0x100	info	Settings (bitrate, ..) used for the compression codec - free-form string (64 octets ASCII)
0x140	must	Track name ("Finnish subtitles" would be a good name) (64 octets)
0x180	-	Very IMPORTANT field. Depends on Track Type; more info on the Track Type page (188 octets)
-0x04 (from end)	check	Adler-32 checksum of Track Entry, excluding this field

Note: the checksum position must not be hardcoded; it depends on the Track Entry size (defined in Actual Header at 0x30C). The checksum is calculated over the whole Track Entry minus its last 4 octets.
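A Python sketch of reading Track Entries with the entry size taken from the Actual Header. It assumes the Track Entries element is simply the entries back to back, each starting with "TrkE". Helper names are hypothetical.

```python
import zlib

# Sketch: hypothetical helper names, not an official MCF API.

def split_track_entries(element: bytes, entry_size: int) -> list:
    """Split the Track Entries element; Track Number = list index + 1.

    entry_size comes from the Actual Header (0x30C), never hardcoded.
    """
    if entry_size < 8 or len(element) % entry_size != 0:
        raise ValueError("element size is not a multiple of the Track Entry size")
    return [element[i:i + entry_size] for i in range(0, len(element), entry_size)]


def track_entry_checksum_ok(entry: bytes) -> bool:
    """The checksum is the last 4 octets and covers everything before them."""
    if entry[:4] != b"TrkE":
        return False
    stored = int.from_bytes(entry[-4:], "big")
    return zlib.adler32(entry[:-4]) == stored
```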

Format versions may be pure values (greater value means greater version), bitmasks (each bit denotes a feature) or something else. Codecs or wrappers should know how to read those fields, and with that data, know if they can play it or not. It is recommended for all three fields to have the same format, but not required.

Nanoseconds per Block can be used for converting to old formats (like AVI), which only support constant framerates. You should never use it for MCF playback. NOTE: we can have "dropped frames" even if the rate is constant: conversion software must be able to detect this by comparing timecodes of Blocks to number_of_frame * nanosecs_per_frame. When writing MCF, make sure you define that field, as we really want users to be able to convert to AVI, if the material itself allows that.
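The dropped-frame check described above could look like this in Python. Assumptions: frame 0 is at timecode 0, and each Block is mapped to the nearest frame number, since Block timecodes are in milliseconds while the rate is in nanoseconds. dropped_frames is a hypothetical name.

```python
from typing import List

# Sketch: hypothetical helper name, not an official MCF API.

def dropped_frames(timecodes_ms: List[int], ns_per_block: int) -> List[int]:
    """Frame numbers that have no Block, for a Track with a constant rate.

    Frame n is expected at n * ns_per_block nanoseconds (assumption: frame 0
    at timecode 0). Integer rounding maps each Block to the nearest frame.
    """
    if ns_per_block <= 0:
        raise ValueError("Nanoseconds per Block is 0: rate is not constant")
    present = {(tc * 1_000_000 + ns_per_block // 2) // ns_per_block
               for tc in timecodes_ms}
    if not present:
        return []
    return [n for n in range(max(present) + 1) if n not in present]
```

With 25 fps (40000000 ns per Block) and Blocks at 0, 40, 80, 160 and 200 ms, frame 3 is reported as dropped.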

Clusters

The actual data is stored in checksum-protected Clusters. A Cluster can contain any number of Blocks, but it is recommended to align Clusters with video keyframes. Also, Clusters should be small enough to be loaded entirely into memory, but parsers should also be able to handle situations where there is not enough memory for holding a Cluster.

Too small Clusters waste some space, but too large Clusters reduce data recoverability and prevent error checking on the fly. Sizes between 5 Ko and 2 Mo are okay; sizes outside that range should be avoided. If there is an error somewhere inside a Cluster, the entire Cluster will be skipped.

In a Cluster we have one or more Blocks with Block Headers. The header in front of a Block tells which Track the Block belongs to and how big the Block is.

The normal way to read data from MCF would be reading Cluster Header (which contains the size of the Cluster), then reading the whole Cluster to a buffer in memory. At that point the checksum can be easily validated. If the checksum is valid, the first Block Header in it can be read. With the information in Block Header the first Block can be read, then the second Block Header, and so on, until the size of the Cluster is reached.

A Block contains one video frame, a tiny piece of audio, one subtitle, a menu or something else.

Structure of a Cluster:

Cluster Header
One or more Blocks
Cluster Footer

Cluster Header

Size: 16 octets (might increase, but very unlikely)

Cluster Header
Offset	Player	Description
0x00	check	ASCII: "CHdr"
0x04	must	Size of this Cluster, including Cluster Header and Footer, in octets (uint32)
0x08	check	Exact position of this Cluster (uint64)

The checksum (stored in the Cluster Footer) covers the Cluster Header, except for "CHdr", and the content (all Blocks of the Cluster).

If the file is damaged, the ASCII string "CHdr" can be searched for. The exact location of the Cluster, stored in its header, also helps recovering misaligned files.

Cluster Footer

Size: 4 octets (might increase, but very unlikely)

Cluster Footer
Offset	Player	Description
0x00	check	Adler-32 checksum of Cluster Header and Block(s) of the Cluster

Checksum allows verifying data integrity during playback, and reliable recovery of broken files.

Cluster Header and Cluster Footer sizes are defined in Actual Header. In practice, those sizes are always constant, but parsers MUST read those from Actual Header, instead of hard-coding.
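A sketch of reading one Cluster as described above (read the header, load the whole Cluster, validate, hand on the Blocks), in Python. Header and footer sizes are parameters because they must come from the Actual Header; the checksum range follows the Cluster Header description (everything after "CHdr" up to the footer). read_cluster and Cluster are hypothetical names.

```python
import zlib
from typing import NamedTuple

# Sketch: hypothetical names, not an official MCF API.

class Cluster(NamedTuple):
    size: int        # whole Cluster, including header and footer
    position: int    # "Exact position of this Cluster" as stored in the header
    payload: bytes   # the Blocks only


def read_cluster(buf: bytes, offset: int, header_size: int, footer_size: int) -> Cluster:
    """Parse and verify the Cluster at `offset`; sizes come from 0x304/0x306."""
    if buf[offset:offset + 4] != b"CHdr":
        raise ValueError(f"no CHdr identifier at {offset:#x}")
    size = int.from_bytes(buf[offset + 4:offset + 8], "big")
    position = int.from_bytes(buf[offset + 8:offset + 16], "big")
    cluster = buf[offset:offset + size]
    if size < header_size + footer_size or len(cluster) != size:
        raise ValueError(f"truncated or malformed Cluster at {offset:#x}")
    footer = size - footer_size
    stored = int.from_bytes(cluster[footer:footer + 4], "big")
    # Checksum covers the Cluster Header minus "CHdr", plus all Blocks.
    if zlib.adler32(cluster[4:footer]) != stored:
        raise ValueError(f"checksum mismatch in Cluster at {offset:#x}")
    return Cluster(size, position, cluster[header_size:footer])
```

On a checksum error, a player can still read the size field at offset + 4 to skip the damaged Cluster; a stored position that differs from the offset suggests a misaligned file.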

Block

Structure of a Block:

Block Header
Data (or Lacing + Data frame(s))
Ending timecode (only if the Gap flag is set)

Block Header

Size: 10 octets (might increase, but very unlikely)

This is located in front of each Block and is used for figuring out how big the Block is, when it should be played and which Track it belongs to.

Block Header

Offset	Player	Description
0x00	must	Block size, including Block Header (uint32)
0x04	must	Timecode (milliseconds from start of file, uint32)
0x08	must	Track Number (Track Entry) (uint8)
0x09	-	Flags:
  Bit 7	must	Gap - track is empty after this Block ends
  Bit 6	must	Keyframe - this Block is seekable
  Bit 5	must	B-frame - for displaying this Block, previous AND next non-B-frames are required
  Bits 4-0	-	Reserved, set to 0

The Keyframe flag can only be set if the Block starts with a keyframe. This is not a problem for video, but some audio compressions might have keyframes and non-keyframes. If we store several frames in one Block, any keyframe (if there are keyframes in that Block) should be aligned to the start of the Block, not somewhere in the middle of it. MP3, Vorbis and all other compressions I am aware of only contain keyframes, so this is not a problem there.

When there is a gap in a Track, set the Gap flag of the last Block before that gap. When the Gap flag is set, you must append an ending timecode (uint32) to the Block, right after its contents (data). Once the ending timecode is reached, the Track is reset to the same state it was in at the beginning of the stream. In other words, it gets cleared: Audio Tracks are silent, Video is black and Titles are completely transparent. The gap ends when there is another Block on the Track.
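A Python sketch of parsing one media Block, including the ending timecode appended when the Gap flag is set. It assumes Block size also counts that appended uint32 (the Block structure lists it as part of the Block), and it refuses Track #0, whose octet 0x09 means something else. parse_block and Block are hypothetical names.

```python
from typing import NamedTuple, Optional

# Sketch: hypothetical names, not an official MCF API.

class Block(NamedTuple):
    timecode_ms: int
    track: int
    gap: bool
    keyframe: bool
    b_frame: bool
    data: bytes                     # plain or laced data, without the header
    end_timecode_ms: Optional[int]  # present only when the Gap flag is set


def parse_block(raw: bytes, header_size: int) -> Block:
    """Parse one media Block; `raw` holds exactly Block size octets."""
    size = int.from_bytes(raw[0:4], "big")
    if size != len(raw) or size < max(header_size, 10):
        raise ValueError("Block size does not match the buffer")
    track, flags = raw[8], raw[9]
    if track == 0:
        raise ValueError("Track #0 is a Magic Block; octet 0x09 is its type")
    gap = bool(flags & 0x80)
    data = raw[header_size:]
    end_tc = None
    if gap:
        # Assumption: Block size also counts the appended ending timecode.
        if len(data) < 4:
            raise ValueError("Gap flag set but no ending timecode")
        end_tc = int.from_bytes(data[-4:], "big")
        data = data[:-4]
    return Block(int.from_bytes(raw[4:8], "big"), track, gap,
                 bool(flags & 0x40), bool(flags & 0x20), data, end_tc)
```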

Track #0 aka Magic Block

Frozen: structure, but new features may appear.

This is a special structure used for some stream control functions required in MCF.

Track #0: special case of Block Header
Offset	Player	Description
0x00	must	Block size (uint32)
0x04	must	Timecode (milliseconds from start of file, uint32)
0x08	must	Track Number: 0x00 (uint8)
0x09	must	Magic Block type (uint8): 0 = deleted block, 1 = headers rebroadcast, 0xFF = stream reset: end of Clusters Element

Clusters in MCF must always end with a stream reset command. This command must be the last Block in MCF and is considered the end of the stream. In normal files it is followed by other Elements (defined in Main Header) or the Main Footer. In Linear Write files, it is always followed by the Main Footer, which has a Main Header copy in it. Broadcasts may use it differently (work in progress).

When reading, you have to skip Magic Blocks whose type you don't know. If you know the type, you'll know how to handle it.
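Walking the Blocks of a verified Cluster with the Magic Block rules applied might look like this in Python. It stops at a stream reset and skips deleted, rebroadcast and unknown Magic Blocks; actually handling a headers rebroadcast is left out. walk_blocks is a hypothetical name.

```python
# Sketch: hypothetical names, not an official MCF API.
MAGIC_DELETED = 0x00        # deleted block
MAGIC_REBROADCAST = 0x01    # headers rebroadcast
MAGIC_STREAM_RESET = 0xFF   # end of Clusters Element


def walk_blocks(payload: bytes, header_size: int):
    """Split a verified Cluster payload into Blocks.

    Returns (blocks, end_of_stream): blocks is a list of (track, raw_block)
    for Tracks >= 1; end_of_stream is True once a stream reset is found.
    """
    blocks, pos = [], 0
    while pos < len(payload):
        size = int.from_bytes(payload[pos:pos + 4], "big")
        if size < max(header_size, 10) or pos + size > len(payload):
            raise ValueError(f"bad Block size {size} at payload offset {pos}")
        raw = payload[pos:pos + size]
        pos += size
        if raw[8] != 0:                      # ordinary media Block
            blocks.append((raw[8], raw))
            continue
        if raw[9] == MAGIC_STREAM_RESET:     # must be the very last Block
            return blocks, True
        # Deleted blocks, header rebroadcasts (not handled in this sketch)
        # and unknown Magic Block types are all skipped.
    return blocks, False
```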

Lacing

Frozen: totally.

Some (audio) formats (Vorbis) require many tiny frames in one Block. For this we have a system for sub-dividing a Block. We could just have each frame in its own Block, but that would cause too much overhead.

For example, consider a Block with 5 frames in it. Frame sizes are 70, 260, 255, 1030 and 666. The Block starts with the following structure:

Number of frames in the Block (5) (uint8)
Lace for frame #1 (70)
Lace for frame #2 (255, 5)
Lace for frame #3 (255, 0)
Lace for frame #4 (255, 255, 255, 255, 10)
Frame #1
Frame #2
Frame #3
Frame #4
Frame #5

Each lace ends with the first (uint8) value smaller than 255. Frame size can be calculated by taking a sum of all values in a lace (for frame #4: 255 + 255 + 255 + 255 + 10 = 1030). Size for the last frame is not required because we already know the Block size and all the remaining octets in the Block are for the last frame.
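The lacing scheme above, as a Python sketch that both builds and parses a laced Block body (the data after the Block Header, with any ending timecode already removed). Function names are hypothetical.

```python
# Sketch: hypothetical helper names, not an official MCF API.

def build_lacing(frames: list) -> bytes:
    """Lay out a laced Block body: frame count, laces, then the frames."""
    if not 1 <= len(frames) <= 255:
        raise ValueError("a laced Block holds 1..255 frames (count is a uint8)")
    out = bytearray([len(frames)])
    for frame in frames[:-1]:                  # the last frame needs no lace
        out += b"\xff" * (len(frame) // 255) + bytes([len(frame) % 255])
    for frame in frames:
        out += frame
    return bytes(out)


def parse_lacing(body: bytes) -> list:
    """Split a laced Block body into its frames."""
    if not body or body[0] == 0:
        raise ValueError("missing frame count")
    pos, sizes = 1, []
    for _ in range(body[0] - 1):
        size = 0
        while True:                            # a lace ends at the first value < 255
            if pos >= len(body):
                raise ValueError("truncated lace")
            value = body[pos]
            pos += 1
            size += value
            if value < 255:
                break
        sizes.append(size)
    frames = []
    for size in sizes:
        frames.append(body[pos:pos + size])
        pos += size
    if pos > len(body):
        raise ValueError("laces exceed the Block size")
    frames.append(body[pos:])                  # the rest is the last frame
    return frames
```

For frames of 70, 260, 255, 1030 and 666 octets, build_lacing() starts with exactly the octets listed above: 5, 70, 255, 5, 255, 0, 255, 255, 255, 255, 10.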

NOTE: Only Audio Tracks are supported at the moment (other Track Types don't need this!). Remember to set/read the flag in Track Entry before using this feature.

Codec Headers

Codec Headers contain (variable-sized) Track-specific data, which doesn't fit in the Track Entry. The actual content depends on Track Type and Format. Position of the Codec Headers element is defined in Actual Header. Codec Header size for each Track is defined in Track Entries. Codec Headers are sorted in Track order. Structure of Codec Headers element:

ASCII: "CodH" (4 octets)
Codec Header of Track #1
Codec Header of Track #2
...

If a Track doesn't need Codec Header, size of it (in Track Entry) is set to 0.
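Since the sizes live in the Track Entries and the element holds the headers in Track order, splitting it is straightforward. A Python sketch with a hypothetical function name:

```python
from typing import Dict, List

# Sketch: hypothetical helper name, not an official MCF API.

def split_codec_headers(element: bytes, sizes: List[int]) -> Dict[int, bytes]:
    """Map Track Number -> Codec Header.

    sizes[i] is "Size of the Codec Header" (0x28) from the Track Entry of
    Track i + 1; Tracks with size 0 have no Codec Header.
    """
    if element[:4] != b"CodH":
        raise ValueError("missing CodH identifier")
    if 4 + sum(sizes) != len(element):
        raise ValueError("Codec Header sizes do not add up to the element size")
    headers, pos = {}, 4
    for track_number, size in enumerate(sizes, start=1):
        if size:
            headers[track_number] = element[pos:pos + size]
        pos += size
    return headers
```

Each slice could then be checked against the Adler-32 stored at 0x2C of its Track Entry, for example with zlib.adler32.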

Adler-32 checksum of the whole Codec Header element is stored in Actual Header.

Seek Header

ASCII: "Seek" (4 octets)
Cluster Seek entries (one for each Cluster)
Block Seek information

Seek Header is an index that allows fast seeking - for example, if you want to watch a part starting at 57 minutes, you could download only the file headers (including Seek Header), then seek directly to the correct position and download only what you really want. All players should read Seek Header into memory for fast access. Space required for it is typically less than 20 kilobytes. Seek Header is mandatory.

Seek Header starts with ASCII "Seek" (4 octets), directly followed by one CS-entry for each Cluster of the file:

Size: 12 octets (might increase, but very unlikely)

CS-Entry (Cluster Seek)
Offset	Player	Description
0x00	must	Position of the Cluster (uint64, octets relative to start of file)
0x08	must	Timecode (milliseconds from start of file, uint32)

Timecode points to the time "sync" is reached if decoding starts at the Cluster pointed to by that Seek Entry. Sync means that each Track either has reached keyframe or has a gap (is not active).

If the Cluster is the first Cluster in the file, timecode should be set to zero (because all tracks have gap until the first Block comes). If some Tracks never get synced (end of file is reached before they sync), set timecode to 0xFFFFFFFF.
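A Python sketch of using CS-Entries to pick the Cluster to start decoding from: among the Clusters whose sync timecode is not later than the wanted position, take the last one in the file. It assumes the Seek element currently holds only CS-Entries (Block Seek is not specified yet) and ignores the BS index entirely. Helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Sketch: hypothetical helper names, not an official MCF API.
NEVER_SYNCED = 0xFFFFFFFF


def parse_cs_entries(element: bytes, entry_size: int) -> List[Tuple[int, int]]:
    """(cluster_position, sync_timecode_ms) pairs; entry_size from 0x307."""
    if entry_size < 12 or element[:4] != b"Seek" or (len(element) - 4) % entry_size:
        raise ValueError("not a well-formed Seek element")
    entries = []
    for off in range(4, len(element), entry_size):
        position = int.from_bytes(element[off:off + 8], "big")
        sync_ms = int.from_bytes(element[off + 8:off + 12], "big")
        entries.append((position, sync_ms))
    return entries


def cluster_for_time(entries: List[Tuple[int, int]], target_ms: int) -> Optional[int]:
    """Position of the latest Cluster that is in sync by target_ms, or None."""
    candidates = [pos for pos, sync_ms in entries
                  if sync_ms != NEVER_SYNCED and sync_ms <= target_ms]
    return max(candidates, default=None)
```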

There is one big problem with this system: if there is some content that stays on-screen for very long (a logo, for example), seeking won't work as expected. As a workaround, such problematic Blocks are excluded from those timecodes (CS-Entries). Blocks excluded from the CS-index are described in Block Seek.

A good rule for deciding if a Block should go to normal or special index is displaying time: a Block that stays on screen over three or more Clusters should go to BS instead of CS.

Work in progress: Block Seek specs coming soon!

Attached Files

Frozen: All of it.

This allows bundling (little) files with the content. This could mean pictures of movie covers, posters, etc. Structure of the element is:

ASCII: "File" (4 octets)
Attach Entries (right one after another, size of an Attach Entry is defined in Actual Header)
ASCII: "FDat" (4 octets)
Data (files one after another, in same order as Attach Entries)

Attach Entry (currently 0xD1 (209) octets)
Offset	Player	Description
0x00	must	File description (128 octets)
0x80	must	MIME-type (32 octets ASCII)
0xA0	must	File extension (32 octets FILENAME)
0xC0	must	Position of the file (uint64)
0xC8	must	File size in octets (uint32)
0xCC	check	Adler-32 checksum of the file
0xD0	must	File compression (0 = no compression, others not defined yet) (uint8)

File extension is required for saving the file to disk and for determining its type (if the target system doesn't support MIME-types). It can also be used for defining part of the full filename, not just the extension. Extracted files get their basename (name up to the first dot) from the MCF-file they were extracted from. For example, Example-movie.XviD.Vorbis.av.mcf with an attachment named instructions.html would result in that attachment being extracted as "Example-movie.instructions.html". Never start an extension with a dot.
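A Python sketch of reading the current 209-octet Attach Entry with the struct module and deriving the extracted filename. The entry size from the Actual Header may grow in future versions, so only the known prefix is unpacked. Helper names are hypothetical; mcf_filename should be the bare filename, without directories.

```python
import struct

# Sketch: hypothetical helper names, not an official MCF API.
# Current Attach Entry layout, big-endian, 209 octets in total.
ATTACH_ENTRY = struct.Struct(">128s32s32sQIIB")


def parse_attach_entry(raw: bytes) -> dict:
    desc, mime, ext, position, size, adler, compression = ATTACH_ENTRY.unpack_from(raw)
    return {
        "description": desc.rstrip(b"\x00").decode("utf-8"),
        "mime_type": mime.rstrip(b"\x00").decode("ascii"),
        "extension": ext.rstrip(b"\x00").decode("ascii"),
        "position": position,
        "size": size,
        "adler32": adler,
        "compression": compression,   # 0 = no compression
    }


def extracted_filename(mcf_filename: str, extension: str) -> str:
    """Basename of the MCF-file (up to the first dot) + "." + extension field."""
    if extension.startswith("."):
        raise ValueError("an extension must never start with a dot")
    basename = mcf_filename.split(".", 1)[0]
    return f"{basename}.{extension}"
```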

Chapter Definitions

Frozen: all of it, but don't consider this freeze very strong.

This is a simple menu for jumping to certain positions of the movie - a menu listing these should be accessible with the following buttons (depending on which buttons your hardware has): chapters, menu, tab, alt-c. You are not tied to using these buttons, but these are what we recommend. On Windows this should be in the right-click menu (as one submenu).

Another use for this is defining different editions (cuts) of the material. Control-track (menu) is used for switching between editions and for actual jumps during playback. When edition is switched from a control-Block, entries available in player's chapter-menu should change.

The first thing in Chapter Definitions is one or more edition-entries. The edition defined first is edition #1 and will be the default.

ASCII: "Chap" (4 octets)
Edition Entries
Chapter Entries for edition #1
Chapter Entries for edition #2
..and so on

Edition Entry (66 octets each)
Offset	Player	Description
0x00	info	Name of the edition (64 octets)
0x40	must	Number of chapters in this edition (uint16)

For each chapter of edition #1 there is one chapter-entry. Directly after those there are chapter-entries for chapters of edition #2 (if there are two or more editions), and so on.

Chapter Blocks are part of the chapter-header. Total size of the chapter-header is (no_of_chapters * size_of_chapterBlock + 4) octets.

Chapter-entry (264 octets each)
Offset	Player	Description
0x00	must	Start-timecode of the chapter (uint32)
0x04	must	End-timecode of the chapter (uint32)
0x08	must	Name of the chapter (256 octets)

We use 0x00 for separating menu levels:

Shrek\0Intro
Shrek\0Eremite Ogre\0Hunters
Shrek\0Eremite Ogre\0Donkey
... (many other chapters)
Shrek\0Party
Shrek\0Ending Credits
Making of Shrek
Animation Interviews

Maximum number of menu-levels is 4. Total number of chapter-entries is limited to 4096. Maximum number of entries in one menu is limited to 256. If any of these limits are broken, players should give a warning and ignore the chapter-structure of that movie. The purpose of these limits is to stop MCF-files from overloading the user-interface.
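A Python sketch of turning chapter names into a menu tree and applying the limits above. It assumes "menu-levels" counts every part of a name, leaf included, and that the 256-octet name fields are passed in raw, so the trailing 0x00 padding must be stripped before splitting on 0x00. build_chapter_menu is a hypothetical name.

```python
from typing import List, Optional

# Sketch: hypothetical helper name, not an official MCF API.
MAX_LEVELS, MAX_ENTRIES, MAX_PER_MENU = 4, 4096, 256


def build_chapter_menu(name_fields: List[bytes]) -> Optional[dict]:
    """Nested menu (dict of dicts) from raw chapter names, or None if a
    limit is broken (the player should then warn and ignore the chapters)."""
    if len(name_fields) > MAX_ENTRIES:
        return None
    root: dict = {}
    for field in name_fields:
        # Strip the 0x00 padding first, or it would show up as empty levels.
        path = field.rstrip(b"\x00").decode("utf-8").split("\x00")
        if len(path) > MAX_LEVELS:
            return None
        node = root
        for part in path:
            node = node.setdefault(part, {})
    stack = [root]
    while stack:                       # enforce the per-menu limit everywhere
        node = stack.pop()
        if len(node) > MAX_PER_MENU:
            return None
        stack.extend(node.values())
    return root
```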

This header is also used for next/prev chapter functions. Start/end-timecodes are used for figuring out which chapter is playing. If no chapters match (i.e. the position playing is not included in the edition currently selected), we'll jump to timecode 0. The Previous button normally moves to the chapter right before the current chapter, not to the beginning of the current chapter. The Play button (if not shared with the Pause button) should jump to the beginning of the current chapter.
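The next/prev rules above, sketched in Python for one edition's chapters as (start, end) timecode pairs. Assumptions not stated in this specification: end timecodes are exclusive, "right before" means the previous entry in the list, Previous stays at the first chapter's start when already in it, and Play also falls back to timecode 0 when no chapter matches. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Sketch: hypothetical helper names, not an official MCF API.
Chapter = Tuple[int, int]   # (start timecode, end timecode) in ms


def current_chapter(chapters: List[Chapter], position_ms: int) -> Optional[int]:
    """Index of the chapter playing at position_ms (assumption: end exclusive)."""
    for index, (start, end) in enumerate(chapters):
        if start <= position_ms < end:
            return index
    return None


def previous_target(chapters: List[Chapter], position_ms: int) -> int:
    """Timecode the Previous button jumps to."""
    index = current_chapter(chapters, position_ms)
    if index is None:
        return 0                            # no chapter matches: timecode 0
    # The chapter right before the current one (assumption: list order).
    return chapters[max(index - 1, 0)][0]


def play_target(chapters: List[Chapter], position_ms: int) -> int:
    """Play button (when separate from Pause): start of the current chapter."""
    index = current_chapter(chapters, position_ms)
    return 0 if index is None else chapters[index][0]   # 0: assumed fallback
```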

Signature Header

This goes to the end of file and contains a digital signature of the file, so that you can verify that the file is coming from where you think it is from - and unmodified.

Signature Header
Offset	Player	Description
0x00	check	ASCII: "SigH" (4 octets)
0x04	must	Public key algorithm (0 = RSA) (uint8)
0x05	must	Hash algorithm (0 = SHA1-160) (uint8)
0x06	must	Size of the public key (octets, uint16)
0x08	must	Position of the first octet to sign (normally 160: beginning of the Actual Header) (uint64)
0x10	must	Position of the last octet to sign (normally the octet right before this header) (uint64)
0x18	must	Public key (extremely long unsigned integer)
0x18 + keysize	must	Encrypted hash (extremely long unsigned integer)
-0x04 (from end of header)	check	Adler-32 checksum of Signature Header, excluding this field

The size of the encrypted hash is determined by the total size of the Signature Header Element (defined in the Actual Header).
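Summing the field sizes listed in the table (4 + 1 + 1 + 2 + 8 + 8 octets) places the end of the fixed fields at 0x18; from that, the key size and the Element size, the size of the encrypted hash follows. The sketch assumes nothing else sits between the hash and the trailing Adler-32 field; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed part of the Signature Header: "SigH" (4) + public key algorithm (1)
   + hash algorithm (1) + key size (2) + first/last signed position (8 + 8). */
#define SIGH_FIXED_SIZE    (4u + 1u + 1u + 2u + 8u + 8u)   /* = 0x18 */
#define SIGH_CHECKSUM_SIZE 4u                              /* trailing Adler-32 */

/* Size of the encrypted hash in octets, or -1 if the Element is too small.
   ASSUMPTION: the hash is followed directly by the checksum field. */
int64_t sigh_hash_size(uint64_t element_size, uint16_t key_size)
{
    uint64_t used = SIGH_FIXED_SIZE + key_size + SIGH_CHECKSUM_SIZE;
    if (element_size <= used)
        return -1;
    return (int64_t)(element_size - used);
}

/* Number of octets covered by the signature; both positions are inclusive. */
uint64_t sigh_signed_length(uint64_t first, uint64_t last)
{
    assert(last >= first);
    return last - first + 1;
}
```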

We are compatible with PGP/GPG - your keypairs work with this too.

Work in progress: Signature Header is likely to change

MultiSegment

Work in progress! Really.

Sometimes files must be split to fit on CDs or to meet file size restrictions. It isn't always easy to find a spot where a keyframe appears in all tracks. MultiSegment allows splitting MCFs at any Cluster boundary without having to recalculate the Cluster checksums.

A MultiSegment set of files should be treated just like one big file. All segments must have exactly the same settings (changing codecs or anything else is not allowed).

All Small Elements (except the MultiSeg Header) must be stored in the first segment; other segments only contain the Main Header, MultiSeg Header, Clusters and Main Footer.

MultiSeg Header (alpha)
Offset	Description
0x00	ASCII: "MSeg" (4 octets)
0x04	Cur_seg_num: number of the current segment, first is 0x01 (1 octet)
0x05	Total_segs: total number of segments, or 0x00 if unknown (1 octet)
0x06	Number of segments defined in this header, must be greater than or equal to Cur_seg_num (1 octet)
0x07	Reserved (9 octets)
0x10	Seg1: Filename (64 octets)
0x50	Seg2: First timecode (4 octets)
0x54	Seg2: First synced time, or 0xFFFFFFFF if it never gets synced in this segment (4 octets)
0x58	Seg2: Offset, i.e. how many octets all positions inside the Clusters Element are off (8 octets)
0x60	Seg2: Filename (64 octets)
0xA0-0xF0	Seg3 (same structure as Seg2)
0xF0-	... and so on, until "number of segments defined here" is reached
-0x04	Checksum of MultiSeg Header (offset relative to the end of the header, as defined by the Element size)
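Because every entry after the first has the same 0x50-octet layout, offsets inside the MultiSeg Header can be computed rather than looked up. The header is still alpha, so treat this as illustrative only; it assumes no padding between the last entry and the trailing checksum, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSEG_SEG1_FILENAME 0x10u  /* filename of segment 1 */
#define MSEG_FIRST_ENTRY   0x50u  /* the Seg2 entry starts here */
#define MSEG_ENTRY_SIZE    0x50u  /* timecode 4 + synced time 4 + offset 8 + filename 64 */
#define MSEG_CHECKSUM_SIZE 4u

/* Offset of the entry describing segment `seg` (seg >= 2). */
uint32_t mseg_entry_offset(unsigned seg)
{
    assert(seg >= 2);
    return MSEG_FIRST_ENTRY + (seg - 2) * MSEG_ENTRY_SIZE;
}

/* Offset of the filename of segment `seg` (seg >= 1); inside an entry
   the filename follows 0x10 octets of timecodes and offset. */
uint32_t mseg_filename_offset(unsigned seg)
{
    assert(seg >= 1);
    return seg == 1 ? MSEG_SEG1_FILENAME : mseg_entry_offset(seg) + 0x10u;
}

/* Total header size when `defined` segments are described.
   ASSUMPTION: the checksum follows the last entry directly. */
uint32_t mseg_header_size(unsigned defined)
{
    assert(defined >= 1);
    return MSEG_FIRST_ENTRY + (defined - 1) * MSEG_ENTRY_SIZE + MSEG_CHECKSUM_SIZE;
}
```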

Filenames must contain ".seg3of4.", where 3 and 4 are the decimal numbers of the current segment and the total number of segments. Players should not rely on this information but should use the filenames stored inside the file instead (and ask the user what to load in case something cannot be found). If the user can't supply some part, players should still handle the situation by skipping over the missing part. Even the case where the first segment is missing should be handled.
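Extracting the ".seg3of4." tag from a filename might look like the following; a player would use it only as a fallback hint, as stated above. `parse_seg_tag` is a hypothetical helper that accepts the first well-formed tag it finds.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Looks for ".seg<cur>of<total>." in `filename`.
   Returns 1 and stores both numbers on success, 0 if no valid tag exists. */
int parse_seg_tag(const char *filename, unsigned *cur, unsigned *total)
{
    const char *p = filename;

    assert(filename != NULL && cur != NULL && total != NULL);
    while ((p = strstr(p, ".seg")) != NULL) {
        const char *q = p + 4;
        unsigned a = 0, b = 0;
        int da = 0, db = 0;

        while (isdigit((unsigned char)*q)) {      /* current segment */
            a = a * 10 + (unsigned)(*q - '0');
            q++;
            da++;
        }
        if (da > 0 && q[0] == 'o' && q[1] == 'f') {
            q += 2;
            while (isdigit((unsigned char)*q)) {  /* total segments */
                b = b * 10 + (unsigned)(*q - '0');
                q++;
                db++;
            }
            if (db > 0 && *q == '.') {
                *cur = a;
                *total = b;
                return 1;
            }
        }
        p++;  /* not a valid tag here: keep searching */
    }
    return 0;
}
```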

NOTE: Main Header backups inside Clusters might get replaced by this; we still have to think about it.

Main Footer

Work in progress!

This part is anchored at the end of the file and is mandatory. If Linear Write is used in the primary copy of the Main Header, a Main Header copy with all information filled in must be inserted into the Main Footer. The Main Footer can also contain other Elements; in that case, the positions and sizes of those Elements are defined in the Main Header copy inside the Main Footer.

Main Footer
Offset	Player	Description
-size_of_MF	-	Main Header and optionally other Elements (1024 or more octets)
-0x14	must	Main Footer size (uint32)
-0x10	-	"MCF ends here ->" (16 octets)
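A reader can find the Main Footer without scanning the file by checking the 16-octet trailer and reading the size stored in front of it. This section does not state the byte order of uint32 fields, so the sketch assumes little-endian; it also assumes the stored size covers the whole Main Footer, trailer included, as the -size_of_MF offset suggests. `main_footer_start` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MF_TRAILER_MAGIC "MCF ends here ->"   /* 16 octets, no terminator stored */

/* ASSUMPTION: uint32 fields are little-endian. */
static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Offset of the first octet of the Main Footer in a file held in memory,
   or -1 if the trailer is missing or the stored size is implausible. */
long main_footer_start(const unsigned char *file, size_t len)
{
    uint32_t size;

    assert(file != NULL || len == 0);
    if (len < 0x14)
        return -1;
    if (memcmp(file + len - 0x10, MF_TRAILER_MAGIC, 16) != 0)
        return -1;
    size = read_u32le(file + len - 0x14);   /* Main Footer size */
    if (size < 0x14 || size > len)
        return -1;
    return (long)(len - size);
}
```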

The previous copy of the Main Header may be at position 0 (if there is only one copy) or somewhere else (with Magic Blocks).

If MultiSegment and Linear Write are both used, Main Footer copy of Main Header

Checksums

MCF uses checksums for all headers and data to detect damage, and to allow re-downloading only the broken part, or playing the other parts while skipping the broken one. The algorithm used everywhere is Adler-32.
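For reference, a straightforward C version of Adler-32 as defined in RFC 1950 is sketched below: two sums modulo 65521, combined as B * 65536 + A. zlib's `adler32()` computes the same value faster by deferring the modulo; which octets each MCF checksum covers is defined per Element elsewhere in this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u  /* largest prime below 2^16 */

/* Adler-32 per RFC 1950: A starts at 1, B at 0, both kept mod 65521.
   Simple version without zlib's deferred-modulo optimisation. */
uint32_t adler32(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % ADLER_MOD;   /* running sum of octets */
        b = (b + a) % ADLER_MOD;         /* running sum of the A values */
    }
    return (b << 16) | a;
}
```

The checksum of an empty input is 1, and "Wikipedia" gives 0x11E60398.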

Links to Adler-32 resources:

http://www.gzip.org/zlib/zlib_tech.html - bottom of that page: CRC-32 compared with Adler-32
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1950.html - bottom of the page: description and C code for computing Adler-32

Levels (another way to explain MCF structure)

I don't know if this part clarifies anything, but it's the best I can offer now. To access a low level (say, #3), you obviously need to know the higher levels (#1 and #2) too. Level #3 is surely the most commonly used.

Level #0: Main Header

Not actually a level, but some software might be happy without knowing anything about the other levels, e.g. software that only reads Extended Info or some other field from the Main Header.

Where required: search engines, very simple file validators

Level #1: Elements

Each has its own color in our structure diagram

Each bit of an MCF-file belongs to some Element
Each Element has a position and a size
Three types: Main Header, Small Element and Big Element
Small Elements and Big Elements can be reordered with other Small/Big Elements
Small Elements have their importance defined in Actual Header
Every Element has its size defined in Actual Header
All Small and Big Elements have their position defined in Actual Header; Main Header is always at position 0
Each starts with a 4+ octet (ASCII) identifier
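The rule that every bit belongs to some Element can be checked from the positions and sizes in the Actual Header. The sketch assumes Elements never overlap, so together they must tile the file exactly, starting with the Main Header at position 0; `element_extent` and `elements_cover_file` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint64_t position;  /* from the Actual Header (Main Header: always 0) */
    uint64_t size;      /* from the Actual Header */
} element_extent;

static int by_position(const void *x, const void *y)
{
    const element_extent *a = x, *b = y;
    return (a->position > b->position) - (a->position < b->position);
}

/* Returns 1 if the Elements cover the file exactly: the first starts at 0,
   each starts where the previous one ends, and the last ends at file_size.
   ASSUMPTION: Elements never overlap.  Sorts `el` in place. */
int elements_cover_file(element_extent *el, size_t n, uint64_t file_size)
{
    uint64_t expected = 0;

    assert(el != NULL);
    qsort(el, n, sizeof *el, by_position);
    for (size_t i = 0; i < n; i++) {
        if (el[i].position != expected)
            return 0;               /* gap or overlap */
        expected += el[i].size;
    }
    return expected == file_size;
}
```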

Where required: reordering MCF for better performance, selecting protected areas for MCF-CD

Level #2: Clusters

Each bit of the Clusters Element belongs to exactly one Cluster
Cluster has a size and a checksum of itself
Clusters cannot be reordered, but reClustering on Block level is possible

Where required: checksum validation of data

Level #3: Blocks

Each bit of a Cluster belongs to exactly one Block, or to the Cluster Header
Has a size and a timecode
Consists of Block Header and data
Each belongs to a Track

Where required: players, encoding software, simple editors

Level #4: Frames

We could take the easy way and put each Frame into a separate Block (the "single per Block" framing mode), so that each would have uint32 fields for size and timecode. Unfortunately, audio Frames can be very small (ranging from 32 bits for typical PCM to roughly 100-1500 octets for Vorbis etc.). This would mean a big overhead. Also, always handling data in such small pieces would be inefficient. That's why we often need to put many Frames into a single Block.
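To put a number on that overhead: counting only the two uint32 fields (a real Block Header may carry more), a 32-bit PCM Frame would need 8 octets of header for 4 octets of data. The hypothetical helper below makes the comparison explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Overhead of "single per Block" framing in percent of the Frame data,
   rounded down.  Counts only the uint32 size and uint32 timecode fields. */
uint32_t spb_overhead_percent(uint32_t frame_octets)
{
    const uint32_t header_octets = 4 + 4;
    assert(frame_octets > 0);
    return header_octets * 100u / frame_octets;
}
```

A 4-octet PCM Frame gives 200 %, a 100-octet Vorbis Frame 8 %, and a 1500-octet Frame rounds down to 0 %.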

At player level, data is handled in Blocks: no knowledge of the underlying framing is required.

Frame is the smallest piece of data MCF can handle
Framing modes: (new modes might appear in future)
- Single per Block (SPB): no framing: size and timecode come from the Block
- Fixed size Frames (FixS): size of a Frame known because all are same size
- OctetLacing (L8): Frame sizes (with a precision of one octet) described in the data-area of a Block
Timecodes are only known for:
- The first frame in a Block, and
- All Frames in Tracks which have constant Frame duration
Currently only audio tracks may use framing modes other than SPB
Framing mode is constant for the whole Track (flags and fields for it are in Track Entry)
Framing mode is ALSO constant for each audio format (defined in MCF format specs)
There is no setting for "SPB enabled" - if no other modes apply, SPB is always assumed.
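For FixS Tracks, the number of Frames in a Block and (with constant Frame duration) each Frame's timecode follow from values the player already has. The sketch assumes the Block's data area contains nothing but Frames and that Frame size and duration come from the Track Entry; parameter and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of Frames in a Block of a FixS Track, or -1 if the data size
   is not a whole multiple of the Frame size. */
long fixs_frame_count(uint32_t block_data_size, uint32_t frame_size)
{
    assert(frame_size > 0);
    if (block_data_size % frame_size != 0)
        return -1;
    return (long)(block_data_size / frame_size);
}

/* Timecode of Frame `index` (0 = first) in a Block, for Tracks with
   constant Frame duration: the Block timecode belongs to the first Frame. */
uint32_t frame_timecode(uint32_t block_timecode, uint32_t frame_duration,
                        uint32_t index)
{
    return block_timecode + index * frame_duration;
}
```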

Where required: advanced editors, MCF-codecs, wrappers

Exceptions to the above rules about levels:

If a Signature Header is present, data (except unprotected areas) cannot be changed in any way without removing the Signature Header.

Maintained by: Steve Lhomme (robUx4) & Lasse Kärkkäinen (Tronic)

SourceForge is hosting MCF