USTAR

USTAR is the POSIX standard form of tar, Unix's tape archive format. It has become a very common general-purpose archive format, and tools to create tar archives are available for virtually every OS. It is also very simple. Because USTAR is specified by POSIX (POSIX.1-1988 and POSIX.1-2001), it is well defined and well documented. Plenty of example code exists, and the GNU tar utility is open source.

As an archive format, USTAR is not strictly a filesystem, but it can very easily be used as one in read-only mode. Writing files is allowed under the standard, but only by appending complete updated copies of the files to be changed. This restriction makes USTAR unsuitable as a general-purpose writeable filesystem. However, read-only tarfs implementations have their place. Notably, the Plan 9 and Inferno operating systems developed by Ken Thompson and Dennis Ritchie included it with other archive filesystems, all read-only.

Tar archives may be written directly to floppy disks or partitions. The format is designed for media that may be longer than the archive: the end of the archive is marked by two consecutive zero-filled blocks, so trailing garbage after it is harmless. The block size, 512 bytes, formerly matched the sector size of most storage media, but this matters little unless you read or write one block at a time.

Format Details

Each file and directory has a 512-byte header sector containing its metadata (an i-node with the filename, if you like). If the file is not empty, the header sector is followed by data sectors holding the file contents, padded up to a multiple of 512 bytes.
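As a worked example of this layout, the following sketch computes how many data sectors a file occupies and how many bytes the whole entry spans. The names TAR_BLOCK, tar_data_blocks and tar_entry_span are made up for illustration and do not come from any standard API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_BLOCK 512u  /* size of every header and data sector */

/* Number of 512-byte data sectors needed for a file of `size` bytes. */
static size_t tar_data_blocks(size_t size) {
    assert(size <= SIZE_MAX - (TAR_BLOCK - 1));  /* rounding must not overflow */
    return (size + TAR_BLOCK - 1) / TAR_BLOCK;
}

/* Bytes an entry occupies in the archive: one header sector plus its data. */
static size_t tar_entry_span(size_t size) {
    return (tar_data_blocks(size) + 1) * TAR_BLOCK;
}
```

For instance, a 1025-byte file needs three data sectors, so its entry spans 2048 bytes including the header, while an empty file spans only the 512-byte header.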

Offset  Size  Description
0       100   File name
100     8     File mode (octal)
108     8     Owner's numeric user ID (octal)
116     8     Group's numeric group ID (octal)
124     12    File size in bytes (octal)
136     12    Last modification time in numeric Unix time format (octal)
148     8     Checksum for header record (octal)
156     1     Type flag
157     100   Name of linked file
257     6     UStar indicator "ustar" then NUL
263     2     UStar version "00"
265     32    Owner user name
297     32    Owner group name
329     8     Device major number (octal)
337     8     Device minor number (octal)
345     155   Filename prefix
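The layout in the table can be written down as a C structure. This is only a sketch: the struct name and field names are chosen for illustration, and the 12 trailing padding bytes are an assumption that follows from the fields ending at offset 500 in a 512-byte sector. The compile-time checks confirm that the offsets agree with the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; one member per row of the table above. */
struct ustar_header {
    char name[100];      /* offset 0   */
    char mode[8];        /* offset 100 */
    char uid[8];         /* offset 108 */
    char gid[8];         /* offset 116 */
    char size[12];       /* offset 124 */
    char mtime[12];      /* offset 136 */
    char chksum[8];      /* offset 148 */
    char typeflag;       /* offset 156 */
    char linkname[100];  /* offset 157 */
    char magic[6];       /* offset 257, "ustar" then NUL */
    char version[2];     /* offset 263, "00" */
    char uname[32];      /* offset 265 */
    char gname[32];      /* offset 297 */
    char devmajor[8];    /* offset 329 */
    char devminor[8];    /* offset 337 */
    char prefix[155];    /* offset 345 */
    char pad[12];        /* unused bytes up to 512 */
};

static_assert(sizeof(struct ustar_header) == 512, "header must fill one sector");
static_assert(offsetof(struct ustar_header, size) == 124, "size field offset");
static_assert(offsetof(struct ustar_header, magic) == 257, "magic field offset");
static_assert(offsetof(struct ustar_header, prefix) == 345, "prefix field offset");
```

Because every member is a char or char array, the compiler inserts no padding, so a sector read from disk can be viewed through a pointer to this structure.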

The only trick is that numeric fields such as the file size are not stored in binary but as ASCII octal strings. For example, a size of 1025 bytes is stored as '00000002001' (11 octal digits) followed by a terminating NUL.
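To illustrate the encoding from the writing side, the sketch below formats a value into a 12-byte field as 11 zero-padded octal digits plus a NUL, which is the common convention. The function name tar_write_octal12 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: store `value` in a 12-byte numeric field as
   11 zero-padded octal digits plus a terminating NUL.
   Returns 0 on success, -1 if the value needs more than 11 digits. */
static int tar_write_octal12(char *field, unsigned long long value) {
    assert(field != NULL);
    if (value > 077777777777ULL)   /* largest 11-digit octal number */
        return -1;
    snprintf(field, 12, "%011llo", value);
    return 0;
}
```

Calling tar_write_octal12(buf, 1025) leaves the string "00000002001" in buf.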

The Type flag field identifies what kind of entry the header describes.

Type flag             Meaning
'0' or (ASCII NUL)    Normal file
'1'                   Hard link
'2'                   Symbolic link
'3'                   Character device
'4'                   Block device
'5'                   Directory
'6'                   Named pipe (FIFO)
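In code, the type flags in the table might be handled as in the following sketch. The enum and function names are invented for this example; only the flag values themselves come from the table.

```c
#include <assert.h>

/* Type flags are ASCII bytes, so this sketch assumes an
   ASCII-compatible execution character set. */
static_assert('0' == 0x30, "type flags are ASCII characters");

/* Hypothetical names for the type flag values listed above. */
enum tar_type {
    TAR_REGULAR   = '0',
    TAR_HARDLINK  = '1',
    TAR_SYMLINK   = '2',
    TAR_CHARDEV   = '3',
    TAR_BLOCKDEV  = '4',
    TAR_DIRECTORY = '5',
    TAR_FIFO      = '6'
};

/* Map a raw type flag byte to a printable description. */
static const char *tar_type_name(char flag) {
    switch (flag) {
    case '\0':              /* NUL is also accepted for normal files */
    case TAR_REGULAR:   return "normal file";
    case TAR_HARDLINK:  return "hard link";
    case TAR_SYMLINK:   return "symbolic link";
    case TAR_CHARDEV:   return "character device";
    case TAR_BLOCKDEV:  return "block device";
    case TAR_DIRECTORY: return "directory";
    case TAR_FIFO:      return "named pipe";
    default:            return "unknown";
    }
}
```

A read-only tarfs would typically serve file contents only for normal files and use the directory entries to build its tree.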

That is pretty much all you need to know for a basic implementation.
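A basic reader can ignore the checksum field, but verifying it is a cheap way to reject corrupted headers. The sketch below follows the usual rule: add up all 512 header bytes as unsigned values, counting the 8 checksum bytes at offset 148 as ASCII spaces, and compare the sum with the octal number stored in the field. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of all 512 header bytes, with the checksum field (offsets 148..155)
   counted as if it contained ASCII spaces. */
static unsigned tar_checksum(const unsigned char *header) {
    unsigned sum = 0;
    assert(header != NULL);
    for (size_t i = 0; i < 512; i++)
        sum += (i >= 148 && i < 156) ? (unsigned)' ' : header[i];
    return sum;
}

/* Parse an octal field of at most n bytes: skip leading spaces, then read
   digits until the first byte that is not '0'..'7'. */
static unsigned tar_parse_octal(const unsigned char *s, size_t n) {
    unsigned v = 0;
    size_t i = 0;
    while (i < n && s[i] == ' ')
        i++;
    for (; i < n && s[i] >= '0' && s[i] <= '7'; i++)
        v = v * 8 + (unsigned)(s[i] - '0');
    return v;
}

/* Nonzero if the stored checksum matches the computed one. */
static int tar_checksum_ok(const unsigned char *header) {
    return tar_checksum(header) == tar_parse_octal(header + 148, 8);
}
```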

Example Code

We need a helper function to convert an ASCII octal number to binary:

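/* Convert an ASCII octal field of at most `size` characters to an integer.
   Parsing stops at the first character that is not an octal digit,
   since fields are terminated by NUL or space. */
int oct2bin(const unsigned char *str, int size) {
    int n = 0;
    const unsigned char *c = str;
    while (size-- > 0 && *c >= '0' && *c <= '7') {
        n *= 8;
        n += *c - '0';
        c++;
    }
    return n;
}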

Then file lookup is as simple as:

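#include <string.h>

/* Returns the file size and stores a pointer to the file data in *out.
   Returns 0 if the file is not found, which cannot be told apart from
   an empty file. The filename prefix field is ignored. */
int tar_lookup(unsigned char *archive, const char *filename, char **out) {
    unsigned char *ptr = archive;

    /* every valid header carries the "ustar" indicator at offset 257 */
    while (!memcmp(ptr + 257, "ustar", 5)) {
        int filesize = oct2bin(ptr + 0x7c, 11);   /* size field, offset 124 */
        if (!memcmp(ptr, filename, strlen(filename) + 1)) {
            *out = (char *)(ptr + 512);           /* data follows the header */
            return filesize;
        }
        /* skip the header plus the data rounded up to whole sectors */
        ptr += (((filesize + 511) / 512) + 1) * 512;
    }
    return 0;
}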
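The lookup above trusts the archive to be well formed and has no notion of where the buffer ends. As a hedged sketch of a more defensive walk over the same structure, the following counts the entries in an in-memory archive without ever reading past a caller-supplied length. The names tar_octal and tar_count_entries are invented for this example, and very large sizes on 32-bit targets are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK 512u

/* Parse up to n octal digits, stopping at the first byte that is not '0'..'7'. */
static size_t tar_octal(const unsigned char *s, size_t n) {
    size_t v = 0;
    for (size_t i = 0; i < n && s[i] >= '0' && s[i] <= '7'; i++)
        v = v * 8 + (size_t)(s[i] - '0');
    return v;
}

/* Count the entries whose header and data lie entirely within `len` bytes. */
static size_t tar_count_entries(const unsigned char *archive, size_t len) {
    size_t off = 0, count = 0;
    assert(archive != NULL);
    while (len - off >= TAR_BLOCK &&
           memcmp(archive + off + 257, "ustar", 5) == 0) {
        size_t size = tar_octal(archive + off + 124, 11);
        size_t span = ((size + TAR_BLOCK - 1) / TAR_BLOCK + 1) * TAR_BLOCK;
        if (span > len - off)
            break;          /* entry is truncated, stop here */
        off += span;
        count++;
    }
    return count;
}
```

The same loop shape can be reused for listing or lookup by inspecting the header at archive + off before advancing.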

If you do not load the entire archive into memory, you will have to read the header sector from disk at the beginning of the loop (before 'int filesize') and seek to the next header, skipping the data sectors, at the end of the loop (instead of 'ptr +=').
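The disk-based variant described above might look like the following sketch. It uses a standard C FILE stream as a stand-in for whatever sector read and seek routines your kernel provides; the function name tar_lookup_stream and its return convention are assumptions made for this example, and the filename prefix field is ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Search the archive in `f` for `filename`. On success, returns the file
   size and leaves the stream positioned at the first data byte.
   Returns -1 if the file is not found or an I/O error occurs. */
static long tar_lookup_stream(FILE *f, const char *filename) {
    unsigned char hdr[512];
    assert(f != NULL && filename != NULL);
    if (strlen(filename) > 100 || fseek(f, 0L, SEEK_SET) != 0)
        return -1;
    /* read one header sector per iteration */
    while (fread(hdr, 1, sizeof hdr, f) == sizeof hdr &&
           memcmp(hdr + 257, "ustar", 5) == 0) {
        long size = 0;
        for (int i = 0; i < 11 && hdr[124 + i] >= '0' && hdr[124 + i] <= '7'; i++)
            size = size * 8 + (hdr[124 + i] - '0');
        if (strncmp((const char *)hdr, filename, 100) == 0)
            return size;
        /* skip the data sectors to reach the next header */
        if (fseek(f, (size + 511) / 512 * 512, SEEK_CUR) != 0)
            return -1;
    }
    return -1;
}
```

Only one 512-byte buffer is needed regardless of archive size, and returning -1 for a missing file keeps it distinguishable from an empty one.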
