USTAR
USTAR is the POSIX-standardized form of tar, Unix's tape archive format. It has become a very common general-purpose archive format, and tools to create tar archives are available for practically every OS. It is also very simple. Because USTAR is specified by POSIX (POSIX.1-1988 and POSIX.1-2001), it is well defined and well documented. Plenty of example code exists, and the GNU tar utility is open source.
As an archive format, USTAR is not strictly a filesystem, but it can very easily be used as one in read-only mode. The standard allows writing, but only by appending complete updated copies of the changed files to the archive. This restriction makes USTAR unsuitable as a general-purpose writable filesystem. Read-only tarfs implementations still have their place, however. Notably, the Plan 9 and Inferno operating systems from Bell Labs include one alongside other archive filesystems, all read-only.
Tar archives may be written directly to floppy disks or partitions. The format is designed for media that may be longer than the archive, so trailing garbage after the archive is harmless. Its block size, 512 bytes, formerly matched the sector size of most storage media, but this matters little unless you read or write one block at a time.
Format Details
Each file and directory has a 512-byte sector containing its metadata (think of it as an i-node that also holds the filename). If the file is not empty, the metadata sector is followed by data sectors holding the file contents, padded up to a multiple of 512 bytes.
Offset | Size | Description |
---|---|---|
0 | 100 | File name |
100 | 8 | File mode |
108 | 8 | Owner's numeric user ID |
116 | 8 | Group's numeric group ID |
124 | 12 | File size in bytes (octal base) |
136 | 12 | Last modification time in numeric Unix time format (octal) |
148 | 8 | Checksum for header record |
156 | 1 | Type flag |
157 | 100 | Name of linked file |
257 | 6 | UStar indicator "ustar" then NUL |
263 | 2 | UStar version "00" |
265 | 32 | Owner user name |
297 | 32 | Owner group name |
329 | 8 | Device major number |
337 | 8 | Device minor number |
345 | 155 | Filename prefix |
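
The table above maps naturally onto a C struct. The following is a minimal sketch, not a definitive definition: the struct and field names are made up for illustration, and `pad` simply fills the 12 bytes left over in the sector. The helper `ustar_full_path` is also hypothetical. It relies on the POSIX rule that a non-empty prefix is joined to the file name with a `/`, and it takes care not to assume the fields are NUL-terminated when they are completely full.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One 512-byte metadata sector; every field is plain bytes, so no padding is inserted. */
struct ustar_header {
    char name[100];     /* offset   0 */
    char mode[8];       /* offset 100 */
    char uid[8];        /* offset 108 */
    char gid[8];        /* offset 116 */
    char size[12];      /* offset 124 */
    char mtime[12];     /* offset 136 */
    char chksum[8];     /* offset 148 */
    char typeflag;      /* offset 156 */
    char linkname[100]; /* offset 157 */
    char magic[6];      /* offset 257 */
    char version[2];    /* offset 263 */
    char uname[32];     /* offset 265 */
    char gname[32];     /* offset 297 */
    char devmajor[8];   /* offset 329 */
    char devminor[8];   /* offset 337 */
    char prefix[155];   /* offset 345 */
    char pad[12];       /* offset 500: unused, fills the sector */
};

/* Compile-time checks that the struct matches the table. */
static_assert(sizeof(struct ustar_header) == 512, "header must be one sector");
static_assert(offsetof(struct ustar_header, size) == 124, "size field offset");
static_assert(offsetof(struct ustar_header, magic) == 257, "magic field offset");
static_assert(offsetof(struct ustar_header, prefix) == 345, "prefix field offset");

/* Hypothetical helper: build "prefix/name" (or just "name") in out.
   A 257-byte buffer is enough for the longest possible path.
   Returns the length of the full path, as snprintf does. */
int ustar_full_path(const struct ustar_header *h, char *out, size_t outsz) {
    if (h->prefix[0] != '\0')
        return snprintf(out, outsz, "%.155s/%.100s", h->prefix, h->name);
    return snprintf(out, outsz, "%.100s", h->name);
}
```

With such a struct, a sector read from disk can be cast to `struct ustar_header *` and its fields accessed by name instead of by raw offset.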
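The only trick is that the file size is stored not in binary but as an ASCII octal string, as are the other numeric fields. The 12-byte size field normally holds 11 octal digits followed by a NUL or space terminator, so 1025 is stored as '00000002001' plus the terminator.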
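For the opposite direction, writing a size into a header, a sketch along these lines may help. The function name `tar_format_size` is hypothetical, and the choice of 11 zero-padded digits plus a NUL terminator is one common layout rather than the only one the format accepts.

```c
#include <stdio.h>

/* Hypothetical helper: store value in a 12-byte size field as
   11 zero-padded octal digits followed by a NUL terminator.
   Returns 0 on success, -1 if the value needs more than 11 octal digits. */
int tar_format_size(char field[12], unsigned long long value) {
    if (value > 077777777777ULL)  /* 8^11 - 1, the largest 11-digit octal number */
        return -1;
    snprintf(field, 12, "%011llo", value);
    return 0;
}
```

Calling `tar_format_size(field, 1025)` fills `field` with the digits `00000002001`.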
The Type flag field tells you what kind of entry the header describes.
Type flag | Meaning |
---|---|
'0' or (ASCII NUL) | Normal file |
'1' | Hard link |
'2' | Symbolic link |
'3' | Character device |
'4' | Block device |
'5' | Directory |
'6' | Named pipe (FIFO) |
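
As a small illustration of the table above, a reader could dispatch on the type flag with a `switch`. The function name `ustar_type_name` and the returned strings are made up for this sketch; anything not listed in the table is reported as unknown.

```c
/* Map a USTAR type flag to a human-readable description. */
const char *ustar_type_name(char typeflag) {
    switch (typeflag) {
    case '0':
    case '\0': return "regular file";   /* NUL is accepted as a normal file too */
    case '1':  return "hard link";
    case '2':  return "symbolic link";
    case '3':  return "character device";
    case '4':  return "block device";
    case '5':  return "directory";
    case '6':  return "named pipe (FIFO)";
    default:   return "unknown";         /* flags not covered by the table */
    }
}
```

A read-only tarfs can use the same dispatch to decide whether an entry has data sectors worth exposing (normal files) or only metadata (directories, links, devices).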
That is pretty much all you need to know for a basic implementation.
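One field the tables list without explanation is the header checksum. In the POSIX format it is the plain sum of all 512 header bytes, with the eight checksum bytes themselves counted as ASCII spaces, and it is stored as an octal string like the other numeric fields. A basic reader can ignore it, but checking it is a cheap way to detect a corrupt header. The sketch below uses hypothetical function names and assumes `hdr` points at a full 512-byte header sector.

```c
#include <stddef.h>

/* Sum all 512 header bytes, counting the checksum field (offsets 148..155) as spaces. */
unsigned int ustar_checksum(const unsigned char *hdr) {
    unsigned int sum = 0;
    for (size_t i = 0; i < 512; i++)
        sum += (i >= 148 && i < 156) ? (unsigned int)' ' : hdr[i];
    return sum;
}

/* Return 1 if the stored checksum matches the computed one, 0 otherwise. */
int ustar_checksum_ok(const unsigned char *hdr) {
    unsigned int stored = 0;
    size_t i = 148;
    /* Tolerate leading spaces, then read octal digits up to the terminator. */
    while (i < 156 && hdr[i] == ' ')
        i++;
    while (i < 156 && hdr[i] >= '0' && hdr[i] <= '7') {
        stored = stored * 8 + (unsigned int)(hdr[i] - '0');
        i++;
    }
    return stored == ustar_checksum(hdr);
}
```

In a lookup loop, a header that carries the "ustar" magic but fails this check could be treated as the end of the usable archive.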
Example Code
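We need a helper function to convert an ASCII octal number into binary:
    /* Convert an ASCII octal field of at most size characters to an integer. */
    int oct2bin(const unsigned char *str, int size) {
        int n = 0;
        /* Stop at the first non-octal character (the NUL or space terminator). */
        while (size-- > 0 && *str >= '0' && *str <= '7') {
            n = n * 8 + (*str - '0');
            str++;
        }
        return n;
    }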
Then file lookup is as simple as:
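    #include <string.h>

    /* Returns the file size and stores a pointer to the file data in *out.
       Returns 0 if the file is not found (or if it is empty). */
    int tar_lookup(unsigned char *archive, const char *filename, char **out) {
        unsigned char *ptr = archive;
        /* Walk the headers until one lacks the "ustar" magic. */
        while (!memcmp(ptr + 257, "ustar", 5)) {
            int filesize = oct2bin(ptr + 0x7c, 11);   /* size field at offset 124 */
            if (!memcmp(ptr, filename, strlen(filename) + 1)) {
                *out = (char *)(ptr + 512);           /* data follows the header */
                return filesize;
            }
            /* Skip the header plus the data, rounded up to whole sectors. */
            ptr += (((filesize + 511) / 512) + 1) * 512;
        }
        return 0;
    }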
If you do not load the entire archive into memory, you will have to read the header sector from disk at the beginning of the loop (before 'int filesize') and seek to the next header sector, skipping the data sectors, at the end of the loop (instead of 'ptr +=').
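The sector-at-a-time variant described above might look like the following sketch. It uses the C standard library's `FILE` API as a stand-in for whatever disk routines your kernel provides, and the function name `tar_lookup_file` is hypothetical. Unlike `tar_lookup`, it returns -1 when the file is not found, so that empty files can be told apart from missing ones. It does not consult the prefix field and does not handle files too large for a `long`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan the archive in f one header sector at a time.
   On success, returns the file size and leaves f positioned at the file data.
   Returns -1 if the file is not found or an I/O error occurs. */
long tar_lookup_file(FILE *f, const char *filename) {
    unsigned char hdr[512];

    if (strlen(filename) > 100)      /* cannot fit in the name field */
        return -1;
    if (fseek(f, 0, SEEK_SET) != 0)
        return -1;

    /* Load one header sector per iteration; stop when the magic is missing. */
    while (fread(hdr, 1, 512, f) == 512 && memcmp(hdr + 257, "ustar", 5) == 0) {
        long filesize = 0;
        /* Parse the octal size field (offset 124, up to 11 digits). */
        for (int i = 124; i < 135 && hdr[i] >= '0' && hdr[i] <= '7'; i++)
            filesize = filesize * 8 + (hdr[i] - '0');

        /* The name field is NUL-terminated only if shorter than 100 bytes. */
        if (strncmp((const char *)hdr, filename, 100) == 0)
            return filesize;

        /* Skip the data sectors to reach the next header. */
        if (fseek(f, (filesize + 511) / 512 * 512, SEEK_CUR) != 0)
            return -1;
    }
    return -1;
}
```

After a successful call, the caller can read `filesize` bytes from `f` to get the file contents. In a kernel, the `fread` and `fseek` calls would be replaced by sector reads and a running sector counter.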
External Links
- Example implementation on github with file creation and removal, MIT licensed
- Tar format details at IBM Knowledge Center
- GNU tar for Linux and UNIX like OSes
- Tar for Windows
- Wikipedia on USTAR
- newlib tar header file