[ntar-workers] Re: NTAR - PCAP next generation dump file format

Wed Jun 29 12:47:46 GMT 2005

Christian Kreibich writes:
> While I think nothing's wrong with a good "toc" structure for the new
> format, I think it's at least as important to provide good clues to free
> fseek()s to find their way back into the entity sequence.

One of the issues with the existing tcpdump trace file format is that this 
sort of random access is not really feasible when the capture file is 
compressed, since with most common compression schemes (compress, gzip, bzip2) 
the data must be decompressed sequentially, starting at the beginning of the 
file.

It's promising to see that the PCAP-NG spec includes compression blocks, but as 
they are marked "experimental" I suspect that the NTAR implementation doesn't 
(yet) support them; furthermore they are still somewhat lacking.  Essentially, 
you have two choices for using compression with the current spec:

1. a compression block that spans the entire file (except for the section 
header) - this doesn't provide much benefit over simply compressing a regular 
PCAP-NG capture file using an external compression program, except that the 
file is still identified as PCAP-NG, and some applications may find it easier 
to handle compressed and uncompressed files uniformly.

2. multiple compression blocks (with or without multiple section headers) - 
this allows chunking of the compression, and allows limited random access, 
comparable to splitting a classic capture file into pieces and compressing 
each piece independently.
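
To make the second choice concrete, here is a minimal Python sketch of chunked 
compression with a side index. It is not the PCAP-NG compression block layout; 
the function names, the length-prefixed record framing, and the in-memory index 
are all assumptions made for illustration. The point is only that reading one 
record requires decompressing one chunk rather than the whole file.

```python
import zlib

def compress_chunks(records, chunk_size):
    """Compress records in independent chunks.

    Returns (blob, index), where each index entry is
    (offset_in_blob, compressed_length, first_record_number).
    Hypothetical framing, not the PCAP-NG block format.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        # Length-prefix each record so the chunk can be split apart again.
        raw = b"".join(len(r).to_bytes(4, "big") + r for r in chunk)
        comp = zlib.compress(raw)
        index.append((len(blob), len(comp), start))
        blob += comp
    return bytes(blob), index

def read_record(blob, index, n, chunk_size):
    """Return record n, decompressing only the chunk that contains it."""
    offset, length, first = index[n // chunk_size]
    raw = zlib.decompress(blob[offset:offset + length])
    pos = 0
    # Walk the length prefixes to skip the records before n in this chunk.
    for _ in range(n - first):
        pos += 4 + int.from_bytes(raw[pos:pos + 4], "big")
    size = int.from_bytes(raw[pos:pos + 4], "big")
    return raw[pos + 4:pos + 4 + size]
```

Smaller chunks give finer-grained access but worse compression, which is the 
same trade-off as splitting and compressing classic capture files.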

A third choice that I'm surprised isn't supported (or, apparently, supportable) 
is one where only the packet data is contained in a compression block, with the 
packet block header remaining uncompressed.  This sort of thing would be 
especially useful for full-packet captures, which can get very large, and 
really need compression.  While a simplistic implementation would probably not 
provide great compression, due to the duplication of compression algorithm 
header data in each packet, a more sophisticated approach might provide a 
common compression dictionary block that could be used to decompress each of 
the individual packets.
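
As a rough illustration of the shared-dictionary idea, the sketch below uses 
the preset dictionary support in zlib (the zdict argument of Python's 
zlib.compressobj and zlib.decompressobj). This is only one possible mechanism, 
not something the PCAP-NG spec defines; the helper names and the sample 
dictionary contents are assumptions. Each packet is compressed independently, 
so any packet can be decompressed on its own given the dictionary.

```python
import zlib

def compress_packet(data, zdict):
    """Compress one packet on its own, primed with a shared dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress_packet(blob, zdict):
    """Decompress one packet; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# Hypothetical dictionary: byte strings expected to recur across packets.
SHARED_DICT = (b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n"
               b"User-Agent: test\r\nAccept: */*\r\n\r\n")

packet = (b"GET /about.html HTTP/1.1\r\nHost: www.example.com\r\n"
          b"User-Agent: test\r\nAccept: */*\r\n\r\n")

with_dict = compress_packet(packet, SHARED_DICT)
without_dict = zlib.compress(packet, 9)
```

In a real format the dictionary would have to be stored once (e.g. in its own 
block) ahead of the packets that depend on it, and choosing good dictionary 
contents is a separate problem.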

This third choice is also limited by the types of data that can be represented 
in the (uncompressed) packet block headers - currently this is only timestamp, 
(capture) length, inbound/outbound and error flags, and packet hash.  For 
random or packet selection access, it would be very useful if it were possible 
to include address features of the captured packet, e.g. IP or MAC src/dst 
addresses, TCP/UDP src/dst ports, etc.  This could be done using new options, 
although the fact that options follow packet data is mildly annoying in this 
case (I understand the reasoning for that, and am not suggesting changing it - 
it's just that for this (ab)use of options, having to seek past the packet data 
is inconvenient).
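
To show what "seeking past the packet data" means in practice, here is a 
simplified Python sketch. The block layout (timestamp, capture length, padded 
packet data, then code/length/value options ending with a zero code) is a 
stand-in for illustration and not the exact PCAP-NG packet block; the option 
codes 0x8001 and 0x8002 for source and destination address are made up.

```python
import struct

def pad4(n):
    """Round n up to a multiple of 4 (32-bit alignment)."""
    return (n + 3) & ~3

def build_block(ts, data, options):
    """Hypothetical block: u64 timestamp, u32 caplen, padded data, options."""
    out = struct.pack("<QI", ts, len(data))
    out += data.ljust(pad4(len(data)), b"\0")
    for code, value in options:
        out += struct.pack("<HH", code, len(value))
        out += value.ljust(pad4(len(value)), b"\0")
    out += struct.pack("<HH", 0, 0)  # end-of-options marker
    return out

def read_options(block):
    """Return (timestamp, {code: value}) without interpreting packet data."""
    ts, caplen = struct.unpack_from("<QI", block, 0)
    # The options start only after the (padded) packet data, so the reader
    # has to skip caplen bytes even if it never looks at the packet itself.
    pos = 12 + pad4(caplen)
    opts = {}
    while True:
        code, length = struct.unpack_from("<HH", block, pos)
        pos += 4
        if code == 0:
            break
        opts[code] = block[pos:pos + length]
        pos += pad4(length)
    return ts, opts

OPT_SRC_ADDR = 0x8001  # made-up option code
OPT_DST_ADDR = 0x8002  # made-up option code
block = build_block(1120049266, b"abcde",
                    [(OPT_SRC_ADDR, bytes([10, 0, 0, 1])),
                     (OPT_DST_ADDR, bytes([10, 0, 0, 2]))])
ts, opts = read_options(block)
```

With in-memory data the skip is trivial, but on disk it is an extra seek (or 
read) per packet, and if the packet data were compressed the reader would 
still need its stored length up front to find the options.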

@alex