[ntar-workers] Re: NTAR - PCAP next generation dump file format

Thu Jun 30 16:53:52 GMT 2005

Alexander Dupuy wrote:
> Christian Kreibich writes:
> 
>> While I think nothing's wrong with a good "toc" structure for the new
>> format, I think it's at least as important to provide good clues to free
>> fseek()s to find their way back into the entity sequence.
> 
> 
> One of the issues with the existing tcpdump trace format files is that 
> this sort of random access is not really feasible when the capture file 
> is compressed, since with most common compression schemes (compress, 
> gzip, bzip2) the data must be decompressed sequentially starting at the 
> beginning of the file in any case.
> 
> It's promising to see that the PCAP-NG spec includes compression blocks, 
> but as they are marked "experimental" I suspect that the NTAR 
> implementation doesn't (yet) support them; furthermore they are still 
> somewhat lacking.  Essentially, you have two choices for using 
> compression with the current spec:
> 
> 1. a compression block that spans the entire file (except for the 
> section header) - this doesn't provide much benefit over simply 
> compressing a regular PCAP-NG capture file using an external compression 
> program; the only gains are that the file is identified as PCAP-NG and 
> that some applications may find it easier to handle compressed and 
> uncompressed files uniformly.
> 
> 2. multiple compression blocks (with or without multiple section 
> headers) - this allows chunking of the compression, and allows limited 
> random access comparable to splitting a classic capture file and 
> compressing the pieces independently.
> 
> A third choice that I'm surprised isn't supported (or, apparently, 
> supportable) is one where only the packet data is contained in a 
> compression block, with the packet block header remaining uncompressed.  
> This sort of thing would be especially useful for full-packet captures, 
> which can get very large, and really need compression.  While a 
> simplistic implementation would probably not provide great compression, 
> due to the duplication of compression algorithm header data in each 
> packet, a more sophisticated approach might provide a common compression 
> dictionary block that could be used to decompress each of the individual 
> packets.
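Choice 2 above amounts to compressing the capture in independent chunks and keeping the chunk offsets somewhere a reader can find them. A minimal Python sketch of that idea, assuming each packet is stored as a 4-byte little-endian length followed by its bytes; the helper names write_chunks()/read_packet() and the in-memory index (standing in for whatever index block a real format would define) are hypothetical:

```python
import struct
import zlib

PER_CHUNK = 64  # packets per independently compressed chunk (arbitrary choice)


def write_chunks(packets, per_chunk=PER_CHUNK):
    """Compress packets in independent chunks.

    Returns (blob, index) where index[i] = (offset, length) of chunk i in
    blob. Each chunk is a self-contained zlib stream, so it can be decoded
    without reading any earlier chunk.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(packets), per_chunk):
        # Length-prefix each packet so the chunk can be split up again later.
        raw = b"".join(struct.pack("<I", len(p)) + p
                       for p in packets[start:start + per_chunk])
        comp = zlib.compress(raw)
        index.append((len(blob), len(comp)))
        blob += comp
    return bytes(blob), index


def read_packet(blob, index, k, per_chunk=PER_CHUNK):
    """Return packet k, decompressing only the chunk that contains it."""
    offset, length = index[k // per_chunk]
    raw = zlib.decompress(blob[offset:offset + length])
    pos = 0
    # Skip the packets that precede k inside this chunk.
    for _ in range(k % per_chunk):
        (n,) = struct.unpack_from("<I", raw, pos)
        pos += 4 + n
    (n,) = struct.unpack_from("<I", raw, pos)
    return raw[pos + 4:pos + 4 + n]
```

The trade-off is the usual one: smaller chunks make random access cheaper but hurt the compression ratio, since every chunk starts with an empty compression history.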

That's a good idea. When defining the file format, I thought about 
per-packet compression, but I rejected it because of:

- the overhead of setting up compression for every packet
- the limited compression obtainable on a single, often small, packet
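To put rough numbers on these two points, the sketch below compares three ways of compressing the same packets with zlib: one zlib stream per packet, one raw deflate stream per packet (wbits=-15, which drops zlib's 2-byte header and 4-byte Adler-32 trailer), and a single stream over all packets. The function name is hypothetical, and any packets you feed it for a quick test would be synthetic, not real traffic:

```python
import zlib


def per_packet_sizes(packets):
    """Compare total compressed sizes for three strategies.

    Returns (zlib_each, raw_each, one_stream):
      zlib_each  - every packet compressed as its own zlib stream
      raw_each   - every packet as its own raw deflate stream (wbits=-15),
                   i.e. without zlib's 2-byte header and 4-byte Adler-32
      one_stream - all packets concatenated into a single zlib stream
    """
    zlib_each = sum(len(zlib.compress(p)) for p in packets)
    raw_each = 0
    for p in packets:
        c = zlib.compressobj(wbits=-15)
        raw_each += len(c.compress(p) + c.flush())
    one_stream = len(zlib.compress(b"".join(packets)))
    return zlib_each, raw_each, one_stream
```

On repetitive traffic the single stream is usually much smaller than either per-packet total, because each per-packet stream starts with no history to match against; that lost history is what a shared dictionary tries to give back.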

The issues I still see in your approach are:
- it's quite complex to implement. In particular, is there any library 
we can rely on?
- the location of the compression dictionary block. At the beginning of the 
file? In that case you have to jump back in order to update it. At the end 
of the file? Then you run the risk of losing a lot of information if 
something goes wrong while you write it.
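On the library question: zlib supports preset dictionaries (deflateSetDictionary()/inflateSetDictionary() in C, exposed as the zdict argument in Python's zlib module). A minimal sketch of per-packet compression against a shared dictionary, assuming the dictionary is chosen before the capture starts (so it could be written once, ahead of the first packet, without seeking back). The dictionary contents and function names are hypothetical. Because raw deflate carries no dictionary ID, the file itself would have to record which dictionary a packet was compressed with.

```python
import zlib

# Hypothetical preset dictionary: byte strings expected to recur in packet
# payloads. It is fixed before the capture starts, so a writer could emit it
# once, ahead of the first packet, without seeking back. zlib's manual
# suggests putting the most frequently used strings near the end.
PRESET_DICT = (b"GET / HTTP/1.1\r\nHost: "
               b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n")


def compress_packet(data, zdict=PRESET_DICT):
    """Compress one packet as a raw deflate stream primed with zdict."""
    c = zlib.compressobj(wbits=-15, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_packet(blob, zdict=PRESET_DICT):
    """Inverse of compress_packet(); needs the very same dictionary."""
    d = zlib.decompressobj(wbits=-15, zdict=zdict)
    return d.decompress(blob) + d.flush()
```

Using raw deflate also sidesteps the per-packet zlib header and trailer, which addresses part of the set-up overhead mentioned above.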

> This third choice is also limited by the types of data that can be 
> represented in the (uncompressed) packet block headers - currently this 
> is only timestamp, (capture) length, inbound/outbound and error flags, 
> and packet hash.  For random or packet selection access, it would be 
> very useful if it were possible to include address features of the 
> captured packet, e.g. IP or MAC src/dst addresses, TCP/UDP src/dst 
> ports, etc.  This could be done using new options, 

... or by an additional block that will contain this kind of information 
for one (or more?) packet blocks.
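As an illustration of what such a block could carry, here is a minimal Python sketch that pulls IPv4 addresses and TCP/UDP ports out of an Ethernet II frame and packs them into a fixed-size record. The record layout (SUMMARY) and the function names are hypothetical, not part of any spec; IPv6, VLAN tags and IP options beyond the IHL field are not handled:

```python
import struct

# Hypothetical fixed-size record for an "address summary" block:
# packet index, IPv4 src, IPv4 dst, IP protocol, pad byte, src port, dst port.
SUMMARY = struct.Struct("<I4s4sBxHH")


def summarize(frame):
    """Extract (src, dst, proto, sport, dport) from an Ethernet II/IPv4 frame.

    Returns None for non-IPv4 or truncated frames; ports are 0 when the
    protocol is not TCP (6) or UDP (17) or the L4 header was not captured.
    """
    if len(frame) < 34:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:
        return None
    ihl = (frame[14] & 0x0F) * 4          # IPv4 header length in bytes
    proto = frame[23]
    src, dst = frame[26:30], frame[30:34]
    sport = dport = 0
    l4 = 14 + ihl
    if proto in (6, 17) and len(frame) >= l4 + 4:
        sport, dport = struct.unpack_from("!HH", frame, l4)
    return src, dst, proto, sport, dport


def summary_record(index, frame):
    """Pack one summary record, or return None if the frame is not IPv4."""
    fields = summarize(frame)
    return None if fields is None else SUMMARY.pack(index, *fields)
```

Keeping the records fixed-size means a reader can filter on addresses by scanning the summary block alone, without touching (or decompressing) the packet data.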

Loris

> although the fact 
> that options follow packet data is mildly annoying in this case (I 
> understand the reasoning for that, and am not suggesting changing it - 
> it's just that for this (ab)use of options, having to seek past the 
> packet data is inconvenient).
> 
> @alex
> _______________________________________________
> ntar-workers mailing list
> ntar-workers at winpcap.org
> https://www.winpcap.org/mailman/listinfo/ntar-workers
>
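The seek-past-the-data step Alex mentions looks roughly like this in a minimal Python sketch. It assumes the layout of the Enhanced Packet Block from later pcapng drafts (type, total length, interface ID, timestamp high/low, captured length, original length, data padded to 32 bits, options, trailing total length) rather than the exact 2005 Packet Block, little-endian byte order (a real reader takes it from the Section Header Block's byte-order magic), and hypothetical function names:

```python
import struct

# Fixed header: type, total_len, if_id, ts_high, ts_low, caplen, origlen.
HDR = struct.Struct("<IIIIIII")


def options_span(block):
    """Return (start, end) byte offsets of the options area in a packet block."""
    _, total_len, _, _, _, caplen, _ = HDR.unpack_from(block, 0)
    padded = (caplen + 3) & ~3          # packet data is padded to 32 bits
    start = HDR.size + padded           # options begin right after the data
    end = total_len - 4                 # trailing copy of the block length
    return start, end


def iter_options(block):
    """Yield (code, value) pairs until opt_endofopt (code 0) or the end."""
    pos, end = options_span(block)
    while pos + 4 <= end:
        code, length = struct.unpack_from("<HH", block, pos)
        if code == 0:
            break
        yield code, block[pos + 4:pos + 4 + length]
        pos += 4 + ((length + 3) & ~3)  # option values are padded to 32 bits
```

Since the options offset depends on the captured length, a reader that only wants per-packet metadata still has to read each header and skip (or seek over) the packet data to reach it.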