[ntar-workers] Re: NTAR - PCAP next generation dump file format

Fri Jul 1 05:11:43 GMT 2005

----- Original Message ----- 
From: "Alexander Dupuy" <alex.dupuy at counterstorm.com>
To: <ntar-workers at winpcap.org>
Sent: Wednesday, June 29, 2005 5:47 AM
Subject: [ntar-workers] Re: NTAR - PCAP next generation dump file format

> Christian Kreibich writes:
>> While I think nothing's wrong with a good "toc" structure for the new
>> format, I think it's at least as important to provide good clues to free
>> fseek()s to find their way back into the entity sequence.
>
> One of the issues with the existing tcpdump trace format files is that 
> this sort of random access is not really feasible when the capture file is 
> compressed, since with most common compression schemes (compress, gzip, 
> bzip2) the data must be decompressed sequentially starting at the 
> beginning of the file in any case.
>
> It's promising to see that the PCAP-NG spec includes compression blocks, 
> but as they are marked "experimental" I suspect that the NTAR 
> implementation doesn't (yet) support them; furthermore they are still 
> somewhat lacking.  Essentially, you have two choices for using compression 
> with the current spec:

No, at the moment they are not supported. Basically, I chose to implement 
the "framework" of pcap-ng, plus the blocks needed to test the library, 
i.e. the IDB and the packet blocks.
Moreover, the draft does actually specify how the compression blocks work.

>
> 1. a compression block that spans the entire file (except for the section 
> header) - this doesn't provide much benefit over simply compressing a 
> regular PCAP-NG capture file using an external compression program, only 
> that the file is identified as PCAP-NG, and possibly some applications may 
> find it easier to handle compressed and uncompressed files uniformly.

I agree 100%.

>
> 2. multiple compression blocks (with or without multiple section 
> headers) - this allows chunking of the compression, and allows a limited 
> random access comparable to splitting a classic capture file and 
> compressing the pieces independently.
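
To illustrate the chunking in choice 2, here is a minimal Python sketch. It is 
not NTAR code and it does not follow the PCAP-NG compression block layout; the 
names compress_chunks, read_record and per_chunk are made up for this example, 
and zlib stands in for whatever algorithm a compression block would carry. 
Each chunk is compressed independently, so reading one record only requires 
decompressing the chunk that contains it.

```python
import zlib


def compress_chunks(records, per_chunk):
    """Compress records in independent groups of per_chunk (hypothetical helper)."""
    chunks = []
    for i in range(0, len(records), per_chunk):
        group = records[i:i + per_chunk]
        # Length-prefix each record so it can be located after decompression.
        raw = b"".join(len(r).to_bytes(4, "little") + r for r in group)
        chunks.append(zlib.compress(raw))
    return chunks


def read_record(chunks, per_chunk, index):
    """Return record number index, decompressing only the chunk that holds it."""
    raw = zlib.decompress(chunks[index // per_chunk])
    pos = 0
    # Walk over the records that precede the wanted one inside this chunk.
    for _ in range(index % per_chunk):
        n = int.from_bytes(raw[pos:pos + 4], "little")
        pos += 4 + n
    n = int.from_bytes(raw[pos:pos + 4], "little")
    return raw[pos + 4:pos + 4 + n]


# Stand-in data: ten fake "packets", four per chunk.
records = [f"packet-{i}".encode() for i in range(10)]
chunks = compress_chunks(records, 4)
wanted = read_record(chunks, 4, 9)
```

The granularity of the random access is the chunk size: smaller chunks mean 
less wasted decompression per lookup, but usually a worse compression ratio.
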
>
> A third choice that I'm surprised isn't supported (or, apparently, 
> supportable) is one where only the packet data is contained in a 
> compression block, with the packet block header remaining uncompressed. 
> This sort of thing would be especially useful for full-packet captures, 
> which can get very large, and really need compression.  While a simplistic 
> implementation would probably not provide great compression, due to the 
> duplication of compression algorithm header data in each packet, a more 
> sophisticated approach might provide a common compression dictionary block 
> that could be used to decompress each of the individual packets.

This approach is quite interesting, and it could probably be implemented quite 
easily in ntar as well (I'm not completely sure, however...).
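
As a rough illustration of the common-dictionary idea, here is a small Python 
sketch using the preset dictionary support of zlib (the zdict argument). 
Nothing here comes from the draft or from ntar: SHARED_DICT, compress_packet 
and decompress_packet are hypothetical names, and the sample packets are 
invented. Every packet is compressed on its own, so each one can be 
decompressed independently as long as the reader has the dictionary.

```python
import zlib

# Assumption: a dictionary built from byte strings that occur in many packets.
# In a capture file it would live in one shared (hypothetical) dictionary block.
SHARED_DICT = b"HTTP/1.1\r\nHost: example.org\r\nUser-Agent: test\r\n\r\nGET /"

# Invented sample payloads that share content with the dictionary.
packets = [
    b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n",
    b"GET /logo.png HTTP/1.1\r\nHost: example.org\r\n\r\n",
    b"GET /style.css HTTP/1.1\r\nHost: example.org\r\n\r\n",
]


def compress_packet(data, zdict):
    """Compress a single packet against the shared dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_packet(blob, zdict):
    """Decompress a single packet; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


blobs = [compress_packet(p, SHARED_DICT) for p in packets]
# Any packet can be recovered without touching the others.
last = decompress_packet(blobs[2], SHARED_DICT)
```

How much this helps depends entirely on how well the dictionary matches the 
traffic, so the gain for a real capture would have to be measured.
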

>
> This third choice is also limited by the types of data that can be 
> represented in the (uncompressed) packet block headers - currently this is 
> only timestamp, (capture) length, inbound/outbound and error flags, and 
> packet hash.  For random or packet selection access, it would be very 
> useful if it were possible to include address features of the captured 
> packet, e.g. IP or MAC src/dst addresses, TCP/UDP src/dst ports, etc. 
> This could be done using new options, although the fact that options 
> follow packet data is mildly annoying in this case (I understand the 
> reasoning for that, and am not suggesting changing it - it's just that for 
> this (ab)use of options, having to seek past the packet data is 
> inconvenient).

Well, I don't think this is a big deal. I think it's possible to just skip 
over the compressed packet with a seek and retrieve the options. From my 
point of view, the actual problem is that we want to add some "dissected" 
fields of the packet to the packet block itself, through options. Personally, 
I don't like this approach for two reasons:
1. we are mixing raw (i.e. not "dissected") packets with "dissected" data 
(e.g. IP src/dst and so on)
2. even if 1. is acceptable, which fields do we want to support in the 
options? I'm pretty sure that some users will want MACs and IPs, others will 
want to add TCP ports, so we will need to add the L4 protocol type, and then 
others will want yet another field because they think it's 
interesting/useful/whatever. Conclusion: we will end up having tons of 
different options to store all the possible protocol fields one may think 
of...
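
To show what I mean by skipping the packet data with a seek, here is a small 
Python sketch. The block layout is deliberately simplified and is an 
assumption of this example, not the layout in the draft: type, total length, 
captured length, data padded to 32 bits, options, and the total length 
repeated at the end. BLOCK_TYPE, build_block and read_options are 
hypothetical names.

```python
import io
import struct

BLOCK_TYPE = 0x12345678  # placeholder value, not a real block type code


def pad4(n):
    """Round n up to the next multiple of 4 (32-bit alignment)."""
    return (n + 3) & ~3


def build_block(data, options):
    """Build one block in the simplified layout assumed by this sketch."""
    padding = b"\x00" * (pad4(len(data)) - len(data))
    body = struct.pack("<I", len(data)) + data + padding + options
    total = 8 + len(body) + 4  # type + total length, body, trailing total length
    return struct.pack("<II", BLOCK_TYPE, total) + body + struct.pack("<I", total)


def read_options(f):
    """Return the options of the block at the current position, never reading the data."""
    start = f.tell()
    _block_type, total = struct.unpack("<II", f.read(8))
    (caplen,) = struct.unpack("<I", f.read(4))
    f.seek(pad4(caplen), io.SEEK_CUR)  # jump over the (possibly compressed) data
    options = f.read(total - 8 - 4 - pad4(caplen) - 4)
    f.seek(start + total)  # leave the stream at the start of the next block
    return options


stream = io.BytesIO(
    build_block(b"\xaa" * 1001, b"opt-A\x00\x00\x00")
    + build_block(b"\xbb" * 60, b"opt-B\x00\x00\x00")
)
first = read_options(stream)
second = read_options(stream)
```

The cost is one extra seek per block, which is why I don't consider the 
position of the options a real problem.
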

Have a nice day
GV

>
> @alex