[ntar-workers] Re: NTAR - PCAP next generation dump file format
Guy Harris
guy at alum.mit.edu
Wed Jun 29 20:32:54 GMT 2005
On Jun 29, 2005, at 5:47 AM, Alexander Dupuy wrote:
> Christian Kreibich writes:
>
>> While I think nothing's wrong with a good "toc" structure for the new
>> format, I think it's at least as important to provide good clues to
>> free fseek()s to find their way back into the entity sequence.
>>
>
> One of the issues with the existing tcpdump trace format files is
> that this sort of random access is not really feasible when the
> capture file is compressed, since with most common compression
> schemes (compress, gzip, bzip2) the data must be decompressed
> sequentially starting at the beginning of the file in any case.

...at least on the first pass. If decompressor state is saved at
various checkpoints, gzipped data can be read randomly with
reasonable speed, at least according to somebody back at Network
Appliance (where they used that to support running gdb on compressed
crash dumps without decompressing the file first).
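
A minimal sketch of the checkpoint idea, assuming Python's zlib module, whose decompression objects have a copy() method. This is not the Network Appliance implementation, which is not described here; the names build_checkpoints and read_at and the step size are hypothetical. One sequential pass saves the decompressor state at intervals, and later reads resume from the nearest saved state at or before the requested uncompressed offset.

```python
import bisect
import gzip
import zlib


def build_checkpoints(compressed, step=4096):
    """Sequential first pass: snapshot the decompressor every `step` input bytes."""
    d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # 16 selects the gzip wrapper
    # Each entry: (uncompressed offset, compressed offset, saved decompressor state)
    checkpoints = [(0, 0, d.copy())]
    out_off = 0
    for in_off in range(0, len(compressed), step):
        out_off += len(d.decompress(compressed[in_off:in_off + step]))
        if d.eof:
            break
        checkpoints.append((out_off, in_off + step, d.copy()))
    return checkpoints


def read_at(compressed, checkpoints, offset, length, step=4096):
    """Return `length` uncompressed bytes starting at uncompressed `offset`."""
    keys = [c[0] for c in checkpoints]
    i = bisect.bisect_right(keys, offset) - 1   # last checkpoint at or before offset
    out_off, in_off, state = checkpoints[i]
    d = state.copy()                            # leave the saved state reusable
    skip = offset - out_off                     # bytes to discard after the checkpoint
    buf = bytearray()
    while len(buf) < skip + length and in_off < len(compressed) and not d.eof:
        buf += d.decompress(compressed[in_off:in_off + step])
        in_off += step
    return bytes(buf[skip:skip + length])


if __name__ == "__main__":
    payload = b"".join(b"packet %06d\n" % i for i in range(20000))
    blob = gzip.compress(payload)
    cps = build_checkpoints(blob, step=1024)
    assert read_at(blob, cps, 100000, 14, step=1024) == payload[100000:100014]
```

Each saved state carries the decompressor's sliding window, so the checkpoint spacing trades memory against the amount of data that has to be re-decompressed per read.
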
I have the impression that in the bzip2 format, each compressed block
is compressed separately, so, if you have enough memory to hold a
fully decompressed block, you can do random access by seeking to the
beginning of the block containing a decompressed offset,
decompressing the block, and then returning data from the appropriate
offset within the block.

Yes, those require that the file be decompressed sequentially first
(in the bzip2 case, to construct a table mapping between the offsets
of blocks in the compressed data stream and the uncompressed data
stream) - but, at least for Ethereal, that's not a problem, as
Ethereal has to read the entire file sequentially in any case (to
handle the case where the dissection of a particular packet depends
on the dissection of previous packets, which happens a *lot* in
Ethereal).
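
A rough sketch of the offset-table approach, with one substitution that should be stated plainly: blocks inside a real bzip2 stream are bit-aligned, so seeking to them needs bit-level scanning that is not shown here. Instead, this sketch assumes the data was written as a series of independent bzip2 streams, one per chunk, which Python's bz2 module can handle directly. The names compress_in_blocks, build_index and read_at are hypothetical. The sequential pass builds the table mapping uncompressed offsets to compressed offsets; a later read decompresses only the blocks it touches.

```python
import bisect
import bz2


def compress_in_blocks(data, block_size=4096):
    """Stand-in writer: each chunk becomes its own independent bzip2 stream."""
    return b"".join(bz2.compress(data[i:i + block_size])
                    for i in range(0, len(data), block_size))


def build_index(compressed):
    """Sequential pass: record where each stream starts in both offset spaces."""
    index = []  # (uncompressed start, compressed start, compressed end)
    in_off = 0
    out_off = 0
    while in_off < len(compressed):
        d = bz2.BZ2Decompressor()
        chunk = d.decompress(compressed[in_off:])
        # Whatever follows the end of this stream is left in unused_data.
        consumed = len(compressed) - in_off - len(d.unused_data)
        index.append((out_off, in_off, in_off + consumed))
        out_off += len(chunk)
        in_off += consumed
    return index


def read_at(compressed, index, offset, length):
    """Return `length` uncompressed bytes at `offset`, decompressing only needed blocks."""
    starts = [entry[0] for entry in index]
    i = bisect.bisect_right(starts, offset) - 1   # block containing `offset`
    out = bytearray()
    while i < len(index) and len(out) < length:
        u_start, c_start, c_end = index[i]
        block = bz2.decompress(compressed[c_start:c_end])
        begin = max(0, offset - u_start)          # nonzero only in the first block
        out += block[begin:begin + (length - len(out))]
        i += 1
    return bytes(out)
```

Only one decompressed block needs to be held in memory at a time, which matches the memory requirement described above.
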
There might be other applications that would do random access without
doing an initial sequential pass over the file, and for those,
existing compression schemes don't work. I'm just noting that one
shouldn't use Ethereal as a reason to do this. (Note also that
neither a TOC structure nor any other helper structures are needed
for Ethereal, except, in pcap-NG, for a table indicating the offsets
of each section of the file that has a particular byte order -
Ethereal maintains its own data structure containing, for each
packet, the offset to seek to for random access to that packet's data.)
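
To illustrate the kind of per-packet offset table described above, here is a minimal sketch that assumes an uncompressed classic pcap file with microsecond timestamps, not pcap-NG. It is not Ethereal's actual data structure; index_packets and read_packet are hypothetical names. One sequential pass over the 16-byte record headers records where each packet's data begins, after which any packet can be fetched with a single seek.

```python
import io
import struct


def index_packets(f):
    """Sequential pass over a classic pcap file: return [(data offset, length), ...]."""
    magic = f.read(4)
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"          # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"          # file written on a big-endian host
    else:
        raise ValueError("not a classic pcap file")
    f.seek(24)                # skip the rest of the 24-byte file header
    offsets = []
    while True:
        pos = f.tell()
        header = f.read(16)   # ts_sec, ts_usec, incl_len, orig_len
        if len(header) < 16:
            break
        _sec, _usec, incl_len, _orig_len = struct.unpack(endian + "IIII", header)
        offsets.append((pos + 16, incl_len))
        f.seek(incl_len, io.SEEK_CUR)   # skip the packet data itself
    return offsets


def read_packet(f, offsets, n):
    """Random access: fetch the data of packet number `n` (0-based)."""
    pos, length = offsets[n]
    f.seek(pos)
    return f.read(length)
```

The table costs a few bytes per packet and needs no support from the file format, beyond knowing the byte order in effect at each offset.
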