[ntar-workers] Re: NTAR - PCAP next generation dump file format

Guy Harris guy at alum.mit.edu
Wed Jun 29 20:32:54 GMT 2005


On Jun 29, 2005, at 5:47 AM, Alexander Dupuy wrote:

> Christian Kreibich writes:
>
>> While I think nothing's wrong with a good "toc" structure for the new
>> format, I think it's at least as important to provide good clues to free
>> fseek()s to find their way back into the entity sequence.
>>
>
> One of the issues with the existing tcpdump trace format files is  
> that this sort of random access is not really feasible when the  
> capture file is compressed, since with most common compression  
> schemes (compress, gzip, bzip2) the data must be decompressed  
> sequentially starting at the beginning of the file in any case.

...at least on the first pass.  If decompressor state is saved at  
various checkpoints, gzipped data can be read randomly with  
reasonable speed, at least according to somebody back at Network  
Appliance (where they used that to support running gdb on compressed  
crash dumps without decompressing the file first).
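
A minimal sketch of that checkpoint idea, in Python rather than anything
NetApp actually used (I don't know what their code looked like).  It
assumes the whole compressed file fits in memory, and it relies on zlib's
Decompress.copy() to snapshot decompressor state; the class name
CheckpointedGzip and the chunk size are made up for illustration.  One
sequential pass saves a snapshot before each chunk of compressed input; a
later read resumes from the nearest snapshot at or before the requested
offset.

```python
import bisect
import zlib


class CheckpointedGzip:
    """Random reads from gzip data via saved decompressor snapshots."""

    def __init__(self, compressed, chunk=4096):
        self.compressed = compressed
        self.chunk = chunk
        # Each entry: (uncompressed offset, compressed offset, snapshot).
        self.checkpoints = []
        d = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip wrapper
        out = 0
        # The one unavoidable sequential pass over the compressed data.
        for pos in range(0, len(compressed), chunk):
            self.checkpoints.append((out, pos, d.copy()))
            out += len(d.decompress(compressed[pos:pos + chunk]))
        self.size = out

    def read(self, offset, length):
        # Find the last checkpoint at or before the requested offset.
        starts = [c[0] for c in self.checkpoints]
        i = bisect.bisect_right(starts, offset) - 1
        out, pos, saved = self.checkpoints[i]
        d = saved.copy()  # keep the stored snapshot reusable
        skip = offset - out
        buf = bytearray()
        # Decompress forward only until the requested range is covered.
        while len(buf) < skip + length and pos < len(self.compressed):
            buf += d.decompress(self.compressed[pos:pos + self.chunk])
            pos += self.chunk
        return bytes(buf[skip:skip + length])
```

Each snapshot carries its own copy of the inflate state, so the
checkpoint spacing is a trade-off between memory and how much data has
to be decompressed again on each read.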

I have the impression that in the bzip2 format, each compressed block  
is compressed separately, so, if you have enough memory to hold a  
fully decompressed block, you can do random access by seeking to the  
beginning of the block containing a decompressed offset,  
decompressing the block, and then returning data from the appropriate  
offset within the block.

Yes, those require that the file be decompressed sequentially first  
(in the bzip2 case, to construct a table mapping between the offsets  
of blocks in the compressed data stream and the uncompressed data  
stream) - but, at least for Ethereal, that's not a problem, as  
Ethereal has to read the entire file sequentially in any case (to  
handle the case where the dissection of a particular packet depends  
on the dissection of previous packets, which happens a *lot* in  
Ethereal).
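
A rough sketch of that table and the lookup, with one substitution that
should be stated plainly: real bzip2 blocks are not byte-aligned, so
finding them means scanning for block markers at the bit level.  To keep
the example short, this uses a file made of concatenated, independently
compressed bzip2 streams as a stand-in for blocks.  The function names
build_index and read_at are hypothetical, and the compressed data is
assumed to fit in memory.

```python
import bisect
import bz2


def build_index(compressed):
    """Sequential pass: list of (uncompressed start, comp start, comp end)."""
    index = []
    pos = 0  # offset in the compressed data
    out = 0  # offset in the uncompressed data
    while pos < len(compressed):
        d = bz2.BZ2Decompressor()
        # Slicing copies the tail each time; good enough for a sketch.
        data = d.decompress(compressed[pos:])
        # Anything after the end of this stream is left in unused_data.
        end = len(compressed) - len(d.unused_data)
        index.append((out, pos, end))
        out += len(data)
        pos = end
    return index


def read_at(compressed, index, offset, length):
    """Return up to `length` bytes starting at uncompressed `offset`."""
    starts = [entry[0] for entry in index]
    i = bisect.bisect_right(starts, offset) - 1
    result = bytearray()
    while len(result) < length and i < len(index):
        start, cstart, cend = index[i]
        # Only the stream(s) covering the range are decompressed.
        block = bz2.decompress(compressed[cstart:cend])
        result += block[max(0, offset - start):]
        i += 1
    return bytes(result[:length])
```

Memory use is bounded by the size of one decompressed stream plus the
requested range, which matches the "enough memory to hold a fully
decompressed block" condition above.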

There might be other applications that would do random access without  
doing an initial sequential pass over the file, and for those,  
existing compression schemes don't work.  I'm just noting that one  
shouldn't use Ethereal as a reason to do this.  (Note also that
neither a TOC structure nor any other helper structures are needed
for Ethereal, except, in pcap-NG, for a table indicating the offsets
of each section of the file that has a particular byte order -  
Ethereal maintains its own data structure containing, for each  
packet, the offset to seek to for random access to that packet's data.)
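
To make the per-packet table concrete, here is a sketch of the idea.  It
is not Ethereal's code; it reads the classic libpcap format (24-byte
file header, 16-byte per-record header) rather than pcap-NG, and the
function names index_packets and read_packet are invented for the
example.  One sequential pass records where each packet record starts;
after that, fetching packet n is a single seek.

```python
import io
import struct


def index_packets(f):
    """Sequential pass over a classic pcap file; return (endian, offsets)."""
    f.seek(0)
    magic = f.read(4)
    # The magic number 0xa1b2c3d4 is written in the writer's byte order.
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    f.seek(24)  # skip the rest of the file header
    offsets = []
    while True:
        pos = f.tell()
        header = f.read(16)
        if len(header) < 16:
            break
        # Record header: ts_sec, ts_usec, incl_len, orig_len.
        incl_len = struct.unpack(endian + "IIII", header)[2]
        offsets.append(pos)
        f.seek(incl_len, io.SEEK_CUR)  # skip the packet data
    return endian, offsets


def read_packet(f, endian, offsets, n):
    """Random access: seek straight to packet n and return its data."""
    f.seek(offsets[n])
    incl_len = struct.unpack(endian + "IIII", f.read(16))[2]
    return f.read(incl_len)
```

The table costs one integer per packet, and it is built during the
sequential read that has to happen anyway.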


