[ntar-workers] Seekable file layouts etc

Loris Degioanni loris.degioanni at gmail.com
Thu Jun 30 17:14:23 GMT 2005



Christian Kreibich wrote:
> Hi,
> 
> here's my input on the seekable file format discussion.
> 
> - We have: a structured but extensible file format which defines a file
> as a sequence of top-level blocks which can contain nested blocks.
> Blocks contain fields showing the blocks' size at the beginning and end
> of the block, so backward seeks are basically feasible. The only fixed
> requirement for block content at the moment is the Section Header Block
> (SHB).
> 
> - We want: to be able to nail down the packets in a particular
> timeframe, particularly the start time, much faster than through
> sequential reads, while keeping life as simple as possible for packet
> dumpers.
> 
> Here's what I don't like:
> 
> - I don't like the idea of markers at fixed offsets in the file +
> padding etc. Too complicated, and it simply doesn't fit in well with the
> otherwise clean nestable structure.
> 
> - I don't like the idea of a flat row of markers dispersed through the
> file for speeding up seeking to someplace. I believe that to make this
> truly fast you want to allow more sophisticated structures, e.g., tree-
> type approaches. See below for more on this.
> 
> - I don't think timestamps are the only search criterion that's handy. I
> wouldn't be surprised if some applications would like "bookmarks" for
> all kinds of things (and a timeframe of packets really is just
> equivalent to two such bookmarks). I work a lot with IDSs and I know for
> a fact that we'd often love to label the first packet of individual
> flows as "suspicious", "scanner", "worm", etc. Being able to index those
> as a component of the file format would rock.

This could be achieved easily by placing a "marker" block right before the
suspicious packet.
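
As a rough sketch of the idea (not taken from the spec): the block type
values, the little-endian byte order and the "label as ASCII body" layout
below are all made up for illustration. The only thing borrowed from the
format is the general block shape, with the total length stored at both
the beginning and the end of each block.

```python
import io
import struct

MARKER_TYPE = 0x0BAD0001  # hypothetical, not an assigned block type
PACKET_TYPE = 0x0BAD0002  # hypothetical stand-in for a packet block


def pack_block(block_type: int, body: bytes) -> bytes:
    """Type, total length, body padded to 4 bytes, total length again."""
    padded = body + b"\x00" * (-len(body) % 4)
    total = 12 + len(padded)
    return struct.pack("<II", block_type, total) + padded + struct.pack("<I", total)


def find_markers(stream):
    """Scan forward; return (label, offset of the block after the marker)."""
    found = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        block_type, total = struct.unpack("<II", header)
        body = stream.read(total - 12)
        stream.read(4)  # trailing copy of the total length
        if block_type == MARKER_TYPE:
            label = body.rstrip(b"\x00").decode("ascii")
            found.append((label, stream.tell()))
    return found


# A dumper (or an IDS post-processing the trace) writes the marker
# immediately before the packet it wants to label.
trace = io.BytesIO(
    pack_block(PACKET_TYPE, b"benign packet")
    + pack_block(MARKER_TYPE, b"scanner")
    + pack_block(PACKET_TYPE, b"first packet of suspicious flow")
)
markers = find_markers(trace)
```

A reader that doesn't know the marker type can skip it like any other
unknown block, since the length field is all it needs.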

> Now, I definitely like Ronnie's suggestion of having a "PacketsBlock"
> that contains a sequence of packets. So
> 
> 1) Let's have the dumpers partition the stream of packets into such
> blocks by some amount specified in the SHB. The size isn't as important
> as in the previous approaches because it will only serve as the *basic*
> granularity at which parts can be skipped, but not the *only* one.
> 
> 2) The SHB already has a special meaning. Why don't we complement it by
> a mandatory Section End Block (SEB)? I think that would be an ideal
> location to store:
> 
>       * Section size.  Useful so you can quickly get back to the
>         corresponding SHB. Notice that backward navigation so far is
>         only possible at the block level.
> 
>       * Statistics.  The dumpers chalk up their stats as they pump out
>         the packets, while chunking up the packet sequence into Packet
>         Blocks at some suitable interval (the size of which probably
>         depends on whether this is used in 2005 or 2015 -- the current
>         format has been around for a decade!). 
> 
>       * Navigation data.  Let's have special blocks for that, with
>         suitable encodings of lookup structures. Could be flat lists of
>         offsets relative to the SHB of packet blocks, or for best
>         performance, the app can encode a tree-like search structure in
>         those blocks. The tree's nodes label individual PacketsBlocks
>         (and optionally packet counts into those blocks) for
>         bookmarking.

This is really cool, but it raises a problem: what happens if the capture is
truncated before the SEB is written? One of the goals of pcap-NG is to
support concatenating trace files, so what happens when one of the
concatenated files has no SEB?
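
To make the concern concrete, here is a minimal sketch of the check a
reader would need before trusting the end of a file. The SEB block type,
the byte order and the helper names are assumptions for illustration only.
The idea is to use the trailing length field to step back one block and
see whether a well-formed SEB is really there; if not, the reader has to
fall back to a sequential scan.

```python
import io
import struct

SEB_TYPE = 0x0BAD00EB    # hypothetical; no Section End Block type is assigned
OTHER_TYPE = 0x0BAD0002  # hypothetical stand-in for any other block


def pack_block(block_type, body):
    """Type, total length, body padded to 4 bytes, total length again."""
    padded = body + b"\x00" * (-len(body) % 4)
    total = 12 + len(padded)
    return struct.pack("<II", block_type, total) + padded + struct.pack("<I", total)


def last_block_type(stream):
    """Type of the final block, or None if the tail is not a whole block."""
    size = stream.seek(0, io.SEEK_END)
    if size < 12:
        return None
    stream.seek(size - 4)
    (trailing,) = struct.unpack("<I", stream.read(4))
    if trailing < 12 or trailing > size or trailing % 4:
        return None  # garbage where the trailing length should be
    stream.seek(size - trailing)
    block_type, leading = struct.unpack("<II", stream.read(8))
    # Both copies of the length must agree for the block to be believable.
    return block_type if leading == trailing else None


def has_section_end(stream):
    """True if the file ends with a complete (hypothetical) SEB."""
    return last_block_type(stream) == SEB_TYPE


complete = pack_block(OTHER_TYPE, b"packets") + pack_block(SEB_TYPE, b"stats+nav")
truncated = complete[:-6]  # capture killed while the SEB was being written
```

Note that this only covers the last section. In a concatenated file, a
section in the middle with a missing SEB would not be noticed this way.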

> Sample illustration:
> 
> SHB: Section Header Block
> IDB: Interface Description Block
> SEB: Section End Block
> PSB: PacketsBlock
> PB:  PacketBlock
> 
> <-- Section ---------------------------------------------------->|<--..
> 
> +---+---+--+--------------+------+--------------+----------------+----
> |SHB|IDB|  |PSB +--+  +--+|PSB   |PSB +--+  +--+| SEB +-----+---+|SHB
> |   |   |..|    |PB|..|PB||    ..|    |PB|..|PB||     |Stats|Nav||   ..
> |   |   |  |    +--+  +--+|      |    +--+  +--+|     +-----+---+|
> +---+---+--+--------------+------+--------------+----------------+----
> 
> The only assumption I make is that there aren't too many sections in a
> file, so that you can quickly find all the SEBs (you find them from the
> end of the file, not the beginning). Once there, you check if there are
> any Nav Blocks that suit your purpose, and use their structure to
> quickly walk to or iterate over the points in the PSBs indexed.

The problem with this structure is that it's hierarchical, and ntar at the
moment doesn't support block nesting.
(Why? I proposed block nesting last year as one of the first features
of pcap-NG, but nobody liked it because they thought it was too heavy to
support and pretty much useless. Therefore, I removed it from the specs,
and ntar now doesn't support it.)
On the other hand, the structure you propose should work fine if the PSB
could be a marker and not a container. In any case, I don't fully
understand the precise use of the PSB, and why using it is better than
just having more sections (I don't see a big overhead in that).
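
A small sketch of the marker variant, under assumptions: the file stays a
flat sequence of blocks, each marker records the timestamp of the packet
that follows it, and the navigation data is just a list of (timestamp,
marker offset) pairs sorted by time. The values and the function name are
invented for the example.

```python
from bisect import bisect_right

# (timestamp of the first packet after the marker, byte offset of the marker),
# sorted by timestamp. Offsets are relative to the start of the section.
index = [(100.0, 28), (160.0, 4096), (220.0, 8192), (280.0, 12288)]


def seek_offset(index, start_time):
    """Offset to start reading from in order to find packets >= start_time."""
    times = [t for t, _ in index]
    pos = bisect_right(times, start_time)
    if pos == 0:
        # start_time is earlier than every marker: read from the section start.
        return 0
    # Last marker whose timestamp is not after start_time.
    return index[pos - 1][1]
```

From the returned offset the reader scans forward block by block, so the
marker spacing only determines how much has to be scanned sequentially.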


> At the very least, this proposal clearly gets the Best ASCII Art Award
> so far.

I totally agree with that. My compliments. :-)

Loris


> Cheers,
> Christian.

