[ntar-workers] Seekable file layouts etc

Christian Kreibich christian at whoop.org
Thu Jun 30 00:31:46 GMT 2005


Hi,

here's my input on the seekable file format discussion.

- We have: a structured but extensible file format which defines a file
as a sequence of top-level blocks which can contain nested blocks.
Blocks contain fields showing the blocks' size at the beginning and end
of the block, so backward seeks are basically feasible. The only fixed
requirement for block content at the moment is the Section Header Block
(SHB).

- We want: to be able to nail down the packets in a particular
timeframe, particularly the start time, much faster than through
sequential reads, while keeping life as simple as possible for packet
dumpers.
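
To make the "backward seeks are basically feasible" point concrete, here is a minimal sketch of walking blocks from the end of a buffer using only the trailing size field. The framing is an assumption for illustration (32-bit little-endian type, 32-bit total length, body, repeated 32-bit total length, no padding); the names make_block and iter_blocks_backward are hypothetical and not part of any agreed format.

```python
import io
import struct

HEADER_AND_TRAILER = 12  # type (4) + leading length (4) + trailing length (4)

def make_block(block_type: int, body: bytes) -> bytes:
    """Frame a body using the assumed layout: type, total length, body, total length."""
    total = HEADER_AND_TRAILER + len(body)
    return struct.pack("<II", block_type, total) + body + struct.pack("<I", total)

def iter_blocks_backward(f, end: int, start: int = 0):
    """Yield (offset, block_type, body) for each block, last block first."""
    pos = end
    while pos > start:
        # The last 4 bytes of a block repeat its total length.
        f.seek(pos - 4)
        (total,) = struct.unpack("<I", f.read(4))
        if total < HEADER_AND_TRAILER or total > pos - start:
            raise ValueError(f"implausible block length {total} before offset {pos}")
        pos -= total  # jump to the start of this block
        f.seek(pos)
        block_type, leading_total = struct.unpack("<II", f.read(8))
        if leading_total != total:
            raise ValueError(f"length fields disagree at offset {pos}")
        yield pos, block_type, f.read(total - HEADER_AND_TRAILER)

if __name__ == "__main__":
    data = make_block(1, b"abc") + make_block(2, b"") + make_block(3, b"hello")
    for offset, block_type, body in iter_blocks_backward(io.BytesIO(data), len(data)):
        print(offset, block_type, body)
```

This only steps one block at a time, which is why it does not solve the "find a timeframe quickly" problem by itself.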

Here's what I don't like:

- I don't like the idea of markers at fixed offsets in the file +
padding etc. Too complicated, and it simply doesn't fit in well with the
otherwise clean nestable structure.

- I don't like the idea of a flat row of markers dispersed through the
file for speeding up seeking to someplace. I believe that to make this
truly fast you want to allow more sophisticated structures, e.g., tree-
type approaches. See below for more on this.

- I don't think timestamps are the only search criterion that's handy. I
wouldn't be surprised if some applications would like "bookmarks" for
all kinds of things (and a timeframe of packets really is just
equivalent to two such bookmarks). I work a lot with IDSs and I know for
a fact that we'd often love to label the first packet of individual
flows as "suspicious", "scanner", "worm", etc. Being able to index those
as a component of the file format would rock.

Now, I definitely like Ronnie's suggestion of having a "PacketsBlock"
that contains a sequence of packets. So

1) Let's have the dumpers partition the stream of packets into such
blocks by some amount specified in the SHB. The size isn't as important
as in the previous approaches because it will only serve as the *basic*
granularity at which parts can be skipped, but not the *only* one.

2) The SHB already has a special meaning. Why don't we complement it by
a mandatory Section End Block (SEB)? I think that would be an ideal
location to store:

      * Section size.  Useful so you can quickly get back to the
        corresponding SHB. Notice that backward navigation so far is
        only possible at the block level.

      * Statistics.  The dumpers chalk up their stats as they pump out
        the packets, while chunking up the packet sequence into Packet
        Blocks at some suitable interval (the size of which probably
        depends on whether this is used in 2005 or 2015 -- the current
        format has been around for a decade!). 

      * Navigation data.  Let's have special blocks for that, with
        suitable encodings of lookup structures. Could be flat lists of
        packet block offsets relative to the SHB, or for best
        performance, the app can encode a tree-like search structure in
        those blocks. The tree's nodes label individual PacketsBlocks
        (and optionally packet counts into those blocks) for
        bookmarking.
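
As a rough sketch of the simplest navigation data mentioned above, the following models a flat, time-sorted list of PacketsBlock offsets (relative to the SHB) plus a list of labelled bookmarks. Everything here is an assumption for illustration: the in-memory tuples, the function names, and the idea that each index entry records the timestamp of the first packet in its block. No on-disk encoding is implied.

```python
from bisect import bisect_right

def find_block_offset(index, timestamp):
    """Return the SHB-relative offset of the PacketsBlock that may hold `timestamp`.

    `index` is a list of (first_packet_timestamp, offset_from_shb) tuples
    sorted by timestamp. Returns None if `timestamp` precedes every block.
    """
    first_timestamps = [ts for ts, _ in index]
    i = bisect_right(first_timestamps, timestamp) - 1
    return index[i][1] if i >= 0 else None

def find_bookmarks(bookmarks, label):
    """Return sorted (offset_from_shb, packet_number) pairs carrying `label`.

    `bookmarks` is a list of (label, offset_from_shb, packet_number) tuples,
    where packet_number counts packets into the PacketsBlock at that offset.
    """
    return sorted((off, n) for lab, off, n in bookmarks if lab == label)

if __name__ == "__main__":
    index = [(100.0, 64), (200.0, 4096), (300.0, 9000)]
    bookmarks = [("scanner", 4096, 17), ("worm", 64, 3), ("scanner", 64, 42)]
    print(find_block_offset(index, 250.0))
    print(find_bookmarks(bookmarks, "scanner"))
```

A timeframe query is then two lookups, one for the start time and one for the end time. A tree-shaped structure would replace the flat list with nested nodes but answer the same kind of question.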

Sample illustration:

SHB: Section Header Block
IDB: Interface Description Block
SEB: Section End Block
PSB: PacketsBlock
PB:  PacketBlock

<-- Section ---------------------------------------------------->|<--..

+---+---+--+--------------+------+--------------+----------------+----
|SHB|IDB|  |PSB +--+  +--+|PSB   |PSB +--+  +--+| SEB +-----+---+|SHB
|   |   |..|    |PB|..|PB||    ..|    |PB|..|PB||     |Stats|Nav||   ..
|   |   |  |    +--+  +--+|      |    +--+  +--+|     +-----+---+|
+---+---+--+--------------+------+--------------+----------------+----

The only assumption I make is that there aren't too many sections in a
file, so that you can quickly find all the SEBs (you find them from the
end of the file, not the beginning). Once there, you check if there are
any Nav Blocks that suit your purpose, and use their structure to
quickly walk to or iterate over the points in the PSBs indexed.
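
The walk from the end of the file could look roughly like the sketch below. It assumes the same illustrative block framing as before (type, total length, body, repeated total length), that the SEB is the last block of its section, and that the SEB body starts with a 64-bit section size covering everything from the start of the SHB to the end of the SEB. The type codes and function names are made up for the example.

```python
import io
import struct

SHB_TYPE, PSB_TYPE, SEB_TYPE = 1, 2, 3  # hypothetical type codes
SEB_LENGTH = 12 + 8                     # framing plus a 64-bit section size

def make_block(block_type: int, body: bytes) -> bytes:
    """Frame a body as: type, total length, body, total length (assumed layout)."""
    total = 12 + len(body)
    return struct.pack("<II", block_type, total) + body + struct.pack("<I", total)

def make_section(payload: bytes) -> bytes:
    """Build SHB + one PacketsBlock + SEB, with the SEB recording the section size."""
    head = make_block(SHB_TYPE, b"") + make_block(PSB_TYPE, payload)
    return head + make_block(SEB_TYPE, struct.pack("<Q", len(head) + SEB_LENGTH))

def section_starts(f, file_size: int):
    """Return the offset of every section's SHB, last section first."""
    starts = []
    end = file_size
    while end > 0:
        # The block ending at `end` should be an SEB; find its start.
        f.seek(end - 4)
        (total,) = struct.unpack("<I", f.read(4))
        if total < SEB_LENGTH or total > end:
            raise ValueError(f"implausible block length {total} before offset {end}")
        f.seek(end - total)
        block_type, _ = struct.unpack("<II", f.read(8))
        if block_type != SEB_TYPE:
            raise ValueError(f"expected an SEB ending at offset {end}")
        (section_size,) = struct.unpack("<Q", f.read(8))
        if section_size < total or section_size > end:
            raise ValueError(f"implausible section size {section_size}")
        end -= section_size  # hop over the whole section in one step
        starts.append(end)
    return starts

if __name__ == "__main__":
    data = make_section(b"a" * 10) + make_section(b"b" * 100)
    print(section_starts(io.BytesIO(data), len(data)))
```

The cost is one small read per section, which is why the "not too many sections" assumption matters.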

At the very least, this proposal clearly gets the Best ASCII Art Award
so far.

Cheers,
Christian.
-- 
________________________________________________________________________
                                          http://www.cl.cam.ac.uk/~cpk25
                                                    http://www.whoop.org



