[pcap-ng-format] Multiple SHBs in a file

Mon Aug 24 16:39:03 UTC 2015

On Mon, Aug 24, 2015 at 11:14 AM, Michael Richardson <mcr at sandelman.ca> wrote:
>
> Hadriel Kaplan <the.real.hadriel at gmail.com> wrote:
>     > It makes read-processing a file far more complicated, and I don't see
>     > any real benefit in return - except maybe for a dumb "file merger"
>     > which just concatenates SHB sections from separate files into one file
>     > - but I'm not sure why we should complicate the file format for that
>     > one action.
>
> I thought one of the points of being able to insert SHBs anywhere is that one
> can merge *in real time* data from multiple sources.

I'm not sure in what way you mean "merge data from multiple sources in
real time".

> Can you explain why the read-processing becomes harder?  Probably I'm just
> lacking implementation experience here.

Because each SHB creates a new "scope" for the blocks that follow it: a
fresh Interface ID number space for its IDBs, which all the EPBs and
ISBs refer to explicitly (and SPBs refer to implicitly).
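To make the scoping concrete, here is a minimal Python sketch (not Wireshark code). It walks a pcapng byte stream and resolves each packet's Interface ID against the IDB list of its own section. It assumes well-formed input that starts with an SHB, and it ignores options, ISBs and all other block types. The names resolve_interfaces, Packet and global_if are made up for illustration.

```python
import struct
from dataclasses import dataclass

# pcapng block type codes and the SHB byte-order magic
SHB_TYPE = 0x0A0D0D0A
IDB_TYPE = 0x00000001
SPB_TYPE = 0x00000003
EPB_TYPE = 0x00000006
BYTE_ORDER_MAGIC = 0x1A2B3C4D

# Hypothetical record type for illustration only.
@dataclass
class Packet:
    section: int         # index of the SHB this packet belongs to
    interface_id: int    # Interface ID as written in the file (section-relative)
    global_if: int       # index into a file-wide list of every IDB seen

def resolve_interfaces(data: bytes) -> list[Packet]:
    """Hypothetical helper: map each packet to the IDB it really refers to."""
    packets: list[Packet] = []
    all_idbs: list[tuple[int, int]] = []   # file-wide list of (section, linktype)
    section_idbs: list[int] = []           # global indices of this section's IDBs
    section, endian, offset = -1, "<", 0
    while offset + 12 <= len(data):
        # The SHB type value is a byte palindrome, so it reads the same in
        # either byte order even before we know the new section's endianness.
        (btype,) = struct.unpack_from(endian + "I", data, offset)
        if btype == SHB_TYPE:
            (magic,) = struct.unpack_from("<I", data, offset + 8)
            endian = "<" if magic == BYTE_ORDER_MAGIC else ">"
            section += 1
            section_idbs = []              # new SHB: Interface IDs restart at 0
        (blen,) = struct.unpack_from(endian + "I", data, offset + 4)
        body = offset + 8
        if btype == IDB_TYPE:
            (linktype,) = struct.unpack_from(endian + "H", data, body)
            section_idbs.append(len(all_idbs))
            all_idbs.append((section, linktype))
        elif btype == EPB_TYPE:
            (if_id,) = struct.unpack_from(endian + "I", data, body)
            packets.append(Packet(section, if_id, section_idbs[if_id]))
        elif btype == SPB_TYPE:
            # SPBs have no Interface ID field; they implicitly use interface 0
            # of the section they appear in.
            packets.append(Packet(section, 0, section_idbs[0]))
        offset += blen
    return packets
```

With two SHBs that each contain one IDB, EPBs in both sections can carry Interface ID 0 and still resolve to different IDBs.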

I can only speak for tools from the wireshark set, but this probably
applies to other tools as well:

A tool like Wireshark performs random access of packet information
from the file: whenever you click a row in the GUI packet list, it
reads that packet's information from the file and decodes it again
(and it does so at various other times as well). Multiple SHBs make
that difficult, because the Interface ID in an EPB, which ties it to
some IDB, is only relative to the IDB list of the SHB whose scope the
EPB falls within. On top of that, the byte order might change from one
section to the next, etc. Wireshark doesn't handle any of that
currently; it would have to create and keep that per-section
information, increasing both complexity and memory use.
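Roughly, a random-access reader ends up needing something like the following hypothetical Python index. A sequential first pass records each EPB's file offset together with the section it belongs to. Each section keeps its own byte order and IDB table, so record i can later be decoded in isolation. This is only an illustrative sketch: RecordIndex, build and read_record are made-up names, not Wireshark's API. It ignores options, SPBs and ISBs, and it uses the link type as a stand-in for full IDB information.

```python
import struct
from dataclasses import dataclass, field

SHB_TYPE, IDB_TYPE, EPB_TYPE = 0x0A0D0D0A, 0x00000001, 0x00000006
BYTE_ORDER_MAGIC = 0x1A2B3C4D

@dataclass
class SectionInfo:
    endian: str                                         # "<" or ">" for this SHB
    linktypes: list[int] = field(default_factory=list)  # this section's IDB table

# Hypothetical index for illustration; real readers keep far more state.
@dataclass
class RecordIndex:
    data: bytes
    sections: list[SectionInfo] = field(default_factory=list)
    records: list[tuple[int, int]] = field(default_factory=list)  # (offset, section)

    def build(self) -> None:
        """Sequential pass: remember where each EPB is and which SHB scopes it."""
        offset, endian = 0, "<"
        while offset + 12 <= len(self.data):
            (btype,) = struct.unpack_from(endian + "I", self.data, offset)
            if btype == SHB_TYPE:
                (magic,) = struct.unpack_from("<I", self.data, offset + 8)
                endian = "<" if magic == BYTE_ORDER_MAGIC else ">"
                self.sections.append(SectionInfo(endian))
            (blen,) = struct.unpack_from(endian + "I", self.data, offset + 4)
            sec = self.sections[-1]
            if btype == IDB_TYPE:
                (lt,) = struct.unpack_from(endian + "H", self.data, offset + 8)
                sec.linktypes.append(lt)
            elif btype == EPB_TYPE:
                self.records.append((offset, len(self.sections) - 1))
            offset += blen

    def read_record(self, i: int) -> tuple[int, int, bytes]:
        """Random access: decode record i using its own section's context."""
        offset, s = self.records[i]
        sec = self.sections[s]
        if_id, ts_hi, ts_lo, caplen, _orig = struct.unpack_from(
            sec.endian + "5I", self.data, offset + 8)
        payload = self.data[offset + 28 : offset + 28 + caplen]  # 8 header + 20 fixed
        return sec.linktypes[if_id], (ts_hi << 32) | ts_lo, payload
```

Each index entry has to carry its section, and every section's byte order and IDB table have to stay in memory for as long as the file is open.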

A tool like mergecap does not perform random file access, but it does
try to merge the SHBs from all the input files into one SHB for the
output file. It either merges the IDBs into one IDB or joins them into
a bigger list of IDBs, and then rewrites the EPB Interface IDs to match
the new ordinal position (index) of their merged/joined original IDBs.
So you can see how having multiple SHBs in a file, each resetting the
Interface ID number space, would complicate that. Plus mergecap tries
to figure out whether the input files' IDB sets are "duplicates",
because it's common to merge multiple pcapng files generated by
ring-buffer saving, where all the files were created by the same
machine one after another. That too would be complicated by having
multiple SHBs. (Though I don't think mergecap even handles IDBs
appearing in the middle of a file, so there's work to do there anyway.)
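Here is a simplified, hypothetical sketch of that joining step. It uses in-memory Python objects instead of real blocks. Each input section's IDB list is appended to one combined list, or reused if an identical IDB list was already seen. Each EPB's Interface ID is then shifted by its section's base index. The names are made up for illustration. Duplicate detection here only compares whole IDB lists for equality, and packets are concatenated in input order rather than interleaved by timestamp, so mergecap's actual behavior will differ.

```python
from dataclasses import dataclass

# Hypothetical in-memory model for illustration; not mergecap's data structures.
@dataclass(frozen=True)
class Idb:
    linktype: int
    snaplen: int
    name: str = ""   # stands in for options such as if_name

@dataclass
class Epb:
    interface_id: int   # relative to the owning section's IDB list
    timestamp: int
    data: bytes

@dataclass
class Section:
    idbs: list[Idb]
    packets: list[Epb]

def merge_sections(sections: list[Section]) -> Section:
    """Join every section's IDBs into one list and rewrite EPB Interface IDs.

    If a section's IDB list exactly matches one already seen (as with
    ring-buffer files written by one machine), reuse those IDBs instead of
    appending duplicates.
    """
    merged_idbs: list[Idb] = []
    seen: dict[tuple[Idb, ...], int] = {}   # IDB list -> base index in merged_idbs
    merged_packets: list[Epb] = []
    for sec in sections:
        key = tuple(sec.idbs)
        if key in seen:
            base = seen[key]
        else:
            base = len(merged_idbs)
            merged_idbs.extend(sec.idbs)
            seen[key] = base
        for p in sec.packets:
            # The old ID was relative to this section's SHB; shift it into the merged list.
            merged_packets.append(Epb(base + p.interface_id, p.timestamp, p.data))
    return Section(merged_idbs, merged_packets)
```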

A tool like reordercap, which sorts all of the packets in a pcapng
file by timestamp, would have to re-map the Interface IDs of the EPBs
it moves around if there were multiple SHBs. (And it couldn't really
handle SPBs in such a scenario.)
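The re-mapping reordercap would need can be sketched the same way. This is again hypothetical Python, not reordercap's code: it rebases every record's Interface ID onto one combined IDB list, then sorts. SPBs are modelled here as records without a timestamp and are simply rejected. An SPB has neither a timestamp to sort on nor an Interface ID field that could be rewritten.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model for illustration; not reordercap's actual data structures.
@dataclass
class Record:
    interface_id: int            # relative to the record's own SHB
    timestamp: Optional[int]     # None models an SPB, which has no timestamp
    data: bytes

def reorder(sections: list[tuple[list[str], list[Record]]]) -> tuple[list[str], list[Record]]:
    """Sort the records of all sections by timestamp into a single section.

    Each input section is (idb_names, records). Once records from different
    SHBs are interleaved, each Interface ID must first be re-based onto one
    combined IDB list, or it would point at the wrong interface.
    """
    all_idbs: list[str] = []
    remapped: list[Record] = []
    for idb_names, records in sections:
        base = len(all_idbs)          # where this section's IDBs start
        all_idbs.extend(idb_names)
        for r in records:
            if r.timestamp is None:
                raise ValueError("SPB: no timestamp to sort on, no Interface ID to rewrite")
            remapped.append(Record(base + r.interface_id, r.timestamp, r.data))
    remapped.sort(key=lambda r: r.timestamp)  # list.sort is stable for equal timestamps
    return all_idbs, remapped
```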

For almost any tool that reads an input pcapng file and writes some
subset or all of its contents to an output pcapng file (for example
Wireshark, tshark, editcap, mergecap, reordercap, and probably
something I'm forgetting), you'd have to decide whether you want to
write the same SHBs from the input file into the output file. If you
do, then you'd have to keep track of where they were and which packets
belonged to them, or expose them through your packet-reading API
somehow. If you don't, then you'd need to re-map the Interface IDs of
the EPBs and ISBs, and either not be able to handle SPBs or have to
convert SPBs to EPBs somehow.
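Converting an SPB to an EPB is mechanical except for the timestamp, which the SPB simply doesn't have. Below is a hypothetical Python sketch of that conversion. It derives the captured length from three things: the SPB's original length, the snaplen of interface 0 in its section, and the space in the block. It then writes an EPB with a caller-supplied Interface ID and timestamp. Choosing that timestamp (zero, the previous packet's time, etc.) is left to the caller as an assumption, and the resulting EPB carries no options.

```python
import struct

SPB_TYPE = 0x00000003
EPB_TYPE = 0x00000006

def spb_to_epb(block: bytes, endian: str, snaplen: int, new_if_id: int, timestamp: int) -> bytes:
    """Hypothetical helper: rewrite one Simple Packet Block as an Enhanced Packet Block.

    endian     -- "<" or ">", the byte order of the SPB's section
    snaplen    -- SnapLen of interface 0 in the SPB's section (0 = no limit)
    new_if_id  -- the Interface ID that interface 0 maps to in the output file
    timestamp  -- SPBs carry no timestamp, so the caller must pick one (assumption)
    """
    btype, blen, orig_len = struct.unpack_from(endian + "3I", block, 0)
    if btype != SPB_TYPE:
        raise ValueError("not an SPB")
    # An SPB has no Captured Length field: derive it from the original length,
    # the space available in the block, and the interface's snaplen.
    cap_len = min(orig_len, blen - 16)
    if snaplen:
        cap_len = min(cap_len, snaplen)
    payload = block[12 : 12 + cap_len]
    pad = b"\x00" * (-cap_len % 4)        # EPB packet data is padded to 32 bits
    total = 32 + cap_len + len(pad)       # 8 header + 20 fixed fields + data + 4 trailer
    return (
        struct.pack(endian + "2I", EPB_TYPE, total)
        + struct.pack(endian + "5I", new_if_id, timestamp >> 32,
                      timestamp & 0xFFFFFFFF, cap_len, orig_len)
        + payload + pad
        + struct.pack(endian + "I", total)
    )
```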

-hadriel