[pcap-ng-format] Does anyone actually generate the epb_hash field today?

Thu Aug 27 23:58:03 UTC 2015

On Fri, Aug 28, 2015 at 11:25 AM, Hadriel Kaplan
<the.real.hadriel at gmail.com> wrote:

> On Thu, Aug 27, 2015 at 6:13 PM, Stephen Donnelly
> <stephen.donnelly at avagotech.com> wrote:
> > I don't use the current epb_hash option, but I can see some use cases.
>
> Oh I'm not debating there might be some use for it
> somewhere/somehow/someday - just asking if anyone's ever implemented
> it. (Mostly because I'm trying to get the doc to a point we can submit
> it to the IETF, and thus need to remove or clear up things that aren't
> specified well enough to implement in an interoperable fashion.)

Wireshark's Ethernet dissector has a 'Generate an MD5 hash of each frame'
preference, so presumably this could be stored in pcapng files?

> > The epb_hash could be a hash, signature, or digest over some part of the
> > packet 'payload'. This could be just the IP payload, the whole IP datagram,
> > or the entire Ethernet frame for example. The purpose would be to accelerate
> > the detection of 'duplicate' packets/payloads. These commonly occur in some
> > SPAN (or other Network Packet Broker) configurations, when capturing from
> > multiple VLANs, or when capturing at multiple points in a network
> > simultaneously.
> > Duplicates might be excluded from TCP analysis to avoid invalid
> > retransmission detection, or may be leveraged to measure network/equipment
> > latency.
>
> Sure, but programs which perform duplicate-packet detection and
> removal already do their own calculations and decisions, based on
> various factors/criteria. Why do they need the file to tell them
> something they can figure out on their own, and will ignore anyway
> unless it meets their exact criteria?
>

All of them? I could see capture being performed separately from duplicate
detection. Use cases where duplicate removal is not the goal may include
latency measurement (including multicast), or routing/switching debugging.

> Or are you saying it's a hash of the *original* packet data, which
> might not all be in the Packet Data field due to a shorter
> snaplen/capture-length?  I don't believe that's the case, because the
> current text in the doc says one of the purposes for it was for
> "reliable data transfer between the data acquisition system and the
> capture library". (but I may be reading too much into that)

I would expect it to be on the original data, so it may be useful where the
data has been 'snapped', although this may make it impossible to detect
'hash collisions' after the fact. In either case it can accelerate
duplicate detection by reducing the number of full content comparisons
needed.
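
To illustrate the acceleration argument, here is a minimal sketch, not taken
from any existing tool. It assumes a stored per-packet hash is available and
simulates it by computing CRC32 over the captured bytes; the function name
find_duplicates and the choice of CRC32 are assumptions made for the example.
Packets are bucketed by hash, and a full byte comparison is done only inside
a bucket.

```python
import zlib
from collections import defaultdict


def find_duplicates(packets):
    """Return (pairs, full_comparisons) for a list of packet byte strings.

    pairs holds (earlier_index, later_index) for byte-identical packets.
    A full content comparison is made only when two hashes match.
    """
    buckets = defaultdict(list)  # hash value -> indices of packets with that hash
    pairs = []
    full_comparisons = 0
    for i, data in enumerate(packets):
        # Stand-in for a hash read from the file (assumption: CRC32 over
        # the captured bytes).
        digest = zlib.crc32(data)
        for j in buckets[digest]:
            full_comparisons += 1
            if packets[j] == data:  # confirm, to rule out a hash collision
                pairs.append((j, i))
                break
        buckets[digest].append(i)
    return pairs, full_comparisons


if __name__ == "__main__":
    capture = [b"aaa", b"bbb", b"aaa", b"ccc", b"bbb"]
    print(find_duplicates(capture))
```

If the hash covers the original packet but only a snapped portion is stored,
the confirming comparison in the inner loop cannot be trusted to rule out a
collision, which is the limitation mentioned above.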

I agree that the algorithm and 'scope' e.g. included/excluded fields should
be known for best utility. This could be reported per packet, or a policy
could be recorded in IDBs?
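
For reference, a rough sketch of how the option carries the algorithm per
packet today, as I read the current draft: option code 3, a 16-bit length,
then a value whose first octet names the algorithm and whose remaining octets
are the hash, padded to a 32-bit boundary. The algorithm number used for
CRC32 (2), the little-endian section byte order, and the helper names are
assumptions for the example. Note there is nowhere in this layout to record
the 'scope'.

```python
import struct

EPB_HASH = 3        # option code for epb_hash (as I read the draft)
HASH_ALG_CRC32 = 2  # assumed algorithm octet for CRC32


def encode_epb_hash(algorithm, digest):
    """Build one epb_hash option: code, length, value, padding."""
    value = bytes([algorithm]) + digest
    padding = -len(value) % 4  # options are padded to 32 bits
    # Assumes a little-endian section; the length excludes the padding.
    return struct.pack("<HH", EPB_HASH, len(value)) + value + b"\x00" * padding


def decode_epb_hash(option):
    """Return (algorithm, digest) from an encoded epb_hash option."""
    code, length = struct.unpack_from("<HH", option)
    if code != EPB_HASH or length < 1:
        raise ValueError("not a usable epb_hash option")
    value = option[4:4 + length]
    return value[0], value[1:]


if __name__ == "__main__":
    opt = encode_epb_hash(HASH_ALG_CRC32, b"\x01\x02\x03\x04")
    print(opt.hex(), decode_epb_hash(opt))
```

A reader can tell which algorithm was used from the first octet, but must
still guess which bytes of the packet were hashed.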

Stephen