<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Aug 28, 2015 at 11:25 AM, Hadriel Kaplan <span dir="ltr">&lt;<a href="mailto:the.real.hadriel@gmail.com" target="_blank">the.real.hadriel@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Thu, Aug 27, 2015 at 6:13 PM, Stephen Donnelly<br>

&lt;<a href="mailto:stephen.donnelly@avagotech.com">stephen.donnelly@avagotech.com</a>&gt; wrote:<br>

> I don't use the current epb_hash option, but I can see some use cases.<br>

<br>

</span>Oh I'm not debating there might be some use for it<br>

somewhere/somehow/someday - just asking if anyone's ever implemented<br>

it. (Mostly because I'm trying to get the doc to a point we can submit<br>

it to the IETF, and thus need to remove or clear up things that aren't<br>

specified well enough to implement in an interoperable fashion.)</blockquote><div><br></div><div>Wireshark's Frame dissector has a 'Generate an MD5 hash of each frame' preference, so presumably that hash could be stored in pcapng files?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

> The epb_hash could be a hash, signature, or digest over some part of the<br>

> packet 'payload'. This could be just the IP payload, the whole IP datagram,<br>

> or the entire Ethernet frame for example. The purpose would be to accelerate<br>

> the detection of 'duplicate' packets/payloads. These commonly occur in some<br>

> SPAN (or other Network Packet Broker) configurations, when capturing from<br>

> multiple VLANs, or when capturing at multiple points in a network<br>

> simultaneously.<br>

> Duplicates might be excluded from TCP analysis to avoid invalid<br>

> retransmission detection, or may be leveraged to measure network/equipment<br>

> latency.<br>

<br>

</span>Sure, but programs which perform duplicate-packet detection and<br>

removal already do their own calculations and decisions, based on<br>

various factors/criteria. Why do they need the file to tell them<br>

something they can figure out on their own, and will ignore anyway<br>

unless it meets their exact criteria?<br></blockquote><div><br></div><div>All of them? I could see capture being performed separately from duplicate detection. Use cases where duplicate removal is not the goal include latency measurement (including for multicast) and routing/switching debugging.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Or are you saying it's a hash of the *original* packet data, which<br>

might not all be in the Packet Data field due to a shorter<br>

snaplen/capture-length?  I don't believe that's the case, because the<br>

current text in the doc says one of the purposes for it was for<br>

"reliable data transfer between the data acquisition system and the<br>

capture library". (but I may be reading too much into that)</blockquote><div><br></div><div>I would expect it to be computed over the original data, so it could still be useful where the packet has been truncated by the snaplen, although truncation may make it impossible to detect hash collisions after the fact, since the full contents are no longer available to compare. In either case it can accelerate duplicate detection by reducing the number of full content comparisons needed.</div><div><br></div><div>I agree that the algorithm and its 'scope' (e.g. which fields are included or excluded) should be known for the hash to be most useful. This could be reported per packet, or a policy could be recorded in the IDBs?</div><div><br></div><div>Stephen</div><div><br></div></div>

</div></div>