[pcap-ng-format] TODO in pcap-ng specifications

Guy Harris guy at alum.mit.edu
Tue Jul 24 19:45:37 PDT 2012


On Jul 24, 2012, at 6:49 AM, Jasper Bongertz wrote:

> I've just spent a little time in the specs and searched for all TODOs to
> see what can be done about them. I have created a text document with my
> thoughts, and maybe some of you can take a look at it and we can start a
> discussion about it to get things going.

	...

> Text: "TODO: mention for each option, if it can/shouldn't appear more than one time. The option list is terminated by a Option which uses the special 'End of Option' code (opt_endofopt)."
> 
> Options appearing more than once make the file parsing process more difficult since you can't assume that there will be only one value (or none at all) for each option, but on the other hand it gives greater flexibility, e.g. to store multiple comments in case a file was processed etc. This may be a way to get around adding more option fields each time someone needs to store additional information by just adding them. 
> A problem might be that we will have to force writing options in the same order as they were read to prevent something like sequential comments getting out of order after writing the file again.
> 
> The TODO asks for each option if they can appear multiple times, so here are my suggestions:
> 
> opt_endofopt			only once, if at all

That's somewhat implicit - if opt_endofopt is the end of the options, there are no options after it, which means there are no opt_endofopt options after it.
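
To make the "nothing after opt_endofopt" point concrete, here is a minimal sketch of a reader loop, assuming the usual option layout (16-bit code, 16-bit length, value padded to a 32-bit boundary) and a little-endian section. The helper names parse_options and make_option are made up for illustration; they are not from any existing library.

```python
import struct

OPT_ENDOFOPT = 0  # terminates the option list
OPT_COMMENT = 1


def parse_options(buf, byte_order="<"):
    """Return a list of (code, value) pairs in file order.

    Parsing stops at opt_endofopt, or at the end of the buffer if no
    opt_endofopt is present.
    """
    options = []
    pos = 0
    while pos + 4 <= len(buf):
        code, length = struct.unpack_from(byte_order + "HH", buf, pos)
        pos += 4
        if code == OPT_ENDOFOPT:
            break  # anything after this point is not an option
        options.append((code, buf[pos:pos + length]))
        pos += (length + 3) & ~3  # skip the value plus its padding
    return options


def make_option(code, value, byte_order="<"):
    """Encode one option, padding the value to a 32-bit boundary."""
    pad = (-len(value)) % 4
    return struct.pack(byte_order + "HH", code, len(value)) + value + b"\0" * pad


raw = (make_option(OPT_COMMENT, b"captured on host A")
       + make_option(OPT_COMMENT, b"filtered later")
       + make_option(OPT_ENDOFOPT, b"")
       + make_option(OPT_COMMENT, b"never reached"))
parsed = parse_options(raw)
```

Returning a list of pairs rather than a dictionary keeps repeated options, such as the two comments here, in the order they were written.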

> opt_comment			multiple

Yes.

> shb_hardware			multiple  
> shb_os				multiple
> shb_userappl			multiple

That would require some way of determining, for each instance of one of those options, which instances of the other options go along with it, if any.  If the capture program only supplied shb_hardware and shb_os, and the first program that processed the file after that only supplied shb_userappl, a naive program might think that was the application that captured the trace.  (Either that, or we should mandate that if any of those are present, all should be present, but it might be tricky to get some of them on some platforms; I guess we could say "a zero-length string is OK, and it means 'I have no clue'".)

> if_name				once

> if_description			once

Yes.

> if_IPv4addr			multiple
> if_IPv6addr			multiple

Yes - the spec explicitly says so, as per my mail to Anders.

> if_MACaddr			once
> if_EUIaddr			once

I think so - an interface might be set up to handle multiple *multicast* addresses, but I don't know whether any adapters can be given multiple individual addresses.

> if_speed			once
> if_tsresol			once
> if_tzone			once

Yes.

> if_filter			once

Yes, as that's the *capture* filter, not any filter applied to a capture file after the capture.

> if_os				once
> if_fcslen			once
> if_tsoffset 			once

Yes.

> epb_flags			once
> epb_hash			once
> epb_dropcount			once
> pack_flags			once
> pack_hash			once

Yes.

> ns_dnsname			multiple
> ns_dnsIP4addr			multiple
> ns_dnsIP6addr			multiple

Yes - many resolvers can be configured to have more than one DNS server.

> isb_starttime			once
> isb_endtime			once
> isb_ifrecv			once
> isb_ifdrop			once
> isb_filteraccept		once
> isb_osdrop			once
> isb_usrdeliv			once

Yes.

(Note, BTW, that the table above assumes 8-space tabs; I suspect your table assumed 4-space tabs.  For better or worse, neither 4-space tabs nor 8-space tabs are a universal standard.)
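
As a rough illustration of how a reader or validator might use the once/multiple table above, here is a sketch for the Interface Description Block options only. It works on option names rather than numeric codes, and the function name repeated_once_options is invented for this example.

```python
from collections import Counter

# IDB options that this thread suggests may appear at most once.
# opt_comment, if_IPv4addr and if_IPv6addr are left out because they may repeat.
IDB_ONCE = {
    "if_name", "if_description", "if_MACaddr", "if_EUIaddr",
    "if_speed", "if_tsresol", "if_tzone", "if_filter",
    "if_os", "if_fcslen", "if_tsoffset",
}


def repeated_once_options(names):
    """Return the sorted names of once-only options that occur more than once."""
    counts = Counter(names)
    return sorted(n for n, c in counts.items() if n in IDB_ONCE and c > 1)


ok = repeated_once_options(
    ["if_name", "if_IPv4addr", "if_IPv4addr", "opt_comment", "opt_comment"])
bad = repeated_once_options(["if_name", "if_name", "if_tsresol"])
```

Whether a reader should reject such a file, or just take the first or last instance, is a separate question that the table does not answer.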

> Text: "(TODO - It would be nice, to have a "invalid Interface ID" defined, e.g. 0xFFFFFFFF"
> 
> I guess we can just say "Use 0xFFFFFFFF to signal an invalid Interface ID" here.

If a Packet Block or Enhanced Packet Block has 0xFFFFFFFF as an interface ID, how is a program reading the file to determine what link-layer header type the packet has?  I'd say we should demand that all PBs and EPBs have a valid interface ID.
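
A sketch of the reader's side of that argument, assuming interface IDs index the Interface Description Blocks of the current section in order, starting at 0. The function name and the idb_linktypes list are hypothetical.

```python
LINKTYPE_ETHERNET = 1


def linktype_for_packet(interface_id, idb_linktypes):
    """Look up the link-layer header type for a packet block.

    idb_linktypes holds the LinkType of each IDB seen so far in the
    section, in file order. An ID with no matching IDB cannot be
    resolved, so the packet cannot be dissected.
    """
    if not 0 <= interface_id < len(idb_linktypes):
        raise ValueError(f"interface ID {interface_id:#x} has no matching IDB")
    return idb_linktypes[interface_id]


seen = [LINKTYPE_ETHERNET]
first = linktype_for_packet(0, seen)
try:
    linktype_for_packet(0xFFFFFFFF, seen)
    invalid_rejected = False
except ValueError:
    invalid_rejected = True
```

With a reserved "invalid" ID the lookup has nothing to return, which is the problem described above.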

> Text: "Time zone for GMT support (TODO: specify better). 	TODO: give a good example"
> 
> I'm not sure what the time zone is for, but I guess it can be used to track in which time zone a trace was taken. That would allow adjusting the UTC saved in the time stamps to the local time of when the capture was made. I think the best way to specify this would be to just note the offset from UTC in minutes, like "UTC+60" (for a trace taken in Germany) or just "60". Storing it in minutes instead of hours takes care of offsets that are fractional (e.g. Iran has a time zone of UTC+4.5).
> 
> For daylight saving time I'd just write the UTC offset including the additional minutes.

What if a change to or from DST/summer time takes place during the capture?  Handling that would require something such as an Olson/IANA time zone database:

	http://www.iana.org/time-zones/

zone name (or a POSIX TZ specification), which would also specify the offset(s) from UTC.
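
To show why a single stored offset is not enough, here is a small sketch with a hand-written transition table standing in for one zone's entry in a time zone database. The table holds the 2012 Central European transitions; the names TRANSITIONS, offset_minutes and local_time are invented for this example, and a real program would use the database itself rather than a table like this.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone


def utc(year, month, day, hour):
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


# (UTC instant at which the offset takes effect, offset from UTC in minutes)
TRANSITIONS = [
    (utc(2012, 1, 1, 0), 60),    # CET in effect at the start of the year
    (utc(2012, 3, 25, 1), 120),  # summer time begins
    (utc(2012, 10, 28, 1), 60),  # summer time ends
]


def offset_minutes(instant):
    """Return the UTC offset in effect at the given UTC instant."""
    idx = bisect_right([start for start, _ in TRANSITIONS], instant) - 1
    if idx < 0:
        raise ValueError("instant is before the first known transition")
    return TRANSITIONS[idx][1]


def local_time(instant):
    """Convert a UTC instant to local time using the offset valid at that instant."""
    return instant.astimezone(timezone(timedelta(minutes=offset_minutes(instant))))


# Two packets captured two minutes apart, across the end of summer time.
before = utc(2012, 10, 28, 0) + timedelta(minutes=59)
after = utc(2012, 10, 28, 1) + timedelta(minutes=1)
```

Here local_time(before) is 02:59 and local_time(after) is 02:01, so one if_tzone offset would be wrong for one of the two packets.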

> By the way, I just scanned the whole document to find the specification that all timestamps have to be written in UTC and didn't find it. Did I miss it, or is it really not specified at the moment? If so, this is probably a critical topic to add, isn't it?

Well, the EPB specification says

> Timestamp (High) and Timestamp (Low): high and low 32-bits of a 64-bit quantity representing the timestamp. The timestamp is a single 64-bit unsigned integer representing the number of units since 1/1/1970. The way to interpret this field is specified by the 'if_tsresol' option (see Figure 9) of the Interface Description block referenced by this packet. Please note that differently from the libpcap file format, timestamps are not saved as two 32-bit values accounting for the seconds and microseconds since 1/1/1970. They are saved as a single 64-bit quantity saved as two 32-bit words.

but what they presumably meant is "since 1/1/1970 00:00:00 UTC".
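
For reference, a sketch of how a reader might turn the two 32-bit words into a UTC time, assuming the "since 1/1/1970 00:00:00 UTC" reading and the if_tsresol rules from the spec (high bit clear: negative power of 10, high bit set: negative power of 2, default 10^-6). The function names are made up for the example, and to_utc truncates to microseconds because that is all datetime can hold.

```python
from datetime import datetime, timedelta, timezone
from fractions import Fraction

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def units_per_second(if_tsresol=6):
    """Number of timestamp units in one second for a given if_tsresol byte."""
    exponent = if_tsresol & 0x7F
    base = 2 if if_tsresol & 0x80 else 10
    return base ** exponent


def epb_seconds(ts_high, ts_low, if_tsresol=6):
    """Combine the two 32-bit words and return exact seconds since the epoch."""
    ticks = (ts_high << 32) | ts_low
    return Fraction(ticks, units_per_second(if_tsresol))


def to_utc(seconds):
    """Convert exact seconds since the epoch to a datetime, truncated to microseconds."""
    whole = seconds.numerator // seconds.denominator
    micros = int((seconds - whole) * 1_000_000)
    return EPOCH + timedelta(seconds=whole, microseconds=micros)


# Round trip: build a microsecond timestamp for a known instant, then decode it.
when = datetime(2012, 7, 25, 2, 45, 37, 250000, tzinfo=timezone.utc)
ticks = (when - EPOCH) // timedelta(microseconds=1)
high, low = ticks >> 32, ticks & 0xFFFFFFFF
decoded = to_utc(epb_seconds(high, low))
```

This does nothing about leap seconds; it counts the same way datetime does.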

Of course, that then raises the question of whether that count is a strict count of seconds and fractions of a second, counting up by 1 every second, or if the clock stops during a positive leap second and jumps by two over a negative leap second.  My strong inclination is not to address that question right now, given that people wanting to calculate elapsed time between packets would want a "strict count of seconds", while people implementing packet capture on OSes that are forced, either by POSIX or by compatibility, to do weird things during leap seconds would want the latter.  I might be sorely tempted to, in a future version of the spec:

	add a new packet block type, wherein the time stamp is a strict count of seconds and fractions of a second since the capture started;

	add an option to the SHB for "time the capture started", and perhaps specify that as year/month/day/hour/minute/second/fraction UTC (and duck as people ask whether, if the capture starts during a positive leap second, "second" can be 60 :-)), thus perhaps allowing us to punt on leap second hell (other than the aforementioned issue) *AND* not have to worry about what happens if the clock gets changed during the capture (for POSIX weenies, think of the packet block containing a CLOCK_MONOTONIC rather than CLOCK_REALTIME value), or, alternatively, use if_tsoffset for that.

(But don't get me started about POSIX and leap seconds and what "UTC" means....)
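
A rough sketch of the "start time plus monotonic offsets" idea, using Python's time.monotonic_ns() as a stand-in for CLOCK_MONOTONIC. The CaptureClock class is hypothetical; it only shows which value would go in the proposed SHB option and which in each packet block.

```python
import time
from datetime import datetime, timezone


class CaptureClock:
    """Record the wall-clock start once, then stamp packets with elapsed time."""

    def __init__(self):
        # Would be written once, e.g. in a "time the capture started" option.
        self.start_utc = datetime.now(timezone.utc)
        self._start_mono = time.monotonic_ns()

    def stamp(self):
        """Nanoseconds since the capture started, unaffected by clock changes."""
        return time.monotonic_ns() - self._start_mono


clock = CaptureClock()
first = clock.stamp()
second = clock.stamp()
```

Elapsed time between packets is then just the difference of two stamps, whatever happens to the wall clock in between.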

> Text: "The filter (e.g. "capture only TCP traffic") used to capture traffic. The first byte of the Option Data keeps a code of the filter used (e.g. if this is a libpcap string, or BPF bytecode, and more). More details about this format will be presented in Appendix XXX (TODO). (TODO: better use different options for different fields? e.g. if_filter_pcap, if_filter_bpf, ...)"
> 
> Maybe this is something for someone who is more specialized in the capture filter business. I'm not sure if we need different fields for this.

I don't think so.  For one thing, what if you have more than one such option?  Is a program that cares about it required to assume that, say, an if_filter_bpf value is the result of compiling, on the machine on which the compilation was done, the if_filter_pcap value, or does it need to decide which of those filters was the one actually used?

We *should*, however, nail down the code for the first byte.
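
A sketch of what reading a single if_filter option could look like once the codes are pinned down. The code values 0 and 1 below are placeholders chosen for this example only, not values from the spec, and the function names are invented.

```python
# Placeholder code values for illustration; the spec has not fixed them.
FILTER_PCAP_STRING = 0
FILTER_BPF_BYTECODE = 1


def split_if_filter(option_data):
    """Split if_filter option data into (code, payload)."""
    if not option_data:
        raise ValueError("if_filter option data is empty")
    return option_data[0], option_data[1:]


def describe_if_filter(option_data):
    """Return a printable description of the capture filter."""
    code, payload = split_if_filter(option_data)
    if code == FILTER_PCAP_STRING:
        return payload.decode("utf-8")
    return f"<filter with code {code}, {len(payload)} bytes>"


text = describe_if_filter(b"\x00tcp port 80")
other = describe_if_filter(b"\x01\x06\x00\x00\x00")
```

With one option and a leading code byte, a reader never has to guess which of several filter options was the one actually used.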

> 3.3 Enhanced Packet Block
> 
> Text: "(TODO: the text above uses "first bit", but shouldn't this be "first byte"?!?)"
> 
> I think it is already fixed, so this TODO is deprecated. Or am I missing something here?

It's not fixed in

	http://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html

Are you referring to a newer version in SVN?

