[Winpcap-users] pcap_open_offline and unicode charsets
guy at alum.mit.edu
Sun Nov 8 11:07:06 PST 2009
On Nov 7, 2009, at 5:23 PM, Mark Bednarczyk wrote:
> What support is there, for unicode character based file names in
> WinPcap to functions such as pcap_open_offline?
> I have users that are trying to open a file with some Chinese
> characters in its filename. As far as I understand it, fopen under
> unix (especially under linux) should handle unicode 8-bit with no
I presume by "unicode 8-bit" you mean UTF-8-encoded Unicode.
fopen() on UN*Xes passes the pathname on to open(); that means that it
should handle any sequence of octets as long as the octet value 0x2f
is used *only* as a pathname component separator and the octet value
0x00 is used *only* as a pathname string terminator.
Most local file systems will not attempt to interpret that string,
except to treat 0x2f (/) as a component separator and 0x00 ('\0') as
a pathname string terminator. Whether a particular file name is
encoded as UTF-8, or ISO 8859/1, ... is another matter; I have the
impression that various UN*Xes are tending towards UTF-8 as the most
common encoding, but there are probably still systems using other encodings.
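To illustrate, here is a minimal C sketch that splits a pathname the way UN*X pathname lookup does, treating only 0x2f as a separator and 0x00 as the terminator. The function name count_components is hypothetical, not part of libpcap or any UN*X API. Every byte of a UTF-8 multibyte sequence is in the range 0x80-0xFF, so a UTF-8 name such as 中文 (E4 B8 AD E6 96 87) can never be mistaken for a separator or terminator, and neither can an ISO 8859-1 byte such as 0xE9.

```c
#include <stddef.h>

/* Hypothetical helper: count the non-empty components of a pathname,
 * treating only the octet 0x2f ('/') as a separator and 0x00 as the
 * terminator.  Every other octet, including the 0x80-0xFF bytes of
 * UTF-8 multibyte sequences, is just part of a component name. */
static size_t count_components(const char *path)
{
    size_t count = 0;
    int in_component = 0;

    for (const unsigned char *p = (const unsigned char *)path; *p != 0x00; p++) {
        if (*p == 0x2f) {
            in_component = 0;           /* separator ends the current component */
        } else if (!in_component) {
            in_component = 1;           /* first octet of a new component */
            count++;
        }
    }
    return count;
}
```

For example, "/tmp/\xe4\xb8\xad\xe6\x96\x87/capture.pcap" has three components (tmp, 中文, capture.pcap), and the ISO 8859-1 name "/tmp/caf\xe9.pcap" has two; the code never needs to know which encoding was used.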
> Linux also handles wider widths but only in a non-intentional way
> where wider width chars are handled as 8-bit entities (i.e. 0x1065 is
> handled as 2 separate 8-bit chars: 0x65 and 0x10 where order is
> dependent on processor endianness.)
I would hope it does no such thing, especially with, for example, the
wide character 0x2f65 (⽥): if you hand any UN*X API that takes
pathnames an octet sequence containing the octet 0x2f followed by the
octet 0x65, I would hope that it would be interpreted as containing
"/e", and, similarly, if you hand it a string containing the octet 0x65
followed by the octet 0x2f, I would hope that it would be interpreted
as containing "e/".
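A small C sketch makes the collision concrete. Splitting the 16-bit value 0x2f65 into octets (low byte first, as on a little-endian CPU) produces 0x65 0x2f, i.e. "e/", while the UTF-8 encoding of U+2F65 is E2 BD A5 and contains no 0x2f at all. The helper names are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8; returns the number of bytes
 * written to out (1-4), or 0 for a surrogate or out-of-range value. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* The naive handling described in the quoted text: split a 16-bit value
 * into two octets, low byte first (little-endian order). */
static void naive_le_octets(uint16_t wc, unsigned char out[2])
{
    out[0] = (unsigned char)(wc & 0xFF);
    out[1] = (unsigned char)(wc >> 8);
}

/* Return 1 if the byte sequence contains the octet 0x2f ('/'). */
static int contains_slash(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x2f)
            return 1;
    return 0;
}
```

This is why UTF-8, not raw 16-bit or 32-bit units, is the form a UN*X pathname API can safely take: no byte of a multibyte UTF-8 sequence is below 0x80.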
> Under MSVC it is different: you have to use the MS-specific _wfopen and
> _wopen calls, which take Unicode (wide-character) strings.
> Does WinPcap provide any support for unicode and call the
> appropriate "open" function?
No, it just uses fopen(), just as libpcap does on UN*X.
In theory, it could convert from UTF-8 to UTF-16 and call _wfopen(),
but that could conceivably break existing applications that either
explicitly or implicitly expect the path argument to
pcap_open_offline() to work the same as the path argument to fopen().
My inclination would be to, in WinPcap, provide pcap_wopen_offline(),
or something such as that, taking a UTF-16 pathname as an argument.
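WinPcap has no such function; the sketch below is only the conversion step that a UTF-16 entry point, or a wrapper accepting UTF-8, would need before calling _wfopen(). On Windows the Win32 MultiByteToWideChar() function with CP_UTF8 does the same job; this portable version, with the hypothetical name utf8_to_utf16, just shows what the conversion involves. It rejects overlong encodings, which matters for pathnames: the overlong sequence 0xC0 0xAF would otherwise decode to '/'.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to
 * NUL-terminated UTF-16.  Returns the number of 16-bit code units
 * written (not counting the terminator), or (size_t)-1 if the input
 * is malformed or out_len is too small. */
static size_t utf8_to_utf16(const char *in, uint16_t *out, size_t out_len)
{
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s != 0) {
        uint32_t cp;
        int extra;                      /* number of continuation bytes */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;         /* stray continuation or invalid lead byte */
        s++;

        for (int i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80)    /* also stops at a premature NUL */
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3F);
        }

        /* Reject overlong forms (e.g. 0xC0 0xAF for '/'), surrogate
         * values, and anything beyond U+10FFFF. */
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp >= 0x10000) {            /* needs a surrogate pair */
            if (n + 2 >= out_len)
                return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= out_len)
                return (size_t)-1;
            out[n++] = (uint16_t)cp;
        }
    }
    if (n >= out_len)
        return (size_t)-1;
    out[n] = 0;
    return n;
}
```

On Windows, where wchar_t is 16 bits, the resulting buffer (copied into a wchar_t array) could then be handed to _wfopen() in place of the narrow path given to fopen().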
> My library gets its filename from a java string and it currently
> converts it to plain UTF-8 charset and that works fine.
On UN*X, it should perhaps be converted to whatever the locale's
filename character set is.
I'm not sure how that would be determined, however. I might be
tempted to assume that, if the environment variable LC_CTYPE is set,
it specifies the encoding; otherwise, if LANG is set, it specifies
the encoding; otherwise, it might be the C locale (which, I think,
unfortunately says the encoding is ASCII). However, GLib (not glibc,
GLib) has its own additional environment variables:
and I'm not sure why that's the case.
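Here is a rough C sketch of both approaches, under stated assumptions. The first two functions implement the environment-variable heuristic above, with LC_ALL added because POSIX lets it override LC_CTYPE and LANG; they parse locale names of the usual language_territory.codeset@modifier form. The third uses the standard POSIX route, setlocale() followed by nl_langinfo(CODESET), which asks the C library instead of guessing. All function names are hypothetical, and the sketch ignores GLib's own variables.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pull the codeset out of a locale name of the
 * form language[_territory][.codeset][@modifier], e.g. "zh_CN.UTF-8".
 * Returns buf, or NULL if the name carries no codeset ("C", "en_US"). */
static const char *codeset_from_locale_name(const char *name, char *buf, size_t buflen)
{
    const char *dot = strchr(name, '.');
    if (dot == NULL)
        return NULL;
    dot++;
    size_t len = strcspn(dot, "@");     /* codeset ends at an optional @modifier */
    if (len == 0 || len >= buflen)
        return NULL;
    memcpy(buf, dot, len);
    buf[len] = '\0';
    return buf;
}

/* The environment-variable heuristic: the first of LC_ALL, LC_CTYPE and
 * LANG that is set and non-empty names the locale.  Returns NULL if none
 * is set (the C locale) or the name has no codeset. */
static const char *codeset_from_environment(char *buf, size_t buflen)
{
    static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *v = getenv(vars[i]);
        if (v != NULL && *v != '\0')
            return codeset_from_locale_name(v, buf, buflen);
    }
    return NULL;
}

/* The standard POSIX route: adopt the user's LC_CTYPE locale and ask
 * the C library for its codeset.  setlocale() changes process-wide
 * state, so a library would normally leave this to the application. */
static const char *codeset_from_libc(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return NULL;                    /* named locale not installed */
    return nl_langinfo(CODESET);
}
```

The heuristic cannot tell the codeset of a name like "en_US" that omits it, which is one reason to prefer nl_langinfo(); on glibc the C locale reports "ANSI_X3.4-1968", i.e. ASCII.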
> But in reality I'd like to support all unicode widths 8, 16 and even
> 32 bit. I'm not sure how those wider unicode chars would be handled.
How are they handled elsewhere in Java? The File class seems to work
with Strings, and the String class, at least as I understand the
documentation, uses UTF-16 (presumably that's what you mean by
"unicode [width] ... 16 ... bit").
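In UTF-16, characters up to U+FFFF take one 16-bit unit and characters above that take a surrogate pair, so a Java String can hold any Unicode character and there is no separate "32-bit" case to support. One caveat worth checking if the conversion goes through JNI: GetStringUTFChars() returns Java's modified UTF-8, which encodes each half of a surrogate pair as its own 3-byte sequence and U+0000 as 0xC0 0x80, so for characters above U+FFFF it is not the UTF-8 a UN*X file system expects. Below is a minimal C sketch of converting the UTF-16 that GetStringChars() returns into standard UTF-8; utf16_to_utf8 is a hypothetical name, and the input is typed as uint16_t rather than jchar so the example does not need jni.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF-16 code units to standard,
 * NUL-terminated UTF-8, combining surrogate pairs into one 4-byte
 * sequence.  Returns the number of bytes written (excluding the NUL),
 * or (size_t)-1 on an unpaired surrogate, an embedded U+0000 (which a
 * pathname cannot contain), or a too-small buffer. */
static size_t utf16_to_utf8(const uint16_t *in, size_t in_len, char *out, size_t out_len)
{
    size_t n = 0;

    for (size_t i = 0; i < in_len; i++) {
        uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= in_len || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if ((cp >= 0xDC00 && cp <= 0xDFFF) || cp == 0) {
            return (size_t)-1;          /* unpaired low surrogate or NUL */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len >= out_len)         /* keep room for the terminating NUL */
            return (size_t)-1;

        unsigned char *o = (unsigned char *)out + n;
        switch (len) {
        case 1:
            o[0] = (unsigned char)cp;
            break;
        case 2:
            o[0] = (unsigned char)(0xC0 | (cp >> 6));
            o[1] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            o[0] = (unsigned char)(0xE0 | (cp >> 12));
            o[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            o[2] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            o[0] = (unsigned char)(0xF0 | (cp >> 18));
            o[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            o[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            o[3] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
        n += len;
    }
    if (n >= out_len)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For example, the surrogate pair D840 DC00 (U+20000, a CJK Extension B character) becomes the single 4-byte sequence F0 A0 80 80, where modified UTF-8 would produce six bytes.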