nmh-workers

Re: [Nmh-workers] mhshow/test-charset failures in nmh-1.7 (`Can't convert ?us-ascii to UTF-8')

2017-11-23 10:23:08
Hi Leonardo,

After finding that having the `libiconv' package installed made a
difference, I first checked whether the various nmh binaries were linked
against the GNU iconv(3) or the NetBSD iconv(3), and in both cases they
are correctly linked to the NetBSD iconv(3).

So NetBSD has two iconv implementations available, and both supply a
library and iconv(1)?  Can both packages be installed at the same time?
It sounds like it.  And nmh correctly picks the "native" NetBSD library.
Which package provides the iconv in $PATH when both are installed?  And
is /usr/pkg/bin/iconv the other one?

    #### For unknown reasons, the parameter values checks fail on the
    #### FreeBSD10 buildbot.  It doesn't support EBCDIC-US, which is used
    #### by the checks, so check for that.  Though that doesn't seem to be
    #### the reason.
    printf '\xe4' | iconv -f EBCDIC-US -t UTF-8 >/dev/null 2>&1  ||
        skip_param_value_checks=1

So with your original report, this test passed, skip_param_value_checks
remained 0, and thus the failing test was later run.

And, with NetBSD iconv(1) I have:

 % printf '\xe4' | /usr/bin/iconv -f EBCDIC-US -t UTF-8
 U

Good.  So that's what the above iconv test used because...

...while with iconv(1) provided by the `libiconv' package:

 % printf '\xe4' | /usr/pkg/bin/iconv -f EBCDIC-US -t UTF-8
 /usr/pkg/bin/iconv: conversion from EBCDIC-US unsupported
 /usr/pkg/bin/iconv: try '/usr/pkg/bin/iconv -l' to get the list of supported encodings
 % echo $?
 1

So, if GNU iconv(1) is available, `$skip_param_value_checks' is
set to 1.

Yes, on your platform, if it's the iconv chosen by the user's PATH.
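The skip decision in test-charset depends on whichever iconv(1) comes
first in PATH, while mhshow uses the iconv(3) it was linked against, so
the two can disagree.  A minimal C sketch for asking the library side
directly, assuming a POSIX iconv(3) declaration (some older headers
declare the input pointer as `const char **', which may need a cast).
`convert_byte' is a hypothetical helper, not part of nmh.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert the single byte `in' from charset `from'
 * to charset `to' with whichever iconv(3) this program was linked
 * against.  Returns the number of bytes written to `out', or -1 if the
 * charset pair is unsupported or the conversion fails.  It mirrors
 * `printf '\xe4' | iconv -f EBCDIC-US -t UTF-8' but exercises the
 * library rather than the iconv(1) found in $PATH. */
static int convert_byte(const char *from, const char *to, unsigned char in,
                        char *out, size_t outlen)
{
    iconv_t cd;
    char inbuf[1];
    char *inp = inbuf;
    char *outp = out;
    size_t inleft = 1, outleft = outlen;

    assert(from != NULL && to != NULL && out != NULL);
    inbuf[0] = (char)in;

    cd = iconv_open(to, from);           /* note: tocode comes first */
    if (cd == (iconv_t)-1)
        return -1;                       /* charset pair unsupported */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1) {
        iconv_close(cd);
        return -1;                       /* byte could not be converted */
    }
    iconv_close(cd);
    return (int)(outlen - outleft);
}
```

Built and linked the same way as nmh (an assumption about the build),
convert_byte("EBCDIC-US", "UTF-8", 0xE4, buf, sizeof buf) should return
1 with buf[0] == 'U' when the library supports EBCDIC-US, matching the
`U' printed by /usr/bin/iconv above, and -1 otherwise.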

I'm now curious what happens on platforms other than FreeBSD and
NetBSD with the `libiconv' package installed; just checking the exit
status of:

 $ printf '\xe4' | iconv -f EBCDIC-US -t UTF-8

will probably be enough.

Don't quite understand the question.  Here on Arch Linux, ICONV_ENABLED
is 1, so the `printf | iconv' check does get run, and it succeeds, so the
last two tests don't get skipped.  That's with iconv(1) from glibc 2.26.

If the exit status is 0, and so `$skip_param_value_checks' stays 0 in
the test-charset context, what happens if you try (this is stolen
entirely from the 'replacement character in parameter value' test in
test-charset):

 $ printf "Subject: invalid parameter value charset\nMIME-Version: 1.0\nContent-Type: text/plain; charset*=invalid''%%0Dus-ascii\n" | \
 mhshow -file - | cat

The test passes here, so I get the expected output.  (At the command
line I get slightly different output, but that's my ~/.mh_profile,
etc., kicking in.)

    start_test 'replacement character in parameter value'
    #### The output of this test doesn't show it, but it covers the
    #### noiconv: portion of get_param_value().
    cat > $msgfile <<'EOF'
    Subject: invalid parameter value charset
    MIME-Version: 1.0
    Content-Type: text/plain; charset*=invalid''%0Dus-ascii
    EOF

    cat > $expected <<EOF
    [ Message inbox:12 ]
    Subject: invalid parameter value charset

    MIME-Version: 1.0

    [ part  - text/plain -   0B  ]
    EOF

Here, I have:

| Subject: invalid parameter value charset
| 
| mhshow: Can't convert ?us-ascii to UTF-8
| mhshow: unable to convert character set from ?us-ascii, continuing...
| [ part  - text/plain -   0B  ]

It seems reasonable that `?us-ascii', with a U+003F question mark at the
start, is an invalid source charset.  Yet mhshow calls iconv_open(3)
with it here, and iconv_open() is happy with it.  If I change
content_charset()'s

    ret_charset = get_param(ct->c_ctinfo.ci_first_pm, "charset", '?', 0);

to use a `x' instead then I get a similar mhshow error to you, but with
`xus-ascii'.  So what's special about the question mark to glibc's
iconv_open() that gives `?' a free ride?
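The leading `?' itself comes from the parameter decoding: the value
`invalid''%0Dus-ascii' uses the RFC 2231 `charset'language'value' form,
the %0D decodes to a carriage return, and that byte ends up as the
replacement character passed to get_param(), hence `xus-ascii' when `x'
is passed instead.  A hypothetical C sketch of that step, not nmh's
get_param_value(), assuming for illustration that every decoded byte
outside printable ASCII is replaced:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c' is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical sketch: decode the value part of an RFC 2231 extended
 * parameter such as  invalid''%0Dus-ascii  into `out', replacing any
 * decoded byte that is not printable ASCII with `repl' (an assumption,
 * standing in for the case where the declared charset is unusable).
 * Returns the decoded length, or -1 on malformed input or overflow. */
static int decode_ext_value(const char *ext, char repl, char *out,
                            size_t outlen)
{
    const char *p;
    size_t n = 0;

    assert(ext != NULL && out != NULL && outlen > 0);
    /* Skip "charset'language'"; both quotes are required. */
    p = strchr(ext, '\'');
    if (p == NULL || (p = strchr(p + 1, '\'')) == NULL)
        return -1;
    p++;

    while (*p != '\0') {
        int c = (unsigned char)*p;
        if (c == '%') {
            int hi = hexval(p[1]);
            int lo = (hi < 0) ? -1 : hexval(p[2]);
            if (hi < 0 || lo < 0)
                return -1;               /* malformed %XX escape */
            c = hi * 16 + lo;
            p += 3;
        } else {
            p++;
        }
        if (n + 1 >= outlen)
            return -1;                   /* no room for byte + NUL */
        out[n++] = (c >= 0x20 && c < 0x7f) ? (char)c : repl;
    }
    out[n] = '\0';
    return (int)n;
}
```

With this sketch, decode_ext_value("invalid''%0Dus-ascii", '?', buf,
sizeof buf) yields `?us-ascii', and passing 'x' yields `xus-ascii',
the two names discussed above.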

I also find this works, oddly.

    $ printf '\344' |
    > iconv -f EBCDIC-US -t '???us-as???cii???'; printf \\n
    U
    $
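To take iconv(1) out of the picture, the same names can be handed
straight to iconv_open(3), the kind of call behind mhshow's `Can't
convert' message.  A minimal C sketch; `charset_opens' and
`probe_names' are hypothetical helpers, and the results depend on the
C library in use:

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>

/* Return 1 if this platform's iconv_open(3) accepts `from' as a source
 * charset for conversion to UTF-8, else 0. */
static int charset_opens(const char *from)
{
    iconv_t cd;

    assert(from != NULL);
    cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Print a verdict for each source charset name seen in this thread. */
static void probe_names(void)
{
    static const char *const names[] = {
        "us-ascii", "?us-ascii", "xus-ascii", "???us-as???cii???",
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        printf("%-20s %s\n", names[i],
               charset_opens(names[i]) ? "accepted" : "rejected");
}
```

Calling probe_names() from main() prints one verdict per name.  Going by
the behaviour reported above, glibc 2.26 would be expected to accept
`?us-ascii' and `???us-as???cii???' but reject `xus-ascii'.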

I've run out of answers at this point and will do a bit more digging,
unless someone else here already knows.  Was the `?' replacement
character chosen deliberately in content_charset() to exploit this?

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy

-- 
Nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers
