perl-unicode

RE: UTF-16 -> UTF-8

2001-11-21 15:30:20
You can tell Access about the encoding when you import a file.

In Access 2000, open the File menu, and on the Get External Data submenu,
select Import. The file browser dialog box will open.

If you have your data in a text file, select the text file type, then select
the file to import in the dialog box. The Import Text Wizard will appear.

Click the Advanced button. The Import Specification dialog box will appear.

On the Code Page menu, select the character encoding used in the file. This
wizard supports 16-bit Unicode (Big- and Little-endian), UTF-8, and the
obsolete UTF-7, in addition to numerous legacy character sets such as
Japanese Shift-JIS. The wizard will ask you other questions about field
widths and the like, and then import the data into a table, one line to a
record.
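
If you would rather normalize the file to UTF-8 before importing it, the
conversion in the subject line can be sketched as follows. This is only a
minimal illustration in Python, not something Access or the wizard provides:
the helper names guess_encoding and to_utf8 are made up for this example, the
check relies on a byte order mark being present, and UTF-32 is not handled.

```python
import codecs

# Byte order marks this sketch knows about, mapped to a Python codec that
# consumes the mark while decoding. UTF-32 is deliberately left out.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on the leading byte order mark, if any."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Assumption: data without a byte order mark is already UTF-8.
    return default


def to_utf8(data: bytes) -> bytes:
    """Decode using the guessed encoding and re-encode as UTF-8 without a BOM."""
    return data.decode(guess_encoding(data)).encode("utf-8")


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "\u0128\u0129\u0168\u0169\r\n".encode("utf-16-le")
    print(guess_encoding(sample))
    print(to_utf8(sample))
```

A file without a byte order mark is treated as UTF-8 here, which is an
assumption; if the data is actually in a legacy code page, pass the right
codec name yourself instead of relying on the guess.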

When the file has been imported, you may need to select an appropriate font
for viewing your new table.

-----Original Message-----
From: Rui Ribeiro [mailto:ruirib(_at_)computer(_dot_)org]
Sent: Wednesday, November 21, 2001 10:47 AM
To: Philip Newton
Cc: perl-unicode(_at_)perl(_dot_)org
Subject: RE: UTF-16 -> UTF-8


Philip,

I can read the file properly in Word now. I just had to force
it to ask me to confirm the conversion performed when opening
the file. So if I "force" Word to treat the file as Unicode
when opening it, it reads the file properly.
Now I just have to make Access recognize the encoding. I'm
feeling closer to the end now. Hope I can sort it out.

Thank you very much. Your help was invaluable in getting
this far.
Best regards.

Rui

On Wed, 21 Nov 2001 16:05:06 -0000, in perl.unicode you wrote:

now I can write to the DB, but the values are not properly recognized.
If you try to open the file I attached to my prior mail in Word, you'll
see exactly what I see in the DB record.

In Word, I see ĨĩŨũ, but when I open it in UniPad as UTF-8, I get
<LATIN CAPITAL LETTER I WITH TILDE><LATIN SMALL LETTER I WITH
TILDE><LATIN CAPITAL LETTER U WITH TILDE><LATIN SMALL LETTER U WITH
TILDE><CARRIAGE RETURN><LINE FEED>, i.e. "I~i~U~u~\r\n" but with the
accents on the characters. So the data is UTF-8 encoded, not UTF-16.
(But the attachment was called fich1.txt, which you said was UTF-8
encoded.)
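
One way to check a diagnosis like this is to look at the raw bytes. The
following is a rough Python sketch, not the tool used in this thread: the
names text, utf8_bytes, utf16_bytes and describe are made up for the example,
and the sample string is rebuilt from the character names above, not read
from fich1.txt. The Windows-1252 line at the end is an assumption about which
legacy code page a program might fall back to.

```python
import unicodedata

# The four letters named above, followed by CR LF.
text = "\u0128\u0129\u0168\u0169\r\n"

utf8_bytes = text.encode("utf-8")       # two bytes per accented letter
utf16_bytes = text.encode("utf-16-le")  # two bytes per code unit, no BOM


def describe(data: bytes, encoding: str) -> list:
    """Decode data and list the Unicode name of each resulting character."""
    # Control characters such as CR and LF have no name, so fall back to U+XXXX.
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in data.decode(encoding)]


print(utf8_bytes.hex())   # c4a8c4a9c5a8c5a90d0a
print(utf16_bytes.hex())  # 28012901680169010d000a00
print(describe(utf8_bytes, "utf-8")[0])  # LATIN CAPITAL LETTER I WITH TILDE

# Assumption: a program that ignores the encoding may read the UTF-8 bytes
# as Windows-1252, which turns each letter into two unrelated characters.
print(utf8_bytes.decode("cp1252").strip())
```

The two byte sequences are easy to tell apart: in the UTF-8 form each
accented letter starts with C4 or C5, while in the UTF-16 form every second
byte is 01 or 00.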

Maybe the database re-coded it, or whatever you're using to write to
the database or to read back from it is recoding the UTF-16 to UTF-8?

Cheers,
Philip


