
Re: [xsl] 16-bit chars rendered as "?" in UTF-8?

2012-08-15 09:33:35
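You can catch bytes that are not ASCII, or that are control characters not
allowed in XML, using this regexp (each test file contains the single byte
named by its filename):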

$ od -tx1 00 | grep -E " (0[^9ad]|[189a-f]|7f)"
0000000 00
$ od -tx1 0D | grep -E " (0[^9ad]|[189a-f]|7f)"
$ od -tx1 7E | grep -E " (0[^9ad]|[189a-f]|7f)"
$ od -tx1 7F | grep -E " (0[^9ad]|[189a-f]|7f)"
0000000 7f
$ od -tx1 80 | grep -E " (0[^9ad]|[189a-f]|7f)"
0000000 80
$
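
For files longer than one byte it may be easier to test each byte on its own line rather than whole od output lines. The following is only a sketch: the function name suspect_bytes is made up for illustration, and it assumes the intent is to flag 80-ff, DEL (7f), and control characters other than TAB (09), LF (0a) and CR (0d), while leaving 20-7e alone.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative, not from the original post).
# Assumption: "suspect" means non-ASCII (80-ff), DEL (7f), or a control
# character other than TAB (09), LF (0a) and CR (0d).
suspect_bytes() {
    # -An drops the offset column; -v stops od from collapsing repeated
    # lines into "*"; tr puts one hex byte per line so the anchored
    # pattern tests exactly one byte at a time.
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | grep -E '^(0[^9ad]|1.|7f|[89a-f].)$'
}

# Usage: a file holding "caf", the ISO-8859-1 byte for e-acute, " ok", LF.
demo=$(mktemp)
printf 'caf\351 ok\n' > "$demo"
suspect_bytes "$demo"    # prints: e9
rm -f "$demo"
```

The function prints nothing and returns a non-zero status when the file is clean, so it can be used directly in an if test.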


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Level 3 support for XML Compiler team and Fixpack team lead
WebSphere DataPower SOA Appliances
https://www.ibm.com/developerworks/mydeveloperworks/blogs/HermannSW/
https://twitter.com/#!/HermannSW/


                                                                                
                                          
From:    John English <john(_dot_)foreign(_at_)gmail(_dot_)com>
To:      David Carlisle <davidc(_at_)nag(_dot_)co(_dot_)uk>,
Cc:      xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Date:    08/15/2012 04:07 PM
Subject: Re: [xsl] 16-bit chars rendered as "?" in UTF-8?

On 15/08/2012 16:58, David Carlisle wrote:
> On 15/08/2012 14:38, John English wrote:
>> I then search the file with "od -b | grep ' [4-7][0-7][0-7]'",
>
> A file containing é in ISO-8859-1 would have octal byte 351, which would
> not match that grep. You need to check that there are no characters
> bigger than 127 (octal 177 if you prefer).

Oops, quite right! However, still nothing shows up...
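
The octal point can be checked with a one-byte file. This is a rough sketch, not something from the thread: the helper name high_bytes_octal is hypothetical, and it relies on the fact that bytes above 127 print as 200 through 377 in od -b output, so the leading digit to look for is 2 or 3.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): print, in octal, every byte
# of a file that is above 177 octal (127 decimal).
high_bytes_octal() {
    # od -b prints each byte as three octal digits; one byte per line
    # after tr, so the anchored pattern matches exactly 200-377.
    od -An -v -b "$1" | tr -s ' ' '\n' | grep -E '^[23][0-7][0-7]$'
}

sample=$(mktemp)
printf '\351' > "$sample"    # the ISO-8859-1 byte for e-acute

# The pattern from the quoted message misses it, because 351 starts with 3.
od -b "$sample" | grep ' [4-7][0-7][0-7]' || echo "original pattern: no match"

high_bytes_octal "$sample"   # prints: 351
rm -f "$sample"
```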

--
John English

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




