Wed 31 Oct 2007
We recently upgraded from MySQL 4 to MySQL 5 and we noticed strange characters in some of the varchar and text fields.
step 1: dump out the contents of the bad field in hex
mysql> select hex(myfieldname) from mytablename where myid=’something’;
step 2: run a php program to print out the chars one at a time
function hex2asc($temp) { $len = strlen($temp); for ($i=0;$i<$len;$i+=2) { $data[$i]=chr(hexdec(substr($temp,$i,2)))." ".substr($temp,$i,2); } return $data; } $str="HEX GOES HERE"; $data=hex2asc($str); print_r($data);
step 3: once you’ve identified the bad chars and their hex values, change them in the database
here are some bad chars I found:
– fix apostrophe
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’92′),”‘”);
– fix left single quote
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’93′),”‘”);
– fix right single quote
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’94′),”‘”);
– fix bullets
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’95′),”& #8226;”);
– fix double dash
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’96′),”–”);
– fix triple dash
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’97′),”—”);
— fix supscripted TM
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’99′),”<sup>TM</sup>”);
– replace jacked up apostrophe/single quote
– delete EFs
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’EF’),”");
– replace BFs with single quote
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’BF’),”‘”);
– delete BDs
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’BD’),”");
see also: http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html
January 23rd, 2008 at 4:27 pm
What happens when the hex is multibyte?
I had one character HEX to E28099 in mysql, and it was clearly intended to be a single quote (In “They’re just beautiful!”, character 5), but the UTF-8 for that hex seems to be a Trademark character.