We recently upgraded from MySQL 4 to MySQL 5 and we noticed strange characters in some of the varchar and text fields.
step 1: dump out the contents of the bad field in hex
mysql> select hex(myfieldname) from mytablename where myid=’something’;
step 2: run a php program to print out the chars one at a time
function hex2asc($temp) {
$len = strlen($temp);
for ($i=0;$i<$len;$i+=2) {
$data[$i]=chr(hexdec(substr($temp,$i,2)))." ".substr($temp,$i,2);
}
return $data;
}
$str="HEX GOES HERE";
$data=hex2asc($str);
print_r($data);
?>
step 3: once you’ve identified the bad chars and their hex values, change them in the database
here are some bad chars I found:
– fix apostrophe
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’92′),”‘”);
– fix left single quote
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’93′),”‘”);
– fix right single quote
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’94′),”‘”);
– fix bullets
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’95′),”& #8226;”);
– fix double dash
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’96′),”–”);
– fix triple dash
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’97′),”—”);
— fix supscripted TM
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’99′),”<sup>TM</sup>”);
– replace jacked up apostrophe/single quote
– delete EFs
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’EF’),”");
– replace BFs with single quote
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’BF’),”‘”);
– delete BDs
update mytablename set myfieldname=REPLACE(myfieldname,UNHEX(’BD’),”");
see also: http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html