Jul 6, 2010

Posted in PHP | 5 comments

Convert Latin and other characters to HTML entities in PHP

PHP has a simple built-in function, htmlentities, which converts all applicable characters to HTML entities. Note the word "applicable": it does not cover every character.

When a user copies and pastes data from Microsoft Word or Excel, the text usually contains Latin and typographic characters (curly quotes, dashes, the euro sign) that a plain call to htmlentities does not convert to their respective HTML entities.
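To see the gap concretely, here is a minimal sketch. It assumes the pasted text arrives as Windows-1252 bytes and that htmlentities is told the input is ISO-8859-1 (the default in older PHP versions); the sample string is made up for illustration.

```php
<?php
// Hypothetical sample: a quoted word as Word would paste it (Windows-1252 bytes).
// chr(147) and chr(148) are the curly double quotes, chr(233) is e-acute.
$input = chr(147) . 'caf' . chr(233) . chr(148);

// Treat the input as ISO-8859-1, which has no printable characters at 128-159.
$encoded = htmlentities($input, ENT_QUOTES, 'ISO-8859-1');

// The accented letter is converted, but the quote bytes are left as they were.
var_dump($encoded === chr(147) . 'caf&eacute;' . chr(148));
```

The bytes that survive untouched are exactly the ones the custom function has to handle itself.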

To overcome this problem I have written a custom function which converts characters such as “ and € to their respective HTML entities.

In short, this function converts all applicable characters to their respective HTML entities, along with the typographic symbols and other Latin characters.

If required, you can add any missing Latin characters to the array. Please comment if you find anything missing.
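As a sketch of what such an addition might look like: the two entries below are not part of the original list. They assume Windows-1252 input, where bytes 142 and 158 are Z with caron in upper and lower case, and they use numeric references because HTML 4 has no named entities for those letters. The array name $extra and the sample string are hypothetical.

```php
<?php
// Hypothetical extra entries; byte values assume Windows-1252 input.
$extra = array(
    chr(142) => '&#381;',   // Latin Capital Letter Z With Caron
    chr(158) => '&#382;',   // Latin Small Letter Z With Caron
);

// Made-up sample containing both bytes.
$sample = chr(142) . 'i' . chr(158) . 'ek';

// strtr() with an array swaps every key found in the string for its value.
$converted = strtr($sample, $extra);
echo $converted, PHP_EOL;
```

Inside the function, the same entries would be written as further `$trans[chr(...)]` lines.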

Here is the function:

/*
   Converts all applicable characters to HTML entities, along with the Latin
   and other special characters that htmlentities leaves alone.
*/

function htmlentitiesandlatin($str)
{
    // Characters that mostly show up when a user copies and pastes data
    // from MS Word and similar software, mapped to their HTML entities.
    $trans = array();

    $trans[chr(130)] = '&sbquo;';    // Single Low-9 Quotation Mark
    $trans[chr(131)] = '&fnof;';     // Latin Small Letter F With Hook
    $trans[chr(132)] = '&bdquo;';    // Double Low-9 Quotation Mark
    $trans[chr(133)] = '&hellip;';   // Horizontal Ellipsis
    $trans[chr(134)] = '&dagger;';   // Dagger
    $trans[chr(135)] = '&Dagger;';   // Double Dagger
    $trans[chr(136)] = '&circ;';     // Modifier Letter Circumflex Accent
    $trans[chr(137)] = '&permil;';   // Per Mille Sign
    $trans[chr(138)] = '&Scaron;';   // Latin Capital Letter S With Caron
    $trans[chr(139)] = '&lsaquo;';   // Single Left-Pointing Angle Quotation Mark
    $trans[chr(140)] = '&OElig;';    // Latin Capital Ligature OE
    $trans[chr(145)] = '&lsquo;';    // Left Single Quotation Mark
    $trans[chr(146)] = '&rsquo;';    // Right Single Quotation Mark
    $trans[chr(147)] = '&ldquo;';    // Left Double Quotation Mark
    $trans[chr(148)] = '&rdquo;';    // Right Double Quotation Mark
    $trans[chr(149)] = '&bull;';     // Bullet
    $trans[chr(150)] = '&ndash;';    // En Dash
    $trans[chr(151)] = '&mdash;';    // Em Dash
    $trans[chr(152)] = '&tilde;';    // Small Tilde
    $trans[chr(153)] = '&trade;';    // Trade Mark Sign
    $trans[chr(154)] = '&scaron;';   // Latin Small Letter S With Caron
    $trans[chr(155)] = '&rsaquo;';   // Single Right-Pointing Angle Quotation Mark
    $trans[chr(156)] = '&oelig;';    // Latin Small Ligature OE
    $trans[chr(159)] = '&Yuml;';     // Latin Capital Letter Y With Diaeresis
    $trans[chr(128)] = '&euro;';     // Euro Sign (byte 128; the key 'euro' would match the word "euro")
   
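    // First convert everything htmlentities knows about. The charset is passed
    // explicitly because PHP 5.4 and later default to UTF-8 and return an empty
    // string for these single-byte characters.
    $str = htmlentities($str, ENT_QUOTES, 'ISO-8859-1');

    // Then replace each remaining special character with its entity.
    foreach ($trans as $latin => $htmlchar)
    {
        $str = str_replace($latin, $htmlchar, $str);
    }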
   
    return $str;
 
}
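
As a cross-check rather than a replacement for the function above, the following sketch feeds a made-up sample to the built-in htmlentities with the input encoding declared as Windows-1252. It assumes the pasted data really is Windows-1252; the sample string and variable names are hypothetical, and this is not the approach the post itself takes.

```php
<?php
// Hypothetical Word-style input in Windows-1252 bytes:
// left quote, "Price: 5", euro sign, en dash, "caf" + e-acute, ellipsis, right quote.
$input = chr(147) . 'Price: 5 ' . chr(128) . ' ' . chr(150)
       . ' caf' . chr(233) . chr(133) . chr(148);

// The entity string htmlentitiesandlatin($input) is meant to produce.
$expected = '&ldquo;Price: 5 &euro; &ndash; caf&eacute;&hellip;&rdquo;';

// Built-in route: tell htmlentities the bytes are Windows-1252.
$builtin = htmlentities($input, ENT_QUOTES, 'cp1252');

var_dump($builtin === $expected);
```

If both routes agree on your PHP version, the custom map is mainly useful where you need to add or override individual characters.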
  1. Ian Fraser says:

    Well done! A brilliant solution which has saved many hours of work and also headaches!

    • Thank you, Ian.

      It had already given me a lot of headaches, and I found it difficult to find a solution on the web, so I thought I would blog it. Glad that it helped you.

  2. this saved me a lot of hassle – thanks!

    one small change:
    FROM:
    $trans['euro'] = '&euro;';
    TO:
    $trans['128'] = '&euro;';

    (this was necessary for me…)

  3. Thanks, bro! I was pulling out what little hair I have left. Sending some good Karma your way…

    🙂
