Multibyte String Truncate Modifier for Smarty – mb_truncate

When working with Smarty, a PHP templating engine, I discovered that while the regular truncate modifier works great on ASCII strings, it doesn't work with multibyte strings such as UTF-8 encoded strings. It counts and cuts bytes rather than characters, so it can split a multibyte character in half and produce invalid output. This is a problem for internationalization (i18n), as UTF-8 is the common encoding for non-Latin alphabets nowadays. The problem can be solved by modifying the built-in truncate modifier to create a new one that takes an additional argument, the charset of the string, and acts accordingly. The new modifier, mb_truncate, is implemented below.
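To make the byte-versus-character difference concrete, here is a small standalone snippet comparing byte-based substr() with mb_substr(). It assumes the mbstring extension is loaded and that the file itself is saved as UTF-8, where "é" takes two bytes.

```php
<?php
// "é" is encoded as two bytes in UTF-8 (0xC3 0xA9), so cutting by bytes
// can split it in half. Assumes ext-mbstring and a UTF-8 source file.
$s = 'Café au lait';

$bytes = substr($s, 0, 4);              // "Caf" + 0xC3: half of the "é"
$chars = mb_substr($s, 0, 4, 'UTF-8');  // "Café": four whole characters

var_dump(strlen($s));                          // int(13): bytes
var_dump(mb_strlen($s, 'UTF-8'));              // int(12): characters
var_dump(mb_check_encoding($bytes, 'UTF-8'));  // bool(false): broken UTF-8
var_dump($chars);                              // string(5) "Café"
```

The byte-based cut is exactly what the original truncate does, which is why its output can end in a garbled character.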

<?php
/**
 * Smarty plugin
 * @package Smarty
 * @subpackage plugins
 */


/**
 * Smarty truncate modifier plugin
 *
 * Type:     modifier<br>
 * Name:     mb_truncate<br>
 * Purpose:  Truncate a string to a certain length if necessary,
 *           optionally splitting in the middle of a word, and
 *           appending the $etc string or inserting $etc into the middle.
 *           This version also supports multibyte strings.
 * @link http://smarty.php.net/manual/en/language.modifier.truncate.php
 *          truncate (Smarty online manual)
 * @author   Guy Rutenberg <guyrutenberg@gmail.com> based on the original 
 *           truncate by Monte Ohrt <monte at ohrt dot com>
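 * @param string  $string      the string to truncate
 * @param integer $length      maximum length in characters ($etc counts toward it)
 * @param string  $etc         text appended (or inserted in the middle) when truncating
 * @param string  $charset     character encoding of $string, e.g. 'UTF-8'
 * @param boolean $break_words if false, only cut between words
 * @param boolean $middle      if true, keep the start and end and put $etc in the middle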
 * @return string
 */
function smarty_modifier_mb_truncate($string, $length = 80, $etc = '...', $charset='UTF-8',
                                  $break_words = false, $middle = false)
{
    if ($length == 0)
        return '';

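    // Count characters, not bytes, using the given charset
    if (mb_strlen($string, $charset) > $length) {
        // Reserve room for $etc so the result fits in $length characters
        $length -= min($length, mb_strlen($etc, $charset));
        if (!$break_words && !$middle) {
            // Take one extra character so the regex can tell whether the cut
            // landed on a word boundary, then drop the trailing partial word.
            // The /u modifier assumes the string is UTF-8.
            $string = preg_replace('/\s+?(\S+)?$/u', '', mb_substr($string, 0, $length+1, $charset));
        }
        if (!$middle) {
            return mb_substr($string, 0, $length, $charset) . $etc;
        } else {
            // Split the remaining room into integer head/tail lengths, since
            // mb_substr() expects ints. Passing $tail as the length also makes a
            // zero-length tail return '' instead of the whole string (-0 == 0).
            $head = (int) ceil($length / 2);
            $tail = $length - $head;
            return mb_substr($string, 0, $head, $charset) . $etc . mb_substr($string, -$tail, $tail, $charset);
        }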
    } else {
        return $string;
    }
}

/* vim: set expandtab: */
?>

The license for the code is LGPL (same as Smarty's). To install the modifier, save it as modifier.mb_truncate.php in Smarty's plugins directory, e.g. /smarty/plugins/modifier.mb_truncate.php.

Using the mb_truncate modifier is similar to using truncate:

{* $some_string is a UTF-8 string assigned from PHP *}
{$some_string|mb_truncate:13:"...":'UTF-8'}

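The modifier also supports the break_words and middle flags of the original truncate. They are passed after the charset, so {$some_string|mb_truncate:13:"...":'UTF-8':false:true} truncates in the middle of the string.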
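The two flags boil down to a couple of mb_substr()/preg_replace() calls. The standalone walk-through below traces them on one example string, using the same calls as mb_truncate; the head/tail split for the middle case is one reasonable choice rather than the only one. It assumes ext-mbstring and UTF-8 input.

```php
<?php
// Walk-through of mb_truncate's two optional behaviours on one string.
// Assumes ext-mbstring and UTF-8 input.
$charset = 'UTF-8';
$text    = 'Grüße aus München';                   // 17 characters
$length  = 13 - mb_strlen('...', $charset);       // room left after "...": 10

// break_words = false: cut one character past the limit, then drop the
// trailing (possibly partial) word. The extra character lets the regex see
// whether the cut landed exactly on a space, i.e. a whole word.
$cut = mb_substr($text, 0, $length + 1, $charset);  // "Grüße aus M"
$cut = preg_replace('/\s+?(\S+)?$/u', '', $cut);    // "Grüße aus"
$endTruncated = mb_substr($cut, 0, $length, $charset) . '...';

// middle = true: keep the head and the tail and put "..." between them.
$head = (int) ceil($length / 2);   // 5 characters from the start
$tail = $length - $head;           // 5 characters from the end
$middleTruncated = mb_substr($text, 0, $head, $charset) . '...'
    . mb_substr($text, -$tail, $tail, $charset);

var_dump($endTruncated);      // "Grüße aus..."
var_dump($middleTruncated);   // "Grüße...nchen"
```

Without the extra character, a cut at exactly "Grüße aus" would look like a partial word to the regex and lose "aus" as well.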

Update: Applied the fixes in the comments by tics and Vladimir.
Update: Applied the fix by Marko.

25 thoughts on “Multibyte String Truncate Modifier for Smarty – mb_truncate”

  1. I have a 7-line paragraph in my database. Now I want to show only 3 lines of this paragraph using PHP. Which function can I use?

  2. Hi curzon,
    A simple solution would be to call strpos (multiple times) to find the third occurrence of “\n” and then return the substring up to that point; it will contain only 3 lines.

    I don’t know of any existing Smarty modifier that does this, so you will have to wrap it in one before you can use it in Smarty. See the mb_truncate modifier as an example of how to do it.

  3. That was exactly what I needed. Took me about two hours to figure out that the problem was not in the database but with Smarty's truncate 😉
    Then I googled and came here -> problem solved.

    Thanks,
    Pierre

  4. Thank you very much for sharing this code!

    It helped me fix an annoying bug in my weblog, which relies on Smarty.

    Best regards,
    Alex

  5. If you don’t put “/u” at the end of the pattern, you may have problems with accented characters at the end of the string.

    $string = preg_replace('/\s+?(\S+)?$/u', '', mb_substr($string, 0, $length+1, $charset));

    Best regards
    tics

  7. Great work,
    I have exactly the problem you described in your post. Thanks for this plugin.

  8. Thanks for this plugin.
    There is a small problem in the code.
    In the two lines that check the lengths, mb_strlen must be used instead of strlen, and it should be given the $charset.
    So it should be:
    if (mb_strlen($string, $charset) > $length) {
    $length -= min($length, mb_strlen($etc, $charset));

    🙂

  9. Nice plugin, and it works even better if the previously suggested changes are made.

    Additionally, one can change the order of the function arguments. To match the argument order of the classic Smarty truncate, the charset should be passed after break_words, so it would be:
    $string, $length, $etc, $break_words …

    I placed it as the last parameter, as PHP's mb_ functions do.

    You can then prepend mb_ to all existing truncate modifier calls within the templates.

    This is an easy find/replace task, which would not work with the original argument order:
    it would pass the former break_words value as the charset, and the mb_ functions would not accept it.

  10. Thank you for posting this, an excellent addition to Smarty and very useful for those of us working in UTF-8 only environments.

  11. Thanks for this multibyte-safe Smarty truncate plugin.
    It’s great!

    I found an issue in this script: mb_substr returns an empty string if you pass it an out-of-bounds parameter. Also, you’re passing $charset to a parameter that should be an int.

    So the return statement for the middle case should be:

    return mb_substr($string, 0, $length/2, $charset) . $etc . mb_substr($string, -$length/2, (mb_strlen($string)-$length/2), $charset);

    Cheers, Marko
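Reply 2 above suggests finding the third “\n” with strpos() and wrapping the result as a Smarty modifier. A minimal sketch of that idea follows; the name smarty_modifier_first_lines (and hence the template name first_lines) is hypothetical, not something Smarty ships. Because the byte “\n” never occurs inside a multibyte UTF-8 character, byte-based strpos() and substr() are safe here even for UTF-8 text.

```php
<?php
/**
 * Hypothetical Smarty modifier (not part of Smarty): keep only the first
 * $lines lines of $string, using the strpos() approach from reply 2.
 */
function smarty_modifier_first_lines($string, $lines = 3)
{
    if ($lines <= 0) {
        return '';
    }
    $offset = 0;
    for ($i = 0; $i < $lines; $i++) {
        $pos = strpos($string, "\n", $offset);
        if ($pos === false) {
            return $string; // fewer than $lines line breaks: keep everything
        }
        $offset = $pos + 1;
    }
    // $offset - 1 is the position of the $lines-th "\n"; cut just before it
    return substr($string, 0, $offset - 1);
}
```

Saved as modifier.first_lines.php in the plugins directory, it would be used as {$paragraph|first_lines:3}. Lines are split on “\n” only, so text with Windows “\r\n” line endings keeps a trailing “\r” on the last line.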
