When working with Smarty, a PHP templating engine, I discovered that while the regular truncate modifier works great on ASCII strings, it doesn't work with multibyte strings, such as UTF-8 encoded strings. This causes problems for internationalization (i18n), as UTF-8 is nowadays the popular encoding for non-Latin alphabets. The problem can be solved by modifying the built-in truncate modifier to create a new one that takes an additional argument, the charset of the string, and acts accordingly. The new modifier, mb_truncate, is implemented below.
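Before the plugin itself, a minimal sketch of the problem it solves, assuming PHP with the mbstring extension and a source file saved as UTF-8 (the sample string is illustrative). Byte-based functions such as strlen() and substr(), which the original truncate relies on, count and cut bytes, so a cut can land inside a two-byte Cyrillic letter; the mb_ functions count and cut characters.

```php
<?php
// Sketch: byte-based vs. character-based truncation of a UTF-8 string.
// Requires the mbstring extension.
$s = "Привет, мир"; // each Cyrillic letter takes 2 bytes in UTF-8

echo strlen($s), PHP_EOL;             // 20 (bytes)
echo mb_strlen($s, 'UTF-8'), PHP_EOL; // 11 (characters)

$bytes = substr($s, 0, 3);             // "П" plus the first byte of "р"
$chars = mb_substr($s, 0, 3, 'UTF-8'); // "При"

var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false): broken character
echo $chars, PHP_EOL;
```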
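<?php
/**
 * Smarty plugin
 * @package Smarty
 * @subpackage plugins
 */

/**
 * Smarty mb_truncate modifier plugin
 *
 * Type:     modifier<br>
 * Name:     mb_truncate<br>
 * Purpose:  Truncate a string to a certain length if necessary,
 *           optionally splitting in the middle of a word, and
 *           appending the $etc string or inserting $etc into the middle.
 *           This version also supports multibyte strings.
 * @link http://smarty.php.net/manual/en/language.modifier.truncate.php
 *          truncate (Smarty online manual)
 * @author Guy Rutenberg <guyrutenberg@gmail.com> based on the original
 *         truncate by Monte Ohrt <monte at ohrt dot com>
 * @param string  $string      input string
 * @param integer $length      maximum length in characters, including $etc
 * @param string  $etc         text appended (or inserted) when truncating
 * @param string  $charset     character encoding of $string and $etc
 * @param boolean $break_words whether a word may be cut in the middle
 * @param boolean $middle      whether to cut out the middle of the string
 * @return string
 */
function smarty_modifier_mb_truncate($string, $length = 80, $etc = '...', $charset = 'UTF-8',
                                     $break_words = false, $middle = false)
{
    if ($length == 0) {
        return '';
    }

    // Count characters, not bytes, in the given charset.
    if (mb_strlen($string, $charset) > $length) {
        // Reserve room for $etc within the requested length.
        $length -= min($length, mb_strlen($etc, $charset));

        if (!$break_words && !$middle) {
            // Take one extra character, then drop the trailing (partial) word.
            // Note: the /u flag assumes the string is UTF-8.
            $string = preg_replace('/\s+?(\S+)?$/u', '', mb_substr($string, 0, $length + 1, $charset));
        }

        if (!$middle) {
            return mb_substr($string, 0, $length, $charset) . $etc;
        }

        // Keep the head and tail halves; cast to int to avoid float offsets.
        // When $half is 0, -$half would be 0 and return the whole string.
        $half = (int) ($length / 2);
        $tail = $half > 0 ? mb_substr($string, -$half, $half, $charset) : '';
        return mb_substr($string, 0, $half, $charset) . $etc . $tail;
    }

    return $string;
}

/* vim: set expandtab: */
?>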
The license for the code is LGPL (the same as Smarty's). To install the modifier, just save it as /smarty/plugins/modifier.mb_truncate.php.
Using the mb_truncate modifier is similar to using truncate:
{* $some_string is a UTF-8 string assigned from PHP *}
{$some_string|mb_truncate:13:"...":'UTF-8'}
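The modifier also supports the break-words and truncate-in-the-middle flags of the original truncate. Because the charset is the fourth argument, these flags are the fifth and sixth arguments of mb_truncate, rather than the fourth and fifth as in truncate.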
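When break words is off, the plugin cuts one character past the limit and then strips the trailing whitespace together with any partial word after it. The standalone sketch below isolates that step on a sample Cyrillic sentence of my own choosing (assuming the mbstring extension). For simplicity it skips the plugin's earlier step of subtracting the length of $etc. Here the cut lands inside "ещё", so that partial word is dropped; had the extra character been a space, the preceding word would have survived intact.

```php
<?php
// Sketch of the word-boundary step in mb_truncate ($break_words == false).
$text   = "Съешь же ещё этих мягких французских булок"; // sample UTF-8 text
$length = 10;                                              // limit in characters

// One extra character: if it is whitespace, the last word is complete.
$cut = mb_substr($text, 0, $length + 1, 'UTF-8');    // "Съешь же ещ"

// Remove trailing whitespace plus any partial word after it. The /u flag
// makes PCRE treat the subject as UTF-8 characters rather than raw bytes.
$trimmed = preg_replace('/\s+?(\S+)?$/u', '', $cut); // "Съешь же"

// Final cut to the limit, as the plugin does.
$result = mb_substr($trimmed, 0, $length, 'UTF-8');
echo $result, PHP_EOL;
```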
Update: Applied the fixes in the comments by tics and Vladimir.
Update: Applied the fix by Marko.
Nice!
It is a good idea!
I have a 7-line paragraph in my database. Now I want to show only 3 lines from this paragraph using PHP. Which function can I use?
Hi curzon,
A simple solution would be to call strpos (multiple times) to find the third occurrence of “\n” and then return the substring up to that point; it will contain only 3 lines.
I don’t know of any existing modifier for Smarty that does this, so you will have to wrap it before you can use it in Smarty. See the mb_truncate modifier as an example of how to do it.
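A sketch of the strpos() approach described above, wrapped as a Smarty modifier in the same style as mb_truncate. The name smarty_modifier_first_lines (and therefore the file modifier.first_lines.php and the template call {$text|first_lines:3}) is hypothetical, not part of Smarty. Byte-based strpos() is safe on UTF-8 here because the newline byte never occurs inside a multibyte sequence.

```php
<?php
/**
 * Hypothetical Smarty modifier: keep only the first $lines lines of $string.
 * Save as plugins/modifier.first_lines.php to use {$text|first_lines:3}.
 */
function smarty_modifier_first_lines($string, $lines = 3)
{
    $string = (string) $string;
    $lines  = (int) $lines;
    if ($lines <= 0) {
        return '';
    }

    $offset = 0;
    for ($i = 0; $i < $lines; $i++) {
        // Find the next "\n" at or after $offset.
        $pos = strpos($string, "\n", $offset);
        if ($pos === false) {
            return $string; // fewer than $lines line breaks: keep everything
        }
        $offset = $pos + 1; // resume the search after this newline
    }

    // $offset - 1 is the position of the $lines-th newline; cut just before it
    // and drop a trailing "\r" left over from Windows line endings.
    return rtrim(substr($string, 0, $offset - 1), "\r");
}

echo smarty_modifier_first_lines("one\ntwo\nthree\nfour\nfive"), PHP_EOL;
```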
That was exactly what I needed. It took me about two hours to figure out that the problem was not in the database but in Smarty's truncate 😉
Then I googled and came here -> problem solved.
Thanks,
Pierre
{$some_string|mb_truncate:13:"...":'UTF-8'}
This is the right syntax to truncate…..
🙂
@muthu: Thanks, I updated the post and removed the extra mb_truncate from the example.
Wow! Thanks, great work!
Thank you very much for sharing this code!
It helped me fix an annoying bug in my weblog, which relies on Smarty.
Best regards,
Alex
In UTF-8, if you don’t put “/u” at the end of the pattern, you may get broken accented characters at the end of the string.
$string = preg_replace('/\s+?(\S+)?$/u', '', mb_substr($string, 0, $length+1, $charset));
Best regards
tics
tics +1
Great work,
I have exactly the problem you described in your post. Thanks for this plugin.
Thanks for this plugin.
There is a small problem in the code.
On lines 36 and 37, mb_strlen must be used instead of strlen.
So it should be:
if (mb_strlen($string, $charset) > $length) {
$length -= min($length, mb_strlen($etc, $charset));
🙂
Nice plugin; it works even better if the previously suggested changes are made.
Additionally, one can change the order of the function arguments. To match the order of the classic Smarty truncate, the charset should be passed after break_words, so it would be:
$string, $length, $etc, $break_words …
I placed it as the last parameter, as PHP's mb_ functions do.
You can then prepend mb_ to all existing truncate modifier calls within the templates.
This is a very easy task using find/replace, which won’t work so well with the original argument order:
it would pass the former break_words as the charset, and the mb_ functions will not be happy with that.
Thank you for posting this, an excellent addition to Smarty and very useful for those of us working in UTF-8 only environments.
Thanks/Спасибо
Thank you! Good work! 🙂
tics +2 even
Add that “u”
Thanks! Very nice plugin!
Thanks for this multibyte safe smarty truncate plugin.
It’s great!
I found an issue in this script: mb_substr will return an empty string if you send it an out-of-bounds parameter. Also, you’re sending $charset to a parameter that should be an int.
So line 44 should be:
return mb_substr($string, 0, $length/2, $charset) . $etc . mb_substr($string, -$length/2, (mb_strlen($string)-$length/2), $charset);
Cheers, Marko
@Marko, thanks, I’ve fixed that line.
It seems MBSTRING support for truncate is implemented in Smarty 3.1.21.
Thank you so much, it fixed my problem.