Fixing Numbering Direction for Hebrew Text in LyX

On Monday, I submitted a patch to the LyX developers mailing list with a fix for the numbering direction in Hebrew text. In Hebrew text, the dot appeared before the numbering symbol instead of after it, as it should.
This behaviour has been this way for years (at least as long as I can remember).

Batch Renaming Using sed

I was reorganizing my music library and decided to change the naming convention I had been using. This task is just asking to be automated. Since the filename change could be described using a regular expression, I looked for a way to use sed for the renaming process.

My files were named using the pattern ARTIST – SONG – TRACK – ALBUM:

James Brown - I Got You (I Feel Good) - 01 - Classic James Brown.ogg

I wanted to rename them to ARTIST – ALBUM – TRACK – SONG:

James Brown - Classic James Brown - 01 - I Got You (I Feel Good).ogg

Describing the change as a sed program is easy:

s/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/

Now all that has to be done is to give mv each filename twice: once as is, and once after it has gone through the sed script. This could be done like this:

for i in *; do
  mv "$i" "$(echo "$i" | sed 's/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/')"
done

The important part is the

$(echo "$i" | sed 's/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/')

which pipes the filename to sed and returns the result as an argument for mv. Quoting "$i" keeps filenames with consecutive spaces or glob characters intact, and escaping the dot makes the pattern match a literal .ogg extension at the end of the name.

To see which renames will be performed, alter the above command slightly:

for i in *; do
  echo "$i" "->" "$(echo "$i" | sed 's/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/')"
done

This will print a list of lines of the form oldname -> newname.

Of course this technique isn’t limited to the renaming I’ve done. By changing the pattern given to sed, one can do any kind of renaming that can be described as a regular expression replacement. Also one can change the globbing (the *) in the for loop to operate only on specific files, that match a given pattern, in the directory instead of all of them.
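
For comparison, here is a rough Python sketch of the same substitution. It is not part of the original workflow; the names PATTERN, new_name and preview are made up for illustration, and the pattern simply mirrors the sed program above.

```python
import re

# Mirrors the sed program: ARTIST - SONG - TRACK - ALBUM.ogg
PATTERN = re.compile(r"^(.*) - (.*) - (.*) - (.*)\.ogg$")


def new_name(filename):
    """Return the reordered name, or the input unchanged if it does not match."""
    return PATTERN.sub(r"\1 - \4 - \3 - \2.ogg", filename)


def preview(filenames):
    """Return 'oldname -> newname' lines for the names that would change."""
    lines = []
    for old in filenames:
        new = new_name(old)
        if new != old:
            lines.append("%s -> %s" % (old, new))
    return lines
```

To actually rename files, one could feed the names from os.listdir(".") through new_name and pass each changed pair to os.rename.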

Deleting a Range of Tickets in Trac

Recently the Open Yahtzee website, which runs Trac, has fallen victim to several spam attacks. The spammers submit a large number of tickets containing links to various sites. This post was written mainly to allow me to copy and paste a command to delete a range of tickets at once, but I thought it may be useful to others as well.

WordPress Backup to FTP

Update: A newer version of the script is available.

This script allows you to easily back up your WordPress blog to an FTP server. It’s actually a modification of my WordPress Backup to Amazon S3 Script, but instead of saving the backup to Amazon S3 it uploads it to an FTP server. Another change is that the SQL dump now includes the database creation instructions, so you don’t need to create the database manually before restoring from the backup.

Although I’ve written it with WordPress in mind (to create backups of my blog), it isn’t WordPress-specific. It can be used to back up any website that consists of a MySQL database and files. I’ve successfully used it to back up a MediaWiki installation.

Extract Public Key from X.509 Certificate as Hex

X.509 certificates are a common way to exchange and distribute public key information. For example, most OpenSocial containers use the OAuth RSA-SHA1 signature method, and distribute their public keys in the X.509 format.

While working on an AppEngine application, I needed to verify requests from such containers. However, there is (currently) no pure-Python library capable of parsing the certificates. This meant that I needed to extract the public key from the certificate manually, and store it in some parsed form inside the Python code.

Fortunately, extracting a public key from an X.509 certificate and representing it as a hex number turned out to be simple and easy.

Expanding Macros into String Constants in C

Today I came across an annoying problem: how do I expand a C macro into a string?

One of C’s preprocessor operators is the # which surrounds the token that follows it in the replacement text with double quotes (“). So, at first the solution sounds pretty simple, just define

#define STR(tok) #tok

and things will work. However, there is one caveat: it will not work if passed another macro. For example,

#define BUF_LEN 100
#define STR(tok) #tok

STR(BUF_LEN)

will produce after going through the preprocessor

"BUF_LEN"

instead of "100", which is undesired. This behavior follows from the C standard, which specifies that a macro argument used as the operand of # is not macro-expanded before it is quoted.

However, after reconsidering the source of the problem, I’ve found the following workaround: define another macro which will expand the argument and only then call the macro which does the quoting.

#define STR_EXPAND(tok) #tok
#define STR(tok) STR_EXPAND(tok)

#define BUF_LEN 100

STR(BUF_LEN)

will produce

"100"

as desired.

Explanation: The STR macro passes its argument on to STR_EXPAND. Unlike in the first example, the parameter here is not an operand of #, so the preprocessor fully macro-expands the argument before substituting it. STR_EXPAND therefore receives 100 and quotes it, giving the desired behavior.

Damerau-Levenshtein Distance in Python

Damerau-Levenshtein distance is a metric for measuring how far apart two given strings are, in terms of 4 basic operations:

  • deletion
  • insertion
  • substitution
  • transposition

The distance between two strings is the minimal number of such operations needed to transform the first string into the second. The algorithm can be used to create spelling correction suggestions, by finding the word in a given list that is closest to the user’s input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.

Here is an implementation of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance wise) it is well suited for many applications.

def damerau_levenshtein_distance(s1, s2):
    """
    Compute the Damerau-Levenshtein distance between two given
    strings (s1 and s2)
    """
    # d[(i,j)] is the distance between s1[:i+1] and s2[:j+1];
    # index -1 stands for the empty prefix.
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    for i in range(-1,lenstr1+1):
        d[(i,-1)] = i+1
    for j in range(-1,lenstr2+1):
        d[(-1,j)] = j+1

    for i in range(lenstr1):
        for j in range(lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i,j)] = min(
                           d[(i-1,j)] + 1, # deletion
                           d[(i,j-1)] + 1, # insertion
                           d[(i-1,j-1)] + cost, # substitution
                          )
            if i and j and s1[i]==s2[j-1] and s1[i-1] == s2[j]:
                d[(i,j)] = min (d[(i,j)], d[i-2,j-2] + cost) # transposition

    return d[lenstr1-1,lenstr2-1]

Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.
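
The spelling-suggestion use mentioned earlier could look roughly like the sketch below. To keep the block runnable on its own, it restates the restricted distance with a list-based table under the made-up name osa_distance; suggest is likewise a hypothetical helper, not something from an existing library.

```python
def osa_distance(s1, s2):
    """Restricted Damerau-Levenshtein distance between s1 and s2."""
    # d[i][j] is the distance between the prefixes s1[:i] and s2[:j].
    d = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(len(s1) + 1):
        d[i][0] = i
    for j in range(len(s2) + 1):
        d[0][j] = j
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(s1)][len(s2)]


def suggest(word, candidates):
    """Return the candidate closest to word (ties go to the earliest one)."""
    return min(candidates, key=lambda c: osa_distance(word, c))
```

For example, suggest("teh", ["then", "the", "tree"]) picks "the", because a single transposition turns "teh" into "the" while the other candidates need two operations.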

Backup a SourceForge hosted SVN repository – sf-svn-backup

SourceForge urges its users to back up the code repositories of their projects. As I have several projects hosted with SourceForge, I should do it too. Making the backups isn’t complicated at all, but because it isn’t automated properly, I’ve been lazy about it.

sf-svn-backup was written to automate the process. The script is pretty simple to use: pass the project name as the first argument and the script will write the dump file to stdout.

For example:

sf-svn-backup openyahtzee > openyahtzee.dump

The project name should be its UNIX name (e.g. openyahtzee and not Open Yahtzee). Because the script writes the dump file directly to stdout, it’s easy to pipe the output through a compression program such as gzip to create compressed SVN dump files.
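
If the backup is driven from Python rather than from a shell pipeline, the same idea can be sketched as below. The function name dump_to_gzip is made up for illustration, and the commented-out call assumes sf-svn-backup is installed and on the PATH.

```python
import gzip
import shutil
import subprocess


def dump_to_gzip(command, out_path):
    """Run command and store its stdout, gzip-compressed, in out_path."""
    with gzip.open(out_path, "wb") as out:
        proc = subprocess.Popen(command, stdout=subprocess.PIPE)
        # Stream the output in chunks so a large dump is never held in memory.
        shutil.copyfileobj(proc.stdout, out)
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
    return out_path


# Hypothetical usage, assuming sf-svn-backup is available:
# dump_to_gzip(["sf-svn-backup", "openyahtzee"], "openyahtzee.dump.gz")
```

The resulting file can be read back with gzip.open or decompressed with gunzip before loading the dump.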

Question Marks Instead of Non-ASCII Chars when using Gettext in PHP

Yesterday I ported a PHP website to use Gettext for localization (l10n). After reading through the Gettext documentation and going through the documentation on the PHP site, I managed to get (almost) everything working. I had one problem: all the non-ASCII characters (accented Latin chars, Japanese and Chinese) were displayed as question marks (?) instead of the correct characters. This happened despite me using UTF-8 encoded files.

While some people (e.g. this one) suggested that it’s not possible to use non-ASCII characters with UTF-8 encoded message files, there is a solution and it’s quite a simple one. All you have to do is call bind_textdomain_codeset and pass it UTF-8 as the charset.

InfiniteTTT 0.6 Released

InfiniteTTT 0.6 was released today. The main change in the new version is that the game is now multi-threaded.

InfiniteTTT is a variation of Tic-Tac-Toe which is played on an infinite board.

The new version has a new multi-threaded AI engine, and several minor fixes and improvements. The changes improve the user experience and make the game more responsive. The new release contains binaries for Windows, a source package and a Gentoo ebuild. Packages for other Linux distributions will follow soon (help will be appreciated).

To download the new version visit InfiniteTTT’s download page.