On Monday, I submitted a patch to the LyX developers mailing list that fixes the numbering direction in Hebrew text. In Hebrew text the dot appeared before the numbering symbol instead of after it, as it should.
It has behaved this way for years (at least as long as I can remember).
Continue reading Fixing Numbering Direction for Hebrew Text in LyX
Batch Renaming Using sed
I was reorganizing my music library and decided to change the naming convention I had been using. This task is just asking to be automated. Since the filename change could be described using a regular expression, I looked for a way to use sed for the renaming process.
My files followed the naming pattern ARTIST – SONG – TRACK – ALBUM:
James Brown - I Got You (I Feel Good) - 01 - Classic James Brown.ogg
I wanted to rename them to ARTIST – ALBUM – TRACK – SONG:
James Brown - Classic James Brown - 01 - I Got You (I Feel Good).ogg
Describing the change as a sed program is easy:
s/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/
All that remains is to pass each filename to mv twice: once as it is, and once after it has gone through the sed script. This can be done like this:
for i in *; do
    mv -- "$i" "$(echo "$i" | sed 's/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/')"
done
The important part is the
$(echo "$i" | sed 's/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/')
which pipes the filename to sed and passes the result to mv as its second argument.
To preview the renames without performing them, alter the above command slightly:
for i in *; do
    echo "$i" "->" "$(echo "$i" | sed 's/\(.*\) - \(.*\) - \(.*\) - \(.*\)\.ogg$/\1 - \4 - \3 - \2.ogg/')"
done
This prints a list of lines of the form oldname -> newname.
Of course, this technique isn’t limited to the renaming I’ve done. By changing the pattern given to sed, one can do any kind of renaming that can be described as a regular expression replacement. One can also change the glob (the *) in the for loop so that it operates only on files matching a given pattern instead of on every file in the directory.
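For readers who prefer to avoid shell quoting altogether, the same rename can be sketched in Python. This is only an illustration of the idea above, not part of the original approach: the names new_name and rename_all and the dry_run parameter are made up for this example, and the pattern assumes the ARTIST – SONG – TRACK – ALBUM layout with an .ogg extension.

```python
import re
from pathlib import Path

# Same idea as the sed program: ARTIST - SONG - TRACK - ALBUM.ogg
# becomes ARTIST - ALBUM - TRACK - SONG.ogg.
PATTERN = re.compile(r"^(.*) - (.*) - (.*) - (.*)\.ogg$")
REPLACEMENT = r"\1 - \4 - \3 - \2.ogg"


def new_name(old_name):
    """Return the rewritten filename, or the name unchanged if it does not match."""
    return PATTERN.sub(REPLACEMENT, old_name)


def rename_all(directory=".", dry_run=True):
    """Rename every matching file in `directory`; with dry_run, only print the plan."""
    for path in sorted(Path(directory).iterdir()):
        target = path.with_name(new_name(path.name))
        if target == path:
            continue  # the pattern did not match, so there is nothing to do
        print(f"{path.name} -> {target.name}")
        if not dry_run:
            path.rename(target)
```

Calling rename_all() prints the oldname -> newname list without touching anything; rename_all(dry_run=False) performs the renames. Files that do not match the pattern are skipped rather than handed to a rename call.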
Deleting a Range of Tickets in Trac
Recently the Open Yahtzee website, which runs Trac, has fallen victim to several spam attacks. The spammers submit a large number of tickets containing links to various sites. I wrote this post mainly so that I could copy and paste a command to delete a range of tickets at once, but I thought it might be useful to others as well.
Continue reading Deleting a Range of Tickets in Trac
WordPress Backup to FTP
Update: A newer version of the script is available.
This script allows you to easily back up your WordPress blog to an FTP server. It’s actually a modification of my WordPress Backup to Amazon S3 Script, but instead of saving the backup to Amazon S3 it uploads it to an FTP server. Another change is that the SQL dump now includes the database creation instructions, so you don’t need to create the database manually before restoring from the backup.
Although I’ve written it with WordPress in mind (to create backups of my blog), it isn’t WordPress specific. It can be used to back up any website that consists of a MySQL database and files. I’ve successfully used it to back up a MediaWiki installation.
Continue reading WordPress Backup to FTP
Extract Public Key from X.509 Certificate as Hex
X.509 certificates are a common way to exchange and distribute public key information. For example, most OpenSocial containers use the OAuth RSA-SHA1 signature method, and distribute their public keys in the X.509 format.
While working on an AppEngine application, I needed to verify requests from such containers. However, there is (currently) no pure-Python library capable of parsing the certificates. This meant that I needed to extract the public key from the certificate manually, and store it in some parsed form inside the Python code.
Fortunately, extracting a public key from an X.509 certificate and representing it as a hex number turned out to be simple.
Continue reading Extract Public Key from X.509 Certificate as Hex
Expanding Macros into String Constants in C
Today I came across an annoying problem: how do I expand a C macro into a string?
One of the C preprocessor’s operators is #, which surrounds the token that follows it in the replacement text with double quotes ("). So at first the solution sounds pretty simple: just define
#define STR(tok) #tok
and things will work. However, there is one caveat: it will not work if passed another macro. For example,
#define BUF_LEN 100
#define STR(tok) #tok
STR(BUF_LEN)
will produce after going through the preprocessor
"BUF_LEN"
instead of "100", which is undesired. This happens because the C standard specifies that a macro argument is not macro-expanded when it is the operand of #.
However, after reconsidering the source of the problem, I found the following workaround: define another macro that expands the argument, and only then call the macro that does the quoting.
#define STR_EXPAND(tok) #tok
#define STR(tok) STR_EXPAND(tok)
#define BUF_LEN 100
STR(BUF_LEN)
will produce
"100"
as desired.
Explanation: The STR macro calls the STR_EXPAND macro with its argument. Unlike in the first example, the argument is not an operand of # inside STR, so the preprocessor fully expands it before passing it to STR_EXPAND, which then quotes the expanded result, giving the desired behavior.
Damerau-Levenshtein Distance in Python
Damerau-Levenshtein distance is a metric for measuring how far apart two given strings are, in terms of four basic operations:
- deletion
- insertion
- substitution
- transposition
The distance between two strings is the minimal number of such operations needed to transform the first string into the second. The algorithm can be used to create spelling correction suggestions, by finding the word in a given list that is closest to the user’s input. See Damerau–Levenshtein distance (Wikipedia) for more info on the subject.
Here is an implementation of the algorithm (restricted edit distance version) in Python. While this implementation isn’t perfect (performance-wise), it is well suited for many applications.
def damerau_levenshtein_distance(s1, s2):
    """
    Compute the Damerau-Levenshtein distance between two given
    strings (s1 and s2)
    """
    d = {}
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    # Index -1 stands for the empty prefix: turning a prefix of
    # length i+1 into the empty string takes i+1 operations.
    for i in range(-1, lenstr1 + 1):
        d[(i, -1)] = i + 1
    for j in range(-1, lenstr2 + 1):
        d[(-1, j)] = j + 1

    for i in range(lenstr1):
        for j in range(lenstr2):
            if s1[i] == s2[j]:
                cost = 0
            else:
                cost = 1
            d[(i, j)] = min(
                d[(i - 1, j)] + 1,         # deletion
                d[(i, j - 1)] + 1,         # insertion
                d[(i - 1, j - 1)] + cost,  # substitution
            )
            # Both indices must be at least 1 for an adjacent swap to exist.
            if i and j and s1[i] == s2[j - 1] and s1[i - 1] == s2[j]:
                d[(i, j)] = min(d[(i, j)], d[(i - 2, j - 2)] + cost)  # transposition

    return d[(lenstr1 - 1, lenstr2 - 1)]
Update 24 Mar, 2012: Fixed the error in computing transposition at the beginning of the strings.
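To illustrate the spelling-suggestion use mentioned above, here is a minimal, self-contained sketch. It carries its own list-based variant of the restricted distance so that it runs on its own; the names osa_distance and suggest and the max_distance cutoff are assumptions made for this example, not part of the original code.

```python
def osa_distance(s1, s2):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # d[i][j] is the distance between the first i chars of s1
    # and the first j chars of s2.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance of `word`, closest first."""
    scored = [(osa_distance(word, candidate), candidate) for candidate in dictionary]
    return [candidate for dist, candidate in sorted(scored) if dist <= max_distance]
```

For example, suggest("teh", ["the", "ten", "tea", "toe", "apple"]) keeps the four short words and drops "apple". Ties are broken alphabetically here, which is an arbitrary choice; a real spell checker would probably also weigh word frequency.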
Backup a SourceForge hosted SVN repository – sf-svn-backup
SourceForge urges its users to back up the code repositories of their projects. As I have several projects hosted with SourceForge, I should do it too. Making the backups isn’t complicated at all, but because it wasn’t properly automated, I’ve been lazy about it.
sf-svn-backup was written to automate the process. The script is simple to use: pass the project name as the first argument and the script will write the dump file to stdout.
For example:
sf-svn-backup openyahtzee > openyahtzee.dump
The project name should be its UNIX name (e.g. openyahtzee and not Open Yahtzee). Because the script writes the dump file directly to stdout, it’s easy to pipe the output through a compression program such as gzip to create compressed SVN dump files.
Question Marks Instead of Non-ASCII Chars when using Gettext in PHP
Yesterday I ported a PHP website to use Gettext for localization (l10n). After reading through the Gettext documentation and the documentation on the PHP site, I managed to get (almost) everything working. I had one problem: all the non-ASCII characters (accented Latin characters, Japanese and Chinese) were displayed as question marks (?) instead of in their correct form. This happened even though I was using UTF-8 encoded files.
While some people (e.g. this one) suggested that it’s not possible to use non-ASCII characters with UTF-8 encoded message files, there is a solution, and it’s quite a simple one. All you have to do is call bind_textdomain_codeset and pass it UTF-8 as the charset.
InfiniteTTT 0.6 Released
InfiniteTTT 0.6 was released today. The main change in the new version is that the game is now multi-threaded.
InfiniteTTT is a variation of Tic-Tac-Toe which is played on an infinite board.
The new version has a new multi-threaded AI engine, and several minor fixes and improvements. The changes improve the user experience and make the game more responsive. The new release contains binaries for Windows, a source package and a Gentoo ebuild. Packages for other Linux distributions will follow soon (help will be appreciated).
To download the new version visit InfiniteTTT’s download page.