WP-Syntax was a great syntax-highlighting plugin for WordPress. However, development has ceased and it has not been updated for a very long time. While it isn’t broken per se, it doesn’t work with Jetpack’s markdown support, so I stopped using it on the site and switched to a different plugin. With the introduction of the Gutenberg editor, I started looking again for a plugin that would let me easily highlight fenced code blocks (this feature worked in the old editor with SyntaxHighlighter Evolved but isn’t supported in Gutenberg). Realizing that I didn’t want three syntax-highlighting plugins enabled simultaneously, and not wanting to keep an abandoned plugin enabled, I decided to migrate all the posts from WP-Syntax to a new solution.
The new solution I chose was PrismJS. I decided to use it directly (without a plugin), as by default it highlights all <pre><code class="language-..."> constructs (which is also what markdown produces), and I didn’t want to rely (yet again) on plugin-specific shortcodes, which would require another migration when the plugin eventually stops working.
WP-Syntax used the <pre lang="...">code goes here</pre> construct. Furthermore, it took care of HTML-escaping everything inside the <pre> tag. So the migration is to rewrite the <pre> tags as <pre><code> constructs, HTML-escape the code inside the <pre> tag, and finally remove any leading newline. I wrote the script to work on dumped SQL tables, as that seemed easiest. The flow is:
$ mysqldump --add-drop-table -u USER -p blog wp_comments > wp_posts.sql
$ python3 < wp_posts.sql > wp_posts_updated.sql
$ mysql --user=USER --password blog < /tmp/wp_posts_updated.sql
#!/usr/bin/python3
import re
import html
import sys


def convert(fin, fout):
    for line in fin:
        # Each post is in a single line
        # <pre><code> doesn't ignore the first newline like <pre>
        replaced = re.sub(r'<pre lang=\\"(.*?)\\">(?:\\r)?(?:\\n)?(.*?)</pre>', replace_and_escape, line)
        # The line already ends with a newline, so write it to fout as-is
        fout.write(replaced)
        if line != replaced:
            # Log every changed line to stderr for manual review
            print(line, replaced, sep="\n============>\n", file=sys.stderr)


def replace_and_escape(matchobj):
    language = matchobj.group(1)
    # We don't escape quotes because it's unnecessary and it would mess up the
    # SQL escaping
    content = html.escape(matchobj.group(2), quote=False)
    return r'<pre><code class=\"language-{}\">{}</code></pre>'.format(language, content)


if __name__ == '__main__':
    convert(sys.stdin, sys.stdout)
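To see what the substitution does, here is a minimal sketch that runs the same pattern over a made-up fragment of a dump line. The sample string and the `convert_line` helper are assumptions for illustration only. Note that inside the dump, quotes appear as `\"` and newlines as a literal `\n`, which is why the pattern matches backslashes explicitly.

```python
import html
import re

# Same pattern as in the script: quotes and newlines are SQL-escaped in the dump.
PRE_RE = re.compile(r'<pre lang=\\"(.*?)\\">(?:\\r)?(?:\\n)?(.*?)</pre>')


def replace_and_escape(matchobj):
    language = matchobj.group(1)
    # Escape only &, < and >; quotes stay untouched so SQL escaping survives.
    content = html.escape(matchobj.group(2), quote=False)
    return r'<pre><code class=\"language-{}\">{}</code></pre>'.format(language, content)


def convert_line(line):
    # Hypothetical helper: applies the substitution to a single dump line.
    return PRE_RE.sub(replace_and_escape, line)


# Made-up sample; a raw string keeps the backslashes as they appear in the dump.
sample = r'<pre lang=\"c\">\nif (a < b && c) return;</pre>'
converted = convert_line(sample)
print(converted)
```

The leading escaped newline is dropped, and the `<` and `&` characters are turned into entities, while the `\"` sequences stay as they were.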
Nice approach thanks. Saved me a lot of time.
Two possible improvements:
1. Your mysqldump command dumps wp_comments. I think you mean wp_posts instead.
2. WP-Syntax also offered a “none” language. Those are <pre lang="none"> blocks without syntax highlighting. So the script could have a fallback for those.
def replace_and_escape(matchobj):
    language = matchobj.group(1)
    if language == 'none':
        language = 'bash'
    …
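One possible way to spell out this fallback is sketched below. It assumes the rest of the function stays as in the post; the `LANGUAGE_FALLBACKS` table and the sample string are hypothetical names introduced here for illustration. Mapping "none" to "bash" follows the suggestion above; any other target class could be put in the table instead.

```python
import html
import re

# Hypothetical lookup table: WP-Syntax language name -> class suffix to emit.
LANGUAGE_FALLBACKS = {'none': 'bash'}

PRE_RE = re.compile(r'<pre lang=\\"(.*?)\\">(?:\\r)?(?:\\n)?(.*?)</pre>')


def replace_and_escape(matchobj):
    language = matchobj.group(1)
    # Use the fallback if one is defined, otherwise keep the original name.
    language = LANGUAGE_FALLBACKS.get(language, language)
    content = html.escape(matchobj.group(2), quote=False)
    return r'<pre><code class=\"language-{}\">{}</code></pre>'.format(language, content)


# Made-up sample in the SQL-escaped form found in the dump.
sample = r'<pre lang=\"none\">ls -l > out.txt</pre>'
result = PRE_RE.sub(replace_and_escape, sample)
print(result)
```

Keeping the mapping in a dictionary means further renames can be added without touching the function body.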