It is almost done: the migration of my blog from WordPress to Octopress!
Recently, my old WordPress-based blog was misused several times for attacks on my server. That was when I decided to get rid of WordPress and find another solution. As I had already read about Octopress, I gave it a try. In short: I liked it and started to migrate my old blog to the new system.
There is a great tool, exitwp, that converts a full WordPress XML dump to Octopress posts. It also takes care of downloading the images, although that part does not work too well, especially if you use thumbnails. In addition, the image URLs are not adapted to their new locations.
As I had roughly 100 posts, many of them with images, I quickly lost the motivation to adapt the image links manually. So I wrote a small Perl script that takes care of that.
```perl correctImageLinks.pl
#!/bin/perl -w
use strict;
use File::Basename;
use Image::Size;
# set to 1 for some processing information
my $verbose = 0;
# base path of octopress source directory
my $basePath = "/path/to/octopress/source/"; #ADAPT ME
if (not (-e $basePath."_posts/new")) { mkdir($basePath."_posts/new"); }
# get dir listing
my @files = glob($basePath."_posts/*.markdown"); #ADAPT ME
# some variables
my ($imageComment, $imageURL, $imageName, $linkURL);
my ($sizeX, $sizeY);
# for each file in dir
foreach my $fileName (@files) {
# write the converted copy to the "new" subdirectory and leave the original untouched
my $inFileName = $fileName;
my $outFileName = dirname($fileName)."/new/".basename($fileName);
# Open input file in read mode
open INPUTFILE, "<", $inFileName or die $!;
# Open output file in write mode
open OUTPUTFILE, ">", $outFileName or die $!;
# get article name from file name
# by removing path and .markdown
my $articleName = basename($fileName);
$articleName =~ s/\.markdown$//;
# Read the input file line by line :
while (my $input_line = <INPUTFILE>) {
# search for linked images
# [![ ... ](...)](...)
# and extract image file name and comment
# example:
# [![comment](image URL)](link URL)
while ( $input_line =~ /\[!\[([^\]\)]*)\]\((http:\/\/[^\]\)]*wp-content[^\]\)]*)\)\]\(([^\]\)]*)\)/g ) {
$imageComment = $1;
$imageURL = $2;
$imageName = basename($2);
$linkURL = $3;
if ($verbose > 0) {
print ": $imageURL\n:: $linkURL\n\n";
}
# create new file path
# and gather image dimensions
my $newFilePath = "/images/posts/$articleName/$imageName";
if (-e $basePath.$newFilePath) {
my ($sizeX, $sizeY) = imgsize( $basePath.$newFilePath );
# create new {.% img %.} tag
# {.% img PATH X Y COMMENT %.}
# choose whether to keep or remove the link
# my $newImageTag = "[{"."% img $newFilePath $sizeX $sizeY $imageComment %"."}]($linkURL)";
my $newImageTag = "{"."% img $newFilePath $sizeX $sizeY $imageComment %"."}";
if ($verbose > 0) {
print $input_line;
}
$input_line =~ s/\[!\[[^\]\)]*\]\([^\]\)]*wp-content[^\]\)]*\)\]\([^\]\)]*\)/$newImageTag/;
if ($verbose > 0) {
print "--> $input_line\n\n";
}
} else {
print "WARNING: Image file '$imageName' does not exist!\n";
}
} # while linked images
while ($input_line =~ /!\[([^\]\)]*)\]\((http:\/\/[^\]\)]*wp-content[^\]\)]*)\)/g) {
# unlinked image: ![comment](image URL)
$imageComment = $1;
$imageURL = $2;
$imageName = basename($2);
if ($verbose > 0) {
print ": $imageURL\n\n";
}
# create new file path
# and gather image dimensions
my $newFilePath = "/images/posts/$articleName/$imageName";
if (-e $basePath.$newFilePath) {
my ($sizeX, $sizeY) = imgsize( $basePath.$newFilePath );
# create new {.% img %.} tag
# {.% img PATH X Y COMMENT %.}
my $newImageTag = "{"."% img $newFilePath $sizeX $sizeY $imageComment %"."}";
if ($verbose > 0) {
print $input_line;
}
$input_line =~ s/!\[[^\]\)]*\][^\]\)]*\([^\]\)]*wp-content[^\]\)]*\)/$newImageTag/;
if ($verbose > 0) {
print "--> $input_line\n\n";
}
} else {
print "WARNING: Image file '$imageName' does not exist!\n";
}
} # while unlinked images
print OUTPUTFILE $input_line;
} # while lines
close(INPUTFILE);
close(OUTPUTFILE);
} # foreach file
```
This script modifies the output of exitwp so that the image paths match the new scheme. It also changes the Markdown image syntax from
`![alt-text](image-url)`
to
{ % img image-url width height alt-text % }
which creates nicer images.
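
To make the rewrite easier to try out, here is a minimal sketch that applies it to a single line without touching any files. The helper `rewrite_image`, the sample URL and the article name are made up for illustration, and the width and height are passed in by hand, whereas the script above reads them from the image file with `imgsize()`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper, not part of the script above: rewrite the first
# Markdown image on a line into an Octopress img tag.
# Width and height are plain arguments here (assumption for the demo).
sub rewrite_image {
    my ($line, $articleName, $width, $height) = @_;

    # capture alt text and the old wp-content URL
    if ($line =~ m{!\[([^\]]*)\]\((http://[^)]*wp-content[^)]*)\)}) {
        my ($alt, $url) = ($1, $2);
        my $name = basename($url);

        # the tag is built from pieces so that it is not rendered
        # when this listing itself is published
        my $tag = "{" . "% img /images/posts/$articleName/$name $width $height $alt %" . "}";

        # swap the Markdown image for the new tag
        $line =~ s{!\[[^\]]*\]\(http://[^)]*wp-content[^)]*\)}{$tag};
    }
    return $line;
}

# sample input (made-up URL and article name)
my $sample = "![My cat](http://example.com/wp-content/uploads/2012/05/cat.jpg)";
print rewrite_image($sample, "2012-05-01-my-cat", 640, 480), "\n";
```

For the sample line this prints { % img /images/posts/2012-05-01-my-cat/cat.jpg 640 480 My cat % }, without the spaces inside the braces. Lines without a wp-content image are returned unchanged.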