WordPress to Octopress migration

It is almost done: the migration of my blog from WordPress to Octopress!

Recently, my old WordPress-based blog was exploited several times to attack my server. That was when I decided to get rid of WordPress and look for another solution. As I had already read about Octopress, I gave it a try. To make it short: I liked it and started migrating my old blog to the new system.

The great tool exitwp converts a full WordPress XML dump into Octopress posts and also takes care of downloading the images. That part does not work too well, though, especially if you use thumbnails, and the image URLs are not adapted to their new locations.

As I had about 100 posts, many of them with images, I quickly lost the motivation to adapt the image links by hand. So I wrote a small Perl script that takes care of it.

```perl correctImageLinks.pl

#!/bin/perl -w

use strict;

use File::Basename;   # dirname(), basename()
use Image::Size;      # imgsize()

# set to 1 for some processing information

my $verbose = 0;

# base path of octopress source directory

my $basePath = "/path/to/octopress/source/"; #ADAPT ME

if (not (-e $basePath."_posts/new")) { mkdir($basePath."_posts/new"); }

# get dir listing (assumes the exitwp output lives in _posts/*.markdown)

my @files = glob($basePath."_posts/*.markdown"); #ADAPT ME

# some variables

my ($imageComment, $imageURL, $imageName, $linkURL);
my ($sizeX, $sizeY);

# for each file in dir

foreach my $fileName (@files) {

# leave the original file untouched and write the converted copy to new/
my $inFileName = $fileName;
my $outFileName =  dirname($fileName)."/new/".basename($fileName);

# Open input file in read mode
open INPUTFILE, "<", $inFileName or die $!;
# Open output file in write mode
open OUTPUTFILE, ">", $outFileName or die $!;

# get article name from file name
#   by removing path and .markdown
my $articleName = basename($fileName);
$articleName =~ s/\.markdown//g;

# Read the input file line by line :
while (my $input_line = <INPUTFILE>) {

    # search for linked images
    #     [! ... ](...)
    #   and extract image file name and comment
    # example:
    # [![Alt text](image URL)](link URL)
    while ( $input_line =~ /\[!\[([^\]\)]*)\]\((http:\/\/[^\]\)]*wp-content[^\]\)]*)\)\]\(([^\]\)]*)\)/g ) {

        $imageComment = $1;
        $imageURL = $2;
        $imageName = basename($2);
        $linkURL = $3;

        if ($verbose > 0) {
            print ": $imageURL\n:: $linkURL\n\n";
        }

        # create new file path
        #   and gather image dimensions
        my $newFilePath = "/images/posts/$articleName/$imageName";

        if (-e $basePath.$newFilePath) {
            my ($sizeX, $sizeY) = imgsize( $basePath.$newFilePath );

            # create new {.% img %.} tag
            #   {.% img PATH X Y COMMENT %.}

            # choose whether to keep or remove the link
            # my $newImageTag = "[{"."% img $newFilePath $sizeX $sizeY $imageComment %"."}]($linkURL)";
            my $newImageTag = "{"."% img $newFilePath $sizeX $sizeY $imageComment %"."}";

            if ($verbose > 0) {
                print $input_line;
            }
            $input_line =~ s/\[!\[[^\]\)]*\]\([^\]\)]*wp-content[^\]\)]*\)\]\([^\]\)]*\)/$newImageTag/;
            if ($verbose > 0) {
                print "--> $input_line\n\n";
            }
        } else {
            print "WARNING: Image file '$imageName' does not exist!\n";
        }
    } # while linked images

    while ($input_line =~ /!\[([^\]\)]*)\]\((http:\/\/[^\]\)]*wp-content[^\]\)]*)\)/g) {
        # ![Alt text](/path/to/img.jpg)

        $imageComment = $1;
        $imageURL = $2;
        $imageName = basename($2);

        if ($verbose > 0) {
            print ": $imageURL\n\n";
        }

        # create new file path
        #   and gather image dimensions
        my $newFilePath = "/images/posts/$articleName/$imageName";

        if (-e $basePath.$newFilePath) {
            my ($sizeX, $sizeY) = imgsize( $basePath.$newFilePath );

            # create new {.% img %.} tag
            #   {.% img PATH X Y COMMENT %.}
            my $newImageTag = "{"."% img $newFilePath $sizeX $sizeY $imageComment %"."}";
            if ($verbose > 0) {
                print $input_line;
            }
            $input_line =~ s/!\[[^\]\)]*\]\([^\]\)]*wp-content[^\]\)]*\)/$newImageTag/;
            if ($verbose > 0) {
                print "--> $input_line\n\n";
            }
        } else {
            print "WARNING: Image file '$imageName' does not exist!\n";
        }
    } # while unlinked images

    print OUTPUTFILE $input_line;
} # while lines

close(INPUTFILE);
close(OUTPUTFILE);

} # foreach file
```

This script modifies the output of exitwp so that the image paths match the new scheme. It also changes the Markdown image syntax from `![alt text](image-url)` to `{ % img image-url width height alt-text % }`, which creates nicer images.
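To see the rewrite in isolation, here is a minimal sketch of the unlinked-image case applied to a single line. The helper names `img_tag` and `convert_image_markdown`, the sample URL, the article name and the fixed dimensions 640 and 480 are all made up for illustration. The real script reads the dimensions from the downloaded file with `imgsize` and skips images that do not exist on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: builds the Octopress img tag for one image.
# The opening and closing braces are concatenated at run time so that
# Liquid does not interpret the tag when this listing is rendered.
sub img_tag {
    my ($articleName, $alt, $url, $width, $height) = @_;
    my $newFilePath = "/images/posts/$articleName/" . basename($url);
    return "{" . "% img $newFilePath $width $height $alt %" . "}";
}

# Hypothetical helper: replaces every ![alt](http://.../wp-content/...)
# in $line. Width and height are passed in instead of being measured,
# which keeps the sketch independent of Image::Size.
sub convert_image_markdown {
    my ($line, $articleName, $width, $height) = @_;
    $line =~ s{!\[([^\]\)]*)\]\((http://[^\]\)]*wp-content[^\]\)]*)\)}
              {img_tag($articleName, $1, $2, $width, $height)}ge;
    return $line;
}

# sample input; URL and article name are invented
my $in = 'Look: ![My cat](http://example.com/wp-content/uploads/2012/01/cat.jpg) Nice.';
print convert_image_markdown($in, '2012-01-15-my-cat', 640, 480), "\n";
```

Running it prints the sample line with the Markdown image replaced by an img tag that points to `/images/posts/2012-01-15-my-cat/cat.jpg`. Lines without a `wp-content` image URL pass through unchanged.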