Using FireFox (3.5) to convert html to pdf

So I’ve been struggling to convert html to pdf. There are many attempts at this in the form of perl scripts and php dom parsers making use of pdflib or fpdf, and some of them even work … mostly. But without exception, every solution I’ve come across in the last month fails on gb2312 (a Chinese character set). I actually wanted to hit my head when I realized the solution hadn’t made itself apparent earlier … seeing that I’ve been opening the sample .html files with firefox for most of that month to see what they should look like. It’s been staring me in the face and I didn’t realize it.

Of course, life would never be that simple. In my particular situation I need something that can work from the command line. And since this is running on a headless server configuring X.org is pretty pointless (and wasteful of resources).

Anyway, step one is to enable firefox to print from the command line (we’ll hack around X issues in a while).

So after about 5 seconds of googling I find this link: http://torisugari.googlepages.com/commandlineprint2 – it’s a FF plugin that enables printing from the command line. After tweaking a bit I get a command that works:

$ firefox -print __main_body__.html -printmode pdf -printfile /home/jkroon/tmp/body/_main_body__.pdf

Note the full path for the pdf file … this is due to the way the filename handling is being done. Without the full path, it doesn’t work. The link above contains an explanation I don’t pretend to fully understand. The best approach would probably be to use something like "$(readlink -f "${output_path}")/${outpdfname}" to construct that parameter. The above got all my test cases correct for both latin-1 (iso-8859-1) and gb2312, which every other html to pdf conversion path I tried choked on. So that’s major win number one.
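A minimal sketch of the readlink idea, wrapped in two helper functions. The function names (abs_pdf_path, html_to_pdf) are made up for illustration, readlink -f is assumed to be the GNU coreutils version, and the -print/-printmode/-printfile flags are the ones provided by the command line print plugin above, not by stock firefox.

```shell
#!/bin/sh
# abs_pdf_path DIR NAME
# Print an absolute path for NAME inside DIR. DIR may be relative and
# must already exist; readlink -f resolves it to a canonical path.
abs_pdf_path() {
    printf '%s/%s\n' "$(readlink -f "$1")" "$2"
}

# html_to_pdf INPUT.html OUTDIR NAME.pdf  (hypothetical wrapper)
# Passes the plugin the full output path it needs.
html_to_pdf() {
    firefox -print "$1" -printmode pdf \
        -printfile "$(abs_pdf_path "$2" "$3")"
}
```

With this in place, something like html_to_pdf __main_body__.html tmp/body out.pdf should hand firefox an absolute -printfile regardless of the current directory.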

The next step was to get rid of the unwanted headers and footers. This can be achieved by editing the prefs.js file directly on a headless box after running FF once to create the prefs folders, or by opening about:config in firefox and looking for these options:

user_pref("print.print_footerleft", "");
user_pref("print.print_footerright", "");
user_pref("print.print_headerleft", "");
user_pref("print.print_headerright", "");

Note that per-printer settings may override this, so I just nuked the lot (most of the printers listed here no longer exist for me anyway), and after this the command above rendered properly without any funky headers or footers. Gotta love vim.

Additionally, as per stealthyninja, we can solve the crashing restart by setting a firefox option, so whilst you’re editing prefs.js, add the following line too:

user_pref("browser.sessionstore.resume_from_crash", false);

This tells firefox to not resume from a crash and always restart a new session.
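For a headless box the prefs.js edits can be scripted instead of done in vim. What follows is a rough sketch, not a tested tool: the function names are invented here, and the assumption that per-printer overrides are lines starting with user_pref("print.printer_ is mine, so check your own prefs.js first. Run it only while firefox is not running, since firefox rewrites prefs.js on exit.

```shell
#!/bin/sh
# set_pref FILE NAME VALUE
# Drop any existing user_pref line for NAME, then append the new one.
# VALUE is written verbatim, so strings need their own quotes.
set_pref() {
    file=$1 name=$2 value=$3
    tmp=$(mktemp) || return 1
    grep -vF "user_pref(\"$name\"," "$file" > "$tmp" || true
    printf 'user_pref("%s", %s);\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$file"    # overwrite in place, keeping file permissions
    rm -f "$tmp"
}

# drop_printer_prefs FILE
# Remove per-printer settings (assumed prefix: print.printer_).
drop_printer_prefs() {
    tmp=$(mktemp) || return 1
    grep -v '^user_pref("print\.printer_' "$1" > "$tmp" || true
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# configure_print_prefs FILE
# Blank the four header/footer prefs and disable crash resume.
configure_print_prefs() {
    for side in headerleft headerright footerleft footerright; do
        set_pref "$1" "print.print_$side" '""'
    done
    set_pref "$1" browser.sessionstore.resume_from_crash false
}
```

Calling drop_printer_prefs and then configure_print_prefs on the profile’s prefs.js should leave exactly one line per option, so running it twice does no harm.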

On to solving the X issue. This is a sticky one and I don’t have a complete fix for it. It’s a problem, no two ways about it. The inefficient way is to hope that no other instance of the script is already running and to do something like:

$ Xvfb :1 &
$ DISPLAY=:1 firefox ...
$ kill %1

But this is going to cause Xvfb to be started on every run, and I think we can do better than that. In particular, the following snippet is currently undergoing testing:

$ DISPLAY=:1 xrandr 2>&1 | grep -q "^Can't open display :1\$" && (echo "Starting Xvfb"; Xvfb :1 -screen 0 800x600x24 &) || echo "Xvfb already running"

Note the extra -screen 0 800x600x24 option. This is to work around a segfault in Xvfb which occurs in 8-bit mode (the default).

The snippet does contain a race if two instances of the script happen to execute that exact line at the same time. Murphy has shown this to be more likely than one would expect. In this case I’m not overly worried, since the second Xvfb simply refuses to start, as this illustrates (with another instance of Xvfb already running):

$ Xvfb :1
Fatal server error:
Server is already active for display 1
If this server is no longer running, remove /tmp/.X1-lock
and start again.
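If the race ever does matter, one way to close it is to serialise the check-and-start with flock(1) from util-linux. This is a sketch under assumptions, not something from the setup described here: the function names and the ensure-xvfb lock file name are invented, and the check relies on the /tmp/.X<n>-lock file mentioned in the error message containing the server’s PID.

```shell
#!/bin/sh
# Directory where X servers keep their .X<n>-lock files.
X_LOCK_DIR=${X_LOCK_DIR:-/tmp}

# display_in_use N
# True if .X<N>-lock exists and the PID stored in it is still alive.
display_in_use() {
    lock="$X_LOCK_DIR/.X$1-lock"
    [ -f "$lock" ] || return 1
    pid=$(tr -cd '0-9' < "$lock")
    [ -n "$pid" ] || return 1
    # kill -0 fails for another user's process, so fall back to /proc.
    kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]
}

# ensure_xvfb N
# Start Xvfb on display :N unless one is already there. Only one
# caller at a time gets past flock, so two scripts cannot both start it.
ensure_xvfb() {
    num=$1
    (
        flock 9 || exit 1
        if display_in_use "$num"; then
            echo "Xvfb already running"
        else
            echo "Starting Xvfb"
            # 9>&- stops the server from inheriting (and holding) the lock.
            Xvfb ":$num" -screen 0 800x600x24 >/dev/null 2>&1 9>&- &
            # Give the server a few seconds to write its lock file.
            i=0
            while [ "$i" -lt 5 ] && ! display_in_use "$num"; do
                sleep 1
                i=$((i + 1))
            done
        fi
    ) 9>"$X_LOCK_DIR/ensure-xvfb-$num.lock"
}
```

A script would then call ensure_xvfb 1 before running DISPLAY=:1 firefox. A stale lock file left by a crashed server is treated as "not running", but Xvfb itself may still refuse to start until that file is removed.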

The above works pretty well if you have a server with a head you can use to first configure Xorg and, obviously, firefox. It’s a (relatively) simple matter of installing the plugin for the user that’s going to be running the firefox command using your normal head, and then ripping the prefs.js file apart, and off you go, you can use the above from your scripts. Doing this on a headless server presented some new challenges.

First a little info regarding my setup. I’m running a gentoo server, headless, about 12000km from where I am. Firstly, latency is a bitch (not to mention it looks like I’ve got a throttled down link – feels like dial-up at times), and secondly, running Xvfb didn’t go so smoothly (even as root):

$ Xvfb :1
Could not init font path element /usr/share/fonts/TTF/, removing from list!
Could not init font path element /usr/share/fonts/OTF, removing from list!
Could not init font path element /usr/share/fonts/Type1/, removing from list!

Fatal server error:
could not open default font ‘fixed’

A bit of googling found a few possible fixes, mostly suggesting installing font-misc, which is font-misc-misc on gentoo (No, I’m not the right person to answer that particular question). In my case I should be installing as many fonts as possible anyway, so I went looking in /usr/portage/media-fonts and installed a rather insane number of fonts from there. You also need font-cursor-misc.

Once you have Xvfb up and running, you need some way of configuring firefox, so start Xvfb with something like this (as the user intended to run the conversion process):

$ Xvfb :1 -screen 0 800x600x16 &

Now, attach a vnc (or other remote service) to it:

$ x11vnc -display :1 -bg -nopw -listen localhost

This causes vnc to listen on localhost:5900, and not require a password to connect. This is dangerous under normal circumstances, but seeing that I’m the only user on this system, and it is only for the purposes of configuring FF I’m willing to take a temporary risk.

Now I ssh into this system with port forwarding:

$ ssh jkroon@remote_headless_server -L 5900:localhost:5900

And, as the same user that Xvfb is running as:

$ DISPLAY=:1 firefox

Give it a few seconds to properly start up, and then on your local machine from where you’ve ssh’ed (you can use other encodings, just avoid “Raw” – it’s ultra slow):

$ vncviewer localhost -encodings zlib

This can be slow depending on connectivity and encoding, so you really want to restrict your activity here. After waiting for the window to come up, click on the “Install” button, which will complete installation of the plug-in. Do NOT killall firefox; make sure to terminate it cleanly. If you can’t exit cleanly (due to lack of window decorations, for example) and have to kill it, be sure to start it again and then exit cleanly. If you’re having issues, install a light-weight WM like icewm (chosen because it has very few dependencies, unlike pretty much everything else; evilwm would work too but is almost too reliant on keyboard shortcuts, which often require things like alt, which is what makes vnc hard to use) and start it against the display:

$ DISPLAY=:1 icewm

Alternatively, Ctrl+Q is the key to make firefox quit.

Once this is done, just edit prefs.js as per above and you should be good to go.

Some things that can possibly be done to make the whole process simpler:

  • figure out if firefox has been configured (check for existence of ~/.mozilla/firefox/??/prefs.js), and if not, automatically configure it instead of going through the process above.
  • figure out how to install the cmdlnprint_0_5_1.xpi file from the commandline (would make the whole vnc experience pointless).
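The first item could start from a check like the one below. It is only a sketch: find_prefs_js is a made-up name, and it assumes profiles live one directory below ~/.mozilla/firefox, as in the path above.

```shell
#!/bin/sh
# find_prefs_js [ROOT]
# Print the first PROFILE/prefs.js found under ROOT (defaults to
# ~/.mozilla/firefox). Returns non-zero if no profile has one yet.
find_prefs_js() {
    root=${1:-$HOME/.mozilla/firefox}
    for f in "$root"/*/prefs.js; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}
```

A conversion script can then run find_prefs_js and, when it fails, fall back to whatever first-time configuration step is in use.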

7 Responses to “Using FireFox (3.5) to convert html to pdf”

  1. stealthyninja says:

    Get firefox to assume that you always want to start a new session. As it stands there is a small risk that if firefox decides to (for whatever reason) crash, that recovering the system would require the need to go via the whole vnc experience … again

    In about:config, set browser.sessionstore.resume_from_crash to false.

  2. stealthyninja says:

    figure out how to install the cmdlnprint_0_5_1.xpi file from the commandline (would make the whole vnc experience pointless.

    firefox -install-global-extension /path/to/extension (reference: Install extension from command line)

  3. stealthyninja says:

    Hmm; the next time you need to VNC in, you could try creating (if it doesn’t exist yet) a new boolean called browser.sessionstore.enabled and set it to false. Theoretically Firefox should then stop storing session info (i.e. there would be nothing to ask the user to restore should an application crash occur).

  4. Jaco Kroon says:

    Where do you get all these options or are you actually sifting through all of them in about:config?

  5. stealthyninja says:

    The first two I knew about, the other one I found using about:config’s handy auto-complete and then I checked on its functionality by searching for it on Mozilla’s wiki (also just to confirm it’s available and not reported as buggy on *nix machines).