Generating Cyrillic PDF Documents Using htmldoc

To make the htmldoc PDF generator support Cyrillic fonts, perform the following three steps:

  1. Put the Cyrillic fonts into the htmldoc fonts directory.
  2. Convert the file into an encoding suitable for htmldoc, and then run htmldoc. An example script that converts UTF-8 into the CP1251 encoding is provided below.
  3. Specify the command line template in Provider Control Center > Configuration Director > Miscellaneous Settings > PDF Generator Setup.

Replacing the HTML Cyrillic Fonts

A complete set of Cyrillic fonts that can be used with htmldoc is available on the Internet, for example:

http://fonts.kolodka.com/htmldoc.cyr.fonts-0.1.tar.bz2

The archive is based on GPL Cyrillic fonts. Its developer only performed the pfb2pfa conversion, renamed the fonts according to htmldoc requirements, and changed the FontName, FullName, and FamilyName attributes.

Note that these fonts are rather large, about 250 KB each. The way htmldoc currently embeds fonts into a PDF is far from optimal, so expect the resulting PDF file to be at least 1 MB. If that is too much, you can significantly reduce the PDF size by using Ghostscript together with htmldoc. You will find some tips in the Readme file inside the package.
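As a rough illustration of the Ghostscript approach, the sketch below passes a finished PDF through Ghostscript's pdfwrite device, which typically re-embeds only the glyphs that are actually used. The function name shrink_pdf and the file names are hypothetical, and this is not necessarily the method the Readme recommends; check the Readme first.

```shell
#!/bin/sh
# Hypothetical helper: shrink_pdf INPUT.pdf OUTPUT.pdf
# Rewrites a PDF through Ghostscript's pdfwrite device.
# Assumes the gs binary is installed and on PATH.
shrink_pdf() {
    gs -q -dNOPAUSE -dBATCH -dSAFER \
       -sDEVICE=pdfwrite \
       -sOutputFile="$2" \
       "$1"
}

# Example (file names are placeholders):
#   shrink_pdf invoice.pdf invoice-small.pdf
```

How much space is saved depends on the document and the Ghostscript version, so compare the file sizes before adopting this step.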

To install the fonts, decompress and unpack the archive (bunzip2, then tar). The files are extracted into the fonts/ directory. Then overwrite the original htmldoc fonts with the extracted ones. By default, the htmldoc fonts are located in the /usr/share/htmldoc/fonts/ directory.
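The installation can be sketched as a small shell function. It assumes the archive has already been downloaded, that it unpacks into a fonts/ directory as described above, and that the font directory is given without a trailing slash. The function name install_cyrillic_fonts is hypothetical, and the backup of the original fonts is an extra precaution that is not part of the procedure itself.

```shell
#!/bin/sh
# Hypothetical helper: install_cyrillic_fonts ARCHIVE [FONT_DIR]
# FONT_DIR defaults to the htmldoc font directory named in this page.
install_cyrillic_fonts() {
    archive=$1
    font_dir=${2:-/usr/share/htmldoc/fonts}

    # Unpack into a scratch directory; the archive extracts into fonts/.
    work_dir=$(mktemp -d) || return 1
    tar -xjf "$archive" -C "$work_dir" || return 1

    # Keep a copy of the original fonts (added precaution, done only once).
    if [ ! -e "$font_dir.orig" ]; then
        cp -Rp "$font_dir" "$font_dir.orig" || return 1
    fi

    # Overwrite the original htmldoc fonts with the extracted ones.
    cp -f "$work_dir"/fonts/* "$font_dir"/ || return 1
    rm -rf "$work_dir"
}

# Example (usually needs root privileges for the default directory):
#   install_cyrillic_fonts htmldoc.cyr.fonts-0.1.tar.bz2
```

If something goes wrong, the original fonts can be restored from the .orig copy next to the font directory.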

Example of Decoding Script

Parallels Business Automation - Standard provides the HTML content encoded in UTF-8, with every character whose code is greater than 127 replaced by an &#number; HTML entity, where number is the character's code, for example, &#241; for ñ.
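To see what this conversion amounts to, the sketch below expands the numeric entities and re-encodes the result as CP1251 on the command line. It assumes perl with the core Encode module is available; the function name entities_to_cp1251 is hypothetical, and it uses Encode's decode/encode calls, which may differ from how any particular script performs the same step.

```shell
#!/bin/sh
# Hypothetical helper: reads UTF-8 HTML on stdin, writes CP1251 on stdout.
entities_to_cp1251() {
    perl -MEncode -pe '
        $_ = decode("UTF-8", $_);                        # bytes to characters
        s/&\#(\d+);/$1 > 127 ? chr($1) : "&#$1;"/ge;     # expand entities above 127 only
        $_ = encode("cp1251", $_);                       # characters to CP1251 bytes
    '
}

# The Russian word "Привет" written as numeric entities, dumped as hex bytes.
# Expected bytes: cf f0 e8 e2 e5 f2
printf '%s' '&#1055;&#1088;&#1080;&#1074;&#1077;&#1090;' | entities_to_cp1251 | od -An -tx1
```

Entities with codes of 127 or below, such as &#60;, are left untouched here so that escaped markup characters stay escaped. Characters that have no CP1251 equivalent are replaced with a substitution character by Encode's default behavior.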

Below is an example script (topdf.pl) that converts the Parallels Business Automation - Standard data into a format suitable for the htmldoc utility, then calls htmldoc to create a PDF with Cyrillic fonts. Put the script into a directory where it is accessible and executable by the Apache user.

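#!/usr/bin/perl
# Convert source HTML files to the CP1251 encoding and generate a PDF.
# Usage: perl topdf.pl result_filename [source html files]

use strict;
use warnings;
use Encode qw(decode encode);

my $target = shift
    or die "Usage: perl topdf.pl result_filename [source html files]\n";

# Each source file is converted in place.
foreach my $file (@ARGV) {
    # UTF-8 bytes to characters
    my $text = decode('UTF-8', load_file($file));
    # Expand &#number; entities above 127; lower codes stay escaped
    $text =~ s/&\#(\d+);/$1 > 127 ? chr($1) : "&#$1;"/ge;
    # Characters to CP1251 bytes
    save_file($file, encode('cp1251', $text));
}

# Call htmldoc. The list form of system() bypasses the shell, so file
# names with spaces or shell metacharacters are passed intact.
my $status = system('/usr/bin/htmldoc', '--webpage', '--embedfonts',
                    '--charset', 'cp-1251', '-f', $target, @ARGV);
warn "htmldoc exited with status $status\n" if $status != 0;

# Read a whole file as raw bytes.
sub load_file {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "Cannot read $file: $!";
    local $/ = undef;
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Write raw bytes to a file, replacing its contents.
sub save_file {
    my ($file, $text) = @_;
    open(my $fh, '>:raw', $file) or die "Cannot write $file: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $file: $!";
}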

Configuring PDF Generator in Parallels Business Automation - Standard

Last but not least, specify the path to the decoding script in the PDF Generator command line template.

Log in to the Provider Control Center, and go to Configuration Director > Miscellaneous Settings > PDF Generator Setup. Click the Edit button and enter the command template into the PDF Generator Template field. For example, if you have put the decoding script into the /var/opt/hspc-root/ directory, enter the following:

perl /var/opt/hspc-root/topdf.pl %target_file% %source_files%
