HTML_ToPDF Version 3.4
http://www.rustyparts.com/pdf.php

What is HTML_ToPDF?
-------------------

HTML_ToPDF is a PHP class that makes it easy to convert HTML documents to PDF
files on the fly. HTML_ToPDF grew out of the need to convert HTML files (which
are easy to create) to PDF files (which are not so easy to create) quickly and
easily. It has the following features:

* The ability to convert images in the webpage to images embedded in the PDF.
  The script tries to convert relative image paths into absolute ones as well.
* The ability to use the CSS in the HTML file in the creation of the PDF.
  This includes remote CSS files as well.
* The ability to convert remote files.
* The ability to convert links into embedded clickable links in the PDF file.
* The ability to scale the HTML page.
* Easy setting of any of these options through the methods of the class.
* Tries to fix quirks in HTML pages which break html2ps.
* Works on both Unix/Linux and Windows.

What is PDFEncryptor?
---------------------

PDFEncryptor is a helper class that comes with this package. It is a wrapper
for a couple of free Java libraries that allow you to digitally sign and
protect the PDFs you create by adding a password, permissions (such as
printing and copying rights), and meta-data such as the Keywords and Author.

Obtaining HTML_ToPDF
--------------------

Further information on HTML_ToPDF and the latest version can be obtained at
http://www.rustyparts.com/pdf.php

Installation
------------

There is no real installation other than requiring the file. See the examples
directory for help in how to use the class. However, you do need to make sure
that whatever directory the PDF files are going to be created in is writable
by the user the webserver runs as. Also note you need to install the programs
below.

Requirements
------------

* PHP: Version 4.0.4 or greater (http://www.php.net).
* html2ps: Does the initial conversion of html to a postscript file, and thus
  is the most crucial part of the conversion. More information about it is
  here:
      http://www.tdb.uu.se/~jan/html2ps.html
  You will especially want to read the user's guide here:
      http://www.tdb.uu.se/~jan/html2psug.html
  if this script is not making the pdf look like you want.
* ps2pdf: This comes with the Ghostscript package, and can be found here:
      http://www.cs.wisc.edu/~ghost/
  This package is normally installed as an RPM on RedHat systems, so if you're
  using that OS you shouldn't have to worry. There is a windows install for
  this as well.
* curl: Or some program that grabs documents off the web (lynx, w3m, etc.).
  HOWEVER, for whatever reason only curl has worked in my tests when the
  script page is running as a web script and not just a script from the
  command line. Curl can be found here:
      http://curl.haxx.se/
* Valid HTML: Good HTML (XHTML is best) definitely helps.
* PDFEncryptor needs java (http://java.sun.com) and the iText jar file
  (http://itext.sourceforge.net/downloads/) and the bcprov jar
  (http://mirrors.ibiblio.org/pub/mirrors/maven2/bouncycastle/bcprov-jdk14/138/bcprov-jdk14-138.jar)

Included Files
--------------

The following files are available in the HTML_ToPDF distribution:

    README     - This file
    CHANGES    - The list of changes
    examples/  - Several example files to aid you on your way
    docs/      - Class documentation (in HTML)
    lib/       - Where some of the encryption libraries live

Common Problems
---------------

* How do I make this stupid thing work under windows?!

  It took me many headaches to figure out, but here is what I came up with,
  YMMV:

    - Make sure you install the windows version of all the programs above
      (ghostscript, ImageMagick, curl, etc.)
    - html2ps requires a bit more work to get working than the other programs.
      Follow these steps to get it working:

      * Install ActivePerl (http://www.activestate.com/)
      * Download html2ps from http://user.it.uu.se/~jan/html2ps.html
      * Extract all the files to c:\html2ps
      * Create a file named html2psrc in c:\html2ps with this in it:

            @html2ps {
              package {
                ImageMagick: 1;
              }
              hyphenation {
                en {
                  file: "c:\html2ps\hyphen.tex";
                }
              }
            }

      * Open up the html2ps file in the c:\html2ps folder with a decent text
        editor (e.g. VIM, not notepad!) and make the following changes:

        Change line 22 (or thereabouts) to:

            $globrc='c:\html2ps\html2psrc';

        After this line (around 497):

            ($scr=$tmpname)=~/\w+$/;

        Add this:

            $scr='c:\html2ps\scratch';

        Change line 3582 (or thereabouts) from:

            $_[1]=`$geturl '$url'`;

        to:

            $_[1]=`$geturl $url`;

      * If it is having trouble converting PNGs, try adding this line to your
        conversion script:

            $pdf->addHtml2PsSettings('package { libwww-perl: 1; }');

    - If you use IIS, make sure you give the IUSR_xxxx user read, write (this
      is important), and exec permissions to the cmd.exe program in the
      system32 folder in your Windows system directory.
    - Put the following in your script (modify the paths to fit your
      installation, though) after instantiating the $pdf variable:

          $pdf->setHtml2Ps('perl c:\html2ps\html2ps');
          $pdf->setPs2Pdf('c:\Program Files\gs\gs8.52\bin\gswin32c.exe');
          $pdf->setTmpDir('\temp');
          $pdf->setGetUrl('c:\PHP\curl.exe -i -s');

    - You won't get any error output, so if it's not working, turn debugging
      on with $pdf->setDebug(true) to see each command the class runs, then
      run those commands yourself from the command prompt to find out the
      problem.

* No files are created.

    - Likely the problem is due to a lack of permissions. Either the tmp
      directory that HTML_ToPDF uses for intermediary files is not writable
      by the webserver or the output directory for the PDF file is not
      writable by the webserver.
      Possible fixes are:

          // Change the tmp directory
          $pdf->setTmpDir('/somewhere/writable');

          // Change the directory permissions
          bash# chmod 777 www/pdfs; chown apache www/pdfs

* The images don't show up.

    - Often the program getting the images (e.g. curl) is having a problem.
      These errors can usually be seen if the program is run from the
      command line.

* ps2pdf fails with error code 127

    - The problem is that the ps2pdf shell script does not have the path to
      the 'gs' executable. Look in the ps2pdf script and see where it points
      to (usually it executes ps2pdf12 which in turn executes ps2pdfwr). Once
      you're in the file that actually does something, you will see this line
      at the bottom of the file:

          exec gs $OPTIONS...

      Change that to:

          exec /path/to/gs $OPTIONS...

* Creating a page break isn't working

    - Make sure the page-break tag is not inside a table.

* ps2pdf returns "ERROR: /invalidfont in findfont"

    - See:
      http://groups.google.com/group/alt.php/browse_thread/thread/fe0d413d65aae946/27937584acedb288%2327937584acedb288

License
-------

Both HTML_ToPDF.php and PDFEncryptor are under version 3 of the PHP license.

Credits
-------

Thanks to Jack Utano (http://tnloghomes.com) for funding development of
ver. 3.0
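
Preflight Check (sketch)
------------------------

The "No files are created" and "error code 127" entries under Common Problems
both come down to something the webserver user cannot find or cannot write
to. The shell sketch below is one possible way to check for that before
debugging further. It is not part of the HTML_ToPDF distribution; the
function names check_tool and check_writable are made up for illustration.

```shell
#!/bin/sh
# Illustrative preflight helpers; NOT shipped with HTML_ToPDF.
# The function names are hypothetical and exist only for this sketch.

# check_tool NAME: succeed if NAME resolves to a command on the current PATH.
# A shell exits with status 127 when it cannot find the command it was
# asked to run, which is the failure mode described for ps2pdf and gs.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# check_writable DIR: succeed if DIR exists and the current user can write
# to it (applies to both the tmp directory and the PDF output directory).
check_writable() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "ok: $1"
    else
        echo "not writable: $1"
        return 1
    fi
}
```

After sourcing these functions, a run might look like "check_tool html2ps;
check_tool ps2pdf; check_tool gs; check_tool curl; check_writable www/pdfs".
The results are only meaningful when the checks run as the user the webserver
runs as, since that user's PATH and permissions can differ from those of your
login shell.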