This is the manual for pdf2htmlEX 0.12. You can also run `man pdf2htmlEX` on the command line to view the latest manual.

    If you find that this document is out of date, please open an issue to let me know.

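    pdf2htmlEX(1)              pdf2htmlEX General Manual              pdf2htmlEX(1)
    NAME
           pdf2htmlEX - converts PDF to HTML while preserving the original text and formatting.
    USAGE
           pdf2htmlEX [options] <input-filename> [<output-filename>]
    DESCRIPTION
           pdf2htmlEX is a utility that converts PDF files into HTML files.
           pdf2htmlEX tries to convert the PDF accurately, preserving styling and text, while also optimizing the output for display on the Web.
           Fonts extracted from the PDF are embedded into the HTML (Type 3 fonts are not yet supported). Text in the converted HTML file is usually selectable and copyable.
           Other objects are rendered as images and embedded into the HTML.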
    OPTIONS
       Pages
           -f, --first-page <num> (Default: 1)
                  Specify the first page to process.
           -l, --last-page <num> (Default: last page)
                  Specify the last page to process.
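       Dimensions
           --zoom <ratio>, --fit-width <width>, --fit-height <height>
                  --zoom specifies the zoom factor directly; --fit-width and --fit-height specify the maximum width and height of a page, in pixels.
                  If multiple values are specified, the minimum one is used.
                  If none is specified, pages are rendered at 72 DPI.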
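
The rule above for combining --zoom, --fit-width, and --fit-height can be sketched in Python. This is only an illustration, not pdf2htmlEX's actual code: the function name `effective_zoom` and its parameters are hypothetical, and the sketch assumes that page dimensions are given in PDF points (one point maps to one pixel at 72 DPI) and that the fit options are compared after being converted to zoom ratios.

```python
def effective_zoom(page_width_pt, page_height_pt,
                   zoom=None, fit_width=None, fit_height=None):
    """Return the zoom factor implied by the three options (hypothetical helper).

    Assumption: page sizes are in PDF points, so zoom 1.0 means 72 DPI.
    """
    candidates = []
    if zoom is not None:
        candidates.append(float(zoom))
    if fit_width is not None:
        # Zoom at which the page is exactly fit_width pixels wide.
        candidates.append(fit_width / page_width_pt)
    if fit_height is not None:
        # Zoom at which the page is exactly fit_height pixels tall.
        candidates.append(fit_height / page_height_pt)
    # Several options given: the minimum wins. None given: render at 72 DPI.
    return min(candidates) if candidates else 1.0


# US Letter page (612 x 792 points)
print(effective_zoom(612, 792))                             # 1.0
print(effective_zoom(612, 792, fit_width=1224))             # 2.0
print(effective_zoom(612, 792, zoom=1.5, fit_width=1224))   # 1.5
```
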
           --use-cropbox <0|1> (Default: 1)
                  Use CropBox instead of MediaBox for output.
           --dpi <dpi> (Default: 144)
                  Specify the horizontal and vertical DPI for images.
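       Output
           --embed <string>
           --embed-css <0|1> (Default: 1)
           --embed-font <0|1> (Default: 1)
           --embed-image <0|1> (Default: 1)
           --embed-javascript <0|1> (Default: 1)
           --embed-outline <0|1> (Default: 1)
                  Specify which elements are embedded into the output HTML file.
                  If a switch is turned off, the corresponding resource (CSS, fonts, images, JavaScript) is generated as a separate file.
                  --embed accepts a string as its argument. Each letter of the string must be one of `cCfFiIjJoO`, and each letter corresponds to one of the --embed-*** switches (for example, c or C stands for --embed-css). A lowercase letter sets the switch to 0 and an uppercase letter sets it to 1.
                  For example, `--embed cFIJo` embeds everything except the CSS file and the outline.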
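
To make the letter encoding of --embed concrete, here is a minimal Python sketch that expands such a string into the individual switches. The helper `parse_embed` and the table `EMBED_SWITCHES` are hypothetical names used for illustration; the sketch only follows the description above and is not taken from pdf2htmlEX itself.

```python
# Lowercase letter -> switch it controls, as listed in the manual above.
EMBED_SWITCHES = {
    "c": "--embed-css",
    "f": "--embed-font",
    "i": "--embed-image",
    "j": "--embed-javascript",
    "o": "--embed-outline",
}


def parse_embed(spec):
    """Expand an --embed string into {switch: 0 or 1} (hypothetical helper)."""
    result = {}
    for letter in spec:
        switch = EMBED_SWITCHES.get(letter.lower())
        if switch is None:
            raise ValueError(f"unknown --embed letter: {letter!r}")
        # Uppercase turns the switch on (1), lowercase turns it off (0).
        result[switch] = 1 if letter.isupper() else 0
    return result


print(parse_embed("cFIJo"))
# {'--embed-css': 0, '--embed-font': 1, '--embed-image': 1,
#  '--embed-javascript': 1, '--embed-outline': 0}
```
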
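           --split-pages <0|1> (Default: 0)
                  If turned on, each page is written to a separate file.
                  This switch is useful when you want to load individual HTML pages dynamically, which requires server-side support (you have to implement the logic for loading a specific page yourself).
                  See also --page-filename.
           --dest-dir <dir> (Default: .)
                  Specify the destination folder for the output (the current folder by default).
           --css-filename <filename> (Default: <none>)
                  Specify the filename of the generated CSS file, if not embedded.
                  If it is empty, the filename is determined automatically.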
           --page-filename <filename> (Default: <none>)
                  Specify the filename template for pages when --split-pages is 1.
                  A %d placeholder may be included in `filename` to indicate where the page number should be placed. The placeholder supports a limited subset of normal numerical placeholders, including specified width and zero padding.
                  If `filename` does not contain a placeholder for the page number, the page number will be inserted directly before the file extension. If the filename does not have an extension, the page number will be placed at the end of the file name.
                  If --page-filename is not specified, <input-filename> will be used for the output filename, replacing the extension with .page and adding the page number directly before the extension.
                  Examples
                         pdf2htmlEX --split-pages 1 foo.pdf
                                Yields page files foo1.page, foo2.page, etc.
                         pdf2htmlEX --split-pages 1 foo.pdf --page-filename bar.baz
                                Yields page files bar1.baz, bar2.baz, etc.
                         pdf2htmlEX --split-pages 1 foo.pdf --page-filename page%dbar.baz
                                Yields page files page1bar.baz, page2bar.baz, etc.
                         pdf2htmlEX --split-pages 1 foo.pdf --page-filename bar%03d.baz
                                Yields page files bar001.baz, bar002.baz, etc.
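
The naming rules for --page-filename can be reproduced with a short Python sketch. The function names `page_filename` and `default_page_template` are hypothetical, and the placeholder handling (an optional `0` flag followed by an optional width) is an assumption about what "a limited subset" covers; the real tool may accept more or less than this.

```python
import os
import re

# Assumed placeholder subset: %d, %<width>d, %0<width>d
_PLACEHOLDER = re.compile(r"%(0?)(\d*)d")


def page_filename(template, page_no):
    """Build the file name for one page from a template (hypothetical helper)."""
    match = _PLACEHOLDER.search(template)
    if match:
        zero_flag, width = match.groups()
        number = str(page_no)
        if width:
            # Pad to the requested width, with zeros if the 0 flag is present.
            number = number.rjust(int(width), "0" if zero_flag else " ")
        return template[:match.start()] + number + template[match.end():]
    # No placeholder: insert the number before the extension,
    # or at the end if there is no extension.
    stem, ext = os.path.splitext(template)
    return f"{stem}{page_no}{ext}"


def default_page_template(input_filename):
    """Template used when --page-filename is not given: extension becomes .page."""
    stem, _ = os.path.splitext(os.path.basename(input_filename))
    return stem + ".page"


print(page_filename(default_page_template("foo.pdf"), 1))  # foo1.page
print(page_filename("bar.baz", 2))                         # bar2.baz
print(page_filename("page%dbar.baz", 1))                   # page1bar.baz
print(page_filename("bar%03d.baz", 2))                     # bar002.baz
```
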
           --outline-filename <filename> (Default: <none>)
                  Specify the filename of the generated outline file, if not embedded.
                  If it is empty, the filename will be determined automatically.
           --process-nontext <0|1> (Default: 1)
                  Whether to process non-text objects (as images).
           --process-outline <0|1> (Default: 1)
                  Whether to show the outline in the generated HTML.
           --printing <0|1> (Default: 1)
                  Enable printing support. Disabling this option may reduce the size of the CSS.
           --fallback <0|1> (Default: 0)
                  Output in fallback mode, for better accuracy and browser compatibility, at the cost of a larger output size.
           --tmp-file-size-limit <limit> (Default: -1)
                  Limit the total size (in KB) of the temporary files, which also limits the total size of the output file. This is an estimate: processing stops after a page once the total size of the temporary files exceeds this number.
                  -1 means no limit and is the default.
       Fonts
           --embed-external-font <0|1> (Default: 1)
                  Specify whether locally matched fonts, for fonts not embedded in the PDF, should be embedded into the HTML.
                  If this switch is off, only font names are exported so that web browsers may try to find suitable fonts themselves, which might cause problems with incorrect font metrics.
           --font-format <format> (Default: woff)
                  Specify the format of fonts extracted from the PDF file.
           --decompose-ligature <0|1> (Default: 0)
                  Decompose ligatures. For example 'fi' -> 'f''i'.
           --auto-hint <0|1> (Default: 0)
                  If set to 1, hints will be generated for the fonts using fontforge.
                  --external-hint-tool, if specified, takes precedence over this option.
           --external-hint-tool <tool> (Default: <none>)
                  If specified, the tool will be called to enhance hinting for fonts; this takes precedence over --auto-hint.
                  The tool will be called as '<tool> <in.suffix> <out.suffix>', where suffix is the same as specified for --font-format.
           --stretch-narrow-glyph <0|1> (Default: 0)
                  If set to 1, glyphs narrower than described in the PDF will be stretched; otherwise space will be padded to the right of the glyphs.
           --squeeze-wide-glyph <0|1> (Default: 1)
                  If set to 1, glyphs wider than described in the PDF will be squeezed; otherwise they will be truncated.
           --override-fstype <0|1> (Default: 0)
                  Clear the fstype bits in TTF/OTF fonts.
                  Turn this on if Internet Explorer complains about 'Permission must be Installable' AND you have permission to do so.
           --process-type3 <0|1> (Default: 0)
                  If turned on, pdf2htmlEX will try to convert Type 3 fonts so that text can be rendered natively in HTML. Otherwise all text with Type 3 fonts will be rendered as images.
                  This feature is highly experimental.
       Text
           --heps <len>, --veps <len> (Default: 1)
                  Specify the maximum tolerable horizontal/vertical offset (in pixels).
                  pdf2htmlEX will try to optimize the generated HTML file by moving text within this distance.
           --space-threshold <ratio> (Default: 0.125)
                  pdf2htmlEX will insert a whitespace character ' ' if the distance between two consecutive letters in the same line is wider than ratio * font_size.
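
The --space-threshold test can be illustrated with a small Python sketch. The function `join_glyphs` and the `(char, x_start, x_end)` glyph representation are assumptions made for this example; they only mirror the comparison `gap > ratio * font_size` described above, not the tool's internal data structures.

```python
def join_glyphs(glyphs, font_size, ratio=0.125):
    """Join glyphs of one line, inserting ' ' where the gap exceeds the threshold.

    glyphs: list of (char, x_start, x_end) tuples ordered left to right
    (a hypothetical representation chosen for this sketch).
    """
    threshold = ratio * font_size
    pieces = []
    prev_end = None
    for char, x_start, x_end in glyphs:
        # Distance between the previous glyph's right edge and this glyph's left edge.
        if prev_end is not None and x_start - prev_end > threshold:
            pieces.append(" ")
        pieces.append(char)
        prev_end = x_end
    return "".join(pieces)


line = [("a", 0.0, 5.0), ("b", 5.5, 10.0), ("c", 12.0, 17.0)]
print(join_glyphs(line, font_size=12.0))             # ab c  (threshold 1.5)
print(join_glyphs(line, font_size=12.0, ratio=0.2))  # abc   (threshold 2.4)
```
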
           --font-size-multiplier <ratio> (Default: 4.0)
                  Many web browsers limit the minimum font size, and many round the given font size, which results in incorrect rendering.
                  Specifying a ratio greater than 1 resolves this issue, although it might freeze some browsers.
                  For some versions of Firefox, however, there will be a problem when the font size is too large, in which case a smaller value should be specified here.
           --space-as-offset <0|1> (Default: 0)
                  If set to 1, space characters will be treated as offsets, which allows better optimization.
                  For PDF files with bad encodings, turning on this option may cause characters to be lost.
           --tounicode <-1|0|1> (Default: 0)
                  A ToUnicode map may be provided for each font in the PDF, which indicates the 'meaning' of the characters. However, there is often better "ToUnicode" info in Type 0/1 fonts, and sometimes the ToUnicode map provided is wrong.
                  If this value is set to 1, the ToUnicode map is always applied, if provided in the PDF, and characters may not render correctly in HTML if there are collisions.
                  If set to -1, a customized map is used so that rendering will be correct in HTML (visually the same), but you may not get correct characters by select & copy & paste.
                  If set to 0, pdf2htmlEX will try its best to balance the two methods above.
           --optimize-text <0|1> (Default: 0)
                  If set to 1, pdf2htmlEX will try to reduce the number of HTML elements used for text. Turn it off if anything goes wrong.
       Background Image
           --bg-format <format> (Default: png)
                  Specify the background image format. Run `pdf2htmlEX -v` to check all supported formats.
       PDF Protection
           -o, --owner-password <password>
                  Specify the owner password.
           -u, --user-password <password>
                  Specify the user password.
           --no-drm <0|1> (Default: 0)
                  Override document DRM settings.
                  Turn this on only when you have permission.
       Misc.
           --clean-tmp <0|1> (Default: 1)
                  If switched off, intermediate files won't be cleaned up at the end.
           --data-dir <dir> (Default: /usr/local/share/pdf2htmlEX)
                  Specify the folder holding the manifest and other files (see below for the manifest file).
           --tmp-dir <dir> (Default: /tmp)
                  Specify the temporary folder to use for temporary files.
           --css-draw <0|1> (Default: 0)
                  Experimental and unsupported CSS drawing.
           --debug <0|1> (Default: 0)
                  Print debug information.
       Meta
           -v, --version
                  Print copyright and version info.
           --help Print usage information.
    MANIFEST and DATA-DIR
           When --split-pages is 0, the manifest file describes how the final HTML page should be generated.
           By default, pdf2htmlEX will use the manifest in the default data-dir (run `pdf2htmlEX -v` to check), which gives a simple demo of its syntax.
           You can modify the default one, or you can create a new one and specify the correct data-dir on the command line.
           All files referred to by the manifest must be located in the data-dir.
    EXAMPLE
           pdf2htmlEX /path/to/file.pdf
                  Convert file.pdf into file.html
           pdf2htmlEX --clean-tmp 0 --debug 1 /path/to/file.pdf
                  Convert file.pdf and leave all intermediate files.
           pdf2htmlEX --dest-dir out --embed fi /path/to/file.pdf
                  Convert file.pdf into out/file.html and leave font/image files separated.
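
When pdf2htmlEX is driven from a script, the command lines above can be assembled programmatically. The following Python sketch builds the argument list for the usage form `pdf2htmlEX [options] <input-filename> [<output-filename>]`. The helper `build_command` is hypothetical, it assumes every option takes exactly one value (true for the `<0|1>` style switches documented here), and it does not validate option names. The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine where pdf2htmlEX is installed.

```python
def build_command(input_pdf, output_html=None, **options):
    """Assemble a pdf2htmlEX argument list (hypothetical helper).

    Keyword names use underscores and are mapped to long options,
    e.g. dest_dir="out" becomes "--dest-dir out".
    """
    cmd = ["pdf2htmlEX"]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    cmd.append(input_pdf)
    if output_html is not None:
        cmd.append(output_html)
    return cmd


print(build_command("/path/to/file.pdf", dest_dir="out", embed="fi"))
# ['pdf2htmlEX', '--dest-dir', 'out', '--embed', 'fi', '/path/to/file.pdf']
```
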
    COPYRIGHT
           Copyright 2012,2013 Lu Wang <coolwanglu@gmail.com>
           pdf2htmlEX is licensed under GPLv3 with additional terms, read LICENSE for details.
    AUTHOR
           pdf2htmlEX is written by Lu Wang <coolwanglu@gmail.com>
    SEE ALSO
           Home page
                  https://github.com/pdf2htmlEX/pdf2htmlEX
           pdf2htmlEX Wiki
                  https://github.com/pdf2htmlEX/pdf2htmlEX/wiki
    pdf2htmlEX 0.12                                                  pdf2htmlEX(1)