PDF Reference official guide: syntax basics and file structure

Objects: concept and kinds


The syntax of PDF comprises four main elements:

• Objects. These are the basic building blocks in PDF.

• File structure. It specifies how objects are laid out and modified in a PDF file.

• Document structure. It determines how objects are logically organized to represent the contents of a PDF file (text, graphics, etc.).

• Content streams. They provide a means for efficient storage of various parts of the document content.

There are 9 basic object types in PDF. Simple object types are Boolean, Numeric, String and Null. PDF strings have bounded length and are enclosed in parentheses, '(' and ')'. The type Name is used as an identifier in the description of the PDF document structure. Names are introduced using the character '/' and can contain arbitrary characters except null (0x00). These 5 object types are referred to as primitive types in this text.

An Array is a one-dimensional ordered collection of PDF objects enclosed in square brackets, '[' and ']'. Arrays may contain PDF objects of different types, including nested arrays. A Dictionary is an unordered set of key-value pairs enclosed between the symbols '<<' and '>>'. The keys must be name objects and must be unique within a dictionary. The values may be of any PDF object type, including nested dictionaries.

A Stream object is a PDF dictionary followed by a sequence of bytes. The bytes represent information that may be compressed or encrypted, and the associated dictionary contains information on whether and how to decode the bytes. These bytes usually contain content to be rendered, but may also contain a set of other objects [3].

Finally, an Indirect object is any of the previously defined objects supplied with a unique object identifier and enclosed in the keywords obj and endobj. Because of their unique identifiers, indirect objects can be referenced from other objects via indirect references.
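The delimiters described above are enough to tell the object types apart at a glance. The following is a minimal sketch, not a real PDF parser: the function name classify and the sample tokens are invented for illustration, and it assumes each input is a single well-formed object written in the notation described above.

```python
import re


def classify(token: str) -> str:
    """Guess the PDF object type of one well-formed token from its delimiters."""
    t = token.strip()
    if t in ("true", "false"):
        return "Boolean"
    if t == "null":
        return "Null"
    if re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", t):
        return "Numeric"
    if re.fullmatch(r"\d+ \d+ R", t):
        return "Indirect reference"
    if re.fullmatch(r"\d+ \d+ obj\b.*\bendobj", t, re.DOTALL):
        return "Indirect object"
    # A stream is a dictionary followed by bytes between stream/endstream.
    if t.startswith("<<") and t.endswith("endstream"):
        return "Stream"
    if t.startswith("<<") and t.endswith(">>"):
        return "Dictionary"
    if t.startswith("[") and t.endswith("]"):
        return "Array"
    if t.startswith("(") and t.endswith(")"):
        return "String"
    if t.startswith("/"):
        return "Name"
    return "Unknown"


# Hypothetical sample tokens, one per object type named in the text.
samples = [
    "true",
    "-3.14",
    "(Hello)",
    "null",
    "/Catalog",
    "[1 (two) /Three]",
    "<< /Type /Catalog >>",
    "<< /Length 5 >>\nstream\nHello\nendstream",
    "1 0 obj\n(Hello)\nendobj",
    "1 0 R",
]

for sample in samples:
    print(repr(sample), "->", classify(sample))
```

The order of the checks matters: an indirect object or a stream also contains a dictionary, so the more specific wrappers are tested first.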

[Figure: example of a dictionary object]

[Figure: example of a stream object]

[Figure: example of indirect objects]

---------------------------------------------------

File structure and Document structure


This sub-clause describes how objects are organized in a PDF file for efficient random access and incremental update. A basic conforming PDF file shall be constructed of the following four elements (see Figure 2):
• A one-line header identifying the version of the PDF specification to which the file conforms
• A body containing the objects that make up the document contained in the file
• A cross-reference table containing information about the indirect objects in the file
• A trailer giving the location of the cross-reference table and of certain special objects within the body of the file.
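To make the four elements concrete, here is a rough Python sketch that assembles them in order and records the byte offsets the cross-reference table needs. The function name build_minimal_pdf and the two object bodies are assumptions made for illustration; the result shows the layout only and is not meant to be a complete, renderable document.

```python
def build_minimal_pdf(objects: list[bytes]) -> bytes:
    """Assemble header, body, cross-reference table and trailer, in that order."""
    out = bytearray(b"%PDF-1.7\n")                 # 1. one-line header

    offsets = []
    for num, body in enumerate(objects, start=1):  # 2. body of indirect objects
        offsets.append(len(out))                   # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)                            # 3. cross-reference table
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                # entry for object 0 (free)
    for off in offsets:
        out += b"%010d 00000 n \n" % off           # one in-use entry per object

    size = len(objects) + 1                        # 4. trailer
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Hypothetical object bodies: a catalog and an empty page tree.
sample_pdf = build_minimal_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
print(sample_pdf.decode("ascii"))
```

Because the offsets are recorded while the body is written, the table can be emitted afterwards in a single pass, which is why it comes after the body in the file.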


The body of a PDF file shall consist of a sequence of indirect objects representing the contents of a document. The objects, which are of the basic types described in 7.3, "Objects," represent components of the document such as fonts, pages, and sampled images. Beginning with PDF 1.5, the body can also contain object streams, each of which contains a sequence of indirect objects; see 7.5.7, "Object Streams."


[Listing: example structural paths from a real PDF file (body)]

 

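The syntax of PDF objects is illustrated by the simplified example PDF file shown on the left-hand side of Figure 1. It contains four indirect objects, each denoted by a two-part object identifier (for example, 1 0 for the first object) and the keywords obj and endobj. These objects are dictionaries, since they are enclosed in the symbols '<<' and '>>'. The first one is the Catalog dictionary, denoted by its Type entry, which contains a PDF name with the value Catalog. The Catalog has two additional dictionary entries: Pages and OpenAction. OpenAction is an example of a nested dictionary. It has two entries: S, a PDF name indicating that this is a JavaScript action dictionary, and JS, a PDF string containing the actual JavaScript to be executed: alert('Hello!');. Pages is an indirect reference to the object with the identifier 3 0: the Pages dictionary that immediately follows the Catalog. It has an integer, Count, indicating that the document has 2 pages, and an array, identified by square brackets, with two references to Page objects. The same object types are used to build the remaining Page objects. Note that each Page object contains a backward reference to the Pages object in its Parent entry. In total, three references point to the same indirect object, 3 0, the Pages object.

The relations between the various basic objects constitute the logical, tree-like document structure of a PDF file, shown in the middle part of Figure 1. The nodes of the document structure are the objects themselves, and the edges correspond to the names under which child objects reside in their parent object. For arrays, the parent-child relationship is nameless and corresponds to the integer index of the individual element. Note that, strictly speaking, the document structure is not a tree but a directed, potentially cyclic graph, because indirect references may point to other objects anywhere in the document structure. The root node of the document structure is a special PDF dictionary whose mandatory Type entry contains the name Catalog. Any object of a primitive type constitutes a leaf of the document structure.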
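The example file and its graph structure can be modelled with ordinary Python containers. This is only a sketch under stated assumptions: the Ref class and find_refs helper are invented here, the page objects are numbered 4 0 and 5 0 because the text does not give their identifiers, and the array of page references is assumed to sit under the usual Kids key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as '3 0 R' (object number, generation)."""
    num: int
    gen: int


# The four indirect objects of the example; 4 0 and 5 0 are assumed numbers.
objects = {
    Ref(1, 0): {"Type": "Catalog", "Pages": Ref(3, 0),
                "OpenAction": {"S": "JavaScript", "JS": "alert('Hello!');"}},
    Ref(3, 0): {"Type": "Pages", "Count": 2, "Kids": [Ref(4, 0), Ref(5, 0)]},
    Ref(4, 0): {"Type": "Page", "Parent": Ref(3, 0)},
    Ref(5, 0): {"Type": "Page", "Parent": Ref(3, 0)},
}


def find_refs(value, path=""):
    """Yield (path, Ref) for every indirect reference nested inside value."""
    if isinstance(value, Ref):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():          # named edge
            yield from find_refs(child, f"{path}/{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):     # nameless edge: integer index
            yield from find_refs(child, f"{path}/{index}")


# Every reference that points at the Pages object 3 0.
incoming = [(owner.num, path)
            for owner, obj in objects.items()
            for path, target in find_refs(obj)
            if target == Ref(3, 0)]
print(incoming)
```

The list contains three entries, one from the Catalog and one from each page, matching the count given in the text. Following Kids from 3 0 to a page and Parent back again also shows the cycle that makes the structure a graph rather than a tree.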

---------------------------------------------------- 

7.5.4 Cross-Reference Table

The cross-reference table contains information that permits random access to indirect objects within the file so that the entire file need not be read to locate any particular object. The table shall contain a one-line entry for each indirect object, specifying the byte offset of that object within the body of the file. (Beginning with PDF 1.5, some or all of the cross-reference information may alternatively be contained in cross-reference streams; see 7.5.8, "Cross-Reference Streams.")

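In a cross-reference section that begins with the lines xref and 0 26:

xref marks the start of the cross-reference table.

0 is the object number of the first entry in the section.

26 is the number of indirect-object entries in the section.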

Each entry has the form nnnnnnnnnn ggggg n or nnnnnnnnnn ggggg f, where the keyword n marks an in-use entry and f marks a free entry.

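The cross-reference table (comprising the original cross-reference section and all update sections) shall contain one entry for each object number from 0 to the maximum object number defined in the file, even if one or more of the object numbers in this range do not actually occur in the file.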

Note: ggggg is the generation number, which can be thought of as a version number.
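The layout described above (the xref keyword, a line giving the first object number and the entry count, then one fixed-width entry per object) can be read with a few lines of Python. This is a minimal sketch that assumes a section with a single subsection; the function name parse_xref_section and the sample offsets are made up for illustration.

```python
def parse_xref_section(text: str) -> dict[int, tuple[int, int, bool]]:
    """Map object number -> (first field, generation number, in_use).

    For in-use entries (n) the first field is the byte offset of the object;
    for free entries (f) it is the number of the next free object.
    """
    lines = text.strip().splitlines()
    if lines[0].strip() != "xref":
        raise ValueError("not a cross-reference section")
    first, count = (int(part) for part in lines[1].split())
    entry_lines = lines[2:2 + count]
    if len(entry_lines) != count:
        raise ValueError("fewer entries than announced")

    entries = {}
    for obj_num, line in zip(range(first, first + count), entry_lines):
        field, generation, keyword = line.split()
        if keyword not in ("n", "f"):
            raise ValueError(f"unexpected entry keyword: {keyword!r}")
        entries[obj_num] = (int(field), int(generation), keyword == "n")
    return entries


# Hypothetical three-entry section; the offsets are invented.
sample_xref = "\n".join([
    "xref",
    "0 3",
    "0000000000 65535 f ",
    "0000000017 00000 n ",
    "0000000081 00000 n ",
])

for number, entry in parse_xref_section(sample_xref).items():
    print(number, entry)
```

Object numbers are not stored in the entries themselves; they follow from the starting number on the second line and the position of each entry, which is why the table needs an entry for every number in the range.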

7.5.5 File Trailer

The trailer of a PDF file enables a conforming reader to quickly find the cross-reference table and certain special objects. Conforming readers should read a PDF file from its end. The last line of the file shall contain only the end-of-file marker, %%EOF. The two preceding lines shall contain, one per line and in order, the keyword startxref and the byte offset in the decoded stream from the beginning of the file to the beginning of the xref keyword in the last cross-reference section. The startxref line shall be preceded by the trailer dictionary, consisting of the keyword trailer followed by a series of key-value pairs enclosed in double angle brackets (<< … >>) (using LESS-THAN SIGNs (3Ch) and GREATER-THAN SIGNs (3Eh)). Thus, the trailer has the following overall structure:

    trailer
    << key1 value1
       key2 value2
       …
       keyn valuen
    >>
    startxref
    Byte_offset_of_last_cross-reference_section
    %%EOF
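Reading a file from its end, as described above, might look like the following Python sketch. The function name read_startxref, the 1024-byte window and the sample bytes are assumptions for illustration; real files may need more lenient handling.

```python
import re


def read_startxref(data: bytes, window: int = 1024) -> int:
    """Return the byte offset that follows the startxref keyword.

    Only the last `window` bytes are inspected, since the trailer sits at
    the end of the file and the last line must be the %%EOF marker.
    """
    tail = data[-window:]
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", tail)
    if match is None:
        raise ValueError("no startxref / %%EOF found at the end of the data")
    return int(match.group(1))


# Hypothetical file tail; the offset 123 is an invented value.
sample_tail = (
    b"trailer\n"
    b"<< /Size 3 /Root 1 0 R >>\n"
    b"startxref\n"
    b"123\n"
    b"%%EOF\n"
)
print(read_startxref(sample_tail))
```

A reader would then seek to the returned offset, expect the xref keyword there, and use the table to locate the objects named in the trailer dictionary.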

 
