E-books and their formats: FB2 and FB3 - history, pros, cons and principles of work

In the previous article, we talked about features of the DjVu format. Today we decided to focus on the FictionBook2 format, better known as FB2, and its "successor" FB3.

Photo: Judith Klein / Flickr / CC

How the format came about

In the mid-1990s, enthusiasts began digitizing Soviet books. They converted and preserved literature in a wide variety of formats. One of the first libraries on the Runet, Maxim Moshkov's Library, used formatted text files (.txt).

This format was chosen for its resistance to byte corruption and its versatility: TXT opens on any operating system. However, it made the stored text harder to process. For example, to move to the thousandth line, you had to process the 999 lines before it. Books were also kept as Word documents and PDFs. The latter were difficult to convert to other formats, and weak computers opened and displayed PDF documents with delays.

HTML was also used to store electronic literature. It simplified indexing, conversion to other formats, and document creation (tagging text), but it introduced drawbacks of its own. One of the most significant was the "vagueness" of the standard: it allowed certain liberties when writing tags. Some tags had to be closed, while others did not. The tags themselves could be nested in an arbitrary order.

Although handling files this way was discouraged, and such documents were considered malformed, the standard required reading applications to try to display the content anyway. This is where the difficulties arose, since each application implemented this "guessing" in its own way. At the same time, the reading devices and applications on the market at that time understood only one or two specialized formats. If a book was available in a single format, it often had to be converted before it could be read. FictionBook2, or FB2, was designed to address all of these shortcomings by taking over the initial "cleanup" and conversion of the text.

Note that the format had a first version, FictionBook1. However, it was only experimental, did not last long, is no longer supported, and is not backward compatible. Therefore, FictionBook most often means its "successor", the FB2 format.

FB2 was created by a development team led by Dmitry Gribov, the technical director of LitRes, and Mikhail Matsnev, the creator of Haali Reader. The format is based on XML, which is stricter than HTML about unclosed and improperly nested tags. An XML document is accompanied by a so-called XML schema: a special file that lists all the tags and describes the rules for their use (order, nesting, which are mandatory and which optional, and so on). In FictionBook, the schema lives in the FictionBook2.xsd file. One example of such a schema is the one used by the LitRes e-book store.
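To illustrate the strictness mentioned above, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. It checks only well-formedness (closed and properly nested tags), not conformance to FictionBook2.xsd, which would need a separate schema validator. The helper name `is_well_formed` and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments: one correct, one with unclosed <p> tags,
# one with tags closed in the wrong order.
closed = "<section><p>First line</p><p>Second line</p></section>"
unclosed = "<section><p>First line<p>Second line</section>"
misnested = "<p><emphasis>text</p></emphasis>"

print(is_well_formed(closed))
print(is_well_formed(unclosed))
print(is_well_formed(misnested))
```

An HTML renderer would typically try to display all three fragments, each in its own way. An XML parser rejects the last two outright, so a reading application never has to guess.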

FB2 Document Structure

Text in the document is kept in special tags, the paragraph-type elements <p>, <v>, and <subtitle>. There is also the <empty-line/> element, which has no content and is used to insert gaps.

All documents start with the root tag <FictionBook>, whose children can include <stylesheet>, <description>, <body>, and <binary>.

The <stylesheet> tag contains style sheets that ease conversion to other formats. The <binary> elements hold base64-encoded data that may be needed to render the document.

The <description> element contains all the necessary information about the book: the genre of the work, the list of authors (full name, e-mail address, and website), the title, a block of keywords, and an annotation. It may also contain information about changes made to the document and about the publisher of the book, if it was issued on paper.

This is what the <description> block looks like in the FictionBook file for Arthur Conan Doyle's "A Study in Scarlet", taken from Project Gutenberg:

<?xml version="1.0" encoding="iso-8859-1"?>
 <FictionBook 
  >
  <description>
    <title-info>
      <genre match="100">detective</genre>
      <author>
        <first-name>Arthur</first-name>
        <middle-name>Conan</middle-name>
        <last-name>Doyle</last-name>
      </author>
      <book-title>A Study in Scarlet</book-title>
      <annotation>
      </annotation>
      <date value="1887-01-01">1887</date>
    </title-info>
  </description>
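As a rough sketch of how a program might read this block, the snippet below pulls the genre, author, title, and date out of a copy of the excerpt using Python's standard library. The helper names (`local_name`, `child`, `read_title_info`) are invented for this example, the sample is closed off so that it parses, and the code assumes every element it looks up is present.

```python
import xml.etree.ElementTree as ET

# A closed-off copy of the excerpt above (hypothetical minimal file).
SAMPLE = """<FictionBook>
  <description>
    <title-info>
      <genre match="100">detective</genre>
      <author>
        <first-name>Arthur</first-name>
        <middle-name>Conan</middle-name>
        <last-name>Doyle</last-name>
      </author>
      <book-title>A Study in Scarlet</book-title>
      <annotation></annotation>
      <date value="1887-01-01">1887</date>
    </title-info>
  </description>
</FictionBook>"""


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def child(parent, name):
    """Return the first direct child with the given local name, or None."""
    for element in parent:
        if local_name(element.tag) == name:
            return element
    return None


def read_title_info(fb2_text):
    """Collect genre, author, title, and date from <title-info>.

    Assumes <description>, <title-info>, <author>, <genre>,
    <book-title>, and <date> are all present.
    """
    root = ET.fromstring(fb2_text)
    info = child(child(root, "description"), "title-info")
    author = child(info, "author")
    name_parts = []
    for part in ("first-name", "middle-name", "last-name"):
        element = child(author, part)
        if element is not None and element.text:
            name_parts.append(element.text.strip())
    return {
        "genre": child(info, "genre").text,
        "author": " ".join(name_parts),
        "title": child(info, "book-title").text,
        "date": child(info, "date").get("value"),
    }


print(read_title_info(SAMPLE))
```

Matching on local names is a deliberate choice here: it lets the same code work whether or not the root element declares an XML namespace.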

The key component of a FictionBook document is <body>. It contains the actual text of the book. A document can have several of these tags: the additional blocks are used to store footnotes, comments, and notes.

FictionBook also provides several tags for working with hyperlinks. They are based on the XLink specification, developed by the W3C consortium specifically for creating links between different resources in XML documents.
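The sketch below shows one way a footnote link in the main text could be resolved against a second <body>. Only the XLink namespace URI is standard. The `l` prefix, the `name="notes"` and `type="note"` attributes, the `id` values, and the sample text are assumptions made for illustration, and the sample omits any default namespace for brevity.

```python
import xml.etree.ElementTree as ET

# XLink attributes live in this W3C namespace.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical two-body document: main text plus a notes block.
SAMPLE = """<FictionBook xmlns:l="http://www.w3.org/1999/xlink">
  <body>
    <section>
      <p>Main text with a footnote<a l:href="#n1" type="note">[1]</a>.</p>
    </section>
  </body>
  <body name="notes">
    <section id="n1">
      <p>Text of the footnote.</p>
    </section>
  </body>
</FictionBook>"""


def collect_notes(fb2_text):
    """Map each internal link target ('#id') to the text it points at."""
    root = ET.fromstring(fb2_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    notes = {}
    for link in root.iter("a"):
        target = link.get(XLINK_HREF, "")
        if target.startswith("#") and target[1:] in by_id:
            note_text = "".join(by_id[target[1:]].itertext())
            notes[target] = " ".join(note_text.split())
    return notes


root = ET.fromstring(SAMPLE)
print(len(root.findall("body")))
print(collect_notes(SAMPLE))
```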

Advantages of the format

The FB2 standard includes only the minimum required set of tags (enough to style fiction), which simplifies its processing by reading applications. Moreover, when a reader works with the FB format directly, the user can customize almost every display parameter.

The strict structure of the document makes it possible to automate conversion from the FB format to any other. The same structure makes it possible to work with individual elements of documents, for example to set up filters by author, title, genre, and so on. For this reason, the FB2 format has gained popularity on the Runet, becoming the default standard in Russian electronic libraries and in libraries of the CIS countries.

Format Disadvantages

The simplicity of the FB2 format is both its advantage and its disadvantage. It limits what can be done with complex text layout (for example, marginal notes). The format has no vector graphics and no support for numbered lists. For this reason, it is not well suited to textbooks, reference books, and technical literature (even the name of the format says as much: FictionBook, a book of fiction).

At the same time, in order to display the minimum information about the book - the title, author, and cover - the program needs to process almost the entire XML document. This is because the metadata is at the beginning of the text and the images are at the end.
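A small sketch of that cost, using a streaming parser: the title shows up early, but the cover bytes only become available once the parser has passed the whole <body>. The element and attribute names used for the cover (`coverpage`, `image`, `binary` with an `id`) are assumptions about a typical FB2 layout, the image payload is fake, and the sample omits the default namespace for brevity.

```python
import base64
import io
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Fake image payload, encoded the way binary data is stored in the file.
payload = base64.b64encode(b"fake-image-bytes").decode("ascii")

SAMPLE = f"""<FictionBook xmlns:l="http://www.w3.org/1999/xlink">
  <description>
    <title-info>
      <book-title>A Study in Scarlet</book-title>
      <coverpage><image l:href="#cover.jpg"/></coverpage>
    </title-info>
  </description>
  <body><section><p>The whole text of the book sits here.</p></section></body>
  <binary id="cover.jpg" content-type="image/jpeg">{payload}</binary>
</FictionBook>"""


def read_title_and_cover(fb2_bytes):
    """Stream the file; return (title, cover bytes, elements completed)."""
    title, cover_id, cover, seen = None, None, None, 0
    for _, element in ET.iterparse(io.BytesIO(fb2_bytes), events=("end",)):
        seen += 1
        if element.tag == "book-title":
            title = element.text
        elif element.tag == "coverpage":
            image = element.find("image")
            if image is not None:
                cover_id = image.get(XLINK_HREF, "").lstrip("#")
        elif element.tag == "binary" and element.get("id") == cover_id:
            cover = base64.b64decode(element.text)
            break  # stop as soon as the cover is found
    return title, cover, seen


title, cover, seen = read_title_and_cover(SAMPLE.encode("utf-8"))
print(title, cover, seen)
```

Even with an early exit, the parser has to walk through every element of the body before it reaches the cover. In a real book that is nearly the whole file.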

FB3 - format development

In response to growing demands on the formatting of book texts (and to address some of the shortcomings of FB2), Gribov began work on the FB3 format. Development later stalled, but it resumed in 2014.

According to the authors, they studied the real needs of technical publishing, looked at textbooks, reference books, and manuals, and outlined a more specific set of tags that would allow any book to be displayed.

In the new specification, a FictionBook file is a zip archive that stores metadata, images, and text in separate files. The requirements for the zip file format and the conventions for its organization are spelled out in ECMA-376, the standard that defines Office Open XML.
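The practical effect of splitting a book into parts can be sketched with Python's `zipfile` module: a program can open just the metadata part without reading the text or the images. Apart from `[Content_Types].xml`, which comes from the packaging conventions in ECMA-376, the part names and contents below are hypothetical placeholders. The real layout is defined by the FB3 specification.

```python
import io
import zipfile

# Hypothetical part names and placeholder contents, for illustration only.
PARTS = {
    "[Content_Types].xml": "<Types/>",
    "fb3/description.xml": "<fb3-description/>",
    "fb3/body.xml": "<fb3-body/>",
    "fb3/img/cover.jpg": "fake-image-bytes",
}


def build_archive(parts):
    """Pack the given name -> content mapping into an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in parts.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def read_part(archive_bytes, name):
    """Read a single part by name, leaving the other parts untouched."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


book = build_archive(PARTS)
print(read_part(book, "fb3/description.xml"))
```

Because a zip archive keeps an index of its entries, the metadata part can be located directly. With FB2, by contrast, a program has to scan a single large XML file.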

A number of formatting improvements were made (spacing, underlining), and a new object was added: the "block", which frames an arbitrary fragment of the book as a rectangle and can be embedded in the text with wrapping. Support for numbered and bulleted lists was added as well.

FB3 is distributed under a free license and is open source, so all of its utilities are available to publishers and users: converters, cloud editors, and readers. The current version of the format, a reader, and an editor can be found in the project repository on GitHub.

In general, FictionBook3 is still less common than its older brother, but several electronic libraries already offer books in this format. A couple of years ago, LitRes announced its intention to move its entire catalog to the new format. Some reading devices already support all the necessary FB3 functionality. For example, all modern ONYX reader models, such as Darwin 3 or Cleopatra 3, can work with this format out of the box.

Image: ONYX BOOX Cleopatra 3

Wider adoption of FictionBook3 will create an ecosystem geared toward full-fledged, efficient work with text on any device with limited resources: a black-and-white or small display, little memory, and so on. According to the developers, a book that has been laid out once will be as convenient as possible to read in any environment.


Source: habr.com
