E-books and their formats: talking about EPUB - its history, pros and cons

Earlier on the blog, we wrote about how the DjVu and FB2 e-book formats came about.

The topic of today's article is EPUB.

Image: Nathan Oakley / CC BY

Format history

In the 1990s, the e-book market was dominated by proprietary solutions, and many reader manufacturers had their own format. For example, NuvoMedia used files with the .rb extension: containers holding an HTML file and an .info file with metadata. This state of affairs complicated the work of publishers, who had to typeset each book separately for every format. A group of engineers from Microsoft, the already mentioned NuvoMedia, and SoftBook Press set out to fix the situation.

At that time, Microsoft was planning to conquer the e-book market and was developing a reader application for Windows 95. The creation of a new format can be seen as part of the IT giant's business strategy.

NuvoMedia is considered the manufacturer of the first mass-market e-reader, the Rocket eBook. The device had only eight megabytes of internal memory, and its battery lasted no more than 40 hours. SoftBook Press also developed e-readers, but its devices had a distinctive feature: a built-in modem that let you download digital literature directly from the SoftBookstore.

In the early 2000s, both companies, NuvoMedia and SoftBook Press, were bought by the media company Gemstar and merged into the Gemstar eBook Group. For several years this organization sold readers (for example, the RCA REB 1100) and digital books, but in 2003 it went out of business.

But back to the development of a single standard. In 1999, Microsoft, NuvoMedia, and SoftBook Press founded the Open eBook Forum, which began working on the draft document that gave rise to EPUB. Initially, the standard was called OEBPS (Open eBook Publication Structure). It allowed a digital publication to be distributed as a single file (a ZIP archive) and made it easier to transfer books between different hardware platforms.

Later, the IT companies Adobe, IBM, HP, Nokia, and Xerox and the publishers McGraw Hill and Time Warner joined the Open eBook Forum. Together, they continued to develop OEBPS and the digital literature ecosystem as a whole. In 2005, the organization was renamed the International Digital Publishing Forum, or IDPF.

In 2007, IDPF renamed the OEBPS format to EPUB and began developing its second version, which was presented to the general public in 2010. The new version differed little from its predecessor, but it gained support for vector graphics and embedded fonts.

By this time, EPUB was taking over the market and had become the default standard for many publishers and manufacturers of electronic devices. The format was already used by O'Reilly and Cisco Press, and devices from Apple, Sony, Barnes & Noble, and ONYX BOOX supported it.

In 2009, the Google Books project announced support for EPUB, using it to distribute over a million free books. The format also began to gain popularity among writers. In 2011, J.K. Rowling announced plans to launch the Pottermore website and make it the only digital point of sale for the Harry Potter books.

EPUB was chosen as the standard for distributing the books, primarily because it makes copy protection (DRM) possible. All books in the writer's online store are still available only in this format.

The third version of the EPUB format was released in 2011. The developers added support for audio and video files and for footnotes. Today, the standard continues to evolve: in 2017, IDPF even became part of the W3C consortium, which develops technology standards for the World Wide Web.

How EPUB works

An EPUB book is a ZIP archive. It stores the text of the publication as XHTML or HTML pages or PDF files. The archive also contains media content (audio, video, or images), fonts, and metadata. It may also contain additional files with CSS styles or PLS documents with information for speech synthesis services.
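Because the container is an ordinary ZIP archive, it can be inspected with general-purpose tools. The sketch below is an illustration only: it builds a tiny EPUB-like archive in memory and lists its entries using Python's standard zipfile module. The EPUB/ folder, the file names, and the placeholder contents are hypothetical, so the result is not a valid book. Two details come from the EPUB container specification rather than from this article: the archive starts with an uncompressed mimetype entry containing application/epub+zip, and it includes a META-INF/container.xml file.

```python
import io
import zipfile

# Hypothetical file names and placeholder contents, chosen only for illustration.
DEMO_FILES = {
    "META-INF/container.xml": "<container/>",
    "EPUB/package.opf": "<package/>",
    "EPUB/nav.xhtml": "<html/>",
    "EPUB/chapter1.xhtml": "<html/>",
    "EPUB/css/style.css": "body { margin: 1em; }",
}


def build_demo_epub() -> bytes:
    """Build a tiny EPUB-like ZIP archive in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The "mimetype" entry is written first and stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, content in DEMO_FILES.items():
            archive.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_entries(epub_bytes: bytes) -> list:
    """Return the entry names of the archive in the order they were stored."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()


def read_mimetype(epub_bytes: bytes) -> str:
    """Return the content of the 'mimetype' entry as text."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.read("mimetype").decode("ascii")


if __name__ == "__main__":
    for entry in list_entries(build_demo_epub()):
        print(entry)
```

The same list_entries function should work on the bytes of a real .epub file, since it relies only on the ZIP structure.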

XML markup is responsible for displaying content. A book fragment with embedded audio and an image might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epub="http://www.idpf.org/2007/ops"
    epub:prefix="media: http://idpf.org/epub/vocab/media/#">
    <head>
        <meta charset="utf-8" />
        <link rel="stylesheet" type="text/css" href="../css/shared-culture.css" />
    </head>
    <body>
        <section class="base">
            <h1>the entire transcript</h1>
            <audio id="bgsound" epub:type="media:soundtrack media:background"
                src="../audio/asharedculture_soundtrack.mp3" autoplay="" loop="">
                <div class="errmsg">
                    <p>Your Reading System does not support (this) audio</p>
                </div>
            </audio>

            <p>What does it mean to be human if we don't have a shared culture? What
 does a shared culture mean if we can't share it? It's only in the last
 100, or 150 years or so, that we started tightly restricting how that
 culture gets used.</p>

            <img class="left" src="../images/326261902_3fa36f548d.jpg"
                alt="child against a wall" />
        </section>
    </body>
</html>

In addition to content files, the archive contains a special navigation document. It describes the order of the text and images in the book. Reader applications consult it when the reader wants to jump ahead several pages.
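As a rough illustration of how a reading application might use the navigation document, the sketch below extracts the table-of-contents links from a small sample. It assumes the EPUB 3 convention of an XHTML nav element marked with epub:type="toc". The sample markup, the chapter file names, and the read_toc function are hypothetical and are not taken from any particular reader.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Hypothetical navigation document, reduced to the bare minimum.
SAMPLE_NAV = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="toc">
      <ol>
        <li><a href="ch1.xhtml">Chapter 1</a></li>
        <li><a href="ch2.xhtml#s2">Chapter 2</a></li>
      </ol>
    </nav>
  </body>
</html>"""


def read_toc(nav_xhtml: str) -> list:
    """Return (label, href) pairs from the nav element whose epub:type includes 'toc'."""
    root = ET.fromstring(nav_xhtml)
    entries = []
    for nav in root.iter(f"{{{XHTML_NS}}}nav"):
        # epub:type may hold several space-separated values.
        types = nav.get(f"{{{EPUB_NS}}}type", "").split()
        if "toc" not in types:
            continue
        for link in nav.iter(f"{{{XHTML_NS}}}a"):
            label = "".join(link.itertext()).strip()
            entries.append((label, link.get("href")))
    return entries


if __name__ == "__main__":
    for label, href in read_toc(SAMPLE_NAV):
        print(f"{label} -> {href}")
```

A reader could map each href to a file inside the archive and open it when the user picks that entry.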

Another required file in the archive is the package document. It includes metadata: information about the author, publisher, language, title, and so on. It also includes a list (the spine) of the book's sections. An example of a package document can be viewed in the IDPF repository on GitHub.
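To make the role of the package document more concrete, here is a minimal sketch that reads the metadata and the spine from a small sample. The sample is hypothetical (invented title, author, identifier, and file names) and much shorter than a real package document. The namespace URIs and the manifest element, which maps item ids to file paths, are details of the EPUB and Dublin Core specifications that the article does not describe.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical package document with invented metadata and file names.
SAMPLE_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf"
         version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:hypothetical-id</dc:identifier>
    <dc:title>A Sample Book</dc:title>
    <dc:creator>Jane Doe</dc:creator>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="ch2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def read_package(opf_xml: str) -> dict:
    """Extract basic metadata and the reading order from a package document."""
    root = ET.fromstring(opf_xml)
    # The manifest maps item ids to file paths; the spine refers to those ids.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NAMESPACES)
    }
    spine = [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", NAMESPACES)
    ]
    return {
        "title": root.findtext("opf:metadata/dc:title", namespaces=NAMESPACES),
        "creator": root.findtext("opf:metadata/dc:creator", namespaces=NAMESPACES),
        "language": root.findtext("opf:metadata/dc:language", namespaces=NAMESPACES),
        "spine": spine,
    }


if __name__ == "__main__":
    print(read_package(SAMPLE_PACKAGE))
```

The spine lists ids rather than paths, so the sketch resolves each one through the manifest to get the files in reading order.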

Advantages

The main advantage of the format is its flexibility. EPUB lets you create a dynamic document layout that adapts to the screen size of the device. This is one of the main reasons why so many readers (and other electronic devices) support the format. For example, all ONYX BOOX readers work with EPUB out of the box, from the entry-level 6-inch Caesar 3 to the premium 9.7-inch Euclid.

Image: ONYX BOOX Caesar 3

Since the format is built on popular standards (XML), it is easy to convert for reading on the web. EPUB also supports interactive elements. PDF has similar elements, but they can only be added to a PDF document with proprietary software. In EPUB, they are added to the book with markup and XML tags in any text editor.

Another advantage of EPUB is its features for people with vision problems or dyslexia. The standard lets you change how text is displayed on the screen, for example by highlighting certain letter combinations.

As we have already noted, EPUB also gives the publisher the option to set copy protection. E-book sellers can optionally use their own mechanisms to restrict access to the document. To do this, they modify the rights.xml file in the archive.

Disadvantages

To create an EPUB publication, you need to understand the syntax of XML, XHTML, and CSS, and you have to work with a large number of tags and identifiers. For comparison, the FB2 standard includes only the minimum necessary set of tags, enough for typesetting fiction. And creating PDF documents requires no special knowledge at all: specialized software takes care of everything.

EPUB has also been criticized for how hard it is to lay out comics and other books with many illustrations. In this case, the publisher has to create a static layout with fixed coordinates for each image, which can take a lot of time and effort.

What's next

IDPF is currently working on new specifications for the format. One of them, for example, will help create interactive textbooks with hidden sections. The same book will look different for a teacher and a student: in the student's copy, for example, answers to tests or review questions will be hidden.

Image: Guian Bolisay / CC BY SA

The new feature is expected to help reorganize the educational process. Today, EPUB is actively used by large universities, such as the University of Oxford. A few years ago, it added EPUB 3.0 support to its digital library application.

IDPF is also creating a specification for embedding Open Annotation annotations in EPUB. This standard was developed by the W3C in 2013 and makes it easier to work with complex annotation types. For example, it can be used to attach a note to a specific region of a JPEG image. The standard also optionally provides a mechanism for synchronizing annotation changes between copies of the same EPUB document. Open Annotation annotations can be added to EPUB files even now, but a formal specification for them has not yet been adopted.

Work is also underway on a new version of the standard, EPUB 3.2. It will add the WOFF 2.0 and SFNT formats, which are used to compress fonts (in some cases, they can reduce file sizes by 30%). The developers will also replace some deprecated HTML attributes. For example, instead of a separate trigger element for activating audio and video files, the new standard will rely on the native HTML audio and video elements.

The draft specification and the list of changes are already available in the W3C GitHub repository.


Source: habr.com
