Over the past few days I’ve been reading up on e-book file formats. I have a collection of short stories I want to publish, and I have a working understanding of the technology that readers will use to embrace that content, but until recently I haven’t worried too much about delivering content to that technology. (The main reason for my delay is simply the pace of change. Time spent trying to understand or master e-content technology six months ago would have put me at buggy-whip risk.)
As luck would have it, Mark Coker just released data about the file formats most in use on Smashwords, his e-publishing site. At the same time, Joel Friedlander pointed me to a useful video tutorial about formatting content using Adobe’s InDesign software, which seems to be the tool of choice for many people. From these two sources of information I was able to understand and easily navigate the first fork in the road on my own publishing journey.
Like any author, what I want is for my text — the words I’ve written using whatever tools I’ve chosen to use — to be available to as many people as possible. That’s going to be my main, unchanging goal, no matter what else happens in the future. Because of the technological time we live in, reaching that goal means providing my text in various file formats so it can be accessed by the end user. Ideally there would be only one file format for publishing text, and it would be open source — meaning no one would own or control that particular format, and anyone could use it without having to pay a per-use fee or buy a proprietary application. (For obvious reasons, this is not the preferred course for companies looking to profit from the dissemination of text.)
According to Mark Coker’s file-format data covering the past year on Smashwords, Adobe’s proprietary .pdf file format was the most-used format at 35%, followed by the open-source .epub format at 22%. Mark also noted that this was a change from the previous year, when .epub beat .pdf handily.
Why would a proprietary format beat out an open-source format? In this case the answer has as much to do with the demands of the content being published as it does with the functionality of the file formats being used. As I recently learned, the .epub format’s strength is that it creates reflowable text: text that adjusts itself to the size of the display, the font being used (if the user is able to change fonts), the size of the text, and other reader settings.
From the point of view of many authors, however, this is also .epub’s weakness. If what you are publishing is simply a long string of text — as most fiction tends to be — then .epub works fine. If your content includes tables, images, sidebars and other layout-specific elements, then .epub quickly becomes a nightmare because you cannot control when and how these elements will display across all of the various e-readers and viewing applications.
The .pdf file format solves these layout-specific problems because it creates a static image — a picture — of each page. From the author’s point of view this is a godsend, because content will always display the same way for every user. For users, however, there is a downside. Precisely because .pdf text is not reflowable, it will not resize to fit each device or user setting. This means some users on some devices will need to zoom in and out to clearly see things like captions, table data, or sidebar text that may be in a smaller font. All of the information will be present as the author intended, but if the original page was 9 inches high by 6 inches wide, and the end user is looking at that same content on a Kindle or iPhone, there’s probably going to be some zooming involved — provided the device supports that functionality.
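As a rough illustration of the difference between reflowable and fixed-layout text, here is a minimal Python sketch. It is not how any e-reader actually works. The function names, the character widths, and the 3.5-inch screen are assumptions made up for the example; only the 6-inch page width comes from the text above.

```python
import textwrap


def reflow(text: str, chars_per_line: int) -> list[str]:
    """Re-wrap one paragraph for a display that fits chars_per_line characters."""
    # Collapse existing whitespace first so old line breaks do not survive.
    return textwrap.wrap(" ".join(text.split()), width=chars_per_line)


def fixed_page_scale(page_width_in: float, screen_width_in: float) -> float:
    """Scale factor needed to fit a fixed-width page onto a narrower screen."""
    # A fixed page is never enlarged here, only shrunk to fit.
    return min(1.0, screen_width_in / page_width_in)


paragraph = (
    "Reflowable text adjusts itself to the size of the display "
    "and the size of the type."
)

# Hypothetical displays: a narrow phone and a wider tablet.
phone_lines = reflow(paragraph, 24)
tablet_lines = reflow(paragraph, 60)

# Same words on both devices, just broken into lines differently.
for line in phone_lines:
    print(line)

# A 6-inch-wide page shown on an assumed 3.5-inch-wide screen.
scale = fixed_page_scale(6.0, 3.5)
print(f"Fixed page is drawn at {scale:.0%} of its designed size")
```

The reflowed version keeps the type at a readable size and changes the line breaks; the fixed page keeps the line breaks and shrinks everything, which is where the zooming comes from.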
Because the stories I want to publish are straight text, the reflowable .epub format not only meets my needs as an author, but it provides the most transparent reading experience for end users. That’s a win-win for me because I don’t have to make any trade-offs between my own authorial needs and the end-user reading experience. Having said that, the appeal of the .pdf format is clear because it preserves all the work an author puts into page layout and structure. If I had content that was dependent on images, data or layout, I’d have to decide whether to use .pdf, or how to translate all of those assets into .epub-friendly equivalents. Ugh.
As a follow-up, I encourage you to watch the InDesign tutorial I mentioned above. I learned a lot in the few short minutes it took to watch, and I think it will give you valuable insight into these issues. It will also introduce you to the learning curve you’ll be facing if you decide you want to do some of the more complex stuff yourself.
As for me, I won’t be buying InDesign any time soon, for three reasons. First, at $699 it’s pricey. Second, the life of a successful writer will be defined as much by keeping costs under control as by anything else, so there aren’t going to be a lot of dollars going out until there are a lot of dollars coming in. Third, if I ever decide the software is worth having, I’ll still want to compare the total cost (in time and money) of buying it, learning how to use it and paying for future upgrades with the total cost of having someone else provide that service. If I can get the end result cheaper I’ll go with the service; if not, I’ll buy the software.
It should be noted that it is not necessary to buy InDesign, or any Adobe application, in order to create a .pdf file from most commonly created source text. Adobe’s proprietary tool for creating .pdf files is called Acrobat, and it currently sells for $299 in the U.S. The application that most computer users use to read .pdf files is called the Acrobat Reader, and it is distributed free, meaning anyone can freely read content that someone else has created as a .pdf. What is less commonly known is that the OpenOffice suite of applications includes .pdf export in its Writer application, allowing documents created in a wide variety of file formats (including MS Word .doc) to be exported as .pdf files. More info here (scroll down).
— Mark Barrett
Mark,
I admire your tenacity and the logical approach you are taking to your quest. Like many other specialized production tasks that have been thrust into our laps in recent years, doing these format conversions has to rank as one of the most thankless. I mean, why learn the ins and outs of something one is only going to do rarely, if at all?
On my Mac I can export to PDF from any print dialog, which is handy. PDF itself bears looking into because there are many “flavors” of PDF designed to produce documents for different purposes. A lot of the offset printing industry standardized on a PDF workflow years ago, but the PDFs we make for high-end print production are quite different from the ones pumped out by Word.
Looking forward to the next installment!
I think an honest answer would be this: it’s just me.
I’m always interested in process, and in understanding how things are done. Given the historical moment we’re in, this file-format stuff seems like useful information, if only so I can blather about it when I’m old and gray and everyone is reading text on their retinas from projectors implanted in their skulls.
I look around and people are putting their work up on Smashwords or making POD books, and I don’t know how to do that. I know I could pay someone to do it for me, but if I want to do it more than once, or I want to keep my one-time costs under ruthless control, one option is to try to find out what all those people already know. So that’s what I’m doing.
It may well be folly. At times it feels like folly. But it’s still interesting to me.
Remember SGML versus PDF? PDF is an open standard currently used by all; it has clear synergy with printing and design, and it does what it says on the can.
EPUB is a container and offers much, but it has many ‘open’ issues.
There is much hype and much pragmatism, as always.
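To make the "container" point concrete: an EPUB file is a ZIP archive whose first entry is an uncompressed file named mimetype, with a META-INF/container.xml entry that points to the package document. The sketch below builds that skeleton in memory with Python's standard zipfile module. The function name, the OEBPS/ paths, and the placeholder file contents are assumptions for illustration, and the result is not a complete or validated e-book.

```python
import io
import zipfile

# Tells a reading system where to find the package document.
# The OEBPS/content.opf path is a common convention, assumed here.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_skeleton(files: dict[str, str]) -> bytes:
    """Zip the given files into an EPUB-style container and return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        archive.writestr(
            "META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED
        )
        for name, content in files.items():
            archive.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Placeholder contents only; a real book needs a proper package document.
epub_bytes = build_epub_skeleton(
    {
        "OEBPS/content.opf": "<!-- package document goes here -->",
        "OEBPS/story.xhtml": "<!-- story text goes here -->",
    }
)

with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
    entry_names = archive.namelist()
    print(entry_names)
```

Everything that matters about layout and metadata lives in the files inside the archive, which is why the format can offer so much and still leave so many questions open.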
You’re right, of course. There’s a lot to like about .pdf, including the fact that a major corporation has invested a massive amount of capital in projecting and protecting the brand and its functionality. You’re also right that open-source is not a panacea. I’ve monkeyed around with GIMP enough to see the difference between the two development paths, and I’m not sure I prefer one over the other.
Still, I’m not a fan of proprietary tools that become industry standards. MP3 may be as ubiquitous as .pdf (or more so), but it’s not an open standard. Even if nobody can figure out who owns the licensing rights, it’s clear somebody does. I’d prefer not to see that happen to publishing. (Imagine if HTML required a license to use.)
Mark:
I don’t think you need to settle on a single format … but if you were going to do so, I would encourage you to use HTML.
HTML lacks PDF’s ability to set a rigid format…but to me, that is a strength, not a weakness, because I can’t tell you the number of PDFs I’ve stumbled across that are a royal pain to read on my screen, especially if a multi-column format has been used.
HTML is extremely user friendly. Almost every device from PC to smartphone to tablet comes with a web browser, even the Kindle and the upcoming iPad. In short, if you’re on the web, you can read an HTML ebook. Your stated goal is to keep your books as accessible as possible, and HTML does the job.
Sure, you lose *absolute* control over placement of graphics and charts, but HTML does give you *enough* control to ensure the correct flow of information.
Not only that, HTML is easy to convert to other formats. You can save as plain text or PDF with no need for manual adjustment on your part. (I use Puppy Linux, a Linux live CD, and it has an instant “print to PDF” option within SeaMonkey, the Mozilla-based browser.) It is relatively easy to convert from HTML to EPUB, Mobi (for the Kindle) and any number of other formats with the right software, such as Sigil or Calibre.
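As a small illustration of how mechanical the HTML-to-plain-text step can be, here is a sketch using only Python's standard html.parser module. The class name, the function name, and the list of block-level tags are assumptions chosen for the example; it handles simple story markup and is no substitute for a full converter such as Calibre.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, starting a new line per block."""

    # Tags treated as paragraph-like breaks (an assumed, incomplete list).
    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "li"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of simple HTML, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Normalize spacing inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


sample = "<h1>Title</h1><p>First <em>line</em>.</p><p>Second.</p>"
print(html_to_text(sample))
```

Inline tags such as em are simply dropped, so emphasis is lost in the plain-text version, which is the kind of trade-off each conversion involves.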
Hi Bill,
The choice between .pdf and .epub was meant to be Smashwords-centric. I work in Word and want the shortest conversion process between that format and being able to publish to that site.
I’m also still hazy on the difference between HTML (and varieties thereof) and XML. There seems to be some agreement that XML is the (or a) growing standard, which is good, but there are still so many downstream conversion choices (as you point out).
I know I have a great deal more to learn, and that I also need to think seriously of how to convert my text(s) to one basic format which can then easily be moved to all supported reader devices. Smashwords seems a fluke in the respect that they prefer content coming in via the .doc format. It’s a useful fluke for me personally, but I know it’s not the norm.
PDF has been, since 2008, an officially open format. See http://en.wikipedia.org/wiki/Portable_Document_Format :
“While the PDF specification was available for free since at least 2001, PDF was originally a proprietary format controlled by Adobe, and was officially released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008. The ISO 32000-1 allows use of some specifications, which are not standardized (e.g. Adobe XML Forms Architecture). ISO 32000-1 does not specify methods for validating the conformance of PDF files or readers. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting a royalty-free rights for all patents owned by Adobe that are necessary to make, use, sell and distribute PDF compliant implementations.”