CN103761277A

CN103761277A - ePub electronic book loading method and system

Info

Publication number: CN103761277A
Application number: CN201410010411.9A
Authority: CN
Inventors: 陈轶; 王玮; 潘腾; 吴远青; 王旭东; 郭伟
Original assignee: BEIJING ZHANGKUO TECHNOLOGY Co Ltd
Current assignee: BEIJING ZHANGKUO TECHNOLOGY Co Ltd
Priority date: 2014-01-09
Filing date: 2014-01-09
Publication date: 2014-04-30

Abstract

The invention discloses an ePub electronic book loading method and system. The method includes the following steps. First, an ePub electronic book selected by a user is parsed to obtain its table of contents information and, for the corresponding body text, the text information and/or multimedia index information and the multimedia resource file name information. Second, the table of contents of the selected ePub electronic book is parsed to obtain the text information and/or multimedia index information and the multimedia resource file name information of the body text that corresponds to a table of contents entry. Finally, that content is rendered and the rendered chapter is displayed to the user. The method supports mixed layout of text with multimedia resources such as pictures, audio, and video. Only the one body-text file specified by the user is parsed, so the content of the whole book is not loaded into memory, and memory load is greatly reduced.

Description

Method and system for loading ePub electronic book

Technical Field

The invention belongs to the field of mobile reading, and relates to a method and a system for loading a book file in an ePub format.

Background

Existing ePub parsing is generally done in browsers. On conventional PCs the screen is relatively large, and users are accustomed to dragging a browser's scroll bar with a mouse or keyboard. On a mobile phone, however, the screen is relatively small, so reading by scrolling is clearly unfriendly to the user. In addition, browser styling is not suited to reading on a mobile phone: the user often has to drag the page left and right to see all of the text, which greatly degrades the reading experience.

At present, a good reading experience on a mobile phone is one in which the user reads further by turning pages, and the content does not scroll vertically or horizontally. The user should also be able to adjust line spacing and font size freely to suit personal preference. Therefore, an ePub parsing engine on a mobile phone needs to be designed to match the e-book rendering engine that consumes its output.

Some self-developed ePub rendering engines exist on the market, but most of them are plug-in code ported directly from an existing browser. As a result, the programs are large and heavy and load slowly on a mobile phone, and many of them do not support font or line-spacing adjustment.

Common problems with existing ePub rendering engines on mobile phones are as follows:

1. The implementation is large and parsing is slow.

2. Some e-book readers offer font adjustment, but each adjustment takes a long time, because these implementations load the entire content of the e-book into memory and then reprocess it sequentially for the new font, without splitting it into chapters.

3. Almost no existing mobile phone implementation supports mixed text-and-image layout or multimedia playback.

Disclosure of Invention

The invention aims to provide a method and a system for parsing book files in the ePub format.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A method for loading an ePub e-book comprises the following steps:

parsing the ePub e-book selected by a user to obtain its table of contents information and the corresponding text information and/or multimedia index information and multimedia resource file name information;

parsing the table of contents of the ePub e-book selected by the user;

and acquiring the text information and/or multimedia index information and multimedia resource file name information of the body text corresponding to the table of contents, rendering it, and displaying the rendered chapter content to the user.

Preferably, parsing the ePub e-book selected by the user to obtain its table of contents information, the text information and/or multimedia index information of the corresponding body text, and the multimedia resource file name information includes:

parsing the content.opf file in the ePub e-book to acquire the title, the author, and other descriptive information about the whole e-book;

acquiring the body-text files of the e-book and parsing them to acquire the text information and/or multimedia index information and multimedia resource file name information of the corresponding text;

and acquiring the NCX file of the e-book and parsing it to obtain the table of contents information of the e-book.

Preferably, the method further comprises the following steps:

storing the text information and/or multimedia index information of the body text as one linear list; and/or storing the multimedia resource file name information as a separate linear list.

Preferably, acquiring the text information and/or multimedia index information and multimedia resource file name information corresponding to the table of contents, rendering it, and displaying the rendered chapter content to a user includes:

acquiring the text information and/or multimedia index information and multimedia resource file name information of the body text corresponding to the table of contents, paginating the text information and/or multimedia index information, and displaying the pages to the user;

when the user clicks a multimedia index entry, looking up the corresponding multimedia resource file name by that index and displaying the corresponding multimedia resource as a separate page.

Preferably, paginating the text information and/or multimedia index information and displaying it to the user further comprises:

paginating the text information and/or multimedia index information, caching the first page and displaying it to the user, and pre-caching the content of the following page while the user reads the current page.

Preferably, parsing the ePub e-book selected by the user further includes:

parsing a body-text file, acquiring the escape sequences (HTML character entities) it contains, and putting them into the text information and/or multimedia index information of the text; at the same time, parsing the HTML tags during the subsequent pass and processing each tag that the parser supports accordingly;

wherein the HTML tag includes:

any one or a combination of the following: audio, bold, body, line break, head, headings, italics, image references, hyperlinks, paragraphs, titles, and video tags.

Preferably, acquiring the text information and/or multimedia index information and multimedia resource file name information of the body text corresponding to the table of contents, and paginating and displaying the text information and/or multimedia index information to the user, specifically comprises:

putting the text information and/or multimedia index information and the multimedia resource file name information of the body text into a linear content list for storage, each element of which comprises:

the text content or an image/video link, stored as a character string; a flag indicating whether that string is text content or a multimedia item; and a multimedia (audio) link.

Preferably, the method further comprises: acquiring the user's font or line-spacing adjustment and re-laying out the text accordingly;

and each time the pages are re-laid out, clearing the previously constructed linear content list and rebuilding it.

A system for loading ePub e-books, comprising:

a parsing engine module and a rendering engine module, wherein the parsing engine module is configured to parse an ePub e-book selected by a user to obtain its table of contents information and the corresponding text information and/or multimedia index information and multimedia resource file name information;

and to parse the table of contents of the selected ePub e-book to obtain the text information and/or multimedia index information and multimedia resource file name information corresponding to the table of contents;

and the rendering engine module is configured to render the chapter content and display the rendered chapter content to the user.

Preferably, the parsing engine module is further configured to parse the content.opf file in the ePub e-book to obtain the title, the author, and other descriptive information about the whole e-book;

With this scheme, the invention has the following advantages:

The invention mainly extracts the text information and multimedia link information of the e-book and discards other files used for web page layout, such as CSS files. As a result, the layout drawn by the parser and renderer is uniform for all books.

In addition, unlike existing methods that load the content of the whole book into memory, the method and system render the book chapter by chapter, according to the chapter content in the corresponding ePub files, so the response is very fast when the user adjusts line spacing and font size.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The present invention will be described in detail below with reference to the accompanying drawings so that the above advantages of the present invention will be more apparent. In the drawings:

Fig. 1 is a schematic structural diagram of a system for loading an ePub e-book according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the parsing process of the parsing engine module of a system for loading an ePub e-book according to an embodiment of the present invention;

Fig. 3 is a flowchart of a method for loading an ePub e-book according to an embodiment of the present invention.

Detailed Description

The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.

Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases, the steps illustrated or described may be performed in an order different than here.

Specifically, as shown in Fig. 1, a system for loading an ePub e-book mainly includes a parsing engine module and a rendering engine module.

Unlike the prior art, the parsing engine module in this embodiment mainly extracts the text information and multimedia link information of the e-book and discards other files used for web page layout, such as CSS files, so the layout drawn by the parser and renderer is uniform for all books. This gives the user a uniform reading experience and makes it easier to add further functions to customized books.

In addition, unlike existing methods that load the content of the whole book into memory, in this embodiment the book is processed chapter by chapter, according to the chapter content in the corresponding ePub file, so the response is very fast when the user adjusts line spacing and font size.

In addition, whereas existing systems support only text, the parsing engine module in this embodiment supports the media file formats supported by ePub, and the rendering engine module plays multimedia according to the media file type.

Specifically, a loading system of an ePub electronic book includes:

As shown in Fig. 3, a method for loading an ePub e-book includes:

Step 1: parsing the ePub e-book selected by a user to obtain its table of contents information and the corresponding text information and/or multimedia index information and multimedia resource file name information;

Step 2: parsing the table of contents of the ePub e-book selected by the user;

Step 3: acquiring the text information and/or multimedia index information and multimedia resource file name information of the body text corresponding to the table of contents, rendering it, and displaying the rendered chapter content to the user.

Preferably, the method further comprises the following steps:

Preferably, parsing the ePub e-book selected by the user further includes:

wherein the HTML tag includes:

To make the above advantages of the present invention clearer, the implementation is described in detail below. Every e-book package that complies with the ePub specification must include a subdirectory named META-INF containing a file named container.xml.

This file contains the path of the OPF file. The container.xml file is read from the META-INF subdirectory to obtain this information: for example, an XML parser scans the <rootfile> tag and reads the value of its "full-path" attribute, which is the path of the OPF file.

Because one package may contain several OPF files (for example, a set that bundles several books), all <rootfile> tags need to be scanned.
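As a rough illustration of this first step, the Python sketch below (standard library only) opens an .epub file as a ZIP archive, reads META-INF/container.xml, and collects the full-path attribute of every <rootfile>. The function name opf_paths is hypothetical and is not taken from the described implementation, which targets iOS; the namespace URI is the one defined for the OCF container file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# Namespace of META-INF/container.xml as defined by the OCF container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def opf_paths(epub_file) -> List[str]:
    """Return the full-path of every <rootfile> in META-INF/container.xml.

    `epub_file` may be a path or a binary file-like object holding the .epub ZIP.
    (Hypothetical helper for illustration.)
    """
    with zipfile.ZipFile(epub_file) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
    # A package may list several OPF files (e.g. a multi-book set), so scan them all.
    return [rf.get("full-path") for rf in root.iterfind(".//c:rootfile", CONTAINER_NS)]


if __name__ == "__main__":
    # Build a tiny in-memory package that contains only container.xml.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "META-INF/container.xml",
            '<container version="1.0" '
            'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
            '<rootfiles><rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/></rootfiles></container>',
        )
    buf.seek(0)
    print(opf_paths(buf))  # ['OEBPS/content.opf']
```

The returned paths are relative to the root of the ZIP archive, which is how full-path values are interpreted.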

The OPF file contains information about the entire book, such as the title, author, and copyright. It also contains the table of contents, cover information, and the name of the XML body-text file for each chapter; according to the specification, the body-text file paths are given relative to the location of the OPF file.

When parsing the OPF, the following tags need to be parsed: <dc:title> (title), <dc:creator> (author), <dc:language> (language), <dc:rights> (copyright), and <dc:publisher> (publisher).
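A minimal sketch of this metadata step, assuming the OPF content is already available as a string and using Python's xml.etree.ElementTree. The name read_metadata is hypothetical, and the sketch keeps only the first value of each Dublin Core element listed above (returning None when an element is absent).

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Dublin Core namespace used for <dc:...> metadata in content.opf.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}
DC_FIELDS = ("title", "creator", "language", "rights", "publisher")


def read_metadata(opf_xml: str) -> Dict[str, Optional[str]]:
    """Map each Dublin Core field named in the text to its first value, or None."""
    root = ET.fromstring(opf_xml)
    meta: Dict[str, Optional[str]] = {}
    for name in DC_FIELDS:
        el = root.find(f".//dc:{name}", DC_NS)
        meta[name] = el.text.strip() if el is not None and el.text else None
    return meta
```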

Next comes a very important tag in the OPF: <manifest>. It lists the file name of the book's table of contents and the body-text file name of each chapter and section.

Specifically, each entry in this tag is an <item> tag with an id attribute, an href attribute, and a media-type attribute. In an embodiment, a dictionary is used to store this information, in which:

the key is the id, and the value is a reference to a structure composed of the href and the media-type.

In an embodiment, the parsing engine module ignores CSS files and page template files without parsing them. After all <item> entries in <manifest> have been collected, another tag, <spine>, is encountered. This tag contains the reference to the book's table of contents and the reading order of the chapters.

The toc attribute of the <spine> tag gives the manifest id of the NCX file, through which the NCX file name is found; the NCX file contains more detailed table of contents information. All of the following entries are <itemref> tags.

Each <itemref> tag has an idref attribute that corresponds to the id of an <item> in <manifest>, so the file name for each <itemref> can be found immediately.

Because the <itemref> entries in <spine> are already in reading order, we store them in a linear list.
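The manifest dictionary and spine list described above might be built as follows. This is a sketch under assumptions: ManifestItem and read_manifest_and_spine are hypothetical names, and the set of skipped media types (CSS plus Adobe page templates) is an illustrative guess at what "page template files" covers.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NS = {"opf": "http://www.idpf.org/2007/opf"}
# Layout-only files skipped by the parser (assumption: CSS and .xpgt page templates).
SKIPPED_TYPES = {"text/css", "application/vnd.adobe-page-template+xml"}


@dataclass
class ManifestItem:
    href: str        # file name, relative to the OPF file
    media_type: str


def read_manifest_and_spine(
    opf_xml: str,
) -> Tuple[Dict[str, ManifestItem], List[str], Optional[str]]:
    """Return (id -> item dictionary, ordered spine ids, NCX href or None)."""
    root = ET.fromstring(opf_xml)
    manifest: Dict[str, ManifestItem] = {}
    for item in root.iterfind("opf:manifest/opf:item", NS):
        if item.get("media-type") in SKIPPED_TYPES:
            continue
        manifest[item.get("id")] = ManifestItem(item.get("href"), item.get("media-type"))

    spine = root.find("opf:spine", NS)
    toc_id = spine.get("toc") if spine is not None else None
    ncx_href = manifest[toc_id].href if toc_id in manifest else None
    # <itemref> entries are already in reading order, so a plain list suffices.
    order = [ref.get("idref") for ref in root.iterfind("opf:spine/opf:itemref", NS)
             if ref.get("idref") in manifest]
    return manifest, order, ncx_href
```

Keeping the spine as a list of ids (rather than file names) mirrors the description: each id is resolved to a file name through the dictionary only when that chapter is needed.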

After the NCX file path is acquired, the NCX file can be opened.

The <docTitle> tag gives the title of the table of contents. It is followed by the core tag of the NCX file, <navMap>. <navMap> is a navigation map: each entry in it allows a quick jump to the corresponding content of the book. The <navMap> tag consists of a set of <navPoint> tags.

Each <navPoint> tag describes one navigation entry in detail. It has an id attribute; note that this id may differ from the id of the <item> in <manifest> described earlier.

It also has a playOrder attribute, an integer giving the entry's position in the table of contents. In addition, it contains a <navLabel> tag describing the title of the entry, and a <content> tag specifying which body-text file to jump to.

Therefore, in an embodiment, the parsing engine module further includes a storage unit configured to store each entry of <navMap> in a linear list, sorted by the playOrder of its <navPoint>. A structure stores the <navLabel> content and the body-text reference given in the <content> tag.
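A sketch of this storage step, again with ElementTree; NavEntry and read_nav_map are hypothetical names. Each <navPoint>, including nested ones, becomes one structure holding the <navLabel> text and the <content> src, and the linear list is sorted by playOrder.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple

NCX = "http://www.daisy.org/z3986/2005/ncx/"
NS = {"ncx": NCX}


@dataclass
class NavEntry:
    play_order: int
    label: str  # text of <navLabel>
    src: str    # body-text file (possibly with a #fragment) named by <content>


def read_nav_map(ncx_xml: str) -> Tuple[str, List[NavEntry]]:
    """Return the <docTitle> text and the <navMap> entries sorted by playOrder."""
    root = ET.fromstring(ncx_xml)
    title = root.findtext("ncx:docTitle/ncx:text", default="", namespaces=NS).strip()
    entries: List[NavEntry] = []
    nav_map = root.find("ncx:navMap", NS)
    if nav_map is not None:
        for point in nav_map.iter(f"{{{NCX}}}navPoint"):  # includes nested entries
            content = point.find("ncx:content", NS)
            entries.append(NavEntry(
                int(point.get("playOrder", "0")),
                point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NS).strip(),
                content.get("src", "") if content is not None else "",
            ))
    entries.sort(key=lambda e: e.play_order)
    return title, entries
```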

After the table of contents has been processed, it is displayed to the user, and each body-text file is parsed only when the rendering engine module needs it, in response to the user's click.

The body-text file is usually an XHTML file. In either case, the file begins with an XML declaration, and the <html> element is embedded after it. The XML part specifies the document type and the character encoding. Because HTML on the Internet almost universally uses UTF-8, the XML part does not need to be parsed in detail, and the parser can scan directly to the <html> tag.

HTML has many built-in tags and escape sequences (character entities).

In an embodiment, the parsing engine supports all HTML escape sequences. Because the parsing engine is mainly used for article layout on mobile phones, only tags related to splitting the article into segments, plus the <link> tag, are supported; other tags are ignored. The tags supported by the ePub parsing engine are as follows:

<audio> represents audio. When this tag is encountered, the parsing engine saves the audio link in a dedicated multimedia link list. Each element of this list has two parts: the first is the hyperlink to the resource, and the second is the resource type. The renderer supports three multimedia types: image, audio, and video.

<b> represents bold. It tells the rendering engine to draw the text between <b> and </b> in a bold font. When this tag is encountered, the parsing engine records the start index of the bold text in a linear list, and then records the end index when it reaches </b>.

<body> represents the body part.

<br> represents a line break. As required by the rendering engine, the parsing engine uniformly represents a line break with the '\r' character.

<h1> to <h6> represent headings 1 to 6, each distinguished by a different font size. When the <h.

<head> represents the head of the section and typically contains a <title> tag giving the section title.

<html> marks the start of the HTML; everything after this tag is HTML content.

<i> represents italics. When the parser encounters <i> and </i>, the process is similar to that for <b>: the text segment is saved in a linear list for later use by the rendering engine and is drawn in an italic font.

<img> represents a reference to an image. When the parser encounters an <img> tag, the preceding text is saved as a string element in the linear list. The image link in the current <img> tag is then saved in the linear list dedicated to multimedia links, with the multimedia type marked as image. Finally, the index of the image in that multimedia list is saved as the next element of the text list.

<link> is another important tag. Besides images, it can reference video, audio, and so on. This tag is handled in the same way as the <img> tag.

<p> represents a new paragraph. When this tag is encountered, the parser automatically inserts a line break.

<title> represents a title. If this tag is inside <head>, it is taken as the chapter title; if it appears inside <body>, it is ignored.

<video> represents video. It is processed in the same way as <img>, with the multimedia type marked as video.

That is, in this embodiment, after parsing all of the text and multimedia tags, the parsing engine module holds two linear lists.

Specifically, one is a linear list containing the text and the indexes of multimedia elements, and the other is a linear list dedicated to multimedia resource file names. Both lists are then passed to the rendering engine for processing.
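The tag handling and the two linear lists can be approximated with Python's html.parser module. This is a simplified, hypothetical sketch, not the engine described here: it covers <head>/<title>, <p>, <br>, and the src attribute of <img>, <audio>, and <video>, while bold/italic index ranges and <link> handling are omitted. Integer elements of content are indexes into media, mirroring the mixed string/integer list described in the text.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple, Union

# Tags whose src attribute is moved into the multimedia list (assumption: src only).
MEDIA_TAGS = {"img": "image", "audio": "audio", "video": "video"}
LINE_BREAK = "\r"  # line-break marker expected by the renderer, per the description


class ChapterParser(HTMLParser):
    """Hypothetical parser producing the two linear lists described above."""

    def __init__(self) -> None:
        # convert_charrefs=True makes HTMLParser unescape entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.content: List[Union[str, int]] = []  # text strings or indexes into media
        self.media: List[Tuple[str, str]] = []    # (resource link, media type)
        self.title: Optional[str] = None
        self._buf: List[str] = []
        self._in_head = False
        self._in_title = False

    def _flush(self) -> None:
        # Save the text collected so far as one string element.
        text = "".join(self._buf)
        if text.strip():
            self.content.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif tag == "title":
            self._in_title = True
        elif tag in ("p", "br"):
            self._buf.append(LINE_BREAK)
        elif tag in MEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:
                self._flush()  # text before the media element becomes its own element
                self.media.append((src, MEDIA_TAGS[tag]))
                self.content.append(len(self.media) - 1)

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            if self._in_head:  # <title> in <head> is the chapter title
                self.title = (self.title or "") + data.strip()
            return  # <title> inside <body> is ignored
        if not self._in_head:
            # Collapse source whitespace; layout comes from LINE_BREAK only.
            self._buf.append(re.sub(r"\s+", " ", data))

    def close(self):
        super().close()
        self._flush()
```

For example, feeding `<p>Tom &amp; Jerry</p><img src='a.png'/>` yields the text element `"\rTom & Jerry"` followed by the integer `0`, and `media[0]` is `("a.png", "image")`.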

For the parsing engine, the complete data flow is shown in Fig. 2.

The rendering engine module has three main kinds of data available: the title, author, and other information about the whole book; the book's table of contents; and the book's body text.

Information about the whole book, such as the title and author, is available once content.opf has been parsed; the table of contents is obtained by parsing the NCX file; and the body text is obtained by first finding a body-text file through its file name in content.opf.

The rendering engine module can decide how to display information such as the title, author, creator, publisher, and copyright. In the simplest embodiment, for example, a cover image found in the OPF file serves as the book's cover. The book's table of contents is then displayed according to the content of the NCX file; it only needs to be ordered by the <navPoint> tags in the NCX.

In the present invention, the rendering engine module is mainly used to render the body text; that is, in this embodiment, text and multimedia files are rendered separately.

A picture or video is shown as a separate page.

For audio, the user is given a prompt on the content page containing the audio, and the audio is played after the user clicks a button. Alternatively, the renderer implementer can play the audio automatically when the user turns to a page containing audio.

The rendering engine module of this embodiment is briefly described below; its working mechanism and components are as follows:

For the body-text file of a given chapter, a linear list stores the entire content of the chapter, and each element of this list has three members:

1. the text content or an image/video link (string);

2. whether member 1 is text or multimedia (Boolean);

3. an audio link.

A chapter of an e-book is generally not very large (a few hundred pages at most), so storing this information in a linear list saves time and space. And because the list does not hold the content of every chapter in the book, it also makes it very easy to determine the current page number and to fetch content on demand.

The rendering engine module further comprises a paging unit configured to read the text linear list produced by the ePub parsing engine module. If the current node is text, the paging unit checks whether the next list node is of audio type. If it is not, the text is paginated according to the font size and line spacing currently set by the user, and each page's content is stored in the chapter content linear list with the multimedia-type member set to "No" and the audio link set to null;

if the next list node is of audio type, then after pagination is complete, the audio link member of the text's first page node is set to this audio link.

If the current node is a video or image, the content member of the node is set to the video or image link and the multimedia-type member is set to "Yes". Finally, the next node is checked: if it is audio, the node's audio link is set to that audio link; otherwise it is set to null. In this way, the content of a given chapter is paginated.
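The paging rules above can be sketched as follows. The fixed-width line wrapping (characters per line from page width divided by font size, lines per page from page height divided by font size times line spacing) is a hypothetical stand-in for real font measurement such as CoreText, and PageNode and build_chapter_pages are illustrative names. The input is the pair of lists from the parsing step: strings for text, and integers indexing a list of (link, media type) pairs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class PageNode:
    content: str               # page text, or an image/video link
    is_media: bool             # True if `content` is an image/video link
    audio_link: Optional[str]  # audio attached to this page, or None


def split_pages(text: str, chars_per_line: int, lines_per_page: int) -> List[str]:
    """Greedy fixed-width wrap; a stand-in for real font measurement."""
    lines: List[str] = []
    for para in text.split("\r"):
        if not para:
            continue
        for i in range(0, len(para), chars_per_line):
            lines.append(para[i:i + chars_per_line])
    return ["\n".join(lines[i:i + lines_per_page])
            for i in range(0, len(lines), lines_per_page)]


def build_chapter_pages(content: List[Union[str, int]], media: List[Tuple[str, str]],
                        font_size: float, line_spacing: float,
                        page_w: float, page_h: float) -> List[PageNode]:
    chars_per_line = max(1, int(page_w // font_size))
    lines_per_page = max(1, int(page_h // (font_size * line_spacing)))

    def audio_at(i: int) -> Optional[str]:
        # Audio link of node i, if node i exists and is an audio entry.
        if i < len(content) and isinstance(content[i], int):
            link, kind = media[content[i]]
            if kind == "audio":
                return link
        return None

    pages: List[PageNode] = []
    for i, node in enumerate(content):
        if isinstance(node, str):
            first = len(pages)
            pages.extend(PageNode(t, False, None)
                         for t in split_pages(node, chars_per_line, lines_per_page))
            audio = audio_at(i + 1)
            if audio and len(pages) > first:
                pages[first].audio_link = audio  # attach to the text's first page
        else:
            link, kind = media[node]
            if kind in ("image", "video"):
                pages.append(PageNode(link, True, audio_at(i + 1)))
            # audio nodes were already attached to the preceding page
    return pages
```

Calling build_chapter_pages again with a new font size simply produces a fresh list, which matches the clear-and-rebuild behaviour described for font and line-spacing changes. In this sketch, an audio node with no preceding text or image node is dropped.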

The rendering unit in the rendering engine module then renders the content of a given page from the constructed content linear list.

There are many ways to render: text can be drawn through a high-level interface provided by the system or through a low-level interface provided by the system. Because an image or video occupies a page by itself, it is easy to handle with conventional means, which are not described in detail here.

For example, when the user changes the font size or line spacing, all the content of the current chapter must be re-laid out, because the content of each page may change. However, because the amount of content involved is very limited, this processing is fast.

Note that, to implement the invention properly, the content provided by the ePub parsing engine module must not be destroyed; otherwise, problems arise when the pages are re-laid out for the current settings. For fast rendering, each time the pages are re-laid out, the entire previously constructed content linear list is cleared and then rebuilt, so that memory is used efficiently and not wasted.

In addition, in this embodiment, after an image, video, or audio has been shown, if the user turns to the next or previous page, the multimedia playback resources of the current page are released to save memory and CPU resources.

The ePub parsing engine and rendering engine are described in more detail below using one of the company's existing products on iOS.

Specifically, in one embodiment, the product is a bookstore ("book city") application.

The ePub e-book loading system first scans the locally stored ePub books and displays, on the bookshelf, the cover of each book obtained from previous parsing.

When the user taps a book, the application activates and initializes the ePub parser, which parses the specified ePub package. The ePub parser first finds the OPF file path specified in container.xml and then parses the OPF file to obtain information such as the title, author, and copyright. The complete class is shown below:

Here, mContanainerVersion is the ePub protocol version number; mBaseFilePath is the root directory of the book; mConntetBaseDir is the body-text root directory path; mConntentFullPath is the full path; mBookTitle is the title of the book; mBookAuthor is the author of the book; mCopyRights is the copyright; mPublisher is the publisher; and mCoverImagePath is the cover image path. manifest ditect is a dictionary that, after the <manifest> tag has been parsed, stores the key and detailed information of each body-text file entry. mToRecID is the NCX file path, and mOrderedContentIDs is the linear list of body-content IDs in reading order.

After the OPF has been parsed, the NCX file is located through mToRecID and the table of contents is parsed. After that, mNavDocTitIe holds the title of the table of contents, mNavDocAuthor holds the author given in the table of contents, and mNavContentArray is the linear list of table-of-contents entries.

The application obtains all of the above information by calling the getContents method and can then display the table of contents on the screen. When the user taps a table-of-contents entry, the body-text parser is activated.

Parsing of the specified body text is completed when the application calls the parse method. Afterwards, mtile stores the body-text title; mContentList stores the body content, whose elements are either text or an index into the multimedia resource list; and mmedia list stores the multimedia resource file names. The rendering engine then paginates the body content; in this embodiment, the application uses the low-level iOS CoreText interface for pagination.

After pagination, pages are displayed with caching: while the reader views page 1, page 2 is cached; while page 2 is viewed, page 1 is retained and page 3 is cached; and so on. When the last page is viewed, only the second-to-last page is retained besides it.
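This caching scheme can be modelled as a window of at most three pages around the one being viewed (previous, current, next). PageCache and its render callback are hypothetical names for illustration; in a real reader, render would lay out and draw one page node.

```python
from typing import Callable, Dict


class PageCache:
    """Keep only the previous, current, and next page in memory."""

    def __init__(self, total: int, render: Callable[[int], object]) -> None:
        self.total = total      # number of pages in the chapter
        self.render = render    # produces the drawable content of one page
        self.cache: Dict[int, object] = {}

    def show(self, current: int) -> object:
        keep = {p for p in (current - 1, current, current + 1) if 0 <= p < self.total}
        for p in list(self.cache):
            if p not in keep:
                del self.cache[p]                 # free pages outside the window
        for p in sorted(keep):
            if p not in self.cache:
                self.cache[p] = self.render(p)    # pre-render neighbours once
        return self.cache[current]
```

On the first page only pages 0 and 1 are held, and on the last page only the last two, which matches the behaviour described above.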

Compared with the prior art, the solution of this application has the following technical advantages:

1. It supports mixed layout of text and multimedia resources such as pictures, audio, and video.

2. Memory usage is small. Each time, only the one body-text file specified by the user is parsed, so the content of the whole book is not loaded into memory, which greatly reduces memory load.

In addition, this means the user waits only briefly after changing the font size, line spacing, and so on: the change can generally be completed within 2 seconds, and typically within 0.5 seconds on an iPhone 5.

3. Parsing is efficient. The solution uses a linear-list element overlapping technique: an element of a linear list can be either a string or an integer, and an integer element maps directly to an element of another corresponding linear list, which saves time and space.

It should be noted that for simplicity of description, the above method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.

Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for loading an ePub e-book, the method comprising:

parsing the table of contents of the ePub e-book selected by the user;

2. The method for loading an ePub e-book according to claim 1, wherein parsing the ePub e-book selected by the user to obtain table of contents information of the ePub e-book, and text information and/or multimedia index information and multimedia resource file name information of a corresponding text, includes:

3. The method for loading an ePub e-book according to claim 1 or 2, further comprising:

4. The method for loading an ePub e-book according to claim 1 or 2, wherein the step of obtaining text information and/or multimedia index information and multimedia resource file name information corresponding to the directory, rendering the text information and/or multimedia index information, and displaying the rendered chapter contents to a user includes:

5. The method for loading an ePub e-book according to claim 4, wherein the text information and/or multimedia index information is paginated and presented to a user, further comprising:

6. The method for loading an ePub e-book of claim 1, wherein parsing the ePub e-book selected by the user further comprises:

wherein the HTML tag includes:

7. The method for loading an ePub e-book of claim 4, wherein acquiring text information and/or multimedia index information and multimedia resource file name information of the text corresponding to the table of contents, and paginating and displaying the text information and/or multimedia index information to a user, specifically comprises:

8. The method for loading an ePub e-book of claim 7, further comprising: acquiring the user's font or line-spacing adjustment and re-laying out the text accordingly;

9. A loading system for ePub e-books, comprising:

10. The system for loading ePub e-books of claim 9, wherein the parsing engine module is further configured to parse a content.opf file in the ePub e-book to obtain a title, an author, and other descriptive information about the whole e-book;