CN103761277A - ePub electronic book loading method and system - Google Patents
- Publication number
- CN103761277A, CN201410010411.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
- G06F16/438—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Document Processing Apparatus (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and system for loading ePub e-books. The method includes: parsing an ePub e-book selected by a user to obtain its table-of-contents information, the corresponding body text information and/or multimedia index information, and the multimedia resource file name information; and, for the table-of-contents entry selected by the user, obtaining the corresponding body text information and/or multimedia index information and multimedia resource file name information, rendering it, and displaying the rendered chapter to the user. The method supports mixed layouts of text and multimedia resources such as pictures, audio and video. Because only the chapter text specified by the user is parsed, the content of the whole book is not loaded into memory, which greatly reduces the memory load.
Description
Technical Field
The invention belongs to the field of mobile reading, and relates to a method and a system for loading a book file in an ePub format.
Background
Existing parsing of ePub e-books is generally done in browsers. On a conventional PC the screen is relatively large, so users are accustomed to scrolling through content in a browser with a mouse or keyboard. On a mobile phone, however, the screen is relatively small, and reading by dragging a scroll bar is clearly unfriendly. In addition, browser styling is not suited to reading on a phone: the user often has to drag the content left and right to see all the text on a line, which greatly degrades the reading experience.
At present, a good reading experience on a mobile phone means that the user reads further content by turning pages, without scrolling up and down or left and right, and can freely adjust the line spacing and font size to suit their own needs. Therefore, the ePub parsing engine on a mobile device needs to be matched to the e-book rendering engine it is written for.
Some ePub rendering engines already exist on the market, but most are plug-in implementations ported directly from desktop browsers, so the programs are large and heavy, load slowly on a mobile phone, and in many cases do not support font or line-spacing adjustment.
The common problems of existing ePub rendering engines on mobile phones are as follows:
1. The implementation is large and parsing is slow.
2. Some e-book readers provide a font adjustment function, but each adjustment takes a long time, because these implementations load the entire content of the e-book into memory and then apply the font adjustment sequentially, without splitting the book into chapters.
3. Almost no existing mobile implementation supports mixed image-and-text layout or multimedia playback.
Disclosure of Invention
The invention aims to provide a method and a system for parsing book files in the ePub format.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A method for loading an ePub e-book comprises the following steps:
parsing the ePub e-book selected by a user to obtain the directory (table of contents) information of the e-book, together with the corresponding body text information and/or multimedia index information and the multimedia resource file name information;
parsing the directory of the ePub e-book selected by the user;
and obtaining the body text information and/or multimedia index information and the multimedia resource file name information corresponding to the directory entry, rendering it, and displaying the rendered chapter content to the user.
Preferably, parsing the ePub e-book selected by the user to obtain the directory information, the body text information and/or multimedia index information, and the multimedia resource file name information includes:
parsing the content.opf file in the ePub e-book to obtain the title, author and other descriptive information about the e-book as a whole;
obtaining the text files of the e-book and parsing them to obtain the corresponding body text information and/or multimedia index information and multimedia resource file name information;
and obtaining the NCX file of the e-book and parsing it to obtain the directory information of the e-book.
Preferably, the method further comprises:
storing the body text information and/or multimedia index information as a single linear list; and/or storing the multimedia resource file name information as a separate single linear list.
Preferably, obtaining the body text information and/or multimedia index information and multimedia resource file name information corresponding to the directory entry, rendering it, and displaying the rendered chapter content to the user includes:
obtaining the body text information and/or multimedia index information and multimedia resource file name information corresponding to the directory entry, paginating the body text information and/or multimedia index information, and displaying the pages to the user;
and, when the user clicks a multimedia index, looking up the corresponding multimedia resource file name by that index and displaying the multimedia resource as a separate page.
Preferably, paginating the body text information and/or multimedia index information and displaying it to the user further comprises:
paginating the body text information and/or multimedia index information, caching and displaying the first page to the user, and caching the content of subsequent pages in advance while the user is reading.
Preferably, parsing the ePub e-book selected by the user further includes:
parsing a text file to obtain the escape symbols it contains and placing them into the body text information and/or multimedia index information, while also parsing the HTML (HyperText Markup Language) tags during subsequent parsing and handling each tag supported by the parser accordingly;
wherein the HTML tags include any one or a combination of:
audio, bold, body, line break, heading, italic, image reference, hyperlink, paragraph, title and video tags.
Preferably, obtaining the body text information and/or multimedia index information and multimedia resource file name information corresponding to the directory entry, and paginating and displaying the body text information and/or multimedia index information to the user, specifically comprises:
placing the body text information and/or multimedia index information and the multimedia resource file name information into a linear content list for storage, wherein each element of the list includes:
text content or an image/video link in the form of a character string; a flag indicating whether the character string is text content or a multimedia type; and a multimedia link.
Preferably, the method further comprises: obtaining the user's font or line-spacing adjustment and adjusting the text accordingly;
and, each time the article pages are adjusted, emptying the previously constructed linear content list and reconstructing it.
A loading system for ePub e-books comprises:
a parsing engine module and a rendering engine module, wherein the parsing engine module is used for parsing an ePub e-book selected by a user to obtain the directory information of the e-book and the corresponding body text information and/or multimedia index information and multimedia resource file name information,
and for parsing the directory of the ePub e-book selected by the user to obtain the body text information and/or multimedia index information and multimedia resource file name information corresponding to the directory entry;
and the rendering engine module is used for rendering and displaying the rendered chapter content to the user.
Preferably, the parsing engine module is further configured to parse the content.opf file in the ePub e-book to obtain the title, author and other descriptive information about the e-book as a whole;
to obtain the text files of the e-book and parse them to obtain the corresponding body text information and/or multimedia index information and multimedia resource file name information;
and to obtain the NCX file of the e-book and parse it to obtain the directory information of the e-book.
After adopting the above scheme, the invention has the following advantages:
The invention mainly extracts the text information and multimedia link information of the e-book and discards the other files used for web-page layout, such as CSS files, so that the layout drawn by the parser and renderer is uniform across all books.
In addition, unlike existing methods that load the content of the whole book into memory, the method and system render the book chapter by chapter, based on the chapter content in the corresponding file of the ePub, so the response is very fast when the user adjusts the line spacing or font size.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The present invention will be described in detail below with reference to the accompanying drawings so that the above advantages of the present invention will be more apparent. Wherein,
fig. 1 is a schematic structural diagram of a loading system of an ePub electronic book according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a parsing process of a parsing engine module of a loading system of an ePub e-book according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a loading method of an ePub electronic book according to an embodiment of the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention will be provided with reference to the drawings and examples, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the embodiments and the features of the embodiments of the present invention may be combined with each other, and the technical solutions formed are within the scope of the present invention.
Additionally, the steps illustrated in the flowcharts of the figures may be performed in a computer system as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
Specifically, as shown in fig. 1, a loading system for an ePub e-book mainly comprises a parsing engine module and a rendering engine module.
Unlike the prior art, the parsing engine module in this embodiment mainly extracts the text information and multimedia link information of the e-book and discards the other files used for web-page layout, such as CSS files, so that the layout drawn by the parser and renderer is uniform across all books. This gives the user a uniform reading experience and makes it possible to add further functions to customized books.
In addition, unlike existing methods that load the content of the whole book into memory, in this embodiment the book is processed chapter by chapter, based on the chapter content in the corresponding file of the ePub, so the response is very fast when the user adjusts the line spacing or font size.
In addition, whereas existing systems support only text, the parsing engine module in this embodiment supports all media file formats supported by ePub, and the rendering engine module plays multimedia according to the media file type.
As shown in fig. 3, a method for loading an ePub e-book includes:
Step 1: parsing the ePub e-book selected by a user to obtain the directory information of the e-book, together with the corresponding body text information and/or multimedia index information and the multimedia resource file name information;
Step 2: parsing the directory of the ePub e-book selected by the user;
Step 3: obtaining the body text information and/or multimedia index information and the multimedia resource file name information corresponding to the directory entry, rendering it, and displaying the rendered chapter content to the user.
To make the above advantages of the invention clearer: every e-book package that complies with the ePub specification must include a subdirectory named META-INF and a file named container.
This file contains the path of the OPF file. The XML file is read from the META-INF subdirectory to obtain further information: for example, the XML parser scans for the <rootfile> tag and then reads its "full-path" key (an XML attribute), whose value is the path of the OPF file.
Because a package may contain several OPF files, for example when a whole set containing several books is packaged together, the <rootfile> tag may need to be scanned for more than once.
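The lookup described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes the container file has already been extracted from the package as bytes, the sample document and the function name `find_opf_paths` are hypothetical, and the namespace URI is the one defined by the EPUB container format.

```python
import xml.etree.ElementTree as ET

# Namespace used by the EPUB container file.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# Hypothetical sample with two <rootfile> entries (e.g. a two-volume set).
SAMPLE_CONTAINER = b"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
    <rootfile full-path="OEBPS/volume2.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def find_opf_paths(container_xml: bytes) -> list[str]:
    """Return the full-path value of every <rootfile>, in document order."""
    root = ET.fromstring(container_xml)
    paths = []
    for rootfile in root.findall(".//c:rootfile", CONTAINER_NS):
        full_path = rootfile.get("full-path")
        if full_path:  # skip malformed entries without a path
            paths.append(full_path)
    return paths


print(find_opf_paths(SAMPLE_CONTAINER))
```

Collecting every <rootfile> in one pass covers the case where the package holds more than one OPF file.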
The OPF file includes information about the entire book, such as the title, author and copyright. It also contains the directory, the cover information and the name of the xml text file for each chapter; according to the specification, the text files are in the same path as the OPF file.
When parsing the OPF file, the following tags need to be parsed: <dc:title> (title), <dc:creator> (author), <dc:language> (language), <dc:rights> (copyright) and <dc:publisher> (publisher).
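As an illustration of reading these tags, the following Python sketch collects the five fields into a dictionary. The sample OPF fragment and the function name `read_metadata` are assumptions made for the example; the namespace URIs are those used by OPF and Dublin Core.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical OPF fragment; rights and publisher are deliberately missing.
SAMPLE_OPF = b"""<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample Book</dc:title>
    <dc:creator>Jane Doe</dc:creator>
    <dc:language>en</dc:language>
  </metadata>
</package>"""

FIELDS = ("title", "creator", "language", "rights", "publisher")


def read_metadata(opf_xml: bytes) -> dict:
    """Map each Dublin Core field to its text, or None when absent."""
    root = ET.fromstring(opf_xml)
    info = {}
    for field in FIELDS:
        node = root.find(f"opf:metadata/dc:{field}", NS)
        has_text = node is not None and node.text
        info[field] = node.text.strip() if has_text else None
    return info


print(read_metadata(SAMPLE_OPF))
```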
Next comes a very important tag in the OPF file, <manifest>. This tag lists the file name of the directory file for the whole book and the text file name for each chapter.
Specifically, each entry in this tag is represented by an <item> tag consisting of an id key, an href key and a media-type key. In an embodiment, a dictionary is used to store this information:
the dictionary keys are the id values, and each value is a reference to a structure composed of the href and media-type.
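A dictionary of this shape might be built as in the Python sketch below. The names `ManifestItem` and `read_manifest` and the sample manifest are hypothetical; skipping stylesheets reflects the statement elsewhere in this description that CSS files are ignored, and matching them by the `text/css` media type is an assumption of the example.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


class ManifestItem(NamedTuple):
    """The structure stored as the dictionary value."""
    href: str
    media_type: str


# Hypothetical manifest with a directory file, a chapter and a stylesheet.
SAMPLE_OPF = b"""<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="style" href="main.css" media-type="text/css"/>
  </manifest>
</package>"""


def read_manifest(opf_xml: bytes) -> dict:
    """Build {id: ManifestItem(href, media_type)}, skipping CSS files."""
    root = ET.fromstring(opf_xml)
    manifest = {}
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        item_id = item.get("id")
        media_type = item.get("media-type", "")
        if item_id is None or media_type == "text/css":
            continue  # assumption: stylesheets are identified by media type
        manifest[item_id] = ManifestItem(item.get("href", ""), media_type)
    return manifest


print(read_manifest(SAMPLE_OPF))
```

Keying the dictionary by id makes the later lookup from an idref a single dictionary access.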
In an embodiment, the parsing engine module ignores CSS files and page template files and does not parse them. After all <item> entries in <manifest> have been collected, another tag, <spine>, is encountered. This tag references the directory of the book and gives the reading order of the chapters.
The toc key of the <spine> tag holds the id of the NCX file's entry in <manifest>, from which the NCX file name is obtained; the NCX file contains more detailed directory information. All the entries that follow are <itemref> tags.
Each <itemref> tag contains an idref key, which corresponds to the id of an <item> in <manifest>, so the file name corresponding to each <itemref> can be found immediately.
Because the <itemref> entries in <spine> are already in order, they are stored in a linear list.
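The resolution from <spine> to file names can be sketched as follows. This is an illustrative Python example under the assumption that the OPF content is available as bytes; `read_spine` and the sample data are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

SAMPLE_OPF = b"""<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def read_spine(opf_xml: bytes):
    """Return (NCX file name or None, chapter file names in reading order)."""
    root = ET.fromstring(opf_xml)
    # id -> href lookup built from <manifest>.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    spine = root.find("opf:spine", OPF_NS)
    if spine is None:
        return None, []
    ncx_href = hrefs.get(spine.get("toc"))
    # <itemref> entries are already ordered, so a plain list preserves them.
    reading_order = [
        hrefs[ref.get("idref")]
        for ref in spine.findall("opf:itemref", OPF_NS)
        if ref.get("idref") in hrefs
    ]
    return ncx_href, reading_order


print(read_spine(SAMPLE_OPF))
```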
After the NCX file path is acquired, the NCX file can be opened.
The <docTitle> tag gives the directory title. It is followed by the core tag of the NCX file, <navMap>. <navMap> is a navigation map: each entry in it allows a quick jump to the corresponding content of the book. The <navMap> tag consists of a set of <navPoint> tags.
The <navPoint> tag describes a navigation entry in detail. It contains an id key; note that this id may differ from the id of the <item> in <manifest>.
It also contains a playOrder key whose integer value indicates the entry's position in the directory. In addition, it contains a <navLabel> tag, which gives the title of the directory entry, and a <content> tag, which specifies the text file to jump to.
Therefore, in an embodiment, the parsing engine module further includes a storage unit, which stores each entry of <navMap> in a linear list sorted by the playOrder of its <navPoint>. A structure is used to store the <navLabel> content and the text file reference from the <content> tag.
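A storage unit of this kind could look like the following Python sketch. The names `TocEntry` and `read_toc` and the sample NCX document are assumptions for illustration; the sample lists its entries out of order on purpose to show the sort by playOrder.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"n": NCX_URI}


class TocEntry(NamedTuple):
    play_order: int
    label: str  # text of <navLabel>
    src: str    # target file from <content>


SAMPLE_NCX = b"""<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <docTitle><text>Sample Book</text></docTitle>
  <navMap>
    <navPoint id="np-2" playOrder="2">
      <navLabel><text>Chapter Two</text></navLabel>
      <content src="chapter2.xhtml"/>
    </navPoint>
    <navPoint id="np-1" playOrder="1">
      <navLabel><text>Chapter One</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""


def read_toc(ncx_xml: bytes) -> list[TocEntry]:
    """Collect every <navPoint> under <navMap>, sorted by playOrder."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("n:navMap", NCX_NS)
    if nav_map is None:
        return []
    entries = []
    for point in nav_map.iter(f"{{{NCX_URI}}}navPoint"):
        label = point.findtext("n:navLabel/n:text", default="", namespaces=NCX_NS)
        content = point.find("n:content", NCX_NS)
        src = content.get("src", "") if content is not None else ""
        order = int(point.get("playOrder", "0"))
        entries.append(TocEntry(order, label.strip(), src))
    return sorted(entries, key=lambda entry: entry.play_order)


for entry in read_toc(SAMPLE_NCX):
    print(entry.play_order, entry.label, entry.src)
```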
After the directory has been processed it is displayed to the user, and each text file is parsed only when the rendering engine module needs it in response to the user's click.
The text file usually uses xml as its file type name. Whatever the format, the file begins with an xml tag that leads into the html tag, and the HTML content follows it. The xml tag contains the document type and the character encoding format. Because HTML on the Internet almost always uses UTF-8 as its character encoding, the xml tag does not need to be parsed in detail, and the parser can skip it and scan directly to the html tag.
HTML has many built-in tags and escape symbols.
In an embodiment, the parsing engine supports all HTML escape symbols. Because the parsing engine is mainly used for article typesetting on a mobile phone, only tags related to article segmentation and the <link> tag are supported; other tags are ignored. The tags supported by the ePub parsing engine are as follows:
<audio> - represents audio. When this tag is encountered, the parsing engine saves the audio link in a dedicated multimedia link list. Each element of this list has two parts: the first is the hyperlink of a resource and the second is the resource type. The renderer supports three multimedia types: image, audio and video.
<b> - bold. This tells the rendering engine to use a bold font for the text between <b> and </b>. When the parsing engine encounters a <b> tag, it records the start index of the bold text in a linear list, and records the end index once it scans the closing </b>.
<body> - represents the body part.
<br> - represents a line break. In the parsing engine, line breaks are uniformly represented by the '\r' character, as required by the rendering engine.
<h1> to <h6> - represent headings 1 to 6. Each can be distinguished by a different font size. When the < h.
<head> - represents the head of the chapter, which typically contains a <title> tag giving the chapter title.
<html> - marks the start of the HTML. Everything after this tag is HTML content.
<i> - represents italic text. When the parser encounters <i>, the handling is similar to <b>: the text range is saved in a linear list so that the rendering engine can later draw it in an italic font.
<img> - represents a reference to an image. When the parser encounters an <img> tag, the preceding text is saved to a linear list as a string element. The image link in the <img> tag is then saved in the linear list dedicated to multimedia links, with the multimedia type marked as image. Finally, the index of the image in that list is appended to the text list.
<link> - this is also an important tag. Besides images, a <link> can refer to video, audio and other resources. It is handled in the same way as the <img> tag.
<p> - represents a new paragraph. When this tag is encountered, the parser automatically inserts a line break.
<title> - represents a title. If this tag is inside <head>, it is taken as the chapter title; if it appears inside <body>, it is ignored.
<video> - represents video. It is handled in the same way as <img>, with the multimedia type marked as video.
That is, in this embodiment, the parsing engine module holds two linear lists after parsing all the text and multimedia tags.
Specifically, one is a linear list containing the text information and the multimedia element indexes, and the other is a linear list that stores only the multimedia resource file names. The two lists are then passed to the rendering engine for processing.
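The two lists can be illustrated with a small Python sketch built on the standard `html.parser` module. It is a simplified stand-in for the parsing engine, not the patented code: the class name `ChapterParser`, the tuple encoding of list elements, and the use of "\n" for line breaks are all assumptions of the example, and bold, italic and <link> handling are omitted.

```python
from html.parser import HTMLParser

MEDIA_TAGS = {"img": "image", "audio": "audio", "video": "video"}


class ChapterParser(HTMLParser):
    """Builds a content list and a separate multimedia list."""

    def __init__(self):
        # convert_charrefs=True decodes escape symbols such as &amp;.
        super().__init__(convert_charrefs=True)
        self.content = []  # ("text", str) or ("media", index into self.media)
        self.media = []    # (link, multimedia type)
        self._buffer = []
        self._in_body = False

    def _flush(self):
        # Save the text gathered so far as one string element.
        text = "".join(self._buffer)
        self._buffer = []
        if text.strip():
            self.content.append(("text", text))

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "body":
            self._in_body = True
        elif tag in ("p", "br") and self._in_body:
            self._buffer.append("\n")  # paragraph or explicit line break
        elif tag in MEDIA_TAGS and attributes.get("src"):
            self._flush()
            self.media.append((attributes["src"], MEDIA_TAGS[tag]))
            self.content.append(("media", len(self.media) - 1))

    def handle_endtag(self, tag):
        if tag == "body":
            self._flush()
            self._in_body = False

    def handle_data(self, data):
        if self._in_body:  # text in <head>, such as <title>, is not body text
            self._buffer.append(data)


def parse_chapter(html_text: str):
    parser = ChapterParser()
    parser.feed(html_text)
    parser.close()
    return parser.content, parser.media


SAMPLE = ('<html><head><title>Ch 1</title></head><body><p>Tom &amp; Jerry</p>'
          '<img src="images/cat.png"/><p>The end.</p></body></html>')
content, media = parse_chapter(SAMPLE)
print(content)
print(media)
```

Storing only an index in the content list keeps the text list compact, and the multimedia list can be consulted when the user selects a resource.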
For the parsing engine side, the complete data flow is shown in fig. 2.
The rendering engine module has three main kinds of data available: information about the book as a whole, such as the title and author; the directory of the book; and the text of the book.
The information about the whole book, such as the title and author, is available once content.opf has been parsed; the directory is obtained by parsing the NCX file; the text of the book is obtained by first locating a text file through the text file name in the content.
The rendering engine module can customize how information such as the title, author, creator, publisher and copyright is displayed. For example, in the simplest embodiment, a cover image can be found in the OPF file and used as the cover of the book. The directory of the book is then presented according to the contents of the NCX file; it only needs to be ordered according to the <navPoint> tags in the NCX.
In the present invention, the rendering engine module is mainly used for rendering the text; in this embodiment, text and multimedia files are rendered separately.
Each picture or video is presented as a single page of its own.
For audio, a prompt is shown to the user on the content page that contains the audio, and the audio is played after the user clicks a button. A renderer implementer may also choose to play the audio automatically when the user turns to a page that contains audio.
The rendering engine module of this embodiment is briefly described below. Its working mechanism and constituent modules are as follows:
For the text file of a specific chapter, a linear list is used to store all the content of the chapter, where each element of the list has three members:
1. the text content or an image/video link (string); 2. whether member 1 is text or a multimedia type (boolean); 3. an audio link.
Because the content of a single chapter of an e-book is generally not very large, a few hundred pages at most, storing this information in a linear list saves both time and space. And because the list does not hold the information for all chapters of the whole book, it is convenient for determining the current page number and retrieving content immediately.
The rendering engine module further comprises a paging unit, which reads the text linear list provided by the ePub parsing engine module. If the current node is text, the unit checks whether the next list node is of audio type. If it is not, the text is paginated according to the font size and line spacing currently set by the user, and the content of each page is stored in the chapter content linear list, with the multimedia-type member set to "no" and the audio link set to null;
if the next list node is of audio type, then after pagination is complete the audio link member of the content node for the first page of that text is set to this audio link.
If the current node is a video or image, the content node is set to the link of the video or image and the multimedia-type member is set to "yes". Finally, the unit checks whether the next node is of audio type: if so, the audio link of the content node is set to that audio link; otherwise it is set to null. In this way the content of a given chapter is paginated.
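The paging rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: ParsedItem, ContentNode, split_pages and paginate are hypothetical names, and the page-fitting arithmetic (fixed-width characters on a 320 x 480 page) is a crude stand-in for a real text layout engine. How to treat an audio item that follows nothing is not stated in the source; the sketch simply skips it.

```python
import textwrap
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedItem:
    """Hypothetical parser output: one node of the text linear list."""
    kind: str   # "text", "image", "video" or "audio"
    value: str  # the text itself, or a resource link


@dataclass
class ContentNode:
    """One page of the chapter content list (the three members described above)."""
    content: str                      # page text, or image/video link
    is_multimedia: bool               # False for text, True for image/video
    audio_link: Optional[str] = None  # audio attached to this page, if any


def split_pages(text, font_size, line_spacing, page_w=320, page_h=480):
    """Crude layout model: fixed-width characters, whole lines per page."""
    chars_per_line = max(1, page_w // font_size)
    lines_per_page = max(1, page_h // (font_size + line_spacing))
    lines = textwrap.wrap(text, width=chars_per_line) or [""]
    return ["\n".join(lines[i:i + lines_per_page])
            for i in range(0, len(lines), lines_per_page)]


def paginate(items, font_size, line_spacing):
    """Build the chapter content list from the parser's list."""
    pages: list[ContentNode] = []
    i = 0
    while i < len(items):
        item = items[i]
        if item.kind == "audio":
            i += 1  # assumption: audio with no preceding content is skipped
            continue
        nxt = items[i + 1] if i + 1 < len(items) else None
        audio = nxt.value if nxt is not None and nxt.kind == "audio" else None
        if item.kind == "text":
            nodes = [ContentNode(page, False)
                     for page in split_pages(item.value, font_size, line_spacing)]
            nodes[0].audio_link = audio  # audio goes on the first page of the text
        else:  # image or video occupies one page of its own
            nodes = [ContentNode(item.value, True, audio)]
        pages.extend(nodes)
        i += 2 if audio is not None else 1  # consume the attached audio node
    return pages
```

Because paginate only reads the parser's list, changing the font size or line spacing amounts to discarding the old content list and calling paginate again with the new settings.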
The rendering unit in the rendering engine module then renders the content of a given page from the constructed content linear list.
There are many ways to render: the text can be drawn with a high-level interface provided by the specific system, with a low-level interface provided by the system, or even with a still lower-level interface. Since each image or video occupies a page of its own, it is easy to handle with existing techniques, which are not described in detail here.
For example, when the user changes the font size or line spacing, all content of the current chapter must be re-paginated, because the content of every page may change. However, since the content of a single page is very limited, this processing is very fast.
It should be noted that, to better implement the present invention, the content provided by the ePub parsing engine module must not be destroyed; otherwise, problems arise when the pages are re-adjusted to the current settings. To achieve fast rendering, each time the pages are adjusted the previously constructed content linear list is cleared and then rebuilt, so that memory is used efficiently and no memory space is wasted.
In addition, in this embodiment, after an image, video, or audio has been shown, the multimedia playback resources of the current page are released when the user turns to the next or previous page, so as to save memory space and CPU resources.
The ePub parsing engine and rendering engine are described in more detail below with reference to one of the company's existing products on iOS.
Specifically, in one embodiment, the product is primarily a bookstore ("book city") application.
The ePub e-book loading system first scans the locally stored ePub books and displays on the bookshelf the cover of each book obtained from earlier parsing.
When a user clicks a book, the application activates and initializes the ePub parser, which then parses the designated ePub package. The ePub parser first finds the path of the specified OPF file in the container file. It then parses the OPF file to obtain information such as the book title, author, and copyright. The complete class is shown below:
Here, mContanainerVersion represents the ePub protocol version number; mBaseFilePath represents the root directory of the book; mConntetBaseDir represents the root directory path of the body text; mConntentFullPath represents the full directory path; mBookTitle represents the book title; mBookAuthor represents the book author; mCopyRights represents the copyright; mPublisher represents the publisher; and mCoverImagePath represents the cover image path. manifest ditect is a dictionary that, after the <manifest> tag is parsed, stores the key and detailed information of each text file entry. mToRecID represents the NCX file path. mOrderedContentIDs represents a linear list of the body content IDs in sorted order.
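The two parsing steps described above can be sketched in Python as follows. The sketch relies on standard EPUB 2 conventions: META-INF/container.xml names the OPF file in a rootfile element, and the OPF file carries Dublin Core metadata, a manifest, and a spine whose toc attribute points at the NCX item. The BookInfo class and both function names are hypothetical; its fields only loosely mirror members such as mBookTitle, mToRecID and mOrderedContentIDs and are not the patent's actual class.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


@dataclass
class BookInfo:
    """Illustrative container for book-level data read from the OPF file."""
    title: str = ""
    author: str = ""
    rights: str = ""
    publisher: str = ""
    cover_href: str = ""
    ncx_href: str = ""
    manifest: dict[str, str] = field(default_factory=dict)  # item id -> href
    ordered_content_ids: list[str] = field(default_factory=list)


def find_opf_path(container_xml: str) -> str:
    """Return the OPF path named by the rootfile element of container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None or not rootfile.get("full-path"):
        raise ValueError("container.xml does not name an OPF package file")
    return rootfile.get("full-path")


def parse_opf(opf_xml: str) -> BookInfo:
    """Read metadata, manifest and spine from an OPF package document."""
    root = ET.fromstring(opf_xml)

    def dc(name: str) -> str:
        text = root.findtext(f"opf:metadata/dc:{name}", default="", namespaces=NS)
        return (text or "").strip()

    info = BookInfo(title=dc("title"), author=dc("creator"),
                    rights=dc("rights"), publisher=dc("publisher"))
    for item in root.iterfind("opf:manifest/opf:item", NS):
        info.manifest[item.get("id", "")] = item.get("href", "")
    spine = root.find("opf:spine", NS)
    if spine is not None:
        # The spine's toc attribute holds the manifest id of the NCX file.
        info.ncx_href = info.manifest.get(spine.get("toc", ""), "")
        info.ordered_content_ids = [
            ref.get("idref", "") for ref in spine.iterfind("opf:itemref", NS)
        ]
    # EPUB 2 convention: a meta element named "cover" points at the cover item.
    for meta in root.iterfind("opf:metadata/opf:meta", NS):
        if meta.get("name") == "cover":
            info.cover_href = info.manifest.get(meta.get("content", ""), "")
    return info
```

Paths in the manifest are relative to the directory that contains the OPF file, so a caller would join them with that directory before opening a chapter or the NCX file.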
After the OPF file is parsed, the NCX file is located through mToRecID and parsed to obtain the table of contents. Once the table of contents is parsed, mNavDocTitIe holds the title of the table of contents, mNavDocAuthor holds the author given in the table of contents, and mNavContentArray holds a linear list of the table-of-contents entries.
All of the above information is available to the application by calling the getContents method. The application can then display the table of contents on the screen. When the user clicks a table-of-contents entry, the body-text parser is activated.
Parsing of the specified chapter text completes once the application calls the parse method. Here, mtile stores the parsed chapter title; mContentList stores the body content, each element being either text content or an index into the multimedia resource list; and mmedia list stores the file names of the multimedia resources. The rendering engine then paginates the body content once it has been obtained; in this embodiment, the application uses the low-level CoreText interface on iOS for pagination.
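A minimal Python sketch of such a chapter parser is given below, built on the standard library's html.parser. It is an assumption-laden illustration: the class name ChapterParser, the helper resolve, and the choice of img, audio and video as the only media tags are not taken from the patent. It shows the two-list layout described above, in which the content list holds strings for text and integers that index into the media list.

```python
from html.parser import HTMLParser


class ChapterParser(HTMLParser):
    """Collects a chapter title, a mixed content list and a media file list."""

    MEDIA_TAGS = {"img", "audio", "video"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities such as &amp; are unescaped
        self.title = ""
        self.content_list = []  # str = text, int = index into media_list
        self.media_list = []    # multimedia file names, in order of appearance
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.MEDIA_TAGS:
            src = dict(attrs).get("src")
            if src:
                self.media_list.append(src)
                self.content_list.append(len(self.media_list) - 1)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._in_body:
            self.content_list.append(text)


def resolve(entry, media_list):
    """Map a content-list element to its text or to its media file name."""
    return media_list[entry] if isinstance(entry, int) else entry
```

Storing an integer instead of a second copy of the file name keeps the content list compact, and the renderer can tell text from multimedia by checking the element type.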
After pagination is complete, pages are displayed through a cache. That is, while the reader is on page 1, the content of page 2 is cached; while page 2 is being viewed, page 1 is retained and the content of page 3 is cached, and so on. When the last page is being viewed, only the content of the second-to-last page is retained.
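The caching behaviour described above can be sketched as a three-page sliding window. The Python class below is illustrative only: PageCache and its render callback are hypothetical names, and the callback stands in for whatever the rendering unit does to prepare a page.

```python
class PageCache:
    """Keeps only the previous, current and next page in memory."""

    def __init__(self, page_count, render):
        self._count = page_count
        self._render = render  # callable: page index -> rendered page
        self._cache = {}

    def show(self, index):
        """Return the page at index, pre-caching its neighbours."""
        if not 0 <= index < self._count:
            raise IndexError("page index out of range")
        wanted = {i for i in (index - 1, index, index + 1)
                  if 0 <= i < self._count}
        # Drop pages that fell out of the window.
        for i in list(self._cache):
            if i not in wanted:
                del self._cache[i]
        # Render whatever is missing from the window.
        for i in sorted(wanted):
            if i not in self._cache:
                self._cache[i] = self._render(i)
        return self._cache[index]

    def cached_pages(self):
        return sorted(self._cache)
```

On the first page the window holds pages 1 and 2, in the middle of a chapter it holds three pages, and on the last page it holds the last two, which matches the behaviour described above.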
Compared with the prior art, the solution of the present application has the following technical advantages:
1. Mixed layout of text, pictures, audio, video, and other multimedia resources is supported.
2. Memory usage is small. Because the solution parses only the one chapter specified by the user at a time, the content of the whole book is not loaded into memory, which greatly reduces the memory load.
In addition, this keeps the wait short when the user changes the font size, line spacing, and so on: the change can generally be completed within 2 seconds, and is typically completed in 0.5 seconds on the iPhone 5.
3. Parsing is efficient. The solution uses a linear-list element overlapping technique: an element of one linear list can be of either string type or integer type, so elements can be mapped directly to the elements of other corresponding linear lists, which saves time and space.
It should be noted that for simplicity of description, the above method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method for loading an ePub e-book, the method comprising:
analyzing the ePub electronic book selected by a user to obtain the directory information of the ePub electronic book, and the corresponding text information and/or multimedia index information and multimedia resource file name information;
analyzing the catalog of the ePub electronic book selected by the user;
and acquiring text information and/or multimedia index information and multimedia resource file name information of the text corresponding to the directory, rendering and displaying the rendered chapter content to a user.
2. The method for loading an ePub e-book according to claim 1, wherein analyzing the ePub e-book selected by the user to obtain directory information of the ePub e-book, and text information and/or multimedia index information and multimedia resource file name information of a corresponding text, includes:
analyzing content.opf files in the ePub electronic book, and acquiring book names, authors and other introduction information of the whole electronic book of the corresponding electronic book;
acquiring a text file of the corresponding electronic book, and analyzing the text file to acquire text information and/or multimedia index information and multimedia resource file name information of the corresponding text;
and acquiring the NCX file corresponding to the corresponding electronic book, and analyzing the NCX file to obtain the directory information of the electronic book.
3. The method for loading an ePub e-book according to claim 1 or 2, further comprising:
storing the text information and/or the multimedia index information of the text as a single linear list; and/or, the multimedia resource file name information is also stored as a single linear list.
4. The method for loading an ePub e-book according to claim 1 or 2, wherein the step of obtaining text information and/or multimedia index information and multimedia resource file name information corresponding to the directory, rendering the text information and/or multimedia index information, and displaying the rendered chapter contents to a user includes:
acquiring text information and/or multimedia index information and multimedia resource file name information of the text corresponding to the directory, paging the text information and/or multimedia index information, and displaying the text information and/or multimedia index information to a user;
when the user clicks the multimedia index information, the corresponding multimedia resource file name is inquired according to the index, and the corresponding multimedia resource is displayed as an independent page.
5. The method for loading an ePub e-book according to claim 4, wherein the text information and/or multimedia index information is paginated and presented to a user, further comprising:
paging the text information and/or the multimedia index information, caching and displaying the first page to a user, and caching the subsequent page number content in advance when the user reads the page.
6. The method for loading an ePub e-book of claim 1, wherein parsing the ePub e-book selected by the user further comprises:
analyzing a text file, acquiring the information of the escape symbol contained in the text file, putting the information of the escape symbol into the text information and/or multimedia index information of the text, simultaneously analyzing an HTML (hypertext markup language) label in the subsequent analysis, and carrying out corresponding processing on the label supported by an analyzer;
wherein the HTML tag includes:
audio, bold, body parts, line breaks, headlines, italics, indexes to images, hyperlinks, paragraphs, headlines, video tags, any one or a combination.
7. The method for loading an ePub e-book of claim 4, further comprising: acquiring text information and/or multimedia index information and multimedia resource file name information of the text corresponding to the directory, and paging and displaying the text information and/or multimedia index information to a user specifically comprises:
and putting the text information and/or the multimedia index information and the multimedia resource file name information of the text into a linear content list for storage, wherein the method comprises the following steps:
text content or image/video links in the form of a character string, whether the character string is text content or belongs to a multimedia type, multimedia links.
8. The method for loading an ePub e-book of claim 7, further comprising: acquiring information of a user for adjusting fonts or line spacing, and adjusting the characters;
and when the article page is adjusted each time, the previously constructed linear content list is emptied and reconstructed.
9. A loading system for ePub e-books, comprising:
the system comprises an analysis engine module and a rendering engine module, wherein the analysis engine module is used for analyzing an ePub electronic book selected by a user to obtain directory information of the ePub electronic book, and corresponding text character information and/or multimedia index information and multimedia resource file name information;
analyzing the catalog of the ePub electronic book selected by the user to obtain text information and/or multimedia index information and multimedia resource file name information corresponding to the catalog;
and the rendering engine module is used for rendering and displaying the rendered chapter contents to a user.
10. The method for loading an ePub e-book of claim 9, wherein the parsing engine module is further configured to parse a content.opf file in the ePub e-book to obtain a title, an author, and other introduction information of the whole e-book;
acquiring a text file of the corresponding electronic book, and analyzing the text file to acquire text information and/or multimedia index information and multimedia resource file name information of the corresponding text;
and acquiring the NCX file corresponding to the corresponding electronic book, and analyzing the NCX file to obtain the directory information of the electronic book.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410010411.9A CN103761277A (en) | 2014-01-09 | 2014-01-09 | ePub electronic book loading method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103761277A (en) | 2014-04-30 |
Family
ID=50528514
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410010411.9A Pending CN103761277A (en) | 2014-01-09 | 2014-01-09 | ePub electronic book loading method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103761277A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101281529A (en) * | 2008-05-30 | 2008-10-08 | 杨洪 | Method for realizing hyperlink reading on hand-hold reading equipment |
CN101894115A (en) * | 2009-05-18 | 2010-11-24 | 北京大学 | Image data processing method and device for electronic document |
CN101977233A (en) * | 2010-11-01 | 2011-02-16 | 优视科技有限公司 | Method and system for leading mobile terminal to browse webpage in reading mode |
CN102521280A (en) * | 2011-11-26 | 2012-06-27 | 华为技术有限公司 | Loading method and loading device of EPub electronic book |
CN102915654A (en) * | 2011-08-05 | 2013-02-06 | 汉王科技股份有限公司 | Digital document processing method and electronic reading device |
US20130080887A1 (en) * | 2011-09-26 | 2013-03-28 | Zhaorong Hou | Simulation of web applications and secondary devices in a web browser, web application development tools, and methods using the same |
CN103020082A (en) * | 2011-09-23 | 2013-04-03 | 北大方正集团有限公司 | Reading processing system and method, server and terminal equipment |
US20130088511A1 (en) * | 2011-10-10 | 2013-04-11 | Sanjit K. Mitra | E-book reader with overlays |
Non-Patent Citations (2)
Title |
---|
王黎黎: ""电子书阅读软件的设计与实现"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
陆海龙: ""Linux平台嵌入式epub电子书阅读与管理系统"", 《万方数据库》 * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106156191B (en) * | 2015-04-21 | 2019-08-06 | 北京大学 | A trial-reading method based on ePub files and a trial-reading system based on ePub files |
CN106156191A (en) * | 2015-04-21 | 2016-11-23 | 北京大学 | Academic probation method based on ePub file and the academic probation system based on ePub file |
CN107220221A (en) * | 2016-03-22 | 2017-09-29 | 北大方正集团有限公司 | The read method and device of EPub files based on Android platform |
CN108154041A (en) * | 2016-12-02 | 2018-06-12 | 北京京东尚科信息技术有限公司 | A kind of ePub document data safeties guard method, apparatus and system |
CN106713962A (en) * | 2016-12-19 | 2017-05-24 | 掌阅科技股份有限公司 | Video display method and apparatus and terminal device |
CN106713962B (en) * | 2016-12-19 | 2018-10-09 | 掌阅科技股份有限公司 | Video display method, apparatus and terminal device |
CN108345595A (en) * | 2017-01-22 | 2018-07-31 | 北大方正集团有限公司 | The management method and management system of EPUB format books |
CN107247691A (en) * | 2017-05-24 | 2017-10-13 | 腾讯科技(深圳)有限公司 | A kind of display methods of text message, device, mobile terminal and storage medium |
CN107247691B (en) * | 2017-05-24 | 2021-10-08 | 腾讯科技(深圳)有限公司 | Text information display method and device, mobile terminal and storage medium |
CN107992250A (en) * | 2017-12-20 | 2018-05-04 | 维沃移动通信有限公司 | A kind of display methods of electronic book documentary content, mobile terminal |
CN109062880A (en) * | 2018-07-05 | 2018-12-21 | 掌阅科技股份有限公司 | The production method of electronic book documentary, electronic equipment, server, storage medium |
CN109542852B (en) * | 2018-12-03 | 2021-10-29 | 郑州云海信息技术有限公司 | A kind of directory information processing method and related device |
CN109542852A (en) * | 2018-12-03 | 2019-03-29 | 郑州云海信息技术有限公司 | A kind of directory information processing method and relevant apparatus |
CN111276118A (en) * | 2018-12-03 | 2020-06-12 | 北京京东尚科信息技术有限公司 | Method and system for realizing audio electronic book |
CN109726166A (en) * | 2018-12-20 | 2019-05-07 | 百度在线网络技术(北京)有限公司 | Display methods, device, computer equipment and the readable storage medium storing program for executing of e-book |
CN109726166B (en) * | 2018-12-20 | 2024-06-07 | 百度在线网络技术(北京)有限公司 | Electronic book display method and device, computer equipment and readable storage medium |
CN112016024A (en) * | 2019-05-31 | 2020-12-01 | 腾讯科技(深圳)有限公司 | Data recommendation method and device and computer-readable storage medium |
CN112016024B (en) * | 2019-05-31 | 2024-05-10 | 腾讯科技(深圳)有限公司 | Data recommendation method and device and computer readable storage medium |
CN110532233A (en) * | 2019-08-20 | 2019-12-03 | 武汉鼎森电子科技有限公司 | A kind of epub document generating method and system |
CN110727887A (en) * | 2019-09-17 | 2020-01-24 | 武汉鼎森电子科技有限公司 | Book link processing method based on two-dimensional code |
CN110807298A (en) * | 2019-09-27 | 2020-02-18 | 北京思维造物信息科技股份有限公司 | Method and system for processing marking information |
CN110807298B (en) * | 2019-09-27 | 2023-08-08 | 北京思维造物信息科技股份有限公司 | Method and system for processing marking information |
CN110717323A (en) * | 2019-10-17 | 2020-01-21 | 北京幻想纵横网络技术有限公司 | Document seal dividing method and device, terminal and computer readable storage medium |
CN111460345A (en) * | 2020-03-30 | 2020-07-28 | 掌阅科技股份有限公司 | Electronic book loading display method, electronic equipment and storage medium |
CN112487327A (en) * | 2020-11-30 | 2021-03-12 | 惠州Tcl移动通信有限公司 | Electronic book loading method and device and mobile terminal |
CN112632959A (en) * | 2020-12-29 | 2021-04-09 | 湖北大学 | EPUB file analysis method |
CN112632959B (en) * | 2020-12-29 | 2023-09-01 | 湖北大学 | A method for parsing EPUB files |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20140430 |