CN120471022A

CN120471022A - File conversion method, device, equipment and program product

Info

Publication number: CN120471022A
Application number: CN202510952283.8A
Authority: CN
Inventors: 尹毅彬 (Yin Yibin)
Original assignee: Uc Mobile Co ltd
Current assignee: Uc Mobile Co ltd
Priority date: 2025-07-10
Filing date: 2025-07-10
Publication date: 2025-08-12

Abstract

The application provides a file conversion method, device, equipment and program product. The file conversion method comprises: obtaining a file to be converted, where the file to be converted is an image file or a portable document format (PDF) file; identifying elements in the file to be converted, determining attributes of the identified elements based on the identification result, and logically layering the identified elements; generating a structured document based on the logical layering result and the attributes of the identified elements; and converting the structured document, or the structured document after editing, into an editable PDF file. The application thus converts an image or a PDF into an editable PDF automatically. The intermediate structured document generated during conversion gives the document a clearer logical hierarchy and lets the user edit it immediately, improving editing efficiency and convenience.

Description

File conversion method, device, equipment and program product

Technical Field

The present application relates to the field of document processing technologies, and in particular, to a method, an apparatus, a device, and a program product for file conversion.

Background

Existing documents include editable documents, such as Word documents, and non-editable documents, such as pictures. PDF (Portable Document Format) is widely used because of its cross-platform compatibility, precise layout control and data security.

PDF files fall broadly into three types: pure image PDFs (also known as scanned or photocopy PDFs), editable PDFs, and partially editable PDFs. An editable PDF internally stores structured text, image and vector graphics data, so a user can directly select and copy text and can edit and search the document; it is essentially a digital document built from text and graphics drawing instructions. An editable PDF contains editable elements such as text and vector graphics, and may also contain non-editable elements such as images. A pure image PDF is usually generated by directly converting paper documents or pictures, and the entire content of the document is stored as images, so the file is large and consumes considerable storage and memory. A pure image PDF fully preserves the visual appearance of the original document but contains no editable text data: the user cannot directly extract text, the document format is hard to adjust, and the content is inconvenient to process and reuse. A partially editable PDF combines a pure image PDF with an editable PDF and suffers from problems similar to those of a pure image PDF.

Therefore, it is desirable to provide a solution for converting an image, a pure image PDF or a partially editable PDF into an editable PDF.

Disclosure of Invention

The embodiments of the application provide a file conversion method, device, equipment and program product that convert images or PDFs into editable PDFs. The intermediate structured document generated during conversion both gives the document a clear hierarchical logical structure and supports immediate editing throughout the conversion process, which significantly improves document processing efficiency and the convenience of reusing content.

In a first aspect, an embodiment of the present application provides a file conversion method, including: obtaining a file to be converted, where the file to be converted is an image file or a portable document format file; identifying elements in the file to be converted, determining attributes of the identified elements based on the identification result, and logically layering the identified elements, where the elements include text; generating a structured document based on the logical layering result and the attributes of the identified elements; and converting the structured document, or the edited structured document, into an editable portable document format file.

In a second aspect, an embodiment of the application provides a file conversion device, which includes: a file obtaining module, configured to obtain a file to be converted; an element identification module, configured to identify elements in the file to be converted and, based on the identification result, determine attributes of the identified elements and logically layer them; a structured document generation module, configured to generate a structured document based on the logical layering result and the attributes of the identified elements; and an editable file conversion module, configured to convert the structured document, or the edited structured document, into an editable portable document format file.

In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory so as to perform the method provided in the first aspect and/or the various possible implementations of the first aspect of the present application.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, are adapted to carry out the method provided in the first aspect and/or the various possible implementations of the first aspect of the present application.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements the method provided by the first aspect and/or the various possible implementations of the first aspect.

The file conversion method, device, equipment and program product provided by the embodiments of the application target files to be converted, such as images or PDF files, that are partly or entirely non-editable and occupy a large amount of storage. An element identification step determines the attributes of the elements in the file and logically layers the identified elements; a structured document, such as an HTML (HyperText Markup Language) document, is then generated from the element attributes and the layering result; finally, the structured document or the edited structured document is converted into an editable PDF document. In the editable PDF document the user can directly select, copy and search text, which improves the operability of the file and makes editable elements such as text easier to reuse. Generating the intermediate structured document also supports immediate editing and preview of the document, improving editing convenience.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic view of a scene provided in an embodiment of the present application;

FIG. 2 is a first flowchart of the file conversion method provided by the present application;

FIG. 3 is a schematic diagram of a hierarchical relationship according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an editing interface for a structured document provided by an embodiment of the present application;

FIG. 5 is a second flowchart of the file conversion method provided by the present application;

FIG. 6 is a schematic diagram of a visual operation interface of an editable PDF document according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a file conversion device according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with related laws and regulations and standards, and provide corresponding operation entries for the user to select authorization or rejection.

The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Some software provides an image-to-PDF conversion function. FIG. 1 is a schematic view of a scenario provided in an embodiment of the present application. As shown in FIG. 1, a user may upload a scanned image of a document, or a document in another format such as a Word document, through the "upload" key or the photographing key of the software, and obtain a PDF document through the software's PDF conversion function; for example, a scanned image is converted into a pure image PDF, and a Word document is converted into an editable PDF document.

However, the user cannot take part in the conversion process, so it is difficult to control and adjust the conversion result in real time or to refine document details according to actual needs. Moreover, when pictures or pure image PDFs are converted into editable PDFs, documents with complex layouts (such as multi-column layouts, nested tables, and mixed image and text) are prone to text misalignment, formatting errors and image loss, which greatly reduces the readability and usability of the resulting editable PDF.

To solve these problems, the application provides a file conversion method for converting an image or a pure image PDF document into an editable PDF document. First, the elements in the image or pure image PDF document are accurately identified, and attribute analysis and logical layering are performed, which breaks the document down into structured data and provides a reliable data basis for the subsequent conversion. Second, an intermediate structured document, such as an HTML (HyperText Markup Language) document, is introduced so that the user can edit the document in real time and preview the editing result, improving the flexibility of document editing and the controllability of the conversion process.

FIG. 2 is a schematic flowchart of a file conversion method provided in the present application. The method may be executed by any device or module with the corresponding data processing capability, for example, a user terminal or a server of the software. As shown in FIG. 2, the file conversion method includes:

Step S201, obtaining a file to be converted, where the file to be converted is an image file or a PDF file.

The file to be converted may be a PDF file that includes at least some non-editable content, such as images. Here, "non-editable" means that the content cannot be edited with general-purpose software, although it may be editable with certain specialized software.

For example, the file to be converted may be a PDF file in which some pages consist of images, or a PDF file whose content consists entirely of images, that is, a pure image PDF document. The content of a pure image PDF document is not composed of editable elements such as text and vector graphics; each page is a single image and cannot be edited directly. In other words, a pure image PDF document contains only images, with no editable elements such as characters, tables or vector graphics. A pure image PDF document may be obtained by scanning a paper document, from a scan of an electronic file, by stitching pictures together, and so on. Pure image PDFs are generally large and inconvenient to transfer and store.

For example, the file to be converted, such as a pure image PDF document, may be uploaded directly through the "upload" key of the file upload function shown in FIG. 1. Alternatively, one or more scanned pictures obtained through a scanning or photographing function may be used as the file to be converted, or the scanned pictures may first be processed into a pure image PDF document that is then used as the file to be converted. The file is then converted to obtain the editable PDF document.

The file to be converted can be uploaded from various sources, such as local files on the device, recently browsed or saved files, and files within a preset program.

In some embodiments, a screenshot of a browsed web page may be taken and used as the file to be converted; alternatively, the web page screenshot may be processed into a pure image PDF document, which is then used as the file to be converted.

An image may be processed into a pure image PDF document by importing it into a blank PDF document, or by converting it with related software or tools.

Step S202, identifying elements in the file to be converted, determining attributes of the identified elements based on the identification result, and logically layering the identified elements, where the elements include text.

Because the file to be converted cannot be edited directly, its elements must be identified by recognition or detection algorithms. Besides text, the elements may include one or more of images, tables, graphics and the like. An image here usually means a raster image, also known as a bitmap, which consists of a matrix of pixels, each carrying color and position information; common formats include JPEG, PNG and GIF. A graphic is usually a vector graphic, such as a figure built from basic primitives like points, lines and polygons, or a shape defined by a mathematical formula, for example a flowchart or a chart.

The attributes of an element include its intrinsic attributes, such as position, size, text content or type, font, color, bold and italic, and its layout attributes, such as alignment and spacing.

Before the elements are identified, the file to be converted may be preprocessed to improve recognition accuracy, for example by image enhancement, tilt correction and page segmentation. Image enhancement improves the clarity of the images in the file through contrast adjustment, noise reduction, binarization and similar processing. Tilt correction corrects the angle of the images so that text lines are horizontal, and may be implemented with the Hough transform, projection methods, deep learning algorithms or the like. Page segmentation splits a multi-page file into single pages and/or divides a single page into regions, so that element identification can be performed on each page or region.
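Binarization is one of the enhancement steps named above. The source does not say how the threshold is chosen; as one common option, the sketch below applies Otsu's method to a grayscale page given as rows of 0-255 values. The function names and the "dark pixel = ink" convention are assumptions for illustration.

```python
def otsu_threshold(gray: list[list[int]]) -> int:
    """Return the gray level t (0-255) that maximizes Otsu's between-class variance.

    Pixels <= t form one class (assumed to be ink), pixels > t the other (paper).
    """
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels at or below the candidate threshold
    sum_low = 0  # sum of their gray levels
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray: list[list[int]], threshold: int) -> list[list[int]]:
    """Map dark pixels (<= threshold) to 1 (ink) and light pixels to 0 (paper)."""
    return [[1 if p <= threshold else 0 for p in row] for row in gray]


page = [[10, 10, 200, 200], [10, 200, 200, 200]]
t = otsu_threshold(page)
print(t, binarize(page, t))  # prints: 10 [[1, 1, 0, 0], [1, 0, 0, 0]]
```

On a clean two-tone page any threshold between the two levels separates them; Otsu's criterion matters when the histogram is noisy and the two peaks overlap.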

The elements in the file to be converted may be identified with element detection and localization algorithms, such as edge detection, object detection models (for example, YOLO or Fast R-CNN), and morphological operations (dilation and erosion).
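As an illustration of the morphological operations mentioned, the sketch below implements binary dilation and erosion with a rectangular structuring element in plain Python. Horizontal dilation is a common way to merge neighbouring characters into one text-block blob before connected component analysis; the window radii `rx` and `ry` are illustrative parameters, not values from the source.

```python
def dilate(img: list[list[int]], rx: int = 1, ry: int = 1) -> list[list[int]]:
    """Binary dilation: a pixel becomes 1 if any pixel in its (2rx+1) x (2ry+1) window is 1."""
    h, w = len(img), len(img[0])
    return [
        [
            int(any(
                img[yy][xx]
                for yy in range(max(0, y - ry), min(h, y + ry + 1))
                for xx in range(max(0, x - rx), min(w, x + rx + 1))
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def erode(img: list[list[int]], rx: int = 1, ry: int = 1) -> list[list[int]]:
    """Binary erosion: a pixel stays 1 only if its whole window lies inside the image and is all 1."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(ry, h - ry):
        for x in range(rx, w - rx):
            out[y][x] = int(all(
                img[yy][xx]
                for yy in range(y - ry, y + ry + 1)
                for xx in range(x - rx, x + rx + 1)
            ))
    return out
```

For example, `dilate([[1, 0, 0, 1, 0, 0, 0, 0, 1]], rx=1, ry=0)` bridges the two-pixel gap between the first two marks but not the wider gap before the last one, giving `[[1, 1, 1, 1, 1, 0, 0, 1, 1]]`.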

Taking text as an example, text in an image file or PDF file may be recognized by an optical character recognition (OCR) algorithm.

Specifically, text lines in the file to be converted can be located by a projection method, connected component analysis or a deep learning model. Each text line is then split into individual characters, which are recognized with models such as an RNN (Recurrent Neural Network) or a CNN (Convolutional Neural Network), and attributes of the characters, such as font, font size and color, are determined.
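The projection method can be sketched as follows, assuming a binarized page where 1 marks ink. Rows that contain ink form text lines (horizontal projection), and within one line, columns that contain ink form character candidates (vertical projection). Real pages need tolerances for noise and touching characters, which this minimal version omits; the helper names are hypothetical.

```python
def ink_runs(profile: list[int], min_ink: int = 1) -> list[tuple[int, int]]:
    """Return inclusive (start, end) ranges where the projection has at least min_ink ink pixels."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i
        elif value < min_ink and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs


def text_lines(binary: list[list[int]]) -> list[tuple[int, int]]:
    """Horizontal projection: rows containing ink, grouped into (top, bottom) line bands."""
    return ink_runs([sum(row) for row in binary])


def characters(binary: list[list[int]], top: int, bottom: int) -> list[tuple[int, int]]:
    """Vertical projection inside one line band: (left, right) column ranges of character candidates."""
    band = binary[top:bottom + 1]
    return ink_runs([sum(row[x] for row in band) for x in range(len(binary[0]))])


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
print(text_lines(page))         # prints: [(1, 2), (4, 4)]
print(characters(page, 1, 2))   # prints: [(0, 1), (3, 4)]
```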

Images and graphics can be identified with computer vision algorithms. Specifically, image regions and graphic regions can be obtained through color threshold segmentation; edge detection and boundary extraction are then performed on these regions and combined with connected component analysis to locate the image and graphic contours. Alternatively, an object detection model or an instance segmentation model may be used to identify or segment the images and graphics in the file, for example by outputting their bounding boxes.
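The color-threshold and connected component route can be sketched as follows: given a binary mask (1 where a pixel passed the color threshold), 4-connected components are labelled by breadth-first search and reported with bounding boxes and pixel counts. The function name and output format are illustrative; filtering by area or aspect ratio to tell images from graphics would be an additional, assumed step.

```python
from collections import deque


def connected_components(mask: list[list[int]]) -> list[dict]:
    """Label 4-connected regions of 1s; return each region's bbox (x0, y0, x1, y1) and area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append({"bbox": (x0, y0, x1, y1), "area": area})
    return regions


mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 1]]
print(connected_components(mask))
```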

Tables can be identified through table-line detection and connectivity analysis. For example, line segment detection and straight-line fitting may be performed on the region containing the recognized text, the intersection points between the line elements determined, a connection network built from those intersection points, and the network then judged as to whether it forms a table. Alternatively, tables in the file to be converted may be identified with a deep learning algorithm.
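The intersection-based table check can be sketched as below. It assumes line segments have already been detected and fitted, horizontal ones as `(y, x_start, x_end)` and vertical ones as `(x, y_start, y_end)`. Treating a table as a complete grid, where every participating horizontal line meets every participating vertical line, is a simplifying assumption: real tables may have merged cells or missing borders.

```python
def intersections(h_lines, v_lines, tol=2):
    """Return (h_index, v_index, x, y) for every crossing of a horizontal and a vertical segment."""
    points = []
    for i, (y, hx0, hx1) in enumerate(h_lines):
        for j, (x, vy0, vy1) in enumerate(v_lines):
            if hx0 - tol <= x <= hx1 + tol and vy0 - tol <= y <= vy1 + tol:
                points.append((i, j, x, y))
    return points


def looks_like_table(h_lines, v_lines, tol=2):
    """Return {"rows", "cols"} if the segments form a complete grid, else None."""
    pts = intersections(h_lines, v_lines, tol)
    h_used = {i for i, _, _, _ in pts}
    v_used = {j for _, j, _, _ in pts}
    if len(h_used) < 2 or len(v_used) < 2:
        return None
    # Complete grid: the crossing network connects every used row line to every used column line.
    if len(pts) != len(h_used) * len(v_used):
        return None
    return {"rows": len(h_used) - 1, "cols": len(v_used) - 1}


h = [(0, 0, 30), (10, 0, 30), (20, 0, 30)]
v = [(0, 0, 20), (10, 0, 20), (20, 0, 20), (30, 0, 20)]
print(looks_like_table(h, v))  # prints: {'rows': 2, 'cols': 3}
```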

After each type of element in the file to be converted has been identified, the attributes of each element are determined from the identification result, such as position (for example, coordinates), alignment, resolution, text type, font size, color, number of rows and columns, and cell spacing. The attributes may further include relationship attributes between elements, such as spatial and logical relationships.
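One possible in-memory representation of an identified element, plus one derived layout attribute, is sketched below. The `Element` fields and the margin-comparison rule for alignment are assumptions, not taken from the source; in particular, a full-width paragraph has equal margins and is reported as centred, so a real system would also consider paragraph width and line endings.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    kind: str                                  # "text", "image", "table" or "graphic"
    bbox: tuple[float, float, float, float]    # x0, y0, x1, y1 in page pixels, y grows downward
    content: str = ""                          # recognized text, image reference, ...
    style: dict = field(default_factory=dict)  # e.g. {"font_size": 12, "color": "#000", "bold": False}


def infer_alignment(el: Element, page_width: float, tol: float = 0.02) -> str:
    """Guess horizontal alignment by comparing the left and right margins (heuristic)."""
    left = el.bbox[0]
    right = page_width - el.bbox[2]
    if abs(left - right) <= tol * page_width:
        return "center"
    return "left" if left < right else "right"


print(infer_alignment(Element("text", (50, 0, 300, 20), "Hello"), 600))  # prints: left
```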

The logical layering step organizes the identified elements into a multi-level structure according to logical or reading order. Specifically, the identified elements can be layered according to their types and positions to obtain a background layer, a text layer, a graphic layer, a table layer and the like. The background layer may consist of non-content elements such as watermarks, page borders and backgrounds. The text layer is obtained by classifying the identified text blocks into levels such as titles and paragraphs, where a text block consists of the text recognized within one connected region. In the image layer, each identified image may form an independent layer, and one or more identified graphics may form a layer. Tables are placed in a separate table layer, which contains the row and column data and the association between the table and its text.
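A minimal sketch of this layering rule, assuming elements already carry a type and a bounding box. Sorting by the top edge and then the left edge approximates reading order for single-column pages only, and classifying text as a title when its font size is at least 1.3 times the page's median size is an assumed heuristic, not the source's rule.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Element:
    kind: str            # "text", "image", "graphic", "table", "watermark", "border", "background"
    bbox: tuple          # (x0, y0, x1, y1), y grows downward
    content: str = ""
    font_size: float = 0.0


BACKGROUND_KINDS = {"watermark", "border", "background"}


def layer_page(elements: list[Element], title_ratio: float = 1.3) -> dict:
    """Split one page's elements into layers, each in top-to-bottom, left-to-right order."""
    layers = {"background": [], "text": [], "image": [], "graphic": [], "table": []}
    sizes = [e.font_size for e in elements if e.kind == "text"]
    body_size = median(sizes) if sizes else 0.0
    for e in sorted(elements, key=lambda e: (e.bbox[1], e.bbox[0])):
        if e.kind in BACKGROUND_KINDS:
            layers["background"].append(e)
        elif e.kind == "text":
            # Text noticeably larger than the page's typical size is treated as a title.
            role = "title" if e.font_size >= title_ratio * body_size else "paragraph"
            layers["text"].append((role, e))
        elif e.kind in layers:            # image, graphic, table; unknown kinds are dropped
            layers[e.kind].append(e)
    return layers


els = [
    Element("text", (50, 0, 550, 30), "Title", 24),
    Element("text", (50, 40, 550, 70), "p1", 12),
    Element("text", (50, 80, 550, 110), "p2", 12),
    Element("image", (50, 120, 300, 280)),
    Element("table", (50, 300, 550, 400)),
    Element("watermark", (0, 0, 600, 800)),
]
layers = layer_page(els)
print([role for role, _ in layers["text"]])  # prints: ['title', 'paragraph', 'paragraph']
```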

In the step of logical layering, a tree structure may be used to store the hierarchical relationships of the elements obtained.

For example, FIG. 3 is a schematic diagram of a hierarchical relationship provided by an embodiment of the present application. As shown in FIG. 3, a pure image PDF document includes 2 pages, and page 1 includes, from top to bottom, a piece of text (a title H1 and 2 paragraphs p1 and p2), an image img1 and a table table1. In the resulting hierarchy, the root node represents the pure image PDF document, its child nodes are the pages (page 1 and page 2), and each page node has multiple descendant nodes representing the structure of that page's content. Taking page 1 as an example, its child nodes are a background layer, a text layer, an image layer and a table layer. The text layer has 3 child nodes: a title (H1), text paragraph 1 (p1) and text paragraph 2 (p2); the image layer has the child node image 1 (img1), and the table layer has the child node table 1 (table1).
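The tree in FIG. 3 can be represented with a simple node type and serialized to JSON, one of the formats the source names for the layering result. The `Node` class is a hypothetical illustration; the content of page 2 is not described, so it is left empty.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

    def add(self, *kids: "Node") -> "Node":
        """Append child nodes and return self so that trees can be built in one expression."""
        self.children.extend(kids)
        return self

    def to_dict(self) -> dict:
        return {"label": self.label, "children": [c.to_dict() for c in self.children]}


# Hierarchy of FIG. 3: document -> pages -> layers -> elements.
doc = Node("pure image PDF").add(
    Node("page 1").add(
        Node("background layer"),
        Node("text layer").add(Node("H1"), Node("p1"), Node("p2")),
        Node("image layer").add(Node("img1")),
        Node("table layer").add(Node("table1")),
    ),
    Node("page 2"),
)
print(json.dumps(doc.to_dict(), indent=2))
```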

Step S203, generating a structured document based on the logical layering result and the attributes of the identified elements.

Illustratively, the structured document may be an HTML document, an XML (Extensible Markup Language) document, or a document in another structured format.

Specifically, taking an HTML document as an example, HTML tags may be mapped to each level of the logical layering result: for example, the text layer is mapped to headings (<h1> to <h6>) and paragraphs (<p>, <span>), the image layer to <img>, <figure> and the like, and the table layer to <table>. The content and style of each HTML tag are determined by the attributes of the corresponding element, and CSS styles are used to reproduce the original visual appearance.
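A minimal sketch of the tag mapping, assuming each layered element arrives as a dict with a `role` and a pixel `bbox`. The dict keys, the absolute-positioning strategy for reproducing the layout, and the function names are illustrative assumptions; `html.escape` keeps recognized text from being interpreted as markup.

```python
from html import escape

TAGS = {"title": "h1", "paragraph": "p", "image": "img", "table": "table"}


def css_for(el: dict) -> str:
    """Inline CSS that pins the element where it was detected on the page."""
    x0, y0, x1, y1 = el["bbox"]
    css = f"position:absolute;left:{x0}px;top:{y0}px;width:{x1 - x0}px"
    if "font_size" in el:
        css += f";font-size:{el['font_size']}px"
    if el.get("bold"):
        css += ";font-weight:bold"
    return css


def to_html(el: dict) -> str:
    tag = TAGS[el["role"]]
    if tag == "img":
        return f'<img src="{escape(el["src"])}" style="{css_for(el)}">'
    if tag == "table":
        rows = "".join(
            "<tr>" + "".join(f"<td>{escape(cell)}</td>" for cell in row) + "</tr>"
            for row in el["rows"]
        )
        return f'<table style="{css_for(el)}">{rows}</table>'
    return f'<{tag} style="{css_for(el)}">{escape(el["text"])}</{tag}>'


print(to_html({"role": "title", "bbox": (10, 20, 210, 50), "text": "A & B", "font_size": 24}))
```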

The logical layering result may be in JSON or XML format. An HTML template can be predefined; the logical layering result is converted into a data structure that the template engine can consume, the parsed attributes of each element are mapped to HTML tags or CSS properties to determine how each element is presented in HTML, and the template engine combines the converted data with the HTML template to generate the final displayable HTML document.
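The template-engine step can be sketched with the standard library's `string.Template`. The JSON shape of the layering result (`title`, and `blocks` carrying `role` and `text`) is an assumption made for this example; a production system would typically use a full template engine and also emit the CSS derived from element attributes.

```python
import json
from html import escape
from string import Template

PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>$title</title></head>
<body>
$body
</body></html>""")


def render(layering_json: str) -> str:
    """Fill the predefined HTML template with blocks from a (hypothetical) JSON layering result."""
    data = json.loads(layering_json)
    parts = []
    for block in data["blocks"]:
        tag = {"title": "h1", "paragraph": "p"}[block["role"]]
        parts.append(f"<{tag}>{escape(block['text'])}</{tag}>")
    return PAGE_TEMPLATE.substitute(title=escape(data["title"]), body="\n".join(parts))


doc = json.dumps({"title": "Doc", "blocks": [{"role": "title", "text": "Hi"},
                                             {"role": "paragraph", "text": "x<y"}]})
print(render(doc))
```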

After the structured document is generated, it may be presented in an editor so that the user can edit it, for example by modifying text content, changing text attributes, resizing, replacing or moving images, or editing tables.

Step S204, converting the structured document or the edited structured document into an editable PDF file.

Illustratively, the editable PDF file may be an editable PDF document in a format such as PDF/A, PDF/X, PDF/E or standard PDF.

After the structured document is generated, it may be converted directly into an editable PDF document. Alternatively, the structured document may first be displayed, and it is converted into an editable PDF document when a PDF conversion instruction from the user is detected, or when no editing operation by the user is detected within a preset time period.
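The second trigger, converting either on an explicit user instruction or after a preset period without edits, amounts to a small piece of state. Below is a sketch with a hypothetical `ConversionTrigger` class and an injectable clock so the timing can be tested; the 30-second default is an illustrative value, not from the source.

```python
import time


class ConversionTrigger:
    """Decide when to convert the displayed structured document into an editable PDF."""

    def __init__(self, idle_seconds: float = 30.0, clock=time.monotonic):
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.last_edit = clock()   # the display time counts as the start of the idle period
        self.requested = False

    def on_edit(self) -> None:
        self.last_edit = self.clock()

    def on_convert_command(self) -> None:
        self.requested = True

    def should_convert(self) -> bool:
        return self.requested or self.clock() - self.last_edit >= self.idle_seconds
```

In an editor, `should_convert()` would be polled from a timer, and `on_edit()` called from every editing handler.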

Taking an HTML document as the structured document, converting the HTML document (or the edited HTML document) into an editable PDF document may include three steps: document parsing, layout calculation and element mapping. The document parsing step parses the HTML document into an abstract syntax tree, extracts the tags, text content and hierarchical relationships, and parses the CSS styles, merging the styles that apply to each element to obtain its final style. The layout calculation step converts each element from the HTML box model into the PDF coordinate system to obtain its absolute coordinates in the PDF document. The element mapping step places each element into the PDF document according to its absolute coordinates and final style, preserving its original style; the line break positions of text can be determined from the width of the PDF document and the font size of the text.

Optionally, the method further includes: providing an editing interface for the structured document, and, in response to an editing operation on the structured document input by the user through the editing interface, updating the content of the structured document to obtain the edited structured document.
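If the structured document is kept as well-formed XHTML, an edit from the interface can be applied as a small operation on the element tree. The sketch below uses Python's `xml.etree.ElementTree`; the operation format (`id`, `type`, `value`) and the two supported edit types are hypothetical, chosen only to show text and style edits updating the document.

```python
import xml.etree.ElementTree as ET


def apply_edit(root: ET.Element, op: dict) -> ET.Element:
    """Apply one editing operation to the element whose id attribute matches op["id"]."""
    node = root.find(f".//*[@id='{op['id']}']")
    if node is None:
        raise KeyError(op["id"])
    if op["type"] == "set_text":
        node.text = op["value"]
    elif op["type"] == "set_style":
        # Merge CSS declarations: existing ones keep their order, new values win.
        decls = dict(
            (k.strip(), v.strip())
            for k, v in (d.split(":", 1) for d in node.get("style", "").split(";") if ":" in d)
        )
        decls.update(op["value"])
        node.set("style", ";".join(f"{k}:{v}" for k, v in decls.items()))
    else:
        raise ValueError(f"unsupported edit type: {op['type']}")
    return root


root = ET.fromstring('<body><p id="p1" style="font-size:12px">old</p></body>')
apply_edit(root, {"id": "p1", "type": "set_text", "value": "new"})
apply_edit(root, {"id": "p1", "type": "set_style", "value": {"font-weight": "bold"}})
print(ET.tostring(root, encoding="unicode"))
```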

After the structured document is displayed, the user can edit its content through the editing interface. Text editing operations include, but are not limited to, adding, deleting and modifying characters and adjusting character attributes, where attribute adjustment covers font, font size, color, character spacing, line spacing, bold, italic and the like. Image editing operations include scaling, replacement, rotation, alignment and the like. Graphic editing operations include scaling, position adjustment, rotation, fill, transparency, shading and alignment adjustment, as well as graphic-specific edits such as adjusting curve shapes and combining or splitting the elements within a graphic. Table editing operations include inserting and deleting rows or columns, adjusting rows and columns, merging and splitting cells, editing cell content, adjusting the table style (for example, border color, border thickness, background color and font style), adjusting row height, table width, table position and column size, and sorting table data.

After the structured document, or one of its pages, is generated, page-by-page rendering and interactive editing at page granularity are supported. Exemplarily, FIG. 4 is a schematic diagram of an editing interface of a structured document according to an embodiment of the present application. As shown in FIG. 4, the editing interface uses two cooperating toolbars: a resident toolbar at the bottom and a floating region toolbar. The bottom resident toolbar has a fixed position at the bottom of the interface; its controls may be default controls or may be adjusted dynamically according to the type of element the user selects. The floating region toolbar is rendered dynamically in the area where the user is operating: when the user selects a page element, editing controls for that element float above it. FIG. 4 shows the case where the user selects a piece of text. The bottom resident toolbar then presents text attribute adjustment controls for font attributes, color and spacing (including character spacing and line spacing), such as controls for font, font size (A⁺ to increase and A⁻ to decrease the font size), bold (B), italic (I) and underline (U). The floating region toolbar includes controls for editing, deleting, selecting all, copying, translating, moving, adjusting font size (font size + and font size -), bold, and alignment (left, center and right).

The file conversion method provided by this embodiment targets files that cannot be edited and occupy a large amount of storage, such as images or PDF files. An element identification step determines the attributes of the elements in the file and logically layers the identified elements; a structured document, such as an HTML document, is generated from the element attributes and the layering result; and the structured document or the edited structured document is finally converted into an editable PDF document in which the user can directly select, copy and search text. This improves the operability of the file and the convenience of reusing editable elements such as text. The intermediate structured document supports immediate editing and preview of the document, and because it fully preserves the layout of the image or PDF, it also improves the quality of the resulting editable PDF.

Optionally, the file to be converted includes at least two of text, images, tables and graphics, and identifying the elements in the file to be converted, determining the attributes of the identified elements based on the identification result, and logically layering the identified elements includes: performing multimodal recognition on the file to be converted to identify its elements; determining the attributes of each identified element; determining the spatial relationships and semantic associations among the identified elements based on their attributes; and logically layering the identified elements based on those spatial relationships and semantic associations.
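As one example of combining a spatial relationship with a semantic cue, the sketch below links caption-like text blocks (those starting with "Figure n" or "Table n") to the nearest image or table directly above or below them. The caption pattern, the `max_gap` value and the data layout are assumptions for illustration; the source does not specify how semantic association is computed.

```python
import re

CAPTION = re.compile(r"^(fig(ure)?|table)\s*\d+", re.IGNORECASE)


def vertical_gap(upper, lower):
    """Distance from the bottom of `upper` to the top of `lower` (negative if they overlap)."""
    return lower[1] - upper[3]


def horizontal_overlap(a, b):
    return max(0, min(a[2], b[2]) - max(a[0], b[0]))


def link_captions(texts: dict, figures: dict, max_gap: float = 40) -> dict:
    """texts: id -> (bbox, content); figures: id -> bbox. Returns caption id -> figure id."""
    links = {}
    for tid, (tbox, content) in texts.items():
        if not CAPTION.match(content):
            continue
        best = None
        for fid, fbox in figures.items():
            if horizontal_overlap(tbox, fbox) == 0:
                continue  # must share some horizontal extent to be "directly" above or below
            gap = min(abs(vertical_gap(fbox, tbox)), abs(vertical_gap(tbox, fbox)))
            if gap <= max_gap and (best is None or gap < best[0]):
                best = (gap, fid)
        if best:
            links[tid] = best[1]
    return links


figures = {"img1": (100, 100, 300, 250)}
texts = {"t1": ((110, 260, 290, 275), "Figure 1 Sales"), "t2": ((100, 500, 300, 520), "Body text")}
print(link_captions(texts, figures))  # prints: {'t1': 'img1'}
```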

Optionally, before the elements in the file to be converted are identified, the method further includes: determining an interference attribute of the file to be converted, where the interference attribute includes at least one of text clarity, table integrity and image interference degree; and performing the step of identifying the elements in the file to be converted if the interference attribute indicates that the file to be converted can be typeset.
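The source does not define how the interference attributes are measured. As an assumption, text clarity can be approximated by the variance of the Laplacian (a common blur measure: sharp edges give a large variance), and the decision can be a threshold on each available attribute. All thresholds below are illustrative placeholders.

```python
def laplacian_variance(gray: list[list[float]]) -> float:
    """Sharpness proxy: variance of the 4-neighbour Laplacian over interior pixels (needs >= 3x3)."""
    h, w = len(gray), len(gray[0])
    vals = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def can_typeset(attrs: dict, min_clarity: float = 100.0,
                min_table_integrity: float = 0.8, max_interference: float = 0.3) -> bool:
    """Hypothetical gate: proceed to element identification only if every measured attribute is acceptable.

    Attributes that were not measured are treated as acceptable.
    """
    return (
        attrs.get("text_clarity", float("inf")) >= min_clarity
        and attrs.get("table_integrity", 1.0) >= min_table_integrity
        and attrs.get("image_interference", 0.0) <= max_interference
    )
```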

Optionally, obtaining the file to be converted includes: obtaining at least one picture to be converted, and embedding the at least one picture into a blank PDF file to obtain a pure image PDF file, which serves as the file to be converted.

FIG. 5 is a second flowchart of the file conversion method provided by the present application. Building on the embodiment of FIG. 2, this embodiment describes the file conversion method in more detail. As shown in FIG. 5, the method includes the following steps:

Step S501, at least one picture to be converted is acquired.

The at least one picture to be converted may be a scanned image of a document or a picture uploaded by the user.

Step S502, embedding the at least one picture into a blank PDF document to obtain a file to be converted.

The file to be converted is a pure image PDF file. The at least one picture may be a scanned image of a document, a photograph (e.g., a landscape or artistic photograph), or an image of a chart, an e-book page, a comic and the like, such as a screenshot.

Each page of the blank PDF document can be treated as a canvas: each picture to be converted is positioned on it through a coordinate system and stored in the PDF page as a stream, yielding the pure image PDF document.

Before the at least one picture is embedded in the blank PDF document, it may also be preprocessed, e.g., by format conversion, color space conversion, and size and resolution adaptation, and it may also be losslessly compressed.

The pictures can be organized into a picture list, and PDF pages are generated one by one in list order to obtain the pure image PDF document. One picture may correspond to one PDF page, or several consecutive pictures may be stitched together according to their sizes and the stitched picture embedded into a single PDF page.
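
The page-layout part of this step can be sketched as below. This is an illustrative sketch only: the A4 page size in PDF points, the fit-to-width scaling rule and the names `Picture` and `pack_pictures` are assumptions, not taken from the source. It computes where each picture goes on which page; actually writing the image streams into the PDF is left to a PDF library and is not shown.

```python
from dataclasses import dataclass

@dataclass
class Picture:
    name: str
    width: float   # source width in pixels
    height: float  # source height in pixels

# A4 in PDF points (1/72 inch); an assumed default page size.
A4 = (595.0, 842.0)

def pack_pictures(pictures, page_size=A4, stitch=True):
    """Lay out pictures, in list order, onto PDF pages.

    Each picture is scaled to the page width (aspect ratio preserved) and
    shrunk further if it would still be taller than the page. With
    stitch=True, consecutive pictures are stacked on the same page while
    they fit; with stitch=False every picture gets its own page.
    Returns one list per page of (name, x, y, w, h) placements, using a
    top-left origin for readability (PDF user space is bottom-left).
    """
    page_w, page_h = page_size
    pages, current, used = [], [], 0.0
    for pic in pictures:
        scale = page_w / pic.width
        if pic.height * scale > page_h:
            scale = page_h / pic.height
        w, h = pic.width * scale, pic.height * scale
        # Start a new page if stitching is off or the picture no longer fits.
        if current and (not stitch or used + h > page_h):
            pages.append(current)
            current, used = [], 0.0
        current.append((pic.name, (page_w - w) / 2, used, w, h))  # centred horizontally
        used += h
    if current:
        pages.append(current)
    return pages
```

With two 1000x500 pictures followed by a 1000x1000 picture, stitching places the first two on one page and the third on a second page; with `stitch=False` each picture gets its own page.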

In the picture list, the pictures may be ordered by upload or acquisition order; alternatively, page numbers and/or content in the pictures may be recognized and the order of the pictures in the list determined from the recognized page numbers and/or content.

After all pictures are embedded into PDF pages, borders, page numbers and titles may be added to the pages to obtain the pure image PDF document.

Step S503, determining an interference attribute of the file to be converted, the interference attribute including at least one of text definition, table integrity and image interference degree.

The image interference degree indicates the extent to which watermarks, seals and non-text regions detected in the file to be converted, such as a pure image PDF file, occlude its text regions.

After the file to be converted is obtained and before it is converted, its complexity is assessed first, to determine whether its layout can be recognized by the subsequent steps of this embodiment, i.e., whether it is typesettable.

Specifically, the interference attributes of the file to be converted can be quantified using computer vision, machine learning algorithms and the like, in order to evaluate whether the file is typesettable.

For a text region (a region containing text) in the file to be converted, its text definition can be determined by computing the edge gradient values of the region. The mean text definition of the text regions is then computed at page, single-image or whole-file granularity, an index of text definition is derived from this mean, and the index is used to decide whether the file to be converted is typesettable.

In some embodiments, text definition may instead be determined from the accuracy of OCR recognition, or from the quality of each image or page image in the file to be converted.

Taking a pure image PDF document as an example, the text definition of the document, or of the text on each of its pages, can be determined from the mean OCR confidence for that text: the higher the mean confidence, the better the text definition. Alternatively, the variance of the Laplacian of the document or of each page's text region can be computed; a smaller variance indicates that the text is less sharp, i.e., more blurred.
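
Both sharpness signals can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the page or region is given as a 2-D list of gray levels, OCR confidences in [0, 1] come from whatever OCR engine is used, and the combining rule `text_is_sharp` and its thresholds are hypothetical. A real system would normally compute the Laplacian with an image library.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    gray: 2-D list of grayscale values (rows of equal length, at least 3x3).
    Low values mean few sharp edges, i.e. blurred text.
    """
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            responses.append(gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1]
                             + gray[y][x + 1] - 4 * gray[y][x])
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def mean_ocr_confidence(confidences):
    """Average OCR confidence of the recognised characters (0.0 if none)."""
    return sum(confidences) / len(confidences) if confidences else 0.0

def text_is_sharp(gray, confidences, var_threshold=100.0, conf_threshold=0.8):
    # Hypothetical rule combining both signals; thresholds are illustrative only.
    return (laplacian_variance(gray) >= var_threshold
            and mean_ocr_confidence(confidences) >= conf_threshold)
```

A hard black-to-white edge gives a large variance, while a smooth gray ramp gives a variance close to zero, matching the "smaller variance, more blurred" reading above.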

When the text definition of the file to be converted is below a set threshold, the file is considered not typesettable. Alternatively, when the text definition of any page of the file is below a set threshold, that page, or the whole file, is considered not typesettable.

For table integrity, the continuity and intersection density of the table grid lines can be detected with the Hough transform, and table integrity is determined from them: the more continuous the grid lines, and the better the intersection density matches the table's numbers of rows and columns, the higher the table integrity. Alternatively, a U-Net segmentation model can segment the table into two classes, line segments and background, and the number of connected components of the segmented line segments is counted: the more connected components, the more severely the table lines are broken and the lower the table integrity. When the table integrity is below a set threshold, the file to be converted, the table, or the page containing the table can be considered not typesettable.
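
The connected-component variant can be sketched as follows, assuming the segmentation model has already produced a binary line mask (True for line pixels). The mapping from component count to an integrity score, 1 / (number of components), is a hypothetical choice for illustration: an unbroken grid forms one component and scores 1.0, and every break lowers the score.

```python
from collections import deque

def count_components(mask):
    """Count 8-connected components of truthy pixels in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            count += 1  # new, unvisited line fragment
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of this fragment
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def table_integrity(line_mask):
    """Hypothetical score: 1.0 for one unbroken grid, lower as lines fragment."""
    n = count_components(line_mask)
    return 0.0 if n == 0 else 1.0 / n
```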

For the image interference degree, the proportion of text that is occluded by watermarks, seals or non-text regions in the file to be converted can be measured: the higher the occlusion proportion, the higher the image interference degree. If the image interference degree exceeds a set threshold, the file to be converted or the corresponding page is considered not typesettable.
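
A simple way to express the occlusion proportion is the fraction of text pixels covered by at least one occluder. The sketch below assumes that text regions and watermark/seal/non-text regions have already been detected as integer pixel boxes; rasterizing them into pixel sets is chosen for clarity rather than speed, so overlapping occluders are not double-counted.

```python
def box_pixels(box):
    """Integer pixel coordinates covered by a half-open box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}

def interference_degree(text_boxes, occluder_boxes):
    """Fraction of text pixels covered by at least one occluder box."""
    text = set().union(*(box_pixels(b) for b in text_boxes))
    if not text:
        return 0.0  # no text, nothing to occlude
    occluded = set().union(*(box_pixels(b) for b in occluder_boxes))
    return len(text & occluded) / len(text)
```

For a 10x10 text box with a seal covering its right half, the function returns 0.5.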

Step S504, if the interference attribute indicates that the file to be converted is typesettable, performing multi-modal identification on the file to be converted, identifying its elements, and determining the attribute of each identified element.

Specifically, whether the file to be converted, or a given page, is typesettable may be decided from a weighted combination of the computed interference attributes. If some pages of the file are not typesettable, they can be skipped and element identification performed only on the typesettable pages; blank pages can be used as placeholders for the skipped pages in the generated structured document. If the file has no typesettable page at all, the file is determined to be not typesettable.

Optionally, the method further includes determining that the file to be converted is typesettable if the text definition is greater than a first threshold, the table integrity is greater than a second threshold, and the image interference degree is less than a third threshold.

The first threshold, the second threshold, and the third threshold are configurable parameters.

The typesettability judgment can also be made at page granularity: for each page of the file to be converted, if its text definition is greater than the first threshold, its table integrity is greater than the second threshold and its image interference degree is less than the third threshold, the page is determined to be typesettable.
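
The page-level three-threshold rule, together with the skip-and-placeholder behaviour described above, can be sketched as below. The default threshold values, the normalisation of all three attributes to [0, 1], and the convention of giving pages without tables an integrity of 1.0 are assumptions made for illustration; the source only states that the thresholds are configurable.

```python
from dataclasses import dataclass

@dataclass
class PageQuality:
    text_definition: float   # e.g. normalised sharpness score in [0, 1]
    table_integrity: float   # 1.0 for pages without tables (assumption)
    interference: float      # occluded fraction of text, in [0, 1]

def page_is_typesettable(q, t1, t2, t3):
    """Three-threshold rule: definition > t1, integrity > t2, interference < t3."""
    return q.text_definition > t1 and q.table_integrity > t2 and q.interference < t3

def screen_pages(pages, t1=0.6, t2=0.5, t3=0.3):
    """Split page indices into typesettable pages and pages to skip.

    Skipped pages would become blank placeholders in the structured document;
    the file as a whole is typesettable only if at least one page is kept.
    """
    keep, skip = [], []
    for i, q in enumerate(pages):
        (keep if page_is_typesettable(q, t1, t2, t3) else skip).append(i)
    return keep, skip
```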

This approach is simple and easy to implement. By pre-judging file quality before element identification, it filters out blurred files that are unsuitable for processing, avoids wasted conversions of severely noisy or blurred files, reduces computational cost and safeguards the quality of the editable PDF conversion.

Further, when the file to be converted, or some of its pages, is not typesettable, prompt information can be generated and fed back to the user. The prompt can be built from the pages and elements whose attributes fail the typesettability conditions, prompting the user to rescan those pages so that the corresponding elements become clearer.

Optionally, the elements in the file to be converted include at least two of text, images, tables and graphics.

When the file to be converted is typesettable, a multi-modal identification algorithm can be used to identify its elements, so that multi-modal content is handled.

Specifically, an OCR engine locates text blocks and extracts their text content and bounding-box coordinates, and layout analysis distinguishes different kinds of text within the blocks, such as body text, titles, headers and footers. Image regions are identified based on color clustering and edge detection, and a deep learning model determines whether the elements in an image region are images or graphics and locates them. Tables are located by detecting grid lines with the Hough transform, dividing cells through connected-component analysis, and determining the table structure and cell contents with a semantic segmentation model.

Text attributes such as font, font size, bold and italic can be extracted by a deep learning model, and text color can be obtained with a color extractor.

Optionally, performing multi-modal identification on the file to be converted, identifying its elements and determining the attribute of each identified element includes: performing text detection on the file to identify text elements and determine their attributes; for regions in which no text was recognized, identifying image elements and determining their attributes from the features of the pixels in the region and the spatial relationships among those pixels; identifying table elements and determining their attributes from the alignment of the identified text and the visual separation lines between the identified text; and, if an unrecognized region remains, identifying graphics and determining their attributes from the contour features and connected regions of the image content in the remaining region.

In multi-modal identification, elements may be identified in the order text, images, tables, graphics.

Specifically, text blocks are located with a deep learning model or a machine vision algorithm, and the characters in each text block and their attributes are recognized with an OCR algorithm.

Once the regions containing block-level text have been identified, image recognition is performed on the regions in which no text was identified. Specifically, the pixels in these regions are clustered, e.g., with the K-means algorithm, based on pixel features and the spatial relationships among pixels. Texture analysis is then performed on each resulting cluster to determine whether the corresponding region is an image; if it is, the size of the image is determined from the connected region it occupies, and its content is identified with an object detection model.
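
The clustering step can be illustrated with a plain Lloyd's K-means, assuming each pixel of the non-text region is turned into a feature tuple such as (gray level, x, y). The deterministic farthest-point initialisation and the per-cluster gray-level variance used as a crude texture cue are illustrative choices, not details from the source; a production system would use a vectorised library and a proper texture descriptor.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    points: list of equal-length numeric tuples, e.g. (gray, x, y) per pixel.
    Returns (labels, centroids).
    """
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids

def cluster_intensity_variance(intensities, labels, k):
    """Crude texture cue: gray-level variance per cluster (photos vary more)."""
    out = []
    for j in range(k):
        vals = [v for v, lab in zip(intensities, labels) if lab == j]
        m = sum(vals) / len(vals) if vals else 0.0
        out.append(sum((v - m) ** 2 for v in vals) / len(vals) if vals else 0.0)
    return out
```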

For tables, the identified text blocks are grouped into rows and columns: the blocks are clustered by y coordinate to obtain row groups and by x coordinate to obtain column groups, and the row-column intersection points (the cell vertices) are computed from the spacing between the row groups and the column groups. Visual separation lines are extracted with the Hough transform, and the coincidence ratio between these lines and the intersection points is computed. Whether a table exists is determined from the coincidence ratio and the alignment of the text blocks, thereby locating the table and determining its numbers of rows and columns, row heights, column widths, cell contents and so on.
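
The row/column grouping and the coincidence check can be sketched as follows. Assumptions: text-block centres come from OCR, the horizontal and vertical separation lines have already been extracted (e.g. by a Hough transform) and are given as y and x positions, and the cell vertices are taken as the midpoints between adjacent row and column centres. The gap-based 1-D grouping and the shared tolerance `tol` are simplifications chosen for the example.

```python
def group_1d(values, tol):
    """Group 1-D coordinates: a value within tol of the previous sorted value
    joins its group. Returns the mean coordinate of each group, ascending."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [sum(g) / len(g) for g in groups]

def interior_boundaries(centers):
    """Cell boundaries between adjacent row (or column) centres."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]

def table_candidate(block_centers, h_lines, v_lines, tol=5.0):
    """block_centers: (x, y) centres of OCR text blocks.
    h_lines / v_lines: y / x positions of detected separation lines.
    Returns (row centres, column centres, coincidence ratio of cell vertices with lines)."""
    rows = group_1d([y for _, y in block_centers], tol)
    cols = group_1d([x for x, _ in block_centers], tol)
    vertices = [(x, y) for y in interior_boundaries(rows) for x in interior_boundaries(cols)]
    hits = sum(1 for x, y in vertices
               if any(abs(y - ly) <= tol for ly in h_lines)
               and any(abs(x - lx) <= tol for lx in v_lines))
    ratio = hits / len(vertices) if vertices else 0.0
    return rows, cols, ratio
```

A high ratio combined with regular block alignment would support the presence of a table; a ratio near zero suggests aligned text without grid lines.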

Edge detection, contour extraction and connected-component analysis are performed on the region remaining after text, images and tables have been identified, and whether that region contains graphics is determined from the analysis result.

Identifying the multi-modal elements in this order ensures that text, the core carrier of information, is recognized reliably, and applying a different identification strategy to each kind of element takes into account how each is expressed visually, which improves the accuracy of multi-modal element identification.

Step S505, determining the spatial relationships and semantic associations of the identified elements based on the attributes of each identified element.

The spatial relationship may include at least one of adjacency, alignment and inclusion.

For the identified elements, adjacency can be determined from the distance between elements, such as the Euclidean distance, and a spatial adjacency graph constructed from it, whose nodes are the identified elements and whose edge weights are the reciprocals of the distances. Alignment between elements can be determined by whether their centers lie on the same horizontal or vertical line within an allowed tolerance, and inclusion is determined from whether one element's bounding box contains another's.
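
The three spatial relations can be sketched directly from element bounding boxes, as below. The distance cut-off `max_dist` for creating an adjacency edge and the alignment tolerance `tol` are assumed parameters; the source specifies only Euclidean distance, inverse-distance edge weights, centre alignment within a tolerance and bounding-box containment.

```python
import math
from itertools import combinations

def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def contains(outer, inner):
    """True if bounding box `outer` fully contains bounding box `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def spatial_relations(boxes, max_dist=100.0, tol=3.0):
    """boxes: (x0, y0, x1, y1) per identified element.

    Returns (edges, aligned, inclusion):
      edges     {(i, j): 1 / distance} for centres within max_dist (adjacency graph)
      aligned   (i, j, "horizontal" | "vertical") when centres line up within tol
      inclusion (outer, inner) index pairs
    """
    edges, aligned, inclusion = {}, [], []
    for i, j in combinations(range(len(boxes)), 2):
        ci, cj = center(boxes[i]), center(boxes[j])
        d = math.dist(ci, cj)
        if 0 < d <= max_dist:  # coincident centres get no edge (avoids 1/0)
            edges[(i, j)] = 1.0 / d
        if abs(ci[1] - cj[1]) <= tol:
            aligned.append((i, j, "horizontal"))
        if abs(ci[0] - cj[0]) <= tol:
            aligned.append((i, j, "vertical"))
        if contains(boxes[i], boxes[j]):
            inclusion.append((i, j))
        elif contains(boxes[j], boxes[i]):
            inclusion.append((j, i))
    return edges, aligned, inclusion
```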

Semantic association represents the degree of semantic relatedness between elements, such as how closely the text of different paragraphs is related, or how closely the content and/or titles of images, tables and graphics relate to the preceding or following paragraphs.

Step S506, logically layering the identified elements based on their spatial relationships and semantic associations.

At the content level, the spatial relationships and semantic associations of the elements can be used to assign the identified elements to layers. Taking text as an example, text lines or text blocks can be clustered based on their spatial relationships and semantic associations, and the identified text divided into layers, such as a title layer and a body layer, based on the clustering result.
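
A deliberately simplified stand-in for this layering step is sketched below. It uses only two spatial and attribute cues, font size relative to the median (for titles) and the vertical gap between consecutive lines (for paragraph breaks), and does not model semantic association. The ratios `title_ratio` and `gap_ratio` are hypothetical parameters.

```python
from statistics import median

def layer_text(lines, title_ratio=1.3, gap_ratio=0.8):
    """Split text lines (in reading order) into titled sections and paragraphs.

    lines: dicts with 'text', 'y0', 'y1' (top/bottom) and 'size' (font size).
    Returns [{'title': str | None, 'paragraphs': [[line text, ...], ...]}, ...].
    """
    base = median(line["size"] for line in lines)
    sections = [{"title": None, "paragraphs": []}]
    prev = None
    for line in lines:
        if line["size"] >= title_ratio * base:
            # Noticeably larger than the typical size: start a new title layer.
            sections.append({"title": line["text"], "paragraphs": []})
            prev = None
            continue
        paragraphs = sections[-1]["paragraphs"]
        if prev is not None and line["y0"] - prev["y1"] <= gap_ratio * line["size"]:
            paragraphs[-1].append(line["text"])  # small gap: same paragraph
        else:
            paragraphs.append([line["text"]])    # large gap: new paragraph
        prev = line
    # Drop the leading placeholder section if nothing came before the first title.
    return [s for s in sections if s["title"] is not None or s["paragraphs"]]
```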

For text immediately adjacent to an identified image, graphic or table, the title (caption) of that image, graphic or table can be determined from the adjacent text based on semantic association, and the title is assigned to the same layer as the image, graphic or table.
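
Caption binding can be approximated as sketched below. This is a hypothetical heuristic, not the source's semantic model: among text blocks that overlap a figure horizontally and sit directly above or below it within `max_gap`, it prefers blocks starting with a caption keyword (a crude stand-in for semantic association) and then the smallest gap.

```python
def bind_captions(figures, texts, max_gap=30.0, keywords=("Fig", "Figure", "Table")):
    """figures: dicts with 'id' and 'box' = (x0, y0, x1, y1).
    texts:   dicts with 'id', 'box' and 'text'.
    Returns {figure id: caption text id or None}."""
    result = {}
    for fig in figures:
        fx0, fy0, fx1, fy1 = fig["box"]
        best = None
        for t in texts:
            tx0, ty0, tx1, ty1 = t["box"]
            if tx1 < fx0 or tx0 > fx1:
                continue  # no horizontal overlap with the figure
            # Vertical gap to a block below (ty0 >= fy1) or above the figure.
            gap = ty0 - fy1 if ty0 >= fy1 else fy0 - ty1
            if gap < 0 or gap > max_gap:
                continue  # overlapping vertically, or too far away
            # Prefer caption-like text (False sorts first), then the smallest gap.
            score = (not t["text"].startswith(keywords), gap)
            if best is None or score < best[0]:
                best = (score, t["id"])
        result[fig["id"]] = best[1] if best else None
    return result
```

The bound caption and its figure can then be placed in the same layer when the structured document is built.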

Step S507, based on the logical layering result and the attribute of the identification element, a structured document is generated.
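
Since the source names HTML as one possible structured document, a single structured page could be emitted roughly as follows. The element dictionary format, the choice of `h1` for the title layer and `p` for body text, and absolute CSS positioning to preserve the original layout are all assumptions for illustration.

```python
from html import escape

def to_html(page_w, page_h, elements):
    """Render one structured page as absolutely positioned HTML.

    elements: dicts with 'type' ('text' or 'image') and 'box' = (x0, y0, x1, y1);
    text elements add 'text', optional 'layer' ('title' or 'body') and an
    optional 'style' dict of CSS properties; image elements add 'src'.
    """
    parts = [f'<div class="page" style="position:relative;width:{page_w}px;height:{page_h}px">']
    for el in elements:
        x0, y0, x1, y1 = el["box"]
        pos = f"position:absolute;left:{x0}px;top:{y0}px;width:{x1 - x0}px;height:{y1 - y0}px"
        if el["type"] == "text":
            css = "".join(f";{k}:{v}" for k, v in el.get("style", {}).items())
            tag = "h1" if el.get("layer") == "title" else "p"  # layer -> semantic tag
            parts.append(f'<{tag} style="{pos};margin:0{escape(css)}">{escape(el["text"])}</{tag}>')
        elif el["type"] == "image":
            parts.append(f'<img style="{pos}" src="{escape(el["src"])}" alt="">')
    parts.append("</div>")
    return "\n".join(parts)
```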

Step S508, converting the structured document or the edited structured document into an editable PDF document.

This embodiment builds a minimal generation pipeline that starts from pictures: the user can start editable PDF generation directly from a single picture or a batch of pictures, which is more convenient. Quantifying the interference attributes of the document makes it possible to judge whether the file to be converted is typesettable, so that blurred documents unsuitable for processing are identified and document quality is pre-judged before element identification; this avoids wasted conversions of severely noisy or blurred documents, reduces computational cost and safeguards PDF conversion quality. Multi-modal identification effectively recognizes the multi-modal elements in the document and makes element identification more comprehensive, and layering the elements based on an in-depth understanding of their spatial relationships and semantic associations improves layering accuracy, which in turn improves how faithfully the editable PDF document reproduces the original.

When the file to be converted contains multiple pages, an asynchronous processing mechanism can significantly improve the user experience. Rather than making the user wait until the entire PDF document has been converted into the structured document before editing, this embodiment uses asynchronous processing so that subsequent pages load while the user previews earlier ones. Specifically, each page of the file to be converted can be processed in parallel as an independent task and converted, one by one, into the corresponding page of the structured document, and each structured page can be preloaded as soon as it is produced, improving rendering efficiency. The structured document can be rendered page by page in page order, or the rendering flow can be triggered dynamically by user behavior so that the page the user is viewing, and the pages that follow it, are rendered first.

Optionally, identifying the elements in the file to be converted, determining the attributes of the identified elements based on the identification result and logically layering the identified elements are performed page by page, and generating the structured document includes: while the page of the structured document corresponding to the current page is being generated from the current page's logical layering result and element attributes, asynchronously generating the pages of the structured document corresponding to subsequent pages from their logical layering results and element attributes.

For each page of the file to be converted, an asynchronous mechanism can be used to identify the elements in the page, determine the attributes of the identified elements from the page's identification result, logically layer them, and generate the corresponding structured page from the page's logical layering result and element attributes. The structured document consists of the structured pages of all pages.

To make converting the file into a structured document more efficient, the structured pages of subsequent pages, e.g., pages i+1, i+2 and so on, can be generated asynchronously while the structured page of an earlier page, e.g., page i, is being generated.
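
One way to express this with the Python standard library is sketched below: every page starts in a worker thread at once, but structured pages are handed on strictly in page order as soon as each one is ready. `process_page` is a hypothetical placeholder for the per-page pipeline, and `on_page_ready` stands for pushing the page to the front end for preloading.

```python
import asyncio

def process_page(index, page):
    """Hypothetical stand-in for the per-page pipeline
    (element identification, logical layering, structured page generation)."""
    return f"<section data-page='{index}'>{page}</section>"

async def build_document(pages, on_page_ready):
    """Start every page at once in worker threads, but deliver the results
    in page order as soon as each one is ready."""
    tasks = [asyncio.create_task(asyncio.to_thread(process_page, i, p))
             for i, p in enumerate(pages)]
    results = []
    for i, task in enumerate(tasks):
        structured_page = await task       # page i is done; later pages keep running
        on_page_ready(i, structured_page)  # e.g. push to the front end for preloading
        results.append(structured_page)
    return results
```

It would be driven with `asyncio.run(build_document(pages, push))`. Threads pay off when the per-page work releases the GIL (for example, native OCR code); CPU-bound pure-Python work would need a process pool instead.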

Each structured page can be generated by following the steps for generating the structured document, with the object changed from the whole file to be converted to a single page of it; the details are not repeated here.

Optionally, the method further includes determining the page currently viewed by the user based on monitored operation events, and pushing each page to be rendered to the front end for rendering as soon as its corresponding structured page has been generated, the pages to be rendered including the page the user is currently viewing and the pages following it.

The subsequent pages may be one or more pages after the page the user is currently viewing. The page corresponding to a page to be rendered is its page in the structured document, i.e., its structured page.

To render the page the user is viewing, and the pages after it, first, user operation events such as scroll events, click events, keyboard events and touch events can be captured in real time by listening to front-end interaction events.

The page currently viewed by the user (the current page) is determined from the monitored operation events. For a scroll event, the current page can be derived from the vertical scroll distance and the height of a single page; when the user navigates with a page-number navigation button, the target page is taken as the current page; and when the user turns pages by touch swiping, the current page can be determined from the swipe distance and speed.

After determining the current page, the current page and the subsequent N pages may be regarded as pages to be rendered, where N is a positive integer, e.g., 2, 3, 5, etc.

N may be a default value, such as 2, or may be determined from the total number of pages of the file to be converted and the device capabilities, or from the user's behavior; for example, N may be determined from the swipe speed. When the user is editing and viewing a page, N can be set to a minimum value such as 0 or 1: since the user has less need to browse subsequent pages while editing, reducing N frees resources, keeps editing operations responsive and makes them smoother.

For example, when the structured document is loaded for the first time, the current page defaults to the first page and N is 2; when the user turns pages slowly (the swipe or scroll speed is below a preset speed), N may be 1; when the user turns pages quickly, N may be 5; and when the user is detected editing and viewing a page, N may be 1.
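
The scroll-based current-page rule and the example values of N above can be combined into a small scheduling helper. This is a sketch under assumptions: all pages share one rendered height, the slow/fast boundary `slow_speed` (in pixels per second) is a made-up value for the "preset speed", and the helper names are hypothetical.

```python
def current_page(scroll_top, page_height, total_pages):
    """0-based index of the page at the top of the viewport, clamped to the document."""
    index = int(max(scroll_top, 0) // page_height)
    return min(index, total_pages - 1)

def prefetch_count(speed, editing=False, first_load=False, slow_speed=800.0):
    """N, the number of pages to render after the current one.

    Mirrors the example values in the text: 2 on first load, 1 while editing,
    1 for slow paging and 5 for fast paging. slow_speed is an assumed value.
    """
    if first_load:
        return 2
    if editing:
        return 1
    return 1 if speed < slow_speed else 5

def pages_to_render(scroll_top, page_height, total_pages, speed,
                    editing=False, first_load=False):
    """Current page plus the next N pages, without running past the last page."""
    start = current_page(scroll_top, page_height, total_pages)
    n = prefetch_count(speed, editing, first_load)
    return list(range(start, min(start + n, total_pages - 1) + 1))
```

For a 10-page document with 1000-pixel pages, fast scrolling at offset 2500 schedules pages 2 to 7, while the same position during editing schedules only pages 2 and 3.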

Further, after the editable PDF document is generated, it may be displayed or previewed so that the user can view it and decide whether to export or store it and whether to edit it.

The visual operation interface for the editable PDF document provides flexible and varied interactive functions to meet users' editing and management needs. The interface supports both detailed single-page display and a multi-page thumbnail overview, and the user can quickly switch views through the paging navigation bar or by zooming the sidebar. The user can select a single page or a batch of pages for editing, whether modifying the text on one page or applying uniform formatting to several pages.

For content editing, a rich set of tools is provided, including adding and deleting annotations, inserting pictures and adjusting their positions, and adding and deleting text, helping users optimize content efficiently. For file attribute management, the user can flexibly adjust the page format, page size, compression and encryption attributes, and so on. The visual operation interface also supports deleting and adding pages; a new page can be generated in the same way as the editable PDF document, which is not repeated here.

Fig. 6 is a schematic diagram of the visual operation interface of an editable PDF document according to an embodiment of the present application. After the editable PDF document is generated, it is shown in the display interface of the visual operation interface. Fig. 6 shows a single-page display, with several function controls deployed at the bottom of the display interface, for example "Edit", "Page management", "File attributes", "Add" and "Done". The "Edit" control edits the currently displayed content: when triggered, it activates editing mode for the displayed page, changes the cursor to a text-selection cursor, and supports editing operations such as text editing and element deletion. When the "Page management" control is triggered, a page management interface is displayed for operations on single pages or batches of pages, including extracting, deleting, rotating and adding pages: selected pages can be extracted and saved as a separate PDF file, deleted, or rotated by a configured angle, and new pages can be added. The page order can be adjusted by dragging, and page numbers are updated automatically after reordering.

When the "File attributes" control is triggered, a floating panel appears above it containing controls such as volume compression, document encryption, page number adjustment and page size. The volume compression control compresses the whole document, e.g., with lossless, standard or maximum compression; the document encryption control encrypts the document, allowing an open password and an editing-permission password to be set and supporting a choice of encryption algorithm; and the page number adjustment and page size controls adjust the page number format and the page size of the document, respectively. The "Add" control adds content to the editable PDF document: when triggered, a toolbar appears above the document with controls such as "Annotate", "Handwriting", "Insert text", "Insert picture" and "Eraser". The "Annotate" control supports inserting yellow sticky notes, in which text can be entered with configurable attributes such as color and font, and adding highlight colors, underlines, strikethroughs and the like to selected text. The "Handwriting" control supports custom stroke color, thickness and so on, drawing strokes in real time to simulate handwriting and adding them to the page. The "Eraser" control erases handwritten content or annotations in a designated area. "Insert text" and "Insert picture" insert text and pictures at a designated position, after which automatic layout adjusts the positions of the subsequent content.

In some embodiments, converting the editable PDF document into other formats, such as Word, is also supported.

An embodiment of the present application further provides another file conversion method, including: in response to a file to be converted being uploaded by the user, processing the file into a structured document, the file to be converted being an image file or a portable document format (PDF) file; displaying the structured document; and, in response to a file conversion instruction, converting the displayed structured document or the edited structured document into an editable portable document format (PDF) file.

The file to be converted can be processed into the structured document as described in the foregoing embodiments. Specifically, the elements in the file to be converted, including text, are identified; the attributes of the identified elements are determined from the identification result and the identified elements are logically layered; and the structured document is generated from the logical layering result and the attributes of the identified elements.

After the hierarchical relationship of the file to be converted is obtained during this processing (specifically, through the logical layering step), it can be displayed so that the user can adjust it, and the structured document is then generated from the adjusted hierarchical relationship and the attributes of the identified elements.

The structured document may be displayed through an editor or editing interface, and the user may edit it, including but not limited to modifying text content (adding, deleting or changing text), changing text attributes (font, font size, bold, italic, color, etc.), resizing, replacing or repositioning images, and editing tables.

In response to an editing operation on the structured document entered by the user through the editing interface, the content of the structured document is updated to obtain the edited structured document.

After the structured document is displayed, if a file conversion instruction from the user is detected, or no editing operation is detected within a preset time period, the structured document is converted into an editable PDF document. If an editing operation is detected within the preset time period, the structured document is updated according to it to obtain the edited structured document, and when a file conversion instruction is then detected, the edited structured document is converted into the editable PDF document. If the user edits multiple times, the structured document is updated after each editing operation, and if no further editing operation is detected within the preset time period after the last one, the structured document as updated by the last editing operation is converted into the editable PDF document.
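
The decision logic above (convert on an explicit instruction, or once no edit has arrived within the preset period) can be sketched as a small controller. The `convert` callable, the 30-second default timeout and the polling `tick` method are assumptions; time is passed in explicitly so that the behaviour is easy to follow and test.

```python
class ConversionController:
    """Decides when the structured document is turned into an editable PDF.

    convert: hypothetical callable mapping a structured document to an
    editable PDF. Times are plain numbers in seconds, passed in explicitly.
    """

    def __init__(self, document, convert, idle_timeout=30.0):
        self.document = document
        self.convert = convert
        self.idle_timeout = idle_timeout
        self.last_activity = None
        self.result = None

    def show(self, now):
        # The structured document has just been displayed: start the countdown.
        self.last_activity = now

    def on_edit(self, apply_edit, now):
        # Apply the user's edit and restart the countdown.
        self.document = apply_edit(self.document)
        self.last_activity = now
        self.result = None  # any earlier PDF is now stale

    def on_convert_command(self):
        # An explicit conversion instruction converts immediately.
        self.result = self.convert(self.document)
        return self.result

    def tick(self, now):
        # Poll periodically: convert once no edit has arrived within the timeout.
        if (self.result is None and self.last_activity is not None
                and now - self.last_activity >= self.idle_timeout):
            self.result = self.convert(self.document)
        return self.result
```

With a 10-second timeout, an edit at t = 5 defers conversion: polling at t = 14 returns nothing, and polling at t = 15 converts the edited document.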

For text in the structured document, the user's editing operations include but are not limited to adding, deleting and modifying text and adjusting text attributes, such as font, color, character spacing, line spacing, bold and italic. Editing operations on images include scaling, replacement, repositioning, rotation and alignment adjustment. Editing operations on graphics include scaling, repositioning, rotation, fill, transparency, shadow and alignment, as well as shape editing such as modifying curve shapes and combining or splitting the elements of a graphic. Editing operations on tables include inserting and deleting rows and columns, resizing rows and columns, merging and splitting cells, editing cell contents, adjusting the table style (border color and thickness, background color, row heights and column widths), moving the table, sorting by a given column, and so on.

Corresponding to the file conversion method of the foregoing embodiments, an embodiment of the present application further provides a file conversion device. Fig. 7 is a schematic structural diagram of the file conversion device provided by an embodiment of the present application. The device includes: a file acquisition module configured to obtain a file to be converted, the file being an image file or a portable document format file; an element identification module configured to identify elements in the file to be converted, determine the attributes of the identified elements from the identification result, and logically layer the identified elements, the elements including text; a structured document generation module configured to generate a structured document from the logical layering result and the attributes of the identified elements; and an editable file conversion module configured to convert the structured document or the edited structured document into an editable portable document format file, i.e., an editable PDF document.

In one possible implementation, the elements in the file to be converted include at least two of text, images, tables, and graphics, and the element identification module includes: a multimodal recognition unit, configured to perform multimodal recognition on the file to be converted, identify the elements therein, and determine the attributes of each identified element; a relationship analysis unit, configured to determine the spatial relationships and semantic associations among the identified elements based on their attributes; and a logical layering unit, configured to logically layer the identified elements based on those spatial relationships and semantic associations.
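The relationship analysis and logical layering units are described only functionally. A minimal sketch, assuming axis-aligned bounding boxes with y growing downward, could use horizontal overlap and vertical gap as the spatial relationships and a caption-prefix test as a crude stand-in for semantic association, then group each image, table, or graphic with its caption and order the groups top-to-bottom, left-to-right. The `Box` class, `CAPTION_PREFIXES`, and the `max_gap` value are hypothetical; a real system would likely use a learned model for semantic association.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    kind: str  # "text", "image", "table" or "graphic"
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""


CAPTION_PREFIXES = ("fig", "table")  # hypothetical caption cue, matched on lower-cased text


def h_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal overlap between two boxes (0 if none)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def v_gap(a: Box, b: Box) -> float:
    """Vertical distance between two boxes (0 if they overlap vertically)."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def logical_layers(elements: list, max_gap: float = 20.0) -> list:
    """Group visual elements with nearby captions, then sort groups into reading order."""
    visual = [e for e in elements if e.kind != "text"]
    used: set = set()  # indices of text boxes already attached as captions
    groups = []
    for fig in visual:
        group = [fig]
        for i, e in enumerate(elements):
            if (e.kind == "text" and i not in used
                    and e.text.lower().startswith(CAPTION_PREFIXES)
                    and h_overlap(fig, e) > 0 and v_gap(fig, e) <= max_gap):
                group.append(e)
                used.add(i)
        groups.append(group)
    # Remaining text boxes each form their own group.
    groups += [[e] for i, e in enumerate(elements) if e.kind == "text" and i not in used]
    # Reading order: by the top edge of each group, then by its left edge.
    groups.sort(key=lambda g: (min(b.y0 for b in g), min(b.x0 for b in g)))
    return groups
```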

In one possible implementation, the multimodal recognition unit is specifically configured to: perform text detection on the file to be converted, identify the text elements therein, and determine the attributes of the identified text elements; for regions in which no text has been identified, identify image elements based on the features of the pixels in each region and the spatial relationships among those pixels, and determine the attributes of the identified image elements; identify table elements based on the alignment of the identified text and the visual separation lines between the identified text, and determine the attributes of the identified table elements; and, if the file to be converted still contains an unidentified remaining region, identify graphics based on the contour features and connected regions of the image in the remaining region, and determine the attributes of the identified graphics.
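For the last step, graphics detection in the remaining region, a minimal sketch is connected-component labeling on a binary mask of the residual foreground pixels. It assumes the mask has already been produced (for example by binarizing the page and blanking out the text, image, and table regions found earlier), uses 4-connectivity, and reports each component's bounding box, pixel area, and fill ratio. The fill ratio is only a crude stand-in for the contour features mentioned above; the function name and `min_area` threshold are hypothetical.

```python
from collections import deque


def connected_components(mask, min_area=4):
    """Label 4-connected components of truthy pixels in a 2-D list `mask`.

    Returns one dict per component with an inclusive bbox (x0, y0, x1, y1),
    the pixel area, and the fill ratio (area / bbox area).
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill from the seed pixel (sx, sy).
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            xs, ys = [], []
            while queue:
                x, y = queue.popleft()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            area = len(xs)
            if area < min_area:  # drop specks that are probably noise
                continue
            x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
            box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
            comps.append({"bbox": (x0, y0, x1, y1), "area": area, "fill_ratio": area / box_area})
    return comps
```

In practice a library routine (for example the connected-component or contour functions in an image-processing package) would replace the pure-Python flood fill for speed.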

In one possible implementation, the device further includes a document screening module, configured to determine interference attributes of the file to be converted before the elements in the file are identified, where the interference attributes include at least one of text clarity, table integrity, and image interference degree, and, if the interference attributes indicate that the file to be converted can be typeset, to perform the step of identifying the elements in the file to be converted.

In one possible implementation, the device further includes a typesetability determining module, configured to determine that the file to be converted can be typeset if the text clarity is greater than a first threshold, the table integrity is greater than a second threshold, and the image interference degree is less than a third threshold.
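The threshold rule can be written as a small predicate. This sketch assumes each interference attribute has already been scored on a 0 to 1 scale by some upstream estimator, and treats an attribute that was not measured (for example, table integrity on a page with no table) as not constraining the result, which matches the "at least one of" wording of the screening step. The class names and the default threshold values are hypothetical; the source does not give concrete thresholds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterferenceAttributes:
    text_clarity: Optional[float] = None        # higher means sharper text
    table_integrity: Optional[float] = None     # higher means more complete table lines
    image_interference: Optional[float] = None  # higher means more noise, stains or occlusion


@dataclass(frozen=True)
class Thresholds:
    first: float = 0.6   # hypothetical values; not specified in the source
    second: float = 0.5
    third: float = 0.3


def can_be_typeset(a: InterferenceAttributes, t: Thresholds = Thresholds()) -> bool:
    """Apply the first/second/third threshold rule to whichever attributes were measured."""
    checks = []
    if a.text_clarity is not None:
        checks.append(a.text_clarity > t.first)
    if a.table_integrity is not None:
        checks.append(a.table_integrity > t.second)
    if a.image_interference is not None:
        checks.append(a.image_interference < t.third)
    # At least one attribute must be present, and every measured one must pass.
    return bool(checks) and all(checks)
```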

In one possible implementation, the file to be converted includes a plurality of pages, and the element identification module is specifically configured to identify the elements on each page of the file to be converted, determine the attributes of the elements identified on each page based on that page's identification result, and logically layer the elements identified on each page.

Correspondingly, the structured document generation module is specifically configured to: while generating the page of the structured document corresponding to the current page based on the logical layering result and the attributes of the identified elements of the current page, asynchronously generate the page of the structured document corresponding to a subsequent page based on the logical layering result and the attributes of the identified elements of that subsequent page.
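A minimal sketch of this pipelining with Python's `asyncio`: every page's recognize-and-generate pipeline is scheduled up front with `asyncio.create_task`, and pages are then awaited and emitted in order, so later pages keep progressing while earlier ones are handed to the caller. The functions `recognize_and_layer` and `generate_page` are hypothetical stand-ins that only sleep; real recognition is CPU-bound, so it would need to run in an executor (for example via `loop.run_in_executor` with a process pool) to actually overlap.

```python
import asyncio


async def recognize_and_layer(page_no: int) -> dict:
    """Hypothetical stand-in for per-page recognition and logical layering."""
    await asyncio.sleep(0.01)
    return {"page": page_no, "layers": [f"layer-{page_no}"]}


async def generate_page(result: dict) -> str:
    """Hypothetical stand-in for emitting one page of the structured document."""
    await asyncio.sleep(0)
    return f'<section data-page="{result["page"]}"></section>'


async def build_document(n_pages: int, on_page_ready) -> list:
    async def pipeline(i: int) -> str:
        return await generate_page(await recognize_and_layer(i))

    # Schedule every page at once; subsequent pages run while earlier ones are emitted.
    tasks = [asyncio.create_task(pipeline(i)) for i in range(n_pages)]
    pages = []
    for task in tasks:  # emit strictly in page order
        page = await task
        on_page_ready(page)
        pages.append(page)
    return pages
```

For long documents, an `asyncio.Semaphore` around `pipeline` would bound how many pages are processed at the same time.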

In one possible implementation, the device further includes a to-be-rendered page determining and rendering module, configured to determine the current page viewed by the user based on monitored operation events, and to immediately push the pages of the structured document corresponding to the to-be-rendered pages to the front end for rendering, where the to-be-rendered pages include the current page viewed by the user and a page subsequent to the current page.
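The following sketch shows one way the monitored operation events could drive rendering. It assumes the operation event is a scroll event carrying the scroll offset, that all pages have the same height, and that `push` is a callback delivering a page to the front end; `RenderScheduler`, `pages_to_render`, and the one-page lookahead are hypothetical choices. Pages already pushed are remembered so scrolling back does not resend them.

```python
def pages_to_render(current: int, total: int, lookahead: int = 1) -> list:
    """The current page plus `lookahead` following pages, clipped to the document (0-based)."""
    if not 0 <= current < total:
        raise ValueError("current page out of range")
    return list(range(current, min(current + 1 + lookahead, total)))


class RenderScheduler:
    def __init__(self, total_pages: int, push):
        self.total_pages = total_pages
        self.push = push      # callback that sends one page to the front end
        self.pushed = set()   # pages already sent

    def on_scroll(self, scroll_top: float, page_height: float) -> None:
        # Uniform page height assumed: the page under the top edge is the current page.
        current = min(max(0, int(scroll_top // page_height)), self.total_pages - 1)
        for page in pages_to_render(current, self.total_pages):
            if page not in self.pushed:
                self.pushed.add(page)
                self.push(page)
```

For example, with five pages of height 1000, scroll offsets 0, 1500, and 9999 push pages 0 and 1, then 2, then 4.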

In one possible implementation, the device further includes a document editing module, configured to provide an editing interface for the structured document and, in response to an editing operation on the structured document input by the user through the editing interface, update the content of the structured document to obtain the edited structured document.
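As a hedged sketch of the update step, the structured document can be modeled as a mapping from element ids to element records, and each editing operation from the interface as a small command object. The operation names (`insert`, `set_text`, `set_attr`, `delete`) and record layout are assumptions for illustration. The function returns an edited copy, so both the original and the edited structured document remain available for conversion to an editable PDF.

```python
import copy
from typing import Any


def apply_edit(doc: dict, op: dict) -> dict:
    """Return an edited copy of `doc` (element id -> element record); `doc` is unchanged."""
    edited = copy.deepcopy(doc)
    kind, eid = op["op"], op["id"]
    if kind == "insert":
        if eid in edited:
            raise KeyError(f"duplicate element id {eid!r}")
        edited[eid] = dict(op["element"])
    elif eid not in edited:
        raise KeyError(f"unknown element id {eid!r}")
    elif kind == "set_text":
        edited[eid]["text"] = op["text"]
    elif kind == "set_attr":  # e.g. font, color, bold, italic
        attrs: dict[str, Any] = edited[eid].setdefault("attrs", {})
        attrs[op["name"]] = op["value"]
    elif kind == "delete":
        del edited[eid]
    else:
        raise ValueError(f"unsupported operation {kind!r}")
    return edited
```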

In one possible implementation, the to-be-converted file obtaining module is specifically configured to obtain at least one picture to be converted and embed the at least one picture into a blank portable document format file (a blank PDF file) to obtain a pure image PDF file, which serves as the file to be converted.

The file conversion device provided in this embodiment can execute the file conversion method provided in any of the above embodiments; its implementation principles and technical effects are similar and are not repeated here.

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 8, the electronic device 80 provided in this embodiment includes at least one processor 801 and a memory 802. Optionally, the electronic device 80 further includes a communication component 803. The processor 801, the memory 802, and the communication component 803 are connected via a bus 804.

In a specific implementation, at least one processor 801 executes computer-executable instructions stored in memory 802, causing the at least one processor 801 to perform the methods described above.

For the specific implementation process of the processor 801, refer to the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.

In the above embodiment, it should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the present invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in a processor.

The memory may include a high-speed random access memory (RAM), and may further include a non-volatile memory (NVM), such as at least one magnetic disk memory.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus shown in the drawings of the present application does not imply that there is only one bus or only one type of bus.

The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the method described above.

The application also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method described above.

The above-described readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. A readable storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.

An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Alternatively, the readable storage medium may be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), or they may reside in a device as discrete components.

The division into units is merely a division by logical function, and other divisions are possible in actual implementations. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention essentially, or the part that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disk.

Finally, it should be noted that other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It should be understood that the invention is not limited to the precise construction described above and shown in the drawings, and that its scope is defined by the appended claims.

Claims

1. A method for converting a file, comprising:

obtaining a file to be converted, wherein the file to be converted is an image file or a portable document format file;

identifying elements in the file to be converted, determining attributes of the identified elements based on the identification result, and logically layering the identified elements, wherein the elements comprise text;

generating a structured document based on the logical layering result and the attributes of the identified elements;

and converting the structured document or the edited structured document into an editable portable document format file.

2. The method of claim 1, wherein the portable document format file includes at least some non-editable content.

3. The method of claim 1, wherein the elements in the file to be converted comprise at least two of text, images, tables, and graphics, and wherein identifying the elements in the file to be converted, determining attributes of the identified elements based on the identification result, and logically layering the identified elements comprises:

performing multimodal recognition on the file to be converted to identify the elements therein and determine the attributes of each identified element;

determining the spatial relationships and semantic associations of the identified elements based on the attributes of the identified elements;

and logically layering the identified elements based on the spatial relationships and the semantic associations.

4. The method of claim 3, wherein performing multimodal recognition on the file to be converted to identify the elements therein and determine the attributes of the identified elements comprises:

performing text detection on the file to be converted, identifying text elements in the file to be converted, and determining the attribute of the identified text elements;

for regions in which no text has been identified, identifying image elements based on features of the pixels in the regions and the spatial relationships among the pixels, and determining attributes of the identified image elements;

identifying table elements based on the alignment of the identified text and the visual separation lines between the identified text, and determining attributes of the identified table elements;

and, if the file to be converted still has an unidentified remaining region, identifying graphics based on the contour features and connected regions of the image in the remaining region, and determining attributes of the identified graphics.

5. The method of claim 1, wherein prior to identifying the element in the file to be converted, the method further comprises:

determining interference attributes of the file to be converted, wherein the interference attributes comprise at least one of text clarity, table integrity, and image interference degree;

and, if the interference attributes indicate that the file to be converted can be typeset, performing the step of identifying the elements in the file to be converted.

6. The method of claim 5, wherein the method further comprises:

determining that the file to be converted can be typeset if the text clarity is greater than a first threshold, the table integrity is greater than a second threshold, and the image interference degree is less than a third threshold.

7. The method of claim 1, wherein the structured document is a markup language document and the file to be converted comprises a plurality of pages, and wherein identifying the elements in the file to be converted, determining attributes of the identified elements based on the identification result, and logically layering the identified elements comprises:

identifying the elements on each page of the file to be converted, determining attributes of the elements identified on each page based on the identification result of that page, and logically layering the elements identified on each page;

and generating a structured document based on the logical layering result and the attributes of the identified elements comprises:

while generating the page of the structured document corresponding to the current page based on the logical layering result and the attributes of the identified elements of the current page, asynchronously generating the page of the structured document corresponding to a subsequent page based on the logical layering result and the attributes of the identified elements of the subsequent page.

8. The method of claim 7, wherein the method further comprises:

determining a current page viewed by a user based on the monitored operation event;

and immediately pushing the pages of the structured document corresponding to the pages to be rendered to the front end for rendering, wherein the pages to be rendered comprise the current page viewed by the user and a page subsequent to the current page.

9. The method according to any one of claims 1-8, further comprising:

providing an editing interface of the structured document;

and, in response to an editing operation on the structured document input by the user through the editing interface, updating the content of the structured document to obtain the edited structured document.

10. The method according to any one of claims 1-8, wherein the obtaining the file to be converted includes:

acquiring at least one picture to be converted;

and embedding the at least one picture into a blank portable document format file to obtain the file to be converted.

11. A document conversion apparatus, comprising:

a to-be-converted file obtaining module, configured to obtain a file to be converted, wherein the file to be converted is an image file or a portable document format file;

an element identification module, configured to identify the elements in the file to be converted, determine attributes of the identified elements based on the identification result, and logically layer the identified elements, wherein the elements comprise text;

a structured document generation module, configured to generate a structured document based on the logical layering result and the attributes of the identified elements;

and an editable file conversion module, configured to convert the structured document or the edited structured document into an editable portable document format file.

12. An electronic device, comprising a memory and a processor, wherein:

the memory stores computer-executable instructions; and

the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method according to any one of claims 1-10.

13. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-10.