CN111275048A

CN111275048A - PPT reproduction method based on OCR character recognition technology

Info

Publication number: CN111275048A
Application number: CN202010040969.7A
Authority: CN
Inventors: 吴振东; 李锐; 金长新
Original assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Current assignee: Shandong Inspur Scientific Research Institute Co Ltd
Priority date: 2020-01-15
Filing date: 2020-01-15
Publication date: 2020-06-12
Anticipated expiration: 2040-01-15
Also published as: CN111275048B

Abstract

The invention provides a PPT reproduction method based on OCR character recognition technology, belonging to the technical field of picture reproduction. The method greatly reduces the time users spend reproducing slides and improves the quality of the generated pptx file.

Description

PPT reproduction method based on OCR character recognition technology

Technical Field

The invention relates to picture reproduction technology, and in particular to a PPT reproduction method based on OCR character recognition technology.

Background

OCR (optical character recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper and translates their shapes into computer text using character recognition methods.

PPT is presentation software: a user can give a presentation on a projector or a computer, or print the slides onto film (transparencies) so that they can be used in a wider range of settings.

In everyday life, people often see PPT pages presented by others in videos they watch or conferences they attend. To reproduce such a page, the viewer typically photographs it and then spends considerable time and effort rebuilding the page by hand from the picture.

Disclosure of Invention

To solve this problem, the invention provides a PPT reproduction method based on OCR character recognition technology and the Python language, which is convenient for users and saves them time.

The technical scheme of the invention is as follows:

A PPT reproduction method based on OCR character recognition technology is characterized in that the user shoots a picture directly with a mobile phone camera or other device, or uploads an existing picture, and a system built on OCR character recognition technology and the Python language reproduces the PPT page displayed in the picture as an approximate pptx file.

The method is divided into two stages. In the first stage, OCR character recognition technology is used to identify the elements on the PPT page in the original picture, such as the title, subtitle, body text, style, font and images. In the second stage, the PPT is reproduced in the Python language: the Python-pptx library arranges and combines the content identified in the first stage to generate a pptx file that approximates the PPT page in the original image.

Further, the OCR character recognition model is trained on a large number of pictures containing PPT pages, so that it can extract not only the text content but also basic information about each element in the picture, for example: character size, font color, background color, type of artistic text, position on the PPT page, proportion of the PPT page occupied by an image, chart type, and so on. The model takes a picture in JPG or PNG format as input and outputs a JSON file.
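To make the model's output concrete, the sketch below models one page of recognition results as Python dataclasses and serializes it to JSON. Every field name (`type`, `bbox`, `font_size`, `font_color`, `chart_type`, `image_size`, `background_color`) is hypothetical, chosen only to mirror the attributes listed above; it is not the patent's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Module:
    """One recognized element of a PPT page (hypothetical schema)."""
    type: str                           # e.g. "title", "text", "picture", "chart"
    bbox: List[int]                     # [x, y, width, height] in page-image pixels
    text: Optional[str] = None          # recognized content; None for pictures
    font_size: Optional[float] = None   # estimated size in points
    font_color: Optional[str] = None    # hex RGB such as "1F4E79"
    chart_type: Optional[str] = None    # only set for charts


@dataclass
class PageRecord:
    """All modules of one rectified PPT page (hypothetical schema)."""
    image_size: List[int]               # [width, height] of the page image in pixels
    background_color: str               # hex RGB
    modules: List[Module] = field(default_factory=list)


def to_json(page: PageRecord) -> str:
    """Serialize a page, including nested modules, to a JSON string."""
    return json.dumps(asdict(page), ensure_ascii=False, indent=2)
```

`asdict` converts the nested `Module` objects recursively, so the JSON mirrors the dataclass layout one-to-one.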

The Python program is implemented mainly with the Python-pptx library. Using the position information, specific content and style of each module in the JSON file, it issues the corresponding instructions to generate a PPT page, and outputs a pptx file similar to the PPT page contained in the picture that was input to the OCR character recognition model. The input to the Python program is the JSON file and the output is the pptx file.
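Positions recognized in image pixels must be rescaled before python-pptx can place a shape, because python-pptx (like the underlying Office Open XML format) measures lengths in English Metric Units, 914,400 EMU per inch. The sketch below assumes a 16:9 widescreen slide (12,192,000 × 6,858,000 EMU) and a hypothetical `[x, y, w, h]` pixel box; it is one possible mapping, not the patent's stated method.

```python
EMU_PER_INCH = 914_400                          # OOXML / python-pptx length unit
WIDESCREEN_SLIDE_EMU = (12_192_000, 6_858_000)  # 13.333 in x 7.5 in (16:9)


def bbox_to_emu(bbox, image_size, slide_size_emu=WIDESCREEN_SLIDE_EMU):
    """Map an [x, y, w, h] pixel box on the rectified page image onto the slide.

    x and y are scaled independently, so if the picture and slide aspect ratios
    differ, shapes stretch with the page instead of drifting out of place.
    """
    x, y, w, h = bbox
    img_w, img_h = image_size
    sx = slide_size_emu[0] / img_w
    sy = slide_size_emu[1] / img_h
    return round(x * sx), round(y * sy), round(w * sx), round(h * sy)
```

The returned tuple can be passed straight to `slide.shapes.add_textbox(*box)`, which accepts plain integer EMU values.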

Still further, the first part, OCR character recognition, includes the following steps:

Step 1, image input: read the picture uploaded by the user;

Step 2, redundancy removal: the program first identifies the PPT page in the picture and removes everything else as redundant; the PPT page is then taken as the picture for subsequent processing and is tilt-corrected;

Step 3, noise removal: remove interference factors and abnormal pixels from the picture;

Step 4, layout analysis: analyze each element in the picture, for example titles, text, pictures, artistic text, diagrams, shapes and formulas;

Step 5, character (picture) cutting: cut each element found by the layout analysis into a single module;

Step 6, character recognition and labeling: input each module's information into the trained OCR model, then recognize and label the module's information, such as text content, font size, font style and position coordinates; if the module is a picture, only its position information is recorded; if the module is a chart, the matching chart style must be found in the PPT element library;

Step 7, JSON output: record the module information in JSON format,

for example:
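Steps 5 and 6 can be sketched in plain Python as below, under stated assumptions: images are lists of pixel rows, modules are ordered top-to-bottom then left-to-right (a reading-order heuristic the text does not specify), and `recognize_text`, `classify_chart` and `chart_library` are hypothetical stand-ins for the trained OCR model and the PPT element library.

```python
def crop(image, bbox):
    """Cut the [x, y, w, h] region out of an image stored as a list of pixel rows."""
    x, y, w, h = bbox
    return [row[x:x + w] for row in image[y:y + h]]


def cut_modules(image, layout):
    """Step 5: turn (element_type, bbox) pairs from layout analysis into modules,
    ordered top-to-bottom, then left-to-right, by each box's top-left corner."""
    ordered = sorted(layout, key=lambda item: (item[1][1], item[1][0]))
    return [{"type": t, "bbox": list(b), "pixels": crop(image, b)} for t, b in ordered]


def label_module(module, recognize_text, classify_chart, chart_library):
    """Step 6: attach recognition results to one module according to its type."""
    record = {"type": module["type"], "bbox": module["bbox"]}
    if module["type"] == "picture":
        return record                        # pictures: position only
    if module["type"] == "chart":
        kind = classify_chart(module["pixels"])   # e.g. "bar" or "pie"
        if kind not in chart_library:
            raise LookupError(f"no matching chart style for {kind!r}")
        record["chart_style"] = chart_library[kind]
        return record
    record.update(recognize_text(module["pixels"]))  # text, font size, style...
    return record
```

Passing the recognizers in as callables keeps the control flow testable without a trained model.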

The second part, reproducing the PPT in the Python language, includes the following steps:

Step 1, JSON parsing: read the JSON file in Python, parse its content, and convert it into a dictionary data structure;

Step 2, content reproduction: using the Python-pptx library and the data in the dictionary, generate each element at the position given in its module information; repeat until all content of the single-page PPT is complete;

Step 3, combination and output: combine all generated single-page PPTs, add the background uniformly according to the information in the JSON file, save the pptx file, and output it to the user.
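Steps 1 and 2 above can be sketched with the standard library alone. The `modules`, `type` and `bbox` keys and the `handlers` mapping are assumptions for illustration; in a full program each handler would issue Python-pptx calls for one element type.

```python
import json

REQUIRED_MODULE_KEYS = {"type", "bbox"}  # assumed minimal schema


def load_page(json_text):
    """Step 1: parse one page's JSON into a dict, checking the assumed keys."""
    page = json.loads(json_text)
    for i, module in enumerate(page.get("modules", [])):
        missing = REQUIRED_MODULE_KEYS - set(module)
        if missing:
            raise ValueError(f"module {i} is missing {sorted(missing)}")
    return page


def reproduce_page(page, handlers):
    """Step 2: call the drawing handler for each module until the page is complete.

    handlers maps a module type (e.g. "title", "text") to a callable that draws
    it; types without a handler are collected and returned instead of raising.
    """
    unhandled = []
    for module in page.get("modules", []):
        draw = handlers.get(module["type"])
        if draw is None:
            unhandled.append(module["type"])
        else:
            draw(module)
    return unhandled
```

Returning the unhandled types, rather than failing, lets the program still produce a partial slide when the model reports an element it cannot draw.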

The invention has the following advantages:

The invention reproduces the PPT page the user wants directly from the picture, generating a pptx file very close to the original. This greatly reduces the user's reproduction time and improves the quality of the generated pptx file, whose appearance closely matches the PPT page in the original picture.

Drawings

FIG. 1 is a schematic workflow diagram of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

The input of the invention is a picture in JPG or PNG format. The system also provides a camera function, so a picture can be shot directly and saved in JPG format. The output file format of the system is pptx.
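One way to enforce the JPG/PNG input rule is to check the file signature instead of trusting the extension: JPEG files begin with the bytes `FF D8 FF`, and PNG files begin with the 8-byte signature `89 50 4E 47 0D 0A 1A 0A`. The function names below are hypothetical; the text does not say how the system validates uploads.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def image_format(header: bytes):
    """Return "jpg" or "png" from the first bytes of a file, else None."""
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(JPEG_MAGIC):
        return "jpg"
    return None


def check_upload(path):
    """Reject anything that is not a JPG or PNG picture before OCR runs."""
    with open(path, "rb") as f:
        fmt = image_format(f.read(8))
    if fmt is None:
        raise ValueError(f"{path}: only JPG or PNG pictures are accepted")
    return fmt
```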

The invention consists of two parts: the first is an OCR character recognition model, and the second is a PPT file reproduction program written in the Python language.

To generate an OCR character recognition model specific to PPT pages, the system uses OpenCV together with three networks: a classification network (VGG16) that detects text direction, a CTPN (CNN + RNN) network that detects text regions, and a CRNN (CNN + LSTM + CTC) network that performs text recognition. Each of these networks is first trained on a large number of pictures containing PPT pages together with labeled information, finally producing the PPT-page OCR character recognition model specific to the invention.
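A CRNN trained with CTC loss emits one score vector per image column (time step), and a decoder turns that sequence into text. The text does not say which decoder is used; the sketch below shows greedy (best-path) decoding, the simplest option, with beam search being the usual alternative. It assumes class 0 is the CTC blank and class `k >= 1` maps to `alphabet[k - 1]`.

```python
def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores holds one list of class scores per time step. Class 0 is
    assumed to be the CTC blank and class k >= 1 maps to alphabet[k - 1].
    """
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    chars, prev = [], 0
    for k in best_path:
        if k != 0 and k != prev:  # a blank between two equal labels keeps both
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

The blank is what lets CTC spell doubled letters: the path `P P blank P T T` decodes to "PPT", whereas `P P P T T` would collapse to "PT".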

The OCR character recognition model first checks the user's picture for a PPT page. If there is none, it outputs a JSON file containing error information to the system's Python program, which displays the error message on the terminal: "The picture you uploaded does not contain a PPT page." If the picture does contain a PPT page, the remaining parts are treated as redundant and removed directly; the PPT page is taken as the picture for subsequent processing and is tilt-corrected. Interference carried in the corrected picture, such as abnormal pixels, is then removed. Layout analysis identifies each element in the picture, for example titles, text, pictures, artistic text, diagrams, shapes and formulas. Each element found by the layout analysis is cut into a single module, and for each module the text content, font size, font style, position coordinates and other information are recognized and labeled. The labeling information is recorded in JSON format, and the JSON file is passed to the system's Python program.
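The tilt correction and denoising described above are not specified in detail. A common approach, sketched here as an assumption, orders the four detected page corners and computes the output size; with OpenCV those points would then go to `cv2.getPerspectiveTransform` and `cv2.warpPerspective`. Abnormal "salt-and-pepper" pixels are often removed with a median filter (OpenCV's `cv2.medianBlur(img, 3)`). Both are shown in plain Python so the logic is visible.

```python
import math
from statistics import median


def order_corners(points):
    """Order four detected page corners as top-left, top-right, bottom-right, bottom-left.

    Heuristic: top-left has the smallest x + y and bottom-right the largest;
    top-right has the smallest y - x and bottom-left the largest.
    """
    tl = min(points, key=lambda p: p[0] + p[1])
    br = max(points, key=lambda p: p[0] + p[1])
    tr = min(points, key=lambda p: p[1] - p[0])
    bl = max(points, key=lambda p: p[1] - p[0])
    return [tl, tr, br, bl]


def rectified_size(corners):
    """Size of the corrected page: the longer edge of each opposite pair."""
    tl, tr, br, bl = corners
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)


def median_filter_3x3(gray):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    gray is a list of rows of 0-255 ints; border pixels are copied unchanged.
    """
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = median(gray[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return out
```

The sum/difference corner heuristic works for moderate tilt; a strongly rotated photo may need sorting by angle around the centroid instead.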

The Python program first creates a PPT page with the Python-pptx library, then parses the specific content, position, style, format, font size and other attributes of each element from the module information in the JSON file, and converts them into a dictionary data structure. Using the data in the dictionary, it generates each element at the position given in its module information, repeating until the whole single-page PPT is complete. Finally, it combines all generated single-page PPTs, adds the background uniformly according to the information in the JSON file, saves the result as a pptx file, and outputs it to the user.
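A minimal sketch of this program using the python-pptx package (imported as `pptx`). Assumptions not taken from the text: each page dict has `image_size`, `background_color` and `modules` keys, module boxes are in pixels of the rectified page, and only modules carrying `text` are drawn (pictures and charts are left out). The `pptx` imports sit inside the functions so that `hex_to_rgb` works without the package installed.

```python
def hex_to_rgb(hex_color):
    """Convert '1F4E79' or '#1F4E79' to an (r, g, b) tuple of ints."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def add_text_module(slide, module, box):
    """Draw one text module as a text box; box = (left, top, width, height) in EMU."""
    from pptx.dml.color import RGBColor  # python-pptx, imported lazily
    from pptx.util import Pt

    frame = slide.shapes.add_textbox(*box).text_frame
    frame.word_wrap = True
    frame.text = module["text"]  # each "\n" starts a new paragraph
    for paragraph in frame.paragraphs:
        for run in paragraph.runs:
            if module.get("font_size"):
                run.font.size = Pt(module["font_size"])
            if module.get("font_color"):
                run.font.color.rgb = RGBColor(*hex_to_rgb(module["font_color"]))


def build_deck(pages, out_path, slide_size=(12192000, 6858000)):
    """Write one pptx with a blank 16:9 slide per page dict, in list order."""
    from pptx import Presentation
    from pptx.dml.color import RGBColor

    prs = Presentation()
    prs.slide_width, prs.slide_height = slide_size
    blank = prs.slide_layouts[6]  # "Blank" layout of the default template
    for page in pages:
        slide = prs.slides.add_slide(blank)
        fill = slide.background.fill
        fill.solid()
        fill.fore_color.rgb = RGBColor(*hex_to_rgb(page.get("background_color", "FFFFFF")))
        img_w, img_h = page["image_size"]
        sx, sy = slide_size[0] / img_w, slide_size[1] / img_h
        for module in page.get("modules", []):
            if not module.get("text"):
                continue  # pictures and charts are not handled in this sketch
            x, y, w, h = module["bbox"]
            box = (round(x * sx), round(y * sy), round(w * sx), round(h * sy))
            add_text_module(slide, module, box)
    prs.save(out_path)
```

Calling `build_deck([page], "reproduced.pptx")` with one page dict yields a single-slide file; passing several dicts yields one slide per dict in the same order.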

If the user uploads only a single picture, the system outputs a pptx file containing a single PPT page. If the user uploads multiple pictures at once, the system outputs one pptx file containing multiple PPT pages.
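The single-picture and multi-picture cases can share one driver that runs the first stage on every picture and then builds one deck. In this sketch `recognize` and `build_deck` are hypothetical injected callables, the `"error"` key marking a picture without a PPT page is an assumption, and skipping such pictures in a batch (instead of aborting) is a design choice the text does not specify.

```python
def reproduce_batch(image_paths, recognize, build_deck, out_path="output.pptx"):
    """Run stage 1 on every uploaded picture, then build ONE pptx from all pages.

    recognize(path) -> page dict (stage 1, the OCR model); build_deck(pages,
    out_path) writes the pptx (stage 2). Pictures whose result carries an
    "error" key (no PPT page found) are skipped and reported.
    """
    pages, skipped = [], []
    for path in image_paths:  # upload order becomes slide order
        page = recognize(path)
        if "error" in page:
            skipped.append(path)
        else:
            pages.append(page)
    if not pages:
        raise ValueError("none of the uploaded pictures contains a PPT page")
    build_deck(pages, out_path)
    return len(pages), skipped
```

One picture produces a one-slide deck and N pictures an N-slide deck, because every recognized page is passed to a single `build_deck` call.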

The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A PPT reproduction method based on OCR character recognition technology, characterized in that

the PPT page displayed in a picture is reproduced as an approximate pptx file by means of OCR character recognition technology and the Python language.

2. The method of claim 1,

in a first stage, elements on the PPT page in the original picture are identified using OCR character recognition technology.

3. The method of claim 2,

in a second stage, the PPT is reproduced in the Python language: the Python-pptx library arranges and combines the content identified in the first stage to generate a pptx file of the PPT page.

4. The method of claim 3,

an OCR character recognition model is generated by training on pictures containing PPT pages, so that it extracts not only the text content but also basic information about each element in the pictures.

5. The method of claim 4,

the input of the OCR character recognition model is a picture in JPG or PNG format, and the output is a JSON file.

6. The method of claim 5,

the Python program is implemented mainly with the Python-pptx library and generates a corresponding PPT page through corresponding instructions according to the position information, specific content and style of each module in the JSON file; the input of the Python program is the JSON file and the output is the pptx file.

7. The method of claim 5,

OCR character recognition comprises the following steps:

step 1), image input: reading a picture uploaded by a user;

step 2), removing redundancy: a program first identifies the PPT page in the picture, treats the remaining parts as redundant and removes them directly, takes the PPT page as the picture for subsequent processing, and performs tilt correction on it;

step 3), removing noise: removing interference factors and abnormal pixel points in the picture;

step 4), layout analysis: analyzing each element in the picture;

step 5), cutting characters or pictures: cutting each element analyzed by the layout into single modules;

step 6), character recognition and labeling: inputting the information of each module into a trained OCR model, and recognizing and labeling the module's information, such as text content, font size, font style and position coordinates; if the module is a picture, only its position information is recorded; if the module is a chart, the matching chart style is found in the PPT element library;

step 7), outputting a JSON file: recording the module information in JSON format.

8. The method of claim 7,

the method for reproducing the PPT by using the Python language comprises the following steps:

step 1), parsing a JSON file: reading the JSON file in the Python language, parsing its content, and converting it into a dictionary data structure;

step 2), reproducing the content: using the Python-pptx library, generating each element at the position given in its module information according to the data in the dictionary data structure, and repeating until all content of the single-page PPT is complete;

step 3), combining all generated single-page PPTs, adding the background uniformly according to the information in the JSON file, saving the pptx file, and outputting it to the user.