CN111275048A - PPT reproduction method based on OCR character recognition technology - Google Patents
PPT reproduction method based on OCR character recognition technology
- Publication number: CN111275048A
- Application number: CN202010040969.7A
- Authority: CN (China)
- Prior art keywords: ppt, picture, pptx, file, page
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V10/00: Arrangements for image or video recognition or understanding
        - G06V10/20: Image preprocessing
          - G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
            - G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
        - G06V30/10: Character recognition
          - G06V30/14: Image acquisition
            - G06V30/148: Segmentation of character regions
              - G06V30/153: Segmentation of character regions using recognition of characters or words
- Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
  - Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    - Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
      - Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Discrimination (AREA)
Abstract
The invention provides a PPT reproduction method based on OCR character recognition technology, belonging to the technical field of picture reproduction. The method greatly reduces the time a user spends reproducing a PPT page and improves the quality of the generated pptx file.
Description
Technical Field
The invention relates to picture reproduction technology, and in particular to a PPT reproduction method based on OCR character recognition technology.
Background
OCR (optical character recognition) is the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper and then translates their shapes into computer text using character recognition methods.
PPT (PowerPoint) is presentation software. A user can give a presentation on a projector or a computer, or print the presentation out or make it into film for use in a wider range of settings.
In real life, PPT pages presented by other people are often seen in videos and at conferences. To reproduce such a page, a user currently has to take a screenshot or photograph and then spend time and effort rebuilding the page by hand to match the picture.
Disclosure of Invention
To solve the above technical problem, the invention provides a PPT reproduction method based on OCR character recognition technology and the Python language, which is convenient for users and saves their time.
The technical scheme of the invention is as follows:
A PPT reproduction method based on OCR character recognition technology: a picture is shot directly with a mobile phone camera or other device, or uploaded, and a system built on OCR character recognition technology and the Python language reproduces the PPT page shown in the picture as an approximate pptx file.
The method is divided into two stages. In the first stage, OCR character recognition technology is used to identify the elements on the PPT page in the original picture, such as the title, subtitle, body text, style, font, and pictures. In the second stage, the PPT is reproduced in Python: the contents identified in the first stage are arranged and combined with the python-pptx library to generate a pptx file that approximates the PPT page in the original picture.
Further,
the OCR character recognition model is trained on a large number of pictures containing PPT pages, so that it can extract not only the text content but also the basic information of each element in the picture, for example: character size, font color, background color, type of artistic text (WordArt), position on the PPT page, proportion of the PPT page occupied by an image, chart type, and so on. The input of the model is a picture in JPG or PNG format, and the output is a JSON file.
The Python program is implemented mainly with the python-pptx library. From the position information, specific content, and style of each module in the JSON file, it generates the corresponding PPT page through the corresponding instructions and outputs a pptx page similar to the PPT page contained in the picture that was fed to the OCR character recognition model. The input to the Python program is the JSON file, and the output is the .pptx file.
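As a rough illustration of the hand-off between the two stages, the sketch below parses a JSON description of one slide into Python dictionaries and checks that each module carries a usable position. The patent does not publish its JSON schema, so every field name here (`page`, `modules`, `type`, `text`, `bbox`, `font_size_pt`, `font_color`) and the convention of storing positions as fractions of the page size are assumptions made for this example.

```python
import json

# Hypothetical OCR-stage output for one picture. All field names and the
# fractional [left, top, width, height] "bbox" convention are assumptions.
SAMPLE_JSON = """
{
  "page": {"background_color": "FFFFFF"},
  "modules": [
    {"type": "title", "text": "Quarterly Review",
     "bbox": [0.10, 0.08, 0.80, 0.15],
     "font_size_pt": 40, "font_color": "1F3864"},
    {"type": "picture", "bbox": [0.55, 0.30, 0.35, 0.50]}
  ]
}
"""


def load_modules(raw_json):
    """Parse the JSON text and return the list of module dictionaries."""
    data = json.loads(raw_json)
    modules = data.get("modules", [])
    for module in modules:
        left, top, width, height = module["bbox"]
        # Every module needs a position that lies inside the page.
        if min(left, top) < 0 or width <= 0 or height <= 0:
            raise ValueError("invalid bbox: %r" % (module["bbox"],))
        if left + width > 1 or top + height > 1:
            raise ValueError("bbox outside page: %r" % (module["bbox"],))
    return modules


modules = load_modules(SAMPLE_JSON)
print([m["type"] for m in modules])
```

Following the description, the picture module in this sample records only its position, while the title module also carries text and style fields.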
Still further,
Part one: OCR character recognition includes the following steps:
Step 1, image input: read the picture uploaded by the user.
Step 2, removing redundancy: the program first identifies the PPT page in the picture; the remaining parts are treated as redundant and removed directly. The PPT page is then used as the picture to be processed, and tilt correction is applied to it.
Step 3, removing noise: remove interference factors and abnormal pixels from the picture.
Step 4, layout analysis: analyze each element in the picture, for example titles, text, pictures, artistic text (WordArt), charts, shapes, formulas, etc.
Step 5, cutting characters (pictures): cut each element found by layout analysis into a separate module.
Step 6, character recognition and labeling: feed each module into the trained OCR model, then identify and label the module's information, such as text content, font size, font style, and position coordinates. If the module is a picture, only its position information is recorded; if the module is a chart, a matching chart style must be found in the PPT element library.
Step 7, outputting a JSON file: the module information is recorded in JSON format,
for example:
Part two: reproducing the PPT in Python includes the following steps:
Step 1, parsing the JSON file: read the JSON file in Python, parse its content, and convert it into a dictionary data structure.
Step 2, reproducing the content: using the python-pptx library, generate the corresponding element at the position given in each module's information, according to the data in the dictionary. Repeat until all contents of the single PPT page are complete.
Step 3, combine all the generated single PPT pages, add the background uniformly according to the information in the JSON file, save the result as a pptx file, and output it to the user.
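The reproduction steps above can be sketched with python-pptx roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the dictionary layout (`modules`, `type`, `text`, `bbox`, `font_size_pt`, `image_path`, `background_color`) is a hypothetical schema, positions are assumed to be fractions of the slide size, and `bbox_to_emu` and `build_deck` are illustrative helper names. python-pptx is imported inside `build_deck` so that the unit conversion can be tried without the library installed.

```python
def bbox_to_emu(bbox, slide_width, slide_height):
    """Convert a fractional [left, top, width, height] box to EMU integers."""
    left, top, width, height = bbox
    return (int(round(left * slide_width)), int(round(top * slide_height)),
            int(round(width * slide_width)), int(round(height * slide_height)))


def build_deck(pages, out_path):
    """Create one slide per page dictionary and save the deck to out_path."""
    # Imported lazily; requires the third-party python-pptx package.
    from pptx import Presentation
    from pptx.dml.color import RGBColor
    from pptx.util import Pt

    prs = Presentation()
    blank_layout = prs.slide_layouts[6]  # blank layout in the default template
    for page in pages:
        slide = prs.slides.add_slide(blank_layout)
        for module in page.get("modules", []):
            box = bbox_to_emu(module["bbox"], prs.slide_width, prs.slide_height)
            if module["type"] == "picture":
                # Assumed: the OCR stage saved the cropped image to disk.
                slide.shapes.add_picture(module["image_path"], *box)
            else:
                text_frame = slide.shapes.add_textbox(*box).text_frame
                run = text_frame.paragraphs[0].add_run()
                run.text = module.get("text", "")
                run.font.size = Pt(module.get("font_size_pt", 18))
        # The background is applied after the page content is in place.
        color = page.get("background_color")
        if color:
            fill = slide.background.fill
            fill.solid()
            fill.fore_color.rgb = RGBColor.from_string(color)
    prs.save(out_path)
```

A call such as `build_deck([{"modules": [{"type": "title", "text": "Agenda", "bbox": [0.1, 0.1, 0.8, 0.2]}]}], "reproduced.pptx")` would write a one-slide deck. Working in fractions of the slide size keeps the layout independent of the resolution of the photographed picture.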
Advantageous Effects of the Invention
The PPT page that the user wants to reproduce is generated directly from the picture as a pptx file that closely matches the original. This greatly reduces the user's reproduction time and improves the quality of the generated pptx file.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The input of the invention is a picture in JPG or PNG format. The system also provides a camera function, so a picture can be shot directly and stored in JPG format. The output file format of the system is pptx.
The invention consists of two parts: the first is an OCR character recognition model, and the second is a PPT file reproduction program written in Python.
The OCR character recognition model is trained on a large number of pictures containing PPT pages, so that it can extract not only the text content but also the basic information of each element in the picture, for example: character size, font color, background color, type of artistic text (WordArt), position on the PPT page, proportion of the PPT page occupied by an image, chart type, and so on. The input of the model is a picture in JPG or PNG format, and the output is a JSON file.
The Python program is implemented mainly with the python-pptx library. From the position information, specific content, and style of each module in the JSON file, it generates the corresponding PPT page through the corresponding instructions and outputs a pptx page similar to the PPT page contained in the picture that was fed to the OCR character recognition model. The input to the Python program is the JSON file, and the output is the .pptx file.
To build an OCR character recognition model specific to PPT pages, the system uses OpenCV together with three networks: a classification network (VGG16) detects the text direction, a CTPN network (CNN + RNN) detects text regions, and a CRNN network (CNN + LSTM + CTC) performs text recognition. Each of these networks is first trained on a large number of labeled pictures containing PPT pages, which finally yields the PPT page OCR character recognition model used by the invention.
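The CTC part of the CRNN recognizer can be illustrated with greedy decoding: take the highest-scoring class at each time step, merge consecutive repeats, and drop the blank symbol. The patent does not say which decoding strategy is used, so greedy decoding is an assumption here, as are the function name, the toy alphabet, and the convention that class index 0 is the blank.

```python
def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding; class 0 is the blank, class i maps to alphabet[i - 1]."""
    decoded = []
    previous = None
    for scores in frame_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit a character only when the class changes and is not the blank.
        if best != previous and best != 0:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


# Toy per-frame scores over [blank, "P", "T"]. The blank between the two
# "P" frames is what lets a doubled letter survive the merge step.
frames = [
    [0.1, 0.8, 0.1],    # P
    [0.2, 0.7, 0.1],    # P (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # P
    [0.1, 0.2, 0.7],    # T
    [0.1, 0.1, 0.8],    # T (repeat, merged)
]
print(ctc_greedy_decode(frames, "PT"))  # PPT
```

In a trained CRNN the per-frame scores would come from the LSTM output layer, one row per horizontal slice of the text-line image.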
The OCR character recognition model first looks for a PPT page in the picture uploaded by the user. If the picture contains no PPT page, a JSON file containing an error message is passed to the system's Python program, which prints the message on the display terminal: 'The picture you uploaded does not contain a PPT page'. If the picture does contain a PPT page, the remaining parts are treated as redundant and removed directly, the PPT page is used as the picture to be processed, and tilt correction is applied to it. Interference factors carried in the corrected picture, such as abnormal pixels, are then removed. Each element in the picture is analyzed, for example titles, text, pictures, artistic text (WordArt), charts, shapes, formulas, etc. Each element found by layout analysis is cut into a separate module, and for each module the corresponding information, such as text content, font size, font style, and position coordinates, is identified and labeled. The labeled information is recorded in JSON format, and the JSON file is passed to the system's Python program.
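For the noise-removal step, one common way to suppress isolated abnormal pixels is a 3x3 median filter. The patent does not name a specific filter, so this is only an illustrative choice; the function name and the grayscale list-of-rows image representation are assumptions, and a production system would more likely call an optimized routine such as OpenCV's `cv2.medianBlur`.

```python
from statistics import median


def median_filter_3x3(image):
    """Return a copy of a grayscale image with a 3x3 median filter applied.

    image is a list of rows of integer pixel values. Border pixels are
    left unchanged because they lack a full 3x3 neighbourhood.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            # The median discards a single outlier among the nine values.
            result[y][x] = int(median(window))
    return result


noisy = [
    [10, 10, 10],
    [10, 255, 10],  # one abnormal bright pixel
    [10, 10, 10],
]
cleaned = median_filter_3x3(noisy)
print(cleaned[1][1])
```

Unlike a mean filter, the median leaves sharp text edges largely intact, which matters when the cleaned picture is passed on to text detection.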
The Python program first creates a PPT page with the python-pptx library. It then parses the information of each module in the JSON file (the specific content, position, style, format, font size, and so on of the corresponding element) and converts it into a dictionary data structure. Using the data in the dictionary, it generates the corresponding element at the position given in each module's information, and repeats this until the whole single PPT page is complete. Finally, all the generated single PPT pages are combined, the background is added uniformly according to the information in the JSON file, and the result is saved as a pptx file and output to the user.
If the user uploads only a single picture, the system outputs a pptx file containing a single PPT page. If the user uploads multiple pictures at once, the system outputs a pptx file containing multiple PPT pages.
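One way the Python program might fold several OCR results into a single deck, while honoring the error report for pictures without a PPT page, is sketched below. The `error` key and the one-JSON-string-per-picture interface are assumptions; the description only states that the error message is passed in a JSON file.

```python
import json


def collect_pages(ocr_outputs):
    """Split per-picture OCR results into slide pages and error messages.

    ocr_outputs holds one JSON string per uploaded picture, in upload order.
    """
    pages, errors = [], []
    for index, raw_json in enumerate(ocr_outputs):
        data = json.loads(raw_json)
        if "error" in data:
            # No PPT page was found in this picture: report it, add no slide.
            errors.append((index, data["error"]))
        else:
            pages.append(data)
    return pages, errors


outputs = [
    '{"modules": [{"type": "title", "text": "Agenda"}]}',
    '{"error": "The picture you uploaded does not contain a PPT page"}',
    '{"modules": []}',
]
pages, errors = collect_pages(outputs)
print(len(pages), "page(s);", len(errors), "picture(s) rejected")
```

Each entry in `pages` then becomes one slide, so a single valid picture yields a one-page deck and several valid pictures yield a multi-page deck in upload order.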
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. A PPT reproduction method based on OCR character recognition technology, characterized in that
the PPT page shown in a picture is reproduced as an approximate pptx file by means of OCR character recognition technology and the Python language.
2. The method of claim 1, characterized in that
in a first stage, the elements on the PPT page in the original picture are identified using OCR character recognition technology.
3. The method of claim 2, characterized in that
in a second stage, the PPT is reproduced in Python, and the contents identified in the first stage are arranged and combined with the python-pptx library to generate a pptx file of the PPT page.
4. The method of claim 3, characterized in that
an OCR character recognition model is trained on pictures containing PPT pages, so that it can extract not only the text content but also the basic information of each element in the picture.
5. The method of claim 4, characterized in that
the input of the OCR character recognition model is a picture in JPG or PNG format, and the output is a JSON file.
6. The method of claim 5, characterized in that
the Python program is implemented mainly with the python-pptx library; the corresponding PPT page is generated through corresponding instructions according to the position information, specific content, and style of each module in the JSON file; the input of the Python program is the JSON file, and the output is the pptx file.
7. The method of claim 5, characterized in that
OCR character recognition comprises the following steps:
step 1), image input: reading the picture uploaded by the user;
step 2), removing redundancy: the program first identifies the PPT page in the picture, treats the remaining parts as redundant and removes them directly, uses the PPT page as the picture to be processed, and applies tilt correction to it;
step 3), removing noise: removing interference factors and abnormal pixels from the picture;
step 4), layout analysis: analyzing each element in the picture;
step 5), cutting characters or pictures: cutting each element found by layout analysis into a separate module;
step 6), character recognition and labeling: feeding each module into the trained OCR model, and identifying and labeling the module's information, such as text content, font size, font style, and position coordinates; if the module is a picture, only its position information is recorded; if the module is a chart, a matching chart style must be found in the PPT element library;
step 7), outputting a JSON file: recording the module information in JSON format.
8. The method of claim 7, characterized in that
reproducing the PPT in Python comprises the following steps:
step 1), parsing the JSON file: reading the JSON file in Python, parsing its content, and converting it into a dictionary data structure;
step 2), reproducing the content: using the python-pptx library, generating the corresponding element at the position given in each module's information, according to the data in the dictionary; repeating until all contents of the single PPT page are complete;
step 3), combining all the generated single PPT pages, adding the background uniformly according to the information in the JSON file, saving the result as a pptx file, and outputting it to the user.
Priority Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202010040969.7A | CN111275048B (en) | 2020-01-15 | 2020-01-15 | PPT reproduction method based on OCR character recognition technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111275048A | 2020-06-12 |
CN111275048B | 2023-04-18 |
Family
ID=70998954
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010040969.7A | Active | CN111275048B (en) | 2020-01-15 | 2020-01-15 | PPT reproduction method based on OCR character recognition technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111275048B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492206A (en) * | 2018-10-10 | 2019-03-19 | 深圳市容会科技有限公司 | PPT presentation file method for recording, device, computer equipment and storage medium |
CN109815765A (en) * | 2019-01-21 | 2019-05-28 | 东南大学 | A method and device for extracting business license information containing two-dimensional code |
CN110210413A (en) * | 2019-06-04 | 2019-09-06 | 哈尔滨工业大学 | A kind of multidisciplinary paper content detection based on deep learning and identifying system and method |
Non-Patent Citations (2)
Title |
---|
NEWBY: "Baidu OCR text recognition", 博客园 (Cnblogs) * |
Ruan Yi et al.: "Oscilloscope image data recognition based on Python", 研究与设计 (Research and Design) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111753108A (en) * | 2020-06-28 | 2020-10-09 | 平安科技(深圳)有限公司 | Presentation generation method, device, equipment and medium |
CN111753108B (en) * | 2020-06-28 | 2023-08-25 | 平安科技(深圳)有限公司 | Presentation generation method, device, equipment and medium |
CN112949471A (en) * | 2021-02-27 | 2021-06-11 | 浪潮云信息技术股份公司 | Domestic CPU-based electronic official document identification reproduction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN111275048B (en) | 2023-04-18 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2023-03-20. Applicant after: Shandong Inspur Scientific Research Institute Co.,Ltd. (address: 250000 Building S02, No. 1036, Langchao Road, High-tech Zone, Jinan City, Shandong Province). Applicant before: JINAN INSPUR HIGH-TECH TECHNOLOGY DEVELOPMENT Co.,Ltd. (address: 250100 First Floor of R&D Building, 2877 Kehang Road, Sun Village Town, Jinan High-tech Zone, Shandong Province). |
GR01 | Patent grant | |