RU2721188C2

RU2721188C2 - Improved contrast and noise reduction on images obtained from cameras

Info

Publication number: RU2721188C2
Application number: RU2017143913A
Authority: RU
Inventors: Vasily Vasilyevich Loginov (Василий Васильевич Логинов); Ivan Germanovich Zagaynov (Иван Германович Загайнов)
Original assignee: ABBYY Production LLC (Общество с ограниченной ответственностью "Аби Продакшн")
Priority date: 2017-12-14
Filing date: 2017-12-14
Publication date: 2020-05-18
Also published as: US20190188835A1; RU2017143913A3; US20200175658A1; US10552949B2; US11107202B2; RU2017143913A

Abstract

FIELD: computer equipment.

SUBSTANCE: the subject matter of this description can be implemented as a method that includes detecting one or more blocks in an electronic image containing text characters; identifying, among those blocks, one or more text blocks that contain text characters; determining an average text contrast value for each text block; determining the type of each pixel in each text block based on the average text contrast value; performing, based on the determined type, locally adaptive filtering in a first neighborhood of pixels around each pixel of each text block to determine that pixel's brightness; and storing, in at least one data storage device, an electronic image containing the determined brightness of each pixel in each text block.
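The abstract does not say how the average text contrast or the pixel types are computed. The following minimal Python sketch shows one plausible reading, assuming a grayscale block of dark text on a lighter background stored as a list of rows: the background level is taken as the block median, the average text contrast is the mean darkness of pixels noticeably darker than the background, and each pixel is labeled relative to that average. The function names, the `min_diff`, `low` and `high` parameters, and the three labels are all hypothetical.

```python
from statistics import median


def average_text_contrast(block, min_diff=10):
    """Return (background level, mean contrast of pixels darker than the background).

    Hypothetical estimator: background = block median; "ink" pixels are those
    more than `min_diff` gray levels darker than the background.
    """
    pixels = [p for row in block for p in row]
    background = median(pixels)
    diffs = [background - p for p in pixels if background - p > min_diff]
    contrast = sum(diffs) / len(diffs) if diffs else 0.0
    return background, contrast


def classify_pixels(block, low=0.3, high=1.5):
    """Label each pixel 'background', 'text' or 'impulse' relative to the average text contrast."""
    background, contrast = average_text_contrast(block)
    labels = []
    for row in block:
        row_labels = []
        for p in row:
            d = background - p  # positive = darker than the background
            if contrast == 0 or abs(d) < low * contrast:
                row_labels.append("background")
            elif d < 0 or d > high * contrast:
                # Brighter than the background, or far darker than typical text:
                # treated as a candidate noise impulse.
                row_labels.append("impulse")
            else:
                row_labels.append("text")
        labels.append(row_labels)
    return labels
```

For the block `[[200, 200, 200], [200, 100, 200], [200, 200, 255]]`, the median background is 200 and the only ink pixel is 100 levels darker, so the average text contrast is 100; the bright outlier (255) is labeled an impulse because text is assumed to be darker than the background.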

EFFECT: improved contrast and reduced noise in the image.

20 cl, 4 dwg

Description

FIELD OF TECHNOLOGY

[0001] The present description relates to improving contrast and reducing noise in an image of a document captured by cameras built into a device, such as a mobile device.

BACKGROUND

[0002] Printed natural-language documents remain a widespread means of communication between people within organizations and of disseminating information to its consumers. With the advent of ubiquitous, powerful computing resources, including the personal computing resources of smartphones, tablets, laptops, and personal computers, and with the spread of the even more powerful computing resources of cloud computing services, data centers, and corporate servers of organizations and enterprises, natural-language information is increasingly encoded and exchanged in the form of electronic documents.

[0003] Unlike printed documents, which are essentially images, electronic documents contain sequences of numerical codes for natural-language symbols and characters. Because electronic documents have advantages over printed documents in cost, transmission and distribution, ease of editing and modification, and storage reliability, an entire industry has developed over the past 50 years to support methods and systems for converting printed documents into electronic ones. Computational optical character recognition methods and systems, together with electronic scanners, provide reliable and cost-effective means of capturing images of printed documents and computationally processing the resulting digital images of text-containing documents to produce corresponding electronic documents.

[0004] With the advent of camera-equipped smartphones and other processor-controlled mobile imaging devices, digital images of text-containing documents can be captured with a wide range of widespread portable devices, including smartphones, low-cost digital cameras, low-cost video surveillance cameras, and imaging devices built into mobile computing devices such as tablets and laptops. Digital images of text-containing documents captured with such portable devices can be processed by computational optical character recognition systems, including OCR applications (also referred to as OCR processing) installed on smartphones or running on servers, to produce the corresponding electronic documents.

[0005] When OCR processing is implemented locally, it runs on the portable device, which processes the digital image to produce a recognized text document. Alternatively, the digital image captured by the user's electronic device is transmitted (either compressed or at full size) over a communication network to the server of an OCR system, which performs OCR on the server side. The server (i) receives the digital image, (ii) decompresses it to obtain an uncompressed digital image (when the image was transmitted in compressed form between the portable device and the server), and (iii) performs server-based OCR to produce a recognized text document containing the text extracted from the digital image. The server can then send the recognized text document back to the user's electronic device over the communication network (in its original or compressed form).
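The client/server variant can be sketched as follows. This is a minimal Python illustration, assuming zlib as a stand-in for whatever image codec is actually used (in practice often JPEG) and a hypothetical `recognize_text` placeholder in place of a real OCR engine; the boolean flag models the fact that the image may be sent either compressed or at full size.

```python
import zlib


def recognize_text(image_bytes: bytes) -> str:
    """Hypothetical placeholder for the server-side OCR engine."""
    return f"<text recognized from {len(image_bytes)} bytes>"


def client_send(image_bytes: bytes, compress: bool = True) -> tuple:
    """Prepare what the device transmits: a flag plus a compressed or full-size payload."""
    payload = zlib.compress(image_bytes) if compress else image_bytes
    return compress, payload


def server_handle(is_compressed: bool, payload: bytes) -> str:
    # (i) receive the payload, (ii) decompress it if needed, (iii) run OCR
    image_bytes = zlib.decompress(payload) if is_compressed else payload
    return recognize_text(image_bytes)


if __name__ == "__main__":
    fake_image = bytes(1000)  # stand-in for real image data
    print(server_handle(*client_send(fake_image)))
```

Returning the recognized text to the device, optionally compressed, would mirror `client_send` in the opposite direction.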

DESCRIPTION OF DRAWINGS

[0006] FIG. 1 is a diagram of an example system for improving contrast and/or reducing noise in an image of a document captured by a camera built into a device, such as a mobile device.

[0007] FIG. 2 is a flowchart of an example process for improving contrast and/or reducing noise in an image of a document captured by a camera built into a device, such as a mobile device.

[0008] FIG. 3 is a diagram of an example system for merging pixel brightness values in overlapping parts of text blocks.

[0009] FIG. 4 is a diagram of an example computing system.

DESCRIPTION OF PREFERRED EMBODIMENTS

[0010] This document describes systems and techniques for improving contrast and/or reducing noise in an image of a document captured by cameras built into a device, such as a mobile device. To reduce optical character recognition (OCR) errors, the systems and techniques described here produce a high-quality document image with minimal noise. They suppress noise and locally reduce the contrast (de-contrast) of problematic image pixels.

[0011] Noisy images captured in low light often contain local areas of defocus or motion blur. Standard edge-sharpening techniques, such as unsharp masking, are sometimes applied before the image is passed to an OCR system. However, their benefit for OCR is limited because they also amplify noise, and suppressing the amplified noise creates problems of its own. Locally executed OCR, for example on a mobile device, typically produces lower-quality results than server-based OCR, because the limited computing resources of a portable (mobile) device, compared with a server dedicated to OCR, force local OCR systems to be less sophisticated. Whether OCR is performed locally or on a server, it includes some preprocessing of the digital image to reduce artifacts (for example, noise and optical blur). As part of OCR, the portable device or server then performs binarization and computationally intensive recognition procedures.
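To illustrate why unsharp masking alone is of limited use, the sketch below applies a textbook one-dimensional unsharp mask, `x + amount * (x - blur(x))` with a 3-tap box blur, to a flat background with synthetic Gaussian noise. The `amount` value and the noise level are arbitrary choices for the demonstration, not parameters from this description. For independent noise and `amount = 1.5`, the output noise variance is about 4.5 times the input variance, so the standard deviation roughly doubles.

```python
import random
from statistics import pstdev


def box_blur(signal, radius=1):
    """Moving average over 2*radius+1 samples (window truncated at the ends)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius): min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def unsharp_mask(signal, amount=1.0, radius=1):
    """Classic unsharp masking: add back `amount` times the high-pass residual."""
    blurred = box_blur(signal, radius)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]


rng = random.Random(0)  # fixed seed for reproducibility
flat = [128 + rng.gauss(0, 5) for _ in range(1000)]  # flat background plus noise
sharpened = unsharp_mask(flat, amount=1.5)

print(f"noise std before: {pstdev(flat):.2f}, after: {pstdev(sharpened):.2f}")
```

The same filter does steepen a step edge (it overshoots on both sides), which is its intended effect; the problem is that it cannot distinguish an edge from a noise sample.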

[0012] During binarization and OCR, the document image is divided into blocks within which the noise and signal characteristics are approximately constant. These characteristics help identify high-contrast pixels inside text blocks, and the local contrast within each block is reduced to a predetermined level. Limiting the degree of de-contrasting (that is, of contrast reduction) applied to local noise impulses can prevent distortion of text structures: some text structures resemble local noise impulses, so the limit keeps them from being suppressed along with the noise. The systems and techniques can also first attenuate the detected impulses and then perform selective smoothing using the updated noise dispersion, instead of switching between impulse filtering and smoothing at every pixel, which makes the averaging filter more effective. Smoothing can also be performed differently for background pixels and for pixels of connected components (for example, with different filtering parameters for each pixel type). Finally, these systems and techniques can raise the contrast of low-contrast connected-component pixels to a predetermined level.
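The order described above (first weaken high-contrast pixels with a bounded contrast reduction, then apply selective smoothing whose strength depends on the pixel type) can be sketched as follows. This is an illustrative Python sketch, not the method of this description: the `target` contrast, the per-pixel cap `max_step`, the `radius_by_type` table, and the rule of averaging only over neighbors of the same type are all assumptions.

```python
def decontrast(block, background, target, max_step):
    """Pull pixels whose contrast exceeds `target` back toward it, by at most `max_step` levels."""
    out = []
    for row in block:
        new_row = []
        for p in row:
            d = p - background
            excess = abs(d) - target
            if excess > 0:
                step = min(excess, max_step)  # the cap protects text strokes that look like impulses
                p = p - step if d > 0 else p + step
            new_row.append(p)
        out.append(new_row)
    return out


def selective_smooth(block, types, radius_by_type):
    """Mean filter whose neighborhood radius depends on the pixel type (radius 0 = leave as is)."""
    h, w = len(block), len(block[0])
    out = [row[:] for row in block]
    for y in range(h):
        for x in range(w):
            r = radius_by_type[types[y][x]]
            if r == 0:
                continue
            # Average only over neighbors of the same type, so background
            # smoothing does not bleed into strokes (and vice versa).
            vals = [block[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))
                    if types[j][i] == types[y][x]]
            out[y][x] = sum(vals) / len(vals)
    return out
```

With `background=200`, `target=100` and `max_step=30`, a pixel at level 40 (160 levels darker than the background) is lifted only to 70, so a thin but genuine stroke is weakened rather than erased; the smoothing pass can then use a larger radius for background pixels than for connected-component pixels.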

[0013] The systems and techniques described herein may provide one or more of the following advantages. For example, when the noise in a camera image cannot be described by known noise models (such as Gaussian or white noise) and cannot be assumed independent of the signal, that is, of the true image content, the system can remove noise and/or increase image contrast while taking the characteristics of the useful image signal into account. As another example, the described systems and techniques can improve contrast (for example, reduce the contrast between noise impulses and background pixels and/or increase the contrast between connected-component pixels and background pixels) and/or remove noise in cases where standard local and non-local selective edge-preserving averaging filters would leave many high-contrast noisy pixels. Conventional impulse suppression with aggressive filtering can degrade the useful signal. A noise dispersion of 25-30% or more of the text contrast in the image can lead to OCR inaccuracies. The described systems and techniques therefore increase OCR accuracy by improving contrast and/or removing noise that could otherwise lead to incorrect OCR results.
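The 25-30% figure can be turned into a simple per-block check. In this minimal sketch the noise level is estimated as the population standard deviation of background samples, on the assumption that "dispersion" here means a spread measured in the same brightness units as the contrast; the function names and the choice of 0.25 (the lower end of the quoted range) as the default threshold are assumptions.

```python
from statistics import pstdev


def noise_to_contrast(background_samples, text_contrast):
    """Ratio of the noise spread (std. dev. of background pixels) to the text contrast."""
    if text_contrast <= 0:
        return float("inf")  # no measurable text contrast
    return pstdev(background_samples) / text_contrast


def ocr_at_risk(background_samples, text_contrast, threshold=0.25):
    """True when noise reaches roughly a quarter of the text contrast or more."""
    return noise_to_contrast(background_samples, text_contrast) >= threshold
```

For example, background pixels alternating between 90 and 110 have a standard deviation of 10, which is 25% of a text contrast of 40 but only 10% of a text contrast of 100.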

[0014] Details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description, the drawings, and the claims.

[0015] FIG. 1 is a diagram illustrating an example system 100 for improving contrast or reducing noise in an image of a document captured by a camera built into a device, such as a mobile device. System 100 includes a user electronic device 102, which may be, for example, a personal computer (such as a desktop, laptop, or netbook) or a wireless communication device (such as a smartphone, cell phone, or tablet). In the example of FIG. 1, the user electronic device 102 is implemented as a wireless communication device.

[0016] The user electronic device 102 includes, or communicates with, at least one image acquisition interface 103a-b. The image acquisition interface 103a-b includes hardware (and, if needed, corresponding software) for acquiring an electronic image 120. The electronic image 120 can be captured from a physical document 140 containing text 142 (and possibly non-text elements, such as a picture 144). The physical document 140, or another representation of text captured in the electronic image 120, may contain one or more pages, and some or all of the pages may contain different text (possibly including overlapping text), different drawings, different layouts, different fonts or font sizes, and so on.

[0017] In the example of FIG. 1, the image acquisition interface 103a-b is implemented as a camera. However, it may be implemented as several cameras: one or more on the front surface of the user electronic device 102 (the first image acquisition interface 103a) and/or one or more on its back surface (the second image acquisition interface 103b). In the description below, “camera” or “image acquisition interface 103a-b” may refer to either or both of the image acquisition interfaces 103a-b. Alternatively or additionally, the image acquisition interface 103a-b may be implemented as a scanner or another imaging device for acquiring an electronic image of a physical document.

[0018] The user electronic device 102 may include a communication interface for communicating over a network 110 with a server system 112. The network 110 may include one or more network devices that make up the Internet. The server system 112 may include a communication interface for communicating with the user electronic device 102 over the network 110.

[0019] In some implementations, OCR processing may be performed by the user electronic device 102, by the server system 112, or by both. When OCR processing is performed by the user electronic device 102, the server system 112 may be absent. The user electronic device 102 may locally execute an OCR module 118 to perform OCR on the electronic image 120 of the physical document 140. The OCR module 118 may be implemented, for example, as a mobile OCR software development kit (SDK). Alternatively or additionally, the server system 112 may receive, over the network 110, the image captured by the image acquisition interface 103a-b of the user electronic device 102, and then run an OCR module 113 to perform OCR on the document image. Because both modules perform the same operation, the OCR module 118 of the user electronic device 102 and the OCR module 113 of the server system 112 are referred to collectively as the OCR module 113, 118.

[0020] The user electronic device 102 and/or the server system 112 may be configured to execute instructions of a local filter module 114, which may be implemented in software, hardware, firmware, or a combination of these. The local filter module 114 is configured to improve contrast and/or reduce noise in the electronic image 120 of the physical document 140, captured by the image acquisition interface 103a-b of the user electronic device 102, so as to increase the accuracy of OCR performed by the OCR module 113, 118.

[0021] FIG. 2 is a flowchart of an example process 200 for improving contrast and/or reducing noise in an image of a document captured by cameras built into a device, such as a mobile device. Process 200 may be performed, for example, by a system such as system 100, which is used as the example in the following description for convenience. However, another system or combination of systems may be used to perform process 200.

[0022] In step 201, an image containing text characters (for example, an image of a physical document) is processed to obtain a grayscale image. For example, the local filter module 114 may process the electronic image 120 to produce a grayscale version of it. The local filter module 114 may receive the electronic image 120 that is intended for processing by the OCR module 113, 118. In some implementations, the image acquisition interface 103a-b may create the electronic image 120. Alternatively or additionally, the user electronic device 102 and/or the server system 112 may have previously stored the electronic image 120 and may then retrieve it from the data storage device 106. Alternatively or additionally, the user electronic device 102 and/or the server system 112 may receive the electronic image 120 over the network 110. In some implementations, the local filter module 114 may receive the grayscale image from an image analyzer or from the OCR module 113, 118 before OCR is performed. The electronic image 120 may be an image of the physical document 140 or another digital image containing text. One example of creating a grayscale image is described in co-owned U.S. Patent Application No. 15/165,512, “METHOD AND DEVICE FOR DETERMINING THE SUITABILITY OF A DOCUMENT FOR OPTICAL CHARACTER RECOGNITION (OCR),” filed May 26, 2016. In some implementations, creating the grayscale image is optional and may be skipped, in which case the local filter module 114 may use the electronic image 120 instead of a grayscale image.
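The grayscale conversion itself is described in the referenced application and is not reproduced here. As a stand-in, a common way to obtain a grayscale image from an RGB capture is the ITU-R BT.601 luma weighting (0.299 R + 0.587 G + 0.114 B); the sketch below assumes 8-bit RGB pixels stored as rows of `(R, G, B)` tuples, and the function name is hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of 8-bit (R, G, B) tuples to gray levels using BT.601 luma weights."""
    return [
        # min() guards against float rounding nudging white slightly above 255
        [min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in rgb_image
    ]
```

Green dominates the weighting because the eye is most sensitive to it, so a pure green pixel maps to a much lighter gray (150) than a pure red one (76).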

[0023] In step 202, the grayscale image and/or the original image is divided into blocks, which may be overlapping and/or non-overlapping. For example, the local filter module 114 may divide the grayscale image and/or the electronic image 120 into overlapping and/or non-overlapping blocks. The local filter module 114 can choose the block size so that the noise and signal characteristics within a block are approximately constant (for example, background brightness, noise, text contrast, and/or blur vary only slightly within the block).
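Step 202 can be sketched as computing block origins along each axis and slicing the image. In this minimal Python sketch (hypothetical function names), an `overlap` of 0 gives non-overlapping blocks, with smaller blocks at the right and bottom edges, while a positive `overlap` makes neighboring blocks share that many pixels. The image is assumed to be a list of rows of gray levels.

```python
def block_origins(length, n, overlap=0):
    """Start offsets of size-n blocks along one axis; neighbors share `overlap` pixels."""
    if not 0 <= overlap < n:
        raise ValueError("overlap must satisfy 0 <= overlap < n")
    step = n - overlap
    starts = [0]
    while starts[-1] + n < length:  # keep adding blocks until the axis is covered
        starts.append(starts[-1] + step)
    return starts


def split_into_blocks(image, n, overlap=0):
    """Return ((row, col), tile) pairs covering a 2-D image given as a list of rows."""
    height, width = len(image), len(image[0])
    blocks = []
    for y in block_origins(height, n, overlap):
        for x in block_origins(width, n, overlap):
            tile = [row[x:x + n] for row in image[y:y + n]]  # edge tiles may be smaller
            blocks.append(((y, x), tile))
    return blocks
```

Letting edge tiles shrink, rather than shifting the last block back to keep it full-size, guarantees that `overlap=0` really produces disjoint blocks.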

[0024] In some embodiments, the local filter module 114 uses a block size large enough not to overload the computing resources.

For example, the local filter module 114 may use a block size N of about 10% of the largest dimension of the electronic image 120. In some embodiments, the local filter module 114 may first estimate the font size and then choose N so that the block height spans approximately three to six lines of text of the physical document 140.

[0025] In some embodiments, the local filter module 114 may use non-overlapping blocks if computing resources are limited. Alternatively, in some embodiments, the local filter module 114 may use overlapping blocks when the processed image is returned to the user (for example, after noise reduction and de-contrasting). Non-overlapping blocks can produce results that differ noticeably in brightness, contrast, and other parameters at the boundaries between blocks. In some embodiments, the local filter module 114 may avoid such discontinuities by selecting blocks that at least partially overlap their neighbors (for example, by ten to fifteen percent).
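The following minimal Python sketch shows one way to lay out the blocks of steps 202-203. It assumes square blocks of side N equal to about 10% of the largest image dimension, with consecutive blocks overlapping by a configurable fraction (for example 0.10-0.15). The function names are illustrative and are not part of the described system.

```python
def block_size(height: int, width: int, fraction: float = 0.10) -> int:
    """Block side N: about `fraction` of the largest image dimension."""
    return max(1, round(fraction * max(height, width)))


def block_origins(length: int, n: int, overlap: float) -> list[int]:
    """Start offsets along one axis; neighbouring blocks share about overlap*n pixels."""
    if n >= length:
        return [0]
    step = max(1, round(n * (1.0 - overlap)))
    starts = list(range(0, length - n + 1, step))
    if starts[-1] != length - n:
        starts.append(length - n)  # make the last block reach the image border
    return starts


def split_into_blocks(height: int, width: int, overlap: float = 0.0,
                      fraction: float = 0.10) -> list[tuple[int, int, int, int]]:
    """Return (top, left, block_height, block_width) rectangles covering the image."""
    n = block_size(height, width, fraction)
    return [(y, x, min(n, height), min(n, width))
            for y in block_origins(height, n, overlap)
            for x in block_origins(width, n, overlap)]
```

With overlap=0.0 the blocks tile the image. An overlap of 10-15% costs extra computation but reduces visible seams at block boundaries.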

[0026] In step 203, the average contrast value of each block is determined. For example, the local filter module 114 may determine the average contrast value C_i (where i identifies the corresponding block) of the gray image and/or the electronic image 120.

[0027] In some embodiments, the local filter module 114 performs a histogram-based analysis to determine the contrast value C_i for a given block defined in step 202. In particular, in some embodiments, the local filter module 114 builds a brightness histogram. From this histogram, the local filter module 114 determines minimum and maximum brightness values such that 0.1% of all pixels in the block are darker than the minimum and 0.1% of all pixels in the block are brighter than the maximum. In some embodiments, the brightness of each pixel in the gray image may be an integer in the range from 0 to 255. The local filter module 114 may determine the average contrast value C_i as the difference between the maximum and minimum values.
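The histogram analysis in [0027] can be sketched as follows, assuming 8-bit brightness values in a flat list. The text does not say how the 0.1% cut-off is rounded for small blocks. This sketch places each bound at the first level where more than 0.1% of the pixels lie at or beyond it.

```python
def block_contrast(pixels: list[int], tail: float = 0.001) -> tuple[int, int, int]:
    """Return (C_i, I_min, I_max) for 8-bit pixels, clipping `tail` at each end."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    cut = tail * len(pixels)  # number of pixels allowed beyond each bound

    i_min, acc = 0, 0
    for level in range(256):  # first level where the dark tail exceeds `cut`
        acc += hist[level]
        if acc > cut:
            i_min = level
            break

    i_max, acc = 255, 0
    for level in range(255, -1, -1):  # the same search from the bright end
        acc += hist[level]
        if acc > cut:
            i_max = level
            break

    return i_max - i_min, i_min, i_max
```

Because of the clipping, one very dark and one very bright outlier in a 1000-pixel block do not inflate C_i.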

[0028] In step 204, a binarized version of each block is created. For example, the local filter module 114 may create a binarized version of each block i of the gray image and/or the electronic image 120. The local filter module 114 may perform the binarization using a brightness-based thresholding algorithm. More specifically, the binarization may use a threshold th equal to half the sum of the maximum brightness I_max and the minimum brightness I_min of the given block.

[0029] $th = \frac{I_{max} + I_{min}}{2}$

[0030] Alternatively, the local filter module 114 may compute the binarization threshold th using Otsu's method.
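Both threshold options can be sketched in Python under the same 8-bit assumption. The Otsu function tries every cut point and keeps the one with the largest between-class variance. It returns th such that pixels with brightness below th form the dark class, which matches the convention used in step 206.

```python
def midpoint_threshold(i_min: float, i_max: float) -> float:
    """th as half the sum of the block's minimum and maximum brightness."""
    return (i_max + i_min) / 2


def otsu_threshold(pixels: list[int]) -> int:
    """Otsu's method on 8-bit pixels; pixels with value < th form the dark class."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]           # class 0: levels <= t
        sum0 += t * hist[t]
        w1 = total - w0         # class 1: levels > t
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t + 1
```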

[0031] The local filter module 114 then determines the number K of contour pixels in the binarized image of the given block i. For example, a black pixel may be considered a contour pixel if two conditions hold. First, it has an adjacent white pixel in the vertical or horizontal direction. Second, the number of probable contour pixels in its surrounding neighborhood (for example, 3×3) is less than a limit (for example, 7 pixels).
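Below is a literal reading of the contour-pixel rule in [0031], sketched in Python for a binary block where 1 marks black (text) and 0 marks white. Two details are assumptions. The 3×3 count is taken as the number of candidate contour pixels in the window, including the pixel itself. Neighbours outside the block are ignored.

```python
def count_contour_pixels(binary: list[list[int]], max_candidates: int = 7) -> int:
    """Count K contour pixels in a binary block (1 = black, 0 = white)."""
    h, w = len(binary), len(binary[0])

    def is_candidate(y: int, x: int) -> bool:
        # A black pixel with at least one white 4-neighbour inside the block.
        if binary[y][x] != 1:
            return False
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 0:
                return True
        return False

    candidate = [[is_candidate(y, x) for x in range(w)] for y in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not candidate[y][x]:
                continue
            local = sum(candidate[yy][xx]
                        for yy in range(max(0, y - 1), min(h, y + 2))
                        for xx in range(max(0, x - 1), min(w, x + 2)))
            if local < max_candidates:  # 3x3 window, centre included
                count += 1
    return count
```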

[0032] In step 205, blocks containing text are identified. The remainder of process 200 may be performed on the subset of blocks that contain text components. For example, the local filter module 114 may identify the blocks containing text. In some embodiments, the local filter module 114 identifies these blocks based on at least one of the following conditions: (i) the contour pixels exceed a specified fraction of all pixels in the block (for example, 3-5 percent); (ii) the fraction of black pixels (or of white pixels, if the electronic image 120 is an inverted image of text) in the binarized image created in step 204 is below a predetermined threshold P (for example, 20 to 30 percent); and (iii) the contrast value C_i is not lower than a predetermined value. However, the local filter module 114 may use other methods to identify text blocks.
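The three criteria of step 205 can be combined as below. This sketch requires all three to hold, which is one reading of the text. The default ratios fall inside the ranges quoted above. The value min_contrast=30 is hypothetical, because the document does not give one.

```python
def is_text_block(contour_pixels: int, black_pixels: int, total_pixels: int,
                  contrast: float, min_contour_ratio: float = 0.04,
                  max_black_ratio: float = 0.25, min_contrast: float = 30.0) -> bool:
    """Heuristic text-block test from step 205 (all three criteria must pass)."""
    contour_ratio = contour_pixels / total_pixels
    black_ratio = black_pixels / total_pixels  # count white pixels for inverted text
    return (contour_ratio > min_contour_ratio
            and black_ratio < max_black_ratio
            and contrast >= min_contrast)
```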

[0033] In step 206, the average text contrast C_t is determined in each block that contains text. For example, the local filter module 114 may use the brightness histogram of the gray image and/or the electronic image 120, together with the threshold th calculated in step 204. More specifically, the local filter module 114 may calculate the mean value M₁ of the pixels with brightness below th. It may then calculate the mean value M₂ of the pixels with brightness equal to or above th. The local filter module 114 may then calculate the average text contrast C_t of the text block as follows:

[0034] C_t = M₂ - M₁
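Here is equation [0034] in code, assuming the block's pixels are in a flat list and th comes from step 204. The function also returns M₁ and M₂, because the guided sigma filter of [0054]-[0058] reuses them.

```python
def text_contrast(pixels: list[float], th: float) -> tuple[float, float, float]:
    """Return (C_t, M1, M2): mean brightness below th and at/above th."""
    dark = [v for v in pixels if v < th]
    light = [v for v in pixels if v >= th]
    if not dark or not light:  # degenerate block: only one class present
        mean = sum(pixels) / len(pixels)
        return 0.0, mean, mean
    m1 = sum(dark) / len(dark)
    m2 = sum(light) / len(light)
    return m2 - m1, m1, m2
```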

[0035] In step 207, noise pixels are identified in each block containing text by analyzing the statistics of a small n×n neighborhood around each pixel in the block. In some embodiments, the type of each pixel in each text block (for example, background pixel, connected-component pixel, or problem connected-component pixel) has not yet been determined at this stage of process 200. For example, the local filter module 114 may identify noise pixels from the absolute difference abs(I - S). Here I is the brightness of the central pixel of an n×n neighborhood, such as 3×3 or 5×5, and S is some statistic of the brightness of the neighboring pixels.

The statistic S may be the mean or another type of average of the brightness of these n×n pixels. The local filter module 114 can detect noise impulses if:

[0036]

[0037] The local filter module 114 may use one or more of the following statistics, for example:

[0038] $S = \frac{1}{2}\left(\max I(i,j) + \min I(i,j)\right)$, the midpoint between the maximum and minimum brightness in the neighborhood;

[0039] $S = \operatorname{median} I(i,j)$, the median brightness in the neighborhood;

[0040] $S = \frac{1}{n^2} \sum I(i,j)$, the average brightness in the neighborhood,

where the maximum, minimum, median, and sum are taken over the pixels (i, j) of the n×n neighborhood.
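The three candidate statistics and the deviation abs(I - S) from [0035]-[0040] can be sketched as follows for a list-of-lists image. Clipping windows at the image border is an assumption. The detection criterion of [0036] is not reproduced here, so the sketch stops at the deviation on which that criterion is based.

```python
import statistics


def neighborhood(image: list[list[float]], y: int, x: int, n: int = 3) -> list[float]:
    """Brightness values of the n x n window centred on (y, x), clipped at the border."""
    r = n // 2
    h, w = len(image), len(image[0])
    return [image[yy][xx]
            for yy in range(max(0, y - r), min(h, y + r + 1))
            for xx in range(max(0, x - r), min(w, x + r + 1))]


def local_statistic(image, y, x, n=3, kind="median"):
    """S as the midpoint, median, or mean of the neighbourhood brightness."""
    values = neighborhood(image, y, x, n)
    if kind == "midpoint":
        return (max(values) + min(values)) / 2
    if kind == "median":
        return statistics.median(values)
    if kind == "mean":
        return statistics.fmean(values)
    raise ValueError(f"unknown statistic: {kind}")


def deviation(image, y, x, n=3, kind="median"):
    """abs(I - S) for the pixel at (y, x)."""
    return abs(image[y][x] - local_statistic(image, y, x, n, kind))
```

For an isolated bright pixel, the median ignores the outlier, while the midpoint and mean are pulled toward it. This makes the median the most selective of the three for impulse noise.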

[0041] In step 208, the contrast of local noise pixels is reduced to a given contrast level in each block containing text. For example, the local filter module 114 may reduce the contrast of the detected noise pixels by computing a new pixel brightness I₁. I₁ is the sum of the statistic S and half the text contrast C_t, multiplied by the sign of the difference between the brightness I and the statistic S:

[0042] $I_1 = S + \frac{C_t}{2}\,\operatorname{sign}(I - S)$

[0043] The new pixel brightness I₁ limits the degree of de-contrasting applied to local noise pixels. In some embodiments, the new brightness I₁ relies on the estimate of the average text contrast C_t of the block, so that local noise pixels are de-contrasted without damaging the text structures in the block. In other words, if the contrast of a noise pixel is less than half the text contrast ($|I - S| < C_t/2$), the noise pixel is not de-contrasted. If the contrast of a noise pixel is greater than half the text contrast ($|I - S| > C_t/2$), its contrast is reduced to half the text contrast ($|I_1 - S| = C_t/2$).
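The rule of [0041]-[0043] amounts to clamping a noise pixel's deviation from S at C_t/2. Here is a minimal sketch, assuming S and C_t have already been computed for the pixel and its block:

```python
import math


def decontrast(i: float, s: float, c_t: float) -> float:
    """Limit |I - S| to C_t / 2, i.e. I1 = S + (C_t / 2) * sign(I - S) when exceeded."""
    half = c_t / 2
    if abs(i - s) <= half:
        return i  # weak deviation: left unchanged
    return s + math.copysign(half, i - s)
```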

[0044] In step 209, background pixels, connected-component pixels, and problem connected-component pixels (those with uncertain binarization) are selected in each block containing text. This selection may be performed in text blocks whose noise pixels were reduced or de-contrasted in step 208. For example, the local filter module 114 may filter the gray image using a 3×3 median filter. Then, for each pixel, the local filter module 114 calculates the difference between the maximum and minimum brightness in its m×m neighborhood (for example, 7×7). If this difference is less than b×C_t (where the parameter b may satisfy 0.5 < b < 1), the local filter module 114 classifies the pixel as a background pixel. Otherwise, the local filter module 114 determines that the pixel is part of a connected component. Examples are a text character or part of a text character to be recognized by OCR, or another object present in the electronic image 120, such as a line or shape.
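The range test of [0044] is sketched below in Python: a 3×3 median filter followed by the max-min range over an m×m window. Two choices are assumptions: clipping windows at the border, and the default b=0.75, which lies within the stated 0.5 < b < 1.

```python
import statistics


def _window(image, y, x, r):
    """Values in the (2r+1) x (2r+1) window around (y, x), clipped at the border."""
    h, w = len(image), len(image[0])
    return [image[yy][xx]
            for yy in range(max(0, y - r), min(h, y + r + 1))
            for xx in range(max(0, x - r), min(w, x + r + 1))]


def median3x3(image):
    return [[statistics.median(_window(image, y, x, 1)) for x in range(len(image[0]))]
            for y in range(len(image))]


def classify_pixels(image, c_t, m=7, b=0.75):
    """True marks a background pixel, False a connected-component pixel."""
    smooth = median3x3(image)
    r = m // 2
    labels = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            window = _window(smooth, y, x, r)
            row.append(max(window) - min(window) < b * c_t)
        labels.append(row)
    return labels
```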

[0045] Instead, or in addition, the local filter module 114 can determine the background pixels, connected-component pixels, and problem connected-component pixels using binarization and morphological erosion (or dilation in the case of inverted text). The local filter module 114 applies a 3×3 erosion (or dilation) filter to the binarized image several times, for example two to four times (the number of applications may be configurable). After these applications, the local filter module 114 treats the remaining black pixels as connected-component pixels and the white pixels as background pixels. In the case of inverted text, white pixels are connected-component pixels and black pixels are background pixels.
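The erosion variant of [0045] is sketched below for a binary block where 1 marks black text pixels. The window is simply clipped, so pixels outside the block are effectively treated as foreground; this is an assumption. For inverted text, the mask can be complemented first so that text pixels are 1.

```python
def erode3x3(binary: list[list[int]]) -> list[list[int]]:
    """One 3x3 binary erosion: a pixel stays 1 only if its whole window is 1."""
    h, w = len(binary), len(binary[0])
    return [[int(all(binary[yy][xx]
                     for yy in range(max(0, y - 1), min(h, y + 2))
                     for xx in range(max(0, x - 1), min(w, x + 2))))
             for x in range(w)]
            for y in range(h)]


def component_mask(binary: list[list[int]], iterations: int = 2) -> list[list[int]]:
    """Apply the erosion `iterations` times; surviving 1s are component pixels."""
    mask = binary
    for _ in range(iterations):
        mask = erode3x3(mask)
    return mask
```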

[0046] The local filter module 114 identifies as problem connected-component pixels those connected-component pixels whose brightness lies within an interval around the binarization threshold th of the text block: th-delta < I < th+delta, where I is the pixel brightness and delta < C_t/2. These are the connected-component pixels with uncertain binarization. The local filter module 114 may choose the interval delta based on training images, as described below.
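The uncertainty band of [0046] is shown below as a short sketch. It assumes the connected-component mask from step 209 is given and that delta has already been chosen; the document tunes delta on training images.

```python
def problem_pixels(image: list[list[float]], is_component: list[list[bool]],
                   th: float, delta: float, c_t: float) -> list[tuple[int, int]]:
    """Coordinates of component pixels with th - delta < I < th + delta."""
    if not delta < c_t / 2:
        raise ValueError("delta must be smaller than C_t / 2")
    return [(y, x)
            for y, row in enumerate(image)
            for x, value in enumerate(row)
            if is_component[y][x] and th - delta < value < th + delta]
```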

[0047] In step 210, the noise variance sigma² of the background pixels is measured in each block containing text, where sigma is the standard deviation. For example, for the background pixels identified in step 209, the local filter module 114 may calculate the variance of the image brightness I using the standard formula

$\sigma^2 = \frac{1}{N} \sum_{(i,j)} \bigl(I(i,j) - m\bigr)^2$,

where m is the mean brightness of the background pixels in the block, N is the total number of background pixels in the block, and the summation runs over all background pixels (i, j) in the block.
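Here is the variance formula of [0047] in Python, assuming the background mask from step 209 is given. sigma is the square root of the result.

```python
def background_noise_variance(image: list[list[float]],
                              is_background: list[list[bool]]) -> float:
    """sigma^2 = (1/N) * sum (I(i,j) - m)^2 over the block's background pixels."""
    values = [value
              for row, mask in zip(image, is_background)
              for value, background in zip(row, mask) if background]
    if not values:
        raise ValueError("block has no background pixels")
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)
```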

[0048] In step 211, edge-preserving, locally adaptive filtering by selective averaging is performed in each block containing text. The locally adaptive filtering may be performed on text blocks whose noise pixels have been reduced or de-contrasted. It may use different filter parameter values for background pixels, connected-component pixels, and problem connected-component pixels. The filter parameters for problem connected-component pixels are determined by a vote of connected-component pixels with a similar local neighborhood structure.

[0049] In a first example, the local filter module 114 may perform a fast version of locally adaptive filtering that uses sigma filtering of pixels. Consider the central pixel, with brightness I, of an F×F neighborhood. With sigma filtering, the local filter module 114 outputs for it the mean brightness 〈I〉 over those pixels (i, j) of the neighborhood that satisfy the following condition, where I(i, j) is the brightness of pixel (i, j):

[0050] I - k×sigma < I(i, j) < I + k×sigma
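The following is a straightforward Python sketch of the fast sigma filter of [0049]-[0050]. It assumes a list-of-lists image and a window clipped at the border. The defaults F=5 and k=2 are placeholders, since the document tunes F and k on training images. The centre pixel always qualifies when sigma > 0, so the fallback to the original value only matters when sigma is 0.

```python
def sigma_filter(image: list[list[float]], sigma: float, k: float = 2.0,
                 f: int = 5) -> list[list[float]]:
    """Average each F x F window over pixels within (I - k*sigma, I + k*sigma)."""
    h, w = len(image), len(image[0])
    r = f // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = image[y][x]
            lo, hi = center - k * sigma, center + k * sigma
            selected = [image[yy][xx]
                        for yy in range(max(0, y - r), min(h, y + r + 1))
                        for xx in range(max(0, x - r), min(w, x + r + 1))
                        if lo < image[yy][xx] < hi]
            out[y][x] = sum(selected) / len(selected) if selected else float(center)
    return out
```

Neighbours that differ strongly from the centre, for example across a stroke edge, are excluded from the average. This is what preserves contours.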

[0051] The local filter module 114 may select the parameters F and k by processing training images, as described below. The local filter module 114 may use this filtering variant when computing resources (e.g., processor speed, available memory, or battery life) are limited, for example, when processing images on a mobile device.

[0052] In a second example, the local filter module 114 may perform a more elaborate version of locally adaptive pixel filtering. This version uses different parameters for the three pixel types: background, connected-component, and problem connected-component pixels. In some embodiments, the image is processed by the local filter module 114 or by another module in the server system 112. In the more elaborate version, the local filter module 114 may use a guided sigma filter for the three pixel types. To determine which neighboring pixels take part in the local averaging, the local filter module 114 uses an estimate of the undistorted image rather than the pixel brightness values themselves. For background pixels, guided sigma filtering outputs, for the central pixel of the F×F neighborhood, the mean brightness 〈I〉 over those pixels (i, j) of the neighborhood that satisfy the condition described above for the fast version. Instead, or in addition, the local filter module 114 may use another smoothing filter to determine the resulting brightness of central pixels that are background pixels.

[0053] For connected-component pixels, guided sigma filtering outputs, for the central pixel of the F×F neighborhood, the mean brightness 〈I〉 over those pixels (i, j) of the neighborhood that satisfy the following conditions:

[0054] M₁-k×sigma < I(i,j) < M₁+k×sigma, if the brightness of the central pixel is below the threshold th, and

[0055] M₂-k×sigma < I(i,j) < M₂+k×sigma, if the brightness of the central pixel is not below (that is, greater than or equal to) the threshold th.

[0056] For problem connected-component pixels, guided sigma filtering outputs, for the central pixel of the F×F neighborhood, the mean brightness 〈I〉 over those pixels (i, j) of the neighborhood that satisfy the following conditions:

[0057] M₁-k×sigma < I(i,j) < M₁+k×sigma (formula A), if most connected-component pixels in the block with a similar local neighborhood have brightness below the threshold th, and

[0058] M₂-k×sigma < I(i,j) < M₂+k×sigma (formula B), if most connected-component pixels in the block with a similar local neighborhood have brightness not below (that is, greater than or equal to) the threshold th.
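The guided variant for connected-component and problem pixels differs from the fast filter only in its reference value. The window is searched around M₁ or M₂ instead of around the pixel's own brightness. Below is a sketch under the same assumptions as the fast filter. For a problem pixel, the `reference` argument would be chosen by the similarity vote of [0059] rather than by comparing the pixel with th.

```python
def component_reference(brightness: float, th: float, m1: float, m2: float) -> float:
    """[0054]-[0055]: dark reference M1 below th, light reference M2 otherwise."""
    return m1 if brightness < th else m2


def guided_sigma_value(image: list[list[float]], y: int, x: int, reference: float,
                       sigma: float, k: float = 2.0, f: int = 5) -> float:
    """Mean of F x F neighbours within (reference - k*sigma, reference + k*sigma)."""
    h, w = len(image), len(image[0])
    r = f // 2
    lo, hi = reference - k * sigma, reference + k * sigma
    selected = [image[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
                if lo < image[yy][xx] < hi]
    return sum(selected) / len(selected) if selected else float(image[y][x])
```

In the test below, a stroke pixel of brightness 60 would average only with itself under the fast filter. Guided by M₁ = 30, it is pulled to the stroke level instead.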

[0059] For each problem connected-component pixel, the local filter module 114 may identify, among the connected-component pixels inside the N×N text block, pixels with a similar local neighborhood (that is, similar pixels). To search for similar local neighborhoods, the local filter module 114 may compute binary local descriptors, such as BRISK, for each connected-component pixel. If the Hamming distance between the local descriptors of the problem pixel and a connected-component pixel is less than a threshold D, that connected-component pixel is considered similar to the problem pixel. Such similar pixels can then be used to choose between formulas A and B above.
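The similarity vote of [0059] can be sketched once binary descriptors exist. Each descriptor is assumed to be a Python int bit string, for example packed from a BRISK implementation, which this sketch does not include. The hypothetical parameters min_voters and min_margin stand in for the thresholds V and V₁ of [0062]. A return value of None signals that a fallback filter should be used.

```python
from typing import Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def vote_formula(problem_desc: int, components: list[tuple[int, float]], th: float,
                 max_distance: int, min_voters: int = 3,
                 min_margin: float = 0.2) -> Optional[str]:
    """Choose 'A' (dark reference M1) or 'B' (light reference M2) by majority vote.

    `components` holds (descriptor, brightness) pairs for the connected-component
    pixels of the same text block.
    """
    below = above = 0
    for desc, brightness in components:
        if hamming(problem_desc, desc) < max_distance:  # similar neighbourhood
            if brightness < th:
                below += 1
            else:
                above += 1
    voters = below + above
    if voters < min_voters or abs(below - above) < min_margin * voters:
        return None  # too few or too evenly split voters: use a fallback filter
    return "A" if below > above else "B"
```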

[0060] The local filter module 114 may first perform an approximate search for similar local neighborhoods in a reduced image, for example a binarized image at half the resolution. The local filter module 114 may reduce the image resolution using bilinear interpolation. In some embodiments, bilinear interpolation may make the noise more Gaussian-like and less spatially correlated across the image. This may make the search for similar local pixel neighborhoods more robust.
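Halving the resolution with bilinear interpolation, as in [0060], can be sketched as below. Sampling at pixel centres (source coordinate 2y + 0.5) is an assumed convention. With this convention, each output pixel is exactly the mean of a 2×2 input block.

```python
def bilinear_sample(image: list[list[float]], fy: float, fx: float) -> float:
    """Bilinearly interpolated brightness at fractional coordinates (fy, fx)."""
    h, w = len(image), len(image[0])
    fy = min(max(fy, 0.0), h - 1.0)
    fx = min(max(fx, 0.0), w - 1.0)
    y0, x0 = int(fy), int(fx)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = fy - y0, fx - x0
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def downscale_half(image: list[list[float]]) -> list[list[float]]:
    """Half-resolution image sampled at the centres of 2 x 2 source blocks."""
    h, w = len(image), len(image[0])
    return [[bilinear_sample(image, 2 * y + 0.5, 2 * x + 0.5) for x in range(w // 2)]
            for y in range(h // 2)]
```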

[0061] The local filter module 114 may then refine, in the original image, the neighborhood of each similar pixel found at the reduced resolution. Instead, or in addition, the local filter module 114 may treat every pixel of that neighborhood as similar, so that all of them can vote when choosing between formulas A and B. The local filter module 114 may also perform both the search for connected-component pixels with similar neighborhoods and the vote between formulas A and B on a preprocessed image. The preprocessing may include, for example, binarization or an edge-preserving smoothing filter.

[0062] Instead, or in addition, when filtering problem pixels, the local filter module 114 may use the result of non-local means averaging over the similar pixels found, instead of applying formulas A and B. If the local filter module 114 determines that too few voters were found for a problem pixel of a connected component (for example, fewer than a threshold V), or that the majority only slightly exceeds the minority (for example, by less than V₁ percent of the number of voters), then, instead of formulas A and B, the local filter module 114 may apply a 3×3 edge-preserving smoothing filter to that problem pixel, such as a symmetric nearest neighbor (SNN) filter or a Kuwahara-Nagao filter.
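A minimal sketch of the fallback rule, assuming the votes for formulas A and B have already been counted. `min_voters` stands for V and `min_margin_pct` for V₁; the function names are hypothetical. The SNN variant shown keeps, from each of the four pairs of opposite neighbors, the value closer to the center and averages the four kept values (some SNN variants also include the center pixel).

```python
def choose_filter(votes_a, votes_b, min_voters, min_margin_pct):
    """Return "A", "B", or "fallback" following the voting rule in [0062].

    min_voters plays the role of V and min_margin_pct the role of V1 (in percent).
    """
    total = votes_a + votes_b
    if total < min_voters:
        return "fallback"                      # too few voters were found
    margin = abs(votes_a - votes_b)
    if margin == 0 or margin < min_margin_pct / 100.0 * total:
        return "fallback"                      # majority is too slim (or a tie)
    return "A" if votes_a > votes_b else "B"


def snn_3x3(img, y, x):
    """Symmetric nearest neighbor filter on the 3x3 window around (y, x).

    From each of the four pairs of opposite neighbors, keep the one closer in
    brightness to the center pixel, then average the four kept values.
    """
    center = img[y][x]
    pairs = [((-1, -1), (1, 1)), ((-1, 0), (1, 0)),
             ((-1, 1), (1, -1)), ((0, -1), (0, 1))]
    kept = []
    for (ay, ax), (by, bx) in pairs:
        a, b = img[y + ay][x + ax], img[y + by][x + bx]
        kept.append(a if abs(a - center) <= abs(b - center) else b)
    return sum(kept) / len(kept)


edge = [[0, 0, 100], [0, 0, 100], [0, 0, 100]]
print(snn_3x3(edge, 1, 1))  # 0.0: the edge is preserved, unlike a plain 3x3 mean (33.3)
```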

[0063] In another example, the local filter module 114 may apply a novel 3×3 edge-preserving smoothing filter. The local filter module 114 may average within the 3×3 window around the problem pixel along one of four directions, that is, over the vertical, the horizontal, or one of the two diagonal triplets, each of which includes the center pixel. The local filter module 114 may select the smoothing direction by analyzing a histogram of gradient directions, quantized into four levels, within the 3×3 window. The local filter module 114 may take the histogram bin with the maximum value as the estimate of the gradient direction at the center element if that value exceeds a threshold, for example, five-ninths or six-ninths. The local filter module 114 may then smooth over the triplet of pixels perpendicular to the estimated gradient direction. If the local filter module 114 finds that the maximum histogram value does not exceed the threshold, it may return the original value of the center pixel as the filter output. In some embodiments, the local filter module 114 may compute the gradient using the Sobel operator. The local filter module 114 may also apply this filter to the same problem pixels several times.
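The directional filter of [0063] might look like the sketch below. Assumptions not stated in the source: the histogram is normalized by the nine pixels of the window, pixels with a zero Sobel gradient do not vote, directions are quantized to 0°, 45°, 90° and 135° (modulo 180°) in image coordinates with rows pointing down, and the function names are hypothetical.

```python
import math


def sobel(img, y, x):
    """Sobel gradient (gx along columns, gy along rows) at (y, x)."""
    p = img
    gx = (p[y - 1][x + 1] + 2 * p[y][x + 1] + p[y + 1][x + 1]
          - p[y - 1][x - 1] - 2 * p[y][x - 1] - p[y + 1][x - 1])
    gy = (p[y + 1][x - 1] + 2 * p[y + 1][x] + p[y + 1][x + 1]
          - p[y - 1][x - 1] - 2 * p[y - 1][x] - p[y - 1][x + 1])
    return gx, gy


# For each quantized gradient direction, the (dy, dx) offsets of the two
# neighbors on the line perpendicular to it (the center is always included).
PERPENDICULAR = {
    0: ((-1, 0), (1, 0)),    # gradient ~0 deg (horizontal): smooth vertically
    1: ((-1, 1), (1, -1)),   # gradient ~45 deg: smooth along the other diagonal
    2: ((0, -1), (0, 1)),    # gradient ~90 deg (vertical): smooth horizontally
    3: ((-1, -1), (1, 1)),   # gradient ~135 deg
}


def directional_smooth(img, y, x, threshold=5 / 9):
    """Average the triplet perpendicular to the dominant gradient direction.

    (y, x) must be at least two pixels from the border, since the Sobel
    operator is evaluated at every pixel of the 3x3 window.
    """
    counts = [0, 0, 0, 0]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            gx, gy = sobel(img, y + dy, x + dx)
            if gx == 0 and gy == 0:
                continue                     # assumption: flat pixels do not vote
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            counts[int(round(angle / 45.0)) % 4] += 1
    best = max(range(4), key=counts.__getitem__)
    if counts[best] / 9.0 <= threshold:
        return img[y][x]                     # no dominant direction: keep the pixel
    (ay, ax), (by, bx) = PERPENDICULAR[best]
    return (img[y + ay][x + ax] + img[y][x] + img[y + by][x + bx]) / 3.0


def filter_problem_pixels(img, pixels, passes=2, threshold=5 / 9):
    """Apply directional_smooth to the given pixels several times."""
    out = [row[:] for row in img]
    for _ in range(passes):
        src = [row[:] for row in out]
        for y, x in pixels:
            out[y][x] = directional_smooth(src, y, x, threshold)
    return out
```

On a vertical edge with a noisy center pixel, every gradient in the window points horizontally, so the filter averages along the edge (the column) rather than across it.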

[0064] In step 212, the local contrast of low-contrast pixels is brought to a predetermined value in each block containing text. For example, if the local filter module 114 identifies low-contrast pixels in connected components as those satisfying the condition

, where c < 1, then the local filter module 114 raises the contrast of the identified low-contrast pixels in the connected components to the predetermined value. The local filter module 114 computes the new pixel brightness value as follows:

[0065]

[0066] The local filter module 114 may combine the locally adaptive filtering results in areas where blocks overlap. If the local filter module 114 splits the image into overlapping blocks, it may combine the results so that the transition between blocks is seamless in the overlapping areas. The local filter module 114 may combine the locally adaptive filtering results of neighboring overlapping blocks 1 and 2 by multiplying the brightness values I₁ and I₂ of the corresponding pixels of blocks 1 and 2 by the corresponding weights and adding the products, so that the value of the filtered image in the overlapping region is:

[0067] $$I = p\,I_1 + (1 - p)\,I_2 \qquad \text{(C)}$$

[0068] FIG. 3 is a diagram of an example system 300 for combining pixel brightness values from the overlapping portion 304 of multiple text blocks 302a-b.

The overlapping portion 304 is an area for which the brightness calculations described above were performed separately for the first text block 302a and for the second text block 302b. This may yield several brightness values for the same location, that is, for the same pixel of the electronic image 120, for example, for each of the pixels 308a-c. At the first boundary 306a of the overlapping portion 304, the local filter module 114 may use p = 1, so that formula C reduces to the brightness I₁ of the pixel from block 1, and the brightness I₂ of the pixel from block 2 is not used. At the second boundary 306b of the overlapping portion 304, the local filter module 114 may use p = 0, so that formula C reduces to the brightness I₂ of the pixel from block 2, and the brightness I₁ of the pixel from block 1 is not used. The local filter module 114 may vary p linearly from one to zero across the overlapping portion 304, from the first boundary 306a to the second boundary 306b. Thus, on the center line 310 of the overlapping region 304, the local filter module 114 may use p = 1/2, so that the combined brightness of a pixel 308 of the electronic image 120 is computed from half the brightness I₁ of the pixel from block 1 and half the brightness I₂ of the pixel from block 2.
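The linear cross-fade across the overlap can be sketched as follows, assuming formula C has the form I = p·I₁ + (1 − p)·I₂, which is what the limiting cases p = 1, p = 0 and p = 1/2 described above imply. The sketch handles two horizontally adjacent blocks of equal height; the function names are hypothetical.

```python
def blend_weight(k, overlap):
    """p for the k-th column of the overlap: 1 at the first boundary, 0 at the second."""
    if overlap == 1:
        return 0.5
    return 1.0 - k / (overlap - 1)


def merge_horizontal(block1, block2, overlap):
    """Join two blocks whose last/first `overlap` columns cover the same pixels.

    Inside the overlap, each pixel is I = p * I1 + (1 - p) * I2 with p falling
    linearly from 1 to 0, so the seam between the blocks is not visible.
    """
    merged = []
    for row1, row2 in zip(block1, block2):
        start = len(row1) - overlap
        middle = []
        for k in range(overlap):
            p = blend_weight(k, overlap)
            middle.append(p * row1[start + k] + (1 - p) * row2[k])
        merged.append(row1[:start] + middle + row2[overlap:])
    return merged


print(merge_horizontal([[1, 1, 10, 10, 10]], [[20, 20, 20, 2, 2]], 3))
# [[1, 1, 10.0, 15.0, 20.0, 2, 2]]
```

The middle column of a three-column overlap gets p = 1/2, matching the center line 310 in FIG. 3.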

[0069] The user electronic device 102 and/or the server system 112 may include a training module 122. The training module 122 may tune the parameters of process 200, selecting them so that, when applied to a training set of heavily noisy images of physical documents, the selected parameters yield high, or the highest achievable, OCR accuracy on that training set. In some embodiments, the training module 122 may be part of, or have access to, the local filter module 114 in order to train the parameters. The training module 122 may use a database of noisy text images in which the signal-to-noise ratio in the text blocks is less than four. The formula

can be used to estimate the signal-to-noise ratio. Alternatively, the formula R = C_t/sigma can be used to estimate the signal-to-noise ratio. In some implementations, the training module 122 may use images with R < 4 as training images for parameter selection.
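A minimal sketch of the training-set filter based on R = C_t/sigma. It assumes that the average text contrast C_t and the noise standard deviation sigma have already been estimated for each image elsewhere in the pipeline; the `BlockStats` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Per-image statistics; both numeric fields are assumed to be computed elsewhere."""
    name: str
    text_contrast: float   # C_t, the average text contrast of the text blocks
    noise_sigma: float     # sigma, the estimated noise standard deviation


def snr(stats):
    """R = C_t / sigma; a noiseless image gets an infinite ratio."""
    if stats.noise_sigma == 0:
        return float("inf")
    return stats.text_contrast / stats.noise_sigma


def select_training_images(images, max_snr=4.0):
    """Keep only heavily noisy images (R < max_snr) for parameter tuning."""
    return [s.name for s in images if snr(s) < max_snr]


images = [BlockStats("a", 60.0, 20.0), BlockStats("b", 120.0, 10.0)]
print(select_training_images(images))  # ['a']  (R = 3.0 versus R = 12.0)
```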

[0070] The training module 122 is provided with the ground-truth text of the noisy images. For example, the ground-truth text may be stored in a database. The training module 122 may compare the text produced by OCR on the enhanced training images with the ground-truth text to determine OCR accuracy, for example, as the error rate in the recognized text (that is, the total number of character errors in the recognized text divided by the total number of recognized characters). The local filter module 114 may enhance the training images, and the OCR module 113, 118 may perform OCR on the enhanced training images, using many different parameter values. The training module 122 may select the optimal values of the parameters, including N, n, m, b, c, the statistic type S, k, F, D, delta, and other filter parameters, that maximize OCR accuracy, that is, the agreement between the OCR results on the enhanced training images and the ground-truth text of the training images.
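One way to compute the error rate described above is sketched below. Two choices are assumptions rather than statements of the source: character errors are counted as the Levenshtein edit distance between recognized and ground-truth text, and the denominator is the number of ground-truth characters. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = cur
    return prev[-1]


def ocr_error_rate(pairs):
    """Total character errors divided by total ground-truth characters.

    pairs is a list of (recognized_text, ground_truth_text) tuples.
    """
    errors = sum(edit_distance(rec, truth) for rec, truth in pairs)
    total = sum(len(truth) for _, truth in pairs)
    return errors / total if total else 0.0


print(ocr_error_rate([("hallo", "hello")]))  # 0.2
```

Summing errors and characters over the whole training set, rather than averaging per-image rates, keeps short images from dominating the score.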

[0071] The training module 122 may use an approximate initial range of possible values for each parameter. The initial range may be set manually by a user of the training module 122. The ranges may be chosen using a priori knowledge of the statistics of various text and noise characteristics. For example, the ranges may be:

[0072] N: from 5% to 15%, or from 2 to 6 lines

[0073] k: from 1 to 5

[0074] F: from 11 to 35 pixels

[0075] D: from 50 to 150

[0076] c: from 0.5 to 0.95

[0077] b: 0.5 < b < 1

[0078] delta = q*C_t/2, where q is from 0.05 to 0.5

[0079] n: 3, 5, 7

[0080] m: 5, 7, 9, 11, 13, 15

[0081] S: one of the three formulas given above
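The ranges above can be encoded as a search-space configuration, for example as in this sketch. Only the step sizes for N (1%) and k (0.2) come from the source ([0083]); the other steps, the choice of percent units for N, the odd-only window sizes for F, the closed stand-in for the open interval on b, and all names are assumptions.

```python
# Ranges from [0072]-[0081]. Step sizes other than those for N and k (taken
# from [0083]) are assumptions, as is the choice of percent units for N.
PARAM_SPACE = {
    "N": {"range": (5.0, 15.0), "step": 1.0},    # percent; the "2 to 6 lines" form is not modelled
    "k": {"range": (1.0, 5.0), "step": 0.2},
    "F": {"range": (11, 35), "step": 2},         # pixels; odd sizes only (assumption)
    "D": {"range": (50, 150), "step": 10},
    "c": {"range": (0.5, 0.95), "step": 0.05},
    "b": {"range": (0.55, 0.95), "step": 0.05},  # stays inside the open interval 0.5 < b < 1
    "q": {"range": (0.05, 0.5), "step": 0.05},   # delta = q * C_t / 2
    "n": {"values": [3, 5, 7]},
    "m": {"values": [5, 7, 9, 11, 13, 15]},
    "S": {"values": [0, 1, 2]},                  # index of the statistic formula
}


def candidates(spec):
    """All values to try for one parameter, in increasing order."""
    if "values" in spec:
        return list(spec["values"])
    lo, hi = spec["range"]
    count = int(round((hi - lo) / spec["step"])) + 1
    # Rounding removes floating-point drift such as 0.30000000000000004.
    return [round(lo + i * spec["step"], 10) for i in range(count)]


def midpoint(spec):
    """Initial value: the middle candidate of the allowed range."""
    values = candidates(spec)
    return values[len(values) // 2]


def delta_from_q(q, text_contrast):
    """delta = q * C_t / 2, as in [0078]."""
    return q * text_contrast / 2
```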

[0082] In some embodiments, the training module 122 determines a value for each of the above parameters in the order listed, varying each parameter over its allowed range. The training module 122 may fix the optimal value found for a parameter and then move on to the next parameter in the list. Before varying any parameter, the training module 122 may initially set each parameter to the middle of its allowed range.

[0083] For example, the training module 122 may start by determining the optimal value N_opt for N. The training module 122 determines the OCR error rate on the enhanced training images for values of N across the specified range (for example, in increments such as 1%, one line, or half a line), with the other parameters set to the middle of their ranges. The training module 122 compares the resulting error rates for the values of N to determine which value gives the lowest error rate. The training module 122 then fixes N at the resulting value N_opt and proceeds to vary the next parameter, for example k, while keeping the remaining untuned parameters at their midpoint values. The training module 122 determines the OCR error rate on the enhanced training images for values of k across the specified range (for example, in increments of 0.2) and compares the error rates to determine which value of k gives the fewest errors.

[0084] The process may continue until the training module 122 has determined values for all parameters. In some embodiments, if the optimal value of a parameter lies at a boundary of its range, the training module 122 may extend the range beyond that boundary until the extended range contains the parameter value that gives the lowest error rate.
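The one-parameter-at-a-time search of [0082]-[0084] can be sketched as a coordinate search. In this sketch, `error_rate` is a hypothetical callback standing in for "enhance the training images with these parameters, run OCR, and score the result"; candidate lists are assumed to be evenly spaced, and the `fixed_range` option (for parameters such as S whose range cannot be extended) is an addition not described in the source.

```python
def tune_parameters(space, error_rate, fixed_range=(), max_extensions=10):
    """Tune parameters one at a time, in the order of `space`.

    space maps a parameter name to an increasing list of evenly spaced
    candidates; error_rate(params) returns the OCR error rate for a full
    parameter assignment. Parameters in fixed_range are never extended.
    """
    # Start every parameter at the middle of its candidate list.
    params = {name: values[len(values) // 2] for name, values in space.items()}
    for name, values in space.items():
        values = list(values)
        for _ in range(max_extensions + 1):
            errors = [error_rate({**params, name: v}) for v in values]
            best = min(range(len(values)), key=errors.__getitem__)
            best_value = values[best]
            if name in fixed_range or len(values) < 2:
                break
            step = values[1] - values[0]
            if best == 0:
                values.insert(0, values[0] - step)   # optimum on the lower edge: extend
            elif best == len(values) - 1:
                values.append(values[-1] + step)     # optimum on the upper edge: extend
            else:
                break                                # optimum is interior: done
        params[name] = best_value                    # fix this parameter and move on
    return params


space = {"N": list(range(5, 16)), "k": [round(1 + 0.2 * i, 10) for i in range(21)]}
print(tune_parameters(space, lambda p: (p["N"] - 12) ** 2 + (p["k"] - 4.4) ** 2))
# {'N': 12, 'k': 4.4}
```

Each parameter costs one OCR pass per candidate value, so the total cost grows with the sum, not the product, of the range sizes. The trade-off is that interactions between parameters are only partly captured, because earlier parameters are fixed before later ones are tuned. A real implementation would also need to clamp extensions to physically valid values (for example, a window size cannot become negative).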

[0085] In some embodiments, for connected-component pixels with uncertain binarization, the local filter module 114 may determine their brightness by voting among the connected-component pixels of the block that have a similar local neighborhood structure. The local filter module 114 may do this to correct the binarization of connected-component pixels with uncertain binarization and obtain an improved binarized image. The processing may be similar to step 211, where voting on the grayscale image was used to compute the value 〈I〉 for problem pixels of connected components, except that the local filter module 114 now performs it on the binarized image rather than on the grayscale image. The method described above produces an enhanced image (that is, one with improved contrast and/or reduced noise) that may provide higher accuracy when OCR is performed on it. Creating the enhanced image may use fewer system resources (for example, less memory, less storage space, and/or fewer processing-device cycles) than standard image enhancement techniques and may also give more accurate results, including on mobile devices.
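A minimal sketch of voting on a binarized image. It assumes pixel values 0/1, measures neighborhood similarity as the number of mismatching pixels in the 3×3 neighborhood (center excluded), and assumes the uncertain pixel and the candidate voters have been identified beforehand; the function names and the `max_mismatch` tolerance are hypothetical.

```python
def neighborhood(img, y, x, r):
    """Values around (y, x) in a (2r+1)x(2r+1) window, excluding the center."""
    return [img[y + dy][x + dx]
            for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if (dy, dx) != (0, 0)]


def vote_binary(img, y, x, candidates, r=1, max_mismatch=1):
    """Re-decide an uncertain binary pixel by majority vote.

    Only candidates whose binary neighborhood differs from that of (y, x) in at
    most max_mismatch positions vote, each with its own value (0 or 1).
    A tie, including no voters at all, keeps the current value.
    """
    ref = neighborhood(img, y, x, r)
    votes = [0, 0]
    for cy, cx in candidates:
        if (cy, cx) == (y, x):
            continue
        other = neighborhood(img, cy, cx, r)
        if sum(a != b for a, b in zip(ref, other)) <= max_mismatch:
            votes[img[cy][cx]] += 1
    if votes[0] == votes[1]:
        return img[y][x]
    return 0 if votes[0] > votes[1] else 1


# A vertical stroke with a one-pixel gap: pixels with stroke-like neighborhoods vote to close it.
stroke = [[0, 1, 0] for _ in range(7)]
stroke[3][1] = 0
print(vote_binary(stroke, 3, 1, [(1, 1), (2, 1), (4, 1), (5, 1)]))  # 1
```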

[0086] For simplicity of explanation, the processes in this disclosure are depicted and described as a sequence of acts. However, acts in accordance with this disclosure may be performed in various orders and/or concurrently with other acts not presented or described herein. Furthermore, not all illustrated acts may be required to implement processes in accordance with this disclosure. Those skilled in the art will also understand that these processes could alternatively be represented as a series of interrelated states via a state diagram or events. It should also be appreciated that the processes disclosed herein may be stored on an article of manufacture to facilitate transporting and transferring them to computing devices. The term "article of manufacture", as used herein, means a computer program accessible from any computer-readable device or storage medium.

[0087] FIG. 4 is a diagram illustrating an example machine in the form of a computing system 400. The computing system 400 executes one or more sets of instructions 426 that cause the machine to perform any one or more of the methodologies discussed herein. The machine may operate as a server or a client device in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch, or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, although a single machine is illustrated, the term "machine" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions 426 to perform any one or more of the methodologies discussed herein.

[0088] The computing system 400 includes a processor 402, a main memory 404 (for example, read-only memory (ROM), flash memory, or dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM)), a static memory 406 (for example, flash memory or static random access memory (SRAM)), and a data storage device 416, which communicate with each other via a bus 408.

[0089] The processor 402 represents one or more general-purpose processing devices, such as microprocessors, central processing units, or the like. In particular, the processor 402 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing another instruction set, or processors implementing a combination of instruction sets. The processor 402 may also be one or more special-purpose processing devices, such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), a network processor, or the like. The processor 402 is configured to execute the instructions of the local filter module 114, the training module 122, the OCR module 113, 118, the user electronic device 102, and/or the server system 112 to perform the operations and steps discussed herein.

[0090] The computing system 400 may further include a network interface device 422 that communicates with other machines over a network 418, such as a local area network (LAN), a corporate network, an extranet, or the Internet. The computing system 400 may also include a video display unit 410 (for example, a liquid crystal display (LCD) or a cathode ray tube (CRT) monitor), an alphanumeric input device 412 (for example, a keyboard), a cursor control device 414 (for example, a mouse), and a signal generation device 420 (for example, a speaker).

[0091] The data storage device 416 may include a computer-readable storage medium 424 storing the sets of instructions 426 for the local filter module 114, the training module 122, the OCR module 113, 118, the user electronic device 102, and/or the server system 112, which embody one or more of the methodologies or functions described herein. The sets of instructions 426 for the local filter module 114, the training module 122, the OCR module 113, 118, the user electronic device 102, and/or the server system 112 may also reside, completely or at least partially, within the main memory 404 and/or within the processor 402 during their execution by the computing system 400, the main memory 404 and the processor 402 also constituting computer-readable storage media. The sets of instructions 426 may further be transmitted or received over the network 418 via the network interface device 422.

[0092] While the computer-readable storage medium 424 is shown in the example as a single medium, the term “computer-readable storage medium” should be understood to include a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) that store the instruction sets 426. The term “computer-readable storage medium” may also include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by a machine and that causes the machine to perform any one or more of the methodologies of the present invention. The term “computer-readable storage medium” may accordingly include, but is not limited to, solid-state memories and optical and magnetic media.

[0093] Numerous details are set forth in the foregoing description. However, it will be apparent to a person skilled in the art having the benefit of this description that the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block-diagram form rather than in detail, to avoid obscuring the present invention.

[0094] Some portions of the description of the preferred embodiments are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in data processing to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, terms, numbers, or the like.

[0095] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as is apparent from the following discussion, it is appreciated that throughout the description, terms such as “determining”, “providing”, “activating”, “finding”, “selecting”, and the like refer to the actions and processes of a computing system, or a similar electronic computing device, that manipulates data represented as physical (electronic) quantities within the registers and memories of the computing system and transforms them into other data similarly represented as physical quantities within the memories or registers of the computing system or other such information storage, transmission, or display devices.

[0096] The present invention also relates to an apparatus for performing the operations described herein. Such an apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer-readable storage medium, including, but not limited to, any type of disk, such as floppy disks, optical disks, non-rewritable compact discs (CD-ROMs), and magneto-optical disks; read-only memories (ROMs); random access memories (RAMs); erasable programmable read-only memories (EPROMs); electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of media suitable for storing electronic instructions.

[0097] The words “example” or “exemplary” are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as an “example” is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word “example” is intended to present concepts in a concrete fashion. As used in this application, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise or clear from context, “X includes A or B” is intended to mean any of the natural inclusive permutations: if X includes A, X includes B, or X includes both A and B, then “X includes A or B” is satisfied under any of the foregoing instances. Moreover, use of the terms “an embodiment”, “one embodiment”, “an implementation”, or “one implementation” is not intended to mean the same embodiment or implementation unless described as such. The terms “first”, “second”, “third”, “fourth”, etc., as used herein are meant as labels to distinguish among different elements and do not necessarily have an ordinal meaning according to their numerical designation.

[0098] It is to be understood that the above description is intended to be illustrative, and not restrictive. Other embodiments will be apparent to those skilled in the art upon reading and understanding the above description. The scope of the invention should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

1. A method of enhancing electronic images, comprising:

identifying one or more blocks in an electronic image that contain text characters;

identifying one or more text blocks among blocks that contain text characters;

determining an average text contrast value for each of the text blocks;

identifying a type of each pixel in each of the text blocks based on the average text contrast value, wherein the pixel type is selected from a background pixel, a connected-component pixel, or a problem connected-component pixel, and wherein the background pixel type indicates that the pixel is part of the background of the electronic image, the connected-component pixel type indicates that the pixel is part of a connected component in the electronic image, and the problem connected-component pixel type indicates that the pixel is part of a connected component in the electronic image and has an uncertain binarization;

performing, by at least one processing device, locally adaptive filtering in a first neighborhood of pixels around each pixel of each text block to determine a brightness of the pixel based on the identified type; and

storing, in at least one data storage device, an electronic image containing the determined brightness for each pixel in each of the text blocks.

2. The method of claim 1, further comprising receiving the electronic image via an image receiving interface of a user electronic device, wherein the user electronic device comprises the processing device.

3. The method of claim 1, further comprising performing optical character recognition of text in the electronic image based on the determined brightness of each pixel in each text block.

4. The method of claim 1, further comprising:

identifying at least one first noisy pixel in at least one of the text blocks based on comparing a difference between the brightness of the first pixel and brightness statistics for pixels in a second neighborhood around the first pixel in the text block with the average text contrast value; and

reducing a contrast of the noisy pixels based on the brightness, the brightness statistics, and the average text contrast value, the reduction being limited by the average text contrast value, wherein identifying the pixel type is performed after the contrast of the noisy pixels is reduced.

5. The method of claim 4, wherein the brightness statistics comprise at least one of a midpoint of the brightness of the pixels in the second neighborhood, a median of the brightness of the pixels in the second neighborhood, or a mean brightness of the pixels in the second neighborhood.

6. The method of claim 1, further comprising:

determining that at least one first pixel having the connected-component pixel type in at least one of the text blocks has a contrast below a threshold value based on the average text contrast value and brightness statistics for pixels in a second neighborhood around the first pixel; and

determining the brightness of the first pixel based on the average text contrast value and the brightness statistics so as to increase the contrast of the first pixel.

7. The method of claim 1, wherein the locally adaptive filtering uses a different calculation for each pixel type, namely background pixels, connected-component pixels, and problem connected-component pixels.

8. The method of claim 1, wherein the text blocks overlap, the method further comprising combining the determined brightness values for overlapping portions of adjacent text blocks based on respective weights for the co-located pixels in the overlapping portions.

9. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed by at least one processing device, cause the processing device to perform operations comprising:

identifying one or more text blocks among blocks that contain text characters;

determining an average text contrast value for each of the text blocks;

identifying a type of each pixel in each of the text blocks based on the average text contrast value, wherein the pixel type is selected from a background pixel, a connected-component pixel, or a problem connected-component pixel, and wherein the background pixel type indicates that the pixel is part of the background of the electronic image, the connected-component pixel type indicates that the pixel is part of a connected component in the electronic image, and the problem connected-component pixel type indicates that the pixel is part of a connected component in the electronic image and has an uncertain binarization;

performing, by the processing device, locally adaptive filtering in a first neighborhood of pixels around each pixel of each text block to determine a brightness of the pixel based on the identified type; and

10. The computer-readable storage medium of claim 9, wherein the instructions further cause the processing device to receive the electronic image via an image acquisition interface of a user electronic device, wherein the user electronic device comprises the processing device.

11. The computer-readable storage medium of claim 9, wherein the instructions further cause the processing device to perform optical character recognition of text in the electronic image based on the determined brightness of each pixel in each text block.

12. The computer-readable storage medium of claim 9, wherein the instructions further cause the processing device to perform:

identifying at least one first noisy pixel in at least one of the text blocks based on comparing the brightness of the first pixel and brightness statistics for pixels in a second neighborhood around the first pixel in the text block with the average text contrast value; and

reducing a contrast of the noisy pixels based on the brightness, the brightness statistics, and the average text contrast value, the reduction being limited by the average text contrast value, wherein identifying the pixel type is performed after the contrast of the noisy pixels is reduced.

13. The computer-readable storage medium of claim 12, wherein the brightness statistics comprise at least one of a midpoint of the brightness of the pixels in the second neighborhood, a median of the brightness of the pixels in the second neighborhood, or a mean brightness of the pixels in the second neighborhood.

14. The computer-readable storage medium of claim 9, wherein the locally adaptive filtering uses a different calculation for each pixel type, namely background pixels, connected-component pixels, and problem connected-component pixels.

15. The computer-readable storage medium of claim 9, wherein the text blocks overlap, and wherein the instructions further cause the processing device to combine the determined brightness values for overlapping portions of adjacent text blocks based on respective weights for the co-located pixels in the overlapping portions.

16. An electronic image enhancement system, comprising:

at least one storage device in which instructions are stored; and

at least one processing device to execute the instructions to perform:

identifying one or more text blocks among blocks that contain text characters;

determining an average text contrast value for each of the text blocks;

17. The system of claim 16, wherein the processing device further executes the instructions to perform optical character recognition of text in the electronic image based on the determined brightness of each pixel in each text block.

18. The system of claim 16, wherein the processing device further executes the instructions to perform:

identifying at least one first noisy pixel in at least one of the text blocks based on comparing a difference between the brightness of the first pixel and brightness statistics for pixels in a second neighborhood around the first pixel in the text block with the average text contrast value; and

19. The system of claim 18, wherein the brightness statistics comprise at least one of a midpoint of the brightness of the pixels in the second neighborhood, a median of the brightness of the pixels in the second neighborhood, or a mean brightness of the pixels in the second neighborhood.

20. The system of claim 16, wherein the locally adaptive filtering uses a different calculation for each pixel type, namely background pixels, connected-component pixels, and problem pixels.
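The claims name several ingredients of the method without giving their formulas. The following Python sketch illustrates three of them under stated assumptions: the neighborhood brightness statistics of claims 5, 13 and 19 (midpoint, median, mean), a noise-suppression step in the spirit of claims 4, 12 and 18, and the weighted merging of overlapping text blocks in claims 8 and 15. It assumes a grayscale image stored as a list of rows, with larger values meaning brighter pixels, and an average text contrast value supplied by the caller. The noise criterion, the parameters `radius`, `noise_ratio` and `keep`, and all function names are hypothetical and are not taken from the patent.

```python
from statistics import mean, median


def neighborhood(img, y, x, radius):
    """Brightness values in a (2*radius + 1)^2 window around (y, x), clipped at the borders."""
    h, w = len(img), len(img[0])
    return [
        img[j][i]
        for j in range(max(0, y - radius), min(h, y + radius + 1))
        for i in range(max(0, x - radius), min(w, x + radius + 1))
    ]


def brightness_stats(values):
    """The three statistics listed in claim 5: midpoint, median and mean."""
    return {
        "midpoint": (min(values) + max(values)) / 2,
        "median": median(values),
        "mean": mean(values),
    }


def suppress_noise(img, avg_contrast, radius=1, noise_ratio=0.5, keep=0.25):
    """Hypothetical noise step in the spirit of claim 4 (assumed rule, not the patent's).

    A pixel counts as noisy when it deviates from the median of its neighborhood
    by a nonzero amount smaller than noise_ratio * avg_contrast, i.e. too weakly
    to be a text stroke. Its deviation is shrunk to `keep` of its original size,
    and the brightness change is capped at avg_contrast. Statistics are always
    taken from the unmodified input image.
    """
    out = [row[:] for row in img]
    for y in range(len(img)):
        for x in range(len(img[0])):
            m = brightness_stats(neighborhood(img, y, x, radius))["median"]
            diff = img[y][x] - m
            if 0 < abs(diff) < noise_ratio * avg_contrast:
                change = min(abs(diff) * (1 - keep), avg_contrast)
                out[y][x] = img[y][x] - change if diff > 0 else img[y][x] + change
    return out


def blend_overlaps(pieces, height, width):
    """Weighted merge of overlapping block results, as described in claim 8.

    `pieces` is a list of (top, left, values, weights) tuples, where `values` and
    `weights` are equally sized 2-D lists for one text block. Each output pixel is
    the weight-normalised average of all blocks covering it; uncovered pixels get
    0.0. How the weights are chosen is left to the caller.
    """
    acc = [[0.0] * width for _ in range(height)]
    wsum = [[0.0] * width for _ in range(height)]
    for top, left, values, weights in pieces:
        for dy, (vrow, wrow) in enumerate(zip(values, weights)):
            for dx, (v, wt) in enumerate(zip(vrow, wrow)):
                acc[top + dy][left + dx] += wt * v
                wsum[top + dy][left + dx] += wt
    return [
        [a / s if s > 0 else 0.0 for a, s in zip(arow, srow)]
        for arow, srow in zip(acc, wsum)
    ]
```

For example, on a flat 3×3 patch of brightness 200 with a single 212-level speck at the center and `avg_contrast=100`, the speck deviates from its local median by 12 levels, below the assumed 50-level noise threshold, so `suppress_noise` shrinks it to 203.0; a 90-level dark stroke pixel at the same position deviates by 110 levels and is left untouched. Using the median rather than the mean as the reference keeps a single outlier from shifting the statistic it is compared against.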