
CN110991371B - Intelligent reading learning method based on coordinate recognition - Google Patents


Info

Publication number
CN110991371B
CN110991371B (application CN201911253273.6A)
Authority
CN
China
Prior art keywords
page
image
area
printed matter
feature
Prior art date
Legal status
Active
Application number
CN201911253273.6A
Other languages
Chinese (zh)
Other versions
CN110991371A (en)
Inventor
江周平
Current Assignee
Beijing Anxin Zhitong Technology Co ltd
Original Assignee
Beijing Anxin Zhitong Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Anxin Zhitong Technology Co., Ltd.
Priority to CN201911253273.6A
Publication of CN110991371A
Application granted
Publication of CN110991371B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/06 Electrically-operated educational appliances with both visual and audible presentation of the material to be studied
    • G09B5/065 Combinations of audio and video presentations, e.g. videotapes, videodiscs, television systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent reading learning method based on coordinate recognition, which comprises the following steps: a printed matter is placed on an auxiliary pad on which a placement area and a marking area are formed, the marking area carrying a scale object that represents position information; a camera on the learning stick photographs the printed page to be learned, and the page is recognized to obtain the original page information; a selected reference object is used to touch a region of interest on the page, the camera continuously captures a reference page image containing the selected reference object, the printed matter and the auxiliary pad, and the selected reference object is recognized from the reference page image; the position of the selected reference object on the printed matter is then calculated from the scale object of the auxiliary pad in the reference page image, and the multimedia file preset for the corresponding position is retrieved and played. The invention requires no codes to be pre-printed on books and makes the point-and-read action easy to recognize accurately.

Description

Intelligent reading learning method based on coordinate recognition
Technical Field
The invention relates to the technical field of multimedia education, in particular to an intelligent reading learning method based on coordinate recognition.
Background
Point-and-read is an intelligent reading and learning mode realized with optical image recognition and digital voice technology; it integrates electronic multimedia technology with the education industry and embodies a human-oriented application of science and technology. With existing point-and-read devices, the book usually has to be pre-processed: specific codes must be printed on or pasted onto the book, otherwise its contents cannot be identified.
To address this, reading devices have appeared on the market that recognize content directly with OCR character recognition and then read it out. However, because of the wide variety of printed material, OCR is prone to misreading, and its data-processing load is relatively large. In addition, recognizing the point-and-read action requires locating the fingertip on the printed matter, and interference from the printed content makes this recognition relatively difficult and not very accurate.
Disclosure of Invention
The invention aims to provide an intelligent reading learning method based on coordinate recognition which requires no codes to be prefabricated on books, removes the restriction that point-and-read content is limited by such codes, ensures accurate playback of the content, and makes the point-and-read action easy to recognize accurately.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An intelligent reading learning method based on coordinate recognition comprises the following steps:
S1, placing a printed matter on an auxiliary pad, wherein a placement area and a marking area are formed on the auxiliary pad, the placement area is used for placing the printed matter, the marking area surrounds the placement area, and a scale object representing position information is arranged on the marking area;
S2, photographing the printed page to be learned with a camera of a learning stick and recognizing the page to obtain original page information;
S3, touching a region of interest on the printed page to be learned with a selected reference object, continuously photographing with the camera of the learning stick to obtain a reference page image containing the selected reference object, the printed matter and the auxiliary pad, and recognizing the selected reference object from the reference page image;
S4, calculating the position of the selected reference object on the printed matter, as recognized in step S3, based on the scale object of the auxiliary pad in the reference page image;
S5, retrieving the multimedia file preset for the corresponding position based on the position information obtained in S4 and the original page information obtained in S2, and playing the multimedia file.
Preferably, the selected reference object is a human hand, a finger, a pen-shaped object or an object with a light emitting device at the tip.
Preferably, step S3 further comprises recognizing the touch action from the reference page image to obtain gesture command information, and step S5 further comprises selecting the playing mode of the multimedia file based on the gesture command information.
Preferably, the step S2 comprises the following sub-steps:
S21, acquiring original page images of the printed matter in advance and extracting feature points to obtain a page feature library, and building in advance a region-to-multimedia content library in which specific areas on each original page of the printed matter correspond to specific multimedia files;
S22, continuously photographing the printed page to be learned with the camera of the learning stick to obtain a page image to be learned, extracting feature points, searching the page feature library using the extracted feature points, and matching page features to obtain the original page information of the page to be learned from the page feature library.
Preferably, the feature point extraction is realized by feature extraction algorithms such as SIFT or SURF.
Preferably, the feature point extraction is realized by the following method:
graying the image;
extracting characteristic points by using a key point detection algorithm;
performing feature point direction identification based on histogram statistics;
and describing the feature points to obtain feature descriptors.
Preferably, the page feature matching is realized through algorithms such as the Euclidean distance between feature values, the cosine similarity of feature vectors, or the correlation coefficient.
Preferably, the page feature matching is achieved by the following method:
performing dimension reduction, hashing and sorting on the feature descriptors of the feature points extracted from the page image to be learned, comparing each hash value with the hash values of the feature points stored in the page feature library, and regarding a pair of feature points as matched if their distance is smaller than a preset first threshold;
counting the number of matched feature points, and regarding the page image to be learned as matched with the corresponding original page image if the number of matched feature points is larger than a preset second threshold.
Preferably, the step S4 includes the following sub-steps:
S41, carrying out coordinate mapping based on the scale object of the auxiliary pad on the reference page image, and calculating the position information of the selected reference object on the auxiliary pad, which is identified in the step S3;
S42, converting coordinate values of the selected reference object on the auxiliary pad into position information of the selected reference object on the printed matter based on the mapping relation between the auxiliary pad and the printed matter.
Preferably, the touch action of the selected reference object in step S3 includes a single click or a double click.
With the above technical scheme, compared with the background art, the invention has the following advantages:
No codes need to be prefabricated on books, so the point-and-read content is no longer restricted by such codes; accurate playback of the content is ensured; and the point-and-read action can be recognized accurately and conveniently.
Drawings
FIG. 1 is a schematic workflow diagram of the present invention;
FIG. 2 is a schematic view of the present invention in use;
FIG. 3 is a schematic diagram of a page matching process according to the present invention;
Fig. 4 is a schematic diagram showing the recognition of a selected reference object (fingertip) according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Examples
The invention discloses an intelligent reading learning method based on coordinate recognition which, with reference to figs. 1 and 2, comprises the following steps:
S1, placing the printed matter on an auxiliary pad. Referring to fig. 2, the auxiliary pad 100 is formed with a placement area for placing the printed matter 300 and a marking area surrounding the placement area, on which a scale 110 representing position information is provided. The learning stick 200 is placed beside the auxiliary pad 100 for photographing.
S2, photographing the printed page to be learned with the camera of the learning stick and recognizing the page to obtain the original page information. This step specifically comprises the following sub-steps:
S21, acquiring the original page images of the printed matter in advance, extracting feature points, and performing dimension reduction, hashing and sorting to obtain a page feature library; in addition, a region-to-multimedia content library is built in advance in which specific areas on each original page of the printed matter correspond to specific multimedia files.
The feature point extraction may employ any feature point extraction algorithm, including but not limited to SIFT, SURF and their variants; the invention is not particularly limited in this respect. In this embodiment, feature point extraction is achieved as follows:
a. Image graying. The acquired image is a color image (for example, an RGB three-channel color image), so grayscale conversion is performed first to facilitate the subsequent steps. In this embodiment, the graying formula is:
Gray = (R*30 + G*59 + B*11 + 50)/100
where Gray is the gray value.
b. Extracting feature points with a key-point detection algorithm. The original image is repeatedly downsampled to obtain a series of images of different sizes; each image is Gaussian-filtered at several scales, and the Gaussian-filtered versions of the same image at adjacent scales are subtracted to obtain a difference-of-Gaussians image, on which extremum detection is performed; extremum points satisfying the curvature condition are taken as feature points. The difference-of-Gaussians image D(x, y, σ) is computed as follows, where G(x, y, σ) is the Gaussian filter function, I(x, y) is the original image and L(x, y, σ) is the image Gaussian-filtered at scale σ:
D(x, y, σ(s)) = (G(x, y, σ(s+1)) - G(x, y, σ(s))) * I(x, y) = L(x, y, σ(s+1)) - L(x, y, σ(s))
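As an illustration only (not part of the patent text), the following Python sketch builds a small difference-of-Gaussians stack with OpenCV and keeps local extrema as candidate key points; the blur scales and the contrast threshold are assumed values:

```python
import cv2
import numpy as np

def dog_keypoints(bgr_image, sigmas=(1.0, 1.6, 2.56, 4.1), contrast_thr=5.0):
    # Graying, as in step a: Gray = (R*30 + G*59 + B*11 + 50) / 100
    b, g, r = cv2.split(bgr_image.astype(np.float32))
    gray = (r * 30 + g * 59 + b * 11 + 50) / 100

    # Gaussian-filter at several scales and subtract adjacent scales
    blurred = [cv2.GaussianBlur(gray, (0, 0), s) for s in sigmas]
    dogs = [blurred[i + 1] - blurred[i] for i in range(len(blurred) - 1)]

    # Keep pixels that are extrema of their 3x3 neighborhood in the middle DoG layer
    # (a full SIFT implementation also compares against the layers above and below)
    keypoints = []
    d = dogs[1]
    for y in range(1, d.shape[0] - 1):
        for x in range(1, d.shape[1] - 1):
            patch = d[y - 1:y + 2, x - 1:x + 2]
            v = d[y, x]
            if abs(v) > contrast_thr and (v == patch.max() or v == patch.min()):
                keypoints.append((x, y))
    return gray, keypoints
```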
c. Identifying the feature-point direction based on histogram statistics. After the gradients at the feature points have been computed, a histogram is used to accumulate the gradient magnitudes and directions of the pixels in the neighborhood. The gradient histogram divides the direction range of 0 to 360 degrees into 18 bins of 20 degrees each, and the peak direction of the histogram is taken as the main direction of the feature point. With L the scale-space value at the key point, the gradient magnitude m and direction θ of each pixel are calculated as:
m(x, y) = sqrt((L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2)
θ(x, y) = arctan((L(x, y+1) - L(x, y-1)) / (L(x+1, y) - L(x-1, y)))
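An illustrative sketch of the orientation-histogram step (the 18 bins of 20 degrees follow the description above; the neighborhood radius is an assumption):

```python
import numpy as np

def main_direction(L, x, y, radius=8, num_bins=18):
    """Accumulate gradient magnitude into 20-degree direction bins around (x, y)."""
    hist = np.zeros(num_bins)
    for yy in range(y - radius, y + radius + 1):
        for xx in range(x - radius, x + radius + 1):
            if 0 < xx < L.shape[1] - 1 and 0 < yy < L.shape[0] - 1:
                dx = L[yy, xx + 1] - L[yy, xx - 1]
                dy = L[yy + 1, xx] - L[yy - 1, xx]
                m = np.hypot(dx, dy)                       # gradient magnitude
                theta = np.degrees(np.arctan2(dy, dx)) % 360
                hist[int(theta // (360 / num_bins))] += m  # weight bin by magnitude
    return hist.argmax() * (360 / num_bins)                # peak bin -> main direction
```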
d. Describing the feature points to obtain feature descriptors. A 21 × 21 neighborhood is determined for each feature point and rotated to the main direction; the horizontal and vertical gradients of the pixels in the neighborhood are computed, so that each feature point yields a feature descriptor of size 19 × 19 × 2 = 722. The description of a feature point includes its coordinates, scale and direction. Since the resulting descriptors are high-dimensional (722-dimensional in this embodiment), they are reduced in dimension and hashed to facilitate subsequent processing: in this embodiment principal component analysis (PCA in fig. 3) reduces each descriptor to 20 dimensions, and a locality-sensitive hash (LSH in fig. 3) maps each 20-dimensional descriptor to a single 32-bit floating-point value. The specific operation of PCA is as follows:
First, a feature matrix X is constructed from the feature data of a large number of acquired images; the eigenvalues of the matrix X are obtained and sorted by magnitude, and the eigenvectors corresponding to these eigenvalues form the transformation matrix W. Given the transformation matrix W, the feature data Y of any acquired image is projected as Z = Y·W^T, reducing the high-dimensional feature matrix Y to a low-dimensional new feature matrix Z whose new features are linearly uncorrelated.
The specific operation of LSH is as follows (an illustrative code sketch is given after these steps):
(1) Select a locality-sensitive hash function family satisfying (d1, d2, p1, p2)-sensitivity;
(2) Determine the number L of hash tables, the number K of hash functions per table, and the parameters of the sensitive hash according to the required accuracy of the search results;
(3) Hash all data into the corresponding buckets with the locality-sensitive hash functions, forming one or more hash tables.
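As a purely illustrative sketch (the covariance-based PCA and the random-projection hash are assumptions consistent with, but not identical to, the (d1, d2, p1, p2)-sensitive scheme above):

```python
import numpy as np

def fit_pca(descriptors, out_dim=20):
    """descriptors: (N, 722) array gathered from many pages; returns projection W."""
    X = descriptors - descriptors.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:out_dim]   # keep the strongest components
    return eigvecs[:, order]                       # (722, 20) transformation matrix

def lsh_value(desc_20d, planes):
    """Random-projection LSH: sign pattern of the projections packed into one 32-bit value."""
    bits = (desc_20d @ planes) > 0                 # planes: (20, 32) random matrix
    return np.float32(np.packbits(bits).view(np.uint32)[0])

# Usage sketch:
# W = fit_pca(all_descriptors)                     # offline, when building the library
# planes = np.random.randn(20, 32)
# h = lsh_value(descriptor @ W, planes)            # one value per feature point
```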
S22, continuously photographing the printed page to be learned with the camera of the learning stick to obtain the page image to be learned, extracting feature points, searching the page feature library using the extracted feature points, and matching page features to obtain the original page information of the page to be learned from the page feature library. The matching algorithm includes, but is not limited to, the Euclidean distance between feature values, the cosine similarity of feature vectors, the correlation coefficient and the like; the invention is not particularly limited in this respect. Referring to fig. 3, in this embodiment the page feature matching is achieved as follows:
The feature descriptors of the feature points extracted from the page image to be learned are reduced in dimension, hashed and sorted, and each hash value is compared with the hash values of the feature points stored in the page feature library; if the distance is smaller than a preset first threshold, the pair of feature points is regarded as matched. The distance calculation compares the hash value of a feature point with the 2L items retrieved from the page feature library (L being the number of hash tables); the distance is defined as, but not limited to, the absolute value of the difference between the two values, and the feature-point pair is judged to match when the distance is smaller than the set first threshold.
The number of matched feature points is counted, and if it is larger than a preset second threshold, the page image to be learned is regarded as matching the corresponding original page image.
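A minimal Python sketch of the matching rule (the thresholds and the brute-force lookup are assumptions; a real implementation would query the LSH buckets instead):

```python
def match_page(query_hashes, library_pages, first_thr=1e-3, second_thr=30):
    """library_pages: {page_id: list of hash values}; returns best matching page or None."""
    best_page, best_count = None, 0
    for page_id, page_hashes in library_pages.items():
        # A feature point matches if some library hash lies within the first threshold
        matched = sum(
            1 for h in query_hashes
            if any(abs(h - p) < first_thr for p in page_hashes)
        )
        if matched > best_count:
            best_page, best_count = page_id, matched
    # The page matches only if enough feature points matched (second threshold)
    return best_page if best_count > second_thr else None
```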
S3, touching a region of interest on the printed page to be learned with the selected reference object, continuously photographing with the camera of the learning stick to obtain a reference page image containing the selected reference object, the printed matter and the auxiliary pad, recognizing the selected reference object from the reference page image, and recognizing the touch action from the reference page image to obtain gesture instruction information.
In this embodiment, the selected reference object is a human finger. Of course, the selected reference object is not limited to a human finger; it may also be a pen-shaped object, an object with a light-emitting device at its tip, or the like. The touch actions involved in this step include, but are not limited to, a single click, a double click or other actions with distinct features. Different gesture instructions may be represented by different touch actions; for example, a single click triggers point-and-read playback and a double click calls up a multimedia presentation.
The finger tip recognition can be realized by the following method:
s31, converting the color space of the reference page image into a YCbCr color space. For reference page images in RGB format, the color space conversion is achieved by the following formula:
Y=0.257*R+0.564*G+0.098*B+16
Cb=-0.148*R-0.291*G+0.439*B+128
Cr=0.439*R-0.368*G-0.071*B+128。
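For illustration, a direct NumPy implementation of the conversion formulas above (the vectorized form is an assumption):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """rgb: (H, W, 3) float array with channels R, G, B; returns Y, Cb, Cr planes."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.257 * r + 0.564 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr =  0.439 * r - 0.368 * g - 0.071 * b + 128
    return y, cb, cr
```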
S32, skin color segmentation, obtaining candidate hand regions based on histogram statistics. Histograms are established for Y, Cb and Cr respectively; lower limits Y1, Cb1, Cr1 and upper limits Y2, Cb2, Cr2 are selected from the histogram peaks, and pixels satisfying Y1 < Y < Y2, Cb1 < Cb < Cb2 and Cr1 < Cr < Cr2 are judged to be candidate skin color points. The image is binarized, with non-candidate skin color points set to 0 and candidate skin color points set to 1. The binary image is dilated and connected, the largest connected region with value 1 is kept, and its area S0 is calculated together with the total image area S; if S0/S > Th1 (Th1 being a preset threshold), the connected region is judged to be a candidate hand region.
S33, hand contour recognition, judging whether the candidate hand region is a real hand region in the point-and-read state. The candidate hand region is projected vertically, the projection is divided into upper and lower parts at the average height, the length l and the height h of the lower part of the projection are calculated, and the image width w is acquired; if h/w > Th2 and h/l < Th3 are both satisfied (Th2 and Th3 being preset thresholds), the connected region is judged to be a real hand region.
S34, calculating the centroid of the region. Assuming the hand region has N pixel points with image coordinates (xi, yi), i = 1…N, the region centroid is xc = sum(xi)/N, yc = sum(yi)/N.
S35, identifying the fingertip position from the region centroid. The contour of the hand region is extracted, the distance from each contour pixel of the real hand region to the centroid is calculated, and the pixel with the largest distance is judged to be the fingertip position: (xp, yp) = argmax((xi - xc)^2 + (yi - yc)^2), i = 1…M, where (xp, yp) are the fingertip coordinates and M is the number of contour pixels.
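A compact Python sketch of steps S32, S34 and S35 (the S33 contour check is omitted; the OpenCV calls and threshold values are assumptions, not the patent's reference implementation):

```python
import cv2
import numpy as np

def fingertip_from_skin(bounds, y, cb, cr, th1=0.02):
    """bounds: ((Y1, Y2), (Cb1, Cb2), (Cr1, Cr2)) chosen from the histogram peaks."""
    (y1, y2), (cb1, cb2), (cr1, cr2) = bounds
    # S32: binarize candidate skin color points and dilate
    mask = ((y > y1) & (y < y2) & (cb > cb1) & (cb < cb2) & (cr > cr1) & (cr < cr2))
    mask = cv2.dilate(mask.astype(np.uint8), np.ones((5, 5), np.uint8))
    # Keep the largest connected region and check its relative area S0/S
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n < 2:
        return None
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    if stats[largest, cv2.CC_STAT_AREA] / mask.size <= th1:
        return None
    hand = (labels == largest).astype(np.uint8)
    # S34: region centroid; S35: fingertip = contour point farthest from the centroid
    ys, xs = np.nonzero(hand)
    xc, yc = xs.mean(), ys.mean()
    contours, _ = cv2.findContours(hand, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).reshape(-1, 2)
    d2 = (pts[:, 0] - xc) ** 2 + (pts[:, 1] - yc) ** 2
    return tuple(pts[int(np.argmax(d2))])    # (xp, yp)
```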
Fingertip recognition can also be achieved by the following method, as shown in fig. 4:
First, skin color segmentation is performed, followed by candidate-region segmentation. The minimum and maximum values xmin, ymin, xmax, ymax are found among the coordinates of the candidate hand-region pixels, the image of the rectangular region (xmin, xmax, ymin, ymax) is cropped and fed into a convolutional neural network, and the fingertip is located within the recognized finger region.
The specific process by which the convolutional neural network identifies the finger is as follows (an illustrative network sketch is given after the fingertip-positioning step below):
1. Collecting training samples: collect a large number of images containing point-and-read fingers, annotate the finger images, preprocess them, and build a training sample set;
2. Constructing the network model: build a CNN feature-extraction network and a discrimination network, where the CNN feature-extraction network consists of convolution layers, activation-function layers and pooling layers, and the discrimination network consists of a region-of-interest pooling layer, fully connected layers, activation-function layers and a Softmax layer;
3. Training the network model: initialize the CNN feature-extraction network and the discrimination network, and train the CNN feature-extraction network on the annotated finger images to obtain the CNN feature-extraction model; the discrimination network is trained on the candidate regions using the feature maps provided by the CNN feature-extraction network, yielding the discrimination model;
4. Constructing the detection model: combine the CNN feature-extraction network and the discrimination network into a detection network and train it with the finger-image training data to obtain the finger-detection and key-point-localization network model;
5. Finger detection: use the obtained detection network to detect the finger contour and locate the key points, obtaining the finger-image candidate box and the finger key-point image.
The fingertip positioning process is as follows:
After the finger has been recognized, the finger contour is extracted from the finger region obtained by skin color segmentation, and the fingertip position is (xp, yp) = argmin(yi), i = 1…M, where M is the number of contour pixels.
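Purely for illustration (the architecture, layer sizes and the key-point head are assumptions, not the patent's model), a minimal PyTorch sketch of a finger detection network with a fingertip key-point output:

```python
import torch
import torch.nn as nn

class FingerNet(nn.Module):
    """Tiny CNN: feature extraction (conv/ReLU/pool) plus a discrimination/localization head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(                 # CNN feature-extraction part
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(                     # shared fully connected part
            nn.Flatten(), nn.Linear(64 * 4 * 4, 128), nn.ReLU(),
        )
        self.cls = nn.Linear(128, 2)                   # finger / not-finger (Softmax via loss)
        self.tip = nn.Linear(128, 2)                   # normalized (xp, yp) fingertip key point

    def forward(self, x):
        h = self.head(self.features(x))
        return self.cls(h), self.tip(h)

# Usage sketch: crop the candidate rectangle (xmin, xmax, ymin, ymax), resize to 64x64,
# then: logits, tip_xy = FingerNet()(crop_tensor)   # crop_tensor: (N, 3, 64, 64)
```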
S4, calculating the position of the selected reference object on the printed matter, as recognized in step S3, based on the scale object of the auxiliary pad in the reference page image. This step specifically comprises the following sub-steps:
S41, performing coordinate mapping based on the scale object of the auxiliary pad in the reference page image, and calculating the position of the selected reference object (recognized in step S3) on the auxiliary pad.
S42, converting the coordinates of the selected reference object on the auxiliary pad into its position on the printed matter based on the mapping relation between the auxiliary pad and the printed matter.
The coordinate position of the selected reference object can be obtained by either of the following two methods:
Method 1: using the marker points on the auxiliary pad as projection-mapping points, a transformation matrix is generated and the real-time image is projectively transformed into a rectangular image; the coordinates of the selected reference object (the finger) are passed through the same transformation matrix to obtain coordinates inside the transformed rectangle, which correspond directly to the printed matter.
Method 2: using the identification points on the auxiliary pad, the distances between the coordinates of the selected reference object (the finger) and the identification points are calculated; knowing the distances from the finger coordinates to three points before the transformation, the coordinates of those three points inside the transformed rectangle, and the corresponding distances after the transformation (proportional to the distances before the transformation), the intersection point of the three circles in the transformed plane is calculated as the finger position.
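An illustrative Python sketch of method 1 (the four-corner marker layout, the pad dimensions and the use of OpenCV's perspective transform are assumptions standing in for the transformation matrix described above):

```python
import cv2
import numpy as np

def map_finger_to_pad(marker_pts_img, pad_w, pad_h, finger_xy):
    """marker_pts_img: four marker corners detected in the camera image, in pad order
    (top-left, top-right, bottom-right, bottom-left); returns (x, y) on the pad."""
    src = np.float32(marker_pts_img)
    dst = np.float32([[0, 0], [pad_w, 0], [pad_w, pad_h], [0, pad_h]])
    H = cv2.getPerspectiveTransform(src, dst)         # projection transformation matrix
    pt = np.float32([[finger_xy]])                    # shape (1, 1, 2) for perspectiveTransform
    x, y = cv2.perspectiveTransform(pt, H)[0, 0]
    return float(x), float(y)

# The pad coordinates are then converted to printed-matter coordinates using the known
# offset of the placement area, per step S42.
```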
S5, retrieving the multimedia file preset for the corresponding position based on the position information obtained in S4 and the original page information obtained in S2, selecting the playing mode of the multimedia file based on the gesture instruction information, and playing the multimedia file. The multimedia file may be an audio file, an image or a video; the invention is not particularly limited in this respect. Playback may use a display screen (an optional component) or a loudspeaker integrated in the learning stick, or the learning stick may connect to an external intelligent terminal over WIFI or Bluetooth and use that terminal's screen and loudspeaker, the WIFI and Bluetooth modules and the external terminal likewise being optional. A projection device, also optional, may be integrated in the learning stick to play the multimedia file. The original page image library, the page feature library and the multimedia files corresponding to the printed matter mentioned in step S2 may be stored in the learning stick or in the storage space of an external server.
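A minimal region-to-media lookup sketch for step S5 (illustrative only; the data layout, file names and the ffplay playback call are assumptions, and any player could be substituted):

```python
import subprocess

# Region-to-multimedia content library: per page, rectangles mapped to media files
MEDIA_LIBRARY = {
    "page_12": [((50, 80, 220, 130), "audio/page12_word1.mp3"),
                ((50, 150, 220, 200), "audio/page12_word2.mp3")],
}

def play_for_position(page_id, x, y, double_click=False):
    for (x1, y1, x2, y2), media in MEDIA_LIBRARY.get(page_id, []):
        if x1 <= x <= x2 and y1 <= y <= y2:
            # A double click could call up a richer presentation instead of plain audio
            subprocess.run(["ffplay", "-nodisp", "-autoexit", media], check=False)
            return media
    return None
```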
As can be seen from the above description, the invention realizes recognition of the point-and-read action and playback of multimedia content by means of a learning stick and an auxiliary pad. In use, the printed matter must be placed squarely in the placement area of the auxiliary pad; the learning stick photographs the printed page to obtain the original page information, the scale on the auxiliary pad assists in locating the finger, and the corresponding multimedia file is then retrieved from the original page information and the finger position. For ease of recognition, the color of the auxiliary pad should differ noticeably from that of the printed matter.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (6)

1. An intelligent reading learning method based on coordinate recognition, characterized by comprising the following steps:
S1, placing a printed matter on an auxiliary pad, wherein a placement area and a marking area are formed on the auxiliary pad, the placement area is used for placing the printed matter, the marking area surrounds the placement area, and a scale object representing position information is arranged on the marking area;
S2, photographing the printed page to be learned with a camera of a learning stick and recognizing the page to obtain original page information;
the step S2 comprising the following sub-steps:
S21, acquiring original page images of the printed matter in advance and extracting feature points to obtain a page feature library, and building in advance a region-to-multimedia content library in which specific areas on each original page of the printed matter correspond to specific multimedia files;
S22, continuously photographing the printed page to be learned with the camera of the learning stick to obtain a page image to be learned, extracting feature points, searching the page feature library using the extracted feature points, and matching page features to obtain the original page information of the page to be learned from the page feature library;
the page feature matching being realized by the following method:
performing dimension reduction, hashing and sorting on the feature descriptors of the feature points extracted from the page image to be learned, comparing each hash value with the hash values of the feature points stored in the page feature library, regarding a pair of feature points as matched if the distance is smaller than a preset first threshold, and thereby determining that the page image to be learned matches the corresponding original page image, wherein the distance calculation compares the hash value of a feature point with 2L items in the page feature library, L being the number of hash tables, the distance being defined as the absolute value of the difference between the two values, and the feature-point pair being judged to match when the distance is smaller than the preset first threshold;
S3, touching a region of interest on the printed page to be learned with a selected reference object, continuously photographing with the camera of the learning stick to obtain a reference page image containing the selected reference object, the printed matter and the auxiliary pad, and recognizing the selected reference object from the reference page image;
S31, converting a color space of a reference page image into a YCbCr color space, wherein the color space conversion is realized by the following formula aiming at the reference page image in an RGB format:
Y = 0.257*R+0.564*G+0.098*B+16
Cb = -0.148*R-0.291*G+0.439*B+128
Cr = 0.439*R-0.368*G-0.071*B+128;
S32, skin color segmentation, obtaining candidate hand regions based on histogram statistics; establishing histograms for Y, Cb and Cr respectively, selecting lower limits Y1, Cb1, Cr1 and upper limits Y2, Cb2, Cr2 from the histogram peaks, and judging pixels satisfying Y1 < Y < Y2, Cb1 < Cb < Cb2 and Cr1 < Cr < Cr2 as candidate skin color points; binarizing the image, setting non-candidate skin color points to 0 and candidate skin color points to 1; dilating and connecting the binary image, keeping the largest connected region with value 1, calculating the area S0 of the connected region and the total area S of the image, and judging the connected region as a candidate hand region if S0/S > Th1, Th1 being a preset threshold;
S33, hand contour recognition, judging whether the candidate hand region is a real hand region in the point-and-read state; projecting the candidate hand region vertically, dividing the projection into upper and lower parts at the average height, calculating the length l and the height h of the lower part of the projection, acquiring the image width w, and judging the connected region as a real hand region if h/w > Th2 and h/l < Th3, Th2 and Th3 being preset thresholds;
S34, calculating the centroid of the region; assuming the hand region has N pixel points with image coordinates (xi, yi), i = 1…N, the region centroid is xc = sum(xi)/N, yc = sum(yi)/N;
S35, identifying the fingertip position from the region centroid; extracting the contour of the hand region, calculating the distance from each contour pixel of the real hand region to the centroid, and judging the pixel with the largest distance as the fingertip position: (xp, yp) = argmax((xi - xc)^2 + (yi - yc)^2), i = 1…M, where (xp, yp) are the fingertip coordinates and M is the number of contour pixels;
S4, calculating the position of the selected reference object on the printed matter, as recognized in step S3, based on the scale object of the auxiliary pad in the reference page image;
S5, retrieving the multimedia file preset for the corresponding position based on the position information obtained in S4 and the original page information obtained in S2, and playing the multimedia file.
2. The intelligent reading learning method based on coordinate recognition as claimed in claim 1, wherein: the selected reference object is a human finger, a pen-shaped object or an object with a light-emitting device at its tip.
3. The intelligent reading learning method based on coordinate recognition as claimed in claim 1, wherein step S3 further comprises recognizing the touch action from the reference page image to obtain gesture command information, and step S5 further comprises selecting the playing mode of the multimedia file based on the gesture command information.
4. The intelligent reading learning method based on coordinate recognition as claimed in claim 1, wherein: the step S4 includes the following sub-steps:
S41, carrying out coordinate mapping based on the scale object of the auxiliary pad on the reference page image, and calculating the position information of the selected reference object on the auxiliary pad, which is identified in the step S3;
S42, converting coordinate values of the selected reference object on the auxiliary pad into position information of the selected reference object on the printed matter based on the mapping relation between the auxiliary pad and the printed matter.
5. The intelligent reading learning method based on coordinate recognition as claimed in claim 1, wherein: the feature point extraction is realized by the following method:
graying the image;
extracting characteristic points by using a key point detection algorithm;
performing feature point direction identification based on histogram statistics;
and describing the feature points to obtain feature descriptors.
6. The intelligent reading learning method based on coordinate recognition as claimed in claim 3, wherein: the touch action of the selected reference object in step S3 includes a single click and a double click.
CN201911253273.6A 2019-12-09 2019-12-09 Intelligent reading learning method based on coordinate recognition Active CN110991371B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911253273.6A CN110991371B (en) 2019-12-09 2019-12-09 Intelligent reading learning method based on coordinate recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911253273.6A CN110991371B (en) 2019-12-09 2019-12-09 Intelligent reading learning method based on coordinate recognition

Publications (2)

Publication Number Publication Date
CN110991371A CN110991371A (en) 2020-04-10
CN110991371B true CN110991371B (en) 2024-08-09

Family

ID=70091463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911253273.6A Active CN110991371B (en) 2019-12-09 2019-12-09 Intelligent reading learning method based on coordinate recognition

Country Status (1)

Country Link
CN (1) CN110991371B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539405B (en) * 2020-04-16 2023-05-30 安徽淘云科技股份有限公司 Auxiliary reading method, auxiliary reading device, electronic equipment and storage medium
CN113676654B (en) * 2020-05-14 2023-06-06 武汉Tcl集团工业研究院有限公司 Image capturing method, device, equipment and computer readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509136A (en) * 2018-04-12 2018-09-07 山东音为爱智能科技有限公司 A kind of children based on artificial intelligence paint this aid reading method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111612A (en) * 2019-04-11 2019-08-09 深圳市学之友科技有限公司 A kind of photo taking type reading method, system and point read equipment
CN110060524A (en) * 2019-04-30 2019-07-26 广东小天才科技有限公司 Robot-assisted reading method and reading robot

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509136A (en) * 2018-04-12 2018-09-07 山东音为爱智能科技有限公司 A kind of children based on artificial intelligence paint this aid reading method

Also Published As

Publication number Publication date
CN110991371A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN111259889A (en) Image text recognition method and device, computer equipment and computer storage medium
CN111291629A (en) Recognition method, device, computer equipment and computer storage medium of text in image
WO2019061658A1 (en) Method and device for positioning eyeglass, and storage medium
CN110569818A (en) intelligent reading learning method
WO2019033571A1 (en) Facial feature point detection method, apparatus and storage medium
CN110458158B (en) Text detection and identification method for assisting reading of blind people
CN113252614B (en) Transparency detection method based on machine vision
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
WO2019033570A1 (en) Lip movement analysis method, apparatus and storage medium
CN112634125B (en) Automatic face replacement method based on off-line face database
WO2019033568A1 (en) Lip movement capturing method, apparatus and storage medium
CN109947273B (en) A kind of point reading positioning method and device
US9082184B2 (en) Note recognition and management using multi-color channel non-marker detection
CN110555435B (en) Point-reading interaction realization method
WO2023024766A1 (en) Object size identification method, readable storage medium and object size identification system
CN110991371B (en) Intelligent reading learning method based on coordinate recognition
CN114240981A (en) Mark recognition method and device
CN113780116A (en) Invoice classification method, apparatus, computer equipment and storage medium
CN109977834A (en) The method and apparatus divided manpower from depth image and interact object
Sruthi et al. Double-handed dynamic gesture recognition using contour-based hand tracking and maximum mean probability ensembling (MMPE) for Indian Sign Language
WO2019071476A1 (en) Express information input method and system based on intelligent terminal
CN115457585A (en) Processing method and device for homework correction, computer equipment and readable storage medium
JP2008067321A (en) Data registration management device
Thongtaweechaikij et al. Text extraction by optical character recognition-based on the template card
CN110765997B (en) Interactive reading realization method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240709

Address after: 1406, 14th Floor, Building 2, No.1 Courtyard, Shangdi 10th Street, Haidian District, Beijing, 100080

Applicant after: Beijing Anxin Zhitong Technology Co.,Ltd.

Country or region after: China

Address before: Room 403, C4, building 2, software industry base, No. 87, 89, 91, South 10th Road, Gaoxin, Binhai community, Yuehai street, Nanshan District, Shenzhen, Guangdong 518000

Applicant before: Shenzhen yikuai Interactive Network Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant