
CN118334673A - AR-based library book introduction intelligent reading method and system - Google Patents


Info

Publication number
CN118334673A
CN118334673A
Authority
CN
China
Prior art keywords
book
target
searching
pixel coordinates
tag search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410438194.7A
Other languages
Chinese (zh)
Other versions
CN118334673B (en)
Inventor
刘鹏程
吴文珏
王楠
胡婧莹
万凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Engineering Vocational College Hubei Mechanical Industry School Huangshi Senior Technical School
Original Assignee
Hubei Engineering Vocational College Hubei Mechanical Industry School Huangshi Senior Technical School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Engineering Vocational College Hubei Mechanical Industry School Huangshi Senior Technical School
Priority to CN202410438194.7A
Publication of CN118334673A
Application granted
Publication of CN118334673B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/15Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/1801Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V30/18019Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
    • G06V30/18038Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
    • G06V30/18048Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
    • G06V30/18057Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/18105Extraction of features or characteristics of the image related to colour
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/15Processing image signals for colour aspects of image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/332Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0077Colour aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of AR-based intelligent reading of library book introductions, and in particular to an AR-based intelligent reading method and system for library book introductions. The method acquires a current frame image of a target area shot by the AR camera on a head-mounted AR device; processes the current frame image and establishes a tag search box for the call number label of each book; identifies the call number in the target tag search box to obtain the target call number, retrieves the corresponding target book introduction from a book database according to the target call number, and writes the introduction at a preset position in the current frame image. In this way, intelligent reading of library book introductions is realized solely through the AR camera on the head-mounted AR device, and a reader can read a book's introduction without taking the book off the shelf, which greatly improves the intelligence and usability of the invention and greatly expands its application scenarios.

Description

AR-based library book introduction intelligent reading method and system
Technical Field
The invention relates to the technical field of AR-based intelligent reading of library book introductions, and in particular to an AR-based intelligent reading method and system for library book introductions.
Background
Because a library holds a large number of books, most libraries use call number labels to improve retrieval efficiency. However, when browsing for books of interest, readers still need to take each candidate book off the shelf in turn, read its introduction or skim it, and then put it back on the shelf.
Thus, the prior art is still to be further developed.
Disclosure of Invention
The invention aims to overcome the above technical defects by providing an AR-based intelligent reading method and system for library book introductions that solve the problems existing in the prior art.
To achieve the above technical object, according to a first aspect of the present invention, there is provided an AR-based intelligent reading method for library book introduction, the method comprising:
S100, acquiring a current frame image of a target area shot by an AR camera on a head-mounted AR device; processing the current frame image, extracting the boundary of the call number label of each book in the current frame image, acquiring the pixel coordinates of the upper-left and lower-right corners of each boundary, and establishing a tag search box for each book's call number label from these corner coordinates;
S200, calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between each center point and preset pixel coordinates, and marking the tag search box with the minimum distance as the target tag search box;
S300, identifying the call number in the target tag search box to obtain the target call number corresponding to the target tag search box, retrieving the corresponding target book introduction from the book database according to the target call number, and writing the target book introduction at a preset position in the current frame image.
Specifically, processing the current frame image includes:
sequentially performing RGB channel extraction, color-feature-based threshold segmentation, morphological processing, and height-based region screening on the current frame image to obtain a target image containing only the border regions of the call number labels; then performing edge processing on the target image and determining the call number label areas from the boundary points to obtain a call number label image.
Specifically, extracting the boundaries of the call number labels of all books in the current frame image and establishing the tag search boxes from those boundaries includes:
acquiring the minimum bounding rectangle of each call number label in the call number label image and taking the minimum bounding rectangle corresponding to each label as that label's boundary.
Specifically, calculating the pixel coordinates of the center point of each tag search box includes:
connecting the pixels at the upper-left and lower-right corners of each boundary to form a line segment, calculating the pixel coordinates of the midpoint of each segment, and taking that midpoint as the center point of the corresponding tag search box.
Specifically, identifying the call number in the target tag search box to obtain the target call number includes:
recognizing the text content in the target tag search box with a CRNN network and taking the content recognized by the CRNN network as the call number in the target tag search box.
Specifically, recognizing the text content in the target tag search box with the CRNN network includes:
first learning text features with convolutional layers, then feeding the convolved features into a bidirectional long short-term memory (BiLSTM) network to learn the sequence features of the characters, and finally de-duplicating the recognized text through a transcription layer to output the final prediction result.
Specifically, the method further comprises:
outputting to the user a voice interaction signal asking whether to lock onto the current book, acquiring the user's spoken reply, and deciding accordingly whether to stop calculating the Euclidean distances between the center points of the tag search boxes and the preset pixel coordinates and to write the target book introduction of the current video frame image into subsequent frame images.
Specifically, the method further comprises:
if the user's reply is "yes", stopping the calculation of the Euclidean distances between the center points of the tag search boxes and the preset pixel coordinates, and writing the target book introduction of the current video frame image into subsequent frame images;
if the user's reply is "no", continuing to calculate the Euclidean distance between the center point of each tag search box and the preset pixel coordinates, marking the tag search box with the minimum distance as the target tag search box, identifying the call number in the target tag search box to obtain the target call number, retrieving the corresponding target book introduction from the book database according to the target call number, and writing it at the preset position in the current frame image.
According to a second aspect of the present invention, there is provided an AR-based intelligent library book profile reading system comprising:
The acquisition module, which comprises the AR camera on the head-mounted AR device, is used for shooting a current frame image of a target area;
The control module is used for: processing the current frame image, extracting the boundary of each book's call number label, acquiring the pixel coordinates of the upper-left and lower-right corners of each boundary, and establishing the tag search boxes from these corner coordinates; calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between each center point and the preset pixel coordinates, and marking the tag search box with the minimum distance as the target tag search box; and identifying the call number in the target tag search box to obtain the target call number, retrieving the corresponding target book introduction from the book database according to the target call number, and writing it at the preset position in the current frame image.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, implement the AR-based intelligent reading method for library book introductions described above.
The beneficial effects are that:
According to the invention, intelligent reading of library book introductions is realized solely through the AR camera on a head-mounted AR device. A reader can read a book's introduction without taking the book off the shelf and without complicated algorithmic modeling, which saves readers considerable time and avoids the problem of books being returned to inaccurate shelf positions and hindering subsequent readers. This greatly improves the intelligence and usability of the invention and greatly expands its application scenarios.
Drawings
FIG. 1 is a flow chart of the AR-based intelligent reading method for library book introductions provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the components of the AR-based intelligent library book introduction reading system in accordance with an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the technical solution of the present application, it is described below clearly and completely with reference to the accompanying drawings. Based on the embodiments of the present application, other similar embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present application. In addition, directional words such as "upper", "lower", "left", and "right" used in the following embodiments refer only to directions in the drawings; they are intended to illustrate, not to limit, the application.
The invention will be further described with reference to the drawings and preferred embodiments.
Referring to FIG. 1, the invention provides an AR-based intelligent reading method for library book introductions, comprising the following steps:
S100, acquiring a current frame image of a target area shot by an AR camera on a head-mounted AR device; processing the current frame image, extracting the boundary of the call number label of each book in the current frame image, acquiring the pixel coordinates of the upper-left and lower-right corners of each boundary, and establishing a tag search box for each book's call number label from these corner coordinates.
Here, before step S100, the method includes:
establishing a book database containing a plurality of call numbers and the text of the book introduction corresponding to each call number.
Here, before step S100, the method further includes:
presetting the preset pixel coordinates and the preset position in the control module.
It can be understood that the preset pixel coordinates and the preset position can be set according to the actual needs of the user; the invention does not limit their specific values, as long as they are suitable for the AR-based intelligent reading method for library book introductions.
Preferably, the preset pixel coordinates are set to the coordinates of the center point of the current frame image, and the preset position is set so that the lower-right corner of the target tag search box serves as the upper-left corner of the text of the target book introduction, at which point the text is inserted. Through extensive experiments, the inventors found that displaying only one book introduction at a time, namely the one corresponding to the target tag search box closest to the center of the current frame image, and inserting it so that it does not occlude the target tag search box, greatly optimizes the display effect, further improves the intelligence, reliability, and usability of the invention, and greatly optimizes the user experience.
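The placement rule above can be sketched as follows. This is a minimal illustration only: the representation of a box as (top-left, bottom-right) pixel coordinate pairs and the function name are assumptions, not taken from the patent.

```python
def intro_anchor(target_box):
    """Return the pixel coordinate at which the book introduction text
    is inserted: the lower-right corner of the target tag search box is
    used as the upper-left corner of the text, so the inserted
    introduction never occludes the box itself."""
    (_, _), (x_br, y_br) = target_box  # ((x_tl, y_tl), (x_br, y_br))
    return (x_br, y_br)
```

With a target box whose corners are (10, 20) and (60, 200), the introduction text would be anchored at (60, 200).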
Specifically, processing the current frame image includes:
sequentially performing RGB channel extraction, color-feature-based threshold segmentation, morphological processing, and height-based region screening on the current frame image to obtain a target image containing only the border regions of the call number labels; then performing edge processing on the target image and determining the call number label areas from the boundary points to obtain a call number label image.
Here, according to actual shooting conditions, the collected book images were found to have the following features:
(1) In a captured video or single image, the number of books is generally 14-29, most often about 18;
(2) A call number label consists of black characters on a white background. To make the label's position on the spine more conspicuous, it is generally surrounded by a border of a certain width in another color. The specifications and colors of these borders are currently not unified, with red borders being the most common;
(3) There are three common ways of attaching the call number label to the spine, each with its own advantages and disadvantages, and currently no unified standard exists.
Given these characteristics of call number labels, the invention extracts the boundaries of the call number labels of all books based on labels with red borders. The specific steps for acquiring the call number label borders are as follows:
S110, acquiring the current frame image of the target area shot by the AR camera on the head-mounted AR device, and extracting its red (R), green (G), and blue (B) components to obtain the component images IR, IG, and IB.
S120, according to the principle of color rendering, for a pixel to appear red the value of its red component is generally large: it must be much larger than the values of the other two components, and there is a certain correlation among the three color components. Through analysis of a series of images, it was determined that the value of the red component must reach at least half of the maximum gray value, namely 127. Assuming the pixel at some point in the image appears red, the relationship of the components is as follows:

IR(x, y) > 127, IR(x, y) = m1 * IG(x, y), IR(x, y) = m2 * IB(x, y)

where m1 and m2 are the correlation coefficients of the R component to the G and B components respectively, and IR(x, y), IG(x, y), IB(x, y) are the gray values of the images IR, IG, IB at position (x, y).
Analyzing the border colors of call number labels under various conditions, including both intact and differently aged states, yields the distribution ranges m1 ∈ [2, 6] and m2 ∈ [2.19, 5.44].
S130, judging the pixels of each component image IR, IG, IB one by one: if the gray values at some point satisfy

IR(x, y) > 127, IR(x, y) ≥ m1min * IG(x, y), IR(x, y) ≥ m2min * IB(x, y),

the pixel belongs to a suspected call number label border and its value is set to 1; otherwise its value is set to 0. That is, each pixel of the target image containing only the call number label border regions is obtained as:

I(x, y) = 1 if the above conditions hold, and I(x, y) = 0 otherwise,

where m1min and m2min are the minimum values of m1 and m2 respectively, and (x, y) denotes any pixel in the image.
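The S120/S130 thresholding rule can be sketched as follows, a minimal illustration assuming the component images are given as nested lists of gray values (the function and parameter names are assumptions; the thresholds 127, m1min = 2, and m2min = 2.19 follow the text above):

```python
def red_border_mask(ir, ig, ib, m1_min=2.0, m2_min=2.19):
    """Binarize the frame: a pixel is marked 1 (suspected call number
    label border) when its red component exceeds half the maximum gray
    value (127) and dominates the green and blue components by at least
    the factors m1_min and m2_min respectively; otherwise it is 0."""
    rows, cols = len(ir), len(ir[0])
    mask = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            r, g, b = ir[x][y], ig[x][y], ib[x][y]
            if r > 127 and r >= m1_min * g and r >= m2_min * b:
                mask[x][y] = 1
    return mask
```

For example, a pixel with components (200, 50, 40) satisfies all three conditions and is kept, while (200, 150, 40) fails the red-to-green dominance test and is discarded.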
S140, performing region filling by morphological processing on the target image obtained in step S130 that contains only the call number label border regions, removing the influence of small spurious regions caused by noise.
S150, because partial red areas may exist on a book spine, and such areas are generally less stable in extent and far taller than the call number label border, the regions are screened by height. The number of pixels Hn corresponding to the label border height may differ at different shooting distances and angles, but once these are fixed the value is essentially stable and can be measured once on a first test image. In this experiment, the value of Hn was about 10 pixels.
Specifically, the height-based region screening procedure is as follows:
(1) Identifying the connected domains according to the eight-connectivity criterion and ordering them;
(2) Judging the connected domains in order: first obtain the height of the connected domain and compare it with the determined threshold Hn; if the height is less than or equal to Hn, all pixel values in the connected domain are kept unchanged; if the height is greater than Hn, all pixel values in the connected domain are set to 0;
(3) After all connected domains have been judged, the boundary image of the call number labels is obtained.
(4) Performing edge processing on the boundary image of the call number labels and determining the call number label areas from the boundary points, i.e., separating the labels from the background and the spines, to obtain the call number label image; finally, acquiring the minimum bounding rectangle of each label to obtain the pixel coordinates of the upper-left and lower-right corners of each call number label in the current frame image.
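The height-based screening in steps (1)-(3) above can be sketched as follows. This is a minimal illustration that assumes the eight-connected labeling has already been done (the connected-domain labeling itself is omitted), with each domain given as a list of (row, col) pixel coordinates; the function and parameter names are assumptions.

```python
def screen_by_height(mask, components, hn=10):
    """Height-based region screening: zero out every connected domain
    whose pixel height exceeds the threshold Hn, keeping only regions
    whose height is compatible with a call number label border."""
    for pixels in components:
        rows = [r for r, _ in pixels]
        height = max(rows) - min(rows) + 1  # domain height in pixels
        if height > hn:                     # taller than a label border
            for r, c in pixels:
                mask[r][c] = 0              # discard the whole domain
    return mask
```

With Hn = 2, a three-pixel-tall red streak on the spine would be removed while a one-pixel-tall border fragment is retained.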
Specifically, the call number label is acquired as follows. Edge extraction is performed on the boundary image of the call number labels: the values of the boundary points are retained and all other gray values are set to 0. After the processing of the preceding steps, the upper and lower borders of each label's border edge image remain; since each border has an upper and a lower boundary and only boundary-point values are kept during edge extraction, each column of an intact call number label area contains exactly four non-zero values. Accordingly, the label area is recovered column by column with the following specific steps:
First, three one-dimensional arrays named A0, Ab, and Ae are created. The edge image is scanned column by column: the number of non-zero points in each column is stored in A0, the row coordinate of the column's first non-zero point in Ab, and the row coordinate of its last non-zero point in Ae. Each column j is then judged in turn: if A0(j) = 4, the gray values of all pixels from the first to the last non-zero point of the column are set to 1, where A0(j) is the number of non-zero points in column j, Ab(j) is the row coordinate of its first non-zero point, and Ae(j) is the row coordinate of its last non-zero point; if A0(j) ≠ 4, the non-zero boundary points in the column are set to zero. After all columns have been judged, the boundary image of the call number labels is obtained.
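The column-wise judgment above can be sketched as follows, a minimal illustration on a binary edge image stored as nested lists (the function name is an assumption; A0, Ab, and Ae appear as the per-column non-zero count and the first/last non-zero rows):

```python
def fill_label_columns(edge):
    """For each column j: count the non-zero points (A0(j)) and find
    the rows of the first (Ab(j)) and last (Ae(j)) non-zero points.
    If exactly four non-zero points are present, the column crosses an
    intact label (two borders, each with upper and lower boundary), so
    the whole span Ab(j)..Ae(j) is filled with 1; otherwise the
    column's boundary points are spurious and are zeroed."""
    rows, cols = len(edge), len(edge[0])
    for j in range(cols):
        nz = [i for i in range(rows) if edge[i][j] != 0]
        if len(nz) == 4:                      # A0(j) == 4
            for i in range(nz[0], nz[-1] + 1):  # Ab(j) .. Ae(j)
                edge[i][j] = 1
        else:
            for i in nz:
                edge[i][j] = 0
    return edge
```

A column containing four non-zero rows is filled solid from its first to its last non-zero row, while a column with any other count is cleared entirely.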
Specifically, extracting the boundaries of the call number labels of all books in the current frame image and establishing the tag search boxes from those boundaries includes:
acquiring the minimum bounding rectangle of each call number label in the call number label image and taking the minimum bounding rectangle corresponding to each label as that label's boundary.
S200, calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between each center point and the preset pixel coordinates, and marking the tag search box with the minimum distance as the target tag search box.
Specifically, calculating the pixel coordinates of the center point of each tag search box includes:
connecting the pixels at the upper-left and lower-right corners of each boundary to form a line segment, calculating the pixel coordinates of the midpoint of each segment, and taking that midpoint as the center point of the corresponding tag search box.
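Step S200 can be sketched as follows, a minimal illustration assuming each search box is given as ((x_tl, y_tl), (x_br, y_br)) corner coordinates and the preset pixel coordinates are typically the frame center (the function name is an assumption):

```python
from math import hypot

def select_target_box(boxes, preset):
    """S200: the center of each tag search box is the midpoint of the
    diagonal from its upper-left to its lower-right corner; the box
    whose center has the smallest Euclidean distance to the preset
    pixel coordinates is marked as the target tag search box."""
    def centre(box):
        (x1, y1), (x2, y2) = box
        return ((x1 + x2) / 2, (y1 + y2) / 2)

    return min(boxes, key=lambda b: hypot(centre(b)[0] - preset[0],
                                          centre(b)[1] - preset[1]))
```

For a preset of (100, 100), a box centered at (100, 100) wins over one centered at (5, 5), matching the preferred setting in which the label nearest the frame center is the one whose introduction is shown.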
S300, identifying the call number in the target tag search box to obtain the target call number corresponding to the target tag search box, retrieving the corresponding target book introduction from the book database according to the target call number, and writing the target book introduction at the preset position in the current frame image.
Specifically, identifying the call number in the target tag search box to obtain the target call number includes:
recognizing the text content in the target tag search box with the CRNN network and taking the content recognized by the CRNN network as the call number in the target tag search box.
Specifically, recognizing the text content in the target tag search box with the CRNN network includes:
first learning text features with convolutional layers, then feeding the convolved features into a bidirectional long short-term memory network to learn the sequence features of the characters, and finally de-duplicating the recognized text through a transcription layer to output the final prediction result.
It should be noted that a CRNN (Convolutional Recurrent Neural Network) is used to recognize the text content detected in the previous step. This network combines convolutional and recurrent structure and addresses image-based sequence recognition, in particular scene text recognition. The content recognized by the CRNN is the call number. The network first learns text features with convolutional layers and then feeds the convolved features into a bidirectional long short-term memory network to learn the sequence features of the characters. The bidirectional LSTM exploits contextual information rather than predicting each character in isolation, and by combining the full context it recognizes the predicted text more accurately. Finally, the recognized text is de-duplicated and otherwise processed by a transcription layer, which outputs the final prediction result. The network recognizes the content detected in the previous step and outputs it as text information, including each book's call number and shelf position information.
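The transcription layer's de-duplication can be illustrated with a CTC-style decoding sketch. The patent does not specify the exact transcription rule, so the blank symbol "-" and the function name are illustrative assumptions; the sketch only shows the collapse of per-timestep outputs into final text.

```python
def transcribe(raw, blank="-"):
    """Collapse the per-timestep character outputs of the recurrent
    layer into the final text, as a CTC-style transcription layer
    does: merge consecutive repeated characters, then drop blanks."""
    out = []
    prev = None
    for ch in raw:
        if ch != prev and ch != blank:  # new, non-blank symbol
            out.append(ch)
        prev = ch
    return "".join(out)
```

For instance, the raw sequence "TT-PP-33991" collapses to the call number text "TP391"; a blank between two identical symbols ("aa-a") preserves the genuine repetition ("aa").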
It should be noted that the present invention uses a CRNN network for text recognition, but text recognition networks are by no means limited to this; a similar effect can be achieved with other text recognition networks, such as DTRN (Deep-Text Recurrent Network), since the essence in every case is to recognize the detected content.
It can be appreciated that call number extraction and recognition are both prior art, and are not described in detail herein.
Specifically, the method further comprises the following steps:
outputting to the user a voice interaction signal asking whether to lock onto the current book, acquiring the user's interactive voice, and, according to the user's interactive voice, deciding whether to stop calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates and to write the target book profile of the current video frame image into subsequent frame images.
Specifically, the method further comprises the following steps:
if the user's interactive voice is "yes", stopping calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and writing the target book profile of the current video frame image into the subsequent frame images;
if the user's interactive voice is "no", calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
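The distance-based selection described in these steps can be sketched as follows; the preset pixel coordinate (here assumed to be the center of a 640×480 AR view) and the example boxes are illustrative assumptions, not values fixed by the invention:

```python
import math

def center(box):
    """Center point of a tag search box given as ((x1, y1), (x2, y2)) corners."""
    (x1, y1), (x2, y2) = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def pick_target_box(boxes, preset=(320, 240)):
    """Return the box whose center is nearest (Euclidean distance) to the preset pixel."""
    def dist(box):
        cx, cy = center(box)
        return math.hypot(cx - preset[0], cy - preset[1])
    return min(boxes, key=dist)

boxes = [((10, 20), (60, 50)),      # center (35, 35)
         ((300, 220), (340, 260)),  # center (320, 240) -- exactly the preset pixel
         ((500, 400), (560, 440))]  # center (530, 420)
print(pick_target_box(boxes))  # -> ((300, 220), (340, 260))
```

When the user answers "yes", a caller would simply skip this selection and keep reusing the previously retrieved book profile for subsequent frames.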
Through the above distinguishing technical features, the invention allows the same target book profile to be displayed continuously according to the user's needs, or to be continuously updated as the AR glasses turn left and right, which makes it convenient for readers to read book profiles and further improves the intelligence and usability of the invention.
It can be understood that intelligent reading of library book profiles can be realized with only the AR camera on the head-mounted AR device: a reader can read a book profile without taking the book off the bookshelf, and no complex algorithmic modeling is needed. This saves readers a great deal of time, avoids the problem that a book put back in an inaccurate position hinders subsequent readers from finding it, greatly improves the intelligence and usability of the invention, and greatly expands its application scenarios.
Referring to fig. 2, another embodiment of the present invention is provided, and the present embodiment provides an AR-based intelligent reading system for library book profiles, including:
an acquisition module 100, including an AR camera on a head-mounted AR device, for capturing a current frame image of a target area;
the control module 200 is used for: processing the current frame image, extracting the boundaries of the book searching labels of all books in the current frame image, acquiring the pixel coordinates of the upper left corner and the lower right corner of each boundary, and establishing a tag search box for the book searching label of each book according to those pixel coordinates; calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; and identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
It should be noted that intelligent reading of library book profiles can be realized with only the AR camera on the head-mounted AR device: a reader can read a book profile without taking the book off the bookshelf, and no complex algorithmic modeling is needed. This saves readers a great deal of time, avoids the problem that a book put back in an inaccurate position hinders subsequent readers from finding it, greatly improves the intelligence and usability of the system, and greatly expands its application scenarios.
In a preferred embodiment, the present application also provides an electronic device, including:
a memory; and a processor, wherein the memory stores computer readable instructions which, when executed by the processor, implement the AR-based library book profile intelligent reading method. The computer device may broadly be a server, a terminal, or any other electronic device having the necessary computing and/or processing capabilities. In one embodiment, the computer device may include a processor, a memory, a network interface, a communication interface, and the like, connected by a system bus. The processor provides the necessary computing, processing and/or control capabilities. The memory may include a non-volatile storage medium and an internal memory; the non-volatile storage medium may store an operating system, computer programs, and the like, and the internal memory provides an environment for running the operating system and computer programs in the non-volatile storage medium. The network interface and the communication interface may be used to connect and communicate with external devices via a network. The computer program, when executed by the processor, performs the steps of the method of the invention.
The present invention may also be implemented as a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the steps of the method of the embodiments of the present invention to be performed. In one embodiment, the computer program may be distributed over a plurality of computer devices or processors coupled by a network, so that it is stored, accessed, and executed in a distributed fashion. A single method step/operation, or two or more method steps/operations, may be performed by a single computer device or processor or by two or more computer devices or processors; likewise, one or more method steps/operations may be performed by some computer devices or processors while one or more other method steps/operations are performed by others.
Those of ordinary skill in the art will appreciate that the method steps of the present invention may be implemented by a computer program, which may be stored on a non-transitory computer readable storage medium, to instruct related hardware such as a computer device or a processor, which when executed causes the steps of the present invention to be performed. Any reference herein to memory, storage, database, or other medium may include non-volatile and/or volatile memory, as the case may be. Examples of nonvolatile memory include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), flash memory, magnetic tape, floppy disk, magneto-optical data storage, hard disk, solid state disk, and the like. Examples of volatile memory include Random Access Memory (RAM), external cache memory, and the like.
It can be understood that intelligent reading of library book profiles can be realized with only the AR camera on the head-mounted AR device: a reader can read a book profile without taking the book off the bookshelf, and no complex algorithmic modeling is needed. This saves readers a great deal of time, avoids the problem that a book put back in an inaccurate position hinders subsequent readers from finding it, greatly improves the intelligence and usability of the invention, and greatly expands its application scenarios.
The technical features described above may be combined arbitrarily. Although not all possible combinations of features are described, any combination of these features should be considered to be covered by this description, provided the combination is not self-contradictory.
The above-described embodiments of the present invention do not limit the scope of the present invention. Any other corresponding changes and modifications made in accordance with the technical idea of the present invention shall be included in the scope of the claims of the present invention.

Claims (10)

1. An intelligent library book introduction reading method based on AR, which is characterized by comprising the following steps:
S100, acquiring a current frame image of a target area shot by an AR camera on the head-mounted AR equipment; processing the current frame image, extracting boundaries of book searching labels of all books in the current frame image, acquiring pixel coordinates of the upper left corner and the lower right corner of each boundary, and establishing a label searching frame of the book searching labels of all books according to the pixel coordinates of the upper left corner and the lower right corner of each boundary;
S200, calculating pixel coordinates of a central point of each tag search frame, calculating Euclidean distances between the pixel coordinates of the central point of each tag search frame and preset pixel coordinates, and marking the tag search frame corresponding to the minimum value in the Euclidean distances as a target tag search frame;
S300, identifying the book searching number in the target tag searching frame, obtaining a target book searching number corresponding to the target tag searching frame, calling a target book brief introduction corresponding to the book searching number in the book database according to the target book searching number, and writing the target book brief introduction into a preset position in the current frame image.
2. The AR-based intelligent reading method for library book profiles according to claim 1, wherein said processing the current frame image comprises:
extracting each RGB color channel, threshold segmentation based on color characteristics, morphological processing and region screening based on height are sequentially carried out on the current frame image, and a target image only containing a frame region of a book label is obtained; and carrying out edge processing on the target image, and judging and determining a book searching label area through boundary points to obtain a book searching label image.
3. The intelligent reading method for the library book profile based on the AR according to claim 2, wherein the extracting the boundaries of the book-searching tags of all books in the current frame image, and establishing the tag search box of the book-searching tag of each book according to the boundaries of the book-searching tags of all books, comprises:
acquiring the minimum circumscribed rectangle of each book searching label in the book searching label image, and taking the minimum circumscribed rectangle corresponding to each book searching label as the boundary of that book searching label.
4. The AR-based intelligent reading method for library book profiles of claim 3, wherein said calculating pixel coordinates of a center point of each tag search box comprises:
and sequentially connecting pixels corresponding to the upper left corner and the lower right corner of each boundary to form line segments, calculating the pixel coordinates of the midpoints of each line segment, and taking the pixel coordinates of the midpoints of each line segment as the pixel coordinates of the central point of each tag search box.
5. The intelligent reading method for the library book introduction based on the AR of claim 4, wherein the identifying the index number in the target tag search box to obtain the target index number corresponding to the target tag search box comprises:
identifying the text content in the target tag search box by using a CRNN network, and taking the content identified by the CRNN network as the book searching number in the target tag search box.
6. The AR-based intelligent reading method of library book profiles of claim 5, wherein said identifying text content in a target tag search box using a CRNN network comprises:
first, convolution layers are used to learn the text features; then the convolved features are input into a bidirectional long short-term memory network to learn the sequence features of the characters; finally, the recognized text content is de-duplicated by a transcription layer to output the final prediction result.
7. The AR-based library book profile intelligent reading method of claim 6, further comprising:
outputting to the user a voice interaction signal asking whether to lock onto the current book, acquiring the user's interactive voice, and, according to the user's interactive voice, deciding whether to stop calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates and to write the target book profile of the current video frame image into subsequent frame images.
8. The AR-based library book profile intelligent reading method of claim 7, further comprising:
if the user's interactive voice is "yes", stopping calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and writing the target book profile of the current video frame image into the subsequent frame images;
if the user's interactive voice is "no", calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
9. An AR-based intelligent library book profile reading system, comprising:
The acquisition module comprises an AR camera on the head-mounted AR equipment and is used for shooting a current frame image of a target area;
the control module is used for: processing the current frame image, extracting the boundaries of the book searching labels of all books in the current frame image, acquiring the pixel coordinates of the upper left corner and the lower right corner of each boundary, and establishing a tag search box for the book searching label of each book according to those pixel coordinates; calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; and identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
10. An electronic device, comprising:
A memory; and a processor having stored thereon computer readable instructions which when executed by the processor implement the AR-based library book profile intelligent reading method of any one of claims 1 to 8.
CN202410438194.7A 2024-04-12 2024-04-12 AR-based library book introduction intelligent reading method and system Active CN118334673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410438194.7A CN118334673B (en) 2024-04-12 2024-04-12 AR-based library book introduction intelligent reading method and system


Publications (2)

Publication Number Publication Date
CN118334673A true CN118334673A (en) 2024-07-12
CN118334673B CN118334673B (en) 2024-10-08

Family

ID=91767358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410438194.7A Active CN118334673B (en) 2024-04-12 2024-04-12 AR-based library book introduction intelligent reading method and system

Country Status (1)

Country Link
CN (1) CN118334673B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832826A (en) * 2020-07-16 2020-10-27 北京悉见科技有限公司 Augmented reality-based library management method, device and storage medium
US10902395B1 (en) * 2017-07-11 2021-01-26 Massachusetts Mutual Life Insurance Company Intelligent e-book reader incorporating augmented reality or virtual reality
CN114267042A (en) * 2021-12-27 2022-04-01 北京邮电大学 A book inventory method and system based on target detection and OCR technology
CN114882483A (en) * 2022-04-01 2022-08-09 南京大学 Book checking method based on computer vision


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG Qiang: "Research and Implementation of a Library Personalized Service System Based on Augmented Reality", CNKI, 31 December 2015 (2015-12-31) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant