
CN118334673A - AR-based library book introduction intelligent reading method and system - Google Patents


Info

Publication number
CN118334673A
CN118334673A
Authority
CN
China
Prior art keywords
book
target
searching
pixel coordinates
tag search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410438194.7A
Other languages
Chinese (zh)
Other versions
CN118334673B (en)
Inventor
刘鹏程
吴文珏
王楠
胡婧莹
万凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Engineering Vocational College Hubei Mechanical Industry School Huangshi Senior Technical School
Original Assignee
Hubei Engineering Vocational College Hubei Mechanical Industry School Huangshi Senior Technical School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Engineering Vocational College Hubei Mechanical Industry School Huangshi Senior Technical School
Priority to CN202410438194.7A
Publication of CN118334673A
Application granted
Publication of CN118334673B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/532Query formulation, e.g. graphical querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/15Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/1801Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V30/18019Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
    • G06V30/18038Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
    • G06V30/18048Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
    • G06V30/18057Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/18105Extraction of features or characteristics of the image related to colour
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/15Processing image signals for colour aspects of image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/30Image reproducers
    • H04N13/332Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074Stereoscopic image analysis
    • H04N2013/0077Colour aspects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of AR-based intelligent reading of library book introductions, and in particular to an AR-based intelligent reading method and system for library book introductions. The method acquires a current frame image of a target area shot by the AR camera on a head-mounted AR device; processes the current frame image and establishes a tag search box for the call number label of each book; identifies the call number in the target tag search box to obtain the target call number, retrieves the corresponding target book introduction from a book database according to the target call number, and writes the introduction at a preset position in the current frame image. In this way, intelligent reading of library book introductions is realized solely through the AR camera on the head-mounted AR device, and a reader can read a book's introduction without taking the book off the shelf, which greatly improves the intelligence and usability of the invention and greatly expands its application scenarios.

Description

AR-based library book introduction intelligent reading method and system
Technical Field
The invention relates to the technical field of AR-based intelligent reading of library book introductions, and in particular to an AR-based intelligent reading method and system for library book introductions.
Background
Because a library holds a large number of books, most libraries use call number labels to improve retrieval efficiency. However, when browsing for books of interest, readers still need to take each candidate book off the shelf in turn, read its introduction or skim it, and then put it back on the shelf.
Thus, the prior art is still to be further developed.
Disclosure of Invention
The invention aims to overcome the above technical defects by providing an AR-based intelligent reading method and system for library book introductions that solve the problems existing in the prior art.
To achieve the above technical object, according to a first aspect of the present invention, there is provided an AR-based intelligent reading method for library book introduction, the method comprising:
S100, acquiring a current frame image of a target area shot by an AR camera on a head-mounted AR device; processing the current frame image, extracting the boundary of the call number label of each book in the current frame image, acquiring the pixel coordinates of the upper-left and lower-right corners of each boundary, and establishing a tag search box for each book's call number label from these corner coordinates;
S200, calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between each center point and preset pixel coordinates, and marking the tag search box with the minimum distance as the target tag search box;
S300, identifying the call number in the target tag search box to obtain the target call number corresponding to the target tag search box, retrieving the corresponding target book introduction from the book database according to the target call number, and writing the target book introduction at a preset position in the current frame image.
Specifically, processing the current frame image includes:
sequentially performing RGB channel extraction, color-feature-based threshold segmentation, morphological processing, and height-based region screening on the current frame image to obtain a target image containing only the border regions of the call number labels; then performing edge processing on the target image and determining the call number label areas from the boundary points to obtain a call number label image.
Specifically, extracting the boundaries of the call number labels of all books in the current frame image and establishing the tag search boxes from those boundaries includes:
acquiring the minimum bounding rectangle of each call number label in the call number label image and taking the minimum bounding rectangle corresponding to each label as that label's boundary.
Specifically, calculating the pixel coordinates of the center point of each tag search box includes:
connecting the pixels at the upper-left and lower-right corners of each boundary to form a line segment, calculating the pixel coordinates of the midpoint of each segment, and taking that midpoint as the center point of the corresponding tag search box.
Specifically, identifying the call number in the target tag search box to obtain the target call number includes:
recognizing the text content in the target tag search box with a CRNN network and taking the content recognized by the CRNN network as the call number in the target tag search box.
Specifically, recognizing the text content in the target tag search box with the CRNN network includes:
first learning text features with convolutional layers, then feeding the convolved features into a bidirectional long short-term memory (BiLSTM) network to learn the sequence features of the characters, and finally de-duplicating the recognized text through a transcription layer to output the final prediction result.
Specifically, the method further comprises:
outputting to the user a voice interaction signal asking whether to lock onto the current book, acquiring the user's spoken reply, and deciding accordingly whether to stop calculating the Euclidean distances between the center points of the tag search boxes and the preset pixel coordinates and to write the target book introduction of the current video frame image into subsequent frame images.
Specifically, the method further comprises:
if the user's reply is "yes", stopping the calculation of the Euclidean distances between the center points of the tag search boxes and the preset pixel coordinates, and writing the target book introduction of the current video frame image into subsequent frame images;
if the user's reply is "no", continuing to calculate the Euclidean distance between the center point of each tag search box and the preset pixel coordinates, marking the tag search box with the minimum distance as the target tag search box, identifying the call number in the target tag search box to obtain the target call number, retrieving the corresponding target book introduction from the book database according to the target call number, and writing it at the preset position in the current frame image.
According to a second aspect of the present invention, there is provided an AR-based intelligent library book profile reading system comprising:
The acquisition module, which comprises the AR camera on the head-mounted AR device, is used for shooting a current frame image of a target area;
The control module is used for: processing the current frame image, extracting the boundary of each book's call number label, acquiring the pixel coordinates of the upper-left and lower-right corners of each boundary, and establishing the tag search boxes from these corner coordinates; calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between each center point and the preset pixel coordinates, and marking the tag search box with the minimum distance as the target tag search box; and identifying the call number in the target tag search box to obtain the target call number, retrieving the corresponding target book introduction from the book database according to the target call number, and writing it at the preset position in the current frame image.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, implement the AR-based intelligent reading method for library book introductions described above.
The beneficial effects are that:
According to the invention, intelligent reading of library book introductions is realized solely through the AR camera on a head-mounted AR device. A reader can read a book's introduction without taking the book off the shelf and without complicated algorithmic modeling, which saves readers considerable time and avoids the problem of books being returned to inaccurate shelf positions and hindering subsequent readers. This greatly improves the intelligence and usability of the invention and greatly expands its application scenarios.
Drawings
FIG. 1 is a flow chart of the AR-based intelligent reading method for library book introductions provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the components of the AR-based intelligent library book introduction reading system in accordance with an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the technical solution of the present application, it is described below clearly and completely with reference to the accompanying drawings. Based on the embodiments of the present application, other similar embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present application. In addition, directional words such as "upper", "lower", "left", and "right" used in the following embodiments refer only to directions in the drawings; they are intended to illustrate, not to limit, the application.
The invention will be further described with reference to the drawings and preferred embodiments.
Referring to FIG. 1, the invention provides an AR-based intelligent reading method for library book introductions, comprising the following steps:
S100, acquiring a current frame image of a target area shot by an AR camera on a head-mounted AR device; processing the current frame image, extracting the boundary of the call number label of each book in the current frame image, acquiring the pixel coordinates of the upper-left and lower-right corners of each boundary, and establishing a tag search box for each book's call number label from these corner coordinates.
Here, before step S100, the method includes:
establishing a book database containing a plurality of call numbers and the text of the book introduction corresponding to each call number.
Here, before step S100, the method further includes:
presetting the preset pixel coordinates and the preset position in the control module.
It can be understood that the preset pixel coordinates and the preset position can be set according to the actual needs of the user; the invention does not limit their specific values, as long as they are suitable for the AR-based intelligent reading method for library book introductions.
Preferably, the preset pixel coordinates are set to the coordinates of the center point of the current frame image, and the preset position is set so that the lower-right corner of the target tag search box serves as the upper-left corner of the text of the target book introduction, at which point the text is inserted. Through extensive experiments, the inventors found that displaying only one book introduction at a time, namely the one corresponding to the target tag search box closest to the center of the current frame image, and inserting it so that it does not occlude the target tag search box, greatly optimizes the display effect, further improves the intelligence, reliability, and usability of the invention, and greatly optimizes the user experience.
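The placement rule above can be sketched as follows. This is a minimal illustration only: the representation of a box as (top-left, bottom-right) pixel coordinate pairs and the function name are assumptions, not taken from the patent.

```python
def intro_anchor(target_box):
    """Return the pixel coordinate at which the book introduction text
    is inserted: the lower-right corner of the target tag search box is
    used as the upper-left corner of the text, so the inserted
    introduction never occludes the box itself."""
    (_, _), (x_br, y_br) = target_box  # ((x_tl, y_tl), (x_br, y_br))
    return (x_br, y_br)
```

With a target box whose corners are (10, 20) and (60, 200), the introduction text would be anchored at (60, 200).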
Specifically, processing the current frame image includes:
sequentially performing RGB channel extraction, color-feature-based threshold segmentation, morphological processing, and height-based region screening on the current frame image to obtain a target image containing only the border regions of the call number labels; then performing edge processing on the target image and determining the call number label areas from the boundary points to obtain a call number label image.
Here, according to actual shooting conditions, the collected book images were found to have the following features:
(1) In a captured video or single image, the number of books is generally 14-29, most often about 18;
(2) A call number label consists of black characters on a white background. To make the label's position on the spine more conspicuous, it is generally surrounded by a border of a certain width in another color. The specifications and colors of these borders are currently not unified, with red borders being the most common;
(3) There are three common ways of attaching the call number label to the spine, each with its own advantages and disadvantages, and currently no unified standard exists.
Given these characteristics of call number labels, the invention extracts the boundaries of the call number labels of all books based on labels with red borders. The specific steps for acquiring the call number label borders are as follows:
S110, acquiring the current frame image of the target area shot by the AR camera on the head-mounted AR device, and extracting its red (R), green (G), and blue (B) components to obtain the component images IR, IG, and IB.
S120, according to the principle of color rendering, for a pixel to appear red the value of its red component is generally large: it must be much larger than the values of the other two components, and there is a certain correlation among the three color components. Through analysis of a series of images, it was determined that the value of the red component must reach at least half of the maximum gray value, namely 127. Assuming the pixel at some point in the image appears red, the relationship of the components is as follows:

IR(x, y) > 127, IR(x, y) = m1 * IG(x, y), IR(x, y) = m2 * IB(x, y)

where m1 and m2 are the correlation coefficients of the R component to the G and B components respectively, and IR(x, y), IG(x, y), IB(x, y) are the gray values of the images IR, IG, IB at position (x, y).
Analyzing the border colors of call number labels under various conditions, including both intact and differently aged states, yields the distribution ranges m1 ∈ [2, 6] and m2 ∈ [2.19, 5.44].
S130, judging the pixels of each component image IR, IG, IB one by one: if the gray values at some point satisfy

IR(x, y) > 127, IR(x, y) ≥ m1min * IG(x, y), IR(x, y) ≥ m2min * IB(x, y),

the pixel belongs to a suspected call number label border and its value is set to 1; otherwise its value is set to 0. That is, each pixel of the target image containing only the call number label border regions is obtained as:

I(x, y) = 1 if the above conditions hold, and I(x, y) = 0 otherwise,

where m1min and m2min are the minimum values of m1 and m2 respectively, and (x, y) denotes any pixel in the image.
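The S120/S130 thresholding rule can be sketched as follows, a minimal illustration assuming the component images are given as nested lists of gray values (the function and parameter names are assumptions; the thresholds 127, m1min = 2, and m2min = 2.19 follow the text above):

```python
def red_border_mask(ir, ig, ib, m1_min=2.0, m2_min=2.19):
    """Binarize the frame: a pixel is marked 1 (suspected call number
    label border) when its red component exceeds half the maximum gray
    value (127) and dominates the green and blue components by at least
    the factors m1_min and m2_min respectively; otherwise it is 0."""
    rows, cols = len(ir), len(ir[0])
    mask = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            r, g, b = ir[x][y], ig[x][y], ib[x][y]
            if r > 127 and r >= m1_min * g and r >= m2_min * b:
                mask[x][y] = 1
    return mask
```

For example, a pixel with components (200, 50, 40) satisfies all three conditions and is kept, while (200, 150, 40) fails the red-to-green dominance test and is discarded.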
S140, performing region filling by morphological processing on the target image obtained in step S130 that contains only the call number label border regions, removing the influence of small spurious regions caused by noise.
S150, because partial red areas may exist on a book spine, and such areas are generally less stable in extent and far taller than the call number label border, the regions are screened by height. The number of pixels Hn corresponding to the label border height may differ at different shooting distances and angles, but once these are fixed the value is essentially stable and can be measured once on a first test image. In this experiment, the value of Hn was about 10 pixels.
Specifically, the height-based region screening procedure is as follows:
(1) Identifying the connected domains according to the eight-connectivity criterion and ordering them;
(2) Judging the connected domains in order: first obtain the height of the connected domain and compare it with the determined threshold Hn; if the height is less than or equal to Hn, all pixel values in the connected domain are kept unchanged; if the height is greater than Hn, all pixel values in the connected domain are set to 0;
(3) After all connected domains have been judged, the boundary image of the call number labels is obtained.
(4) Performing edge processing on the boundary image of the call number labels and determining the call number label areas from the boundary points, i.e., separating the labels from the background and the spines, to obtain the call number label image; finally, acquiring the minimum bounding rectangle of each label to obtain the pixel coordinates of the upper-left and lower-right corners of each call number label in the current frame image.
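The height-based screening in steps (1)-(3) above can be sketched as follows. This is a minimal illustration that assumes the eight-connected labeling has already been done (the connected-domain labeling itself is omitted), with each domain given as a list of (row, col) pixel coordinates; the function and parameter names are assumptions.

```python
def screen_by_height(mask, components, hn=10):
    """Height-based region screening: zero out every connected domain
    whose pixel height exceeds the threshold Hn, keeping only regions
    whose height is compatible with a call number label border."""
    for pixels in components:
        rows = [r for r, _ in pixels]
        height = max(rows) - min(rows) + 1  # domain height in pixels
        if height > hn:                     # taller than a label border
            for r, c in pixels:
                mask[r][c] = 0              # discard the whole domain
    return mask
```

With Hn = 2, a three-pixel-tall red streak on the spine would be removed while a one-pixel-tall border fragment is retained.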
Specifically, the call number label is acquired as follows. Edge extraction is performed on the boundary image of the call number labels: the values of the boundary points are retained and all other gray values are set to 0. After the processing of the preceding steps, the upper and lower borders of each label's border edge image remain; since each border has an upper and a lower boundary and only boundary-point values are kept during edge extraction, each column of an intact call number label area contains exactly four non-zero values. Accordingly, the label area is recovered column by column with the following specific steps:
First, three one-dimensional arrays named A0, Ab, and Ae are created. The edge image is scanned column by column: the number of non-zero points in each column is stored in A0, the row coordinate of the column's first non-zero point in Ab, and the row coordinate of its last non-zero point in Ae. Each column j is then judged in turn: if A0(j) = 4, the gray values of all pixels from the first to the last non-zero point of the column are set to 1, where A0(j) is the number of non-zero points in column j, Ab(j) is the row coordinate of its first non-zero point, and Ae(j) is the row coordinate of its last non-zero point; if A0(j) ≠ 4, the non-zero boundary points in the column are set to zero. After all columns have been judged, the boundary image of the call number labels is obtained.
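The column-wise judgment above can be sketched as follows, a minimal illustration on a binary edge image stored as nested lists (the function name is an assumption; A0, Ab, and Ae appear as the per-column non-zero count and the first/last non-zero rows):

```python
def fill_label_columns(edge):
    """For each column j: count the non-zero points (A0(j)) and find
    the rows of the first (Ab(j)) and last (Ae(j)) non-zero points.
    If exactly four non-zero points are present, the column crosses an
    intact label (two borders, each with upper and lower boundary), so
    the whole span Ab(j)..Ae(j) is filled with 1; otherwise the
    column's boundary points are spurious and are zeroed."""
    rows, cols = len(edge), len(edge[0])
    for j in range(cols):
        nz = [i for i in range(rows) if edge[i][j] != 0]
        if len(nz) == 4:                      # A0(j) == 4
            for i in range(nz[0], nz[-1] + 1):  # Ab(j) .. Ae(j)
                edge[i][j] = 1
        else:
            for i in nz:
                edge[i][j] = 0
    return edge
```

A column containing four non-zero rows is filled solid from its first to its last non-zero row, while a column with any other count is cleared entirely.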
Specifically, extracting the boundaries of the call number labels of all books in the current frame image and establishing the tag search boxes from those boundaries includes:
acquiring the minimum bounding rectangle of each call number label in the call number label image and taking the minimum bounding rectangle corresponding to each label as that label's boundary.
S200, calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between each center point and the preset pixel coordinates, and marking the tag search box with the minimum distance as the target tag search box.
Specifically, calculating the pixel coordinates of the center point of each tag search box includes:
connecting the pixels at the upper-left and lower-right corners of each boundary to form a line segment, calculating the pixel coordinates of the midpoint of each segment, and taking that midpoint as the center point of the corresponding tag search box.
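Step S200 can be sketched as follows, a minimal illustration assuming each search box is given as ((x_tl, y_tl), (x_br, y_br)) corner coordinates and the preset pixel coordinates are typically the frame center (the function name is an assumption):

```python
from math import hypot

def select_target_box(boxes, preset):
    """S200: the center of each tag search box is the midpoint of the
    diagonal from its upper-left to its lower-right corner; the box
    whose center has the smallest Euclidean distance to the preset
    pixel coordinates is marked as the target tag search box."""
    def centre(box):
        (x1, y1), (x2, y2) = box
        return ((x1 + x2) / 2, (y1 + y2) / 2)

    return min(boxes, key=lambda b: hypot(centre(b)[0] - preset[0],
                                          centre(b)[1] - preset[1]))
```

For a preset of (100, 100), a box centered at (100, 100) wins over one centered at (5, 5), matching the preferred setting in which the label nearest the frame center is the one whose introduction is shown.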
S300, identifying the call number in the target tag search box to obtain the target call number corresponding to the target tag search box, retrieving the corresponding target book introduction from the book database according to the target call number, and writing the target book introduction at the preset position in the current frame image.
Specifically, identifying the call number in the target tag search box to obtain the target call number includes:
recognizing the text content in the target tag search box with the CRNN network and taking the content recognized by the CRNN network as the call number in the target tag search box.
Specifically, recognizing the text content in the target tag search box with the CRNN network includes:
first learning text features with convolutional layers, then feeding the convolved features into a bidirectional long short-term memory network to learn the sequence features of the characters, and finally de-duplicating the recognized text through a transcription layer to output the final prediction result.
It should be noted that a CRNN (Convolutional Recurrent Neural Network) is used to recognize the text content detected in the previous step. This network combines convolutional and recurrent structure and addresses image-based sequence recognition, in particular scene text recognition. The content recognized by the CRNN is the call number. The network first learns text features with convolutional layers and then feeds the convolved features into a bidirectional long short-term memory network to learn the sequence features of the characters. The bidirectional LSTM exploits contextual information rather than predicting each character in isolation, and by combining the full context it recognizes the predicted text more accurately. Finally, the recognized text is de-duplicated and otherwise processed by a transcription layer, which outputs the final prediction result. The network recognizes the content detected in the previous step and outputs it as text information, including each book's call number and shelf position information.
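The transcription layer's de-duplication can be illustrated with a CTC-style decoding sketch. The patent does not specify the exact transcription rule, so the blank symbol "-" and the function name are illustrative assumptions; the sketch only shows the collapse of per-timestep outputs into final text.

```python
def transcribe(raw, blank="-"):
    """Collapse the per-timestep character outputs of the recurrent
    layer into the final text, as a CTC-style transcription layer
    does: merge consecutive repeated characters, then drop blanks."""
    out = []
    prev = None
    for ch in raw:
        if ch != prev and ch != blank:  # new, non-blank symbol
            out.append(ch)
        prev = ch
    return "".join(out)
```

For instance, the raw sequence "TT-PP-33991" collapses to the call number text "TP391"; a blank between two identical symbols ("aa-a") preserves the genuine repetition ("aa").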
It should be noted that the present invention uses a CRNN network for text recognition, but text recognition networks are by no means limited to this; a similar effect can be achieved with other text recognition networks, such as DTRN (Deep-Text Recurrent Network), since the essence in every case is to recognize the detected content.
It can be appreciated that call number extraction and recognition are both prior art, and are not described in detail herein.
Specifically, the method further comprises the following steps:
outputting to the user a voice interaction signal asking whether to lock onto the current book, acquiring the user's interactive voice, and, according to the user's interactive voice, deciding whether to stop calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates and to write the target book profile of the current video frame image into subsequent frame images.
Specifically, the method further comprises the following steps:
if the user's interactive voice is "yes", stopping calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and writing the target book profile of the current video frame image into the subsequent frame images;
if the user's interactive voice is "no", calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
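The distance-based selection described in these steps can be sketched as follows; the preset pixel coordinate (here assumed to be the center of a 640×480 AR view) and the example boxes are illustrative assumptions, not values fixed by the invention:

```python
import math

def center(box):
    """Center point of a tag search box given as ((x1, y1), (x2, y2)) corners."""
    (x1, y1), (x2, y2) = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def pick_target_box(boxes, preset=(320, 240)):
    """Return the box whose center is nearest (Euclidean distance) to the preset pixel."""
    def dist(box):
        cx, cy = center(box)
        return math.hypot(cx - preset[0], cy - preset[1])
    return min(boxes, key=dist)

boxes = [((10, 20), (60, 50)),      # center (35, 35)
         ((300, 220), (340, 260)),  # center (320, 240) -- exactly the preset pixel
         ((500, 400), (560, 440))]  # center (530, 420)
print(pick_target_box(boxes))  # -> ((300, 220), (340, 260))
```

When the user answers "yes", a caller would simply skip this selection and keep reusing the previously retrieved book profile for subsequent frames.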
Through the above distinguishing technical features, the invention allows the same target book profile to be displayed continuously according to the user's needs, or to be continuously updated as the AR glasses turn left and right, which makes it convenient for readers to read book profiles and further improves the intelligence and usability of the invention.
It can be understood that intelligent reading of library book profiles can be realized with only the AR camera on the head-mounted AR device: a reader can read a book profile without taking the book off the bookshelf, and no complex algorithmic modeling is needed. This saves readers a great deal of time, avoids the problem that a book put back in an inaccurate position hinders subsequent readers from finding it, greatly improves the intelligence and usability of the invention, and greatly expands its application scenarios.
Referring to fig. 2, another embodiment of the present invention is provided, and the present embodiment provides an AR-based intelligent reading system for library book profiles, including:
an acquisition module 100, including an AR camera on a head-mounted AR device, for capturing a current frame image of a target area;
the control module 200 is used for: processing the current frame image, extracting the boundaries of the book searching labels of all books in the current frame image, acquiring the pixel coordinates of the upper left corner and the lower right corner of each boundary, and establishing a tag search box for the book searching label of each book according to those pixel coordinates; calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; and identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
It should be noted that intelligent reading of library book profiles can be realized with only the AR camera on the head-mounted AR device: a reader can read a book profile without taking the book off the bookshelf, and no complex algorithmic modeling is needed. This saves readers a great deal of time, avoids the problem that a book put back in an inaccurate position hinders subsequent readers from finding it, greatly improves the intelligence and usability of the system, and greatly expands its application scenarios.
In a preferred embodiment, the present application also provides an electronic device, including:
a memory; and a processor, wherein the memory stores computer readable instructions which, when executed by the processor, implement the AR-based library book profile intelligent reading method. The computer device may broadly be a server, a terminal, or any other electronic device having the necessary computing and/or processing capabilities. In one embodiment, the computer device may include a processor, a memory, a network interface, a communication interface, and the like, connected by a system bus. The processor provides the necessary computing, processing and/or control capabilities. The memory may include a non-volatile storage medium and an internal memory; the non-volatile storage medium may store an operating system, computer programs, and the like, and the internal memory provides an environment for running the operating system and computer programs in the non-volatile storage medium. The network interface and the communication interface may be used to connect and communicate with external devices via a network. The computer program, when executed by the processor, performs the steps of the method of the invention.
The present invention may also be implemented as a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the steps of the method of the embodiments of the present invention to be performed. In one embodiment, the computer program may be distributed over a plurality of computer devices or processors coupled by a network, so that it is stored, accessed, and executed in a distributed fashion. A single method step/operation, or two or more method steps/operations, may be performed by a single computer device or processor or by two or more computer devices or processors; likewise, one or more method steps/operations may be performed by some computer devices or processors while one or more other method steps/operations are performed by others.
Those of ordinary skill in the art will appreciate that the method steps of the present invention may be implemented by a computer program, which may be stored on a non-transitory computer readable storage medium, to instruct related hardware such as a computer device or a processor, which when executed causes the steps of the present invention to be performed. Any reference herein to memory, storage, database, or other medium may include non-volatile and/or volatile memory, as the case may be. Examples of nonvolatile memory include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), flash memory, magnetic tape, floppy disk, magneto-optical data storage, hard disk, solid state disk, and the like. Examples of volatile memory include Random Access Memory (RAM), external cache memory, and the like.
It can be understood that intelligent reading of library book profiles can be realized with only the AR camera on the head-mounted AR device: a reader can read a book profile without taking the book off the bookshelf, and no complex algorithmic modeling is needed. This saves readers a great deal of time, avoids the problem that a book put back in an inaccurate position hinders subsequent readers from finding it, greatly improves the intelligence and usability of the invention, and greatly expands its application scenarios.
The technical features described above may be combined arbitrarily. Although not all possible combinations of features are described, any combination of these features should be considered to be covered by this description, provided the combination is not self-contradictory.
The above-described embodiments of the present invention do not limit the scope of the present invention. Any other corresponding changes and modifications made in accordance with the technical idea of the present invention shall be included in the scope of the claims of the present invention.

Claims (10)

1. An intelligent library book introduction reading method based on AR, which is characterized by comprising the following steps:
S100, acquiring a current frame image of a target area shot by an AR camera on the head-mounted AR equipment; processing the current frame image, extracting boundaries of book searching labels of all books in the current frame image, acquiring pixel coordinates of the upper left corner and the lower right corner of each boundary, and establishing a label searching frame of the book searching labels of all books according to the pixel coordinates of the upper left corner and the lower right corner of each boundary;
S200, calculating pixel coordinates of a central point of each tag search frame, calculating Euclidean distances between the pixel coordinates of the central point of each tag search frame and preset pixel coordinates, and marking the tag search frame corresponding to the minimum value in the Euclidean distances as a target tag search frame;
S300, identifying the book searching number in the target tag searching frame, obtaining a target book searching number corresponding to the target tag searching frame, calling a target book brief introduction corresponding to the book searching number in the book database according to the target book searching number, and writing the target book brief introduction into a preset position in the current frame image.
2. The AR-based intelligent reading method for library book profiles according to claim 1, wherein said processing the current frame image comprises:
extracting each RGB color channel, threshold segmentation based on color characteristics, morphological processing and region screening based on height are sequentially carried out on the current frame image, and a target image only containing a frame region of a book label is obtained; and carrying out edge processing on the target image, and judging and determining a book searching label area through boundary points to obtain a book searching label image.
3. The intelligent reading method for the library book profile based on the AR according to claim 2, wherein the extracting the boundaries of the book-searching tags of all books in the current frame image, and establishing the tag search box of the book-searching tag of each book according to the boundaries of the book-searching tags of all books, comprises:
acquiring the minimum circumscribed rectangle of each book searching label in the book searching label image, and taking the minimum circumscribed rectangle corresponding to each book searching label as the boundary of that book searching label.
4. The AR-based intelligent reading method for library book profiles of claim 3, wherein said calculating pixel coordinates of a center point of each tag search box comprises:
and sequentially connecting pixels corresponding to the upper left corner and the lower right corner of each boundary to form line segments, calculating the pixel coordinates of the midpoints of each line segment, and taking the pixel coordinates of the midpoints of each line segment as the pixel coordinates of the central point of each tag search box.
5. The intelligent reading method for the library book introduction based on the AR of claim 4, wherein the identifying the index number in the target tag search box to obtain the target index number corresponding to the target tag search box comprises:
identifying the text content in the target tag search box by using a CRNN network, and taking the content identified by the CRNN network as the book searching number in the target tag search box.
6. The AR-based intelligent reading method of library book profiles of claim 5, wherein said identifying text content in a target tag search box using a CRNN network comprises:
first, convolution layers are used to learn the text features; then the convolved features are input into a bidirectional long short-term memory network to learn the sequence features of the characters; finally, the recognized text content is de-duplicated by a transcription layer to output the final prediction result.
7. The AR-based library book profile intelligent reading method of claim 6, further comprising:
outputting to the user a voice interaction signal asking whether to lock onto the current book, acquiring the user's interactive voice, and, according to the user's interactive voice, deciding whether to stop calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates and to write the target book profile of the current video frame image into subsequent frame images.
8. The AR-based library book profile intelligent reading method of claim 7, further comprising:
if the user's interactive voice is "yes", stopping calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and writing the target book profile of the current video frame image into the subsequent frame images;
if the user's interactive voice is "no", calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
9. An AR-based intelligent library book profile reading system, comprising:
The acquisition module comprises an AR camera on the head-mounted AR equipment and is used for shooting a current frame image of a target area;
the control module is used for: processing the current frame image, extracting the boundaries of the book searching labels of all books in the current frame image, acquiring the pixel coordinates of the upper left corner and the lower right corner of each boundary, and establishing a tag search box for the book searching label of each book according to those pixel coordinates; calculating the pixel coordinates of the center point of each tag search box, calculating the Euclidean distance between the pixel coordinates of the center point of each tag search box and the preset pixel coordinates, and marking the tag search box corresponding to the minimum Euclidean distance as the target tag search box; and identifying the book searching number in the target tag search box, obtaining the target book searching number corresponding to the target tag search box, retrieving from the book database the target book profile corresponding to the target book searching number, and writing the target book profile into the preset position in the current frame image.
10. An electronic device, comprising:
A memory; and a processor having stored thereon computer readable instructions which when executed by the processor implement the AR-based library book profile intelligent reading method of any one of claims 1 to 8.
CN202410438194.7A 2024-04-12 2024-04-12 AR-based library book introduction intelligent reading method and system Active CN118334673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410438194.7A CN118334673B (en) 2024-04-12 2024-04-12 AR-based library book introduction intelligent reading method and system


Publications (2)

Publication Number Publication Date
CN118334673A true CN118334673A (en) 2024-07-12
CN118334673B CN118334673B (en) 2024-10-08

Family

ID=91767358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410438194.7A Active CN118334673B (en) 2024-04-12 2024-04-12 AR-based library book introduction intelligent reading method and system

Country Status (1)

Country Link
CN (1) CN118334673B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832826A (en) * 2020-07-16 2020-10-27 北京悉见科技有限公司 Augmented reality-based library management method, device and storage medium
US10902395B1 (en) * 2017-07-11 2021-01-26 Massachusetts Mutual Life Insurance Company Intelligent e-book reader incorporating augmented reality or virtual reality
CN114267042A (en) * 2021-12-27 2022-04-01 北京邮电大学 A book inventory method and system based on target detection and OCR technology
CN114882483A (en) * 2022-04-01 2022-08-09 南京大学 Book checking method based on computer vision


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG Qiang: "Research and Implementation of a Library Personalized Service System Based on Augmented Reality", CNKI, 31 December 2015 (2015-12-31) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant