CN120166261A - A video application method and system - Google Patents

Info

Publication number
CN120166261A
Authority
CN
China
Prior art keywords
video
target
user
information
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411757479.3A
Other languages
Chinese (zh)
Inventor
单正建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Trans Boundary Free Technology Beijing Co ltd
Original Assignee
Trans Boundary Free Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Trans Boundary Free Technology Beijing Co ltd filed Critical Trans Boundary Free Technology Beijing Co ltd
Publication of CN120166261A publication Critical patent/CN120166261A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/426Internal components of the client ; Characteristics thereof
    • H04N21/42607Internal components of the client ; Characteristics thereof for processing the incoming bitstream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47217End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/482End-user interface for program selection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

After a user watching a video clicks or triggers a planar or three-dimensional target in the video, a program on the user terminal determines, from the clicked position, which of one or more pre-entered targets was clicked, then displays or retrieves the preset information corresponding to that target and executes a next instruction or program. The method and system address inherent shortcomings of video transmission, coding, and interaction that reduce the efficiency of video as a transmission carrier, limit the richness, completeness, and vividness of the transmitted information, and hinder commodity sales and advertising effectiveness.

Description

Video application method and system
Technical Field
The present disclosure relates to video interaction technology and video propagation enhancement, and also involves supporting technologies including image recognition, feature comparison, image segmentation, image target analysis, and data communication; it can be regarded as a next-generation internet application technology.
Background
The content of targets in a video (e.g., projected PPT slides, pictures, textured merchandise) is limited by the shooting equipment, the lens, the coding and transmission modes, and the player on the user side. As a result, planar or three-dimensional targets in the video are constrained by video coding and composition, and their detailed, clear content cannot be effectively conveyed to the user watching or playing the video.
For example, in a teacher's recorded lecture video, planar target information such as a table, a picture, or a code area appears in the recording; because the recording resolution (e.g., 1080p) still cannot show this content clearly, the learning effect on the viewing students is impaired (e.g., when showing a molecular structure, a large amount of program code, or the details of a classical painting).
As another example, a museum video may contain several famous statues (three-dimensional targets) and oil paintings (planar targets); when a viewer wants to see any one of these targets clearly, the existing video mode cannot display its content sharply, which impairs the viewer's grasp of the video content and the viewing experience (e.g., an oil painting in the background of a video focused on the commentator, which the viewer wants to examine closely).
For video used in network sales, marketing, or lecture formats such as live broadcasts, the target (including merchandise and background projections) is embedded in the video and therefore cannot convey detail attributes such as material and texture. In most cases the live presentation and narration do influence consumers, but the information carried by the target merchandise itself cannot be displayed clearly (e.g., a book on display: the buyer is interested in certain contents of the book, not in the angle at which the seller shows it).
Given the mobility of terminals and the size of mobile phone screens, targets in a video cannot be displayed clearly and fully in the current mode; poor results on terminals are therefore the norm, the depth that propagation should reach cannot be achieved, and the sales impact expected of, for example, performance advertising cannot be realized.
In video technology, interactive dramas need to interact with the audience around specific targets, but the current A/B-plot mode borrowed from games prevents viewers from watching and interacting flexibly, which limits the realization of deeply interactive dramas.
Traditional video conferencing shares files in the T.120 mode, but in practice there is a large amount of non-real-time video and group live broadcasting, and the traditional video-conference approach cannot cope with user-side dynamics and individual demands.
In fact, the video technology widely used on the internet has many shortcomings and cannot meet the needs of these scenarios; the above are only a small part of them. The following disclosure includes several embodiments that solve specific problems. It is worth noting that methods such as Google's patented approach of identifying a picture, retrieving information, and displaying it to the user on the internet run in a direction opposite to the problems the present disclosure aims to solve.
In summary, because of the technical bias toward unidirectional propagation and the limitations of imaging technology, current video technology constrains propagation effects (not only content, merchandise, and transactions) and confines video to the video itself rather than letting it carry influence, extension, and multidimensional transmission of information. The present disclosure therefore addresses these defects of current video with a new technical approach.
Disclosure of Invention
In a first aspect, the present disclosure provides a video application method, comprising:
a terminal user watching a video triggers a target in the video being played on the terminal;
the user terminal program calculates the triggered position data within the video frame;
the user terminal program or the video transmission module determines which target the position belongs to at the current video position, according to the target information entered or delineated by the video generating personnel and the generated target data;
the user terminal program reads the preset information of the target and displays it on the user terminal, or reads the preset information of the target and executes a preset program or instruction.
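The determination step above can be sketched as a hit test of the trigger position against per-frame target rectangles. The data layout and names below are illustrative assumptions, not a concrete format from the disclosure:

```python
# Illustrative sketch: decide which pre-entered target (if any) was triggered.
# Target data maps a frame index to a list of (target_id, bounding box) entries;
# this layout is an assumption for illustration only.

from typing import Optional

# (x, y, width, height) in video pixel coordinates
TargetBox = tuple[int, int, int, int]

def hit_test(target_data: dict[int, list[tuple[str, TargetBox]]],
             frame_index: int,
             click_x: int, click_y: int) -> Optional[str]:
    """Return the id of the target containing the click, or None."""
    for target_id, (x, y, w, h) in target_data.get(frame_index, []):
        if x <= click_x < x + w and y <= click_y < y + h:
            return target_id
    return None

# Example: one painting target delineated on frame 120
data = {120: [("oil_painting_01", (400, 100, 300, 200))]}
assert hit_test(data, 120, 500, 150) == "oil_painting_01"
assert hit_test(data, 120, 10, 10) is None
```

If no target contains the click, the terminal simply treats it as an ordinary playback interaction.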
In some examples, in combination with the method of the first aspect, the targets in the video may comprise planar targets and stereoscopic targets; planar targets include information in the background such as projections, photographs, pictures, and planar artwork, and stereoscopic targets include three-dimensional merchandise and objects. In addition, the targets may comprise persons in the video.
In some embodiments, the method of the first aspect further includes a step in which the video generating personnel generate a picture of the target, the picture being used for feature extraction and comparison.
With reference to the method of the first aspect, in some embodiments, the generating personnel delineate a target in the video display interface of a target data generating module;
the target data generating module generates target data for the delineated target based on a time value or frame sequence value of the video.
In combination with the method of the first aspect, in some embodiments, the preset information of the target is uploaded to the video propagation module by the video generating-end program or directly by the video generating personnel.
In some embodiments of the method of the first aspect, the target in the displayed video is delineated in a human-computer interaction mode, for example using a mouse or touch screen and a geometric frame to select and delineate the target. The target content comprises planar targets and three-dimensional targets; planar targets include targets with dynamic content, such as a PPT, slides, or video being played.
In some embodiments, the method of the first aspect includes generating the target data along the timeline of the video, based on the position and track of the target in the video, by means including artificial intelligence and image processing.
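Generating target data along the timeline can be sketched as tracking a bounding box at keyframes and interpolating between them; the keyframe structure and names here are illustrative assumptions:

```python
# Illustrative sketch: interpolate a target's bounding box between keyframes
# so target data exists for every frame along the video timeline.
# The keyframe layout is an assumption for illustration.

def lerp_box(b0, b1, t):
    """Linearly interpolate two (x, y, w, h) boxes; t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(b0, b1))

def box_at_frame(keyframes: dict[int, tuple], frame: int) -> tuple:
    """Keyframes map frame index -> box; return the box at any frame."""
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    for f0, f1 in zip(frames, frames[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return lerp_box(keyframes[f0], keyframes[f1], t)

# A background chart drifts as the camera pans between frames 0 and 100
keys = {0: (100, 50, 200, 150), 100: (200, 50, 200, 150)}
assert box_at_frame(keys, 50) == (150, 50, 200, 150)
```

A detector or tracker would supply the keyframe boxes; interpolation only fills the frames in between.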
In combination with the method of the first aspect, the information preset for a target by the generating personnel comprises a file or a resource link; the file includes a picture, PDF, PPT, vector file, or video file, and the resource link includes a web page link or a video link, the video link further including a video stream.
In combination with the method of the first aspect, the generating-end program or the video transmission module generates the feature information of a target based on a target set by the video generating personnel (e.g., a target directly delineated on the video) or an imported picture of the target.
In combination with the method of the first aspect, when a terminal user watching a video triggers a target in the video and the corresponding target is determined, an interaction area is presented, including service-personnel interaction content and a viewers' interaction area; the service personnel, who may include machine service personnel, further describe, provide, and communicate the target's information with the terminal user. The interaction area may take conventional program-side forms in the terminal, such as a web page or a window, and an AI robot may appear as a window or an animation, optionally combined with TTS to improve the user experience.
In combination with the method of the first aspect, when a user watching a video clicks the face of a person in the video and the click is determined to trigger that target, the preset next instruction is executed according to the target's preset information; the executed instruction includes following the person or establishing communication.
In combination with the method of the first aspect, if the video contains multiple targets with different target type attributes, the target-data generation method is applied in combination with the feature comparison method, with the target-data generation method used for targets whose dynamic content changes.
In combination with the first aspect, the present disclosure provides a video application method, specifically including the following:
the editing, recording, or generating personnel of the video (the video generating personnel) import a video file, or the video being captured, into a target data generating module;
the generating personnel delineate a target in the video display interface of the target data generating module;
the target data generating module generates target data for the delineated target based on a time value or frame sequence value of the video;
the generating personnel import the preset information of the target into the target data generating module or the video transmission module;
the terminal user uses the terminal user program to access the video transmission module through the network and obtain the video file or video together with the target data of the delineated target;
the terminal user triggers the target while watching the video;
the terminal user program or the video transmission module determines the triggered target from the target data and the position of the trigger point in the video, and displays the preset information of the target, or reads the preset information of the target and executes a preset next instruction.
In combination with the above method, in some specific embodiments, the personnel delineate the target in the displayed video within the target data generating module in a human-computer interaction mode, for example using a mouse or touch screen and a geometric frame to select and delineate the target; the target content includes planar targets and three-dimensional targets, and planar targets include targets with dynamic content, such as a PPT, slides, or video being played.
In combination with the above method, in some specific embodiments, the target data generating module further includes a target detection module, which uses means including artificial intelligence and image processing to generate target data based on the position and track of the target along the video's timeline (this data generally tracks the target in the video with a rectangular frame).
In combination with the above method, in some specific embodiments, the target data and the file that clearly presents the target are generated in the same file, while in other embodiments the generated target data, the detailed file of the target, and the video file are kept separate.
In combination with the above method, in some specific embodiments, the video transmission module may transmit video in a video streaming mode or a video file mode, such as HLS, DASH, RTP/RTSP, or multicast. In addition, the video transmission module transmits the preset information corresponding to the target separately from the video, including the file and digital link corresponding to the target. When the video is uniquely encoded in the video transmission module and the user clicks to trigger the video and the target at the clicked position is determined, the terminal user's program reads the target's preset information by the video's unique encoding and displays it or executes the preset next instruction.
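The separate delivery of preset information keyed by the video's unique encoding can be sketched as a lookup table on the transmission side; the identifiers, file names, and link below are hypothetical placeholders, not values from the disclosure:

```python
# Illustrative sketch: preset information is stored apart from the video bytes
# and looked up by (unique video encoding, target id). All names are assumptions.

preset_store = {
    ("video-7f3a", "oil_painting_01"): {
        "kind": "file",
        "payload": "classical_painting_full_res.png",  # hypothetical file name
    },
    ("video-7f3a", "book_01"): {
        "kind": "link",
        "payload": "https://example.com/book",  # hypothetical link
    },
}

def fetch_preset(video_uid: str, target_id: str):
    """Terminal-side fetch of a triggered target's preset information."""
    return preset_store.get((video_uid, target_id))

info = fetch_preset("video-7f3a", "oil_painting_01")
assert info is not None and info["kind"] == "file"
assert fetch_preset("video-7f3a", "unknown") is None
```

Keeping this store outside the video stream is what lets a 10 MB painting scan travel alongside a 1080p recording without re-encoding the video.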
In some specific embodiments, the targets may comprise planar targets and stereoscopic targets; planar targets may comprise information in the background, including projections, photographs, pictures, and planar artwork, and stereoscopic targets may comprise three-dimensional merchandise, objects, and the like.
In combination with the above method, in some specific embodiments, the target data is generated based on the size of the originally imported video, and the values of the target frame in the target data may be absolute values or relative values.
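Because the terminal may render the video at a different size than the originally imported video, relative target-frame values make the hit test resolution-independent. A minimal sketch of the conversion (function names are assumptions):

```python
# Illustrative sketch: store the target frame as relative values in [0, 1]
# against the imported video size, then scale to any playback resolution.

def to_relative(box, src_w, src_h):
    """(x, y, w, h) in pixels of the imported video -> relative values."""
    x, y, w, h = box
    return (x / src_w, y / src_h, w / src_w, h / src_h)

def to_absolute(rel_box, dst_w, dst_h):
    """Relative values -> pixels at the terminal's playback resolution."""
    rx, ry, rw, rh = rel_box
    return (round(rx * dst_w), round(ry * dst_h),
            round(rw * dst_w), round(rh * dst_h))

# Delineated on a 1920x1080 source, rendered on a 1280x720 phone screen
rel = to_relative((480, 270, 960, 540), 1920, 1080)
assert rel == (0.25, 0.25, 0.5, 0.5)
assert to_absolute(rel, 1280, 720) == (320, 180, 640, 360)
```

Absolute values avoid rounding but tie the data to one resolution; relative values cost one multiply per coordinate at playback time.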
In combination with the above method, in some specific embodiments, the user terminal's program may display the target in the video with or without the target frame, or may generate a stereoscopic presentation of the target from the target's feature information to enhance the user experience; for example, the terminal program segments the target from the video and the target data and renders it as a stereoscopic target, adding a sense of technology.
In some specific implementation examples, the user sees the target on the terminal's video display interface and clicks or triggers it, and the terminal program determines the triggered or clicked target from the fed-back trigger position, the current frame or time of the video, and the generated target data.
In combination with the first aspect, the present disclosure provides a video application method, specifically as follows:
the editing, recording, or generating personnel of the video (the video generating personnel) import, through a video editing/generating-end program, the video file or the video stream being captured together with the information preset for targets in the video (including files or digital links corresponding to the targets, and preset links including videos, files, web page links, and the like);
the video editing/generating-end program uploads or imports the video and the targets' preset information into a video propagation module;
in addition, the program or the video transmission module generates the feature information of each target based on a target set by the video generating personnel (e.g., a target directly delineated on the video) or an imported picture of the target;
the terminal user uses the terminal user program to acquire the video file through the video transmission module over the network;
the terminal user triggers and clicks the target while watching the video;
the terminal user program obtains the clicked position in the video, and the terminal program or the video transmission module determines the target to which the clicked position belongs based on the targets' feature data;
if a set target is matched, the terminal user program displays the preset information of the target, or reads the preset information and executes a preset next instruction or program.
In combination with the above method, when the terminal user clicks or triggers the target, the terminal software reads and displays the information preset for the target (the corresponding file information or resource links), where the resource links include web page links and live video streams.
In combination with the above method, the terminal user program can render the file in the target's preset information according to its format, which may include tables, pictures, vector graphics, and video files; the user can adjust the preset display area to see details that the video itself cannot show, for example enlarging or shrinking the preset information (e.g., a vector picture, a document image, or page-turning through a PPT or PDF) within the display area.
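Rendering by the preset file's format can be sketched as a simple format-to-viewer dispatch; the extensions and viewer descriptions below are illustrative assumptions:

```python
# Illustrative sketch: the terminal user program picks a display routine from
# the format of the preset file. Formats and viewer behavior are assumptions.

def render_preset(file_name: str) -> str:
    """Return which (hypothetical) viewer the terminal would open."""
    viewers = {
        ".pdf": "paged viewer with zoom and page turning",
        ".ppt": "paged viewer with zoom and page turning",
        ".svg": "vector viewer with lossless zoom",
        ".png": "image viewer with zoom",
        ".mp4": "embedded video player",
    }
    ext = file_name[file_name.rfind("."):].lower()
    return viewers.get(ext, "generic download prompt")

assert render_preset("painting.SVG") == "vector viewer with lossless zoom"
assert render_preset("notes.pdf") == "paged viewer with zoom and page turning"
assert render_preset("data.xyz") == "generic download prompt"
```

A vector or full-resolution file is what restores the detail (a painting's brushwork, a slide's small text) that the video encoding discarded.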
In combination with the above method, after the terminal user clicks or triggers a target in the video, the information interface related to the target may also contain service-personnel interaction content and an interaction area; the service personnel, including machine service personnel, further describe, provide, and communicate the target's information with the terminal user.
In combination with the above method, after the terminal user clicks and triggers a target in the video, messages left by other users can be shown in the information interface corresponding to the target, and other people can likewise see this message information.
In combination with the above method, after clicking and triggering a target in the video, the terminal user can purchase the target merchandise and collect red packets and discount cards in the information interface displaying the target.
In combination with the above method, the attribution of the clicked area can be determined as follows: the target picture is matched against the triggered frame, and if the matched range contains the triggered position, that target is judged to have been triggered;
alternatively, an image is extracted at the trigger-point position in the video frame, its features are extracted and compared with the targets' features, and the clicked target is thus determined. This approach can be further subdivided into extracting a picture of a set size around the clicked position, or segmenting the image of the clicked target, and comparing it with the targets' features to judge which target was clicked.
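The feature-comparison alternative can be sketched with a deliberately simple stand-in feature: a gray-level histogram of a fixed-size patch around the click, compared by histogram intersection. A real implementation would use stronger features; everything below, including the threshold, is an illustrative assumption:

```python
# Illustrative sketch of the feature-comparison route: crop a patch of a set
# size around the clicked position, build a crude feature (gray histogram),
# and match it against pre-extracted target features. All names and the
# choice of feature are assumptions for illustration only.

def crop(frame, cx, cy, size):
    """frame is a 2D list of gray values; crop a size x size patch, clamped."""
    h, w = len(frame), len(frame[0])
    x0 = max(0, min(cx - size // 2, w - size))
    y0 = max(0, min(cy - size // 2, h - size))
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]

def histogram(patch, bins=8):
    hist = [0] * bins
    for row in patch:
        for v in row:
            hist[min(v * bins // 256, bins - 1)] += 1
    total = sum(hist)
    return [c / total for c in hist]

def intersection(h1, h2):
    return sum(min(a, b) for a, b in zip(h1, h2))

def match_target(frame, cx, cy, target_features, size=4, threshold=0.8):
    """Return the best-matching target id above threshold, else None."""
    patch_hist = histogram(crop(frame, cx, cy, size))
    best_id, best_score = None, threshold
    for target_id, feat in target_features.items():
        score = intersection(patch_hist, feat)
        if score > best_score:
            best_id, best_score = target_id, score
    return best_id

# A tiny 8x8 frame: left half dark background, right half a bright slide
frame = [[20] * 4 + [230] * 4 for _ in range(8)]
features = {"slide": histogram([[230] * 4] * 4), "bg": histogram([[20] * 4] * 4)}
assert match_target(frame, 6, 4, features) == "slide"
assert match_target(frame, 1, 4, features) == "bg"
```

Unlike the target-data route, this needs no per-frame tracking data, which is why the disclosure pairs it with targets whose position is unknown and reserves tracked target data for targets with changing dynamic content.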
In combination with the above method, in some embodiments, when the video contains multiple targets with different target type attributes, the target-data generation method is applied in combination with the feature comparison method, with the target-data generation method used for targets whose dynamic content changes.
In combination with the above method, when the user clicks the face of a person in the video and the click is judged to hit that target, the next instruction preset for the target is executed, such as following a public account or establishing communication.
In combination with the above method, after the user clicks a target in the video and the clicked target is confirmed, the next executed instruction can place the target into the shopping basket.
Drawings
FIG. 1 is a diagram of the method based on object detection;
FIG. 2 is schematic diagram 1;
FIG. 3 is schematic diagram 2;
FIG. 4 is a diagram of the method based on feature comparison;
FIG. 5 is a method diagram.
Detailed Description
The specific implementations, encodings, numbers, values, durations, triggers, schematics, next instructions, programs, and the like described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; they are merely some video application methods and systems consistent with the present disclosure as set forth in the appended claims.
The techniques and methods of the present disclosure are described more fully below with reference to the accompanying drawings, in which some, but not all, embodiments of the disclosure are shown.
The technical logic of the method is to take the video as the basis of propagation and the targets in the video as interaction contact points: based on presets made by the video generating personnel, a target is triggered in the video and then displayed in a preset manner, or a next instruction is executed. This improves the propagation efficiency of the video in a multidimensional way, effectively making dynamic what technologies such as Flash and H5 handle with fixed controls, and improving both the propagation efficiency and the appearance of the video.
Consider a video in file form, for example a video teaching material recorded by a teacher for students to watch and study remotely (video on demand; the file may be in slice form, and the present disclosure does not limit the on-demand form). It contains planar target information, for example a picture of a classical oil painting that the teacher uses to explain technique and characteristics. On the teacher's computer screen the oil painting may be a very clear 10-megabyte image file, but after recording, the painting's information in the recorded video (defined in the present disclosure as a target in the video) becomes mere video content, far below the minimum standard required for professional study. If the students then watch the video on a mobile phone, the transmission effect actually achieved falls far short of the intended effect: the oil painting as taught in the video cannot meet professional requirements and its details cannot be clearly perceived, because the prior art simply broadcasts the video directly.
Similarly, when a doctor uses a video to share medical experience and explains how to diagnose diseases with nuclear magnetic resonance imaging pictures, once the pictures become video they lose their value for medical study and judgment: people watching the video cannot form a closed loop of cognition and confirmation based on the imaging information in the video (the information of the targets in the video); for example, nodules in the pictures cannot be recognized in the video at all. The current video transmission mode thus substantially hinders the effective transmission of information in many respects.
To solve these problems, consider A01, the editing, recording, or generating personnel of a video (defined in the present disclosure as the video generating personnel). There is first a video file V01, assumed to be a teacher's recorded lecture (or a doctor's recording; the disclosure merely gives examples). Another file is F01, the information corresponding to a target in the video (the file, digital link, etc. corresponding to a target are all defined in the present disclosure as the target's preset information). This information may be a file, such as a medical picture file, a clear photograph of an oil painting, or a designed vector file; for example, for the planar targets above, the classical oil painting or the imaging picture information. There may be multiple targets in a video, and each target may have a corresponding file, resource link, or even video stream in different forms; each target is preset with corresponding preset information, which may take the form of a file or a digital link.
To achieve a good effect, a functional module D01, the target data generating module, is needed. This module may be included in video editing software or in the terminal software of a video playing platform, or may be a function of video application software. When A01, the video generator, imports the video into the D01 module — for example, by opening the video with a program containing the D01 module — A01 can use the D02 target delineating module or function. In the display interface of the imported video, the video generator sets and selects targets to delineate, such as the oil painting in the example above. A01 delineates each target with a geometric shape — a circle, ellipse, rectangle, polygon, or the like (usually a rectangle) — so that the target is enclosed within the closed geometric shape. Alternatively, an automatic mode may be used: A01 triggers (clicks) a target in the video image through human-machine interaction, so the program knows that the image region containing the triggered position is the target; an image segmentation method is usually adopted, and the region containing the clicked position is segmented and identified as the target. Under A01's settings the function of the target is defined, and the identified target frame is presented for confirmation or detail adjustment. Thus, in target delineation, a manual mode, or a combination of manual work with artificial intelligence or image processing, may be adopted.
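The result of the delineation step can be pictured as a simple data record per target. The sketch below is illustrative only; the field names (`target_id`, `shape`, `bbox`, `preset_info`) are assumptions for explanation, not a format defined by the disclosure.

```python
from dataclasses import dataclass

@dataclass
class TargetAnnotation:
    """One target delineated by A01 in the D02 step (illustrative sketch)."""
    target_id: str    # unique code of the target within the video
    shape: str        # "rectangle", "ellipse", "polygon", ...
    bbox: tuple       # (x1, y1, x2, y2) of the enclosing frame, in pixels
    preset_info: str  # file path or URL of the target's preset information

# Example: the oil painting target from the lecture video
painting = TargetAnnotation(
    target_id="001",
    shape="rectangle",
    bbox=(100, 40, 500, 800),
    preset_info="paintings/oil_painting_hires.jpg",
)
```

Each annotation pairs the delineated region with the preset information (F01) that the viewer's terminal will later retrieve.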
For video with little change in content — for example, a screen recording in which each element of the screen layout changes little over a period of time — the manual mode is entirely sufficient. For video recorded on site, however — for example, a video of an economics teacher giving a lecture — the chart information in the background (usually projected information) may shift, change size, or fall out of focus as the teacher (the camera's focus) moves. In reality, the chart in the background may be the information (the detail information) the viewer cares about most, yet it may be blurred and unfocused as shot by the camera; the person editing and generating the video would therefore select the chart display area of the background as a target area.
Video object detection has long been a key technical direction in the AI and computer vision fields; it is a field in which many implementations have appeared and in which better results are continually being sought. The present disclosure is not limited to any specific detection method; rather, any technical approach consistent with the appended claims is a specific embodiment of the present disclosure.
In addition, if a product is the core subject expressed in a video — for example a video containing advertisements — the texture of many commodities is not conveyed by the video even though the product is placed in the central position, especially in video made without professional shooting and processing; it cannot attract the attention of consumers, so the viewer feels nothing. One reason a large number of textured commodities sell poorly in current live video commerce is precisely this: in live broadcast, textured commodities lose their texture. Products with fine detail need those details displayed, and the live broadcast mode cannot display them, which hinders consumers' willingness to buy.
In video, a stereoscopic target such as a commodity is usually moving in the image — in a pulley advertisement, for example, the product is typically shown in motion. Therefore an object detection module D03 is needed: based on the target selected with D02, D03 detects and frames the target in the video along the video's timeline, forming a target frame and marking frame (reducing the amount of manual work).
The technology behind D03 has been developed for many years and includes artificial intelligence methods, computer vision methods, or a combination of both; the present disclosure is not limited to a specific implementation path. It should be noted that the present disclosure first uses a manually selected target, i.e., the D02 mode, because conventional object detection searches for any target in the whole video, whereas the problem to be solved here is to track a specific target — not to find every moving object in the video — and to ignore the background. That is, starting from the target selected via D02, D03 tracks it and marks its target frame along the video timeline, so that as the video changes, the size and position of the target frame are adjusted accordingly. In this way, position data of the target is obtained along the video's timeline (generally a rectangular frame is returned for labeling the target); this is what the D01 functional module implements. In the data generation process there may be multiple targets in a video; each target corresponds to a unique target number, so that the particular target position can be uniquely determined when a viewer clicks in the video.
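The D03 output just described — one target frame per target, tracked along the timeline — could be collected into timeline-keyed records. The structure below is a minimal sketch under assumed names, not the disclosure's actual data format.

```python
def generate_target_data(tracked_boxes):
    """Turn per-frame tracker output into timeline-keyed target data.

    tracked_boxes: dict mapping target_id -> list of (frame_no, bbox),
    where bbox is (x1, y1, x2, y2). Returns records sorted by frame so
    the player can look up every target frame present at a given moment.
    """
    records = []
    for target_id, boxes in tracked_boxes.items():
        for frame_no, bbox in boxes:
            records.append({"frame": frame_no, "target": target_id, "bbox": bbox})
    records.sort(key=lambda r: (r["frame"], r["target"]))
    return records

# Two targets: the background chart ("001") drifts slightly between frames;
# the hand-held object ("002") appears only in frame 0.
data = generate_target_data({
    "001": [(0, (100, 40, 500, 800)), (1, (102, 41, 502, 801))],
    "002": [(0, (600, 300, 700, 450))],
})
```

Sorting by frame lets the playing terminal index into the records with the current frame number (or timestamp) when resolving a click.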
Without D03, A01 personnel can manually process the target range and generate data on the basis of video frames (e.g., I-frames) and the timeline. With D03, the target data can be generated automatically — for example in a live broadcast, where the target range and data cannot be adjusted manually in real time, D03 can be adopted and the target data generated automatically by artificial intelligence or computer vision techniques.
A video may contain multiple targets processed by D01. For example, in a video of an exhibition hall, the footage corresponding to one shot may contain several targets (multiple oil paintings and sculptures), so D02 can be used for selection and D03 for automatic detection to generate the respective target data of each. A documentary on a knowledge subject may likewise contain many targets — sculptures, frescoes, historical documents, cultural relics — which cannot be shown one by one in detail in video form; the content of a document, for example, cannot be displayed clearly. Target data can therefore be generated for the video according to D01. A viewer then clicks a target of interest, such as a document shown in the video; because the user terminal program determines that the clicked position falls within the range of a specific target at the current video position, it judges that the user has clicked that target. On the user terminal side, the preset information corresponding to the target (for example, the document's content presented as a PDF) is read and displayed on the terminal interface, so the user can examine the target in detail — for instance, zooming into a clear picture to view specifics that could only be glimpsed in the video.
The foregoing examples describe the input and import of video files; the following describes live recording, such as live video.
When a video recorder uses a V02 video capture module (video shooting and recording equipment), the digital video stream can be input into the D01 module. The A01 person then uses the D02 function to select and outline the target to be marked; under the D03 function, as the lens of the V02 capture device changes over time, the target is detected and tracked as it moves and changes, generating target data and framing its region (the target frame). A terminal watching the video also receives the generated target data. When the viewer clicks the target, the terminal program judges, from the clicked position falling within the target's data range, which target was clicked; the terminal then reads the target's preset information and displays it on the user terminal, so the user can clearly see the information of the target of interest, or information different from the video — for example, for a book shown in the video, after the user clicks it the terminal displays PDF pages of the book.
The V02 video capture module generally comprises an optical imaging module, a digital imaging module, and the like; it converts the optical signal into a digital signal, which is imported into D01 in encoded video form (common capture devices such as a mobile phone or video capture card encode the optical signal into a video stream).
When A01 — the video editor or the person recording and generating the video — generates the target data in step D01, the files of the corresponding targets (the targets' preset information) should be prepared in advance, such as the charts and tables used by an economics teacher or the photographs of classical oil paintings used by an art teacher. These may be high-resolution picture files, vector files, or digitally linked files. At the video playing end, after the user clicks a target, the terminal program or a background program determines which target was clicked and displays that target's preset information. The terminal user can then magnify the details of the information on the terminal side, as if examining the object on site with a magnifying glass, achieving the effect of observing the real object in person.
For stereoscopic targets such as commodities, the preset information may be photographs, vector diagrams, or design drawings of the target, or even a dedicated video — including special imaging equipment capturing dedicated footage, for example for live jewelry sales or sculpture artworks, where camera equipment and lighting are set up and the target moves regularly (such as rotating under the light). If a viewer watching the anchor's video clicks the target object, the viewer enters the video resource (a URL resource link) that specifically displays that object — in effect calling up another online video and playing that stream on the terminal. The fine-detail video enhances the transmission effect of the original: the first video is used for broad transmission, and a user reached by it who becomes interested clicks the target and enters the second video, gaining better understanding and feeling of the target. The two videos can of course be displayed in picture-in-picture form on the user terminal side.
The target file F01 of a video may be one or more files, and the preset information of one target may itself comprise several files (for example, a boxed set of books appears in the video; the set contains different volumes, and after the target is clicked the terminal reads the PDF of each volume, so that the person clicking can preview the PDFs and decide whether to purchase the set). This satisfies both A01's goal for the broadcasting effect of the video and the need of the viewer — the terminal user U01 — to obtain target information from the video. Some frames or segments of one video may contain multiple targets, and different segments may contain different targets; each target may correspond to one file or several files (each target's own preset information). After the target data is generated, the user terminal determines, on the basis of the timeline and that data, that a target has been clicked; it then obtains the preset information corresponding to the target — either the terminal actively reads the preset information file, or the file is pushed to the terminal.
The target data generating module may of course also encode the file F01 together with the video into a new-format video file (current video coding standards cannot apply video this flexibly), so that the new-format file is transmitted and imported into T01, the video propagation module, as a complete file or data stream.
Since today's video standards do not support this flexible mode, a conventional video file together with the F01 file can instead be transmitted to the T01 video propagation module, for the viewer's terminal to retrieve and display after the target is clicked. In the T01 module, the video, the targets, and the URLs are associated: because the video is uniquely encoded and each target is uniquely associated with its URL or file (each target has its own preset information), each target is also uniquely encoded within the video. Thus, when the user clicks a certain target of a video displayed on the terminal, the terminal program judges from the clicked position falling within that target's range that the target has been triggered, then reads and displays the file or URL (the preset information) corresponding to it. Because target detection supplies the target's position-range data, the program can judge that the click falls within the target's current range in the video (usually the target frame) and hence that the target was triggered.
The present application does not limit whether the file corresponding to the target is combined with the video into a new unified video file or kept in a separated state — i.e., the video separate from the target's preset information, with the terminal retrieving and reading the preset information on the user's demand.
In addition, the generated target data may be stored in a video file of a set format, forming a new video file; or transmitted to a database corresponding to the T01 module (in database form — for example, a real-time database for live broadcast); or kept in file form (for example, for non-live video). The terminal program that plays the video reads the target-generated data at the same time as it reads the video file (the terminal need only read the video by its ID to also obtain the target-generated data). Each of these is a specific embodiment of the present disclosure.
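One of the separated-storage options above — target data kept in file form alongside the video, retrievable by video ID — could take the shape of a sidecar file named after the video's unique code. The layout and field names below are assumptions for illustration, not a format specified by the disclosure.

```python
import json
import tempfile

def save_target_data(video_id, records, directory):
    """Write generated target data as a sidecar file named after the
    video's unique ID, so a terminal that reads the video by ID can
    also fetch the target data (separated-storage case)."""
    path = f"{directory}/{video_id}.targets.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"video_id": video_id, "targets": records}, f)
    return path

def load_target_data(video_id, directory):
    """Terminal side: retrieve the target data by the video's ID."""
    with open(f"{directory}/{video_id}.targets.json", encoding="utf-8") as f:
        return json.load(f)

workdir = tempfile.gettempdir()
save_target_data("V01", [{"frame": 0, "target": "001",
                          "bbox": [100, 40, 500, 800]}], workdir)
loaded = load_target_data("V01", workdir)
```

For live broadcast the same records would instead be pushed through a real-time database, as the text notes; the file form suits on-demand video.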
In the present disclosure, the T01 module may further include a database. For example, when the video file, the target files, and the generated target data are separated, the user terminal can read not only the video file but also the D01-generated data associated with it and the files or preset information corresponding to the targets. Unstructured data may use an unstructured database, or a structured database may be combined with a file service. The present disclosure does not limit the specific implementation of the T01 module; any mode consistent with the present disclosure is a specific embodiment thereof.
In addition, for live, broadcast, and other streaming video, such as video captured by V02, the data generated by the D01 portion is associated with a unique code of the stream (e.g., a video ID, or a serial number generated specifically from the video so that its code is unique); when the user terminal invokes the stream, the data can be extracted from the database on T01.
The T01 module is usually interconnected with external user terminals through a network and usually comprises servers, storage, databases, and the like, so that terminal users can access it to acquire video files, video streams, or the related data, files, and information corresponding to targets (the targets' preset information). In the T01 module the video is usually uniquely encoded (a video ID); when the components are stored separately, the playing terminal can read, based on this unique code, the generated data associated with the video and the targets' preset information.
On the video user side — generally a user of a smart television or smart terminal — the viewer uses a program in the user terminal to watch the video: specifically, the video information is read from the T01 video propagation module, and file streaming (including slices) or video streaming technology is used to display and play the video file or video stream on the display device/module of the local terminal.
Because the user terminal U02 can read from T01 both the target-generated data and the files corresponding to the targets (the targets' preset information), when the terminal plays the video it can, based on the video and the corresponding target data, display the frame of each target (or not) — such as the oil painting, the table, or the stereoscopic target in the examples above. If the user, knowing this, clicks or triggers the region within a target frame, the user terminal program invokes the F01 file corresponding to the video on T01 — that is, the preset information of the triggered target — and reads and displays the corresponding file on the terminal according to the clicked target (prompting may be by voice, subtitle, or other explicit or implicit means).
Suppose a screen-recorded video contains an oil painting, which the video generator marks as a target, generating target data. When the terminal program reads the video it also reads the associated target data; then, when displaying the video, it marks the target frame (or leaves it unmarked) on the displayed image according to the target's position data, the displayed image size, and the data's timeline. When the user clicks the target, or the region within the target frame, the user terminal opens a display interface showing the target's information.
For example, suppose the target is a design drawing. When transmitted as video — say, a designer explaining it on camera — the details of the drawing (for example, of a large building) cannot be seen clearly at all. Implemented in the manner above, the end user can, while listening to the explanation, click the target in the designer's video; the design drawing is then opened as described, and the user can freely magnify the vector drawing, grasping the details better, so the designer's explanation has a better effect.
As a special case, if the video is live and the person broadcasting wants to adjust a target — for example, enlarging and focusing on a certain point — then as the broadcaster operates, the enlargement data and the displayed position of the file can be read by all watching terminals in equal proportion. The user side then sees not only a clear file but also the very content the broadcaster is focusing on, achieving a higher transmission effect: when the generating side adjusts the target, the preset information opened on the users' side enlarges as the generating side enlarges and moves as the key point moves, greatly improving transmission efficiency in both breadth and effect. The generating end can delineate targets in advance, and the video-side personnel can also, at a specific moment during the live broadcast, click a target; because the target has been delineated and its data generated on the generating side, and the trigger or click position falls within the current generated data range of a certain target (generally a rectangle), T01 can send data to each watching terminal program. That data drives each terminal program to read the preset information of the triggered target and display it in proportion to the generating side. In this way, the broadcaster can, according to the situation of the live broadcast, direct remote users to view the target's preset information in a clearer manner, improving the explanation effect. (In implementation, the terminal generally receives from the generating side the data of the target content's size, display proportion, and display position, and displays the target's preset information accordingly — for example, if the preset information on the generating side is a five-page PDF automatically turned to the second page, the user side is likewise turned to the second page.)
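The generator-driven synchronization described above — the live side enlarging or paging the preset information, with every watching terminal mirroring it in proportion — amounts to broadcasting a small control message through T01. The message fields below are illustrative assumptions, not a wire format defined by the disclosure.

```python
def make_control_message(target_id, zoom, center, page=None):
    """Control message the generating side sends (via T01) to every
    watching terminal: which target's preset information to show, at
    what zoom, centered where (as ratios of the content's width/height),
    and, for paged documents such as a PDF, which page."""
    return {"target": target_id, "zoom": zoom, "center": center, "page": page}

def apply_control_message(msg, display_state):
    """Terminal side: mirror the generating side's view in proportion."""
    display_state[msg["target"]] = {
        "zoom": msg["zoom"], "center": msg["center"], "page": msg["page"]}
    return display_state

# Broadcaster zooms 2x into a point on target "001" and turns to page 2;
# each watching terminal applies the same view.
state = apply_control_message(
    make_control_message("001", zoom=2.0, center=(0.5, 0.3), page=2), {})
```

Expressing the center as ratios keeps the mirrored view proportional regardless of each terminal's screen size, consistent with the relative-value scheme used elsewhere in the disclosure.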
There is also the case where, after the user's terminal reads and plays the video, the file information corresponding to a target is displayed to the user passively. That is, the video producer or editor, well aware of the defects of traditional video transmission, sets a rule (a preset instruction) for when the target appears: when the terminal playing system reads and plays the video, it automatically reads and displays the file corresponding to the target according to the rule, and the viewer passively sees the target's preset information. The passive viewing mode can moreover follow a display path designed by the A01 person — from macroscopic to microscopic, from boundary to core — with the user software displaying the preset information after the target has been on screen for a period. This mode can rapidly improve the effect of video transmission (its effect is equivalent to synchronization driven by the generating side: the viewing side receives the generating side's control data for the preset information and displays it as remotely directed).
Of course, since the viewer of a video purposefully selects only the target content of interest, there is also a form in which the user side watches the video without any mark, such as a target frame, indicating the target; instead, voice information, prompt information, or a presenter in the video prompts the viewer to click the target area, and after clicking it, the viewer sees the preset information associated with the clicked target.
In addition, since the user terminal program reads the target-generated data associated with the video when it reads the video, it can, based on the target data corresponding to the clicked frame (or the data corresponding to that frame's time), determine whether the clicked position falls within the range given by the target data; if so, this indicates the user wants to see the target's detailed information, and the terminal displays the preset information corresponding to the target.
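The click resolution just described — map the click point into video coordinates, then check it against each target's frame for the clicked video frame — can be sketched as follows. The use of normalized (relative-value) bounding boxes is one option the disclosure discusses for its transcoding robustness; the function names are assumptions.

```python
def hit_test(click_xy, display_wh, frame_targets):
    """Return the id of the target whose frame contains the click, or None.

    click_xy: (x, y) of the click in display pixels.
    display_wh: (width, height) of the video as laid out on the terminal.
    frame_targets: list of (target_id, bbox) for the clicked frame, with
    bbox = (x1, y1, x2, y2) as ratios of video width/height (relative values).
    """
    nx = click_xy[0] / display_wh[0]  # normalize the click to 0..1
    ny = click_xy[1] / display_wh[1]
    for target_id, (x1, y1, x2, y2) in frame_targets:
        if x1 <= nx <= x2 and y1 <= ny <= y2:
            return target_id
    return None

# A click at (90, 150) on a 360x640 phone display falls inside target "001";
# the terminal would then fetch and display that target's preset information.
targets = [("001", (0.05, 0.04, 0.45, 0.74)), ("002", (0.6, 0.3, 0.9, 0.7))]
clicked = hit_test((90, 150), (360, 640), targets)
```

Because the boxes are stored as ratios, the same test works unchanged whatever the terminal's display size, matching the relative-position reasoning given for transcoded video.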
After a video has been processed by the above steps and method, the user can, in the user program, click a target according to a prompt and then see the information the video itself cannot express in detail. In a specific situation (a special case), when the video generator — for example a teacher — is speaking about a target's content, the target is triggered by a click on the generating or live side; the program determines which target was clicked and notifies, through T01, the user programs watching the video, and the clicked target's preset information is read and displayed at each end.
The above describes the mode in which the target corresponds to a file. If the target corresponds to a resource link such as a web page, the logic is the same: when the data for a target in a video is generated, the target is associated with the URL of a dedicated page or file; after the user clicks the target, the user terminal APP accesses the web page corresponding to the URL. That page may also carry information such as descriptions of the target, communication, and reviews, which for commodities can directly drive traffic and sales.
For example, a placed commodity (target) appears in a movie or series. Although the commodity is embedded in the video and the director may make it prominent (through shooting technique), it is after all displayed in only a small area of the frame. If the user clicks the commodity (target), a web page directly related to it, or a page within the terminal program, opens to display the commodity details and complete the purchase — achieving both commodity advertising and sales conversion.
Therefore, the preset information F01 related to a target may be a file corresponding to the target — such as a video, picture, vector file, or table — or a network link, or a page of the terminal software.
An embodiment is described below taking fig. 2 as an example. S01 is the image of the Nth frame in a video, in which there is a person S110, a hand-held object S103, and projected information S102. Assume S110 is an economics lecturer and S102 is a projected data table. After the video is transmitted to a terminal, the information in S102 cannot be seen clearly because of encoding and the terminal's size; and S110, while speaking, also holds up a product such as S103 — a microphone of a certain brand, or one wrapped with a sponsor's mark.
The video is composed of images such as fig. 2 along a timeline. If the camera position changes, S110 will move at different moments and the hand-held S103 will move with it; S103 is a three-dimensional object, while S102 behind is a planar object. When the video is imported into the D01 function for processing, the person editing, creating, or recording the video frames or outlines the target S102 with target frame S122, and uses S123 to outline the target S103.
Because each frame in the video has a corresponding frame sequence number or time, with the frame value or time value as the key and each target encoded (for example, S102 is target 001 and S103 is target 002), each delineated target frame has corresponding position parameters (the target's generated data) — X and Y values generated with respect to the video, for example the position values of the four corners of the S122 rectangle. As the video progresses, the camera angle may change and the lens may zoom, so the values of S122 change; but because D01 generates the target's data keyed to the video's time or frame value, the target frame of target 001 has corresponding values throughout. Similarly, S103 generates position values of its target frame against each video frame or video time. Taking a rectangular frame as an example, it has four corners with corresponding values: the four corners of rectangle S122 in fig. 2 might be (X1=100, Y1=40), (X2=500, Y2=40), (X3=100, Y3=800), (X4=500, Y4=800), i.e., true pixel values (absolute values). Alternatively, ratio values relative to the video's pixel dimensions may be used: for 1080P video, 1080 pixels vertically and 1920 horizontally, the ratio relationship is (X1/1920, Y1/1080). When the original video is transcoded — say, to 720P — absolute values can no longer locate the correct target frame position, whereas relative positions (ratio values) do not change with transcoding, since the image's proportional relationships still apply on the user side; the target's position in the transcoded video can be determined and a click on it correctly identified (e.g., via conversions such as x1/1920 × 1280, y1/1080 × 720). Relative position is, of course, a normal technique in video processing; the present disclosure only notes that for transcoding scenarios either absolute or relative values may be considered when generating target data, and adopting relative values in a transcoding environment makes the various propagation-quality variants (transcodes of the original video) more convenient.
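The relative-value scheme above can be checked with a short computation: normalize a corner against the source resolution, then rescale to the transcoded resolution. The 1080P-to-720P figures follow the example in the text; the function name is an assumption.

```python
def rescale_corner(x, y, src_wh, dst_wh):
    """Convert an absolute corner position from the source resolution to a
    transcoded resolution via the relative (ratio) representation."""
    rx = x / src_wh[0]  # relative values: unchanged by transcoding
    ry = y / src_wh[1]
    return rx * dst_wh[0], ry * dst_wh[1]  # absolute in the new resolution

# Corner (X1, Y1) = (100, 40) of rectangle S122 in 1080P (1920x1080),
# transcoded to 720P (1280x720):
x720, y720 = rescale_corner(100, 40, (1920, 1080), (1280, 720))
```

Storing only (rx, ry) in the target data means the same record serves every transcode and every terminal display size; the terminal multiplies by its own dimensions when drawing the frame or resolving a click.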
When a user watches the video on a small mobile phone screen, the video is scaled down according to the screen size when displayed. Using ratio values — relative values — therefore reduces the computational complexity of the scaling when a click on a target is resolved; otherwise the terminal would need to know the format of the original video as produced and then recompute for the transcoded video and any terminal-side changes.
In fig. 3, assume the video software in the user terminal is playing the video according to the method above, where portion S211 corresponds to the content of fig. 2. A target may be displayed to the end user with its target frame shown, or displayed normally without it. Displaying the target frame has the disadvantage of disturbing normal viewing — the viewer's view is intruded upon even when they are not interested in any target — so an always-visible target frame can be a sign of a poor video effect; in fig. 3, for example, the hand-held target carries no target-frame mark. In the future, if the target image is rendered in a stereoscopic or raised mode, the human-machine interaction will be more ideal, expressing that a target has been set and can be triggered. The target frame may of course be displayed only when playback is paused, or shown in a stereoscopic or highlighted mode during real-time play — much as current video games display interactable objects in a preset, friendly mode throughout, inviting the player to trigger them.
However, as described above, the person S110 in fig. 2 can tell (prompt) the viewer in speech, prompting the video viewer to click the stereoscopic object in his hand, or the chart, to see more detailed preset information; or the user can be prompted by a subtitle on the video interface — for example, "please click the chart on the right" — so that clear chart information can be seen.
In the present application, because in step D01 an editor selects targets, each target generates data along the video timeline. When the user's player reads the remote video, the generated target data is read together with it. When the user clicks a target in a specific frame, the terminal program obtains the position of the click point, combines it with the size of the video playback area and its position in the layout, and then, together with the target data of that frame (the position data of the target frames), computes which specific target the user clicked. The terminal can then read the file corresponding to that target (the target's preset information), so that the user can clearly see information that cannot be seen clearly in the video itself: for example, if S102 is a table file, the terminal opens the table file, and if S102 is a picture of a table, the terminal displays the file corresponding to that table picture.
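The hit test described above, i.e. checking the click point against the target-frame data of the current frame, can be sketched as below; the record layout and identifiers (reusing S102/S103 from the figures) are illustrative assumptions.

```python
def find_clicked_target(frame_targets, vx, vy):
    """frame_targets: list of (target_id, x, y, w, h) bounding boxes for the
    current frame, in original-video pixel coordinates.
    Returns the id of the target whose box contains (vx, vy), else None."""
    for target_id, x, y, w, h in frame_targets:
        if x <= vx <= x + w and y <= vy <= y + h:
            return target_id
    return None

# Two targets set for this frame: a table (S102) and a hand-held item (S103).
targets = [("S102", 1200, 300, 400, 250), ("S103", 500, 600, 120, 160)]
find_clicked_target(targets, 1350, 400)   # -> "S102"
find_clicked_target(targets, 10, 10)      # -> None
```

Once an id is returned, the terminal looks up and displays that target's preset information (file, picture, link, and so on).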
From fig. 2 and fig. 3 it can be seen that, using the terminal examples of the creator or editor side and of the video-viewer side, information that might otherwise not be visible at all can be displayed clearly in the above manner. This solves one of the defects of ordinary video transmission: the loss of definition caused by encoding all target information uniformly with the rest of the picture.
Suppose fig. 2 and fig. 3 depict a live broadcast, and the article in the presenter's hands is the article being promoted. Under normal conditions the user cannot see the details of the product clearly and can only imagine them. With the method of the present disclosure, when the user clicks in fig. 3 on the target corresponding to S103 in fig. 2, the user terminal displays a clear photograph, a design drawing, or a live or recorded video of S103 shot under good lighting with good photographic equipment, so that the user at the video-playing end can examine the target commodity carefully.
Of course, the video user can also interact with the producer or service team of the video in this situation, for example through the communities and instant-messaging systems overlaid as in the present disclosure. When a user enters the detail view, the producer or marketer knows clearly that an interested person is paying attention to the goods or information, and the service team can then provide more information and interact with that viewer. The service team may also be an AI robot, for example one based on a large model, providing more knowledge and information to the individual and thereby enhancing propagation efficiency (of course, these systems and the terminal software need to be interconnected with the corresponding service system).
In a specific example, after a video user clicks a target and enters the interface displaying the table of fig. 2, the system notifies the AI robot, and the AI robot explains to the user the purpose of making the table, how its data were formed, and its conclusions. These explanations may be information with which the original lecturer trained the AI robot; the AI robot can thus influence the video audience more deeply, so that the viewer learns more than from the video alone.
If the content of fig. 2 shows, say, a parameter comparison of products, a shopping guide of the product vendor, whether an AI robot or a real person, can talk with the person examining the details and further guide the purchase to completion.
In the situation above, on the video-editing or live-broadcasting side, an artificial-intelligence or computer-vision technique, namely the D03 target-detection technology or function, is adopted: when a target is set, the program tracks the target along the video timeline and generates target data. On the user side, when a video target is clicked, the frame number or video time value is known; combining the target data that was read with the clicked position in the displayed video, the specific target the viewer clicked can be determined. Because this mode, with target frames not displayed, is indistinguishable from normal video playback, the viewer's experience is unaffected, while interested users can learn about the target content in depth.
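The per-frame target data that D03-style tracking generates along the timeline might look like the following; the record layout and field names are our assumptions, with the tracker itself treated as a black box.

```python
def generate_target_data(tracked_boxes):
    """tracked_boxes: dict mapping target_id -> list of
    (frame_number, x, y, w, h) tuples produced by a tracker.
    Returns a timeline index: frame_number -> list of (target_id, x, y, w, h),
    which is what the player downloads alongside the video and consults
    when the viewer clicks during that frame."""
    timeline = {}
    for target_id, boxes in tracked_boxes.items():
        for frame, x, y, w, h in boxes:
            timeline.setdefault(frame, []).append((target_id, x, y, w, h))
    return timeline

# One hand-held target (S103) tracked across two frames.
tracked = {"S103": [(0, 500, 600, 120, 160), (1, 505, 598, 120, 160)]}
timeline = generate_target_data(tracked)
# timeline[1] -> [("S103", 505, 598, 120, 160)]
```

Indexing by frame number keeps the click-time lookup to the current frame only, which is why the final match on the user side is fast.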
In addition, the terminal program of the present disclosure is also suitable for XR terminals (VR, AR, MR). When watching a video, the user clicks a target through human-machine interaction and the detailed information of the target is displayed on the terminal; at the same time the terminal can be served by customer service, and the messages and discussions of other viewers can be seen. Individual video viewers can thus interact with one another, on any kind of terminal, without affecting the viewing of the video, overcoming the drawbacks of bullet-screen comments, which obstruct viewing and are not conducive to direct, relatively deep expression.
In the description above, the target in the video is clicked, i.e. triggered, and its information is then obtained. In human-machine interaction the click may come from a mouse, from a finger on a touch screen, or from any other agreed gesture such as a double click on the target object; the present application does not limit the specific trigger mode (clicking, double-clicking, and so on). Any scheme in which the user triggers a target in the video in an agreed manner, and the preset information corresponding to that target is then determined and read, is a specific implementation example of the modes described in the present disclosure.
The above method generally needs to be implemented on both the video-source side and the video-playing side. On the source side, the target data and the file corresponding to each target are generated; the user side obtains the target data when reading the video, and after a target is triggered and identified by the system, the terminal program displays the file corresponding to that target, achieving the effect of displaying the target clearly.
The above method is not ideal for targets whose content does not change in practice, such as an oil painting: running target detection on them is obviously inefficient and generates unnecessary data. For content that does change, such as a PPT being paged through by the presenter, or a video playing within the video that is itself set as a target, the content of the target changes over time and the target-detection method is appropriate.
In the above manner, the judgment performed after the user triggers a target may also take place in the background. For example, after the user clicks a target in the video, the terminal feeds the clicked position data and the click time or frame information back to the server; the server judges, from the video and the frame or time information, which specific target frame contains the clicked area, determines the clicked target, and feeds the information back to the terminal of the user who clicked, which then reads the preset information of that target. In other words, the work of judging the clicked target is handed to the background system. The present disclosure is not limited to either mode: the user-side software may acquire the target data and judge the clicked target itself, or the user side may simply report the click and let the server judge. Both modes solve the problem addressed by the present disclosure; only the method differs.
The above embodiment clearly solves the problem that target information in an ordinary video cannot be effectively displayed, but it has a cost: for a video 120 minutes long at, say, 30 frames per second, a great deal of target data may be generated, consuming computing resources. We therefore provide a 2nd method for achieving the same purpose, which relies on the fast processing capability of the terminal. Specifically, as in fig. 4, a video editor or recorder (video producer) uses the program of the U10 video editing or generating end to import the video file V01 or the digital video stream acquired by the V02 video acquisition module, together with the preset information corresponding to each target in the video, such as files, digital links, and a picture F02 corresponding to each target. F02 is used to extract the image features of the target, so it need not be identical to the preset display information (which may be several kinds of files or digital links); in this 2nd method F02 is only a picture used for comparison. A single target may also be given several pictures from different angles, since the user may click the target when it is shown at a different angle or in a different position. F11 is the feature extraction of the target image; this part can be placed in the T01 module or in the U10 module, and the disclosure does not limit where. The video is finally imported into T01 through U10, together with the file corresponding to each target in the video (contained in the preset information) and the features of each target. The user accesses T01 with the terminal program, obtains the video information, and plays the video; because the video coding is unique, the feature information of the targets corresponding to that video (keyed by its unique coding information) can be extracted as well. When the user sees a target in the video and clicks to trigger it (in substance, triggering the area covered by the target), the terminal software extracts the image information and features at the triggered, clicked position (the x and y values) in the video frame, compares them with the feature information of the one or more targets read from T01, and judges which target corresponds to the clicked position. If a matching target exists, that target is judged to have been clicked, and the terminal program displays the preset information of the triggered target. It is of course also possible to search the current video frame for the area matching each target's features: if the extracted x, y values fall inside the matching area of a certain target, that target is determined to be the one triggered by the end user, and its preset information is then displayed on the screen of the user terminal.
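The comparison step of method 2 can be sketched as below. A real system would use descriptors such as SIFT or SURF; here a normalized intensity histogram stands in as the "feature" so the sketch stays dependency-free, and all names and the distance threshold are illustrative assumptions.

```python
def feature(patch):
    """patch: 2-D list of grayscale pixel values (0..255).
    Returns an 8-bin normalized histogram as a stand-in feature vector."""
    bins = [0] * 8
    n = 0
    for row in patch:
        for v in row:
            bins[min(v // 32, 7)] += 1
            n += 1
    return [b / n for b in bins]

def match_target(patch, registered):
    """registered: dict target_id -> feature vector (from the F02 pictures).
    Returns the best-matching target id, or None if nothing is close enough."""
    f = feature(patch)
    best, best_dist = None, 0.5          # acceptance threshold is an assumption
    for tid, g in registered.items():
        d = sum(abs(a - b) for a, b in zip(f, g))
        if d < best_dist:
            best, best_dist = tid, d
    return best

bright = [[250] * 4] * 4                 # clicked patch: uniformly bright
registered = {"red_packet": feature([[255] * 4] * 4),
              "book": feature([[10] * 4] * 4)}
match_target(bright, registered)         # -> "red_packet"
```

Note that no per-frame target data is needed: comparison runs only after a trigger, which is exactly the trade-off the disclosure describes next.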
Compared with the target-detection method, the above feature-comparison method saves unnecessary calculation: method 1 in theory frames the target in every frame to generate target data, which requires a certain amount of computation, while the feature-comparison method only starts comparing after a target is triggered and then judges which target it was, so the computation is relatively smaller; its real-time performance, however, is inferior to that of the target-detection method.
After a user triggers a target in the video by clicking, double-clicking, or another trigger mode, the video-playing program on the user side first obtains the triggered position on the touch screen, then calculates the clicked position within the video image from the relative relation between the video layout and the screen (the display resolution differs from the video resolution, so scaling is needed, and the layout size must also be accounted for), and thus obtains the position of the click within the video frame. Image segmentation can be used to extract the whole target at the clicked region, or simply the image of that region; its features (for example SURF or SIFT) are compared with the features of the pictures of the several targets input in advance, and the clicked target is judged to be one specific preset target. The terminal program then displays the specific preset information defined by the video producer: the file corresponding to the target, a video stream such as a high-definition picture of the Mona Lisa, the PPT or PDF corresponding to the target, a real-time video stream, a design vector diagram, and so on.
For example, if a user clicks the red packet in the protagonist's hand in the video (such as the red packet held in S103 in fig. 2), the terminal program extracts the image of the red packet at the clicked target position (for example by segmenting or extracting the region image), then compares it, at the terminal or at the background server, with the features of the several target images input in advance, and determines the target at the clicked position, for example a red packet. Of course, one video may contain red packets of different types, such as red packets bearing "Fu" (good fortune) or wealth-related characters; after identification, each is handled according to the preset information of its own target. For instance, a person in the video holds a red packet whose cover bears a wealth character, speaks a blessing such as "happy new spring", and prompts viewers to click the red packet. Many people watch the video; after a viewer clicks the red packet held by the person and it is identified as the wealth-character red packet, the red packet (which can be a preset photo file of the red packet) is displayed on the terminal program, or the amount of the red packet (the target's preset information) is transferred, automatically or after the user clicks a control, into the clicker's account (the user clicks on the video terminal program to confirm receiving the red packet).
If the person in the video is selling goods directly and holds a red packet bearing the "good fortune" character, prompting the audience to click it, then the user program extracts the image and features of the target at the clicked position, compares them, in the terminal program or at the background server, with the feature images of the targets input in advance by the producer, and judges it to be the "good fortune" red packet. The information preset by the producer for this red packet is a discount card for the commodity: the user's video interface displays the discount card and discount amount, and the clicking user either manually clicks to confirm receipt or the discount card transfers automatically to the electronic card package. In this embodiment, since the cash or discount card is transferred automatically or manually into the card package or wallet, the system does not merely display preset information but completes the next instruction.
In the above embodiment, after a target is clicked or triggered, the terminal program segments or extracts the target image at the clicked position in the video frame (or extracts the image region containing the target or part of it), compares it with the features of the target pictures input in advance by the producer, and judges the target. It then displays the preset information corresponding to that target, or executes the next program or instruction: for example, the cash is transferred automatically into a digital wallet, or the discount card into a card package, after 5 seconds; or, to satisfy certain standard process requirements, a human-machine interaction element such as a button with a confirmation meaning appears on the terminal screen, and only after the user clicks it is the money of the red packet transferred into the clicker's digital wallet, or the discount card into the clicker's digital card package.
In the above disclosure, besides letting the user display the preset content of a target clearly by clicking it, content the video itself cannot show, making up for the deficiency of video, the following flow can be realized: the viewer is prompted by the video to click a specific target; the front-end program obtains the clicked position based on the layout and pixels of the video display and the screen pixels, extracts the target region from the video frame, compares it at the front end or back end with the features of the target pictures input in advance, determines the clicked target, displays that target's preset information, and may even execute a program or instruction according to the task the preset target should complete, such as transferring the cash of a red packet into the clicker's digital wallet or a discount card into the clicker's digital card package. Of course, clicking goods held by a person in the video can likewise be set to display the goods and transfer them, manually or automatically, into a shopping cart.
Of course, segmenting and extracting the target image around the click position is relatively computationally intensive (although existing algorithms and functions exist). A simpler treatment is to extract a fixed-size region around the click position, for example 150 pixels wide by 90 pixels high, or 90 by 90 pixels, then extract features from that region, compare them with the features of the pictures of the one or more targets input in advance, and judge the clicked target. Concretely, taking the clicked position in the video as the center and 150x90 pixels as the example, and assuming the clicked position is x = 300, y = 200, the two diagonal corners of the rectangle are (300 - 150/2, 200 - 90/2) = (225, 155) and (300 + 150/2, 200 + 90/2) = (375, 245). Image-processing functions commonly accept two opposite corners, so the image inside that rectangle can be extracted; the features of the extracted region are then matched against the features of each target input in advance by the video producer, the matching target is computed, and its preset information is displayed, or displayed while a program is executed, such as transferring the cash of a red packet into the clicker's electronic wallet in the terminal program. (The numbers above are only for illustration; in practical applications images of different sizes can be set, and other geometric shapes can be extracted, such as a circle centered on the contact point with a radius of, say, 90 pixels.)
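The fixed-window extraction above, using the text's own 150x90 example, can be sketched as follows; the border clamping is our added safeguard, not part of the worked example.

```python
def crop_around_click(frame, cx, cy, w=150, h=90):
    """frame: 2-D list of pixel rows. Returns the w x h region centered on
    the click (cx, cy), clamped to the frame borders (the clamping is an
    assumption; the text computes the unclamped corners)."""
    fh, fw = len(frame), len(frame[0])
    x1 = max(0, cx - w // 2); y1 = max(0, cy - h // 2)
    x2 = min(fw, cx + w // 2); y2 = min(fh, cy + h // 2)
    return [row[x1:x2] for row in frame[y1:y2]]

# 1920x1080 dummy frame; the text's click (300, 200) with a 150x90 window
# gives corners (225, 155) and (375, 245), i.e. a 150x90 crop.
frame = [[0] * 1920 for _ in range(1080)]
patch = crop_around_click(frame, 300, 200)
len(patch), len(patch[0])   # -> (90, 150)
```

The crop then feeds the same feature extraction and comparison path as a segmented target image, just with far less preprocessing.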
In the present disclosure, the method of determining the clicked target in a video is not unique, and existing techniques are not detailed here; but, as required by the claims, any method or technology that extracts the target in the clicked area, compares it with the features of the pictures of one or more targets input in advance by the video producer, determines the matching target, and then displays the information preset for that target, or reads the preset information and executes the next procedure, or displays a human-machine interface asking the user to confirm or choose before executing the next procedure, is consistent with the present disclosure.
In the above embodiments, executing the next procedure, such as transferring the cash of a red packet, usually requires the user to click for confirmation: after the user clicks the red packet and the click is identified, the cash amount preset by the video producer is displayed, the clicking user confirms on the terminal program interface that the cash is to be received, and after this click the cash is transferred.
Likewise, in a video selling goods, besides using the method of this disclosure to see a preset image clearly by clicking the target, the clicked goods can be added directly to the clicker's shopping basket: after identification and matching, the next procedure or instruction is set to add the target goods to the basket. This is the same as the red-packet embodiment, except that a shopping basket has a different form of representation from an electronic wallet or an electronic card package.
In an interactive movie, the technology of the present disclosure can also drive interaction within the video. In one scenario, a detective places a scene photograph on a table, hoping for help with the analysis (from both the characters in the plot and the viewers). A viewer watching this scene clicks the photograph in the video and, in the same manner as disclosed above, the photograph is shown clearly on the viewer's terminal program (the digital photograph is the preset information, and the photograph in the video is the target). The viewer can then analyze the photograph, for example discussing it in the film's interactive group, and participate in solving the case; as subsequent episodes follow, the director can modify the script and shoot according to the interaction and the information it produces, forming a deeply interactive film work.
Similarly in a movie: suppose a character picks up a book containing an encrypted password. Ordinarily the decryption in the film is irrelevant to the audience, but according to the present disclosure, if a viewer clicks the book placed on a table in the video, then after the comparison described above determines that the book target was clicked, the preset information, namely the content of the book, is presented at the user end. The viewer can search the book and try to decrypt the password; the film thus changes from one-way infusion into two-way deep interaction, and in the film's community, viewers can interact and cooperate to decrypt it. The present disclosure thereby gives film a new deep-interaction capability. In such a case, besides the preset information, the user terminal interface can also display an audience interaction area where the photograph or clue can be described and discussed, the preset information analyzed, and the audience's opinions and reasoning supported, so that the plot can move one way or the other with the audience.
In a film or TV drama, if the set target is a storage disk containing material, the traditional mode lets the audience follow only the characters' reasoning in the film; after adopting the technology of the present disclosure, a viewer can click a deliberately featured storage disk in the video, and once it is judged that the user clicked the disk, a file list on the disk can be displayed on the user terminal. Based on the list, the user can join the process of searching the files for clues. The information clicked and seen at the user terminal is again information preset by the video producer for that target; the interactivity simply simulates reality more closely (the content can be a prepared web page, and the stored content can be real files through which the user hunts for traces). The technology of the present disclosure thus clearly improves the interactivity and transmissibility of the traditional film and television industry (vigorous discussion among the audience attracts media attention, which then increases popularity and spread).
In the above embodiments, the targets are all clicked by the user. Under the feature-comparison method, each target needs preset pictures for feature extraction and comparison, after which the clicked target is judged; the displayed information is likewise preset to serve the transmission effect or intent of the video content, and the next procedure is the program or instruction that achieves the set purpose after targets of different attributes are clicked.
In the comparison method, pictures of the targets that will appear in the video are supplied at video-generation time as the input for feature extraction, comparison, and matching. When a user watches the video and, after prompting or out of habit, triggers a target that appears, the terminal program extracts the image and/or features at the clicked position, compares them with the target pictures, determines the clicked target, and displays its preset information on the terminal interface, or displays it and executes the next program.
In film and TV dramas, such as one broadcast at New Year, when a user watches at a terminal, characters in the video can also congratulate the audience with red packets. When a viewer clicks the red packet, the platform playing the video gives the viewer a certain amount of cash, tokens, discount cards, and the like, or discounts and samples of commodities from the drama's sponsors; the terminal program of the clicking user displays the result, and the reward is transferred to the corresponding digital wallet, card package, and so on. This pleases the audience and also drives conversion for certain commodities.
One type of target is a person. Take the video of a video blogger who sets himself as a target: when a user triggers the blogger's face, the terminal program extracts the image at the clicked position in the video (the person's face) and compares it with the target image input in advance by the blogger (a face photo). The program then determines that the viewer triggered the blogger's face and runs the next program, for example automatically adding preset information such as the blogger's communication contact to the viewer's contact list, or following the blogger; alternatively, clicking the face position can be set to open communication with the blogger (with or without displaying the contact), where the communication may include voice, video, text, or a half-duplex discussion group. In this way the viewer is prompted by information in the video (voice, text displayed on the screen, or an icon with a specific meaning), clicks the target, sees the preset target information displayed, and the next program is executed, so the video propagates effectively.
In the above embodiment of clicking a person in a video, if a viewer watches a video about a commodity or a tour, the viewer can establish communication directly with the presenter by clicking the presenter's face in the video, producing business effects such as direct communication and consultation and thus achieving guidance; this greatly strengthens video as a propagation tool. For example, a tourist who sees a video by a scenic-spot guide traditionally needs a series of steps, such as inquiries, before contacting the guide; with the technology of the present disclosure, the tourist clicks the guide's face area and communication is established directly, in accordance with the disclosure above. This does not require the guide to disclose contact information, and before communicating the guide can clearly see trusted information about the caller (sent when the call is made), so relevant callers can be distinguished from irrelevant ones and the call accepted or refused; for details see the authorized patent 201810357591.6, "a scene-oriented mobile communication system and communication content". Interference from irrelevant persons is thus avoided: the customer triggers the target, the target is identified, and the next program is executed.
Of course, for a video containing public figures, if the producer sets the figures as targets and sets "follow" as the next program to execute, then when a viewer sees a favorite public figure and clicks that figure in the video, the next program can make the viewer follow that figure's public account, such as a WeChat or Twitter account. Following the public figure is thereby expanded with a very good effect, overcoming the shortcomings of existing video and improving its propagation.
As in the previous example, if the viewer likes the person, then by clicking the person in the video the viewer comes to follow that person's public account; this too is a next instruction or next program to be executed.
Following a public account can be realized by calling an API with the public account of the target person (the preset information) and the viewer's user account. All of these are typical examples of what the present disclosure describes: performing the matching and then executing the preset next procedure, with an API call realizing the function, i.e. after the comparison confirms the target, the information is read and the next instruction is executed.
The techniques of the present disclosure described above are directed at a number of deficiencies of video technology, allowing video to serve as a more efficient propagation tool, whether in the information dimension, in merchandise sales and commercial incentives (red packets, discount cards, shopping, and the like), or in communication with the people in the video.
The technology of this disclosure is not about extracting picture content from a video and then matching it against information on the Internet (essentially image search); it is about remedying the defects of video technology and strengthening video as a carrier, so that video has stronger propagation and interaction capability.
Both methods solve the problem that targets in current video cannot be effectively expressed and displayed, which in turn limits video transmission, but they differ. The comparison method relies on the image-processing capability of the terminal: in the current frame, the specific target is matched quickly from the target features and determined from the matched position and the trigger position. If a video is long and contains many targets, for example a 90-minute movie with 50 targets, matching against all 50 obviously takes time. In the first method, target-data generation begins once the video is imported, and on the user side a click only requires computing the click position against the target data of the current frame, so the final match is very fast; but when the target data generated on the source side accompanies a directly played (live) video, the user side must receive it in real time, i.e. T01 must provide a data channel that guarantees this efficiency.
To simplify the second (comparison) mode, in U10 the video generator or editor can associate targets with time along the video timeline — for example, period A contains only targets 1, 2 and 3, while period B contains only target 2 — so that the amount of computation and the comparison time are reduced on the basis of time: within a given period of the video, only a few targets' features need to be matched. These settings serve as preset auxiliary information; after a user triggers a target, the user terminal program or the background server compares only the features of the targets belonging to that time period, reducing the feedback delay (for example, the time taken to find the target and display its preset information). In the case of live broadcast, however, the recorder may have many targets to handle and the time constraint is strong; with method 1 the system continuously generates target data (the range values of the targets in the current frame), but if a generated target is never clicked by a viewer, the computation spent on it is, in theory, wasted.
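The timeline-based pruning above can be sketched minimally as follows. The schedule format (a list of period/target tuples) is an assumption for illustration; the disclosure only specifies that targets are associated with periods of the video timeline.

```python
# Illustrative sketch of the preset auxiliary information: the producer
# associates each target with the periods in which it appears, so at trigger
# time only the targets of the current period are candidates for
# feature comparison, shrinking the comparison set and the feedback delay.
def candidates_at(schedule, t):
    """schedule: list of (start_s, end_s, target_id) tuples along the timeline.
    Return the ids of targets active at playback time t (seconds)."""
    return [tid for start, end, tid in schedule if start <= t <= end]
```

For example, with period A (0–60 s) holding targets 1, 2, 3 and period B (60–120 s) holding only target 2, a trigger at 90 s compares features against a single target instead of all targets in the video.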
In method 2, the picture of F11 may also be produced in U10 through human–computer interaction: a person selects and delineates a target, and the image features are then extracted from the delineated region. Since information from the image background may otherwise be extracted as well, supplying a clean, background-free picture as input sometimes yields higher comparison efficiency.
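The background-removal idea above can be sketched as follows: features are computed only from pixels inside the delineated target, so background content never pollutes the descriptor. A simple intensity histogram stands in here for a real descriptor (in practice a keypoint descriptor or a learned embedding would be used); the function name and pure-Python pixel format are assumptions for illustration.

```python
# Sketch: extract a feature only from the delineated (masked) target pixels,
# ignoring the background, as described for the F11 picture in U10.
def masked_histogram(pixels, mask, bins=8):
    """pixels: 2-D list of grayscale values 0-255; mask: same shape, truthy
    where the pixel belongs to the delineated target. Returns a histogram
    over only the target's pixels."""
    hist = [0] * bins
    for row_p, row_m in zip(pixels, mask):
        for p, m in zip(row_p, row_m):
            if m:  # background pixels contribute nothing to the feature
                hist[min(p * bins // 256, bins - 1)] += 1
    return hist
```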
In method 1 or method 2, once the target is determined, displaying the target's file or URL information is handled identically; method 2 is consistent with method 1 in this respect. Moreover, the response is not limited to displaying the target's preset information: it can also access customer-service resources, which include human or machine resources. Specifically, suppose the video teaches about a product. A user interested in the product clicks the target commodity, and the product's preset information — including details that video technology itself cannot display — is presented clearly on the user side. At that moment a machine or human customer-service agent can actively provide more information to the user who has entered the details view, better guiding the purchase from the perspectives of both brand advertising and effect advertising. Naturally, the displayed commodity information also includes purchasing elements such as a shopping basket, and the resource links may be videos rather than web pages — for example, a jewelry video shot in macro, the jewelry rotating under the light; an end user moved by such a video may purchase the commodity directly, and customer service can also intervene to provide further details that video technology cannot show. In this way the information available to the user is greatly enlarged, and the interaction between the user and the service personnel can be adjusted to meet the user's requirements.
In method 2, feature extraction is a well-known technique in image processing, and the technical literature contains a large number of object-detection techniques; such common knowledge is not elaborated in the present application.
In addition, regarding display in the user interface, the content may be shown on the same page as the video or on a different page; the present disclosure does not limit this. On the same page, for example, a bottom sheet or pop-up window may be used; on a different page, a program already installed on the terminal may be invoked — for example by opening a new interface within the program, by calling another general-purpose program such as a browser, or by using a Deeplink or a similar protocol.
For current and future technologies, the terminal is no longer a one-way receiving device such as a television. Based on the functionality of the receiving terminal, the user can read a target's associated content in many forms — files containing pictures, vector graphics, videos, tables, text, and the like — completely overcoming the limitation that video conveys information only through frame content. Through the spread of the video, users can obtain, according to their own needs, information that the video carries but cannot express clearly; conventional information alone, meanwhile, lacks video's propagation strength. With video as the view and deep superimposed information behind it, the propagation effect of the video becomes both wide and deep.
Based on the above, the method in practice comprises video editing and generation software, playback software on the user terminal, and a video propagation module. The software or modules run on computing devices, each comprising a CPU, storage, and a network module; video editors and generators apply the method to edit and generate videos, which end users then watch.
The video editing and generation-side software runs on a computer or intelligent terminal; the user-side software runs on a computer or terminal; and the video propagation module runs on server resources. Each of these includes a CPU, memory resources (memory and external storage), and a network module, and all must be interconnected over the network. The video propagation module is usually a server or server cluster that, supported by the network, provides external access services for video, and includes a database and the like.
In the disclosure, if the targets in a video include a PPT taught by a presenter as well as other recommended products such as product A, then method 1 and method 2 may be used in combination: the video producer delineates the PPT display area in the video and applies method 1, while product A uses the image-feature-comparison mode. Because the content in the PPT area is dynamic (for example, slides and video played in the projection area), image comparison there works less well than area delineation with target detection. Using modes 1 and 2 in combination for different situations thus satisfies content delivery while reducing the system's computing load and strengthening the user-side experience — after all, image processing and comparison consume computing resources and time, and dynamic content is unsuitable for feature comparison unless specially processed.
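The PPT-plus-product-A combination above can be sketched as a simple dispatch: area-delineated targets (method 1) are checked first, and feature comparison (method 2) is the fallback for the rest. The configuration keys and the pluggable `feature_matcher` callable are illustrative assumptions.

```python
# Hedged sketch of combining method 1 (delineated-area lookup, good for
# dynamic-content regions like a PPT projection area) with method 2
# (image-feature comparison, used for targets like product A).
def resolve(target_configs, click, frame, feature_matcher):
    """Try area lookup first; fall back to feature comparison via the
    supplied feature_matcher(feature_targets, click, frame) callable."""
    feature_targets = []
    for cfg in target_configs:
        if cfg["mode"] == "area":
            x, y, w, h = cfg["area"]
            if x <= click[0] <= x + w and y <= click[1] <= y + h:
                return cfg["id"]          # method 1 hit: no image work needed
        else:
            feature_targets.append(cfg)   # defer to method 2
    return feature_matcher(feature_targets, click, frame)
```

The design point matches the text: the cheap geometric check handles regions where image comparison would fail or waste resources, and the expensive comparison runs only over the remaining targets.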
The technology disclosed above is essentially a method for video application: a user watching a video triggers a target in the video being played on a terminal; the user terminal program calculates the position of the trigger within the video frame; the terminal program or a background system judges, based on the target information entered by the video producer, the target to which the triggered position belongs; the terminal program then reads the target's preset information and displays it on the terminal, or reads the preset information and also executes a preset next instruction or program.
In practice, in either the first mode or the image-feature-matching mode, the triggered point is converted into a position in the video frame; the touch point is usually a coordinate value (X, Y). In either mode this value lies within the target — it belongs to some point in the target; in Fig. 2, for example, the clicked position lies in the projection area. In the image-feature-comparison mode, whether the image is segmented or extracted from the video frame around the trigger point, the trigger point lies within the target's area according to its geometric frame. The trigger position (the position in the video frame) is therefore the input for judging which target the click belongs to; regardless of whether method 1 or method 2 is used, the target to which the clicked position belongs is judged within the image.
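The coordinate conversion described above — turning a touch point (X, Y) on the display into a position within the video frame — can be sketched as follows. A letterboxed, aspect-preserving layout is assumed purely for illustration; the disclosure does not fix a particular display mapping.

```python
# Sketch: map a display touch point into video-frame coordinates, so that the
# trigger position can serve as the input for judging the target it belongs to.
def touch_to_frame(touch, view_size, frame_size):
    """Map display coordinates to frame coordinates for a video scaled to fit
    the view with preserved aspect ratio. Returns None if the touch falls in
    the letterbox bars outside the video picture."""
    vw, vh = view_size
    fw, fh = frame_size
    scale = min(vw / fw, vh / fh)              # fit while preserving aspect
    off_x = (vw - fw * scale) / 2              # letterbox offsets
    off_y = (vh - fh * scale) / 2
    fx = (touch[0] - off_x) / scale
    fy = (touch[1] - off_y) / scale
    if 0 <= fx <= fw and 0 <= fy <= fh:
        return (fx, fy)
    return None
```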
In addition, it should be noted that in image processing and comparison — comparing books versus comparing human faces, for example — different algorithm models are generally adopted for better effect. Therefore, to reach the goal more reliably, when setting a target the video producer can specify, based on classified logic (face, article, book, design, and so on), the type of content to be displayed (picture, PDF file, video stream, etc.) together with the preset information, and the next procedure or instruction to execute after the target is triggered, which may take many forms: adding to a shopping basket, following an official account, cash transfer, placing an order, and so on. As a specific example, the video producer sets target A in the setting interface with type "human face", uploads a corresponding photo of the target's face, sets the preset information to the person's official account, and sets the next procedure on trigger to "follow the official account". When a viewer of the video is interested and clicks, the system identifies and judges that the user clicked target A and, based on the preset information, executes the next instruction and follows the target's official account. If, in the same video, target B is classified as a book whose preset information is the book's PDF version, the producer sets target B, uploads its picture, and uploads its PDF file. When a viewer clicks the book in the video — that is, target B — and the system judges that the triggered area belongs to the position of the book in the current frame, i.e. target B, the user side reads the PDF file and displays it on the terminal that triggered the target; the display may use any human–computer interaction form, including windows, display areas, and the like. By combining the transmissibility and liveliness of video with specific functions and instruction execution in this way, video interaction replaces the APP, or the subset of functions, that traditional business otherwise requires, giving video a direct commercial effect.
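The target-A/target-B example above (a face that triggers following an official account; a book that triggers displaying a PDF) can be sketched as a small configuration-plus-dispatch structure. The keys, file names, and action labels here are illustrative assumptions, not a format defined by the disclosure.

```python
# Sketch of producer-side target settings: each target carries a
# classification, an uploaded reference picture, preset information, and the
# next instruction to execute when triggered.
TARGETS = {
    "A": {"class": "face", "photo": "targetA_face.jpg",
          "preset": "official_account:presenter", "next": "follow"},
    "B": {"class": "book", "photo": "targetB_cover.jpg",
          "preset": "targetB.pdf", "next": "display_pdf"},
}

def on_trigger(target_id):
    """Return the action the user terminal should perform for the target."""
    cfg = TARGETS[target_id]
    if cfg["next"] == "follow":
        return ("follow_account", cfg["preset"])
    if cfg["next"] == "display_pdf":
        return ("open_document", cfg["preset"])
    return ("show_info", cfg["preset"])  # default: just display preset info
```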
In the above way, on all devices that can be triggered — touch-controllable televisions, display screens, intelligent terminals, or XR (VR, MR) devices, which are usually triggered in a virtual manner but are triggered nonetheless — applying the technology of the present disclosure allows various contents in a video to be defined as targets. According to each target's classification, attributes, preset information, and next instruction, triggering can display the preset information, or display it and execute the instruction, thereby thoroughly reversing the long-standing deficiency of traditional video.
In the disclosure, the user-terminal program, such as U02, may take different forms on different terminals. On a computer it may be Web-based; on mobile phones (Android, Apple, etc.) it is generally an APP, though a Web form is also possible. It acquires the position clicked by the mouse or multi-touch screen, extracts the corresponding position within the displayed video frame, and feeds the data back, via data communication, to the server — that is, the video propagation module — which makes the judgment; the preset information is then displayed, or the preset instruction executed, on the user terminal. Likewise, the video-generation-end program may also take a Web-access form; for live broadcast, however, a program running on a computer, or an APP running on a mobile terminal such as a phone, is generally simpler and easier to use.
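The Web-client data flow described above — the client sends the frame-relative click position and playback time to the propagation module, which judges the target — can be sketched as a request payload. The JSON field names are assumptions for illustration; the disclosure does not define a wire format.

```python
# Hedged sketch of the user-terminal-to-propagation-module message: the
# position within the video frame plus the playback time (which also enables
# the timeline-based candidate pruning described earlier).
import json

def build_trigger_request(video_id, frame_pos, playback_s):
    """Serialize a trigger event for the video propagation module."""
    return json.dumps({
        "video_id": video_id,
        "x": frame_pos[0],   # position within the video frame
        "y": frame_pos[1],
        "t": playback_s,     # playback time in seconds
    })
```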
To achieve the above technical objects, and based on the different characteristics and properties of the targets in a video, the present disclosure adopts various methods, or combinations of methods, to further improve the propagation effect of the video. Although any suitable path may be taken in implementation depending on the targets, an implementation as described herein constitutes a specific embodiment of the present disclosure.
Through the above description, a person skilled in the art can implement the steps and procedures based on the disclosed technology, thereby solving the long-standing problems that video has a poor transmission effect and cannot achieve its propagation purpose.

Claims (10)

1. A method for video applications, comprising:
a terminal user watching the video triggers a target in the video being played by the terminal;
The user terminal program calculates and obtains the triggered position in the video frame;
The user terminal program or the video transmission module judges, according to the target information entered by the video producer or the target data generated from the delineated target, the target to which the position belongs in the current video;
The user terminal program reads the preset information of the target to which the position belongs and displays it on the user terminal, or reads the preset information of that target and executes a preset next instruction.
2. A method for video applications according to claim 1, comprising:
the targets in the video comprise planar targets and stereoscopic targets;
the planar target comprises at least one of projection, photo, picture and planar artwork, and the stereoscopic target comprises a non-planar object;
in addition, the target may also include a character in the video.
3. A method for video applications according to claim 1, comprising:
and the information of the target input by the video generating personnel comprises a picture of the target, and the picture is used for feature extraction and comparison.
4. A method for video applications according to claim 1, comprising:
the video generating personnel define a target in a video display interface of the target data generating module;
the target data generation module generates target data of the delineating target based on a time value or a frame sequence value of the video;
the method for delineating the target comprises a method for image segmentation or geometric frame delineation.
5. A method for video applications according to claim 1, comprising:
and the video generating personnel upload preset information of the target to the video transmission module through a video generating end program or directly.
6. A method for video applications according to claim 1, comprising:
The generation of the target data comprises generating the target data, based on the timeline of the video and the position and track of the target in the video, in a manner including artificial intelligence and/or computer vision techniques.
7. A method for video applications according to claim 1, comprising:
the information preset by the video producer for the target contains files or resource links, the resource links containing web page links and video links, and the video links including video streams.
8. A method for video applications according to claim 1, comprising:
if a terminal user watching the video triggers, and is judged to have triggered, a target in the video, the video user terminal displays the preset information and may further contain an interaction area;
The interaction area comprises an interaction area of service personnel or an interaction area of audience;
the service personnel also comprise machine service personnel;
the service personnel further describe and provide the target's information and communicate with the end user.
9. A method for video applications according to claim 1, comprising:
if the terminal user watching the video triggers the face of a person in the video, the preset information is read and the instruction is executed according to the next instruction preset by the video producer for the target, wherein the instruction includes following the person or establishing communication.
10. A method for video applications according to claim 1, comprising:
if the video contains a plurality of targets with different type attributes, the target-data-generation mode and the feature-comparison mode can be applied in combination, the target-data-generation mode being used for targets whose content changes dynamically and for planar-target areas whose content changes.

Applications Claiming Priority (4)

Application Number   Priority Date
CN2023117326271      2023-12-17
CN202311732627       2023-12-17
CN2024100936250      2024-01-23
CN202410093625       2024-01-23

Publications (1)

Publication Number Publication Date
CN120166261A (en) 2025-06-17

Family

ID=96007972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411757479.3A Pending CN120166261A (en) 2023-12-17 2024-12-03 A video application method and system

Country Status (1)

Country Link
CN (1) CN120166261A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination