Detailed Description
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described in the examples below do not represent all embodiments consistent with the application; they are merely examples of systems and methods consistent with aspects of the application as set forth in the claims.
It should be noted that the brief description of terminology in the present application is intended only to facilitate understanding of the embodiments described below and is not intended to limit the embodiments of the present application. Unless otherwise indicated, these terms should be construed according to their ordinary and customary meaning.
The terms first, second, third and the like in the description, in the claims, and in the above-described figures are used for distinguishing between similar objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.
The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements explicitly listed, but may include other elements not expressly listed or inherent to such product or apparatus.
The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the function associated with that element.
In the embodiment of the present application, the display device 200 generally refers to a device having a screen display and a data processing capability. For example, display device 200 includes, but is not limited to, a smart television, a mobile terminal, a computer, a monitor, an advertising screen, a wearable device, a virtual reality device, an augmented reality device, and the like.
Fig. 1 is a schematic diagram of an operation scenario between a display device and a control device according to some embodiments of the present application. As shown in fig. 1, a user may operate the display device 200 through a touch operation, the mobile terminal 300, or the control device 100. The control device 100 is configured to receive an operation instruction input by a user and convert the operation instruction into a control instruction that the display device 200 can recognize and respond to. For example, the control device 100 may be a remote control, a stylus, a handle, or the like.
The mobile terminal 300 may serve as a control device for man-machine interaction between a user and the display device 200. The mobile terminal 300 may also serve as a communication device, establishing a communication connection with the display device 200 for data interaction. In some embodiments, a software application may be installed on both the mobile terminal 300 and the display device 200, implementing connection and communication through a network communication protocol and achieving one-to-one control operation and data communication. The audio/video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 to realize a synchronous display function.
In some embodiments, the mobile terminal 300 or other electronic device may also simulate the functions of the control device 100 by running an application program that controls the display device 200.
As also shown in fig. 1, the display device 200 is also in data communication with the server 400 via a variety of communication means. The display device 200 may establish communication connections via a Local Area Network (LAN), a Wireless Local Area Network (WLAN), and other networks. The server 400 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
The display device 200 may provide a broadcast receiving television function, and may additionally provide an intelligent network television function with computer support, including, but not limited to, a network TV, a smart TV, an Internet Protocol TV (IPTV), and the like.
Fig. 2 is a block diagram of a hardware configuration of the display device 200 of fig. 1 according to some embodiments of the present application.
In some embodiments, the display apparatus 200 may include at least one of a modem 210, a communication device 220, a detector 230, a device interface 240, a controller 250, a display 260, an audio output device 270, a memory, a power supply, and a user input interface.
In some embodiments, the detector 230 is used to collect signals of the external environment or of interaction with the outside. For example, the detector 230 may include a light receiver, a sensor for capturing the intensity of ambient light; or the detector 230 may include an image collector, such as a camera, which may be used to collect external environmental scenes, user attributes, or user interaction gestures; or the detector 230 may include a sound collector, such as a microphone, for receiving external sounds.
In some embodiments, the display 260 includes a screen component for presenting pictures and a drive component that drives image display. The display 260 is used for receiving and displaying image signals output from the controller 250. For example, the display 260 may be used to display video content, image content, menu manipulation interfaces, user manipulation UI interfaces, and the like.
In some embodiments, the communication apparatus 220 is a component for communicating with an external device or server 400 according to various communication protocol types. The display apparatus 200 may be provided with a plurality of communication devices 220 according to the supported communication manner. For example, when the display apparatus 200 supports wireless network communication, the display apparatus 200 may be provided with a communication device 220 including a WiFi function. When the display apparatus 200 supports bluetooth connection communication, the display apparatus 200 needs to be provided with a communication device 220 including a bluetooth function.
The communication device 220 may communicatively connect the display device 200 with an external device or the server 400 by means of a wireless or wired connection. A wired connection may connect the display device 200 with an external device through a data line, an interface, etc.; a wireless connection may connect the display device 200 with an external device through a wireless signal or a wireless network. The display device 200 may establish a connection with an external device directly, or indirectly through a gateway, a router, a connection device, or the like.
In some embodiments, the controller 250 may include at least one of a central processor, a video processor, an audio processor, a graphics processor, a power supply processor, and first to nth interfaces for input/output. The controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in the memory, thereby controlling the overall operation of the display apparatus 200.
In some embodiments, the modem 210 receives broadcast television signals via wired or wireless reception, and demodulates audio/video signals and EPG data signals from a plurality of wireless or wired broadcast television signals.
In some embodiments, the controller 250 and the modem 210 may be located in separate devices, i.e., the modem 210 may also be located in an external device to the main device in which the controller 250 is located, such as an external set-top box or the like.
In some embodiments, a user may input a user command through a graphical user interface (Graphical User Interface, GUI) displayed on the display 260, and the user input interface receives the user input command through the GUI.
In some embodiments, audio output device 270 may be a speaker local to display device 200 or an audio output device external to display device 200. For an external audio output device of the display device 200, the display device 200 may also be provided with an external audio output terminal, and the audio output device may be connected to the display device 200 through the external audio output terminal to output sound of the display device 200.
In some embodiments, the user input interface 280 may be used to receive instructions input by the user.
To support user interaction, in some embodiments, the display device 200 may run an operating system. The operating system is a computer program for managing and controlling the hardware and software resources of the display device 200. The operating system may control the display device to provide a user interface; for example, it may provide the user interface directly, or run an application that provides one. The operating system also allows the user to interact with the display device 200.
It should be noted that the operating system may be a native operating system based on a specific operating platform, a third-party operating system deeply customized on the basis of a specific operating platform, or an independent operating system specially developed for display devices.
The operating system may be divided into different modules or tiers depending on the functionality implemented. For example, as shown in FIG. 3, in some embodiments the system is divided into four layers, from top to bottom: an application layer (simply "application layer"), an application framework layer (Application Framework, simply "framework layer"), a system library layer, and a kernel layer.
In some embodiments, the application layer is used to provide services and interfaces for applications so that the display device 200 can run applications and interact with users based on the applications. At least one application program can be run in the application program layer, and the application programs can be a Window (Window) program, a system setting program or a clock program of an operating system; or may be an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.
The framework layer provides an application programming interface (Application Programming Interface, API) and a programming framework for applications. The application framework layer includes a number of predefined functions and acts as a processing center that decides how the applications in the application layer act. Through the API, an application can access system resources and obtain system services during execution.
As shown in fig. 3, in the embodiment of the present application, the application framework layer includes a view system (View System), managers (Managers), a content provider (Content Provider), and the like. The view system is used to design and implement the interfaces and interactions of applications, and includes lists (Lists), grids (Grids), text boxes, buttons (Buttons), and the like. The managers include at least one of the following modules: an activity manager (Activity Manager) for interacting with all activities running in the system; a location manager (Location Manager) for providing system services or applications with access to system location services; a package manager (Package Manager) for retrieving various information about the application packages currently installed on the device; a notification manager (Notification Manager) for controlling the display and clearing of notification messages; and a window manager (Window Manager) for managing icons, windows, toolbars, wallpaper, and desktop components on the user interface.
In some embodiments, the activity manager is used to manage the lifecycle of individual applications and the usual navigation-back functions, such as controlling the exit, opening, and back operations of applications. The window manager is used to manage all window programs, for example obtaining the display screen size, judging whether a status bar is present, locking the screen, capturing the screen, and controlling changes to the display window, such as shrinking the display window, dithering the display, or distorting the display.
In some embodiments, the system runtime layer may provide support for the framework layer, and when the framework layer is in use, the operating system may run instruction libraries, such as the C/C++ instruction library, contained in the system runtime layer to implement the functions to be implemented by the framework layer.
In some embodiments, the kernel layer is a functional hierarchy between the hardware and software of the display device 200. The kernel layer can realize functions such as hardware abstraction, multitasking, and memory management. For example, as shown in fig. 3, hardware drivers may be configured in the kernel layer, and the drivers included in the kernel layer may be at least one of the following: an audio driver, a display driver, a Bluetooth driver, a camera driver, a WiFi driver, a USB driver, an HDMI driver, a sensor driver (e.g., for a fingerprint sensor, temperature sensor, or pressure sensor), a power supply driver, and the like.
It should be noted that the above examples are merely a simple division of functions of an operating system, and do not limit the specific form of the operating system of the display device 200 in the embodiment of the present application, and the number of levels and specific types of levels included in the operating system may be expressed in other forms according to factors such as the functions of the display device and the type of the operating system.
Specific screens may be presented on the display device 200 described above, such as play screens, control interfaces, and other application interfaces. In some embodiments, the display device 200 may display a multi-channel video playback interface based on a Picture-in-Picture (PiP) or multi-window function. Picture-in-picture is a video processing technique that allows the display device 200 to superimpose one video picture on another, main video picture, while the multi-window function allows the display device 200 to display multiple video pictures simultaneously. Both functions are realized by processing a plurality of video sources simultaneously and compositing them into one or more pictures, and can be applied in many fields such as video processing, broadcasting, security monitoring, and multimedia applications.
For the picture-in-picture function, the display device 200 may receive and process two different video sources, which may originate from different input channels, e.g., two modems, two cameras, or one camera and one video file. For each video source, the display device 200 may perform a decoding operation to convert the compressed video data into pixel information that can be used directly for display, so that the system can understand and process the video data. The decoded video data may be fed to a graphics processor (Graphics Processing Unit, GPU) for compositing the pictures and displaying the composited picture. In the composition phase, the main video picture occupies the main part of the screen, and the other video picture is overlaid on it at a smaller size as the picture-in-picture, realizing the picture-in-picture display effect. The user can customize the position, size, and transparency of the picture-in-picture, the audio output, and the like through interaction modes such as a remote control, a keyboard, or a touch screen.
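As an illustration of the compositing step, the following minimal sketch blends one decoded frame onto another with OpenCV and NumPy. In a real device this blending would be performed by the GPU compositor; the frame sizes, overlay position, and alpha value here are illustrative assumptions, not values mandated by the application.

```python
# Minimal picture-in-picture compositing sketch (illustrative only).
# Assumes two decoded BGR frames are available as NumPy arrays.
import cv2
import numpy as np

def composite_pip(main_frame: np.ndarray,
                  pip_frame: np.ndarray,
                  scale: float = 0.25,
                  margin: int = 16,
                  alpha: float = 1.0) -> np.ndarray:
    """Overlay pip_frame onto the top-right corner of main_frame."""
    h, w = main_frame.shape[:2]
    pip = cv2.resize(pip_frame, (int(w * scale), int(h * scale)))
    ph, pw = pip.shape[:2]
    x, y = w - pw - margin, margin                   # top-right position
    roi = main_frame[y:y + ph, x:x + pw]
    # alpha < 1.0 gives a translucent picture-in-picture window
    blended = cv2.addWeighted(pip, alpha, roi, 1.0 - alpha, 0)
    out = main_frame.copy()
    out[y:y + ph, x:x + pw] = blended
    return out

# Synthetic frames standing in for two decoded video sources
main = np.full((1080, 1920, 3), 40, dtype=np.uint8)
sub = np.full((720, 1280, 3), 200, dtype=np.uint8)
frame = composite_pip(main, sub, alpha=0.9)
```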
The PiP function requires the display apparatus 200 to process video data in real time to ensure smooth playback of both the main video picture and the picture-in-picture, reducing delay and stuttering.
The multi-window function allows the display apparatus 200 to simultaneously display a plurality of video pictures through a plurality of windows, which may be the same size or different sizes. It is similar to the picture-in-picture function in that the display device 200 receives and processes multiple video sources, each of which needs to be decoded and composited into a picture by the graphics processor. All video windows are updated in real time to keep the videos playing in sync.
For the multi-window layout, the display device 200 may preset a variety of layouts, such as a 2 x 2 grid or a 3 x 1 vertical stack, and the user may select a different window layout as needed. The user may also select the video source, resize windows, move window positions, and so on through a user interface provided by the display device 200.
In some embodiments, the display device 200 may receive a broadcast television signal through the modem 210, demodulate audio-video data from the broadcast television signal, and play the audio-video data through the display 260 and the audio output device 270, thereby playing a digital television program for viewing by a user.
Fig. 4 is a flowchart of playing a digital television program according to an embodiment of the present application. The front-end subsystem (Front-end Sub-system) of the display device 200 may receive a Radio Frequency (RF) signal from an antenna; the received RF signal is fed into a tuner (Tuner), through which it is converted into an Intermediate Frequency (IF) signal of a designated frequency. The intermediate frequency signal is then input to a demodulator (Demodulator) and converted into a Multi-Programme Transport Stream (MPTS), i.e., a TS stream.
The de-multiplexing subsystem (Demux Sub-system) of the display apparatus 200 may send the transport stream to a de-multiplexer (Demux). The Demux module can separate packets by type, extracting audio, video, and other data packets from the TS stream. These packets are in the form of Packetized Elementary Streams (Packetized Elementary Stream, PES); a PES packet is formed by grouping Elementary Stream (ES) data and adding header information. That is, the Demux module can filter audio PES, video PES, and data packets out of the TS stream, and can further filter data meeting the requirements according to user settings. If the television program is encrypted, the de-multiplexing subsystem may feed the de-multiplexed data to a descrambler (DeScrambler), which may use keys (Keys) provided by a smart card (SmartCard) to decrypt the audio/video data.
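For illustration, the following is a minimal software sketch of PID-based TS packet filtering, the core operation the Demux module performs. In the display device this is done by dedicated hardware, and the example PID values are hypothetical: real PIDs would be learned from the PAT/PMT tables.

```python
# Simplified software demux sketch: filter 188-byte TS packets by PID.
from collections import defaultdict

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def demux_by_pid(ts_bytes: bytes) -> dict[int, list[bytes]]:
    streams = defaultdict(list)
    for off in range(0, len(ts_bytes) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts_bytes[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            continue  # lost sync; a real demux would resynchronize
        # PID is the low 13 bits of header bytes 1-2
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        streams[pid].append(pkt)
    return streams

# e.g. keep only the audio/video PIDs of the selected program
# (0x100/0x101 are placeholders for PIDs read from the PMT):
# wanted = {0x100, 0x101}
# av = {pid: p for pid, p in demux_by_pid(data).items() if pid in wanted}
```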
The audio/video subsystem (A/V Sub-system) of the display apparatus 200 may send the demultiplexed video data (Video ES) to a video decoder (Video Decoder) and the demultiplexed audio data (Audio ES) to an audio decoder (Audio Decoder). After decoding, the audio and video data can be synchronized by an audio-video synchronization module (AVSync) according to the Program Clock Reference (PCR). Finally, the decoded video data (such as YUV format data) is sent to a video output module (VOUT) for playback, and the decoded audio data (such as PCM format data) is sent to an audio output module (AOUT) for playback.
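A sketch of the synchronization decision implied by AVSync is shown below, assuming the PCR-derived system clock and frame PTS values are expressed in 90 kHz ticks (the MPEG-2 timestamp resolution); the tolerance window is an assumption for illustration.

```python
# Illustrative audio-video sync decision against the PCR-derived clock.
PTS_CLOCK_HZ = 90_000
SYNC_WINDOW_TICKS = int(0.040 * PTS_CLOCK_HZ)   # ~40 ms tolerance (assumed)

def sync_action(frame_pts: int, system_clock: int) -> str:
    delta = frame_pts - system_clock
    if delta > SYNC_WINDOW_TICKS:
        return "wait"    # frame is early: hold until the clock catches up
    if delta < -SYNC_WINDOW_TICKS:
        return "drop"    # frame is late: drop it to resynchronize
    return "render"      # within tolerance: send to VOUT/AOUT

assert sync_action(90_000, 90_000) == "render"
```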
The digital television program playing flow is completed by the cooperation of hardware and software: the hardware is responsible for high-speed data processing and decoding, while the software controls the logic of the whole flow. To play a program properly, the audio stream, video stream, and data packets need to be correctly separated from the transport stream by the Demux module for subsequent decoding and display. The Demux module belongs to the driver layer and is implemented on a hardware Programmable Transport Interface (PTI). The program data structure stProg generated by the system stores variables such as the transponder number (TransponderId) and the audio/video PID (Packet Identifier) values of a program, so that the specific program data can be correctly extracted from the multiplexed stream.
In some embodiments, the display device 200 may also parse Electronic Program Guide (EPG) data from the broadcast television signal and present it in a graphical interface through the display 260, enabling a user to view programs being broadcast or upcoming on various channels, as well as detailed information about those programs, such as titles, genres, synopses, and broadcast times. This provides an intuitive way for the user to browse and select television programs and improves the viewing experience.
An electronic program guide is a service that provides an interactive listing of detailed information about television programs. EPG information includes basic EPG information and extended EPG information. Basic EPG information refers to EPG information described based on the network information table (Network Information Table, NIT), the service description table (Service Description Table, SDT), and the event information table (Event Information Table, EIT). The NIT table may provide the basic structure information of a broadcast network and related information of the transport stream (TS), including network information such as the network name (network_name_descriptor) and management system information (system_management_descriptor), the program list and program information in the broadcast network (service_list_descriptor), frequency points and other frequency-locking related information (service_description_descriptor), and the like.
The EIT table includes details of upcoming and on-air programs on each channel. EIT tables fall into two categories: EIT[p/f] and EIT[schedule]. EIT[p/f] includes information on the program currently being played and the program to be played next, including the program ID, title, start time, end time, and the like. EIT[schedule] includes information on programs over a future period of time.
The SDT table includes the services (channels) provided in the broadcast network, and detailed information on each service, such as the channel name (service_descriptor) and channel type. It also carries identification bits, such as EIT_present_following_flag, which indicates whether an EIT[p/f] is present in the transport stream, and EIT_schedule_flag, which indicates whether an EIT[schedule] is present, so as to inform the display apparatus 200 whether the corresponding EIT information can be acquired from the current transport stream.
The basic EPG information includes the titles, start times, end times, descriptions, and channels of programs, encoded and transmitted in a specific format so that the display device 200 can parse and display them. The extended EPG information includes richer program content, such as detailed program descriptions, actor lists, director information, and ratings.
For example, an EPG may include the following information elements. Program name: the name of the television program that is upcoming or being broadcast. Program category: such as news, sports, movies, music, children's programs, etc. Tags: finer-grained classification, such as sign-language programs, 4K programs, multi-lingual or multi-channel programs, subtitled programs, etc. Actor and director information: the starring actors and directors participating in the program or movie. Synopsis: an introduction to the plot. Time: the start and end times of the program. Channel: the television channel on which the program is broadcast. Program description: a brief overview of the program content. Event information: markers such as special events, replays, and first broadcasts. Program length: the duration of the program. Preview: trailers for programs in the coming days or weeks. Program rating: the appropriate age rating of the program, for parental control. Related links: possibly including websites, social media, or other program-related resources. Recording options: if supported by the device, the user may schedule a recording. Interaction functions: the EPG may contain voting, gaming, or other functions that interact with the program. Keywords: classification by keyword, such as program name, program category, or tags.
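To make the above elements concrete, one possible in-memory representation is sketched below; the field names and types are illustrative choices, not a broadcast-standard schema.

```python
# One possible in-memory model of the EPG elements listed above.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class EpgEvent:
    channel: str
    title: str
    category: str                                     # e.g. "news", "sports"
    tags: list[str] = field(default_factory=list)     # e.g. "4K", "subtitled"
    start: Optional[datetime] = None
    end: Optional[datetime] = None
    synopsis: str = ""
    cast: list[str] = field(default_factory=list)
    rating: str = ""                                  # parental-control rating
    is_first_run: bool = False

    @property
    def duration_minutes(self) -> float:
        if self.start and self.end:
            return (self.end - self.start).total_seconds() / 60
        return 0.0
```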
From the above playback embodiments, it can be seen that television programs are played according to a predetermined schedule, which reduces the flexibility and freedom of playback: the user has to watch a program to learn the plot of its content, and can only watch the program currently being broadcast.
Therefore, some embodiments of the present application provide a television program promotion method applied to the display device 200, which can parse out the audio/video data and Electronic Program Guide (EPG) information of a target television program and feed them into pre-trained AI models to generate short videos, mind maps, text summaries, and the like, thereby enriching television program content and helping users understand the plot development. The generated materials can also serve as promotion materials for self-media platforms to improve the audience rating of the target television program.
As shown in fig. 5, a flow chart of a television program promotion method provided by an embodiment of the present application specifically includes the following steps:
S501: Audio and video data and electronic program guide information of a target television program are acquired.
During operation of the display device 200, a play instruction for playing a target television program may be received; in response to the play instruction, a satellite or terrestrial signal may be received through the modem 210 to play the target television program.
As shown in fig. 6, for the audio/video data of the target television program, following the playing flow of the digital television program shown in fig. 4, the display apparatus 200 may acquire the radio frequency (RF) signal of the target television program and, after processing by the tuner (Tuner) and demodulator (Demodulator), demodulate it to obtain the transport stream (TS) of the target television program. The audio packets and video packets are then separated from the transport stream by the demultiplexer (Demux); the video packets are decoded by the video decoder (Video Decoder) and the audio packets by the audio decoder (Audio Decoder) to obtain the decoded audio/video data. Finally, the audio/video data is played, realizing playback of the target television program.
In some embodiments, in order to support the subsequent generation of key audio/video, the display device 200 may pre-allocate a memory area (Buffer) for storing the audio/video data. When the decoded audio/video data of the target television program is obtained, it can be stored in this memory area.
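A minimal sketch of such a pre-allocated, bounded buffer is given below; the capacity and the (pts, payload) segment format are assumptions for illustration.

```python
# Sketch of a bounded buffer for decoded audio/video segments.
# A deque with maxlen behaves like a ring buffer: once full, the oldest
# segment is evicted, keeping memory use constant.
from collections import deque

class AvBuffer:
    def __init__(self, max_segments: int = 512):
        self._segments = deque(maxlen=max_segments)

    def push(self, pts: int, payload: bytes) -> None:
        self._segments.append((pts, payload))

    def snapshot(self) -> list[tuple[int, bytes]]:
        """Return the currently buffered segments, oldest first."""
        return list(self._segments)

buf = AvBuffer()
buf.push(0, b"\x00" * 188)
```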
As shown in fig. 6, for the electronic program guide information, the display device 200 may parse program specific information (Program Specific Information, PSI) and service information (Service Information, SI) tables from the TS stream of the television program; the PSI describes the composition structure of the TS stream, while the SI tables describe details of television programs such as program names, air times, and program descriptions. The event information table (Event Information Table, EIT), network information table (Network Information Table, NIT), and service description table (Service Description Table, SDT) in the SI are then parsed. By parsing these data tables, electronic program guide information about the television programs is collected.
In some embodiments, the display device 200 may store the electronic program guide information in the memory area and update it in real time or periodically to ensure its integrity and accuracy.
S502: Key audio and video is generated from the audio/video data and the electronic program guide information through a video generation model.
The key audio and video is generated based on at least one audio/video segment in the audio/video data: it consists of audio/video segments extracted from the audio/video data to represent highlights. There may be one or more key audio/video segments, and each may consist of a single video frame or a plurality of consecutive video frames.
S503: key information is generated according to the audio and video data and the electronic program guide information through the information generation model.
The key information is used to characterize the playing content of the audio/video data, and may be at least one of a mind map, an abstract, and a schema that provides a summary or overview of the audio/video data.
S504: the control display 260 displays key audios and videos and key information.
After the key audio/video and the key information are acquired, the key audio/video and the key information can be displayed through the display 260, so that the user can understand the scenario development of the target television program.
In some embodiments, when it is desired to display key audio and video and/or key information of the target television program, the display device 200 may obtain audio and video data and electronic program guide information of the target television program from the memory area, and generate the key audio and video and/or key information through the pre-trained AI model.
In order to achieve fast response, the display device 200 may also obtain audio and video data and electronic program guide information of the target television program from the memory area in advance, generate key audio and video and key information through a pre-trained AI model, and store the key audio and video and the key information into the memory area. When the key audio and video and/or key information of the target television program need to be displayed, the key audio and video and/or key information of the target television program can be acquired from the memory area.
In this embodiment, the video generation model and the information generation model are both AI models, and a description is given below of a technique related to a principle of forming a key audio/video and key information based on the AI models.
Information understanding: the AI model may understand the entered text description or script based on natural language processing (Natural Language Processing, NLP) techniques, including semantic analysis, emotion understanding, narrative structure parsing, etc. Based on the interpreted text content, the AI model can convert the text content into visual instructions, such as instructions depicting a scene layout, a sequence of actions, or particular visual elements, for subsequent video generation.
Video generation: the AI model may utilize image generation models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) to create new images, or to modify existing images to conform to the text description. The AI model may synthesize video frame by frame, with each frame generated based on the continuity of the previous frame and the text description, ensuring visual continuity and a consistent progression of the story.
Animation and transition: the AI model can learn and simulate real-world motion patterns to create smooth animations and enhance the realism of the video. The AI model can use time-sequence models such as Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and self-attention mechanisms to understand the relationships and sequential patterns between frames, ensuring the temporal consistency of actions and scene changes in the video and improving its smoothness and logical coherence.
Audio processing: the AI model may convert text to natural speech and may be synchronized with video content. The AI model may generate background music that matches the emotion and rhythm of the video.
Editing and integrating: the AI model can integrate the generated elements such as images, animations, sounds and the like, and can perform corresponding clipping and adjustment to form a complete video, thereby improving the compactness and coordination of video content. The AI model may perform post-processing such as color correction, resolution enhancement, etc., to improve video quality. The AI model may continually learn and optimize the video generation strategy based on the user's feedback.
The above process shows that the AI model fuses text, image, and audio data through deep learning and multi-modal processing techniques to create short video content that is faithful to the spirit of the original program yet innovative, adapting to the diversified demands of different platforms and users. It should be noted that the above techniques are merely examples, not limitations; AI models in practical applications may combine more techniques and be continuously optimized to improve prediction accuracy and user experience.
In some embodiments, as shown in fig. 7, a flow chart for generating key audio/video and key information provided by the embodiment of the present application specifically includes the following steps:
S701: the video data is partitioned into a sequence of video frames.
Wherein the sequence of video frames comprises time ordered video frames. The display device 200 segments video data in the audio-video data into a sequence of successive frames for individual analysis of each video frame.
In some embodiments, the display device 200 may pre-process the video frames, such as noise reduction to reduce unnecessary interference, image enhancement to highlight key details, and format conversion, such as converting color images into grayscale images or feature vectors, so that subsequent algorithms can process them more efficiently.
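The preprocessing steps named above might look as follows with OpenCV; the specific filters and parameters are illustrative choices, not a pipeline prescribed by the application.

```python
# Sketch of per-frame preprocessing: denoise, grayscale, enhance.
import cv2
import numpy as np

def preprocess_frame(frame_bgr: np.ndarray) -> np.ndarray:
    denoised = cv2.GaussianBlur(frame_bgr, (5, 5), 0)     # noise reduction
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)     # format conversion
    enhanced = cv2.equalizeHist(gray)                     # highlight details
    return enhanced

# cap = cv2.VideoCapture("program.ts")   # hypothetical recorded segment
# ok, frame = cap.read()
# if ok:
#     features_input = preprocess_frame(frame)
```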
S702: visual features of the video sequence are extracted, and audio features of the audio data are extracted.
Wherein the visual features are used to characterize at least one of a feature target, a feature target action, and a feature target expression in the video data, and the audio features are used to characterize at least one of dialogue content, music intensity, and emotional expression in the audio data. For visual features, elements such as feature targets, their actions, and their expressions may be identified from the video frames based on Convolutional Neural Networks (CNNs). For audio features, speech recognition techniques may be used to identify dialogue content in the audio data, and emotion analysis techniques may be used to evaluate music intensity and emotional expression.
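As one way to realize the CNN-based visual feature extraction, the sketch below uses a pretrained torchvision ResNet-18 as a generic frame-embedding backbone; the application does not specify a particular network, so the model choice and embedding size are assumptions.

```python
# Illustrative per-frame visual feature extraction with a pretrained CNN.
import torch
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()        # keep the 512-d embedding
backbone.eval()
preprocess = weights.transforms()        # matching input normalization

@torch.no_grad()
def frame_embedding(frame_pil) -> torch.Tensor:
    """frame_pil: a PIL.Image for one video frame."""
    batch = preprocess(frame_pil).unsqueeze(0)   # (1, 3, H, W)
    return backbone(batch).squeeze(0)            # (512,) feature vector
```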
In some embodiments, to facilitate analysis, the audio data may first be converted into a form that is easy to analyze, such as a spectrogram, power spectral density map, amplitude spectrum, or mel-frequency cepstral coefficients, from which the audio features are then extracted. For example, the audio data may be converted into a spectrogram and the audio features extracted from it: the spectrogram shows the characteristics of the audio signal in the frequency domain, making it convenient to understand and process the various frequency components in the audio.
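For the audio side, a sketch using librosa to compute MFCCs (one of the "easy to analyze" forms named above) follows; the file path, sample rate, and coefficient count are illustrative.

```python
# Sketch of audio feature extraction: load a decoded track, compute MFCCs.
import librosa

def audio_mfcc(path: str, n_mfcc: int = 13):
    y, sr = librosa.load(path, sr=16_000, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    return mfcc

# Alternatively, a mel spectrogram for frequency-domain inspection:
# mel = librosa.feature.melspectrogram(y=y, sr=sr)
```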
S703: audio-video data, visual features, audio features, and electronic program guide information are input to a video generation model. Key audios and videos are generated based on visual features, audio features and electronic program guide information through a video generation model.
The key audio and video is an audio/video segment characterizing at least one of an action-change peak, an emotion-fluctuation peak, and a dialogue climax in the audio/video data, i.e., a highlight of the audio/video data. An action-change peak represents an action climax in the audio/video data, such as a martial arts move, a dance segment, or a fight scene. An emotion-fluctuation peak represents a moment of strong emotion, such as happiness, sadness, or anger, in an emotionally charged part of the audio/video data. A dialogue climax represents a part of the audio/video data where the dialogue reaches a high point, such as an important decision, the revealing of a truth, or an emotional outpouring.
After the audio/video data, visual features, audio features, and electronic program guide information are input into the video generation model, the model can perform behavior and emotion analysis based on the visual and audio features, analyzing feature-target behavior changes, emotion-fluctuation peaks, dialogue climaxes, and the like in the audio/video data. Meanwhile, time-sequence models such as Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), or self-attention mechanisms can be used to analyze how the frame sequence changes over time and to understand the narrative logic and plot development in the video, thereby identifying and extracting continuous interesting events or climax parts in the audio/video data. The model can also understand context based on the electronic program guide information: by considering the additional information about the audio/video data in the EPG, such as titles, descriptions, tags, the user's viewing history, and interaction data, it can better understand the topics of the video and the content the user is interested in.
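A toy sketch of this sequence-analysis step follows: an LSTM scores each time step of the fused visual and audio feature sequence with a highlight probability. The architecture, feature dimensions (e.g., 512-d visual plus 13-d audio, matching the sketches above), and decision threshold are assumptions for illustration, not the application's actual model.

```python
# Toy highlight scorer over fused per-frame features.
import torch
import torch.nn as nn

class HighlightScorer(nn.Module):
    def __init__(self, feat_dim: int = 512 + 13, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim) fused visual + audio features
        out, _ = self.lstm(feats)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (batch, time)

scorer = HighlightScorer()
scores = scorer(torch.randn(1, 300, 525))     # 300 time steps
highlights = (scores > 0.8).nonzero()         # candidate highlight frames
```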
In some embodiments, after the display device 200 obtains the audio and video data generated by the video generation model, the audio and video data may be post-processed, such as deleting transient or incoherent segments, merging adjacent climax segments, or performing personalized processing according to user preferences, etc.
In some embodiments, the video generation model may be obtained by training a neural network model on audio/video sample data with marked key audio/video (highlights), together with the visual and audio features of that sample data; through iteration and optimization, the model learns how to accurately predict the highlights in audio/video data from the input visual and audio features. Meanwhile, the model parameters can be adjusted through supervised learning or reinforcement learning so that the model predicts the user's points of interest more accurately.
S704: audio-video data, visual features, audio features, and electronic program guide information are input to an information generation model. Key information is generated by an information generation model based on visual features, audio features, and electronic program guide information.
In some embodiments, display device 200 may generate key information from at least one of audio-visual data, visual features, audio features, and electronic program guide information. At least one of audio-video data, visual characteristics, audio characteristics and electronic program guide information can be input into the information generation model, and the key information can be generated by analyzing the information generation model.
It will be appreciated that in the above embodiments, the extraction of key information is achieved using different types of input data. Aiming at the difference of the input data for generating the key information, different training sample data are required to be set, and the information generation model is trained in a targeted mode, so that the information generation model can accurately understand and extract the key information. For example, when generating key information according to electronic program guide information, the information generation model may be trained based on electronic program guide sample information and corresponding key sample information; when key information is generated according to the audio data and the electronic program guide information, the information generation model can be obtained by training based on the electronic program guide sample information, the audio characteristics of the audio sample data and the corresponding key sample information; when key information is generated according to video data and electronic program guide information, an information generation model can be obtained by training based on electronic program guide sample information, visual characteristics of the video sample data and corresponding key sample information; when key information is generated according to the audio and video data and the electronic program guide information, the information generation model can be obtained by training based on the electronic program guide sample information, the visual characteristics of the video sample data, the audio characteristics of the audio sample data and the corresponding key sample information.
In some embodiments, the display device 200 includes a first display and a second display, in which case the display device 200 supports dual-screen presentation. Fig. 8 is a schematic diagram of a dual-screen display according to an embodiment of the present application. The display device 200 may control the first display to display the play screen of the target television program, and control the second display to display the play screen of the key audio/video and the key information. That is, while the target television program is played on the first display, the key audio/video corresponding to it is played on the second display, and key information such as a mind map, schema, and abstract can also be displayed on the second display, so that the user can anticipate the outline of the plot through the key audio/video and key information.
In some embodiments, where the target television program is a television program whose episodes are related to one another, the display apparatus 200 may implement an episode review function. In response to a play instruction for the target television program, the display device 200 may obtain from the memory area the key audio/video and key information corresponding to the audio/video data of the previous episode of the target television program, control the first display to display the play screen of the target television program, and control the second display to display the play screen of that key audio/video and the key information.
For example, as shown in fig. 9, a schematic diagram of a dual-screen display implementing the previous-episode review function is provided in an embodiment of the present application. When the display device 200 plays the second episode of a television series, the key audio/video and key information corresponding to the audio/video data of the first episode may be queried in the memory area. The first display is controlled to display the play screen of the second episode, and the second display is controlled to display the play screen of the key audio/video of the first episode together with key information such as a mind map, schema, and abstract, so that the user can conveniently review the plot and user experience is improved.
In some embodiments, the display device 200 may simultaneously display the target television program, the key audio and video corresponding to the target television program, and the key information based on the first display.
In some embodiments, the display device 200 may divide the user interface of the first display into a first display area and a second display area based on the multi-window function, control the first display to display the play screen of the target television program in the first display area, and control the first display to display the play screen of the key audio/video and the key information in the second display area.
For example, as shown in fig. 10, the user interface includes 1 first display area and 3 second display areas. The first display area displays the playing picture of the target television program, and the 3 second display areas respectively display the playing picture, the mind map and the schema of the key audio and video.
In some embodiments, the display apparatus 200 may control the first display to display a play screen of the target television program based on the picture-in-picture function, and display a play screen of the key audio and video and the key information in a picture-in-picture form at an upper layer of the play screen of the target television program.
For example, as shown in fig. 11, the play screen of the target television program is displayed full screen, and the play screen of the key audio/video is displayed in the form of a floating window on the upper layer of the play screen of the target television program.
In some embodiments, the display device 200 may implement a highlight skip function based on the key audio/video in the audio/video data predicted by the video generation model. When the display apparatus 200 plays the target television program, the user may input a jump instruction for jumping to a highlight, and the display apparatus 200 may jump the play progress of the target television program to the progress position of the key audio/video in the audio/video data, thereby quickly positioning to the highlight of the target television program.
In some embodiments, the display device 200 may set a skip button for skipping to a highlight in the interface for playing the target television program, and the user inputs a skip instruction by clicking the skip button.
In some embodiments, the display device 200 may also receive the jump instruction through interaction mechanisms such as voice or the control device 100.
In some embodiments, to facilitate interaction with the user, the display device 200 may provide a graphical user interface with which the user may interact, so that the user can perform operations such as entering text, entering instructions, selecting styles, reserving viewing, and reserving recording of TV episodes to be played later.
In some embodiments, the display device 200 may obtain a set of television programs associated with the keywords from the memory area in response to a generation instruction for generating the key audios and videos associated with the keywords and key information, and obtain audio and video data and electronic program guide information of the television programs in the set of television programs, and generate the key audios and videos associated with the keywords from the audio and video data and the electronic program guide information.
The keyword may be a series name, a director, a leading actor, a TV genre, and so on, so that the display device 200 can generate key audio/video and key information matching the keyword.
For example, when the display device 200 plays a television series, the user may activate the voice interaction function of the display device 200 through a Bluetooth remote control, so that the user can control the display device 200 with voice instructions. When the user wants to view the plot introduction of the current episode, a voice command instructing the generation of a plot introduction may be input; the display apparatus 200 may then generate a content schema of the television series through the information generation model and display it through the display 260, presenting an interface as shown in fig. 12. In the interface shown in fig. 12, the display device continues to play the series while displaying its content schema, so that the user can better understand the plot.
When the user wants to view television series or movies related to the name of the leading actor, a voice command instructing the generation of programs related to that actor may be input. The display apparatus 200 may query the memory area for audio/video data and electronic program guide information related to the actor's name, predict the highlights in that audio/video data through the video generation model, generate a plurality of key audio/video segments, and display them through the display 260, presenting an interface as shown in fig. 13. In the interface shown in fig. 13, the display device continues playing the television series while displaying key audio/video 1, key audio/video 2, key audio/video 3, and other key audio/video segments related to the actor's name, as indicated by the user instruction.
In some embodiments, after the display device 200 obtains the key audio and video and the key information, a self-media promotion scheme may be generated based on the key audio and video and the key information, and uploaded to the self-media platform based on the user's needs.
Some embodiments of the present application further provide a television program promotion method applied to the display device 200 and the server 400, in which information is stored on and exchanged through the server 400. As shown in fig. 14, the display apparatus 200 may upload information such as the audio/video data of a television program, electronic program guide information, and user behavior data to the server 400. The server 400 invokes a video analysis service or runs an analysis algorithm to automatically extract the key audio/video in the audio/video data, generate key information, and store it in a database. The display device 200 may then request the corresponding data from the server 400 for presentation, enabling user interaction.
As shown in fig. 15, a flow chart of another method for promoting television programs provided by the embodiment of the application specifically includes the following steps:
S1501: the display device 200 acquires audio-video data of the target television program.
S1502: the display device 200 transmits the audio and video data to the server 400.
S1503: the server 400 detects key indexes of the audio and video data, and extracts audio and video fragments with key indexes larger than an index threshold value to obtain key audio and video.
The key indexes comprise at least one of video code rate, scene change frequency, dialogue density, volume peak value, emotion intensity and action intensity. That is, the server 400 may perform visual content analysis and audio content analysis on the audio and video data, and determine whether the audio and video clip in the audio and video data contains climax or an important event according to indexes such as video code rate, scene change frequency, dialogue density, volume peak value, facial expression recognition, action intensity, and the like.
In some embodiments, the server 400 may extract key audio/video according to the video code rate: by monitoring the video code rate during playback of the audio/video data, when the video code rate is greater than a code rate threshold and this condition lasts for a preset time, the audio/video segment whose code rate exceeds the threshold is marked as key audio/video.
The video code rate threshold may be preset, or set dynamically based on the video code rate of the audio/video data. For example, when the video code rate is 50% higher than the code rate of the preceding fixed period (e.g., 1 minute) and this lasts for more than 1 minute, the audio/video segment is marked as key audio/video.
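A sketch of this dynamic code-rate rule might look as follows; the one-second sampling interval is an assumption, and the ratio and durations follow the example above.

```python
# Flag segments whose bitrate exceeds the trailing 1-minute average by 50%
# for at least 1 minute (parameters follow the example above).
def mark_bitrate_highlights(bitrates_kbps: list[float],
                            sample_seconds: int = 1,
                            window_s: int = 60,
                            ratio: float = 1.5,
                            min_duration_s: int = 60) -> list[tuple[int, int]]:
    spans, run_start = [], None
    for i, rate in enumerate(bitrates_kbps):
        window = bitrates_kbps[max(0, i - window_s // sample_seconds):i]
        baseline = sum(window) / len(window) if window else rate
        if rate > baseline * ratio:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and (i - run_start) * sample_seconds >= min_duration_s:
                spans.append((run_start * sample_seconds, i * sample_seconds))
            run_start = None
    if run_start is not None and (len(bitrates_kbps) - run_start) * sample_seconds >= min_duration_s:
        spans.append((run_start * sample_seconds, len(bitrates_kbps) * sample_seconds))
    return spans   # (start_s, end_s) spans to mark as key audio/video
```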
In some embodiments, the server 400 may extract key audio/video according to the scene change frequency. The server 400 may use video-cut detection techniques to measure the image similarity between video frames and thereby determine whether a scene change occurs between frames, from which the scene change frequency is calculated; the scene change frequency refers to the number of scene changes in the video content within a preset time. If the scene change frequency is greater than a change frequency threshold, the audio/video segment within that preset time is marked as key audio/video.
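One common shot-cut heuristic that fits this description compares color histograms of consecutive frames; the sketch below uses OpenCV with an illustrative similarity threshold (the application does not mandate a specific algorithm).

```python
# Count scene changes via inter-frame histogram similarity.
import cv2

def count_scene_changes(video_path: str, sim_threshold: float = 0.6) -> int:
    cap = cv2.VideoCapture(video_path)
    changes, prev_hist = 0, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [8, 8, 8], [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < sim_threshold:     # low similarity => scene change
                changes += 1
        prev_hist = hist
    cap.release()
    return changes
```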
In some embodiments, the server 400 may extract the key audio and video according to the dialogue density, and the server 400 may convert the audio data into text content through a voice recognition technology, and calculate the number of words or the sentence length in a specific time period, so as to obtain the dialogue density. If the conversation density is greater than the conversation density threshold, marking the audio-video clips within the specific time period as key audio-video.
In some embodiments, the server 400 may extract the key audio and video from the volume peak, and the server 400 may analyze the audio waveform to mark audio and video segments with a volume greater than the volume threshold as key audio and video.
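A minimal sketch of this volume-peak detection over decoded PCM samples follows; the window size and RMS threshold are assumptions.

```python
# Flag loud windows by short-window RMS over mono PCM samples in [-1, 1].
import numpy as np

def loud_windows(pcm: np.ndarray, sr: int,
                 window_s: float = 0.5,
                 rms_threshold: float = 0.3) -> list[float]:
    """Return start times (seconds) of windows whose RMS exceeds the threshold."""
    win = int(sr * window_s)
    starts = []
    for i in range(0, len(pcm) - win + 1, win):
        rms = float(np.sqrt(np.mean(pcm[i:i + win] ** 2)))
        if rms > rms_threshold:
            starts.append(i / sr)
    return starts
```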
In some embodiments, the server 400 may extract the key audio and video according to the emotion intensity, the server 400 may perform facial expression recognition on the video frames in the video data, recognize emotion expressions such as surprise, sadness, happiness, etc., evaluate the emotion intensity, and mark the audio and video segments with emotion intensity greater than the emotion intensity threshold as the key audio and video.
In some embodiments, the server 400 may extract the key audios and videos according to the action intensity, and the server 400 may detect the action amplitude, the action speed, etc. of the feature targets (such as characters or objects) in the video frames through the motion estimation algorithm to evaluate the action intensity, and mark the audios and videos with the action intensity greater than the action intensity threshold as the key audios and videos.
S1504: the server 400 feeds back the key audios and videos to the display device 200.
S1505: the display device 200 acquires the key audio/video fed back by the server 400, and controls the display 260 to display the key audio/video.
In some embodiments, the display device 200 may also collect user behavior data when playing the target television program and send it to the server 400, so that the server 400 extracts key audio/video based on the user behavior data. The user behavior data includes data such as the specific time points of pauses, playbacks, and fast-forwards, and the frequency and positions of recorded files. These data can reveal which portions users pay the most attention to, so that the server 400 can further verify and optimize the accuracy of the markings in conjunction with the user behavior data and achieve personalized content recommendation.
In some embodiments, the display device 200 may also send electronic program guide information to the server 400, so that the server 400 generates the key information for the target television program based on the electronic program guide information and/or the audio and video data.
In some embodiments, the server 400 may generate the key audio and video based on the electronic program guide information and the audio and video data. For example, the display device 200 may request from the server 400 the key audio and video and the key information associated with a keyword. The server 400 may search the database for the audio and video data associated with the keyword based on the electronic program guide information and extract the key audio and video from those data. Meanwhile, text related to the associated audio and video data can be extracted from the electronic program guide information, and a short summary of the audio and video content can be generated through a text summarization algorithm to obtain the key information. The key audio and video and the key information are then fed back to the display device 200.
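As a stand-in for the text summarization algorithm mentioned above, a naive extractive summarizer can score sentences by word frequency and keep the top few in original order; this is an illustrative sketch, not the claimed algorithm.

```python
import re
from collections import Counter

def summarize(text, max_sentences=3):
    """Naive extractive summary: rank sentences by the total frequency of
    their words across the whole text, keep the top max_sentences, and
    return them in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    top = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"\w+", sentences[i].lower())),
        reverse=True,
    )[:max_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```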
It can be understood that, in this embodiment, an AI model is not required to analyze and predict the key audio and video; instead, the server 400 extracts and stores the key audio and video and the key information using the analysis techniques described above. The server 400 exchanges data with the display device 200 in a manner similar to the memory area in the above-described embodiments: when the display device 200 needs data such as the key audio and video or the key information, it may send a data request to the server 400 and then perform the corresponding steps according to the data fed back by the server 400. Of course, the server 400 may also generate the key audio and video and the key information based on AI model analysis, and the display device 200 may likewise extract the key audio and video and the key information using the analysis techniques above.
In this embodiment, by analyzing the audio and video data of movies, television dramas, and the like, extracting key video frames, and automatically generating short videos, long videos are condensed into short videos, which allows a user to quickly screen programs of interest. A recap function for earlier episodes can also be provided, so that the user can easily pick up the story and viewing continuity is improved. Key information such as a mind map, a text abstract, and a diagram can be generated from the EPG information; presenting the content visually and textually lowers the threshold of understanding, making it easier for the user to follow character relationships and plot development. Based on this understanding of the plot, a function of reserving future episodes can be offered to users via the EPG information, thereby increasing audience ratings. In addition, the generated materials such as short videos, mind maps, and text abstracts can form a self-media promotion scheme, and the television episodes can be promoted on multiple terminals such as televisions, mobile phones, and PCs, improving the popularity and audience ratings of the episodes.
It should be noted that the promotion method provided by the embodiments of the present application is not limited to television programs; it is also applicable to various video resources such as network videos, movies, live broadcast playback, and short video platform content.
Based on the television program promotion method described above, some embodiments of the present application further provide a display device 200, where the display device 200 includes a display 260 and a controller 250. The display 260 is configured to display a user interface, and the controller 250 is configured to:
acquire audio and video data of the target television program, where the audio and video data include audio data and video data;
acquire electronic program guide information of the target television program;
generate, through the video generation model, key audio and video according to the audio and video data and the electronic program guide information, where the key audio and video are generated based on at least one audio and video segment in the audio and video data;
generate, through the information generation model, key information according to the audio and video data and the electronic program guide information, where the key information is used to represent the playing content of the audio and video data; and
control the display to display the key audio and video and the key information.
For identical or similar parts among the embodiments in this specification, reference may be made to one another, and details are not repeated here.
Based on the television program promotion method described above, some embodiments of the present application further provide a display device 200, where the display device 200 includes a display 260, a communication device 220, and a controller 250. The display 260 is configured to display a user interface, the communication device 220 is configured to establish a communication connection with the server 400, and the controller 250 is configured to:
acquire audio and video data of the target television program, where the audio and video data include audio data and video data;
acquire electronic program guide information of the target television program;
send the audio and video data and the electronic program guide information to the server 400, so that the server 400 detects key indexes of the audio and video data, extracts the audio and video segments whose key indexes are greater than an index threshold to obtain the key audio and video, and generates the key information according to the electronic program guide information and/or the audio and video data;
acquire the key audio and video and the key information fed back by the server 400; and
control the display to display the key audio and video and the key information.
As can be seen from the above technical solutions, the display device and the television program promotion method provided by the above embodiments can acquire the audio and video data and the electronic program guide information of a target television program; generate key audio and video according to the audio and video data and the electronic program guide information through the video generation model; and generate key information according to the audio and video data and the electronic program guide information through the information generation model, where the key audio and video are generated based on at least one audio and video segment in the audio and video data and the key information represents the playing content of the audio and video data. After the key audio and video and the key information are acquired, the display can be controlled to display them. By analyzing the audio and video data and the electronic program guide information of a television program and generating and displaying short videos, mind maps, text abstracts, and similar information through an AI model, the method helps users understand the plot, provides self-media promotion materials, and alleviates the problem of low flexibility and freedom in the playing of television programs.
It will be apparent to those skilled in the art that the techniques of the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of the present invention, or the part thereof contributing to the prior art, may be embodied in the form of a software product stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods of the embodiments, or of some parts of the embodiments, of the present invention.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and such modifications and substitutions do not depart from the spirit of the application.
The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. The illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.