CN104731959B - Method, apparatus and system for generating a video summary from text-based web page content
- Publication number
- CN104731959B (application CN201510156123.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- keyword
- news
- web page
- word
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of video generation, and more particularly to a method, apparatus and system for generating a video summary from text-based web page content. The method, apparatus and system can generate a video summary from the text content of a target news web page and display the video summary on that page. When browsing the news web page, a user can obtain the news information by watching the video summary; compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the page.
Description
Technical field
The present invention relates to the field of video generation, and in particular to a method, apparatus and system for generating a video summary from text-based web page content.
Background art
A news website is a website whose main business is publishing news. News websites include large national news portals, commercial portals and local portals; the portal sites of individual industries also serve as news websites for those industries.
Existing news websites generally present news through text and pictures. Users now obtain news content through a growing number of channels, such as mobile phones, tablet computers and televisions. Existing news applications that obtain news from news websites are likewise mostly based on text and pictures.
The shortcoming of existing news websites is that, in an age of fast content consumption, the time cost of reading text is relatively high for users, which makes text a poor way for news websites to present news to users.
Summary of the invention
The object of the present invention is to provide a method, apparatus and system for generating a video summary from text-based web page content, which present the news on a web page to the user as a generated video summary, so as to save the time the user spends reading news on a news web page.
In a first aspect, an embodiment of the present invention provides a method for generating a video summary from text-based web page content, comprising: extracting a text summary of the text content of a target news web page; parsing the text summary to obtain the keywords in the text summary; performing semantic analysis on the keywords to obtain the ontology semantics of the keywords; based on the ontology semantics of the keywords, retrieving corresponding pictures or videos from the Internet to form a background image; based on the ontology semantics of the keywords, obtaining animation templates corresponding to the keywords from a pre-established grammar database; converting the text summary into audio data; and compositing and rendering the background image, the animation templates and the audio data into a video file according to preset rendering rules.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein extracting the text summary of the text content of the target news web page comprises: obtaining the news web page based on a web page link; removing additional information from the news web page, wherein the additional information includes one or more of: advertisements, pictures, videos, frames and charts; extracting the text content of the news web page after the additional information has been removed; and extracting key sentences from the text content to form the text summary.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein extracting key sentences from the text content to form the text summary comprises: calculating in turn the similarity between every two sentences in the text content; classifying the sentences in the text content according to the calculated similarities; according to the classification result, extracting one sentence from each class of sentences and combining them to obtain candidate summaries; and selecting, from the candidate summaries, the candidate summary whose length differs least from a preset text summary length as the summary text of the news web page, wherein the preset text summary length is determined from the length of the video to be generated and a preset reading-aloud speed for the text summary.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein calculating in turn the similarity between every two sentences in the text content comprises: counting the total number of words shared by the current two sentences; dividing that total by the average length of the current two sentences to obtain the similarity of the current two sentences; and calculating the similarity between every two sentences in the text content in the same way.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein parsing the text summary to obtain the keywords in the text summary comprises: performing word segmentation on the text summary; comparing the words obtained by segmentation with the word templates in the grammar database to determine the part of speech of each word; and, according to the part-of-speech determination, selecting the nouns and numerals among the segmented words as the keywords of the text summary.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein performing semantic analysis on the keywords to obtain the ontology semantics of the keywords comprises: looking up the keyword in the grammar database to obtain all ontology descriptions related to the keyword; and using the Web Ontology Language (OWL) to determine, from all the ontology descriptions of the keyword, the ontology semantics of the keyword in the current context.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein compositing and rendering the background image, the animation templates and the audio data into a video file according to preset rendering rules comprises: setting a mapping between each keyword in the audio data and the background image and animation template corresponding to that keyword; and compositing and rendering the background image, the animation templates and the audio data according to the mapping.
In a second aspect, an embodiment of the present invention further provides an apparatus for generating a video summary from text-based web page content, comprising: an extraction module for extracting a text summary of the text content of a target news web page; a keyword acquisition module for parsing the text summary to obtain the keywords in the text summary; a semantic analysis module for performing semantic analysis on the keywords to obtain the ontology semantics of the keywords; a background image forming module for retrieving, based on the ontology semantics of the keywords, corresponding pictures or videos from the Internet to form a background image; an animation template acquisition module for obtaining, based on the ontology semantics of the keywords, animation templates corresponding to the keywords from a preset grammar database; an audio conversion module for converting the text summary into audio data; and a video compositing module for compositing and rendering the background image, the animation templates and the audio data into a video file according to preset rendering rules.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the keyword acquisition module comprises: a word segmentation unit for performing word segmentation on the text summary; a part-of-speech determination unit for comparing the words obtained by segmentation with the word templates in the grammar database to determine the part of speech of each word; and a keyword selection unit for selecting, according to the part-of-speech determination, the nouns and numerals among the segmented words as the keywords of the text summary.
In a third aspect, an embodiment of the present invention further provides a system for generating a video summary from text-based web page content, comprising: a user terminal and a news server connected to the user terminal via the Internet; the news server includes the apparatus for generating a video summary from text-based web page content as described in the second aspect and in the first possible implementation of the second aspect.
The method, apparatus and system for generating a video summary from text-based web page content provided by the embodiments of the present invention can generate a video summary from the text content of a target news web page and display the video summary on that page. When browsing the news web page, a user can obtain the news information by watching the video summary; compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the page.
To make the above objects, features and advantages of the present invention more apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of the method for generating a video summary from text-based web page content provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the method for generating a video summary from text-based web page content provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of the method for generating a video summary from text-based web page content provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of the method for generating a video summary from text-based web page content provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic flowchart of the method for generating a video summary from text-based web page content provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the apparatus for generating a video summary from text-based web page content provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of the keyword acquisition module in the apparatus for generating a video summary from text-based web page content provided by Embodiment 7 of the present invention;
Fig. 8 is a schematic connection diagram of the system for generating a video summary from text-based web page content provided by Embodiment 8 of the present invention.
Description of main reference numerals:
21, extraction module; 22, keyword acquisition module; 23, semantic analysis module; 24, background image forming module; 25, animation template acquisition module; 26, audio conversion module; 27, video compositing module; 22a, word segmentation unit; 22b, part-of-speech determination unit; 22c, keyword selection unit; 31, user terminal; 32, news server.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the drawings herein can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1:
Embodiment 1 provides a method for generating a video summary from text-based web page content. Its flowchart is shown in Fig. 1, and the main processing steps include:
Step S101: Extract a text summary of the text content of the target news web page.
The body text of a news item may not be concise, so the user needs more time to obtain the news information on the web page. Extracting a text summary of the page's text content allows the news to be presented to the user more concisely while remaining relatively complete.
In addition to the news text, a news web page usually contains other additional information, such as advertisements, pictures, videos, frames and/or charts. This additional information is not part of the effective content of the news text, so it can be removed from the news web page before the text summary is extracted.
Step S102: Parse the text summary to obtain the keywords in the text summary.
The keywords in the text summary carry the key information of the news content, so extracting them is a convenient way to obtain that key information. This step yields the keywords of the news text and provides keyword information for the subsequent steps.
Step S103: Perform semantic analysis on the keywords to obtain the ontology semantics of the keywords.
An ontology is an explicit specification of a conceptualization: it gives the basic terms and relations that make up the vocabulary of a domain, together with the rules for combining those terms and relations to extend the vocabulary. Ontology semantics provide a basic description of the news; for example, the ontology description of "shirt" is "clothing". A word may have several ontology descriptions; for example, the ontology description of "apple" may be "fruit" or "company", so the ontology semantics of the keyword in the current context must be determined. This step performs semantic analysis on the keywords to obtain their ontology semantics, so that the video summary can be generated correctly from the text-based web page content in the subsequent steps.
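The disambiguation described in step S103 can be illustrated with a small sketch. The patent resolves the meaning with the Web Ontology Language and a grammar database; the sketch below does not use OWL and instead substitutes a simple context-overlap heuristic. The table ONTOLOGY_DESCRIPTIONS, its cue words and the function resolve_ontology are all hypothetical names introduced only for illustration.

```python
# Hypothetical stand-in for the ontology descriptions held in the grammar database.
# Each keyword maps to its possible ontology descriptions, each with assumed cue words.
ONTOLOGY_DESCRIPTIONS = {
    "apple": {
        "fruit": {"eat", "juice", "orchard", "sweet"},
        "company": {"phone", "shares", "launch", "software"},
    },
    "shirt": {"clothing": set()},
}


def resolve_ontology(keyword, context_words):
    """Pick the ontology description whose cue words overlap most with the context.

    This is a simplified heuristic, not the OWL-based reasoning named in the patent.
    Returns None when the keyword has no known ontology description.
    """
    candidates = ONTOLOGY_DESCRIPTIONS.get(keyword, {})
    if not candidates:
        return None
    context = set(context_words)
    # On a tie, max() keeps the first description in insertion order.
    return max(candidates, key=lambda description: len(candidates[description] & context))


print(resolve_ontology("apple", ["company", "launch", "phone"]))
print(resolve_ontology("shirt", []))
```

With the sample table, "apple" next to words such as "launch" and "phone" resolves to "company", while "shirt" has only one description and resolves to "clothing".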
Step S104: Based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the Internet to form a background image.
Compositing a video summary requires source material. According to the ontology semantics of the keywords, corresponding pictures or videos are retrieved with an Internet search engine to form a background image, which serves as material for compositing the video summary in the subsequent steps.
Step S105: Based on the ontology semantics of the keywords, obtain the animation templates corresponding to the keywords from the pre-established grammar database.
The grammar database stores the animation templates corresponding to the ontology semantics of different words, so the corresponding animation template can be obtained from the grammar database according to the ontology semantics of a keyword. Piecing together the templates corresponding to the different keywords gives the way in which the complete video summary is assembled.
Step S106: Convert the text summary into audio data.
The text summary is converted into audio data with suitable software, and this audio data serves as the audio material for compositing the video summary. In the video summary the news is narrated to the user as audio, which is easier to follow than a written description and saves the user's time.
Step S107: According to preset rendering rules, composite and render the background image, the animation templates and the audio data into a video file.
Rendering the video file relies on corresponding software and rendering rules. According to the preset rendering rules, the background image, the animation templates and the audio data are composited and rendered into a video file. When compositing and rendering the video, a mapping is set between each keyword in the audio data and the background image and animation template corresponding to that keyword; the background image, animation templates and audio data are then composited and rendered according to this mapping. For example, if keyword 1 occurs at the 3rd second of the audio and the next keyword occurs at the 5th second, the background image corresponding to keyword 1 is displayed according to its animation template between the 3rd and 5th seconds. In this way the audio and the images in the video summary stay in step, and the news is presented to the user more effectively.
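The keyword-to-image mapping in step S107 can be sketched as a timeline: each keyword's background image and animation template stay on screen from the moment the keyword is spoken until the next keyword is spoken. The function name build_timeline and its data layout are hypothetical, and holding the last keyword's image until the end of the audio is an assumption that the patent does not state.

```python
def build_timeline(keyword_times, audio_duration):
    """Turn (keyword, start_second) pairs into display segments.

    Each segment runs from the keyword's start time to the next keyword's
    start time. The last segment runs to the end of the audio (an assumption).
    """
    ordered = sorted(keyword_times, key=lambda item: item[1])
    segments = []
    for index, (keyword, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]
        else:
            end = audio_duration
        segments.append({"keyword": keyword, "start": start, "end": end})
    return segments


timeline = build_timeline([("keyword 1", 3.0), ("keyword 2", 5.0)], audio_duration=8.0)
for segment in timeline:
    print(segment)
```

For the example in the text, the segment for keyword 1 starts at second 3 and ends at second 5, which is the interval during which its background image would be shown.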
Embodiment 1 provides a method for generating a video summary from text-based web page content, which can generate a video summary from the text content of a target news web page and display the video summary on that page. When browsing the news web page, a user can obtain the news content by watching the video summary; compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the page.
Embodiment 2:
On the basis of Embodiment 1, Embodiment 2 provides a preferred method for generating a video summary from text-based web page content. Its flowchart is shown in Fig. 2, and the main steps include:
Step S201: Obtain the news web page based on a web page link.
The address in the web page link may be the address of the current news web page, sent to the news server when a user accesses that page; it may also be the address of any news web page on the corresponding news website, obtained by the news server through scanning. Based on the web page link obtained, the news server obtains the corresponding news web page information.
Step S202: Remove the additional information from the news web page, wherein the additional information includes one or more of: advertisements, pictures, videos, frames and charts.
Besides the text that presents the news content, the news web page obtained by the news server may contain other, unrelated additional information, such as advertisements, pictures, videos, frames and charts. This additional information does not help in understanding the news, so step S202 removes it from the news web page.
Step S203: Extract the text content of the news web page after the additional information has been removed.
Once the additional information on the news web page has been removed, the news server obtains the text information that presents the news, so that the later steps can generate the video summary from the corresponding text information on the news web page.
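Steps S202 and S203 can be sketched with the HTML parser in the Python standard library. The patent does not say how additional information is recognized, so the set SKIPPED_TAGS below is an assumption: it treats a handful of container elements as advertisements, videos, frames and charts, and a real page would need site-specific rules. The class NewsTextExtractor and the function extract_news_text are hypothetical names.

```python
from html.parser import HTMLParser

# Assumed containers of "additional information"; real pages need site-specific rules.
SKIPPED_TAGS = {"script", "style", "iframe", "video", "figure", "table", "aside"}


class NewsTextExtractor(HTMLParser):
    """Collect visible text while ignoring everything inside skipped containers."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside at least one skipped container
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_news_text(html):
    """Return the remaining text content, one text fragment per line."""
    parser = NewsTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = (
    "<html><body><h1>Title</h1><aside>Buy now</aside><p>First sentence.</p>"
    "<figure><img src='x.png'><figcaption>Chart</figcaption></figure>"
    "<p>Second sentence.</p><script>var a = 1;</script></body></html>"
)
print(extract_news_text(page))
```

Images carry no text of their own, so the sketch ignores them without tracking them; only containers with a closing tag are counted in skip_depth.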
Step S204: Extract key sentences from the text content to form the text summary.
The news text on a news web page may not be concise and may contain a lot of wording, so the user needs more time to obtain the news information on the page. Key sentences are therefore extracted from the text content to form a text summary, so that the news on the page is introduced to the user more concisely and the time the user spends obtaining the news information is reduced.
Step S205: Parse the text summary to obtain the keywords in the text summary.
This step obtains the keywords of the news summary and provides keyword information for the subsequent steps.
Step S206: Perform semantic analysis on the keywords to obtain the ontology semantics of the keywords.
This step performs semantic analysis on the keywords to obtain their ontology semantics, so that the video summary can be generated correctly from the text-based web page content in the subsequent steps.
Step S207: Based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the Internet to form a background image.
This step obtains the background image, which serves as material for compositing the video summary in the subsequent steps.
Step S208: Based on the ontology semantics of the keywords, obtain the animation templates corresponding to the keywords from the pre-established grammar database.
This step yields the way in which the video summary is assembled.
Step S209: Convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for compositing the video summary.
Step S210: According to preset rendering rules, composite and render the background image, the animation templates and the audio data into a video file.
This step renders and generates the video file.
The method for generating a video summary from text-based content provided by Embodiment 2 works in the same way and has the same advantages as the method provided by Embodiment 1, so the details are not repeated here.
Embodiment 3:
On the basis of Embodiment 2, Embodiment 3 provides a preferred method for generating a video summary from text-based web page content. Its flowchart is shown in Fig. 3, and the main steps include:
Step S301: Obtain the news web page based on a web page link.
This step obtains the corresponding news web page.
Step S302: Remove the additional information from the news web page, wherein the additional information includes one or more of: advertisements, pictures, videos, frames and charts.
This step removes the additional information from the news web page.
Step S303: Extract the text content of the news web page after the additional information has been removed.
This step obtains the text information of the news.
Step S304: Calculate in turn the similarity between every two sentences in the text content.
Similar sentences generally contain similar information. To present the news content to the user concisely and completely, the sentences in the text content can be classified by similarity and one sentence extracted from each class, so that the news content on the news web page is introduced to the user concisely and completely.
Specifically, the similarity between two sentences is calculated as follows:
First, count the total number of words shared by the current two sentences.
That is, count all the words that appear in both of the current two sentences; the larger this total, the more similar the two sentences are considered to be.
Second, divide that total by the average length of the current two sentences to obtain the similarity of the current two sentences.
In this method, the length of a sentence is defined as the number of characters in it. Dividing the total number of words that appear in both sentences by the average character count of the two sentences gives their similarity; that is, the more words two sentences share and the shorter their average length, the more similar they are considered to be. This method makes it easy to obtain the similarity between two sentences. For example, suppose two sentences in the text content are sentence 1 and sentence 2. Sentence 1 contains 4 words, each 2 characters long: word 1, word 2, word 3 and word 4. Sentence 2 contains 6 words, each 2 characters long: word 3, word 4, word 5, word 6, word 7 and word 8. Sentences 1 and 2 share word 3 and word 4, 2 words in total. The length of sentence 1 is 8 characters and the length of sentence 2 is 12 characters, so their average length is 10 characters. The similarity of sentence 1 and sentence 2 is therefore 2 / 10 = 0.2.
Using the above method, the similarity between every two sentences in the text content can be calculated.
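The similarity measure described above can be written down directly. The sketch below assumes each sentence has already been segmented into a list of words; the function name sentence_similarity and the placeholder words are hypothetical, and returning 0.0 for two empty sentences is an assumption added to avoid division by zero.

```python
def sentence_similarity(words_a, words_b):
    """Number of shared words divided by the average sentence length in characters."""
    shared = set(words_a) & set(words_b)
    length_a = sum(len(word) for word in words_a)
    length_b = sum(len(word) for word in words_b)
    average_length = (length_a + length_b) / 2
    if average_length == 0:
        return 0.0  # assumption: two empty sentences have zero similarity
    return len(shared) / average_length


# Worked example from the text: every word is 2 characters long.
sentence_1 = ["w1", "w2", "w3", "w4"]
sentence_2 = ["w3", "w4", "w5", "w6", "w7", "w8"]
print(sentence_similarity(sentence_1, sentence_2))
```

Here the two sentences share 2 words and have lengths of 8 and 12 characters, so the result matches the value of 0.2 given in the worked example.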
Step S305: Classify the sentences in the text content according to the calculated similarities.
All sentences are classified according to the results calculated in step S304. For example, if the similarity between sentence 1 and sentence 2 is greater than the similarity between sentence 1 and every other sentence, and is also greater than the average similarity between sentences, then sentence 1 and sentence 2 are placed in the same class; otherwise they are placed in different classes. Once the sentences are classified, the sentences within one class can be regarded as expressing the same meaning; extracting one sentence from every class gives a complete and concise summary of the news text on the news web page and saves the time the user spends obtaining the news.
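One way to turn the pairwise rule of step S305 into classes is sketched below. The patent only gives the rule for a single pair of sentences; merging the qualifying pairs with a union-find structure is an assumption of this sketch, as are the function name classify_sentences and the sample similarity matrix. Ties for the most similar sentence are resolved in favour of the earlier sentence, which is slightly looser than the strict comparison in the text.

```python
from itertools import combinations


def classify_sentences(similarity):
    """Group sentence indices given a symmetric similarity matrix (list of lists).

    Sentence i joins the class of its most similar sentence when that similarity
    also exceeds the average similarity over all sentence pairs.
    """
    count = len(similarity)
    pairs = list(combinations(range(count), 2))
    if not pairs:
        return [[index] for index in range(count)]
    average = sum(similarity[i][j] for i, j in pairs) / len(pairs)
    parent = list(range(count))

    def find(index):
        # Follow parent links to the representative of the class.
        while parent[index] != index:
            parent[index] = parent[parent[index]]
            index = parent[index]
        return index

    for i in range(count):
        others = [j for j in range(count) if j != i]
        best = max(others, key=lambda j: similarity[i][j])
        if similarity[i][best] > average:
            parent[find(i)] = find(best)

    classes = {}
    for index in range(count):
        classes.setdefault(find(index), []).append(index)
    return sorted(classes.values())


# Illustrative matrix: sentences 0 and 2 are alike, and so are sentences 1 and 3.
matrix = [
    [1.0, 0.05, 0.5, 0.05],
    [0.05, 1.0, 0.05, 0.4],
    [0.5, 0.05, 1.0, 0.05],
    [0.05, 0.4, 0.05, 1.0],
]
print(classify_sentences(matrix))
```

With this matrix the average pairwise similarity is about 0.18, so only the two strong pairs are merged and two classes result.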
Step S306: According to the classification result, extract one sentence from each class of sentences and combine them to obtain candidate summaries.
The text content obtained earlier can be classified according to the similarity between sentences, dividing its sentences into several classes, each of which may contain more than one sentence with similar content. If no summary were extracted from them, the news content would be cumbersome. Extracting one sentence from each class yields a candidate summary that introduces the content of the news web page to the user completely and concisely. Because a class may contain several sentences, there may be several possible candidate summaries, and a suitable one must be chosen in the subsequent step.
Step S307: From the candidate summaries, select the candidate summary whose length differs least from the preset text summary length as the summary text of the news text, wherein the preset text summary length is determined from the length of the video to be generated and a preset reading-aloud speed for the text summary.
The preset text summary length is determined from the length of the video to be generated and the preset reading-aloud speed for the text summary. For example, if the video length is set to 1 minute and the reading-aloud speed to 120 characters per minute, the preset text summary length is 120 characters. A suitable text summary must be screened out of the multiple candidate summaries obtained in step S306: the candidate summary whose length differs least from the preset text summary length is selected as the summary text of the news text. When several candidates yield summaries of the same length, the candidate that draws on the earliest sentence is used. For example, suppose the text sentences obtained fall into two classes, with sentences 1 and 3 in one class and sentences 2 and 4 in the other; the character count of sentence 1 plus sentence 2 differs least from the preset text summary length, and the length of sentence 1 plus sentence 2 equals the length of sentence 3 plus sentence 4. Since sentence 1 is the sentence that occurs first in the text, the text summary consists of sentence 1 and sentence 2. The text summary obtained in this step presents the news on the target news web page to the user completely and concisely.
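Steps S306 and S307 together amount to enumerating one sentence per class and keeping the combination closest to the target length. The sketch below is a minimal illustration under stated assumptions: the function name choose_summary is hypothetical, sentence length is measured in characters, and the tie-break compares the sorted sentence indices so that the combination containing the earliest sentence wins.

```python
from itertools import product


def choose_summary(sentences, classes, video_seconds, chars_per_minute):
    """Pick one sentence per class so the total length is closest to the target.

    The target length is the video duration multiplied by the reading-aloud speed.
    Returns the chosen sentences in document order and the target length.
    """
    target_length = round(video_seconds / 60 * chars_per_minute)
    best_key, best_choice = None, None
    for choice in product(*classes):
        picked = sorted(choice)
        length = sum(len(sentences[index]) for index in picked)
        # Smaller length difference first, then the earliest sentences.
        key = (abs(length - target_length), picked)
        if best_key is None or key < best_key:
            best_key, best_choice = key, picked
    return [sentences[index] for index in best_choice], target_length


# Four placeholder sentences of 60, 60, 70 and 50 characters in two classes.
sentences = ["a" * 60, "b" * 60, "c" * 70, "d" * 50]
summary, target = choose_summary(sentences, [[0, 2], [1, 3]], 60, 120)
print(target, [len(sentence) for sentence in summary])
```

In this example both the first and second sentences together and the third and fourth sentences together reach exactly 120 characters, and the tie is resolved in favour of the pair that contains the first sentence, mirroring the example in the text.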
Step S308: Parse the text summary to obtain the keywords in the text summary.
This step obtains the keywords of the news summary and provides keyword information for the subsequent steps.
Step S309: Perform semantic analysis on the keywords to obtain the ontology semantics of the keywords.
This step performs semantic analysis on the keywords to obtain their ontology semantics, so that the video summary can be generated correctly from the text-based web page content in the subsequent steps.
Step S310: Based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the Internet to form a background image.
This step obtains the background image, which serves as material for compositing the video summary in the subsequent steps.
Step S311: Based on the ontology semantics of the keywords, obtain the animation templates corresponding to the keywords from the pre-established grammar database.
This step yields the animation templates corresponding to the keywords.
Step S312: Convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for compositing the video summary.
Step S313: According to preset rendering rules, composite and render the background image, the animation templates and the audio data into a video file.
This step renders and generates the video file.
Embodiment 4:
On the basis of Embodiment 1, Embodiment 4 provides a preferred method for generating a video summary from text-based web page content. Its flowchart is shown in Fig. 4, and the main steps include:
Step S401: Extract a text summary of the text content of the target news web page.
This step obtains the text summary, which presents the news on the target news web page to the user completely and concisely.
Step S402, to text snippet carry out participle;
Using Chinese character as base unit in the statement of Chinese sentence, without the participle information in similar English sentence, thus it is right first
Text snippet carries out participle, obtains participle information in text snippet.
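The patent does not name a segmentation algorithm. As an illustration only, the sketch below uses forward maximum matching, a common dictionary-based technique that is swapped in here as an assumption; the lexicon contents, the `max_len` limit and the function name `segment` are hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon word.

    Characters that start no lexicon word are emitted as single-character words.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words
```

For example, with a lexicon containing 苹果, 公司, 发布 and 手机, the string 苹果公司发布新手机 is split into 苹果 / 公司 / 发布 / 新 / 手机.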
Step S403, compare the words obtained from segmentation with the word templates in the preset grammar database to determine the part of speech of each word.
Word templates are stored in the grammar database. Comparing the segmented words with the word templates in the preset grammar database determines the part of speech of each word, that is, whether it is a noun, verb, numeral, measure word, pronoun, adjective, adverb, preposition, conjunction, auxiliary word, onomatopoeia or interjection. Function words such as adverbs, prepositions, conjunctions, auxiliary words, onomatopoeia and interjections generally carry no key information, so determining the part of speech in this way allows the keywords to be obtained more quickly.
Step S404, according to the part-of-speech determination, select the nouns and numerals from the segmented words as the keywords of the text summary.
The keywords of a news summary on a news webpage are its nouns and numerals: nouns describe the subjects and objects of the news, and numerals describe the data in the news. Extracting the nouns and numerals from the text summary yields the key information of the news.
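Steps S403 and S404 amount to a dictionary lookup followed by a filter. The sketch below assumes the word templates can be modelled as a plain mapping from word to part-of-speech label; the mapping contents, the label strings and the name `extract_keywords` are hypothetical, and English tokens are used only for readability.

```python
# Hypothetical stand-in for the word templates stored in the grammar database.
WORD_TEMPLATES = {
    "Apple": "noun",
    "released": "verb",
    "3": "numeral",
    "phones": "noun",
    "quickly": "adverb",
}

KEYWORD_TAGS = {"noun", "numeral"}


def extract_keywords(words, templates=WORD_TEMPLATES):
    """Keep the words whose template marks them as a noun or a numeral.

    Words missing from the templates have no known part of speech and are skipped.
    """
    return [w for w in words if templates.get(w) in KEYWORD_TAGS]
```

Because only two tags are kept, verbs and function words never need to be examined further, which is the speed-up the text refers to.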
Step S405, perform semantic analysis on the keywords to obtain the ontology semantics of the keywords.
This step obtains the ontology semantics of the keywords through semantic analysis, so that the subsequent steps can correctly generate the video summary from the text-based webpage content.
Step S406, based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the internet to form a background image.
This step obtains the background image, which serves as material for synthesizing the video summary in the subsequent steps.
Step S407, based on the ontology semantics of the keywords, obtain the animation template corresponding to the keywords from the pre-established grammar database.
This step obtains the animation template corresponding to the keywords.
Step S408, convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for synthesizing the video summary.
Step S409, according to preset rendering rules, synthesize and render the background image, the animation template and the audio data into a video file.
This step renders and generates the video file.
Embodiment 5:
On the basis of Embodiment 1, Embodiment 5 provides a preferred method for generating a video summary from text-based webpage content. Its flow chart is shown in Fig. 5, and the main steps include:
Step S501, extract the text summary of the text content of the target news webpage.
This step obtains the text summary, which introduces the news content on the target news webpage to the user completely and concisely.
Step S502, parse the text summary to obtain the keywords in the text summary.
This step obtains the keywords of the news summary and supplies keyword information to the subsequent steps.
Step S503, search for the keywords in the preset grammar database to obtain all ontology descriptions related to the keywords.
The preset grammar database stores the ontology description corresponding to each word, so searching it for a keyword yields all ontology descriptions related to that keyword. For example, searching the grammar database gives "apparel" as the ontology description of "shirt".
Step S504, use the Web Ontology Language (OWL) to determine, from all ontology descriptions of the keyword, the ontology semantics of the keyword in the current context.
A keyword may have several ontology descriptions; "apple", for example, may be a "fruit" or a "company". In that case OWL is used to determine the ontology semantics of the keyword in the current context and obtain the correct description of the keyword, so that the subsequent steps can correctly generate the video summary from the news webpage content.
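The lookup and disambiguation in steps S503 and S504 can be illustrated with a small sketch. It does not implement OWL reasoning: as a clearly labelled stand-in, it picks the ontology description whose cue words overlap most with the other words in the context. The `ONTOLOGY` table, its cue words and the name `ontology_semantics` are hypothetical.

```python
# Hypothetical grammar-database excerpt: keyword -> {ontology description: cue words}.
ONTOLOGY = {
    "apple": {
        "fruit": {"eat", "juice", "orchard", "sweet"},
        "company": {"phone", "release", "shares", "product"},
    },
    "shirt": {"apparel": set()},
}


def ontology_semantics(keyword, context, ontology=ONTOLOGY):
    """Return the description of `keyword` that best fits the context words.

    Stand-in heuristic (not OWL): count cue words shared with the context;
    on a tie the description listed first wins. Unknown keywords give None.
    """
    senses = ontology.get(keyword, {})
    if not senses:
        return None
    ctx = set(context) - {keyword}
    return max(senses, key=lambda sense: len(senses[sense] & ctx))
```

A keyword with a single description, such as "shirt", resolves to that description regardless of context; only ambiguous keywords such as "apple" depend on the surrounding words.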
Step S505, based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the internet to form a background image.
This step obtains the background image, which serves as material for synthesizing the video summary in the subsequent steps.
Step S506, based on the ontology semantics of the keywords, obtain the animation template corresponding to the keywords from the pre-established grammar database.
This step obtains the animation template corresponding to the keywords.
Step S507, convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for synthesizing the video summary.
Step S508, according to preset rendering rules, synthesize and render the background image, the animation template and the audio data into a video file.
This step renders and generates the video file.
Embodiment 6:
Embodiment 6 provides a device for generating a video summary from text-based webpage content. Its structure is shown in Fig. 6, and it includes:
an extraction module 21, for extracting the text summary of the text content of the target news webpage;
a keyword acquisition module 22, for parsing the text summary to obtain the keywords in the text summary;
a semantic analysis module 23, for performing semantic analysis on the keywords to obtain the ontology semantics of the keywords;
a background image forming module 24, for retrieving corresponding pictures or videos from the internet based on the ontology semantics of the keywords to form a background image;
an animation template acquisition module 25, for obtaining the animation template corresponding to the keywords from the pre-established grammar database based on the ontology semantics of the keywords;
an audio conversion module 26, for converting the text summary into audio data;
a video synthesis module 27, for synthesizing and rendering the background image, the animation template and the audio data into a video file according to preset rendering rules.
The device provided by Embodiment 6 first extracts the text summary of the target news webpage's text content through the extraction module 21. The keyword acquisition module 22 then parses the extracted text summary to obtain the keywords in it. Next, the semantic analysis module 23 analyzes the keywords to obtain their ontology semantics. Based on these ontology semantics, the background image forming module 24 retrieves corresponding pictures or videos from the internet to form a background image, and the animation template acquisition module 25 obtains the animation template corresponding to the keywords from the pre-established grammar database. The audio conversion module 26 converts the text summary into audio data. Finally, the video synthesis module 27 synthesizes and renders the background image, the animation template and the audio data into a video file according to preset rendering rules. When a user visits a news webpage, the video summary generated from the content of that webpage can be seen on the page.
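The data flow between modules 21 to 27 can be sketched as a simple composition of callables. This is an architectural illustration only: the class name `VideoSummaryDevice`, the field names and the type signatures are assumptions, and each module is left as a pluggable function rather than a real implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VideoSummaryDevice:
    extract: Callable[[str], str]                # module 21: page link -> text summary
    get_keywords: Callable[[str], List[str]]     # module 22: summary -> keywords
    analyze: Callable[[List[str]], dict]         # module 23: keywords -> ontology semantics
    form_background: Callable[[dict], list]      # module 24: semantics -> background images
    get_templates: Callable[[dict], list]        # module 25: semantics -> animation templates
    to_audio: Callable[[str], bytes]             # module 26: summary -> audio data
    compose: Callable[[list, list, bytes], str]  # module 27: assets -> video file name

    def run(self, url: str) -> str:
        """Run the modules in the order described for Embodiment 6."""
        summary = self.extract(url)
        keywords = self.get_keywords(summary)
        semantics = self.analyze(keywords)
        backgrounds = self.form_background(semantics)
        templates = self.get_templates(semantics)
        audio = self.to_audio(summary)
        return self.compose(backgrounds, templates, audio)
```

Modules 24, 25 and 26 depend only on earlier outputs and not on one another, so in this sketch their order within `run` is interchangeable.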
The device provided by Embodiment 6 can generate a video summary from the text content of a target news webpage and display the video summary on that webpage. When browsing the news webpage, the user can obtain the news by watching the video summary. Compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the webpage.
Embodiment 7:
On the basis of Embodiment 6, Embodiment 7 provides a device for generating a video summary from text-based webpage content, in which the structure of the keyword acquisition module 22 is as shown in Fig. 7 and includes:
a word segmentation unit 22a, for performing word segmentation on the text summary;
a part-of-speech determining unit 22b, for comparing the words obtained from segmentation with the word templates in the preset grammar database to determine the part of speech of each word;
a keyword selection unit 22c, for selecting, according to the part-of-speech determination, the nouns and numerals from the segmented words as the keywords of the text summary.
Embodiment 8:
Embodiment 8 provides a system for generating a video summary from text-based webpage content, including a user terminal 31 and a news server 32. The user terminal 31 is connected with the news server 32 through the internet, as shown in Fig. 8.
The news server 32 includes the device for generating a video summary from text-based webpage content provided in Embodiment 6 or 7.
The news server 32 generates the video summary from the news webpage content. When the user accesses the news webpage through the user terminal 31, the video summary generated from the text-based webpage content can be seen on the page.
The system provided by Embodiment 8 can generate a video summary from the text content of a target news webpage and display the video summary on that webpage. When browsing the news webpage, the user can obtain the news by watching the video summary. Compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the webpage.
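How the video summary ends up on the news webpage is not detailed in the text. As one possible illustration, a server-side component could insert an HTML `<video>` element into the page before sending it to the user terminal. The sketch below assumes the page contains a plain `<body>` tag without attributes; the function name `embed_video_summary` and the example URL are hypothetical.

```python
import html


def embed_video_summary(page_html: str, video_url: str) -> str:
    """Return page_html with a <video> player placed right after <body>."""
    # Escape the URL so it is safe inside a double-quoted attribute.
    tag = f'<video controls src="{html.escape(video_url, quote=True)}"></video>'
    marker = "<body>"
    idx = page_html.find(marker)
    if idx == -1:
        # No plain <body> tag found: prepend the player instead.
        return tag + page_html
    cut = idx + len(marker)
    return page_html[:cut] + tag + page_html[cut:]
```

A production plug-in would more likely use an HTML parser or the site's templating system; plain string insertion is used here only to keep the sketch self-contained.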
In this embodiment, the user terminal 31 can be any one of an iPhone application, an iPad tablet application, an Android phone application, an Android tablet application, a TV set-top box application, a Windows platform software application, a Mac platform software application, an IE browser plug-in, a Chrome browser plug-in and a Firefox browser plug-in.
The news server 32 can be any one of a WordPress plug-in, a Drupal plug-in, a Joomla plug-in, a MediaWiki plug-in, a Discuz plug-in, a PHPWind plug-in and a webpage JavaScript script.
The devices and modules provided by the embodiments of the present invention have the same implementation principles and technical effects as the foregoing method embodiments. For brevity, where the device embodiments do not mention something, refer to the corresponding content in the foregoing method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. The device embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed can be indirect coupling or communication connection through communication interfaces, devices or units, and can be electrical, mechanical or in other forms.
In addition, the functional units in the embodiments of the present invention can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit.
The foregoing is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (8)
1. A method for generating a video summary from text-based webpage content, characterized by comprising:
extracting a text summary of the text content of a target news webpage;
parsing the text summary to obtain keywords in the text summary;
performing semantic analysis on the keywords to obtain the ontology semantics of the keywords;
based on the ontology semantics of the keywords, retrieving corresponding pictures or videos from the internet to form a background image;
based on the ontology semantics of the keywords, obtaining an animation template corresponding to the keywords from a pre-established grammar database;
converting the text summary into audio data;
according to preset rendering rules, synthesizing and rendering the background image, the animation template and the audio data into a video file;
wherein extracting the text summary of the text content of the target news webpage comprises:
obtaining the news webpage based on a webpage link;
removing additional information from the news webpage, wherein the additional information includes one or more of: advertisements, pictures, videos, frames and charts;
extracting the text content of the news webpage after the additional information is removed;
extracting key sentences from the text content to form the text summary;
wherein extracting key sentences from the text content to form the text summary comprises:
calculating in turn the similarity between every two sentences in the text content;
classifying the sentences in the text content according to the calculated similarities;
according to the classification result, extracting sentences from each class of sentences respectively and combining them to obtain candidate summaries;
selecting, from the candidate summaries, the candidate summary whose length differs least from a preset text summary length as the text summary of the news webpage, wherein the preset text summary length is determined according to the length of the video to be generated and a preset reading speed for the text summary.
2. The method according to claim 1, characterized in that calculating in turn the similarity between every two sentences in the text content comprises:
calculating the total number of words shared by the current two sentences;
dividing the total by the average length of the current two sentences to obtain the similarity of the current two sentences;
calculating the similarity between every two sentences in the text content in the same way as the similarity of the current two sentences.
3. The method according to claim 1, characterized in that parsing the text summary to obtain keywords in the text summary comprises:
performing word segmentation on the text summary;
comparing the words obtained from the segmentation with the word templates in the grammar database to determine the part of speech of each word obtained from the segmentation;
according to the part-of-speech determination, selecting the nouns and numerals from the segmented words as the keywords of the text summary.
4. The method according to claim 1, characterized in that performing semantic analysis on the keywords to obtain the ontology semantics of the keywords comprises:
searching for the keywords in the grammar database to obtain all ontology descriptions related to the keywords;
using the Web Ontology Language (OWL) to determine, from all ontology descriptions of the keywords, the ontology semantics of the keywords in the current context.
5. The method according to claim 1, characterized in that synthesizing and rendering the background image, the animation template and the audio data into a video file according to preset rendering rules comprises:
setting a mapping relationship between the keywords in the audio data and the background image and animation template corresponding to the keywords;
according to the mapping relationship, synthesizing and rendering the background image, the animation template and the audio data.
6. A device for generating a video summary from text-based webpage content, applying the method according to any one of claims 1 to 5, characterized by comprising:
an extraction module, for extracting a text summary of the text content of a target news webpage;
a keyword acquisition module, for parsing the text summary to obtain keywords in the text summary;
a semantic analysis module, for performing semantic analysis on the keywords to obtain the ontology semantics of the keywords;
a background image forming module, for retrieving corresponding pictures or videos from the internet based on the ontology semantics of the keywords to form a background image;
an animation template acquisition module, for obtaining an animation template corresponding to the keywords from a preset grammar database based on the ontology semantics of the keywords;
an audio conversion module, for converting the text summary into audio data;
a video synthesis module, for synthesizing and rendering the background image, the animation template and the audio data into a video file according to preset rendering rules.
7. The device according to claim 6, characterized in that the keyword acquisition module comprises:
a word segmentation unit, for performing word segmentation on the text summary;
a part-of-speech determining unit, for comparing the words obtained from the segmentation with the word templates in the grammar database to determine the part of speech of each word obtained from the segmentation;
a keyword selection unit, for selecting, according to the part-of-speech determination, the nouns and numerals from the segmented words as the keywords of the text summary.
8. A system for generating a video summary from text-based webpage content, characterized by comprising: a user terminal and a news server connected with the user terminal through the internet;
wherein the news server includes the device for generating a video summary from text-based webpage content according to claim 6 or 7.
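The similarity measure of claim 2 and the classification step of claim 1 can be illustrated with a short sketch. Several points are assumptions rather than statements from the claims: words are taken as whitespace-separated tokens, shared words are counted once each, and sentences are grouped greedily against the first sentence of each class using a hypothetical `threshold`; the claims do not specify how the classification is carried out.

```python
def similarity(a: str, b: str) -> float:
    """Shared-word count divided by the average length of the two sentences."""
    words_a, words_b = a.split(), b.split()
    shared = len(set(words_a) & set(words_b))      # distinct words in common
    mean_len = (len(words_a) + len(words_b)) / 2
    return shared / mean_len if mean_len else 0.0


def classify(sentences, threshold=0.5):
    """Greedy grouping: join the first class whose first sentence is similar enough.

    Returns a list of classes; each class is a list of (position, sentence)
    pairs, with positions starting at 1 in original text order.
    """
    classes = []
    for pos, sentence in enumerate(sentences, start=1):
        for cls in classes:
            if similarity(cls[0][1], sentence) >= threshold:
                cls.append((pos, sentence))
                break
        else:
            classes.append([(pos, sentence)])      # start a new class
    return classes
```

Candidate summaries are then formed by taking sentences from each resulting class, and the one closest to the preset length is kept.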
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510156123.9A CN104731959B (en) | 2015-04-03 | 2015-04-03 | Method, apparatus and system for generating a video summary from text-based webpage content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104731959A CN104731959A (en) | 2015-06-24 |
CN104731959B true CN104731959B (en) | 2017-10-17 |
Family
ID=53455846
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510156123.9A Expired - Fee Related CN104731959B (en) | 2015-04-03 | 2015-04-03 | Method, apparatus and system for generating a video summary from text-based webpage content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104731959B (en) |
Families Citing this family (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106340049A (en) * | 2015-07-15 | 2017-01-18 | 中国传媒大学 | Verification code generating method based on animation semantics |
CN105516457A (en) * | 2015-11-24 | 2016-04-20 | 小米科技有限责任公司 | Communication message processing method and apparatus |
CN106504304B (en) * | 2016-09-14 | 2019-09-24 | 厦门黑镜科技有限公司 | A kind of method and device of animation compound |
CN106777202A (en) * | 2016-12-23 | 2017-05-31 | 北京奇虎科技有限公司 | Information based on search recommends method and device |
CN106550268B (en) * | 2016-12-26 | 2020-08-07 | Tcl科技集团股份有限公司 | Video processing method and video processing device |
CN108513671B (en) | 2017-01-26 | 2021-08-27 | 华为技术有限公司 | Display method and terminal for 2D application in VR equipment |
CN108965737B (en) * | 2017-05-22 | 2022-03-29 | 腾讯科技(深圳)有限公司 | Media data processing method, device and storage medium |
CN107832382A (en) * | 2017-10-30 | 2018-03-23 | 百度在线网络技术(北京)有限公司 | Method, apparatus, equipment and storage medium based on word generation video |
CN107943839A (en) * | 2017-10-30 | 2018-04-20 | 百度在线网络技术(北京)有限公司 | Method, apparatus, equipment and storage medium based on picture and word generation video |
CN109756751B (en) * | 2017-11-07 | 2023-02-03 | 腾讯科技(深圳)有限公司 | Multimedia data processing method and device, electronic equipment and storage medium |
CN108009152A (en) * | 2017-12-04 | 2018-05-08 | 陕西识代运筹信息科技股份有限公司 | A kind of data processing method and device of the text similarity analysis based on Spark-Streaming |
CN108470036A (en) * | 2018-02-06 | 2018-08-31 | 北京奇虎科技有限公司 | A kind of method and apparatus that video is generated based on story text |
CN110309351A (en) * | 2018-02-14 | 2019-10-08 | 阿里巴巴集团控股有限公司 | Video image generation, device and the computer system of data object |
WO2019227429A1 (en) * | 2018-05-31 | 2019-12-05 | 优视科技新加坡有限公司 | Method, device, apparatus, terminal, server for generating multimedia content |
CN110807126B (en) * | 2018-08-01 | 2023-05-26 | 腾讯科技(深圳)有限公司 | Method, device, storage medium and equipment for converting article into video |
CN108986186B (en) * | 2018-08-14 | 2023-05-05 | 山东师范大学 | Method and system for converting text into video |
CN109189916B (en) * | 2018-08-17 | 2022-04-22 | 杜林蔚 | English abstract key information extraction method and device and electronic equipment |
CN109195007B (en) * | 2018-10-19 | 2021-09-07 | 深圳市轱辘车联数据技术有限公司 | Video generation method, device, server and computer readable storage medium |
CN109325135B (en) * | 2018-10-26 | 2023-08-08 | 平安科技(深圳)有限公司 | Text-based video generation method, device, computer equipment and storage medium |
CN109672909B (en) * | 2018-11-08 | 2024-09-17 | 北京奇虎科技有限公司 | Data processing method, device, electronic equipment and readable storage medium |
CN109978021B (en) * | 2019-03-07 | 2022-09-16 | 北京大学深圳研究生院 | A dual-stream video generation method based on different feature spaces of text |
CN114666663A (en) * | 2019-04-08 | 2022-06-24 | 百度(美国)有限责任公司 | Method and apparatus for generating video |
CN111866610B (en) * | 2019-04-08 | 2022-09-30 | 百度时代网络技术(北京)有限公司 | Method and apparatus for generating information |
CN110324709A (en) * | 2019-07-24 | 2019-10-11 | 新华智云科技有限公司 | A kind of processing method, device, terminal device and storage medium that video generates |
WO2021042234A1 (en) * | 2019-09-02 | 2021-03-11 | 深圳海付移通科技有限公司 | Application introduction method, mobile terminal, and server |
CN111158924B (en) * | 2019-12-02 | 2023-09-22 | 百度在线网络技术(北京)有限公司 | Content sharing method and device, electronic equipment and readable storage medium |
JP2023513095A (en) * | 2020-02-05 | 2023-03-30 | ループ ナウ テクノロジーズ インコーポレイテッド | Machine learning video curation for selection and display |
CN111327960B (en) * | 2020-03-05 | 2022-07-29 | 北京字节跳动网络技术有限公司 | Article processing method and device, electronic equipment and computer storage medium |
CN111694947A (en) * | 2020-06-15 | 2020-09-22 | 中国银行股份有限公司 | Text abstract display method, text abstract display device, storage medium and equipment |
CN112988099A (en) * | 2021-04-09 | 2021-06-18 | 上海掌门科技有限公司 | Video display method and device |
CN112988100A (en) * | 2021-04-09 | 2021-06-18 | 上海掌门科技有限公司 | Video playing method and device |
CN113905254B (en) * | 2021-09-03 | 2024-03-29 | 前海人寿保险股份有限公司 | Video synthesis method, device, system and readable storage medium |
CN114491152B (en) * | 2021-12-02 | 2023-10-31 | 南京硅基智能科技有限公司 | Method for generating abstract video, storage medium and electronic device |
CN114297354B (en) * | 2021-12-02 | 2023-12-12 | 南京硅基智能科技有限公司 | Bullet screen generation method and device, storage medium and electronic device |
CN114238689A (en) | 2021-12-17 | 2022-03-25 | 北京百度网讯科技有限公司 | Video generation method, video generation device, electronic device, storage medium, and program product |
CN114401431B (en) * | 2022-01-19 | 2024-04-09 | 中国平安人寿保险股份有限公司 | Virtual person explanation video generation method and related device |
CN114390220B (en) * | 2022-01-19 | 2023-12-08 | 中国平安人寿保险股份有限公司 | Animation video generation method and related device |
CN114528437A (en) * | 2022-02-24 | 2022-05-24 | 深圳市天下文章科技有限公司 | Text imaging method, device, equipment and computer readable storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324760A (en) * | 2013-07-11 | 2013-09-25 | 中国农业大学 | Method and system for automatically generating nutrition health education video through commentary file |
CN103559214A (en) * | 2013-10-11 | 2014-02-05 | 中国农业大学 | Method and device for automatically generating video |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100306232A1 (en) * | 2009-05-28 | 2010-12-02 | Harris Corporation | Multimedia system providing database of shared text comment data indexed to video source data and related methods |
Also Published As
Publication number | Publication date |
---|---|
CN104731959A (en) | 2015-06-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20171017 |