
CN104731959B - Method, apparatus and system for generating a video summary from text-based web page content - Google Patents

Method, apparatus and system for generating a video summary from text-based web page content

Info

Publication number
CN104731959B
Authority
CN
China
Prior art keywords
text
keyword
news
web page
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510156123.9A
Other languages
Chinese (zh)
Other versions
CN104731959A (en)
Inventor
李国祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wei Yang Science And Technology Ltd
Original Assignee
Beijing Wei Yang Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wei Yang Science And Technology Ltd
Priority to CN201510156123.9A
Publication of CN104731959A
Application granted
Publication of CN104731959B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of video generation, and in particular to a method, apparatus and system for generating a video summary from text-based web page content. The method, apparatus and system generate a video summary from the text content of a target news web page and display the video summary on that page. A user browsing the news web page can obtain the news by watching the video summary. Compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the page.

Description

Method, apparatus and system for generating a video summary from text-based web page content
Technical field
The present invention relates to the field of video generation, and in particular to a method, apparatus and system for generating a video summary from text-based web page content.
Background
A news website is a website whose main business is publishing news. News websites include large national news portals, commercial portals and local portals; portals for individual industries also serve as news websites for those industries.
Existing news websites generally present news through text and pictures. Users today obtain news through more and more channels, such as mobile phones, tablets and televisions. Existing news applications that fetch news from news websites are also mostly based on text and pictures.
The shortcoming of existing news websites is that, in an era of fast content consumption, reading text costs users a relatively large amount of time, which makes text a poor way for a news website to present news to users.
Summary of the invention
The object of the present invention is to provide a method, apparatus and system for generating a video summary from text-based web page content, so that the news on a web page is presented to the user as a generated video summary, saving the time the user spends reading news on the news web page.
In a first aspect, an embodiment of the present invention provides a method for generating a video summary from text-based web page content, comprising: extracting a text summary of the text content of a target news web page; parsing the text summary to obtain the keywords in the text summary; performing semantic analysis on the keywords to obtain the ontology semantics of the keywords; based on the ontology semantics of the keywords, retrieving corresponding pictures or videos from the Internet to form background images; based on the ontology semantics of the keywords, obtaining animation templates corresponding to the keywords from a pre-established grammar database; converting the text summary into audio data; and, according to preset rendering rules, compositing and rendering the background images, the animation templates and the audio data into a video file.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which extracting the text summary of the text content of the target news web page comprises: obtaining a news web page based on a web page link; removing additional information from the news web page, wherein the additional information includes one or more of: advertisements, pictures, videos, frames and charts; extracting the text content of the news web page after the additional information has been removed; and extracting key sentences from the text content to compose the text summary.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which extracting key sentences from the text content to compose the text summary comprises: calculating in turn the similarity between every two sentences in the text content; classifying the sentences in the text content according to the calculated similarities; according to the classification result, extracting one sentence from each class and combining the extracted sentences to obtain candidate summaries; and selecting, from the candidate summaries, the candidate whose length differs least from a preset text summary length as the summary text of the news web page, wherein the preset text summary length is determined from the length of the video to be generated and a preset read-aloud speed for the text summary.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which calculating in turn the similarity between every two sentences in the text content comprises: counting the words shared by the current two sentences; dividing that count by the average length of the current two sentences to obtain the similarity of the current two sentences; and calculating the similarity between every two sentences in the text content in the same way.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which parsing the text summary to obtain the keywords in the text summary comprises: segmenting the text summary into words; comparing the words obtained by segmentation with the word templates in the grammar database to determine the part of speech of each word; and, according to the determined parts of speech, selecting the nouns and numerals among the segmented words as the keywords of the text summary.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which performing semantic analysis on the keywords to obtain the ontology semantics of the keywords comprises: looking up a keyword in the grammar database to obtain all ontology descriptions related to the keyword; and using the Web Ontology Language (OWL) to determine, from all the ontology descriptions of the keyword, the ontology semantics of the keyword in the current context.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, in which compositing and rendering the background images, the animation templates and the audio data into a video file according to preset rendering rules comprises: setting a mapping between the keywords in the audio data and the background images and animation templates corresponding to those keywords; and compositing and rendering the background images, the animation templates and the audio data according to the mapping.
In a second aspect, an embodiment of the present invention further provides an apparatus for generating a video summary from text-based web page content, comprising: an extraction module for extracting a text summary of the text content of a target news web page; a keyword acquisition module for parsing the text summary to obtain the keywords in the text summary; a semantic analysis module for performing semantic analysis on the keywords to obtain the ontology semantics of the keywords; a background image formation module for retrieving corresponding pictures or videos from the Internet based on the ontology semantics of the keywords to form background images; an animation template acquisition module for obtaining animation templates corresponding to the keywords from a preset grammar database based on the ontology semantics of the keywords; an audio conversion module for converting the text summary into audio data; and a video synthesis module for compositing and rendering the background images, the animation templates and the audio data into a video file according to preset rendering rules.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the keyword acquisition module comprises: a word segmentation unit for segmenting the text summary into words; a part-of-speech determination unit for comparing the words obtained by segmentation with the word templates in the grammar database to determine the part of speech of each word; and a keyword selection unit for selecting, according to the determined parts of speech, the nouns and numerals among the segmented words as the keywords of the text summary.
In a third aspect, an embodiment of the present invention further provides a system for generating a video summary from text-based web page content, comprising: a user terminal and a news server connected to the user terminal via the Internet, wherein the news server includes the apparatus for generating a video summary from text-based web page content described in the second aspect or in the first possible implementation of the second aspect.
The method, apparatus and system for generating a video summary from text-based web page content provided by the embodiments of the present invention can generate a video summary from the text content of a target news web page and display the video summary on that page. A user browsing the news web page can obtain the news by watching the video summary. Compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the page.
To make the above objects, features and advantages of the present invention clearer, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a flow diagram of the method for generating a video summary from text-based web page content provided by Embodiment 1 of the present invention;
Fig. 2 is a flow diagram of the method for generating a video summary from text-based web page content provided by Embodiment 2 of the present invention;
Fig. 3 is a flow diagram of the method for generating a video summary from text-based web page content provided by Embodiment 3 of the present invention;
Fig. 4 is a flow diagram of the method for generating a video summary from text-based web page content provided by Embodiment 4 of the present invention;
Fig. 5 is a flow diagram of the method for generating a video summary from text-based web page content provided by Embodiment 5 of the present invention;
Fig. 6 is a structural diagram of the apparatus for generating a video summary from text-based web page content provided by Embodiment 6 of the present invention;
Fig. 7 is a structural diagram of the keyword acquisition module in the apparatus for generating a video summary from text-based web page content provided by Embodiment 7 of the present invention;
Fig. 8 is a connection diagram of the system for generating a video summary from text-based web page content provided by Embodiment 8 of the present invention.
Description of main reference numerals:
21, extraction module; 22, keyword acquisition module; 23, semantic analysis module; 24, background image formation module; 25, animation template acquisition module; 26, audio conversion module; 27, video synthesis module; 22a, word segmentation unit; 22b, part-of-speech determination unit; 22c, keyword selection unit; 31, user terminal; 32, news server.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments that are described and illustrated in the drawings can generally be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1:
Embodiment 1 provides a method for generating a video summary from text-based web page content. Its flow is shown in Fig. 1, and the main processing steps include:
Step S101: extract a text summary of the text content of the target news web page.
The full text of a news item may not be concise, so the user needs more time to take in the news on the page. Extracting a text summary of the page's text content allows the news to be introduced to the user more concisely while remaining relatively complete.
In addition to the text of the news itself, a news web page usually carries other additional information, such as advertisements, pictures, videos, frames and/or charts. This additional information is not part of the effective content of the news text, so it can be removed from the news web page before the text summary is extracted.
Step S102: parse the text summary to obtain the keywords in the text summary.
The keywords in the text summary carry the key information of the news, so extracting them is a convenient way to obtain that key information. This step obtains the keywords of the news text and provides keyword information for the subsequent steps.
Step S103: perform semantic analysis on the keywords to obtain the ontology semantics of the keywords.
An ontology is an explicit specification of a conceptualization: it gives the basic terms and relations that make up the vocabulary of a domain, together with the rules for combining these terms and relations to define extensions of the vocabulary. Ontology semantics give a basic description of the news; for example, the ontology description of "shirt" is "clothing". A word may have several ontology descriptions; for example, the ontology description of "apple" may be "fruit" or "company", so the ontology semantics of the keyword in the current context must be determined. This step performs semantic analysis on the keywords to obtain their ontology semantics, so that the video summary can be generated correctly from the text-based web page content in the subsequent steps.
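The "apple" example above can be illustrated with a small sketch. The patent only states that the grammar database and OWL are used to pick the ontology description that fits the current context; it does not specify how. The lookup table `ONTOLOGY`, its cue words and the overlap-counting heuristic below are therefore purely hypothetical stand-ins, not the patented procedure.

```python
from typing import Iterable, Optional

# Hypothetical stand-in for the grammar database: each keyword maps to its
# candidate ontology descriptions, each with assumed context cue words.
ONTOLOGY = {
    "apple": {
        "fruit": {"orchard", "juice", "harvest"},
        "company": {"iphone", "shares", "launch"},
    },
    "shirt": {"clothing": set()},
}


def resolve_ontology(keyword: str, context_keywords: Iterable[str]) -> Optional[str]:
    """Pick the ontology description whose cue words overlap the context most."""
    candidates = ONTOLOGY.get(keyword)
    if not candidates:
        return None  # keyword not present in the (hypothetical) database
    context = set(context_keywords)
    # max() keeps the first candidate on ties, so a keyword with a single
    # description always resolves to that description.
    return max(candidates, key=lambda name: len(candidates[name] & context))


apple_sense = resolve_ontology("apple", ["iphone", "launch"])
shirt_sense = resolve_ontology("shirt", [])
```

A real system would replace the table and the heuristic with queries against an OWL ontology; the sketch only shows why the other keywords of the summary are needed to disambiguate a word.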
Step S104: based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the Internet to form background images.
Compositing a video summary requires source material. According to the ontology semantics of the keywords, corresponding pictures or videos are retrieved with an Internet search engine to form background images, which serve as material for compositing the video summary in the subsequent steps.
Step S105: based on the ontology semantics of the keywords, obtain animation templates corresponding to the keywords from the pre-established grammar database.
The grammar database stores the animation templates corresponding to the ontology semantics of different words, so the corresponding animation template can be obtained from the grammar database according to the ontology semantics of a keyword. Piecing together the templates corresponding to the different keywords yields the composition scheme for the complete video summary.
Step S106: convert the text summary into audio data.
The text summary is converted into audio data using suitable software, and this audio data serves as the audio material for compositing the video summary. In the video summary the news is narrated to the user as audio, which is easier to take in than a written description and saves the user time.
Step S107: according to preset rendering rules, composite and render the background images, the animation templates and the audio data into a video file.
Rendering the video file relies on suitable software and rendering rules: according to the preset rendering rules, the background images, the animation templates and the audio data are composited and rendered into a video file. During compositing, a mapping is set between each keyword in the audio data and the background image and animation template corresponding to that keyword, and the background images, the animation templates and the audio data are composited and rendered according to this mapping. For example, if keyword 1 occurs at the 3rd second of the audio and the next keyword occurs at the 5th second, the background image corresponding to keyword 1 is shown, following its animation template, between the 3rd and the 5th second. In this way the audio and the images of the video summary stay aligned, and the news is presented to the user more effectively.
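The keyword-to-image mapping described in step S107 can be sketched as a simple timeline computation. This is a minimal illustration, assuming the time at which each keyword is spoken is already known; the function name `build_timeline` and the choice to show the last keyword's image until the audio ends are assumptions, since the text only describes the interval between two consecutive keywords.

```python
from typing import List, Tuple


def build_timeline(
    keyword_times: List[Tuple[str, float]], audio_length: float
) -> List[Tuple[str, float, float]]:
    """Return (keyword, start, end) display intervals, in seconds.

    Each keyword's background image is shown from the moment the keyword is
    spoken until the next keyword is spoken. Assumption: the last keyword's
    image stays on screen until the audio ends.
    """
    ordered = sorted(keyword_times, key=lambda item: item[1])
    timeline = []
    for index, (keyword, start) in enumerate(ordered):
        if index + 1 < len(ordered):
            end = ordered[index + 1][1]
        else:
            end = audio_length
        timeline.append((keyword, start, end))
    return timeline


# The example from the text: keyword 1 at second 3, the next keyword at second 5.
timeline = build_timeline([("keyword 1", 3.0), ("keyword 2", 5.0)], audio_length=8.0)
```

With these inputs the first entry covers seconds 3 to 5, matching the example in the text; a renderer would then apply each keyword's animation template over its interval.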
Embodiment 1 provides a method for generating a video summary from text-based web page content that can generate a video summary from the text content of a target news web page and display the video summary on that page. A user browsing the news web page can obtain the news by watching the video summary. Compared with existing news websites, which present news through pictures and text, this saves the time the user spends reading news on the page.
Embodiment 2:
On the basis of Embodiment 1, Embodiment 2 provides a preferred method for generating a video summary from text-based web page content. Its flow is shown in Fig. 2, and the main steps include:
Step S201: obtain a news web page based on a web page link.
The web page link may be the address of the current news web page, sent to the news server when a user accesses that page; it may also be the address of any news web page found when the news server scans its corresponding news website. Based on the link obtained, the news server retrieves the corresponding news web page.
Step S202: remove additional information from the news web page, wherein the additional information includes one or more of: advertisements, pictures, videos, frames and charts.
Besides the text that presents the news, the news web page obtained by the news server may contain unrelated additional information such as advertisements, pictures, videos, frames and charts. This additional information does not help in understanding the news, so step S202 removes it from the news web page.
Step S203: extract the text content of the news web page after the additional information has been removed.
After the additional information has been removed from the news web page, the news server obtains the text that presents the news, so that the later steps can generate the video summary from that text.
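Steps S202 and S203 can be sketched with Python's standard `html.parser` module. The patent does not say how additional information is recognised, so the tag set `SKIPPED_TAGS` below is an assumption (real advertisement detection is site-specific); pictures need no special handling in this sketch because `<img>` elements contain no text.

```python
from html.parser import HTMLParser

# Assumed containers of "additional information" (ads, videos, frames, charts).
SKIPPED_TAGS = {"script", "style", "iframe", "video", "figure", "table", "aside"}


class NewsTextExtractor(HTMLParser):
    """Collect visible text while ignoring everything inside skipped tags."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside at least one skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.parts.append(text)


def extract_text(html: str) -> str:
    """Return the page text with the assumed additional information removed."""
    parser = NewsTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = (
    "<html><body><p>Headline text.</p><aside>Buy now!</aside>"
    '<img src="a.png"><table><tr><td>1</td></tr></table>'
    "<p>Body text.</p></body></html>"
)
clean_text = extract_text(page)
```

The resulting plain text is what the later summarisation steps would operate on.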
Step S204: extract key sentences from the text content to compose the text summary.
The news text on a news web page may not be concise and may contain many words, so the user needs more time to take in the news on the page. Key sentences are therefore extracted from the text content to compose a text summary, so that the news on the page can be introduced to the user more concisely, saving the time the user spends obtaining the news.
Step S205: parse the text summary to obtain the keywords in the text summary.
This step obtains the keywords of the news summary and provides keyword information for the subsequent steps.
Step S206: perform semantic analysis on the keywords to obtain the ontology semantics of the keywords.
This step performs semantic analysis on the keywords to obtain their ontology semantics, so that the video summary can be generated correctly from the text-based web page content in the subsequent steps.
Step S207: based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the Internet to form background images.
This step obtains the background images, which serve as material for compositing the video summary in the subsequent steps.
Step S208: based on the ontology semantics of the keywords, obtain animation templates corresponding to the keywords from the pre-established grammar database.
This step obtains the way in which the video summary is to be assembled.
Step S209: convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for compositing the video summary.
Step S210: according to preset rendering rules, composite and render the background images, the animation templates and the audio data into a video file.
This step renders the video file.
The method for generating a video summary from text-based content provided in Embodiment 2 works in the same way and has the same advantages as the method provided in Embodiment 1, so they are not repeated here.
Embodiment 3:
On the basis of Embodiment 2, Embodiment 3 provides a preferred method for generating a video summary from text-based web page content. Its flow is shown in Fig. 3, and the main steps include:
Step S301: obtain a news web page based on a web page link.
This step obtains the corresponding news web page.
Step S302: remove additional information from the news web page, wherein the additional information includes one or more of: advertisements, pictures, videos, frames and charts.
This step removes the additional information from the news web page.
Step S303: extract the text content of the news web page after the additional information has been removed.
This step obtains the text of the news.
Step S304: calculate in turn the similarity between every two sentences in the text content.
Similar sentences generally contain similar information. To present the news to the user concisely and completely, the sentences in the text content can be classified by similarity and one sentence extracted from each class, so that the news on the page is introduced to the user concisely and completely.
Specifically, the similarity between two sentences is calculated as follows:
First, count the words shared by the current two sentences.
The total number of words that appear in both of the current two sentences is counted; the larger this number, the more similar the two sentences are considered to be.
Second, divide that count by the average length of the current two sentences to obtain their similarity.
In this method the length of a sentence is defined as the number of characters in it. Dividing the number of words that appear in both sentences by the average character count of the two sentences gives their similarity: the more words two sentences share and the shorter their average length, the more similar they are considered to be. This method makes it easy to obtain the similarity between two sentences. For example, let two sentences in the text content be sentence 1 and sentence 2. Sentence 1 contains 4 words of 2 characters each: word 1, word 2, word 3 and word 4. Sentence 2 contains 6 words of 2 characters each: word 3, word 4, word 5, word 6, word 7 and word 8. The two sentences share 2 words, word 3 and word 4. Sentence 1 is 8 characters long and sentence 2 is 12 characters long, so their average length is 10 characters, and the similarity of sentence 1 and sentence 2 is therefore 2 / 10 = 0.2.
Using the above method, the similarity between every two sentences in the text content can be calculated.
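The similarity measure of step S304 can be written down directly. The sketch below assumes each sentence has already been segmented into a list of words and counts each shared word once; the function name `sentence_similarity` and the placeholder tokens are illustrative only.

```python
from typing import List


def sentence_similarity(words_a: List[str], words_b: List[str]) -> float:
    """Shared-word count divided by the mean sentence length in characters."""
    shared = len(set(words_a) & set(words_b))  # each shared word counted once
    length_a = sum(len(word) for word in words_a)
    length_b = sum(len(word) for word in words_b)
    mean_length = (length_a + length_b) / 2
    return shared / mean_length if mean_length else 0.0


# The worked example from the text, with two-character placeholder tokens
# standing in for the two-character words.
sentence_1 = ["w1", "w2", "w3", "w4"]
sentence_2 = ["w3", "w4", "w5", "w6", "w7", "w8"]
similarity = sentence_similarity(sentence_1, sentence_2)
```

Here the two sentences share 2 words and have lengths 8 and 12, so the function reproduces the value 2 / 10 = 0.2 from the example.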
Step S305: classify the sentences in the text content according to the calculated similarities.
All sentences are classified according to the results of step S304. For example, if the similarity between sentence 1 and sentence 2 is greater than the similarity between sentence 1 and every other sentence, and is also greater than the average similarity between sentences, then sentence 1 and sentence 2 are placed in the same class; otherwise they are placed in different classes. Once the sentences have been classified, the sentences within one class can be regarded as expressing the same meaning. Extracting one sentence from every class therefore introduces the summary of the news text on the page to the user completely and concisely, saving the time the user spends obtaining the news.
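One possible reading of the classification rule in step S305 is sketched below: a sentence is merged with its single most similar sentence when that similarity also exceeds the average pairwise similarity. The text gives the rule only for one pair, so applying it to every sentence and merging the results with a union-find structure is an assumption, as is the function name `classify_sentences`.

```python
from itertools import combinations
from typing import List


def classify_sentences(sim: List[List[float]]) -> List[List[int]]:
    """Group sentence indices given a symmetric similarity matrix."""
    n = len(sim)
    pairs = list(combinations(range(n), 2))
    mean_sim = sum(sim[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0

    parent = list(range(n))  # union-find forest, one tree per class

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            continue
        best = max(others, key=lambda j: sim[i][j])
        rest = [sim[i][j] for j in others if j != best]
        # Merge only if the best match beats every other sentence and the mean.
        if sim[i][best] > mean_sim and all(sim[i][best] > r for r in rest):
            parent[find(i)] = find(best)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Four sentences (indices 0..3): 0 and 2 are similar, 1 and 3 are similar.
matrix = [
    [0.0, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.4],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.4, 0.0, 0.0],
]
classes = classify_sentences(matrix)
```

With this matrix the sentences fall into two classes, {0, 2} and {1, 3}; a sentence that has no sufficiently similar partner stays in a class of its own.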
Step S306: according to the classification result, extract one sentence from each class and combine the extracted sentences to obtain candidate summaries.
The text content obtained earlier can be classified according to the similarity between sentences, dividing the sentences into several classes, each of which may contain more than one sentence with similar content. If no summary were extracted from these classes, the news content would remain cumbersome. Extracting one sentence from each class yields a candidate summary that introduces the content of the news web page to the user completely and concisely. Because a class may contain several sentences, there may be several candidate summaries, and a suitable one must be chosen in the following step.
Step S307: select, from the candidate summaries, the candidate whose length differs least from the preset text summary length as the summary text of the news text, wherein the preset text summary length is determined from the length of the video to be generated and the preset read-aloud speed for the text summary.
The preset text summary length is determined from the length of the video to be generated and the preset read-aloud speed for the text summary. For example, if the video length is set to 1 minute and the read-aloud speed to 120 characters per minute, the preset text summary length is 120 characters. A suitable text summary must then be chosen from the candidate summaries obtained in step S306: the candidate whose length differs least from the preset text summary length is selected as the summary text of the news text. When several candidates have the same length, the one that uses the earliest sentences is chosen. For example, suppose the sentences obtained fall into two classes, with sentence 1 and sentence 3 in one class and sentence 2 and sentence 4 in the other; the combined length of sentence 1 and sentence 2 differs least from the preset text summary length and equals the combined length of sentence 3 and sentence 4; and sentence 1 is the sentence that appears first in the text. The text summary then consists of sentence 1 and sentence 2. The text summary obtained in this step presents the news on the target news web page to the user completely and concisely.
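Steps S306 and S307 can be sketched as an exhaustive search over one-sentence-per-class combinations. The sketch assumes the classes are given as lists of sentence indices and that "the earliest sentences" means the lexicographically smallest tuple of indices; both the tie-break encoding and the function names are assumptions.

```python
from itertools import product
from typing import List


def target_length(video_minutes: float, chars_per_minute: int) -> int:
    """Preset summary length = video length x read-aloud speed."""
    return int(video_minutes * chars_per_minute)


def choose_summary(
    sentences: List[str], classes: List[List[int]], target: int
) -> List[str]:
    """Pick one sentence per class so the total length is closest to target.

    Ties are broken in favour of the combination using the earliest sentences.
    """
    best_key, best_pick = None, ()
    for combo in product(*classes):
        picked = tuple(sorted(combo))  # keep sentences in document order
        length = sum(len(sentences[i]) for i in picked)
        key = (abs(length - target), picked)
        if best_key is None or key < best_key:
            best_key, best_pick = key, picked
    return [sentences[i] for i in best_pick]


# The example from the text: a 1-minute video read at 120 characters per
# minute, with sentences 1 and 3 in one class and sentences 2 and 4 in the other.
sentences = ["a" * 60, "b" * 60, "c" * 60, "d" * 60]
target = target_length(1, 120)
summary = choose_summary(sentences, [[0, 2], [1, 3]], target)
```

All four combinations here are 120 characters long, so the tie-break selects the first and second sentences, as in the example. The search is exponential in the number of classes, which is acceptable for a short article but would need pruning for long texts.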
Step S308: parse the text summary to obtain the keywords in the text summary.
This step obtains the keywords of the news summary and provides keyword information for the subsequent steps.
Step S309: perform semantic analysis on the keywords to obtain the ontology semantics of the keywords.
This step performs semantic analysis on the keywords to obtain their ontology semantics, so that the video summary can be generated correctly from the text-based web page content in the subsequent steps.
Step S310: based on the ontology semantics of the keywords, retrieve corresponding pictures or videos from the Internet to form background images.
This step obtains the background images, which serve as material for compositing the video summary in the subsequent steps.
Step S311: based on the ontology semantics of the keywords, obtain animation templates corresponding to the keywords from the pre-established grammar database.
This step obtains the video templates corresponding to the keywords.
Step S312: convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for compositing the video summary.
Step S313: according to preset rendering rules, composite and render the background images, the animation templates and the audio data into a video file.
This step renders the video file.
Embodiment 4:
Embodiment 4, building on Embodiment 1, provides a preferred method for generating a video summary from text-based web page content. Its flowchart is shown in Fig. 4, and the main steps include:
Step S401: extract the text summary of the text content of the target news web page.
This step obtains the text summary, which can present the news on the target news web page to the user completely and concisely.
Step S402: perform word segmentation on the text summary.
Chinese sentences use the Chinese character as the basic unit and, unlike English sentences, carry no word-boundary information. The text summary is therefore segmented first to obtain the word segmentation information of the text summary.
Step S403: compare the words obtained from segmentation with the word templates in the preset grammar database to determine the part of speech of each word obtained from segmentation.
Word templates are stored in the grammar database. Comparing the words obtained from segmentation with the word templates in the preset grammar database determines the part of speech of each word, that is, whether the word is a noun, verb, numeral, measure word, pronoun, adjective, adverb, preposition, conjunction, auxiliary word, onomatopoeia, or interjection. Function words such as adverbs, prepositions, conjunctions, auxiliary words, onomatopoeia, and interjections generally carry no key information, so determining the part of speech in this way allows the keywords to be obtained more quickly.
Step S404: according to the part-of-speech determination, select the nouns and numerals from the segmented words as the keywords of the text summary.
The keywords of the news summary on a news web page are its nouns and numerals: nouns describe the subjects and objects of the news, and numerals describe the data in the news. Extracting the nouns and numerals from the text summary yields the key information of the news.
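Steps S403 and S404 can be sketched as a lookup followed by a filter. In this hedged example the word templates are modelled as a plain dictionary from word to part of speech, which is an assumption about the grammar database; treating digit strings as numerals is likewise a shortcut chosen for the example. The input is assumed to be already segmented, and English tokens are used only for readability.

```python
KEYWORD_PARTS_OF_SPEECH = {"noun", "numeral"}


def part_of_speech(word, word_templates):
    """Look the word up in the (hypothetical) word-template table."""
    if word.isdigit():
        return "numeral"  # shortcut for the example, not part of the source text
    return word_templates.get(word, "unknown")


def select_keywords(words, word_templates):
    """Keep nouns and numerals, preserving order and dropping repeats."""
    keywords = []
    for word in words:
        pos = part_of_speech(word, word_templates)
        if pos in KEYWORD_PARTS_OF_SPEECH and word not in keywords:
            keywords.append(word)
    return keywords


# Hypothetical word templates and an already segmented summary.
templates = {
    "apple": "noun", "released": "verb", "new": "adjective",
    "phones": "noun", "in": "preposition", "beijing": "noun",
}
segmented = ["apple", "released", "3", "new", "phones", "in", "beijing"]
keywords = select_keywords(segmented, templates)
```

The verb, adjective, and preposition are discarded, leaving the nouns and the numeral as keywords.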
Step S405: perform semantic analysis on the keywords to obtain the keyword ontology semantics.
This step performs semantic analysis on the keywords to obtain their ontology semantics, so that the video summary can be generated correctly from the text-based web page content in the subsequent steps.
Step S406: based on the keyword ontology semantics, retrieve corresponding pictures or videos from the Internet to form a background image.
This step obtains the background image, which serves as material for synthesizing the video summary in the subsequent steps.
Step S407: based on the keyword ontology semantics, obtain the animation template corresponding to the keywords from the pre-established grammar database.
This step obtains the video template corresponding to the keywords.
Step S408: convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for synthesizing the video summary.
Step S409: according to preset rendering rules, synthesize and render the background image, the animation template, and the audio data into a video file.
This step renders and generates the video file.
Embodiment 5:
Embodiment 5, building on Embodiment 1, provides a preferred method for generating a video summary from text-based web page content. Its flowchart is shown in Fig. 5, and the main steps include:
Step S501: extract the text summary of the text content of the target news web page.
This step obtains the text summary, which can introduce the news content on the target news web page to the user completely and concisely.
Step S502: parse the text summary to obtain the keywords in the text summary.
This step obtains the keywords of the news summary and provides keyword information for the subsequent steps.
Step S503: search for the keyword in the preset grammar database to obtain all ontology descriptions related to the keyword.
The preset grammar database stores the ontology description corresponding to each word, so searching the grammar database for a keyword returns all ontology descriptions related to that keyword. For example, searching the grammar database gives "clothing" as the ontology description of "shirt".
Step S504: using the Web Ontology Language (OWL), determine the keyword ontology semantics in the current context from all ontology descriptions of the keyword.
A keyword may have multiple ontology descriptions. For example, "apple" may be a "fruit" or a "company". In that case, OWL is used to determine the keyword ontology semantics in the current context and obtain the correct description of the keyword, so that the video summary can be generated correctly from the news web page content in the subsequent steps.
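The idea of choosing one ontology description by context can be illustrated with a small sketch. Note that this example does not use OWL: a simple context-word overlap heuristic stands in for the OWL-based determination described above, and the database layout, the cue words, and the function name are all hypothetical.

```python
def disambiguate(keyword, context_words, ontology_db):
    """Return the ontology description whose cue words overlap most with the
    context, or None if the keyword is not in the database. On a tie, the
    description listed first wins."""
    descriptions = ontology_db.get(keyword, {})
    if not descriptions:
        return None
    context = set(context_words)
    return max(descriptions, key=lambda d: len(context & descriptions[d]))


# Hypothetical grammar-database excerpt: keyword -> description -> cue words.
ontology_db = {
    "apple": {
        "fruit": {"eat", "juice", "orchard", "sweet"},
        "company": {"phone", "released", "shares", "product"},
    },
    "shirt": {"clothing": {"wear", "cotton"}},
}

meaning = disambiguate("apple", ["apple", "released", "new", "phone"], ontology_db)
```

In a context that mentions a release and a phone, "apple" resolves to "company"; in a context that mentions juice, it would resolve to "fruit".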
Step S505: based on the keyword ontology semantics, retrieve corresponding pictures or videos from the Internet to form a background image.
This step obtains the background image, which serves as material for synthesizing the video summary in the subsequent steps.
Step S506: based on the keyword ontology semantics, obtain the animation template corresponding to the keywords from the pre-established grammar database.
This step obtains the video template corresponding to the keywords.
Step S507: convert the text summary into audio data.
This step converts the text summary into audio data, which serves as the audio material for synthesizing the video summary.
Step S508: according to preset rendering rules, synthesize and render the background image, the animation template, and the audio data into a video file.
This step renders and generates the video file.
Embodiment 6:
Embodiment 6 provides a device for generating a video summary from text-based web page content. Its structure is shown in Fig. 6, and it includes:
Extraction module 21, configured to extract the text summary of the text content of the target news web page;
Keyword acquisition module 22, configured to parse the text summary and obtain the keywords in the text summary;
Semantic analysis module 23, configured to perform semantic analysis on the keywords and obtain the keyword ontology semantics;
Background image forming module 24, configured to retrieve corresponding pictures or videos from the Internet based on the keyword ontology semantics and form a background image;
Animation template acquisition module 25, configured to obtain the animation template corresponding to the keywords from the pre-established grammar database based on the keyword ontology semantics;
Audio conversion module 26, configured to convert the text summary into audio data;
Video synthesis module 27, configured to synthesize and render the background image, the animation template, and the audio data into a video file according to preset rendering rules.
In the device provided by Embodiment 6, the extraction module 21 extracts the text summary of the text content of the target news web page. The keyword acquisition module 22 then parses the extracted text summary and obtains the keywords in it. Next, the semantic analysis module 23 analyzes the keywords to obtain the keyword ontology semantics; the background image forming module 24 retrieves corresponding pictures or videos from the Internet based on the keyword ontology semantics and forms a background image; the animation template acquisition module 25 obtains the animation template corresponding to the keywords from the pre-established grammar database based on the keyword ontology semantics; and the audio conversion module 26 converts the text summary into audio data. Finally, the video synthesis module 27 synthesizes and renders the background image, the animation template, and the audio data into a video file according to preset rendering rules. When a user visits a news web page, the video summary generated from the corresponding news web page content can be seen on the page.
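The data flow between modules 21 to 27 can be sketched as a chain of pluggable functions. This is only an architectural illustration under stated assumptions: the class name, the field names, and the stub functions are hypothetical, and each stub stands in for a module whose real behaviour (web extraction, retrieval, speech synthesis, rendering) is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class VideoSummaryDevice:
    extract_summary: Callable[[str], str]                    # extraction module 21
    get_keywords: Callable[[str], List[str]]                 # keyword acquisition module 22
    analyze_semantics: Callable[[List[str]], Dict[str, str]]  # semantic analysis module 23
    form_background: Callable[[Dict[str, str]], Any]         # background image forming module 24
    get_animation_template: Callable[[Dict[str, str]], Any]  # animation template acquisition module 25
    to_audio: Callable[[str], Any]                           # audio conversion module 26
    render: Callable[[Any, Any, Any], Any]                   # video synthesis module 27

    def generate(self, page_url: str) -> Any:
        """Run the modules in the order described for Embodiment 6."""
        summary = self.extract_summary(page_url)
        keywords = self.get_keywords(summary)
        semantics = self.analyze_semantics(keywords)
        background = self.form_background(semantics)
        template = self.get_animation_template(semantics)
        audio = self.to_audio(summary)
        return self.render(background, template, audio)


# Stub modules, for illustration only.
device = VideoSummaryDevice(
    extract_summary=lambda url: "apple released 3 phones",
    get_keywords=lambda summary: summary.split(),
    analyze_semantics=lambda kws: {k: "thing" for k in kws},
    form_background=lambda sem: ("background", sorted(sem)),
    get_animation_template=lambda sem: ("template", len(sem)),
    to_audio=lambda summary: ("audio", len(summary)),
    render=lambda bg, tpl, audio: (bg, tpl, audio),
)
video = device.generate("http://example.com/news")
```

Keeping each module behind a callable makes it possible to swap one stage, for example the keyword acquisition of Embodiment 7, without touching the others.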
The device provided by Embodiment 6 can generate a video summary from the text content of a target news web page and display the video summary on that web page. When browsing the corresponding news web page, the user can obtain the news information by watching the video summary. Compared with the way existing news websites present news through pictures and text, this can reduce the time the user spends reading news on the news web page.
Embodiment 7:
Embodiment 7, building on Embodiment 6, provides a device for generating a video summary from text-based web page content, in which the structure of the keyword acquisition module 22 is shown in Fig. 7 and includes:
Word segmentation unit 22a, configured to perform word segmentation on the text summary;
Part-of-speech determining unit 22b, configured to compare the words obtained from segmentation with the word templates in the preset grammar database and determine the part of speech of each word obtained from segmentation;
Keyword selection unit 22c, configured to select, according to the part-of-speech determination, the nouns and numerals from the segmented words as the keywords of the text summary.
Embodiment 8:
Embodiment 8 provides a system for generating a video summary from text-based web page content, including a user terminal 31 and a news server end 32. The user terminal 31 is connected to the news server end 32 through the Internet; the connection is shown in Fig. 8.
The news server end 32 includes the device for generating a video summary from text-based web page content provided in Embodiment 6 or 7.
The news server end 32 generates a video summary from the news web page content. When the user visits a news web page through the user terminal 31, the video summary generated from the corresponding text-based web page content can be seen on the page.
The system provided by Embodiment 8 can generate a video summary from the text content of a target news web page and display the video summary on that web page. When browsing the corresponding news web page, the user can obtain the news information by watching the video summary. Compared with the way existing news websites present news through pictures and text, this can reduce the time the user spends reading news on the news web page.
In this embodiment, the user terminal 31 may be any one of an iPhone application, an iPad application, an Android phone application, an Android tablet application, a TV set-top box application, a Windows platform software application, a Mac platform software application, an IE browser plug-in, a Chrome browser plug-in, and a Firefox browser plug-in.
The news server end 32 may be any one of a WordPress plug-in, a Drupal plug-in, a Joomla plug-in, a MediaWiki plug-in, a Discuz plug-in, a PHPWind plug-in, and a web page JavaScript script.
The devices and modules provided by the embodiments of the present invention have the same implementation principles and technical effects as the foregoing method embodiments. For brevity, where the device embodiments do not mention something, refer to the corresponding content in the foregoing method embodiments.
In the several embodiments provided herein, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The foregoing is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the protection scope of the claims.

Claims (8)

1. A method for generating a video summary from text-based web page content, characterized by comprising:
extracting a text summary of the text content of a target news web page;
parsing the text summary to obtain keywords in the text summary;
performing semantic analysis on the keywords to obtain keyword ontology semantics;
based on the keyword ontology semantics, retrieving corresponding pictures or videos from the Internet to form a background image;
based on the keyword ontology semantics, obtaining an animation template corresponding to the keywords from a pre-established grammar database;
converting the text summary into audio data;
according to preset rendering rules, synthesizing and rendering the background image, the animation template, and the audio data into a video file;
wherein extracting the text summary of the text content of the target news web page comprises:
obtaining a news web page based on a web page link;
removing additional information from the news web page, wherein the additional information includes one or more of: advertisements, pictures, videos, frames, and charts;
extracting the text content of the news web page after the additional information has been removed;
extracting key sentences from the text content to form the text summary;
wherein extracting key sentences from the text content to form the text summary comprises:
calculating, in turn, the similarity between every two sentences in the text content;
classifying the sentences in the text content according to the calculated similarities;
according to the classification result, extracting a sentence from each class of sentences and combining the extracted sentences to obtain candidate summaries;
selecting, from the candidate summaries, the candidate summary whose length differs least from a preset text summary length as the text summary of the news web page, wherein the preset text summary length is determined from the length of the video to be generated and a preset read-aloud speed for the text summary.
2. The method according to claim 1, characterized in that calculating, in turn, the similarity between every two sentences in the text content comprises:
calculating the total number of words shared by the current two sentences;
dividing that total by the average length of the current two sentences to obtain the similarity of the current two sentences;
calculating the similarity between every two sentences in the text content by the same method used to obtain the similarity of the current two sentences.
3. The method according to claim 1, characterized in that parsing the text summary to obtain the keywords in the text summary comprises:
performing word segmentation on the text summary;
comparing the words obtained from the segmentation with the word templates in the grammar database to determine the part of speech of each word obtained from the segmentation;
according to the part-of-speech determination, selecting the nouns and numerals from the segmented words as the keywords of the text summary.
4. The method according to claim 1, characterized in that performing semantic analysis on the keywords to obtain the keyword ontology semantics comprises:
searching for the keyword in the grammar database to obtain all ontology descriptions related to the keyword;
using the Web Ontology Language (OWL) to determine the keyword ontology semantics in the current context from all ontology descriptions of the keyword.
5. The method according to claim 1, characterized in that synthesizing and rendering the background image, the animation template, and the audio data into a video file according to preset rendering rules comprises:
setting a mapping relationship among the keyword in the audio data, the background image corresponding to the keyword, and the animation template corresponding to the keyword;
synthesizing and rendering the background image, the animation template, and the audio data according to the mapping relationship.
6. A device for generating a video summary from text-based web page content, applying the method according to any one of claims 1 to 5, characterized by comprising:
an extraction module, configured to extract a text summary of the text content of a target news web page;
a keyword acquisition module, configured to parse the text summary and obtain the keywords in the text summary;
a semantic analysis module, configured to perform semantic analysis on the keywords and obtain the keyword ontology semantics;
a background image forming module, configured to retrieve corresponding pictures or videos from the Internet based on the keyword ontology semantics and form a background image;
an animation template acquisition module, configured to obtain the animation template corresponding to the keywords from a preset grammar database based on the keyword ontology semantics;
an audio conversion module, configured to convert the text summary into audio data;
a video synthesis module, configured to synthesize and render the background image, the animation template, and the audio data into a video file according to preset rendering rules.
7. The device according to claim 6, characterized in that the keyword acquisition module comprises:
a word segmentation unit, configured to perform word segmentation on the text summary;
a part-of-speech determining unit, configured to compare the words obtained from the segmentation with the word templates in the grammar database and determine the part of speech of each word obtained from the segmentation;
a keyword selection unit, configured to select, according to the part-of-speech determination, the nouns and numerals from the segmented words as the keywords of the text summary.
8. A system for generating a video summary from text-based web page content, characterized by comprising: a user terminal and a news server end connected to the user terminal through the Internet;
wherein the news server end includes the device for generating a video summary from text-based web page content according to claim 6 or 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510156123.9A CN104731959B (en) 2015-04-03 2015-04-03 The method of text based web page contents generation video frequency abstract, apparatus and system

Publications (2)

Publication Number Publication Date
CN104731959A CN104731959A (en) 2015-06-24
CN104731959B true CN104731959B (en) 2017-10-17

Family

ID=53455846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510156123.9A Expired - Fee Related CN104731959B (en) 2015-04-03 2015-04-03 The method of text based web page contents generation video frequency abstract, apparatus and system

Country Status (1)

Country Link
CN (1) CN104731959B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324760A (en) * 2013-07-11 2013-09-25 中国农业大学 Method and system for automatically generating nutrition health education video through commentary file
CN103559214A (en) * 2013-10-11 2014-02-05 中国农业大学 Method and device for automatically generating video

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100306232A1 (en) * 2009-05-28 2010-12-02 Harris Corporation Multimedia system providing database of shared text comment data indexed to video source data and related methods

Also Published As

Publication number Publication date
CN104731959A (en) 2015-06-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171017
