Summary of the invention
The objective of the invention is to provide an adaptive encapsulation system and method for web page content that address the above problem.
To achieve the above objective, the present invention provides an adaptive encapsulation method for web page content, comprising:
(1) a source web page and parameter acquisition step: obtaining the source web page and the terminal capability parameters;
(2) an abstract syntax tree generation step: using the DOM (Document Object Model) tree of the source web page to generate a set of related abstract syntax trees according to the terminal capability parameters;
(3) an adaptation step: performing adaptive conversion on each content item in the abstract syntax trees;
(4) a new application addition step: classifying the content of the abstract syntax trees by topic, and adding new nodes and content to the abstract syntax trees according to the classification result;
(5) a web page encapsulation step: taking the abstract syntax tree as the basic granularity, using a display template to reorganize the abstract syntax trees and encapsulate them into a web page, and displaying the web page content on the terminal.
In the above technical solution, the step of obtaining the source web page and the terminal capability parameters further comprises:
(1-1) the terminal sends the proxy server a request containing the destination URL and product-related parameters;
(1-2) on receiving the user's request, the proxy server parses out the requested URL and the product model of the terminal device, and searches the profile document library or database for the corresponding profile document; if one is found, it parses the document to obtain the parameter information of the terminal device; if there is no profile document or database record for that product model, it requires the terminal to send its detailed parameters and product model to the proxy server again, then generates the corresponding profile document and stores it.
The terminal capability parameters comprise: terminal product model, screen size, screen resolution, color quality, terminal storage capacity, and processor parameters. The classification in step (4) uses a machine learning method to train a text classifier, and uses that classifier to classify the content of each abstract syntax tree (AST) by topic, according to category and level.
The present invention also provides an adaptive encapsulation system for web page content, the system comprising:
A source web page and parameter acquisition module, used to parse the request that the user terminal sends to the server, and to obtain the source web page and the terminal capability parameters;
An abstract syntax tree generation module, used to generate a set of related abstract syntax trees from the DOM (Document Object Model) tree of the source web page according to the terminal capability parameters;
An adaptation module, used to perform adaptive conversion on each content item in the abstract syntax trees;
A new application addition module, used to classify the content of the abstract syntax trees by topic, and to add new nodes and content to the abstract syntax trees according to the classification result;
A web page encapsulation module, used to take the abstract syntax tree as the basic granularity, use a display template to reorganize the abstract syntax trees and encapsulate them into a web page, and display the web page content on the terminal.
In the above technical solution, the new application addition module further comprises: a classifier unit, used to classify the AST content; and an application addition unit, used to add extended applications to the corresponding AST according to the classifier's classification result.
The abstract syntax tree adaptation module further comprises: a text content adaptation unit, a picture, voice and video adaptation unit, an index segmentation adaptation unit, and a new application addition unit.
The system also comprises a cache module, used to cache the ASTs corresponding to previously accessed web pages.
The present invention maintains a profile document library or a capability parameter database that stores terminal device capability parameters. The terminal capabilities include terminal product model, screen size, screen resolution, color quality, terminal storage capacity, processor parameters, and so on. When first obtained, these parameters are stored in the profile document library or the database as XML documents or data records, so that the parameters for a given terminal product model can later be obtained by parsing the profile library or querying the database. According to the terminal capability characteristics and the adaptation strategy, a set of related ASTs is generated from the DOM tree of the source web page requested by the user, and the different kinds of content in the trees, such as text, pictures, voice, and video, undergo adaptive conversion. A text classifier is trained by supervised learning and used to classify the AST content by topic, and new service content is added to the ASTs according to the classification result and the value-added service strategy. Taking the AST as the basic granularity, a default or customized display template is used to reorganize the ASTs and encapsulate them into a web page.
The advantages of the invention are as follows. The invention uses a set of ASTs to adaptively encapsulate web pages according to terminal capability characteristics. Taking the AST as the basic granularity, it applies default display strategies suited to the different characteristics of different terminals, which solves the problem of Internet web pages not displaying properly on those terminals. In addition, during AST adaptation, machine learning can be used to identify the topic category of each AST, which provides a way to add extended applications to web page content.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The present invention adaptively encapsulates web page content between various display terminals and the web server. It combines machine learning with the terminal capability characteristics, the adaptation strategy, and the value-added strategy to generate a set of related ASTs from the DOM tree of the source web page, and performs adaptive conversion on each content item in the trees, such as text, pictures, voice, and video. Taking the AST as the basic granularity, it applies different default display strategies, according to the different display characteristics of mobile phone or television terminals, to reorganize the ASTs and encapsulate them into a web page.
As shown in Figure 1, which is a schematic diagram of the composition of the adaptive web page content encapsulation system of the present invention, the system comprises:
Source web page and parameter acquisition module 1, used to obtain the source web page and the terminal capability parameters in a suitable manner.
Abstract syntax tree generation module 3, used to generate a set of related abstract syntax trees from the DOM tree of the source web page according to the terminal capability parameters.
Adaptation module 4, used to adapt the ASTs according to the different kinds of content in each region element; and
Web page encapsulation module 6, used to encapsulate the web page from the template library and the output of the AST adaptation module.
As shown in Figure 2, which is a block diagram of a specific implementation of the functional modules of the adaptive web page content encapsulation system of the present invention, the system comprises:
Source web page and terminal capability parameter acquisition module 1, used to parse the terminal's request and thereby obtain the source web page and the terminal capability parameters.
Web page parsing and filtering module 2, used to parse the obtained web page into the DOM tree of the source web page.
Abstract syntax tree set generation module 3, used to generate a set of related abstract syntax trees from the DOM tree of the source web page according to the terminal capability parameters.
Abstract syntax tree adaptation module 4, used to adapt the ASTs according to the different kinds of content in each region element.
In addition, a new application addition stage may be applied before the web page is encapsulated. This stage in turn comprises:
Classifier 8, used to classify the AST content; and new application addition module 5, used to add extended applications to the corresponding AST according to the classifier's classification result.
Web page encapsulation module 6, used to encapsulate the web page from template library module 7 and the output of the AST adaptation module, corresponding specifically to web page encapsulation module 104.
The system of the present invention also comprises a cache module, used to store the abstract syntax trees generated from some source web pages. This module speeds up user terminals' access to web pages.
As shown in Figure 3, which is a detailed structural block diagram of the abstract syntax tree (AST) adaptation module, showing the modules it further comprises. The AST adaptation strategies are described in detail as follows:
(1) Text content adaptation module
For regions where text is concentrated, or pages where text makes up a large proportion of the content, first-sentence conversion or keyword conversion is used: the first sentence or the keywords are displayed in the web page as hypertext links, presented as a list, and the remaining content is returned to the terminal only when the user clicks a link. The resulting page is tidy and can be operated quickly with the direction keys of a remote control or a mobile phone keypad.
If there is a lot of text content and it has a hierarchical structure, a strategy of first-sentence links organized into a multi-level directory can be used.
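The first-sentence conversion described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the invention: the function names, the `/block/` link scheme, and the rule that a sentence ends at the first terminal punctuation mark are all assumptions made for the example.

```python
import re
from html import escape


def first_sentence(text):
    """Return the text up to and including the first sentence terminator.

    Assumption: a sentence ends at the first '.', '!', '?' or the
    corresponding full-width CJK mark. Text with no terminator is
    returned whole.
    """
    match = re.search(r"[.!?。！？]", text)
    return text[: match.end()].strip() if match else text.strip()


def text_blocks_to_link_list(blocks, base_href="/block/"):
    """Render each text block as a list item that links to its full content.

    `base_href` is a hypothetical URL prefix; the full text of block i is
    assumed to be served at base_href + str(i) when the user clicks.
    """
    items = [
        '<li><a href="{}{}">{}</a></li>'.format(
            base_href, i, escape(first_sentence(block))
        )
        for i, block in enumerate(blocks)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

A multi-level directory for hierarchical text could presumably be built by applying the same function to each level in turn, but that is left out of this sketch.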
(2) Picture, voice and video adaptation module
Pictures are handled either by default compression to a thumbnail or by a label link. The first method, by default, compresses the original image by 25% to 75% and provides only the compressed thumbnail in the final page. The second method uses the picture's title or alt text as a hypertext link pointing to the picture, and sends the picture to the terminal device for display only after the user clicks the link. Voice and video are provided as text links using their corresponding titles.
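The two picture strategies can be illustrated with the sketch below. It rests on a loudly stated assumption: "compresses by 25% to 75%" is interpreted here as reducing the displayed width and height by that fraction, which the text does not confirm. The function name and parameters are hypothetical, and the sketch only rewrites markup; producing an actual thumbnail file would need an image library.

```python
from html import escape


def adapt_image(src, alt, width, height, mode="thumbnail", reduction=0.5):
    """Return an HTML fragment for one picture node.

    mode="thumbnail": shrink both dimensions by `reduction`, which must lie
        in the 25%-75% range given in the text (interpretation assumed).
    mode="link": replace the picture with a text link built from its alt
        text, so the picture is fetched only when the user clicks.
    """
    if mode == "link":
        label = alt or "image"  # fall back to a generic label if alt is empty
        return '<a href="{}">{}</a>'.format(escape(src), escape(label))
    if not 0.25 <= reduction <= 0.75:
        raise ValueError("reduction must be between 0.25 and 0.75")
    new_w = max(1, round(width * (1 - reduction)))
    new_h = max(1, round(height * (1 - reduction)))
    return '<img src="{}" alt="{}" width="{}" height="{}">'.format(
        escape(src), escape(alt), new_w, new_h
    )
```

Voice and video nodes could be handled by the `mode="link"` branch with the media title as the label.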
(3) Index segmentation adaptation module
When the content structure of the source page is complex, the page is divided logically according to list tags, paragraphs, tables, and so on, in their order of appearance. On this basis, the first-sentence link method is then used so that each hypertext link points to the content of a specific segmented region.
If these sub-blocks cannot all be displayed on one terminal page and can be subdivided further, the content within each sub-block is segmented and linked according to the same principle. When there are many sub-blocks, a previous/next navigation strategy is used to display links to the preceding and following sub-blocks.
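A rough sketch of segmentation with previous/next navigation follows. The splitting rules are assumptions chosen for illustration (blank lines separate paragraphs, and an oversized paragraph is split again at sentence ends); the text itself segments by list tags, paragraphs, and tables. The names `segment`, `add_navigation`, and `max_chars` are hypothetical.

```python
import re


def segment(text, max_chars):
    """Split text into sub-blocks that aim to fit on one terminal page.

    First pass: split on blank lines (paragraphs).
    Second pass: a paragraph longer than `max_chars` is split again at
    sentence boundaries, following the same principle one level down.
    """
    blocks = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if len(para) <= max_chars:
            blocks.append(para)
        else:
            sentences = re.split(r"(?<=[.!?])\s+", para)
            blocks.extend(s.strip() for s in sentences if s.strip())
    return blocks


def add_navigation(sub_blocks):
    """Attach the index of the previous and next sub-block to each one.

    None marks the absence of a neighbour (first or last sub-block).
    """
    last = len(sub_blocks) - 1
    return [
        {
            "content": block,
            "prev": i - 1 if i > 0 else None,
            "next": i + 1 if i < last else None,
        }
        for i, block in enumerate(sub_blocks)
    ]
```

The `prev` and `next` indices would then be rendered as the previous and next links on each sub-block page.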
(4) Application addition module
According to the classification result for the AST content, relevant value-added services are added to the AST. For example, if the AST content is educational, related advertisements, such as those for the college entrance examination or the postgraduate entrance examination, can be inserted into the tree.
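The addition of extended content by category might look like the sketch below. The dictionary-based AST node, the `EXTENSIONS` table, and the node type `"extension"` are all hypothetical choices for the example; the text does not specify a node format or how the value-added strategy is stored.

```python
# Hypothetical value-added strategy: topic category -> extension content.
EXTENSIONS = {
    "education": [
        "college entrance examination advertisement",
        "postgraduate entrance examination advertisement",
    ],
}


def add_applications(ast, category, extensions=EXTENSIONS):
    """Return a copy of an AST node with extension nodes appended.

    `ast` is assumed to be a dict with an optional "children" list.
    The original node is left unmodified; categories with no configured
    extensions leave the children unchanged.
    """
    new_children = [
        {"type": "extension", "content": content}
        for content in extensions.get(category, [])
    ]
    return {**ast, "children": list(ast.get("children", [])) + new_children}
```

Returning a copy rather than mutating the node keeps a cached AST free of extension content, which may or may not match how the described system treats its cache.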
As shown in Figure 4, which illustrates the adaptive web page content encapsulation method provided by the present invention on the basis of the above system, the method comprises:
Step 101: obtain the source web page and the terminal capability parameters;
Step 102: use the DOM (Document Object Model) tree of the source web page to generate a set of related abstract syntax trees according to the terminal capability parameters;
Step 103: perform adaptive conversion on each content item in the abstract syntax trees;
Step 104: classify the content of the abstract syntax trees by topic, and add new nodes and content to the abstract syntax trees according to the classification result;
Step 105: taking the abstract syntax tree as the basic granularity, use a display template to reorganize the abstract syntax trees and encapsulate them into a web page, and display the web page content on the terminal.
The generated abstract syntax trees are also cached, to speed up subsequent access by user terminals to the same source web page.
As shown in Figure 5, which shows the sub-steps further included in the step of obtaining the source web page and the terminal capability parameters, the sub-steps are as follows:
The terminal user sends a request to the proxy server; the request contains the destination URL and the capability parameters of the terminal device.
Step 201: on receiving the user's request, the proxy server parses out the requested URL and the product model of the terminal device;
Step 202: the profile document library is searched for the corresponding profile document;
Step 203: if one is found, the document is parsed to obtain the parameter information of the terminal device; if there is no profile document for that product model, the terminal is required to send back its detailed parameters and product model, and the corresponding profile document is generated and stored.
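Steps 202 and 203 can be sketched as a lookup with a fallback. The XML layout of the profile document below (element and attribute names, the sample model `TV-1000`) is an assumption made for the example; the text says only that profiles are stored as XML documents or data records. `ask_terminal` is a hypothetical callable standing in for the round trip that asks the terminal for its detailed parameters.

```python
import xml.etree.ElementTree as ET

# Hypothetical profile document; the real format is not given in the text.
SAMPLE_PROFILE = """<profile model="TV-1000">
  <screen width="1280" height="720"/>
  <color bits="24"/>
  <storage mb="512"/>
  <processor mhz="800"/>
</profile>"""


def parse_profile(xml_text):
    """Parse one profile document into a flat dict of terminal parameters."""
    root = ET.fromstring(xml_text)
    screen = root.find("screen")
    return {
        "model": root.get("model"),
        "screen_width": int(screen.get("width")),
        "screen_height": int(screen.get("height")),
        "color_bits": int(root.find("color").get("bits")),
        "storage_mb": int(root.find("storage").get("mb")),
        "processor_mhz": int(root.find("processor").get("mhz")),
    }


def get_terminal_params(model, library, ask_terminal):
    """Look up the profile for `model`; on a miss, ask the terminal and store it.

    `library` maps product model -> profile XML text (the profile library).
    `ask_terminal(model)` returns profile XML text built from the terminal's
    reply; it is called only when the library has no entry for the model.
    """
    if model not in library:
        library[model] = ask_terminal(model)  # generate and store the profile
    return parse_profile(library[model])
```

A database-backed library would replace the dict with a table lookup, with the same miss-then-store behaviour.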
After the URL requested by the user is obtained, cache module 9 is queried first. If the AST corresponding to this web page already exists, it is encapsulated and sent back to the terminal directly; otherwise, the web server is accessed to obtain the specified source web page document.
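The cache query can be sketched as a get-or-build lookup. The class name `AstCache` and the `build` callable are hypothetical. The cache is keyed by URL alone, following the text; whether a real system would also need to key on the terminal model is not stated and is left open here.

```python
class AstCache:
    """Maps a source URL to the AST set generated for that web page."""

    def __init__(self):
        self._store = {}

    def get_or_build(self, url, build):
        """Return the cached AST set for `url`.

        `build(url)` stands in for fetching the source page from the web
        server and generating its ASTs; it is called only on a cache miss.
        """
        if url not in self._store:
            self._store[url] = build(url)
        return self._store[url]
```

This sketch never evicts entries; a size or age limit would be a natural addition that the text does not discuss.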
As shown in Figure 6, which is a detailed flowchart of the adaptive web page content encapsulation method after the source web page has been obtained:
Step 301: according to the processing, storage, and display capability parameters of the terminal device given in the profile document, obtain the ASTs of the different sub-blocks, and adapt the ASTs according to the different kinds of content in each region element, such as text, pictures, voice, and video;
Step 302: at the same time, use the trained classifier to classify the AST content;
Step 303: according to the classification result, extended applications, such as advertising content, can be added to the corresponding AST;
Step 304: check, according to the information in the profile document, whether the terminal browser supports the HTML5 standard;
Step 305: if it does, encapsulate the web page according to the HTML5 standard;
Step 306: otherwise, encapsulate the web page according to the default standard; finally, use the default template or a customized template to encapsulate the adaptation result and obtain the final web page.
Steps 304, 305, and 306 constitute one embodiment of web page encapsulation.
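Steps 304 to 306 can be illustrated with the sketch below. Several points are assumptions: the profile is taken to expose a boolean `supports_html5` field, the unspecified "default standard" is represented by HTML 4.01 Transitional, and the use of `<section>` versus `<div>` wrappers is only one way to make the two branches differ. The template text is a hypothetical default, not the template of the invention.

```python
from string import Template

# Assumed stand-in for the unspecified "default standard".
HTML4_DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">'
)

# Hypothetical default display template with two placeholders.
DEFAULT_TEMPLATE = Template(
    "$doctype\n<html><head><title>Adapted page</title></head><body>\n"
    "$body\n</body></html>"
)


def encapsulate(fragments, profile, template=None):
    """Pick the markup standard from the profile, then fill the template.

    `fragments` are the HTML fragments produced from the adapted ASTs.
    `template` is an optional customized string.Template using the same
    $doctype and $body placeholders; the default template is used otherwise.
    """
    if template is None:
        template = DEFAULT_TEMPLATE
    if profile.get("supports_html5", False):   # steps 304 and 305
        doctype, wrapper = "<!DOCTYPE html>", "section"
    else:                                      # step 306
        doctype, wrapper = HTML4_DOCTYPE, "div"
    body = "\n".join(
        "<{0}>{1}</{0}>".format(wrapper, fragment) for fragment in fragments
    )
    return template.substitute(doctype=doctype, body=body)
```

Keeping the choice of standard separate from the choice of template mirrors the text, where the template is applied last regardless of which standard was selected.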
As shown in Figure 7, which is a schematic diagram of a specific embodiment of an AST, the left side is ordinary page content and the right side is the corresponding AST. Regarding the classification of AST content: according to content type, ASTs are divided into categories such as news, education, finance, sports, entertainment, technology, and lifestyle. Each category is further divided into levels; for example, education is divided into college entrance examination, senior high school entrance examination, adult education, postgraduate entrance examination preparation, and examinations for studying abroad, while lifestyle can be divided into travel, shopping, marriage, child-rearing, and so on. A text classifier is trained by supervised learning, and during AST adaptation the content is classified with this classifier in order to support extended applications.
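A supervised text classifier of the kind described can be sketched as follows. The text does not name a learning algorithm; multinomial naive Bayes with Laplace smoothing is used here purely as an assumed example, and whitespace tokenization is a simplification. The sketch is flat: the second level of the hierarchy (for example, college entrance examination under education) could be handled by training one further classifier per top-level category, which is not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> training documents
        self.vocab = set()

    def fit(self, texts, labels):
        """Train on labelled texts (supervised learning)."""
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        """Return the label with the highest log-probability, or None if untrained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in words:
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In use, the text content of each AST would be passed to `predict`, and the returned category would drive the addition of extended applications.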
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solution of the present invention that do not depart from its spirit and scope shall all fall within the scope of the claims of the present invention.