CN109902254B - Information input method and device and electronic equipment - Google Patents

Information input method and device and electronic equipment

Info

Publication number
CN109902254B
Authority
CN
China
Prior art keywords
preset
field
field information
information
target product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910081547.1A
Other languages
Chinese (zh)
Other versions
CN109902254A (en)
Inventor
余燃
李昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201910081547.1A
Publication of CN109902254A
Application granted
Publication of CN109902254B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses an information entry method, belonging to the technical field of computers, for solving the problem of low product-listing efficiency in the prior art. The information entry method disclosed in the embodiments of the application comprises: acquiring a webpage address at which a target product is displayed; acquiring, through the webpage address, candidate field information corresponding to the preset listing fields of the target product; and analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information into the preset listing fields of the target product for display. With the disclosed method, the listing fields can be filled in automatically once the webpage address of the target product is entered, so that filling in the listing fields of a product is semi-automated and its efficiency is effectively improved.

Description

Information input method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to an information entry method and apparatus, an electronic device, and a storage medium.
Background
Products on an e-commerce platform are continuously replaced and updated, so the product-listing process (putting a product on the platform) has to be executed repeatedly. In the prior art, listing a commodity on an e-commerce platform has to be performed manually by a merchant or a platform maintainer. When information is entered through the listing system of the e-commerce platform, the information for the relevant fields of the product has to be entered by hand. For example, for each travel product displayed on an e-commerce platform page, the product name, departure place, destination, time, product profile, and so on may need to be entered manually. Because products differ individually, automated and batch information entry cannot currently be achieved. With manual listing, the product fields to be entered are numerous and the amount of information is large, so entry takes a long time and listing efficiency is low.
Therefore, the information entry methods in the prior art at least suffer from low listing efficiency.
Disclosure of Invention
The application provides an information entry method that helps solve the problem of low product-listing efficiency.
In order to solve the above problem, in a first aspect, an embodiment of the present application provides an information entry method, including:
acquiring a webpage address for displaying a target product;
acquiring, through the webpage address, candidate field information corresponding to the preset listing fields of the target product;
and analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information into the preset listing fields of the target product for display.
In a second aspect, an embodiment of the present application provides an information entry apparatus, including:
the webpage address acquisition module is used for acquiring a webpage address for displaying a target product;
the candidate field information acquisition module is used for acquiring, through the webpage address, candidate field information corresponding to the preset listing fields of the target product;
and the field information determining module is used for analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information into the preset listing fields of the target product for display.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a memory, a processor, and a computer program that is stored on the memory and is executable on the processor, and when the processor executes the computer program, the information entry method according to the embodiment of the present application is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the information entry method disclosed in the present application.
The information entry method disclosed in the embodiments of the application acquires a webpage address at which a target product is displayed; acquires, through the webpage address, candidate field information corresponding to the preset listing fields of the target product; and analyzes the candidate field information, matches it with a preset attribute, extracts field information according to the matching result, and sets the field information into the preset listing fields of the target product for display, thereby solving the problem of low product-listing efficiency. With the disclosed method, the listing fields can be filled in automatically once the webpage address of the target product is entered, so that filling in the listing fields of a product is semi-automated and its efficiency is effectively improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of an information entry method in a first embodiment of the present application;
Fig. 2 is a schematic diagram of content displayed on a web page corresponding to an acquired web page address;
Fig. 3 is a schematic structural diagram of an information entry device according to a second embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Embodiment 1
As shown in Fig. 1, the information entry method disclosed in this embodiment includes steps 110 to 130.
Step 110: acquire a webpage address at which the target product is displayed.
In specific implementation, the information for each preset field of the target product that must be filled in through the listing system is set quickly and automatically from the information about the target product displayed on other webpages. The webpage address at which the target product is displayed may be the address of the product's page on the website of the product's provider, or the address of a page on the public network that displays the target product.
In specific implementation, the listing system may provide a webpage address input box in which a user enters the webpage address at which the target product is displayed. The listing system then obtains the webpage address of the page displaying the target product in response to a triggering operation by the user.
In some embodiments of the present application, the webpage address may be entered in the form of a URL. In other embodiments of the present application, the webpage address may also be entered in the form of a browser file. The specific technical means for obtaining the webpage address at which the target product is displayed is not limited in the present application.
Step 120: acquire, through the webpage address, candidate field information corresponding to the preset listing fields of the target product.
In some embodiments of the present application, acquiring, through the webpage address, the candidate field information corresponding to the preset listing fields of the target product includes: determining the page content displayed on the webpage corresponding to the webpage address; and determining, by web crawler technology, the page content that matches the preset format features and/or preset keywords of each preset listing field of the target product, and taking that page content as the candidate field information of the corresponding listing field.
After the webpage address at which the target product is displayed is obtained, the content displayed at that address can be obtained by accessing it. For example, the webpage shown in Fig. 2 displays the following information about a travel product: title 210, product description 220, hotel 230, and product profile 240.
To obtain the listing field information required by the listing system from the content displayed on the webpage, it is first necessary to determine which displayed content corresponds to each listing field required by the listing system.
In specific implementation, the preset listing fields are determined according to specific business requirements. For example, for travel products, the listing fields include: title, departure place, destination, transportation mode, product profile, hotel, and the like; for catering products, the listing fields include: title, cuisine, taste, restaurant name, address, and the like; for hotel products, the listing fields include: title, star rating, hotel name, address, room type, product profile, and the like. In some embodiments of the present application, the layout and content fields of the content displayed on the webpage corresponding to the webpage address are analyzed first, so as to determine the format features and/or keywords of the content corresponding to each listing field required by the listing system. For example, when the listing system lists the travel product "Beijing to Harbin snow village tour", the listing fields to be filled in include: title, departure place, destination, transportation mode, hotel, price, and product profile. First, the format features and/or keywords corresponding to the fields title, departure place, destination, transportation mode, hotel, product profile, and so on are determined from the page content of the product "Beijing to Harbin snow village tour" on the provider's website.
In specific implementation, the format features may be the display position of the corresponding content, font relationships, and the like. The keywords are determined from the webpage content corresponding to the acquired webpage address.
The following takes the webpage content shown in Fig. 2 as an example to describe how the format features and/or keywords corresponding to the preset listing fields are determined. From the webpage shown in Fig. 2, the listing field "title" corresponds to title 210, which is displayed as the first line at the top of the page, so the format feature corresponding to this listing field is determined to be: first line at the top of the page. Likewise, the listing field "hotel" corresponds to hotel 230 in Fig. 2, and hotel 230 is indexed by fixed keywords such as "hotel" and "accommodation", so the keywords corresponding to the listing field "hotel" are determined to include: hotel, accommodation. As another example, the listing field "product profile" corresponds to product profile 240 in Fig. 2, and product profile 240 is indexed by a fixed keyword such as "detailed description", so the keywords corresponding to the listing field "product profile" are determined to include: detailed description.
When listing field information is filled in from information acquired through a webpage, an important problem must be solved: some listing fields, such as departure place, destination, and transportation mode, have no fixed index keyword in the webpage, that is, no explicit display position. For such listing fields, the corresponding webpage content must be found by analyzing the content displayed on the webpage, and listing fields that appear in the same webpage content are assigned the same format features and/or keywords. For example, analysis of the webpage shown in Fig. 2 shows that the listing fields "departure place", "destination", and "transportation mode" have no explicit counterpart in Fig. 2, but their values can be obtained from product description 220. These listing fields are therefore determined to correspond to product description 220 in Fig. 2, and the format feature of the corresponding webpage content is determined to be: to the right of the picture below title 210.
In specific implementation, each supplier's website contains many products, and its product display pages share the same format features and the same keywords for the same field. Therefore, for one supplier, a single set of format features and/or keywords for the preset listing fields needs to be determined only once, from the correspondence between the content displayed on one page and the preset listing fields of the listing system.
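The per-supplier set of format features and keywords described above could be held in a small lookup table. The following is only a minimal sketch under stated assumptions: the names `FieldRule`, `SUPPLIER_RULES`, `rules_for`, and the host `supplier.example` are hypothetical and do not come from the application, and the format features are stored as plain descriptive strings rather than machine-usable selectors.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class FieldRule:
    """How one listing field is located on a supplier's product pages (hypothetical)."""
    keywords: tuple = ()       # fixed index keywords, if the field has any
    format_feature: str = ""   # description of the display position, if any


# Fields without an explicit display position share one format feature,
# because they are all read from the same page content (product description 220).
LOCATION_RULE = FieldRule(format_feature="right of the picture below the title")

# One rule set per supplier host; the host name is an illustrative assumption.
SUPPLIER_RULES = {
    "supplier.example": {
        "title": FieldRule(format_feature="first line at the top of the page"),
        "hotel": FieldRule(keywords=("hotel", "accommodation")),
        "product profile": FieldRule(keywords=("detailed description",)),
        "departure place": LOCATION_RULE,
        "destination": LOCATION_RULE,
        "transportation mode": LOCATION_RULE,
    },
}


def rules_for(url: str) -> dict:
    """Return the rule set of the supplier that hosts the given product page."""
    host = urlparse(url).hostname or ""
    return SUPPLIER_RULES.get(host, {})
```

Because the rules are keyed by host, every product page of the same supplier reuses one rule set, which mirrors the statement that the rules need to be determined only once per supplier.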
Then, the page content that matches the preset format features and/or preset keywords of each preset listing field of the target product is determined by web crawler technology and taken as the candidate field information of the corresponding listing field. For example, the content of title 210 in the webpage shown in Fig. 2 is obtained by web crawler technology as the candidate field information of the listing field "title"; the content of product description 220 is obtained as the candidate field information of the listing fields "departure place", "destination", and "transportation mode"; the content of hotel 230 is obtained as the candidate field information of the listing field "hotel"; and the content of product profile 240 is obtained as the candidate field information of the listing field "product profile".
For the specific implementation of using web crawler technology to determine the page content corresponding to different keywords, or the page content meeting a given format requirement, reference may be made to the prior art; it is not described in detail in the embodiments of the present application.
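As a rough illustration of keyword-based and position-based extraction, the sketch below parses an already-downloaded page with Python's standard `html.parser` module. It is not the crawler of the application: the page markup, the function name `extract_candidates`, and the rule "the text block after a keyword label is the field content" are all assumptions made for this example, and fetching the page itself is omitted.

```python
from html.parser import HTMLParser


class TextBlocks(HTMLParser):
    """Collects the non-empty text blocks of a page in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.blocks.append(text)


def extract_candidates(html, keyword_rules):
    """Return candidate field information keyed by listing field name.

    keyword_rules maps a field name to a set of lower-case index keywords.
    """
    parser = TextBlocks()
    parser.feed(html)
    parser.close()
    blocks = parser.blocks
    candidates = {}
    # Format feature "first line at the top of the page" gives the title.
    if blocks:
        candidates["title"] = blocks[0]
    # Keyword rule (assumed): the block following a keyword label is the content.
    for field_name, keywords in keyword_rules.items():
        for i, block in enumerate(blocks[:-1]):
            if block.lower() in keywords:
                candidates[field_name] = blocks[i + 1]
                break
    return candidates


# Hypothetical page markup, used only to exercise the sketch.
PAGE = """
<h1>Beijing to Harbin snow village tour</h1>
<div><span>Accommodation</span><p>Hot spring hotel</p></div>
<div><span>Detailed description</span><p>Eight days, seven nights.</p></div>
"""

RULES = {
    "hotel": {"hotel", "accommodation"},
    "product profile": {"detailed description"},
}
result = extract_candidates(PAGE, RULES)
```

A production crawler would also need to skip script and style content and to handle position-based features other than "first block on the page"; both are left out here for brevity.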
Step 130: analyze the candidate field information, match it with a preset attribute, extract field information according to the matching result, and set the field information into the preset listing fields of the target product for display.
In specific implementation, a listing field can be matched to different attributes according to the characteristics of its corresponding webpage content. For example, if a listing field has an explicit display position in the webpage corresponding to the acquired webpage address, it is matched with the first attribute, as with the listing field "hotel" in the preceding step. If a listing field has no explicit display position in that webpage, it is matched with the second attribute, as with the listing fields "departure place", "destination", and "transportation mode" in the preceding steps. If a listing field has an explicit display position in that webpage but the displayed content is non-standard, that is, its form of presentation differs from platform to platform, it is matched with the third attribute, as with the listing field "product profile" in the preceding step.
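The three-way attribute matching described above can be expressed as a small decision function. This is a sketch only; the enum `Attr`, the function `classify`, and its two boolean inputs are hypothetical names chosen for illustration, and how those two properties are determined for a given field is outside the sketch.

```python
from enum import Enum


class Attr(Enum):
    FIRST = 1   # explicit display position, standard content: use as-is
    SECOND = 2  # no explicit display position: segment and match lexicons
    THIRD = 3   # explicit display position, non-standard content: edit sentences


def classify(has_explicit_position: bool, is_standard_content: bool = True) -> Attr:
    """Map the characteristics of a field's page content to an attribute."""
    if not has_explicit_position:
        return Attr.SECOND
    return Attr.FIRST if is_standard_content else Attr.THIRD


# The example fields from the text, with their assumed characteristics.
FIELD_ATTR = {
    "hotel": classify(True),
    "departure place": classify(False),
    "destination": classify(False),
    "transportation mode": classify(False),
    "product profile": classify(True, is_standard_content=False),
}
```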
In specific implementation, after the candidate field information of each listing field of the target product is acquired, the candidate field information is further analyzed to determine accurate field information. Because listing fields differ in type, some field information can be captured directly from the webpage of the target product through the preset keywords, while other field information must first be captured from the webpage through the preset keywords and/or preset format features and then extracted from the captured information. In the embodiments of the present application, listing fields with different attributes are processed differently.
In some embodiments of the present application, analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information into the preset listing fields of the target product for display includes: for a preset listing field matched with the first attribute, setting the candidate field information corresponding to the preset listing field as the field information of that listing field of the target product for display. For example, for the listing field "hotel", the candidate field information corresponding to "hotel" acquired in the preceding step is used as the field information of the listing field "hotel" of the target product. The obtained field information is then automatically set into and stored in the corresponding field of the target product in the listing system, for the listing system to preview or to display to a terminal client.
In some embodiments of the present application, analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information into the preset listing fields of the target product for display includes: for a preset listing field matched with the second attribute, performing word segmentation on the candidate field information corresponding to the listing field to determine a plurality of words arranged in sequence; identifying each of the words based on the lexicon corresponding to the preset listing field, and determining the words that match the preset listing field; and, according to the structure information of the preset listing field matched with the webpage address, extracting from the matched words the field information of the preset listing field of the target product and setting it for display.
When listing field information is filled in from information acquired through a webpage, some listing fields have no fixed index keyword in the webpage, that is, no explicit display position, so their candidate field information is acquired through format features in the preceding step and must be processed further. The field information acquisition method is illustrated below, taking as an example the following candidate field information obtained in the preceding steps for the listing fields "departure place", "destination", and "transportation mode": "Beijing flies to Harbin + China Snow Village + Yabuli + Changbai Mountain + Jingpo Lake panorama, 8-day 7-night group tour, filming location of 'Where Are We Going, Dad?': Snow Village + Changbai Mountain hot spring hotel".
First, word segmentation is performed on the obtained candidate field information by a word segmentation algorithm to determine the words it contains. In this example, the following words are obtained: Beijing, flies to, Harbin, China Snow Village, Yabuli, Changbai Mountain, Jingpo Lake, panorama, 8 days, 7 nights, Where Are We Going Dad, filming location, Snow Village, Changbai Mountain hot spring hotel.
Further, each word is identified based on the lexicons corresponding to the preset listing fields, and the words matching the preset listing fields are determined. In specific implementation, the lexicons corresponding to the listing fields must first be established, for example a place name lexicon and a scenic spot lexicon corresponding to the listing fields "departure place" and "destination", and a transportation mode lexicon corresponding to the listing field "transportation mode". The place name lexicon includes place names such as Beijing, Shanghai, Harbin, and Yabuli; the scenic spot lexicon includes scenic spot names such as Snow Village, Jingpo Lake, and Changbai Mountain; the transportation mode lexicon includes transportation words such as plane, flies to, cruise ship, and train. In different application scenarios the preset listing fields differ, and so do the lexicons to be set up and the words they contain; these are not enumerated in the embodiments of the present application.
Each word obtained by word segmentation is matched against the words in the preset lexicons to determine its word attribute. For example, the word "Beijing" obtained by word segmentation is matched against the place name lexicon, the scenic spot lexicon, and the transportation mode lexicon in turn; since "Beijing" is found in the place name lexicon, its word attribute is determined to be a place name. The word attribute of every word obtained by word segmentation can be determined in the same way. After lexicon matching, the successfully matched words are, in order: Beijing, flies to, Harbin, China Snow Village, Yabuli, Changbai Mountain, Jingpo Lake, Snow Village. Among them, the words matched with the listing fields "departure place" and "destination" are: Beijing, Harbin, China Snow Village, Yabuli, Changbai Mountain, Jingpo Lake, Snow Village; the word matched with the listing field "transportation mode" is: flies to.
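The lexicon lookup described above can be sketched as follows. The word list is an English rendering of the segmentation result in the example and is assumed to be already segmented; the lexicon contents follow the text, except that "China Snow Village" is added to the scenic spot lexicon as an assumption so that the matched list agrees with the example. The names `LEXICONS` and `tag_words` are hypothetical.

```python
# Lexicons keyed by word attribute (contents are illustrative).
LEXICONS = {
    "place name": {"Beijing", "Shanghai", "Harbin", "Yabuli"},
    "scenic spot": {"Snow Village", "China Snow Village", "Jingpo Lake",
                    "Changbai Mountain"},
    "transportation mode": {"plane", "flies to", "cruise ship", "train"},
}


def tag_words(words):
    """Return (word, attribute) pairs for the words found in some lexicon.

    Words that match no lexicon are dropped; the original order is kept.
    """
    tagged = []
    for word in words:
        for attribute, lexicon in LEXICONS.items():
            if word in lexicon:
                tagged.append((word, attribute))
                break
    return tagged


WORDS = [
    "Beijing", "flies to", "Harbin", "China Snow Village", "Yabuli",
    "Changbai Mountain", "Jingpo Lake", "panorama", "8 days", "7 nights",
    "Where Are We Going Dad", "filming location", "Snow Village",
    "Changbai Mountain hot spring hotel",
]
TAGGED = tag_words(WORDS)
```

Keeping the words in their original order matters, because the later structure check depends on the sequence in which the matched words appear.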
In some embodiments of the present application, before the step of identifying the words based on the lexicon corresponding to the preset listing field and determining the words matched with the preset listing field, the method further includes: denoising the determined sequence of words according to part of speech and sentence structure. To reduce the computation required for word matching, in specific implementation the words obtained by segmentation may be denoised based on part of speech, removing words that cannot match a listing field of the second attribute. For example, the segmentation results "panorama", "8 days", "7 nights", and "Where Are We Going Dad" are neither nouns nor verbs and cannot be values of the listing fields "departure place", "destination", or "transportation mode", so they can be removed directly without being matched against the lexicons.
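A part-of-speech filter of this kind might look like the sketch below. The part-of-speech tags are assumed to come from some tagger that is not shown, and the tag values given to the sample words are hypothetical; they simply follow the statement in the text that these words are neither nouns nor verbs. The names `KEEP_POS` and `denoise` are also hypothetical.

```python
# Only nouns and verbs can be values of the second-attribute listing fields.
KEEP_POS = {"noun", "verb"}


def denoise(tagged_words):
    """Drop words whose part of speech cannot match a listing field."""
    return [word for word, pos in tagged_words if pos in KEEP_POS]


# (word, part of speech) pairs; the tags are assumed tagger output.
SEGMENTED = [
    ("Beijing", "noun"),
    ("flies to", "verb"),
    ("Harbin", "noun"),
    ("panorama", "other"),
    ("8 days", "quantity"),
    ("7 nights", "quantity"),
    ("Where Are We Going Dad", "other"),
]
KEPT = denoise(SEGMENTED)
```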
Finally, since the candidate field information obtained through format features may contain several candidate values for one listing field, the field information of the preset listing field of the target product must further be extracted from the matched words, according to the structure information of the preset listing field matched with the webpage address, and set for display. Specifically, for the sequentially arranged words matched with the listing fields "departure place", "destination", and "transportation mode", it is further determined whether their order satisfies a preset distribution rule, that is, whether it conforms to the structure information of the preset listing fields matched with the webpage address. For example, from the product description in Fig. 2 it can be determined that the structure information of the preset listing fields matched with the webpage address is "departure place + transportation mode + destination". It follows that only the words "Beijing", "flies to", and "Harbin" conform to this structure information. In this embodiment, therefore, the field information of the listing field "departure place" is finally determined to be "Beijing", that of the listing field "transportation mode" to be "flies to", and that of the listing field "destination" to be "Harbin". The extracted field information is then automatically set into and stored in the corresponding fields of the target product in the listing system, for the listing system to preview or to display to a terminal client.
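The structure check "departure place + transportation mode + destination" can be sketched as a sliding window over the matched words. This is one possible reading, not the definitive rule of the application: it assumes that the three words must be adjacent in the matched sequence and that the first conforming window wins. The names `STRUCTURE` and `extract_by_structure` are hypothetical.

```python
# Each slot of the structure names a listing field and the word attributes
# that may fill it (place names and scenic spots both count as locations).
LOCATION = {"place name", "scenic spot"}
STRUCTURE = [
    ("departure place", LOCATION),
    ("transportation mode", {"transportation mode"}),
    ("destination", LOCATION),
]


def extract_by_structure(tagged, structure=STRUCTURE):
    """Return {field: word} for the first run of words matching the structure."""
    size = len(structure)
    for start in range(len(tagged) - size + 1):
        window = tagged[start:start + size]
        if all(attr in allowed
               for (_, attr), (_, allowed) in zip(window, structure)):
            return {field: word
                    for (word, _), (field, _) in zip(window, structure)}
    return {}


# Matched words from the example, in their original order.
MATCHED = [
    ("Beijing", "place name"),
    ("flies to", "transportation mode"),
    ("Harbin", "place name"),
    ("China Snow Village", "scenic spot"),
    ("Yabuli", "place name"),
    ("Changbai Mountain", "scenic spot"),
    ("Jingpo Lake", "scenic spot"),
    ("Snow Village", "scenic spot"),
]
FIELDS = extract_by_structure(MATCHED)
```

A different distribution rule can be tried by passing another `structure` list, which corresponds to setting the rule according to business requirements.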
In specific implementation, the preset distribution rule is set according to business requirements. By setting different distribution rules, the present application can capture more field information, reducing the manual entry workload and improving listing efficiency.
In some embodiments of the present application, analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information into the preset listing fields of the target product for display includes: for a preset listing field matched with the third attribute, splitting the candidate field information of the listing field into sentences to determine at least one sentence; and editing the determined sentence or sentences according to a preset strategy to determine the field information of the listing field of the target product and set it for display. For example, the listing field "product profile" in the preceding step is not a standard field, and each supplier may describe the same product in its own customized way. So that the automatically set field improves the e-commerce platform's efficiency in processing product information, improves the consistency of information across products, and makes the product information displayed by the platform better fit people's reading habits and their logic for understanding the scene, the present application preferably further edits the candidate field information after acquiring the candidate field information of a listing field matched with the third attribute.
In specific implementation, the candidate field information may first be divided into sentences according to punctuation marks, for example using periods, exclamation marks, or ellipses as sentence delimiters. Then, through semantic recognition, sentences containing a city, sentences containing a scenic spot, and sentences containing time nouns such as "morning", "noon", and "evening" are identified. The sentences containing cities are then arranged together, the sentences containing scenic spots are arranged together, and the sentences containing time nouns are arranged in the chronological order "morning", "noon", "evening".
Take as an example the candidate field information: "The Shuangfeng scenic area of 'China Snow Village', with its unique climate, is an important part of the Dahailin scenic area; during the snow season, the white snow is shaped by the wind around the objects it covers and takes on countless forms. At night, Snow Village is even more beautiful under the glow of red lanterns. Snow Village is more than 90 kilometers from Changting Town in Mudanjiang City, covers an area of 500 hectares, and lies at an elevation of more than 1200 meters throughout." The field information obtained after editing according to the foregoing method is: "Snow Village is more than 90 kilometers from Changting Town in Mudanjiang City, covers an area of 500 hectares, and lies at an elevation of more than 1200 meters throughout. The Shuangfeng scenic area of 'China Snow Village', with its unique climate, is an important part of the Dahailin scenic area; during the snow season, the white snow is shaped by the wind around the objects it covers and takes on countless forms. At night, Snow Village is even more beautiful under the glow of red lanterns."
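The sentence splitting and reordering just described can be sketched as a stable sort. Several points are assumptions made for this example rather than statements of the application: keyword lookup stands in for semantic recognition; city sentences are placed first, scenic spot sentences second, unclassified sentences third, and time sentences last, which reproduces the order of the edited example; "night" is added to the time nouns; and the sample text is a shortened English paraphrase. The names `split_sentences`, `sort_key`, and `edit_profile` are hypothetical.

```python
import re

CITIES = ("Mudanjiang",)             # illustrative city lexicon
SCENIC_SPOTS = ("Snow Village",)     # illustrative scenic spot lexicon
TIME_ORDER = ("morning", "noon", "evening", "night")


def split_sentences(text):
    """Split after a period, exclamation mark, or ellipsis followed by a space."""
    return [s for s in re.split(r"(?<=[.!…])\s+", text.strip()) if s]


def sort_key(indexed_sentence):
    """Group rank, then time rank, then original position (keeps the sort stable)."""
    index, sentence = indexed_sentence
    lowered = sentence.lower()
    if any(city.lower() in lowered for city in CITIES):
        return (0, 0, index)
    for rank, word in enumerate(TIME_ORDER):
        if word in lowered:
            return (3, rank, index)
    if any(spot.lower() in lowered for spot in SCENIC_SPOTS):
        return (1, 0, index)
    return (2, 0, index)


def edit_profile(text):
    """Reorder the sentences of a product profile by the preset strategy."""
    ordered = sorted(enumerate(split_sentences(text)), key=sort_key)
    return " ".join(sentence for _, sentence in ordered)


TEXT = ("Snow Village is part of a forest scenic area. "
        "At night, Snow Village glows under red lanterns. "
        "Snow Village is more than 90 kilometers from Mudanjiang.")
EDITED = edit_profile(TEXT)
```

The splitter assumes space-separated English sentences; Chinese text would need the full-width delimiters and no whitespace requirement.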
Alternatively, sentences containing city names or geographic locations are placed before the other sentences. Take the following candidate field information as an example: "Stone Forest Scenic Area: a World Natural Heritage site, a World Geopark, a national AAAAA-level tourist attraction, a national key scenic area, a national geopark, a national civilized scenic tourist area, and one of China's ten scenic areas with the best resource protection. The Stone Forest Scenic Area, also called the Yunnan Stone Forest, is located in Shilin Yi Autonomous County, Kunming City, Yunnan Province, covers 350 square kilometers, has striking scenery and a strong atmosphere, and is the hometown of Ashima." The field information obtained after editing is: "Stone Forest Scenic Area: The Stone Forest Scenic Area, also called the Yunnan Stone Forest, is located in Shilin Yi Autonomous County, Kunming City, Yunnan Province, covers 350 square kilometers, has striking scenery and a strong atmosphere, and is the hometown of Ashima. A World Natural Heritage site, a World Geopark, a national AAAAA-level tourist attraction, a national key scenic area, a national geopark, a national civilized scenic tourist area, and one of China's ten scenic areas with the best resource protection." The edited field information is then automatically set to the corresponding field of the target product in the ordering system for storage, and the ordering system previews the field or displays it to a terminal client.
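The sentence division and reordering strategy described above can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the cue-word tuples, the function names, and the inclusion of "night" as a time noun are all assumptions, and plain keyword matching stands in for the semantic recognition the text refers to.

```python
import re

# Hypothetical cue words; the source relies on semantic recognition instead.
LOCATION_WORDS = ("city", "province", "county", "town")
SIGHT_WORDS = ("scenic spot", "scenic area", "park")
TIME_ORDER = ("morning", "noon", "evening", "night")  # "night" is an added assumption


def split_sentences(text):
    """Split after a period, exclamation mark, or ellipsis followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!…])\s+", text.strip()) if s]


def has_word(sentence, words):
    """True if any cue word occurs as a whole word or phrase, ignoring case."""
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def rank(sentence):
    """Sort key: location sentences, then sight sentences, then time order, then the rest."""
    if has_word(sentence, LOCATION_WORDS):
        return (0, 0)
    if has_word(sentence, SIGHT_WORDS):
        return (1, 0)
    for position, noun in enumerate(TIME_ORDER):
        if has_word(sentence, (noun,)):
            return (2, position)
    return (3, 0)


def edit_introduction(text):
    """Reorder sentences by rank; sorted() is stable, so ties keep source order."""
    return " ".join(sorted(split_sentences(text), key=rank))
```

Putting location sentences first mirrors both worked examples above; a different preset strategy would only need a different `rank` function.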
The information entry method disclosed in the embodiments of the present application obtains the web page address that displays a target product; acquires, through the web page address, candidate field information corresponding to the preset upper single-word segments of the target product; and analyzes the candidate field information, matches it with preset attributes, extracts the field information according to the matching result, and sets the field information to the preset upper single-word segments of the target product for display. This solves the problem of low ordering efficiency for products. With this method, filling in the order fields, that is, the upper single-word segments, is completed automatically once the web page address of the target product is entered, so that filling in the order fields of a product is semi-automated and its efficiency is effectively improved.
In the prior art, an ordering system has numerous upper single-word segments, and the product information provided by different product suppliers comes in widely varying formats, so ordering cannot be automated and the field information of each field can only be filled in manually. Ordering a large number of target products therefore consumes a great deal of time, and ordering efficiency is extremely low. In the present application, the product content displayed on web pages is analyzed and associated with the upper single-word segments, and the keywords and/or format characteristics of the web page content associated with each upper single-word segment are determined. The field information of the associated upper single-word segments contained in a web page entered by the user is then acquired automatically by combining web page parsing with data processing, which greatly improves ordering efficiency. Where web page content and an upper single-word segment have no clear one-to-one association, the web page content corresponding to the segment is first determined based on a fuzzy association rule; the distribution rule of the fuzzily associated content is then determined by analyzing the product content displayed on the web page; and finally the data obtained under the fuzzy rule is matched and extracted based on that distribution rule.
Example two
As shown in fig. 3, an information entry apparatus disclosed in this embodiment includes:
a web page address obtaining module 310, configured to obtain a web page address for displaying a target product;
a candidate field information obtaining module 320, configured to obtain, through the web page address, candidate field information corresponding to a preset upper single-word segment of the target product;
a field information determining module 330, configured to analyze the candidate field information, match it with a preset attribute, extract field information according to the matching result, and set the field information to the preset upper single-word segment of the target product for display.
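The division into modules 310, 320, and 330 can be pictured with a small sketch. The class name, method names, and the three injected callables below are hypothetical and chosen only for illustration; the page fetcher is passed in so that the example runs without network access.

```python
from typing import Callable, Dict


class InformationEntryDevice:
    """Illustrative wiring of modules 310, 320 and 330 (all names are assumptions)."""

    def __init__(
        self,
        fetch_page: Callable[[str], str],
        find_candidates: Callable[[str], Dict[str, str]],
        determine_field: Callable[[str, str], str],
    ) -> None:
        self.fetch_page = fetch_page
        self.find_candidates = find_candidates
        self.determine_field = determine_field

    def obtain_address(self, user_input: str) -> str:
        """Module 310: accept the web page address of the target product."""
        url = user_input.strip()
        if not url.startswith(("http://", "https://")):
            raise ValueError("expected an http(s) web page address")
        return url

    def obtain_candidates(self, url: str) -> Dict[str, str]:
        """Module 320: fetch the page and collect candidate text per field."""
        return self.find_candidates(self.fetch_page(url))

    def determine_fields(self, candidates: Dict[str, str]) -> Dict[str, str]:
        """Module 330: turn each candidate into the final field information."""
        return {name: self.determine_field(name, text) for name, text in candidates.items()}

    def run(self, user_input: str) -> Dict[str, str]:
        """Chain the three modules from address to displayable field values."""
        return self.determine_fields(self.obtain_candidates(self.obtain_address(user_input)))
```

Injecting the three callables keeps each module replaceable, which suits a system whose extraction rules differ from one supplier site to another.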
When acquiring, through the web page address, the candidate field information corresponding to the preset upper single-word segments of the target product, the content that corresponds to each upper single-word segment required by the ordering system must first be identified within the content displayed on the web page. In specific implementation, the preset upper single-word segments are determined according to specific service requirements. The layout and content fields of the displayed web page are then analyzed to determine the format characteristics and/or keywords of the content corresponding to each required upper single-word segment.
In specific implementation, the format characteristics may be display positions, font relationships, and the like of corresponding content. And the keywords are determined according to the webpage content corresponding to the acquired webpage address.
Filling in upper single-word segments from information acquired through a web page raises an important problem: some upper single-word segments, such as "departure place", "destination", and "traffic mode", have no fixed keyword in the web page to index them, that is, no explicit display position. The web page content corresponding to such segments has to be found by analyzing the content displayed on the web page. Upper single-word segments that appear within the same piece of web page content are assigned the same format characteristics and/or keywords, so that the whole block can be extracted as their candidate field information.
In specific implementation, each upper single-word segment can be matched to one of several attributes according to the characteristics of its corresponding web page content. If an upper single-word segment has an explicit display position in the web page corresponding to the acquired web page address, it matches the first attribute, as with the upper single-word segment "hotel" in the first embodiment. If it has no explicit display position in that web page, it matches the second attribute, as with the upper single-word segments "departure place", "destination", and "traffic mode" in the first embodiment. If it has an explicit display position but the displayed content is non-standard, that is, it is presented differently on different platforms, it matches the third attribute, as with the upper single-word segment "product introduction" in the first embodiment.
In specific implementation, after the candidate field information of each upper single-word segment of the target product is acquired, it is analyzed further to determine the accurate field information. Because upper single-word segments differ in type, some field information can be captured directly from the web page of the target product through preset keywords, while other field information must first be captured through preset keywords and/or preset format characteristics and then extracted from the captured text. In the embodiments of the present application, upper single-word segments with different attributes are processed accordingly.
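The three attributes and the processing each one triggers can be summarized in a short sketch. The enum, the two boolean inputs, and the `PROCESSING` table are hypothetical names introduced here; only the decision rule itself comes from the description above.

```python
from enum import Enum


class FieldAttribute(Enum):
    FIRST = 1   # explicit display position, standard content
    SECOND = 2  # no explicit display position
    THIRD = 3   # explicit display position, non-standard content


def classify(has_explicit_position, is_standard_content):
    """Map the two page characteristics named in the text to an attribute."""
    if not has_explicit_position:
        return FieldAttribute.SECOND
    if is_standard_content:
        return FieldAttribute.FIRST
    return FieldAttribute.THIRD


# Processing step the text associates with each attribute.
PROCESSING = {
    FieldAttribute.FIRST: "use the candidate field information directly",
    FieldAttribute.SECOND: "segment into words and match against a word library",
    FieldAttribute.THIRD: "split into sentences and edit by a preset strategy",
}
```

For instance, "hotel" would be classified with `classify(True, True)`, while "product introduction" would be classified with `classify(True, False)`.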
Optionally, the candidate field information obtaining module 320 is further configured to:
determining page content displayed in a webpage corresponding to the webpage address;
and determining, through a web crawler technology, the page content that matches the preset format characteristics and/or preset keywords of each preset upper single-word segment of the target product, and taking that page content as the candidate field information of the corresponding upper single-word segment.
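As a rough sketch of the keyword-matching step, the following uses Python's standard `html.parser` to collect block-level text and pick, for each field, the first block containing one of its preset keywords. The class, the `BLOCK_TAGS` set, and the keyword table are assumptions made for illustration; fetching the page and matching on format characteristics such as position or font are left out.

```python
from html.parser import HTMLParser


class BlockTextParser(HTMLParser):
    """Collects the visible text of each block-level element as one string."""

    BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}  # assumed block tags

    def __init__(self):
        super().__init__()
        self.blocks = []
        self.current_text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        self.current_text.append(data)

    def flush_block(self):
        """Close the current block, normalizing whitespace and skipping empty text."""
        text = " ".join("".join(self.current_text).split())
        if text:
            self.blocks.append(text)
        self.current_text = []


def find_candidates(html, field_keywords):
    """Return {field: first block containing any of that field's keywords}."""
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    parser.flush_block()
    candidates = {}
    for field, keywords in field_keywords.items():
        for block in parser.blocks:
            if any(keyword in block for keyword in keywords):
                candidates[field] = block
                break
    return candidates
```

Fields whose keywords never appear are simply absent from the result, which leaves them to be filled in manually.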
Optionally, the field information determining module 330 is further configured to:
for a preset upper single-word segment matched with the first attribute, setting the corresponding candidate field information to that preset upper single-word segment of the target product for display.
Optionally, the field information determining module 330 is further configured to:
for a preset upper single-word segment matched with the second attribute, performing word segmentation on the candidate field information corresponding to that segment, and determining a plurality of words arranged in sequence;
identifying each of the plurality of words based on a word library corresponding to the preset upper single-word segment, and determining the words matched with the preset upper single-word segment;
and, according to the structural information of the preset upper single-word segment matched with the web page address, extracting from the matched words the field information of the preset upper single-word segment of the target product and setting it for display.
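The second-attribute processing can be illustrated with a small sketch. Everything concrete here is an assumption: the word libraries, the English regular-expression tokenizer (a Chinese page would need a real word segmenter), and the structural rule that the first matched city is the departure place and the second is the destination.

```python
import re

# Hypothetical word libraries keyed by the kind of word they recognize.
WORD_LIBRARIES = {
    "city": {"Beijing", "Harbin", "Kunming"},
    "transport": {"train", "plane", "bus"},
}

# Hypothetical structural information for one site:
# field name -> (word library, index among the matched words in page order).
STRUCTURE = {
    "departure place": ("city", 0),
    "destination": ("city", 1),
    "traffic mode": ("transport", 0),
}


def segment(text):
    """Very rough word segmentation: runs of ASCII letters, in order."""
    return re.findall(r"[A-Za-z]+", text)


def extract_fields(candidate, structure=STRUCTURE):
    """Match segmented words against the libraries, then apply the structure rule."""
    words = segment(candidate)
    matched = {
        name: [word for word in words if word in library]
        for name, library in WORD_LIBRARIES.items()
    }
    fields = {}
    for field, (library_name, index) in structure.items():
        hits = matched[library_name]
        if index < len(hits):
            fields[field] = hits[index]
    return fields
```

Because the structural rule is tied to the web page address, a site that lists the destination first would only need a different `STRUCTURE` table.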
Optionally, the field information determining module 330 is further configured to:
for a preset upper single-word segment matched with the third attribute, performing sentence division on the candidate field information of that segment to determine at least one sentence;
and editing the determined sentence or sentences according to a preset strategy to determine the field information, and setting it to the upper single-word segment of the target product for display.
Editing the acquired field information lets the automatically set field improve the e-commerce platform's efficiency in processing product information, improves the consistency of information across products, and makes the product information displayed by the platform better match people's reading habits and the logic by which a scene is understood.
The information entry device disclosed in this embodiment is used to implement the information entry method, and each module of the information entry device is used to implement the corresponding step of the information entry method, which is not described herein again.
The information entry device disclosed in the embodiments of the present application obtains the web page address that displays a target product; acquires, through the web page address, candidate field information corresponding to the preset upper single-word segments of the target product; and analyzes the candidate field information, matches it with preset attributes, extracts the field information according to the matching result, and sets the field information to the preset upper single-word segments of the target product for display. This solves the problem of low ordering efficiency for products. With this device, filling in the order fields, that is, the upper single-word segments, is completed automatically once the web page address of the target product is entered, so that filling in the order fields of a product is semi-automated and ordering efficiency is effectively improved.
In the prior art, an ordering system has numerous upper single-word segments, and the product information provided by different product suppliers comes in widely varying formats, so ordering cannot be automated and the field information of each field can only be filled in manually. Ordering a large number of target products therefore consumes a great deal of time, and ordering efficiency is extremely low. In the present application, the product content displayed on web pages is analyzed and associated with the upper single-word segments, and the keywords and/or format characteristics of the web page content associated with each upper single-word segment are determined. The field information of the associated upper single-word segments contained in a web page entered by the user is then acquired automatically by combining web page parsing with data processing, which greatly improves ordering efficiency. Where web page content and an upper single-word segment have no clear one-to-one association, the web page content corresponding to the segment is first determined based on a fuzzy association rule; the distribution rule of the fuzzily associated content is then determined by analyzing the product content displayed on the web page; and finally the data obtained under the fuzzy rule is matched and extracted based on that distribution rule.
Correspondingly, the present application also discloses an electronic device, as shown in fig. 4, the electronic device includes a memory 410, a processor 420, and a computer program 4201 stored on the memory and executable on the processor 420, and when the processor 420 executes the computer program, the information entry method according to the first embodiment of the present application is implemented. The electronic device can be a PC, a mobile terminal, a personal digital assistant, a tablet computer and the like.
The application also discloses a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the information entry method according to the first embodiment of the application.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The information entry method and the information entry device provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the embodiments is intended only to help in understanding the method and its core idea. A person skilled in the art may, following the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Claims (8)

1. An information entry method, comprising:
acquiring a webpage address for displaying a target product;
acquiring, through the webpage address, candidate field information corresponding to a preset upper single-word segment of the target product;
analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information to the preset upper single-word segment of the target product for display;
wherein the step of analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information to the preset upper single-word segment of the target product for display comprises:
for a preset upper single-word segment matched with the second attribute, performing word segmentation on the candidate field information corresponding to that segment, and determining a plurality of words arranged in sequence;
identifying each of the plurality of words based on a word library corresponding to the preset upper single-word segment, and determining the words matched with the preset upper single-word segment;
according to structural information of the preset upper single-word segment matched with the webpage address, extracting, from the words matched with the preset upper single-word segment, the field information of the preset upper single-word segment of the target product and setting it for display;
or,
the step of analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information to the preset upper single-word segment of the target product for display comprises:
for a preset upper single-word segment matched with the third attribute, performing sentence division on the candidate field information of that segment to determine at least one sentence;
and editing the determined at least one sentence according to a preset strategy to determine the field information, and setting it to the upper single-word segment of the target product for display.
2. The method of claim 1, wherein the step of acquiring, through the webpage address, candidate field information corresponding to a preset upper single-word segment of the target product comprises:
determining page content displayed in a webpage corresponding to the webpage address;
and determining, through a web crawler technology, the page content that matches the preset format characteristics and/or preset keywords of each preset upper single-word segment of the target product, and taking that page content as the candidate field information of the corresponding upper single-word segment.
3. The method according to claim 1 or 2, wherein the step of analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information to the preset upper single-word segment of the target product for display comprises:
for a preset upper single-word segment matched with the first attribute, setting the corresponding candidate field information to that preset upper single-word segment of the target product for display.
4. An information entry device, comprising:
the webpage address acquisition module is used for acquiring a webpage address for displaying a target product;
the candidate field information acquisition module is used for acquiring, through the webpage address, candidate field information corresponding to a preset upper single-word segment of the target product;
the field information determining module is used for analyzing the candidate field information, matching it with a preset attribute, extracting field information according to the matching result, and setting the field information to the preset upper single-word segment of the target product for display;
the field information determining module is further configured to:
for a preset upper single-word segment matched with the second attribute, performing word segmentation on the candidate field information corresponding to that segment, and determining a plurality of words arranged in sequence;
identifying each of the plurality of words based on a word library corresponding to the preset upper single-word segment, and determining the words matched with the preset upper single-word segment;
according to structural information of the preset upper single-word segment matched with the webpage address, extracting, from the words matched with the preset upper single-word segment, the field information of the preset upper single-word segment of the target product and setting it for display;
or, the field information determining module is further configured to:
for a preset upper single-word segment matched with the third attribute, performing sentence division on the candidate field information of that segment to determine at least one sentence;
and editing the determined at least one sentence according to a preset strategy to determine the field information, and setting it to the upper single-word segment of the target product for display.
5. The apparatus of claim 4, wherein the candidate field information acquisition module is further configured to:
determining page content displayed in a webpage corresponding to the webpage address;
and determining, through a web crawler technology, the page content that matches the preset format characteristics and/or preset keywords of each preset upper single-word segment of the target product, and taking that page content as the candidate field information of the corresponding upper single-word segment.
6. The apparatus of claim 4 or 5, wherein the field information determining module is further configured to:
for a preset upper single-word segment matched with the first attribute, setting the corresponding candidate field information to that preset upper single-word segment of the target product for display.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the information entry method of any one of claims 1 to 3 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, carries out the steps of the information entry method of any one of claims 1 to 3.
CN201910081547.1A 2019-01-28 2019-01-28 Information input method and device and electronic equipment Active CN109902254B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910081547.1A CN109902254B (en) 2019-01-28 2019-01-28 Information input method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910081547.1A CN109902254B (en) 2019-01-28 2019-01-28 Information input method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN109902254A CN109902254A (en) 2019-06-18
CN109902254B true CN109902254B (en) 2021-09-24

Family

ID=66944373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910081547.1A Active CN109902254B (en) 2019-01-28 2019-01-28 Information input method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109902254B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101639905A (en) * 2009-08-19 2010-02-03 熙健 Method and device for automatically inputting product Information
JP4980518B2 (en) * 2001-02-27 2012-07-18 エイディシーテクノロジー株式会社 Information system
CN105989042A (en) * 2015-02-04 2016-10-05 阿里巴巴集团控股有限公司 Information input method and device thereof
CN107357937A (en) * 2017-08-22 2017-11-17 腾讯科技(深圳)有限公司 Management of webpage end creation method and device
CN107943862A (en) * 2017-11-09 2018-04-20 天脉聚源(北京)传媒科技有限公司 A kind of method and device of intelligence generation reptile

Also Published As

Publication number Publication date
CN109902254A (en) 2019-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant