CN112330463A - Method, device, equipment and medium for detecting legal qualification of financing website - Google Patents
- Publication number
- CN112330463A CN202011364312.2A
- Authority
- CN
- China
- Prior art keywords
- website
- target
- legal
- financing
- qualification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/06—Asset management; Financial planning or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- Economics (AREA)
- General Physics & Mathematics (AREA)
- Development Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Game Theory and Decision Science (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Operations Research (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Technology Law (AREA)
- Educational Administration (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The application discloses a method, a device, equipment and a medium for detecting the legal qualification of a financing website. The method comprises the following steps: acquiring a target website and identifying whether the target website is a known legal qualification website; if not, determining the website type and the website keywords of the target website; matching the website keywords with a preset regular expression to obtain a matching parameter; and judging, based on the matching parameter and the website type, whether the target website is a financing website with legal qualification. Matching the obtained website keywords against the preset regular expression reflects whether the target website publicly promotes financial products; the matching parameter and the website type together then determine whether the target website is legally qualified to promote financial products publicly, which improves the ability to detect the legal qualification of financing websites.
Description
Technical Field
The invention relates to the field of website detection, and in particular to a method, a device, equipment and a medium for detecting the legal qualification of a financing website.
Background
At present, the Internet is full of investment and financing websites and financial products, and people can buy or invest in financial products online. However, some websites sell financial products without the legal qualification to do so, and financial products may be sold illegally on such websites. Private funds are one example: a private fund is an investment fund raised from qualified investors, whose investment targets include stocks, equity, bonds, futures, options, fund shares and other targets agreed in the investment contract. A private fund may raise money only from qualified investors and must not be publicly recommended, promoted or advertised. Nevertheless, some lawbreakers use private funds for illegal activities and promote them publicly, and some private fund institutions cooperate with banks, insurers and other institutions to distribute private funds in order to broaden their fund-raising channels. Individual lawbreakers exploit investors' trust in banks and insurance institutions, or imitate tactics such as insurance marketing and pyramid selling, to sell private funds to unqualified investors, so that investors take on greater investment risk without realizing that the product is a private fund. How to detect whether a financing website has legal qualification, and how to discover unqualified websites in time so as to protect people's property, is therefore a problem of wide concern.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device and a medium for detecting the legal qualification of a financing website. The specific scheme is as follows:
In a first aspect, the application discloses a legal qualification detection method for a financing website, which comprises the following steps:
acquiring a target website and identifying whether the target website is a known legal qualification website;
if not, determining the website type and the website keywords of the target website;
matching the website keywords with a preset regular expression to obtain a matching parameter;
and judging, based on the matching parameter and the website type, whether the target website is a financing website with legal qualification.
Optionally, the obtaining a target website and identifying whether the target website is a known legal qualification website includes:
acquiring a financing investment website through a crawler technology to obtain the target website;
and checking the website title and the description sentence of the target website to determine whether the target website is a known legal qualification website.
Optionally, the determining the website keyword of the target website includes:
calculating TF-IDF values of words in the text information of the target website by using a TF-IDF algorithm, and screening out the words of which the TF-IDF values are larger than a preset threshold value;
carrying out weighted calculation on the words based on the preset weight of the target keyword so as to determine the website keyword; wherein the target keyword comprises any one or more of income, product, investment, financing, finance, endorsement, guarantee, duration, stability, member, warranty, non-warranty, high return and risk.
Optionally, before calculating the TF-IDF value of a word in the text information of the target website by using the TF-IDF algorithm, the method further includes:
acquiring hypertext markup languages of a primary page and a secondary page of the target website through an HTML (hypertext markup language) parser; the hypertext markup language comprises text information and pictures;
and extracting text information contained in the picture in the hypertext markup language by an optical character recognition technology to obtain the text information contained in the target website.
Optionally, the rule character string of the preset regular expression is the target keyword.
Optionally, the known legal qualification websites include bank websites, securities websites, public fund websites, futures websites and insurance websites; the website types include small loan websites, scientific and technical finance websites, wealth company websites, private fund websites, trust websites and other types of websites.
Optionally, the judging whether the target website is a financing website with legal qualification based on the matching parameter and the website type includes:
if the risk level corresponding to the website type is high risk and the matching parameter is greater than or equal to a first preset threshold, judging that the target website is a financing website without legal qualification;
if the risk level corresponding to the website type is medium risk and the matching parameter is greater than or equal to a second preset threshold, judging that the target website is a financing website without legal qualification;
if the risk level corresponding to the website type is low risk and the matching parameter is greater than or equal to a third preset threshold, judging that the target website is a financing website without legal qualification;
wherein the risk level of wealth company websites and private fund websites is high risk; the risk level of small loan websites and scientific and technical finance websites is medium risk; and the risk level of trust websites and other types of websites is low risk.
In a second aspect, the present application discloses a legal qualification detection apparatus for a financing website, comprising:
the legal qualification website identification module is used for acquiring a target website and identifying whether the target website is a known legal qualification website;
the website keyword determining module is used for determining the website type and the website keywords of the target website if the identification result of the legal qualification website identification module is negative;
the matching module is used for matching the website keywords with a preset regular expression to obtain matching parameters;
and the legal qualification judging module is used for judging whether the target website is a financing website with legal qualification or not based on the matching parameters and the website type.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the legal qualification detection method of the financing website described above.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, realizes the legal qualification detection method of the financing website described above.
The application provides a legal qualification detection method for a financing website, which comprises the following steps: acquiring a target website and identifying whether the target website is a known legal qualification website; if not, determining the website type and the website keywords of the target website; matching the website keywords with a preset regular expression to obtain a matching parameter; and judging, based on the matching parameter and the website type, whether the target website is a financing website with legal qualification. Matching the obtained website keywords against the preset regular expression reflects whether the target website publicly promotes financial products; the matching parameter and the website type together then determine whether the target website is legally qualified to promote financial products publicly, which improves the ability to detect the legal qualification of financing websites.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a legal qualification testing method for a financing website provided by the present application;
FIG. 2 is a flowchart of a specific legal qualification testing method for a financing website provided by the present application;
FIG. 3 is a schematic structural view of a legal qualification testing device for a financing website provided by the present application;
FIG. 4 is a block diagram of an electronic device provided by the present application.
Detailed Description
The embodiment of the application discloses a legal qualification detection method for a financing website. As shown in FIG. 1, the method can comprise the following steps:
Step S11: acquiring a target website and identifying whether the target website is a known legal qualification website.
In this embodiment, a target website is first obtained, and it is identified whether the target website is a known legal qualification website; the known legal qualification websites include bank websites, securities websites, public fund websites, futures websites and insurance websites, and there may be one or more target websites. Specifically, known legal qualification websites can be identified by comparing the target website against a whitelist.
Step S12: if not, determining the website type and the website keywords of the target website.
In this embodiment, if the target website is not a known legal qualification website, the website type and the website keywords of the target website are determined; the website types include small loan websites, scientific and technical finance websites, wealth company websites, private fund websites, trust websites and other types of websites. It can be understood that after the target websites are obtained, the known legal qualification websites among them are filtered out, and the website types and website keywords of the remaining target websites are then determined. Different types of websites carry different degrees of risk, and the website keywords reflect whether the target website promotes financial products.
Step S13: matching the website keywords with a preset regular expression to obtain a matching parameter.
In this embodiment, after the website keywords are determined, they are matched with a preset regular expression to obtain a matching parameter, where the rule character strings corresponding to the preset regular expression include, but are not limited to, income, product, investment, financing, finance, endorsement, guarantee, term, stability, member, warranty, non-warranty, high return and risk. Specifically, the website keywords are matched with the rule character strings, and the matching parameter is expressed as:
$$Y = \sum_{i=1}^{n} F(i)$$
where $F(i)$ is the matching result of the $i$-th rule character string against the website keywords and $n$ is the number of rule character strings; $F(i)$ is 1 if a corresponding website keyword is matched and 0 otherwise, which finally gives the matching parameter $Y$.
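The counting rule described above (1 for each rule character string that matches a website keyword, 0 otherwise, summed over all rule strings) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `RULE_STRINGS` and `matching_parameter` are hypothetical, the English rule strings stand in for whatever strings a deployment would actually use, and the choice of whole-keyword, case-insensitive matching is an assumption.

```python
import re

# Rule character strings listed in the description (English stand-ins).
RULE_STRINGS = [
    "income", "product", "investment", "financing", "finance",
    "endorsement", "guarantee", "term", "stability", "member",
    "warranty", "non-warranty", "high return", "risk",
]


def matching_parameter(website_keywords, rule_strings=RULE_STRINGS):
    """Return Y: the number of rule strings that match at least one keyword."""
    # One compiled pattern per rule string; re.escape keeps "-" and spaces literal.
    patterns = [re.compile(re.escape(rule), re.IGNORECASE) for rule in rule_strings]
    # F(i) is 1 when pattern i matches a whole keyword, 0 otherwise (assumption:
    # whole-keyword matching, so "warranty" does not also count "non-warranty").
    return sum(
        1 for pattern in patterns
        if any(pattern.fullmatch(keyword) for keyword in website_keywords)
    )
```

With 14 rule strings the result lies between 0 and 14, which is the range the preset thresholds in the judging step are compared against.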
Step S14: judging, based on the matching parameter and the website type, whether the target website is a financing website with legal qualification.
In this embodiment, after the matching parameter is obtained, whether the target website is a financing website with legal qualification is determined by combining the matching parameter and the website type. It can be understood that, because the rule character strings are words commonly used to promote financial products, the matching parameter indicates whether the target website promotes financial products, and combining it with the website type, which reflects the website's risk, determines whether the target website is a financing website with legal qualification.
In this embodiment, the judging whether the target website is a financing website with legal qualification based on the matching parameter and the website type may include: if the risk level corresponding to the website type is high risk and the matching parameter is greater than or equal to a first preset threshold, judging that the target website is a financing website without legal qualification; if the risk level corresponding to the website type is medium risk and the matching parameter is greater than or equal to a second preset threshold, judging that the target website is a financing website without legal qualification; if the risk level corresponding to the website type is low risk and the matching parameter is greater than or equal to a third preset threshold, judging that the target website is a financing website without legal qualification. The risk level of wealth company websites and private fund websites is high risk; the risk level of small loan websites and scientific and technical finance websites is medium risk; the risk level of trust websites and other types of websites is low risk. It can be understood that whether the target website is a financing website with legal qualification is determined according to the risk level corresponding to the website type and the value of the matching parameter; the first preset threshold may be 6, the second preset threshold may be 8, and the third preset threshold may be 10.
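The threshold rule of this step can be written as two lookup tables and one comparison. The sketch below uses the example thresholds given in this embodiment (6, 8 and 10); the dictionary keys and the function name `lacks_legal_qualification` are hypothetical identifiers chosen for illustration.

```python
# Risk level assigned to each website type, as stated in this embodiment.
RISK_LEVEL_BY_TYPE = {
    "wealth_company": "high",
    "private_fund": "high",
    "small_loan": "medium",
    "tech_finance": "medium",
    "trust": "low",
    "other": "low",
}

# Example thresholds from the description: first, second and third preset threshold.
THRESHOLD_BY_RISK = {"high": 6, "medium": 8, "low": 10}


def lacks_legal_qualification(website_type, matching_parameter):
    """Return True when the site is judged to be a financing website without legal qualification."""
    risk_level = RISK_LEVEL_BY_TYPE[website_type]
    return matching_parameter >= THRESHOLD_BY_RISK[risk_level]
```

The higher the risk level of the website type, the fewer promotional rule strings need to match before the site is flagged.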
As can be seen from the above, in this embodiment a target website is obtained and it is identified whether the target website is a known legal qualification website; if not, the website type and the website keywords of the target website are determined; the website keywords are then matched with a preset regular expression to obtain a matching parameter; finally, whether the target website is a financing website with legal qualification is judged based on the matching parameter and the website type. Matching the obtained website keywords against the preset regular expression reflects whether the target website publicly promotes financial products, and the matching parameter and the website type together determine whether the target website is legally qualified to promote financial products publicly. In this way it can be judged quickly whether a website on the Internet publicly promotes financial products, which provides an important basis for judging whether a private-fund type financial enterprise is involved in illegal behavior such as publicly promoting financial products, and thus helps protect the property of the public.
The embodiment of the application discloses a specific legal qualification detection method for a financing website. As shown in FIG. 2, the method can comprise the following steps:
Step S21: acquiring a financing investment website through a crawler technology to obtain a target website; and checking the website title and the description sentence of the target website to determine whether the target website is a known legal qualification website.
In this embodiment, a financing investment website is obtained as the target website through web crawler technology, and whether the target website is a known legal qualification website is then determined by checking whether the website title (title) and description sentence (description) fields of the target website contain a corresponding type identification field. The known legal qualification websites include bank websites, securities websites, public fund websites, futures websites and insurance websites; the type identification fields of a bank website can be bank, commercial bank, cooperative bank and credit agency; the type identification fields of a securities website can be securities and securities company; the type identification fields of a public fund website can be fund and fund company; the type identification fields of a futures website can be futures and futures company; the type identification fields of an insurance website can be insurance, life insurance and property insurance.
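A minimal Python sketch of the title and description check follows. It assumes the title and description have already been fetched by the crawler, uses plain case-insensitive substring matching (an assumption; the description does not say how the fields are compared), and the names `KNOWN_QUALIFIED_FIELDS` and `known_qualified_category` are hypothetical.

```python
# Type identification fields per known legal qualification category,
# taken from the description (English stand-ins for the real strings).
KNOWN_QUALIFIED_FIELDS = {
    "bank": ["bank", "commercial bank", "cooperative bank", "credit agency"],
    "securities": ["securities", "securities company"],
    "public_fund": ["fund", "fund company"],
    "futures": ["futures", "futures company"],
    "insurance": ["insurance", "life insurance", "property insurance"],
}


def known_qualified_category(title, description):
    """Return the first category whose identification field appears, else None."""
    text = f"{title} {description}".lower()
    for category, fields in KNOWN_QUALIFIED_FIELDS.items():
        if any(field in text for field in fields):
            return category
    return None
```

A site for which this returns `None` is not filtered out and proceeds to the type and keyword determination of step S22. Plain substring matching is deliberately simple here; for example, "fund" would also match a private fund site, so a real system would likely need stricter rules.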
Step S22: if the target website is not a known legal qualification website, determining the website type of the target website, and calculating TF-IDF values of words in text information of the target website by using a TF-IDF algorithm.
In this embodiment, if it is determined that the target website is not a known legal qualification website, the website type of the target website is determined, where the website types include small loan websites, scientific and technical finance websites, wealth company websites, private fund websites, trust websites and other types of websites. Then the TF-IDF values of the words in the text information of the target website are calculated with the TF-IDF algorithm, where the TF-IDF value is the product of the TF (Term Frequency) value and the IDF (Inverse Document Frequency) value. The TF value is expressed as:
$$\mathrm{TF}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{i,k}}$$
where $n_{i,j}$ is the number of times word $j$ appears in document $i$, and $\sum_{k} n_{i,k}$ is the total number of words contained in document $i$.
The IDF value is expressed as:
$$\mathrm{IDF}_{j} = \log \frac{|D|}{|\{i : t_j \in d_i\}|}$$
where $|D|$ is the total number of documents in the corpus and $|\{i : t_j \in d_i\}|$ is the total number of documents in which word $j$ appears. It can be understood that a larger TF-IDF value means the word appears frequently in this document but rarely in other documents, so it can serve as a characteristic word of this document.
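As an illustration of the TF and IDF definitions above, the following sketch computes TF-IDF over a small corpus of tokenized documents and then keeps the words above a threshold. It assumes the text has already been split into words, uses the natural logarithm (the description does not state a base), and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {word: TF-IDF} dict per document."""
    n_docs = len(documents)
    # Number of documents in which each word appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)  # total number of words in this document
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def words_above_threshold(doc_scores, threshold):
    """Keep only the words whose TF-IDF value is larger than the threshold."""
    return {word: score for word, score in doc_scores.items() if score > threshold}
```

A word that occurs in every document of the corpus gets an IDF of zero and is therefore dropped by any positive threshold, which matches the intuition that such a word does not characterize a particular website.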
In this embodiment, before calculating the TF-IDF values of the words in the text information of the target website with the TF-IDF algorithm, the method may further include: acquiring the hypertext markup language of a primary page and a secondary page of the target website through an HTML (hypertext markup language) parser, where the hypertext markup language comprises text information and pictures; and extracting the text information contained in the pictures in the hypertext markup language by optical character recognition to obtain the text information contained in the target website. Specifically, the HTML of the primary page and the secondary page of the target website can be acquired through jsoup, and Optical Character Recognition (OCR) is applied to the pictures in the HTML to obtain the text information in the pictures, so as to obtain the text information contained in the target website. Extracting the effective content of the web page and applying OCR to the picture content makes the acquired text information more complete and the analysis result more accurate.
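The text-gathering step can be sketched as below. The description names a Java-side HTML parser; this sketch swaps in Python's standard-library `html.parser` as a stand-in, and it does not perform OCR itself: `ocr_image` is a hypothetical callable, supplied by the caller, that takes an image source and returns the recognized text. The class and function names are likewise assumptions.

```python
from html.parser import HTMLParser


class PageTextExtractor(HTMLParser):
    """Collects visible text and <img> sources from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_sources = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_site_text(html_pages, ocr_image):
    """Combine page text with OCR text for the primary and secondary pages."""
    parts = []
    for html in html_pages:
        parser = PageTextExtractor()
        parser.feed(html)
        parser.close()
        parts.extend(parser.text_parts)
        # ocr_image is a caller-supplied (hypothetical) OCR function.
        parts.extend(ocr_image(src) for src in parser.image_sources)
    return " ".join(part for part in parts if part)
```

Passing the OCR step in as a function keeps the sketch independent of any particular OCR engine; the combined string is what would then be tokenized for the TF-IDF calculation.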
Step S23: screening out the words whose TF-IDF values are larger than a preset threshold, and weighting the words based on the preset weights of the target keywords to determine the website keywords.
In this embodiment, after the TF-IDF values of the words in the text information of the target website are calculated, the words whose TF-IDF values are larger than a preset threshold are screened out, and the words are then weighted using the preset weights of the target keywords. It can be understood that when financial products are promoted, some words do not occur with the highest frequency but can still serve as keywords. Some target keywords are therefore preset according to the promotion habits of financial products and each is given a corresponding weight, so that the weighting raises the weight of the target keywords among the words and more of the needed website keywords are obtained. The target keywords include, but are not limited to, income, product, investment, financing, finance, endorsement, guarantee, term, stability, member, warranty, non-warranty, high return and risk. The weights of financing, warranty, non-warranty, finance, investment and high return may be 0.8; the weights of endorsement, warranty, stability and investment may be 0.4; the weights of term and member may be 0.2; and the weights of the remaining words may be 0.1.
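One way to read the weighting step is sketched below. The description gives example weights but not the exact weighting formula or how the final keywords are chosen, so two things here are assumptions: the weighted score is taken as the TF-IDF value multiplied by the word's weight, and the website keywords are the `top_k` words by weighted score. The names `TARGET_KEYWORD_WEIGHTS`, `DEFAULT_WEIGHT` and `select_website_keywords` are hypothetical.

```python
# Example weights from the description. The text lists "warranty" and
# "investment" in both the 0.8 and the 0.4 group; this sketch keeps 0.8
# for both, which is an assumption.
TARGET_KEYWORD_WEIGHTS = {
    "financing": 0.8, "warranty": 0.8, "non-warranty": 0.8,
    "finance": 0.8, "investment": 0.8, "high return": 0.8,
    "endorsement": 0.4, "stability": 0.4,
    "term": 0.2, "member": 0.2,
}
DEFAULT_WEIGHT = 0.1  # weight of the remaining words


def select_website_keywords(tfidf_scores, top_k=10):
    """Rank screened words by TF-IDF x weight and return the top_k as keywords."""
    weighted = {
        word: score * TARGET_KEYWORD_WEIGHTS.get(word, DEFAULT_WEIGHT)
        for word, score in tfidf_scores.items()
    }
    # Sort by descending weighted score; ties are broken alphabetically.
    ranked = sorted(weighted, key=lambda word: (-weighted[word], word))
    return ranked[:top_k]
```

Under this reading, a promotional word such as "financing" with a modest TF-IDF value can outrank a frequent but irrelevant word, which is the effect the embodiment describes.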
Step S24: matching the website keywords with a preset regular expression to obtain a matching parameter.
In this embodiment, after the website keywords are determined, they are matched with a preset regular expression to obtain a matching parameter; the rule character strings of the preset regular expression are the target keywords.
Step S25: judging, based on the matching parameter and the website type, whether the target website is a financing website with legal qualification.
For the specific processes of step S21 and step S25, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
As can be seen from the above, in this embodiment a financing investment website is acquired as the target website through crawler technology, and known legal qualification websites are then removed by checking the website title and description sentence of the target website, which reduces the amount of computation in the subsequent steps. The TF-IDF values of the words in the text information of the target website are calculated with the TF-IDF algorithm, the words whose TF-IDF values are larger than a preset threshold are screened out, and the words are weighted based on the preset weights of the target keywords to determine the website keywords. The weighting raises the weight of the target keywords among the words, so that more of the needed website keywords are obtained. Because the website keywords reflect whether the website promotes financial products, the accuracy of judging whether a financing website promotes financial products is improved. Whether the target website is legally qualified to promote financial products publicly is then determined from the matching parameter and the website type, which improves the ability to detect the legal qualification of financing websites.
Correspondingly, the embodiment of the present application further discloses a device for detecting legal qualification of a financing website, as shown in fig. 3, the device includes:
a legal qualification website identification module 11, configured to acquire a target website and identify whether the target website is a known legal qualification website;
a website keyword determining module 12, configured to determine a website type and a website keyword of the target website if the recognition result of the legal qualification website recognition module 11 is negative;
the matching module 13 is configured to match the website keyword with a preset regular expression to obtain a matching parameter;
and a legal qualification judging module 14, configured to judge whether the target website is a financing website with legal qualification based on the matching parameters and the website type.
As can be seen from the above, in this embodiment a target website is obtained and it is identified whether the target website is a known legal qualification website; if not, the website type and the website keywords of the target website are determined; the website keywords are then matched with a preset regular expression to obtain a matching parameter; finally, whether the target website is a financing website with legal qualification is judged based on the matching parameter and the website type. Matching the obtained website keywords against the preset regular expression reflects whether the target website publicly promotes financial products, and the matching parameter and the website type together determine whether the target website is legally qualified to promote financial products publicly. In this way it can be judged quickly whether a website on the Internet publicly promotes financial products, which provides an important basis for judging whether a private-fund type financial enterprise is involved in illegal behavior such as publicly promoting financial products, and thus helps protect the property of the public.
In some embodiments, the legally qualified website identifying module 11 may specifically include:
acquiring a financing investment website through a crawler technology to obtain the target website;
and checking the website title and the description sentence of the target website to determine whether the target website is a known legal qualification website.
In some embodiments, the website keyword determination module 12 may specifically include:
calculating TF-IDF values of words in the text information of the target website by using a TF-IDF algorithm, and screening out the words of which the TF-IDF values are larger than a preset threshold value;
carrying out weighted calculation on the words based on the preset weight of the target keyword so as to determine the website keyword; wherein the target keyword comprises any one or more of income, product, investment, financing, finance, endorsement, guarantee, duration, stability, member, warranty, non-warranty, high return and risk.
In some embodiments, the legal qualification determining module 14 may specifically include:
if the risk level corresponding to the website type is high risk and the matching parameter is greater than or equal to a first preset threshold, judging that the target website is a financing website without legal qualification;
if the risk level corresponding to the website type is medium risk and the matching parameter is greater than or equal to a second preset threshold, judging that the target website is a financing website without legal qualification;
if the risk level corresponding to the website type is low risk and the matching parameter is greater than or equal to a third preset threshold, judging that the target website is a financing website without legal qualification;
wherein the risk level of wealth company websites and private fund websites is high risk; the risk level of small loan websites and scientific and technical finance websites is medium risk; and the risk level of trust websites and other types of websites is low risk.
Further, the embodiment of the present application also discloses an electronic device, which is shown in fig. 4, and the content in the drawing cannot be considered as any limitation to the application scope.
FIG. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the legal qualification detection method for a financing website disclosed in any one of the foregoing embodiments.
In this embodiment, the power supply 23 is configured to provide a working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and a communication protocol followed by the communication interface is any communication protocol applicable to the technical solution of the present application, and is not specifically limited herein; the input/output interface 25 is configured to obtain external input data or output data to the outside, and a specific interface type thereof may be selected according to specific application requirements, which is not specifically limited herein.
In addition, the memory 22 serves as a carrier for storing resources and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system 221, a computer program 222, and data 223 including a target website, and the storage may be transient or permanent.
The operating system 221 is used for managing and controlling each hardware device and the computer program 222 on the electronic device 20, so that the processor 21 can operate on and process the mass data 223 in the memory 22; it may be Windows Server, Netware, Unix, Linux, or the like. In addition to the computer program that performs the legal qualification detection method for a financing website executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs for performing other specific tasks. The data 223 may include a target website acquired by the electronic device 20.
Further, an embodiment of the present application also discloses a computer storage medium in which computer-executable instructions are stored; when the computer-executable instructions are loaded and executed by a processor, the steps of the legal qualification detection method for a financing website disclosed in any of the foregoing embodiments are implemented.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; for relevant details, refer to the description of the method.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method, device, equipment and medium for detecting the legal qualification of a financing website provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is only intended to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
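The matching step of the method, in which the website keywords are matched against a preset regular expression whose pattern is built from the target keywords, might be sketched as follows. The definition of the matching parameter as a count of matched keywords is an assumption, as are the names `TARGET_KEYWORDS`, `PATTERN` and `matching_parameter`; the keyword list is a subset chosen for illustration.

```python
import re

# A subset of the target keywords named in the text, used for illustration.
TARGET_KEYWORDS = [
    "income", "product", "investment", "financing",
    "guarantee", "high return", "risk",
]

# One alternation pattern over the escaped keywords. Longer keywords come
# first so that multi-word terms are tried before shorter alternatives.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(TARGET_KEYWORDS, key=len, reverse=True))
)


def matching_parameter(website_keywords):
    """Count the website keywords that fully match the preset pattern.
    Using a count as the matching parameter is an assumed definition."""
    return sum(1 for kw in website_keywords if PATTERN.fullmatch(kw))
```

The resulting value is what would then be compared with the preset threshold for the website's risk level.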
Claims (10)
1. A legal qualification detection method for a financing website is characterized by comprising the following steps:
acquiring a target website and identifying whether the target website is a known legal qualification website;
if not, determining the website type and the website keywords of the target website;
matching the website keywords with a preset regular expression to obtain matching parameters;
and judging whether the target website is a financing website with legal qualification based on the matching parameters and the website type.
2. The method for detecting legal qualification of a financing website according to claim 1, wherein the acquiring a target website and identifying whether the target website is a known legal qualification website comprises:
acquiring a financing investment website through a crawler technology to obtain the target website;
and checking the website title and the description sentence of the target website to determine whether the target website is a known legal qualification website.
3. The method for detecting legal qualification of a financing website according to claim 1, wherein the determining the website keywords of the target website comprises:
calculating TF-IDF values of words in the text information of the target website by using a TF-IDF algorithm, and screening out the words of which the TF-IDF values are larger than a preset threshold value;
carrying out weighted calculation on the words based on the preset weight of the target keyword so as to determine the website keyword; wherein the target keyword comprises any one or more of income, product, investment, financing, finance, endorsement, guarantee, duration, stability, member, warranty, non-warranty, high return and risk.
4. The method for detecting legal qualification of a financing website according to claim 3, wherein before calculating the TF-IDF values of words in the text information of the target website by using the TF-IDF algorithm, the method further comprises:
acquiring hypertext markup languages of a primary page and a secondary page of the target website through an HTML (hypertext markup language) parser; the hypertext markup language comprises text information and pictures;
and extracting text information contained in the picture in the hypertext markup language by an optical character recognition technology to obtain the text information contained in the target website.
5. The method for detecting legal qualification of a financing website according to claim 3, wherein the regular character string of the preset regular expression is the target keyword.
6. The method for detecting legal qualification of a financing website according to claim 1, wherein the known legal qualification websites comprise a bank website, a securities website, a public fund website, a futures website and an insurance website; and the website types comprise small loan websites, technology finance websites, wealth company websites, private fund websites, trust websites and other types of websites.
7. The method for detecting legal qualification of a financing website according to claim 6, wherein the judging whether the target website is a financing website with legal qualification based on the matching parameters and the website type comprises:
if the risk level corresponding to the website type is high risk and the matching parameter is greater than or equal to a first preset threshold, determining that the target website is a financing website without legal qualification;
if the risk level corresponding to the website type is medium risk and the matching parameter is greater than or equal to a second preset threshold, determining that the target website is a financing website without legal qualification;
if the risk level corresponding to the website type is low risk and the matching parameter is greater than or equal to a third preset threshold, determining that the target website is a financing website without legal qualification;
wherein the risk level of the wealth company website and the private fund website is high risk; the risk level of the small loan website and the technology finance website is medium risk; and the risk level of the trust website and the other types of websites is low risk.
8. A legal qualification detection device for a financing website is characterized by comprising:
the legal qualification website identification module is used for acquiring a target website and identifying whether the target website is a known legal qualification website;
the website keyword determining module is used for determining the website type and the website keywords of the target website if the identification result of the legal qualification website identification module is negative;
the matching module is used for matching the website keywords with a preset regular expression to obtain matching parameters;
and the legal qualification judging module is used for judging whether the target website is a financing website with legal qualification or not based on the matching parameters and the website type.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the legal qualification detection method for a financing website as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the legal qualification detection method for a financing website as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011364312.2A CN112330463A (en) | 2020-11-27 | 2020-11-27 | Method, device, equipment and medium for detecting legal qualification of financing website |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112330463A (en) | 2021-02-05 |
Family
ID=74309646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011364312.2A Pending CN112330463A (en) | 2020-11-27 | 2020-11-27 | Method, device, equipment and medium for detecting legal qualification of financing website |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112330463A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113157998A (en) * | 2021-02-28 | 2021-07-23 | 江苏匠算天诚信息科技有限公司 | Method, system, device and medium for polling website and judging website type through IP |
CN113962573A (en) * | 2021-10-27 | 2022-01-21 | 天元大数据信用管理有限公司 | A method and equipment for forecasting regional financial development trend |
CN114493269A (en) * | 2022-01-26 | 2022-05-13 | 政采云有限公司 | A risk detection method, device and medium for item information |
CN115587893A (en) * | 2022-12-12 | 2023-01-10 | 深圳市泰铼科技有限公司 | Futures transaction supervisory systems based on internet finance |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106603490A (en) * | 2016-11-10 | 2017-04-26 | 上海斐讯数据通信技术有限公司 | Phishing website detecting method and system |
CN106776946A (en) * | 2016-12-02 | 2017-05-31 | 重庆大学 | A kind of detection method of fraudulent website |
CN110929129A (en) * | 2018-08-31 | 2020-03-27 | 阿里巴巴集团控股有限公司 | Information detection method, equipment and machine-readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210205 |