CN116483882A - Data processing method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN116483882A CN202310465573.0A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides a data processing method, apparatus, computer device, and storage medium. The method includes: determining an experimental group and a control group from a plurality of test groups to obtain a target trial group containing the experimental group and the control group, where the plurality of test groups respectively correspond to different strategy schemes; determining, by using the observation data corresponding to each sample in the experimental group and the control group, a target combined variance reflecting the overall difference among the plurality of test groups, so as to determine a hypothesis test result of the target trial group based on the target combined variance; and determining a target strategy scheme from a plurality of strategy schemes associated with the target trial group based on the hypothesis test result, so as to perform data processing based on the target strategy scheme.
Description
Technical Field
The disclosure relates to the technical field of computer processing, and in particular relates to a data processing method, a data processing device, computer equipment and a storage medium.
Background
In various application scenarios, such as search, one strategy scheme often has to be selected from a plurality of different strategy schemes for practical use. For example, in a search scenario there are multiple selectable information search strategies, and for the same search information the search results obtained with each strategy may differ. In practice, one information search strategy must be selected to complete the information search.
In this case, if the information search strategy put into use is selected from the plurality of candidates based on personal experience, the selected strategy may not be the best of them. Search results obtained with such a strategy are more likely to fail to meet the user's search needs and are also more likely to waste information resources.
Disclosure of Invention
The embodiment of the disclosure at least provides a data processing method, a data processing device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including: determining an experimental group and a control group from a plurality of test groups to obtain a target trial group containing the experimental group and the control group, where the plurality of test groups respectively correspond to different strategy schemes; determining, by using the observation data corresponding to each sample in the experimental group and the control group, a target combined variance reflecting the overall difference among the plurality of test groups, so as to determine a hypothesis test result of the target trial group based on the target combined variance; and determining a target strategy scheme from a plurality of strategy schemes associated with the target trial group based on the hypothesis test result, so as to perform data processing based on the target strategy scheme.
In an optional implementation, each strategy scheme corresponds to a different information search strategy; the test group under each strategy scheme comprises a plurality of samples; each sample comprises a search result determined based on the information search strategy under that strategy scheme; and the observation data corresponding to a sample comprises consumption data corresponding to the search result. During data processing, the target strategy scheme is used to determine, based on acquired search information, a search result to be displayed for the search information by applying the information search strategy corresponding to the target strategy scheme.
In an optional implementation, the target combined variance is determined as follows: determining a combined variance corresponding to the target trial group; and, in response to the difference in sample size among the plurality of test groups being less than a first threshold and the difference in variance being less than a second threshold, determining the target combined variance corresponding to the plurality of test groups based on the combined variance.
In an optional implementation, determining the hypothesis test result of the target trial group based on the target combined variance includes: determining a mean difference corresponding to the target trial group based on the observation data corresponding to each sample in the experimental group and the control group of the target trial group; and determining the ratio of the mean difference to the normalized target combined variance as the statistic of the target trial group, so as to determine the corresponding hypothesis test result from the range distribution under multiple comparisons based on the statistic.
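The ratio described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the text does not spell out the normalization, so the sketch assumes the Tukey-Kramer standard error, sqrt(S' / 2 * (1/n1 + 1/n2)). The function name `range_statistic` and the sample values are hypothetical.

```python
import math


def range_statistic(mean_experimental, mean_control,
                    target_combined_variance, n_experimental, n_control):
    """Ratio of the mean difference to the normalized target combined variance.

    Assumption: "normalization" is read as the Tukey-Kramer standard error;
    the source text does not define it.
    """
    standard_error = math.sqrt(
        target_combined_variance / 2.0
        * (1.0 / n_experimental + 1.0 / n_control)
    )
    return (mean_experimental - mean_control) / standard_error


# Hypothetical values: means 12.0 and 10.0, S' = 4.0, eight samples per group.
q = range_statistic(12.0, 10.0, 4.0, 8, 8)
```

The resulting statistic would then be looked up in the range distribution for the number of test groups being compared, a step this sketch leaves out.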
In an optional implementation, determining the combined variance corresponding to the target trial group includes: determining a first sample size and a first variance corresponding to the experimental group of the target trial group, and a second sample size and a second variance corresponding to the control group; and determining the combined variance corresponding to the target trial group based on the first sample size, the first variance, the second sample size, and the second variance.
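As a hedged sketch of this step, the combined variance can be computed with the standard pooled-variance formula, which weights each group's sample variance by its degrees of freedom. Using this particular formula is an assumption, since the passage above names only the inputs. The function name `combined_variance` and the data are hypothetical.

```python
from statistics import variance


def combined_variance(experimental, control):
    """Pooled variance of two groups (assumed formula, not quoted from the source)."""
    n1, n2 = len(experimental), len(control)   # first and second sample sizes
    s1 = variance(experimental)                # first (sample) variance
    s2 = variance(control)                     # second (sample) variance
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


# Hypothetical observation data for the experimental and control groups.
S = combined_variance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```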
In an optional implementation, the mean difference corresponding to the target trial group is determined as follows: in the target trial group, determining a first mean corresponding to the experimental group based on the observation data corresponding to each first sample in the experimental group; determining a second mean corresponding to the control group based on the observation data corresponding to each second sample in the control group; and determining the mean difference corresponding to the target trial group based on the first mean and the second mean.
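The mean-difference step can be illustrated with a short sketch. The function name `mean_difference` and the observation values (for example, viewing times in seconds) are hypothetical, and taking the difference as first mean minus second mean is an assumption about sign convention.

```python
from statistics import mean


def mean_difference(experimental_observations, control_observations):
    """First mean (experimental group) minus second mean (control group)."""
    first_mean = mean(experimental_observations)
    second_mean = mean(control_observations)
    return first_mean - second_mean


# Hypothetical observation data for the two groups of the target trial group.
diff = mean_difference([30.0, 40.0, 50.0], [20.0, 30.0, 40.0])
```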
In an optional implementation, before determining the target strategy scheme from the plurality of strategy schemes associated with the target trial group based on the hypothesis test result, the method further comprises: determining an original hypothesis (that is, a null hypothesis) based on the strategy schemes respectively corresponding to the experimental group and the control group in the target trial group, where the original hypothesis is used to predict which of the strategy schemes respectively corresponding to the experimental group and the control group will be selected. Determining the target strategy scheme from the plurality of strategy schemes associated with the target trial group based on the hypothesis test result comprises: comparing the hypothesis test result with a preset significance level, and verifying whether the original hypothesis holds based on the comparison result, so that, in response to the original hypothesis holding, the predicted strategy scheme is taken as the target strategy scheme.
In a second aspect, an embodiment of the present disclosure further provides a data processing apparatus, including: a first determining module, configured to determine an experimental group and a control group from a plurality of test groups to obtain a target trial group comprising the experimental group and the control group, where the plurality of test groups respectively correspond to different strategy schemes; a second determining module, configured to determine, by using the observation data corresponding to each sample in the experimental group and the control group, a target combined variance reflecting the overall difference among the plurality of test groups, so as to determine a hypothesis test result of the target trial group based on the target combined variance; and a third determining module, configured to determine a target strategy scheme from a plurality of strategy schemes associated with the target trial group based on the hypothesis test result, so as to perform data processing based on the target strategy scheme.
In an optional implementation, each strategy scheme corresponds to a different information search strategy; the test group under each strategy scheme comprises a plurality of samples; each sample comprises a search result determined based on the information search strategy under that strategy scheme; and the observation data corresponding to a sample comprises consumption data corresponding to the search result. During data processing, the target strategy scheme is used to determine, based on acquired search information, a search result to be displayed for the search information by applying the information search strategy corresponding to the target strategy scheme.
In an optional implementation, the apparatus further comprises a processing module configured to determine the target combined variance as follows: determining a combined variance corresponding to the target trial group; and, in response to the difference in sample size among the plurality of test groups being less than a first threshold and the difference in variance being less than a second threshold, determining the target combined variance corresponding to the plurality of test groups based on the combined variance.
In an optional implementation, the second determining module, when determining the hypothesis test result of the target trial group based on the target combined variance, is configured to: determine a mean difference corresponding to the target trial group based on the observation data corresponding to each sample in the experimental group and the control group of the target trial group; and determine the ratio of the mean difference to the normalized target combined variance as the statistic of the target trial group, so as to determine the corresponding hypothesis test result from the range distribution under multiple comparisons based on the statistic.
In an optional implementation, the processing module, when determining the combined variance corresponding to the target trial group, is configured to: determine a first sample size and a first variance corresponding to the experimental group of the target trial group, and a second sample size and a second variance corresponding to the control group; and determine the combined variance corresponding to the target trial group based on the first sample size, the first variance, the second sample size, and the second variance.
In an optional implementation, the mean difference corresponding to the target trial group is determined as follows: in the target trial group, determining a first mean corresponding to the experimental group based on the observation data corresponding to each first sample in the experimental group; determining a second mean corresponding to the control group based on the observation data corresponding to each second sample in the control group; and determining the mean difference corresponding to the target trial group based on the first mean and the second mean.
In an optional implementation, the third determining module is further configured, before determining the target strategy scheme from the plurality of strategy schemes associated with the target trial group based on the hypothesis test result, to: determine an original hypothesis (that is, a null hypothesis) based on the strategy schemes respectively corresponding to the experimental group and the control group in the target trial group, where the original hypothesis is used to predict which of the strategy schemes respectively corresponding to the experimental group and the control group will be selected. The third determining module, when determining the target strategy scheme from the plurality of strategy schemes associated with the target trial group based on the hypothesis test result, is configured to: compare the hypothesis test result with a preset significance level, and verify whether the original hypothesis holds based on the comparison result, so that, in response to the original hypothesis holding, the predicted strategy scheme is taken as the target strategy scheme.
In a third aspect, an optional implementation of the present disclosure further provides a computer device including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, perform the steps of the first aspect or of any possible implementation of the first aspect.
In a fourth aspect, an alternative implementation of the present disclosure further provides a computer readable storage medium having stored thereon a computer program which when executed performs the steps of the first aspect, or any of the possible implementation manners of the first aspect.
With the data processing method, apparatus, computer device, and storage medium provided by the embodiments of the present disclosure, a plurality of test groups corresponding to a plurality of strategy schemes to be screened can be determined, and a target trial group comprising an experimental group and a control group can be selected from them. A target combined variance reflecting the overall difference among the plurality of test groups is then determined from the observation data corresponding to each sample, and the hypothesis test result of the target trial group is determined using the target combined variance, thereby determining the target strategy scheme. Because this approach uses observation data that reflects, in practice, how good each strategy scheme is, and performs multiple comparisons across the strategy schemes, it selects a target strategy scheme more accurately than relying on manual experience.
Further, in the data processing method provided by the embodiments of the present disclosure, when multiple comparisons are performed on the plurality of strategy schemes, the hypothesis test result is determined using the target combined variance obtained from the selected experimental group and control group. This avoids the excessive computation of determining combined variances for all test groups under multiple comparisons, so data processing takes less time and is more efficient.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. These drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to illustrate its technical solutions. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a data processing method provided by an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the disclosed embodiments generally described and illustrated herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
Research shows that, in application scenarios such as search, applying different information search strategies to the same search information can produce different search results. In a practical application scenario, a better strategy needs to be selected from a plurality of information search strategies to complete the search task. If an information search strategy is selected and used directly based on personal experience, the selected strategy may not be optimal, the search results obtained are more likely to fail to meet the user's search needs, and information resources are more likely to be wasted.
Based on the above research, the present disclosure provides a data processing method that can determine a plurality of test groups corresponding to a plurality of strategy schemes to be screened and select from them a target trial group comprising an experimental group and a control group. A target combined variance reflecting the overall difference among the plurality of test groups is then determined from the observation data corresponding to each sample, and the hypothesis test result of the target trial group is determined using the target combined variance, thereby determining the target strategy scheme. Because this approach uses observation data that reflects, in practice, how good each strategy scheme is, and performs multiple comparisons across the strategy schemes, it selects a target strategy scheme more accurately than relying on manual experience.
In addition, when multiple comparisons are performed on the plurality of strategy schemes, the hypothesis test result is determined using the target combined variance obtained from the selected experimental group and control group. This avoids the excessive computation of determining combined variances for all test groups under multiple comparisons, so data processing takes less time and is more efficient.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
To facilitate understanding of this embodiment, a data processing method disclosed in an embodiment of the present disclosure is first described in detail. The execution body of the data processing method is generally a computer device with a certain computing capability, for example a terminal device, a server, or another processing device. The terminal device may be a User Equipment (UE), mobile device, user terminal, cellular telephone, cordless telephone, Personal Digital Assistant (PDA), handheld device, computing device, vehicle-mounted device, wearable device, or the like. In some possible implementations, the data processing method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The data processing method provided by the embodiments of the present disclosure is described below. The method can be used to select an optimal scheme from a plurality of schemes and can therefore be applied in many different fields. Examples include selecting an optimal information search strategy in a search scenario, as described above, determining a preferred threshold among a plurality of candidate strategy thresholds, or selecting an optimal image processing algorithm in the image processing field. When selecting the optimal scheme from a plurality of schemes, the method obtains observation data under each scheme and selects a scheme based on that data, so the optimal scheme can be selected more easily and accurately than by screening that relies on manual experience.
Referring to FIG. 1, a flowchart of a data processing method according to an embodiment of the disclosure is shown. The method includes steps S101 to S103, where:
S101: determining an experimental group and a control group from a plurality of test groups to obtain a target trial group containing the experimental group and the control group, where the plurality of test groups respectively correspond to different strategy schemes;
S102: determining, by using the observation data corresponding to each sample in the experimental group and the control group, a target combined variance reflecting the overall difference among the plurality of test groups, so as to determine a hypothesis test result of the target trial group based on the target combined variance;
S103: determining a target strategy scheme from a plurality of strategy schemes associated with the target trial group based on the hypothesis test result, so as to perform data processing based on the target strategy scheme.
The following describes the above-described S101 to S103 in detail, taking the search scenario described above as an example.
For S101 above, a plurality of test groups may first be determined, where each test group corresponds to one strategy scheme and the strategy schemes differ from one another. Specifically, each strategy scheme may correspond to an information search strategy, which is used to complete an information search task and determines how associated search results are obtained after search information sent by a user terminal is received.
The information search strategies under different strategy schemes differ. The difference may lie in the search dimension: for example, one information search strategy performs an associated search directly on the keywords split from the search information, while another performs an associated search that combines hot words with the split keywords. Alternatively, the strategies may differ only in certain thresholds: for example, one information search strategy obtains related search results using semantic words whose similarity to the keywords exceeds 70%, while another uses semantic words whose similarity exceeds 80%. Taking the latter as an example, with the 70% threshold the search results may be only weakly related to the keywords, whereas with the 80% threshold there may be too few search results, leaving the user with insufficient information. A target strategy scheme therefore also needs to be selected from among these different strategy schemes.
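The threshold example can be made concrete with a small sketch. The function `select_semantic_words`, the candidate words, and their similarity scores are all hypothetical; the sketch only illustrates how two strategies that differ in a single threshold produce different candidate sets.

```python
def select_semantic_words(similarities, threshold):
    """Keep candidate words whose similarity to the keyword exceeds the threshold."""
    return [word for word, score in similarities.items() if score > threshold]


# Hypothetical similarity scores between one keyword and candidate semantic words.
scores = {"laptop": 0.92, "notebook": 0.78, "tablet": 0.71, "desk": 0.40}

strategy_a = select_semantic_words(scores, 0.70)  # broader set, may be less relevant
strategy_b = select_semantic_words(scores, 0.80)  # narrower set, may return too little
```

Under these made-up scores the 70% strategy keeps three words and the 80% strategy keeps one, which is the trade-off between relevance and coverage described above.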
For each strategy scheme, a corresponding test group can be determined, and each test group includes a plurality of samples, such as a plurality of search results obtained under an information search strategy. The quality of an information search strategy is reflected in the quality of its search results, which can be evaluated with various indicators, such as user viewing time, the number of user likes and comments, and the number of users who viewed the result. In the embodiments of the present disclosure, these indicators are referred to as the consumption data corresponding to the search result and serve as the observation data of a sample.
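One possible way to represent a sample and its consumption data is sketched below. The `Sample` class, its field names, and the `observations` helper are hypothetical and are not taken from the disclosure; they only show how one indicator can be extracted as the observation data of a test group.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One search result together with its consumption data (hypothetical fields)."""
    result_id: str
    viewing_seconds: float
    likes_and_comments: int
    viewers: int


def observations(test_group, metric="viewing_seconds"):
    """Extract one consumption indicator from every sample in a test group."""
    return [getattr(sample, metric) for sample in test_group]


# A hypothetical test group with two samples.
group = [Sample("r1", 35.0, 4, 120), Sample("r2", 12.5, 0, 40)]
viewing_times = observations(group)
```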
Here, in the embodiments of the present disclosure, an information search strategy may be considered better if the consumption data of its search results shows more user views and more likes and comments. Therefore, using the observation data of the samples under each information search strategy, a plurality of different information search strategies can be compared to select a better one as the target search strategy. This process amounts to determining the target strategy scheme among the plurality of strategy schemes. The target search strategy corresponding to the finally selected target strategy scheme can then be used for data processing: after a search request sent by a user terminal is received, the selected target search strategy is used to obtain search results that better meet the user's search needs, and these are displayed to the user as the search results to be displayed.
After the plurality of test groups under the different strategy schemes are determined, an experimental group and a control group can be selected. In one possible case, certain strategy schemes are of particular interest during a given period. For example, after an information search strategy is updated, the question is whether the updated strategy improves on the strategy before the update, and the test groups determined under these two information search strategies are used as the experimental group and the control group. Alternatively, the experimental group and the control group may be selected from the plurality of test groups according to actual conditions, which is not described further here.
The experimental group and control group selected here are together referred to as the target trial group in the embodiments of the present disclosure. The experimental group and the control group in the target trial group are regarded as mutually independent test groups.
For S102 above, for the selected target trial group, the target combined variance reflecting the overall difference among the plurality of test groups may be determined by using the observation data corresponding to each sample in the experimental group and the control group, so as to determine the hypothesis test result of the target trial group based on the target combined variance.
Here, the original hypothesis (that is, the null hypothesis, H0) may first be determined based on the strategy schemes respectively corresponding to the experimental group and the control group in the target trial group. For example, the original hypothesis may be that the user dwell time X0 in the consumption data of the experimental group is not significantly different from the user dwell time X1 in the consumption data of the control group, that is, H0: X0 = X1. Accordingly, the alternative hypothesis is that the user dwell time X0 in the consumption data of the experimental group is longer than the user dwell time X1 in the consumption data of the control group, that is, H1: X0 > X1. The strategy under the control group is usually the strategy currently in use. With the original hypothesis H0 and the alternative hypothesis H1 set in this way, if the original hypothesis cannot be rejected, the current strategy under the control group continues to be used in practice.
The target strategy scheme is determined from the strategy schemes respectively corresponding to the experimental group and the control group by first determining the target combined variance and, from it, the hypothesis test result. In a specific implementation, the hypothesis test result is compared with a preset significance level to decide whether the original hypothesis holds, and the target strategy scheme is determined accordingly.
Here, when two test groups are compared, a Type I error may occur in the screening, that is, the screening result erroneously rejects the original hypothesis. The embodiments of the present disclosure involve comparison screening across multiple test groups, that is, multiple comparisons, and therefore also address the inflation of error under multiple comparisons, where a Type I error becomes more likely.
In this case, Tukey's test may be selected as the multiple-comparison procedure. Tukey's test uses the variance information of all test groups each time two test groups are compared. In practice, however, the number of test groups involved in multiple comparisons may be large, and if the variance information has to be determined every time two test groups are selected for comparison, the amount of computation becomes very large.
Therefore, in the embodiments of the present disclosure, the target combined variance is determined as follows: determining a combined variance corresponding to the target test group; and, in response to the difference in sample size between each of the plurality of test groups being less than a first threshold and the difference in variance being less than a second threshold, determining the target combined variance corresponding to the plurality of test groups based on the combined variance. Here, the target combined variance corresponding to the plurality of test groups is the variance information of all test groups used in Tukey's test described above. For convenience of description, the combined variance is referred to as S, and the target combined variance determined using the combined variance S is referred to as S'.
First, the manner of determining the combined variance S corresponding to the target test group is described. The combined variance S may be determined as follows: determining a first sample size and a first variance corresponding to the experimental group of the target test group, and a second sample size and a second variance corresponding to the control group, and determining the combined variance corresponding to the target test group based on them.
The following is an example. The experimental group and the control group are each denoted by $i_n$; specifically, $i_0$ denotes the experimental group and $i_1$ denotes the control group. The sample size corresponding to test group $i_n$ is denoted $n_{i_n}$, so the first sample size corresponding to the experimental group is denoted $n_{i_0}$ and the second sample size corresponding to the control group is denoted $n_{i_1}$. The variance corresponding to test group $i_n$ is denoted $\sigma_{i_n}$; similarly, the first variance corresponding to the experimental group is denoted $\sigma_{i_0}$ and the second variance corresponding to the control group is denoted $\sigma_{i_1}$.
Based on the above expression, when determining the combined variance S of the target test group, it specifically satisfies the following formula (1):
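As an illustration only, the combined variance of the target test group can be sketched with the standard two-sample pooled variance, which weights each group's variance by its degrees of freedom. This form is an assumption and is not confirmed by the present text as formula (1); the function name `pooled_variance` and the sample values are hypothetical.

```python
def pooled_variance(n0: int, var0: float, n1: int, var1: float) -> float:
    """Pooled variance of two independent groups.

    n0, var0: first sample size and first variance (experimental group).
    n1, var1: second sample size and second variance (control group).
    Assumption: the standard degrees-of-freedom weighting is used.
    """
    if n0 < 2 or n1 < 2:
        raise ValueError("each group needs at least two samples")
    return ((n0 - 1) * var0 + (n1 - 1) * var1) / (n0 + n1 - 2)


# Hypothetical values: two groups of equal size with variances 4.0 and 6.0.
s = pooled_variance(n0=1000, var0=4.0, n1=1000, var1=6.0)
```

With equal sample sizes the pooled variance reduces to the plain average of the two group variances.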
For the selected target test group and the original plurality of test groups, when the test groups are obtained by random traffic splitting, the sample sizes of the resulting test groups are close regardless of how many test groups there are, and the observed data of the test groups are similarly distributed before the test.
The specific reasons are as follows. First, when the samples are divided into test groups, if the sample sizes differ greatly between test groups, the random splitting itself is flawed, and the reliability of the target strategy scheme selected from those test groups cannot be guaranteed whatever method is used. Therefore, the sample sizes of the test groups should be close when the test groups are determined by random splitting.
Second, variables usually need to be controlled when test groups are compared, and the strategy schemes generally differ little between test groups, especially when there are many test groups. In the application scenario described in the embodiments of the present disclosure, for example, the test groups may differ only in the value of a threshold. It is therefore foreseeable that the observed data of the samples vary little between test groups, so the variances of the test groups remain relatively close.
Under the above description of the sample size and variance, the following two assumption conditions can be obtained specifically, including the following (2-1) and (2-2):
On this basis, the following formula (3) can be specifically satisfied when determining the target merge variances S' corresponding to the plurality of test groups
Therefore, the following formula (4) is satisfied between the combined variance S of the target test group and the target combined variances S' corresponding to the plurality of test groups:
that is, after the merging variances corresponding to the target test groups are determined, the target merging variances corresponding to the plurality of test groups can be approximately determined by using the above formula (4). By adopting the method, the calculation amount brought by calculating the target merging variance S' by adopting the formula (3) under the condition that a plurality of test groups exist can be effectively reduced, and the efficiency can be effectively improved.
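The saving described above can be sketched as follows. The sketch assumes the simplest reading of the approximation, namely that the two-group combined variance S is reused in place of S' when the two conditions hold, and that a "difference" is measured as the spread between the largest and smallest value. Both readings, together with all function names and sample values, are assumptions rather than details confirmed by the text.

```python
def pooled_variance_all(sizes, variances):
    """Pooled variance over every test group, weighted by degrees of freedom."""
    dof = [n - 1 for n in sizes]
    return sum(d * v for d, v in zip(dof, variances)) / sum(dof)


def conditions_hold(sizes, variances, size_threshold, var_threshold):
    """Assumed check: max-min spread of sizes and of variances below the thresholds."""
    return (max(sizes) - min(sizes) < size_threshold
            and max(variances) - min(variances) < var_threshold)


def target_pooled_variance(sizes, variances, exp, ctrl,
                           size_threshold, var_threshold):
    """Approximate S' by the two-group S when the groups are close enough."""
    if conditions_hold(sizes, variances, size_threshold, var_threshold):
        n0, n1 = sizes[exp], sizes[ctrl]
        v0, v1 = variances[exp], variances[ctrl]
        # Only the experimental and control groups are touched here.
        return ((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2)
    # Fallback: compute over all groups when the conditions are not met.
    return pooled_variance_all(sizes, variances)


# Hypothetical data: four test groups obtained by random splitting.
sizes = [1000, 1002, 998, 1001]
variances = [4.0, 4.1, 3.9, 4.05]
approx = target_pooled_variance(sizes, variances, exp=0, ctrl=1,
                                size_threshold=10, var_threshold=0.5)
exact = pooled_variance_all(sizes, variances)
```

In this example the approximate and exact values differ by well under 0.1, while the approximate path reads only two of the four groups.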
After the target combined variance is determined, the mean difference corresponding to the target test group is then determined based on the observed data corresponding to each sample in the experimental group and the control group of the target test group; and the ratio between the mean difference and the normalized target combined variance is determined as the statistic of the target test group, so that the corresponding hypothesis test result is determined from the range distribution under multiple tests based on the statistic.
Here, taking the above example in which the user stay time length in the consumption data under the experimental group is X0 and the user stay time length in the consumption data under the control group is X1, the mean difference Δ may be determined as Δ = X0 - X1.
Here, when the mean difference is determined, the user stay time length X0 corresponding to the experimental group and the user stay time length X1 corresponding to the control group are each expressed as a mean value. Thus, the mean difference may be determined as follows: in the target test group, determining a first mean value X0 corresponding to the experimental group based on the observed data corresponding to each first sample in the experimental group; determining a second mean value X1 corresponding to the control group based on the observed data corresponding to each second sample in the control group; and determining the mean difference corresponding to the target test group based on the first mean value and the second mean value.
Using the mean difference Δ and the normalized target combined variance S', a statistic t can be determined, where the statistic t satisfies t = Δ / S'. The statistic t follows the range distribution under multiple tests, which may specifically be the studentized range distribution, so the hypothesis test result (p value) corresponding to the statistic t can be obtained by table lookup.
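A minimal sketch of the statistic follows. The text does not state how the target combined variance is normalized, so the sketch assumes the standard error used by the Tukey-Kramer procedure, sqrt(S'/2 · (1/n0 + 1/n1)); that choice, the function name `range_statistic`, and the sample data are all assumptions made for illustration.

```python
import math
from statistics import fmean


def range_statistic(exp_obs, ctrl_obs, target_pooled_var):
    """Mean difference divided by a normalized target combined variance.

    exp_obs, ctrl_obs: observed data (e.g. user stay time) for each sample.
    target_pooled_var: the target combined variance S'.
    """
    n0, n1 = len(exp_obs), len(ctrl_obs)
    # First mean X0 minus second mean X1.
    delta = fmean(exp_obs) - fmean(ctrl_obs)
    # Assumed normalization: Tukey-Kramer standard error.
    std_err = math.sqrt(target_pooled_var / 2 * (1 / n0 + 1 / n1))
    return delta / std_err


# Hypothetical observations: means are 15 and 13, S' is 4.0.
t = range_statistic([12, 14, 16, 18], [10, 12, 14, 16], target_pooled_var=4.0)
```

Here the mean difference is 2 and the assumed standard error is 1, so the statistic is 2.0. Instead of a printed table, the tail probability could be read from a library implementation of the studentized range distribution such as `scipy.stats.studentized_range`, given the number of test groups and the degrees of freedom.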
For S103 described above, in the case where the hypothesis test result is determined, the target policy scheme may be determined from among the plurality of policy schemes associated with the target trial group.
Specifically, the original hypothesis H0 may be determined according to the related description in S101 described above, and for the determined hypothesis test result p value, whether the original hypothesis H0 can be established may be determined by comparison with a preset significance level α value. In this case, if the original assumption H0 is not satisfied, the policy scheme predicted in the alternative assumption H1 is set as the target policy scheme. Here, the preset significance level α value may be set to 0.05.
When the comparison is specifically performed, if p < alpha, the original assumption H0 is refused, namely the user stay time length X0 in the corresponding consumption data under the experimental group is considered to be longer than the user stay time length X1 in the corresponding consumption data under the control group, and the corresponding strategy scheme under the experimental group is better. Otherwise, the control group strategy is still maintained, namely the user stay time length X0 in the corresponding consumption data under the experimental group is considered to be similar to the user stay time length X1 in the corresponding consumption data under the control group, so that the original strategy scheme under the control group is not selected to be changed, and the original strategy scheme under the control group is continuously maintained.
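The decision rule above can be sketched as a single comparison. The function name `choose_policy` and the policy labels are hypothetical; the default significance level of 0.05 follows the value given in the description.

```python
def choose_policy(p_value, experimental_policy, control_policy, alpha=0.05):
    """Return the experimental policy only when H0 is rejected (p < alpha)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < alpha:
        # H0 rejected: the experimental group's stay time is taken to be longer.
        return experimental_policy
    # H0 not rejected: keep the current policy of the control group.
    return control_policy


# Hypothetical labels for the two information search strategies.
kept = choose_policy(0.20, "updated_search", "current_search")
switched = choose_policy(0.01, "updated_search", "current_search")
```

A p value exactly equal to the significance level does not reject the original hypothesis under this rule, so the control group's strategy is kept in that case.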
Thus, for two different strategy schemes corresponding to the experimental group and the control group under the target experimental group respectively, one of the more optimal strategy schemes can be determined, or the original strategy scheme can be decided to be maintained. In an actual application scenario, a more applicable information search strategy can be determined specifically so as to be put into the actual search scenario and used for carrying out data processing on the received search information to obtain a search result to be displayed to a user.
In another embodiment of the present disclosure, after the target strategy scheme is determined from the plurality of strategy schemes associated with the target test group, the finally selected target strategy scheme may further be determined from the strategy schemes respectively corresponding to the plurality of test groups: in response to an unselected test group existing among the plurality of test groups, the unselected test group and the test group corresponding to the target strategy scheme are determined as a new target test group, and a new target strategy scheme is determined based on the new target test group, so that the target strategy scheme used to obtain the search result corresponding to the search information is determined from the plurality of strategy schemes.
In the above step, two test groups of great interest are selected from the plurality of test groups as target test groups, to determine a preferred target strategy scheme therefrom. In practice, however, the selectable test set includes a plurality of test sets, and there are a plurality of policy schemes that can be selected accordingly. Based on this, all test groups can be traversed to determine an optimal target strategy scheme from a plurality of strategy schemes corresponding to a plurality of test groups through observation data of samples under each test group.
Because the above steps determine, from the target test group, the test group whose strategy scheme is better, that test group can then be combined with the next test group that has not yet been tested to form a new target test group, and the process is repeated. Through this successive comparison, the better test group is determined from the plurality of test groups. The strategy scheme under that test group yields a longer user stay time in the observed data, which means that the corresponding information search strategy returns search results that better meet the user's needs, so the waste of displayed information is effectively reduced. The information search strategy under that strategy scheme is therefore also more suitable for processing received search requests.
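The traversal described above can be sketched as a running comparison that keeps the current winner and pairs it with each untested group in turn. The sketch assumes that the untested group plays the role of the experimental group and the current winner the role of the control group in each new pair; that assignment, the callback `p_value_of`, and the group names and p values are hypothetical.

```python
from typing import Callable, Sequence


def select_best_group(groups: Sequence[str],
                      p_value_of: Callable[[str, str], float],
                      alpha: float = 0.05) -> str:
    """Traverse all test groups, keeping the winner of each pairwise test.

    p_value_of(challenger, incumbent) returns the hypothesis test result
    for the pair, with the challenger treated as the experimental group.
    """
    if not groups:
        raise ValueError("at least one test group is required")
    best = groups[0]  # start from the group holding the current policy
    for challenger in groups[1:]:
        # The challenger replaces the incumbent only when H0 is rejected.
        if p_value_of(challenger, best) < alpha:
            best = challenger
    return best


# Hypothetical p values keyed by (challenger, incumbent).
p_table = {("A", "control"): 0.01, ("B", "A"): 0.40, ("C", "A"): 0.03}
best = select_best_group(["control", "A", "B", "C"],
                         lambda c, i: p_table[(c, i)])
```

In this example group A displaces the control group, B fails to displace A, and C then displaces A, so each test group is compared exactly once.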
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a data processing device corresponding to the data processing method, and since the principle of solving the problem by the device in the embodiments of the present disclosure is similar to that of the data processing method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 2, a schematic diagram of a data processing apparatus according to an embodiment of the disclosure is provided, where the apparatus includes: a first determination module 21, a second determination module 22, and a third determination module 23; wherein:
a first determining module 21, configured to determine an experimental group and a control group from a plurality of test groups, and obtain a target experimental group including the experimental group and the control group; the plurality of test groups respectively correspond to different strategy schemes;
a second determining module 22, configured to determine a target combined variance reflecting an overall difference condition between the plurality of test groups by using the observed data corresponding to each sample in the experimental group and the control group, so as to determine a hypothesis testing result of the target experimental group based on the target combined variance;
A third determining module 23, configured to determine a target policy scheme from a plurality of policy schemes associated with the target test group based on the hypothesis test result, so as to perform data processing based on the target policy scheme.
In an optional implementation manner, each policy scheme corresponds to a different information searching policy, the test set under each policy scheme comprises a plurality of samples, each sample comprises a search result determined based on the information searching policy under the policy scheme, and observed data corresponding to the sample comprises consumption data corresponding to the search result; and the target strategy scheme is used for determining a search result to be displayed for the search information by adopting an information search strategy corresponding to the target strategy scheme based on the acquired search information when the data is processed.
In an alternative embodiment, the apparatus further comprises a processing module 24 for determining the target merge variance by: determining a combination variance corresponding to the target test group; and determining target combined variances corresponding to the plurality of test groups based on the combined variances in response to the difference in sample size between each of the plurality of test groups being less than a first threshold and the difference in variances being less than a second threshold.
In an alternative embodiment, the second determining module 22 is configured to, when determining the hypothesis testing results for the target trial group based on the target combined variance: determining a mean value difference corresponding to the target test group based on observation data corresponding to each sample in the test group and the control group of the target test group; and determining the ratio between the mean difference and the target combined variance after normalization processing as the statistic of the target test group so as to determine the corresponding hypothesis test result in the range distribution under multiple tests based on the statistic.
In an alternative embodiment, the processing module 24 is configured, when determining the combined variance corresponding to the target trial group, to: and determining a first sample amount and a first variance corresponding to the experimental group of the target experimental group, and a second sample amount and a second variance corresponding to the control group, and determining a combined variance corresponding to the target experimental group.
In an alternative embodiment, the mean difference corresponding to the target test group is determined in the following manner: in the target test group, determining a first average value corresponding to the test group based on observation data corresponding to each first sample in the test group; and determining a second mean value corresponding to the control group based on the observed data corresponding to each second sample in the control group; and determining the mean value difference corresponding to the target test group based on the first mean value and the second mean value.
In an alternative embodiment, the third determining module 23 is further configured, before determining a target policy scheme from the plurality of policy schemes associated with the target trial group based on the hypothesis test result, to: determining an original assumption based on strategy schemes respectively corresponding to an experimental group and a control group in the target experimental group; the original hypothesis is used for selecting and predicting one strategy scheme from a plurality of strategy schemes respectively corresponding to the experimental group and the control group; the third determining module 23 is configured to, when determining a target policy scenario from among a plurality of policy scenarios associated with the target trial group based on the hypothesis test result: and comparing the hypothesis test result with a preset significance level, and verifying whether the original hypothesis is established based on the comparison result, so that the predicted strategy scheme is taken as the target strategy scheme in response to the establishment of the original hypothesis.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the disclosure further provides a computer device, as shown in fig. 3, which is a schematic structural diagram of the computer device provided by the embodiment of the disclosure, including:
A processor 10 and a memory 20; the memory 20 stores machine readable instructions executable by the processor 10, and the processor 10 is configured to execute the machine readable instructions stored in the memory 20; when the machine readable instructions are executed by the processor 10, the processor 10 performs the following steps:
determining an experimental group and a control group from a plurality of test groups to obtain a target experimental group containing the experimental group and the control group; the plurality of test groups respectively correspond to different strategy schemes; determining a target combined variance reflecting the overall difference condition between the plurality of test groups by using the observation data corresponding to each sample in the test group and the control group, so as to determine a hypothesis testing result of the target test group based on the target combined variance; and determining a target strategy scheme from a plurality of strategy schemes associated with the target test group based on the hypothesis test result, so as to perform data processing based on the target strategy scheme.
The memory 20 includes an internal memory 210 and an external memory 220; the internal memory 210 temporarily stores operation data of the processor 10 and data exchanged with the external memory 220, such as a hard disk, and the processor 10 exchanges data with the external memory 220 via the internal memory 210.
The specific execution process of the above instructions may refer to the steps of the data processing method described in the embodiments of the present disclosure, which is not described herein.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the data processing method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
Embodiments of the present disclosure further provide a computer program product, where the computer program product carries program code, where instructions included in the program code may be used to perform steps of a data processing method described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that any person skilled in the art may, within the technical scope disclosed herein, modify or readily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (10)
1. A method of data processing, comprising:
determining an experimental group and a control group from a plurality of test groups to obtain a target experimental group containing the experimental group and the control group; the plurality of test groups respectively correspond to different strategy schemes;
determining a target combined variance reflecting the overall difference condition between the plurality of test groups by using the observation data corresponding to each sample in the test group and the control group, so as to determine a hypothesis testing result of the target test group based on the target combined variance;
and determining a target strategy scheme from a plurality of strategy schemes associated with the target test group based on the hypothesis test result, so as to perform data processing based on the target strategy scheme.
2. The method of claim 1, wherein each of the policy schemes corresponds to a different information search policy, the test set under each policy scheme includes a plurality of samples, each of the samples includes a search result determined based on the information search policy under the policy scheme, and the observation data corresponding to the samples includes consumption data corresponding to the search result;
and the target strategy scheme is used for determining a search result to be displayed for the search information by adopting an information search strategy corresponding to the target strategy scheme based on the acquired search information when the data is processed.
3. The method according to claim 1 or 2, characterized in that the target combined variance is determined in the following way:
determining a combination variance corresponding to the target test group;
and determining target combined variances corresponding to the plurality of test groups based on the combined variances in response to the difference in sample size between each of the plurality of test groups being less than a first threshold and the difference in variances being less than a second threshold.
4. The method of claim 1 or 2, wherein determining hypothesis testing results for the target trial group based on the target combined variance comprises:
determining a mean value difference corresponding to the target test group based on observation data corresponding to each sample in the test group and the control group of the target test group;
and determining the ratio between the mean difference and the target combined variance after normalization processing as the statistic of the target test group so as to determine the corresponding hypothesis test result in the range distribution under multiple tests based on the statistic.
5. The method of claim 3, wherein determining the combined variance for the target trial group comprises:
and determining a first sample amount and a first variance corresponding to the experimental group of the target experimental group, and a second sample amount and a second variance corresponding to the control group, and determining a combined variance corresponding to the target experimental group.
6. The method of claim 4, wherein the mean differences corresponding to the target test group are determined by:
in the target test group, determining a first average value corresponding to the test group based on observation data corresponding to each first sample in the test group; and determining a second mean value corresponding to the control group based on the observed data corresponding to each second sample in the control group;
and determining the mean value difference corresponding to the target test group based on the first mean value and the second mean value.
7. The method of claim 1, wherein prior to determining a target policy plan from a plurality of policy plans associated with the target trial group based on the hypothesis test result, the method further comprises:
determining an original assumption based on strategy schemes respectively corresponding to an experimental group and a control group in the target experimental group; the original hypothesis is used for selecting and predicting one strategy scheme from a plurality of strategy schemes respectively corresponding to the experimental group and the control group;
the determining a target strategy scheme from a plurality of strategy schemes associated with the target test group based on the hypothesis test result comprises the following steps:
And comparing the hypothesis test result with a preset significance level, and verifying whether the original hypothesis is established based on the comparison result, so that the predicted strategy scheme is taken as the target strategy scheme in response to the establishment of the original hypothesis.
8. A data processing apparatus, comprising:
the first determining module is used for determining an experimental group and a control group from a plurality of test groups to obtain a target experimental group comprising the experimental group and the control group; the plurality of test groups respectively correspond to different strategy schemes;
a second determining module, configured to determine a target combined variance reflecting an overall difference condition between the plurality of test groups by using observation data corresponding to each sample in the test group and the control group, so as to determine a hypothesis testing result of the target test group based on the target combined variance;
and the third determining module is used for determining a target strategy scheme from a plurality of strategy schemes associated with the target test group based on the hypothesis test result so as to perform data processing based on the target strategy scheme.
9. A computer device, comprising: a processor, a memory storing machine readable instructions executable by the processor for executing machine readable instructions stored in the memory, which when executed by the processor, perform the steps of the data processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being run by a computer device, performs the steps of the data processing method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310465573.0A CN116483882A (en) | 2023-04-26 | 2023-04-26 | Data processing method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310465573.0A CN116483882A (en) | 2023-04-26 | 2023-04-26 | Data processing method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116483882A (en) | 2023-07-25 |
Family
ID=87219175
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310465573.0A (Pending) CN116483882A (en) | 2023-04-26 | 2023-04-26 | Data processing method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116483882A (en) |
- 2023-04-26: CN application CN202310465573.0A (CN116483882A), status: Active, Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP3780541B1 (en) | Identity information identification method and device | |
US11354797B2 (en) | Method, device, and system for testing an image | |
CN109949154B (en) | Customer information classification method, apparatus, computer device and storage medium | |
EP3633553A1 (en) | Method, device and apparatus for training object detection model | |
CN113642659B (en) | Training sample set generation method and device, electronic equipment and storage medium | |
CN107451854B (en) | Method and device for determining user type and electronic equipment | |
CN105373800A (en) | Classification method and device | |
CN110796269B (en) | Method and device for generating model, and method and device for processing information | |
CN112052251A (en) | Target data updating method and related device, equipment and storage medium | |
CN111311328A (en) | Method and device for determining advertisement click rate of product under advertisement channel | |
CN111275106B (en) | Adversarial sample generation method and device and computer equipment | |
WO2018058721A1 (en) | Apparatus and method for dataset model fitting using classifying engine | |
CN114003648B (en) | Identification method and device for risk transaction group partner, electronic equipment and storage medium | |
CN116483882A (en) | Data processing method, device, computer equipment and storage medium | |
CN110489416B (en) | Information storage method based on data processing and related equipment | |
CN111275071B (en) | Prediction model training method, prediction device and electronic equipment | |
CN112446428A (en) | Image data processing method and device | |
CN115953248B (en) | Risk control method, device, equipment and medium based on Shapley additive explanations | |
CN117331907A (en) | Computer-implemented method and system for creating a scene database | |
CN112905191B (en) | Data processing method, device, computer readable storage medium and computer equipment | |
CN111078877B (en) | Data processing method, training method of text classification model, and text classification method and device | |
CN109542906B (en) | Equipment determination method and device | |
CN112926678A (en) | Model similarity determination method and device | |
CN114697322B (en) | Data screening method based on cloud service processing | |
CN111414470A (en) | Method and device for processing document, computer storage medium and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||