
CN120875931A - Intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem - Google Patents

Intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem

Info

Publication number
CN120875931A
Authority
CN
China
Prior art keywords
data
user
module
model
user behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202510986555.6A
Other languages
Chinese (zh)
Inventor
杨佳蓉
吴小培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Jiayuan Technology Co ltd
Original Assignee
Ningbo Jiayuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Jiayuan Technology Co ltd filed Critical Ningbo Jiayuan Technology Co ltd
Priority to CN202510986555.6A priority Critical patent/CN120875931A/en
Publication of CN120875931A publication Critical patent/CN120875931A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26Visual data mining; Browsing structured data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/10Pre-processing; Data cleansing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • G06N5/025Extracting rules from data
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/01Customer relationship services
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Recommending goods or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Biophysics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Resources & Organizations (AREA)
  • Computer Security & Cryptography (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem, in the technical field of artificial intelligence and big data marketing. The system comprises a multi-source data acquisition module that collects multi-channel data from inside and outside the enterprise; an intelligent customer acquisition module that uses transfer learning to screen high-potential customers; a user behavior analysis module that uses a Transformer to mine behavior logic; an intelligent recommendation engine that fuses multiple algorithms to deliver personalized recommendations; and a real-time decision module that combines rules with reinforcement learning to optimize marketing. The system integrates and deeply analyzes multi-source data, accurately identifies customer needs and mines behavior patterns, improves customer acquisition efficiency and conversion rate, optimizes marketing results through personalized recommendation and intelligent decision-making, and provides enterprises with an end-to-end intelligent marketing solution from data acquisition to decision execution.

Description

Intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem
Technical Field
The invention relates to the technical field of artificial intelligence and big data marketing, and in particular to an intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem.
Background
In the wave of digital marketing, enterprises' need for intelligent customer acquisition and user behavior analysis is increasingly urgent, but the prior art has many bottlenecks and struggles to keep up with rapid market change.
The traditional customer acquisition model relies on manual screening of customer leads and markets through telephone promotion, mass e-mail and similar means. This approach is inefficient, and because it lacks accurate customer portraits and demand analysis, it wastes marketing resources and keeps acquisition costs high. Meanwhile, enterprise data is scattered across multiple independent systems: the customer relationship management (CRM) system records basic information and transaction data, the e-commerce platform stores purchasing behavior data, and social media platforms store user interest and social data. These data differ in format and follow no unified standard, and the resulting data silos make it hard for enterprises to integrate multidimensional information or form a complete view of the customer, which limits precision marketing.
In user behavior analysis, existing techniques mostly stay at the level of basic statistics. For example, they can count only surface data such as page visits, click counts and dwell time, and cannot mine the latent needs and behavior logic behind user behavior. Traditional analysis methods handle the complex temporal dependencies in user behavior sequences poorly, so they struggle to predict a user's next action and cannot give enterprises forward-looking decision support. In addition, most recommendation systems use a single recommendation algorithm, such as collaborative filtering or content-based recommendation, and ignore the real-time changes and personalized features of user behavior; the recommendations lack relevance and diversity, which leads to poor user experience and makes it hard to improve user conversion and retention.
Although artificial intelligence has gradually been applied to marketing, significant shortcomings remain. Some AI systems optimize only a single link, such as a standalone customer prediction model or recommendation module; they lack an ecosystem that fuses multiple technologies and cannot make the full chain from data acquisition and analysis to decision execution intelligent. Meanwhile, as data privacy regulations grow stricter, cross-enterprise data collaboration faces the dual risks of legal non-compliance and privacy leakage. The prior art struggles to share the value of multi-party data while guaranteeing data security, which limits the depth and breadth of enterprises' customer insight and hinders the further development of intelligent marketing.
Disclosure of Invention
The invention provides an intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem, which aims to solve the problems in the prior art.
To achieve the above purpose, the invention adopts the following technical solution:
An intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem, comprising:
a multi-source data acquisition module, which deploys distributed acquisition nodes, acquires customer data by communicating with the enterprise CRM and ERP systems through API interfaces, collects user interaction behaviors on mobile terminals through SDK event tracking, and crawls media data in a targeted manner using an incremental crawler;
a data cleaning and preprocessing module, which deduplicates data with a Bloom filter and marks a sample as an outlier when the average path length from the root node to the sample is smaller than the threshold $\mu - 3\sigma$, where $\mu$ is the mean path length and $\sigma$ is the standard deviation; the cleaned data is stored in the Hadoop Distributed File System in Parquet format;
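a user portrait construction module, which vectorizes categorical features with One-Hot encoding, discretizes continuous features by quantiles, and uses the GraphSAGE algorithm to map entities into a 256-dimensional vector space; user similarity is calculated with the formula $\mathrm{sim}(u_i, u_j) = \frac{\mathrm{vec}(u_i) \cdot \mathrm{vec}(u_j)}{\lVert \mathrm{vec}(u_i) \rVert \, \lVert \mathrm{vec}(u_j) \rVert}$, where $u_i$ and $u_j$ are two different users and $\mathrm{vec}(u_i)$, $\mathrm{vec}(u_j)$ are their feature vectors;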
an intelligent customer acquisition module, which uses an XGBoost algorithm to perform preliminary screening of leads based on basic user information, behavior data and external market data, applies transfer learning from the customer features of leading enterprises, and outputs the customer conversion probability through a LightGBM model, $P_{\text{convert}} = f(X_{\text{user}}, W)$, where $X_{\text{user}}$ is a user vector containing 128-dimensional features and $W$ is a weight matrix;
a user behavior analysis module, which segments user behavior sequences with a session-based method and models behavior sequences with a Transformer architecture;
an intelligent recommendation engine module, in which the content-based recommendation part extracts commodity features with a BERT-CNN model, concatenates them with the user portrait features, and feeds the result into a multi-layer perceptron to calculate the matching degree, and the collaborative filtering part introduces a time decay factor, where $t$ is the number of days since the behavior occurred;
a real-time decision and automated marketing module, which builds a framework driven jointly by a rule engine and reinforcement learning, where the rule engine supports rule configuration and the reinforcement learning component has a state space comprising 50-dimensional information and an action space comprising 8 marketing channels;
a data visualization and report generation module, which builds a component library based on ECharts and AntV and provides a drag-and-drop report designer; the user selects metrics and dimensions from a metadata center, and the system automatically generates the SQL query statements;
and a system management and optimization module, which deploys Prometheus + Grafana monitoring, raises an alarm when a metric exceeds its limit for 5 consecutive minutes, and runs an automated A/B testing framework that distributes traffic through Nginx.
Further, the system also comprises a multimodal data fusion module. At the hardware level, a microphone array and a camera module are deployed. For audio data, silent segments are removed by VAD and the result is input to a Wav2Vec 2.0 model to extract 768-dimensional acoustic features; for image data, the face region is detected by the MTCNN algorithm and ResNet50 then extracts 2048-dimensional visual features. A Transformer-based alignment network performs cross-modal alignment with a loss function in which $L_{\text{align}}$ is the cross-modal alignment loss value, $N$ is the number of samples, $i$ is the sample index, $x_i$ is the $i$-th text sample, and $f_{\text{text}}$ and $f_{\text{audio}}$ are the text and speech feature extraction functions, respectively.
Further, the system also comprises a customer churn early-warning module, which builds an ensemble learning early-warning model that fuses three base models (random forest, LightGBM and XGBoost) through the Stacking method and adopts the Cox proportional hazards model from survival analysis, $S(t \mid X) = \exp(-\Lambda(t)\exp(X\beta))$, where $S(t \mid X)$ is the retention probability of a user at time $t$, $\Lambda(t)$ is the baseline hazard function, $X$ is a user vector containing 50-dimensional features, and $\beta$ is the vector of regression coefficients.
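The retention formula above can be evaluated directly once a baseline value and coefficients are available. The following is a minimal sketch in plain Python, not the patented implementation: the function name, the 3-feature example user and the coefficient values are hypothetical, and $\Lambda(t)$ is passed in as a single precomputed number (treated here as the cumulative baseline hazard at time $t$, which is an assumption about how the text uses the symbol).

```python
import math


def retention_probability(baseline_hazard_at_t, features, beta):
    """Evaluate S(t|X) = exp(-Lambda(t) * exp(X . beta)).

    baseline_hazard_at_t: value of Lambda(t) at the time of interest (assumed given).
    features:             user feature vector X (the text uses 50 dimensions).
    beta:                 regression coefficients, same length as features.
    """
    if len(features) != len(beta):
        raise ValueError("features and beta must have the same length")
    linear_predictor = sum(x * b for x, b in zip(features, beta))
    return math.exp(-baseline_hazard_at_t * math.exp(linear_predictor))


# Hypothetical 3-feature user; real inputs would come from the user portrait.
example_user = [1.0, 0.0, 2.0]
example_beta = [0.5, -0.2, 0.1]
probability = retention_probability(0.3, example_user, example_beta)
```

A larger linear predictor $X\beta$ raises the hazard and therefore lowers the retention probability, which is the property an early-warning threshold would rely on.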
Further, in the intelligent customer acquisition module, a horizontal federated learning architecture is built on TensorFlow Federated. Each participant trains a LightGBM model locally on its own customer data and uploads only model gradient parameters to a central server, which updates the global model parameters through a weighted aggregation algorithm. The objective function is $\min_{\theta} \sum_{i=1}^{K} \omega_i L_i(\theta)$, where $K$ is the number of participating enterprises, $\omega_i$ is the share of the total data volume held by the $i$-th enterprise, and $L_i(\theta)$ is the local model loss function of the $i$-th enterprise. The system has a built-in differential privacy mechanism: Laplace noise $\mathrm{Lap}(\Delta f / \epsilon)$ is added before parameters are uploaded, where $\epsilon$ is the privacy budget and $\Delta f$ is the sensitivity.
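The two steps in this paragraph, perturbing each local update with Laplace noise and then aggregating updates with weights proportional to data volume, can be sketched in plain Python. This is an illustration under stated assumptions, not TensorFlow Federated or LightGBM code: updates are flat lists of floats, the function names are hypothetical, the weights $\omega_i$ are derived from sample counts, and the noise scale is taken as $\Delta f / \epsilon$ per coordinate.

```python
import random


def laplace_noise(sensitivity, epsilon, rng):
    """Draw one Laplace(0, sensitivity / epsilon) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    scale = sensitivity / epsilon
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(update, sensitivity, epsilon, rng):
    """Add independent Laplace noise to every coordinate before upload."""
    return [value + laplace_noise(sensitivity, epsilon, rng) for value in update]


def weighted_aggregate(updates, sample_counts):
    """Combine participant updates with weights omega_i = n_i / sum(n)."""
    total = sum(sample_counts)
    weights = [count / total for count in sample_counts]
    dimension = len(updates[0])
    return [
        sum(weight * update[d] for weight, update in zip(weights, updates))
        for d in range(dimension)
    ]


# Hypothetical round with two enterprises holding 1,000 and 3,000 records.
rng = random.Random(42)
local_updates = [[0.10, -0.20], [0.30, 0.40]]
uploaded = [privatize(u, sensitivity=0.01, epsilon=1.0, rng=rng) for u in local_updates]
global_update = weighted_aggregate(uploaded, sample_counts=[1000, 3000])
```

A smaller privacy budget $\epsilon$ widens the noise distribution, so the choice of $\epsilon$ trades privacy against the accuracy of the aggregated model.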
Further, in the user behavior analysis module, the behavior path conversion rate is calculated as $N_{\text{end}} / N_{\text{start}}$, where $N_{\text{start}}$ is the number of starting behaviors and $N_{\text{end}}$ is the number of ending behaviors. A user behavior causal graph is built on Do-Calculus theory, and the back-door adjustment formula $ACE = \sum_{w} P(Y \mid do(X = x), W = w)\,P(W = w) - \sum_{w} P(Y \mid do(X = x'), W = w)\,P(W = w)$
calculates the average causal effect of a behavioral intervention, where $ACE$ is the average causal effect, $X$ is the intervention variable, $Y$ is the outcome variable, and $W$ is the set of confounding variables. The system automatically detects back-door paths in the causal graph and, when unobserved confounders exist, removes bias with propensity score matching; the analysis results guide product iteration.
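As a worked illustration of the back-door adjustment, the sketch below computes the average causal effect from a small table of conditional probabilities, and also includes the path conversion rate taken as the ratio of ending to starting behaviors (an assumed reading of the conversion formula). All names and numbers are hypothetical: `p_y_given_xw` stands for $P(Y = 1 \mid X = x, W = w)$ estimated within each confounder stratum, and `p_w` for $P(W = w)$.

```python
def path_conversion_rate(n_start, n_end):
    """Share of starting behaviors that reach the end of the path."""
    return n_end / n_start if n_start else 0.0


def adjusted_outcome(p_y_given_xw, p_w, x):
    """Sum over strata w of P(Y | X=x, W=w) * P(W=w)."""
    return sum(p_y_given_xw[(x, w)] * weight for w, weight in p_w.items())


def average_causal_effect(p_y_given_xw, p_w, x, x_prime):
    """Difference between the adjusted outcomes under X=x and X=x'."""
    return adjusted_outcome(p_y_given_xw, p_w, x) - adjusted_outcome(
        p_y_given_xw, p_w, x_prime
    )


# Hypothetical data: X = 1 means a coupon was shown, W is the user segment.
p_w = {"new": 0.4, "returning": 0.6}
p_y_given_xw = {
    (1, "new"): 0.30,
    (0, "new"): 0.10,
    (1, "returning"): 0.50,
    (0, "returning"): 0.40,
}
ace = average_causal_effect(p_y_given_xw, p_w, x=1, x_prime=0)
# (0.30*0.4 + 0.50*0.6) - (0.10*0.4 + 0.40*0.6) = 0.42 - 0.28 = 0.14
```

Weighting each stratum by $P(W = w)$ rather than by how often the intervention occurred in that stratum is what removes the confounding by user segment in this example.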
Further, in the intelligent recommendation engine module, the weights of the two recommendation results are adjusted through an attention mechanism, in which $h_i$ is a feature vector, $W_a$ is a 128×128 learnable matrix, and $n$ is the total number of feature vectors. A knowledge graph containing entities is built on Neo4j, and node embeddings are learned with the graph neural network GraphSAGE. The recommendation model feeds the knowledge graph embedding vector $\mathrm{vec}(G)$, the user behavior sequence vector $\mathrm{vec}(u)$ and the commodity feature vector $\mathrm{vec}(i)$ into a multi-layer perceptron and outputs the recommendation score $r_{u,i} = f(\mathrm{vec}(u), \mathrm{vec}(i), \mathrm{vec}(G))$. The system supports interpretability display, generating recommendation reasons through knowledge graph path search.
Further, the 8 marketing channels are treated as the arms of a multi-armed bandit, and a Thompson sampling algorithm dynamically selects the delivery strategy. The posterior distribution of each arm's conversion rate is modeled as a Beta distribution, $\theta_a \sim \mathrm{Beta}(\alpha_a, \beta_a)$, where $\alpha_a$ is the number of successes plus 1 and $\beta_a$ is the number of failures plus 1. The system updates the posterior parameters once per hour and allocates the marketing budget according to the sampling results.
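A minimal sketch of Thompson sampling over the 8 channels, using only the Python standard library. The class and method names are hypothetical, and the budget rule (splitting the budget in proportion to how often each arm wins a posterior draw) is one possible reading of "according to the sampling results", not something the text specifies.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1] * n_arms  # successes + 1
        self.beta = [1] * n_arms   # failures + 1
        self.rng = random.Random(seed)

    def select(self):
        """Draw theta_a ~ Beta(alpha_a, beta_a) per arm and pick the largest."""
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, converted):
        """Record one observed outcome for the given arm."""
        if converted:
            self.alpha[arm] += 1
        else:
            self.beta[arm] += 1

    def allocate(self, budget, n_draws=1000):
        """Split the budget by each arm's share of winning posterior draws
        (an assumed allocation rule)."""
        wins = [0] * len(self.alpha)
        for _ in range(n_draws):
            wins[self.select()] += 1
        return [budget * w / n_draws for w in wins]


# Hypothetical hourly cycle: feed in observed conversions, then re-allocate.
sampler = ThompsonSampler(n_arms=8, seed=7)
sampler.update(arm=2, converted=True)
sampler.update(arm=5, converted=False)
hourly_budget = sampler.allocate(budget=10_000.0)
```

Because arms with little data have wide posteriors, they still win some draws, so the allocation keeps exploring under-tested channels while shifting most of the budget to channels with high observed conversion.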
Further, in the data visualization and report generation module, analysis reports are generated automatically using natural language generation: a pre-trained T5-Base model is fine-tuned on a data set, the system encodes data metrics, dimension information and business rules into an input sequence, and natural language text is generated through beam search. A template fusion mechanism combines the generated text with preset report templates and supports multilingual output.
Further, an automated operation and maintenance system is built in the system management and optimization module. Prometheus collects system hardware metrics, software metrics and business metrics, and Grafana provides visual monitoring. An LSTM-Seq2Seq model predicts system load: it takes minute-level metric data from the past 7 days as input and predicts the load for the next hour. For model optimization, a model monitoring module detects data drift and concept drift, and when drift is detected, new data is automatically extracted from the data lake for incremental training.
Furthermore, in the data acquisition stage, differential privacy is applied by adding Laplace noise to user behavior data, with the noise intensity $\epsilon$ adjusted according to data sensitivity; in the data storage stage, sensitive data is encrypted with homomorphic encryption; in the model training stage, joint modeling is performed under a secure multi-party computation protocol; and the system uses a federated identity authentication and access control mechanism that dynamically assigns permissions based on user roles, data labels and operation scenarios.
Compared with the prior art, the invention has the beneficial effects that:
In data processing, the system achieves efficient collection and deep integration of multi-source heterogeneous data and breaks down data silos. A unified data cleaning and preprocessing flow improves data quality and lays a solid foundation for subsequent analysis. Dynamic user portraits built on this high-quality data characterize users across more than 300 dimensions, enabling enterprises to understand user needs and preferences in depth.
In intelligent customer acquisition, the system uses algorithms such as transfer learning and ensemble learning to identify high-potential customers accurately, replacing the traditional, coarse customer acquisition model. By accurately predicting customer conversion probability, enterprises can concentrate marketing resources on the most valuable customer groups, which significantly improves customer acquisition efficiency and reduces marketing cost. Meanwhile, personalized marketing outreach strategies effectively raise customer attention and response rates.
With the Transformer architecture and causal inference techniques, the user behavior analysis module can process large-scale time-series behavior data and also mine causal relationships among user behaviors, helping enterprises grasp how user needs evolve. Enterprises can then optimize product design, improve user experience and adjust marketing strategies in a targeted way, increasing user satisfaction and loyalty.
The intelligent recommendation engine fuses multiple recommendation algorithms with knowledge graph technology and provides diverse, interpretable recommendations based on users' real-time behaviors and personalized features. This improves user engagement and experience, effectively promotes user conversion, and increases enterprise sales and profit.
The real-time decision and automated marketing module achieves intelligent triggering of marketing activities and dynamic optimization of strategies. The system automatically adjusts marketing channels, content and timing as user states and the market environment change, maximizing marketing return on investment (ROI). The application of privacy computing and federated learning ensures that data can be shared across enterprises in a secure and compliant manner, expanding the depth and breadth of enterprises' customer insight.
In addition, the data visualization and automatic operation and maintenance functions of the system reduce the use threshold and maintenance cost, so that enterprises can monitor marketing effects more conveniently and optimize system performance. The technology provides a full-link intelligent solution from data acquisition and analysis to decision execution for enterprises, and comprehensively improves the marketing competitiveness and market response capability of the enterprises.
Drawings
FIG. 1 is a schematic block diagram of the intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem;
FIG. 2 compares the effect of conventional customer acquisition with that of the system's intelligent customer acquisition;
FIG. 3 compares data processing throughput and model inference latency before and after optimization;
FIG. 4 compares the marketing channel ROI under dynamic optimization by the multi-armed bandit algorithm with that under a fixed budget strategy.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that terms such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise" indicate orientations or positional relationships based on those shown in the drawings. They are used merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be configured and operated in a specific orientation; they should therefore not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise. The terms "mounted," "connected," "coupled," and "connected" are used in a broad sense, and may be, for example, fixedly connected, detachably connected, or integrally connected, mechanically connected, electrically connected, directly connected, or indirectly connected via an intermediate medium, or may be in communication with the interior of two elements. The specific meaning of the terms in the present invention will be understood by those skilled in the art in detail, and the present invention will be further described in detail with reference to the accompanying drawings.
Referring to FIGS. 1 to 4, an intelligent customer acquisition and user behavior analysis system based on an AI full ecosystem comprises the following modules:
Multi-source data acquisition module: the system implements multi-channel data acquisition through a distributed acquisition architecture. For enterprise internal systems, a RESTful API interface connects to the CRM system and synchronizes structured data such as basic customer information and transaction records hourly, and an ETL tool (such as Apache NiFi) extracts associated data such as orders and inventory from the ERP system. For mobile applications, a self-developed embedded SDK collects user behavior data covering 23 kinds of interaction behavior, such as click coordinates, swipe trajectories and page jump paths; data is reported in real time and follows the JSON standard. For external data, a distributed crawler cluster is deployed, with targeted crawlers developed on the Scrapy framework; a User-Agent pool and an IP proxy pool are configured to bypass website anti-crawling mechanisms, and unstructured data such as e-commerce product reviews and social media topic discussions are crawled automatically in the early morning every day. The system builds a message queue (Kafka) on an Alibaba Cloud ECS server cluster as the buffer layer for data acquisition; each Topic corresponds to one data type, and the partition count is set to 8 to meet the high-throughput requirement. Acquisition nodes push data to Kafka in real time, and when the data flow of a single node exceeds 5 MB/s, a load balancer (Nginx) automatically diverts acquisition tasks to a standby node, ensuring that the system stably processes a data flow of 120,000 records per second.
Data cleaning and preprocessing module: data cleaning adopts a stream processing architecture, with the processing pipeline built on Apache Flink. First, the data is deduplicated with a Bloom filter whose number of hash functions is set to 6 and whose bit array size is 1024 KB. For numerical data (such as transaction amount and dwell time), anomaly detection uses the isolation forest algorithm with 100 trees and a sub-sample size of 256; when the average path length from the root node to a sample is smaller than $\mu - 3\sigma$, the sample is marked as an outlier and replaced with a normal value calculated by the box plot method. For missing values, categorical data (such as gender and region) is filled with the mode computed by Spark SQL, and time series data (such as access time sequences) is predicted and filled by an LSTM-AE model whose input layer has 128 neurons and whose hidden part has 2 LSTM layers (64 units per layer); training uses a batch size of 32, a learning rate of 0.001 and 100 epochs. The cleaned data is converted to Parquet format and stored in the Hadoop Distributed File System (HDFS), and a three-level layered architecture is built in the Hive data warehouse: the original data layer (ODS) retains the complete raw data, the cleaning layer (DWD) stores the cleaned data, and the light summary layer (DWS) aggregates statistical metrics by day, week and month.
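The deduplication step can be illustrated with a small Bloom filter that uses the parameters stated above (6 hash functions, a 1024 KB bit array). This is a sketch under assumptions rather than the Flink pipeline itself: records are assumed to be strings, the 6 bit positions are derived from one SHA-256 digest by double hashing, and the class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter with k bit positions per item."""

    def __init__(self, size_bytes=1024 * 1024, num_hashes=6):
        self.bits = bytearray(size_bytes)   # 1024 KB bit array
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Double hashing: position_i = (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def deduplicate(records):
    """Keep the first occurrence of each record, in arrival order."""
    seen = BloomFilter()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


# Hypothetical event keys as they might arrive from the acquisition layer.
events = ["u1:click:home", "u2:view:item9", "u1:click:home"]
unique_events = deduplicate(events)
```

A Bloom filter never misses a true duplicate but can occasionally report a new record as already seen, so a small fraction of distinct records may be dropped as the filter fills; the bit array size and hash count control that rate.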
User portrait construction module: in the feature engineering stage, data is processed with Python's Scikit-learn library. Categorical features are converted into sparse vectors by One-Hot encoding, and continuous features are discretized into 5 intervals by quantile binning. A GraphSAGE model is built on the DGL (Deep Graph Library) framework for node embedding learning, mapping entities such as users, products, and brands to a 256-dimensional vector space. During training, the number of sampled neighbors is set to 10, the aggregation function uses mean pooling, the optimizer is Adam with a learning rate of 0.002, and the training cycle is 7 days.
User similarity is calculated by combining cosine similarity and the Jaccard coefficient. The cosine similarity $\mathrm{sim}(u_i, u_j) = \frac{vec(u_i) \cdot vec(u_j)}{\|vec(u_i)\| \, \|vec(u_j)\|}$ measures the distance between users in the vector space, where $u_i$ and $u_j$ are two different users and $vec(u_i)$, $vec(u_j)$ are the user feature vectors mapped to the 256-dimensional vector space by the GraphSAGE algorithm. The Jaccard coefficient measures the overlap of user tags: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$, where A and B are the tag sets of users $u_i$ and $u_j$, respectively. The portrait tag system contains 326 dimensions; dynamic tags (e.g., real-time interest preferences) are updated every 15 minutes by Flink stream computation, and static tags (e.g., demographic information) are automatically calibrated by a batch task in the early morning of the 1st day of each month.
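As a rough sketch of the two similarity measures, the functions below compute cosine similarity on embedding vectors and the Jaccard coefficient on tag sets. The text does not state how the two scores are combined, so `combined_similarity` and its `weight` parameter are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(tags_a, tags_b):
    """|A ∩ B| / |A ∪ B| for two tag collections."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(vec_i, vec_j, tags_i, tags_j, weight=0.5):
    # Hypothetical linear blend; the source only says the two
    # measures are used "in combination".
    return (weight * cosine_similarity(vec_i, vec_j)
            + (1 - weight) * jaccard(tags_i, tags_j))
```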
Intelligent customer acquisition module: a three-level potential customer screening system is constructed. In the first level, an XGBoost model filters the massive pool of leads; the input features include basic user information (12 dimensions, such as age and gender), historical behavior data (18 dimensions, such as visit frequency and dwell time), and external market data (10 dimensions, such as industry growth rate and competitor campaign intensity). The model is trained with 500 trees, a learning rate of 0.05, and a maximum depth of 6, with hyperparameters tuned by 5-fold cross-validation; leads scoring in the top 30% proceed to the next round.
The second level performs transfer learning with a Domain-Adversarial Neural Network (DANN), transferring the high-value customer characteristics of leading enterprises to the target customer group. The source domain contains 100,000 high-value customer samples, and the target domain consists of the 500,000 leads remaining after the first screening. The discriminator is a 3-layer fully connected network (128 neurons per layer), the adversarial loss weight is 0.5 during training, and the top 20% of leads by similarity to the source-domain features are retained.
The third level uses a LightGBM model to output the customer conversion probability $P_{convert} = f(X_{user}, W)$, where $X_{user}$ is the user vector containing 128-dimensional features and W is the weight matrix learned by the model. The model uses the GOSS (Gradient-based One-Side Sampling) algorithm to handle sample imbalance, with a learning rate of 0.1, 300 trees, and a maximum depth of 8; customers with a predicted probability greater than 0.6 are defined as high-potential customers.
For high-potential customers, the system calls the FreeMarker template engine to generate personalized marketing copy; the templates contain dynamic variables (such as the user name and the names of browsed products). The marketing channel is selected by a multi-armed bandit algorithm (see the real-time decision module for details), the sending time is set to the optimal time period determined by historical data analysis (for example, pushing app messages between 8 and 10 p.m. on weekdays), and the combined effect of different copy and channels is compared through A/B testing.
User behavior analysis module: a real-time behavior analysis engine is built on the Flink stream computing framework. User behavior sequences are divided by a session-based method, and a session is considered ended when the user performs no operation for 30 minutes. Behavior sequences are modeled with a Transformer architecture with a 12-layer encoder, 8 attention heads, and an embedding dimension of 512. To calculate the behavior path conversion rate, user behavior trajectories are constructed as a directed acyclic graph (DAG), and the conversion rate is computed as $CR = \frac{N_{end}}{N_{start}}$, where $N_{start}$ is the number of start behaviors and $N_{end}$ is the number of end behaviors; both are counted in real time by a Redis counting service.
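The session rule and the conversion-rate ratio can be sketched in a few lines. This is an offline illustration rather than a Flink job: the function names are hypothetical, and the conversion rate is assumed to be the ratio of end-behavior count to start-behavior count, which is what the definitions of N_start and N_end suggest.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # inactivity gap from the text


def split_sessions(events, timeout=SESSION_TIMEOUT):
    """Split one user's (timestamp, action) events into sessions.

    A new session starts whenever the gap since the previous event
    is at least `timeout`.
    """
    sessions = []
    current = []
    last_ts = None
    for ts, action in sorted(events):
        if last_ts is not None and ts - last_ts >= timeout:
            sessions.append(current)
            current = []
        current.append((ts, action))
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


def path_conversion_rate(n_start, n_end):
    """Assumed form of the path conversion rate: N_end / N_start."""
    return n_end / n_start if n_start else 0.0
```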
Specifically, the real-time behavior analysis platform is built on Apache Flink, user behavior sequences are divided by the session-based method, and the session timeout is set to 30 minutes. Behavior sequences are modeled with a Transformer architecture containing a 12-layer encoder with 8 attention heads per layer and an embedding dimension of 512. Input data enters the model after positional encoding, given by $PE_{(pos, 2i)} = \sin\left(pos / 10000^{2i/d_{model}}\right)$ and $PE_{(pos, 2i+1)} = \cos\left(pos / 10000^{2i/d_{model}}\right)$, where pos is the position, i is the dimension index, and $d_{model}$ is the embedding dimension. The behavior path conversion rate is calculated over the directed acyclic graph (DAG), and the number of users at each node is counted in real time with the HyperLogLog data structure of Redis. The Shapley value method is introduced for attribution analysis, quantifying the contribution of each behavior node to the final conversion. When the contribution of a node exceeds 30%, the system automatically generates optimization suggestions; for example, if the churn rate from "add to cart" to "checkout" is found to be high, it suggests optimizing the payment flow and reducing form-filling steps.
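The positional encoding can be sketched as follows, assuming the standard sinusoidal encoding of the original Transformer, which matches the symbols pos, i, and d_model defined in the text. The function name and the plain-list representation are illustrative choices.

```python
import math


def positional_encoding(max_len, d_model=512):
    """Return a max_len x d_model table of sinusoidal position codes.

    Even columns hold sin(pos / 10000^(2i/d_model)) and odd columns
    hold the matching cosine, following the standard Transformer form.
    """
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for col in range(0, d_model, 2):  # col plays the role of 2i
            angle = pos / (10000 ** (col / d_model))
            table[pos][col] = math.sin(angle)
            if col + 1 < d_model:
                table[pos][col + 1] = math.cos(angle)
    return table
```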
Intelligent recommendation engine module: the recommendation system adopts a hybrid architecture. The content-based part uses a BERT-CNN model. The BERT pre-trained model is Google's open-source Chinese model; in the downstream task the first 6 layers are frozen and the last 6 layers participate in fine-tuning. The CNN part contains 3 convolutional layers (kernel sizes 3, 4, and 5, each with 128 channels), and the product text description is concatenated with the user portrait features and fed into the model to calculate the matching degree. The collaborative filtering part uses a matrix factorization (MF) algorithm with a time decay factor (where T is the number of days since the behavior occurred) to reduce the influence of old behaviors; the model is trained by stochastic gradient descent (SGD) with a learning rate of 0.01 for 100 epochs.
The two recommendation results are fused through an attention mechanism, where $h_i$ is a feature vector and $W_a$ is a 128 × 128 learnable matrix; the parameters are optimized during training with a cross-entropy loss function. The recommendation results are then processed by a cold-start strategy: new users receive popularity-based recommendations (ranked by product sales), and new products receive semantic recommendations based on the knowledge graph (finding users associated with similar products). The final recommendation list is returned through a Redis cache service with a cache validity period of 10 minutes, and the QPS can reach 5,000 or more.
Real-time decision and automated marketing module: a rule engine is developed on the Drools framework and supports a visual rule configuration interface. Rule types include time triggers (e.g., automatically sending coupons on holidays), event triggers (e.g., pushing a reminder when a user has not paid 1 hour after adding an item to the cart), and condition triggers (e.g., awarding bonus points when spending reaches 500 yuan). Rule matching uses the Rete algorithm, and the system can process 100,000 rule-matching requests per second.
The reinforcement learning part optimizes marketing decisions with a Deep Q-Network (DQN). The state space contains 70-dimensional information, including user portrait features (50 dimensions), marketing campaign history data (20 dimensions), and the current time. The action space contains 8 marketing channels such as SMS, email, and app push. The reward function is $R_t = \alpha C_t + \beta R'_t - \gamma C_{cost}$, with α = 0.4 (instant conversion weight), β = 0.3 (long-term value weight), and γ = 0.3 (cost weight), where $R_t$ is the reward at time t, $C_t$ is the instant conversion rate at time t, $R'_t$ is the long-term value at time t, and $C_{cost}$ is the marketing cost at time t. The model uses an experience replay mechanism with a replay buffer size of 100,000, and the target network is updated every 100 steps.
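The reward function and the replay buffer can be sketched as below. The weights and buffer capacity come from the text; the names `reward` and `ReplayBuffer`, and the assumption that all three reward inputs are already scaled to comparable ranges, are illustrative.

```python
import random
from collections import deque


def reward(instant_conversion, long_term_value, cost,
           alpha=0.4, beta=0.3, gamma=0.3):
    """R_t = alpha * C_t + beta * R'_t - gamma * C_cost."""
    return alpha * instant_conversion + beta * long_term_value - gamma * cost


class ReplayBuffer:
    """Fixed-capacity experience store; the oldest entries drop first."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward_value, next_state):
        self.buffer.append((state, action, reward_value, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at buffer size.
        items = list(self.buffer)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.buffer)
```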
The marketing calendar module supports configuring campaigns up to 6 months in advance, with execution flows scheduled by the Airflow task scheduler. When a campaign's ROI is 15% below expectation, the system automatically triggers policy adjustments, for example reducing the budget of inefficient channels, increasing exposure on high-conversion channels, or adjusting the copy and finding the optimal version through A/B testing.
Data visualization and report generation module: visualization components are developed on ECharts 5.0 and AntV G, providing 28 chart types. The Sankey diagram displays user behavior paths, with node sizes adjusted automatically according to the number of users; the heat map displays click distribution by region and time; the funnel chart supports user-defined conversion steps and calculates conversion rates in real time. The report designer uses drag-and-drop interaction: after a user selects indicators (such as GMV, UV, and conversion rate) and dimensions (time, region, and channel) from the metadata center, the system automatically generates SQL queries against a ClickHouse columnar database, and the response time of a complex report (containing more than 10 aggregation calculations) is kept within 3 seconds.
The natural language generation (NLG) module fine-tunes the T5-Base model on a dataset containing 100,000 enterprise analysis reports. The input data contains indicator values, year-over-year and period-over-period changes, dimension information, and so on, and text is generated by beam search (beam size = 3). A template fusion mechanism is introduced, for example "This month, the ${indicator} of ${region} is ${value}, up ${percentage} year over year, mainly due to ${reason}", in which the variables are filled automatically from the data. Output in Chinese, English, and Japanese is supported, using machine translation through the Google Translate API followed by manual review.
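The template-filling step can be illustrated with Python's `string.Template`, whose `${name}` placeholders resemble the template in the text. The placeholder names are English stand-ins for the translated variables, and the sample row (including the GMV value) is invented purely for illustration.

```python
from string import Template

# Placeholder names are assumptions based on the translated template.
REPORT_TEMPLATE = Template(
    "This month, the ${indicator} of ${region} is ${value}, "
    "up ${percentage} year over year, mainly due to ${reason}."
)


def render_report(row):
    """Fill the report template from one row of indicator data.

    Raises KeyError if any placeholder is missing from `row`.
    """
    return REPORT_TEMPLATE.substitute(row)


# Illustrative row; the values are made up for this sketch.
example_row = {
    "indicator": "GMV",
    "region": "East China",
    "value": "1.2 million yuan",
    "percentage": "30%",
    "reason": "newly opened online channels",
}
```

Calling `render_report(example_row)` produces one sentence of the report; in the described system the model-generated text would be merged with such template output.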
System management and optimization module: a monitoring system is built on Prometheus + Grafana, and the collected metrics fall into three categories:
Hardware metrics: CPU utilization, memory bandwidth, and disk I/O;
Software metrics: interface QPS, model inference latency, and number of database connections;
Business metrics: daily active users, conversion rate, and GMV.
Three-level alarm rules are set: an early-warning SMS is sent when a metric exceeds 80% of its threshold, an email notification is triggered when it exceeds 90%, and when it exceeds 95% non-critical tasks are automatically suspended and operation and maintenance personnel are notified by telephone.
Automated A/B testing distributes traffic through the Nginx reverse proxy; by default, 90% of traffic goes to the baseline version and 10% to the experimental version. The experimental effect is evaluated with Bayesian statistical methods, and traffic is automatically switched to the new scheme when the confidence that it improves the target metric exceeds 95%. For model optimization, data drift is detected by calculating the Kolmogorov-Smirnov (KS) distance between the training data and the real-time data; when the KS value exceeds 0.2, new data from the past 7 days is extracted from HDFS for incremental training. Training uses a hybrid strategy of model parallelism (split by layer) and data parallelism (batch size set to 128), and on an 8-card NVIDIA A100 GPU cluster the incremental training time of the LightGBM model is shortened from 4 hours to 1.5 hours.
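The drift check can be sketched with a two-sample KS statistic computed from empirical distribution functions. The 0.2 threshold is from the text; the function names and the single-feature setting are assumptions for illustration.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS distance: max gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    distance = 0.0
    # The supremum is attained at one of the observed data points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def needs_retraining(train_values, live_values, threshold=0.2):
    """Flag drift when the KS distance exceeds the threshold."""
    return ks_statistic(train_values, live_values) > threshold
```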
The invention also comprises a multimodal data fusion module. At the hardware layer, a Rockchip RK3588 development board integrates a microphone array (6 microphones) and a camera module (4K resolution). Voice data is acquired at an 8 kHz sampling rate with 16-bit quantization; after silent segments are removed by a VAD algorithm, the audio is fed into a Wav2Vec 2.0 model to extract 768-dimensional acoustic features. For image data, face regions are detected by the MTCNN algorithm and 2048-dimensional visual features are extracted by ResNet.
A Transformer-based alignment network is used for cross-modal alignment. In its loss function, $f_{text}$ is the text feature extraction function, $f_{audio}$ is the speech feature extraction function, i is the sample index, N is the total number of samples involved in the calculation, $x_i$ is the i-th text data sample, and $y_i$ is the i-th speech data sample. Training uses a batch size of 16 and a learning rate of 0.0001 for 50 epochs, and maps the multimodal features to a 1024-dimensional unified semantic space. The fused data is used to enhance the user portrait; for example, the user's real attitude is judged by analyzing the consistency between voice intonation and the sentiment of the user's text comments, and if the intonation is angry but the text is neutral, the user's emotion tag is set to "potentially dissatisfied".
The invention further comprises a customer churn early-warning module. The module adopts an ensemble learning framework that fuses three base models, Random Forest (RF), LightGBM, and XGBoost, by the Stacking method: the first-layer models are trained individually, and the second layer trains a logistic regression on the first-layer outputs as features. The input features comprise 68 dimensions, including RFM metrics (recency of last purchase, purchase frequency, and purchase amount), service satisfaction score, and number of complaints. The churn probability is calculated with the Cox proportional hazards model from survival analysis, $S(t \mid X) = \exp(-\Lambda(t) \exp(X\beta))$, where $S(t \mid X)$ is the probability that the user is retained at time t, $\Lambda(t)$ is the baseline hazard function, X is the user vector containing 68-dimensional features, and β is the vector of regression coefficients. Model parameters are solved by maximum likelihood estimation using Python's lifelines library. When the predicted churn probability exceeds the threshold of 0.6, the system automatically triggers a three-level retention strategy:
Primary: send a personalized coupon of 20 yuan off purchases of 100 yuan or more;
Intermediate: arrange a follow-up call from a dedicated customer service agent within 24 hours;
Advanced: provide a free membership-level upgrade.
In the intelligent customer acquisition module, a horizontal federated learning framework is built on TensorFlow Federated (TFF), supporting up to 50 enterprises participating in data collaboration. Each enterprise trains the LightGBM model locally on its own customer data, with training parameters consistent with the intelligent customer acquisition module. The central server receives the model gradient parameters uploaded by each enterprise and updates the global model through a weighted aggregation algorithm, $L(\theta) = \sum_{i=1}^{K} \omega_i L_i(\theta)$, where K is the number of participating enterprises, $\omega_i$ is the share of the i-th enterprise's data volume, and $L_i(\theta)$ is the local model loss function of the i-th enterprise. The system has a built-in differential privacy mechanism: Laplace noise $\mathrm{Lap}(\Delta f / \epsilon)$ is added before parameters are uploaded, where ε is the privacy budget (default 0.5) and Δf is the sensitivity (set to 1). After each round of training, the central server verifies the validity of each enterprise's parameter update through a secure multi-party computation (MPC) protocol to prevent malicious attacks.
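The upload-and-aggregate round can be sketched without any federated learning library. Each participant perturbs its update with Laplace noise of scale Δf/ε, and the server averages the updates weighted by data volume. The function names, the plain-list parameter vectors, and the use of a difference of two exponential draws to sample the Laplace distribution are assumptions of this sketch; it does not use the TensorFlow Federated API.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(update, epsilon=0.5, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon per coordinate."""
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon
    return [g + laplace_noise(scale, rng) for g in update]


def aggregate(updates, data_sizes):
    """Weighted average of updates; weights are data-volume shares."""
    total = sum(data_sizes)
    weights = [n / total for n in data_sizes]
    dim = len(updates[0])
    return [
        sum(w * u[d] for w, u in zip(weights, updates))
        for d in range(dim)
    ]
```

With ε = 0.5 and Δf = 1 the noise scale is 2, so individual noisy updates are only useful after averaging across many participants or rounds.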
In the invention, causal inference is introduced into the user behavior analysis module to identify the true causal relationships in user behavior. A causal graph of user behavior is built on the Do-Calculus theory, and the average causal effect of a behavioral intervention is calculated through the backdoor adjustment formula $ACE = \sum_{w} P(Y \mid do(X = x), W = w) P(W = w) - \sum_{w} P(Y \mid do(X = x'), W = w) P(W = w)$, where ACE is the average causal effect, X is the intervention variable (such as a page redesign), x and x' are different values of the intervention variable X, Y is the outcome variable (such as conversion rate), W is the set of confounding variables, and w is a value of the confounding variables W.
The system automatically detects backdoor paths in the causal graph and, when unblocked confounding factors exist, eliminates bias by propensity score matching (PSM). For example, when analyzing the influence of a page redesign on conversion rate, if user activity is found to be a confounding factor, PSM is used to match users with similar activity levels in the redesigned and non-redesigned groups, ensuring the reliability of the result. The analysis results guide product iteration, for example rolling out to other pages a change in the position of a button that improved the conversion rate by 18%.
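The backdoor adjustment can be illustrated on tabular data. The sketch assumes a binary outcome, a single discrete confounder W that satisfies the backdoor criterion, and that the interventional probabilities can be estimated by within-stratum frequencies; the function name and record layout are hypothetical, and this is stratification rather than propensity score matching.

```python
from collections import defaultdict


def backdoor_ace(records, x_treated=1, x_control=0):
    """Estimate the average causal effect by stratifying on W.

    records: iterable of (x, w, y) tuples with y in {0, 1}.
    Returns sum over w of [P(Y=1 | x, w) - P(Y=1 | x', w)] * P(w).
    """
    records = list(records)
    n = len(records)
    w_count = defaultdict(int)
    cell_n = defaultdict(int)
    cell_y = defaultdict(float)
    for x, w, y in records:
        w_count[w] += 1
        cell_n[(x, w)] += 1
        cell_y[(x, w)] += y
    ace = 0.0
    for w, count in w_count.items():
        # Every stratum needs both treated and control observations.
        if cell_n[(x_treated, w)] == 0 or cell_n[(x_control, w)] == 0:
            raise ValueError(f"no overlap in stratum w={w!r}")
        p_treated = cell_y[(x_treated, w)] / cell_n[(x_treated, w)]
        p_control = cell_y[(x_control, w)] / cell_n[(x_control, w)]
        ace += (p_treated - p_control) * count / n
    return ace
```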
In the invention, the intelligent recommendation engine module uses a knowledge graph to enhance recommendation. The knowledge graph is built on Neo4j 5.0 and contains entities such as users, products, brands, categories, and attributes, with 12 million nodes and 58 million relationship edges in total. Node embeddings are learned by the graph neural network GraphSAGE with a two-layer aggregator and neighbor sampling numbers of 15 and 10, respectively; training uses the Adam optimizer with a learning rate of 0.003, and the training cycle is 10 days.
The recommendation model feeds the knowledge graph embedding vector vec(G), the user behavior sequence vector vec(u), and the product feature vector vec(i) into a three-layer fully connected network (256 neurons per layer) and outputs the recommendation score $r_{u,i} = f(vec(u), vec(i), vec(G))$. The system supports explainable display: it searches for the recommendation path in the knowledge graph with a Cypher statement, for example "MATCH (u:User)-[:BOUGHT]->(p1:Product)-[:BELONGS_TO]->(c:Category)<-[:BELONGS_TO]-(p2:Product) WHERE u.id = {userId} RETURN p2", and generates a recommendation reason such as "Because you purchased Brand A sports shoes, we recommend a new Brand B model in the same category; both belong to the breathable, shock-absorbing series."
In the invention, the real-time decision and automated marketing module uses a multi-armed bandit algorithm to optimize marketing resource allocation. The 8 marketing channels are treated as the "arms" of the bandit, and the Thompson sampling algorithm dynamically selects the delivery strategy. The posterior distribution of the conversion rate of each arm is modeled as a Beta distribution $\theta_a \sim \mathrm{Beta}(\alpha_a, \beta_a)$, where $\alpha_a$ is the number of successes plus 1 and $\beta_a$ is the number of failures plus 1. Initially, $\alpha_a$ and $\beta_a$ of each arm are set to 1, representing a uniform prior distribution. The system updates the posterior distribution parameters once per hour and allocates the marketing budget according to the sampling results. The specific implementation steps are as follows:
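Sampling stage: for each channel a, sample a conversion rate from its Beta distribution, $\hat{\theta}_a \sim \mathrm{Beta}(\alpha_a, \beta_a)$;
Selection stage: select the channel with the highest sampled conversion rate, $a^* = \arg\max_a \hat{\theta}_a$;
Update stage: execute the marketing campaign on the selected channel; if it succeeds, $\alpha_{a^*} \leftarrow \alpha_{a^*} + 1$, otherwise $\beta_{a^*} \leftarrow \beta_{a^*} + 1$.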
When the sampled score of a channel is 20% lower than the average for 3 consecutive samplings, the budget share of that channel is automatically reduced by 5%. To avoid the cold-start problem, the initial budget of a new channel is set to 2%, and the channel joins normal Thompson sampling after it has been executed 100 times in total. With this algorithm, the system dynamically adapts to changes in channel effectiveness and optimizes resource allocation.
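The sampling, selection, and update stages above can be sketched with the standard library's Beta sampler. The class name `ThompsonSampler` and the channel labels are illustrative, and the budget-reduction and cold-start rules are left out of this minimal version.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over marketing channels."""

    def __init__(self, channels, rng=None):
        self.rng = rng if rng is not None else random.Random()
        # Beta(1, 1) is the uniform prior stated in the text.
        self.alpha = {c: 1 for c in channels}
        self.beta = {c: 1 for c in channels}

    def select(self):
        """Sample a conversion rate per channel and pick the largest."""
        samples = {
            c: self.rng.betavariate(self.alpha[c], self.beta[c])
            for c in self.alpha
        }
        return max(samples, key=samples.get)

    def update(self, channel, converted):
        """Success increments alpha; failure increments beta."""
        if converted:
            self.alpha[channel] += 1
        else:
            self.beta[channel] += 1
```

In use, each round calls `select()`, runs the campaign on the returned channel, and feeds the observed outcome back through `update()`; the text describes doing the parameter update in hourly batches rather than per event.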
In the invention, the data visualization and report generation module automatically generates analysis reports with natural language generation (NLG) technology. A pre-trained T5-Base model is fine-tuned on a dataset containing 100,000 enterprise analysis reports. The system encodes data indicators, dimension information, and business rules into an input sequence and generates natural language text by beam search (beam size = 3). A template fusion mechanism combines the generated text with a preset report template, for example: "By region, East China contributes 45% of GMV, up 30% year over year, mainly due to newly opened online channels." Multi-language output is supported: reports are automatically converted into English, Japanese, and other languages through a translation API to meet the needs of multinational enterprises.
In the invention, an automated operation and maintenance system is built in the system management and optimization module. Hardware, software, and business metrics are collected by Prometheus and visualized through Grafana. An LSTM-Seq2Seq model predicts system load: it takes 7 days of minute-level metric data as input and predicts the load for the next hour, and when the predicted load exceeds 80% of the threshold, Kubernetes is automatically triggered to scale out containers. For model optimization, a model monitoring module detects data drift (using the maximum mean discrepancy (MMD) test) and concept drift (based on changes in the distribution of model prediction probabilities) in real time; when drift is detected, new data is automatically extracted from the data lake for incremental training, and the online model is updated by blue-green deployment to ensure uninterrupted service.
In the invention, the system implements privacy protection measures throughout the data life cycle. In the data acquisition stage, differential privacy is used to add Laplace noise to user behavior data. For numerical data (e.g., dwell time), noise is added as $\tilde{x} = x + N$, $N \sim \mathrm{Lap}(\Delta f / \epsilon)$, where x is the original value, $\tilde{x}$ is the perturbed value, N is the Laplace noise, $\mathrm{Lap}(\cdot)$ is the Laplace distribution, ε is the privacy budget (ε = 0.3 for transaction data, ε = 0.5 for browsing data), and Δf is the sensitivity (set to 1). For categorical data (such as product category), perturbation is performed by a randomized response mechanism: the true value is returned with probability p, and otherwise another category is selected at random.
In the data storage stage, sensitive data is encrypted with homomorphic encryption (the CKKS scheme). For numerical fields (such as transaction amount), leveled homomorphic encryption is used, supporting addition and multiplication on ciphertexts. For example, when calculating the average of encrypted data, the summation and division can be performed directly on the ciphertexts, and the correct result is obtained after decryption.
In the model training stage, the garbled circuit technique from secure multi-party computation (MPC) protocols is used. The participants convert the original data into a Boolean circuit and, by exchanging encrypted circuit gate values, jointly compute the encrypted model parameters. In the federated learning scenario, each participant uploads only encrypted gradient parameters, and the central server decrypts and aggregates them and then distributes the update, ensuring that the original data is not leaked.
The system dynamically allocates permissions according to user roles, data tags, and operating scenarios through an attribute-based access control (ABAC) mechanism. For example, customer service personnel can only view desensitized user consultation records, analysts can access the encrypted original data but only after approval, and administrators can perform full data operations, all of which are audited.
Embodiment 1: E-commerce platform intelligent customer acquisition and repurchase promotion scenario
After a comprehensive e-commerce platform connects to the system, it is applied to new-user acquisition and existing-user repurchase. The multi-source data acquisition module connects to the platform's order system through an API (100,000 orders per day on average), collects click and purchase behaviors in the app through the SDK (3 million interactions per day on average), and uses crawlers to collect competitor prices and user reviews (50,000 items per day on average). The data cleaning module deduplicates with a Bloom filter (duplicates account for 8% of the data processed daily), marks abnormal orders with Isolation Forest (such as abnormal purchases of more than 100,000 yuan in a single order), and fills in missing user region information through IP address mapping (accuracy 92%).
The user portrait construction module generates 326-dimensional tags, in which dynamic tags such as "price sensitive" and "brand loyal" are updated every 15 minutes. The intelligent customer acquisition module transfers the high-value customer characteristics of the leading 3C category through transfer learning, screens out 50,000 potential customers with a conversion probability above 60% using the LightGBM model, and pushes them a personalized coupon of "20% off the first order for new customers"; the preferred outreach channel is app push (verified by the multi-armed bandit algorithm to have the highest ROI).
The user behavior analysis module finds that the conversion rate of the path "homepage search → product details → shopping cart → checkout" is only 12%; causal inference locates "slow loading of the checkout page" as the key bottleneck, and after optimization the path conversion rate rises to 28%. The intelligent recommendation engine fuses BERT-CNN and collaborative filtering to recommend same-brand accessories to users who have browsed mobile phones, raising the recommendation click-through rate by 40%. The real-time decision module automatically triggers spend-threshold discount campaigns during major promotion periods and, combined with reinforcement learning, dynamically adjusts the budget of each channel, reducing new-customer acquisition cost by 35% and raising the repurchase rate of existing customers by 22%.
Embodiment 2: Wealth-management product precision marketing scenario
A bank's wealth-management platform applies the system to optimize the marketing of wealth-management products. The multi-source data acquisition module integrates customer asset data from the CRM system (500,000 users), online banking logs (200,000 operations per day on average), and external financial news browsing records (30,000 items crawled per day). In the data cleaning stage, abnormal transactions marked by Isolation Forest (such as transfers of more than 5 million yuan in a single day) are replaced by the box-plot method, and missing risk assessment data is predicted and filled by the LSTM-AE model (error rate < 5%).
The user portrait construction module extracts core tags such as "risk preference" and "investment horizon", calculates similarity with the GraphSAGE algorithm, and finds that 80% of the users in the "conservative user" cluster follow bond products. The intelligent customer acquisition module uses federated learning to train the model jointly on data from 3 branches and outputs the conversion probability of high-potential customers; a wealth-management product with a 4.2% annualized return is pushed to the top 20% of customers by score, and the outreach time is set to 10:00 on working days (historical data shows that the open rate is highest at this time).
The user behavior analysis module analyzes behavior sequences with a Transformer and finds that the conversion rate of the path "view product description → calculate returns → consult customer service" is 3 times that of other paths; the customer service scripts are therefore optimized to guide users to complete this path, raising the conversion rate by 15%. The intelligent recommendation engine, combined with the knowledge graph, recommends new products managed by the same fund manager to users who have purchased funds, and the recommendation interpretability score rises by 60% (user feedback: "the reasons are clear"). The real-time decision module dynamically allocates SMS and telemarketing resources through Thompson sampling, raising marketing ROI by 28%, and the compliance audit shows that, through differential privacy and homomorphic encryption, the system fully meets data security regulations.
Embodiment 1 (e-commerce platform) effect comparison table
Indicator | Before applying the system | After applying the system | Improvement
New-customer acquisition cost (yuan/person) | 85 | 55 | 35% (reduction)
Existing-user repurchase rate (%) | 18 | 22 | 22%
Recommendation click-through rate (%) | 3.5 | 4.9 | 40%
Critical path conversion rate (%) | 12 | 28 | 133%
Abnormal data processing efficiency (records/second) | 5000 | 12000 | 140%
The table shows that after the e-commerce platform applied the system, its core marketing indicators improved across the board. New-customer acquisition cost fell by 35%, thanks to the intelligent customer acquisition module precisely screening high-potential customers through transfer learning and reducing ineffective marketing spend. The repurchase rate of existing users rose by 22%, reflecting the user behavior analysis module's optimization of the repurchase path; after the loading speed of the checkout page was improved, the critical path conversion rate rose from 12% to 28%. The recommendation click-through rate rose by 40%, demonstrating the effectiveness of the intelligent recommendation engine's fusion of multiple algorithms, with explainable recommendations from the knowledge graph strengthening user trust. Abnormal data processing efficiency more than doubled, reflecting the combined advantage of the data cleaning module's Bloom filter and Isolation Forest algorithm, which lays a high-quality data foundation for subsequent analysis and raises the platform's overall marketing ROI.
Embodiment 2 (wealth-management platform) effect comparison table
This table shows the significant results achieved after the wealth-management platform applied the system. Marketing ROI rose by 28%, attributable to the dynamic allocation of SMS and telephone channel resources by the Thompson sampling algorithm of the real-time decision module. The identification accuracy for high-potential customers reached 82%, because federated learning integrates data from multiple branches and strengthens the model's generalization ability. The recommendation interpretability score rose by 60%, as recommendation reasons generated from the knowledge graph, such as "same fund manager", are easier for users to understand. The compliance audit pass rate was 100%, verifying the effectiveness of differential privacy and homomorphic encryption in meeting financial data security requirements. The customer-service-guided conversion rate rose by 188%, reflecting the user behavior analysis module's precise mining of high-value paths; guiding users to complete conversion with optimized scripts raises the overall efficiency of financial marketing.
Example 3 E-commerce platform intelligent customer acquisition and repurchase promotion scenario
An offline retail chain (20 stores) faced the inefficiency of traditional customer acquisition: it relied on leaflet distribution, in-store promotions, and similar methods, so acquisition cost was high and the conversion rate was below 5%. User behavior data was scattered across the POS system, membership registration, and the mini program and was difficult to integrate and analyze, so consumption preferences could not be understood accurately, promotions were poorly targeted, and the repurchase rate of existing customers remained low.
To solve these problems, the AI-based full-ecology intelligent customer acquisition and user behavior analysis system was applied as follows. Data acquisition and processing: the multi-source data acquisition module connects to the store POS system through an API (30,000 transactions per day), the mini program SDK collects product browsing and coupon redemption behavior (500,000 interactions per day), and an incremental crawler captures promotion information from nearby competitors (20,000 per day). The data cleaning module deduplicates records with a Bloom filter (duplicate redemption records account for 7% of the daily volume), flags abnormal consumption with an isolation forest (for example, a non-group-purchase order for 50 bottles of the same beverage in a single transaction), and fills in missing user age information through consumer clustering (accuracy 89%).
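The Bloom filter deduplication step described above can be sketched as follows. This is a minimal, standard-library illustration rather than the system's implementation; the bit-array size, the number of hash functions, the use of salted SHA-256, and the record fields `user`, `coupon`, and `ts` are all assumptions.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter; sizes below are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )

def deduplicate(records, key_fn):
    """Keep the first record for each key; later matches are treated as duplicates."""
    seen = BloomFilter()
    unique = []
    for rec in records:
        key = key_fn(rec)
        if key in seen:
            continue  # probable duplicate (a Bloom filter can give false positives)
        seen.add(key)
        unique.append(rec)
    return unique

# Hypothetical coupon redemption records; the second repeats the first.
records = [
    {"user": "u1", "coupon": "c9", "ts": 1},
    {"user": "u1", "coupon": "c9", "ts": 1},
    {"user": "u2", "coupon": "c9", "ts": 2},
]
unique = deduplicate(records, lambda r: f'{r["user"]}|{r["coupon"]}|{r["ts"]}')
print(len(unique))
```

A Bloom filter never misses a true duplicate but may occasionally discard a distinct record, so the bit-array size would need to be chosen against the expected daily volume.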
User portrait and customer acquisition: the user portrait construction module extracts 310+ dimension labels, including dynamic labels such as "frequent fresh-food purchases" and "weekend family shopping" (updated every 30 minutes), maps users to a 256-dimensional vector space with the GraphSAGE algorithm, and calculates similarity to segment consumer groups. The intelligent customer acquisition module uses DANN transfer learning to transfer the high-value customer characteristics of the core stores, selects 2 thousand potential customers with a conversion probability above 62% through a LightGBM model, and pushes a "new customers: 50 off on 200" coupon; community WeChat groups are preferred as the outreach channel (a multi-armed bandit algorithm verified that their ROI is the highest).
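The similarity calculation over user embedding vectors might look like the sketch below. The source does not reproduce its similarity formula, so cosine similarity is an assumption here, as are the user identifiers and the short 4-dimensional vectors standing in for the 256-dimensional GraphSAGE embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(target, embeddings):
    """Return the other user whose embedding is closest to the target's."""
    return max(
        (u for u in embeddings if u != target),
        key=lambda u: cosine_similarity(embeddings[target], embeddings[u]),
    )

# Hypothetical embeddings; real vectors would be 256-dimensional.
embeddings = {
    "u1": [0.9, 0.1, 0.0, 0.2],
    "u2": [0.8, 0.2, 0.1, 0.3],
    "u3": [0.0, 0.9, 0.8, 0.1],
}
nearest = most_similar("u1", embeddings)
print(nearest)
```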
Behavior analysis and recommendation: the user behavior analysis module splits behavior sequences with a Session-Based method and, using a Transformer architecture, finds that the conversion rate of the path "scan code to claim coupon → stay in fresh-food area → pay at cashier" is only 8%. Causal inference locates "coupon threshold too high" as the key bottleneck; after the coupon is adjusted to "30 off on 150", the path conversion rate rises to 23%. The intelligent recommendation engine incorporates the knowledge graph and recommends related products (such as baby wipes and complementary foods) to users who purchase infant formula, raising the recommendation click-through rate by 45%.
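The session splitting and path conversion measurement described above can be illustrated with a short sketch. It assumes a 30-minute inactivity gap as the session boundary and takes the conversion rate as the ratio of sessions that complete the path to sessions that start it; both choices, and the action names, are assumptions, since the source gives neither the gap nor the formula.

```python
def split_sessions(events, gap_seconds=1800):
    """Split one user's (timestamp, action) events, sorted by time, into sessions."""
    sessions, current, last_ts = [], [], None
    for ts, action in events:
        if last_ts is not None and ts - last_ts > gap_seconds:
            sessions.append(current)
            current = []
        current.append(action)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

def follows_path(session, path):
    """True if the path steps occur in order (not necessarily adjacent)."""
    it = iter(session)
    return all(step in it for step in path)

def path_conversion_rate(sessions, path):
    """Sessions completing the path divided by sessions containing its first step."""
    n_start = sum(1 for s in sessions if path[0] in s)
    n_end = sum(1 for s in sessions if follows_path(s, path))
    return n_end / n_start if n_start else 0.0

# Hypothetical event log for one user (timestamps in seconds).
events = [
    (0, "scan_coupon"), (60, "fresh_area"), (300, "pay"),
    (10000, "scan_coupon"), (10100, "fresh_area"),
    (30000, "scan_coupon"),
]
sessions = split_sessions(events)
rate = path_conversion_rate(sessions, ["scan_coupon", "fresh_area", "pay"])
print(len(sessions), rate)
```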
Real-time decision-making and results: the real-time decision module combines a rule engine with reinforcement learning, automatically triggers the "gift with qualifying member purchase" campaign on holidays, and dynamically adjusts the allocation of resources between in-store broadcasts and shopping-guide recommendations. In the end, new customer acquisition cost fell by 40% and the repurchase rate of existing customers rose by 28%.
Effect comparison table
| Metric | Before the system is applied | After the system is applied | Improvement |
| --- | --- | --- | --- |
| New customer acquisition cost (yuan/person) | 60 | 36 | 40% |
| Promotion conversion rate (%) | 5 | 12 | 140% |
| Existing customer repurchase rate (%) | 18 | 23 | 28% |
| Recommended product click-through rate (%) | 3.2 | 4.6 | 45% |
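The improvement column can be checked with simple relative-change arithmetic. The sketch below assumes each figure is the absolute change divided by the "before" value; the dictionary keys are hypothetical labels for the four table rows.

```python
def relative_change(before, after):
    """Absolute change as a percentage of the 'before' value."""
    return abs(after - before) / before * 100.0

# (before, after) pairs taken from the effect comparison table.
rows = {
    "new_customer_cost": (60, 36),
    "promo_conversion": (5, 12),
    "repurchase_rate": (18, 23),
    "recommended_ctr": (3.2, 4.6),
}
changes = {name: relative_change(b, a) for name, (b, a) in rows.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:.2f}%")
```

Under this assumption, the first three rows agree with the stated 40%, 140%, and 28% after rounding, while the click-through row works out to about 43.75%, slightly below the stated 45%.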
The comparison table shows directly how the system optimizes the retail scenario. New customer acquisition cost fell by 40%, because the intelligent customer acquisition module uses transfer learning to screen high-potential customers accurately and reduces ineffective marketing spend. The promotion conversion rate rose by 140%, reflecting the user behavior analysis module's optimization of the consumption path, for example resolving the conversion bottleneck by adjusting the coupon threshold. The increases in existing customer repurchase rate and recommended product click-through rate verify the effectiveness of the personalized recommendations that the intelligent recommendation engine delivers with the knowledge graph, which strengthen user stickiness and willingness to buy.
Example 4 Online entertainment platform user Retention and Payment conversion scenarios
An online entertainment platform (providing video, game, and reading services) faced the following problems: the 7-day retention rate of registered users was below 20%; payment conversion depended mainly on home page recommendations, but the recommended content was highly homogeneous, and a single recommendation logic based on "popularity" led to a poor match with user interests and a payment rate of only 3%; user behavior data (such as viewing duration, click trails, and comment interactions) had not been mined in depth, so high-potential paying users could not be identified and marketing resources were badly wasted. After the platform was connected to the AI-based full-ecology intelligent customer acquisition and user behavior analysis system, the system was applied as follows:
Data acquisition and processing: the multi-source data acquisition module collects in-app user behavior through the SDK (100 thousand clicks and 30 thousand comments per day), connects to the membership system and payment records through an API, and uses a crawler to capture content popularity data for competing products in the industry. The data cleaning module removes repeated click records with a Bloom filter (12% of the average daily volume), flags abnormal accounts with an isolation forest (for example, click-farming accounts that switch among 10+ regions within a short time), and predicts and fills in missing user interest labels with an LSTM-AE model (error rate < 6%).
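The outlier rule that claim 1 attaches to the isolation forest, flagging a sample whose average path length falls below μ − 3σ, can be sketched as follows. The sketch assumes the per-account average path lengths have already been produced by a fitted isolation forest; the account names and values are hypothetical, and μ and σ are taken over all accounts in the batch.

```python
import statistics

def flag_short_paths(avg_path_lengths):
    """Flag keys whose average path length is below mean - 3 * standard deviation."""
    values = list(avg_path_lengths.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    threshold = mu - 3 * sigma
    flagged = sorted(k for k, v in avg_path_lengths.items() if v < threshold)
    return flagged, threshold

# Hypothetical path lengths: 29 ordinary accounts near 8 and one isolated quickly.
path_lengths = {f"acct{i}": 8.0 + 0.1 * (i % 5) for i in range(29)}
path_lengths["farm_acct"] = 2.0
flagged, threshold = flag_short_paths(path_lengths)
print(flagged, round(threshold, 3))
```

Because the outlier itself inflates σ, a 3σ rule of this kind can only flag a point when the batch is reasonably large; with very few samples no value can fall three standard deviations below the mean.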
User portrait and customer acquisition: the user portrait construction module generates 340+ dimension labels, including "suspense drama preference" and "mobile game payment willingness", processes features with One-Hot encoding and quantile discretization, and calculates user similarity with the GraphSAGE algorithm. After preliminary lead screening with XGBoost, the intelligent customer acquisition module uses federated learning to integrate data from 3 partner platforms, outputs 1.5 thousand users with a payment conversion probability above 70% through a LightGBM model, and pushes a targeted "half price for the first month of membership" campaign; the outreach time is set to 8 to 10 p.m. (the historical activity peak).
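The server-side step of federated learning, in which each participant's upload is weighted by its share of the data as in claim 4, can be sketched as below. This is a simplified illustration with plain parameter vectors, not the TensorFlow Federated and LightGBM setup named in the claims; the three participants, their sample counts, and their parameter values are hypothetical.

```python
def aggregate(local_params, sample_counts):
    """Weighted average of local parameter vectors, weights = data share per participant."""
    total = sum(sample_counts)
    weights = [n / total for n in sample_counts]
    dim = len(local_params[0])
    return [
        sum(w * params[j] for w, params in zip(weights, local_params))
        for j in range(dim)
    ]

# Three hypothetical partner platforms; the third holds half of all samples.
local_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sample_counts = [100, 100, 200]
global_params = aggregate(local_params, sample_counts)
print(global_params)
```

Only the parameter vectors cross the organizational boundary in this scheme; the raw customer records stay with each participant.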
Behavior analysis and recommendation: the user behavior analysis module analyzes behavior sequences with a Transformer and finds that the conversion rate of the path "watch episodes → add to watchlist → view paid packages" is only 5%. Using the Do-Calculus theory, it locates "unclear package description" as the cause; after a "pay per episode" option is added, the path conversion rate rises to 18%. The intelligent recommendation engine introduces a time decay factor and uses the knowledge graph to recommend works by the same director, raising recommendation accuracy by 52%.
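One way a time decay factor could down-weight older interactions is sketched below. The source's decay formula is not reproduced in the text, so the exponential form, the decay rate of 0.1 per day, and the function names are all assumptions made for illustration only.

```python
import math

def decay_weight(days_since, decay_rate=0.1):
    """Assumed exponential decay: weight 1.0 today, shrinking with age in days."""
    return math.exp(-decay_rate * days_since)

def decayed_score(interactions, decay_rate=0.1):
    """Sum interaction strengths after applying the decay weight.

    interactions: list of (strength, days_since) pairs.
    """
    return sum(s * decay_weight(d, decay_rate) for s, d in interactions)

# Hypothetical viewing history: a recent view counts more than a month-old one.
recent = decayed_score([(1.0, 1)])
old = decayed_score([(1.0, 30)])
print(round(recent, 4), round(old, 4))
```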
Real-time decision-making and results: the real-time decision module dynamically adjusts recommendation slot resources through reinforcement learning and raises exposure to high-potential users by 30%. In the end, the platform's 7-day retention rate rose to 35% and the payment rate rose to 8%.
Effect comparison table
| Metric | Before the system is applied | After the system is applied | Improvement |
| --- | --- | --- | --- |
| 7-day user retention rate (%) | 20 | 35 | 75% |
| User payment rate (%) | 3 | 8 | 167% |
| High-potential user identification accuracy (%) | 55 | 78 | 42% |
| Recommended content matching degree (%) | 40 | 61 | 52% |
The improvement in the online entertainment platform's metrics stems from the system's in-depth mining of user behavior and its precise operations. The marked increases in 7-day user retention and recommended content matching degree are attributable to the intelligent recommendation engine, which introduces a time decay factor and incorporates the knowledge graph to match content dynamically to user interests. The increases in user payment rate and high-potential user identification accuracy reflect the user portrait construction module's precise depiction of multi-dimensional labels, and the intelligent customer acquisition module strengthens the model's generalization ability through federated learning, so that marketing resources are focused on high-value users.
The present invention is not limited to the above embodiments. Any equivalent replacement or change made by a person skilled in the art on the basis of the technical solution and inventive concept of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. An AI-based full-ecology intelligent customer acquisition and user behavior analysis system, characterized by comprising:
a multi-source data acquisition module, which deploys distributed acquisition nodes, obtains customer data by communicating with enterprise CRM and ERP systems through API interfaces, collects user interaction behavior on mobile terminals through SDK event tracking, and crawls media data in a targeted manner with an incremental crawler;
a data cleaning and preprocessing module, which deduplicates data with a Bloom filter, marks a sample as an outlier when its average path length to the root node is less than the threshold $\mu - 3\sigma$, where $\mu$ is the mean path length and $\sigma$ is the standard deviation, and stores the cleaned data in the Hadoop Distributed File System in Parquet format;
a user portrait construction module, which vectorizes categorical features with One-Hot encoding, processes continuous features with quantile discretization, maps entities to a 256-dimensional vector space with the GraphSAGE algorithm, and calculates user similarity by a formula over $vec(u_i)$ and $vec(u_j)$, where $u_i$ and $u_j$ are two different users and $vec(u_i)$ and $vec(u_j)$ are their feature vectors;
an intelligent customer acquisition module, which performs preliminary lead screening with the XGBoost algorithm based on basic user data, behavior data, and external market data, transfers the customer characteristics of leading enterprises through transfer learning, and outputs the customer conversion probability through a LightGBM model, $P_{convert} = f(X_{user}, W)$, where $X_{user}$ is a user vector containing 128-dimensional features and $W$ is a weight matrix;
a user behavior analysis module, which splits user behavior sequences with a Session-Based method and models the behavior sequences with a Transformer architecture;
an intelligent recommendation engine module, in which the content-based recommendation part extracts product features with a BERT-CNN model, concatenates the product features with the user portrait features, and inputs them into a multi-layer perceptron to calculate the matching degree, and the collaborative filtering part introduces a time decay factor, where $t$ is the number of days since the behavior occurred;
a real-time decision and automated marketing module, which builds a dual-drive framework of a rule engine and reinforcement learning, wherein the rule engine supports rule configuration, the state space of the reinforcement learning comprises 50-dimensional information, and its action space comprises 8 marketing channels;
a data visualization and report generation module, which builds a component library based on ECharts and AntV and provides a drag-and-drop report designer, in which the user selects metrics and dimensions from a metadata center and the system automatically generates SQL query statements;
and a system management and optimization module, which deploys Prometheus + Grafana monitoring, raises an alarm when a metric exceeds its limit for 5 consecutive minutes, and includes an automated A/B testing framework that distributes traffic through Nginx.
2. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, further comprising a multi-modal data fusion module, wherein a microphone array and a camera module are deployed at the hardware level; silent segments are removed by VAD and a Wav2Vec 2.0 model is applied to extract 768-dimensional acoustic features; for image data, face regions are detected by the MTCNN algorithm and ResNet then extracts 2048-dimensional visual features; a Transformer-based alignment network is used for cross-modal alignment with a loss function $L_{align}$, where $L_{align}$ is the cross-modal alignment loss value, $N$ is the number of samples, $i$ is the sample index, $x_i$ is the $i$-th text sample, and $f_{text}$ and $f_{audio}$ are the text and voice feature extraction functions, respectively.
3. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, further comprising a customer churn early-warning module, wherein an ensemble learning early-warning model is built that fuses three base models, random forest, LightGBM, and XGBoost, through the Stacking method, and the Cox proportional hazards model from survival analysis is adopted, with the formula $S(t \mid X) = \exp(-\Lambda(t)\exp(X\beta))$, where $S(t \mid X)$ is the retention probability of the user at time $t$, $\Lambda(t)$ is the baseline hazard function, $X$ is a user vector containing 50-dimensional features, and $\beta$ is the regression coefficient.
4. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, wherein in the intelligent customer acquisition module, a horizontal federated learning architecture is built on TensorFlow Federated; each participant trains a LightGBM model locally on its own customer data and uploads only model gradient parameters to a central server; the central server updates the global model parameters through a weighted aggregation algorithm, with an objective function expressed in terms of $K$, $\omega_i$, and $L_i(\theta)$, where $K$ is the number of participating enterprises, $\omega_i$ is the data volume share of the $i$-th enterprise, and $L_i(\theta)$ is the local model loss function of the $i$-th enterprise; a differential privacy mechanism is built into the system, and Laplace noise, parameterized by the privacy budget $\epsilon$ and the sensitivity $\Delta f$, is added before the parameters are uploaded.
5. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, wherein in the user behavior analysis module, the behavior path conversion rate is calculated from $N_{start}$ and $N_{end}$, where $N_{start}$ is the number of starting behaviors and $N_{end}$ is the number of ending behaviors; a user behavior causal graph is built based on the Do-Calculus theory, and the back-door adjustment formula
$$\mathrm{ACE} = \sum_{w} P(Y \mid do(X=x), W=w)\,P(W=w) - \sum_{w} P(Y \mid do(X=x'), W=w)\,P(W=w)$$
is used to calculate the average causal effect of a behavioral intervention, where ACE is the average causal effect, $X$ is the intervention variable, $Y$ is the outcome variable, and $W$ is the set of confounding variables; the system automatically detects back-door paths in the causal graph, eliminates bias by propensity score matching when unblocked confounding factors exist, and uses the analysis results to guide product iteration.
6. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, wherein in the intelligent recommendation engine module, the weights of the two kinds of recommendation results are adjusted by an attention mechanism whose formula is defined over the feature vectors $h_i$, where $h_i$ is a feature vector, $W_a$ is a 128×128 learnable matrix, and $n$ is the total number of feature vectors; a knowledge graph containing entities is built on Neo4j, and node embeddings are learned through the graph neural network GraphSAGE; the recommendation model inputs the knowledge graph embedding vector $vec(G)$, the user behavior sequence vector $vec(u)$, and the product feature vector $vec(i)$ into a multi-layer perceptron and outputs the recommendation score $r_{u,i} = f(vec(u), vec(i), vec(G))$; the system supports explainable display, and recommendation reasons are generated by knowledge graph path search.
7. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, wherein the 8 marketing channels are treated as the arms of a multi-armed bandit and a Thompson sampling algorithm is used to select the delivery strategy dynamically; the posterior distribution of each arm's conversion rate is modeled as a Beta distribution $\theta_a \sim \mathrm{Beta}(\alpha_a, \beta_a)$, where $\alpha_a$ is the number of successes plus 1 and $\beta_a$ is the number of failures plus 1; the system updates the posterior distribution parameters once per hour and allocates the marketing budget according to the sampling results.
8. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, wherein in the data visualization and report generation module, analysis reports are generated automatically with natural language generation technology; a pretrained T5-Base model is fine-tuned on a data set; the system encodes data metrics, dimension information, and business rules into an input sequence and generates natural language text through beam search; a template fusion mechanism is introduced to combine the generated text with preset report templates, and multi-language output is supported.
9. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, wherein an automated operation and maintenance system is built in the system management and optimization module; hardware metrics, software metrics, and business metrics of the system are collected with Prometheus and monitored visually through Grafana; an LSTM-Seq2Seq model predicts system load, taking 7 days of minute-level metric data as input and predicting the load for the next hour; for model optimization, a model monitoring module detects data drift and concept drift, and when drift is detected, new data is automatically extracted from the data lake for incremental training.
10. The AI-based full-ecology intelligent customer acquisition and user behavior analysis system according to claim 1, wherein in the data acquisition stage, Laplace noise is added to user behavior data with a differential privacy technique, and the noise intensity $\epsilon$ is adjusted according to data sensitivity; in the data storage stage, sensitive data is encrypted with a homomorphic encryption technique; in the model training stage, joint data modeling is performed with a secure multi-party computation protocol; and the system adopts a federated identity authentication and access control mechanism that dynamically allocates permissions based on user roles, data labels, and operation scenarios.
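By way of non-limiting illustration, the retention probability of claim 3, $S(t \mid X) = \exp(-\Lambda(t)\exp(X\beta))$, can be evaluated as in the following sketch. The function name, the two-feature vectors standing in for the 50-dimensional $X$, and the numeric values of $\Lambda(t)$ and $\beta$ are hypothetical.

```python
import math

def retention_probability(baseline_hazard, features, coefficients):
    """Evaluate S(t|X) = exp(-Lambda(t) * exp(X . beta)) for one user.

    baseline_hazard: value of Lambda(t) at the time of interest.
    features, coefficients: equal-length sequences for X and beta.
    """
    linear = sum(x * b for x, b in zip(features, coefficients))
    return math.exp(-baseline_hazard * math.exp(linear))

# Hypothetical values: a positive coefficient on the first feature raises churn risk.
beta = [0.8, -0.3]
low_risk = retention_probability(0.5, [0.0, 1.0], beta)
high_risk = retention_probability(0.5, [2.0, 0.0], beta)
print(round(low_risk, 4), round(high_risk, 4))
```

A user whose features push $X\beta$ upward receives a lower retention probability at the same time $t$, which is the quantity an early-warning threshold would act on.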
CN202510986555.6A 2025-07-17 2025-07-17 Intelligent passenger acquisition and user behavior analysis system based on AI full ecology Withdrawn CN120875931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510986555.6A CN120875931A (en) 2025-07-17 2025-07-17 Intelligent passenger acquisition and user behavior analysis system based on AI full ecology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510986555.6A CN120875931A (en) 2025-07-17 2025-07-17 Intelligent passenger acquisition and user behavior analysis system based on AI full ecology

Publications (1)

Publication Number Publication Date
CN120875931A true CN120875931A (en) 2025-10-31

Family

ID=97451515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510986555.6A Withdrawn CN120875931A (en) 2025-07-17 2025-07-17 Intelligent passenger acquisition and user behavior analysis system based on AI full ecology

Country Status (1)

Country Link
CN (1) CN120875931A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121073157A (en) * 2025-11-06 2025-12-05 深圳市维卓数字营销有限公司 A method, apparatus, device, and storage medium for generating budget allocation strategies.
CN121092784A (en) * 2025-11-10 2025-12-09 江苏省软件产品检测中心 A Personalized Customer Service System and Method Based on Multimodal Data Fusion
CN121117194A (en) * 2025-11-17 2025-12-12 上海勃冉众创数字科技有限公司 Intelligent recruitment clue mining and accurate matching method based on AI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20251031)