Open Access Article

A Two-Layer Causal Knowledge Network Construction Method Based on Quality Problem-Solving Data

Yubin Wang ¹, Shirong Qiang ¹, Xin Yue ², Tao Li ³,* and Keyong Zhang

¹ School of Economics and Management, North University of China, Taiyuan 030051, China

² College of Psychology, Liaoning Normal University, Dalian 116023, China

³ School of Management, Zhengzhou University, Zhengzhou 450001, China

* Author to whom correspondence should be addressed.

Systems 2025, 13(3), 142; https://doi.org/10.3390/systems13030142

Submission received: 16 January 2025 / Revised: 6 February 2025 / Accepted: 13 February 2025 / Published: 20 February 2025

(This article belongs to the Special Issue Data-Driven Methods in Business Process Management)


Figure 1. Research design.
Figure 2. Diagram of causal knowledge group.
Figure 3. Schematic of DL-CKN.
Figure 4. Domain vocabulary construction process.
Figure 5. Causal knowledge sets containing multiple causal and contextual elements.
Figure 6. Abstract causal knowledge network.
Figure 7. Abstract causal knowledge network for the necking problem.
Figure 8. Concrete causal knowledge network.
Figure 9. Concrete causal knowledge network for the necking problem.
Figure 10. DL-CKN.
Figure A1. Schematic diagram of one cause with multiple effects.
Figure A2. Schematic diagram of multiple causes and one result.
Figure A3. Schematic diagram of multiple causes and multiple results.
Figure A4. Schematic diagram of the splitting method of the “or” relationship in one cause and multiple results.
Figure A5. Schematic diagram of the splitting method of the “or” relationship in multiple causes and one result.
Figure A6. Schematic diagram of the splitting method of the “or” relationship in multiple causes and multiple results.
Figure A7. Schematic diagram of multiple causal relationships after splitting.
Figure A8. Schematic diagram of the combination method of the “and” relationship in one cause and multiple results.
Figure A9. Schematic diagram of the combination method of the “and” relationship in multiple causes and one result.
Figure A10. Schematic diagram of the combination method of the “and” relationship in multiple causes and multiple results.
Figure A11. Causal knowledge groups in the form of “one cause and one result”.


Abstract

“Cause analysis” constitutes an indispensable component of quality management systems, serving to systematically identify the causes of quality defects and thereby enabling the development of targeted improvement strategies that address both surface-level manifestations and fundamental drivers. However, relying solely on personal experience makes it challenging to analyze quality problems comprehensively and in depth. This is because cause analysis must meet two requirements. On the one hand, it must consider the specific context in which a problem occurs, so that the “specific problem” can be “specifically analyzed” and temporary containment measures formulated. On the other hand, the problem must be stripped of its context, so that the “class problem” and the causal linkages underlying it can be analyzed generally and in depth, thereby determining the root cause of the problem and formulating a corresponding long-term program. The analysis of the causes of quality problems therefore exhibits a “duality”. On this basis, this study proposes and constructs a two-layer causal knowledge network that leverages the causal knowledge generated and applied in quality problem solving to address this duality. The proposed network can assist front-line employees in analyzing product quality problems from diverse perspectives and overcome the difficulty of analyzing the causal relationships of quality problems comprehensively and in depth on the basis of personal experience alone. Our method not only helps improve the efficiency of quality problem solving but also contributes to the theories and methods of quality management and knowledge management.

Keywords:

knowledge management; quality management; problem context; causal analysis; two-layer causal network

1. Introduction

With economic development and continuously rising living standards, consumers place higher demands on both product diversity and product quality. At the same time, quality issues have become increasingly complex and difficult to resolve, posing a serious challenge to the survival and development of enterprises and to their production and manufacturing systems [1]. Identifying the cause-and-effect relationships of a problem is an effective approach to problem solving [2]. Only through a comprehensive and in-depth analysis of quality problems and the identification of their root causes can quality issues be effectively resolved (unless otherwise specified, the quality problems mentioned in this study refer to product quality).

The analysis of the causes of quality problems is highly complex [3,4,5,6], mainly in the following aspects: ① The cause analysis of quality problems has a dual nature. When analyzing the causal relationships of a quality problem, it is necessary to integrate the specific context in which the problem occurs; this integration enables the formulation of temporary containment measures that mitigate the problem in the short term. Moreover, it is necessary to strip away the specific context to give the problem generality, which facilitates in-depth analysis of the causal linkage underlying the problem, thereby identifying the initial cause of the problem and formulating a corresponding long-term program that resolves it durably. ② The cause-and-effect relationships behind quality problems are numerous and complex and mostly form a net-like structure. In addition to single cause-and-effect relationships, there are multiple cause-and-effect relationships, such as one cause with multiple effects, multiple causes with one effect, and multiple causes with multiple effects.

The dual nature of cause analysis and the existence of multiple causal relationships demand a comprehensive and in-depth examination of quality problems. Nevertheless, within current enterprise quality management practice, most front-line employees rely solely on their personal experience and knowledge to perform a linear analysis of problem causes. The limits of personal experience, the duality of cause analysis, the complexity of cause-and-effect relationships, and the existence of multiple cause-and-effect relationships together render the crucial “cause analysis” step ineffective. With technological development, the manufacturing industry generates a vast amount of raw data every day [7]. Among these data is a particular type generated when front-line employees record quality problems in detail. In this study, this type of data is denoted as “Quality Problem Solving Data” (henceforth “QPSD”). Such data embody a substantial quantity of empirical causal knowledge, which is an externalization of the group’s empirical knowledge. As the philosopher Hume stated, causality cannot be known a priori but only with the help of experience [8]. If the causal knowledge contained in these data can be effectively managed and fully utilized, it will overcome the limitations of personal experience in cause analysis, enabling a comprehensive and in-depth analysis of quality problems and greatly improving the efficiency and effectiveness of cause analysis. However, QPSD are recorded by front-line employees as free-text descriptions, which makes the data highly subjective. For example, different employees may describe the same problem in different ways, and multiple cause-and-effect relationships can be expressed in various manners.
Consequently, although QPSD contain a wealth of causal knowledge, which furnishes a foundation for a comprehensive and in-depth analysis of quality problems, they are challenging to utilize directly.

In light of the complexity and duality of quality problem analysis, as well as its practical requirements, this study proposes a new causal knowledge network, built by studying in depth the causal knowledge contained in QPSD, that can analyze both the superficial causes and the initial causes of problems. The network consists of two parts: (1) a concrete causal knowledge network containing the “concrete problem” and its causes, which is mainly used to analyze the surface causes of the concrete problem, and (2) an abstract causal knowledge network focusing on the “abstract problem” and its causes, which is mainly used to analyze the initial causes of the quality problem. The two networks are distinct yet closely interrelated: they are linked through the subordination relationship between the “concrete problem” and the “abstract problem” to form a mutually coordinated and unified whole, which this study refers to as the two-layer causal knowledge network (DL-CKN).

DL-CKN is the externalization and expression of the causal knowledge in QPSD. In response to the dual needs of cause analysis, it supports the causal analysis of quality problems from different perspectives, helps overcome the difficulty of analyzing quality problems comprehensively and in depth when relying only on personal experience, and provides broad support for front-line employees conducting cause analysis.

This study not only contributes to improving the efficiency of quality problem solving but also facilitates the integration of traditional quality management methods with information technology. Moreover, it advances knowledge-based and intelligent quality management and supports the development of Quality 4.0, and it therefore has both theoretical value and practical significance.

2. Research Status and Research Ideas

2.1. Research Status

The key to solving quality problems lies in the analysis of causes [9,10]. The analysis of causes and clarification of cause–effect relationships are important ways to understand the world and reveal its essence, and they are also the main driving forces for the development of science and the transformation of philosophical thinking. Consequently, individuals have always been tireless in their pursuit of cause–effect relationships [11]. For example, the eighteenth-century philosopher David Hume studied causality from an empiricist perspective. He argued that knowledge of facts should be based on causality, asserting that it is solely through causality that we can transcend the limitations of senses and memory. Hume further contended that causality cannot be known a priori but only by means of experience. Moreover, he emphasized that causality must be assumed if any knowledge beyond experience is to be derived from experience [12,13,14,15].

With the development of science and technology, strict causal determinism has been called into question [16]. Logicians progressively began to use probability theory to study inductive reasoning [17], and causality theory gradually shifted toward probabilistic causality. In other words, the mathematical element of “probability” has been introduced into the causal problem, and the resulting theory has been studied in depth [17,18,19]. For example, Bernoulli discusses the “principle of inadequate justification” [20]; Bayes proposes the “inverse probability theorem” [21,22]; Reichenbach studies forked causality and intercurrent causality [23]; Suppes argues that real-world phenomena are inherently probabilistic, suggesting that “our concept of matter must contain an intrinsic probability factor” [24,25]; and Lewis analyzes causality in terms of counterfactual conditionals [26].

With the emergence of the principle of inadequate justification and continuing research on probabilistic causal theory, R. Howard and J. Matheson propose Bayesian networks (also known as probabilistic nets), mainly to represent uncertain expert knowledge [27]. Some scholars have attempted to construct causal networks from experimental data using Bayesian network structure learning [28]. However, observational data alone cannot determine all causal directions; they identify only the set of causal networks with the same Markov properties, that is, the same conditional independencies [29].

With the continuous progress of information technology, some scholars have begun to focus on discovering variables that may be causally related from large amounts of data [30,31]. Using techniques such as the do-operation (i.e., actively intervening on a variable to determine the resulting change in probabilities) [32] and counterfactual inference (“what if”) [33], they seek to infer the magnitude of causal effects between variables, which gives data the capacity to answer “why” questions [34]. Since then, causal inference has gradually come to the forefront: Pearl [35] argues that causal inference should be formalized and algorithmic in order to achieve “human-level” artificial intelligence and proposes seven tools for this purpose; Ohlsson et al. [36] believe that causal inference provides an important basis for etiological models and prevention efforts; Castro et al. [37] argue that causal inference can offer new perspectives on the major challenges of machine learning in medical imaging; and Whitaker et al. [38] believe that causal inference can play a crucial role in explaining the complex relationships encountered in real-world environments.

Currently, some scholars are also focusing on how to use text data for causal network construction [5,39]. The basic idea of this research is as follows: initially, the linguistic pattern matching method [40,41] or methods based on machine learning [42,43] or deep learning (neural network) [44] are employed to extract causal pairs from the text [45,46,47]. Subsequently, similar nodes in the causal pairs are fused through exact matching [48] or similarity-based calculation [49,50,51] to form a causal network.
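The two-step pipeline described above (pattern-based extraction of causal pairs, then fusion of identical nodes) can be illustrated with a minimal sketch. It assumes English sentences and a tiny, hypothetical set of cue phrases; real systems use much richer, language-specific cue lexicons, and the example sentences are invented for illustration only. Fusion here is plain exact matching on lower-cased strings, the simplest of the fusion options mentioned in the text.

```python
import re
from collections import Counter

# Hypothetical cue patterns. Passive cues are tried first so that
# "X is caused by Y" is not misread by the active "causes" pattern.
CAUSAL_PATTERNS = [
    re.compile(r"(?P<effect>[\w ]+?)\s+(?:is caused by|was caused by|is due to|because of)\s+(?P<cause>[\w ]+)", re.I),
    re.compile(r"(?P<cause>[\w ]+?)\s+(?:causes|leads to|results in)\s+(?P<effect>[\w ]+)", re.I),
]


def extract_pair(sentence):
    """Return one (cause, effect) pair from the first matching pattern, or None."""
    for pattern in CAUSAL_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group("cause").strip().lower(), match.group("effect").strip().lower()
    return None


def fuse(sentences):
    """Exact-match fusion: identical node strings collapse into one node, and
    each edge is weighted by how many sentences produced it."""
    edges = Counter(p for p in map(extract_pair, sentences) if p is not None)
    nodes = {node for pair in edges for node in pair}
    return nodes, edges
```

Because "insufficient lubrication causes necking" and "necking is caused by insufficient lubrication" normalize to the same pair, exact matching merges them into a single edge with weight 2; similarity-based fusion would additionally merge near-synonymous node strings.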

Probabilistic causal theory has, to a certain degree, addressed the limitations of causality within the empirical realm and provided a new explanatory path for the study of causality. However, although probability and causality are correlated, they are not equivalent: probabilistic dependence is neither a sufficient nor a necessary condition for causality [52]. Hume holds that the perception of causality requires the input of experience [53]. QPSD contain a large amount of empirical causal knowledge, externalized by front-line employees as they analyze and solve problems. Therefore, this study examines the empirical causal knowledge contained in QPSD and constructs the DL-CKN by combining it with the quality problem context. This constitutes the core content of this study and the most significant distinction between the DL-CKN and a “probabilistic causal network”. Specifically, the DL-CKN is composed of a large amount of causal knowledge combined with the quality problem context, in which each piece of causal knowledge is a definite causal relationship that has already occurred, and which contains multiple causal relationships and problem contexts.

Another current focus of causality research is causal inference, the ability to understand and infer causal relationships between variables [54]. The major difference between causal inference and this study is that the causal relationships in DL-CKN are derived from the causal knowledge in QPSD rather than inferred. Although research on extracting causal relationships from text provides an important idea for this study, QPSD contain a large amount of empirical causal knowledge and have their own characteristics. How to use these data, mine the empirical causal knowledge they contain in light of those characteristics, and construct DL-CKN to meet the dual needs of cause analysis therefore still merits in-depth study.

2.2. Research Design

QPSD are generated when front-line employees document the process of solving quality problems. As a result, these data exhibit semi-structured characteristics. Specifically, they contain structured fields, such as “production part”, “product model”, “problem description”, “cause analysis”, and so on. However, these fields are recorded by front-line employees in the form of text description, which makes the fields present unstructured characteristics. The knowledge contained in the data lays the foundation for the construction of DL-CKN, such as empirical causal knowledge, problem situations, the truthful recording of multiple cause–effect relationships, etc. Nevertheless, the unstructured records of front-line employees and the confusing descriptive terms of the data bring about difficulties and challenges for the construction of DL-CKN. These issues manifest as misspellings, misused terms, differential descriptions of the same problem, and inconsistent presentation of multiple causal relationships.

In light of the characteristics of QPSD, this study first defines the composition and expression of DL-CKN and, on that basis, designs a domain vocabulary construction method to standardize the descriptive terms and extracts the multiple causal relationships in the data in preparation for constructing DL-CKN. Secondly, by integrating the problem contexts, causal knowledge groups are constructed from the structured causal relationships. Then, the construction method of the causal knowledge network is designed on the basis of the causal knowledge groups, and DL-CKN is completed by selecting the granularity of the abstract problems and the specific problem contexts. Finally, QPSD recorded in an actual production process are used for a case study.

In summary, the research idea of this study is shown in Figure 1.

2.3. Composition and Expression of DL-CKN

Quality problems often involve multiple cause–effect relationships. Specifically, if multiple causes (or problems) always occur together and are absent together, there is an “and” relationship between them. Conversely, if they occur independently and do not interfere with each other, there is an “or” relationship between them. In the graphical representation, nodes represent problems and their causes, directed edges represent the cause–effect relationships between them, and the “and-gate” and “or-gate” symbols represent the “and” and “or” relationships among multiple causes and effects; these forms are illustrated in Appendix A.

The presence of multiple causal relationships results in diverse and complex manifestations of causality, which not only complicates the construction of DL-CKN but also poses challenges for subsequent research on reasoning and applications. Therefore, it is imperative to further analyze and refine the representation of these multiple causal relationships. Specifically, multiple causes (or problems) connected by an “OR” relationship imply that any of them may occur independently, representing multiple complete causal relationships that can manifest repeatedly. These can be treated “individually,” meaning that multiple cause nodes or problem nodes (collectively referred to as multi-causal nodes) involving an “OR” relationship can be decomposed. The detailed decomposition approach is provided in Appendix B. In contrast, multiple causes (or problems) linked by an “AND” relationship are interdependent and constitute a single, inseparable causal relationship. These should be treated as a “unified entity,” meaning that multi-causal nodes involving an “AND” relationship can be aggregated. The specific aggregation method is outlined in Appendix C.

By splitting “or” relationships and combining “and” relationships, the expressions of multiple causal relationships are unified into the form “node–edge–node”, which this study calls a causal knowledge group, as shown in Figure 2. To distinguish multi-causal nodes from other nodes, this study uses a square node (□) to represent a multi-causal node that contains an “and” relationship after combination; all other nodes are drawn as triangular nodes (△), and each edge represents the causal relationship between two nodes.

In Figure 2, the starting node $N_{c}$ of the edge represents the cause, and the node the edge points to represents the result: problems $N_{e1}$ and $N_{e2}$ occur simultaneously, triggered by problem $N_{c}$.
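The split-and-combine rule can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure of Appendices B and C (which are not reproduced here): each side of a statement (causes or effects) is assumed to carry a single gate, nested mixtures of “and” and “or” are not handled, splitting on both sides is assumed to yield every cause–effect combination, and a combined square node is represented, hypothetically, as its member labels joined by " & ".

```python
from itertools import product


def side_nodes(items, gate):
    """Turn one side (causes or effects) of a causal statement into node labels.

    'or'  -> each item stays a separate node (the statement is split).
    'and' -> all items merge into one multi-causal (square) node.
    """
    if len(items) == 1:
        return [items[0]]
    if gate == "or":
        return list(items)
    if gate == "and":
        # Sorting gives a canonical label, so "A & B" and "B & A" coincide.
        return [" & ".join(sorted(items))]
    raise ValueError(f"unknown gate: {gate!r}")


def to_groups(causes, cause_gate, effects, effect_gate):
    """Rewrite one multi-cause/multi-effect statement as node-edge-node groups."""
    return list(product(side_nodes(causes, cause_gate), side_nodes(effects, effect_gate)))
```

For the situation in Figure 2, `to_groups(["N_c"], "or", ["N_e1", "N_e2"], "and")` yields the single group `("N_c", "N_e1 & N_e2")`: one cause node pointing to one combined effect node. The canonical label matters later, because frequency counting relies on identical groups being recognized as identical.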

Based on the duality of cause analysis, the proposed DL-CKN contains a concrete causal knowledge network for analyzing the superficial causes of “concrete problems” and an abstract causal knowledge network for analyzing the root cause. Accordingly, the inter-node relationships in DL-CKN are of two types, causal relationships and subordination relationships, both represented by directed edges: (1) directed edges representing causal relationships, pointing from causes to problems, which exist within each of the two causal knowledge networks; and (2) directed edges representing the subordination relationship from species concepts to genus concepts, that is, from concrete problems that carry their context to the corresponding abstract problems stripped of context, which exist between the concrete causal knowledge network and the abstract causal knowledge network.

The causal knowledge groups are derived from the causal knowledge contained in QPSD, and different problems and causes occur different numbers of times. Accordingly, nodes and edges in DL-CKN have different frequencies, which are represented by node size and edge thickness, respectively.

In summary, the manifestation of DL-CKN is shown in Figure 3.

Quality problems are manifested in various forms. For instance, cracking, tearing, and pulling all fall under the category of cracking-related issues, yet their specific circumstances can vary significantly. A crucial question is whether these problems should be treated separately, that is, how to determine the “node granularity” of the “abstract problem”. Different node granularities affect how deeply a problem can be analyzed. Therefore, when constructing the abstract causal knowledge network, the granularity should be selected with reference to the suggestions of domain experts, so as to help front-line employees better analyze the initial causes of quality problems. Similarly, the contexts of quality problems are diverse: time, place, production environment, production parts, and so on are all contexts of quality problems, and different choices of context have different implications for analyzing specific problems. Therefore, when constructing the concrete causal knowledge network, suitable problem contexts should be selected according to the actual production situation, so as to maximize the effectiveness of analyzing the surface causes of specific problems.
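How the two design choices above shape node labels can be sketched as follows. Both inputs are assumptions for illustration: `GRANULARITY` is a hypothetical expert-supplied mapping from observed phenomena to abstract problems (here using the cracking example from the text), and the record field names such as `part` and `shift` are invented. Concrete-layer labels keep the selected context fields; abstract-layer labels drop the context and apply the chosen granularity.

```python
# Hypothetical expert-chosen granularity: concrete phenomenon -> abstract problem.
GRANULARITY = {"cracking": "cracking", "tearing": "cracking", "pulling": "cracking"}


def concrete_node(problem, record, context_fields):
    """Concrete-layer label: the problem plus the selected context fields."""
    context = ", ".join(f"{name}={record[name]}" for name in context_fields)
    return f"{problem} [{context}]" if context else problem


def abstract_node(problem, granularity=GRANULARITY):
    """Abstract-layer label: context stripped, mapped to the chosen granularity.

    Problems absent from the mapping keep their own label (finest granularity).
    """
    return granularity.get(problem, problem)
```

Choosing a coarser mapping merges more concrete problems into one abstract node, so their causes pool together in the abstract layer; choosing more context fields splits the concrete layer into more specific nodes.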

Nevertheless, both types of causal knowledge network, as well as the subordination relationships between nodes at different layers, are composed of nodes and the relationships between nodes. Therefore, the DL-CKN constructed in this study can be expressed by the unified formula shown in Equation (1).

G = (N, E, P)

(1)

In Equation (1), G denotes DL-CKN; N denotes the set of nodes in DL-CKN; E denotes the set of edges in DL-CKN; P denotes the set of parameters in DL-CKN.

(1): Nodes in DL-CKN

The nodes in DL-CKN are divided into triangular nodes (△) and square nodes (□), both of which represent quality problems or causes. The difference is that triangular nodes represent independent quality problems or causes, while square nodes represent multi-causal nodes that contain “and” relationships after combination. Both forms of nodes can be written as

n_{i}, n_{i} \in N, i = 1, 2, \dots, m

(2): Edges in DL-CKN

The edges in the concrete and abstract causal knowledge networks represent the causal relationships between nodes and are represented as directed edges between nodes, denoted as $e_{ij} = \langle n_{i}, n_{j} \rangle, e_{ij} \in E$, where $e_{ij}$ denotes a directed edge from $n_{i}$ to $n_{j}$; $n_{i}$ is a cause of $n_{j}$; and $n_{j}$ is a result caused by $n_{i}$.

The edges between the concrete causal knowledge network layer and the abstract causal knowledge network layer represent the subordination relationship between nodes and are represented as directed edges from nodes in the concrete causal knowledge network to nodes in the abstract causal knowledge network, denoted as $\bar{e}_{pq} = \langle n_{p}, n_{q} \rangle, \bar{e}_{pq} \in E$, where $\bar{e}_{pq}$ denotes a directed edge from $n_{p}$ to $n_{q}$. The problem represented by node $n_{p}$ is contained in the problem represented by node $n_{q}$.
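A minimal container for $G = (N, E, P)$ helps make the two edge types concrete. This sketch is an assumption-laden illustration, not the authors' implementation: it stores $E$ as two separate sets (causal edges $e_{ij}$ and subordination edges $\bar{e}_{pq}$, whose union is $E$), keeps $P$ as a free-form dictionary, and uses invented node labels such as "cause X".

```python
from dataclasses import dataclass, field


@dataclass
class DLCKN:
    """Minimal container for G = (N, E, P) with both kinds of directed edges."""
    nodes: set = field(default_factory=set)          # N
    causal: set = field(default_factory=set)         # e_ij = <n_i, n_j>: n_i causes n_j
    subordination: set = field(default_factory=set)  # bar e_pq = <n_p, n_q>: concrete n_p -> abstract n_q
    params: dict = field(default_factory=dict)       # P, e.g. node and edge frequencies

    def add_causal(self, cause, effect):
        self.nodes.update((cause, effect))
        self.causal.add((cause, effect))

    def add_subordination(self, concrete, abstract):
        self.nodes.update((concrete, abstract))
        self.subordination.add((concrete, abstract))

    def causes_of(self, node):
        """Direct causes n_i such that <n_i, node> is a causal edge."""
        return {i for i, j in self.causal if j == node}

    def abstract_of(self, node):
        """Abstract problems n_q to which the concrete problem `node` is subordinate."""
        return {q for p, q in self.subordination if p == node}


# Usage with hypothetical labels: one concrete and one abstract necking problem.
g = DLCKN()
g.add_causal("cause X [part=inner door panel]", "necking [part=inner door panel]")
g.add_causal("cause Y", "necking")
g.add_subordination("necking [part=inner door panel]", "necking")
```

Following a subordination edge and then querying causes is how a user would move from the surface causes of a concrete problem to the causes recorded for its abstract counterpart.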

(3): Parameters in DL-CKN

The parameters of DL-CKN and their meanings are as follows.

Node frequency. This parameter indicates how often a node recurs in the causal network G. For node $n_{i}$, its frequency is denoted as $f(n_{i})$. A larger value of $f(n_{i})$ indicates that the problem represented by node $n_{i}$ occurs more frequently and deserves closer attention.

Edge frequency. This parameter indicates how often an edge recurs in the causal network G. For edge $e_{ij}$, the frequency is denoted as $f(e_{ij})$. A larger value of $f(e_{ij})$ indicates that the causal relationship represented by edge $e_{ij}$ occurs more frequently and deserves closer attention.
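The two parameters can be computed from causal knowledge groups as sketched below. The counting unit is an assumption: here each QPSD record contributes at most 1 to a node or edge, so $f(n_{i})$ and $f(e_{ij})$ are the number of records in which they appear. The linear mapping from frequency to node size or edge thickness, and its 1.0 to 5.0 range, are likewise hypothetical drawing choices.

```python
from collections import Counter


def frequencies(records):
    """f(n_i) and f(e_ij), counted as the number of QPSD records in which a
    node or edge appears; each record is a list of (cause, effect) groups."""
    node_f, edge_f = Counter(), Counter()
    for groups in records:
        edges = set(groups)
        edge_f.update(edges)
        node_f.update({n for edge in edges for n in edge})
    return node_f, edge_f


def scale(freq, lo=1.0, hi=5.0):
    """Linearly map frequencies onto a drawing range (node size or edge width)."""
    fmin, fmax = min(freq.values()), max(freq.values())
    span = (fmax - fmin) or 1  # avoid division by zero when all values are equal
    return {key: lo + (hi - lo) * (value - fmin) / span for key, value in freq.items()}
```

Counting per record rather than per mention keeps a single verbose record from inflating a node; counting every mention would be an equally defensible alternative if records are short.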

3. Methods for Constructing Causal Knowledge Sets

3.1. Construction Method of Domain Vocabulary

QPSD recorded by different industries or enterprises have their own characteristics, and the knowledge they contain has domain-specific characteristics. Because the descriptive terms used in QPSD are inconsistent, a unified vocabulary is needed to describe them. In this study, this unified vocabulary for describing the knowledge in QPSD is called a domain vocabulary, and a domain vocabulary is constructed to resolve the inconsistency of descriptive terms in QPSD.

Some companies have had domain experts build top-down coding systems for defining and describing quality issues, such as Vehicle Fault Coding (VFC). However, QPSD are recorded by front-line employees as text descriptions. Moreover, the descriptions of quality problems in the data tend to be colloquial and simple, and different employees describe the same quality problem in different ways according to their own language habits. In contrast, coding systems defined top-down by domain experts tend to be specialized, standardized, and precise, so they cannot be applied well to such data. In addition, the descriptions that front-line employees use when recording problem contexts may also vary; e.g., when recording the production part, the luggage outer plate is often abbreviated as “outside the line plate”, “outside the line”, etc. Therefore, the domain vocabulary in this study is constructed from an analysis of data characteristics and condensed from the data layer by layer, from the bottom up. The construction process is shown in Figure 4.

The domain vocabulary construction method proposed in this study is condensed from the bottom-up layer by layer, and the specific methods are as follows.

(1): Extract the initial descriptive words of the domain vocabulary

The domain vocabulary source set is a collection of the initial descriptive words of domain vocabulary, which is directly extracted from the QPSD. Since these data are recorded by front-line employees, the descriptions of problems, causes, and problem situations therein are vague and unclear. For example, the inner panel of a car door may be described as “inner door panel”, “inside the car door”, “inside the four doors”, “inside the door”, etc., and it is difficult to directly extract them from the text descriptions.

Fuzzy Mathematics is a mathematical theory and method for studying and dealing with fuzzy phenomena. With the help of membership functions, it can effectively analyze and study uncertain things [55,56]. Therefore, this study draws on the method of Fuzzy Mathematics, designs the membership function based on the characteristics of QPSD, calculates the membership degree of domain vocabulary in the text data, and designs a threshold according to the actual situation of the text data, so as to judge which words in the data are the initial descriptive words of domain vocabulary and then construct the domain vocabulary source set.

The QPSD are a record of the quality problem-solving process, and there are relatively more words describing quality problems, causes, etc. Therefore, phrases or word groups frequently appearing in a single-text record are more likely to be the initial descriptive words of domain vocabulary (Characteristic 1). In the production process of an enterprise, quality problems occur repeatedly. Therefore, the more frequently phrases or word groups appear in multiple text records, the more likely they are to be the initial descriptive words of domain vocabulary (Characteristic 2).

Combining the above characteristics of QPSD, the membership function is designed as follows.

\mathrm{TF}_{word} = \frac{\sum n_{word,d}}{\sum_{k} n_{k,d}}

(2)

\mathrm{PDF}_{word} = \arcsin\left(\frac{\sum D_{word}}{\sum_{m} D_{m}}\right)

(3)

M(\mathrm{MEM}_{word}) = f(\mathrm{MEM}_{word}) \cdot \mathrm{TF}_{word} \cdot \mathrm{PDF}_{word}

(4)

Equation (2) is a single-text word frequency measure designed according to Characteristic 1 of QPSD, where $\mathrm{TF}_{word}$ is the frequency of the phrase or word group “word” in a single text, $n_{word,d}$ is the number of times “word” appears in document $d$, and $n_{k,d}$ is the number of times phrase or word group $k$ appears in document $d$, so that the denominator $\sum_{k} n_{k,d}$ is the total number of phrases or word groups in $d$. Equation (3) measures how often the phrase or word group “word” occurs across multiple texts and is designed according to Characteristic 2 of QPSD, where $\mathrm{PDF}_{word}$ is the multi-text occurrence frequency of “word”, $D_{word}$ represents the documents containing “word”, and $\sum_{m} D_{m}$ is the total number of documents, so the fraction in Equation (3) is the share of documents that contain “word”. In Equation (4), $M(\mathrm{MEM}_{word})$ is the membership degree of the phrase or word group “word” as an initial descriptive word of the domain vocabulary, and $f(\mathrm{MEM}_{word})$ is a harmonic formula, designed according to the actual membership degrees of domain vocabulary, that prevents the membership degree from becoming too large or too small.
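Equations (2) to (4) can be sketched directly. Two assumptions apply: each document is a list of phrases or word groups produced by word segmentation, and, because the paper does not give the harmonic formula $f(\mathrm{MEM}_{word})$, it is modeled here as a hypothetical callable `harmonic` that receives the raw product $\mathrm{TF} \cdot \mathrm{PDF}$ and returns a scaling factor, defaulting to 1.0.

```python
import math
from collections import Counter


def tf(word, doc):
    """Equation (2): occurrences of `word` in document `doc` (a token list)
    divided by the total number of phrases/word groups in `doc`."""
    counts = Counter(doc)
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def pdf(word, corpus):
    """Equation (3): arcsin of the share of documents that contain `word`."""
    containing = sum(1 for doc in corpus if word in doc)
    return math.asin(containing / len(corpus))


def membership(word, doc, corpus, harmonic=lambda raw: 1.0):
    """Equation (4): f(MEM_word) * TF_word * PDF_word.

    `harmonic` is a hypothetical stand-in for the unspecified harmonic formula;
    the default factor of 1.0 leaves the raw product unchanged."""
    raw = tf(word, doc) * pdf(word, corpus)
    return harmonic(raw) * raw
```

Since the document share lies in [0, 1], the arcsin in Equation (3) maps it monotonically onto [0, π/2], so words recurring across many records are rewarded while the value stays bounded.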

Using the above membership function, the domain vocabulary source set is constructed as follows.

The text description in each field of QPSD is composed of words. Suppose a field S in the data can be represented as the set of words w_{i}, that is, S = \{w_{1}, w_{2}, \dots, w_{i}, \dots, w_{n}\}. After data preprocessing and word segmentation of field S, the membership degree of each word w_{i} as domain vocabulary is calculated with the membership function, and the words are sorted by membership degree. The original field S can then be represented as S' = \{w_{i_{1}} | mem_{1}, w_{i_{2}} | mem_{2}, \dots, w_{i_{n}} | mem_{n}\}, where w_{i_{n}} | mem_{n} denotes a word in the field together with its membership degree as domain vocabulary.

According to the specific structure and characteristics of QPSD, a threshold λ_{S} is set: the words in field S whose membership degree as domain vocabulary exceeds λ_{S} are taken as candidate domain vocabulary words and placed in the domain vocabulary source set.
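A sketch of how the ranked representation S' and the threshold λ_{S} could populate the domain vocabulary source set, assuming the membership degrees of each field's words are already available as a dictionary. A single fixed λ_{S} is used here for simplicity, and the helper names `rank_field` and `build_source_set` are hypothetical.

```python
def rank_field(degrees):
    """Return S': (word, membership) pairs sorted by descending membership."""
    return sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)


def build_source_set(fields, threshold):
    """Collect the words whose membership degree exceeds lambda_S.

    fields:    iterable of {word: membership degree} dicts, one per field S
    threshold: lambda_S (a single fixed value in this sketch)
    """
    source_set = set()
    for degrees in fields:
        source_set.update(w for w, mem in rank_field(degrees) if mem > threshold)
    return source_set
```

The comparison is strict, matching “greater than the threshold” in the text, so a word whose membership equals λ_{S} is not added.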

(2): Correction of vocabulary misuse

The QPSD are recorded as free-text descriptions by front-line employees, so the data contain many wrongly written characters and misused words, such as “necking down” and “diameter reduction”, or “crack” and “split crack”. Such errors can seriously distort the calculation of word semantic similarity in the domain vocabulary source set.

Existing text analysis methods already include tools for recognizing and correcting wrong characters and words, such as enchant's check and suggest functions and corrector, all of which perform such corrections well. Therefore, text analysis tools available in environments such as Python and MATLAB can be used to correct wrong characters and misused words in the domain vocabulary source set.
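The case study uses a dedicated corrector (pycorrector with a kenlm language model). As a self-contained stand-in that needs no external model, the sketch below swaps in string similarity from Python's standard `difflib` module: any source-set word missing from a reference lexicon is mapped to its closest lexicon entry if the similarity ratio reaches a cutoff. The lexicon, cutoff, and function name are hypothetical, and unlike a language-model corrector this approach cannot detect Chinese homophone errors.

```python
import difflib


def correct_words(words, lexicon, cutoff=0.8):
    """Map out-of-lexicon words to their closest lexicon entry.

    A string-similarity stand-in for a language-model corrector; it will
    not catch homophone errors. Returns {misused word: corrected word}.
    """
    corrections = {}
    for word in words:
        if word in lexicon:
            continue                      # already a known term
        match = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
        if match:
            corrections[word] = match[0]
    return corrections
```

Words with no sufficiently close lexicon entry are left uncorrected, which in the study's workflow would be a case for manual or expert review.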

(3): Vocabulary semantic clustering

The words in the domain vocabulary source set are diverse. For example, “component cracking” may be described as “component splitting”, “component split”, “split” and so on. Therefore, it is necessary to analyze and cluster different descriptions of the same problem through text analysis methods, such as similarity calculation, synonym analysis, semantic analysis, etc., to unify their descriptions and achieve semantic clustering. The specific method is as follows.

Pairwise compare the words in the domain vocabulary source set, calculate their semantic similarity, and construct a semantic similarity matrix SM, as shown in Equation (5).

SM = \begin{bmatrix} 1 & m_{12} & \cdots & m_{1n} \\ - & 1 & \cdots & m_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ - & - & \cdots & 1 \end{bmatrix}

(5)

In Equation (5), m_{ij} represents the similarity between word i and word j; because similarity is symmetric, only the upper triangle is filled. According to the specific characteristics of the data, set a similarity threshold ϑ, and group words whose similarity exceeds ϑ into one category, thereby achieving semantic clustering of the words in the domain vocabulary source set.
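The threshold clustering step can be sketched with a union-find structure. The text does not say how chains of similar words are handled (A similar to B, B similar to C, but A not similar to C); this sketch assumes they end up in the same category, i.e., categories are the connected components of the graph whose edges are the pairs with m_{ij} > ϑ. The default ϑ = 0.5 matches the case study; the function name and toy matrix are hypothetical.

```python
def cluster_by_similarity(words, sim, threshold=0.5):
    """Group words whose pairwise similarity exceeds the threshold.

    sim: upper-triangular similarity matrix SM as nested lists; only
         sim[i][j] with i < j is read, so the lower triangle may be None.
    """
    parent = list(range(len(words)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if sim[i][j] > threshold:
                parent[find(i)] = find(j)   # merge the two categories

    clusters = {}
    for i, word in enumerate(words):
        clusters.setdefault(find(i), []).append(word)
    return list(clusters.values())
```

For example, “component cracking”, “component splitting”, and “split” fall into one category when their pairwise similarities exceed ϑ, while an unrelated word such as “wrinkle” stays on its own.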

(4): Supplementation and improvement

Verify the constructed domain vocabulary, mainly checking whether any descriptions of problems, situations, or causes have been omitted. If nothing is omitted, the constructed domain vocabulary can accurately and completely describe all quality problems, situations, and causes in the current QPSD; that is, a DL-CKN that accurately describes problems, situations, and causes can be constructed from the current domain vocabulary. Otherwise, methods such as data analysis, literature analysis, and judgment by domain experts can be used to supplement and improve the domain vocabulary until nothing is omitted.

3.2. Extraction Method for Multiple Causal Nodes

QPSD are a detailed record of the quality problem-solving process. When multiple cause–effect relationships involving “and” or “or” occur, they are faithfully recorded in the data, which lays the foundation for extracting multiple cause–effect nodes. For example, if the sheet material is not qualified, it may cause cracking and necking at the same time; the quality problem solver records this by entering “Cracking and necking of the part” in the problem description field and “Material of the sheet is not qualified” in the cause description field. However, the vocabulary front-line employees use to record quality problems is often inconsistent and irregular. Therefore, replacing the irregular descriptions in the data with domain vocabulary is a prerequisite for extracting multiple cause–effect nodes.

In summary, the idea of multi-causal node extraction in this study is as follows: first, the irregular descriptions in QPSD are normalized with domain vocabulary. Then, the “and” and “or” relationships among domain vocabulary terms are identified from “and” and “or” relational connectives and syntactic dependencies, completing the extraction of multiple causal nodes. The specific methods are as follows:

(1): Domain Vocabulary Substitution

Let W = \{W_{1}, W_{2}, \dots, W_{m}\} and W_{i} = \{w_{1}^{i}, w_{2}^{i}, \dots, w_{n_{i}}^{i}\}, where W is the set of domain vocabulary, W_{i} is a single domain vocabulary term, and \{w_{1}^{i}, w_{2}^{i}, \dots, w_{n_{i}}^{i}\} is the domain vocabulary source set of W_{i}, i.e., the words in the QPSD that describe the problem or cause corresponding to W_{i}. Thus, the set of domain vocabulary can be written as Equation (6).

W = \begin{bmatrix} W_{1} \\ W_{2} \\ \vdots \\ W_{m} \end{bmatrix} = \begin{bmatrix} w_{1}^{1} & w_{2}^{1} & \cdots & w_{n_{1}}^{1} \\ w_{1}^{2} & w_{2}^{2} & \cdots & w_{n_{2}}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ w_{1}^{m} & w_{2}^{m} & \cdots & w_{n_{m}}^{m} \end{bmatrix}

(6)

The text sentences in QPSD are S = \{S_{1}, S_{2}, \dots, S_{p}\} with S_{j} = \{s_{1}^{j}, s_{2}^{j}, \dots, s_{q_{j}}^{j}\}, where S is the set of text sentences in the QPSD, S_{j} is a sentence describing a problem or cause, and s_{1}^{j}, s_{2}^{j}, \dots, s_{q_{j}}^{j} are the words in sentence S_{j}. Thus, the text sentences can be written as Equation (7).

S = \begin{bmatrix} S_{1} \\ S_{2} \\ \vdots \\ S_{p} \end{bmatrix} = \begin{bmatrix} s_{1}^{1} & s_{2}^{1} & \cdots & s_{q_{1}}^{1} \\ s_{1}^{2} & s_{2}^{2} & \cdots & s_{q_{2}}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1}^{p} & s_{2}^{p} & \cdots & s_{q_{p}}^{p} \end{bmatrix}

(7)

Because the domain vocabulary is constructed from QPSD, some words in the sentences must satisfy s_{k}^{j} = w_{h}^{i}, as shown in Equation (8).

S = \begin{bmatrix} S_{1} \\ S_{2} \\ \vdots \\ S_{p} \end{bmatrix} = \begin{bmatrix} s_{1}^{1} & s_{2}^{1} & \cdots & s_{q_{1}}^{1} \\ s_{1}^{2} & s_{2}^{2} & \cdots & s_{q_{2}}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1}^{p} & s_{2}^{p} & \cdots & s_{q_{p}}^{p} \end{bmatrix} = \begin{bmatrix} s_{1}^{1} & w_{h_{12}}^{i_{12}} & \cdots & s_{q_{1}}^{1} \\ w_{h_{21}}^{i_{21}} & s_{2}^{2} & \cdots & s_{q_{2}}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1}^{p} & s_{2}^{p} & \cdots & w_{h_{pt}}^{i_{pt}} \end{bmatrix}

(8)

Further, each source word w_{h}^{i} appearing in a sentence is replaced by its domain vocabulary term W_{i}, as shown in Equation (9).

S = \begin{bmatrix} S_{1} \\ S_{2} \\ \vdots \\ S_{p} \end{bmatrix} = \begin{bmatrix} s_{1}^{1} & w_{h_{12}}^{i_{12}} & \cdots & s_{q_{1}}^{1} \\ w_{h_{21}}^{i_{21}} & s_{2}^{2} & \cdots & s_{q_{2}}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1}^{p} & s_{2}^{p} & \cdots & w_{h_{pt}}^{i_{pt}} \end{bmatrix} = \begin{bmatrix} s_{1}^{1} & W_{i_{12}} & \cdots & s_{q_{1}}^{1} \\ W_{i_{21}} & s_{2}^{2} & \cdots & s_{q_{2}}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ s_{1}^{p} & s_{2}^{p} & \cdots & W_{i_{pt}} \end{bmatrix}

(9)

In this case, the irregular text descriptions in the text sentences have been replaced by domain vocabulary, and the “and” and “or” relationships between domain vocabulary can be identified based on the “and” and “or” relational connectives and syntactic dependencies.
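Equations (6) to (9) amount to a lookup from every source word w_{h}^{i} to its domain vocabulary term W_{i}. A minimal sketch, assuming sentences are already segmented so that multi-word source phrases appear as single tokens; `build_lookup` and `normalize` are hypothetical names.

```python
def build_lookup(domain_vocab):
    """Invert {W_i: [w_1^i, ..., w_{n_i}^i]} into {source word: W_i}."""
    lookup = {}
    for term, sources in domain_vocab.items():
        for source in sources:
            lookup[source] = term
    return lookup


def normalize(sentence, lookup):
    """Replace each source word in a segmented sentence by its domain term (Eq. 9)."""
    return [lookup.get(token, token) for token in sentence]
```

Tokens that are not in any source set, such as connectives, are left unchanged, which keeps the “and”/“or” words available for the extraction rules that follow.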

(2): Extraction of Multiple Causal Nodes

According to the existing studies [57,58], and combined with the characteristics of QPSD, the multi-causal node expression pattern containing “and”, “or” and other relationships can be summarized, as shown in Table 1.

Sentences are composed of words. Let a normalized text sentence Sen in QPSD be represented as a set of ordinary words and domain vocabulary terms, i.e., Sen = \{w_{1}, w_{2}, \dots, w_{i}, \dots, w_{n}\}, where w_{i} denotes an ordinary word or a domain vocabulary term in the sentence. After lexical and syntactic analysis, the sentence Sen can be represented by Equation (10).

Sen' = \{w_{1} \leftrightarrow w_{j_{1}} | syn_{1}, w_{2} \leftrightarrow w_{j_{2}} | syn_{2}, \dots, w_{i} \leftrightarrow w_{j_{i}} | syn_{i}, \dots, w_{n} \leftrightarrow w_{j_{n}} | syn_{n}\}

(10)

In Equation (10), w_{i} \leftrightarrow w_{j_{i}} | syn_{i} indicates that the word w_{i} in the sentence has the syntactic dependency syn_{i} with another word w_{j_{i}}.

Let W be the set of domain vocabulary, LRW^{and} the set of “and” relationship feature words, and LRW^{or} the set of “or” relationship feature words; L^{and} denotes an “and” relationship, and L^{or} denotes an “or” relationship.

According to the characteristics of multi-causal expressions in QPSD, the corresponding multi-causal node extraction rules can be designed:

Rule^{and}: the extraction rule for the “and” (\wedge) relationship can be represented by Equation (11).

Rule^{and} = \left[\begin{array}{l} \text{if } \left(w_{i} \in LRW^{and} \wedge w_{i-1} \leftrightarrow w_{i+1} | syn_{i} = Coo \vee App\right) \wedge \left(w_{i-1} \in W \wedge w_{i+1} \in W\right) \\ \text{then } (w_{i-1} \leftrightarrow w_{i+1}) \equiv L^{and} \end{array}\right]

(11)

In Equation (11), (w_{i-1} \leftrightarrow w_{i+1}) \equiv L^{and} indicates that there is an “and” relationship between the domain vocabulary terms w_{i-1} and w_{i+1}.

Rule^{or}: the extraction rule for the “or” (\vee) relationship can be represented by Equation (12).

Rule^{or} = \left[\begin{array}{l} \text{if } \left(w_{i} \in LRW^{or} \wedge w_{i-1} \leftrightarrow w_{i+1} | syn_{i} = Coo \vee App\right) \wedge \left(w_{i-1} \in W \wedge w_{i+1} \in W\right) \\ \text{then } (w_{i-1} \leftrightarrow w_{i+1}) \equiv L^{or} \end{array}\right]

(12)

In Equation (12), (w_{i-1} \leftrightarrow w_{i+1}) \equiv L^{or} indicates that there is an “or” relationship between the domain vocabulary terms w_{i-1} and w_{i+1}.

Let the normalized sentence be Sen_{j} = \{w_{1}^{j}, w_{2}^{j}, \dots, w_{i}^{j}, \dots, w_{n}^{j}\}, where w_{i}^{j} is an ordinary word or domain vocabulary term composing Sen_{j}. According to the above multi-causal node extraction rules, the multi-causal nodes contained in Sen_{j} are extracted as shown in Equation (13).

Sen_{j} = \begin{cases} w_{i-1} \wedge w_{i+1}, & Sen_{j} ≓ Rule^{and} \\ w_{i-1} \vee w_{i+1}, & Sen_{j} ≓ Rule^{or} \end{cases}

(13)

In Equation (13), w_{i-1} \wedge w_{i+1} indicates an “and” relationship between the domain vocabulary terms w_{i-1} and w_{i+1}, and w_{i-1} \vee w_{i+1} indicates an “or” relationship between them; Sen_{j} ≓ Rule^{and} indicates that sentence Sen_{j} conforms to the extraction rule for the “and” relation, and Sen_{j} ≓ Rule^{or} indicates that it conforms to the extraction rule for the “or” relation.

The extraction of multiple causal nodes can be completed by traversing all sentences in the QPSD, laying the foundation for the subsequent construction of causal knowledge groups.
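Rules (11) to (13) can be sketched as a scan over a normalized sentence. The dependency parse is assumed to come from an external parser and is represented here as a hypothetical dictionary mapping a token-index pair to a relation label; the feature-word sets standing in for LRW^{and} and LRW^{or} are illustrative English placeholders, not the study's actual lists.

```python
# Illustrative feature-word sets (hypothetical placeholders for LRW^and / LRW^or).
AND_WORDS = {"and", "as well as"}
OR_WORDS = {"or"}
LINK_LABELS = {"Coo", "App"}   # dependency labels accepted by Rules (11) and (12)


def extract_multi_causal(tokens, deps, domain_vocab):
    """Apply Rule^and / Rule^or to one normalized sentence Sen_j.

    tokens:       list of words / domain terms in the sentence
    deps:         {(a, b): label} dependency between tokens a and b (a < b),
                  assumed to be produced by an external dependency parser
    domain_vocab: set W of domain vocabulary terms
    Returns a list of (relation, left term, right term) triples.
    """
    found = []
    for i in range(1, len(tokens) - 1):
        left, word, right = tokens[i - 1], tokens[i], tokens[i + 1]
        # w_{i-1} and w_{i+1} must be linked by a Coo or App dependency ...
        if deps.get((i - 1, i + 1)) not in LINK_LABELS:
            continue
        # ... and both must be domain vocabulary terms.
        if left not in domain_vocab or right not in domain_vocab:
            continue
        if word in AND_WORDS:
            found.append(("and", left, right))
        elif word in OR_WORDS:
            found.append(("or", left, right))
    return found
```

Running this over every normalized sentence corresponds to the traversal described above.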

3.3. Construction Algorithm of Causal Knowledge Sets

The construction of domain vocabulary and the extraction of multiple causal nodes lay the foundation for constructing causal knowledge groups. Because the concrete causal knowledge network layer in DL-CKN contains problem contexts, contextual factors must be considered when constructing causal knowledge groups.

QPSD are recorded according to the quality problem resolution process, so the data contain structured fields such as “Production Part”, “Product Model”, “Problem Description”, and “Cause Analysis”. Some fields, such as “Production Part” and “Product Model”, describe the quality problem situation, while others, such as “Problem Description” and “Cause Analysis”, stand in a cause–effect relationship to each other. Therefore, using the cause–effect relationship between these fields, the extraction method for multiple causal nodes, and the problem contexts in the data, a causal knowledge group containing multiple causal relationships and contextual elements can be constructed.

Let the QPSD after domain vocabulary replacement be S = \{S_{1}, S_{2}, \dots, S_{j}, \dots, S_{m}\}, where S_{j} is the data of one quality problem record, consisting mainly of quality problem context fields (e.g., product model and production part), a problem description field, and a cause analysis field. The record can therefore be expressed as S_{j} = \{S_{j}^{Ps}, S_{j}^{Pb}, S_{j}^{Cau}\}, where S_{j}^{Ps}, S_{j}^{Pb}, and S_{j}^{Cau} are the problem situation fields, problem description field, and cause analysis field of record S_{j}, respectively.

In summary, the causal knowledge group construction algorithm is shown in Algorithm 1.

Algorithm 1. The causal knowledge group construction algorithm

With the above algorithm, a causal knowledge set containing multiple causal and contextual elements can be constructed.
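Algorithm 1 itself is not reproduced in this text, so the following is only a hypothetical sketch of one plausible reading. For a single record S_{j}, terms joined by “and” are combined into one joint node, and terms joined by “or” are split into alternative nodes, following the splitting and combination methods shown in the appendix figures. Every cause node is then linked to every effect node together with the record's problem situation S_{j}^{Ps}. The record layout and function names are assumptions.

```python
from itertools import product


def to_nodes(expr):
    """Turn (operator, terms) into causal nodes.

    'and' -> one joint node (terms combined); anything else ('or' or a single
    term) -> one node per term (alternatives split into separate nodes).
    """
    op, terms = expr
    if op == "and":
        return [" & ".join(terms)]
    return list(terms)


def causal_group(record):
    """Build (cause, effect, context) triples for one normalized QPSD record S_j."""
    context = tuple(sorted(record["Ps"].items()))   # S_j^Ps, made hashable
    causes = to_nodes(record["Cau"])                # S_j^Cau
    effects = to_nodes(record["Pb"])                # S_j^Pb
    return [(c, e, context) for c, e in product(causes, effects)]
```

With effect (“and”, [crack, necking]) and cause (“or”, [two alternative material causes]), the sketch yields two edges, each pointing to the joint node “crack & necking”.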

4. Construction Method of DL-CKN

4.1. Methods of Constructing Causal Knowledge Networks

The above steps produce causal knowledge groups containing multiple causal relationships and contextual factors, which lay the foundation for constructing a causal knowledge network: all identical nodes across the causal knowledge groups are merged, and the frequencies of nodes and edges are counted.

The specific method is as follows.

Let G^{μ} and G^{η} be two causal knowledge groups, each containing multiple causal and contextual elements, constructed from two QPSD records with the method described in this study, as shown in Equations (14) and (15), respectively.

G^{μ} = (N^{μ}, E^{μ}, P^{μ})

(14)

G^{η} = (N^{η}, E^{η}, P^{η})

(15)

In Equation (14), N^{μ}, E^{μ}, and P^{μ} denote the sets of nodes, edges, and parameters in G^{μ}, respectively. In Equation (15), N^{η}, E^{η}, and P^{η} denote the sets of nodes, edges, and parameters in G^{η}, respectively.

Let n_{i}^{μ}, n_{j}^{μ}, n_{i^{*}}^{μ}, n_{j^{*}}^{μ} denote nodes in G^{μ}, and n_{t}^{η}, n_{k}^{η} denote nodes in G^{η}. Let e_{ij}^{μ}, e_{i^{*}j^{*}}^{μ} be edges in G^{μ} and e_{tk}^{η} an edge in G^{η}, where e_{ij}^{μ} \in E^{μ}, e_{ij}^{μ} = \langle n_{i}^{μ}, n_{j}^{μ} \rangle denotes the directed edge from n_{i}^{μ} to n_{j}^{μ} in G^{μ}; e_{i^{*}j^{*}}^{μ} \in E^{μ}, e_{i^{*}j^{*}}^{μ} = \langle n_{i^{*}}^{μ}, n_{j^{*}}^{μ} \rangle denotes the directed edge from n_{i^{*}}^{μ} to n_{j^{*}}^{μ} in G^{μ}; and e_{tk}^{η} \in E^{η}, e_{tk}^{η} = \langle n_{t}^{η}, n_{k}^{η} \rangle denotes the directed edge from n_{t}^{η} to n_{k}^{η} in G^{η}.

The nodes in G^{μ} and G^{η} are not yet merged, and every node frequency and edge frequency takes the initial value 1, i.e., f^{μ}(n_{i}^{μ}) = f^{μ}(n_{i^{*}}^{μ}) = f^{μ}(n_{j}^{μ}) = f^{μ}(n_{j^{*}}^{μ}) = 1, f^{η}(n_{t}^{η}) = f^{η}(n_{k}^{η}) = 1, and f^{μ}(e_{ij}^{μ}) = f^{μ}(e_{i^{*}j^{*}}^{μ}) = f^{η}(e_{tk}^{η}) = 1, where the node and edge frequencies f^{μ}(\cdot) of G^{μ} belong to P^{μ}, and the node and edge frequencies f^{η}(\cdot) of G^{η} belong to P^{η}.

For nodes, if n_{i}^{μ} and n_{i^{*}}^{μ} are the same node split by an “or” relationship within the same data record, their frequency does not change when they are merged. The reason is that the problem or cause actually occurred only once when the quality problem was solved; it appears as multiple nodes only because the “or” relationship splitting method described in this study duplicated it.

In this case, nodes are merged as shown in Equation (16).

\begin{cases} n_{ii^{*}}^{μ} = n_{i}^{μ} \oplus n_{i^{*}}^{μ} \\ f^{μ}(n_{ii^{*}}^{μ}) = f^{μ}(n_{i}^{μ}) \end{cases} \quad (n_{j}^{μ}, n_{j^{*}}^{μ}) \equiv L^{or}

(16)

In Equation (16), the symbol \oplus denotes merging node n_{i}^{μ} with node n_{i^{*}}^{μ}, and (n_{j}^{μ}, n_{j^{*}}^{μ}) \equiv L^{or} indicates an “or” (\vee) relationship between n_{j}^{μ} and n_{j^{*}}^{μ}.

Across different data records, if node n_{i}^{μ} is the same as node n_{t}^{η}, the two nodes are merged into a new node n_{it}^{μη}, and its frequency f^{μη}(n_{it}^{μη}) is updated as shown in Equation (17).

\begin{cases} n_{it}^{μη} = n_{i}^{μ} \oplus n_{t}^{η} \\ f^{μη}(n_{it}^{μη}) = f^{μ}(n_{i}^{μ}) + f^{η}(n_{t}^{η}) \end{cases}

(17)

For edges, if edge e_{ij}^{μ} is the same as edge e_{tk}^{η}, i.e., the starting point n_{i}^{μ} of e_{ij}^{μ} is the same as the starting point n_{t}^{η} of e_{tk}^{η}, and the endpoint n_{j}^{μ} of e_{ij}^{μ} is the same as the endpoint n_{k}^{η} of e_{tk}^{η}, then e_{ij}^{μ} can be merged with e_{tk}^{η} to form e_{ij}^{μη}, and its frequency f^{μη}(e_{ij}^{μη}) is updated as shown in Equation (18).

\begin{cases} e_{ij}^{μη} = e_{ij}^{μ} \oplus e_{tk}^{η} \\ f^{μη}(e_{ij}^{μη}) = f^{μ}(e_{ij}^{μ}) + f^{η}(e_{tk}^{η}) \end{cases}

(18)

The construction of the causal knowledge network can be completed by traversing all the constructed causal knowledge groups using the above method.
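The merging rules in Equations (16) to (18) reduce to frequency counting. In this minimal sketch, each causal knowledge group is a list of (cause, effect) edges from one record. Counting each distinct node and edge once per group reproduces Equation (16), since an “or”-duplicated node keeps frequency 1 within its record, and summing across groups reproduces Equations (17) and (18). For the concrete layer, the situation would be included in each node label. The function name is hypothetical.

```python
from collections import Counter


def build_network(groups):
    """Merge causal knowledge groups into one network with node and edge frequencies.

    groups: list of groups; each group is the list of (cause, effect) edges
    built from one QPSD record.
    """
    node_freq, edge_freq = Counter(), Counter()
    for edges in groups:
        # Within one record, duplicates created by 'or' splitting count once (Eq. 16).
        nodes = {node for edge in edges for node in edge}
        node_freq.update(nodes)          # identical nodes across records add up (Eq. 17)
        edge_freq.update(set(edges))     # identical edges across records add up (Eq. 18)
    return node_freq, edge_freq
```

Using `Counter` keeps the merge order-independent, so records can be traversed in any order.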

4.2. Algorithm for the Construction of Species Relationships for Inter-Level Nodes

Using the above causal knowledge network construction method, the concrete and abstract causal knowledge networks can be constructed, with the granularity of the abstract network chosen on the advice of domain experts and the problem situations selected according to actual production conditions. DL-CKN is then completed by establishing the species relationships between nodes in the concrete layer and the abstract layer.

Granularity screening of the abstract causal knowledge network is essentially a matter of inducing and integrating the different manifestations of a quality problem. Let G_{μ}^{ACN} = (N_{μ}^{ACN}, E_{μ}^{ACN}, P_{μ}^{ACN}) be the constructed abstract causal knowledge network with node set N_{μ}^{ACN} = \{n_{μ1}^{ACN}, n_{μ2}^{ACN}, \dots, n_{μζ}^{ACN}, \dots, n_{μp}^{ACN}\}. Any node can be written as n_{μζ}^{ACN} = \{\overset{\smile}{W}_{μζ}^{ACN} : \{W_{μζ_{1}}^{ACN}, W_{μζ_{2}}^{ACN}, \dots, W_{μζ_{f}}^{ACN}, \dots, W_{μζ_{s}}^{ACN}\}\}, where \overset{\smile}{W}_{μζ}^{ACN} is the vocabulary describing the problem represented by quality problem node n_{μζ}^{ACN} after granularity screening, and \{W_{μζ_{1}}^{ACN}, W_{μζ_{2}}^{ACN}, \dots, W_{μζ_{s}}^{ACN}\} is the set of initial domain vocabulary terms generalized and integrated into node n_{μζ}^{ACN}; the two are associated in dictionary form.

Let G_{ϑ}^{RCN} = (N_{ϑ}^{RCN}, E_{ϑ}^{RCN}, P_{ϑ}^{RCN}) be the constructed concrete causal knowledge network with node set N_{ϑ}^{RCN} = \{n_{ϑ1}^{RCN}, n_{ϑ2}^{RCN}, \dots, n_{ϑi}^{RCN}, \dots, n_{ϑn}^{RCN}\}, where each node n_{ϑi}^{RCN} = [W_{ϑi}^{RCN-Pb}, W_{ϑi}^{RCN-Ps}] consists of a domain vocabulary term describing the quality problem and a domain vocabulary term representing the problem situation.

Based on the above definitions, the algorithm for species relationships between nodes in DL-CKN is shown in Algorithm 2.

Algorithm 2. Algorithm for inter-node species relationship

The above algorithm can construct the species relationship between the nodes, which will link the concrete causal knowledge network with the abstract causal knowledge network and complete the construction of DL-CKN.
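Algorithm 2 is not reproduced here, so the sketch below shows only one assumed way to use the dictionary form of abstract nodes. It inverts each abstract node's set of initial domain terms, then attaches every concrete node [W^{RCN-Pb}, W^{RCN-Ps}] to the abstract node whose set contains its problem term. The data layout and function name are hypothetical.

```python
def link_levels(abstract_nodes, concrete_nodes):
    """Attach concrete nodes to the abstract nodes that generalize them.

    abstract_nodes: {abstract term: set of initial domain terms it integrates}
    concrete_nodes: list of (problem term, situation) pairs, i.e. [W^{RCN-Pb}, W^{RCN-Ps}]
    Returns {abstract term: [concrete nodes it subsumes]}.
    """
    # Invert the dictionary form: initial domain term -> abstract node.
    parent_of = {term: abstract
                 for abstract, terms in abstract_nodes.items()
                 for term in terms}
    links = {}
    for problem, situation in concrete_nodes:
        parent = parent_of.get(problem)
        if parent is not None:           # unmatched nodes stay unlinked
            links.setdefault(parent, []).append((problem, situation))
    return links
```

For instance, if “tearing” is integrated into the abstract node “crack”, a concrete “tearing” node for any car model and part becomes a child of “crack”.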

5. Case Study

5.1. Case Background

The automotive industry is closely linked to many upstream and downstream industries and strongly drives their development, and about 60–70% of the metal parts in automobile manufacturing are produced by stamping [59]. Given this, this study takes the quality problems encountered in the stamping production of an automobile manufacturer as an example, constructs a DL-CKN for practical application, and verifies the effectiveness, rationality, and scientific soundness of the proposed method. In body-in-white stamping, quality problems such as cracks, creases, and galling often occur.

To solve quality problems and improve production efficiency and product quality, front-line employees in the stamping workshop usually record the process of solving quality problems in the “Product Quality Problem-Solving Record Form for Stamping Workshop”. This form mainly contains fields such as “production part”, “production model”, “fault description”, and “cause analysis”, which hold a great deal of empirical knowledge. However, because each person describes problems differently, the descriptions in the records vary, which makes constructing DL-CKN very difficult.

The “Product Quality Problem-Solving Record Form for Stamping Workshop” obtained in this study contains 3997 QPSD records. The “Production Part” and “Production Model” fields contain the problem situation, and the “Fault Description” and “Cause Analysis” fields contain the cause–effect knowledge, which meets the basic requirements for building DL-CKN. Some of the 3997 records are incomplete: 483 records fill in only the “Fault Description” field but not the “Cause Analysis” field; 329 records fill in only the “Cause Analysis” field but not the “Fault Description” field; 42 records fill in neither field; 23 records leave the “Production Part” field empty; and 19 records leave the “Production Model” field empty. These incomplete sets overlap in 42 records, so 483 + 329 + 42 + 23 + 19 - 42 = 854 records are incomplete in total.

The main purpose of the case study in this section is to verify the DL-CKN construction method proposed in this study. Therefore, records in the “Product Quality Problem-Solving Record Form for Stamping Workshop” with incomplete cause–effect information or incomplete problem situations are excluded, and the remaining 3143 complete records are used to verify whether the proposed method is scientific, reasonable, and effective in helping enterprise employees analyze quality problems comprehensively and in depth.

5.2. Building Domain Vocabulary

According to the domain vocabulary construction method outlined in this paper and the characteristics of the QPSD used, the harmonic function is set to a constant value of 10, that is, f(MEM_{word}) = 10. Then, in a Python 3.6 environment, we performed word segmentation on the QPSD and wrote a program to compute the domain vocabulary membership degree of each word. Some of the results are presented in Table 2.

Because quality problems vary in complexity, the number of words differs greatly across fields in the “Product Quality Problem-Solving Record Form for Stamping Workshop”. Some fields contain only one word (e.g., some “Problem Description” fields contain only the word “crack”), while others contain more than 100 words, and the initial descriptive words of domain vocabulary also differ between texts of different lengths. Therefore, this paper adopts a tiered approach that sets the domain vocabulary membership threshold λ_{S} according to the number of words in each field, as outlined in Equation (19).

λ_{S} = \begin{cases} 0.6, & n_{w} \in (0, 5] \\ 0.5, & n_{w} \in (5, 20] \\ 0.45, & n_{w} \in (20, 50] \\ 0.4, & n_{w} \in (50, 90] \\ 0.35, & n_{w} \in (90, +\infty) \end{cases}

(19)

In Equation (19), n_{w} is the number of words in the field, and λ_{S} is the membership threshold for domain vocabulary.
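Equation (19) is a step function over right-closed intervals, which can be implemented with a binary search over the interval upper bounds. A short sketch; the function and constant names are hypothetical.

```python
import bisect

# Right-closed interval upper bounds of n_w and the matching lambda_S values (Eq. 19).
UPPER_BOUNDS = [5, 20, 50, 90]
THRESHOLDS = [0.6, 0.5, 0.45, 0.4, 0.35]


def membership_threshold(n_w):
    """Return lambda_S for a field containing n_w words."""
    if n_w <= 0:
        raise ValueError("a field must contain at least one word")
    # bisect_left places a value equal to a bound in that bound's interval,
    # which matches the right-closed intervals (0,5], (5,20], ...
    return THRESHOLDS[bisect.bisect_left(UPPER_BOUNDS, n_w)]
```

Longer fields thus receive lower thresholds, so more candidate words survive from long descriptions than from one-word entries.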

Applying the selected threshold λ_{S}, a total of 19,219 words were screened out as initial descriptive words of domain vocabulary, and word frequency statistics were computed for them. Some of the results are shown in Table 3.

After word frequency statistics, 1343 distinct initial descriptors of domain vocabulary remained, of which “cracking” was the most frequent, appearing 638 times.

Some misspellings and misused words are inevitable when QPSD are entered. In the data selected for this study, most misused words are common homophones; for example, “crack” and “insert” were sometimes recorded with homophonous but incorrect characters.

Existing Chinese text error correction tools can correct such vocabulary misuse well. Therefore, in this study, we used the pycorrector Chinese text error correction tool in a Python 3.6 environment, loaded the kenlm language model, and checked and corrected the 1343 initial descriptors of domain vocabulary, finding 139 misused words in total; some results are shown in Table 4.

The words in the misuse-corrected domain vocabulary source set were compared pairwise for semantic similarity, and a semantic similarity matrix was constructed, part of which is shown in Table 5.

Based on the data characteristics, the similarity threshold ϑ = 0.50 was selected for semantic clustering. Combining commonly used words and expert opinions, 823 domain vocabulary terms were obtained, including 685 domain terms indicating quality problems or causes and 138 problem context domain terms. The construction of some domain vocabulary terms and the results are shown in Table 6.

5.3. Extraction of Multiple Causal Nodes

Following the method described in this study, the irregular descriptions in the data were first replaced with domain vocabulary, and multi-causal nodes were then extracted with the multi-causal node extraction rules. A total of 518 multi-causal nodes were extracted, including 243 for the “and” relationship and 275 for the “or” relationship; some results are shown in Table 7.

5.4. Construction of Causal Knowledge Sets

On the basis of the extracted multiple causal nodes and following the causal knowledge group construction algorithm described in this study, a causal knowledge group containing multiple causal relationships and contextual elements can be constructed, as shown in Figure 5. Because the many problem contexts are difficult to display, PS is used to denote the context set.

5.5. Examples of Construction and Application of DL-CKN

To construct DL-CKN, the abstract and concrete causal knowledge networks are first built with the causal knowledge network construction method described in this study. The two networks are then linked with the algorithm for subordination relationships between nodes at different levels, completing the construction of DL-CKN.

(1): Construction of the abstract causal knowledge network

When constructing the abstract causal knowledge network, this study selects the granularity of nodes in combination with the actual situation and the suggestions of domain experts. For example, problems such as tearing and pulling cracks are all classified as crack problems, and the corresponding abstract causal knowledge network is constructed, as shown in Figure 6.

The abstract causal knowledge network can help not only to analyze the deep-seated causes of a problem (“seeking causes from effects”) but also to predict the further consequences the problem may cause (“predicting effects from causes”). Because the constructed abstract causal network has many nodes and complex causal relationships, it is hard to display intuitively. Taking the necking problem as an example, this study extracts part of the abstract causal network, as shown in Figure 7.

The part extracted in Figure 7 includes two layers of causes of the necking problem (its direct causes and the causes that directly trigger those direct causes), as well as the problems that necking directly leads to and the problems it can further trigger. For greater clarity, this study reset the colors and positions of the nodes and edges in Figure 7. In addition, the frequencies of nodes and edges change in the subnetwork extracted around the necking problem, so Figure 7 differs from Figure 6.

The abstract causal knowledge network of the necking problem shows 27 causes that can directly lead to necking. The edge frequencies show that wrinkling and damage to the binder face are the main causes, with “seeking causes from effects” edge frequencies of 37 and 12, respectively, so these causes can be checked first when the problem is discovered. Non-compliance with the binder face clearance and insufficient hardness of the casting are secondary causes, each with an edge frequency of 1. The abstract causal knowledge network can also be used to analyze the causes of problems such as wrinkling and damage to the binder face, revealing deeper causes of necking. For example, wrinkling has 39 causes, of which uneven material flow and dirt on the binder face are the main ones, with edge frequencies of 21 and 20, respectively; damage to the binder face has eight causes, of which local damage to the die face is the main one, with an edge frequency of 13. The network further shows that necking may trigger problems such as cracks and galling, with edge frequencies of 74 and 27, respectively, so preventive attention can be paid to them in advance. Multiple causes also exist; for example, chip accumulation in the rib groove and damage to the binder face may jointly lead to necking.

As Figure 7 shows, every node in the abstract causal knowledge network of the necking problem except cracks and galling can trigger necking, and only two layers of causes are shown. In the actual data, the cause chains of some problems reach six layers or more. The causal relationships behind quality problems are therefore complex and variable, and a comprehensive, in-depth analysis is difficult when relying solely on personal experience.

(2): Construction of the concrete causal knowledge network

In constructing the concrete causal knowledge network, this study, based on the actual production situation and the suggestions of on-site workers, uses the car model and the production part together as the problem situation, as shown in Figure 8.

Adding car models and production parts as situational elements allows the concrete causal knowledge network to analyze quality problems under different situations, although it also makes the nodes relatively scattered. For ease of display, this study again takes the necking problem as an example and intercepts part of the concrete causal knowledge network. Because the role of the concrete causal knowledge network is to identify the direct causes of quality problems in specific situations more accurately, the intercepted part contains only the direct causes of the necking problem, as shown in Figure 9.

For ease of display, this study also reset the colors and positions of the nodes and edges in Figure 9. In addition, the node and edge frequencies change when the concrete causal knowledge network is intercepted around the necking problem, so Figure 9 differs from Figure 8.

The abstract causal knowledge network of the necking problem (Figure 7) shows that wrinkling and damage to the binder face are its main causes. The intercepted concrete causal knowledge network (Figure 9), however, shows that the dominant cause differs by situation: necking on the door inner panels of the T6 and U5 models is mainly caused by wrinkling; necking on the door inner panels of the U7 model and on the left side panels of the U5 and F3 models is mainly caused by damage to the binder face; and on the door inner panels of the T5 model, wrinkling and damage to the binder face are equally likely to trigger necking, each with an edge frequency of 6. Thus, although all of these are necking problems, their causes vary with the situation. The concrete causal knowledge network helps front-line employees quickly narrow down the possible causes for a given problem situation, achieving "specific analysis" of "specific problems" without examining each of the 27 causes of necking (Figure 7), so that containment measures can be formulated quickly to keep the problem from worsening or leading to more serious consequences. The concrete causal knowledge network in this example also contains multi-causal nodes; for example, when producing the door inner panels of the U7 model, either “material properties are not qualified” or “sheet material does not conform” may lead to cracks and necking occurring simultaneously.
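The situation-specific lookup can be sketched in the same way, assuming each concrete edge is keyed by a (car model, production part) situation in addition to its cause and effect. Only the two T5 frequencies are taken from the text; the T6 and U7 counts are hypothetical placeholders, and `situational_causes` is an illustrative name rather than part of the study's implementation.

```python
from typing import Dict, List, Tuple

Situation = Tuple[str, str]       # (car model, production part)
Key = Tuple[Situation, str, str]  # (situation, cause, effect)

# Only the two T5 frequencies (6 and 6) are quoted in the text; the T6 and U7
# counts below are hypothetical placeholders used purely for illustration.
CONCRETE_EDGES: Dict[Key, int] = {
    (("T5", "door inner panel"), "wrinkling", "necking"): 6,
    (("T5", "door inner panel"), "damage to the binder face", "necking"): 6,
    (("T6", "door inner panel"), "wrinkling", "necking"): 3,                  # hypothetical
    (("U7", "door inner panel"), "damage to the binder face", "necking"): 4,  # hypothetical
}


def situational_causes(edges: Dict[Key, int], situation: Situation,
                       effect: str) -> List[Tuple[str, int]]:
    """Direct causes of `effect` recorded in one problem situation, most frequent first."""
    hits = [(cause, freq) for (sit, cause, eff), freq in edges.items()
            if sit == situation and eff == effect]
    return sorted(hits, key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    print(situational_causes(CONCRETE_EDGES, ("T5", "door inner panel"), "necking"))
```

Restricting the query to one situation is what lets an employee inspect only the causes recorded for that car model and part instead of all 27 causes of necking in the abstract layer.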

(3): Construction of DL-CKN

Using the algorithm described in this study for constructing subordination relationships between nodes at different levels, the abstract and concrete causal knowledge networks are linked to form DL-CKN. For ease of distinction, this study recolored the nodes of both networks: the concrete causal knowledge network forms the lower layer and is shown in pink, while the abstract causal knowledge network forms the upper layer and is shown in light green (Figure 10).

DL-CKN can help front-line employees analyze quality problems comprehensively and in depth. In the DL-CKN constructed in this study, nodes such as cracks, wrinkles, and necking occur with relatively high frequencies (see Figure 6), indicating that these quality problems are frequent in the stamping production of automobile body-in-white parts and deserve particular attention.

DL-CKN integrates the abstract causal knowledge network and the concrete causal knowledge network. When a quality problem occurs, the concrete causal knowledge network can first be used to quickly find the surface causes of the problem and formulate temporary containment measures. The subordination relationships among nodes then identify the category to which the problem belongs, and a transition to the upper level allows the causal relationships behind that category to be analyzed in depth in the abstract causal knowledge network, so that the problem can be solved effectively in the long term. The two layers of DL-CKN are thus independent of each other yet closely related, and together they help front-line employees analyze quality problems comprehensively and in depth from different perspectives. As production continues and more QPSD accumulate, DL-CKN can be further refined and will better support front-line employees in analyzing the causal relationships of quality problems.
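A rough sketch of this two-step workflow is given below. It assumes, hypothetically, that the subordination relationship maps a concrete cause node to the abstract node with the same normalized name (the study builds these links with its own algorithm, which is not reproduced here), and it reuses only frequencies quoted earlier in this section: surface causes come from the lower layer, deeper causes from the upper layer.

```python
from typing import Dict, List, Tuple

Situation = Tuple[str, str]  # (car model, production part)
Ranked = List[Tuple[str, int]]

# Lower (concrete) layer: (situation, cause, effect) -> edge frequency.
# The two T5 frequencies are the ones quoted in the text.
CONCRETE: Dict[Tuple[Situation, str, str], int] = {
    (("T5", "door inner panel"), "wrinkling", "necking"): 6,
    (("T5", "door inner panel"), "damage to the binder face", "necking"): 6,
}

# Upper (abstract) layer: (cause, effect) -> edge frequency, from the necking example.
ABSTRACT: Dict[Tuple[str, str], int] = {
    ("uneven material flow", "wrinkling"): 21,
    ("dirt on the binder face", "wrinkling"): 20,
    ("local damage to the die face", "damage to the binder face"): 13,
}


def _ranked(pairs: Ranked) -> Ranked:
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


def subordinate(situation: Situation, concrete_node: str) -> str:
    """Hypothetical subordination link: a concrete node belongs to the abstract
    node with the same normalized name, whatever the situation."""
    return concrete_node


def analyze(situation: Situation, problem: str) -> Dict[str, object]:
    # Step 1: surface causes in the concrete layer, for containment measures.
    surface = _ranked([(c, f) for (s, c, e), f in CONCRETE.items()
                       if s == situation and e == problem])
    # Step 2: level transition to the abstract layer to find deeper causes.
    deeper: Dict[str, Ranked] = {}
    for cause, _ in surface:
        node = subordinate(situation, cause)
        deeper[node] = _ranked([(c, f) for (c, e), f in ABSTRACT.items() if e == node])
    return {"surface": surface, "deeper": deeper}


if __name__ == "__main__":
    print(analyze(("T5", "door inner panel"), "necking"))
```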

6. Conclusions

This study systematically considers significant factors influencing cause analysis, such as the duality of cause analysis, the complexity of causal relationships, and the existence of multiple causal relationships. By utilizing the causal knowledge and problem situations contained in QPSD, this study proposes and constructs the DL-CKN. The main contributions of this study are as follows:

(1) To address the irregular descriptive language used in QPSD, a bottom-up, layer-by-layer method for constructing a domain vocabulary is proposed.

(2) Considering the characteristics of QPSD, a method for extracting multi-causal nodes is designed by analyzing the feature words that express such relationships and by using syntactic dependency relations.

(3) Based on the causal relationships within the data structure, a method is proposed for constructing causal knowledge sets that include multiple causal relationships and contextual elements.

(4) Building on the constructed causal knowledge sets, a method for constructing DL-CKN is introduced.

(5) Using QPSD generated in the stamping production process of an automobile manufacturing enterprise, a practically applicable DL-CKN is constructed, which validates the effectiveness, rationality, and scientific soundness of the proposed methods.

The DL-CKN constructed in this study consists of two layers, the concrete causal knowledge network and the abstract causal knowledge network, which are interconnected through the subordination relationships among nodes. When a quality problem needs to be analyzed from a different perspective, the user can follow these links to the corresponding layer and carry out causal analysis there. This helps overcome the difficulty of analyzing quality problems comprehensively and in depth on the basis of individual experience alone, provides broad support for enterprise employees performing causal analysis, and therefore has some practical application value.

However, the causal relationships among quality problems are highly complex and dynamic. How to use DL-CKN to infer precisely not only the surface causes but also the root causes of quality problems remains a topic meriting further in-depth research. The integration and application of the DL-CKN construction method likewise warrant deeper exploration in the future.

Author Contributions

Conceptualization, Y.W.; methodology, T.L.; software, S.Q.; validation, X.Y.; formal analysis, K.Z.; investigation, T.L. and X.Y.; writing—original draft preparation, Y.W. and S.Q.; writing—review and editing, T.L. and K.Z.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Program for the Philosophy and Social Sciences Research of Higher Learning Institutions of Shanxi (PSSR) under Grant No. 2023W085, Shanxi Provincial Basic Research Program-Free Exploration Category under Grant No. 202303021222085, Henan Philosophy and Social Science Program under Grant No. 2023BJJ083, Key R&D and Promotion Project in Henan Province (Soft Science Projects under Grant No. 242400411089, and Key Technology Projects under Grant No. 242102320060), China Postdoctoral Science Foundation under Grant No. 2023M733209, Key Scientific Research Projects of Universities in Henan Province under Grant No. 24A630032, and National Social Science Fund of China under Grant No. 24BGL289.

Data Availability Statement

The data presented in this study are available upon request from the authors.

Acknowledgments

We sincerely thank Wei Wei (from School of Management, Zhengzhou University) for his kind help.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Representation Methods of Multiple Causal Relationships

1. One cause and multiple effects

In the case of one cause with multiple effects, two basic situations can be distinguished according to whether the result nodes are related by “and” or “or”. First, when one cause leads to different problems that do not occur simultaneously, the result nodes are in an “or” relationship. Second, when one cause triggers multiple problems simultaneously, the result nodes are in an “and” relationship, as shown in Figure A1.

Figure A1. Schematic diagram of one cause with multiple effects.

2. Multiple causes and one result

In the case of multiple causes and one result, two basic situations can be distinguished according to whether the cause nodes are related by “and” or “or”. First, when different causes can each trigger the problem independently, the cause nodes are in an “or” relationship. Second, when the problem is caused by the joint action of multiple causes, the cause nodes are in an “and” relationship, as shown in Figure A2.

Figure A2. Schematic diagram of multiple causes and one result.

3. Multiple causes and multiple results

In the case of multiple causes and multiple results, four basic situations can be distinguished according to the “and” and “or” relationships among the cause nodes and among the result nodes:

First, multiple causes can each independently trigger multiple problems that do not occur simultaneously. Both the cause nodes and the result nodes are then in an “or” relationship.

Second, multiple different causes can each independently trigger multiple problems that occur simultaneously. The cause nodes are then in an “or” relationship, and the result nodes are in an “and” relationship.

Third, multiple causes jointly cause multiple problems that do not occur simultaneously. The cause nodes are then in an “and” relationship, and the result nodes are in an “or” relationship.

Fourth, multiple causes act together and simultaneously trigger multiple problems. Both the cause nodes and the result nodes are then in an “and” relationship, as shown in Figure A3.

Figure A3. Schematic diagram of multiple causes and multiple results.

The existence of multiple causal relationships makes the forms of causal knowledge groups diverse and cumbersome, which hinders both the construction of DL-CKN and subsequent research on reasoning with and applying DL-CKN. It is therefore necessary to analyze the forms of multiple causal relationships in depth and to split and combine them into a unified form. This facilitates the construction of DL-CKN in this study as well as later research on its reasoning and applications.

Appendix B. The Splitting Method of the “Or” Relationship

The “or” relationship means that the problems or causes represented by the nodes in this relationship may each occur, but not simultaneously; each belongs to a separate, complete causal relationship that occurs on its own occasion. Multiple causal relationships involving “or” can therefore be split into one-cause-one-result relationships or into multiple causal relationships containing only “and”. The splitting method is as follows.

The “or” relationship in one cause and multiple results is split into as many one-cause-one-result relationships as there are result nodes, as shown in Figure A4.

Figure A4. Schematic diagram of the splitting method of the “or” relationship in one cause and multiple results.

The “or” relationship in multiple causes and one result is split into as many one-cause-one-result relationships as there are cause nodes, as shown in Figure A5.

Figure A5. Schematic diagram of the splitting method of the “or” relationship in multiple causes and one result.

The “or” relationship in multiple causes and multiple results can be split into the following three situations:

First, when both the cause nodes and the result nodes are in an “or” relationship, the relationship is split into multiple one-cause-one-result relationships.

Second, when the cause nodes are in an “or” relationship and the result nodes are in an “and” relationship, the relationship is split into as many one-cause-multiple-results relationships, containing only “and”, as there are cause nodes.

Third, when the cause nodes are in an “and” relationship and the result nodes are in an “or” relationship, the relationship is split into as many multiple-causes-one-result relationships, containing only “and”, as there are result nodes.

The specific splitting method is shown in Figure A6.

Figure A6. Schematic diagram of the splitting method of the “or” relationship in multiple causes and multiple results.

Through the above splitting method, complex multiple causal relationships are reduced to four forms: one cause and one result, plus three forms containing only “and” relationships, namely one cause and multiple results, multiple causes and one result, and multiple causes and multiple results, as shown in Figure A7.

Figure A7. Schematic diagram of multiple causal relationships after splitting.
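The three splitting rules above amount to treating each “or” side as a list of single-node alternatives and each “and” side as one indivisible tuple, then pairing every cause alternative with every result alternative. The following Python sketch encodes that reading; the `CausalRelation` type and `split_or` function are hypothetical names, and the pairwise combination is an interpretation of Figures A4 to A6 rather than code from the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CausalRelation:
    causes: Tuple[str, ...]
    cause_rel: str   # "and" or "or" among the cause nodes
    effects: Tuple[str, ...]
    effect_rel: str  # "and" or "or" among the result nodes


def _alternatives(nodes: Tuple[str, ...], rel: str) -> List[Tuple[str, ...]]:
    """'or': each node is its own alternative; 'and': all nodes stay together."""
    if rel not in ("and", "or"):
        raise ValueError(f"unknown relation: {rel!r}")
    return [(n,) for n in nodes] if rel == "or" else [tuple(nodes)]


def split_or(r: CausalRelation) -> List[CausalRelation]:
    """Split all 'or' relationships so that only 'and' relationships remain."""
    return [CausalRelation(c, "and", e, "and")
            for c in _alternatives(r.causes, r.cause_rel)
            for e in _alternatives(r.effects, r.effect_rel)]


if __name__ == "__main__":
    mixed = CausalRelation(("cause A", "cause B"), "or", ("problem X", "problem Y"), "and")
    for part in split_or(mixed):
        print(part.causes, "->", part.effects)
```

For example, two causes in an “or” relationship with two results in an “and” relationship yield two one-cause-multiple-results relationships, which matches the second rule.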

Appendix C. The Combination Method of the “And” Relationship

After the “or” relationships have been split, only “and” relationships remain between the nodes. Nodes in an “and” relationship are all indispensable: together they form one complete causal relationship and cannot be split. This study therefore combines nodes in an “and” relationship and treats them as a whole, forming a “group node”. To distinguish “group nodes” from general nodes, this study uses square nodes (□) to represent “group nodes” and triangular nodes (△) to represent general nodes.

Therefore, the combination method of the “and” relationship can be expressed as follows.

In one cause and multiple results, multiple result nodes containing the “and” relationship can be combined into a problem “group node”, as shown in Figure A8.

Figure A8. Schematic diagram of the combination method of the “and” relationship in one cause and multiple results.

In multiple causes and one result, multiple cause nodes containing the “and” relationship can be combined into a cause “group node”, as shown in Figure A9.

Figure A9. Schematic diagram of the combination method of the “and” relationship in multiple causes and one result.

In multiple causes and multiple results, multiple cause nodes and multiple result nodes containing the “and” relationship can be combined into a cause “group node” and a problem “group node” respectively, as shown in Figure A10.

Figure A10. Schematic diagram of the combination method of the “and” relationship in multiple causes and multiple results.

Through the above splitting of the “or” relationship and the combination of the “and” relationship, all multiple causal relationships can be represented in the form of “one cause and one result”, as shown in Figure A11.

Figure A11. Causal knowledge groups in the form of “one cause and one result”.

The causal knowledge group in this study is represented in the form “node-edge-node”. Although this resembles “one cause and one result”, because of multiple causal relationships it contains both triangular general nodes (△), representing individual quality problems or causes, and square “group nodes” (□), representing nodes in an “and” relationship.
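The combination step can likewise be sketched in Python, assuming the “or” relationships have already been split so that each side contains only “and” members. A side with one member becomes a general (triangle) node and a side with several members becomes a group (square) node, represented here as a frozenset so that member order does not matter; the tuple encoding and the `causes` edge label are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, Tuple, Union

# ("general", name) stands for a triangle node; ("group", members) for a square node.
Node = Union[Tuple[str, str], Tuple[str, FrozenSet[str]]]


def make_node(members: Iterable[str]) -> Node:
    """Combine the 'and'-related members of one side into a single node."""
    items = tuple(members)
    if not items:
        raise ValueError("a node needs at least one member")
    if len(items) == 1:
        return ("general", items[0])
    return ("group", frozenset(items))


def knowledge_group(causes: Iterable[str], effects: Iterable[str]) -> Tuple[Node, str, Node]:
    """Return one causal knowledge group as a node-edge-node triple."""
    return (make_node(causes), "causes", make_node(effects))


if __name__ == "__main__":
    print(knowledge_group(["chip accumulation in the rib groove",
                           "damage to the binder face"], ["necking"]))
```

For instance, chip accumulation in the rib groove and damage to the binder face, which jointly lead to necking, become one cause group node linked to the general node “necking”.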

The causal knowledge group in the form of “one cause and one result” unifies the manifestation of multiple causal relationships. This is not only conducive to the construction of DL-CKN, providing convenience for the subsequent research in this study, but also helpful for research on aspects such as the reasoning and application of DL-CKN.

References

Xu, Z.; Dang, Y.; Zhang, Z.; Chen, J. Typical short-term remedy knowledge mining for product quality problem-solving based on bipartite graph clustering. Comput. Ind. 2020, 122, 103–277. [Google Scholar] [CrossRef]
Li, J.; Xiong, R.B.; Lan, Y.Y.; Pan, L.; Guo, J.F.; Cheng, X.Q. Review of the frontiers of causal machine learning. Comput. Res. Dev. 2023, 60, 59–84. [Google Scholar]
Wang, H.; Jiang, W.; Wei, L.; Huang, W. Model innovation and prospect of product lifecycle quality management based on the Internet of Things. Theory Pract. Syst. Eng. 2021, 41, 475–482. [Google Scholar]
Zonnenshain, A.; Kenett, R.S. Quality 4.0—The challenging future of quality engineering. Qual. Eng. 2020, 32, 614–626. [Google Scholar] [CrossRef]
Wang, Y.; Dang, Y.; Xu, Z. Causal network construction method of Chinese text data based on quality problem solving. Manag. Sci. China 2023, 31, 254–265. [Google Scholar]
Reynolds, M.S.; Spencer, S.P.; Dunaway, A.; Buckingham, D.; Bartman, T. Scientific approach to assess if change led to improvement—Methods for statistical process control analysis in quality improvement. J. Emerg. Nurs. 2021, 47, 198–205. [Google Scholar] [CrossRef]
Dogan, A.; Birant, D. Machine learning and data mining in manufacturing. Expert Syst. Appl. 2021, 166, 1140–1160. [Google Scholar] [CrossRef]
Iwuagwu, E.K.; Agabi, G.A. David Hume’s Notion of Perception and His Problem with Causality. AFRREV IJAH Int. J. Arts Humanit. 2019, 8, 66–76. [Google Scholar] [CrossRef]
Ma, Q.; Li, H.; Thorstenson, A. A big data-driven root cause analysis system: Application of Machine Learning in quality problem solving. Comput. Ind. Eng. 2021, 160, 75–80. [Google Scholar] [CrossRef]
Xu, Z.; Dang, Y. Data-driven causal knowledge graph construction for root cause analysis in quality problem solving. Int. J. Prod. Res. 2023, 61, 3227–3245. [Google Scholar] [CrossRef]
Nogueira, A.R.; Pugnana, A.; Ruggieri, S.; Pedreschi, D.; Gama, J. Methods and tools for causal discovery and causal inference. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2022, 12, e1449. [Google Scholar] [CrossRef]
Saliya, C.A. Research Philosophy: Paradigms, world views, perspectives, and theories. In Social Research Methodology and Publishing Results: A Guide to Non-Native English Speakers; IGI Global: Hershey, PA, USA, 2023; Volume 17, pp. 35–51. [Google Scholar]
Määttänen, P. Facts and Values After David Hume. Int. J. Semiot. Law-Rev. Int. Sémiot. Jurid. 2023, 36, 17–29. [Google Scholar] [CrossRef]
Radder, H. Empiricism must, but cannot, presuppose real causation. J. Gen. Philos. Sci. 2021, 52, 597–608. [Google Scholar] [CrossRef]
Lorkowski, C.M. David Hume, Causation, and the Problem of Induction. J. NeuroPhilos. 2023, 2, 145–156. [Google Scholar]
Yin, P.; Sun, J. Is causation deterministic or probabilistic? A critique of Frosch and Johnson-Laird (2011). J. Cogn. Psychol. 2021, 33, 899–918. [Google Scholar] [CrossRef]
Koutsoyiannis, D.; Onof, C.; Christofides, A.; Kundzewicz, Z.W. Revisiting causality using stochastics: 1. Theory. Proc. R. Soc. A 2022, 478, 835–856. [Google Scholar] [CrossRef]
Glymour, C.; Zhang, K.; Spirtes, P. Review of causal discovery methods based on graphical models. Front. Genet. 2019, 10, 52–54. [Google Scholar] [CrossRef] [PubMed]
Song, W.; He, T. Limitations and integration: Causality and correlation under big data. J. Syst. Sci. 2021, 29, 42–46. [Google Scholar]
Eichberger, J.; Pasichnichenko, I. Decision-making with partial information. J. Econ. Theory 2021, 198, 105–369. [Google Scholar] [CrossRef]
Parzygnat, A.J.; Russo, B.P. A non-commutative Bayes’ theorem. Linear Algebra Appl. 2022, 644, 28–94. [Google Scholar] [CrossRef]
Latz, J. On the well-posedness of Bayesian inverse problems. SIAM/ASA J. Uncertain. Quantif. 2020, 8, 451–482. [Google Scholar] [CrossRef]
Reichenbach, H. The Principle of Causality and the Possibility of its Empirical Confirmation; Springer: Dordrecht, The Netherlands, 1978. [Google Scholar]
Livadas, S. Talking about models: The inherent constraints of mathematics. Axiomathes 2020, 30, 13–36. [Google Scholar] [CrossRef]
Suppes, P.C. Probabilistic Metaphysics; Basil Blackwell: Oxford, UK, 1984; p. 10. [Google Scholar]
Markus, K.A. Causal effects and counterfactual conditionals: Contrasting Rubin, Lewis and Pearl. Econ. Philos. 2021, 37, 441–461. [Google Scholar] [CrossRef]
Yazdi, M.; Kabir, S. Fuzzy evidence theory and Bayesian networks for process systems risk analysis. Hum. Ecol. Risk Assess. Int. J. 2020, 26, 57–86. [Google Scholar] [CrossRef]
Chai, J.; Shi, H.; Lu, Q.; Hu, Y. Quantifying and predicting the Water-Energy-Food-Economy-Society-Environment Nexus based on Bayesian networks-a case study of China. J. Clean. Prod. 2020, 256, 120–266. [Google Scholar] [CrossRef]
Kitson, N.K.; Constantinou, A.C.; Guo, Z.; Liu, Y.; Chobtham, K. A survey of Bayesian Network structure learning. Artif. Intell. Rev. 2023, 56, 721–814. [Google Scholar] [CrossRef]
Guo, R.; Cheng, L.; Li, J.; Hahn, P.R.; Liu, H. A survey of learning causality with data: Problems and methods. ACM Comput. Surv. (CSUR) 2020, 53, 1–37. [Google Scholar] [CrossRef]
Li, C.; Qiu, W.X. Progress in causal inference methods based on machine learning. Stat. Decis.-Mak. 2021, 37, 10–15. [Google Scholar]
Zhang, D.; Zhang, H.; Tang, J.; Hua, X.S.; Sun, Q. Causal intervention for weakly-supervised semantic segmentation. Adv. Neural Inf. Process. Syst. 2020, 33, 655–666. [Google Scholar]
Geiger, A.; Wu, Z.; Potts, C.; Icard, T.; Goodman, N. Finding alignments between interpretable causal variables and distributed neural representations. Causal Learn. Reason. 2024, 236, 160–187. [Google Scholar]
Taherdoost, H. Data collection methods and tools for research; a step-by-step guide to choose data collection technique for academic and business research projects. Int. J. Acad. Res. Manag. (IJARM) 2021, 10, 10–38. [Google Scholar]
Pearl, J. The seven tools of causal inference, with reflections on machine learning. Commun. ACM 2019, 62, 54–60. [Google Scholar] [CrossRef]
Ohlsson, H.; Kendler, K.S. Applying Causal Inference Methods in Psychiatric Epidemiology: A Review. JAMA Psychiatry 2020, 77, 637–644. [Google Scholar] [CrossRef]
Castro, D.C.; Walker, I.; Glocker, B. Causality matters in medical imaging. Nat. Commun. 2020, 11, 36–73. [Google Scholar] [CrossRef]
Whitaker, R.G.; Sperber, N.; Baumgartner, M.; Thiem, A.; Cragun, D.; Damschroder, L.; Miech, E.J.; Slade, A.; Birken, S. Coincidence analysis: A new method for causal inference in implementation science. Implement. Sci. 2020, 15, 108. [Google Scholar] [CrossRef]
Egami, N.; Fong, C.J.; Grimmer, J.; Roberts, M.E.; Stewart, B.M. How to make causal inferences using texts. Sci. Adv. 2022, 8, 26–52. [Google Scholar] [CrossRef] [PubMed]
Khoo, C.S.G. Automatic Identification of Causal Relations in Text and Their Use for Improving Precision in Information Retrieval; Syracuse University: Syracuse, NY, USA, 1995. [Google Scholar]
Yang, J.; Han, S.C.; Poon, J. A survey on extraction of causal relations from natural language text. Knowl. Inf. Syst. 2022, 64, 61–86. [Google Scholar] [CrossRef]
Schölkopf, B. Causality for machine learning. In Probabilistic and Causal Inference: The Works of Judea Pearl; Association for Computing Machinery: New York, NY, USA, 2022; Volume 28, pp. 765–804. [Google Scholar]
Li, Z.; Li, Q.; Zou, X.; Ren, J. Causality extraction based on self-attentive BiLSTM-CRF with transferred embeddings. Neurocomputing 2021, 423, 207–219. [Google Scholar] [CrossRef]
Akkasi, A.; Moens, M.F. Causal relationship extraction from biomedical text using deep neural models: A comprehensive survey. J. Biomed. Inform. 2021, 119, 8–20. [Google Scholar] [CrossRef] [PubMed]
Liu, L.; Gong, D.; Zhang, Y.; Bai, R. Causal knowledge-led technical opportunity discovery—Take electric vehicle charging pile as an example. Data Anal. Knowl. Discov. 2022, 6, 31–40. [Google Scholar]
Li, W.; Li, Y.; Pandelea, V.; Ge, M.; Zhu, L.; Cambria, E. ECPEC: Emotion-cause pair extraction in conversations. IEEE Trans. Affect. Comput. 2022, 14, 1754–1765. [Google Scholar] [CrossRef]
Zibaei, E.; Borth, R. Building causal models for finding actual causes of unmanned aerial vehicle failures. Front. Robot. AI 2024, 11, 37–62. [Google Scholar] [CrossRef] [PubMed]
Zheng, W.; Zhou, Y.; Liu, S.; Tian, J.; Yang, B.; Yin, L. A deep fusion matching network semantic reasoning model. Appl. Sci. 2022, 12, 3416. [Google Scholar] [CrossRef]
Liu, Y.; Li, K.; Yan, D. Quantification analysis of potential risk in railway accidents: A new random walk based approach. Reliab. Eng. Syst. Saf. 2024, 242, 109–778. [Google Scholar] [CrossRef]
Tao, X.L.; Shi, L.; Zhao, F.; Lu, S.; Peng, Y. A hybrid alarm association method based on AP clustering and causality. Wirel. Commun. Mob. Comput. 2021, 2021, 557–650. [Google Scholar] [CrossRef]
Rehder, B. A causal-model theory of categorization. In Proceedings of the Twenty-First Annual Conference of the Cognitive Science Society; Psychology Press: New York, NY, USA, 2020; pp. 595–600. [Google Scholar]
Zhang, X.; Mahadevan, S. Bayesian network modeling of accident investigation reports for aviation safety assessment. Reliab. Eng. Syst. Saf. 2021, 209, 107371. [Google Scholar] [CrossRef]
Greco, L. Character and Causation: Hume’s Philosophy of Action. Philos. Q. 2021, 72, 246–248. [Google Scholar] [CrossRef]
Laubach, Z.M.; Murray, E.J.; Hoke, K.L.; Safran, R.J.; Perng, W. A biologist’s guide to model selection and causal inference. Proc. R. Soc. B 2021, 288, 8–15. [Google Scholar] [CrossRef]
Deng, J.; Deng, Y. Information volume of fuzzy membership function. Int. J. Comput. Commun. Control 2021, 16, 36–98. [Google Scholar] [CrossRef]
Zhi, H.; Li, Y. Attribute granulation in fuzzy formal contexts based on L-fuzzy concepts. Int. J. Approx. Reason. 2023, 159, 94–97. [Google Scholar] [CrossRef]
Chen, X.; Alexopoulou, T.; Tsimpli, I. Automatic extraction of subordinate clauses and its application in second language acquisition research. Behav. Res. Methods 2021, 53, 803–817. [Google Scholar] [CrossRef]
De Marneffe, M.C.; Manning, C.D.; Nivre, J.; Zeman, D. Universal dependencies. Comput. Linguist. 2021, 47, 255–308. [Google Scholar] [CrossRef]
Natsuda, K.; Thoburn, J. Automotive Industrialisation: Industrial Policy and Development in Southeast Asia; Taylor and Francis: Milton Park, UK, 2020. [Google Scholar]

Figure 1. Research design.

Figure 2. Diagram of causal knowledge group.

Figure 3. Schematic of DL-CKN.

Figure 4. Domain vocabulary construction process.

Figure 5. Causal knowledge sets containing multiple causal and contextual elements.

Figure 6. Abstract causal knowledge network.

Figure 7. Abstract causal knowledge network for the necking problem.

Figure 8. Concrete causal knowledge network.

Figure 9. Concrete causal knowledge network for the necking problem.

Figure 10. DL-CKN.

Table 1. Multi-causal node representation patterns and characteristic words.

Categories of Relationship	Relationship Characteristic Vocabulary	Expression Mode	Syntactic Dependency Relationship
and (∧)	and, both … and, together with, along with, as well as	<domain vocabulary> <and|both … and|together with|along with|as well as> <domain vocabulary>	coordinating relationship, appositive relationship
or (∨)	or, either … or	<domain vocabulary> <or|either … or> <domain vocabulary>	coordinating relationship, appositive relationship
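A simplified sketch of matching the expression modes in Table 1 is shown below. It is a swapped-in regular-expression approach, not the study's method, which also relies on syntactic dependency relations; it assumes the text has already been mapped to domain vocabulary (as in the second column of Table 7) and that the list of domain terms is supplied by the caller. Longer items are matched first, so a domain term that itself contains a connector, such as “punch and the male die” in Table 2, is not split.

```python
import re
from typing import Dict, List, Optional, Tuple

# Relationship characteristic words from Table 1. For "both ... and" and
# "either ... or", only the second word is needed: "both"/"either" are ignored.
CONNECTORS: Dict[str, str] = {
    "together with": "∧", "along with": "∧", "as well as": "∧",
    "and": "∧", "or": "∨",
}


def extract_multi_causal(text: str, vocabulary: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (relation symbol or None, domain-vocabulary nodes) for one text entry."""
    # Longest items first, so multi-word domain terms win over connectors inside them.
    items = sorted(list(vocabulary) + list(CONNECTORS), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, items)) + r")\b", re.IGNORECASE)
    nodes: List[str] = []
    symbols = set()
    for match in pattern.finditer(text):
        token = match.group(0)
        if token.lower() in CONNECTORS:
            symbols.add(CONNECTORS[token.lower()])
        else:
            nodes.append(token)
    # One connector type between several nodes gives a multi-causal node; otherwise None.
    relation = symbols.pop() if len(symbols) == 1 and len(nodes) > 1 else None
    return relation, nodes


if __name__ == "__main__":
    print(extract_multi_causal("mold hardness or dirty sheet material",
                               ["mold hardness", "dirty sheet material"]))
```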

Table 2. Calculation of Membership Degree.

Original Entry	Segmentation Processing	Domain Vocabulary Membership Degree
Both wrinkles and cracks appear	‘Both’, ‘wrinkles’, ‘and’, ‘cracks’, ‘appear’	(‘Both’, 0.0009), (‘wrinkles’, 0.9315), (‘appear’, 0.0027), (‘and’, 0.3951), (‘cracks’, 0.9572)
The waste material is blocked for 5 minutes	‘The waste material is blocked’, ‘5’, ‘minutes’	(‘The waste material is blocked’, 0.8658), (‘5’, 0.0001), (‘minutes’, 0.0434)
Bottom cracking	‘Bottom’, ‘cracking’	(‘Bottom’, 0.0139), (‘cracking’, 0.6927)
The R-angle is unsmooth or the clearance between the punch and the male die is small.	‘R-angle’, ‘unsmooth’, ‘or’, ‘punch and the male die’, ‘the clearance is small’	(‘R-angle’, 0.6381), (‘unsmooth’, 0.7201), (‘or’, 0.4145), (‘punch and the male die’, 0.6956), (‘the clearance is small’, 0.6039)
⋮	⋮	⋮

Table 3. Domain vocabulary source sets.

Source Sets of Domain Vocabulary	Frequency
Crack	638
Wrinkle	481
Necking	376
Galling	342
Crease	295
Fracture	290
⋮	⋮

Table 4. Vocabulary misuse correction.

Misuse of Vocabulary	Revised Vocabulary
carck	crack
spilt	split
nacking	necking
isnert	insert
⋮	⋮
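Misspellings such as those in Table 4 can be flagged by approximate string matching against the domain vocabulary. The sketch below uses Python's standard `difflib.get_close_matches` as a stand-in technique; the vocabulary list (built from terms appearing in Tables 3 and 4) and the 0.6 similarity cutoff are assumptions, not the study's settings.

```python
import difflib
from typing import Iterable

# Hypothetical domain vocabulary assembled from terms appearing in Tables 3 and 4.
VOCABULARY = ["crack", "wrinkle", "necking", "galling", "crease", "fracture",
              "split", "insert"]


def correct(word: str, vocabulary: Iterable[str] = VOCABULARY, cutoff: float = 0.6) -> str:
    """Return the closest vocabulary entry for `word`, or `word` unchanged if
    no entry reaches the similarity cutoff."""
    matches = difflib.get_close_matches(word.lower(), list(vocabulary), n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    for typo in ["carck", "spilt", "nacking", "isnert"]:
        print(typo, "->", correct(typo))
```

A fixed cutoff is the simplest design choice; in practice, short words are more prone to false corrections, so flagged candidates would still need a human check.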

Table 5. Semantic similarity matrix.

—	Crack	Crack in Workpiece	Splinter	Tearing	…	Inner Door Panel	Inner Panel of Car Door	Interior Panels of Four Doors
crack	1	0.989	0.988	0.322	…	0.002	0.003	0.001
crack in workpiece	-	1	0.892	0.118	…	0.003	0.002	0.002
splinter	-	-	1	0.483	…	0.002	0.003	0.002
tearing	-	-	-	1	…	0.003	0.003	0.002
⋮	⋮	⋮	⋮	⋮	⋱	⋮	⋮	⋮
inner door panel	-	-	-	-	…	1	0.999	0.977
inner panel of car door	-	-	-	-	…	-	1	0.975
interior panels of four doors	-	-	-	-	…	-	-	1
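One simple way to turn a similarity matrix like Table 5 into semantic clusters is threshold-based single-linkage grouping: any two terms whose similarity reaches a threshold end up in the same cluster. The sketch below applies a union-find version of this to the seven terms and the pairwise values shown in Table 5; the grouping rule and the thresholds are assumptions for illustration, not the clustering procedure used in the study.

```python
from typing import Dict, List

TERMS = ["crack", "crack in workpiece", "splinter", "tearing",
         "inner door panel", "inner panel of car door", "interior panels of four doors"]

# Upper triangle of Table 5: row term -> similarities with the terms that follow it.
ROWS: Dict[str, List[float]] = {
    "crack": [0.989, 0.988, 0.322, 0.002, 0.003, 0.001],
    "crack in workpiece": [0.892, 0.118, 0.003, 0.002, 0.002],
    "splinter": [0.483, 0.002, 0.003, 0.002],
    "tearing": [0.003, 0.003, 0.002],
    "inner door panel": [0.999, 0.977],
    "inner panel of car door": [0.975],
}


def cluster(terms: List[str], rows: Dict[str, List[float]], threshold: float) -> List[List[str]]:
    """Single-linkage grouping: terms joined by any similarity >= threshold share a cluster."""
    parent = {t: t for t in terms}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b, sim in zip(terms[i + 1:], rows.get(a, [])):
            if sim >= threshold:
                parent[find(a)] = find(b)

    # Collect clusters in the order their first member appears in `terms`.
    groups: Dict[str, List[str]] = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


if __name__ == "__main__":
    print(cluster(TERMS, ROWS, 0.8))
```

With a threshold of 0.8, “tearing” stays on its own; lowering it to 0.3 merges it into the crack group, which shows how strongly the result depends on the chosen threshold.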

Table 6. Schematic table of domain vocabulary construction.

Domain Vocabulary	Semantic Clustering	Revision of the Source Sets of Domain Vocabulary	Source Sets of Domain Vocabulary
Crack, drawing crack, tearing, flange crack	crack, drawing crack, tearing, flange crack	crack, crack in workpiece, drawing crack, tearing, split, cracking, split open, gap, split, have a crack, flange crack, flange split, flange split open	crack, misspelling of “crack”, crack in workpiece, drawing crack, tearing, split, split open, crack, gap, split, misspelling of “gap”, have a crack, flange crack, flange split, flange split open, misspelling of “flange split open”
wrinkle	wrinkle	wrinkle, wrinkle in workpiece, wrinkling of workpiece, wrinkle, large wrinkle, large wrinkle in workpiece, wrinkling	wrinkle, wrinkle in workpiece, wrinkling of workpiece, wrinkle, large wrinkle, large wrinkle in workpiece, wrinkling
the waste material is blocked, unsmooth waste material sliding	the waste material is blocked, unsmooth waste material sliding	the waste material is blocked, unsmooth waste material sliding, the waste material is blocked completely, unsmooth waste material sliding, slow waste material sliding	the waste material is blocked, misspelling of “the waste material is blocked”, unsmooth waste material sliding, waste material is blocked completely, unsmooth waste material sliding, slow waste material sliding
inner panel of car door	inner panel of car door	inner door panel, inner panel of car door, inside the car door, inside the four doors, inside the door, interior panels of four doors	inner door panel, misspelling of “inner door panel”, inner panel of car door, inside the car door, inside the four doors, inside the door, interior panels of four doors, misspelling of “interior panels of four doors”
$⋮$	$⋮$	$⋮$	$⋮$
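Table 6 groups many surface forms under a few domain vocabulary terms. The clustering rule and threshold are not given in this section, so the following is a minimal sketch assuming single-linkage clustering with a hypothetical threshold of 0.9: every pair whose similarity reaches the threshold is linked, and the clusters are the connected components, found with a union-find structure. The pairwise scores are the visible cells of Table 5; the name `cluster_terms` is illustrative.

```python
def cluster_terms(terms, similarities, threshold=0.9):
    """Single-linkage clustering: link every pair whose score reaches
    `threshold`, then return the connected components (members in input
    order, groups sorted).

    `similarities` maps (term_a, term_b) pairs to scores. The 0.9 default
    is a hypothetical value, not one taken from the paper.
    """
    parent = {t: t for t in terms}

    def find(t):
        # Walk up to the root, halving the path along the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), score in similarities.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return sorted(groups.values())


# Visible off-diagonal cells of Table 5.
pairs = {
    ("crack", "crack in workpiece"): 0.989,
    ("crack", "splinter"): 0.988,
    ("crack", "tearing"): 0.322,
    ("crack in workpiece", "splinter"): 0.892,
    ("crack in workpiece", "tearing"): 0.118,
    ("splinter", "tearing"): 0.483,
    ("crack", "inner door panel"): 0.002,
    ("inner door panel", "inner panel of car door"): 0.999,
    ("inner door panel", "interior panels of four doors"): 0.977,
    ("inner panel of car door", "interior panels of four doors"): 0.975,
}
terms = sorted({t for pair in pairs for t in pair})
for group in cluster_terms(terms, pairs):
    print(group)
```

Because the linkage is single, "crack in workpiece" and "splinter" land in the same group through "crack" even though their direct score (0.892) is below the threshold, while "tearing" stays on its own. Picking the representative term for each group (for example, the most frequent member) would be a separate step not shown here.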

Table 7. Schematic table of multi-causal node extraction.

Text Description	Replacement of Domain Vocabulary	Extraction of Multiple Causal Nodes
P1 crack in workpiece and wrinkling	crack and wrinkle	crack $\land$ wrinkle
this problem is caused by the existence of foreign matter in the sheet material and the hardness of the mold	sheet material foreign matter and the hardness of the mold	sheet material foreign matter $\land$ the hardness of the mold
it is caused by low hardness of the mold or dirty sheet material	mold hardness or dirty sheet material	mold hardness $\lor$ dirty sheet material
there is a part dropping or SB3 alarm occurs	part dropping or SB3 alarm	part dropping $\lor$ SB3 alarm
$⋮$	$⋮$	$⋮$
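The two steps in Table 7, replacing phrases with domain vocabulary and then joining the resulting nodes with ∧ or ∨, can be sketched with a longest-match regular expression. The `synonyms` lookup and the function name `extract_causal_nodes` are hypothetical; in practice the mapping would come from the domain vocabulary built as in Table 6. Matching longer phrases first matters for terms such as “punch and the male die” in Table 2, which contain a conjunction but denote a single node.

```python
import re

AND, OR = "\u2227", "\u2228"  # the ∧ and ∨ symbols used in Table 7


def extract_causal_nodes(text, synonyms):
    """Map phrases in `text` to domain vocabulary and join them with ∧ or ∨.

    `synonyms` maps surface phrases to domain vocabulary terms. Words that
    match no phrase (e.g. "P1", "there is a") are ignored. Returns None when
    no node is found, when several nodes lack a connective, or when "and"
    and "or" are mixed.
    """
    # Longest phrases first, so "crack in workpiece" beats "crack" and a
    # term containing "and" is matched whole before "and" is tried alone.
    phrases = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, phrases)) + r"|and|or)\b")
    nodes, connectives = [], set()
    for match in pattern.finditer(text):
        token = match.group(0)
        if token in ("and", "or"):
            connectives.add(token)
        elif synonyms[token] not in nodes:
            nodes.append(synonyms[token])  # keep the first occurrence only
    if len(nodes) == 1:
        return nodes[0]
    if not nodes or len(connectives) != 1:
        return None
    symbol = AND if "and" in connectives else OR
    return f" {symbol} ".join(nodes)


# Hypothetical lookup covering rows 1, 3 and 4 of Table 7.
synonyms = {
    "crack in workpiece": "crack",
    "crack": "crack",
    "wrinkling": "wrinkle",
    "wrinkle": "wrinkle",
    "hardness of the mold": "mold hardness",
    "dirty sheet material": "dirty sheet material",
    "part dropping": "part dropping",
    "SB3 alarm": "SB3 alarm",
    "punch and the male die": "punch and the male die",
}
for sentence in [
    "P1 crack in workpiece and wrinkling",
    "it is caused by low hardness of the mold or dirty sheet material",
    "there is a part dropping or SB3 alarm occurs",
]:
    print(extract_causal_nodes(sentence, synonyms))
```

Sentences that mix “and” with “or” return `None` here rather than guessing at precedence; the paper's Appendices B and C describe how “or” relationships are split and “and” relationships are combined.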

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, Y.; Qiang, S.; Yue, X.; Li, T.; Zhang, K. A Two-Layer Causal Knowledge Network Construction Method Based on Quality Problem-Solving Data. Systems 2025, 13, 142. https://doi.org/10.3390/systems13030142

AMA Style

Wang Y, Qiang S, Yue X, Li T, Zhang K. A Two-Layer Causal Knowledge Network Construction Method Based on Quality Problem-Solving Data. Systems. 2025; 13(3):142. https://doi.org/10.3390/systems13030142

Chicago/Turabian Style

Wang, Yubin, Shirong Qiang, Xin Yue, Tao Li, and Keyong Zhang. 2025. "A Two-Layer Causal Knowledge Network Construction Method Based on Quality Problem-Solving Data." Systems 13, no. 3: 142. https://doi.org/10.3390/systems13030142

APA Style

Wang, Y., Qiang, S., Yue, X., Li, T., & Zhang, K. (2025). A Two-Layer Causal Knowledge Network Construction Method Based on Quality Problem-Solving Data. Systems, 13(3), 142. https://doi.org/10.3390/systems13030142

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
