
CN107247725B - Structural Integrity Detection Optimization Method Based on Metadata Logically Independent Fragmentation - Google Patents


Info

Publication number
CN107247725B
Authority
CN
China
Prior art keywords
metadata
structural integrity
metaclass
shiq
formalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710290286.5A
Other languages
Chinese (zh)
Other versions
CN107247725A (en
Inventor
赵晓非
柴争义
尤轶
郭永新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Original Assignee
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority to CN201710290286.5A
Publication of CN107247725A
Application granted
Publication of CN107247725B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a structural integrity detection optimization method based on logically independent fragmentation of metadata, comprising the following steps: formalizing repository metadata and data into a description logic SHIQ knowledge base; partitioning the SHIQ metadata knowledge base into logically independent fragments; and performing structural integrity checking on the logically independent fragments. The method preserves all information relevant to the given metadata, so a check on part of the metadata can run on a smaller data set rather than on the whole repository, and a check on the whole repository can be decomposed across different metadata fragments. Experimental results show that the metadata fragments produced by the method are on average significantly smaller than the original knowledge base, so the efficiency of structural integrity detection performed on them can be significantly improved.

Description

Structural Integrity Detection Optimization Method Based on Metadata Logically Independent Fragmentation

Technical Field

The invention belongs to the technical field of repository systems, and in particular relates to a structural integrity detection optimization method based on logically independent fragmentation of metadata.

Background

In a repository system, metadata is organized in a complex structure that is multi-level, hierarchical and dynamically changing, so maintaining the consistency of such a system is an important task. Consistency in a repository system includes: (1) Operational consistency, which concerns the interaction between repository applications and is closely related to the concept of repository transactions. It is further divided into cooperative atomicity and concurrent multi-user access. (2) Metadata integrity, which includes structural integrity and well-formedness. Well-formedness ensures the syntactic correctness of element definitions within a meta-level, while structural integrity ensures that the elements of one level conform to the type definitions in the adjacent, higher meta-level.

Structural integrity is an important part of repository system consistency. If it is not guaranteed, repository applications may modify or create metadata elements in the Mn layer that conflict with their metaclasses in the Mn+1 layer. For example, an operation that reads an attribute of an element whose metaclass does not exist is invalid. A database system contains layers M0 to M2, and the content of the M2 layer is fixed. To provide a customizable and extensible system framework, a repository system introduces an M3 layer that allows users to define the M2 layer, so the M0, M1 and M2 layers can all be modified dynamically at run time. This can lead to conflicts between adjacent layers, that is, structural integrity problems. Other systems do not face this problem because they assume that the system framework is static at run time.

However, high computational overhead has made the automatic detection of structural integrity an increasingly difficult problem. There are four main reasons: (1) the amount of metadata has grown rapidly in recent years; (2) metadata is updated very frequently; (3) the set of constraints keeps growing; (4) the internal complexity of the constraints keeps increasing. How to optimize structural integrity detection to improve its efficiency has therefore become a research focus in the field of repository system consistency.

At present, the Meta-Object Facility (MOF) has become the mainstream international specification for metadata repositories. One approach to structural integrity detection in MOF repository systems converts the structural integrity constraints into logical expressions and then turns the constraint-checking problem into a logical reasoning problem, as in the work of Duboisset et al. (Duboisset M, et al. Integrating the calculus-based method into OCL: Study of expressiveness and code generation, Proc of the 18th Int Workshop on Database and Expert Systems Applications. Piscataway, NJ: IEEE, 2007: 502-506), Donald et al. (Donald C, et al. Using first-order logic to query heterogeneous internet data sources, Proc of the 2015 Int Conf on Soft Computing and Software Engineering. Holland: Academic Press, Elsevier, 2015: 1-8) and Demuth et al. (Demuth B, et al. OCL as a specification language for business rules in database applications, LNCS 2185: Proc of the 4th Conf on UML. Berlin: Springer, 2006: 104-117). However, structural integrity constraints involve many complex mechanisms such as recursion, negation and bag semantics, and the proposed conversion algorithms can hardly cover all of them. Some algorithms have been improved to support as many constraint mechanisms as possible, but the high complexity of their processing makes them inefficient.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art and to provide a structural integrity detection optimization method, based on logically independent fragmentation of metadata, that is reasonably designed, simple and efficient.

The present invention solves the existing technical problem by adopting the following technical solution:

A structural integrity detection optimization method based on logically independent fragmentation of metadata, comprising the following steps:

Step 1: Formalize the repository metadata and data into a description logic SHIQ metadata knowledge base;

Step 2: Partition the SHIQ metadata knowledge base into logically independent fragments;

Step 3: Perform structural integrity detection on the logically independent fragments;

Step 2 is implemented through the following steps:

(1) According to Criterion 1 and Criterion 2, compute the attribute deduction fragment [Figure GDA0002624557690000021] of the given element a of the SHIQ metadata knowledge base, where [Figure GDA0002624557690000022] is the SHIQ metadata knowledge base;

(2) Add all class assertions of element a to [Figure GDA0002624557690000023];

(3) For each R0(a, b) in [Figure GDA0002624557690000024] and each R satisfying [Figure GDA0002624557690000025], determine whether [Figure GDA0002624557690000026] holds; if so, add R0(a, b), the assertions in the argument [Figure GDA0002624557690000027], and the assertions in [Figure GDA0002624557690000028] to [Figure GDA0002624557690000029];

(4) Compute [Figure GDA0002624557690000031].

Criterion 1 is: the attribute assertions in the SHIQ metadata knowledge base that have element a as their first or second component;

Criterion 2 is: the attribute assertions on a role path from element a to element b in the SHIQ metadata knowledge base, where these assertions share the same transitive parent role.
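Criteria 1 and 2 can be read as a selection over the role assertions of the Abox. The following Python sketch is one simplified interpretation, not the patented procedure: the tuple encoding `(role, x, y)`, the `super_roles` map and the function name `attribute_candidates` are assumptions made for illustration, `super_roles` is assumed to be transitively closed, and only directed paths that start at a are followed (inverse roles are ignored).

```python
def attribute_candidates(role_assertions, super_roles, transitive_roles, a):
    """Select role assertions for element `a` under a simplified reading of
    Criterion 1 and Criterion 2 (hypothetical helper, not from the source).

    role_assertions  -- iterable of (role, x, y) triples, each meaning role(x, y)
    super_roles      -- dict: role -> set of its super-roles (assumed closed)
    transitive_roles -- set of role names declared transitive
    """
    role_assertions = set(role_assertions)

    # Criterion 1: assertions with `a` as first or second component.
    selected = {t for t in role_assertions if a in (t[1], t[2])}

    # Criterion 2: assertions on a directed path starting at `a` whose roles
    # all share one transitive parent role.
    for parent in transitive_roles:
        edges = [t for t in role_assertions
                 if t[0] == parent or parent in super_roles.get(t[0], set())]
        reached, frontier = {a}, [a]
        while frontier:
            node = frontier.pop()
            for role, x, y in edges:
                if x == node:
                    selected.add((role, x, y))
                    if y not in reached:
                        reached.add(y)
                        frontier.append(y)
    return selected
```

With a transitive role `partOf` and a sub-role `directPartOf`, the chain a, b, c, e is collected in full, while an assertion such as `owns(d, c)` that neither mentions a nor lies on such a path is left out.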

Step 1 includes: formalizing the meta-level Mn+1 layer and the instance-level Mn layer of the repository separately, where n is 0 or 1.

The Mn+1 meta-level layer is formalized as follows:

(1) Convert each metaclass in the meta-level into a SHIQ concept, in such a way that two different metaclasses can have attributes with the same name but different types;

(2) Formalize a class C and a type C' in the meta-level as a concept together with two mutually inverse roles r1 and r2;

(3) Generalization: if a metaclass C1 is a generalization of a metaclass C2, formalize it as [Figure GDA0002624557690000032]. Every attribute of metaclass C1, and every aggregation association and general association involving C1, is then inherited by metaclass C2; the same formalization applies to multiple inheritance between metaclasses.

The instance-level Mn layer is formalized as follows:

(1) If an Mn-layer element c is an instance of metaclass C in its meta-level, formalize it as: C(c);

(2) If an Mn-layer element c1 is associated with an element c2, the corresponding metaclass C1 aggregates C2 through an aggregation association A, and A is formalized as a role in the Tbox, formalize the link as: A(c1, c2);

(3) If an Mn-layer element c1 is associated with an element c2, the corresponding metaclass C1 is connected to metaclass C2 through a general association, and that general association is formalized as a concept A and roles r1, r2, then the relationship between c1 and c2 is formalized as three assertions: A(a); r1(a, c1); r2(a, c2).

Step 3 is carried out as follows: checking the class membership of a single metadata element a, or checking an attribute relationship between metadata elements a and b, is performed on the fragment that contains a and b; checking all instance elements of a metaclass, or all instance elements related through an attribute, is performed by executing the same query on each fragment in parallel and then merging the results.
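The "same query on each fragment, then merge" pattern of Step 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: fragments are plain lists of assertion tuples, `asserted_instances` is a toy stand-in that only looks up explicitly asserted facts (a real check would call a SHIQ reasoner on each fragment), and all names, including the sample data, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def asserted_instances(fragment, metaclass):
    """Toy per-fragment query: elements c with an explicit assertion metaclass(c)."""
    return {t[1] for t in fragment if len(t) == 2 and t[0] == metaclass}


def query_fragments(fragments, query, *args):
    """Run query(fragment, *args) on every fragment in parallel, then merge."""
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(lambda f: query(f, *args), fragments))
    merged = set()
    for result in partial_results:
        merged |= result
    return merged


# Hypothetical fragments: class assertions are pairs, role assertions are triples.
fragments = [
    [("Dimension", "d1"), ("hasLevel", "d1", "l1")],
    [("Dimension", "d2")],
    [("Level", "l1")],
]
print(sorted(query_fragments(fragments, asserted_instances, "Dimension")))  # ['d1', 'd2']
```

Merging by set union keeps the result independent of the order in which the fragments finish.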

The advantages and positive effects of the present invention are:

Targeting the characteristics of MOF repositories, the invention converts the different levels of metadata into a description logic SHIQ knowledge base. On this basis it gives a formal definition of logically independent metadata fragments and a method for extracting them. The method preserves all information relevant to the given metadata, which brings two benefits: on the one hand, a check on part of the metadata can run on a smaller data set rather than on the whole repository; on the other hand, a check on the whole repository can be decomposed across different metadata fragments. Experimental results show that, on average, the metadata fragments produced by the method are significantly smaller than the original knowledge base, so the efficiency of structural integrity detection performed on them can be significantly improved.

Description of the Drawings

Figure 1 shows the experimental results of the effectiveness evaluation of the present invention;

Figure 2 is an example of a general association in the Mn+1 layer of a MOF repository system;

Figure 3 is an example of an aggregation association in the Mn+1 layer of a MOF repository system;

Figure 4 is an example of an attribute type conflict.

Detailed Description

The embodiments of the present invention are described in further detail below with reference to the accompanying drawings:

A structural integrity detection optimization method based on logically independent fragmentation of metadata, comprising the following steps:

Step 1: Formalize the repository metadata and data into a description logic SHIQ knowledge base.

In the MOF framework, the relationship between adjacent levels is a type-instance relationship. We therefore formalize the information in the meta-level, that is, the Mn+1 layer (n is 0 or 1), as concept definitions in the SHIQ Tbox [Figure GDA0002624557690000041], and formalize the elements of the Mn layer, which act as instances, into the SHIQ Abox [Figure GDA0002624557690000042]. Specifically, when n = 1 we formalize the M2 and M1 layers into the Tbox and Abox respectively, and what is checked is the consistency between the M2 and M1 layers; when n = 0 we formalize the M1 and M0 layers into the Tbox and Abox respectively, and what is checked is the consistency between the M1 and M0 layers. The two formalizations are described below.

1. Formalization of the Mn+1 layer

(1) Metaclasses and meta-attributes

Within the meta-level a metaclass is itself a class, so we do not distinguish between metaclasses and classes. Since metaclasses and SHIQ concepts both describe sets of instances, we convert each metaclass into a SHIQ concept.

An attribute a of type C' of a class C relates each instance of C to instances of C', so a is a binary relation between instances of C and instances of C'. We therefore formalize attribute a as a SHIQ role, which can be expressed by the following assertion: [Figure GDA0002624557690000051] C'. If a has multiplicity i..j, the multiplicity is formalized as: [Figure GDA0002624557690000052]. These assertions state precisely that, for each instance c of concept C, all objects related to c through a are instances of C'. They also faithfully reflect the fact that attribute names are not unique across the meta-level, that is, two different metaclasses can have attributes with the same name but different types.
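The two formula images referenced in this paragraph are not reproduced in the text. Assuming the standard SHIQ encoding of typed attributes (an assumption, not the patent's verbatim formulas), the typing and multiplicity assertions for an attribute a of class C with type C' and multiplicity i..j could be written as:

```latex
% Assumed reconstruction in standard SHIQ syntax; not taken from the source figures.
C \sqsubseteq \forall a.C'
\qquad
C \sqsubseteq (\geq i\, a) \sqcap (\leq j\, a)
```

Because the value restriction is stated relative to C, another metaclass D can use the same role name a with a different range, which is consistent with the remark that attribute names are not unique across the meta-level.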

(2) General associations

A general association in the meta-level is shown in Figure 2. It expresses a binary relation between instances of two metaclasses. Each general association has two association ends and corresponds to an association class. Each association end carries a multiplicity constraint. Unlike attribute names, the names of general associations are unique in the MOF framework.

We formalize a general association between classes C and C' (with association ends r1 and r2) as a concept A together with two mutually inverse roles r1 and r2, where r1 describes association end r1 and takes C and C' as its first and second components respectively. The restriction on the values of the components of r1 is therefore formalized as:

[Figure GDA0002624557690000053]

The relationship between r1 and r2 is formalized as r2 ≡ r1⁻. The multiplicity i1..j1 of r1 and the multiplicity i2..j2 of r2 are formalized respectively as:

[Figure GDA0002624557690000054]

(3) Aggregation associations

An aggregation association in the meta-level is shown in Figure 3. It expresses a part-whole relationship between instances of two metaclasses and is a binary relation. For example, the aggregation association between LevelBasedHierachy and HierarchyLevelAssociation states that each instance of LevelBasedHierachy is composed of a set of instances of HierarchyLevelAssociation.

Since an aggregation association is essentially a form of general association, it is converted in the same way as a general association. The distinction between the containing class and the contained class is not lost, because by convention the first component of the role is the containing class.

(4) Generalization

A generalization relationship in the MOF framework states that every instance of a subclass is also an instance of its parent class. Instances of a subclass therefore inherit the attributes of the parent class, and subclasses can additionally define attributes of their own.

Generalization is supported by SHIQ. If a metaclass C1 is a generalization of a metaclass C2, we can formalize this as [Figure GDA0002624557690000061]. Since the semantics of [Figure GDA0002624557690000062] is based on set inclusion, in SHIQ, given the assertion [Figure GDA0002624557690000063], every role tuple that takes an instance of C1 as its i-th component can also take an instance of C2 as its i-th component, because that instance is also an instance of C1. In the formalization, therefore, every attribute of C1 and every aggregation association and general association involving C1 is inherited by C2. This formalization also applies fully to multiple inheritance between metaclasses.
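The formula images in this paragraph are not reproduced in the text. Assuming the usual SHIQ reading (an assumption, not the patent's verbatim formula), "C1 is a generalization of C2" corresponds to a concept inclusion whose semantics is set inclusion between the interpretations of the two concepts:

```latex
% Assumed reconstruction; \mathcal{I} denotes an interpretation.
\mathcal{I} \models C_2 \sqsubseteq C_1
\quad\Longleftrightarrow\quad
C_2^{\mathcal{I}} \subseteq C_1^{\mathcal{I}}
```

Under this reading, any axiom that constrains the instances of C1 automatically constrains the instances of C2, which is the inheritance behaviour described above. Multiple inheritance is simply several such inclusions with the same left-hand side.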

2. Formalization of the Mn layer

Each element in the Mn layer is an instance of the corresponding metaclass in the Mn+1 layer, and the relationships between elements are instances of the corresponding associations between metaclasses. The Mn-layer elements are therefore converted into the Abox [Figure GDA0002624557690000064] of the SHIQ knowledge base. The conversion distinguishes three cases:

(1) If an Mn-layer element c is an instance of metaclass C in its meta-level, formalize it as: C(c);

(2) If an Mn-layer element c1 is associated with c2, the corresponding metaclass C1 (or one of its ancestors) aggregates C2 (or one of its ancestors) through an aggregation association A, and A is formalized as role A in the Tbox, formalize the link as: A(c1, c2);

(3) If an Mn-layer element c1 is associated with c2, the corresponding metaclass C1 (or one of its ancestors) is connected to metaclass C2 (or one of its ancestors) through a general association, and that general association is formalized as concept A and roles r1, r2, then the relationship between c1 and c2 is formalized as three assertions: A(a); r1(a, c1); r2(a, c2).
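The three conversion cases can be illustrated with a short Python sketch. It assumes assertions are represented as plain tuples, pairs for class assertions and triples for role assertions; the function names, the individual names (`h1`, `hla1`, `link1`) and the role name `hierarchyLevel` are hypothetical and chosen only for illustration.

```python
def formalize_instance(metaclass, element):
    """Case (1): element c is an instance of metaclass C, giving C(c)."""
    return [(metaclass, element)]


def formalize_aggregation_link(role, whole, part):
    """Case (2): aggregation association A as a Tbox role, giving A(c1, c2)."""
    return [(role, whole, part)]


def formalize_general_link(concept, r1, r2, link, c1, c2):
    """Case (3): general association as concept A with roles r1 and r2.

    `link` is the individual a that stands for the link itself, giving
    A(a), r1(a, c1) and r2(a, c2).
    """
    return [(concept, link), (r1, link, c1), (r2, link, c2)]


# Build a small Abox; all individual and role names are made up.
abox = []
abox += formalize_instance("LevelBasedHierachy", "h1")
abox += formalize_instance("HierarchyLevelAssociation", "hla1")
abox += formalize_aggregation_link("hierarchyLevel", "h1", "hla1")
abox += formalize_general_link("A", "r1", "r2", "link1", "c1", "c2")
for assertion in abox:
    print(assertion)
```

Representing the link of a general association as its own individual is what allows the association concept A to carry constraints of its own.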

Step 2: Partition the SHIQ metadata knowledge base into logically independent fragments.

1. Basic idea of logically independent fragmentation

Through the conversion in Step 1 we obtain the metadata knowledge base [Figure GDA0002624557690000065]. We now discuss how to partition the metadata into logically independent fragments. We first give some definitions.

Definition 1 (Signature): Given an Abox [Figure GDA0002624557690000066] and an assertion γ, the set of metadata elements appearing in γ is called the signature of γ, denoted Sig(γ). The signature of all metadata elements in [Figure GDA0002624557690000067] is denoted [symbol not reproduced].

Definition 2 (Role path): If for i = 1, …, n-1, [Figure GDA0002624557690000069] contains either a role assertion Ri(ai, ai+1) or a role assertion Ri⁻(ai+1, ai), then a role path is said to exist between metadata elements a1 and an.

Role paths may contain inverse roles. For example, given R1(a1, a2), R2(a3, a2), R3(a3, a4), the role path from a1 to a4 is {R1, R2⁻, R3}, and conversely the role path from a4 to a1 is {R3⁻, R2, R1⁻}.
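The role-path example can be reproduced with a short breadth-first search. The sketch below is illustrative only: it assumes role assertions are stored as `(role, x, y)` triples, writes an inverse step as the role name followed by `-`, and uses the hypothetical function name `role_path`.

```python
from collections import deque


def role_path(assertions, start, goal):
    """Return the role names along a shortest role path, or None if none exists.

    assertions -- iterable of (role, x, y) triples, each standing for role(x, y).
    Walking an assertion role(x, y) from y back to x is an inverse step "role-".
    """
    # Each assertion can be traversed forwards or backwards.
    neighbours = {}
    for role, x, y in assertions:
        neighbours.setdefault(x, []).append((role, y))
        neighbours.setdefault(y, []).append((role + "-", x))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for role, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [role]))
    return None


abox = [("R1", "a1", "a2"), ("R2", "a3", "a2"), ("R3", "a3", "a4")]
print(role_path(abox, "a1", "a4"))  # ['R1', 'R2-', 'R3']
print(role_path(abox, "a4", "a1"))  # ['R3-', 'R2', 'R1-']
```

The two printed paths match the example in the text: the same assertions are walked in opposite directions, so each role is replaced by its inverse and the order is reversed.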

Running a complete query over the whole metadata knowledge base makes reasoning inefficient and hard to manage. Because the metadata in the knowledge base can be decomposed into logically independent fragments, a query can be decomposed and run on different metadata fragments, which reduces the amount of metadata to be queried and allows queries to be executed in parallel. For example, to query information related to the instances of the metadata element Dimension, we only need to run the query on the metadata fragment that contains the instances of Dimension. To keep the query results complete, a metadata fragment must be logically independent, that is, it must be closed under the logical entailments of the given metadata elements. Based on this analysis, we give the formal definition of a logically independent metadata fragment:

Definition 3 (Logically independent metadata fragment): Let [Figure GDA0002624557690000071] be a metadata knowledge base and let the set S be a signature. A subset [Figure GDA0002624557690000073] of [Figure GDA0002624557690000072] is called a logically independent fragment of signature S if and only if, for any assertion γ (class assertion or attribute assertion) satisfying Sig(γ) [Figure GDA0002624557690000074], [Figure GDA00026245576900000721] is equivalent to [Figure GDA0002624557690000076].

Definition 3 states the necessary and sufficient condition for being a logically independent metadata fragment, and it guarantees that the logical entailments about the metadata elements in signature S are complete. However, it follows from Definition 3 and the monotonicity of SHIQ that any superset of [Figure GDA0002624557690000077] is also a logically independent fragment of S (for example, the entire Abox [Figure GDA0002624557690000078] is always a logically independent fragment of S). Definition 3 therefore does not guarantee that the logically independent fragment of a signature S is unique.

Our goal is to carve out precise metadata fragments that contain only the assertions indispensable to the given signature, so that the resulting fragments are as small as possible while keeping the information complete. Put simply, for an assertion to be indispensable to a given signature S, it must be able to affect the logical conclusions about a metadata element in S. To single out such assertions, we give the following definitions:

Definition 4 (Argument): Given a metadata knowledge base [Figure GDA0002624557690000079] and an assertion α such that [Figure GDA00026245576900000710], a fragment [Figure GDA00026245576900000712] of [Figure GDA00026245576900000711] is called an argument for α if and only if, for any [Figure GDA00026245576900000713], both [Figure GDA00026245576900000714] and [Figure GDA00026245576900000715] hold. The argument for α is written as [Figure GDA00026245576900000716].

Definition 5 (Key assertion): Given a metadata knowledge base [Figure GDA00026245576900000717], a metadata element a and an assertion γ, γ is called a key assertion of {a} if and only if, for any assertion α of a (class assertion or attribute assertion), [Figure GDA00026245576900000718] holds.

By the above definitions, the argument
Figure GDA00026245576900000719
for an assertion α is essentially the smallest fragment of the metadata knowledge base that entails α; that is, every assertion in
Figure GDA00026245576900000720
is a key assertion of α. An assertion γ can affect the logical derivation of a metadata element in a signature S if and only if it appears in the argument of any attribute assertion or class assertion of that element, in which case γ is a key assertion of S. The logically independent fragment
Figure GDA0002624557690000081
constructed from all key assertions of S not only preserves all the information about the class assertions and attribute assertions of the elements in S but also has the minimum size. Our task therefore becomes how to compute, for a given signature S, a logically independent fragment that contains only key assertions, i.e., the minimum logically independent fragment. Unless otherwise stated, "logically independent fragment" below refers to the minimum logically independent fragment. It can be proved that the union of the logically independent fragments of the individual elements of S is the logically independent fragment of S, so we only need an algorithm for computing the logically independent fragment of a single metadata element in S.
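As an illustration only, the notion of an argument can be sketched in Python as a deletion-based search for a fragment that still entails α and from which no assertion can be dropped. The function `minimal_argument`, the encoding of assertions as strings, and the oracle `toy_entails` are assumptions made for this sketch; `toy_entails` merely stands in for a SHIQ reasoner and is not part of the described method. The search returns a subset-minimal fragment, which need not be the one of smallest cardinality when several arguments exist.

```python
from typing import Callable, FrozenSet, Set

Assertion = str  # assumption: assertions are encoded as plain strings


def minimal_argument(
    kb: Set[Assertion],
    alpha: Assertion,
    entails: Callable[[FrozenSet[Assertion], Assertion], bool],
) -> Set[Assertion]:
    """Shrink kb to a subset-minimal fragment that still entails alpha."""
    if not entails(frozenset(kb), alpha):
        raise ValueError("the knowledge base does not entail alpha")
    fragment = set(kb)
    for gamma in sorted(kb):  # fixed order keeps the result deterministic
        trial = fragment - {gamma}
        if entails(frozenset(trial), alpha):
            fragment = trial  # gamma is not needed to derive alpha
    return fragment


def toy_entails(fragment: FrozenSet[Assertion], alpha: Assertion) -> bool:
    """Hypothetical oracle: alpha holds if asserted, or A(a) is derived
    from R(a,b) and B(b), mimicking an axiom of the form (exists R.B) <= A."""
    if alpha in fragment:
        return True
    return alpha == "A(a)" and {"R(a,b)", "B(b)"} <= fragment
```

Every assertion kept in the returned fragment plays the role of a key assertion for α in this toy setting: removing any one of them breaks the entailment.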

To compute the logically independent fragment of a given single metadata element a, we need to decide whether each assertion in
Figure GDA0002624557690000082
is a key assertion of a; that is, we must test whether each assertion is relevant to the derivation of an attribute assertion or a class assertion of a. By the reasoning method of SHIQ, the class assertions of a metadata element depend on both class assertions and attribute assertions, whereas attribute assertions between different metadata elements are affected only by attribute assertions and not by class assertions. The logically independent fragment of a single metadata element a can therefore be computed in three steps. First, compute the set of assertions
Figure GDA0002624557690000083
in which every assertion is relevant to the derivation of an arbitrary attribute assertion R(a,b) of a; we call this set the attribute deduction fragment. Next, compute the set of assertions
Figure GDA0002624557690000084
in which every assertion is relevant to the derivation of an arbitrary class assertion C(a) of a; we call this set the class deduction fragment. Finally, merge the sets
Figure GDA0002624557690000085
and
Figure GDA0002624557690000086
to obtain the logically independent fragment of a.
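The three steps, together with the union over the elements of a signature, can be sketched as follows. This is a minimal outline under stated assumptions: `element_fragment` and `signature_fragment` are hypothetical names, and the two callables passed in are placeholders for the attribute and class deduction computations described in the following sections.

```python
from typing import Callable, Iterable, Set

Assertion = str  # assumption: assertions are encoded as plain strings


def element_fragment(
    a: str,
    attribute_fragment: Callable[[str], Set[Assertion]],
    class_fragment: Callable[[str, Set[Assertion]], Set[Assertion]],
) -> Set[Assertion]:
    """Fragment of one element: attribute part, class part, then union."""
    attr = attribute_fragment(a)   # step 1: attribute deduction fragment
    cls = class_fragment(a, attr)  # step 2: class deduction fragment, built on step 1
    return attr | cls              # step 3: merge the two sets


def signature_fragment(
    signature: Iterable[str],
    attribute_fragment: Callable[[str], Set[Assertion]],
    class_fragment: Callable[[str, Set[Assertion]], Set[Assertion]],
) -> Set[Assertion]:
    """Fragment of a signature as the union of its elements' fragments."""
    result: Set[Assertion] = set()
    for a in signature:
        result |= element_fragment(a, attribute_fragment, class_fragment)
    return result
```

Passing the class step the result of the attribute step mirrors the dependency stated above: class assertions depend on attribute assertions, but not the other way round.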

2. Computation of the attribute deduction fragment
Figure GDA0002624557690000087

Every assertion in the attribute deduction fragment
Figure GDA0002624557690000088
of metadata element a is relevant to the derivation of an arbitrary attribute assertion R(a,b) of a, so
Figure GDA0002624557690000089
Figure GDA00026245576900000810
where γ is an ABox assertion and
Figure GDA00026245576900000811
is the argument for R(a,b). Because both role hierarchies and transitive roles affect attribute assertions, the computation of
Figure GDA00026245576900000812
must consider two kinds of assertions. The first kind has the form
Figure GDA00026245576900000813
or
Figure GDA00026245576900000814
Figure GDA00026245576900000815
The second kind consists of the assertions in the role hierarchy from a to b, all of which have a transitive parent role R0 and
Figure GDA00026245576900000816
for example R1(a,a1), R2(a2,a1),
Figure GDA00026245576900000817
where R1, R2⁻,
Figure GDA00026245576900000818
and R0 is a transitive role. From the first kind of assertion we obtain criterion 1: the attribute assertions in
Figure GDA00026245576900000819
that have a as their first or second argument. From the second kind we obtain criterion 2: the attribute assertions in
Figure GDA00026245576900000820
on a role path from a to b, these assertions having the same transitive parent role. The set of attribute assertions that satisfy criterion 1 and criterion 2 is therefore the attribute deduction fragment of metadata element a
Figure GDA0002624557690000091
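The two criteria can be sketched in Python as follows. This is an illustrative reading, not the patented algorithm itself: attribute assertions are encoded as `(role, subject, object)` tuples, an inverse role is written with a trailing `-` (so `R2-` is the inverse of `R2`), and `transitive_parents` is a hypothetical precomputed map from each role name to its transitive super-roles. The sketch also assumes that the fragment collects the assertions selected by either criterion, since the example assertion R2(a2,a1) lies on the role path but does not mention a.

```python
from collections import deque
from typing import Dict, Set, Tuple

Assertion = Tuple[str, str, str]  # (role, subject, object), read as role(subject, object)


def criterion1(abox: Set[Assertion], a: str) -> Set[Assertion]:
    """Attribute assertions with a as first or second argument."""
    return {g for g in abox if a in (g[1], g[2])}


def criterion2(abox: Set[Assertion], a: str,
               transitive_parents: Dict[str, Set[str]]) -> Set[Assertion]:
    """Assertions on role paths from a whose steps share one transitive parent role."""
    selected: Set[Assertion] = set()
    for r0 in set().union(*transitive_parents.values()):
        seen, queue = {a}, deque([a])
        while queue:
            x = queue.popleft()
            for role, s, o in abox:
                if s == x and r0 in transitive_parents.get(role, set()):
                    nxt = o  # follow role(s, o) forwards
                elif o == x and r0 in transitive_parents.get(role + "-", set()):
                    nxt = s  # follow the inverse role backwards
                else:
                    continue
                selected.add((role, s, o))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return selected


def attribute_fragment(abox: Set[Assertion], a: str,
                       transitive_parents: Dict[str, Set[str]]) -> Set[Assertion]:
    """Assumed combination: assertions selected by criterion 1 or criterion 2."""
    return criterion1(abox, a) | criterion2(abox, a, transitive_parents)
```

With the assertions R1(a,a1) and R2(a2,a1), and with R1 and the inverse of R2 both below a transitive role R0, the path a, a1, a2 is found and both assertions are selected.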

3. Computation of the class deduction fragment
Figure GDA0002624557690000092

To compute the logically independent fragment of metadata element a, we start from the
Figure GDA0002624557690000093
already obtained and further compute
Figure GDA0002624557690000094
Because every assertion in
Figure GDA0002624557690000095
is relevant to the derivation of an arbitrary class assertion C(a) of a,
Figure GDA0002624557690000096
where γ is an ABox assertion and
Figure GDA0002624557690000097
is the argument for C(a).

As noted above, in SHIQ the derivation of the class assertions of a metadata element depends on both class assertions and attribute assertions, so the class assertions of a are an indispensable part of
Figure GDA0002624557690000098
To identify the attribute assertions that affect C(a), each assertion in
Figure GDA0002624557690000099
must also be examined. Given a metadata knowledge base
Figure GDA00026245576900000910
Figure GDA00026245576900000911
C(a) holds only if the supporting concept of an assertion about element a is subsumed by C. To identify an assertion relevant to the derivation of a class assertion of a, we must therefore determine whether the supporting concept of that assertion is subsumed by some concept. For example, let
Figure GDA00026245576900000912
To decide whether R0(a,b) is relevant to the derivation of a class assertion of a, we must determine whether the supporting concept of R0(a,b) is subsumed by some concept, that is, whether

Figure GDA00026245576900000913

where
Figure GDA00026245576900000914
and b ∈ C1. It is easy to see that the above formula is satisfiable once C1 is replaced by B and C2 by A, so R0(a,b) is relevant to the derivation of C2(a); that is, R0(a,b) is in the argument for C2(a) and should therefore be added to
Figure GDA00026245576900000915
Moreover, since C1(b) is also needed to derive C2(a), the assertions in
Figure GDA00026245576900000916
should also be added to
Figure GDA00026245576900000917

The example above considers only a single attribute assertion affecting the derivation of a class assertion; in practice the derivation is sometimes affected by several assertions. For example, let
Figure GDA00026245576900000918
Figure GDA00026245576900000919
Here R0(a,b) is still relevant to the derivation of A(a), but testing with formula (1) whether the supporting concept of R0(a,b) is subsumed by some concept finds it unsatisfiable. To cover all the information about metadata element a, formula (1) should therefore be extended to:

Figure GDA00026245576900000920

where C3 is the integration of all other information about element a and
Figure GDA00026245576900000921
If the number restrictions of SHIQ are taken into account, formula (2) should be further extended to:

Figure GDA0002624557690000101

where
Figure GDA0002624557690000102
and
Figure GDA0002624557690000103
In formula (2),
Figure GDA0002624557690000104
is merely a special case of ≥nR.C1, and
Figure GDA0002624557690000105
denotes the R-neighbors of element a in C1. Formula (3) is the most general form for identifying the attribute assertions that affect C(a). It states that, in the metadata knowledge base, for any attribute assertion R(a,b) that affects a class assertion of metadata element a, the corresponding supporting concept must be subsumed by some concept and, for any number restriction, the number of R-neighbors of a must be no less than the restricted number.
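The number-restriction part of this condition can be illustrated with a small Python sketch that counts the R-neighbors of a belonging to C1 and compares the count with n. The names `r_neighbors` and `satisfies_at_least` and the tuple and dictionary encodings are assumptions made for illustration. The sketch counts only explicitly asserted neighbors and treats differently named individuals as distinct; a SHIQ reasoner would also account for inferred neighbors and class memberships.

```python
from typing import Dict, Set, Tuple

RoleAssertion = Tuple[str, str, str]  # (role, subject, object)


def r_neighbors(abox: Set[RoleAssertion], members: Dict[str, Set[str]],
                a: str, role: str, concept: str) -> Set[str]:
    """Asserted role-neighbors of a that are asserted members of concept."""
    in_concept = members.get(concept, set())
    return {o for (r, s, o) in abox if r == role and s == a and o in in_concept}


def satisfies_at_least(n: int, abox: Set[RoleAssertion],
                       members: Dict[str, Set[str]],
                       a: str, role: str, concept: str) -> bool:
    """Check the restriction >= n role.concept for element a by counting."""
    return len(r_neighbors(abox, members, a, role, concept)) >= n
```

Setting n to 1 gives the existential case, which matches the remark above that the form used in formula (2) is a special case of ≥nR.C1.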

4. Executing the metadata logically independent fragmentation algorithm

Because the class assertions of a metadata element depend on both class assertions and attribute assertions, whereas attribute assertions between different elements are affected only by attribute assertions and not by class assertions, the logically independent fragmentation of a single metadata element a should first compute the attribute deduction fragment
Figure GDA0002624557690000106
in which every assertion is relevant to the derivation of an arbitrary attribute assertion R(a,b) of a, then on that basis compute the class deduction fragment
Figure GDA0002624557690000107
in which every assertion is relevant to the derivation of an arbitrary class assertion C(a) of a, and finally take the union of the two to obtain the logically independent fragment of a. This reasoning yields the following algorithm.

Figure GDA0002624557690000108

Step 3: Perform structural integrity detection on the logically independent fragments.

Because the fragment obtained in step 2 is the closure of what a given metadata element logically entails, logically independent fragmentation makes it possible to run structural integrity detection on a smaller set of metadata, or to run it in parallel. Depending on the initial size of the metadata knowledge base, we can partition it into disjoint subsets of suitable size and then use Algorithm 1 to generate the same number of logically independent fragments. Checking the class membership of a single metadata element a, or the attribute relationship between two metadata elements a and b, can be done on the fragment containing a and b rather than on the whole metadata knowledge base. Checking all instance elements of a metaclass, or all instance elements related through an attribute, can be done by executing the same query on every fragment in parallel and then merging the results.
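The "same query on every fragment, then merge" pattern can be sketched with Python's standard `concurrent.futures` module. The fragment encoding (sets of `(metaclass, element)` class assertions) and the names `query_fragment` and `instances_of` are assumptions for this sketch, and a thread pool is used only to show the control flow; an actual repository would dispatch reasoner queries instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Set, Tuple

ClassAssertion = Tuple[str, str]  # (metaclass, element), read as metaclass(element)


def query_fragment(fragment: Set[ClassAssertion], metaclass: str) -> Set[str]:
    """Answer the query 'all instances of metaclass' on one fragment."""
    return {element for (cls, element) in fragment if cls == metaclass}


def instances_of(fragments: Iterable[Set[ClassAssertion]], metaclass: str,
                 workers: int = 4) -> Set[str]:
    """Run the same query on every fragment in parallel and merge the answers."""
    merged: Set[str] = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda f: query_fragment(f, metaclass), fragments):
            merged |= partial  # set union, since fragments may share assertions
    return merged
```

Merging with a set union rather than concatenation matters because the fragments generated from disjoint subsets can still overlap after closure.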

According to the structural integrity constraints, if an operation changes the type of an attribute in the meta-level, the new type is not a superclass of the original type, and the original type is a metaclass that already exists in the meta-level, then a structural integrity conflict arises unless the elements in the level below are modified accordingly. This kind of conflict can be detected with the following statement (assuming that the type of the attribute referencedType of the metaclass Property is changed from StructuredType to SimpleType, rather than to a supertype such as DataType, as shown in Figure 4):

Figure GDA0002624557690000111

In the query of this example, computing both count1 and count2 requires executing the query atom referencedType(property,datatype), which searches the entire metadata knowledge base, starting from the known property, to determine all datatypes referenced by that property. Computing count2 additionally requires the query atom SimpleType(datatype), which searches the entire metadata knowledge base to determine all datatypes that belong to SimpleType. With the logically independent fragmentation of step 2, both query atoms can be executed in parallel, which improves detection efficiency.
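The detection statement itself survives only as an image, so the following Python sketch is a guess at its logic based on the description of count1 and count2. It assumes that count1 counts the datatypes referenced by a property, that count2 counts those among them that belong to SimpleType, and that a conflict is reported when the two counts differ; the function name and data encodings are hypothetical.

```python
from typing import Set, Tuple


def referenced_type_conflict(referenced_type: Set[Tuple[str, str]],
                             simple_types: Set[str], prop: str) -> bool:
    """Assumed check: conflict if some datatype referenced by prop is not a SimpleType."""
    # Query atom referencedType(property, datatype) for the known property.
    datatypes = {d for (p, d) in referenced_type if p == prop}
    count1 = len(datatypes)
    # Adding the query atom SimpleType(datatype) narrows the same result.
    count2 = len(datatypes & simple_types)
    return count1 != count2
```

Under this reading, a property that still references a structured datatype after the meta-level change yields count1 greater than count2 and is flagged.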

The next example is the detection of aggregation multiplicity conflicts. According to the structural integrity constraints, if an operation changes the multiplicity of an aggregation end in the meta-level and the lower-level elements are not modified accordingly, the number of corresponding instances conflicts with the modified multiplicity. This kind of conflict can be detected with the following statement (assuming that the multiplicity at the AggregationEnd end of the aggregation between Aggregation and AggregationEnd is changed from 1 to 2):

Figure GDA0002624557690000121

In this example, computing count1 requires the query atom Aggregation-AggregationEnd(aggregation,aggregationEnd), which searches the entire metadata knowledge base to determine all aggregationEnd elements associated with the known aggregation through Aggregation-AggregationEnd. With the logically independent fragmentation of step 2, this query atom can be parallelized, which improves detection efficiency.
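Since the statement is available only as an image, the check below is a sketch of one plausible reading: count1 is the number of aggregationEnd elements linked to a given aggregation, and a conflict is reported when it differs from the new multiplicity. The function name, the `(aggregation, aggregationEnd)` pair encoding, and the treatment of the multiplicity as an exact value are all assumptions.

```python
from typing import Set, Tuple


def multiplicity_conflict(links: Set[Tuple[str, str]], aggregation: str,
                          required: int = 2) -> bool:
    """Assumed check: conflict if the number of linked ends differs from required."""
    # Query atom Aggregation-AggregationEnd(aggregation, aggregationEnd).
    ends = {end for (agg, end) in links if agg == aggregation}
    count1 = len(ends)
    return count1 != required
```

An aggregation instance created under the old multiplicity of 1 has a single end and is therefore flagged once the multiplicity becomes 2.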

To evaluate the effectiveness of the present invention, we carried out a large number of experiments focused on how much the optimization method improves the time performance of structural integrity detection. The instance set for the experiments was taken from MBRS, a MOF metadata repository system. The system consists of a repository client, a repository management module, and a data store. The repository client is used to build repository applications on top of the system. The repository management module processes metadata and provides services to the repository client; it implements the logically independent fragmentation of metadata and the parallel processing. The data store consists of the M0 layer and the metadata of the layers above it. The repository management module in turn comprises a set of well-defined MBRS interface APIs, implemented as an extension of JMI reflection, and a metadata manager, which organizes metadata into a hierarchy and manages the querying and storage of the metadata at each layer. MBRS uses an Oracle 11g database to store the M0-layer data and the metadata. The structural integrity conflicts were obtained in two ways, from system instances and by manual injection, and cover every aspect of structural integrity, including conflicts related to deleting and creating packages, conflicts related to changing the multiplicity of association ends and attributes, and conflicts related to modifying references. The experimental results show that the optimization method improves the detection efficiency for every kind of structural integrity conflict, to varying degrees.

The effectiveness tests were run on an Intel Xeon E7-4830 eight-core CPU with 6 GB of memory. The elapsed time of each category of test is measured in milliseconds, and the results are shown in Figure 1. The figure shows the execution time of structural integrity detection for M0, M1, and M2 layer metadata of different sizes. The upper part shows the time taken to check consistency as the numbers of the corresponding M1 and M0 layer elements vary while the M2 layer metadata is small (24 classes). The black curve is the execution time without optimization; the blue, orange, and purple curves are the times taken by structural integrity detection when the M2 layer is divided into 2, 4, and 8 fragments respectively. The lower part shows the execution time when the M2 layer metadata is larger and the corresponding M1 and M0 layer metadata is larger as well. Without optimization and with 2, 4, and 8 fragments alike, the time taken by structural integrity detection is linear in the metadata size, as expected. Doubling the number of fragments does not halve the detection time, presumably because the fragmentation process itself takes some time. Even so, each increase in the number of fragments markedly reduces the detection time. The experimental results also show that, on average, the optimization method yields a significant improvement in time efficiency on small and medium-sized metadata sets.

It should be emphasized that the embodiments described in the present invention are illustrative rather than restrictive. The present invention therefore includes, but is not limited to, the embodiments described in the detailed description; other embodiments derived by those skilled in the art from the technical solution of the present invention likewise fall within the scope of protection of the present invention.

Claims (5)

1. A structural integrity detection optimization method based on metadata logically independent fragmentation, characterized by comprising the following steps:
Step 1: formalize the repository metadata and data as a description logic SHIQ metadata knowledge base;
Step 2: perform logically independent fragmentation of the SHIQ metadata knowledge base;
Step 3: perform structural integrity detection on the logically independent fragments;
wherein step 2 is implemented by the following steps:
(1) according to criterion 1 and criterion 2, compute the attribute deduction fragment
Figure FDA0002624557680000011
of a given element a of the SHIQ metadata knowledge base, where
Figure FDA0002624557680000012
is the SHIQ metadata knowledge base;
(2) add all class assertions of element a to
Figure FDA0002624557680000013
(3) for each R0(a,b) in
Figure FDA0002624557680000014
and each R satisfying
Figure FDA0002624557680000015
determine whether
Figure FDA0002624557680000016
is satisfied, and if so, add R0(a,b), the assertions in the argument
Figure FDA0002624557680000017
and the assertions in
Figure FDA0002624557680000018
to
Figure FDA0002624557680000019
(4) compute
Figure FDA00026245576800000110
wherein criterion 1 is: the attribute assertions in the SHIQ metadata knowledge base that have element a as their first or second argument; and criterion 2 is: the attribute assertions in the SHIQ metadata knowledge base on a role path from element a to element b, these assertions having the same transitive parent role.
2. The structural integrity detection optimization method based on metadata logically independent fragmentation according to claim 1, characterized in that step 1 comprises: formalizing the meta-level Mn+1 layer and the instance-level Mn layer of the repository separately, where n is 0 or 1.
3. The structural integrity detection optimization method based on metadata logically independent fragmentation according to claim 2, characterized in that the meta-level Mn+1 layer is formalized as follows:
(1) each metaclass in the meta-level is converted into a SHIQ concept, such that two different metaclasses may have attributes with the same name but different types;
(2) a class C and a type C' in the meta-level are formalized as concepts together with two mutually inverse roles r1 and r2;
(3) generalization: if a metaclass C1 is a generalization of a metaclass C2, this is formalized as
Figure FDA00026245576800000111
so that every attribute of metaclass C1, and every aggregation association and general association involving metaclass C1, is inherited by metaclass C2; this also applies to multiple inheritance between metaclasses.
4. The structural integrity detection optimization method based on metadata logically independent fragmentation according to claim 2, characterized in that the instance-level Mn layer is formalized as follows:
(1) if an Mn-layer element c is an instance of the metaclass C in its meta-level, it is formalized as C(c);
(2) if an Mn-layer element c1 is associated with an element c2, the corresponding metaclass C1 aggregates C2 through an aggregation association A, and the aggregation association A is formalized as a role in the TBox, then this is formalized as A(c1,c2);
(3) if an Mn-layer element c1 is associated with an element c2, the corresponding metaclass C1 is related to the metaclass C2 through a general association, and the general association between C1 and C2 is formalized as a concept and roles r1, r2, then the relationship between c1 and c2 can be formalized as three assertions: A(a); r1(a,c1); r2(a,c2).
5. The structural integrity detection optimization method based on metadata logically independent fragmentation according to claim 1, characterized in that step 3 is performed as follows: checking the class membership of a single metadata element a, and checking the attribute relationship between metadata element a and metadata element b, are performed on the fragment containing metadata element a and metadata element b; checking all instance elements of a metaclass, or all instance elements related through an attribute, is performed by executing the same query on every fragment in parallel and then merging the results.
CN201710290286.5A 2017-04-28 2017-04-28 Structural Integrity Detection Optimization Method Based on Metadata Logically Independent Fragmentation Active CN107247725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710290286.5A CN107247725B (en) 2017-04-28 2017-04-28 Structural Integrity Detection Optimization Method Based on Metadata Logically Independent Fragmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710290286.5A CN107247725B (en) 2017-04-28 2017-04-28 Structural Integrity Detection Optimization Method Based on Metadata Logically Independent Fragmentation

Publications (2)

Publication Number Publication Date
CN107247725A CN107247725A (en) 2017-10-13
CN107247725B true CN107247725B (en) 2020-10-23

Family

ID=60016861

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710290286.5A Active CN107247725B (en) 2017-04-28 2017-04-28 Structural Integrity Detection Optimization Method Based on Metadata Logically Independent Fragmentation

Country Status (1)

Country Link
CN (1) CN107247725B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344206B (en) * 2018-12-03 2021-07-16 天津电气科学研究院有限公司 An Automatic Repair Method for OLAP Metadata Conflict Based on Query Reasoning
CN109840078B (en) * 2018-12-25 2022-06-10 北京仁科互动网络技术有限公司 Method and device for collaboratively editing hierarchical metadata

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1534518A (en) * 2003-03-27 2004-10-06 微软公司 Reproduction of consistency element in application defined system

Also Published As

Publication number Publication date
CN107247725A (en) 2017-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant