WO2018086025A1

WO2018086025A1 - Node identification in distributed adaptive networks

Info

Publication number: WO2018086025A1
Application number: PCT/CN2016/105324
Authority: WO
Inventors: Guangyue LU; Fan Wang; Jingyu FENG
Original assignee: Nokia Technologies Oy; Nokia Technologies (Beijing) Co., Ltd.
Priority date: 2016-11-10
Filing date: 2016-11-10
Publication date: 2018-05-17
Also published as: CN110168557B; CN110168557A

Abstract

A method for distributed networks may comprise: collecting first data information transmitted at an observation point from a first node to a second node in a distributed network, wherein the second node is a neighbor node of the first node； comparing the first data information with second data information transmitted at the observation point by the second node to its neighbor nodes in the distributed network； and determining whether the first data information is to be used by the second node in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information

Description

NODE IDENTIFICATION IN DISTRIBUTED ADAPTIVE NETWORKS

FIELD OF THE INVENTION

The present disclosure generally relates to communication networks, and more specifically, relates to distributed adaptive networks.

BACKGROUND

Communication networks can be deployed in centralized or distributed manner. In the centralized strategies, all nodes transmit their information to a central authority for making decision or data fusion, which can fully benefit from information collected throughout a network. Unfortunately, the central authority can easily become a bottleneck of the network, that is, the failure of the centralized node would become the disaster because of the possible denial-of-service attack. To overcome the limitation, distributed strategies, without a central authority, attracts more attentions for its good performances and robustness to the attack. In such case, many kinds of distributed algorithms are needed to fulfill the parameter estimation.

On the other hand, the networks are not always in a secure situation. For example, the possible malicious nodes tamper their own values and transmit them to their neighbor nodes for data fusion, which could definitely affect the performance of entire network during the data diffusion among the nodes. So how to identify a malicious node and then prevent its value from being transmitted to its neighbor nodes have become crucial issues for distributed networks.

SUMMARY

The present description proposes a novel solution for distributed networks, which may be utilized to identify some specified data information and network nodes, for example, abnormal data and related nodes in the distributed networks.

According to a first aspect of the present disclosure, there is provided a method comprising: collecting first data information transmitted at an observation point from a first node to a second node in a distributed network, wherein the second node is a neighbor node of the first node； comparing the first data information with second data information transmitted at the observation point by the second node to its neighbor nodes in the distributed network； and determining whether the first data information is to be used by the second node in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.

According to a second aspect of the present disclosure, there is provided an apparatus comprising at least one processor and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: collect first data information transmitted at an observation point from another apparatus to the apparatus in a distributed network, wherein the apparatus is a neighbor of the another apparatus； compare the first data information with second data information transmitted at the observation point by the apparatus to its neighbors in the distributed network； and determine whether the first data information is to be used by the apparatus in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.

According to a third aspect of the present disclosure, there is provided a computer program product comprising a computer-readable medium bearing computer program codes embodied therein for use with a computer, the computer program codes comprising: code for collecting first data information transmitted at an observation point from a first node to a second node in a distributed network, wherein the second node is a neighbor node of the first node； code for comparing the first data information with second data information transmitted at the observation point by the second node to its neighbor nodes in the distributed network； and code for determining whether the first data information is to be used by the second node in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.

According to a fourth aspect of the present disclosure, there is provided an apparatus comprising: collection means for collecting first data information transmitted at an observation point from another apparatus to the apparatus in a distributed network, wherein the apparatus is a neighbor of the another apparatus； comparison means for comparing the first data information with second data information transmitted at the observation point by the apparatus to its neighbors in the distributed network； and determination means for determining whether the first data information is to be used by the apparatus in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.

In accordance with an exemplary embodiment, the apparatus according to the second/fourth aspect of the present disclosure may comprise the second node, and the another apparatus according to the second/fourth aspect of the present disclosure may comprise the first node.

In accordance with an exemplary embodiment, the comparison of the first data information with the second data information may comprise calculating a similarity distance between the first data information and the second data information. The similarity distance may represent a distance between values of the first data information and the second data information.

In accordance with an exemplary embodiment, it is determined that the first data information is to be used by the second node in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information not satisfying a first criterion.

In accordance with another exemplary embodiment, it is determined that the first data information is not to be used by the second node in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information satisfying a first criterion. For example, the first data information may be replaced with the second data information by the second node in the data fusion associated with the observation point.

Optionally, the method according to the first aspect of the present disclosure may further comprise: updating first identification information for the first node, in response to determining that the first data information is not to be used by the second node in the data fusion associated with the observation point. The first identification information may indicate, within an observation cycle comprising one or more observation points, a number of observation points at which data information from the first node is determined not to be used in data fusion by the second node.

In accordance with an exemplary embodiment, the method according to the first aspect of the present disclosure may further comprise: updating second identification information for the first node, in response to the first identification information at the end of the observation cycle satisfying a second criterion. The second identification information may indicate a number of observation cycles at the end of which the first identification information for the first node satisfies the second criterion.

In accordance with an exemplary embodiment, the method according to the first aspect of the present disclosure may further comprise: triggering a predefined action for the first node, in response to the second identification information satisfying a third criterion. For example, the predefined action may comprise raising an alarm.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosure itself, the preferable mode of use and further objectives are best understood by reference to the following detailed description of the embodiments when read in conjunction with the accompanying drawings, in which:

Fig. 1 exemplarily illustrates a distributed network topology and data transmissions among nodes in accordance with an embodiment of the present disclosure；

Fig. 2 is a flowchart illustrating a method for distributed networks in accordance with an embodiment of the present disclosure；

Fig. 3 exemplarily illustrates the effects of network performance as the change of the predefined fusion threshold in the absence of attacks in accordance with an embodiment of the present disclosure；

Fig. 4 is a flowchart illustrating an identification procedure in accordance with an embodiment of the present disclosure；

Fig. 5 exemplarily illustrates the system architecture of a data management module in accordance with an embodiment of the present disclosure；

Fig. 6 exemplarily illustrates a network topology used in the simulations in accordance with an embodiment of the present disclosure；

Fig. 7 exemplarily illustrates the detection of the number of attacks under two kinds of attack modes in accordance with an embodiment of the present disclosure；

Fig. 8 exemplarily illustrates the comparison of DLMS performance in the unsafe situation in accordance with an embodiment of the present disclosure；

Fig. 9 exemplarily illustrates the impact of the increased number of malicious nodes on the network performance in accordance with an embodiment of the present disclosure； and

Fig. 10 is a simplified block diagram of an apparatus suitable for use in practicing exemplary embodiments of the present disclosure.

DETAILED DESCRIPTION OF THE INVENTION

The embodiments of the present disclosure are described in details with reference to the accompanying drawings. Reference throughout this specification to features, advantages, or similar language does not imply that all of the features and advantages that may be realized with the present disclosure should be or are in any single embodiment of the disclosure. Rather, language referring to the features and advantages is understood to mean that a specific feature, advantage, or characteristic described in connection with an embodiment is included in at least one embodiment of the present disclosure. Furthermore, the described features, advantages, and characteristics of the disclosure may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize that the disclosure may be practiced without one or more of the specific features or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments of the disclosure.

Distributed adaptation over networks has emerged as an attractive and challenging research area with the advent of multiagent (wireless or wired) networks. Distributed adaptive networks are well-suited for the decentralized information processing and optimization tasks to model various types of self-organized and complex behaviors encountered in engineering, such as target localization, cooperative spectrum sensing, distributed estimation, etc. Distributed adaptive networks consist of a collection of nodes with processing and learning abilities. The interconnected nodes form a connecting topology, and cooperate with each other through local interactions to perform preassigned tasks in real time.

In the adaptive networks, the nodes share information with each other and the continuous information diffusion across the network enables the nodes to adapt their performances to the data stream and network conditions. It can also achieve the improved adaptation and learning performance, as comparing with the non-cooperative nodes. However, the distributed networks may be in an unsecure situation where the possible malicious nodes may deteriorate the performance of entire network during the data diffusion among the nodes.

When a small amount of malicious nodes exist in the adaptive networks, some technical problems may have to be confronted. For example, the malicious node can launch different kinds of attacks, with certain probability, to its neighbor nodes at any time via different data falsifying ways. Therefore, it is hard to control and predict the attack behavior of the malicious node. The malicious node may transmit its falsified value to the regular nodes, which will cause catastrophic damage to the performance of the entire network, due to the use of the falsified value in data fusion, and thus make a well designed distributed system useless. With the increased number of malicious nodes, the regular nodes would suffer more fierce attack, which may lead to more serious damage to the entire network.

There are some possible schemes designed to deal with the problem of malicious nodes in distributed adaptive networks. For example, one way to detect malicious nodes is the sort-based method. In this method, the central authority sorts the values received from its neighbors and regards the values at both ends of them as malicious. However, confronting different attack modes, the values of malicious may change randomly, which makes it hard to identify a malicious node only via its value position. The effectiveness of the method is sensitive to the network topology and the type of malicious nodes. Another possible scheme relies on the differences between the values of each node and the average value of all neighbor nodes to identify the malicious nodes, which assumes that the malicious node has the biggest difference from the average value. However, this scheme is only useful when facing constant attack, that is, it is also limited by the attack modes and the increased number of malicious nodes. An improved scheme is designed, which also uses the differences between the value of single node and the average value of all nodes to identify malicious nodes. However, it needs a central authority to complete all the procedure and the central authority plays a key role for the entire network, which imposes too much pressure on the central authority.

It can be seen that the mentioned schemes are only for particular attack modes, and the number of malicious nodes is also a great limitation on the performance of such schemes. However, there actually exist malicious nodes with one or mixed attack mode in real networks.

In comparison to the current solutions, methods, apparatus, and computer program products provided according to exemplary embodiments of the present disclosure can provide a novel and effective solution to identify the malicious nodes in distributed networks. The node identification scheme proposed in the present disclosure could almost eliminate the influences of malicious nodes and greatly improve the performance of entire network.

Fig. 1 exemplarily illustrates a distributed network topology and data transmissions among nodes in accordance with an embodiment of the present disclosure. The distributed network as shown in Fig. 1 is a connected network consisting of N nodes. Each node has the ability of communicating with its neighbors and processing data. It will be realized that Fig. 1 merely shows schematically a topology structure and data transmissions of the exemplary distributed network and other possible implementations of the distributed network may also be conceived in view of the exemplary embodiments of the present disclosure.

For data communications, a node in the distributed network as shown in Fig. 1 may generate data at a time instant and broadcast it to all of its neighbor nodes for cooperation. For example, node l may generate data (which may be represented by a data vector with M components

) at time i and transmit it to all of its neighbor nodes (such as neighbor node 2, node 3 and node k) . On the other hand, the node may receive data from its neighbor nodes and perform data fusion by using the received data and its own data. For example, the data of node k and all data from its neighbors (which may be denoted as the data collection φ_k (i) ＝ [ψ_l (i) , … ψ_k-1 (i) , ψ_k (i) , … ψ_N (i) ] ) would be used for data fusion at node k.

If the entire network is safe, these nodes cooperate with each other to make beneficial contributions to the network, which is the ideal situation. In this situation, the values of these nodes are the same. Here the value of a node may refer to the specific value of data transmitted by the node. For example, if all nodes cooperate with each other to estimate the temperature, then the value of a node refers to the temperature estimated and transmitted by the node.

However, a few of malicious nodes in the network will make the situation change. For example, node l as one neighbor of node k may be one of the malicious nodes in the network. Though there are various kinds of attack modes, the vector ψ_l (i) may be falsified with certain probability in two forms such as the same-direction attack and the Gaussian attack, respectively.

For the same-direction attack, the attacker (such as node l) may launch attacks at time i, with probability p_l, via tampering its value either in the increasing way or in the decreasing way, which results in ψ_l (i) at the both ends of φ_k (i) after the sorting operation on φ_k (i) . For example, the same-direction attack can be expressed as follows:

where node l, if launching attack, may multiply the j-th component

by λ (i) at time i with probability p_l, and, if no attacking, do nothing with probability 1-p_l.

Being different from the same-direction attack, the Gaussian attack may randomly generate a Gaussian value as the component of ψ_l. Mathematically, this attack mode can be written as:

where the j-th component

may be determined by the Gaussian distribution with mean α_l and variance

at time i with probability p_l, and, if no attacking, do nothing with probability 1-p_l.

It is worth noting that most nodes in the distributed network or distributed adaptive network are rational nodes. From the social perspective, for example, a person is willing to allow his/her friends into his/her home because of the close relationships between them. Conversely, the malicious person who intends to harm others will not get any permission because of his/her deteriorated relationship with regular persons. Thus, there may be a standard, as a designed metric, to measure this kind of relationship. In view of this analysis, a novel and effective solution is proposed in the present disclosure to identify some specified nodes (such as the malicious nodes) and solve the attack problem in the distributed adaptive network.

Fig. 2 is a flowchart illustrating a method for distributed networks in accordance with an embodiment of the present disclosure. The method illustrated in Fig. 2 may be performed, for example, at a node or any other network element with node functionality in a distributed network. It will be realized that the distributed network to which the proposed method may be applicable may comprise the network as shown in Fig. 1 or any other adaptive network with distributed deployment.

A plurality of nodes comprising a first node and a second node in the distributed network may cooperate with each other through data communications and information exchanges. According to the method illustrated in Fig. 2, the second node which is a neighbor node of the first node may collect first data information transmitted at an observation point (such as time instant i) from the first node to the second node in the distributed network, as shown in block 202. For example, the first data information may comprise data information measured, sensed, generated and/or derived by the first node.

In accordance with an exemplary embodiment, the second node may also collect data information transmitted at this observation point from its other neighbor nodes in addition to the first node. The collected data information may be used by the second node with its own data information to perform information diffusion or data diffusion. The data diffusion among the network may increase both the performance and the risk in unsafe situation. So it is desirable to find ways to prevent the values of malicious nodes from entering the process of data fusion.

Since all nodes in the distributed network may have the same task, such as, estimation or tracing, the values of regular nodes are similar to each other. However, the values of malicious nodes will deviate from the norm range, which can be used to detect the malicious node. Accordingly, data information from the detected malicious node may be prevented from being used in data fusion.

Thus, as shown in block 204, the second node may compare the first data information with second data information transmitted at the observation point by the second node to its neighbor nodes in the distributed network. In block 206, the second node can determine whether the first data information is to be used by the second node in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.

In accordance with an exemplary embodiment, the comparison of the first data information with the second data information may comprise calculating a similarity distance between the first data information and the second data information. The similarity distance may represent a distance between values of the first data information and the second data information. For example, the similarity distance η_l,k (i) may be defined as:

where ψ_l (i) and ψ_k (i) respectively represent the data or data information transmitted by node k and node l at time i , and η_l, _k (i) represents the distance between the value of node k and that of its neighbor node l at time i.

If node l is assumed as a malicious node, the distance between ψ_l (i) and ψ_k (i) is greater than the distant between ψ_k (i) and the data of other regular neighbor nodes. Then, the similarity distance η_l, _k (i) can be used as an indicator of the malicious node, which can be calculated before the data fusion at node k.

This principle may be taken into account in the proposed method described with respect to Fig. 2. As such, in response to the similarity distance between the first data information and the second data information not satisfying a first criterion, it is determined that the first data information is to be used by the second node in the data fusion associated with the observation point. Otherwise, it is determined that the first data information is not to be used by the second node in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information satisfying a first criterion.

If the first data information is determined not to be used by the second node in the data fusion, for example, the first data information may be replaced with the second data information by the second node in the data fusion associated with the observation point. This enables the second node to eliminate or reduce the adverse effects on the data fusion caused by the abnormal data from the first node, while maintaining the normal number of contributors to the data fusion.

In accordance with an exemplary embodiment, the first criterion may comprise the similarity distance η_l, _k (i) being larger than a predefined fusion threshold r. As such, if the first criterion is satisfied (η_l, _k (i) ＞r) , then ψ_l (i) would be considered as the value of malicious node and need to be processed before data fusion at node k. Otherwise, ψ_l (i) may be regarded as safe one for data fusion. Through this identification scheme, each node in the network can be evaluated before the process of data fusion via the similarity distance to judge whether it may be considered as a malicious node or not.

Fig. 3 exemplarily illustrates the effects of network performance as the change of the predefined fusion threshold in the absence of attacks in accordance with an embodiment of the present disclosure. The Mean Square Deviation (MSD) curve may be used to measure the performance of estimation of the entire network and MSD can be expressed as：

where w⁰ is a M×1 target vector which may be estimated by using the conventional Diffusion Least Mean Square (DLMS) algorithm (which is a kind of algorithm in distributed network and will be described later) , w_k (i) is the estimation or estimated vector of node k at time i, N represents the total number of nodes in the whole network and E denotes mathematical expectation. The equation amounts to averaging the differences between the estimated vectors of the individual nodes and the target vector at time i.

The predefined fusion threshold r can be obtained through simulations after the network configuration. Fig. 3 shows the determination of r in distributed estimation. As seen in Fig. 3, the nodes of entire network work independently without any information transmission when r ＝ 0, leading to the worst performance of estimation. With the increase of the fusion threshold r, the estimations from neighbor nodes are allowed into the process of data fusion, making the performance improved gradually. The performance curve at r ＝ 0.004 is almost same as the DLMS algorithm without any attacks. Thus, r ＝ 0.004 may be chosen as the value of the predefined fusion threshold. It is worth nothing that the increase of threshold r results in the benefits on performance, but that does not mean the bigger the better, which may lead to the estimations of malicious nodes to enter into the process of data fusion. It will be realized that the predefined fusion threshold r may also be set as any other suitable value as required.

The above method described in combination with Fig. 2 can eliminate the impact of the attack on regular nodes at time i by preventing abnormal or falsified data from data fusion. However, a node cannot be judged as a malicious node or not only through the short-term effects caused probably by the change of environmental factors, such as channel fading and shadow effect, etc. So a long-time observation may increase the accuracy in identifying an abnormal node or determining the malicious node.

Thus, according to an exemplary embodiment, first identification information (such as m_l described with respect to Fig. 4) for the first node may be defined to indicate, within an observation cycle or period (such as T described with respect to Fig. 4) comprising one or more observation points, a number of observation points at which data information from the first node is determined not to be used in data fusion by the second node. Thus, the first identification information for the first node may be updated in response to determining that the first data information is not to be used by the second node in the data fusion associated with the observation point.

According to the exemplary embodiment, second identification information (such as τ_l described with respect to Fig. 4) for the first node may be defined to indicate a number of observation cycles at the end of which the first identification information for the first node satisfies a second criterion. Thus, the second identification information for the first node may be updated in response to the first identification information at the end of the observation cycle satisfying the second criterion.

For example, the second criterion may comprise the attack probability (which may be represented by a ratio of the value of the first identification information to the observation cycle) being equal to or larger than a predefined identification threshold ρ. As such, if the second criterion is satisfied, which means that the attack probability is not below the predefined identification threshold ρ, then the first node may be suspected as a malicious node； otherwise, the first node would not be regarded as a malicious node.

In an exemplary embodiment, a predefined action may be triggered for the first node in response to the second identification information satisfying a third criterion. For example, the third criterion may comprise the value of the second identification information being equal to or larger than a predefined warning threshold ε. As such, if the third criterion is satisfied, which means that the first node is suspected as an abnormal node for one or more observation cycles not less than ε observation cycles, then at least one action may be triggered for the first node. For example, the predefined action may comprise raising an alarm.

It will be realized that the first, second and third criteria as previously described may be predefined or specified as required. The parameters, thresholds and rules associated with these criteria as well as the relationships thereof may vary, for example, as the network environment and the type of nodes which need to be identified change.

The proposed solution for node identification in the distributed network described with respect to Fig. 2 may be implemented with various suitable functional modules and/or network components. For example, the node identification scheme designed in accordance with exemplary embodiments of the present disclosure may be performed with three functional modules, such as an outlier data detection module, an evaluation and warning module, and a data management module. The outlier data detection module may be employed to identify abnormal information from neighbor nodes. The evaluation and warning module may be in charge of determining malicious nodes and raising an alarm in time. The data management module may be used to manage and update various data.

Fig. 4 is a flowchart illustrating an identification procedure in accordance with an embodiment of the present disclosure. This identification procedure may be divided into two parts, outlier data identification process and malicious node identification process, which are showed in the left and right of the flowchart, respectively. Specifically, the outlier data identification process (as shown in blocks 401-405) may be performed to prevent outlier data from entering into date fusion at a node to generate beneficial data, and then the beneficial data would be transmitted to the neighbors of the node. The malicious node identification process (as shown in blocks 406-412) may be performed to identify the malicious node and then to solve the attack problem in distributed adaptive networks.

The exemplary identification procedure may begin at block 401. As shown in Fig. 4, a node (such as node k) may collect information from its neighbors at block 402. For example, this node can obtain the data collection (such as φ_k (i) ＝ [ψ_l (i) , … ψ_k-1 (i) , ψ_k (i) , … ψ_N (i) ] ) comprising its own data and the data collected from all of its neighbors for data fusion associated with observation time i. As described with respect to Fig. 2, the similarity distance η_l, k (i) can be used to determine whether node l is a malicious node, which can be calculated before the data fusion at node k, as shown in block 403. The calculated η_l, k (i) may be compared with a predefined fusion threshold r at block 404. If η_l, k (i) ≤r, ψ_l (i) is regarded as safe one for data fusion, and the procedure proceeds to block 402 where the observation time i may be updated. Otherwise, ψ_l (i) would be considered as the value of a malicious node and need to be processed before data fusion at node k. For example, as shown in block 405, the number of attacks from node l detected during an observation cycle T may updated by m_l ＝ m_l+1, and ψ_l (i) may be replaced with ψ_k (i) to prevent the falsified value of malicious node l from the data fusion at node k. Through this outlier data identification process, each node in the network can be evaluated before the process of data fusion via the similarity distance to judge whether the received data may be used in data fusion.

As described with respect to Fig. 1, the malicious node l can launch an attack with certain probability p_l, so P_l can be defined to approximate the attack probability p_l as the evaluating indicator of neighbor node l. P_l can be expressed as follows:

where T is the observation cycle and m_l is the number of attacks detected during T. One time of calculation of P_l may not be enough for analyzing the behavior of node l, so a long-time accumulated value from many observation cycles may be needed to make a more reliable decision. If the currently accumulated number or time of estimations in one observation cycle (which may be represented by the latest updated value of i) is equal to the observation cycle T, the malicious node identification process may be invoked. Then the procedure proceeds from block 406 to 407 where P_l is calculated for node l. Otherwise, the procedure returns back to 402 where the observation time i may be updated.

In order to improve the procedure, parameters ρ and ε are introduced here， which may be called the identification threshold and the warning threshold, respectively. According to the different conditions and requirements, ρ and ε can be preset. As shown in block 408, if P_l≥ρ, node l may be suspected as the malicious node, and the counter τ_l is incremented by 1 till τ_l≥ε, as shown in

blocks

409 and 410. Otherwise, the procedure returns back to 402 where the observation time i may be updated. If τ_l≥ε at block 410, node l may be judged as the malicious node and a warning device or an alarm system may be activated at block 411. Thus, the operator can find and fix malicious nodes in time, which keep the network in safe stage.

The proposed solution introduces the concept of similarity distance and sets the fusion threshold to determine whether the data information received from a node is normal and can be used for data fusion. In addition, the proposed solution can effectively identify the malicious nodes launching attacks and solve the attack problem in the distributed adaptive network. According to an exemplary embodiment, the data used, obtained and/or generated in performing the proposed solution may be stored in a distributed manner without a central database.

Fig. 5 exemplarily illustrates the system architecture of a data management module in accordance with an embodiment of the present disclosure. As shown in Fig. 5, a node (such as node k) in the distributed adaptive network may have a data management module or a data manager which is responsible for collecting the data from its neighbors and updating some data and/or indicators used in the related components or modules (such as the evaluation and warning module) . For example, the data manager may store the data for node identification in the distributed adaptive network by a database (as shown in Table 1) .

Table 1: Description of database at node k

In accordance with an exemplary embodiment, the node may not store the data for all nodes of the whole network but its neighbors due to the limited computation capability. It will be realized that Fig. 5 and Table 1 merely schematically show some basic functional components of the data manager and some exemplary data stored in the database. Other possible operational elements and/or information configurations suitable for the proposed identification scheme may also be conceived in view of the exemplary embodiments of the present disclosure.

The proposed solution in accordance with the present disclosure does not need a central authority to identify the malicious nodes, but may be applicable in the distributed networks, which can have optimal performance for the distributed networks. Compared with the current approaches which are just effective for one kind of attack modes and can be easily affected by the increase of malicious nodes, the proposed solution would not have the limitations such as the mixed kinds of attack modes and the number of malicious nodes, and thus can effectively solve the attack problem in the distributed networks and greatly improve the performance. Moreover, less computation may be needed in the proposed solution, which makes it realize easily. The effectiveness and advantages of the proposed solution may be demonstrated in distributed parameter estimation via the simulations and performance analysis.

Fig. 6 exemplarily illustrates a network topology used in the simulations in accordance with an embodiment of the present disclosure. As shown in Fig. 6, a distributed network with 16 nodes (comprising one or more malicious nodes) is considered in the simulations. It will be appreciated that the considered distributed network may comprise more or less nodes than 16 nodes and may employ other suitable network topologies different from that shown in Fig. 6.

Similar to Fig. 3, MSD as expressed in equation (4) may be used to measure the performance of estimation of the entire network. The unknown M×1 vector w⁰ in MSD may be estimated by using the conventional DLMS algorithm. Considering a signal model denoted as follows:

d_k (i) ＝u_k (i) w⁰+v_k (i) , i ＝ 0, 1, … (6)

where node k has access to the observing realizations {d_k (i) , u_k (i) } , of zero-mean random data, with d_k (i) a scalar measurement and u_k (i) a 1×M regression row vector, both at time i； and v_k (i) is the background noise.

The conventional DLMS algorithm can be described as follows:

where w_k (i) is the estimation of node k at time i, μ_k is the step size, N_k is the collection of node k and its neighbor nodes, a_l, k is a nonnegative combination weight that node k assigns to data ψ_l (i) arriving from a member (such as node l) of N_k. The coefficients a_l, k may be designed to satisfy the following conditions:

a_l, k＝0, if

and

The DLMS algorithm may comprise two operations: adaptation and combination. In the adaptation stage, each node updates its estimator using the observed data {d_k (i) , u_k (i) } and collects the estimators from its neighbors in the combination stage. It is noted that no much discussion about the details of this algorithm is presented here so as not to unnecessarily obscure the present disclosure.

The simulation may be started with two kinds of attack modes such as the same-direction attack and the Gaussian attack. Some parameters and/or thresholds used in the simulation may be estimated or predefined as appropriate. For example, the fusion threshold r may be set to 0.004 as illustrated in Fig. 3. However, other values of the fusion threshold are also possible. Various performance advantages provided by the proposed solution (which is also denoted as “new scheme” in Figs. 8-9) in accordance with exemplary embodiments are exemplarily illustrated in Figs. 7-9.

As described with respect to various figures, the proposed solution in accordance with exemplary embodiments can enable a node to detect the abnormal data from its neighbors at every time under different attack modes and calculate the approximation of attack probability. Assume that node 1 and node 3 as shown in Fig. 6 are malicious, which have the attack probability 0.2 and 0.6 respectively. The results at node 2 are observed during 10 observation cycles, of which T is set to 1000 times. Fig. 7 exemplarily illustrates the detection of the number of attacks (denoted as m) under two kinds of attack modes in accordance with an embodiment of the present disclosure. As shown in Fig. 7, whatever the same-direction attack mode or the Gaussian attack mode, node 2 can accurately detect the number of attacks at every observation cycle T. Node 1 and node 3 are highly suspected as malicious because their attacks, of which m₁ ＝ 200 and m₃ ＝ 600 times respectively at every observation period T, are much more fierce than node 4 and node 5. The calculated attack probability P₁ and P₃ are around 0.2 and 0.6, which approach the real attack probability.

With the two kinds of attack modes, the proposed solution can greatly improve the performance of entire network and also eliminate the influences of different attack probabilities. Fig. 8 exemplarily illustrates the comparison of DLMS performance in the unsafe situation in accordance with an embodiment of the present disclosure. Specifically, Fig. 8 shows the comparison between the conventional DLMS algorithm and the DLMS algorithm using the proposed new scheme when node 3 is malicious. As shown in Fig. 8, whatever under any attack mode, the MSD curves of the conventional DLMS algorithm begin to rapidly deteriorate with the increase of attack probability. The performances of the conventional DLMS algorithm worsen to -10dB (Fig. 8 (a) ) and 5dB (Fig. 8 (b) ) while constant attacks (p＝1) . However, the proposed new scheme used in DLMS can greatly improve the estimation performance of the whole network and the MSD curves do not change over the attack probability, which always stay around -35dB that is same as the state of no attacks (p＝0) .

With the increased number of malicious nodes, the proposed new scheme can also keep the whole network high performance and robust. Fig. 9 exemplarily illustrates the impact of the increased number of malicious nodes on the network performance in accordance with an embodiment of the present disclosure. As shown in Fig. 9, regardless of attack modes, the MSD curve of when node 1 is malicious is almost the same as of when node 1 and node 3 are malicious, which stay around -35dB that is same as the state of no attacks (p＝0) .

The various blocks or information flows shown in Figs. 1-6 may be viewed as method steps, and/or as operations that result from operation of computer program code, and/or as a plurality of coupled logic circuit elements constructed to carry out the associated function (s) . The schematic flow chart diagrams described above are generally set forth as logical flow chart diagrams. As such, the depicted order and labeled steps are indicative of specific embodiments of the presented methods. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated methods. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.

Fig. 10 is a simplified block diagram of an apparatus suitable for use in practicing exemplary embodiments of the present disclosure. The apparatus 1000 shown in Fig. 10 may be configured to perform various operations and functions of a node in distributed networks according to exemplary embodiments as described in connection with Fig. 1-6. Particularly, the apparatus 1000 may be configured to function as the second node according to exemplary embodiments as described in connection with Fig. 2.

The apparatus 1000 may comprise a processor (PROC) 1010, a memory (MEM) 1020 which stores computer program codes (PROG) 1030, and a suitable communication unit (COM) 1040 (such as a transceiver, a receiver and/or a transmitter, optionally in communication with an antenna) for coupling or connecting to another apparatus such as a network node, a communication device, an interactive entity, wireless terminal, wired terminal, a user equipment, a server, a client, a peripheral device, a database and so on. For example, the communication unit 1040 may be configured to support the apparatus 1000 to transmit/receive signals and messages to/from another apparatus. The processor 1010 may be used for processing these signals and messages. In this example only one processor and one memory are shown in Fig. 10, but it will be appreciated that other examples may utilize more than one processor and/or more than one memory (for example, same or different processor/memory types) .

The processor 1010 may be embodied as various means for implementing the various functionalities of exemplary embodiments of the present disclosure comprising, for example, a microprocessor, a coprocessor, a controller, a general purpose computer, a special-purpose integrated circuit such as, for example, an Application Specific Integrated Circuit (ASIC) , a Field Programmable Gate Array (FPGA) , or a hardware accelerator, processing circuitry or the like. According to an exemplary embodiment, the processor 1010 may be representative of a plurality of processors, or one or more multiple core processors, operating in concert. The processor 1010 may, but need not, comprise one or more accompanying Digital Signal Processors (DSPs) . In some exemplary embodiments, the processor 1010 is configured to execute instructions stored in the memory device or instructions otherwise accessible to the processor 1010. The processor 1010 may be configured to operate such that the processor 1010 causes the apparatus 1000 to perform various functionalities described herein.

The memory 1020 may be one or more computer-readable storage media that may comprise volatile and/or non-volatile memory. In some exemplary embodiments, the memory 1020 comprises Random Access Memory (RAM) including dynamic and/or static RAM, on-chip or off-chip cache memory, and/or the like. Further, the memory 1020 may comprise non-volatile memory, which may be embedded and/or removable, and may include, for example, read-only memory, flash memory, magnetic storage devices (for example, hard disks, floppy disk drives, magnetic tape, etc. ) , optical disc drives and/or media, non-volatile random access memory (NVRAM) , and/or the like. The memory 1020 may comprise a cache area for temporary storage of data. In this regard, at least a portion or the entire memory 1020 may be included within the processor 1010. Further, the memory 1020 may be configured to store information, data, applications, computer-readable program code instructions, and/or the like for enabling the processor 1010 and the apparatus 1000 to carry out various functions in accordance with exemplary embodiments described herein. For example, the memory 1020 may be configured to buffer input data for processing by the processor 1010. Additionally or alternatively, the memory 1020 may be configured to store instructions for execution by the processor 1010.

The computer program codes 1030 may be stored on a memory device, such as the memory 1020, and executed by a processor, such as the processor 1010, to enable the apparatus 1000 to operate in accordance with the exemplary embodiments, as discussed above. That is, the exemplary embodiments of the present disclosure may be implemented at least in part by computer software executable by the processor 1010, or by hardware, or by a combination of software and hardware. As will be appreciated, any such computer program codes may be loaded onto a computer or other programmable apparatus from a computer-readable storage medium to produce a particular machine, such that the particular machine becomes means for implementing the functions specified in the flowchart’s block (s) or operation (s) . These computer program codes may also be stored in a computer-readable storage medium that can direct a computer, a processor, or other programmable apparatus to function in a particular manner to thereby generate a particular machine or particular article of manufacture that becomes means for implementing the functions specified in the flowchart’s block (s) or operation (s) . The computer program codes may be retrieved from a computer-readable storage medium and loaded into a computer, processor, or other programmable apparatus to configure the computer, processor, or other programmable apparatus to execute operations to be performed on or by the computer, processor, or other programmable apparatus.

Alternatively or additionally, the apparatus 1000 may comprise various means and/or modules for implementing functions of the foregoing steps and methods in Figs. 1-6. In an exemplary embodiment, the apparatus 1000 may serve as the second node as described with respect to Fig. 2 and comprise: collection means for collecting first data information transmitted at an observation point from another apparatus (such as the first node as described with respect to Fig. 2) to the apparatus in a distributed network, wherein the apparatus is a neighbor of the another apparatus； comparison means for comparing the first data information with second data information transmitted at the observation point by the apparatus to its neighbors in the distributed network； and determination means for determining whether the first data information is to be used by the apparatus in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.

In general, the various exemplary embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the disclosure is not limited thereto. While various aspects of the exemplary embodiments of this disclosure may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

It will be appreciated that at least some aspects of the exemplary embodiments of the disclosures may be embodied in computer-executable instructions, such as in one or more program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types when executed by a processor in a computer or other device. The computer executable instructions may be stored on a computer readable medium such as a hard disk, optical disk, removable storage media, solid state memory, RAM, etc. As will be realized by one of skills in the art, the functionality of the program modules may be combined or distributed as desired in various embodiments. In addition, the functionality may be embodied in whole or in part in firmware or hardware equivalents such as integrated circuits, FPGA, and the like.

Although specific embodiments of the disclosure have been disclosed, those having ordinary skills in the art will understand that changes can be made to the specific embodiments without departing from the spirit and scope of the disclosure. The scope of the disclosure is not to be restricted therefore to the specific embodiments, and it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present disclosure.

Claims

A method, comprising:

collecting first data information transmitted at an observation point from a first node to a second node in a distributed network, wherein the second node is a neighbor node of the first node；

comparing the first data information with second data information transmitted at the observation point by the second node to its neighbor nodes in the distributed network； and

determining whether the first data information is to be used by the second node in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.
The method according to claim 1, wherein the comparison of the first data information with the second data information comprises:

calculating a similarity distance between the first data information and the second data information, wherein the similarity distance represents a distance between values of the first data information and the second data information.
The method according to claim 2, wherein it is determined that the first data information is to be used by the second node in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information not satisfying a first criterion.
The method according to claim 2, wherein it is determined that the first data information is not to be used by the second node in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information satisfying a first criterion.
The method according to claim 4, wherein the first data information is replaced with the second data information by the second node in the data fusion associated with the observation point.
The method according to any one of claims 1 to 5, further comprising:

updating first identification information for the first node, in response to determining that the first data information is not to be used by the second node in the data fusion associated with the observation point,

wherein the first identification information indicates, within an observation cycle comprising one or more observation points, a number of observation points at which data information from the first node is determined not to be used in data fusion by the second node.
The method according to claim 6, further comprising:

updating second identification information for the first node, in response to the first identification information at the end of the observation cycle satisfying a second criterion,

wherein the second identification information indicates a number of observation cycles at the end of which the first identification information for the first node satisfies the second criterion.
The method according to claim 7, further comprising:

triggering a predefined action for the first node, in response to the second identification information satisfying a third criterion.
The method according to claim 8, wherein the predefined action comprises raising an alarm.
An apparatus, comprising:

at least one processor； and

at least one memory comprising computer program code,

the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to:

collect first data information transmitted at an observation point from another apparatus to the apparatus in a distributed network, wherein the apparatus is a neighbor of the another apparatus；

compare the first data information with second data information transmitted at the observation point by the apparatus to its neighbors in the distributed network； and

determine whether the first data information is to be used by the apparatus in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.
The apparatus according to claim 10, wherein the comparison of the first data information with the second data information comprises:

calculating a similarity distance between the first data information and the second data information, wherein the similarity distance represents a distance between values of the first data information and the second data information.
The apparatus according to claim 11, wherein it is determined that the first data information is to be used by the apparatus in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information not satisfying a first criterion.
The apparatus according to claim 11, wherein it is determined that the first data information is not to be used by the apparatus in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information satisfying a first criterion.
The apparatus according to claim 13, wherein the first data information is replaced with the second data information by the apparatus in the data fusion associated with the observation point.
The apparatus according to any one of claims 10 to 14, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least further to:

update first identification information for the another apparatus, in response to determining that the first data information is not to be used by the apparatus in the data fusion associated with the observation point,

wherein the first identification information indicates, within an observation cycle comprising one or more observation points, a number of observation points at which data information from the another apparatus is determined not to be used in data fusion by the apparatus.
The apparatus according to claim 15, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least further to:

update second identification information for the another apparatus, in response to the first identification information at the end of the observation cycle satisfying a second criterion,

wherein the second identification information indicates a number of observation cycles at the end of which the first identification information for the another apparatus satisfies the second criterion.
The apparatus according to claim 16, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least further to:

trigger a predefined action for the another apparatus, in response to the second identification information satisfying a third criterion.
The apparatus according to claim 17, wherein the predefined action comprises raising an alarm.
A computer program product comprising a computer-readable medium bearing computer program codes embodied therein for use with a computer, the computer program codes comprising:

code for collecting first data information transmitted at an observation point from a first node to a second node in a distributed network, wherein the second node is a neighbor node of the first node；

code for comparing the first data information with second data information transmitted at the observation point by the second node to its neighbor nodes in the distributed network； and

code for determining whether the first data information is to be used by the second node in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.
The computer program product according to claim 19, wherein the comparison of the first data information with the second data information comprises:

calculating a similarity distance between the first data information and the second data information, wherein the similarity distance represents a distance between values of the first data information and the second data information.
The computer program product according to claim 20, wherein it is determined that the first data information is to be used by the second node in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information not satisfying a first criterion.
The computer program product according to claim 20, wherein it is determined that the first data information is not to be used by the second node in the data fusion associated with the observation point, in response to the similarity distance between the first data information and the second data information satisfying a first criterion.
The computer program product according to claim 22, wherein the first data information is replaced with the second data information by the second node in the data fusion associated with the observation point.
The computer program product according to any one of claims 19 to 23, wherein the computer program codes further comprise:

code for updating first identification information for the first node, in response to determining that the first data information is not to be used by the second node in the data fusion associated with the observation point,

wherein the first identification information indicates, within an observation cycle comprising one or more observation points, a number of observation points at which data information from the first node is determined not to be used in data fusion by the second node.
The computer program product according to claim 24, wherein the computer program codes further comprise:

code for updating second identification information for the first node, in response to the first identification information at the end of the observation cycle satisfying a second criterion,

wherein the second identification information indicates a number of observation cycles at the end of which the first identification information for the first node satisfies the second criterion.
The computer program product according to claim 25, wherein the computer program codes further comprise:

code for triggering a predefined action for the first node, in response to the second identification information satisfying a third criterion.
The computer program product according to claim 26, wherein the predefined action comprises raising an alarm.
An apparatus, comprising:

collection means for collecting first data information transmitted at an observation point from another apparatus to the apparatus in a distributed network, wherein the apparatus is a neighbor of the another apparatus；

comparison means for comparing the first data information with second data information transmitted at the observation point by the apparatus to its neighbors in the distributed network； and

determination means for determining whether the first data information is to be used by the apparatus in data fusion associated with the observation point, based at least in part on the comparison of the first data information with the second data information.