1. Introduction
To meet the explosive demand for higher user data rates, it is envisioned that the next-generation cellular systems will be equipped with massive antenna arrays. Capitalizing on a large number of antennas at the base station (BS), Beam Division Multiple Access (BDMA) has recently been proposed as a promising method for 5G communications [
1,
2,
3,
4]. Different beams are allowed to transmit multiple users’ data-streams from BS. In contrast to the more conventional multiple access schemes such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA) or Orthogonal Frequency Multiple Division Access (OFDMA) that multiplex users in code, time and frequency domains, BDMA separates users in the beam space by transmitting data to different users in orthogonal beam directions. BDMA was first proposed in [
1] to decompose the multi-user multiple-input multiple-output (MU-MIMO) system into multiple single-user MIMO channels by multiplexing multiple users’ data onto non-overlapping beams. Since beamforming is commonly implemented in the analog domain using low-cost phase shifters, BDMA becomes particularly attractive in practice in recent years. Moreover, joint beam selection and user scheduling were formulated under the Lyapunov-drift optimization framework before the optimal user-beam scheduling policy for BDMA was derived in a closed-form [
2]. However, the assumption of non-overlapping orthogonal beams is generally difficult to be satisfied in practice. As a result, analog-only BDMA applications are heavily handicapped by the non-orthogonal inter-user interference among beams.
In the meantime, digital precoding has been widely investigated as an effective signal processing method to suppress the inter-user interference for MU-MIMO. It is well known that the classical fully digital precoding requires a dedicated radio frequency (RF) chain for each antenna. However, power consumption and the high hardware cost render the fully digital precoding impractical for massive MIMO systems [
5,
6,
7]. To address this challenge, hybrid digital and analog beamforming has been proposed for massive MIMO transmissions by separating the precoding process into two steps, namely analog and digital precoding [
8,
9]. More specifically, the transmitted signals are first precoded digitally using a smaller number of RF chains followed by the analog precoder exploited by a much larger number of low-cost phase shifters [
10,
11]. As a result, the hybrid analog-digital precoding architecture requires significantly fewer RF chains as compared to the fully digital precoding [
12]. It has been reported in the literature that the hybrid beamforming structure is capable of achieving performance compared to the fully digital beamforming scheme if the number of RF chains at each end is greater than or equal to twice the number of the data-streams [
13]. Therefore, the hybrid precoded massive MU-MIMO system can benefit from the interference suppression supplied by the digital precoding while harvesting large antenna beamforming gains by implementing the massive antennas available in the systems [
14]. This hybrid structure is particularly attractive for millimeter wave (mmWave) MIMO systems to support the transmissions of Gbps-order data throughput by exploiting the vast vacant spectrum available at RF of 6 GHz or above [
15]. Furthermore, the notion of block diagonal (BD) precoding was first introduced to the conventional fully digital schemes to reduce the precoding complexity in [
16]. By dividing the inverse of a large matrix into the inverse of multiple much smaller matrices, the BD precoding can be efficiently exploited with only marginal or no performance degradation as compared to the fully digital precoding [
16]. In recent years, the BD design has been extended to the hybrid precoding for MU-MIMO [
17]. However, most existing hybrid BD precoding schemes were constructed based on a crucial assumption, i.e., the number of RF chains must be no less than the total number of data-streams to be transmitted. Some pioneering works have investigated to relax this limitation by implementing the state-of-the-art fast-speed phase shifters and switches that can change their states symbol by symbol [
18]. However, [
19] requires users to resume their symbols via the compressive sensing technique, which makes the scheme impractical for low-complexity receivers.
Meanwhile, power allocation is also an important problem in co-channel interference management for multi-user wireless networks. In many MIMO applications, it is desirable to design a system satisfying the quality of service (QoS) constraint for each user by adjusting the power allocated to different users [
20]. Since the objective function is highly non-convex, the problem is usually very difficult and complicated, especially for the coupled analog and digital precoding constraints [
21]. Therefore, most existing works maximize the sum-rate capacity by implementing the water-filling algorithm without considering the QoS requirement for each user. For instance, [
22] alternatively optimized the power allocation for sum-rate maximization by using the water-filling algorithm, assuming that the analog precoders are strictly orthogonal among distinct users. However, the water-filling algorithm cannot satisfy the per-user QoS constraint as users with poor channel conditions are not allocated any transmit power by the water-filling algorithm. To cope with this problem, the signal-to-interference-and-noise ratio (SINR)-balanced power allocation has been proposed to achieve identical SINR for all users [
23,
24,
25,
26]. However, the system performance using the SINR-based power allocation is limited by the user with the worst channel conditions.
In this work, we propose a downlink BDMA scheme empowered by BD digital precoding and global power allocation over multipath channels. Compared to the existing BDMA works [
1,
2], our proposed BDMA schemes can substantially suppress multi-user interference without requiring perfectly orthogonal beams as residual interference can be greatly removed by digital precoding. Furthermore, in sharp contrast to the conventional hybrid precoding schemes, the proposed scheme can use fewer RF chains than the number of transmit data-streams by exploiting the hybrid BD precoding architecture built upon the state-of-the-art fast-speed phase shifters [
27] and switches [
18]. Furthermore, an iterative algorithm for power allocation is proposed to satisfy per-user QoS requirement based on the difference of convex functions (D.C.) programming technique.
The main contributions of this paper are summarized as follows:
A block diagonal hybrid precoding scheme is proposed by exploiting the state-of-the-art fast-speed phase shifters and switches. The resulting scheme can use fewer RF chains than the number of transmit data-streams by jointly performing hybrid analog-digital precoding and user-beam grouping.
Furthermore, we develop a greedy grouping algorithm to minimize the inter-group interference while maximizing the intra-group interference. Then, the intra-group interference is eliminated by two proposed digital precoders, namely the SINR- and SLNR-based precoders.
In contrast to most works in the literature that used the single-path channel model, we analyze the sum-rate capacity using the multipath channel model.
Finally, for given analog and digital precoders, an optimized power allocation scheme is derived to satisfy per-user QoS requirement by using the D.C. programming technique.
The rest of this paper is organized as follows: In
Section 2, we present the block diagonal system model with reduced RF chains before formulating the optimization problem. By allocating the power uniformly to each user, the analog and digital precoders are derived in
Section 3. After that, the performance of the proposed system is analyzed in
Section 5. Finally, to satisfy the QoS constraint,
Section 6 proposes a QoS-aware power allocation algorithm based on the D.C. programming technique followed by extensive numerical results presented in
Section 7.
Notation: In this paper, we use uppercase boldface and lowercase boldface letters to denote matrices and vectors, respectively. denotes the identity matrix with size . and denote the transpose and conjugate transpose of , respectively. is the pseudo inverse of while stands for the L2 norm of and denotes the absolute value of A. denotes the i-th element of . is the cardinality of the enclosed set . represents the chi-square distribution with k degrees of freedom. is the inner product of and . stands for the addition of element x to set while removal of element x from . Finally, denotes the expectation of the enclosed random variable.
2. System Model and Problem Formulation
In this paper, we consider a MU-MIMO downlink system as shown in
Figure 1 in which
users are scheduled for service.
RF chains and
antennas are equipped on BS which transmits
data-streams to
receivers with
receive antennas at each time slot. In a practical MIMO system, the number of RF chains is typically much smaller than the number of antennas, i.e.,
. We assume that only one data stream is designated to each scheduled receiver for transmission. Denoted by
the
n-th block of
data to be transmitted,
has unit power with
. In the sequel, we can concentrate on a single block and omit the temporal index
n for notation simplicity.
2.1. Transmitter
In our proposed group-by-group BD digital precoding system shown in
Figure 1, the
users are first divided into
K groups with the group size being
, where
for
. It is clear that
. Accordingly, the data-streams
can be rewritten in groups as:
where
is the data vector transmitted to the users in the
k-th group and modeled as:
with
being the data transmitted to the
u-th user in the
k-th group for
.
Next, we focus on modeling the digital precoding process. Denoted by
of
the digital precoder for the
k-th group for
,
can be written as:
where
represents the digital precoding vector for the
u-th user in the
k-th group. Thus, the overall digital precoding matrix can be expressed as a block diagonal matrix as follows:
Clearly, inverting a BD matrix is less computationally expensive than a non-BD matrix of the same dimension. Therefore, the BD structure of
in Equation (
4) can potentially lead to reduced computational complexity.
Similarly, we model the corresponding analog precoder in groups as
where
of
, the analog precoder for the
k-th group for
, is given by:
with
being the analog beamforming vector for the
u-th user in the
k-th group.
Finally, the resulting hybrid precoded signal
is transmitted to all
users.
2.2. Channel Models
We denote
the MIMO multipath channel matrix between the transmitter and the
u-th receiver in the
k-th group using the Saleh-Valenzuela model [
8]:
where
is the total number of multipath components between the transmitter and the
u-th user in the
k-th group. Furthermore,
,
and
are the complex path gain, angles of arrival (AoA) and azimuth/elevation angles of departure (AoD) of the
l-th path of the
u-th user in the
k-th group, respectively. Furthermore,
is the array response vector. Finally,
is a constant parameter. For a uniform planar array (UPA) of size
considered in this work, the array response vector is given by [
8]:
where
is the wavenumber and
d is the distance between two adjacent antennas.
2.3. Receiver
Consequently, we formulate the receiver structure of the
u-th user in the
k-th group. The received signal is represented by
where
is the complex additive white Gaussian noise with zero mean and variance equal to
.
Assuming that the receivers are all low-cost terminals that perform analog beamforming only in decoding, the decoded signal denoted by
is given by:
where
of length
is the analog beamforming vector employed by the receiver with the power constraint of
and
Please note that the first term in Equation (
11) stands for the desired signal while the second term the sum of inter- and intra-group interference as well as receiver thermal noise.
2.4. Group-By-Group Hybrid Precoding
For notational simplicity, we denote by
the effective analog beamforming gain vector observed by the
u-th user in the
k-th group from the
j-th group for
.
Let
be the transmitted power vector, where the power allocated to the
u-th user in the
k-th group is denoted by
with
Then, the resulting channel capacity can be computed as
with
and
is the noise power.
Subsequently, the system sum-rate capacity can be computed as a function of
,
,
and
:
It is worth noting that the digital beamforming vectors can be designed to eliminate user interference for the conventional hybrid beamforming with sufficient RF chains, i.e.,
In contrast, it can only achieve interference-free asymptotically as
grows a very large number since the proposed BD precoding scheme requires fewer RF chains, i.e.,
. Thus, the capacity of the proposed BD precoding scheme is constrained by the residual inter- and intra-group interference in the system. Given
K groups, we can derive the optimal analog and block digital precoding matrices by
where
and
in
,
and
.
In problem , and confine the analog beamforming vectors to the phase-only structure in transmitter and receiver while ensures that each precoded signal is of unit power. Furthermore, and define the analog and digital precoder, respectively. constrains the maximum number of data-streams in each group to be within the number of RF chains. Finally, defines the downlink transmitted power constraint while guarantees the minimal data rate for each user.
The problem is challenging due to its non-convex and combinatorial nature. Thus, it is analytically intractable to derive a closed-form optimal solution. Instead, we consider a two-stage suboptimal solution: In the first stage, we focus on the analog and digital precoder design while assuming uniform power allocation; After fixing the analog and digital precoders, we derive the QoS-aware optimal power allocation in the second stage.
4. User Grouping Algorithm
Since the intra-group interference is eliminated by the digital precoding, we will focus on using the user grouping to maximize the intra-group interference while minimizing the inter-group interference. More specifically, we propose to group users into K groups with minimal inter-group interference. Since the total number of possible group combinations is large, a greedy grouping algorithm is proposed in Algorithm 1.
Algorithm 1 Greedy User Grouping Algorithm |
Input:: the universal group and user index set; : Array response vector of index x; : the user index set for the k-th group; : group index; Initialize with being the user index of the largest channel gain and ; Procedures:- 1:
while is not empty do - 2:
Stage 1: - 3:
Solve the optimal analog precoder by Equation ( 21) - 4:
Let be the analog precoders of grouped users - 5:
for x in do - 6:
Compute - 7:
end for - 8:
Find the user index with maximum - 9:
Update , and - 10:
Stage 2: - 11:
if then - 12:
Update - 13:
for x in do - 14:
Compute - 15:
end for - 16:
Find the user index with minimum - 17:
Update , and - 18:
end if - 19:
end while
|
In this algorithm, for , we first group users who cause most interference to each other into Group 1 detailed in Stage 1. This selection is motivated by the observation that most interference can be eliminated by the digital precoder applied among each group. When the size of Group 1 reaches the number of RF chains, the user whose array response vector is most orthogonal to Group 1 is selected as the first member of Group 2 as shown in Stage 2, which is designed to minimize the inter-group interference. This process repeats until all users are assigned to different groups.
It is worth noting that the grouping problem is NP-hard. The greedy algorithm is proposed to find a suboptimal partition with complexity .
5. Performance Analysis
We first investigate the capacity for the conventional analog-only BDMA scheme.
where
is the received interference represented as
Proposition 1. If the optimal analog beamformers are designed as and , respectively, the following approximation holds: From Proposition 1, Equation (
31) can be rewritten as
where
and
Capitalizing on the Extreme Value Theory [
30,
31], we can derive the cumulative distribution function (CDF) of
Z as
where
. The detailed derivation is shown in
Appendix C.
The residual interference of distinct beams is negligible as compared to the desired signal. Thus,
Y can be upper bounded by
where
with
T being the expected residual interference power between distinct beams. In our proposed system, the beams will be selected and grouped to reduce the residual interference. Clearly,
if the number of antennas goes to infinity or the steering vectors of different users are strictly orthogonal. In contrast,
if different users have same AoDs. The value of
T can be numerically derived.
Finally, the CDF of the SINR lower bound can be given by
The detailed derivation can be found in
Appendix D.
Using the CDF above, the lower and upper bounds of the sum-rate capacity can be derived as
It is analytically intractable to obtain a closed-form solution to Equation (
44). We will show the numerical results in simulation section.
Please note that the upper bound is achieved if the interference from other users can be eliminated. Furthermore, since the number of transmitter antennas is finite in practice, the analog beamforming vectors shown in Equations (
21) and (
23) inevitably incur residual inter-user interference. Therefore, digital precoders are required to further suppress the residual interference.
6. Proposed QoS-Aware Power Allocation Algorithm Based on D.C. Programming
For given analog and digital precoders, we investigate the QoS-aware power allocation in by using the D.C. programming technique in this section.
We begin with reformulating
as
Following the procedures in [
32], the problem above can be cast as a D.C. programming problem:
where
For given analog and digital precoders, both
and
are concave in
, i.e., Equation (
46) is a D.C. function. Starting from a feasible
, the optimal
at the
n-th iteration is generated as the optimal solution of a convex problem:
which can be efficiently solved by any existing convex programming software, such as CVX [
33]. The computational complexity of Equation (
47) is
in each iteration [
32].
As
is concave, its gradient
is also super-gradient:
The proof is given in
Appendix E.
Finally, since
is the solution to Equation (
47), it follows that
Therefore, the -th solution is always better than the previous one. The iterative process terminates after is achieved with a pre-defined threshold .
7. Simulation Results
In this section, we will use computer simulation to evaluate the sum-rate performance of the proposed block diagonal digital precoding schemes. Unless specified otherwise, we consider a transmitter equipped with a UPA (i.e., ) and users each equipped with an UPA (i.e., ). The number of paths is set to and the additive Gaussian noise power dBm for each user. We consider the azimuth AoA/AoD’s uniformly distributed over while the elevation AoA/AoD’s uniformly distributed over , respectively. For each computer experiment, we compute the average over 100 realizations.
In
Figure 2, we first set
, i.e., no grouping. As a result, 16 RF chains are required to support 16 data-streams. As shown in
Figure 2, BZF slightly outperforms BSM as it can eliminate more multi-user interference even in multipath environment. It is observed that even in the high SNR regime, BDMA suffers from inter-beam interference and has poor performance.
Next, we evaluate the two proposed BD precoding schemes with
groups and 8 RF chains. The 16 users are grouped into
groups. The curves labeled as “BZF” and “BSM” stands for the proposed BD precoding schemes where only 8 RF chains are used to transmit 16 data-streams. It is observed that BZF and BSM have comparable performance. Furthermore, the curve labeled as “Conventional Hybrid BF (8 RF Chains)” is the sum-rate for the conventional hybrid beamforming system with 8 RF chains serving 8 users. Finally, BDMA is the analog-only precoding system that has the worst performance. Inspection of
Figure 3 reveals that the proposed BZF and BSM have much better sum-rate performance than the conventional hybrid precoding algorithm over the SNR range
dB. When the SNR is larger than 12 dB, the system becomes interference-limited. Thus, the performance of BZF and BSM tends to saturate beyond this point.
In
Figure 4, we investigate the sum-rate capacity improvement as a function of the number of RF chains while fixing the SNR at 5 dB. The upper bound is the conventional ZF precoding system with 16 RF chains for 16 users. Interestingly, the performance improvement generated by an additional RF chain increases only marginally as the number of RF chains grows from eight to 14.
Next, we vary the number of groups while fixing the total number of users at
and SNR
dB.
Figure 5 shows that BZF and BSM are lower bounded by BDMA and upper bounded by the conventional ZF system with 16 RF chains. When
, the system degenerates back to the conventional ZF system with
. On the other hand, if
, the system becomes the conventional BDMA.
We then investigate the sum-rate performance as the number of transmit antennas increases.
Figure 6 shows that the capacity of BZF and BSM has been significantly increased as the number of transmit antennas increases. This is because that the inter-group interference is asymptotically removed as indicated in Equation (
20).
Finally, we evaluate the performance of the power allocation generated with the D.C. programming technique. We assume that the minimum QoS threshold for each user is set to 3 bps/Hz.
Figure 7 and
Figure 8 show the performance achieved by our proposed QoS-aware power allocation algorithm. The curve labeled as “Water-filling Power Allocation” is obtained by allocating user power via the water-filling algorithm without taking into account the QoS requirement. The curve labeled as “QoS-Aware Power Allocation” shows the performance of the proposed power allocation algorithm. Compared to the curve labeled as “Uniform Power Allocation”, the proposed algorithm has demonstrated significant advantages in terms of the sum-rate capacity. Furthermore,
Figure 8 depicts the CDF of the user data rate. It is evident that all users served by the QoS-aware power allocation satisfy the minimum QoS requirement (i.e., 3 bps/Hz). In contrast, the water-filling-based power algorithm suffers from an outage rate of about 20% where outage is defined as the user data rate being below the minimum required data rate.