
Online estimation and community detection of network point processes for event streams

  • Original Paper
  • Published in Statistics and Computing

Abstract

A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, the true connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the dynamic temporal component. A natural way to account for these temporal dynamics of interactions is to use point processes as the foundation of network models for community detection. However, computational complexity hampers the scalability of such approaches to large sparse networks. To circumvent this challenge, we propose a fast online variational inference algorithm for estimating the latent structure underlying dynamic event arrivals on a network, using continuous-time point process latent network models. We describe this procedure for network models capturing community structure, which can be learned as new events are observed on the network, updating the inferred community assignments. We investigate the theoretical properties of such an inference scheme, and provide regret bounds on the loss function of this procedure. The proposed inference procedure is then thoroughly compared, using both simulation studies and real data, to non-online variants. We demonstrate that online inference can obtain comparable performance, in terms of community recovery, to non-online variants, while realising computational gains. Our proposed inference framework can also be readily modified to incorporate other popular network structures.



Notes

  1. Here \(z_{-i}\) is a sub-vector of z with the ith entry removed.

  2. The algorithm for the non-homogeneous Poisson process is similarly constructed.

  3. The corresponding algorithm for the non-homogeneous Hawkes process is similarly constructed.

  4. See Appendix C.

References

  • Alquier, P., Ridgway, J.: Concentration of tempered posteriors and of their variational approximations. Ann. Stat. 48(3), 1475–1497 (2020)

  • Amini, A.A., Chen, A., Bickel, P.J., Levina, E.: Pseudo-likelihood methods for community detection in large sparse networks. Ann. Stat. 41(4), 2097–2122 (2013)

  • Bhattacharjee, M., Banerjee, M., Michailidis, G.: Change point estimation in a dynamic stochastic block model. J. Mach. Learn. Res. 51, 4330–4388 (2020)

  • Bifet, A., Frank, E.: Sentiment knowledge discovery in Twitter streaming data. In: International Conference on Discovery Science, pp. 1–15 (2010)

  • Blei, D.M., Kucukelbir, A., McAuliffe, J.D.: Variational inference: a review for statisticians. J. Am. Stat. Assoc. 112(518), 859–877 (2017)

  • Broderick, T., Boyd, N., Wibisono, A., Wilson, A.C., Jordan, M.I.: Streaming variational Bayes. Adv. Neural Inf. Process. Syst. 26, 32 (2013)

  • Celisse, A., Daudin, J.-J., Pierre, L.: Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electron. J. Stat. 6, 1847–1899 (2012)

  • Cheng, Y., Church, G.M.: Biclustering of expression data. In: ISMB, vol. 8, pp. 93–103 (2000)

  • Chérief-Abdellatif, B.-E., Alquier, P., Khan, M.E.: A generalization bound for online variational inference. In: Asian Conference on Machine Learning, pp. 662–677 (2019)

  • Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes: Elementary Theory and Methods. Springer, Berlin (2003)

  • Fortunato, S., Hric, D.: Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016)

  • Gao, C., Ma, Z., Zhang, A.Y., Zhou, H.H.: Achieving optimal misclassification proportion in stochastic block models. J. Mach. Learn. Res. 18(1), 1980–2024 (2017)

  • George, T., Merugu, S.: A scalable collaborative filtering framework based on co-clustering. In: Fifth IEEE International Conference on Data Mining (ICDM'05), p. 4 (2005)

  • Gilks, W.R., Richardson, S., Spiegelhalter, D.: Markov Chain Monte Carlo in Practice. Chapman and Hall/CRC, Boca Raton (1995)

  • Hawkes, A.G.: Hawkes processes and their applications to finance: a review. Quant. Finance 18(2), 193–198 (2018)

  • Hawkes, A.G., Oakes, D.: A cluster process representation of a self-exciting process. J. Appl. Probab. 11(3), 493–503 (1974)

  • Heard, N.A., Weston, D.J., Platanioti, K., Hand, D.J.: Bayesian anomaly detection methods for social networks. Ann. Appl. Stat. 4(2), 645–662 (2010)

  • Hoffman, M., Bach, F.R., Blei, D.M.: Online learning for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, pp. 856–864 (2010)

  • Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14(1), 1303–1347 (2013)

  • Huang, Z., Soliman, H., Paul, S., Xu, K.S.: A mutually exciting latent space Hawkes process model for continuous-time networks. In: Uncertainty in Artificial Intelligence, pp. 863–873 (2022)

  • Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)

  • Lee, J., Li, G., Wilson, J.D.: Varying-coefficient models for dynamic networks (2017). arXiv preprint arXiv:1702.03632

  • Lee, W., McCormick, T.H., Neil, J., Sodja, C., Cui, Y.: Anomaly detection in large-scale networks with latent space models. Technometrics 64(2), 241–252 (2022)

  • Lehmann, E.L., Casella, G.: Theory of Point Estimation. Springer, New York (2006)

  • Leifeld, P., Cranmer, S.J., Desmarais, B.A.: Temporal exponential random graph models with btergm: estimation and bootstrap confidence intervals. J. Stat. Softw. 83, 6 (2018)

  • Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection (2014)

  • Matias, C., Miele, V.: Statistical clustering of temporal networks through a dynamic stochastic block model. J. R. Stat. Soc. Ser. B Stat. Methodol. 79(4), 1119–1141 (2017)

  • Matias, C., Rebafka, T., Villers, F.: A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika 105(3), 665–680 (2018)

  • Miscouridou, X., Caron, F., Teh, Y.W.: Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data. Adv. Neural Inf. Process. Syst. 31, 2343–2352 (2018)

  • Moreira-Matias, L., Gama, J., Ferreira, M., Mendes-Moreira, J., Damas, L.: Predicting taxi-passenger demand using streaming data. IEEE Trans. Intell. Transp. Syst. 14(3), 1393–1402 (2013)

  • Mukherjee, S.S., Sarkar, P., Wang, Y.R., Yan, B.: Mean field for the stochastic blockmodel: optimization landscape and convergence issues. Adv. Neural Inf. Process. Syst. 31, 10694–10704 (2018)

  • Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. Adv. Neural Inf. Process. Syst. 14, 52 (2001)

  • Nowicki, K., Snijders, T.A.B.: Estimation and prediction for stochastic blockstructures. J. Am. Stat. Assoc. 96, 1077–1087 (2001)

  • Ogata, Y.: Statistical models for earthquake occurrences and residual analysis for point processes. J. Am. Stat. Assoc. 83(401), 9–27 (1988)

  • Pensky, M., Zhang, T.: Spectral clustering in the dynamic stochastic block model. Electron. J. Stat. 13(1), 678–709 (2019)

  • Pontes, B., Giráldez, R., Aguilar-Ruiz, J.S.: Biclustering on expression data: a review. J. Biomed. Inform. 57, 163–180 (2015)

  • Rizoiu, M.-A., Lee, Y., Mishra, S., Xie, L.: A tutorial on Hawkes processes for events in social media (2017). arXiv preprint arXiv:1708.06401

  • Rossetti, G., Cazabet, R.: Community discovery in dynamic networks: a survey. ACM Comput. Surv. 51(2), 1–37 (2018)

  • Sengupta, S., Chen, Y.: A block model for node popularity in networks with community structure. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 80(2), 365–386 (2018)

  • Sewell, D.K., Chen, Y.: Latent space models for dynamic networks. J. Am. Stat. Assoc. 110(512), 1646–1657 (2015)

  • Shalev-Shwartz, S.: Online learning and online convex optimization. Found. Trends Mach. Learn. 4(2), 107–194 (2012)

  • Wang, Y., Feng, C., Guo, C., Chu, Y., Hwang, J.-N.: Solving the sparsity problem in recommendations via cross-domain item embedding based on co-clustering. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 717–725 (2019)

  • Xu, H., Fang, G., Zhu, X.: Network group Hawkes process model (2020). arXiv preprint arXiv:2002.08521

  • Yang, Y., Etesami, J., He, N., Kiyavash, N.: Online learning for multivariate Hawkes processes. Adv. Neural Inf. Process. Syst. 85, 4937–4946 (2017)

  • Zhao, Y., Levina, E., Zhu, J.: Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Stat. 40(4), 2266–2292 (2012)


Author information

Contributions

All authors were involved in writing the main manuscript and designing the experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Owen G. Ward.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Algorithm details

We include Algorithm 5 for the online Hawkes process as mentioned in the main text, along with Algorithm 6, which is a key step for storing useful information in this procedure. Some supporting functions used in Algorithm 5 are given below.

  • \(a+=b\) represents \(a = a+b\); \(a-=b\) represents \(a = a - b\).

  • Formula for impact(t) is \(\sum _{t_1 \in timevec} \lambda \exp \{- \lambda (t - t_1)\}\).

  • Formula for \(I_1\) is \(\sum _{t_1 \in timevec} \exp \{- \lambda (t - t_1)\}\).

  • Formula for \(I_2\) is \(\sum _{t_1 \in timevec} (t - t_1) \lambda \exp \{- \lambda (t - t_1)\}\).

  • Formula for \(integral(t,t_{end},\lambda )\) is \( 1 - \exp \{-\lambda (t_{end} - t)\}\).

  • Formula for \(integral(t,t_{start},t_{end},\lambda )\) is \( \exp \{-\lambda (t_{start} - t)\} - \exp \{-\lambda (t_{end} - t)\}\).
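To make these supporting functions concrete, the following is a minimal Python sketch of them for the exponential kernel; the function names mirror the list above, while the signatures are our own illustrative choices rather than those of the paper's implementation.

```python
import math

def impact(t, timevec, lam):
    # sum_{t1 in timevec} lam * exp(-lam * (t - t1)):
    # total excitation at time t from the stored past events.
    return sum(lam * math.exp(-lam * (t - t1)) for t1 in timevec)

def I1(t, timevec, lam):
    # sum_{t1 in timevec} exp(-lam * (t - t1))
    return sum(math.exp(-lam * (t - t1)) for t1 in timevec)

def I2(t, timevec, lam):
    # sum_{t1 in timevec} (t - t1) * lam * exp(-lam * (t - t1))
    return sum((t - t1) * lam * math.exp(-lam * (t - t1)) for t1 in timevec)

def integral(t, t_end, lam):
    # integral of the exponential kernel from t to t_end:
    # 1 - exp(-lam * (t_end - t))
    return 1.0 - math.exp(-lam * (t_end - t))

def integral_between(t, t_start, t_end, lam):
    # integral of the exponential kernel over [t_start, t_end]:
    # exp(-lam * (t_start - t)) - exp(-lam * (t_end - t))
    return math.exp(-lam * (t_start - t)) - math.exp(-lam * (t_end - t))
```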

Table 3 The data structure for storing history events. The left diagram shows the structure under the Poisson model, where the key is the pair of nodes and the value is the corresponding cumulative number of all past events. The right diagram shows the structure under the Hawkes model, where the key is again the pair of nodes and the value is the corresponding sequence of event times between \(t_{current} - R\) and \(t_{current}\), stored in a queue structure

As discussed in the main paper, we only need to store the sufficient statistics of the particular model in each setting. We show two examples in Table 3. In the homogeneous Poisson setting, we only need to store the cumulative counts for each pair of sender and receiver (\(l_{user1, user2}\)). In the Hawkes setting, we only need to store the recent historical events, since older information decays exponentially fast and thus has a vanishing impact on the current event.
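To illustrate this storage scheme, here is a minimal Python sketch of both structures, together with a trimming step in the spirit of Algorithm 6. The class and method names are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict, deque

class PoissonHistory:
    """Sufficient statistics for the homogeneous Poisson model:
    a cumulative event count per (sender, receiver) pair."""
    def __init__(self):
        self.counts = defaultdict(int)

    def add_event(self, sender, receiver):
        self.counts[(sender, receiver)] += 1

class HawkesHistory:
    """Recent event times per (sender, receiver) pair, kept in a queue;
    events older than t_current - R are dropped, since their
    exponentially decaying influence on the current event is negligible."""
    def __init__(self, R):
        self.R = R
        self.times = defaultdict(deque)

    def add_event(self, sender, receiver, t):
        self.times[(sender, receiver)].append(t)

    def trim(self, t_current):
        # Drop events outside the retention window (cf. Algorithm 6).
        for q in self.times.values():
            while q and q[0] < t_current - self.R:
                q.popleft()
```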

Appendix B: Additional simulation results and details

Here we provide additional details regarding the simulations in the main text, and we include additional simulation experiments which were omitted from the main manuscript. Alongside this, we provide the results of experiments in the Hawkes process setting, similar to those considered in the main text for the inhomogeneous Poisson model. The code used to create all results in this work is available at https://github.com/OwenWard/OCD_Events. We discuss the regret properties of our online procedure, demonstrate community recovery and other properties of our online inference procedure for Hawkes process models, and expand on some components of the inference procedure discussed in Sect. 5.1.

Algorithm 5: Online-Hawkes

Algorithm 6: Trim

Illustrative simulation in Sect. 1. We first provide the exact simulation settings for the illustrative example in the introduction. We consider a network of \(n=100\) nodes with 2 communities, where \(40\%\) of nodes are in the first community. In particular, we consider an intensity function from nodes in group \(k_1\) to nodes in group \(k_2\) of the form

$$\begin{aligned} a_{k_{1}k_{2}}(t) = a_{k_1 k_2}^{1} \mathbb {1}\{0 \le t< T/3\} + a_{k_1 k_2}^{2} \mathbb {1}\{T/3 \le t< 2T/3\} + a_{k_1 k_2}^{3} \mathbb {1}\{2T/3 \le t < T\}, \end{aligned}$$

where the coefficient vectors \(\varvec{a_{k_1 k_2}}=(a^1_{k_1 k_2},a^2_{k_1 k_2}, a^3_{k_1 k_2})\) for each block pair are of the form

$$\begin{aligned} \begin{pmatrix} \varvec{a_{11}}\\ \varvec{a_{12}} \\ \varvec{a_{21}} \\ \varvec{a_{22}} \end{pmatrix}= \begin{pmatrix} 0.25 &{} 0.5 &{} 1\\ 0.75 &{} 1 &{} 1 \\ 0.5 &{} 0.25 &{} 1 \\ 1 &{} 0.75 &{} 1 \end{pmatrix}. \end{aligned}$$

This leads to a dense network, where we observe events between each node pair for the specified choice of \(T=100\). To identify the community structure, classical network models require a single adjacency matrix encoding the relationship between each node pair. The simplest way to form such a matrix is to consider the count matrix with \(A_{ij}=N_{ij}(T)\), the number of events observed between a given node pair. Spectral clustering of this adjacency matrix then gives estimated community memberships for each node. However, for this choice of conditional intensity function the event counts do not preserve the underlying community structure.
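For reference, the following Python sketch makes this baseline concrete: it forms the count matrix \(A_{ij}=N_{ij}(T)\) from a stream of events and applies spectral clustering. This is a minimal sketch of the aggregate-and-cluster approach described above, not the paper's code; scikit-learn's SpectralClustering is one standard choice for the clustering step.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def aggregate_counts(events, n):
    # events: iterable of (sender, receiver, time) triples observed up to T
    A = np.zeros((n, n))
    for i, j, _ in events:
        A[i, j] += 1
    return A

def cluster_counts(events, n, K):
    A = aggregate_counts(events, n)
    # Symmetrize the counts, since spectral clustering expects an affinity.
    affinity = A + A.T
    sc = SpectralClustering(n_clusters=K, affinity="precomputed",
                            random_state=0)
    return sc.fit_predict(affinity)
```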

Rather than aggregating these data to form a single adjacency matrix, an alternative approach to clustering networks incorporates dynamics through a sequence of adjacency matrices observed at (equally spaced) time intervals. To apply such a method to this event data, the challenge is how to form the adjacency matrices. An observation window must be chosen, with an edge between two nodes present in the adjacency matrix for a given window if there is one (or more) event between them in that window.
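A minimal sketch of this binning step, assuming equally spaced windows of width dT (the function name is our own):

```python
import numpy as np

def adjacency_snapshots(events, n, T, dT):
    # One binary adjacency matrix per window [m * dT, (m + 1) * dT).
    M = int(np.ceil(T / dT))
    snaps = np.zeros((M, n, n), dtype=int)
    for i, j, t in events:
        snaps[min(int(t // dT), M - 1), i, j] = 1
    return snaps
```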

Simulation settings We first expand on the simulation settings used in Sect. 5.1. Unless otherwise specified, we consider a network of \(n=200\) nodes and \(K=2\) equally sized communities. The block inhomogeneous Poisson process model is given by

$$\begin{aligned} \lambda _{ij}(t) = \sum _{h=1}^{H=2}a_{z_i z_j}(h)f_h(t). \end{aligned}$$

Here we consider \(f_h(t)\) to be step functions of common fixed length. We consider the following coefficients for these basis functions

$$\begin{aligned} \begin{pmatrix} \varvec{a_{11}}\\ \varvec{a_{12}} \\ \varvec{a_{21}} \\ \varvec{a_{22}} \end{pmatrix}= \begin{pmatrix} 0.8 &{} 0.4 \\ 0.6 &{} 0.2 \\ 0.2 &{} 0.7 \\ 0.4 &{} 0.7 \end{pmatrix}. \end{aligned}$$

When we vary K, we consider different coefficient matrices, multiplied by a constant which varies with the community. Full details of the parameter choices are given in the associated code repository.
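Since the basis functions are step functions, each pair's event count within a constant-rate piece is Poisson, so the process can be simulated piece by piece. The following is a minimal sketch under these assumptions (event times uniform within each piece); it is illustrative rather than the repository's simulation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_block_poisson(z, a, breaks, T):
    # z: community labels; a[k1, k2, h]: rate of piece h for block (k1, k2);
    # breaks: right endpoints of the step-function pieces, ending at T.
    n = len(z)
    edges = np.asarray(breaks, dtype=float)
    lefts = np.concatenate(([0.0], edges[:-1]))
    events = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for h, (lo, hi) in enumerate(zip(lefts, edges)):
                rate = a[z[i], z[j], h]
                # Poisson count in this piece, with uniform event times.
                for t in rng.uniform(lo, hi, rng.poisson(rate * (hi - lo))):
                    events.append((i, j, t))
    return sorted(events, key=lambda e: e[2])
```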

Window size Throughout the simulation studies in Sect. 5.1 we have used a fixed window size such that \(dT=1\). Here we wish to investigate the effect that varying this window size has on the performance of our algorithm. We compare community recovery for varying window sizes from \(dT=0.25\) to \(dT=5\), keeping all other parameters fixed as in the default simulation setting. We show boxplots of the community recovery performance, in terms of ARI, in Fig. 7. We see that in this scenario community recovery is possible for all window sizes considered. For each choice of dT almost all simulations correctly recover the communities. We note that the choice of an appropriate dT will depend somewhat on the data considered. For sufficiently small dT we might not observe any events in a given window, which would not provide any update of the model parameters. Similarly, a dT leading to a single event in each window would mirror standard stochastic variational inference (Hoffman et al. 2013). It appears that dT should be small enough to avoid getting stuck in local optima of the current estimate of the ELBO, while being large enough to ensure events have been observed.

We also wish to investigate the role of dT in the computation time required for this procedure. In Fig. 8 we show the computation time of our inference procedure (in seconds) as we vary the window size dT. The window size does not appear to be clearly related to the computation time; however, we recommend against extremely small values of dT, which may lead to many windows in which no events occur.

Fig. 7

Community recovery under block inhomogeneous Poisson simulated data for varying window size

Fig. 8

Computation time for a fixed network setting as we vary dT, across 50 replications

Regret rate Quantifying the regret of an online estimation scheme is an important tool in the analysis of such a procedure. We can investigate the empirical performance of the regret of our method in simulation settings, mirroring the theoretical results. To do this, we need to consider a loss function. We define the loss function over the m-th time window as the negative normalized log-likelihood, i.e.

$$\begin{aligned} {\tilde{l}}_m(\theta \vert z) = - \frac{1}{ \vert A \vert }\sum _{(i,j)\in A} \left\{ \int _{(m-1)dT}^{m\, dT} \log \lambda _{ij}(t \vert z)\, dN_{ij}(t) - \int _{(m-1)dT}^{m\, dT} \lambda _{ij}(t \vert z)\, dt \right\} , \end{aligned}$$

and define the regret as

$$\begin{aligned} \text {Regret}(T) = \inf _{\theta ^{(m)}\in \Pi (\theta )}\left\{ \sum _{m=1}^M {\tilde{l}}_m(\theta ^{(m)} \vert z^{*}) \right\} - \sum _{m=1}^M {\tilde{l}}_m(\theta ^{*} \vert z^{*}), \end{aligned}$$

where \(M = T/dT\). This regret function quantifies the gap of the conditional likelihood, given the true latent membership \(z^{*}\), between the online estimator and the true optimal value. We note that this regret function is conditional on the true latent assignment being known, and as such, we need to account for possible permutations of the inferred parameters, which is done using \(\Pi (\theta )\). While this regret quantity may be of theoretical interest, in practice it may be more appropriate to instead look at the empirical regret, using the estimated latent community memberships. We shall define this as

$$\begin{aligned} \text {Regret}_{EMP}(T) = \sum _{m=1}^M {\tilde{l}}_m(\theta ^{(m)} \vert z^{(m)}) - \sum _{m=1}^M {\tilde{l}}_m(\theta ^{*} \vert z^{*}), \end{aligned}$$

measuring the cumulative difference between the estimated and true log likelihood, as we learn both z and \(\theta \) over time.
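For the block Poisson models used here, the windowed loss has a closed form, so the empirical regret can be accumulated window by window. A minimal sketch for the homogeneous case, where lam[z_i, z_j] is the constant block rate and pairs plays the role of the edge set A (the function names and signatures are illustrative assumptions):

```python
import numpy as np

def window_loss(events_m, z, lam, pairs, dT):
    # Negative normalized log-likelihood over one window of width dT;
    # events_m holds the (i, j, t) events falling in this window.
    loglik = sum(np.log(lam[z[i], z[j]]) for i, j, _ in events_m)
    loglik -= sum(lam[z[i], z[j]] * dT for i, j in pairs)
    return -loglik / len(pairs)

def empirical_regret(windows, z_hats, lam_hats, z_star, lam_star, pairs, dT):
    # Cumulative gap between online estimates and the truth (Regret_EMP).
    return sum(window_loss(ev, z_m, lam_m, pairs, dT)
               - window_loss(ev, z_star, lam_star, pairs, dT)
               for ev, z_m, lam_m in zip(windows, z_hats, lam_hats))
```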

Given these two regret definitions, we can simulate networks as in the default setting and compute the empirical regret for a fixed network, varying the time over which events are observed from \(T=50\) to \(T=200\). This is shown in Fig. 9. Here we compute each quantity across 100 simulations, showing loess smoothed estimates and their standard errors over time. We see that as we observe these networks for a longer period this regret grows slowly.

Fig. 9

Smoothed regret estimates as a function of time for a fixed network structure observed for varying lengths of time

B.1 Hawkes models

Hawkes community recovery The main experiments in Sect. 5.1 demonstrate the performance of our online learning algorithm where the intensity function follows an inhomogeneous Poisson process. Here we demonstrate the performance of this procedure for Hawkes block models as well. We consider the same defaults in terms of network and community size and observation time. The specific parameter settings are given in the associated code repository, in each case considering 50 simulations. In Fig. 10 we first investigate the performance of our procedure for community recovery as we increase the number of nodes. As the number of nodes increases, we can more consistently recover the true community structure.

Fig. 10

Community recovery under the Hawkes model, in terms of ARI, as we increase the number of nodes

Hawkes, varying number of communities We can also investigate the performance of our procedure for community recovery under the Hawkes model as we increase K, the number of communities. In Fig. 11 we show the performance as we consider more communities for a fixed number of nodes. As K increases, we are less able to recover the true community structure, as seen in the decreasing ARI.

Fig. 11

Community recovery for the Hawkes model as we vary the number of communities

Hawkes, online community recovery In Fig. 12 we illustrate how the community structure is learned as events are observed on the network, varying the number of nodes. Community recovery is harder than in the Poisson setting, but the average ARI increases quickly in time, before stabilising. There is considerably more uncertainty in this estimate than in the Poisson setting.

Fig. 12

Online community recovery under the Hawkes model as the number of nodes increases, for a fixed observation period

Hawkes, parameter recovery We can also look at how we recover the true parameters of our Hawkes process as events are observed over the network, as was considered previously for the Poisson process model. Here we measure the recovery of both the baseline rate matrix M and the excitation parameter B, along with the scalar decay parameter \(\lambda \). Figure 13 shows the GAM smoothed difference between the true and estimated parameters across 50 simulations, with standard errors shown. For each of these three parameters, the difference between the true and estimated values decreases as we observe events over a longer time period.

Fig. 13

Parameter Recovery for block Hawkes model as the observation time increases for a fixed network size

Appendix C: Technical conditions

In this section, we provide the details of the theoretical analysis of our proposed algorithm under the dense event setting, that is, the integral of the intensity function \(\lambda _{ij}(t)\) over a window of length dT is \(\Theta (1)\).

In contrast to the analysis of standard online algorithms, the key difficulties in our setting are that (1) the model we consider is a latent class network model with complicated dynamics, and (2) the proposed algorithm involves approximation steps. Before the proof of the main results, we first introduce some notation and definitions. In the following, we use \(c_0 - c_3\), C, and \(\delta \) to denote constants which may vary from place to place. \(\theta ^{*}\) and \(z^{*}\) represent the true parameter and latent class membership, respectively.

  1. C0

    [Window size] Assume time window dT is some fixed constant which is determined a priori.

  2. C1

    [Expectation] Define the normalized log likelihood over a single time window,

$$\begin{aligned} l_{w}(\theta \vert z) = \frac{1}{ \vert A \vert } \sum _{(i,j) \in A} \left\{ \int _{0}^{dT} \log \lambda _{ij}(t \vert z)\, dN_{ij}(t) - \int _{0}^{dT} \lambda _{ij}(t \vert z)\, dt \right\} . \end{aligned}$$

For simplicity, we assume the expectation of the data process is stationary, i.e., \({\bar{l}}_w (\theta \vert z) = {\mathbb {E}}^{*} l_{w}(\theta \vert z)\) does not depend on the window number w. Here the expectation is taken with respect to all observed data under the true process.

  3. C2

    [Latent membership identification] Assume

$$\begin{aligned} {{\bar{l}}}_w(\theta \vert z) \le {{\bar{l}}}_w(\theta \vert z^{*}) - c \frac{d_n \vert z - z^{*} \vert _0}{ \vert A \vert }, \end{aligned}$$

for any \(z \ne z^{*}\) and \(\theta \in B(\theta ^{*}, \delta )\). Here \(B(\theta ^{*}, \delta )\) is the \(\delta \)-ball around the true parameter \(\theta ^{*}\); \(d_n = n^{r_d}\) (\(r_d > 0\)) represents the graph connectivity and \( \vert z - z^{*} \vert _0\) is the number of individuals i such that \(z_i \ne z^{*}_i\).

  4. C3

    [Continuity] Define Q function, \(Q(\theta , q) = {\mathbb {E}}_{q(z)} l_w(\theta \vert z)\) and \({{\bar{Q}}}(\theta , q) = {\mathbb {E}}^{*} Q(\theta , q)\). Suppose

$$\begin{aligned} {{\bar{Q}}}(\theta , q) - {{\bar{l}}}_w (\theta \vert z^{*}) \le c\, d(q, \delta _{z^{*}}) \end{aligned}$$
    (C1)

holds, where \(\delta _{z^{*}}\) is the probability distribution that puts all its mass on the true label vector \(z^{*}\). The distance \(d(q_1, q_2) \equiv TV(q_1, q_2)\), where \(TV(q_1, q_2)\) is the total variation distance between two distribution functions. Let \(\theta (q)\) be the maximizer of \({{\bar{Q}}}(\theta , q)\). Assume that \( \Vert \theta (q) - \theta ^{*} \Vert \le c\, d(q, \delta _{z^*})\) holds for any q and some constant c.

  5. C4

    [Gradient condition] Assume that there exists a \(\delta \) such that

    1.
      $$\begin{aligned} \frac{\partial {{\bar{Q}}}(\theta , q)}{\partial \theta }^T (\theta - \theta (q)) < - c \Vert \theta - \theta (q)\Vert ^2 < 0 \end{aligned}$$

      and

      $$\begin{aligned} \Big \vert \frac{\partial {{\bar{Q}}}(\theta , q)}{\partial \theta }^T (\theta - \theta (q))\Big \vert \ge c \Big \Vert \frac{\partial {{\bar{Q}}}(\theta , q)}{\partial \theta }\Big \Vert ^2 \end{aligned}$$

      hold for \(\theta \in B(\theta (q), \delta )\) and any q, with c a universal constant.

    2.
      $$\begin{aligned} {\mathbb {E}}^{*} \frac{\partial Q(\theta , q)}{\partial \theta }^T \frac{\partial Q(\theta , q)}{\partial \theta } \le C \end{aligned}$$
      (C2)

      holds for any \(\theta \in B(\theta (q), \delta )\) and any q.

  6. C5

[Boundedness] For simplicity, we assume the functions \(\lambda _{ij}(t \vert z)\), \(\log \lambda _{ij}(t \vert z)\) and their derivatives are continuous, bounded functions of the parameter \(\theta \) for all z and t.

  7. C6

[Network degree] Let \(d_{i}\) be the number of nodes that individual i connects to. We assume that \(d_i \asymp d_n\) for all i, with \(d_n = n^{r_d}\) (\(0< r_d < 1\)). (Here \(a \asymp b\) means a and b are of the same order.)

  8. C7

    [Initial condition] Assume \(\theta ^{(0)} \in B(\theta ^{*}, \delta )\) for a sufficiently small radius \(\delta \) and \(q^{(0)}\) satisfies

$$\begin{aligned} {\mathbb {E}}_{q^{(0)}(z_{-i})} {{\bar{l}}}_w(\theta ^{*} \vert z_i = z, z_{-i}) \le {\mathbb {E}}_{q^{(0)}(z_{-i})} {{\bar{l}}}_w(\theta ^{*} \vert z_i = z_i^{*}, z_{-i}) - c\, d_i / \vert A \vert \end{aligned}$$
    (C3)

    for all i and \(z \ne z_i^{*}\).

These are the regularity conditions required for the proofs of Theorems 2 and 3 in the main text. We first make some important comments on the above conditions. Here the window size dT is assumed to be a fixed constant determined a priori; it can also grow with the total number of windows (e.g. as \(\log T\)), and the results still hold accordingly. Condition C1 assumes stationarity of the process for ease of proof. This condition can be further relaxed to non-stationary processes as long as Condition C2 holds for every time window. In Condition C2, we assume that there is a positive gap between log-likelihoods when the latent profile differs from the true one, which plays an important role in the identification of latent profiles. Condition C3 postulates the continuity of the Q function. In other words, the difference between Q and the true conditional likelihood is small when the approximate posterior q concentrates around the true latent labels. Condition C4 characterizes the gradient of the Q function, along with the local quadratic property and boundedness. Condition C5 requires the boundedness of the intensity function. It can be easily checked that this holds for the Poisson process. By using truncation techniques, the results can be naturally extended to the Hawkes process setting (Yang et al. 2017; Xu et al. 2020). We also note that \(d_n\) can be viewed as the network connectivity, the degree to which nodes in the network connect with each other. Condition C6 restricts the network structure so that the degrees do not diverge too much across different nodes. Then \( \vert A \vert \asymp n d_n\) controls the overall sparsity of the network; the network gets sparser as \(r_d \rightarrow 0\). Here we do not consider the regime where \(r_d = 0\) (in which case the network is super sparse, i.e. each individual has only a finite number of connections on average), which could be of interest in future work. Condition C7 places requirements on the initialization of the model parameters and the approximate q function. Note that (C3) is satisfied when q is close to the multinomial distribution which puts its probability mass on the true label \(z^{*}\). Equation (C3) can also hold automatically when the true model parameters of different classes are well separated. (For example, we may take \(q_i^{(0)}(z) = \text {multinom}(\frac{1}{K}, \ldots , \frac{1}{K})\) as a non-informative prior so that (C3) holds.)

Appendix D: Useful Lemmas

In the main proofs, we rely on the following lemmas to ensure the uniform convergence of random quantities (e.g. the likelihood, the ELBO) to their population versions.

Lemma 1

Under Conditions C0, C1 and C5, it holds that

$$\begin{aligned} P\Big ( \sup _z \vert g(\theta \vert z) - {\mathbb {E}}\, g(\theta \vert z) \vert \ge x \Big ) \le C K^n \exp \Big \{ - \frac{\frac{1}{2} \vert A \vert x^2}{v^2 + \frac{1}{3} M x} \Big \}, \end{aligned}$$
(D4)

where \(g(\theta \vert z)\) is some known function, which could be taken to be the weighted log-likelihood or its derivatives; v and M are constants. Here \({\mathbb {E}}\) is the conditional expectation given a fixed label z.

Proof of Lemma 1

Without loss of generality, we take \(g(\theta \vert z) = l_w(\theta \vert z)\). Define \(X_{ij} = \int _0^{dT} \log \lambda _{ij}(t \vert z)\, dN_{ij}(t) - \int _0^{dT} \lambda _{ij}(t \vert z)\,dt\) for any pair \((i,j) \in A\). By Condition C5, there exist M and \(v^2\) such that \( \vert X_{ij} - {\mathbb {E}} X_{ij} \vert \le M\) and \(\text {var}(X_{ij}) \le v^2\). Applying the Bernstein inequality then gives

$$\begin{aligned} P\Big ( \Big \vert \sum _{(i,j) \in A} X_{ij} - {\mathbb {E}} X_{ij} \Big \vert \ge \vert A \vert x\Big ) \le 2 \exp \Big \{- \frac{\frac{1}{2} \vert A \vert ^2 x^2}{ \vert A \vert v^2 + \frac{1}{3} M \vert A \vert x}\Big \}. \end{aligned}$$
(D5)

By taking a union bound over all possible z, we then have

$$\begin{aligned} P\Big ( \sup _z \vert g(\theta \vert z) - {\mathbb {E}}\, g(\theta \vert z) \vert \ge x \Big ) \le C K^n \exp \Big \{ - \frac{\frac{1}{2} \vert A \vert x^2}{v^2 + \frac{1}{3} M x} \Big \}. \end{aligned}$$
(D6)

Thus, we conclude the proof.

One immediate consequence of Lemma 1 is the following.

Corollary 1

Under the same setting stated in Lemma 1, it holds that

$$\begin{aligned} P\big ( \vert {\mathbb {E}}_{q(z)} g(\theta \vert z) - {\mathbb {E}}_{q(z)} {\mathbb {E}}\, g(\theta \vert z) \vert \ge x\big ) \le C K^n \exp \Big \{ - \frac{\frac{1}{2} \vert A \vert x^2}{v^2 + \frac{1}{3} M x} \Big \}, \end{aligned}$$
(D7)

for any q.

Proof of Corollary 1

For any distribution function q(z), we can observe the following relation,

$$\begin{aligned} \vert {\mathbb {E}}_{q(z)} g(\theta \vert z) - {\mathbb {E}}_{q(z)} {\mathbb {E}} g(\theta \vert z) \vert \le \sup _z \vert g(\theta \vert z) - {\mathbb {E}} g(\theta \vert z) \vert , \end{aligned}$$

and we get the desired result by applying Lemma 1. This completes the proof.

The following Lemma 2 and Lemma 3 ensure the identification of latent memberships.

Lemma 2

Under Conditions C0 - C2, C5 - C6, with probability \(1 - \exp \{ - C d_n \}\), it holds that

$$\begin{aligned} \sum _{z \ne z^{*}} L(\theta \vert z) = L(\theta \vert z^{*}) \cdot O(\exp \{- c_1 d_n\}) \end{aligned}$$
(D8)

for any \(\theta \in B(\theta ^{*}, \delta )\) for some constants \(c_1\) and \(\delta \). Here, \(L(\theta \vert z) = \exp \{ \vert A \vert l_w (\theta \vert z)\}\).

Proof of Lemma 2

The main step of the proof is to show that

$$\begin{aligned} l_w(\theta \vert z) \le l_w(\theta \vert z^{*}) - c/2 \frac{d_n \vert z - z^{*} \vert _0}{ \vert A \vert } \end{aligned}$$
(D9)

holds for all z with high probability. We take \(g(\theta \vert z)\) as \(l_w(\theta \vert z) - l_w(\theta \vert z^{*})\). Similar to the proof of Lemma 1, we have that

$$\begin{aligned} P\Big ( \vert l_w(\theta \vert z) - l_w(\theta \vert z^{*}) - {\mathbb {E}} \{l_w(\theta \vert z) - l_w(\theta \vert z^{*})\} \vert \ge x \frac{d_n \vert z - z^{*} \vert _0}{ \vert A \vert }\Big ) \le \exp \Big \{ - \frac{d_n^2 \vert z - z^{*} \vert _0^2 x^2}{ \vert z - z^{*} \vert _0\, d_{max} (v^2 + \frac{1}{3} Mx)}\Big \}, \end{aligned}$$

by noticing that there are at most \(O( \vert z - z^{*} \vert _0 d_{n})\) non-zero \(X_{ij}\)'s in \(l_w(\theta \vert z) - l_w(\theta \vert z^{*})\). Taking \(x = c/2\), we have

$$\begin{aligned} P\Big ( \vert l_w(\theta \vert z) - l_w(\theta \vert z^{*}) -{\mathbb {E}} \{l_w(\theta \vert z) - l_w(\theta \vert z^{*})\} \vert \ge \frac{c}{2} \frac{d_n \vert z - z^{*} \vert _0}{ \vert A \vert }\Big ) \le \exp \Big \{ - \frac{{\tilde{c}}\, d_n^2 \vert z - z^{*} \vert _0}{ d_{n} (v^2 + \frac{1}{6} M c)}\Big \}, \end{aligned}$$

by using the fact that \(d_{max} \asymp d_n\) and adjusting the constant \({\tilde{c}}\). Hence, by union bound, we get

$$\begin{aligned} P\Big ( \sup _z \vert l_w(\theta \vert z) - l_w(\theta \vert z^{*}) - {\mathbb {E}} \{l_w(\theta \vert z) - l_w(\theta \vert z^{*})\} \vert \ge \frac{c}{2} \frac{d_n \vert z - z^{*} \vert _0}{ \vert A \vert }\Big ) \le \sum _{z} \exp \Big \{ - \frac{{\tilde{c}}\, d_n^2 \vert z - z^{*} \vert _0}{ d_{n} (v^2 + \frac{1}{6} M c)}\Big \} = \sum _{m_0 = 1}^n \sum _{ \vert z - z^{*} \vert _0 = m_0} \exp \Big \{ - \frac{{\tilde{c}}\, d_n m_0}{ v^2 + \frac{1}{6} M c}\Big \}. \end{aligned}$$
(D10)

By Condition C2, \(d_n = n^{r_d}\) (\(r_d > 0\)), and (D10) becomes

$$\begin{aligned} P\Big ( \sup _z \vert l_w(\theta \vert z) - l_w(\theta \vert z^{*}) - {\mathbb {E}} \{l_w(\theta \vert z) - l_w(\theta \vert z^{*})\} \vert \ge \frac{c}{2} \frac{d_n \vert z - z^{*} \vert _0}{ \vert A \vert } \Big )&\le \sum _{m_0 = 1}^n K^{m_0} \exp \Big \{ - \frac{{\tilde{c}}\, d_n m_0}{ v^2 + \frac{1}{6} M c}\Big \}\\&= \sum _{m_0 = 1}^n \exp \Big \{ - \frac{{\tilde{c}}\, d_n m_0}{ v^2 + \frac{1}{6} M c} + m_0 \log K \Big \}\\&\le \sum _{m_0 = 1}^n \exp \Big \{ - \frac{{\tilde{c}}\, d_n m_0}{ 2(v^2 + \frac{1}{6} M c)} \Big \}\\&\le \exp \{ - C d_n\} \end{aligned}$$

after adjusting the constant C. Together with Condition C2, (D9) holds with probability \(1 - \exp \{-C d_n\}\).

By the definition of \(L(\theta \vert z)\) and (D9), we get that \(L(\theta \vert z) \le L(\theta \vert z^{*}) \cdot \exp \{ - \frac{c}{2} d_n \vert z - z^{*} \vert _0\}\) holds for any z with probability \(1 - \exp \{- C d_n\}\). Thus,

$$\begin{aligned} \sum _{z \ne z^{*}} L(\theta \vert z)&\le \sum _{z \ne z^{*}} L(\theta \vert z^{*}) \exp \{-\tfrac{c}{2} d_n \vert z - z^{*} \vert _0\} \\&\le L(\theta \vert z^{*}) \sum _{m_0 = 1}^n \sum _{z: \vert z - z^{*} \vert _0 = m_0} \exp \{-\tfrac{c}{2} d_n m_0\} \\&\le L(\theta \vert z^{*}) \cdot O(\exp \{- c_1 d_n\}), \end{aligned}$$

by adjusting constant \(c_1\). This completes the proof.

Lemma 3

For the approximate distribution \(q^{(1)}\), it holds that

$$\begin{aligned} \sum _{z_i \ne z_i^{*}} q_i^{(1)}(z_i) = q_i^{(1)}(z_i^{*}) O(\exp \{-{\tilde{c}} d_i\}). \end{aligned}$$
(D11)

Proof of Lemma 3

We first have that

$$\begin{aligned} {\mathbb {E}}_{q^{(0)}(z_{-i})} l_0(\theta ^{(0)} \vert z_i, z_{-i}) \le {\mathbb {E}}_{q^{(0)}(z_{-i})} l_0(\theta ^{(0)} \vert z_i^{*}, z_{-i}) - \frac{c}{2} d_i \end{aligned}$$
(D12)

with high probability for any \(z_i \ne z_i^{*}\). Under the initial condition C7, (D12) can be proved via the same technique used in Lemma 1. Secondly, we note that

$$\begin{aligned} q_i^{(1)}(z_i) \propto \exp \{{\mathbb {E}}_{q^{(0)}(z_{-i})} l_0(\theta ^{(0)} \vert z)\}. \end{aligned}$$
(D13)

We then have

$$\begin{aligned} q_i^{(1)}(z_i) \le q_i^{(1)}(z_i^{*}) \exp \{- \frac{c}{2} d_i\}. \end{aligned}$$
(D14)

Summing over all \(z_i \ne z_i^{*}\) gives

$$\begin{aligned} \sum _{z_i \ne z_i^{*}} q_i^{(1)}(z_i) \le K q_i^{(1)}(z_i^{*}) \exp \{-\tfrac{c}{2} d_i\} \le q_i^{(1)}(z_i^{*}) \exp \{- {\tilde{c}} d_i\}, \end{aligned}$$

by adjusting the constant \({\tilde{c}}\). This concludes the proof.

Appendix E: Proofs of Theorem 2 and Theorem 3

With aid of useful lemmas stated in previous sections, we are ready for the proof of main theorems.

Proof of Theorem 2

According to the definition of the regret, we have

$$\begin{aligned} \text {Regret}(T)&= \sum _{m=1}^M {\tilde{l}}_m(\theta ^{(m)} \vert z^{*}) - \sum _{m=1}^M {\tilde{l}}_m(\theta ^{*} \vert z^{*}) \\&= \sum _{m=1}^M \{{\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta ^{(m)} \vert z) - {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta ^{*} \vert z)\} - \sum _{m=1}^M \{ {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta ^{(m)} \vert z) - {\tilde{l}}_m(\theta ^{(m)} \vert z^{*}) \} + \sum _{m=1}^M \{ {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta ^{*} \vert z) - {\tilde{l}}_m(\theta ^{*} \vert z^{*}) \}. \end{aligned}$$

Next we prove the result by the following three steps.

  • Step 1. With high probability, it holds that \(\theta ^{(m)} \in B(\theta ^{*}, \delta )\) and

    $$\begin{aligned} q^{(m)} (z^{*}) \ge 1 - C \exp \{-c_1 m d_n\} \end{aligned}$$
    (E15)

    for \(m = 1, 2, \ldots \).

  • Step 2. With high probability, it holds that

$$\begin{aligned} \sum _{m=1}^M \{ {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta ^{(m)} \vert z) - {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta ^{*} \vert z) \} \le C \sqrt{M} (\log (M \vert A \vert ))^2, \end{aligned}$$
    (E16)

    for some constant C.

  • Step 3. With high probability, it holds that

$$\begin{aligned} \Big \vert \sum _{m=1}^M \{ {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta ^{(m)} \vert z) - {\tilde{l}}_m(\theta ^{(m)} \vert z^{*}) \} \Big \vert \le M \exp \{ - c d_n\}, \end{aligned}$$
    (E17)

    and

$$\begin{aligned} \Big \vert \sum _{m=1}^M \{ {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta \vert z) - {\tilde{l}}_m(\theta \vert z^{*}) \} \Big \vert \le M \exp \{ - c d_n\} \end{aligned}$$
    (E18)

    for any \(\theta \in B(\theta ^{*}, \delta )\) and some constant c.

Proof of step 1. We prove this by mathematical induction. When \(m = 0\), it is obvious that \(\theta ^{(0)} \in B(\theta ^{*}, \delta )\) according to the assumption on initialization. By Lemma 3, we have that \(\sum _{z_i \ne z_i^{*}} q_i^{(1)}(z_i) = q_i^{(1)}(z_i^{*}) O(\exp \{- \tilde{c} d_i\})\). Then \(q^{(1)}(z^{*}) = \prod _i q_i^{(1)}(z_i^{*}) \ge 1 - n O(\exp \{-c_1 d_n\})\). That is, (E15) holds for \(q^{(1)}\) by adjusting constant \(c_1\). Next we assume that \(\theta ^{(m)} \in B(\theta ^{*}, \delta )\) and (E15) holds for any \(m \le m_1\) and need to show that \(\theta ^{(m_1+1)} \in B(\theta ^{*}, \delta )\) and (E15) holds for \(m = m_1 + 1\).

We consider the following two scenarios: (1) \(0 \le \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert < \frac{\delta }{2}\), and (2) \(\frac{\delta }{2} \le \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert \). We can compute that

$$\begin{aligned} \Vert \theta ^{(m_1 + 1)} - \theta (q^{(m_1)})\Vert ^2&= \Big \Vert \theta ^{(m_1)} + \eta _{m_1} \frac{ \partial {\mathcal {Q}}_{m_1+1}(\theta , q)}{\partial \theta } - \theta (q^{(m_1)}) \Big \Vert ^2 \\&= \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert ^2 + 2 \eta _{m_1} \frac{ \partial {\mathcal {Q}}_{m_1+1}(\theta , q)}{\partial \theta }^T(\theta ^{(m_1)} - \theta (q^{(m_1)})) + \eta _{m_1}^2 \Big \Vert \frac{ \partial {\mathcal {Q}}_{m_1+1}(\theta , q)}{\partial \theta }\Big \Vert ^2. \end{aligned}$$
(E19)

By Lemma 1, we know that \( \frac{ \partial {\mathcal {Q}}_{m_1+1}(\theta , q)}{\partial \theta } = \frac{\partial \bar{{\mathcal {Q}}}(\theta , q)}{\partial \theta } + \epsilon \) where \(\epsilon = o(1)\) for all q. Therefore, the right hand side of (E19) becomes

$$\begin{aligned} \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert ^2 + 2 \eta _{m_1} \frac{\partial \bar{{\mathcal {Q}}}(\theta , q)}{\partial \theta }^T(\theta ^{(m_1)} - \theta (q^{(m_1)}) ) + 2 \epsilon \eta _{m_1} \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert + 2(\eta _{m_1}^2 + \epsilon ^2)\Big \Vert \frac{\partial \bar{{\mathcal {Q}}}(\theta , q)}{\partial \theta }\Big \Vert ^2. \end{aligned}$$
(E20)

In the first scenario, by Condition C4, (E20) implies that

$$\begin{aligned} \Vert \theta ^{(m_1 + 1)} - \theta (q^{(m_1)})\Vert \le \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert + c \delta \epsilon +O(\epsilon ^2) < \frac{3}{4} \delta \end{aligned}$$
(E21)

when the step size \(\eta _{m_1}\) is small (e.g. \(\eta _{m_1} \le c/2\)).

In the second scenario, by Condition C4, (E19) implies that

$$\begin{aligned} \Vert \theta ^{(m_1 + 1)} - \theta (q^{(m_1)})\Vert&\le \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert - 2(c + o(1)) \eta _{m_1} \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert ^2 + \eta _{m_1}^2(1 + o(1))c^2 \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert ^2 \\&\le \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert - \frac{1}{2}c \eta _{m_1} \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert ^2 \end{aligned}$$
(E22)

for \(\eta _{m_1} \le 1/(2c)\).

According to (E15) and the induction hypothesis, we have that \(d(q^{(m_1)}, \delta _{z^{*}}) \le C \exp \{-c_1 m_1 d_n\}\). This further implies that \(\Vert \theta (q^{(m_1)}) - \theta ^{*}\Vert = O(\exp \{-c_1 m_1 d_n\})\) by Condition C3. By the above facts, in the first scenario, we have

$$\begin{aligned} \Vert \theta ^{(m_1 + 1)} - \theta ^{*}\Vert&\le \Vert \theta ^{(m_1 + 1)} - \theta (q^{(m_1)})\Vert + \Vert \theta (q^{(m_1)}) - \theta ^{*}\Vert \\&\le \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert + \Vert \theta (q^{(m_1)}) - \theta ^{*}\Vert \\&\le 3\delta /4 + 2 \Vert \theta (q^{(m_1)}) - \theta ^{*}\Vert < \delta . \end{aligned}$$
(E23)

In the second scenario, we have

$$\begin{aligned} \Vert \theta ^{(m_1 + 1)} - \theta ^{*}\Vert&\le \Vert \theta ^{(m_1 + 1)} - \theta (q^{(m_1)})\Vert + \Vert \theta (q^{(m_1)}) - \theta ^{*}\Vert \\&\le \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert - \frac{1}{2}c \eta _{m_1} \Vert \theta ^{(m_1)} - \theta (q^{(m_1)})\Vert ^2 + \Vert \theta (q^{(m_1)}) - \theta ^{*}\Vert \\&< \delta - \frac{c \eta _{m_1} \delta }{4} + O(\exp \{-c_1m_1d_n\}) < \delta , \end{aligned}$$

where the last inequality holds since \(\eta _{m_1} = c m_1^{-1/2} \gg \exp \{-c_1m_1d_n\}\). Hence, we conclude that \(\theta ^{(m_1 + 1)} \in B(\theta ^{*}, \delta )\).

Next, we turn to the approximate distribution \(q^{(m_1+1)}\) to show that (E15) holds for \(m = m_1 + 1\). According to Condition C2 and Lemma 2, we have that \( \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z) \le \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z^{*}) - c d_n\) for any \(z \ne z^{*}\). This implies that

$$\begin{aligned} {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i, z_{-i}) \le \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z^{*}) - c d_n \end{aligned}$$

for any \(z_i \ne z_i^{*}\) and

$$\begin{aligned} {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i^{*}, z_{-i}) \ge \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z^{*}) - C \vert A \vert \exp \{-c m_1 d_n\}. \end{aligned}$$

Combining these two facts, we have that

$$\begin{aligned} {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i, z_{-i}) < {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i^{*}, z_{-i}) - c_2 d_n, \end{aligned}$$

holds for any \(z_i \ne z_i^{*}\) and some adjusted constant \(c_2\).

By the recursive formula

$$\begin{aligned} S^{(m_1 + 1)}(z_i) = S^{(m_1)}(z_i) \exp \{{\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i, z_{-i}) \}, \end{aligned}$$

we then have

$$\begin{aligned} \sum _{z_i \ne z_i^{*}} S^{(m_1 + 1)}(z_i) = S^{(m_1 + 1)}(z_i^{*}) O(\exp \{- (c_1 m_1 + c_2) d_n\}), \end{aligned}$$

which indicates that

$$\begin{aligned} q_i^{(m_1 + 1)}(z_i^{*}) \ge 1 - \exp \{- (c_1 m_1 + c_2) d_n\}. \end{aligned}$$

Finally, noting \(q^{(m_1 + 1)}(z^{*}) = \prod _{i=1}^n q_i^{(m_1 + 1)}(z_i^{*})\) gives us

$$\begin{aligned} q^{(m_1 + 1)}(z^{*}) \ge 1 - n \exp \{-(c_1 m_1 + c_2) d_n\} \ge 1 - \exp \{-c_1(m_1 + 1) d_n\}. \end{aligned}$$

Hence, we complete Step 1 by induction.

Proof of step 2. For notational simplicity, we denote \({\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta \vert z)\) by \(h_m(\theta )\) in the remaining part of the proof. By local convexity, we have

$$\begin{aligned} h_m(\theta ^{*}) - h_m(\theta ^{(m)}) \ge \nabla h_m(\theta ^{(m)})^T (\theta ^{*} - \theta ^{(m)}), \end{aligned}$$

which is equivalent to

$$\begin{aligned} h_m(\theta ^{(m)}) - h_m(\theta ^{*}) \le \nabla h_m(\theta ^{(m)})^T (\theta ^{(m)} - \theta ^{*}). \end{aligned}$$
(E24)

We know that

$$\begin{aligned} d({\bar{\theta }}^{(m+1)}, \theta ^{*}) - d(\theta ^{(m)}, \theta ^{*})&\le \Vert \theta ^{(m)} - \eta _m \nabla h_m(\theta ^{(m)}) - \theta ^{*}\Vert ^2 - \Vert \theta ^{(m)} - \theta ^{*}\Vert ^2 \\&\le \eta _m^2 \Vert \nabla h_m(\theta ^{(m)}) \Vert ^2 - 2 \eta _m \nabla h_m(\theta ^{(m)})^T (\theta ^{(m)} - \theta ^{*}), \end{aligned}$$

where \({\bar{\theta }}^{(m+1)} = \theta ^{(m)} - \eta _m \nabla h_m(\theta ^{(m)})\). By summing over m and the fact that \(d(\theta ^{(m)}, \theta ^{*}) \le d({\bar{\theta }}^{(m)}, \theta ^{*})\), we have

$$\begin{aligned} \sum _m \{ d(\theta ^{(m+1)}, \theta ^{*}) - d(\theta ^{(m)}, \theta ^{*})\}&\le \sum _m \{ d({\bar{\theta }}^{(m+1)}, \theta ^{*}) - d(\theta ^{(m)}, \theta ^{*})\} \\&\le \sum _m \{ \eta _m^2 \Vert \nabla h_m(\theta ^{(m)})\Vert ^2 - 2 \eta _m \nabla h_m(\theta ^{(m)})^T (\theta ^{(m)} - \theta ^{*}) \}. \end{aligned}$$
(E25)

By Eq. (E24), we then have

$$\begin{aligned} \text {regret} \le \sum _m \nabla h_m(\theta ^{(m)})^T (\theta ^{(m)} - \theta ^{*}) \le \frac{1}{2 \eta _m} (d(\theta ^{(0)}, \theta ^{*}) - d(\theta ^{(m+1)}, \theta ^{*})) + \sum _m \frac{\eta _m}{2} \Vert \nabla h_m(\theta ^{(m)}) \Vert ^2, \end{aligned}$$
(E27)

where the second inequality uses (E25).

Next, we prove that \(\nabla h_m(\theta ^{(m)})\) is bounded with probability going to 1 for any m. Note that,

$$\begin{aligned} \nabla l_m(\theta \vert z)&= \nabla \left( \frac{1}{ \vert A \vert } \left[ \sum _{(i,j) \in A} \int _{(m - 1)dT}^{m\, dT} \log \lambda _{ij}(s \vert z)\, dN_{ij}(s) - \int _{(m - 1)dT}^{m\, dT} \lambda _{ij}(s \vert z)\, ds \right] \right) \\&\le \frac{1}{ \vert A \vert } \left\{ \sum _{(i,j) \in A} \int _{(m - 1)dT}^{m\, dT} \frac{\lambda _{ij}^{'}(s \vert z)}{\lambda _{ij}(s \vert z)}\, dN_{ij}(s) - \int _{(m - 1)dT}^{m\, dT} \lambda _{ij}^{'}(s \vert z)\, ds\right\} . \end{aligned}$$

Let \(B_1 = \sup _{t,z} \frac{\lambda _{ij}^{'}(t \vert z)}{\lambda _{ij}(t \vert z)}\) and \(B_2 = \sup _{t, z} \lambda _{ij}^{'}(t \vert z)\). Both \(B_1\) and \(B_2\) are bounded according to Condition C5. We know that the number of events, \(M_w\), in each time window follows a Poisson distribution with mean \(\int _0^{dT} \lambda (s)\, ds\). Therefore, we get \(P(M_w \ge m_w) \le \exp \{-c m_w\}\) for some constant c. We then have that \(\nabla l_m(\theta \vert z) \le C(B_1 m_w + B_2)\) with probability at least \(1 - M \vert A \vert \exp \{ - c m_w\}\).

By letting \(\eta _m = \frac{1}{\sqrt{M}}\), (E27) becomes

$$\begin{aligned} \text {regret} \le C (\sqrt{M} d(\theta ^{(0)}, \theta ^{*}) + \sqrt{M} (B_1 m_w + B_2)^2) \le C \sqrt{M} (\log (M \vert A \vert ))^2, \end{aligned}$$

where we set \(m_w = c \log (M \vert A \vert )\).

Proof of step 3. We only need to show that for each m, it holds that

$$\begin{aligned} \vert {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta \vert z) - \tilde{l}_m(\theta \vert z^{*}) \vert \le C \exp \{-c d_n\}. \end{aligned}$$

We know that

$$\begin{aligned} {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta \vert z)&= q^{(m)}(z^{*}) {\tilde{l}}_m(\theta \vert z^{*}) + \sum _{z \ne z^{*}} q^{(m)}(z) {\tilde{l}}_m(\theta \vert z) \\&\le q^{(m)}(z^{*}) {\tilde{l}}_m(\theta \vert z^{*}) + \sum _{z \ne z^{*}} q^{(m)}(z) B_0 m_w. \end{aligned}$$

This implies that

$$\begin{aligned} \vert {\mathbb {E}}_{q^{(m)}} {\tilde{l}}_m(\theta \vert z) - {\tilde{l}}_m(\theta \vert z^{*}) \vert&\le (1 - q^{(m)}(z^{*})) \vert {\tilde{l}}_m(\theta \vert z^{*}) \vert + \sum _{z \ne z^{*}} q^{(m)}(z) \vert {\tilde{l}}_m(\theta \vert z) \vert \\&\le C B_0 m_w \exp \{-c d_n\}. \end{aligned}$$

This completes the proof.

Proof of Theorem 3

By the update rule, we know that \(\theta ^{(m+1)} = \theta ^{(m)} - \eta _m \nabla h_m(\theta ^{(m)})\), so

$$\begin{aligned} \Vert \theta ^{(m+1)} - \theta ^{*} \Vert ^2 \le \Vert \theta ^{(m)} - \eta _m \nabla h_m(\theta ^{(m)}) - \theta ^{*} \Vert ^2 = \Vert \theta ^{(m)} - \theta ^{*}\Vert ^2 - 2\eta _m \nabla h_m(\theta ^{(m)})^T (\theta ^{(m)} - \theta ^{*}) + \eta _m^2 \Vert \nabla h_m(\theta ^{(m)})\Vert ^2. \end{aligned}$$
(E28)

Furthermore,

$$\begin{aligned} \Vert \theta ^{(m+1)} - \theta ^{*}\Vert ^2&\le \Vert \theta ^{(m)} - \theta ^{*}\Vert ^2 - 2\eta _m \nabla {h}_m(\theta ^{(m)})^T (\theta ^{(m)} - \theta ^{*}) + \eta ^2_m \Vert \nabla h_m(\theta ^{(m)})\Vert ^2 \\&= \Vert \theta ^{(m)} - \theta ^{*}\Vert ^2 - 2\eta _m \big (\nabla {h}_m(\theta ^{(m)}) - \nabla {\tilde{l}}_m(\theta ^{(m)} \vert z^{*}) + \nabla {\tilde{l}}_m(\theta ^{(m)} \vert z^{*}) - \nabla \bar{\tilde{l}}_m(\theta ^{(m)} \vert z^{*}) + \nabla \bar{\tilde{l}}_m(\theta ^{(m)} \vert z^{*})\big )^T (\theta ^{(m)} - \theta ^{*}) + \eta ^2_m \Vert \nabla h_m(\theta ^{(m)})\Vert ^2 \\&\le \Vert \theta ^{(m)} - \theta ^{*}\Vert ^2 - 2\eta _m \nabla \bar{\tilde{l}}_m(\theta ^{(m)} \vert z^{*})^T (\theta ^{(m)} - \theta ^{*}) + \eta ^2_m \Vert \nabla h_m(\theta ^{(m)})\Vert ^2 + c \eta _m \delta \, d(q^{(m)}, \delta _{z^{*}}) + c \delta \eta _m O_p\left( \frac{1}{\sqrt{ \vert A \vert }}\right) , \end{aligned}$$
(E29)

where the term \(\frac{1}{\sqrt{ \vert A \vert }}\) comes from the probability bound in Lemma 1. Noticing that \(\theta (q) = \theta ^{*}\) when \(q = \delta _{z^{*}}\), we have that \(- \nabla \bar{\tilde{l}}_m(\theta ^{(m)} \vert z^{*})^T (\theta ^{(m)} - \theta ^{*}) \le - c \Vert \theta ^{(m)} - \theta ^{*}\Vert ^2\), according to Condition C4. Furthermore, we know that \(d(q^{(m)}, \delta _{z^{*}}) \le \exp \{-m d_n\}\) (see (E15)). Therefore, \(\eta _m \delta \, d(q^{(m)}, \delta _{z^{*}})\) can be absorbed into \(\eta ^2_m \Vert \nabla h_m(\theta ^{(m)})\Vert ^2\). In summary, (E29) becomes

$$\begin{aligned} \Vert \theta ^{(m+1)} - \theta ^{*} \Vert ^2 \le (1 - c \eta _m) \Vert \theta ^{(m)} - \theta ^{*} \Vert ^2 + C \eta _m^2 \Vert \nabla h_m(\theta ^{(m)})\Vert ^2 + C \frac{\eta _m}{\sqrt{ \vert A \vert }}, \end{aligned}$$

which further gives,

$$\begin{aligned} \Vert \theta ^{(m+1)} - \theta ^{*}\Vert ^2 \le (1 - c \eta _m) \Vert \theta ^{(m)} - \theta ^{*}\Vert ^2 + B \left( \eta _m^2 (\log (M \vert A \vert ))^2 + \frac{\eta _m}{\sqrt{ \vert A \vert }}\right) , \end{aligned}$$

by adjusting constants and noticing that \(\Vert \nabla h_m(\theta )\Vert ^2\) is bounded by \((\log (M \vert A \vert ))^2\). After direct algebraic calculation, we have

$$\begin{aligned} \Vert \theta ^{(m+1)} - \theta ^{*} \Vert ^2 \le \Vert \theta ^{(0)} - \theta ^{*}\Vert ^2 \prod _{t=0}^m (1 - c \eta _t) + B \sum _{t = 0}^m \eta _t \left( \eta _t (\log (M \vert A \vert ))^2 + \frac{1}{\sqrt{ \vert A \vert }}\right) \prod _{s = t+1}^m (1 - c \eta _s). \end{aligned}$$
(E30)

For the first term in (E30), we have that

$$\begin{aligned} \prod _{t=0}^m (1 - c \eta _t) \le \prod _{t=0}^m \exp \{-c \eta _t\} = \exp \Big \{-c\sum _{t=0}^m \eta _t\Big \} \le C \exp \{- c\, m^{1-\alpha }\}. \end{aligned}$$

Next, we define \(x_{1t} = \eta _t (\log (M \vert A \vert ))^2\) and \(x_2 = 1/\sqrt{ \vert A \vert }\) to simplify the remainder of the proof. For the second term in (E30), we have that

$$\begin{aligned} \sum _{t=0}^m \eta _t (x_{1t} + x_2) \prod _{s = t+1}^m (1 - c\eta _s)&= \sum _{t=0}^{m/2} \eta _t (x_{1t} + x_2) \prod _{s = t+1}^m (1 - c\eta _s) + \sum _{t=m/2 + 1}^{m} \eta _t (x_{1t} + x_2) \prod _{s = t+1}^m (1 - c\eta _s) \\&= \sum _{t=0}^{m/2} \eta _t (x_{1t} + x_2) \prod _{s = t+1}^m (1 - c\eta _s) + \sum _{t=m/2 + 1}^{m} (x_{1t} + x_2) \frac{1 - (1 - c\eta _t)}{c} \prod _{s = t+1}^m (1 - c\eta _s) \\&\le \sum _{t=0}^{m/2} \eta _t (x_{1t} + x_2) \prod _{s = t+1}^m (1 - c\eta _s) + \frac{1}{c} (x_{1,m/2} + x_2) \sum _{t=m/2 + 1}^{m} (1 - (1 - c\eta _t)) \prod _{s = t+1}^m (1 - c\eta _s) \\&\le \sum _{t=0}^{m/2} \eta _t (x_{1t} + x_2) \prod _{s = t+1}^m (1 - c\eta _s) + \frac{1}{c} (x_{1,m/2} + x_2) \\&\le \exp \Big \{ -c \sum _{t = m/2 + 1}^m \eta _t\Big \} \Big (\sum _{t=0}^{m/2} \eta _t (x_{1t} + x_2) \Big ) + \frac{1}{c} (x_{1,m/2} + x_2) \\&\le m \exp \{ - c\, m^{1-\alpha }\} + \frac{1}{c} (x_{1,m/2} + x_2) \\&\le c_0 \Big (m^{-\alpha } (\log (M \vert A \vert ))^2 + \frac{1}{\sqrt{ \vert A \vert }}\Big ), \end{aligned}$$

by adjusting the constants. Combining the above inequalities, we have

$$\begin{aligned} \Vert \theta ^{(m)} - \theta ^{*} \Vert ^2 = O_p \left( m^{-\alpha } (\log (M \vert A \vert ))^2 + \frac{1}{\sqrt{ \vert A \vert }}\right) . \end{aligned}$$

This concludes the proof.

Appendix F: Community recovery under relaxed conditions

In this section, we relax the conditions of the previous section by considering the case of an uneven degree distribution. Let \(d_i\) be the number of nodes that the i-th individual connects to. An uneven degree distribution means that the \(d_i\) are not all of the same order: \(d_i\) diverges for some nodes i and remains bounded for others. Under this setting, we establish results for consistent community recovery. We start by introducing a few modified conditions.

  1. C2’

    [Latent membership identification] Assume

    $$\begin{aligned} {{\bar{l}}}_w(\theta \vert z_{{\mathcal {N}}}, z_{-{\mathcal {N}}}^{*}) \le {{\bar{l}}}_w(\theta \vert z^{*}) - c \frac{\sum _{i \in {\mathcal {N}}} d_i}{ \vert A \vert }, \end{aligned}$$

    for any subset \({\mathcal {N}} \subset \{1,\ldots ,n\}\), where \(z_{{\mathcal {N}}}\) and \(z_{-{\mathcal {N}}}\) denote the sub-vectors of \(z\) with and without the elements in \({\mathcal {N}}\), respectively.

  2. C3’

    [Continuity] Assume

    $$\begin{aligned} {{\bar{Q}}}(\theta , q) - {{\bar{l}}}_w (\theta \vert z^{*}) \le c \frac{1}{ \vert A \vert } \sum _i d_i \cdot d(q_i, \delta _{z_i^{*}}) \end{aligned}$$
    (F31)

    holds. Also assume that \(\vert \theta (q) - \theta ^{*} \vert \le c \frac{1}{\vert A \vert } \sum _i d_i \cdot d(q_i, \delta _{z_i^{*}})\) holds for any \(q\) and some constant \(c\).

  3. C6’

    [Network degree] Suppose \(\{1,\ldots ,n\}\) can be partitioned into two sets \({\mathcal {N}}_u\) and \({\mathcal {N}}_b\), where \({\mathcal {N}}_u\) is the set of nodes with degree larger than \(d_n\) and \({\mathcal {N}}_b\) is the set of nodes with bounded degree, with \(d_n = m^{r_d}\) for some \(r_d > 0\). Let \(d_{i, {\mathcal {N}}_b}\) be the number of nodes within \({\mathcal {N}}_b\) that individual \(i\) connects to; we assume \(d_{i, {\mathcal {N}}_b}\) is bounded for all \(i\). In addition, the cardinality of \({\mathcal {N}}_b\) satisfies \(\vert {\mathcal {N}}_b \vert / \vert A \vert = o(1)\). (A small numerical sketch of this partition follows the list.)
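To make Condition C6' concrete, the following sketch partitions a hypothetical degree sequence at the threshold \(d_n = m^{r_d}\); the degree distribution, \(m\) and \(r_d\) are all assumptions made for illustration.

```python
import numpy as np

# Partition nodes into N_u (degree above d_n = m**r_d) and N_b (bounded
# degree), as in Condition C6'. The degree sequence below is hypothetical.
rng = np.random.default_rng(0)
n, m, r_d = 500, 100, 0.5
# Most nodes are dense; a small fraction have bounded degree.
d = np.where(rng.random(n) < 0.9, rng.poisson(50, n), rng.integers(1, 5, n))

d_n = m ** r_d                       # threshold d_n = m^{r_d}
N_u = np.flatnonzero(d > d_n)        # high-degree nodes: labels recoverable
N_b = np.flatnonzero(d <= d_n)       # bounded-degree nodes

A_size = d.sum()                     # a proxy for |A| (total connections)
print(f"|N_u| = {N_u.size}, |N_b| = {N_b.size}, |N_b|/|A| = {N_b.size / A_size:.4f}")
```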

Lemma 4

With probability \(1 - \exp \{ - C d_n \}\), it holds that

$$\begin{aligned} \sum _{z: z_i \ne z_i^{*} } L_i(\theta \vert z) = L_i(\theta \vert z^{*}) \cdot O(\exp \{- c_0 d_n\}) \end{aligned}$$
(F32)

for any \(i \in {\mathcal {N}}_{u}\) and any \(\theta \in B(\theta ^{*}, \delta )\), for some constants \(c_0\) and \(\delta \). Here \(L_i(\theta \vert z):= \exp \{ \vert A \vert l_i (\theta \vert z)\}\) and

$$\begin{aligned} l_i(\theta \vert z):= \frac{1}{\vert A \vert }\left( \sum _{(i,j) \in A} l_{ij}(\theta \vert z_i, z_j) + \sum _{(j,i) \in A} l_{ji}(\theta \vert z_j, z_i)\right) . \end{aligned}$$

Proof of Lemma 4

Similar to the proof of Lemma 2, we can prove that

$$\begin{aligned} l_i(\theta \vert z_{{\mathcal {N}}}, z_{-{\mathcal {N}}}^{*}) \le l_i(\theta \vert z^{*}) - c/2 \frac{\sum _{j \in {\mathcal {N}}} d_j}{ \vert A \vert } \end{aligned}$$
(F33)

holds for any fixed \(z_{{\mathcal {N}}}\) with probability at least \(1 - \exp \{- C(\sum _{j \in {\mathcal {N}}} d_j)\}\). Then, we can compute

$$\begin{aligned} P\left( l_i(\theta \vert z_{{\mathcal {N}}}, z_{-{\mathcal {N}}}^{*}) \ge l_i(\theta \vert z^{*}) - \frac{c}{2} \frac{\sum _{j \in {\mathcal {N}}} d_j}{\vert A \vert } ~ \text {for some} ~ z_{{\mathcal {N}}}\right)&\le \sum _{z_{-i} \ne z_{-i}^{*}} \exp \Big \{ - C \Big (d_i + \sum _{j: z_j \ne z_j^{*}} d_j\Big ) \Big \} \\&\le \sum _{m_0=1}^n \sum _{\vert z_{{\mathcal {N}}_u} - z_{{\mathcal {N}}_u}^{*} \vert _0 = m_0} \exp \{ - C d_n m_0 \} + K^{d_{i,{\mathcal {N}}_b}} \exp \{-C d_i\} \\&\le \exp \{ - c_0 d_n\}, \end{aligned}$$
(F34)

by adjusting the constants. Inequality (F34) uses the fact that \(l_i(\theta \vert z)\) depends on only a finite number of nodes in \({\mathcal {N}}_{b}\). This completes the proof.

We define the estimator of the latent class membership as \({\hat{z}}_i:= \arg \max _{z} q_i^{(M)}(z)\). The following result shows that the latent class memberships of high-degree individuals can be estimated consistently, with the error vanishing exponentially fast.
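Concretely, if the final-step variational probabilities are stored as an \(n \times K\) array (a hypothetical layout, one row per node and one column per class), the estimator is a row-wise argmax:

```python
import numpy as np

# q[i, k] = q_i^{(M)}(z_i = k): final variational probabilities, assuming an
# (n x K) array layout; the membership estimate is the row-wise argmax.
q = np.array([[0.95, 0.03, 0.02],
              [0.10, 0.85, 0.05]])   # toy values for two nodes, K = 3
z_hat = q.argmax(axis=1)             # hat{z}_i = argmax_z q_i^{(M)}(z)
print(z_hat)                          # -> [0 1]
```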

Theorem 5

Under Conditions C1, C4, C5, C7 and C2’, C3’, C6’, with probability \(1 - M \exp \{-C d_n\}\), we have \(q_i^{(m)}(z_i = z_i^{*}) \ge 1 - C \exp \{-c_1 m d_n\}\) for all \(i \in {\mathcal {N}}_u\) and \(m = 1, \ldots , M\). In particular, when \({\mathcal {N}}_u = \{1, \ldots , n\}\), we recover the true labels of all nodes.

Proof of Theorem 5

To prove this, we only need to show that \(q_i^{(m)}(z_i^{*}) \ge 1 - C \exp \{-c_1 m d_n\}\) for all \(i \in {\mathcal {N}}_u\) and \(m = 1,2,\ldots \) with probability \(1 - \exp \{-Cd_n\}\), where \(c_1\) is some small constant. Without loss of generality, we may assume \(\theta ^{(m)}\) always lies in \({\mathcal {B}}(\theta ^{*}, \delta )\); the proof of this claim is almost the same as in the proof of Theorem 2.

Take any \(i \in {\mathcal {N}}_u\). We first show that \(q_i^{(1)}(z_i^{*}) \ge 1 - C \exp \{-c_1 d_n\}\), which follows by applying Condition C7. We then proceed by induction.

According to Lemma 4 and Condition C2’, with probability \(1 - \exp \{-C d_n\}\) we have that \( \vert A \vert l_{i}(\theta ^{(m_1)} \vert z) \le \vert A \vert l_{i}(\theta ^{(m_1)} \vert z^{*}) - c d_n\) for any \(z\) with \(z_i \ne z_i^{*}\). This implies that

$$\begin{aligned} {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i, z_{-i}) \le \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z^{*}) - c d_n \end{aligned}$$

for any \(z_i \ne z_i^{*}\) and

$$\begin{aligned} {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i^{*}, z_{-i}) \ge \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z^{*}) - C \vert A \vert \exp \{-c_1 m_1 d_n\} - C d_{i, {\mathcal {N}}_b}. \end{aligned}$$

Combining these two facts, we have that

$$\begin{aligned} {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i, z_{-i}) < {\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i^{*}, z_{-i}) - c_2 d_n \end{aligned}$$

holds for any \(z_i \ne z_i^{*}\) and some adjusted constant \(c_2 > c_1\).

By the recursive formula

$$\begin{aligned} S^{(m_1 + 1)}(z_i) = S^{(m_1)}(z_i) \exp \{{\mathbb {E}}_{q_{-z_i}^{(m_1)}} \vert A \vert l_{m_1}(\theta ^{(m_1)} \vert z_i, z_{-i}) \}, \end{aligned}$$

we then have

$$\begin{aligned} \sum _{z_i \ne z_i^{*}} S^{(m_1 + 1)}(z_i) = S^{(m_1 + 1)}(z_i^{*}) \cdot O(\exp \{- (c_1 m_1 + c_2) d_n\}), \end{aligned}$$

which indicates that

$$\begin{aligned} q_i^{(m_1 + 1)}(z_i^{*}) \ge 1 - \exp \{- (c_1 m_1 + c_2) d_n\} \ge 1 - \exp \{- (m_1 + 1) c_1 d_n \}. \end{aligned}$$
(F35)

Here the second inequality uses \(c_2 > c_1\). This completes the induction, and hence the proof.

Lastly, it is easy to see that Theorem 4 is a special case of Theorem 5.
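The induction above can also be checked numerically. The sketch below runs the multiplicative score update with a synthetic expected log-likelihood whose value at the true label exceeds all others by roughly \(c_2 d_n\); every constant here is an assumption made for illustration, not a value from the paper.

```python
import numpy as np

# Numerical check of the induction: run the multiplicative score update
#   S^{(m+1)}(z_i) = S^{(m)}(z_i) * exp{ E_q |A| l_m(theta^{(m)} | z_i, z_{-i}) },
# feeding it a synthetic expected log-likelihood whose value at the true
# label exceeds the others by c2 * d_n (all constants are assumptions).
rng = np.random.default_rng(1)
K, z_star = 3, 0                     # K classes; true label of node i
c2, d_n, M = 0.5, 10.0, 20           # hypothetical gap constant, degree, steps

log_S = np.zeros(K)                  # work with log S to avoid overflow
for m in range(M):
    # Synthetic expected log-likelihoods: the true label wins by >= c2 * d_n.
    ell = rng.normal(0.0, 1.0, K)
    ell[z_star] = ell.max() + c2 * d_n
    log_S += ell                     # log of the multiplicative update
    q = np.exp(log_S - log_S.max())  # normalise: q_i^{(m+1)} proportional to S
    q /= q.sum()
    print(f"m = {m + 1:2d}: q_i(z_i^*) = {q[z_star]:.6f}")
```

The printed values of \(q_i^{(m)}(z_i^{*})\) approach one geometrically in \(m\), mirroring the \(1 - \exp \{-(m_1 + 1) c_1 d_n\}\) bound in (F35).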
