An Open Framework for Analyzing and Modeling XR Network Traffic

2021, IEEE Access

Received August 5, 2021, accepted September 2, 2021, date of publication September 15, 2021, date of current version September 28, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3113162

MATTIA LECCI (Graduate Student Member, IEEE), MATTEO DRAGO (Graduate Student Member, IEEE), ANDREA ZANELLA (Senior Member, IEEE), AND MICHELE ZORZI (Fellow, IEEE)
Department of Information Engineering, University of Padua, 35131 Padova, Italy
Corresponding author: Mattia Lecci (leccimat@dei.unipd.it)

This work was supported in part by the National Institute of Standards and Technology (NIST) under Award 60NANB19D122 and Award 60NANB21D127. The work of Mattia Lecci was supported by Fondazione CaRiPaRo under Grant "Dottorati di Ricerca 2018."

ABSTRACT Thanks to recent technological advancements, Augmented Reality (AR) and Virtual Reality (VR) applications are gaining considerable momentum and will surely become increasingly popular in the next decade. These new applications, however, also require a step forward in terms of models to simulate and analyze this type of traffic source in modern communication networks, in order to guarantee state-of-the-art performance and Quality of Experience (QoE) to the users. Recognizing this need, in this work we present a novel open-source traffic model, which researchers can use as a starting point both for improvements of the model itself and for the design of optimized algorithms for the transmission of these peculiar data flows. Along with the mathematical model and the code, we also share with the community the traces that we gathered for our study, collected from freely available applications such as Minecraft VR, Google Earth VR, and Virus Popper. Finally, we propose a roadmap for the construction of an end-to-end framework that fills this gap in the current state of the art.

INDEX TERMS Traffic modeling, traffic analysis, network simulations, virtual reality applications.

I. INTRODUCTION

After several years of innovations, the technology is finally ready for applications such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) to go mainstream (in the following, we will use the term eXtended Reality (XR) as a general expression covering all these distinct interaction modes). According to some estimates, by 2025 there will be over 200 million people using XR for immersive gaming experiences and 95 million enjoying live events in this novel way [2]. This immediately translates into increasing sales of devices and headsets dedicated to experiencing this new type of content, with an estimated shipment of these devices in the order of tens of millions in the coming decade, generating billions in revenue for all the fields in which this technology will be deployed [3]–[5].

Although it all started in the entertainment and video gaming arenas, where players could immerse themselves in a virtual 3D world, XR is now applied in various fields, such as building or landscape design, real estate, marketing, and healthcare, opening up the possibility of learning new concepts and training employees for difficult situations in a completely different way [3], [6]–[10]. Automotive companies, for example, are using VR to cut the time needed to produce the physical model of a new product from weeks to days [5].
In the general retail market, VR can give customers realistic experiences with products, allowing them to easily consider different options and configurations [3], thus increasing sales and decreasing product returns.

The peculiarity of this new class of content, besides the wide range of use cases, is that the end user does not passively receive the information, but acts on it, possibly affecting the future behavior of the application itself. Hence, the traffic flow to and from the content provider is highly dependent on the interaction with the virtual environment in which the user is immersed. In this paper, we will focus on examples related to the video gaming world (even though equivalent conclusions can be drawn for different XR applications), where the user interacts with the application using a keyboard or joypad, and the results of such actions are immediately seen on the PC or TV screen. Through Head Mounted Devices (HMDs), when playing video games supporting VR, users can also react by moving their heads, causing the application to stream distinct portions of the environment depending on where the head is pointing [11].

Even though gaming software traditionally ran on devices that needed to respect several hardware constraints to generate high-quality images, the paradigm is now shifting towards a cloud approach [12], [13]. This can be extended to all other use cases besides gaming, and for this reason we can refer to this new paradigm as Cloud XR. By moving the computing and graphical processing units into the cloud, less powerful devices can be used to fully exploit this new technology. This would benefit not only the actual cost of an HMD (which still plays a huge role in promoting adoption by new users) but also the final Quality of Experience (QoE): having all the computing resources self-contained in the device would imply not only higher weight and volume, but also concerns in terms of heat and battery life [4], [9].

This shift towards cloud infrastructures requires the optimization of current communication systems to fully support distributed XR services. To this end, we need accurate models of the applications that generate these data flows and, to the best of our knowledge, no previous work has addressed this problem so far. In this work, we try to fill this gap by proposing a traffic model that emulates an XR application, while also sketching a roadmap to guide researchers in the development of more precise models, using ours as a baseline.

To further understand which steps most influence XR performance, it is useful to describe a common end-to-end XR architecture [2], [9]. First, sensory and tracking information is collected and processed by an ad hoc device. Then, this information is sent to an XR server to compose the viewport, i.e., what is actually shown to the user. This process includes 2D/3D media encoding and the generation of additional metadata (including the scene description). The device's presentation engine at the client side, after receiving and decoding the information stream, generates the images to display. These images are derived from the decoded signals, the rendering metadata, and other information, if applicable.
Finally, video and audio tracks associated with the current pose are generated by synchronizing and spatially aligning the rendered media. These steps need to be accomplished with minimal delay to guarantee adequate QoE. In fact, the motion-to-photon latency, i.e., the time from an action (e.g., a head movement) to the update of what is shown on the display, must be below 20 ms to avoid the so-called cybersickness, associated with disorientation and dizziness [2], [14]–[18]. Following this physiological constraint, several industry players set the network requirements, in terms of latency, in the range of 5–10 ms [2], [4], [9], [14], [19]. Also in terms of gaming QoE, it has been demonstrated that for first-person shooters, racing games, and team soccer matches, application latency directly impacts the results of competitive e-sports and, if not properly addressed, would lead players to abandon the game [12].

This translates into stringent constraints both in Downlink (DL) and in Uplink (UL), considering that not only must the content be streamed as soon as it is required, but the user movements also need to be promptly notified to the server. For this reason, the software that collects each movement input must consider all 6 Degrees of Freedom (6DoF), tracking both translations and rotations along the three perpendicular axes (depending on the VR device, some may consider only rotational motion, i.e., 3 Degrees of Freedom (3DoF)). To take the immersive mobile experience to the next level, many improvements will be required in head, body, and even gaze tracking [7]. It is also important to distinguish between processing latency, associated with computation and rendering, and network latency. Rendering complex gaming images can be quite demanding, and the delay introduced by these operations can be larger than that caused by network services, which further motivates the need to offload these functions to proper cloud infrastructures [12].

Besides delay-related issues, an additional problem is the bursty nature of XR traffic, meaning that the throughput measured over short time windows can be much higher than its long-term average value [9], as is the case for an application that periodically generates collections of packets to refresh the viewport. Another aspect impacting the throughput is that, in order for the technology to be as close as possible to human vision, we will need a higher spatial and temporal resolution of the content presented to the user than currently possible (i.e., 3D 360° 8196 × 4096 resolution at 90 Hz and beyond display refresh rate) [7], [9], [14].

The core technology that is expected to guarantee the satisfaction of all these requirements, by paving the way for an optimized distribution of processing capabilities, is 5G. Many players have already invested in 5G for the rise of XR, for operations at both sub-6 GHz and mmWave [7], [8], [20], [21]. Nonetheless, even though some efforts have also been devoted by standardization bodies to the drafting of technical reports [14], [19], at the present time researchers are limited by the lack of precise traffic models representing the stream to/from an XR server. Having these models would allow the research community to design telecommunication solutions that could reduce the delay contribution related to the network, while also considering all the processing steps.
For this reason, we propose a generative model for XR traffic sources, obtained from real application traces, and we also delineate a roadmap of the steps necessary to further improve it with additional features able to cope with the aforementioned problems, i.e., motion-to-photon latency, burstiness, and capacity. While in Sec. II we summarize the current state of the art in the XR arena, Sec. III is devoted to the description of the acquisition setup, both from the hardware and from the software point of view, which we used to collect about 70 GB of data for a total of more than 4 hours of traced traffic time using different VR applications. We also describe each of these applications and illustrate how we analyzed the dataset. The model obtained from this analysis is presented in Sec. IV, and its end-to-end validation, along with some example use cases, is discussed in Sec. V. Finally, in Sec. VI we propose a roadmap to extend our baseline model with additional, increasingly complex features, and Sec. VII concludes the paper.

II. STATE OF THE ART

A seminal conceptual model describing the human and technical elements that create the participatory environments of virtual reality systems was proposed in [22], dating back to 1994. This demonstrates that the interest in the definition of common models for the study of this framework started even before the technology was ready, or even invented. Despite the research interest in this field, to the best of our knowledge little work has been done on the creation of generative traffic models in XR contexts, the focus being rather on different aspects of the technology.

In particular, a huge effort has been devoted to the creation and validation of practical systems that use immersive technology to interact with the world in different ways. An example can be found in [23], which describes a system for the interactive analysis of large datasets with time-dependent data, realized on a multi-processor parallel machine in order to guarantee a smooth user experience. Instead, in [24] the authors developed a proof-of-concept system, combining the Oculus Rift HMD and the Phantom Premium 1.5 High Force haptic device, with the goal of demonstrating the feasibility of combining HMD and haptics in one system. XR solutions have also been tested for architectural design [25] and for providing virtual performance instructions and feedback to users who want to play a real piano [26].

From a more technical perspective, a complete overview of the latest developments on immersive and 360° video streaming can be found in [11], where the author provides a complete overview of four of the most important challenges in this field, namely: omnidirectional video coding and compression, subjective and objective QoE and the factors that can affect it, saliency measurement and Field-of-View (FoV) prediction, and adaptive streaming of immersive 360° videos. As stressed in [11], finding a proper way to measure the user's QoE may be difficult. This is especially important with respect to the design of telecommunication infrastructures able to optimize the experience of the user and to guarantee constant and stable service quality. For this reason, a lot of effort has been devoted to creating network solutions for the maximization of the quality of the delivered content.
In [27], for example, the authors proposed a scheme for uplink delivery of tile-based VR video over cellular networks. In particular, they formulate resource allocation as a frequency- and time-dependent non-deterministic polynomial-time hard (NP-hard) problem, and propose three distinct algorithms to solve it. Instead, in [28] the authors consider a QoE-driven transmission of VR 360° content in a multi-user massive MIMO wireless network. Specifically, in this scenario multiple users in the cell request the same content, and the goal is to optimize the reception of such information through a stable scheme for the transmission of the viewport tiles. In this work, they also try to allocate the power so as to guarantee a consistent delivery rate for each stream.

The impact of latency on the overall experience of the user has been mentioned in Sec. I, along with the importance of tracking the movements of the user in applications with strict delay requirements. The authors of [29] used a real VR head-tracking dataset to maximize the quality of the delivered video chunk under low-latency constraints. In that case, a deep recurrent neural network was designed for the prediction of the users' FoV (making it possible to cluster users with overlapping FoVs), while information on the future content and the users' locations was used as input to a proactive physical-layer multicast transmission scheme. A key solution to the latency problem would be to rely on the capabilities of 5G and edge cloud, exploiting what has been referred to as Cloud XR in Sec. I. Indeed, in [30] the authors demonstrated that 5G and edge cloud are necessary to sustain the requirements of applications such as VR gaming.

All these solutions, however, lack a model capable of generating data flows that can easily be associated with a real XR application. The approach of [27] consisted in using 240 frames of each 8K 360° uncompressed video sequence available from [31]. In that case, the authors applied the HEVC Kvazaar encoding procedure, setting the frame rate to 25 Frames Per Second (FPS) and the Group of Pictures (GoP) size to 8, and using a constant tiling scheme, ideal for the purpose of their work. Despite the high level of detail implemented in such a model, the use of a trace-based flow is limiting per se, considering also the limited portion of the video that they selected. Having an offline encoding strategy is another drawback, which in our framework has been overcome by integrating the rendering server in the processing pipeline. Also in [28], the simulation setup, from the point of view of the VR architecture, was defined so as to highlight the features of the algorithms proposed by the authors, and the nature of the traffic flow (e.g., average frame size, inter- and intra-frame correlation, inter-frame interval, etc.) was not taken into account.

Regarding the problem of tracking the movements of the users, in [29] the authors fed the recurrent neural network with the 3DoF traces from [32], tracking the pose of 50 different users watching a catalog of 10 HD 360° videos from YouTube (60 seconds long, 4K resolution, 30 FPS, FoV of 100° × 100°). Having a generative model that creates such a dataset based on statistical studies on a collection of different traces would have greatly aided the training of the neural network used in [29].
Also, finding a dataset that represents well the problem that we want to solve is usually not feasible, and this may further limit the research outcomes. As a consequence, our goal is to provide the community with a tool for the automatic generation of such traces.

A preliminary version of this work was proposed in [1], and here we extend it with the acquisition of longer and more heterogeneous traces, which now include realistic interaction with several VR applications. This extension also allowed a more detailed and thorough validation of the model. Besides making both the model and the traces public, we also propose a possible roadmap for making the framework as complete and detailed as possible, highlighting the most important contributions that would benefit researchers aiming at the design of ad hoc network protocol optimizations for this new type of traffic source.

III. VR TRAFFIC: ACQUISITION AND ANALYSIS

In this section, we describe our basic traffic modeling work. Specifically, in Sec. III-A we describe our acquisition setup and the VR applications that we acquired; then, in Sec. III-B we analyze the raw traffic traces and the different streams composing them, both in terms of content and in terms of statistics.

A. ACQUISITION SETUP

For the rendering server, we used a desktop PC equipped with an Intel Core i7 processor, 32 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti graphics card. For the headset, instead, we used an iPhone XS enclosed in a VR cardboard, which allows a realistic interaction with the applications. The two nodes were connected via Wi-Fi to improve the user's freedom of movement, at the cost of a slightly less stable channel and of possible interference from other surrounding devices. VR applications were thus run on the rendering server and streamed to the headset using the application RiftCat 2.0 (on the server) and VRidge 2.7.7 (on the phone, see riftcat.com/vridge). This setup allows the user to play VR games on the SteamVR platform for up to a maximum of 10 minutes continuously, enough to obtain traffic traces to be analyzed (note that this limit is imposed by the free version of VRidge, and is absent in the premium version). Many settings can be tuned in this application, such as the display resolution, the frame rate (either 30 or 60 FPS), the target data rate (i.e., the data rate the application will try to consistently stream to the client, which can be set from 1 to 50 Mbps), the video encoder (NVIDIA NVENC was used), and the video compression standard (H.264 was chosen), among other advanced settings.

As opposed to [1], we acquired traces while realistically interacting with available VR applications using mouse, keyboard, and head movements. Our setup only allowed us to interact with 3DoF, i.e., the user was seated and only head rotations were sensed by the headset. In any case, in order to increase the realism of the collected traces, the user was not required to limit the type of movements, but could freely interact with the application of interest. To simplify the analysis of the traffic stream, audio was not activated.

For this purpose, we selected three popular VR applications targeting different types of interactions. Specifically:

• Minecraft: an extremely popular game, with the Vivecraft plugin enabling room-scale or seated VR experiences. The user can explore the virtual environment by walking or swimming, and interact with the virtual world by cutting trees, digging holes, crafting tools, etc.
• Virus Popper: in this fast-paced educational game, many cartoony-looking viruses swarm a virtual room, and the user has to attack them with cleaning tools for survival.

• Google Earth VR – Tour: the VR version of Google Earth, allowing a user to explore the world with satellite imagery, 3D terrain of the entire globe, and 3D buildings in hundreds of cities around the world. The SteamVR application also enables tours, teleporting the user all around the world every few seconds.

• Google Earth VR – Cities: in this case, a more interactive experience is provided, allowing the user to fully explore cities or landmarks for as long as they want.

Please note that Google Earth VR was used in two different ways, thus allowing us to analyze two different versions of the same application.

To capture the streamed packets, we ran Wireshark, a popular open-source packet analyzer, on the rendering server. The traffic analysis was performed at 30 and 60 FPS, for target data rates of {10, 20, 30, 40, 50} Mbps, and for all 4 applications with a resolution of 1920 × 1080, for a total of over 70 GB of PCAP traces and 4 hours of analyzed VR traffic. Our dataset containing the processed VR traffic traces can be found within our software and can be easily reused, as later described in Sec. V.

B. TRAFFIC ANALYSIS

As described in [1], we were able to partially reverse engineer both the DL and UL streams and, thanks to the help of RiftCat's developers, we are now able to reliably process the raw traffic traces. We found that UDP sockets over IPv4 are used, and that both UL and DL streams contain several types of packets. Specifically, the UL stream contains packets such as synchronization, video frame reception information, and frequent small head-tracking information packets, whereas the DL stream contains synchronization, acknowledgment, and video frame packet bursts. To improve the stream quality, the RiftCat team developed a custom version of the ENet protocol (available at https://github.com/nxrighthere/ENet-CSharp), a relatively thin, simple, and robust network communication layer on top of UDP, which offers reliable, in-order packet delivery. In Fig. 1 we show a visual representation of a slice of bidirectional VR streaming. The plot shows the main data streams in both DL and UL, giving an idea of how this transmission works.

FIGURE 1. Portion of traffic trace from Virus Popper (50 Mbps, 30 FPS). For this trace, 130–140 individual fragments make up a video frame burst.

Most of the traffic is concentrated in DL and is made up of packet bursts encoding video frames. Video frame fragments were consistently found to be 1320 B long in all acquired traces, with a data size (the UDP payload) of 1278 B. The last packet of the burst also has the same size as the others, suggesting that padding is used in order to simplify the protocol, although this biases the frame size distribution to be discrete. The second most noticeable traffic stream is the UL head-tracking information, which the headset acquires and sends to the rendering server to update the viewport to be rendered. The head-tracking payload was identified to be either 192 B or 97 B long, sometimes changing over the course of a single traffic trace, although the reason why different packet sizes were found is unclear.
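Given these observations, the expected burst structure can be derived from the application settings alone. The following minimal sketch (our illustration in Python, not the authors' tooling) computes the expected frame size S = R/F and the resulting number of fixed-size fragments, assuming the padding behavior observed in the traces:

```python
# Minimal sketch (illustrative, not the authors' tooling): given the target
# data rate R and the frame rate F, derive the expected frame size S = R/F
# and the number of 1278 B fragments per video frame burst, assuming the
# last fragment is padded to full size as observed in the traces.
import math

FRAGMENT_PAYLOAD_B = 1278  # UDP payload of a video frame fragment (observed)

def expected_burst(rate_mbps: float, fps: int):
    frame_size_b = rate_mbps * 1e6 / 8 / fps  # expected frame size S = R/F [B]
    n_fragments = math.ceil(frame_size_b / FRAGMENT_PAYLOAD_B)
    padded_size_b = n_fragments * FRAGMENT_PAYLOAD_B  # burst size after padding
    return frame_size_b, n_fragments, padded_size_b

# Example: a 30 Mbps, 60 FPS stream yields 62.5 kB frames, i.e., 49 fragments.
print(expected_burst(30, 60))
```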
Finally, smaller packets in both UL and DL, with payloads of 21 B and 10 B, respectively, were identified to contain feedback on the reception of video frames, which is probably used in the streaming protocol to decide whether or not to retransmit some frames.

By reverse-engineering the bits composing the UDP payload of video frames, it was possible to identify a recurring set of bits suggesting a 31 B APP-layer header, allowing us to identify some key fields, such as (i) the frame sequence number, (ii) the number of fragments composing the frame, (iii) the fragment sequence number, (iv) the total frame size, and (v) a checksum. This information allowed us to reliably process and aggregate video frames.

Given the settings of the streaming application (i.e., frame rate and target data rate), it is clear that a Constant Bit Rate (CBR) video encoding is performed in the background. In Fig. 2a we show the performance of the video encoder, which almost always exceeds the target rate (though by only 5–10%). A simple explanation of this behavior might be the underestimation of header sizes in the computations of the CBR encoder, such as the header of the custom ENet protocol. Notably, both frame rates behave similarly across all four applications, with stable performance.

FIGURE 2. Results from acquired VR traffic traces.

Figs. 2b and 2c show the low overhead due to non-video DL and UL transmissions (including head tracking), respectively. Specifically, non-video DL traffic only accounts for 3–5 kbps, while UL traffic amounts to about 135–150 kbps, with 60 FPS traces consistently showing higher rates than 30 FPS ones, probably due to the doubled amount of feedback. Only two out of our forty traces show different rates, possibly due to some imperfection in the streaming. In any case, these traffic flows are much lower than the target rates and appear constant, irrespective of the data rate or the application. This consideration led us to ignore them, focusing only on modeling the DL video traffic.

Denoting by R the target data rate and by F the application frame rate, the average video frame size is expected to be close to the ideal value S = R/F, as shown in Fig. 2d. Note that the x-axis reports the measured data rate rather than the target data rate, i.e., the average data rate estimated from the acquired traces, which differs slightly from the target rate, as shown in Fig. 2a. Furthermore, Fig. 2e shows that the average Inter-Frame Inter-arrival (IFI) time perfectly matches the expected value 1/F, equal to 33.3 ms for 30 FPS traces and 16.7 ms for 60 FPS traces.

Moving to the analysis of the Probability Density Functions (PDFs), it is important to know that, in a collection of packets associated with a video source, we can usually distinguish Intra-coded frames (I-frames, sometimes called keyframes), Predictive-coded frames (P-frames), and Bi-predictive-coded frames (B-frames). While I-frames are compressed similarly to simple static pictures, P-frames exploit the temporal correlation of successive frames to reduce the compressed frame size. B-frames, instead, can exploit the information from both previous and subsequent frames, further improving the compression efficiency at the cost of non-real-time transmission. All the details associated with these compression techniques are regulated by standards like H.264 [33].

FIGURE 3. Video frame distributions for Virus Popper (30 Mbps, 60 FPS).
Interestingly, Fig. 3a shows that the frame size distribution is unimodal rather than multimodal, as would be expected considering the different compression levels of the I, P, and B frames generated by a typical H.264 encoder. As confirmed by the RiftCat team, the reason for such a smooth frame size distribution is that the encoder makes use of the H.264 Periodic Intra Refresh compression scheme, where the reference image used to predict (and compress) the frames in a GoP, rather than being the first image as in standard H.264, is instead obtained from consecutive vertical slices taken from all the frames in the GoP. This results in a reduced variance of the frame size, making the encoded video stream almost CBR.

As already mentioned, VRidge simplifies the transmission by discretizing some units. Fig. 3a shows a clear staircase Cumulative Distribution Function (CDF) for the video frame size, suggesting that video frames have been padded and that the underlying distribution is indeed discrete, with a distance between consecutive steps of 1278 B, i.e., the UDP payload of packets containing fragments of video frames. Similarly, the IFI time also appears to be discretized with millisecond precision around the mean 1/F, as seen in Fig. 3b, although some noise due to, e.g., variable rendering and encoding times, wireless channel conditions, transmission queue state, and transmission times, just to mention a few, smooths the CDF.

FIGURE 4. Video frame fit qualities for the Google Earth VR – Cities application. Fit quality is measured using the KS test (lower is better). Box plots show the median (red), 1st and 3rd quartiles (box), minimum and maximum (whiskers) of the KS test with a given distribution, while markers show the exact values for the different traces.

IV. TRAFFIC MODEL

Following the analysis of Sec. III-B, in this section we describe the proposed model for VR traffic based on the collected VR traffic traces. The analysis presented in the previous section reveals that both packet sizes and IFI times appear to be discrete in the collected data traces. However, such granularity is likely due to specific design choices of the communication protocols used by the considered applications, rather than being a native characteristic of XR services. Therefore, we believe it is more suitable to use continuous random variables to model the size of the data blocks generated by the XR application and the time between them. By doing so, we free our model from the specific constraints of this streaming application, with no loss of generality (as the discrete case can always be obtained from the continuous one), in fact making it easier to accommodate other (non-discrete) cases in our framework if needed.

A. DISTRIBUTION FITTING

Given the extremely large number of samples per trace (200–600 s at 30 or 60 FPS), common goodness-of-fit statistical tests yield poor performance due to the discretized distributions. Intuitively, while the PDFs of discrete and continuous distributions take completely different forms, the CDF of a discretized distribution is simply a staircased version of the related continuous distribution.
In this case, the goodness of fit can be tested by comparing the CDFs, for example using the Kolmogorov-Smirnov (KS) test [34], defined as

KS = sup_x |F_e(x) − F_t(x)|,     (1)

where sup_x is the supremum of the set of distances, F_e(x) is the empirical CDF of the acquired data, and F_t(x) is the CDF of the target distribution. The KS test will thus be used to score the quality of fit, where values closer to zero indicate a better parameter estimation.

To fit and evaluate the best probability distributions for our data, we used the popular SciPy library [35]. We tested 15 of the most common continuous univariate distributions available in the scipy.stats package, evaluating their performance on both frame size and IFI on our traffic traces. Note that the SciPy library performs a maximum likelihood estimation of the parameters of the distribution, including location and scale, and applies them to all continuous distributions by transforming the random variable X into (X − loc)/scale. Given the exceptional agreement between expected values and computed averages (see Figs. 2d and 2e), and considering the proposed generative model (described in Sec. IV-B), we fixed the location parameter to the expected value (i.e., R/F for the frame size and 1/F for the IFI), fitting only the scale and the remaining parameters.

A selection of distributions is shown in Fig. 4. We found that the Student's t and Logistic distributions, closely followed by the Laplace, Gaussian, and Cauchy distributions, were the best fitting ones in almost all traces, for both frame size and IFI. Fig. 5 shows how similar the fitted distributions actually are. Although the Student's t distribution performs slightly better than the Logistic one in a slight majority of the collected traces, in our case the Logistic distribution was the best choice. In fact, the third parameter of the Student's t distribution is only able to yield minuscule improvements over the Logistic distribution, which only needs two parameters. Furthermore, if custom simulators need to manually implement the desired random stream, the Student's t distribution is very hard to reproduce [36], while the Logistic distribution only requires a simple transformation. This is the case when common libraries for random number generation cannot be used, such as in our implementation described in Sec. V.

FIGURE 5. Comparison of the three best fitting distributions for Virus Popper (30 Mbps, 60 FPS). The KS test is also shown, where lower values indicate a better fit.

As a reference, we use SciPy's definition of a logistic distribution, with PDF in its standardized form given by

f(x) = e^(−x) / (1 + e^(−x))^2.     (2)

To shift or scale the distribution, the location and scale parameters are used as previously described.
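As a concrete illustration of this procedure, the sketch below (an assumed workflow based on the description above, not the authors' exact scripts) fixes the location to the expected value, fits the logistic scale by maximum likelihood, scores the fit with the KS statistic of Eq. (1), and shows the simple inverse-CDF transformation that makes logistic sampling easy to implement in custom simulators:

```python
# Assumed workflow (sketch, not the authors' exact scripts): fit a logistic
# distribution with the location fixed to the expected value, and score the
# fit with the KS statistic of Eq. (1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def fit_logistic(samples, expected_mean):
    loc, scale = stats.logistic.fit(samples, floc=expected_mean)  # fit scale only
    ks = stats.kstest(samples, "logistic", args=(loc, scale)).statistic
    return loc, scale, ks

def sample_logistic(loc, scale, n):
    # The "simple transformation" mentioned above: the logistic inverse CDF,
    # x = loc + scale * ln(u / (1 - u)), needs only uniform random numbers.
    u = rng.uniform(size=n)
    return loc + scale * np.log(u / (1.0 - u))

# Toy usage with synthetic "frame sizes" for a 30 Mbps, 60 FPS stream.
s_avg = 30e6 / 8 / 60  # expected frame size R/F, in bytes
data = sample_logistic(s_avg, 0.1 * s_avg, 10_000)
print(fit_logistic(data, s_avg))
```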
B. GENERATIVE MODEL

Having characterized and fitted the statistical distributions of the 40 acquired traces, we now define a generative model that allows a user to synthesize XR traffic at will, be it for analysis or simulation purposes. As already discussed in the previous sections, in this paper we propose a simple generative model that only attempts to capture the statistical distributions of video frame size and Inter-Frame Inter-arrivals (IFIs), leaving higher-order statistical descriptions for future work.

We define the dispersion as the ratio of the scale over the location parameter, attempting to find a common value for both frame rates, since absolute values are likely to differ by a constant factor (see Figs. 2d and 2e). While data aggregation is doable for frame sizes (as shown, for example, in Fig. 6a), the data for the IFI did not allow us to do so. As shown in Fig. 6b, in fact, the data for 30 and 60 FPS behave differently, making it impossible for us to create a single model for this parameter. This implies that our model will only be able to generalize over data rates, whereas 30 and 60 FPS are the only supported frame rates, and modeling and testing different values would require new data for the corresponding frame rate.

FIGURE 6. Generalization models for the Google Earth VR – Cities application. Individual points show the scale parameter of the Logistic model fitted on the acquired data, while the dashed red lines attempt to generalize the model to intermediate target data rates.

After carefully studying the acquired traffic traces, we propose to generalize the scale parameters for both video frame size and IFI time with a power law, namely

y = a x^b.     (3)

Furthermore, as Fig. 6b suggests, the 60 FPS IFI fits for all applications resulted in |b| < 10^(−4), suggesting a constant behavior, irrespective of the data rate. In that case, we thus assumed a constant fit (a corner case of the power law with b = 0) by computing the average value across all tested target data rates. As can clearly be observed from the collected data, the proposed model has been extracted from acquisitions between about 10 and 50 Mbps, thus using it beyond these limits is not advisable, since no data in our possession can validate the quality of the synthetic traces. We let different applications have separate models, obtaining a data set of 10 traces per application (half at 30 FPS, half at 60 FPS). The parameters for all applications can be found in Table 1, and the generative procedure is summarized in Algorithm 1.

TABLE 1. Parameters of the proposed generative model. Each VR application is characterized by five parameters: two for the frame size dispersion D_FS = α x^β, one for the 60 FPS IFI dispersion D_IFI = γ, and two for the 30 FPS IFI dispersion D_IFI = δ x^ε.

Algorithm 1 Generative Model for XR Traffic
Require: AppName, FrameRate, DataRate
 1: FsAvg = DataRate / FrameRate
 2: IfiAvg = 1 / FrameRate
 3: α, β, γ, δ, ε = GetParameters(AppName)   {see Table 1}
 4: FsDispersion = α · DataRate^β
 5: FsScale = FsDispersion · FsAvg
 6: if FrameRate == 60 then
 7:     IfiDispersion = γ
 8: else if FrameRate == 30 then
 9:     IfiDispersion = δ · DataRate^ε
10: else
11:     Error: only 30 and 60 FPS supported
12: end if
13: IfiScale = IfiDispersion · IfiAvg
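A direct Python transcription of Algorithm 1 might look as follows. Note that the parameter values below are placeholders for illustration, not the actual entries of Table 1, and that sampling uses the logistic inverse CDF as in Sec. IV-A; the clipping of negative samples is our own choice here, and the published implementation may handle the distribution tails differently.

```python
# Sketch of Algorithm 1 in Python. The per-application parameters below are
# HYPOTHETICAL placeholders, NOT the values published in Table 1.
import numpy as np

rng = np.random.default_rng(42)

# (alpha, beta, gamma, delta, epsilon) per application -- placeholders.
PARAMS = {"VirusPopper": (0.05, 0.5, 0.3, 0.6, -0.2)}

def generate_trace(app, frame_rate, data_rate_mbps, n_frames):
    alpha, beta, gamma, delta, epsilon = PARAMS[app]
    fs_avg = data_rate_mbps * 1e6 / 8 / frame_rate  # average frame size [B]
    ifi_avg = 1.0 / frame_rate                      # average inter-frame time [s]
    fs_scale = alpha * data_rate_mbps**beta * fs_avg
    if frame_rate == 60:
        ifi_scale = gamma * ifi_avg
    elif frame_rate == 30:
        ifi_scale = delta * data_rate_mbps**epsilon * ifi_avg
    else:
        raise ValueError("only 30 and 60 FPS are supported")
    # Logistic samples via the inverse CDF: x = loc + scale * ln(u / (1 - u)).
    u = rng.uniform(size=(2, n_frames))
    sizes = fs_avg + fs_scale * np.log(u[0] / (1 - u[0]))
    ifis = ifi_avg + ifi_scale * np.log(u[1] / (1 - u[1]))
    # Clip unphysical negative values (our choice, see lead-in above).
    return np.maximum(sizes, 0), np.maximum(ifis, 0)

frame_sizes, ifi_times = generate_trace("VirusPopper", 60, 30, n_frames=1000)
```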
Transmissions randomly start within the first second of simulation, avoiding that different streams start at the same time. We show results for traffic streams imported directly from the acquired traces as well as for our model. Since a single trace is available for each parameter combination (i.e., application, frame rate, data rate), for a fixed parameter combination the traffic flows will all come from the same trace, although different 60 s windows are sampled to further decouple different users. Instead, simulations running our proposed model have been repeated twice: one with the target data rate submitted to VRidge when acquiring the corresponding trace, and one with the empirical data rate measured directly from the acquired traces (the two rates differ slightly, as can be seen 3 Available in the ns-3 app store: https://apps.nsnam.org/app/bursty-app/, Release v1.0.0 129790 from Fig. 2a). This information can also be found directly in the metadata of the acquired traces, made available in CSV format together with our model. A. MODEL VALIDATION Exhaustive simulation campaigns have been run for all four applications and five data rates at both 30 and 60 FPS, each repeated 10 times to obtain solid average statistics. Confidence intervals are not shown as they are extremely tight. Additional simulation parameters are shown in Table 2. TABLE 2. List of simulation parameters. In the following section, plots will show burst-level rather than fragment-level metrics, which in case of a video stream are much more informative and bring a more realistic perspective on the quality perceived by the user. In fact, in this case we are more interested in the performance regarding full video frames rather than single packets, and thus all packets from a burst will have to be collected before the HMD will be able to process and show the frame to the user. To validate our proposed model, we simulate a scenario as similar as possible to our acquisition setup, where a rendering server transmits the VR stream to a single user. Note that the Wi-Fi connection is able to withstand hundreds of megabitsper-second, thus a single user transmitting up to 50 Mbps is largely underutilizing the channel, allowing us to obtain unbiased results with respect to the limits of the channel capacity. We simulated all 40 combinations of parameters (4 applications, 5 data rates, 2 frame rates), although we only show results for the 10 related to the Google Earth VR – Cities application in Fig. 7. VOLUME 9, 2021 M. Lecci et al.: Open Framework for Analyzing and Modeling XR Network Traffic FIGURE 7. Simulation results for a single user streaming the Google Earth VR – Cities application over a Wi-Fi link. The statistics refer to fully received frames rather than to single fragments. In Fig. 7a we show the average throughput obtained by the 3 simulation campaigns in the 10 parameter sets. Clearly, both 30 and 60 FPS runs obtain similar results, since this metric disregards the frame rate. In fact, both models targeting the nominal rate (shown with dots and circular markers) are perfectly superimposed on the main diagonal. Simulations using the original traffic traces, instead, tend to have a slightly higher throughput (solid line with cross markers), as was expected by looking at Fig. 2a. Since data rate, frame size, and, conversely, latency are correlated, we matched our model’s data rate with the empirical one, as shown by the dashed line with square markers. 
In Fig. 7a we show the average throughput obtained by the three simulation campaigns in the 10 parameter sets. Clearly, both 30 and 60 FPS runs obtain similar results, since this metric disregards the frame rate. In fact, both models targeting the nominal rate (shown with dots and circular markers) are perfectly superimposed on the main diagonal. Simulations using the original traffic traces, instead, tend to have a slightly higher throughput (solid line with cross markers), as was expected by looking at Fig. 2a. Since data rate, frame size, and, consequently, latency are correlated, we matched our model's data rate with the empirical one, as shown by the dashed line with square markers. As the flexibility of our model allows us to choose an arbitrary target rate, we can see a perfect match in the computed average throughput.

In Fig. 7b, instead, we show the average frame delay measured from the Application (APP) layer of the AP to the APP layer of the STA. Processing, encoding/decoding, and other technical delays must be added to obtain the full motion-to-photon latency, and thus the network delay should remain below 5–10 ms, as mentioned in Sec. I. The most noticeable difference with respect to the previous figure is that the two frame rates are clearly separated. This is because our reference application, described in Sec. III-B, allows us to choose a target data rate, trying to maintain a CBR transmission during the whole duration of the stream. This translates into frame sizes that depend directly on the frame rate, following the formula S = R/F described in Sec. III-B. Since the channel capacity for these simulations is kept constant, doubling the frame rate halves the frame sizes, which, in turn, halves the average video frame delay. As expected, the model using the target rate slightly underestimates the average frame delay, which depends on the real application throughput, the target rate being always slightly lower than the rate empirically computed from the traffic traces. Instead, similarly to the average throughput, setting the model to the trace's empirical rate yields an almost perfect match with the VR traces we acquired. Finally, notice that the average frame delay always remains below 3 ms, well below the bound suggested by industry experts [2], [4], [9], [14], [19].

To complete this analysis, in Fig. 7c we report the 95th percentile delay performance of our simulations. This metric is important as it gives an idea of the worst-case performance of the network. In fact, only ensuring average performance is not enough to obtain a smooth and pleasant user experience, since frequent stutters in the streamed video might easily ruin the interactivity of the application and even disorient the user. Ensuring that the 95th percentile of the delay is within acceptable bounds allows for a more fluid and overall better experience. In the case under analysis, it can easily be seen that both the models using the target and the empirical rate slightly underestimate the frame delay of the acquired traces. It is likely that the fitted Logistic distribution is not able to fully grasp the minute details of the traffic trace, making our model unable to match the real traffic.

Note that, while these results are bound to the specifications of the network under analysis (e.g., MCS, channel width, guard interval duration, fragment size, presence of RTS/CTS, Wi-Fi standard, mobility, environment), the framework that we propose is general. This suggests that it can be used to study a variety of more or less complex scenarios and network architectures with different sets of parameters, assessing how they affect the end-to-end performance. To conclude, it appears that our model is indeed able to reliably predict average statistics, while it could still be improved to better mimic slightly more advanced and specific features. These refinements will be pursued in our future work.

B. USE CASE EXAMPLE

Finally, we propose a simple example use case for our VR traffic generator. We consider a VR arena setting, where multiple users are attached directly to a single AP streaming wirelessly.
We assume that each user requests a 50 Mbps stream and observe how many STAs can be supported by an arena with an analogous setup. As expected, we notice again from Fig. 8 that our model needs to be calibrated against the empirical rate of the acquired trace to yield reliable results. In fact, from Fig. 8a we can see that the average throughput of the calibrated model perfectly matches the throughput of the traffic trace up to at least 8 users, where the network is able to support more than 400 Mbps.

FIGURE 8. Simulation results for multiple users streaming the Google Earth VR – Cities application over a Wi-Fi link. The statistics refer to fully received frames rather than to single fragments.

In Fig. 8b it is possible to see an unstable network condition when 8 users are trying to stream simultaneously. It appears that the slightly higher throughput required by the trace and by the empirical rate model, with respect to the target rate model, is enough to push the network to its limit, resulting in a sudden increase of the average frame delay, at both 30 and 60 FPS. Focusing on the 30 FPS simulations, the plot shows that up to 6 users can be supported within the 5 ms bound, while 7 users slightly exceed this limit, and finally 8 users make the network unstable and are thus pushed over the 10 ms limit for both the trace and the model using the empirical rate. It is important to notice that the more unstable the network, the worse the prediction accuracy of our model. This is probably due to the simplifications that we introduced, such as the Logistic distribution and the uncorrelated samples for both the IFI time and frame size stochastic processes. Similarly, at 60 FPS, up to 7 users can be supported, but an additional user makes the system highly unstable, with poor prediction performance from our model.

Finally, in Fig. 8c we show the results for the 95th percentile of the delay. Similarly to the average delay, this metric also shows the instability of the network for 8 users, with much worse performance. Focusing on 30 FPS, the system is able to keep the delay below the 5 ms bound only when no more than 2 users are present, whereas up to 5–6 users can be served if a 10 ms delay is still deemed acceptable. Instead, at 60 FPS up to 6 users can be served while keeping the network delay within 5 ms, while the 10 ms limit is only surpassed when the network becomes unstable with 8 users. These counterintuitive conclusions come from the fact that the application fixes a data rate, not a quality of experience. This means that doubling the frame rate results in halving the frame size, thus reducing the perceived image quality of the streamed application, which turns into an almost halved delay. Fixing a constant bit rate thus results in higher frame rates yielding lower latencies, at the cost of a lower image quality.

In general, there is good agreement between the results predicted by the calibrated model and the traffic traces, while the uncalibrated model often shows overly optimistic results. When the traffic in the network increases too much and the network becomes unstable, the three simulations diverge significantly, making our synthetic traces less reliable, although this is a corner case that might be of lesser interest.
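As a back-of-the-envelope check of this saturation point, the snippet below compares the aggregate offered load with an assumed effective Wi-Fi capacity; the capacity figure is purely illustrative, since the real limit depends on MCS, channel width, protocol overhead, and retransmissions.

```python
# Back-of-the-envelope check (illustrative): aggregate offered load for n
# users at 50 Mbps each vs. an ASSUMED effective Wi-Fi capacity. The real
# limit depends on MCS, channel width, overhead, and retransmissions.
STREAM_MBPS = 50
ASSUMED_CAPACITY_MBPS = 450  # assumption for illustration, not a measurement

for n_users in range(1, 10):
    load = n_users * STREAM_MBPS
    status = "ok" if load < ASSUMED_CAPACITY_MBPS else "saturating"
    print(f"{n_users} users -> {load} Mbps offered ({status})")
```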
VI. XR TRAFFIC MODELING ROADMAP

Starting from the model described in the previous sections, in the following we propose an end-to-end framework to evaluate network solutions tailored for XR applications. The goal is to list and detail the tasks required for the construction of such a framework, in order to encourage researchers in this field to advance the state of the art with their work, using our baseline as a valid starting point. While Sec. VI-A is devoted to highlighting our contributions, in Secs. VI-B to VI-E we set down each additional task, describing how they can lead to the optimization of network protocols.

A. EXPLOITING FIRST-ORDER STATISTICS

The model proposed in this paper, despite its basic functionality, represents a solid foundation on top of which future works can iterate to develop more sophisticated strategies. In particular, we designed an open-source, highly customizable setup (described in Sec. III) to acquire traffic traces by sniffing the packets traveling on the local network where the experiments were conducted. At this stage, packets are generated following first-order statistics, sampling the size and inter-frame interval from the distributions fitted on the collected data (see Sec. III-B). As a consequence, with this model we can emulate the creation of application frames that replicate the strategy implemented by the rendering server used in our experiments. While this model is already useful for some applications, it lends itself to several interesting extensions, which capture other important features of the statistics of XR traffic. As an example, in the rest of this section we discuss the importance of studying the correlation between different packets and of understanding how the movements of the user impact the generated traffic, as two key areas of future improvement for our model.

B. INTRODUCING TEMPORAL CORRELATION

More advanced studies can be carried out to improve the model with additional features. One important aspect to elaborate on is the correlation among subsequent frames, or even within a specific group of frames. As mentioned in Sec. III-B, when compressing a video stream, both intra-frame and inter-frame compression techniques can be exploited, and this influences not only the structure of the packets, since the type of compression greatly influences the frame size, but also the strategy to inject them into the network. It is also possible that some manufacturers use advanced coding techniques such as Periodic Intra Refresh, as explained in Sec. III-B for the streaming application used in our analysis, or more advanced standards such as H.265 [37], which use different compression techniques. In that case, the importance of temporal correlation might decrease, although further analysis should be carried out to verify this.

It should be clear, by now, that the availability of a model capable of generalizing how such frame sequences are created, independently of the technical setup, is important, and the fact that each manufacturer may use its own policy represents an additional challenge. In addition, having a model that integrates and generalizes the temporal correlation among frames would allow researchers to elaborate strategies to guarantee a certain level of latency and throughput, for example by giving different priorities and scheduling options to different types of packets.
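As one possible direction for this roadmap item (purely our illustration, not a method proposed in this paper), lag-1 temporal correlation could be added to the frame-size process while preserving the fitted logistic marginal, for example through a Gaussian-copula construction:

```python
# Illustrative only: a Gaussian-copula AR(1) construction that adds lag-1
# correlation to the frame-size process while keeping a logistic marginal.
# This is NOT the paper's model; rho and the parameters are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def correlated_frame_sizes(loc, scale, rho, n):
    z = np.empty(n)
    z[0] = rng.standard_normal()
    for t in range(1, n):  # Gaussian AR(1): corr(z[t], z[t-1]) = rho
        z[t] = rho * z[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()
    u = stats.norm.cdf(z)                     # correlated uniform marginals
    return stats.logistic.ppf(u, loc, scale)  # map to logistic marginals

# Example: 62.5 kB average frames (30 Mbps at 60 FPS), strong correlation.
sizes = correlated_frame_sizes(loc=62_500, scale=5_000, rho=0.8, n=1000)
```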
For applications with constant delay requirements and high FPS values, a solution could be to buffer (at the device side or at the rendering server) specific packets associated with keyframes, in order to improve the encoding process. This would require stable network performance and an application capable of communicating directly with the network, e.g., exploiting cross-layer solutions, to be aware of any change in the link quality that would trigger specific countermeasures or improvements, if applicable.

C. INTRODUCING HEAD TRACKING

A further improvement of the model should exploit the information on movement tracking, in particular related to the head, for all 6DoF. In this case, sniffing the packets traveling through the network might not be enough, and we thus need to gather information from different sensors (e.g., gyroscope, accelerometer, and compass) that could be integrated into the device used to interact with the virtual world. With respect to VRidge, the software that we used to make our phone act as a VR headset and our PC as a rendering server, the developers provide an API for this purpose (https://github.com/RiftCat/vridge-api). By connecting to the head-tracking endpoint, the software provides positional, rotational, or combined data, and even the possibility of modifying phone tracking data in real time before it is used for the rendering step. This is important because, by aligning the motion trace with the traffic generated by the application, it can be determined whether there is a correlation between a certain movement of the user and a corresponding drop in the reception of packets, or other network-related events. For example, knowing the direction of the physical movement of the user might help mmWave wireless systems (such as 802.11ad/ay) keep beam alignment between the AP and the user device, thus limiting the risk of abrupt connection interruptions if the line of sight is lost. It should be highlighted that this approach could benefit every communication infrastructure that can be used to deliver XR content, as user tracking data can be exploited at different layers of the protocol stack.

D. FULL TRAFFIC EMULATOR

The last step to further increase the fidelity (but also the complexity) of the traffic model is to fully characterize and emulate all the different information sub-flows and how they interact with each other. For example, as shown in Fig. 1 and explained in Sec. III-B, the VR stream comprises both DL and UL messages containing information such as video frames, head-tracking information, and feedback. A full-blown emulator would send all this information to and from the user, reacting accordingly whenever a packet is lost or corrupted, or when communication delays are present. This level of detail requires a much more in-depth analysis of the transmission protocol of a real XR application, understanding all the consequences of erratic and unexpected network behaviors. Such a precise model would be extremely useful when running large simulation campaigns, as it would give the most accurate and reliable results. However, the amount of work required to analyze and reproduce such a realistic behavior would be extremely high.

E. QoE-CENTRIC XR

As highlighted thoroughly in the previous paragraphs, the final goal of all these approaches is to guarantee high-level performance to the final user.
In particular, in the XR domain, we tend to measure performance in terms of the overall satisfaction of the customers, referred to as QoE, and, to the best of our knowledge, there is no standardized way to evaluate these metrics. In our case, besides the quality of the displayed image, the latency of the communication between the HMD and the rendering server can also make a difference (especially if the latter is in the cloud), considering that cybersickness has a huge impact on the user experience. For this reason, researchers should be encouraged to design algorithms that guarantee stable and constant performance, taking into account that the traffic in the network varies depending on the application and user activity.

Moreover, since in a common scenario we have different users, there may be a need to support different traffic categories at the same time in the same network. This requires a system able to fairly distribute resources among the flows, where learning algorithms could be implemented to orchestrate every operation, either from a network or from an application perspective. Given a certain condition of the user, or other available information, the algorithm could predict the QoE trend and act accordingly in case of an anticipated performance drop. At this point of the roadmap, the network design should focus on the user, trying to guarantee a stable experience also when Variable Bit Rate (VBR) flows are considered. In fact, in a CBR flow (much easier to handle from a network point of view), the perceived image quality can be affected in the case of a scene with a large amount of action and details. In this case, it may be difficult to fit everything at a fixed rate and, as a consequence, the user experiences a downgrade in terms of quality. This further highlights the need for novel solutions, able to tackle these problems by trading off system complexity and QoE.

VII. CONCLUSION

In this paper, we described the current state of the art regarding the telecommunication aspects needed to support high-quality XR streaming, mainly focusing on the challenges involved in obtaining faithful traffic models that the community can use to test protocols and optimize networks. We then proceeded to acquire over 4 hours of VR traffic, studied this type of traffic in detail, and proposed a model to generate synthetic traffic traces, while also making freely available to the community both our implementation and the VR dataset. We showed some results on the predictive power of our model, while also acknowledging its weak points. Furthermore, we provided an example use case where multiple users coexist in the same network, naively sharing radio resources up to the point of network collapse. Future work could study effective scheduling strategies for XR traffic streams, possibly coexisting with other applications in the same network, while also ensuring robustness in case of fluctuating channel quality. Also, the model could be tested and validated for higher FPS values, by collecting and analyzing additional traces at 90 FPS or higher. All the tasks that we deem necessary to build a complete framework for traffic generation have been listed in Sec. VI and represent possible future directions for this work.
VII. CONCLUSION
In this paper, we described the current state of the art regarding the telecommunication aspects needed to support high-quality XR streaming, mainly focusing on the challenges involved in obtaining faithful traffic models that the community can use to test protocols and optimize networks. We then acquired over 4 hours of VR traffic, studied this type of traffic in detail, and proposed a model to generate synthetic traffic traces, making both our implementation and the VR dataset freely available to the community. We also showed some results on the predictive power of our model, while acknowledging its weak points. Furthermore, we provided an example use case where multiple users coexist in the same network, naively sharing radio resources up to the point of network collapse. Further work could investigate effective scheduling strategies for XR traffic streams, possibly coexisting with other applications in the same network, while also ensuring robustness against fluctuating channel quality. The model could also be tested and validated at higher frame rates, by collecting and analyzing additional traces at 90 FPS or above. All the tasks that we deem necessary to build a complete traffic generation framework are listed in Sec. VI and represent possible future directions for this work.
With this contribution, we hope to pave the way for the research community to start working towards the optimization and support of this specific type of traffic, given the strong interest from the main standardization bodies and the most prominent telecommunication industries.

ACKNOWLEDGMENT
The authors would like to thank the RiftCat Team for patiently answering all the technical questions that could be disclosed, allowing them to improve this work. The identification of any commercial product or trade name does not imply endorsement or recommendation by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose. A preliminary version of this paper, proposing a simpler traffic model with fewer traffic traces, was presented at the ACM Workshop on ns-3 (WNS3), in June 2021 [1].

REFERENCES
[1] M. Lecci, A. Zanella, and M. Zorzi, "An ns-3 implementation of a bursty traffic framework for virtual reality sources," presented at the ACM Workshop ns-3 (WNS3), Jun. 2021, pp. 1–9, doi: 10.1145/3460797.3460807.
[2] Huawei Technologies. (2016). Empowering Consumer-Focused Immersive VR and AR Experiences With Mobile Broadband. [Online]. Available: https://www.huawei.com/en/industry-insights/outlook/mobile-broadband/insights-reports/vr-and-ar
[3] Oculus Business. (Sep. 2020). Virtual Reality—Set to Enter the Business Mainstream. [Online]. Available: https://go.facebookinc.com/securitywhitepaper.html
[4] Huawei Technologies. (2021). AR Insight and Application Practice. [Online]. Available: https://carrier.huawei.com/~/media/CNBGV2/download/bws2021/ar-insight-and-application-practice-white-paper-en.pdf
[5] PriceWaterhouse Coopers. (2019). Seeing is Believing–How Virtual Reality and Augmented Reality are Transforming Business and the Economy. [Online]. Available: https://www.pwc.com/gx/en/technology/publications/assets/how-virtual-reality-and-augmented-reality.pdf
[6] ZTE. (2019). 5G Cloud XR Application. [Online]. Available: https://www.mobile360series.com/wp-content/uploads/2019/09/zte-white-paper.pdf
[7] Qualcomm Technologies. (Nov. 2020). The Mobile Future of eXtended Reality (XR). [Online]. Available: https://www.qualcomm.com/media/documents/files/the-mobile-future-of-extended-reality-xr.pdf
[8] Ericsson. (Apr. 2020). How 5G and Edge Computing Can Enhance Virtual Reality. [Online]. Available: https://www.ericsson.com/en/blog/2020/4/how-5g-and-edge-computing-can-enhance-virtual-reality
[9] 5G Americas. (Nov. 2019). 5G Services Innovation. [Online]. Available: https://www.5gamericas.org/wp-content/uploads/2019/11/5G-Services-Innovation-FINAL-1.pdf
[10] Deloitte. (Apr. 2018). Real Learning in a Virtual World. [Online]. Available: https://www2.deloitte.com/us/en/insights/industry/technology/how-vr-training-learning-can-improve-outcomes.html
[11] F. Chiariotti, "A survey on 360-degree video: Coding, quality of experience and streaming," Comput. Commun., vol. 177, pp. 133–155, Sep. 2021.
[12] Nokia. (2020). Cloud Gaming and 5G–Realizing the Opportunity. [Online]. Available: https://onestore.nokia.com/asset/207843
[13] Huawei. (2020). Preparing for a Cloud AR/VR Future. [Online]. Available: https://www-file.huawei.com/-/media/corporate/pdf/x-lab/cloud_vr_ar_white_paper_en.pdf?la=en
[14] Extended Reality (XR) in 5G, document TR 26.928, 3GPP, Dec. 2020.
[15] L. J. Hettinger and G. E. Riccio, "Visually induced motion sickness in virtual environments," Presence, Teleoperators Virtual Environ., vol. 1, no. 3, pp. 306–310, Jan. 1992.
[16] E. L. Groen and J. E. Bos, "Simulator sickness depends on frequency of the simulator motion mismatch: An observation," Presence, Teleoperators Virtual Environ., vol. 17, no. 6, pp. 584–593, 2008.
[17] S. von Mammen, A. Knote, and S. Edenhofer, "Cyber sick but still having fun," in Proc. 22nd ACM Conf. Virtual Reality Softw. Technol., Nov. 2016, pp. 325–326.
[18] H. G. Kim, W. J. Baddar, H.-T. Lim, H. Jeong, and Y. M. Ro, "Measurement of exceptional motion in VR video contents for VR sickness assessment using deep convolutional autoencoder," in Proc. ACM Symp. Virtual Reality Softw. Technol. (VRST), Gothenburg, Sweden, Nov. 2017, pp. 1–7.
[19] Requirements for Mobile Edge Computing Enabled Content Delivery Networks, document F.743.10, ITU, Nov. 2019.
[20] Orange. (Jul. 2020). XR: 5G Extends the Boundaries of Reality. [Online]. Available: https://hellofuture.orange.com/en/xr-5g-extends-the-boundaries-of-reality/
[21] Samsung Research. (Jul. 2020). Samsung 6G Vision. [Online]. Available: https://news.samsung.com/global/samsungs-6g-white-paper-lays-out-the-companys-vision-for-the-next-generation-of-communications-technology
[22] J. N. Latta and D. J. Oberg, "A conceptual virtual reality model," IEEE Comput. Graph. Appl., vol. 14, no. 1, pp. 23–29, Jan. 1994.
[23] B. Hentschel, M. Wolter, and T. Kuhlen, "Virtual reality-based multi-view visualization of time-dependent simulation data," in Proc. IEEE Virtual Reality Conf., Mar. 2009, pp. 253–254.
[24] E. Saad, W. R. J. Funnell, P. G. Kry, and N. M. Ventura, "A virtual-reality system for interacting with three-dimensional models using a haptic device and a head-mounted display," in Proc. IEEE Life Sci. Conf. (LSC), Oct. 2018, pp. 191–194.
[25] O. Ergun, S. Akin, I. G. Dino, and E. Surer, "Architectural design in virtual reality and mixed reality environments: A comparative analysis," in Proc. IEEE Conf. Virtual Reality 3D User Interface (VR), Mar. 2019, pp. 914–915.
[26] R. Guo, J. Cui, W. Zhao, S. Li, and A. Hao, "Hand-by-hand mentor: An AR based training system for piano performance," in Proc. IEEE Conf. Virtual Reality 3D User Interface Abstr. Workshops (VRW), Mar. 2021, pp. 436–437.
[27] J. Yang, J. Luo, D. Meng, and J.-N. Hwang, "QoE-driven resource allocation optimized for uplink delivery of delay-sensitive VR video over cellular network," IEEE Access, vol. 7, pp. 60672–60683, 2019.
[28] L. Teng, G. Zhai, Y. Wu, X. Min, W. Zhang, Z. Ding, and C. Xiao, "QoE driven VR 360° video massive MIMO transmission," IEEE Trans. Wireless Commun., early access, Jul. 9, 2021, doi: 10.1109/TWC.2021.3093305.
[29] C. Perfecto, M. S. Elbamby, J. D. Ser, and M. Bennis, "Taming the latency in multi-user VR 360°: A QoE-aware deep learning-aided multicast framework," IEEE Trans. Commun., vol. 68, no. 4, pp. 2491–2508, Apr. 2020.
[30] B. Krogfoss, J. Duran, P. Perez, and J. Bouwen, "Quantifying the value of 5G and edge cloud on QoE for AR/VR," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), Athlone, Ireland, May 2020, pp. 1–4.
[31] X. Liu, Y. Huang, L. Song, R. Xie, and X. Yang, "The SJTU UHD 360-degree immersive video sequence dataset," in Proc. Int. Conf. Virtual Reality Visualizat. (ICVRV), Oct. 2017, pp. 400–401.
[32] W.-C. Lo, C.-L. Fan, J. Lee, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu, "360° video viewing dataset in head-mounted virtual reality," in Proc. 8th ACM Multimedia Syst. Conf., Taipei, Taiwan, Jun. 2017, pp. 211–216.
[33] Advanced Video Coding for Generic Audiovisual Services, document H.264, ITU-T, Aug. 2021.
[34] I. M. Chakravarti, R. G. Laha, and J. Roy, Handbook of Methods of Applied Statistics, vol. 1. Hoboken, NJ, USA: Wiley, 1967.
[35] P. Virtanen et al., "SciPy 1.0: Fundamental algorithms for scientific computing in Python," Nature Methods, vol. 17, pp. 261–272, Feb. 2020.
[36] W. T. Shaw, "Sampling Student's T distribution–use of the inverse cumulative distribution function," J. Comput. Finance, vol. 9, no. 4, pp. 37–73, 2006.
[37] High Efficiency Video Coding, document H.265, ITU-T, Aug. 2021.

MATTIA LECCI (Graduate Student Member, IEEE) received the B.Sc. degree (Hons.) in information engineering and the M.Sc. degree (Hons.) in telecommunication engineering from the University of Padua, Italy, in 2016 and 2018, respectively, where he is currently pursuing the Ph.D. degree in information engineering. He was a Guest Researcher with the National Institute of Standards and Technology (NIST), in 2018. His main research interests include channel modeling for the mmWave frequency band, MAC scheduling for WiGig technologies, applied machine learning for communications, virtual reality traffic modeling, and open-source software development.

MATTEO DRAGO (Graduate Student Member, IEEE) received the B.Sc. and M.Sc. degrees in telecommunication engineering from the University of Padua, Italy, in 2016 and 2019, respectively, where he is currently pursuing the Ph.D. degree. He visited Nokia Bell Labs, Dublin, in 2018, working on QoS provisioning in 60 GHz networks. His current work includes the study of protocols and solutions for vehicular networks operating at millimeter-wave frequencies, the modeling and optimization of network solutions to support next-generation cloud applications, and open-source software development.

ANDREA ZANELLA (Senior Member, IEEE) received the Laurea degree in computer engineering from the University of Padua, Italy, in 1998, and the Ph.D. degree, in 2001. In 2000, he spent nine months with Prof. Mario Gerla's Research Team at the University of California, Los Angeles (UCLA). He is currently a Full Professor with the Department of Information Engineering (DEI), University of Padua. He is one of the coordinators of the SIGnals and NETworking (SIGNET) Research Laboratory. His research interests include the fields of protocol design, optimization, and performance evaluation of wired and wireless networks. He has been serving as a Technical Area Editor for the IEEE INTERNET OF THINGS JOURNAL and an Associate Editor for the IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, and Digital Communications and Networks.

MICHELE ZORZI (Fellow, IEEE) received the Laurea and Ph.D. degrees in electrical engineering from the University of Padua, Italy, in 1990 and 1994, respectively. From 1992 to 1993, he was on leave at the University of California at San Diego (UCSD). In 1993, he joined the Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy. After spending three years with the Center for Wireless Communications, UCSD, in 1998, he joined the School of Engineering, University of Ferrara, Italy, where he became a Professor, in 2000. Since November 2003, he has been a Faculty Member with the Department of Information Engineering, University of Padua.
His current research interests include performance evaluation in mobile communications systems, the Internet of Things, cognitive communications and networking, 5G mmWave cellular systems, vehicular networks, and underwater communications and networks. Dr. Zorzi has served the IEEE Communications Society as a Member-at-Large of the Board of Governors, from 2009 to 2011 and from 2021 to 2023, as the Director of Education, from 2014 to 2015, and as the Director of Journals, from 2020 to 2021. He received several awards from the IEEE Communications Society, including the Best Tutorial Paper Award, in 2008 and 2019, the Education Award, in 2016, the Stephen O. Rice Best Paper Award, in 2018, and the Joseph LoCicero Award for exemplary service to publications, in 2020. He was the Editor-in-Chief of the IEEE Wireless Communications Magazine, from 2003 to 2005, the IEEE TRANSACTIONS ON COMMUNICATIONS, from 2008 to 2011, and the IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, from 2014 to 2018.