WO2018026963A1 - Head-trackable spatial audio for headphones and system and method for head-trackable spatial audio for headphones - Google Patents
Head-trackable spatial audio for headphones and system and method for head-trackable spatial audio for headphones
- Publication number
- WO2018026963A1 (PCT/US2017/045176)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- channel
- value
- tracking
- channel audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
- H04S7/304—For headphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
Abstract
Systems and methods to create and stream head-trackable spatial audio for 360 video and virtual reality applications for playback over headphones. A plurality of HRTF filters, a plurality of convolution engines for instantiating said HRTF filters, an audio bussing matrix for routing one or more audio signals to the convolution engines, a post convolution engine rendered multi-channel audio output, a multi-channel audio stream, a multi-channel decoder post audio stream, and a post audio decode multi-channel audio head-tracking renderer comprise components of the system. A plurality of audio outputs are summed into multiple multi-perspective stereo audio files which are bitstream encoded and streamed to an end user device. At the end user device, the audio bitstream is decoded and the head-tracking renderer renders the audio files for head-tracking sound reproduction to the user based on positional data provided by the end user device or headset.
Description
HEAD-TRACKABLE SPATIAL AUDIO FOR HEADPHONES AND SYSTEM AND METHOD FOR HEAD-TRACKABLE SPATIAL AUDIO FOR HEADPHONES
Field of the Invention
The present invention relates generally to creating and streaming head-trackable spatial audio for headphones.
Background
Currently there are very few solutions for
streaming spatial audio for 360 video and virtual reality applications. Some of these solutions attempt to create and stream a stereo spatial representation of a target environment, but these solutions fail to meet the needs of the industry because they are unable to create multiple perspectives that can be head-tracked by the end user. Other solutions attempt to create and stream a multi-channel audio stream to be spatialized at the other end of the stream, but these solutions are similarly unable to meet the needs of the industry because of audio file synchronization issues at the decode stage. Still other solutions attempt to create and stream a multi-channel audio stream to be spatialized at the other end of the
stream, but these solutions also fail to meet industry needs because of the high mobile application
processing power required to process the audio
spatially at the end of the stream. It would be
desirable to have a spatial audio creation
architecture with associated software for a simplified means to both create and stream spatial audio, and to also render that spatial audio after the stream to provide the correct creator-intended spatial
perspective to a user based on a user's head position. Furthermore, it would also be desirable to have a system and software that combines all of the steps in the spatial audio creation phase including the
preparation for streaming that content. Still further, it would be desirable to have a system and software that allows for the post stream audio decode and rendering of that audio for head-tracking based on the user's head position. Therefore, there currently exists a need in the industry for a system that provides a complete solution for both the creation of multi-perspective head-trackable spatial audio, and the delivery, decode, and rendering of that audio based on an end user's positional data.
Summary
In accordance with the disclosure, the present invention advantageously fills the aforementioned deficiencies by providing a device and method for creating and streaming head-trackable spatial audio for headphones. The present invention includes a software system together with an associated computer process. The system is made up of the following components: an audio bussing matrix, convolution engines, HRTF (head-related transfer function) filters, an audio file output summing matrix, a multi-channel audio file encoder for streaming, a multi-channel audio file decoder, and a multi-channel audio file head-tracking renderer. These components are connected as follows: the audio bussing matrix is connected to the convolution engines, the HRTF filters are connected to the convolution engines, the convolution engines are connected to the audio file output summing matrix, and the audio file output summing matrix is connected to the multi-channel audio file encoder for streaming. Multi-channel audio is streamed by itself, or along with video content, to the decoder. The multi-channel audio file decoder is connected to the multi-channel audio file head-tracking renderer. The associated computer process is made up of the following executable steps: audio content is sent to a bussing matrix that delivers said content to a block of convolution engines that convolve the audio based on a set of HRTF filters loaded into the engines. The outputs of the convolution engines are sent through an audio output summing matrix which delivers multiple multi-perspective stereo audio files, which are then encoded into a multi-channel audio stream. That multi-channel audio stream is then fed (sometimes interleaved with a matching video component) into a streaming server and streamed over a network. An app or a browser on a computer or mobile electronics device then receives the broadcast stream, at which point the multi-channel audio stream is decoded and rendered for head-tracking based on positional data provided by the computer or mobile electronics device, or a 360 video or virtual reality headset, to represent the user's dynamic head position, which can be static, continuously moving, or any combination of the two.
The present invention system may also have one or more of the following optional software components on the creation side for more accurate pre-monitoring of the created content before it is streamed over a network to the end user: an external 360 video, equirectangular, or multi-view video player which synchronizes and connects to the convolution engines' multi-perspective output bus summing matrix, and allows for real-time head-tracking on the content creation side from user-directed positional input to the video player via manual mouse-style navigation, or from a remote virtual, augmented, or mixed reality headset or mobile electronics device that provides azimuth and/or elevation data based on sensors in or attached to the device. Positional data may also be referred to as X, Y, Z data, or Yaw, Pitch, and Roll.
The present invention's software system is unique when compared with other known systems and solutions in that it provides a highly efficient means to process high quality multi-perspective spatial audio on the content creation side that can be streamed over a network and rendered for head-tracking on the other end of a network stream using a highly efficient rendering engine requiring very little CPU usage.
The present invention is unique in that the overall architecture of the system is different from other known systems. More specifically, the present invention system is unique due to the presence of: (1) content creation side multi-perspective spatialization; (2) highly efficient user-side rendering for head-tracking; and (3) no need for HRTF rendering in the end-of-stream user application.
Among other things, it is an advantage of the present invention to provide software for creating and streaming head-trackable spatial audio for headphones that does not suffer from any of the problems or deficiencies associated with prior solutions.
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, which are intended to be read in conjunction with both this summary, the detailed description and any preferred and/or particular embodiments
specifically discussed or otherwise disclosed. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these
embodiments are provided by way of illustration only and so that this disclosure will be thorough, complete and will fully convey the full scope of the invention to those skilled in the art.
Brief Description of the Drawings
Fig. 1 is an overview of the spatial audio system for the content creation process; and
Fig. 2 is an overview of the streaming process and end user decoder and spatial audio rendering system.
Detailed Description
The present invention is directed to a software and/or hardware system and architecture for creating and streaming head-trackable spatial audio for headphones.
Fig. 1 is an overview of the spatial audio system for the content creation process, starting with any number of (in this case four) original unprocessed, unaltered, non-spatial audio sources 3, which may have original left, right, front, and rear sound data. These are routed to the convolution engine routing bus 4 and are then distributed into the left front engine 6, the right front engine 7, the left rear engine 8, and the right rear engine 9 of the front perspective convolution engine block 5. The original audio sources are also distributed from the convolution engine routing bus 4 into the right front engine 13, the right rear engine 15, the left front engine 12, and the left rear engine 14 of the left perspective convolution engine block 11. The original audio sources are also distributed from the convolution engine routing bus 4 into the right rear engine 21, the left rear engine 20, the right front engine 19, and the left front engine 18 of the rear perspective convolution engine block 17. The original audio sources are also distributed from the convolution engine routing bus 4 into the left rear engine 26, the left front engine 24, the right rear engine 27, and the right front engine 25 of the right perspective convolution engine block 23. The stereo audio summing bus outputs 10, 16, 22, and 28 of convolution engine blocks 5, 11, 17, and 23 are then routed into and merged into a multi-channel audio output 29.
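By way of illustration, the Fig. 1 signal flow can be sketched in a few lines of Python. This is a minimal sketch, assuming NumPy/SciPy, equal-length sources, and equal-length filters; the function name, dictionary keys, and channel ordering are assumptions for illustration, not details specified in the patent.

```python
import numpy as np
from scipy.signal import fftconvolve

PERSPECTIVES = ["front", "left", "rear", "right"]   # blocks 5, 11, 17, 23
POSITIONS = ["LF", "RF", "LR", "RR"]                # engines within each block

def render_multichannel(sources, hrtfs):
    """sources: dict position -> mono signal (1-D arrays of equal length).
    hrtfs: dict (perspective, position) -> stereo impulse response,
    shape (taps, 2), all of equal length.
    Returns the merged multi-channel output 29 as a (samples, 8) array:
    the four stereo perspective mixes laid out as consecutive pairs."""
    mixes = []
    for persp in PERSPECTIVES:
        bus = None                                  # stereo summing bus (10/16/22/28)
        for pos in POSITIONS:
            ir = hrtfs[(persp, pos)]
            # Convolve the source with the left- and right-ear filters.
            out = np.stack([fftconvolve(sources[pos], ir[:, ch])
                            for ch in (0, 1)], axis=-1)
            bus = out if bus is None else bus + out
        mixes.append(bus)
    return np.concatenate(mixes, axis=1)            # multi-channel output 29
```

Each perspective block applies its own set of four binaural filters to the same four sources, so a later head rotation reduces to choosing or blending among the four pre-rendered stereo pairs.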
Fig. 2 is an overview of the streaming process and end user decoder and spatial audio rendering software, where the multi-channel audio output 29 is sent into a streaming encoder 30 and streamed over a network 31, received by an end user application 32 where the audio stream is decoded by the multi-channel audio decoder 33 and then sent into a head-tracking renderer 34 which is controlled by an end user viewing device or headset that provides xyz positional data 35, so that the appropriate audio perspective and sound field can be outputted from the head-tracking renderer 34 to the user's headphones 36.
The number of unprocessed source audio channels and the number of audio channels in the spatialized multichannel audio stream disclosed in these drawings are an example of a use case scenario. This could be as few as 4 channels of source audio to create each perspective, with an 8-channel multichannel spatialized output as the accompanying drawings illustrate, or as many or as few source channels and/or rendered multichannel audio perspectives as the bandwidth of the computer, mobile device, network, or the like permits. For example, the rendered, spatialized, multichannel audio output could contain 10, 12, 16, 24, 32, or any number of channels, delivering any number of perspectives, or perspective derivatives, including height perspectives and perspectives that are below the listener. Unprocessed source audio is not limited to being routed to a left front, right front, left rear, and right rear position as illustrated in the drawings, but could also be routed to any number of possibilities, including HRTF engines designated to reproduce height information or spatial information below a listener, or 5.1 (six-channel surround sound), 7.1 (eight-channel surround sound), 11.1 (an extension of the 5.1 surround sound format incorporating height and overhead channels to allow for placement and panning of sound on the horizontal and vertical axes), 12.1, or other surround sound configurations, and also spherical or Cartesian spatial layouts. Band-limited Low-Frequency Effects (LFE) channels can also be incorporated.
In a most complete version, the software system of the present invention is made up of the following components: a plurality of HRTF filters, a plurality of convolution engines for instantiating said HRTF filters, an audio bussing matrix for routing audio signals to the convolution engines, a post convolution engine rendered multi-channel audio output summing bus, a multi-channel audio stream, a multi-channel decoder post audio stream, and a post audio decode multi-channel audio head-tracking renderer. These components are combined to create an architecture for the system that has the following characteristics: typically, audio content creation would take place inside of a DAW (Digital Audio Workstation), and this invention would allow for additional workflow options to be added to a DAW to create multi-perspective, streamable, and head-trackable spatial audio mixes. The software components of this invention would fit both inside of a DAW, other stand-alone, or virtual (cloud) application, and also inside of a secondary remote application (a user application that receives the created content stream). The following components from this invention would fit inside the architecture of a DAW: an input bussing matrix, convolution engines, HRTF filters, an output summing bus matrix, and a multi-channel audio output. Those components are connected as follows: the input bussing matrix routes audio into blocks of convolution engines, each block representing a different head perspective, which spatialize and process the audio via the HRTF filters; the processed audio is then fed into an output bus matrix which sums the multiple convolution engine outputs from each convolution block into one or more multi-perspective multi-channel spatial audio outputs. The final multi-channel spatial audio output or outputs are then sent out of the DAW into a system where they are either directly streamed or combined with video and then streamed over a network. The second group of components in this invention live on the other side of the network stream, in a remote user-based application that can live on a computer, mobile phone, or any other mobile or non-mobile electronics device. With respect to the user-side network stream receiving application, the remaining components of this invention are listed and connected as follows: a multi-channel audio decoder separates and decodes multi-channel audio from a network stream and then sends it to a multi-channel head-tracking rendering component that renders the multi-channel audio based on user-inputted head position azimuth and/or elevation data provided by sensors in the host device, or a virtual reality headset, or manually via a mouse or touch-screen navigation input. While azimuth is employed in the illustrated example, elevation and roll data (yaw, pitch, and roll) or (X, Y, Z) data can be employed alone or in combination.
With reference again to Fig. 2, blocks 32, 33, 34, and 35 can exist in the mobile phone, with the positional data being provided by the phone's sensors and headphones 36 connected to the mobile phone; or blocks 32, 33, and 34 can exist in the computer or mobile phone/remote device, and the positional data from block 35 can be provided separately (in connection with headphones 36 in the case of a VR headset, or separately from another positional sensor or joystick/other position control device).
By way of a practical explanation, the use of one embodiment of the device to achieve HRTF filter application via fast convolution follows. In a home theatre entertainment room with a 7.1 surround sound setup, when a person watches a movie that plays back in 7.1 surround sound, the person hears sound all around them: in front, to the side, and also behind them. In order to simulate this effect over headphones, we need to first model the 7.1 speaker setup in that room. This is done by using a dummy head that simulates the shape and function of the human head and the way that people hear sound. The ears on a dummy head contain omni-directional microphones that simulate the eardrums of a person. We can then capture a binaural impulse response from each of the speaker positions by using the dummy head to record the impulse responses. After capturing the impulse responses of all of the speakers in the surround setup, we would have HRTF filters that simulate surround sound over headphones.
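The patent does not prescribe how these impulse responses are measured. One common capture approach, assumed here purely for illustration, is to play a known excitation from each speaker, record it at each ear microphone of the dummy head, and recover the impulse response by regularized frequency-domain deconvolution:

```python
import numpy as np

def estimate_ir(recorded, excitation, eps=1e-10):
    """Estimate an impulse response given recorded ~ excitation (convolved
    with) ir, solved in the frequency domain. eps is a small regularizer
    that avoids dividing by near-zero spectral bins."""
    n = len(recorded) + len(excitation) - 1
    spectrum = np.fft.rfft(recorded, n) / (np.fft.rfft(excitation, n) + eps)
    return np.fft.irfft(spectrum, n)
```

Running this once per ear for each speaker position yields the stereo (two-ear) filter for that position.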
In the case of this invention, we are creating a production workflow to allow for the creation of spatial audio (simulated headphone surround) in multiple perspectives. This can be done by creating multiple sets of the surround sound HRTF room model sets, and then creating different input routings to each set to represent a different position for each set. For example, let's say that we want to represent a front orientation, a left orientation, a rear orientation, and a right orientation. Imagine a person sitting in a surround sound home theatre listening while looking at the TV screen; the person then turns their head 90 degrees to the right and hears the dialog coming from the left because it is hitting the left ear first. Or if the person turns to look at the back wall, they hear the dialog (center and front channels) coming from behind them (because the TV is now behind them).
We can also simulate this same experience in real time over headphones by using HRTF room models and creating multiple bus routing scenarios. For instance, if we create 4 sets of the above-mentioned 7.1 HRTF surround sound room model, and then create 4 sets of 7.1 busses, each bus set representing a different position (front, left, rear, and right), we can then mix audio in one perspective but output 4 perspectives at the same time. In the simplest terms, take the center channel audio for a front perspective: through a multiple 7.1 bus matrix, that center channel is also fed to a separate set of 7.1 HRTF room models, but instead of always being routed to the actual center channel, it is routed to the left side channel to represent a simulation of the center channel arriving at the person's left ear first when they are looking 90 degrees to the right. Each set of 7.1 HRTF room models can be summed to stereo and retain all of the filters' inherent spatial attributes. In this example we are talking about 4 sets of 7.1 HRTF filters, each representing a different perspective (front, left, rear, and right), so at the end stage we could sum each 7.1 HRTF set into stereo, giving us 4 stereo audio outputs. Relating to this invention, we can then interleave these 4 stereo audio outputs into a single output or file, stream that interleaved output or file over a network, decode that on the other side of the network in a receiving application, and then render the appropriate stereo file perspective to be played in full, or in any mixed ratio combined with the other stereo audio perspectives, to create a real-time, dynamic, head-trackable spatial audio experience over headphones based on user-generated head positional data from the user's device sensors, a virtual reality headset, or any other device or input form that allows for the expression of azimuth and/or elevation information.
As one example of how to combine the 4 audio perspectives, the following computation can be used:
Where m is the magnitude of the sum output, f is the magnitude of the front perspective, r is the magnitude of the right perspective, l is the magnitude of the left perspective, b is the magnitude of the rear perspective, and a is the azimuth:

m = f·cos(a) + r·sin(a),    for 0 ≤ a ≤ π/2
m = −b·cos(a) + r·sin(a),   for π/2 ≤ a ≤ π
m = f·cos(a) − l·sin(a),    for −π/2 ≤ a ≤ 0
m = −b·cos(a) − l·sin(a),   for −π ≤ a ≤ −π/2
The azimuth is suitably provided from the
gyroscope of the VR player device, which may be a phone or VR headset or the like.
Which perspective is front and which is right is somewhat arbitrary. The above formula details the process for combining any two signals that are adjacent: front and right, right and rear, rear and left, and left and front. The variables r and b would then stand for the magnitude of one of the two perspectives in any one of those combinations. Signals that are on opposite sides from each other, for example front and rear, are not combined.
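A direct implementation of the piecewise combination, as reconstructed above, might look as follows; this is a sketch, with the gains applied per ear to the four decoded stereo perspective signals:

```python
import numpy as np

def combine_perspectives(f, r, l, b, a):
    """f, r, l, b: stereo perspective signals, shape (samples, 2).
    a: azimuth in radians, -pi <= a <= pi (0 = front, positive to the right)."""
    if 0 <= a <= np.pi / 2:
        return f * np.cos(a) + r * np.sin(a)
    if a > np.pi / 2:                        # pi/2 < a <= pi
        return -b * np.cos(a) + r * np.sin(a)
    if a >= -np.pi / 2:                      # -pi/2 <= a < 0
        return f * np.cos(a) - l * np.sin(a)
    return -b * np.cos(a) - l * np.sin(a)    # -pi <= a < -pi/2
```

At each boundary the mix collapses to a single perspective: a = 0 yields the front mix alone, and a = π/2 yields the right mix alone.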
Alternatively, a linear crossfade can be used to combine the audio perspectives.
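A hypothetical equal-gain version of that crossfade between two adjacent perspectives (the patent names the technique but not its coefficients):

```python
import numpy as np

def crossfade(near, far, a, span=np.pi / 2):
    """Equal-gain linear crossfade between two adjacent perspectives as the
    azimuth a sweeps from 0 to `span` radians."""
    g = float(np.clip(a / span, 0.0, 1.0))
    return (1.0 - g) * near + g * far
```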
Other perspectives can be incorporated, including those above and below the user, and a corresponding combination can be provided to crossfade between above and below positions if they are employed.
While the example shown uses 4 channel points (front left, front right, rear left, and rear right surround sound routing), more channel points can be employed depending on the desired effect, up to and including spherical representations (e.g., 64 channel points on a sphere, or fewer, or considerably more). Also, in the 4-channel-point case, the arrangement can be configured in a way other than front/rear/left/right, for example in a plus-sign configuration.
In accordance with the disclosure, a system and method for providing head-trackable spatial audio for use with headphones is provided, where any set of headphones can be employed to give the user an audio experience in which the audio tracks the user's head movement, unlike the prior art where audio spatialization is lost when using headphones.
While the present invention has been described above in terms of specific embodiments, it is to be understood that the invention is not limited to these disclosed embodiments. Many modifications and other embodiments of the invention will come to mind of those skilled in the art to which this invention pertains, and which are intended to be and are covered by both this disclosure and the appended claims. It is indeed intended that the scope of the invention should be determined by proper interpretation and
construction of the appended claims and their legal equivalents, as understood by those of skill in the art relying upon the disclosure in this specification and the attached drawings.
Claims
1. A method for streaming spatially adapted audio comprising:
providing multi-channel audio in a stream;
decoding the multi-channel audio stream and providing it to a headphone with spatial recombination based on an azimuth value.
2. The method according to claim 1, wherein the azimuth value is based on tracking of a position of a head of a person wearing the headphone.
3. The method according to claim 1, wherein the azimuth value is based on a positional control.
4. The method according to claim 1, wherein the azimuth value is based on positional data from a mobile device.
5. The method according to claim 1, wherein the multi-channel audio relates to multi-perspective sound values.
6. The method according to claim 5, wherein the multi-perspectives comprise left front, right front, left rear and right rear.
7. The method according to claim 5, wherein the multi-perspectives comprise 5.1 channel based surround sound positions.
8. The method according to claim 5, wherein the
multi-perspectives comprise 7.1 channel based surround sound positions.
9. The method according to claim 5, wherein the multi-perspectives comprise left front, center, right front, left side, right side, left rear, right rear, left front height, right front height, left rear height, right rear height, and LFE channel (11.1).
10. The method according to claim 5, wherein the multi-perspectives comprise a spherical channel based layout.
11. The method according to claim 1, wherein spatial recombination is based on the azimuth and an elevation value (Yaw and Pitch).
12. The method according to claim 1, wherein spatial recombination is based on the azimuth, an elevation, and a roll value (Yaw, Pitch, and Roll).
13. A method for streaming spatially adapted audio comprising:
providing multi-perspective audio inputs to plural convolution engines to apply head-related transfer function audio filters thereto;
combining the output of the plural convolution engines to provide a multi-channel audio signal;
streaming an encoded version of the multi-channel audio signal;
receiving the encoded version of the streamed multi-channel audio signal;
decoding the multi-channel audio signal to provide plural audio output signals; and
rendering the multi-channel audio signals based on a position tracking value.
14. The method according to claim 13, wherein the multi-channel audio signals are rendered to a
headphone and the position tracking value is based on a head position of a wearer of the headphone.
15. The method according to claim 13, wherein the multi-channel audio relates to multi-perspective sound values.
16. The method according to claim 15, wherein the multi-perspectives comprise left front, right front, left rear and right rear.
17. The method according to claim 15, wherein the multi-perspectives comprise 5.1 channel based surround sound positions.
18. The method according to claim 15, wherein the multi-perspectives comprise 7.1 channel based surround sound positions.
19. The method according to claim 15, wherein the multi-perspectives comprise left front, center, right front, left side, right side, left rear, right rear, left front height, right front height, left rear height, right rear height, and LFE channel (11.1).
20. The method according to claim 15, wherein the multi-perspectives comprise a spherical channel based layout.
21. The method according to claim 15, wherein the position tracking value is based on the azimuth of a user.
22. The method according to claim 15, wherein the position tracking value is based on the azimuth and an elevation value (Yaw and Pitch) of a user.
23. The method according to claim 15, wherein spatial recombination is based on the azimuth, an elevation, and a roll value (Yaw, Pitch, and Roll) of a user.
24. A system for streaming spatially adapted audio comprising:
a plurality of head-related transfer function filters;
a plurality of convolution engines for
instantiating said head-related transfer function filters;
an audio bussing matrix for routing audio signals to the convolution engines;
a post convolution engine rendered multi-channel audio output summing bus;
a multi-channel audio stream;
a multi-channel decoder post audio stream; and a post audio decode multi-channel audio position-tracking renderer.
25. The system according to claim 24, wherein the multi-channel audio signals are rendered to a headphone and the position-tracking renderer is based on tracking of a position of a head of a person wearing a headphone.
26. The system according to claim 24, wherein the multi-channel audio signals are rendered to a headphone and the position-tracking renderer is based on input from a positional control.
27. The system according to claim 24, wherein the multi-channel audio signals are rendered to a headphone and the position-tracking renderer is based on positional data from a mobile device.
28. The system according to claim 24, wherein the position-tracking renderer is based on the azimuth of a user.
29. The system according to claim 24, wherein the position-tracking renderer is based on the azimuth and an elevation value (Yaw and Pitch) of a user.
30. The system according to claim 24, wherein the position-tracking renderer is based on the azimuth, an elevation, and a roll value (Yaw, Pitch, and Roll) of a user.
31. The system according to claim 24, wherein the multi-channel audio relates to multi-perspective sound values.
32. The system according to claim 31, wherein the multi-perspectives comprise left front, right front, left rear and right rear.
33. The system according to claim 31, wherein the multi-perspectives comprise 5.1 channel based surround sound positions.
34. The system according to claim 31, wherein the multi-perspectives comprise 7.1 channel based surround sound positions.
35. The system according to claim 31, wherein the multi-perspectives comprise left front, center, right front, left side, right side, left rear, right rear, left front height, right front height, left rear height, right rear height, and LFE channel (11.1).
36. The system according to claim 31, wherein the multi-perspectives comprise a spherical channel based layout.
37. Apparatus for streaming spatially adapted audio comprising:
multi-channel audio in a stream;
a decoder receiving the multi-channel audio stream, said decoder providing the multi-channel audio to a headphone with spatial recombination based on an azimuth value.
38. The apparatus according to claim 37, further comprising a tracker monitoring a position of a head of a person wearing the headphone to provide the azimuth value.
39. The apparatus according to claim 37, further comprising a positional control to provide the azimuth value.
40. The apparatus according to claim 37, further comprising a mobile device to provide the azimuth value.
41. The apparatus according to claim 37, wherein the multi-channel audio relates to multi-perspective sound values.
42. The apparatus according to claim 41, wherein the multi-perspective sound values comprise left front, right front, left rear and right rear sound values.
43. The apparatus according to claim 41, wherein the multi-perspectives comprise 5.1 channel based surround sound positions.
44. The apparatus according to claim 41, wherein the multi-perspectives comprise 7.1 channel based surround sound positions.
45. The apparatus according to claim 41, wherein the multi-perspectives comprise left front, center, right front, left side, right side, left rear, right rear, left front height, right front height, left rear height, right rear height, and LFE channel (11.1).
46. The apparatus according to claim 41, wherein the multi-perspectives comprise a spherical channel based layout.
47. The apparatus according to claim 37, wherein the spatial recombination is further based on an elevation value (Yaw and Pitch) of a user.
48. The apparatus according to claim 37, wherein the spatial recombination is further based on an elevation and a roll value (Yaw, Pitch, and Roll) of a user.
49. Apparatus for streaming spatially adapted audio comprising:
plural convolution engines receiving multi- perspective audio inputs to apply head-related
transfer function audio filters to the audio inputs; a combiner combining the output of the plural convolution engines to provide a multi-channel audio signal;
a streaming system for streaming an encoded version of the multi-channel audio signal;
a receiver receiving the encoded version of the streamed multi-channel audio signal;
a decoder decoding the multi-channel audio signal to provide plural audio output signals; and
a renderer for rendering the multi-channel audio signals based on a position tracking value.
50. The apparatus according to claim 49, wherein the multi-channel audio signals are rendered to a headphone and the position tracking value is based on a head position of a wearer of the headphone.
51. The apparatus according to claim 49, wherein the multi-channel audio signals are rendered to a headphone and the position tracking value for the position-tracking renderer is based on input from a positional control.
52. The apparatus according to claim 49, wherein the multi-channel audio signals are rendered to a headphone and the position tracking value for the position-tracking renderer is based on positional data from a mobile device.
53. The apparatus according to claim 49, wherein the position data is based on an azimuth value of a user.
54. The apparatus according to claim 49, wherein the position data is based on an azimuth and an elevation value (Yaw and Pitch) of a user.
55. The apparatus according to claim 49, wherein the position data is based on an azimuth, an elevation, and a roll value (Yaw, Pitch, and Roll) of a user.
56. The apparatus according to claim 49, wherein the multi-channel audio relates to multi-perspective sound values.
57. The apparatus according to claim 56, wherein the multi-perspective sound values comprise left front, right front, left rear and right rear sound values.
58. The apparatus according to claim 56, wherein the multi-perspectives comprise 5.1 channel based surround sound positions.
59. The apparatus according to claim 56, wherein the multi-perspectives comprise 7.1 channel based surround sound positions.
60. The apparatus according to claim 56, wherein
the multi-perspectives comprise left front, center, right front, left side, right side, left rear, right rear, left front height, right front height, left rear height, right rear height, and LFE channel (11.1).
61. The apparatus according to claim 56, wherein the multi-perspectives comprise a spherical channel based layout.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201662370358P | 2016-08-03 | 2016-08-03 | |
| US62/370,358 | 2016-08-03 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018026963A1 (en) | 2018-02-08 |
Family
ID=61074012
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2017/045176 Ceased WO2018026963A1 (en) | 2016-08-03 | 2017-08-02 | Head-trackable spatial audio for headphones and system and method for head-trackable spatial audio for headphones |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018026963A1 (en) |
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6021206A (en) * | 1996-10-02 | 2000-02-01 | Lake Dsp Pty Ltd | Methods and apparatus for processing spatialised audio |
| US20080187156A1 (en) * | 2006-09-22 | 2008-08-07 | Sony Corporation | Sound reproducing system and sound reproducing method |
| WO2009056956A1 (en) * | 2007-11-01 | 2009-05-07 | Nokia Corporation | Focusing on a portion of an audio scene for an audio signal |
| US20120057710A1 (en) * | 2008-08-13 | 2012-03-08 | Sascha Disch | Apparatus for determining a spatial output multi-channel audio signal |
| US20130329922A1 (en) * | 2012-05-31 | 2013-12-12 | Dts Llc | Object-based audio system using vector base amplitude panning |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10523171B2 (en) | 2018-02-06 | 2019-12-31 | Sony Interactive Entertainment Inc. | Method for dynamic sound equalization |
| US10652686B2 (en) | 2018-02-06 | 2020-05-12 | Sony Interactive Entertainment Inc. | Method of improving localization of surround sound |
| WO2020014506A1 (en) * | 2018-07-12 | 2020-01-16 | Sony Interactive Entertainment Inc. | Method for acoustically rendering the size of a sound source |
| US10887717B2 (en) | 2018-07-12 | 2021-01-05 | Sony Interactive Entertainment Inc. | Method for acoustically rendering the size of sound a source |
| US11388540B2 (en) | 2018-07-12 | 2022-07-12 | Sony Interactive Entertainment Inc. | Method for acoustically rendering the size of a sound source |
| US11304021B2 (en) | 2018-11-29 | 2022-04-12 | Sony Interactive Entertainment Inc. | Deferred audio rendering |
| CN109637550A (en) * | 2018-12-27 | 2019-04-16 | 中国科学院声学研究所 | A kind of sound source elevation angle control method and system |
| CN109637550B (en) * | 2018-12-27 | 2020-11-24 | 中国科学院声学研究所 | A sound source height angle control method and system |
| US11546715B2 (en) | 2021-05-04 | 2023-01-03 | Google Llc | Systems and methods for generating video-adapted surround-sound |
| US12108235B2 (en) | 2021-11-18 | 2024-10-01 | Surround Sync Pty Ltd | Virtual reality headset audio synchronization system |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US10674262B2 (en) | Merging audio signals with spatial metadata | |
| WO2018026963A1 (en) | Head-trackable spatial audio for headphones and system and method for head-trackable spatial audio for headphones | |
| US11122384B2 (en) | Devices and methods for binaural spatial processing and projection of audio signals | |
| US10251012B2 (en) | System and method for realistic rotation of stereo or binaural audio | |
| TWI517028B (en) | Audio spatialization and environment simulation | |
| JP6820613B2 (en) | Signal synthesis for immersive audio playback | |
| CN116471520A (en) | Audio device and audio processing method | |
| KR20170106063A (en) | A method and an apparatus for processing an audio signal | |
| US11032660B2 (en) | System and method for realistic rotation of stereo or binaural audio | |
| EP3506080B1 (en) | Audio scene processing | |
| JP2018110366A (en) | 3D sound image sound equipment | |
| Llorach et al. | Towards realistic immersive audiovisual simulations for hearing research: Capture, virtual scenes and reproduction | |
| KR20160061315A (en) | Method for processing of sound signals | |
| US10321252B2 (en) | Transaural synthesis method for sound spatialization | |
| US12200467B2 (en) | System and method for improved processing of stereo or binaural audio | |
| CN105682000B (en) | A kind of audio-frequency processing method and system | |
| Cuevas-Rodriguez et al. | An open-source audio renderer for 3D audio with hearing loss and hearing aid simulations | |
| Enzner et al. | Advanced system options for binaural rendering of ambisonic format | |
| CN113347530A (en) | Panoramic audio processing method for panoramic camera | |
| Suzuki et al. | 3D spatial sound systems compatible with human's active listening to realize rich high-level kansei information | |
| Pfanzagl-Cardone | HOA—higher order ambisonics (eigenmike®) | |
| KR102559015B1 (en) | Actual Feeling sound processing system to improve immersion in performances and videos | |
| Gölles et al. | Cat3DA-Camera-Tracked 3D Audio Player | |
| Howie et al. | Comparing immersive sound capture techniques optimized for acoustic music recording through binaural reproduction | |
| Paterson et al. | Producing 3-D audio |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17837634; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17837634; Country of ref document: EP; Kind code of ref document: A1 |