WO2010113142A2 - Musical environment for remotely connected users

Info

Publication number: WO2010113142A2
Application number: PCT/IB2010/051467
Authority: WO
Inventors: Udayan Kanade
Original assignee: Udayan Kanade
Priority date: 2009-04-03
Filing date: 2010-04-05
Publication date: 2010-10-07
Also published as: WO2010113142A3

Abstract

A musical environment for remotely connected users is disclosed. In an embodiment, the environment comprises means for producing a repeating temporal pattern, such as a repeating pattern of sounds. The repeating temporal pattern is conditionally synchronized to a repeating temporal pattern produced at a remote environment.

Description

Title of Invention: MUSICAL ENVIRONMENT FOR REMOTELY CONNECTED USERS

[1] This patent claims priority from patent number 911/MUM/2009 filed in Mumbai, India on 3 April 2009.

Technical Field

[2] The present invention relates to environments that connect remote users. More particularly, the invention relates to musical environments that connect remote users.

Background Art

[3] Environments that connect remote users on a network are well-known in the art. Such environments allow audio and video data to be exchanged between the remote users. These environments may be connected over telecommunication or data networks.

Summary

[4] A musical environment for remotely connected users is disclosed. In an embodiment, the environment comprises means for producing a repeating temporal pattern, such as a repeating pattern of sounds. The repeating temporal pattern is conditionally synchronized to a repeating temporal pattern produced at a remote environment.

[5] The above and other preferred features, including various details of implementation and combination of elements are more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular methods and systems described herein are shown by way of illustration only and not as limitations. As will be understood by those skilled in the art, the principles and features described herein may be employed in various and numerous embodiments without departing from the scope of the invention.

Brief Description of Drawings

[6] The accompanying drawings, which are included as part of the present specification, illustrate the presently preferred embodiment and together with the general description given above and the detailed description of the preferred embodiments given below serve to explain and teach the principles of the present invention.

[7] Figure 1 depicts a user environment connected to a second user environment through a network, according to an embodiment.

[8] Figure 2 is a schematic diagram of an interface box accepting rhythm markers from a data stream, according to an embodiment.

[9] Figure 3 is a flow diagram of a synchronization routine, according to an embodiment.

[10] Figure 4 depicts a visual aid displayed on a display in the user environment, according to an embodiment.

[11] Figure 5 depicts a visual aid having two markers, according to an embodiment.

Detailed Description

[12] A musical environment for remotely connected users is disclosed. In an embodiment, the environment comprises means for producing a repeating temporal pattern, such as a repeating pattern of sounds. The repeating temporal pattern is conditionally synchronized to a repeating temporal pattern produced at a remote environment.

[13] Figure 1 depicts a user environment 199 connected to a second user environment 198 through a network, according to an embodiment. User environment 199 has a user 101, a recording device 102 (such as a microphone), and a playback device 103 (such as a speaker). In an embodiment, user environment 199 also has a display 104. The recording device 102, the playback device 103 and the optional display 104 are connected to at least one remote user environment such as second user environment 198, through an interface box 105 and a data network 150. In an embodiment, a performance created by the user 101 is transmitted to one or more remote user environments through the data network 150, and is played through their playback devices. A performance may be vocal or instrumental sound, a dance or actions, or any performance that the system of the present invention is configured to record, transmit and play back. Similarly, any performance made by users of a remote user environment is played through playback device 103.

[14] The data network 150 may include data transmission lines, repeaters and data routers. It may also include intermediate hosts (servers) having more intelligence than plain routers, repeaters and data lines. A part or whole of the process described in the present invention may be implemented on such intermediate network hosts.

[15] The interface box 105 also generates a repeating temporal pattern. A repeating temporal pattern is used by musicians to synchronize the music they are playing. The repeating temporal pattern may be communicated to the user 101 using audio or visual means. It may be a repeating pattern of sounds, a repeating visual pattern, a repeating tactile pattern, repeating pattern of vibrations, a repeating series of light flashes or lighting conditions, etc.

[16] In an embodiment, the repeating temporal pattern generated by the interface box 105 is a repeating pattern of sounds, played through a speaker (playback device 103 may be a speaker). Many musicians play music over a repeating pattern of sounds. The repeating pattern of sounds may be a rhythm played on percussion instruments such as drums, timpani, or other instruments including a bass guitar, a guitar, a piano, a harmonium or a sarangi. It could be a pattern made up of sounds of various instruments, or electronically produced sounds. It could be a repeating metronome sound, which is used by musicians to mark time. The repeating pattern of sounds being played by the speaker 103 may be caught by the microphone 102. Any sound played by the speaker 103 and captured by the microphone 102 may be prevented from being transmitted over the data network 150 (or may be reduced to a large extent) by use of acoustic isolation between the speaker 103 and microphone 102, or by using virtual isolation techniques such as echo cancellation or echo reduction. In another embodiment, the repeating pattern of sounds is allowed to be captured by the microphone 102 and transmitted over the data network 150.

[17] When the user 101 is producing sound, this sound is captured by the recording device 102. The recording device 102 may be an audio microphone, or it may be an instrument-specific audio pick-up (e.g. a guitar pick-up microphone), or the performance may be captured by other means such as MIDI (Musical Instrument Digital Interface) (or other digital performance data) picked up from an instrument having a digital interface. Other means of picking up performances are also known in the art, such as through video or motion capture. The performance being captured may not be an audio performance at all, but some other synchronized performance such as dancing or any action that is to be synchronized to a repeating temporal pattern. Such a performance may be captured by other appropriate means, such as video cameras, motion capture devices, etc.

[18] The captured performance of user 101 is encoded into a data stream, and sent over the data network 150. The encoding into the data stream may include various kinds of sound or other media compression techniques. Sending of data over the network may include techniques such as resending lost packets/data, data buffering, error detection, error correction, jitter reduction, packet/data resequencing to correctly order data which is received out-of-order, detection of duplicate packets, and any techniques of sending generic data or media data over a data network.

[19] In a particular mode of the interface box 105, together with this encoded performance, the data stream also includes rhythm markers. A rhythm marker is a piece of information indicating which instant from the repeating temporal pattern was being produced when a particular instant of performance was captured by the recording device 102. Since the interface box 105 is both generating the repeating temporal pattern through the playback device 103 as well as capturing performance through the recording device 102, the interface box 105 has the information to generate a rhythm marker. A rhythm marker may be generated at regular intervals, and these regular intervals may be arranged so that an integer number of such intervals fit exactly into one repeating temporal pattern.

[20] In another mode of the interface box 105, the interface box 105 accepts a data stream coming from a remote user environment. The incoming data stream has rhythm markers, and possibly performance data. In an embodiment, the actual repeating temporal pattern played in the remote user environment is isolated from or removed from the captured performance, and thus, the actual repeating temporal pattern is not present in the performance data. In this case, the repeating temporal pattern may be generated by the interface box 105 in synchronicity with the rhythm markers received in the data stream, as the corresponding performance data is being played. This will recreate an approximate replica of the performance in the remote user environment, at the user environment 199, since both the repeating temporal pattern as well as the performance generated by the user of the remote user environment, will be played in correct synchronization through playback device 103. The performance data and the repeating temporal pattern may be played through separate playback devices (such as speakers or displays) present in user environment 199 (not shown), so that better separation of sources (such as sound sources) is perceived by user 101. The volume (audio level) of the generated repeating temporal pattern (in the case that it is a repeated pattern of sounds) can be controlled by the user 101 or automatically, for listening comfort, or so that both the remote performance (in the case that such performance is a sound) and the repeating temporal pattern can be heard, without one overpowering the other.
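
The generation of rhythm markers at regular intervals, as described in paragraph [19], might be sketched as follows. This is an illustrative sketch only, not the disclosed implementation: it assumes that positions are counted in audio samples, and the names RhythmMarker, markers_for_chunk, pattern_start and markers_per_cycle are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RhythmMarker:
    sample_index: int      # position in the captured performance stream
    pattern_position: int  # position (in samples) within one cycle of the pattern


def markers_for_chunk(chunk_start, chunk_len, pattern_start, cycle_len, markers_per_cycle):
    """Return markers for captured samples [chunk_start, chunk_start + chunk_len).

    pattern_start is the capture-sample index at which the repeating pattern
    was at position 0 (an assumption of this sketch).
    """
    if cycle_len % markers_per_cycle != 0:
        raise ValueError("an integer number of marker intervals must fit in one cycle")
    interval = cycle_len // markers_per_cycle
    # First sample at or after chunk_start that falls on a marker interval.
    offset = (chunk_start - pattern_start) % interval
    first = chunk_start if offset == 0 else chunk_start + (interval - offset)
    return [
        RhythmMarker(s, (s - pattern_start) % cycle_len)
        for s in range(first, chunk_start + chunk_len, interval)
    ]
```

With a cycle of 800 samples and four markers per cycle, a marker falls every 200 samples, and the pattern position wraps to 0 at the start of each new cycle.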

[21] In another embodiment, the actual repeating temporal pattern played in the remote user environment is not isolated from or removed from the captured performance, and thus, the actual repeating temporal pattern is present in the performance data being received. In this case, the repeating temporal pattern may not be generated by the interface box 105. Alternatively, even in this case, the repeating temporal pattern may be generated by the interface box 105, possibly at a low volume, so that the repeating temporal pattern in the remote user environment can be better monitored and the repeating temporal pattern is clearly heard or perceived.

[22] Figure 2 is a schematic diagram of an interface box 299 accepting rhythm markers from a data stream, according to an embodiment. A buffer 201 stores data 205 of a data stream 202 coming in over a data network. The data 205 has performance data 203 and rhythm markers 204, each rhythm marker pointing to a moment in the performance data 203 and indicating which moment in the repeating temporal pattern 207 should be synchronized to that particular moment in the performance data 203. The buffer 201 stores data for a certain amount of time before it is played back, to smooth out data flow irregularities in the data network. A clock 206 synchronizes the playback of the performance data 203 in the buffer 201 and performance data from the repeating temporal pattern 207.

[23] In certain cases, the data network may fail to deliver all the data traveling over it, and some performance data as well as some rhythm markers may be lost. In an embodiment, if a piece of performance data 203 is missing, i.e. if two rhythm markers (consecutive or otherwise) do not have the requisite amount of performance data in between their marked positions, the playback of the repeating temporal pattern 207 is sped up or jumped forward to resynchronize with the playing performance data. To avoid abrupt jumps in performance, the performance from the old playback position and the performance from the new playback position may be crossfaded.

[24] In an embodiment, if a piece of performance data 203 is missing, the playback of the performance data 203 is paused till the correct moment in the repeating temporal pattern 207 is reached. The pause may be filled with frequencies, images or other performance characteristics similar to the ones just played, so that abrupt break in performance is not perceived. These generated frequencies, images, etc. may be slowly faded out. Similarly, frequencies, images, etc. may be generated and faded in so that abrupt start of performance is not perceived.

[25] Both the above techniques (of jumping the repeating temporal pattern 207 and of pausing the performance data 203) may be performed together to resynchronize the repeating temporal pattern and the performance.
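
The detection of missing performance data between two rhythm markers, and the combined recovery of paragraph [25], might be sketched as follows. This is a hedged illustration: it assumes that all quantities are measured in the same unit (e.g. samples), that less than one full cycle of data is lost between the two markers, and that max_pattern_jump is a hypothetical limit on how far the pattern is allowed to jump.

```python
def missing_amount(pos_a, pos_b, received, cycle_len):
    """Amount of performance data missing between two rhythm markers.

    pos_a, pos_b: pattern positions carried by the two markers.
    received: amount of performance data actually present between them.
    Assumes the loss is shorter than one cycle of the pattern.
    """
    return (pos_b - pos_a - received) % cycle_len


def plan_recovery(missing, max_pattern_jump):
    """Split the correction between jumping the pattern and pausing the performance.

    Returns (pattern_jump_forward, performance_pause); their sum equals missing.
    """
    jump = min(missing, max_pattern_jump)
    pause = missing - jump
    return jump, pause
```

Setting max_pattern_jump to zero reduces this to the pure pausing technique of paragraph [24], while a very large value reduces it to the pure pattern jump of paragraph [23].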

[26] If the delay in the data network increases, the performance data may not be available in time for synchronous playback with the repeating temporal pattern 207, but may become available after the corresponding moment in the repeating temporal pattern 207 has already been played back. If this happens only sporadically, such performance data which reaches late may be rejected. But if this happens quite frequently, then the playback of the repeating temporal pattern 207 may be delayed by some amount of time, so that the performance data from the remote user environment has more time to reach the buffer 201 over the data network, i.e., the playback of the repeating temporal pattern 207 is slowed down or jumped back by some amount of time. The resynchronization may also be achieved by pausing, speeding up or jumping any of the performance data being played back and the repeating temporal pattern, in such a way that data corresponding to a larger time interval is being buffered in buffer 201 before being played back.
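
One possible policy for handling late performance data, in the spirit of paragraph [26], is sketched below. The specification does not fix any numerical criteria, so the window size, the late-fraction limit and the delay step are assumptions of this sketch, and the class name LatenessMonitor is hypothetical.

```python
from collections import deque


class LatenessMonitor:
    """Reject sporadic late data; grow the buffering delay if lateness is frequent."""

    def __init__(self, window=50, max_late_fraction=0.2, step=0.02):
        self.history = deque(maxlen=window)   # recent late/on-time flags
        self.max_late_fraction = max_late_fraction
        self.step = step                      # seconds added per adjustment
        self.extra_delay = 0.0                # total extra delay to apply to the pattern

    def on_packet(self, arrival_time, play_deadline):
        """Return True if the data can be played, False if it should be rejected."""
        late = arrival_time > play_deadline
        self.history.append(late)
        window_full = len(self.history) == self.history.maxlen
        if window_full and sum(self.history) / len(self.history) > self.max_late_fraction:
            # Lateness is frequent: delay the repeating pattern a little more.
            self.extra_delay += self.step
            self.history.clear()
        return not late
```

The caller is expected to apply extra_delay to the playback of the repeating temporal pattern (by slowing it down or jumping back), which in turn pushes later the play deadlines it passes to on_packet.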

[27] Figure 3 is a flow diagram of a synchronization routine 399, according to an embodiment. The synchronization routine may be performed by an interface box near a user, or it may be performed on an intermediate host (media server) in the data network, or even a remote interface box. The synchronization routine may be distributed across many devices, such as interface boxes and intermediate hosts. The interface box produces performance (301) through a playback device (such as speaker or speakers). The performance produced may include some section of performance from a repeating temporal pattern. The performance produced may also include some performance from performance data that is streaming in over a data network. The performance data streaming in over the data network may have rhythm markers specifying which moment from the repeating temporal pattern should be synchronized with a particular moment of the performance data. If the particular performance data stream has been synchronized with the repeating temporal pattern, then performance data that is chosen to be played is data that is appropriate to the moment of the repeating temporal pattern that is currently being played. If the particular performance data stream has not been synchronized with the repeating temporal pattern, then performance data may be chosen to be of equal amount in time to the chunk of repeating temporal pattern that will be played.

[28] Periodically or intermittently, the synchronization routine 399 (possibly running on an interface box) checks whether the user of the user environment is performing (302). If the user of the user environment is performing (e.g. by singing, playing an instrument, speaking, dancing, or doing whatever action the system is configured to accept as performance), this will be captured by a recording device. The performance being continuously captured by a recording device may be used to detect whether the user is performing. For example, if the sound level captured by a microphone goes beyond a threshold value, it may be decided that the user is producing sound. More sophisticated detection of sound production by user may be used. For example, algorithms to detect the difference between background noise, and the sound of speech or singing or a particular instrument may be used to detect whether user is producing a sound. Appropriate movement of the user (such as the movement of mouth for singing, or the movement of hands for playing a piano, etc.) may be detected using other means such as a video camera to detect whether the user is producing sound. The user may himself or herself provide input as to whether or not he or she is performing (or, in an embodiment, whether or not he or she wishes to send rhythm markers together with the performance data being sent out). This may be done by the user by using a switch or user interface, or by other means such as gestures, etc. Detectors such as sound or movement detectors may be placed on the user or the instrument being played to decide whether the user is performing. In an embodiment, even if the user is not performing at a particular moment, if he or she has performed within a particular amount of past time, it is still considered that the user is performing. In an embodiment, the user indicates only the beginning of his/her performance producing phase by use of a button/gesture/etc., and the end of the user's production of performance is detected automatically, or detected based on whether a remote user environment produces a data stream having performance data or having rhythm markers.
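
The simplest detection described in paragraph [28], a sound-level threshold combined with a hold time during which the user is still considered to be performing, might be sketched as follows. The use of an RMS level and the default threshold and hold values are assumptions of this sketch; the name PerformanceDetector is hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


class PerformanceDetector:
    """Threshold detector that keeps reporting 'performing' for a hold period."""

    def __init__(self, threshold=0.05, hold_seconds=1.0):
        self.threshold = threshold
        self.hold_seconds = hold_seconds
        self.last_active = None  # time of the last block above the threshold

    def update(self, samples, now):
        """Feed one captured block with its timestamp; return True if performing."""
        if rms(samples) > self.threshold:
            self.last_active = now
        if self.last_active is None:
            return False
        return (now - self.last_active) <= self.hold_seconds
```

More sophisticated detectors (speech versus singing classification, video-based movement detection, or an explicit user switch) could replace the rms test without changing the hold logic.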

[29] If the user is performing, the performance is sent as a media data stream over the network (303). Together with the media data stream, rhythm markers specifying which moment from the repeating temporal pattern was playing when the particular performance from the user environment was captured are sent periodically or intermittently. Since the interface box both produces the repeating temporal pattern, as well as captures the user's performance, it has this data. In an embodiment, performance captured from the user environment is slightly delayed (e.g. enters hardware buffers or operating system buffers, etc.) before becoming accessible to the interface box. Similarly, performance may take some time from being assigned to be produced by the playback device and being actually produced by the playback device. Thus, the performance that is recorded at a particular moment by the interface box may not correspond in the user environment to performance that is being sent for playback at that particular moment, but it may correspond to performance that was sent for playback some fixed time ago. In this case, we may mark the performance data stream with rhythm markers of an appropriately earlier time to the time at which the performance was recorded. This fixed delay between the production and recording of performance may be measured by system measurements, or it may be measured in a currently live user environment by measuring the delay between when a performance is sent out towards the playback device and when it is received from the record of the recording device.

[30] In an embodiment, the performance, playback and recording are all in the audio medium, and the fixed delay between the production and recording of performance is measured by measuring the delay between when a sound is sent out towards the speakers, and when the same sound is received in the microphone record. If an echo canceler is being used to cancel such sound from the audio before it is sent out, the echo canceler can produce the information about the principal delay between an input and an output sound. For example, if the echo canceler works on an adaptive delay echo cancellation principle, the value of the best adapted delay can be used as the value of this fixed delay. As another example, if the echo canceler works by identifying the filter response of the filter between the output sound stream and the input sound stream, the time delay corresponding to the first maximum (or the highest maximum) of the filter response can be used as the value of this fixed delay.
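
The delay compensation of paragraphs [29] and [30] might be sketched as follows, assuming that the echo canceler exposes its identified filter response as a list of tap values (one per sample) and that pattern positions are counted in samples. The function names are hypothetical, and the sketch uses the highest maximum of the response rather than the first maximum.

```python
def delay_from_filter_response(taps):
    """Delay, in samples, of the tap with the largest magnitude."""
    if not taps:
        raise ValueError("filter response is empty")
    return max(range(len(taps)), key=lambda i: abs(taps[i]))


def marker_position(pattern_pos_now, delay, cycle_len):
    """Pattern position to write into a rhythm marker for audio captured now.

    The captured audio corresponds to pattern material sent for playback
    'delay' samples earlier, so the marker is moved back by that amount.
    """
    return (pattern_pos_now - delay) % cycle_len
```

For example, a response whose largest tap is at index 2 yields a delay of two samples, and a marker generated while the pattern is at position 1 of an 800-sample cycle would then carry position 799.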

[31] In an embodiment, rhythm markers are not sent even though performance data is being sent. This is useful if the user wishes to convey some useful information like speech or a phrase of music, but does not wish the remote user to resynchronize his rhythm to the local rhythm. For example, in an instructional setting, this would reduce the confusion that a remote student will feel by many changes in rhythm. The indication that the user does not wish to send rhythm markers may be explicitly provided by the user, using a switch, a gesture or any user input feature. In an embodiment, if the user uses speech instead of musical sounds, the rhythm markers are not sent. In another embodiment, the rhythm markers are not sent if the user utters or creates a short phrase (shorter than a fixed amount of time), i.e. rhythm markers will be sent only if the user is performing for more than a fixed amount of time. In another embodiment, the playing of a particular instrument or use of a particular microphone, etc. triggers the sending of rhythm markers, but the use of a second instrument or a second microphone does not. For example, the user may have two microphones, one preferably for speech and one for musical performance, and whether to send rhythm markers or not will be decided based on which of these two microphones is capturing more of the sound produced by the user. In another embodiment, the playing of an instrument triggers the sending of rhythm markers, but the use of a microphone does not. Thus, the user may be provided control over which events (among various microphones, audio, video or other inputs, and kinds of signals within each input, for example speech vs. singing, dancing vs. gesturing, etc.) trigger the sending of rhythm markers and which do not. In an embodiment, rhythm markers are sent if the rhythm pattern is within a specified subset of the whole rhythm pattern. 
For example, for singing involving very long duration rhythm cycles (such as common in Indian classical music), rhythm markers may be sent only if the rhythm is close to the first beat of the rhythm cycle. In an embodiment, rhythm markers are sent if the phrase (such as musical phrase) being performed is of a rhythmic nature, and not sent if the phrase being performed is of a less rhythmic or non-rhythmic nature (such as singing/performing ad libitum). In an embodiment, rhythm markers are sent over the entire cycle, but, over the entire cycle or in a subset of the cycle, synchronization is done not to the same moment in the repeating temporal pattern, but to a moment having a certain musical relation (e.g. a distance of an integral or fractional number of beats) with the actual moment.

[32] In an embodiment, when performance data is being sent, rhythm markers are sent to certain remotely connected environments and not sent to other remotely connected environments. For example, in a situation where one person is teaching/training/leading a musical session, the students/trainees/followers may send rhythm markers to the trainer, but not to each other. Thus, all the participants will perceive the performance of the performing trainee student, but only the trainer will resynchronize to the performing trainee, minimizing the number of resynchronizations for other participants, and reducing their mental confusion. As another example, the musicians may be set up as a "stream" of performers, and they will send rhythm markers downstream, but not upstream.

[33] In all the embodiments above where the rhythm markers are not sent even though performance data is being sent, it is also possible that the rhythm markers are sent, but their effect is nullified in some other way, for example by not being forwarded by an intermediate host, or being disregarded by the receiver. In this case, the detection of situations where the rhythm markers may be disregarded or nullified may be performed by such intermediate hosts or receivers.

[34] The rhythm markers may be sent at regular intervals, and in an embodiment, there is an integer number of such intervals in one cycle of the repeating temporal pattern. These rhythm markers may be sent at particularly identified moments of the repeating temporal pattern. For example, rhythm markers may be sent at the start and exactly the middle of the repeating temporal pattern, or they may be synchronized to the repeating temporal pattern. In an embodiment, a rhythm marker is sent just as the performance data corresponding to the user's performance phrase begins to be sent. This rhythm marker may not match one of the particularly identified moments in the repeating temporal pattern at which a rhythm marker is supposed to be usually generated, but the information contained in this rhythm marker will help synchronize the performance data in the remote user environment, just as the performance data is beginning to be played back. In an embodiment, rhythm markers are sent for every packet of data transmitted over a data network that transmits packet data.

[35] If the user is not performing, the synchronization routine 399 checks whether rhythm markers are being received in any data stream coming in from the network (304). If rhythm markers are being received in a data stream, the currently playing repeating temporal pattern and the incoming data stream containing rhythm markers may have to be resynchronized (305) to each other. In an embodiment, resynchronization is not performed if the incoming data stream and the repeating temporal pattern have already been synchronized before, and the data stream has been reproducing performance data after such synchronization (continuously, or with gaps of not more than a particular fixed interval).

[36] A resynchronization is an action which causes an incoming data stream containing performance data and rhythm markers to be played in such a way that the moments in performance data having rhythm markers associated with them are played coincidentally with the moments in the repeating temporal pattern that the rhythm markers specify. At a given moment when resynchronization is to be performed, the moment in the repeating temporal pattern that the next performance in the data stream should be played at is determined. This can be done by using an explicit rhythm marker, if it is available, corresponding to the next performance in the data stream. Otherwise, it can be calculated from the previously encountered rhythm marker, and the amount of performance data encountered between the previous rhythm marker and the next performance in the data stream. If the moment in the repeating temporal pattern that the next performance in the data stream should be played at is the same as the moment in the repeating temporal pattern that will be played next, then no specific resynchronization step needs to be performed. On the other hand, if they are not the same, resynchronization may be performed by changing the playback characteristics of either the incoming performance data, or of the repeating temporal pattern, or both.
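
The calculation described in paragraph [36] might be sketched as follows, assuming that pattern positions and amounts of performance data are both counted in samples. The function names are hypothetical, and the choice of the shortest signed offset is an assumption of this sketch rather than a requirement of the specification.

```python
def expected_pattern_position(prev_marker_pos, data_since_marker, cycle_len):
    """Pattern position at which the next performance data should be played,
    derived from the previous rhythm marker when no explicit marker is available."""
    return (prev_marker_pos + data_since_marker) % cycle_len


def resync_offset(expected_pos, next_pattern_pos, cycle_len):
    """Signed shortest shift of the pattern that lands it on expected_pos.

    Zero means no resynchronization step is needed; a positive value means
    the pattern would jump forward, a negative value backward.
    """
    diff = (expected_pos - next_pattern_pos) % cycle_len
    if diff > cycle_len // 2:
        diff -= cycle_len
    return diff
```

A non-zero offset can then be absorbed by changing the playback of the pattern, of the incoming performance data, or of both.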

[37] In an embodiment, changing the playback characteristic of the incoming performance data comprises withholding playback of the incoming performance data till the appropriate moment in the repeating temporal pattern is reached. If there was previous data playing from this performance data stream and a gap in performance is being produced by such withholding of playback, such gap may be filled by frequencies, images, etc. similar to the reproduced performance data and the performance to be reproduced. These frequencies, images, etc. may be merged into each other by cross-fading, or may be used to fade out the original performance and fade in the new performance. In another embodiment, changing the playback characteristic of the incoming performance data comprises rejecting some amount of performance data, i.e. jumping forward in the performance data. To reduce the perception of the jump or break in performance, the original and the new performance may be cross-faded. Such jumping forward loses some amount of incoming performance data, and in an embodiment, it is performed only if the data being lost in this case is of a small amount. Instead of withholding playback of the incoming performance data, the performance may be slowed down. This may be achieved by introducing pauses into places which already have pauses, or by drawing out the characteristics of a particular phrase (such as the frequency spectrum of a musical phrase) over a longer period of time. Similarly, instead of jumping forward in the performance data, the performance may be sped up, by reducing the time of pauses in the performance and by reducing the time of particular phrases in the performance.
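
The cross-fading mentioned in paragraph [37] might, for audio samples, be sketched as a simple linear cross-fade. The linear weighting is an assumption of this sketch (other fade curves, such as equal-power curves, are equally possible), and the function name is hypothetical.

```python
def crossfade(old_tail, new_head):
    """Linearly fade out old_tail while fading in new_head.

    Both inputs are lists of samples; the output has the length of the
    shorter input. The weight of the new material rises from just above 0
    to just below 1 across the fade.
    """
    n = min(len(old_tail), len(new_head))
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the new performance
        out.append((1 - w) * old_tail[i] + w * new_head[i])
    return out
```

The old performance is played up to the fade, the cross-faded samples replace the overlap, and the new performance continues from the sample after the overlap.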

[38] In an embodiment, changing the playback characteristic of the repeating temporal pattern comprises changing the playback of the repeating temporal pattern to a new location, so that the repeating temporal pattern is synchronized to the rhythm markers in the incoming data stream. This jump may be effected immediately, or may be performed by speeding up or slowing down the playback of the repeating temporal pattern till resynchronization is achieved. To prevent the jump in the repeating temporal pattern from jarring the user, the two patterns (which could be two locations in a repeating pattern of sounds) may be cross-faded or one pattern may be faded out as another pattern is faded in. Some parts of the repeating temporal pattern may be more easily identifiable to the user than other parts. For example, the beginning and the center of a rhythmic pattern may be more identifiable than an arbitrary location in a rhythmic pattern. In this case, instead of jumping directly to the appropriate moment in the repeating temporal pattern, the jump may be performed to a nearby easily identifiable moment. The synchronization thus caused will be approximate, not perfect, and the remaining difference between the repeating temporal pattern and the incoming performance data may be eliminated by changing the playback characteristic of the incoming performance data. It may also be musically pleasing to jump from one moment of a repeating temporal pattern to another moment which is musically related to it in some way. For example, in a sixteen-beat rhythmic pattern, jumping 8 beats forward or back may be less noticeable than jumping 5 beats. In general, in an n-segment rhythmic pattern, an m-segment jump is less noticeable than a k-segment jump if gcd(m,n) is larger than gcd(k,n), where gcd stands for the greatest common divisor. 
In this case, instead of jumping directly to the appropriate moment in the repeating temporal pattern, the jump may be performed to a moment nearby to the appropriate moment which is musically related to the presently playing moment in the repeating temporal pattern. Furthermore, such jump may not be performed immediately, but may be performed when the repeating temporal pattern reaches a moment such that the musically related moment to this moment is also an easily identifiable moment. The synchronization thus caused will be approximate, and the remaining difference between the repeating temporal pattern and the incoming performance data may be eliminated by changing the playback characteristic of the incoming performance data. In such an embodiment, there are finite points from which a jump may be performed into other points in the repeating temporal pattern, and there are points to which a jump may be performed from each such point in the repeating temporal pattern. The jump from a specific point to a specific point in the repeating temporal pattern may be accompanied by a special mini-performance pattern which connects the points, such that such performance will signify the jump and/or make it more musically acceptable. Such mini-performance may be a special chord progression, or a connecting rhythmic phrase, etc.
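
The selection of a musically related jump near the appropriate moment, as described in paragraph [38], might be sketched as follows. Positions are counted in whole segments of an n-segment pattern. The search radius max_error and the tie-breaking rule (prefer the smaller residual error) are assumptions of this sketch, and the function name is hypothetical.

```python
from math import gcd


def choose_jump(current, ideal_target, n, max_error):
    """Pick a jump, in segments, landing within max_error segments of ideal_target.

    Jumps with a larger gcd(jump, n) are preferred as less noticeable; among
    equals, the one landing closer to the ideal target wins.
    Returns (jump, error), where error = landing position - ideal target and
    must be made up by adjusting the incoming performance data.
    """
    best_key, best_jump, best_err = None, None, None
    for err in range(-max_error, max_error + 1):
        target = (ideal_target + err) % n
        jump = (target - current) % n
        key = (gcd(jump, n), -abs(err))  # gcd(0, n) == n, so "no jump" ranks best
        if best_key is None or key > best_key:
            best_key, best_jump, best_err = key, jump, err
    return best_jump, best_err
```

In a 16-segment pattern playing segment 3 with an ideal target of segment 8, an exact jump of 5 segments has gcd 1, whereas allowing one segment of error permits a 4-segment jump (gcd 4) that lands on segment 7.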

[39] The more approximate the change caused by the change in the playback characteristic of the repeating temporal pattern, the more the playback characteristic of the incoming performance data has to be changed, creating large delays or loss of performance data. An expert user may be more tolerant of abrupt changes in the repeating temporal pattern than a novice user. Thus, for the same repeating temporal pattern, different users may have different settings as to easily identifiable moments in the repeating temporal pattern, or the allowable jumps, or which jumps are considered musically related and which are not. For example, in a 16-beat pattern, an extremely novice user will not tolerate any jump in the beat pattern, an intermediate user may tolerate jumps of 4, 8 and 12 beats, a talented user may tolerate jumps of 2, 4, 6, 8, 10, 12 and 14 beats, and an extremely expert user may tolerate jumps which may not even be an integer or rational number of beats. Making this choice may comprise, in an embodiment, just choosing a level of expertise for the different parties involved. The difference in expertise may correspond to a difference in the minimum allowed gcd of the jump size and the repeating pattern size. A difference in expertise is particularly relevant to instructional situations, i.e. where one user is a teacher and another a student. In an embodiment, an expert user may choose expertise parameters not only for himself/herself but also for other remote (possibly novice) users. The choice may also be made by choosing one (or more) of the parties as teachers/experts, and the others as students/novices.
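
The correspondence between expertise and the minimum allowed gcd can be checked with a short sketch. The level names and the threshold values in `MIN_GCD_16` are assumptions chosen to reproduce the 16-beat example of paragraph [39]; only whole-beat jumps are modelled, so the fractional jumps tolerated by an extremely expert user are outside its scope.

```python
from math import gcd

# Hypothetical thresholds for a 16-beat pattern (not from the disclosure).
MIN_GCD_16 = {"novice": 16, "intermediate": 4, "talented": 2, "expert": 1}

def allowed_jumps(pattern_len: int, min_gcd: int) -> list[int]:
    """Whole-beat jumps whose gcd with the pattern length is at least min_gcd."""
    return [j for j in range(1, pattern_len) if gcd(j, pattern_len) >= min_gcd]

for level, threshold in MIN_GCD_16.items():
    print(level, allowed_jumps(16, threshold))
```

With these thresholds the novice level permits no jump, the intermediate level permits 4, 8 and 12 beats, and the talented level permits every even jump from 2 to 14 beats.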

[40] In an embodiment, when a student or novice is performing, he or she cannot perceive the performance of other students or novices, but can perceive the performance of the teachers/experts. In this way, the teacher or expert may check the performance of various students/novices together, without them getting confused by each other's performances. In an embodiment, the students/novices perform as and when the teacher/expert is performing, and this performance is heard by the teacher/expert after such a session of performing together is finished, for example after completion of one repeating temporal pattern. The student may hear his/her performance too as the teacher is hearing it, so as to be able to review his/her own performance. The student may hear the performances of all the students in the session (and also the teachers, if they performed) synchronized, during such a reviewing phase.

[41] In an embodiment, resynchronization is performed to gain buffer space before beginning playback of the performance data. In this case, performance data from the incoming stream will be considered as playable at the present moment or any future moment, only if performance data corresponding to a certain period of time in the future of that performance moment has already been received. A resynchronization comprising a change in the playback characteristic of the repeating temporal pattern will be performed only if such playable performance data exists at the present moment. If such data does not exist at the present moment, the data that exists will be held in the buffer and data will be accumulated until enough data is available, and a resynchronization will be triggered, if necessary, only after that. How much data is to be buffered before resynchronization can be predetermined, or can be decided based on the characteristics of the data network, with more data required to be buffered if the data network is more stochastic in nature. How much data is to be buffered can also be decided based on how much data was being buffered to achieve smooth playback of incoming performance data during a previous episode or session.
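
The playability condition of paragraph [41] may be sketched as follows. The names `is_playable` and `required_lookahead`, the default constants, and the use of the standard deviation of observed delays as a measure of how stochastic the network is are all assumptions made for illustration; the disclosure does not prescribe a particular formula.

```python
from statistics import pstdev

def is_playable(moment_s: float, received_until_s: float, lookahead_s: float) -> bool:
    """A performance moment is playable only if data covering a further
    lookahead_s seconds beyond that moment has already been received."""
    return received_until_s >= moment_s + lookahead_s

def required_lookahead(delays_s: list[float], base_s: float = 0.05, k: float = 3.0) -> float:
    """Hypothetical rule: buffer more when observed network delays vary more."""
    jitter = pstdev(delays_s) if len(delays_s) > 1 else 0.0
    return base_s + k * jitter

# Data received up to 2.5 s makes the moment at 2.0 s playable with a 0.4 s lookahead.
print(is_playable(2.0, 2.5, 0.4))
# A steady network needs only the base amount of buffering.
print(required_lookahead([0.1, 0.1, 0.1]))
```

Under this sketch, a resynchronization that changes the playback characteristic of the repeating temporal pattern would be triggered only once `is_playable` holds for the present moment.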

[42] In an embodiment, too many resynchronizations comprising changing the playback characteristic of the repeating temporal pattern are avoided. This is done so that frequent rhythm changes do not confuse the user. If a change in the playback characteristic of the repeating temporal pattern has occurred in the recent past, resynchronization may be avoided altogether, or resynchronization comprising changing the playback characteristic of the incoming performance data may be given preference. Frequent changes in the playback characteristic of the incoming performance data may also be jarring to the user, as the performance of the remote user will not be perceived well. If this is happening, a change in the playback characteristic of the repeating temporal pattern may be performed to gain more buffer space for the incoming performance data, so that network flow can be better smoothed. In an embodiment, a resynchronization comprising changing the playback characteristic of the repeating temporal pattern is avoided if the repeating temporal pattern is presently synchronized to a particular incoming data stream. That is, if a first remote user is performing and the repeating temporal pattern produced locally is synchronized to the rhythm markers of the first remote user, a second remote user producing rhythm markers will not trigger a change in the playback characteristic of the repeating temporal pattern, and changing the playback characteristic of the performance data from the second remote user may be performed for synchronization.
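
One possible decision rule for paragraph [42] is sketched below. The function `choose_resync`, its parameters, and the fixed `cooldown_s` window standing in for "the recent past" are assumptions; the sketch covers only the choice between the two kinds of resynchronization and does not model the case where the pattern is changed to gain buffer space.

```python
from typing import Optional

def choose_resync(
    now_s: float,
    last_pattern_change_s: Optional[float],
    cooldown_s: float,
    locked_stream: Optional[str],
    incoming_stream: str,
) -> str:
    """Return "pattern" to change the repeating temporal pattern, or
    "performance" to change the incoming performance data instead."""
    # The pattern is already synchronized to a different remote user.
    if locked_stream is not None and locked_stream != incoming_stream:
        return "performance"
    # The pattern was changed recently; avoid confusing the user again.
    if last_pattern_change_s is not None and now_s - last_pattern_change_s < cooldown_s:
        return "performance"
    return "pattern"

# A second remote user does not disturb a pattern locked to the first.
print(choose_resync(30.0, None, 10.0, "user_a", "user_b"))
```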

[43] In an embodiment, after resynchronization that causes a jump in the repeating temporal pattern is performed, the next time a rhythm marker is generated to be sent out over the data network, the rhythm marker contains a special token signifying that resynchronization causing a jump in the repeating temporal pattern occurred between the generation of the previous rhythm marker and the generation of the present rhythm marker. Once such a rhythm marker is received by a remote interface box (or any agency performing synchronization routing 399), it may perform a resynchronization of its own.
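
The special token of paragraph [43] can be pictured as a flag carried by the next outgoing rhythm marker. The sketch below assumes a hypothetical marker layout (`position_beats`, `jumped`) and a hypothetical `MarkerSender` helper; the disclosure does not fix a data format.

```python
from dataclasses import dataclass

@dataclass
class RhythmMarker:
    position_beats: float  # hypothetical field: position in the repeating pattern
    jumped: bool           # the special token: a jump occurred since the last marker

class MarkerSender:
    """Remembers whether a jump happened since the previous marker was generated."""

    def __init__(self) -> None:
        self._jump_pending = False

    def note_jump(self) -> None:
        """Called when a resynchronization causes a jump in the local pattern."""
        self._jump_pending = True

    def next_marker(self, position_beats: float) -> RhythmMarker:
        """Build the next outgoing marker and clear the pending token."""
        marker = RhythmMarker(position_beats, self._jump_pending)
        self._jump_pending = False
        return marker

sender = MarkerSender()
sender.note_jump()
print(sender.next_marker(4.0))  # carries the token
print(sender.next_marker(8.0))  # token cleared again
```

A receiver that sees `jumped` set may then perform a resynchronization of its own, as the paragraph states.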

[44] In an embodiment, resynchronization of the repeating temporal pattern to an incoming data stream is performed even if the local user is performing. If only a change in the playback characteristic of the remote performance data is used to cause this synchronization, then the local user will not notice a jump in the rhythm. On the other hand, in an embodiment, even a change in the playback characteristic of the repeating temporal pattern is used to cause this synchronization, i.e. the remote user performing is an event that overrides the local user performing.

[45] Figure 4 depicts a visual aid 499 displayed on a display in the user environment, according to an embodiment. An indication 401 corresponds to the repeating temporal pattern (it may either be itself the repeating temporal pattern, or may be an accompaniment to other repeating temporal patterns such as a repeating pattern of sounds). A marker 402 indicates the moment of performance pattern currently playing from the repeating temporal pattern. The marker 402 (or the indication 401 or both) is moved so that at various times during the usage of the visual aid 499, the marker 402 points to the moment of performance pattern currently playing from the repeating temporal pattern. The marker 402 may move continuously, or may move in a staggered fashion, so that it only points to specific events such as beats or rhythmic divisions, rather than to any possible location. In an embodiment, the marker may be moved such that it slows down near the beats or rhythmic divisions, and speeds up in the intervals. The marker 402 may be a specific geometric shape, or it may be a highlighting or other embellishment which moves over the indication 401 (or highlights or embellishes various parts of the indication 401). The indication 401 may be a single line 403 or multiple lines, or a circle corresponding to the repeating temporal pattern. The line 403 may have marks 404 identifying events such as beats or rhythmic divisions. The indication 401 may include visual aids to identify the corresponding sounds in the repeating temporal pattern, such as notes in staff notation, percussion marks in staff notation, names of notes, percussion mnemonics, chord names, any musical notation or mnemonic aid, or any symbol or indication signifying special events (events such as the first beat, or an important intermediate beat, etc.).
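
The continuous and staggered movement of the marker 402 can be sketched as a mapping from elapsed time to a position on the indication 401. The function `marker_position` and its parameters are assumptions for illustration, and the sketch assumes a constant beat duration; the variant in which the marker slows down near beats is not modelled.

```python
import math

def marker_position(
    elapsed_s: float, beat_s: float, pattern_beats: int, staggered: bool = False
) -> float:
    """Position of the marker along the pattern, measured in beats.

    Continuous mode returns a fractional position; staggered mode snaps
    the marker to the most recent beat.
    """
    position = (elapsed_s / beat_s) % pattern_beats
    return float(math.floor(position)) if staggered else position

# After 9 s at 0.5 s per beat, a 16-beat pattern has wrapped around once.
print(marker_position(9.0, 0.5, 16))
# In staggered mode the marker rests on the last beat reached.
print(marker_position(1.3, 0.5, 16, staggered=True))
```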

[46] The user environment may have other interfaces, which may or may not use the display. An interface can be provided for changing the repeating temporal pattern. For example, metronomes, rhythm machines, music sequencers, lehra machines, light sequencers, etc. produce repeating temporal patterns, and have interfaces by which the repeating temporal pattern can be set or modified.

[47] In an embodiment, the visual aid is the sole indication of a repeating temporal pattern, and there is no sound that goes with it. A repeating pattern of sounds is just one manifestation of a repeating temporal pattern, and any repeating pattern, be it audible, visual or otherwise, may be synchronized across a data network using the present invention.

[48] The visual aid may take the form of a repeating animation or live action sequence. For example, many musical systems have rhythm mnemonic hand gestures (or movements of a baton), and an animation or live action sequence showing such hand gestures may be used.

[49] More than one visual aid signifying the repeating temporal pattern may be used at the same time.

[50] When there is a change in the playback of the repeating temporal pattern, the marker 402 is moved to point to the new position in the playback of the repeating temporal pattern. This helps in reorienting a user to the new rhythm situation presented. For example, many sub-phrases of a long repeating temporal pattern may sound similar, and the user may take a long time to adjust to an abrupt jump in the repeating temporal pattern, i.e. he/she may take a long time to identify exactly where the jump has landed. With a visual aid, this identification, i.e. reorientation of the user to the rhythmic situation, is helped.

[51] The display may also display a live image/video of the remote user, by capturing video of the remote user by a video camera, and transferring it to the local interface box over a data network. The live video may be synchronized using a technique similar to the performance synchronization described in the present patent. A live video is especially useful for dance performance or instruction, or for instruction regarding the proper technique in vocal/instrumental music.

[52] Figure 5 depicts a visual aid 599 having two markers, according to an embodiment. In an embodiment, the interface box keeps in memory at least two distinct moments in the repeating temporal pattern, one corresponding to a remote user environment, and one corresponding to the local user environment. Both the moments are advanced with advancing time. If a visual aid is provided, indication of both the moments may be provided simultaneously, such as marker 502 and marker 503 placed on a single indication 501. (Separate indications may be used for separate markers, too.) In this mode, availability of incoming rhythm markers triggers a resynchronization to one of the advancing moments, and production of performance by the local user triggers a resynchronization to the other advancing moment. In an embodiment, the moment corresponding to the remote user lags behind the moment corresponding to the local user by an amount of time which is slightly more than the round trip time between the two users. (The round trip time is the sum of the time taken for local data to reach the remote user, and the time taken for remote data to reach the local user, and may include network delays and also buffering latencies at both ends.) This lag may be chosen such that the two moments are musically related in some way, e.g. in a 16-beat pattern the lag may be chosen to be equal to 4, 8 or 12 beats. Thus, when the remote user stops producing performance and the local user starts producing performance, the remote user will not notice a jump in his/her repeating temporal pattern. This mode is especially useful if the local user is an expert and the remote user is a relative novice.
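
The choice of a musically related lag that exceeds the round trip time can be sketched as follows. The function `choose_lag_beats`, its parameters, and the assumption of a constant beat duration are illustrative only; the default candidate lags of 4, 8 and 12 beats are taken from the 16-beat example of paragraph [52].

```python
from typing import Optional, Sequence

def choose_lag_beats(
    round_trip_s: float, beat_s: float, allowed_lags: Sequence[int] = (4, 8, 12)
) -> Optional[int]:
    """Smallest allowed lag, in beats, that lasts longer than the round trip time.

    Returns None if even the largest allowed lag is too short.
    """
    for lag in sorted(allowed_lags):
        if lag * beat_s > round_trip_s:
            return lag
    return None

# With 0.5 s beats, a 1.2 s round trip is covered by a 4-beat (2.0 s) lag.
print(choose_lag_beats(1.2, 0.5))
# A 2.5 s round trip needs the next related lag of 8 beats (4.0 s).
print(choose_lag_beats(2.5, 0.5))
```

Picking the smallest qualifying lag keeps the remote moment only slightly behind the local moment, in line with the paragraph, while keeping the two moments musically related.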

Claims

[Claim 1] A method comprising receiving performance data, receiving rhythm markers together with the performance data, reproducing performance data, producing a repeating temporal pattern, wherein the repeating temporal pattern is synchronized with the performance data using data from the rhythm markers, a rhythm marker being any data capable of indicating how performance data may synchronize with a repeating temporal pattern.

[Claim 2] The method of claim 1, wherein the synchronization is performed when remote data having both performance data and rhythm markers is received.

[Claim 3] The method of claim 1, wherein the synchronization is not performed during the performance of a local user.

[Claim 4] The method of claim 1, wherein the synchronization is performed based on the kind of performance data received.

[Claim 5] The method of claim 4, wherein the synchronization is performed if a particular sound capture device is used, and not performed if another sound capture device is used.

[Claim 6] The method of claim 4, wherein the synchronization is performed during a musical phrase, but not performed during speech.

[Claim 7] The method of claim 4, wherein the synchronization is performed during a certain subset of the repeating temporal pattern, and not performed during another.

[Claim 8] The method of claim 1, wherein the repeating temporal pattern includes a pattern of sounds.

[Claim 9] The method of claim 1, wherein the repeating temporal pattern includes a visual indication.