GB2506613A - User input timing data to generate a video for accompanying an audio track - Google Patents
- Publication number
- GB2506613A (application GB201217648A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- video
- group
- data
- content data
- video content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
- G10H1/368—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems displaying animated or moving pictures synchronized with the music or audio part
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
- G11B27/11—Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information not detectable on the record carrier
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/04—Synchronising
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/005—Non-interactive screen display of musical or status data
- G10H2220/011—Lyrics displays, e.g. for karaoke applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/325—Synchronizing two or more audio tracks or files according to musical features or musical timings
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
A computer implemented method is provided for generating a video to accompany an audio track. The method comprises receiving a selection of an audio track 101 to be accompanied by a video and receiving a selection of video content data 102 to be included in the video. Segments of the video content are displayed, on a display, whilst the audio track is simultaneously outputted, i.e. played 104. As the audio file is output, user input is received 105 indicating the time at which each segment of video content data should be displayed in relation to the position reached in the audio file. This timing data is stored for each group 106. A video is then generated 107, either locally or remotely, by generating a video frame for each group of video content data, the video frame including the video content data segment of the group for display, and associating 108 the timing data for each group with the corresponding video frame such that the video frame is displayed at a time based on the time indicated by the user input when the video is played with the audio track 109. The video and audio tracks are thus synchronized by the user. The generated video could find application in Karaoke devices.
Description
INTELLECTUAL PROPERTY OFFICE
Application No. GB1217648.3 RTM Date: 8 April 2013
The following terms are registered trademarks and should be read as such wherever they occur in this document: YouTube
Intellectual Property Office is an operating name of the Patent Office www.ipo.gov.uk
AUTOMATIC VIDEO GENERATION
Field
The present invention relates to a method and accompanying system for automatically generating a video to accompany an audio track.
Background of the Invention
Karaoke systems are used to display to a user the words, or lyrics, that accompany a song.
The system displays, as a video or slideshow on a display, groups of text representing lines of lyrics, sequentially in time with the occurrence of the lyrics within the song.
An aim of the present invention is to allow users to automatically generate their own videos to accompany songs, including videos that show lyrics to accompany the song.
Summary of the Invention
The invention is defined in the claims, to which reference is now directed. Preferred features are set out in the dependent claims.
Embodiments of the invention provide a computer implemented method for generating a video to accompany an audio track. The method comprises receiving a selection of an audio track to be accompanied by a video and receiving a selection of video content data to be included in the video. Segments of the video content are displayed, on a display, whilst the audio track is simultaneously outputted. As the audio file is output, user input is received indicating the time at which each segment of video content data should be displayed in relation to the position reached in the outputting of the audio file. This timing data is stored for each segment. A video is then generated, either locally or remotely, by generating a video frame for each segment of video content data, the video frame including the video content data segment for display, and associating the timing data for each segment with the corresponding video frame such that the video frame is displayed at a time based on the time indicated by the associated user input when the video is played with the audio track.
The segments of video content may be assigned to groups to make it easier to identify the segments. As the audio file is output, the user input is received indicating the time at which each group of video content data should be displayed in relation to the position reached in the outputting of the audio file. The timing data is stored for each group, and the video is then generated by generating a video frame for each group of video content data, the video frame including the video content data segment of the group for display. Timing data is associated, for each group, with the corresponding video frame such that the video frame is displayed at a time based on the time indicated by the associated user input when the video is played with the audio track. The method may include the step of dividing the video content data into segments, each segment to be assigned to a group.
The video content data may be text data, containing text to accompany the audio track, such as lyrics. The text is then divided into segments of text, such as a predetermined number of lines of text, and each segment is assigned to a particular group whereby each segment can be displayed and identified as a segment to which the user can assign timing data.
The video content data may be image data, containing images to accompany the audio track. The image data may be provided as a plurality of image files, each of which is considered to be a segment of the video content data, and each of which is assigned to a given group.
The step of displaying the video content of each group may further comprise displaying one or more indicators each indicative of a given group for which the user input can be received. The indicators may be icons or buttons each associated with a given group/segment. The user input may be provided by interacting with an icon or button for each group/segment using a user input device. Only one indicator may optionally be displayed at a given time, the method further comprising, once user input has been received for a first group, removing the indicator for the first group and displaying an indicator associated with the next group.
The timing data for each group/segment may be displayed as it is received from the user input.
The method may further comprise receiving user input to manually adjust the timing data for a given group/segment and, in response, manually adjusting the timing data for the group.
The method may further comprise automatically adjusting the time at which each group/segment of video content data should be displayed in relation to the position reached in the outputting of the audio file. The time at which each group of video content data should be displayed may be adjusted so that it is displayed earlier in relation to the position reached in the outputting of the audio file than the time indicated by the user input. The time may be adjusted by a predetermined period such as between 1 and 5 seconds.
When the video content data is text data, the size of the text characters displayed in each video frame may be determined based upon the amount of textual content contained in each group. The number of characters or number of words in each group of text data may determine the size of the text characters used, whereby the font size used for each frame is determined based on the number of characters or words in each group of text data. The size of the font used may be adjusted in proportion, and particularly in direct proportion, to the number of text characters or words used.
A corresponding computer program may be provided for use in a computer implemented method for generating a video to accompany an audio track. The computer program, when loaded onto a computing device, may be configured to cause the computing device to perform certain functions, in response to receiving a selection of an audio track to be accompanied by a video and receiving a selection of video content data to be included in the video. In particular, the program may cause the computing device to output for display the video content of each segment, whilst the audio file is simultaneously output, and, as the audio file is output, in response to receiving user input indicating the time at which each segment of video content data should be displayed in relation to the position reached in the outputting of the audio file, store the timing data for each segment. As described in the above method, a video is then generated by generating a video frame for each segment of video content data, the video frame including the video content data segment for display, and associating the timing data for each segment with the corresponding video frame such that the video frame will be displayed at a time based on the time indicated by the associated user input when the video is played with the audio track. This video may be generated under the control of the computer program, or alternatively the computer program may cause the computing device to send the timing data to a further computing device to generate the video.
As with the method described above, the segments of video content may be assigned to groups under control of the computer program.
The computer program, when executed on a computing device, may be configured to cause the computing device to carry out any of the methods described herein.
A corresponding computing device may be provided for use in a computer implemented method for generating a video to accompany an audio track. The computing device comprises a processor configured to receive a user selection of an audio track to be accompanied by a video and receive a user selection of video content data to be included in the video, or in response to receiving such selections perform certain functions. In particular, the device may be configured to output for display, on a display, segments of the video content and simultaneously output the audio file to an audio output device, and, as the audio file is output, receive user input indicating the time at which each segment of video content data should be displayed in relation to the position reached in the outputting of the audio file. The device may store the timing data for each group on a storage medium, which may be part of the computing device or accessible by it. As described in the above method, a video is then generated by generating a video frame for each group of video content data, the video frame including the video content data segment of the group for display, and associating the timing data for each group with the corresponding video frame such that the video frame will be displayed at a time based on the time indicated by the associated user input when the video is played with the audio track. This video may be generated by the device, or alternatively the device may send at least the timing data to a further computing device to generate the video.
As with the method described above, the segments of video content may be assigned to groups by the computing device.
The computing device may be configured to carry out any of the methods described herein.
Brief Description of the Drawings
Examples of the invention will now be described in more detail with reference to the accompanying drawings in which:
Figure 1 is a flow chart for creating a video to accompany an audio track according to an embodiment of the invention;
Figure 2 is an example of an input screen that can be used to enter the track and lyric data;
Figure 3 is an example input screen that may be presented to the user once they have input the audio track and the text data/lyrics to accompany it;
Figure 4 is an example of an input screen that can be used to enter the track and image data;
Figure 5 is an example input screen that may be presented to the user once they have input the audio track and the image files to accompany it; and
Figure 6 is a schematic example of hardware that may be used to implement embodiments of the invention.
Detailed Description of the Preferred Embodiments
Embodiments of the invention are implemented on a computer system. This may include user devices such as a desktop computer, laptop, tablet computer, mobile phone or smartphone, PDA and other similar devices capable of processing audio tracks in the required way. Certain processing steps may optionally occur at a remote server connected to a user device over a network such as the internet, with the user device receiving user input and in turn providing this user input, or other data, to the remote server to select and process the multimedia in accordance with the user's requests. The remote server system may include a database for storing the audio tracks, and is coupled to the user devices via a network, the user devices having a display and an input device such that the user can provide input to select one or more tracks.
Figure 1 shows a computer implemented method according to an embodiment of the invention for automatically generating videos to accompany an audio track. The method may be carried out on one or more computing devices, since steps may be carried out at a user device local to the user, and steps may be carried out at a remote server. The method is implemented by one or more computer programs operating on the computing devices.
At step 101 the user provides input to select a song/audio file for which they would like to create an accompanying video. This selection process may involve the user selecting a song from a list of songs that are accessible by the computer, or a server, e.g. by being stored in a memory or storage device accessible by the computer or server, or the selection may involve the user uploading the song for which they would like to create the video. At step 102 the user then provides input to select the data making up the video content to be turned into a video to accompany the song. This could include text data, when the content to accompany the song is the song lyrics, or it could include image data, when the content to accompany the song is a plurality of images. Again, this data could be already accessible to the computer, or it could be uploaded by the user.
At step 103 segments of the video content data may be assigned to groups, which are used to assign timing information to the segments of content data. Each segment is assigned to only one group. The segments may be automatically generated, particularly when the video content data is text data representing song lyrics. Alternatively, the segments may be based on individual files, particularly when the video content data is a set of images, each contained in its own image file. The groups, when combined, include all the video content data to be displayed synchronously with the audio track. The groups are optionally sequential, such that when displayed in order they correspond to the order in which the video content data is to be displayed alongside the song. Strictly speaking, assignment to groups may not be actively required, since the segments themselves could each be considered to be a "group" of content data. All that may be needed is for the content data to be appropriately divided into segments that will each occupy their own frame of video. However, this could still be considered to involve "grouping" portions of video content data, which is a useful feature since it can allow content groups to be displayed and easily identified by the user, and the term will be used in the following discussion.
At step 104 the audio file containing the song is output to an output device, such as one or more speakers, so that the user can listen to it. Simultaneously, the video content contained in each group is displayed. As the audio file is being output, at step 105, the user provides input, which is received by the computer, to indicate the time at which each grouping of video content data should be displayed over the song. For example, as the song plays, the user provides input to indicate when the first group should start being displayed. As the song progresses the user provides input to indicate when the next group should start being displayed, replacing the previous group. This can continue until all groups have had timing data assigned to them.
In order to allow the user to provide input to indicate the timing data, each group displayed in step 104 may have an indicator associated with it. The indicators are displayed for the user to see, and each is indicative of, or associated with, a given group for which user input can be received. The indicators allow the user to determine which group of video data they are providing timing input for. For example, the indicators may be icons or buttons each associated with a given group, and the user input may be provided by interacting with the icon or button for each group using a user input device, for example by "clicking" the button at the appropriate time. Other types of indicator are also possible, such as highlighting the group of video data for which timing data will be assigned in response to user input.
Optionally, only one indicator may be displayed at a given time such that the user does not accidentally click or use the wrong indicator to assign timing data. Once user input has been received for a first group, and timing data assigned to it, the indicator for that group may be removed and the indicator associated with the next group displayed.
At step 106 the timing data for each group is stored so that it can be associated with each group. The video can then be generated at step 107 by generating a video frame for each group containing the content within each segment for display. A given video frame may be displayed for a reasonably extended period of time, since any given frame will likely correspond to several seconds of the audio track. For example, when the segments relate to portions of the lyrics for the audio track, each segment of the lyrics will need to be displayed for several seconds as the lyrics occur within the track. When the segments relate to images, these may be intended to be still images each displayed over a portion of the audio track sequentially. As such, the video may be considered in certain embodiments to be a slide show rather than a conventional video intended to show moving images.
At step 108 each frame, which corresponds to a particular group, is associated with the timing data established for the relevant group. The timing data can be stored in metadata accompanying the video file, which instructs the media player, or media players, used to output the video and audio track simultaneously when each frame within the video, or each slide, should be displayed in relation to the position/timing reached within the audio track.
The video and audio data may be stored as a combined media item for output by a media player. When a user opts to output the media item, the audio file is played, and the corresponding video is output sequentially with the timings of each frame specified by the accompanying metadata based upon user input.
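By way of illustration only (this sketch is an editorial addition, not part of the original disclosure), the per-frame timing metadata might be represented as a simple cue list; the type and field names below are invented for the example:

```typescript
// Hypothetical cue-list metadata for the combined media item.
// Names are illustrative; the patent does not specify a format.
interface FrameCue {
  groupIndex: number;   // which group/segment the frame shows
  startSeconds: number; // playback position at which the frame appears
}

interface MediaItemMetadata {
  audioTrack: string; // e.g. file name or URL of the audio track
  cues: FrameCue[];   // sorted by startSeconds; each frame is shown from
                      // its cue time until the next cue (or the track end)
}
```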
Each video frame is displayed at a time based on the time indicated by the associated user input when the video is played with the audio track at steps 104 and 105. Although the timing data used may correspond to the timing data obtained from the user input received as the audio track plays in steps 104 and 105, the timing data may be subsequently altered manually and/or automatically. In particular, user input may be received to manually adjust the timing data for a given group and in response the computer adjusts the timing data for that group. The time at which each group of video content data should be displayed in relation to the position reached in the outputting of the audio file may be automatically adjusted so that it is displayed earlier in relation to the position reached in the outputting of the audio file than the time indicated by the user input. The time may be adjusted by a predetermined period, such as between 1 and 5 seconds, to account for the fact that the user will normally provide their input slightly too late, after the audio has reached the relevant section corresponding to the group.
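A minimal sketch of such an automatic adjustment, reusing the hypothetical FrameCue type from the previous sketch and assuming a fixed 2-second lead from within the stated 1-5 second range:

```typescript
// Shift every user-entered cue earlier by a fixed lead time to
// compensate for the user clicking slightly late, clamping at zero.
const LEAD_SECONDS = 2; // assumed value from the 1-5 second range

function applyLeadTime(cues: FrameCue[]): FrameCue[] {
  return cues.map(cue => ({
    ...cue,
    startSeconds: Math.max(0, cue.startSeconds - LEAD_SECONDS),
  }));
}
```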
An embodiment of the invention will now be described in relation to Figures 2 and 3. This embodiment takes audio tracks in any suitable audio file format, such as MP3s or WAVs, and turns them into videos with accompanying text data such as song lyrics.
Firstly, the user chooses the song to be turned into a video, and also the text to accompany the track, which may be selected by inputting text, such as by copying and pasting the song lyrics as text from another source.
Figure 2 shows an input screen that can be used to enter the text data to accompany a selected track. Once the user selects the audio track they are presented with an input box 201 into which they can insert the text data to be displayed alongside the audio track. The user can then click the "proceed" icon, or button, 202.
Figure 3 shows an example input screen that may be presented to the user once they have input the audio track and the text data/lyrics to accompany it. As shown in Figure 3, media player controls 301 are provided that allow the user to play the audio track, along with additional identifiers 302 assigned to specific segments of the lyric text data which have been grouped into sequential groups. Using this input screen, the audio track is played or output for the user to hear, and cue points are assigned to the lyric portions as the audio plays.
The media player may be any sort of player that allows the user to play the track.
Preferably the media player has the functionality to play and pause the track and also to seek to any location within the track using a control such as a scroll bar. The term "media player" used herein may refer to any suitable program or application for decoding audio data and providing appropriate control signals to an audio output device, such as one or more speakers, to output the audio to a user. In particular, browser-based plug-in media players may be used, such as a "flash player", which uses an "Adobe Flash" plug-in to play audio music tracks, or a JavaScript media player, such as the type used by YouTube. Also, media players based on HTML5 or similar protocols can be used to achieve the same purpose using, for example, an HTML5 based media player. As well as browser based plug-ins, application, or "app", type media player programs of the sort used by smartphones, PDAs, tablet computers and the like may be used. As described herein, the audio data played by the media player may be received from a remote server.
The media player, or a computer program interacting with the media player program, maintains a log of the position of the track in question. As the track reaches a given position, such as 0.53 seconds or 19.43 seconds etc., the media player, or computer program, receives or determines this position. This is done so that the computer program/system can assign each block of text to a specific point in the video.
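For a browser-based player of the kind discussed above, one plausible way to maintain such a position log is the standard HTML5 media API; the following is an illustrative sketch rather than the patent's implementation:

```typescript
// Keep a running record of the playback position so that cue
// assignment can read it at the moment the user provides input.
const audio = document.querySelector('audio')!; // the player element
let currentPosition = 0; // seconds reached in the track

audio.addEventListener('timeupdate', () => {
  currentPosition = audio.currentTime;
});
```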
Below the player are displayed rows of the lyrics. These rows have been divided into blocks, or groups, of a predetermined number of rows of text. In the example of Figure 3, four rows of text are used for each block. Any line breaks that may have been included in the original text copied into the input box of Figure 2 may be removed. These line breaks may have been introduced by the original formatting of the text, or may be due to format changes when copying the text from a source. For example, the following four lines are shown with a line break between them to demonstrate this feature:

"Once I lived the life of a millionaire
Spending my money, I didn't care

I carried my friends out for a good time
Buying bootleg liquor, champagne and wine"

The above text features a line break between the second and third lines. Removing the line break, these lines become:

"Once I lived the life of a millionaire
Spending my money, I didn't care
I carried my friends out for a good time
Buying bootleg liquor, champagne and wine"

As such, the line break between the two lines of one of the verses is removed, and this can be repeated throughout the text to remove all line breaks as necessary.
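A minimal sketch of this preparation step, stripping blank lines from the pasted lyrics and grouping the remaining rows into blocks of four as in the Figure 3 example (again an editorial illustration, not part of the disclosure):

```typescript
// Remove blank lines introduced by formatting, then group the lyric
// rows into blocks of a predetermined size (four in Figure 3).
function groupLyrics(rawText: string, linesPerBlock = 4): string[][] {
  const lines = rawText
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line.length > 0); // drops the unwanted line breaks

  const blocks: string[][] = [];
  for (let i = 0; i < lines.length; i += linesPerBlock) {
    blocks.push(lines.slice(i, i + linesPerBlock));
  }
  return blocks;
}
```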
Associated with each block, or group, of text is an indicator. In the example of Figure 3, the indicator is a button, or icon, located to the right of each block, with an accompanying form field that displays data indicative of the timing data selected for that block by the user. As the audio track is output, the user may click on the button associated with a given block of text. Whatever time the player has reached for the audio track at the point at which the user clicks the button is the time that is entered by the program into the form field. For example, if the track has reached 9.4 seconds from the start, and the user clicks on the first button associated with the first group or block of text, they would see the form field change to read 9.4 seconds. It is also, optionally, possible for the user to manually change this number by typing into the form field if desired. This process is repeated for each of the blocks of lyrics, and is used to assign frame timings to the video being generated.
The user input, which in the example of Figure 3 is received by activating the buttons 302 marked "click when lyrics start", assigns the current time of the track being played to the video data group. In Figure 3 this is the lyric block to the left of the button. For example, clicking the button at 5.47 seconds into the track enters this value into the associated field, as demonstrated in Figure 3, and determines when that lyric block will begin in the video that is produced. If the user clicks the button next to the lyric block below at, for example, 12.54 seconds, then in certain embodiments the first block would be displayed from 5.47 to 12.54 seconds, and the second block from 12.54 seconds until the time determined for the third block is reached.
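Continuing the browser sketch above (the element ids are invented, and currentPosition is the position log from the earlier sketch), the button handler simply copies the current playback position into the block's form field:

```typescript
// When the user clicks the button for a block, record the position the
// track has reached as that block's cue time.
function wireCueButton(blockIndex: number): void {
  const button = document.getElementById(`cue-button-${blockIndex}`)!;
  const field = document.getElementById(
    `cue-field-${blockIndex}`,
  ) as HTMLInputElement;

  button.addEventListener('click', () => {
    field.value = currentPosition.toFixed(2); // e.g. "5.47"
  });
}
```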
Once the process has been completed the method can proceed to the next stage, optionally by the user clicking the button at the bottom 304 to proceed to the final stage.
Optionally a check is performed to determine whether every form field has a number entered into it. If not then the user can be presented with a message indicating that they have missed a field and asking whether they wish to continue. The user can then either enter timing data for the missed blocks, or opt to continue without displaying these blocks.
Once all form fields are complete, or the user has opted to omit any incomplete fields, the method can proceed.
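The completeness check might look like the following sketch; the patent does not prescribe an implementation:

```typescript
// Return the indices of any blocks whose cue field was left empty, so
// the user can be warned before the video is generated.
function findMissingCues(fields: HTMLInputElement[]): number[] {
  return fields
    .map((field, index) => (field.value.trim() === '' ? index : -1))
    .filter(index => index !== -1);
}
```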
In order to create the video to accompany the audio track the software/program arranges each of the blocks or groups of text into a frame. For example, in the example of Figure 3 the first frame of video would contain the following lyrics:

"Once I lived the life of a millionaire
Spending my money, I didn't care
I carried my friends out for a good time
Buying bootleg liquor, champagne and wine"

These rows of text are converted into a frame in the video. Preferably the text is converted into a predetermined format, including a predetermined colour, font and/or size, and optionally with a predetermined background colour. For example, white Tahoma font of size 18 with a black background may be used.
The amount of text in each group may determine the properties of the font used. In particular, the number of characters or number of words in each group or block of text may determine the type, size and/or spacing of the font used. Thus the font size used may be adjusted in proportion to the number of characters or words in each group/block of text.
For example, a particular group containing only five text characters in the lyrics may be displayed using a font that is relatively large compared to a group that contains 50 characters in the lyrics. By adjusting the size of the font in dependence upon the number of words or characters contained within the particular block, the lyrics displayed in a frame can be expanded or contracted to fit into the display size of the frame as appropriate. The screen size to be filled may amount to a portion of the total screen size to account for borders. For example, the screen size to be filled may be around 70% of the entire screen size, allowing approximately 15% on each side for borders.
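One plausible mapping is sketched below; it is an assumption for illustration, since the text above only requires that the font size depend on the amount of text and that roughly 70% of the screen be filled:

```typescript
// Choose a font size that shrinks as the character count grows, so a
// short block is rendered large and a long block still fits the frame.
function fontSizeForBlock(text: string, frameWidthPx: number): number {
  const usableWidth = frameWidthPx * 0.7; // ~15% border on each side
  const size = usableWidth / Math.sqrt(Math.max(1, text.length));
  return Math.round(Math.min(72, Math.max(12, size))); // assumed bounds
}
```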
Optionally the first frame of the video may be a predetermined picture, text, or combination of the two, that is independent of the video content data selected by the user. This may include a logo or advertisement, and/or the artist of the track, and/or title of the track, and/or the username of the user who is creating the video. This first frame can be shown for a predetermined period of time, such as between the first 2-5 seconds of the video and preferably around the first 3 seconds of the video. If a time greater than this value, such as 5.47 seconds, has been entered as the time for the first lyric group to be displayed, then several options are possible, and may be applied to all embodiments of the invention. The title frame may be shown from 0 to 3 seconds, and then the first lyric group would show from the end of the title frame (3 seconds in this example) until the second group of lyrics is due to be displayed, which in the example of Figure 3 corresponds to 12.54 seconds.
Embodiments that implement this may therefore not require the user to specify the timing data for the first group of lyrics or, more generally, the video data content, because the first group of lyrics can simply be displayed after the first frame has been shown for a predetermined period, and until the second group are scheduled to be displayed.
However, the predetermined period for displaying the first frame may impinge upon the timing during which the first lyrics are scheduled to appear, in which case it may be desirable for the user to be provided with the option to input timing data for the first group.
In this case, the first frame could be displayed for a predetermined period unless the first group of lyrics commences prior to the expiry of the predetermined period, wherein the system/program will optionally override the predetermined period and display the first group at the allotted time. Alternatively, the first frame may simply be automatically displayed from the start of the song until the timing data indicates that the first group of lyrics should be displayed. Whilst the above has been described for an embodiment for producing text based videos, clearly it can be applied to any other embodiment.
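The rule just described can be condensed into a short sketch, taking 3 seconds as the assumed predetermined period:

```typescript
// Decide when the title frame ends: after the default period, or
// earlier if the first lyric group is cued before that period expires.
function titleFrameEndSeconds(firstLyricCue?: number): number {
  const DEFAULT_TITLE_SECONDS = 3; // assumed predetermined period
  return firstLyricCue === undefined
    ? DEFAULT_TITLE_SECONDS
    : Math.min(DEFAULT_TITLE_SECONDS, firstLyricCue);
}
```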
Optionally, the same frame as used for the first frame may be displayed for a period of time at the end of the video. This may include a predetermined period of time such as between the last 2-5 seconds of the video and preferably around the last 3 seconds of the video.
Alternatively, the period of time may be determined based upon the cue point for the final frame of the selected video content, such as a predetermined period after this frame.
A further embodiment of the invention will now be described in relation to Figures 4 and 5.
This embodiment takes audio tracks in any suitable audio file format, such as MP3s or WAVs, and turns them into videos with images, such as a slideshow, instead of, or as well as, showing lyrics.
Users firstly need to select the audio track for which a video is to be made, such as in any of the manners described above. Then the method can proceed to the next step.
Figure 4 shows an input screen that can be used to enter image data. Once the user selects the audio track they are presented with an input box which they can use to select the video content data, in the form of images, to be displayed alongside the audio track.
The number of images that can be selected may be limited to a predetermined number. In the example of Figure 4, the number of images is limited to 20 or fewer.
Selecting the images may involve selecting images stored on the user's computer. Where parts of the method are performed remotely, selecting the images may involve uploading desired images from the user's local computing device, or selecting images from a library of images stored at, or accessible by, a server.
The images, if necessary, are adjusted to fit a predetermined aspect ratio. For example, in embodiments that are producing a video for use with a media sharing website such as YouTube, the aspect ratio may be 6:4. In particular, to minimise the amount of image adjustment required, one of the dimensions of the image may be fixed, and the other dimension adjusted to fit the desired aspect ratio. For example, the width of the image may be fixed, and the height adjusted as appropriate. For example, if the image was 500x517 pixels, the width (500 pixels) is taken as fixed, and in order to reach the desired 6:4 aspect ratio the height is reduced down to 333 pixels. The height may be reduced by scaling the image height, or by cropping the image as appropriate. Optionally, the image to be used as the video content data is obtained from the centre of the image. This can be achieved by cropping equally from both the top and bottom of the image to ensure a centred image.
The image adjustment to the desired image size can be performed for all of the images that have been uploaded so they are transformed into images of the appropriate aspect ratio, such as 6:4.
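The crop described above reduces to a small calculation; the sketch below reproduces the 500x517 to 500x333 example and is illustrative only:

```typescript
// Fix the width and centre-crop the height to a 6:4 aspect ratio,
// cropping equally from the top and bottom of the image.
function cropToSixByFour(width: number, height: number) {
  const targetHeight = Math.round((width * 4) / 6); // e.g. 500 -> 333
  const overflow = Math.max(0, height - targetHeight);
  const top = Math.floor(overflow / 2); // equal top/bottom crop
  return { left: 0, top, width, height: Math.min(height, targetHeight) };
}
```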
Once the images have been selected, and optionally uploaded if a server is being used to perform certain steps of the method, they may be assigned to groups. In this example, each image is in the form of an image file, which can be considered a segment of the video content data, and may be assigned to its own "group", since the group contains the image data that will make up the image displayed on a particular video frame or slide. Therefore, a group may be comprised of a single image file. In certain embodiments, more than one image could be assigned to a group if the user wanted to display more than one image on a given frame, however. As can be seen from the Figures, grouping involves displaying the images, or text data, in the assigned groups so that the groups can be assigned timing data.
Figure 5 shows an example input screen that may be presented to the user once the images to accompany the selected audio track have been selected and, if necessary, resized. As with the embodiment described above, media player controls 501 are provided that allow the user to play the audio track, along with additional identifiers 502 assigned to specific segments of the image data which have been grouped into sequential groups, in this case each group containing a single image file. Using this input screen, the audio track is played or output for the user to hear, and cue points are assigned to the images as the audio plays.
Reduced size representations 503, or "thumbnails", of each of the images making up each group are displayed alongside the identifiers 502, such that the user can tell which image, or which images, are being assigned to which portion of the audio track in the resulting video.
The method then involves playing the track and assigning cue points to synchronize the image slides with the audio. This can be achieved in the same manner as described above, such as for the lyric video embodiment, including all optional features as appropriate.
As with the lyric video embodiment, there may optionally be an opening and/or closing frame/slide containing the same type of information and displayed in the same way.
Once timing data has been assigned to each image group, these images are then turned into a video or slide show. This is achieved by the software/program arranging each of the blocks or groups of images into a frame containing each image. For example, in the example of Figure 5, the first frame of video would contain Image 1, the second frame Image 2 and so forth. Each image is converted into a frame in the video, or a slide.
In order to transfer between frames/slides, one or more transition effects may be used. For example, a simple fade, whereby one image fades to the next, may be used among others.
In all embodiments, once the video accompanying the audio track is completed, the combined video and audio track may be formed into a media item containing both the video and audio files, along with metadata containing the timing data for each video frame in order to display the correct frame at the desired time indicated by the user. Such media items may be uploaded to video sharing websites for example.
Embodiments of the invention may be implemented entirely on a single computer, such as a user's computing device, including generating the video and a media file containing the audio and video and associated metadata. However, optionally, embodiments may be performed in a distributed manner. In particular, a server may stream the audio track and the timing data together to the user's device, so that, for example, MP3 data and track position/location data constantly stream in. The user device plays the audio track on a media player, e.g. a browser based media player. The playback timing may therefore originate from a server, for example, at any given time the computer program will receive data such as "track 1: 0.54secs" indicative of the location reached by the selected track being streamed by the user device, as well as the data required by the media player to play the track. A server may therefore send to the media player executing on a local client device the streaming audio/video file plus constant/regular timing or position updates for use to determine track position.
When the user inputs timing data, such as by clicking a button, a computer program, such as a browser-based script, which may be operating on the user device, receives and determines the current location data as it streams in, and adds this data into the form field.
Of course, if the track being played is stored locally on the user device, or has been downloaded from the server rather than streamed from it, the timing information may be determined at the user device instead.
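A sketch of the streaming case, assuming the server's update messages follow the "track 1: 0.54secs" pattern quoted above (the exact message format is an assumption), with currentPosition being the position log from the earlier sketches:

```typescript
// Parse a server position update and refresh the local position log
// used by the cue-assignment sketches above.
function handlePositionMessage(message: string): void {
  const match = message.match(/track\s+\d+:\s*([\d.]+)\s*secs/i);
  if (match) {
    currentPosition = parseFloat(match[1]);
  }
}
```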
When the resulting video is being created, the timing data received, e.g. the timing data in the form fields of Figure 3, along with the video content data, such as text/lyrics/images, may be sent to a server for construction of the video. The finished video can then be streamed or downloaded to the user device, and to other user devices, to output for consumption.
Such distributed arrangements may require one or two servers plus a local program operating on the user device such as a browser-based "plugin". This is, of course, one option. It is possible to perform all the relevant steps locally, for example via a self-contained application, but performing the method in a distributed manner, by performing some steps at the user device and some at a remote computer/server, has advantages since, for example, video editing can be time and processor intensive and is better suited to a more powerful server.
Figure 6 shows an example of a system that may also be used to implement embodiments of the invention. Figure 6 shows a remote computing device, such as a server 701, arranged to perform the methods described herein in conjunction with a local client user device 704. The server comprises a CPU 702 which may be configured to perform the relevant calculations such as one or more of assigning segments of video content data to groups, receiving and associating timing data with each group and associating the timing data with video frames and generating the video to accompany the audio track. Other functional components, such as RAM, may be provided but are not shown for simplicity.
The CPU is coupled to a database, memory portion or store 711 for storing the collections of tracks for use in the methods described herein, as well as the resulting media items containing video and audio, and associated metadata. The database may be stored on a common memory device, such as a hard disk drive or other type of storage device suitable for storing multimedia. Alternatively, collections of tracks may be stored in separate memories or stores to allow easier backup and updating/uploading of tracks. The memory upon which the track database and/or metadata is stored is coupled to the CPU via a common bus.
The server system 701 further includes an input and output for receiving and sending data to other devices. The input/output 709 is shown as a common unit in Figure 6, but may be provided as separate interfaces. In either case, the server is preferably connected or connectable to a plurality of client user devices over a network such as the internet, or a local or private network. The network may be wired or wireless or a combination of both.
As shown in Figure 6, in communication with the server 701 is a user device 704 such as a computing device, laptop, tablet computer, smartphone or similar. The user device 704 may be in communication with the server 701 via a network connection such as the internet through which the user device can provide tracks for which a video is to be created, or input to the server for selecting tracks, as well as timing data. The network connection may alternatively be a local network, such as a home network, with the server being a local media server. The server can also send data to the user device including, for example, indications of possible tracks that can be selected by the user and for providing the final video, as well as audio track data and accompanying track timing position data if the track is to be streamed to the user device to input video content timing data.
The user device 704 includes a CPU 705 configured for performing the necessary calculations to receive user input regarding timing data and to provide this to the server, as well as to execute the media player software. The user device may also, in some embodiments, perform one or more of assigning segments of video content data to groups, receiving and associating timing data with each group and associating the timing data with video frames and generating the video to accompany the audio track. The CPU is coupled to a display 706 which may or may not be integral to the user device, a user input device 707 for receiving input from the user, which again may or may not be integral to the user device (e.g., a touch screen, keyboard or mouse), and a memory 708 such as a hard disk drive for storing tracks for replay by the user device if required. The user device will also have an audio output device 710, such as a speaker, for outputting the tracks being streamed to the user. The audio output device may be integral to the user device 704, or may be separate from it.
It is possible that more than one server will be used to provide the tracks to a plurality of user devices. Alternatively, the user device may use tracks that are stored locally as mentioned above. Indeed, the user device may perform some or all of the relevant steps of the method described herein, with the server being used, if required by the user, to upload videos such as uploading to the internet for others to view.
According to certain embodiments of the invention all or part of the method may be performed by a computer program that executes within a web browser program such as Internet Explorer or Firefox for example, or is associated therewith.
In the above description, the term "track" has been used to refer to audio files containing a piece of music or a song. The audio files may be in any appropriate format such as AIFF, WAV, FLAC, WMA, RealAudio, MP3 or any file type that may accompany a video file.
Embodiments of the invention can be implemented as a distributed system involving a server and one or more user devices that access audio tracks stored at the server. The user input may be received at the server over a network, such as a local network or the internet, from a user device such as a desktop computer, laptop, tablet computer, mobile phone or smartphone, PDA. Any of the calculations and processes involved may be performed locally at the user device or remotely at the server as appropriate. For example, the processes and calculations involved in assigning segments of the video content data to groups, storing timing data for each group, generating video frames for each group and associating timing data for each group with video frames may each be performed locally on the user device or remotely at the server. It should be appreciated that it is the method steps described above that are important, and not, necessarily, where those steps are performed.
Claims (26)
1. A computer implemented method of generating a video to accompany an audio track, the method comprising: -receiving a selection of an audio file to be accompanied by a video; -receiving a selection of video content data to be included in the video; -assigning segments of the video content data to two or more groups; -displaying, on a display, the video content of each group and simultaneously outputting the audio file and receiving, as the audio file is output, user input indicating the time at which each group of video content data should be displayed in relation to the position reached in the outputting of the audio file; -storing the timing data for each group; -generating a video by generating a video frame for each group of video content data, the video frame including the video content data segment of the group for display, and associating the timing data for each group with the corresponding video frame such that the video frame is displayed at a time based on the time indicated by the associated user input when the video is played with the audio file.
2. A method according to claim 1 further comprising dividing the video content data into segments, each segment to be assigned to a group.
3. A method according to claim 2 wherein the video content data is text data, containing text to accompany the audio file.
4. A method according to claim 3 wherein the text is received as lines of text, wherein each segment comprises a predetermined number of lines of text.
5. A method according to claim 4 wherein the number of lines of text is four.
6. A method according to claim 1 or 2 wherein the video content data is image data, containing images to accompany the audio file.
7. A method according to claim 6 wherein the image data is provided as a plurality of image files, each of which is assigned to a given group.
8. A method according to any preceding claim wherein the step of displaying the video content of each group further comprises displaying one or more indicators each indicative of a given group for which the user input can be received.
9. A method according to claim 8 wherein the indicators are icons or buttons each associated with a given group.
10. A method according to claim 9 wherein the user input is provided by interacting with an icon or button for each group using a user input device.
11. A method according to any of claims 8 to 10 wherein only one indicator is displayed at a given time, the method further comprising, once user input has been received for a first group, removing the indicator for the first group and displaying an indicator associated with the next group.
12. A method according to any preceding claim further comprising displaying the timing data for each group as it is received from the user input.
13. A method according to any preceding claim further comprising receiving user input to manually adjust the timing data for a given group and, in response, manually adjusting the timing data for the group.
14. A method according to any preceding claim further comprising automatically adjusting the time at which each group of video content data should be displayed in relation to the position reached in the outputting of the audio file.
15. A method according to claim 14 wherein the time at which each group of video content data should be displayed is adjusted so that it is displayed earlier in relation to the position reached in the outputting of the audio file than the time indicated by the user input.
16. A method according to claim 15 wherein the time is adjusted by a predetermined period.
17. A method according to claim 16 wherein the period is between 1 and 5 seconds.
18. A method according to any preceding claim dependent upon claim 3 wherein the size of the text characters displayed in each video frame is determined based upon the amount of textual content contained in each group.
19. A method according to claim 18 wherein the number of characters or number of words in each group of text data determines the size of the text characters used, whereby the font size used for each frame is determined based on the number of characters or words in each group of text data.
20. A method according to claim 19 wherein the size of the font used is adjusted in direct proportion to the number of text characters or words used.
21. A computer program for use in the method of generating a video to accompany an audio file of claim 1, the computer program, when loaded onto a computing device, being configured to cause the computing device, in response to receiving a selection of an audio file to be accompanied by a video and receiving a selection of video content data to be included in the video, to: -assign segments of the video content data to two or more groups; -output for display the video content of each group and simultaneously output the audio file, and, as the audio file is output, in response to receiving user input indicating the time at which each group of video content data should be displayed in relation to the position reached in the outputting of the audio file, store the timing data for each group; -generate a video, or send the timing data to a further computing device to generate a video; -wherein the video is generated by generating a video frame for each group of video content data, the video frame including the video content data segment of the group for display, and associating the timing data for each group with the corresponding video frame such that the video frame will be displayed at a time based on the time indicated by the associated user input when the video is played with the audio file.
22. A computer program according to claim 21 which when loaded onto a computing device is configured to cause the computing device to carry out any of the methods in claims 2 to 17.
23. A computing device for generating a video to accompany an audio file, the computing device comprising a processor configured to: -receive a user selection of an audio file to be accompanied by a video; -receive a user selection of video content data to be included in the video; -assign segments of the video content data to two or more groups; -output for display, on a display, the video content of each group and simultaneously output the audio file to an audio output device, and, as the audio file is output, receive user input indicating the time at which each group of video content data should be displayed in relation to the position reached in the outputting of the audio file; -store the timing data for each group on a storage medium; -generate a video, or send the timing data to a further computing device to generate a video; -wherein the video is generated by generating a video frame for each group of video content data, the video frame including the video content data segment of the group for display, and associating the timing data for each group with the corresponding video frame such that the video frame will be displayed at a time based on the time indicated by the associated user input when the video is played with the audio file.
24. A computing device according to claim 23 further configured to carry out any of the methods in claims 2 to 17.
25. A computing device according to claim 23 or 24 wherein the computing device is a desktop computer, laptop, tablet, smartphone, PDA or other suitable computing device, and wherein the further computing device is a server.
26. A computing device substantially as herein described with reference to the accompanying Figures.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB201217648A GB2506613A (en) | 2012-10-03 | 2012-10-03 | User input timing data to generate a video for accompanying an audio track |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB201217648A GB2506613A (en) | 2012-10-03 | 2012-10-03 | User input timing data to generate a video for accompanying an audio track |
Publications (2)
Publication Number | Publication Date |
---|---|
GB201217648D0 GB201217648D0 (en) | 2012-11-14 |
GB2506613A true GB2506613A (en) | 2014-04-09 |
Family
ID=47225579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB201217648A Withdrawn GB2506613A (en) | 2012-10-03 | 2012-10-03 | User input timing data to generate a video for accompanying an audio track |
Country Status (1)
Country | Link |
---|---|
GB (1) | GB2506613A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5621538A (en) * | 1993-01-07 | 1997-04-15 | Sirius Publishing, Inc. | Method for synchronizing computerized audio output with visual output |
US5705762A (en) * | 1994-12-08 | 1998-01-06 | Samsung Electronics Co., Ltd. | Data format and apparatus for song accompaniment which allows a user to select a section of a song for playback |
WO2002077966A2 (en) * | 2001-03-23 | 2002-10-03 | Koninklijke Philips Electronics N.V. | Synchronizing text/visual information with audio playback |
Non-Patent Citations (1)
Title |
---|
"Make Karaoke slide show using Microsoft Office Powerpoint 2007" * |
Also Published As
Publication number | Publication date |
---|---|
GB201217648D0 (en) | 2012-11-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11636881B2 (en) | User interface for video content | |
CA3004231C (en) | Enhancing video content with extrinsic data | |
US8955021B1 (en) | Providing extrinsic data for video content | |
US9336685B2 (en) | Video lesson builder system and method | |
JP7293338B2 (en) | Video processing method, apparatus, device and computer program | |
US9558784B1 (en) | Intelligent video navigation techniques | |
US20210084350A1 (en) | Method and system for customized content | |
JP6705625B2 (en) | Movie reproduction program, device, and method | |
US20180048937A1 (en) | Enhancing video content with personalized extrinsic data | |
CN105051820B (en) | Information processing equipment and information processing method | |
JP2019152860A (en) | Musical score providing system, method, and program | |
US11128927B2 (en) | Content providing server, content providing terminal, and content providing method | |
CN117786159A (en) | Text material acquisition method, apparatus, device, medium and program product | |
KR101477492B1 (en) | Apparatus for editing and playing video contents and the method thereof | |
GB2506613A (en) | User input timing data to generate a video for accompanying an audio track | |
JP6326917B2 (en) | Server device, conference review system, and conference review method | |
JP2015194832A (en) | Content output device, content distribution server, content output method and content output program | |
KR102432310B1 (en) | Import media libraries using graphical interface analysis | |
JP2024157096A (en) | Media playback using gestures on the touch screen | |
JP6752111B2 (en) | Content playback device and content playback method | |
JP6286624B2 (en) | Music update type video generation system | |
KR101766527B1 (en) | Method and system for providing post | |
JP5605083B2 (en) | Video playback device and video playback program | |
WO2020250217A1 (en) | Method and system for managing, sharing and executing digital content in a controlled manner | |
JP2009020941A (en) | Audio data processing terminal system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |