CN105247879A - Client device, control method, system and program
- Publication number
- CN105247879A (application CN201480029851.7A / CN201480029851A)
- Authority
- CN
- China
- Legal status: Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Social Psychology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention provides a client device capable of presenting emotion expression data indicating the viewing reaction of other users, or of the user of this client device, as well as a control method, a system, and a program. This client device is provided with: an acquisition unit which acquires the viewing user's reaction to content; an emotion estimation unit which estimates the viewing user's emotions on the basis of the information about the viewing user's reaction acquired by the acquisition unit; a determination unit which determines emotion expression data that indicates the emotions estimated by the emotion estimation unit; and an output unit which outputs the emotion expression data determined by the determination unit.
Description
Technical field
The present disclosure relates to a client device, a control method, a system, and a program.
Background Art
In recent years, with the development of networks, communication between users has become popular. In this context, a style of viewing is gradually spreading in which users enjoy content together with other users at remote locations while sharing various emotions such as excitement, sadness, laughter, surprise, or anger. Empathy systems such as the ones described below have been proposed as technologies for understanding the reactions of other users who are watching the same content and for sharing their emotions.
Patent Literature 1 proposes a content reproduction device which, during content viewing, reflects the reactions of other users watching the same content into the content in real time and thereby provides a sense of presence. Specifically, the content reproduction device disclosed in Patent Literature 1 obtains the degree of excitement of multiple users on the basis of excitement information collected from each user in real time, combines an excitement effect composed of video or audio representing the degree of excitement of the multiple users with the content, and reproduces the result.
Patent Literature 2 proposes an information processing system in which a user viewing content can hear, in real time, the audio reactions of many other users to the same content, and can thereby share emotions, such as deep emotion, with those users.
Patent Literature 3 proposes a viewing system which collects the conversational audio of the viewer at the viewer's own terminal while a broadcast program is being received, converts it into text data and sends it to a server, receives the text data of other terminals from the server, and displays the collected text data together on a display screen, so that empathy can be obtained easily.
Reference listing
Patent Literature
Patent Literature 1: JP 2011-182109A
Patent Literature 2: JP 2012-129800A
Patent Literature 3: JP 2012-90091A
Summary of the invention
Technical problem
However, when audio generated by users during viewing is used, as in Patent Literatures 2 and 3 above, the effort of text input is saved, but conversations between users may leak to one another. For example, a household conversation unrelated to the content may be transmitted to all the other users and interfere with their content viewing.
Further, although Patent Literature 1 combines an excitement effect composed of video or audio representing the degree of excitement of multiple users with the content and reproduces the result, when the video of a companion is combined with the content and reproduced, it interferes with the user's viewing.
Accordingly, the present disclosure proposes a novel and improved client device, control method, system, and program capable of presenting emotion expression data that indicates the viewing reactions of other users or of the user himself or herself, without interfering with the user's viewing.
Solution to Problem
According to the present disclosure, there is provided a client device including: an acquisition unit that acquires a viewing user's reaction to content; an emotion estimation unit that estimates the viewing user's emotion on the basis of the reaction information acquired by the acquisition unit; a determination unit that determines emotion expression data representing the emotion estimated by the emotion estimation unit; and an output unit that outputs the emotion expression data determined by the determination unit.
According to the present disclosure, there is provided a control method including: a step of acquiring a viewing user's reaction to content; a step of estimating the viewing user's emotion on the basis of the acquired reaction information; a step of determining emotion expression data representing the estimated emotion; and a step of outputting the determined emotion expression data.
According to the present disclosure, there is provided a system including: a client device having an acquisition unit that acquires a viewing user's reaction to content, an emotion estimation unit that estimates the viewing user's emotion on the basis of the reaction information acquired by the acquisition unit, a determination unit that determines emotion expression data in accordance with an integration result of estimated emotion results received from a server, and an output unit that outputs the emotion expression data determined by the determination unit; and a server having an integration unit that integrates the emotion estimation results of the viewing users received from multiple client devices.
According to the present disclosure, there is provided a program for causing a computer to function as: an acquisition unit that acquires a viewing user's reaction to content; an emotion estimation unit that estimates the viewing user's emotion on the basis of the reaction information acquired by the acquisition unit; a determination unit that determines emotion expression data representing the emotion estimated by the emotion estimation unit; and an output unit that outputs the emotion expression data determined by the determination unit.
Advantageous effects of the present invention
According to the present disclosure described above, emotion expression data indicating the viewing reactions of other users or of the user himself or herself can be presented.
Brief Description of Drawings
[Fig. 1] A diagram for describing an overview of a viewing reaction feedback system according to an embodiment of the present disclosure.
[Fig. 2] A diagram for describing a case in which empathy is enhanced by displaying video of the user and of other users.
[Fig. 3] A block diagram showing an example of the internal structure of each device forming the viewing reaction feedback system according to a first embodiment.
[Fig. 4] A diagram showing an output example in which emotion expression data (a nonverbal reaction) formed of text data is displayed superimposed on content.
[Fig. 5] A diagram showing an output example in which emotion expression data (a nonverbal reaction) formed of text data is displayed superimposed on content using a comic-style expression technique.
[Fig. 6] A diagram showing an output example in which emotion expression data (a nonverbal reaction) formed of text data is projected by a projector.
[Fig. 7] A diagram for describing an output example of emotion expression data (a nonverbal reaction) formed of audio data.
[Fig. 8] A flowchart showing the operational processing of the viewing reaction feedback system according to the first embodiment.
[Fig. 9] A block diagram showing an example of the internal structure of each device forming the viewing reaction feedback system according to a second embodiment.
[Fig. 10] A block diagram showing an example of the internal structure of each device forming the viewing reaction feedback system according to a modification of the second embodiment.
[Fig. 11] A block diagram showing an example of the internal structure of a client device according to a third embodiment.
[Fig. 12] A block diagram showing an example of the internal structure of a client device according to a fourth embodiment.
Embodiment
Hereinafter, one or more preferred embodiments of the present disclosure will be described in detail with reference to the appended drawings. Note that, in this specification and the drawings, elements that have substantially the same function and structure are denoted with the same reference signs, and repeated explanation is omitted.
The description will proceed in the following order.
1. Overview of the viewing reaction feedback system according to an embodiment of the present disclosure
2. Embodiments
2-1. First embodiment
2-1-1. Configuration
2-1-2. Operational processing
2-2. Second embodiment
2-3. Third embodiment
2-4. Fourth embodiment
3. Conclusion
<<1. Overview of the viewing reaction feedback system according to an embodiment of the present disclosure>>
First, an overview of the viewing reaction feedback system according to an embodiment of the present disclosure will be described with reference to Fig. 1. Fig. 1 is a diagram for describing an overview of the viewing reaction feedback system according to an embodiment of the present disclosure.
As shown in Fig. 1, in viewing environment A, a viewing user 3A views content on a television device 30A (an example of an information presentation device) connected to a client device 1A, and in viewing environment B, a viewing user 3B views the same content on a television device 30B connected to a client device 1B. Similarly, in other viewing environments C and D, other users view the same content through their own client devices. Hereinafter, the client devices 1A and 1B will be referred to as client devices 1 when they do not need to be described individually.
Each client device 1 is connected to a server 2 (cloud) through a respective network.
Each of the television devices 30A and 30B is an example of an information presentation device that outputs content, and besides the video display devices shown in Fig. 1, may also be realized as an audio reproduction device (speaker). It may also be a video display device with a built-in speaker.
Further, the television devices 30A and 30B include sensors 11A and 11B, respectively, for detecting the reactions of the viewing users. Hereinafter, the sensors 11A and 11B will be referred to as sensors 11 when they do not need to be described individually.
For example, the sensor 11 is realized as a camera (imaging device) or a microphone (sound collecting device), and acquires the movements, facial expressions, or audio of the viewing user. The camera may be a special camera capable of acquiring depth information about the subject (the viewing user), such as Microsoft's Kinect, and the microphone (hereinafter simply "microphone") may be a set of multiple microphones, such as a microphone array. The sensor 11 outputs reaction information (a detection result) indicating the acquired reaction of the viewing user to the client device 1.
With this configuration, the viewing reaction feedback system according to an embodiment of the present disclosure can provide emotional empathy by feeding back, to the viewing user, emotion expression data (a nonverbal reaction or the like, described later) representing the viewing reactions of other viewing users.
For example, when watching a comedy program, if laughter from the venue, applause, or smiling faces are output, the user finds the program more amusing than when they are not. When watching a science program, seeing nodding or hearing sounds of conviction or appreciation strengthens the user's own feeling of being convinced or of understanding. When watching a ball game, hearing cheers heightens the excitement, and conversely, hearing sounds of disappointment strengthens the feeling of regret. When watching a deeply moving program, hearing sobbing or seeing tearful faces strengthens the feeling of being moved. In this way, when the user shares emotions with other users viewing the content (or with the performers of the program), those emotions are reinforced.
Usually, in real person-to-person exchanges, communication is often established even without any speech (conversation). For example, when watching a television program together, communication is established by sharing the space even if there are only reactions such as smiles or facial expressions and no conversation. However, for users at remote locations watching the same content, if the companion's video is displayed superimposed on a part of the content screen, content viewing is hindered even when there is no conversation.
(Background)
Here, methods such as the following, depending on the situation, are conceivable as ways for multiple users at distant locations to share their viewing reactions to the same content.
(a) The case where acquainted users view content
Assume a situation in which emotions are shared by watching the same content with specific acquainted users, such as friends or acquaintances. In this case, emotions can be exchanged while watching the same content by using a communication service capable of transmitting and receiving video, audio, text, and so on. With text, however, text input has to be performed while viewing the content, which interferes with content viewing. With video or audio, the effort of text input is saved, but conversations and video leak to one another completely. For example, a household conversation unrelated to the content is transmitted to all the other users and interferes with their content viewing.
(b) The case where unacquainted users view content
For example, when a user watches a sports game at home, content viewing would be more enjoyable if the user could share emotions with a large number of unspecified other users, such as other supporters of the team the user supports, and get excited together with them. Normally, when watching a sports game on a television device, the audio of the supporters of both teams (audio collected at the venue) is heard, so it is difficult to truly feel together with the other supporters of the team the user supports.
In this case, for example, as shown in Fig. 2, empathy with other supporters could be enhanced by presenting video 120a and audio of another supporter as well as video 120b and audio of the user himself or herself (viewing user 3A) together with the usual broadcast on a television device 100, and cheering with each other in real time. However, the audio and video of the user himself or herself (viewing user 3A) would then actually be transmitted to a large number of unspecified users, which raises privacy concerns.
Focusing on the above circumstances led to the creation of the viewing reaction feedback system according to each embodiment of the present disclosure. When the same content is being watched with specific or a large number of unspecified other users, the viewing reaction feedback system according to the present embodiment can feed back emotion expression data representing the viewing reactions of the other users by a method that protects user privacy, and can promote emotional empathy without interfering with the user's viewing.
Specifically, in order not to interfere with the user's viewing and to protect user privacy, nonverbal reactions are used as the emotion expression data representing the viewing reactions of other users. In the present disclosure, a nonverbal reaction is a word expressing emotion: an onomatopoeic expression, which includes mimetic words that render things which make no sound (such as states or feelings) in sound-like form and sound-imitating words that express sounds or voices produced by objects, or an interjection, laughter, sobbing, or the like. In the viewing reaction feedback system according to the present embodiment, a nonverbal reaction representing the emotion estimated from the user's reaction is presented using a different sound source; the user's original video, for example, is not presented directly to other users, so the user's privacy is protected. Further, since the nonverbal reaction is presented in accordance with the emotion estimated from the reaction to the viewed content, a detected household conversation or the like unrelated to the viewed content is not conveyed; this neither interferes with the viewing of other users nor lets other users overhear the user's original audio.
The viewing reaction feedback system according to the present disclosure will be described in detail through the following multiple embodiments.
<<2. Embodiments>>
<2-1. First embodiment>
First, the viewing reaction feedback system according to the first embodiment will be described with reference to Fig. 3 to Fig. 8.
(2-1-1. Configuration)
Fig. 3 is a block diagram showing an example of the internal structure of each device forming the viewing reaction feedback system according to the first embodiment. As shown in Fig. 3, the viewing reaction feedback system according to the present embodiment has multiple client devices 1A-1 and 1B-1 and a server 2-1.
Client device 1A-1
The client device 1A-1 is a device that controls viewing environment A shown in Fig. 1, and specifically, as shown in Fig. 3, has a reaction information acquisition unit 12, an emotion estimation unit 13, and a transmission unit 15.
The reaction information acquisition unit 12 acquires the viewing user's reaction information (images, audio data) from the sensor 11A realized as a camera or microphone (see Fig. 1), and outputs it to the emotion estimation unit 13. Here, the sensor 11A is not limited to a camera or microphone, and may be an acceleration sensor or angular acceleration sensor built into a mobile phone terminal or smartphone that viewing user 3A carries in a pocket or in hand. In this case, the reaction information acquisition unit 12 can acquire acceleration information or angular acceleration information as the viewing user's reaction information. Further, the reaction information acquisition unit 12 may acquire biological information, such as pulse, perspiration, or body temperature, from a biosensor (an example of the sensor 11A) worn by viewing user 3A as the viewing user's reaction information.
The emotion estimation unit 13 estimates the viewing user's emotion on the basis of the viewing user's reaction information acquired by the reaction information acquisition unit 12. The emotion estimation unit 13 outputs the estimated emotion information (emotion estimation result) of the viewing user to the server 2-1 through the transmission unit 15.
Here, although various techniques can be considered as models for estimating emotion, for example, the "wheel of emotions" proposed by Robert Plutchik may be used. The wheel of emotions defines various types of emotion from eight basic emotions (fear, surprise, sadness, disgust, anger, anticipation, joy, and acceptance) and eight applied emotions each composed of a combination of two of them (awe, disappointment, remorse, contempt, aggressiveness, optimism, love, and submission).
As the emotion estimation method of the emotion estimation unit 13, for example, emotion may be estimated from facial expression or body posture by performing image processing on a face image of viewing user 3A, or the audio of a specific user alone may be extracted using sound source separation with the microphone and speech recognition technology may be applied to that audio data. For example, when a smiling face or laughter is detected, the emotion estimation unit 13 estimates an emotion of "laughing" or "enjoyment". Further, when a crying face or sobbing is detected, the emotion estimation unit 13 estimates an emotion of "crying" or "sadness".
In addition, the emotion estimation unit 13 may estimate the user's emotion from the size of the user's pupils, body temperature, perspiration, and the like. Furthermore, when text data entered by the user into an SNS during content viewing is acquired by the reaction information acquisition unit 12, the user's emotion may be estimated by analyzing the entered text data.
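As a rough illustration only (not part of the original disclosure), the following Python sketch shows one way an emotion estimation unit of this kind could map detected reaction features to an emotion label and score; the feature names and thresholds are assumptions standing in for the image-processing and speech-recognition front ends described above.

```python
from dataclasses import dataclass


@dataclass
class ReactionInfo:
    # Hypothetical features produced by upstream image/audio processing.
    smile_score: float = 0.0      # 0.0-1.0 from facial-expression analysis
    laughter_level: float = 0.0   # 0.0-1.0 from the user's separated audio
    crying_detected: bool = False
    sobbing_level: float = 0.0


def estimate_emotion(reaction: ReactionInfo) -> tuple[str, float]:
    """Return an (emotion label, score) pair, with the score capped at 1.0."""
    if reaction.crying_detected or reaction.sobbing_level > 0.3:
        return "sadness", min(1.0, max(reaction.sobbing_level, 0.5))
    if reaction.smile_score > 0.5 or reaction.laughter_level > 0.2:
        # Loud laughter scores higher than a small laugh (e.g. 0.8 vs. 0.2).
        return "laughing", min(1.0, max(reaction.smile_score, reaction.laughter_level))
    return "no_reaction", 0.0


# Example: a big laugh detected by the sensor front end.
print(estimate_emotion(ReactionInfo(smile_score=0.9, laughter_level=0.8)))
```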
Server 2-1
The server 2-1 integrates the emotion estimation results of the viewing users in the multiple viewing environments, and transmits the integration result to the client device of each viewing environment. Specifically, as shown in Fig. 3, the server 2-1 according to the present embodiment has a reception unit 21, an integration unit 23, and a transmission unit 25.
The reception unit 21 receives the emotion estimation result of viewing user 3A from the client device 1A-1 through the network, and outputs it to the integration unit 23. Note that although, in the example shown in Fig. 3, an emotion estimation result is received only from the client device 1A-1, the reception unit 21 also receives the emotion estimation results of viewing users 3B to 3D from the other viewing environments (for example, viewing environments B, C, and D).
The integration unit 23 integrates the emotion estimation results of the multiple users, and transmits the integration result through the transmission unit 25 via the network to each client device (here, the client device 1B-1 as an example). Specifically, the integration result is a statistic of the emotion estimation results of the multiple viewing users. The integration unit 23 may integrate the emotion estimation results of specific viewing users, or may integrate the emotion estimation results of a large number of unspecified viewing users.
For example, the statistic includes the number of integrated users, the ratio of males to females, the age distribution, the ratios of the integrated emotion estimation results (for example, laughing 60%, surprised 20%, no reaction 20%, and so on), and the score of each emotion estimation result. For example, the score of an emotion estimation result is defined with a maximum of 1.0; a score of 0.8 is given for loud laughter, a score of 0.2 for a small laugh, and so on.
Further, as described later, since there are cases where the selection criterion or details of the emotion expression data (female voice, male voice, etc.) change depending on the content, the integration unit 23 may include metadata about the content in the integration result. Examples of the metadata include information such as a content classification ID (for example, TV, video, video website), title, broadcast time, genre, performers, scriptwriter or producer, URL (in the case of Internet video), and a list of tweets associated with the content.
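For illustration, the following Python sketch (not from the original disclosure) shows the kind of statistic the integration unit 23 might compute from the received estimation results; the field names and the attached metadata are assumptions.

```python
from collections import Counter


def integrate(results, content_metadata=None):
    """results: list of dicts such as {"user_id": ..., "emotion": ..., "score": ...}."""
    total = len(results)
    counts = Counter(r["emotion"] for r in results)
    return {
        "num_users": total,
        # Ratio of each integrated emotion, e.g. laughing 60%, surprised 20%, ...
        "emotion_ratios": {e: c / total for e, c in counts.items()},
        # Average score per emotion (each individual score is at most 1.0).
        "emotion_scores": {
            e: sum(r["score"] for r in results if r["emotion"] == e) / c
            for e, c in counts.items()
        },
        # Content metadata (classification ID, title, genre, ...) may be attached
        # because the choice of expression data can depend on the content.
        "content_metadata": content_metadata or {},
    }


sample = [
    {"user_id": "3A", "emotion": "laughing", "score": 0.8},
    {"user_id": "3B", "emotion": "laughing", "score": 0.2},
    {"user_id": "3C", "emotion": "surprised", "score": 0.6},
]
print(integrate(sample, {"genre": "comedy", "title": "Example Program"}))
```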
Client device 1B-1
The client device 1B-1 is a device that controls viewing environment B shown in Fig. 1, determines emotion expression data in accordance with the integration result transmitted from the server 2-1, and outputs it from an information presentation device (for example, the television device 30B).
Specifically, as shown in Fig. 3, the client device 1B-1 has a reception unit 16, a determination unit 17, and an output control unit 18. The reception unit 16 receives the integration result (statistic) of the emotion estimation results of the multiple viewing users from the server 2-1 through the network, and outputs it to the determination unit 17.
The determination unit 17 determines emotion expression data in accordance with the integration result. For example, when the "laughing" emotion accounts for a high proportion, the determination unit 17 determines emotion expression data representing the "laughing" emotion. The emotion expression data is formed of, for example, audio data or text data. For example, emotion expression data representing the "laughing" emotion may be the nonverbal reaction described above. Specifically, it may be, for example, pre-registered audio data of laughter (a different sound source, not the original voice of viewing user 3A) or text data of an onomatopoeic expression such as "ha ha" or "ho ho". By substituting a different sound source in this way, the privacy of each user is protected.
Further, emotion expression data formed of text data of an interjection may be determined; for example, "Ow!" may be determined as emotion expression data representing a "cheering" emotion that is highly likely to have been estimated from the user's reaction to a play close to the goal in soccer. Likewise, "sigh" may be determined as emotion expression data representing a highly likely "disappointment", or "boo" as emotion expression data representing an "anger" emotion estimated from the user's reaction to a missed shot.
Further, emotion expression data formed of text data such as "sob" or "sniffle" may be determined as emotion expression data representing a "sadness" that is highly likely to be estimated when a recorded program is viewed. Emotion expression data formed of text data of a mimetic word such as "hmm" may be determined as emotion expression data representing a "feeling of conviction" that is highly likely to be estimated when an educational program is viewed. Emotion expression data formed of text data such as "squeal" may be determined as emotion expression data representing a "delight" that is highly likely to be estimated when a live performance by a male idol, female idol, or artist is viewed. Emotion expression data formed of text data such as "aww" may be determined as emotion expression data representing a "comfort" that is highly likely to be estimated when an animal program is viewed.
Further, although the specific examples of nonverbal reactions (examples of emotion expression data) are not particularly limited, words expressing emotions such as "hooray", "wow", "oh", and "yay", and words expressing "surprise" or "astonishment", are also included.
Some specific examples of nonverbal reactions (examples of emotion expression data) have been described above; however, the determination unit 17 may also use original expressions or buzzwords used on the Internet or within a specific local group, rather than commonly known mimetic or onomatopoeic words.
Further, the determination unit 17 may change the presentation method of the nonverbal reaction (an example of emotion expression data) in accordance with the integration result (statistic). For example, when the proportion of the "laughing" emotion is large, the determination unit 17 may raise the pitch or volume of the laughter determined as the nonverbal reaction, or may superimpose a number (or variety) of laughs corresponding to the ratio of "laughing".
Further, even for the same emotion, the determination unit 17 may switch the type of emotion expression data to be determined depending on the situation or frequency. For example, when the "laughing" emotion is estimated continuously, the determination unit 17 may change to and determine a different type of laughter rather than determining the same laughter each time. The determination unit 17 may also change the type of emotion expression data in accordance with the content being viewed. For example, when a "delight" emotion is estimated and the content is a live performance by a female idol, the determination unit 17 presumes that the viewer demographic of the content is male, and therefore determines emotion expression data of a male voice (for example, "Ow"). When the content is a live performance by a male idol, the determination unit 17 presumes that the viewer demographic of the content is female, and therefore determines emotion expression data of a female voice (for example, "oh my"). Further, in addition to inferring the viewer demographic from the content, the determination unit 17 may determine emotion expression data of a type corresponding to an attribute of the viewing users included in the integration result, for example the male-to-female ratio of the viewing users. Note that the determination unit 17 may determine emotion expression data in accordance with at least one of the viewer demographic of the content and the attribute of the viewing users.
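As an illustrative sketch only, the following Python fragment shows one way a determination unit could map the dominant emotion in an integration result, together with a presumed viewer demographic, to emotion expression data; the lookup table, wording, and file names are hypothetical.

```python
# Hypothetical table from (emotion, presumed viewer demographic) to expression data.
EXPRESSION_TABLE = {
    ("laughing", None):    {"text": "ha ha!", "audio": "laughter_generic.wav"},
    ("cheering", None):    {"text": "Ow!",    "audio": "cheer_generic.wav"},
    ("delight", "male"):   {"text": "Ow!",    "audio": "cheer_male.wav"},
    ("delight", "female"): {"text": "oh my",  "audio": "cheer_female.wav"},
}


def determine_expression(integration, content_demographic=None):
    """Pick expression data for the dominant emotion in the integration result."""
    ratios = integration["emotion_ratios"]
    dominant = max(ratios, key=ratios.get)
    entry = (EXPRESSION_TABLE.get((dominant, content_demographic))
             or EXPRESSION_TABLE.get((dominant, None))
             or {"text": dominant, "audio": None})
    # A larger ratio could be rendered with more repetitions or a higher volume.
    return {"emotion": dominant, "intensity": ratios[dominant], **entry}


integration = {"emotion_ratios": {"laughing": 0.6, "surprised": 0.2, "no_reaction": 0.2}}
print(determine_expression(integration))
```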
The output control unit 18 performs control such that the emotion expression data determined by the determination unit 17 is output from an information presentation device. For example, when the emotion expression data is audio data of laughter, the output control unit 18 reproduces it from a speaker (an example of an information presentation device), and when the emotion expression data is text data, the output control unit 18 displays it superimposed on the content on the television device 30B (an example of an information presentation device). Output examples of emotion expression data will be described below with reference to Fig. 4 to Fig. 7.
Fig. 4 is a diagram showing an output example in which emotion expression data (nonverbal reactions) formed of text data is displayed superimposed on content. As shown in the display screen 180a on the left of Fig. 4, emotion expression data such as "Ow!" representing a "cheering" emotion and emotion expression data such as "thump-thump" representing a "nervous" emotion are rendered as text and output superimposed on the content on the display screen of the television device 30A.
Further, as shown in the display screen 180b on the right of Fig. 4, emotion expression data such as "ha ha!", "ho ho", and "he he" representing a "laughing" emotion is rendered as text and output superimposed on the content on the display screen of the television device 30A. Note that in the example shown on the right of Fig. 4, the determination unit 17 determines from the statistic that the proportion of "laughing" is large, and the emotion expression data is output using multiple types of expression.
In this way, the reactions of other viewing users at remote locations are conveyed by an expression technique that represents the emotions estimated from the viewing users' reactions, without using the viewing users' original voices or converting those original voices directly into text.
Note that the wording of the text data is not particularly limited, and a comic-style expression technique, for example, may be used. Fig. 5 shows an output example in which emotion expression data (a nonverbal reaction) formed of text data is displayed superimposed on content using a comic-style expression technique. As shown in the display screen 180c of Fig. 5, emotion expression data (a nonverbal reaction) representing the "laughing" emotion is presented in a comic style. Further, when a comic-style expression technique is used, the determination unit 17 may determine emotion expression data formed of, for example, comic effect lines (drawing data).
Further, the output examples of emotion expression data according to the present embodiment are not limited to the examples shown in Fig. 4 and Fig. 5. For example, the output control unit 18 may project emotion expression data of text data or drawing data onto the wall around the television device 30 in cooperation with an external device such as a projector (an example of an information presentation device). In this way, it is possible to avoid part of the content being hidden by emotion expression data displayed superimposed on the content. Fig. 6 shows an output example in which emotion expression data (a nonverbal reaction) formed of text data is projected by a projector.
The output control unit 18 sends emotion expression data Q1 to Q3 formed of text data to a projector (not shown), which projects the data onto the wall around the television device 30B, for example, as shown in Fig. 6. The emotion expression data Q1 to Q3 are, for example, multiple pieces of text data expressing the "laughing" emotion.
In addition, the output control unit 18 may output emotion expression data by causing a vibrator embedded in the sofa on which viewing user 3B sits, or in headphones, to generate vibration.
Further, when the emotion estimation result of each viewing user is obtained from the server 2-1, the output control unit 18 may determine emotion expression data in accordance with each emotion estimation result, and output each piece of emotion expression data together with a face image or avatar indicating whose emotion expression data it is.
Further, when a nonverbal reaction is output as audio data, the output control unit 18 may output it from the speakers included in the television device 30B; however, merely hearing the sounds of other viewing users who are not actually present can lack a sense of presence. Therefore, the output control unit 18 according to the present embodiment may place sound images at the sides of or behind viewing user 3B by using, for example, surround speakers or virtual sound technology, and can thereby reproduce an environment as if the other viewing users were nearby. This will be described in detail below with reference to Fig. 7.
Fig. 7 is a diagram for describing an output example of emotion expression data (a nonverbal reaction) formed of audio data. As shown in Fig. 7, multiple speakers 4 (an example of an information presentation device) are arranged in viewing environment B and realize a surround speaker system. The multiple speakers 4 may be configured as a speaker array. The output control unit 18 of the client device 1B outputs the emotion expression data formed of audio data using the speakers (front speakers) built into the television device 30B and the speakers 4 arranged around viewing user 3B. Here, the output control unit 18 performs sound image position control as if other viewing users 31a and 31b were actually located beside or behind viewing user 3B, reproducing a sense of presence and providing a stronger empathy experience. At this time, the output control unit 18 may place the sound image of emotion expression data corresponding to a specific other viewing user acquainted with viewing user 3B beside viewing user 3B (at the position of viewing user 31a). Further, the output control unit 18 may place the sound images of emotion expression data corresponding to a large number of unspecified other viewing users unacquainted with viewing user 3B behind viewing user 3B (at the position of viewing user 31b).
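The following minimal Python sketch (an assumption-laden illustration, not the disclosed implementation) shows how sound images could be assigned to positions around viewing user 3B depending on whether the corresponding user is acquainted with user 3B.

```python
def place_sound_images(expressions, acquainted_ids):
    """Assign each user's expression audio to a virtual position around user 3B.

    expressions: list of dicts such as {"user_id": ..., "audio": ...}.
    Acquainted users are placed beside the viewer; unspecified users behind.
    """
    placements = []
    for e in expressions:
        position = "beside" if e["user_id"] in acquainted_ids else "behind"
        placements.append({**e, "position": position})
    return placements


expressions = [
    {"user_id": "31a", "audio": "laughter_a.wav"},
    {"user_id": "31b", "audio": "laughter_crowd.wav"},
]
print(place_sound_images(expressions, acquainted_ids={"31a"}))
```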
The configuration of each device according to the present embodiment has been described above. Note that although Fig. 3 to Fig. 7 describe, as an example, the case of presenting emotion expression data to viewing user 3B in viewing environment B, the same processing is also performed in viewing environment A and in the other viewing environments C and D. The viewing users 3A to 3D at remote locations can thus share emotions with one another while watching the same content, and an empathy experience can be obtained.
Further, the internal structure of each device shown in Fig. 3 is an example, and the configuration according to the present embodiment is not limited to the one shown in Fig. 3. For example, the server 2-1 may perform the processing of the "emotion estimation unit 13" included in the client device 1A-1 and the processing of the "determination unit 17" included in the client device 1B-1.
Further, in the above embodiment, the server 2-1 collects the emotion estimation results based on the viewing reactions of viewing users 3A to 3D watching the same content in the viewing environments A to D; however, the present embodiment is not limited to this. For example, when the content is a sports game or a packaged (pre-recorded) program, the server 2-1 may also collect emotion estimation results based on the reactions of people at the place where the content is produced. Specifically, the server 2-1 receives information (image data, audio data) detected by various sensors (cameras or microphones) installed in the soccer stadium or ballpark, estimates emotions, and includes the estimation results in the integration target.
For example, when a soccer match is being watched in each viewing environment, the server 2-1 estimates, from the reactions of supporters obtained from the supporter seating areas in the soccer stadium, the emotions of the supporters of the team each viewing user 3 supports, and transmits the estimation results to each viewing environment. Then, for example, in viewing environment B, the client device 1B-1 can present emotion expression data representing the emotions of the supporters of the team viewing user 3B supports.
In this way, for example, when the team that viewing user 3B supports has the advantage, emotion expression data representing the "excitement" and "delight" of that team's supporters is presented, without actually presenting the cheering of the supporters of both sides. When the team that viewing user 3B supports is at a disadvantage, emotion expression data representing the "disappointment" and "sadness" of that team's supporters is presented. In this way, viewing user 3 can have an empathy experience with the supporters of the supported team. Note that each client device 1 or the server 2-1 may determine which team the viewing user supports from the user's profile information, or the user may explicitly select a team before the match starts.
Next, the operational processing of the viewing reaction feedback system according to the present embodiment will be described in detail with reference to Fig. 8.
(2-1-2. Operational processing)
Fig. 8 is a flowchart showing the operational processing of the viewing reaction feedback system according to the present embodiment. As shown in Fig. 8, first, in step S203, the reaction information acquisition unit 12 of the client device 1A-1 acquires the reaction information of viewing user 3A from the sensor 11A.
Next, in step S206, the emotion estimation unit 13 estimates the emotion of viewing user 3A on the basis of the viewing user's reaction information acquired by the reaction information acquisition unit 12.
Next, in step S209, the transmission unit 15 transmits the emotion estimation result of viewing user 3A estimated by the emotion estimation unit 13 to the server 2-1 through the network.
Next, in step S212, the integration unit 23 of the server 2-1 integrates the emotion estimation results of the multiple viewing users received from the viewing environments by the reception unit 21. The integration result is transmitted from the transmission unit 25 to the client device 1B-1 through the network.
Next, in step S215, the determination unit 17 of the client device 1B-1 determines emotion expression data in accordance with the integration result (the statistic of the multiple emotion estimation results) received from the server 2-1 by the reception unit 16. The emotion expression data may be expressed as a nonverbal reaction, such as an onomatopoeic word, a mimetic word, or an interjection, as described above. Further, the emotion expression data may be formed of text data, audio data, drawing data, or the like, as described above.
Then, in step S218, the output control unit 18 of the client device 1B-1 performs control such that the emotion expression data determined by the determination unit 17 is output from the information presentation devices arranged in viewing environment B (for example, the television device 30B and the speakers 4).
The above processing is performed continuously and repeatedly while the content is being viewed.
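Purely as an illustration of the loop of steps S203 to S218, the following self-contained Python sketch stubs out the sensor, the emotion estimation of the client device 1A-1, the integration of the server 2-1, and the determination and output of the client device 1B-1; all names, thresholds, and the random sensor values are hypothetical.

```python
import random
import time


class Sensor:
    def read(self):  # stand-in for camera/microphone detection results (S203)
        return {"smile_score": random.random()}


def estimate(reaction):  # S206 (client device 1A-1)
    if reaction["smile_score"] > 0.5:
        return ("laughing", reaction["smile_score"])
    return ("no_reaction", 0.0)


class Server:  # S212 (server 2-1)
    def __init__(self):
        self.results = []

    def receive(self, result):  # S209
        self.results.append(result)

    def integrate(self):
        n = len(self.results)
        laughs = sum(1 for emotion, _ in self.results if emotion == "laughing")
        return {"laughing": laughs / n if n else 0.0}


def determine(integration):  # S215 (client device 1B-1)
    return "ha ha!" if integration["laughing"] > 0.5 else ""


sensor, server = Sensor(), Server()
for _ in range(3):  # repeated while the content is being viewed
    server.receive(estimate(sensor.read()))        # S203-S209
    expression = determine(server.integrate())     # S212-S215
    print("output:", expression or "(nothing)")    # S218: superimpose/play on device
    time.sleep(0.1)
```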
As described above, according to the viewing reaction feedback system of the first embodiment, a viewing user shares emotions with multiple users, and emotion expression data representing the emotions of the viewing reactions of other viewing users can be presented to the viewing user so as to obtain an empathy experience. Further, since the actual video or audio of other viewing users is not presented, and household conversations and the like unrelated to the viewing reaction do not leak through, the privacy of each viewing user is protected and the users' viewing is not hindered.
<2-2. Second embodiment>
In the first embodiment described above, because each viewing user's reaction is presented in substituted form as emotion expression data, content viewing is not hindered, and the privacy of each user among a large number of unspecified viewing users is also protected. However, when specific acquainted users use the viewing reaction feedback system according to the present embodiment, presenting the viewing user's original audio to the other viewing users is not a particular privacy problem for the viewing user.
Therefore, in the viewing reaction feedback system according to the second embodiment, when emotion expression data can be extracted from the speech of the viewing user, that emotion expression data is presented to specific other users (users permitted by the viewing user, such as friends or acquaintances familiar with the viewing user). This will be described in detail below with reference to Fig. 9.
(2-2-1. Configuration)
Fig. 9 is a block diagram showing an example of the internal structure of each device forming the viewing reaction feedback system according to the second embodiment. As shown in Fig. 9, the viewing reaction feedback system according to the present embodiment has multiple client devices 1A-2 and 1B-2 and a server 2-2.
Client device 1A-2
The client device 1A-2 is a device that controls viewing environment A shown in Fig. 1, and specifically, as shown in Fig. 9, has a reaction information acquisition unit 12, an emotion estimation unit 13, an extraction unit 14, and a transmission unit 15.
The reaction information acquisition unit 12 and the emotion estimation unit 13 have the same functions as those units of the first embodiment described with reference to Fig. 3, and the emotion is estimated on the basis of the reaction information (images, audio data) of viewing user 3A acquired by the reaction information acquisition unit 12.
The extraction unit 14 extracts emotion expression data from the reaction information of viewing user 3A acquired by the reaction information acquisition unit 12. For example, at an exciting moment (such as a goal-scoring scene in soccer), when audio data in which the speech of viewing user 3A has been collected (for example, "Wow, they scored! Now it's even!") is acquired by the reaction information acquisition unit 12, the extraction unit 14 performs speech recognition on the audio data and analyzes the spoken content as text data. The extraction unit 14 then searches the analyzed text data for text corresponding to emotion expression data representing an emotion. For example, a word (interjection) expressing a "surprise" emotion (such as "wow") may be found in the above speech. The extraction unit 14 extracts the audio data of user 3A for the found word portion (segment), that is, the original audio, as audio emotion expression data, and outputs it to the transmission unit 15.
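The following Python sketch (illustrative only; the interjection list and word timings are assumptions) shows how an extraction unit could locate an interjection in speech-recognition output and return the time segment of the original audio to be forwarded as emotion expression data.

```python
import re

# Hypothetical list of interjections that count as emotion expression data.
INTERJECTIONS = {"wow": "surprise", "yay": "delight", "hooray": "delight"}


def extract_expression(recognized_words):
    """recognized_words: list of (word, start_sec, end_sec) from speech recognition.

    Returns the time span of the first interjection found, so that only that
    segment of the user's original audio is forwarded as emotion expression data.
    """
    for word, start, end in recognized_words:
        key = re.sub(r"\W", "", word.lower())
        if key in INTERJECTIONS:
            return {"emotion": INTERJECTIONS[key], "word": word,
                    "segment": (start, end)}
    return None


# e.g. "Wow, they scored! Now it's even!"
words = [("Wow,", 0.0, 0.4), ("they", 0.4, 0.6), ("scored!", 0.6, 1.1)]
print(extract_expression(words))
```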
The transmission unit 15 transmits the emotion estimation result of viewing user 3A estimated by the emotion estimation unit 13 and the audio emotion expression data of viewing user 3A extracted by the extraction unit 14 to the server 2-2 through the network.
Server 2-2
As shown in Fig. 9, the server 2-2 has a reception unit 21, an integration unit 23, a transmission unit 25, and a user information DB (database) 28. The functions of the reception unit 21, the integration unit 23, and the transmission unit 25 are the same as those of the corresponding units of the first embodiment described with reference to Fig. 3. The user information DB 28 stores data such as which viewing users are, for a given user, specific viewing users (acquainted other viewing users such as friends or acquaintances) and which are a large number of unspecified viewing users (unacquainted other viewing users). Specifically, for example, the same group ID is associated with specific viewing users who are acquainted with one another (note that when there are multiple combinations of acquainted specific viewing users, a different group ID is used for each combination). Thus, other viewing users associated by the same group ID are acquainted specific other viewing users, and other viewing users not associated by the same group ID are unacquainted, unspecified other viewing users. The server 2-2 according to the present embodiment transmits the audio emotion expression data of viewing user 3A together with the integration result of the integration unit 23 to the specific other viewing users (for example, viewing user 3B). On the other hand, the server 2-2 transmits only the integration result of the integration unit 23 to the large number of unspecified other viewing users unacquainted with viewing user 3A (for example, viewing users 3C and 3D).
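As a sketch of the routing just described, assuming a hypothetical group-ID table, the following Python fragment sends the original-audio expression data only to users who share a group ID with the sender, while everyone else receives only the integration result.

```python
# Hypothetical user information DB: the same group ID links acquainted users.
USER_GROUPS = {"3A": {"g1"}, "3B": {"g1"}, "3C": set(), "3D": set()}


def build_payloads(sender, integration, audio_expression, all_users=USER_GROUPS):
    """Attach original-audio expression data only for users acquainted with the sender."""
    payloads = {}
    for user, groups in all_users.items():
        if user == sender:
            continue
        acquainted = bool(USER_GROUPS[sender] & groups)
        payload = {"integration": integration}
        if acquainted:
            payload["audio_expression"] = audio_expression  # e.g. the "wow" segment
        payloads[user] = payload
    return payloads


print(build_payloads("3A", {"laughing": 0.6}, "wow_segment.wav"))
```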
Client device 1B-2
The client device 1B-2 is a device that controls viewing environment B shown in Fig. 1, and specifically, as shown in Fig. 9, has a reception unit 16 and an output control unit 18. Viewing environment B is assumed to be an environment in which viewing user 3B, who is a specific other viewing user acquainted with viewing user 3A, is present. Which viewing users are specific viewing users and which are a large number of unspecified viewing users is managed on the server 2-2 side, as described above. The client device 1B-2 controlling viewing environment B performs control such that the output control unit 18 outputs the audio emotion expression data of viewing user 3A, received from the server 2-2 by the reception unit 16, from an information presentation device (for example, the television device 30B).
In this way, because the viewing user's privacy is not a problem with respect to specific other viewing users acquainted with the viewing user, the emotion expression data is presented with the viewing user's own audio. Note that, as in the first embodiment, to the large number of unspecified other viewing users unacquainted with the viewing user, a client device having the same structure as the client device 1B-1 according to the first embodiment presents emotion expression data determined in accordance with the integration result of the emotion estimation results of the viewing users.
(2-2-2. Modification)
Here, there may be cases where acquainted users wish to exchange direct comments about a scene of the viewed content, such as a real-time situation like a goal scene or foul scene in soccer. Therefore, as a modification of the second embodiment, direct communication may be performed temporarily when an exciting moment (a specific empathy scene) is detected. Note that by performing direct communication only temporarily, when a specific empathy scene is detected, convenience can be improved without hindering the users' viewing. This will be described in detail below with reference to Fig. 10.
Fig. 10 is a block diagram showing an example of the internal structure of each device forming the viewing reaction feedback system according to the modification of the second embodiment. As shown in Fig. 10, the viewing reaction feedback system according to the present embodiment has multiple client devices 1A-2' and 1B-2' and a server 2-2'.
Client device 1A-2'
The client device 1A-2' is a device that controls viewing environment A shown in Fig. 1, and specifically, as shown in Fig. 10, has a reaction information acquisition unit 12A, an emotion estimation unit 13, a communication unit 10A, and an output control unit 18A. The reaction information acquisition unit 12A and the emotion estimation unit 13 have the same functions as those units of the first embodiment described with reference to Fig. 3.
The communication unit 10A has the function of the transmission unit 15 according to the first embodiment described with reference to Fig. 3, and the function of a reception unit that receives, from the server 2-2', a detection notification of a specific empathy scene, the audio data of a specific other viewing user 3, and the like.
The output control unit 18A performs control such that the output data received by the communication unit 10A is output from an information presentation device arranged in viewing environment A (the television device 30A or a speaker).
Server 2-2'
As shown in Fig. 10, the server 2-2' has a communication unit 22, an integration unit 23, and a specific empathy scene detection unit 24.
The communication unit 22 has the functions of the reception unit 21 and the transmission unit 25 according to the first embodiment described with reference to Fig. 3.
The integration unit 23 is the same as the integration unit according to the first embodiment described with reference to Fig. 3. In addition, the integration unit 23 outputs the integration result to the specific empathy scene detection unit 24.
The specific empathy scene detection unit 24 detects a particularly exciting scene (hereinafter also referred to as a specific empathy scene), such as a goal scene in soccer, on the basis of, for example, the integration result of the integration unit 23 and a scene analysis result of the content. The scene analysis of the content may be performed by the specific empathy scene detection unit 24.
Specifically, for example, the specific empathy scene detection unit 24 detects a specific empathy scene on the basis of the integration result of the integration unit 23 when an emotion score exceeds a predetermined value. Further, for example, the specific empathy scene detection unit 24 detects a specific empathy scene on the basis of the scene analysis when the ball is being contested near the goal (in the case of soccer) or when a player is running to home plate (in the case of baseball).
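A minimal sketch of such detection logic, assuming a hypothetical threshold and scene tags, might look as follows.

```python
EMOTION_SCORE_THRESHOLD = 0.7  # assumed predetermined value


def detect_specific_empathy_scene(integration_scores, scene_tags=()):
    """Detect an exciting scene from integrated emotion scores and scene analysis.

    integration_scores: dict of emotion -> score (0.0-1.0) from the integration unit.
    scene_tags: tags from content scene analysis (e.g. "near_goal", "running_home").
    """
    if any(score >= EMOTION_SCORE_THRESHOLD for score in integration_scores.values()):
        return True
    return any(tag in ("near_goal", "running_home") for tag in scene_tags)


print(detect_specific_empathy_scene({"cheering": 0.85}))                  # True by score
print(detect_specific_empathy_scene({"cheering": 0.3}, ("near_goal",)))   # True by scene analysis
```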
When specific sympathetic response scene being detected, specific sympathetic response scene detection unit 24 notifies to detect specific sympathetic response scene to each client devices 1 via communication unit 22.By detecting specific sympathetic response scene, such as, on television equipment 30, as text display, this notifies or exports as audio frequency from loud speaker.
Further, the server 2-2' prompts the start of direct communication between specific acquainted users. Here, between which viewing environments direct communication is started may be determined automatically by the server 2-2' according to the profile information of the users or the like, or the viewing user may select, at the time of the notification, with which viewing users direct communication is started.
Client device 1B-2'
The client device 1B-2' is a device that controls the viewing environment B described above and, specifically, as shown in FIG. 10, has a reaction information acquiring unit 12B, a communication unit 10B, and an output control unit 18B.
Similarly to the reaction information acquiring unit 12A of the client device 1A-2', the reaction information acquiring unit 12B acquires the reaction information (image and audio data) of the viewing user 3B from a sensor 11B (not shown).
The communication unit 10B has the function of the receiving unit 16 and the function of the transmission unit according to the first embodiment described with reference to FIG. 3, and transmits the reaction information of the viewing user 3B acquired by the reaction information acquiring unit 12B to the server 2-2'.
The output control unit 18B is the same as the output control unit 18 according to the first embodiment described with reference to FIG. 3. Further, the output control unit 18B according to the present embodiment presents, to the viewing user B, the detection notification of a specific sympathetic response scene received from the server 2-2'.
With the above configuration, when the server 2-2' detects a specific sympathetic response scene, a detection notification is sent to the client devices 1A-2' and 1B-2', and direct communication between the client devices 1A-2' and 1B-2' is started.
Direct communication means, for example, transmitting the audio data, or the image and audio data, of the viewing user 3B actually acquired by the reaction information acquiring unit 12B to the specific acquainted other viewing user 3A, and also transmitting the audio data or the image of the viewing user 3A to the viewing user 3B. Whether direct communication is performed with audio data only, or whether images are also transmitted, can be determined by the user's settings.
The case of temporarily performing direct communication when a specific sympathetic response scene is detected has been specifically described above. Note that, outside of specific sympathetic response scenes, the emotion of each viewing user is presented to the other viewing users after being converted into emotion representation data (for example, a non-verbal reaction). In this way, since no conversation or video beyond what is necessary is sent to other users, privacy protection is maintained.
Further, although various timings can be considered for temporarily starting direct communication and then ending it, direct communication may end, for example, when the client devices 1A-2' and 1B-2' or the server 2-2' detects no communication for a predetermined time, or according to a specific end instruction operation by a viewing user. The specific end instruction operation is, for example, an operation using a remote controller connected to the television device 30, or a gesture, voice, or line of sight of the viewing user. When such a specific end instruction operation by a viewing user is detected, the client devices 1A-2' and 1B-2' end the transmission of the audio or video of the viewing user.
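The two end conditions described above could be handled, for example, by a small session manager on the client side. The sketch below is a simplified assumption of how the idle timeout and the explicit end instruction might be combined; all names and the timeout value are chosen only for illustration.

```python
# Illustrative sketch; not the patented implementation.
import time

class DirectCommunicationSession:
    def __init__(self, idle_timeout_sec: float = 60.0):
        self.idle_timeout_sec = idle_timeout_sec
        self.last_activity = time.monotonic()
        self.active = True

    def on_media_received(self):
        # Any audio/video frame from the partner counts as activity.
        self.last_activity = time.monotonic()

    def on_end_instruction(self):
        # Remote-controller button, gesture, voice command, or gaze-based end operation.
        self.active = False

    def should_terminate(self) -> bool:
        idle = time.monotonic() - self.last_activity
        return (not self.active) or idle > self.idle_timeout_sec
```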
Further, in the viewing reaction feedback system according to the present modification, direct communication may also be started temporarily even when no specific sympathetic response scene is detected, for example, according to a specific instruction by a viewing user who wishes to perform direct communication temporarily. The specific instruction may be given to the client device 1 by, for example, a specific operation on a remote controller (pressing a button or the like), an operation on a GUI of an information processing terminal (for example, a smartphone), a specific gesture, voice recognition, line-of-sight detection, or the like. The client device 1 that receives this instruction notifies the server 2-2' and starts direct communication with the client device of the viewing environment in which a specific acquainted other viewing user is present.
Note that there are also cases in which a user who wishes to concentrate on viewing the content wishes to refuse direct communication of the video or audio of other viewing users that has not been converted into emotion representation data (for example, a non-verbal reaction). In this case, a refusal setting can be made on the client device 1 side so that detection notifications of specific sympathetic response scenes or direct communication are not received.
<2-3. Third embodiment>
As described above, by presenting emotion representation data based on the integration result of the emotion estimation results of the viewing users to a plurality of viewing environments connected to the server 2 via a network, the sympathetic response of each viewing user can be further strengthened. This scheme of strengthening sympathetic response by presenting emotion representation data can also be applied to the case of a single viewing environment. When there is only one viewing environment, the integration result of the emotion estimation results of other viewing users cannot be used; however, by determining emotion representation data such as a non-verbal reaction according to the emotion estimation result of the user and presenting it to the user himself or herself, the emotion felt by the user can be strengthened and amplified. An application of the present disclosure to such a single viewing environment is hereinafter referred to as an emotion amplifier. The emotion amplifier according to the present embodiment is specifically described below with reference to FIG. 11.
FIG. 11 is a block diagram showing an example of the internal configuration of the client device 1-3 according to the third embodiment. As shown in FIG. 11, the client device 1-3 according to the present embodiment has a reaction information acquiring unit 12, an emotion estimation unit 13, a determining unit 17, and an output control unit 18. Each of these components is the same as the corresponding component of the first embodiment described with reference to FIG. 3.
Specifically, first, the reaction information of the viewing user is acquired by the reaction information acquiring unit 12. For example, video of a viewing user excitedly watching a good ball game (a victory pose or the like) is acquired.
Next, the emotion estimation unit 13 estimates the emotion of the viewing user according to the reaction information acquired by the reaction information acquiring unit 12. For example, the emotion estimation unit 13 estimates the emotion of "happiness" from the video of the victory pose.
Next, the determining unit 17 determines emotion representation data representing the estimated emotion. For example, in the case of the "happiness" emotion, emotion representation data formed of audio or text (for example, "hurray!" or "yay") is determined.
Then, the output control unit 18 outputs the emotion representation data determined by the determining unit 17 from an information presentation device (the television device 30 or a speaker), amplifying the excitement (liveliness) of the viewing user.
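The emotion amplifier loop described in the three preceding paragraphs (acquire reaction, estimate emotion, choose representation data, output it) could be sketched as follows. The emotion labels, the mapping table, and all function and object names are assumptions made for illustration only.

```python
# Illustrative sketch of the single-environment "emotion amplifier" flow; not the patented implementation.

# Assumed mapping from an estimated emotion to emotion representation data (text / audio cue).
EMOTION_REPRESENTATION = {
    "happiness": {"text": "Yay!", "audio": "cheer.wav"},
    "sadness":   {"text": "*sniff*", "audio": "sob.wav"},
    "surprise":  {"text": "Whoa!", "audio": "gasp.wav"},
}

def acquire_reaction(sensor) -> dict:
    # Reaction information: camera image and microphone audio of the viewing user.
    return {"image": sensor.capture_image(), "audio": sensor.capture_audio()}

def estimate_emotion(reaction: dict) -> str:
    # Placeholder for facial-expression / voice analysis.
    # A real implementation would analyze reaction["image"] and reaction["audio"];
    # a fixed label is returned here so the sketch stays self-contained.
    return "happiness"

def amplify(sensor, presenter):
    reaction = acquire_reaction(sensor)          # reaction information acquiring unit 12
    emotion = estimate_emotion(reaction)         # emotion estimation unit 13
    data = EMOTION_REPRESENTATION.get(emotion)   # determining unit 17
    if data:
        presenter.show_text(data["text"])        # output control unit 18
        presenter.play_audio(data["audio"])
```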
In this way, even in a single viewing environment, emotion representation data determined according to the emotion of the viewing user himself or herself can be presented back to the viewing user, so that the emotion of the viewing user is amplified and the viewing user can become more excited, cry harder, or laugh louder.
Although the emotion amplifier according to the third embodiment described above is used in a single viewing environment, the emotion amplifier according to the present disclosure is not limited to a single viewing environment, and may also be used when a plurality of viewing users are connected via a network as in the first and second embodiments described above.
In this way, when only the viewing user himself or herself is excited, for example at the moment when an artist whom only that user likes appears on stage, or at the moment of a fine goal by a soccer player who matches only that user's own preferences, a cheer or the like with no actual audience behind it is output according to the excitement of the viewing user, and excitement suited to each individual user is produced.
Further, when viewing real-time content (for example, a soccer match), there are cases in which a delay occurs between the time at which a goal scene becomes exciting and the time at which emotion representation data is presented based on the result (integration result) of estimating the emotions of the other viewing users. In this case, when the emotion amplifier of the third embodiment described above is used in combination, emotion representation data determined according to the emotion of the viewing user himself or herself can be presented immediately, and the emotion representation data of the other viewing users can be presented afterwards, so that real-time excitement can be produced in a manner that compensates for the delay.
Further, in the viewing reaction feedback system according to the present embodiment, emotion representation data representing the emotions of specific other viewing users who are acquainted with the viewing user can be presented (output) first, and then emotion representation data representing the emotions of a large number of unspecified other viewing users who are not acquainted with the viewing user can be presented (output). In this way, the reactions of specific other viewing users (for example, acquainted friends) can be grasped first, and the excitement and sympathetic response can be shared among acquainted friends. Specifically, for example, the output control unit 18 may finish the output of the emotion representation data representing the emotions of the specific other viewing users (hereinafter also referred to as first emotion representation data) and then start the output of the emotion representation data representing the emotions of the large number of unspecified other viewing users (hereinafter also referred to as second emotion representation data). Alternatively, the output control unit 18 may fade in the output of the second emotion representation data while fading out the output of the first emotion representation data (cross-fade). Further, when starting the output of the second emotion representation data, the output control unit 18 may continue the output of the first emotion representation data while reducing its output value (volume, size of displayed text, and the like), which is larger than the output value of the second emotion representation data.
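As a rough sketch of the three output strategies just described (sequential output, cross-fade, and attenuated continuation), the following assumes a simple per-step gain model; the gain values, durations, and function names are illustrative assumptions rather than the patented behavior.

```python
# Illustrative sketch of sequential / cross-fade / attenuated output of first and second
# emotion representation data; not the patented implementation.

def crossfade_gains(t: float, start: float, duration: float) -> tuple[float, float]:
    """Return (gain_first, gain_second) at time t for a linear cross-fade
    that begins at `start` and lasts `duration` seconds."""
    if t <= start:
        return 1.0, 0.0    # only the first data (acquainted users) is output
    if t >= start + duration:
        return 0.0, 1.0    # only the second data (unspecified many users) is output
    x = (t - start) / duration
    return 1.0 - x, x      # first fades out while second fades in

def attenuated_gains(second_started: bool) -> tuple[float, float]:
    # Alternative strategy: keep outputting the first data, but at a reduced
    # output value (volume / text size) once the second data starts.
    return (0.3, 1.0) if second_started else (1.0, 0.0)
```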
<2-4. Fourth embodiment>
In the first to third embodiments described above, emotion representation data is presented among viewing users who are viewing the content in real time, so that the sympathetic response of each viewing user is strengthened; however, the viewing reaction feedback system according to the present embodiment is not limited to this. For example, the emotion estimation result of each viewing user, or the integration result thereof, may be attached to the corresponding scene of the content as a label (hereinafter also referred to as an emotion label), and this emotion label may subsequently be used for scene detection and content recommendation, as well as for presenting emotion representation data.
Ordinarily, when chapters are set for content in a recording device or the like, they are set according to information such as commercial breaks (CM), attached metatags (scene boundaries, performers, and the like), or the audio level within the content. However, since such chapters do not reflect the reactions of actual viewing users, it is difficult, when viewing the content, to detect the scenes at which the general public actually became excited, even for a user who wishes to watch such scenes.
Therefore, in the present embodiment, emotion labels are attached to content scenes according to the emotion estimation results of actual viewing users, so that a scene that a user wishes to view can be selected using the emotion estimation results of actual other viewing users. Further, by using the emotion estimation results of actual other viewing users for content recommendation, highly accurate recommendations can be made for a viewing user. A specific description is given below with reference to FIG. 12.
FIG. 12 is a block diagram showing an example of the internal configuration of each device forming the viewing reaction feedback system according to the fourth embodiment. As shown in FIG. 12, the viewing reaction feedback system according to the present embodiment has a plurality of client devices 1A-4 and 1B-4 and a server 2-4.
Client device 1A-4
The client device 1A-4 is a device that controls the viewing environment A described above and, specifically, as shown in FIG. 12, has a reaction information acquiring unit 12A, an emotion estimation unit 13A, a communication unit 10A, and a content processing unit 19A. The reaction information acquiring unit 12A, the emotion estimation unit 13A, and the communication unit 10A are the same as the corresponding units of the second embodiment described with reference to FIG. 10.
The content processing unit 19A performs scene detection of the content and playback control of the content according to user operations, the emotion labels, and the like. Further, the content processing unit 19A can transmit information on the content being viewed by the user (channel information, program information, time information, and the like) to the server 2-4 via the communication unit 10A.
Client device 1B-4
The client device 1B-4 is a device that controls the viewing environment B described above and, specifically, as shown in FIG. 12, has a reaction information acquiring unit 12B, an emotion estimation unit 13B, a communication unit 10B, and a content processing unit 19B. The reaction information acquiring unit 12B and the communication unit 10B are the same as the corresponding units of the second embodiment described with reference to FIG. 10.
Further, the emotion estimation unit 13B has the same function as the emotion estimation unit 13 according to the second embodiment described with reference to FIG. 10.
The content processing unit 19B has the same function as the content processing unit 19A described above.
Server 2-4
As shown in FIG. 12, the server 2-4 has a receiving unit 21, an association unit 26, an emotion label generation unit 27, and a transmission unit 25.
The receiving unit 21 receives emotion estimation results from a plurality of client devices including the client device 1A-4.
The association unit 26 maps (associates) the emotion estimation result of each viewing user received by the receiving unit 21 with the content. Specifically, the association unit 26 associates each emotion estimation result with the time series of the content. At this time, the association unit 26 may associate an integration result obtained by integrating a plurality of emotion estimation results with the corresponding scene (chapter) of the content.
The emotion label generation unit 27 generates, as an emotion label, information indicating which emotion is associated with which scene of the content, according to the emotion estimation results (or integration results) associated by the association unit 26, and transmits this information to each of the client devices 1A-4 and 1B-4 via the transmission unit 25.
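A minimal sketch of how this server-side association and label generation could look is given below; the data shapes, the per-chapter aggregation, and the label format are assumptions made for illustration only.

```python
# Illustrative sketch of associating emotion estimation results with the content
# time series and generating emotion labels per scene; not the patented implementation.
from collections import defaultdict

def generate_emotion_labels(estimations, chapter_length_sec=60.0):
    """
    estimations: iterable of (content_time_sec, emotion, score) tuples received
                 from many viewing users' client devices.
    Returns a list of emotion labels: (chapter_index, dominant_emotion, mean_score, sample_count).
    """
    per_chapter = defaultdict(list)
    for t, emotion, score in estimations:
        per_chapter[int(t // chapter_length_sec)].append((emotion, score))

    labels = []
    for chapter, entries in sorted(per_chapter.items()):
        # Integrate: pick the most frequent emotion in the chapter and average its score.
        counts = defaultdict(list)
        for emotion, score in entries:
            counts[emotion].append(score)
        dominant = max(counts, key=lambda e: len(counts[e]))
        mean_score = sum(counts[dominant]) / len(counts[dominant])
        labels.append((chapter, dominant, mean_score, len(entries)))
    return labels

labels = generate_emotion_labels([(10.0, "laughter", 0.7), (15.0, "laughter", 0.9), (70.0, "surprise", 0.6)])
# -> [(0, 'laughter', 0.8, 2), (1, 'surprise', 0.6, 1)]
```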
With the above configuration, when the content is played back, the content processing units 19A and 19B (hereinafter also referred to as content processing units 19) of the client devices 1A-4 and 1B-4 can perform scene detection of the content according to the emotion labels received via the communication units 10.
For example, when many users view a certain program X, the server 2-4 generates emotion labels indicating whether the users became excited, laughed, were disappointed, and so on, and transmits the emotion labels to the client devices 1.
Then, when a user subsequently views the content, the client device 1 uses the emotion labels, which indicate where actual other users became excited, to perform scene detection and content recommendation.
In this way, a user who is about to view the content can detect the scenes at which the general public or a particular community actually became excited.
Further, a user who is about to view content can find content that matches the user's own current mood (emotion). For example, when the user wants to laugh, not only comedy programs that can be inferred from their titles but also programs that cannot be guessed from their titles (or descriptions) can be found.
Further, the client device 1 can recommend content more suited to the viewing user by using the number of other viewing users whose emotion estimation results were used to generate the emotion labels, or the scores of those emotion estimation results.
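A very small sketch of a recommendation score built from the (assumed) label format above could look like this; the weighting scheme is an arbitrary assumption, not the patented method.

```python
# Illustrative sketch; the weighting scheme is an assumption, not the patented method.
def recommendation_score(labels, preferred_emotion="laughter"):
    """Score a piece of content from its emotion labels, where labels are
    (chapter, emotion, mean_score, sample_count) tuples."""
    score = 0.0
    for _chapter, emotion, mean_score, count in labels:
        if emotion == preferred_emotion:
            # Weight by how strongly and how many viewers reacted.
            score += mean_score * count
    return score
```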
Further, when a recorded (packaged) program is played back in a time-shifted manner, the client device 1 can use the emotion labels to perform scene playback, for example, playing back only the exciting scenes or the interesting scenes.
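On the client side, this time-shifted scene playback could be as simple as filtering chapters by emotion and score; the following sketch makes that concrete, reusing the assumed label format from the earlier example, with the player methods also assumed for illustration.

```python
# Illustrative sketch of client-side scene playback driven by emotion labels;
# not the patented implementation.

def select_scenes(labels, wanted_emotion="laughter", min_score=0.7, chapter_length_sec=60.0):
    """Return (start_sec, end_sec) ranges of chapters whose emotion label matches."""
    ranges = []
    for chapter, emotion, score, _count in labels:
        if emotion == wanted_emotion and score >= min_score:
            start = chapter * chapter_length_sec
            ranges.append((start, start + chapter_length_sec))
    return ranges

def play_highlights(player, labels):
    for start, end in select_scenes(labels):
        player.seek(start)      # assumed player API
        player.play_until(end)  # assumed player API
```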
Note that the target content according to the present embodiment includes not only broadcast programs but also programs recorded in a recording device, VOD content, Internet video content, and the like.
Further, the client device 1 may represent the emotion labels on a playback progress bar by icons, colors, and the like.
Note that, although the range of viewing users targeted by the association unit 26 of the server 2-4 is not particularly limited, the target range may be everyone (a large number of unspecified users), or may be limited to users belonging to a particular community or set as friends of the viewing user (specific other users).
<<3. Summary>>
As described above, the present embodiment can feed back emotion representation data representing the viewing reactions of other viewing users or of the viewing user himself or herself, by a method that protects the privacy of the viewing users and does not hinder their viewing, and can thereby promote enhanced sympathetic responses.
Further, in the visual communication of the related art, the camera video and microphone audio of the viewing user are presented to the session partner as they are; from the viewpoint of privacy, this has the problem that reactions the user did not actually intend to show (speech, facial expressions, motions, and the like) are transmitted to the partner. Further, even on the side of the session partner who receives the reactions, the raw reactions of the partner can hinder content viewing. According to the present embodiment, however, the reactions of a viewing user are presented after being converted into emotion representation data (for example, a non-verbal reaction), so that privacy is protected and an experience of sharing the content with others can be obtained without hindering content viewing.
Further, in the content viewing of the related art, emotions (laughter, exclamations, regret), excitement, and the like produced by the content producer are inserted into the content (for example, audience laughter inserted into a comedy program). According to the present embodiment, however, emotion representation data is presented based on the actual emotions or excitement of other viewing users in a specific community (specific acquainted other users and the like) or of the individual user, so that a more realistic sympathetic experience can be obtained.
Further, in a shared content viewing experience using the visual communication of the related art, when an emotional expression without voice is produced, the emotion cannot be shared unless the user turns his or her gaze to the partner's video shown on the screen, and doing so hinders content viewing (for example, when a partner silently smiles while a drama program is being viewed). According to the present embodiment, however, emotions without audio (only a smile or the like) are also converted into emotion representation data, so that a richer sympathetic experience can be realized.
Further, when many people actually view content together, the reactions of the other people arise in the surrounding environment of the user. Therefore, in remote shared content viewing, an uncomfortable feeling may occur when the non-verbal reactions (audio data) of others are presented from in front of the user. According to the present embodiment, the non-verbal reactions (audio data) are localized on a sound image so that they appear to come from around the user; therefore, even in remote shared content viewing, a unified sense of presence can be realized.
Further, in the present embodiment, by using a projector to present (project) the non-verbal reactions in the peripheral region of the television device 30, a sympathetic experience can be realized without hindering content viewing.
Preferred embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is of course not limited to the above examples. Those skilled in the art may find various alterations and modifications within the scope of the appended claims, and it should be understood that they will naturally come under the technical scope of the present disclosure.
For example, a computer program can be created for causing hardware (for example, a CPU, a ROM, and a RAM) embedded in the client devices 1 (1A-1 to 1A-4, 1B-1 to 1B-4) and the servers 2 (2-1 to 2-4) to exhibit the functions of the client device 1 and the server 2 described above. A computer-readable storage medium storing the computer program is also provided.
Further, although the present disclosure has been described using content such as a soccer match or an artist's live performance, the content is not limited to this and includes, for example, a game in which a plurality of users (players) can participate via a network.
For example, when the user 3A plays a certain game together with a large number of other users, the emotions of the users on the same team as the user 3A may be presented using non-verbal reactions.
Further, the emotion estimation result and the integration result of each user may be held in the server 2 together with time stamps, along the time series of the content. In this way, when the content is played back later, the client device 1 can acquire the emotion estimation results and integration results held in the server 2 and present non-verbal reactions according to the time stamp of the content being played back.
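One simple way to realize this timestamp-aligned replay of stored reactions is sketched below; the storage format and lookup logic are assumptions made for illustration only.

```python
# Illustrative sketch of replaying stored non-verbal reactions aligned to playback time;
# not the patented implementation.
import bisect

class ReactionTimeline:
    def __init__(self, entries):
        # entries: list of (content_time_sec, emotion_representation_data), sorted by time.
        self.entries = sorted(entries)
        self.times = [t for t, _ in self.entries]

    def reactions_between(self, prev_time: float, now: float):
        """Return the stored reactions whose timestamps fall inside the current playback step."""
        lo = bisect.bisect_right(self.times, prev_time)
        hi = bisect.bisect_right(self.times, now)
        return [data for _, data in self.entries[lo:hi]]

timeline = ReactionTimeline([(12.0, "cheer.wav"), (45.5, "laugh.wav")])
print(timeline.reactions_between(10.0, 15.0))  # -> ['cheer.wav']
```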
Additionally, the present technology may also be configured as below.
(1) A client device including:
an acquiring unit that acquires a reaction of a viewing user to content;
an emotion estimation unit that estimates an emotion of the viewing user according to reaction information of the viewing user acquired by the acquiring unit;
a determining unit that determines emotion representation data representing the emotion estimated by the emotion estimation unit; and
an output unit that outputs the emotion representation data determined by the determining unit.
(2) The client device according to (1),
wherein the determining unit determines the emotion representation data according to an emotion obtained by integrating emotion estimation results of a plurality of viewing users.
(3) The client device according to (2), further including:
a transmission unit that transmits, to a server, information representing the emotion estimated by the emotion estimation unit; and
a receiving unit that receives, from the server, information representing the emotion obtained by integrating the emotion estimation results of the plurality of viewing users.
(4) The client device according to (2) or (3),
wherein the emotion estimation results of the plurality of viewing users are emotion estimation results of specific viewing users.
(5) The client device according to (2) or (3),
wherein the emotion estimation results of the plurality of viewing users are emotion estimation results of a large number of unspecified viewing users.
(6) The client device according to (2),
wherein the output unit starts output of first emotion representation data determined according to the integration of emotion estimation results of specific viewing users, and then starts output of second emotion representation data determined according to the integration of emotion estimation results of a large number of unspecified viewing users.
(7) The client device according to (6),
wherein the output unit completes the output of the first emotion representation data and then starts the output of the second emotion representation data.
(8) The client device according to (6),
wherein the output unit fades in the output of the second emotion representation data while fading out the output of the first emotion representation data.
(9) The client device according to (6),
wherein, when starting the output of the second emotion representation data, the output unit reduces the output value of the first emotion representation data, which is larger than the output value of the second emotion representation data.
(10) The client device according to any one of (1) to (9),
wherein the emotion representation data is formed of audio data, text data, or drawing data.
(11) The client device according to (10),
wherein the emotion representation data is an onomatopoeic word, an interjection, a sound effect, or an effect line representing a predetermined emotion.
(12) The client device according to any one of (1) to (11),
wherein the determining unit determines a type of the emotion representation data corresponding to at least one of viewer demographics of the content and an attribute of the viewing user.
(13) The client device according to any one of (1) to (12),
wherein the acquiring unit collects audio of the viewing user as the reaction of the viewing user.
(14) The client device according to (13), further including:
an extraction unit that extracts the emotion representation data from the collected audio of the viewing user; and
a transmission unit that transmits the emotion representation data extracted by the extraction unit to a server.
(15) The client device according to any one of (1) to (14),
wherein the acquiring unit captures a face image of the viewing user as the reaction of the viewing user.
(16) The client device according to any one of (1) to (15),
wherein the output unit outputs the emotion representation data by at least one of audio and display.
(17) The client device according to any one of (1) to (16),
wherein the output unit outputs the emotion representation data in coordination with an external device.
(18) A control method including:
a step of acquiring a reaction of a viewing user to content;
a step of estimating an emotion of the viewing user according to the acquired reaction information of the viewing user;
a step of determining emotion representation data representing the estimated emotion; and
a step of outputting the determined emotion representation data.
(19) A system including:
a client device having:
an acquiring unit that acquires a reaction of a viewing user to content,
an emotion estimation unit that estimates an emotion of the viewing user according to reaction information of the viewing user acquired by the acquiring unit,
a determining unit that determines emotion representation data according to an integration result of emotion estimation results received from a server, and
an output unit that outputs the emotion representation data determined by the determining unit; and
a server having:
an integration unit that integrates emotion estimation results of viewing users received from a plurality of client devices.
(20) A program for causing a computer to function as:
an acquiring unit that acquires a reaction of a viewing user to content;
an emotion estimation unit that estimates an emotion of the viewing user according to reaction information of the viewing user acquired by the acquiring unit;
a determining unit that determines emotion representation data representing the emotion estimated by the emotion estimation unit; and
an output unit that outputs the emotion representation data determined by the determining unit.
Symbol description
1,1A-1 ~ 1A-4,1B-1 ~ 1B-4: client devices
2,2-1 ~ 2-4: server
3,3A ~ 3D: viewing user
4: speaker
10: communication unit
11: sensor
12: reaction information acquiring unit
13: emotion estimation unit
14: extraction unit
15: transmission unit
16: receiving unit
17: determining unit
18: output control unit
19: content processing unit
21: receiving unit
22: communication unit
23: integration unit
24: specific sympathetic response scene detection unit
25: transmission unit
26: association unit
27: emotion label generation unit
28: user profile DB
30: television device
Claims (20)
1. A client device comprising:
an acquiring unit that acquires a reaction of a viewing user to content;
an emotion estimation unit that estimates an emotion of the viewing user based on reaction information of the viewing user acquired by the acquiring unit;
a determining unit that determines emotion representation data representing the emotion estimated by the emotion estimation unit; and
an output unit that outputs the emotion representation data determined by the determining unit.
2. The client device according to claim 1,
wherein the determining unit determines the emotion representation data based on an emotion obtained by integrating emotion estimation results of a plurality of viewing users.
3. The client device according to claim 2, further comprising:
a transmission unit that transmits, to a server, information representing the emotion estimated by the emotion estimation unit; and
a receiving unit that receives, from the server, information representing the emotion obtained by integrating the emotion estimation results of the plurality of viewing users.
4. The client device according to claim 2,
wherein the emotion estimation results of the plurality of viewing users are emotion estimation results of specific viewing users.
5. The client device according to claim 2,
wherein the emotion estimation results of the plurality of viewing users are emotion estimation results of a large number of unspecified viewing users.
6. The client device according to claim 2,
wherein the output unit starts output of first emotion representation data determined based on an integration of emotion estimation results of specific viewing users, and then starts output of second emotion representation data determined based on an integration of emotion estimation results of a large number of unspecified viewing users.
7. The client device according to claim 6,
wherein the output unit completes the output of the first emotion representation data and then starts the output of the second emotion representation data.
8. The client device according to claim 6,
wherein the output unit fades in the output of the second emotion representation data while fading out the output of the first emotion representation data.
9. The client device according to claim 6,
wherein, when starting the output of the second emotion representation data, the output unit reduces the output value of the first emotion representation data, which is larger than the output value of the second emotion representation data.
10. The client device according to claim 1,
wherein the emotion representation data is formed of audio data, text data, or drawing data.
11. The client device according to claim 10,
wherein the emotion representation data is an onomatopoeic word, an interjection, a sound effect, or an effect line representing a predetermined emotion.
12. The client device according to claim 1,
wherein the determining unit determines a type of the emotion representation data corresponding to at least one of viewer demographics of the content and an attribute of the viewing user.
13. The client device according to claim 1,
wherein the acquiring unit collects audio of the viewing user as the reaction of the viewing user.
14. The client device according to claim 13, further comprising:
an extraction unit that extracts the emotion representation data from the collected audio of the viewing user; and
a transmission unit that transmits the emotion representation data extracted by the extraction unit to a server.
15. The client device according to claim 1,
wherein the acquiring unit captures a face image of the viewing user as the reaction of the viewing user.
16. The client device according to claim 1,
wherein the output unit outputs the emotion representation data by at least one of audio and display.
17. The client device according to claim 1,
wherein the output unit outputs the emotion representation data in coordination with an external device.
18. A control method comprising:
a step of acquiring a reaction of a viewing user to content;
a step of estimating an emotion of the viewing user based on the acquired reaction information of the viewing user;
a step of determining emotion representation data representing the estimated emotion; and
a step of outputting the determined emotion representation data.
19. A system comprising:
a client device having:
an acquiring unit that acquires a reaction of a viewing user to content,
an emotion estimation unit that estimates an emotion of the viewing user based on reaction information of the viewing user acquired by the acquiring unit,
a determining unit that determines emotion representation data based on an integration result of emotion estimation results received from a server, and
an output unit that outputs the emotion representation data determined by the determining unit; and
a server having:
an integration unit that integrates emotion estimation results of the viewing users received from a plurality of the client devices.
20. A program for causing a computer to function as:
an acquiring unit that acquires a reaction of a viewing user to content;
an emotion estimation unit that estimates an emotion of the viewing user based on reaction information of the viewing user acquired by the acquiring unit;
a determining unit that determines emotion representation data representing the emotion estimated by the emotion estimation unit; and
an output unit that outputs the emotion representation data determined by the determining unit.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013113768 | 2013-05-30 | ||
JP2013-113768 | 2013-05-30 | ||
PCT/JP2014/061071 WO2014192457A1 (en) | 2013-05-30 | 2014-04-18 | Client device, control method, system and program |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105247879A true CN105247879A (en) | 2016-01-13 |
CN105247879B CN105247879B (en) | 2019-07-12 |
Family
ID=51988496
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201480029851.7A Active CN105247879B (en) | 2013-05-30 | 2014-04-18 | Client devices, control method, the system and program |
Country Status (6)
Country | Link |
---|---|
US (1) | US10225608B2 (en) |
EP (1) | EP3007456A4 (en) |
JP (1) | JP6369462B2 (en) |
CN (1) | CN105247879B (en) |
BR (1) | BR112015029324A2 (en) |
WO (1) | WO2014192457A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108833992A (en) * | 2018-06-29 | 2018-11-16 | 北京优酷科技有限公司 | Caption presentation method and device |
CN110418148A (en) * | 2019-07-10 | 2019-11-05 | 咪咕文化科技有限公司 | Video generation method, video generation device and readable storage medium |
CN110959166A (en) * | 2017-10-13 | 2020-04-03 | 索尼公司 | Information processing apparatus, information processing method, information processing system, display apparatus, and reservation system |
CN112655177A (en) * | 2018-06-27 | 2021-04-13 | 脸谱公司 | Asynchronous co-viewing |
CN113545096A (en) * | 2019-03-11 | 2021-10-22 | 索尼集团公司 | Information processing apparatus and information processing system |
Families Citing this family (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11064257B2 (en) * | 2011-11-07 | 2021-07-13 | Monet Networks, Inc. | System and method for segment relevance detection for digital content |
US10638197B2 (en) * | 2011-11-07 | 2020-04-28 | Monet Networks, Inc. | System and method for segment relevance detection for digital content using multimodal correlations |
KR102337509B1 (en) * | 2014-08-29 | 2021-12-09 | 삼성전자주식회사 | Method for providing content and electronic device thereof |
US9671862B2 (en) * | 2014-10-15 | 2017-06-06 | Wipro Limited | System and method for recommending content to a user based on user's interest |
JP6367748B2 (en) * | 2015-04-10 | 2018-08-01 | 日本電信電話株式会社 | Recognition device, video content presentation system |
WO2017094328A1 (en) * | 2015-11-30 | 2017-06-08 | ソニー株式会社 | Information processing device, information processing method, and program |
US10534955B2 (en) * | 2016-01-22 | 2020-01-14 | Dreamworks Animation L.L.C. | Facial capture analysis and training system |
WO2017150103A1 (en) * | 2016-02-29 | 2017-09-08 | パナソニックIpマネジメント株式会社 | Audio processing device, image processing device, microphone array system, and audio processing method |
JP6720575B2 (en) * | 2016-02-29 | 2020-07-08 | 株式会社ニコン | Video playback device and video processing device |
US10304447B2 (en) | 2017-01-25 | 2019-05-28 | International Business Machines Corporation | Conflict resolution enhancement system |
JP6767322B2 (en) * | 2017-08-18 | 2020-10-14 | ヤフー株式会社 | Output control device, output control method and output control program |
KR101974130B1 (en) * | 2018-02-02 | 2019-04-30 | 동서대학교 산학협력단 | Realtime Responsive Contents Providing System Using Emotional Information of Audience |
JP7098949B2 (en) * | 2018-02-14 | 2022-07-12 | 富士フイルムビジネスイノベーション株式会社 | Information processing equipment and information processing programs |
US11727724B1 (en) * | 2018-09-27 | 2023-08-15 | Apple Inc. | Emotion detection |
US20200184979A1 (en) * | 2018-12-05 | 2020-06-11 | Nice Ltd. | Systems and methods to determine that a speaker is human using a signal to the speaker |
JP7189434B2 (en) * | 2019-01-11 | 2022-12-14 | ダイキン工業株式会社 | Spatial control system |
JP7212255B2 (en) * | 2019-02-04 | 2023-01-25 | 株式会社Mixi | Information processing system, control program and information processing device |
WO2020217373A1 (en) * | 2019-04-25 | 2020-10-29 | 日本電信電話株式会社 | Psychological state visualization device, method, and program |
US20220301580A1 (en) * | 2019-08-06 | 2022-09-22 | Nippon Telegraph And Telephone Corporation | Learning apparatus, estimation apparatus, methods and programs for the same |
US20230342549A1 (en) * | 2019-09-20 | 2023-10-26 | Nippon Telegraph And Telephone Corporation | Learning apparatus, estimation apparatus, methods and programs for the same |
JP7229146B2 (en) * | 2019-11-13 | 2023-02-27 | グリー株式会社 | Information processing device, information processing method and information processing program |
WO2021165425A1 (en) * | 2020-02-21 | 2021-08-26 | Philip Morris Products Sa | Method and apparatus for interactive and privacy-preserving communication between a server and a user device |
JP7464437B2 (en) | 2020-04-23 | 2024-04-09 | 株式会社Nttドコモ | Information processing device |
US11277663B2 (en) * | 2020-05-18 | 2022-03-15 | Mercury Analytics, LLC | Systems and methods for providing survey data |
WO2021237254A1 (en) * | 2020-05-20 | 2021-11-25 | Chandrasagaran Murugan | Remote engagement system |
US20230199251A1 (en) * | 2020-07-29 | 2023-06-22 | Lollol Co., Ltd. | Online show rendition system, laughter analysis device, and laughter analysis method |
US11503090B2 (en) | 2020-11-30 | 2022-11-15 | At&T Intellectual Property I, L.P. | Remote audience feedback mechanism |
US20220174357A1 (en) * | 2020-11-30 | 2022-06-02 | At&T Intellectual Property I, L.P. | Simulating audience feedback in remote broadcast events |
US11477520B2 (en) * | 2021-02-11 | 2022-10-18 | Roku, Inc. | Content-modification system with volume-level detection feature |
US12167081B2 (en) * | 2021-05-21 | 2024-12-10 | Adeia Guides, Inc. | Methods and systems for personalized content based on captured gestures |
US11671657B2 (en) * | 2021-06-30 | 2023-06-06 | Rovi Guides, Inc. | Method and apparatus for shared viewing of media content |
JP2023025400A (en) * | 2021-08-10 | 2023-02-22 | 富士フイルム株式会社 | Emotion tagging system, method, and program |
DE102022119188A1 (en) | 2021-08-11 | 2023-02-16 | Sony Europe B.V. | INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING PROCEDURES |
US20240386876A1 (en) * | 2021-10-06 | 2024-11-21 | Sony Group Corporation | Information processing device, information processing method, and storage medium |
US20250008195A1 (en) * | 2021-11-11 | 2025-01-02 | Sony Group Corporation | Information processing apparatus, information processing method, and program |
US12165382B2 (en) * | 2022-03-14 | 2024-12-10 | Disney Enterprises, Inc. | Behavior-based computer vision model for content selection |
US20240214633A1 (en) * | 2022-12-22 | 2024-06-27 | Rovi Guides, Inc. | Systems and methods for enhancing group media session interactions |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050289582A1 (en) * | 2004-06-24 | 2005-12-29 | Hitachi, Ltd. | System and method for capturing and using biometrics to review a product, service, creative work or thing |
CN1908965A (en) * | 2005-08-05 | 2007-02-07 | 索尼株式会社 | Information processing apparatus and method, and program |
CN101169955A (en) * | 2006-10-27 | 2008-04-30 | 三星电子株式会社 | Method and apparatus for generating meta data of content |
CN102170591A (en) * | 2010-02-26 | 2011-08-31 | 索尼公司 | Content playing device |
US20120222057A1 (en) * | 2011-02-27 | 2012-08-30 | Richard Scott Sadowsky | Visualization of affect responses to videos |
TW201306565A (en) * | 2011-06-20 | 2013-02-01 | Microsoft Corp | Video selection based on environmental sensing |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07129594A (en) * | 1993-10-29 | 1995-05-19 | Toshiba Corp | Automatic interpretation system |
US20070271580A1 (en) * | 2006-05-16 | 2007-11-22 | Bellsouth Intellectual Property Corporation | Methods, Apparatus and Computer Program Products for Audience-Adaptive Control of Content Presentation Based on Sensed Audience Demographics |
US20090063995A1 (en) * | 2007-08-27 | 2009-03-05 | Samuel Pierce Baron | Real Time Online Interaction Platform |
US8327395B2 (en) * | 2007-10-02 | 2012-12-04 | The Nielsen Company (Us), Llc | System providing actionable insights based on physiological responses from viewers of media |
JP5020838B2 (en) | 2008-01-29 | 2012-09-05 | ヤフー株式会社 | Viewing response sharing system, viewing response management server, and viewing response sharing method |
US8645516B2 (en) * | 2008-02-22 | 2014-02-04 | Accenture Global Services Limited | System for analyzing user activity in a collaborative environment |
JP5244547B2 (en) | 2008-11-10 | 2013-07-24 | 株式会社日立ソリューションズ | Voice comment delivery sharing method and system |
JP5326910B2 (en) * | 2009-01-20 | 2013-10-30 | ソニー株式会社 | Information processing apparatus, information processing method, and program |
JP5626878B2 (en) | 2010-10-20 | 2014-11-19 | Necカシオモバイルコミュニケーションズ株式会社 | Viewing system, mobile terminal, server, viewing method |
WO2012070429A1 (en) * | 2010-11-24 | 2012-05-31 | 日本電気株式会社 | Feeling-expressing-word processing device, feeling-expressing-word processing method, and feeling-expressing-word processing program |
JP2012129800A (en) | 2010-12-15 | 2012-07-05 | Sony Corp | Information processing apparatus and method, program, and information processing system |
US20120311032A1 (en) * | 2011-06-02 | 2012-12-06 | Microsoft Corporation | Emotion-based user identification for online experiences |
JP2013070155A (en) | 2011-09-21 | 2013-04-18 | Nec Casio Mobile Communications Ltd | Moving image scoring system, server device, moving image scoring method, and moving image scoring program |
US8943526B2 (en) * | 2011-12-02 | 2015-01-27 | Microsoft Corporation | Estimating engagement of consumers of presented content |
US8635637B2 (en) * | 2011-12-02 | 2014-01-21 | Microsoft Corporation | User interface presenting an animated avatar performing a media reaction |
US9571879B2 (en) * | 2012-01-10 | 2017-02-14 | Microsoft Technology Licensing, Llc | Consumption of content with reactions of an individual |
US20130247078A1 (en) * | 2012-03-19 | 2013-09-19 | Rawllin International Inc. | Emoticons for media |
-
2014
- 2014-04-18 WO PCT/JP2014/061071 patent/WO2014192457A1/en active Application Filing
- 2014-04-18 BR BR112015029324A patent/BR112015029324A2/en active Search and Examination
- 2014-04-18 US US14/891,846 patent/US10225608B2/en active Active
- 2014-04-18 EP EP14804782.2A patent/EP3007456A4/en not_active Ceased
- 2014-04-18 JP JP2015519740A patent/JP6369462B2/en not_active Expired - Fee Related
- 2014-04-18 CN CN201480029851.7A patent/CN105247879B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050289582A1 (en) * | 2004-06-24 | 2005-12-29 | Hitachi, Ltd. | System and method for capturing and using biometrics to review a product, service, creative work or thing |
CN1908965A (en) * | 2005-08-05 | 2007-02-07 | 索尼株式会社 | Information processing apparatus and method, and program |
CN101169955A (en) * | 2006-10-27 | 2008-04-30 | 三星电子株式会社 | Method and apparatus for generating meta data of content |
CN102170591A (en) * | 2010-02-26 | 2011-08-31 | 索尼公司 | Content playing device |
US20120222057A1 (en) * | 2011-02-27 | 2012-08-30 | Richard Scott Sadowsky | Visualization of affect responses to videos |
TW201306565A (en) * | 2011-06-20 | 2013-02-01 | Microsoft Corp | Video selection based on environmental sensing |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110959166A (en) * | 2017-10-13 | 2020-04-03 | 索尼公司 | Information processing apparatus, information processing method, information processing system, display apparatus, and reservation system |
CN110959166B (en) * | 2017-10-13 | 2024-04-30 | 索尼公司 | Information processing apparatus, information processing method, information processing system, display apparatus, and reservation system |
US12020180B2 (en) | 2017-10-13 | 2024-06-25 | Sony Corporation | Information processing device, information processing method, information processing system, display device, and reservation system |
CN112655177A (en) * | 2018-06-27 | 2021-04-13 | 脸谱公司 | Asynchronous co-viewing |
CN108833992A (en) * | 2018-06-29 | 2018-11-16 | 北京优酷科技有限公司 | Caption presentation method and device |
CN113545096A (en) * | 2019-03-11 | 2021-10-22 | 索尼集团公司 | Information processing apparatus and information processing system |
CN110418148A (en) * | 2019-07-10 | 2019-11-05 | 咪咕文化科技有限公司 | Video generation method, video generation device and readable storage medium |
CN110418148B (en) * | 2019-07-10 | 2021-10-29 | 咪咕文化科技有限公司 | Video generation method, video generation device, and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2014192457A1 (en) | 2014-12-04 |
EP3007456A4 (en) | 2016-11-02 |
CN105247879B (en) | 2019-07-12 |
EP3007456A1 (en) | 2016-04-13 |
US10225608B2 (en) | 2019-03-05 |
BR112015029324A2 (en) | 2017-07-25 |
JP6369462B2 (en) | 2018-08-08 |
JPWO2014192457A1 (en) | 2017-02-23 |
US20160142767A1 (en) | 2016-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105247879A (en) | Client device, control method, system and program | |
US10699482B2 (en) | Real-time immersive mediated reality experiences | |
US11200028B2 (en) | Apparatus, systems and methods for presenting content reviews in a virtual world | |
US8990842B2 (en) | Presenting content and augmenting a broadcast | |
US10691202B2 (en) | Virtual reality system including social graph | |
US20120072936A1 (en) | Automatic Customized Advertisement Generation System | |
US7725547B2 (en) | Informing a user of gestures made by others out of the user's line of sight | |
US10701426B1 (en) | Virtual reality system including social graph | |
US20130177296A1 (en) | Generating metadata for user experiences | |
KR20170015122A (en) | Crowd-based haptics | |
TWI594203B (en) | Systems, machine readable storage mediums and methods for collaborative media gathering | |
CN102542488A (en) | Automatic advertisement generation based on user expressed marketing terms | |
JPWO2014181380A1 (en) | Information processing apparatus and application execution method | |
CN105898487A (en) | Interaction method and device for intelligent robot | |
WO2016136104A1 (en) | Information processing device, information processing method, and program | |
US11553255B2 (en) | Systems and methods for real time fact checking during stream viewing | |
US20230097729A1 (en) | Apparatus, systems and methods for determining a commentary rating | |
CN110784762B (en) | Video data processing method, device, equipment and storage medium | |
EP4080907A1 (en) | Information processing device and information processing method | |
WO2021112010A1 (en) | Information processing device, information processing method, and information processing program | |
JP2021197563A (en) | Related information distribution device, program, content distribution system, and content output terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |