
HK40079294A - Vision-based rehabilitation training system based on 3d human pose estimation using multi-view images - Google Patents


Info

Publication number: HK40079294A
Authority: HK (Hong Kong)
Prior art keywords: person, camera, perspective, motion, video
Application number: HK62023069131.8A
Other languages: Chinese (zh)
Inventors: 林斯姚, 唐晖, 黄超, 韩连漪, 霍志敏, 范伟
Original assignee: 腾讯美国有限责任公司 (Tencent America LLC)
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 腾讯美国有限责任公司 (Tencent America LLC)
Publication of HK40079294A (en)

Description

Vision-based rehabilitation training system based on 3D body posture estimation using multi-view images
Cross Reference to Related Applications
This application claims priority from U.S. application No. 17/096,256, filed 11/12/2020, which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present disclosure are directed to rehabilitation systems, and more particularly to markerless motion capture systems.
Background
Conventional rehabilitation systems require the patient to wear specific sensors on the body, which is inconvenient for the patient. Some recent efforts estimate hand pose via depth sensors for hand recovery training; however, the reliance on specific sensors limits widespread adoption of such systems. Furthermore, conventional equipment is often expensive.
Disclosure of Invention
Embodiments of the present disclosure may address the above problems and/or other problems.
Embodiments of the present disclosure may provide a markerless motion capture system using vision-based techniques that can estimate three-dimensional (3D) body pose based on multi-perspective images captured by low-cost commercial cameras (e.g., three cameras).
The disclosed embodiments may provide multi-view 3D body posture estimation for rehabilitation training of, e.g., movement disorders. Based on multi-view images captured by low-cost cameras, the deep learning model of embodiments of the present disclosure can compute accurate 3D body poses. The disclosed embodiments can not only derive 3D body joints, but can also provide assessment results and rehabilitation advice for patient motion. Accordingly, rehabilitation training assessment and guidance can be achieved without the assistance of a doctor during the process.
Embodiments of the present disclosure may include a module for presenting animations so that patients can monitor their motions and postures and improve their training. Further, embodiments of the present disclosure may include evaluation indicators, and may provide recommendations to help patients improve their recovery. According to embodiments, 3D body posture estimation techniques may facilitate rehabilitation training, which has not been achieved in the prior art.
Embodiments of the present disclosure may provide a vision-based, markerless motion capture system for rehabilitation training that avoids the limitations of conventional motion capture systems and has not been implemented in the prior art. Embodiments of the present disclosure may include a combination of video and voice guidance as part of non-contact rehabilitation training assessment and guidance. The disclosed embodiments may estimate 3D body pose based on deep learning techniques that utilize multi-view images from various views. The information in the multi-view images may assist the deep learning techniques in accurately inferring 3D human body poses.
In accordance with one or more embodiments, a method performed by at least one processor is provided. The method comprises the following steps: obtaining a plurality of videos of a person's body, the plurality of videos including a first video of the person at a first perspective captured by a first camera during a period of time and a second video of the person at a second perspective different from the first perspective captured by a second camera during the period of time; estimating a three-dimensional (3D) pose of the person based on the plurality of videos without relying on any markers on the person, the estimating comprising deriving a set of 3D body joints; obtaining an animation of motion of the set of 3D body joints corresponding to the motion of the person during the time period; performing an analysis of the motion of the set of 3D body joints; and indicating, by a display or a speaker, a rehabilitation assessment result or a rehabilitation training recommendation based on the analysis.
According to one embodiment, the performing the analysis comprises: calculating at least one rehabilitation assessment indicator based on the motion of the set of 3D body joints.
According to one embodiment, the performing the analysis further comprises: selecting, based on input from a user, the at least one rehabilitation assessment indicator to be calculated.
According to one embodiment, the method further comprises: displaying the animation of the motion of the set of 3D body joints.
According to one embodiment, the animation of the motion of the set of 3D body joints corresponding to the motion of the person during the period of time is displayed in real time.
According to one embodiment, the animation comprises an image of the person's body combined with the set of 3D body joints.
According to one embodiment, the obtained plurality of videos further comprises: a third video of the person captured by a third camera during the period of time at a third perspective different from the first perspective and the second perspective.
According to one embodiment, the first perspective is a left side view of the person, the second perspective is a front view of the person, and the third perspective is a right side view of the person.
According to one embodiment, the second video is captured by the second camera at a height higher than the heights at which the first video is captured by the first camera and the third video is captured by the third camera.
According to one embodiment, the height at which the first video is captured by the first camera and the height at which the third video is captured by the third camera are the same.
In accordance with one or more embodiments, there is provided a system comprising: a plurality of cameras configured to capture respective videos of a plurality of videos of a person's body. The plurality of cameras includes: a first camera configured to capture, during a period of time, a first video of the plurality of videos at a first perspective, and a second camera configured to capture, during the period of time, a second video of the plurality of videos at a second perspective different from the first perspective. The system also includes a display or speaker, at least one processor, and a memory storing computer code. The computer code includes: first code configured to cause the at least one processor to estimate a three-dimensional (3D) pose of the person by obtaining a set of 3D body joints based on the plurality of videos, without relying on any markers on the person; second code configured to cause the at least one processor to obtain an animation of motion of the set of 3D body joints corresponding to the motion of the person during the period of time; third code configured to cause the at least one processor to perform an analysis of the motion of the set of 3D body joints; and fourth code configured to cause the at least one processor to indicate, via the display or the speaker, a rehabilitation assessment result or a rehabilitation training recommendation based on the analysis.
According to an embodiment, the third code is configured to cause the at least one processor to perform the analysis by calculating at least one rehabilitation assessment indicator based on the motion of the set of 3D body joints.
According to an embodiment, the third code is further configured to cause the at least one processor to select the at least one rehabilitation assessment indicator to be calculated based on input from a user.
According to one embodiment, the system comprises the display, and the second code is further configured to cause the at least one processor to cause the display to display the animation of the motion of the set of 3D body joints.
According to one embodiment, the second code is configured to cause the at least one processor to cause the display to display, in real time, the animation corresponding to the motion of the person during the time period.
According to one embodiment, the animation comprises an image of the person's body combined with the set of 3D body joints.
According to one embodiment, the plurality of cameras further includes a third camera configured to capture, during the period of time, a third video of the person at a third perspective different from the first perspective and the second perspective.
According to an embodiment, the first perspective is a left side view of the person, the second perspective is a front view of the person, and the third perspective is a right side view of the person.
According to one embodiment, the height of the second camera is higher than the height of the first camera and the height of the third camera.
In accordance with one or more embodiments, a non-transitory computer-readable medium storing computer code is provided. The computer code, when executed by at least one processor, is configured to cause the at least one processor to: estimate a 3D pose of a person by obtaining a set of 3D body joints based on a plurality of videos of the person's body, without relying on any markers on the person; obtain an animation of motion of the set of 3D body joints corresponding to motion of the person over a period of time; perform an analysis of the motion of the set of 3D body joints; and indicate, by a display or a speaker, a rehabilitation assessment result or a rehabilitation training recommendation based on the analysis. The plurality of videos includes a first video of the person captured by a first camera at a first perspective during the period of time and a second video of the person captured by a second camera at a second perspective different from the first perspective during the period of time.
Drawings
Further features, properties and various advantages of the disclosed subject matter will become more apparent from the following detailed description and the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a rehabilitation training system according to one embodiment.
Fig. 2 is a block diagram of a process according to one embodiment of the present disclosure.
FIG. 3 is a schematic diagram of computer code, according to one embodiment of the present disclosure.
Fig. 4 is a perspective view of a camera configuration according to one embodiment of the present disclosure.
FIG. 5 is an example illustration of a patient posture represented by a 3D body joint according to one embodiment of the present disclosure.
FIG. 6 is a block diagram of a process according to one embodiment of the present disclosure.
FIG. 7A is an illustration of a portion of a display animation according to one embodiment of the disclosure.
FIG. 7B is an illustration of a portion of a display animation according to one embodiment of the disclosure.
FIG. 8 is a schematic diagram of a computer system according to one embodiment of the present disclosure.
Detailed Description
According to an embodiment, referring to fig. 1, a rehabilitation training system 100 is provided. The rehabilitation training system 100 may include, for example, a camera 110, a computing system 120, and a display 130. The camera 110 may include any number of cameras. For example, according to one embodiment, the cameras 110 may include two or three cameras. The camera 110 may be configured to acquire video data and transmit the video data to the computing system 120 via a wired or wireless connection. The computing system 120 may include at least one processor 122 and memory storing computer code. The computer code may be configured to, when executed by the at least one processor 122, cause the at least one processor 122 to perform processes of the computing system 120, such as those described below with respect to fig. 2. FIG. 3 shows an exemplary diagram of computer code. Computing system 120 may also include display 130, or be connected to display 130, and may be configured to cause display 130 to display results of the processing by computing system 120. Computing system 120 may be connected to display 130 via a wired or wireless connection.
Referring to fig. 2 through 3, the processing performed by computing system 120 is described below. Referring to FIG. 2, computing system 120 may perform the following: multi-view 3D human pose estimation (220), human motion visualization (230), human motion analysis (240), and providing assessment results and recommendations (250). Referring to fig. 3, such processing may be performed by the at least one processor 122 of the computing system 120 through pose estimation code 320, action visualization code 330, action analysis code 340, and evaluation code 350, respectively, included in the memory 124.
Computing system 120 may receive video data from the cameras 110 as input to the multi-view 3D body pose estimation (220). For example, each of the cameras 110 may provide a single-view video (e.g., single-view videos 210-1, 210-2, …, 210-N) to computing system 120, each single-view video including images of a patient from a respective perspective. In other words, each of the cameras 110 may capture postures and motions of the patient from a respective direction in a respective single-view video (e.g., single-view videos 210-1, 210-2, …, 210-N), which the computing system 120 then obtains from the cameras 110.
As an example, referring to fig. 4, the cameras 110 of the rehabilitation training system 100 may include a first camera 411, a second camera 412, and a third camera 413 in a configuration 400. In configuration 400, the first camera 411, the second camera 412, and the third camera 413 may be disposed at respective locations to capture respective perspectives of a patient starting from location (x0, y0, z0). The x-direction may be along an x-axis extending in a left-right direction with respect to fig. 4 (+x direction toward the right side of fig. 4), the y-direction may be along a y-axis extending into or out of fig. 4 (+y direction toward the inside of fig. 4), and the z-direction may be along a z-axis extending in an up-down direction with respect to fig. 4 (+z direction toward the top of fig. 4). The second camera 412 may be at the same or a similar x-position as the location (x0, y0, z0) where the patient starts, and may be located at a height h1 above the location (x0, y0, z0) (e.g., above ground level) in the +z direction. The first camera 411 may be in the -x direction at a distance d1 relative to the location (x0, y0, z0) and/or the second camera 412, and the third camera 413 may be in the +x direction at a distance d1 relative to the location (x0, y0, z0) and/or the second camera 412. The first camera 411 and the third camera 413 may be located at the same height h2 above the location (x0, y0, z0) (e.g., above ground level) in the +z direction. The first camera 411, the second camera 412, and the third camera 413 may all be at the same y-position (e.g., a +y position). Each of the first camera 411, the second camera 412, and the third camera 413 has an angle of view a1 toward the location (x0, y0, z0) with respect to at least one axis. For example, as shown in fig. 4, the angle of view a1 of the third camera 413 may be angled from the y-axis in at least the -x direction.
Additionally, the angle of view of the first camera 411 may be at an angle to the y-axis in at least the + x direction, and the angle of view of the second camera 412 may be at an angle to the y-axis in at least the-z direction. According to the configuration 400, the first camera 411 may be configured to capture a left perspective view of the patient's body, the second camera 412 may be configured to capture an upper/front perspective view of the patient's body, and the third camera 413 may be configured to capture a right perspective view of the patient's body.
Although fig. 4 illustrates configuration 400, other camera configurations having different numbers of cameras 110, camera positions, and/or camera perspectives may be implemented in embodiments of the present disclosure.
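The camera geometry of configuration 400 can be made concrete with a short sketch. The following is an illustration only, not part of the disclosure: the values of d1, h1, and h2 and the 2.0 m camera distance are hypothetical placeholders, and z is taken as the vertical axis, as in fig. 4.

```python
import numpy as np

def camera_positions(origin, d1, h1, h2):
    """Return the three camera positions of a configuration like 400.

    origin: (x0, y0, z0), the location where the patient starts.
    d1: lateral offset of the two side cameras; h1/h2: camera heights.
    All three cameras share the same +y position; the 2.0 m depth in
    front of the patient is an assumed placeholder value.
    """
    x0, y0, z0 = origin
    depth = y0 + 2.0  # assumed camera distance in the +y direction
    first = np.array([x0 - d1, depth, z0 + h2])   # left-view camera 411
    second = np.array([x0, depth, z0 + h1])       # front camera 412, mounted higher
    third = np.array([x0 + d1, depth, z0 + h2])   # right-view camera 413
    return first, second, third

first, second, third = camera_positions((0.0, 0.0, 0.0), d1=1.5, h1=2.0, h2=1.2)
```

With these placeholder values the second (front) camera sits above the two side cameras, which are at equal heights, matching the arrangement described above.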
As described above, the cameras 110 may be disposed at various locations and have various angles of view to capture various perspectives of the patient, and video data from the cameras 110 may be input to the computing system 120 to perform multi-view 3D body pose estimation (220). Multi-view 3D body pose estimation (220) may be the process by which computing system 120 uses video data from the cameras 110 to infer a pose of a patient and represent the pose as a set of 3D joint positions. An example of a patient posture represented by 3D body joints is shown in fig. 5. As shown in FIG. 5, posture 500 may be represented by various body joints including, for example, right foot joint 501, left foot joint 502, right knee joint 503, left knee joint 504, right hip joint 505, left hip joint 506, right hand joint 507, left hand joint 508, right elbow joint 509, left elbow joint 510, right shoulder joint 512, left shoulder joint 513, and head joint 514.
According to an embodiment, referring to fig. 6, multi-view 3D body pose estimation (220) may be performed by computing system 120 using process 600. Process 600 may be implemented by a Deep Neural Network (DNN) model from end to end.
Process 600 may be a two-stage approach in which the 2D coordinates of the body joints are first estimated in each individual camera view, and then the multi-view information is aggregated using triangulation and linear algebra to infer the 3D body pose.
For example, referring to FIG. 6, process 600 may include obtaining a respective single-view video (e.g., single-view videos 610-1, …, 610-N) from each of the cameras 110. Based on each single-view video 610-1, …, 610-N, a corresponding 2D backbone 620-1, …, 620-N may be obtained. Based on each 2D backbone 620-1, …, 620-N, a corresponding set of 2D joint heat maps 630-1, …, 630-N may be obtained. Each set of 2D joint heat maps 630-1, …, 630-N may be input to a corresponding soft-argmax function 640-1, …, 640-N to obtain a corresponding set of 2D joint keypoints 650-1, …, 650-N. Then, algebraic triangulation 660 may be performed using all of the sets of 2D joint keypoints 650-1, …, 650-N, and using joint confidences derived based on each 2D backbone 620-1, …, 620-N, to obtain a set of 3D body joint positions 670, which is the set of estimated 3D body joints.
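The two numerical steps named above can be sketched in outline. The code below is an illustrative reconstruction, not the disclosed implementation: it shows a soft-argmax over one joint heat map and a confidence-weighted algebraic triangulation via the direct linear transform (DLT); array shapes and the form of the confidence weighting are assumptions.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Differentiable keypoint estimate from one 2D joint heat map (H x W)."""
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())  # softmax over all pixels
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return np.array([(p * xs).sum(), (p * ys).sum()])  # expected (x, y)

def triangulate(proj_mats, points_2d, confidences):
    """Confidence-weighted algebraic (DLT) triangulation of one joint.

    proj_mats: list of 3x4 camera projection matrices, one per view.
    points_2d: list of (x, y) keypoints, one per view.
    confidences: per-view joint confidence weights.
    """
    rows = []
    for P, (x, y), c in zip(proj_mats, points_2d, confidences):
        # Each view contributes two weighted linear constraints on X.
        rows.append(c * (x * P[2] - P[0]))
        rows.append(c * (y * P[2] - P[1]))
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # null vector of A, homogeneous 3D point
    return X[:3] / X[3]
```

In the disclosed process, one such triangulation would be run per body joint, with the confidences taken from the 2D backbones, to produce the set of 3D body joint positions 670.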
Referring to fig. 7A-7B, the computing system 120 may be configured to perform a human motion visualization (230) process in which estimated 3D human motion of a patient is represented based on a set of estimated 3D body joints (e.g., a set of 3D body joint locations 670). The human motion visualization (230) process may include removing noise caused by failed pose estimation, and generating real-time animation.
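One simple way to perform the noise removal mentioned above is sketched below. This is an assumption, since the disclosure does not specify the filtering method: it presumes per-frame joint confidences are available from the pose estimator, holds the last reliable estimate through low-confidence (failed) frames, then smooths the track with a short moving average.

```python
import numpy as np

def smooth_joint_track(positions, confidences, min_conf=0.3, window=5):
    """Filter one joint's 3D trajectory before animating it.

    positions: (T, 3) array of per-frame 3D joint positions.
    confidences: (T,) per-frame confidences; frames below min_conf are
    treated as failed pose estimates and replaced by the previous good
    frame. A moving average then suppresses the remaining jitter
    (frames near the clip edges are attenuated by the zero padding).
    """
    pos = positions.copy()
    for t in range(1, len(pos)):
        if confidences[t] < min_conf:
            pos[t] = pos[t - 1]  # hold the last reliable estimate
    kernel = np.ones(window) / window
    return np.stack(
        [np.convolve(pos[:, i], kernel, mode="same") for i in range(3)], axis=1
    )
```

For real-time use, the same idea applies causally: a frame would only be averaged with past frames, at the cost of a small lag.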
For example, as shown in fig. 7A, the computing system 120 may be configured to combine a video image of a patient with the set of estimated 3D body joints of the patient (e.g., the set of 3D body joint locations 670) and display the combination as an animation 710. According to one embodiment, animation 710 may simultaneously include multiple perspective video images of the patient combined with the set of estimated 3D body joints. As an example, the animation 710 may show a right perspective video 712 of the patient and a front perspective video 714 of the patient. However, the number of videos and the types of views may vary in the animation 710.
Further, as shown in fig. 7B, the computing system 120 may be configured to generate an animation 720 similar to the animation 710, except that the set of estimated 3D body joints, shown simultaneously in multiple perspectives, is displayed without the video images of the patient.
According to an embodiment, animation 710 and animation 720 may be displayed simultaneously. According to one embodiment, animations 710 and 720 may be real-time animations. According to an embodiment, the multiple perspective video images of the patient combined with the set of estimated 3D body joints may be obtained from two or more of the single-view videos 210-1, …, 210-N (see fig. 2). According to an embodiment, computing system 120 may cause animation 710 and/or animation 720 to be displayed on display 130 (see fig. 1).
By displaying animations according to embodiments of the present disclosure, patients can better monitor their movements and postures, which can help them understand how they behave in rehabilitation training.
The computing system 120 may also be configured to perform a human motion analysis (240) process, in which the user may set different evaluation indicators depending on the type of rehabilitation training. The computing system 120 may then calculate the indicators based on the estimated 3D body motion resulting from the multi-view 3D body pose estimation (220) process and the human motion visualization (230) process. The estimated 3D body motion may refer to the animated motion of the set of estimated 3D body joints (e.g., the set of 3D body joint positions 670) (see figs. 6-7B). An example of a type of rehabilitation training is rehabilitation training of walking motion. The indicators for walking-motion rehabilitation training may include the patient's walking speed, the height to which the patient lifts his or her legs, the stability of the walking, and the amplitude and frequency of the patient's arm swing. According to an embodiment, the computing system 120 may automatically determine the indicators to be calculated based on a user selecting a type of rehabilitation training using an input device (e.g., a mouse, keyboard, touch screen, microphone, etc.) connected to the computing system 120. According to an embodiment, the user may manually select the indicators to be calculated using the input device, and the computing system 120 may be configured to perform the calculation based on the selection.
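As an illustration of how two of the indicators named above might be computed from the estimated joint tracks. The disclosure gives no formulas, so the axis conventions (z vertical, y forward), the frame rate, and the joint choices below are assumptions.

```python
import numpy as np

def walking_speed(hip_positions, fps):
    """Average horizontal speed (m/s) of a hip joint over the clip.

    hip_positions: (T, 3) per-frame 3D positions of, e.g., the mid-hip.
    Assumes z is the vertical axis, so only x and y contribute.
    """
    horizontal = hip_positions[:, :2]
    step = np.linalg.norm(np.diff(horizontal, axis=0), axis=1)
    return step.sum() * fps / (len(hip_positions) - 1)

def arm_swing_amplitude(hand_positions, shoulder_positions):
    """Peak-to-peak forward swing of the hand relative to the shoulder.

    Assumes y is the forward axis; measuring the hand relative to the
    shoulder removes the patient's overall forward motion.
    """
    rel = hand_positions[:, 1] - shoulder_positions[:, 1]
    return rel.max() - rel.min()
```

Stability and swing frequency could be derived similarly, e.g., from the variance of the lateral hip track and from zero crossings of the relative hand trajectory.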
After the human motion analysis (240) process, the computing system 120 may be configured to perform the assessment results and recommendations (250) process. That is, for example, the computing system 120 may determine an assessment result based on the results of the human motion analysis (240) process, and may provide (e.g., display on the display 130 or output by a speaker) a training recommendation (with or without the assessment result) to the patient based on the assessment result. As an example, when the assessment result is that the patient's walking motion is too slow because the arm swing amplitude is too low, the computing system 120 may provide a training recommendation that the patient should strengthen his or her arm swing. According to an embodiment, the assessment results and recommendations (250) process performed by the computing system 120 may include calculating and providing (e.g., displaying on the display 130 or outputting by a speaker) a final assessment score to the patient based on the results of the human motion analysis (240) process.
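A minimal sketch of this assessment-to-recommendation mapping is shown below; the threshold values and metric names are hypothetical placeholders, not values from the disclosure.

```python
def assess_walking(metrics, speed_target=1.0, swing_target=0.4):
    """Map computed gait metrics to assessment results and recommendations.

    metrics: dict with 'speed' (m/s) and 'arm_swing' (m); the keys and
    thresholds here are illustrative assumptions only.
    """
    results, advice = [], []
    if metrics["speed"] < speed_target:
        results.append("walking speed below target")
        if metrics["arm_swing"] < swing_target:
            # Mirrors the example above: slow walking attributed to low swing.
            advice.append("strengthen arm swing to improve walking speed")
    if not results:
        results.append("walking motion within target range")
    return results, advice
```

The returned strings could then be shown on the display 130 or spoken through a speaker, as described above.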
The processes of the present disclosure described above may be implemented as computer software using computer readable instructions, and the computer software may be physically stored in one or more computer readable media. For example, fig. 8 illustrates a computer system 900 suitable for implementing the computing system 120 of the disclosed subject matter.
Computer software may be encoded using any suitable machine code or computer language that may be subject to assembly, compilation, linking, or similar mechanisms to create code comprising instructions that may be executed directly by computer Central Processing Units (CPUs), graphics Processing Units (GPUs), etc., or by interpretation, microcode execution, etc.
The instructions may be executed on various types of computers or components thereof, including, for example, personal computers, tablet computers, servers, smart phones, gaming devices, internet of things devices, and so forth.
The components of computer system 900 shown in FIG. 8 are exemplary in nature and are not intended to suggest any limitation as to the scope of use or functionality of the computer software implementing embodiments of the present disclosure. Neither should the configuration of components be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary embodiments of computer system 900.
Computer system 900 may include certain human interface input devices. Such human interface input devices may be responsive to input by one or more human users through, for example, tactile input (e.g., keystrokes, swipes, data glove movements), audio input (e.g., voice, clapping), visual input (e.g., gestures), or olfactory input (not depicted). The human interface devices may also be used to capture certain media not necessarily directly related to conscious human input, such as audio (e.g., speech, music, ambient sounds), images (e.g., scanned images, photographic images obtained from still-image cameras), and video (e.g., two-dimensional video, three-dimensional video including stereoscopic video).
Input human interface devices may include one or more of a keyboard 901, a mouse 902, a touch pad 903, a touch screen 910, a joystick 905, a microphone 906, a scanner 907, and a video camera 908 (only one shown for each type).
Computer system 900 may also include certain human interface output devices. Such human interface output devices may stimulate the senses of one or more human users through, for example, tactile output, sound, light, and smell/taste. Such human interface output devices may include tactile output devices (e.g., tactile feedback from the touch screen 910, data glove, or joystick 905, although tactile feedback devices that do not serve as input devices may also exist), audio output devices (e.g., speakers 909, headphones (not depicted)), visual output devices (e.g., screens 910, including Cathode Ray Tube (CRT) screens, Liquid Crystal Display (LCD) screens, plasma screens, and Organic Light Emitting Diode (OLED) screens, each with or without touch-screen input capability, each with or without haptic feedback capability, some of which may be capable of outputting two-dimensional visual output or more-than-three-dimensional output through means such as stereoscopic image output; virtual reality glasses (not depicted), holographic displays and smoke canisters (not depicted)), and printers (not depicted).
Computer system 900 may also include human-accessible storage devices and their associated media, such as optical media including a Compact Disc (CD)/Digital Video Disc (DVD) read-only memory (ROM)/read-write (RW) drive 920 with CD/DVD or similar media 921, a thumb drive 922, a removable hard drive or solid-state drive 923, legacy magnetic media such as magnetic tape and floppy disk (not depicted), special purpose ROM/Application Specific Integrated Circuit (ASIC)/Programmable Logic Device (PLD) based devices such as security dongles (not depicted), and the like.
Those skilled in the art will also appreciate that the term "computer-readable medium" used in connection with the presently disclosed subject matter does not include transmission media, carrier waves, or other transitory signals.
Computer system 900 may also include an interface to one or more communication networks. The networks may be, for example, wireless, wired, or optical. The networks may further be local, wide-area, metropolitan, vehicular, industrial, real-time, delay-tolerant, and so on. Examples of networks include local area networks such as Ethernet and wireless Local Area Networks (LANs); cellular networks including Global System for Mobile communications (GSM), third generation mobile communications (3G), fourth generation mobile communications (4G), fifth generation mobile communications (5G), Long Term Evolution (LTE), and the like; TV wired or wireless wide-area digital networks including cable Television (TV), satellite TV, and terrestrial broadcast TV; and vehicular and industrial networks including Controller Area Network bus technology (CANBus), and so forth. Certain networks typically require external network interface adapters attached to certain general-purpose data ports or peripheral buses 949 (e.g., Universal Serial Bus (USB) ports of computer system 900); other networks are typically integrated into the core of computer system 900 by attachment to a system bus as described below (e.g., an Ethernet interface into a PC computer system or a cellular network interface into a smartphone computer system). Using any of these networks, computer system 900 may communicate with other entities. Such communications may be unidirectional receive-only (e.g., broadcast TV), unidirectional send-only (e.g., CANBus to certain CANBus devices), or bidirectional (e.g., to other computer systems using local or wide-area digital networks). Such communications may include communications to a cloud computing environment 955. As described above, certain protocols and protocol stacks may be used on each of those networks and network interfaces.
The human interface devices, human-accessible storage devices, and network interfaces 954 described above may be attached to the core 940 of computer system 900.
The core 940 may include one or more Central Processing Units (CPUs) 941, Graphics Processing Units (GPUs) 942, special-purpose programmable processing units in the form of Field Programmable Gate Arrays (FPGAs) 943, hardware accelerators 944 for certain tasks, and so forth. These devices, along with Read-Only Memory (ROM) 945, Random Access Memory (RAM) 946, and internal mass storage 947 such as internal non-user-accessible hard drives and Solid-State Drives (SSDs), may be connected via a system bus 948. In some computer systems, the system bus 948 may be accessible in the form of one or more physical plugs to enable extension by additional CPUs, GPUs, and the like. Peripheral devices may be attached either directly to the core's system bus 948 or through a peripheral bus 949. Architectures for the peripheral bus include PCI, USB, and the like. A graphics adapter 950 may be included in core 940.
The CPUs 941, GPUs 942, FPGAs 943, and accelerators 944 can execute instructions that, in combination, can constitute the aforementioned computer code. The computer code may be stored in ROM 945 or RAM 946. Temporary data may also be stored in the RAM 946, while permanent data may be stored in, for example, an internal mass storage 947. Fast storage and retrieval of memory devices may be achieved through the use of cache memory, which may be intimately associated with one or more CPUs 941, GPUs 942, mass storage 947, ROM 945, RAM 946, and the like.
Computer readable media may have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present disclosure, or they may be of the kind well known and available to those having skill in the computer software arts.
By way of example, and not limitation, the computer system having architecture 900, and in particular the core 940, may provide functionality as a result of processors (including CPUs, GPUs, FPGAs, accelerators, etc.) executing software implemented in one or more tangible computer-readable media. Such computer-readable media may be media associated with user-accessible mass storage as described above, as well as certain memories of the core 940 that are non-transitory in nature, such as core internal mass storage 947 or ROM 945. Software implementing various embodiments of the present disclosure may be stored in such devices and executed by core 940. The computer readable medium may include one or more memory devices or chips, according to particular needs. The software may cause the core 940, and in particular the processors therein (including CPUs, GPUs, FPGAs, etc.), to perform certain processes or certain portions of certain processes described herein, including defining data structures stored in RAM 946 and modifying such data structures according to processes defined by the software. Additionally or alternatively, the computer system may provide functionality as a result of logic hardwired or otherwise embodied in circuitry (e.g., accelerator 944) that may operate in place of or in conjunction with software to perform certain processes or certain portions of certain processes described herein. References to software may include logic and vice versa, as appropriate. References to a computer-readable medium may include, where appropriate, circuitry (e.g., an Integrated Circuit (IC)) that stores software for execution, or circuitry that embodies logic for execution, or both. The present disclosure includes any suitable combination of hardware and software.
While this disclosure has described several non-limiting exemplary embodiments, there are alterations, permutations, and various equivalents, which fall within the scope of this disclosure. It will thus be appreciated that those skilled in the art will be able to devise numerous systems and methods which, although not explicitly shown or described herein, embody the principles of the disclosure and are thus within its spirit and scope.

Claims (20)

1. A method performed by at least one processor, the method comprising:
obtaining a plurality of videos of a person's body, the plurality of videos including a first video of the person captured by a first camera during a period of time from a first perspective and a second video of the person captured by a second camera during the period of time from a second perspective different from the first perspective;
estimating a three-dimensional (3D) pose of the person based on the plurality of videos without relying on any markers on the person, the estimating comprising acquiring a set of 3D body joints;
obtaining an animation of motion of the set of 3D body joints corresponding to the motion of the person during the period of time;
analyzing the motion of the set of 3D body joints; and
indicating, by a display or a speaker, a rehabilitation assessment result or a rehabilitation training recommendation based on the analysis.
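Purely as an illustrative sketch of the marker-free estimation step of claim 1 (not part of the claims), a multi-view system of this kind commonly lifts per-view 2D joint detections to 3D by linear (DLT) triangulation. Everything below is hypothetical: the function names, the toy projection matrices, and the assumption of calibrated cameras with matched 2D keypoints.

```python
import numpy as np

def triangulate_joint(proj_mats, points_2d):
    """Linear (DLT) triangulation of one body joint from multiple views.

    proj_mats : list of 3x4 camera projection matrices
    points_2d : list of (u, v) pixel observations, one per camera
    Returns the joint's 3D position as a length-3 array.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each observation contributes two linear constraints on X.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point through a 3x4 projection matrix to pixel coords."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: an identity view and one translated along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

joint = triangulate_joint([P1, P2], [project(P1, X_true), project(P2, X_true)])
print(joint)  # recovers X_true
```

In a full pipeline, this triangulation would run once per joint per frame, producing the time series of 3D body joints that the later analyzing step consumes.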
2. The method of claim 1, wherein
performing the analysis comprises: calculating at least one rehabilitation assessment indicator based on the motion of the set of 3D body joints.
3. The method of claim 2, wherein
performing the analysis further comprises: selecting, based on input from a user, the at least one rehabilitation assessment indicator to be calculated.
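As a hedged illustration of what a rehabilitation assessment indicator could be (the claims leave the indicator unspecified), one common choice is joint range of motion computed from the estimated 3D joints. The joint names and toy coordinates below are invented for the example:

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by segments b->a and b->c."""
    u, v = a - b, c - b
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def range_of_motion(frames):
    """frames: sequence of (shoulder, elbow, wrist) 3D positions per frame.

    Returns the span (max - min) of the elbow angle over the exercise.
    """
    angles = [joint_angle(s, e, w) for s, e, w in frames]
    return max(angles) - min(angles)

# Toy motion: the arm bends from fully straight (180 deg) to 90 deg.
shoulder = np.array([0.0, 0.0, 0.0])
elbow = np.array([0.0, -0.3, 0.0])
frames = [
    (shoulder, elbow, np.array([0.0, -0.6, 0.0])),  # straight arm
    (shoulder, elbow, np.array([0.3, -0.3, 0.0])),  # bent 90 degrees
]
print(round(range_of_motion(frames)))  # 90
```

A selectable set of such indicators (range of motion, movement smoothness, repetition count, etc.) would map naturally onto the user-selected indicator of claim 3.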
4. The method of claim 1, further comprising:
displaying the animation of the motion of the set of 3D body joints.
5. The method of claim 4, wherein
the displaying comprises displaying, in real time, the animation of the motion of the set of 3D body joints corresponding to the motion of the person over the period of time.
6. The method of claim 5, wherein
the animation includes an image of the body of the person in combination with the set of 3D body joints.
7. The method of claim 1, wherein
the obtained plurality of videos further comprises a third video of the person captured by a third camera during the period of time from a third perspective different from the first perspective and the second perspective.
8. The method of claim 7, wherein
the first perspective is a left side view of the person, the second perspective is a front view of the person, and the third perspective is a right side view of the person.
9. The method of claim 8, wherein
the second video is captured by the second camera at a height greater than the heights at which the first video is captured by the first camera and the third video is captured by the third camera.
10. The method of claim 9, wherein
the height at which the first video is captured by the first camera is the same as the height at which the third video is captured by the third camera.
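To illustrate the camera geometry of claims 8 to 10 (left, front, and right views, with the front camera mounted higher than the two equal-height side cameras), the sketch below builds world-to-camera extrinsics with a standard look-at construction and checks that the patient center lies on each camera's optical axis. All positions and names are hypothetical:

```python
import numpy as np

def look_at(cam_pos, target, up=np.array([0.0, 1.0, 0.0])):
    """World-to-camera rotation R and translation t for a camera at cam_pos
    whose z (optical) axis points toward target."""
    z = target - cam_pos
    z = z / np.linalg.norm(z)          # forward axis
    x = np.cross(up, z)
    x = x / np.linalg.norm(x)          # right axis
    y = np.cross(z, x)                 # down/up axis completing the frame
    R = np.stack([x, y, z])
    t = -R @ cam_pos
    return R, t

# Hypothetical rig: side cameras at equal height, front camera higher,
# all aimed at the patient's torso center.
patient = np.array([0.0, 1.0, 0.0])
left = np.array([-2.0, 1.0, 0.0])    # left side view
front = np.array([0.0, 1.8, -2.0])   # front view, mounted higher
right = np.array([2.0, 1.0, 0.0])    # right side view

for pos in (left, front, right):
    R, t = look_at(pos, patient)
    cam_coords = R @ patient + t
    # x and y are ~0: the patient center sits on the camera's z axis.
    print(np.round(cam_coords, 3))
```

These extrinsics, combined with camera intrinsics, would yield the projection matrices that a triangulation-based 3D pose estimator consumes.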
11. A system, characterized in that the system comprises:
a plurality of cameras configured to obtain respective videos from a plurality of videos of a person's body, the plurality of cameras comprising:
a first camera configured to acquire a first video of the person at a first perspective during a period of time from the plurality of videos, an
A second camera configured to capture, from the plurality of videos, a second video of the person at a second perspective different from the first perspective during the period of time;
a display or speaker;
at least one processor; and
a memory including computer code, the computer code comprising:
a first code configured to cause the at least one processor to estimate a three-dimensional (3D) pose of the person by acquiring a set of 3D body joints based on the plurality of videos, without relying on any markers on the person;
second code configured to cause the at least one processor to acquire an animation of motion of the set of 3D body joints corresponding to motion of the person during the period of time;
third code configured to cause the at least one processor to analyze the motion of the set of 3D body joints; and
fourth code configured to cause the at least one processor to indicate, via the display or the speaker, a rehabilitation assessment result or a rehabilitation training recommendation based on the analysis.
12. The system of claim 11, wherein
the third code is configured to cause the at least one processor to perform the analysis by calculating at least one rehabilitation assessment indicator based on the motion of the set of 3D body joints.
13. The system of claim 12, wherein
the third code is further configured to cause the at least one processor to select, based on input from a user, the at least one rehabilitation assessment indicator to be calculated.
14. The system of claim 11, wherein
the system comprises the display, and
the second code is further configured to cause the at least one processor to cause the display to display the animation of the motion of the set of 3D body joints.
15. The system of claim 14, wherein
the second code is configured to cause the at least one processor to cause the display to display, in real time, the animation corresponding to the motion of the person during the period of time.
16. The system of claim 15, wherein
the animation includes an image of the body of the person in combination with the set of 3D body joints.
17. The system of claim 11, wherein
the plurality of cameras further includes a third camera configured to acquire a third video of the person at a third perspective different from the first perspective and the second perspective during the period of time.
18. The system of claim 17, wherein
the first perspective is a left side view of the person, the second perspective is a front view of the person, and the third perspective is a right side view of the person.
19. The system of claim 18, wherein
the height of the second camera is greater than the height of the first camera and the height of the third camera.
20. A non-transitory computer-readable medium storing computer code that, when executed by at least one processor, causes the at least one processor to:
estimating a three-dimensional (3D) pose of a person by acquiring a set of 3D body joints based on a plurality of videos of the person's body, without relying on any markers on the person;
obtaining an animation of motion of the set of 3D body joints corresponding to motion of the person over a period of time;
analyzing the motion of the set of 3D body joints; and
indicating, by a display or a speaker, a rehabilitation assessment result or a rehabilitation training recommendation based on the analysis;
wherein the plurality of videos includes a first video of the person captured by a first camera at a first perspective during the period of time and a second video of the person captured by a second camera at a second perspective different from the first perspective during the period of time.
HK62023069131.8A 2020-11-12 2021-06-25 Vision-based rehabilitation training system based on 3d human pose estimation using multi-view images HK40079294A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US17/096,256 2020-11-12

Publications (1)

Publication Number Publication Date
HK40079294A true HK40079294A (en) 2023-04-14
