
CN116551672A - Systems and methods for robotic systems with object handling capabilities - Google Patents


Info

Publication number: CN116551672A
Application number: CN202310325209.4A
Authority: CN (China)
Prior art keywords: pickable, objects, image information, image, mask
Other languages: Chinese (zh)
Inventor: F·盖瑟
Current and original assignee: Mu Jinkeji (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Priority claimed from application CN202310238393.9A (publication CN116728400A)


Classifications

    • B25J9/16 Programme controls (B: Performing operations; transporting — B25: Hand tools; portable power-driven tools; manipulators — B25J: Manipulators; chambers provided with manipulation devices — B25J9/00: Programme-controlled manipulators)
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators, characterised by motion, path, trajectory planning (under B25J9/1656)
    • B25J9/1697 Vision controlled systems (under B25J9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors; perception control, multi-sensor controlled systems, sensor fusion)

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)

Abstract

Systems and methods for a robotic system with object handling capabilities are disclosed. A computing system configured for object transfer is provided. The computing system includes at least one processing circuit configured to identify a pickable region of an object based on image information of the object. The pickable region may be determined from a surface cost map that indicates the smoothness of each region of the image information, the surface cost map being derived from height differences and normal differences. The identified pickable regions can then be used in motion planning operations for transferring the object.

Description

Systems and methods for robotic systems with object handling capabilities

This application is a divisional application of invention patent application No. 202310238393.9, filed March 8, 2023, and entitled "Systems and Methods for a Robotic System with Object Handling Capabilities."

Cross-Reference to Related Applications

This application claims the benefit of U.S. Provisional Application No. 63/317,877, filed March 8, 2022, and entitled "ROBOTIC SYSTEM WITH OBJECT DETECTION," which is incorporated herein by reference in its entirety.

Technical Field

The present technology is directed generally to robotic systems and, more specifically, to systems, processes, and techniques for detecting and handling objects. More particularly, the present technology may be used to identify pickable regions of objects in a container.

Background

With the ever-improving performance and decreasing cost of robots, many robots (e.g., machines configured to automatically or autonomously perform physical actions) are now widely used in a variety of fields. For example, robots may be used to perform various tasks (e.g., manipulating or transferring objects through a space) in manufacturing and/or assembly, packing and/or packaging, transport and/or shipping, and the like. In performing such tasks, robots can replicate human actions, thereby replacing or reducing the human involvement that would otherwise be required to perform dangerous or repetitive tasks.

However, despite technological advances, robots often lack the sophistication required to replicate the human interactions needed to perform larger and/or more complex tasks. Accordingly, there remains a need for improved techniques and systems for managing operations of and/or interactions between robots.

Summary of the Invention

In an embodiment, a computing system is provided. The computing system includes: a control system configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera; and at least one processing circuit. When the robot is in an object handling environment that includes a source of objects for transfer to a destination within the object handling environment, the processing circuit is configured to: obtain image information of the objects; identify pickable regions of one or more selected objects among the objects by generating a surface cost map from the image information, segmenting the surface cost map to obtain one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and generating a pickable region detection result that includes at least the one or more pickable regions; and generate a motion plan for the robotic system for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.

In an embodiment, a method of object transfer is provided, performed by a control system having at least one processing circuit and configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera. The method includes: obtaining image information of one or more objects contained in an object source; identifying pickable regions of one or more selected objects among the objects by generating a surface cost map from the image information, segmenting the surface cost map to obtain one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and generating a pickable region detection result that includes at least the one or more pickable regions; and generating a motion plan for the robotic system for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.

In an embodiment, a non-transitory computer-readable medium is provided, configured with executable instructions for object transfer performed by a control system having at least one processing circuit and configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera. The instructions may be configured to: obtain image information of one or more objects contained in an object source; identify pickable regions of one or more selected objects among the objects by generating a surface cost map from the image information, segmenting the surface cost map to obtain one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and generating a pickable region detection result that includes at least the one or more pickable regions; and generate a motion plan for the robotic system for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.
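The claimed pipeline — a surface cost map built from height differences and normal differences, segmentation into smooth low-cost patches, and a pickable-region detection result — can be sketched as follows. This is a minimal illustration only, assuming simple per-pixel costs and 4-connected segmentation; the function names, weights, and thresholds are hypothetical and are not taken from the patent.

```python
import numpy as np

def surface_cost_map(height, normals, w_height=0.5, w_normal=0.5):
    """Per-pixel cost: large height gradients or normal disagreement -> high cost."""
    gy, gx = np.gradient(height)
    height_cost = np.hypot(gx, gy)                 # height-difference (gradient) cost
    # normal-difference cost: disagreement with the right-hand neighbour's normal
    normal_cost = np.zeros_like(height)
    normal_cost[:, :-1] = 1.0 - np.sum(normals[:, :-1] * normals[:, 1:], axis=2)
    return w_height * height_cost + w_normal * normal_cost

def segment_low_cost(cost, threshold=0.1, min_area=25):
    """4-connected components of the low-cost mask, kept only if large enough."""
    mask = cost < threshold
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0, c0] and not seen[r0, c0]:
                seen[r0, c0] = True
                stack, comp = [(r0, c0)], [(r0, c0)]
                while stack:
                    r, c = stack.pop()
                    for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                        if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc] and not seen[rr, cc]:
                            seen[rr, cc] = True
                            stack.append((rr, cc))
                            comp.append((rr, cc))
                if len(comp) >= min_area:
                    regions.append(comp)
    return regions

# Toy scene: a flat patch (left half) next to a rough, noisy patch (right half).
rng = np.random.default_rng(0)
height = np.zeros((10, 20))
height[:, 10:] = rng.uniform(0.0, 5.0, size=(10, 10))
normals = np.zeros((10, 20, 3))
normals[..., 2] = 1.0                              # all surfaces face the camera
cost = surface_cost_map(height, normals)
pickable = segment_low_cost(cost)                  # detection result: list of regions
```

In this toy example the flat half of the scene yields one large low-cost region, while the noisy half is rejected as unsuitable for picking.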

Brief Description of the Drawings

FIG. 1A illustrates a system for performing or facilitating the detection, identification, and retrieval of objects according to embodiments herein.

FIG. 1B illustrates an embodiment of a system for performing or facilitating the detection, identification, and retrieval of objects according to embodiments herein.

FIG. 1C illustrates another embodiment of a system for performing or facilitating the detection, identification, and retrieval of objects according to embodiments herein.

FIG. 1D illustrates yet another embodiment of a system for performing or facilitating the detection, identification, and retrieval of objects according to embodiments herein.

FIG. 2A is a block diagram illustrating a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

FIG. 2B is a block diagram illustrating an embodiment of a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

FIG. 2C is a block diagram illustrating another embodiment of a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

FIG. 2D is a block diagram illustrating yet another embodiment of a computing system configured to perform or facilitate the detection, identification, and retrieval of objects, consistent with embodiments herein.

FIG. 2E is an example of image information processed by the system, consistent with embodiments herein.

FIG. 2F is another example of image information processed by the system, consistent with embodiments herein.

FIG. 3A illustrates an exemplary object handling environment for operating a robotic system according to embodiments herein.

FIG. 3B illustrates an exemplary object handling environment for operating a robotic system according to embodiments herein.

FIG. 3C illustrates an exemplary object handling environment for operating a robotic system according to embodiments herein.

FIG. 4 is a flowchart illustrating an example process for handling detected objects.

FIG. 5A illustrates an example of 2D image information of a scene, consistent with embodiments herein.

FIG. 5B illustrates an example of 3D image information of a scene, consistent with embodiments herein.

FIG. 6A provides an example flowchart for a surface cost map generation method consistent with embodiments herein.

FIGS. 6B-6E provide examples of aspects of the surface cost map generation method consistent with embodiments herein.

FIG. 6F provides an example of a height gradient cost map consistent with embodiments herein.

FIG. 6G provides an example of a normal differences cost map consistent with embodiments herein.

FIG. 6H provides an example of a surface cost map consistent with embodiments herein.

FIG. 6I provides an example of an object consistent with embodiments herein.

FIG. 7A provides an example of a segmentation method consistent with embodiments herein.

FIGS. 7B-7E provide examples of aspects of the segmentation method consistent with embodiments herein.

FIGS. 8A and 8B provide examples of aspects of detection mask information generation consistent with embodiments herein.

FIGS. 9A and 9B provide examples of aspects of safety volume generation consistent with embodiments herein.

Detailed Description

Systems and methods related to object detection, identification, and retrieval are described herein. In particular, the disclosed systems and methods may facilitate object detection, identification of pickable regions, and object retrieval in cases where objects are located in a container. As discussed herein, objects may include boxes, bags, packages, and the like. Object handling in such situations can be challenging due to the irregular arrangement of the objects and the difficulty of identifying object regions or portions suitable for picking, for example with a suction gripping device. Accordingly, the systems and methods described herein are designed to identify pickable regions of objects within a set of objects, where individual objects may be arranged at different positions, at different angles, and so on. The systems and methods discussed herein may include robotic systems. Robotic systems configured in accordance with embodiments herein can autonomously perform integrated tasks by coordinating the operation of multiple robots.
As described herein, a robotic system may include any suitable combination of robotic devices, actuators, sensors, cameras, and computing systems configured to control, issue commands to, and receive information from robotic devices and sensors; to access, analyze, and process data generated by robotic devices, sensors, and cameras; to generate data or information usable to control the robotic system; and to plan actions for robotic devices, sensors, and cameras. As used herein, a robotic system does not require direct access to or control of robotic actuators, sensors, or other devices. As described herein, a robotic system may be a computing system configured to improve the performance of such robotic actuators, sensors, and other devices by receiving, analyzing, and processing information.

The techniques described herein provide technical improvements to robotic systems configured for object identification, pickable region identification, and object transfer. These technical improvements can increase the speed, precision, and accuracy of these tasks and further facilitate detection, pickable region identification, and the transfer of objects from a source container or repository to a destination. The robotic systems and computing systems described herein address the technical problem of identifying objects, detecting pickable regions, and retrieving objects from a container in which the objects may be irregularly arranged. By addressing this technical problem, the technology of object identification, pickable region detection, and object retrieval is improved.

The present application relates to systems and robotic systems. As discussed herein, a robotic system may include robotic actuator components (e.g., robotic arms, robotic grippers, etc.), various sensors (e.g., cameras, etc.), and various computing or control systems. As discussed herein, a computing system or control system may be described as "controlling" various robotic components, such as robotic arms, robotic grippers, cameras, and the like. Such "control" may refer to direct control of, and interaction with, the various actuators, sensors, and other functional aspects of the robotic components. For example, a computing system may control a robotic arm by issuing or providing all of the signals needed to cause the various motors, actuators, and sensors to produce robotic movement. Such "control" may also refer to issuing abstract or indirect commands to another robot control system, which then converts those commands into the signals necessary to cause robotic movement. For example, a computing system may control a robotic arm by issuing a command describing a trajectory or a destination to which the robotic arm should move, and another robot control system associated with the robotic arm may receive and interpret the command and then provide the necessary direct signals to the various actuators and sensors of the robotic arm to cause the required movement.

In particular, the techniques described herein help a robotic system interact with a target object among multiple objects in a container. The methods and systems described herein can identify pickable regions of selected objects from among a set of objects. As described herein, a robotic transfer mechanism (e.g., a robotic arm) may include a suction cup or suction gripper as part of an end effector apparatus for gripping, picking up, or grasping objects. Such suction-based gripping devices perform better when applied to a smooth surface of an object, i.e., a portion of the object with a surface profile smooth enough that the suction cup can engage the object's surface and form a seal between that surface and the cup so as to lift and transfer the object. A surface that is smooth enough to engage properly with a suction gripping device, and large enough to accommodate one or more of the suction gripping devices of the robotic transfer system, may be referred to as a "pickable region." When objects are loosely organized in a source repository or container, the systems and methods described herein may be employed to identify pickable regions of the objects.
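As a concrete illustration of this definition, a minimal pickability check might compare a candidate patch's peak-to-peak height deviation against a seal tolerance and its extent against the suction-cup footprint. The function name, tolerance, and cup size below are illustrative assumptions, not values prescribed by the patent.

```python
import numpy as np

def is_pickable(patch_heights_mm, cup_diameter_mm, pixel_size_mm,
                seal_tolerance_mm=1.0):
    """A patch qualifies if it is smooth enough to seal and large enough for the cup."""
    rows, cols = patch_heights_mm.shape
    # smoothness: peak-to-peak height deviation within the patch
    smooth_enough = np.ptp(patch_heights_mm) <= seal_tolerance_mm
    # size: the patch must cover the cup footprint in both directions
    large_enough = min(rows, cols) * pixel_size_mm >= cup_diameter_mm
    return bool(smooth_enough and large_enough)

flat = np.full((30, 30), 100.0)              # 30 mm x 30 mm flat patch at 1 mm/pixel
wrinkled = flat + np.linspace(0.0, 5.0, 30)  # 5 mm of height variation across the patch
```

For a hypothetical 20 mm cup, the flat patch qualifies, the wrinkled one fails the smoothness test, and a patch smaller than the cup footprint is rejected on size.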

In the following, specific details are set forth to provide an understanding of the presently disclosed technology. In embodiments, the techniques introduced herein may be practiced without including every specific detail disclosed herein. In other instances, well-known features, such as specific functions or routines, are not described in detail in order to avoid unnecessarily obscuring the present disclosure. References in this description to "an embodiment," "one embodiment," and the like mean that the particular feature, structure, material, or characteristic being described is included in at least one embodiment of the present disclosure. Thus, the appearances of such phrases in this description do not necessarily all refer to the same embodiment. On the other hand, such references are not necessarily mutually exclusive either. Furthermore, a particular feature, structure, material, or characteristic described with respect to any one embodiment may be combined in any suitable manner with those of any other embodiment, unless such items are mutually exclusive. It should be understood that the various embodiments shown in the figures are merely illustrative representations and are not necessarily drawn to scale.

For clarity, several details describing structures or processes that are well known and often associated with robotic systems and subsystems, but that could unnecessarily obscure some significant aspects of the disclosed technology, are not set forth in the following description. Moreover, although the following disclosure sets forth several embodiments of different aspects of the present technology, several other embodiments may have configurations or components different from those described in this section. Accordingly, the disclosed technology may have other embodiments with additional elements or without several of the elements described below.

Many embodiments or aspects of the present disclosure described below may take the form of computer- or controller-executable instructions, including routines executed by a programmable computer or controller. Those skilled in the relevant art will appreciate that the disclosed techniques can be practiced on or by computer or controller systems other than those shown and described below. The techniques described herein may be embodied in a special-purpose computer or data processor that is specifically programmed, configured, or constructed to execute one or more of the computer-executable instructions described below. Accordingly, the terms "computer" and "controller" as generally used herein refer to any data processor and may include Internet appliances and handheld devices (including palmtop computers, wearable computers, cellular or mobile phones, multi-processor systems, processor-based or programmable consumer electronics, network computers, minicomputers, and the like). Information handled by these computers and controllers can be presented on any suitable display medium, including a liquid crystal display (LCD). Instructions for executing computer- or controller-executable tasks can be stored in or on any suitable computer-readable medium, including hardware, firmware, or a combination of hardware and firmware. Instructions can be contained in any suitable memory device, including, for example, a flash drive, a USB device, and/or other suitable medium.

The terms "coupled" and "connected," along with their derivatives, may be used herein to describe structural relationships between components. It should be understood that these terms are not intended as synonyms for each other. Rather, in particular embodiments, "connected" can be used to indicate that two or more elements are in direct contact with each other. Unless otherwise made apparent by the context, the term "coupled" can be used to indicate that two or more elements are in direct or indirect contact with each other (with other intervening elements between them), or that the two or more elements cooperate or interact with each other (e.g., as in a cause-and-effect relationship, such as for signal transmission/reception or for function calls), or both.

Any reference herein to image analysis performed by a computing system may be performed according to, or using, spatial structure information, which may include depth information describing respective depth values for various positions relative to a chosen point. The depth information may be used to identify objects or to estimate how objects are spatially arranged. In some instances, the spatial structure information may include, or may be used to generate, a point cloud describing the positions of one or more surfaces of an object. Spatial structure information is merely one form of possible image analysis, and other forms known to one skilled in the art may be used in accordance with the methods described herein.
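The normal differences that feed the surface cost map presuppose per-pixel surface normals. One common way to obtain them, shown here as an assumption rather than as the patent's prescribed method, is to estimate normals from the depth map via the cross product of the local tangent vectors:

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate a unit surface normal for each pixel of a depth image."""
    dz_dy, dz_dx = np.gradient(depth)      # depth slope along image rows / columns
    # tangents t_x = (1, 0, dz/dx) and t_y = (0, 1, dz/dy); normal = t_x x t_y
    n = np.dstack((-dz_dx, -dz_dy, np.ones_like(depth)))
    return n / np.linalg.norm(n, axis=2, keepdims=True)

flat = np.full((8, 8), 7.0)                # constant depth: a flat surface
ramp = np.tile(np.arange(8.0), (8, 1))     # depth increasing along columns
n_flat = normals_from_depth(flat)
n_ramp = normals_from_depth(ramp)
```

The flat surface yields normals of (0, 0, 1) everywhere, while the ramp yields normals tilted toward the camera by 45 degrees; regions where neighbouring normals diverge would then accumulate normal-difference cost.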

FIG. 1A illustrates a system 1000 for performing object detection, or more specifically, object recognition. More particularly, the system 1000 may include a computing system 1100 and a camera 1200. In this example, the camera 1200 may be configured to generate image information which describes or otherwise represents the environment in which the camera 1200 is located, or, more specifically, the environment in the camera 1200's field of view (also referred to as the camera field of view). The environment may be, e.g., a warehouse, a manufacturing plant, a retail space, or some other premises. In particular embodiments described herein, the environment may be an object handling environment including one or more source repositories and one or more destination repositories. In such instances, the image information may represent images of objects located at such premises, such as boxes, bags, packages, cabinets, crates, or other containers. Such objects may be located in the source repositories and destination repositories. The system 1000 may be configured to generate, receive, and/or process the image information, such as by using the image information to distinguish between individual objects in the camera field of view, to perform object identification or object registration based on the image information, and/or to perform robot interaction planning based on the image information, as discussed in more detail below (the terms "and/or" and "or" are used interchangeably in this disclosure).
The robot interaction planning may be used to, e.g., control a robot at the premises to facilitate robot interaction between the robot and the containers or other objects. The computing system 1100 and the camera 1200 may be located at the same premises or may be remote from each other. For instance, the computing system 1100 may be part of a cloud computing platform hosted in a data center which is remote from the warehouse or retail space, and may communicate with the camera 1200 via a network connection.

In an embodiment, the camera 1200 (which may also be referred to as an image sensing device) may be a 2D camera and/or a 3D camera. For example, FIG. 1B illustrates a system 1500A (which may be an embodiment of the system 1000) that includes the computing system 1100 as well as a camera 1200A and a camera 1200B, both of which may be embodiments of the camera 1200. In this example, the camera 1200A may be a 2D camera that is configured to generate 2D image information which includes or forms a 2D image that describes a visual appearance of the environment in the camera's field of view. The camera 1200B may be a 3D camera (also referred to as a spatial structure sensing camera or spatial structure sensing device) that is configured to generate 3D image information which includes or forms spatial structure information regarding the environment in the camera's field of view. The spatial structure information may include depth information (e.g., a depth map) which describes respective depth values of various positions relative to the camera 1200B, such as positions on surfaces of various objects in the camera 1200B's field of view. These positions in the camera's field of view or on an object's surface may also be referred to as physical positions. The depth information in this example may be used to estimate how the objects are spatially arranged in three-dimensional (3D) space.
In some instances, the spatial structure information may include, or may be used to generate, a point cloud (also referred to as a 3D point cloud) that describes positions on one or more surfaces of an object in the camera 1200B's field of view. More specifically, the spatial structure information may describe various positions on a structure of one or more objects (also referred to as an object structure).
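The conversion from such a depth map to a point cloud can be illustrated with a standard pinhole back-projection; the intrinsic parameters (fx, fy, cx, cy) below are made-up example values, not properties of the camera 1200B:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in metres) into camera-frame 3D points."""
    rows, cols = depth.shape
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))
    x = (u - cx) * depth / fx                   # physical X from pixel column
    y = (v - cy) * depth / fy                   # physical Y from pixel row
    return np.dstack((x, y, depth)).reshape(-1, 3)

depth = np.full((4, 4), 2.0)                    # a flat surface 2 m from the camera
cloud = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=1.5, cy=1.5)
```

Each pixel becomes one 3D point; a flat surface at constant depth maps to a planar grid of points at Z = 2 m.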

In an embodiment, the system 1000 may be a robot operation system for facilitating robot interaction between a robot and various objects in the environment of the camera 1200. For example, FIG. 1C shows a robot operation system 1500B, which may be an embodiment of the system 1000/1500A of FIGS. 1A and 1B. The robot operation system 1500B may include the computing system 1100, the camera 1200, and a robot 1300. As stated above, the robot 1300 may be used to interact with one or more objects in the environment of the camera 1200, such as with boxes, bags, packages, crates, bins, trays, or other containers. For example, the robot 1300 may be configured to pick up objects from one location and move them to another location. In some cases, the robot 1300 may be used to perform a de-palletization operation in which a group of containers or other objects are unloaded and moved to, e.g., a conveyor belt. In some implementations, the camera 1200 may be attached to the robot 1300 or to a robot 3300, as discussed below. This may also be referred to as an on-hand or in-hand camera solution. The camera 1200 may be attached to a robot arm 3320 of the robot 1300. The robot arm 3320 may then move to various picking regions to generate image information regarding those regions. In some implementations, the camera 1200 may be separate from the robot 1300. For example, the camera 1200 may be mounted to a ceiling or other structure of a warehouse and may remain stationary relative to that structure. In some implementations, multiple cameras 1200 may be used, including multiple cameras 1200 separate from the robot 1300 and/or cameras 1200 separate from the robot 1300 that are used in conjunction with an on-hand camera 1200. In some implementations, a camera 1200 or cameras 1200 may be mounted or affixed to a dedicated robotic system separate from the robot 1300 used for object manipulation, such as a robot arm, a gantry, or another automated system configured for camera movement. Throughout the specification, "control" of the camera 1200 may be discussed. For on-hand camera solutions, control of the camera 1200 also includes control of the robot 1300 to which the camera 1200 is mounted or attached.

In an embodiment, the computing system 1100 of FIGS. 1A-1C may form or be integrated into a robot control system, which is also referred to as a robot controller. The robot control system may be included in the system 1500B, and may be configured to, e.g., generate commands for the robot 1300, such as robot interaction movement commands for controlling robot interaction between the robot 1300 and a container or other object. In such an embodiment, the computing system 1100 may be configured to generate such commands based on, e.g., image information generated by the camera 1200. For example, the computing system 1100 may be configured to determine a motion plan based on the image information, wherein the motion plan may be intended to, e.g., grip or otherwise pick up an object. The computing system 1100 may generate one or more robot interaction movement commands to execute the motion plan.

In an embodiment, the computing system 1100 may form or be part of a vision system. The vision system may be a system that generates, e.g., vision information describing the environment in which the robot 1300 is located, or, alternatively or additionally, the environment in which the camera 1200 is located. The vision information may include the 3D image information and/or the 2D image information discussed above, or some other image information. In some cases, if the computing system 1100 forms a vision system, the vision system may be part of the robot control system discussed above, or may be separate from the robot control system. If the vision system is separate from the robot control system, the vision system may be configured to output information describing the environment in which the robot 1300 is located. The information may be output to the robot control system, which may receive such information from the vision system and, based on the information, perform motion planning and/or generate robot interaction movement commands. Further information regarding the vision system is detailed below.

In an embodiment, the computing system 1100 may communicate with the camera 1200 and/or the robot 1300 via a direct connection, such as a connection provided via a dedicated wired communication interface (such as an RS-232 interface or a universal serial bus (USB) interface) and/or via a local computer bus (such as a peripheral component interconnect (PCI) bus). In an embodiment, the computing system 1100 may communicate with the camera 1200 and/or the robot 1300 via a network. The network may be any type and/or form of network, such as a personal area network (PAN), a local area network (LAN) (e.g., an intranet), a metropolitan area network (MAN), a wide area network (WAN), or the Internet. The network may utilize different technologies and protocol layers or protocol stacks, including, e.g., the Ethernet protocol, the Internet protocol suite (TCP/IP), ATM (asynchronous transfer mode) technology, the SONET (synchronous optical networking) protocol, or the SDH (synchronous digital hierarchy) protocol.

In an embodiment, the computing system 1100 may communicate information directly with the camera 1200 and/or the robot 1300, or may communicate via an intermediate storage device, or more generally via an intermediate non-transitory computer-readable medium. For example, FIG. 1D shows a system 1500C, which may be an embodiment of the system 1000/1500A/1500B, that includes a non-transitory computer-readable medium 1400, which may be external to the computing system 1100 and may act as an external buffer or repository for storing, e.g., image information generated by the camera 1200. In such an example, the computing system 1100 may retrieve or otherwise receive the image information from the non-transitory computer-readable medium 1400. Examples of the non-transitory computer-readable medium 1400 include an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. The non-transitory computer-readable medium may form, e.g., a computer diskette, a hard disk drive (HDD), a solid-state drive (SSD), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), and/or a memory stick.

As stated above, the camera 1200 may be a 3D camera and/or a 2D camera. The 2D camera may be configured to generate a 2D image, such as a color image or a grayscale image. The 3D camera may be, e.g., a depth-sensing camera, such as a time-of-flight (TOF) camera or a structured-light camera, or any other type of 3D camera. In some cases, the 2D camera and/or the 3D camera may include an image sensor, such as a charge-coupled device (CCD) sensor and/or a complementary metal-oxide-semiconductor (CMOS) sensor. In an embodiment, the 3D camera may include a laser, a LIDAR device, an infrared device, a light/dark sensor, a motion sensor, a microwave detector, an ultrasonic detector, a RADAR detector, or any other device configured to capture depth information or other spatial structure information.

As stated above, the image information may be processed by the computing system 1100. In an embodiment, the computing system 1100 may include or be configured as a server (e.g., having one or more server blades, processors, etc.), a personal computer (e.g., a desktop computer, a laptop computer, etc.), a smartphone, a tablet computing device, and/or any other computing system. In an embodiment, any or all of the functionality of the computing system 1100 may be performed as part of a cloud computing platform. The computing system 1100 may be a single computing device (e.g., a desktop computer), or may include multiple computing devices.

FIG. 2A provides a block diagram illustrating an embodiment of the computing system 1100. The computing system 1100 in this embodiment includes at least one processing circuit 1110 and a non-transitory computer-readable medium (or media) 1120. In some cases, the processing circuit 1110 may include a processor (e.g., a central processing unit (CPU), a special-purpose computer, and/or an onboard server) configured to execute instructions (e.g., software instructions) stored on the non-transitory computer-readable medium 1120 (e.g., computer memory). In some embodiments, the processor may be included in a separate/stand-alone controller that is operably coupled to other electronic/electrical devices. The processor may implement the program instructions to control/interface with other devices, thereby causing the computing system 1100 to execute actions, tasks, and/or operations. In an embodiment, the processing circuit 1110 includes one or more processors, one or more processing cores, a programmable logic controller ("PLC"), an application-specific integrated circuit ("ASIC"), a programmable gate array ("PGA"), a field-programmable gate array ("FPGA"), any combination thereof, or any other processing circuit.

In an embodiment, the non-transitory computer-readable medium 1120, which is part of the computing system 1100, may be an alternative or addition to the intermediate non-transitory computer-readable medium 1400 discussed above. The non-transitory computer-readable medium 1120 may be a storage device, such as an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof, such as, e.g., a computer diskette, a hard disk drive (HDD), a solid-state drive (SSD), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, any combination thereof, or any other storage device. In some cases, the non-transitory computer-readable medium 1120 may include multiple storage devices. In certain implementations, the non-transitory computer-readable medium 1120 is configured to store image information generated by the camera 1200 and received by the computing system 1100. In some cases, the non-transitory computer-readable medium 1120 may store one or more object recognition templates used for performing the methods and operations discussed herein. The non-transitory computer-readable medium 1120 may alternatively or additionally store computer-readable program instructions that, when executed by the processing circuit 1110, cause the processing circuit 1110 to perform one or more of the methods described herein.

FIG. 2B depicts a computing system 1100A, which is an embodiment of the computing system 1100 and includes a communication interface 1130. The communication interface 1130 may be configured to, e.g., receive image information generated by the camera 1200 of FIGS. 1A-1D. The image information may be received via the intermediate non-transitory computer-readable medium 1400 or the network discussed above, or via a more direct connection between the camera 1200 and the computing system 1100/1100A. In an embodiment, the communication interface may be configured to communicate with the robot 1300 of FIG. 1C. If the computing system 1100 is external to a robot control system, the communication interface of the computing system 1100 may be configured to communicate with the robot control system. The communication interface may also be referred to as a communication component or communication circuit, and may include, e.g., communication circuitry configured to perform communication over a wired or wireless protocol. As an example, the communication circuitry may include an RS-232 port controller, a USB controller, an Ethernet controller, a controller, a PCI bus controller, any other communication circuit, or a combination thereof.

In an embodiment, as shown in FIG. 2C, the non-transitory computer-readable medium 1120 may include a storage space 1125 configured to store one or more data objects discussed herein. For example, the storage space may store object recognition templates, detection hypotheses, image information, object image information, robot arm movement commands, and any additional data objects that the computing systems discussed herein may require access to.

In an embodiment, the processing circuit 1110 may be programmed by one or more computer-readable program instructions stored on the non-transitory computer-readable medium 1120. For example, FIG. 2D illustrates a computing system 1100C, which is an embodiment of the computing system 1100/1100A/1100B, in which the processing circuit 1110 is programmed by one or more modules including an object recognition module 1121, a motion planning module 1129, and an object manipulation planning module 1126. The processing circuit 1110 may further be programmed with an object registration module 1130 and a pickable region detection module 1132. Each of the above modules may represent computer-readable program instructions configured to perform certain tasks when instantiated on one or more of the processors, processing circuits, computing systems, etc. described herein. Each of the above modules may operate in concert with one another to achieve the functionality described herein. Various aspects of the functionality described herein may be performed by one or more of the software modules discussed above, and the software modules and their descriptions are not to be understood as limiting the computational structure of the systems disclosed herein. For example, although a specific task or functionality may be described with respect to a specific module, that task or functionality may also be performed by a different module as required. Further, the system functionality described herein may be performed by a different set of software modules configured with a different breakdown or allotment of functionality.

In an embodiment, the object recognition module 1121 may be configured to obtain and analyze image information, as discussed throughout the disclosure. The methods, systems, and techniques discussed herein with respect to image information may use the object recognition module 1121. The object recognition module may further be configured for object recognition tasks related to object identification, as discussed herein.

The motion planning module 1129 may be configured to plan and execute the movement of a robot. For example, the motion planning module 1129 may interact with the other modules described herein to plan motion of the robot 3300 for object retrieval operations and for camera placement operations. The methods, systems, and techniques discussed herein with respect to robot arm movements and trajectories may be performed by the motion planning module 1129.

The object manipulation planning module 1126 may be configured to plan and execute the object manipulation activities of a robot arm, e.g., gripping and releasing objects and executing robot arm commands to aid and facilitate such gripping and releasing.

The object registration module 1130 may be configured to obtain, store, generate, and otherwise process object registration and detection information that may be required for the various tasks discussed herein. The object registration module 1130 may be configured to interact or communicate with any other module as necessary.

The pickable region detection module 1132 may be configured to identify pickable regions on the surfaces of one or more objects, e.g., as described with respect to FIG. 4. The pickable region detection module 1132 may be configured to interact or communicate with any other module as necessary.

With reference to FIGS. 2E, 2F, 3A, 3B, and 3C, methods related to the object recognition module 1121 and the object registration module 1130 that may be performed for image analysis are explained. FIGS. 2E and 2F illustrate example image information associated with the image analysis methods, while FIGS. 3A-3C illustrate example robotic environments associated with the image analysis methods. References herein related to image analysis performed by a computing system may be carried out according to, or with the use of, spatial structure information, which may include depth information describing respective depth values of various locations relative to a selected point. The depth information may be used to identify objects or to estimate how objects are spatially arranged. In some cases, the spatial structure information may include, or may be used to generate, a point cloud describing the locations of one or more surfaces of an object. Spatial structure information is merely one form of possible image analysis, and other forms known to one skilled in the art may be used in accordance with the methods described herein.

In an embodiment, the computing system 1100 may obtain image information representing an object in a camera field of view (e.g., 3210) of the camera 1200. The steps and techniques described below for obtaining the image information may constitute an image information capture operation. In some cases, the object may be one object 3520 from a plurality of objects 3520 in a source container 3510 in the field of view 3210 of the camera 1200. The image information 2600, 2700 may be generated by the camera (e.g., 1200) while the objects 3520 are (or have been) in the camera field of view 3210, and may describe one or more of the individual objects 3520. The object appearance describes the appearance of an object 3520 from the viewpoint of the camera 1200. If there are multiple objects 3520 in the camera field of view, the camera may, as required, generate image information representing the multiple objects or a single object (such image information relating to a single object may be referred to as object image information). The image information may be generated by the camera (e.g., 1200) while the group of objects is (or has been) in the camera field of view, and may include, e.g., 2D image information and/or 3D image information.

As an example, FIG. 2E depicts a first set of image information, or more specifically 2D image information 2600, which, as stated above, is generated by the camera 1200 and represents objects 3520, such as those shown in FIGS. 3A-3C. More specifically, the 2D image information 2600 may be a grayscale or color image, and may describe the appearance of the objects 3520 from the viewpoint of the camera 1200. In an embodiment, the 2D image information 2600 may correspond to a single color channel (e.g., a red, green, or blue color channel) of a color image. If the camera 1200 is disposed above the objects 3520, the 2D image information 2600 may represent the appearance of the respective top surfaces of the objects 3520. In the example of FIG. 2E, the 2D image information 2600 may include respective portions 2000A/2000B/2000C/2000D/2550, also referred to as image portions or object image information, that represent respective surfaces of the objects 3520. In FIG. 2E, each image portion 2000A/2000B/2000C/2000D/2550 of the 2D image information 2600 may be an image region, or more specifically a pixel region (if the image is formed by pixels). Each pixel in the pixel region of the 2D image information 2600 may be characterized as having a position described by a set of coordinates [U, V], and may have values relative to a camera coordinate system or some other coordinate system, as shown in FIGS. 2E and 2F. Each pixel may also have an intensity value, such as a value between 0 and 255 or between 0 and 1023. In further embodiments, each pixel may include any additional information associated with pixels in various formats (e.g., hue, saturation, intensity, CMYK, RGB, etc.).
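A minimal sketch of the pixel addressing just described may look as follows. The array contents, image size, and helper name are hypothetical; the sketch only shows 2D image information stored as a pixel region indexed by [U, V] coordinates with 8-bit intensity values in the range 0-255:

```python
import numpy as np

# Hypothetical grayscale 2D image information: rows index the V coordinate,
# columns index the U coordinate, and each element is an 8-bit intensity.
image_2d = np.array([[ 10,  20,  30],
                     [ 40, 200, 220],
                     [ 50, 210, 230]], dtype=np.uint8)

def pixel_intensity(image, u, v):
    """Return the intensity at pixel coordinates [U, V] (column u, row v)."""
    return int(image[v, u])

center = pixel_intensity(image_2d, u=1, v=1)   # intensity of the middle pixel
```

A color image would carry one such channel per color (e.g., R, G, B), and each channel could be indexed in the same way.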

As stated above, in some embodiments the image information may be all or a portion of an image, such as the 2D image information 2600. For example, the computing system 1100 may be configured to extract an image portion 2000A from the 2D image information 2600 to obtain only the image information associated with the corresponding object 3520. Where an image portion (such as the image portion 2000A) is directed to a single object, it may be referred to as object image information. The object image information need not contain information only about the object to which it is directed. For example, the object to which it is directed may be close to, beneath, above, or otherwise situated in the vicinity of one or more other objects. In such cases, the object image information may include information about the object to which it is directed as well as about one or more neighboring objects. The computing system 1100 may extract the image portion 2000A by performing an image segmentation or other analysis or processing operation based on the 2D image information 2600 and/or the 3D image information 2700 shown in FIG. 2F. In some implementations, the segmentation or other processing operation may include detecting image locations at which physical edges of an object (e.g., edges of a box) appear in the 2D image information 2600, and using such image locations to identify object image information that is limited to representing an individual object in the camera field of view (e.g., 3210) and substantially excludes other objects. By "substantially excludes," it is meant that the image segmentation or other processing techniques may be designed and configured to exclude non-target objects from the object image information, but it is understood that errors may be made, noise may be present, and various other factors may result in portions of other objects being included.
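The extraction step described in this paragraph can be sketched as a crop of the 2D image to a pixel range bounded by detected physical edges. The bounding-box values, scene contents, and function name below are hypothetical; in practice the box would come from an upstream edge-detection step rather than being hard-coded:

```python
import numpy as np

def extract_object_image_information(image, bbox):
    """Crop a 2D image to the pixel range bounded by detected physical edges.

    bbox = (u_min, v_min, u_max, v_max) is assumed to be produced by an
    upstream edge-detection step that located the object's boundary.
    """
    u_min, v_min, u_max, v_max = bbox
    return image[v_min:v_max, u_min:u_max]

# Hypothetical 4x4 scene in which a bright 2x2 object occupies the lower-right corner.
scene = np.zeros((4, 4), dtype=np.uint8)
scene[2:4, 2:4] = 255
object_portion = extract_object_image_information(scene, (2, 2, 4, 4))
```

Note that a rectangular crop only "substantially excludes" other objects: pixels from a neighboring object that falls inside the bounding box would still be included, which matches the caveat above.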

FIG. 2F depicts an example in which the image information is 3D image information 2700. More particularly, the 3D image information 2700 may include, e.g., a depth map or a point cloud indicating respective depth values of various locations on one or more surfaces (e.g., a top surface or other outer surfaces) of the objects 3520. In some implementations, an image segmentation operation for extracting the image information may involve detecting image locations at which physical edges of an object (e.g., edges of a box) appear in the 3D image information 2700, and using such image locations to identify an image portion (e.g., 2730) that is limited to representing an individual object (e.g., 3520) in the camera field of view.

The respective depth values may be relative to the camera 1200 that generates the 3D image information 2700, or may be relative to another reference point. In some implementations, the 3D image information 2700 may include a point cloud (a 3D point cloud) that includes respective coordinates of various locations on the structures of objects in the camera field of view (e.g., 3210). In the example of FIG. 2F, the point cloud may include respective sets of coordinates describing locations on the respective surfaces of the objects 3520. The coordinates may be 3D coordinates, such as [X Y Z] coordinates, and may have values relative to a camera coordinate system or some other coordinate system. For example, the 3D image information 2700 may include a first image portion 2710 (also referred to as an image portion) that indicates respective depth values for a set of locations 2710₁-2710ₙ, also referred to as physical locations, on a surface of an object 3520. Furthermore, the 3D image information 2700 may also include second, third, fourth, and fifth portions 2720, 2730, 2740, and 2750. These portions may then likewise indicate respective depth values for sets of locations that may be represented by 2720₁-2720ₙ, 2730₁-2730ₙ, 2740₁-2740ₙ, and 2750₁-2750ₙ, respectively. These figures are merely examples, and any number of objects with corresponding image portions may be used. Similarly to the above, the obtained 3D image information 2700 may in some cases be a portion of a first set of 3D image information 2700 generated by the camera. In the example of FIG. 2E, if the obtained 3D image information 2700 represents the single object 3520 of FIG. 3A, the 3D image information 2700 may be narrowed to refer to only the image portion 2710. Similarly to the discussion of the 2D image information 2600, an identified image portion 2710 may pertain to an individual object and may be referred to as object image information. Thus, as used herein, object image information may include 2D and/or 3D image information.
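The grouping of a point cloud into per-object portions such as 2710₁-2710ₙ and 2720₁-2720ₙ can be sketched as follows. The point coordinates and the per-point object labels here are hypothetical, and the labels are assumed to come from an upstream segmentation step:

```python
import numpy as np

def split_point_cloud_by_label(points, labels):
    """Group point-cloud coordinates into per-object image portions.

    points: N x 3 array of [X, Y, Z] coordinates in some camera coordinate system.
    labels: length-N array of object ids from an upstream segmentation step.
    Returns a dict mapping object id -> that object's portion of the cloud.
    """
    return {obj_id: points[labels == obj_id] for obj_id in np.unique(labels)}

# Hypothetical cloud containing two objects at different depths.
points = np.array([[0.0, 0.0, 1.0],
                   [0.1, 0.0, 1.0],
                   [0.5, 0.5, 0.7],
                   [0.6, 0.5, 0.7]])
labels = np.array([1, 1, 2, 2])
portions = split_point_cloud_by_label(points, labels)
```

Each value of the resulting dict plays the role of one image portion (e.g., 2710 or 2720): a set of [X, Y, Z] locations limited to a single object.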

In an embodiment, an image normalization operation may be performed by the computing system 1100 as part of obtaining the image information. The image normalization operation may involve transforming an image or an image portion generated by the camera 1200, so as to generate a transformed image or transformed image portion. For example, the obtained image information, which may include the 2D image information 2600, the 3D image information 2700, or a combination of the two, may undergo an image normalization operation in an attempt to cause the image information to be altered in viewpoint, object pose, or lighting condition associated with the visual description information. Such normalization may be performed to facilitate a more accurate comparison between the image information and model (e.g., template) information. The viewpoint may refer to a pose of an object relative to the camera 1200, and/or an angle at which the camera 1200 is viewing the object when the camera 1200 generates an image representing the object.

For example, the image information may be generated during an object recognition operation in which a target object is in the camera field of view 3210. The camera 1200 may generate image information representing the target object when the target object has a specific pose relative to the camera. For example, the target object may have a pose in which its top surface is perpendicular to an optical axis of the camera 1200. In such an example, the image information generated by the camera 1200 may represent a specific viewpoint, such as a top view of the target object. In some cases, when the camera 1200 generates the image information during the object recognition operation, the image information may be generated under particular lighting conditions, such as a particular lighting intensity. In such cases, the image information may represent a particular lighting intensity, lighting color, or other lighting condition.

In an embodiment, the image normalization operation may involve adjusting an image or image portion of a scene generated by the camera, so that the image or image portion better matches a viewpoint and/or lighting conditions associated with the information of an object recognition template. The adjustment may involve transforming the image or image portion to generate a transformed image that matches at least one of an object pose or lighting conditions associated with the visual description information of the object recognition template.

The viewpoint adjustment may involve processing, warping, and/or shifting of the image of the scene, so that the image represents the same viewpoint as the visual description information that may be included within the object recognition template. The processing may include, for example, changing the color, contrast, or lighting of the image; the warping of the scene may include changing the size, dimensions, or proportions of the image; and the shifting of the image may include changing the position, orientation, or rotation of the image. In an example embodiment, processing, warping, and/or shifting may be used to alter an object in the image of the scene to have an orientation and/or size that matches, or better corresponds to, the visual description information of the object recognition template. If the object recognition template describes a head-on view (e.g., a top view) of some object, the image of the scene may be warped so as to also represent a head-on view of an object in the scene.
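The warping and shifting described above can be sketched generically as an inverse affine warp. This is a minimal nearest-neighbor illustration under assumed inputs (the 2x3 matrix and the toy image), not the disclosed implementation:

```python
import numpy as np

def warp_image(image, matrix):
    """Nearest-neighbor inverse warp of a 2D (grayscale) image through a
    2x3 affine matrix [[a, b, tx], [c, d, ty]]: each output pixel is mapped
    back through the inverse transform to find its source pixel."""
    h, w = image.shape
    out = np.zeros_like(image)
    inv = np.linalg.inv(np.vstack([matrix, [0.0, 0.0, 1.0]]))[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = inv @ coords
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys.ravel()[ok], xs.ravel()[ok]] = image[sy[ok], sx[ok]]
    return out

# Shift a bright 2x2 patch 2 pixels right and 1 down, as a stand-in for
# aligning a scene image with a template's viewpoint.
img = np.zeros((6, 6))
img[1:3, 1:3] = 1.0
shifted = warp_image(img, np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]))
```

A rotation or scaling would be expressed the same way, by changing the upper-left 2x2 block of the matrix.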

Other aspects of the object recognition methods performed herein are described in more detail in U.S. Application No. 16/991,510, filed August 12, 2020, and U.S. Application No. 16/991,466, filed August 12, 2020, each of which is incorporated herein by reference.

In various embodiments, the terms "computer-readable instructions" and "computer-readable program instructions" are used to describe software instructions or computer code configured to perform various tasks and operations. In various embodiments, the term "module" refers broadly to a collection of software instructions or code configured to cause the processing circuit 1110 to perform one or more functional tasks. The modules and computer-readable instructions may be described as performing various operations or tasks when a processing circuit or other hardware component is executing the modules or computer-readable instructions.

FIGS. 3A-3C illustrate an example environment in which a pickable region (or grasp region) detection operation and/or a motion planning operation may be performed. More specifically, FIG. 3A depicts a system 3000 (which may be an embodiment of the system 1000/1000A/1000B/1000C of FIGS. 1A-1D) that includes the computing system 1100, a robot 3300, and the camera 1200. The camera 1200 may be an embodiment of the camera 1200 discussed above, and may be configured to generate image information representing a scene in a camera field of view 3210 of the camera 1200, or more specifically representing objects in the camera field of view 3210 (such as objects 3520₁ through 3520ₙ, which may include, e.g., objects 3520₁, 3520₂, 3520₃, 3520₄, 3520₅, ..., 3520ₙ) or their structures. In the embodiment of FIGS. 3A-3C, the robot 3300 may be configured to manipulate or otherwise interact with each of one or more of the objects 3520₁-3520ₙ, such as by picking up or otherwise grasping one of the objects 3520₁-3520ₙ, lifting the object from its current location, and moving the object to a destination location.

In some cases, some or all of the objects 3520₁ through 3520ₙ may be flexible objects. For example, each of the objects 3520₁ through 3520ₙ may be a package containing an article of clothing (e.g., a shirt or pants) or another textile or fabric, wherein the clothing or other textile may be wrapped in a sheet of packaging material (such as a sheet of plastic). In some cases, the sheet of plastic or other packaging material may be generally impermeable to air or other fluids. In the example of FIG. 3A, the objects 3520₁ through 3520ₙ may be disposed in a container 3510, such as a bin or box used to hold the objects 3520₁ through 3520ₙ in a facility (such as a warehouse associated with a clothing manufacturer or retailer). In some cases, some or all of the objects 3520₁ through 3520ₙ may include items such as boxes, packages, bags, and other articles.

In some cases, a flexible object (e.g., 3520₁) of the embodiments herein may have a sufficiently high flexibility to allow the shape of the flexible object to deform when the robot 3300 moves or otherwise manipulates it, or when it is placed in the container 3510. The sufficiently high flexibility may correspond to a stiffness or rigidity that is sufficiently low to prevent the object from maintaining its shape when it is moved or otherwise manipulated by the robot 3300. In some cases, the flexible object may have a sufficiently high flexibility to allow the weight of the flexible object to cause deformation of the object's own shape when the robot 3300 lifts the flexible object. The deformation may involve, for example, the flexible object bending, or more specifically sagging under its own weight when lifted by the robot 3300. The flexibility of the flexible object may come from, for example, the dimensions of the flexible object and/or the material of the flexible object. In one example, the flexible object may have a thin profile, which may introduce flexibility (also referred to as pliability) into the flexible object. More specifically, a thickness dimension of the flexible object may be very small relative to a lateral dimension (e.g., a length dimension or a width dimension). In one example, the flexible object may be made of a material that is sufficiently soft to introduce flexibility into the flexible object. In some cases, the material of the flexible object may be soft enough to sag under the material's own weight when the robot 3300 lifts the object. For example, if the flexible object is a package containing an article of clothing, it may be formed of a material such as cotton or wool fabric that lacks sufficient stiffness to prevent the material from sagging under its own weight when lifted by the robot 3300.

In an embodiment, the robot 3300 (which may be an embodiment of the robot 1300) may include a robotic arm 3320 having one end attached to a robot base 3310 and another end that is attached to, or formed by, an end effector apparatus 3330. The robot base 3310 may be used to mount one end of the robotic arm 3320, while the other end of the robotic arm 3320, or more specifically the end effector apparatus 3330, may be used to interact with one or more objects (e.g., 3520₁, 3520₂, etc.) in the environment of the robot 3300. The interaction may include, for example, grasping and lifting one or more of the objects, and/or moving one or more of the objects from a current location to a destination location.

In an embodiment, the end effector apparatus 3330 may include one or more suction cups 3332₁-3332ₙ (also referred to herein as suction grippers or suction gripping devices) for picking up or otherwise lifting an object, such as one of the objects 3520₁-3520ₙ. In some implementations, each of the suction cups 3332₁-3332ₙ (also referred to as end effector suction cups) may be a mechanical device configured to reduce a fluid pressure (e.g., air pressure) in a space between the suction cup and a surface of an object (also referred to as an object surface) when the suction cup is pressed into contact against the surface of the object (e.g., 3520₁). In an example, the object surface may be formed of a material that is generally impermeable to fluids (or, more generally, non-porous), such as a plastic packaging material used to wrap an article of clothing. The reduced fluid pressure (such as a partial or full vacuum) results in a pressure differential between the fluid pressure outside the space and the fluid pressure inside the space. More specifically, the fluid pressure inside the space may be lower than the fluid pressure outside the space, which creates a negative fluid pressure that causes the higher external fluid pressure to exert a net force that causes the suction cup to adhere to the object surface. The net force may act as an adhesion force that enables the suction cup to adhere to, and thereby grip, the object surface. In an embodiment, each of the suction cups (e.g., 3332₁ or 3332ₙ) may have any of a variety of shapes (e.g., a circular shape) and sizes, and may be made of any of a variety of materials, such as plastic, silicone, nitrile, fluoroelastomer (Viton), vinyl, urethane, rubber, or some other flexible material. Suction cups are discussed in more detail in U.S. Patent No. 10,576,630, entitled "Robotic system with a robot arm suction control mechanism and method of operation thereof," the entire contents of which are incorporated herein by reference. In an embodiment, the strength of the adhesion force between the suction cup and the object surface may depend on how tightly the suction cup is able to seal the space between itself and the object surface. For example, a tight seal may maintain the pressure differential and thus the adhesion force, while a loose seal prevents the pressure differential from being maintained and therefore interferes with the suction cup's ability to grip the object surface. In an embodiment, the ability of a suction cup to form a tight seal may depend on the smoothness of the region of the object surface (also referred to as a surface region) that the suction cup is attempting to grip. Accordingly, as discussed in more detail below, the computing system 1100 may be configured to identify or search for surface regions that are smooth enough to be used as grip regions to which a suction cup can reliably adhere in order to grip the object surface.
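The pressure-differential force described above can be estimated with back-of-the-envelope arithmetic: the net holding force is roughly the pressure differential times the effective sealed area. All numbers below (cup diameter, vacuum level) are illustrative assumptions, not values from the disclosure:

```python
import math

cup_diameter_m = 0.03            # assumed 30 mm suction cup
area_m2 = math.pi * (cup_diameter_m / 2) ** 2

atmospheric_pa = 101_325.0       # fluid pressure outside the sealed space
cup_pressure_pa = 60_000.0       # assumed partial vacuum inside the space
delta_p = atmospheric_pa - cup_pressure_pa

# Net force pressing the cup onto the object surface (the "adhesion force").
holding_force_n = delta_p * area_m2
# Rough upper bound on liftable mass, ignoring seal losses and shear loads.
mass_limit_kg = holding_force_n / 9.81

print(f"{holding_force_n:.1f} N, ~{mass_limit_kg:.2f} kg")
```

With these assumed numbers the single cup holds roughly 29 N, which is why a loose seal (and the resulting loss of the pressure differential) so directly degrades gripping ability.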

In an embodiment, the camera 1200 may be configured to generate image information representing the objects 3520₁-3520ₙ and the container 3510, or any other object(s) in the camera field of view 3210. The camera 1200 may be a 3D camera configured to generate 3D image information and/or a 2D camera configured to generate 2D image information. In an embodiment, the 3D image information may represent the collective object surfaces of the objects 3520, or more specifically may describe the physical structure of those object surfaces. For example, the 3D image information may include a depth map, or more generally depth information, which may describe respective depth values of various locations in the camera field of view 3210 relative to the camera 1200 or relative to some other reference point. The locations corresponding to the respective depth values may be locations on various surfaces in the camera field of view 3210, such as locations on respective object surfaces of the objects 3520₁ through 3520ₙ. In some cases, the 3D image information may include a point cloud, which may include a plurality of 3D coordinates that describe locations on the respective object surfaces of the objects 3520₁ through 3520ₙ in the camera field of view 3210.

In an embodiment, the object surface of an object (e.g., 3520₁) may refer to an outer surface (e.g., a top surface) of the object. In such an embodiment, the 3D image information may include information representing the outer surface, or more specifically may describe the physical structure of the outer surface. For example, if the camera 1200 generates the 3D image information by sensing light (e.g., laser or structured light) or other signals reflected from the outer surface, the 3D information may represent, for example, a surface contour of the outer surface. If the outer surface is formed of a transparent material (such as a flexible plastic sheet used as packaging material), the 3D information may still represent the outer surface of the object. More particularly, in this case the camera 1200 may sense light or other signals reflected from a non-transparent material (such as the fabric of an article of clothing that is underneath, or otherwise covered by, the transparent material). The reflected light or signals may pass through the transparent material and may be detected by the camera 1200 to generate the 3D information. In this case, the transparent material (e.g., the plastic sheet) may be thin enough that the distance between the outer surface and the surface of the non-transparent material can be considered negligible. Therefore, in an embodiment, the 3D information may be considered to describe depth information for various locations on the outer surface of the object. Further, if the transparent material forms the outer surface, the transparent material may be sufficiently flexible that all or many portions of the transparent material follow the surface contour of the underlying non-transparent material. Thus, the 3D image information in this case may be regarded as describing the outer surface of the object, or more specifically the physical structure or surface contour of the outer surface.

In an embodiment, the 2D image information may include, for example, a color image or a grayscale image that represents an appearance of one or more objects in the camera field of view 3210. For example, if an object surface has a visual marking (e.g., a logo) or other visual detail printed thereon, the 2D image information may describe or otherwise represent that visual detail. As discussed above, the object surface may be an outer surface of the object, which in some cases may be formed of a transparent material. In such cases, the 2D image information may represent light (e.g., visible light) or other signals that reflect from a surface of the underlying non-transparent material (e.g., a shirt) and pass through the transparent material forming the outer surface. Because the 2D image information in this case is based on light or other signals that pass through the outer surface, the 2D image information may still be considered to represent the outer surface. Additionally, in some cases the transparent material forming the outer surface may be thin enough and transparent enough to have little or negligible effect on the appearance of the object, such that the appearance of the object, or of the object's outer surface, may be considered to refer to the appearance of the underlying non-transparent material (e.g., the clothing material).

In an embodiment, the system 3000 may include multiple cameras. For example, FIG. 3B illustrates a system 3000A (which may be an embodiment of the system 3000) that includes a camera 1200A having a camera field of view 3210A, and a camera 1200B having a camera field of view 3210B. The camera 1200A (which may be an embodiment of the camera 1200) may be, for example, a 2D camera configured to generate 2D images or other 2D image information, while the camera 1200B (which may also be an embodiment of the camera 1200) may be, for example, a 3D camera configured to generate 3D image information.

In an embodiment, the camera 1200/1200A/1200B may be stationary relative to a reference point (such as a floor on which the container 3510 is placed) or relative to the robot base 3310. For example, the camera 1200 in FIG. 3A may be mounted to a ceiling (such as a ceiling of a warehouse), or to a mounting frame that remains stationary relative to the floor, relative to the robot base 3310, or relative to some other reference point. In an embodiment, the camera 1200 may be mounted on the robotic arm 3320. For example, FIG. 3C depicts a system 3000B (which may be an embodiment of the system 1000) in which the camera 1200 is attached to, or otherwise mounted on, the end effector apparatus 3330 that forms a distal end of the robotic arm 3320. Such an embodiment may provide the robot 3300 with the ability to move the camera 1200 to different poses via movement of the robotic arm 3320.

The computing system 1100 may be configured to generate a pickable region detection result for one or more objects 3520 at a source container 3510. For example, the source container 3510 may be a container holding objects 3520 that have random orientations, poses, and positions. In addition to a pickable region, the pickable region detection result may also include additional information, such as detection mask information, a safety volume, or a combination thereof, each of which is described in detail below.

The robot 3300 may further include other sensors configured to obtain information used to implement the tasks, such as for manipulating the structural members and/or for transporting the robotic units. The sensors may include devices configured to detect or measure one or more physical properties of the robot 3300 and/or of the surrounding environment (e.g., a state, condition, and/or position of one or more of its structural members/joints). Some examples of the sensors may include accelerometers, gyroscopes, force sensors, strain gauges, tactile sensors, torque sensors, position encoders, and the like.

FIG. 4 provides a flowchart illustrating an overall flow of methods and operations for identifying a pickable region of one or more selected objects among the objects in a container. The pickable region identification method 4000 may include any combination of the features of the sub-methods and operations described herein. The method 4000 may be executed or performed by any of the suitable systems and devices described herein.

In an operation 4002, the method 4000 includes obtaining image information. Image information of a set of objects, or of a plurality of objects, contained in a source container may be obtained by the computing system. The image information may be obtained, for example, by controlling a camera, and/or may be obtained from a data storage device on which the image information has been stored. As discussed herein, the image information of the objects in the scene may include 3D image information 2700. FIGS. 5A and 5B provide a representative example of a scene, which includes a plurality of objects represented by 2D image information 5600 (FIG. 5A) and 3D image information 5700 representing the scene (FIG. 5B).

FIG. 5A depicts 2D image information, or more specifically 2D image information 5600, that is generated by the camera 1200/1200A and that represents the objects 3520₁-3520ₙ and the container 3510 of FIGS. 3A-3C. More specifically, the 2D image information 5600 may describe the appearance of the objects 3520₁-3520ₙ and of the container 3510 in which the objects 3520₁-3520ₙ are disposed. More specifically still, the 2D image information 5600 may include image portions 5620₁, 5620₂, 5620₃, 5620₄, 5620₅, ..., 5620ₙ₋₃, 5620ₙ₋₂, 5620ₙ₋₁, 5620ₙ (e.g., pixel regions) that represent visual details of the objects 3520₁, 3520₂, 3520₃, ..., 3520ₙ, respectively. In an embodiment, the 2D image information may represent the object surface of an object (e.g., 3520₁). As discussed above, the object surface may be an outer surface (e.g., a top surface) of the object, and may be formed of a transparent material, a non-transparent material (e.g., a translucent or opaque material), or a combination thereof. As further discussed above, if the outer surface is formed of a transparent material that covers an underlying non-transparent material, the transparent material may be thin enough and transparent enough that it can be considered to have a negligible effect on the appearance of the object. In such cases, the appearance of the underlying non-transparent material may be considered to also be the appearance of the outer surface of the object, such that the 2D image information is considered to represent the appearance of the outer surface of the object.

FIG. 5B illustrates an example of 3D image information 5700. More particularly, the 3D image information 5700 may include, for example, a depth map or other depth information that indicates respective depth values of various locations (such as locations 5700₁, 5700₂, ..., 5700ₙ, which may be a grid of locations organized into rows and columns) in the camera field of view (e.g., 3210/3210A). In some implementations, the depth map may include pixels that indicate the depth values of the locations 5700₁-5700ₙ. In an embodiment, at least some of the locations 5700₁-5700ₙ are locations on one or more object surfaces, such as the object surfaces of the objects 3520₁-3520ₙ. For example, the 3D image information 5700 may include image portions 5720₁, 5720₂, 5720₃, 5720₄, 5720₅, ..., 5720ₙ₋₃, 5720ₙ₋₂, 5720ₙ₋₁, 5720ₙ, where each image portion may include depth values for a respective set of locations on the object surface of a respective object (e.g., 3520₁, 3520₂, 3520₃, ..., or 3520ₙ). In some cases, the 3D image information may include a point cloud, which may include a set of coordinates that respectively describe the locations 5700₁-5700ₙ. The coordinates may be 3D coordinates, such as [X Y Z] Cartesian coordinates, and may have values relative to a camera coordinate system or some other coordinate system. In this example, the [X Y Z] coordinate of a particular location (e.g., 5700₁) may have a Z component that is equal to, or based on, the depth value of that location. The depth value may be relative to the camera (e.g., 1200/1200A) that generated the 3D image information, or may be relative to some other reference point.
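The relationship between a depth map and point-cloud [X Y Z] coordinates described here can be sketched with a standard pinhole back-projection. The intrinsics (fx, fy, cx, cy) are assumed calibration values, not parameters from the disclosure:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (camera frame) into [X, Y, Z] coordinates
    using a pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    The intrinsics fx, fy, cx, cy come from camera calibration."""
    v, u = np.mgrid[0:depth.shape[0], 0:depth.shape[1]]
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack([x, y, z])  # (H, W, 3); the Z component is the depth value

# A flat surface 2 m from the camera, on a 4x4 grid of locations.
depth = np.full((4, 4), 2.0)
cloud = depth_to_point_cloud(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
```

This makes concrete the statement above: each location's [X Y Z] coordinate has a Z component equal to (or derived from) its depth value.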

In an embodiment, the 3D image information may describe a surface contour of an object surface. For example, the 3D image information 5700 in FIG. 5B may include at least the image portion 5720₁, which describes the surface contour of the object surface of the object 3520₁. The surface contour of an object surface may describe the physical structure of the object surface. In some cases, the physical structure of the object surface may be completely or substantially smooth. In some cases, the physical structure of the object surface may include physical features, such as wrinkles, bumps, ridges, creases, or depressions, which may form one or more non-smooth portions of the object surface.

As discussed above, the object surface may be an outer surface (e.g., a top surface) of the object, and may be formed of a transparent material, a non-transparent material (e.g., a translucent or opaque material), or a combination thereof. As further discussed above, if the outer surface is formed of a transparent material that covers an underlying non-transparent material, the transparent material may be thin enough and flexible enough to be considered to have a negligible effect on the physical structure or surface contour of the object. In such cases, 3D image information that represents the physical structure or surface contour of the underlying non-transparent material may be considered to also represent the physical structure or surface contour of the outer surface of the object. Additionally, if the transparent material is thin enough, its thickness can be considered to have a negligible effect on the depth measurements of the camera (e.g., 1200). In this case, the various locations that have depth values represented in the 3D image information (such as the locations of the image portion 5720₁) may be considered to be locations on the outer surface of the corresponding object (e.g., 3520₁).

In an embodiment, obtaining the image information (which may include object detection and object registration) may be performed by any suitable means. In an embodiment, identifying or detecting the plurality of objects 3520 may include a process that includes object registration, template generation, feature extraction, hypothesis generation, hypothesis refinement, and hypothesis validation, as performed, for example, by the object registration module 1130. Such a process is described in detail in U.S. Patent Application No. 17/884,081, filed August 9, 2022, the entire contents of which are incorporated herein by reference in their entirety.

Object registration is a process that includes obtaining and using object registration data (e.g., known, previously stored information related to the objects 3520) to generate object recognition templates for identifying and recognizing similar objects in a physical scene. Template generation is a process that includes generating a set of object recognition templates for use by the computing system in identifying the objects 3520, for further operations related to object picking. Feature extraction (also referred to as feature generation) is a process that includes extracting or generating features from object image information for use in object recognition template generation. Hypothesis generation is a process that includes generating one or more object detection hypotheses, for example based on a comparison between the object image information and one or more object recognition templates. Hypothesis refinement is a process performed to refine the match between an object recognition template and the object image information (even in cases where the object recognition template does not exactly match the object image information). Hypothesis validation is a process by which a single hypothesis is selected from among multiple hypotheses as the best fit, or best choice, for an object 3520.
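The hypothesis-validation step described above, selecting a single best-fit hypothesis from several candidates, can be sketched as a simple scored selection. The data fields, template names, and scores here are hypothetical illustrations; the actual validation criteria are described in the incorporated application:

```python
from dataclasses import dataclass

@dataclass
class DetectionHypothesis:
    template_id: str     # which object recognition template produced the match
    match_score: float   # assumed similarity between template and image features

def validate_hypotheses(hypotheses):
    """Select the single hypothesis with the highest match score, or None."""
    if not hypotheses:
        return None
    return max(hypotheses, key=lambda h: h.match_score)

candidates = [
    DetectionHypothesis("shirt-template-A", 0.72),
    DetectionHypothesis("shirt-template-B", 0.91),
    DetectionHypothesis("pants-template-C", 0.40),
]
best = validate_hypotheses(candidates)
```

In practice the score would aggregate multiple validation checks rather than a single similarity value, but the selection structure is the same.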

在操作4004中，方法4000包括生成场景中的多个物体3520的表面成本图。表面成本图可以是指示收集到的多个物体3520或物体3520的一部分的表面的平滑度的图像图。表面成本图可以是标识收集到的多个物体3520或物体3520的一部分的表面的不规则性或不连续性的图像图。表面成本图可以包括用于表示收集到的多个物体3520或其一部分的表面或顶层的每个点或像素的表面成本图值。因此，表面成本图可以为代表多个物体3520或其一部分的点云的每个点指派表面成本图值。如上面所讨论的，点云的每个点/像素可以由三个坐标(x,y,z)表示。表面成本图值代表点的集合(本文称为细分区域(kernel)或单元格(cell))与相邻的细分区域之间的差异。因此，指派给任何点或细分区域的表面成本图值可以代表该点或该细分区域与相邻的点或细分区域之间的差异。At operation 4004, the method 4000 includes generating a surface cost map for the plurality of objects 3520 in the scene. The surface cost map may be an image map indicative of the smoothness of the surface of the collected objects 3520 or a portion of the objects 3520. A surface cost map may be an image map that identifies irregularities or discontinuities in the surface of the collected plurality of objects 3520 or a portion of objects 3520. The surface cost map may include a surface cost map value for each point or pixel representing the surface or top layer of the collected plurality of objects 3520 or a portion thereof. Accordingly, the surface cost map may assign a surface cost map value to each point of a point cloud representing the plurality of objects 3520 or a portion thereof. As discussed above, each point/pixel of a point cloud can be represented by three coordinates (x, y, z). A surface cost map value represents the difference between a collection of points (referred to herein as a kernel or cell) and adjacent subdivision regions. Thus, a surface cost map value assigned to any point or subdivision region may represent the difference between that point or subdivision region and an adjacent point or subdivision region.

根据图像信息5700生成的表面成本图可以表示细分区域或单元格与相邻的细分区域或单元格之间的高度和角度的差异。表面成本图可以包括高度梯度图和法线差异图,或者可以根据高度梯度图和法线差异图的组合来计算。可以根据各种手段来计算或确定表面成本图,以表示代表场景中多个物体的3D图像信息5700的相邻部分之间的高度和角度差异。在实施例中,参考图6A-6I,可以如下执行表面成本图的计算。The surface cost map generated according to the image information 5700 may represent the difference in height and angle between a subdivision area or cell and an adjacent subdivision area or cell. The surface cost map may include a height gradient map and a normal difference map, or may be computed from a combination of the height gradient map and the normal difference map. Surface cost maps may be calculated or determined according to various means to represent height and angular differences between adjacent portions of 3D image information 5700 representing multiple objects in a scene. In an embodiment, referring to FIGS. 6A-6I , the calculation of the surface cost map may be performed as follows.

图6A提供了表面成本图生成方法6000的示例流程图。方法6000可以由本文描述的任何合适的处理器或计算设备执行。图6A的步骤仅通过示例提供。图6A的步骤可以按任何合适的次序或组合执行,并且可以根据需要结合附加步骤。另外,可以采用生成表面成本图的替代方法,而不偏离本公开的范围。FIG. 6A provides an example flowchart of a method 6000 of generating a surface cost map. Method 6000 can be performed by any suitable processor or computing device described herein. The steps of Figure 6A are provided by way of example only. The steps of FIG. 6A may be performed in any suitable order or combination, and additional steps may be incorporated as desired. Additionally, alternative methods of generating surface cost maps may be employed without departing from the scope of the present disclosure.

可以从3D图像信息5700生成表面成本图,以包括或提供基于若干个成本图参数的高度梯度图和法线差异图的组合。下面更详细地解释的此类成本图参数可以包括细分区域、步幅(stride)、距离阈值、法线阈值和法线权重因子。如下进一步所述,可以手动确定或自动确定成本图参数。A surface cost map may be generated from 3D image information 5700 to include or provide a combination of a height gradient map and a normal disparity map based on several cost map parameters. Such costmap parameters, explained in more detail below, may include subdivision regions, strides, distance thresholds, normal thresholds, and normal weighting factors. As described further below, costmap parameters may be determined manually or automatically.

在表面成本图生成方法6000的操作6002中,3D图像信息5700可以用单元格6101的网格6100覆盖。图6B和图6C图示了表面成本图生成方法的网格操作。单元格6101可以是矩形或正方形,并且可以根据细分区域来确定尺寸。细分区域可以以3D图像信息5700表示的点云的点或像素为单位来表示每个单元格6101的尺寸(如维度6105所示),诸如2×2、4×4、6×6、8×8、10×10、15×15、20×20或任何其他合适的尺寸。单元格6101形成网格,可以在该网格上执行表面成本图计算。在实施例中,3D图像信息5700可以用单元格6101的单个非重叠集合进行网格化,如图6B中所示。单元格中心6102各自彼此隔开一步幅,从而产生非重叠的网格,该步幅的长度(维度6106)等于细分区域尺寸。In operation 6002 of the surface cost map generation method 6000 , 3D image information 5700 may be overlaid with a grid 6100 of cells 6101 . 6B and 6C illustrate grid operations of the surface costmap generation method. The cell 6101 may be rectangular or square, and may be sized according to subdivision areas. The subdivision area can represent the size of each cell 6101 (as shown by the dimension 6105) in units of points or pixels of the point cloud represented by the 3D image information 5700, such as 2×2, 4×4, 6×6, 8 ×8, 10×10, 15×15, 20×20 or any other suitable size. Cells 6101 form a grid on which surface costmap calculations can be performed. In an embodiment, 3D image information 5700 may be gridded with a single non-overlapping set of cells 6101, as shown in Figure 6B. The cell centers 6102 are each spaced apart from each other by a stride whose length (dimension 6106) is equal to the subdivision area size, resulting in a non-overlapping grid.

在进一步的实施例中,覆盖3D图像信息5700的网格6100可以包括重叠的单元格6101的集合。每个单元格6101可以与多个其他单元格6101重叠,单元格中心6102按照小于细分区域尺寸的步幅隔开。因此,例如,如图6C中所示,单元格6101可以使单元格中心6102被细分区域尺寸一半的步幅尺寸隔开。在图6C中,网格6100包括单元格中心6102(每个单元格中心隔开一步幅尺寸)以及宽度和长度维度等于该步幅尺寸的两倍的单元格6101。在图6C中,由阴影区域图示单个单元格6101的尺寸。每个单元格6101与其他四个单元格6101重叠。In a further embodiment, grid 6100 overlaying 3D image information 5700 may include a collection of overlapping cells 6101 . Each cell 6101 may overlap multiple other cells 6101, with cell centers 6102 spaced in steps smaller than the subdivision area size. Thus, for example, as shown in FIG. 6C, cells 6101 may have cell centers 6102 separated by a stride size that is half the subdivision area size. In FIG. 6C , grid 6100 includes cell centers 6102 each spaced apart by a stride size and cells 6101 with width and length dimensions equal to twice the stride size. In FIG. 6C, the size of a single cell 6101 is illustrated by the shaded area. Each cell 6101 overlaps the other four cells 6101 .
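作为示意，以下草图(并非本申请的实际实现；假设高度图以NumPy数组表示，所有名称均为示意性的)展示了如何按细分区域尺寸与步幅尺寸布置非重叠或重叠的单元格中心。As an illustration, the following sketch (not the actual implementation of this application; it assumes a NumPy-style height map, and all names are illustrative) shows how non-overlapping or overlapping cell centers could be laid out for a given kernel size and stride:

```python
import numpy as np

def cell_centers(shape, kernel, stride):
    """Return (row, col) centers of kernel x kernel cells, spaced by
    `stride`, that fit fully inside a height map of the given shape."""
    rows, cols = shape
    half = kernel // 2
    rs = np.arange(half, rows - half + 1, stride)
    cs = np.arange(half, cols - half + 1, stride)
    return [(int(r), int(c)) for r in rs for c in cs]

# Non-overlapping grid (as in FIG. 6B): stride equals the kernel size.
non_overlapping = cell_centers((8, 8), kernel=4, stride=4)
# Overlapping grid (as in FIG. 6C): stride is half the kernel size,
# so each cell overlaps its neighbors.
overlapping = cell_centers((8, 8), kernel=4, stride=2)
```

如注释所示，步幅等于细分区域尺寸时得到非重叠网格；步幅为其一半时得到重叠网格。As the comments note, a stride equal to the kernel size yields the non-overlapping grid, and a stride of half the kernel size yields the overlapping grid.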

在以下对表面成本图计算的讨论中,表面成本图值被指派给单元格中心6102,并且在执行计算时,将每个单元格6101与其非重叠的相邻单元格进行比较。因而,出于清楚的目的,将参考图6B的非重叠布置。In the following discussion of surface costmap calculations, surface costmap values are assigned to cell centers 6102, and when calculations are performed, each cell 6101 is compared to its non-overlapping neighbors. Thus, for purposes of clarity, reference will be made to the non-overlapping arrangement of Figure 6B.

在操作6004中,表面成本图生成方法6000可以包括将平面拟合到每个单元格6101的步骤。图6D图示了与网格6100对应的一组平面6220。对于每个单元格6101,平面6201可以根据3D图像信息5700中由该单元格6101涵盖的点的x、y和z坐标来确定。因此,对于20×20的细分区域尺寸,3D图像信息5700的400个点可以被用于确定该平面6201。可以根据任何合适的方法确定平面6201,包括例如最小二乘法。在另一个示例中,可以根据3D图像信息5700中每个单元格6101内每个点处的法线向量的平均值来确定平面6201。每个平面6201包括质心6202和法线6203。质心6202位于平面6201的几何中心处,并且法线6203正交于平面6201从质心6202延伸。每个平面6201的高度可以被定义为其质心6202的高度。In operation 6004 , the surface costmap generation method 6000 may include the step of fitting a plane to each cell 6101 . FIG. 6D illustrates a set of planes 6220 corresponding to grid 6100 . For each cell 6101 , the plane 6201 may be determined from the x, y and z coordinates of the points covered by the cell 6101 in the 3D image information 5700 . Therefore, for a subdivision area size of 20×20, 400 points of 3D image information 5700 can be used to determine the plane 6201 . Plane 6201 may be determined according to any suitable method, including, for example, least squares. In another example, the plane 6201 may be determined according to the average value of normal vectors at each point in each cell 6101 in the 3D image information 5700 . Each plane 6201 includes a centroid 6202 and a normal 6203 . Centroid 6202 is located at the geometric center of plane 6201 , and normal 6203 extends from centroid 6202 normal to plane 6201 . The height of each plane 6201 can be defined as the height of its centroid 6202.
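下面是将平面拟合到单元格内点的一个简化草图(示意性假设：使用最小二乘法，这只是文中提到的多种可行方法之一；函数名与数据布局均为假设)。A minimal sketch of fitting a plane to the points of a cell (illustrative assumptions: it uses least squares, which is only one of the suitable methods mentioned in the text; function names and data layout are assumed):

```python
import numpy as np

def fit_plane(points):
    """Least-squares fit of a plane z = a*x + b*y + c to an (N, 3)
    array of points covered by a cell; returns (centroid, unit normal)."""
    A = np.column_stack([points[:, 0], points[:, 1], np.ones(len(points))])
    (a, b, _c), *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    centroid = points.mean(axis=0)      # 质心 / centroid (cf. 6202)
    normal = np.array([-a, -b, 1.0])    # 法线 / normal (cf. 6203)
    return centroid, normal / np.linalg.norm(normal)
```

对于20×20的细分区域，传入的`points`将包含400个点。For a 20×20 kernel, the `points` argument would contain 400 points.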

在操作6006中，表面成本图生成方法6000可以包括计算或确定每个平面6201相对于其相邻平面6201的高度梯度。图6F图示了在包含物体3520的源容器3510的表示上所覆盖的高度梯度6200。每个平面6201的高度梯度可以是平面6201与其八个相邻平面6201之间的各个高度梯度的数学组合。每个平面6201的高度梯度可以以几种不同方式确定。如图6F中所示，空心圆圈图示了低高度梯度的部分，实心圆圈图示了较高高度梯度的部分，并且十字图示了例如由于不可靠的检测或检测到源容器3510而不能被标识为物体的部分。为了说明的目的，这些值被示为高和低，而实际上这些值可以跨一定范围的可能值。可以看出，物体3520的边界处的高度梯度大于跨物体3520的中心部分的高度梯度。In operation 6006, the surface cost map generation method 6000 may include calculating or determining the height gradient of each plane 6201 relative to its neighboring planes 6201. FIG. 6F illustrates a height gradient 6200 overlaid on a representation of a source container 3510 containing objects 3520. The height gradient of each plane 6201 may be a mathematical combination of the individual height gradients between the plane 6201 and its eight adjacent planes 6201. The height gradient of each plane 6201 can be determined in several different ways. As shown in FIG. 6F, open circles illustrate portions of low height gradient, filled circles illustrate portions of higher height gradient, and crosses illustrate portions that cannot be identified as objects, for example due to unreliable detection or detection of the source container 3510. For purposes of illustration, these values are shown as high and low, although in practice these values may span a range of possible values. It can be seen that the height gradient at the borders of objects 3520 is greater than the height gradient across the central portions of objects 3520.

在实施例中，可以如下参考图6E确定平面6201与相邻平面6201之间的成本图高度梯度。首先，可以确定两个平面(6201A与6201B)之间的高度差。在实施例中，相邻平面的高度差可以基于一个平面6201B在另一个平面6201A上的延伸(例如，延伸的平面6201BA)。例如，可以将高度差确定为第一延伸的平面6201BA与第二平面6201A的质心6202A之间的高度差，其根据任一平面的法线向量的长度或者根据在3D点云的z方向上的向量的长度计算。例如，可以将高度差确定为第一延伸的平面6201BA与第二平面6201A上的对应点之间的平均高度差，其中所述对应点与3D图像信息5700的点云中的网格点对应。在实施例中，两个平面6201A与6201B之间的高度差可以被确定为以下两者之间的最大或平均高度差：通过在平面6201A之上(或之下)延伸平面6201B而确定的高度差，以及通过在平面6201B之上(或之下)延伸平面6201A而确定的高度差。这种高度差确定方法可以导致相同的高度差，而不管两个平面中的哪个平面被选择为"第一"平面，哪个平面被选择为"第二"平面。In an embodiment, the cost map height gradient between a plane 6201 and an adjacent plane 6201 may be determined with reference to FIG. 6E as follows. First, the height difference between the two planes (6201A and 6201B) can be determined. In an embodiment, the height difference of adjacent planes may be based on the extension of one plane 6201B over another plane 6201A (e.g., extended plane 6201BA). For example, the height difference can be determined as the height difference between the first extended plane 6201BA and the centroid 6202A of the second plane 6201A, computed according to the length of the normal vector of either plane or according to the length of a vector in the z direction of the 3D point cloud. For example, the height difference may be determined as an average height difference between corresponding points on the first extended plane 6201BA and the second plane 6201A, where the corresponding points correspond to grid points in the point cloud of the 3D image information 5700. In an embodiment, the height difference between two planes 6201A and 6201B may be determined as the maximum or average height difference between: the height difference determined by extending plane 6201B above (or below) plane 6201A, and the height difference determined by extending plane 6201A above (or below) plane 6201B. This method of height difference determination may result in the same height difference regardless of which of the two planes is selected as the "first" plane and which of the two planes is selected as the "second" plane.

两个平面6201之间的高度差可以被直接指派给与两个平面6201对应的单元格6101之间的位置。例如，可以将与单元格6101D和6101E对应的平面6201之间的高度差(参见图6B)指派给点DE处的位置。因此，因为与单元格6101D和6101E对应的平面6201之间的高度差不与对应于单元格6101E的平面6201的质心相对应，所以可以应用校正来确定高度差，以指派给与与单元格6101E和6101D对应的平面6201之间的高度差对应的单元格6101E。在实施例中，可以通过对指派给点DE的高度差和指派给点EF的高度差求平均来应用此校正(例如，指派给点EF的高度差基于与单元格6101E和6101F对应的平面6201之间的高度差)。每个单元格6101的总高度梯度可以被确定为与相邻单元格的八个高度差的平均值。每个单元格6101的总高度梯度可以被指派作为与高度梯度成本图6200中该单元格的中心处的点相关联的值。A height difference between two planes 6201 may be directly assigned to a position between the cells 6101 corresponding to the two planes 6201. For example, the height difference between the planes 6201 corresponding to cells 6101D and 6101E (see FIG. 6B) may be assigned to the location at point DE. Because the height difference between the planes 6201 corresponding to cells 6101D and 6101E does not correspond to the centroid of the plane 6201 corresponding to cell 6101E, a correction may be applied to determine the height difference to assign to cell 6101E. In an embodiment, this correction may be applied by averaging the height difference assigned to point DE and the height difference assigned to point EF (e.g., the height difference assigned to point EF is based on the height difference between the planes 6201 corresponding to cells 6101E and 6101F). The total height gradient of each cell 6101 can be determined as the average of the eight height differences from neighboring cells. The total height gradient of each cell 6101 may be assigned as the value associated with the point at the center of that cell in the height gradient cost map 6200.
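对每个单元格求其与八个相邻单元格的高度差的平均值这一步骤可以如下草绘(示意性假设：每个单元格已具有一个平面高度值，例如质心z值；仅对内部单元格计算)。The step of averaging each cell's height differences against its eight neighbors can be sketched as follows (illustrative assumptions: each cell already has a single plane height value, e.g. a centroid z-value; only interior cells are evaluated):

```python
import numpy as np

def height_gradient_map(cell_heights):
    """Average absolute height difference between each interior cell and
    its eight neighbors (cell_heights: 2D array of per-cell plane heights)."""
    h = np.asarray(cell_heights, dtype=float)
    grad = np.zeros_like(h)
    neighbors = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                 if (di, dj) != (0, 0)]
    for i in range(1, h.shape[0] - 1):
        for j in range(1, h.shape[1] - 1):
            grad[i, j] = np.mean(
                [abs(h[i, j] - h[i + di, j + dj]) for di, dj in neighbors])
    return grad
```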

在进一步的实施例中,可以根据不同的方法确定高度差。高度差可以基于例如平面6201A/6201B的质心6202A/6202B之间的高度差或基于沿着与平面6201对应的单元格6101的边界的平面图之间的高度差(或平均高度差)。可以使用其他高度差计算和定义,而不偏离本公开的范围。In further embodiments, the height difference can be determined according to different methods. The height difference may be based, for example, on the height difference between the centroids 6202A/6202B of the planes 6201A/6201B or on the height difference (or average height difference) between planar views along the boundary of the cell 6101 corresponding to the plane 6201 . Other altitude difference calculations and definitions may be used without departing from the scope of this disclosure.

关于图6B的网格6100,上述讨论表示网格6100中每个单元格6101的中心点处的高度梯度的计算。因为步幅尺寸可以小于细分区域尺寸,所以为其计算高度梯度的点的数量可以大于(甚至显著大于)能够拟合到网格6100的细分区域的数量。例如,对于步幅尺寸1,3D点云中的任何特定点可以具有相关联的高度梯度,每个梯度是基于细分区域尺寸的单元格6101的网格确定的,其中该特定点是细分区域尺寸的单元格6101之一的中心。对于步幅尺寸2,每隔一个点将具有相关联的高度梯度。With respect to the grid 6100 of FIG. 6B , the above discussion represents the calculation of the height gradient at the center point of each cell 6101 in the grid 6100 . Because the stride size can be smaller than the subdivision region size, the number of points for which height gradients are calculated can be larger (even significantly larger) than the number of subdivision regions that can be fitted to the mesh 6100 . For example, for a stride size of 1, any particular point in a 3D point cloud can have associated height gradients, each gradient determined based on a grid of cells 6101 of the subdivision region size where that particular point is the subdivision The center of one of the cells 6101 of the area size. For a stride size of 2, every other point will have an associated height gradient.

因此,高度梯度成本图6200可以包括一系列值,这些值表示3D点云中的点(在一些实施例,所有点)相对于3D点云中的相邻点的高度梯度。如上面所讨论的,高度梯度成本图6200中的点可以是3D点云图像信息5700中被一步幅隔开的那些点。对于高度梯度成本图6200中的指派有值的每个点,该值是基于平面6201(其2D投影具有细分区域尺寸)以及该平面与相邻平面6201的关系而计算的。Accordingly, height gradient cost map 6200 may include a series of values representing the height gradient of a point (in some embodiments, all points) in the 3D point cloud relative to neighboring points in the 3D point cloud. As discussed above, the points in height gradient cost map 6200 may be those points in 3D point cloud image information 5700 separated by a step. For each point in the height gradient cost map 6200 that is assigned a value, the value is calculated based on the plane 6201 (whose 2D projection has subdivision region dimensions) and the plane's relationship to neighboring planes 6201 .

在实施例中，可以通过重用两个平面6201之间的高度差值来简化或优化高度梯度成本图6200的计算。例如，在一些实施例中，如上面所讨论的，从第一平面6201到第二平面6201的高度差的计算会导致与第二平面6201与第一平面6201之间的高度差的计算完全相同的值。因此，可以只需要单次计算两个平面6201之间的高度差，从而允许高度梯度计算的总数将减少约50%。In an embodiment, the computation of the height gradient cost map 6200 may be simplified or optimized by reusing the height difference values between two planes 6201. For example, in some embodiments, as discussed above, the calculation of the height difference from the first plane 6201 to the second plane 6201 yields exactly the same value as the calculation of the height difference from the second plane 6201 to the first plane 6201. Therefore, the height difference between two planes 6201 may need to be calculated only once, allowing the total number of height gradient calculations to be reduced by approximately 50%.

在实施例中,可以在确定高度差时使用距离阈值参数。距离阈值参数可以是如下阈值,超出该阈值的任何高度差都被指派最大值。如果两个平面之间的高度差超过距离阈值,那么可以将该高度差设置为预定值(例如,在一些实施例中,为该距离阈值)。在计算总高度梯度时,使用距离阈值可以减少两个平面之间的大高度差的权重。在实施例中,距离阈值参数也可以被用于对指派给单元格6101的高度梯度设置阈值。在对与相邻单元格的各高度差求平均之后,如果所确定的高度梯度超过距离阈值的话,则可以应用距离阈值将所确定的高度梯度更改为预定值。In an embodiment, a distance threshold parameter may be used in determining the altitude difference. The distance threshold parameter may be a threshold beyond which any height difference is assigned a maximum value. If the height difference between two planes exceeds a distance threshold, the height difference may be set to a predetermined value (eg, in some embodiments, the distance threshold). Using a distance threshold reduces the weight of large height differences between two planes when computing the total height gradient. In an embodiment, a distance threshold parameter may also be used to threshold the height gradient assigned to the cell 6101. After averaging the height differences from neighboring cells, the distance threshold may be applied to change the determined height gradient to a predetermined value if the determined height gradient exceeds the distance threshold.
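距离阈值的应用可以如下草绘(示意性假设：将超出阈值的高度差设置为该阈值本身，这是文中提到的预定值的一种选择)。The application of the distance threshold can be sketched as follows (illustrative assumption: height differences beyond the threshold are set to the threshold itself, which is one choice of predetermined value mentioned in the text):

```python
import numpy as np

def cap_height_differences(height_diffs, distance_threshold):
    """Clamp height differences at the distance threshold so that a very
    large drop (e.g. at the edge of a stack of objects) carries no more
    weight than a drop of exactly the threshold."""
    return np.minimum(np.abs(np.asarray(height_diffs, dtype=float)),
                      distance_threshold)

capped = cap_height_differences([0.01, 0.05, 0.50], 0.10)
```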

在表面成本图生成方法6000的操作6008中，可以计算法线差异。图6G图示了在包含物体3520的源容器3510的表示上所覆盖的法线差异成本图6300。现在参考图6D，可以确定每个平面6201与其相邻平面6201的法线6203之间的差。法线差异可以被确定为一个平面6201的法线6203与相邻平面6201的法线6203的点积。因此，每个平面6201可以具有八个不同的计算出的法线差异。可以取得这些法线差异的均值并指派给与平面6201相关联的单元格6101(例如，在单元格6101的中心处的点)。以这种方式，可以生成法线差异成本图6300，其中表面成本图内的每个点被指派有法线差异，该法线差异指示以该点为中心的平面6201与相邻平面6201之间的角度差。如图6G中所示，空心圆圈图示了低法线差异的部分，实心圆圈图示了较大法线差异的部分，并且十字图示了例如由于不可靠的检测或检测到源容器3510而不能被标识为物体的部分。为了说明的目的，这些值被示为高和低，而实际上这些值可以跨一定范围的可能值。可以看出，物体3520的边界处的法线差异大于跨物体3520的中心部分的法线差异。In operation 6008 of the surface cost map generation method 6000, normal differences may be calculated. FIG. 6G illustrates a normal difference cost map 6300 overlaid on a representation of a source container 3510 containing objects 3520. Referring now to FIG. 6D, the difference between the normal 6203 of each plane 6201 and the normals 6203 of its adjacent planes 6201 can be determined. The normal difference may be determined as the dot product of the normal 6203 of one plane 6201 and the normal 6203 of an adjacent plane 6201. Thus, each plane 6201 may have eight different calculated normal differences. These normal differences can be averaged and assigned to the cell 6101 associated with the plane 6201 (e.g., to the point at the center of the cell 6101). In this way, a normal difference cost map 6300 can be generated in which each point within the surface cost map is assigned a normal difference indicating the angular difference between the plane 6201 centered at that point and the adjacent planes 6201. As shown in FIG. 6G, open circles illustrate portions of low normal difference, filled circles illustrate portions of larger normal difference, and crosses illustrate portions that cannot be identified as objects, for example due to unreliable detection or detection of the source container 3510. For purposes of illustration, these values are shown as high and low, although in practice these values may span a range of possible values. It can be seen that the normal difference at the borders of objects 3520 is greater than the normal difference across the central portions of objects 3520.
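文中将法线差异定义为相邻平面法线的点积；下面的草图采用一种常见的约定，将点积映射为成本值1−点积(平行为0，正交为1)。该映射是示意性假设，并非本申请规定的公式。The text defines the normal difference via the dot product of adjacent plane normals; the sketch below uses a common convention that maps the dot product to a cost of 1 − dot (0 for parallel normals, 1 for orthogonal ones). This mapping is an illustrative assumption, not a formula prescribed by this application.

```python
import numpy as np

def normal_difference(n1, n2):
    """Cost derived from the dot product of two unit normals:
    0.0 for parallel normals, 1.0 for orthogonal normals."""
    dot = float(np.clip(np.dot(n1, n2), -1.0, 1.0))
    return 1.0 - dot
```

与高度梯度一样，每个单元格可以取其与八个相邻平面的此类差异的均值。As with the height gradient, each cell can then take the mean of these differences against its eight neighboring planes.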

在实施例中，可以在确定法线差异时使用法线阈值参数。法线阈值参数可以是如下阈值，超出该阈值的任何法线差异都被指派最大值。如果两个平面之间的法线差异超过法线阈值，那么可以将法线差异设置为预定值(例如，在一些实施例中，为该法线阈值)。在计算平均法线差异时，使用法线阈值可以减少两个平面之间的大法线差异的权重。In an embodiment, a normal threshold parameter may be used in determining the normal differences. The normal threshold parameter may be a threshold beyond which any normal difference is assigned a maximum value. If the normal difference between two planes exceeds the normal threshold, then the normal difference may be set to a predetermined value (e.g., in some embodiments, the normal threshold). Using the normal threshold reduces the weight of large normal differences between two planes when computing the average normal difference.

在表面成本图生成方法6000的操作6010中，可以生成表面成本图。图6H图示了在包含物体3520的源容器3510的表示上所覆盖的表面成本图6400。表面成本图6400可以作为高度梯度成本图6200和法线差异成本图6300的数学组合而生成。在实施例中，计算机系统可以根据滤波操作(诸如平均滤波器或SOBEL滤波器)来组合高度差值与法线差异值。在实施例中，高度梯度成本图6200和法线差异成本图6300中的值可以被归一化并组合。在实施例中，可以将加权因子应用于高度差值或者法线差异值以控制表面成本图对相应差值的依赖性的强烈程度。加权因子可以是法线权重因子，例如，乘以经归一化的法线差异以确定最终表面成本图6400应当由法线差异确定的强度或最终表面成本图6400应当由高度差确定的强度的因子。如下面所讨论的，可以根据预期的物体类型执行法线权重因子的选择。如图6H中所示，空心圆圈图示了低表面成本图值的部分，实心圆圈图示了较大表面成本图值的部分，并且十字图示了例如由于不可靠的检测或检测到源容器3510而不能被标识为物体的部分。为了说明的目的，值被示为高和低，而实际上这些值可以跨一定范围的可能值。可以看出，物体3520的边界处的表面成本图值大于跨物体3520的中心部分的表面成本图值。In operation 6010 of the surface cost map generation method 6000, the surface cost map may be generated. FIG. 6H illustrates a surface cost map 6400 overlaid on a representation of a source container 3510 containing objects 3520. The surface cost map 6400 may be generated as a mathematical combination of the height gradient cost map 6200 and the normal difference cost map 6300. In an embodiment, the computer system may combine the height difference values and the normal difference values according to a filtering operation, such as an averaging filter or a SOBEL filter. In an embodiment, the values in the height gradient cost map 6200 and the normal difference cost map 6300 may be normalized and combined. In an embodiment, a weighting factor may be applied to the height difference values or the normal difference values to control how strongly the surface cost map depends on the corresponding difference values. The weighting factor may be a normal weight factor, e.g., a factor multiplied by the normalized normal differences to determine how strongly the final surface cost map 6400 should be determined by the normal differences versus by the height differences. As discussed below, the selection of the normal weight factor may be performed according to the expected object type. As shown in FIG. 6H, open circles illustrate portions of low surface cost map values, filled circles illustrate portions of larger surface cost map values, and crosses illustrate portions that cannot be identified as objects, for example due to unreliable detection or detection of the source container 3510. For purposes of illustration, the values are shown as high and low, although in practice these values may span a range of possible values. It can be seen that the surface cost map values at the borders of objects 3520 are greater than the surface cost map values across the central portions of objects 3520.
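归一化并加权组合两个成本图的步骤可以如下草绘(示意性假设：采用最小-最大归一化和线性混合，这只是文中提到的组合方式的一种可行方案，并非平均滤波器或SOBEL滤波器变体)。The normalize-and-blend step for the two cost maps can be sketched as follows (illustrative assumptions: min-max normalization and a linear blend, one plausible realization of the combination described in the text, not the averaging-filter or SOBEL-filter variants):

```python
import numpy as np

def combine_cost_maps(height_grad, normal_diff, normal_weight=0.5):
    """Normalize both maps to [0, 1] and blend them; a larger
    `normal_weight` makes the surface cost map depend more strongly on
    the normal differences, a smaller one on the height differences."""
    def normalize(m):
        m = np.asarray(m, dtype=float)
        span = m.max() - m.min()
        return (m - m.min()) / span if span > 0 else np.zeros_like(m)
    return ((1.0 - normal_weight) * normalize(height_grad)
            + normal_weight * normalize(normal_diff))
```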

如上面所讨论的,可以基于一个或多个参数(包括细分区域尺寸、步幅尺寸、距离阈值、法线阈值和法线权重)来执行表面成本图生成。As discussed above, surface costmap generation may be performed based on one or more parameters including subdivision region size, stride size, distance threshold, normal threshold, and normal weight.

可以根据各种因素来选择或确定细分区域尺寸和步幅尺寸以实现各种结果。在实施例中，可以选择较小的细分区域尺寸以提供对3D点云的细小改变更为敏感的结果，但是较小的细分区域尺寸也会对噪声更为敏感。在实施例中，可以选择较大的细分区域尺寸以对3D点云中的较小变化进行平滑，无论这些变化是由于噪声还是由于被成像的实际物体的变化引起的。在实施例中，可以选择小步幅尺寸来提供高分辨率、详细的表面成本图，虽然这种小步幅尺寸可能要求增加的计算能力和/或增加的处理时间。在实施例中，较大的步幅尺寸会导致3D点云的降采样，这可以提供更快的结果和/或较低的计算资源使用，但要付出一些细节作为代价。在实施例中，选择小于细分区域尺寸的0.5、小于其0.4和/或小于其0.3的步幅尺寸可以提供适当量的细节，同时仍然提供较快的结果。在实施例中，步幅尺寸为细分区域尺寸的一半或大约一半可以在降低分辨率、速度和细节水平之间提供平衡。可以理解，选择细分区域和步幅尺寸会受到处理或计算能力的可用性的影响。增加的计算资源可以允许生成更详细的表面成本图，而不会不利地增加处理时间。The subdivision region size and stride size may be selected or determined based on various factors to achieve various results. In an embodiment, smaller subdivision region sizes may be chosen to provide results that are more sensitive to small changes in the 3D point cloud, but smaller subdivision region sizes are also more sensitive to noise. In an embodiment, a larger subdivision region size may be chosen to smooth out small changes in the 3D point cloud, whether these changes are due to noise or changes in the actual object being imaged. In an embodiment, a small stride size may be selected to provide a high resolution, detailed surface cost map, although such a small stride size may require increased computing power and/or increased processing time. In an embodiment, a larger stride size results in downsampling of the 3D point cloud, which may provide faster results and/or lower computational resource usage at the expense of some detail. In an embodiment, selecting a stride size that is less than 0.5, less than 0.4, and/or less than 0.3 of the subdivision region size may provide an appropriate amount of detail while still providing faster results. In an embodiment, a stride size that is at or about half the subdivision region size may provide a balance between reduced resolution, speed, and level of detail. It will be appreciated that the choice of subdivision region and stride size will be influenced by the availability of processing or computing power. Increased computing resources may allow more detailed surface cost maps to be generated without adversely increasing processing time.

在实施例中,物体源中物体的组成会影响细分区域尺寸和步幅尺寸的最优值。例如,具有小而尖锐的不连续性的物体的集合可以受益于较小的步幅尺寸以便捕获更精细的细节。在另一个示例中,具有粗糙但可变形的物体的集合可以受益于较大的细分区域尺寸以提供更大的平滑。在另一个示例中,如果与物体尺寸相比,细分区域尺寸较大(例如,物体尺寸仅为细分区域尺寸的2、3或4倍),那么表面成本图可能包括很少的平滑区域,因为覆盖物体的细分区域中有许多细分区域也将与存在不连续性的物体的边缘重叠。在另一个示例中,如果细分区域尺寸和步幅尺寸太大,那么具有小半径的平滑弯曲表面的物体会导致不正确的高成本。In an embodiment, the composition of the objects in the object source affects the optimal values of the subdivision region size and the stride size. For example, collections of objects with small, sharp discontinuities can benefit from smaller stride sizes in order to capture finer details. In another example, collections with rough but deformable objects can benefit from larger subdivision region sizes to provide greater smoothing. In another example, if the subdivision region size is large compared to the object size (e.g., the object size is only 2, 3 or 4 times the subdivision region size), then the surface costmap may include few smooth regions , since many of the subdivisions covering the object will also overlap the edges of the object where the discontinuity exists. In another example, objects with smooth curved surfaces with small radii can result in incorrectly high costs if the subdivision region size and stride size are too large.

在实施例中，物体源中物体的组成也会影响用于距离阈值、法线阈值和法线权重因子的最优值。例如，现在参考图6I，可以考虑像盒子的物体6500和像袋子的物体6501(它们是物体3520的示例)。物体6500/6501的中心部分具有描述物体6500/6501的整体或主体平滑度的平滑度特性，而物体6500/6501的边缘描述物体6500/6501之间的过渡。因此，选择可以利用这一点的参数是有利的。In an embodiment, the composition of the objects in the object source also affects the optimal values for the distance threshold, normal threshold, and normal weight factor. For example, referring now to FIG. 6I, consider a box-like object 6500 and a bag-like object 6501 (which are examples of objects 3520). The central portions of the objects 6500/6501 have smoothness properties that describe the overall or body smoothness of the objects 6500/6501, while the edges of the objects 6500/6501 describe the transitions between the objects 6500/6501. Therefore, it is advantageous to select parameters that can take advantage of this.

例如,可以根据物体尺寸来选择距离阈值。任何检测到的等于或大于距离阈值的高度差都会被设置为高度差的最大值。因此,物体6500/6501的边缘处的高度差可能对表面成本图6400具有相同的影响,而不管物体是在几个物体的堆叠的顶部还是只有一个物体。物体6500/6501的边缘处的更大高度下降(例如,因为物体6500/6501堆叠在其他物体6500/6501上)没有提供任何用于标识物体过渡的附加信息。For example, the distance threshold may be chosen according to object size. Any detected altitude difference equal to or greater than the distance threshold will be set to the maximum altitude difference. Thus, height differences at the edges of objects 6500/6501 may have the same impact on surface cost map 6400 regardless of whether the object is on top of a stack of several objects or just one object. The greater drop in height at the edges of objects 6500/6501 (eg, because objects 6500/6501 are stacked on top of other objects 6500/6501) does not provide any additional information for identifying object transitions.

在另一个示例中,可以根据物体形状来选择法线阈值。例如,对于像盒子的物体6500,预期法线将具有低变化。在这种情况下,法线阈值可以被选择为大于由噪声引起的预期变化的值。因此,在法线差异成本图中,被标识为大于由噪声差异引起的任何法线差异的任何法线差异都被设置为最大值。在像盒子的物体6500中,因为所有物体表面都很可能是平面,所以可以被标识为真实(因为它超过噪声值)的法线的任何变化都可以表示物体6500之间的不连续性。对于此类物体6500,还可以选择法线权重因子以便对法线差异和高度差提供大致相等的权重。在另一个示例中,像袋子的物体(诸如物体6501)可以具有角度显著改变的部分,而不表示物体不连续性。在这种情况下,可以选择法线权重因子以对高度差提供更大的权重,因为法线的差异提供较少的关于物体不连续性的信息。在还有另一个示例中,可变形的袋子可被预期在法线上有大的变化,于是法线权重因子可以被选择为对高度差提供更大的权重,因为法线的差异提供非常少的关于物体不连续性的信息。In another example, the normal threshold can be chosen based on object shape. For example, for a box-like object 6500, it is expected that the normals will have low variance. In this case, the normal threshold can be chosen to be a value larger than the expected variation due to noise. Therefore, in the normal difference costmap, any normal difference that is identified as larger than any normal difference caused by the noise difference is set to the maximum value. In box-like objects 6500, since all object surfaces are likely to be planar, any change in the normal that can be identified as real (because it exceeds the noise value) can indicate a discontinuity between objects 6500. For such objects 6500, the normal weighting factor may also be chosen so as to give roughly equal weight to normal differences and height differences. In another example, a bag-like object, such as object 6501, may have portions where the angle changes significantly, rather than representing an object discontinuity. In this case, the normal weighting factor can be chosen to give greater weight to height differences, since differences in normals provide less information about object discontinuities. In yet another example, deformable bags can be expected to have large variations in normals, so the normal weighting factor can be chosen to give greater weight to height differences, since differences in normals provide very little information about the discontinuity of the object.

如上面所讨论的,根据源容器中的物体类型和物体尺寸,不同参数可以在表面成本图生成中提供更好或更糟的结果。在实施例中,可以手动选择表面成本图生成参数,例如根据源容器中物体的预期类型和尺寸。在进一步的实施例中,参数选择可以被自动化,并且可以基于例如所获得的2D图像信息2600和/或所获得的3D图像信息5700来执行。如上面所讨论的,可以在所获得的2D图像信息2600和/或所获得的3D图像信息5700上执行物体检测(包括例如物体注册),以标识源容器中物体的尺寸、形状和/或类型。根据物体检测(例如,物体注册),可以自动地选择表面成本图生成参数,包括细分区域尺寸、步幅尺寸、距离阈值、法线阈值和法线权重因子。As discussed above, different parameters can give better or worse results in surface costmap generation depending on the object type and object size in the source container. In an embodiment, surface costmap generation parameters may be manually selected, for example based on the expected type and size of objects in the source container. In a further embodiment, parameter selection may be automated and may be performed based on, for example, the obtained 2D image information 2600 and/or the obtained 3D image information 5700 . As discussed above, object detection (including, for example, object registration) may be performed on the obtained 2D image information 2600 and/or the obtained 3D image information 5700 to identify the size, shape and/or type of objects in the source container . Depending on object detection (eg, object registration), surface costmap generation parameters can be automatically selected, including subdivision region size, stride size, distance threshold, normal threshold, and normal weighting factor.

在包括具有多种不同类型的物体的源容器的实施例中,距离阈值、法线阈值和法线权重因子可以针对与不同类型的注册物体相关联的区域在表面成本图内进行调整。In embodiments including source containers with multiple different types of objects, distance thresholds, normal thresholds, and normal weighting factors may be adjusted within the surface cost map for regions associated with different types of registered objects.

Returning now to FIG. 4, at operation 4006, the method 4000 includes segmentation of the image information (e.g., the 2D image information 2600 and/or the 3D image information 5700). Segmentation may be performed according to the surface cost map 6400 generated by the methods described above or by any other suitable method. Segmenting the image information may provide a plurality of image segments that use the values of the surface cost map 6400 to identify individual objects in the scene. An image segmentation process according to embodiments is described with respect to FIGS. 7A-7E.

In operation 7002 of the image segmentation method 7000, an initial segmentation of the surface cost map may be performed by applying a cost threshold. Applying the cost threshold generates threshold boundaries 7102 between object portions 7101 in a thresholded mask 7100, as shown in FIG. 7B. The threshold boundaries 7102 represent regions whose surface cost map values exceed the threshold, while the object portions 7101 represent regions whose surface cost map values do not exceed the threshold. Accordingly, the threshold boundaries 7102 may be represented by "false" values in the thresholded mask 7100, while the object portions 7101 are represented by "true" values. The assignment of "false" and "true" values is merely a convention, and any suitable way of distinguishing the two may be applied. The object portions 7101 represent a first estimate of the object surfaces, while the threshold boundaries 7102 represent a first estimate of the object boundaries or discontinuities. The object boundaries 7103 represent the actual object boundaries and are provided for comparison purposes.
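The thresholding step of operation 7002 can be sketched as follows. This is an illustrative sketch only, not the patent's implementation; the cost values and the threshold of 0.5 are hypothetical.

```python
import numpy as np

def threshold_cost_map(cost_map, cost_threshold):
    """Split a surface cost map into object portions ("true") and
    threshold boundaries ("false"), returned as a boolean mask."""
    # Points at or below the threshold are smooth enough to belong to an
    # object surface; points above it are treated as boundary candidates.
    return cost_map <= cost_threshold

# Illustrative 2D cost map: two smooth regions separated by a high-cost seam.
cost_map = np.array([
    [0.1, 0.2, 0.9, 0.1, 0.2],
    [0.2, 0.1, 0.8, 0.2, 0.1],
    [0.1, 0.2, 0.9, 0.1, 0.3],
])
mask = threshold_cost_map(cost_map, cost_threshold=0.5)
print(mask.sum())  # count of "true" (object-portion) cells
```

The high-cost middle column becomes "false" and acts as the estimated boundary between the two object portions.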

In operation 7004 of the image segmentation method 7000, the thresholded mask 7100 may be further defined in a mask definition operation. The mask definition operation may include one or more of connected component analysis and mask erosion, as explained with respect to FIG. 7C. The thresholded mask 7100 may be further defined to generate a defined mask 7200.

Generating the defined mask 7200 may include performing mask erosion on the thresholded mask 7100. Mask erosion is an operation that reduces, or erodes, the boundaries of a mask according to a structuring element. The structuring element may represent, for example, an N×N set of pixels or points with an output pixel/point, which may be located at the center of the structuring element. When the structuring element is placed on the mask, the output point of the structuring element in the eroded mask is set to true only if every point in the mask that coincides with a point of the structuring element is true. Thus, for a point in the eroded mask to be true, every surrounding point in the original mask out to the size of the structuring element must also be true. Erosion therefore has the effect of removing one or more layers of points at the edges of the mask and smoothing any irregularities in the mask. In an example, mask erosion may be performed on the thresholded mask 7100 using a structuring element that is half the minimum pickable region size (e.g., the smallest region size that can be grasped by the robotic arm, which may be, for example, the size a suction gripper requires to achieve a firm grip). This erosion operation can therefore be used to disconnect any portion of the mask that is smaller than the minimum pickable region size.
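A minimal erosion sketch with a square structuring element is shown below. The mask geometry and the 3×3 element size are hypothetical; a production system would more likely use a library routine such as `scipy.ndimage.binary_erosion`.

```python
import numpy as np

def erode(mask, k):
    """Erode a boolean mask with a k-by-k square structuring element.
    An output point is true only if every point under the element is true.
    Points whose neighborhood extends past the border are left false."""
    out = np.zeros_like(mask)
    r = k // 2
    h, w = mask.shape
    for i in range(r, h - r):
        for j in range(r, w - r):
            out[i, j] = mask[i - r:i + r + 1, j - r:j + r + 1].all()
    return out

# A mask with a thin bridge: erosion with a 3x3 element removes
# any portion narrower than the element, disconnecting it.
mask = np.zeros((5, 7), dtype=bool)
mask[1:4, 1:4] = True      # 3x3 block: only its center survives
mask[2, 4] = True          # 1-wide bridge: eliminated
mask[1:4, 5:7] = True      # 2-wide block: too narrow to survive
eroded = erode(mask, 3)
print(eroded.sum())
```

This illustrates how portions of the mask smaller than the structuring element are disconnected or removed outright.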

In operation 7006 of the image segmentation method 7000, object regions may be identified within the defined cost mask 7200. Still referring to FIG. 7C, connected component analysis may be performed on the defined cost mask 7200 to identify object regions 7201 lying within the defined cost mask 7200. The object regions 7201 may represent a finer estimate of object locations and boundaries than the previously discussed object portions 7101.
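A basic 4-connected component analysis can be sketched as a flood fill over the mask. The example mask is hypothetical; in practice a routine such as `scipy.ndimage.label` would typically be used.

```python
import numpy as np
from collections import deque

def label_components(mask):
    """4-connected component labeling on a boolean mask.
    Returns an integer label map (0 = background) and the component count."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                      # already assigned to a component
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:                      # breadth-first flood fill
            i, j = queue.popleft()
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if (0 <= ni < mask.shape[0] and 0 <= nj < mask.shape[1]
                        and mask[ni, nj] and not labels[ni, nj]):
                    labels[ni, nj] = current
                    queue.append((ni, nj))
    return labels, current

mask = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
], dtype=bool)
labels, count = label_components(mask)
print(count)
```

Each connected group of "true" cells becomes one candidate object region.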

In operation 7008 of the image segmentation method 7000, an image segment 7301 may be selected from among the object regions 7201 and further defined. Referring now to FIGS. 7C and 7D, the image segment 7301 may be selected as the object region 7201 that has the seed 7204 located within it. The seed 7204 may be the point in the surface cost map with the lowest cost (e.g., the smoothest point, which is least likely to represent a boundary or discontinuity). A segment map 7300 (FIG. 7D) containing the image segment 7301 may be generated by removing all object regions 7201 that do not include the seed. The image segment 7301 may then be dilated with a structuring element corresponding to half the minimum pickable region size. Dilation is the opposite operation to erosion. During dilation, the output pixel/point of the structuring element becomes the input point. When the structuring element is overlaid on the segment map 7300, if the point on the segment map 7300 corresponding to the input point of the structuring element is true, then all points in the segment map 7300 corresponding to the structuring element are set to true. Dilation has the effect of extending the boundary of the image segment 7301 by an amount corresponding to the size of the structuring element.
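Seed selection and dilation can be sketched together as follows. The cost map, label map, and 3×3 element are hypothetical; the seed is taken as the lowest-cost labeled point, matching the description above.

```python
import numpy as np

def select_seeded_region(labels, cost_map):
    """Keep only the labeled region containing the seed, i.e. the
    lowest-cost (smoothest) labeled point of the surface cost map."""
    masked = np.where(labels > 0, cost_map, np.inf)   # ignore background
    seed = np.unravel_index(np.argmin(masked), cost_map.shape)
    return labels == labels[seed]

def dilate(mask, k):
    """Dilate with a k-by-k square element: every true input point sets
    the element's whole footprint to true in the output."""
    out = np.zeros_like(mask)
    r = k // 2
    h, w = mask.shape
    for i, j in zip(*np.nonzero(mask)):
        out[max(0, i - r):min(h, i + r + 1), max(0, j - r):min(w, j + r + 1)] = True
    return out

cost_map = np.array([
    [0.2, 0.3, 0.9],
    [0.3, 0.05, 0.9],   # 0.05 is the smoothest point -> the seed
    [0.9, 0.9, 0.2],
])
labels = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 2],
])
segment = select_seeded_region(labels, cost_map)  # region 2 is discarded
grown = dilate(segment, 3)
print(segment.sum(), grown.sum())
```

Dilation grows the seeded segment back outward by the element's radius, roughly reversing the earlier erosion at the segment boundary.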

In operation 7010 of the image segmentation method 7000, the image segment 7301 may be validated. Validation of the image segment 7301 may be performed to determine whether the identified image segment 7301 represents a feasible object of the plurality of objects. A bounding box 7305 (e.g., a square or rectangular box) may be fitted around the identified image segment 7301. The bounding box 7305 may then be compared to a maximum candidate object size and a minimum candidate object size, which represent the largest and smallest possible object sizes determined during the object detection process. If the bounding box is larger than the maximum candidate object size or smaller than the minimum candidate object size, the image segment 7301 may be determined to be invalid, requiring an iteration of operations 7002, 7004, 7006, and 7008. If the bounding box is larger than the maximum candidate object size, the iteration may be performed with a reduced cost threshold. If the bounding box is smaller than the minimum candidate object size, the iteration may be performed with an increased cost threshold.
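The validation decision above can be summarized in a small routine. The function name, the (width, height) size tuples, and the return strings are all illustrative, not from the patent.

```python
def validate_segment_bbox(bbox_w, bbox_h, min_size, max_size):
    """Compare a fitted bounding box (width, height) against the candidate
    object size range. Returns 'valid', or which retry to perform."""
    if bbox_w > max_size[0] or bbox_h > max_size[1]:
        # Segment likely spans more than one object: re-segment with a
        # lower cost threshold so more boundaries split it apart.
        return "retry_lower_threshold"
    if bbox_w < min_size[0] or bbox_h < min_size[1]:
        # Segment is an implausibly small fragment: re-segment with a
        # higher cost threshold so surface pieces merge.
        return "retry_higher_threshold"
    return "valid"

result_ok = validate_segment_bbox(12, 8, (5, 5), (20, 20))
result_big = validate_segment_bbox(25, 8, (5, 5), (20, 20))
result_small = validate_segment_bbox(3, 3, (5, 5), (20, 20))
print(result_ok, result_big, result_small)
```

The two retry outcomes correspond to the reduced-threshold and increased-threshold iterations of operations 7002-7008 described above.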

In embodiments, the bounding box may also be compared to a desired minimum pickable region size. The minimum pickable region size may correspond to the smallest possible region size that can be picked, for example the size of a single suction gripper of the robotic arm. In embodiments, the robotic arm may employ more than one suction gripper, for example two or four. The desired minimum pickable region size may be a parameter corresponding to the size of the region necessary to achieve a selected or desired grip (e.g., the region necessary for two or four suction grippers to achieve a grip). If the bounding box is smaller than the desired minimum pickable region size, operations 7002, 7004, 7006, and 7008 may be iterated with an increased threshold.

After the image segment 7301 has been validated, it may be stored as a pickable region for further analysis. The image segment 7301 may then be removed from the surface cost map 6400, and operations 7002-7010 may be repeated to identify additional image segments 7301. In embodiments, the cost threshold may be increased before operations 7002-7010 are repeated. The method 7000 may be repeated, and the cost threshold increased, until no further segments are detected or identified. FIG. 7E illustrates the set of image segments 7301 identified from the surface cost map 6400. In embodiments, the identified image segments 7301 may be designated as pickable regions. In embodiments, the identified image segments 7301 may be further analyzed to determine pickable regions within them.

At operation 4008, the method 4000 includes generating a detection mask. The detection mask may be generated to refine or further define the potentially pickable regions of the objects corresponding to the image segments 7301 determined in the image segmentation operation 4006.

For example, as shown in FIG. 8A, because the bounding box of operation 7010 is a two-dimensional construct, it may not correspond exactly to the actual heights of the points on the object. In FIG. 8A, a bounding box 8021 has been fitted to an object 8022. Owing to the deformable nature of the object 8022, however, the actual points 8023 on the surface of the object 8022 do not all fall within the bounding box 8021. Accordingly, in operation 4008, detection mask information may be generated to identify the portions of the object within the bounding box that are more or less suitable for object picking.

FIG. 8B illustrates detection mask information 8300. The detection mask information 8300 may include information about the object within the bounding box 8021 (e.g., the bounding box generated for the image segment 7301 during operation 7010). The detection mask information 8300 includes identified regions 8024 and 8027 and an unidentified region 8026. The identified regions 8024 and 8027 may include a detected region 8024 (a region that is detected and unoccluded) and an occluded region 8027. The occluded region 8027 may be unsafe or unusable for object picking, while the detected region 8024 may be safe for picking. The unidentified region 8026 may include regions identified neither as occluded nor as pickable, which are generally not used or relied upon for detection. A minimum pickable region 8025 is also illustrated in FIG. 8B. As shown, the detected region 8024 labeled "B" is not large enough to accommodate the minimum pickable region 8025. The detection mask information 8300 may thus be used in conjunction with the image segmentation techniques described above to identify the pickable regions of objects.
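The check of whether a detected, unoccluded region can accommodate the minimum pickable region can be sketched as a sliding-window test. The region shapes and the 3×3 footprint are hypothetical stand-ins for the gripper's real footprint.

```python
import numpy as np

def fits_min_pickable(region_mask, k):
    """True if a k-by-k pickable footprint fits entirely inside the
    detected (unoccluded) region at some position."""
    h, w = region_mask.shape
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            if region_mask[i:i + k, j:j + k].all():
                return True
    return False

region_a = np.ones((4, 6), dtype=bool)   # large detected region
region_b = np.ones((2, 6), dtype=bool)   # strip narrower than the footprint
fits_a = fits_min_pickable(region_a, 3)
fits_b = fits_min_pickable(region_b, 3)
print(fits_a, fits_b)
```

Like the region labeled "B" in FIG. 8B, `region_b` is detected but too narrow to host the minimum pickable region, so it would be rejected as a pick target.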

At operation 4010, the method 4000 may include determining a safety volume for use in a motion planning operation. The safety volume may represent the volume that an object selected for picking may occupy. The safety volume is selected to reduce the likelihood that the selected object, once picked, will collide with other objects in the object handling environment.

Referring now to FIG. 9A, a safety volume 9100 is provided around a pickable region 9201 designated as the pickable region of an object 3520. The safety volume may be determined to have a size twice the difference between the pickable region 9201 designated for picking and the expected object size. This safety volume size therefore creates a volume around the pickable region 9201 that can provide a margin of error for the potential dimensions of the object, for example if the pickable region 9201 is not centered on the object 3520 to be picked. The size of the safety volume 9100 may then be modified as follows.

First, the safety volume 9100 is compared to the 3D point cloud. If the 3D point cloud does not support the size of the safety volume 9100 (e.g., the safety volume 9100 is too large and would extend beyond the boundaries of the 3D point cloud, which correspond to the boundaries of the source container 3510), the size of the safety volume 9100 may be reduced to a size supported by the 3D point cloud. The safety volume 9100 may then be aligned to the edges of the 3D point cloud. FIG. 9B illustrates the safety volume 9100 being reduced to a safety volume 9101 because the boundaries of the safety volume 9100 would extend beyond the 3D point cloud associated with the source container 3511.
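The reduction of the safety volume to the extent supported by the point cloud is, in essence, a clip of one axis-aligned box against another. The sketch below shows this for a 2D footprint; the coordinates are illustrative and the box representation is an assumption of this example.

```python
def clip_box(box, bounds):
    """Clip an axis-aligned box (min_x, min_y, max_x, max_y) to the
    bounds of the point cloud's footprint (same format)."""
    return (max(box[0], bounds[0]), max(box[1], bounds[1]),
            min(box[2], bounds[2]), min(box[3], bounds[3]))

# The safety volume extends past the container footprint on the right
# edge, so it is reduced to the supported extent and thereby aligned
# with the point cloud boundary.
safe = (2.0, 1.0, 11.0, 5.0)
container = (0.0, 0.0, 10.0, 6.0)
clipped = clip_box(safe, container)
print(clipped)
```

Only the offending edge moves; edges already inside the point cloud bounds are left where they are.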

If the safety volume 9100/9101 is larger than the maximum allowable size specified by the destination container, the safety volume 9100/9101 may be reduced further. For example, if the destination container is smaller than the source container, the safety volume 9100/9101 may be too large for the destination container. The safety volume 9100/9101 may therefore be reduced or adjusted accordingly. In embodiments, where the safety volume 9100/9101 is larger than the destination container and cannot be adjusted to a size smaller than the destination container, a motion plan that accounts for this uncertainty may be generated if the object 3520 is known to fit into the destination container.

If the detection bounding box of operation 7010 extends beyond the safety volume 9100/9101, the safety volume 9100/9101 may be adjusted further. This may happen, for example, because the safety volume was shrunk or realigned as described above, or because the bounding box is arranged inconveniently relative to the pickable region 9201 that forms the basis of the safety volume 9100/9101. In embodiments, to address this, the safety volume 9100/9101 may be moved to include the bounding box, or the bounding box may be moved and aligned to the safety volume 9100/9101.

At operation 4012, the method 4000 includes outputting pickable region detection results. The pickable region detection results may include any or all of the information generated in operations 4002-4010, including, for example, the identified image segments 7301, their associated bounding boxes 7305, the identified pickable regions 9201, and the safety volumes 9100/9101. The pickable region detection results may include pickable region detection result information for any or all of the detected objects 3520 within the source container 3510.

At operation 4014, the method 4000 may include generating and/or outputting a motion plan according to the pickable region detection results. The motion plan may include robot instructions for following a trajectory, grasping or picking the object 3520 via the identified pickable region 9201, and transferring the object 3520 to the destination container, while accounting for potential collisions based on the determined safety volume 9100/9101 of the object 3520.

It will be apparent to those of ordinary skill in the relevant art that other suitable modifications and adaptations may be made to the methods and applications described herein without departing from the scope of any of the embodiments. The embodiments described above are illustrative examples, and the present disclosure should not be construed as limited to these particular embodiments. It should be understood that the various embodiments disclosed herein may be combined in combinations other than those specifically presented in the description and the accompanying drawings. It should also be understood that, depending on the example, certain acts or events of any of the processes or methods described herein may be performed in a different order, may be added, merged, or omitted entirely (e.g., not all described acts or events may be necessary to carry out the method or process). In addition, while certain features of the embodiments herein are described, for purposes of clarity, as being performed by a single component, module, or unit, it should be understood that the features and functions described herein may be performed by any combination of components, units, or modules. Accordingly, various changes and modifications may be made by those skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

Further embodiments may include:

Embodiment 1 is a computing system comprising: a control system configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera; and at least one processing circuit configured to perform the following when the robot is in an object handling environment that includes a source of objects for transfer to a destination within the object handling environment: obtaining image information of the objects; identifying pickable regions of one or more selected objects among the objects by: generating a surface cost map from the image information, segmenting the surface cost map to obtain one or more image segments, the one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and generating a pickable region detection result including at least the one or more pickable regions; and generating, for a robotic system, a motion plan for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.

Embodiment 2 is the system of embodiment 1, wherein the surface cost map represents a smoothness of the one or more selected objects.

Embodiment 3 is the system of embodiment 1 or 2, wherein the image information includes three-dimensional information, and the processing circuit is further configured to generate the surface cost map according to height gradients and normal differences between defined cells in the image information.

Embodiment 4 is the system of any of embodiments 1 to 3, wherein the at least one processing circuit is further configured to generate the surface cost map according to surface cost map parameters.

Embodiment 5 is the system of any of embodiments 1 to 4, wherein the at least one processing circuit is further configured to: register the one or more objects based on the image information to create object registration information; and determine the surface cost map parameters according to the object registration information.

Embodiment 6 is the system of any of embodiments 1 to 5, wherein the at least one processing circuit is further configured to generate detection mask information indicating the one or more pickable regions of the image segments, the detection mask information including detected regions and occluded regions within the one or more image segments.

Embodiment 7 is the system of any of embodiments 1 to 6, wherein segmenting the surface cost map includes: applying a cost threshold to the surface cost map to generate a thresholded mask; eroding the thresholded mask to generate an eroded mask; and applying connected component analysis to the eroded mask to identify a first image segment.

Embodiment 8 is the system of any of embodiments 1 to 7, wherein segmenting the surface cost map further includes: removing the first image segment from the surface cost map; applying a second cost threshold to the remainder of the surface cost map to generate a second thresholded mask; eroding the second thresholded mask to generate a second eroded mask; and applying connected component analysis to the second eroded mask to identify a second image segment.

Embodiment 9 is the system of any of embodiments 1 to 8, wherein generating the pickable region detection result further includes: generating a safety volume surrounding the one or more pickable regions, the safety volume indicating an estimated remainder of the one or more selected objects.

Embodiment 10 is an object transfer method performed by a control system having at least one processing circuit and configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera, the method comprising: obtaining image information of one or more objects contained in an object source; identifying pickable regions of one or more selected objects among the objects by: generating a surface cost map from the image information, segmenting the surface cost map to obtain one or more image segments, the one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and generating a pickable region detection result including at least the one or more pickable regions; and generating, for a robotic system, a motion plan for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.

Embodiment 11 is the method of embodiment 10, wherein the surface cost map represents a smoothness of the one or more selected objects.

Embodiment 12 is the method of embodiment 10 or 11, wherein the image information includes three-dimensional information, the method further comprising generating the surface cost map according to height gradients and normal differences between defined cells in the image information.

Embodiment 13 is the method of any of embodiments 10 to 12, further comprising generating the surface cost map according to surface cost map parameters.

Embodiment 14 is the method of any of embodiments 10 to 13, further comprising: registering the one or more objects based on the image information to create object registration information; and determining the surface cost map parameters according to the object registration information.

Embodiment 15 is the method of any of embodiments 10 to 14, further comprising generating detection mask information indicating the one or more pickable regions of the image segments, the detection mask information including detected regions and occluded regions within the one or more image segments.

Embodiment 16 is the method of any of embodiments 10 to 15, wherein segmenting the surface cost map includes: applying a cost threshold to the surface cost map to generate a thresholded mask; eroding the thresholded mask to generate an eroded mask; and applying connected component analysis to the eroded mask to identify a first image segment.

Embodiment 17 is the method of any of embodiments 10 to 16, wherein segmenting the surface cost map further includes: removing the first image segment from the surface cost map; applying a second cost threshold to the remainder of the surface cost map to generate a second thresholded mask; eroding the second thresholded mask to generate a second eroded mask; and applying connected component analysis to the second eroded mask to identify a second image segment.

Embodiment 18 is the method of any of embodiments 10 to 17, wherein generating the pickable region detection result further includes: generating a safety volume surrounding the one or more pickable regions, the safety volume indicating an estimated remainder of the one or more selected objects.

Embodiment 19 is a non-transitory computer-readable medium configured with executable instructions for object transfer performed by a control system having at least one processing circuit and configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera, the instructions being configured for: obtaining image information of one or more objects contained in an object source; identifying pickable regions of one or more selected objects among the objects by: generating a surface cost map from the image information, segmenting the surface cost map to obtain one or more image segments, the one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and generating a pickable region detection result including at least the one or more pickable regions; and generating, for a robotic system, a motion plan for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.

Embodiment 20 is the non-transitory computer-readable medium of embodiment 19, wherein the image information includes three-dimensional information, and the instructions are further configured to generate the surface cost map according to height gradients and normal differences between defined cells in the image information.

Claims (20)

1. A computing system comprising:
a control system configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera; and
at least one processing circuit configured to perform the following when the robot is in an object handling environment that includes a source of objects for transfer to a destination within the object handling environment:
obtaining image information of the objects;
identifying pickable regions of one or more selected objects among the objects by:
generating a surface cost map from the image information,
segmenting the surface cost map to obtain one or more image segments, the one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and
generating a pickable region detection result including at least the one or more pickable regions; and
generating, for a robotic system, a motion plan for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.

2. The system of claim 1, wherein the surface cost map represents a smoothness of the one or more selected objects.

3. The system of claim 1, wherein the image information includes three-dimensional information, and the processing circuit is further configured to generate the surface cost map according to height gradients and normal differences between defined cells in the image information.

4. The system of claim 3, wherein the at least one processing circuit is further configured to generate the surface cost map according to surface cost map parameters.

5. The system of claim 4, wherein the at least one processing circuit is further configured to:
register the one or more objects based on the image information to create object registration information; and
determine the surface cost map parameters according to the object registration information.

6. The system of claim 1, wherein the at least one processing circuit is further configured to generate detection mask information indicating the one or more pickable regions of the image segments, the detection mask information including detected regions and occluded regions within the one or more image segments.

7. The system of claim 1, wherein segmenting the surface cost map includes:
applying a cost threshold to the surface cost map to generate a thresholded mask;
eroding the thresholded mask to generate an eroded mask; and
applying connected component analysis to the eroded mask to identify a first image segment.

8. The system of claim 7, wherein segmenting the surface cost map further includes:
removing the first image segment from the surface cost map;
applying a second cost threshold to the remainder of the surface cost map to generate a second thresholded mask;
eroding the second thresholded mask to generate a second eroded mask; and
applying connected component analysis to the second eroded mask to identify a second image segment.

9. The system of claim 1, wherein generating the pickable region detection result further includes:
generating a safety volume surrounding the one or more pickable regions, the safety volume indicating an estimated remainder of the one or more selected objects.

10. An object transfer method performed by a control system having at least one processing circuit and configured to communicate with a robot having a robotic arm that includes or is attached to an end effector apparatus, and to communicate with a camera, the method comprising:
obtaining image information of one or more objects contained in an object source;
identifying pickable regions of one or more selected objects among the objects by:
generating a surface cost map from the image information,
segmenting the surface cost map to obtain one or more image segments, the one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects, and
generating a pickable region detection result including at least the one or more pickable regions; and
generating, for a robotic system, a motion plan for transferring the one or more selected objects, the motion plan being based on the pickable region detection result.

11. The method of claim 10, wherein the surface cost map represents a smoothness of the one or more selected objects.

12. The method of claim 10, wherein the image information includes three-dimensional information, the method further comprising generating the surface cost map according to height gradients and normal differences between defined cells in the image information.

13. The method of claim 12, further comprising generating the surface cost map according to surface cost map parameters.

14. The method of claim 13, further comprising:
registering the one or more objects based on the image information to create object registration information; and
determining the surface cost map parameters according to the object registration information.

15. The method of claim 10, further comprising generating detection mask information indicating the one or more pickable regions of the image segments, the detection mask information including detected regions and occluded regions within the one or more image segments.

16. The method of claim 10, wherein segmenting the surface cost map includes:
applying a cost threshold to the surface cost map to generate a thresholded mask;
eroding the thresholded mask to generate an eroded mask; and
applying connected component analysis to the eroded mask to identify a first image segment.

17. The method of claim 16, wherein segmenting the surface cost map further includes:
removing the first image segment from the surface cost map;
applying a second cost threshold to the remainder of the surface cost map to generate a second thresholded mask;
eroding the second thresholded mask to generate a second eroded mask; and
applying connected component analysis to the second eroded mask to identify a second image segment.

18. The method of claim 10, wherein generating the pickable region detection result further includes:
The method of claim 10, wherein generating the pickable region detection result further comprises: 生成围绕所述一个或多个可拾取区域的安全体积,所述安全体积指示所述一个或多个选定物体的所估计的剩余部分。A safe volume is generated around the one or more pickable regions, the safe volume indicating an estimated remainder of the one or more selected objects. 19.一种非暂时性计算机可读介质,配置有用于物体传送的可执行指令,所述物体传送由控制系统执行,所述控制系统具有至少一个处理电路并被配置为与具有机械臂的机器人进行通信并与相机进行通信,所述机械臂包括或附接到末端执行器装置,所述指令被配置用于:19. A non-transitory computer readable medium configured with executable instructions for object transfer performed by a control system having at least one processing circuit and configured to communicate with a robot having a robotic arm In communication with and with a camera, the robotic arm includes or is attached to an end effector device, the instructions being configured to: 获得物体源中包含的一个或多个物体的图像信息;obtaining image information of one or more objects contained in the object source; 通过以下操作来标识所述物体中的选定物体中的一个或多个选定物体的可拾取区域:Identifies the pickable area of one or more of the selected ones of said objects by: 根据所述图像信息生成表面成本图,generating a surface cost map from said image information, 分割所述表面成本图以获得一个或多个图像片段,所述一个或多个图像片段标识与所述一个或多个选定物体对应的一个或多个可拾取区域;以及segmenting the surface cost map to obtain one or more image segments identifying one or more pickable regions corresponding to the one or more selected objects; and 生成至少包括所述一个或多个可拾取区域的可拾取区域检测结果;以及generating a pickable area detection result including at least the one or more pickable areas; and 为机器人系统生成用于传送所述一个或多个选定物体的运动规划,所述运动规划基于所述可拾取区域检测结果。A motion plan for conveying the one or more selected objects is generated for a robotic system, the motion plan based on the pickable area detection results. 20.如权利要求19所述的非暂时性计算机可读介质,其中,所述图像信息包括三维信息,所述指令进一步被配置用于根据所述图像信息中的经定义的单元格之间的高度梯度和法线差异来生成所述表面成本图。20. 
The non-transitory computer-readable medium of claim 19, wherein the image information includes three-dimensional information, and the instructions are further configured to height gradient and normal difference to generate the surface costmap.
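The claims describe generating a surface cost map from height gradients and normal differences between cells of 3-D image information (claim 3), then segmenting it by thresholding, erosion, and connected component analysis, repeated after removing each found segment (claims 7 and 8). A minimal sketch of that pipeline, assuming the image information is a 2-D depth map held in a NumPy array; the function names, weights, and thresholds are illustrative choices, not taken from the patent:

```python
import numpy as np
from scipy import ndimage


def surface_cost_map(depth, grad_weight=1.0, normal_weight=1.0):
    """Per-cell cost from height gradients and surface-normal differences.

    Smooth, flat cells get low cost; steps and curved regions get high cost.
    """
    # Height gradient between neighboring cells (axis 0 = rows, axis 1 = cols).
    gy, gx = np.gradient(depth)
    grad_mag = np.hypot(gx, gy)

    # Approximate per-cell unit surface normals from the depth gradients.
    normals = np.dstack([-gx, -gy, np.ones_like(depth)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)

    # Normal difference: 1 minus the dot product with the local mean normal.
    mean_n = np.dstack([ndimage.uniform_filter(normals[..., i], size=3)
                        for i in range(3)])
    mean_n /= np.linalg.norm(mean_n, axis=2, keepdims=True)
    normal_diff = 1.0 - np.sum(normals * mean_n, axis=2)

    return grad_weight * grad_mag + normal_weight * normal_diff


def segment_pickable_regions(cost, thresholds=(0.1, 0.3), erode_iter=2):
    """Threshold, erode, and connected-component-label the cost map.

    Each pass removes the segments it finds before the next, looser
    threshold is applied, mirroring the first/second segment iteration.
    Returns a list of boolean masks, one per candidate pickable region.
    """
    remaining = np.ones(cost.shape, dtype=bool)
    segments = []
    for thr in thresholds:
        mask = (cost < thr) & remaining                              # thresholded mask
        mask = ndimage.binary_erosion(mask, iterations=erode_iter)   # eroded mask
        labels, n = ndimage.label(mask)                              # connected components
        for i in range(1, n + 1):
            seg = labels == i
            segments.append(seg)
            remaining &= ~seg                                        # remove found segment
    return segments
```

A low cost here corresponds to the "smoothness" of claim 2, so the lowest-threshold pass yields the flattest candidate suction regions first; how the surrounding safety volume of claim 9 is estimated is not covered by this sketch.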
CN202310325209.4A (priority 2022-03-08, filed 2023-03-08): Systems and methods for robotic systems with object handling capabilities, Pending, published as CN116551672A

Applications Claiming Priority (3)

- US202263317877P (priority 2022-03-08, filed 2022-03-08)
- US63/317,877 (priority 2022-03-08)
- CN202310238393.9A (priority 2022-03-08, filed 2023-03-08): System and method for a robotic system with object handling capability

Related Parent Applications (1)

- CN202310238393.9A (priority 2022-03-08, filed 2023-03-08): System and method for a robotic system with object handling capability (this application is a division of that parent)

Publications (1)

- CN116551672A (published 2023-08-08)

Family

ID=87516440

Family Applications (1)

- CN202310325209.4A (priority 2022-03-08, filed 2023-03-08): Systems and methods for robotic systems with object handling capabilities, Pending

Country Status (1)

- CN: CN116551672A

Similar Documents

Publication Publication Date Title
US11176674B2 (en) Robotic system with automated object detection mechanism and methods of operating the same
JP6749034B1 (en) Post-detection refinement based on edges and multidimensional corners
JP5429614B2 (en) Box-shaped workpiece recognition apparatus and method
US10124489B2 (en) Locating, separating, and picking boxes with a sensor-guided robot
JP7433609B2 (en) Method and computational system for object identification
CN110580725A (en) A kind of box sorting method and system based on RGB-D camera
JP7408107B2 (en) Systems and methods for robotic systems with object handling
JP7398662B2 (en) Robot multi-sided gripper assembly and its operating method
US11911919B2 (en) Method and computing system for performing grip region detection
WO2023092519A1 (en) Grabbing control method and apparatus, and electronic device and storage medium
US20230041378A1 (en) Systems and methods for object detection
US20230071488A1 (en) Robotic system with overlap processing mechanism and methods for operating the same
US12136223B2 (en) Robotic system for object size detection
JP2022181174A (en) Object bin picking with rotation compensation
JP7264247B2 (en) Information processing device and information processing method
CN116551672A (en) Systems and methods for robotic systems with object handling capabilities
CN113459101B (en) Method and computing system for performing gripping area detection
JP2023021160A (en) Method and computing systems for performing object detection
CN116175542A (en) Grabbing control method, grabbing control device, electronic equipment and storage medium
CN119238490A (en) Robotic system and method with object detection
CN116197887A (en) Image data processing method, device, electronic equipment and storage medium
CN116061192A (en) Systems and methods for robotic systems with object handling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination