CN115867905A - Augmented reality based speech translation in travel situations - Google Patents

Augmented reality based speech translation in travel situations

Info

Publication number
CN115867905A
CN115867905A (application CN202180046641.9A)
Authority
CN
China
Prior art keywords
augmented reality
reality content
translation
user
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180046641.9A
Other languages
Chinese (zh)
Inventor
弗吉尼亚·德拉蒙德
伊尔泰里什·卡恩·詹贝尔克
吉恩·罗
阿列克·马西森
希丽亚·妮科尔·穆尔库扬尼斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Snap Inc
Original Assignee
Snap Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Snap Inc filed Critical Snap Inc
Publication of CN115867905A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/14 Digital output to display device; Cooperation and interconnection of the display device with other functional units
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Graphics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Aspects of the present disclosure relate to a system including a computer-readable storage medium storing a program and method for providing augmented reality content in association with travel. The program and method provide for: receiving, by a messaging application, a request to perform a scan operation in association with an image captured by a device camera; determining a travel parameter associated with the request and an attribute of an object depicted in the image; selecting, based on at least one of the travel parameter or the attribute, an augmented reality content item configured to present augmented reality content based on speech input; receiving the speech input; obtaining at least one of a transcription or a translation of the speech input; and presenting, in association with the image, the augmented reality content item including the transcription or the translation.

Figure 202180046641

Description

Augmented reality based speech translation in travel situations

Priority Claim

This application claims the benefit of priority to U.S. Application Serial No. 17/225,563, filed April 8, 2021, which claims the benefit of priority to U.S. Provisional Application Serial No. 63/046,114, filed June 30, 2020, which is hereby incorporated by reference in its entirety.

Technical Field

The present disclosure relates generally to messaging systems, including providing augmented reality content with captured images.

Background

Messaging systems provide for the exchange of message content between users. For example, a messaging system allows a user to exchange message content (e.g., text, images) with one or more other users.

Brief Description of the Drawings

In the drawings, which are not necessarily drawn to scale, like reference numerals may describe similar components in different views. To easily identify the discussion of any particular element or act, the most significant digit(s) in a reference number refer to the figure number in which that element is first introduced. Some embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings.

FIG. 1 is a diagrammatic representation of a networked environment in which the present disclosure may be deployed, in accordance with some example embodiments.

FIG. 2 is a diagrammatic representation of a messaging system, in accordance with some example embodiments, that has both client-side and server-side functionality.

FIG. 3 is a diagrammatic representation of a data structure as maintained in a database, in accordance with some example embodiments.

FIG. 4 is a diagrammatic representation of a message, in accordance with some example embodiments.

FIG. 5 is an interaction diagram illustrating a process for providing augmented reality content corresponding to speech translation in association with travel, in accordance with some example embodiments.

FIG. 6A illustrates an example user interface in which the source of speech input for augmented reality-based speech translation corresponds to the device user, in accordance with some example embodiments.

FIG. 6B illustrates an example user interface for providing augmented reality-based speech translation for speech input provided by the device user, in accordance with some example embodiments.

FIG. 7A illustrates an example user interface in which the source of speech input for augmented reality-based speech translation corresponds to an individual other than the device user, in accordance with some example embodiments.

FIG. 7B illustrates an example user interface for providing augmented reality-based speech translation for speech input provided by an individual other than the device user, in accordance with some example embodiments.

FIG. 8A illustrates an example user interface for providing augmented reality-based speech translation for speech input provided by a television, in accordance with some example embodiments.

FIG. 8B illustrates another example user interface for providing augmented reality-based speech translation for speech input provided by a television, in accordance with some example embodiments.

FIG. 9 is a flowchart illustrating a process for providing augmented reality content corresponding to speech translation in association with travel, in accordance with some example embodiments.

FIG. 10 is a flowchart for an access-limiting process, in accordance with some example embodiments.

FIG. 11 is a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed to cause the machine to perform any one or more of the methodologies discussed herein, in accordance with some example embodiments.

FIG. 12 is a block diagram showing a software architecture within which examples may be implemented, in accordance with some example embodiments.

Detailed Description

A messaging system typically allows users to exchange content items (e.g., messages, images, and/or videos) with each other in a message thread. A messaging system may implement or otherwise work in conjunction with an augmentation system to augment media content associated with messages. For example, the augmentation system may combine overlays, filters, and/or augmented reality content with image data captured by a device camera. However, a user may wish for facilitated creation and/or selection of augmented reality content while traveling.

The disclosed embodiments provide for presenting augmented reality content corresponding to speech translation in association with travel. In response to a user request to perform a scan operation, the messaging client determines travel parameters associated with the request and attributes of an object depicted in the captured image.

The messaging client selects an augmented reality content item (e.g., corresponding to an augmented reality experience) based on the travel parameters and/or the attributes. The augmented reality content item is configured to present augmented reality content based on speech input associated with the request. The messaging client obtains a transcription and/or a translation of the speech input, and presents the augmented reality content item, including at least one of the transcription or the translation, in association with the image.
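
To make the selection step concrete, the following is a minimal, illustrative sketch (not part of the disclosure) of how an augmented reality content item could be ranked against travel parameters and object attributes; the ARContentItem fields and the select_ar_item helper are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set

@dataclass
class ARContentItem:
    item_id: str
    # Keywords this AR experience is registered against, e.g. languages or landmarks.
    keywords: Set[str] = field(default_factory=set)
    accepts_speech_input: bool = True

def select_ar_item(items: Iterable[ARContentItem],
                   travel_params: Iterable[str],
                   object_attributes: Iterable[str]) -> Optional[ARContentItem]:
    """Pick the AR content item whose keywords best overlap the scan context."""
    context = set(travel_params) | set(object_attributes)
    candidates = [it for it in items if it.accepts_speech_input]
    if not candidates:
        return None
    # Rank by keyword overlap with the scan context; ties broken by id for determinism.
    return max(candidates, key=lambda it: (len(it.keywords & context), it.item_id))
```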

FIG. 1 is a block diagram showing an example messaging system 100 for exchanging data (e.g., messages and associated content) over a network. The messaging system 100 includes multiple instances of a client device 102, each of which hosts a number of applications, including a messaging client 104. Each messaging client 104 is communicatively coupled to other instances of the messaging client 104 and to a messaging server system 108 via a network 106 (e.g., the Internet).

A messaging client 104 is able to communicate and exchange data with another messaging client 104 and with the messaging server system 108 via the network 106. The data exchanged between messaging clients 104, and between a messaging client 104 and the messaging server system 108, includes functions (e.g., commands to activate functions) as well as payload data (e.g., text, audio, video, or other multimedia data).

The messaging server system 108 provides server-side functionality via the network 106 to particular messaging clients 104. While certain functions of the messaging system 100 are described herein as being performed by either a messaging client 104 or by the messaging server system 108, the location of certain functionality within either the messaging client 104 or the messaging server system 108 may be a design choice. For example, it may be technically preferable to initially deploy certain technology and functionality within the messaging server system 108, but to later migrate this technology and functionality to the messaging client 104 where a client device 102 has sufficient processing capacity.

The messaging server system 108 supports various services and operations that are provided to the messaging client 104. Such operations include transmitting data to, receiving data from, and processing data generated by the messaging client 104. This data may include message content, client device information, geolocation information, media augmentations and overlays, message content persistence conditions, social network information, and live event information, as examples. Data exchanges within the messaging system 100 are invoked and controlled through functions available via user interfaces (UIs) of the messaging client 104.

Turning now specifically to the messaging server system 108, an Application Program Interface (API) server 110 is coupled to, and provides a programmatic interface to, an application server 114. The application server 114 is communicatively coupled to a database server 122, which facilitates access to a database 124 that stores data associated with messages processed by the application server 114. Similarly, a web server 112 is coupled to the application server 114 and provides web-based interfaces to the application server 114. To this end, the web server 112 processes incoming network requests over the Hypertext Transfer Protocol (HTTP) and several other related protocols.

The Application Program Interface (API) server 110 receives and transmits message data (e.g., commands and message payloads) between the client device 102 and the application server 114. Specifically, the Application Program Interface (API) server 110 provides a set of interfaces (e.g., routines and protocols) that can be called or queried by the messaging client 104 in order to invoke functionality of the application server 114. The Application Program Interface (API) server 110 exposes various functions supported by the application server 114, including: account registration; login functionality; the sending of messages, via the application server 114, from a particular messaging client 104 to another messaging client 104; the sending of media files (e.g., images or video) from a messaging client 104 to a messaging server 116, for possible access by another messaging client 104; the setting of collections of media data (e.g., stories); the retrieval of a list of friends of a user of a client device 102; the retrieval of such collections; the retrieval of messages and content; the adding and deleting of entities (e.g., friends) to and from an entity graph (e.g., a social graph); the locating of friends within the social graph; and opening application events (e.g., relating to the messaging client 104).

The application server 114 hosts a number of server applications and subsystems, including, for example, a messaging server 116, an image processing server 118, and a social network server 120. The messaging server 116 implements a number of message processing technologies and functions, particularly related to the aggregation and other processing of content (e.g., textual and multimedia content) included in messages received from multiple instances of the messaging client 104. As will be described in further detail, text and media content from multiple sources may be aggregated into collections of content (e.g., called stories or galleries). These collections are then made available to the messaging client 104. Other processor- and memory-intensive processing of data may also be performed server-side by the messaging server 116, in view of the hardware requirements for such processing.

The application server 114 also includes an image processing server 118, which is dedicated to performing various image processing operations, typically with respect to images or video within the payload of a message sent from or received at the messaging server 116.

The social network server 120 supports various social networking functions and services and makes these functions and services available to the messaging server 116. To this end, the social network server 120 maintains and accesses an entity graph 304 (shown in FIG. 3) within the database 124. Examples of functions and services supported by the social network server 120 include the identification of other users of the messaging system 100 with whom a particular user has relationships or whom the particular user is "following," as well as the identification of interests and other entities of a particular user.

FIG. 2 is a block diagram illustrating further details regarding the messaging system 100, according to some examples. Specifically, the messaging system 100 is shown to comprise the messaging client 104 and the application server 114. The messaging system 100 includes a number of subsystems, which are supported on the client side by the messaging client 104 and on the server side by the application server 114. These subsystems include, for example, an ephemeral timer system 202, a collection management system 204, an augmentation system 208, a map system 210, an object detection system 212, and/or a transcription and translation system 214.

The ephemeral timer system 202 is responsible for enforcing temporary or time-limited access to content by the messaging client 104 and the messaging server 116. The ephemeral timer system 202 incorporates a number of timers that, based on duration and display parameters associated with a message or collection of messages (e.g., a story), selectively enable access (e.g., for presentation and display) to messages and associated content via the messaging client 104. Further details regarding the operation of the ephemeral timer system 202 are provided below.
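
As an illustration of the kind of duration check such a timer might perform, the sketch below assumes a simple sent_at/display_seconds pair per message; the field names are hypothetical and not taken from the disclosure.

```python
from datetime import datetime, timedelta
from typing import Optional

def message_is_viewable(sent_at: datetime, display_seconds: int,
                        now: Optional[datetime] = None) -> bool:
    """Return True while the message is still inside its display window."""
    now = now or datetime.utcnow()
    return now < sent_at + timedelta(seconds=display_seconds)
```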

The collection management system 204 is responsible for managing sets or collections of media (e.g., collections of text, image, video, and audio data). A collection of content (e.g., messages, including images, video, text, and audio) may be organized into an "event gallery" or an "event story." Such a collection may be made available for a specified time period, such as the duration of an event to which the content relates. For example, content relating to a music concert may be made available as a "story" for the duration of that concert. The collection management system 204 may also be responsible for publishing an icon that provides notification of the existence of a particular collection to the user interface of the messaging client 104.

The collection management system 204 furthermore includes a curation interface 206 that allows a collection manager to manage and curate a particular collection of content. For example, the curation interface 206 enables an event organizer to curate a collection of content relating to a specific event (e.g., to delete inappropriate content or redundant messages). Additionally, the collection management system 204 employs machine vision (or image recognition technology) and content rules to automatically curate a content collection. In certain examples, compensation may be paid to a user for the inclusion of user-generated content in a collection. In such cases, the collection management system 204 operates to automatically make payments to such users for the use of their content.

The augmentation system 208 provides various functions that enable a user to augment (e.g., annotate or otherwise modify or edit) media content associated with a message. For example, the augmentation system 208 provides functions related to the generation and publishing of media overlays for messages processed by the messaging system 100. The augmentation system 208 operatively supplies a media overlay or augmentation (e.g., an image filter) to the messaging client 104 based on a geolocation of the client device 102. In another example, the augmentation system 208 operatively supplies a media overlay to the messaging client 104 based on other information, such as social network information of the user of the client device 102. A media overlay may include audio and visual content and visual effects. Examples of audio and visual content include pictures, text, logos, animations, and sound effects. An example of a visual effect includes color overlaying. The audio and visual content or the visual effects can be applied to a media content item (e.g., a photo) at the client device 102. For example, a media overlay may include text or an image that can be overlaid on top of a photograph taken by the client device 102. In another example, the media overlay includes an identification of a location (e.g., Venice Beach), a name of a live event, or a name of a merchant (e.g., Beach Coffee House). In another example, the augmentation system 208 uses the geolocation of the client device 102 to identify a media overlay that includes the name of a merchant at the geolocation of the client device 102. The media overlay may include other indicia associated with the merchant. The media overlays may be stored in the database 124 and accessed through the database server 122.

In some examples, the augmentation system 208 provides a user-based publication platform that enables users to select a geolocation on a map and upload content associated with the selected geolocation. The user can also specify circumstances under which a particular media overlay should be offered to other users. The augmentation system 208 generates a media overlay that includes the uploaded content and associates the uploaded content with the selected geolocation.

In other examples, the augmentation system 208 provides a merchant-based publication platform that enables merchants to select a particular media overlay associated with a geolocation via a bidding process. For example, the augmentation system 208 associates the media overlay of the highest-bidding merchant with a corresponding geolocation for a predefined amount of time.

The map system 210 provides various geographic location functions and supports the presentation of map-based media content and messages by the messaging client 104. For example, the map system 210 enables the display of user icons or avatars (e.g., stored in association with profile data 302, described below) on a map to indicate the current or past locations of "friends" of a user, as well as media content (e.g., collections of messages including photographs and videos) generated by such friends, within the context of the map. For example, a message posted by a user to the messaging system 100 from a specific geographic location may be displayed to "friends" of that specific user, on a map interface of the messaging client 104, within the context of the map at that particular location. A user can furthermore share his or her location and status information with other users of the messaging system 100 via the messaging client 104 (e.g., using an appropriate status avatar as described herein), with this location and status information being similarly displayed to selected users within the context of the map interface of the messaging client 104.

The object detection system 212 provides various object detection functions within the context of the messaging system 100. The object detection system 212 may employ one or more object classifiers to identify objects depicted in a captured image. The image may correspond to a live video feed captured by a camera of the client device 102 (e.g., a rear-facing or front-facing camera). Alternatively or in addition, the image may correspond to an image (e.g., a photo) stored in association with the user of the client device 102 (e.g., in a photo library).

In one or more embodiments, the object detection system 212 is configured to implement or otherwise access object recognition algorithms (e.g., including machine learning algorithms) configured to scan a captured image and to detect/track the movement of objects within the image. By way of non-limiting example, detectable objects within an image include: a human face, parts of a human body, animals and parts thereof, landscapes, objects in nature, inanimate objects (e.g., buildings, store fronts, food, articles of clothing, chairs, books, cars, buildings, other structures), illustrations of objects (e.g., on posters and/or flyers), text-based objects, equation-based objects, and the like.

In addition, the object detection system 212 is configured to determine or otherwise access attributes of objects. For a particular object, the object detection system 212 may determine or retrieve attributes such as a name/type, genre, color, size, shape, texture, environmental factors (e.g., geolocation, time, weather), and/or other supplemental information (e.g., a song title/artist for an object corresponding to media).

With respect to environmental factors, the object detection system 212 may receive information from the messaging client 104 identifying the weather, geographic location, time, and so forth around the client device 102 (e.g., via device sensors). The object detection system 212 may rank the retrieved attributes based on relevance, for example based on their association with one or more environmental factors. Other machine learning techniques may be employed to select and rank the retrieved attributes. The object detection system 212 may select, from the list of objects detected in the captured image, the object associated with the highest-ranked attributes, and may send an indication of the selected object to the messaging client 104. Alternatively or in addition, the object detection system 212 may provide for communicating one or more attributes (e.g., name/type) of each detected object, and/or an indication of the ranking of the attributes, to the messaging client 104.

In one or more embodiments, the object detection system 212 determines that one of the attributes corresponds to a keyword that has been sponsored by a third party. For example, third parties may sponsor or pay for certain keywords to be ranked higher than others. In response to determining that a given attribute corresponds to a sponsored keyword, the object detection system 212 may provide a higher ranking for that attribute relative to other attributes.
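
A minimal sketch of how attribute ranking with an environmental-relevance bonus and a sponsored-keyword boost could be implemented is shown below; the scoring weights and function name are illustrative assumptions, not the system's actual logic.

```python
from typing import Dict, List, Set

def rank_attributes(attributes: Dict[str, float],
                    environment: Set[str],
                    sponsored_keywords: Set[str],
                    sponsor_boost: float = 10.0) -> List[str]:
    """Rank object attributes by relevance.

    `attributes` maps attribute name -> base relevance score;
    `environment` is a set of contextual keywords (weather, time of day, location);
    sponsored keywords receive an additional boost, as described above.
    """
    def score(item):
        name, base = item
        s = base
        if name in environment:
            s += 1.0               # attribute matches an environmental factor
        if name in sponsored_keywords:
            s += sponsor_boost     # third-party sponsored keyword
        return s

    return [name for name, _ in sorted(attributes.items(), key=score, reverse=True)]
```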

The transcription and translation system 214 is configured to provide various functions related to speech recognition, transcription, and/or translation. In one or more embodiments, the transcription and translation system 214 is configured to receive speech input (e.g., via a device microphone). The transcription and translation system 214 may apply automatic speech recognition (ASR) algorithms or other known techniques to the speech input, and provide machine-encoded text as output. In one or more embodiments, the transcription and translation system 214 provides the machine-encoded text in the form of a transcription, for example corresponding to the entirety of the speech input. Alternatively or in addition, the transcription and translation system 214 may provide the machine-encoded text in the form of one or more text-based keywords derived from the speech input.

The transcription and translation system 214 further provides various functions related to language translation. In one or more embodiments, the transcription and translation system 214 is configured to implement or otherwise access algorithms and/or techniques for translating text input from a first language to a second language. For example, in response to a request for translation, the transcription and translation system 214 may receive speech input, generate machine-encoded text corresponding to the speech input (e.g., corresponding to a transcription), and translate the machine-encoded text from the first language to the second language (e.g., corresponding to a translation). The transcription and translation system 214 is further configured to provide the translated text as output. Alternatively or in addition, the transcription and translation system 214 is configured to implement or otherwise access algorithms and/or techniques for converting the translated text into audio output. Thus, the transcription and translation system 214 is configured to receive speech as input, and to output one or more of keywords, a transcription, and/or a translation for display and/or audio output.
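
The stages described above (speech input, transcription, keyword extraction, optional translation) might be wired together roughly as follows; asr and translator stand in for whichever ASR and translation back ends the system actually uses, and their call signatures here are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class SpeechResult:
    transcription: str            # machine-encoded text of the full speech input
    keywords: List[str]           # keywords derived from the transcription
    translation: Optional[str]    # text translated into the target language, if requested

def process_speech(audio_bytes: bytes,
                   source_lang: str,
                   target_lang: Optional[str],
                   asr: Callable[..., str],
                   translator: Callable[..., str]) -> SpeechResult:
    """Run the transcription/translation stages described above (simplified)."""
    text = asr(audio_bytes, language=source_lang)
    # Crude keyword extraction as a placeholder for whatever the system really does.
    keywords = [w for w in text.split() if len(w) > 3]
    translated = translator(text, source_lang, target_lang) if target_lang else None
    return SpeechResult(transcription=text, keywords=keywords, translation=translated)
```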

FIG. 3 is a schematic diagram illustrating data structures 300 that may be stored in the database 124 of the messaging server system 108, according to certain examples. While the content of the database 124 is shown to comprise a number of tables, it will be appreciated that the data could be stored in other types of data structures (e.g., as an object-oriented database).

The database 124 includes message data stored within a message table 306. For any particular message, this message data includes at least message sender data, message recipient (or receiver) data, and a payload. Further details regarding information that may be included in a message, and included within the message data stored in the message table 306, are described below with reference to FIG. 4.
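
For illustration only, a row of the message table could be modeled along the lines of the sketch below; the field names are assumed for this example, and the actual message components are described with reference to FIG. 4.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MessageRecord:
    """Minimal shape of one message-table row; field names are illustrative."""
    message_id: str
    sender_id: str
    recipient_ids: List[str]
    payload: bytes    # text, image, video, or other multimedia content
```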

An entity table 308 stores entity data, and is linked (e.g., referentially) to an entity graph 304 and profile data 302. Entities for which records are maintained within the entity table 308 may include individuals, corporate entities, organizations, objects, places, events, and so forth. Regardless of entity type, any entity regarding which the messaging server system 108 stores data may be a recognized entity. Each entity is provided with a unique identifier, as well as an entity type identifier (not shown).

The entity graph 304 stores information regarding relationships and associations between entities. Such relationships may be interest-based or activity-based professional relationships (e.g., working at a common company or organization), or social relationships, merely for example.

The profile data 302 stores multiple types of profile data about a particular entity. The profile data 302 may be selectively used and presented to other users of the messaging system 100 based on privacy settings specified by the particular entity. Where the entity is an individual, the profile data 302 includes, for example, a user name, telephone number, address, and settings (e.g., notification and privacy settings), as well as a user-selected avatar representation (or collection of such avatar representations), if any. A particular user may then selectively include one or more of these avatar representations within the content of messages communicated via the messaging system 100, and on map interfaces displayed by the messaging client 104 to other users.

Where the entity is a group, the profile data 302 for the group may similarly include one or more avatar representations associated with the group, in addition to the group name, members, and various settings (e.g., notifications) for the relevant group.

The database 124 also includes a travel parameters table 318 for storing respective travel parameters of users. While the travel parameters table 318 is depicted as separate from the profile data 302, the travel parameters table 318 may be included as part of the profile data 302. Thus, each entity/user may have respective travel parameters associated therewith. Examples of travel parameters include, but are not limited to: travel schedules, transportation schedules, languages, general locations, specific places or landmarks, activities, participants (e.g., friends participating in all or part of a trip), and/or topics of interest.

The messaging system 100 may populate the travel parameters table 318 based on user-submitted content provided within the messaging client 104 (e.g., content within message threads, and/or content associated with a travel-planning user interface provided by the messaging system 100). Alternatively or in addition, the messaging system 100 may populate the database 124 based on content from third-party applications (e.g., content from third-party email/text messaging applications, calendar applications, flight applications, hotel applications, and the like). In one or more embodiments, the user may opt in to and/or otherwise authorize populating the travel parameters table 318 with content from within the messaging system 100 and/or from third-party applications.
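
A hypothetical shape for one row of the travel parameters table is sketched below; the field names mirror the examples listed above but are otherwise assumptions and not the table's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TravelParameters:
    """One row of the travel parameters table, keyed by user/entity id."""
    user_id: str
    travel_schedule: Optional[str] = None          # e.g. itinerary dates
    transportation_schedule: Optional[str] = None  # e.g. flight or train times
    language: Optional[str] = None                 # language of the destination
    general_location: Optional[str] = None
    landmarks: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    participants: List[str] = field(default_factory=list)       # friends joining the trip
    topics_of_interest: List[str] = field(default_factory=list)
    opted_in_sources: List[str] = field(default_factory=list)   # e.g. "calendar", "flight_app"
```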

The database 124 also stores augmentation data, such as overlays or filters, in an augmentation table 310. The augmentation data is associated with and applied to videos (for which data is stored in a video table 314) and images (for which data is stored in an image table 316).

Filters, in one example, are overlays that are displayed as overlaid on an image or video during presentation to a recipient user. Filters may be of various types, including user-selected filters chosen from a set of filters presented to a sending user by the messaging client 104 when the sending user is composing a message. Other types of filters include geolocation filters (also known as geo-filters), which may be presented to a sending user based on geographic location. For example, geolocation filters specific to a neighborhood or special location may be presented within a user interface by the messaging client 104, based on geolocation information determined by a Global Positioning System (GPS) unit of the client device 102.

Another type of filter is a data filter, which may be selectively presented to a sending user by the messaging client 104 based on other inputs or information gathered by the client device 102 during the message creation process. Examples of data filters include the current temperature at a specific location, the current speed at which the sending user is traveling, the battery life of the client device 102, or the current time.
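
As a sketch, the text of such a data filter could be composed from whatever device readings happen to be available; the function and formatting below are illustrative only and not the system's actual overlay logic.

```python
from datetime import datetime
from typing import Optional

def build_data_filter_text(temperature_c: Optional[float] = None,
                           speed_kmh: Optional[float] = None,
                           battery_pct: Optional[int] = None,
                           local_time: Optional[datetime] = None) -> str:
    """Compose the overlay text of a data filter from available device readings."""
    parts = []
    if temperature_c is not None:
        parts.append(f"{temperature_c:.0f} C")
    if speed_kmh is not None:
        parts.append(f"{speed_kmh:.0f} km/h")
    if battery_pct is not None:
        parts.append(f"{battery_pct}% battery")
    if local_time is not None:
        parts.append(local_time.strftime("%H:%M"))
    return " | ".join(parts)
```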

Other augmentation data that may be stored within the image table 316 includes augmented reality content items (e.g., corresponding to applying lenses or augmented reality experiences). An augmented reality content item may be a real-time special effect and sound that can be added to an image or a video.

As described above, augmentation data includes augmented reality content items, overlays, image transformations, AR images, and similar terms referring to modifications that may be applied to image data (e.g., videos or images). This includes real-time modifications, which modify an image as it is captured using device sensors (e.g., one or more cameras) of the client device 102 and then display the image on a screen of the client device 102 with the modifications. This also includes modifications to stored content, such as video clips in a gallery that may be modified. For example, in a client device 102 with access to multiple augmented reality content items, a user can use a single video clip with multiple augmented reality content items to see how the different augmented reality content items would modify the stored clip. For example, multiple augmented reality content items that apply different pseudorandom movement models can be applied to the same content by selecting different augmented reality content items for that content. Similarly, real-time video capture may be used with an illustrated modification to show how video images currently being captured by sensors of the client device 102 would modify the captured data. Such data may simply be displayed on the screen and not stored in memory, or the content captured by the device sensors may be recorded and stored in memory with or without the modifications (or both). In some systems, a preview feature can show how different augmented reality content items will look within different windows of a display at the same time. This can, for example, enable multiple windows with different pseudorandom animations to be viewed on the display simultaneously.

Data and various systems using augmented reality content items, or other such transformation systems that modify content using this data, can thus involve the detection of objects (e.g., faces, hands, bodies, cats, dogs, surfaces, objects, etc.) in video frames, the tracking of such objects as they leave, enter, and move around the field of view, and the modification or transformation of such objects as they are tracked. In various embodiments, different methods for achieving such transformations may be used. Some examples may involve generating a three-dimensional mesh model of one or more objects, and using transformations and animated textures of the model within the video to achieve the transformation. In other examples, tracking of points on an object may be used to place an image or texture (which may be two-dimensional or three-dimensional) at the tracked position. In still further examples, neural network analysis of video frames may be used to place images, models, or textures in content (e.g., images or frames of video). Augmented reality content items thus refer both to the images, models, and textures used to create transformations in content, as well as to the additional modeling and analysis information needed to achieve such transformations through object detection, tracking, and placement.

Real-time video processing can be performed with any kind of video data (e.g., video streams, video files, etc.) saved in the memory of a computerized system of any kind. For example, a user can load video files and save them in the memory of a device, or can generate a video stream using sensors of the device. Additionally, any object can be processed using a computer animation model, such as a human face and parts of a human body, animals, or non-living things such as chairs, cars, or other objects.

In some examples, when a particular modification is selected along with content to be transformed, the elements to be transformed are identified by the computing device, and then detected and tracked if they are present in the frames of the video. The elements of the object are modified according to the request for modification, thus transforming the frames of the video stream. The transformation of frames of a video stream can be performed by different methods for different kinds of transformation. For example, for frame transformations that mostly refer to changing the forms of an object's elements, characteristic points are calculated for each element of the object (e.g., using an Active Shape Model (ASM) or other known methods). Then, a mesh based on the characteristic points is generated for each of the at least one element of the object. This mesh is used in the following stage of tracking the elements of the object in the video stream. In the process of tracking, the mentioned mesh for each element is aligned with the position of each element. Then, additional points are generated on the mesh. A first set of first points is generated for each element based on the request for modification, and a set of second points is generated for each element based on the set of first points and the request for modification. The frames of the video stream can then be transformed by modifying the elements of the object on the basis of the sets of first and second points and the mesh. In such a method, the background of the modified object can be changed or distorted as well, by tracking and modifying the background.

In some examples, transformations that change some areas of an object using its elements can be performed by calculating characteristic points for each element of the object and generating a mesh based on the calculated characteristic points. Points are generated on the mesh, and then various areas based on the points are generated. The elements of the object are then tracked by aligning the area for each element with the position of each of the at least one element, and the properties of the areas can be modified based on the request for modification, thus transforming the frames of the video stream. Depending on the specific request for modification, the properties of the mentioned areas can be transformed in different ways. Such modifications may involve: changing the color of areas; removing at least some part of the areas from the frames of the video stream; including one or more new objects into the areas based on the request for modification; and modifying or distorting the elements of an area or object. In various embodiments, any combination of such modifications or other similar modifications may be used. For certain models to be animated, some characteristic points can be selected as control points to be used in determining the entire state space of options for the model animation.
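
The following greatly simplified sketch illustrates the general idea of tracking an element's points from frame to frame and applying a modification to the corresponding area; it uses a rigid shift as a stand-in for full mesh alignment, and all names are assumptions rather than the actual implementation.

```python
import numpy as np

def track_and_modify_region(frame, prev_points, detect_points, modify_region):
    """One frame of mesh-based region modification, greatly simplified.

    frame         : H x W x 3 image array
    prev_points   : (N, 2) array of characteristic points from the previous frame
    detect_points : callable returning the element's (N, 2) points in `frame`, or None
    modify_region : callable applying the requested modification to (frame, mask)
    """
    points = detect_points(frame)
    if points is None:                      # element left the field of view
        return frame, prev_points
    # Rigidly align the previous points to the new detection (stand-in for mesh alignment).
    shift = points.mean(axis=0) - prev_points.mean(axis=0)
    aligned = prev_points + shift
    # Build a sparse mask from the aligned points; a real system would rasterize mesh areas.
    mask = np.zeros(frame.shape[:2], dtype=bool)
    ys = aligned[:, 1].astype(int).clip(0, frame.shape[0] - 1)
    xs = aligned[:, 0].astype(int).clip(0, frame.shape[1] - 1)
    mask[ys, xs] = True
    return modify_region(frame, mask), points
```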

In some examples of a computer animation model that transforms image data using face detection, the face is detected in an image using a specific face detection algorithm (e.g., Viola-Jones). An Active Shape Model (ASM) algorithm is then applied to the face region of the image to detect facial feature reference points.

In other examples, other methods and algorithms suitable for face detection can be used. For example, in some embodiments, features are located using landmarks that represent distinguishable points present in most of the images under consideration. For facial landmarks, for example, the location of the left eye pupil may be used. If an initial landmark is not identifiable (e.g., if a person has an eyepatch), secondary landmarks may be used. Such a landmark identification procedure may be used for any such object. In some examples, a set of landmarks forms a shape. A shape can be represented as a vector using the coordinates of the points in the shape. One shape is aligned to another with a similarity transform (allowing translation, scaling, and rotation) that minimizes the average Euclidean distance between shape points. The mean shape is the mean of the aligned training shapes.

In some examples, the search for landmarks starts from the mean shape aligned to the position and size of the face determined by a global face detector. Such a search then repeats the following steps: suggesting a tentative shape by adjusting the locations of the shape points through template matching of the image texture around each point, and then conforming the tentative shape to a global shape model, until convergence occurs. In some systems, individual template matches are unreliable, and the shape model pools the results of the weak template matchers to form a stronger overall classifier. The entire search is repeated at each level of an image pyramid, from coarse to fine resolution.
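
The similarity-transform alignment mentioned above (translation, scale, and rotation minimizing the distance between corresponding points) can be sketched as an orthogonal Procrustes solve; this is a generic textbook formulation, not the specific implementation used by the system.

```python
import numpy as np

def align_shape(shape: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Align `shape` to `reference` with a similarity transform (translation,
    scale, rotation) that minimizes the mean squared point distance, as in the
    step of fitting the mean shape to a detected face.

    Both inputs are (N, 2) arrays of landmark coordinates.
    """
    # Remove translation.
    mu_s, mu_r = shape.mean(axis=0), reference.mean(axis=0)
    s0, r0 = shape - mu_s, reference - mu_r
    # Solve for rotation and scale via the orthogonal Procrustes problem.
    u, sigma, vt = np.linalg.svd(s0.T @ r0)
    rotation = u @ vt
    scale = sigma.sum() / (s0 ** 2).sum()
    return scale * s0 @ rotation + mu_r
```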

The transformation system can capture an image or video stream on the client device 102 and perform complex image manipulations locally on the client device 102 while maintaining a suitable user experience, computation time, and power consumption. The complex image manipulations may include size and shape changes, emotion transfers (e.g., changing a face from a frown to a smile), state transfers (e.g., aging a subject, reducing apparent age, changing gender), style transfers, graphical element application, and any other suitable image or video manipulation implemented by a convolutional neural network that has been configured to execute efficiently on the client device 102.

In some examples, a computer animation model for transforming image data can be used by a system in which a user may capture an image or video stream of the user (e.g., a selfie) using a client device 102 having a neural network operating as part of the messaging client 104 running on the client device 102. The transformation system operating within the messaging client 104 determines the presence of a face within the image or video stream and provides modification icons associated with a computer animation model for transforming the image data, or the computer animation model may be presented as associated with an interface described herein. The modification icons include changes that may serve as the basis for modifying the user's face within the image or video stream as part of the modification operation. Once a modification icon is selected, the transformation system initiates a process to convert the image of the user to reflect the selected modification icon (e.g., to generate a smiling face on the user). Once the image or video stream is captured and the specified modification is selected, the modified image or video stream may be presented in a graphical user interface displayed on the client device 102. The transformation system may implement a complex convolutional neural network on a portion of the image or video stream to generate and apply the selected modification. That is, once a modification icon is selected, the user may capture the image or video stream and be presented with the modified result in real time or near real time. Furthermore, the modification may be persistent while the video stream is being captured, and the selected modification icon remains toggled. Machine-taught neural networks may be used to enable such modifications.

The graphical user interface presenting the modification performed by the transformation system may supply the user with additional interaction options. Such options may be based on the interface used to initiate the content capture and the selection of a particular computer animation model (e.g., initiation from a content creator user interface). In various embodiments, a modification may be persistent after the initial selection of a modification icon. The user may toggle the modification on or off by tapping or otherwise selecting the face being modified by the transformation system, and may store it for later viewing or browse to other areas of the imaging application. Where multiple faces are modified by the transformation system, the user may toggle the modification on or off globally by tapping or selecting a single face that is modified and displayed within the graphical user interface. In some embodiments, individual faces among a group of multiple faces may be individually modified, or such modifications may be individually toggled, by tapping or selecting the individual face or a series of individual faces displayed within the graphical user interface.

The story table 312 stores data regarding collections of messages and associated image, video, or audio data, which are compiled into a collection (e.g., a story or a gallery). The creation of a particular collection may be initiated by a particular user (e.g., each user for whom a record is maintained in the entity table 308). A user may create a "personal story" in the form of a collection of content that has been created and sent/broadcast by that user. To this end, the user interface of the messaging client 104 may include a user-selectable icon to enable a sending user to add specific content to his or her personal story.

A collection may also constitute a "live story", which is a collection of content from multiple users that is created manually, automatically, or using a combination of manual and automatic techniques. For example, a "live story" may constitute a curated stream of user-submitted content from various locations and events. Users whose client devices have location services enabled and who are at a co-located event at a particular time may, for example, be presented with an option, via the user interface of the messaging client 104, to contribute content to a particular live story. The live story may be identified to the user by the messaging client 104 based on his or her location. The end result is a "live story" told from a community perspective.

A further type of content collection is known as a "location story", which enables users whose client devices 102 are located within a specific geographic location (e.g., on a college or university campus) to contribute to a particular collection. In some examples, a contribution to a location story may require a second degree of authentication to verify that the end user belongs to a specific organization or other entity (e.g., is a student on the university campus).

As mentioned above, the video table 314 stores video data that, in one example, is associated with messages for which records are maintained within the message table 306. Similarly, the image table 316 stores image data associated with messages for which message data is stored in the entity table 308. The entity table 308 may associate various augmentations from the augmentation table 310 with various images and videos stored in the image table 316 and the video table 314.

FIG. 4 is a schematic diagram illustrating the structure of a message 400, according to some examples, generated by a messaging client 104 for communication to a further messaging client 104 or to the messaging server 116. The content of a particular message 400 is used to populate the message table 306 stored within the database 124, which is accessible by the messaging server 116. Similarly, the content of the message 400 is stored in memory as "in-transit" or "in-flight" data of the client device 102 or the application server 114. The message 400 is shown to include the following example components:

• Message identifier 402: a unique identifier that identifies the message 400.

• Message text payload 404: text, to be generated by a user via a user interface of the client device 102, and that is included in the message 400.

• Message image payload 406: image data, captured by a camera component of the client device 102 or retrieved from a memory component of the client device 102, and that is included in the message 400. Image data for a sent or received message 400 may be stored in the image table 316.

• Message video payload 408: video data, captured by a camera component or retrieved from a memory component of the client device 102, and that is included in the message 400. Video data for a sent or received message 400 may be stored in the video table 314.

• Message audio payload 410: audio data, captured by a microphone or retrieved from a memory component of the client device 102, and that is included in the message 400.

• Message augmentation data 412: augmentation data (e.g., filters, stickers, or other annotations or enhancements) that represents augmentations to be applied to the message image payload 406, the message video payload 408, or the message audio payload 410 of the message 400. Augmentation data for a sent or received message 400 may be stored in the augmentation table 310.

• Message duration parameter 414: a parameter value indicating, in seconds, the amount of time for which the content of the message (e.g., the message image payload 406, the message video payload 408, the message audio payload 410) is to be presented or made accessible to a user via the messaging client 104.

• Message geolocation parameter 416: geolocation data (e.g., latitude and longitude coordinates) associated with the content payload of the message. Multiple message geolocation parameter 416 values may be included in the payload, with each of these parameter values being associated with a content item included in the content (e.g., a specific image within the message image payload 406, or a specific video within the message video payload 408).

• Message story identifier 418: identifier values identifying one or more content collections (e.g., "stories" identified in the story table 312) with which a particular content item in the message image payload 406 of the message 400 is associated. For example, multiple images within the message image payload 406 may each be associated with multiple content collections using identifier values.

• Message tag 420: each message 400 may be tagged with multiple tags, each of which is indicative of the subject matter of content included in the message payload. For example, where a particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value indicative of the relevant animal may be included within the message tag 420. Tag values may be generated manually, based on user input, or may be automatically generated using, for example, image recognition.

• Message sender identifier 422: an identifier (e.g., a messaging system identifier, email address, or device identifier) indicative of a user of the client device 102 on which the message 400 was generated and from which the message 400 was sent.

• Message recipient identifier 424: an identifier (e.g., a messaging system identifier, email address, or device identifier) indicative of a user of the client device 102 to which the message 400 is addressed.

The contents (e.g., values) of the various components of the message 400 may be pointers to locations in tables within which the content data values are stored. For example, an image value in the message image payload 406 may be a pointer to (or an address of) a location within the image table 316. Similarly, values within the message video payload 408 may point to data stored within the video table 314, values stored within the message augmentation data 412 may point to data stored in the augmentation table 310, values stored within the message story identifier 418 may point to data stored in the story table 312, and values stored within the message sender identifier 422 and the message recipient identifier 424 may point to user records stored within the entity table 308.
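
Purely as an illustrative sketch, and not the implementation described above, the message structure with its pointer-style components might be modeled as follows; every class and field name is a hypothetical stand-in for the numbered components (402-424) and tables (308-316) referenced in this description.

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class MessageRecord:
        """Hypothetical in-memory analogue of message 400; media fields hold
        references (row keys) into the corresponding tables rather than raw bytes."""
        message_id: str                                   # message identifier 402
        text_payload: str = ""                            # message text payload 404
        image_payload_ref: Optional[str] = None           # pointer into image table 316 (payload 406)
        video_payload_ref: Optional[str] = None           # pointer into video table 314 (payload 408)
        audio_payload_ref: Optional[str] = None           # message audio payload 410
        augmentation_refs: List[str] = field(default_factory=list)   # pointers into augmentation table 310 (data 412)
        duration_seconds: int = 10                        # message duration parameter 414
        geolocations: List[Tuple[float, float]] = field(default_factory=list)  # (lat, lon) values, parameter 416
        story_ids: List[str] = field(default_factory=list)           # story identifiers 418 -> story table 312
        tags: List[str] = field(default_factory=list)                # message tags 420
        sender_id: str = ""                               # message sender identifier 422 -> entity table 308
        recipient_id: str = ""                            # message recipient identifier 424 -> entity table 308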

FIG. 5 is an interaction diagram illustrating a process 500 for providing augmented reality content corresponding to speech translation in association with travel, according to some example implementations. For explanatory purposes, the process 500 is primarily described herein with reference to the messaging client 104 of FIG. 1, and the augmentation system 208 and the transcription and translation system 214 of FIG. 2. However, one or more blocks (or operations) of the process 500 may be performed by one or more other components, and/or by other suitable devices. Further for explanatory purposes, the blocks (or operations) of the process 500 are described herein as occurring in serial, or linearly. However, multiple blocks (or operations) of the process 500 may occur in parallel or concurrently. In addition, the blocks (or operations) of the process 500 need not be performed in the order shown, and/or one or more blocks (or operations) of the process 500 need not be performed and/or may be replaced by other operations. The process 500 may be terminated when its operations are completed. In addition, the process 500 may correspond to a method, a procedure, an algorithm, and the like.

The messaging client 104 may be associated with a respective user of the messaging server system 108, and the user may be associated with a user account of the messaging server system 108. As noted above, the messaging server system 108 may identify the user based on a unique identifier (e.g., a messaging system identifier, email address, and/or device identifier) associated with the user account for that user. In addition, the messaging server system 108 may implement, and/or work in conjunction with, a social network server 124 which is configured to identify other users (e.g., friends) with whom a particular user has relationships.

As described herein, the messaging client 104 (e.g., in conjunction with the messaging server system 108) receives a user request to perform a scan operation. In response, the messaging client 104 determines travel parameters associated with the request, together with an attribute of an object depicted in the image. The messaging client 104 selects an augmented reality content item (e.g., corresponding to an augmented reality experience) based on the travel parameters and/or the attribute. The augmented reality content item is configured to present augmented reality content based on speech input associated with the request. The messaging client 104 obtains a transcription and/or a translation of the speech input, and presents the augmented reality content item in association with the image, the augmented reality content item including at least one of the transcription or the translation.

At block 502, the messaging client 104 receives user input corresponding to performing a scan operation on a captured image (block 502). As described herein, performing the scan operation corresponds to identifying objects depicted in the captured image. In one or more implementations, the messaging client 104 activates a camera of the client device 102 (e.g., upon startup of the messaging client 104). The messaging client 104 allows the user to request to scan one or more items in the camera feed captured by the camera. In one or more implementations, the messaging client 104 detects physical contact between a finger of the user's hand and a region of the touch screen for a threshold period of time (e.g., corresponding to a press-and-hold gesture). For example, the messaging client 104 determines that the user touched and held their finger on the screen for a threshold period of time (e.g., two seconds).

In alternative implementations, the press-and-hold gesture may be performed in association with a carousel interface (e.g., separate from a startup interface, as discussed below with respect to FIGS. 7A-7B). Within the carousel interface, an augmented reality content item for modifying the captured image to include augmented reality content may already have been selected prior to receiving the user input. With respect to the press-and-hold gesture, the augmented reality content item may, in some implementations, include a scan prompt which prompts the user for input to perform the scan operation. For example, the scan prompt may include text prompting the user for the press-and-hold gesture (e.g., "press and hold to scan") within a graphical boundary which defines a predefined screen region.

As an alternative to the press-and-hold gesture, the messaging client 104 may receive user selection of a dedicated scan option (e.g., a scan button) presented together with the camera feed. Thus, in response to the user request to perform the scan operation, the messaging client 104 processes the captured image (e.g., a live video feed) to identify objects in the image. Alternatively or in addition, the messaging client 104 processes captured audio input, for example, to detect whether the user issued a voice command in association with the user input for performing the scan operation.

The messaging client 104 determines travel parameters associated with the user input and/or an attribute of an object depicted in the captured image (block 504). As noted above with respect to the travel parameters table 318, the messaging client 104, in conjunction with the messaging system 100, may be configured to access (e.g., based on appropriate user permissions) data relating to travel by the user, and to populate the travel parameters table 318 accordingly. The travel parameters table 318 may store travel parameters corresponding to user-submitted content provided within the messaging client 104 (e.g., content within message threads, and/or content associated with a travel-planning user interface provided by the messaging system 100). In another example, the travel parameters table 318 may store travel parameters corresponding to content from third-party applications (e.g., content from email/text messages, calendar applications, flight applications, hotel applications). In one or more implementations, the travel parameters indicate one or more of: a travel itinerary, a transportation schedule, a language, a general location (e.g., a city, a state, and the like), a specific place or landmark, an activity, participants (e.g., friends), and/or topics of interest.
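
As a minimal sketch only (the sources, field names, and helper below are assumptions chosen for illustration, not part of this description), travel parameters gathered from user-submitted and third-party content could be represented roughly as follows.

    # Hypothetical shape of an entry in the travel parameters table 318.
    travel_parameters = {
        "itinerary": ["2021-06-01 Athens", "2021-06-05 Santorini"],  # travel itinerary
        "transportation": ["flight AB123"],                          # transportation schedule
        "language": "el",                 # language associated with the destination (Greek)
        "general_location": "Athens, GR",
        "landmarks": ["Acropolis"],
        "activities": ["museum visit"],
        "participants": ["friend_user_id_1"],
        "topics_of_interest": ["food", "art"],
    }

    def language_for_trip(params: dict, default: str = "en") -> str:
        """Return the destination language indicated by the travel parameters,
        falling back to a default when no language has been derived."""
        return params.get("language") or default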

As shown in FIG. 5, block 504 further includes determining an attribute of an object depicted in the image. While not shown in FIG. 5, the detection of the object and/or the determination of the attribute of the object may be performed in conjunction with the object detection system 212. For example, the messaging client 104 may send a request to the object detection system 212 to identify objects in the captured image. In response, the object detection system 212 determines attributes of objects in the captured image. As noted above, the object detection system 212 may correspond to a subsystem of the messaging system 100, and may be supported on the client side by the messaging client 104 and/or on the server side by the application server 122. In one or more implementations, the detection of objects within the captured image may be implemented client side, server side, and/or by a combination of client side and server side.

As further described above, the object detection system 212 is configured to implement or otherwise access object recognition algorithms (e.g., including machine learning algorithms) configured to scan the captured image and to detect/track the movement of objects within the image. The object detection system 212 is further configured to determine or otherwise access attributes of the identified objects. As discussed below with respect to FIGS. 6A-6B, 7A-7B, and 8A-8B, the object detection system 212 may determine attributes such as the type of the object (e.g., the device user, an individual other than the device user, video displayed via a television), a name of the object, and/or other general information (e.g., physical attributes, an associated date, an associated business name, an author, and the like).

As noted above, the object detection system 212 may determine attributes corresponding to keywords that have been sponsored by third parties. For example, a third party (e.g., associated with a museum, a venue, or another business) may sponsor or pay for certain keywords (e.g., the name of a mural or other work of art, the name of a publication such as a magazine) so that they are ranked higher than other keywords. In response to determining that a given attribute corresponds to a sponsored keyword, the object detection system 212 may provide a higher ranking for that attribute relative to other attributes.
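
The ranking behavior just described could look roughly like the following sketch; the sponsored terms, the boost value, and the data shapes are assumptions used only for illustration.

    SPONSORED_KEYWORDS = {"mural of athens", "travel weekly"}   # hypothetical sponsored terms

    def rank_attributes(attributes, base_scores, sponsor_boost=10.0):
        """Order detected attributes by score, boosting attributes that match a
        sponsored keyword so they rank above otherwise comparable attributes."""
        scored = []
        for attr in attributes:
            score = base_scores.get(attr, 0.0)
            if attr.lower() in SPONSORED_KEYWORDS:
                score += sponsor_boost
            scored.append((attr, score))
        return [attr for attr, _ in sorted(scored, key=lambda pair: pair[1], reverse=True)]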

The object detection system 212 sends the attributes of the object to the messaging client 104, for use by the messaging client 104. In doing so, the object detection system 212 may further provide ranking information for the attributes.

Based on the travel parameters and/or the attribute determined at block 504, the messaging client 104 may determine to activate speech recognition (block 506). For example, the travel parameters may indicate that the user is traveling in a region (e.g., another country) associated with a language that is unknown to the user. Moreover, the attribute of the object depicted in the image (e.g., from a rear-facing camera) may indicate a source (e.g., an individual other than the user, or a television) that provides speech in such a language. The determination to activate speech recognition may further be based on environmental factors (e.g., device geolocation, time of day, and the like). Based on these signals, the messaging client 104 may determine to perform speech recognition with respect to the source of the speech input (e.g., to translate the speech into a language known to the user). In one or more implementations, the language known to the user corresponds to a primary language stored in association with the user profile (e.g., within the profile data 302).
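
As a highly simplified sketch of the signal combination just described (the boolean inputs are assumptions standing in for the richer travel, attribute, and geolocation data):

    def should_activate_speech_recognition(known_language: str, trip_language: str,
                                           speech_source_in_frame: bool,
                                           in_travel_region: bool) -> bool:
        """Combine the signals described above: an unfamiliar destination
        language, a detected speech source, and a matching device location."""
        return (known_language != trip_language
                and speech_source_in_frame
                and in_travel_region)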

In one or more implementations, the determination to perform speech recognition is based at least in part on the travel parameters and on a voice command received (e.g., by a device microphone) in association with the user input at block 502. In addition, the determined attribute may indicate that it is the user who issued the voice command (e.g., by identifying the user via a front-facing camera). The messaging client 104 (e.g., in conjunction with the transcription and translation system 214, which implements speech recognition) may detect the voice command. For example, the user may have spoken the term/phrase "say" or "how do you say" while performing the above-described press-and-hold gesture. Based on the travel parameters (e.g., indicating the language) and the voice command, the messaging client 104 may determine to perform speech recognition on the phrase following the voice command word (e.g., the speech input after "say"). For example, the messaging client 104 may determine to translate the speech input from the language known to the user into the unknown language associated with the travel parameters. In this manner, the user may learn how to say the speech input while traveling.
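
A simplified sketch of the decision just described follows; the command words and the parsing logic are illustrative assumptions rather than the described implementation.

    VOICE_COMMAND_WORDS = ("how do you say", "say")   # checked longest-first

    def parse_voice_command(utterance: str, known_language: str, trip_language: str):
        """If the utterance starts with a command word, return the phrase to
        translate plus the source/target languages (known -> trip language);
        otherwise return None to indicate that no command was detected."""
        text = utterance.strip().lower()
        for command in VOICE_COMMAND_WORDS:
            if text.startswith(command):
                phrase = text[len(command):].strip(" ,?")
                return {"phrase": phrase, "source": known_language, "target": trip_language}
        return None

    # Example: parse_voice_command("Say how much does this cost?", "en", "el")
    # -> {"phrase": "how much does this cost", "source": "en", "target": "el"}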

The messaging client 104 sends, to the augmentation system 208, a request for an augmented reality content item configured for speech recognition (operation 508). As noted above, the augmentation system 208 may correspond to a subsystem of the messaging system 100, and may be supported on the client side by the messaging client 104 and/or on the server side by the application server 114. In one or more implementations, the provision of augmented reality content items may be implemented client side, server side, and/or by a combination of client side and server side.

As noted above with respect to block 502, the user input for performing the scan operation may have been received within the carousel interface, in which case an augmented reality content item has already been selected. Alternatively, the user input may be received upon startup of the messaging client 104 (e.g., within a startup interface), in which case an augmented reality content item has not yet been selected.

In a case where an augmented reality content item has not yet been selected, the request at operation 508 may include one or more parameters for facilitating the selection of an augmented reality content item (e.g., indications of the travel parameters, the attribute of the object, and/or the voice command). For example, the request may include indications of the language known to the user, the language associated with the travel, the source of the speech input (e.g., the user, another individual, a television), and/or the voice command, if applicable.

At block 510, the augmentation system 208 selects, from among multiple available augmented reality content items, an augmented reality content item that is configured for speech recognition, based on the parameters provided with the request from operation 508. In one or more implementations, the augmentation system 208 is configured to search the set of available augmented reality content items by comparing the parameters against corresponding parameters (e.g., predefined terms) associated with each of the available augmented reality content items.

In one or more implementations, the augmentation table 310 included in the database 124 is configured to specify or label each augmented reality content item (e.g., via metadata) with corresponding attributes and/or predefined terms, for searching of the augmented reality content items. Thus, in one or more implementations, the augmentation system 208 queries the database 124 with the parameters from the request (e.g., the known language, the language associated with the travel, the type of object corresponding to the speech source, the type of command), and the database 124 may provide one or more selected augmented reality content items as results of the query.

In a case where the parameters correspond to the respective attributes and/or predefined terms of more than one augmented reality content item, the database 124 may be configured to provide an indication of multiple augmented reality content items as results of the query. Moreover, the database 124 may rank the multiple augmented reality content items (e.g., based on the number of matches, and/or on weights assigned to individual ones of the parameters, the augmented reality content item attributes, and/or the predefined terms). The database 124 may provide an indication of the ranking to the augmentation system 208 as part of the query results.
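
A sketch of how the metadata-based selection and ranking might be expressed is given below; the tagging scheme, weights, and names are assumptions, not the described database design.

    def select_ar_content_items(catalog, request_params, weights=None):
        """Rank candidate augmented reality content items by how many request
        parameters match the predefined terms each item is tagged with."""
        weights = weights or {}
        ranked = []
        for item in catalog:                      # each item: {"id": ..., "tags": set(...)}
            matches = [p for p in request_params if p in item["tags"]]
            if matches:
                score = sum(weights.get(p, 1.0) for p in matches)
                ranked.append((score, item["id"]))
        ranked.sort(reverse=True)
        return [item_id for _, item_id in ranked]

    # Example request parameters: known language, trip language, speech source, command type.
    # select_ar_content_items(catalog, {"en", "el", "source:user", "command:say"})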

In a case where the source of the speech input corresponds to the user of the client device 102 and the voice command indicates to perform translation from the known language into the language associated with the travel (e.g., via the voice command words "say" or "how do you say"), the selected augmented reality content item may correspond to displaying a visual transcription of the speech input, a visual translation of the speech input, and an audio translation of the speech input. In an example where the source of the speech input corresponds to live speech of another individual, the selected augmented reality content item may correspond to providing a visual translation of the live speech and/or an audio translation of the live speech. In an example where the source of the speech input corresponds to the audio output of a television displaying associated video, the selected augmented reality content item may correspond to providing a visual transcription of the audio output, a visual translation of the audio output, and an audio translation of the audio output. Thus, the augmentation system 208 sends an indication of the selected augmented reality content item, together with ranking information if applicable, to the messaging client 104 (operation 512).

In one or more implementations, in a case where an AR content item has already been selected (e.g., in association with the carousel interface as described above), the messaging client 104 may omit the above-noted parameters when sending the request at operation 508. Moreover, block 510 and operation 512 may simply correspond to the augmentation system 208 maintaining the AR content item that has already been selected.

At block 514, the messaging client 104 activates the selected augmented reality content item if it has not already been activated. In addition, the messaging client 104 receives speech input (e.g., via a device microphone) associated with the activated augmented reality content item (block 516). As noted above, the speech input may be received from different sources, for example from the user, from an individual other than the user, and/or from a television (or another display device that outputs video/audio).

The messaging client 104, in conjunction with the selected augmented reality content item, sends a request to the transcription and translation system 214 for a transcription and/or a translation of the speech input (operation 518). As noted above, the transcription and translation system 214 may correspond to a subsystem of the messaging system 100, and may be supported on the client side by the messaging client 104 and/or on the server side by the application server 114. In one or more implementations, the transcription and/or translation of speech via the transcription and translation system 214 may be implemented client side, server side, and/or by a combination of client side and server side.

Moreover, the transcription and translation system 214 may correspond to a subsystem of the augmentation system 208. Thus, while the example of FIG. 5 depicts operations 518 through 522 as occurring between the messaging client 104 and the transcription and translation system 214, these operations may also be performed (at least in part) between the messaging client 104 and the augmentation system 208. For example, the augmentation system 208 may itself be configured to perform the transcription and/or the translation. Alternatively or in addition, the transcription and translation system 214 may be a system separate from the augmentation system 208. For example, the messaging client 104 and/or the augmentation system 208 may communicate with the transcription and translation system 214 in order to obtain the transcription and/or the translation.

As noted above, the transcription and translation system 214 is configured to perform different types of speech-based functions. Examples of speech-based functions include receiving speech input, generating machine-encoded text corresponding to the speech input (e.g., transcription), and/or translating the machine-encoded text from a first language to a second language (e.g., translation). The transcription and translation system 214 is further configured to provide the transcription and/or the translation as text for display and/or as speech for audio output.

Thus, the request at operation 518 may include the speech input, an indication of whether to perform transcription and/or translation with respect to the speech input, an indication of the source/output languages, and an indication of the output format (e.g., text and/or audio output) for each of the transcription and/or the translation. For example, the indications of whether to perform transcription and/or translation, and of their respective output formats, may correspond to the type of augmented reality content item selected as described above (e.g., based on the source of the speech input, a voice command, and the like).
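
The request described above might carry a payload along the following lines; the field names and values are placeholders chosen purely for illustration.

    transcription_translation_request = {
        "speech_input": b"<captured audio bytes>",
        "perform_transcription": True,
        "perform_translation": True,
        "source_language": "en",          # language of the speech input
        "target_language": "el",          # output language for the translation
        "output_formats": {
            "transcription": ["text"],
            "translation": ["text", "audio"],
        },
    }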

The transcription and translation system 214 generates the transcription and/or the translation in the respective output formats (block 520). The transcription and translation system 214 sends the generated transcription and/or translation to the messaging client 104.

In one or more implementations, the translation generated at block 520 and sent at operation 522 may correspond to a data structure having the transcription and/or the translation in text and/or audio formats. The data structure may be used by the selected augmented reality content item to generate augmented reality content that provides the transcription and/or the translation for output. In this manner, the augmented reality content item may correspond to a template with placeholders for the text and/or audio output. Alternatively or in addition, the transcription and/or the translation generated at block 520 and sent at operation 522 may be generated by the augmented reality content item itself.
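
As a sketch of the template idea (the placeholder names are assumptions), the augmented reality content item could substitute the returned text and audio into predefined slots roughly as follows.

    def fill_ar_template(template: dict, result: dict) -> dict:
        """Substitute transcription/translation output into the placeholders of
        an augmented reality content template, leaving absent slots untouched."""
        filled = dict(template)
        for slot in ("transcription_text", "translation_text", "translation_audio"):
            if slot in template and slot in result:
                filled[slot] = result[slot]
        return filled

    # Example:
    # fill_ar_template({"transcription_text": None, "translation_text": None},
    #                  {"transcription_text": "how much does this cost",
    #                   "translation_text": "πόσο κοστίζει αυτό"})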

Following operation 522, the messaging client 104 presents the augmented reality content item together with the captured image, the augmented reality content item including the transcription and/or the translation (e.g., displayed as text and/or output as audio) (block 524). As discussed below with respect to FIGS. 6A-6B, 7A-7B, and 8A-8B, the augmented reality content item may be configured to modify the captured image with augmented reality content (e.g., overlays, visual effects, and the like) that includes the transcription and/or the translation provided by the transcription and translation system 214.

FIG. 6A illustrates an example user interface 600a in which the source of the speech input for augmented reality-based speech translation corresponds to the device user, according to some example implementations. The user interface 600a includes a captured image 602 and a transcription 604.

In one or more implementations, the user of the client device 102 provides touch input within the messaging client 104 to perform a scan operation to identify objects in the captured image 602 (e.g., a live video feed from a front-facing camera). For example, the touch input may correspond to a press-and-hold gesture received within the portion of the device screen that displays the captured image 602, or to selection of a dedicated button such as a scan button (not shown).

In one or more implementations, during the scan operation, the messaging client 104 is configured to display a scanning graphic (not shown) to indicate that the messaging client 104 is performing the scan operation. For example, the scanning graphic corresponds to an animation that is displayed for the duration of the scan (e.g., a predetermined duration of 2 seconds).

During the scan operation, the user of the client device 102 may provide speech input (e.g., by vocalizing or speaking) to the messaging client 104. In the example of FIG. 6A, the user speaks the phrase "say how much does this cost?" (or "say how much does this cost in Greek?"). The term "say" may correspond to a voice command word for performing translation from the user's known language (e.g., primary language) into another language. The other language may be determined from the travel parameters (e.g., indicating that the user is traveling in Greece) and/or may be included in the speech input (e.g., the "in Greek" portion of the speech input).

In the example of FIG. 6A, the user requests translation of the phrase "how much does this cost?" from the user's known language (e.g., English) into Greek. In response to receiving the user input, the messaging client 104 (e.g., in conjunction with the transcription and translation system 214) may provide for transcribing the entire phrase and displaying the resulting transcription 604 (as shown in FIG. 6A).

Moreover, based on the voice command word (e.g., the term "say"), the travel parameters (e.g., indicating travel to Greece), the object attribute (e.g., indicating that it is the user who is providing the speech input), and/or environmental factors (e.g., device geolocation), the messaging client 104 determines to perform speech recognition with respect to the speech input. As discussed below with respect to FIG. 6B, the speech recognition may be performed in association with a selected augmented reality content item. Moreover, the messaging client 104 (e.g., in conjunction with the transcription and translation system 214) provides a translation and/or a transcription of the speech input (e.g., the phrase following the voice command word "say").

FIG. 6B illustrates an example user interface 600b for providing augmented reality-based speech translation for speech input provided by the device user, according to some example implementations. The user interface 600b depicts the captured image 602 of FIG. 6A. In addition, the user interface 600b depicts a translation card 606, a carousel interface 608, and a selected AR icon 610.

As noted above, the messaging client 104 may have determined to perform speech recognition with respect to the speech input provided by the user. In response, the messaging client 104 may select (e.g., in conjunction with the augmentation system 208) an augmented reality content item for performing such speech recognition.

In this regard, the carousel interface 608 depicts selectable augmented reality content items that may be applied with respect to the captured image 602. Each of the available augmented reality content items is represented by an icon which is user-selectable for switching to the respective augmented reality content item. In one or more implementations, the selected AR icon 610 corresponds to the augmented reality content item selected (e.g., by the augmentation system 208) for performing the speech recognition. As shown in FIG. 6B, the selected AR icon 610 is displayed in a different manner relative to (e.g., larger than) the remaining icons.

In one or more implementations, the selected augmented reality content item (e.g., corresponding to the selected AR icon 610) provides a visual transcription of the speech input, a visual translation of the speech input, and an audio translation of the speech input.

These transcriptions and/or translations are depicted with respect to the translation card 606, which is presented by the messaging client 104 (e.g., in conjunction with the transcription and translation system 214). In one or more implementations, the translation card 606 is presented as an overlay with respect to the captured image 602. The content of the translation card 606 may be based on the voice command word (e.g., the term "say"), the speech input following the voice command ("how much does this cost?"), the language known to the user (e.g., as stored in association with the profile data 302), and/or the output language (e.g., based on the travel parameters, and/or on "in Greek" where included as part of the speech input). In addition, the translation card 606 may include an interface element for audio playback of the translation.

In one or more implementations, in response to user selection of the selected AR icon 610, the messaging client 104 provides for generating a media content item which includes an image of the screen content (e.g., in response to a press/tap gesture of the selected AR icon 610) and/or a video of the screen content (e.g., in response to a press-and-hold gesture of the selected AR icon 610), for example, for sending to friends, inclusion in a story, and the like.

FIG. 7A illustrates an example user interface 700a in which the source of the speech input for augmented reality-based speech translation corresponds to an individual other than the device user, according to some example implementations. The user interface 700a includes a captured image 702, a carousel interface 706, and a selected AR icon 708.

In one or more implementations, the user of the client device 102 provides a touch input 704 within the messaging client 104 to perform a scan operation to identify objects in the captured image 702 (e.g., a live video feed from a rear-facing camera). For example, the touch input 704 may correspond to a press-and-hold gesture received within the portion of the device screen that displays the captured image 702.

In one or more implementations, during the scan operation, the messaging client 104 is configured to display a scanning graphic (not shown) to indicate that the messaging client 104 is performing the scan operation. For example, the scanning graphic corresponds to an animation that is displayed for the duration of the scan (e.g., a predetermined duration of 2 seconds).

During the scan operation, the messaging client 104 (e.g., in conjunction with the object detection system 212) may detect, in the captured image 702, an individual other than the device user. The individual may provide live speech, which is captured as speech input by a device microphone. Moreover, the messaging client 104 may determine travel parameters (e.g., the region in which the user is traveling) which indicate the language of the live speech.

Based on the speech input, the object attributes, the travel parameters, and/or environmental factors (e.g., device geolocation), the messaging client 104 determines to perform speech recognition with respect to the speech input. As discussed below with respect to FIG. 7B, the speech recognition may be performed in association with a selected augmented reality content item. Moreover, the messaging client 104 (e.g., in conjunction with the transcription and translation system 214) provides for translating and/or transcribing the speech input.

Similar to the carousel interface 608 of FIG. 6B, the carousel interface 706 of FIG. 7A allows the user to cycle through and/or select different augmented reality content items to apply/display with respect to the captured image 702. In addition, the icon corresponding to the active augmented reality content item (e.g., the selected AR icon 708) is user-selectable for generating a media content item which includes an image (e.g., in response to a press/tap gesture) and/or a video (e.g., in response to a press-and-hold gesture).

FIG. 7B illustrates an example user interface 700b for providing augmented reality-based speech translation for speech input provided by an individual other than the device user, according to some example implementations. The user interface 700b depicts the captured image 702 and the touch input 704 of FIG. 7A. While not shown in FIG. 7B, the user interface 700b may continue to depict the carousel interface 706 and/or the selected AR icon 708 of FIG. 7A. In addition, the user interface 700b depicts a transcription 712 and a translation 714.

As noted above, the messaging client 104 may have determined to perform speech recognition with respect to the speech input provided by the individual depicted in the captured image 702. In response, the messaging client 104 may select (e.g., in conjunction with the augmentation system 208) an augmented reality content item for performing such speech recognition. In one or more implementations, the selected augmented reality content item (e.g., corresponding to the selected AR icon 708) provides a visual transcription of the live speech, a visual translation of the live speech, and/or an audio translation of the live speech.

Accordingly, the messaging client 104 (e.g., in conjunction with the transcription and translation system 214) displays the transcription 712 and the translation 714 in association with the selected augmented reality content item. The transcription 712 and the translation 714 may be presented as overlays with respect to the captured image 702. The transcription 712 may be based on the speech input (e.g., as provided by the individual) and/or on the input language (e.g., based on the travel parameters and/or on known processing techniques for identifying a language from speech). Moreover, the translation 714 may be based on translating the speech input into the language known to the user (e.g., as stored in association with the profile data 302).

FIG. 8A illustrates an example user interface 800a for providing augmented reality-based speech translation for speech input provided by a television, according to some example implementations. The user interface 800a includes a captured image 802, a translation tool 804, and a translation 806.

In one or more implementations, the user of the client device 102 provides touch input within the messaging client 104 to perform a scan operation to identify objects in the captured image 802 (e.g., a live video feed from a rear-facing camera). For example, the touch input 704 may correspond to a press-and-hold gesture received within the portion of the device screen that displays the captured image 802.

In one or more implementations, during the scan operation, the messaging client 104 is configured to display a scanning graphic (not shown) to indicate that the messaging client 104 is performing the scan operation. For example, the scanning graphic corresponds to an animation that is displayed for the duration of the scan (e.g., a predetermined duration of 2 seconds).

During the scan operation, the messaging client 104 (e.g., in conjunction with the object detection system 212) may detect object attributes which indicate video and audio output from a television or other display device. The audio output from the television may correspond to the speech input received by the messaging client 104. Moreover, the messaging client 104 may determine travel parameters (e.g., the region in which the user is traveling) which indicate the language of the audio output.

Based on the speech input, the object attributes, the travel parameters, and/or environmental factors (e.g., device geolocation), the messaging client 104 determines to perform speech recognition with respect to the speech input. Based on this determination, the messaging client 104 may select (e.g., in conjunction with the augmentation system 208) an augmented reality content item for performing such speech recognition. In one or more implementations, the selected augmented reality content item provides at least one of a visual transcription of the audio output, a visual translation of the audio output, or an audio translation of the audio output.

In this regard, the user interface 800a includes the translation tool 804 for activating/deactivating the display of a text transcription (e.g., corresponding to the audio output of the television, in the language associated with the travel) and/or of a text translation (e.g., corresponding to the language known to the user). The translation tool 804 may further provide for activating/deactivating audio output of the translation. In the example of FIG. 8A, the user interface 800a provides display of the translation 806, and may provide corresponding audio output of the translation.

FIG. 8B illustrates another example user interface 800b for providing augmented reality-based speech translation for speech input provided by a television, according to some example implementations. The user interface 800b depicts the captured image 802, the translation tool 804, and the translation 806 of FIG. 8A. In the example of FIG. 8B, the user interface 800b further provides display of a transcription 808 corresponding to the audio output of the television (e.g., based on user selection with respect to the translation tool 804).

FIG. 9 is a flowchart illustrating a process 900 for providing augmented reality content corresponding to speech translation in association with travel, according to some example implementations. For explanatory purposes, the process 900 is primarily described herein with reference to the messaging client 104 of FIG. 1, and the augmentation system 208 and the transcription and translation system 214 of FIG. 2. However, one or more blocks (or operations) of the process 900 may be performed by one or more other components, and/or by other suitable devices. Further for explanatory purposes, the blocks (or operations) of the process 900 are described herein as occurring in serial, or linearly. However, multiple blocks (or operations) of the process 900 may occur in parallel or concurrently. In addition, the blocks (or operations) of the process 900 need not be performed in the order shown, and/or one or more blocks (or operations) of the process 900 need not be performed and/or may be replaced by other operations. The process 900 may be terminated when its operations are completed. In addition, the process 900 may correspond to a method, a procedure, an algorithm, and the like.

The messaging client 104 receives a request to perform a scan operation in association with an image captured by a device camera (block 902). The image may correspond to a live feed of a camera of the device.

The messaging client 104 determines a travel parameter associated with the request and an attribute of an object depicted in the image (block 904). The travel parameter may indicate at least one of a travel itinerary, a transportation schedule, a language, a general location, a specific place or landmark, an activity, a list of participants, or a topic of interest associated with travel by the user.

The messaging client 104 selects, from among plural augmented reality content items and based on at least one of the travel parameter or the attribute, an augmented reality content item that is configured to present augmented reality content based on speech input associated with the request (block 906). Selecting the augmented reality content item may further be based on a geolocation of the device.

The messaging client 104 receives the speech input (block 908). The messaging client 104 obtains at least one of a transcription or a translation of the speech input (block 910). The messaging client 104 presents the augmented reality content item in association with the image, the augmented reality content item including the at least one of the transcription or the translation (block 912).

For example, the attribute may indicate that the object corresponds to an individual, other than the user, depicted in the image, the individual providing the speech input corresponding to live speech. In such a case, the augmented reality content item may be selected so as to provide at least one of a visual translation of the live speech or an audio translation of the live speech.

In another example, the attribute may indicate that the object corresponds to video displayed on a second device, the video being displayed together with audio output corresponding to the speech input. In such a case, the augmented reality content item may be selected so as to provide at least one of a visual transcription of the audio output, a visual translation of the audio output, or an audio translation of the audio output.

In another example, the messaging client 104 may detect a voice command to translate the speech input, the voice command and the speech input having been provided by the user in association with the request, and the attribute may indicate that the object corresponds to the user. In such a case, the augmented reality content item may be selected so as to provide a visual transcription of the speech input, a visual translation of the speech input, and an audio translation of the speech input.
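
Drawing on the three example cases above, a condensed sketch of the translation-direction logic behind blocks 908-912 might look like the following; the helper signature, source labels, and class name are hypothetical.

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class PresentedContent:
        transcription: Optional[str]
        translation: Optional[str]

    def handle_speech(speech_text: str, source: str, known_language: str,
                      trip_language: str,
                      translate: Callable[[str, str, str], str]) -> PresentedContent:
        """The user's own speech is translated toward the trip language, while
        live speech from another individual or audio output from a television
        is translated back into the user's known language."""
        if source == "user":
            translation = translate(speech_text, known_language, trip_language)
        else:  # "individual" or "television"
            translation = translate(speech_text, trip_language, known_language)
        return PresentedContent(transcription=speech_text, translation=translation)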

FIG. 10 is a schematic diagram illustrating an access-limiting process 1000, in terms of which access to content (e.g., an ephemeral message 1010 and an associated multimedia payload of data) or a content collection (e.g., an ephemeral message group 1006) may be time-limited (e.g., made ephemeral).

An ephemeral message 1010 is shown to be associated with a message duration parameter 1014, the value of which determines the amount of time that the messaging client 104 will display the ephemeral message 1010 to a receiving user of the ephemeral message 1010. In one example, an ephemeral message 1010 is viewable by the receiving user for up to a maximum of 10 seconds, depending on the amount of time that the sending user specifies using the message duration parameter 1014.

The message duration parameter 1014 and the message receiver identifier 424 are shown to be inputs to a message timer 1012, which is responsible for determining the amount of time that the ephemeral message 1010 is shown to the particular receiving user identified by the message receiver identifier 424. In particular, the ephemeral message 1010 will only be shown to the relevant receiving user for a time period determined by the value of the message duration parameter 1014. The message timer 1012 is shown to provide output to a more generalized ephemeral timer system 202, which is responsible for the overall timing of display of content (e.g., an ephemeral message 1010) to a receiving user.
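As a rough illustration of the role of the message duration parameter 1014, the following sketch gates display on the time elapsed since receipt. The function and field names are assumptions for the example and do not represent the actual message timer 1012.

```python
import time
from typing import Optional


def is_message_viewable(received_at: float, message_duration_s: float,
                        now: Optional[float] = None) -> bool:
    """Return True while an ephemeral message may still be shown to the receiver."""
    now = time.time() if now is None else now
    return (now - received_at) < message_duration_s


# A message received 6 seconds ago with a 10-second duration is still viewable.
print(is_message_viewable(received_at=time.time() - 6, message_duration_s=10))
```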

The ephemeral message 1010 shown in FIG. 10 is included within an ephemeral message group 1006 (e.g., a collection of messages in a personal story or an event story). The ephemeral message group 1006 has an associated group duration parameter 1004, the value of which determines the duration for which the ephemeral message group 1006 is presented and accessible to users of the messaging system 100. The group duration parameter 1004 may, for example, be the duration of a music concert, where the ephemeral message group 1006 is a collection of content pertaining to that concert. Alternatively, a user (either the owning user or a curator user) may specify the value of the group duration parameter 1004 when performing the setup and creation of the ephemeral message group 1006.

Additionally, each ephemeral message 1010 within the ephemeral message group 1006 has an associated group participation parameter 1002, the value of which determines the duration for which the ephemeral message 1010 will be accessible within the context of the ephemeral message group 1006. Accordingly, a particular ephemeral message 1010 may "expire" and become inaccessible within the context of the ephemeral message group 1006 prior to the ephemeral message group 1006 itself expiring in terms of the group duration parameter 1004. The group duration parameter 1004, the group participation parameter 1002, and the message receiver identifier 424 each provide input to a group timer 1008, which is operable to first determine whether a particular ephemeral message 1010 of the ephemeral message group 1006 will be displayed to a particular receiving user and, if so, for how long. Note that the ephemeral message group 1006 is also aware of the identity of the particular receiving user as a result of the message receiver identifier 424.

Accordingly, the group timer 1008 operationally controls the overall lifespan of the associated ephemeral message group 1006, as well as the individual ephemeral messages 1010 included in the ephemeral message group 1006. In one example, each ephemeral message 1010 within the ephemeral message group 1006 remains viewable and accessible for a time period specified by the group duration parameter 1004. In a further example, a certain ephemeral message 1010 may expire, within the context of the ephemeral message group 1006, based on the group participation parameter 1002. Note that the message duration parameter 1014 may still determine the duration for which a particular ephemeral message 1010 is displayed to a receiving user, even within the context of the ephemeral message group 1006. Accordingly, the message duration parameter 1014 determines the duration for which a particular ephemeral message 1010 is displayed to a receiving user, regardless of whether the receiving user is viewing that ephemeral message 1010 inside or outside the context of the ephemeral message group 1006.
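One way to picture the interplay of the group duration parameter 1004 and the group participation parameter 1002 is the following sketch (the message duration parameter 1014 would cap each individual viewing session separately). The function and parameter names are assumptions, not the actual group timer 1008.

```python
import time
from typing import Optional


def is_accessible_in_group(posted_at: float,
                           group_created_at: float,
                           group_duration_s: Optional[float],
                           group_participation_s: float,
                           now: Optional[float] = None) -> bool:
    """A message stays accessible within a group only while both the group
    itself and the message's own participation window are still alive."""
    now = time.time() if now is None else now
    group_alive = (group_duration_s is None or
                   (now - group_created_at) < group_duration_s)
    participation_alive = (now - posted_at) < group_participation_s
    return group_alive and participation_alive
```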

The ephemeral timer system 202 may furthermore operationally remove a particular ephemeral message 1010 from the ephemeral message group 1006 based on a determination that it has exceeded the associated group participation parameter 1002. For example, when a sending user has established a group participation parameter 1002 of 24 hours from posting, the ephemeral timer system 202 will remove the relevant ephemeral message 1010 from the ephemeral message group 1006 after the specified twenty-four hours. The ephemeral timer system 202 also operates to remove the ephemeral message group 1006 either when the group participation parameter 1002 for each and every ephemeral message 1010 within the ephemeral message group 1006 has expired, or when the ephemeral message group 1006 itself has expired in terms of the group duration parameter 1004.

In certain use cases, a creator of a particular ephemeral message group 1006 may specify an indefinite group duration parameter 1004. In this case, the expiration of the group participation parameter 1002 for the last remaining ephemeral message 1010 within the ephemeral message group 1006 will determine when the ephemeral message group 1006 itself expires. In this case, a new ephemeral message 1010, added to the ephemeral message group 1006 with a new group participation parameter 1002, effectively extends the life of the ephemeral message group 1006 to equal the value of the group participation parameter 1002.
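Under that convention, the expiry of a group with an indefinite group duration parameter can be pictured as tracking the last remaining message, as in this assumed helper (names and logic are illustrative only):

```python
from typing import List, Optional, Tuple


def group_expiry_time(messages: List[Tuple[float, float]],
                      group_created_at: float,
                      group_duration_s: Optional[float]) -> float:
    """`messages` holds (posted_at, group_participation_s) pairs.

    With an indefinite group duration (None), the group expires when the last
    remaining message's participation window elapses, so adding a message with
    a new participation parameter extends the group's lifetime accordingly.
    """
    last_message_expiry = max((posted + participation
                               for posted, participation in messages),
                              default=group_created_at)
    if group_duration_s is None:
        return last_message_expiry
    return group_created_at + group_duration_s
```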

Responsive to the ephemeral timer system 202 determining that an ephemeral message group 1006 has expired (e.g., is no longer accessible), the ephemeral timer system 202 communicates with the messaging system 100 (and, for example, specifically the messaging client 104) to cause an indicium (e.g., an icon) associated with the relevant ephemeral message group 1006 to no longer be displayed within a user interface of the messaging client 104. Similarly, when the ephemeral timer system 202 determines that the message duration parameter 1014 for a particular ephemeral message 1010 has expired, the ephemeral timer system 202 causes the messaging client 104 to no longer display an indicium (e.g., an icon or textual identification) associated with the ephemeral message 1010.

FIG. 11 is a diagrammatic representation of a machine 1100 within which instructions 1106 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1100 to perform any one or more of the methodologies discussed herein may be executed. For example, the instructions 1106 may cause the machine 1100 to execute any one or more of the methods described herein. The instructions 1106 transform the general, non-programmed machine 1100 into a particular machine 1100 programmed to carry out the described and illustrated functions in the manner described. The machine 1100 may operate as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1100 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 1100 may comprise, but not be limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system, a cellular telephone, a smartphone, a mobile device, a wearable device (e.g., a smartwatch), a smart home device (e.g., a smart appliance), another smart device, a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1106, sequentially or otherwise, that specify actions to be taken by the machine 1100. Further, while only a single machine 1100 is illustrated, the term "machine" shall also be taken to include a collection of machines that individually or jointly execute the instructions 1106 to perform any one or more of the methodologies discussed herein. The machine 1100 may, for example, comprise the client device 102 or any one of a number of server devices forming part of the messaging server system 108. In some examples, the machine 1100 may also comprise both client and server systems, with certain operations of a particular method or algorithm being performed on the server side and with certain other operations of the particular method or algorithm being performed on the client side.

The machine 1100 may include processors 1102, memory 1110, and input/output (I/O) components 1122, which may be configured to communicate with each other via a bus 1120. In an example, the processors 1102 (e.g., a central processing unit (CPU), a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), another processor, or any suitable combination thereof) may include, for example, a processor 1104 and a processor 1108 that execute the instructions 1106. The term "processor" is intended to include multi-core processors that may comprise two or more independent processors (sometimes referred to as "cores") that may execute instructions contemporaneously. Although FIG. 11 shows multiple processors 1102, the machine 1100 may include a single processor with a single core, a single processor with multiple cores (e.g., a multi-core processor), multiple processors with a single core, multiple processors with multiple cores, or any combination thereof.

The memory 1110 includes a main memory 1112, a static memory 1114, and a storage unit 1116, all accessible to the processors 1102 via the bus 1120. The main memory 1112, the static memory 1114, and the storage unit 1116 store the instructions 1106 embodying any one or more of the methodologies or functions described herein. The instructions 1106 may also reside, completely or partially, within the main memory 1112, within the static memory 1114, within a machine-readable medium 1118 within the storage unit 1116, within at least one of the processors 1102 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 1100.

The I/O components 1122 may include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1122 included in a particular machine will depend on the type of machine. For example, portable machines such as mobile phones may include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1122 may include many other components that are not shown in FIG. 11. In various examples, the I/O components 1122 may include user output components 1124 and user input components 1126. The user output components 1124 may include visual components (e.g., a display such as a plasma display panel (PDP), a light-emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., a vibratory motor, resistance mechanisms), other signal generators, and so forth. The user input components 1126 may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or another pointing instrument), tactile input components (e.g., a physical button, a touch screen that provides location and force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.

In further examples, the I/O components 1122 may include biometric components 1128, motion components 1130, environmental components 1132, or position components 1134, among a wide array of other components. For example, the biometric components 1128 include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram-based identification), and the like. The motion components 1130 include acceleration sensor components (e.g., an accelerometer), gravitation sensor components, and rotation sensor components (e.g., a gyroscope).

The environmental components 1132 include, for example, one or more cameras (with still image/photograph and video capabilities), illumination sensor components (e.g., a photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., a barometer), acoustic sensor components (e.g., one or more microphones that detect background noise), proximity sensor components (e.g., infrared sensors that detect nearby objects), gas sensors (e.g., gas detection sensors that detect concentrations of hazardous gases for safety or measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to the surrounding physical environment.

With respect to cameras, the client device 102 may have a camera system comprising, for example, front cameras on a front surface of the client device 102 and rear cameras on a rear surface of the client device 102. The front cameras may, for example, be used to capture still images and video of a user of the client device 102 (e.g., "selfies"), which may then be augmented with the augmentation data (e.g., filters) described above. The rear cameras may, for example, be used to capture still images and video in a more traditional camera mode, with these images similarly being augmented with augmentation data. In addition to front and rear cameras, the client device 102 may also include a 360° camera for capturing 360° photographs and videos.

Further, the camera system of the client device 102 may include dual rear cameras (e.g., a primary camera as well as a depth-sensing camera), or even triple, quad, or penta rear-camera configurations on the front and rear sides of the client device 102. These multiple-camera systems may include, for example, a wide camera, an ultra-wide camera, a telephoto camera, a macro camera, and a depth sensor.

The position components 1134 include location sensor components (e.g., a GPS receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.

Communication may be implemented using a wide variety of technologies. The I/O components 1122 further include communication components 1136 operable to couple the machine 1100 to a network 1140 or to devices 1138 via respective couplings or connections. For example, the communication components 1136 may include a network interface component or another suitable device to interface with the network 1140. In further examples, the communication components 1136 may include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 1138 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via USB).

Moreover, the communication components 1136 may detect identifiers or include components operable to detect identifiers. For example, the communication components 1136 may include radio frequency identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar codes, multi-dimensional bar codes such as Quick Response (QR) codes, Aztec codes, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar codes, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information may be derived via the communication components 1136, such as location via Internet Protocol (IP) geolocation, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.

The various memories (e.g., the main memory 1112, the static memory 1114, and the memory of the processors 1102) and the storage unit 1116 may store one or more sets of instructions and data structures (e.g., software) embodying or used by any one or more of the methodologies or functions described herein. These instructions (e.g., the instructions 1106), when executed by the processors 1102, cause various operations to implement the disclosed examples.

The instructions 1106 may be transmitted or received over the network 1140, using a transmission medium, via a network interface device (e.g., a network interface component included in the communication components 1136) and using any one of several well-known transfer protocols (e.g., hypertext transfer protocol (HTTP)). Similarly, the instructions 1106 may be transmitted or received using a transmission medium via a coupling (e.g., a peer-to-peer coupling) to the devices 1138.

FIG. 12 is a block diagram 1200 illustrating a software architecture 1202, which can be installed on any one or more of the devices described herein. The software architecture 1202 is supported by hardware such as a machine 1208 that includes processors 1248, memory 1250, and I/O components 1252. In this example, the software architecture 1202 can be conceptualized as a stack of layers, where each layer provides a particular functionality. The software architecture 1202 includes layers such as an operating system 1216, libraries 1214, frameworks 1212, and applications 1210. Operationally, the applications 1210 invoke API calls 1204 through the software stack and receive messages 1206 in response to the API calls 1204.

The operating system 1216 manages hardware resources and provides common services. The operating system 1216 includes, for example, a kernel 1242, services 1244, and drivers 1246. The kernel 1242 acts as an abstraction layer between the hardware and the other software layers. For example, the kernel 1242 provides memory management, processor management (e.g., scheduling), component management, networking, and security settings, among other functionality. The services 1244 can provide other common services for the other software layers. The drivers 1246 are responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 1246 can include display drivers, camera drivers, Bluetooth® or Bluetooth® Low Energy drivers, flash memory drivers, serial communication drivers (e.g., USB drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth.

The libraries 1214 provide a common low-level infrastructure used by the applications 1210. The libraries 1214 can include system libraries 1236 (e.g., the C standard library) that provide functions such as memory allocation functions, string manipulation functions, mathematic functions, and the like. In addition, the libraries 1214 can include API libraries 1238 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as Moving Picture Experts Group-4 (MPEG4), Advanced Video Coding (H.264 or AVC), Moving Picture Experts Group Layer-3 (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR) audio codec, Joint Photographic Experts Group (JPEG or JPG), or Portable Network Graphics (PNG)), graphics libraries (e.g., an OpenGL framework used to render graphic content in two dimensions (2D) and three dimensions (3D) on a display), database libraries (e.g., SQLite to provide various relational database functions), web libraries (e.g., WebKit to provide web browsing functionality), and the like. The libraries 1214 can also include a wide variety of other libraries 1240 to provide many other APIs to the applications 1210.

The frameworks 1212 provide a common high-level infrastructure used by the applications 1210. For example, the frameworks 1212 provide various graphical user interface (GUI) functions, high-level resource management, and high-level location services. The frameworks 1212 can provide a broad spectrum of other APIs that can be used by the applications 1210, some of which may be specific to a particular operating system or platform.

In an example, the applications 1210 may include a home application 1218, a contacts application 1224, a browser application 1228, a book reader application 1232, a location application 1220, a media application 1226, a messaging application 1230, a game application 1234, and a broad assortment of other applications such as a third-party application 1222. The applications 1210 are programs that execute functions defined in the programs. Various programming languages can be employed to create one or more of the applications 1210, structured in a variety of manners, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, the third-party application 1222 (e.g., an application developed using the ANDROID™ or IOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as IOS™, ANDROID™, WINDOWS® Phone, or another mobile operating system. In this example, the third-party application 1222 can invoke the API calls 1204 provided by the operating system 1216 to facilitate the functionality described herein.

"Carrier signal" refers to any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine and includes digital or analog communications signals or other intangible media to facilitate communication of such instructions. Instructions may be transmitted or received over a network using a transmission medium via a network interface device.

"Client device" refers to any machine that interfaces to a communications network to obtain resources from one or more server systems or other client devices. A client device may be, but is not limited to, a mobile phone, a desktop computer, a laptop, a portable digital assistant (PDA), a smartphone, a tablet, an ultrabook, a netbook, a multi-processor system, a microprocessor-based or programmable consumer electronics system, a game console, a set-top box, or any other communication device that a user may use to access a network.

"Communication network" refers to one or more portions of a network that may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, a network or a portion of a network may include a wireless or cellular network, and the coupling may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling may implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), the Long Term Evolution (LTE) standard, other data transfer technologies defined by various standard-setting organizations, other long-range protocols, or other data transfer technology.

"Component" refers to a device, physical entity, or logic having boundaries defined by function or subroutine calls, branch points, APIs, or other technologies that provide for the partitioning or modularization of particular processing or control functions. Components may be combined via their interfaces with other components to carry out a machine process. A component may be a packaged functional hardware unit designed for use with other components and a part of a program that usually performs a particular function of related functions. Components may constitute either software components (e.g., code embodied on a machine-readable medium) or hardware components. A "hardware component" is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware components of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware component that operates to perform certain operations as described herein. A hardware component may also be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware component may include dedicated circuitry or logic that is permanently configured to perform certain operations. A hardware component may be a special-purpose processor, such as a field-programmable gate array (FPGA) or an application specific integrated circuit (ASIC). A hardware component may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware component may include software executed by a general-purpose processor or other programmable processor. Once configured by such software, hardware components become specific machines (or specific components of a machine) uniquely tailored to perform the configured functions and are no longer general-purpose processors. It will be appreciated that the decision to implement a hardware component mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software), may be driven by cost and time considerations. Accordingly, the phrase "hardware component" (or "hardware-implemented component") should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Considering embodiments in which hardware components are temporarily configured (e.g., programmed), each of the hardware components need not be configured or instantiated at any one instance in time. For example, where a hardware component comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware components) at different times. Software accordingly configures a particular processor or processors, for example, to constitute a particular hardware component at one instance of time and to constitute a different hardware component at a different instance of time. Hardware components can provide information to, and receive information from, other hardware components. Accordingly, the described hardware components may be regarded as being communicatively coupled. Where multiple hardware components exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware components. In embodiments in which multiple hardware components are configured or instantiated at different times, communications between such hardware components may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware components have access. For example, one hardware component may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware component may then, at a later time, access the memory device to retrieve and process the stored output. Hardware components may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information). The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented components that operate to perform one or more operations or functions described herein. As used herein, "processor-implemented component" refers to a hardware component implemented using one or more processors. Similarly, the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented components. Moreover, the one or more processors may also operate to support performance of the relevant operations in a "cloud computing" environment or as "software as a service" (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an API). The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processors or processor-implemented components may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented components may be distributed across a number of geographic locations.

"Computer-readable storage medium" refers to both machine-storage media and transmission media. Thus, the term includes both storage devices/media and carrier waves/modulated data signals. The terms "machine-readable medium," "computer-readable medium," and "device-readable medium" mean the same thing and may be used interchangeably in this disclosure.

"Ephemeral message" refers to a message that is accessible for a time-limited duration. An ephemeral message may be a text, an image, a video, and the like. The access time for the ephemeral message may be set by the message sender. Alternatively, the access time may be a default setting or a setting specified by the recipient. Regardless of the setting technique, the message is transitory.

"Machine storage medium" refers to a single or multiple storage devices and media (e.g., a centralized or distributed database, and associated caches and servers) that store executable instructions, routines, and data. The term shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media, including memory internal or external to processors. Specific examples of machine-storage media, computer-storage media, and device-storage media include non-volatile memory, including by way of example semiconductor memory devices, e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), FPGA, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The terms "machine-storage medium," "device-storage medium," and "computer-storage medium" mean the same thing and may be used interchangeably in this disclosure. The terms "machine-storage media," "computer-storage media," and "device-storage media" specifically exclude carrier waves, modulated data signals, and other such media, at least some of which are covered under the term "signal medium."

"Non-transitory computer-readable storage medium" refers to a tangible medium that is capable of storing, encoding, or carrying the instructions for execution by a machine.

"Signal medium" refers to any intangible medium that is capable of storing, encoding, or carrying the instructions for execution by a machine and includes digital or analog communications signals or other intangible media to facilitate communication of software or data. The term "signal medium" shall be taken to include any form of a modulated data signal, carrier wave, and so forth. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The terms "transmission medium" and "signal medium" mean the same thing and may be used interchangeably in this disclosure.

Claims (20)

1. A method, comprising:
receiving, by a messaging application running on a device of a user, a request to perform a scanning operation in association with an image captured by a device camera;
in response to receiving the request, determining a travel parameter associated with the request and a property of an object depicted in the image;
selecting an augmented reality content item from a plurality of augmented reality content items based on at least one of the travel parameter or the attribute, the augmented reality content item configured to present augmented reality content based on a speech input associated with the request;
receiving the speech input;
obtaining at least one of a transcription or a translation of the speech input; and
presenting the augmented reality content item in association with the image, the augmented reality content item including the at least one of the transcription or the translation.
2. The method of claim 1, wherein the travel parameter indicates at least one of a travel schedule, a transportation schedule, a language, a general location, a specific location or landmark, an activity, a participant list, or a topic of interest associated with the user's travel.
3. The method of claim 1, wherein the attribute indicates that the object corresponds to an individual in the image other than the user, the individual providing the speech input corresponding to live speech.
4. The method of claim 3, wherein the augmented reality content item is selected to provide at least one of a visual translation of the live speech or an audio translation of the live speech.
5. The method of claim 1, wherein the attribute indicates that the object corresponds to a video displayed on a second device, the video displayed with an audio output corresponding to the speech input.
6. The method of claim 5, wherein the augmented reality content item is selected to provide at least one of a visual transcription of the audio output, a visual translation of the audio output, or an audio translation of the audio output.
7. The method of claim 1, further comprising:
detecting a voice command for translating the speech input, the voice command and the speech input being provided by the user in association with the request,
wherein the attribute indicates that the object corresponds to the user.
8. The method of claim 7, wherein the augmented reality content item is selected to provide a visual transcription of the speech input, a visual translation of the speech input, and an audio translation of the speech input.
9. The method of claim 1, wherein selecting the augmented reality content item is further based on a geographic location of the device.
10. The method of claim 1, wherein the image corresponds to a live feed of a camera of the apparatus.
11. An apparatus, comprising:
a processor; and
a memory storing instructions that, when executed by the processor, configure the processor to:
receiving, by a messaging application running on a device of a user, a request to perform a scan operation in association with an image captured by a device camera;
in response to receiving the request, determining a travel parameter associated with the request and a property of an object depicted in the image;
selecting an augmented reality content item from a plurality of augmented reality content items based on at least one of the travel parameter or the attribute, the augmented reality content item configured to present augmented reality content based on a speech input associated with the request;
receiving the speech input;
obtaining at least one of a transcription or a translation of the speech input; and
presenting the augmented reality content item in association with the image, the augmented reality content item including the at least one of the transcription or the translation.
12. The apparatus of claim 11, wherein the travel parameter indicates at least one of a travel schedule, a transportation schedule, a language, a general location, a specific location or landmark, an activity, a participant list, or a topic of interest associated with the user's travel.
13. The apparatus of claim 11, wherein the attribute indicates that the object corresponds to an individual in the image other than the user, the individual providing the speech input corresponding to live speech.
14. The apparatus of claim 13, wherein the augmented reality content item is selected to provide at least one of a visual translation of the live speech or an audio translation of the live speech.
15. The apparatus of claim 11, wherein the attribute indicates that the object corresponds to a video displayed on a second device, the video displayed with an audio output corresponding to the speech input.
16. The apparatus of claim 15, wherein the augmented reality content item is selected to provide at least one of a visual transcription of the audio output, a visual translation of the audio output, or an audio translation of the audio output.
17. The apparatus of claim 11, wherein the instructions further configure the processor to:
detecting a voice command for translating the speech input, the voice command and the speech input provided by the user in association with the request,
wherein the attribute indicates that the object corresponds to the user.
18. The apparatus of claim 17, wherein the augmented reality content item is selected to provide a visual transcription of the speech input, a visual translation of the speech input, and an audio translation of the speech input.
19. The apparatus of claim 11, wherein selecting the augmented reality content item is further based on a geolocation of the device.
20. A non-transitory computer-readable storage medium comprising instructions that, when executed by a computer, cause the computer to:
receiving, by a messaging application running on a device of a user, a request to perform a scan operation in association with an image captured by a device camera;
in response to receiving the request, determining a travel parameter associated with the request and a property of an object depicted in the image;
selecting an augmented reality content item from a plurality of augmented reality content items based on at least one of the travel parameter or the attribute, the augmented reality content item configured to present augmented reality content based on a speech input associated with the request;
receiving the speech input;
obtaining at least one of a transcription or a translation of the speech input; and
presenting the augmented reality content item in association with the image, the augmented reality content item including the at least one of the transcription or the translation.
CN202180046641.9A 2020-06-30 2021-06-23 Augmented reality based speech translation in travel situations Pending CN115867905A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US202063046114P 2020-06-30 2020-06-30
US63/046,114 2020-06-30
US17/225,563 US11769500B2 (en) 2020-06-30 2021-04-08 Augmented reality-based translation of speech in association with travel
US17/225,563 2021-04-08
PCT/US2021/038694 WO2022005845A1 (en) 2020-06-30 2021-06-23 Augmented reality-based speech translation with travel

Publications (1)

Publication Number Publication Date
CN115867905A true CN115867905A (en) 2023-03-28

Family

ID=77043029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180046641.9A Pending CN115867905A (en) 2020-06-30 2021-06-23 Augmented reality based speech translation in travel situations

Country Status (5)

Country Link
US (2) US11769500B2 (en)
EP (1) EP4172744A1 (en)
KR (1) KR20230025917A (en)
CN (1) CN115867905A (en)
WO (1) WO2022005845A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118364830A (en) * 2024-06-19 2024-07-19 深圳市贝铂智能科技有限公司 Translation method, device and translator combining image analysis and voice recognition
US12142278B2 (en) 2020-06-30 2024-11-12 Snap Inc. Augmented reality-based translation of speech in association with travel

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11360733B2 (en) 2020-09-10 2022-06-14 Snap Inc. Colocated shared augmented reality without shared backend
US12033619B2 (en) * 2020-11-12 2024-07-09 International Business Machines Corporation Intelligent media transcription
US20230055477A1 (en) * 2021-08-23 2023-02-23 Soundhound, Inc. Speech-enabled augmented reality
US12315495B2 (en) 2021-12-17 2025-05-27 Snap Inc. Speech to entity
US12142257B2 (en) 2022-02-08 2024-11-12 Snap Inc. Emotion-based text to speech
US12321666B2 (en) * 2022-04-04 2025-06-03 Apple Inc. Methods for quick message response and dictation in a three-dimensional environment
US11949527B2 (en) 2022-04-25 2024-04-02 Snap Inc. Shared augmented reality experience in video chat
US12272007B2 (en) 2022-04-25 2025-04-08 Snap Inc. Persisting augmented reality experiences
US12293433B2 (en) 2022-04-25 2025-05-06 Snap Inc. Real-time modifications in augmented reality experiences
US12277632B2 (en) 2022-04-26 2025-04-15 Snap Inc. Augmented reality experiences with dual cameras
US12136160B2 (en) 2022-04-27 2024-11-05 Snap Inc. Augmented reality experience power usage prediction

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110138286A1 (en) * 2009-08-07 2011-06-09 Viktor Kaptelinin Voice assisted visual search
CN108363556A (en) * 2018-01-30 2018-08-03 百度在线网络技术(北京)有限公司 A kind of method and system based on voice Yu augmented reality environmental interaction
US11769500B2 (en) 2020-06-30 2023-09-26 Snap Inc. Augmented reality-based translation of speech in association with travel

Also Published As

Publication number Publication date
EP4172744A1 (en) 2023-05-03
US11769500B2 (en) 2023-09-26
US20210407506A1 (en) 2021-12-30
KR20230025917A (en) 2023-02-23
US12142278B2 (en) 2024-11-12
US20230410811A1 (en) 2023-12-21
WO2022005845A1 (en) 2022-01-06

Similar Documents

Publication Publication Date Title
US12142278B2 (en) Augmented reality-based translation of speech in association with travel
US12062235B2 (en) Providing travel-based augmented reality content with a captured image
US12292299B2 (en) Augmented reality-based translations associated with travel
CN115443641A (en) Combine first UI content into second UI
US11978096B2 (en) Providing travel-based augmented reality content relating to user-submitted reviews
CN115803723A (en) Updating avatar states in messaging systems
CN115668119A (en) Reply interface with stickers for messaging system
CN115516445A (en) Speech-based selection of augmented reality content for detected objects
CN117337430B (en) Shortcuts for scanning operations in the messaging system
CN115606190A (en) Displaying augmented reality content and course content
US12198429B2 (en) Displaying object names in association with augmented reality content
CN117597940A (en) User interface for presenting functionality applicable to a camera device
US20250231037A1 (en) Augmented reality-based translations associated with travel

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination