CN114331290A

CN114331290A - Distribution method, apparatus, electronic device, storage medium, and program product

Info

Publication number: CN114331290A
Application number: CN202210003011.XA
Authority: CN
Inventors: 丁一; 丁凡; 沈国斌
Original assignee: Lazas Network Technology Shanghai Co Ltd
Current assignee: Lazas Network Technology Shanghai Co Ltd
Priority date: 2022-01-04
Filing date: 2022-01-04
Publication date: 2022-04-12

Abstract

The embodiments of the present application provide a distribution method, apparatus, electronic device, storage medium, and program product. In the method, distribution resources (such as passengers and riders) travelling on public transportation relay the items they carry until the items reach their final destination, so that items are delivered by means of public transportation. Using public transportation as the means of delivery allows distant destinations within a city to be reached quickly; that is, public transportation serves as the trunk line of the distribution system and enables city-wide delivery over a wider range. Delivery based on public transportation also reduces the use of electric vehicles, which improves the safety of distribution resources to a certain extent. In addition, the method is simple: a distribution resource only needs to pick up or deposit items at the start and end points of its own route and does not need to change routes, so more ordinary passengers can be encouraged to participate as distribution resources, which can significantly improve distribution efficiency.

Description

Distribution method, apparatus, electronic device, storage medium, and program product

Technical Field

The present application relates to the technical field of data processing, and in particular to a distribution method, apparatus, electronic device, storage medium, and program product.

Background

With the rise of online food ordering and similar services, the delivery industry has grown rapidly and the number of people working in it has steadily increased. One newly emerged occupation is the delivery rider, who typically rides an electric vehicle through heavy traffic to deliver meals or parcels to customers on time.

Existing delivery methods based on electric vehicles have clear drawbacks. For example, because of the limited speed and battery capacity of electric vehicles, long-distance delivery is very inconvenient, which tends to harm both timeliness and safety. How to develop a new delivery mode that overcomes these drawbacks has become a research hotspot in the industry.

Summary of the Invention

The purpose of the embodiments of the present application is to solve the technical defects of existing delivery methods, such as a limited delivery range.

According to one aspect of the embodiments of the present application, a distribution method is provided, the method comprising:

in response to a delivery request for any target item, performing at least one allocation of the target item in the following manner until the target item reaches its final destination:

obtaining current environmental information and the order information of each item in each storage cabinet;

determining, based on the current environmental information and the order information of each item, the next destination of each item in public transportation;

upon detecting an item delivery instruction initiated by a distribution resource for any storage cabinet, obtaining the ride destination of the distribution resource;

according to the next destination of each item in that storage cabinet and the ride destination, allocating to the distribution resource at least one item whose next destination is the same as the ride destination, so that the distribution resource carries the at least one item to the storage cabinet at the ride destination for storage.

In an optional implementation, determining the next destination of each item in public transportation based on the environmental information and the order information of each item comprises:

determining the model parameters of a pre-built profit model under a profit-maximization constraint, where the model parameters of the profit model include at least one of: the delivery revenue of each item, the number of allocations of each item, the delivery cost of each allocation of each item, the number of overtime orders, and the order overtime cost;

determining, based on the profit-maximization constraint, the current environmental information, and the order information of each item, the next destination of each item in public transportation.
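The parameters listed above suggest a profit of the form "revenue minus per-allocation costs minus overtime penalties". This passage does not state the formula, so the following is only a minimal sketch under that assumption; `ItemEconomics` and `total_profit` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ItemEconomics:
    revenue: float              # delivery revenue of the item
    allocations: int            # number of times the item is allocated (hops)
    cost_per_allocation: float  # delivery cost paid for each allocation
    overtime: bool              # whether the order missed its time limit


def total_profit(items: Iterable[ItemEconomics], overtime_cost: float) -> float:
    """Assumed form: sum(revenue - allocations * cost) - overtime_orders * overtime_cost."""
    items = list(items)
    gross = sum(i.revenue - i.allocations * i.cost_per_allocation for i in items)
    overtime_orders = sum(1 for i in items if i.overtime)
    return gross - overtime_orders * overtime_cost


items = [
    ItemEconomics(revenue=10.0, allocations=2, cost_per_allocation=2.0, overtime=False),
    ItemEconomics(revenue=8.0, allocations=1, cost_per_allocation=2.0, overtime=True),
]
print(total_profit(items, overtime_cost=5.0))
```

Under this assumed form, each additional relay hop lowers profit by one allocation cost, so a scheduler maximizing profit trades extra hops against the risk of an overtime penalty.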

Based on the environmental information and the order information of each item, the next destination of each item in public transportation is determined by a reinforcement learning scheduling model, where the reinforcement learning scheduling model is obtained based on a reinforcement learning algorithm;

where the decision behavior (action) of the reinforcement learning algorithm includes determining the next destination for an item, the environment state of the reinforcement learning algorithm includes the environmental information and the order information of the items, and the reward of the reinforcement learning algorithm includes the profit obtained after an item reaches its final destination.

In an optional implementation, obtaining the reinforcement learning scheduling model based on a reinforcement learning algorithm comprises:

obtaining multiple training samples, each of which includes the environmental information at a historical moment and the order information of multiple items at that historical moment;

determining the historical environment state of the reinforcement learning algorithm from each training sample; determining, based on each training sample, the next destinations in public transportation of the multiple items at the historical moment as the historical decision behavior of the reinforcement learning algorithm; and determining the historical reward of the reinforcement learning algorithm from the profit obtained after the multiple items at the historical moment reach their final destinations;

performing reinforcement learning based on the historical environment states, historical decision behaviors, and historical rewards to obtain the reinforcement learning scheduling model.
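The training procedure above can be illustrated with a toy example. This passage does not name a specific reinforcement learning algorithm, so the sketch below assumes tabular Q-learning replayed over historical transitions, and it collapses the environment state to the item's current station for brevity. The station names, rewards, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical station graph: from each station, the candidate next destinations.
REACHABLE = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def train_q_table(transitions, reachable, alpha=0.5, gamma=0.9, epochs=50):
    """Replay historical (state, action, reward, next_state, done) tuples
    using the standard Q-learning update."""
    q = defaultdict(float)
    for _ in range(epochs):
        for state, action, reward, next_state, done in transitions:
            if done:
                best_next = 0.0
            else:
                best_next = max(
                    (q[(next_state, a)] for a in reachable.get(next_state, [])),
                    default=0.0,
                )
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def best_next_hop(q, state, reachable):
    """Pick the next destination with the highest learned value."""
    return max(reachable[state], key=lambda a: q[(state, a)])


# Historical decisions: the reward is the profit realized when the item arrives at D.
history = [
    ("A", "B", 0.0, "B", False),
    ("B", "D", 6.0, "D", True),   # route via B arrived on time
    ("A", "C", 0.0, "C", False),
    ("C", "D", 3.0, "D", True),   # route via C incurred an overtime penalty
]
q = train_q_table(history, REACHABLE)
print(best_next_hop(q, "A", REACHABLE))
```

In a fuller model the state would also carry the environmental information and order information described above (for example weather, time of day, and remaining delivery time), which generally calls for function approximation rather than a table.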

determining the estimated time of arrival of each item for each candidate route;

determining the next destination of each item in public transportation based on the environmental information, the order information of each item, and the estimated time of arrival of each item for each candidate route.

In an optional implementation, determining the estimated time of arrival of each item for each candidate route comprises:

obtaining, through a preset estimated-time-of-arrival model and according to at least one of the delivery origin of each item, the delivery destination of each item, a preset participation rate, and the delivery time period of each item, the transit time and waiting time of each item for each candidate route;

determining the estimated time of arrival of each item for each candidate route based on the transit time and waiting time of that item for that route;

where the estimated-time-of-arrival model is obtained by Gaussian fitting of the historical transit times and historical waiting times of multiple items.
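As an illustration of the Gaussian fitting described above, the following sketch fits one normal distribution to historical transit times and one to historical waiting times for a single route, then combines them. It assumes the two quantities are independent and that the samples have already been filtered by origin, destination, participation rate, and time period; the sample values and the name `fit_eta` are hypothetical.

```python
from statistics import NormalDist


def fit_eta(transit_minutes, waiting_minutes):
    """Fit a Gaussian to each historical series and add them,
    assuming transit time and waiting time are independent."""
    transit = NormalDist.from_samples(transit_minutes)
    waiting = NormalDist.from_samples(waiting_minutes)
    return transit + waiting  # sum of two independent normal variables


# Hypothetical historical observations for two candidate routes (minutes).
routes = {
    "direct": fit_eta([30, 32, 34], [9, 10, 11]),
    "via_transfer": fit_eta([20, 22, 24], [4, 5, 6]),
}

# Expected arrival time per route, and the chance of arriving within 35 minutes.
for name, eta in routes.items():
    print(name, eta.mean, eta.cdf(35))

fastest = min(routes, key=lambda name: routes[name].mean)
print(fastest)
```

Keeping the full distribution rather than only the mean lets a scheduler compare routes by the probability of meeting the remaining delivery time, not just by expected duration.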

In an optional implementation, the environmental information includes at least one of the following: weather information, time information, distribution resource supply information, and item delivery demand information;

the order information of each item includes at least one of the following: delivery origin, delivery destination, total time limit, remaining delivery time, allocation history, delivery revenue, delivery cost per allocation, and order overtime cost.

According to another aspect of the embodiments of the present application, a distribution method is provided, the method comprising:

sending the ride destination to the scheduling platform;

receiving item information of at least one item, allocated to the distribution resource by the scheduling platform, whose next destination is the same as the ride destination;

displaying the item information.
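The terminal-side steps above can be sketched as follows. This is only an illustration: the transport to the scheduling platform is abstracted as a callable, and `request_items`, `fake_platform`, and the dictionary keys are hypothetical names, not an interface defined by the application.

```python
from typing import Callable, Dict, List


def request_items(
    ride_destination: str,
    send_to_platform: Callable[[str], List[Dict[str, str]]],
) -> List[str]:
    """Send the ride destination, receive the allocated item information,
    and format it for display."""
    allocated = send_to_platform(ride_destination)
    return [
        f"{item['item_id']}: carry to {item['next_destination']}"
        for item in allocated
    ]


def fake_platform(destination: str) -> List[Dict[str, str]]:
    """Stand-in for the scheduling platform; returns one matching item."""
    return [{"item_id": "order-1", "next_destination": destination}]


for line in request_items("Station C", fake_platform):
    print(line)
```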

According to yet another aspect of the embodiments of the present application, a distribution apparatus is provided, the apparatus comprising:

a delivery request receiving module, configured to receive a delivery request for any target item;

an order information obtaining module, configured to obtain current environmental information and the order information of each item in each storage cabinet;

an item destination determining module, configured to determine, based on the current environmental information and the order information of each item, the next destination of each item in public transportation;

a ride destination obtaining module, configured to obtain the ride destination of a distribution resource upon detecting an item delivery instruction initiated by the distribution resource for any storage cabinet;

an allocation module, configured to allocate to the distribution resource, according to the next destination of each item in that storage cabinet and the ride destination, at least one item whose next destination is the same as the ride destination, so that the distribution resource carries the at least one item to the storage cabinet at the ride destination for storage.

According to still another aspect of the embodiments of the present application, a distribution apparatus is provided, the apparatus comprising:

an obtaining module, configured to obtain the ride destination of a distribution resource upon detecting an item delivery instruction initiated by the distribution resource for any storage cabinet;

a sending module, configured to send the ride destination to the scheduling platform;

a receiving module, configured to receive item information of at least one item, allocated to the distribution resource by the scheduling platform, whose next destination is the same as the ride destination;

a display module, configured to display the item information.

According to one aspect of the embodiments of the present application, an electronic device is provided, the electronic device comprising a memory, a processor, and a computer program stored in the memory, where the processor executes the computer program to implement the steps of the distribution method provided by the embodiments of the present application.

According to one aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the distribution method provided by the embodiments of the present application.

According to one aspect of the embodiments of the present application, a computer program product is provided, comprising a computer program, where the computer program, when executed by a processor, implements the steps of the distribution method provided by the embodiments of the present application.

With the distribution method, apparatus, electronic device, storage medium, and program product provided by the embodiments of the present application, distribution resources (such as passengers and riders) travelling on public transportation relay the items they carry until the items reach their final destination, so that items are delivered by means of public transportation. Using public transportation as the means of delivery allows distant destinations within a city to be reached quickly; that is, public transportation serves as the trunk line of the distribution system and enables city-wide delivery over a wider range. Delivery based on public transportation also reduces the use of electric vehicles, which improves the safety of distribution resources to a certain extent. In addition, the method is simple: a distribution resource only needs to pick up or deposit items at the start and end points of its own route and does not need to change routes, so more ordinary passengers can be encouraged to participate as distribution resources, which can significantly improve distribution efficiency.

Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.

FIG. 1 is a schematic diagram of the system architecture of a crowdsourced distribution system provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a distribution method provided by an embodiment of the present application;

FIG. 3 is an example diagram of item distribution provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of reinforcement learning provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a reinforcement learning scheduling model provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of another distribution method provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a distribution apparatus provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of another distribution apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings. It should be understood that the implementations described below with reference to the drawings are exemplary descriptions for explaining the technical solutions of the embodiments of the present application and do not limit those technical solutions.

Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", and "the" used herein may also include the plural. It should be further understood that the terms "comprise" and "include" used in the embodiments of the present application mean that the corresponding features may be implemented as the presented features, information, data, steps, operations, elements, and/or components, without excluding other features, information, data, steps, operations, elements, components, and/or combinations thereof supported in the art. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or the two elements may be connected through an intermediate element. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein indicates at least one of the items it joins; for example, "A and/or B" may be implemented as "A", as "B", or as "A and B".

To make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.

Several terms used in the embodiments of the present application are first introduced and explained:

(1) Public transportation: a transportation system with fixed lines and stations, such as a subway or bus system.

(2) Distribution resource: a person or robot capable of delivering items to a designated location. The person may be a professional courier or rider, or an ordinary passenger; the robot may be a pre-deployed delivery robot.

(3) Crowdsourced distribution: a delivery mode in which delivery tasks formerly performed by employees are outsourced, on a free and voluntary basis, to the general public. It has advantages such as a simple workflow and flexible working hours. In the embodiments of the present application, crowdsourced distribution includes ordinary passengers taking part in item delivery.

(4) Relay distribution: the process in which multiple distribution resources deliver a single item one after another. For example, the first distribution resource carries the item from the starting station through the subway system to a transfer station, and subsequent distribution resources then carry the item on to the final station. The number of times an item is allocated to a distribution resource may also be called the number of relays or the number of hops.

(5) Order: the record of a request for item delivery. Each item to be delivered has corresponding order information, and item delivery may also be called order delivery.

The embodiments of the present application provide a crowdsourced distribution method based on public transportation, which can be implemented by the crowdsourced distribution system shown in FIG. 1. Specifically, as shown in FIG. 1, the crowdsourced distribution system includes a user terminal (lower left), a distribution resource terminal (lower right), storage cabinets (bottom), and a scheduling platform (top), where:

The scheduling platform may be a cloud server platform that interacts with the user terminal, the distribution resource terminal, and the storage cabinets through cloud services. It stores order (item) information and distribution resource information, and schedules and allocates orders.

The user terminal may include an order-placing user terminal, used by a user to place orders, receive information, and so on, and may also include an order-receiving user terminal, used to receive orders and provide the ordered items. The user terminal may interact with the storage cabinets directly or indirectly. The user terminal may be a mobile terminal used by the user, or a client installed on the mobile terminal, such as an application or a mini program, but is not limited thereto. In practice, mobile terminals include but are not limited to mobile phones, smartphones, tablet computers, notebook computers, and smart watches.

The distribution resource terminal may interact with the storage cabinets; it is used to receive information on allocated orders (items) and to open a storage cabinet to pick up or deposit items. The distribution resource terminal may be a mobile terminal used by an ordinary passenger or a rider, or a client installed on the mobile terminal; for the types of distribution resource terminal, refer to the description of the user terminal, which is not repeated here. The distribution resource terminal may also be a pre-deployed robot that interacts with the storage cabinets directly.

The storage cabinets (for example, smart lockers) are installed at the stations of the public transportation system and interact with the cloud server over a Wi-Fi or cellular network to update item information. They are used for the transfer and temporary storage of ordered items.

The crowdsourced distribution method provided by the embodiments of the present application allocates suitable orders to distribution resources according to their travel routes. With storage cabinets deployed at the public transportation stations, one order can be relayed by multiple distribution resources, which improves delivery efficiency, extends the delivery range, and improves delivery safety.

A distribution network based on public transportation is a low-cost network that can draw on millions of passengers as potential couriers. In some metropolitan areas, public transportation is more popular than private cars and taxis because it is faster and causes less congestion and pollution. Using public transportation as a delivery vehicle therefore has significant advantages.

The crowdsourced distribution method based on public transportation provided by the embodiments of the present application is also simple. A distribution resource only needs to pick up or deposit items at the start and end points of its own route and helps with delivery in passing while riding public transportation. It does not need to change routes, its travel plans are not affected, and the extra time it spends is minimized, which reduces delivery cost.

For this reason, the crowdsourced distribution method provided by the embodiments of the present application may also be called a hitchhiking distribution method. It can encourage more ordinary passengers to participate as distribution resources, achieving genuine crowdsourced distribution and further improving delivery efficiency.

The technical solutions of the embodiments of the present application and their technical effects are explained below through the description of several exemplary implementations. Note that the following implementations may refer to, draw on, or be combined with one another; identical terms, similar features, and similar implementation steps in different implementations are not described repeatedly.

An embodiment of the present application provides a distribution method. As shown in FIG. 2, the method includes:

Step S201: in response to a delivery request for any target item, perform at least one allocation of the target item in the following manner until the target item reaches its final destination.

The items to be delivered may be meals or goods in a takeaway order, parcels for same-city instant delivery, and so on; the embodiments of the present application do not limit the type of item. It can be understood that any item, or even living creature, that needs to be delivered within a specified time through outsourced delivery is applicable to the present application and falls within its scope of protection.

In the embodiments of the present application, any target item can reach its final destination through relay distribution by distribution resources. That is, steps S202 to S205 below are executed repeatedly to allocate the target item to at least one distribution resource in turn, so that the allocated distribution resources relay the item to its final destination. Relay distribution is used because passenger flow differs greatly between stations, and routing through a transfer station can often get an order to its final destination faster.

The embodiments of the present application do not specifically limit when steps S202 to S205 are repeated. In one feasible implementation, they may be repeated at a predetermined interval, for example every 5 minutes. In other embodiments, other triggers for repeated execution may be used.

Step S202: obtain current environmental information and the order information of each item in each storage cabinet.

Environmental information is information that affects item delivery, including but not limited to at least one of the following: weather information, time information, distribution resource supply information, and item delivery demand information. Time information includes but is not limited to the date, whether the day is a working day or a holiday, and the time of day. Distribution resource supply information describes the distribution resources available for scheduling, for example distribution resources waiting at a station and distribution resources in transit between two stations. Item delivery demand information describes the items that need to be delivered, for example items stored in the storage cabinets at a station and items being carried by distribution resources between two stations.

Whereas some existing techniques study item delivery only from the supply side using transportation data, the embodiments of the present application combine distribution resource information with item information and handle order scheduling from both the supply and the demand side, which improves the practical effectiveness of the solution.

The order information of an item is order-related information that affects item delivery, including but not limited to at least one of the following: delivery origin, delivery destination, total time limit, remaining delivery time, allocation history, delivery revenue, delivery cost per allocation, and order overtime cost.
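The fields enumerated in step S202 can be grouped into two records. The sketch below is one possible layout, not a structure defined by the application; all field names, types, and units are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EnvironmentInfo:
    weather: str            # weather information
    timestamp: str          # date and time of day, e.g. an ISO 8601 string
    is_holiday: bool        # working day or holiday
    waiting_resources: int  # supply: distribution resources waiting at stations
    riding_resources: int   # supply: distribution resources in transit
    stored_items: int       # demand: items held in storage cabinets
    carried_items: int      # demand: items currently being carried


@dataclass
class OrderInfo:
    origin: str                  # delivery origin
    destination: str             # delivery destination
    time_limit_min: float        # total time limit (minutes, assumed unit)
    remaining_min: float         # remaining delivery time (minutes)
    revenue: float               # delivery revenue
    cost_per_allocation: float   # delivery cost per allocation
    overtime_cost: float         # order overtime cost
    allocation_history: List[str] = field(default_factory=list)

    @property
    def hops(self) -> int:
        """Number of relays so far, derived from the allocation history."""
        return len(self.allocation_history)


order = OrderInfo("Station A", "Station D", 60.0, 45.0, 10.0, 2.0, 5.0)
order.allocation_history.append("Station B")
print(order.hops)
```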

步骤S203：基于当前环境信息以及每个物品的订单信息，确定每个物品在公共交通中的下一个目的地；Step S203: Determine the next destination of each item in public transportation based on the current environment information and the order information of each item;

其中，下一个目的地即下一站点，也可称为下一跳。可以理解，该下一个目的地可能是物品的最终目的地，也可能是一个中转站。Among them, the next destination is the next site, which may also be called the next hop. It can be understood that the next destination may be the final destination of the item, or may be a transit station.

In the embodiments of the present application, the scheduling problem of determining the next destination for an item is treated as a routing decision problem for the item, that is, selecting the item's delivery route.

Step S204: upon detecting an item delivery instruction initiated by a delivery resource for any storage cabinet, obtain the ride destination of the delivery resource.

When the delivery resource arrives at the storage cabinet, it can initiate an item delivery instruction for that cabinet, for example through a smartphone or by tapping the cabinet, and enter its ride destination, which the system then obtains.

Step S205: based on the next destination of each item in the storage cabinet and the ride destination, assign to the delivery resource at least one item whose next destination is the same as the ride destination, so that the delivery resource carries the at least one item to the storage cabinet at the ride destination for storage.

That is, the system decides which item or items the delivery resource carries based on the ride destination of the delivery resource and the next destinations of the items currently in the storage cabinet. If the delivery resource shares the same next hop with one or more items, those items are assigned to it, and the number carried does not exceed the preset carrying capacity of the delivery resource. The embodiments do not specifically limit the carrying capacity; those skilled in the art may set it, or evaluate it in real time, according to the actual situation.
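The matching rule in step S205 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Item` fields, the function name `assign_items`, and the "most urgent first" tie-break are assumptions introduced here; the source only requires that the assigned items share the ride destination as their next hop and that the carrying capacity is not exceeded.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    next_hop: str             # next destination chosen in step S203
    remaining_minutes: float  # remaining delivery time (hypothetical field)


def assign_items(cabinet_items, ride_destination, capacity):
    """Return the items a delivery resource should carry from one cabinet.

    Only items whose next hop equals the ride destination are eligible,
    and at most `capacity` items are handed over.
    """
    eligible = [it for it in cabinet_items if it.next_hop == ride_destination]
    # Assumed tie-break (not stated in the source): most urgent items first.
    eligible.sort(key=lambda it: it.remaining_minutes)
    return eligible[:capacity]
```

A call such as `assign_items(items, "S2", capacity=1)` would return at most one item bound for station `S2`; an empty list means the delivery resource carries nothing from this cabinet.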

As an example, FIG. 3 shows the process of delivering an item with the delivery method provided by the embodiments of the present application. A single order may be delivered by multiple delivery resources; FIG. 3 uses two as an example.

Two main factors are considered when scheduling the delivery of each item: the time limit and the total number of relay hops. The goal of scheduling is to move the item from delivery origin O to delivery destination D within the order's time limit (for example, one hour) using the fewest relay hops.

The simplest approach is to wait only for delivery resources travelling from O to D. The problem is that such delivery resources may be very scarce, so waiting only for them may cause the delivery to time out.

Therefore, the method proposed in the embodiments predicts the passenger flow at stations, uses it to predict the delivery time of an item under different route choices, and then plans a route from the order's origin station to its destination station and selects the most suitable one. For example, a transfer station A may be inserted between O and D; this may increase the delivery cost but avoids a timeout.

Compared with a single-hop route, multi-hop delivery can save time and so meet the time limit, but it makes delivery profit and the estimation of total delivery time more challenging. For this reason, when implementing relay delivery across stations, the embodiments select the route with the fewest relays under the given delivery time limit, so as to reduce delivery cost. In addition, by predicting passenger flow in the subway and predicting delivery time, the timeout rate can be reduced while delivery cost is kept under control, thereby maximizing delivery profit.
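The selection rule described above (fewest relays among the routes that meet the time limit) can be sketched as below. The function name `choose_route`, the `(stations, estimated_minutes)` tuple layout, and the tie-break on estimated time are assumptions made for illustration; how candidate routes and their time estimates are produced is outside this sketch.

```python
def choose_route(routes, time_limit):
    """Pick the feasible route with the fewest hops.

    routes: list of (stations, estimated_minutes), where `stations` is the
    ordered list of stations from origin to destination.
    Returns None when no candidate meets the time limit.
    """
    feasible = [r for r in routes if r[1] <= time_limit]
    if not feasible:
        return None
    # Hops = number of legs; ties are broken by the shorter estimate (assumed).
    return min(feasible, key=lambda r: (len(r[0]) - 1, r[1]))
```

With a 60-minute limit, a 70-minute direct route O→D is rejected and a 50-minute route O→A→D is preferred over a 40-minute three-hop route, because it needs fewer relays.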

Because items, delivery resources, and the environment all change dynamically in the real world, the embodiments consider all factors together (time limit, multiple hops, profit, cost, and so on) to improve the applicability and performance of crowdsourced delivery based on public transportation.

It should be noted that the technical solutions provided in the embodiments focus on scheduling within public transportation; that is, the origins and destinations of the scheduled items and delivery resources are all stations on public transportation routes.

In practical applications such as takeaway delivery, the item is the meal ordered by the user. With public transportation serving as the trunk route, there may be branch routes that require other means of delivery: for example, a rider on an electric vehicle brings the meal from the merchant to the storage cabinet at the delivery origin in the public transportation network, and, once the meal reaches the storage cabinet at the delivery destination on the public transportation route, a rider on an electric vehicle delivers it to the user. In other words, the user side interacts with the storage cabinet indirectly through the rider. In this case, the branch routes may be scheduled by a separate scheduling algorithm, which is not limited here. In other embodiments, the merchant may store the meal directly in the storage cabinet at the delivery origin, or the user may pick up the meal directly from the storage cabinet at the delivery destination. In other words, the user side interacts with the storage cabinet directly. In this case, the assignment method provided by the embodiments can be used for scheduling directly. Those skilled in the art will understand that this scenario is only an example; suitable variations based on it are also applicable to the present application and fall within its scope of protection.

In the delivery method provided by the embodiments, delivery resources who ride public transportation carry items in relay to the final destination, so that public transportation is used to deliver items. Distant destinations within the urban area can be reached relatively quickly; that is, public transportation serves as the trunk of the delivery system and enables wider, city-wide delivery. Delivery based on public transportation also reduces the use of electric vehicles, which improves the safety of delivery resources to some extent. In addition, the method is simple: a delivery resource only needs to pick up or store items at the start and end of its own trip and does not need to change its route, so more ordinary passengers can be encouraged to take part as delivery resources, which can significantly improve delivery efficiency.

For item delivery, the platform needs a profit model that trades off the delivery time limit against the number of relay assignments of an item in order to determine the most suitable delivery route, minimizing cost while meeting the time limit. To this end, the embodiments provide a possible implementation, in which step S203 may specifically include:

Here, the delivery revenue of each item is the delivery fee that can be charged for delivering it. The product of the number of assignments of each item and the delivery cost per assignment is the delivery fee to be paid to all delivery resources that carry the item. The product of the number of timed-out orders and the order timeout cost is the delivery fee to be refunded for all timed-out orders.

Over the whole delivery process, profit = total delivery fees charged for all items - delivery fees paid to all delivery resources for delivering all items - delivery fees refunded for all timed-out orders.

In one feasible implementation, assuming that the delivery fee charged is refunded in full if an order times out, the following profit model may be used:

where N is the total number of items; h is the number of relay hops (assignments) for an order's item; CustPay is the delivery fee charged; HopCost is the delivery fee paid to the delivery resource for each hop; and M is the number of timed-out orders.
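The profit formula itself is not reproduced in the text. One form consistent with the verbal definition (fees charged, minus fees paid per hop, minus full refunds for timed-out orders) would be the following; the per-item subscript on the hop count, h_i, is an assumption added here for clarity.

```latex
\text{Profit} = \sum_{i=1}^{N} \left( \text{CustPay} - h_i \cdot \text{HopCost} \right) - M \cdot \text{CustPay}
```

For instance, with N = 3 items, CustPay = 5, HopCost = 1, hop counts of 1, 2 and 2, and M = 1 timed-out order, this form gives 15 - 5 - 5 = 5.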

To maximize the platform's profit, a higher CustPay, a lower HopCost, fewer assignments per item, and fewer timed-out orders are needed. Therefore, when scheduling items under the profit-maximization constraint, the embodiments focus on reducing the number of assignments each item requires while delivering more items within the time limit, and determine the next destination of each item in public transportation on that basis, so that the platform earns as much profit as possible.

The embodiments provide a possible implementation in which step S203 may specifically include: determining the next destination of each item in public transportation from the environmental information and the order information of each item by means of a reinforcement learning scheduling model, where the reinforcement learning scheduling model is obtained with a reinforcement learning algorithm.

Reinforcement learning (RL) is a class of machine learning algorithms commonly used to model sequential decision-making problems. In the RL setting shown in FIG. 4, the agent takes an action at each step and receives a reward, and each action changes the state of the environment. The environment feeds the reward for the action and the changed state back to the agent. Learning consists of the agent learning, through interaction with the environment, how to choose the next action from the environment state and reward so as to maximize the next reward and the total reward. In other words, reinforcement learning can learn from historical data to select the optimal policy (action) for each environment state according to the specified objective function.

Specifically, the embodiments use reinforcement learning and treat the scheduling of an item as routing of the item, that is, selecting the next hop for the package. The process is then naturally formulated as a Markov decision process.

In the embodiments, the environment of the reinforcement learning algorithm comprises the dynamic changes of items and delivery resources, together with dynamic changes in real-world conditions such as weather and time.

In the embodiments, the agent of the reinforcement learning algorithm is an agent that makes a decision for each item separately. The scheduling platform shown in FIG. 1 acts as a meta-agent; that is, the scheduling platform, as the agent, makes decisions for all items in a centralized manner. The scheduling platform can thus obtain the routes of all items, which makes it easy to share the local information of all items and to guide every routing decision for every item from a global perspective.

Further, in the embodiments, the action of the reinforcement learning algorithm consists of determining the next destination for an item, that is, the station among all stations to which the item should go next.

In the embodiments, the environment state of the reinforcement learning algorithm includes the environmental information and the order information of the items. Specifically, the state is of two types: the global state (based on the environmental information) and the local state (based on the item's order information). The global state may include:

(1) Demand (item delivery demand) state, which may be represented as the one-dimensional distribution of items waiting at all stations (that is, stored in storage cabinets) and the two-dimensional distribution of items in transit between every pair of stations (a station × station matrix).

(2) Supply (delivery resource supply) state, which may be represented as the one-dimensional distribution of delivery resources that have entered and are waiting at all stations and the two-dimensional distribution of delivery resources riding between every pair of stations (a station × station matrix).

(3) Context state, which may be represented as the weather, date, working day or holiday, time of day, and so on.

The local state of a single action may be represented as at least one of the following for the corresponding item, without being limited thereto: delivery origin, delivery destination, total time limit, remaining delivery time, assignment history, delivery revenue, delivery cost per assignment, and order timeout cost.

In practical applications, the multi-dimensional state formed by all state variables of a single action can be described by a state space.
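As an illustration of how the global and local state described above could be assembled into a single state vector, here is a minimal sketch. All class names, field names, and the flattening order are assumptions for illustration; the source specifies only which kinds of information the state may contain, and only a subset of the local fields is shown.

```python
from dataclasses import dataclass


@dataclass
class GlobalState:
    waiting_items: list      # items waiting per station, length S
    moving_items: list       # S x S matrix of items in transit
    waiting_resources: list  # delivery resources waiting per station
    moving_resources: list   # S x S matrix of delivery resources riding
    context: list            # e.g. [weather_code, holiday_flag, hour]


@dataclass
class LocalState:
    origin: int              # station index of the delivery origin
    destination: int         # station index of the delivery destination
    time_limit: float
    remaining_time: float


def flatten_state(g, loc):
    """Concatenate global and local state into one flat feature vector."""
    vec = list(g.waiting_items)
    for row in g.moving_items:
        vec.extend(row)
    vec.extend(g.waiting_resources)
    for row in g.moving_resources:
        vec.extend(row)
    vec.extend(g.context)
    vec.extend([loc.origin, loc.destination, loc.time_limit, loc.remaining_time])
    return [float(x) for x in vec]
```

For S stations the vector has 2S + 2S² entries for demand and supply, plus the context and local fields, which is why a function approximator that handles high-dimensional input is a natural fit.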

In the embodiments, the reward of the reinforcement learning algorithm includes the profit obtained after the item reaches its final destination. Because the agent's goal is to maximize the expected return of each state, the reward is set so that, as long as the time limit is met, the algorithm selects the route with the fewest hops in order to maximize profit. Specifically, the reward is set to 0 before the item reaches its final destination, and otherwise to the profit obtained after the item reaches its final destination. In other words, a reward is generated only when the item reaches its final destination.

In one feasible implementation, assuming that the delivery fee charged is refunded in full if an order times out, the reward may be set as follows:

Here, CustPay_i is the delivery fee charged for item i. The formula also contains the delivery fee paid to all delivery resources that deliver item i, and an indicator function whose value depends on whether the delivery of item i has timed out.
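A reward consistent with this description can be sketched as follows. The reward formula and the indicator values are not reproduced in the text, so the convention used here (indicator 1 for on-time delivery, 0 for a timeout, which models the full refund) and the function name `item_reward` are assumptions.

```python
def item_reward(arrived, on_time, cust_pay, hop_cost, hops):
    """Sparse reward for one item under the full-refund assumption.

    The reward is 0 until the item reaches its final destination; after
    arrival it is the profit earned on that item.
    """
    if not arrived:
        return 0.0
    indicator = 1.0 if on_time else 0.0  # assumed indicator convention
    return cust_pay * indicator - hop_cost * hops
```

Under this sketch an on-time item with fee 5 and two hops at cost 1 yields 3, while the same item delivered late yields -2, so both extra hops and timeouts are penalized.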

In practical applications, the value function Q(s, a) = E[G_t | s_t = s, a_t = a] can be used to estimate the expected return of choosing each action (a_t = a) in the current environment state (s_t = s), so that the action with the highest expected return is selected for execution. Here G_t is the current discounted return, γ ∈ (0, 1] is the discount rate, and r is the reward; that is, G_t is the sum of the rewards discounted at rate γ.
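The discounted return can be computed as in the sketch below. The expression for G_t is not reproduced in the text; this sketch assumes the common convention in which the k-th future reward is weighted by γ^k, and the function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k].

    `rewards` lists the rewards from the current step onward; the sum is
    accumulated backwards so each step is one multiply and one add.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Because the reward in this scheme arrives only at the final destination, a positive terminal reward reached after more steps is discounted more heavily whenever γ < 1.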

In the embodiments, a neural network may be used to represent the value function, for example an approximate value function Q(s, a; θ) with parameters θ, where θ can be updated with various reinforcement learning algorithms.

In one feasible implementation, a deep Q-network (DQN) with an experience replay mechanism may be chosen to represent the approximate value function and update its parameters. A DQN can handle high-dimensional perceptual input (the environment state) and so cope with large numbers of stations, items, and delivery resources. The experience replay mechanism determines how training data are stored and sampled during DQN training; it avoids the effect of temporal correlation on training, improves the utilization of training data, and further improves the efficiency of reinforcement learning.

In the embodiments, in each iteration of reinforcement learning the DQN updates its parameters using the following loss function:

L(θ) = E_{(s,a,r,s')~U(D)}[(r + γ max_{a'} Q(s', a'; θ^-) - Q(s, a; θ))^2]

where U(D) is the uniform distribution over the replay buffer D. In experience replay, each sampled training tuple e_t = (s_t, a_t, r_t, s_{t+1}) is stored in the replay buffer D_t = {e_1, ..., e_t}.
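The replay buffer and the squared temporal-difference error in this loss can be sketched in plain Python as below. This is an illustration under stated assumptions, not the network training code: `q` and `q_target` are caller-supplied callables standing in for the online network Q(·; θ) and the network evaluated with θ^- (conventionally a periodically updated target copy in DQN), and the `done` flag for terminal transitions is an addition that does not appear in the formula.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly as in U(D)."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are dropped

    def store(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.data), batch_size)


def td_loss(batch, q, q_target, gamma):
    """Mean squared TD error over a batch of transitions.

    q(s, a) returns one action value; q_target(s) returns the list of
    action values for state s under the target parameters.
    """
    total = 0.0
    for s, a, r, s_next, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_target(s_next))
        total += (r + bootstrap - q(s, a)) ** 2
    return total / len(batch)
```

In a full implementation the gradient of this loss with respect to θ would drive the parameter update; sampling uniformly from the buffer is what breaks the temporal correlation between consecutive transitions.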

In the embodiments, obtaining the reinforcement learning scheduling model with a reinforcement learning algorithm may specifically include:

That is, the embodiments use a deep RL method to learn the most suitable route for each item from massive historical traffic data and order data, learning under different fee settings (for example CustPay and HopCost) and environmental factors (weather, time, supply, demand, and so on).

Specifically, large amounts of historical public transportation data and order data can be collected from ubiquitous mobile devices and infrastructure. The historical traffic data are used to estimate the transit time between each pair of stations, the historical order data are used to run simulations, and a scheduling model is then built by reinforcement learning. The reinforcement learning scheduling model takes both practical factors and subsequent decisions into account; that is, the correlation between decisions is considered in subsequent scheduling decisions.

Because the delivery method provided by the embodiments has a "hitchhiking" nature, the delivery process does not change the vehicles or the routes travelled; delivery resources do not need to make detours. The delivery process of items can therefore be simulated effectively from real historical data, and the scheduling policy can be learned from large-scale historical data.

In the embodiments, the above profit model may also be incorporated into the reinforcement learning algorithm, so that its actions are learned with profit maximization as the ultimate goal.

Specifically, the profit model can guide the design of the reward function for each action of the reinforcement learning algorithm. This corresponds to the reward described above, which includes the profit obtained after the item reaches its final destination.

In addition, each time the scheduling platform, acting as the meta-agent, takes an action, it predicts the total return (that is, the platform profit) over a predetermined future period (for example, several hours), guiding each action from a more global perspective, with the ultimate goal of maximizing profit for the platform.

The embodiments provide a possible implementation, in which step S203 may specifically include:

Estimated time of arrival (ETA) is a concept commonly used in map navigation and logistics; it refers to the time at which a person or vehicle is expected to arrive at a place. In the embodiments, the ETA can be used to estimate the time required to deliver an item between any two stations.

In the embodiments, the delivery time of an item is divided into two parts: waiting time and transit time. The waiting time is the time the item spends stored in storage cabinets (including those at transfer stations), specifically the time between the item being placed in a cabinet and taken out. The transit time is the time the item is carried by a delivery resource, specifically the time between the delivery resource taking the item out and placing it at the next destination.

When an item is carried by multiple delivery resources, both the waiting time and the transit time consist of several segments. Specifically, the total delivery time T_D is calculated as follows:

where T_W is the waiting time before each assignment of the item, T_R is the transit time of each assignment, and n is the number of assignments of the item.
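The formula for T_D is not reproduced in the text. A form consistent with the definitions above, writing T_W^{(k)} and T_R^{(k)} for the waiting and transit time of the k-th assignment (the superscript notation is an assumption introduced here), would be:

```latex
T_D = \sum_{k=1}^{n} \left( T_W^{(k)} + T_R^{(k)} \right)
```

For an item relayed twice (n = 2) that waits 5 and 8 minutes and rides 12 and 15 minutes, this form gives T_D = 40 minutes.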

From experimental observation of historical data, the inventors found that both the waiting time T_W and the transit time T_R follow Gaussian distributions. Note that the transit time is estimated for each pair of stations. In theory, choosing different routes between two stations leads to different transit times, but within a given time period people's route choices tend to be fixed (for example, during peak hours people generally choose the less time-consuming route). For each period of the day the route choice is therefore almost fixed, so the transit time also follows a Gaussian distribution.

In one feasible implementation, a parametric Gaussian N(μ, σ²) may be used to model the waiting time and the transit time. The Gaussian yields not only the mean (μ) of the time distribution but also its variance (σ²), and the variance provides a statistical guarantee for the total delivery time.

Specifically, the Gaussian distribution model constructed for the waiting time T_W is T_W ~ N(μ_W, σ_W²), where μ_W and σ_W² are the mean and variance of T_W. The Gaussian distribution model constructed for the transit time T_R is T_R ~ N(μ_R, σ_R²), where μ_R and σ_R² are the mean and variance of T_R. On this basis, the transit time and waiting time of each item can be fitted with Gaussians.

Further, when passengers serve as delivery resources, their willingness affects the waiting time estimate, because not all passengers are willing to deliver items and the number who take part affects the delivery time. A participation rate ρ is therefore used to model overall willingness (for example, ρ = 0.1 means that 10% of passengers are willing to deliver items). The participation rate may be determined from the delivery cost per assignment and/or the delivery time period; those skilled in the art may choose the specific method according to the actual situation, which is not limited here.

For a fixed time period and participation rate, the transit time and waiting time of each item can be fitted with Gaussians. That is, in the embodiments, determining the estimated time of arrival of each item for each candidate route may include: obtaining, through a preset ETA model, the transit time and waiting time of each item for each candidate route from at least one of the item's delivery origin, the item's delivery destination, a preset participation rate (ρ), and the item's delivery time period; and determining the ETA of each item for each candidate route from those transit and waiting times. The ETA model ([μ_W, σ_W, μ_R, σ_R]) is obtained by Gaussian fitting of the historical transit times and historical waiting times of multiple items.

Specifically, the ETA result may be a four-dimensional tensor [μ_W, σ_W, μ_R, σ_R] = H(ρ, t, o, d), where μ_W and σ_W are the Gaussian parameters of the estimated waiting time, μ_R and σ_R are the Gaussian parameters of the estimated transit time, o and d are the origin and final-destination stations of the item's delivery, ρ is the participation rate, and t is the delivery time period.
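One way to build such a lookup H from historical records is sketched below. It fits each Gaussian by its sample mean and population standard deviation; the record layout, the dictionary representation of H, and the function name `fit_eta_model` are assumptions for illustration rather than the model described in the source.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_eta_model(records):
    """Fit per-key Gaussians to historical waiting and transit times.

    records: iterable of (rho, period, origin, dest, wait_min, ride_min).
    Returns H with H[(rho, period, origin, dest)] =
    (mu_W, sigma_W, mu_R, sigma_R).
    """
    groups = defaultdict(lambda: ([], []))
    for rho, period, o, d, wait, ride in records:
        waits, rides = groups[(rho, period, o, d)]
        waits.append(wait)
        rides.append(ride)
    return {
        key: (mean(w), pstdev(w), mean(r), pstdev(r))
        for key, (w, r) in groups.items()
    }
```

Keys with few observations will have unreliable or zero standard deviations, so a practical version would likely need a minimum sample count or smoothing across neighbouring time periods.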

In the embodiments, the purpose of estimating time with the ETA model is to obtain more information about the time distribution, so that a route filter can be designed from the ETA. A city has hundreds of public transportation stations, so determining the next destination for each item is computationally expensive. The ETA makes it possible to filter out routes on which an item would time out, which improves routing performance and safeguards the item's delivery time.

In the embodiments, ETA awareness may also be combined with the reinforcement learning scheduling model when determining the next destination of each item in public transportation, in order to eliminate infeasible actions and avoid the slow training convergence and degraded routing performance caused by an excessive number of candidate actions. Specifically, ETA information collected offline can serve as prior information for filtering, which accelerates training and helps the trained model achieve on-time delivery.

In practical applications, the Gaussian quantile function (the inverse of the cumulative distribution function, CDF) can be used to ensure that the probability that the actual arrival time is less than or equal to the estimated arrival time is p (which may be called the on-time rate parameter).

The embodiments provide a feasible implementation in which, when filtering is performed, the probability of a candidate action a_j may be:

其中，

是从当前站o_t到最终目的地d_i的预计到达时间，而r_i是物品i的剩余配送时间。

的计算公式可以为：in,

is the estimated time of arrival from the current station _o _t to the final destination di, and ri is the remaining delivery time for item _i .

The calculation formula can be:

其中，s_j是在当前时间t，物品i的决策行为a_j确定的下一个目的地(中转站)。即预计派送时间为当前站的等待时间，当前站到中转站的运送时间、中转站的等待时间、中转站到最终目的地的运送时间之和。可以理解，如果决策行为a_j确定的下一个目的地恰好是最终目的地，则Among them, s _j is the next destination (transit station) determined by the decision-making behavior a _j of item i at the current time t. That is, the estimated delivery time is the sum of the waiting time of the current station, the transportation time from the current station to the transfer station, the waiting time of the transfer station, and the transportation time from the transfer station to the final destination. It can be understood that if the next destination determined by the decision behavior a _j happens to be the final destination, then

根据ETA结果[μ_W，σ_W，μ_R，σ_R]＝H(ρ，t，o，d)，

的计算公式中的每一项分别使用以下高斯分位数函数计算：According to ETA results [μ _W , σ _W , μ _R , σ _R ]=H(ρ, t, o, d),

Each term in the formula is calculated using the following Gaussian quantile function:

其中，erf()是误差函数，0<p<1，可以理解的是，较大的p值可以保证更多物品准时交付，但会过滤一些潜在的低成本路线。因此，参数p的设置是准时交付率和利润之间的权衡，本领域技术人员可以根据实际情况对p进行设置，本申请实施例在此不做限定。作为示例地，p＝0.9，即90％的实际到达时间小于或等于预计到达时间。使用这种机制能够最大程度地保证系统的总时间限制。where erf() is the error function, 0<p<1, it is understandable that a larger value of p can guarantee more items to be delivered on time, but will filter some potentially low-cost routes. Therefore, the setting of the parameter p is a trade-off between the on-time delivery rate and the profit, and those skilled in the art can set p according to the actual situation, which is not limited in this embodiment of the present application. As an example, p=0.9, ie 90% of the actual arrival times are less than or equal to the expected arrival times. Using this mechanism maximizes the total time limit of the system.
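The ETA filter described above can be illustrated with a minimal Python sketch. Several points are assumptions made only for illustration: μ_W, σ_W are read as the mean and standard deviation of the waiting time and μ_R, σ_R as those of the ride time; `eta_model` is a hypothetical stand-in for H that ignores ρ and t; and a candidate is treated as feasible when its estimated time of arrival does not exceed the remaining delivery time r_i. The quantile itself comes from `statistics.NormalDist.inv_cdf` in the standard library.

```python
from statistics import NormalDist


def quantile_time(mu, sigma, p):
    """Duration that a Gaussian(mu, sigma) time stays under with probability p."""
    if sigma == 0:
        return mu  # degenerate case: the duration is deterministic
    return NormalDist(mu, sigma).inv_cdf(p)


def leg_eta(mu_w, sigma_w, mu_r, sigma_r, p):
    """One leg: quantile of the waiting time plus quantile of the ride time."""
    return quantile_time(mu_w, sigma_w, p) + quantile_time(mu_r, sigma_r, p)


def route_eta(eta_model, origin, next_stop, final, p=0.9):
    """ETA from origin to final when the item is first sent to next_stop.

    eta_model(o, d) is a hypothetical callable returning
    (mu_w, sigma_w, mu_r, sigma_r) for the leg o -> d.
    """
    eta = leg_eta(*eta_model(origin, next_stop), p)
    if next_stop != final:
        # Transfer case: add waiting and ride time of the second leg.
        eta += leg_eta(*eta_model(next_stop, final), p)
    return eta


def is_feasible(eta_model, origin, next_stop, final, remaining, p=0.9):
    """Assumed filter rule: keep the action if its ETA fits the remaining time."""
    return route_eta(eta_model, origin, next_stop, final, p) <= remaining
```

A larger p lengthens every estimated leg, so fewer candidate stops pass `is_feasible`, which mirrors the trade-off between on-time rate and low-cost routes noted above.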

In the embodiments of the present application, as shown in FIG. 5, the process by which the ETA-aware RL scheduling algorithm determines the next destination of each item in public transportation may include:

Updating the global state and the local state of all items according to the current environment information and the order information of all items.

(Optional step) Sorting all items by their remaining delivery time (for example in ascending order, but not limited thereto). The purpose of this step is that, if the number of items to be delivered exceeds the capacity of the distribution resources, the more urgent items (those with less remaining delivery time) can be allocated first. For example, a remaining delivery time less than or equal to 0 indicates that the item is already overdue, and such an item is allocated to a distribution resource with priority.

For each item i, performing the following operations:

Calculating the value Q(s, a_j) of every decision behavior a_j through the DQN;

Filtering all decision behaviors through the ETA filter according to the input ETA result H(ρ, t, o, d) to obtain the filtered decision behaviors;

Selecting, according to the epsilon-greedy algorithm, the decision behavior with the highest value among the filtered decision behaviors (while still giving other, low-probability decision behaviors some chance).

After the next destination in public transportation has been determined for all items, allocating order items to distribution resources according to the routes.
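The per-item loop above can be sketched in Python as follows. This is only an illustration under stated assumptions: `q_function` is a hypothetical stand-in for the DQN forward pass that returns a dictionary of candidate stops and their values, `eta_ok` is a hypothetical ETA filter predicate, items are plain dictionaries with `id` and `remaining` keys, and the state update and the final allocation to distribution resources are omitted.

```python
import random


def choose_next_stop(q_values, feasible, epsilon, rng):
    """Epsilon-greedy choice restricted to actions that passed the ETA filter."""
    candidates = [a for a in q_values if feasible(a)]
    if not candidates:
        return None  # no candidate meets the time limit; the caller decides
    if rng.random() < epsilon:
        return rng.choice(candidates)  # exploration among filtered actions
    return max(candidates, key=lambda a: q_values[a])  # highest-value action


def schedule(items, q_function, eta_ok, epsilon=0.1, rng=None):
    """Return {item id: chosen next stop}, processing urgent items first."""
    rng = rng or random.Random()
    plan = {}
    # Optional step: ascending remaining delivery time, most urgent first.
    for item in sorted(items, key=lambda it: it["remaining"]):
        q_values = q_function(item)  # stand-in for Q(s, a_j) from the DQN
        plan[item["id"]] = choose_next_stop(
            q_values, lambda a, it=item: eta_ok(it, a), epsilon, rng
        )
    return plan
```

Filtering before the epsilon-greedy draw means that exploration also never selects a stop the ETA filter has ruled out, which is one way to realize the pruning of infeasible decision behaviors.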

The distribution method provided by the embodiments of the present application enables crowdsourced delivery based on public transportation and introduces a practically meaningful profit model in order to maximize the platform's profit from item delivery. To account for the influence on order scheduling of practical factors such as time limits, multi-hop delivery, passengers' willingness to participate, and profit, the embodiments of the present application construct a reinforcement learning scheduling model that learns a scheduling policy from massive historical traffic data and order data. Order scheduling is formulated as a routing decision problem, namely selecting the next destination for the delivery of an item. The estimated time of arrival module is intended to speed up the training process and to provide an overall delivery time guarantee.

Statistics on the delivery data corresponding to this solution show that, compared with existing distribution algorithms, this solution increases the profit margin by 40% and the delivery rate by 29%. Using the estimated time of arrival model increases the profit margin and the delivery rate by 9% and 8%, respectively.

In addition, because the training sample data of the embodiments of the present application come from massive transportation data and delivery data collected from ubiquitous mobile devices and infrastructure, applications other than crowdsourced delivery can also be supported.

The embodiments of the present application further provide a distribution method executed by a distribution resource terminal. As shown in FIG. 6, the method includes:

Step S601: when an item delivery instruction initiated by a distribution resource for any storage cabinet is detected, acquiring the ride destination of the distribution resource;

When the distribution resource arrives at the storage cabinet, it can initiate the item delivery instruction for that storage cabinet through a smartphone or by tapping the storage cabinet, and enter its ride destination.

Step S602: sending the ride destination to the scheduling platform;

The scheduling platform determines the next destination in public transportation for each item in the storage cabinet. The specific manner of determination is described above and is not repeated here.

After receiving the ride destination of the distribution resource, the scheduling platform decides which item or items the distribution resource is to carry according to the ride destination and the next destinations of the items currently in the storage cabinet.

Step S603: receiving item information of at least one item that the scheduling platform has allocated to the distribution resource and whose next destination is the same as the ride destination;

If the distribution resource shares the same next hop with one or more items, the scheduling platform allocates one or more of those items to the distribution resource, and the number carried does not exceed the preset carrying capacity of the distribution resource. The embodiments of the present application do not specifically limit the carrying capacity of the distribution resource; those skilled in the art can set it according to the actual situation or evaluate it in real time.

Step S604: displaying the item information.

After allocating one or more items to the distribution resource, the scheduling platform sends the information of the allocated items to the distribution resource for viewing, so that the distribution resource knows which item or items to take out of the storage cabinet and carry.
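The platform-side matching behind steps S602 and S603 can be sketched as follows. The dictionary keys `next_stop` and `remaining` and the function name are hypothetical, and ordering the matching items by urgency before applying the capacity limit is an assumption borrowed from the optional sorting step of the scheduling algorithm, not something the allocation step itself prescribes.

```python
def allocate_items(locker_items, ride_destination, capacity):
    """Pick items in one storage cabinet whose next hop equals the ride destination.

    locker_items: list of dicts with hypothetical keys "id", "next_stop", "remaining".
    capacity: preset carrying capacity of the distribution resource.
    """
    matching = [it for it in locker_items if it["next_stop"] == ride_destination]
    # Assumption: when more items match than can be carried, take the most urgent.
    matching.sort(key=lambda it: it["remaining"])
    return matching[:capacity]
```

An empty result simply means that no item in the cabinet shares the rider's next hop, so nothing is displayed for pickup.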

In the distribution method of the embodiments of the present application, distribution resources travelling by public transportation carry the items, so that items are delivered by means of public transportation. Using public transportation as the means of transport for delivery makes it possible to reach distant destinations within the urban area relatively quickly; that is, public transportation serves as the trunk of the distribution system and enables city-wide delivery over a larger range. Moreover, delivery based on public transportation can reduce the use of electric vehicles in delivery, which improves the safety of distribution resources to a certain extent. In addition, the distribution method provided by the embodiments of the present application is very simple: a distribution resource only needs to pick up or deposit items at the starting point and the end point of its own route and does not need to change its route, so more ordinary passengers can be encouraged to participate as distribution resources, which can significantly improve delivery efficiency.

The operation flow of the crowdsourced distribution system shown in FIG. 1 is described below, taking an online food ordering scenario as an example:

(1) After a user orders food online, the merchant places the takeaway, directly or indirectly through a rider, in the meal pickup cabinet (storage cabinet) at the delivery starting station of a public transportation line. The online food order contains the destination to which the takeaway is to be delivered and the guaranteed delivery time of the takeaway, such as half an hour or one hour.

(2) Based on the current information on the environment and on all orders, the scheduling platform decides the next destination in public transportation of all items (including the takeaway).

(3) When a passenger (or a rider) participating in delivery arrives at the meal pickup cabinet, the passenger can enter his or her destination through a smartphone or directly at the cabinet. The scheduling platform decides which item or items the passenger is to carry according to the passenger's input and the planned routes of the items currently in the cabinet. If the passenger shares the same next hop with the takeaway, the takeaway is assigned to the passenger.

(4) When the passenger arrives at his or her destination (not necessarily the final destination of the takeaway), the passenger places the takeaway in the meal pickup cabinet at that destination and triggers a task completion command. Another passenger then repeats the above process, until the takeaway has been delivered to its final destination in public transportation. Passengers only need to pick up and deposit packages at the origin and destination of their own journeys, without making any detour. Delivering takeaway is therefore purely hitchhiking: the passenger's route and behavior do not change.

(5) When the takeaway arrives at its final destination in public transportation, the user can open the meal pickup cabinet and collect the takeaway directly, or a rider can collect the takeaway from the cabinet and deliver it to the user. If the delivery exceeds the time limit, the platform compensates the user.
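The relay in steps (3) to (5) can be pictured with a small Python sketch. All names here (`Takeaway`, `carry`, `settle`), the use of minutes as the time unit, and the flat compensation amount are assumptions made for illustration; each hop's duration is taken to include any waiting time at the cabinet.

```python
from dataclasses import dataclass, field


@dataclass
class Takeaway:
    final_stop: str      # final destination in public transportation
    time_limit: float    # guaranteed delivery time, in minutes
    location: str        # station whose cabinet currently holds the item
    elapsed: float = 0.0
    carriers: list = field(default_factory=list)


def carry(item, passenger, destination, minutes):
    """One hop: a passenger moves the item to the cabinet at their own destination."""
    item.carriers.append(passenger)
    item.location = destination
    item.elapsed += minutes


def settle(item, compensation):
    """Compensation owed to the user once the item is at its final stop."""
    if item.location != item.final_stop:
        raise ValueError("item has not reached its final stop yet")
    return compensation if item.elapsed > item.time_limit else 0.0
```

For instance, an item with a 60-minute guarantee that is carried for 25 minutes by one passenger and 30 minutes by another arrives within the limit, so `settle` returns 0.0.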

In this method, passengers or riders travelling by public transportation relay the takeaway to its final destination, so that takeaway is delivered by means of public transportation. Distant destinations within the urban area can be reached relatively quickly, which means that users can expect to order meals online from more distant locations. Moreover, delivery based on public transportation can reduce the use of electric vehicles in delivery, which improves the safety of distribution resources to a certain extent. In addition, the distribution method provided by the embodiments of the present application is very simple: a distribution resource only needs to pick up or deposit items at the starting point and the end point of its own route and does not need to change its route, so more ordinary passengers can be encouraged to participate as distribution resources, which can significantly improve delivery efficiency.

The embodiments of the present application provide a distribution device. As shown in FIG. 7, the distribution device 70 may include: a delivery request receiving module 701, an order information acquisition module 702, an item destination determination module 703, a ride destination acquisition module 704, and an allocation module 705, wherein:

The delivery request receiving module 701 is configured to receive a delivery request for any target item;

The order information acquisition module 702 is configured to acquire current environment information and the order information of each item in each storage cabinet;

The item destination determination module 703 is configured to determine the next destination of each item in public transportation based on the current environment information and the order information of each item;

The ride destination acquisition module 704 is configured to acquire the ride destination of a distribution resource when an item delivery instruction initiated by the distribution resource for any storage cabinet is detected;

The allocation module 705 is configured to allocate to the distribution resource, according to the next destination of each item in the storage cabinet and the ride destination, at least one item whose next destination is the same as the ride destination, so that the distribution resource carries the at least one item to a storage cabinet at the ride destination for storage.

In an optional implementation, when determining the next destination of each item in public transportation based on the environment information and the order information of each item, the item destination determination module 703 is specifically configured to:

In an optional implementation, when determining the estimated time of arrival corresponding to each route that each item may take, the item destination determination module 703 is specifically configured to:

The devices of the embodiments of the present application can execute the methods provided by the embodiments of the present application, and their implementation principles are similar. The actions performed by the modules of the devices correspond to the steps of the methods of the embodiments of the present application. For a detailed functional description and the beneficial effects of each module, reference may be made to the description of the corresponding method above, which is not repeated here.

The embodiments of the present application provide a distribution device. As shown in FIG. 8, the distribution device 80 may include: an acquisition module 801, a sending module 802, a receiving module 803, and a display module 804, wherein:

The acquisition module 801 is configured to acquire the ride destination of a distribution resource when an item delivery instruction initiated by the distribution resource for any storage cabinet is detected;

The sending module 802 is configured to send the ride destination to the scheduling platform;

The receiving module 803 is configured to receive item information of at least one item that the scheduling platform has allocated to the distribution resource and whose next destination is the same as the ride destination;

The display module 804 is configured to display the item information.
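The cooperation of modules 801 to 804 can be sketched as a single terminal-side class. This is a minimal illustration only: the class name, the `allocate(locker_id, destination)` method of the platform object, and the message format are all hypothetical, and the network transport between terminal and scheduling platform is left out.

```python
class DistributionTerminal:
    """Terminal-side flow: acquire destination, send it, receive items, display them."""

    def __init__(self, platform, show=print):
        # platform: hypothetical object exposing allocate(locker_id, destination)
        self.platform = platform
        self.show = show

    def on_delivery_instruction(self, locker_id, ride_destination):
        # Acquisition (801) and sending (802): forward the rider's destination.
        items = self.platform.allocate(locker_id, ride_destination)
        # Receiving (803) and display (804): present each allocated item.
        for item in items:
            self.show(f"Take item {item['id']} to the cabinet at {ride_destination}")
        return items
```

Passing the display function in as a parameter keeps the sketch independent of any particular screen or app and makes the flow easy to exercise with a stub platform.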

The embodiments of the present application provide an electronic device, including a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the foregoing method embodiments.

In an optional embodiment, an electronic device is provided. As shown in FIG. 9, the electronic device 900 includes a processor 901 and a memory 903. The processor 901 is connected to the memory 903, for example through a bus 902. Optionally, the electronic device 900 may further include a transceiver 904, which can be used for data interaction between the electronic device and other electronic devices, such as sending and/or receiving data. It should be noted that, in practical applications, the number of transceivers 904 is not limited to one, and the structure of the electronic device 900 does not constitute a limitation on the embodiments of the present application.

The processor 901 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor 901 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

The bus 902 may include a path for transferring information between the above components. The bus 902 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 902 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in FIG. 9, but this does not mean that there is only one bus or one type of bus.

The memory 903 may be a ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store a computer program and can be read by a computer, which is not limited here.

The memory 903 is used to store the computer program for executing the embodiments of the present application, and execution is controlled by the processor 901. The processor 901 is configured to execute the computer program stored in the memory 903 to implement the steps shown in the foregoing method embodiments.

The electronic device may be the above scheduling platform or the above distribution resource terminal, but is not limited thereto.

The embodiments of the present application provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.

The embodiments of the present application further provide a computer program product, including a computer program. When the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.

It should be understood that, although the operation steps are indicated by arrows in the flowcharts of the embodiments of the present application, the order in which these steps are performed is not limited to the order indicated by the arrows. Unless explicitly stated herein, in some implementation scenarios of the embodiments of the present application, the steps in each flowchart may be performed in other orders as required. In addition, depending on the actual implementation scenario, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages. Some or all of these sub-steps or stages may be executed at the same moment, or each of them may be executed at a different moment. Where the execution moments differ, the execution order of these sub-steps or stages can be flexibly configured as required, which is not limited by the embodiments of the present application.

The above are only optional implementations of some implementation scenarios of the present application. It should be pointed out that, for those of ordinary skill in the art, other similar implementation means based on the technical idea of the present application, adopted without departing from the technical concept of the solution of the present application, also fall within the protection scope of the embodiments of the present application.

Claims

1. A distribution method, characterized by comprising:

in response to a delivery request for any target item, performing at least one allocation of the target item in the following manner until the target item reaches its final destination:

acquiring current environment information and order information of each item in each storage cabinet;

determining the next destination of each item in public transportation based on the current environment information and the order information of each item;

when an item delivery instruction initiated by a distribution resource for any storage cabinet is detected, acquiring the ride destination of the distribution resource;

according to the next destination of each item in the storage cabinet and the ride destination, allocating to the distribution resource at least one item whose next destination is the same as the ride destination, so that the distribution resource carries the at least one item to a storage cabinet at the ride destination for storage.

2. The distribution method according to claim 1, wherein determining the next destination of each item in public transportation based on the environment information and the order information of each item comprises:

determining model parameters of a pre-built profit model under a profit maximization constraint, the model parameters of the profit model including at least one of: the delivery revenue of each item, the number of deliveries of each item, the delivery cost of each delivery of each item, the number of overtime orders, and the order overtime cost;

determining the next destination of each item in public transportation based on the profit maximization constraint, the current environment information, and the order information of each item.

3. The distribution method according to claim 1, wherein determining the next destination of each item in public transportation based on the environment information and the order information of each item comprises:

determining, based on the environment information and the order information of each item, the next destination of each item in public transportation through a reinforcement learning scheduling model, the reinforcement learning scheduling model being obtained based on a reinforcement learning algorithm;

wherein the decision behavior of the reinforcement learning algorithm includes determining the next destination for an item, the environment state of the reinforcement learning algorithm includes the environment information and the order information of the item, and the reward of the reinforcement learning algorithm includes the profit after the item reaches its final destination.

4. The distribution method according to claim 3, wherein the reinforcement learning scheduling model being obtained based on a reinforcement learning algorithm comprises:

acquiring a plurality of training samples, each training sample including environment information at a historical moment and order information of a plurality of items at the historical moment;

determining the historical environment state of the reinforcement learning algorithm according to each training sample; determining, based on each training sample, the next destinations in public transportation of the plurality of items at the historical moment as the historical decision behavior of the reinforcement learning algorithm; and determining the historical reward of the reinforcement learning algorithm according to the profit after the plurality of items at the historical moment reach their final destinations;

performing reinforcement learning based on the historical environment state, the historical decision behavior, and the historical reward to obtain the reinforcement learning scheduling model.

5. A distribution method, characterized by comprising:

sending the ride destination to the scheduling platform;

receiving item information of at least one item that the scheduling platform has allocated to the distribution resource and whose next destination is the same as the ride destination;

displaying the item information.

6. A distribution device, characterized by comprising:

a delivery request receiving module, configured to receive a delivery request for any target item;

an order information acquisition module, configured to acquire current environment information and the order information of each item in each storage cabinet;

an item destination determination module, configured to determine the next destination of each item in public transportation based on the current environment information and the order information of each item;

a ride destination acquisition module, configured to acquire the ride destination of a distribution resource when an item delivery instruction initiated by the distribution resource for any storage cabinet is detected;

an allocation module, configured to allocate to the distribution resource, according to the next destination of each item in the storage cabinet and the ride destination, at least one item whose next destination is the same as the ride destination, so that the distribution resource carries the at least one item to a storage cabinet at the ride destination for storage.

7. A distribution device, characterized by comprising:

an acquisition module, configured to acquire the ride destination of a distribution resource when an item delivery instruction initiated by the distribution resource for any storage cabinet is detected;

a sending module, configured to send the ride destination to the scheduling platform;

a receiving module, configured to receive item information of at least one item that the scheduling platform has allocated to the distribution resource and whose next destination is the same as the ride destination;

a display module, configured to display the item information.

8. An electronic device, comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the steps of the method of any one of claims 1-4 or claim 5.

9. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the steps of the method of any one of claims 1-4 or claim 5 are implemented.

10. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method of any one of claims 1-4 or claim 5 are implemented.