Speaker: Prof. Feng Xiao
Feng Xiao is a professor, doctoral supervisor, and Vice Dean of the Institute of Big Data at Southwestern University of Finance and Economics. He is a recipient of the Excellent Young Scientists Fund of the National Natural Science Foundation of China, a Young Scholar of the Ministry of Education's Changjiang Scholars Program (2017), and a Distinguished Expert of the Sichuan Province Hundred Talents Program. He has led multiple national and provincial/ministerial research projects. His research interests include road congestion pricing, network modeling and optimization, game theory, machine learning and transportation data mining, and intelligent transportation systems. He has published numerous papers in leading international journals and conferences in management science and engineering and in transportation research, such as Transportation Science; Transportation Research Part A, B, C, and D; and ISTTT.
Ride-sourcing services are reshaping the way people travel by effectively connecting drivers and passengers through the mobile internet. Online matching between idle drivers and waiting passengers is one of the key components of a ride-sourcing system. The average pickup distance (or time) is an important measure of system efficiency, since it affects both passengers' waiting times and drivers' utilization rates. It is natural to expect that a more effective bipartite matching (with a smaller average pickup time) can be achieved if the platform accumulates more idle drivers and waiting passengers in the matching pool. An individual passenger request can also benefit from delayed matching, since the passenger may be matched with a closer idle driver after waiting a few seconds. Motivated by the potential benefits of delayed matching, this paper establishes a two-stage framework that combines combinatorial optimization with multi-agent deep reinforcement learning. The reinforcement learning methods dynamically determine the delay time for each passenger request (i.e., the time at which the request enters the matching pool), while the combinatorial optimization performs an optimal bipartite matching between the idle drivers and waiting passengers in the pool. Two reinforcement learning methods are developed: spatio-temporal multi-agent deep Q-learning (ST-M-DQN) and spatio-temporal multi-agent actor-critic (ST-M-A2C). Through extensive empirical experiments with a well-designed simulator, we show that the proposed framework remarkably improves system performance.
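The combinatorial-optimization stage of such a framework can be illustrated with a minimal sketch: given a matrix of pickup distances between the idle drivers and waiting passengers currently in the matching pool, find the one-to-one assignment that minimizes the total (hence average) pickup distance. The brute-force enumeration below is only a toy illustration for tiny pools, not the talk's actual method (a production system would use a scalable assignment solver); the distance values are hypothetical.

```python
from itertools import permutations

def optimal_matching(dist):
    """Return the minimum-cost one-to-one assignment of drivers (rows)
    to passengers (columns) for a small square distance matrix,
    by enumerating all assignments."""
    n = len(dist)
    best_cost, best_assign = float("inf"), None
    for perm in permutations(range(n)):
        # perm[d] is the passenger assigned to driver d.
        cost = sum(dist[d][p] for d, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return best_cost, best_assign

# Toy pool: 3 idle drivers x 3 waiting passengers, pickup distances in km.
dist = [
    [2.0, 4.5, 3.0],
    [1.5, 2.5, 4.0],
    [3.5, 1.0, 2.0],
]
cost, assign = optimal_matching(dist)
print(cost, assign)  # total distance 5.5, assignment (2, 0, 1)
```

Delaying a request simply means keeping it out of `dist` for a few matching rounds; as more idle drivers accumulate, the matrix gains columns of potentially shorter pickup distances, which is the effect the reinforcement learning stage is trained to exploit.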