Bo An (安波): Impression allocation for combating fraud in e-commerce via deep reinforcement learning with action norm penalty
Published: 2018-06-28

Time: June 29, 2017 (Friday), 16:00

Venue: Conference Room 900, 9th Floor, Science and Technology Building

Speaker: Bo An (安波)

Affiliation: Nanyang Technological University, Singapore

Host: School of Management

Speaker biography

Bo An is a Nanyang Assistant Professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He received his Ph.D. degree in Computer Science from the University of Massachusetts, Amherst. His current research interests include artificial intelligence, multiagent systems, game theory, and optimization. He has published over 50 refereed papers at AAMAS, IJCAI, AAAI, ICAPS, KDD, JAAMAS, AIJ and IEEE Transactions. Dr. An was the recipient of the 2010 IFAAMAS Victor Lesser Distinguished Dissertation Award, an Operational Excellence Award from the Commander, First Coast Guard District of the United States, the Best Innovative Application Paper Award at AAMAS-12, the 2012 INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice, and the Innovative Application Award at IAAI-16. He was invited to give an Early Career Spotlight talk at IJCAI-17. He is a member of the editorial board of JAIR and an Associate Editor of JAAMAS. He was elected to the board of directors of IFAAMAS.

Abstract

Fraudulent transactions have become a common way for e-commerce sellers to make their products look more favorable to the platform and to buyers. This lowers the efficiency with which buyer impressions are used and harms the business environment. Fraud detection techniques are necessary but not sufficient for the platform, since it is impossible to recognize every fraudulent transaction. This talk focuses on improving the platform’s impression allocation mechanism to maximize its profit and reduce the sellers’ fraudulent behaviors simultaneously. First, we learn a seller behavior model that predicts the sellers’ fraudulent behaviors from real-world data provided by one of the largest e-commerce companies in the world. Then, we formulate the platform’s impression allocation problem as a continuous Markov Decision Process (MDP) with an unbounded action space. To make the actions executable in practice and to facilitate learning, we propose a novel deep reinforcement learning algorithm, DDPG-ANP, that introduces an action norm penalty into the reward function. Experimental results show that our algorithm significantly outperforms existing baselines in terms of scalability and solution quality. This talk will also outline future research directions.
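
As a rough illustration of what an action norm penalty on the reward could look like, the sketch below subtracts a scaled norm of the action from the environment reward before it would be passed to a learner such as DDPG. The abstract does not give the exact form used in DDPG-ANP, so the L2 norm, the function name `penalized_reward`, and the coefficient `penalty_coef` are all assumptions made for illustration only.

```python
import math


def penalized_reward(reward, action, penalty_coef=0.1):
    """Return the reward minus penalty_coef times the L2 norm of the action.

    Hypothetical helper: the choice of L2 norm and the default
    coefficient are illustrative assumptions, not taken from the talk.
    """
    # L2 norm of the continuous action vector.
    action_norm = math.sqrt(sum(a * a for a in action))
    # Larger actions are penalized more, which discourages the policy
    # from drifting toward arbitrarily large values when the action
    # space is unbounded.
    return reward - penalty_coef * action_norm
```

With a penalty of this kind, two actions that earn the same raw reward are no longer equivalent: the one with the smaller norm receives the higher shaped reward, which is one way to keep learned actions within a range that can be executed in practice.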

(Text: Luo He)
Editor: Xu Xiaohong