Title: Optimal Sampling Gaps for Adaptive Submodular Maximization
Speaker: Shaojie Tang (唐少杰), Associate Professor (University of Texas at Dallas)
Abstract: Running machine learning algorithms on large and rapidly growing volumes of data is often computationally expensive. One common way to reduce the size of a data set, and thus the computational cost of machine learning algorithms, is probability sampling, which creates a sampled data set by including each data point from the original data set with a known probability. Although the benefit of running machine learning algorithms on the reduced data set is obvious, a major concern is that the solution obtained from the samples may perform much worse than the optimal solution computed on the full data set. In this paper, we examine the performance loss caused by probability sampling in the context of adaptive submodular maximization. We consider a simple probability sampling method that selects each data point with probability at least $r\in[0,1]$. If we set $r=1$, our problem reduces to finding a solution based on the original full data set. We define the sampling gap as the largest ratio, over independence systems, between the value of the optimal solution obtained from the full data set and the value of the optimal solution obtained from the samples. Our main contribution is to show that if the sampling probability of each data point is at least $r$ and the utility function is policywise submodular, then the sampling gap is both upper bounded and lower bounded by $1/r$. We show that policywise submodularity arises in a wide range of real-world applications, including pool-based active learning and adaptive viral marketing. This talk is based on joint work with Jing Yuan.
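To make the sampling-gap definition concrete, here is a minimal sketch of a toy special case that is much simpler than the talk's setting: it is non-adaptive, uses an additive (modular) utility rather than a policywise submodular one, and uses a cardinality constraint |S| <= k as the independence system. Each item is kept independently with probability exactly r, and the expected optimum on the sample is computed exactly by enumerating all 2^n possible samples. The function names and weights are hypothetical and are not taken from the talk. In this toy case the ratio never exceeds 1/r, because the full-data optimum restricted to the sample is still feasible (independence systems are downward closed) and keeps an r fraction of its value in expectation; a single item with k = 1 attains 1/r exactly.

```python
from itertools import product
from typing import Dict, List


def opt_value(items: List[str], weights: Dict[str, float], k: int) -> float:
    """Optimal value of a hypothetical additive utility under |S| <= k.

    A cardinality constraint is a uniform matroid (an independence system);
    for an additive utility the optimum takes the k heaviest available items.
    """
    return sum(sorted((weights[x] for x in items), reverse=True)[:k])


def expected_sample_opt(weights: Dict[str, float], k: int, r: float) -> float:
    """Exact E[optimum on the sample] when each item is kept with probability r.

    Enumerates every keep/drop pattern, so it is only meant for tiny inputs.
    """
    items = list(weights)
    expected = 0.0
    for mask in product([False, True], repeat=len(items)):
        kept = [x for x, keep in zip(items, mask) if keep]
        # Probability of this exact sample under independent Bernoulli(r) sampling.
        prob = 1.0
        for keep in mask:
            prob *= r if keep else (1.0 - r)
        expected += prob * opt_value(kept, weights, k)
    return expected


def sampling_ratio(weights: Dict[str, float], k: int, r: float) -> float:
    """Full-data optimum divided by expected sample optimum (assumes 0 < r <= 1)."""
    return opt_value(list(weights), weights, k) / expected_sample_opt(weights, k, r)
```

For example, `sampling_ratio({"a": 1.0}, k=1, r=0.5)` evaluates to 2.0, matching 1/r, while `sampling_ratio({"a": 3.0, "b": 2.0, "c": 1.0}, k=2, r=0.5)` is 5 / 2.875, roughly 1.74, which stays below the 1/r = 2 bound because other sampled items can substitute for missing ones.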
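Speaker bio: Associate Professor at the Naveen Jindal School of Management, University of Texas at Dallas. His research interests include e-commerce, social networks, optimization, and game theory. He has published extensively in leading journals, including Operations Research Letters, INFORMS Journal on Computing, Production and Operations Management, Information Systems Research, IEEE Transactions on Information Systems, IEEE Transactions on Knowledge and Data Engineering, and IEEE Transactions on Mobile Computing. He has an h-index of 42 and more than 8,000 citations.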