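At the invitation of the International Mathematics Summer School (http://mss.hit.edu.cn), Professor Jianqing Fan of Princeton University, an internationally renowned statistician and financial econometrician, will visit to teach a mini-course and give an academic talk open to the whole university. Interested faculty and students are welcome to attend!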
Title: Big Data Big Assumption: Spurious discoveries and endogeneity
Time: July 24, 2015, 10:00 a.m.
Venue: Conference Room 201, Building 2H, Science Park
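About the speaker: Professor Jianqing Fan is Chair of the Department of Operations Research and Financial Engineering at Princeton University and an Academician of Academia Sinica. He received his Ph.D. from the University of California, Berkeley in 1989, where he studied under the eminent applied mathematician and statistician David Donoho. In 2000 he received the COPSS Presidents' Award, the highest international honor in statistics (often described as the Fields Medal of statistics), becoming the first person from mainland China to win it. In 2006 he was invited to give a 45-minute invited lecture at the International Congress of Mathematicians. In 2007 he won the Morningside Gold Medal of Applied Mathematics at the International Congress of Chinese Mathematicians, regarded as the highest award in the Chinese mathematical community. In 2008 he received the Distinguished Achievement Award from the International Chinese Statistical Association (ICSA). He was elected an Academician of Academia Sinica in 2012, and in 2013 he received the inaugural Pao-Lu Hsu Award from the ICSA, which is presented every three years to outstanding statisticians aged 50 or under. In 2014 he was awarded the Guy Medal in Silver by the Royal Statistical Society; this medal is awarded annually and ranks second only to the Society's Guy Medal in Gold, which is awarded every three years. In 2008 he was elected President of the Institute of Mathematical Statistics (IMS), the only Chinese among the more than 70 presidents since the Institute was founded, and in 2009 he was elected President of the ICSA.
Professor Fan has served as editor of Annals of Statistics, the leading journal in statistics, as the first editor of Asian descent in the journal's more than 70 years of history. He has also served as co-editor of the top probability journal Probability Theory and Related Fields (2003-2005), of Econometrics Journal, and of Journal of Multivariate Analysis. He is currently editor of Journal of Econometrics and an associate editor or editorial board member of Journal of the American Statistical Association, Annals of Statistics, Econometrica, and other journals.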
Abstract: Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries made by such data mining approaches be spurious? Can the fundamental assumptions on the exogeneity of covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors. When the covariance matrix of the covariates possesses the restricted eigenvalue property, we derive such distributions using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of the covariates. Hence, we propose a multiplier bootstrap method to approximate the unknown distributions and establish the consistency of this simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then applied to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of covariates. The former provides a baseline for guarding against false discoveries due to data mining, and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated by numerical examples.
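To give a rough feel for the "maximum spurious correlation" mentioned in the abstract, here is a minimal Python sketch. It is not the multiplier bootstrap proposed in the talk. It is a plain Monte Carlo simulation under the simplifying assumption that the response and all covariates are independent standard Gaussians, so every observed correlation is spurious by construction. The function names and the settings (n = 30 observations, p = 200 covariates, 100 replications) are hypothetical choices made for illustration only.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def null_max_correlations(n, p, n_sims, seed=0):
    """Simulate the largest |correlation| between a response and p covariates
    when everything is independent noise (assumed Gaussian here)."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        largest = max(
            abs_correlation([rng.gauss(0.0, 1.0) for _ in range(n)], y)
            for _ in range(p)
        )
        maxima.append(largest)
    return maxima


def upper_quantile(values, level):
    """Empirical quantile: the smallest value with at least `level` mass below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, math.ceil(level * len(ordered)) - 1)
    return ordered[index]


# Hypothetical settings, chosen only to keep the run fast.
maxima = null_max_correlations(n=30, p=200, n_sims=100)
limit = upper_quantile(maxima, 0.95)
print(f"simulated 95% upper limit of the maximum spurious correlation: {limit:.3f}")
```

Under these assumptions, a selected covariate whose absolute correlation with the response falls below the simulated limit is not distinguishable from pure noise. The bootstrap approach described in the abstract addresses the realistic case in which the covariance of the covariates is unknown and cannot simply be simulated as above.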