Reading work

The papers listed below were selected from the list of papers accepted at the 2018 ACM KDD conference.

Form groups of at most 3 people. Each group chooses a unique paper. You will have one hour to read and discuss the paper in order to answer the questions that follow. Brief presentations will be given in the last half hour of the class.

1. ACM KDD 2018 list of accepted papers:

Discovering Non-Redundant K-means Clusterings in Optimal Subspaces Dominik Mautz (Ludwig Maximilian University of Munich); Wei Ye (Ludwig Maximilian University of Munich); Claudia Plant (University of Vienna); Christian B
Why should I trust you?
Unlearn What You Have Learned: Adaptive Crowd Teaching with Exponentially Decayed Memory Learners Yao Zhou (Arizona State University); Arun Reddy Nelakurthi (Arizona State University); Jingrui He (Arizona State University)
Scalable k-Means Clustering via Lightweight Coresets Olivier Bachem (ETH Zurich); Mario Lucic (Google); Andreas Krause (ETH Zurich)
TextTruth: An Unsupervised Approach to Discover Trustworthy Information from Multi-Sourced Text Data Hengtong Zhang (SUNY at Buffalo); Yaliang Li (Baidu Research); Fenglong Ma (SUNY Buffalo); Jing Gao (University at Buffalo); Lu Su (The State University of New York at Buffalo)
Graph Classification using Structural Attention John Boaz Lee (WPI); Ryan Rossi (Adobe Research); Xiangnan Kong (WPI)
You Are How You Drive: Peer and Temporal-Aware Representation Learning for Driving Behavior Analysis Pengyang Wang (Missouri University of Science and Technology); Yanjie Fu (Missouri University of Science and Technology); Jiawei Zhang (Florida State University); Pengfei Wang (CNIC, Chinese Academy of Sciences); Yu Zheng (Urban Computing Business Unit, JD Finance); Charu Aggarwal (IBM)
MiSoSouP: Mining Interesting Subgroups with Sampling and Pseudodimension Matteo Riondato (Two Sigma Investments, LP); Fabio Vandin (University of Padova)
Towards Mitigating the Class-Imbalance Problem for Partial Label Learning Jing Wang (Southeast University); Min-Ling Zhang (Southeast University)
Complex Object Classification: A Multi-Modal Multi-Instance Multi-Label Deep Network with Optimal Tr Yang Yang (Nanjing University); Yi-Feng Wu (LAMDA Group, Nanjing University); De-Chuan Zhan (Nanjing University); Zhi-Bin Liu (Tencent); Yuan Jiang (Nanjing University)
TruePIE: Discovering Reliable Patterns in Pattern-Based Information Extraction Qi Li (University of Illinois at Urbana-Champaign); Meng Jiang (University of Notre Dame); Xikun Zhang (University of Illinois at Urbana-Champaign); Meng Qu (University of Illinois at Urbana-Champaign); Timothy Hanratty (US Army Research Laboratory); Jing Gao (University at Buffalo); Jiawei Han (University of Illinois at Urbana-Champaign)
Risk Prediction on Electronic Healthcare Records with Prior Medical Knowledge Fenglong Ma (SUNY Buffalo); Jing Gao (SUNY Buffalo); Qiuling Suo (SUNY Buffalo); Quanzeng You (Microsoft AI & Research); Jing Zhou (Eheath Inc); Aidong Zhang (SUNY Buffalo)
Algorithms for Hiring and Outsourcing in the Online Labor Market Aris Anagnostopoulos (Sapienza University of Rome); Carlos Castillo (Universitat Pompeu Fabra); Adriano Fazzone (Sapienza University of Rome); Stefano Leonardi (Sapienza University of Rome); Evimaria Terzi (Boston University)
PCA by Determinant Optimization has no Spurious Local Optima Raphael Hauser (University of Oxford); Armin Eftekhari (Alan Turing Institute); Heinrich Matzinger (Georgia Institute of Technology)
Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning Kaixiang Lin (Michigan State University); Renyu Zhao (AI Labs, Didi Chuxing); Zhe Xu (AI Labs, Didi Chuxing); Jiayu Zhou (Michigan State University)
TaxoGen: Constructing Topical Concept Taxonomy by Adaptive Term Embedding and Clustering Chao Zhang (University of Illinois at Urbana-Champaign); Fangbo Tao (Facebook); Xiusi Chen (University of Illinois at Urbana-Champaign); Jiaming Shen (University of Illinois at Urbana-Champaign); Meng Jiang (University of Notre Dame); Brian Sadler (U.S. Army Research Lab); Michelle Vanni (U.S. Army Research Lab); Jiawei Han (University of Illinois at Urbana-Champaign)
IntelliLight: a Reinforcement Learning Approach for Intelligent Traffic Light Control Hua Wei (The Pennsylvania State University); Guanjie Zheng (The Pennsylvania State University); Huaxiu Yao (The Pennsylvania State University); Zhenhui Li (The Pennsylvania State University)
Generalized Score Functions for Causal Discovery Biwei Huang (Carnegie Mellon University); Kun Zhang (Carnegie Mellon University); Yizhu Lin (Carnegie Mellon University); Bernhard Schölkopf (Max-Planck Institute for Intelligent Systems); Clark Glymour (Carnegie Mellon University)
XiaoIce Band: A Melody and Arrangement Generation Framework for Pop Music Hongyuan Zhu (USTC); Qi Liu (USTC); Nicholas Jing Yuan (Microsoft); Chuan Qin (USTC); Jiawei Li (Soochow University); Kun Zhang (USTC); Guang Zhou (Microsoft); Furu Wei (Microsoft); Yuanchun Xu (Microsoft); Enhong Chen (USTC)
Training Big Random Forests with Little Resources Fabian Gieseke (University of Copenhagen); Christian Igel (University of Copenhagen)

2. ACM KDD 2018 list of accepted posters:

Extremely Fast Decision Tree Chaitanya Manapragada (Monash University); Geoffrey Webb (Monash University); Mahsa Salehi (Monash University)

On the Generative Discovery of Structured Medical Knowledge Chenwei Zhang (University of Illinois at Chicago); Yaliang Li (Baidu Research Big Data Lab); Nan Du (Tencent Medical AI Lab); Wei Fan (Tencent Medical AI Lab); Philip S. Yu (University of Illinois at Chicago)

3. Questions:

What is the data mining or learning task in the paper?
What is the data mining/machine learning method used?
What are the characteristics of the data used? (dimension, types of variables, class imbalance, missing values, etc.)
What evaluation metrics are used?
What is the validation method used?
Is the method applied to various datasets or just one?
Is the method compared with other methods?
Can you conclude that the experimental methodology is sound?
Can you conclude that the experimental results are good?
Are the models applied in practice? (Would the model generalize? Is the model biased?)
Using only the information contained in the paper, would you be able to reproduce the results?