Jay Pujara's Webpage


Jay Pujara
Ph.D. Candidate
Computer Science Department
University of Maryland, College Park

Contact Information:

E-Mail:
<my first name> @ cs.umd.edu

Mailing Address:
3228 AV Williams Building
University of Maryland
College Park, MD 20742

Career Information:
CV





About Me

I'm Jay, a PhD student in Computer Science at the University of Maryland. I currently do machine learning research with my advisor, Lise Getoor, and the LINQS group. From 2006 to 2010, I worked on spam detection at Yahoo! in Sunnyvale, CA. I completed my undergraduate education and a research Master's program at Carnegie Mellon University, graduating in 2005.

Research Interests

My research focuses on scalable machine learning for settings where billions of predictions must be made in a limited amount of time and large, noisy corpora of training data are available. Topics I'm actively pursuing include Knowledge Graph Identification, Efficient Prediction Using Classifier Cascades, Scalable Entity Resolution, and Reducing Label Cost. See my CV for more detailed descriptions of each topic.

Education

University of Maryland, College Park, 2010-present
School of Computer Science
Ph.D. Candidate

University of California, Santa Cruz, 2014-2015
Jack Baskin School of Engineering
Visiting Student

Carnegie Mellon University, 2014
Machine Learning Department
Visiting Research Scholar

Carnegie Mellon University, 2004-2005
School of Computer Science
M.S. Computer Science
Thesis: Understanding Feature Selection in Functional Magnetic Resonance Imaging

Carnegie Mellon University, 2000-2004
School of Computer Science
B.S. Computer Science
Minors in Logic and Computation, Mathematical Science, and Robotics
Thesis: Machine Learning Classification of fMRI data
University Honors and College Honors

Carnegie Mellon University, 2001-2004
Carnegie Institute of Technology
B.S. Electrical and Computer Engineering
University Honors

Carnegie Mellon University, 2001-2004
School of Humanities and Social Sciences
B.S. Cognitive Science
University Honors


Publications

See also: LINQS Publications and Google Scholar

Journals and Magazines

Using Semantics & Statistics to Turn Data into Knowledge. Jay Pujara, Hui Miao, Lise Getoor, William W. Cohen. AI Magazine 36.1 (2015). (pdf, bibtex)

Refereed Conferences

Budgeted Online Collective Inference. Jay Pujara, Ben London, and Lise Getoor. 2015 Conference on Uncertainty in Artificial Intelligence (UAI). (pdf, bibtex, github)

RELLY: Inferring Hypernym Relationships Between Relational Phrases. Adam Grycner, Gerhard Weikum, Jay Pujara, James Foulds, and Lise Getoor. 2015 Conference on Empirical Methods in Natural Language Processing. (pdf, bibtex)

Knowledge Graph Identification. Jay Pujara, Hui Miao, Lise Getoor, William Cohen. 2013 International Semantic Web Conference (ISWC). [winner of Best Student Paper award] (pdf, bibtex, github, video, slides)

Using Classifier Cascades for Scalable E-Mail Classification. Jay Pujara, Hal Daume III, and Lise Getoor. CEAS 2011. [winner of Best Paper award] (pdf, bibtex, slides)

Refereed Workshops and Symposia

Online Inference for Knowledge Graph Construction. Jay Pujara, Ben London, Lise Getoor, and William W. Cohen. UAI 2015 Workshop on Statistical Relational AI (StaRAI). (pdf)

Building Dynamic Knowledge Graphs. Jay Pujara and Lise Getoor. NIPS 2014 Workshop on Automated Knowledge Base Construction (AKBC). (pdf, bibtex)

A Unified Probabilistic Approach for Semantic Clustering of Relational Phrases. Adam Grycner, Gerhard Weikum, Jay Pujara, James Foulds, Lise Getoor. NIPS 2014 Workshop on Automated Knowledge Base Construction (AKBC). (pdf)

Probabilistic Models for Collective Entity Resolution Between Knowledge Graphs. Jay Pujara, Kevin Murphy, Xin Luna Dong, Curtis Janssen. Bay Area Machine Learning Symposium (BayLearn). (pdf, bibtex)

Large-Scale Knowledge Graph Identification using PSL (extended abstract). Jay Pujara, Hui Miao, Lise Getoor, William Cohen. AAAI Fall 2013 Symposium on Semantics for Big Data. (pdf, bibtex)

Ontology-Aware Partitioning for Knowledge Graph Identification. Jay Pujara, Hui Miao, Lise Getoor, William Cohen. 2013 CIKM Workshop on Automated Knowledge Base Construction (AKBC). [selected for spotlight talk] (pdf, bibtex, slides)

Joint Judgments with a Budget: Strategies for Reducing the Cost of Inference. Jay Pujara, Hui Miao, Lise Getoor. 2013 ICML Workshop on Machine Learning with Test-Time Budgets. (pdf, bibtex)

Large-Scale Knowledge Graph Identification using PSL. Jay Pujara, Hui Miao, Lise Getoor, William Cohen. 2013 ICML Workshop on Structured Learning (SLG). (pdf, bibtex)

Large-Scale Hierarchical Topic Models. Jay Pujara, Peter Skomoroch. NIPS 2012 Workshop on Big Learning. (pdf, bibtex)

Social Group Modeling with Probabilistic Soft Logic. Bert Huang, Stephen H. Bach, Eric Norris, Jay Pujara, Lise Getoor. NIPS 2012 Workshop on Social Network and Social Media Analysis. (pdf, bibtex)

Reducing Label Cost by Combining Feature Labels and Crowdsourcing. Jay Pujara, Ben London, and Lise Getoor. ICML 2011 Workshop on Combining Learning Strategies to Reduce Label Cost. [selected for contributed talk] (pdf, bibtex, slides)

Facilitating Medication Reconciliation with Animation and Spatial Layout. Leo Claudino, Sameh Khamis, Ran Liu, Ben London, Jay Pujara, Catherine Plaisant, Ben Shneiderman. Workshop on Interactive Systems in Healthcare.

Coarse-to-Fine, Cost-Sensitive Classification of E-Mail. Jay Pujara and Lise Getoor. NIPS 2010 Workshop on Coarse-to-Fine Processing. [selected for spotlight talk] (pdf, bibtex, slides)

Patents

Real-time Ad-Hoc Spam Filtering of E-Mail. Jay Pujara, Patent 8,069,128; awarded 2011.

Employing pixel density to detect a spam image. Ke Wei, Hao Zheng, Jay Pujara, Patent 7,882,177; awarded 2011.

Identifying IP addresses for spammers. Jaesik Choi, Jay Pujara, Vishwanath Ramarao, Ke Wei, Patent 7,849,146; awarded 2010.


Work Experience

Institution | Position | Dates
Google Inc., Mountain View, CA | Engineering Intern, Knowledge Vault | Summer 2014
LinkedIn Corp., Mountain View, CA | Data Science Intern, Skills | Summer 2012
Yahoo! Inc., Sunnyvale, CA (remote) | Data Researcher, Yahoo! Mail | Fall 2010 - Spring 2012
Yahoo! Inc., Sunnyvale, CA | Senior Engineer, Yahoo! Mail | Fall 2006 - Fall 2010
Oracle Corp., Redwood Shores, CA | Member of Technical Staff, Business Intelligence | Fall 2005 - Fall 2006
Carnegie Mellon University, Pittsburgh, PA | Graduate Research Assistant | Summer 2004
University of Pittsburgh, Pittsburgh, PA | Research Programmer, Learning R&D Center | Summer 2003
InternalDrive Corp., Stanford, CA | Camp Instructor, Game Programming & C++ | Summer 2002
Carnegie Mellon University, Pittsburgh, PA | Research Programmer, Robotics Institute | Summer 2001
WV State Legislature, Charleston, WV | Web Designer and Developer | Spring 2000


Teaching Experience

Institution | Topic | Dates
University of California, Santa Cruz | Knowledge Graph Construction (Lecture) | Spring 2014
National Youth Science Camp, Barstow, WV | Game Theory and Artificial Intelligence (Seminar) | Summer 2013
National Youth Science Camp, Barstow, WV | How to Think Like a Computer Scientist (Lecture) | Summer 2012
National Youth Science Camp, Barstow, WV | A Brief, Yet Helpful, Guide to Machine Learning (Seminar) | Summer 2012
University of Maryland, College Park | Artificial Intelligence (Course TA) | Fall 2011
University of Maryland, College Park | Game Playing and Search (Lecture) | Fall 2011
National Youth Science Camp, Barstow, WV | The Mysteries of Computer Science (Lecture) | Summer 2011
InternalDrive Corp., Stanford, CA | Game Programming (Course Instructor) | Summer 2002
InternalDrive Corp., Stanford, CA | C++ (Course Instructor) | Summer 2002
George Washington Community Education Center, Charleston, WV | Introduction to the Internet (Course Instructor) | Fall 1999
George Washington Community Education Center, Charleston, WV | Computer Skills (Course Instructor) | Fall 1999

Invited Talks, Presentations and Tutorials

Knowledge Graph Construction, talk for the D5 group at the Max Planck Institute for Informatics, Summer 2015. (slides)

Knowledge Graph Construction, talk for the Web Science and Knowledge Management group at Karlsruhe Institute of Technology, Summer 2015. (slides)

Efficient Online Collective Inference for Graphical Models, talk at the New Perspectives for Relational Learning Workshop at the Banff International Research Station, Spring 2015. (video, slides)

Knowledge Graph Identification, talk for the ReadTheWeb group at Carnegie Mellon, Fall 2014. (slides)

Knowledge Graph Construction, tutorial given in the Advanced Machine Learning course at University of California, Santa Cruz, Spring 2014. (lecture video, demo video, slides)

Large-Scale Knowledge Graph Identification using PSL, talk at AAAI Symposium on Semantics for Big Data, Fall 2013. (slides)

Ontology-Aware Partitioning for Knowledge Graph Identification, talk at CIKM workshop on Automated Knowledge Base Construction, Fall 2013. (slides)

Knowledge Graph Identification, talk at International Semantic Web Conference, Fall 2013. (video, slides)

Using Classifier Cascades for Scalable E-Mail Classification, talk at the University of Maryland Computer Vision Student Seminar, Winter 2012. (slides)

Using Classifier Cascades for Scalable E-Mail Classification, talk at Conference on Collaboration, Electronic Messaging, Anti-Abuse, and Spam, Summer 2011. (slides)

Reducing Label Cost by Combining Feature Labels and Crowdsourcing, talk at ICML workshop on Combining Learning Strategies to Reduce Label Cost, Summer 2011. (slides)

Coarse-to-Fine, Cost-Sensitive Classification of E-Mail, talk at NIPS workshop on Coarse-to-Fine Processing, Fall 2010. (slides)

Using Hadoop to Fight Spam, interview by the Yahoo! Developer Network, Spring 2009. (part 1, part 2)


Graduate-level Course Work

Spring 2012 at University of Maryland
CMSC818C: Local Data and Privacy with Bobby Bhattacharjee
CMSC828L: Link Mining with Lise Getoor

Spring 2011 at University of Maryland
CMSC734: Information Visualization with Ben Shneiderman
CMSC858P: Computational methods for high-throughput analysis of biological systems with Hector Corrada Bravo

Fall 2010 at University of Maryland
CMSC723: Computational Linguistics with Hal Daume
CMSC858F: Algorithmic Game Theory with Mohammad Hajiaghayi

Spring 2005 at Carnegie Mellon University
15-721: Database System Design and Implementation with Anastasia Ailamaki
85-714: Cognitive Neuropsychology with Marlene Behrmann

Fall 2004 at Carnegie Mellon University
15-744: Computer Networks with Srinivasan Seshan
15-781: Machine Learning with Andrew Moore