PhD Candidate, College of Information Sciences and Technology

Penn State University

About Me

🎓🚀 I am a Computational Scientist at the Research Computing Center at the University of Chicago 🏫, with a passion for skiing ⛷️ in the winter and playing tennis 🎾 in the summer.

I earned my Ph.D. at the College of Information Sciences and Technology, The Pennsylvania State University, under the guidance of Prof. C. Lee Giles 🎓✨. As a member of The Intelligent Information Systems Research Laboratory, I contributed extensively to the CiteSeerX project 📚🔬.

Collaborating closely with Dr. Jian Wu 🤝, I was responsible for research, crawling, updating the index, and maintaining the large-scale repository of academic documents (over 20M documents) 🔎.

My thesis centered on building advanced indexing and retrieval techniques for Math Search 🔢, working on the MathSeer project with the support of Dr. Richard Zanibbi 🧠.

I've had the unique honor of working alongside fantastic people at Allen AI in the Semantic Scholar Team, where I received mentorship from Sergey Feldman and Doug Downey 🌟🤖. I'm also very proud of being the creator of S2QA (Semantic Scholar Question Answering), a first-of-its-kind generative AI QA system that can cite papers using GPT-4 🧩🤯.

Previously, my professional journey as a researcher led me to the Tata Innovation Labs, where I had the opportunity to work with Dr. Puneet Agarwal 🧪🚀.

Interests

  • Artificial Intelligence
  • Natural Language Processing
  • Information Retrieval

Education

  • PhD in Informatics, Currently Pursuing

    Penn State University

  • Integrated Post Graduation (Masters) in Information and Communication Technology, 2014

    Indian Institute of Information Technology and Management, Gwalior

Recent News

Date | News
September '22 | Released ACL Anthology Corpus, 113 stars on GitHub [ dataset details ]
August '22 | Teaching IST 441 Information Retrieval and Search Engines [ course details ]
April '22 | "S2AMP - S2 Analysis of MentorshiP" was accepted at JCDL'22 in the late-breaking and dataset track [ data ]
March '22 | Accepted an internship offer from Allen AI for Summer '22 in Seattle, WA
December '21 | "Building an Accessible, Usable, Scalable, and Sustainable Service for Scholarly Big Data" accepted at International Conference on Big Data [ pdf ]
September '21 | "What Were People Searching For? A Query Log Analysis of An Academic Search Engine" accepted at JCDL'21 as a poster [ pdf ]
May '21 | Started my summer internship at Allen AI. Working with the S2 Research team on modelling and inference of academic mentorship at scale
February '21 | "Large scale subject category classification of scholarly papers with deep attentive neural networks" accepted at Frontiers in Research Metrics and Analytics [ paper ]

Experience

Research Intern

Allen AI

May 2021 – Present Seattle, WA
Worked with the S2 research team on modelling mentorship between coauthors. Code: https://github.com/allenai/S2AMP-data

Graduate Research Assistant

The Intelligent Information Systems Research Laboratory

Sep 2017 – Present State College, PA

Researcher

Tata Innovation Labs

Dec 2014 – Aug 2017 Noida, India
Worked on various projects involving Machine Learning, NLP, Big Data, and Graph Mining. Responsibilities included:

  • Analysing
  • Modelling
  • Deploying
  • Publishing

Recent Posts

ACHARYA: Triage For Preliminary COVID-19 Symptoms

WHAT IS ACHARYA? We introduce Acharya, a multilingual WhatsApp-based messaging platform and chatbot service for Indian users. It asks people simple interactive questions to assess (i) their potential risk of COVID-19 infection; and (ii) whether they qualify for (or need) COVID-19 testing, as per current ICMR guidelines.

Recent Publications

COVIDSeer: Extending the CORD-19 Dataset

Tangent-CFT: An Embedding Model for Mathematical Formulas

CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset

Courses and Certificates

IST 441 Information Retrieval and Search Engines

Grade: A

Neural Networks and Deep Learning

First course of the Deep Learning Specialization.
See certificate

Machine Learning

Learned about the core machine learning algorithms.
See certificate

IST 597: Foundations of Deep Learning

Grade: A

Oracle Certified Java Associate

Skills

Statistics

100%

Python

100%

Linux

100%

Java

70%

Docker

Kaggle

Participated in 3 competitions

Contact