PhD Candidate, College of Information Sciences and Technology

Penn State University

About Me

I am a Computational Scientist at the Research Computing Centre at the University of Chicago 🏫, with a passion for skiing ⛷️ in the winter and playing tennis 🎾 in the summer.

I earned my Ph.D. at the College of Information Sciences and Technology, The Pennsylvania State University, under the guidance of Prof. C. Lee Giles. As a member of The Intelligent Information Systems Research Laboratory, I contributed extensively to the CiteSeerX project.

Collaborating closely with Dr. Jian Wu, I have been responsible for research, crawling, updating the index, and maintaining the large-scale repository of academic documents (over 20M documents).

My thesis centered on building advanced indexing and retrieval techniques for Math Search, as part of the MathSeer project and with the support of Dr. Richard Zanibbi.

I’ve had the unique honor of working alongside fantastic people on the Semantic Scholar team at Allen AI, where I received mentorship from Sergey Feldman and Doug Downey. I’m also very proud of being the creator of S2QA (Semantic Scholar Question Answering), a first-of-its-kind generative AI QA system that can cite papers using GPT-4.

Previously, my professional journey as a researcher led me to the Tata Innovation Labs, where I had the opportunity to work with Dr. Puneet Agarwal.


Interests

  • Artificial Intelligence
  • Natural Language Processing
  • Information Retrieval


Education

  • PhD in Informatics, Currently Pursuing

    Penn State University

  • Integrated Post Graduation (Masters) in Information and Communication Technology, 2014

    Indian Institute of Information Technology and Management, Gwalior

Recent News

September ‘22: Released ACL Anthology Corpus, 113 stars on GitHub [ dataset details ]
August ‘22: Teaching IST 441 Information Retrieval and Search Engines [ course details ]
April ‘22: S2AMP - S2 Analysis of MentorshiP was accepted at JCDL’22 in the late-breaking and dataset track [ data ]
March ‘22: Accepted an internship offer from Allen AI for Summer’22 in Seattle, WA
December ‘21: Building an Accessible, Usable, Scalable, and Sustainable Service for Scholarly Big Data accepted at the International Conference on Big Data [ pdf ]
September ‘21: What Were People Searching For? A Query Log Analysis of An Academic Search Engine accepted at JCDL’21 as a poster [ pdf ]
May ‘21: Started my summer internship at Allen AI. Working with the S2 Research team on modelling and inference of academic mentorship at scale
February ‘21: Large scale subject category classification of scholarly papers with deep attentive neural networks accepted at Frontiers in Research Metrics and Analytics [ paper ]



Research Intern

Allen AI

May 2021 – Present Seattle, WA
Worked with the S2 research team on modelling mentorship between coauthors [code]

Graduate Research Assistant

The Intelligent Information Systems Research Laboratory

Sep 2017 – Present State College, PA


Tata Innovation Labs

Dec 2014 – Aug 2017 Noida, India
Worked on various projects involving Machine Learning, NLP, Big Data and Graph Mining. Responsibilities included:

  • Analysing
  • Modelling
  • Deploying
  • Publishing

Recent Posts

ACHARYA: Triage For Preliminary COVID-19 Symptoms

WHAT IS ACHARYA? We introduce Acharya, a multilingual WhatsApp-based messaging platform and chatbot service for Indian users. It asks people simple interactive questions to assess (i) their potential risk of COVID-19 infection and (ii) whether they qualify for (or need) COVID-19 testing under current ICMR guidelines.

Recent Publications


COVIDSeer: Extending the CORD-19 Dataset

Tangent-CFT: An Embedding Model for Mathematical Formulas

CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset

Courses and Certificates

IST 441 Information Retrieval and Search Engines

Grade: A

Neural Networks and Deep Learning

First course of the Deep Learning Specialization.
See certificate

Machine Learning

Learned about the core machine learning algorithms.
See certificate

IST 597: Foundations of Deep Learning

Grade: A

Oracle Certified Java Associate












Participated in 3 Competitions