PhD Candidate, College of Information Sciences and Technology

Penn State University

About Me

I’m a fifth-year Ph.D. candidate at the College of Information Sciences and Technology, The Pennsylvania State University, advised by Prof. C. Lee Giles. I am a member of The Intelligent Information Systems Research Laboratory, where I work specifically on the CiteSeerX project.

Here, with the help of Dr. Jian Wu, I am responsible for research, crawling, updating the index, and maintaining the repository of academic documents at scale (>20M documents). My thesis is on building advanced indexing and retrieval techniques for math search. This work is part of the MathSeer project, on which I am actively working with the help of Dr. Richard Zanibbi.

I have had the unique honor of working with great people at Allen AI on the Semantic Scholar team, where I was mentored by Sergey Feldman and Doug Downey. Previously, I worked with Dr. Puneet Agarwal at the Tata Innovation Labs as a Researcher.


  • Artificial Intelligence
  • Natural Language Processing
  • Information Retrieval


  • PhD in Informatics, Currently Pursuing

    Penn State University

  • Integrated Post Graduation (Masters) in Information and Communication Technology, 2014

    Indian Institute of Information Technology and Management, Gwalior

Recent News

September ’22: Released ACL Anthology Corpus - 113 stars on GitHub [ dataset details ]
August ’22: Teaching IST 441 Information Retrieval and Search Engines [ course details ]
April ’22: S2AMP (S2 Analysis of MentorshiP) was accepted at JCDL’22 in the late-breaking and dataset track [ data ]
March ’22: Accepted an internship offer from Allen AI for Summer ’22 in Seattle, WA
December ’21: Building an Accessible, Usable, Scalable, and Sustainable Service for Scholarly Big Data accepted at the International Conference on Big Data [ pdf ]
September ’21: What Were People Searching For? A Query Log Analysis of An Academic Search Engine accepted at JCDL’21 as a poster [ pdf ]
May ’21: Started my summer internship at Allen AI. Working with the S2 Research team on modelling and inference of academic mentorship at scale
February ’21: Large scale subject category classification of scholarly papers with deep attentive neural networks accepted at Frontiers in Research Metrics and Analytics [ paper ]



Research Intern

Allen AI

May 2021 – Present Seattle, WA
Worked with the S2 research team on modelling mentorship between coauthors [code]

Graduate Research Assistant

The Intelligent Information Systems Research Laboratory

Sep 2017 – Present State College, PA


Tata Innovation Labs

Dec 2014 – Aug 2017 Noida, India
Worked on various projects involving Machine Learning, NLP, Big Data and Graph Mining. Responsibilities included:

  • Analysing
  • Modelling
  • Deploying
  • Publishing

Recent Posts

ACHARYA: Triage For Preliminary COVID-19 Symptoms

WHAT IS ACHARYA? We introduce Acharya, a multilingual WhatsApp-based messaging platform and chatbot service for Indian users. It interacts with people through simple interactive questions to assess (i) their potential risk of COVID-19 infection; and (ii) whether they qualify for (or need) COVID-19 testing, as per current ICMR guidelines.

Recent Publications


COVIDSeer: Extending the CORD-19 Dataset

Tangent-CFT: An Embedding Model for Mathematical Formulas

CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset

Courses and Certificates

IST 441 Information Retrieval and Search Engines

Grade: A

Neural Networks and Deep Learning

First course of the Deep Learning Specialization.
See certificate

Machine Learning

Learned the core machine learning algorithms.
See certificate

IST 597: Foundations of Deep Learning

Grade: A

Oracle Certified Java Associate












Participated in 3 Competitions