PhD Candidate, College of Information Sciences and Technology

Penn State University

About Me

I’m a third-year Ph.D. student in the College of Information Sciences and Technology at The Pennsylvania State University, advised by Prof. C. Lee Giles. I am a member of the Intelligent Information Systems Research Laboratory, where I work specifically on the CiteSeerX project.

Here, with the help of Dr. Jian Wu, I am responsible for research, crawling, updating the index, and maintaining the repository of academic documents at scale (>20M documents). My thesis is on building advanced indexing and retrieval techniques for Math Search. This work is part of the MathSeer project, on which I am actively working.

Previously, I worked with Dr. Puneet Agarwal at the Tata Innovation Labs as a Researcher.


  • Artificial Intelligence
  • Natural Language Processing
  • Information Retrieval


  • PhD in Informatics, Currently Pursuing

    Penn State University

  • Integrated Post Graduation (Masters) in Information and Communication Technology, 2014

    Indian Institute of Information Technology and Management, Gwalior



Research Intern

Allen AI

May 2021 – Aug 2021 Seattle, WA
Worked with the S2 research team on modelling mentorship between coauthors

Graduate Research Assistant

The Intelligent Information Systems Research Laboratory

Sep 2017 – Present State College, PA


Tata Innovation Labs

Dec 2014 – Aug 2017 Noida, India
Worked on various projects involving Machine Learning, NLP, Big Data, and Graph Mining. Responsibilities included:

  • Analysing
  • Modelling
  • Deploying
  • Publishing

Recent Posts

ACHARYA: Triage For Preliminary COVID-19 Symptoms

WHAT IS ACHARYA? We introduce Acharya, a multilingual WhatsApp-based messaging platform and chatbot service for Indian users. It interacts with people through simple interactive questions to assess (i) their potential risk of COVID-19 infection; and (ii) whether they qualify for, or need, COVID-19 testing under current ICMR guidelines.

Recent Publications

COVIDSeer: Extending the CORD-19 Dataset

Tangent-CFT: An Embedding Model for Mathematical Formulas

CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset

Courses and Certificates

IST 441 Information Retrieval and Search Engines

Grade: A

Neural Networks and Deep Learning

First course of the Deep Learning Specialization.

Machine Learning

Learned about the core machine learning algorithms.

IST 597: Foundations of Deep Learning

Grade: A

Oracle Certified Java Associate

Participated in 3 Competitions