Hi, my name is

Anchit Gupta.

I build data-driven solutions.

I'm a Senior Data Engineer with 4+ years of experience building scalable data pipelines and infrastructure. I currently work at Sevaro, solving real-world problems through data engineering and AI.

01. About Me

Hello! I'm Anchit, a Senior Data Engineer specializing in GenAI and Cloud-Native Analytics. I enjoy building scalable data platforms and integrating Generative AI solutions to solve complex business problems.

With over 4 years of experience, I've worked across the full data lifecycle, from building MDM pipelines for pharmaceutical giants to developing LLaMA-based gaming bots and automating medical document summarization.

My background includes an M.Tech in CS from IIIT Delhi and deep expertise in the modern data stack. I'm constantly exploring new technologies in the AI/ML space to build more intelligent systems.

Here are some technologies I use daily:

Python & PySpark
Databricks & AWS
GenAI (LangChain, Llama)
Snowflake & dbt
Airflow & Docker
SQL & NoSQL

02. Where I've Worked

Oct 2025 — Present

Senior Data Engineer

@ Sevaro Health

Architected migration to Medallion architecture, reducing technical debt by 20-30%
Engineered CI/CD pipelines using GitHub Actions for automated deployments
Implemented RBAC policies in Databricks for data security and governance
Built data observability framework and monitoring dashboards
Spearheaded cloud cost management audits to minimize infrastructure expenses

Jul 2024 — Oct 2025

Data Engineer III

@ Junglee Games

Built a LangChain-based GenAI summarization agent used across 5 games, saving 3.5 hours per analyst per day
Migrated Spark jobs from EMR to Databricks, improving ETL performance by 40% and reducing costs by 30%
Developed LLaMA-based Rummy bot for Learn/Practice section, enhancing user onboarding
Built MLOps pipelines for A/B testing and dynamic recommendation systems

Jul 2023 — Jul 2024

Data Engineer II

@ Junglee Games

Led 3 engineers to develop a centralized alerting and observability framework
Designed reusable alerting system with email support, reducing alert development effort by 70%
Created freshness and anomaly detection modules, reducing data quality incidents by 50%

Oct 2022 — Jul 2023

Data Engineer I

@ Junglee Games

Built self-serve analytics layers for cross-functional teams
Enhanced GST/TDS pipelines, reducing turnaround time by 20%
Improved KPI dashboards, cutting SLA delays by 80%

Apr 2022 — Sep 2022

Associate

@ Axtria

Developed MDM pipelines to unify 50M+ records using NLP, achieving 87% match accuracy
Delivered ETL tools and Golden Record logic for pharma client data

Jul 2021 — Apr 2022

Data Analyst

@ Axtria

Generated HCP/HCO hierarchies and automated stakeholder notifications
Reduced matching algorithm time by 40% through memory optimization

03. Education

Master of Technology

IIIT Delhi

2019 — 2021

Specialized in Computer Science with a focus on Natural Language Processing and Deep Learning. Worked on research projects involving semantic analysis and information retrieval.

GATE Percentile: 98.14

Bachelor of Technology

Moradabad Institute of Technology (AKTU)

2014 — 2018

Completed B.Tech in Computer Science with First Division. Developed foundational skills in programming, Android development, and machine learning.

First Division

04. Things I've Built

🤖

GenAI Medical Summarization

Automated system for summarizing complex medical documents using Generative AI. Streamlines information extraction for healthcare professionals to improve patient care.

LLMs LangChain Python Healthcare

🎮

LLaMA Gaming Bot

Integrated LLaMA models into gaming environments to create intelligent, responsive bots that enhance player engagement at Junglee Games.

GenAI LLaMA Game Dev Python

💊

Pharma MDM Pipeline

Enterprise Master Data Management pipeline processing 50M+ records. Implemented fuzzy matching and NLP for high-accuracy entity resolution.

Big Data NLP Spark Axtria

📚

Wikipedia Citation Verifier

NLP-based Chrome extension that verifies Wikipedia citations by using deep learning embeddings to find semantically similar sentences in the cited documents.

Python Deep Learning Chrome Ext

😄

Humour Detection

Text classification system that detects humor using deep learning embeddings, optimized for memory-constrained mobile devices with a model size under 10 MB.

TensorFlow NLP Mobile AI

🩸

Blood Bank Finder

Android application to help users locate blood banks across India. Features real-time availability and location-based search.

Android Java Firebase Maps API

05. Publications

Technical articles on Data Engineering, Apache Spark, and AI/ML

Jan 2025 Medium

What's New in Apache Spark 4.1 for Developers?

Apache Spark 4.1 is packed with features designed to make data engineering faster, more flexible, and more efficient...

Spark Big Data

Jan 2025 Medium

Breaking Into Data Engineering: The Complete Roadmap for 2026

Data engineering is one of the most in-demand roles in tech. With the rise of GenAI and real-time analytics, the role is evolving...

Career Data Engineering

Jan 2025 Medium

How Pro Handles Unstructured Data in Databricks

The real power of a Lakehouse comes from how you handle unstructured data. In this article, we explore best practices...

Databricks Lakehouse

Dec 2024 Medium

Level Up Your ETL Code with Arrow in PySpark

Apache Arrow is a cross-language platform for in-memory data that specifies a standardized column-oriented format...

PySpark Arrow

Nov 2024 Medium

What Developers Need to Know About Apache Spark 4.0

Apache Spark 4.0 is on the horizon, bringing major changes to the world of big data processing...

Spark Big Data

Nov 2024 Medium

How to Build a Data Engineering Portfolio That Gets You Hired

Building a data engineering portfolio is one of the most important things you can do to stand out in a crowded job market...

Career Portfolio

View All Articles on Medium

06. Certifications

🎓

GATE 2019

Score: 98.14 Percentile

🔷

Databricks Fundamentals Accreditation

Databricks

🤖

LangChain for LLM Application Development

DeepLearning.AI

🔧

Apache Airflow Fundamentals

Astronomer

☁️

Azure Databricks & Spark Core

Udemy

📊

MTA Database Fundamentals

Microsoft

☕

J2SE Java Certification

Oracle

📱

Android App Development

Google

07. What's Next?

Get In Touch

I'm currently open to new opportunities and exciting projects. Whether you have a question or just want to say hi, my inbox is always open. I'll try my best to get back to you!
