AI Systems Engineer · ML • GenAI • Agentic AI

Building production AI systems that scale from machine learning to autonomous agents.

Senior AI Engineer with 7+ years of experience building machine learning, RAG, and agentic AI systems across fintech and regulated enterprise domains using AWS-native cloud infrastructure.

View Case Studies · Download Resume

7+: Years Experience
3: Companies
0+: ML Models in Production
AWS: Cloud-Native

Portrait of Vikash Agrawal, Senior AI Engineer

Python · AWS · LangGraph · Bedrock · XGBoost

Experience

7+ years building scalable software, ML, and AI systems

My journey across software engineering, machine learning, and production AI systems.

2026 – Present
Solstice Intelligence
Senior AI Engineer
Architecting production-grade agentic AI systems, enterprise RAG platforms, and AWS-native AI infrastructure for regulatory intelligence in life sciences.
- LangGraph
- Bedrock
- AWS
- RAG
- Agentic AI
2021 – 2026
Scienaptic AI
Senior Data Scientist / Generative AI Engineer
Built credit risk ML systems, anomaly detection pipelines, fairness evaluation frameworks, and enterprise generative AI platforms for large-scale financial institutions.
- XGBoost
- SHAP
- ML
- LLM
- Risk AI
2019 – 2021
TCS
System Engineer
Built backend APIs and scalable distributed applications using Python, Django, PostgreSQL, and React.
- Python
- Django
- React
- PostgreSQL

Case Studies

Featured Case Studies

Real-world AI systems built at enterprise scale.

Agentic AI / Production GenAI / Enterprise RAG

Agentic Regulatory Intelligence Platform

A production agentic AI platform for life-science compliance teams that continuously monitors global regulators such as the FDA and EMA, performs semantic diffing across regulatory document versions, and generates citation-backed compliance impact assessments through a stateful multi-agent LangGraph workflow.

10K+ documents / month
6-agent LangGraph workflow
Citation-backed reports

Semantic change detection

Structural diff first, then an LLM compares old and new text, then a final pass scores business impact. It catches small edits like “annually” turning into “quarterly” that carry big compliance consequences.
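As a rough illustration of the structural-diff stage only, here is a minimal sketch using Python's standard `difflib`. The function names `changed_sentences` and `changed_words` and the naive split on periods are assumptions for the example, not the platform's code; the LLM comparison and impact-scoring passes are not shown.

```python
import difflib


def changed_sentences(old_text: str, new_text: str) -> list[tuple[str, str]]:
    """Return (old, new) sentence pairs that differ between two versions."""
    # Naive sentence split on periods; a real pipeline would use the document structure.
    old = [s.strip() for s in old_text.split(".") if s.strip()]
    new = [s.strip() for s in new_text.split(".") if s.strip()]
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((" ".join(old[i1:i2]), " ".join(new[j1:j2])))
    return pairs


def changed_words(old: str, new: str) -> list[tuple[str, str]]:
    """Narrow a changed sentence pair down to the words that were edited."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_version = "Sponsors must submit safety reports annually. Records are kept for five years."
new_version = "Sponsors must submit safety reports quarterly. Records are kept for five years."

sentence_pairs = changed_sentences(old_version, new_version)
word_edits = changed_words(*sentence_pairs[0])
```

Only the changed pairs would be handed to the later, more expensive stages, which keeps the LLM comparison focused on text that actually moved.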

Multi-agent reasoning

Six LangGraph agents split the work: a planner, a retrieval agent, a change analyzer, a domain expert that rates severity, a client-impact agent, and a report generator.
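To make the division of labor concrete, here is a plain-Python sketch of six steps passing a shared state, where each step returns a partial update. It deliberately does not use the LangGraph API; every function name, state field, and the toy logic inside each step are hypothetical stand-ins for the real agents.

```python
from typing import Callable, TypedDict


class ReviewState(TypedDict, total=False):
    old_text: str
    new_text: str
    plan: list[str]
    passages: list[str]
    change: str
    severity: str
    client_impact: str
    report: str


def words(text: str) -> list[str]:
    """Lowercase tokens with trailing punctuation stripped."""
    return [w.strip(".,;:").lower() for w in text.split()]


def planner(state: ReviewState) -> dict:
    return {"plan": ["retrieve", "analyze", "rate", "assess", "report"]}


def retrieval_agent(state: ReviewState) -> dict:
    # Stand-in for a real search call: just pass both versions along.
    return {"passages": [state["old_text"], state["new_text"]]}


def change_analyzer(state: ReviewState) -> dict:
    old, new = (words(p) for p in state["passages"])
    removed = [w for w in old if w not in new]
    added = [w for w in new if w not in old]
    return {"change": f"{' '.join(removed)} -> {' '.join(added)}"}


def domain_expert(state: ReviewState) -> dict:
    # Toy severity rule: any change to a reporting frequency counts as high.
    frequency_terms = {"annually", "quarterly", "monthly"}
    touched = set(state["change"].replace("->", " ").split())
    return {"severity": "high" if touched & frequency_terms else "low"}


def client_impact_agent(state: ReviewState) -> dict:
    high = state["severity"] == "high"
    return {"client_impact": "update reporting calendar" if high else "no action"}


def report_generator(state: ReviewState) -> dict:
    return {
        "report": f"Change: {state['change']} | Severity: {state['severity']} "
                  f"| Impact: {state['client_impact']}"
    }


STEPS: list[Callable[[ReviewState], dict]] = [
    planner, retrieval_agent, change_analyzer,
    domain_expert, client_impact_agent, report_generator,
]


def run_pipeline(old_text: str, new_text: str) -> ReviewState:
    state: ReviewState = {"old_text": old_text, "new_text": new_text}
    for step in STEPS:
        state.update(step(state))  # merge each step's partial update
    return state


result = run_pipeline("Reports are due annually.", "Reports are due quarterly.")
```

Keeping each step to a single narrow output field makes it easy to see which agent produced which part of the final report.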

Hybrid retrieval

Vector similarity runs alongside keyword search, with metadata filters for agency, version, and effective date, all over OpenSearch. Pure vector search kept missing exact regulatory terms.
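One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion. The sketch below assumes that technique purely for illustration (the text above does not say which fusion method the platform uses), and the document IDs, metadata fields, and helper names are made up. In OpenSearch the metadata filter would normally live inside the query itself rather than in Python.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc IDs; higher fused score ranks first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def filter_by_metadata(doc_ids: list[str], metadata: dict[str, dict], **required) -> list[str]:
    """Keep only documents whose metadata matches every required field."""
    return [
        d for d in doc_ids
        if all(metadata[d].get(key) == value for key, value in required.items())
    ]


# Hypothetical results from the two retrievers.
vector_hits = ["d3", "d1", "d2"]
keyword_hits = ["d1", "d4", "d3"]

metadata = {
    "d1": {"agency": "FDA"},
    "d2": {"agency": "FDA"},
    "d3": {"agency": "EMA"},
    "d4": {"agency": "FDA"},
}

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
fda_only = filter_by_metadata(fused, metadata, agency="FDA")
```

A document that appears in both lists (here `d1` and `d3`) outranks one that appears in only one, which is how an exact regulatory term found by keyword search can lift a passage that vector search ranked lower.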

Human in the loop

Anything flagged high-severity pauses for a compliance reviewer to sign off before it goes out. That one checkpoint did more for trust than any accuracy gain.
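The checkpoint itself can be very small. A minimal sketch, assuming a three-level severity scale and a hypothetical `approved_by` field that a reviewer fills in:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Assessment:
    doc_id: str
    severity: Severity
    approved_by: Optional[str] = None  # set when a compliance reviewer signs off


def can_publish(assessment: Assessment) -> bool:
    """High-severity assessments wait for a reviewer; everything else goes out."""
    if assessment.severity is Severity.HIGH:
        return assessment.approved_by is not None
    return True


pending = Assessment("fda-001", Severity.HIGH)
signed_off = Assessment("fda-001", Severity.HIGH, approved_by="reviewer@example.com")
routine = Assessment("ema-042", Severity.LOW)
```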

LangGraph
LangChain
Bedrock
OpenSearch
ECS Fargate
DynamoDB
SQS
EventBridge
Terraform
LangSmith

Machine Learning / FinTech AI / Risk Modeling

Enterprise Credit Risk Intelligence Platform

Credit underwriting and portfolio risk models for large lenders, scoring millions of applicants on a blend of bureau and alternative data. The job was always the same balancing act: approve more good borrowers without quietly taking on more default risk, and keep every decision explainable enough to defend to a regulator.

Millions of applicants scored
KS and Gini for risk separation
PSI / CSI drift monitoring in production
AIR fairness checks on protected groups
Real-time and batch scoring pipelines
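For readers unfamiliar with two of these metrics, here is a small pure-Python sketch. It assumes PSI is the population stability index computed over pre-binned score counts and AIR is the adverse impact ratio (approval rate of a protected group divided by that of a reference group). The function names, bin counts, and approval numbers are illustrative only, not production figures.

```python
import math


def psi(expected_counts: list[int], actual_counts: list[int], eps: float = 1e-6) -> float:
    """Population stability index between a baseline and a current distribution."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # floor avoids log(0) on empty bins
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


def adverse_impact_ratio(protected_approved: int, protected_total: int,
                         reference_approved: int, reference_total: int) -> float:
    """Approval rate of the protected group relative to the reference group."""
    return (protected_approved / protected_total) / (reference_approved / reference_total)


stable = psi([100, 100, 100], [100, 100, 100])
shifted = psi([50, 50], [60, 40])
air = adverse_impact_ratio(30, 100, 50, 100)
```

A PSI of zero means the score distribution has not moved. An AIR well below 1 means the protected group is approved less often than the reference group and usually triggers a closer review.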

XGBoost
Random Forest
SHAP
Optuna
PSI
CSI
AIR
LexisNexis

AI Reliability / Observability

LLM Evaluation Framework

Evaluation harnesses for enterprise LLM features, built so quality is something you can actually see. They score retrieval and generation on real traffic, track groundedness and hallucination rate, and run as regression gates every time a prompt or model changes.

Recall@K and MRR on retrieval
Groundedness and hallucination scoring
Latency and cost tracking
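The two retrieval metrics are simple to compute once each query has a set of known-relevant documents. A minimal sketch, with hypothetical function names and toy data:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant hit per query (0 if none found)."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


recall = recall_at_k(["a", "b", "c", "d"], {"b", "d", "z"}, k=3)
mrr = mean_reciprocal_rank([
    (["a", "b"], {"b"}),   # first hit at rank 2 -> 0.5
    (["c", "d"], {"c"}),   # first hit at rank 1 -> 1.0
    (["e"], {"x"}),        # no hit -> 0.0
])
```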

Evaluation
LangSmith
Groundedness
Recall@K
Observability

Architecture

Architecture Gallery

System designs behind production AI platforms.

Writing

Technical Writing

Thoughts on AI systems, machine learning, and production engineering.

Retrieval · Essay

Production RAG at Scale

Naïve top-k retrieval quietly degrades past a few thousand documents. The fix is rarely a bigger model. It's getting the right context in front of it.

Chunk on document structure, not fixed token windows
Run keyword and vector search together, then rerank the top hits
Filter on metadata before the query ever reaches the model

Retrieval quality, not the model, is the ceiling on RAG accuracy.
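As an example of the first point, here is a minimal sketch of chunking on document structure. It assumes headings look like numbered section titles ("1 Scope", "2.1 Reporting"); that heading rule and the `chunk_by_heading` name are assumptions for the example, and real regulatory documents would need a more careful parser.

```python
import re

# Assumed heading pattern: a section number followed by a title, e.g. "2.1 Reporting".
HEADING = re.compile(r"^\d+(\.\d+)*\s+\S")


def chunk_by_heading(text: str) -> list[dict[str, str]]:
    """Split text into one chunk per section, keeping the heading as metadata."""
    chunks: list[dict[str, str]] = []
    heading, body = "preamble", []
    for line in text.splitlines():
        if HEADING.match(line):
            if body:  # close the previous section before starting a new one
                chunks.append({"heading": heading, "text": "\n".join(body)})
            heading, body = line.strip(), []
        elif line.strip():
            body.append(line.strip())
    if body:
        chunks.append({"heading": heading, "text": "\n".join(body)})
    return chunks


document = (
    "1 Scope\n"
    "Applies to all sponsors.\n"
    "2 Reporting\n"
    "Reports are due quarterly.\n"
    "Late reports incur penalties.\n"
)
chunks = chunk_by_heading(document)
```

Each chunk keeps its heading, so a later metadata filter or reranker can use the section title as well as the body text.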

Agents · Essay

Designing Reliable Agentic AI Systems

Agents don't fail in the demo. They fail on the long tail, three steps deep, when a tool returns something unexpected. Reliability is mostly about closing those gaps.

Give every tool a strict schema and one narrow job
Cap planning loops so an agent can't wander or stall
Add a critic pass before anything acts on the result

Reliability comes from constraints, not larger models.
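The three constraints above can be sketched in a few lines of plain Python. Everything here is hypothetical: `LookupArgs` stands in for a strict tool schema, `DOCS` for a real data source, `critic` for a real review pass, and the list of proposals for whatever tool calls a model would suggest.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MAX_STEPS = 5  # hard cap so the loop cannot wander or stall

DOCS = {"fda-001": "Reports are due quarterly."}  # toy data source


@dataclass(frozen=True)
class LookupArgs:
    """Strict schema for one narrow tool: look up a document by ID."""
    doc_id: str

    def __post_init__(self) -> None:
        if not isinstance(self.doc_id, str) or not self.doc_id:
            raise ValueError("doc_id must be a non-empty string")


def lookup_tool(args: LookupArgs) -> str:
    return DOCS.get(args.doc_id, "")


def critic(result: str) -> bool:
    """Toy critic: reject empty results before anything acts on them."""
    return bool(result.strip())


def run_agent(proposals: Iterable[dict], max_steps: int = MAX_STEPS) -> Optional[str]:
    for step, raw_args in enumerate(proposals):
        if step >= max_steps:
            break
        try:
            args = LookupArgs(**raw_args)  # unknown or missing fields raise TypeError
        except (TypeError, ValueError):
            continue  # malformed call: skip it rather than guess
        result = lookup_tool(args)
        if critic(result):
            return result
    return None  # gave up within the cap; the caller decides what to do next


answer = run_agent([{"id": "x"}, {"doc_id": ""}, {"doc_id": "missing"}, {"doc_id": "fda-001"}])
capped = run_agent([{"doc_id": "missing"}] * 5 + [{"doc_id": "fda-001"}])
```

Returning `None` when the cap is reached forces the calling code to handle the failure explicitly instead of acting on a half-finished result.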

Evaluation · Essay

LLM Evaluation in Enterprise Systems

You can't ship what you can't measure. A vibe check on ten prompts doesn't survive a model swap, so evaluation has to be built in, not bolted on.

Score retrieval and generation as separate stages
Measure groundedness and hallucination rate on real traffic
Gate every prompt or model change behind the eval suite

Treat evals like CI: every prompt change is a regression test.
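A regression gate of this kind can be as simple as comparing a candidate's metrics to a stored baseline. A minimal sketch, where the metric names, tolerance, and numbers are illustrative assumptions:

```python
def regression_failures(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
    lower_is_better: tuple[str, ...] = ("hallucination_rate",),
) -> list[str]:
    """Return the metrics where the candidate is worse than baseline by more than tolerance."""
    failures = []
    for metric, base_value in baseline.items():
        new_value = candidate.get(metric)
        if new_value is None:
            failures.append(metric)  # a missing metric counts as a failure
            continue
        if metric in lower_is_better:
            regression = new_value - base_value
        else:
            regression = base_value - new_value
        if regression > tolerance:
            failures.append(metric)
    return failures


baseline = {"recall_at_5": 0.82, "groundedness": 0.90, "hallucination_rate": 0.04}
candidate = {"recall_at_5": 0.83, "groundedness": 0.85, "hallucination_rate": 0.045}

failed = regression_failures(baseline, candidate)
gate_passed = not failed
```

In a CI job, a non-empty `failed` list would block the prompt or model change until the regression is explained or fixed.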

Resume

Download my latest resume covering experience across AI systems engineering, machine learning, generative AI, and cloud-native architectures.

Download Resume (PDF)

Contact

Let's Build Something Meaningful

Open to AI engineering opportunities, consulting, and high-impact technical collaborations.

agrawalvikash2015@gmail.com

vikash-agrawal

Visit profile →

GitHub

VikashAgrawal-DataScientist

View repositories →
