Meheli Sinha
Data & AI Engineer with over 3.5 years of experience building data platforms and automation layers across banking and enterprise governance systems. I specialize in Python, SQL, and cloud-based ETL pipelines that turn complex enterprise data into trusted, AI-ready foundations, currently at a Berlin energy technology company.
Sources
- Python, SQL
- PostgreSQL, Oracle
- REST APIs, NoSQL
Pipelines
- Azure Data Factory
- Databricks, PySpark
- Airflow
AI & Insight
- LLMs, RAG
- FastAPI
- Power BI
Data-driven engineering, end to end.
Data Engineer with over 3.5 years of experience building data platforms and automation layers across banking and enterprise governance systems. At Tata Consultancy Services I architected ELT pipelines processing 500K+ daily records for Nordea Bank Abp. Now at E.ON Digital Technology GmbH in Berlin, I migrate enterprise systems to Databricks and build automated governance pipelines, turning complex enterprise IT data into trusted, structured foundations for AI-powered systems.
Production Data Engineering
Over 3.5 years building robust ETL and ELT pipelines on Python, SQL, Azure, and Databricks, handling 500K+ records per run.
Cloud & Governance Platforms
Building automated pipelines integrating Power BI, Azure SQL, Blob Storage, and internal security and asset systems for enterprise governance.
MSc in Data Science & AI
Pursuing a Master's at GISMA University of Applied Sciences in Berlin, with a thesis on multi-agent LLM systems for Data Vault 2.0.
Key strengths and technologies
The tools I reach for to build scalable, AI-ready, data-intensive systems.
ETL / ELT Pipelines
Scalable ELT and ETL pipelines across SQL Server, Oracle, and Databricks, processing 500K+ daily records.
Databricks & Medallion
Migrating SQL views, tables, and stored procedures to Databricks using Medallion architecture.
FastAPI
High-performance APIs exposing data and AI capabilities as production-ready services.
Python & SQL
pandas, NumPy, scikit-learn, and async/await with advanced SQL and PL/SQL on PostgreSQL and Oracle.
Azure Cloud
Azure Data Factory, SQL, Functions, and Blob Storage with Power BI for enterprise governance.
AI & Machine Learning
ML pipelines, feature engineering, RAG, LLMs, and agentic AI for AI-ready data platforms.
Where the pipeline runs
Working Student, Data Engineer
- Migrated 100+ SQL views, tables, and stored procedures to Databricks using Medallion architecture, building the reference foundation for the broader cloud migration program.
- Diagnosed and fixed a Databricks orchestrator job that had failed for 3+ runs and blocked 419K+ daily records, resolving a credential race condition with an in-memory PEM key auth fix.
- Resolved a silent SCD2 data quality bug caused by inconsistent Azure API ID casing, eliminating 4,800+ duplicate records across 59 views by automating blast-radius analysis over 31 tables with PySpark.
- Delivered fully automated Power BI KPI dashboards adopted by governance and business stakeholders, replacing manual reporting, and integrated REST APIs to synchronize IT asset and security data across enterprise platforms.
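To illustrate the kind of casing issue mentioned above, here is a minimal plain-Python sketch (not the production PySpark fix) of how identifiers that differ only by letter case can be grouped so the duplicates become visible. The function name `find_casing_duplicates` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def find_casing_duplicates(ids):
    """Group IDs that differ only by letter casing (hypothetical helper)."""
    groups = defaultdict(set)
    for raw_id in ids:
        # casefold() gives a case-insensitive key for comparison
        groups[raw_id.casefold()].add(raw_id)
    # Keep only keys that were seen under more than one spelling
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical sample IDs, made up for illustration
sample_ids = [
    "/subscriptions/ABC/resourceGroups/rg1",
    "/subscriptions/abc/resourcegroups/rg1",
    "/subscriptions/abc/resourceGroups/rg2",
]
duplicates = find_casing_duplicates(sample_ids)
```

Normalizing the key once at ingestion, rather than at every join, is one common way to keep such mismatches from opening spurious SCD2 versions.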
Data Engineer
- Architected end-to-end ELT and ETL pipelines processing 500K+ daily records across SQL Server, Oracle, and Databricks for Nordea Bank's Credit and Risk Transformation program.
- Engineered 50+ Python automation components and advanced PL/SQL packages, procedures, and triggers, improving data pipeline efficiency by 30% and speed by 20%.
- Re-engineered a legacy batch pipeline into an incremental load model, cutting processing time by 60%.
- Conducted root-cause analysis on 30+ data discrepancies, improving downstream data trust scores by 25%, and orchestrated workflows with Python, SQL Developer, and Apache Airflow.
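The incremental load model mentioned above can be sketched in a few lines. This is a generic high-watermark pattern under assumed names (`incremental_batch`, an `updated_at` column, in-memory rows), not the pipeline built at Nordea: each run picks up only rows changed since the last stored watermark instead of reprocessing the full batch.

```python
from datetime import datetime


def incremental_batch(rows, last_watermark):
    """Return rows modified after the watermark, plus the new watermark."""
    fresh = [row for row in rows if row["updated_at"] > last_watermark]
    # If nothing changed, the watermark stays where it was
    new_watermark = max(
        (row["updated_at"] for row in fresh), default=last_watermark
    )
    return fresh, new_watermark


# Hypothetical source rows, made up for illustration
source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
    {"id": 3, "updated_at": datetime(2024, 1, 5)},
]
batch, watermark = incremental_batch(source_rows, datetime(2024, 1, 2))
```

Passing the returned watermark into the next run yields an empty batch until new changes arrive, which is where the time saving over a full reload comes from.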
Machine Learning Intern
- Built a used-car price prediction model with supervised regression using pandas, NumPy, matplotlib, seaborn, and SciPy.
Featured work
Production-grade, agentic, and full-stack data and AI work, from raw ingestion to auditable, queryable insight.
Multi-Agent Data Vault 2.0
A multi-agent LLM system that auto-generates Data Vault 2.0 warehouse models (hubs, links, and satellites) from raw source schemas. A six-stage pipeline combines LLM-based schema inference with deterministic risk checks and majority-vote consensus across model runs, so no single bad output can break a run. It includes a thread-safe rate limiter with adaptive batch splitting, a human-in-the-loop governance layer with severity-graded validation, and an insert-only audit trail for full data lineage. A full-stack review app lets engineers move a new source from connection to approved model without manual scripting.
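The majority-vote idea can be shown with a minimal sketch. It assumes each model run returns a single label for a schema element and that a strict majority is required; the function name `majority_vote` and the sample labels are hypothetical, and the project's actual consensus logic may differ.

```python
from collections import Counter


def majority_vote(candidates):
    """Return the value chosen by a strict majority of runs, else None."""
    if not candidates:
        return None
    value, count = Counter(candidates).most_common(1)[0]
    # Require more than half of the runs to agree
    return value if count > len(candidates) / 2 else None


# Hypothetical outputs from three model runs for one table
runs = ["hub", "hub", "satellite"]
decision = majority_vote(runs)
```

Returning `None` when no strict majority exists leaves room to route the item to a human reviewer rather than accepting an arbitrary answer.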
MSc Data Science, AI & Digital Business
GISMA University of Applied Sciences
BE Information Science & Engineering
Visvesvaraya Technological University
- ML Pipelines
- Feature Engineering
- RAG
- NLP
- LLMs
- Agentic AI
- GenAI
- Python
- SQL
- ETL and ELT
- Databricks
- Azure Data Factory
- Airflow
- PostgreSQL
- Docker
419K+ daily records unblocked
Repaired a Databricks orchestrator job that had failed for 3+ runs, using an in-memory PEM key auth fix.
4,800+ duplicate records eliminated
Resolved a silent SCD2 data quality bug across 59 views via PySpark blast-radius analysis.
60% faster batch processing
Re-engineered a legacy batch pipeline into an incremental load model at Nordea Bank.
30% pipeline efficiency gain
Engineered 50+ Python automation components, improving processing speed by 20%.
Python Programming — Beginner to Advanced
Skill Development Program on Artificial Intelligence
McKinsey.org Forward Program
Programming in Java
Let's build something scalable.
Open to full-time Data Engineer roles across Berlin, Germany, and the EU. The fastest way to reach me is by email or LinkedIn.