Job Description
Project Overview
Marsh MMA is undertaking a major modernization initiative to unify its currently fragmented Azure data and analytics ecosystem. The goal is to migrate existing systems to a Databricks-native platform, improving operational efficiency, eliminating data duplication, reducing latency, and strengthening compliance, security, and scalability across the organization.
This role supports Databricks platform enablement and the migration of legacy data systems into the new architecture.
Key Responsibilities
- Design, build, and optimize scalable data pipelines using Azure Data Factory, Azure Functions, and Databricks.
- Lead migration efforts from existing systems into the new Databricks-native environment.
- Develop, maintain, and optimize SQL queries, data models, and ETL/ELT workflows.
- Integrate data from sources such as Azure SQL Database, Azure Data Lake, and Azure Storage.
- Implement best practices for data governance, quality, lineage, and security across the new unified platform.
- Collaborate with architects, analysts, and business stakeholders to gather requirements and translate them into technical solutions.
- Troubleshoot, debug, and optimize data processes to minimize latency and ensure system reliability.
- Support Databricks platform enablement, including cluster configuration, performance tuning, and workspace setup.
Required Technical Skills
- Hands-on experience with Azure Data Factory (ADF)
- Strong SQL development and optimization skills
- Experience with Azure Functions
- Proficiency in Azure SQL Database
- Solid understanding of Azure Data Lake and Azure Storage services
- Expertise in Databricks (PySpark/SQL, notebooks, pipelines, clusters)
- Experience with modern data architectures (Lakehouse; Delta Lake preferred)
Preferred Skills (Nice to Have)
- Knowledge of CI/CD for data pipelines (Azure DevOps)
- Experience with data governance tools and practices
- Knowledge of data security and compliance frameworks
- Familiarity with cloud cost optimization for analytics workloads