Sr. Data Engineer
100% Remote Forever
Contract to Hire
As a Senior Data Engineer, you will work cross-functionally with data scientists, data analysts, product managers, and stakeholders to understand business needs and to develop, maintain, and optimize the datasets, data models, and large-scale data pipelines, primarily in the Azure Databricks Spark cloud stack, that power data science models and visualizations. You will partner with teams to drive best practices and set standards for data engineering patterns and optimization, and you will be a key influencer in data engineering strategy.
This is a unique, high-visibility opportunity for someone who wants to make a business impact, dive deep into large-scale data pipelines, and work closely with cross-functional teams. This position is fully remote and reports directly to the Director of Data Science. You'll enjoy the flexibility to telecommute from anywhere within the U.S. as you take on some tough challenges.
- Design, build, optimize, and manage modern large-scale data pipelines (ETL/ELT processing) to support data integration for analytics, machine learning features, and predictive modeling.
- Consume data from a variety of sources (RDBMS, APIs, FTP servers, and other cloud storage) and formats (Excel, CSV, XML, JSON, Parquet, unstructured).
- Write advanced, complex SQL with attention to performance tuning and query optimization.
- Identify ways to improve data reliability, data integrity, system efficiency and quality.
- Participate in the architectural evolution of data engineering patterns, frameworks, systems, and platforms, including defining best practices and standards for managing data collection and integration.
- Work with data scientists to deploy machine learning models to real-time analytics systems.
- Design and build data service APIs.
- Mentor other data engineers and provide significant technical direction, teaching them how to leverage cloud data platforms.
- Bachelor’s degree in Computer Science, Engineering, Mathematics, Statistics, Economics or related discipline
- 3+ years of experience in data engineering, data integration, data modeling, data architecture, and ETL/ELT processes to provide quality data and analytics solutions
- 3+ years of experience in SQL, including designing complex data schemas and optimizing query performance
- 2+ years of experience in Apache Spark (PySpark / Spark SQL)
- 2+ years of experience in Python
- Experience working with large data sets using big data frameworks (e.g., Hadoop, EMR, Databricks, Spark, Hive)
- Experience using regular expressions, REST APIs, NoSQL, Kafka, CI/CD tooling, and Git
- Extensive knowledge of data architecture principles (e.g., Data Lake, Databricks Delta Lake, Data Warehousing, etc.)
- Extensive knowledge of data modeling techniques, including slowly changing dimensions, aggregation, partitioning, and indexing strategies
- Ability to independently troubleshoot and performance-tune large-scale enterprise systems
- Experience with at least one of the following cloud platforms: Azure, AWS, or GCP
- Excellent collaborator with experience working effectively with cross-functional teams such as leadership, product management, and engineering, and a willingness to inspire other data engineers, data scientists, and analysts
- Strong communication skills, with the ability to convey technical concepts to both technical and non-technical audiences
Brooksource provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, national origin, age, sex, citizenship, disability, genetic information, gender, sexual orientation, gender identity, marital status, amnesty or status as a covered veteran in accordance with applicable federal, state, and local laws.