- Architected, established, and managed data science lab and ML operations workspaces in Databricks to enable citizen data scientists.
- Established a model management framework and built an ML platform consisting of infrastructure, services, tools, and common libraries to enable scalability and support the end-to-end ML development process.
- Adopted best practices and established data design patterns from data engineering in ML engineering to enable unified enterprise solutions and governance.
- Productionised a high fouling event prediction model to alert power plant operators and trigger preventive measures.
Stephanie Mak
About
Stephanie Mak is a highly experienced data and machine learning engineer with a strong track record of delivering data/ML projects and enabling data-driven decision-making that aligns with business goals. She holds the Databricks Certified Machine Learning Associate and Databricks Certified Data Engineer Associate certifications. Her latest work at EnergyAustralia includes the design and delivery of Databricks multi-workspace solutions for an enterprise data lakehouse, a data science laboratory, and ML engineering.
Employment
- Established an MLOps pattern in Databricks and operationalised the model lifecycle with version control.
- Designed and built a self-service platform in Databricks with generic frameworks and patterns, enabling business users to access data with low friction and shortening the time to deliver insights.
- Technologies used: Python, Spark, PySpark, AWS (Redshift, EMR, Lambda, Glue), Azure DevOps, Terraform, Power BI, Tableau, Databricks, MLOps, Airflow
- Established organizational data development and quality standards and worked with stakeholders to ensure the standards were met.
- Designed and implemented organizational data system management, formulating and enforcing data management policies and standards.
- Designed and implemented pipeline automation and orchestration to facilitate rapid business change.
A subsidiary of RACV building scalable, real-time ETL pipelines in AWS to provide traffic data as a service.
- Developed a strategic, data-driven solution using a supervised ML model to project traffic volume across multiple dimensions, e.g. road conditions, vehicle perspective, congestion levels, and demographics.
- Implemented the full lifecycle of an in-house scalable ML framework, from data preparation and manipulation through model training and tuning to model deployment, capable of ingesting 100 million training records.
- Explored and untangled SCATS data (traffic volume around intersections), translating and transforming low-level sixel binary data into layman's terms and human-readable formats in a systematic and reproducible way.
- Designed and architected a fault-tolerant ETL pipeline in AWS to transform published traffic and SCATS data from multiple ingestion feeds into congestion-flow and traffic-volume data assets, in both real time and bulk (~20 billion records).
- Implemented a cloud migration strategy to re-platform on-premises production services and databases to AWS.
- Technologies used: Python, Java, AWS (API Gateway, SageMaker, ECS, DynamoDB, Lambda, Athena, SQS, SNS, KMS), Docker, Git, Terraform, Power BI, Jenkins, XGBoost, Databricks, PySpark
- Migrated mainframe ETL scripts to Teradata and scheduled jobs with IBM Tivoli Workload Scheduler during a three-month data warehouse migration from the mainframe to a hybrid Hadoop and Teradata model.
- Applied data quality assurance during the migration by reconciling data between the source and target systems.
- Built, from scratch to production launch, an OCR-based archiving and search engine for paper records, resulting in SG$600M in revenue by optimizing space and workflow.
- Involved in implementing Open API hosting in the public cloud (Pivotal Cloud Foundry) to support the Hong Kong Monetary Authority's Open API Framework roadmap.
- Improved data integrity and quality and automated the IBG credit approval process by implementing an E-Form platform with business logic and approval workflows, integrating existing systems into E-Form and enabling microservices to ensure seamless data connectivity.
- Implemented financial triggers for regulatory Matters Requiring Attention (MRA) demands, with borrower data identified, extracted, analyzed, and reported to model risk management.
- Delivered a proof of concept (POC) with AWS Rekognition to cross-validate identity from HKID and address-proof documents during the account opening process.
- Technologies: AWS Rekognition, Docker, Elasticsearch, Angular, Spring, Java, Node, Python, Teradata, Hadoop, MSSQL, MySQL, Jenkins, SonarQube
- Pioneered innovative solutions for air traffic management, investigated enabling technologies, and developed a proof of concept with eye-tracking sensor data.
- Designed and developed a real-time analysis web portal to evaluate air traffic controllers' performance based on their behavior and responses, derived from eye and mouse movements.
- Architected and implemented an ingestion pipeline using Kafka to transform eye-tracking data and integrate it with air traffic management systems, producing real-time heat maps and live replays in the frontend.
- Technologies: Apache Kafka, message queue, XML, Grails, HTML, Java, OpenGL, Tomcat, Git
- Was part of the Microsoft Security Research and Response Team, providing anti-malware protection for Windows by building the right tools to enhance researchers' productivity.
- Gained and delivered insights from large anti-malware data sets (4 million records daily) using an internal map-reduce system (COSMOS) and a SQL-like language (SCOPE).
- Facilitated researchers in evaluating malware by aggregating data in a private cloud and creating chains of malware history to generate clusters for specific malware families.
- Monitored data loss in the end-to-end data pipeline and identified a 15% miss rate in the malware classification engine.
- Technologies: MS SQL Reporting, D3.js, NodeXL, Hyper-V, Perl, C#, .NET
