Stephanie Mak
Data Science Developer

Senior Machine Learning Engineer at EnergyAustralia.

About

Stephanie Mak is a highly experienced data and machine learning engineer with a strong track record of delivering data and ML projects and enabling data-driven decision-making aligned with business goals. She holds the Databricks Certified Machine Learning Associate and Databricks Certified Data Engineer Associate certifications. Her latest work at EnergyAustralia includes the design and delivery of Databricks multi-workspace solutions for an enterprise data lakehouse, a data science laboratory, and ML engineering.

Employment

Aug 2022 - Present 3 years 9 months
Senior Machine Learning Engineer
EnergyAustralia
  • Architected, established, and managed data science lab and ML operations workspaces in Databricks to enable citizen data scientists.
  • Established a model management framework and built an ML platform consisting of infrastructure, services, tools, and common libraries to enable scalability and support the end-to-end ML development process.
  • Adopted best practices and established data design patterns from the data engineering space in ML engineering to enable unified enterprise solutions and governance.
  • Productionised a high-fouling-event prediction model to alert power plant operators and trigger preventive measures.
Sep 2021 - Aug 2022 11 months
Senior Data Engineer
EnergyAustralia
  • Established an MLOps pattern in Databricks and operationalised the model lifecycle with version control.
  • Designed and built a self-service platform in Databricks with generic frameworks and patterns to enable business users to access data with low friction and shorten the time to deliver insight.
  • Technologies used: Python, Spark, PySpark, AWS (Redshift, EMR, Lambda, Glue), Azure DevOps, Terraform, Power BI, Tableau, Databricks, MLOps, Airflow
Mar 2021 - Sep 2021 6 months
Senior Data Engineer
Intelematics
  • Established organizational data development and quality standards and worked with stakeholders to ensure those standards were met.
  • Designed and implemented organizational data system management to formulate, implement, and enforce proper data management policies and standards.
  • Designed and implemented pipeline automation and orchestration to facilitate rapid business change.
Aug 2019 - Mar 2021 1 year 7 months
Data Engineer
Intelematics

A subsidiary of RACV building scalable, real-time ETL pipelines in AWS to provide traffic data as a service.

  • Developed a strategic data-driven solution using a supervised ML model to project traffic volume across multiple dimensions, e.g. road conditions, vehicle perspective, congestion levels, and demographics.
  • Implemented the full lifecycle of an in-house scalable ML framework, from data preparation and manipulation through model training and tuning to model deployment, capable of ingesting 100 million training records.
  • Explored and untangled SCATS data (traffic volume around intersections), translating and transforming low-level sixel binary data into human-readable formats in a systematic, reproducible way.
  • Designed a fault-tolerant ETL pipeline in AWS to transform published traffic and SCATS data from multiple ingestion feeds into congestion-flow and traffic-volume data assets, in real time and in bulk (~20 billion records).
  • Implemented a cloud migration strategy to re-platform on-premise production services and databases to AWS.
  • Technologies used: Python, Java, AWS (API Gateway, Sagemaker, ECS, Dynamo, Lambda, Athena, SQS, SNS, KMS), Docker, Git, Terraform, Power BI, Jenkins, xgboost, Databricks, PySpark
Mar 2017 - Jul 2019 2 years 4 months
Software Engineer
DBS Bank
  • Migrated Mainframe ETL scripts to Teradata scripts and scheduled jobs with IBM Tivoli Workload Scheduler during data warehouse migration (from Mainframe to hybrid model of Hadoop and Teradata) in 3 months.
  • Applied data quality assurance during the migration by reconciling data between the source and target systems.
  • Built, from scratch to production launch, an OCR-based archiving and search engine for paper records, resulting in SG$600M in revenue by optimizing space and workflow.
  • Implemented Open API hosting in the public cloud (Pivotal Cloud Foundry) to support the Hong Kong Monetary Authority Open API Framework roadmap.
  • Automated the IBG credit approval process by implementing an E-Form platform with business logic and an approval workflow, integrating existing systems into the E-Form and enabling microservices to ensure seamless data connectivity and improve data integrity and quality.
  • Implemented financial triggers for regulatory Matters Requiring Attention (MRA) requirements, with borrower data identified, extracted, analyzed, and reported to model risk management.
  • Led a proof of concept with AWS Rekognition to cross-validate identity between HKID and address-proof documents during the account-opening process.
  • Technologies: AWS Rekognition, Docker, Elasticsearch, Angular, Spring, Java, Node, Python, Teradata, Hadoop, MSSQL, MySQL, Jenkins, SonarQube
Jun 2016 - Dec 2016 6 months
Software Engineer Intern
Thales
  • Pioneered innovative solutions for air traffic management, investigated enabling technologies, and developed a proof of concept with eye-tracking sensor data.
  • Designed and developed a real-time analysis web portal to evaluate air traffic controller performance based on behavior and responses captured from eye and mouse movement.
  • Architected and implemented an ingestion pipeline using Kafka to transform eye-tracking data and integrate it with air traffic management systems, producing real-time heat maps and live replay in the frontend.
  • Technologies: Apache Kafka, message queue, XML, Grails, HTML, Java, OpenGL, Tomcat, Git
Dec 2014 - Dec 2015 1 year
Software Engineer
Microsoft
  • Worked in the Microsoft Security Research and Response Team, providing anti-malware protection for Windows by building the right tools to enhance researchers' productivity.
  • Gained and delivered insights from large anti-malware data sets (4 million records daily) using an internal map-reduce system (COSMOS) and a SQL-like language (SCOPE).
  • Facilitated researchers in evaluating malware by aggregating data in a private cloud and creating chains of malware history to generate clusters for certain malware families.
  • Monitored data loss in the end-to-end data pipeline and identified a 15% miss rate in the malware classification engine.
  • Technologies: MS SQL Reporting, D3.js, NodeXL, HyperV, Perl, C#, .Net

Education

Nov 2022
Certificate, Databricks Certified Machine Learning Associate
Databricks
May 2022
Certificate, Databricks Certified Data Engineer Associate
Databricks
Jan 2022
Certificate, Astronomer Certification DAG Authoring for Apache Airflow
Astronomer
Dec 2021
Certificate, Astronomer Certification for Apache Airflow Fundamentals
Astronomer
Dec 2019
Certificate, AWS Certified Security - Specialty
Amazon Web Services Training and Certification
Jul 2019
Certificate, AWS Certified Machine Learning - Specialty
Amazon Web Services Training and Certification
Jul 2019
Certificate, AWS Certified Big Data - Specialty
Amazon Web Services Training and Certification
Jun 2019
Certificate, AWS Certified Solutions Architect - Associate
Amazon Web Services Training and Certification
Mar 2014 - Mar 2016 2 years
Master of Information Technology, Big Data
RMIT University
