- Architected, established, and managed data science lab and ML operations workspaces in Databricks to enable citizen data scientists.
- Established a model management framework and built an ML platform consisting of infrastructure, services, tools, and common libraries to enable scalability and support the end-to-end ML development process.
- Adopted best practices and established data design patterns from data engineering in ML engineering to enable unified enterprise solutions and governance.
- Productionised a high fouling event prediction model to alert power plant operators and trigger preventive measures.
Stephanie Mak
About
Stephanie Mak is a highly experienced data and machine learning engineer with a strong track record of delivering data/ML projects and enabling data-driven decision-making that aligns with business goals. She holds the Databricks Certified Machine Learning Associate and Databricks Certified Data Engineer Associate certifications. Her latest work at EnergyAustralia includes the design and delivery of Databricks multi-workspace solutions for an enterprise data lakehouse, a data science laboratory, and ML engineering.
Employment
- Established an MLOps pattern in Databricks and operationalised the model lifecycle with version control.
- Designed and built a self-service platform in Databricks with generic frameworks and patterns, enabling business users to access data with low friction and shortening the time to deliver insights.
- Technologies used: Python, Spark, PySpark, AWS (Redshift, EMR, Lambda, Glue), Azure DevOps, Terraform, Power BI, Tableau, Databricks, MLOps, Airflow
- Established organizational data development and quality standards and worked with stakeholders to ensure the standards were met.
- Designed and implemented organizational data system management, formulating and enforcing data management policies and standards.
- Designed and implemented pipeline automation and orchestration to facilitate rapid business change.
A subsidiary of RACV building scalable, real-time ETL pipelines in AWS to provide traffic data as a service.
- Developed a strategic, data-driven solution using a supervised ML model to project traffic volume across multiple dimensions, e.g. road conditions, vehicle perspective, congestion levels, and demographics.
- Implemented the full lifecycle of an in-house scalable ML framework, from data preparation and manipulation through model training and tuning to model deployment, capable of ingesting 100 million training records.
- Explored and untangled SCATS data (traffic volume around intersections), translating and transforming low-level sixel binary data into layman's terms and human-readable formats in a systematic and reproducible way.
- Designed and architected a fault-tolerant ETL pipeline in AWS to transform published traffic and SCATS data from multiple ingestion feeds into congestion-flow and traffic-volume data assets, in both real time and bulk (~20 billion records).
- Implemented a cloud migration strategy to re-platform on-premises production services and databases to AWS.
- Technologies used: Python, Java, AWS (API Gateway, SageMaker, ECS, DynamoDB, Lambda, Athena, SQS, SNS, KMS), Docker, Git, Terraform, Power BI, Jenkins, XGBoost, Databricks, PySpark
- Migrated mainframe ETL scripts to Teradata and scheduled jobs with IBM Tivoli Workload Scheduler during a three-month data warehouse migration from the mainframe to a hybrid Hadoop and Teradata model.
- Applied data quality assurance during the migration by reconciling data between the source and target systems.
- Built, from scratch to production launch, an OCR-based archiving and search engine for paper records, resulting in SG$600M in revenue by optimizing space and workflow.
- Involved in implementing Open API hosting in the public cloud (Pivotal Cloud Foundry) to support the Hong Kong Monetary Authority's Open API Framework roadmap.
- Improved data integrity and quality and automated the IBG credit approval process by implementing an E-Form platform with business logic and approval workflows, integrating existing systems into E-Form and enabling microservices to ensure seamless data connectivity.
- Implemented financial triggers for regulatory Matters Requiring Attention (MRA) demands, with borrower data identified, extracted, analyzed, and reported to model risk management.
- Delivered a proof of concept (POC) with AWS Rekognition to cross-validate identity from HKID and address-proof documents during the account opening process.
- Technologies: AWS Rekognition, Docker, Elasticsearch, Angular, Spring, Java, Node, Python, Teradata, Hadoop, MSSQL, MySQL, Jenkins, SonarQube
- Pioneered innovative solutions for air traffic management, investigated enabling technologies, and developed a proof of concept with eye-tracking sensor data.
- Designed and developed a real-time analysis web portal to evaluate air traffic controllers' performance based on their behavior and responses, derived from eye and mouse movements.
- Architected and implemented an ingestion pipeline using Kafka to transform eye-tracking data and integrate it with air traffic management systems, producing real-time heat maps and live replays in the frontend.
- Technologies: Apache Kafka, message queue, XML, Grails, HTML, Java, OpenGL, Tomcat, Git
- Was part of the Microsoft Security Research and Response Team, providing anti-malware protection for Windows by building the right tools to enhance researchers' productivity.
- Gained and delivered insights from large anti-malware data sets (4 million records daily) using an internal map-reduce system (COSMOS) and a SQL-like language (SCOPE).
- Facilitated researchers in evaluating malware by aggregating data in a private cloud and creating chains of malware history to generate clusters for specific malware families.
- Monitored data loss in the end-to-end data pipeline and identified a 15% miss rate in the malware classification engine.
- Technologies: MS SQL Reporting, D3.js, NodeXL, Hyper-V, Perl, C#, .NET
