David Y.
Profile Summary
Analytical and process-oriented data scientist with expert-level knowledge of statistics, algorithms, multivariate analysis, machine learning tools in Python, database systems, and working with data at scale. Proven track record of analyzing complex data sets and serving as a reliable advisor. Passionate about finding insights that are business-focused while also achievable at scale.
Expertise
Python Developer
> 5 Years' Experience, 5/5
Machine Learning Engineer
> 5 Years' Experience, 5/5
Data Scientist
> 5 Years' Experience, 5/5
Employment History
Sr Data Scientist / Founder (CDO)
Guided the team's analysis practice on projects focused on revenue classification, engagement/progression, and churn for mobile game publishers. Led the implementation of our data lake and warehouse projects on Google Cloud Platform. Built and maintained toolsets for automating sklearn pipelines for testing, production deployment, and monitoring of the Bluwhale platform. Key contributor to the platform product team, defining use cases for ML-related product features and monitoring the health of live classification and regression models.
Highlights
Produced an insight that uncovered server errors, leading to a 5% reduction in first-day churn for a client with 10M+ DAU.
Environment:
Python ML stack (numpy, Pandas, Matplotlib, sklearn, Keras, TensorFlow)
Google Cloud Platform (BigQuery, GKE, Dataproc)
Global Lead Data Science Instructor
Led content development of data science pilot programs and curriculum for the global organization, including machine learning, statistics, math, engineering, and their applications. I worked with stakeholders to develop internal and student-facing lectures, labs, and project material based on learning objectives and market research, with job placement outcomes in mind. My role consisted of research and development, in-class instructor support, lecturing on key data science topics, and consulting on the scope and methodology of final student projects.
Highlights
- Learned to code the classic ML algorithms from scratch during the development of curriculum
- Developed metrics that led to improved NPS across North American regions
- Piloted original data science immersive programs now taught globally
- Authored 15 in-depth case studies used in the classroom across the organization related to supervised and unsupervised learning, model deployment, metrics, testing, and validation.
Environment: Pandas, Jupyter, numpy, sklearn, scipy, keras, tensorflow, matplotlib, Spark, SQL, AWS, Linux systems
Data Scientist
Built and managed collaborative-filtering-based recommender system pipelines for a production mobile data app for women. Supported marketing initiatives through sentiment models and bespoke Python/Flask prototypes to identify key brand influences for outreach.
Highlights
- Recommender system tuned to increase active session length by 10%
- Moved historical data off a 300M+ MySQL instance to Redshift for better online analysis and analytics
Environment: Python, Scala, SQL, sklearn, EMR, AWS, RDS
Sr Data Developer
I built a database of 6 million artists with accompanying search algorithms, architected an enterprise-grade content management system supporting seven languages, integrated six featured Consumer Electronics Show projects related to recommenders and ML-based music identification, and built a content classification suite of tools that automated manual tagging processes.
- Lead development role for content delivery, search, and prototype/pilot related web projects.
- Ownership of web application security guidelines and practices.
- Reviewed and approved schema design, queries, and other database resources for the development/engineering team, ensuring best practices.
- Reviewed and recommended testing strategies for major projects (unit, white box, system, availability).
- ETL of disparate sources, building RESTful services around clean(er) data.
- Bespoke reporting and BI/analysis tools.
- Tuning databases for performance, availability, and integrity.
- Scripting to monitor and automate processes.
Environment: Linux/Unix, AWS, MongoDB, MySQL, Oracle, Python, Java, sklearn, Pandas