Email Text Cleaner for Communication Mining
Jun 2023
Problem / Purpose
Emails stored in Salesforce were difficult to analyze due to reply chains, signatures, legal disclaimers, and other noise that obscured meaningful communication patterns.
Solution
Built a SQL-based text parser in Snowflake to process email content row by row. Identified and removed repeated signature blocks, quoted reply history, and automated disclaimers to create a cleaned dataset of email content suitable for text analysis and downstream modeling.
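The cleaning steps above can be sketched per row as follows. This is a minimal Python illustration under stated assumptions, not the original Snowflake SQL: it assumes plain-text email bodies, and the regex patterns for reply headers, sign-offs, and disclaimers, as well as the names `clean_email` and `_cut_at_first`, are hypothetical examples rather than the parser's actual rules.

```python
import re

# Hypothetical patterns for illustration only; the real parser's rules are not documented.
# Lines that start quoted reply/forward history (Gmail/Apple-style and Outlook-style headers).
REPLY_RE = re.compile(
    r"^On .+ wrote:\s*$"
    r"|^-{2,}\s*Original Message\s*-{2,}"
    r"|^From:\s.+$",
    re.MULTILINE | re.IGNORECASE,
)
# Lines that start a signature block: the "--" delimiter or a bare sign-off line.
SIGNATURE_RE = re.compile(
    r"^--\s*$"
    r"|^(?:Best regards|Kind regards|Regards|Thank you|Thanks|Best|Sincerely),?\s*$",
    re.MULTILINE | re.IGNORECASE,
)
# Legal disclaimer: everything from the first trigger phrase to the end of the body.
DISCLAIMER_RE = re.compile(
    r"(?:CONFIDENTIALITY NOTICE|This e-?mail and any attachments).*",
    re.IGNORECASE | re.DOTALL,
)


def _cut_at_first(text: str, pattern: re.Pattern) -> str:
    """Keep only the text before the first match of pattern (all of it if no match)."""
    match = pattern.search(text)
    return text[: match.start()] if match else text


def clean_email(body: str) -> str:
    """Return the newest message in an email body, minus disclaimer, reply history, and signature."""
    if not body:
        return ""
    text = body.replace("\r\n", "\n")         # normalize Windows line endings
    text = DISCLAIMER_RE.sub("", text)        # 1. strip legal disclaimer through end of body
    text = _cut_at_first(text, REPLY_RE)      # 2. drop quoted reply/forward history
    text = _cut_at_first(text, SIGNATURE_RE)  # 3. drop sign-off and signature block
    # 4. drop any remaining '>'-quoted lines and trailing whitespace on each line
    lines = [ln.rstrip() for ln in text.split("\n") if not ln.lstrip().startswith(">")]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))  # collapse runs of blank lines
    return text.strip()


if __name__ == "__main__":
    raw = (
        "Hi team,\r\n"
        "The updated invoice is attached.\r\n"
        "\r\n"
        "Thanks,\r\n"
        "Jane Doe\r\n"
        "Payroll Specialist\r\n"
        "\r\n"
        "On Mon, Jun 5, 2023 at 9:00 AM Bob <bob@example.com> wrote:\r\n"
        "> Can you resend the invoice?\r\n"
        "\r\n"
        "CONFIDENTIALITY NOTICE: This e-mail may contain privileged information.\r\n"
    )
    print(clean_email(raw))
    # prints:
    # Hi team,
    # The updated invoice is attached.
```

Truncating at the first marker, rather than deleting matched text in place, reflects the assumption that everything after a reply header or sign-off belongs to older messages or boilerplate. In Snowflake, the same per-row cut could be expressed with `REGEXP_INSTR` to locate the marker and `SUBSTR` to keep the preceding text.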
Key Achievements / Impact
Created a reusable cleaned email dataset that became the foundation for multiple downstream analytics and ML projects, giving communication-based analyses cleaner input by removing the noise that previously obscured message content.
Key Technologies / Tools Used
Snowflake, SQL, Text Cleaning, Email Parsing, Process Design, Process Automation
Role
Data Scientist (self-initiated)
ProService Hawaii