Data Resources
Whether you're just getting started or diving deeper into the world of data, this collection of curated resources is here to guide you. Explore books, professional organizations, datasets, and tools that support learning, growth, and real-world application in data science and analytics.
Books & Reading Lists
Behavior Analysis
This free online book, Behavior Analysis with Machine Learning Using R by Enrique Garcia Ceja, offers a practical introduction to applying machine learning techniques to behavioral data. It walks through real-world examples using R, covering data collection, modeling, and evaluation—making it a valuable resource for anyone interested in behavioral science or applied ML.
In his article Let's Predict Human Behavior with AI, Artem Oppermann introduces predictive behavior modeling, a branch of predictive analytics that forecasts future customer actions. He frames the task as a classification problem in which a deep neural network, trained on historical customer data, assigns a probability score to each potential action. He also describes practical applications across business sectors, such as anticipating customer needs and supporting decision-making.
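To make the "probability score per action" idea concrete, here is a minimal sketch in Python. It assumes the network's final layer produces one raw score per action and that a softmax turns those scores into probabilities; the article summary does not confirm that choice, and the action names and score values are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical customer actions and raw scores (illustrative values only,
# standing in for the output of a trained network).
ACTIONS = ["purchase", "browse", "churn"]
raw_scores = [2.0, 1.0, 0.1]

# One probability per candidate action; the highest one is the prediction.
probabilities = dict(zip(ACTIONS, softmax(raw_scores)))
most_likely = max(probabilities, key=probabilities.get)
```

Ranking actions by probability, rather than returning only the top one, lets a business set its own threshold for when to act on a prediction.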
The document titled Data-Centric Machine Learning Research: A Brief History by Andrew Zhai, submitted to MIT's Department of Electrical Engineering and Computer Science, explores the evolution of data-centric approaches in machine learning. It discusses how the focus has shifted from model-centric to data-centric methodologies, emphasizing the importance of high-quality data in developing effective machine learning systems. The paper also examines various techniques and frameworks that have been proposed to enhance data quality and their impact on model performance.
This study shows that deep neural networks can outperform traditional models in predicting human decision-making by capturing both reward-based and pattern-based behavior.
Data Analysis
This tutorial teaches how to explore and analyze network data using Python, covering key concepts in network analysis and guiding users through practical steps using the NetworkX library.
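As a rough illustration of the kind of measures a network analysis tutorial covers, the sketch below computes degree centrality and shortest path length using only the Python standard library. NetworkX provides functions for both (`nx.degree_centrality` and `nx.shortest_path_length`); the graph and the names in it are hypothetical and are not taken from the tutorial.

```python
from collections import deque

# Hypothetical undirected network stored as an adjacency mapping.
GRAPH = {
    "Ana": {"Ben", "Cy"},
    "Ben": {"Ana", "Cy", "Dee"},
    "Cy": {"Ana", "Ben"},
    "Dee": {"Ben", "Eve"},
    "Eve": {"Dee"},
}


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


centrality = degree_centrality(GRAPH)
hops = shortest_path_length(GRAPH, "Ana", "Eve")
```

Here Ben has the highest centrality because he is connected to three of the four other people, which is the sort of pattern these measures are meant to surface.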
Data Architecture & Modeling
This page from The Data Crew provides a clear, beginner-friendly overview of data warehouses—what they are, how they differ from databases, and why they're essential for storing and analyzing large volumes of historical data.
This Oracle page explains that a database is an organized collection of data, typically stored and accessed electronically, and highlights how modern databases support efficient data management, scalability, and security.
This page explains that a relational database organizes data into tables with rows and columns, using relationships between tables to efficiently store and retrieve structured information.
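The idea of tables linked by relationships can be sketched with Python's built-in `sqlite3` module. The table names, columns, and rows below are hypothetical examples and do not come from the linked page.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employers (
    employer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Each claim points at one employer through a foreign key.
CREATE TABLE claims (
    claim_id    INTEGER PRIMARY KEY,
    employer_id INTEGER NOT NULL REFERENCES employers(employer_id),
    injury      TEXT NOT NULL
);
INSERT INTO employers VALUES (1, 'Acme Roofing'), (2, 'Birch Bakery');
INSERT INTO claims VALUES (10, 1, 'fall'), (11, 1, 'strain'), (12, 2, 'burn');
""")

# The relationship lets one query combine rows from both tables.
claims_per_employer = conn.execute("""
SELECT e.name, COUNT(*) AS claim_count
FROM claims AS c
JOIN employers AS e ON e.employer_id = c.employer_id
GROUP BY e.name
ORDER BY e.name
""").fetchall()
```

Storing the employer name once and referencing it by key avoids repeating it on every claim row.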
This blog post explains how to apply the Kimball methodology for data modeling, focusing on building dimensional models that support efficient, user-friendly analytics in data warehouses.
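A dimensional model in the Kimball style is usually drawn as a star schema: a central fact table of measurements surrounded by descriptive dimension tables. The sketch below shows that shape with Python's built-in `sqlite3` module. The `fact_sales`, `dim_date`, and `dim_product` tables and their contents are hypothetical and are not taken from the blog post.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
-- Dimension tables hold descriptive attributes used for filtering and grouping.
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER,
    month    INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    category    TEXT
);
-- The fact table holds numeric measures plus a key into each dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    amount      REAL
);
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO fact_sales VALUES
    (20240101, 1, 2, 30.0),
    (20240101, 2, 1, 60.0),
    (20240201, 1, 1, 15.0);
""")

# A typical analytic query: aggregate a measure, grouped by a dimension attribute.
sales_by_category = warehouse.execute("""
SELECT p.category, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p USING (product_key)
GROUP BY p.category
ORDER BY p.category
""").fetchall()
```

Grouping by month instead of category only requires joining `dim_date` in the same way, which is part of what makes this layout convenient for analysts.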
This article explores the idea of using a "One Big Table" approach with LLMs to simplify data engineering, suggesting that large language models could reduce the need for complex data pipelines and modeling.
Data Visualization
Fundamentals of Data Visualization by Claus O. Wilke is a free online book that teaches the principles of effective data visualization using clear examples in R. It covers everything from choosing the right chart type to color usage and layout, making it a great resource for anyone aiming to communicate data more clearly.
Information is Beautiful is a website by David McCandless that showcases compelling, visually driven stories based on data. It features interactive and beautifully designed infographics that make complex information engaging and easy to understand.
This page from ASQ explains what histograms are, how to create them, and how they help visualize the distribution of data to identify patterns or variations.
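Building a histogram comes down to splitting the value range into equal-width bins and counting how many observations land in each one. The following is a minimal Python sketch of that step, not ASQ's procedure. The data values are made up, and the code assumes every value is at or above `start`.

```python
def histogram(values, bin_width, start=0):
    """Count values per equal-width bin [lo, lo + bin_width), empty bins included."""
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return counts


def render(counts, bin_width, start=0):
    """Draw one text row per bin, with one '#' per observation."""
    lines = []
    for i, n in enumerate(counts):
        lo = start + i * bin_width
        lines.append(f"[{lo:>2}, {lo + bin_width:>2}) " + "#" * n)
    return "\n".join(lines)


# Hypothetical measurements (illustrative values only).
DAYS_LOST = [1, 3, 4, 7, 8, 8, 9, 12, 14, 21]
counts = histogram(DAYS_LOST, bin_width=5)
chart = render(counts, bin_width=5)
```

Keeping the empty bin between 15 and 20 in the output matters, because gaps and isolated values are exactly the kind of variation a histogram is meant to reveal.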
This guide introduces the Junk Charts Trifecta Checkup, a framework for evaluating data visualizations based on their data, visual design, and messaging to ensure clarity and effectiveness.
Machine Learning
The article An Introductory Review of Deep Learning for Prediction Models With Big Data provides a comprehensive overview of key deep learning architectures—including Deep Feedforward Neural Networks (D-FFNN), Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Autoencoders (AEs), and Long Short-Term Memory (LSTM) networks—and discusses their applications in handling large datasets.
This KDnuggets article offers a practical guide to choosing the right machine learning algorithm based on factors like data size, type, accuracy needs, and interpretability.
Tools & Libraries
Coefficient offers 100+ free, customizable spreadsheet templates for Google Sheets and Excel, covering tasks like inventory management, financial reporting, and content planning. They're designed to save time and streamline workflows for individuals and teams.
Tableau is a visual analytics platform that helps people explore, understand, and share data through interactive dashboards and visualizations.
Tableau Public’s Discover page showcases a curated gallery of interactive visualizations created by the community, offering inspiration and real-world examples across topics like business, health, and sports.
Learning Platforms & Courses
DataCamp is an online learning platform offering interactive courses in data science, analytics, and programming with a focus on Python, R, SQL, and machine learning.
The Python Institute offers certification programs and resources for learning Python, from beginner to advanced levels, to support professional development and validate programming skills.
Tableau's Learning Paths offer guided resources and courses tailored to different roles—like analysts, business users, and data scientists—to help users build skills in data visualization and analysis.
NCCI’s Learning Center offers educational resources on workers’ compensation, including articles, videos, and webinars covering insurance basics, data reporting, and industry trends.
Online Data Sources
This page from Washington State’s L&I provides access to workers’ compensation injury data, helping employers analyze injury trends and improve workplace safety.
This page from III.org provides key facts and statistics on workplace safety and workers’ compensation, including injury rates, common causes, and insurance costs across industries.
This page from WCIO provides technical specifications and related documents for standardized workers’ compensation data reporting across states, including codes and formats used by insurers.
This CMS page provides resources and guidance on ICD-10 codes, which are used for classifying diagnoses and medical procedures in Medicare billing and reporting.
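This FDA page provides access to the National Drug Code (NDC) Directory, a database of drug products listed with the FDA for commercial distribution in the U.S., including product details needed for labeling, billing, and tracking. Inclusion in the directory does not by itself mean a drug is FDA-approved.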
This Noridian Medicare page explains revenue codes, which are used on Medicare claims to indicate the type of service provided and the department or cost center responsible for the service.