DP · 01 · 4 credits
Data Analysis
Systematic methods for examining datasets to identify patterns, anomalies, and actionable signals — covering exploratory analysis, statistical reasoning, and the communication of findings to technical and non-technical audiences. Emphasis on analysis as a decision-support discipline, not a reporting exercise.
DP · 02 · 4 credits
Data Classification
Design and implementation of systems that assign structured categories to unstructured or semi-structured data — taxonomy development, classifier training and evaluation, and quality validation at scale. Covers both rule-based and ML-driven classification pipelines.
DP · 03 · 4 credits
Data Governance
The organizational discipline of managing data as a strategic asset: ownership models, access controls, lineage tracking, retention policy, and the audit structures that satisfy regulatory and compliance requirements. Emphasis on AI-heavy environments, where data provenance directly affects model accountability.
DP · 04 · 4 credits
Data Labeling
The full pipeline of human-in-the-loop data annotation — task design, labeling tool selection, quality assurance protocols, inter-annotator agreement measurement, and the economics of labeling at scale. Covers automated pre-labeling, active learning, and the tradeoffs between speed, cost, and quality.
DP · 05 · 4 credits
Data Management
End-to-end practices for acquiring, storing, transforming, and retiring data across an organization — schema design, pipeline architecture, storage optimization, and the operational standards that prevent data debt from accumulating silently beneath production AI systems.
DP · 06 · 4 credits
Data Privacy
The regulatory landscape (GDPR, CCPA, HIPAA) and the technical controls — anonymization, differential privacy, consent management, access controls — that protect individual rights and limit organizational liability in data-driven AI systems. Emphasis on privacy-by-design over compliance-after-the-fact.
DP · 07 · 4 credits
Data Quality
Methods and infrastructure for defining, measuring, and enforcing quality standards in data pipelines: completeness checks, anomaly detection, schema validation, and the organizational processes that maintain trust in data over time. Includes quality metrics specific to AI training datasets.
DP · 08 · 4 credits
Data Structures
A practitioner's guide to the data structures underlying AI and analytics systems — arrays, trees, graphs, hash maps, and the reasoning required to select and implement the right structure for a given problem. Focuses on the structural decisions that affect performance at scale in real ML pipelines.
DP · 09 · 4 credits
Data Validation
Design of validation logic that ensures data meets defined standards before entering pipelines, models, or reports — schema enforcement, constraint testing, and the testing frameworks that catch errors upstream where they are cheap to fix rather than downstream where they are expensive to trace.
DP · 10 · 4 credits
Data Extraction
Methods for reliably extracting structured and semi-structured data from diverse sources — web scraping, API integration, document parsing, OCR pipelines, and the transformation logic that converts raw, heterogeneous inputs into usable datasets for analysis and model training.
DP · 11 · 4 credits
Information Retrieval
The theory and systems behind finding relevant information in large corpora — indexing strategies, ranking algorithms, semantic and hybrid search, and the evaluation metrics that distinguish retrieval quality from retrieval volume. Covers classical IR and the vector-search systems underpinning modern AI applications.