BIG-TB: A Benchmark Dataset for Genomic Resistance Prediction and Interpretability in Mycobacterium tuberculosis
Date:
This spotlight talk presented the BIG-TB dataset, a multimodal benchmark of ~17,000 M. tuberculosis isolates curated to advance antibiotic resistance prediction and model interpretability.
The presentation highlighted how integrating sequence, structural, and evolutionary features helps models generalize across resistance mechanisms and produce predictions that agree more closely with known resistance biology.
Key topics discussed:
- Dataset design principles and integration of WHO 2023 resistance catalogues
- Evaluation of sequence-based (ESM, CNN) vs structure-aware models
- Insights from causal variant discovery and explainability evaluation (SHAP attributions, Recall@k)
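One common way to use Recall@k for interpretability is to rank variants by attribution magnitude and check how many known resistance-associated variants land in the top k. The following is a minimal sketch under that assumption, not the talk's actual evaluation code. The function name `recall_at_k`, the variant identifiers, and the scores are all hypothetical placeholders, and ranking by absolute attribution value is itself an assumption.

```python
def recall_at_k(scores: dict[str, float], known: set[str], k: int) -> float:
    """Fraction of known variants found among the k variants with the
    largest absolute attribution score (e.g. per-variant SHAP values)."""
    if not known:
        raise ValueError("known must contain at least one variant")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Rank variants by attribution magnitude, largest first.
    ranked = sorted(scores, key=lambda v: abs(scores[v]), reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & known) / len(known)


# Toy attribution scores for one model; values are illustrative only.
scores = {"var_a": 0.42, "var_b": -0.31, "var_c": 0.05, "var_d": 0.18, "var_e": -0.02}
# Hypothetical set of catalogue-confirmed resistance variants.
known = {"var_a", "var_d", "var_e"}

for k in (1, 3, 5):
    print(f"Recall@{k} = {recall_at_k(scores, known, k):.2f}")
```

A higher Recall@k at small k would suggest the model's attributions concentrate on variants already recognized as resistance-associated, which is one way to quantify agreement with a reference catalogue.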
This work underscores the importance of interpretable, biologically grounded ML systems for global health and precision diagnostics.