Talks and presentations
BIG-TB: A Benchmark Dataset for Genomic Resistance Prediction and Interpretability in Mycobacterium tuberculosis
September 10, 2025 · Talk · Machine Learning for Computational Biology (MLCB) Workshop · New York Genome Center, New York City
This spotlight talk presented the BIG-TB dataset — a multimodal benchmark of ~17,000 M. tuberculosis isolates curated to advance antibiotic resistance prediction and model interpretability.
The presentation highlighted how integrating sequence, structural, and evolutionary features enables models to generalize across resistance mechanisms and better align with biological reality.
Key topics discussed:
- Dataset design principles and integration of WHO 2023 resistance catalogues
- Evaluation of sequence-based (ESM, CNN) vs structure-aware models
- Insights from causal variant discovery and explainability metrics (SHAP, Recall@k)
This work underscores the importance of interpretable, biologically grounded ML systems for global health and precision diagnostics.
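For readers unfamiliar with Recall@k as an explainability metric, the following is a minimal sketch of one common way to compute it, not the evaluation code used for BIG-TB. It assumes each variant already has a non-negative importance score (for example, a mean absolute SHAP value) and that a reference set of known resistance-associated variants is available. The function name `recall_at_k`, the variant labels, and the scores are all hypothetical.

```python
def recall_at_k(scores, known_variants, k):
    """Fraction of known variants found among the k highest-scored variants.

    scores: dict mapping variant label -> non-negative importance score
            (assumed precomputed, e.g. mean |SHAP| per variant).
    known_variants: iterable of labels treated as the ground truth.
    k: number of top-ranked variants to inspect.
    """
    known = set(known_variants)
    if k <= 0:
        raise ValueError("k must be positive")
    if not known:
        raise ValueError("known_variants must not be empty")
    # Rank variant labels from most to least important.
    ranked = sorted(scores, key=lambda v: scores[v], reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & known) / len(known)


# Illustrative, made-up importance scores for five placeholder variants.
example_scores = {
    "variant_a": 0.91,
    "variant_b": 0.75,
    "variant_c": 0.40,
    "variant_d": 0.10,
    "variant_e": 0.05,
}
example_known = {"variant_a", "variant_b", "variant_d"}

recall_top2 = recall_at_k(example_scores, example_known, k=2)
recall_top5 = recall_at_k(example_scores, example_known, k=5)
```

In this toy setup, two of the three reference variants rank in the top two, so `recall_top2` is 2/3, and widening to all five variants recovers every reference variant.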