BIG-TB: A Benchmark Dataset for Genomic Resistance Prediction and Interpretability

A multimodal benchmark of 17,000 Mycobacterium tuberculosis isolates for antibiotic resistance modeling and causal variant discovery.

BIG-TB is a large-scale, multimodal benchmark dataset designed to advance machine-learning prediction of antibiotic resistance in Mycobacterium tuberculosis and the biological interpretability of those predictions.

The dataset integrates genomic, protein, and structural features across 11 WHO-priority drugs and enables rigorous evaluation of both predictive accuracy and causal variant recovery.


🔬 Key Contributions


🧠 Methods

Models include:


🏆 Impact

Presented as a spotlight talk at the Machine Learning for Computational Biology (MLCB) Workshop 2025, co-located with NeurIPS.
BIG-TB serves as a foundation for evaluating biological faithfulness in ML systems and fostering explainable AI for global health.