Publications

You can also find my articles on my Google Scholar profile.

Journal Articles


BIG-TB: A Benchmark Dataset for Genomic Resistance Prediction and Interpretability in Mycobacterium tuberculosis

Manuscript in preparation, 2025

A unified 17K-isolate benchmark dataset for genotype-to-phenotype prediction across 11 WHO-priority antibiotics, integrating genomic, proteomic, and evolutionary modalities.

BIG-TB provides standardized train/test splits, harmonized variant annotation, and interpretability metrics for model comparison. It supports research into causal variant recovery and cross-drug generalization.

The Structural Context of Mutations in Proteins Predicts Their Effect on Antibiotic Resistance

Submitted to eLife, 2025

Protein structural context features yield state-of-the-art accuracy and interpretability for antibiotic resistance mutation prediction.

Status: Submitted to eLife. Preprint: bioRxiv 2025.09.23.676583 (2025)
Summary: Leverages residue-level structural descriptors (solvent accessibility, hydrogen-bonding networks, and ligand proximity) to explain and predict resistance across Mycobacterium tuberculosis drug targets. Structural context improves both AUROC and attribution faithfulness over sequence-only models.

Unveiling GPT-4V’s Hidden Challenges Behind High Accuracy on USMLE Questions

Published in Journal of Medical Internet Research, 2025

Analyzes GPT-4V performance on medical licensing questions, revealing systematic failure modes masked by headline accuracy.

Summary: Dissects GPT-4V responses to USMLE-style questions, cataloging error patterns in multimodal reasoning, visual grounding, and factual calibration. Provides actionable evaluation protocols for clinical AI deployment.

Conference Papers


Protein Structure-Informed Regularized Linear Model Outperforms ESM for Predicting Antibiotic Resistance

Presented at Program in Quantitative Genomics Conference (PQG), Harvard University, 2024

Poster highlighting a fused regularized linear model that integrates 3D structural features and surpasses ESM-based baselines for resistance prediction.

Summary: Poster presentation demonstrating how fused regularization over structural neighborhoods and biochemical feature groups yields stronger predictive accuracy than large protein language models on Mycobacterium tuberculosis resistance benchmarks.

Workshop Papers


Beyond Sequence-Only Models: Leveraging Structural Constraints for Antibiotic Resistance Prediction in Sparse Genomic Datasets

Published in ICLR 2025 MLGenX Workshop, 2025

Structural constraints paired with deep learning improve resistance prediction under extreme label sparsity.

Summary: Presents a structure-aware hybrid model that injects contact-map priors and residue-level constraints into sequence-based predictors to maintain interpretability with limited labeled isolates. Demonstrates improved accuracy across sparse genomic datasets for Mycobacterium tuberculosis.