Benchmarking Interpretability in Healthcare using Pattern Discovery and Disentanglement
Published in EMBS BHI 2024, 2024
The healthcare industry seeks to integrate AI into clinical applications, yet understanding AI decision-making remains a challenge for healthcare practitioners because these systems often function as black boxes. Our work benchmarks the Pattern Discovery and Disentanglement (PDD) system’s unsupervised learning algorithm, which produces interpretable outputs and clustering results from clinical notes to aid decision-making. Using the MIMIC-IV dataset, we process free-text clinical notes and ICD-9 codes with Term Frequency-Inverse Document Frequency (TF-IDF) and topic modeling. The PDD algorithm discretizes numerical features into event-based features, discovers association patterns from a disentangled statistical feature-value association space, and clusters records. The output is an interpretable knowledge base linking knowledge, patterns, and data to support decision-making. Despite being unsupervised, PDD demonstrated performance comparable to supervised deep learning models, validating its clustering ability and knowledge representation. We benchmark interpretability techniques (Feature Permutation, Gradient SHAP, and Integrated Gradients) on the best-performing models (selected by F1, ROC AUC, balanced accuracy, and related metrics), evaluating them with sufficiency, comprehensiveness, and sensitivity. Our findings highlight the limitations of feature-importance ranking and post-hoc analysis for clinical diagnosis. In contrast, PDD’s global interpretability compensates for these limitations, helping healthcare practitioners understand the decision-making process and providing suggestive disease clusters to assist their diagnosis.
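
As a rough illustration of the preprocessing step described above (not the paper's exact pipeline), the sketch below vectorizes a few toy clinical-note snippets with TF-IDF and fits a topic model on the result. The vectorizer settings, the choice of NMF as the topic model, and the number of topics are assumptions made only for illustration.

```python
# Hedged sketch: TF-IDF vectorization of free-text notes followed by a topic model.
# The toy notes, vectorizer settings, and NMF topic model are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

notes = [
    "patient admitted with chest pain and shortness of breath",
    "history of type 2 diabetes mellitus, prescribed metformin",
    "acute kidney injury, elevated creatinine, started on fluids",
]

# Term Frequency-Inverse Document Frequency representation of the notes
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
tfidf = vectorizer.fit_transform(notes)

# Topic modeling on the TF-IDF matrix; each note becomes a low-dimensional
# topic-weight vector that downstream clustering or classification can use
topic_model = NMF(n_components=2, init="nndsvda", random_state=0)
note_topics = topic_model.fit_transform(tfidf)   # shape: (n_notes, n_topics)

print(note_topics.shape)
```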
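
Similarly, a minimal sketch of how the three post-hoc attribution methods can be applied and scored with sufficiency and comprehensiveness, using Captum on a stand-in classifier. The model architecture, synthetic inputs, and the top-k formulation of the metrics are assumptions for illustration, not the paper's evaluation code.

```python
# Hedged sketch: Feature Permutation, Gradient SHAP, and Integrated Gradients
# attributions, scored with simple top-k sufficiency/comprehensiveness metrics.
import torch
import torch.nn as nn
from captum.attr import FeaturePermutation, GradientShap, IntegratedGradients

torch.manual_seed(0)
n_features, n_classes = 50, 4

# Stand-in classifier over TF-IDF/topic features (not the paper's model)
model = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_classes))
model.eval()

x = torch.randn(8, n_features)          # synthetic batch of feature vectors
baselines = torch.zeros_like(x)         # zero baseline for gradient-based methods
target = model(x).argmax(dim=1)         # explain each record's predicted class

attributions = {
    "feature_permutation": FeaturePermutation(model).attribute(x, target=target),
    "gradient_shap": GradientShap(model).attribute(x, baselines=baselines, target=target),
    "integrated_gradients": IntegratedGradients(model).attribute(x, baselines=baselines, target=target),
}

def prob_of_target(inputs):
    # Probability assigned to the originally predicted class
    return torch.softmax(model(inputs), dim=1).gather(1, target.unsqueeze(1)).squeeze(1)

def sufficiency_and_comprehensiveness(attr, k=10):
    # Sufficiency: keep only the top-k attributed features; a small probability
    # drop means those features alone suffice for the prediction.
    # Comprehensiveness: remove the top-k features; a large drop means the
    # explanation covered the features the model actually relied on.
    topk = attr.abs().topk(k, dim=1).indices
    keep_mask = torch.zeros_like(x).scatter_(1, topk, 1.0)
    full = prob_of_target(x)
    sufficiency = (full - prob_of_target(x * keep_mask)).mean().item()
    comprehensiveness = (full - prob_of_target(x * (1 - keep_mask))).mean().item()
    return sufficiency, comprehensiveness

for name, attr in attributions.items():
    print(name, sufficiency_and_comprehensiveness(attr))
```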