Research:
My interest lies in practical machine learning methods and their theoretical analysis, spanning areas such as Out-of-Distribution Generalization, Trustworthy Machine Learning, and Generative Models. Specifically, I am interested in developing methods that enhance models' capacity to learn predictive representations. Moreover, I strongly believe that the potential of large deep learning models is yet to be fully explored. My ultimate goal is to create a unified, robust, and generalizable machine learning method that is cost-efficient and accessible to everyone.
In this paper, we focus on a realistic yet challenging task, Single Domain Generalization Object Detection (S-DGOD), where only one source domain's data can be used for training object detectors, which must then generalize to multiple distinct target domains. Due to the task's complexity, S-DGOD demands both high-capacity fitting and strong generalization. Differentiable Neural Architecture Search (NAS) is known for its high capacity for fitting complex data, and we propose to leverage Differentiable NAS to solve S-DGOD. However, it may suffer severe over-fitting due to the feature imbalance phenomenon: parameters optimized by gradient descent are biased toward easy-to-learn features, which are usually non-causal and spuriously correlated with the ground-truth labels, such as background features in object detection data. This leads to serious performance degradation, especially when generalizing to unseen target domains with large domain gaps from the source domain. To address this issue, we propose the Generalizable loss (G-loss), an OoD-aware objective that prevents NAS from over-fitting by using gradient descent to optimize parameters not only on a subset of easy-to-learn features but also on the remaining predictive features; the overall framework is named G-NAS. Experimental results on S-DGOD urban-scene datasets demonstrate that the proposed G-NAS achieves state-of-the-art performance compared to baseline methods.
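The abstract does not give the exact form of G-loss, so the following numpy sketch only illustrates the underlying idea: rebalancing the gradient signal so that optimization does not concentrate on the easy-to-learn features alone. The function name, the `easy_mask` split, and the fixed weighting scheme are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def balanced_loss_sketch(per_feature_losses, easy_mask, alpha=0.5):
    """Hypothetical sketch of an OoD-aware objective: average the loss over
    easy-to-learn features and over the remaining predictive features
    separately, then mix them so neither group dominates the gradient."""
    easy = per_feature_losses[easy_mask].mean()       # easy-to-learn group
    rest = per_feature_losses[~easy_mask].mean()      # remaining predictive group
    # alpha upweights the remaining features to counter feature imbalance
    return (1.0 - alpha) * easy + alpha * rest
```

With `alpha = 0.5` the two feature groups contribute equally regardless of group size, which is the simplest way to keep gradient descent from being dominated by the (typically more numerous and lower-loss) easy features.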
ICLR
Object Detection with OOD Generalizable Neural Architecture Search
Fan Wu, Kaican Li, Jinling Gao, and
5 more authors
To improve Out-of-Distribution (OOD) generalization in object detection, we present a Neural Architecture Search (NAS) framework guided by feature orthogonalization. We believe that the failure to generalize on OOD data stems from spurious correlations between category-related features and context-related features. Category-related features describe causal information for predicting the target objects, such as “a car with four wheels”, while context-related features describe non-causal information, such as “a car driving at night”. However, due to the distinct data distributions of the training and testing sets, context-related features are often mistaken for causal information. To address this, we aim to automatically discover an optimal architecture that disentangles the category-related and context-related features with a novel weight-based detector head. Both theoretical and experimental results show that the proposed scheme achieves disentanglement and better performance on both IID and OOD data.
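As a rough illustration of feature orthogonalization (not the paper's weight-based detector head, whose details are not given in this abstract), a disentanglement objective can penalize overlap between two sets of projection weights. The penalty below is zero exactly when the row spaces of the two weight matrices are orthogonal; all names here are hypothetical.

```python
import numpy as np

def orthogonality_penalty(W_cat, W_ctx):
    """Hypothetical disentanglement penalty: squared Frobenius norm of the
    cross-Gram matrix W_cat @ W_ctx.T. It vanishes iff every category-related
    projection direction is orthogonal to every context-related one."""
    return float(np.sum((W_cat @ W_ctx.T) ** 2))
```

Adding such a term to the training loss pushes the category-related and context-related heads to read out non-overlapping directions of the shared feature space, which is one common way to encourage the kind of disentanglement described above.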
Journal
AGNet: Automatic generation network for skin imaging reports
Fan Wu, Haiqiong Yang, Linlin Peng, and
5 more authors
Medical imaging has been increasingly adopted in medical diagnosis, especially for skin diseases, where diagnoses based on skin pathology are extremely accurate. The diagnostic reports of skin pathology images have the distinguishing features of extreme repetitiveness and rigid formatting. However, reports written by inexperienced radiologists and pathologists can have a high error rate, and even experienced clinicians find the reporting task tedious and time-consuming. To address this challenge, this paper studies the automatic generation of diagnostic reports from images of skin pathologies. We propose a novel deep learning-based image caption framework named the automatic generation network (AGNet), an effective network for the automatic generation of skin imaging reports. The proposed AGNet consists of four parts: (1) the image model, which extracts features and classifies images; (2) the language model, which encodes data and generates words in comprehensible language; (3) the attention module, which connects the “tail” of the image model to the “head” of the language model and computes the relationship between images and captions; and (4) the embedding and labeling module, which processes the input caption data. In a case study, AGNet is verified on a skin pathological image dataset and compared with several state-of-the-art models. The results show that AGNet achieves the highest image-captioning evaluation scores among all comparison models, demonstrating the promising performance of the proposed method.
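To illustrate the role of the attention module that bridges the image model's “tail” and the language model's “head”, here is a minimal scaled dot-product attention sketch in numpy. AGNet's actual attention mechanism is not specified in this abstract, so every name and shape below is an assumption made for illustration.

```python
import numpy as np

def attention_bridge(image_feats, word_query):
    """Hypothetical sketch: attend over image region features given the
    language model's current query vector, returning a weighted image
    context to condition the next generated word on."""
    d = word_query.shape[0]
    scores = image_feats @ word_query / np.sqrt(d)   # one score per region
    weights = np.exp(scores - scores.max())          # numerically stable softmax
    weights /= weights.sum()
    return weights @ image_feats                     # attended image context
```

At each decoding step the language model would supply a fresh query, so the generated caption can focus on different image regions for different words.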