Wei Gao

I am an Associate Professor in the Department of Electrical and Computer Engineering and an affiliated faculty member of the McGowan Institute for Regenerative Medicine at the University of Pittsburgh. I direct the Pitt Intelligent System Laboratory (ISL).

My research lies at the intersection of AI and systems. It focuses on the design and deployment of on-device AI architectures and algorithms on mobile, embedded, and networked systems. I have a strong interest in unveiling the analytical principles underlying practical AI deployment problems, and in designing systems based on these principles. The resulting AI and system solutions apply to a wide range of practical applications, including smart health, the Internet of Things, and edge computing. My research areas mainly include the following:

Note: I am currently looking for strong and self-motivated students with an interest in the above areas. If you are interested in my research and in working with me, please send me an email with your CV and transcripts. You can find more details about my research here, and a list of my recent publications here.

 

Professional Services

 

Research Highlights

November 2024: Text-to-video (T2V) generative AI could revolutionize many current and emerging application and industry domains. However, the capabilities of today's T2V generative models are largely data-dependent: while they perform well in domains covered by the training data, they usually fail to obey real-world common knowledge and physical rules when given out-of-distribution prompts. Expanding a model's capabilities, on the other hand, relies on large amounts of real-world data and is hence not scalable. Our recent work aims to address this data-dependency limitation by fully unleashing current T2V models' potential in scene generation given proper and detailed prompts. Our approach, namely PhyT2V, is a training-free technique that leverages an LLM's capabilities of chain-of-thought and step-back reasoning in the language domain to logically identify the deficiencies of generated videos, and iteratively refines the current T2V model's video generation by correcting those deficiencies with more precise and well-articulated prompts. Check our preprint here. We have also released a Discord Bot that allows you to try our work with SOTA T2V models.
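The refinement loop described above can be sketched as follows. This is a minimal illustration, not the released implementation: `generate_video`, `analyze_mismatch`, and `refine_prompt` are hypothetical stand-ins for the T2V model and the LLM's chain-of-thought / step-back reasoning steps.

```python
def phyt2v_refine(prompt, generate_video, analyze_mismatch, refine_prompt,
                  max_rounds=3):
    """Iteratively refine a T2V prompt until the LLM finds no remaining
    physical-rule deficiency or the round budget is exhausted."""
    history = [prompt]
    for _ in range(max_rounds):
        video = generate_video(prompt)
        # Step 1: the LLM reasons (chain-of-thought) about how the video
        # violates real-world physics or common knowledge.
        deficiency = analyze_mismatch(prompt, video)
        if not deficiency:            # no violation found: stop early
            break
        # Step 2: the LLM rewrites the prompt to correct that deficiency.
        prompt = refine_prompt(prompt, deficiency)
        history.append(prompt)
    return prompt, history

# Toy stand-ins for the T2V model and LLM, just to exercise the loop:
gen = lambda p: "video of " + p
analyze = lambda p, v: "" if "downward" in p else "the leaf floats upward"
refine = lambda p, d: p + ", drifting slowly downward"
final_prompt, rounds = phyt2v_refine("a leaf falls from a tree",
                                     gen, analyze, refine)
```

Because the loop is training-free, the same skeleton works with any off-the-shelf T2V model and LLM pair.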
 
August 2024: Our paper, Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices, has been accepted for publication at the 2024 ACM International Conference on Mobile Computing and Networking (MobiCom). It uses the heterogeneous processors on mobile devices to accelerate computationally expensive image super-resolution, while minimizing the impact of AI accelerators' reduced arithmetic precision on the images' perceptual quality. Check our paper here.
 
May 2024: Unlike model compression, which requires expensive retraining, sparse activation can effectively reduce a neural network model's inference cost at runtime without any prior retraining or adaptation. Although sparse activation has been proven effective on Large Language Models (LLMs), which are usually redundant (e.g., the OPT and BLOOMZ models), its applicability to recent Small Language Models (SLMs) with higher parameter efficiency remains questionable. Our recent work verified this possibility, from both analytical and experimental perspectives, by using gradient-based attribution scores to evaluate neurons' importance in inference. Our results show that we can achieve up to 80% sparsity in major SLMs, including Phi-1.5/2 and MobiLlama-0.5B/1B, with less than 5% model accuracy loss on QA tasks. Check our preprint and source codes.
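The core idea of gradient-based attribution can be sketched in a few lines. This is an illustrative simplification, assuming per-neuron activations and the gradients of the loss with respect to them are already available from one backward pass; the function name and threshold rule are not from the paper.

```python
import numpy as np

def sparse_activation_mask(activations, gradients, sparsity=0.8):
    """Rank neurons by |gradient x activation| (a first-order estimate of
    each neuron's contribution to the output) and mask out the
    lowest-scoring fraction, so only important neurons stay active."""
    scores = np.abs(activations * gradients)         # attribution score
    k = int(len(scores) * sparsity)                  # neurons to drop
    threshold = np.partition(scores, k)[k] if k < len(scores) else np.inf
    return scores >= threshold                       # True = keep active

acts = np.array([0.1, 2.0, 0.05, 1.5, 0.3])
grads = np.array([0.2, 0.1, 0.9, 0.4, 0.01])
mask = sparse_activation_mask(acts, grads, sparsity=0.6)
```

Note that magnitude-based pruning would rank these neurons differently: the |gradient x activation| score keeps neurons whose activations actually influence the loss, not just the largest ones.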
 
May 2024: Illegally using fine-tuned diffusion models to forge human portraits has been a major threat to trustworthy AI. While most existing work focuses on detecting AI-forged content, our recent work instead aims to mitigate such illegal domain adaptation by applying safeguards to diffusion models. Unlike model unlearning techniques, which cannot prevent the illegal domain knowledge from being relearned with custom or public data, our approach, namely FreezeAsGuard, has the model publisher selectively freeze tensors in pre-trained models that are critical to the convergence of fine-tuning in illegal domains. FreezeAsGuard can effectively reduce the quality of images generated in illegal domains and ensure that these images are unrecognizable as target objects. Meanwhile, it has minimal impact on legal domain adaptation, and can save up to 48% GPU memory and 21% wall-clock time in model fine-tuning. Check our preprint and source codes.
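The selection step can be sketched as below. This is a hypothetical simplification: the per-tensor importance scores for the illegal domain are assumed to be given (in practice they would be derived from the fine-tuning convergence behavior), and the tensor names and budget rule are illustrative.

```python
def select_frozen_tensors(illegal_importance, budget_fraction=0.3):
    """Freeze the tensors most critical to the convergence of fine-tuning
    in the illegal domain, up to a freezing budget; all other tensors stay
    trainable, which preserves legal-domain adaptation."""
    ranked = sorted(illegal_importance, key=illegal_importance.get,
                    reverse=True)                   # most critical first
    n_freeze = int(len(ranked) * budget_fraction)
    return set(ranked[:n_freeze])                   # names to freeze

# Hypothetical tensor names with illegal-domain importance scores:
scores = {"unet.down.0": 0.9, "unet.mid": 0.7, "unet.up.1": 0.2,
          "text_enc.0": 0.1, "unet.up.0": 0.05}
frozen = select_frozen_tensors(scores, budget_fraction=0.4)
```

In a real fine-tuning framework, freezing would amount to disabling gradient updates for the selected tensors, which is also what yields the reported GPU memory and wall-clock savings.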
 
April 2024: We are pleased to release our recently completed Generative AI Tutorial and Roadmap, a detailed learning guide for generative AI research that includes a curated list of state-of-the-art research articles, projects, open-source code repositories, and other related research resources. This tutorial covers a wide range of well-studied, currently popular, and emerging research topics in the broadly defined area of generative AI, and can be used both as learning material for entry-level graduate students and as a reference document for other researchers. We will also continuously update this tutorial with new content and topics. Please stay tuned!
 
January 2024: We are happy to publish our dataset of human airway measurements, produced by our integrated AI and sensing system for smart pulmonary telemedicine, namely Acoustic Waveform Respiratory Evaluation (AWARE). This dataset contains airway measurements of 382 human subjects, including patients with various pulmonary diseases and healthy control subjects, recruited from the Children's Hospital of Pittsburgh during the past 3 years. The dataset includes raw WAV files from acoustic sensing, segmented and aligned acoustic signal pulses, and processed measurements of airway cross-sectional areas. More details can be found on the dataset page, and source codes for using the dataset can be found here. To the best of our knowledge, this is the first public dataset of human airway measurements covering pulmonary diseases, and we welcome any feedback from the smart health research community.
 
January 2024: Our paper, Towards Green AI in Fine-Tuning Large Language Models via Adaptive Backpropagation, has been accepted for publication at the 2024 International Conference on Learning Representations (ICLR). It focuses on computationally efficient LLM fine-tuning, reducing the FLOPs of LLM fine-tuning by 30% and improving model accuracy by 4% compared to LoRA. Check our paper here.
 
December 2023: The preprint of our recent work on runtime modality adaptation for embodied AI has been made publicly available on arXiv. This is the first work that allows multimodal LLMs to elastically switch between input data modalities at runtime, for embodied AI applications such as autonomous navigation. Our basic technical approach is to use fully trainable projectors to adaptively connect the unimodal data encoders in use to a flexible set of the last LLM blocks. In this way, we can flexibly adjust the number of connected LLM blocks to balance accuracy against runtime fine-tuning cost, and optimize the efficiency of cross-modal interaction by controlling the amount of information injected in each connection. Our implementation on the NVIDIA Jetson AGX Orin demonstrates short modality adaptation delays of a few minutes with mainstream LLMs, a 3.7x reduction in fine-tuning FLOPs, and a 4% accuracy improvement on multimodal QA tasks. Check the source codes of our work here.
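The projector idea can be illustrated with a toy sketch. All dimensions, the additive-injection scheme, and the function name below are assumptions for illustration only, not the actual architecture from the preprint.

```python
import numpy as np

rng = np.random.default_rng(0)
enc_dim, llm_dim, total_blocks = 512, 1024, 32   # illustrative sizes

def connect_modality(enc_feat, k):
    """Project one unimodal encoder feature into the last k of the LLM's
    blocks via per-block trainable linear projectors; a larger k injects
    more cross-modal information at higher fine-tuning cost."""
    projectors = [rng.standard_normal((enc_dim, llm_dim)) * 0.02
                  for _ in range(k)]             # one trainable map per block
    # Map block index -> hidden state injected into that block's input.
    return {total_blocks - k + i: enc_feat @ projectors[i]
            for i in range(k)}

feat = rng.standard_normal(enc_dim)              # one encoder output
injections = connect_modality(feat, k=4)         # connect last 4 blocks
```

Switching modalities at runtime then only requires retraining these small projectors, not the encoders or the LLM itself.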
 
October 2023: Since 2020, our integrated AI and sensing system for smart pulmonary telemedicine, namely PTEase, has been applied to and tested on more than 400 patients with various pulmonary diseases at the Children's Hospital of Pittsburgh, for remote disease diagnosis, evaluation, and management, and has particularly helped families in various minority groups with limited access to healthcare. Please see the table for the demographics of the involved human subjects. By applying on-device AI models and ML algorithms to patient data collected on smartphones, we achieve 75% accuracy in diagnosing pulmonary diseases across diverse patient groups and 11%-15% error in estimating lung function indices. Check our recent paper for details. We are currently working to commercialize these techniques and to conduct clinical studies on larger populations, by applying for FDA 510(k) clearance for medical device use and licensing our techniques to both US and international companies. Check this website for the latest news and this page for more details on our research on integrated AI and sensing systems for smart health.
 
September 2023: Two preprints of our recent work on on-device AI have been made publicly available on arXiv. The first work, Towards Green AI in Fine-Tuning Large Language Models via Adaptive Backpropagation, extends our prior work ElasticTrainer (published at ACM MobiSys 2023) to Large Language Models (LLMs) and facilitates computationally efficient LLM fine-tuning towards green AI. It can reduce fine-tuning FLOPs by an extra 30% compared to existing techniques such as LoRA, without noticeable accuracy loss; with the same amount of FLOPs reduction, it provides a 4% model accuracy improvement over LoRA. The second work, Tackling the Unlimited Staleness in Federated Learning with Intertwined Data and Device Heterogeneities, targets a federated learning (FL) problem motivated by practical system settings, where data samples in certain classes or with particular features may only be produced by some slow clients. Our work leverages gradient inversion to remove the staleness of model updates from these slow clients, and can improve the trained model's accuracy by 20% and speed up training progress by 35% compared to existing techniques in asynchronous FL. The source codes are publicly available at the Pitt ISL webpage.
 
April 2023: Our paper, ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection, has been accepted for publication at the 2023 ACM Conference on Mobile Systems, Applications, and Services (MobiSys). This paper presents the first on-device AI technique that achieves full elasticity of on-device training on resource-constrained mobile and embedded devices. By leveraging the principles of eXplainable AI (XAI) to evaluate the importance of different tensors in training, we allow fully flexible adaptation of the trainable portion of the neural network at runtime, according to the current training needs and online data patterns, to minimize the training cost without accuracy loss. Check our paper and source codes for more details.
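A simplified sketch of the selection step: given each tensor's XAI-derived importance to the current training objective and its backward-pass cost, pick the trainable set that fits a per-round compute budget. The greedy importance-per-cost rule and the layer names below are illustrative stand-ins, not the paper's exact algorithm.

```python
def select_trainable(tensors, budget):
    """tensors: list of (name, importance, backward_cost) tuples.
    Greedily pick tensors with the best importance-per-cost ratio until
    the per-round compute budget is spent; only these get gradients."""
    chosen, spent = [], 0.0
    for name, imp, cost in sorted(tensors, key=lambda t: t[1] / t[2],
                                  reverse=True):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen

# Hypothetical per-tensor (importance, cost) profile for one round:
layers = [("fc", 0.9, 1.0), ("conv5", 0.8, 2.0), ("conv4", 0.3, 2.0),
          ("conv1", 0.1, 4.0)]
picked = select_trainable(layers, budget=3.0)
```

Re-running this selection as importance scores drift during training is what makes the trainable portion "elastic" at runtime.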
 
April 2023: Our paper, PTEase: Objective Airway Examination for Pulmonary Telemedicine using Commodity Smartphones, has been accepted for publication at the 2023 ACM Conference on Mobile Systems, Applications, and Services (MobiSys). This is the first mobile health system that turns a commodity smartphone into a fully functional pulmonary examination device to measure the internal physiological conditions of human airways, such as airway caliber, obstruction and possible inflammation. Information about these airway conditions could provide vital clues for precise and objective pulmonary disease evaluation. Check our paper for more details.
 
Oct 2022: Our paper, AiFi: AI-Enabled WiFi Interference Cancellation with Commodity PHY-Layer Information, has been accepted for publication at the 2022 ACM Conference on Embedded Networked Sensor Systems (SenSys). This is the first work that applies on-device AI techniques to interference cancellation in WiFi networks, enabling generalizable interference cancellation on commodity WiFi devices without any extra RF hardware. By using neural network models to mimic the WiFi network's PHY-layer operations, AiFi applies generally to different types of interference signals, ranging from concurrent WiFi transmissions and ZigBee/Bluetooth to wireless baby monitors and even microwave ovens, and improves the MAC-layer frame reception rate by 18x. Check our paper for more details.
 
Aug 2022: Our paper, Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI, has been accepted for publication at the 2022 ACM Int'l Conference on Mobile Computing and Networking (MobiCom). This is the first work that achieves real-time inference (<20ms) of mainstream neural network models (e.g., ImageNet-scale models) on extremely weak MCUs (e.g., the STM32 series with <1MB of memory), without impairing inference accuracy. Using eXplainable AI (XAI) techniques allows a >6x improvement in feature compressibility during offloading and a >8x reduction in the local device's resource consumption. Check our paper and source codes for more details.
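The intuition behind XAI-guided offloading can be sketched as follows: features the model deems important stay on the device, while the less important remainder is aggressively quantized before offloading, which is what makes the transferred features highly compressible. The split ratio, bit width, and function name are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def split_and_compress(features, importance, keep_ratio=0.2, bits=2):
    """Keep the top keep_ratio most important features locally; uniformly
    quantize the rest to `bits` bits for offloading to the server."""
    k = max(1, int(len(features) * keep_ratio))
    order = np.argsort(importance)[::-1]          # most important first
    local_idx, remote_idx = order[:k], order[k:]
    remote = features[remote_idx]
    levels = 2 ** bits - 1                        # e.g., 2 bits -> 4 levels
    lo, hi = remote.min(), remote.max()
    quantized = np.round((remote - lo) / (hi - lo + 1e-9) * levels)
    # Return local features, compressed remote features, and the scale
    # needed for dequantization on the server side.
    return features[local_idx], quantized.astype(np.uint8), (lo, hi)

feats = np.linspace(-1.0, 1.0, 10)                # toy feature vector
imp = np.arange(10, dtype=float)                  # toy importance scores
local, remote_q, scale = split_and_compress(feats, imp)
```

Because low-importance features tolerate coarse quantization, the offloaded payload shrinks sharply without hurting the final prediction.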
 
May 2022: Our paper, TransFi: Emulating Custom Wireless Physical Layer from Commodity WiFi, has been accepted for publication at the 2022 ACM Int'l Conference on Mobile Systems, Applications and Services (MobiSys). This is the first work that realizes fine-grained signal emulation and allows commodity WiFi devices to emulate custom wireless physical layer, including but not limited to, custom PHY-layer preambles and new ways of agile spectrum usage. It could also improve the performance of cross-technology communication and many other wireless applications by up to 50x, enabling high-speed data communication on par with commodity WiFi! Watch the teaser video for details.
 
January 2022: Our paper, RAScatter: Achieving Energy-Efficient Backscatter Readers via AI-Assisted Power Adaptation, has been accepted for publication at the 2022 ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI).
 
January 2022: Our paper, FaceListener: Recognizing Human Facial Expressions via Acoustic Sensing on Commodity Smartphones, has been accepted for publication at the 2022 ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN).
   

November 2021: Our paper, Eavesdropping User Credentials via GPU Side Channels on Smartphones, has been accepted for publication at the 2022 ACM Int'l Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). This is one of the few works that demonstrate critical security vulnerabilities of mainstream smartphone GPUs (the Qualcomm Adreno GPU on Snapdragon SoCs), which allow an unprivileged attacker to eavesdrop on a user's sensitive credentials, such as app usernames and passwords. The attack has been acknowledged by Google and addressed in its subsequent Android security updates. Watch our demo video below for details.

   

August 2020: Our paper, SpiroSonic: Monitoring Human Lung Function via Acoustic Sensing on Commodity Smartphones, has been accepted for publication at the 2020 International Conference on Mobile Computing and Networking (MobiCom). This is the first work that allows commodity smartphones to be used as portable spirometers and to provide accurate lung function test results on par with clinical-grade spirometers. This is a collaborative work with the Children's Hospital of Pittsburgh, and it could also potentially contribute to in-home evaluation of COVID-19 risks by allowing convenient out-of-clinic lung function evaluation. Watch our presentation video for details.

 
May 2020: We have been exploiting the power of modern mobile computing technologies to fight COVID-19. Our new project on using commodity smartphones for early-stage COVID-19 diagnosis has been funded by the NSF, and was reported by several news media internationally. [WGN TV, Daily Mail, News Medical, Medical Express, Pittsburgh Post-Gazette]

More research highlights...