Research
We study how to make large-scale software systems more observable, performant, and maintainable. Our work spans the full software lifecycle — from development-time logging decisions to production-time performance monitoring and regression testing. Driven by extensive industrial collaborations with partners like Wind River, Ericsson, Alibaba, ERA Environmental, BlackBerry, and Mobeewave (Apple), we build practical tools backed by empirical evidence to support developers operating ultra-large-scale systems (ULSS).
Research Areas
Green AIWare & Software Energy Efficiency
Our Green AIWare program addresses the massive carbon footprint and energy demands of modern Artificial Intelligence. We study the energy efficiency of software that is either powered by AI or generated by AI:
- AI-Powered Software: We build multidimensional frameworks in collaboration with the Standard Performance Evaluation Corporation (SPEC) to measure runtime energy and battery consumption alongside user experience (responsiveness, satisfaction). We design guidelines and automated diagnostics tools to resolve runtime inefficiency root causes.
- AI-Generated Software: We evaluate code generated by commercial and free LLMs on different hardware (GPUs) using benchmarks such as EffiBench. We use prompting strategies (such as retrieval-augmented generation, RAG) and fine-tuning of models such as DeepSeekCoder and StarCoder to build energy-aware AI coding assistants.
- Energy Testing for AIWare: We generate realistic workload profiles by monitoring external user interactions and internal model inference, and develop automated co-generation and reinforcement learning pipelines to test AI-generated code.
Software Log Mining & Log Intelligence
Execution logs are the primary data source for monitoring and debugging ultra-large-scale systems. We are pioneers in software logging decisions, developing automated techniques to guide developers on where to log, what to log, and what log levels to select while balancing logging costs and performance overhead. We study logging anti-patterns (such as duplicate logs), investigate security concerns like privacy leakage in third-party Android logging libraries (Android Log Privacy), and optimize logging infrastructure through high-performance log parsing (TempoLo) and log compression.
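As a rough illustration of what log parsing involves, the sketch below turns raw log messages into templates by masking tokens that look like variable values. It is a minimal regex-based heuristic written for this page, not the TempoLo approach; the masking rules, function names, and sample messages are all assumptions.

```python
import re
from collections import Counter

# Hypothetical masking rules: patterns for tokens that look like variable values.
# Order matters: more specific patterns run before the generic integer pattern.
_VARIABLE_PATTERNS = [
    re.compile(r"\b\d+\.\d+\.\d+\.\d+(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),               # hexadecimal values
    re.compile(r"\b\d+\b"),                          # plain integers
]


def to_template(message: str) -> str:
    """Replace variable-looking tokens with the placeholder <*>."""
    for pattern in _VARIABLE_PATTERNS:
        message = pattern.sub("<*>", message)
    return message


def count_templates(messages):
    """Group messages by template and count how often each template occurs."""
    return Counter(to_template(m) for m in messages)


# Illustrative log lines (made up for this example).
logs = [
    "Connection from 10.0.0.5:443 closed after 120 ms",
    "Connection from 10.0.0.9:8080 closed after 87 ms",
    "Cache miss for key 0x1f3a",
]

template_counts = count_templates(logs)
```

Grouping by template is what makes downstream tasks such as anomaly detection or compression tractable, since millions of raw lines typically collapse into a small set of event types.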
Performance Assurance for Ultra-Large-Scale Systems
Assuring software performance is highly resource-intensive. We propose automated techniques to detect performance issues and regressions, analyze performance-issue-inducing changes, and optimize system configurations and cache parameters. We also optimize performance testing pipelines: we identify discrepancies between performance test workloads and actual user behavior, generate realistic test suites, and leverage existing functional tests to simulate end-to-end production performance profiles. Together, these techniques reduce testing costs for flagship software systems (PMT, Perf-JIT-Models).
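One simple way to flag a possible performance regression is to compare response-time samples from a baseline build and a candidate build with a non-parametric effect size. The sketch below uses Cliff's delta for this purpose. It is an illustrative example only, not the method used in PMT or Perf-JIT-Models; the function names, the sample data, and the 0.474 cutoff (a commonly cited convention for a "large" effect) are assumptions.

```python
def cliffs_delta(baseline, candidate):
    """Cliff's delta in [-1, 1]: positive values mean candidate tends to be larger."""
    if not baseline or not candidate:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for c in candidate for b in baseline if c > b)
    smaller = sum(1 for c in candidate for b in baseline if c < b)
    return (greater - smaller) / (len(baseline) * len(candidate))


def is_regression(baseline, candidate, threshold=0.474):
    """Flag a regression when candidate response times are larger by a large effect.

    The threshold is an assumed convention for a "large" effect, not a value
    taken from the projects described above.
    """
    return cliffs_delta(baseline, candidate) >= threshold


# Illustrative response times in milliseconds (made up for this example).
baseline_ms = [100, 102, 98, 101, 99]
candidate_ms = [120, 118, 125, 119, 121]

delta = cliffs_delta(baseline_ms, candidate_ms)
flagged = is_regression(baseline_ms, candidate_ms)
```

An effect size is used here rather than a mean comparison because performance measurements are often skewed and noisy, and a rank-based measure is less sensitive to outliers.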
DevOps Quality Assurance & Test Optimization
Testing represents up to 50% of the software development lifecycle cost and becomes a major bottleneck during rapid DevOps release cycles. In close collaboration with Ericsson, we develop machine-learning-based techniques that optimize testing pipelines by selecting and prioritizing the tests most likely to fail and by guiding test batching and bisection. In collaboration with ERA Environmental, we also address flakiness in user-interface testing (such as web GUI testing) by automatically capturing and analyzing fine-grained system states during test steps (LongTermUITest, NFBugsExtended).
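To make the idea of failure-based test prioritization concrete, the sketch below orders tests by a smoothed historical failure rate. This is a deliberately simple heuristic standing in for the machine-learning models mentioned above, not the technique developed with Ericsson; the `History` type, the smoothing choice, and the test names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class History:
    """Hypothetical record of how often a test ran and failed in past builds."""
    name: str
    runs: int
    failures: int


def failure_score(history: History) -> float:
    """Laplace-smoothed failure rate, so tests with no history score 0.5."""
    return (history.failures + 1) / (history.runs + 2)


def prioritize(histories):
    """Return tests ordered from most to least likely to fail."""
    return sorted(histories, key=failure_score, reverse=True)


# Illustrative test histories (made up for this example).
histories = [
    History("test_login", runs=100, failures=2),
    History("test_checkout", runs=50, failures=10),
    History("test_new_feature", runs=0, failures=0),
]

ordered = [h.name for h in prioritize(histories)]
```

The smoothing places never-run tests ahead of tests with a long passing record, which reflects the assumption that new tests are more likely to reveal failures than stable ones.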
API Usability & Evolution
Modern software systems rely heavily on third-party application programming interfaces (APIs), but rapid API evolution creates a major maintenance burden. We develop automated tools that help developers migrate source code to new API versions (A3). We also study "API workarounds": temporary fixes or bypasses that API clients write when an API fails to meet their needs. We published the first literature survey on API workarounds in ACM Computing Surveys, and we have built automated tools that extract workarounds (API-Workarounds) to help API developers improve their interface design.
AI and LLMs for Software Engineering
We investigate how AI models and Large Language Models (LLMs) can be applied to software engineering tasks, such as detecting inconsistencies between software requirements and their implementation in safety-critical automotive systems. We also study the security and compliance implications of using AI coding assistants, particularly memorization and training data leakage in Code LLMs (LLM Memorization).
Research Sponsors
Our research is generously supported by government funding agencies, industry partners, and academic institutions.