Retrieval-Augmented Generation (RAG)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al., NeurIPS 2020. arXiv
- Retrieval-Augmented Generation for Large Language Models: A Survey — Gao et al., 2023. arXiv
Sustainability & Green AI
- Sustainable AI: Environmental Implications, Challenges, and Opportunities — Wu et al., 2021. arXiv
- Characterizing Power Management Opportunities for LLMs in the Cloud — Patel et al., ASPLOS 2024.
- Power-Aware Deep Learning Model Serving with μ-Serve — Qiu et al., USENIX ATC 2024.
- Scaling AI Sustainably: An Uncharted Territory — Wu, USENIX ATC 2024 Keynote. Slides
AI-Assisted Engineering & Operations
- LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models — Lu et al., ISSRE 2023.
- R2E: Turning Any GitHub Repository into a Programming Agent Environment — Jain et al., ICML 2024.
- Exploring LLM-Based Agents for Root Cause Analysis — Roy et al., 2024. arXiv
AI Safety & Governance
- SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models — Diao et al., 2024. arXiv
- Fairness in AI and Its Long-Term Implications on Society — Bohdal et al., 2023. arXiv
Tool Use, Agents & Benchmarks
- API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs — Basu et al., 2024.
- TaskDiff: A Similarity Metric for Task-Oriented Conversations — Bhaumik et al., EMNLP 2023. ACL Anthology
- API Pack: A Massive Multi-Programming Language Dataset for API Call Generation — Guo et al., 2024. arXiv
- StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of LLMs — Guo et al., ACL Findings 2024. ACL Anthology
- RouterBench: A Benchmark for Multi-LLM Routing Systems — Hu et al., 2024. arXiv
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? — Jimenez et al., 2024. arXiv
- API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs — Li et al., 2023. arXiv
- ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities — Lu et al., 2024. arXiv
- GAIA: A Benchmark for General AI Assistants — Mialon et al., 2023. arXiv
- Towards LLM-Based Personal Agents in the Enterprise — Muthusamy et al., EMNLP Findings 2023. ACL Anthology
- Gorilla: LLM Connected with Massive APIs — Patil et al., 2023. arXiv
- GoEx: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications — Patil et al., 2024.
- ToolLLM: Facilitating Large Language Models to Master 16,000+ Real-World APIs — Qin et al., 2023. arXiv
- Do Multimodal Foundation Models Understand Enterprise Workflows? — Wornow et al., 2024. arXiv
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents — Xie et al., 2024.
- ProAgent: From Robotic Process Automation to Agentic Process Automation — Ye et al., 2023. arXiv
Notes
- Citations without direct links reference venue proceedings; see associated digital libraries for final versions.