Applications

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al., NeurIPS 2020. arXiv
Retrieval-Augmented Generation for Large Language Models: A Survey — Gao et al., 2023. arXiv

Sustainability & Green AI

Sustainable AI: Environmental Implications, Challenges, and Opportunities — Wu et al., 2021. arXiv
Characterizing Power Management Opportunities for LLMs in the Cloud — Patel et al., ASPLOS 2024.
Power-Aware Deep Learning Model Serving with μ-Serve — Qiu et al., USENIX ATC 2024.
Scaling AI Sustainably: An Uncharted Territory — Wu, USENIX ATC 2024 Keynote. Slides

AI-Assisted Engineering & Operations

LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models — Lu et al., ISSRE 2023.
R2E: Turning Any GitHub Repository into a Programming Agent Environment — Jain et al., 2024.
Exploring LLM-Based Agents for Root Cause Analysis — Roy et al., 2024. arXiv

AI Safety & Governance

SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models — Diao et al., 2024. arXiv
Fairness in AI and Its Long-Term Implications on Society — Bohdal et al., 2023. arXiv

Tool-Augmented LLMs & Benchmarking

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs — Basu et al., 2024.
TaskDiff: A Similarity Metric for Task-Oriented Conversations — Bhaumik et al., EMNLP 2023. ACL Anthology
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation — Guo et al., 2024. arXiv
StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of LLMs — Guo et al., ACL Findings 2024. ACL Anthology
RouterBench: A Benchmark for Multi-LLM Routing Systems — Hu et al., 2024. arXiv
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? — Jimenez et al., 2024. arXiv
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs — Li et al., 2023. arXiv
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities — Lu et al., 2024. arXiv
GAIA: A Benchmark for General AI Assistants — Mialon et al., 2023. arXiv
Towards LLM-Based Personal Agents in the Enterprise — Muthusamy et al., EMNLP Findings 2023. ACL Anthology
Gorilla: LLM Connected with Massive APIs — Patil et al., 2023. arXiv
GoEx: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications — Patil et al., 2024.
ToolLLM: Facilitating Large Language Models to Master 16,000+ Real-World APIs — Qin et al., 2023. arXiv
Do Multimodal Foundation Models Understand Enterprise Workflows? — Wornow et al., 2024. arXiv
TravelPlanner: A Benchmark for Real-World Planning with Language Agents — Xie et al., 2024.
ProAgent: From Robotic Process Automation to Agentic Process Automation — Ye et al., 2023. arXiv

Notes

Citations without direct links reference venue proceedings; see associated digital libraries for final versions.