Applications

Retrieval-Augmented Generation (RAG)

  • Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al., NeurIPS 2020. arXiv
  • Retrieval-Augmented Generation for Large Language Models: A Survey — Gao et al., 2023. arXiv

Sustainability & Green AI

  • Sustainable AI: Environmental Implications, Challenges, and Opportunities — Wu et al., 2021. arXiv
  • Characterizing Power Management Opportunities for LLMs in the Cloud — Patel et al., ASPLOS 2024.
  • Power-Aware Deep Learning Model Serving with μ-Serve — Qiu et al., USENIX ATC 2024.
  • Scaling AI Sustainably: An Uncharted Territory — Wu, USENIX ATC 2024 Keynote. Slides

AI-Assisted Engineering & Operations

  • LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models — Lu et al., ISSRE 2023.
  • R2E: Turning Any GitHub Repository into a Programming Agent Environment — Jain et al., 2024.
  • Exploring LLM-Based Agents for Root Cause Analysis — Roy et al., 2024. arXiv

AI Safety & Governance

  • SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models — Diao et al., 2024. arXiv
  • Fairness in AI and Its Long-Term Implications on Society — Bohdal et al., 2023. arXiv

Tool-Augmented LLMs & Benchmarking

  • API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs — Basu et al., 2024.
  • TaskDiff: A Similarity Metric for Task-Oriented Conversations — Bhaumik et al., EMNLP 2023. ACL Anthology
  • API Pack: A Massive Multi-Programming Language Dataset for API Call Generation — Guo et al., 2024. arXiv
  • StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of LLMs — Guo et al., ACL Findings 2024. ACL Anthology
  • RouterBench: A Benchmark for Multi-LLM Routing Systems — Hu et al., 2024. arXiv
  • SWE-bench: Can Language Models Resolve Real-World GitHub Issues? — Jimenez et al., 2024. arXiv
  • API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs — Li et al., 2023. arXiv
  • ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities — Lu et al., 2024. arXiv
  • GAIA: A Benchmark for General AI Assistants — Mialon et al., 2023. arXiv
  • Towards LLM-Based Personal Agents in the Enterprise — Muthusamy et al., EMNLP Findings 2023. ACL Anthology
  • Gorilla: LLM Connected with Massive APIs — Patil et al., 2023. arXiv
  • GoEx: Perspectives and Designs Towards a Runtime for Autonomous LLM Applications — Patil et al., 2024.
  • ToolLLM: Facilitating Large Language Models to Master 16,000+ Real-World APIs — Qin et al., 2023. arXiv
  • Do Multimodal Foundation Models Understand Enterprise Workflows? — Wornow et al., 2024. arXiv
  • TravelPlanner: A Benchmark for Real-World Planning with Language Agents — Xie et al., 2024.
  • ProAgent: From Robotic Process Automation to Agentic Process Automation — Ye et al., 2023. arXiv

Notes

  • Citations without direct links reference venue proceedings; see associated digital libraries for final versions.