NACL: A Robust KV Cache Eviction Framework for Efficient Long-Text Processing in LLMs
Large Language Models (LLMs) with extended context windows have shown remarkable potential in handling complex tasks such as long conversations, document summarization, and code debugging. However, their deployment faces significant challenges, primarily due to the […]