A Study on Context Filtering for Efficient Inference Acceleration of LLM

Hyun-seok Chung; Sun-young Hyun; Yoon-Seok Choi; Chang-eun Lee; Young-guk Ha

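International Next-generation Convergence Technology Association, The Journal of Next-generation Convergence Technology Association, Vol. 9, No. 11, 2025.11, pp. 2823-2830, KCI-listed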

Abstract

Since the advent of ChatGPT, research on applying large language models (LLMs) to various fields has been active. Representative methods for improving LLM reasoning performance include Chain-of-Thought and Retrieval-Augmented Generation (RAG). However, these methods lengthen the input prompt, which increases inference time and cost, and unnecessary context can cause hallucination. To address this problem, this paper proposes Context Filtering, a technique that compresses the context in a prompt by keeping only the information relevant to the query. Context Filtering lets the LLM infer from only the necessary information, selected by its temporal (time-series) and semantic relevance to the query. The experiments used large-scale time-series data in a Timestamp + Triplet structure, generated from a battlefield simulation for a situational awareness scenario. The results confirmed that a compressed prompt using only about 25% of the full context maintained situational awareness accuracy similar to that of the original prompt, while also reducing inference time and token usage.
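
The two filtering stages named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `Event` record, the fixed time window, the token-overlap (Jaccard) score used as a stand-in for semantic relevance, the `top_k` cutoff, and the prompt template are all hypothetical, since the abstract does not specify how the paper computes relevance or formats the final prompt.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One context record: a timestamp plus a (subject, relation, object) triplet."""
    timestamp: int  # assumed unit: seconds since scenario start
    subject: str
    relation: str
    obj: str

    def text(self) -> str:
        return f"{self.subject} {self.relation} {self.obj}"


def temporal_filter(events, query_time, window):
    """Keep events whose timestamp lies in [query_time - window, query_time]."""
    return [e for e in events if query_time - window <= e.timestamp <= query_time]


def _tokens(s):
    # Lowercase alphanumeric tokens; "tank_1" becomes {"tank", "1"}.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def semantic_score(event, query):
    """Jaccard token overlap: a simple stand-in, NOT the paper's scoring method."""
    a, b = _tokens(event.text()), _tokens(query)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def semantic_filter(events, query, top_k):
    """Keep the top_k most query-relevant events, returned in chronological order."""
    ranked = sorted(events, key=lambda e: semantic_score(e, query), reverse=True)
    return sorted(ranked[:top_k], key=lambda e: e.timestamp)


def build_prompt(events, query):
    """Assemble a compressed prompt (hypothetical template)."""
    lines = [f"[{e.timestamp}] ({e.subject}, {e.relation}, {e.obj})" for e in events]
    return "Context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"


def context_filter(events, query, query_time, window, top_k):
    """Temporal filtering first, then semantic filtering, then prompt assembly."""
    recent = temporal_filter(events, query_time, window)
    relevant = semantic_filter(recent, query, top_k)
    return build_prompt(relevant, query)


# Made-up example records, not taken from the paper's dataset.
EVENTS = [
    Event(10, "tank_1", "moves_to", "hill_a"),
    Event(50, "drone_2", "detects", "tank_1"),
    Event(95, "tank_1", "fires_at", "bunker_3"),
    Event(98, "truck_4", "refuels_at", "depot_b"),
    Event(100, "drone_2", "tracks", "tank_1"),
]

QUERY = "What is tank 1 doing?"
print(context_filter(EVENTS, QUERY, query_time=100, window=20, top_k=2))
```

In this toy run, the temporal stage drops the two oldest records and the semantic stage drops the unrelated `truck_4` record, so only two of the five records reach the prompt. A real system would more plausibly score relevance with sentence embeddings rather than token overlap; that choice is left open here because the abstract does not state it.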

Summary
Abstract
Ⅰ. Introduction
Ⅱ. Related Work
2.1 LLM
2.2 Prompt Compression Techniques
2.3 Time-Series Data Reasoning
Ⅲ. Context Filtering Technique
3.1 Overview of Context Filtering
3.2 Temporal Filtering
3.3 Semantic Filtering
3.4 Final Prompt
Ⅳ. Experiments and Analysis
4.1 Dataset
4.2 Experiments
4.3 Results and Analysis
Ⅴ. Conclusion
REFERENCES

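Author Information

Hyun-seok Chung. Researcher, SmartLabs
Sun-young Hyun. Researcher, SmartLabs
Yoon-Seok Choi. Principal Researcher, Electronics and Telecommunications Research Institute (ETRI)
Chang-eun Lee. Principal Researcher, Electronics and Telecommunications Research Institute (ETRI)
Young-guk Ha. Professor, Department of Computer Engineering, Konkuk University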
