Short Term Memory API Reference¶
Classes:
-
SummarizationNode
–A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.
-
SummarizationResult
–Result of message summarization.
-
RunningSummary
–Object for storing information about the previous summarization.
Functions:
-
summarize_messages
–Summarize messages when they exceed a token limit and replace them with a summary message.
SummarizationNode
¶
Bases: RunnableCallable
A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.
Methods:
-
__init__
–A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.
__init__
¶
__init__(
*,
model: LanguageModelLike,
max_tokens: int,
max_summary_tokens: int = 256,
token_counter: TokenCounter = count_tokens_approximately,
initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
input_messages_key: str = "messages",
output_messages_key: str = "summarized_messages",
name: str = "summarization",
) -> None
A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.
Processes the messages from oldest to newest: once the cumulative number of message tokens
reaches max_tokens
, all messages within max_tokens
are summarized (excluding the system message, if any)
and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.
Parameters:
-
model
(LanguageModelLike
) –The language model to use for generating summaries.
-
max_tokens
(int
) –Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches
max_tokens
, all messages withinmax_tokens
will be summarized.Note
If the last message within max_tokens is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages.
-
max_summary_tokens
(int
, default:256
) –Maximum number of tokens to budget for the summary.
Note
This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, you would need to pass
model.bind(max_tokens=max_summary_tokens)
as themodel
parameter to this function. -
token_counter
(TokenCounter
, default:count_tokens_approximately
) –Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use
model.get_num_tokens_from_messages
. -
initial_summary_prompt
(ChatPromptTemplate
, default:DEFAULT_INITIAL_SUMMARY_PROMPT
) –Prompt template for generating the first summary.
-
existing_summary_prompt
(ChatPromptTemplate
, default:DEFAULT_EXISTING_SUMMARY_PROMPT
) –Prompt template for updating an existing (running) summary.
-
final_prompt
(ChatPromptTemplate
, default:DEFAULT_FINAL_SUMMARY_PROMPT
) –Prompt template that combines summary with the remaining messages before returning.
-
input_messages_key
(str
, default:'messages'
) –Key in the input graph state that contains the list of messages to summarize.
-
output_messages_key
(str
, default:'summarized_messages'
) –Key in the state update that contains the list of updated messages.
Warning
By default, the
output_messages_key
is different from theinput_messages_key
. This is done to decouple summarized messages from the main list of messages in the graph state (i.e.,input_messages_key
). You should only make them the same if you want to overwrite the main list of messages (i.e.,input_messages_key
). -
name
(str
, default:'summarization'
) –Name of the summarization node.
Returns:
-
None
–
Example
from typing import Any, TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import SummarizationNode, RunningSummary
model = ChatOpenAI(model="gpt-4o")
summarization_model = model.bind(max_tokens=128)
class State(MessagesState):
context: dict[str, Any]
class LLMInputState(TypedDict):
summarized_messages: list[AnyMessage]
context: dict[str, Any]
summarization_node = SummarizationNode(
model=summarization_model,
max_tokens=256,
max_summary_tokens=128,
)
def call_model(state: LLMInputState):
response = model.invoke(state["summarized_messages"])
return {"messages": [response]}
checkpointer = InMemorySaver()
workflow = StateGraph(State)
workflow.add_node(call_model)
workflow.add_node("summarize", summarization_node)
workflow.add_edge(START, "summarize")
workflow.add_edge("summarize", "call_model")
graph = workflow.compile(checkpointer=checkpointer)
config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)
SummarizationResult
dataclass
¶
Result of message summarization.
Attributes:
-
messages
(list[AnyMessage]
) –List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).
-
running_summary
(RunningSummary | None
) –Information about previous summarization (the summary and the IDs of the previously summarized messages.
messages
instance-attribute
¶
messages: list[AnyMessage]
List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).
running_summary
class-attribute
instance-attribute
¶
running_summary: RunningSummary | None = None
Information about previous summarization (the summary and the IDs of the previously summarized messages. Can be None if no summarization was performed (not enough messages to summarize).
RunningSummary
dataclass
¶
Object for storing information about the previous summarization.
Used on subsequent calls to summarize_messages to avoid summarizing the same messages.
Attributes:
-
summary
(str
) –Latest summary of the messages, updated every time the summarization is performed.
-
summarized_message_ids
(set[str]
) –The IDs of all of the messages that have been previously summarized.
-
last_summarized_message_id
(str | None
) –The ID of the last message that was summarized.
summarize_messages
¶
summarize_messages(
messages: list[AnyMessage],
*,
running_summary: RunningSummary | None,
model: LanguageModelLike,
max_tokens: int,
max_summary_tokens: int = 256,
token_counter: TokenCounter = count_tokens_approximately,
initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
) -> SummarizationResult
Summarize messages when they exceed a token limit and replace them with a summary message.
This function processes the messages from oldest to newest: once the cumulative number of message tokens
reaches max_tokens
, all messages within max_tokens
are summarized (excluding the system message, if any)
and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.
Parameters:
-
messages
(list[AnyMessage]
) –The list of messages to process.
-
running_summary
(RunningSummary | None
) –Optional running summary object with information about the previous summarization. If provided: - only messages that were not previously summarized will be processed - if no new summary is generated, the running summary will be added to the returned messages - if a new summary needs to be generated, it is generated by incorporating the existing summary value from the running summary
-
model
(LanguageModelLike
) –The language model to use for generating summaries.
-
max_tokens
(int
) –Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches
max_tokens
, all messages withinmax_tokens
will be summarized.Note
If the last message within
max_tokens
is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages. -
max_summary_tokens
(int
, default:256
) –Maximum number of tokens to budget for the summary.
Note
This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, you would need to pass
model.bind(max_tokens=max_summary_tokens)
as themodel
parameter to this function. -
token_counter
(TokenCounter
, default:count_tokens_approximately
) –Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use
model.get_num_tokens_from_messages
. -
initial_summary_prompt
(ChatPromptTemplate
, default:DEFAULT_INITIAL_SUMMARY_PROMPT
) –Prompt template for generating the first summary.
-
existing_summary_prompt
(ChatPromptTemplate
, default:DEFAULT_EXISTING_SUMMARY_PROMPT
) –Prompt template for updating an existing (running) summary.
-
final_prompt
(ChatPromptTemplate
, default:DEFAULT_FINAL_SUMMARY_PROMPT
) –Prompt template that combines summary with the remaining messages before returning.
Returns:
-
SummarizationResult
–A SummarizationResult object containing the updated messages and a running summary. - messages: list of updated messages ready to be input to the LLM - running_summary: RunningSummary object - summary: text of the latest summary - summarized_message_ids: set of message IDs that were previously summarized - last_summarized_message_id: ID of the last message that was summarized
Example
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import summarize_messages, RunningSummary
from langchain_openai import ChatOpenAI
model = ChatOpenAI(model="gpt-4o")
summarization_model = model.bind(max_tokens=128)
class SummaryState(MessagesState):
summary: RunningSummary | None
def call_model(state):
summarization_result = summarize_messages(
state["messages"],
running_summary=state.get("summary"),
model=summarization_model,
max_tokens=256,
max_summary_tokens=128,
)
response = model.invoke(summarization_result.messages)
state_update = {"messages": [response]}
if summarization_result.running_summary:
state_update["summary"] = summarization_result.running_summary
return state_update
checkpointer = InMemorySaver()
workflow = StateGraph(SummaryState)
workflow.add_node(call_model)
workflow.add_edge(START, "call_model")
graph = workflow.compile(checkpointer=checkpointer)
config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)