Short-Term Memory API Reference

Classes:

  • SummarizationNode –

    A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

  • SummarizationResult –

    Result of message summarization.

  • RunningSummary –

    Object for storing information about the previous summarization.

Functions:

  • summarize_messages –

    Summarize messages when they exceed a token limit and replace them with a summary message.

SummarizationNode

Bases: RunnableCallable

A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Methods:

  • __init__ –

    A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

__init__

__init__(
    *,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
    input_messages_key: str = "messages",
    output_messages_key: str = "summarized_messages",
    name: str = "summarization",
) -> None

A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Processes the messages from oldest to newest: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.
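
For intuition, a minimal sketch of the transformation (the conversation and the assumption that the older messages exceed max_tokens are hypothetical):

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# Hypothetical history; assume everything up to the cat poem exceeds max_tokens.
messages = [
    SystemMessage(content="You are a helpful assistant."),  # excluded from summarization
    HumanMessage(content="hi, my name is bob"),
    AIMessage(content="Hello Bob! How can I help?"),
    HumanMessage(content="write a short poem about cats"),
    AIMessage(content="Cats curl in sunlit squares..."),
    HumanMessage(content="what's my name?"),
]

# After summarization the node would return, roughly:
#   [SystemMessage(...), <summary message>, HumanMessage("what's my name?")]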

Parameters:

  • model (LanguageModelLike) –

    The language model to use for generating summaries.

  • max_tokens (int) –

    Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens will be summarized.

    Note

    If the last message within max_tokens is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages.

  • max_summary_tokens (int, default: 256 ) –

    Maximum number of tokens to budget for the summary.

    Note

    This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, pass model.bind(max_tokens=max_summary_tokens) as the model parameter (see the sketch after this parameter list).

  • token_counter (TokenCounter, default: count_tokens_approximately ) –

    Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use model.get_num_tokens_from_messages.

  • initial_summary_prompt (ChatPromptTemplate, default: DEFAULT_INITIAL_SUMMARY_PROMPT ) –

    Prompt template for generating the first summary.

  • existing_summary_prompt (ChatPromptTemplate, default: DEFAULT_EXISTING_SUMMARY_PROMPT ) –

    Prompt template for updating an existing (running) summary.

  • final_prompt (ChatPromptTemplate, default: DEFAULT_FINAL_SUMMARY_PROMPT ) –

    Prompt template that combines summary with the remaining messages before returning.

  • input_messages_key (str, default: 'messages' ) –

    Key in the input graph state that contains the list of messages to summarize.

  • output_messages_key (str, default: 'summarized_messages' ) –

    Key in the state update that contains the list of updated messages.

    Warning

    By default, output_messages_key differs from input_messages_key. This decouples the summarized messages from the main list of messages in the graph state (i.e., input_messages_key). Only make them the same if you want to overwrite the main list of messages (see the sketch after this parameter list).

  • name (str, default: 'summarization' ) –

    Name of the summarization node.
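
A minimal sketch tying the notes above together, assuming an OpenAI model and illustrative token limits (the model name and limits are placeholders, not recommendations):

from langchain_openai import ChatOpenAI
from langmem.short_term import SummarizationNode

model = ChatOpenAI(model="gpt-4o")

summarization_node = SummarizationNode(
    # bind(max_tokens=...) is what actually caps the summary length;
    # max_summary_tokens on its own only informs the token budget estimate.
    model=model.bind(max_tokens=128),
    max_tokens=512,
    max_summary_tokens=128,
    # Left at the default so the node's output stays decoupled from the
    # main "messages" list (see the Warning above).
    output_messages_key="summarized_messages",
)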

Returns:

  • None –

    __init__ returns None. When the node itself is invoked, it produces a LangGraph state update in the following format:

    {
        <output_messages_key>: <list of updated messages ready to be input to the LLM after summarization, including a message with a summary (if any)>,
        "context": {"running_summary": <RunningSummary object>}
    }
    

Example
from typing import Any, TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import SummarizationNode, RunningSummary

model = ChatOpenAI(model="gpt-4o")
# Bind max_tokens so generated summaries are actually capped in length.
summarization_model = model.bind(max_tokens=128)


# Graph state: messages plus a "context" dict where the summarization node
# stores its RunningSummary between invocations.
class State(MessagesState):
    context: dict[str, Any]


# Input schema for call_model: it reads the summarized messages produced by
# the summarization node rather than the raw message history.
class LLMInputState(TypedDict):
    summarized_messages: list[AnyMessage]
    context: dict[str, Any]


summarization_node = SummarizationNode(
    model=summarization_model,
    max_tokens=256,
    max_summary_tokens=128,
)


def call_model(state: LLMInputState):
    response = model.invoke(state["summarized_messages"])
    return {"messages": [response]}


# A checkpointer persists state (including the running summary) across turns
# on the same thread.
checkpointer = InMemorySaver()
workflow = StateGraph(State)
workflow.add_node(call_model)
workflow.add_node("summarize", summarization_node)
workflow.add_edge(START, "summarize")
workflow.add_edge("summarize", "call_model")
graph = workflow.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)

SummarizationResult dataclass

Result of message summarization.

Attributes:

  • messages (list[AnyMessage]) –

    List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).

  • running_summary (RunningSummary | None) –

    Information about the previous summarization (the summary and the IDs of the previously summarized messages).

messages instance-attribute

messages: list[AnyMessage]

List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).

running_summary class-attribute instance-attribute

running_summary: RunningSummary | None = None

Information about the previous summarization (the summary and the IDs of the previously summarized messages). Can be None if no summarization was performed (not enough messages to summarize).

RunningSummary dataclass

Object for storing information about the previous summarization.

Used on subsequent calls to summarize_messages to avoid summarizing the same messages.

Attributes:

  • summary (str) –

    Latest summary of the messages, updated every time the summarization is performed.

  • summarized_message_ids (set[str]) –

    The IDs of all of the messages that have been previously summarized.

  • last_summarized_message_id (str | None) –

    The ID of the last message that was summarized.

summary instance-attribute

summary: str

Latest summary of the messages, updated every time the summarization is performed.

summarized_message_ids instance-attribute

summarized_message_ids: set[str]

The IDs of all of the messages that have been previously summarized.

last_summarized_message_id instance-attribute

last_summarized_message_id: str | None

The ID of the last message that was summarized.
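
RunningSummary is what makes summarization incremental. A minimal sketch, assuming an OpenAI model and illustrative token limits, that threads the same RunningSummary through repeated summarize_messages calls so already-summarized messages are skipped:

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langmem.short_term import summarize_messages

model = ChatOpenAI(model="gpt-4o")
summarization_model = model.bind(max_tokens=128)

messages = []
running_summary = None
for turn in ["hi, my name is bob", "write a short poem about cats", "what's my name?"]:
    messages.append(HumanMessage(content=turn))
    result = summarize_messages(
        messages,
        running_summary=running_summary,  # skip previously summarized messages
        model=summarization_model,
        max_tokens=256,
        max_summary_tokens=128,
    )
    if result.running_summary is not None:
        running_summary = result.running_summary
    messages.append(model.invoke(result.messages))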

summarize_messages

summarize_messages(
    messages: list[AnyMessage],
    *,
    running_summary: RunningSummary | None,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
) -> SummarizationResult

Summarize messages when they exceed a token limit and replace them with a summary message.

This function processes the messages from oldest to newest: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.

Parameters:

  • messages (list[AnyMessage]) –

    The list of messages to process.

  • running_summary (RunningSummary | None) –

    Optional running summary object with information about the previous summarization. If provided:

      • only messages that were not previously summarized will be processed
      • if no new summary is generated, the running summary will be added to the returned messages
      • if a new summary needs to be generated, it is generated by incorporating the existing summary value from the running summary

  • model (LanguageModelLike) –

    The language model to use for generating summaries.

  • max_tokens (int) –

    Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens will be summarized.

    Note

    If the last message within max_tokens is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages.

  • max_summary_tokens (int, default: 256 ) –

    Maximum number of tokens to budget for the summary.

    Note

    This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, you would need to pass model.bind(max_tokens=max_summary_tokens) as the model parameter to this function.

  • token_counter (TokenCounter, default: count_tokens_approximately ) –

    Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use model.get_num_tokens_from_messages (see the sketch after this parameter list).

  • initial_summary_prompt (ChatPromptTemplate, default: DEFAULT_INITIAL_SUMMARY_PROMPT ) –

    Prompt template for generating the first summary.

  • existing_summary_prompt (ChatPromptTemplate, default: DEFAULT_EXISTING_SUMMARY_PROMPT ) –

    Prompt template for updating an existing (running) summary.

  • final_prompt (ChatPromptTemplate, default: DEFAULT_FINAL_SUMMARY_PROMPT ) –

    Prompt template that combines summary with the remaining messages before returning.
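
As noted for token_counter above, the model's own tokenizer gives exact counts. A minimal sketch (the model name and input message are placeholders):

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langmem.short_term import summarize_messages

model = ChatOpenAI(model="gpt-4o")

result = summarize_messages(
    [HumanMessage(content="hi, my name is bob")],
    running_summary=None,
    model=model.bind(max_tokens=128),
    max_tokens=256,
    max_summary_tokens=128,
    # Exact token counts from the model's tokenizer instead of the
    # approximate default.
    token_counter=model.get_num_tokens_from_messages,
)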

Returns:

  • SummarizationResult –

    A SummarizationResult object containing the updated messages and a running summary:

      • messages: list of updated messages ready to be input to the LLM
      • running_summary: RunningSummary object
          • summary: text of the latest summary
          • summarized_message_ids: set of message IDs that were previously summarized
          • last_summarized_message_id: ID of the last message that was summarized

Example
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import summarize_messages, RunningSummary
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")
# Bind max_tokens so generated summaries are actually capped in length.
summarization_model = model.bind(max_tokens=128)


# Track the running summary in the graph state so subsequent turns
# don't re-summarize the same messages.
class SummaryState(MessagesState):
    summary: RunningSummary | None


def call_model(state):
    # Summarize the history (if it exceeds the token budget) before
    # calling the model.
    summarization_result = summarize_messages(
        state["messages"],
        running_summary=state.get("summary"),
        model=summarization_model,
        max_tokens=256,
        max_summary_tokens=128,
    )
    response = model.invoke(summarization_result.messages)
    state_update = {"messages": [response]}
    # Persist the updated running summary for the next turn.
    if summarization_result.running_summary:
        state_update["summary"] = summarization_result.running_summary
    return state_update


# A checkpointer persists state (including the running summary) across turns
# on the same thread.
checkpointer = InMemorySaver()
workflow = StateGraph(SummaryState)
workflow.add_node(call_model)
workflow.add_edge(START, "call_model")
graph = workflow.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)
