Short-Term Memory API Reference

Classes:

  • SummarizationNode –

    A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

  • SummarizationResult –

    Result of message summarization.

  • RunningSummary –

    Object for storing information about the previous summarization.

Functions:

  • summarize_messages –

    Summarize messages when they exceed a token limit and replace them with a summary message.

SummarizationNode

Bases: RunnableCallable

A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Methods:

  • __init__ –

    A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

__init__

__init__(
    *,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
    input_messages_key: str = "messages",
    output_messages_key: str = "summarized_messages",
    name: str = "summarization",
) -> None

A LangGraph node that summarizes messages when they exceed a token limit and replaces them with a summary message.

Processes the messages from oldest to newest: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.
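
For intuition, a minimal sketch of the transformation (the conversation and the assumption that the older messages exceed max_tokens are hypothetical):

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# Hypothetical history; assume everything up to the cat poem exceeds max_tokens.
messages = [
    SystemMessage(content="You are a helpful assistant."),  # excluded from summarization
    HumanMessage(content="hi, my name is bob"),
    AIMessage(content="Hello Bob! How can I help?"),
    HumanMessage(content="write a short poem about cats"),
    AIMessage(content="Cats curl in sunlit squares..."),
    HumanMessage(content="what's my name?"),
]

# After summarization the node would return, roughly:
#   [SystemMessage(...), <summary message>, HumanMessage("what's my name?")]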

Parameters:

  • model (LanguageModelLike) –

    The language model to use for generating summaries.

  • max_tokens (int) –

    Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens will be summarized.

    Note

    If the last message within max_tokens is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages.

  • max_summary_tokens (int, default: 256 ) –

    Maximum number of tokens to budget for the summary.

    Note

    This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, pass model.bind(max_tokens=max_summary_tokens) as the model parameter (see the sketch after this parameter list).

  • token_counter (TokenCounter, default: count_tokens_approximately ) –

    Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use model.get_num_tokens_from_messages.

  • initial_summary_prompt (ChatPromptTemplate, default: DEFAULT_INITIAL_SUMMARY_PROMPT ) –

    Prompt template for generating the first summary.

  • existing_summary_prompt (ChatPromptTemplate, default: DEFAULT_EXISTING_SUMMARY_PROMPT ) –

    Prompt template for updating an existing (running) summary.

  • final_prompt (ChatPromptTemplate, default: DEFAULT_FINAL_SUMMARY_PROMPT ) –

    Prompt template that combines summary with the remaining messages before returning.

  • input_messages_key (str, default: 'messages' ) –

    Key in the input graph state that contains the list of messages to summarize.

  • output_messages_key (str, default: 'summarized_messages' ) –

    Key in the state update that contains the list of updated messages.

    Warning

    By default, output_messages_key differs from input_messages_key. This decouples the summarized messages from the main list of messages in the graph state (i.e., input_messages_key). Only make them the same if you want to overwrite the main list of messages (see the sketch after this parameter list).

  • name (str, default: 'summarization' ) –

    Name of the summarization node.
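
A minimal sketch tying the notes above together, assuming an OpenAI model and illustrative token limits (the model name and limits are placeholders, not recommendations):

from langchain_openai import ChatOpenAI
from langmem.short_term import SummarizationNode

model = ChatOpenAI(model="gpt-4o")

summarization_node = SummarizationNode(
    # bind(max_tokens=...) is what actually caps the summary length;
    # max_summary_tokens on its own only informs the token budget estimate.
    model=model.bind(max_tokens=128),
    max_tokens=512,
    max_summary_tokens=128,
    # Left at the default so the node's output stays decoupled from the
    # main "messages" list (see the Warning above).
    output_messages_key="summarized_messages",
)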

Returns:

  • None –

    __init__ returns None. When the node itself is invoked, it produces a LangGraph state update in the following format:

    {
        <output_messages_key>: <list of updated messages ready to be input to the LLM after summarization, including a message with a summary (if any)>,
        "context": {"running_summary": <RunningSummary object>}
    }
    

Example
from typing import Any, TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.messages import AnyMessage
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import SummarizationNode, RunningSummary

model = ChatOpenAI(model="gpt-4o")
# Bind max_tokens so generated summaries are actually capped in length.
summarization_model = model.bind(max_tokens=128)


# Graph state: messages plus a "context" dict where the summarization node
# stores its RunningSummary between invocations.
class State(MessagesState):
    context: dict[str, Any]


# Input schema for call_model: it reads the summarized messages produced by
# the summarization node rather than the raw message history.
class LLMInputState(TypedDict):
    summarized_messages: list[AnyMessage]
    context: dict[str, Any]


summarization_node = SummarizationNode(
    model=summarization_model,
    max_tokens=256,
    max_summary_tokens=128,
)


def call_model(state: LLMInputState):
    response = model.invoke(state["summarized_messages"])
    return {"messages": [response]}


# A checkpointer persists state (including the running summary) across turns
# on the same thread.
checkpointer = InMemorySaver()
workflow = StateGraph(State)
workflow.add_node(call_model)
workflow.add_node("summarize", summarization_node)
workflow.add_edge(START, "summarize")
workflow.add_edge("summarize", "call_model")
graph = workflow.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)

SummarizationResult dataclass

Result of message summarization.

Attributes:

  • messages (list[AnyMessage]) –

    List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).

  • running_summary (RunningSummary | None) –

    Information about the previous summarization (the summary and the IDs of the previously summarized messages).

messages instance-attribute

messages: list[AnyMessage]

List of updated messages that are ready to be input to the LLM after summarization, including a message with a summary (if any).

running_summary class-attribute instance-attribute

running_summary: RunningSummary | None = None

Information about the previous summarization (the summary and the IDs of the previously summarized messages). Can be None if no summarization was performed (not enough messages to summarize).

RunningSummary dataclass

Object for storing information about the previous summarization.

Used on subsequent calls to summarize_messages to avoid summarizing the same messages.

Attributes:

  • summary (str) –

    Latest summary of the messages, updated every time the summarization is performed.

  • summarized_message_ids (set[str]) –

    The IDs of all of the messages that have been previously summarized.

  • last_summarized_message_id (str | None) –

    The ID of the last message that was summarized.

summary instance-attribute

summary: str

Latest summary of the messages, updated every time the summarization is performed.

summarized_message_ids instance-attribute

summarized_message_ids: set[str]

The IDs of all of the messages that have been previously summarized.

last_summarized_message_id instance-attribute

last_summarized_message_id: str | None

The ID of the last message that was summarized.
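
RunningSummary is what makes summarization incremental. A minimal sketch, assuming an OpenAI model and illustrative token limits, that threads the same RunningSummary through repeated summarize_messages calls so already-summarized messages are skipped:

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langmem.short_term import summarize_messages

model = ChatOpenAI(model="gpt-4o")
summarization_model = model.bind(max_tokens=128)

messages = []
running_summary = None
for turn in ["hi, my name is bob", "write a short poem about cats", "what's my name?"]:
    messages.append(HumanMessage(content=turn))
    result = summarize_messages(
        messages,
        running_summary=running_summary,  # skip previously summarized messages
        model=summarization_model,
        max_tokens=256,
        max_summary_tokens=128,
    )
    if result.running_summary is not None:
        running_summary = result.running_summary
    messages.append(model.invoke(result.messages))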

summarize_messages

summarize_messages(
    messages: list[AnyMessage],
    *,
    running_summary: RunningSummary | None,
    model: LanguageModelLike,
    max_tokens: int,
    max_summary_tokens: int = 256,
    token_counter: TokenCounter = count_tokens_approximately,
    initial_summary_prompt: ChatPromptTemplate = DEFAULT_INITIAL_SUMMARY_PROMPT,
    existing_summary_prompt: ChatPromptTemplate = DEFAULT_EXISTING_SUMMARY_PROMPT,
    final_prompt: ChatPromptTemplate = DEFAULT_FINAL_SUMMARY_PROMPT,
) -> SummarizationResult

Summarize messages when they exceed a token limit and replace them with a summary message.

This function processes the messages from oldest to newest: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens are summarized (excluding the system message, if any) and replaced with a new summary message. The resulting list of messages is [summary_message] + remaining_messages.

Parameters:

  • messages (list[AnyMessage]) –

    The list of messages to process.

  • running_summary (RunningSummary | None) –

    Optional running summary object with information about the previous summarization. If provided:

      • only messages that were not previously summarized will be processed
      • if no new summary is generated, the running summary will be added to the returned messages
      • if a new summary needs to be generated, it is generated by incorporating the existing summary value from the running summary

  • model (LanguageModelLike) –

    The language model to use for generating summaries.

  • max_tokens (int) –

    Maximum number of tokens to return. Will also be used as a threshold for triggering the summarization: once the cumulative number of message tokens reaches max_tokens, all messages within max_tokens will be summarized.

    Note

    If the last message within max_tokens is an AI message with tool calls or a human message, this message will not be summarized, and instead will be added to the returned messages.

  • max_summary_tokens (int, default: 256 ) –

    Maximum number of tokens to budget for the summary.

    Note

    This parameter is not passed to the summary-generating LLM to limit the length of the summary. It is only used for correctly estimating the maximum allowed token budget. If you want to enforce it, you would need to pass model.bind(max_tokens=max_summary_tokens) as the model parameter to this function.

  • token_counter (TokenCounter, default: count_tokens_approximately ) –

    Function to count tokens in a message. Defaults to approximate counting. For more accurate counts you can use model.get_num_tokens_from_messages (see the sketch after this parameter list).

  • initial_summary_prompt (ChatPromptTemplate, default: DEFAULT_INITIAL_SUMMARY_PROMPT ) –

    Prompt template for generating the first summary.

  • existing_summary_prompt (ChatPromptTemplate, default: DEFAULT_EXISTING_SUMMARY_PROMPT ) –

    Prompt template for updating an existing (running) summary.

  • final_prompt (ChatPromptTemplate, default: DEFAULT_FINAL_SUMMARY_PROMPT ) –

    Prompt template that combines summary with the remaining messages before returning.
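
As noted for token_counter above, the model's own tokenizer gives exact counts. A minimal sketch (the model name and input message are placeholders):

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langmem.short_term import summarize_messages

model = ChatOpenAI(model="gpt-4o")

result = summarize_messages(
    [HumanMessage(content="hi, my name is bob")],
    running_summary=None,
    model=model.bind(max_tokens=128),
    max_tokens=256,
    max_summary_tokens=128,
    # Exact token counts from the model's tokenizer instead of the
    # approximate default.
    token_counter=model.get_num_tokens_from_messages,
)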

Returns:

  • SummarizationResult –

    A SummarizationResult object containing the updated messages and a running summary:

      • messages: list of updated messages ready to be input to the LLM
      • running_summary: RunningSummary object
          • summary: text of the latest summary
          • summarized_message_ids: set of message IDs that were previously summarized
          • last_summarized_message_id: ID of the last message that was summarized

Example
from langgraph.graph import StateGraph, START, MessagesState
from langgraph.checkpoint.memory import InMemorySaver
from langmem.short_term import summarize_messages, RunningSummary
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o")
# Bind max_tokens so generated summaries are actually capped in length.
summarization_model = model.bind(max_tokens=128)


# Track the running summary in the graph state so subsequent turns
# don't re-summarize the same messages.
class SummaryState(MessagesState):
    summary: RunningSummary | None


def call_model(state):
    # Summarize the history (if it exceeds the token budget) before
    # calling the model.
    summarization_result = summarize_messages(
        state["messages"],
        running_summary=state.get("summary"),
        model=summarization_model,
        max_tokens=256,
        max_summary_tokens=128,
    )
    response = model.invoke(summarization_result.messages)
    state_update = {"messages": [response]}
    # Persist the updated running summary for the next turn.
    if summarization_result.running_summary:
        state_update["summary"] = summarization_result.running_summary
    return state_update


# A checkpointer persists state (including the running summary) across turns
# on the same thread.
checkpointer = InMemorySaver()
workflow = StateGraph(SummaryState)
workflow.add_node(call_model)
workflow.add_edge(START, "call_model")
graph = workflow.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "1"}}
graph.invoke({"messages": "hi, my name is bob"}, config)
graph.invoke({"messages": "write a short poem about cats"}, config)
graph.invoke({"messages": "now do the same but for dogs"}, config)
graph.invoke({"messages": "what's my name?"}, config)
