Putting It All Together: Lessons from Building a LangGraph Agent

This final part covers main.py, the CLI entry point that ties everything together, followed by a real end-to-end session trace and three production-readiness topics that this project raises but doesn't fully solve: conversation memory, context window growth, and streaming. We'll close with an honest take on when LangGraph is the right tool and when it isn't.

The entry point: main.py

Every file in this project has been shown except one. main.py is the CLI that launches the agent: a loop that takes user input, invokes the graph, and prints the response. It's intentionally thin:

# main.py

from agent.graph import build_graph

WELCOME_BANNER = """
╔══════════════════════════════════════════════════════════════╗
║                City Events Agent                             ║
║                                                              ║
║  I can help you with:                                        ║
║    • Finding local events in cities worldwide                ║
║    • Searching the web for information                       ║
║    • Checking current weather conditions                     ║
║                                                              ║
║  Type 'quit' or 'exit' to leave.                             ║
╚══════════════════════════════════════════════════════════════╝
"""

def run_cli():
    print(WELCOME_BANNER)
    graph = build_graph()
    print("Agent ready. Ask me anything!\n")
    
    while True:
        try:
            user_input = input("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            print("\nGoodbye!")
            break
        
        if not user_input:
            continue
        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye!")
            break
        
        try:
            result = graph.invoke(
                {"messages": [{"role": "user", "content": user_input}]}
            )
            assistant_msg = result["messages"][-1].content
            print(f"\nAssistant: {assistant_msg}\n")
        except Exception as exc:
            print(f"\nError: {exc}\n")

if __name__ == "__main__":
    run_cli()

A few deliberate choices here are worth noting.

The graph is built once, outside the loop. build_graph() is called before the while True loop begins. This matters because init_chat_model and bind_tools initialise the LLM client and generate the tool schemas. Doing this on every user message would add latency and waste resources. Build once, invoke many.

Each invocation is stateless. Notice that every call to graph.invoke() passes only the current user message, not the full conversation history. This means the agent has no memory between turns. Each question is answered in isolation. For a CLI demo this is fine, but it's the first thing you'd fix for a real conversational product. We'll come back to this.

The last message is the response. After graph.invoke() returns, result["messages"][-1].content gives us the final AI message. By the time the graph reaches END, the last item in the messages list is always the assistant's natural-language response — everything that came before it (tool calls, tool results, intermediate AI messages) is accumulated state that the LLM used to produce that final answer.

Errors are caught at the CLI level. The try/except around graph.invoke() is a last-resort catch. Individual tools handle their own failures internally and return structured errors. This outer catch exists for unexpected failures during graph execution, such as an API key that the provider rejects on the first LLM call or a network failure that the Tavily client doesn't handle internally. Note that build_graph() sits outside this try block, so a failure while constructing the graph still surfaces as a normal traceback.
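
The last-message lookup described above assumes that .content is a plain string. Some chat model integrations return content as a list of blocks instead, so a slightly more defensive reader can be sketched as follows. The names final_text and Msg are hypothetical and used only for illustration; in the real agent the list would hold LangChain message objects.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message (only .content is used)."""
    content: Any


def final_text(messages: list) -> str:
    """Return the text of the last message, or "" if there is none."""
    if not messages:
        return ""
    content = getattr(messages[-1], "content", "")
    if isinstance(content, str):
        return content
    # Assumption: list content holds plain strings and/or dict blocks
    # shaped like {"type": "text", "text": "..."}; other blocks are skipped.
    parts = []
    for block in content or []:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)


print(final_text([Msg("tool output"), Msg("Lagos is 31°C today.")]))
```

In main.py this would replace the direct result["messages"][-1].content access if you swap in a model whose responses arrive as content blocks.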

Here's what a real session looks like with the agent running. The user asks a multi-part question:

╔══════════════════════════════════════════════════════════════╗
║                City Events Agent                             ║
╚══════════════════════════════════════════════════════════════╝
 
Agent ready. Ask me anything!
 
You: What's on in Lagos this weekend, and what's the weather like?
 
Assistant: Here's what I found for Lagos this weekend:
 
Events:
• Afrobeats Night at Eko Hotel — Saturday 8pm
• Lagos Art Fair, Victoria Island — Saturday & Sunday, 10am–6pm
• Comedy at Terra Kulture — Sunday 7pm
 
Weather:
Currently 31°C with light clouds. Humidity at 74%,
light breeze from the southwest. Good conditions for
outdoor events.
 
You: quit
Goodbye!

Two tool calls, one coherent response. The LLM queried the database, fetched the weather, and synthesised both into a single answer without any explicit orchestration from our side. That's the graph doing its job.

Three things you would consider in production

1. The agent has no memory between turns

As noted above, each graph.invoke() call starts fresh. The conversation has no continuity. If a user asks "What about Abuja?" after asking about Lagos, the agent has no idea what "what about" refers to.

LangGraph solves this with checkpointers. A checkpointer persists graph state between invocations using a thread ID. The simplest version uses in-memory storage:

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)

# Each invocation now carries a thread ID
config = {"configurable": {"thread_id": "session_001"}}

result = graph.invoke(
    {"messages": [{"role": "user", "content": user_input}]},
    config=config
)

With a checkpointer in place, LangGraph automatically loads the previous state for that thread ID before each invocation and saves the updated state after. The full conversation history persists across turns without any changes to your nodes, tools, or edges. MemorySaver keeps everything in process memory, so history is lost when the program exits. For production, swap it for SqliteSaver or PostgresSaver to persist state beyond the life of the process.
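
One practical detail is where the thread ID comes from. A minimal sketch, assuming one thread per CLI session: generate a random ID at startup and reuse the resulting config for every invoke call in that session. new_thread_config is a hypothetical helper, not part of LangGraph; only the {"configurable": {"thread_id": ...}} shape comes from the example above.

```python
import uuid
from typing import Optional


def new_thread_config(thread_id: Optional[str] = None) -> dict:
    """Build the config dict passed to graph.invoke(), one thread per session."""
    # A random UUID keeps separate sessions from sharing history by accident.
    return {"configurable": {"thread_id": thread_id or str(uuid.uuid4())}}


config = new_thread_config()                 # call once, before the input loop
resumed = new_thread_config("session_001")   # pass a known ID to continue a thread
print(config["configurable"]["thread_id"])
```

Passing a known ID only resumes a conversation if the checkpointer still holds that thread, which for MemorySaver means the same running process.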

2. Context window growth

Once a checkpointer is in place, the agent's state grows with every turn. In Part 2, we saw that a single three-tool query produces eight messages. Ten turns of conversation with multi-tool queries could put 50 to 80 messages in the context window, which is well within limits for a single session but worth planning for in a longer-lived agent.
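
The 50-to-80 figure is simple multiplication. A rough sketch, assuming each turn adds between five and eight messages (the eight comes from the three-tool example; five is an assumed lower bound for lighter queries):

```python
def message_range(turns: int, low_per_turn: int = 5, high_per_turn: int = 8) -> tuple:
    """Estimate how many messages accumulate after a number of turns."""
    return turns * low_per_turn, turns * high_per_turn


low, high = message_range(10)
print(f"After 10 turns: roughly {low} to {high} messages")
```

Message count is only a proxy. Tool results vary widely in length, so the token count is what ultimately decides when trimming becomes necessary.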

There are two standard approaches:

Message trimming:

LangChain provides trim_messages(), which truncates the message list to fit within a token budget while preserving the system message and the most recent exchanges. You'd apply it inside the chatbot node before calling the LLM:

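from langchain_core.messages import trim_messages

def chatbot(state: State):
    # Keep the newest messages that fit the token budget before calling the LLM.
    trimmed = trim_messages(
        state["messages"],
        max_tokens=4096,
        strategy="last",
        token_counter=llm,      # the model counts its own tokens
        include_system=True,    # always keep the leading system message, if any
        start_on="human",       # don't start the window on an orphaned tool result
    )
    return {"messages": [llm_with_tools.invoke(trimmed)]}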
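
To make the "last" strategy concrete, here is a toy version in plain Python. It is not LangChain's implementation: messages are plain dicts, and rough_tokens is a hypothetical counter that assumes about four characters per token. It keeps a leading system message, then walks backwards from the newest message until the budget runs out.

```python
def rough_tokens(message: dict) -> int:
    """Hypothetical counter: assume ~4 characters per token, minimum 1."""
    return max(1, len(message["content"]) // 4)


def trim_last(messages: list, max_tokens: int) -> list:
    """Keep a leading system message plus the newest messages that fit."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    budget = max_tokens - sum(rough_tokens(m) for m in system)

    kept = []
    for message in reversed(rest):      # newest first
        cost = rough_tokens(message)
        if cost > budget:
            break                       # everything older is dropped too
        kept.append(message)
        budget -= cost
    return system + kept[::-1]          # restore chronological order


history = [
    {"role": "system", "content": "You are a city events assistant."},
    {"role": "user", "content": "What's on in Lagos this weekend?"},
    {"role": "assistant", "content": "Three events: a concert, an art fair, a comedy night."},
    {"role": "user", "content": "What about Abuja?"},
]
for m in trim_last(history, max_tokens=30):
    print(m["role"], "-", m["content"])
```

Note that a naive cut like this can leave the window starting on an assistant or tool message whose originating question was dropped, which is the kind of edge a real trimming utility has to guard against.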

Summarisation:

Instead of trimming, you can add a summarisation node that periodically compresses older messages into a single summary message. This preserves context that trimming would discard, at the cost of a second LLM call. For most conversational agents, trimming is sufficient and simpler.
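
The core of a summarisation step can be sketched in a few lines. This is an illustration under stated assumptions, not a LangGraph API: compress_history is a hypothetical helper, messages are plain dicts, and the summarize callable stands in for the second LLM call.

```python
from typing import Callable


def compress_history(
    messages: list,
    keep_last: int,
    summarize: Callable[[list], str],
) -> list:
    """Replace all but the newest keep_last messages with one summary message."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return list(messages)           # nothing worth compressing yet
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary] + recent


def fake_summarize(older: list) -> str:
    """Stand-in for the second LLM call; just reports what it was given."""
    return f"{len(older)} earlier messages about city events."


history = [{"role": "user", "content": f"question {i}"} for i in range(6)]
for m in compress_history(history, keep_last=2, summarize=fake_summarize):
    print(m["role"], "-", m["content"])
```

In a graph, logic like this would live in its own node that runs only when the history passes some length threshold, so the extra LLM call is paid occasionally instead of on every turn.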

For the City Events Agent as built, which is just a CLI tool with short sessions, neither is needed. But if you extend it to a web API with persistent sessions, this becomes a real concern around the 20-turn mark.

3. Streaming responses

Right now, graph.invoke() blocks until the entire execution completes: every tool call is resolved and the final LLM response is generated before anything is printed. For queries that hit three tools, this can feel sluggish. The user sees nothing for several seconds, then gets the full response at once.

LangGraph's graph.stream() method fixes this by yielding events as they happen. You can stream at different granularities: per node completion, or per token if the LLM supports it:

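# Stream node-level events ("updates" mode yields one dict per completed node)
for event in graph.stream(
    {"messages": [{"role": "user", "content": user_input}]},
    config=config,
    stream_mode="updates",
):
    for node_name, node_output in event.items():
        # node_output is the state update from that node
        last_msg = node_output["messages"][-1]
        if hasattr(last_msg, "content") and last_msg.content:
            print(last_msg.content)

# For token-level streaming, use astream_events (async).
# "async for" is only valid inside a coroutine, so wrap it and run it with asyncio.
import asyncio

async def stream_tokens(user_input: str) -> None:
    async for event in graph.astream_events(
        {"messages": [{"role": "user", "content": user_input}]},
        config=config,
        version="v2",
    ):
        if event["event"] == "on_chat_model_stream":
            # content is usually a string; some providers return a list of blocks
            chunk = event["data"]["chunk"].content
            print(chunk, end="", flush=True)

asyncio.run(stream_tokens(user_input))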

Token-level streaming via astream_events requires an async runtime and an LLM that supports streaming (most modern ones do). For a CLI tool, node-level streaming, which shows tool results as they arrive before the final response, is often enough to make the agent feel responsive without the added complexity of async Python.
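
For node-level streaming in the CLI, it helps to label each line by the node that produced it. The sketch below is a hypothetical helper, render_updates, that consumes events in the node-level shape shown above and yields display lines as they arrive. It assumes the tool-executing node is named "tools" and is exercised here with fake events so it runs without a graph.

```python
from typing import Iterable, Iterator


def render_updates(events: Iterable[dict]) -> Iterator[str]:
    """Yield one display line per node update that carries visible content."""
    for event in events:
        for node_name, node_output in event.items():
            messages = (node_output or {}).get("messages", [])
            if not messages:
                continue
            last = messages[-1]
            if isinstance(last, dict):
                content = last.get("content")
            else:
                content = getattr(last, "content", "")
            if not content:
                continue                # e.g. an AI message that only requests tools
            # Assumption: the tool-executing node is named "tools".
            prefix = "[tool result]" if node_name == "tools" else "Assistant:"
            yield f"{prefix} {content}"


# Fake events in the shape of node-level updates: {node_name: state_update}.
fake_events = [
    {"chatbot": {"messages": [{"role": "assistant", "content": ""}]}},
    {"tools": {"messages": [{"role": "tool", "content": "31°C, light clouds"}]}},
    {"chatbot": {"messages": [{"role": "assistant", "content": "It is 31°C in Lagos."}]}},
]
for line in render_updates(fake_events):
    print(line)
```

Because it is a generator, passing it the live graph.stream(...) iterator would print each line as soon as the corresponding node finishes.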

When LangGraph is the wrong tool

Every tool has a domain where it's the right answer and a domain where it's overkill. LangGraph is no exception.

Use LangGraph when your application genuinely needs loops, conditional branching, or persistent state across multiple LLM calls. Multi-tool agents like this one, autonomous research pipelines, workflow automation with human-in-the-loop checkpoints, and multi-agent systems are exactly what LangGraph is designed for.

Don't use LangGraph when your application is a single LLM call, a fixed sequence of steps, or a simple RAG pipeline. A single prompt-response cycle doesn't need a state machine. A document summariser that chunks, embeds, retrieves, and generates in a fixed order is better expressed as a plain function or a simple LangChain LCEL chain. The graph adds conceptual overhead without adding capability.
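
For contrast, a fixed pipeline reads naturally as ordinary function composition. The sketch below uses hypothetical stand-ins (chunk, retrieve, and a generate callable in place of a real LLM call) and swaps embeddings for simple keyword overlap to stay dependency-free. The point is only that a fixed order of steps needs no graph.

```python
from typing import Callable


def chunk(text: str, size: int = 200) -> list:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def retrieve(chunks: list, query: str, k: int = 2) -> list:
    """Rank chunks by words shared with the query (stand-in for embeddings)."""
    words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(words & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def answer(text: str, query: str, generate: Callable[[str], str]) -> str:
    """Fixed order: chunk, retrieve, generate. No loops, no branching."""
    context = "\n".join(retrieve(chunk(text), query))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")


# A fake generate() so the sketch runs without an API key.
print(answer("Lagos hosts an art fair on Saturday.", "art fair", lambda p: p.upper()))
```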

If you've been following along and building as you go, you now have a working multi-tool LangGraph agent and a mental model for extending it. That's the goal. The framework will evolve; the model of stateful graphs for agentic applications is here to stay.
