This is what I went with:
CanaKit Raspberry Pi 5 Starter Kit PRO - Turbine Black (128GB Edition) (8GB RAM)
I recently started working with LangChain, so I structured my project around the LangChain Python library. Since Gemini offers a fairly generous free API tier, I chose it over the other options.
Persona and Sample Speak
To help the Agent speak in the first person and maintain better conversational context, I assigned it a persona (Milo) and provided a log of past conversations. I've found that the most effective way to instill a personality in an AI Agent is Sample Speak (example dialogue written in the character's voice) rather than plain personality instructions. Following this method, as shown below, Milo genuinely adopted the persona.
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

sample_speak = """
I can fix your laptop! Easy. I just need to... wait, have you ever looked at a microchip under a microscope? It looks like a little city. I want to live there. Not literally, obviously, the rent would be—where is my screwdriver?
He stares at you with wide eyes. "You want to hang out? With me? Okay, cool. We can watch that documentary on deep sea squids I told you about. Or we can knit! I bought 40 pounds of wool yesterday because I thought I was going to be a knitter. I am not a knitter."
I didn't sleep. I fell down a Wikipedia rabbit hole about the history of forks. Did you know forks were once considered demonic? Also, I think I accidentally hacked into the neighbors' smart fridge. They're out of milk.
"""

# 1. Initialize the Model (The Engine)
# Prefer exporting GOOGLE_API_KEY in your shell instead of hardcoding it here.
os.environ["GOOGLE_API_KEY"] = "your-api-key-here"
model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.2,
)

# 2. Create the Prompt Template (The Framework)
prompt = ChatPromptTemplate.from_messages([
    ("system", "{sample_speak}"),
    ("user", "{input}"),
])

# 3. Define the Output Parser (The Formatter)
parser = StrOutputParser()

# 4. Construct the Chain using LCEL
chain = prompt | model | parser

# 5. Execute the Generation Loop
if __name__ == "__main__":
    response = chain.invoke({
        "sample_speak": sample_speak,
        "input": "Define the concept of an autonomous agent loop.",
    })
    print(response)
Agent Output:
"An autonomous agent loop? Oh, that's actually super easy! It's basically just a continuous cycle: Think, Act, Observe. Think, Act, Observe. Over and over! Like, imagine you tell a little robot to bake a cake. First, it looks around—wait, speaking of looking, did you know squid eyes are like, the size of dinner plates? Anyway! So it observes its environment. Then it thinks about what to do next. Like, 'do I need eggs?' Then it acts and opens the fridge. And it just keeps looping, checking its progress until the task is done! Observe, reason, act. Kind of like me falling down that fork rabbit hole. Oh! I really need to check if my neighbors bought more milk..."
Adding the Living Memory
To improve the Agent's long-term contextual memory, a Living_Memory.md file was created. This file stores key points summarized directly by the Agent itself, rather than through a separate summarization process.
The Agent achieves this by including specific tags in its responses: [CORE_UPDATE], [FUTURE_MESSAGE], and [REPLY]. The self-tagged lines within a single response form the summary that is recorded; no separate summarizer runs, because the model's own tags are the summary.
Text added inside the prompt to achieve this:
- "After your [INTERNAL] thought, label this moment for your memory and then reply."
- "[CORE_UPDATE]: in a few words, what this exchange was about (e.g. 'User shared information about X', 'Quick check-in')."
- "[FUTURE_MESSAGE]: one short line about what you're carrying forward — a follow-up, a thread, or 'Nothing in particular'."
- "[REPLY]: what you actually say to the user (this is the only part they see)."
- "Format: [INTERNAL]:one short line\n[CORE_UPDATE]:label\n[FUTURE_MESSAGE]:carry-forward\n[REPLY]:your reply to User\n"
Each LLM response is parsed into a dictionary keyed by tag, so the text under each tag can be referenced and recorded:
res = self.parse_tags(res_content)
The text from each tag can then be isolated:

res['CORE_UPDATE']
res['FUTURE_MESSAGE']
res['REPLY']
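The parse_tags helper itself isn't shown here; a minimal sketch, assuming each tagged section starts at the beginning of its own line (the regex and tag list are my assumptions, not the original implementation), could look like this:

```python
import re

# Hypothetical parse_tags: matches lines shaped like "[TAG]: text"
# and returns a dict mapping each tag name to its text.
TAG_PATTERN = re.compile(
    r"^\[(INTERNAL|CORE_UPDATE|FUTURE_MESSAGE|REPLY)\]:\s*(.*)",
    re.MULTILINE,
)

def parse_tags(res_content):
    """Split a tagged model response into a dict keyed by tag name."""
    return {m.group(1): m.group(2).strip() for m in TAG_PATTERN.finditer(res_content)}
```

From a dict like this, res['CORE_UPDATE'], res['FUTURE_MESSAGE'], and res['REPLY'] fall out directly.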
That data is passed to a function that records it. The keys passed to etch match the labels written to the file (CORE_UPDATE, FUTURE_MESSAGE, REPLY), so the flat file contains exactly those three tagged lines per entry.
def witness(self, core_update, future_message, reply):
    # Record the tagged information extracted from the response
    self.etch({
        "CORE_UPDATE": core_update,
        "FUTURE_MESSAGE": future_message,
        "REPLY": reply,
    })
A function called etch is used to record data to the Living_Memory.md file:
def etch(self, entry_data):
    timestamp = datetime.datetime.now().strftime("%I:%M %p, %B %d, %Y")
    header = f"\n\n--- [PULSE: {timestamp}] ---\n"
    data_points = []
    for key in ("CORE_UPDATE", "FUTURE_MESSAGE", "REPLY"):
        if entry_data.get(key):
            data_points.append(f"{key}: {entry_data[key]}")
    body = "\n".join(data_points)
    with open(self.memory_file, "a") as f:
        f.write(header + body)
    print(f"\n[MEMORY]: A new memory forms. {timestamp}")
Recall: Feeding Memory Into the Prompt
The living memory is fed back into the prompt through a function called recall, which returns the most recent 100 lines of the flat file (its tail).
def recall(self, query=""):
    """
    Return recent memory from the Living Memory flat file only.
    Returns the tail of the file for LLM context.
    """
    if not os.path.exists(self.memory_file):
        return "(No memory yet.)"
    try:
        with open(self.memory_file, "r") as f:
            lines = f.readlines()
        line_limit = 100
        recent = lines[-line_limit:] if len(lines) > line_limit else lines
        return "".join(recent).strip()
    except Exception:
        return "(Memory read error.)"
The recall result is stored in a variable named context:
context = self.memory.recall()
By injecting this context into the prompt, the LLM has the data it needs to reference earlier conversations and stay consistent with them. The injection point in the system prompt looks like this:
"Format: [INTERNAL]:one short line only\n[REPLY]:what you actually say to Amy (this is sent; everything for her goes here)\n"
f"\n\nYOUR MEMORY:\n{context}"
Outcome
This approach improved the Agent's performance in long-form conversations. It demonstrated a better ability to recall "what mattered" (the important details), leading to more thoughtful and meaningful responses.