Docs: Design document for semantic memory #377
base: main
Conversation
Pull Request Overview
This PR introduces a design document for refactoring profile memory into a generalized semantic memory subsystem. The system uses LLM-based fact extraction with specialized prompts to build, update, and search structured knowledge across various domains.
Key Changes:
- Design document outlining architecture for semantic memory extraction and storage
- Definition of data models, ingestion flows, and consolidation strategies
- Documentation of background processing and caching mechanisms
| invoke the `LanguageModel` to produce a list of `SemanticCommand`s that
| are applied to the feature set.
The search flow description appears incomplete or incorrect. Lines 88-89 describe invoking the LanguageModel to produce SemanticCommands applied to the feature set, but in a search operation, the system should be querying the vector database with the embedding, not modifying features. This should describe the semantic search process using the embedding to find relevant features.
Suggested change:
- invoke the `LanguageModel` to produce a list of `SemanticCommand`s that
- are applied to the feature set.
+ use this embedding to query the vector database for similar feature embeddings,
+ retrieving the most relevant features as search results.
I kind of agree with Copilot here. I don't fully understand why we apply SemanticCommands to the feature set after turning the query into an embedding, unless maybe one of the SemanticCommands is some kind of wrapper for search?
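The embedding-based search flow the suggestion describes could be sketched roughly like this. Everything here is illustrative: `search_features` is an invented name, and the in-memory list of `(feature, value, embedding)` tuples stands in for a real vector-database query.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search_features(query_embedding, feature_entries, top_k=2):
    """Rank stored feature entries by similarity to the query embedding.

    `feature_entries` is a list of (feature, value, embedding) tuples;
    this loop stands in for the vector-database lookup in the design doc.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), feature, value)
        for feature, value, emb in feature_entries
    ]
    scored.sort(reverse=True)
    return scored[:top_k]

# Toy store with hand-made 2-d embeddings (real ones would be high-dimensional).
entries = [
    ("favorite_food", "apples", [1.0, 0.0]),
    ("liked_food", "persimmons", [0.6, 0.8]),
    ("home_city", "Austin", [0.0, 1.0]),
]
results = search_features([0.9, 0.1], entries)
```

Under this reading, the query is embedded once and the feature set is only read, never modified, which matches the suggested wording.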
Is this design document something we are implementing toward, or something that has already been implemented? I ask because, if it has been implemented, I need to open a separate Activity to update the Architectural design documentation on docs.memmachine.ai.

It is something we are working towards over the next weeks.
| To update a feature set, the feature set and the prompt associated with
| a `SemanticMemoryType` are sent to the `LanguageModel`. The LLM
| produces a list of `SemanticCommand`s that are applied to the feature
| set.
It would be useful to provide a short example of what a "feature set" can be. I referenced some output that I got after performing some memory additions & searches in main, and I'm assuming that it means the list of features that come after "profile memory":
"profile_memory":[{"tag":"Hobbies & Interests","feature":"favorite_food","value":"apples","metadata":{"id":1,"similarity_score":0.3549225330352783}},{"tag":"Hobbies & Interests","feature":"liked_food","value":"persimmons","metadata":{"id":3,"similarity_score":0.2976627051830292}}]
| is to translate the application specific `SessionData` into a `set_id`
| that will be used by the `SemanticManager`.
| This includes the `persona_id` and the `session_id`, for both the profile based
| semantic memory and the session based semantic memory.
I may be missing something. Is persona_id something new to be introduced in the refactor or something already in the code? On a first reading it seems maybe like a combination of the user_id and group_id that currently make up our session data.
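One possible reading of the quoted lines, treating `persona_id` and `session_id` as the inputs: profile-based memory keys on the persona, session-based memory keys on the session, and both together identify a per-session, per-persona set. The `persona:`/`session:` prefixes and the function name below are invented for illustration, not from the doc:

```python
def make_set_id(persona_id=None, session_id=None):
    """Derive a set_id from SessionData fields (hypothetical scheme).

    - persona only  -> profile-based semantic memory set
    - session only  -> session-based semantic memory set
    - both          -> a set scoped to this persona within this session
    """
    parts = []
    if persona_id is not None:
        parts.append(f"persona:{persona_id}")
    if session_id is not None:
        parts.append(f"session:{session_id}")
    if not parts:
        raise ValueError("SessionData must carry a persona_id or a session_id")
    return "/".join(parts)

profile_set = make_set_id(persona_id="alice")
session_set = make_set_id(persona_id="alice", session_id="s1")
```

If `persona_id` is indeed a rename or merge of today's `user_id`/`group_id`, the translation layer described in the quote would be where that mapping lives.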
I read through this to get a handle on #371. Overall I think the description gives me a good high-level overview of what the semantic memory is supposed to accomplish. I just have some questions about the details therein.
This is useful and, I think, OK to merge. The "user experience" part at the top makes clear how I can interface with the refactored code, and then the rest follows...
Now, I understand "feature sets" not to be characteristics you glean from the data (e.g. "hobbies and interests" or "food preferences") but more like a set of identifiers that the message is associated with (group X, organization Y, etc.)
| metadata is folded into the text for better grounding.
| 2. **Dirty-Set Tracking** – `SemanticUpdateTrackerManager` counts
| messages and elapsed time per set to decide when updates should run.
| 3. **Background Loop** – `_background_ingestion_task` wakes on a fixed
I feel that with background processing it would be very hard to let the caller know when an error happens. I think it is more straightforward to let the caller decide when to update/ingest, and the function/API call should let the caller know whether the result was a success or a failure.
If running in the background, a failure could cause the data/message to be gone forever.

Running in the background makes the user experience better in some ways. It works as long as we figure out a good way to handle errors.
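The dirty-set thresholding described in the quoted steps could look roughly like the toy tracker below. The class name is shortened and the thresholds are made up; the doc only says the tracker counts messages and elapsed time per set:

```python
import time

class SemanticUpdateTracker:
    """Toy dirty-set tracker: a set is due for an update once it has
    accumulated enough messages or enough time has passed since its
    first pending message (both thresholds are illustrative)."""

    def __init__(self, max_messages=5, max_age_seconds=60.0):
        self.max_messages = max_messages
        self.max_age_seconds = max_age_seconds
        self._counts = {}      # set_id -> pending message count
        self._first_seen = {}  # set_id -> time of first pending message

    def record_message(self, set_id, now=None):
        now = time.monotonic() if now is None else now
        self._counts[set_id] = self._counts.get(set_id, 0) + 1
        self._first_seen.setdefault(set_id, now)

    def due_sets(self, now=None):
        now = time.monotonic() if now is None else now
        return [
            set_id for set_id, count in self._counts.items()
            if count >= self.max_messages
            or now - self._first_seen[set_id] >= self.max_age_seconds
        ]

    def mark_updated(self, set_id):
        """Clear a set's dirty state after the background loop processes it."""
        self._counts.pop(set_id, None)
        self._first_seen.pop(set_id, None)
```

A background loop would then just poll `due_sets()` on its fixed wake-up interval and run the LLM update for each returned set.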
Yes. We need an API to report the status of those background tasks.
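A minimal sketch of what such a status API could record, per set. The names are invented, and the ingestion runs synchronously here for clarity; a real implementation would wrap asyncio tasks and expose `status()` through an endpoint so failed messages are not lost silently:

```python
class IngestionStatusBoard:
    """Toy status registry for background ingestion work (hypothetical API)."""

    def __init__(self):
        self._status = {}  # set_id -> {"state": ..., "error": ...}

    def run_ingestion(self, set_id, ingest_fn):
        """Run one ingestion step and record its outcome for later queries."""
        self._status[set_id] = {"state": "running", "error": None}
        try:
            ingest_fn()
        except Exception as exc:
            # Keep the failure visible instead of dropping the message.
            self._status[set_id] = {"state": "failed", "error": str(exc)}
        else:
            self._status[set_id] = {"state": "succeeded", "error": None}

    def status(self, set_id):
        return self._status.get(set_id, {"state": "unknown", "error": None})
```

With something like this, the caller keeps the better UX of background processing but can still poll for failures and decide whether to re-submit.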
I think "feature set" should be defined.
| produces a list of `SemanticCommand`s that are applied to the feature
| set.

| Then when a search is being conducted with a query, the `SemanticProfileManager`
Suggested change:
- Then when a search is being conducted with a query, the `SemanticProfileManager`
+ Then when a search is being conducted with a query, the `SemanticManager`
| - **Feature Entry** – `(set_id, memory_type, feature, value, tag, embedding, metadata,
| citations)` tuples persisted by the storage backend. Metadata tracks
| provenance, timestamps, and arbitrary annotations.
| - **Set Config Entry** – `(set_id, llm_model_name, embedder_name, []semantic_memory_type_names)`
Go-like notation is confusing.
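One way to avoid the Go-like `[]name` notation in the doc would be to state the two entries as typed records, e.g. Python dataclasses. The field types below are assumptions inferred from the surrounding text, not from the design doc itself:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureEntry:
    """One persisted fact; field types here are guesses, not from the doc."""
    set_id: str
    memory_type: str
    feature: str
    value: str
    tag: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)     # provenance, timestamps, ...
    citations: list[str] = field(default_factory=list)

@dataclass
class SetConfigEntry:
    """Per-set configuration; `[]semantic_memory_type_names` becomes a typed list."""
    set_id: str
    llm_model_name: str
    embedder_name: str
    semantic_memory_type_names: list[str] = field(default_factory=list)

entry = FeatureEntry(
    set_id="persona:alice",       # hypothetical set_id scheme
    memory_type="profile",
    feature="favorite_food",
    value="apples",
    tag="Hobbies & Interests",
    embedding=[0.1, 0.2],
)
```

Spelling the entries out this way also makes defaults explicit (empty metadata, no citations) instead of leaving them implicit in a tuple.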
Description
We are looking to refactor profile memory into a general knowledge and fact extractor, where, given a set of specialized prompts, each prompt can extract a set of facts.
This can then be used both for extracting facts about a user persona and for extracting facts on other topics.
Type of change