In the modern enterprise, decisions, action items, and participant details are routinely captured in meeting notes. However, these documents often remain isolated, accessible only through basic text searches, making it difficult to extract interconnected insights. Imagine a system capable of answering complex questions such as: 'Which individuals attended discussions on budget planning?', 'What specific tasks were assigned to Sarah across various sessions?', or 'Outline all engineering team decisions made in the last quarter.' This level of relational querying, impractical with traditional document storage, is precisely where knowledge graphs demonstrate their significant value.
One practical pipeline, built with CocoIndex, automates the transformation of raw Markdown meeting notes into a knowledge graph. It updates incrementally, so existing data does not need to be fully reprocessed. The pipeline reads notes from sources such as Google Drive, uses LLMs to extract structured entities such as meeting details, attendees, and assigned tasks, and then persists this information into a Neo4j graph database. Crucially, work is triggered only when source documents change.
The Incremental Architecture Explained
The pipeline operates on a clear, staged data flow designed for efficiency. It begins by monitoring Google Drive documents for modifications. Upon detecting changes, only the altered documents are processed. These are then segmented into individual meeting records, from which LLMs extract structured data. This extracted information, comprising nodes and relationships, is subsequently exported to Neo4j using an upsert mechanism, ensuring the graph is always current without redundant writes.
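The segmentation step can be sketched in a few lines of plain Python. This is an illustration only, not the pipeline's actual code: it assumes each meeting in a notes file begins at a top-level `# ` heading, and the function name `split_meetings` is hypothetical.

```python
import re


def split_meetings(markdown_text: str) -> list[str]:
    """Split one notes file into one chunk per meeting.

    Assumption (not from the source): every meeting starts at a
    top-level '# ' heading; deeper headings stay inside their meeting.
    """
    # Zero-width split at the start of every line that begins with '# '.
    chunks = re.split(r"(?m)^(?=# )", markdown_text)
    # Drop empty fragments (for example, the one before the first heading).
    return [chunk.strip() for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    notes = "# Budget sync\n- Alice\n## Tasks\n- Draft budget\n# Eng review\n- Bob\n"
    for chunk in split_meetings(notes):
        print(chunk.splitlines()[0])
```

Splitting before extraction keeps each LLM call scoped to a single meeting, which makes the extracted attendees and tasks easier to attribute to the right record.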
A core strength of this architecture is incremental processing. The framework identifies only newly added or modified files and skips unchanged documents entirely. Because only a small fraction of an organization's notes typically changes between runs, this selective processing substantially reduces compute and LLM API costs.
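One common way to decide which files need work is to compare a content fingerprint against the one recorded on the previous run. The source does not say how CocoIndex detects changes, so the sketch below is an assumption: `fingerprint` and `plan_work` are hypothetical helpers, and SHA-256 hashing is just one reasonable choice.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Return a stable hash of a document's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_work(files: dict[str, str], seen: dict[str, str]) -> tuple[list[str], list[str]]:
    """Decide which files to process and which to drop.

    files: current file name -> content
    seen:  file name -> fingerprint recorded on the previous run
    Returns (to_process, to_delete), both sorted for determinism.
    """
    # New files have no recorded fingerprint; modified files have a different one.
    to_process = sorted(
        name for name, content in files.items()
        if seen.get(name) != fingerprint(content)
    )
    # Files that disappeared from the source should be removed downstream.
    to_delete = sorted(name for name in seen if name not in files)
    return to_process, to_delete


if __name__ == "__main__":
    seen = {"a.md": fingerprint("old"), "b.md": fingerprint("same")}
    files = {"a.md": "new", "b.md": "same", "c.md": "brand new"}
    print(plan_work(files, seen))
```

Only the names returned in `to_process` would reach the LLM extraction step, which is where the cost savings come from.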
LLMs and Graph Databases: A Powerful Synergy
The extraction phase relies on large language models. Giving the LLM a predefined schema for entities like `Person`, `Task`, and `Meeting` constrains its output to a predictable, structured shape. The structured data is then collected into distinct categories (meeting nodes, attendance relationships, task decisions, and task assignments), which act as in-memory buffers before final persistence.
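A schema of this kind might look like the dataclasses below. The source names only the entity types, so every field (`title`, `date`, `attendees`, `description`, `assigned_to`) is a hypothetical choice, as is the `parse_meeting` helper that turns an LLM's JSON reply into typed objects.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str


@dataclass
class Task:
    description: str
    assigned_to: list[Person] = field(default_factory=list)


@dataclass
class Meeting:
    title: str
    date: str
    attendees: list[Person]
    tasks: list[Task]


def parse_meeting(raw: str) -> Meeting:
    """Parse a JSON reply into the schema; raises KeyError on missing fields."""
    data = json.loads(raw)
    return Meeting(
        title=data["title"],
        date=data["date"],
        attendees=[Person(name=p["name"]) for p in data["attendees"]],
        tasks=[
            Task(
                description=t["description"],
                # A task may legitimately have no assignee yet.
                assigned_to=[Person(name=p["name"]) for p in t.get("assigned_to", [])],
            )
            for t in data["tasks"]
        ],
    )
```

Failing loudly on a missing field is a deliberate choice here: a malformed reply is caught before anything is written to the graph.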
These collected data points are then mapped directly to a property graph within Neo4j. The system defines `Meeting`, `Person`, and `Task` as distinct nodes. Relationships are then established, such as `ATTENDED` (linking individuals to meetings), `DECIDED` (connecting meetings to associated tasks), and `ASSIGNED_TO` (mapping individuals to their assigned tasks). This meticulous mapping, coupled with consistent key management, prevents data duplication and maintains graph integrity across updates.
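To make the mapping concrete, here is a minimal sketch that turns one extracted meeting into keyed nodes and typed relationships. It is not the pipeline's code: the input shape, the `to_graph` name, and the key choices (meetings keyed by title plus date, people by name, tasks by description) are all assumptions for illustration.

```python
def to_graph(meeting: dict) -> tuple[dict, set]:
    """Map one extracted meeting to (nodes, relationships).

    nodes: (label, key) -> properties
    relationships: set of (type, source_node_key, target_node_key)
    """
    nodes: dict[tuple[str, str], dict] = {}
    rels: set[tuple[str, tuple[str, str], tuple[str, str]]] = set()

    meeting_key = ("Meeting", f"{meeting['title']}|{meeting['date']}")
    nodes[meeting_key] = {"title": meeting["title"], "date": meeting["date"]}

    for name in meeting["attendees"]:
        person_key = ("Person", name)
        nodes[person_key] = {"name": name}
        rels.add(("ATTENDED", person_key, meeting_key))

    for task in meeting["tasks"]:
        task_key = ("Task", task["description"])
        nodes[task_key] = {"description": task["description"]}
        rels.add(("DECIDED", meeting_key, task_key))
        for name in task["assigned_to"]:
            # Same key as an attendee with this name, so no duplicate node.
            person_key = ("Person", name)
            nodes[person_key] = {"name": name}
            rels.add(("ASSIGNED_TO", person_key, task_key))

    return nodes, rels
```

Because keys are derived deterministically from the data, processing the same meeting twice yields the same nodes and relationships, which is what lets an upsert overwrite rather than duplicate.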
Significantly, the export process to Neo4j also operates incrementally. CocoIndex updates only those nodes or relationships that have actually changed, minimizing database churn and optimizing write operations on the target database.
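Conceptually, an incremental export compares the rows it wants in the target with the rows it wrote last time and touches only the difference. The sketch below illustrates that idea under stated assumptions; it is not CocoIndex's internal mechanism, `diff_export` is a hypothetical name, and handling of removed rows is my addition rather than something the source describes.

```python
def diff_export(previous: dict, desired: dict) -> tuple[dict, list]:
    """Compute the minimal set of writes to move the target to `desired`.

    Both arguments map a node or relationship key to its properties.
    Returns (upserts, deletes).
    """
    # Write a row only if it is new or its properties differ.
    upserts = {
        key: props for key, props in desired.items()
        if key not in previous or previous[key] != props
    }
    # Rows no longer produced by the source are removed (an assumption).
    deletes = sorted(key for key in previous if key not in desired)
    return upserts, deletes


if __name__ == "__main__":
    previous = {("Person", "Sarah"): {"name": "Sarah"}}
    desired = {("Person", "Sarah"): {"name": "Sarah"}, ("Person", "Mei"): {"name": "Mei"}}
    print(diff_export(previous, desired))
```

Unchanged rows never appear in either result, so they generate no database traffic at all.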
Transformative Enterprise Applications
Once implemented, the Neo4j database becomes a rich, queryable resource. Analysts can execute complex Cypher queries to explore interconnections across meetings, people, and tasks. For example, one can quickly identify all attendees of a specific meeting, trace tasks back to their originating discussions, or view all assignments for a particular individual.
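The three example questions could be written as parameterized Cypher along the following lines, shown here as Python constants. The node labels and relationship types come from the article; the property names (`title`, `name`, `description`) and the relationship directions are assumptions about how the graph was modeled, and `required_params` is a small hypothetical helper for checking which parameters a query expects.

```python
import re

# Parameterized Cypher; property names are assumed, not taken from the source.
QUERIES = {
    "attendees_of_meeting": (
        "MATCH (p:Person)-[:ATTENDED]->(m:Meeting {title: $title}) "
        "RETURN p.name AS attendee"
    ),
    "tasks_with_origin": (
        "MATCH (m:Meeting)-[:DECIDED]->(t:Task) "
        "RETURN m.title AS meeting, t.description AS task"
    ),
    "assignments_for_person": (
        "MATCH (p:Person {name: $name})-[:ASSIGNED_TO]->(t:Task) "
        "RETURN t.description AS task"
    ),
}


def required_params(query: str) -> set[str]:
    """Return the names of the $parameters used in a Cypher query."""
    return set(re.findall(r"\$(\w+)", query))


if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(name, sorted(required_params(query)))
```

With the official Neo4j Python driver, a query string like these would typically be passed to a session's `run` method along with its parameters; passing values as parameters rather than formatting them into the string avoids injection issues.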
This pattern extends well beyond meeting notes. The same approach could build knowledge graphs of concepts and citations from research papers, tracking updates across thousands of documents; extract issues, solutions, and customer relationships from support tickets; or summarize email threads to graph communication patterns. In each case, previously siloed information becomes interconnected and queryable.
This article is a rewritten summary based on publicly available reporting. For the original story, visit the source.
Source: Towards AI - Medium