Graph Modeling Mastery
When it comes to graph data modeling, there are no hard and fast rules. Instead, there are plenty of experience-based guidelines, some common pitfalls, and a healthy respect for performance.
In our GraphGeeks Talk with Max De Marzi, we unpack what makes a graph model solid, what tends to break things, and how to design with both your data and your queries in mind. It isn’t a dry list of dos and don’ts. Think of it more like advice from a graph architect who’s seen one too many models spiral out of control.
What Makes a Good Model?
Let’s start with the basics. A good graph model reflects the structure of your domain and the questions you want to ask. It’s not enough to mirror the data as it exists in some spreadsheet or JSON blob. If you don’t consider the kinds of queries your application needs to support, you’re likely to end up with a beautiful but unusable model, or worse, one that becomes slower and messier over time.
Relationship Goals
One of the first things Max emphasizes is relationship types. It’s tempting, especially early on, to just use a single generic relationship like RELATED_TO for everything. It’s easy, flexible, and seems harmless—until you realize you’ve created a black box. If your relationships don’t tell you what’s on the other end, your queries will have to do all the heavy lifting. That leads to performance hits and overly complex logic.
Relationships should carry meaning: FRIENDS_WITH, WORKS_AT, MENTORED_BY, and so on. If you find yourself adding a comment to explain what a relationship is for, stop. That explanation should be in the relationship type itself.
On a related note (pun slightly intended), don’t overload relationships. If you're using one relationship type to connect a dozen different node types, your queries are going to spend most of their time figuring out what not to return. Relationships should be lean, purposeful, and predictable.
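To make the point concrete, here is a minimal Python sketch of a toy in-memory graph that groups edges by relationship type. The names (edges, relate, neighbors) and the sample data are hypothetical, and this is not how any particular graph database stores relationships. It only illustrates why a specific type lets a query skip everything it doesn't care about.

```python
from collections import defaultdict

# Hypothetical toy store: edges are grouped by (source node, relationship type).
edges = defaultdict(list)

def relate(source, rel_type, target):
    """Store a directed, typed relationship."""
    edges[(source, rel_type)].append(target)

def neighbors(source, rel_type):
    """Follow one relationship type only; unrelated edges are never touched."""
    return list(edges.get((source, rel_type), []))

relate("alice", "FRIENDS_WITH", "bob")
relate("alice", "WORKS_AT", "acme")
relate("alice", "MENTORED_BY", "carol")

# With a single generic RELATED_TO type, all three targets would share one
# list and the query would have to inspect each target to learn what it is.
employers = neighbors("alice", "WORKS_AT")
print(employers)
```

With typed relationships, the question "where does alice work?" reads one short list instead of filtering every connection she has.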
Direction Matters
When you’re modeling well-traveled paths in your graph, keeping the direction consistent can make a big difference. A bidirectional model sounds nice in theory, but in practice it often forces your queries to include logic for both directions, slowing them down unnecessarily.
Not every relationship is symmetric. A FOLLOWS relationship implies one-way interest (just ask your LinkedIn connections). If you’re reaching for symmetric relationships like FRIENDS_WITH, think carefully: Is this actually mutual?
Avoid creating both PARENT_OF and CHILD_OF unless the semantics differ. Most graph databases can easily reverse a traversal without needing a mirrored relationship.
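As a rough illustration, the Python sketch below stores each PARENT_OF relationship once and still answers questions in both directions. The two adjacency maps (outgoing, incoming) and the helper names are assumptions made for this example; real graph databases handle reverse traversal internally in their own ways.

```python
from collections import defaultdict

# Hypothetical toy store: one logical relationship, reachable from both ends.
outgoing = defaultdict(list)  # (node, rel_type) -> targets
incoming = defaultdict(list)  # (node, rel_type) -> sources

def relate(source, rel_type, target):
    """Record a single directed relationship in both adjacency maps."""
    outgoing[(source, rel_type)].append(target)
    incoming[(target, rel_type)].append(source)

def children_of(person):
    """Follow PARENT_OF in its stored direction."""
    return list(outgoing.get((person, "PARENT_OF"), []))

def parents_of(person):
    """Follow PARENT_OF backwards; no CHILD_OF relationship is needed."""
    return list(incoming.get((person, "PARENT_OF"), []))

relate("ann", "PARENT_OF", "ben")
relate("art", "PARENT_OF", "ben")

print(children_of("ann"))
print(parents_of("ben"))
```

There is only one relationship type to keep consistent, so a parent and child can never disagree about being related.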
One Label is Plenty
Your graph database might let you assign multiple labels to a node. That doesn’t mean you should. Labels are often used for partitioning, indexing, or semantic clarity. If a node needs three labels to explain what it is, it probably doesn’t know either.
Stick to a single, meaningful label per node. It makes querying easier, indexing simpler, and your data model more coherent. If you find yourself needing multiple labels, it might be time to revisit your taxonomy rather than piling on metadata.
Index Your Lookups
Any property you plan to search by—email, phone number, username, social security number—should be indexed. It sounds obvious, but it's surprisingly easy to forget until your queries start dragging. Without indexes, your queries are forced to scan every node with the matching label, and that gets expensive fast.
Think of indexing as your graph’s immune system: you only notice it when it’s not working.
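The difference is easy to see in a small Python sketch. The sample users and the function names below are hypothetical, and a plain dictionary stands in for whatever index structure your database actually uses.

```python
# Hypothetical sample data: three User nodes stored as plain dicts.
users = [
    {"label": "User", "username": "ada", "email": "ada@example.com"},
    {"label": "User", "username": "grace", "email": "grace@example.com"},
    {"label": "User", "username": "linus", "email": "linus@example.com"},
]

def find_by_scan(email):
    """No index: inspect every User node until one matches."""
    for user in users:
        if user["email"] == email:
            return user
    return None

# The stand-in index: built once, then each lookup is a single hash probe.
email_index = {user["email"]: user for user in users}

def find_by_index(email):
    """Indexed lookup: cost does not grow with the number of users."""
    return email_index.get(email)

print(find_by_index("grace@example.com")["username"])
```

Both functions return the same node. The scan gets slower with every user you add, while the indexed lookup stays roughly constant.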
Language vs. Logic
We also explore some of the more subtle modeling choices—like the difference between a relationship and a node. Language can trick you here. “Did you email someone?” sounds like a relationship. But in a graph, that interaction is often better represented as a node with properties like timestamp, subject, and participants.
Relationships are important, but if you find yourself cramming too much information into them, you might be missing an entity. And when something feels weird, when your data starts to twist into awkward patterns, it’s often because you’ve overlooked time. Time almost always deserves to be modeled explicitly.
When in doubt, ask: Is this a thing that exists independently? If yes, it probably deserves to be a node.
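Here is one way the email example might look as a sketch in Python. The Email class, its fields, and the sample messages are assumptions for illustration only. The point is that a single email has its own timestamp, subject, and several participants, which is hard to express as one relationship between two people.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Email:
    """An email modeled as its own entity rather than as a relationship."""
    subject: str
    sent_at: datetime
    sender: str
    recipients: tuple

# Hypothetical sample data.
emails = [
    Email("Q3 plan", datetime(2025, 5, 26, 9, 30), "alice", ("bob", "carol")),
    Email("Lunch?", datetime(2025, 5, 26, 12, 0), "bob", ("alice",)),
]

def received_by(person):
    """Subjects of every email that lists this person as a recipient."""
    return [e.subject for e in emails if person in e.recipients]

print(received_by("carol"))
```

Because the email exists independently, one message can reach two recipients without duplicating its subject or timestamp on two separate relationships.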
Build for Queries
One of the biggest modeling mistakes is designing around data structure rather than query patterns. A model might look elegant in theory but perform terribly when asked real questions.
Take a movie database, for instance. If you store actor roles as a property on the ACTED_IN relationship, it looks clean, until you try to answer, “Who played 007?” Then you’re stuck scanning every role property in your graph.
A better approach is to make characters like James Bond their own nodes. Connect them to both the movies and the actors who portrayed them. Now, that question becomes a simple traversal.
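A small Python sketch of that idea follows. The PLAYED and APPEARS_IN relationship names, the helper functions, and the handful of sample entries are assumptions chosen for this example, not a schema from the talk.

```python
from collections import defaultdict

# Hypothetical toy store: incoming edges grouped by (target, relationship type).
incoming = defaultdict(list)

def relate(source, rel_type, target):
    """Record a directed relationship so it can be found from its target."""
    incoming[(target, rel_type)].append(source)

# Characters are nodes, linked to the actors who played them and to movies.
relate("Sean Connery", "PLAYED", "James Bond")
relate("Daniel Craig", "PLAYED", "James Bond")
relate("Judi Dench", "PLAYED", "M")
relate("James Bond", "APPEARS_IN", "Dr. No")
relate("James Bond", "APPEARS_IN", "Skyfall")

def who_played(character):
    """Start at the character node and walk its PLAYED edges backwards."""
    return list(incoming.get((character, "PLAYED"), []))

print(who_played("James Bond"))
```

The query starts at one character node and touches only its own edges, so its cost depends on how many actors played Bond rather than on how many roles exist in the whole graph.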
Design for Growth
Many models work beautifully on day one, until data volume increases. Take the classic example of a social media graph: users post content, and you want to see what the people you follow posted today. If those posts are connected by POSTED relationships with a date property, querying becomes expensive, fast. Every day, that list of posts gets longer and filtering it becomes slower.
One neat trick is to turn time into a first-class concept, maybe even a relationship type. If your user POSTED_ON_2025_05_26, you can query just today’s posts without wading through everything from 2016.
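Sketched in Python, the dated relationship type might be generated and used like this. The helper names (posted_on_type, add_post, posts_on) and the sample posts are hypothetical. Only the POSTED_ON_2025_05_26 naming pattern comes from the example above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy store: posts grouped by (user, dated relationship type).
posts_by_type = defaultdict(list)

def posted_on_type(day):
    """Build a dated relationship type such as POSTED_ON_2025_05_26."""
    return f"POSTED_ON_{day:%Y_%m_%d}"

def add_post(user, day, post_id):
    """Attach a post to its author under that day's relationship type."""
    posts_by_type[(user, posted_on_type(day))].append(post_id)

def posts_on(user, day):
    """Read one day's posts directly; older posts are never visited."""
    return list(posts_by_type.get((user, posted_on_type(day)), []))

add_post("alice", date(2016, 3, 1), "post-1")
add_post("alice", date(2025, 5, 26), "post-2")
add_post("alice", date(2025, 5, 26), "post-3")

print(posted_on_type(date(2025, 5, 26)))
print(posts_on("alice", date(2025, 5, 26)))
```

The trade-off is a growing number of relationship types, so it is worth checking how your database handles that before adopting the pattern.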
Break the Rules, but Only on Purpose
There’s no “right” way to model a graph, but there are definitely wrong ways. Most of the principles above exist because they tend to work well in practice. But graph modeling isn’t about following a checklist. It’s about making thoughtful, informed decisions that suit your domain, your queries, and the capabilities of your chosen database.
Sometimes that means sticking to best practices. Sometimes it means breaking them—because you understand why they exist.
Join Us!
The good news? You don’t have to figure it all out on your own.
Whether you're wrangling your first dataset or wrestling with your fiftieth redesign, the GraphGeeks community is where you’ll find others just as passionate (and just as picky) about graph modeling. We talk patterns, anti-patterns, performance tricks, and yes—how to avoid being haunted by poorly named relationship types.
So come join us in our Discord server. Bring your questions, your weird edge cases, and your modeling wins. We’re graph people. We get it.