Building a Graph Database Application


In this chapter, we discuss some of the practical issues of working with a graph database. In previous chapters, we’ve looked at graph data; in this chapter, we’ll apply that knowledge in the context of developing a graph database application. We’ll look at some of the data modeling questions that may arise, and at some of the application architecture choices available to us.
In our experience, graph database applications are highly amenable to being developed using the evolutionary, incremental, and iterative software development practices in widespread use today. A key feature of these practices is the prevalence of testing throughout the software development life cycle. Here we’ll show how we develop our data model and our application in a test-driven fashion.
At the end of the chapter, we’ll look at some of the issues we’ll need to consider when planning for production.


Data Modeling

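In Chapter 3 we covered modeling and working with graph data in detail. Here we summarize some of the more important modeling guidelines, and discuss how implementing a graph data model fits with iterative and incremental software development techniques.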

Describe the Model in Terms of the Application’s Needs

The questions we need to ask of the data help identify entities and relationships. Agile user stories provide a concise means for expressing an outside-in, user-centered view of an application’s needs, and the questions that arise in the course of satisfying this need. Here’s an example of a user story for a book review web application:
AS A reader who likes a book, I WANT to know which books other readers who like the same book have liked, SO THAT I can find other books to read.
This story expresses a user need, which motivates the shape and content of our data model. From a data modeling point of view, the AS A clause establishes a context comprising two entities (a reader and a book) plus the LIKES relationship that connects them. The I WANT clause then poses a question: which books have the readers who like the book I’m currently reading also liked? This question exposes more LIKES relationships, and more entities: other readers and other books.
The entities and relationships that we’ve surfaced in analyzing the user story quickly translate into a simple data model, as shown in Figure 4-1.


[Figure 4-1. Data model for the book review user story]

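Because this data model directly encodes the question posed by the user story, it lends itself to being queried in a way that mirrors the structure of that question. For example, given that Alice likes Dune, we can find the books that other readers who like Dune have also liked.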
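As a sketch of what such a query might look like in Cypher, assuming nodes labeled Reader and Book with name and title properties (these label and property names are illustrative assumptions, not taken from the original figure):

```cypher
// Assumed model: (:Reader {name})-[:LIKES]->(:Book {title})
// Start from Alice and Dune, walk back to the other readers who like Dune,
// then forward to the other books those readers like.
MATCH (:Reader {name: 'Alice'})-[:LIKES]->(:Book {title: 'Dune'})
      <-[:LIKES]-(:Reader)-[:LIKES]->(other:Book)
RETURN DISTINCT other.title
```

The shape of the pattern follows the user story: the first half corresponds to the AS A context, the second half to the I WANT question. Because Cypher does not reuse a relationship within a single matched pattern, Alice's own LIKES relationship to Dune is not traversed a second time.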


Nodes for Things, Relationships for Structure

• Use nodes to represent entities—that is, the things in our domain that are of interest to us, and which can be labeled and grouped.
• Use relationships both to express the connections between entities and to establish semantic context for each entity, thereby structuring the domain.
• Use relationship direction to further clarify relationship semantics. Many relationships are asymmetrical, which is why relationships in a property graph are always directed. For bidirectional relationships, we should make our queries ignore direction, rather than using two relationships.
• Use node properties to represent entity attributes, plus any necessary entity metadata, such as timestamps, version numbers, etc.
• Use relationship properties to express the strength, weight, or quality of a relationship, plus any necessary relationship metadata, such as timestamps, version numbers, etc.
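The guidelines above can be sketched in Cypher roughly as follows. All labels, relationship names, and property names here (Reader, Book, LIKES, FRIEND, rating, addedAt) are hypothetical and chosen only for illustration:

```cypher
// Nodes represent entities; labels group them; properties hold attributes and metadata.
CREATE (alice:Reader {name: 'Alice', addedAt: '2024-01-01'})
CREATE (bob:Reader {name: 'Bob'})
CREATE (dune:Book {title: 'Dune'})
// A directed relationship; the property expresses its strength.
CREATE (alice)-[:LIKES {rating: 5}]->(dune)
// A symmetric connection is stored once, in an arbitrary direction...
CREATE (alice)-[:FRIEND]->(bob);

// ...and queried without a direction arrow, so it matches from either end.
MATCH (:Reader {name: 'Bob'})-[:FRIEND]-(friend:Reader)
RETURN friend.name;
```

Storing a single FRIEND relationship and ignoring direction in the query avoids having to keep a pair of mirrored relationships in sync.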
It pays to be diligent about discovering and capturing domain entities. As we saw in Chapter 3, it’s relatively easy to model things that really ought to be represented as nodes using carelessly named relationships instead. If we’re tempted to use a relationship to model an entity—an email, or a review, for example—we must make certain that this entity cannot be related to more than two other entities. Remember, a relationship must have a start node and an end node—nothing more, nothing less. If we find later that we need to connect something we’ve modeled as a relationship to more than two other entities, we’ll have to refactor the entity inside the relationship out into a separate node. This is a breaking change to the data model, and will likely require us to make changes to any queries and application code that produce or consume the data.

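To make the cost of that refactoring concrete, here is a minimal sketch that assumes a review was originally modeled as a REVIEWED relationship carrying text and rating properties. The names REVIEWED, WROTE, REVIEW_OF, and Review are hypothetical:

```cypher
// Assumed starting point: (:Reader)-[:REVIEWED {text, rating}]->(:Book)
MATCH (reader:Reader)-[old:REVIEWED]->(book:Book)
// Promote the review to a node of its own, copying the properties across.
CREATE (reader)-[:WROTE]->(review:Review {text: old.text, rating: old.rating})
CREATE (review)-[:REVIEW_OF]->(book)
// Remove the relationship that used to hold the review.
DELETE old
```

Once the review is a node, it can be connected to further entities, such as comments or moderators, but every query that previously matched REVIEWED has to be rewritten against the new structure.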

Fine-Grained versus Generic Relationships

When designing relationships we should be mindful of the trade-offs between using fine-grained relationship names versus generic relationships qualified with properties. It’s the difference between using DELIVERY_ADDRESS and HOME_ADDRESS versus ADDRESS {type:'delivery'} and ADDRESS {type:'home'}.
Relationships are the royal road into the graph. Differentiating by relationship name is the best way of eliminating large swathes of the graph from a traversal. Using one or more property values to decide whether or not to follow a relationship incurs extra I/O the first time those properties are accessed because the properties reside in a separate store file from the relationships (after that, however, they’re cached).

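The two styles lead to queries of the following shape. This is a sketch only; the User and Address labels and the name property are assumptions introduced for the example:

```cypher
// Fine-grained: the relationship name alone selects the delivery address,
// so relationships of other types are never considered.
MATCH (:User {name: 'Alice'})-[:DELIVERY_ADDRESS]->(a:Address)
RETURN a;

// Generic: every ADDRESS relationship is a candidate, and its type
// property must be read to decide whether to follow it.
MATCH (:User {name: 'Alice'})-[:ADDRESS {type: 'delivery'}]->(a:Address)
RETURN a;
```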

We use fine-grained relationships whenever we have a closed set of relationship names. Weightings—as required by a shortest-weighted-path algorithm—rarely comprise a closed set, and are usually best represented as properties on relationships.
Sometimes, however, we have a closed set of relationships, but in some traversals we want to follow specific kinds of relationships within that set, whereas in others we want to follow all of them, irrespective of type. Addresses are a good example. Following the closed-set principle, we might choose to create HOME_ADDRESS, WORK_ADDRESS, and DELIVERY_ADDRESS relationships. This allows us to follow specific kinds of address relationships (DELIVERY_ADDRESS, for example) while ignoring all the rest.
But what do we do if we want to find all addresses for a user? There are a couple of options here. First, we can encode knowledge of all the different relationship types in our queries: e.g., MATCH (user)-[:HOME_ADDRESS|WORK_ADDRESS|DELIVERY_ADDRESS]->(address). This, however, quickly becomes unwieldy when there are lots of different kinds of relationships. Alternatively, we can add a more generic ADDRESS relationship to our model, in addition to the fine-grained relationships. Every node representing an address is then connected to a user using two relationships: a fine-grained relationship (e.g., DELIVERY_ADDRESS) and the more generic ADDRESS {type:'delivery'} relationship.
As we discussed in “Describe the Model in Terms of the Application’s Needs”, the key here is to let the questions we want to ask of our data guide the kinds of relationships we introduce into the model.

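The second option, connecting each address with both a fine-grained and a generic relationship, might be sketched like this. The User and Address labels and the city property are hypothetical:

```cypher
// Connect one address to the user twice: once by a fine-grained name,
// once by the generic ADDRESS relationship qualified with a type property.
CREATE (user:User {name: 'Alice'})
CREATE (address:Address {city: 'London'})
CREATE (user)-[:DELIVERY_ADDRESS]->(address)
CREATE (user)-[:ADDRESS {type: 'delivery'}]->(address);

// All addresses for the user, without listing every fine-grained type.
MATCH (:User {name: 'Alice'})-[:ADDRESS]->(a:Address)
RETURN a;

// Only the delivery address, using the fine-grained relationship.
MATCH (:User {name: 'Alice'})-[:DELIVERY_ADDRESS]->(a:Address)
RETURN a;
```

The trade-off is that writes must keep the two relationships consistent, in exchange for simpler queries in both cases.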

Model Facts as Nodes

When two or more domain entities interact for a period of time, a fact emerges. We represent a fact as a separate node with connections to each of the entities engaged in that fact. Modeling an action in terms of its product—that is, in terms of the thing that results from the action—produces a similar structure: an intermediate node that represents the outcome of an interaction between two or more entities. We can use timestamp properties on this intermediate node to represent start and end times.
The following examples show how we might model facts and actions using intermediate nodes.


Employment

Figure 4-2 shows how the fact of Ian being employed by Neo Technology in the role of engineer can be represented in the graph.


[Figure 4-2. Ian’s employment at Neo Technology in the role of engineer, modeled with an intermediate node]
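A structure of this kind could be created in Cypher roughly as follows. The labels, relationship names, and the startDate value are illustrative assumptions, not a transcription of the figure:

```cypher
// The employment fact is a node of its own, connected to each
// entity that takes part in it.
CREATE (ian:Person {name: 'Ian'})
CREATE (neo:Company {name: 'Neo Technology'})
CREATE (engineer:Role {name: 'engineer'})
// startDate is a placeholder value; an endDate property could be added later.
CREATE (job:Employment {startDate: '2011-01-01'})
CREATE (ian)-[:EMPLOYMENT]->(job)
CREATE (job)-[:EMPLOYER]->(neo)
CREATE (job)-[:ROLE]->(engineer)
```

Because the fact is a node, it can hold timestamp properties and be linked to additional entities (a manager or an office, say) without changing the existing relationships.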

Figure 4-3 shows how the fact that William Hartnell played The Doctor in the story The Sensorites can be represented in the graph.

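[Figure 4-3. William Hartnell’s performance as The Doctor in The Sensorites, modeled with an intermediate node]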
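The same pattern applies to this example. In the sketch below, the Performance node and the relationship names PERFORMED_IN, PLAYED, and FOR are hypothetical choices made for illustration:

```cypher
// The performance is the intermediate node tying actor, role, and story together.
CREATE (hartnell:Actor {name: 'William Hartnell'})
CREATE (doctor:Role {name: 'The Doctor'})
CREATE (story:Story {title: 'The Sensorites'})
CREATE (performance:Performance)
CREATE (hartnell)-[:PERFORMED_IN]->(performance)
CREATE (performance)-[:PLAYED]->(doctor)
CREATE (performance)-[:FOR]->(story);

// Which stories did William Hartnell appear in, and in what role?
MATCH (:Actor {name: 'William Hartnell'})-[:PERFORMED_IN]->(p:Performance),
      (p)-[:PLAYED]->(role:Role),
      (p)-[:FOR]->(s:Story)
RETURN s.title, role.name;
```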
