Concept dependence is best understood using an example.
{
  "mrn": "001",
  "first_name": "John",
  "last_name": "Doe",
  "address": {
    "street": "319 Doe Lane",
    "city": "Philadelphia",
    "state": "PA",
    "zipcode": "19014"
  },
  "problem_list": [
    {
      "noted_date": "2016-10-11",
      "resolved_date": null,
      "icd9": {
        "code": "487.1",
        "description": "Influenza with other respiratory manifestations"
      }
    },
    ...
  ]
}
The description of the above looks as follows:
model:
  name: patient
  topic: patient
  concepts:
    - patient
    - address
    - problem_list
    - icd9
  relations:
    - /address
    - /problem_list
    - /problem_list/icd9
relations:
  - name: /address
    source: patient
    target: address
    path: /address
    multiple: false
  - name: /problem_list
    source: patient
    target: problem_list
    path: /problem_list
    multiple: true
  - name: /problem_list/icd9
    source: problem_list
    target: icd9
    path: /icd9
    multiple: false
concepts:
  - name: patient
    path: /
    fields:
      - name: mrn
        path: /mrn
        format: string
      - name: first_name
        path: /first_name
        format: string
      - name: last_name
        path: /last_name
        format: string
  - name: address
    path: /address
    fields:
      - name: street
        path: /street
        format: string
      - name: city
        path: /city
        format: string
      - name: state
        path: /state
        format: string
      - name: zipcode
        path: /zipcode
        format: string
  - name: problem_list
    path: /problem_list
    fields:
      - name: noted_date
        path: /noted_date
        format: date
      - name: resolved_date
        path: /resolved_date
        format: date
  - name: icd9
    path: /problem_list/icd9
    fields:
      - name: code
        path: /code
        format: string
      - name: description
        path: /description
        format: string
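The way relation paths compose with concept paths can be checked with a small Go sketch. It assumes, as the example suggests but does not state, that a relation's path is relative to the path of its source concept, so that /icd9 under problem_list resolves to /problem_list/icd9. The type and function names (Concept, Relation, joinPath, targetPath) are hypothetical and not taken from the project.

```go
package main

import (
	"fmt"
	"strings"
)

// Concept and Relation mirror the keys used in the description above.
// The Go names are hypothetical.
type Concept struct {
	Name string
	Path string
}

type Relation struct {
	Name     string
	Source   string
	Target   string
	Path     string
	Multiple bool
}

// joinPath appends a relation path to its source concept's path,
// treating "/" as the document root.
func joinPath(base, rel string) string {
	return strings.TrimSuffix(base, "/") + rel
}

// targetPath resolves the absolute path a relation points to.
// It returns false if the relation's source concept is unknown.
func targetPath(concepts map[string]Concept, r Relation) (string, bool) {
	src, ok := concepts[r.Source]
	if !ok {
		return "", false
	}
	return joinPath(src.Path, r.Path), true
}

func main() {
	concepts := map[string]Concept{
		"patient":      {Name: "patient", Path: "/"},
		"address":      {Name: "address", Path: "/address"},
		"problem_list": {Name: "problem_list", Path: "/problem_list"},
		"icd9":         {Name: "icd9", Path: "/problem_list/icd9"},
	}
	relations := []Relation{
		{Name: "/address", Source: "patient", Target: "address", Path: "/address"},
		{Name: "/problem_list", Source: "patient", Target: "problem_list", Path: "/problem_list", Multiple: true},
		{Name: "/problem_list/icd9", Source: "problem_list", Target: "icd9", Path: "/icd9"},
	}
	for _, r := range relations {
		abs, _ := targetPath(concepts, r)
		// Under the stated assumption, abs should equal the target concept's path.
		fmt.Printf("%s -> %s (matches target: %t)\n", r.Name, abs, abs == concepts[r.Target].Path)
	}
}
```

Under that assumption, each relation in the example resolves to exactly the path declared on its target concept.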
Although address is defined in the scope of patient, this relation is merely a structural one. The concept of address is not inherently dependent on patient since even if the patient moves, the address will remain. This means that the address can be independently referred to and does not require knowledge about the patient.
The problem_list relation, on the other hand, is dependent on patient. Why? Because a problem is about the patient. It describes the state of a patient as of some date captured through the noted_date and resolved_date fields. Furthermore, unlike with address, if the patient dies, their problem list goes with them.
Another way of looking at it: if you were given a problem from the list without the patient it applies to, would that data be meaningful?
The final relation to consider is between problem_list and icd9. Like address, an icd9 description can continue to exist and be referenced by other things.
One question you may ask is whether there is a dependence in the other direction, from the source concept to the target concept, e.g. patient to address. In general the answer is no, unless the target concept is, or contributes to, the identity of the source concept. The address does not contribute to the identity of a patient, given there is an mrn field. As a thought experiment, what if there were no mrn field?
In the case of problem_list, the value of icd9 is part of the identity of an item in the list. Why? Because there is a good chance that other problems with the same noted_date and/or resolved_date are in that list. Likewise, the same problem can recur over the lifetime of a patient. Without the dates, these would simply look like a series of repeated values, with no information about when they occurred.
Now that we have discussed this at length, we can update the relation descriptions with this information. This is done using the dependent flag. As above, it is defined in the direction of target to source. Thus the only relation with a dependence is problem_list to patient.
# ...
relations:
  - name: /address
    source: patient
    target: address
    path: /address
    multiple: false
    dependent: false
  - name: /problem_list
    source: patient
    target: problem_list
    path: /problem_list
    multiple: true
    dependent: true
  - name: /problem_list/icd9
    source: problem_list
    target: icd9
    path: /icd9
    multiple: false
    dependent: false
# ...
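To make the direction of the dependent flag concrete, here is a minimal Go sketch that collects, for each target concept, the source concepts it depends on. The Relation struct and the dependsOn function are hypothetical names chosen for illustration; the relation values are copied from the description above.

```go
package main

import "fmt"

// Relation carries only the keys needed here. The Go names are hypothetical.
type Relation struct {
	Name, Source, Target string
	Dependent            bool
}

// dependsOn maps each target concept to the source concepts it depends on.
// Dependence runs from target to source, so the target is the map key.
func dependsOn(relations []Relation) map[string][]string {
	deps := make(map[string][]string)
	for _, r := range relations {
		if r.Dependent {
			deps[r.Target] = append(deps[r.Target], r.Source)
		}
	}
	return deps
}

func main() {
	relations := []Relation{
		{Name: "/address", Source: "patient", Target: "address", Dependent: false},
		{Name: "/problem_list", Source: "patient", Target: "problem_list", Dependent: true},
		{Name: "/problem_list/icd9", Source: "problem_list", Target: "icd9", Dependent: false},
	}
	fmt.Println(dependsOn(relations)) // prints map[problem_list:[patient]]
}
```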
Another annotation to add is the identity of concepts. This is done by specifying the local fields that contribute to the identity of the concept. The field definitions have been omitted for brevity.
concepts:
  - name: patient
    path: /
    identity:
      - mrn
  - name: address
    path: /address
    identity: null
  - name: problem_list
    path: /problem_list
    identity:
      - noted_date
  - name: icd9
    path: /problem_list/icd9
    identity:
      - code
Although not required, the identity of the address concept was explicitly set to null as a form of documentation. If the identity is not explicitly defined, then all fields contribute to the identity.
See example.yaml for the final description.
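The fallback rule for identity can be sketched in a few lines of Go. This assumes that a null identity and an omitted identity are treated the same way, which is how the text above reads. The Concept struct and identityFields function are hypothetical names, not part of the project.

```go
package main

import "fmt"

// Concept holds field names and the optional identity list.
// Identity is nil when the description says null or omits the key.
type Concept struct {
	Name     string
	Fields   []string
	Identity []string
}

// identityFields returns the fields that contribute to a concept's identity.
// With no explicit identity, all fields contribute.
func identityFields(c Concept) []string {
	if len(c.Identity) > 0 {
		return c.Identity
	}
	return c.Fields
}

func main() {
	patient := Concept{
		Name:     "patient",
		Fields:   []string{"mrn", "first_name", "last_name"},
		Identity: []string{"mrn"},
	}
	address := Concept{
		Name:   "address",
		Fields: []string{"street", "city", "state", "zipcode"},
		// Identity left nil, matching "identity: null" above.
	}
	fmt.Println(identityFields(patient)) // prints [mrn]
	fmt.Println(identityFields(address)) // prints [street city state zipcode]
}
```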
Using this information we can auto-generate SQL following a simple set of rules:
- one table per concept
- unique primary key for each table
- foreign keys to encode dependence on other concepts
- foreign key columns cannot be null
- unique index for the identity
The Go program tosql.go takes the example.yaml and generates the output shown in example.sql.