Skip to content

Instantly share code, notes, and snippets.

@bruth
Last active October 12, 2016 12:45
Show Gist options
  • Save bruth/e83647183672188e7c6d84cd767481bc to your computer and use it in GitHub Desktop.
Save bruth/e83647183672188e7c6d84cd767481bc to your computer and use it in GitHub Desktop.
Concept dependence

Concept Dependence

Concept dependence is best understood using an example.

{
  "mrn": "001",
  "first_name": "John",
  "last_name": "Doe",
  "address": {
    "street": "319 Doe Lane",
    "city": "Philadelphia",
    "state": "PA",
    "zipcode": "19014"
  },
  "problem_list": [
    {
      "noted_date": "2016-10-11",
      "resolved_date": null,
      "icd9": {
        "code": "487.1",
        "description": "Influenza with other respiratory manifestations"
      }
    },
    ...
  ]
}

The description of the above looks as follows:

model:
  name: patient
  topic: patient

  concepts:
    - patient
    - address
    - problem_list
    - icd9

  relations:
    - /address
    - /problem_list
    - /problem_list/icd9

relations:
  - name: /address
    source: patient
    target: address
    path: /address
    multiple: false

  - name: /problem_list
    source: patient
    target: problem_list
    path: /problem_list
    multiple: true

  - name: /problem_list/icd9
    source: problem_list
    target: icd9
    path: /icd9
    multiple: false

concepts:
  - name: patient
    path: /
    fields:
      - name: mrn
        path: /mrn
        format: string

      - name: first_name
        path: /first_name
        format: string

      - name: last_name
        path: /last_name
        format: string

  - name: address
    path: /address
    fields:
      - name: street
        path: /street
        format: string

      - name: city
        path: /city
        format: string

      - name: state
        path: /state
        format: string

      - name: zipcode
        path: /zipcode
        format: string

  - name: problem_list
    path: /problem_list
    fields:
      - name: noted_date
        path: /noted_date
        format: date

      - name: resolved_date
        path: /resolved_date
        format: date

  - name: icd9
    path: /problem_list/icd9
    fields:
      - name: code
        path: /code
        format: string

      - name: description
        path: /description
        format: string

Although address is defined in the scope of patient, this relation is merely a structural one. The concept of address is not inherently dependent on patient since even if the patient moves, the address will remain. This means that the address can be independently referred to and does not require knowledge about the patient.

The problem_list relation, on the other hand, is dependent on patient. Why? Because a problem is about the patient. It describes the state of a patient as of some date captured through the noted_date and resolved_date fields. Furthermore, unlike with address, if the patient dies, their problem list goes with them.

Another way of looking at it is, if you were given a problem from the list without the patient it applies to, is this data meaningful?

The final relation to consider is between problem_list and icd9. Like address, an icd9 description can continue to exist and be referenced by other things.

One question you may ask is whether there is a dependence from the source concept to the target concept, i.e. patient to address. In general the answer is no, unless the target concept is or contributes to the identity of the source concept. It should be obvious that the address does not contribute to the identity of a patient given there is an mrn field. As a thought experiement, what if there was no mrn field?

In the case of problem_list, the value of icd9 is part of the identity of an item in the list. Why? Because there is a good chance other problems with the same noted_date and/or resolved_date could be in that list. Likewise, the same problems can reoccur over the lifetime of a patient. Without the dates, these will simply look like a series of repeated values, but without any information about when they occurred.

Now that we discussed this at length, we can update the relation descriptions with this information. This is done using the dependent flag. As with above, it is defined in the direction of target to source. Thus the only one with a dependence is problem_list to patient.

# ...

relations:
  - name: /address
    source: patient
    target: address
    path: /address
    multiple: false
    dependent: false

  - name: /problem_list
    source: patient
    target: problem_list
    path: /problem_list
    multiple: true
    dependent: true

  - name: /problem_list/icd9
    source: problem_list
    target: icd9
    path: /icd9
    multiple: false
    dependent: false

#...

Another annotation to add is the identity of concepts. This is done by specifying local fields that contribute to the identity of the concept.

The fields have been omitted for brevity.

concepts:
  - name: patient
    path: /
    identity:
      - mrn

  - name: address
    path: /address
    identity: null

  - name: problem_list
    path: /problem_list
    identity:
      - noted_date

  - name: icd9
    path: /problem_list/icd9
    identity:
      - code

Although not required, the identity of the address concept was explicitly set to null as a form of documentation. If the identity is not explicitly defined, then all fields contribute to the identity.

See example.yaml for the final description.

Code Generation

Using this information we can auto-generate SQL following a simple set of rules:

  • one table per concept
  • unique primary key for each table
  • foreign keys to encode dependence on other concepts
  • foreign key columns cannot be null
  • unique index for the identity

The Go program tosql.go takes the example.yaml and generates the output shown in example.sql.

create table "patient" (
"_id" integer primary key autoincrement,
"address_id" integer not null,
"mrn" string,
"first_name" string,
"last_name" string,
foreign key ("address_id") references "address"("_id")
);
create unique index "patient_identity" on "patient"("mrn", "address_id");
create table "address" (
"_id" integer primary key autoincrement,
"street" string,
"city" string,
"state" string,
"zipcode" string
);
create unique index "address_identity" on "address"("street", "city", "state", "zipcode");
create table "problem_list" (
"_id" integer primary key autoincrement,
"patient_id" integer not null,
"icd9_id" integer not null,
"noted_date" date,
"resolved_date" date,
foreign key ("patient_id") references "patient"("_id"),
foreign key ("icd9_id") references "icd9"("_id")
);
create unique index "problem_list_identity" on "problem_list"("noted_date", "patient_id", "icd9_id");
create table "icd9" (
"_id" integer primary key autoincrement,
"code" string,
"description" string
);
create unique index "icd9_identity" on "icd9"("code");
model:
name: patient
topic: patient
concepts:
- patient
- address
- problem_list
- icd9
relations:
- /address
- /problem_list
- /problem_list/icd9
relations:
- name: /address
source: patient
target: address
path: /address
multiple: false
dependent: false
- name: /problem_list
source: patient
target: problem_list
path: /problem_list
multiple: true
dependent: true
- name: /problem_list/icd9
source: problem_list
target: icd9
path: /icd9
multiple: false
dependent: false
concepts:
- name: patient
path: /
identity:
- mrn
fields:
- name: mrn
path: /mrn
format: string
- name: first_name
path: /first_name
format: string
- name: last_name
path: /last_name
format: string
- name: address
path: /address
identity: null
fields:
- name: street
path: /street
format: string
- name: city
path: /city
format: string
- name: state
path: /state
format: string
- name: zipcode
path: /zipcode
format: string
- name: problem_list
path: /problem_list
identity:
- noted_date
fields:
- name: noted_date
path: /noted_date
format: date
- name: resolved_date
path: /resolved_date
format: date
- name: icd9
path: /problem_list/icd9
identity:
- code
fields:
- name: code
path: /code
format: string
- name: description
path: /description
format: string
package main
import (
"fmt"
"io"
"io/ioutil"
"log"
"os"
"strings"
yaml "gopkg.in/yaml.v2"
)
type Description struct {
Model *Model
Relations []*Relation
Concepts []*Concept
}
func (d *Description) Concept(s string) *Concept {
for _, c := range d.Concepts {
if s == c.Name {
return c
}
}
return nil
}
type Model struct {
Name string
Topic string
Relations []string
Concepts []string
}
type Relation struct {
Name string
Path string
Target string
Source string
Multiple bool
Dependent bool
}
type Concept struct {
Name string
Path string
Fields []*Field
Identity []string
}
type Field struct {
Name string
Path string
Format string
}
type Table struct {
Name string
Columns []*Column
Unique []string
}
type Column struct {
Name string
Type string
PrimaryKey bool
Nullable bool
ForeignTable string
ForeignColumn string
}
func main() {
b, err := ioutil.ReadAll(os.Stdin)
if err != nil {
log.Fatal(err)
}
var d Description
if err := yaml.Unmarshal(b, &d); err != nil {
log.Fatal(err)
}
// Concepts map to tables.
for _, n := range d.Model.Concepts {
c := d.Concept(n)
// Every table gets a primary key.
cols := []*Column{
&Column{
Name: "_id",
Type: "integer",
PrimaryKey: true,
},
}
var id []string
if len(c.Identity) > 0 {
id = append(id, c.Identity...)
} else {
for _, f := range c.Fields {
id = append(id, f.Name)
}
}
// Check for other dependent fields contributing to the identity.
for _, r := range d.Relations {
if c.Name == r.Target && r.Dependent {
id = append(id, fmt.Sprintf("%s_id", r.Source))
cols = append(cols, &Column{
Name: fmt.Sprintf("%s_id", r.Source),
Type: "integer",
ForeignTable: r.Source,
ForeignColumn: "_id",
})
} else if c.Name == r.Source && !r.Dependent {
id = append(id, fmt.Sprintf("%s_id", r.Target))
cols = append(cols, &Column{
Name: fmt.Sprintf("%s_id", r.Target),
Type: "integer",
ForeignTable: r.Target,
ForeignColumn: "_id",
})
}
}
// Fields are columns.
for _, f := range c.Fields {
cols = append(cols, &Column{
Name: f.Name,
Type: f.Format,
Nullable: true,
})
}
t := Table{
Name: c.Name,
Columns: cols,
Unique: id,
}
generateTableSQL(os.Stdout, &t)
}
}
func generateTableSQL(w io.Writer, t *Table) {
fmt.Fprintf(w, `create table "%s" (`, t.Name)
fmt.Fprint(w, "\n")
for i, c := range t.Columns {
if i > 0 {
fmt.Fprint(w, ",\n")
}
fmt.Fprint(w, " ")
fmt.Fprintf(w, `"%s" %s`, c.Name, c.Type)
if c.PrimaryKey {
fmt.Fprint(w, " primary key autoincrement")
} else if !c.Nullable {
fmt.Fprint(w, " not null")
}
}
for _, c := range t.Columns {
if c.ForeignTable == "" {
continue
}
fmt.Fprint(w, ",\n ")
fmt.Fprintf(w, `foreign key ("%s") references "%s"("%s")`, c.Name, c.ForeignTable, c.ForeignColumn)
}
fmt.Fprint(w, "\n);\n\n")
// Quote column names.
unique := make([]string, len(t.Unique))
for i, f := range t.Unique {
unique[i] = fmt.Sprintf(`"%s"`, f)
}
fmt.Fprintf(w, `create unique index "%s_identity" on "%s"(%s);`, t.Name, t.Name, strings.Join(unique, ", "))
fmt.Fprint(w, "\n\n")
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment