openEHR + SNOMED CT: a perfect combination for data querying

Posted by Pablo Pazos Gutierrez on December 27, 2017, 6:04 pm

In this article I'll explain how openEHR queries can benefit from SNOMED CT concepts and expressions, focusing on our implementation in the EHRServer.

When I started working in eHealth, I saw great potential in the openEHR standard as a way to standardize how clinical information is defined, managed and stored. One of the most valuable aspects of the openEHR specs is the ability to define data queries based on semantic clinical information models (archetypes) in a very simple, powerful and standardized way. This is one of the features I implemented in the EHRServer that makes it different from other Clinical Data Repository (CDR) solutions: creating queries on the fly without writing source code or SQL.

If we take what openEHR provides for querying (mostly related to using information structures to query the CDR) and combine it with SNOMED CT concepts (mostly related to semantic content), the result is simplified advanced querying and super easy Clinical Decision Support (CDS). Let's see how this is done!

About archetypes and templates

The core of all openEHR data queries lies in the clinical information models called archetypes. Archetypes include the definition of data structures, data constraints, concepts, purpose, and naming/terminology. Basically, the data structures in archetypes are trees based on a generic information model (the openEHR Information Model, HL7 v3 RIM, HL7 FHIR Resources, etc.).

Each node (data point) in a data structure tree has a node ID and a path (like in a folder structure, which is also a tree); both help identify nodes in an archetype. Each node also has a type defined by the information model; for instance, with the openEHR Information Model we can have free text or coded text nodes.
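
To make the folder analogy concrete, here is a minimal Python sketch that splits a path, such as the diagnosis path used in the queries of this article, into its steps, separating each attribute name from the optional node ID in brackets. The function name `parse_path` and the regular expression are illustrative assumptions, not EHRServer code, and real openEHR paths can carry richer predicates than a single node ID.

```python
import re
from typing import List, Optional, Tuple

# One path step: an attribute name, optionally followed by a node ID in brackets,
# e.g. "items[at0002]" -> ("items", "at0002") and "value" -> ("value", None).
# Simplifying assumption: only a bare node ID is accepted inside the brackets.
_STEP = re.compile(r"([a-z_]+)(?:\[([a-z]+\d+(?:\.\d+)*)\])?")


def parse_path(path: str) -> List[Tuple[str, Optional[str]]]:
    """Split an archetype path into (attribute, node_id) steps, root first."""
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path starting with '/': {path!r}")
    steps: List[Tuple[str, Optional[str]]] = []
    for part in path.strip("/").split("/"):
        match = _STEP.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported path step: {part!r}")
        steps.append((match.group(1), match.group(2)))
    return steps


print(parse_path("/data[at0001]/items[at0002]/value"))
# [('data', 'at0001'), ('items', 'at0002'), ('value', None)]
```

As with folders, each step narrows the position in the tree, and the node IDs are what tie each step back to a node defined in the archetype.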

openEHR queries are defined in terms of paths that reference nodes, like the coded text where a diagnosis is recorded, or the physical quantity where the systolic blood pressure is recorded. Paths are defined inside an archetype, so we also need the archetype ID to fully identify a specific node. With the archetype ID and the path we know the type of the node (e.g. coded text), so in a Query Builder we can define a query for data that matches certain criteria, for instance, that the diagnosis is equal to a given SNOMED CT code.
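
A minimal Python sketch of that idea, assuming a hand-written registry: the pair (archetype ID, path) is the lookup key for the node's datatype, and once the builder knows the node is a coded text it can offer a condition such as "code equals". `NODE_TYPES`, `CodedText`, `node_type` and `code_equals` are hypothetical names for illustration; a real system would derive the node types from the templates rather than from a hard-coded table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

DIAGNOSIS_ARCHETYPE = "openEHR-EHR-EVALUATION.problem_diagnosis.v1"
DIAGNOSIS_PATH = "/data[at0001]/items[at0002]/value"

# Hypothetical registry mapping (archetype ID, path) to the node's datatype.
# A real system would derive this from the templates instead of hard-coding it.
NODE_TYPES: Dict[Tuple[str, str], str] = {
    (DIAGNOSIS_ARCHETYPE, DIAGNOSIS_PATH): "DV_CODED_TEXT",
}


@dataclass(frozen=True)
class CodedText:
    """Simplified stand-in for an openEHR DV_CODED_TEXT value."""
    code: str
    terminology_id: str


def node_type(archetype_id: str, path: str) -> str:
    """Neither part alone is enough: the pair identifies the node and its type."""
    return NODE_TYPES[(archetype_id, path)]


def code_equals(value: CodedText, code: str, terminology_id: str = "SNOMED-CT") -> bool:
    """A condition that only makes sense for coded text: code and terminology match."""
    return value.code == code and value.terminology_id == terminology_id


if node_type(DIAGNOSIS_ARCHETYPE, DIAGNOSIS_PATH) == "DV_CODED_TEXT":
    print(code_equals(CodedText("73211009", "SNOMED-CT"), "73211009"))  # True
```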

In openEHR, templates are just big archetypes that define a complete clinical document and can contain many archetypes (blood pressure, triage, clinical evaluation, diagnosis, etc.). In the EHRServer we work only with templates; archetypes are not used directly.

About SNOMED CT

SNOMED CT is not just a list of codes; it is a graph of related concepts, with hierarchies, attributes, and translations. SNOMED has an expression language that allows defining more specific concepts based on existing ones, and expressions can also be used to query the SNOMED graph and retrieve subsets (the Expression Constraint Language, which is the syntax used in the examples below).

For instance, SNOMED has 925 concepts that match "diabetes mellitus", 804 if we filter by "disorder". If we need all the types of "diabetes mellitus that are disorders", we would need those 804 codes. With expressions we only need the parent node of all "diabetes mellitus that are disorders", that is "73211009 | Diabetes mellitus (disorder) |". To get that concept together with all of its descendants, we just use "<< 73211009 | Diabetes mellitus (disorder) |" (a single "<" would return only the descendants, excluding the concept itself).
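
Conceptually, resolving "<<" is a transitive walk down the is-a hierarchy. The Python sketch below runs that walk over a toy fragment; the child codes are the four listed in this article's example query, but placing them all directly under 73211009 is a simplifying assumption (the real hierarchy is much deeper), and the function names are hypothetical. A terminology server does this over the full SNOMED CT release.

```python
from collections import deque
from typing import Dict, Set

# Toy is-a fragment, child -> parents. The child codes are the four listed in this
# article's example query; putting them all directly under 73211009 is a
# simplification, the real SNOMED CT hierarchy is much deeper and larger.
IS_A: Dict[str, Set[str]] = {
    "609565001": {"73211009"},
    "111307005": {"73211009"},
    "724067006": {"73211009"},
    "111552007": {"73211009"},
}


def descendants_or_self(concept: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Evaluate '<< concept': the concept plus everything below it in the is-a graph."""
    children: Dict[str, Set[str]] = {}
    for child, parents in is_a.items():  # invert child -> parents into parent -> children
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    result = {concept}
    queue = deque([concept])
    while queue:  # breadth-first walk down the hierarchy
        for child in children.get(queue.popleft(), set()):
            if child not in result:  # a concept can have several parents, visit it once
                result.add(child)
                queue.append(child)
    return result


def descendants(concept: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Evaluate '< concept': the same set without the concept itself."""
    return descendants_or_self(concept, is_a) - {concept}


print(len(descendants_or_self("73211009", IS_A)), len(descendants("73211009", IS_A)))  # 5 4
```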

If we need something more complex, like "all the respiratory infections caused by a virus", which includes 54 different concepts, we just use this SNOMED expression:

<< 275498002 | Respiratory tract infection (disorder) | :
246075003 | Causative agent (attribute) | = 49872002 | Virus (organism) |
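
The expression has two parts: a focus set ("<<" over respiratory tract infections) and a refinement after the colon that keeps only concepts whose causative agent is the virus concept. A toy Python sketch of that two-step evaluation follows; the three IDs come from the expression, while every "example-..." concept is a made-up placeholder, and the function names are hypothetical. This toy compares the attribute value by exact equality, so it does not expand subtypes of the value.

```python
from typing import Dict, Set, Tuple

RESPIRATORY_TRACT_INFECTION = "275498002"
CAUSATIVE_AGENT = "246075003"
VIRUS = "49872002"

# Toy data: the three IDs above come from the expression; every "example-..."
# concept below is a made-up placeholder, not a real SNOMED CT concept.
IS_A: Dict[str, Set[str]] = {
    "example-viral-infection": {RESPIRATORY_TRACT_INFECTION},
    "example-bacterial-infection": {RESPIRATORY_TRACT_INFECTION},
}
# Attribute relationships: (concept, attribute) -> set of values.
ATTRIBUTES: Dict[Tuple[str, str], Set[str]] = {
    ("example-viral-infection", CAUSATIVE_AGENT): {VIRUS},
    ("example-bacterial-infection", CAUSATIVE_AGENT): {"example-bacterium"},
}


def descendants_or_self(concept: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """'<< concept': repeatedly add any concept with a parent already in the set."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in is_a.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result


def refine(candidates: Set[str], attribute: str, value: str,
           attributes: Dict[Tuple[str, str], Set[str]]) -> Set[str]:
    """': attribute = value': keep candidates having exactly that attribute value."""
    return {c for c in candidates if value in attributes.get((c, attribute), set())}


focus = descendants_or_self(RESPIRATORY_TRACT_INFECTION, IS_A)
print(refine(focus, CAUSATIVE_AGENT, VIRUS, ATTRIBUTES))  # {'example-viral-infection'}
```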

openEHR queries + SNOMED expressions

Let's say we want to query clinical documents that contain a certain diagnosis, in any of its types, for example any type of diabetes mellitus. In our implementation of openEHR queries, this would require specifying all 804 concept codes for the types of diabetes mellitus in the SNOMED CT terminology. This is impractical and unmaintainable: SNOMED is constantly updated, so we would need to update the query each time a new type of diabetes mellitus is added to SNOMED, or the query would miss the new codes.


The openEHR query alone would look like:

  • Criteria 1

    • Archetype ID = openEHR-EHR-EVALUATION.problem_diagnosis.v1

    • Path (diagnosis node) = /data[at0001]/items[at0002]/value

    • Datatype = DV_CODED_TEXT (has code and terminologyId attributes)

    • Condition: code IN ('609565001','111307005','724067006','111552007', ...) AND terminologyId = "SNOMED-CT"
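
Read as plain logic, this condition says that the value's code must be one of the listed codes and its terminology must be SNOMED CT. A minimal Python sketch, with `CodedText` and `code_in_list` as hypothetical names and the diagnosis values invented for illustration; the third value shows why the terminology check matters, since the same code string could in principle come from another code system.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CodedText:
    """Simplified stand-in for an openEHR DV_CODED_TEXT value."""
    code: str
    terminology_id: str


# Only the four codes shown in the article; the real condition needs all 804.
DIABETES_CODES: FrozenSet[str] = frozenset({"609565001", "111307005", "724067006", "111552007"})


def code_in_list(value: CodedText, codes: FrozenSet[str], terminology_id: str = "SNOMED-CT") -> bool:
    """code IN (...) AND terminologyId = ...: both parts must hold."""
    return value.terminology_id == terminology_id and value.code in codes


# Hypothetical diagnosis values read from the diagnosis path of three documents.
diagnoses: List[CodedText] = [
    CodedText("111552007", "SNOMED-CT"),
    CodedText("hypothetical-other-code", "SNOMED-CT"),
    CodedText("111552007", "hypothetical-local-terminology"),
]
print([code_in_list(d, DIABETES_CODES) for d in diagnoses])  # [True, False, False]
```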

Now with SNOMED expressions we can do something like:

  • Criteria 1

    • Archetype ID = openEHR-EHR-EVALUATION.problem_diagnosis.v1

    • Path (diagnosis node) = /data[at0001]/items[at0002]/value

    • Datatype = DV_CODED_TEXT (has code and terminologyId attributes)

    • Condition: code in_snomed_expr "<< 73211009 | Diabetes mellitus (disorder) |" AND terminologyId = "SNOMED-CT"
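
The practical difference between the two versions of the query is when the code list is produced. A short Python sketch, using a toy hierarchy and a hypothetical concept standing in for a type added in a later SNOMED CT release: a list copied into the query when it was written misses the new concept, while an expression resolved again at query time picks it up. All names here are illustrative.

```python
from typing import Dict, FrozenSet, Set

DIABETES = "73211009"

# Toy is-a fragment, child -> parents (a flattened, simplified stand-in for SNOMED CT).
is_a: Dict[str, Set[str]] = {code: {DIABETES} for code in
                             ("609565001", "111307005", "724067006", "111552007")}


def descendants_or_self(concept: str, hierarchy: Dict[str, Set[str]]) -> Set[str]:
    """'<< concept': repeatedly add any concept with a parent already in the set."""
    result = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in hierarchy.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result


# Static list: expanded once, when the query was written, and stored with it.
static_codes: FrozenSet[str] = frozenset(descendants_or_self(DIABETES, is_a))

# A later terminology release adds a new type of diabetes (hypothetical concept).
is_a["hypothetical-new-diabetes-type"] = {DIABETES}

# Expression: resolved again every time the query is evaluated.
resolved_codes = descendants_or_self(DIABETES, is_a)

print("hypothetical-new-diabetes-type" in static_codes)    # False
print("hypothetical-new-diabetes-type" in resolved_codes)  # True
```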

To complete the example, a query like this can be used to quickly find patients with certain conditions, diseases or health risk factors, to include them in specific care programs, invite them to participate in clinical trials, send alerts and reminders to clinicians to look for those diseases, or even recommend ordering specific tests. In all these cases, this is a great solution for Clinical Decision Support.



Implementation in the EHRServer

A couple of months ago I started working with our partner VeraTech, from Spain, on integrating the openEHR queries implemented in the EHRServer with some of the services they built around SNOMED CT, including the SNOMED expression resolution service that is part of their SNQuery tool. I worked directly with Diego Boscá Tomás (a specialist in openEHR and SNOMED) on this integration.

EHRServer queries are defined in a user interface called the EHRServer Query Builder, and stored in the database. Queries are all based on archetype IDs and paths. When a query is evaluated, it is translated into HQL (Hibernate Query Language), which Hibernate then translates into SQL specific to the database in use (MySQL by default). What I did was change the HQL generator to check for conditions of type "in_snomed_expr". When one is detected, the SNOMED expression in the condition is evaluated by VeraTech's service, which returns a plain list of SNOMED concepts. That list is then included in the generated HQL, so the condition appears in the query as "code IN ('609565001','111307005','724067006','111552007', ...)".
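
The rewriting step can be sketched as a small function that detects the "in_snomed_expr" operator, calls a resolver, and renders the returned codes as an IN list. This is a Python sketch under stated assumptions, not the EHRServer's generator: the resolver is a hypothetical plug-in standing in for VeraTech's service (whose API is not described here), and the alias `dv` and the `code`/`terminologyId` property names are illustrative, not the EHRServer's actual HQL.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical resolver: takes a SNOMED CT expression, returns concept IDs.
# In the EHRServer this role is played by VeraTech's service, whose API is not
# described in this article; any function with this shape can be plugged in.
Resolver = Callable[[str], Iterable[str]]

# SCTIDs are 6 to 18 digits; anything else from the resolver is rejected.
_SCTID = re.compile(r"\d{6,18}")


def condition_to_hql(alias: str, operator: str, operand: str, resolve: Resolver) -> str:
    """Render one coded-text condition as an HQL fragment (illustrative only)."""
    if operator != "in_snomed_expr":
        raise NotImplementedError(f"only in_snomed_expr is sketched here, got {operator!r}")
    codes = sorted(set(resolve(operand)))  # de-duplicate; sort for stable output
    for code in codes:
        if not _SCTID.fullmatch(code):
            raise ValueError(f"resolver returned a non-SCTID value: {code!r}")
    if not codes:
        return "1 = 0"  # an empty IN () is invalid, and nothing can match anyway
    in_list = ", ".join(f"'{code}'" for code in codes)
    return f"{alias}.code IN ({in_list}) AND {alias}.terminologyId = 'SNOMED-CT'"


def stub_resolver(expression: str) -> List[str]:
    """Stub for the remote service: always returns the article's four example codes."""
    return ["609565001", "111307005", "724067006", "111552007"]


print(condition_to_hql("dv", "in_snomed_expr",
                       "<< 73211009 | Diabetes mellitus (disorder) |", stub_resolver))
# dv.code IN ('111307005', '111552007', '609565001', '724067006') AND dv.terminologyId = 'SNOMED-CT'
```

Binding the codes as a Hibernate collection parameter (an `IN (:codes)` clause) instead of inlining literals would avoid hand-built quoting; inlining is shown here only because it mirrors the generated query described above.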

This implementation adds all the semantic power of SNOMED expressions to openEHR queries, allowing complex queries to be specified in a couple of minutes and put directly into production, without code changes or long testing and deployment processes.

I'm sure this feature will be used a lot in the future by systems that use SNOMED CT in their records, and the world is moving in this direction.

With this in place, we can offer advanced services in the next releases of the EHRServer. Stay tuned!

This implementation wouldn't have been possible without the collaboration of VeraTech; kudos to them.