Archetype Query Language (AQL)
Issuer: openEHR Specification Program | |
---|---|
Release: QUERY latest |
Status: STABLE |
Revision: [latest_issue] |
Date: [latest_issue_date] |
Keywords: query, AQL, archetype, Xpath, openehr |
© 2008 - 2021 The openEHR Foundation | |
---|---|
The openEHR Foundation is an independent, non-profit foundation, facilitating the sharing of health records by consumers and clinicians via open specifications, clinical models and open platform implementations. |
|
Licence |
Creative Commons Attribution-NoDerivs 3.0 Unported. https://creativecommons.org/licenses/by-nd/3.0/ |
Support |
Issues: Problem Reports |
Amendment Record
Issue | Details | Raiser, Implementer | Completed |
---|---|---|---|
Release 1.2.0 (unreleased) |
|||
Release 1.1.0 |
|||
SPECQUERY-8 Correct numerous minor typographical and formatting errors. Improve the wording and readability. |
S Iancu |
||
SPECQUERY-30 Update (rewrite) grammar to ANTLR4 format. |
S Iancu, |
04 Feb 2021 |
|
SPECQUERY-33 Add support for |
S Iancu |
01 Feb 2021 |
|
SPECQUERY-1 Allow name attribute term codes to be used in node predicate (SPECPR-129). |
C Ma, |
01 Feb 2021 |
|
SPECQUERY-27 Add support for |
S Iancu |
10 Jan 2021 |
|
SPECQUERY-28 Add support for built-in functions. |
S Iancu |
10 Jan 2021 |
|
SPECQUERY-3 Add support for literals (fixed values) in column expression. |
D Bosca, |
12 Dec 2020 |
|
SPECQUERY-7 Add support for |
T Beale, |
12 Dec 2020 |
|
SPECQUERY-12 Improve terminology support by adding a new |
L Marco-Ruiz, |
24 Nov 2020 |
|
SPECQUERY-25 Add support for aggregate functions like |
S Iancu |
23 Nov 2020 |
|
SPECQUERY-17 Add support for |
openEHR SEC |
09 Nov 2020 |
|
SPECQUERY-31 Move AQL examples to a new separate specification. |
openEHR SEC |
09 Nov 2020 |
|
SPECQUERY-16 Add support for |
P Pazos, |
05 Nov 2020 |
|
Release 1.0.1 |
|||
1.0.1 |
SPECQUERY-20 Clarify major keyword documentation and semantics; remove |
openEHR SEC |
15 Jun 2020 |
SPECQUERY-23 Improve introductory text; update section 2. |
T Beale |
15 Apr 2020 |
|
SPECQUERY-19 Clarify use of terminology 'identified path'. |
P Pazos, |
10 Apr 2020 |
|
SPECQUERY-14: Correct typographical errors (SPECPR-320, SPECPR-345, SPECPR-212, SPECPR-332, SPECPR-331); |
P Pazos, |
10 Apr 2020 |
|
Release 1.0.0 |
|||
1.0.0 |
SPECQUERY-6 Publish original AQL wiki page as formal specification; |
openEHR SEC |
10 Jan 2017 |
Add Antlr grammar for AQL |
B Lah |
13 Jan 2012 |
|
Initial Writing, based on openEHR AQL wiki page. |
C Ma |
10 Dec 2008 |
Acknowledgements
Contributors
This specification benefited from wide formal and informal input from the openEHR and wider health informatics community. The openEHR Foundation would like to recognise the following people for their contributions.
-
Birger Haarbrandt, MSc, Peter L. Reichertz Institut for Medical Informatics (PLRI), Germany
-
Bjørn Næss, DIPS, Norway
-
Borut Fabjan, Program Manager, Better, Slovenia
-
Bostjan Lah, Senior Architect, Better, Slovenia
-
Heath Frankel, Ocean Informatics, Australia
-
Ian McNicoll MD, FreshEHR, UK
-
Luiz Marco-Ruiz, PhD, Norwegian Centre for E-health Research, Norway
-
Matija Polajnar, PhD, Software Engineer, Better, Slovenia
-
Pablo Pazos Gutierrez, Senior engineer, CaboLabs, Uruguay
-
Sebastian Iancu, Architect, Code24, Netherlands
-
Seref Arikan, MEcon, Ocean Informatics UK, Centre for Health Informatics and Multi-professional Education (CHIME) UK
-
Thomas Beale, Ars Semantica UK, openEHR Foundation Management Board
1. Preface
1.2. Status
This specification is in the STABLE state. The latest development version of this document can be found at https://specifications.openehr.org/releases/QUERY/latest/AQL.html.
Known omissions or questions are indicated in the text with a 'to be determined' paragraph, as follows:
TBD: (example To Be Determined paragraph)
1.3. Tools
Various tools that can be used to work with archetypes and templates, including extracting 'archetype paths' for use in AQL statements, are listed on the openEHR website modelling tools page.
1.4. Feedback
Feedback may be provided on the openEHR AQL forum.
Issues may be raised on the specifications Problem Report tracker.
To see changes made due to previously reported issues, see the QUERY component Change Request tracker.
1.5. Original Source
The text of this specification was originally posted on the openEHR wiki, and was used by the global openEHR community in that form.
2. Overview
2.1. Existing Query Languages
Currently, mainstream database query languages such as Structured Query Language (SQL), W3C XQuery, and Object Query Language (OQL) have dependencies on particular data schemas and physical representations (e.g. relational tables). Users must know the physical data schema of a particular database in order to write a valid query. A query statement written for one schema will not usually work in other systems, which generally have different data schemas, even for storing the same data. One reason schemas are different between systems relates to differing optimisation requirements and choices. Queries written in these languages therefore are not usually portable across systems.
More modern web-oriented languages such as W3C Sparql and GraphQL are not database- or system-oriented as such, and do not suffer from the problem of portability. However, both the database- and web-oriented languages suffer from another problem, which is being limited to a single level of semantic representation, i.e. they effectively assume an Entity-Attribute-Value (EAV) data meta-model. This prevents their direct use with multi-level models, such as those based on openEHR Archetype model, or its ISO equivalent ISO 13606-2:2019.
In order to overcome these limitations, this specification describes a query language designed to support portable queries based on multi-level models.
2.2. What is AQL?
Archetype Query Language (AQL) is a declarative query language developed specifically for expressing queries used for searching and retrieving the data found in archetype-based repositories. The examples used in this specification mostly relate to the openEHR Reference Model (RM) and the openEHR clinical archetypes, but the syntax is independent of information model, application, programming language, system environment, and storage model.
The minimum requirement for data to be queried using AQL (including with archetype structures and terminology) is that it be based on archetypes, which concretely means that it contains fine-grained semantic markings in the form of archetype and terminology codes. This may be native openEHR RM data, or legacy system data to which the relevant semantic markers (i.e. archetype and terminology codes) have been added. Consequently, AQL expresses queries in terms of a combination of archetype semantic elements and RM data structure elements on which the archetypes are based, rather than solely the latter, which is the case for EAV-based query languages such as SQL. This is the key in developing and sharing semantic queries across system and enterprise boundaries.
AQL has the following distinctive features:
-
the utilization of openEHR path syntax to locate clinical statements and data values within them using archetypes; this syntax is used to represent the query criteria and returned results, and allows stating query criteria using archetype and node identifiers, data values within the archetypes, and class attributes defined within the Reference Model;
-
returned results may be objects of any granularity from 'top-level' RM objects to primitive data items;
-
the utilization of a
CONTAINS
operator to match data hierarchy relationships in order to constrain the source data to which the query is applied; -
the utilization of ADL-like operator syntaxes, such as
matches
,exists
andnot
; -
model-neutral syntax: AQL does not have any dependency on a Reference Model; it is also neutral to system implementation and environment;
-
supports time-based conditions to query historical versions of data.
AQL also has features found in other query languages, including:
-
naming returned results;
-
query criteria parameters;
-
arithmetic, comparison and logical operators;
-
functions;
-
preferences on the result retrieval and structuring, such as ordering and paginating results.
2.3. AQL example
Below is an example of an AQL statement. This statement returns all blood pressure values contained in COMPOSITION
instances defined by the openEHR-EHR-COMPOSITION.encounter.v1
archetype, which contain OBSERVATION
instances defined by the openEHR-EHR-OBSERVATION.blood_pressure.v1
archetype, where the systolic value is greater than or equal to 140 or whose diastolic value is greater than or equal to 90, within a specified EHR (i.e. whose EHR id is the value of the variable $ehrUid
). The AQL syntax is a synthesis of SQL structural syntax and the openEHR path syntax.
SELECT -- Select clause o/data[at0001]/.../items[at0004]/value AS systolic, -- Identified path with alias o/data[at0001]/.../items[at0005]/value AS diastolic, c/context/start_time AS date_time FROM -- From clause EHR[ehr_id/value=$ehrUid] -- RM class expression CONTAINS -- containment COMPOSITION c -- RM class expression [openEHR-EHR-COMPOSITION.encounter.v1] -- archetype predicate CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE -- Where clause o/data[at0001]/.../items[at0004]/value/value >= 140 OR -- value comparison o/data[at0001]/.../items[at0005]/value/value >= 90 ORDER BY -- order by datetime, latest first c/context/start_time DESC
More examples can be found in the openEHR AQL examples document.
3. AQL Syntax Description
3.1. Overview
AQL has the following clauses:
-
The
SELECT
clause specifies the data elements to be returned. -
The
FROM
clause specifies the result source and the corresponding containment criteria. -
The
WHERE
clause specifies data value criteria within the result source. -
The
ORDER BY
clause indicates the data items used to order the returned result set. -
The
LIMIT
clause indicates which portion of the result set will be returned.
In AQL, openEHR path syntax is used to locate any node or data values defined within archetypes. In the SELECT
clause, the openEHR path syntax can be used to indicate expected objects or data values to be returned (similarly to how SQL 'projections' work). The FROM
clause uses containment mechanisms indicating the scope of the query data source. WHERE
clause uses archetype path to set query criteria on archetypes or any node within the archetypes. The following sections introduce the AQL syntax in detail.
3.2. Reserved words and characters
Keywords in AQL are not case-sensitive, so SELECT
, Select
, select
, SeLeCt
, etc. are processed in the same way. AQL has the following reserved keywords and characters:
-
SELECT
,AS
,FROM
,CONTAINS
,WHERE
-
ORDER BY
,LIMIT
,OFFSET
-
DISTINCT
-
operators:
AND
,OR
,NOT
,LIKE
,matches
,exists
,<
,>
,=
,!
-
functions:
COUNT
,MIN
,MAX
,SUM
,AVG
,LENGTH
,CONTAINS
,POSITION
,SUBSTRING
,CONCAT
,CONCAT_WS
,ABS
,MOD
,CEIL
,FLOOR
,ROUND
,CURRENT_DATE
,CURRENT_TIME
,CURRENT_DATE_TIME
,NOW
,CURRENT_TIMEZONE
,TERMINOLOGY
-
literals:
true
,false
,NULL
-
"
and'
: double and single quote characters are used to delimit string values -
|
: bar characters are used to delimit intervals -
[]
: brackets are used to delimit coded terms or predicate expressions -
{}
: curly braces are used withmatches
delimiting matches criteria -
()
: parentheses are used with function calls or to separate logic or group expressions and enforce precedence -
$
: dollar sign used to prefix parameters -
:
,?
,&
,%
,#
,@
,*
: characters that mainly used as part of a terminology or openEHR URI -
/
,.
: used by openEHR path syntax
Note
|
TOP modifier is deprecated in favour of the LIMIT clause combined with ORDER BY (see below).
|
3.3. openEHR path syntax
The openEHR path syntax has been described in detail (see Paths and Locators section in the Architecture Overview specification). AQL utilises the openEHR path syntax to reference both coarse-grained and fine-grained archetype nodes (objects and data values, respectively), and openEHR class instances and attributes, when referencing something that is not defined in an archetype but is part of the openEHR Reference Model.
There are two types of openEHR paths used in AQL queries: an archetype path, referring to a node within an archetype, and a RM class attribute path, pointing to an attribute of an openEHR Reference Model class. These paths uses also predicate expressions (see Section 3.6) enclosed by brackets ([]
), to define criteria for coarse-grained and fine-grained data.
A set of openEHR path examples are shown below:
Archetype RM type | Archetype node name | Archetype ID | Path syntax | Referenced RM Type |
---|---|---|---|---|
|
Apgar score |
openEHR-EHR-OBSERVATION.apgar.v1 |
|
|
|
Heart rate |
openEHR-EHR-OBSERVATION.pulse.v1 |
|
|
|
Systolic |
openEHR-EHR-OBSERVATION.blood_pressure.v2 |
|
|
openEHR RM attribute name | Path syntax |
---|---|
|
|
|
|
|
|
3.4. Variables
AQL variables are used to refer a specific openEHR class defined within an AQL FROM
clause. Consider the following example:
SELECT o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude, o/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude FROM EHR [ehr_id/value='1234'] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude >= 140 OR o/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude >= 90
This example defines only one AQL variable: o
, which represents OBSERVATIONs
that conform to (i.e. created using) the openEHR-EHR-OBSERVATION.blood_pressure.v1
archetype. This combination of a variable reference and a class represents a class expression (see Section 3.10.1).
The combination of a variable reference and a subordinate path forms one kind of identified path (see Section 3.7), used to specify a fine-grained data element within a larger object matched by the FROM
clause.
3.4.1. Syntax
AQL variables have the following features:
-
AQL variables are defined in the AQL
FROM
clause. -
A variable name must be unique within an AQL statement.
-
Not all classes defined within the
FROM
clause need a variable name. It is required only when other AQL clauses need to refer to the class identified in theFROM
clause. The example above doesn’t define a variable name for theEHR
orCOMPOSITION
classes, only for theOBSERVATION
class (variableo
), to allow the latter to be referred to in theWHERE
andSELECT
clauses, as the root object of specific paths.
A variable name is formed of an initial letter followed by any number of alphanumerics and underscores, as long as it does not clash with an AQL reserved word. Variable names are not case-sensitive. Digits are not recommended at the beginning of a variable name although there is no formal restriction.
3.5. Parameters
AQL syntax supports parameterisation. AQL parameters are used to indicate criteria, which will be substituted with values at run time. Supporting parameterisation is one of the mechanisms used to realise sharing AQL statements within a system or across system boundaries.
The parameters can be class specific (such as the parameter of EHR identifier or EHR creation date/time), archetype specific (such as an archetype identifier, or archetype constraints), or template specific (such as a template identifier or template constraints).
3.5.1. Syntax
A parameter always starts with a dollar sign $
followed by the parameter name, e.g. $ehrUid
. The parameter name can consist of letters, digits and underscores. It cannot have spaces, and it cannot be an AQL reserved word.
A parameter can be used for any criteria values within an AQL statement, e.g.:
-
within a predicate:
[$archetypeId]
,[at0003, $nameValue]
,[ehr_id/value=$ehrUid]
-
withing a
WHERE
criteria:o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value > $systolicCriteria
Note
|
In the WHERE criteria, when the value of the parameter is not a number or boolean value, it should be substituted with the corresponding quotes added to the value, for instance o/../.. = $textVariable , when substituted should look like this o/../.. = "value" (single or double quotes could be used). So substituted values follow the same rules as each type when the value is specified as a literal in the AQL expression: strings, dates, times and datetimes should be quoted, numbers and booleans are not quoted.
|
3.5.2. Parameters use case
Parameters are needed when the same AQL query statement is used with different criteria values. This AQL example is to return all abnormal blood pressure values for a single specific EHR, indicated by $ehrUid
parameter:
SELECT o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude, o/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude >= 140 OR o/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude >= 90
This parameter will be substituted by a specific EHR.ehr_id
value at run time. Consequently, this query can be reused for all EHRs
within a system.
3.5.3. Parameter Resolution
AQL query parameters can be resolved at application level, or EHR system level. It depends on what the query parameters are used for and the design/implementation of the system or components. Query parameters would be normally resolved outside of a query engine.
If a query needs to be reusable across different EHR systems, any query parameters normally need to be registered in these EHR systems so that they can be resolved with real values from each environment. A typical example of this type of query parameter is $ehrUid
. If the query is only used within an application, then the query parameters would be resolved by the application, such as the parameter of healthcare facility identifier or template identifier.
There are no specific guidelines on how to resolve query parameters. Generally speaking, a parameter name is used as a key (or a key is associated with a parameter name), and the key needs to be unique within the boundary where the parameters are resolved. The EHR system or application needs to have the API functions to get the real value with a given parameter name or key.
3.6. Predicates
AQL uses predicates (see Predicate Expressions) to define criteria for coarse-grained and fine-grained data. Predicate expressions are always enclosed by brackets ([]
).
There are three types of predicates used by AQL: the standard predicate, the archetype predicate, and the node predicate.
3.6.1. Standard predicate
Standard predicates always have a left-hand operand, an operator and a right-hand operand, e.g. [ehr_id/value='123456']
.
-
The left-hand operand is normally an openEHR path, such as
ehr_id/value
,name/value
. -
The right-hand operand is normally a criterion value or a parameter, such as
'123456'
,$ehrUid
. It can also be another openEHR path. -
The operator can be one of the following:
>, >=, =, <, <=, !=
3.6.2. Archetype predicate
An archetype predicate is a shortcut of a standard predicate, i.e. the predicate does not have the left-hand operand and operator. It only has an archetype id, e.g. [openEHR-EHR-COMPOSITION.encounter.v1]
. The archetype predicate is a specific type of query criterion indicating which archetype instances are relevant to this query. It is used to scope the data source from which the query result data is to be retrieved. Therefore, an archetype predicate is only used within an AQL FROM
clause, for example,
FROM EHR [ehr_id/value='1234'] CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1]
These predicates could also be written as standard predicates:
FROM EHR e CONTAINS COMPOSITION c CONTAINS OBSERVATION o WHERE e/ehr_id/value = '1234' AND c/archetype_node_id = 'openEHR-EHR-COMPOSITION.encounter.v1' AND o/archetype_node_id = 'openEHR-EHR-OBSERVATION.blood_pressure.v1'
This equivalence could be used by implementers of AQL to transform predicates into their standard form, as a canonical representation of the query, which can simplified its processing, validation, evaluation and transformation into specific database query languages.
3.6.3. Node predicate
A node predicate is also a shortcut of a standard predicate. It has the following forms:
-
Containing an
archetype_node_id
(i.e. an at-code) only, e.g.:[at0002]
The corresponding standard predicate would be:
[archetype_node_id=at0002]
-
Containing an
archetype_node_id
and aname/value
criterion, e.g.:[at0002 and name/value=$nameValue] [at0002 and name/value='real name value']
The corresponding standard predicates would be:
[archetype_node_id=at0002 and name/value=$nameValue] [archetype_node_id=at0002 and name/value='real name value']
-
Containing an
archetype_node_id
and a shortcut of aname/value
criterion, e.g.:[at0002, $nameValue] [at0002, 'real name value']
The corresponding standard predicates would be:
[archetype_node_id=at0002 and name/value=$nameValue] [archetype_node_id=at0002 and name/value='real name value']
-
Containing an
archetype_node_id
and a shortcut of aname
term code criterion (thename
attribute is coded text rather than free text), consisting of a terminology_id, a code_string and an optional (informational) value, formatted asterminology_id::code_string|value|
, e.g.:[at0002, at0003] [at0002, snomed_ct(3.1)::313267000] [at0003, icd10AM::F60.1|Schizoid personality disorder|]
The corresponding standard predicates would be:
[archetype_node_id=at0002 and name/defining_code/code_string='at0003' and name/defining_code/terminology_id/value='local'] [archetype_node_id=at0002 and name/defining_code/code_string='313267000' and name/defining_code/terminology_id/value='snomed_ct(3.1)'] [archetype_node_id=at0003 and name/defining_code/code_string='F60.1' and name/defining_code/terminology_id/value='icd10AM']
-
The above four forms are the most common node predicates. A more advanced form is to include a general criterion instead of the
name/value
criterion within the predicate. The general criterion consists of left-hand operand, operator, and right-hand operand, e.g.:[at0002 and value/defining_code/terminology_id/value=$terminologyId]
A node predicate defines criteria on fine-grained data.
3.7. Identified Paths
In AQL, an identified path is the association of a variable reference (the identifier), a predicate and/or an openEHR path. Except for the FROM
clause, an identified path may appear in any clause of an AQL statement. For example, it can be used to indicate the data to be returned in a SELECT
clause, or the data item on which query criteria are applied in a WHERE
clause.
In the following example, the identified path o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude
specifies the systolic value of the Observation archetype openEHR-EHR-OBSERVATION.blood_pressure.v1
:
SELECT o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude FROM EHR [ehr_id/value='1234'] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.blood_pressure.v1]
3.7.1. Syntax
An AQL identified path can take any of the following forms:
-
a variable name defined in the
FROM
clause followed by an openEHR path, which specifies a data item at that path within the object, as follows:o/data[at0001]/.../data[at0003]/items[at0004]/value/value
-
a variable name followed by a predicate, which specifies an object that satisfies the predicate, as follows:
o[name/value=$nameValue]
-
a variable name followed by a predicate and an openEHR path, which specifies a data item at a path within an object satisfying the predicate, as follows:
o[name/value=$nameValue]/data[at0001]/.../data[at0003]/items[at0004]/value/value
3.8. Operators
An AQL operator is a reserved word or a symbol, used primarily in WHERE
clause to perform comparison or logical operation(s).
3.8.1. Comparison operators
The table below shows the supported AQL comparison operators, meaning and example.
Operator | Meaning | Example |
---|---|---|
= |
Equal |
|
> |
Greater than |
|
>= |
Greater than or equal to |
|
< |
Smaller than |
|
<= |
Smaller than or equal to |
|
!= |
Not equal to |
|
LIKE |
Simple pattern matching |
|
matches |
Advanced matcher |
|
3.8.1.1. LIKE
The LIKE
binary operator is used to compare a value of type string (or dates and times) against a simple pattern.
The left-hand operand is an AQL identified path to a data element that is either a String or has a String representation, while the right-hand operand is a String value, representing the pattern to be matched.
It returns true
if the value matches the supplied pattern.
Below is an example using a simple pattern matching:
SELECT e/ehr_id/value, c/context/start_time FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE c/context/start_time LIKE '2019-0?-*'
The ?
wildcard in the pattern matches any single character, while the *
wildcard matches any sequence of zero or more characters.
If the pattern does not contain any of these wildcards, then LIKE
acts like the =
(equal) operator.
The LIKE
operator always tries to match the entire string value, therefore if the intention is to match a sequence anywhere within the string, the pattern must start and end with the *
wildcard, e.g. "*test*"
.
To match a literal ?
or *
, the respective character in a pattern must be escaped by using the backslash \
character, e.g. "test \?"
.
3.8.1.2. matches
The matches
binary operator is used in the WHERE
clause. The left-hand operand is an AQL identified path. The right-hand operand is enclosed within curly braces ({}
), and may take the following forms:
-
cADL list constraint: a comma-separated value list, which is one of the cADL constraint forms used in the Archetype Definition Language (ADL). Below is an example using a string value list:
SELECT o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/name FROM EHR [uid=$ehrUid] CONTAINS Composition c CONTAINS Observation o[openEHR-EHR-OBSERVATION.microbiology.v1] WHERE o/data[at0002]/events[at0003]/data/items[at0015]/items[at0018]/items[at0019]/items[at0021]/name/defining_code/code_string matches {'18919-1', '18961-3', '19000-9'}
Value list items may be of type string, date/time, integer, or real. Quotes are required for strings and date/times.
A value list is only used in AQL
WHERE
clause when the criteria is to match one item of the list. The relationships among these value list items are "OR".Some examples of value list:
matches {'string item 1', 'string item 2', 'string item3'} // A string list, equivalent to // matches 'string item 1' or .. // matches 'string item 2' or .. // matches 'string item3' matches {'2006-01-01', '2007-01-01', '2008-01-01'} // a date value list matches {1, 2, 3} // an integer list matches {1.1, 2.5, 3.8} // a real value list
-
URI: this can be a terminology URI, an openEHR EHR URI, or other URI. An example with a terminology URI is shown below:
SELECT e/ehr_status/subject/external_ref/id/value, diagnosis/data/items[at0002.1]/value FROM EHR e CONTAINS Composition c[openEHR-EHR-COMPOSITION.problem_list.v1] CONTAINS Evaluation diagnosis[openEHR-EHR-EVALUATION.problem-diagnosis.v1] WHERE c/name/value='Current Problems' AND diagnosis/data/items[at0002.1]/value/defining_code matches { terminology://snomed-ct/hierarchy?rootConceptId=50043002 }
URI data is enclosed within curly braces after
matches
operator. A URI is expressed in IETF RFC 3986 format. URIs are not case-sensitive.A terminology URI consists of the following components:
-
terminology: the URI schemes value;
-
terminology service: the URI authority value, such as SNOMED-CT;
-
terminology function name: the URI path, e.g. "hierarchy" is the function name in the example shown below;
-
argument values required by the terminology functions - URI queries;
This is an example of a terminology URI:
terminology://snomed-CT/hierarchy?rootConceptId=50043002 \_________/ \_______/ \_______/ \___________/ \______/ | | | |__________| scheme authority path | queries | | | | | | terminology terminology function argument argument uri service name value
-
-
results of function calls: a
TERMINOLOGY()
function (see below) is used to invoke operations of an external terminology server and return the results back to thematches
operator for further use. An example is shown below:SELECT c/context/start_time, p/data/items[at0002]/value FROM EHR e[ehr_id/value='1234'] CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.problem_list.v1] CONTAINS EVALUATION p[openEHR-EHR-EVALUATION.problem-diagnosis.v1] WHERE c/name/value='Current Problems' AND p/data/items[at0002]/value/defining_code/code_string matches TERMINOLOGY('expand', 'hl7.org/fhir/4.0', 'http://snomed.info/sct?fhir_vs=isa/50697003')
3.8.2. Logical operators
3.8.2.1. AND
AND
is a binary operator used to combine two Boolean expressions. It represents a logical 'and' (conjunction) and evaluates to true
when both operands evaluate to true
, and it evaluates to false
otherwise.
3.8.2.2. OR
OR
is a binary operator used to combine two Boolean expressions. It represents a logical 'or' (disjunction) and evaluates to true
when any of the operands evaluate to true
, and it evaluates to false
otherwise.
3.8.2.3. NOT
NOT
is a unary operator used to negate a Boolean expression operand. It represents a logical 'not' (negation) and returns a Boolean result: true
means the operand is false
.
When used in the WHERE
clause, a NOT
expression is a kind of identified expression. The example below uses NOT
operator followed by a Boolean identified expression:
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT (EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] AND e/ehr_status/subject/external_ref/namespace = 'CEC')
The above example is equivalent to the two expressions shown in the following WHERE
clause:
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1] OR e/ehr_status/subject/external_ref/namespace != 'CEC'
The NOT
operand in combination with CONTAINS
can be used also in the FROM
clause to express an exclusion constraint, e.g.
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] NOT CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE e/ehr_status/subject/external_ref/namespace != 'CEC'
3.8.2.4. EXISTS
EXISTS
is a unary operator, whose operand is an identified path. It returns a Boolean result: true
means the data associated with the specified path exists, false
otherwise. An EXISTS
expression is a kind of identified expression that can be used only in WHERE
clause.
In the AQL example below, EXISTS
is used in negated form to filter out COMPOSITIONs
(and therefore EHRs) that do not contain a discharge ADMIN_ENTRY
instance:
SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.administrative_encounter.v1] CONTAINS ADMIN_ENTRY admission[openEHR-EHR-ADMIN_ENTRY.admission.v1] WHERE NOT EXISTS c/content[openEHR-EHR-ADMIN_ENTRY.discharge.v1]
3.9. Functions
AQL provides built-in functions to perform operations on data. Functions are similar to operators as they manipulate data and return a single result, but the difference is in the format and number of their arguments.
In general, a function syntax is function(expression, expression, …)
, accepting zero, one or more expression arguments, consisting of literals, parameters, variables, identified paths or other functions.
Functions are used primarily in the SELECT
and WHERE
clauses.
There are two types of functions: single-row functions, returning a single result for every row of the result set, and aggregate functions, returning a single result based on a group of rows of the result set rather than on a single row.
Most of AQL built-in functions are mapped to function definitions in the openEHR Foundation Types and in the openEHR Base Types, while others are borrowed from SQL languages.
Note
|
The followings are a set of core built-in functions defined by AQL specifications. Various other functions may exist however in various AQL implementations. |
3.9.1. Aggregate functions
Aggregate functions calculate a single result value from a set of input values, allowing the query to return summarized information about a data item or a result set.
Input values are selected by an expression in a form of an identified path applied to data items filtered by FROM
and WHERE
clauses.
Unless specified otherwise, these functions ignore NULL
input values.
The table below shows the built-in AQL aggregate functions:
Function | Description |
---|---|
COUNT() |
Returns the count of the number of rows returned or of the input values |
MIN() |
Returns the minimum of the non-null input values |
MAX() |
Returns the maximum of the non-null input values |
SUM() |
Returns the sum of the non-null input values |
AVG() |
Returns the average (arithmetic mean) of all the non-null input values |
The example below uses MIN()
, MAX()
and AVG()
functions to determine edge and mean values for systolic blood pressure:
SELECT MAX(o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude) AS maxValue, MIN(o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude) AS minValue, AVG(o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude) AS meanValue FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.blood_pressure.v1]
3.9.1.1. COUNT
The COUNT
function returns the number of values of given expression argument. The syntax is COUNT([DISTINCT] expression|*)
.
The DISTINCT
keyword can be used to calculate the number of only distinct values of expression.
The COUNT(*)
is used to calculate the number of all rows of the result set, including duplicates and NULL
values.
If there are no matching rows, then this function returns 0
.
Input values type can be anything, while the return type is always an Integer.
3.9.1.2. MIN
The MIN
function returns the minimum value of given expression argument. The syntax is MIN(expression)
.
If there are no matching rows, then this function returns NULL
.
Input values type should be either String, Date, Time, Integer or Real, and it will also determine the return type.
3.9.1.3. MAX
The MAX
function returns the maximum value of given expression argument. The syntax is MAX(expression)
.
If there are no matching rows, then this function returns NULL
.
Input values type should be either String, Date, Time, Integer or Real, and it will also determine the return type.
3.9.2. String functions
AQL provides single-row functions for examining and manipulating string values.
The table below shows the built-in AQL string functions:
Function | Description |
---|---|
LENGTH() |
Returns the length of a string |
CONTAINS() |
Validates if a string contains other string |
POSITION() |
Returns the position of the first occurrence of substring in a string |
SUBSTRING() |
Extracts the substring as specified |
CONCAT() |
Returns the string that results from concatenating the arguments |
CONCAT_WS() |
Returns the string that results from concatenating the arguments with separator |
3.9.2.1. LENGTH
The LENGTH
function returns the number of characters of a given expression argument.
The syntax is LENGTH(expression)
, where the argument is of type String. The result type is Integer.
3.9.2.2. CONTAINS
The CONTAINS
function with syntax CONTAINS(expression, substring)
returns true
if the expression argument contains the substring argument.
The arguments are of type String, while the result type is Boolean.
3.9.2.3. POSITION
The POSITION
function with syntax POSITION(substring, expression)
returns the position (index) of the first occurrence of substring argument in the value of the expression argument, or 0
(zero) if is not present.
The position of the first character (start) of a string is considered position 1
.
The arguments are of type String, while the result type is Integer.
3.9.2.4. SUBSTRING
The SUBSTRING
function with syntax SUBSTRING(expression, position, length)
extracts the substring from expression argument, starting at the position-th character, and optionally stopping after a given length characters.
The first argument is of type String, while position and length arguments are Integers. The result type is String.
The position of the first character (start) of a string is considered position 1
.
The length argument is optional, and when not provided this function will extract the substring from the given position until the end of the given expression argument.
3.9.3. Numeric functions
Numeric single-row functions are providing various mathematical functions for numeric type arguments.
The table below shows the built-in AQL numeric functions:
Function | Description |
---|---|
ABS() |
Returns the absolute value |
MOD() |
Returns the remainder |
CEIL() |
Returns the nearest integer greater than or equal to argument |
FLOOR() |
Returns the nearest integer less than or equal to argument |
ROUND() |
Rounds the argument |
3.9.3.1. ABS
The ABS
function returns the absolute value of given expression argument.
The syntax is ABS(expression)
, where the argument is of type Real or Integer. The result type is derived from the argument type.
3.9.3.2. MOD
The MOD
function with the syntax MOD(x, y)
returns the remainder of x
divided by y
.
The x
and y
expression arguments are of type Real or Integer, while the result type is derived from the arguments.
3.9.3.3. CEIL
The CEIL
function returns the Integer greater than or equal to expression argument.
The syntax is CEIL(expression)
, where the argument is of type Real or Integer.
3.9.3.4. FLOOR
The FLOOR
function returns the Integer less than or equal to expression argument.
The syntax is FLOOR(expression)
, where the argument is of type Real or Integer.
3.9.3.5. ROUND
The ROUND
function with the syntax ROUND(expression, decimal)
returns the rounded expression argument to decimal places.
The expression argument is of type Real or Integer, and the result type of the function is derived from it.
The decimal argument is of type Integer, and defaults to value 0
if not specified.
3.9.4. Date and time functions
AQL provides single-row functions for examining and manipulating date, time or datetime values.
The table below shows the built-in AQL date and time functions:
Function | Description |
---|---|
CURRENT_DATE() |
Returns the current date |
CURRENT_TIME() |
Returns the current time |
CURRENT_DATE_TIME() |
Returns the current datetime |
NOW() |
An alias for CURRENT_DATE_TIME() |
CURRENT_TIMEZONE() |
Returns the current timezone |
3.9.4.1. CURRENT_DATE
The function with syntax CURRENT_DATE()
accepts no arguments and returns the current date as String value with the format 'YYYY-MM-DD'.
3.9.4.2. CURRENT_TIME
The function with syntax CURRENT_TIME()
accepts no arguments and returns the current time as String value with the format 'hh:mm:ss'.
3.9.5. Other functions
3.9.5.1. TERMINOLOGY
Often the matches
operator may need to perform a request to a terminology server, in order to execute some operations over a value set or code system (a synonym of terminology).
Several operations may be possible: the expansion of a ValueSet (or Reference Set), checking that a concept belongs to a value set or code system, testing if one concept subsumes another one, or the mapping of concepts across different terminologies.
While matching value by a pure terminology concept specified as a terminology URI is possible, the invocation of external terminology servers from AQL requires a special function, to perform all these operations.
The TERMINOLOGY
function is used to invoke operations of an external terminology server and return the results for further processing by AQL.
The syntax is TERMINOLOGY(operation, service_api, params_uri)
, having all arguments of type String, while the return type depends on the invoked operation
and service_api
and is considered to be of Any
type.
The operation
argument specifies the action to perform over the specified value set or code system. It is not restricted to any particular value as different terminology servers may use different ways of specifying the operation and its parameters. The specified operation can be properly dereferenced by the implementation of the service_api
.
Some examples of typical operations are:
-
expand: Expand a value set; this will retrieve all the codes contained in a value set as an explicit set.
-
validate: Validate a code in a value set; this will check if a given code belongs to a value set. Recall that the value set may comprise all the codes in a code system (terminology).
-
lookup: Look-up a code; this will retrieve all the information concerning one particular code. Examples are retrieving the preferred form to display, synonyms, etc.
-
map: Map a code (translate between value sets); this will convert (find an equivalent code) from one Value Set to another one based on a predefined mapping available in the external terminology service. Translation precision may not be limited to full equivalence and different kinds of mappings may be possible (wider meaning, equivalent, narrower meaning, etc).
-
subsumes: Subsumption testing; this will determine if a particular terminology concept is a subtype (is-a) of another one. For example, test in SNOMED-CT if
"Myasthenia Gravis | 91637004"
is a subtype of"autoimmune disease | 85828009"
(i.e. test if85828009
subsumes91637004
).
The service_api
argument represents an identifier of a type of terminology service. This is not the service endpoint as such, neither a service name, but an identifier for the kind/flavour of terminology syntax that is being sent. It provides the information to send requests related to the ValueSet, operation
and parameters to the right terminology service. In addition, it is closely related to the strategy to parse the terminology service response in a format acceptable by the query processor.
Some examples of typical values for service_api
are:
Service_api | Description |
---|---|
hl7.org/fhir/4.0 |
FHIR Terminology Service v4.0 |
hl7.org/fhir/3.0 |
FHIR Terminology Service v3.0 |
hl7.org/fhir/1.0 |
FHIR Terminology Service v1.0 |
ots.oceanhealthsystems.com |
Ocean Terminology Service |
bts.better.care |
Better Terminology Service |
apelon.dts.org/4.7.1 |
Apelon Terminology Service, version 4.7.1 |
example.terminology-service.api/v1 |
An example of a fictive terminology service |
The params_uri
argument consists of URI path and query sections in compliance with IETF RFC 3986. Most common operations use this to identify the value set on which to perform the operation
. A value set may be a full code system such as the full set of codes in SNOMED-CT or LOINC. The value set URI may include the version and edition sections (e.g. Australian, US, etc). When the release and version identifiers are not provided, it is up to the external terminology server to decide which default version will be used.
Some examples of typical URI parameters are:
URI parameters | Description |
---|---|
https://vsac.nlm.nih.gov/valueset/2.16.840.1.113762.1.4.1010.2 |
|
http://snomed.info/sct/32506021000036107/version/20200331?fhir_vs=refset/1200161000168100 |
implicit VS defined for Australian Refset for Vaccination Reason |
The TERMINOLOGY
function can be used in the following ways in the WHERE
clause:
-
as a right-hand operand of
matches
operator, to directly match function results, e.g.WHERE e/value/defining_code/code_string matches TERMINOLOGY('expand', 'hl7.org/fhir/4.0', 'http://snomed.info/sct?fhir_vs=isa/50697003')
-
embedded between curly braces (
{}
) of the right-hand operand ofmatches
, for merging explicit codes with the function results (in which case the AQL interpreter is responsible for generating a valid list of codes during semantic analysis), e.g.WHERE e/value/defining_code/code_string matches {'http://snomed.info/id/442031002', TERMINOLOGY('expand', 'hl7.org/fhir/4.0', 'http://snomed.info/sct?fhir_vs=isa/50697003')}
-
as a Boolean value expression, when an assertion is being tested, e.g.
WHERE TERMINOLOGY('validate', 'hl7.org/fhir/r4', 'system=http://snomed.info/sct&code=122298005&url=http://snomed.info/sct?fhir_vs&display=Astrovirus RNA assay') = true
More examples can be found in the openEHR AQL examples document.
3.10. Expressions
AQL expressions provide a formal way of evaluating, transforming or comparing values. An expression is a combination of one or more literals, operators, variables, predicates, parameters or functions that evaluates to a value.
3.10.1. Class expressions
Class expressions are used in the FROM
clause for two purposes:
-
indicating the constraints on RM classes so as to scope the data source for the query. For instance,
EHR e[ehr_id/value='123456']
indicates that the required data must be from a specificEHR
withehr_id
value'123456'
; whileCOMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]
indicates the required data must be from or must be associated with a Composition instance with archetype id -openEHR-EHR-COMPOSITION.report.v1
. -
defining an RM class variable that may be used by other clauses to indicate the required data or data items on which query criteria are applied. The example below uses the class expression to define a variable
e
which is used by theSELECT
clause indicating all relevantehr_id
values are retrieved, and a variablec
used by theWHERE
clause indicating that the query criteria is set on the Compositiontemplate_id
value.SELECT e/ehr_id/value FROM EHR e CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.referral.v1] WHERE c/archetype_details/template_id/value = $templateId
Class expressions syntax include three parts. A class expression must have part one and at least one of part two or part three.
-
part one (mandatory): RM class name, such as
EHR
,COMPOSITION
,OBSERVATION
etc. -
part two (optional): AQL variable name
-
part three (optional): a standard predicate or an archetype predicate.
Two examples of a class expression are shown below:
// EHR class, class identifier/variable, and a standard predicate EHR e[ehr_id/value=$ehrUid]
// COMPOSITION class, class identifier/variable, and an archetype predicate COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]
3.10.2. Identified expression
An identified expression specifies matching criteria in the WHERE
clause and comes in two forms.
The first form is an expression formed by a unary operator, such as NOT
or EXISTS
, described in Section 3.8.2.
The second is a binary operator expression, formed as follows:
-
left-hand operand: an identified path or a function having an identified path as an argument;
-
operator: a comparison operator;
-
right-hand operand: one of:
The following examples illustrate the binary form.
-
The left-hand operand is an identified path; the right-hand operand is a primitive value:
o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 c/archetype_details/template_id/value = 'health_encounter'
-
The left-hand operand is an identified path and right-hand operand is a parameter:
c/archetype_details/template_id/value = $templateParameter
-
The left-hand operand is an identified path; the right-hand operand is a pattern:
c/archetype_details/template_id/value LIKE '*encounter*'
-
Both left-hand operand and right-hand operand are identified paths:
o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value > o1/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value
3.11. Literals
A literal value, sometimes known as constant in certain SQL implementations, is a fixed value that is supplied directly in the AQL statement, not derived from an identified path, a variable, or a column alias name. It can be used in the SELECT
clause as a column expression, or in WHERE
class as part of an identified expression.
The type of a literal value can be any of the AQL built-in types, where strings, dates, times and datetimes should be quoted, while numbers and booleans are not quoted.
Another supported literal value is the NULL
keyword, indicating that the data item is either missing or is unknown.
3.12. Built-in Types
3.12.1. Integer data
Integers are represented as numeric literals, such as 1
, 2
, 365
or -1
. Commas or periods for breaking long numbers are not allowed. Hexadecimal representation is not supported.
3.12.2. Real data
Real numbers are the decimal literals that include a decimal point, such as 3.1415926
or -1.0
. Commas or periods for breaking long numbers are not allowed.
3.12.3. Boolean data
Boolean values are indicated using the case-insensitive literals true
or false
.
3.12.4. String data
All string literals are enclosed in double or single quotes, such as 'encounter'
or "encounter"
. Line breaks are not supported.
3.12.5. Dates and Times
These types are treated as string literals and should comply with the rules for string quoting. The format of the date, time and datetime types should comply with the ISO 8601 Date and Time format specification, which allow the basic or extended formats to be used. The ISO 8601 semantics assumed by openEHR is defined in the classes of the base.foundation_types.time package. It is strongly recommended using the extended format for dates, times and datetimes. Complete or partial values are allowed.
Note
|
The underlying types of date/time strings are inferred by the AQL processor from the context (metadata associated with the path to which the date/time value is compared with, or by interpretation of the ISO 8601 format), enabling them to be processed as date/time quantities rather than literal strings by AQL engines. |
Examples are as below:
// complete datetime in ISO 8601 extended format: AQL grammar identifies this value as a datetime value. WHERE composition/context/start_time <= '1986-01-01T12:00:00.000+09:30'
// complete date in ISO 8601 extended format: AQL grammar identifies this value as a date value. WHERE composition/context/start_time <= '1986-01-01'
// a complete date in ISO 8601 basic format: AQL grammar identifies it as a string value. WHERE composition/context/start_time > '19860101'
// ISO 8601 time extended format: AQL grammar identifies it as a string value. WHERE composition/context/start_time < '12:00:00+09:30'
3.13. Query structure
3.13.1. Overview
AQL structure was briefly introduced in Section 2.3. This section describes the syntax in more formal detail. An AQL query may include the following clauses which must occur in the order shown.
-
SELECT
(mandatory) -
FROM
(mandatory) -
WHERE
(optional) -
ORDER BY
(optional) -
LIMIT
(optional)
An AQL statement must at least contain the SELECT
and FROM
clauses.
3.13.2. FROM
The FROM
clause is used to specify the subset of all the data available in a repository,
that will be available for the rest of the clauses to filter and return specific data (WHERE
, SELECT
, etc).
The data available should be defined by an information model, known here as the Reference Model (RM), that supports the use of archetype- and template-based constraint models, as described in the openEHR Architecture Overview and the Archetype Technology Overview.
All the classes referenced by the FROM
clause should be defined by the RM. For instance,
if the Reference Model is the openEHR Reference Model, the FROM
clause should only include classes like
EHR
, COMPOSITION
, OBSERVATION
, EVALUATION
, and so on.
Note
|
The AQL specification is not bound to a specific Reference Model, but to use a given RM, it should comply with some requirements: it should be an Object Oriented Model and should follow the dual-model approach. |
Then, the expressions in the WHERE
clause will filter data, but only from the subset
defined by the FROM
clause. In the SELECT
clause, a final projection is applied,
selecting only the matched data that should be retrieved in the query result. In addition
to the filtering done in the WHERE
clause, the class expressions could hold predicates
that also allow filtering data.
In summary:
-
FROM
: Defines the subset of data over which the query will be executed; -
WHERE
: Filters data from the defined subset, leaving only the data that we need; -
predicates in class expressions: further filters for data in the subset;
-
SELECT
: picks the exact data that we need to return from the query, taken from the matched data in the previous two points.
3.13.2.1. Syntax
A simple FROM
clause consists of three parts: keyword FROM
, class expression and/or containment constraints, e.g.
FROM EHR e [ehr_id/value=$ehrUid] CONTAINS COMPOSITION c[openEHR-EHR-COMPOSITION.report.v1]
3.13.2.2. Containment
Since archetypes are in hierarchical structure, AQL has a containment constraint which specifies the hierarchical relationships between parent and child data items. The FROM
clause utilises this hierarchical constraint along with class expression to determine the data source to which the AQL query is applied.
The syntax of containment constraint expression is very simple: using CONTAINS
operator between two class expressions. Left-hand operand class expression is the parent object of the right-hand operand class expression, e.g.
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1]
Logical operators AND
and OR
and parentheses ()
are used when multiple containment constrains are required, e.g.
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] AND COMPOSITION c1 [openEHR-EHR-COMPOSITION.report.v1]
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] CONTAINS (OBSERVATION o [openEHR-EHR-OBSERVATION.laboratory-hba1c.v1] OR OBSERVATION o1 [openEHR-EHR-OBSERVATION.laboratory-glucose.v1])
The logical operator NOT
can be used in combination with CONTAINS
to express an exclusion constraint, i.e. the absence of any containment relationship between a parent and potential child items. The following example matches referral COMPOSITIONs that do not contain laboratory test result OBSERVATIONs:
EHR e CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.referral.v1] NOT CONTAINS OBSERVATION o [openEHR-EHR-OBSERVATION.laboratory_test_result.v1]
3.13.3. WHERE
An AQL WHERE
clause is used to represent further criteria applied to the data items within the objects declared in the FROM
clause. A WHERE
clause expresses the query criteria that cannot be represented in other AQL clauses, such as criteria on archetype id, composition committal date/time, and the criteria on in which order the returned results should be listed.
3.13.3.1. Syntax
The WHERE
clause syntax has the following parts (in order): keyword WHERE
and one or more identified expressions. Logical operators AND
, OR
, NOT
and parenthesis ()
can be used to combine multiple identified expressions. Examples:
WHERE c/name/value=$nameValue AND c/archetype_details/template_id/value=$templateId
WHERE (c/name/value = $nameValue OR c/archetype_details/template_id/value = $templateId) AND o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140
3.13.4. SELECT
A SELECT
clause specifies what data is to be retrieved by the AQL query. The data can be any types from RM or any built-in data types (see Section 3.12).
3.13.4.1. Syntax
The syntax always starts with the keyword SELECT
, optionally followed by DISTINCT
, optionally followed by TOP
(deprecated - see below), followed by one or more column expressions.
A column expression is formed by an identified path, a function, a literal value or plain variable name defined in the FROM
clause.
Where a variable name is specified, the full object of the type associated with the variable is retrieved, such as a COMPOSITION
, OBSERVATION
etc.
Where a function is specified, the call has to be done using the specified arguments, and the results are returned.
Where an identified path is specified, the data item(s) having that archetype path are returned.
Each column expression may have a name alias renaming the associated data.
When the SELECT
clause contains multiple column expressions, they are separated using a comma.
Some examples are shown below.
SELECT c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c
SELECT c FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c
SELECT true AS dangerousBP, "alert" as indication, count(*) as counter FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude>= 160 OR obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude>= 110
3.13.4.2. DISTINCT
The DISTINCT
modifier specify whether duplicate rows should be filtered out from result set.
By default, an AQL query returns all data items selected by the FROM
and WHERE
clauses as rows in the result set. A row is considered to be duplicated in the result set if there is at least one other row with the same value for each corresponding column expression. DISTINCT
specifies removal of all such duplicate rows from the result set.
Below is an example using DISTINCT
modifier to filter out duplicate rows:
SELECT DISTINCT c/name/value AS Name, c/composer/name AS Composer FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c
3.13.4.3. TOP
Deprecated: Starting with Release 1.1.0, the use of TOP
modifier is deprecated in favour of the LIMIT
clause combined with ORDER BY
(see below).
The TOP
will be removed in a future major release of AQL specification.
It is not allowed to use TOP
while also using LIMIT
clause in the same query.
The TOP
syntax was borrowed from SQL language for representing the number of result sets that should be returned by the AQL query. It uses BACKWARD
and FORWARD
to indicate the direction where to start to get the number of results to be returned.
It starts with keyword TOP
, followed by an integer number and/or the direction (i.e. BACKWARD
, FORWARD
), e.g.
SELECT TOP 10 c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c
3.13.5. ORDER BY
The ORDER BY
clause is used to sort the returned results.
Note
|
If no ORDER BY clause is specified, then the query result doesn’t have any default ordering criteria defined by this specification. Ordering could be defined by each implementation or be random. In terms of compliance to this specification, default ordering in results is undefined.
|
3.13.5.1. Syntax
The syntax starts with keyword ORDER BY
followed by one or more sorting expressions.
A sorting expression consists of an identified path, optionally followed by one of the DESC
, DESCENDING
, ASC
, or ASCENDING
keyword, indicating the sorting type (descending or ascending), e.g.
ORDER BY c/name/value DESC
Sorting rows assumes that data identified by the path (from the sorting expression) are comparable. It implies the use of a specific operators like equal, less-than and greater-than (all available to primitives and Ordered
types) on data identified by path.
When sorting type is not specified, ASC
(ascending) is assumed by default.
Multiple sorting expressions are separated using a comma. If two rows are equal according to the leftmost expression, they are compared according to the next sorting expression.
3.13.6. LIMIT
The LIMIT
clause is used to constrain the result set generated by the rest of the query.
It is often used together with an OFFSET
subclause.
The OFFSET
specifies the number of rows to skip before it starts to return rows from the query.
The LIMIT
specifies the number of rows to return after the OFFSET
clause has been processed.
Note that when using LIMIT
, deterministic behavior requires that the ORDER BY
clause is also used to constrain the result in a unique order.
Note
|
As of Release 1.1.0, the use of the LIMIT clause in combination with the ORDER BY clause is recommended instead of the deprecated TOP alternative.
It is not allowed to use LIMIT while also using TOP clause in the same query.
|
Where DISTINCT
modifier is used (see Section 3.13.4.2), the LIMIT
and OFFSET
applies to remaining rows, after duplicates were filtered out from result set.
3.13.6.1. Syntax
The syntax was borrowed from SQL language, similar to PostgreSQL and MySQL implementation.
It consists of two parts: keyword LIMIT
followed by number, optionally followed by OFFSET
followed by a number:
LIMIT row_count [OFFSET offset]
Both row_count
and offset
are integer numbers, row_count
minimal value is 1, while minimal value for offset
is 0.
If a LIMIT row_count
is given, no more than that many rows will be returned (but possibly fewer, if the query itself yields fewer rows).
The OFFSET offset
is optional, and when not specified then offset
= 0 is assumed.
The LIMIT row_count OFFSET offset
is used to get results in a paginated way. For instance, when offset
= 0 and row_count
= 10, the result will have at most 10 rows, starting with first row. The offset of the first row is 0 (not 1). Then with offset
= 10 and row_count
= 10, the result will contain next 10 rows at most (i.e. rows 11-20 respectively). So increasing offset
by row_count
, allows to get all the results in a paginated way.
The following example will return 10 rows, representing the second page of result set, ordered by event start time:
SELECT c/name/value AS Name, c/context/start_time AS date_time, c/composer/name AS Composer FROM EHR e[ehr_id/value=$ehrUid] CONTAINS COMPOSITION c ORDER BY c/context/start_time LIMIT 10 OFFSET 10
4. Result structure
The result structure of an AQL query in its raw form is a 2-dimensional table, conceptually similar to the tabular projection generated by an SQL query. Formally, it may be thought of as having the type Array<Array<Any>>
, where Any
is understood as the superclass of all object types, including primitive and built-in types, or the NULL
value in the case where the data item is missing or unknown.
In practical use, AQL queries will normally be executed through a library or service API, which is likely to provide an 'annotated' result structure, i.e. a data structure that includes meta-data such as column descriptors, thus enabling efficient result processing. Such annotated results are not formally defined by this specification, and are considered an artefact of the relevant API or service definition.
An example of such an extended structure is the 'Result set' used in openEHR, described by the openEHR Abstract Platform Query Service. A concrete expression for use with the openEHR REST Query API includes this structure, mapped to HTTP and JSON structures.
5. Writing AQL manually
This section describes how to write an AQL query statement manually. An example is used to illustrate how to write an AQL statement. The query scenario is:
"Get all abnormal blood pressure values that are recorded in a health encounter for a specific patient."
Note
|
More examples can be found in the openEHR AQL examples document. |
The process described here may provide some hints for constructing AQL builder tools.
5.1. Step 1. The FROM clause
A FROM
clause is to scope the data source for the query. Normally the first step is to determine the FROM
clause, which has EHR class expression and archetype class expressions.
-
EHR class expression
This determines whether this query is applied to a single EHR or all EHRs. The latter is called a population query. If it is for all EHRs, there is no need to specify
ehr_id/value
in theFROM
clause. Otherwise, you need to either specify a specific value or use a query parameter forehr_id/value
in aFROM
clause. For this particular scenario, it is for a single EHR. A parameter calledehrUid
is used for theehr_id/value
, giving this AQL query statement:FROM EHR [ehr_id/value=$ehrUid]
-
Archetype expressions
-
To write archetype expressions in the
FROM
clause, the archetypes required for the scenario must be determined, since archetypes are used to represent clinical concepts. The easy way to identify archetypes for a scenario is to identify clinical concepts mentioned in the scenario. Each clinical concept is generally associated with one archetype. Two clinical concepts are mentioned in the above scenario: 'blood pressure' and 'health encounter'. From these concepts the two archetypes used in query are identified: an Observation archetypeopenEHR-EHR-OBSERVATION.blood_pressure.v1
, and a Composition archetypeopenEHR-EHR-COMPOSITION.encounter.v1
. -
Determine whether a variable name is needed for each archetype class expression or not. This is useful if the reference to the archetype class expression is required by other AQL clauses. Without using variable names for the archetype expressions, the two archetype expressions are:
COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]
-
-
Containment expression
The openEHR RM is used to identify the hierarchical relationships among the found archetypes so as to define the containment expression. For example, a Composition archetype is the parent of an Observation archetype, so the AQL query looks as follows:
FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION [openEHR-EHR-OBSERVATION.blood_pressure.v1]
5.2. Step 2. The WHERE clause
The WHERE
clause represents all query criteria that cannot be represented in the FROM
clause. To write a WHERE
clause expression, the following is needed:
-
Determine the criteria. The criteria required by the scenario are abnormal blood pressure values. Different guidelines may have different definitions for abnormal blood pressure values. Here
abnormal blood pressure
is interpreted to mean 1) the 'systolic pressure' value is greater than or equal to140
; OR 2) the 'diastolic pressure' value is greater than or equal to90
. -
State an identified expression for criterion 1), as follows:
-
state the identified path for the systolic data value. A class variable name is needed as the reference to the blood pressure class expression defined in the
FROM
clause, so a variable nameobs
is added into theFROM
clause. A path to systolic data value is also required to form the identified path. The path to the systolic data value is/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value
. -
choose the appropriate operator:
>=
is used for criterion 1. -
add the criteria value, here,
140
. The query statement including the identified expression for criterion 1) is shown below:FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140
-
-
Write an identified expression for criterion 2). Following the previous instruction to write the identified expression for criterion 2) which is shown as:
obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90
-
Find the appropriate boolean operator to join criterion 1) and 2) - OR. The query statement looks like:
FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value >= 140 OR obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value >= 90
5.3. Step 3. The SELECT clause
Writing a SELECT
clause depends on determining:
-
What return data is required. The afore-mentioned scenario requires abnormal blood pressure values as return values, including both systolic pressure and diastolic pressure.
-
Write the appropriate identified path to the required item (variable name is required if the containing object is required). Multiple identified paths are separated using commas. For this particular scenario, two identified paths are found for systolic and diastolic data value by following Step 2. The completed query statement looks as follows:
SELECT obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude, obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude >= 140 OR obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude >= 90
5.4. Ordering and pagination
If the query scenario would be:
"Get the latest 5 abnormal blood pressure values that were recorded in a health encounter for a specific patient."
then the followings are needed:
-
add an extra column with event start_time values and order the results based on that column:
SELECT obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude AS systolic, obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude AS diastlic, c/context/start_time AS date_time FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude >= 140 OR obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude >= 90 ORDER BY c/context/start_time DESC
-
use
LIMIT
clause to retrieve only the first needed rows, i.e. the latest 5 rows:SELECT obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude AS systolic, obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude AS diastlic, c/context/start_time AS date_time FROM EHR [ehr_id/value=$ehrUid] CONTAINS COMPOSITION c [openEHR-EHR-COMPOSITION.encounter.v1] CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1] WHERE obs/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude >= 140 OR obs/data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/magnitude >= 90 ORDER BY c/context/start_time DESC LIMIT 5
6. AQL Syntax Specification
The following ANTLR4 grammar expresses the AQL syntax.
//
// description: ANTLR4 parser grammar for Archetype Query Language (AQL)
// authors: Sebastian Iancu, Code24, Netherlands
// Teun van Hemert, Nedap, Netherlands
// Thomas Beale, Ars Semantica UK, openEHR Foundation Management Board
// contributors: This version of the grammar is a complete rewrite of previously published antlr3 grammar,
// based on current AQL specifications in combination with grammars of AQL implementations.
// The openEHR Foundation would like to recognise the following people for their contributions:
// - Chunlan Ma & Heath Frankel, Ocean Health Systems, Australia
// - Bostjan Lah, Better, Slovenia
// - Christian Chevalley, EHRBase, Germany
// - Michael Böckers, Nedap, Netherlands
// support: openEHR Specifications PR tracker <https://specifications.openehr.org/releases/QUERY/open_issues>
// copyright: Copyright (c) 2021- openEHR Foundation
// license: Creative Commons CC-BY-SA <https://creativecommons.org/licenses/by-sa/3.0/>
//
parser grammar AqlParser;
options { tokenVocab=AqlLexer; }
selectQuery
: selectClause fromClause whereClause? orderByClause? limitClause? SYM_DOUBLE_DASH? EOF
;
selectClause
: SELECT DISTINCT? top? selectExpr (SYM_COMMA selectExpr)*
;
fromClause
: FROM fromExpr
;
whereClause
: WHERE whereExpr
;
orderByClause
: ORDER BY orderByExpr (SYM_COMMA orderByExpr)*
;
limitClause
: LIMIT limit=INTEGER (OFFSET offset=INTEGER) ?
;
selectExpr
: columnExpr (AS aliasName=IDENTIFIER)?
;
fromExpr
: containsExpr
;
whereExpr
: identifiedExpr
| NOT whereExpr
| whereExpr AND whereExpr
| whereExpr OR whereExpr
| SYM_LEFT_PAREN whereExpr SYM_RIGHT_PAREN
;
orderByExpr
: identifiedPath order=(DESCENDING|DESC|ASCENDING|ASC)?
;
columnExpr
: identifiedPath
| primitive
| aggregateFunctionCall
| functionCall
;
containsExpr
: classExprOperand (NOT? CONTAINS containsExpr)?
| containsExpr AND containsExpr
| containsExpr OR containsExpr
| SYM_LEFT_PAREN containsExpr SYM_RIGHT_PAREN
;
identifiedExpr
: EXISTS identifiedPath
| identifiedPath COMPARISON_OPERATOR terminal
| functionCall COMPARISON_OPERATOR terminal
| identifiedPath LIKE likeOperand
| identifiedPath MATCHES matchesOperand
| SYM_LEFT_PAREN identifiedExpr SYM_RIGHT_PAREN
;
classExprOperand
: IDENTIFIER variable=IDENTIFIER? pathPredicate? #classExpression
| VERSION variable=IDENTIFIER? (SYM_LEFT_BRACKET versionPredicate SYM_RIGHT_BRACKET)? #versionClassExpr
;
terminal
: primitive
| PARAMETER
| identifiedPath
| functionCall
;
identifiedPath
: IDENTIFIER pathPredicate? (SYM_SLASH objectPath)?
;
pathPredicate
: SYM_LEFT_BRACKET (standardPredicate | archetypePredicate | nodePredicate) SYM_RIGHT_BRACKET
;
standardPredicate
: objectPath COMPARISON_OPERATOR pathPredicateOperand
;
archetypePredicate
: ARCHETYPE_HRID
| PARAMETER
;
nodePredicate
: (ID_CODE | AT_CODE) (SYM_COMMA (STRING | PARAMETER | TERM_CODE | AT_CODE | ID_CODE))?
| ARCHETYPE_HRID (SYM_COMMA (STRING | PARAMETER | TERM_CODE | AT_CODE | ID_CODE))?
| PARAMETER
| objectPath COMPARISON_OPERATOR pathPredicateOperand
| objectPath MATCHES CONTAINED_REGEX
| nodePredicate AND nodePredicate
| nodePredicate OR nodePredicate
;
versionPredicate
: LATEST_VERSION
| ALL_VERSIONS
| standardPredicate
;
pathPredicateOperand
: primitive
| objectPath
| PARAMETER
| ID_CODE
| AT_CODE
;
objectPath
: pathPart (SYM_SLASH pathPart)*
;
pathPart
: IDENTIFIER pathPredicate?
;
likeOperand
: STRING
| PARAMETER
;
matchesOperand
: SYM_LEFT_CURLY valueListItem (SYM_COMMA valueListItem)* SYM_RIGHT_CURLY
| terminologyFunction
| SYM_LEFT_CURLY URI SYM_RIGHT_CURLY
;
valueListItem
: primitive
| PARAMETER
| terminologyFunction
;
primitive
: STRING
| numericPrimitive
| DATE | TIME | DATETIME
| BOOLEAN
| NULL
;
numericPrimitive
: INTEGER
| REAL
| SCI_INTEGER
| SCI_REAL
| SYM_MINUS numericPrimitive
;
functionCall
: terminologyFunction
| name=(STRING_FUNCTION_ID | NUMERIC_FUNCTION_ID | DATE_TIME_FUNCTION_ID | IDENTIFIER) SYM_LEFT_PAREN (terminal (SYM_COMMA terminal)*)? SYM_RIGHT_PAREN
;
aggregateFunctionCall
: name=COUNT SYM_LEFT_PAREN (DISTINCT? identifiedPath | SYM_ASTERISK) SYM_RIGHT_PAREN
| name=(MIN | MAX | SUM | AVG) SYM_LEFT_PAREN identifiedPath SYM_RIGHT_PAREN
;
terminologyFunction
: TERMINOLOGY SYM_LEFT_PAREN STRING SYM_COMMA STRING SYM_COMMA STRING SYM_RIGHT_PAREN
;
// (deprecated)
top
: TOP INTEGER direction=(FORWARD|BACKWARD)?
;
//
// description: ANTLR4 lexer grammar for Archetype Query Language (AQL)
// authors: Sebastian Iancu, Code24, Netherlands
// Teun van Hemert, Nedap, Netherlands
// Thomas Beale, Ars Semantica UK, openEHR Foundation Management Board
// contributors: This version of the grammar is a complete rewrite of previously published antlr3 grammar,
// based on current AQL specifications in combination with grammars of AQL implementations.
// The openEHR Foundation would like to recognise the following people for their contributions:
// - Chunlan Ma & Heath Frankel, Ocean Health Systems, Australia
// - Bostjan Lah, Better, Slovenia
// - Christian Chevalley, EHRBase, Germany
// - Michael Böckers, Nedap, Netherlands
// support: openEHR Specifications PR tracker <https://specifications.openehr.org/releases/QUERY/open_issues>
// copyright: Copyright (c) 2021- openEHR Foundation
// license: Creative Commons CC-BY-SA <https://creativecommons.org/licenses/by-sa/3.0/>
//
lexer grammar AqlLexer;
channels {
COMMENT_CHANNEL
}
// SKIP
WS: [ \t\r\n]+ -> skip;
UNICODE_BOM: (
'\uEFBBBF' // UTF-8 BOM
| '\uFEFF' // UTF16_BOM
| '\u0000FEFF' // UTF32_BOM
) -> skip;
COMMENT: (
SYM_DOUBLE_DASH ' ' ~[\r\n]* ('\r'? '\n' | EOF)
| SYM_DOUBLE_DASH ('\r'? '\n' | EOF)
) -> channel(COMMENT_CHANNEL);
// Keywords
// Common Keywords
SELECT: S E L E C T ;
AS: A S ;
FROM: F R O M ;
WHERE: W H E R E ;
ORDER: O R D E R ;
BY: B Y ;
DESC: D E S C ;
DESCENDING: D E S C E N D I N G ;
ASC: A S C ;
ASCENDING: A S C E N D I N G ;
LIMIT: L I M I T ;
OFFSET: O F F S E T ;
// other keywords
DISTINCT: D I S T I N C T ;
VERSION : V E R S I O N ;
LATEST_VERSION : L A T E S T '_' V E R S I O N ;
ALL_VERSIONS : A L L '_' V E R S I O N S ;
NULL: N U L L ;
// deprecated
TOP: T O P ;
FORWARD: F O R W A R D ;
BACKWARD: B A C K W A R D ;
// Operators
// Containment operator
CONTAINS : C O N T A I N S ;
// Logical operators
AND : A N D ;
OR : O R ;
NOT : N O T ;
EXISTS: E X I S T S ;
// Comparison operators
COMPARISON_OPERATOR: SYM_EQ | SYM_NE | SYM_GT | SYM_GE | SYM_LT | SYM_LE ;
LIKE: L I K E ;
MATCHES: M A T C H E S ;
// functions
STRING_FUNCTION_ID: LENGTH | CONTAINS | POSITION | SUBSTRING | CONCAT_WS | CONCAT ;
NUMERIC_FUNCTION_ID: ABS | MOD | CEIL | FLOOR | ROUND ;
DATE_TIME_FUNCTION_ID: NOW | CURRENT_DATE_TIME | CURRENT_DATE | CURRENT_TIMEZONE | CURRENT_TIME ;
// string functions
LENGTH: L E N G T H ;
POSITION: P O S I T I O N ;
SUBSTRING: S U B S T R I N G ;
CONCAT: C O N C A T ;
CONCAT_WS: C O N C A T '_' W S ;
// numeric functions
ABS: A B S ;
MOD: M O D ;
CEIL: C E I L ;
FLOOR: F L O O R ;
ROUND: R O U N D ;
// date and time functions
CURRENT_DATE: C U R R E N T '_' D A T E ;
CURRENT_TIME: C U R R E N T '_' T I M E ;
CURRENT_DATE_TIME: C U R R E N T '_' D A T E '_' T I M E ;
NOW: N O W ;
CURRENT_TIMEZONE: C U R R E N T '_' T I M E Z O N E ;
// aggregate function
COUNT: C O U N T ;
MIN: M I N ;
MAX: M A X ;
SUM: S U M ;
AVG: A V G ;
// other functions
TERMINOLOGY: T E R M I N O L O G Y ;
// other, identifiers
PARAMETER: '$' IDENTIFIER_CHAR;
//
// ======================= Lexical rules ========================
// The followings are copies of https://github.com/openEHR/adl-antlr/blob/master/src/main/antlr/adl2/base_lexer.g4 rules, with some modifications required by AQL
//
// ---------- various ADL2 codes -------
ID_CODE : 'id' CODE_STR ;
AT_CODE : 'at' CODE_STR ;
fragment CODE_STR : ('0' | [1-9][0-9]*)+ ( '.' ('0' | [1-9][0-9]* ))* ;
// ---------- Delimited Regex matcher ------------
CONTAINED_REGEX: '{'WS* SLASH_REGEX WS* (';' WS* STRING)? WS* '}';
fragment SLASH_REGEX: '/' SLASH_REGEX_CHAR+ '/';
fragment SLASH_REGEX_CHAR: ~[/\n\r] | ESCAPE_SEQ | '\\/';
// ---------- ISO8601 Date/Time values ----------
fragment ISO8601_DATE
: YEAR MONTH DAY
| YEAR '-' MONTH '-' DAY
;
fragment ISO8601_TIME
: HOUR MINUTE SECOND ('.' MICROSECOND)? TIMEZONE?
| HOUR ':' MINUTE ':' SECOND ('.' MICROSECOND)? TIMEZONE?
;
fragment ISO8601_DATE_TIME
: YEAR MONTH DAY ('T' HOUR MINUTE SECOND ('.' MICROSECOND)? TIMEZONE?)?
| YEAR '-' MONTH '-' DAY ('T' HOUR ':' MINUTE ':' SECOND ('.' MICROSECOND)? TIMEZONE?)?
;
fragment MICROSECOND: [0-9][0-9][0-9] ;
fragment TIMEZONE: 'Z' | [+-] HOUR ( ':'? MINUTE )? ; // hour offset, e.g. `+09:30`, or else literal `Z` indicating +0000.
fragment YEAR: [0-9][0-9][0-9][0-9] ; // Year in ISO8601:2004 is 4 digits with 0-filling as needed
fragment MONTH: ( [0][1-9] | [1][0-2] ) ; // month in year
fragment DAY: ( [0][1-9] | [12][0-9] | [3][0-1] ) ; // day in month
fragment HOUR: ( [01][0-9] | [2][0-3] ) ; // hour in 24 hour clock
fragment MINUTE: [0-5][0-9] ; // minutes
fragment SECOND: [0-5][0-9] ; // seconds
// ------------------- special word symbols --------------
fragment SYM_TRUE: T R U E ;
fragment SYM_FALSE: F A L S E ;
// ---------------------- Identifiers ---------------------
ARCHETYPE_HRID : ARCHETYPE_HRID_ROOT '.v' VERSION_ID ;
fragment ARCHETYPE_HRID_ROOT : (NAMESPACE '::')? IDENTIFIER_CHAR '-' IDENTIFIER_CHAR '-' IDENTIFIER_CHAR '.' ARCHETYPE_CONCEPT_ID ;
fragment VERSION_ID : DIGIT+ ('.' DIGIT+)* ( ( '-rc' | '-alpha' ) ( '.' DIGIT+ )? )? ;
IDENTIFIER: IDENTIFIER_CHAR;
fragment IDENTIFIER_CHAR : ALPHA_CHAR WORD_CHAR* ;
fragment ARCHETYPE_CONCEPT_ID : ALPHA_CHAR NAME_CHAR* ;
// --------------------- composed primitive types -------------------
// coded term shortcut e.g. 'ICD10AM(1998)::F23', 'ISO_639-1::en' or 'snomed_ct(3.1)::3415004|cyanosis|'
TERM_CODE : TERM_CODE_CHAR+ ( '(' TERM_CODE_CHAR+ ')' )? '::' TERM_CODE_CHAR+ ('|' ~[|[\]]+ '|')?;
fragment TERM_CODE_CHAR: NAME_CHAR | '.';
// URIs - simple recogniser based on https://tools.ietf.org/html/rfc3986 and
// http://www.w3.org/Addressing/URL/5_URI_BNF.html
URI : URI_SCHEME ':' URI_HIER_PART ( '?' URI_QUERY )? ('#' URI_FRAGMENT)? ;
fragment URI_HIER_PART : ( '//' URI_AUTHORITY ) URI_PATH_ABEMPTY
| URI_PATH_ABSOLUTE
| URI_PATH_ROOTLESS
| URI_PATH_EMPTY;
fragment URI_SCHEME : ALPHA_CHAR ( ALPHA_CHAR | DIGIT | '+' | '-' | '.')* ;
fragment URI_AUTHORITY : ( URI_USERINFO '@' )? URI_HOST ( ':' URI_PORT )? ;
fragment URI_USERINFO: (URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS | ':' )* ;
fragment URI_HOST : URI_IP_LITERAL | URI_IPV4_ADDRESS | URI_REG_NAME ; //TODO: ipv6
fragment URI_PORT: DIGIT*;
fragment URI_IP_LITERAL : '[' URI_IPV6_LITERAL ']'; //TODO, if needed: IPvFuture
fragment URI_IPV4_ADDRESS : URI_DEC_OCTET '.' URI_DEC_OCTET '.' URI_DEC_OCTET '.' URI_DEC_OCTET ;
fragment URI_IPV6_LITERAL : HEX_QUAD (':' HEX_QUAD )* '::' HEX_QUAD (':' HEX_QUAD )* ;
fragment URI_DEC_OCTET : DIGIT | [1-9] DIGIT | '1' DIGIT DIGIT | '2' [0-4] DIGIT | '25' [0-5];
fragment URI_REG_NAME: (URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS)*;
fragment HEX_QUAD : HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;
fragment URI_PATH_ABEMPTY: ('/' URI_SEGMENT ) *;
fragment URI_PATH_ABSOLUTE: '/' ( URI_SEGMENT_NZ ( '/' URI_SEGMENT )* )?;
fragment URI_PATH_NOSCHEME: URI_SEGMENT_NZ_NC ( '/' URI_SEGMENT )*;
fragment URI_PATH_ROOTLESS: URI_SEGMENT_NZ ( '/' URI_SEGMENT )*;
fragment URI_PATH_EMPTY: ;
fragment URI_SEGMENT: URI_PCHAR*;
fragment URI_SEGMENT_NZ: URI_PCHAR+;
fragment URI_SEGMENT_NZ_NC: ( URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS | '@' )+; //non-zero-length segment without any colon ":"
fragment URI_PCHAR: URI_UNRESERVED | URI_PCT_ENCODED | URI_SUB_DELIMS | ':' | '@';
//fragment URI_PATH : '/' | ( '/' URI_XPALPHA+ )+ ('/')?;
fragment URI_QUERY : (URI_PCHAR | '/' | '?')*;
fragment URI_FRAGMENT : (URI_PCHAR | '/' | '?')*;
fragment URI_PCT_ENCODED : '%' HEX_DIGIT HEX_DIGIT ;
fragment URI_UNRESERVED: ALPHA_CHAR | DIGIT | '-' | '.' | '_' | '~';
fragment URI_RESERVED: URI_GEN_DELIMS | URI_SUB_DELIMS;
fragment URI_GEN_DELIMS: ':' | '/' | '?' | '#' | '[' | ']' | '@'; //TODO: migrate to [/?#...] notation
fragment URI_SUB_DELIMS: '!' | '$' | '&' | '\'' | '(' | ')'
| '*' | '+' | ',' | ';' | '=';
// According to IETF http://tools.ietf.org/html/rfc1034[RFC 1034] and http://tools.ietf.org/html/rfc1035[RFC 1035],
// as clarified by http://tools.ietf.org/html/rfc2181[RFC 2181] (section 11)
fragment NAMESPACE: LABEL ('.' LABEL)* ;
fragment LABEL: ALPHA_CHAR (NAME_CHAR|URI_PCT_ENCODED)* ;
// --------------------- atomic primitive types -------------------
BOOLEAN: SYM_TRUE | SYM_FALSE ;
INTEGER: DIGIT+;
REAL: DIGIT* '.' DIGIT+;
SCI_INTEGER: INTEGER E_SUFFIX;
SCI_REAL: REAL E_SUFFIX;
fragment E_SUFFIX: E [-+]? DIGIT+ ;
DATE
: SYM_SINGLE_QUOTE ISO8601_DATE SYM_SINGLE_QUOTE
| SYM_DOUBLE_QUOTE ISO8601_DATE SYM_DOUBLE_QUOTE
;
TIME
: SYM_SINGLE_QUOTE ISO8601_TIME SYM_SINGLE_QUOTE
| SYM_DOUBLE_QUOTE ISO8601_TIME SYM_DOUBLE_QUOTE
;
DATETIME
: SYM_SINGLE_QUOTE ISO8601_DATE_TIME SYM_SINGLE_QUOTE
| SYM_DOUBLE_QUOTE ISO8601_DATE_TIME SYM_DOUBLE_QUOTE
;
STRING
: SYM_SINGLE_QUOTE ( ESCAPE_SEQ | UTF8CHAR | OCTAL_ESC | ~('\\'|'\'') )* SYM_SINGLE_QUOTE
| SYM_DOUBLE_QUOTE ( ESCAPE_SEQ | UTF8CHAR | OCTAL_ESC | ~('\\'|'"') )* SYM_DOUBLE_QUOTE
;
fragment ESCAPE_SEQ: '\\' ['"?abfnrtv\\] ;
// ------------------- character fragments ------------------
fragment NAME_CHAR: WORD_CHAR | '-' ;
fragment WORD_CHAR: ALPHANUM_CHAR | '_' ;
fragment ALPHANUM_CHAR: ALPHA_CHAR | DIGIT ;
fragment ALPHA_CHAR: [a-zA-Z];
fragment UTF8CHAR: '\\u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT ;
fragment DIGIT: [0-9];
fragment HEX_DIGIT: [0-9a-fA-F];
fragment OCTAL_ESC: '\\' [0-3] OCTAL_DIGIT OCTAL_DIGIT | '\\' OCTAL_DIGIT OCTAL_DIGIT | '\\' OCTAL_DIGIT;
fragment OCTAL_DIGIT: [0-7];
// ---------- symbols ----------
SYM_SEMICOLON: ';' ;
SYM_LT: '<' ;
SYM_GT: '>' ;
SYM_LE: '<=' ;
SYM_GE: '>=' ;
SYM_NE: '!=' ;
SYM_EQ: '=' ;
SYM_LEFT_PAREN: '(' ;
SYM_RIGHT_PAREN: ')' ;
SYM_COMMA: ',';
SYM_SLASH: '/';
SYM_ASTERISK: '*';
SYM_PLUS: '+';
SYM_MINUS: '-';
SYM_LEFT_BRACKET: '[';
SYM_RIGHT_BRACKET: ']';
SYM_LEFT_CURLY: '{';
SYM_RIGHT_CURLY: '}';
SYM_DOUBLE_DASH: '--';
fragment SYM_SINGLE_QUOTE: '\'';
fragment SYM_DOUBLE_QUOTE: '"';
// ------------------- Fragment letters ---------------------
fragment A: [aA];
fragment B: [bB];
fragment C: [cC];
fragment D: [dD];
fragment E: [eE];
fragment F: [fF];
fragment G: [gG];
fragment H: [hH];
fragment I: [iI];
fragment J: [jJ];
fragment K: [kK];
fragment L: [lL];
fragment M: [mM];
fragment N: [nN];
fragment O: [oO];
fragment P: [pP];
fragment Q: [qQ];
fragment R: [rR];
fragment S: [sS];
fragment T: [tT];
fragment U: [uU];
fragment V: [vV];
fragment W: [wW];
fragment X: [xX];
fragment Y: [yY];
fragment Z: [zZ];