JSON_EXTRACT
Purpose
This function extracts one or more SQL values from a valid JSON string. For each SQL value, JSON_EXTRACT requires a JSON path expression that specifies the value to extract.
Syntax
json_extract::=
json_input_clause::=
Usage Notes
- The parameter json_expr defines the input JSON string for the function. Exasol accepts an arbitrary expression of type (VAR)CHAR. To return an SQL value, json_expr has to follow the syntax rules on JSON.org.
- JSON_EXTRACT is based on the Exasol UDF framework and behaves like a SCALAR EMITS UDF. Thus, the same restrictions apply (for example, there is no Virtual Schema pushdown). Although JSON_EXTRACT relies on the UDF framework, internally, it uses a fast native JSON processing framework.
- Each json_path_expr parameter defines a single JSON path for JSON_EXTRACT. The function accepts an arbitrary expression of type (VAR)CHAR for each path. For the path syntax rules in Exasol, refer to the JSON Path Expressions section.
- JSON_EXTRACT supports an arbitrary number of json_path_expr parameters. For each json_path_expr, the EMITS clause has to define a column_name and a data_type for the JSON values it returns. The order of the columns in the EMITS clause corresponds to the order of the json_path_expr parameters. Thus, the first column_name and data_type belongs to the first json_path_expr.
- If any json_path_expr does not identify an SQL value, JSON_EXTRACT throws an error. Thus, ERROR ON EMPTY is the default for each json_path_expr.
- If any error occurs during the JSON processing, JSON_EXTRACT throws an error by default. Thus, ERROR ON ERROR is the default for each json_path_expr.
- If the last json_path_expr parameter is the string literal '$.error()', the error behavior of JSON_EXTRACT changes to LOG ON EMPTY and LOG ON ERROR for all json_path_expr parameters.
- If '$.error()' is present, the EMITS clause requires an extra 'error column' with data_type (VAR)CHAR to log all occurring errors. It is possible to freely choose the column_name and (VAR)CHAR size for the 'error column' in the EMITS clause.
- To specify the ON EMPTY and ON ERROR behavior for each path separately, refer to the JSON Path Expressions section.
- If the error behavior for a path is LOG ON EMPTY and the path does not identify an SQL value, JSON_EXTRACT returns NULL for the path and writes the problem into the 'error column'.
- If the error behavior for a path is LOG ON ERROR and an error occurs during the JSON processing, JSON_EXTRACT returns NULL for the path and writes the problem into the 'error column'.
- To learn more about the different possible errors, refer to the JSON Error Handling section.
- With JSON_EXTRACT, it is possible to extract several rows from a single json_expr. This makes it possible to create an extra row for each element in a JSON array. To extract several rows, the json_path_expr has to contain a '#' character to split up all expressions identified by a nested path into several rows.
- All nested path prefixes ending with a '#' character must be a prefix of the longest nested path.
Example
CREATE OR REPLACE TABLE json_input(id INT, json VARCHAR(2000000));
INSERT INTO json_input VALUES(1, '{"firstname" : "Ann", "surname" : "Smith", "age" : 29}');
INSERT INTO json_input VALUES(2, '{"firstname" : "Sue", "surname" : "Smith", "age" : 42}');INSERT INTO json_input VALUES(3, '{"firstname" : "Claire", "surname" : "Doe", "age" : "Five"}');
INSERT INTO json_input VALUES(4, '{"firstname" : "John", "name" : "Doe", "age" : 50}');
SELECT id,
JSON_EXTRACT (json,
'$.firstname',
'$.surname',
'$.age',
'$.error()'
)
EMITS(
forename VARCHAR(100),
surname VARCHAR(100),
age INT,
error_column VARCHAR(2000000)
)
FROM json_input;
Result
ID | FIRSTNAME | SURNAME | AGE | ERROR_COLUMN |
1 | Ann | Smith | 29 | NULL |
2 | Sue | Smith | 42 | NULL |
3 | Claire | Doe | NULL | [{ "column" : "AGE", "error" : "cast error" }] |
4 | John | NULL | 50 | [{ "column" : "SURNAME", "error" : "no SQL/JSON item" }] |
CREATE OR REPLACE TABLE json_input2(id INT, json VARCHAR(2000000));
INSERT INTO json_input2 VALUES(1, '{"data" : [{"country" : "United Kingdom", "customers" : [{"name" : "John Doe", "phone numbers" : [{"type" : "private", "number" : "+4412345678"}]}]}]}');
INSERT INTO json_input2 VALUES(2, '{"data" : [{"country" : "Germany", "customers" : [{"name" : "Hans Meier", "phone numbers" : [{"type" : "private", "number" : "+4912345678"}, {"type" : "mobile", "number" : "+49987654321"}]} ,{"name" : "Achim Schmidt", "phone numbers" : [{"type" : "private", "number" : "+49214365879"}, {"type" : "mobile", "number" : "+4989674523"}]}]}]}');
SELECT id,
JSON_EXTRACT(json,
'$.data.country',
'$.data.customers#.name',
'$.data.customers#."phone numbers"#.type',
'$.data.customers#."phone numbers"#.number'
)
EMITS(
country VARCHAR(100),
name VARCHAR(100),
phone_number_type VARCHAR(100),
phone_number VARCHAR(100)
)
FROM json_input2
ORDER BY id, name, phone_number;
Result
ID | COUNTRY | NAME | PHONE_NUMBER_TYPE | PHONE_NUMBER |
1 | United Kingdom | John Doe | private | +4412345678 |
2 | Germany | Achim Schmidt | private | +49214365879 |
2 | Germany | Achim Schmidt | mobile | +4989674523 |
2 | Germany | Hans Meier | private | +4912345678 |
2 | Germany | Hans Meier | mobile | +49987654321 |