JSON_EXTRACT

Purpose

This function extracts one or more SQL values from a valid JSON string. For each SQL value, JSON_EXTRACT requires a JSON path expression that specifies the value to extract.

Syntax

json_extract::=

json_extract_1

json_extract_2

json_input_clause::=

json_input_clause

Usage Notes

  • The parameter json_expr defines the input JSON string for the function. Exasol accepts an arbitrary expression of type (VAR)CHAR. To return an SQL value, json_expr has to follow the syntax rules on JSON.org.
  • JSON_EXTRACT is based on the Exasol UDF framework and behaves like a SCALAR EMITS UDF. Thus, the same restrictions apply (for example, there is no Virtual Schema pushdown). Although JSON_EXTRACT relies on the UDF framework, internally, it uses a fast native JSON processing framework.
  • Each json_path_expr parameter defines a single JSON path for JSON_EXTRACT. The function accepts an arbitrary expression of type (VAR)CHAR for each path. For the path syntax rules in Exasol, refer to the JSON Path Expressions section.
  • JSON_EXTRACT supports an arbitrary number of json_path_expr parameters. For each json_path_expr, the EMITS clause has to define a column_name and a data_type for the JSON values it returns. The order of the columns in the EMITS clause corresponds to the order of the json_path_expr parameters. Thus, the first column_name and data_type belongs to the first json_path_expr.
  • If any json_path_expr does not identify an SQL value, JSON_EXTRACT throws an error. Thus, ERROR ON EMPTY is the default for each json_path_expr.
  • If any error occurs during the JSON processing, JSON_EXTRACT throws an error by default. Thus, ERROR ON ERROR is the default for each json_path_expr.
  • If the last json_path_expr parameter is the string literal '$.error()', the error behavior of JSON_EXTRACT changes to LOG ON EMPTY and LOG ON ERROR for all json_path_expr parameters.
  • If '$.error()' is present, the EMITS clause requires an extra 'error column' with data_type (VAR)CHAR to log all occurring errors. It is possible to freely choose the column_name and (VAR)CHAR size for the 'error column' in the EMITS clause.
  • If the error behavior for a path is LOG ON EMPTY and the path does not identify an SQL value, JSON_EXTRACT returns NULL for the path and writes the problem into the 'error column'.
  • If the error behavior for a path is LOG ON ERROR and an error occurs during the JSON processing, JSON_EXTRACT returns NULL for the path and writes the problem into the 'error column'.
  • To know more about the different possible errors, refer to the JSON Error Handling section.
  • To specify the ON EMPTY and ON ERROR behavior for each path separately, refer to the JSON Path Expressions section.
  • With JSON_EXTRACT, it is possible to extract several rows from a single json_expr. This makes it possible to create an extra row for each element in a JSON array. To extract several rows, the json_path_expr has to contain a '#' character to split up all expressions identified by a nested path into several rows.
  • All nested path prefixes ending with a '#' character must be a prefix of the longest nested path.
  • JSON_EXTRACT is only available from Exasol 6.2.7 and requires that the command-line parameter '-enableJsonFunctions' is set.

Example

CREATE OR REPLACE TABLE json_input(id INT, json VARCHAR(2000000));
INSERT INTO json_input VALUES(1, '{"firstname" : "Ann", "surname" : "Smith", "age" : 29}');
INSERT INTO json_input VALUES(2, '{"firstname" : "Sue", "surname" : "Smith", "age" : 42}');INSERT INTO json_input VALUES(3, '{"firstname" : "Claire", "surname" : "Doe", "age" : "Five"}');
INSERT INTO json_input VALUES(4, '{"firstname" : "John", "name" : "Doe", "age" : 50}');

SELECT id,
       JSON_EXTRACT (json,
           '$.firstname'
           '$.surname'
           '$.age'
           '$.error()'
       )
       EMITS(
             forename VARCHAR(100), 
             surname VARCHAR(100), 
             age INT,
             error_column VARCHAR(2000000)
       )
FROM json_input;
Result
ID FIRSTNAME SURNAME AGE ERROR_COLUMN
1 Ann Smith 29 NULL
2 Sue Smith 42 NULL
3 Claire Doe NULL [{ "column" : "AGE", "error" : "cast error" }]
4 John NULL 50 [{ "column" : "SURNAME", "error" : "no SQL/JSON item" }]

CREATE OR REPLACE TABLE json_input2(id INT, json VARCHAR(2000000));
INSERT INTO json_input2 VALUES(1, '{"data" : [{"country" : "United Kingdom", "customers" : [{"name" : "John Doe", "phone numbers" : [{"type" : "private", "number" : "+4412345678"}]}]}]}');
INSERT INTO json_input2 VALUES(2, '{"data" : [{"country" : "Germany", "customers" : [{"name" : "Hans Meier", "phone numbers" : [{"type" : "private", "number" : "+4912345678"}, {"type" : "mobile", "number" : "+49987654321"}]} ,{"name" : "Achim Schmidt", "phone numbers" : [{"type" : "private", "number" : "+49214365879"}, {"type" : "mobile", "number" : "+4989674523"}]}]}]}');
SELECT id,
       JSON_EXTRACT(json, 
            '$.data.country',
            '$.data.customers#.name',
            '$.data.customers#."phone numbers"#.type',
            '$.data.customers#."phone numbers"#.number'
       )
      EMITS(
             country VARCHAR(100), 
             name VARCHAR(100),
             phone_number_type VARCHAR(100),
             phone_number VARCHAR(100)
       )
FROM json_input2
ORDER BY id, name, phone_number;
Result
ID COUNTRY NAME PHONE_NUMBER_TYPE PHONE_NUMBER
1 United Kingdom John Doe private +4412345678
2 Germany Achim Schmidt private +49214365879
2 Germany Achim Schmidt mobile +4989674523
2 Germany Hans Meier private +4912345678
2 Germany Hans Meier mobile +49987654321