Import Geospatial Data from CSV and GeoJSON

This section explains how to import geospatial data in CSV and GeoJSON format to an Exasol database.

You can store and analyze geospatial data in an Exasol database using the GEOMETRY datatype. This section provides examples of how to import geospatial data from a CSV file and a GeoJSON file, and how to use SQL functions to perform analytics and geo joins on the imported data.

Spatial reference systems

The system table EXA_SPATIAL_REF_SYS contains more than 3000 different spatial reference systems that you can use for the GEOMETRY datatype. These reference systems define geographical coordinates on Earth with different accuracy and using different properties. For example, the reference system SRID 31466 can only reference locations within Germany.

In the following examples we will use SRID 4326 and SRID 2163. SRID 4326 uses degrees as a unit, while SRID 2163 uses meters. Converting between different coordinate systems is often necessary. You can use the function ST_TRANSFORM for this.

For more details about the built-in geospatial functions in Exasol, see Geospatial Data.

Working with spatial data in Exasol

The following example shows how to create a table with a geometry column, and how to insert and query data. We are using GEOMETRY(4326) because the provided coordinates are in degrees. In the SELECT query we then transform the geodata into SRID 2163 to see the distance between the two cities in meters:

CREATE TABLE cities(name VARCHAR(200), geo GEOMETRY(4326));
INSERT INTO cities VALUES('Berlin', 'POINT (13.36963 52.52493)');
INSERT INTO cities VALUES('London', 'POINT (-0.1233 51.5309)');

-- this shows the distance in degrees:
SELECT a.name, b.name, st_distance(a.geo, b.geo) FROM cities a, cities b;
-- this shows the distance in meters:
SELECT a.name, b.name, st_distance(ST_Transform(a.geo, 2163),
     ST_Transform(b.geo, 2163)) FROM cities a, cities b;

GEOMETRY columns can be filled with strings using well-known text representation (WKT). For example, 'POINT (13.36963 52.52493)'.

Importing geodata from CSV files

Geodata is often presented in the form of latitude and longitude values in CSV files or in columns of tables imported from database systems. In the following example we use a file that contains data for international airports and train stations in CSV format. The first two fields of the file are the ID and the name of the airport or train station, and columns 7 and 8 contain the latitude and longitude coordinates.

CREATE OR REPLACE TABLE airports( airport_id INT, name VARCHAR(500), latitude DECIMAL(9,6), longitude DECIMAL(9,6) );

IMPORT INTO airports FROM LOCAL CSV FILE 'D:\airports-extended.dat' (1, 2, 7, 8);

ALTER TABLE airports ADD COLUMN geo GEOMETRY(4326);
UPDATE airports SET geo = 'POINT ('||longitude||' '||latitude||')';

SELECT * FROM airports;

In this example we used DECIMAL(9,6) columns to store the latitude and longitude values, and then added a GEOMETRY column to store a geodata point for each airport.

Similar to the example above, we can now calculate the distance between two airports or train stations in our table using the ST_DISTANCE and ST_TRANSFORM functions:

SELECT ST_DISTANCE(ST_TRANSFORM(a.geo, 2163), ST_TRANSFORM(b.geo, 2163)) 
FROM airports a, airports b WHERE a.name = 'Berlin-Tegel Airport' AND b.name = 'Berlin Hauptbahnhof';

Importing data from GeoJSON

GeoJSON is an often-used format for storing and exchanging geodata. On https://geojson-maps.ash.ms, you can download countries as geodata. When you generate a custom JSON file on this site, the file consists of one JSON object, a so-called FeatureCollection, which contains multiple polygons and multi-polygons, one for each country.

We import the data in three steps:

  1. Load the GeoJSON object as a string into a VARCHAR(2000000) column in a table.

    The GeoJSON string must not contain more than two million characters.

  2. Parse the GeoJSON with the Exasol native function json_extract to produce a table with one row for each country.

    For more details, see JSON_EXTRACT.

  3. Convert each GeoJSON object into a GEOMETRY value using the ST_geomFromGeoJSON UDF script.

After these steps, we have all countries from the GeoJSON file in a GEOMETRY column. Using this method, the whole GeoJSON must not be larger than 2,000,000 characters:

CREATE OR REPLACE TABLE geo_import(v VARCHAR(2000000));

IMPORT INTO GEO_IMPORT FROM LOCAL CSV FILE 'D:\custom.geo.json' COLUMN SEPARATOR = '0x01' COLUMN DELIMITER = '0x02'
-- we use dummy separators and delimiters to import a whole line as one column value

-- json_extract emits a row with two columns for each country: name and geojson
CREATE OR REPLACE VIEW geojson AS 
    SELECT JSON_EXTRACT(v, '$.features#.properties.name', '$.features#.geometry.json()'
    EMITS (name VARCHAR(2000000), geojson VARCHAR(2000000)) FROM geo_import;

-- the ST_GeomFromGeoJSON script can be downloaded using the link on this page
CREATE OR REPLACE TABLE countries AS SELECT name, CAST(ST_GeomFromGeoJSON(geojson) AS GEOMETRY(4326)) AS geo FROM geojson;

SELECT * FROM countries;

An alternative approach is to develop a UDF script that loads the GeoJSON from a server (for example, by using Python package requests or Paramiko) and parsing and iterating over the GeoJSON objects within the script. For each object, a WKT string is emitted, which can later be converted into a GEOMETRY value. This way, there is no limit on the number of characters.

Geo-joins and geo-indexes

The following query performs a geo-join between the two tables, countries, and airports. It finds all airports that are within the country Italy.

SELECT a.* FROM countries c JOIN airports a ON st_contains(c.geo, a.geo)
    WHERE c.name = 'Italy';

You can use Profiling to see that an geo-index is created automatically and that this geo-index is used to execute the geo-join:

ALTER SESSION SET profile = 'ON';

ALTER SESSION SET query_cache = 'OFF';

SELECT a.* FROM countries c JOIN airports a ON st_contains(c.geo, a.geo)
WHERE c.name = 'Italy';
 
FLUSH STATISTICS;

SELECT * FROM exa_statistics.exa_user_profile_last_day 
WHERE session_id = current_session AND command_name = 'SELECT' PREFERRING HIGH stmt_id;

Like other indexes in Exasol, the geo-index is persisted so that it can be used for future queries. Geo-index is maintained automatically when the table data changes, and it is automatically dropped when it is not used for five weeks or longer.

Visualizing geometry data

The SQL client DBeaver can show geospatial data on a map. For information about how to use DBeaver with Exasol, see DBeaver.

Load the data in a result tab in DBeaver. Right-click on a GEOMETRY column, choose View / Format, View as Geometry. Then click on the Panels button on the right to show the Value panel with the map.