Import Geospatial Data from CSV and GeoJSON File

Geospatial Data can be stored and analyzed in the Exasol database using the GEOMETRY datatype. This topic provides you with some examples of how to import geospatial data from a CSV and a GeoJSON file, and use SQL functions to perform analytics and geo joins.

Spacial Reference Systems

In the system table SYS.EXA_SPATIAL_REF_SYS, you will find more than 3000 different spatial reference systems which you can use for the GEOMETRY datatype. These reference systems are there to define points on the earth, but they have different strengths, accuracies, and properties, for example, the SRID 31466 can only reference locations within Germany.

We will use the two SRIDs - 4326 and 2163 in our example. The SRID 4326 is using degrees as a unit, and the SRID 2163 uses meters. Often, conversions are necessary between different coordinate systems. You can use the ST_TRANSFORM function for this.

Working with Spatial Data in Exasol

The following example shows you how to create a table with a geometry column, how to insert and query data. We are using GEOMETRY(4326) because the coordinates provided are in degrees. Later, in the SELECT query, we transform the geodata into SRID 2163 to see the distance between the two cities in meters:

CREATE TABLE cities(name VARCHAR(200), geo GEOMETRY(4326));
INSERT INTO cities VALUES('Berlin', 'POINT (13.36963 52.52493)');
INSERT INTO cities VALUES('London', 'POINT (-0.1233 51.5309)');

-- this shows the distance in degrees:
SELECT a.name, b.name, st_distance(a.geo, b.geo) FROM cities a, cities b;
-- this shows the distance in meters:
SELECT a.name, b.name, st_distance(ST_Transform(a.geo, 2163),
     ST_Transform(b.geo, 2163)) FROM cities a, cities b;

GEOMETRY columns can be filled with strings using the well-known text representation (WKT), for example, 'POINT (13.36963 52.52493)'.

Importing geodata from CSV files

Often, geodata is present in CSV files or in columns of tables that are imported from different database systems in the form of latitude and longitude values. On https://openflights.org/data.html#airport, you can download a CSV file containing international airport data. We are using the extended version of this file, which consists of more than 12,000 international airports and train stations. The first two fields of the file are the id and the name of the airport; in columns 7 and 8, we find its latitude and longitude coordinates:

CREATE OR REPLACE TABLE airports( airport_id INT, name VARCHAR(500), latitude DECIMAL(9,6), longitude DECIMAL(9,6) );

IMPORT INTO airports FROM LOCAL CSV FILE 'D:\airports-extended.dat' (1, 2, 7, 8);

ALTER TABLE airports ADD COLUMN geo GEOMETRY(4326);
UPDATE airports SET geo = 'POINT ('||longitude||' '||latitude||')';

SELECT * FROM airports;

We firstly used DECIMAL(9,6) columns to store the latitude and longitude values, and then we added a GEOMETRY column to store a geodata point for each airport.

Similar to the example above, we can now calculate the distance between two airports or train stations in our table using the ST_DISTANCE and the ST_TRANSFORM function:

select st_distance(ST_Transform(a.geo, 2163), ST_Transform(b.geo, 2163)) 
from airports a, airports b where a.name = 'Berlin-Tegel Airport' and b.name = 'Berlin Hauptbahnhof';

Importing data from GeoJSON

GeoJSON is an often-used format for storing and exchanging geodata. On https://geojson-maps.ash.ms, you can download countries as geodata. When you generate a custom JSON file on this site, the file consists of one JSON object, a so-called FeatureCollection, which contains multiple polygons and multi-polygons, one for each country.

We import the data in three steps:

  1. Load the full GeoJSON string into a VARCHAR(2000000) column in a table (mind that this does not work if the GeoJSON is larger than two million characters).
  2. Parse the GeoJSON with the Exasol native function json_extract to produce a table with one row for each country (see JSON_EXTRACT for more details).
  3. Convert each GeoJSON object into a GEOMETRY value using the ST_geomFromGeoJSON UDF script (download the ST_GeomFromGeoJSON.sql script).

After these steps, we have all countries from the GeoJSON file in a GEOMETRY column. An alternative approach is developing a UDF script that loads the GeoJSON from a server (for example, by using the Python package requests or paramiko) and parsing and iterating over the GeoJSON objects within the script. For each object, a WKT string is emitted, which can later be converted into a GEOMETRY value. This way, there is no limit on the characters. For our approach, the whole GeoJSON must not be larger than 2,000,000 characters:

create or replace table geo_import(v varchar(2000000));
import into geo_import from local CSV file 'D:\custom.geo.json' column separator = '0x01' column delimiter = '0x02'; -- dummy separaters / delimiters to import a whole line as one column value

-- json_extract emits a row with two columns for each country: name and geojson
create or replace view geojson as 
select json_extract(v, '$.features#.properties.name', '$.features#.geometry.json()'
 emits (name varchar(2000000), geojson varchar(2000000)) from geo_import;

-- ST_GeomFromGeoJSON can be downloaded using the link on this page
create or replace table countries as select name, cast(ST_GeomFromGeoJSON(geojson) as geometry(4326)) as geo from geojson;

select * from countries;

Geo-Joins and Geo-Indexes

The following query performs a geo-join between the two tables, countries, and airports. It finds all airports that are within the country Italy.

select a.* from countries c join airports a on st_contains(c.geo, a.geo)
 where c.name = 'Italy';

You can use Profiling to see that an geo-index is created automatically and that this geo-index is used to execute the geo-join:

alter session set profile = 'ON';

alter session set query_cache = 'OFF';

select a.* from countries c join airports a on st_contains(c.geo, a.geo)
 where c.name = 'Italy';
 
flush statistics;

select * from exa_statistics.exa_user_profile_last_day 
where session_id = current_session and command_name = 'SELECT' preferring high stmt_id;

Like other indexes in Exasol, the geo-index is persisted so that it can be used for future queries. Geo-index is maintained automatically when the table data changes, and it is automatically dropped when it is not used for five weeks or longer.

Visualizing Geometry Data

The SQL client DBeaver can show geospatial data on a map. On a result tab, right-click on a GEOMETRY column, choose View / Format, View as Geometry. Then click on the Panels button on the right to show the Value panel with the map.