Geocoding with UDFs

In this tutorial you will learn how to add geocoding to your data using Exasol UDFs.

This tutorial uses database objects in the free public demo system hosted by Exasol. To get access to the public demo system, sign up here.

Introduction

The examples in this tutorial make use of GeoPy, a popular Python client for geocoding web services. The public demo system contains a pre-built GeoPy distribution file in the following BucketFS location:

/buckets/bucketfs1/showcase/geopy-VERSION-py3-none-any.whl

In Exasol UDFs, a sys.path.extend call makes use of corresponding Python modules. This method only works for pure Python modules that do not contain extensions or code in other programming languages. If this does not work in your database, use the method described in Adding New Packages to Existing Script Languages.

Geocoding example

The following example shows how to determine the latitude and longitude for a given place using a UDF script.

  1. Open your schema in the demo database.

    OPEN SCHEMA <YOUR SCHEMA>;
  2. Copy the following script into your client and execute it to create the UDF.

    --/
    CREATE or REPLACE PYTHON3 SET SCRIPT geopy_geocode_long_lat(placename varchar(200)) EMITS (place varchar(20000), latitude double, longitude double) AS
    import glob
    sys.path.extend(glob.glob('/buckets/bucketfs1/showcase/*'))
    import geopy.geocoders
    from geopy.geocoders import Nominatim
    geopy.geocoders.options.default_user_agent = "my-application" 
    # IMPORTANT: "my-application" must be changed to the name of your application

    geolocator = Nominatim()

    def run(ctx):
     while True:
      try:
       geo = geolocator.geocode(ctx.placename, timeout=None
       place=ctx.placename
       lat=geo.latitude
       lo=geo.longitude
       ctx.emit(place, lat, lo)
      except:
       ctx.emit(place + " (Unavailable)",None,None)
      if not ctx.next(): break
    /
  3. Test the UDF against a specific address using the dual table in the demo database:

    SELECT geopy_geocode_long_lat('10 Downing Street, London, UK') FROM dual;
  4. Create a new table in your schema to run the UDF against a table with location data. For example:

    CREATE TABLE
        "TEST_BOROUGH" AS
    SELECT
        *
    FROM
        NYC_TAXI.TAXI_ZONES lIMIT 15;
  5. Execute the UDF against the data set.

    SELECT DISTINCT * from(
        SELECT
            geopy_geocode_long_lat(tz.BOROUGH)
        FROM
            "TEST_BOROUGH" tz
        ORDER BY
            1 ASC
    )
    ;

    Result: 

    * PLACE LATITUDE LONGITUDE
    1 Bronx 40.85045845 -73.8404035580209
    2 Brooklyn 40.6501038 -73.9495823
    3 Manhattan 40.7900869 -73.9598295
    4 Queens 40.6524927 -73.7914214158161
    5 Staten Island 40.5834557 -74.1496048

Reverse geocoding

The following examples show how to do reverse geocoding using latitude and longitude.

  1. Open your schema in the demo database.

    OPEN SCHEMA <YOUR SCHEMA>;
  2. Copy the following script into your client and execute it to create the UDF.

    --/
    CREATE OR REPLACE PYTHON3 SET SCRIPT geopy_reverse_geocode(lat DOUBLE, lo DOUBLE) EMITS (latitude double, longitude double, address VARCHAR(20000)) AS
    import glob
    sys.path.extend(glob.glob('/buckets/bucketfs1/showcase/*'))
    import geopy.geocoders
    from geopy.geocoders import Nominatim
    geopy.geocoders.options.default_user_agent = "my-application" 
    # IMPORTANT: "my-application" must be changed to the name of your application
     
    geolocator = Nominatim()
     
    def run(ctx):
     while True:
      geo = geolocator.reverse([ctx.lat, ctx.lo])
      lat=ctx.lat
      lo=ctx.lo
      address=geo.address
      ctx.emit(lat, lo, address)
      if not ctx.next(): break
    /
  3. Test the UDF against specific coordinates using the dual table in the demo database:

    SELECT geopy_reverse_geocode(51.5034066, -0.1275923) FROM dual;
  4. Create a new table in your schema to run the UDF against a table with location data. For example:

    CREATE TABLE
       taxi_coordinates AS
    SELECT
        t.id,
        t.pickup_longitude,
        t.pickup_latitude,
        t.dropoff_longitude,
        t.dropoff_latitude
    FROM
        NYC_TAXI.TRIPS t limit 10;
  5. Execute the UDF against the data set.

    SELECT
        geopy_reverse_geocode(pickup_latitude, pickup_longitude)
    FROM
        taxi_coordinates;

    Result: 

    * LATITUDE LONGITUDE ADDRESS
    1 40.717982000000006 -73.99393 101, Chrystie Street, Lower East Side, Manhattan Community Board 3, Manhattan, New York County, City of New York, New York, 10002, United States
    2 40.705454 -74.013501 Evacuation Day Plaza, Whitehall, Manhattan, New York County, City of New York, New York, 10275, United States
    3 40.760014000000005 -73.988322 Riu Plaza Hotel, 305, West 46th Street, Hell's Kitchen, Manhattan Community Board 4, Manhattan, New York County, City of New York, New York, 10036, United States
    4 40.760525 -73.991272 645, 9th Avenue, Hell's Kitchen, Manhattan Community Board 4, Manhattan, New York County, City of New York, New York, 10036, United States
    5 40.743637 -74.003475 130, 9th Avenue, Manhattan Community Board 4, Manhattan, New York County, City of New York, New York, 10011, United States
    6 40.714915000000005 -74.009779 75, West Broadway, Tribeca, Manhattan, New York County, City of New York, New York, 10007, United States
    7 40.64464 -73.782466 John F. Kennedy International Airport, JFK Access Road, Queens, Queens County, City of New York, New York, 11430, United States
    8 40.733792 -73.98675 Hattan Cleaners, 141, 3rd Avenue, Gramercy, Manhattan Community Board 6, Manhattan, New York County, City of New York, New York, 10003, United States
    9 40.764456 -73.972851 Harry Cipriani, 781, 5th Avenue, Manhattan Community Board 8, Manhattan, New York County, City of New York, New York, 10022, United States
    10 40.758942000000005 -73.974343 34, East 52nd Street, Midtown East, Manhattan Community Board 5, Manhattan, New York County, City of New York, New York, 10022, United States