Documentation

Complete guide to using GooseDB in your applications

1. Getting Started

GooseDB is an extended database system based on DuckDB.
It has been modified to a server/client architecture by supporting the PostgreSQL wire protocol. (It is not open source.)

Note:
GooseDB can be considered as a PostgreSQL for OLAP purposes.
GooseDB Lakehouse/Standalone has not been released yet.

2. Installation

2-1. Install

$ unzip goosedb_release.zip
$ chmod 755 goose_release


After downloading the zip file, you can save it to an appropriate directory.

3. Daemon Start/Stop

3-1. Daemon Start

$./goose_release start


When the daemon starts, it listens on port 1234 by default.
This can be changed through the goosedb.cfg configuration file.

3-2. Directory Structure

When the daemon starts, the "conf/", "data/", and "logs/ directories along with their related files are automatically created.

./goosedb_release
./conf/goosedb.cfg
./data/goosedb.ddb
      /goosedb.wal
./logs/goosedb_1234.log
      /goosedb_1234.pid
      /restapi_5678.log

Note:
After copying the existing DuckDB file to the "data/" directory and renaming it to goosedb.ddb, start the GooseDB daemon.
It will then recognize the file as a GooseDB database file.

3-3. Daemon Stop

$ cd logs/
$ kill -9 `cat goosedb_1234.pid`

4. Configuration

4-1. configuration file(goosedb.cfg)

Configuration details for GooseDB can be specified in the goosedb.cfg file.
You can adjust GooseDB's behavior through the goosedb.cfg configuration file.
If the `goosedb.cfg` file does not exist when starting the daemon, it is automatically created.
Lines starting with ';' or '#' are treated as comments.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[Main]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
PgwirePort = 1234
RestapiPort = 5678

# AllowedIp is repeated as many times as specified.
# If this parameter is enabled, in addition to password authentication, IP checking will also be performed.
# If you want to remove IP checking, comment it out.
# Only CIDR(ip/subnetmask) notation is supported
;;AllowedIp = 127.0.0.1/32
;;AllowedIp = 192.168.1.0/24
AllowedIp = 0.0.0.0/0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[Log]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
# common
MaxSizeMB = 1       # max file size (mb)
MaxBackups = 3        # max backup file number
MaxAgeDay = 28        # duration days
Compress = true
# goosedb_<port>.log
LogConnections=true
LogDisconnections=true
LogStatement = ddl     # none, ddl, mod, all

# restapi_<port>.log
EnableRestapiLogging = true
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[GooseDB]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
VirAdminUser= gooseadmin    # NEVER MODIFY!
VirAppRwUser= goose         # NEVER MODIFY!
VirAppRoUser= goosero       # NEVER MODIFY!
VirFileDbName=goosedb
VirMemoryDbName=memdb
FetchUnitRows=1000
StatementTimeoutSecs=600       # 0=notimeout
MaxPreparedStatements=100       # maximum number of cached prepared statements per session, 0=unlimit
 
DataDir = ./data
AccessMode=read_write   # read_write OR  read_only

# AutoLoadExtension is repeated as many times as specified.
# once at bootup - default-loaded jemalloc, parquet
;AutoLoadExtension = icu
;AutoLoadExtension = inet
;AutoLoadExtension = postgres
AutoLoadExtension = parquet
AutoLoadExtension = json

# AutoExecAttachCmd is repeated as many times as specified.
# once at bootup
;;AutoExecAttachCmd=ATTACH 'host=localhost port=9099 dbname=postgres user=postgres' AS pg9099 (TYPE postgres)
;;AutoExecAttachCmd=ATTACH 'host=localhost port=5432 dbname=postgres user=postgres' AS pg5432 (TYPE postgres)

# AutoExecSetCmd is repeated as many times as specified.
# Each time a session is established, the following commands are executed sequentially.
AutoExecSetCmd=SET ieee_floating_point_ops = false
A utoExecSetCmd=SET autoload_known_extensions=1
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[TrinoBridge]  ;; exprimental
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
VirAppUser = trino
#VirAppEncPwd = zhIPa1t/B3jXGb/m7I551I+bWoqL3+J7EVwNVi0o1NQ=    ;; 000000
VirDbName = trinobridge
FetchUnitRows=1000
StatementTimeoutSecs=600       # 0=notimeout
 
TrinoServerUri = http://127.0.0.1:8080
TrinoCatalog = iceberg
TrinoSchema = default
TrinoUser = trinoadmin
TrinoEncPwd = zhIPa1t/B3jXGb/m7I551I+bWoqL3+J7EVwNVi0o1NQ=    ;; 000000

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[Internal]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
CfgVer=1   ;; NEVER MODIFY!

Note:
While GooseDB Freeware/Standalone uses a single data file,
GooseDB Lakehouse internally manages data by dividing it into multiple files in chunks, allowing it to handle data sizes up to petabytes.

5. Connect

5-1. psql connect

The default password for gooseadmin is 000000 (six zeros).
Unlike DuckDB, multiple users can connect simultaneously and write to the same table.

psql -h 127.0.0.1 -U gooseadmin -d goosedb -p 1234

5-2. show command

The following commands are newly added in GooseDB.

Command Description
show all It displays all objects.
show all tables It displays all tables.
show config select * from duckdb_settings() order by 1
show config all It is the same as "show config"
show config thread select * from duckdb_settings() where name ilike '%thread%' order by 1
show current_catalog It displays information about the currently logged-in database.
show current_database It is the same as show current_catalog.
show current_schema It displays the current schema.
show databases It displays a list of all databases.
show extensions It displays the list of extensions.
show hostname It displays the OS hostname.
show indexes It displays all indexes.
show memory It shows the memory usage of GooseDB.
show os It displays the OS platform information where GooseDB is installed.
show port It provides the GooseDB port.
show schemas It displays schema information.
show sequences It shows sequence information.
show sid It displays the session ID of the connected session.
show tables It displays the list of tables.
show user It shows the name of the logged-in user.
show version It displays the GooseDB version information.
show views It displays the views.

5-3. SQL

It supports all DuckDB SQL syntax.
Please report related bugs to goose@goosedb.net.

6. PostgreSQL Protocol

6-1. Tool Compatibility

GooseDB implements the PostgreSQL wire protocol, allowing seamless integration with existing PostgreSQL tools and clients.
While it fully supports the PostgreSQL Wire Protocol at the protocol level, compatibility with all PostgreSQL tools remains a separate issue.
This is because various PostgreSQL client tools may encounter errors when querying PostgreSQL-specific catalogs (dictionaries).
Internally, GooseDB ensures compatibility with client tools by utilizing a query rewrite approach.

Note:
DBeaver, PgBouncer, PgCat, Apache Superset, Grafana, and the PostgreSQL dblink extension have completed compatibility testing.

6-2. Data Types

It is recommended to use only the following data types to ensure compatibility with PostgreSQL.
Although all DuckDB column types can be used, PostgreSQL compatibility is not guaranteed.

Name Aliases
boolean bool
float float4, real
double float8
decimal(prec, scale) numeric(prec, scale)
smallint int2
int int4
bigint int8
date
time
timetz
timestamp
timestamptz
varchar text
json
boolean[] bool[]
smallint[] int2[]
int[] int4[]
bigint[] int8[]
float[] float4[]
double[] float8[]
date[]
time[]
timetz[]
timestamp[]
timestamptz[]
varchar[] text[]

6-3. Programming Language Support

You can program using Python3, Go, Node.js, C#, etc., by treating GooseDB as PostgreSQL and utilizing a PostgreSQL driver.

7. Performance Tuning

7-1. Parameter Tuning

You can adjust tuning-related parameters by modifying the config/goosedb.cfg file.

cd config/
vi goosedb.cfg
...
AutoExecSetCmd=SET threads=4
AutoExecSetCmd=SET max_memory='12GiB'
...
			

8. Security

8-1. SSL/TLS & SCRAM-SHA-256

GooseDB currently does not support SSL/TLS and SCRAM-SHA-256 authentication.
If these features are required, they can be enabled through PgBouncer.

#> mkdir cert
#> cd cert

#> cat <<EOF > server-ext.cnf
[req]
default_bits = 2048
prompt = no
default_md = sha256
req_extensions = req_ext

[req_ext]
subjectAltName = @alt_names

[alt_names]
DNS.1 = mypostgres.local
DNS.2 = localhost
IP.1 = 127.0.0.1
IP.2 = ::1
EOF

#> cat <<EOF > make_cert.sh
openssl genrsa -out ca.key 4096

# 10 years
openssl req -x509 -new -nodes \
    -key ca.key \
    -sha256 \
    -days 3650 \
    -out ca.crt \
    -subj "/CN=MyCustomCA/O=MyOrganization"

openssl genrsa -out server.key 2048

openssl req -new -key server.key \
    -out server.csr \
    -subj "/CN=mypostgres.local/O=MyOrganization"

openssl x509 -req \
    -in server.csr \
    -CA ca.crt \
    -CAkey ca.key \
    -CAcreateserial \
    -out server.crt \
    -days 3650 \
    -sha256 \
    -extfile ./server-ext.cnf \
    -extensions req_ext

chown 1001 server.* ca.*
chmod 600  server.key ca.key
EOF
#> chmod 755 make_cert.sh
#> ./make_cert.sh

#>cd ..

#> docker run -itd --name pgbouncer.2222 --network=host \
  -v ./cert/:/tmp/cert/ \
  -e PGBOUNCER_DATABASE="goosedb" \
  -e PGBOUNCER_EXTRA_FLAGS="--verbose" \
  -e PGBOUNCER_MAX_PREPARED_STATEMENTS=100 \
  -e PGBOUNCER_MIN_POOL_SIZE=3 \
  -e PGBOUNCER_PASSWORD="000000" \
  -e PGBOUNCER_POOL_MODE="session" \
  -e PGBOUNCER_PORT="2222" \
  -e PGBOUNCER_SERVER_LIFETIME=120 \
  -e PGBOUNCER_SERVER_ROUND_ROBIN=1 \
  -e PGBOUNCER_USERNAME="gooseadmin" \
  -e POSTGRESQL_DATABASE="goosedb" \
  -e POSTGRESQL_HOST="localhost" \
  -e POSTGRESQL_PORT="1234" \
  -e POSTGRESQL_USERNAME="gooseadmin" \
  -e POSTGRESQL_PASSWORD="000000" \
  -e PGBOUNCER_CLIENT_TLS_SSLMODE="require" \
  -e PGBOUNCER_CLIENT_TLS_CA_FILE="/tmp/cert/ca.crt"\
  -e PGBOUNCER_CLIENT_TLS_CERT_FILE="/tmp/cert/server.crt"\
  -e PGBOUNCER_CLIENT_TLS_KEY_FILE="/tmp/cert/server.key" \
  -e PGBOUNCER_AUTH_TYPE="scram-sha-256" \
  bitnami/pgbouncer:latest
$> psql "host=localhost dbname=goosedb user=gooseadmin  password=000000 port=2222 sslmode=require"
Pager usage is off.
psql (16.6 (Ubuntu 16.6-1.pgdg22.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

goosedb=> select version();
                                                          version                                                          
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 16.6 - (GooseDB 1.2.1.0) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit
(1 row)

goosedb=>

9. Monitoring

9-1. Information

You can obtain information about GooseDB by making REST API requests using curl.

$ curl localhost:5678/about
{
  "BuildArchOS": "amd64/linux",
  "BuildDate": "2025-01-19(012305)",
  "GooseDB Ver": "GooseDB Freeware(For Non-Commercial Use Only) v1.2.1.0 (DuckDB v1.2.1)",
  "LicenseOwner": "KSS",
  "LicensePeriod": "~ 2027-01-01 09:00:00 +0900 KST",
  "LicenseType": "Freeware(For Non-Commercial Use Only)",
  "MaxConnections": "10"
}

9-2. Sessions

You can monitor by making GET requests to the REST API using curl.
The results are displayed in JSON.

$ curl localhost:5678/jsessions
__STAGE__# curl localhost:5678/jsessions
{
  "active": 0,
  "idle": 11,
  "idle in transaction": 1,
  "total": 12,
  "zsession": [
    {
      "sid": 1,
      "sid_crttime": 1737425770,
      "database": "goosedb",
      "username": "goose",
      "read_only": false,
      "application_name": "",
      "host_addr": "10.104.0.6:55960",
      "step": "Auth",
      "begin_nano": 1737425770656767697,
      "end_nano": 1737425770657884774,
      "duration": "1.117077ms",
      "state": "idle",
      "in_transaction": false,
      "sql": ""
    },
    {
      "sid": 2,
      "sid_crttime": 1737425770,
      "database": "goosedb",
      "username": "goose",
      "read_only": false,
      "application_name": "",
      "host_addr": "10.104.0.6:55968",
      "step": "Auth",
      "begin_nano": 1737425770990457247,
      "end_nano": 1737425770991877983,
      "duration": "1.420736ms",
      "state": "idle",
      "in_transaction": false,
....

Note:
SQL> show sid; -- identify my session id
SQL> kill 9; -- To terminate another session, run "kill SID" using its session id.

In the non-freeware version, output in HTML format is available.

$ curl localhost:5678/sessions