Database Migration

Pepys-import repository uses Alembic for data migration. Alembic is a database migration tool which is maintained by SQLAlchemy. It is possible to apply the existing migrations to the users’ database via pepys_admin. Don’t forget to check your configuration file before running pepys_import or pepys_admin.

Instructions

If this is your first time using Alembic, please do the followings:

Alembic reads DB configurations from the repository’s config file( config.py). Please check your PEPYS_CONFIG_FILE environment variable. If it doesn’t exist, it means that default_config.ini will be used to create a connection.

If you have an existing DB with tables and values, you have two options:

1. The easiest option is removing your schema (or entire DB for SQLite) completely and creating from the scratch. You might run “alembic upgrade head” which is going to create all DB tables and alembic_version table.

It will stamp Alembic’s head to the latest migration. You might see this migration revision ID in alembic_version table.

2. (NOT SUGGESTED!) If you don’t want to lose your values in the DB, you might create alembic_version table and stamp it manually. For doing that, please run the following commands:

Postgres

CREATE TABLE pepys.alembic_version
(
    version_num VARCHAR(32) NOT NULL,
    CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
);
INSERT INTO pepys.alembic_version VALUES ('XXXXXX');
Please change XXXXX with the latest migration number (revision variable in the most recent script file in migrations/postgres_versions).
If you have the same schema with the base migration script, you won’t have any problem. You can test it with this command:
alembic current
If it doesn’t throw any error, it is okay to go!
Note: It should print the revision ID of the head with the context information:
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
5154f7db278d (head)

SQLite

CREATE TABLE alembic_version
(
    version_num VARCHAR(32) NOT NULL,
    CONSTRAINT alembic_version_pkc PRIMARY KEY (version_num)
);
INSERT INTO alembic_version (version_num)
VALUES ('XXXXXX');

Please change XXXXX with the latest migration number (revision variable in the most recent script file in migrations/sqlite_versions). So, the last line will stamp your database with the latest migration. For controlling your database version, you might run the following command:

alembic upgrade head
Note: It should print the revision ID of the head with the context information:
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
ccc37f794db6 (head)
If get this output, it means that it worked correctly. You can use Alembic according to your needs.

How to use it? (For Developers)

You can add/update/delete Base models (pepys_import.core.store.postgres_db and pepys_import.core.store.sqlite_db). If there is any change, Alembic might automatically generate a migration script:

alembic revision -m "YOUR MESSAGE" --autogenerate

It will create a script under migrations/postgres_versions or migrations/sqlite_versions according to your configuration. Alembic assigns revision IDs using uuid.uuid4(). When you open the created script, you can see the number (revision = XXXXXX) It is strongly suggested to control the autogenerated script before DB migration.

Latest Revision IDs

When we create a new database using SQLAlchemy, we use either one of the values to stamp the database with the latest version: LATEST_POSTGRES_VERSION and LATEST_SQLITE_VERSION.
They are automatically updated when a new migration file is generated.
For this update, a post-hook function is used (under migrations/env.py module):
@write_hooks.register("update_latest_revision")
def update_latest_revision(filename, options):
    with open(filename) as file_:
        lines = file_.readlines()


    # Load json file
    json_file_path = os.path.join(MIGRATIONS_DIRECTORY, "latest_revisions.json")
    with open(json_file_path, "r") as json_file:
        data = json.load(json_file)


    for line in lines:
        # If line has revision variable, i.e. revision = "bcff0ccb4fbd", remove new line
        # character and quote marks, split line into two parts: ('', 'bcff0ccb4fbd'), obtain the
        # second element
        if line.startswith("revision = "):
            split_tokens = line.replace("\n", "").replace('"', "").split("revision = ")
            revision_id = split_tokens[1]


    if "postgres_versions" in filename:
        data["LATEST_POSTGRES_VERSION"] = revision_id
    else:
        data["LATEST_SQLITE_VERSION"] = revision_id


    # Dump updated json
    with open(json_file_path, "w") as json_file:
        json.dump(data, json_file, indent=4)
It does basic text processing. It finds the line which starts with revision = ` and takes the :code:`revision_id from that line. Then, it updates the corresponding key in JSON (LATEST_POSTGRES_VERSION if it’s a migration script for Postgres, LATEST_SQLITE_VERSION if it’s a migration script for SQLite).

We created this script because we had troubles when we initialised a database using SQLAlchemy. We forgot to update the revision id.
Alembic tried to upgrade the database even though the database was in the latest version (SQLAlchemy uses the model classes, and they are always updated). So, instead of manually updating the revision IDs for each migration, we used this approach.

After a migration script is checked, you can also interpret SQL script of it. Please note that this only works for PostgreSQL. Batch operations are used for SQLite to not lose any data. Due to that, it needs further actions to run --sql command. These actions were carried out and caused more troubles. Therefore, it was decided not to support --sql for SQLite.

For generating SQL codes from postgres migration scripts, the following command might be used:

alembic upgrade XXXXX --sql > new_migration.sql. (Please change XXXX with the revision ID of the migration script.)

This command will create a new file named new_migration.sql which has SQL Scripts for all migrations, so that you can play with it or run it in your DB’s console.

You can also give a start point if you would like to: alembic upgrade FROM:TO --sql > new_migration.sql
For example, let’s check a small migration script named 2020-06-30_b311affac706_update_nullability_of_columns_of_.py under migrations/postgres_versions.
You can open the file and pass the first three number of down_revision and revision in this format: alembic upgrade 018:b31 --sql
It has to print this script:
(pepys-import) baris@bariss-MacBook-Pro pepys-import % alembic upgrade 018:b31 --sql
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Generating static SQL
INFO  [alembic.runtime.migration] Will assume transactional DDL.
BEGIN;

INFO  [alembic.runtime.migration] Running upgrade 0189cb24812f -> b311affac706, Update nullability of columns of Activations
-- Running upgrade 0189cb24812f -> b311affac706

ALTER TABLE pepys."Activations" ALTER COLUMN "end" DROP NOT NULL;

ALTER TABLE pepys."Activations" ALTER COLUMN start DROP NOT NULL;

UPDATE pepys.alembic_version SET version_num='b311affac706' WHERE pepys.alembic_version.version_num = '0189cb24812f';

COMMIT;
Note-1: Please keep in mind that you should consider possible failures before applying the migration.

Note-2: SQLite doesn’t support ALTER TABLE syntax. Therefore, render_as_batch=True is passed to the Alembic’s context and batch_alter_table is used in migration scripts. For further information: Running Batch Migrations
This batch operation successfully drops a table, creates a new one with an arbitrary name, adds the copied values from the dropped table, and finally renames the new table.

Note-3: If you would like to write your own migration script, you don’t need to pass --autogenerate flag. For example:
alembic revision -m "YOUR MESSAGE"
It will create a migration script with empty upgrade(), downgrade() functions. You can fill them manually.

When you have new migration scripts to migrate and the scripts are checked/corrected, you can upgrade your DB:
alembic upgrade head

It is also possible to downgrade migration scripts. You can give a revision ID to do that:
alembic downgrade head REVISION_ID
If you would like to use relative identifiers, such as alembic downgrade -1, you might check it out: Relative Identifiers

Note-4: During the migration of SQLite Database, it’s possible to see this error:
AddGeometryColumn() error: "UNIQUE constraint failed: geometry_columns.f_table_name, geometry_columns.f_geometry_column"

You can ignore this error because it says that there is already a geometry column entity for your table that has a Geometry column (i.e. States, Geometries, Contact). This error happens because SQLite doesn’t support ALTER TABLE statement. Instead, it creates a new one, copies values from the previous table, and drops the existing table. However, during the creation of a new table, it tries to push the Geometry column to the geometry_columns table again.

How to use it? (For Users)

Migrations might be applied in pepys_admin. Its migration option migrates the DB to the latest version. However, if they would like to check it, after ensuring their config file points to the correct database, they might run this command to upgrade their DB if there are any migrations:

alembic upgrade head

Helpful Commands

If you would like to see the current head of Alembic: alembic current

If you would like to see the history of the migration: alembic history

If you have changed schema and want to create a migration script: alembic revision -m "YOUR MESSAGE" --autogenerate

If you would like to see SQL script of migration scripts (Don’t forget to change START and END values with the migration revision IDs): alembic upgrade START:END --sql

Note: Recall that this only works for PostgreSQL.

If you would like to get information about a migration script of Alembic:
alembic -c alembic_sqlite.ini show XXXX or alembic -c alembic_postgres.ini show XXXX
Please change XXXX with a migration script’s revision ID or head or current.

If you would like to edit a migration script:
alembic -c alembic_sqlite.ini edit XXXX or alembic -c alembic_postgres.ini edit XXXX
Please change XXXX with a migration script’s revision ID or head or current.

Note: Show and Edit commands don’t run env.py file and it checks the migration scripts in the version_locations which is not specified in the main config file. Therefore, one of the extra config files should be passed to show/edit scripts in migrations/postgres_versions and migrations/sqlite_versions.

Please check the cookbook and documentation of Alembic for further information.

Possible Errors and Solutions

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) Cannot add a NOT NULL column with default value NULL

If you have values in your DB and you try to add a new attribute to the filled table which can’t be NULL, you should make this attribute nullable.


File "migrations/env.py", line 9, in <module>
from config import DB_HOST, DB_NAME, DB_PASSWORD, DB_PORT, DB_TYPE, DB_USERNAME
ModuleNotFoundError: No module named 'config'

If you face this error, it means that the pepys-import repository should be installed in a development environment. Please run the following command when you are at the root of the repository:

source PATH/TO/YOUR/ENV/bin/activate
pip install -e .
The error should be corrected now. Please try to run the same command again.
Alternatively, you can always add the local project to your PYTHONPATH. For example:
PYTHONPATH=. alembic current

(pepys-import) baris@bariss-MacBook-Pro pepys-import % alembic revision -m "message" --autogenerate
Database tables are not found! (Hint: Did you initialise the DataStore?)
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
ERROR [alembic.util.messaging] Target database is not up to date.
  FAILED: Target database is not up to date.

If you make some changes and try to create a new migration script without having the latest version of the database, you will face this issue. You should upgrade your DB and then run the revision command:

alembic upgrade head
alembic revision -m "your message" --autogenerate

FOR DEVELOPERS ONLY

Database tables are not found! (Hint: Did you initialise the DataStore?)

If you migrated your SQLite database and started to see this message, it might mean that your migration script is wrong. It would have been because of the extra tables created by Alembic. When a table with Geometry column is included in migration, please check the SQLite database if there are any temporary tables (with tmp keyword) such as idx_Geometries_tmp_geometry_node etc. If there is any, please drop these tables. After that, please open your migration script and add spatial_index=False argument to the Geometry column. An example is as follows:

# Change this column
geometry = deferred(Column(Geometry(geometry_type="GEOMETRY", management=True), nullable=False))
# Add spatial_index=False parameter
geometry = deferred(
    Column(
        Geometry(geometry_type="GEOMETRY", management=True, spatial_index=False), nullable=False
    )
)

Why do we have many SQLite related problems ?

In general, SQLite’s unsupporting ALTER TABLE statement pushes us to use the move-and-copy approach to not lose any data. But when we use it, all problems arise because we use render_as_batch=True parameter to apply the move-and-copy approach. Alembic calls it the Batch mode and I think many of our problems related to that.

Move-and-copy approach

Migration tools are instead expected to produce copies of SQLite tables that correspond to the new structure, transfer the data from the existing table to the new one,
then drop the old table. For our purposes here we’ll call this “move and copy” workflow, and in order to accommodate it in a way that is reasonably predictable,
while also remaining compatible with other databases, Alembic provides the batch operations context.