cancel
Showing results for 
Search instead for 
Did you mean: 

A Declarative Approach for Database Schema Upgrades

skovalenko_m
Senior Member

This blog post explores the proposed new strategy to simplify the Magento installation and upgrade processes. This new concept will allow a developer to get away from writing updates for each new version in favor of declaring the final result that the developer wants to achieve.

 

At the moment, in order to change the database schema, extension developers need to write code (PHP script) with the desired operations: renaming columns, adding new CMS pages, etc. Currently Magento 2 has 5 types of scripts:

  1. InstallData and InstallSchema scripts, that are executed on a clean (empty) database;
  2. UpgradeData and UpgradeSchema incremental scripts, that supplement an existing Magento database;
  3. Recurring scripts, that are executed each time you install or upgrade Magento.

For installation, all scripts of all types are executed in a loop, in order of module registration. For upgrades, only scripts that match the module version being upgraded to are executed. This mechanism is used for executing each upgrade script only once. However, the current approach has several disadvantages:

  1. Having different types of scripts makes the system unstable and difficult. It may seem that install scripts are only executed when Magento is installed. However, upgrade scripts can even be executed during a new installation. This can occur when installation script content is frozen on the first release (e.g. 1.0, 2.0) and then changes are added only with upgrade scripts. Developers can easily make the mistake of putting changes into an installation script or old upgrade script instead of creating new one, resulting in the changes being skipped.
  2. Excessive complexity of installation. It is hard to keep in mind the base definition of SQL tables and their modifications  to understand what the final database tables look like. The most complex case is installing Magento Enterprise Edition with the Staging module enabled. In this scenario, Magento removes primary and foreign keys for all staged entities, adds a new column to these entities, and adds new foreign keys. So the final structure is not similar to the base definition. It begs the question: “Why not create a table with right structure once, instead of creating the deprecated table and then modifying it to produce the correct structure?”
  3. There is no update validation and possible errors cannot be predicted before the installation process. Before installation, developers cannot be sure that all relations between tables exist. They also cannot be sure whether the table has been modified by another extension. For example, Magento has “split database” functionality that allows tables to be separated between databases (shards). If an extension tries to reference the table from a different shard, it will get an error.
  4. With the release of new versions, the size of the UpgradeData and UpgradeSchema scripts grows. This is because sections for new versions must be added to the scripts without removing old sections. The content of upgrade scripts is separated by versions as shown in the following example:
if (version_compare($context->getVersion(), '2.0.2', '<')) {
    $this->updateAdminTextSwatchValues($setup);
}
if (version_compare($context->getVersion(), '2.0.3', '<')) {
    $this->convertAddDataToJson($setup);
}

This file becomes harder to read as additional versions are added. A better approach is to use database structure declarations, as is done by projects such as Doctrine. Schema files declare what the database structure should be, and Magento determines the differences between the current table structure and what it should be. These differences can be represented with atomic SQL operations. Here is an example of such a declaration:

<table name="catalog_product_entity">
    <field name="entity_id">
        <type>integer</type>
        <unsigned>1</unsigned>
        <primary>1</primary>
        <identity>1</identity>
        <comment>Entity ID</comment>
    </field>
    ...
</table>

In case of updating/installing data, we propose to use a patch system. A Setup Patch is a PHP script that is declared in XML format that performs a single operation (for example, adding a CMS Page  and can be applied only once. In contrast to the current scripts, patches do not have versions and can be identified by their names. Patches will be rolled back when they are absent in the XML entity declaration, and are present in the database. This approach has the following benefits:

  1. Avoids missed or repeated SQL operations. Developers install only what is necessary. This approach bypasses short-lived scripts in which some functionality appears in one version and is neutralized in a later version.
  2. Dry-run mode for SQL structure and data. A dry run mode allows installation testing. Also in dry-run mode, raw SQL queries can be generated and persisted in an SQL dump file on a staging environment, The dump file can be reused in production as an alternative way of installation.
  3. Performance improvements. The new installation approach generates one set of SQL commands to jump from the current version to the final end point. The current scheme can apply multiple SQL commands to the same tables. The new approach can save a lot of time on large, populated tables. Also, installation through a generated SQL dump file will run faster, as there is no need for data and structure calculations, such as loading or persisting entities. The new approach can speed up Magento installation with Sample Data by several times.
  4. Database data memory. The new approach introduces a new Magento installation mode with support for rollbacks. Magento will be able to restore removed data during a rollback. For instance, if the developer has installed an extension that deleted a column with data from a core table, he can remove this extension and do rollback. This is achieved by dumping data that will be lost (such as values in a column to be dropped from a table). Magento determines whether data should be backed up before executing destructive operations, like removing a column, table or changing column type. Before installation, Magento will check whether it has backed up data for new table or column. If so, Magento will try to apply dump data. If the backup file is not needed, it can be deleted by manually deleting the files from the file system. Note that Magento will save not only data from deleted columns, but also the unique key value for the table. Repair is not supported for tables without a unique key.
  5. Ability to revert to a previous version . Related to the previous point, the new approach will allow merchants to revert to a previous version if required. This is not a replacement to trying out new releases in a development or staging environment, but is a useful ability when needed.
  6. Database structure validation before installation. In order to be sure that installation will be clean, Magento will validate the database schema for table relations, SQL keys, etc. before proceeding with the upgrade.
  7. Ability to modify core database structure. This can be achieved by the merge of 3rd party and core declaration files. Developers can add their own columns in existing tables. This question is critical for extension developers when they work with flat tables such as “quote” and “sales_order”.
  8. Developers can modify existing columns. For example, at the moment Magento supports only IPv4. In order to add IPv6 support, Magento must store longer IP addresses. Currently to achieve this a new update should be created. The proposed new approach implies that the length can be changed by creating the declaration file with column reference and new value of the length in it: 
<table name="admin_user_session">
    <field name="ip">
        <length>100</length>
    </field>
</table>

The merge algorithm will validate breaking changes in database structure such as removing columns, decreasing lengths, changing the column type, etc., and can then block installation.

 

In conclusion, Magento plans to implement declaration scripts in a future Magento release. At this time it is planned to replace the old core install and upgrade scripts with XML declarations for database table structure and with PHP data patches for database data.


But what will happen to the existing 3rd party install and upgrade scripts? Due to the Magento Backwards Compatibility policy, all old setup Install and Upgrade scripts will work as before.

 

Human-readable declaration files, clear merging strategy, ability to repair database, data memory and dry-run features underpin the new installation strategy. All this will become possible after switching to the new approach, based on comparing current state of database structure and data and desired one.

15 Comments