cancel
Showing results for 
Search instead for 
Did you mean: 

A Declarative Approach for Database Schema Upgrades

Magento Team

This blog post explores the proposed new strategy to simplify the Magento installation and upgrade processes. This new concept will allow a developer to get away from writing updates for each new version in favor of declaring the final result that the developer wants to achieve.

 

At the moment, in order to change the database schema, extension developers need to write code (PHP script) with the desired operations: renaming columns, adding new CMS pages, etc. Currently Magento 2 has 5 types of scripts:

  1. InstallData and InstallSchema scripts, that are executed on a clean (empty) database;
  2. UpgradeData and UpgradeSchema incremental scripts, that supplement an existing Magento database;
  3. Recurring scripts, that are executed each time you install or upgrade Magento.

For installation, all scripts of all types are executed in a loop, in order of module registration. For upgrades, only scripts that match the module version being upgraded to are executed. This mechanism is used for executing each upgrade script only once. However, the current approach has several disadvantages:

  1. Having different types of scripts makes the system unstable and difficult. It may seem that install scripts are only executed when Magento is installed. However, upgrade scripts can even be executed during a new installation. This can occur when installation script content is frozen on the first release (e.g. 1.0, 2.0) and then changes are added only with upgrade scripts. Developers can easily make the mistake of putting changes into an installation script or old upgrade script instead of creating new one, resulting in the changes being skipped.
  2. Excessive complexity of installation. It is hard to keep in mind the base definition of SQL tables and their modifications  to understand what the final database tables look like. The most complex case is installing Magento Enterprise Edition with the Staging module enabled. In this scenario, Magento removes primary and foreign keys for all staged entities, adds a new column to these entities, and adds new foreign keys. So the final structure is not similar to the base definition. It begs the question: “Why not create a table with right structure once, instead of creating the deprecated table and then modifying it to produce the correct structure?”
  3. There is no update validation and possible errors cannot be predicted before the installation process. Before installation, developers cannot be sure that all relations between tables exist. They also cannot be sure whether the table has been modified by another extension. For example, Magento has “split database” functionality that allows tables to be separated between databases (shards). If an extension tries to reference the table from a different shard, it will get an error.
  4. With the release of new versions, the size of the UpgradeData and UpgradeSchema scripts grows. This is because sections for new versions must be added to the scripts without removing old sections. The content of upgrade scripts is separated by versions as shown in the following example:
if (version_compare($context->getVersion(), '2.0.2', '<')) {
    $this->updateAdminTextSwatchValues($setup);
}
if (version_compare($context->getVersion(), '2.0.3', '<')) {
    $this->convertAddDataToJson($setup);
}

This file becomes harder to read as additional versions are added. A better approach is to use database structure declarations, as is done by projects such as Doctrine. Schema files declare what the database structure should be, and Magento determines the differences between the current table structure and what it should be. These differences can be represented with atomic SQL operations. Here is an example of such a declaration:

<table name="catalog_product_entity">
    <field name="entity_id">
        <type>integer</type>
        <unsigned>1</unsigned>
        <primary>1</primary>
        <identity>1</identity>
        <comment>Entity ID</comment>
    </field>
    ...
</table>

In case of updating/installing data, we propose to use a patch system. A Setup Patch is a PHP script that is declared in XML format that performs a single operation (for example, adding a CMS Page  and can be applied only once. In contrast to the current scripts, patches do not have versions and can be identified by their names. Patches will be rolled back when they are absent in the XML entity declaration, and are present in the database. This approach has the following benefits:

  1. Avoids missed or repeated SQL operations. Developers install only what is necessary. This approach bypasses short-lived scripts in which some functionality appears in one version and is neutralized in a later version.
  2. Dry-run mode for SQL structure and data. A dry run mode allows installation testing. Also in dry-run mode, raw SQL queries can be generated and persisted in an SQL dump file on a staging environment, The dump file can be reused in production as an alternative way of installation.
  3. Performance improvements. The new installation approach generates one set of SQL commands to jump from the current version to the final end point. The current scheme can apply multiple SQL commands to the same tables. The new approach can save a lot of time on large, populated tables. Also, installation through a generated SQL dump file will run faster, as there is no need for data and structure calculations, such as loading or persisting entities. The new approach can speed up Magento installation with Sample Data by several times.
  4. Database data memory. The new approach introduces a new Magento installation mode with support for rollbacks. Magento will be able to restore removed data during a rollback. For instance, if the developer has installed an extension that deleted a column with data from a core table, he can remove this extension and do rollback. This is achieved by dumping data that will be lost (such as values in a column to be dropped from a table). Magento determines whether data should be backed up before executing destructive operations, like removing a column, table or changing column type. Before installation, Magento will check whether it has backed up data for new table or column. If so, Magento will try to apply dump data. If the backup file is not needed, it can be deleted by manually deleting the files from the file system. Note that Magento will save not only data from deleted columns, but also the unique key value for the table. Repair is not supported for tables without a unique key.
  5. Ability to revert to a previous version . Related to the previous point, the new approach will allow merchants to revert to a previous version if required. This is not a replacement to trying out new releases in a development or staging environment, but is a useful ability when needed.
  6. Database structure validation before installation. In order to be sure that installation will be clean, Magento will validate the database schema for table relations, SQL keys, etc. before proceeding with the upgrade.
  7. Ability to modify core database structure. This can be achieved by the merge of 3rd party and core declaration files. Developers can add their own columns in existing tables. This question is critical for extension developers when they work with flat tables such as “quote” and “sales_order”.
  8. Developers can modify existing columns. For example, at the moment Magento supports only IPv4. In order to add IPv6 support, Magento must store longer IP addresses. Currently to achieve this a new update should be created. The proposed new approach implies that the length can be changed by creating the declaration file with column reference and new value of the length in it: 
<table name="admin_user_session">
    <field name="ip">
        <length>100</length>
    </field>
</table>

The merge algorithm will validate breaking changes in database structure such as removing columns, decreasing lengths, changing the column type, etc., and can then block installation.

 

In conclusion, Magento plans to implement declaration scripts in a future Magento release. At this time it is planned to replace the old core install and upgrade scripts with XML declarations for database table structure and with PHP data patches for database data.


But what will happen to the existing 3rd party install and upgrade scripts? Due to the Magento Backwards Compatibility policy, all old setup Install and Upgrade scripts will work as before.

 

Human-readable declaration files, clear merging strategy, ability to repair database, data memory and dry-run features underpin the new installation strategy. All this will become possible after switching to the new approach, based on comparing current state of database structure and data and desired one.

12 Comments
Magento Team

Maybe another benefit would be simplifying/obsoleting UNinstall scripts, which I believe all too few devs even know about. Could Magento just use these structure declarations to magically know what to clean up, somehow?

Magento Team

Yes - most likely uninstall scripts are no longer required as it is moving from one declared state to another.

Senior Member

It's an awesome idea. I just wondered how a column rename would take place. When a declation changes from

 

 

<field name="description">
<type>text</type>
..
</field>

 

to

 

 

<field name="longDescription">
<type>text</type>
...
</field>

 

in a new version of a module, how do you make sure the column "description" is not dropped and an empty column "longDescription" is added, when the intention is just a rename action. I don't know how this works in Doctrine.

 

Magento Team

@Garfix thanks for the question. 
There will be a possibility to rename the column, by adding declaration of a column with new name and with "baseOn" attribute (the value of this attribute will hold the previous name). When declaration will processing it will merge the definition of the old and of the new column, and use new column name. And when we will compare declaration and database: we will have all the columns, that should be renamed, and instead of deleting them and creating new one, we will rename them, and if it needed we will change their definition.

<field name="longDescription">
<type>text</type>
<baseOn>description</baseOn>
...
</field>

This approach allows to rename one column few times. For example, we have the history of renaming: 
description -> mediumDescription -> longDescriptiion. 
It will gather defintions of all columns (description, mediumDescription, longDescription), and do rename from "description" to "longDescription".
Also approach allows to rename a column from another module and save the definition of that column.

 

Occasional Contributor

1. What happens when third party code accidentally merges in XML that modifies a core schema in subtle (or not so subtle) and unintended ways

 

2. What happens when third parties need SQL constructs (complex indexes, some combination of column properties, etc.) that are not supported via the declarative abstraction

 

3. What happens when third parties and integrators are upgrading/updating systems with large datasets and need to know *the method used to change* the database in order to reason about and plan upgrades/updates

 

4. What happens to the (at this point) years of M2 schema code (both core and third party) that's written in the old style

 

5. What happens with data updates, which include vital things like EAV structure updates?

 

A declarative syntax might make sense for a platform that's mostly controlled by a single party with others making minor modifications. However, for a platform like Magento, where a large amount of the work being done is by third parties and integrators who *don't* control the platform, it seems like the declarative syntax offers a number of new challenges without a commiserate benefit.

 

There's been a general consensus in the MVC web-application world (be it ruby, python, PHP, etc.) that the migration script approach offers the best set of tradeoffs. Regardless of that, switching Magento mid-stream to a radically different system seems like it would introduce a lot of refactoring busy work for everyone with Magento systems.

 

What's driving such a radical departure at this stage of the game?  Wouldn't improving the system in place be a more reasonable way of solving the actual problems that users face?

Magento Team

1. There should be the warning that some third party extension modify the core. But as I think, there should be a possibility to modify third party extensions. Also I think it is possible to make limitations: what exactly you can modify in core schema or other third party extensions.

2. There will be a way to do any SQL request. We have Patch system: which will allows to register in XML your PHP handlers (where you can run SQL) and apply them only for one time for one installation. Each handler will have up() and down() methods. This system will be propagated not only for data, but for schema too.
3. Lets assume that a merchant has very big database with large tables, so ALTER command will be very painfull for him. At the moment we have a set of commands: each command is responsible for appropriate SQL operation, for  instance ChangeColumn command. The set of commands is part of SPI, so merchant can substitute specific command with one he did (lets assume that his command will be faster for his case) and as the result speed up appropriate *SQL operation.

4. They will exists side by side with new declarations. We will not break the things.

5. Actually EAV is part of schema Smiley Happy All core data upgrades will migrate to Patch System, that I mentioned earlier. 

Inscreasing number of setup scripts leads system to a chaos. There is no possibility to uninstall malicious extension, clear database rubbish (like EAV attributes with source-models that do not exists in system and cause Fatal errors), prevent database errors (as right now there is no properly --dry-run mode), optimize system (by skipping setup scripts, the result of which will be nivilated during one installation) and a lot of other things...

Thanks for your questionSmiley Happy

Magento Team

@alanstorm also I think we will have tool: which will convert existsing PHP scripts to PATCHES or to declarative schema. Of course there will be a places, that should be edited manually, after tool. But eventually, this will help developers to move quickly from one setup system to another.
So this will not take a lot of time to refactor the code of extension.

Occasional Contributor

I had a pretty extensive point-by-point writeup -- but I'm not sure its worth posting. Your tone indicates this is a done deal, and my critique can be best summed up by the following.

 

Inscreasing number of setup scripts leads system to a chaos. There is no possibility to uninstall malicious extension, clear database rubbish (like EAV attributes with source-models that do not exists in system and cause Fatal errors), prevent database errors (as right now there is no properly --dry-run mode), optimize system (by skipping setup scripts, the result of which will be nivilated during one installation) and a lot of other things...

 

Your new system, as described, doesn't seem to solve these problems either. The chaos isn't because of any particular system, it comes from 122 core module + n additional modules all trying to do things. Radically changing things to use a declarative system won't fix this chaos, it will only add more.

 

Fixing and simplifying how the current system operates, and doing so in a way where no-changes-to-existing-modules is built-in from day 1 seems like a better choice. This, combined with tooling to look for invalid database states would be a better solution for the problem.

 

But -- not my circus, not my monkeys.  As an exercise -- make a note to come back here three months after you launch the features and see what sorts of problems your users are running into and compare with the questions I asked above.

 

Good luck!

Magento Team

@alanstorm Sorry for my toneSmiley Happy This is not already done feature - this is just the idea. 
Can you share with me, how we can tune up existing setup system? I will not argue: each new solution brings new problems. You are absolutely right.


The chaos isn't because of any particular system, it comes from 122 core module + n additional modules all trying to do things

The main idea of setup scripting approach is that in every script we exactly know how our database and system at all looks like. We know how many scripts we have and in what order they will apply. But what if n additional modules will be added to the system?

If few modules will tries to change one table, no module will knows how current state of a table looks like. So anything can happens. Only If all modules would be independent - scripting will works fine.


One assumption. Let`s assume that you have database with shards. And all sales tables are on one specific shard. And you have 10 custom extensions, which references to sales tables. All this extensions are broken, because by default they are installed on default shard. The only solution, using old approach, is to refactor each setup script.

Also can you answer one question? It is not about declarative schema and particular idea I have  - it is about declaration at all and about differences between two approaches:

Let`s assume, that extension developer developing CMS Page, that changes from version to version (in one version new content was added, in another one - markup changed, etc), So through 5 versions, developer will have 5 scripts (no matter how scripts will looks like. Will it be 5 different scripts like in Magento 1, or just one UpgradeData script with 5 sections). And this is one approach. Using this approach, in order to add one new html block for CMS page you will need to create huge script and duplicate everything you have in previous version.
Another approach say, that you have described the content of CMS page, and all that you need to do is just change HTML content declaration.
So what way will you choose in this case and why? 

Thanks, for discussion!

Occasional Contributor

Can you share with me, how we can tune up existing setup system?

 

I don't work for Magento.  But good luck with you project. 

Senior Member

It was an awesome idea, if it was possible with a declarative solution only. Once parts of it need to be done with procedural code things get messy, and complicated.

 

To make the setup procedure easier, the install and upgrade classes can be combined, using only the functionality of the upgrade class, but calling it "install". The difference between "data" and "schema" is not very clear either, so perhaps they can be merged in as well. Somehow the manual version check via if-statements should be made more robust. Something like in Magento 1, but Object Oriented, of course.

 

Magento Team

Hi @patrick vbergen.

It was an awesome idea, if it was possible with a declarative solution only.

There will be places, that couldnt be done without procedure code, like bulk data updating or some performance improvements, or working with unknown qty of tables, indexes, etc


Distinguishing between schema and data was done because we need to ensure that schema is up to date, before data start installing. 

You are absolutely right: there is no big difference between upgrade and install scripts. Install is just first upgrade script:

                $installer = $this->getSchemaDataHandler($moduleName, $installType);
                if ($installer) {
                    $this->log->logInline("Installing $type... ");
                    $installer->install($setup, $moduleContextList[$moduleName]);
                }
                $upgrader = $this->getSchemaDataHandler($moduleName, $upgradeType);
                if ($upgrader) {
                    $this->log->logInline("Upgrading $type... ");
                    $upgrader->upgrade($setup, $moduleContextList[$moduleName]);
                }

As you can see from the code, upgrade script start right after install.

Speaking about substitution of if (version_compare) statements, there is only one alternative way I see: to split each upgrade into PHP classes (here you got Object Orientation), register them in system and keep applied PHP classes in database. So you can make comparison between registered and already applied PHP classes, get a difference and apply it.