Magento uses the EAV (Entity Attribute Value) model so that Administrators can dynamically add attributes to entities such as “Product” and “Customer. This blog post provides a quick overview of the EAV model as a precursor to future blog posts discussing the planned introduction of a higher-level database persistence API. This higher-level API is planned to support EAV internally rather than having the MySQL table structures exposed directly to business logic.
Note that the new database persistence API is a long-term investigation. It is not currently assigned to a specific release. Plans around changes to the API will be discussed on this blog as a part of collecting community feedback.
Entities
Magento supports persistent entities such as “Customer” and “Product”. Entities are persisted in MySQL using multiple records spanning multiple database tables.
Each entity has a series of named attributes. For example, in the underlying primary entity table (“catalog_product_entity” for products), some attributes are stored directly in table columns (such as “updated_at”), while others are stored in a special format referred to as EAV (the Entity Attribute Value model). There are three main types of attributes:
In practice the first two types of attributes are very similar. They are known to programmers when code is written and so can be built into the code base. Custom attributes, on the other hand, are (generally) not known when developers write code, as they are defined by merchants in the Admin panel. As a result, a dedicated table column with the same name cannot be used for custom attributes. New attributes can appear at any time, and changing the database schema typically requires system downtime, especially for larger tables.
The Entity Attribute Value Model
The EAV model involves having a table where one column holds the entity id, another holds the attribute identity (typically an id, but you can think of it as the attribute name), and finally you have the attribute value. Because attributes are typed (string, integer, float, etc), Magento has multiple EAV tables, one per data type required for attributes. For example, “catalog_product_entity_int” is used to store integer attribute values and “catalog_product_entity_datetime” is used to store date/time attribute values.
To retrieve an entity, a left-join can be used from the entity table to the EAV table to pick up each wanted attribute value. Using a left join means that if a record in the EAV table does not exist for the entity, NULL will be returned for that attribute. (A regular join would have dropped the entity row.) The SQL queries are typically structured something like:
SELECT e.id, a1.value AS description FROM catalog_product_entity e LEFT JOIN catalog_product_entity_varchar a1 ON (e.entity_id = a1.entity_id AND a1.attribute_id = 123)
This allows new attributes to be created at any time, as new attributes do not require a change to the database schema. However, performance suffers, because join queries are required to fetch a full entity.
Store View Scope
The EAV model is also useful for another concept – scoping. Attributes can be defined with a scope, such as the store view scope for product attributes. This is how Magento stores multiple translations of a product description, one per store view. For such attributes, another column is added to the EAV table (“store_id” in the case of “catalog_product_entity_text”). Because new store views can be created at any time, a dedicated column to hold a product description per store view is not feasible.
Thus, an attribute such as “description” can have multiple EAV table rows for the one entity – one per store view.
When displaying a product page, the store view to use is known. That means it can be specified when querying the database. However, this is where another capability of Magento kicks in. Rather than mandate a description per store view, a default value can be used for when a value has not been provided for a specific store view.
In the admin panel, when editing a product, a drop down is provided listing possible scopes in the top left corner of the page. If you have added an additional store view to your site, you must select the appropriate store view before you start editing attributes.
In the following screen shot two store views have been defined, “First Store View” and “Second Store View”. This results in three selectable options in the drop-down menu:
(The greyed out “Main Website” and “Main Website Store” show the website and store that the first and second store views belong to. “All Store Views” is greyed out because it is already the current selection.)
Internally, all store views are allocated an id, typically integers incrementing from 1. That is, the first store view will typically be allocated a store view number of 1, then second store view will typically be allocated a store view number of 2, and so on.
Store View 0
“All Store Views” is allocated store view id 0. If you select a store view other than “All Store Views” from the drop down, all store view scoped attributes in the Admin have a checkbox to specify whether to use a default value (store id 0) or to keep a value for the current store view. Checking “Use Default Value” means an attribute value record will not be stored for the current store view, unchecking will cause an attribute value record for the current store view to be created.
This fallback scheme can be useful in multiple scenarios
Coming back to attribute types, “NULL” is an important value for some attribute types to represent when a field in the Admin is blank. For example, an integer “inventory_count” attribute on a form can specify a number or be blank. When blank value is specified, the consumer might see “Call to check the availability” message, versus “Out of stock” for a value of zero and “In stock” for integers above zero. Blank integer fields are represented at the database level as a NULL value. (Text fields on the other hand do not distinguish between a zero-length string and NULL.)
So how to specify a SQL query to pick up an attribute value, including picking up the default value if a value for a specific store view is not found? Here is a first attempt at a query.
SELECT e.entity_id, IFNULL(s1.value, d1.value) AS num FROM catalog_product_entity e LEFT JOIN catalog_product_entity_int s1 ON (e.entity_id = s1.entity_id AND s1.attribute_id = 123 AND s1.store_id = 2) LEFT JOIN catalog_product_entity_int d1 ON (e.entity_id = d1.entity_id AND d1.attribute_id = 123 AND d1.store_id = 0)
This query is however wrong. Can you see how?
The problem is it is saying to use the s1 record (for store view 2 in this example) if the value is not NULL, otherwise use the d1 (default) value. The LEFT JOIN will insert a NULL value into the s1.value attribute if there is no record for that store view. The problem is that attribute might have been intentionally NULL. The query needs adjusting as follows:
SELECT e.entity_id, IF(s1.value_id IS NULL, d1.value, s1.value) AS num FROM catalog_product_entity e LEFT JOIN catalog_product_entity_int s1 ON (e.entity_id = s1.entity_id AND s1.attribute_id = 123 AND s1.store_id = 2) LEFT JOIN catalog_product_entity_int d1 ON (e.entity_id = d1.entity_id AND d1.attribute_id = 123 AND d1.store_id = 0)
By checking the “value_id” field instead of the “value” field, the query can now reliably detect when the LEFT JOIN operator failed to find a record to include and thus pick the right value to use.
Indexing
To fill out the complete story, it is worth noting that “indexing” in Magento includes generating a database row for every store view to avoid the need to check for store view 0 as above. When the “master” entity definition is updated, the index records for that entity are updated. This is done for efficiency and performance. The index table can also have a column per attribute value (often a subset of all attributes). That allows a single record to be retrieved from the database containing multiple attribute values, rather than requiring LEFT JOIN operations to fetch the attribute values. This improves performance.
The Future Magento Persistence Layer
Why is this discussion important? As we explore providing an improved database persistence API, we are anticipating implementing the EAV table structure behind the API. Business logic should not be concerned with the physical table representation used to store entities and attributes. This allows potential optimizations in the way data is stored in MySQL without change to business logic code.
This is also important as one of the goals of the new database API is to provide an abstraction layer from MySQL. That opens up the opportunity for other database technologies to be used in the future. For example, MongoDB with its richer “document” structure could be used to store an entity with all scoped attribute values in a single document. To achieve this in the most efficient way possible, the Magento database API should operate at the entity level. For example, when inserting a new product, it should be possible to provide a complete entity including attribute values for all store views so a single MongoDB operation is required.
Indexing, as briefly mentioned, is also to be abstracted behind the API allowing different indexing technologies to be swapped in. For example, when enumerating all store views for an entity, it may be possible in a NoSQL database to represent this in a more efficient form with fewer database records when there is no change in attribute values between store views.
In order to swap in other database technologies with correct semantics, it is important to fully understand the semantics of current functionality. There are a number of optimizations that could be made if NULL was used to indicate when to fall back values. But this is where the saying of “correctness over speed” comes in. Optimization should never be used as an excuse to introduce incorrect semantics.
Conclusion
This post has given an overview of how attribute values (with scopes) are stored in Magento today. EAV is a useful physical database schema approach to allowing a dynamically extensible schema. However, different database technologies allow different approaches. Introduction of a database API should open up interesting new opportunities to explore different persistence structures without changing business logic.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.