cancel
Showing results for 
Search instead for 
Did you mean: 

Magento EAV Review

akent99
Regular Contributor

Magento uses the EAV (Entity Attribute Value) model so that Administrators can dynamically add attributes to entities such as “Product” and “Customer. This blog post provides a quick overview of the EAV model as a precursor to future blog posts discussing the planned introduction of a higher-level database persistence API. This higher-level API is planned to support EAV internally rather than having the MySQL table structures exposed directly to business logic.

 

Note that the new database persistence API is a long-term investigation. It is not currently assigned to a specific release. Plans around changes to the API will be discussed on this blog as a part of collecting community feedback.

 

Entities

 

Magento supports persistent entities such as “Customer” and “Product”. Entities are persisted in MySQL using multiple records spanning multiple database tables.

 

Each entity has a series of named attributes. For example, in the underlying primary entity table (“catalog_product_entity” for products), some attributes are stored directly in table columns (such as “updated_at”), while others are stored in a special format referred to as EAV (the Entity Attribute Value model). There are three main types of attributes:

 

  • Core Magento attributes (these are defined in the module that first declared the entity),
  • Extension attributes (these are like core attributes but declared in modules other than the module that first declared the entity – they “extend” an existing entity definition), and
  • Custom attributes (these are defined in the Admin panel).

In practice the first two types of attributes are very similar. They are known to programmers when code is written and so can be built into the code base. Custom attributes, on the other hand, are (generally) not known when developers write code, as they are defined by merchants in the Admin panel. As a result, a dedicated table column with the same name cannot be used for custom attributes. New attributes can appear at any time, and changing the database schema typically requires system downtime, especially for larger tables.

 

The Entity Attribute Value Model

 

The EAV model involves having a table where one column holds the entity id, another holds the attribute identity (typically an id, but you can think of it as the attribute name), and finally you have the attribute value. Because attributes are typed (string, integer, float, etc), Magento has multiple EAV tables, one per data type required for attributes. For example, “catalog_product_entity_int” is used to store integer attribute values and “catalog_product_entity_datetime” is used to store date/time attribute values.

 

To retrieve an entity, a left-join can be used from the entity table to the EAV table to pick up each wanted attribute value. Using a left join means that if a record in the EAV table does not exist for the entity, NULL will be returned for that attribute. (A regular join would have dropped the entity row.) The SQL queries are typically structured something like:

 

 

SELECT e.id, a1.value AS description
FROM catalog_product_entity e
LEFT JOIN catalog_product_entity_varchar a1
  ON (e.entity_id = a1.entity_id
      AND a1.attribute_id = 123)

 

This allows new attributes to be created at any time, as new attributes do not require a change to the database schema. However, performance suffers, because join queries are required to fetch a full entity.

 

Store View Scope

 

The EAV model is also useful for another concept – scoping. Attributes can be defined with a scope, such as the store view scope for product attributes. This is how Magento stores multiple translations of a product description, one per store view. For such attributes, another column is added to the EAV table (“store_id” in the case of “catalog_product_entity_text”). Because new store views can be created at any time, a dedicated column to hold a product description per store view is not feasible.

 

Thus, an attribute such as “description” can have multiple EAV table rows for the one entity – one per store view.

 

When displaying a product page, the store view to use is known. That means it can be specified when querying the database. However, this is where another capability of Magento kicks in. Rather than mandate a description per store view, a default value can be used for when a value has not been provided for a specific store view.

 

In the admin panel, when editing a product, a drop down is provided listing possible scopes in the top left corner of the page. If you have added an additional store view to your site, you must select the appropriate store view before you start editing attributes.

In the following screen shot two store views have been defined, “First Store View” and “Second Store View”. This results in three selectable options in the drop-down menu:

 

  • “All Store Views” (the defaults if values are not overridden by specific store views),
  • “First Store View”, and
  • “Second Store View”.

(The greyed out “Main Website” and “Main Website Store” show the website and store that the first and second store views belong to. “All Store Views” is greyed out because it is already the current selection.)

 

Blog Screenshot 1.png

 

Internally, all store views are allocated an id, typically integers incrementing from 1. That is, the first store view will typically be allocated a store view number of 1, then second store view will typically be allocated a store view number of 2, and so on.

 

Store View 0


“All Store Views” is allocated store view id 0. If you select a store view other than “All Store Views” from the drop down, all store view scoped attributes in the Admin have a checkbox to specify whether to use a default value (store id 0) or to keep a value for the current store view. Checking “Use Default Value” means an attribute value record will not be stored for the current store view, unchecking will cause an attribute value record for the current store view to be created.

 

Blog Screenshot 2.png

 

This fallback scheme can be useful in multiple scenarios

 

  • When adding a new store view, the new store view can pick up all the defaults as a starting point.
  • When a store has attribute values that are not language-specific and can be re-used on every store view, like the count of available products, prices, etc.

Coming back to attribute types, “NULL” is an important value for some attribute types to represent when a field in the Admin is blank. For example, an integer “inventory_count” attribute on a form can specify a number or be blank. When blank value is specified, the consumer might see “Call to check the availability” message, versus “Out of stock” for a value of zero and “In stock” for integers above zero. Blank integer fields are represented at the database level as a NULL value. (Text fields on the other hand do not distinguish between a zero-length string and NULL.)

 

So how to specify a SQL query to pick up an attribute value, including picking up the default value if a value for a specific store view is not found? Here is a first attempt at a query.

 

 

SELECT e.entity_id,
       IFNULL(s1.value, d1.value) AS num
FROM catalog_product_entity e
LEFT JOIN catalog_product_entity_int s1
  ON (e.entity_id = s1.entity_id
      AND s1.attribute_id = 123
      AND s1.store_id = 2)
LEFT JOIN catalog_product_entity_int d1
  ON (e.entity_id = d1.entity_id
      AND d1.attribute_id = 123
      AND d1.store_id = 0)

 

This query is however wrong. Can you see how?

 

The problem is it is saying to use the s1 record (for store view 2 in this example) if the value is not NULL, otherwise use the d1 (default) value. The LEFT JOIN will insert a NULL value into the s1.value attribute if there is no record for that store view. The problem is that attribute might have been intentionally NULL. The query needs adjusting as follows:

 

 

SELECT e.entity_id,
       IF(s1.value_id IS NULL, d1.value, s1.value) AS num
FROM catalog_product_entity e
LEFT JOIN catalog_product_entity_int s1
  ON (e.entity_id = s1.entity_id
      AND s1.attribute_id = 123
      AND s1.store_id = 2)
LEFT JOIN catalog_product_entity_int d1
  ON (e.entity_id = d1.entity_id
      AND d1.attribute_id = 123
      AND d1.store_id = 0)

 

By checking the “value_id” field instead of the “value” field, the query can now reliably detect when the LEFT JOIN operator failed to find a record to include and thus pick the right value to use.

 

Indexing

 

To fill out the complete story, it is worth noting that “indexing” in Magento includes generating a database row for every store view to avoid the need to check for store view 0 as above. When the “master” entity definition is updated, the index records for that entity are updated. This is done for efficiency and performance. The index table can also have a column per attribute value (often a subset of all attributes). That allows a single record to be retrieved from the database containing multiple attribute values, rather than requiring LEFT JOIN operations to fetch the attribute values. This improves performance.

 

The Future Magento Persistence Layer

 

Why is this discussion important? As we explore providing an improved database persistence API, we are anticipating implementing the EAV table structure behind the API. Business logic should not be concerned with the physical table representation used to store entities and attributes. This allows potential optimizations in the way data is stored in MySQL without change to business logic code.

 

This is also important as one of the goals of the new database API is to provide an abstraction layer from MySQL. That opens up the opportunity for other database technologies to be used in the future. For example, MongoDB with its richer “document” structure could be used to store an entity with all scoped attribute values in a single document. To achieve this in the most efficient way possible, the Magento database API should operate at the entity level. For example, when inserting a new product, it should be possible to provide a complete entity including attribute values for all store views so a single MongoDB operation is required.

 

Indexing, as briefly mentioned, is also to be abstracted behind the API allowing different indexing technologies to be swapped in. For example, when enumerating all store views for an entity, it may be possible in a NoSQL database to represent this in a more efficient form with fewer database records when there is no change in attribute values between store views.

 

In order to swap in other database technologies with correct semantics, it is important to fully understand the semantics of current functionality. There are a number of optimizations that could be made if NULL was used to indicate when to fall back values. But this is where the saying of “correctness over speed” comes in. Optimization should never be used as an excuse to introduce incorrect semantics.

 

Conclusion

 

This post has given an overview of how attribute values (with scopes) are stored in Magento today. EAV is a useful physical database schema approach to allowing a dynamically extensible schema. However, different database technologies allow different approaches. Introduction of a database API should open up interesting new opportunities to explore different persistence structures without changing business logic.