Create a mechanism to checkpoint time series data for use in the codebase.

Broadly speaking, there are a few different ways of expressing application state in such a way that it's useful for debugging:

 

  1. Logs (implemented in M2 with the excellent psr/log interface)
  2. Time series data (also known as metrics). Things like Datadog, Prometheus, Sensu and so on.
  3. Traces; primarily expressed by opentracing.io, but also Zipkin, Lightstep

 This feature request is to track the implementation of #2: the expression of time series data in the application, as well as the export of this data via whatever mechanism might be set up to receive it.

 

Time series data exposed by an application can provide extremely rapid feedback on an application's performance. This information can be used for a number of useful processes such as:

Note: I am doing some work on this with PHP more generally, tracked on GitHub. The best place to contribute beyond core support is there.

 

There is currently no simple way to express such time series data in Magento. Several efforts have been made to implement this in a third-party, vendor-specific way, including:

 

However, these suffer from a few drawbacks. In particular, 

 

  • There is no standard API (such as the psr/log api) that can be invoked in third party extensions reliably
  • It locks the users into a hosting infrastructure; undesirable for the continuity of the project.

Additionally, there are various metrics that would prove extremely useful when debugging Magento:

 

  1. Cache hit rates (https://github.com/magento/magento2/issues/11151)
  2. Cron metrics, such as last execution per job, queue length etc.
  3. Indexer status (expressed as 1 or 0)
  4. Cache enabled status (expressed as 1 or 0)
  5. Cache invalidations issued
  6. Orders in various states (processing, shipped etc)
  7. Customer logins (aggregate, not per customers)
  8. HTTP status codes

However, the utility isn't limited to the above. In particular, third party code (such as the code developed by an agency) often has reason to express its health, to ensure custom logic is functioning correctly and that a feature is delivering sufficient value to justify its cost.

 

General Background

 

 

It is suggested the implementer have some experience implementing and using application instrumentation. The body of knowledge in this area is immense in its own right, and this is not a simple task.

 

Beyond that, the goal of the work should be to checkpoint the data, but defer all processing to the implementing time series databases. It should also have minimal impact on performance. Lastly, high-cardinality data can be expressed through logs or traces, and is perhaps not something that is easily expressed by time series metrics (though Honeycomb et al. would beg to differ).

 

Primitives

 

 

The open source monitoring package Prometheus expresses the primitives:

  • counter
  • gauge
  • histogram
  • summary 

The author has implemented this in several places, and has repeatedly found uses for counter and gauge, as well as a limited use for histogram. To begin, the implementation of "gauge" and "counter" would be amply sufficient to cover the majority of cases. These are supported by both Prometheus and Datadog, and it's presumed the majority of time series databases.
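
To make the difference between the two primitives concrete, here is a minimal sketch. The Counter and Gauge classes are hypothetical and exist only for illustration; they are not part of Magento or any existing library. A counter only ever goes up (orders placed, cache invalidations issued), while a gauge holds whatever the current value is (queue length, or a 1/0 status).

```php
<?php
declare(strict_types=1);

// Hypothetical classes for illustration only.
final class Counter
{
    private float $value = 0.0;

    // A counter is monotonic: it can only be increased.
    public function increment(float $by = 1.0): void
    {
        if ($by < 0) {
            throw new InvalidArgumentException('A counter can only increase.');
        }
        $this->value += $by;
    }

    public function value(): float
    {
        return $this->value;
    }
}

final class Gauge
{
    private float $value = 0.0;

    // A gauge is overwritten with the current reading; it may go up or down.
    public function set(float $value): void
    {
        $this->value = $value;
    }

    public function value(): float
    {
        return $this->value;
    }
}

$ordersPlaced = new Counter();
$ordersPlaced->increment();
$ordersPlaced->increment(2);

$cronQueueLength = new Gauge();
$cronQueueLength->set(12);
$cronQueueLength->set(7);
```

After this runs the counter holds 3 and the gauge holds 7, since the second set() replaces the first.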

 

Non Goals

 

It should not be a goal of this work to implement the metrics view in Magento. This can be approached as a separate task, but the primary intent of this work is to express time series data in such a way it can be consumed by third party services.

 

It should not be a goal of this work to track the time associated with the time series data. The sampling is left to the third party application implementations.

 

It should not be a goal of this work to instrument third party applications (such as Redis). They have their own implementations, and should be considered separate services. 

 

It's unclear whether instrumenting the PHP runtime itself would be beneficial, though this can likely be handled in the bespoke cases it's required by third party utilisation of the above library.

 

Interface

 

Ideally there should be an interface similar to the psr/log exposed that allows the checkpointing of metrics. Something like:

 

 

<?php

class Foo
{
    private $oMetricRegistry;

    /**
     * Dependencies automagically wired by DI
     */
    public function __construct(
        // Singleton injected that stores the metrics. Invariably a global, persists towards the end of the request
        \Magento\Framework\Metrics $oMetricRegistry
    ) {
        $this->oMetricRegistry = $oMetricRegistry;

        // Ensure the metric exists. This operation is an idempotent "create if not exists" type operation.
        $this->oMetricRegistry->register(
            // metric identifier
            'vendor_extension_foo_thing',
            // metric type. Can be one of "count" or "gauge"; potentially "histogram".
            'count',
            // label keys. Labels are mechanisms to slice time series data, supported by both Datadog and Prometheus
            [
                'attribute'
            ]
        );
    }

    public function doThing()
    {
        // An operation changing the state of the metric. In this case, an increment -- it's a count.
        $this->oMetricRegistry->increment(
            // The metric identifier
            'vendor_extension_foo_thing',
            // The label values.
            ['value']
        );
    }
}

This would allow an extremely simple API to quickly expose the data for ingestion into time series.
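
As a rough sketch of what might sit behind that API, the following shows one possible interface and an in-memory implementation. Every name here (MetricRegistryInterface, InMemoryMetricRegistry, set(), get()) is an assumption made for illustration; the proposal above only names register() and increment().

```php
<?php
declare(strict_types=1);

// Hypothetical interface; only register() and increment() come from the proposal.
interface MetricRegistryInterface
{
    public function register(string $name, string $type, array $labelKeys = []): void;
    public function increment(string $name, array $labelValues = [], float $by = 1.0): void;
    public function set(string $name, float $value, array $labelValues = []): void;
}

final class InMemoryMetricRegistry implements MetricRegistryInterface
{
    /** @var array<string, array{type: string, labelKeys: string[], samples: array<string, float>}> */
    private array $metrics = [];

    public function register(string $name, string $type, array $labelKeys = []): void
    {
        // Idempotent "create if not exists": a second call leaves the metric untouched.
        $this->metrics[$name] ??= ['type' => $type, 'labelKeys' => $labelKeys, 'samples' => []];
    }

    public function increment(string $name, array $labelValues = [], float $by = 1.0): void
    {
        $key = $this->sampleKey($name, 'count', $labelValues);
        $this->metrics[$name]['samples'][$key] = ($this->metrics[$name]['samples'][$key] ?? 0.0) + $by;
    }

    public function set(string $name, float $value, array $labelValues = []): void
    {
        $key = $this->sampleKey($name, 'gauge', $labelValues);
        $this->metrics[$name]['samples'][$key] = $value;
    }

    // Read back one sample; returns null if nothing has been recorded.
    public function get(string $name, array $labelValues = []): ?float
    {
        return $this->metrics[$name]['samples'][json_encode(array_values($labelValues))] ?? null;
    }

    // Validate the metric type and label count, then derive a key for this label set.
    private function sampleKey(string $name, string $expectedType, array $labelValues): string
    {
        $metric = $this->metrics[$name] ?? null;
        if ($metric === null || $metric['type'] !== $expectedType) {
            throw new LogicException("Metric '$name' is not registered as a $expectedType.");
        }
        if (count($labelValues) !== count($metric['labelKeys'])) {
            throw new InvalidArgumentException("Metric '$name' has a different number of label keys.");
        }
        return json_encode(array_values($labelValues));
    }
}

$registry = new InMemoryMetricRegistry();
$registry->register('vendor_extension_foo_thing', 'count', ['attribute']);
$registry->register('vendor_extension_foo_thing', 'count', ['attribute']); // no-op
$registry->increment('vendor_extension_foo_thing', ['value']);
$registry->increment('vendor_extension_foo_thing', ['value']);
```

Each distinct set of label values gets its own running total, which is what lets a time series database slice the metric by label later.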

 

Storage

 

Ideally, the storage engine should be pluggable, but implement:

 

  • APCu
  • In Memory (or "null" when considering parity with psr/log)
  • Redis
  • MySQL

This would cater to the wide range of hosting constraints that Magento can operate with, with a predisposition for not changing existing behaviour (checkpoint to memory but flush at the end of each request).
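
One way the "checkpoint to memory, flush at the end of the request" behaviour could look is sketched below. MetricStorageInterface, ArrayMetricStorage and BufferedCounterWriter are hypothetical names; the array-backed storage simply stands in for a real APCu, Redis or MySQL adapter.

```php
<?php
declare(strict_types=1);

// Hypothetical contract that each backend (APCu, Redis, MySQL, ...) would implement.
interface MetricStorageInterface
{
    /** Add $delta to the stored value for $key. */
    public function add(string $key, float $delta): void;
}

// Array-backed stand-in for a real adapter; counts writes to show the batching.
final class ArrayMetricStorage implements MetricStorageInterface
{
    public array $values = [];
    public int $writes = 0;

    public function add(string $key, float $delta): void
    {
        $this->values[$key] = ($this->values[$key] ?? 0.0) + $delta;
        $this->writes++;
    }
}

final class BufferedCounterWriter
{
    private MetricStorageInterface $storage;

    /** @var array<string, float> Deltas accumulated during the request. */
    private array $pending = [];

    public function __construct(MetricStorageInterface $storage)
    {
        $this->storage = $storage;
    }

    // Cheap in-request operation: touches memory only.
    public function increment(string $key, float $by = 1.0): void
    {
        $this->pending[$key] = ($this->pending[$key] ?? 0.0) + $by;
    }

    // One backend write per metric, however many increments happened.
    public function flush(): void
    {
        foreach ($this->pending as $key => $delta) {
            $this->storage->add((string) $key, $delta);
        }
        $this->pending = [];
    }
}

$storage = new ArrayMetricStorage();
$writer = new BufferedCounterWriter($storage);
$writer->increment('cache_hits');
$writer->increment('cache_hits');
$writer->increment('cache_misses');

// In a real request this call could be registered with register_shutdown_function().
$writer->flush();
```

Three increments result in two backend writes here, which is the point of buffering: the request pays for memory operations only, and the storage engine is hit once per metric at the end.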

 

Exposition

 

There are various ways that the data may be exposed. It's suggested the core team deliberately not support any mechanism (except perhaps a native admin panel one, expressed as separate work). Instead, rely on third party / community vendors to provide the required libraries, similar to psr/log.
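
For a sense of what such a third party library might do, here is a small sketch that renders samples in the Prometheus text exposition format. The function name, the input array shape and the metric name magento_indexer_valid are all assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Render samples as Prometheus text exposition lines.
 * The input shape (name => ['type' => ..., 'samples' => [...]]) is hypothetical.
 */
function renderPrometheusText(array $metrics): string
{
    $lines = [];
    foreach ($metrics as $name => $metric) {
        $lines[] = "# TYPE {$name} {$metric['type']}";
        foreach ($metric['samples'] as $sample) {
            $pairs = [];
            foreach ($sample['labels'] as $key => $value) {
                // Escape backslash, double quote and newline inside label values.
                $escaped = str_replace(['\\', '"', "\n"], ['\\\\', '\\"', '\\n'], (string) $value);
                $pairs[] = "{$key}=\"{$escaped}\"";
            }
            $labels = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';
            $lines[] = "{$name}{$labels} {$sample['value']}";
        }
    }
    return implode("\n", $lines) . "\n";
}

$output = renderPrometheusText([
    'magento_indexer_valid' => [
        'type' => 'gauge',
        'samples' => [
            ['labels' => ['indexer' => 'catalog_product_price'], 'value' => 1],
        ],
    ],
]);
```

A vendor module could serve $output from an endpoint for a scraper to collect, while a Datadog-oriented module could read the same stored values and push them instead. Neither needs to live in core.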

 

Broader PHP community involvement

 

 

It is the author's hope that this conversation is extended to the broader PHP community, perhaps through the involvement of FIG. A single interface for time series data would have ramifications for the language more generally.

 

Future Work

 

Once implemented, Magento's introspectability could be extended further by implementing traces.

 

Related Reading:

 

1 Comment
andrewhowdencom
Senior Member

As it turns out, there is already a standard mechanism being developed to express this. See https://github.com/census-instrumentation/opencensus-php