We have a very large product database, currently about 4.5M active products, running on Magento 2.4.1 with Elasticsearch 7.9.
Every few days (3 to 8), the indexing process runs and builds a new search index. What then appears to happen is that, at some point during indexing, the alias is pointed at this new index. Then something comes along, resets the index, and continues building it.
So, indexer "A" starts and gets to around 4M products. My guess is that, as it picks up the "last batch" of products to index, indexer "B" kicks off. Then, for some reason, the "new" index is reset and the alias is updated (I believe by the first indexer, "A") to point to this "new" (now reset) index, which may contain only some of the products. We have seen it with as few as 4K and as many as 1.9M. Either way, several million products still have to be indexed, which takes several hours.
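To pin down the timing, something like the following could be run on a schedule to record which index the alias points to and how many documents it holds. This is only a rough sketch: the node URL and the alias name `magento2_product_1` are assumptions for illustration, and `detect_events` is a hypothetical helper, not anything from Magento. It uses the standard Elasticsearch `_cat/aliases` and `_count` endpoints.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: Elasticsearch node address
ALIAS = "magento2_product_1"       # assumption: hypothetical alias name


def fetch_snapshot(es_url=ES_URL, alias=ALIAS):
    """Return (index_name, doc_count) for the index the alias currently targets."""
    with urllib.request.urlopen(f"{es_url}/_cat/aliases/{alias}?format=json") as resp:
        rows = json.load(resp)
    index = rows[0]["index"]  # assumes the alias exists and has a single target
    with urllib.request.urlopen(f"{es_url}/{index}/_count") as resp:
        count = json.load(resp)["count"]
    return index, count


def detect_events(snapshots):
    """Scan time-ordered (index, count) snapshots for alias switches and count drops."""
    events = []
    for (prev_idx, prev_cnt), (idx, cnt) in zip(snapshots, snapshots[1:]):
        if idx != prev_idx:
            events.append(f"alias switched: {prev_idx} -> {idx} ({cnt} docs)")
        elif cnt < prev_cnt:
            events.append(f"count dropped on {idx}: {prev_cnt} -> {cnt}")
    return events
```

Collecting `fetch_snapshot()` results every minute or so with a timestamp, then passing them to `detect_events`, should show whether the alias moves before the new index is full and whether the count falls afterwards. Those times can then be matched against the cron and indexer logs.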
It appears that some kind of mutex is either not being set or is being ignored, so two index processes end up running at the same time. The first one completes and moves the alias, but the second has reset the index out from under the first.
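For clarity, this is the kind of guard I would expect to be in place. It is a minimal sketch of a non-blocking exclusive lock using `flock` on a Unix-like system, not Magento's actual implementation; the lock file path and the function names are made up for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file; a real deployment would pick its own location.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "fulltext_reindex.lock")


def try_acquire(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def release(handle):
    """Release the lock and close the file."""
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()
```

With a guard like this, indexer "B" would get `None` from `try_acquire()` while "A" is still running and could log that fact and exit rather than resetting the index. What we observe suggests that whatever plays this role is either released too early or never checked.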
Any pointers to where the indexer mutex or check is implemented would be appreciated, so that we can add debug information there.