Select top N elements of related objects

Mhd :

I have a class Product to hold a specific instance of a given product. This class has a list of related products that are similar to main product.

class Product
{
    public string Name;
    public double Rating;
    public List<Product> RelatedProducts;
    //...
    public List<Product> GetTopRelatedProducts(int N)
    {
        // How to implement this method
        // What I did so far(Bad code)
        //     1- INFINITE RECURSION
        //     2- Can not remember visited objects
        var myList = new List<Product>();
        foreach(Product prod in RelatedProducts)
        {
             myList.AddRange(prod.GetTopRelatedProducts(N));
        }
        return myList.Distinct().OrderByDescending(x => x.Rating).Take(N).ToList();
    }
}

I want to define a method within Product class to get top N (best rated) related products. This method should take into consideration that elements in RelatedProducts list are of type Product and they also have their own RelatedProducts list. So I should keep navigating nested object until all related products were reached and after that, take the top N product. I mean, the solution would not be simply this.RelatedProducts.OrderByDescending(x => x.Rating).Take(N);

One more thing to keep in mind: Two products can be mutually related. Which means a product A can belong to RelatedProducts list of a product B and B also can belong to RelatedProducts list of product A.

Any suggestion how to solve this issue in optimal way?

Imagine, I have millions of products to maintain. How can I navigate all related products recursively and recognize already visited ones?

I tagged this as both C# and Java, as the same logic could be applied to both languages

Ivan Stoev :

Imagine, I have millions of products to maintain. How can I navigate all related products recursively and recognize already visited ones?

It doesn't need to be recursive. Explicit Stack or Queue can serve the navigating part. For collecting the result a HashSet can be used instead of List. It would serve two purposes - to allow you to skip the already visited elements and also eliminate the need of Distinct at the end.

Here is a sample Queue based implementation:

public List<Product> GetTopRelatedProducts(int N)
{
    var relatedSet = new HashSet<Product>();
    var relatedListQueue = new Queue<List<Product>>();
    if (RelatedProducts != null && RelatedProducts.Count > 0)
        relatedListQueue.Enqueue(RelatedProducts);
    while (relatedListQueue.Count > 0)
    {
        var relatedList = relatedListQueue.Dequeue();
        foreach (var product in relatedList)
        {
            if (product != this && relatedSet.Add(product) && product.RelatedProducts != null && product.RelatedProducts.Count > 0)
                relatedListQueue.Enqueue(product.RelatedProducts);
        }
    }
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

Update: For completeness, here are the other possible implementations of the related set collecting part:

With explicit Stack:

public List<Product> GetTopRelatedProducts(int N)
{
    if (RelatedProducts == null || RelatedProducts.Count == 0)
        return new List<Product>();
    var relatedSet = new HashSet<Product>();
    var pendingStack = new Stack<List<Product>.Enumerator>();
    var relatedList = RelatedProducts.GetEnumerator(); 
    while (true)
    {
        while (relatedList.MoveNext())
        {
            var product = relatedList.Current;
            if (product != this && relatedSet.Add(product) && product.RelatedProducts != null && product.RelatedProducts.Count > 0)
            {
                pendingStack.Push(relatedList);
                relatedList = product.RelatedProducts.GetEnumerator();
            }
        }
        if (pendingStack.Count == 0) break;
        relatedList = pendingStack.Pop();
    } 
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

Although a bit more verbose than the explicit Queue based implementation, this method has less space requirements - O(height) where height is the maximum depth.

The benefit of both iterative implementations is that of course they can handle much bigger depth than the recursive solutions which can lead to StackOverflowExpection. But if the depth is not expected to be so big and you prefer recursion, then here are a couple recursive implementations (all they need to have access to the relatedSet and this):

With classic private recursive method:

public List<Product> GetTopRelatedProducts(int N)
{
    var relatedSet = new HashSet<Product>();
    GetRelatedProducts(this, relatedSet);
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

private void GetRelatedProducts(Product product, HashSet<Product> relatedSet)
{
    if (product.RelatedProducts == null) return;
    foreach (var item in product.RelatedProducts)
        if (item != this && relatedSet.Add(item))
            GetRelatedProducts(item, relatedSet);
}

With recursive lambda:

public List<Product> GetTopRelatedProductsD(int N)
{
    var relatedSet = new HashSet<Product>();
    Action<Product> GetRelatedProducts = null;
    GetRelatedProducts = product =>
    {
        if (product.RelatedProducts == null) return;
        foreach (var item in product.RelatedProducts)
            if (item != this && relatedSet.Add(item))
                GetRelatedProducts(item);
    };
    GetRelatedProducts(this);
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();
}

Last, but not least, with the latest C# 7.0 addition - recursive local function:

public List<Product> GetTopRelatedProducts(int N)
{
    var relatedSet = new HashSet<Product>();
    GetRelatedProducts(this);
    return relatedSet.OrderByDescending(x => x.Rating).Take(N).ToList();

    void GetRelatedProducts(Product product)
    {
        if (product.RelatedProducts == null) return;
        foreach (var item in product.RelatedProducts)
            if (item != this && relatedSet.Add(item))
                GetRelatedProducts(item);
    }
}

All these methods handle (IMO) optimally the collecting part. The top N part of course is not optimal - O(N*log(N)) and can be optimized as mentioned in @Amit Kumar's answer, but it would require implementing a missing standard data structure, which is outside the scope of a SO answer.

Guess you like

Origin http://10.200.1.11:23101/article/api/json?id=454042&siteId=1