Table of Contents

💡 The goal of this “Blog” is to propose some future directions in the field of diversity in modern IR/Recommendation systems, based on insights and analysis of related papers. A further objective is to encourage research activity in this direction and to create opportunities for collaboration and discussion on next steps!

During my previous research on retrieval problems, I gained an understanding of multi-aspect queries and their relevance to documents. This exploration led me to appreciate how “relevant” documents emerge from a plethora of diverse aspects and domains, sparking an interest in the concept of “Diversity” within Information Retrieval (IR) and Recommendation Systems, a direction I found quite fascinating.

While numerous surveys have focused on this domain, many are somewhat dated. However, they still provide a rich, compelling foundation to build upon. Notably, these resources do not cover approaches in modern IR systems, such as those in the Generative IR (Gen-IR) and Retrieval-Augmented Generation (RAG) era. With a passion for and commitment to both Large Language Models (LLMs) and IR, my goal is to bridge these two domains. This endeavor aims to enhance our understanding and open new opportunities for diversity in “modern retrieval.”

Traditionally, “diversity” in retrieval was synonymous with search result diversification, seeking to enrich the variety within retrieval or recommended lists. However, the groundbreaking era of LLMs has introduced the application of LLMs at various stages of the IR framework, offering fresh perspectives on “diversity” with an ambition to not only refine the effectiveness and relevance of modern IR systems but to move beyond result diversification.

Therefore, this blog is structured around two main themes: Diversity in enhancing search result diversification (traditionally defined within the scope of IR diversity) and “Additional” Diversity in Modern Retrieval/Recommendation Systems. In discussing various subsections or papers, I also share my insights, including analyses of strengths, weaknesses, and potential future directions.

1. Diversity in improving search result diversification

1.1. Diversity Definitions in Search Result Diversification

“Diversity” can refer to different things in the expected list of results or recommendations. Fortunately, several definitions of search result diversification have been introduced in the literature. In summary, they are usually categorized into four types of “diversity” [Drosou et al., 2010]:

💡 Although there are multiple definitions of diversity, these terms are still used inconsistently or without clear distinction in research…

1.2. Approaches to Diversifying Search Results

A well-rounded survey of diversification approaches is presented in [Wu et al. 2022]. Therefore, my goal in this section is to introduce some additional approaches and analyze the key characteristics of each. For consistency, I adopt a structure and taxonomy similar to that survey’s. Approaches to diversifying search results can be categorized into three main themes: pre-processing, in-processing, and post-processing.
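To make the post-processing category concrete, below is a minimal sketch of Maximal Marginal Relevance (MMR), a classic greedy re-ranking method that trades off a candidate’s relevance against its similarity to items already selected. The embeddings, relevance scores, and the lambda setting are illustrative assumptions rather than values from any particular paper.

```python
import numpy as np

def mmr_rerank(doc_vecs, relevance, k=10, lam=0.7):
    """Greedy Maximal Marginal Relevance (MMR) re-ranking.

    doc_vecs:  (n, d) candidate embeddings (assumed L2-normalized)
    relevance: (n,) relevance scores of each candidate to the query
    lam:       trade-off weight; lam=1 is pure relevance, lam=0 is pure diversity
    """
    selected, remaining = [], list(range(len(relevance)))
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for i in remaining:
            # Redundancy = max similarity to anything already selected (0 at the start).
            redundancy = max((float(doc_vecs[i] @ doc_vecs[j]) for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage with random stand-in embeddings and relevance scores.
rng = np.random.default_rng(0)
docs = rng.normal(size=(20, 64))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.normal(size=64)
rel = docs @ (query / np.linalg.norm(query))
print(mmr_rerank(docs, rel, k=5))
```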

1.3. Diversity Metrics

Many diversity metrics have been introduced and implemented to evaluate methods in both the search and recommendation domains. All of these metrics aim to measure the dissimilarity and non-redundancy among items in a list. [Zheng et al., 2017] provides a comprehensive collection of such metrics, so many parts of this section are adapted from that paper.
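As one common example of such a metric, here is a small sketch of intra-list diversity (ILD), the average pairwise dissimilarity over a recommended list; the item embeddings are hypothetical placeholders for whatever representation a system uses.

```python
import numpy as np

def intra_list_diversity(item_vecs: np.ndarray) -> float:
    """Average pairwise cosine dissimilarity (1 - cosine similarity)
    over all unordered pairs of items in the list."""
    n = len(item_vecs)
    if n < 2:
        return 0.0
    normed = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    sims = normed @ normed.T
    upper = np.triu_indices(n, k=1)      # count each pair once
    return float(np.mean(1.0 - sims[upper]))

# Toy usage with hypothetical item embeddings.
items = np.random.default_rng(1).normal(size=(10, 32))
print(f"ILD = {intra_list_diversity(items):.3f}")
```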

2. “Additional” Diversity in Modern Retrieval/Recommendation Systems

With the revolutionary era of LLMs, LLMs have been implemented at different stages of the IR framework, most notably with the introduction of RAG systems. As the figure below illustrates, it is significant to me that LLMs have been, or can be, applied at every stage of the IR pipeline.

Figure (Zhu et al., 2024): applications of LLMs in different components of modern IR systems, showing that LLMs have been applied to every component.

At the current stage, few proposed approaches relate directly to diversification in search/recommendation as introduced in the previous sections. However, I argue that many proposed approaches also consider some characteristics of “diversity,” even when their stated goal is only to improve the effectiveness and relevance of modern IR systems.

Within this structure, I include some papers that utilize “diversity” in their proposed techniques and methods. The sections below are divided by the components of a modern IR system.

Rewriter (effective retrieval + diversified results + capturing user intent)

Complex or ambiguous queries can contain multiple aspects, but vanilla RAG systems usually rely on the same set of retrieved contexts to generate answers. The papers below propose techniques for how “diversity” can help in this setting: each diversifies the queries in a different way, with the goal of better capturing user intent or decomposing complex queries. These steps help the retrieval models gather diverse and sufficient evidence for the final results (a minimal sketch of this idea appears after the list below).

  1. Generative Relevance Feedback with Large Language Models, Mackie et al., SIGIR 2023 (short paper). [Paper] - Traditional pseudo-relevance feedback (PRF) methods can falter when the initial results are non-relevant. GRF instead uses Large Language Models (LLMs) to generate long-form text for building probabilistic feedback models, exploring various zero-shot generation subtasks such as queries, entities, and facts. This approach considers diversity in prompting: the authors show that combining all prompting methods better captures users’ intent.
  2. GRM: Generative Relevance Modeling Using Relevance-Aware Sample Estimation for Document Retrieval, Mackie et al., arXiv 2023. [Paper] - This follow-up work (by the same authors) tackles a limitation of the above paper, namely that a lot of irrelevant information can be introduced into the system. It proposes Generative Relevance Modeling (GRM), a technique to enhance query expansion by filtering out irrelevant information generated by LLMs. GRM leverages Relevance-Aware Sample Estimation (RASE) to weight expansion terms more accurately, using a neural re-ranker to assess the relevance of documents similar to those generated. On three document ranking benchmarks, GRM improved both Mean Average Precision (MAP) and Recall (R@1k), outperforming existing methods. The approach first extracts potential subtopics from queries and uses them to generate diverse documents/aspects for those queries.

  3. Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM. This is an interesting approach that draws on the emerging field of LLM-based simulation: the goal is to improve personalization by generating diverse queries that address different user types and a diverse population.
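To make the query-rewriting idea above concrete, here is a minimal sketch that asks an LLM to rewrite a complex or ambiguous query into several aspect-focused sub-queries and pools the retrieved results. The `generate` and `search_fn` callables are placeholders for your LLM client and retriever, and the prompt wording is my own illustrative assumption, not the exact prompts used in these papers.

```python
from typing import Callable, List

def diversify_query(query: str, generate: Callable[[str], str], n: int = 4) -> List[str]:
    """Ask an LLM to rewrite a complex/ambiguous query into `n` sub-queries,
    each targeting a different aspect or plausible user intent."""
    prompt = (
        f"Rewrite the search query below into {n} diverse sub-queries, "
        "each covering a different aspect or interpretation. "
        "Return one sub-query per line.\n\n"
        f"Query: {query}"
    )
    lines = [line.lstrip("0123456789.)-• ").strip() for line in generate(prompt).splitlines()]
    return [line for line in lines if line][:n]

def retrieve_with_diverse_queries(query, generate, search_fn, k_per_query=5):
    """Pool deduplicated results retrieved for the original query plus each rewrite,
    so the downstream reader sees evidence covering multiple aspects of the query."""
    pooled, seen = [], set()
    for sub_q in [query] + diversify_query(query, generate):
        for doc_id in search_fn(sub_q, k=k_per_query):
            if doc_id not in seen:
                seen.add(doc_id)
                pooled.append(doc_id)
    return pooled
```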

Retriever

For the retriever step, the papers below focus on diversity in representations and data types: the first shows how diverse representations can improve text retrieval, while the second combines text and graph information to improve search relevance in the e-commerce domain (a minimal sketch of a sparse-dense hybrid scorer follows the list).

  1. Sparse, Dense, and Attentional Representations for Text Retrieval (Luan et al., 2021) [Paper] - The study proposes a novel neural model that aims to combine the efficiency of dual encoders with the expressive power of more complex attentional architectures. Additionally, the exploration of sparse-dense hybrid models seeks to leverage the precision of sparse retrieval methods. The outcomes suggest these approaches surpass strong existing alternatives in large-scale retrieval scenarios, offering new directions for enhancing retrieval performance by combining techniques from both dense and sparse retrieval paradigms.
  2. An Interpretable Ensemble of Graph and Language Models for Improving Search Relevance in E-Commerce, Choudhary et al., 2024. [Paper] - The paper discusses the complexity of ensuring search relevance in e-commerce, highlighting the difficulty of aligning nuanced user queries with suitable products. Traditional methods like language models and graph neural networks struggle with the rapid pace of technological change, making practical application challenging. This is compounded by issues of model generalizability, experimentation costs, and a lack of interpretability. The proposed solution, Plug and Play Graph Language Model (PP-GLAM), addresses these challenges with an explainable, modular framework that improves search relevance through an ensemble of models. It enhances diversity in search results by integrating different signals and models, ensuring broad coverage of user intents and product relationships. PP-GLAM outperforms existing models on complex, real-world e-commerce datasets.
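In the spirit of the sparse-dense hybrids discussed in the first paper, below is a small sketch that interpolates a lexical BM25 score with a dense embedding similarity. The `rank_bm25` package, the `encode` black box, and the interpolation weight are my own assumptions for illustration, not the actual models from the paper.

```python
import numpy as np
from rank_bm25 import BM25Okapi  # assumed available: pip install rank-bm25

def hybrid_scores(query, docs, encode, alpha=0.5):
    """Interpolate sparse (BM25) and dense (embedding cosine) scores.

    encode: any function mapping a list of strings to an (n, d) array of
            embeddings (e.g., a sentence encoder); treated as a black box here.
    alpha:  weight on the dense score (alpha=0 -> pure BM25).
    """
    # Sparse side: BM25 over whitespace-tokenized documents.
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    sparse = np.asarray(bm25.get_scores(query.lower().split()), dtype=float)

    # Dense side: cosine similarity between query and document embeddings.
    vecs = encode([query] + list(docs))
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    dense = vecs[1:] @ vecs[0]

    # Min-max normalize each signal so the interpolation weight is meaningful.
    def norm(x):
        spread = x.max() - x.min()
        return (x - x.min()) / spread if spread > 0 else np.zeros_like(x)

    return alpha * norm(dense) + (1 - alpha) * norm(sparse)
```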

Reranker

Most existing work using LLMs as rerankers confirms that LLMs can effectively rerank candidate sets. However, most of these approaches focus only on relevance rather than diversity in recommendation.

  1. Enhancing Recommendation Diversity by Re-ranking with Large Language Models [Paper] - This paper uses an LLM in the re-ranking step to improve diversity in recommendations and shows that LLM-based re-ranking does not perform as well as traditional re-ranking approaches. A prompt-design implication from the paper is to prompt for diversity directly rather than asking the model to balance diversity and relevance (see the sketch below).
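As a rough illustration of that prompt-design takeaway, here is a sketch of a re-ranking call that prompts for diversity directly. The `generate` placeholder and the prompt wording are my own assumptions, not the exact prompts evaluated in the paper.

```python
from typing import Callable, List

def llm_diversity_rerank(items: List[str], generate: Callable[[str], str], k: int = 10) -> List[int]:
    """Ask an LLM to reorder a candidate list so the top-k is as diverse as possible,
    prompting for diversity directly instead of a diversity/relevance balance."""
    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(items))
    prompt = (
        "Re-rank the candidate items below so that the top results are as "
        "diverse as possible (covering different genres/topics/aspects). "
        "Respond with the item numbers only, comma-separated, best first.\n\n"
        f"{numbered}"
    )
    reply = generate(prompt)
    order = [int(tok) for tok in reply.replace("\n", ",").split(",") if tok.strip().isdigit()]
    # Keep only valid, unique indices; fall back to the original order for anything missing.
    seen, ranking = set(), []
    for idx in order + list(range(len(items))):
        if 0 <= idx < len(items) and idx not in seen:
            seen.add(idx)
            ranking.append(idx)
    return ranking[:k]
```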

Diversity in Retrieved Context to Aid Generation

The research below brings interesting approaches that incorporate “diversity” into different parts of RAG systems rather than a single component, so I group these papers here to highlight their approaches. As mentioned in previous sections, RAG systems face challenges in how the generation/reader models process diverse contexts, which may contain irrelevant information or misinformation.

  1. To filter irrelevant retrieval results, REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering (Wang et al., 2023) [Paper] enhances Large Language Models’ (LLMs) effectiveness in open-domain QA by improving their ability to discern and utilize relevant external knowledge within Retrieval-Augmented Generation (RAG) systems. By integrating a specially designed rank head for accurate relevance assessment and employing advanced training methods, REAR significantly outperforms existing RAG models on open-domain QA tasks. Code and data are provided for further exploration. Because the approach considers relevance at different granularities, it can deem a “diverse” or broad range of information relevant and include it in the final response.

  2. An additional challenge in RAG systems is the interpretability of the system and its reasoning, which also arises in the search result diversification domain. To address this, ReAct, an innovative approach utilizing large language models, integrates reasoning and action to improve decision-making and information retrieval tasks. By intertwining reasoning traces with actionable steps, it dynamically updates action plans and interfaces with external sources, significantly enhancing task performance and interpretability. It demonstrates superior results on benchmarks like HotpotQA and FEVER, outperforms existing learning methods in interactive scenarios, and shows marked improvements in accuracy, interpretability, and trustworthiness on complex language and decision-making tasks.
  3. Finally, the Auto-CoT: Automatic Chain of Thought Prompting in Large Language Models paper presents an interesting approach to how diversity in demonstrations can aid reasoning. It constructs diverse demonstrations by clustering, similar to other clustering-based diversity-construction methods (see the sketch below).
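To illustrate the clustering-based diversity construction that Auto-CoT relies on, here is a minimal sketch that clusters candidate questions by their embeddings and picks the question closest to each centroid as a demonstration; the use of scikit-learn’s k-means and the `encode` placeholder are simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed available: pip install scikit-learn

def select_diverse_demonstrations(questions, encode, n_demos=4, seed=0):
    """Cluster questions in embedding space and take the question nearest to each
    cluster centroid, so the selected demonstrations cover diverse problem types
    (the core idea behind Auto-CoT's demonstration sampling)."""
    vecs = encode(questions)                                   # (n, d) embeddings
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize for cosine geometry
    km = KMeans(n_clusters=n_demos, random_state=seed, n_init=10).fit(vecs)

    demos = []
    for c in range(n_demos):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(vecs[members] - km.cluster_centers_[c], axis=1)
        demos.append(questions[members[np.argmin(dists)]])     # most central question per cluster
    return demos
```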

3. Summary and Future Directions

Based on my analysis of current trends and approaches, I believe the directions marked below (📡) can build on the strengths (✅) and address the weaknesses (🚫) of current and existing approaches.

Search Result Diversification:

Fact-Checking:

However, with the development of LLMs in IR systems, there are potential challenges in retrieving diverse contexts/evidence for generation. The widely acknowledged problems arising from implementing LLMs in these components are hallucinations and fact-verification issues.

In more detail, diverse contexts may introduce factual conflicts, or even irrelevant information, into the generation model.

A recent survey (Xu et al., 2024) details the different types of conflicts in LLM generation, including conflicts between parametric and non-parametric knowledge, conflicts among contexts, and conflicts within the LLM’s own generations. The survey also covers diverse types of solutions that can help… I highly recommend checking this paper out!

Transparency and explainability:

Curated List of Papers:

  1. Diversity in improving search result diversification
  2. “Additional” Diversity in Modern Retrieval/Recommendation Systems:

📖 Contribution Guidelines:

☘️About

For further collaboration, inquiries, or contributions to this project, please feel free to reach out. I am Hy Dang, a PhD student at the University of Notre Dame with interests in NLP, IR, search, and recommendation, the creator of this page, specializing in Information Retrieval and Diversity Research. I welcome any questions or suggestions to make this GitHub page more useful. You can contact me via email at hdang@nd.edu. I look forward to potential collaborations and enriching discussions on this topic.