This article introduces semantic search, knowledge graphs, and vector databases, and gives an overview of performing semantic search on your own private data.

What is Semantic Search?

Semantic search is a search technology that uses natural language processing algorithms to understand the meaning and context of words and phrases to provide more accurate search results. This approach is based on the idea that a search engine should not only match keywords in a query, but also try to understand the intent of the user's search and the relationship between the words used.

Semantic search aims to go beyond traditional keyword-based search algorithms by using techniques such as entity recognition, concept matching, and semantic analysis to identify the relationship between words, phrases, and concepts. It also considers synonyms, related terms, and context to provide more relevant search results.
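The difference between literal keyword matching and a search that also considers synonyms can be illustrated with a deliberately tiny sketch. The corpus and the hand-written synonym table below are invented for illustration; a real semantic search system would learn such relationships from data rather than look them up in a fixed table.

```python
# Toy corpus; the documents are invented for illustration.
DOCS = {
    "doc1": "How to treat a heart attack in the first hour",
    "doc2": "Best hiking trails near the city",
    "doc3": "Early signs of myocardial infarction",
}

# Hypothetical, hand-written synonym table standing in for learned semantics.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
}


def keyword_search(query, docs):
    """Return ids of documents that contain the query string verbatim."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())


def expanded_search(query, docs, synonyms):
    """Return ids of documents that contain the query or any listed synonym."""
    q = query.lower()
    terms = [q] + [s.lower() for s in synonyms.get(q, [])]
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(term in text.lower() for term in terms)
    )


print(keyword_search("heart attack", DOCS))             # ['doc1']
print(expanded_search("heart attack", DOCS, SYNONYMS))  # ['doc1', 'doc3']
```

The keyword search misses doc3 because it never mentions the words "heart attack", even though it is about the same concept.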

Overall, semantic search aims to provide more precise and meaningful search results that better reflect user intent beyond just matching keywords. This makes it especially useful for complex queries, such as those related to scientific research, medical information, or legal documents.

History of Semantic Search

The concept of semantic search dates back to the early days of computer science, with attempts to develop natural language processing systems in the 1950s and 1960s. However, it wasn't until the 1990s and 2000s that significant progress was made in the field of semantic search, thanks in part to advances in machine learning and artificial intelligence.

One of the earliest examples of semantic search was the Cyc project, started in 1984 by Douglas Lenat. The project aimed to build a comprehensive commonsense ontology and knowledge base that could be used to understand natural language queries. Although the Cyc project faced many challenges and ultimately did not achieve its goals, it laid the foundation for future research in semantic search.

In the late 1990s, search engines such as Ask Jeeves (now Ask.com) began experimenting with natural language query and semantic search techniques. These early efforts were limited by the technology of the time, but they demonstrated the potential of more sophisticated search algorithms.

The development of the Web Ontology Language (OWL) in the early 2000s provided a standardized way to represent knowledge and relationships in a machine-readable format, making it easier to develop semantic search algorithms. Companies such as Powerset, acquired by Microsoft in 2008, and Hakia, launched in 2007, used semantic search technology to provide more relevant search results.

Today, many search engines and companies are using semantic search to improve the accuracy and relevance of search results. These include Google, which launched the Knowledge Graph in 2012, and Amazon, which uses semantic search to power its Alexa virtual assistant. As the field of artificial intelligence continues to develop, semantic search is likely to become more sophisticated and applicable to a wide range of applications.

Recent Improvements to Semantic Search

Semantic search has seen some recent improvements that help push the field further. Some of the most notable include:

Transformer-Based Models: Models such as BERT (Bidirectional Encoder Representations from Transformers) have revolutionized natural language processing and semantic search. These models are better able to understand the context of words and phrases, making it easier to provide more relevant search results.
Multimodal Search: Multimodal search refers to the ability to search for information across multiple modalities such as text, images, and video. Recent advances in machine learning have made it possible to develop more accurate and complex multimodal search algorithms.
Conversational Search: Conversational search involves the use of natural language processing and machine learning to provide more accurate and human-like responses to user queries. The technology is already used in virtual assistants such as Amazon's Alexa and Apple's Siri.
Personalization: Personalization refers to the ability to customize search results based on a user's preferences and previous search history. This becomes increasingly important as the amount of data available online continues to grow.
Domain-Specific Search: Domain-specific search involves searching within a specific domain or industry, such as healthcare or finance, using semantic search techniques. This helps provide more accurate and relevant search results to users in these industries.

Overall, recent advances in semantic search have made it easier to find information online and paved the way for more sophisticated search algorithms in the future.

What is the relationship between semantic search and knowledge graphs?

Semantic search and knowledge graphs are closely related because both involve the use of semantic techniques to improve search results.

A knowledge graph is a structured information database that uses semantic techniques to represent knowledge in a machine-readable format. It usually consists of entities (such as people, places, and things) and the relationships between them. For example, a knowledge graph might contain information about a specific company, including its location, products, and employees, and the relationships between these entities.
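As a minimal sketch of this idea, a knowledge graph can be modeled as a set of (subject, predicate, object) triples. The `KnowledgeGraph` class, the company "Acme Corp", and all relationship names below are hypothetical and chosen only to mirror the company example above; production systems typically use a dedicated graph store.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A tiny in-memory graph of (subject, predicate, object) triples."""

    def __init__(self):
        # subject -> list of (predicate, object) edges
        self._edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        """Record that `subject` is related to `obj` via `predicate`."""
        self._edges[subject].append((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object linked to `subject` by `predicate`."""
        return [o for p, o in self._edges.get(subject, []) if p == predicate]


# Hypothetical entities and relationships for illustration only.
kg = KnowledgeGraph()
kg.add("Acme Corp", "located_in", "Berlin")
kg.add("Acme Corp", "makes", "Rocket Skates")
kg.add("Acme Corp", "employs", "Alice")
kg.add("Alice", "works_on", "Rocket Skates")

# One-hop question: where is the company located?
print(kg.objects("Acme Corp", "located_in"))

# Two-hop question: what do the company's employees work on?
projects = [
    product
    for employee in kg.objects("Acme Corp", "employs")
    for product in kg.objects(employee, "works_on")
]
print(projects)
```

Following edges across more than one hop, as in the second query, is what lets a search system answer questions about relationships rather than about isolated keywords.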

Semantic search, on the other hand, is a search technology that uses natural language processing and machine learning to better understand the meaning of words and phrases in search queries. Semantic search algorithms use knowledge graphs and other semantic techniques to analyze the relationships between entities and concepts and provide more relevant search results based on this analysis.

In other words, the knowledge graph provides the underlying structure and data for semantic search algorithms. By leveraging the relationships and context provided by knowledge graphs, semantic search algorithms are able to provide more accurate and meaningful search results that better match user intent.

For example, Google's Knowledge Graph uses a vast database of structured data to power its search results and provide additional information about entities (such as people, places, and things) that appear in search results. This makes it easier for users to find the information they are looking for and explore related concepts and entities.

Vector databases, knowledge graphs and semantic search

Vector databases are another technology that can be used in conjunction with semantic search and knowledge graphs to improve search results.

Vector databases store data as vectors (often called embeddings): numerical representations, typically produced by machine learning models, that can be used for computational tasks such as similarity search, clustering, and classification. These vectors can represent entities, concepts, and other types of data in a way that allows for more accurate and efficient processing.

In the context of semantic search and knowledge graphs, vector databases can improve the accuracy of search results by better understanding the relationships between entities and concepts. For example, vectors can be used to represent entities such as people, places, and things and the relationships between them. By comparing these vectors, search algorithms can identify relationships and patterns that might not be immediately apparent in the data itself.
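Comparing vectors usually means computing a similarity score such as cosine similarity and returning the closest items. The sketch below shows that idea with hand-made three-dimensional vectors; the names and values are invented for illustration, whereas real embeddings come from a trained model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy vectors; real embeddings are produced by a model.
VECTORS = {
    "Paris": [0.9, 0.1, 0.0],
    "London": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def most_similar(name, vectors, k=1):
    """Return the k entries closest to `name`, excluding `name` itself."""
    target = vectors[name]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in vectors.items()
        if other != name
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


print(most_similar("Paris", VECTORS))  # ['London']
```

A vector database performs the same kind of comparison, but uses specialized indexes so that it does not have to score every stored vector for each query.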

For example, when a user searches for "Paris," a semantic search algorithm can use knowledge graphs and vector databases to understand that the user is likely referring to the city of Paris, France, rather than other entities of the same name. By using vector databases to represent and compare entities and concepts, search algorithms can provide more relevant and accurate search results.
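One possible way to sketch this disambiguation: take candidate entities of the same name (as a knowledge graph might supply them), compare the remaining query words with each candidate's description as word-count vectors, and fall back to a popularity prior when the query gives no context. The candidate list, the descriptions, and the `prior` values are all made up for illustration, and real systems would use learned embeddings rather than word counts.

```python
import math
from collections import Counter

# Hypothetical candidates for the mention "Paris"; priors are invented.
CANDIDATES = {
    "Paris, France": {
        "context": "capital city france europe eiffel tower seine",
        "prior": 0.90,
    },
    "Paris, Texas": {
        "context": "city texas united states lamar county",
        "prior": 0.07,
    },
    "Paris Hilton": {
        "context": "american media personality hotel heiress",
        "prior": 0.03,
    },
}


def bag_of_words(text):
    """Turn text into a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disambiguate(query, mention="paris"):
    """Pick the candidate whose description best matches the query context.

    Ties (including a query with no context at all) are broken by the prior.
    """
    context_words = [t for t in query.lower().split() if t != mention]
    context = bag_of_words(" ".join(context_words))
    ranked = sorted(
        CANDIDATES.items(),
        key=lambda item: (
            cosine(context, bag_of_words(item[1]["context"])),
            item[1]["prior"],
        ),
        reverse=True,
    )
    return ranked[0][0]


print(disambiguate("Paris"))              # Paris, France
print(disambiguate("Paris Texas rodeo"))  # Paris, Texas
```

With no extra context the most common reading wins, while a few additional query words are enough to shift the result to a different entity.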

Overall, vector databases, semantic search, and knowledge graphs are all technologies that work together to improve the accuracy and efficiency of search algorithms. By leveraging these techniques, search engines and other applications can better understand the relationships between entities and concepts, making it easier to find the information users are looking for.

How to Semantic Search Your Own Private Data

In my previous article "ChatGPT and Elasticsearch: OpenAI meets private data (1)", I described in detail that although LLMs (Large Language Models) can perform semantic search, they cannot do so over private data, because that data is invisible to them. In addition, because every training run of an LLM is expensive, the models cannot be retrained on new data in a timely manner, which further limits their usefulness. As I explained in that article, we can combine Elasticsearch with LLMs to perform semantic search over private data.

For detailed steps on this demonstration, see the article "ChatGPT and Elasticsearch: OpenAI meets private data (Part 2)".
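The general shape of combining Elasticsearch with an LLM can be sketched as follows: retrieve the most relevant private documents with a vector (kNN) search, then pass them to the LLM as context. This is not the code from the articles mentioned above. The field names `embedding` and `text`, the helper names, and the sample hits are assumptions for illustration, and the sketch only builds the request body and the prompt without contacting Elasticsearch or an LLM.

```python
import json


def build_knn_request(query_vector, field="embedding", k=3, num_candidates=50):
    """Build a search request body using the Elasticsearch 8.x `knn` option.

    `field` is assumed to be a dense_vector field holding document embeddings.
    """
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["text"],
    }


def build_prompt(question, hits):
    """Combine retrieved passages and the user question into one LLM prompt."""
    passages = "\n\n".join(hit["_source"]["text"] for hit in hits)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{passages}\n\n"
        f"Question: {question}"
    )


# The query vector would come from an embedding model; these numbers are fake.
request_body = build_knn_request([0.12, -0.03, 0.88])
print(json.dumps(request_body, indent=2))

# Fake hits shaped like the `hits.hits` array of a search response.
sample_hits = [
    {"_source": {"text": "Our VPN requires the corporate certificate."}},
    {"_source": {"text": "Certificates are renewed every 12 months."}},
]
print(build_prompt("How often are VPN certificates renewed?", sample_hits))
```

Because the private documents are supplied at query time, the LLM never needs to be retrained on them, which addresses both limitations described above.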

For semantic search, in addition to the above solution, Elastic also offers the Elasticsearch Relevance Engine™. With it, we can use Elastic's out-of-the-box Learned Sparse Encoder model to implement ML-based search without training or maintaining a model, which can provide highly relevant semantic search across a variety of domains. For detailed reading, please refer to:

If you want to learn more about NLP and semantic search, please refer to the "NLP - Natural Language Processing and Vector Search" chapter in "Elastic: A Developer's Guide".

Elasticsearch: An Overview of Semantic Search, Knowledge Graphs, and Vector Databases