Configuring Algolia search in VuePress, with a guide to avoiding the pitfalls

Today is the first day of the 2023 college entrance examination. As a first-year high school student, I signed up to try it as well. I feel that this year's Chinese and math papers in Shandong were not very hard (at least easier than the first and second mock exams), although of course my score will definitely be bad.

Best of luck to all candidates!

What is it

The slogan on the Algolia DocSearch website is "Free Algolia Search For Developer Docs"; in other words, Algolia provides a free search service for developers.

In effect, it is a search engine for a single site. It crawls your website just like a regular search engine does and then gives you an API. When users search on your website, the site only needs to call that API.
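To make "call the API" more concrete, here is a minimal sketch of what a search request against Algolia's REST search endpoint could look like. In practice the DocSearch front-end component sends this for you. The helper name buildSearchRequest and the placeholder credentials are assumptions for illustration and do not come from the original setup.

```javascript
// Sketch: build a search request for Algolia's REST search endpoint.
// buildSearchRequest is a hypothetical helper; all credentials are placeholders.
function buildSearchRequest(appId, apiKey, indexName, query) {
  return {
    url: `https://${appId}-dsn.algolia.net/1/indexes/${encodeURIComponent(indexName)}/query`,
    options: {
      method: "POST",
      headers: {
        "X-Algolia-Application-Id": appId,
        "X-Algolia-API-Key": apiKey, // use the search-only key, never an admin key
        "Content-Type": "application/json",
      },
      // Search parameters are sent as a URL-encoded string in "params".
      body: JSON.stringify({ params: "query=" + encodeURIComponent(query) }),
    },
  };
}

const request = buildSearchRequest(
  "YOUR_APP_ID",
  "YOUR_SEARCH_API_KEY",
  "YOUR_INDEX_NAME",
  "vuepress"
);

// To actually run the search (needs real credentials and network access):
// fetch(request.url, request.options)
//   .then((res) => res.json())
//   .then((data) => console.log(data.hits));
```

Building the request separately from sending it keeps the sketch runnable without credentials; the commented fetch call shows where the network request would happen.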

Don't underestimate it; the following projects have adopted the service:

[Image: projects that use DocSearch]

One more point before we start: although DocSearch is all that many of us know about Algolia, their business goes well beyond it.

How to use it

This article covers the approach of applying on the official website to have your site crawled. According to the documentation, you can also run the crawler yourself and upload the data to Algolia. If you need that, please read up on it yourself: Run your own | DocSearch by Algolia

Apply for crawling

First, visit DocSearch by Algolia and click "Apply" on that page. Fill in your website address, your email address, and the address of your open source repository (DocSearch requires that your website be open source).

After applying, just wait for the email from Algolia. They send two emails: one to tell you the application was approved and one to tell you the crawl has finished. In my experience the two arrive within 15 minutes of each other, but I waited three days before the first one came.

Get API information

From the official email, take the three values appId, apiKey, and indexName, and then configure them according to the documentation of the VuePress theme or framework you use.
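As a rough illustration, this is how the three values might be wired in for the VuePress 1.x default theme. That choice of theme is an assumption; other themes (including vuepress-theme-reco) and VuePress 2 expose these options differently, so check the documentation for your own setup. The placeholder strings stand in for the values from your email.

```javascript
// Sketch of .vuepress/config.js, assuming the VuePress 1.x default theme.
// Replace the placeholders with the values from the Algolia email.
const config = {
  themeConfig: {
    algolia: {
      appId: "YOUR_APP_ID",
      apiKey: "YOUR_SEARCH_API_KEY",
      indexName: "YOUR_INDEX_NAME",
    },
  },
};

// In a real config file you would export the object:
// module.exports = config;
```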


Once the configuration is done, try searching for a keyword (make sure some page on your site actually contains it). If results come back normally, congratulations, you are finished. If, like me, you get No Results for everything, read on.

Modify crawling configuration

Why does search find nothing after the crawl? Because when Algolia crawls a page, it only indexes the text under elements that match the configured element selectors, so in most cases we need to specify the selectors manually.

Configuration address: Crawlers | Crawler Admin Console

On the home page, click your application, then click Editor on the left side of the new page.

The configuration page shows a very long JS file; we only need to pay attention to the first part:

new Crawler({
  rateLimit: 8,
  maxDepth: 10,
  maxUrls: 5000,
  startUrls: ["https://www.yixiangzhilv.com/"],
  renderJavaScript: false,
  sitemaps: ["https://www.yixiangzhilv.com/sitemap.xml"],
  ignoreCanonicalTo: true,
  discoveryPatterns: ["https://www.yixiangzhilv.com/**"],
  schedule: "at 12:20 on Monday",
  actions: [
    {
      indexName: "yixiangzhilv",
      pathsToMatch: ["https://www.yixiangzhilv.com/**"],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl1: ".page-container h1",
            content: ".theme-reco-default-content p, .content__default li",
            lvl0: {
              selectors: "p.sidebar-heading.open",
              defaultValue: "Documentation",
            },
            lvl2: ".theme-reco-default-content h2",
            lvl3: ".theme-reco-default-content h3",
            lvl4: ".theme-reco-default-content h4",
            lvl5: ".theme-reco-default-content h5",
            lang: "",
            tags: {
              defaultValue: ["v1"],
            },
          },
          aggregateContent: true,
        });
      },
    },
  ],
  ...
});

You can adjust startUrls, pathsToMatch, and the other parameters to suit your needs. Note that the URLs in the initially generated configuration are sometimes not the root of the website but something like /docs/**, so check them carefully. In addition, if your site is rendered on the client side, you need to enable the renderJavaScript option. (Added 2023-07-12: I did not enable it at first and the crawl still succeeded. Later it started reporting errors, which is how I found this problem. I don't know why it worked before without it.)
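As a hedged illustration of what to review, the sketch below lists those fields for a hypothetical site at https://example.com/ and adds a small helper that flags patterns which do not cover the site root. The helper patternsNotAtRoot and the example.com URLs are assumptions for illustration only; the field names are the ones that appear in the crawler configuration above.

```javascript
// Sketch: the fields worth reviewing in the Crawler editor.
// example.com and the /docs/ path are hypothetical.
const siteRoot = "https://example.com/";

const crawlerSettings = {
  startUrls: [siteRoot],
  discoveryPatterns: [`${siteRoot}**`],
  renderJavaScript: true, // needed when pages are rendered on the client side
  actions: [
    {
      indexName: "YOUR_INDEX_NAME",
      pathsToMatch: [`${siteRoot}**`],
    },
  ],
};

// Hypothetical helper: return the patterns that are narrower than the whole site,
// such as an auto-generated "https://example.com/docs/**".
function patternsNotAtRoot(patterns, root) {
  return patterns.filter((pattern) => pattern !== `${root}**`);
}

const suspicious = patternsNotAtRoot(["https://example.com/docs/**"], siteRoot);
console.log(suspicious); // lists the patterns to double-check
```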

The part to focus on is recordProps. Below are the default configuration I was given and my modified configuration, so you can compare them:

Default configuration

recordProps: {
  lvl1: ".content__default h1",
  content: ".content__default p, .content__default li",
  lvl0: {
    selectors: "p.sidebar-heading.open",
    defaultValue: "Documentation",
  },
  lvl2: ".content__default h2",
  lvl3: ".content__default h3",
  lvl4: ".content__default h4",
  lvl5: ".content__default h5",
  lang: "",
  tags: {
    defaultValue: ["v1"],
  },
},

Modified configuration

recordProps: {
  lvl1: ".page-container h1",
  content: ".theme-reco-default-content p, .content__default li",
  lvl0: {
    selectors: "p.sidebar-heading.open",
    defaultValue: "Documentation",
  },
  lvl2: ".theme-reco-default-content h2",
  lvl3: ".theme-reco-default-content h3",
  lvl4: ".theme-reco-default-content h4",
  lvl5: ".theme-reco-default-content h5",
  lang: "",
  tags: {
    defaultValue: ["v1"],
  },
},


See the difference? We need to tell Algolia which elements to extract text from, based on where the body content of our site sits in the HTML document. For example, with the vuepress-theme-reco theme that I use, the text has to be extracted from .theme-reco-default-content.

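One way to find the right selectors is to count how many elements each candidate matches on one of your article pages. The sketch below does that; countSelectorMatches is a hypothetical helper, not part of Algolia or VuePress. Keep in mind that with renderJavaScript disabled the crawler may see different HTML from what your browser shows, so treat the counts as a hint only.

```javascript
// Sketch: count how many elements each recordProps selector matches.
// `root` is anything with a querySelectorAll method, e.g. `document` in a browser.
function countSelectorMatches(root, selectors) {
  const counts = {};
  for (const [name, selector] of Object.entries(selectors)) {
    counts[name] = root.querySelectorAll(selector).length;
  }
  return counts;
}

// In the browser console on an article page of your site:
// countSelectorMatches(document, {
//   lvl1: ".page-container h1",
//   lvl2: ".theme-reco-default-content h2",
//   content: ".theme-reco-default-content p, .content__default li",
// });
```

A selector that reports 0 matches is a likely reason for empty search results, and is the one to replace with a selector that fits your theme's markup.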

After making the changes, enter the URL of a page on your own site into the URL Tester on the right side of the editor to test it. Pick an article page rather than the home page, since the home page has nothing worth indexing. If Records shows content, it worked.

[Screenshot: the editor page and URL Tester, with numbered annotations]

After that, click the small eye icon marked 4 in the screenshot above to return to the Overview page, click the Restart crawling button in the upper right corner to restart the crawler, and wait patiently for the crawl to finish!



Origin blog.csdn.net/weixin_44495599/article/details/132022146