Troubleshooting the crawler

  • February 17, 2022

This post provides steps for troubleshooting the issues you can run into when you set up the crawler. We will keep adding troubleshooting guides for more issues, but the post currently covers:

  • Domain ownership could not be verified
  • Sitemap could not be processed

Please use the comments to tell us which other issues you need troubleshooting guides for, and to share any other feedback on the content of this post.

Domain ownership could not be verified

If you have received an email saying that the domain ownership could not be verified, or you have seen this message in the Crawler settings UI, check the following:

  1. Make sure the homepage of your website (also known as the index or root page) is up and publicly available. The page must not be restricted by a user login, password, IP restriction, or similar.
  2. Make sure the crawler’s domain verification tag is implemented in the <head> section of the homepage of the website you want to crawl. Even if your crawler is set up to crawl only a subset of pages on your website, the domain verification tag must be placed on the homepage.
    Note: you can have several verification tags for different crawlers on the same domain.
  3. Make sure you’ve included the correct tag from your crawler, and that it is added in its entirety, with no typos.

The crawler will try to verify the domain ownership the next time it runs, which can take up to 24 hours. You will be notified by email if the verification fails.
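Before waiting for the next crawler run, it can help to check step 1 yourself. The following is a minimal Python sketch, not the crawler's own logic (which this post does not describe). It fetches a page anonymously and reports obvious access problems. The function names `classify` and `check_homepage`, the `User-Agent` value, and the redirect heuristic are all assumptions made for illustration. IP restrictions cannot be detected this way if you run the check from an allowed network.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def classify(status: int, requested_url: str, final_url: str) -> str:
    """Turn an HTTP outcome into a short verdict (hypothetical helper)."""
    if status in (401, 403):
        return "restricted: the server asks for credentials or denies access"
    if status >= 400:
        return f"unavailable: HTTP {status}"
    # Heuristic (an assumption): a redirect that changes the path often
    # means the visitor was sent to a login or consent page.
    requested_path = urlsplit(requested_url).path.rstrip("/")
    final_path = urlsplit(final_url).path.rstrip("/")
    if requested_path != final_path:
        return "redirected: possibly to a login page, check " + final_url
    return "ok"


def check_homepage(url: str, timeout: float = 10.0) -> str:
    """Fetch url without cookies or credentials and classify the result."""
    request = urllib.request.Request(url, headers={"User-Agent": "availability-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status, url, response.geturl())
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return classify(error.code, url, url)
    except urllib.error.URLError as error:
        return f"unreachable: {error.reason}"
```

Calling `check_homepage("https://example.com")` from a machine outside your own network gives the closest approximation of what an external crawler would see.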

The verification tag looks like this:

<meta name="zd-site-verification" content="crawler-verification-token">

You must place this verification meta tag within the <head> section, which comes before the <body> section, in the source code of your homepage.

If you’ve placed your tag correctly, it will look like this:

<html>

  <head>

    <meta name="zd-site-verification" content="crawler-verification-token">

    <title>Title</title>

    <style>

      /* style info here */

    </style>

  </head>

  <body>

    <!-- body of the page here -->

  </body>

</html>


Make sure you aren’t nesting your verification tag within another tag of the <head> section. For example, this would fail because the tag has been placed within <style>:

<html>

  <head>

    <title>Title</title>

    <style>

      <meta name="zd-site-verification" content="crawler-verification-token">

    </style>

  </head>

  <body>

    <!-- body of the page here -->

  </body>

</html>


Ensure you have not placed your verification tag within the <body> section of your page’s source code. For example, this is incorrect:

<html>

  <head>

    <title>Title</title>

  </head>

  <body>

    <meta name="zd-site-verification" content="crawler-verification-token">

  </body>

</html>

Additionally, make sure the source code of your page has a <head> section. For example, this would be incorrect:

<html>

    <meta name="zd-site-verification" content="crawler-verification-token">

</html>
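The placement rules above can be checked locally before the crawler runs. The following is a rough Python sketch using the standard library's `html.parser`. It is not the crawler's actual verification code. The names `VerificationTagFinder` and `check_placement` are hypothetical, and the sketch deliberately follows the rules as stated in this post (an explicit <head> is required), which is stricter than what a browser would tolerate.

```python
from html.parser import HTMLParser

TAG_NAME = "zd-site-verification"


class VerificationTagFinder(HTMLParser):
    """Records where each verification meta tag sits in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.section = None    # "head", "body", or None when outside both
        self.raw_text = None   # "style" or "script" while inside one
        self.placements = []   # one entry per verification tag found

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self.section = tag
        elif tag in ("style", "script"):
            self.raw_text = tag
        elif tag == "meta" and dict(attrs).get("name") == TAG_NAME:
            self.placements.append(self.section or "outside <head>")

    def handle_endtag(self, tag):
        if tag in ("head", "body"):
            self.section = None
        elif tag == self.raw_text:
            self.raw_text = None

    def handle_data(self, data):
        # Content of <style> and <script> is delivered as plain text, so a
        # tag nested there never reaches handle_starttag.
        if self.raw_text and TAG_NAME in data:
            self.placements.append(f"nested in <{self.raw_text}>")


def check_placement(html: str) -> str:
    """Return "ok", "missing", or a description of the first bad placement."""
    finder = VerificationTagFinder()
    finder.feed(html)
    finder.close()
    if not finder.placements:
        return "missing"
    bad = [place for place in finder.placements if place != "head"]
    return bad[0] if bad else "ok"
```

Several verification tags in the same <head> all count as correctly placed, which matches the note that one domain can carry tags for different crawlers.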

 

Sitemap could not be processed

  1. Read the "1) Add sitemap" section in this post for an explanation of how the crawler uses your sitemap.
  2. Make sure the sitemap is being served and is publicly available. The sitemap must not be restricted by a user login, password, IP restriction, or similar.
  3. Make sure the crawler points to the URL of the sitemap you want it to use. To be certain the crawler finds your sitemap, use the full URL including the file name, for example:
    https://example.com/custom-sitemaps/custom-sitemap1.xml
    Note: Currently you cannot edit the sitemap URL of an existing crawler, so if the crawler was not set up correctly you will have to create a new crawler with the right sitemap URL.
  4. Make sure your sitemap is an XML URL sitemap, as this is currently the only supported format. The sitemap has to follow the Sitemaps XML protocol.
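For step 4, a quick local check can catch the most common format problems. The following Python sketch tests only a few basics of the Sitemaps XML protocol: well-formed XML, a <urlset> root in the sitemaps.org namespace, and an absolute <loc> in every <url>. The function name `check_sitemap` is hypothetical, and this is not a full validator. This post does not say whether sitemap index files are accepted, so treating only <urlset> as passing is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps XML protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def check_sitemap(xml_data: bytes) -> list:
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]

    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return [f"root element is {root.tag!r}, expected <urlset> in the sitemaps.org namespace"]

    problems = []
    urls = root.findall(f"{{{SITEMAP_NS}}}url")
    if not urls:
        problems.append("no <url> entries")
    for index, url in enumerate(urls, start=1):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        # The protocol requires <loc> to be a full URL including the scheme.
        if not loc.startswith(("http://", "https://")):
            problems.append(f"<url> #{index} has no absolute <loc>")
    return problems
```

Pass the function the raw bytes of the file served at your sitemap URL. An HTML error page or a login page served in place of the sitemap will typically show up as a parse error or a wrong root element.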

