Relative links and the Matryoshka, a potential SEO error

johnsonron

3 years ago

In 10 years of consulting I have always advised website and CMS developers to prefer absolute links over relative links. For the less experienced, let’s first see what distinguishes one type of link from the other.

Absolute and relative links, what changes?

The terms absolute and relative describe how an internal link is written in a web page, whether it points to other HTML pages, images, CSS, JavaScript or any other dependency of an HTML file.

A relative link inserts only the part of the URL relative to the page into the href attribute of the a tag.

An absolute link instead inserts the whole URL, including protocol (http or https), subdomain (with or without www) and domain.

To link the SEO Services page of this site I can use an absolute link:

or a relative link:

A relative link is resolved against the URL of the HTML page that contains it and can have two syntaxes:

if it begins with a slash “/”, the relative link is appended to the domain
if it begins without a slash, the link is appended to the folder part of the URL of the page that contains it

For example, if I am on the page https://www.evemilano.com/corsi/ and I insert the relative link <a href="/salotto-seo/">link</a>, I will be sent to the page https://www.evemilano.com/salotto-seo/, since a relative link with a leading slash is appended to the domain.

By inserting the relative link <a href="salotto-seo/">link</a>, I will be sent to the page https://www.evemilano.com/corsi/salotto-seo/, which does not exist. Without a leading slash, the link is appended to the URL of the page that contains it, and this is where the problems arise.
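The two cases above can be reproduced with a short sketch. It uses Python's standard urllib.parse.urljoin, which resolves a link against a base URL in the same way a browser or crawler would; the variable names are only illustrative.

```python
from urllib.parse import urljoin

# The page that contains the link (taken from the example above).
page = "https://www.evemilano.com/corsi/"

# Leading slash: the link is resolved against the domain root.
with_slash = urljoin(page, "/salotto-seo/")

# No leading slash: the link is resolved against the folder of the current page.
without_slash = urljoin(page, "salotto-seo/")

print(with_slash)     # https://www.evemilano.com/salotto-seo/
print(without_slash)  # https://www.evemilano.com/corsi/salotto-seo/
```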

If the web server is misconfigured and responds to this nonexistent URL with a page and status code 200 instead of a 404, the relative link generates an infinite loop, a black hole for search engine spiders. Simple, isn’t it?

How to find relative links?

To search for relative links in an HTML page you can use the browser’s show source function, press CTRL + F to search and type:

This string will find all relative links on the page.

If a relative link begins without a leading slash “/” it can become a spider trap, and you can identify the problem with a Screaming Frog scan.

Another way to find “dangerous” relative links is to use a regular expression that searches the HTML for any character after href=" that is not a slash:
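As a minimal sketch of this idea in Python (the patterns and the function name find_risky_links are assumptions for illustration, not the expression from the original article): a pattern that only checks for "not a slash" after href=" also matches absolute URLs, anchors and mailto: links, so a second pattern filters those out.

```python
import re

# href=" (or href=') followed by a first character that is not a slash.
NO_LEADING_SLASH = re.compile(r'href\s*=\s*["\']([^/"\'][^"\']*)["\']', re.IGNORECASE)

# Values that start with a scheme (https:, mailto:, ...) or "#" are not
# relative paths, so they are excluded from the result.
NOT_RELATIVE = re.compile(r'^(?:[a-z][a-z0-9+.-]*:|#)', re.IGNORECASE)

def find_risky_links(html: str) -> list[str]:
    """Return href values that are relative and lack a leading slash."""
    candidates = NO_LEADING_SLASH.findall(html)
    return [href for href in candidates if not NOT_RELATIVE.match(href)]

# Hypothetical HTML fragment used only to exercise the function.
sample_html = '''
<a href="https://www.evemilano.com/corsi/">absolute</a>
<a href="/salotto-seo/">relative, with leading slash</a>
<a href="salotto-seo/">relative, no leading slash</a>
<a href="#top">anchor</a>
'''

print(find_risky_links(sample_html))  # ['salotto-seo/']
```

A regular expression is only a quick check; for a whole site, a crawler remains the more reliable tool.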

What problems do relative links cause?

This behavior of relative links opens the door to many potential problems, especially on web servers that do not handle 404 errors correctly and on websites without canonical tags. Let’s see an example.

A relative link without the leading slash, placed in the footer (a boilerplate part of a website), generates what I call a Matryoshka: an infinite and growing series of internal links, real hell for a spider.

By inserting a link like “salotto-seo/” in the footer I create the following succession of URLs, the spider trap I mentioned above:

https://www.evemilano.com/salotto-seo/
https://www.evemilano.com/salotto-seo/salotto-seo/
https://www.evemilano.com/salotto-seo/salotto-seo/salotto-seo/
https://www.evemilano.com/salotto-seo/salotto-seo/salotto-seo/salotto-seo/
https://www.evemilano.com/salotto-seo/salotto-seo/salotto-seo/salotto-seo/salotto-seo/
…

And so on ad infinitum: each page the spider encounters contains a link to salotto-seo/ that resolves one level deeper. This Matryoshka of internal links causes problems with crawl budget, indexing, ranking, duplicate content and more.
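The growth of these URLs can be simulated in a few lines. This is a sketch under one assumption: the server answers every nested URL with status 200 and the same footer, so the spider keeps following the link. The function name matryoshka is hypothetical.

```python
from urllib.parse import urljoin

def matryoshka(start_url: str, footer_href: str, depth: int) -> list[str]:
    """Follow the same footer link `depth` times, as a spider would."""
    urls = []
    current = start_url
    for _ in range(depth):
        # Each new page resolves the same relative link against its own URL.
        current = urljoin(current, footer_href)
        urls.append(current)
    return urls

trap = matryoshka("https://www.evemilano.com/", "salotto-seo/", 4)
for url in trap:
    print(url)
```

With a leading slash ("/salotto-seo/") every iteration would resolve to the same URL, and the chain would stop growing.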

A trivial mistake like forgetting a slash can affect the SEO of an entire website, so take some advice: before going into production, take Screaming Frog for a spin. If the number of pages the spider encounters is greater than you would expect, analyze the URL pattern and you may find that this is the problem.

A few days ago, during the first crawl of a website, I noticed that the number of URLs the spider was finding was far higher than expected. Looking at the list of crawled pages, I found longer and longer URLs with recurring patterns:

example.com/home/servizio/
example.com/home/servizio/home/
example.com/home/servizio/home/home/…

I opened the HTML code and immediately went through all the relative links on the page. Unsurprisingly, I found a widget in the footer that used relative links with no leading slash. A critical problem solved in 15 minutes: you just need to know where to look.
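Recurring patterns like these can also be flagged automatically in a list of crawled URLs. The following is a rough heuristic sketch, not a feature of any crawler: the function name has_repeated_segment and the https:// prefix on the sample URLs are assumptions, and legitimate URLs that happen to repeat a folder name would be flagged too.

```python
from urllib.parse import urlsplit

def has_repeated_segment(url: str, min_repeats: int = 2) -> bool:
    """True if any path segment occurs at least `min_repeats` times."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return any(segments.count(s) >= min_repeats for s in set(segments))

# Sample URLs modelled on the crawl described above.
crawled = [
    "https://example.com/home/servizio/",
    "https://example.com/home/servizio/home/",
    "https://example.com/home/servizio/home/home/",
]

suspicious = [url for url in crawled if has_repeated_segment(url)]
print(suspicious)
```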

Creating a relative path

Sometimes it is necessary to include more than a file name to create a relative path. For example, if page1.html resides in the root directory while page2.html resides in a subdirectory called folderA, a relative link from page1.html to page2.html should include the folder name followed by a forward slash, that is, folderA/page2.html.

Note that folder and file names in URLs should be treated as case-sensitive, since many web servers distinguish between upper and lower case in paths. Domain names, however, are never case-sensitive.

Let’s say you want to create a link on page2.html that takes you back to page1.html. To create a path relative to a parent directory, use two dots followed by a forward slash at the beginning of the link, that is, ../page1.html.

Adding ../ instructs the browser to move up one level in the folder hierarchy to find the desired file.

Now, imagine a folder structure with a root directory containing folder A, and folder A containing folder B. If page1.html is in the root directory and page3.html is in folder B, you can create a link from page3.html to page1.html by moving up two levels: ../../page1.html.

You can keep adding ../ whenever you want to move up one directory, but what if you need to move up and then down? For example, imagine that this same root directory also has a folder called subfolder, which contains the contactpage.html file. If page2.html is in folder A and you want to create a relative path from page2.html to contactpage.html, move up one level and then down into subfolder: ../subfolder/contactpage.html.
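The relative paths described in this section can be checked with a short sketch based on urllib.parse.urljoin. The domain https://example.com/ and the folder names folderA and folderB are assumptions chosen to mirror the folder structure in the text.

```python
from urllib.parse import urljoin

root = "https://example.com/"  # hypothetical site root

# page1.html (root) -> page2.html (folderA): go down one folder.
down = urljoin(root + "page1.html", "folderA/page2.html")

# page2.html (folderA) -> page1.html (root): go up one level.
up_one = urljoin(root + "folderA/page2.html", "../page1.html")

# page3.html (folderA/folderB) -> page1.html (root): go up two levels.
up_two = urljoin(root + "folderA/folderB/page3.html", "../../page1.html")

# page2.html (folderA) -> contactpage.html (subfolder): up, then down.
up_down = urljoin(root + "folderA/page2.html", "../subfolder/contactpage.html")

print(down)     # https://example.com/folderA/page2.html
print(up_one)   # https://example.com/page1.html
print(up_two)   # https://example.com/page1.html
print(up_down)  # https://example.com/subfolder/contactpage.html
```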