
URL Syntax

The URL is the foundation of search positioning, since a search engine is essentially an index of URLs.

Author: Carlos Sánchez
Topics: Crawling, LinkBuilding
Publication Date: 2024-08-28
Last Review: 2024-08-28

To understand what a URL is, its syntax, and how it can affect SEO, it's essential to first grasp the basics and where it originates.

URI

A URI (Uniform Resource Identifier) is a string of characters that uniquely identifies a resource on the web. In other words, it is a way of naming or identifying a resource (be it a webpage, an image, an audio file, etc.) on the internet.

Categories of URI

Officially, a URI is divided into two main categories: URN (Uniform Resource Name) and URL (Uniform Resource Locator). URLs are a type of URI used to specify the location of a resource on the internet, while URNs are used to identify the resource uniquely, regardless of its location.

URL

A URL is an address that indicates where a resource is located. When a browser accesses a URL, it sends a request to the server hosting that resource, the server runs whatever it has been programmed to do for that request and returns a response, and the browser interprets that response. This is, in simplified form, what a URL is for and how a website works.

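URLs are partly case sensitive. The protocol and the domain are case insensitive, but the path, the parameters and the fragment are not: in those parts an uppercase letter is different from a lowercase one and can lead to two different destinations.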
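As a quick illustration of where case matters, the following sketch uses the standard URL object available in browsers and Node.js. The parser lowercases the protocol and the domain, while the path keeps its case. The paths used here are hypothetical examples, not real pages.

```javascript
// Compares two addresses after the URL parser has normalised them.
// Protocol and domain are lowercased by the parser; the path is not.
function sameAfterNormalisation(a, b) {
  return new URL(a).href === new URL(b).href;
}

const mixedCase = new URL("HTTPS://Carlos.SanchezDonate.com/Articulo/");
console.log(mixedCase.protocol); // "https:"
console.log(mixedCase.hostname); // "carlos.sanchezdonate.com"
console.log(mixedCase.pathname); // "/Articulo/" (case preserved)

// Only the domain differs in case: same URL.
console.log(sameAfterNormalisation(
  "https://Carlos.SanchezDonate.com/articulo/",
  "https://carlos.sanchezdonate.com/articulo/"
)); // true

// The path differs in case: two different URLs.
console.log(sameAfterNormalisation(
  "https://carlos.sanchezdonate.com/Articulo/",
  "https://carlos.sanchezdonate.com/articulo/"
)); // false
```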

The address of an internet resource is made up of several parts that together lead to the desired resource, which is why it is important to know the syntax and what each part does.

Figure: the syntax of an absolute URL.

Protocol

The protocol of a URL provides information on how the connection between the user's browser and the web server hosting the resource should be handled. For example, HTTPS uses an additional security layer to protect the data transmitted between the user and the server, while FTP uses a different set of commands to transfer files.

The most common protocols used in URLs are HTTP (Hypertext Transfer Protocol) and HTTPS (Hypertext Transfer Protocol Secure), which are used to access websites and transfer information between servers and clients. Other protocols include FTP (File Transfer Protocol), which is used to transfer files, and SMTP (Simple Mail Transfer Protocol), which is used to send emails, although SMTP does not appear in URLs as such: email links use the mailto: scheme.

Regarding SEO, the protocols that interest us are HTTP and HTTPS, which are necessary for a website and its operation. It is always better to use HTTPS as it provides greater security for the user, who will not receive warnings from their browser about the potential insecurity of the website.

Mixed Content

When a webpage uses different protocols, such as HTTPS and HTTP, to load resources, this is known as mixed content. For example, if the webpage is served through HTTPS but one of its images is loaded via HTTP, that image is mixed content.

Mixed content can be a security concern as it compromises the privacy and integrity of the transmitted data. Modern browsers often block or warn about mixed content to protect users from potential security risks. It is important to ensure that all resources on a webpage are loaded through a secure protocol like HTTPS to avoid mixed content issues.
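A minimal sketch of how mixed content could be detected, assuming you already have the list of resource URLs referenced by a page. The function name findMixedContent and the resource paths are hypothetical and only serve the example.

```javascript
// Returns the resources that count as mixed content: the page is served
// over HTTPS but the resource is requested over plain HTTP.
function findMixedContent(pageUrl, resourceUrls) {
  const page = new URL(pageUrl);
  if (page.protocol !== "https:") return []; // only HTTPS pages can have mixed content
  return resourceUrls
    .map((resource) => new URL(resource, page)) // relative URLs inherit the page's protocol
    .filter((resource) => resource.protocol === "http:")
    .map((resource) => resource.href);
}

const insecureResources = findMixedContent("https://carlos.sanchezdonate.com/articulo/", [
  "https://carlos.sanchezdonate.com/img/logo.png", // secure
  "http://carlos.sanchezdonate.com/img/foto.jpg",  // mixed content
  "/css/estilos.css",                              // relative, resolved as HTTPS
]);
console.log(insecureResources);
// ["http://carlos.sanchezdonate.com/img/foto.jpg"]
```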

Subdomain

A subdomain is an extension of a main domain that acts as an independent website within that domain. Essentially, a subdomain allows the main domain to be divided into several logical sections or websites, each with its own content and structure. A subdomain is represented as a prefix added to the main domain, separated by a dot. For example, in "carlos.sanchezdonate.com", "carlos" is the subdomain and "sanchezdonate.com" is the main domain.

Once you have a domain, you can create as many subdomains as you want, as they do not incur any additional cost.

Domain

A domain is a unique address on the web that identifies a particular website. It is like the postal address of a business on the Internet, and it consists of two parts: the domain name and the domain extension. For example, in "sanchezdonate.com", "sanchezdonate" is the domain name and ".com" is the domain extension.

In the URL, the domain indicates which server the request is sent to.

When a user enters a web address in their browser, such as "sanchezdonate.com", the browser sends a request to a DNS (Domain Name System) server to obtain the IP address associated with that domain name. DNS is a hierarchical naming system that functions like a telephone directory for the Internet, translating domain names into IP addresses.

The process of converting a domain name into an IP address is called domain name resolution, and it involves several steps. First, the local DNS server (usually provided by the user's Internet service provider) checks its DNS cache to see if it has previously resolved that web address. If so, it returns the IP address stored in its cache. If not, it sends a request to the root DNS server, which is the first level of authority in the domain name system.

The root DNS server does not have information about the requested domain name, but it can direct the local DNS server to the top-level domain (TLD) DNS server responsible for that particular extension. For example, if "sanchezdonate.com" is being searched for, the root DNS server directs the local DNS server to the ".com" TLD server.

The ".com" TLD server knows which name servers are authoritative for each domain with the ".com" extension and refers the local DNS server to the ones responsible for "sanchezdonate.com". Those authoritative servers provide the IP address of the web server hosting the "sanchezdonate.com" website. The local DNS server stores this information in its cache and returns the IP address to the user's browser, which finally connects to the web server using the IP address.

In summary, DNS is a hierarchical naming system that converts domain names into IP addresses. Domain name resolution involves queries to the local, root, top-level and authoritative DNS servers, and it allows users to access websites using easy-to-remember domain names instead of having to remember numerical IP addresses.
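The resolution chain can be sketched as a toy model. This is not a real DNS client: the server names and the IP address are made up for illustration (192.0.2.10 belongs to a range reserved for documentation), and the model includes the authoritative name server that the TLD server points to, since that is the server that holds the actual record.

```javascript
// Toy lookup tables (all values are hypothetical).
const rootServer = { com: "tld-com" }; // extension -> TLD server
const tldServers = { "tld-com": { "sanchezdonate.com": "ns-sanchezdonate" } }; // domain -> authoritative server
const authoritativeServers = { "ns-sanchezdonate": { "sanchezdonate.com": "192.0.2.10" } }; // domain -> IP

const dnsCache = new Map(); // plays the role of the local DNS server's cache

function resolveDomain(domain) {
  // 1. The local DNS server answers from its cache when it can.
  if (dnsCache.has(domain)) return { ip: dnsCache.get(domain), steps: ["cache"] };
  // 2. The root server only knows which TLD server handles the extension.
  const extension = domain.split(".").pop();
  const tld = rootServer[extension];
  // 3. The TLD server only knows which name server is authoritative.
  const nameServer = tld && tldServers[tld][domain];
  // 4. The authoritative server returns the IP address.
  const ip = nameServer && authoritativeServers[nameServer][domain];
  if (!ip) throw new Error(`Cannot resolve ${domain}`);
  dnsCache.set(domain, ip);
  return { ip, steps: ["root", "tld", "authoritative"] };
}

console.log(resolveDomain("sanchezdonate.com").steps); // ["root", "tld", "authoritative"]
console.log(resolveDomain("sanchezdonate.com").steps); // ["cache"]
```

The second call is answered from the cache, which is why repeated visits to the same domain do not repeat the whole chain.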

Domains are acquired through domain registrars, which are companies that offer the service of registering and managing domains on the Internet. To register a domain, you first need to check its availability and then "purchase" it, which is really renting because you cannot acquire it permanently.

Domain Extension

The domain extension is the set of letters that always follow a dot after the domain name. There are many different types available, but they must be listed in the IANA TLDs list.

Domain extensions are divided into two main categories:

Generic Top-Level Domain Extensions (gTLD)

gTLD domain extensions are the most popular and are generally considered more generic.

Country Code Top-Level Domain Extensions (ccTLD)

ccTLD domain extensions, on the other hand, are often used for regional or local websites and can positively impact geolocated SEO, depending on the settings used.

Special Extensions

Some special domain extensions can impact a website's SEO. For example, search engines may assign greater authority and trust to websites using certain restricted domain extensions, such as those reserved for educational or governmental institutions (for example .edu or .gov).

These special domain extensions indicate that the website undergoes a verification and validation process. Therefore, search engines often consider websites with these domain extensions to be more trustworthy and relevant in certain areas, such as education or government services.

It's important to note that, although these special domain extensions can positively impact SEO, the content and quality of the website remain the most important factors for its search engine ranking.

Combination of Extensions

Although it is somewhat unorthodox, domain extensions can also be combined, as in .edu.es or .co.uk. The simplest way to obtain a domain with one of these extensions is to use a domain registrar that offers them directly.
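The subdomain, domain name and extension described above can be separated with a few lines of code. This is only a sketch: the helper splitHostname is hypothetical, and it assumes the caller already knows the extension, because nothing in the hostname itself reveals whether the extension is "uk" or the combined "co.uk". The hostname "www.example.co.uk" is a placeholder.

```javascript
// Splits a hostname into subdomain, domain and extension.
// `extension` is supplied by the caller (assumption), without the leading dot.
function splitHostname(hostname, extension) {
  const host = hostname.toLowerCase();
  const suffix = "." + extension.toLowerCase();
  if (!host.endsWith(suffix)) {
    throw new Error(`${hostname} does not end with ${suffix}`);
  }
  const labels = host.slice(0, -suffix.length).split(".");
  const domainName = labels.pop(); // the label right before the extension
  return {
    subdomain: labels.join("."),   // empty string when there is no subdomain
    domain: domainName + suffix,
    extension: suffix.slice(1),
  };
}

console.log(splitHostname("carlos.sanchezdonate.com", "com"));
// { subdomain: "carlos", domain: "sanchezdonate.com", extension: "com" }
console.log(splitHostname("www.example.co.uk", "co.uk"));
// { subdomain: "www", domain: "example.co.uk", extension: "co.uk" }
```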

Port

Ports are indicated by a colon and a number after the domain extension. Each protocol has a default port, which is used when the URL does not specify one: 443 for HTTPS and 80 for HTTP. Deviating from these defaults can lead to serious SEO issues.
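The default ports can be seen in practice with the standard URL object, which drops the port from the address when it matches the protocol's default. The helper effectivePort and the port 8443 are assumptions made for this example.

```javascript
// Default ports for the two protocols that matter for SEO.
const DEFAULT_PORTS = { "http:": "80", "https:": "443" };

// Returns the port a request would actually use, explicit or default.
function effectivePort(rawUrl) {
  const url = new URL(rawUrl);
  return url.port || DEFAULT_PORTS[url.protocol] || "";
}

const withDefaultPort = new URL("https://sanchezdonate.com:443/");
console.log(withDefaultPort.port); // "" (443 is implied by https)
console.log(withDefaultPort.href); // "https://sanchezdonate.com/"

const withCustomPort = new URL("https://sanchezdonate.com:8443/");
console.log(withCustomPort.port); // "8443"
console.log(withCustomPort.host); // "sanchezdonate.com:8443"

console.log(effectivePort("http://sanchezdonate.com/")); // "80"
```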

Path

The path of a URL is the part of the web address that follows the domain and identifies the specific location of a page within the website. For example, in the URL "https://carlos.sanchezdonate.com/articulo/codigos-de-respuesta/", the path would be "/articulo/codigos-de-respuesta/".

The path of a URL can affect SEO in several ways.

File Extension

A file extension in a web address indicates the type of file being requested: images (png, jpg, gif), videos (mp4, mov, avi), PDFs, or web documents and scripts (html, php, py, asp).

In the case of websites, displaying these extensions does not pose an issue, as URLs with extensions are perfectly SEO-friendly. The important thing is to maintain standardisation and ensure the entire website behaves consistently. However, for aesthetic reasons, there are ways to hide file extensions from the URL.
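To check which URLs of a site expose a file extension, something like the following sketch could be used. The function fileExtension is a hypothetical helper and the image path is made up for the example.

```javascript
// Returns the file extension found in the last segment of the path,
// or an empty string when the URL does not expose one.
function fileExtension(rawUrl) {
  const { pathname } = new URL(rawUrl);          // query string and fragment are ignored
  const lastSegment = pathname.split("/").pop(); // "" when the path ends with "/"
  const dot = lastSegment.lastIndexOf(".");
  return dot > 0 ? lastSegment.slice(dot + 1).toLowerCase() : "";
}

console.log(fileExtension("https://carlos.sanchezdonate.com/img/logo.PNG?v=2")); // "png"
console.log(fileExtension("https://carlos.sanchezdonate.com/articulo/codigos-de-respuesta/")); // ""
```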

Parameter

Parameters create variants of the same URL. They can modify the content or simply be used for analytics and other features the project may need.

Parameters are anything in a URL that follows a "?" (and precedes the "#", if there is one). Any user can append them to any URL: doing so only modifies the request that user sends, not the website itself.

To prevent parameters from negatively impacting SEO when they do not generate entirely different content, it is important to use a self-referencing canonical.

Parameters can be useful for better file management and work well with cache management.

A page with parameters is generally considered an independent URL from another, as it usually acts as a different page. To manage this, parameter redirection can be implemented, but it is a complex process.
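As an illustration of how parameters can be read and how a canonical address might be derived from a parameterised one, here is a small sketch. The list of analytics-only parameters and the "page" parameter are assumptions chosen for the example; which parameters change the content depends on each project.

```javascript
// Hypothetical list of parameters used only for analytics (assumption).
const TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign"];

// Removes the analytics-only parameters and keeps the ones that change the content.
function canonicalUrl(rawUrl) {
  const url = new URL(rawUrl);
  for (const name of TRACKING_PARAMS) {
    url.searchParams.delete(name);
  }
  return url.href;
}

const tracked = new URL("https://carlos.sanchezdonate.com/articulo/?utm_source=newsletter&page=2");
console.log(tracked.search);                   // "?utm_source=newsletter&page=2"
console.log(tracked.searchParams.get("page")); // "2"

console.log(canonicalUrl(tracked.href));
// "https://carlos.sanchezdonate.com/articulo/?page=2"
console.log(canonicalUrl("https://carlos.sanchezdonate.com/articulo/?utm_source=newsletter"));
// "https://carlos.sanchezdonate.com/articulo/"
```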

Hashbang or Anchor

The fragment of a URL (also known as the anchor or hash) is the string of text placed after the "#" symbol in a web address. A hashbang is the particular case in which the fragment begins with an exclamation mark ("#!"). For example, in the address "https://carlos.sanchezdonate.com/articulo/renderizacion-de-javascript-en-el-seo/#incremental-static-regeneration-isr", the fragment is "#incremental-static-regeneration-isr".

Fragments are used to jump to a specific section of a page, and some web applications use them to let users navigate between sections without reloading the entire page. When a user clicks on a link containing a fragment, the browser does not send the fragment to the web server: it either scrolls to the element with that identifier or lets the application's JavaScript load the corresponding content into the appropriate section of the page.

The use of fragments in URLs can be beneficial for improving the user experience in a web application, as it allows for smoother navigation without the need to reload the entire page each time a new section is accessed.

However, fragments offer no direct benefit to SEO, as Google does not crawl them as separate URLs.

Anchor Link Redirections

Fragments are managed from the user's side, not from the server, because the browser never sends them in the request. Therefore, redirections based on the fragment cannot be performed from the server.
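The reason the server cannot act on this part of the URL is that the browser leaves it out of the request. The sketch below shows what reaches the server for a given address; requestTarget is a hypothetical helper name.

```javascript
// Returns the part of the URL that is sent to the server in the request:
// path plus query string. The fragment stays in the browser.
function requestTarget(rawUrl) {
  const url = new URL(rawUrl);
  return url.pathname + url.search;
}

const withFragment = "https://carlos.sanchezdonate.com/articulo/renderizacion-de-javascript-en-el-seo/#incremental-static-regeneration-isr";
console.log(new URL(withFragment).hash); // "#incremental-static-regeneration-isr"
console.log(requestTarget(withFragment)); // "/articulo/renderizacion-de-javascript-en-el-seo/"
```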

If redirections are needed, they can be done from the user's side via JS. However, this would be a user-focused implementation that does not affect search engines (at most, the extra JS to load has a slight negative effect, but not an excessive one).

This code, for example, redirects all anchor links to their lowercase version, replacing "_" with "-":

window.addEventListener('load', function () {
  // Fragment without the leading "#"
  var fragment = window.location.hash.slice(1);
  // Lowercase version with "_" replaced by "-"
  var newFragment = fragment.replace(/_/g, '-').toLowerCase();
  if (newFragment !== fragment) {
    // Rewrite only the fragment; path and parameters are left untouched
    history.replaceState(null, '', '#' + newFragment);
  }
});

Relative and Absolute URLs

URLs are a way of specifying the address of a resource on the internet. When the resource is on the same domain, it is not necessary to specify the complete URL, and the resource can be requested (whether through the src attribute, href, or any method that requires a URL) from the site where it is located using various shortcuts.

This can streamline certain tasks during migrations or enable complex functionalities. An absolute URL, by contrast, is the complete URL, including the protocol and the domain.

There are different ways to create a relative URL:

URL type in example | Description
<a href="pagina">example</a> | The page is in the same folder as the current page.
<a href="categoria/pagina">example</a> | The page is in the categoria folder, which is inside the folder of the current page.
<a href="/categoria/pagina">example</a> | The page is in the categoria folder, which hangs directly from the root of the site.
<a href="../categoria/pagina">example</a> | The page is in the categoria folder, which is one level above the folder of the current page.

Important: relative URLs should never be used in meta tags, as they may not be read correctly. Use the absolute URL there.
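To see how each of the four relative forms is resolved, the standard URL constructor accepts a relative URL and the address of the current page as a base. The base address used below is just an example.

```javascript
// Address of the page that contains the links (example value).
const currentPage = "https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls/";

// The four relative forms, in the same order as listed above.
const relativeHrefs = ["pagina", "categoria/pagina", "/categoria/pagina", "../categoria/pagina"];

const resolvedHrefs = relativeHrefs.map((href) => new URL(href, currentPage).href);
console.log(resolvedHrefs);
// [
//   "https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls/pagina",
//   "https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls/categoria/pagina",
//   "https://carlos.sanchezdonate.com/categoria/pagina",
//   "https://carlos.sanchezdonate.com/articulo/categoria/pagina"
// ]

// The trailing slash of the current page matters: without it, the last
// segment is treated as a file, not as a folder.
console.log(new URL("pagina", "https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls").href);
// "https://carlos.sanchezdonate.com/articulo/pagina"
```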

It is important to have a good understanding of how absolute and relative URLs work because they can impact SEO.

Extracting Parts of a URL with Programming

Now that we understand the different parts of a URL, let's see how to extract these specific parts with programming:

PHP

$url = "https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls/?parametro#url";

// Get the hostname
$hostname = parse_url($url, PHP_URL_HOST);
echo $hostname . PHP_EOL;
// Output: carlos.sanchezdonate.com

// Get the path
$path = parse_url($url, PHP_URL_PATH);
echo $path . PHP_EOL;
// Output: /articulo/sintaxis-de-urls/

// Get the URL without the query string and the fragment
$href = parse_url($url, PHP_URL_SCHEME) . '://' . parse_url($url, PHP_URL_HOST) . parse_url($url, PHP_URL_PATH);
echo $href . PHP_EOL;
// Output: https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls/

// Get the query string
$query = parse_url($url, PHP_URL_QUERY);
echo $query . PHP_EOL;
// Output: parametro

// Get the fragment (hash)
$hash = parse_url($url, PHP_URL_FRAGMENT);
echo $hash . PHP_EOL;
// Output: url

Note that the parse_url function in PHP does not automatically include the "?" character for the query string or the "#" character for the fragment (hash). If you wish to include them in the output, you can add them manually.

JavaScript

const url = new URL("https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls/?parametro#url");

// Get the hostname
console.log(url.hostname);
// Output: carlos.sanchezdonate.com

// Get the path
console.log(url.pathname);
// Output: /articulo/sintaxis-de-urls/

// Get the URL without the query string and the fragment
console.log(url.origin + url.pathname);
// Output: https://carlos.sanchezdonate.com/articulo/sintaxis-de-urls/

// Get the query string (unlike PHP, the "?" is included)
console.log(url.search);
// Output: ?parametro

// Get the fragment (unlike PHP, the "#" is included)
console.log(url.hash);
// Output: #url
