Anatomy of a URL

In February 2010, techy web magazine Read Write Web published an article entitled Facebook Wants to Be Your One True Login. The article was about a new partnership between Facebook and AOL aimed at making it easier for users to log in to Facebook. Which is quite ironic considering what happened next.

Soon after it was published, the article somehow managed to reach number one in Google for the search term Facebook log in. So when thousands of people started landing on this Read Write Web page, believing it was the Facebook log in page, they proceeded to log in! Of course, they couldn't log in because it wasn't the Facebook site they were on! Read the comments to see how people reacted when their log in details didn't work!

What's wrong with this situation? Whose fault is it that they searched for Facebook log in and arrived on a site that wasn't the Facebook log in page? Some argued that Google was to blame: these people were used to reaching Facebook this way, and when the Google results changed they had no reason to suspect anything. Others argued that the blame lies with the individuals in question because they never carried out the most basic security check: checking the URL.

They didn't check the URL?

Evidently not. Hello? Phishing scams?

What is a URL?

URL stands for Uniform Resource Locator, or in other words, the web address of an online resource such as a web site or document.

Web browsers display the URL in the address bar

Every website you visit has a URL. The following URL points to this article: https://doepud.co.uk/blog/anatomy-of-a-url.

The URL is created in one of two ways:

  1. after clicking a link in a web page, bookmark or email
  2. by typing the URL directly into the address bar

♫The protocol's connected to the domain name and the domain name's connected to the file path... ♫

Using the URL of this article as an example, the three basic parts of a URL you should understand are the protocol, the domain name and the path.

https://doepud.co.uk/blog/anatomy-of-a-url

And, based on the example URL from Matt Cutts' URL definitions, here's an example of a more complex-looking URL:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s

URL anatomy explained

Protocol
The protocol declares how your web browser should communicate with a web server when sending or fetching a web page or document. The most common protocol is http which stands for Hypertext Transfer Protocol.
Another common protocol is https which stands for Hypertext Transfer Protocol Secure. You'll see this on secure pages, like shopping sites and log in pages. If you're visiting a site where you need to enter sensitive information, like bank details and passwords, make sure the protocol is declared as https. This means your web browser encrypts any information you provide so it can't be read by anyone who tries to intercept it during transfer.
Some protocols you're less likely to see include ftp (File Transfer Protocol), pop (Post Office Protocol), smtp (Simple Mail Transfer Protocol) and imap (Internet Message Access Protocol). If you want to know more, here's a list of protocols.
Subdomain
A subdomain is a sub-division of the main domain name. For example, mail.doepud.com and calendar.doepud.com are subdomains of the domain name doepud.com.
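Here is a minimal sketch, in Python, of how you might test whether one hostname is a subdomain of another. The function name is_subdomain_of is made up for this illustration, and the last hostname in the usage lines is a hypothetical look-alike. The point it makes is that a subdomain must end with a dot followed by the main domain name; merely containing the domain name is not enough.

```python
def is_subdomain_of(hostname: str, domain: str) -> bool:
    """Return True if hostname is a subdomain of domain (hypothetical helper)."""
    # Hostnames are case-insensitive, and a trailing dot is allowed.
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    # Require "." + domain at the end, so "notdoepud.com" does not match.
    return hostname.endswith("." + domain)


print(is_subdomain_of("mail.doepud.com", "doepud.com"))         # True
print(is_subdomain_of("calendar.doepud.com", "doepud.com"))     # True
print(is_subdomain_of("notdoepud.com", "doepud.com"))           # False
print(is_subdomain_of("doepud.com.example.net", "doepud.com"))  # False
```

The last line is the kind of address a phishing page might use: it starts with a familiar name, but the domain that matters is the one at the end.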
Domain name
A domain name is a unique reference that identifies a web site on the internet, for example doepud.co.uk. A domain name always includes the top-level domain (TLD), which in Doepud's case is uk. The co part is shorthand for commercial, and the combined .co.uk is called a second-level domain (SLD).
Port
The port number is rarely visible in URLs, but one is always used when connecting to the server. When declared in a URL, it comes right after the domain name, separated by a colon. When it's not declared, the default for the protocol is used: port 80 for http and port 443 for https (secure) requests.
Read more about port numbers in URLs.
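To illustrate the default port rule, here is a small Python sketch using the standard library's urllib.parse. The helper effective_port and the DEFAULT_PORTS table are assumptions made for this example and only cover the two protocols discussed here; the 8080 URL is a hypothetical address.

```python
from urllib.parse import urlsplit

# Defaults for the two protocols covered in this article (illustrative only).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the declared port, or the protocol's default if none is declared."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # the port was written in the URL after a colon
    return DEFAULT_PORTS.get(parts.scheme)  # None for protocols not in the table


print(effective_port("https://doepud.co.uk/blog/anatomy-of-a-url"))  # 443
print(effective_port("http://video.google.co.uk:80/videoplay"))      # 80
print(effective_port("http://doepud.co.uk:8080/"))                   # 8080
```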
Path
The path typically refers to a file or directory on the web server, e.g. /directory/file.php.
Sometimes the file name won't be specified, e.g. https://doepud.co.uk/blog/, so the web server will typically look inside the /blog/ folder for a default file, often called index or default. If no such file can be found, a 404 Not Found error will usually be returned by the server.
Query
A query is commonly found in the URL of dynamic pages (ones which are generated from a database or user-generated content) and is represented by a question mark followed by one or more parameters. The query directly follows the domain name, path or port number.
For example, have a look at this URL which was generated by Google when doing a search for the word URL:

http://www.google.co.uk/search?q=url&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
The query part is:

?q=url&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-GB:official&client=firefox-a
Parameters
Parameters are snippets of information found in the query string of a URL. With reference to the Google query above, the parameters follow the question mark and are separated by an ampersand (&) character so they can be understood individually and used to display content on that page. The parameters are:
  • q=url
  • ie=utf-8
  • oe=utf-8
  • aq=t
  • rls=org.mozilla:en-GB:official
  • client=firefox-a
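The same breakdown can be reproduced in code. This is a rough sketch using Python's standard urllib.parse module; the variable names are arbitrary. It pulls the query out of the Google URL above and splits it into name and value pairs at each ampersand.

```python
from urllib.parse import urlsplit, parse_qsl

url = ("http://www.google.co.uk/search?q=url&ie=utf-8&oe=utf-8&aq=t"
       "&rls=org.mozilla:en-GB:official&client=firefox-a")

# Everything after the question mark (the "?" itself is not included).
query = urlsplit(url).query

# parse_qsl splits on "&" and then on "=", giving (name, value) pairs.
params = dict(parse_qsl(query))

for name, value in params.items():
    print(f"{name} = {value}")
```

Each printed line corresponds to one bullet in the list above.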
Fragment
A fragment is an internal page reference, sometimes called a named anchor. It usually appears at the end of a URL and begins with a hash (#) character followed by an identifier. It refers to a section within a web page.
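Putting it all together, here is a short Python sketch that takes the more complex example URL from earlier and splits it into the parts described in this section. It relies on urlsplit from the standard library. Note that urlsplit reports the whole hostname and does not separate the subdomain from the domain name; doing that reliably needs extra information about which suffixes, such as co.uk, count as one unit.

```python
from urllib.parse import urlsplit

url = "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s"
parts = urlsplit(url)

print("protocol:", parts.scheme)    # http
print("hostname:", parts.hostname)  # video.google.co.uk (subdomain + domain name)
print("port:    ", parts.port)      # 80
print("path:    ", parts.path)      # /videoplay
print("query:   ", parts.query)     # docid=-7246927612831078230&hl=en
print("fragment:", parts.fragment)  # 00h02m30s
```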

Further reading