Published: Wednesday, July 28, 2010

Techniques for Preventing Duplicate URLs in Your Website

By Scott Mitchell


Introduction


Chances are, there are several different URLs that point to the same content on your website. For example, the URLs http://yoursite.com, http://yoursite.com/default.aspx, http://www.yoursite.com, and http://www.yoursite.com/default.aspx are all likely valid URLs that result in the same content, namely the homepage for yoursite.com. While having four different URLs reference the same content may not seem like a big deal, it can directly impact your website's search engine placement and, consequently, its traffic. To a search engine, those four different URLs represent four different pages, even though they all produce the same content.

To understand how allowing duplicate URLs in your website can affect your search engine placement, first understand that search engines base a page's placement in the search results, in part, on how many other websites link to the page. Now, imagine that there are 1,000 web pages from other websites that link to your homepage. You might conclude, then, that a search engine would rank the importance of your homepage based on those 1,000 links. But consider what would happen if 25% of those links linked to http://yoursite.com, 25% to http://yoursite.com/default.aspx, and so on. Rather than your homepage reflecting 1,000 inbound links, the search engine instead assumes there are only 250 links to http://yoursite.com, only 250 links to http://yoursite.com/default.aspx, and so on. In effect, redundant URLs can dilute your search engine ranking.

A key tenet of search engine optimization is URL normalization, or URL canonicalization. URL normalization is the process of eliminating duplicate URLs in your website. This article explores four different ways to implement URL normalization in your ASP.NET website. Read on to learn more!


First Things First: Deciding on a Canonical URL Format


Before we examine techniques for normalizing URLs, and certainly before such techniques can be implemented, we must first decide on a canonical URL format. Many websites use www.websitename.com as the canonical form. For example, if you type amazon.com into your browser's Address bar and hit Enter you'll see that the URL is changed to www.amazon.com.

By choosing to use www.websitename.com as the canonical form we are saying that we want to "replace" the URLs:

  • http://yoursite.com,
  • http://yoursite.com/default.aspx, and
  • http://www.yoursite.com/default.aspx

with the canonical one, http://www.yoursite.com.

But how do we go about "replacing" one URL with another? You can ensure that any internal links within your website refer to the canonical format, but what's to stop some other website from linking to one of the non-canonical forms? As we'll see in this article, you can't replace the non-canonical URLs; instead, you can issue permanent redirects from non-canonical URLs to the canonical form as well as include markup that gives search engines a hint as to the canonical form.
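To make the idea of a canonical form concrete, here is a minimal sketch of the mapping as a plain function, independent of how the redirect is later issued. The class name CanonicalUrl, the method name Normalize, and the hard-coded host are assumptions made for illustration. The sketch also assumes the site runs on the default port, because it rebuilds the URL without one.

```csharp
using System;

public static class CanonicalUrl
{
    // Assumed canonical host for this sketch; substitute your own.
    private const string CanonicalHost = "www.yoursite.com";
    private const string DefaultDocument = "default.aspx";

    // Maps any of the duplicate URL forms to the single canonical form.
    public static string Normalize(string rawUrl)
    {
        var uri = new Uri(rawUrl);
        string path = uri.AbsolutePath;

        // Drop a trailing default.aspx so that /default.aspx becomes /.
        if (path.EndsWith("/" + DefaultDocument, StringComparison.OrdinalIgnoreCase))
            path = path.Substring(0, path.Length - DefaultDocument.Length);

        // Rebuild the URL with the canonical host, keeping any querystring.
        return uri.Scheme + "://" + CanonicalHost + path + uri.Query;
    }
}
```

All four homepage variants listed in the introduction normalize to http://www.yoursite.com/ (the trailing slash is the root path, so this is the same resource as http://www.yoursite.com).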

URL Normalization Using Permanent Redirects


Every ASP.NET developer is familiar with the Response.Redirect(url) method, which redirects a visitor from the page they requested to the specified url. Response.Redirect works by returning a 302 HTTP status code and a Location HTTP header to the client. The Location header indicates the URL to which the requested resource has moved. The 302 status code indicates that the resource being requested has temporarily moved to a new URL. A client - such as a search engine spider - that receives a 302 status will continue to try the original URL for future requests. There is an alternative status code, 301, that should be used if the resource has been moved permanently.

From an end user's perspective, 301 and 302 redirects behave the same - the user is sent to the specified URL. However, when a search engine spider receives a 301 status it updates its index with the new URL. Therefore, if we immediately issue a permanent redirect to the canonical form of the page whenever a request comes in for a non-canonical URL, a search engine spider crawling our site will only maintain the canonical form in its index. That means that it doesn't matter if other websites link to our homepage using a non-canonical format like http://yoursite.com or http://www.yoursite.com/default.aspx because a permanent redirect will send the user to http://www.yoursite.com and, in the case of a search engine spider, instruct it to update its index to use the canonical form.
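On the wire, the only difference between the two kinds of redirect is the status line; the Location header is the same in both. The following sketch builds the head of each response as a string so the two can be compared. RedirectDemo and BuildRedirectHead are hypothetical names used only for this illustration. A real server sends additional headers as well.

```csharp
using System;
using System.Net;

public static class RedirectDemo
{
    // Builds the status line and Location header of a redirect response.
    public static string BuildRedirectHead(HttpStatusCode status, string location)
    {
        // Only the two status codes discussed in the article are handled here.
        string reason = status == HttpStatusCode.MovedPermanently
            ? "Moved Permanently"
            : "Found";

        return string.Format("HTTP/1.1 {0} {1}\r\nLocation: {2}\r\n\r\n",
                             (int)status, reason, location);
    }
}
```

Calling BuildRedirectHead with HttpStatusCode.Found produces a head starting with "HTTP/1.1 302 Found", while HttpStatusCode.MovedPermanently produces "HTTP/1.1 301 Moved Permanently".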

There are a number of different ways we can determine if the incoming URL is in a non-canonical form and issue a permanent redirect to the canonical form. This article explores three such techniques: using ASP.NET code; using IIS 7's URL Rewrite Module; and using ISAPI_Rewrite, a commercial URL rewriting product that works with IIS 7 and earlier versions.

Issuing Permanent Redirects From ASP.NET


Every time an incoming request is handled by the ASP.NET engine, it raises the BeginRequest event. You can execute code in response to this event by creating an HTTP Module or by creating the Application_BeginRequest event handler in Global.asax. The following code, written by Fredrik Normen and available at Redirect Permanent from a non-www to a www using ASP.NET 4.0, examines the incoming URL (Request.Url) to see whether it starts with www. If the URL does not start with www then a permanent redirect is issued to the same page, but with the www in the URL. (The download available at the end of this article offers a VB version of the below code.)

protected void Application_BeginRequest(object sender, EventArgs e)
{
   if (Request.Url.Authority.StartsWith("www"))
      return;

   var url = string.Format("{0}://www.{1}{2}",
               Request.Url.Scheme,
               Request.Url.Authority,
               Request.Url.PathAndQuery);

   Response.RedirectPermanent(url, true);
}

The above code makes extensive use of the Request.Url property, which returns a Uri object that has a host of properties, like Scheme, Authority, and PathAndQuery, that can be used to examine the incoming URL. In the example above, the Authority property is examined to see if the authority (www.yoursite.com) starts with "www". If not, the user is redirected to the same URL they were requesting, but with the "www" injected.

Finally, note the use of the Response.RedirectPermanent method. This is a new method added to ASP.NET 4 that issues a 301 permanent redirect. This method and its behavior are described more in an earlier article of mine, Search Engine Optimization Enhancements in ASP.NET 4. If you are not using ASP.NET 4 you will have to write a few more lines of code to issue a permanent redirect - see Chris Love's blog entry, 301 Redirect ASP.NET. (Do not use Response.Redirect as that issues a temporary redirect and won't cause the search engines to update their indexes.)
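For versions prior to ASP.NET 4, a permanent redirect can be issued by setting the status code and the Location header by hand. The following is a minimal sketch of that approach and is not taken from the blog entry referenced above. The class name ResponseExtensions and the method name PermanentRedirect are hypothetical. The extension-method syntax assumes a C# 3.0 or later compiler; on older compilers the same body works as an ordinary static method.

```csharp
using System.Web;

public static class ResponseExtensions
{
    // Sends a 301 status and a Location header, then ends the response.
    public static void PermanentRedirect(this HttpResponse response, string url)
    {
        response.Clear();                          // discard any buffered output
        response.StatusCode = 301;
        response.StatusDescription = "Moved Permanently";
        response.AddHeader("Location", url);       // where the resource now lives
        response.End();                            // stop further processing of the request
    }
}
```

With this helper in place, the call in Application_BeginRequest would read Response.PermanentRedirect(url) instead of Response.RedirectPermanent(url, true).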

The above code only tacks on the "www" if it is omitted - it does not drop the default.aspx from the URL if someone were to visit http://www.yoursite.com/default.aspx. This functionality can be added with a bit more code:

protected void Application_BeginRequest(object sender, EventArgs e)
{
   // Stays empty unless the incoming URL is in a non-canonical form
   var url = string.Empty;
   
   // Case 1: the host lacks the "www" prefix - add it and keep the rest of the URL
   if (!Request.Url.Authority.StartsWith("www"))
      url = string.Format("{0}://www.{1}{2}",
                  Request.Url.Scheme,
                  Request.Url.Authority,
                  Request.Url.PathAndQuery);
   // Case 2: the URL ends with /default.aspx - cut it off
   else if (Request.RawUrl.EndsWith("/default.aspx", StringComparison.OrdinalIgnoreCase))
      url = string.Format("{0}://{1}{2}",
                  Request.Url.Scheme,
                  Request.Url.Authority,
                  Request.RawUrl.Remove(Request.RawUrl.LastIndexOf("/default.aspx", StringComparison.OrdinalIgnoreCase)));

   // Issue a 301 only when one of the cases above applied
   if (url.Length > 0)
      Response.RedirectPermanent(url, true);
}

The above code (in VB) is available for download at the end of this article.

Rewriting URLs Into Canonical Form Using IIS 7's URL Rewrite Module


Shortly after releasing IIS 7, Microsoft created and released a free URL Rewrite Module. The URL Rewrite Module makes it easy to define URL rewriting rules in your Web.config file. To learn more about the URL Rewrite Module, including instructions on downloading and installing the module, please refer to the URL Rewrite Module documentation on www.IIS.net.

Presuming you have the URL Rewrite Module installed on your IIS 7 web server and that your website uses the integrated pipeline, all you need to do is add the following markup to your ASP.NET application's Web.config file:

<configuration>
   ...

   <system.webServer>
      <rewrite>
         <rules>
            <rule name="Canonical Host Name" stopProcessing="true">
               <match url="(.*)" />
               
               <conditions>
                  <add input="{HTTP_HOST}" pattern="^yoursite\.com$" />
               </conditions>
               
               <action type="Redirect" url="http://www.yoursite.com/{R:1}" redirectType="Permanent" />
            </rule>

         </rules>
      </rewrite>
   </system.webServer>
</configuration>

The above configuration defines a single rule named Canonical Host Name. This rule examines the host name and if it matches the regular expression pattern ^yoursite\.com$ - which means, in English, that the host name is literally "yoursite.com" - then the user is permanently redirected to http://www.yoursite.com/PageBeingRequested.
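The rule's logic can be traced in ordinary C# using the same two patterns. This is only a sketch of how the rule evaluates, not code the URL Rewrite Module runs. HostRuleDemo and Apply are hypothetical names. The path is assumed to arrive without its leading slash, and the host comparison is assumed to ignore case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HostRuleDemo
{
    // The condition pattern from the rule: the host is exactly "yoursite.com".
    private static readonly Regex BareHost =
        new Regex(@"^yoursite\.com$", RegexOptions.IgnoreCase);

    // Returns the redirect target, or null when the rule would not fire.
    public static string Apply(string host, string path)
    {
        if (!BareHost.IsMatch(host))
            return null;

        // (.*) captures the whole requested path; this plays the role of {R:1}.
        string r1 = Regex.Match(path, "(.*)").Groups[1].Value;
        return "http://www.yoursite.com/" + r1;
    }
}
```

A request for products/list.aspx on yoursite.com yields http://www.yoursite.com/products/list.aspx, whereas a request on www.yoursite.com returns null because the condition does not match.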

To remove default.aspx from the URL - that is, to normalize from http://www.yoursite.com/default.aspx to http://www.yoursite.com - add the following rule beneath the Canonical Host Name rule:

<rule name="Strip Default.aspx Out">
   <match url="(.*)default.aspx" ignoreCase="false" />
   <action type="Redirect" url="{R:1}" redirectType="Permanent" />
</rule>
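It can help to see what the pattern (.*)default.aspx captures for a few sample paths. The sketch below applies the same pattern with .NET's Regex class, case-sensitively to mirror ignoreCase="false". StripDefaultDemo and RedirectTarget are hypothetical names, and the module's own regular expression engine is assumed to treat this simple pattern the same way.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripDefaultDemo
{
    // Same pattern as the rule; no IgnoreCase option because ignoreCase="false".
    private static readonly Regex Pattern = new Regex("(.*)default.aspx");

    // Returns what {R:1} would hold, or null when the rule would not fire.
    public static string RedirectTarget(string path)
    {
        Match m = Pattern.Match(path);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For products/default.aspx the capture is products/, and for default.aspx it is the empty string. Two things are worth noticing: Default.aspx with a capital D does not match, and because the pattern is not anchored, a page named mydefault.aspx also matches and captures "my". If either behavior is a concern, a tighter pattern such as ^(.*/)?default\.aspx$ combined with ignoreCase="true" is one option to test against your own URLs.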

Rewriting URLs Into Canonical Form Using ISAPI_Rewrite


Microsoft's URL Rewrite Module is a great choice if you are using IIS 7, but if you are using a previous version of IIS you're out of luck. What also makes things a bit more complicated is that IIS 6 and earlier have a more distinct boundary between IIS's pipeline and ASP.NET's. In IIS 6 (and earlier) requests for static resources like HTML pages, MP3s, PDFs, ZIPs, and such are not (by default) handled by ASP.NET. Consequently, any URL rewriting logic implemented at the ASP.NET layer - such as the Application_BeginRequest event handler in Global.asax - will not ensure that the canonical URL form is used for static resources. Instead, an IIS-level solution needs to be applied.

ISAPI_Rewrite is a commercial URL rewriting engine for IIS that is quite similar to Apache's mod_rewrite URL rewriting engine. Instead of defining rewrite rules in Web.config using XML syntax, ISAPI_Rewrite rules are defined in a text file using single-line commands.

To get started with ISAPI_Rewrite, head over to the download page and download and install the appropriate package. There's both a freeware "Lite" version and a fully functional commercial version. The official documentation gives a detailed overview of ISAPI_Rewrite's syntax, but here's an example of how you would use it to redirect users permanently to the canonical form:

# Redirect from yoursite.com to www.yoursite.com
RewriteCond Host: yoursite\.com
RewriteRule (.*) http\://www.yoursite\.com$1 [RP]

# Remove default.aspx
RewriteRule (.*)/default.aspx$ $1/ [RP]

Telling Search Engine Spiders Your Canonical Form In Markup


As we just saw, one way to enforce URL normalization is to implement permanent redirects from non-canonical URLs to canonical ones. This works well for the homepage and for adding (or dropping) the "www" from all requests to your site, but it doesn't address other issues. Consider a URL that may include querystring parameters that don't affect the content rendered on the page or only affect non-essential parts of the page. Take YouTube as an example. The canonical URL to a YouTube video is http://www.youtube.com/watch?v=videoId. The v querystring parameter is the key parameter here, as it specifies the video to display.

The YouTube video URL may optionally include additional URL parameters, such as a querystring parameter that specifies that the other videos in the same channel should be displayed to the right of the video player (which takes the form http://www.youtube.com/watch?v=videoId&feature=channel). The YouTube webmasters want search engines to index the canonical form of the URL, but they can't do a permanent redirect from http://www.youtube.com/watch?v=videoId&feature=channel to http://www.youtube.com/watch?v=videoId, otherwise the channel feature will never be enabled for any visitor.

Fortunately, there is a way to give a search engine a hint that you have a canonical URL that should be used. To specify the canonical URL, simply add a <link> element in the <head> portion of the web page. The way this works is as follows - add the following markup to the <head> sections of those pages in your website that you want the search engines to consider all the same URL:

<link rel="canonical" href="canonical_url" />

In the case of YouTube, all video pages specify a <link> element like so, regardless of whether the querystring includes just the videoId or the videoId and other parameters:

<link rel="canonical" href="/watch?v=videoId">

Because this same <link> element shows up when visiting http://www.youtube.com/watch?v=videoId and http://www.youtube.com/watch?v=videoId&feature=channel, the search engine spider will treat these two URLs as one and the same.

For more information on how this <link> element works be sure to read Specify Your Canonical.
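One way to produce the canonical <link> element on your own pages is to strip every querystring parameter except those that determine the content. The following is a minimal sketch of that idea. CanonicalLink and Build are hypothetical names, and the list of essential parameters is something you would decide page by page. Writing the resulting markup into the <head> of the page is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class CanonicalLink
{
    // Builds the <link> markup, keeping only the parameters named in essentialParams.
    public static string Build(Uri requested, params string[] essentialParams)
    {
        var keep = new HashSet<string>(essentialParams, StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();

        // Walk the name=value pairs of the querystring in their original order.
        string query = requested.Query.TrimStart('?');
        foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string name = pair.Split('=')[0];
            if (keep.Contains(name))
                kept.Add(pair);
        }

        // Scheme, host and path without the querystring.
        string url = requested.GetLeftPart(UriPartial.Path);
        if (kept.Count > 0)
            url += "?" + string.Join("&", kept.ToArray());

        // HTML-encode so that any "&" in the URL is valid inside the attribute.
        return "<link rel=\"canonical\" href=\"" + WebUtility.HtmlEncode(url) + "\" />";
    }
}
```

Passing "v" as the only essential parameter, both http://www.youtube.com/watch?v=abc123 and http://www.youtube.com/watch?v=abc123&feature=channel produce the same element, with href set to http://www.youtube.com/watch?v=abc123 (abc123 stands in for a real video ID).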

Happy Programming!

By Scott Mitchell


Attachments:

  • Download the VB versions of the code demos presented in this article

Further Readings:


  • SEO Advice: URL Canonicalization
  • URL Rewriting to Prevent Duplicate URLs
  • IIS 7's URL Rewrite Module
  • ISAPI_Rewrite Homepage
  • Specify Your Canonical

