<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
			<title>Tagged with "google"</title>
			<link>http://www.addedbytes.com/feeds/tag-feed/</link>
			<description></description>
			<language>en</language>
			<copyright>Web Development in Brighton - Added Bytes 2006</copyright>
			<ttl>120</ttl>
			<item>
				<title>Numbered Google Results User Script</title>
				<link>http://www.addedbytes.com/blog/code/numbered-google-results-user-script/</link>
				<description><![CDATA[ A user script for Opera and Firefox that automatically numbers Google search results. Updated 16 Nov 2006 following changes to Google results page code. <p><strong>Update:</strong> This user script was last updated 21 August 2008 following changes to Google results page code.</p>

<p><a href="http://greasemonkey.mozdev.org/">Greasemonkey</a> is a truly impressive addition to Firefox and <del>will be</del> <ins>IS</ins> a nice addition to Opera 8. In simple terms, it allows you to write custom JavaScripts to run on any sites you like (or even all sites). There is a fairly healthy group of scripts already available, and many more surely on the way.</p>

<ul><li><a href="userscripts/number_google_results.user.js">Number Google Results</a><br />
This script will automatically number all Google search results for you.</li></ul> <br><br>]]></description>
				<pubDate>Thu, 16 Nov 2006 10:08:00 +0000</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/code/numbered-google-results-user-script/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=firefox&amp;start=0" class="ditto_tag" rel="tag">firefox</a>,<a href="/feeds/tag-feed/?tags=google&amp;start=0" class="ditto_tag" rel="tag">google</a>,<a href="/feeds/tag-feed/?tags=greasemonkey&amp;start=0" class="ditto_tag" rel="tag">greasemonkey</a>,<a href="/feeds/tag-feed/?tags=hacking&amp;start=0" class="ditto_tag" rel="tag">hacking</a>
			</item>

			<item>
				<title>Jargon Explained</title>
				<link>http://www.addedbytes.com/articles/online-marketing/jargon-explained/</link>
				<description><![CDATA[ Many of my clients have previously worked with consultants and SEOs who inundated them with jargon, especially where proposals and sales calls are concerned. I find myself sometimes using too much jargon - easily done when you spend so much time working in any field. This jargon guide explains the industry terms in simple language. <h3 id="anchortext">Anchor Text</h3>

<p>Anchor text is the text used to link to another site. In this example - <a href="http://www.google.com">Google Web Search</a> - the anchor text is "Google Web Search".</p>

<h3 id="atom">Atom</h3>

<p>Atom is a file format used for web <a href="http://www.addedbytes.com/seo/jargon-explained/#feeds">feeds</a>. It is a type of <a href="http://www.addedbytes.com/seo/jargon-explained/#xml">XML</a> document, and is used in <a href="http://www.addedbytes.com/seo/jargon-explained/#syndication">syndication</a>.</p>

<h3 id="blackhat">Black Hat</h3>

<p>Black Hat is the term used to describe techniques used by some search marketers to promote websites. These techniques are those that go against guidelines published by search engines, and in many cases their use <em>can</em> result in a site being penalised or removed from search engine listings. Black Hat is the opposite of <a href="http://www.addedbytes.com/seo/jargon-explained/#whitehat">White Hat</a>.</p>

<h3 id="cctld">ccTLD</h3>

<p>A ccTLD is a country-code <a href="http://www.addedbytes.com/seo/jargon-explained/#tld">top level domain</a>. .uk, for example, is a ccTLD, as are .au (Australia), .de (Germany), .fr (France), .ca (Canada) and .nz (New Zealand).</p>

<h3 id="clickthroughrate">Click-through Rate</h3>

<p>See <a href="http://www.addedbytes.com/seo/jargon-explained/#ctr">CTR</a>.</p>

<h3 id="cloaking">Cloaking</h3>

<p>Cloaking is a technique used to show one set of content to a search engine and different content to a user. The content shown to the engine is usually designed to help a page rank very well for a certain phrase or word, and the content shown to the user is usually designed to maximise the <a href="http://www.addedbytes.com/seo/jargon-explained/#conversion">conversions</a> from that page. Search engines dislike this technique and many sites are banned for using it. It is a <a href="#blackhat">Black Hat</a> technique.</p>

<h3 id="conversion">Conversion</h3>

<p>A conversion is when a website user completes a specific goal. With some sites that can be to complete a sale; with others, to sign up to a newsletter; and with others to make an enquiry.</p>

<h3 id="cookie">Cookie</h3>

<p>A cookie is a small text file stored on a website user's computer. It identifies a repeat visitor to a site, often with a unique code, allowing people to shop online and removing the need to log in to sites repeatedly. Cookies are often considered dangerous by less experienced web users. You can find out more about cookies in <a href="http://www.addedbytes.com/development/are-cookies-dangerous/">Are Cookies Dangerous?</a></p>

<h3 id="cpa">CPA</h3>

<p>CPA stands for "Cost-Per-Action", and is a form of advertising model. The idea is that an advertiser pays a specific amount for each successful <a href="http://www.addedbytes.com/seo/jargon-explained/#conversion">conversion</a>, be that a sale or a signup.</p>

<h3 id="cpc">CPC</h3>

<p>CPC stands for "Cost-Per-Click", and is a form of advertising model. The idea is that an advertiser pays a specific amount for each visitor referred to their website, regardless of whether that user converts to a sale or not.</p>

<h3 id="cpm">CPM</h3>

<p>CPM stands for "Cost-Per-Mille", and is a form of advertising model. The idea is that an advertiser pays a specific amount for every thousand times his advert is seen on a site, regardless of how many of the users who see the advert click on it and visit the advertiser's site.</p>

<h3 id="crawler">Crawler</h3>

<p>See <a href="http://www.addedbytes.com/seo/jargon-explained/#spider">Spider</a>.</p>

<h3 id="ctr">CTR</h3>

<p>CTR stands for "Click-through Rate". It is the percentage of people who see an advert and actually click on it. For example, if one out of every hundred people who view an advert click on it, the advert will have a CTR of 1%.</p>
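
<p>The calculation can be sketched in a few lines of Python. This is only an illustration: the function name and the choice to report 0 when an advert has never been seen are assumptions, not part of any standard.</p>

```python
def click_through_rate(clicks, impressions):
    """Return CTR as the percentage of impressions that led to a click."""
    if impressions == 0:
        # Assumption: an advert nobody has seen is reported as 0% here.
        return 0.0
    return 100.0 * clicks / impressions


# One click for every hundred views, as in the example above.
print(click_through_rate(1, 100))  # 1.0
```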

<h3 id="directory">Directory</h3>

<p>A directory is different to a search engine in that it organises the sites it lists in categories. Sites are usually added by hand, rather than found using a <a href="http://www.addedbytes.com/seo/jargon-explained/#spider">spider</a>, and often a small fee is charged for this addition.</p>

<h3 id="datacenter">Data Center</h3>

<p>A data center is a large collection of computers that hold information for a search engine. Major search engines have several of these around the world. Their purpose is to process search queries.</p>

<h3 id="doorway">Doorway Page</h3>

<p>A doorway page is a page designed specifically to rank well in search engines. Often a visitor to a doorway page will not notice they have visited one, as they will be sent straight on to the target page instantly. Use of doorways is a <a href="http://www.addedbytes.com/seo/jargon-explained/#blackhat">Black Hat</a> technique.</p>

<h3 id="feed">Feed</h3>

<p>A feed is a file that users can download that contains information about recent updates and additions to a website. Often these feeds are used for <a href="http://www.addedbytes.com/seo/jargon-explained/#syndication">syndication</a> purposes. Using feeds and programs designed to use feeds, users can often keep up to date with many hundreds of websites.</p>

<h3 id="ffa">FFA</h3>

<p>FFA stands for "Free-For All". It is usually used in conjunction with links pages that allow anyone and everyone to add a link to the page.</p>

<h3 id="googledance">Google Dance</h3>

<p>The Google Dance is the name for the process Google used to go through very regularly when it updated an algorithm. As various <a href="http://www.addedbytes.com/seo/jargon-explained/#datacenter">data centres</a> around the world were progressively updated, people would be able to make the same search several times in succession and see different results each time. The Google Dance does not happen as often now, but can still be seen when major changes are made to the Google infrastructure or algorithms.</p>

<h3 id="hit">Hit</h3>

<p>A "hit" can mean one of two things.</p>

<ul><li>When searching the web, a hit can be a result found by a search engine that matches the search criteria.</li><li>In analytics, a hit is when a file is requested from a server. Some people have used hits as a measure of website traffic; however, hits to a server include images and repeat visitors, and are a poor indicator of traffic. One thousand hits very rarely equals one thousand visits.</li></ul>

<h3 id="ibl">IBL</h3>

<p>IBL stands for "Inbound Link", and refers to a link pointing to a website from a separate website (unlike an internal link, which refers to a link within one website pointing to somewhere else within the same site).</p>

<h3 id="impression">Impression</h3>

<p>Impression is the word used to describe a single viewing of something. A page impression would mean a single view of a web page. In advertising, one impression is a single view of the advert.</p>

<h3 id="keyword">Keyword</h3>

<p>A keyword is simply a word used to describe a page. It can also be a word used by someone trying to find a site, entered into a search engine.</p>

<h3 id="keyphrase">Keyphrase</h3>

<p>A keyphrase is very similar to a <a href="http://www.addedbytes.com/seo/jargon-explained/#keywords">keyword</a>, except that it is a phrase made up of several words.</p>

<h3 id="keywordstuffing">Keyword Stuffing</h3>

<p>Keyword stuffing is the practice of repeating a keyword (or keywords) far too many times throughout a page. It may be that the keyword is repeated so many times in the text that as a result the text reads badly. It may be that it is repeated lots of times in meta tags, or elsewhere in code, or it may be a combination of these things. Common practice in the late 90s, this is now considered a technique that may harm a site more than help it.</p>

<h3 id="linkbuilding">Link Building</h3>

<p>Link Building is the process used to increase the number of links to a website. This can include submitting a website to directories, creating more content for a website, link rental, and many more techniques. Most search engines now use link data extensively in their algorithms, and so link building has become far more common.</p>

<h3 id="metadata">Meta Data / Meta Tag</h3>

<p>Meta Data is information held about a page or document. It is usually held invisibly within the page, and may include a description of the page, a list of relevant keywords, or the name of the author. For a full explanation of common meta tags, and how to work out which ones are worth using, please read <a href="http://www.addedbytes.com/seo/meta-tags/">Meta Tags</a>.</p>

<h3 id="pagetitle">Page Title</h3>

<p>A page title is an important part of a page - it is usually the part of the page that appears as a link in search results. It is usually visible in the title bar of your browser while you are viewing a page.</p>

<h3 id="pr">PageRank / PR</h3>

<p>PageRank is an algorithm, developed by Larry Page and Sergey Brin, founders of Google. It allows you to find the "best" pages of a group of pages by looking at how the pages link to each other. The more links a page has, the better it is considered, and the more important its links, in turn, are considered. PageRank is named after Larry Page.</p>
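
<p>As a rough illustration of the idea, here is a minimal Python sketch of the textbook iterative form of PageRank. It is not Google's production algorithm. The damping factor of 0.85, the iteration count, the function name and the three-page example site are all assumptions made for the example.</p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from a dict mapping each page to the pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly between its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical site: both other pages link to "home", nothing links to "contact".
example = {"home": ["about"], "about": ["home"], "contact": ["home"]}
scores = pagerank(example)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

<p>In this sketch "home" comes out on top because two pages link to it, and "about" follows closely because its single inbound link comes from the highest-scoring page.</p>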

<h3 id="payperaction">Pay Per Action</h3>

<p>Pay Per Action advertising is the same advertising model as <a href="http://www.addedbytes.com/seo/jargon-explained/#cpa">CPA</a>, in that an advertiser will pay every time a user completes a specific action.</p>

<h3 id="paypercall">Pay Per Call</h3>

<p>Pay Per Call advertising is a subset of <a href="http://www.addedbytes.com/seo/jargon-explained/#payperaction">Pay Per Action</a>, and is the same advertising model as <a href="http://www.addedbytes.com/seo/jargon-explained/#cpa">CPA</a>, in that an advertiser will pay every time a user calls a specific number.</p>

<h3 id="payperclick">Pay Per Click</h3>

<p>Pay Per Click advertising is the same advertising model as <a href="http://www.addedbytes.com/seo/jargon-explained/#cpc">CPC</a>, in that an advertiser will pay every time a user clicks on their advert.</p>

<h3 id="pfi">PFI</h3>

<p>PFI stands for "Pay For Inclusion". Some engines will charge sites to be listed at all in their results (notably Yahoo for many years). Prices vary greatly, and some engines charge annually, where others charge a one-off fee. This is a far more common feature of directories than search engines.</p>

<h3 id="ppc">PPC</h3>

<p>See <a href="#payperclick">Pay Per Click</a>.</p>

<h3 id="robots">Robots.txt</h3>

<p>A robots.txt file is a simple text file that contains instructions for search engine <a href="http://www.addedbytes.com/seo/jargon-explained/#spider">spiders</a>. It can tell specific spiders to slow down, or not to index specific areas of a site. For more information, please read <a href="http://www.addedbytes.com/development/robots-txt-file/">robots.txt</a>.</p>

<h3 id="roi">ROI</h3>

<p>ROI stands for "Return on Investment". It is a measure of the success of any marketing campaign. A marketing campaign that cost £10,000 but made £3,000 would obviously have a low ROI. A marketing campaign that cost £10,000 but made £100,000 would have a high ROI.</p>
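
<p>To put numbers on the two campaigns described above, here is a small Python sketch. It assumes the common definition of ROI as net gain divided by cost, expressed as a percentage, and treats the amount a campaign "made" as its revenue. The function name is hypothetical.</p>

```python
def roi_percent(cost, revenue):
    """Return ROI as a percentage: net gain divided by cost."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return 100.0 * (revenue - cost) / cost


print(roi_percent(10_000, 3_000))    # -70.0
print(roi_percent(10_000, 100_000))  # 900.0
```

<p>Under that definition the first campaign lost 70% of what was spent on it, while the second returned nine times its cost on top of the original outlay.</p>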

<h3 id="rss">RSS</h3>

<p>RSS is a type of <a href="http://www.addedbytes.com/seo/jargon-explained/#xml">XML</a> file, and is the most commonly used file format for website <a href="http://www.addedbytes.com/seo/jargon-explained/#feed">feeds</a>.</p>

<h3 id="sem">SEM</h3>

<p>SEM is an acronym of "Search Engine Marketing". SEM is a broader topic than <a href="http://www.addedbytes.com/seo/jargon-explained/#seo">SEO</a>, and can include, for example, an online PR campaign or <a href="http://www.addedbytes.com/seo/jargon-explained/#ppc">PPC</a> (and other forms of) advertising.</p>

<h3 id="seo">SEO</h3>

<p>SEO is an acronym of "Search Engine Optimisation", and is the art of altering a website to improve a site's performance in search engines (note: an improvement in performance does not equal an increase in traffic!).</p>

<h3 id="serps">SERPs</h3>

<p>SERPs is an acronym for "Search Engine Result Pages".</p>

<h3 id="ses">SEs</h3>

<p>SE is an abbreviation of "Search Engine".</p>

<h3 id="sitemaps">Site Map</h3>

<p>A site map is a page, or set of pages, on a website, designed to help users and search engines find their way around a site.</p>

<h3 id="spam">Spam</h3>

<p>Spam has many different meanings on the web. The most common meaning is related to email, where spam describes unwanted email, often commercial in nature, and often sent out indiscriminately to millions of people at once. In a search engine context, spam refers to pages that are listed out of place. This can mean pages that are found for keywords unrelated to their content. It can also mean pages appearing unnaturally high in search engines. These pages are often promoted using <a href="http://www.addedbytes.com/seo/jargon-explained/#blackhat">Black Hat</a> techniques, especially <a href="http://www.addedbytes.com/seo/jargon-explained/#cloaking">cloaking</a> and <a href="http://www.addedbytes.com/seo/jargon-explained/#doorway">doorway pages</a>.</p>

<h3 id="spider">Spider</h3>

<p>A spider, also often called a "crawler", is a program created by a search engine to index pages on the web. It visits pages on the web, collects their content, and finds links within that page. It then adds the links found on that page to those it intends to crawl.</p>

<h3 id="splashpage">Splash Page</h3>

<p>A splash page is an introduction page to a website, often created using Flash. Splash pages are much derided, as they slow down access to a website and often provide no useful information to the user.</p>

<h3 id="stopword">Stop Word</h3>

<p>A stop word is a word that is ignored by the search engines. It is a word that appears so often on the web as to be useless to a search engine. Examples include "a", "and", "I", "you" and "it".</p>

<h3 id="syndication">Syndication</h3>

<p>Syndication is where a website makes information available for others to use. In the majority of cases, the information available is a list of the content most recently added to the site (a <a href="http://www.addedbytes.com/seo/jargon-explained/#feed">feed</a>), to allow visitors to keep up to date easily with new content added to many sites.</p>

<h3 id="textlinkad">Text Link Ad</h3>

<p>A text link ad is a type of advert on a website, placed in return for a simple monthly fee. These types of advert can have a positive effect on a website's SEO campaign, and can directly generate traffic to websites.</p>

<h3 id="tld">TLD</h3>

<p>TLD is an acronym for "Top Level Domain". .com, .org, .net, .biz, .info, .name and .pro are all examples of TLDs. They are usually global TLDs, unlike <a href="http://www.addedbytes.com/seo/jargon-explained/#cctld">ccTLDs</a>, which are country-code domains.</p>

<h3 id="url">URL / URI</h3>

<p>A URL (Uniform Resource Locator), sometimes (more correctly) referred to as a URI (Uniform Resource Identifier), is in basic terms a web address. For example, "http://www.addedbytes.com" is a URI.</p>

<h3 id="visits">Visit</h3>

<p>A visit is different from a <a href="http://www.addedbytes.com/seo/jargon-explained/#hit">Hit</a> or an <a href="http://www.addedbytes.com/seo/jargon-explained/#impression">Impression</a>, in that it indicates a single person's visit to a website. A visit may include many page impressions, and many hits.</p>

<h3 id="whitehat">White Hat</h3>

<p>White Hat is the term used to describe techniques used by some search marketers to promote websites. These techniques are those that adhere to the guidelines published by search engines. White Hat is the opposite of <a href="http://www.addedbytes.com/seo/jargon-explained/#blackhat">Black Hat</a>.</p>

<h3 id="xml">XML</h3>

<p>XML is a file format designed to create files that are easy to share and understand.</p> <br><br>]]></description>
				<pubDate>Wed, 03 May 2006 13:17:00 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/articles/online-marketing/jargon-explained/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=article&amp;start=0" class="ditto_tag" rel="tag">article</a>,<a href="/feeds/tag-feed/?tags=blog&amp;start=0" class="ditto_tag" rel="tag">blog</a>,<a href="/feeds/tag-feed/?tags=google&amp;start=0" class="ditto_tag" rel="tag">google</a>,<a href="/feeds/tag-feed/?tags=jargon&amp;start=0" class="ditto_tag" rel="tag">jargon</a>,<a href="/feeds/tag-feed/?tags=online+marketing&amp;start=0" class="ditto_tag" rel="tag">online marketing</a>,<a href="/feeds/tag-feed/?tags=reference&amp;start=0" class="ditto_tag" rel="tag">reference</a>,<a href="/feeds/tag-feed/?tags=search&amp;start=0" class="ditto_tag" rel="tag">search</a>,<a href="/feeds/tag-feed/?tags=seo&amp;start=0" class="ditto_tag" rel="tag">seo</a>,<a href="/feeds/tag-feed/?tags=web&amp;start=0" class="ditto_tag" rel="tag">web</a>
			</item>

			<item>
				<title>My Site's Dropped!</title>
				<link>http://www.addedbytes.com/articles/online-marketing/my-site-has-dropped/</link>
				<description><![CDATA[ Why sites usually drop in the SERPs and what to do if it happens to you. <p>Visit any one of the excellent internet marketing forums on the web and you will see a host of threads dedicated to the same topic: <strong>My Site Has Dropped</strong>. Google, Yahoo, MSN, Ask and the other engines are constantly in a state of flux, so to a degree this is to be expected, but sometimes major shifts in rankings and resultant traffic are seen and sometimes sites are penalised. Consequently, on any given day there are plenty of webmasters who wake up to discover their traffic has vanished into thin air.</p>

<p>For hobby webmasters, this is generally not a problem. For anyone making money online, though, it can be extremely nerve-wracking. For those whose livelihoods depend on their websites, losing all search engine traffic can be a devastating blow.</p>

<p>Unfortunately, many of the threads and discussions on this topic produce a large amount of misinformation. For example, as a result of one recent Google update, many sites lost significant rankings. Some forums were claiming that specific industries had been targeted and that sites in those industries had been penalised in some way. Some claimed that Google had "lost" a serious amount of data, or had re-added old data, and that was what caused the change. There are as many explanations for loss of traffic as there are sites that have dropped.</p>

<p>Unfortunately, with all of the wild ideas and crazy theories being bandied around, the average site owner has a very hard time working out first what has happened, and second what to do about it.</p>

<p>The very first thing to consider when looking at the effect of a shift in algorithms is that a change rarely affects all ranking criteria at once. They rarely, if ever, target a specific industry, even though the effect of a change on a specific market may be far greater than in others (this is especially true in ultra-competitive arenas, such as real estate, finance and the adult industry, where those at the top are often precariously balanced, and a tiny change in algorithms can mean major changes to the SERPs).</p>

<h3>Fixing The Problem</h3>

<p>Before anything else, it is important to make sure there actually is a problem. The forums usually first fill with these types of posts during an update. However, while the update is going on the SERPs are in a state of flux. Sites can appear all over the place during an update, so save the panic until the update is over. Updates can last days, and it is a good idea to watch a few of the SEO forums to find out when an update has finished.</p>

<p>If the update has finished and a site has definitely dropped, it is rare that it will be able to regain the exact same (or better) traffic within a short space of time. If an algorithm change has caused a site to be dropped, the chances are that one specific thing that was making that site rank well (for example, rented links) has been devalued. If the only thing that was making a site rank well has become less important, there are probably no quick fixes.</p>

<h3>Is It a Penalty?</h3>

<p>The first thought to cross most people's minds when sites lose traffic and drop down the SERPs is that there must be a penalty applied to their site. Penalties are very real, yes, but there is no reason to suspect you have had a penalty applied unless one of the following is true:</p>

<ul><li>You have been doing bad things. If you've been using cloaking, hidden text, doorway pages, keyword stuffing or link farms etc, expect to be penalised.</li><li><p>You can't find your site - at all - in the search engine you suspect has penalised you. In the case of Google, search for "site:addedbytes.com" (replacing addedbytes.com with your domain name, of course). If no results are returned, Google will show you something like this:</p>

<p>Your Search - <strong>site:addedbytes.com</strong> - did not match any documents.</p>

<p>Suggestions:<ul><li>Make sure all words are spelled correctly.</li><li>Try different keywords.</li><li>Try more general keywords.</li></ul></p></li></ul>

<p>If you have been penalised, then you'll get no sympathy from me - be more careful in future! SEO is not about getting top rankings for two weeks before vanishing forever from results, it's about a sustained and long-term effort to get top spots. It's not a sprint, it's a marathon. It's not worth taking the kind of risks that will get you penalised unless you have no choice. (&lt;/lecture&gt;)</p>

<p>To come back from a penalty is not a quick process, but it is relatively simple. First, remove all remotely-fishy stuff from your site. Before a site is reincluded, the chances are it will be checked, and if you've not corrected what you were doing wrong, you will not be reincluded. Be over-cautious at this stage - better to remove absolutely anything that a search engine might dislike than remove the obvious things and be refused reinclusion because you've assumed that the search engine penalised you for something specific and that fixing that alone is enough. Once you are certain there is nothing left <em>on your domain</em> that can be considered dodgy by the whitest of white hats, then file a reinclusion request with the engine you are having trouble with.</p>

<p>Then wait - and it may be many months before you are reincluded, if at all. Don't pressure the engine and don't file the request every day or week. File it once and wait. You're the kid in the corner with the large hat with a D on it. The search engine doesn't like you - you tried to manipulate it (even if it wasn't your work, it's your site and your responsibility). Have patience and work on building links to your site and building content - at least then when you are reincluded you should have better traffic.</p>

<h3>What Next?</h3>

<p>If you've not been penalised, then you should look at why you have dropped. Actually, let me rephrase - you should look at why your competitors have risen up the SERPs - that is a more accurate way to look at it. Check out the top sites in your field - what are they getting traffic for? What do their sites have that yours doesn't? Link quantity? Link quality? Content? Meta tags? A title in a specific shade of blue? Look for themes in the top ranking sites - if you can find out why they are top now, and you are not, you know what to work on.</p>

<p>The chances are that if you've not been penalised and your site is not performing as well, you need to look at improving or updating your online marketing tactics.</p>

<p>If your site is lacking in normal, organic links for example (you have previously paid for all of your links), then start adding things to your site that people will want to link to, so that you build up organic links. Add a blog and post controversial or funny (but always unique) items on there. The web is a conversation, and you are not as prominent as you once were because the search engines are getting better - to get yourself noticed, you need to be talked about. [The same applies in all areas - if the people doing better than you all have very content-heavy sites, hire some copywriters and get them writing some interesting and engaging content; if the people doing better than you have sites built with good quality, semantic markup, and you don't, have your site rebuilt.]</p>

<p>The most important thing is to treat a perceived drop in rankings for what it is: a temporary glitch in your grand plan. Put in a bit of hard work and a little investment in your online marketing and you should see improvements. You were ranking well before, so the chances are good that you will rank well again.</p> <br><br>]]></description>
				<pubDate>Wed, 04 Jan 2006 12:24:00 +0000</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/articles/online-marketing/my-site-has-dropped/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=google&amp;start=0" class="ditto_tag" rel="tag">google</a>,<a href="/feeds/tag-feed/?tags=howto&amp;start=0" class="ditto_tag" rel="tag">howto</a>,<a href="/feeds/tag-feed/?tags=marketing&amp;start=0" class="ditto_tag" rel="tag">marketing</a>,<a href="/feeds/tag-feed/?tags=online+marketing&amp;start=0" class="ditto_tag" rel="tag">online marketing</a>,<a href="/feeds/tag-feed/?tags=optimization&amp;start=0" class="ditto_tag" rel="tag">optimization</a>,<a href="/feeds/tag-feed/?tags=seo&amp;start=0" class="ditto_tag" rel="tag">seo</a>,<a href="/feeds/tag-feed/?tags=tips&amp;start=0" class="ditto_tag" rel="tag">tips</a>
			</item>

			<item>
				<title>Block Prefetching</title>
				<link>http://www.addedbytes.com/blog/block-prefetching/</link>
				<description><![CDATA[ <p>Mozilla and Google's prefetching functions are a nice addition to browser technology in many ways. Unsurprisingly, they are not very well thought through. The main two problems with the prefetching idea are that it messes with log files and it means every link on a page could potentially be followed despite the consequences (dangerous in a site administration context).</p>

<p>It appears from the FAQ that Google only intends its accelerator to prefetch specific pages that have been specified with the &lt;link&gt; tag. However, many people are claiming that normal links have been prefetched.</p>

<p>Preventing prefetching of a page is simple: add the following PHP to the page you do not want prefetched:</p>

<pre class="php">if ((isset($_SERVER['HTTP_X_MOZ'])) && ($_SERVER['HTTP_X_MOZ'] == 'prefetch')) {
    // This is a prefetch request. Block it.
    header('HTTP/1.0 403 Forbidden');
    echo '403: Forbidden&lt;br&gt;&lt;br&gt;Prefetching not allowed here.';
    die();
}</pre>

<p>This will serve a "forbidden" header to the prefetcher. Normal browsing should be unaffected.</p> <br><br>]]></description>
				<pubDate>Wed, 20 Apr 2005 16:16:00 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/blog/block-prefetching/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=block&amp;start=0" class="ditto_tag" rel="tag">block</a>,<a href="/feeds/tag-feed/?tags=google&amp;start=0" class="ditto_tag" rel="tag">google</a>,<a href="/feeds/tag-feed/?tags=mozilla&amp;start=0" class="ditto_tag" rel="tag">mozilla</a>,<a href="/feeds/tag-feed/?tags=php&amp;start=0" class="ditto_tag" rel="tag">php</a>,<a href="/feeds/tag-feed/?tags=prefetching&amp;start=0" class="ditto_tag" rel="tag">prefetching</a>,<a href="/feeds/tag-feed/?tags=reference&amp;start=0" class="ditto_tag" rel="tag">reference</a>,<a href="/feeds/tag-feed/?tags=webdev&amp;start=0" class="ditto_tag" rel="tag">webdev</a>
			</item>

			<item>
				<title>robots.txt File</title>
				<link>http://www.addedbytes.com/articles/online-marketing/robots-txt-file/</link>
				<description><![CDATA[ Learn how and why you should add a robots.txt file to your website. <p>A robots.txt file is a simple, plain text file that you store on your website. Its purpose is to give instructions to robots (also known as "spiders", programs that retrieve content for search engines like Google and Fast) detailing <em>what they should not index on a website</em>. If you are unable to create or use a robots.txt file, you might find this <a href="http://www.addedbytes.com/seo/meta-tags/3#robotstag">meta tags tutorial</a> useful.</p>

<p>A robots.txt file (a document detailing the <a href="http://www.robotstxt.org/wc/norobots.html">robots.txt exclusion standard</a> is available) is always stored in the root of your site, and is always named in lower case. For example, if a website at http://www.addedbytes.com/ had a robots.txt file it would be found at http://www.addedbytes.com/robots.txt - and only there. Spiders will always search for it in the root of a domain, and will never ever look for it elsewhere. You cannot specify a different name or location for a robots.txt file.</p>

<p>A robots.txt file should be viewed like a list of recommendations. By including one, you are asking the spiders that visit your site to ignore certain things that you would prefer not to be indexed, but they are not obliged to pay attention to that. If you really do not want things indexed, it is far better to disallow access with server-side programming than a robots.txt file.</p>

<h3>Writing a robots.txt File</h3>

<p>A robots.txt file is a list of instructions. Each instruction is divided into two parts. The first, "User-agent" (case-sensitive), tells robots reading the file which robots should pay attention to the instructions that follow. Usually, this will be a "*", which is a wild card meaning "all robots". The wild card character can only be used in this context, except in the case of Googlebot, which does support it in other places (see <a href="http://www.addedbytes.com/development/robots-txt-file/3/#uaspecific">User-Agent Specific Commands</a>).</p>

<p>Following this line specifying a user agent are the rules themselves. The rules that apply to a defined user agent must be defined on the lines following the "User-agent" instruction. There can be no blank lines within each set of instructions, and there must be at least one blank line separating sets of instructions. The instructions are usually of the format: "Disallow: /folder/" or "Disallow: /file.htm". There can only be one instruction per line, and you should really avoid putting spaces before the instructions (though this isn't specifically allowed or disallowed, it is probably best to avoid taking a risk).</p>
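
<p>The layout rules above (one instruction per line, no blank lines inside a set, a blank line between sets, no leading spaces) are easy to get wrong by hand. The following Python sketch generates a file that follows them. The function name <code>build_robots_txt</code> and its input format are assumptions made for this example only.</p>

```python
def build_robots_txt(groups):
    """Build robots.txt text from a list of (user_agent, [disallowed paths]) pairs."""
    blocks = []
    for user_agent, disallowed in groups:
        lines = ["User-agent: " + user_agent]
        # An empty list is written as a bare "Disallow:" line, which blocks nothing.
        for path in disallowed or [""]:
            lines.append(("Disallow: " + path).rstrip())
        # No blank lines inside a set of instructions.
        blocks.append("\n".join(lines))
    # Exactly one blank line between sets of instructions.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_robots_txt([("*", ["/folder/", "/file.htm"])]))
```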

<p>Anything following a hash symbol "#" is considered a comment and ignored. At least, according to the standards. Rumours abound, though, that in the past some engines have ignored a line with a hash symbol on it wherever it is placed, so you may want to place each comment on a line by itself.</p>

<p>For example, the following robots.txt file is technically valid:</p>

<code># My robots.txt file

User-agent: *
Disallow: /folder/ # My private folder
Disallow: /file.htm # My private file</code>
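
<p>If you want to see how a well-behaved robot might read the file above, Python's standard library includes a robots.txt parser. The sketch below feeds it the example file and asks whether a few addresses may be fetched. The helper name <code>is_allowed</code> and the robot name "anybot" are invented for the example, and real spiders may interpret the file differently from this parser.</p>

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# My robots.txt file

User-agent: *
Disallow: /folder/ # My private folder
Disallow: /file.htm # My private file
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if this parser thinks user_agent may fetch path."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; comments after "#" are stripped.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/folder/page.htm", "/file.htm", "/index.htm"):
        print(path, is_allowed(ROBOTS_TXT, "anybot", path))
```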

<p>If you want to prevent robots from indexing anything at all on your site, you could add the following to your robots.txt file:</p>

<code>User-agent: *
Disallow: /</code>

<p>If you want to prevent all robots, except for a particular one or two, from accessing a folder, you could write a file like this, which will allow GoogleBot to index everything on your site, but prevent all other robots from accessing the folder called, imaginatively, "folder":</p>

<code>User-agent: googlebot
Disallow:

User-agent: *
Disallow: /folder/</code>

<p>Please note: Many people believe that it is necessary to define the robot-specific rules before the general rules. The robots.txt exclusion standard does not require this, but there is no evidence of it causing problems, so it may be worth doing on the small chance that it helps things work as you intend.</p>
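
<p>The googlebot example and the note on ordering can both be checked with Python's standard-library parser. This is only a sketch of how one particular parser behaves, not a guarantee about any search engine; the helper name <code>can_fetch</code> and the robot name "otherbot" are assumptions for the example.</p>

```python
from urllib.robotparser import RobotFileParser

SPECIFIC_FIRST = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /folder/
"""

GENERAL_FIRST = """\
User-agent: *
Disallow: /folder/

User-agent: googlebot
Disallow:
"""


def can_fetch(robots_txt, user_agent, path):
    """Ask the standard-library parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for name, text in (("specific first", SPECIFIC_FIRST), ("general first", GENERAL_FIRST)):
        print(name,
              can_fetch(text, "googlebot", "/folder/page.htm"),
              can_fetch(text, "otherbot", "/folder/page.htm"))
```

<p>With this parser, both orderings give the same answers: googlebot may fetch the folder, and other robots may not.</p>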

<p>Once you have written a robots.txt file, it is often a good idea to run it through a validator to check for errors, as they may do considerable harm if they prevent your site from being indexed. SearchEngineWorld's <a href="http://www.searchengineworld.com/cgi-bin/robotcheck.cgi">robots.txt validator</a> is the most proficient of those available, or if you prefer, there is a <a href="http://tool.motoricerca.info/robots-checker.phtml">validator that understands more unusual commands</a> like Crawl-delay available as well.</p>

<h3>Example Files</h3>

<p>This is the robots.txt file for AddedBytes.com. As you can see, I have disallowed the indexing of a few files, but not many. Specifically, I have asked robots not to index "404.php", which is the page a user is redirected to if a page is not found, and "friend.php", which is linked to from every page, but is there to allow users to refer friends to the site, and so should not really be indexed.</p>

<code>User-agent: *
Disallow: /404.php
Disallow: /friend.php</code>

<p>This file, from eBay, is again quite short, and simply specifies a few folders that should not be indexed:</p>

<code>User-agent: *
Disallow: /help/confidence/ 
Disallow: /help/policies/ 
Disallow: /disney/</code>

<p>As you can see, Google will still list <a href="http://www.google.co.uk/search?hl=en&amp;lr=&amp;ie=UTF-8&amp;sa=G&amp;q=%22%2Bwww.ebay.%2Bcom/help/confidence/%22">pages excluded by robots.txt</a>, as Google is still aware they exist. However, Google will not index the content of the page and the page will not show up in searches except where a search includes the address of the excluded page.</p>
 
<h3>Blank robots.txt files</h3>

<p>It may be that you do not want to prevent spiders from indexing anything on your site. If that is the case, you should still add a robots.txt file, but an empty one, of this format:</p>

<code>User-agent: * 
Disallow:</code>

<p>This prevents spiders from generating a 404 error when the robots.txt file isn't found. It is basically just good practice to add a blank robots file, at the least, but not essential.</p>

<h3>Be Careful</h3>

<p>You may be thinking that adding the addresses of folders you do not wish robots to index is a good way to prevent spiders from accidentally indexing sensitive areas of your site, like an administration area. While this is true, remember that anybody at all can view your robots.txt file, and therefore find the address(es) you'd rather were not indexed. If that includes your administration area, you may have saved them the trouble of searching for it.</p>

<p>There have been websites with unprotected administration areas, hidden in unusually named folders for "security" reasons, whose owners then added the name of the folder to their robots.txt file, opening up their admin area to anyone who wanted to have a poke around.</p>

<p>You must also be careful when writing your robots.txt file. Robots will usually err on the side of caution. If they do not recognise a command, they may well assume you meant them to stay away. Syntax errors in a robots.txt file can prevent your entire site from being indexed, so check it thoroughly before uploading it!</p>

<h3 id="uaspecific">User-Agent Specific Commands</h3>

<p><strong>GoogleBot</strong></p>

<p>Googlebot has no extra commands specific to it, but it is allegedly a little brighter than the average crawler. Googlebot will supposedly understand wild card characters (*) in the "Disallow" field of the robots.txt file. However, Googlebot is the only engine even rumoured to be able to do this, so you would be wise to avoid using wild cards in the disallow field wherever possible.</p>
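
<p>To make the idea of a wild card in the "Disallow" field concrete, here is a small Python sketch of one way such a rule could be matched against a path. This is purely an assumption about how wild card matching might work, not a description of what Googlebot actually does, and the function name <code>wildcard_rule_matches</code> is invented for the example.</p>

```python
import re


def wildcard_rule_matches(rule, path):
    """Return True if path starts with rule, treating "*" as "any run of characters"."""
    # Escape everything literally, then join the pieces with ".*" where a "*" stood.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    # re.match anchors at the start only, so an ordinary rule is still a prefix match.
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    print(wildcard_rule_matches("/*.pdf", "/docs/file.pdf"))
    print(wildcard_rule_matches("/*.pdf", "/docs/file.htm"))
    print(wildcard_rule_matches("/folder/", "/folder/page.htm"))
```

<p>A robot that does not understand wild cards would treat "/*.pdf" as a literal path and match nothing useful, which is why it is safer to avoid them.</p>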

<p><strong>MSNBot and Slurp</strong></p>

<code>User-Agent: msnbot
Crawl-Delay: 10

User-Agent: Slurp
Crawl-Delay: 10</code>

<p>The above code is specific to MSN's spider, "MSNBot", and Inktomi's spider, "Slurp", and instructs the spiders to wait the specified amount of time, in seconds (10 seconds above, default is 1 second if not specified) before requesting another page from your site. MSNBot and Slurp have been known to index some sites very heavily, and this allows webmasters to slow down their indexing speed.</p>
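
<p>If you are writing your own spider and want to honour this command, Python's standard-library parser (from Python 3.6 onwards) can read it. A minimal sketch follows, using the example file above; the helper name <code>crawl_delay_for</code> is an assumption for the example.</p>

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: msnbot
Crawl-Delay: 10

User-Agent: Slurp
Crawl-Delay: 10
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-Delay in seconds for user_agent, or None if none applies."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # crawl_delay() returns None when no matching rule sets a delay.
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for agent in ("msnbot", "Slurp", "googlebot"):
        print(agent, crawl_delay_for(ROBOTS_TXT, agent))
```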

<p>You could technically use this command with a user agent of "*" as well - the robots.txt exclusion standard instructs robots to just ignore commands they do not understand. However, if a robot sees something it does not understand in a robots.txt file, it may just not index your site. If using the "Crawl-Delay" command, you would be wiser to specify the user agents it should apply to.</p>

<h3>List of User-Agent Names</h3>

<ul><li>Google: "googlebot"</li><li>Google's Image Search: "Googlebot-Image"</li><li>MSN: "msnbot"</li><li>Inktomi: "Slurp"</li><li>AllTheWeb: "fast"</li><li>AskJeeves: "teomaagent1" or "directhit"</li><li>Lycos: "lycos"</li></ul> <br><br>]]></description>
				<pubDate>Mon, 19 Jul 2004 12:09:17 +0100</pubDate>
				<guid isPermaLink="false">http://www.addedbytes.com/articles/online-marketing/robots-txt-file/</guid>
				<dc:creator>Dave Child</dc:creator>
				<a href="/feeds/tag-feed/?tags=google&amp;start=0" class="ditto_tag" rel="tag">google</a>,<a href="/feeds/tag-feed/?tags=html&amp;start=0" class="ditto_tag" rel="tag">html</a>,<a href="/feeds/tag-feed/?tags=online+marketing&amp;start=0" class="ditto_tag" rel="tag">online marketing</a>,<a href="/feeds/tag-feed/?tags=robots&amp;start=0" class="ditto_tag" rel="tag">robots</a>,<a href="/feeds/tag-feed/?tags=robots.txt&amp;start=0" class="ditto_tag" rel="tag">robots.txt</a>,<a href="/feeds/tag-feed/?tags=seo&amp;start=0" class="ditto_tag" rel="tag">seo</a>
			</item>
	</channel>
</rss>