15 Jan, 2010  |  Written by Goginoo  |  under Robots.txt


What are Robots?


Search engines send spiders (robots) to crawl your site and index your web pages.
When a robot crawls your pages, it collects the information on them and adds it to the search engine's index. That is how your pages come to appear in search results.

What is a robots.txt file?


You can think of it as a door for search engine robots: it tells them what to index and what not to index.

For example, you can let the robots of search engine X or Y index your web pages, or keep them out of a particular folder or file on your site.

Say you want to stop the robots of all search engines from indexing your cgi-bin folder: you can put a Disallow rule for it in the robots.txt file.
Or, if you want only Google's robots to index (or not index) your pages, you can do that with the robots.txt file too.

So the robots.txt file matters to search engines, and it is worth choosing which files and directories robots may index. That is better than letting robots index whatever they find on your site, which may cause a duplicate content issue.

Say you are running WordPress and you did not create a robots.txt file. Robots will index whatever they find, including unnecessary pages that you don't want to show up in search results, and that can lead to duplicate content. If you add a few rules to the robots.txt file, you can avoid much of that duplicate content, because you point the robots at the important files and directories. That may help your ranking, because only your important pages end up in the search engines.

How do I make a robots.txt file? How do I upload it to my site? And how do I write the right rules so that search engine spiders index the pages and folders I want?


First, create a new text file (in Notepad, for example) and name it robots, so that the full file name is robots.txt.

Important Note: the file must be named robots.txt, all in lowercase.

Open the new file and write “User-agent: *”. The User-agent line names the search engine robot that the rules below it are for, and the star (*) means that the rules apply to all search engine robots.

To stop robots from indexing a folder of your site, add a Disallow: line under User-agent: *

To keep robots out of the cgi-bin folder or an admin folder, type a slash (/) after Disallow:, then the folder name (cgi-bin in our case), and finally another slash (/).

It must look like this:

Disallow: /cgi-bin/

If you type Disallow: / with just the slash and no folder name, the rule covers every folder on your website. Robots will be told not to index any of it, which means your website will not appear in the search engines, so be careful not to make this mistake.

To disallow another folder, add another line: Disallow: /(Folder Name)/

It will look like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /(Folder Name)/
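
If you want to check rules like these before uploading them, one option is Python's standard urllib.robotparser module. The following is only a minimal sketch: the folder name "private" stands in for (Folder Name), and "AnyBot" and the test paths are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Rules from the example above; "private" is a placeholder folder name.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

# A single "Disallow: /" blocks the whole site for every robot.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(rules: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may crawl path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/cgi-bin/script.cgi", "/private/notes.html"):
        print(path, is_allowed(RULES, "AnyBot", path))
    print("/index.html with Disallow: /", is_allowed(BLOCK_ALL, "AnyBot", "/index.html"))
```

Note that this parser treats a blank line right after User-agent as the end of the group, so keep the Disallow lines directly under the User-agent line.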

If you want to stop robots from indexing a directory but still allow one web page inside it to be indexed, you can use this:

User-agent: *
Disallow: /(Folder Name)/
Allow: /(Folder Name)/webpage.htm

Note: don't add a slash (/) after a file URL. The trailing slash goes only after folders.

If you want only Google's robots to index your website, you can name Google's robot in the robots.txt file.
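
How an Allow line inside a disallowed directory is resolved depends on the crawler. Google and the robots.txt standard (RFC 9309) pick the longest matching path, with Allow winning a tie. The sketch below is my own simplified illustration of that rule, not any search engine's real code. It handles plain path prefixes only (no wildcards), and "private" is a placeholder directory name.

```python
from typing import List, Tuple

# Each rule is (directive, path prefix). "private" is a placeholder name.
RULES: List[Tuple[str, str]] = [
    ("disallow", "/private/"),
    ("allow", "/private/webpage.htm"),
]


def longest_match_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Longest matching prefix wins; on a tie Allow wins; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue  # this rule does not apply to the path
        is_allow = directive == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/private/webpage.htm", "/private/other.htm", "/index.html"):
        print(path, longest_match_allowed(RULES, path))
```

Simpler parsers that stop at the first matching line (Python's urllib.robotparser is one) would hit the Disallow line first, so for those the Allow line would have to come before the Disallow line.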

Like this:

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
Disallow: /cgi-bin/

Googlebot is the name of Google's robot. Here is what the rules mean:

User-agent: *
Disallow: /

This means that no search engine robot may index your website.

But…

User-agent: Googlebot
Allow: /
Disallow: /cgi-bin/

These rules mean that Googlebot may index all of your website's pages except the cgi-bin folder. A robot follows the group that names it and ignores the * group.

So all other robots stay out of your site, and only Google indexes it.
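
To see the per-robot groups in action, here is a small sketch using Python's standard urllib.robotparser. The test paths are made up. One assumption to flag: the Disallow: /cgi-bin/ line is placed before Allow: / here because this particular parser applies the first rule that matches. Crawlers that follow the longest-match rule treat both orders the same.

```python
from urllib.robotparser import RobotFileParser

# Same rules as above, with the more specific line first for this parser.
RULES = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def can_crawl(user_agent: str, path: str) -> bool:
    """Return True if the named robot may crawl the path under RULES."""
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("Googlebot /index.html:", can_crawl("Googlebot", "/index.html"))
    print("Googlebot /cgi-bin/test.cgi:", can_crawl("Googlebot", "/cgi-bin/test.cgi"))
    print("Msnbot /index.html:", can_crawl("Msnbot", "/index.html"))
```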

Important Note: folder names are case-sensitive. If your folder is named “Go” with a capital G and you type “go” in lowercase, the rule will not match.

For example, if you type “Disallow: /go/” and your folder is named “Go”, robots will ignore the rule and continue indexing the “Go” folder.
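
The case-sensitivity point can be checked quickly with Python's standard urllib.robotparser. This is a sketch only; "AnyBot" and the page name are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# The rule is written in lowercase, but the real folder is "Go".
RULES = """\
User-agent: *
Disallow: /go/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def blocked(path: str) -> bool:
    """Return True if the path is disallowed for a generic robot."""
    return not parser.can_fetch("AnyBot", path)


if __name__ == "__main__":
    print("/go/page.html blocked:", blocked("/go/page.html"))
    print("/Go/page.html blocked:", blocked("/Go/page.html"))
```

The lowercase path is blocked while the capitalized one is not, so the rule has to use exactly the same letters as the folder name.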

Useful Note: you can point robots to your sitemap, which helps them index your website better and faster.

Open your robots.txt file and add a Sitemap line. It does not belong to any User-agent group, so it can go anywhere in the file:

Sitemap: http://www.goginoo.com/sitemap.xml (or your own sitemap URL)

It should look like this:

User-agent: *
Sitemap: http://www.goginoo.com/sitemap.xml
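
As a rough check that the Sitemap line is picked up, Python's standard urllib.robotparser can list the sitemaps it finds. This sketch assumes Python 3.8 or later, where the site_maps() method is available. The Disallow line is only there to make the sample file more realistic.

```python
from typing import List
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/

Sitemap: http://www.goginoo.com/sitemap.xml
"""


def sitemap_urls(rules: str) -> List[str]:
    """Return the sitemap URLs listed in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # site_maps() returns None when the file has no Sitemap line.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(sitemap_urls(RULES))
```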

What are the names of the Yahoo, Google, and Msn robots?

Google Robots: Googlebot

Google Adsense Robots: Mediapartners-Google

Google Image Search: Googlebot-Image

Yahoo Robots: Slurp

Yahoo Blog Search: Yahoo-Blogs

Yahoo Multimedia Search: Yahoo-MMAudVid

Msn Robots: Msnbot

Where do I upload the robots.txt file?

Upload it to the top-level folder of your site (root, www, home, or public_html, depending on your host), so that it is reachable at:

goginoo.com/robots.txt
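
Because robots look for the file only at the top level of the site, you can work out where it must live from any page address. A minimal sketch, where robots_url is a made-up helper name and the page address is an invented example:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that page_url belongs to."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.goginoo.com/blog/post.html?x=1"))
```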

Note: Don’t forget to add your XML sitemap URL to your robots.txt file.

Done.

15 Responses so far | Have Your Say!

  1. Netchunks Web Magazine  |  January 15th, 2010 at 1:34 pm #

    Robots.txt is a very useful tool if you do not want the search engine crawlers to index/crawl some particular pages of your site. In WordPress it is very easy to generate a robot.txt for your site. You have so many plugins but for other type of sites it is a little bit manual and hard :)

  2. Goginoo  |  January 15th, 2010 at 4:53 pm #

    yea you are right 100%, another types of sites its will be hard, but in wordpress there is a lot of duplicate content issues, so you have to tell the robots exactly what are the important pages and files to index.

    Thanks a lot for visiting us :)

  3. life insurance companies in california  |  January 23rd, 2010 at 5:00 am #

    The blog was absolutely fantastic! Lots of great information and inspiration, both of which we all need!

  4. Mitch Jobson  |  February 10th, 2010 at 4:25 pm #

    Hi, good day! Your article is incredibly uplifting. I never considered that it was feasible to do something like that until after I checked out your write-up. You definitely gave an excellent insight on exactly how this kind of whole scheme functions. I’ll always come back for more advice. Keep writing!

  5. Leigha Detterich  |  February 11th, 2010 at 10:03 pm #

    Simply commenting to say your article is striking. The clearness in your post is simply striking and i can take for given you are an expert on this topic. Well with your permission I will grab your rss feed to keep up to date with forthcoming posts. Thanks a billion and please keep up the accomplished work. Apologise my sad English. It is not my mother language.

  6. Goginoo  |  February 12th, 2010 at 12:40 am #

    Thanks my friend it’s really nice to hear that there are people likes my articles :)

  7. Goginoo  |  February 12th, 2010 at 12:40 am #

    Thank you very much! You are welcome dude :)

  8. Edward Mccollam  |  February 12th, 2010 at 1:26 am #

    Great article. There’s a lot of good info here, though I did want to let you know something – I am running Ubuntu with the latest beta of Firefox, and the layout of your blog is kind of flaky for me. I can read the articles, but the navigation doesn’t work so well.

  9. Goginoo  |  February 12th, 2010 at 2:01 am #

    Welcome my friend, i tested my site in the most browsers including ubuntu i didn’t find any problem!

    Please can you provide for me a screenshot for the problem?

  10. Eddie Bansag  |  February 12th, 2010 at 11:17 am #

    Great article! Really enjoyed reading.

  11. wp-popular.com » Blog Archive » Do you what is the robots.txt?  |  February 19th, 2010 at 2:21 am #

    [...] the article here: Do you what is the robots.txt? Tags: engine, rank, robots, [...]

  12. Yoshiko Manasares  |  February 19th, 2010 at 6:53 pm #

    Very good Posting, you give detail information and give explain about how to make money online, I think information like this very useful for beginner like me..

  13. Goginoo  |  February 19th, 2010 at 9:04 pm #

    Thanks you my friend ;)

  14. Viviana Chung  |  February 22nd, 2010 at 6:04 pm #

    Heya i got to your site by mistake when i was searching bing for something off topic here but i do have say your site is really helpful, like the theme and the content on here…so thanks for me procrastinating from my previous task, lol

  15. Real Estate Leads  |  March 29th, 2010 at 1:57 am #

    I’m not big on commenting, but nice post. I always found good quality information from this site! I will keep visiting this blog very often.
