Posted: Nov 17, 04, 16:36 techAdmin
You've heard of robots.txt but maybe aren't clear on how it works. First, read the robots.txt specification.
It's pretty simple really; there's not much to it. All you can do with robots.txt is disallow files, folders, or robots. By slightly twisting the syntax you can also explicitly allow only certain robots.
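To make the "allow only certain robots" twist concrete, here is a minimal sketch that checks such a file with Python's standard urllib.robotparser module. The bot names goodbot and otherbot and the example.com URL are made up for illustration, and real crawlers may treat edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "goodbot" may crawl everything, every other bot nothing.
ALLOW_ONE_BOT = """\
User-agent: goodbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Print the verdict for the named bot and for an arbitrary other bot.
    for bot in ("goodbot", "otherbot"):
        print(bot, allowed(ALLOW_ONE_BOT, bot, "http://example.com/page.html"))
```

The trick is simply an empty Disallow: for the bot you want, followed by a catch-all section that disallows everything.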
The filename must be all lower case: robots.txt, not Robots.txt. It must be placed in the root directory of your website. You can put a robots.txt file in any other directory, but it will be ignored. It is a plain text file, so any text editor can be used to create it.
The root directory is where your homepage usually lives.
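Since bots only ever look in the root, you can work out where a site's robots.txt lives from any page URL. A small sketch, assuming a hypothetical helper name robots_url and an example.com address:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap in /robots.txt, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://example.com/forums/viewtopic.php?t=1"))
```

Whatever folder the page is in, the answer is always the same file at the top of the site.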
User-agent: * means all user agents. In the original specification, this is the only place a wildcard-type character is allowed.
User-agent: followed by a bot's name (for example, User-agent: Turnitinbot) means the rules that follow apply only to that particular bot.
Disallow: / disallows your entire website.
Disallow: with nothing after it allows full access to your entire website. You generally don't need to do that, though: access is assumed unless you have explicitly disallowed a file or folder in robots.txt. Basically you are telling the bot to disallow nothing, in other words allow everything.
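The user-agent and whole-site rules above can be checked with a quick sketch using Python's standard urllib.robotparser. The bot names badbot and anybot and the example.com URL are placeholders, not anything from a real site.

```python
from urllib.robotparser import RobotFileParser

# Three tiny robots.txt files, one per case described above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"           # nobody may crawl anything
ALLOW_ALL = "User-agent: *\nDisallow:\n"             # empty Disallow: allow everything
BLOCK_ONE_BOT = "User-agent: badbot\nDisallow: /\n"  # only "badbot" is shut out


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/index.html"
    print("block all:", allowed(BLOCK_ALL, "anybot", page))
    print("allow all:", allowed(ALLOW_ALL, "anybot", page))
    print("badbot   :", allowed(BLOCK_ONE_BOT, "badbot", page))
    print("anybot   :", allowed(BLOCK_ONE_BOT, "anybot", page))
```

Note the last case: a bot with no section of its own and no User-agent: * section to fall back on is allowed by default.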
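Disallow: followed by the full path to a single file (for instance a hypothetical /folder1/file1.html) disallows only that file in that folder. Under User-agent: * it applies to all bots.
Disallow: /folder1/ disallows all files in folder1.
Disallow: /folder1 (no trailing slash) disallows all files in folder1, as well as any file or folder beginning with the characters 'folder1', for example /folder1.html, /folder1b/file2.html, etc. Because the Disallow: value is matched as a prefix, this syntax is how you handle wildcard-style file/folder exclusions.
Disallow: /forums/profile disallows all files beginning with 'profile' in the 'forums' folder.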
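The difference between the folder and prefix forms is easy to get wrong, so here is a rough sketch that tests a few paths with Python's standard urllib.robotparser. The helper name blocked, the bot name anybot, the example.com host, and the .php file names are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked(disallow_value, path):
    """Return True if a lone 'Disallow: <disallow_value>' rule blocks `path`."""
    robots_txt = f"User-agent: *\nDisallow: {disallow_value}\n"
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("anybot", "http://example.com" + path)


if __name__ == "__main__":
    checks = [
        ("/folder1/", "/folder1/file1.html"),   # inside the folder
        ("/folder1/", "/folder1.html"),         # only shares the prefix
        ("/folder1", "/folder1.html"),          # prefix form catches it
        ("/folder1", "/folder1b/file2.html"),   # and sibling folders too
        ("/forums/profile", "/forums/profile.php"),
        ("/forums/profile", "/forums/viewtopic.php"),
    ]
    for rule, path in checks:
        print(f"Disallow: {rule:<16} {path:<24} blocked={blocked(rule, path)}")
```

With the trailing slash only the folder's contents match; without it, anything whose path starts with those characters matches.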
You can list as many Disallow: lines as you need, and have multiple User-agent: sections, each with its own rules. Here is a sample robots.txt file, designed to block search engines from indexing irrelevant forum links:
Turnitinbot in this case respects robots.txt, but we don't want it on our site.
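For a file of this general shape, a sketch like the following shows how the sections combine. The forum paths here are guesses at typical forum URLs and are not the poster's actual file; only /forums/profile and Turnitinbot come from the text above. It uses Python's standard urllib.robotparser, and anybot and example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample: forum paths are illustrative, not the original file.
SAMPLE = """\
User-agent: *
Disallow: /forums/profile
Disallow: /forums/search
Disallow: /forums/posting

User-agent: Turnitinbot
Disallow: /
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "http://example.com"
    print("anybot, topic page  :", allowed(SAMPLE, "anybot", site + "/forums/viewtopic.php"))
    print("anybot, profile page:", allowed(SAMPLE, "anybot", site + "/forums/profile.php"))
    print("Turnitinbot, home   :", allowed(SAMPLE, "Turnitinbot", site + "/index.html"))
```

Ordinary bots keep access to the topic pages and lose only the listed forum links, while the named bot's own section shuts it out of everything.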
And that's really all there is to it. It's probably just about the simplest standard out there to master.