Google is often said to contain everything: almost any piece of information a person can think of turns up in its index, good or bad. Some kinds of content, however, should be kept out of the web index, and this article covers the ways to do that:
- Confidential content on the server should be kept in a password-protected directory. Content that is not stored in a protected directory can end up on the web. Googlebot and other web spiders cannot access the contents of password-protected directories, which makes this the simplest and most effective way to keep them out. Users of the Apache web server can set up password protection for a directory by editing the .htaccess file.
- A noindex meta tag prevents a page from appearing in the search results. When Google sees a noindex meta tag on a page, it drops the page completely from its search results, even if other pages link to it.
- If a page carrying the noindex meta tag is already in the index, Google removes it the next time the page is crawled. Google has to be able to crawl the page to see the tag, so the page must not also be blocked in robots.txt. To have the page removed sooner, use the URL removal tool in Google Webmaster Tools. Other search engines may interpret the tag differently, so a page that is linked from elsewhere may still appear in their search results.
- To control access to files and directories on the server, use a robots.txt file. It acts as an electronic "No trespassing" sign: it tells Googlebot and other robots which files and directories on the server they should not access. The file has to sit at the root of the host, so you need access to the root to use it. If you do not have access to the root of the domain, you can restrict access page by page with the robots meta tag instead.
- One caveat is very important. Even if a robots.txt file blocks Googlebot and other crawlers from accessing content, that content can still find its way into the web index by other routes. For example, if other sites link to the page, its URL and other information, such as the anchor text of those links, may still appear in the search results. Some robots interpret robots.txt differently, and spammers and other troublemakers usually ignore it altogether. For these reasons, Google recommends password-protecting confidential information.
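The robots.txt and noindex rules described above can be checked with a short script. What follows is a minimal sketch in Python using only the standard library, not an official Google tool. The sample robots.txt content, the example.com URLs, and the helper names `is_allowed` and `has_noindex` are assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /private/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Records whether a robots (or googlebot) meta tag asks for noindex."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot"):
            directives = {d.strip() for d in content.split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if "noindex" in directives or "none" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```

With the sample rules, `is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/private/report.html")` is false, while a URL outside `/private/` is allowed. A check like this only tells you what a well-behaved crawler is asked to do. It does not protect the content itself, which is why password protection remains the recommendation for anything confidential.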
These are the ways in which you can keep search engines from indexing content on your website. I hope this article was helpful to you.