I removed a page from my website, but it's still in Google's index
If you've ever used Google's Webmaster Tools, you might have come across a few error messages indicating that a page could not be found on your website. This usually happens when a page existed on your site at the time Google indexed it, but has since been deleted.
If you like to completely remove pages after their usefulness has diminished, it makes sense to notify search engines that you are doing so.
Google gives us a method of doing so in the form of a META tag. Use the example below on any page you wish to be removed from Google's index after a certain date.
<meta name="GOOGLEBOT" content="unavailable_after: 01-Jan-2008 16:00:00 EST">
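If your blog software generates the expiry date for you, the tag can be built programmatically rather than typed by hand. Here is a minimal Python sketch. The helper name unavailable_after_tag is my own invention, and it simply assumes the date format shown in the example above, with the time zone abbreviation passed in as plain text.

```python
from datetime import datetime

# English month abbreviations, spelled out so the result does not depend on locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def unavailable_after_tag(expires: datetime, tz_abbrev: str = "EST") -> str:
    """Build the GOOGLEBOT meta tag for an expiry date (hypothetical helper)."""
    stamp = "{:02d}-{}-{:04d} {:02d}:{:02d}:{:02d} {}".format(
        expires.day, MONTHS[expires.month - 1], expires.year,
        expires.hour, expires.minute, expires.second, tz_abbrev)
    return '<meta name="GOOGLEBOT" content="unavailable_after: {}">'.format(stamp)


if __name__ == "__main__":
    print(unavailable_after_tag(datetime(2008, 1, 1, 16, 0, 0)))
```

Called with 1 January 2008 at 16:00, this reproduces the tag shown above.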
Blogs and Search Engine Optimization
It's a well known fact that search engines love blogs. They thrive on the consistently updated content. But is it enough to simply offer loads of content? Remember that search engines need to parse your pages to separate HTML tags from actual content. It therefore makes sense to be sure that your content is marked up properly to make it search engine friendly.
If you are using a hosted blog service, there isn't much you can do to modify the HTML to make it more search engine friendly. I recommend hosting your own blog and choosing a popular open-source blogging platform to install on your server. I personally use bBlog, but had to modify the code a little bit to make sure the proper HTML tags were being used.
For example, the first thing I noticed was that page titles were simply wrapped in a div or h2 tag. This does nothing to tell search engines that this is the title of the page. Page titles should be wrapped in h1 tags. Also be sure that the page title is reflected in the title tag in the HTML head.
The second improper tag use I discovered was the use of the h2 or h3 tag to wrap the date of the blog entry. While the date is important, it can hardly be considered a subtitle of the article, which is what the h2 and h3 tags should be used for.
Make sure to modify your blog software to make it as SEO friendly as your knowledge of SEO allows.
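One way to check whether a blog template follows these rules is to run a rendered page through a small script. The sketch below uses Python's standard html.parser module. The HeadingAudit class, the audit function and the sample page are all made up for illustration, and the check is deliberately simple: it counts h1 tags and tests whether the first h1 text also appears in the title tag.

```python
from html.parser import HTMLParser

TRACKED = ("title", "h1", "h2", "h3")


class HeadingAudit(HTMLParser):
    """Collect the text of title and heading tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = []       # list of [tag, text] pairs, in document order
        self._current = None  # tracked tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in TRACKED:
            self._current = tag
            self.found.append([tag, ""])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.found[-1][1] += data


def audit(page: str) -> dict:
    """Report how many h1 tags a page has and whether the title repeats the first one."""
    parser = HeadingAudit()
    parser.feed(page)
    parser.close()
    h1_texts = [text.strip() for tag, text in parser.found if tag == "h1"]
    title = " ".join(text.strip() for tag, text in parser.found if tag == "title")
    return {
        "h1_count": len(h1_texts),
        "title_includes_h1": bool(h1_texts) and h1_texts[0] in title,
    }


# Sample page invented for this example.
PAGE = """<html><head><title>Replace Electrical Outlets - My Blog</title></head>
<body><h1>Replace Electrical Outlets</h1><p class="date">01 Jan 2008</p>
<h2>Tools you will need</h2></body></html>"""

if __name__ == "__main__":
    print(audit(PAGE))
```

A page that wraps its title in a div or h2 would report an h1_count of 0, which is the situation described above.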
Don't neglect images in your search engine optimization strategy
If you've ever performed a Google Image search, you know what I'm talking about. If you issue a search for 'red apple', many of the images that are returned are named 'red-apple.jpg' or 'red_apple.gif'. It appears that Google makes use of the image's file name as the most important factor in indexing images.
So, if you have an image of a red apple, your HTML image tag should look like this:
<img src="red-apple.jpg" alt="red apple" />
Note the use of the hyphen (-) instead of an underscore (_) to separate the words red and apple. It has been said that Google prefers the hyphen to the underscore when parsing words in a file's name. The alt text should also describe the image appropriately. Finally, to keep with XHTML compliance, note the trailing slash before the tag closes.
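If you have a lot of images to rename, the naming rule is easy to automate. This is a rough Python sketch, not a finished tool. The function names image_file_name and image_tag are hypothetical, and the rule it applies (lowercase words joined by hyphens) is just my reading of the convention described above.

```python
import re
from html import escape


def image_file_name(description: str, extension: str = "jpg") -> str:
    """Turn a short description into a hyphen-separated file name."""
    # Keep only runs of letters and digits, so spaces, underscores and
    # punctuation all become word boundaries.
    words = re.findall(r"[a-z0-9]+", description.lower())
    return "-".join(words) + "." + extension


def image_tag(description: str, extension: str = "jpg") -> str:
    """Build an XHTML-style img tag whose alt text matches the file name."""
    return '<img src="{}" alt="{}" />'.format(
        image_file_name(description, extension),
        escape(description, quote=True))


if __name__ == "__main__":
    print(image_file_name("Red_Apple"))
    print(image_tag("red apple"))
```

Both 'Red_Apple' and 'red apple' end up as red-apple.jpg, so underscores in existing names are converted along the way.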
So, why would we want to optimize our images for a Google Image search? Well, try an image search on any term. Click on one of the results and notice that you are directed to the image itself in one frame and the website which hosts the image in another frame. And there you have it, a website is reached by a user from a Google Image search.
SEO friendly URLs and Object Databases
Most of us are aware that search engines have a much easier time digesting URLs that do not include query string parameters. Search engines, mainly Google, especially frown upon pages that use the ‘id=’ parameter to locate a specific record in a database for retrieval.
The big question is, how do we pull content from a database without identifying its primary key with the ‘id=’ query string parameter? The answer starts with your web server and ends with an object database.
I found it particularly important to provide search engine friendly URLs for one site I was building and decided to do a little research. I wrote a small tutorial on configuring the Apache web server. This configuration allowed me to define a script, article.php, where any path information in the URL following article would be provided to the script. For example, if the following URL were requested:
mysite.com/article/how-tos/home-repair/replace-electrical-outlets
The ‘/how-tos/home-repair/replace-electrical-outlets’ would be provided to my script, in the PATH_INFO server variable. It then becomes a simple matter of parsing the path into an array, or any other structure useful to you.
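My script was written in PHP, but the parsing step looks much the same in any language. Here is a minimal sketch in Python, assuming the PATH_INFO value has already been read from the server environment. The function name split_path_info is made up for this example.

```python
def split_path_info(path_info: str) -> list:
    """Split a PATH_INFO string into its non-empty path segments."""
    # Dropping empty strings takes care of the leading slash, a trailing
    # slash, and any accidental double slashes.
    return [segment for segment in path_info.split("/") if segment]


if __name__ == "__main__":
    print(split_path_info("/how-tos/home-repair/replace-electrical-outlets"))
    # ['how-tos', 'home-repair', 'replace-electrical-outlets']
```

The last element of the list is the name of the thing being requested, and the elements before it describe where it lives in the hierarchy.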
This is where the next problem came into play. I had many different tables in my database, each referring to a specific type of object. Let’s say I had a table for categories, articles and reviews. If I were to use the path above, how would my code know which table / object ‘replace-electrical-outlets’ refers to? Is it a category, article, or a review?
I looked to the object database for a solution to my problem. After becoming handy with the Zope object database, I decided to create my own. Not knowing anything about the concepts and design behind them, I came up with something that worked for my purposes.
Basically, I have one “master” table in my database called ‘object’. This table contains:
- id
- parent_id
- user_id
- object_type
- local_id
- title
There are also a few other fields not relevant to this article. Id, being the primary key, basically gave each object a globally unique identifier, so even two objects of different types would never have the same id. Parent_id allows our hierarchical relationship between objects, where an ‘article’ can have a ‘category’ parent, or even a ‘category’ can have another ‘category’ parent.
User_id specifies which user created this object. Object_type simply indicates which type of object this is, in my case, a ‘category’, ‘article’, or ‘review’. Local_id is the name you see in the URL. In the case of the example article, local_id would be ‘replace-electrical-outlets’.
And finally, title specifies the human readable title, in this case, ‘Replace Electrical Outlets’.
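To make the idea concrete, here is a rough sketch of the ‘object’ table and a lookup that walks a URL path one segment at a time. It uses Python and an in-memory SQLite database purely for illustration. The column types, the sample rows and the function names build_demo_db and resolve are my assumptions here, not the code running on my site.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory 'object' table with a few sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE object (
            id          INTEGER PRIMARY KEY,
            parent_id   INTEGER REFERENCES object(id),
            user_id     INTEGER,
            object_type TEXT NOT NULL,
            local_id    TEXT NOT NULL,
            title       TEXT NOT NULL
        )""")
    db.executemany("INSERT INTO object VALUES (?, ?, ?, ?, ?, ?)", [
        (1, None, 1, "category", "how-tos", "How-Tos"),
        (2, 1, 1, "category", "home-repair", "Home Repair"),
        (3, 2, 1, "article", "replace-electrical-outlets",
         "Replace Electrical Outlets"),
    ])
    return db


def resolve(db: sqlite3.Connection, segments: list):
    """Follow parent_id links down the path; return (id, object_type, title) or None."""
    parent_id = None  # top-level objects have no parent
    row = None
    for local_id in segments:
        # SQLite's IS operator matches NULL as well as ordinary values,
        # so the same query works for the first segment and the rest.
        row = db.execute(
            "SELECT id, object_type, title FROM object "
            "WHERE local_id = ? AND parent_id IS ?",
            (local_id, parent_id)).fetchone()
        if row is None:
            return None  # no such object under this parent
        parent_id = row[0]
    return row


if __name__ == "__main__":
    db = build_demo_db()
    print(resolve(db, ["how-tos", "home-repair", "replace-electrical-outlets"]))
```

Because every segment is looked up in the same table, the code never has to guess which type of object a name refers to. The object_type column of the final row answers that question.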
In some cases, certain objects had other attributes. Reviews, for example, needed to have a ranking from 1 to 5, so an integer field was necessary. Since not all objects needed this extra field, I was reluctant to add it to the ‘object’ table. Instead, I created a completely new table called ‘review’. The review table contained the following fields:
- object_id
- ranking
The object_id is simply a one-to-one relationship with the ‘object’ table’s id. Now, when a ‘review’ record is identified in the object table, we use the ‘object_type’ field to specify that this object is a ‘review’ record and a second query needs to be called on the ‘review’ table to retrieve the extra information.
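The two-step lookup can be sketched as follows, again using Python and an in-memory SQLite database only as a stand-in. The sample review row, the column types and the load_object function are hypothetical.

```python
import sqlite3


def build_review_db() -> sqlite3.Connection:
    """Create 'object' and 'review' tables holding one sample review."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE object (
            id INTEGER PRIMARY KEY, parent_id INTEGER, user_id INTEGER,
            object_type TEXT, local_id TEXT, title TEXT
        );
        CREATE TABLE review (
            object_id INTEGER PRIMARY KEY REFERENCES object(id),
            ranking   INTEGER CHECK (ranking BETWEEN 1 AND 5)
        );
        INSERT INTO object VALUES (4, NULL, 1, 'review', 'cordless-drill', 'Cordless Drill');
        INSERT INTO object VALUES (5, NULL, 1, 'category', 'tools', 'Tools');
        INSERT INTO review VALUES (4, 5);
    """)
    return db


def load_object(db: sqlite3.Connection, object_id: int):
    """Load the common fields, then the extra fields if the type has any."""
    row = db.execute(
        "SELECT id, object_type, title FROM object WHERE id = ?",
        (object_id,)).fetchone()
    if row is None:
        return None
    record = {"id": row[0], "object_type": row[1], "title": row[2]}
    if record["object_type"] == "review":
        # Second query: only reviews have a row in the 'review' table.
        extra = db.execute(
            "SELECT ranking FROM review WHERE object_id = ?",
            (object_id,)).fetchone()
        if extra is not None:
            record["ranking"] = extra[0]
    return record


if __name__ == "__main__":
    db = build_review_db()
    print(load_object(db, 4))
    print(load_object(db, 5))
```

A single JOIN would avoid the second round trip, but keeping the queries separate mirrors the approach described above, where the second query only runs for types that need it.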
So far this approach has been working for me. I’m sure there are more efficient ways, but for my small site, this seemed to make the most sense.