Developing an SEO-Friendly Website For WordPress And Other Websites Part-2

Keyword Targeting

The search engines face a tough task; based on a few words in a query, sometimes only one, they must return a list of relevant results, order them by measures of importance, and hope that the searcher finds what he is seeking. As website creators and web content publishers, you can make this process massively simpler for the search engines and, in turn, benefit from the enormous traffic they send by employing the same terms users search for in prominent positions on your pages. This practice has long been a critical part of search engine optimization, and although other metrics (such as links) have a great deal of value in the search rankings, keyword usage is still at the core of targeting search traffic. The first step in the keyword targeting process is uncovering popular terms and phrases that searchers regularly use to find the content, products, or services your site offers. There’s an art and science to this process, but it consistently begins with a list of keywords to target. Once you have that list, you’ll need to include these in your pages. In the early days of SEO, the process involved stuffing keywords repetitively into every HTML tag possible. Now, keyword relevance is much more aligned with the usability of a page from a human perspective. Since links and other factors make up a significant portion of the search engines’ algorithms, they no longer rank pages with 61 instances of “free credit report” above pages that contain only 60. In fact, keyword stuffing, as it is known in the SEO world, can actually get your pages devalued via search engine penalties. The engines don’t like to be manipulated, and they recognize keyword stuffing as a disingenuous tactic. Figure 6-18 shows an example of a page utilizing accurate keyword targeting.

FIGURE 6-18. Title and heading tags: powerful for SEO

Keyword usage includes creating titles, headlines, and content designed to appeal to searchers in the results (and entice clicks), as well as building relevance for search engines to improve your rankings. Building a search-friendly site requires that the keywords searchers use to find content are prominently employed. Here are some of the more prominent places where a publisher can place those keywords.

Title Tags

For keyword placement, title tags are the most critical element for search engine relevance. The title tag is in the <head> section of an HTML document, and is the only piece of “meta” information about a page that influences relevancy and ranking. To give you an idea, when 72 well-known SEOs voted on what they believed were the most important ranking factors in Google’s algorithm, the majority of them said that a page’s title tag was the most important attribute (see SEOmoz’s search engine ranking factors study at http://www.seomoz.org/article/search-ranking-factors for more information). The following eight rules represent best practices for title tag construction. Do keep in mind, however, that a title tag for any given page must directly correspond to that page’s content. You may have five different keyword categories and a unique site page (or section) dedicated to each, so be sure to align a page’s title tag content with its actual visible content as well.

Place your keywords at the beginning of the title tag

This provides the most search engine benefit, and thus, if you want to employ your brand name in the title tag, place it at the end. There is a tradeoff here between SEO benefit and branding benefit that you should think about and make explicitly. Major brands may want to place their brand at the start of the title tag as it may increase click-through rates. To decide which way to go you need to consider which need is greater for your business.

Limit length to 65 characters (including spaces)

Content in title tags after 65 characters is probably given less weight by the search engines. At a minimum, the title tag shown in the SERPs gets cut off at 65 characters. Watch this number carefully, though, as Google in particular is now supporting up to 70 characters in some cases.

Incorporate keyword phrases

This one may seem obvious, but it is critical to prominently include in your title tag whatever your keyword research shows as being the most valuable for capturing searches.

Target longer phrases if they are relevant

When choosing what keywords to include in a title tag, use as many as are completely relevant to the page at hand while remaining accurate and descriptive. Thus, it can be much more valuable to have a title tag such as “SkiDudes | Downhill Skiing Equipment & Accessories” rather than simply “SkiDudes | Skiing Equipment”; including those additional terms that are both relevant to the page and receive significant search traffic can bolster your page’s value. However, if you have separate landing pages for “skiing accessories” versus “skiing equipment,” don’t include one term in the other’s title. You’ll be cannibalizing your rankings by forcing the engines to choose which page on your site is more relevant for that phrase, and they might get it wrong. We will discuss the cannibalization issue in more detail shortly.

Use a divider

When splitting up the brand from the descriptive, options include | (a.k.a. the pipe), >, -, and :, all of which work well. You can also combine these where appropriate; for example, “Major Brand Name: Product Category – Product”. These characters do not bring an SEO benefit, but they can enhance the readability of your title.

Focus on click-through and conversion rates

The title tag is exceptionally similar to the title you might write for paid search ads, only it is harder to measure and improve because the stats aren’t provided for you as easily. However, if you target a market that is relatively stable in search volume week to week, you can do some testing with your title tags and improve the click-through. Watch your analytics and, if it makes sense, buy search ads on the page to test click-through and conversion rates of different ad text as well, even if it is for just a week or two. You can then look at those results and incorporate them into your titles, which can make a huge difference in the long run. A word of warning, though: don’t focus entirely on click-through rates. Remember to continue measuring conversion rates.

Target searcher intent

When writing titles for web pages, keep in mind the search terms your audience employed to reach your site. If the intent is browsing or research-based, a more descriptive title tag is appropriate. If you’re reasonably sure the intent is a purchase, download, or other action, make it clear in your title that this function can be performed at your site. Here is an example from http://shopper.cnet.com: “Buy Digital cameras – Best Digital camera prices - Shopper.com”.

Be consistent

Once you’ve determined a good formula for your pages in a given section or area of your site, stick to that regimen. You’ll find that as you become a trusted and successful “brand” in the SERPs, users will seek out your pages on a subject area and have expectations that you’ll want to fulfill.
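The placement, divider, and length rules above lend themselves to a small automated check. The following is a minimal sketch in Python, not a definitive tool: the function names build_title and title_warnings are hypothetical, and the 65-character threshold is simply the display limit quoted earlier, which the engines may change.

```python
MAX_TITLE_LENGTH = 65  # display limit quoted in the text; engines may vary


def build_title(keyword_phrase: str, brand: str, divider: str = " | ") -> str:
    """Put the keyword phrase first and the brand last, joined by a divider."""
    return f"{keyword_phrase}{divider}{brand}"


def title_warnings(title: str, max_length: int = MAX_TITLE_LENGTH) -> list[str]:
    """Return human-readable warnings for a candidate title tag."""
    warnings = []
    if len(title) > max_length:
        warnings.append(
            f"{len(title)} characters; text past {max_length} may be cut off"
        )
    return warnings


title = build_title("Downhill Skiing Equipment & Accessories", "SkiDudes")
print(title)
print(title_warnings(title))
```

A major brand that prefers to lead with its name can swap the two arguments; the length check applies either way.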

Meta Description Tags

Meta descriptions have three primary uses:

_ To describe the content of the page accurately and succinctly

_ To serve as a short text “advertisement” to click on your pages in the search results

_ To display targeted keywords, not for ranking purposes, but to indicate the content to searchers

Great meta descriptions, just like great ads, can be tough to write, but for keyword-targeted pages, particularly in competitive search results, they are a critical part of driving traffic from the engines through to your pages. Their importance is much greater for search terms where the intent of the searcher is unclear or different searchers might have different motivations. Here are seven good rules for meta descriptions:

Tell the truth

Always describe your content honestly. If it is not as “sexy” as you’d like, spice up your content; don’t bait and switch on searchers, or they’ll have a poor brand association.

Keep it succinct

Be wary of character limits: currently Google displays up to 160 characters, Yahoo! up to 165, and Bing up to 200+ (they’ll go to three vertical lines in some cases). Stick with the smallest (Google) and keep those descriptions at 160 characters (including spaces) or fewer.
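For sites that generate descriptions from longer copy, the limit can be enforced in code. This is a rough sketch, assuming the 160-character Google figure quoted above; fit_description is a hypothetical helper name, and cutting at a word boundary is one reasonable choice among several.

```python
GOOGLE_LIMIT = 160  # smallest of the display limits quoted in the text


def fit_description(text: str, limit: int = GOOGLE_LIMIT) -> str:
    """Trim a description to the limit, cutting at the last whole word."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= limit:
        return text
    # Take one extra character so a word ending exactly at the limit survives.
    cut = text[: limit + 1].rsplit(" ", 1)[0]
    if len(cut) > limit:  # a single unbroken run longer than the limit
        cut = cut[:limit]
    return cut.rstrip(" ,;:")


print(fit_description("Compare prices on downhill skis, boots, and poles. " * 5))
```

A trimmed description still needs a human read, since a cut in the middle of a thought can make for weak ad copy.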

Author ad-worthy copy

Write with as much sizzle as you can while staying descriptive, as the perfect meta description is like the perfect ad: compelling and informative.

Test, refine, rinse, and repeat

Just like an ad, you can test meta description performance in the search results, but it takes careful attention. You’ll need to buy the keyword through paid results (PPC ads) so that you know how many impressions critical keywords received over a given time frame. Then you can use analytics to see how many clicks you got on those keywords and calculate your click-through rate.
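The click-through calculation itself is simple division of clicks by impressions. As an illustration only, the sketch below uses invented numbers (42 clicks on 1,200 impressions); the function name is hypothetical.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Clicks divided by impressions for one keyword over one time frame."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Invented figures: impressions from the PPC report, clicks from analytics.
ctr = click_through_rate(clicks=42, impressions=1200)
print(f"{ctr:.1%}")
```

Comparing this figure before and after a description change only means something if search volume for the keyword stayed roughly stable over both periods.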

Analyze psychology

Unlike an ad, the motivation for a natural-search click is frequently very different from that of users clicking on paid results. Users clicking on PPC ads may be very directly focused on making a purchase, and people who click on a natural result may be more interested in research or learning about the company. Don’t assume that successful PPC ad text will make for a good meta description (or the reverse).

Include relevant keywords

It is extremely important to have your keywords in the meta description tag; the boldface that the engines apply can make a big difference in visibility and click-through rate. In addition, if the user’s search term is not in the meta description, chances are reduced that the meta description will be used as the description in the SERPs.

Don’t employ descriptions universally

You shouldn’t always write a meta description. Conventional logic may hold that it is usually wiser to write a good meta description yourself to maximize your chances of it being used in the SERPs, rather than let the engines build one out of your page content; however, this isn’t always the case. If the page is targeting one to three heavily searched terms/phrases, go with a meta description that hits those users performing that search. However, if you’re targeting longer tail traffic with hundreds of articles or blog entries or even a huge product catalog, it can sometimes be wiser to let the engines themselves extract the relevant text. The reason is simple: when engines pull, they always display the keywords (and surrounding phrases) that the user searched for. If you try to force a meta description, you can detract from the relevance that the engines make naturally. In some cases, they’ll overrule your meta description anyway, but since you can’t consistently rely on this behavior, opting out of meta descriptions is OK (and for massive sites, it can save hundreds or thousands of man-hours).

Heading (H1, H2, H3) Tags

The H(x) tags in HTML (H1, H2, H3, etc.) are designed to indicate a headline hierarchy in a document. Thus, an H1 tag might be considered the headline of the page as a whole, whereas H2 tags would serve as subheadings, H3s as tertiary-level headlines, and so forth. The search engines have shown a slight preference for keywords appearing in heading tags, notably the H1 tag (which is the most important of these to employ). In some cases, you can use the title tag of a page, containing the important keywords, as the H1 tag. However, if you have a longer title tag, you may want to use a more focused, shorter heading tag using the most important keywords from the title tag. When a searcher clicks a result at the engines, reinforcing the search term he just typed in with the prominent headline helps to indicate that he has arrived on the right page with the same content he sought. Many publishers assume that what makes the H1 a stronger signal is the size at which it is displayed. For the most part, the styling of your heading tags is not a factor in the SEO weight of the heading tag. You can style the tag however you want, as shown in Figure 6-19, provided that you don’t go to extremes (e.g., make it too small to read).

FIGURE 6-19. Headings styled to match the site

Document Text

The HTML text on a page was once the center of keyword optimization activities. Metrics such as keyword density and keyword saturation were used to measure the perfect level of keyword usage on a page. To the search engines, however, text in a document, particularly the frequency with which a particular term or phrase is used, has very little impact on how happy a searcher will be with that page. In fact, quite often a page laden with repetitive keywords attempting to please the engines will provide a very poor user experience; thus, although some SEO professionals today do claim to use term weight (a mathematical equation grounded in the real science of information retrieval) or other, more “modern” keyword text usage methods, nearly all optimization can be done very simply. The best way to ensure that you’ve achieved the greatest level of targeting in your text for a particular term or phrase is to use it in the title tag, in one or more of the section headings (within reason), and in the copy on the web page. Equally important is to use other related phrases within the body copy to reinforce the context and relevance of your main phrase for the page. Although it is possible that implementing more instances of the key phrase on the page may result in some increase in ranking, this is increasingly unlikely to have an impact as you add more instances of the phrase. In addition, it can ruin the readability of some documents, which could hurt your ability to garner links to your site. Furthermore, testing has shown that document text keyword usage is such a small factor with the major engines that even one link of very low quality is enough to outweigh a page with perfect keyword optimization versus one that simply includes the targeted phrase naturally on the page (two to 10 times, depending on length).

This doesn’t mean keyword placement on pages is useless (you should always strive to include the keyword you’re targeting at least a few times, depending on document length), but it does mean that aiming for “perfect” optimization on every page for every term is overkill and largely unnecessary.

Image Filenames and Alt Attributes

Incorporation of images on web pages can substantively enrich the user experience. However, the search engines cannot read the images directly. There are two elements that you can control to give the engines context for images:

The filename

Search engines look at the image filename to see whether it provides any clues to the content of the image. Don’t name your image example.com/img4137a-b12.jpg, as it tells the search engine nothing at all about the image, and you are passing up the opportunity to include keyword-rich text. If it is a picture of Abe Lincoln, name the file abe-lincoln.jpg and/or have the SRC URL string contain it, as in example.com/abe-lincoln/portrait.jpg.

Image alt text

Image tags in HTML permit you to specify an attribute known as the alt attribute. This is a place where you can provide more information about what is in the image, and again where you can use your targeted keywords. Here is an example for the picture of Abe Lincoln:

<img alt="Abe Lincoln photo" src="http://example.com/abe-lincoln.jpg">

Use the quotes if you have spaces in the text string of the alt content! Sites that have invalid img tags frequently lump a few words without quotes into the img tag, intended for the alt content, but with no quotes, all terms after the first word will be lost. This usage of the image filename and of the alt attribute permits you to reinforce the major keyword themes of the page. This is particularly useful if you want to rank in image search. Make sure the filename and the alt text reflect the content of the picture, and do not artificially emphasize keywords unrelated to the image (even if they are related to the page). Although the alt attribute and the image filename are helpful, you should not use image links as a substitute for text links with rich anchor text that carries much more weight from an SEO perspective. Presumably, your picture will relate very closely to the content of the page, and using the image filename and the alt text will help reinforce the page’s overall theme.
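The effect of missing quotes is easy to demonstrate with an HTML parser. The sketch below uses Python’s standard html.parser module as a stand-in; how any given search engine tokenizes markup is an assumption here, and the ImgAttrs, parse_img, and img_tag names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class ImgAttrs(HTMLParser):
    """Collect the attributes of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def parse_img(html: str) -> dict:
    """Return the attribute dictionary of the first <img> tag in html."""
    parser = ImgAttrs()
    parser.feed(html)
    parser.close()
    return parser.images[0]


def img_tag(src: str, alt: str) -> str:
    """Build an <img> tag with both attribute values quoted and escaped."""
    return f'<img alt="{escape(alt, quote=True)}" src="{escape(src, quote=True)}">'


unquoted = parse_img("<img alt=Abe Lincoln photo src=abe-lincoln.jpg>")
quoted = parse_img(img_tag("abe-lincoln.jpg", "Abe Lincoln photo"))
print(unquoted["alt"])  # only the first word survives as the alt value
print(quoted["alt"])    # the full phrase is kept
```

In the unquoted case the remaining words are read as separate, valueless attributes, so they never reach the alt text at all.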

Boldface Text

Some SEO professionals who engage in considerable on-page optimization testing have noticed that, all else being equal, a page that employs the targeted keyword(s) in <b> or <strong> tags (HTML elements that boldface text visually) outranks its unbolded counterparts. Thus, although this is undoubtedly a very small factor in modern SEO, it may be worth leveraging, particularly for those looking to eke every last bit of optimization out of keyword usage.

Avoiding Keyword Cannibalization

As we discussed earlier, you should not use common keywords across multiple page titles. This advice applies to more than just the title tags. One of the nastier problems that often crops up during the course of a website’s information architecture, keyword cannibalization refers to a site’s targeting of popular keyword search phrases on multiple pages, forcing the engines to pick which one is most relevant. In essence, a site employing cannibalization competes with itself for rankings and dilutes the ranking power of internal anchor text, external links, and keyword relevancy. Avoiding cannibalization requires strict site architecture with attention to detail. Plot out your most important terms on a visual flowchart (or in a spreadsheet file, if you prefer), and pay careful attention to what search terms each page is targeting. Note that when pages feature two-, three-, or four-word phrases that contain the target search phrase of another page, linking back to that page within the content with the appropriate anchor text will avoid the cannibalization issue. For example, if you had a page targeting “mortgages” and another page targeting “low-interest mortgages,” you would link back to the “mortgages” page from the “low-interest mortgages” page using the anchor text “mortgages” (see Figure 6-20). You can do this in the breadcrumb or in the body copy. The New York Times (http://www.nytimes.com) does the latter, where keywords in the body copy link to the related resource page on the site.
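If the keyword plan lives in a spreadsheet, overlaps can be flagged mechanically. This is a minimal sketch assuming the plan has been exported as a mapping from page URL to target phrases; the find_cannibalized name and the sample URLs are hypothetical.

```python
from collections import defaultdict


def find_cannibalized(page_targets: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each phrase targeted by more than one page, with those pages."""
    pages_by_phrase = defaultdict(list)
    for url, phrases in page_targets.items():
        for phrase in phrases:
            key = phrase.strip().lower()  # treat "Mortgages" and "mortgages" alike
            if url not in pages_by_phrase[key]:
                pages_by_phrase[key].append(url)
    return {p: urls for p, urls in pages_by_phrase.items() if len(urls) > 1}


# Hypothetical keyword map, as it might be exported from a planning spreadsheet.
site = {
    "/mortgages": ["mortgages"],
    "/low-interest-mortgages": ["low-interest mortgages", "Mortgages"],
    "/refinancing": ["mortgage refinancing"],
}
print(find_cannibalized(site))
```

Here the check reports that both the “mortgages” page and the “low-interest mortgages” page claim the phrase “mortgages”; the fix described above is to drop it from the second page’s targets and link back to the first with that anchor text.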

FIGURE 6-20. Adding lots of value with relevant cross-links

Keyword Targeting in CMSs and Automatically Generated Content

Large-scale publishing systems, or those that produce automatically generated content, present some unique challenges. If hundreds of pages are being created every day, it is not feasible to do independent keyword research on each and every page, making page optimization an interesting challenge. In these scenarios, the focus turns to methods/recipes for generating unique titles, <h1> headings, and page content for each page. It is critical to educate the writers on ways to implement titles and headings that capture unique, key aspects of the articles’ content. More advanced teams can go further with this and train their writing staff on the use of keyword research tools to further optimize this process. In the case of automatically generated material (such as that produced from algorithms that mine data from larger textual bodies), the key is to automate means for extracting a short (fewer than 70 characters) description of the article and making it unique from other titles generated elsewhere on the site and on the Web at large.
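One possible recipe for the automated case is sketched below: cut the source text at a word boundary to stay under 70 characters, then break ties between identical results. The short_title and unique_titles names are hypothetical, and the numeric suffix is only a crude placeholder for real differentiation (it checks uniqueness within one batch, not across the Web at large).

```python
def short_title(text: str, limit: int = 70) -> str:
    """Cut text to fewer than `limit` characters at a word boundary."""
    words, length = [], 0
    for word in text.split():
        extra = len(word) + (1 if words else 0)  # +1 for the joining space
        if length + extra >= limit:
            break
        words.append(word)
        length += extra
    return " ".join(words)


def unique_titles(articles: list[str], limit: int = 70) -> list[str]:
    """Generate one title per article, adding a counter to break ties."""
    seen: dict[str, int] = {}
    titles = []
    for text in articles:
        base = short_title(text, limit)
        count = seen.get(base, 0) + 1
        seen[base] = count
        if count == 1:
            titles.append(base)
        else:
            suffix = f" ({count})"
            # Re-cut so the suffix still fits under the limit.
            titles.append(short_title(text, limit - len(suffix)) + suffix)
    return titles


print(unique_titles(["Storm closes roads in the valley"] * 2))
```

In practice a better tie-breaker would draw on something distinctive in the article itself, such as a date, place, or named entity.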

SEO Copywriting: Encouraging Effective Keyword Targeting by Content Creators

Very frequently, someone other than an SEO professional is responsible for content creation. Content creators often do not have an innate knowledge as to why keyword targeting is important, and therefore, training for effective keyword targeting is a critical activity. This is particularly important when dealing with large websites and large teams of writers. Here are the main components of SEO copywriting that your writers must understand:

_ Search engines look to match up a user’s search queries with the keyword phrases on your web pages. If the search phrases do not appear on the page, chances are good that your page will never achieve significant ranking for that search phrase.

_ The search phrases users may choose to use when looking for something are infinite in variety, but certain phrases will be used much more frequently than others.

_ Using the more popular phrases you wish to target on a web page in the content for that page is essential to SEO success for that page.

_ The title tag is the most important element on the page. Next up is the first header (H1), and then the main body of the content.

_ Tools exist (as outlined in Chapter 5) that allow you to research and determine what the most interesting phrases are.

If you can get these five points across, you are well on your way to empowering your content creators to perform solid SEO. The next key element is training them on how to pick the right keywords to use. The most important factor to reiterate to the content creator is that content quality and user experience still come first. Then, by intelligently making sure the right keyphrases are properly used throughout the content, they can help bring search engine traffic to your site. Reverse these priorities and you can end up with keyword stuffing or other spam issues.

The small-volume search terms, when tallied up, represent 70% of all search traffic, and the more obvious, high-volume terms represent only 30% of the overall search traffic. For example, if you run a site targeting searches for new york pizza and new york pizza delivery, you might be surprised to find that hundreds of single searches each day for terms such as pizza delivery on the corner of 57th & 7th, or Manhattan’s tastiest Italian-style sausage pizza, when taken together, will actually provide considerably more traffic than the popular phrases you’ve researched. This concept is called the long tail of search. Targeting the long tail is another aspect of SEO that combines art and science. In Figure 6-21, you may not want to implement entire web pages for a history of pizza dough, pizza with white anchovies, or Croatian pizza.

FIGURE 6-21. Example of the long tail search curve
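To see how the head and tail of the curve compare on your own site, you can tally query counts from your analytics. The sketch below is illustrative only: the counts are invented so that they mirror the 70/30 split mentioned above, and long_tail_share is a hypothetical helper.

```python
def long_tail_share(query_counts: dict[str, int], head_size: int) -> float:
    """Share of all searches that come from outside the top `head_size` queries."""
    counts = sorted(query_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[head_size:]) / total


# Invented data: two popular phrases plus 700 queries seen once each.
queries = {"new york pizza": 200, "new york pizza delivery": 100}
queries.update({f"rare pizza query {i}": 1 for i in range(700)})

print(f"{long_tail_share(queries, head_size=2):.0%} of searches are long tail")
```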

Finding scalable ways to chase long tail keywords is a complex topic. Perhaps you have a page for ordering pizza in New York City, and you have a good title and H1 header on the page (e.g., “New York City Pizza: Order Here”), as well as a phone number and a form for ordering the pizza, and no other content. If that is all you have, that page is not competing effectively for rankings on long tail search terms. To fix this, you need to write additional content for the page. Ideally, this would be content that talks about the types of pizza that are popular in New York City, the ingredients used, and other things that might draw in long tail search traffic. If you have a page for San Jose pizza, the picture gets even more complicated. You don’t really want your content on the San Jose page to be the same as it is on the New York City page. This runs into potential duplicate content problems, as we will outline in “Duplicate Content Issues” on page 226, or the keyword cannibalization issues we discussed earlier in this chapter. To maximize your success, find a way to generate different content for those two pages, ideally tuned to the specific needs of the audience that arrives at those pages. Perhaps the pizza preferences of the San Jose crowd are different from those in New York City. Of course, the geographic information is inherently different between the two locations, so driving directions from key locations might be a good thing to include on the page. If you have pizza parlors in 100 cities, this can get very complex indeed. The key here is to remain true to the diverse needs of your users, yet use your knowledge of the needs of search engines and searcher behavior to obtain that long tail traffic.

Content Optimization

Content optimization relates to how the presentation and architecture of the text, image, and multimedia content on a page can be optimized for search engines. Many of these recommendations are second-order effects. Having the right formatting or display won’t boost your rankings directly, but through it, you’re more likely to earn links, get clicks, and eventually benefit in search rankings. If you regularly practice the techniques in this section, you’ll earn better consideration from the engines and from the human activities on the Web that influence their algorithms.

Content Structure

Because SEO has become such a holistic part of website improvement, it is no surprise that content formatting (the presentation, style, and layout choices you select for your content) is a part of the process. Choosing sans serif fonts such as Arial and Helvetica is a wise choice for the Web; Verdana in particular has received high praise from usability/readability experts, such as that which WebAIM offered in an article posted at http://www.webaim.org/techniques/fonts/. Verdana is one of the most popular of the fonts designed for on-screen viewing. It has a simple, straightforward design, and the characters or glyphs are not easily confused. For example, the uppercase I and the lowercase L have unique shapes, unlike in Arial, in which the two glyphs may be easily confused (see Figure 6-22).

FIGURE 6-22. Arial versus Verdana font comparison

Another advantage of Verdana is the amount of spacing between letters. One consideration to take into account with Verdana is that it is a relatively large font. The words take up more space than words in Arial, even at the same point size (see Figure 6-23). The larger size improves readability but also has the potential of disrupting carefully planned page layouts. Font choice is accompanied in importance by sizing and contrast issues. Type that is smaller than 10 points is typically very challenging to read, and in all cases, relative font sizes are recommended so that users can employ browser options to increase/decrease if necessary. Contrast (the color difference between the background and text) is also critical; legibility usually drops for anything that isn’t black (or very dark) on a white background.

Content length and word count

Content length is another critical piece of the optimization puzzle that’s mistakenly placed in the “keyword density” or “unique content” bucket of SEO. In fact, content length can have a big role to play in terms of whether your material is easy to consume and easy to share. Lengthy pieces often don’t fare particularly well on the Web (with the exception, perhaps, of the one page sales letter), whereas short-form and easily digestible content often has more success. Sadly, splitting long pieces into multiple segments frequently backfires, as abandonment increases while link attraction decreases. The only benefit is page views per visit (which is why many sites which get their revenue from advertising employ this tactic).

Visual layout

Last but not least in content structure optimization is the display of the material. Beautiful, simplistic, easy-to-use, and consumable layouts instill trust and garner far more readership and links than poorly designed content wedged between ad blocks that threaten to overtake the page. For more on this topic, you might want to check out “The Golden Ratio in Web Design” from NetTuts (http://nettuts.com/tutorials/other/the-golden-ratio-in-web-design/), which has some great illustrations and advice on laying out web content on the page.

CSS and Semantic Markup

CSS is commonly mentioned as a best practice for general web design and development, but its principles provide some indirect SEO benefits as well. Google used to recommend keeping pages smaller than 101 KB, and it used to be a common belief that there were benefits to implementing pages that were small in size. Now, however, search engines deny that code size is a factor at all, unless it is really extreme. Still, keeping file size low means faster load times, lower abandonment rates, and a higher probability of being fully read and more frequently linked to. CSS can also help with another hotly debated issue: code to text ratio. Some SEO professionals swear that making the code to text ratio smaller (so there’s less code and more text) can help considerably on large websites with many thousands of pages. Your experience may vary, but since good CSS makes it easy, there’s no reason not to make it part of your standard operating procedure for web development. Use tableless CSS stored in external files, keep JavaScript calls external, and separate the content layer from the presentation layer as shown on CSS Zen Garden, a site that offers many user-contributed style sheets formatting the same HTML content. Finally, CSS provides an easy means for “semantic” markup. For a primer, see Digital Web Magazine’s article, “Writing Semantic Markup” (http://www.digital-web.com/articles/writing_semantic_markup/). For SEO purposes, only a few primary tags apply, and the extent of microformats interpretation (using tags such as <author> or <address>) is less critical (the engines tend to sort out semantics largely on their own since so few web publishers participate in this coding fashion, but there is evidence that it helps with local search). Using CSS code to provide emphasis, to quote/reference, and to reduce the use of tables and other bloated HTML mechanisms for formatting, however, can make a positive difference.
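If you want to track the code to text ratio on your own pages, a rough measurement is straightforward. The sketch below is one possible definition (visible text characters divided by total document characters), not a formula the engines are known to use; TextExtractor and text_share are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Accumulate visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_share(html: str) -> float:
    """Fraction of the document's characters that are visible text."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return len(text) / len(html)


# The same content with presentational markup versus a CSS class.
inline = '<p><font face="Arial" size="2" color="#333333"><b>Fresh pizza</b></font></p>'
clean = '<p class="lead">Fresh pizza</p>'
print(round(text_share(inline), 2), round(text_share(clean), 2))
```

Moving the presentation into an external style sheet raises the text share of the second snippet without changing a word of the content.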

Content Uniqueness and Depth

Few can debate the value the engines place on robust, unique, value-added content; Google in particular has had several rounds of kicking “low-quality-content” sites out of its indexes, and the other engines have followed suit. The first critical designation to avoid is “thin content,” an insider phrase that (loosely) refers to content the engines do not feel contributes enough unique material to display a page competitively in the search results. The criteria have never been officially listed, but many examples/discussions from engineers and search engine representatives would place the following on the list:

_ Thirty to 50 unique words, forming unique, parsable sentences that other sites/pages do not have.

_ Unique HTML text content, different from other pages on the site in more than just the replacement of key verbs and nouns (yes, this means all those sites that build the same page and just change the city and state names thinking it is “unique” are mistaken).

_ Unique titles and meta description elements. If you can’t write unique meta descriptions, just exclude them. Similarly, algorithms can trip up pages and boot them from the index simply for having near-duplicate meta tags.

_ Unique video/audio/image content. The engines have started getting smarter about identifying and indexing pages for vertical search that wouldn’t normally meet the “uniqueness” criteria.

The next criterion from the engines demands that websites “add value” to the content they publish, particularly if it comes from (wholly or partially) a secondary source.

A word of caution to affiliates

This word of caution most frequently applies to affiliate sites whose republishing of product descriptions, images, and so forth has come under search engine fire numerous times. In fact, it is best to anticipate manual evaluations here even if you’ve dodged the algorithmic sweep. The basic tenets are:

_ Don’t simply republish something that’s found elsewhere on the Web unless your site adds substantive value to users, and don’t infringe on others’ copyrights.

_ If you’re hosting affiliate content, expect to be judged more harshly than others, as affiliates in the SERPs are one of users’ top complaints about search engines.

_ Small things such as a few comments, a clever sorting algorithm or automated tags, filtering, a line or two of text, simple mashups, or advertising do not constitute “substantive value.”

For some exemplary cases where websites fulfill these guidelines, check out the way sites such as CNET, Urbanspoon, and Metacritic take content/products/reviews from elsewhere, both aggregating and “adding value” for their users.

Last but not least, Google has provided a guideline to refrain from trying to place “search results in the search results.” For reference, look at the post from Google’s Matt Cutts, including the comments, at http://www.mattcutts.com/blog/search-results-in-search-results/. Google’s stated feeling is that search results generally don’t “add value” for users, though others have made the argument that this is merely an anticompetitive move. Sites can benefit from having their “search results” transformed into “more valuable” listings and category/subcategory landing pages. Sites that have done this have had great success recovering rankings and gaining traffic from Google. In essence, you want to avoid the potential for your site pages being perceived, both by an engine’s algorithm and by human engineers and quality raters, as search results. Refrain from:

_ Pages labeled in the title or headline as “search results” or “results”

_ Pages that appear to offer a query-based list of links to “relevant” pages on the site without other content (add a short paragraph of text, an image, and formatting that make the “results” look like detailed descriptions/links instead)

_ Pages whose URLs appear to carry search queries (e.g., ?q=miami+restaurants or ?search=Miami+restaurants versus /miami-restaurants)

_ Pages with text such as “Results 1 through 10”

Though it seems strange, these subtle, largely cosmetic changes can mean the difference between inclusion and removal. Err on the side of caution and dodge the appearance of search results.
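As a rough illustration of the URL point in the list above, here is a minimal Python sketch of how a site might derive a static-looking landing page path from a search-style URL. The function name query_to_path and the /search path are assumptions made for this example, and a real site would also need routing so that the new path actually serves a page.

```python
import re
from urllib.parse import urlsplit, parse_qs

def query_to_path(url, param="q"):
    """Turn a search-style URL into a static-looking landing page path."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs decodes "+" as a space, so "miami+restaurants" becomes "miami restaurants"
    terms = query.get(param, [""])[0]
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", terms.lower()).strip("-")
    return "/" + slug

print(query_to_path("http://www.yourdomain.com/search?q=miami+restaurants"))
# prints /miami-restaurants
```

The same helper handles the ?search= variant by passing param="search".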

Duplicate Content Issues

Duplicate content can result from many causes, including licensing of content to or from your site, site architecture flaws due to non-SEO-friendly CMSs, or plagiarism. Over the past five years, however, spammers in desperate need of content began the now much-reviled process of scraping content from legitimate sources, scrambling the words (through many complex processes), and repurposing the text to appear on their own pages in the hopes of attracting long tail searches and serving contextual ads (and various other nefarious purposes). Thus, today we’re faced with a world of “duplicate content issues” and “duplicate content penalties.” Here are some definitions that are useful for this discussion:

Unique content

This is written by humans, is completely different from any other combination of letters, symbols, or words on the Web, and is clearly not manipulated through computer text-processing algorithms (such as Markov-chain-employing spam tools).

Snippets

These are small chunks of content such as quotes that are copied and reused; these are almost never problematic for search engines, especially when included in a larger document with plenty of unique content.

Shingles

Search engines look at relatively small phrase segments (e.g., five to six words) for the presence of the same segments on other pages on the Web. When there are too many shingles in common between two documents, the search engines may interpret them as duplicate content.

Duplicate content issues

This is typically used when referring to duplicate content that is not in danger of getting a website penalized, but rather is simply a copy of an existing page that forces the search engines to choose which version to display in the index (a.k.a. duplicate content filter).

Duplicate content filter

This is when the search engine removes substantially similar content from a search result to provide a better overall user experience.

Duplicate content penalty

Penalties are applied rarely and only in egregious situations. Engines may devalue or ban other web pages on the site, too, or even the entire website.
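
To make the shingle idea defined above more concrete, here is a minimal Python sketch that splits two texts into five-word shingles and measures how many they share. The search engines have not published their actual methods; the Jaccard ratio used here is simply a common way to compare two sets, and the function names are made up for this example.

```python
def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < size:
        # A very short text yields a single shingle (or none if it is empty)
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def shingle_overlap(a, b, size=5):
    """Share of shingles two texts have in common, from 0.0 to 1.0 (Jaccard ratio)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "the quick brown fox jumps over the lazy dog near the river bank"
reworded = "the quick brown fox jumps over the lazy dog near the old mill"
print(shingle_overlap(original, original))  # identical texts share every shingle
print(shingle_overlap(original, reworded))  # a lightly edited copy still shares most
```

A high ratio between two pages suggests one is a near copy of the other, even when a few words have been changed.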

Consequences of Duplicate Content

Assuming your duplicate content is a result of innocuous oversights on your developer’s part, the search engine will most likely simply filter out all but one of the pages that are duplicates because the search engine wants to display one version of a particular piece of content in a given SERP. In some cases, the search engine may filter out results prior to including them in the index, and in other cases the search engine may allow a page in the index and filter it out when it is assembling the SERPs in response to a specific query. In this latter case, a page may be filtered out in response to some queries and not others. Searchers want diversity in the results, not the same results repeated again and again. Search engines therefore try to filter out duplicate copies of content, and this has several consequences:

_ A search engine bot comes to a site with a crawl budget, which is counted in the number of pages it plans to crawl in each particular session. Each time it crawls a page that is a duplicate (which is simply going to be filtered out of search results) you have let the bot waste some of its crawl budget. That means fewer of your “good” pages will get crawled. This can result in fewer of your pages being included in the search engine index.

_ Links to duplicate content pages represent a waste of link juice. Duplicated pages can gain PageRank, or link juice, and since it does not help them rank, that link juice is misspent.

_ No search engine has offered a clear explanation for how its algorithm picks which version of a page it does show. In other words, if it discovers three copies of the same content, which two does it filter out? Which one does it still show? Does it vary based on the search query? The bottom line is that the search engine might not favor the version you wanted.

Although some SEO professionals may debate some of the preceding specifics, the general structure will meet with near-universal agreement. However, there are a couple of problems around the edge of this model. For example, on your site you may have a bunch of product pages and also offer print versions of those pages. The search engine might pick just the printer-friendly page as the one to show in its results. This does happen at times, and it can happen even if the printer-friendly page has lower link juice and will rank less well than the main product page. The fix for this is to apply the canonical URL tag to all versions of the page to indicate which version is the original.

A second version of this can occur when you syndicate content to third parties. The problem is that the search engine may boot your copy of the article out of the results in favor of the version in use by the person republishing your article. The best fix for this, other than NoIndexing the copy of the article that your partner is using, is to have the partner implement a link back to the original source page on your site. Search engines nearly always interpret this correctly and emphasize your version of the content when you do that.
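
For the printer-friendly case above, here is a minimal Python sketch of how a site template might work out the canonical URL and emit the canonical link element for every version of a page. It assumes the print version differs only by a query parameter; the parameter names print and sort and the function names are hypothetical, so adapt them to however your site actually builds its URLs.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical query parameters that change only the presentation, not the content
PRESENTATION_PARAMS = {"print", "sort"}

def canonical_url(url):
    """Strip presentation-only parameters so every variant maps to one URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in PRESENTATION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_link_tag(url):
    """Build the link element to place in the head of every variant of the page."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)

print(canonical_link_tag("http://www.yourdomain.com/product.html?print=1"))
# prints <link rel="canonical" href="http://www.yourdomain.com/product.html">
```

Both the main product page and its print version would then carry the same tag, pointing at the main page.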

How Search Engines Identify Duplicate Content

Some examples will illustrate the process for Google as it finds duplicate content on the Web. In the examples shown in Figures 6-24 through 6-27, three assumptions have been made:

_ The page with text is assumed to be a page containing duplicate content (not just a snippet, despite the illustration).

_ Each page of duplicate content is presumed to be on a separate domain.

_ The steps that follow have been simplified to make the process as easy and clear as possible. This is almost certainly not the exact way in which Google performs (but it conveys the effect).

There are a few facts about duplicate content that bear mentioning as they can trip up webmasters who are new to the duplicate content issue:

Location of the duplicate content

Is it duplicated content if it is all on my site? Yes, in fact, duplicate content can occur within a site or across different sites.

Percentage of duplicate content

What percentage of a page has to be duplicated before you run into duplicate content filtering? Unfortunately, the search engines would never reveal this information because it would compromise their ability to prevent the problem. It is also a near certainty that the percentage at each engine fluctuates regularly and that more than one simple direct comparison goes into duplicate content detection. The bottom line is that pages do not need to be identical to be considered duplicates.

Ratio of code to text

What if your code is huge and there are very few unique HTML elements on the page? Will Google think the pages are all duplicates of one another? No. The search engines do not really care about your code; they are interested in the content on your page. Code size becomes a problem only when it becomes extreme.

Ratio of navigation elements to unique content

Every page on my site has a huge navigation bar, lots of header and footer items, but only a little bit of content; will Google think these pages are duplicates? No. Google (and Yahoo! and Bing) factor out the common page elements such as navigation before evaluating whether a page is a duplicate. They are very familiar with the layout of websites and recognize that permanent structures on all (or many) of a site’s pages are quite normal. Instead, they’ll pay attention to the “unique” portions of each page and often will largely ignore the rest.

Licensed content

What should I do if I want to avoid duplicate content problems, but I have licensed content from other web sources to show my visitors? Use <meta name="robots" content="noindex, follow">. Place this in your page’s header and the search engines will know that the content isn’t for them. This is a general best practice, because then humans can still visit the page, link to it, and the links on the page will still carry value. Another alternative is to make sure you have exclusive ownership and publication rights for that content.
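
If you want to confirm that a licensed page really carries the robots meta tag described above, a small check along these lines can help. This is a minimal Python sketch using only the standard library; the class and function names are made up for this example, and it reads the HTML you pass in rather than fetching anything from the Web.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # "noindex, follow" becomes {"noindex", "follow"}
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(sorted(robots_directives(page)))  # prints ['follow', 'noindex']
```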

How to Avoid Duplicate Content on Your Own Site

As we outlined, duplicate content can be created in many ways. Internal duplication of material requires specific tactics to achieve the best possible results from an SEO perspective. In many cases, the duplicate pages are pages that have no value to either users or search engines. If that is the case, try to eliminate the problem altogether by fixing the implementation so that all pages are referred to by only one URL. Also, 301-redirect the old URLs to the surviving URLs to help the search engines discover what you have done as rapidly as possible, and preserve any link juice the removed pages may have had. If that process proves to be impossible, there are many options, as we will outline in “Content Delivery and Search Spider Control” on page 238. Here is a summary of the guidelines on the simplest solutions for dealing with a variety of scenarios:

_ Use the canonical tag. This is the next best solution to eliminating the duplicate pages.

_ Use robots.txt to block search engine spiders from crawling the duplicate versions of pages on your site.

_ Use the Robots NoIndex meta tag to tell the search engine to not index the duplicate pages.

_ NoFollow all the links to the duplicate pages to prevent any link juice from going to those pages. If you do this, it is still recommended that you NoIndex those pages as well.

You can sometimes use these tools in conjunction with one another. For example, you can NoFollow the links to a page and also NoIndex the page itself. This makes sense because you are preventing the page from getting link juice from your links, and if someone else links to your page from another site (which you can’t control), you are still ensuring that the page does not get into the index. However, if you use robots.txt to prevent a page from being crawled, be aware that using NoIndex or NoFollow on the page itself does not make sense, as the spider can’t read the page, so it will never see the NoIndex or NoFollow tag. With these tools in mind, here are some specific duplicate content scenarios:
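
The interaction between robots.txt and the NoIndex tag can be checked with Python’s standard urllib.robotparser module. The sketch below is only an illustration: the /print/ directory, the ExampleBot user agent, and the function name are assumptions made for this example, not rules any particular site or engine uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules that block a directory of print pages
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /print/",
])

def crawler_can_read_meta_tags(url, agent="ExampleBot"):
    """A spider can only see a NoIndex tag on pages it is allowed to fetch."""
    return rules.can_fetch(agent, url)

print(crawler_can_read_meta_tags("http://www.yourdomain.com/product.html"))        # prints True
print(crawler_can_read_meta_tags("http://www.yourdomain.com/print/product.html"))  # prints False
```

For any URL where this returns False, a NoIndex or NoFollow tag on the page itself will never be read, so pick either robots.txt or the meta tag for a given page rather than both.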

HTTPS pages

If you make use of SSL (encrypted communications between the browser and the web server often used for e-commerce purposes), you will have pages on your site that begin with https: instead of http:. The problem arises when the links on your https: pages link back to other pages on the site using relative instead of absolute links, so (for example) the link to your home page becomes https://www.yourdomain.com instead of http://www.yourdomain.com. If you have this type of issue on your site, you may want to use the canonical URL tag. An alternative solution is to change the links to absolute links (http://www.yourdomain.com/content.html instead of “/content.html”), which also makes life more difficult for content thieves that scrape your site.
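
To see why relative links cause this, here is a short Python sketch that resolves links the same way a browser or spider would. The page name checkout.html and the function name are hypothetical; urljoin is part of Python’s standard library.

```python
from urllib.parse import urljoin

# Hypothetical secure page that contains the links being resolved
SECURE_PAGE = "https://www.yourdomain.com/checkout.html"

def resolve_link(href, page=SECURE_PAGE):
    """Resolve a link found on `page` to the full URL a spider would request."""
    return urljoin(page, href)

# A relative link inherits https: from the page it appears on
print(resolve_link("/content.html"))
# prints https://www.yourdomain.com/content.html

# An absolute link keeps the scheme you wrote
print(resolve_link("http://www.yourdomain.com/content.html"))
# prints http://www.yourdomain.com/content.html
```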

CMSs that create duplicate content

Sometimes sites have many versions of identical pages because of limitations in the CMS where it addresses the same content with more than one URL. These are often unnecessary duplications with no end-user value, and the best practice is to figure out how to eliminate the duplicate pages and 301 the eliminated pages to the surviving pages. Failing that, fall back on the other options listed at the beginning of this section.

Print pages or multiple sort orders

Many sites offer print pages to provide the user with the same content in a more printer-friendly format. Or some e-commerce sites offer their products in multiple sort orders (such as size, color, brand, and price). These pages do have end-user value, but they do not have value to the search engine and will appear to be duplicate content. For that reason, use one of the options listed previously in this subsection.

Duplicate content in blogs and multiple archiving systems (pagination, etc.)

Blogs present some interesting duplicate content challenges. Blog posts can appear on many different pages, such as the home page of the blog, the Permalink page for the post, date archive pages, and category pages. Each instance of the post represents duplicates of the other instances. Once again, the solutions listed earlier in this subsection are the ones to use in addressing this problem.

User-generated duplicate content (repostings, etc.)

Many sites implement structures for obtaining user-generated content, such as a blog, forum, or job board. This can be a great way to develop large quantities of content at a very low cost. The challenge is that users may choose to submit the same content on your site and in several other sites at the same time, resulting in duplicate content among those sites. It is hard to control this, but there are two things you can do to reduce the problem:

_ Have clear policies that notify users that the content they submit to your site must be unique and cannot be, or cannot have been, posted to other sites. This is difficult to enforce, of course, but it will still help some to communicate your expectations.

_ Implement your forum in a different and unique way that demands different content. Instead of having only the standard fields for entering data, include fields that are likely to be unique over what other sites do, but that will still be interesting and valuable for site visitors to see.

Controlling Content with Cookies and Session IDs

Sometimes you want to more carefully dictate what a search engine robot sees when it visits your site. In general, search engine representatives will refer to the practice of showing different content to users than to crawlers as cloaking, which violates the engines’ Terms of Service (TOS) and is considered spam. However, there are legitimate uses for this concept that are not deceptive to the search engines or malicious in intent. This section will explore methods for doing this with cookies and session IDs.

Author: jacobstree on July 23, 2010
Category: Seo