ID: 9787
Added: 2002-09-16 10:31
Modified: 2002-09-16 10:40
Refreshed: 2012-02-10 01:25

How to List Your Web Sites Effectively with Search Engines

Introduction

Search engines are the primary tools Internet users rely on to locate a web site. If your web site is effectively listed with search engines, you may see a dramatic increase in your online traffic. Increasing audience reach is important to Pan Asia Networking (PAN) partners, who are developing-country research and development institutions learning to use the Internet strategically to promote and publicise their work. With this in mind, the team, working out of the PAN Collaboratory in Singapore, carried out some preliminary research to:
- understand how various search engines work
- learn the pitfalls and limitations of search engines
- test the effectiveness of the search engines
- improve PAN's website listing in the search engines.

We hope that sharing the results of this research and our experiences will benefit our PAN partners, as well as the Asian development community as a whole.

Tips on Generating Keywords

A good place to start when listing with search engines is to make a list of keywords that will help your target audience find your site easily. Put yourself in your audience's shoes and imagine:
- What would your audience call your product or services? Think not just of your organisation's name, but of other common names that your audience may use, e.g. "online courses", "e-learning" or "training courses" instead of just "Distance Education".
- What are the main features? It could be "Web-based learning" or "virtual classroom".
- What are the key benefits? E.g. "acquire new knowledge or skills from home", or "learning any time or place that suits you".
- What else is related? E.g. "career development", "adult education" or even "learning on the job".

Submitting Keywords to Search Engines

There are two ways to submit your web site to search engines: manual or automated.

Manual submission means submitting a web site address to one or two search engines at a time. This is done at the search engines' own web sites, such as Google.com, Yahoo.com, Lycos.com, etc. The link for submitting your web site is usually found at the bottom of a search category and is usually titled "submit a site" or "suggest a site".

Automated submission means using software tools to submit your web site to many search engines simultaneously. These tools can be a software package that you purchase, such as Web Position Gold™, or an online submission service such as www.addme.com, www.submit-it.com or www.101addurl.com.

Cost of Search Engine Submission

Although web site submission is free most of the time, certain search engines charge fees for listing on their web sites. One notable example is Yahoo.com, which charges a fee of US$199 just to review any web site submitted under its "Business and Economy" section. Goto.com offers bidding for search rankings, starting from US$0.05. If you belong to a niche category, this may be a good way to secure a high search engine placing at a very low cost. Automated submission service providers will also charge a fee or, if their service is free, will want you to promote it on your web site. Depending on your needs, the fees you pay may save you the time you would otherwise spend submitting your web site to each search engine individually.

Creating Title and META Tags for Your Web Site

Title and META tags are usually found near the top of a web page's HTML code. Among other uses, they can place hidden keywords in your web pages and make the pages easier for search engines to find (this is explained further on).
The Title tag will look like:

<TITLE>useit.com: Jakob Nielsen's site (Usable Information Technology)</TITLE>

If you code your web pages by hand, you may already know the META "keywords" and "description" attributes. Using the description attribute, you add your own description for your page:

<META NAME="description" CONTENT="This page is to publicise the farm produce we offer.">

You use the keywords attribute to tell the search engines which keywords to associate with the page. In the case of an agriculture producer, it will look something like:

<META NAME="keywords" CONTENT="life, livestock, farm, farming, vegetable, plants, fruits, food, science, biological science, biology, agriculture">

The Components of a Search Engine

Search engines have three major components. The first component is the web spider, also known as the web crawler. The web spider visits a web page, parses it, and then follows all the hyperlinks in that page to other web pages. Different search engines use different types of web spiders.

All the text that the web spider finds goes into the second component of a search engine, the index. If the web spider detects that a web page has changed, the index is updated with the latest information. Until the changes have been added to the index, they are not available for searching within the search engine.

The search engine software is the third component. This is the program that sifts through the millions of records in the index to find matches to a search and ranks the web pages in order of what it believes is the closest match.

The Distinction between Search Engines and Directories

Search engines, such as HotBot, create their listings automatically. Search engines crawl the web, and people then search through what the "spiders" have found. If you change your web pages, search engines will eventually find these changes. Page title, page body and other elements all play a role in how a page is listed.
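As a rough illustration of the three components described above, the following Python sketch parses pages the way a very simple spider might (title, META tags, links and text), builds a word index, and ranks pages for a query. It is a minimal sketch under stated assumptions: the names `PageParser`, `build_index` and `search` are hypothetical, pages are supplied as strings rather than fetched from the web, and ranking by raw word count is a stand-in, not the method of any search engine named in this article.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects what a simple spider might read from one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}    # META name -> content, e.g. "description", "keywords"
        self.links = []   # href values a spider would follow next
        self.words = []   # lower-cased words from the title and body text
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def build_index(pages):
    """Build the index: word -> {url: number of occurrences}.

    `pages` maps a URL to its HTML source (an assumption for this sketch).
    """
    index = defaultdict(dict)
    for url, html in pages.items():
        parser = PageParser()
        parser.feed(html)
        for word in parser.words:
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Return URLs containing every query word, most occurrences first."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    scored = [(sum(index[t][url] for t in terms), url) for url in candidates]
    return [url for score, url in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Calling `build_index` on a small dictionary of pages and then `search(index, "farm produce")` returns the matching URLs in ranked order. Because the title text is indexed along with the body, a word that appears in both counts twice, which loosely mirrors the point that title and body both play a role.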
In a directory, by contrast, web site addresses are categorized under different topics. A web site can appear under more than one topic. You submit a short description of your entire site to the directory, or the directory's editors write one for the sites they review. A search looks for matches in the descriptions submitted and displays both the web sites' addresses and the topics under which the web sites are categorized.

Some search engines maintain an associated directory, and one directory may be shared by a group of search engines. If a web site is listed in such a directory, it is in effect listed in every search engine in the group. It is normal to have to resubmit a web site to a search engine several times before it gets listed.

How Various Search Engines Work

AltaVista (www.altavista.com)

AltaVista is consistently one of the largest search engines on the web in terms of pages indexed. The AltaVista index is built by sending out web crawlers that capture text and analyse it. In theory, there is no need to tell AltaVista about your site if other web sites link to it: it should be found automatically. Some webmasters think that AltaVista and other search engines search only for information in META tags. In fact, AltaVista uses a full-text index. Every word on every page matters, and not just the individual words but their order as well, e.g. words combined as phrases.

Excite (www.excite.com)

Excite generates site summaries automatically, based on the content of the pages it indexes. The engine is designed to extract important portions of each page to help identify what the page is about. If a document has a META description tag, Excite will use its contents as the summary of that document when the page shows up in search results.

Lycos (www.lycos.com)

The Lycos spider will try to travel through the links contained in the web page you submit.
Do not submit dynamic web pages with any of these symbols in the URL: ampersand (&), percent sign (%), equals sign (=), dollar sign ($) or question mark (?).

LookSmart (www.looksmart.com)

The LookSmart directory is compiled by human editors who review each site and describe its content. LookSmart is fundamentally different from search sites like HotBot, AltaVista and Infoseek. LookSmart is not only a search engine but also a web directory that contains the largest collection of reviewed quality sites on the Internet. LookSmart's category search guides you through its site listings by topic.

Yahoo (www.yahoo.com)

The Yahoo directory is organized by subject. Sites are placed in categories by Yahoo. Every site listed by Yahoo falls somewhere within its 14 major categories, which are available from the top page of the directory. You can also suggest a new category for your web site.

How Pan Asia Networking Lists with Search Engines

PAN systematically submits its web site address, PanAsia.org.sg, to one or two search engines at a time. After a period of time, we check the ranking of the web site address in the search engines' results by searching for the keywords used in the META tag and in the full text.

PanAsia's pages use a frames layout, with JavaScript on the main controlling pages. The following are the HTML tags in the header of PanAsia's index.htm page.

Title: Pan Asia Networking, an Information and Communication Technologies (ICT) program of International Development Research Centre

Meta Description: Assists Asian development organisations build capacity in information and communication technologies through funding projects in Internet connectivity, networking solutions, ecommerce and distance education projects.
Meta Keywords: asia, asian, books, bibliographical databases, communication, contents, community-based natural resource management, developing countries, development, distance education, e-commerce, environmental, funds, handicrafts, ict, information, internet, international, it, journals, merchants, network, networking solutions, online conferences, pan, panasia, policy, projects, providers, publications, reports, research, r&d, results, services, social economic, textiles, technological, video-on-demand, bangladesh, bhutan, canada, china, cambodia, india, indonesia, laos, malaysia, mongolia, nepal, pakistan, papua new guinea, philippines, sri lanka, singapore, tibet, thailand, vietnam

For the purpose of this research exercise, PAN submitted its PanAsia URL to one or two search engines or directories a week. After some time, we went back to these sites to check PanAsia's ranking. Submissions were made to AltaVista, Excite, Google, HotBot, LookSmart, Lycos and WebCrawler between July and August 2000. An analysis was made in January 2001 and again in April 2001. Below are the latest search engine rankings, from April 2001, based on keyword searches.

| Keywords / Search Engines | AltaVista | Excite | Google | HotBot | LookSmart | Lycos | WebCrawler |
| pan | 2nd | 2nd | 8th | 6th | >50 | 30 8th (under popular site) | 2nd |
| panasia | 1st | 1st | 1st | 1st | 1st | 2nd | 1st |
| pan Asia | 1st | 1st | 1st | 1st | 1st | 4th | 1st |
| bibliographical | 15th | >50 | >50 | >50 | >50 | >50 | >50 |
| bibliographical databases | 1st | >50 | >50 | >50 | >50 | >50 | >50 |
| community-based natural resource management | 2nd | 49th | >50 | >50 | >50 | >50 | >50 |
| ICT | >50 | 12th | >50 | >50 | >50 | >50 | 12th |
| Networking solutions | >50 | 32nd | >50 | 40th | >50 | >50 | 3rd |

Note:
1. >50 means that we did not look beyond the 50th listing.
2. The search engine ranking results may be affected by earlier submissions.

What We Learnt

- Text that appears in multimedia files (audio and video) cannot be indexed. You could, however, submit these files to AltaVista's MP3/Audio and Video search.
- Information that is generated by JavaScript or Java applets, or that is in XML coding, cannot be indexed. It is good practice to repeat, in a normal HTML page, any hyperlinks that appear in JavaScript or Java applets, so that web spiders can follow them.
- Dynamic pages (pages that are linked to a database and are updated when the database is updated) can also block web crawlers. While it is great to have such pages, the techniques used to build them could stop search engines from indexing your content and hence greatly reduce your potential traffic. Typically such pages have a question mark (?) in the URL. Examples of dynamic pages that are not indexed are Active Server Pages, Perl (cgi-bin) pages, PHP pages, etc. with question marks in their URLs (indicating that the URL runs a script that constructs the page, rather than serving static content). When a search engine crawler arrives at such a page, it captures the content but halts immediately and will not follow the links, because it sees ahead of it a potentially infinite number of pages that could cause it to crash. If you are using dynamic pages, avoid using them for the home page.
- If you have information inside frames, some search engines index the frameset itself as a distinct page and also index each pane of the frame window as a separate page. So if you want visitors from search engines to experience your pages in a certain way, you should have non-frames as well as frames versions of those pages, and submit the non-frames versions as additional URLs.
- Also, consider technical factors. If a site has a slow connection or the pages are very complex, the connection might time out before the crawler can index all the text.
- If you have a hierarchy of directories at your site, put the most important information high, not deep. Search engines will presume that information placed higher is more important, and crawlers may not venture deeper than three, four or five directory levels.
- In addition, it helps to have a central page with good navigation to the other pages at your site. Make it easy, not hard, for the crawler to find all your pages by following internal links.
- Search engines may also penalize pages or exclude them from the index if they detect search engine spamming. An example is a word repeated hundreds of times on a page to increase its frequency and propel the page higher in the lists.
- Meta tags do not necessarily boost a page's ranking with most search engines.

How to Improve PanAsia's Listing

- Move PanAsia's pages from frame pages to non-frame pages. If this is not possible, add the following script to each framed page, so that a page opened on its own sends the visitor to the home page where the frames can be displayed:

  if (top == self) self.location.href = "frameset page name here";

- Replace single keywords like communication, contents, Asian, etc. with keyword phrases that describe a niche area. These are more targeted and will also obtain a higher ranking in the search engines.
- Use other, more relevant keywords and phrases that describe the PanAsia website's content, such as: virtual conferencing, networking solutions, communities of practice, ICT policy, ICT research and development, development research, development of e-commerce in Asia, ICT4D, PanAsia, Pan Asia Networking.
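Several of the lessons listed under What We Learnt can be checked mechanically before a site is submitted. The Python sketch below is one hedged way to do so: it flags the URL symbols the article warns about, counts how many directory levels deep a page sits, and reports heavily repeated words. The function names, the `example.org` URLs used with it, and the repetition limit of 20 are all assumptions made for illustration; no search engine discussed here publishes such a threshold.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Symbols the article advises against in submitted URLs.
DYNAMIC_SYMBOLS = "&%=$?"


def dynamic_symbols_in(url):
    """Return the warned-about symbols that occur in the URL."""
    return [symbol for symbol in DYNAMIC_SYMBOLS if symbol in url]


def directory_depth(url):
    """Count directory levels in the URL path, ignoring the file name.

    Assumption: a final path segment containing a dot is a file name.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if parts and "." in parts[-1]:
        parts = parts[:-1]
    return len(parts)


def overused_words(text, limit=20):
    """Return words that occur more than `limit` times in the text.

    The default limit is arbitrary and only illustrates the idea of
    spotting repetition that might be read as spamming.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {word: n for word, n in counts.items() if n > limit}
```

For example, `dynamic_symbols_in("http://www.example.org/cgi-bin/page.pl?id=3")` reports the question mark and equals sign, and `directory_depth` can be compared against the three to five levels beyond which crawlers may not venture.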
Below are some findings related to specific search engines:

AltaVista

Because AltaVista is a full-text, robot-enabled search engine, every page should contain an HTML title tag, and the body text is equally important. Search results are based on the titles of the pages and on the body text more than on the META tags. AltaVista keeps its information current by completely rebuilding its index with a "big crawl" every few months, along with "updating crawls" that bring in millions of pages a day. If your site's URL has changed, you will need to complete the "Submit a Site" form so that your new location can be crawled.

Excite

Framed content is not indexable by Excite. Excite's spider regularly revisits all of the pages it indexes, gathering information to be indexed and noting any content changes you have made. Excite derives the summary of your site from your site's content or from the META description tag. To change the site summary that Excite's spider compiles, either change the content of your META description tag or change the content on your site's home page.

Lycos

If you change your URL, add your new site to Lycos (think of it as a change of address form). Keep your site live. Lycos' spider revisits all sites on a periodic basis. If the spider cannot connect to your site over an extended period (about 4 weeks), the site will be deleted from the Lycos rolling catalog. If your site goes back online, Lycos will add it in again.

Conclusions and Recommendations

- When your website appears in the search engine results, so does your title.
- If you have frame pages, check whether they are supported by the search engines that you are submitting to.
- Some search engines cannot index text that is embedded in graphics. Search engines simply cannot "see" the text unless the webmaster puts alternate text behind the picture, describing it and listing those important words. Therefore, label pictures clearly using the HTML ALT attribute in your page.
- Sites that require any kind of registration or password lock out search engines. If the content of your database is largely text, you might consider creating plain-text static HTML pages with that same content, so they can be indexed and found.
- Acrobat files cannot be indexed by most search engines. If you need that content to be found, provide plain HTML versions of those pages and point the crawler to them with the search engine's "Add a URL" form.
- Do not repeat the keywords too many times, as the search engines may treat this as spamming.
- If your listing is not appearing on a certain search engine, you should resubmit it.
- Resubmit your site if you have made extensive changes. This will speed up re-indexing at these search engines, rather than waiting for the indexing agent's next visit to your site.
- Read any submission tips provided by the search engines before submitting your site.
- Search engines have a tough time with frames. Using frames either prevents them from finding pages within a web site, or causes them to send visitors into a site without the proper frame "content" being established. Both problems can be corrected, for example by moving your pages to a non-frame set-up.

Webmasters must familiarize themselves with the various submission methods and understand how the various search engines work in order to get optimum results when listing with search engines. We hope this write-up of our experiences will help you list your website more effectively.

Pan Asia Networking is an initiative of the International Development Research Centre (IDRC), a public corporation created by the Parliament of Canada to help researchers and communities in the developing world find solutions to their social, economic and environmental problems.

Contact

Ms Maria Ng Lee Hoon
Senior Regional Program Officer
Pan Asia Networking
International Development Research Centre
Regional Office for Southeast and East Asia
Tanglin PO Box 101
Singapore 912404
Tel: 65-6235 1344
Fax: 65-6235 1849
E-mail: MNg@idrc.org.sg
http://www.PanAsia.org.sg

Mr Richard Fuchs
Director, Information and Communication Technologies for Development (ICT4D)
International Development Research Centre
Head Office
PO Box 8500, Ottawa, Ontario
Canada K1G 3H9
Tel: 1-613-236 6163
Fax: 1-613-567 7749
E-mail: RFuchs@idrc.ca
http://web.idrc.ca

This article is co-authored by Teresa Wong, Pan Asia Networking's Web Administrator, and C.J. Ng, CEO of eBiz Enabler Pte Ltd, http://www.ebiz-enabler.com, a company democratising e-business for people in business.

References

- Search Engine Watch: http://searchenginewatch.com
- AltaVista: http://doc.altavista.com/adv_search/ast_haw_index.html
- Lycos: http://www.lycos.com
- Excite: http://www.excite.com
- Submit It! Search Engine Tips: http://submitit.bcentral.com
- Ineedhits: http://www.ineedhits.com
- Pan Asia Networking: http://www.PanAsia.org.sg

A copy of this is also available in PDF format below this document.
File : search.pdf
2002-05-05
