Sourcing Newsletter Content On The Internet (3)

Foreword By The Author

This is the third in a series of articles where the focus is on sourcing newsletter content on the Internet. The full sequence of articles is as follows:

  1. Copyright Issues
  2. Help and Guidance Resources
  3. Subject Research
  4. Content Providers

Mike Alexander
For all your content needs go to ClipCopy Content Solutions

Database Without Structure

Making the most of the Internet means making use of the (often) huge amount of data available on practically any subject you care to mention. The drawback is that finding it is sometimes harder and more time-consuming than expected. There is a lot of information ‘out there’, some of which is neatly sorted and categorized in databases (the owners of which may or may not charge you for access) but, because the Internet is not organized hierarchically, there is no central authority directing ‘where’ things should ‘go’. The data, then, is only of use to you if you know how to find it.

Logic dictates that the easiest way to do this is to use Internet Search Sites (search ‘engines’ and directories), which also have the advantage of being free (to users). After all, that’s what they’re for, isn’t it? The answer, in theory, is yes. In theory it is just a matter of entering keywords and bingo! Up pops a list of places where documents containing those words can be found. Unfortunately, though, it just isn’t that simple!

Volume Of Data

The main problem is the sheer volume of data available on many subjects. When your search result consists of literally thousands, or even millions, of links, you might get the (understandable) feeling that you’ve gone from a paucity of facts to a state of information overload. In other words, you may be only a little better off because you now face a further search: of the search results! Not only that but, because of the volume of data and number of sites, it’s practically impossible for the Search Sites to keep up with the changes constantly occurring on the millions of pages they index, not to mention the thousands of new sites coming online every day.

Spamdexing

Sometimes the ‘volume of data’ problem manifests itself in unexpected ways. It is common, for example, to get totally irrelevant results that do not seem to contain any of the keywords you are searching for. This is often caused by ‘spamdexing’, which is an attempt on the part of some website owners to artificially ‘load’ their pages with common, sometimes hidden, keywords even though they might have little or nothing to do with the content of their site. They do this in order to get a high ranking (in other words, for their site to appear within the first 10 or 20 links returned) in as many search results as possible, knowing that any links lower down the list, even if they are more relevant than theirs, will invariably get overlooked. People get very annoyed when their searches turn up a load of rubbish and they usually blame the search engines, even though the real culprits are more likely to be the owners of those rubbish sites trying to cheat the system. However, one search engine has done a good job of countering this kind of cheating and, as a result, the worst of the problem has eased. That search engine is Google, and its success against spamdexing is the main reason for its current domination of the search engine field.
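To make the idea of ‘loading’ a page with keywords a little more concrete, here is a minimal sketch that measures how much of a page is taken up by its most repeated word. It is only an illustration: the function name, the sample texts and the idea of using word share as a warning sign are all assumptions made for this example, and it does not describe how Google or any other search engine actually detects spamdexing.

```python
import re
from collections import Counter

def keyword_density(text, top=3):
    """Return the `top` most frequent words with their share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    return [(word, count / len(words)) for word, count in counts.most_common(top)]

# Hypothetical page texts, invented for this example.
stuffed_page = "cheap flights cheap hotels cheap cars cheap cheap cheap deals"
honest_page = "our village newsletter covers local history, gardening and events"

print(keyword_density(stuffed_page, top=1))  # one word dominates the page
print(keyword_density(honest_page, top=1))   # no single word stands out
```

A page on which one word accounts for more than half of the text is at least worth a second look, although a simple count like this cannot see hidden keywords and would be easy to fool.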

Search Options

Engines

The problem of irrelevant search results is most common with search ‘engines’ other than Google. The reason is that their entries are gathered automatically by Internet robots (called spiders) and processed into their databases electronically.

Nevertheless, they have certain advantages over manually processed databases:

  1. They can ‘spider’ the web 24 hours a day for 365 days a year, which results in many more sites being indexed than would be possible otherwise.
  2. They can index individual pages on sites rather than just the ‘root URL’.
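The second point, indexing individual pages rather than just the root URL, can be pictured with a toy index that maps each word to every page on which it appears. This is a sketch only: the page addresses and text are hypothetical, and the ‘inverted index’ structure used here is a common textbook technique, not something this article claims any particular search engine uses.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(url)
    return index

# Hypothetical pages standing in for what a spider might have fetched.
pages = {
    "http://example.com/": "Welcome to our gardening site",
    "http://example.com/roses.html": "Pruning roses in early spring",
    "http://example.com/tools.html": "Gardening tools for pruning",
}

index = build_index(pages)
print(sorted(index["pruning"]))  # the two inner pages, not the root URL
```

A search for ‘pruning’ leads straight to the two inner pages, which a listing of the root URL alone would not reveal.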

Directories

Does this mean that search directories, or ‘portals’, as they are now sometimes called, are better because they use real people who are not so easily duped? Some people once thought so, but the problem is once again one of volume. Because of the time involved, a database whose entries are checked manually necessitates some fairly strict rules and this can lead to a tendency towards bureaucracy and a lack of flexibility. The end result is often that fewer sites get indexed overall and amongst the missing might be some perfectly good ones—and even some veritable goldmines of information. Meanwhile, other sites with little or no worthwhile content manage to mysteriously achieve positions of prominence, presumably by appealing to a sense of what constitutes a ‘good’ site on the part of the reviewer. The best-known example of a search directory is Yahoo!. (http://www.yahoo.com/)

Topic-Specific Indexes

There are hundreds of very specific databases on the Internet covering a wide variety of subjects. These can sometimes be queried using specialized search engines like Pilot-Search, which is a literary search engine. (http://www.pilot-search.com/)

They can also often be found by using standard searching methods and sometimes by consulting ‘megasource jumpstations’ or, as they are now coming to be known, ‘Vortals’ (vertical portals). These are similar to the search directories mentioned above except that they each concentrate on a particular niche, or market segment. They are invariably much smaller but cover their subject more comprehensively. The main advantage in using this type of site is the concentrated focus, and the fact that many of them include details of resources that, for one reason or another, simply do not appear elsewhere. The downside is that they can often be as hard to find as the information you hope to get from them. One of the reasons for this is that many of the most popular Search Sites refuse to list them (presumably because they see them as competitors).

As mentioned before in this series, there is more to the Internet than the World Wide Web. Because of this there are some ‘protocol-specific’ searching applications that seek out data that is available on Usenet ‘newsgroups’, Gopher sites, email mailing lists etc. Some of these consist of ‘discussion threads’ from within forums and are not always suitable for specific subject research. However, they can sometimes prove invaluable for ongoing research and an increasing number archive their most popular discussion threads on the web too. An example of this type of resource would be Deja.com, which is used for finding discussion groups of all kinds. (http://www.deja.com/)

Focused Searching

Probably the most effective way of getting accurate results when using many engines, directories and databases is by using ‘precision searching’. In its simplest form, this means entering the most precise phrase you can think of to describe what you are looking for, thus narrowing the search as much as possible from the outset. If this doesn’t produce the desired result, then widen the search term one step at a time until it does.

Most people do the opposite. They start by using a single, all-encompassing keyword, which results in a very large list to start with; then they start to narrow the search by qualifying that word in increasing detail. The problem with this method is that the searcher has to wade through huge amounts of data in the process and often never gets to the point of refinement necessary to produce worthwhile results.
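The narrow-first approach can be sketched as a simple loop: try the most precise phrase, and move to a wider one only if nothing useful comes back. Everything here is hypothetical, including the list of titles and the `search` function, which stands in for a real search site by doing plain substring matching.

```python
def first_useful_search(queries, search, min_results=1):
    """Try queries from narrowest to widest; stop at the first with enough results."""
    for query in queries:
        results = search(query)
        if len(results) >= min_results:
            return query, results
    return None, []

# Hypothetical stand-in for a search site: substring matching on a few titles.
titles = [
    "Pruning climbing roses in spring",
    "Rose care for beginners",
    "A history of the garden rose",
]

def search(query):
    return [t for t in titles if query.lower() in t.lower()]

queries = [
    "pruning climbing roses in autumn",  # most precise phrase first
    "pruning climbing roses",            # one step wider
    "roses",                             # widest
]

query, results = first_useful_search(queries, search)
print(query, results)
```

The loop stops at the second phrase, so the searcher never has to wade through everything that merely mentions roses.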

Precision searching necessitates the use of search delimiters, the commonest being “THIS PHRASE” (in other words, enclosing the search phrase within quotation marks). Other important delimiters are the words AND, OR and NOT. This article is not the place to launch into a tutorial on refining devices for searching, but suffice it to say that you will improve your results dramatically if you take the trouble to learn these and the many other search operators in common use. Most of them can be found by simply using the Help feature in the search engine or other tool you are using. The very popular search engine AltaVista is particularly helpful in this respect. (http://www.altavista.com/)
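As a rough illustration of what the four delimiters mean, the sketch below applies them as filters to a handful of short texts. The `matches` function and its parameter names are invented for this example, and the exact syntax and behaviour differ from one search site to another, so treat it as a picture of the logic and not as any site's real rules.

```python
def matches(text, phrase=None, all_of=(), any_of=(), none_of=()):
    """Return True if `text` satisfies the phrase, AND, OR and NOT conditions."""
    lowered = text.lower()
    words = set(lowered.split())  # simple whitespace split; punctuation is not handled
    if phrase is not None and phrase.lower() not in lowered:
        return False  # "THIS PHRASE": the exact wording must appear
    if not all(w.lower() in words for w in all_of):
        return False  # AND: every listed word must appear
    if any_of and not any(w.lower() in words for w in any_of):
        return False  # OR: at least one listed word must appear
    if any(w.lower() in words for w in none_of):
        return False  # NOT: none of the listed words may appear
    return True

# Hypothetical documents, invented for this example.
docs = [
    "email newsletter content ideas",
    "printed newsletter templates",
    "email marketing software",
]

# newsletter AND (content OR templates) NOT printed
hits = [d for d in docs
        if matches(d, all_of=["newsletter"], any_of=["content", "templates"],
                   none_of=["printed"])]
print(hits)

# "newsletter content" as an exact phrase
print([d for d in docs if matches(d, phrase="newsletter content")])
```

Both searches leave only the first document, which shows how quickly a few well-chosen delimiters cut a list down.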

Resources

The URLs listed below are only meant to guide you, to give you some ideas and to illustrate the range of useful sites available to people wishing to research subject material on the Internet. They are only a fraction of a huge range of sites available and are not meant to be in any way representative of what is on the Internet as a whole. However, you will find pointers within them to every type of resource mentioned in this article.

SquirrelNet

A simple and concise guide to searching the Internet and finding cool sites. SquirrelNet ranks the top search engines and tells you which is best for different types of searching needs. This site also links to free online games, hidden jobs and links about squirrels(!).
http://www.squirrelnet.com/

direct search

A huge compilation of search tools, directories and resources that allows a lot of material that is normally hidden or invisible to general search engines to be accessible and searchable.
http://gwis2.circ.gwu.edu/~gprice/direct.htm

Internets Search Engines, Databases and Newswires

Search the Internet’s collection of search engines and databases in every useful category. Glowing reviews from press and universities about the 1000s of reference search engines.
http://www.internets.com/

InfoSpace

Real world information where you’ll find yellow pages, white pages, classifieds, shopping sites, finance information, government data, chat rooms, and much more.
http://www.infospace.com/

Sookoo Strategy Searches

What if you are searching for information about a concept, rather than a specific company or person? Try Sookoo, the business strategy search specialist. At this site you can drill through categories such as big thinkers, leadership, trends or change management—or search on just about any term you can think of.
http://www.sookoo.com/

IMG Network Search Resources

The official IMG Network Search page where all the tools and help you need can be found.
http://www.img.net/search/

ePilot.com

Search the Web with the ePilot Desktop Application!
http://www.epilot.com/

AffirmNet Search Resources

Whether looking for information on a subject of your interest or trying to see whether the Internet is aware of the existence of your web site, these search engines and directories are some of the most useful ways to make sense of the overwhelming size and complexity of the Internet.
http://www.affirmnet.com/search.html

WorldPages.com

The world’s premier Internet Yellow Pages and White Pages, Email directory, featuring 117 Million U.S. & Canadian white and yellow pages listings, 30 million URLs, 125,000 web sites hosted for local businesses, government listings, email and web search, maps, classifieds, ecommerce, and links to over 350 international online government, business and email directories worldwide.
http://www.worldpages.com/

The Ultimates

A new type of index with twenty-five net services at your fingertips.
http://www.theultimates.com/

theDigitalDetective.com

Your master index to the world: find ancestors, audio, businesses, domain name, driving directions, e-mail, how-to, laws, location, maps, news, people, phone numbers, pictures, places, software, and zip codes.
http://www.thedigitaldetective.com/

Search Search Sites

Use SSS to find appropriate search engines and search directories for whatever you are looking for.
http://www.motherofallsearches.com/search.htm

Liszt, The Mailing List Directory

A really big directory of mailing lists (and newsgroups, too). If you like anything, you’re bound to find something you like here.
http://www.liszt.com/

CNET Search

Search hundreds of sites in one place.
http://new.search.com/

The Invisible Web

For those hard-to-find resources.
http://www.invisibleweb.com/

Windweaver

Top search engines, directories, libraries and metasearch pages reviewed plus links to recommended search tools and an online search skills course. Other resources at Windweaver include searching tutorials, search resources, recommended Web sites, hints for email communication and mailing lists, etc.
http://www.windweaver.com/

© 2000 Mike Alexander (Revised 2009), All Rights Reserved

Reprint Rights

If you would like to use the above article in your own publication you must follow our Reprint Rights guidelines.
