Sourcing Newsletter Content On The Internet (3)
Foreword By The Author
This is the third in a series of articles where the focus is on sourcing newsletter content on the Internet. The full sequence of articles is as follows:
- Copyright Issues
- Help and Guidance Resources
- Subject Research
- Content Providers
Mike Alexander
Database Without Structure
Making the most of the Internet means making use of the (often) huge amount of data available on practically any subject you care to mention. The drawback is that finding that data is sometimes harder and more time-consuming than expected. There is a lot of information ‘out there’, some of which is neatly sorted and categorized in databases (the owners of which may or may not charge you for access). But because the Internet is not organized hierarchically, there is no central authority directing ‘where’ things should ‘go’. The data, then, is only of use to you if you know how to find it.
Logic dictates that the easiest way to do this is to use Internet Search Sites (search ‘engines’ and directories), which also have the advantage of being free to users. After all, that’s what they’re for, isn’t it? In theory, yes: it is just a matter of entering keywords and bingo! Up pops a list of places where documents containing those words can be found. Unfortunately, it just isn’t that simple.
Volume Of Data
The main problem is the sheer volume of data available on many subjects. When your search result consists of literally thousands, or even millions of links, you might get the (understandable) feeling that you’ve gone from a paucity of facts to a state of information overload. In other words you may be only a little better off because you are now faced with having to do a further search—of the search results! Not only that but, because of the volume of data and number of sites, it’s practically impossible for the Search Sites to keep ahead of changes constantly occurring on the millions of pages they index, not to mention the thousands of new sites coming online every day.
‘Spamdexing’
Sometimes the ‘volume of data’ problem manifests itself in unexpected ways. It is common, for example, to get totally irrelevant results that do not seem to contain any of the keywords you searched for. This is often caused by ‘spamdexing’: an attempt by some website owners to artificially ‘load’ their pages with common, sometimes hidden, keywords that have little or nothing to do with the content of their site. They do this to get a high ranking (in other words, to appear within the first 10 or 20 links returned) in as many search results as possible, knowing that links lower down the list, even if they are more relevant than theirs, will usually get overlooked.
People get very annoyed when their searches turn up a load of rubbish, and they usually blame the search engines even though the real culprits are more likely to be the owners of those rubbish sites trying to cheat the system. However, one search engine, Google, has done a good job of countering this kind of cheating and, as a result, the worst of the problem has eased. This is the main reason for Google’s current domination of the search engine field.
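To make the idea of keyword ‘loading’ concrete, here is a minimal Python sketch that measures how much of a page's text is taken up by its single most frequent word. It is only an illustration: the function name, the sample texts and the whole approach are assumptions made for this example, and it is not a description of how Google or any other search engine actually detects spamdexing.

```python
from collections import Counter

def top_keyword_share(text):
    """Return the most frequent word and its share of all words in text."""
    words = text.lower().split()
    if not words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)

# Hypothetical sample pages, invented for this illustration.
normal = "our newsletter covers gardening tips and seasonal planting advice"
stuffed = "free free free free free free newsletter free free free"

stuffed_word, stuffed_share = top_keyword_share(stuffed)
normal_word, normal_share = top_keyword_share(normal)
print(stuffed_word, stuffed_share)
print(normal_word, round(normal_share, 2))
```

A page on which one word makes up most of the text is a likely candidate for keyword loading, although a real filter would need far more than this single measure.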
Search Options
Engines
The problem of irrelevant search results is most common with search ‘engines’ other than Google. The reason is that their entries are gathered automatically by Internet robots (called spiders) and processed into their databases electronically.
Nevertheless, they have certain advantages over manually processed databases:
- They can ‘spider’ the web 24 hours a day for 365 days a year, which results in many more sites being indexed than would be possible otherwise.
- They can index individual pages on sites rather than just the ‘root URL’.
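The two points above can be sketched with a toy ‘spider’ written in Python. It assumes a tiny in-memory ‘web’ (the PAGES dictionary and its example.com addresses are invented for this illustration) instead of fetching real pages, and it records every individual page on which a word occurs, not just the root URL. Real spiders are far more elaborate; this is only a sketch of the principle.

```python
from collections import defaultdict, deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
PAGES = {
    "http://example.com/": (
        "newsletter publishing tips",
        ["http://example.com/a", "http://example.com/b"],
    ),
    "http://example.com/a": (
        "copyright issues for newsletter editors",
        ["http://example.com/"],
    ),
    "http://example.com/b": ("finding free content online", []),
}

def spider(start_url, pages):
    """Breadth-first crawl from start_url, building a word -> URLs index."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        text, links = pages.get(url, ("", []))
        # Index each word against the individual page it appears on.
        for word in text.lower().split():
            index[word].add(url)
        # Follow links to pages that have not been visited yet.
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index

index = spider("http://example.com/", PAGES)
print(sorted(index["newsletter"]))
```

Because the index is keyed by word and stores page addresses, a search for ‘newsletter’ leads straight to the two pages that mention it.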
Directories
Does this mean that search directories, or ‘portals’, as they are now sometimes called, are better because they use real people who are not so easily duped? Some people once thought so, but the problem is once again one of volume. Because of the time involved, a database whose entries are checked manually necessitates some fairly strict rules and this can lead to a tendency towards bureaucracy and a lack of flexibility. The end result is often that fewer sites get indexed overall and amongst the missing might be some perfectly good ones—and even some veritable goldmines of information. Meanwhile, other sites with little or no worthwhile content manage to mysteriously achieve positions of prominence, presumably by appealing to a sense of what constitutes a ‘good’ site on the part of the reviewer. The best-known example of a search directory is Yahoo!. (http://www.yahoo.com/)
Topic-Specific Indexes
There are hundreds of very specific databases on the Internet covering a wide variety of subjects. These can sometimes be queried using specialized search engines like Pilot-Search, which is a literary search engine. (http://www.pilot-search.com/)
They can also often be found by using standard searching methods and sometimes by consulting ‘megasource jumpstations’ or, as they are now coming to be known, ‘Vortals’ (vertical portals). These are similar to the search directories mentioned above except that they each concentrate on a particular niche, or market segment. They are invariably much smaller but cover their subject more comprehensively. The main advantage in using this type of site is the concentrated focus, and the fact that many of them include details of resources that, for one reason or another, simply do not appear elsewhere. The downside is that they can often be as hard to find as the information you hope to get from them. One of the reasons for this is that many of the most popular Search Sites refuse to list them (presumably because they see them as competitors).
As mentioned before in this series, there is more to the Internet than the World Wide Web. Because of this there are some ‘protocol-specific’ searching applications that seek out data that is available on Usenet ‘newsgroups’, Gopher sites, email Mailing Lists etc. Some of these consist of ‘discussion threads’ from within forums and are not always suitable for specific subject research. However, they can sometimes prove invaluable for ongoing research and an increasing number archive their most popular discussion threads on the web too. An example of this type of resource would be Deja.com, which is used for finding discussion groups of all kinds. (http://www.deja.com/)
Focused Searching
Probably the most effective way of getting accurate results when using many engines, directories and databases is by using ‘precision searching’. In its simplest form, this means entering the most precise phrase you can think of to describe what you are looking for, thus narrowing the search as much as possible from the outset. If this doesn’t produce the desired result, then widen the search term one step at a time until it does.
Most people do the opposite. They start by using a single, all-encompassing keyword, which results in a very large list to start with; then they start to narrow the search by qualifying that word in increasing detail. The problem with this method is that the searcher has to wade through huge amounts of data in the process and often never gets to the point of refinement necessary to produce worthwhile results.
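The narrow-first approach can be sketched in a few lines of Python. The sample documents, the query list and both function names are assumptions made for this illustration, and the ‘search engine’ is just a case-insensitive phrase match over a short list, so treat it as a sketch of the strategy and not of any real search tool.

```python
# Hypothetical document collection, invented for this illustration.
DOCS = [
    "Free newsletter content for small business owners",
    "How to write a newsletter",
    "Small business tax tips",
]

def phrase_search(phrase, docs):
    """Return the documents that contain the exact phrase (case-insensitive)."""
    needle = phrase.lower()
    return [d for d in docs if needle in d.lower()]

def precision_search(queries, docs):
    """Try queries from most to least precise; stop at the first that hits."""
    for query in queries:
        hits = phrase_search(query, docs)
        if hits:
            return query, hits
    return None, []

queries = [
    "free newsletter content for charities",  # most precise
    "free newsletter content",                # one step wider
    "newsletter",                             # broadest
]
query, hits = precision_search(queries, DOCS)
print(query, hits)
```

The search stops at the second query, so the searcher ends up with one closely matching document and never has to wade through everything that merely mentions ‘newsletter’.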
Precision searching relies on search delimiters, the commonest being “THIS PHRASE” (in other words, enclosing the search phrase within quotation marks). Other important delimiters are the words AND, OR and NOT. This article is not the place for a tutorial on refining devices for searching, but suffice it to say that you will improve your results dramatically if you take the trouble to learn these and the many other search operators in common use. Most of them are explained in the Help feature of the search engine or other tool you are using. The very popular search engine AltaVista is particularly helpful in this respect. (http://www.altavista.com/)
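As a rough illustration of what the phrase, AND, OR and NOT delimiters do, the Python sketch below applies them to three made-up documents using ordinary set operations. The documents and the matching function are assumptions for this example only; the exact syntax and behaviour of each real search site differ, so check its Help pages.

```python
# Hypothetical documents, keyed by an ID, invented for this illustration.
DOCS = {
    1: "newsletter content for gardeners",
    2: "gardening newsletter archive",
    3: "content licensing and copyright",
}

def matching(term, docs):
    """IDs of documents containing term; a quoted term is matched as a phrase."""
    if term.startswith('"') and term.endswith('"'):
        needle = term.strip('"').lower()
        return {i for i, text in docs.items() if needle in text.lower()}
    return {i for i, text in docs.items() if term.lower() in text.lower().split()}

newsletter = matching("newsletter", DOCS)
content = matching("content", DOCS)

both = newsletter & content        # newsletter AND content
either = newsletter | content      # newsletter OR content
excluded = newsletter - content    # newsletter NOT content
phrase = matching('"newsletter content"', DOCS)  # exact phrase

print(both, either, excluded, phrase)
```

AND narrows the results, OR widens them, NOT removes unwanted matches, and the quoted phrase is the most precise of all because the words must appear together and in order.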
Resources
The URLs listed below are only meant to guide you, to give you some ideas and to illustrate the range of useful sites available to people wishing to research subject material on the Internet. They are only a fraction of a huge range of sites available and are not meant to be in any way representative of what is on the Internet as a whole. However, you will find pointers within them to every type of resource mentioned in this article.
SquirrelNet
A simple and concise guide to searching the Internet and finding cool sites. SquirrelNet ranks the top search engines and tells you which is best for different types of searching needs. This site also links to free online games, hidden jobs and links about squirrels(!).
http://www.squirrelnet.com/
direct search
A huge compilation of search tools, directories and resources that allows a lot of material that is normally hidden or invisible to general search engines to be accessible and searchable.
http://gwis2.circ.gwu.edu/~gprice/direct.htm
Internets Search Engines, Databases and Newswires
Search the Internet’s collection of search engines and databases in every useful category. Glowing reviews from press and universities about the 1000s of reference search engines.
http://www.internets.com/
InfoSpace
Real world information where you’ll find yellow pages, white pages, classifieds, shopping sites, finance information, government data, chat rooms, and much more.
http://www.infospace.com/
Sookoo Strategy Searches
What if you are searching for information about a concept, rather than a specific company or person? Try Sookoo, the business strategy search specialist. At this site you can drill through categories such as big thinkers, leadership, trends or change management—or search on just about any term you can think of.
http://www.sookoo.com/
IMG Network Search Resources
The official IMG Network Search page where all the tools and help you need can be found.
http://www.img.net/search/
ePilot.com
Search the Web with the ePilot Desktop Application!
http://www.epilot.com/
AffirmNet Search Resources
Whether looking for information on a subject of your interest or trying to see whether the Internet is aware of the existence of your web site, these search engines and directories are some of the most useful ways to make sense of the overwhelming size and complexity of the Internet.
http://www.affirmnet.com/search.html
WorldPages.com
The world’s premier Internet Yellow Pages and White Pages, Email directory, featuring 117 Million U.S. & Canadian white and yellow pages listings, 30 million URLs, 125,000 web sites hosted for local businesses, government listings, email and web search, maps, classifieds, ecommerce, and links to over 350 international online government, business and email directories worldwide.
http://www.worldpages.com/
The Ultimates
A new type of index with twenty-five net services at your fingertips.
http://www.theultimates.com/
theDigitalDetective.com
Your master index to the world: find ancestors, audio, businesses, domain name, driving directions, e-mail, how-to, laws, location, maps, news, people, phone numbers, pictures, places, software, and zip codes.
http://www.thedigitaldetective.com/
Search Search Sites
Use SSS to find appropriate search engines and search directories for whatever you are looking for.
http://www.motherofallsearches.com/search.htm
Liszt, The Mailing List Directory
A really big directory of mailing lists (and newsgroups, too). If you like anything, you’re bound to find something you like here.
http://www.liszt.com/
CNET Search
Search hundreds of sites in one place.
http://new.search.com/
The Invisible Web
For those hard-to-find resources.
http://www.invisibleweb.com/
Windweaver
Top search engines, directories, libraries and metasearch pages reviewed plus links to recommended search tools and an online search skills course. Other resources at Windweaver include searching tutorials, search resources, recommended Web sites, hints for email communication and mailing lists, etc.
http://www.windweaver.com/
© 2000 Mike Alexander (Revised 2009), All Rights Reserved
Reprint Rights
If you would like to use the above article in your own publication you must follow our Reprint Rights guidelines.










