MEETING 13: ARCHITECTURE
NAME: DEVI SETYAWATI
STUDENT ID (NIM): 41816310004
COURSE: E-BUSINESS ARCHITECTURE AND MANAGEMENT
LECTURER: Yulius Eka Agung Saputra, SE, M.Si
ROOM: D2-209
Basic Search Engine Optimization Techniques
- Determine what keywords you want to appear in SE results (this requires some research and analysis).
- Understand how search engine spiders work.
- Understand how search engines crawl and compile data on the Web and know what documents (html files) relate to which keywords and phrases.
Basically, search
engines collect data about a unique Web site by sending an electronic
spider to visit the site and copy its content which is stored in the
search engine’s database. Generally known as ‘bots’ (robots),
these spiders are designed to follow links from one document to the
next. As they copy and assimilate content from one document, they
record links and send other bots to make copies of content on those
linked documents. This process continues ad infinitum. By sending out
spiders and collecting information 24/7, the major search engines
have established databases that measure their size in the tens of
billions.
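The crawl loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TOY_WEB` dictionary is a hypothetical in-memory stand-in for the real web, and `crawl` is an invented helper, not the code of any actual search engine.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to (text, outgoing links).
TOY_WEB = {
    "a.example/home": ("welcome page", ["a.example/about", "b.example/blog"]),
    "a.example/about": ("about us", ["a.example/home"]),
    "b.example/blog": ("blog post", ["c.example/news"]),
    "c.example/news": ("news item", []),
    "d.example/orphan": ("no inbound links", []),
}


def crawl(seed, web):
    """Breadth-first crawl: copy each page's content and queue its links."""
    index = {}                 # stands in for the search engine's database
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in index or url not in web:
            continue           # already copied, or the link is dead
        text, links = web[url]
        index[url] = text      # store a copy of the content
        queue.extend(links)    # follow the recorded links to other documents
    return index


index = crawl("a.example/home", TOY_WEB)
print(sorted(index))
```

In this toy data the page `d.example/orphan` is never copied, because no other page links to it; every page that is linked from a crawled page ends up in the index.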
As spiders follow links
and record everything in their paths, one can safely assume that if a
link to a site exists, a spider will find that site. Webmasters and
SEOs no longer need to manually or electronically submit their sites
to the major search engines. The search spiders are perfectly capable
of finding them on their own, provided a link to that site exists
somewhere on the web. Google and Yahoo both have an uncanny ability
to judge the topic or theme of documents they are examining, and use
that ability to judge the topical relationship of documents that are
linked together. The most valuable incoming links (and the only ones
worth pursuing) come from sites that share topical themes. Offering
spiders access to the areas of the site one wants them to access is
half the battle. The other half is found in the site content. Search
engines are supposed to provide their users with lists of documents
that relate to user queries.
After the URL of a
site, there are four basic elements a search engine looks at when
examining a document:
- the Title of the page (the title tag)
- the Description meta tag
- the Keywords meta tag
- keywords in text and especially in the anchor text used in internal links
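As a rough illustration of the four elements listed above, the following Python sketch pulls them out of a page using the standard library's `html.parser`. The `PageElements` class and the `SAMPLE_PAGE` markup are hypothetical examples written for this note; they show where each element lives in the HTML, not how any particular search engine weighs it.

```python
from html.parser import HTMLParser


class PageElements(HTMLParser):
    """Collect the title, description, keywords and anchor text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.keywords = []
        self.anchor_texts = []
        self._in_title = False
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._in_anchor = True
            self.anchor_texts.append("")
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            content = attrs.get("content") or ""
            if name == "description":
                self.description = content
            elif name == "keywords":
                # Keywords are conventionally a comma-separated list.
                self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_anchor:
            self.anchor_texts[-1] += data


# Hypothetical page used only for this example.
SAMPLE_PAGE = """<html><head>
<title>Blue Widgets | Widget Shop</title>
<meta name="description" content="Blue widgets for home and office.">
<meta name="keywords" content="blue widgets, widget shop">
</head><body>
<p>We sell <a href="/blue-widgets">blue widgets</a> and more.</p>
</body></html>"""

parser = PageElements()
parser.feed(SAMPLE_PAGE)
print(parser.title)
print(parser.description)
print(parser.keywords)
print(parser.anchor_texts)
```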
Page Titles should be
written using the strongest keyword targets as the foundation. Some
titles are written using two or three basic two-keyword phrases. A
key to writing a good title is to remember that human readers will
see the title as the reference link on the search engine results page
(followed by the description).
The Description Meta
tag is also fairly important. Search engines tend to use it to gather
information on the topic or theme of the document. A well written
Description is phrased in two or three complete sentences with the
strongest keyword phrases woven early into each sentence. As with the
title tag, some search engines will display the Description on the
search results pages, generally using it in whole or in part to
provide the text that appears under the reference link. Some search
engines place minor weight in the Keywords Meta tag.
Good content is the
most important aspect of search engine optimization.
The easiest and most basic rule of the trade is that search engine
spiders can be relied upon to read basic body text 100% of the time.
By providing a search engine spider with basic text content, SEOs
offer the engines information in the easiest format for them to read.
While some search engines can strip text and link content from Flash
files, nothing beats basic body text when it comes to providing
information to the spiders. Very good SEOs can almost always find a
way to work basic body text into a site without compromising the
designer’s intended look, feel and functionality.
The content itself should be thematically focused. In other words, keep it simple. Some documents cover multiple topics on each page, which is confusing for spiders and SEOs alike. The basic SEO rule here is if you need to express more than one topic on a page, you need more pages. Fortunately, creating new pages with unique topic-focused content is one of the most basic SEO techniques, making a site simpler for both live-users and electronic spiders. An important caveat is to avoid duplicate content and the temptation to construct doorway pages specifically designed for search placements.
Purpose - To explore the capabilities and limitations of blog search engines.
Design/methodology/approach - First, we describe the features
of a range of current blog search engines. Second, we discuss and
illustrate with examples the reliability and coverage limitations of
blog searching.
Findings – Although blog searching is a useful new
technique, the results are sensitive to the choice of search engine,
the parameters used and the date of the search. The quantity of spam
also varies by search engine and search type.
Research limitations/implications – The results illustrate
blog search evaluation methods and do not use a full-scale scientific
experiment.
Originality/value - Blog searching is a new technique, and one
that is significantly different to web searching. Hence information
professionals need to understand its strengths and weaknesses.
INTRODUCTION
The information sources available to librarians and other
information professionals have expanded from the traditional shelves
of books to a plethora of online repositories. In parallel,
information retrieval techniques have developed from the card index
system to keyword searching and the advanced Boolean interfaces
available for the typical digital library and web search engines.
Information professionals need to keep track of the new information
sources and technologies, understanding what is available, how to
access it, and how to interpret or evaluate the results.
Among these, blog searching is one of the most unusual. Blogs are mini web sites
containing entries in reverse chronological order. They are often
updated daily or weekly and frequently take the form of a personal
diary (Herring, Scheidt, Bonus, & Wright, 2004), a specialist
information resource (e.g., theshiftedlibrarian.com) or a political
commentary (Trammell & Keshelashvili, 2005). Although a few
‘A-list’ blogs are relatively authoritative, with readerships of
hundreds of thousands for their timely political or technological
commentaries (Trammell & Keshelashvili, 2005), the majority of
blogs carry little authority and the content of most is probably
trivial, or crass and opinionated (Weiss, 2004). Hence, from a
traditional librarian’s perspective blogs seem an information
source to be mostly avoided. A follower of blogs may perhaps visit
those of friends and a few trustworthy information blogs (Bar-Ilan,
2005) for professional or leisure interests, but would probably have
little cause to use a general blog search engine such as
blogsearch.google.com. Nevertheless, blogs do contain information
that can be of value in some cases, such as for public opinion
insights (Gruhl, Guha, Kumar, Novak, & Tomkins, 2005).
If a researcher is not looking for a specific fact or theory but is
interested in attitudes or opinions towards an event or topic, then
an appropriate blog search may well yield a set of relevant postings
by a variety of individual bloggers. Hence understanding the
potential of blog searching is (yet another) capability that
information professionals may benefit from mastering.
The advertising industry has already recognised the potential of
blogs and other ‘consumer-generated media’ (CGM) to gain insights
into consumer opinions (Pikas, 2005). For example Nielsen
BuzzMetrics’ BrandPulse will track mentions of a company’s brand
name online (http://www.nielsenbuzzmetrics.com/brandpulse.asp) and
IBM and Microsoft (Gamon, Aue, Corston-Oliver, & Ringger, 2005;
Gruhl, Guha, Liben-Nowell, & Tomkins, 2004) have similar projects
to extract users' opinions or comments from large quantities of
comments. There are two main issues here. First, continually
monitoring online sources allows trends and changes to be identified.
For instance a company may wish to know how a particular advertising
campaign or news story has changed their brand or product
perceptions. Second, this is a passive activity. Consumers are not
interviewed or sent a survey but are indirectly canvassed via their
perhaps throwaway comments in blogs or email discussion lists. The
unique advantage of this is that retrospective opinions can be sought
even about unexpected events.
A few search engines provide this function, typically reporting the
daily proportion of blog postings that match the query. Any
noticeable peak in such a graph may represent a burst of discussion
around a specific topic. The debate can then be found typically by
clicking on the peak in the graph, which produces a list of the posts
on that day matching the search.
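A trend graph of this kind can be approximated with a short Python sketch. The figures in `DAILY_COUNTS` are made up for illustration, and the peak rule (a day whose share is at least twice the median share) is an assumption chosen for simplicity, not the method used by any of the engines discussed here.

```python
from datetime import date

# Hypothetical daily counts: (day, posts matching the query, all posts indexed).
DAILY_COUNTS = [
    (date(2006, 7, 9), 40, 10000),
    (date(2006, 7, 10), 45, 10000),
    (date(2006, 7, 11), 180, 12000),
    (date(2006, 7, 12), 60, 10000),
    (date(2006, 7, 13), 42, 10500),
]


def daily_proportions(counts):
    """Return (day, share of that day's postings that match the query)."""
    return [(day, matched / total) for day, matched, total in counts if total]


def find_peaks(series, factor=2.0):
    """Flag days whose share is at least `factor` times the (upper) median share."""
    if not series:
        return []
    shares = sorted(share for _, share in series)
    median = shares[len(shares) // 2]
    return [day for day, share in series if share >= factor * median]


series = daily_proportions(DAILY_COUNTS)
peaks = find_peaks(series)
print(peaks)
```

Using the proportion rather than the raw count keeps a day from looking like a burst merely because more postings were indexed overall on that day.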
Although blog search engines have existed since at least 2001 with
DayPop and have been already described briefly by various librarians
(Bradley, 2003; Curling, 2001; Notess, 2002), their increasing power
and an expanding blogspace makes them more relevant now than ever
before. In this paper we describe the capabilities of some common
blog search engines and present an illustrative analysis of the
reliability and coverage of their results. The purpose of these is
not to give definitive information in either case, because rapid
change seems likely, but to illustrate the types of blog search
capabilities that are available and their likely shortcomings.
Blog Search Engines
Blog search engines are similar to web search engines like Google in
that they automatically gather large quantities of information from
the web and give a free interface to allow the public to search their
databases. The main difference between the two is that blog search
engines mainly index blogs and ignore the rest of the web. The
special features of blogs give blog search engines some specific and
unique attributes. First, since each blog posting is dated, blog
search engines can report the date at which the posting was created.
For normal web pages, search engines can only report the last updated
date, and this is often not very reliable. Second, many blog search
engines have a date-specific search capability. Again, some general
search engines have this as an advanced search option, but only for
the last modified date of pages.
Although blogs are web sites and hence use standard HyperText Markup
Language (HTML) for their construction, blog search engines are
designed differently to general search engines in order to take
advantage of blog structures. The core of any blog is the list of
individual blog postings, but these are typically presented to the
blog visitor in a range of different formats. A blog search engine
will try to understand the format of a blog and dissect and store
just the individual blog postings, ignoring all the grouped pages.
This is an operation that needs to be coded for each blog format.
Hence it is quite labour-intensive for computer programmers. A
corollary of this is that it is likely that blog search engines only
index the most common blog formats and ignore minor or one-off
formats, and it is difficult to understand and process the format of
blogs in foreign languages. There is a fallback mechanism, however:
the Rich Site Summary (RSS) format (Hammersley, 2005; Notess, 2002).
This is a technology used by a minority of blogs to deliver their
individual most recent postings to users. The standard format of RSS
means that it is easy to process and there is often no need to
understand the language of a blog to correctly process its RSS feed.
In summary, a typical blog search engine is likely to be constructed
using a combination of comprehensive indexing of common blog formats
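To show why a standard feed format is easy to process, here is a minimal Python sketch that extracts dated postings from an RSS 2.0 feed using only the standard library. The feed in `SAMPLE_FEED` and the helper `postings_from_rss` are hypothetical; a real blog search engine would also have to fetch feeds, handle malformed XML and cope with other feed formats.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed for an imaginary blog.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Blog</title>
  <item>
    <title>Second post</title>
    <link>http://blog.example/2</link>
    <pubDate>Wed, 12 Jul 2006 09:30:00 GMT</pubDate>
  </item>
  <item>
    <title>First post</title>
    <link>http://blog.example/1</link>
    <pubDate>Tue, 11 Jul 2006 18:00:00 GMT</pubDate>
  </item>
</channel></rss>"""


def postings_from_rss(xml_text):
    """Return one dict per <item>, with its title, link and posting date."""
    root = ET.fromstring(xml_text)
    postings = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        postings.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 dates use the RFC 822 style, which email.utils can parse.
            "posted": parsedate_to_datetime(raw_date) if raw_date else None,
        })
    return postings


posts = postings_from_rss(SAMPLE_FEED)
for post in posts:
    print(post["posted"], post["title"], post["link"])
```

Nothing in the parsing depends on the language the posts are written in, and the explicit `pubDate` is what makes date-specific searching possible.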
Table 1. Blog search engines (August 2006).
Search Engine | URL | Content | Other
Bloglines | http://www.bloglines.com | Posts or feeds or others | Can add extra entries to the search options
Feedster | http://www.feedster.com | Blogs or news or podcasts or all | No boxes for search preferences - need syntax, instructions on site
Technorati | http://www.technorati.com | Posts or tags or blog directory |
Icerocket | http://www.icerocket.com | Blogs or several other things |
Blogdigger | http://www.blogdigger.com | Blogs | No instructions/help, just search box
Blogpulse | http://www.blogpulse.com | Blogs |
A9 | http://a9.com/ | Blogs or several other things | Uses IceRocket search
Findory Blogs | http://www.findory.com/blogs | Blogs or News or Video or Podcasts or Web | Just a search box - no advanced preferences or instructions/help
Google Blog Search | http://blogsearch.google.com | Posts |
BlogSearch-Engine | http://www.blogsearchengine.com | Blogs or moblogs | “Powered by” IceRocket
Bloogz | http://www.bloogz.com | Blogs | Can search blogs or URLs, not both at once
Gigablast | http://blogs.gigablast.com | Blogs or several other things | Also site clustering, summary excerpts, site restriction
Sphere | http://www.sphere.com | Blogs |
Table 2 summarises the available advanced search facilities,
including Boolean searches, language specific searches and word
location limits (e.g., author/title/body). It is clear from the table
that a variable range of capabilities is offered, with no engine
being comprehensive.
Table 2. Blog search engine capabilities (August 2006).
Search engine | Boolean search | Date search | URL search | Time limits | Language selection | Word location | #Results selection | Sort choice
Bloglines | Partial | Yes | No | 2001 | Yes | Yes | 10,20,30,50,100 | Yes
Feedster | Full | Yes | Yes | No | No | Yes | No | Yes
Technorati | Full | No | Yes | No | No | No | No | No
Icerocket | Full | Yes | No | No | No | Yes | No | No
Blogdigger | Full | No | No | No | No | No | No | Yes
Blogpulse | Full | Yes | Yes | 180 days | No | No | 10,25,50 | Yes
Findory Blogs | Partial | No | No | No | No | No | No | No
Google Blog Search | Full | Yes | Yes | 2000 | Yes | Yes | 10,20,30,50,100 | No
Bloogz | Partial | No | Yes | No | Yes | No | No | Yes
Gigablast | Full | No | Yes | No | Yes | No | 10,20,30,50,100 | No
Sphere | Full | Yes | No | 4 mths. | Yes | Yes | No | Yes
Producing a trend graph for a query and looking for spikes in the
graph is a good way of discovering relevant recent events. Below is a
list of blog trend graph capabilities.
- Blogpulse (submit a query and click on “trend this”): Graphs of the percentage of postings daily matching a query for the most recent 6 months. Can produce 3 simultaneous graphs and clicking on the graph gives a list of postings from the selected date.
- Technorati (submit a query and click on the mini-graph): Graphs the total volume of postings daily for up to the most recent 360 days. A small Technorati graph can be added to a user’s web site.
- IceRocket (submit a query and click on “trend it”): Graphs of the percentage of postings daily matching a query for the most recent 3 months. Can produce 3 simultaneous graphs.
Evaluation: Reliability and Coverage
Research into general search engines has shown that their coverage
and reliability are imperfect (Bar-Ilan, 1999; Bar-Ilan & Peritz,
2004; Jasco, 2006; Lawrence & Giles, 1999; Mettrop &
Nieuwenhuysen, 2001; Rousseau, 1999). The problems include
differences in the results reported between search engines and even
by the same search engine over time. In addition, different search
engines can report different sets of results and rank their results
in different ways. Hence it is logical to assume that the same would
be true for blog search engines.
Coverage (results)
It is not possible to precisely describe the coverage of blog search
engines. There is no single source of blog URLs and so each search
engine probably has a different set of blog URLs and uses a different
ad-hoc method to find new blogs. In addition, some search engines may
collect blog data indirectly via RSS feeds. For example, methods to
find new blogs include following links in existing blogs and
automatically identifying blogs in a general crawl of the web (e.g.,
Google could do this).
Table 3 summarises the results for the following four query words,
excluding the search engines in Table 1 that used IceRocket results.
- Book (very common word)
- Librarian (medium-usage word)
- Timbuktu (low-usage word)
- Citedness (rare word)
Table 3. The total number of hits reported in each search engine.
Search engine | book | Librarian | Timbuktu | citedness
Google Blog Search (beta)* | 15,252,764 | 1,662 | 411 | 11
Technorati* | 11,048,316 | 151,474 | 12,497 | 32
Bloglines* | 5,486,000 | 191,600 | 6,930 | 27
IceRocket | 4,449,856 | 63,755 | 4,683 | 3
BlogPulse | 2,990,010 | 46,179 | 2,905 | 3
Feedster* | 1,404,746 | 25,429 | 816 | 3**
Blogdigger | 687,025 | 24,480 | 547 | 6
Gigablast | 458,742 | 13,726 | 667 | 3
Sphere* | 357,020 | 9,071 | 672 | 3
Bloogz | 48,478 | 1,769 | 54 | 0
Findory Blogs | 2,159 | 282 | 1 | 0
*Numbers change between pages of results. **Using the “search further back” option.
Table 4. Results of time-specific queries: from July 11 to 12, 2006.
Search engine | book | librarian | Timbuktu | citedness
IceRocket | 38,552 | 609 | 37 | 0
BlogPulse | 33,983 | 542 | 34 | 0
Sphere* | 11,640 | 298 | 30 | 0
Bloglines | 1,420 | 33 | 2 | 0
Feedster | 153 | 2 | 0 | 0
Google Blog Search* | 95 | 100 | 60 | 0
Coverage (Language)
This would be consistent with the search engines (perhaps with the
exception of Google) developing language-specific strategies.
The results shown in tables 3 and 4 for each query suggest that the
search engines’ effective database sizes are significantly
different. In some cases the results are unreliable and vary
significantly between different pages of the result set and also for
the same query submitted at different times. Google’s results seem
rather low in Table 4, perhaps because it is a beta (pre-release)
version, or perhaps it uses only a subset of its database for
time-specific queries.
Table 5. Coverage of Google translations of the word ‘library’ in several languages.
Search engine | library | Biblioteca (Italian, Portuguese, Spanish) | Bibliothèque (French) | Bibliothek (German) | المكتبه (Arabic) | 図書館 (Japanese) | (Korean) | 图书 (Chinese simplified)
Google | 4024970 | 186662 | 45669 | 17992 | 89 | 248160 | 4991 | 105563
Technorati | 2634679 | 193666 | 41424 | 22780 | 0 | 1,161,055 | 0 | 0
Bloglines | 2887000 | 7390 | 3750 | 2710 | 0 | 0 | 0 | 0
IceRocket | 1060191 | 48616 | 26 | 6684 | 141 | 431 | 861 | 4681
BlogPulse | 554482 | 23505 | 9505 | 2106 | 55 | 80690 | 0 | 143056
Feedster | 207112 | 533 | 99 | 91 | 2 | 247 | 1 | 126
Blogdigger | 175926 | 3935 | 2451 | 2358 | 0 | 1081 | 0 | 1131
Gigablast | 103760 | 3458 | 1809 | 662 | 0 | 231 | 6 | 992
Sphere | 83506 | 6810 | 251 | 390 | 2 | 7 | 1 | 5
Bloogz* | 16175 | 442 | 0 | 285 | 0 | 0 | 0 | 0
Findory Blogs* | 991 | 1 | 0 | 1 | 0 | 0 | 0 | 0
Coverage (bloggers)
Blogger demographics are an important issue for those wishing to
know about the opinions of bloggers or to use blog searches for
public opinion or trend identification. It is clear that bloggers are
not typical citizens of the world:
Presumably blog search engines, like general search engines
(Chakrabarti, 2003), identify new blogs to index by following links
from known blogs so that they tend to cover the more popular blogs
and would not have an explicitly biased policy for the kind of blogs
indexed. For example if search results contain mainly right-wing
blogs then this is unlikely to be the result of a coverage policy
decision.
Internal Consistency
The results reported by general search engines have been shown to be
internally inconsistent, in the sense that the same query may yield
significantly different results when repeated a short while later
(Mettrop & Nieuwenhuysen, 2001). Moreover, different numbers may
be reported on different results pages. For blogs, an additional
factor is that the total number of matches for a particular day may
vary over time if results from spam blogs are removed, as the spam
blogs are identified, or if additional blog postings are subsequently
found (e.g., in a previously unknown blog).
The number of results reported by a search engine for a query may
vary for two reasons. First, the search engine may perform the
initial search over only a fraction of its database and then guess at
the total number of results in the full database. Second, the search
engine may perform the systematic elimination of duplicates or
near-duplicates on a page-by-page basis, using the results to predict
the total number of valid matches. This second reason explains why
the order in which the results are sorted can have an impact on the
apparent total number of results.
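The two mechanisms can be made concrete with a little arithmetic in Python. Both functions below are hypothetical simplifications written for this note: the source only says that engines may extrapolate in these ways, and the actual estimation methods of the engines are not known.

```python
def estimate_total(matches_in_sample, sample_fraction):
    """Scale the hits found in a partial search up to the full database."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return round(matches_in_sample / sample_fraction)


def estimate_after_dedup(raw_total, results_seen):
    """Scale a raw total by the share of unique results seen so far."""
    if not results_seen:
        return raw_total
    unique = len(set(results_seen))
    return round(raw_total * unique / len(results_seen))


# First reason: 250 matches found in a tenth of the database.
guess = estimate_total(250, 0.1)

# Second reason: the estimate shifts as more result pages are examined.
page_1 = ["a", "b", "a", "c", "d", "b", "e", "f", "g", "h"]   # 8 unique of 10
page_2 = ["i", "j", "k", "l", "m", "n", "o", "p", "q", "r"]   # no duplicates
after_page_1 = estimate_after_dedup(1000, page_1)
after_page_2 = estimate_after_dedup(1000, page_1 + page_2)
print(guess, after_page_1, after_page_2)
```

Because the duplicate rate differs from page to page, the predicted total changes as the user pages through the results, and a different sort order would surface the duplicates in a different sequence.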
Table 6 illustrates some of these issues. Most of the search engines
report small changes in the search results when moving between
different pages. Google Blog Search gives more significant changes
and previous experience has shown that it can sometimes give
radically different results depending upon the order in which the
results are sorted, and the total number of results can change
dramatically when a lot of spam is involved. Table 7 illustrates this
phenomenon with some sample queries. In addition, simply pressing the
refresh button in Google sometimes changes the results.
Table 6. Blog search engine result changes for the query “Library”.
Search engine | Page 1 | Page 2 | Page 10 | Page 1 again
Google | 3988495 | 3988376 | 4011573 | 4016487
Technorati | 2634843 | 2634888 | 2634888 | 2634888
Bloglines | 2888000 | 2888000 | 2888000 | 2888000
IceRocket | 1060317 | 1060321 | 1060321 | 1060321
BlogPulse | 554524 | 554524 | 554524 | 554524
Feedster | 207188 | 207205 | 207206 | 207206
Blogdigger | 175926 | 175926 | 175926 | 175926
Gigablast | 103760 | 103760 | 103760 | 103760
Sphere | 83506 | 83504 | 83503 | 83492
Bloogz* | 16175 | 16175 | 16175 | 16175
Findory Blogs* | 991 | 991 | 991 | 991
Spam
Although spam does not seem to have attracted attention in
cybermetrics research, it is an issue for blog search engine research
(e.g., Han, Ahn, Moon, & Jeong, 2006; Narisawa, Yamada, Ikeda, &
Takeda, 2006) because blog spam is prevalent. Spam blogs may be
identified automatically or manually and the different search engines
may have differing levels of success in identifying and removing it.
Table 8 reports some results of manual spam blog counting in some
search engines. The relatively low quantity of spam is reassuring and
in contrast to our earlier experience with news-related blog
searching, which typically produced 50%-90% spam results from fake
news blogs.
Table 8. Spam blog/non-blog results in the first 100 search matches.
Spam/Non-blogs | BlogPulse | Google | IceRocket
Book | 8/0 | 0/29 | 6/11
Librarian | 4/2 | 1/19 | 9/5
Timbuktu* | 2/5 | 7/35 | 11/5
citedness | 0/1 (3 hits) | 1/2 (11 hits) | 0/0 (3 hits)
*Noticeable repetition + non-English blogs
Table 7. Google blog search results for different pages and sort options.
Query | Page 1 date-sorted/relevance-sorted | Page 2 date-sorted/relevance-sorted | Page 3 date-sorted/relevance-sorted | Page 4 date-sorted/relevance-sorted
Book | 22,282,559/26,180,926 | 15,436,952/25,154,932 | 22,476,958/42,241,386 | 26,926,910/25,154,118
Librarian | 1,284/1,247 | 2,116/2,173 | 2,923/3,044 | 3,500/3,900
Timbuktu | 15,915/15,902 | 15,881/15,839 | 15,837/15,787 | 15,741/15,723
citedness | 11/11 | 11/11 | - | -
Precision
General search engines sometimes seem to make mistakes: i.e.
returning pages not matching the query term. This may be because the
page has changed between indexing and the time of the query. This
should not happen for blogs, or only rarely, because blog postings
tend not to be modified after being posted. A related issue is
stemming – some information retrieval systems automatically stem
words before matching them.
Overlaps and Ranking
Web search engines generally list results in order of decreasing
relevance so that the most useful pages or sites are in the first few
results. The ranking of web pages is typically performed using a
combination of the text in a page and the number of links pointing to
the page or site (Brin & Page, 1998; Chakrabarti, 2003). Hence
the top results of search engines tend to overlap somewhat – there
are online tools to explore this phenomenon (Jasco, 2005). Blog
search engines, in contrast, seem not to rank results using links but
present them by default in reverse chronological order, assuming that
the searcher will be more interested in currency than relevance or
authority. It seems unlikely that blog search engines will have a
large overlap in results since the most recent posts will depend upon
the blog checking order, which will vary by search engine (see
Lewandowski, Wahlig, & Meyer-Bautor, 2006). A large overlap could
only be expected for queries with few results and only if blog search
engine databases significantly overlap.
We compared the top 50 results for the query ‘librarian’ in
Google and BlogPulse, finding no overlaps at all, despite both
reporting recent results first. We constructed a rare query “library
of Timbuktu” to measure precise overlaps, illustrating the results
for the biggest engines in Table 9. In addition, Bloogz found 3
results (1 overlap with Technorati); Sphere found the same result as
IceRocket; Gigablast found 1 (unique) article; Blogdigger found 7 (3
overlapping with other engines) and Feedster does not allow phrase
searches. Overall, it seems that there is a low degree of overlap
between the search engines.
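An overlap comparison of this kind amounts to intersecting the sets of result URLs. The Python sketch below assumes the result lists have already been collected by hand; the engine names and URLs in `RESULTS` are invented placeholders, and `normalise` is a deliberately crude helper, not a full URL canonicaliser.

```python
from itertools import combinations

# Hypothetical result URLs; real sets would be copied from each engine's results.
RESULTS = {
    "EngineA": {"http://x.example/1", "http://x.example/2/", "http://y.example/9"},
    "EngineB": {"http://X.example/2", "http://z.example/4"},
    "EngineC": {"http://q.example/7"},
}


def normalise(url):
    """Crude normalisation so trivially different forms of a URL compare equal."""
    return url.lower().rstrip("/")


def pairwise_overlaps(results):
    """Count the URLs shared by each pair of engines."""
    cleaned = {name: {normalise(u) for u in urls} for name, urls in results.items()}
    return {
        (a, b): len(cleaned[a] & cleaned[b])
        for a, b in combinations(sorted(cleaned), 2)
    }


overlaps = pairwise_overlaps(RESULTS)
for pair, shared in overlaps.items():
    print(pair, shared)
```

The resulting counts fill the upper triangle of an overlap matrix; the lower triangle is its mirror image, since overlap is symmetric.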
Table 9. Overlaps between search engines for the query “library of Timbuktu”.
Overlap | Google | Technorati | Bloglines | IceRocket | BlogPulse
Google (6 matches) | - | 1 | 3 | 0 | 0
Technorati (10) | 1 | - | 1 | 1 | 0
Bloglines (4) | 3 | 1 | - | 0 | 0
IceRocket (1) | 0 | 1 | 0 | - | 0
BlogPulse (2) | 0 | 0 | 0 | 0 | -
CONCLUSIONS
Blog search engines are a source of new types of information, such as
public opinion and expert commentaries. Based upon the experiments
above, users should expect great variety between search engines and
a lack of uniformity. Hence we make the following recommendations.
- Try different search engines to find one with the most useful capabilities.
- For low frequency queries a range of different search engines may be needed if one gives few results.
- For non-English queries look for a blog search engine that gives good coverage of the language.
If the searches are to be used to predict public opinion, or otherwise
rely on the total volume of hits for a query, then we make the
following additional recommendations.
- Don’t rely upon the “total results” estimates of most blog search engines but perform additional checking and use the results of several engines together.
- Don’t assume that the results are unbiased by language or nation, or that bloggers are representative of the general population.