Sat Dec 29 05:03:17 PST 2018






Search engineers are putting significant effort into moving from the PageRank
model to a model based upon trust and local communities.

Link Spam Detection Based on Mass Estimation
TrustRank mainly works to give a net boost to good, trusted links. Link Spam
Detection Based on Mass Estimation was a research paper aimed at killing the
effectiveness of low-quality links. Essentially the thesis of this paper was that you
could determine what percent of a site's direct and indirect link popularity comes
from spammy locations and automate spam detection based on that.
The research paper is a bit complex, but many people have digested it. I posted on
it at http://www.seobook.com/archives/001342.shtml.
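To make the idea concrete, here is a minimal sketch of that calculation on a made-up link graph. It rests on my reading of the paper: score every page twice, once with an ordinary PageRank and once with a PageRank whose random jump lands only on a hand-picked core of trusted pages, then treat the share of the score that the trusted core cannot account for as the estimated spam mass. The page names, the damping factor of 0.85, and the function names are all assumptions for illustration, not anything taken from a search engine.

```python
def pagerank(links, jump, damping=0.85, iterations=50):
    """Power-iteration PageRank using a custom random-jump vector."""
    rank = dict(jump)
    for _ in range(iterations):
        # Every page first receives its share of the random jump.
        nxt = {page: (1 - damping) * weight for page, weight in jump.items()}
        for src, outs in links.items():
            # Each page splits its damped score evenly among its outlinks.
            for dst in outs:
                nxt[dst] += damping * rank[src] / len(outs)
        rank = nxt
    return rank


def relative_spam_mass(links, trusted):
    """Estimate the fraction of each page's score NOT explained by trusted pages."""
    pages = set(links) | {dst for outs in links.values() for dst in outs}
    uniform = {page: 1 / len(pages) for page in pages}
    # The jump is restricted to the trusted core; other pages get zero.
    core = {page: (1 / len(pages) if page in trusted else 0.0) for page in pages}
    full = pagerank(links, uniform)
    from_core = pagerank(links, core)
    return {page: 1 - from_core[page] / full[page] for page in pages}


# Hypothetical graph: two trusted pages link to my_site,
# two spam pages link to spam_target.
links = {
    "trusted_a": ["my_site"],
    "trusted_b": ["my_site"],
    "spam_1": ["spam_target"],
    "spam_2": ["spam_target"],
}
mass = relative_spam_mass(links, trusted={"trusted_a", "trusted_b"})
for page in sorted(mass):
    print(f"{page}: {mass[page]:.2f}")
```

In this toy graph the page fed only by spam pages ends up with nearly all of its score unexplained by the trusted core, while the page with trusted inbound links scores far lower. A page outside the core never reaches zero, because its own share of the random jump is counted as unexplained.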
Due to the high cost of producing quality information versus the profitability and
scalability of spam, most pages on the web are spam. No matter what you do, if
you run a quality website, you are going to have some spammy websites link to you
and/or steal your content. Because my name is Aaron Wall, some idiots kept
posting links to my sites on their "wall clock" spam site.
The best way to fight this off is not to spend lots of time worrying about spammy
links, but to spend the extra time to build some links that could be trusted to offset
the effects of spammy links.
Algorithms like the spam mass estimation research are going to be based on
relative size. Since quality links typically have more PageRank (or authority by
whatever measure the search engine chooses to use) than most spam links, you can
probably get away with having 40 or 50 spammy links for every real, quality link.
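The arithmetic behind "relative size" can be sketched in a few lines. The per-link weights below are purely hypothetical (nobody outside a search engine knows the real values); the point is only that when one quality link carries far more authority than one spam link, dozens of spam links can still account for less than half of the total.

```python
def spam_share(quality_links, spam_links, quality_weight, spam_weight):
    """Fraction of total inbound link weight that comes from spam links."""
    spam_total = spam_links * spam_weight
    quality_total = quality_links * quality_weight
    return spam_total / (spam_total + quality_total)


# Assumed weights: one quality link is worth 50 units, one spam link 1 unit.
share = spam_share(quality_links=1, spam_links=40, quality_weight=50.0, spam_weight=1.0)
print(f"Spam share of link weight: {share:.0%}")
```

Changing the assumed weights changes the answer, which is why the 40 or 50 figure should be read as a rough guess rather than a threshold.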
Another interesting bit mentioned in the research paper was that generally the web
follows power laws. This quote might be as clear as mud, so I will clarify it shortly.
A number of recent publications propose link spam detection
methods. For instance, Fetterly et al. [Fetterly et al., 2004]
analyze the indegree and outdegree distributions of web pages.
Most web pages have in- and outdegrees that follow a power-
law distribution. Occasionally, however, search engines
encounter substantially more pages with the exact same in- or
outdegrees than what is predicted by the distribution formula.
The authors find that the vast majority of such outliers are spam
pages.
Indegree and outdegree above refer to a page's link profile: specifically, how many
inbound links and outbound links it has. Most spam generator software and bad spam
techniques leave obvious mathematical footprints.
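A footprint of the kind described in the quote can be sketched as follows. This is a simplified illustration, not the method from the Fetterly paper: it counts how many pages share each indegree, fits a straight line to those counts on a log-log scale (a crude power-law fit), and flags any degree shared by far more pages than the fit predicts. The function names and the threshold of 5 times the predicted count are assumptions.

```python
import math
from collections import Counter


def fit_power_law(counts):
    """Least-squares fit of count = a * degree ** -b on a log-log scale."""
    xs = [math.log(degree) for degree in counts]
    ys = [math.log(count) for count in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def overrepresented_degrees(degrees, factor=5.0):
    """Return degrees shared by more than `factor` times the predicted page count."""
    counts = Counter(d for d in degrees if d > 0)  # log(0) is undefined
    a, b = fit_power_law(counts)
    return sorted(d for d, count in counts.items() if count > factor * a * d ** -b)


# Synthetic data: page counts fall off as degree ** -2, plus 400 generated
# pages that all have exactly 12 inbound links.
degrees = []
for d in range(1, 21):
    degrees += [d] * round(1000 * d ** -2)
degrees += [12] * 400

print(overrepresented_degrees(degrees))
```

Because the outliers themselves are included in the fit, a real detector would want a more robust fitting method, but even this rough version singles out the block of pages with identical link counts.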
If you are using widely hyped and marketed spam site generator software, most of
it is likely going to be quickly discounted by link analysis algorithms since many

