The Effect of Outbound Links:
Since PageRank is based on the linking structure
of the whole web, it is inescapable that if the inbound links of
a page influence its PageRank, its outbound links do also have some
impact. To illustrate the effects of outbound links, we take a look
at a simple example.
We regard a web consisting of two websites, each having two web pages.
One site consists of pages A and B, the other consists of pages
C and D. Initially, both pages of each site solely link to each
other. It is obvious that each page then has a PageRank of one.
Now we add a link which points from page A to page C. At a damping
factor of 0.75, we therefore get the following equations for the
single pages' PageRank values:
- PR(A) = 0.25 + 0.75 PR(B)
- PR(B) = 0.25 + 0.375 PR(A)
- PR(C) = 0.25 + 0.75 PR(D) + 0.375 PR(A)
- PR(D) = 0.25 + 0.75 PR(C)
Solving the equations gives us the following PageRank
values for the first site:
- PR(A) = 14/23
- PR(B) = 11/23
We therefore get an accumulated PageRank of 25/23
for the first site. The PageRank values of the second site are given
by
- PR(C) = 35/23
- PR(D) = 32/23
So, the accumulated PageRank of the second site
is 67/23. The total PageRank for both sites is 92/23 = 4. Hence,
adding a link has no effect on the total PageRank of the web. Additionally,
the PageRank benefit for one site equals the PageRank loss of the
other.
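The four equations above can be checked mechanically. The following is a minimal sketch, not part of the original derivation: the function name `solve_pagerank` and the dictionary layout of `links` are assumptions made for illustration. It solves the linear system PR(p) = (1 - d) + d × sum of PR(q) / C(q) exactly, using rational arithmetic.

```python
from fractions import Fraction

def solve_pagerank(links, d):
    """Solve PR(p) = (1 - d) + d * sum(PR(q) / C(q)) exactly for all pages.

    `links` maps every page to the list of pages it links to
    (hypothetical layout chosen for this sketch).
    """
    pages = sorted(links)
    n = len(pages)
    index = {p: i for i, p in enumerate(pages)}
    # Augmented matrix of the system (I - d * M) * PR = (1 - d).
    rows = [[Fraction(int(i == j)) for j in range(n)] + [1 - d]
            for i in range(n)]
    for source, targets in links.items():
        for target in targets:
            rows[index[target]][index[source]] -= d / len(targets)
    # Gauss-Jordan elimination with exact fractions.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return {p: rows[index[p]][n] for p in pages}

d = Fraction(3, 4)  # damping factor used in the example
links = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["C"]}
pr = solve_pagerank(links, d)
site_one = pr["A"] + pr["B"]
site_two = pr["C"] + pr["D"]
print(pr)
print(site_one, site_two, site_one + site_two)
```

Exact fractions are used here only so that the results can be compared directly with the values 14/23, 11/23, 35/23 and 32/23 given in the text.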
The Actual Effect of Outbound Links:
As has already been shown, the PageRank benefit
for a closed system of web pages from an additional inbound link is
given by
(d / (1-d)) × (PR(X) / C(X))
where X is the linking page, PR(X) is its PageRank
and C(X) is the number of its outbound links. Hence, this value
also represents the PageRank loss of a formerly closed system of
web pages, when a page X within this system of pages now points
by a link to an external page.
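As a quick plausibility check, the formula can be applied to the two-site example, where page A (PageRank 14/23 after the link was added, two outbound links) is the linking page X. The snippet below is only a sketch of that arithmetic; the variable names are chosen for illustration.

```python
from fractions import Fraction

d = Fraction(3, 4)        # damping factor from the example
pr_x = Fraction(14, 23)   # PR(A) after the link from A to C was added
c_x = 2                   # page A links to B and to C

# (d / (1 - d)) * (PR(X) / C(X))
transfer = d / (1 - d) * pr_x / c_x

# Accumulated PageRank of each site was 2 before the link was added.
loss_site_one = 2 - Fraction(25, 23)
gain_site_two = Fraction(67, 23) - 2

print(transfer, loss_site_one, gain_site_two)
```

All three values come out as 21/23, so in this example the formula reproduces both the loss of the first site and the gain of the second.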
The validity of the above formula requires that
the page which receives the link from the formerly closed system
of pages does not link back to that system, since the system otherwise
gains back some of the lost PageRank. Of course, this effect may also
occur when the page that receives the link does not link back
directly, but another page which has an inbound link from that page
does. However, this effect may be disregarded because of the damping
factor if there are enough other web pages in between in the chain
of links. The validity of the formula also
requires that the linking site has no other external outbound links.
If it has other external outbound links, the loss of PageRank of
the regarded site diminishes and the pages already receiving a link
from that page lose PageRank accordingly.
Even if the actual PageRank values for the pages
of an existing web site were known, it would not be possible to
calculate to what extent an added outbound link diminishes the
PageRank of the site, since the formula presented above refers to
the state after the link has been added.
Intuitive Justification of the Effect of Outbound Links:
The intuitive justification for the loss of PageRank
by an additional external outbound link according to the Random
Surfer Model is that by adding an external outbound link to one
page, the surfer becomes less likely to follow an internal link on
that page. So, the probability of the surfer reaching other pages
within a site diminishes. If those other pages of the site have links
back to the page to which the external outbound link has been added,
this page's PageRank will decrease as well.
We can conclude that external outbound links diminish
the total PageRank of a site and probably also the PageRank
of each single page of a site. But, since links between web sites
are the foundation of PageRank and indispensable for its functioning,
there is the possibility that outbound links have positive effects
within other parts of Google's ranking criteria. After all, relevant
outbound links contribute to the quality of a web page, and a webmaster
who points to other pages integrates their content in some way into
his own site.
Dangling Links:
An important aspect of outbound links is the lack
of them on web pages. When a web page has no outbound links, its
PageRank cannot be distributed to other pages. Lawrence Page and
Sergey Brin characterise links to those pages as dangling links.
The effect of dangling links shall be illustrated by a small example
website. We take a look at a site consisting of three pages A, B
and C. In our example, the pages A and B link to each other. Additionally,
page A links to page C. Page C itself has no outbound links to other
pages. At a damping factor of 0.75, we get the following equations
for the single pages' PageRank values:
- PR(A) = 0.25 + 0.75 PR(B)
- PR(B) = 0.25 + 0.375 PR(A)
- PR(C) = 0.25 + 0.375 PR(A)
Solving the equations gives us the following PageRank
values:
- PR(A) = 14/23
- PR(B) = 11/23
- PR(C) = 11/23
So, the accumulated PageRank of all three pages
is 36/23, which is just over half the value of 3 that we could have
expected if page C had linked to one of the other pages. According
to Page and Brin, the number of dangling links in Google's index is
fairly high. One reason for this is that many linked pages are not
indexed by Google, for example because indexing is disallowed by a
robots.txt file. Additionally, Google now indexes several file types
and not only HTML. PDF or Word files do not really have outbound
links and, hence, dangling links could have a major impact on PageRank.
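The values of the three-page example can also be reproduced by simple iteration instead of solving the equations by hand. The following is a minimal sketch under stated assumptions: the function name `iterate_pagerank`, the starting value of 1.0 per page and the fixed number of rounds are illustrative choices, not something prescribed by the text.

```python
def iterate_pagerank(links, d=0.75, rounds=200):
    """Repeatedly apply PR(p) = (1 - d) + d * sum(PR(q) / C(q)).

    `links` maps every page to the list of pages it links to.
    A page with an empty list is a dangling page: it receives
    PageRank but passes none on.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(rounds):
        new = {page: 1 - d for page in links}
        for source, targets in links.items():
            for target in targets:
                new[target] += d * pr[source] / len(targets)
        pr = new
    return pr

# A and B link to each other, A also links to C, C has no outbound links.
links = {"A": ["B", "C"], "B": ["A"], "C": []}
pr = iterate_pagerank(links)
total = sum(pr.values())
print(pr, total)
```

The iteration converges to roughly 0.609, 0.478 and 0.478 (that is, 14/23, 11/23 and 11/23), and the total stays well below the number of pages because the PageRank that reaches page C is not passed on.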
In order to protect PageRank from the negative effects of dangling
links, pages without outbound links have to be removed from the database
until the PageRank values have been computed. According to Page and Brin,
the number of outbound links on pages with dangling links is thereby
normalised. As shown in our illustration, removing one page can
cause new dangling links and, hence, removing pages has to be an
iterative process. After the PageRank calculation is finished, PageRank
can be assigned to the formerly removed pages based on the PageRank
algorithm. This requires as many iterations as were needed for removing
the pages. Regarding our illustration, page C could be processed
before page B. At that point, page B has no PageRank yet and, so,
page C will not receive any either. Then, page B receives PageRank
from page A, and during the second iteration page C gets its
PageRank as well.
Regarding our example website for dangling links,
removing page C from the database results in page A and B each having
a PageRank of 1. After the calculations, page C is assigned a PageRank
of 0.25 + 0.375 PR(A) = 0.625. So, the accumulated PageRank does
not equal the number of pages, but at least all pages which have
outbound links are not harmed by the dangling links problem.
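The procedure described above (remove pages without outbound links iteratively, compute PageRank on what remains, then assign PageRank to the removed pages) can be sketched as follows. This is one possible reading of the description, not Google's actual implementation; the function name `pagerank_without_dangling` is hypothetical, and the sketch assumes that removed pages are assigned their PageRank using the original number of outbound links of the linking pages, as in the value 0.25 + 0.375 PR(A) above.

```python
def pagerank_without_dangling(links, d=0.75, rounds=200):
    """PageRank with dangling pages removed first and re-added afterwards.

    `links` maps every page to the list of pages it links to; every
    linked page is assumed to appear as a key.
    """
    # Step 1: remove pages without outbound links until none are left.
    remaining = {page: list(targets) for page, targets in links.items()}
    removed = []  # one batch of pages per removal round
    while True:
        dangling = [p for p, targets in remaining.items() if not targets]
        if not dangling:
            break
        for p in dangling:
            del remaining[p]
        removed.append(dangling)
        # Normalise outbound links: drop links to removed pages.
        remaining = {p: [t for t in targets if t in remaining]
                     for p, targets in remaining.items()}

    # Step 2: ordinary PageRank iteration on the reduced graph.
    pr = {p: 1.0 for p in remaining}
    for _ in range(rounds):
        new = {p: 1 - d for p in remaining}
        for source, targets in remaining.items():
            for target in targets:
                new[target] += d * pr[source] / len(targets)
        pr = new

    # Step 3: re-add removed pages in reverse order of removal, using
    # the original outbound link counts (assumption of this sketch).
    for batch in reversed(removed):
        for p in batch:
            pr[p] = (1 - d) + d * sum(
                pr[q] / len(links[q])
                for q, targets in links.items()
                if p in targets and q in pr
            )
    return pr

example = pagerank_without_dangling({"A": ["B", "C"], "B": ["A"], "C": []})
chain = pagerank_without_dangling(
    {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": []}
)
print(example)
print(chain)
```

For the example website, pages A and B each end up with a PageRank of 1 and page C with 0.625. The second call uses a hypothetical four-page chain in which removing page D turns page C into a dangling page, so two removal rounds and two assignment rounds are needed.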
Once dangling links have been removed from the database, they
no longer have any negative effect on the PageRank of the rest of
the web. Since links to PDF files are dangling links, they
do not diminish the PageRank of the linking page or site. So, PDF
files can be a good means of search engine optimisation for Google.