329. Human Activity in the Web
Filippo Radicchi is a research scientist in Complex Systems Lagrange Lab, ISI Foundation, Turin. He is interested in non-equilibrium diagrammatic methods, RG group analysis of complex networks and community detection. Dmitry.
We spend a considerable part of our time surfing the Web: we read news, post on blogs, share photos and music, buy books or other goods, etc. The Web offers great possibilities to communicate and retrieve information, and no earlier technology can be compared to it in terms of global reach and speed of communication.
The Web is also an important source of information for scientific purposes. Actions performed on the Web are generally stored in electronic databases. Think for example about NEQNET: when you make a post or leave a comment, meaningful information about your action is stored in the database on the computer that hosts the service: in addition to the content of the message, your nickname and the time stamp of your message are also saved. Electronic databases collecting information about human activity on the Web can therefore be used to understand how people behave and interact.
Earlier studies have already focused on computer-related human activities, with particular attention to the activity patterns of humans. Interesting information can be extracted from the statistical analysis of the so-called inter-event times. Imagine we know the instants of time $t_1^{(i)} < t_2^{(i)} < \dots < t_{n_i}^{(i)}$ at which a user $i$ has performed $n_i$ actions. From such information, we can calculate $n_i - 1$ inter-event time gaps: $\tau_k^{(i)} = t_{k+1}^{(i)} - t_k^{(i)}$, with $k = 1, \dots, n_i - 1$. Then we can compute the inter-event time probability distribution function (pdf) of the i-th user as
$$P_i(\tau) = \frac{n_i(\tau)}{n_i - 1},$$
where $n_i(\tau)$ is the total number of pairs of consecutive actions performed by the user $i$ which differ by $\tau$ units of time. The global (calculated over the whole population) pdf can then be calculated as
$$P(\tau) = \frac{\sum_i n_i(\tau)}{\sum_i (n_i - 1)} = \frac{\sum_i (n_i - 1)\, P_i(\tau)}{\sum_i (n_i - 1)},$$
which is basically the weighted average of the pdfs of single users: each user contributes to the global pdf in proportion to her/his total activity. Global inter-event pdfs have been studied in the case of e-mail communication [Nature 435, 207-211 (2005)], Web surfing [Phys. Rev. E 78, 026123 (2008)], etc. In all these cases, it has been shown that the global inter-event time pdf can be well fitted by a power law
$$P(\tau) \sim \tau^{-\alpha},$$
where the exponent $\alpha$ ranges from 1 to 2, depending on the case of study. This finding is particularly relevant because it suggests that human activity is bursty: long periods of inactivity followed by short periods of intense activity. Some models have been introduced in order to explain this emergent behavior [Nature 435, 207-211 (2005)]. More recently, in [Proc. Natl. Acad. Sci. USA 105, 18153-18158 (2008)], it has been shown that the power-law decay could be explained as the superposition of non-homogeneous Poisson processes.
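As a minimal sketch of how the per-user and global pdfs could be computed from raw time stamps: each user's pdf is the fraction of that user's gaps equal to a given value, and the global pdf pools the gaps of all users, which is the same as weighting each user by her/his number of gaps. The function names and the toy `activity` dictionary are hypothetical and only serve the illustration.

```python
from collections import Counter


def inter_event_gaps(timestamps):
    """Return the n-1 gaps between consecutive events of one user."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def user_pdf(timestamps):
    """Empirical pdf of one user: fraction of the user's gaps equal to tau."""
    gaps = inter_event_gaps(timestamps)
    counts = Counter(gaps)
    return {tau: c / len(gaps) for tau, c in counts.items()}


def global_pdf(activity):
    """Empirical global pdf: the gaps of all users pooled together."""
    counts = Counter()
    for timestamps in activity.values():
        counts.update(inter_event_gaps(timestamps))
    total = sum(counts.values())
    return {tau: c / total for tau, c in counts.items()}


# Hypothetical toy data: user name -> event times in seconds.
activity = {
    "alice": [0, 10, 20, 50],  # gaps: 10, 10, 30
    "bob": [5, 15],            # gaps: 10
}
print(global_pdf(activity))    # {10: 0.75, 30: 0.25}
```

In this toy case alice contributes three gaps and bob one, so the global value at a gap of 10 is (3 × 2/3 + 1 × 1) / 4 = 0.75, the activity-weighted average of the two single-user pdfs.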
In our paper, we study three very large databases: a large set of queries submitted to the search engine of America Online (AOL), all logging actions performed by users on the English version of Wikipedia, and a large set of feedback messages sent by users on the eBay (EB) website. The global inter-event time pdf calculated for the EB dataset is shown in Figure 1.

As one can clearly see, the global pdf is characterized by a power-law decay modulated by periodic (daily) oscillations. It should be noted that the definition of the global pdf is meaningful only under the hypothesis that all users behave in the same way, which means that each $\tau$ is a random variable extracted from the same pdf (the global one) independently of the considered user. This assumption is, however, wrong. If we calculate the statistical significance of the global $P(\tau)$ as a description of the activity pattern of single users, we see that the data significantly violate this null hypothesis. A simple Kolmogorov-Smirnov (KS) test, which systematically compares the global $P(\tau)$ with the pdf of each single user (see Figure 2), shows that the fraction of users whose activity pattern is described by $P(\tau)$ at a significance level at least equal to $Q$ is much less than expected.
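The test described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the paper: the global sample is treated as the reference distribution, the significance comes from the asymptotic Kolmogorov series evaluated at sqrt(n) · D, and the KS test is only approximate for discrete gaps. All function names and the toy data are hypothetical. If every user really drew gaps from the global pdf, one would expect roughly a fraction 1 − Q of users to reach significance Q.

```python
import math
from bisect import bisect_right


def kolmogorov_q(lam, terms=200):
    """Asymptotic KS significance: 2 * sum_j (-1)^(j-1) * exp(-2 j^2 lam^2)."""
    if lam < 0.1:
        # The alternating series converges slowly here; Q is 1 to machine precision.
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def ecdf(sorted_values, x):
    """Empirical cdf of a sorted sample evaluated at x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_significance(sample, reference):
    """Return (D, Q) for a user's gaps against the reference (global) gaps."""
    s, r = sorted(sample), sorted(reference)
    # Both cdfs are step functions, so the largest distance occurs at a jump point.
    d = max(abs(ecdf(s, x) - ecdf(r, x)) for x in set(s) | set(r))
    return d, kolmogorov_q(math.sqrt(len(s)) * d)


def fraction_described(user_gaps, global_gaps, q):
    """R(Q): fraction of users whose KS significance is at least q."""
    scores = [ks_significance(g, global_gaps)[1] for g in user_gaps.values() if g]
    if not scores:
        return 0.0
    return sum(score >= q for score in scores) / len(scores)


# Hypothetical toy data: user name -> inter-event gaps.
user_gaps = {"alice": [10, 10, 30], "bob": [10]}
pooled = [gap for gaps in user_gaps.values() for gap in gaps]
print(fraction_described(user_gaps, pooled, 0.5))
```

With so few gaps per user the test has almost no power, so both toy users pass; the discrepancy reported in the post only becomes visible on large samples.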

The main reason for this discrepancy is the heterogeneity of the population in terms of the number of operations performed. Not all users perform the same number of actions; instead, the number of users who have performed $n$ operations follows a broad distribution. Interestingly, users performing the same number of operations have similar activity patterns. We define $P^{(n)}(\tau)$ as the inter-event pdf averaged only over users who have performed $n$ total actions. We first see that the statistical significance of $P^{(n)}(\tau)$ is much better than that of $P(\tau)$ (Figure 3).
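A possible way to build these activity-conditioned pdfs is to group users by their number of actions and pool the gaps inside each group. Since all users in a group have the same number of gaps, pooling coincides with a plain average of their single-user pdfs. The sketch below assumes exact matching on the number of actions; with real data one would probably group users into ranges of n, which is an assumption on our side and not something stated above. Names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def pdfs_by_activity(activity):
    """Return {n: {tau: fraction}} pooling the gaps of users with exactly n actions."""
    counts = defaultdict(Counter)
    for timestamps in activity.values():
        ordered = sorted(timestamps)
        gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
        counts[len(ordered)].update(gaps)

    result = {}
    for n, counter in counts.items():
        total = sum(counter.values())
        if total:  # users with a single action have no gaps
            result[n] = {tau: c / total for tau, c in counter.items()}
    return result


# Hypothetical toy data: user name -> event times in seconds.
activity = {
    "alice": [0, 10, 20, 50],  # 4 actions, gaps: 10, 10, 30
    "carol": [0, 30, 40, 45],  # 4 actions, gaps: 30, 10, 5
    "bob": [5, 15],            # 2 actions, gaps: 10
}
for n, pdf in sorted(pdfs_by_activity(activity).items()):
    print(n, pdf)
```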

Each panel reports the fraction of users $R(Q)$ whose activity patterns are described by the pdf $P^{(n)}(\tau)$ with a probability at least equal to $Q$, for different values of $n$. The qualitative comparison with Figure 2 tells us that $P^{(n)}(\tau)$ can describe the activity patterns of single users much better than the global $P(\tau)$. The reliability of $P^{(n)}(\tau)$ decreases, however, as $n$ increases.
In addition, we can see that $P^{(n)}(\tau)$ depends on $n$, in the sense that the decay exponent of this pdf varies as a function of $n$ (see Figure 4).

Each panel reports the inter-event time pdfs $P^{(n)}(\tau)$ for the same values of $n$ considered in Figure 3. $P^{(n)}(\tau)$ can be well fitted by a power law (dashed lines), but the decay exponent varies with $n$. In the presented cases we have: $\alpha \approx 1.1$, $1.2$, $1.8$ and $2.3$.
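The post does not say how the dashed-line fits were obtained. As one common option, the decay exponent of a power-law tail can be estimated with the maximum-likelihood formula for a continuous power law, exponent = 1 + N / Σ ln(τ / τ_min), applied to the gaps above a chosen cutoff. The sketch below uses that swapped-in technique; the cutoff `tau_min`, the function names and the synthetic sample are assumptions for illustration, and the estimator ignores the daily oscillations mentioned earlier.

```python
import math
import random


def powerlaw_exponent(gaps, tau_min=1.0):
    """Maximum-likelihood exponent of a continuous power law for gaps >= tau_min."""
    tail = [t for t in gaps if t >= tau_min]
    log_sum = sum(math.log(t / tau_min) for t in tail)
    if len(tail) < 2 or log_sum == 0.0:
        raise ValueError("not enough distinct gaps above tau_min")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, size, tau_min=1.0, seed=0):
    """Draw synthetic gaps with pdf proportional to tau^(-alpha) by inverse transform."""
    rng = random.Random(seed)
    return [tau_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(size)]


# Synthetic check: gaps generated with exponent 1.8 should give back a value near 1.8.
synthetic = sample_powerlaw(alpha=1.8, size=20000)
print(round(powerlaw_exponent(synthetic), 2))
```

Running the estimator separately on the gaps of each activity group would give one exponent per value of n, which is the quantity whose variation the figure illustrates.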
The importance of the result is twofold. First, it is important to stress that the study of the global $P(\tau)$ is meaningless. The global pdf is defined on the basis of a wrong hypothesis, and therefore every result obtained from its analysis is biased. Instead of a global pdf valid for every user, it is better to focus on the study of many different pdfs, each corresponding to users with similar activity. Second, the finding opens a new direction for modelling the process. New models are required in order to understand how and why the number of operations influences the decay exponent of the inter-event time pdfs.
People interested in this topic are invited to read our manuscript and visit the homepage, where all data can be freely downloaded.



