
329. Human Activity in the Web



Filippo Radicchi is a research scientist at the Complex Systems Lagrange Lab, ISI Foundation, Turin. He is interested in non-equilibrium diagrammatic methods, RG analysis of complex networks and community detection. Dmitry.

We spend a significant part of our time surfing the Web: we read news, post on blogs, share photos and music, buy books and other goods, and so on. The Web offers great possibilities for communicating and retrieving information, and no earlier technology can be compared to it in terms of global reach and speed of communication.

The Web is also an important source of information for scientific purposes. Actions performed on the Web are generally stored in electronic databases. Think, for example, of NEQNET: when you make a post or leave a comment, meaningful information about your action is stored in the database on the computer that hosts the service. In addition to the content of the message, your nickname and the time stamp of your message are saved. Electronic databases that collect information about human activity on the Web can therefore be used to understand how people behave and interact.

Earlier studies have already focused on computer-related human activities, with particular attention to the activity patterns of humans. Interesting information can be extracted from the statistical analysis of the so-called inter-event times. Imagine we know the instants of time t_1 \leq t_2 \leq \ldots \leq t_{n_i} at which a user i has performed n_i actions. From this information, we can calculate n_i-1 inter-event time gaps: \tau_1=t_2-t_1, \ldots, \tau_{n_i-1}=t_{n_i}-t_{n_i-1}. Then we can compute the inter-event time probability distribution function (pdf) of the i-th user as

P_i(\tau)=x_i(\tau) / (n_i-1),

where x_i(\tau) is the number of pairs of consecutive actions performed by user i that are separated by \tau units of time. The global pdf (calculated over the whole population) can then be calculated as

P(\tau)=\sum_i (n_i-1) P_i(\tau) / \sum_i (n_i-1)

which is basically the weighted average of the pdfs of single users: each user contributes to the global pdf in proportion to her or his overall activity. Global inter-event pdfs have been studied for e-mail communication [Nature 435, 207-211 (2005)], Web surfing [Phys. Rev. E 78, 026123 (2008)], etc. In all these cases, it has been shown that the global inter-event time pdf can be well fitted by a power law

P(\tau) \sim \tau^{-\beta},

where the exponent \beta ranges from 1 to 2, depending on the case studied. This finding is particularly relevant because it suggests that human activity is bursty: long periods of inactivity are followed by short periods of intense activity. Some models have been introduced to explain this emergent behavior [Nature 435, 207-211 (2005)]. More recently, it has been shown in [Proc. Natl. Acad. Sci. USA 105, 18153-18158 (2008)] that the power-law decay could be explained as the superposition of non-homogeneous Poissonian processes.
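The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy users are hypothetical, time stamps are taken to be integers in some fixed unit, and users with fewer than two actions are skipped because they have no gaps. Since (n_i-1) P_i(\tau) = x_i(\tau), the weighted average is the same thing as pooling the gaps of all users into one histogram.

```python
from collections import Counter


def inter_event_times(timestamps):
    """Return the n_i - 1 gaps tau_k = t_{k+1} - t_k of one user."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def user_pdf(timestamps):
    """P_i(tau) = x_i(tau) / (n_i - 1), as a dict {tau: probability}."""
    gaps = inter_event_times(timestamps)
    if not gaps:
        return {}  # fewer than two actions: no inter-event times
    counts = Counter(gaps)  # x_i(tau)
    return {tau: x / len(gaps) for tau, x in counts.items()}


def global_pdf(users):
    """Average of the P_i, each weighted by n_i - 1."""
    weighted = Counter()
    total_gaps = 0
    for timestamps in users.values():
        n_gaps = len(timestamps) - 1
        if n_gaps < 1:
            continue
        for tau, p in user_pdf(timestamps).items():
            weighted[tau] += n_gaps * p
        total_gaps += n_gaps
    return {tau: w / total_gaps for tau, w in weighted.items()}


# Hypothetical toy data: user name -> action times (arbitrary integer units).
toy_users = {
    "alice": [0, 1, 2, 10],  # gaps 1, 1, 8
    "bob": [5, 6],           # gap 1
}
toy_global = global_pdf(toy_users)
```

In the toy data, three of the four pooled gaps equal 1, so the more active user dominates the global pdf, as the weighting implies.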

In our paper, we study three very large databases: a large set of queries submitted to the search engine of America Online, all logged actions performed by users on the English website of Wikipedia, and a large set of feedback messages sent by users on the eBay (EB) website. The global inter-event time pdf calculated for the EB dataset is shown in Figure 1.

[Figure 1]

As one can clearly see, the global pdf is characterized by a power-law decay modulated by periodic (daily) oscillations. Note that the definition of the global pdf is meaningful only under the hypothesis that all users behave in the same way, which means that each \tau is a random variable drawn from the same pdf (the global one), independently of the user considered. This assumption is, however, wrong. If we test how well the global P(\tau) describes the activity patterns of single users, we see that this null hypothesis is significantly violated. A simple Kolmogorov-Smirnov (KS) test, which systematically compares the global P(\tau) with each single user's pdf (see Figure 2), shows that the fraction of users whose activity pattern is described by P(\tau) at a significance level at least equal to Q is much smaller than expected.
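A rough sketch of such a test follows. It is not the procedure of the paper, whose details are not given here. It assumes that the pooled gaps of the whole population act as an exact reference distribution, and it uses the asymptotic Kolmogorov formula with a common small-sample correction for the p-value. All function names are hypothetical. If every user really drew gaps from the reference distribution, the p-values would be roughly uniform and the fraction R(Q) of users with p-value at least Q would be close to 1 - Q. Because the time stamps are discrete, the p-values are only approximate.

```python
import math
from bisect import bisect_right


def ks_distance(sample, reference):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(sample), sorted(reference)
    distance = 0.0
    # Both CDFs are step functions, so checking the jump points is enough.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def ks_pvalue(distance, n):
    """Asymptotic one-sample KS p-value for a sample of size n (assumption)."""
    root = math.sqrt(n)
    lam = (root + 0.12 + 0.11 / root) * distance  # small-sample correction
    if lam < 0.2:
        return 1.0  # the series below converges slowly here; the value is ~1
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)


def fraction_described(user_gaps, reference_gaps, q):
    """R(Q): fraction of users whose KS p-value is at least q."""
    pvalues = [
        ks_pvalue(ks_distance(gaps, reference_gaps), len(gaps))
        for gaps in user_gaps
        if gaps
    ]
    if not pvalues:
        return 0.0
    return sum(p >= q for p in pvalues) / len(pvalues)
```

Plotting `fraction_described` against `q` and comparing it with the line 1 - Q gives the kind of comparison the text describes.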

[Figure 2]

The main reason for this discrepancy is the heterogeneity of the population in terms of the number of operations performed. Users do not all perform the same number of actions; instead, the number of users who have performed n operations follows a broad distribution. Interestingly, users performing the same number of operations have similar activity patterns. We define P^{(n)}(\tau) as the inter-event pdf averaged only over users who have performed n total actions. We first see that the statistical significance of P^{(n)}(\tau) is much better than that of P(\tau) (Figure 3).

[Figure 3]

Each panel reports the fraction of users R(Q) whose activity patterns are described by the pdf P^{(n)}(\tau) with a probability at least equal to Q, for different values of n. A qualitative comparison with Figure 2 tells us that P^{(n)}(\tau) describes the activity patterns of single users much better than the global P(\tau). The reliability of P^{(n)}(\tau), however, decreases as n increases.
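As an illustration, P^{(n)}(\tau) can be built by grouping users according to their number of actions before pooling their gaps. The sketch below assumes exact matching on n and integer time stamps; the function name and data layout are hypothetical. Because all users in a group have the same n, the weighted average and the plain average of their pdfs coincide.

```python
from collections import Counter, defaultdict


def pdf_by_activity(users):
    """Return {n: {tau: P^(n)(tau)}}, pooling users with exactly n actions."""
    pooled = defaultdict(Counter)
    for timestamps in users.values():
        ts = sorted(timestamps)
        n = len(ts)
        if n < 2:
            continue  # no inter-event times for a single action
        pooled[n].update(later - earlier for earlier, later in zip(ts, ts[1:]))
    result = {}
    for n, counts in pooled.items():
        total = sum(counts.values())
        result[n] = {tau: c / total for tau, c in counts.items()}
    return result
```

Each entry of the returned dictionary can then be tested against the users in its own group, in the same way as the global pdf is tested against the whole population.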

In addition, we can see that P^{(n)}(\tau) depends on n, in the sense that the decay exponent of this pdf varies as a function of n (see Figure 4).

[Figure 4]

Each panel reports the inter-event time pdf P^{(n)}(\tau) for the same values of n considered in Figure 3. P^{(n)}(\tau) can be well fitted by a power law (dashed lines), but the decay exponent varies with n. In the cases presented we have \beta ≈ 1.1, 1.2, 1.8 and 2.3.
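The post does not say how the dashed fits were obtained. One standard way to estimate a decay exponent, shown here only as a sketch, is the maximum-likelihood estimator for a continuous power law above a chosen cutoff: \beta = 1 + m / \sum_k \ln(\tau_k / \tau_{min}), where m is the number of gaps at or above the cutoff. The cutoff `tau_min`, the function names and the synthetic sample are assumptions. The estimator treats \tau as continuous and requires \beta > 1.

```python
import math
import random


def powerlaw_exponent_mle(taus, tau_min):
    """Estimate beta in P(tau) ~ tau^(-beta) for tau >= tau_min."""
    tail = [t for t in taus if t >= tau_min]
    if not tail:
        raise ValueError("no inter-event times at or above tau_min")
    log_sum = sum(math.log(t / tau_min) for t in tail)
    if log_sum == 0.0:
        raise ValueError("all values equal tau_min; exponent undefined")
    beta = 1.0 + len(tail) / log_sum
    std_error = (beta - 1.0) / math.sqrt(len(tail))
    return beta, std_error


def sample_powerlaw(beta, tau_min, size, rng):
    """Draw synthetic gaps by inverse-transform sampling (for checking only)."""
    return [
        tau_min * (1.0 - rng.random()) ** (-1.0 / (beta - 1.0))
        for _ in range(size)
    ]


# Synthetic check: recover a known exponent from generated data.
rng = random.Random(0)
synthetic = sample_powerlaw(1.8, 1.0, 20000, rng)
beta_hat, beta_err = powerlaw_exponent_mle(synthetic, 1.0)
```

Applied separately to the gaps of each group of users with the same n, an estimator of this kind would give one exponent per group, which is the quantity the panels compare.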

The importance of the result is twofold. First, it is important to stress that the study of the global P(\tau) is meaningless. The global pdf is defined on the basis of a wrong hypothesis, and therefore every result obtained from its analysis is biased. Instead of a global pdf valid for every user, it is better to study many different pdfs, each corresponding to users with similar activity. Second, the finding opens a new direction for modelling the process. New models are required to understand how and why the number of operations influences the decay exponent of the inter-event time pdfs.

People interested in this topic are invited to read our manuscript and visit the homepage, where all data can be freely downloaded.
