
Why I think the p-value is less important (for business decisions)

This article represents only my own opinion.

When Internet companies started to embrace experimentation (e.g. A/B testing), they were told that this was the scientific way to make judgements and support business decisions. Then statisticians taught software engineers the basic t-test, p-values and so on. I am not sure how many developers really understand those complicated statistical theories, but they use them in a practical way: as a method to verify whether their product works.

What do they look at? Well, the p-value. They have no time to think about what the "average treatment effect" is, or what the hell "standard error" and "standard deviation" are. As long as they know the p-value helps, that's enough. The same goes for many applied researchers who follow this criterion in their specialized fields.
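For readers who, like those developers, never had time for the definitions, here is a minimal sketch of what sits behind the p-value in a simple A/B test. It assumes a large-sample normal approximation (a z-test on the difference in means rather than the exact t-distribution), the conversion data are made up, and the function name `ab_summary` is hypothetical. It reports the estimated average treatment effect (difference in means), its standard error, an approximate 95% confidence interval and the two-sided p-value.

```python
import math
from statistics import mean, variance


def ab_summary(control, treatment):
    """Summarize a two-group experiment using a large-sample normal approximation."""
    # Estimated average treatment effect: difference in group means
    ate = mean(treatment) - mean(control)
    # Standard error of that difference, from each group's sample variance
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    z = ate / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Approximate 95% confidence interval for the effect
    ci95 = (ate - 1.96 * se, ate + 1.96 * se)
    return {"ate": ate, "se": se, "z": z, "p_value": p_value, "ci95": ci95}


# Hypothetical data: 1 = converted, 0 = did not; 10% vs 12% conversion
control = [1] * 100 + [0] * 900
treatment = [1] * 120 + [0] * 880
result = ab_summary(control, treatment)
print(result)
```

With these made-up numbers the estimated effect is +2 percentage points, yet the p-value is roughly 0.15 and the interval includes zero: a single yes/no reading of the p-value hides an estimate that is positive but uncertain.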

In past years, professionals have repeatedly claimed that p-values are not interpreted the way they should be; people should look at blah blah blah... But the problem is that there is no better substitute for the p-value. Yes, you should not rely on a single number, but will two numbers be much better? Or three? Or four? In the end, someone has to make a binary yes/no decision.

So why do I think, as the title says, that the p-value is not as important for business decisions as it is treated today? My concern is how accurate you really have to be. Data and statistical analysis will never give you a 100% certain answer about what should be done; any statistical analysis has some kind of flaw (choice of metrics, data cleaning, modelling, etc.). Nothing is perfect. Therefore, yes, the p-value is informative, but more as a directional signal. People make decisions without data anyway, and there is no guarantee that this scientific approach gives you the optimal path of business development. Many great experts have enough experience to put less weight on the outcome of a single experiment. This is a long-run game, not a one-shot one.

Plus there is the opportunity cost: an experiment is not a free lunch. You lose the chance to improve the experience of users in the control group; you invest in tracking all the data and conducting the analysis; it takes time to design and run a proper test, etc. In a fast-developing environment, experiments can barely tell you anything about the future; their predictive power is very limited. If you are in heavy competition, going fast is more important than waiting. Therefore, instead of making adaptive decisions based on every single experiment, just go and try everything. The first decision you should make is how much you want to invest in testing and waiting; if the cost of a failed product is not high, then just go for it.
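To put a rough number on "it takes time to design and run a proper test", here is a sketch of the standard normal-approximation sample-size formula for comparing two conversion rates (two-sided test at significance level alpha, with the desired power). The baseline and lift values are hypothetical, and `sample_size_per_group` is an illustrative name, not a library function.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_control vs p_treatment."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    var_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * var_sum / (p_control - p_treatment) ** 2
    return math.ceil(n)


# Hypothetical baseline conversion of 10%, detecting a lift to 12% or to 11%
n_big_lift = sample_size_per_group(0.10, 0.12)
n_small_lift = sample_size_per_group(0.10, 0.11)
print(n_big_lift, n_small_lift)
```

Under these assumptions that is roughly 3,800 users per group for the larger lift and roughly 14,700 for the smaller one. Because the formula divides by the squared difference, halving the lift you want to detect roughly quadruples the sample, which translates directly into more waiting and more users kept in the control group.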

I love these cartoons from the book "How Google Works". You have to be innovative in the Internet industry because you are creating a new world that nobody could have imagined before.

[Images: four cartoons from "How Google Works" (screenshots dated 2015-02-06)]

Expectation, epidemics and the social network

For my final master's project, I chose "social network" as one key word, not surprisingly. Then I needed a context to apply it to.

Actually, I began thinking about the topic last Christmas. However, things did not go as well as I expected. A few months later I posted a notice on my Chinese blog looking for someone to collaborate on this. Unfortunately, that did not work out ideally either.

Now there is only one month left for me to work on it. Moreover, I noticed that the 4th Chinese R Conference is going to take place on the 28th of May. Since social networks are one of the topics on the list, I really want to go back and see what's going on. Ideally, if I can finish the paper by the 21st of May and also find funding for the trip, I would like to go back to China for a few days and give a talk... But then time is pretty tight, and I don't even know where I can ask for money. Does UPF or BGSE have flyout funding? Any helpful information?

Well... a little off topic. This time, we found several great papers to follow. Before, I had only read Jackson's book on social and economic networks, so I knew nothing about health. Interestingly, more people from sociology are working on social networks. When I checked out some demography journals, tons of relevant papers appeared.

However, although we now have enough ideas and there seems to be a feasible path, a lot of questions remain. As I wrote down a year ago, my idea centers on "information". In particular, here I want to stress the importance of information for expectations. In this context, when it comes to health, people tend to be either overly optimistic or passive, and as a result the risk is either underestimated or overestimated. Common sense says that if people do not hold a nearly correct expectation, their behavior will certainly diverge from the optimal path. This time, I want to find a way to model this process and, further, to check the validity of the model with some empirical methods.

Maybe now most of the challenges concern time. Luckily, I only have two courses to take this semester, which leaves me more free time to work on the project. But still, I don't know how much I can finish. Things are getting more and more stressful.

Among all issues related to (public) health, this time we may pick an epidemic, and a good choice would be HIV/AIDS. HIV has become such a hot topic in recent years, and as a sexually transmitted disease, it has a close link to social networks, or more specifically, sexual networks, including both homosexual and heterosexual ones. Moreover, risk perceptions of HIV/AIDS also influence or even determine the corresponding sexual behaviors; that may be one reason why there are so many targeted public education campaigns in countries all around the world.

Hopefully I was not too optimistic this time (otherwise I could not follow the optimal path either ^_^), and also not too ambitious. Learning by doing: that's how we will benefit from this project.

Natural experiment on public good and social behaviors

Today I suddenly saw the latest issue of the AER (Sep. 2010; how late I was!). I couldn't stop myself from having a look at it, although it is already exam week. After weighing it up, I chose to have a brief look at this issue. It is a kind of evidence that my life has become worse: I used to at least read the table of contents of every issue of the AER, JPE and QJE each season... Now it has been such a long time since I had access to these journals. I couldn't help it, and I was so excited when I saw the red cover of the AER again. (I was also overexcited when I heard the news that LyX 2.0 is going to be released. Fantastic new features: Advanced Search, spell-checking on the fly, Multilingual Thesaurus, table features, Progress view and debugging pane, Instant preview inset, etc.! Awesome! For details, just jump to http://wiki.lyx.org/LyX/NewInLyX20)

OK. Back to the AER. Two articles attracted my interest:

First things first. What attracted me first were the words "social", "contribution", "online communities" and "field experiment". Without doubt, it immediately reminded me of Michael Zhang's paper (also, here is a brief intro in Chinese I wrote before).

Then the author. The world is so small. I met Sherry Xin Li just this spring, when I was spending a month auditing courses and seminars at Tsinghua and Peking University. Her topic was "虚拟世界实验中的社会距离问题", or "Social Distance in a Virtual World Experiment". It was an interesting paper about experiments they ran in the virtual world game "Second Life". So it was interesting for me to see her work again, since the online market is one I always pay attention to.

In this paper, she and her coauthors designed several experiments to study how social comparison increases contributions to an online community. Here are some of the main results:

  • After receiving behavioral information about the median user’s total number of movie ratings, users below the median demonstrate a 530% increase in the number of monthly movie ratings, while those above the median do not necessarily decrease their ratings.
  • When given outcome information about the average user’s net benefit score, above-average users mainly engage in activities that help others.
  • Effective personalized social information can increase the level of public goods provision.

Public goods are always an interesting topic for economists, perhaps because market inefficiency and market failure have always concerned economists. Now I'm interested in social behaviors. Can we actually design a mechanism to improve the supply of public goods? What are the necessary conditions for such mechanisms to function well?

Moreover, another interesting point is how to utilize data from the Internet. Frank has a good comment on it. I'm learning from more and more published papers to see where the pitfalls are.

OK, on to the second paper. "The law of the few" refers to an empirical phenomenon: in social groups a very small subset of individuals invests in collecting information, while the rest of the group invests in forming connections with this select few. The interesting thing here is that some people prefer to invest in information while the rest invest in forming connections (Which type am I? Both??? Haha~)

It is a fairly traditional social network application, and a little old (the working paper version I found was written in 2007, and now it is 2010! What a long publishing cycle!). The authors discuss the structure of social networks and, of course, network games. The meaningful suggestion for policy makers (either governments or advertisers) is that

by collecting information about the communication network, for example by asking a subset of the community members to report “with whom they talk to” about a particular matter, the government can identify an opinion leader, the individual who receives most nominations. Each dollar spent on this opinion leader will then spill over to all community members.

This has been confirmed by some of my friends who use similar strategies in their broadcasting.
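The procedure in the quoted suggestion (ask a subset of members whom they talk to, then pick the person with the most nominations) is simple enough to sketch in a few lines. This is only an illustration, not the authors' code; the function name, the member names and the survey responses are all hypothetical, and ties between equally nominated people are broken arbitrarily.

```python
from collections import Counter


def find_opinion_leader(reports):
    """Return (person, nominations) for the most-nominated individual.

    `reports` maps each surveyed member to the people they say they talk to.
    """
    nominations = Counter()
    for respondent, contacts in reports.items():
        # Count each named person once per respondent and ignore self-nominations
        for person in set(contacts) - {respondent}:
            nominations[person] += 1
    if not nominations:
        raise ValueError("no nominations reported")
    return nominations.most_common(1)[0]


# Hypothetical survey of a small community
reports = {
    "Ana": ["Ben", "Chen"],
    "Dev": ["Chen"],
    "Eli": ["Chen", "Ben"],
    "Fay": ["Ben", "Chen", "Ben"],
}
print(find_opinion_leader(reports))
```

Here "Chen" is named by all four respondents and "Ben" by three, so "Chen" would be the person to target under this rule.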

Fine... I'm thinking about my master's project now, so all I care about is how to apply social networks to a certain field. Hopefully I can find some interesting points in a few days. But anyway, I need to pay attention to the exams. Back to books and problem sets.

Social behavior vs. individual behavior

Recently I've been focusing on an interesting model, the social network, and thinking about an old issue, social behavior. The social network model is really attractive (if it can be regarded as a model, or more exactly, a theory), especially for people like me who question the traditional assumptions of classical economics while trying to borrow something from other subjects, like sociology.

Using complex networks as a mathematical tool, the social network model includes many factors that could not be described before. I should admit here that I haven't taken any related courses yet, and all I know about sociology comes from some non-specialist books. I should also admit that I was really attracted by behavioral economics last winter, but only now have I started to "study" it in an academic way. Maybe the reasons are pretty simple: first, I was busy applying for postgraduate positions last winter; second, there was no good teacher able to teach this course; third, I had no idea which book (or textbook) to choose as an introduction.

I need some time to get clear in my mind how social networks work. In the meantime, though, I've got a much easier question. In traditional macroeconomic models, like Lucas' island model, when we calculate the sum of all individuals' savings, most of the time we simply add them together (i.e. a*N, where a is an individual's saving and N is the population size). However, I think an individual's saving should at least be regarded as a stochastic variable (e.g. following a normal distribution, or a Brownian motion), so the sum should also follow a normal distribution. In econometrics we do not need to worry about this question. But I wonder whether it would be valuable to make this small change.
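A quick way to see this "small change" is to simulate it. The sketch below assumes each of N individuals saves an independent draw from a normal distribution with mean mu and standard deviation sigma; the parameter values and the function name are hypothetical, and the setup is not taken from Lucas' model itself. The simulated total centers on N*mu, which is the usual a*N with a = mu, but it now has a spread of sigma*sqrt(N).

```python
import random
import statistics


def simulate_total_saving(n_people, mu, sigma, n_draws, seed=0):
    """Draw the aggregate saving n_draws times, each as a sum of n_people
    independent normal individual savings."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n_people))
            for _ in range(n_draws)]


N, MU, SIGMA = 100, 1.0, 0.5
totals = simulate_total_saving(N, MU, SIGMA, n_draws=5000)
# Sample mean and standard deviation should be close to N*MU and SIGMA*sqrt(N)
print(statistics.mean(totals), statistics.stdev(totals))
```

With independent normal draws the total is exactly normal with mean N*mu and variance N*sigma^2; for other distributions the central limit theorem makes it approximately normal for large N. The relative spread, sigma/(mu*sqrt(N)), shrinks as N grows, which is one reason the deterministic a*N shortcut often works in aggregate; the shortcut is less safe if individual savings are correlated, since correlated shocks do not average out.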

With the example above, I want to point out that the micro-foundations of macroeconomics should be dug into more deeply in order to persuade readers. By borrowing some mature conclusions from other fields, this task should become easier. Alternatively, economists should make an effort (maybe something can be found in the data) to learn more about the relationship between individual behavior and social behavior, and thereby build a proper model to describe it. Personally speaking, the study of social networks may be helpful, since unlike other social sciences, it at least has a mathematical model... And the diffusion of information could easily be introduced into macroeconomic models this way... Wow!