Tuesday, April 14, 2009

Data Mining as Psychotherapy: How the petabyte age could help us to know our selves and why that's nothing to be scared of


Let's start with a problem. Make it a personal problem. I mean "personal" in two senses: "personal" as in something that you wouldn't want to talk about with anyone besides a very close friend or a therapist, and "personal" as in specific to you and only you. Say you're depressed, or that you're engaging in what you know to be a self-destructive pattern of behavior. How can you solve this problem, or avoid repeating it in the future?

Now, let's pretend you had a magical machine that could track every bit of thought and experience you had from your birth to this moment. The data produced by the machine tells you everything that led up to the moment of that negative outcome. Well, not everything. Only the things that you were a part of. If a butterfly flapped its wings in China the day after you were born, the machine would not record that. It is possible that things you were not directly a part of could have a profound effect on you, leading you to become depressed or engage in shitty behavior (see sensitive dependence on initial conditions). Nevertheless, if you have to limit your recording of information somehow due to technical limits, of which there will always be some, and your goal is to find out the cause of a problem that relates to you specifically, then all experience and thought related directly to you is a good place to start.

Now that you've got all that information, you could look for patterns in the data, or maybe the machine could look for them for you. You'd notice a recurring pattern of actions or thoughts that you keep choosing and that leads up to the negative outcome you want to avoid. We have an intuitive grasp of the kinds of behavior or thought that lead up to such outcomes (e.g., I'm depressed because I looked at a picture of a dead relative, which reminded me of how much I missed them and of my own mortality. Maybe I shouldn't look at that picture so often). But sometimes we can't see those patterns, either because we don't want to acknowledge that things that produce pleasure in the short term might bring us displeasure in the long term, or because we just can't remember everything that we thought, felt, or did over the course of our entire lives.

The first problem is an objectivity/subjectivity problem, solved by asking a trusted friend or a therapist for advice. The second problem is a surveillance problem: no other person is there for our every waking moment, and even if they were, they couldn't see inside our heads (the closest you could come to that would be a parent or a lover). This creates a trade-off: you're the only person with access to your entire history and your thoughts and feelings, but you're not especially objective about them. The third problem is a cognitive load problem: no one person can hold that much information about one person's life in his/her head.

Enter the machine. The machine can hold far more information than any human head. The machine also makes the data available to you or to trained professionals (though, once you were able to see the patterns linking the short-term pleasurable actions with the long-term displeasurable consequences, you wouldn't need anyone else to tell you to cut it out). The machine could be with you every waking minute. Heck, it could even record what's going on in your head while you're asleep.

I'm not saying that the machine would allow you to see with complete certainty what led to that negative outcome, but it would be on a par with what we take to be causes and effects in the hard sciences. In other words, it would reduce the probability of there being some other explanation for your misery to virtually nil. So, you take the data, you make note of the patterns of thought and behavior that led up to the negative outcomes, and you choose not to think or do the things that led up to those outcomes. Even better, you look at the patterns that lead up to your happiest moments, and you repeat those.
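To make that loop concrete, here's a minimal sketch of what "make note of the patterns" might look like in code. Everything in it is hypothetical: the event names, the mood scores, and the toy log are stand-ins for whatever the machine would actually record. It just counts how often each logged event precedes a low-mood day.

```python
from collections import Counter

# Hypothetical log: (day, events that day, next-day mood on a 1-10 scale).
# All names and values are illustrative, not from any real system.
log = [
    ("mon", {"looked_at_old_photos", "skipped_gym"}, 2),
    ("tue", {"called_friend", "went_for_walk"}, 8),
    ("wed", {"looked_at_old_photos", "stayed_up_late"}, 3),
    ("thu", {"went_for_walk"}, 7),
    ("fri", {"stayed_up_late", "skipped_gym"}, 4),
]

LOW_MOOD = 5  # threshold below which a day counts as a "negative outcome"

# Count how often each event precedes a low-mood day vs. any day at all.
preceding_bad = Counter()
preceding_any = Counter()
for _, events, mood in log:
    for event in events:
        preceding_any[event] += 1
        if mood < LOW_MOOD:
            preceding_bad[event] += 1

# Rank events by how reliably they precede bad days.
for event, total in preceding_any.items():
    rate = preceding_bad[event] / total
    print(f"{event}: precedes a low-mood day {rate:.0%} of the time ({total} occurrences)")
```

With a lifetime of real data instead of five toy days, the same idea (plus honest statistics about confounders and sample size) is the whole therapeutic move: surface the correlations you can't or won't see yourself.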

The machine does not exist. Yet. But I think we've got a prototype: Google. Google tracks our searches, which might tell us a little more about our patterns of behavior than we know ourselves. A prototype also exists in the form of spyware that tracks our every click on the internet. The more of our desires and thoughts and feelings and actions play out on computers, the closer we come to having something like "the machine." (If you include mobile tech and its ability to track us throughout the day, you have an even better approximation of the machine.)
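Even the prototype version supports a bit of self-surveillance. Suppose, hypothetically, you could export your search history as timestamped lines; the file format below is an assumption for illustration, not any real export format. A few lines of code could then flag temporal patterns, like clusters of late-night searching:

```python
from collections import Counter
from datetime import datetime

# Hypothetical export: one "ISO-timestamp<TAB>query" line per search.
sample = """\
2009-04-10T02:13:00\thow to fall asleep
2009-04-10T14:02:00\tlunch places near me
2009-04-11T01:47:00\twhy am I always tired
2009-04-11T03:05:00\tinsomnia causes
"""

late_night = Counter()  # searches between midnight and 5am, grouped by date
for line in sample.splitlines():
    stamp, query = line.split("\t", 1)
    when = datetime.fromisoformat(stamp)
    if when.hour < 5:
        late_night[when.date()] += 1

for day, count in sorted(late_night.items()):
    print(f"{day}: {count} late-night searches")
```

Crude, but that's the point: the raw material for this kind of self-knowledge is already accumulating, whether or not we choose to look at it.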

Is the machine to be feared? I'd guess that most people would say yes, but I wouldn't agree. To me, that's like saying that you're afraid of knowing yourself, or afraid of knowledge in general. What we're afraid of is the misuse or misinterpretation of information. But should that keep us from garnering what we know to be more accurate information about our selves? There is such a thing as responsible data interpretation. In order to engage in responsible interpretation, it is essential to start with this assumption: the information we're dealing with is imperfect and incomplete, and yet it may offer us insight into our thoughts and actions that is superior to (or supplements) what we're currently working with. We need to engage in systematic testing of the circumstances in which this information does provide us with insight, and we need to identify misuse and misinterpretation and discourage it.

The other choice is the one that I think too many people make, mostly out of fear and laziness (it's easier to dismiss the entire enterprise of data mining than to learn how to do it responsibly, teach people how to interpret data properly, and teach them how to tell whether someone else is interpreting it properly). I think that, on some fundamental level, we fear the data itself, not the corporations or the governments or the scientists who are gathering and using it. Given the rate at which behavioral data is piling up, that stance becomes more irresponsible each day. We can either let someone else aggregate all this data and learn why we do things and how to manipulate us, or we can take control of our own destinies and learn how our minds work, beating the others to the punch and altering our behaviors so as to become less predictable. There's no going back to the pre-data age.

And really, the machine is just an extension of established sciences that spring from our desire to know ourselves more fully. Psychology, sociology, and really all of the social sciences are imperfect versions of the machine. They look for patterns leading up to outcomes we judge to be good or bad, but they have huge blind spots. Those blind spots are shrinking, though. Sociology and psychology have always been the red-headed stepchildren of "hard sciences" like physics and chemistry. A great deal of this ill will comes from the fact that social science cannot deliver the levels of certainty that are the norm in the hard sciences (hence the tolerance of smaller effect sizes in social science). That, too, may change in the petabyte age.
