The other week I got a new smartphone. I have enough battery power for a whole day again and the pictures turn out beautifully. The hardest part, however, is getting used to the new keyboard, and, the other way around, its algorithms are having a hard time getting used to my typing habits. Around the same time, a brand-new study was published that predicted users’ affective states from their smartphone touch data.
What are affective states?
One of the most influential researchers when it comes to human emotions is James Russell. In 1973, he published his classification of affective states along three distinct, bipolar dimensions: valence, arousal and dominance.
Valence, also known as hedonic tone, describes the positivity or negativity of a behavior, feeling, event, situation, interaction, and so on. Think of “good” and “bad” as the two ends of the scale. Valence can be used to describe emotions: joy, for example, has a positive valence, whereas sadness has a negative valence.
Arousal can be seen as an individual’s degree of alertness or attentiveness at a given time. Again, it helps us categorize emotions: excitement goes along with a high level of arousal, whereas being relaxed means low arousal.
Dominance ranges from controlling to controlled. Think about negative emotions such as anger or fear. They share the same valence (“bad”, “negative”) but while anger is a dominant emotion, fear can be considered submissive.
Since then, many researchers have tried to map all kinds of emotions along these dimensions.
Fast-forward to 2020, when researchers at ETH Zürich were able to predict these dimensions from our smartphone typing behavior.
Now how does emotion detection based on touch data work?
Let’s dive into the study at hand, which predicted emotional states by feeding smartphone typing data into a semi-supervised model. To do so, 70 university students took part in chat conversations designed to evoke certain affective states. The participants were meant to perceive the four conversations as exciting, shocking, rude and confusing (I highly recommend reading about the cover stories used to evoke these emotions in the original publication; the “rude” one in particular is hilarious).
However, all four conversation partners were, you guessed it, the experimenter sitting next door or, in one case, a chatbot. The researchers reached deep into their bag of tricks: they told the participants that their conversation partners were real people and created fake profile pictures for each contact.
From time to time, participants had to self-report their affective state in terms of valence, arousal and dominance, as well as some basic emotions (anger, sadness, happiness, surprise, stress). The Self-Assessment Manikin (SAM) scale (see picture below) is a popular tool for assessing affective states with the help of adorable cartoon characters. These self-reports capture so-called “ground truth” data, which can be used to label the touch data and to train and evaluate intelligent models.
The researchers built their model on pressure and touch-speed data collected during the chats, which they synthesized into heatmaps.
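To make the labeling step concrete, here is a minimal Python sketch that turns one SAM self-report into the three levels (low, medium, high) the model predicts. It assumes a 9-point rating scale and equal-width bins; this article doesn't say how the study actually binned the ratings, and the names `sam_to_level` and `label_report` are hypothetical.

```python
LEVELS = ("low", "medium", "high")


def sam_to_level(rating: int, scale_max: int = 9) -> str:
    """Map a SAM rating (1..scale_max) to one of three equal-width levels.

    Assumption: a 9-point scale, so 1-3 -> low, 4-6 -> medium, 7-9 -> high.
    """
    if not 1 <= rating <= scale_max:
        raise ValueError(f"rating must be between 1 and {scale_max}")
    # Integer arithmetic splits the range 1..scale_max into three bins.
    return LEVELS[(rating - 1) * 3 // scale_max]


def label_report(report: dict) -> dict:
    """Turn one self-report, e.g. {'valence': 7, 'arousal': 2}, into level labels."""
    return {dimension: sam_to_level(score) for dimension, score in report.items()}


if __name__ == "__main__":
    print(label_report({"valence": 7, "arousal": 2, "dominance": 5}))
```

Each labeled report can then be attached to the touch data recorded just before it, which is what makes it usable as ground truth.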
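The article doesn’t describe the heatmap layout, so the following Python sketch is only an illustration under stated assumptions: each touch is a hypothetical `Touch` record with keyboard coordinates normalized to [0, 1], and the keyboard area is divided into a 10 × 4 grid whose cells hold the mean pressure (or speed) of the touches that landed there.

```python
from dataclasses import dataclass


@dataclass
class Touch:
    # Hypothetical event format: x and y are normalized keyboard coordinates in [0, 1].
    x: float
    y: float
    pressure: float
    speed: float


def touch_heatmap(touches, feature="pressure", cols=10, rows=4):
    """Average one touch feature per grid cell; cells without touches stay 0.0."""
    sums = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for t in touches:
        # Clamp so that a coordinate of exactly 1.0 falls into the last cell.
        c = min(int(t.x * cols), cols - 1)
        r = min(int(t.y * rows), rows - 1)
        sums[r][c] += getattr(t, feature)
        counts[r][c] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(cols)]
        for r in range(rows)
    ]
```

A fixed-size grid turns a variable number of keystrokes into an image-like input of constant shape, which is convenient for the kind of model that learns from heatmaps.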
The model was able to predict valence, arousal and dominance at three levels (low, medium, high) with a maximum AUC (area under the ROC curve, where ROC stands for Receiver Operating Characteristic) between 0.82 and 0.84. Very simply put, AUC measures how good a classifier is; more precisely, it assesses how well the classes are separated. An AUC of 0.5 corresponds to pure chance, and values above 0.8 can be considered good. If you have never heard of ROC and AUC before, I found some beginner’s resources from Google and Devopedia.
Beyond the three dimensions of valence, arousal and dominance, the model can also predict basic emotions such as anger, happiness, sadness, surprise and stress.
According to the study, and in line with past research, pressure is a good predictor of arousal. Maybe you will remember this the next time you are stress-texting 🙂 The researchers also found that positive valence comes with increased typing speed.
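AUC has a handy interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as half). The Python sketch below computes it exactly that way and, for three levels, averages one-vs-rest AUCs over the classes. Macro one-vs-rest averaging is my assumption here; the study’s exact multiclass procedure isn’t described in this article. For real work, scikit-learn’s `roc_auc_score` does this for you.

```python
from itertools import product


def binary_auc(scores_pos, scores_neg):
    """P(random positive scores higher than random negative); ties count 0.5."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score with every negative score (fine for small data).
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def macro_ovr_auc(labels, probs, classes=("low", "medium", "high")):
    """One-vs-rest AUC per class, averaged over the classes.

    probs[i][k] is the model's score for classes[k] on example i.
    """
    aucs = []
    for k, cls in enumerate(classes):
        pos = [p[k] for y, p in zip(labels, probs) if y == cls]
        neg = [p[k] for y, p in zip(labels, probs) if y != cls]
        aucs.append(binary_auc(pos, neg))
    return sum(aucs) / len(aucs)
```

A perfect separator scores 1.0, and a model that assigns random scores ends up near 0.5, which matches the “pure chance” baseline mentioned above.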
Why should my smartphone know about my mood?
If technical systems are able to detect our emotions, they can adapt and provide us with tailored experiences. Have you ever been really angry and sent a text you regretted later? In the future, your smartphone might be able to prevent that. Or maybe you have taken an e-learning class and got demotivated or bored at some point. An adaptive system could detect your emotions and respond accordingly, e.g., with an extra motivational boost. Even industrial robots are being trained to detect the emotions and personality of their fleshy co-workers to increase safety and operational performance.
I searched the web and found an article on affective computing published in WIRED magazine in 1996 (it is nearly as old as I am). Back then, the authors Negroponte and Picard asked: “What if Microsoft could access a database of affective information from people interacting with its software and modify the parts that annoy people the most?” “Yes, mates, what if?” I’m asking myself 24 years later. 🙂
At CES 2019 in Las Vegas, Nuance Automotive showcased an in-car assistant that analyzes drivers’ emotions based on facial expressions and voice. We are likely to see more adaptive features in the automotive world soon, since they have the potential to increase driver safety.
Amazon has realized the potential of emotion detection for its services, too. Interaction with Amazon’s Alexa in particular will benefit from information on users’ emotional states and the ability to adapt. Reports on extensive research in the area of emotion detection date back to 2017, followed by specific patent filings in 2018 and rumors about a health and wellness gadget using emotion detection in 2019. Also last year, Amazon researchers published work on a generative model for voice-based emotion detection. In my view, this piece of research is really interesting. In the past, training models in a supervised fashion required voice data labeled by experts, so improving the technology was limited by the quantity of labeled data, a bottleneck that self-teaching AI makes a thing of the past. The fact that the new model also produced more accurate results is just the icing on the cake.
Some years ago, I even built my own low-cost emotion detector as a university project. I used the credit-card-sized Raspberry Pi computer and a free trial account of Google’s Vision API to test it out. With a camera, I took pictures of faces, which were then uploaded to Google Cloud and classified. Next, I attached and programmed a small LCD display to show the (simplified) result of the analysis. Here are some pictures of a first prototype:
If you want to try it out yourself, I found a tutorial online that will guide you through the process.
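For the curious, here is a minimal Python sketch of the classification step of such a detector; it is not the original project code. `classify_photo` uses face detection from the `google-cloud-vision` client library, which rates joy, sorrow, anger and surprise as likelihoods from VERY_UNLIKELY (1) to VERY_LIKELY (5). `simplify` is a hypothetical helper that reduces these ratings to a single word short enough for a small LCD, and the POSSIBLE (3) threshold is my own assumption. Running `classify_photo` requires the library and Google Cloud credentials.

```python
def simplify(likelihoods: dict) -> str:
    """Pick the most likely emotion; report 'neutral' if none reaches POSSIBLE (3)."""
    emotion, level = max(likelihoods.items(), key=lambda item: item[1])
    return emotion if level >= 3 else "neutral"


def classify_photo(path: str) -> str:
    # Imported here so simplify() works without the library installed.
    # Requires `pip install google-cloud-vision` and configured credentials.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.face_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    if not response.face_annotations:
        return "no face"
    face = response.face_annotations[0]  # only the first detected face
    return simplify({
        "joy": int(face.joy_likelihood),
        "sorrow": int(face.sorrow_likelihood),
        "anger": int(face.anger_likelihood),
        "surprise": int(face.surprise_likelihood),
    })
```

Keeping `simplify` separate from the API call makes it easy to test the display logic on the Raspberry Pi without uploading any pictures.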
If everyone can analyze emotions, what’s the fuss about?
As we can see, there are good reasons for technology to know about our emotions, and researchers have been using different methods to detect our mood for many years.
In my opinion, what distinguishes this research from other ways of detecting affective states is its non-invasive nature. I don’t know if people would opt in to being video-recorded by their devices 24/7 so that their emotional states can be assessed. Likewise, I wouldn’t be too happy if my apps ran sentiment analysis on every word I type to classify my emotions. If you want to learn more about sentiment analysis, I recommend the article “everything there is to know about sentiment analysis” (yes, I was lured by the title).
Another benefit of emotion detection based on touch data is its efficiency. The researchers reported a total processing time of 0.45 seconds, which is roughly the blink of an eye (a blink takes around 300 to 400 milliseconds, to be precise).
I hope you enjoyed this article and will think of it the next time you are rage-texting! 🙂
Source: Wampfler, R., Klingler, S., Solenthaler, B., Schinazi, V. R., & Gross, M. (2020, April). Affective State Prediction Based on Semi-Supervised Learning from Smartphone Touch Data. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-13).