Using Big Data to Identify Four Personality Types

Post by Anastasia Sares

What's the science?

The effort to understand and categorize human personalities has been going on for millennia. Still, we don’t have a very good intuition about how many personality types there should be, or what kinds of traits are important in defining personalities. The most well-established model of personality traits to date is the Big Five, or the Five-Factor Model, which defines five dimensions of personality: extraversion, openness to experience, conscientiousness, agreeableness, and neuroticism. If we mathematically assess many people's responses to personality questionnaires, patterns emerge, and a small number of stable traits can often be found. How do we get from traits that everyone has to some degree, such as high extraversion or low agreeableness, to different ‘categories’ of people’s personality types? This week in Nature Human Behaviour, Gerlach and colleagues attempt to categorize personalities using online survey data from over 1.5 million people.

How did they do it?

The authors acquired four different data sets, each consisting of an online personality questionnaire with 100,000 to 500,000 responses. The questionnaires had different numbers of questions (44, 100, 120, and 300). They analyzed the 300-question data set to develop their initial typology. The authors first extracted the Big Five trait scores for each person using a factor analysis. The next step was to look for clusters of people that had similar profiles across the traits (using a clustering algorithm). At this point, they obtained 13 different clusters, a number that seemed quite high. So they randomized the data by shuffling the trait scores, and compared the randomized data to the real data. They identified the clusters where the density in the real data was distinctly larger than in the randomized data, ending up with four clusters corresponding to four distinct personality types. Finally, the authors analyzed the other three data sets in the same way as the first one. This way, they could make sure that their results were reproducible.
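
The randomization check described above can be illustrated with a small sketch. This is not the authors' code or their exact method: it assumes synthetic data with two traits instead of five, candidate cluster centers chosen by hand rather than by a clustering algorithm, a simple neighbor count as a stand-in for a density estimate, and a hypothetical cutoff of 1.5 for "distinctly larger". All function names are made up for illustration.

```python
import random


def make_synthetic_scores(rng, n_background=1600, n_cluster=400):
    """Hypothetical data: two trait scores per person, with one planted dense cluster."""
    people = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n_background)]
    people += [(rng.gauss(1.5, 0.3), rng.gauss(1.5, 0.3)) for _ in range(n_cluster)]
    return people


def shuffle_traits(people, rng):
    """Null model: permute each trait column independently across people.

    Each trait keeps its overall distribution, but any link between
    traits within a person is destroyed.
    """
    columns = [list(column) for column in zip(*people)]
    for column in columns:
        rng.shuffle(column)
    return list(zip(*columns))


def count_near(people, center, radius):
    """Count people whose trait profile lies within `radius` of `center`."""
    limit = radius * radius
    return sum(
        1
        for person in people
        if sum((a - b) ** 2 for a, b in zip(person, center)) <= limit
    )


def density_ratio(people, center, rng, radius=0.5, n_shuffles=20):
    """Real neighbor count divided by the mean count over shuffled copies."""
    real = count_near(people, center, radius)
    null_counts = [
        count_near(shuffle_traits(people, rng), center, radius)
        for _ in range(n_shuffles)
    ]
    null_mean = sum(null_counts) / n_shuffles
    return real / max(null_mean, 1.0)  # guard against division by zero


rng = random.Random(0)
scores = make_synthetic_scores(rng)

# Hand-picked candidate centers (assumption): one on the planted cluster,
# one in a region that only looks dense once the traits are shuffled.
candidates = {"planted": (1.5, 1.5), "spurious": (1.5, 0.0)}
ratios = {name: density_ratio(scores, center, rng) for name, center in candidates.items()}

# Keep only candidates that are clearly denser in the real data (1.5 is an arbitrary cutoff).
kept = [name for name, ratio in ratios.items() if ratio > 1.5]
print(ratios, kept)
```

In this toy setup, the planted cluster is much denser in the real data than in the shuffled copies, while the second candidate is not, so only the first one passes the filter.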

What did they find?

The authors found four personality types, which they labeled average type, role model type, self-centered type, and reserved type. The average type was named for their average scores on all of the Big Five traits. The role model had low neuroticism and high scores in all the other traits. The self-centered type had low scores on openness, agreeableness and conscientiousness, while the reserved type had low scores on openness and neuroticism. In general, these types replicated well across the other three data sets, though they did not recover the self-centered type in the 100-question survey, nor the average type in the 44-question survey. This could be expected given the smaller number of questions (something the authors confirmed by simulation). They also found, based on demographic data, that older individuals were more likely to be role model types, and less likely to be self-centered. This pattern was also consistent across data sets.


What's the impact?

The finding of four distinct personality types advances the field of personality psychology, and contributes to the debate on whether it is actually possible to quantify something as complex as human personality. The Big Five traits can predict behavior in mental health situations and other life patterns. Further, having a typing system might allow clinicians and researchers to measure the personality factors that affect their work. Beyond this, the study showed that the clustering algorithm used to find the personality types, while being an advanced technique, can sometimes come up with spurious results. Therefore, it’s important to check results in other ways, such as comparing them against randomized data.

Word of Caution: While the results suggest these four types, the findings do not suggest that every person belongs to one and only one type. In fact, many respondents' scores on the five personality traits suggest they are located between personality types; nevertheless, there are some clusters in the data which are much denser than others.


Gerlach et al., A robust data-driven approach identifies four personality types across four large data sets. Nature Human Behaviour (2018).