How Performance Evaluations Hurt Gender Equality

“I was told I was too aggressive, I was too blunt, I was too direct, and that I sounded pompous when I offered advice on recruiting, despite the fact that I had been there twice and was very successful,” Lt. Col. Kate Germano told Public Radio International in 2015, shortly after she was removed from command. “In my career of nineteen years, what I found was that my counterparts would not be told those things.”

Her male counterparts.

Germano isn’t the first to point to a double standard in how we judge the behavior of leaders. When men take charge, we say that they’re strong and assertive. Women, on the other hand, are all too often labeled bossy or aggressive.

This type of bias appears in performance evaluations. These evaluations are part of the process to identify, develop, and promote talented individuals and are designed to be meritocratic. Ironically, they can exacerbate the very gender inequities they are striving to reduce—like the anemic number of women in leadership positions.

Our research offers evidence of this from a large field study at an institution—a military service academy—that prides itself on equal opportunity and meritocracy. Our findings suggest that people from under-represented groups can be penalized for not looking like a leader, and are told implicitly that they are not leaders through the messaging of performance evaluations.

What does it mean to “look like a leader”? Research on leadership finds that people generally look for agentic characteristics (e.g., instrumental, task-focused, goal-oriented) in a leader whereas communal characteristics (e.g., nurturing, relationship-focused, collaborative) are less valued. Agentic traits are typically associated with men and masculinity and communal traits are associated with women and femininity. Because men are often assumed to be agentic and women to be communal, women don’t look like leaders.

As a result, women in leadership positions face an impossible situation. They will receive feedback that highlights their lack of feminine, communal attributes (“She’s not compassionate or organized enough”), criticizes them for taking too much power (“She’s abrasive, overbearing”), or faults them for lacking some key leadership qualification (“She’s inept, temperamental”). No matter their leadership style, they are deemed unfit.

We saw these trends firsthand in our research on peer evaluations from a military-service academy. We examined subjective peer-performance evaluations of over 4,000 leaders and 81,000 assignments of leadership attributes (chosen from a predetermined list of 89 attributes: 44 positive and 45 negative). Based on prior research, we expected that men would be assigned more positive attributes and women would receive more negative ones.

There were no gender differences on objective performance measures (e.g., grades, fitness scores, class standings), a finding consistent with previous research on military populations. Thus, those metrics wouldn’t explain gender differences in attributes assigned by peers.

As predicted, women received more negative leadership attributes (in greater overall quantity and variety) than men. Specifically, women were more likely to be described as inept, frivolous, gossipy, excitable, scattered, temperamental, panicky, and indecisive—in other words, a host of negative feminine stereotypes. The message inherent in these descriptors is clear: the women evaluated were incompetent and not qualified to be leaders.

Interestingly, women were penalized for acting in an agentic (i.e., more masculine) way with only one negative attribute: selfish. For women leaders in historically male-dominated professions, such as the military, we expected more penalties for women who dared to usurp power as leaders. (While we do not have the data to explain this result, we speculate either that these future officers have not yet developed a traditionally masculine leadership style or that they have received enough negative feedback and backlash about their agentic [i.e., masculine] leadership style that they responded by adopting a more traditionally feminine leadership style.)

We also found that men and women received similar numbers of positive attributes, although the attributes they received were qualitatively different. While men were more likely to be assigned attributes such as analytical, competent, and logical, women received compassionate, enthusiastic, and organized. Arguably, all of these leadership attributes are aspirational and valuable (and when asked, people indicate that the most important traits in a leader are communal traits, such as compassion).

But let’s not overlook the elephant in the performance-review room: If these traits are so valued, why aren’t women retained and advanced equivalently to or at higher rates than men? Men, after all, are less likely to receive these attributes. Why is this not reflected in the senior military leadership (general and flag officers) as well as the C-suite?

The problem, it turns out, likely starts earlier in the leadership pipeline. At the officer ranks, women in combat jobs are retained at approximately half the rate of men. When it comes to promoting a new senior officer, there are simply more male officers to choose from.

Outside of the military, there’s renewed momentum to increase diversity and inclusion, improve retention rates, and achieve more equitable recruiting, hiring, and promotion. It’s crucial that the individuals leading the charge not overlook the power of performance-evaluation language, which can reinforce stereotypes that undermine these objectives. As applied researchers working to improve the workplace for all organizations, we’ve distilled a few evidence-based suggestions to minimize bias in performance evaluations:

Include unconscious-bias education as part of manager (leader) development training. Raising awareness of how we can inadvertently bias language in performance evaluations may not eliminate biased language, but it will certainly encourage evaluators to stop and think before providing feedback. Simple online programs are available to assess gender “coding” in job advertisements and other employment documents. Formal evaluation programs that use standardized lists of words or phrases should also be evaluated for biased language.

Be specific and clear about evaluation criteria. When evaluators don’t have specific criteria and evidence to measure the performance of an employee, they’re more likely to rely on information from bias and stereotype, like personality traits. Performance data might include productivity metrics, and evaluation criteria could include the number of sales calls in a specific period of time. We think there’s a great example of how to set this up here.

Hold evaluators accountable. When evaluators think no one will check on their work, they’re more likely to be lazy (picking the easiest path) and to unconsciously bias evaluations. Hold evaluators accountable for their work, and eliminate anonymity. Before they even start to do the work of evaluating, educate them about how biases can show up in evaluations—and about the impact.

Avoid lone-wolf evaluators. Ask several people to evaluate individuals. This encourages a broader perspective on performance.

Be transparent in who is evaluating, what is being evaluated, how it’s being evaluated, and why it’s being evaluated. Evaluators may say they value a particular skill or trait, but then make employment decisions that don’t match up. Decision makers can’t be held accountable, or expect to be held accountable, if there’s no transparency in how those decisions are made.

More frequent evaluation is better. Consistent and frequent evaluation, although not necessarily formal, provides for longitudinal and continuous appraisals, which are more likely to show progress, reinforce professional identity, and affirm organizational fit.
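
For readers who want to try the first suggestion, checking evaluation text or a standardized attribute list for gendered language, here is a minimal sketch of what such a check might look like. It is not the tool or method used in our study. The word lists are illustrative only: they contain just the attributes quoted in this article, and the function name `flag_coded_words` and the category labels are hypothetical. A real audit would need a vetted and much larger lexicon.

```python
import re
from collections import Counter

# Illustrative lists only: these are the attributes quoted in this article,
# not a validated lexicon of gender-coded language.
WORD_LISTS = {
    # Positive attributes the article reports were more often assigned to men.
    "men_positive": {"analytical", "competent", "logical"},
    # Positive attributes the article reports were more often assigned to women.
    "women_positive": {"compassionate", "enthusiastic", "organized"},
    # Negative attributes the article reports were more often assigned to women.
    "women_negative": {
        "inept", "frivolous", "gossipy", "excitable",
        "scattered", "temperamental", "panicky", "indecisive",
    },
}


def flag_coded_words(text):
    """Count how often each listed word appears in an evaluation text.

    Returns a dict mapping each category name to a {word: count} dict
    containing only the words that actually occur in the text.
    """
    # Lowercase the text and split it into alphabetic words.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        category: {w: counts[w] for w in sorted(words) if w in counts}
        for category, words in WORD_LISTS.items()
    }


if __name__ == "__main__":
    sample = "She is organized but temperamental; he is logical and competent."
    for category, hits in flag_coded_words(sample).items():
        print(category, hits)
```

A check like this only surfaces words for a human to review. It cannot tell whether a given description is warranted, so it works best as a prompt for evaluators to stop and think rather than as an automatic filter.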

Across industries, senior management is desperately trying to retain talented women. Too often, these women receive formal and informal messaging that they neither belong nor fit, and they are penalized for their authentic leadership style. There are high costs associated with employee turnover, and overwhelming evidence suggests that businesses’ bottom lines increase by as much as 15 percent with more gender-diverse leadership teams in senior management, the C-suite, and the boardroom. Reducing evaluation bias is a business imperative.


David G. Smith is a professor of sociology in the Department of National Security Affairs at the United States Naval War College. His research focuses on gender, work, and family issues, including gender bias, dual career families, military families, women in the military, and retention of women. He is the co-author of Athena Rising: How and Why Men Should Mentor Women.


Judith E. Rosenstein

Judith E. Rosenstein is a professor of sociology in the Department of Leadership, Ethics, and Law at the United States Naval Academy and is affiliated with the Academy’s sexual harassment and assault prevention education program. Her research focuses on social inequality, with an emphasis on gender, sexual assault, sexuality, and violence.


Margaret C. Nikolov

Margaret C. Nikolov is an independent statistical consultant who previously taught at the United States Naval Academy. Her research includes applications in public and environmental health, naval architecture, and sociology.

Behavioral Scientist

This piece was published in partnership with The Behavioral Scientist, a collaboration between BSPA, ideas42 and the Center for Decision Research. The Behavioral Scientist is a non-profit online magazine that offers readers original, thought-provoking reports from the front lines of behavioral science. Visit us at