TY - JOUR
T1 - Gender Representation of Health Care Professionals in Large Language Model-Generated Stories
AU - Menz, Bradley D
AU - Kuderer, Nicole M
AU - Chin-Yee, Benjamin
AU - Logan, Jessica M
AU - Rowland, Andrew
AU - Sorich, Michael J
AU - Hopkins, Ashley M
PY - 2024/9/23
Y1 - 2024/9/23
N2 - Importance: With the growing use of large language models (LLMs) in education and health care settings, it is important to ensure that the information they generate is diverse and equitable, to avoid reinforcing or creating stereotypes that may influence the aspirations of upcoming generations. Objective: To evaluate the gender representation of LLM-generated stories involving medical doctors, surgeons, and nurses and to investigate the association of varying personality and professional seniority descriptors with the gender proportions for these professions. Design, Setting, and Participants: This is a cross-sectional simulation study of publicly accessible LLMs, accessed from December 2023 to January 2024. GPT-3.5-turbo and GPT-4 (OpenAI), Gemini-pro (Google), and Llama-2-70B-chat (Meta) were prompted to generate 500 stories featuring medical doctors, surgeons, and nurses for a total 6000 stories. A further 43 200 prompts were submitted to the LLMs containing varying descriptors of personality (agreeableness, neuroticism, extraversion, conscientiousness, and openness) and professional seniority. Main Outcomes and Measures: The primary outcome was the gender proportion (she/her vs he/him) within stories generated by LLMs about medical doctors, surgeons, and nurses, through analyzing the pronouns contained within the stories using χ2 analyses. The pronoun proportions for each health care profession were compared with US Census data by descriptive statistics and χ2 tests. Results: In the initial 6000 prompts submitted to the LLMs, 98% of nurses were referred to by she/her pronouns. The representation of she/her for medical doctors ranged from 50% to 84%, and that for surgeons ranged from 36% to 80%. In the 43 200 additional prompts containing personality and seniority descriptors, stories of medical doctors and surgeons with higher agreeableness, openness, and conscientiousness, as well as lower neuroticism, resulted in higher she/her (reduced he/him) representation. For several LLMs, stories focusing on senior medical doctors and surgeons were less likely to be she/her than stories focusing on junior medical doctors and surgeons. Conclusions and Relevance: This cross-sectional study highlights the need for LLM developers to update their tools for equitable and diverse gender representation in essential health care roles, including medical doctors, surgeons, and nurses. As LLMs become increasingly adopted throughout health care and education, continuous monitoring of these tools is needed to ensure that they reflect a diverse workforce, capable of serving society's needs effectively.
AB - Importance: With the growing use of large language models (LLMs) in education and health care settings, it is important to ensure that the information they generate is diverse and equitable, to avoid reinforcing or creating stereotypes that may influence the aspirations of upcoming generations. Objective: To evaluate the gender representation of LLM-generated stories involving medical doctors, surgeons, and nurses and to investigate the association of varying personality and professional seniority descriptors with the gender proportions for these professions. Design, Setting, and Participants: This is a cross-sectional simulation study of publicly accessible LLMs, accessed from December 2023 to January 2024. GPT-3.5-turbo and GPT-4 (OpenAI), Gemini-pro (Google), and Llama-2-70B-chat (Meta) were prompted to generate 500 stories featuring medical doctors, surgeons, and nurses for a total 6000 stories. A further 43 200 prompts were submitted to the LLMs containing varying descriptors of personality (agreeableness, neuroticism, extraversion, conscientiousness, and openness) and professional seniority. Main Outcomes and Measures: The primary outcome was the gender proportion (she/her vs he/him) within stories generated by LLMs about medical doctors, surgeons, and nurses, through analyzing the pronouns contained within the stories using χ2 analyses. The pronoun proportions for each health care profession were compared with US Census data by descriptive statistics and χ2 tests. Results: In the initial 6000 prompts submitted to the LLMs, 98% of nurses were referred to by she/her pronouns. The representation of she/her for medical doctors ranged from 50% to 84%, and that for surgeons ranged from 36% to 80%. In the 43 200 additional prompts containing personality and seniority descriptors, stories of medical doctors and surgeons with higher agreeableness, openness, and conscientiousness, as well as lower neuroticism, resulted in higher she/her (reduced he/him) representation. For several LLMs, stories focusing on senior medical doctors and surgeons were less likely to be she/her than stories focusing on junior medical doctors and surgeons. Conclusions and Relevance: This cross-sectional study highlights the need for LLM developers to update their tools for equitable and diverse gender representation in essential health care roles, including medical doctors, surgeons, and nurses. As LLMs become increasingly adopted throughout health care and education, continuous monitoring of these tools is needed to ensure that they reflect a diverse workforce, capable of serving society's needs effectively.
KW - LLMs
KW - Large language models
KW - AI
KW - Artificial intelligence
KW - Gender representation
KW - Healthcare professionals
KW - Doctor
KW - Nurse
KW - Surgeon
UR - http://www.scopus.com/inward/record.url?scp=85204758806&partnerID=8YFLogxK
UR - http://purl.org/au-research/grants/NHMRC/2030913
UR - http://purl.org/au-research/grants/NHMRC/2008119
U2 - 10.1001/jamanetworkopen.2024.34997
DO - 10.1001/jamanetworkopen.2024.34997
M3 - Article
C2 - 39312237
AN - SCOPUS:85204758806
SN - 2574-3805
VL - 7
SP - e2434997
JO - JAMA Network Open
JF - JAMA Network Open
IS - 9
ER -