The purpose of this dataset was to study gender bias in occupations. Online biographies, written in English, were collected to find the names, pronouns, and occupations. Twenty-eight most frequent occupations were identified based on their appearances. The resulting dataset consists of 397,340 biographies spanning twenty-eight different occupations. Of these occupations, the professor is the most frequent, with 118,400 biographies, while the rapper is the least frequent, with 1,406 biographies. Important information about the biographies: 1. The longest biography is 194 tokens, while the shortest is eighteen; the median biography length is seventy-two tokens. 2. It should be noted that the demographics of online biographies’ subjects differ from those of the overall workforce and that this dataset does not contain all biographies on the Internet.
Paper | Code | Results | Date | Stars |
---|