Researchers from MIT and elsewhere have found that machine-learning models designed to mimic human decision making, such as deciding whether social media posts violate toxic-content policies, often fail to replicate human decisions. The reason is that these models are typically trained on descriptive data, labeled by humans asked to identify factual features of an example, rather than normative data, labeled by humans asked to judge whether an example violates a rule. As a result, the models can make harsher judgments than humans would, with potentially serious real-world consequences.
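
The distinction can be made concrete with a small sketch. The example below uses invented posts and labels (not data from the study) to train two toy scikit-learn classifiers on the same posts: one on hypothetical descriptive labels (does the post contain insulting language?) and one on hypothetical normative labels (does the post actually violate a toxicity policy?). Comparing their predictions on new posts illustrates how a model trained on descriptive labels can end up flagging more content than one trained on normative judgments.

```python
# Illustrative sketch only: posts and labels are invented to show the
# descriptive-vs-normative distinction, not drawn from the MIT study.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

posts = [
    "you are an idiot and everyone hates you",
    "this take is dumb but whatever",
    "great game last night, congrats to the team",
    "shut up, nobody asked for your opinion",
    "I strongly disagree with this policy decision",
    "what a stupid article, total garbage",
    "thanks for sharing, really helpful thread",
    "get lost, you clueless troll",
]

# Descriptive labels: does the post contain insulting/aggressive language?
# (annotators asked only to identify a factual feature)
descriptive_labels = [1, 1, 0, 1, 0, 1, 0, 1]

# Normative labels: does the post violate the toxicity policy?
# (annotators asked to make the judgment call are more lenient on borderline posts)
normative_labels = [1, 0, 0, 1, 0, 0, 0, 1]

def train(labels):
    """Fit a simple bag-of-words logistic regression on the toy posts."""
    model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(posts, labels)
    return model

descriptive_model = train(descriptive_labels)
normative_model = train(normative_labels)

# Borderline held-out posts: compare what each model would flag.
held_out = [
    "this argument is dumb and lazy",
    "honestly a stupid decision by the refs",
    "you absolute moron, delete your account",
]

for post in held_out:
    d = descriptive_model.predict([post])[0]
    n = normative_model.predict([post])[0]
    print(f"{post!r}\n  descriptive-trained flags: {bool(d)}, normative-trained flags: {bool(n)}")
```

On borderline posts like these, the descriptively trained model tends to flag anything containing insulting words, while the normatively trained model reflects the annotators' more lenient judgment calls, which is the gap the researchers describe.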
