Contractors working on improving Google’s Gemini model have been comparing its responses to those of Anthropic’s rival model, Claude. According to internal emails obtained by TechCrunch, this process involves contractors manually assessing responses, rather than using specialized benchmarks. While such comparisons are common in the industry, they are usually conducted with predefined metrics rather than individual evaluations.
Evaluation Criteria and Key Findings
The contractors assess Gemini’s responses based on criteria such as plausibility and verbosity, spending up to 30 minutes per query to decide which model performs better. In internal chats, some contractors noted that Claude enforces safety more stringently than Gemini.
One contractor remarked that Claude has “the strictest safety settings” among AI models, often declining to answer prompts it deems questionable. By contrast, Gemini has been reported to “flagrantly violate” certain guidelines, at times producing responses containing nudity or obscenity.
Anthropic, Claude’s developer, explicitly prohibits customers from using its model to build competing products or train rival models without its consent. Notably, Google is a major investor in Anthropic, which raises questions about the competitive dynamics between the two companies.
Controversy and Google’s Position
TechCrunch reached out to Google to clarify whether it had obtained Anthropic’s permission to use Claude for testing Gemini. Google did not respond directly but confirmed that it evaluates its model’s responses against competitors to ensure quality. However, the company stated that it does not train Gemini using Claude’s data.
Adding to the controversy, previous reports indicated that contractors working on Gemini were tasked with evaluating its responses on topics outside their areas of expertise, notes NIXsolutions. Internal communications reveal concerns among these contractors, particularly about the accuracy of Gemini’s outputs on sensitive topics such as health and medical treatment.
The competition between Gemini and Claude highlights the challenges of ensuring both accuracy and ethical standards in AI. As new developments emerge, we’ll keep you updated on the evolving landscape of AI advancements and comparisons.