Google’s AI Model Gemini Under Scrutiny as Contractors Use Competitor Claude for Benchmarking


In a recent report by TechCrunch, contractors hired to improve Google's AI model Gemini were found to be using outputs from Anthropic's Claude to gauge the quality of Gemini's responses.

This practice has stirred curiosity and raised questions about the methodologies used by big tech firms to advance their AI technologies.

The process, as detailed by internal sources, involves contractors comparing responses from Gemini and Claude against criteria such as truthfulness and verbosity, with up to 30 minutes allotted per prompt to judge which model's answer is better.
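For readers curious what such a workflow might look like in practice, here is a minimal sketch of a side-by-side ("SxS") rating loop of the kind the report describes. Everything in it is an illustrative assumption, not Google's actual tooling: the criteria list, the `Rating` structure, and the `rate_side_by_side` helper are invented for this example, and responses would be shown to raters anonymized.

```python
# Hypothetical sketch of a side-by-side rating harness, loosely modeled
# on the workflow described in the report. Criteria, data structures,
# and function names are illustrative assumptions, not Google's tooling.
from dataclasses import dataclass

CRITERIA = ["truthfulness", "verbosity"]  # criteria named in the report


@dataclass
class Rating:
    prompt: str
    scores_a: dict  # per-criterion scores for anonymized model A
    scores_b: dict  # per-criterion scores for anonymized model B


def rate_side_by_side(prompt: str, response_a: str, response_b: str) -> Rating:
    """Collect a human rater's per-criterion scores for two anonymous responses."""
    print(f"PROMPT:\n{prompt}\n\nRESPONSE A:\n{response_a}\n\nRESPONSE B:\n{response_b}\n")
    scores_a, scores_b = {}, {}
    for criterion in CRITERIA:
        scores_a[criterion] = int(input(f"Score A on {criterion} (1-5): "))
        scores_b[criterion] = int(input(f"Score B on {criterion} (1-5): "))
    return Rating(prompt, scores_a, scores_b)
```

In a real pipeline, which model produced which response would be hidden from the rater, and the aggregated ratings would feed back into model evaluation.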

Interestingly, these comparisons have begun to highlight differences between the two models, particularly in their approach to safety protocols.

Reports indicate that Claude tends to decline potentially unsafe prompts, reflecting stricter safety settings, which is not always the case with Gemini.

Notably, an internal communication suggested that Claude's safety settings were the strictest among the AI models evaluated.

This was underscored when one of Gemini's responses was reportedly flagged as a "huge safety violation" for inappropriate content, an issue not encountered with Claude.

These comparisons also put a spotlight on Google's relationship with Anthropic, a company in which Google is an investor.

Shira McNamara, a spokesperson for Google DeepMind, which oversees Gemini, clarified that while the team does compare outputs from different models, it does not use Anthropic's models to train Gemini.

The statement comes amid separate concerns from contractors about being asked to rate AI responses on topics outside their areas of expertise, which could lead to inaccuracies in sensitive fields such as healthcare.

As the AI industry continues to expand, the practices for developing and refining these technologies are increasingly scrutinized.

Using a competitor's AI for internal benchmarking without explicit permission could raise ethical and legal questions, particularly when it involves leading tech giants like Google.
