September 22, 2023

Why Can't Google's LLM Craft a Rhyme?

Author
Pratik

We recently ran some tests that led to an interesting discovery: Google's PaLM 2, despite its reputation, struggles to write poems, except when the subject is Kanye West. OpenAI's GPT and Meta's LLaMA models, on the other hand, perform exceptionally well. What gives?

We issued a straightforward challenge to multiple AI models: write a short poem about Steve Jobs and another about Tim Cook. Google's PaLM 2 fell remarkably short on both. In stark contrast, OpenAI's GPT and Meta's LLaMA models penned compelling verses for both prompts. The difference is striking and raises the question: why? Compare the output below or try it out at the link.

Try it out: https://app.contentable.ai/playground/650d41ba1097a0c8745a6d8e
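If you want to reproduce this comparison outside the playground, the setup is simple: send the identical prompt to each model and read the outputs side by side. Below is a minimal sketch of that loop. The call_gpt, call_palm2, and call_llama functions are hypothetical placeholders for illustration, not the clients we used for these tests; wire in the actual OpenAI, PaLM 2, and LLaMA SDKs or endpoints with your own API keys.

```python
# Minimal sketch of a head-to-head prompt comparison (illustrative only).
# The call_* functions below are hypothetical placeholders: swap in the real
# OpenAI, Google PaLM 2, and LLaMA clients/endpoints with your own API keys.

from typing import Callable, Dict

PROMPT = "Write a short poem about Steve Jobs."


def call_gpt(prompt: str) -> str:
    # e.g. use the OpenAI SDK here
    raise NotImplementedError("plug in the OpenAI client")


def call_palm2(prompt: str) -> str:
    # e.g. use Google's PaLM 2 text generation API here
    raise NotImplementedError("plug in the PaLM 2 client")


def call_llama(prompt: str) -> str:
    # e.g. use a hosted LLaMA 2 chat endpoint here
    raise NotImplementedError("plug in a LLaMA endpoint")


MODELS: Dict[str, Callable[[str], str]] = {
    "GPT": call_gpt,
    "PaLM 2": call_palm2,
    "LLaMA 2": call_llama,
}


def compare(prompt: str) -> None:
    """Send the identical prompt to every model and print the outputs side by side."""
    for name, generate in MODELS.items():
        try:
            output = generate(prompt)
        except NotImplementedError as exc:
            output = f"<not configured: {exc}>"
        print(f"--- {name} ---\n{output}\n")


if __name__ == "__main__":
    compare(PROMPT)
```

The important detail is that the prompt (and, ideally, the sampling settings) stays identical across models, so the only variable in the comparison is the model itself.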

Next, we tried the same prompt on celebrities such as Kanye West and Justin Bieber, to test whether cultural bias could explain the gap. PaLM 2's output did rhyme to some extent, but it was still poor compared to GPT and LLaMA.

Try it out: https://app.contentable.ai/playground/650d42d01097a0c8745a6d98

Was this cultural bias, or an algorithmic preference? We tried other tech celebrities such as Elon Musk and Sundar Pichai. PaLM 2 struggled with Sundar Pichai but did an average job with Elon Musk. That points to training data, since there is a great deal of material about Elon Musk. We cannot confirm this for certain, though, as Steve Jobs presumably has a similar volume of training data to Elon Musk. Why did PaLM 2 get Elon Musk right and not Steve Jobs?

Try it out: https://www.contentable.ai

Possible Explanations
  1. Training Data: PaLM 2 may lack poetic examples for specific public figures, with the exception of musicians.
  2. Algorithmic Preferences: The model might be optimized for other types of tasks and therefore struggle with poetry.
  3. Cultural Bias: Kanye West, being a musical artist, might align better with the model's training data, explaining its relative success.

Implications
  1. Specialization Matters: General-purpose models like Google's PaLM 2 may not be the best choice for every creative task.
  2. Training Data: A model's capabilities are only as good as the data it is trained on.
  3. Fine-Tuning: OpenAI's GPT and Meta's LLaMA models may have undergone specific tuning that helps them excel at tasks like this.

Google's PaLM 2's inconsistency on poetic tasks compared with OpenAI's GPT and Meta's LLaMA models underscores a crucial takeaway: not all AI models excel at the same tasks. This disparity is not just academic; it has practical implications for businesses looking to adopt AI solutions. The key is comparison. Platforms like contentable.ai let you evaluate multiple AI models head-to-head, ensuring you adopt the one that best aligns with your specific needs. Failing to compare could mean missing out on the most effective tools for your business.

Try it out and let us know what you find when you compare different models at contentable.ai.

Written by
Pratik
Co-founder

AI Nerd
Developer
Love good UX
Father
