Poetiq cracks major reasoning benchmark

    Good morning, AI enthusiasts. Just half a year ago, leading AI models struggled to surpass a 5% score on the ARC-AGI-2 reasoning benchmark. Today, a small startup has shattered expectations by scoring over 50%, outperforming even Google’s own Gemini 3 Deep Think.

    This breakthrough was made possible by Poetiq’s innovative “meta-system” approach, which enhances existing AI models through intelligent orchestration rather than developing new architectures from the ground up. This milestone suggests that future AI advancements may stem as much from smart engineering as from sheer computational scale.


    In this edition:

    • Poetiq’s Gemini-based system leads ARC-AGI-2 benchmark
    • Insights from The Rundown Roundtable on AI applications
    • How to design LinkedIn carousels using ChatGPT and Canva
    • New research reveals poetry prompts can bypass AI safety filters
    • Latest AI tools, community workflows, and industry updates

    Poetiq’s Meta-System Surpasses Google on ARC-AGI-2

    Poetiq, a nimble six-person AI startup, has claimed the top position on the ARC-AGI-2 reasoning benchmark by leveraging a meta-system that orchestrates and refines outputs from existing models instead of training new ones. This strategy enabled them to outperform Google’s Gemini 3 Deep Think variant at less than half the cost per task.

    • The meta-system quickly adapts to new base models, achieving leading results within hours of Gemini 3’s release without retraining.
    • Using Gemini 3 Pro as a foundation, Poetiq’s refinement process scored 54% at $30 per task, surpassing Google’s Deep Think, which scored 45% at $77 per task.
    • This marks the first time any system has broken the 50% threshold on ARC-AGI-2, a benchmark where top models were stuck below 5% just six months ago.
    • Poetiq’s approach involves large language models (LLMs) iteratively improving their own outputs with integrated self-auditing to ensure high-quality reasoning.
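
    For readers who want a feel for the "refine, then self-audit" idea, here is a minimal Python sketch. It is not Poetiq's actual system, which the summary above does not detail. The names `refine`, `toy_generate`, and `toy_audit` are hypothetical, and the toy functions stand in for real model calls so the loop can run on its own.

```python
from typing import Callable, Tuple


def refine(task: str,
           generate: Callable[[str, str], str],
           audit: Callable[[str, str], Tuple[bool, str]],
           max_rounds: int = 4) -> Tuple[str, int]:
    """Draft an answer, audit it, and retry with the audit's feedback.

    Returns the last answer and the number of rounds used.
    """
    feedback = ""
    answer = ""
    for round_no in range(1, max_rounds + 1):
        answer = generate(task, feedback)     # draft (or redraft) an answer
        ok, feedback = audit(task, answer)    # self-audit the draft
        if ok:
            return answer, round_no           # stop as soon as the audit passes
    return answer, max_rounds                 # budget exhausted, return last draft


# Toy stand-ins for model calls (assumptions, for illustration only).
def toy_generate(task: str, feedback: str) -> str:
    # The first draft is deliberately wrong; a draft that sees feedback is fixed.
    return task[::-1] if feedback else task


def toy_audit(task: str, answer: str) -> Tuple[bool, str]:
    # The "correct" answer in this toy task is the reversed input string.
    if answer == task[::-1]:
        return True, ""
    return False, "answer is not the reversed input"


if __name__ == "__main__":
    print(refine("abc", toy_generate, toy_audit))  # ('cba', 2)
```

    In a real orchestration layer, `generate` and `audit` would each wrap a call to a base model, and `max_rounds` would cap the cost per task.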

    This rapid leap in ARC-AGI-2 performance highlights a dual path forward for AI: continued development of cutting-edge models alongside innovative orchestration techniques that maximize existing resources, enabling smaller teams to compete without massive compute investments.
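
    To make the cost comparison concrete, the short Python sketch below works through the reported figures (54% at $30 per task versus 45% at $77 per task). The "dollars per percentage point" metric is our own illustrative yardstick, not one used by the benchmark.

```python
# Reported ARC-AGI-2 figures from the summary above.
results = {
    "Poetiq (Gemini 3 Pro)": {"score_pct": 54, "cost_usd": 30},
    "Gemini 3 Deep Think": {"score_pct": 45, "cost_usd": 77},
}


def cost_ratio(a: dict, b: dict) -> float:
    """Cost per task of system a as a fraction of system b's cost."""
    return a["cost_usd"] / b["cost_usd"]


def cost_per_point(r: dict) -> float:
    """Dollars per task divided by the score in percentage points (illustrative)."""
    return r["cost_usd"] / r["score_pct"]


poetiq = results["Poetiq (Gemini 3 Pro)"]
deep_think = results["Gemini 3 Deep Think"]

print(f"Cost ratio: {cost_ratio(poetiq, deep_think):.0%}")               # 39%
print(f"Score gap: {poetiq['score_pct'] - deep_think['score_pct']} pts")  # 9 pts
for name, r in results.items():
    print(f"{name}: ${cost_per_point(r):.2f} per point")                 # $0.56, $1.71
```

    At roughly 39% of the per-task cost, Poetiq's run comes in at less than half the price of Deep Think while scoring nine points higher.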

    Streamline Your Workflow with Lindy’s AI Agents

    Lindy offers a no-code platform to create custom AI agents tailored to your business needs. Whether it’s qualifying leads, drafting reports, or managing customer support, Lindy automates repetitive tasks to free up your team’s time.

    • Sales agents that autonomously qualify prospects and schedule meetings around the clock.
    • Support agents capable of resolving customer tickets instantly via phone and chat.
    • Operations agents that reduce hours of manual work to mere minutes.

    With over 6,000 integrations, Lindy enables rapid deployment of AI-powered workflows without technical complexity.

    The Rundown Roundtable: How We Use AI Daily

    Each week, our team shares personal stories about integrating AI into their professional and personal routines.

    Billy, Educator: As a basketball fan, I tested Nano Banana 3.0 by generating consistent product images of hats for every NBA team using Google Sheets and dynamic prompts. The AI maintained a uniform style across all designs, as if they belonged to a single brand. Now, if only AI could help me secure an NBA licensing deal!

    Reagan, Strategic Partnerships: I often take long walks during work breaks to brainstorm. Using Wispr Flow, I can speak my ideas aloud and have them transcribed directly into my workspace, making idea capture effortless.

    Rishi, Product Marketing Manager: While building a paid advertising tracker in Google Sheets, I document key features by recording Loom videos, transcribing them, and then summarizing the content with ChatGPT to create clear, concise explanations for our Notion database.

    Create Engaging LinkedIn Carousels with ChatGPT and Canva

    Learn how to quickly design professional LinkedIn carousel posts by using ChatGPT’s Canva integration for both content creation and slide design, all within one interface.

    1. Open ChatGPT, start a new chat, select the Canvas app, and prompt: “Create a 5-slide LinkedIn carousel on [your topic]. Slide 1: Hook. Slides 2-4: One tip each. Slide 5: Call to action. Keep each slide under 40 words.”
    2. Refine the text in Canvas, then prompt: “@canva, generate a 5-slide LinkedIn carousel using this content [paste slides]. Use a [detailed style]. Keep the text exactly as provided.”
    3. Review the four design options generated, pick your favorite, and open it in Canva for editing.
    4. Make final adjustments in Canva, then download your carousel as a PDF for LinkedIn or PNG files for individual slides.

    Pro tip: Specify your brand’s colors and fonts in the prompt to have them automatically applied to your carousel designs.
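
    If you make carousels regularly, you could template the two prompts from the steps above instead of retyping them. The Python sketch below simply assembles the prompt strings. The function names and the wording of the brand line are our own assumptions, and nothing here calls ChatGPT or Canva.

```python
from typing import Optional, Sequence


def build_content_prompt(topic: str) -> str:
    """Step 1 prompt: ask for the slide text."""
    return (
        f"Create a 5-slide LinkedIn carousel on {topic}. "
        "Slide 1: Hook. Slides 2-4: One tip each. "
        "Slide 5: Call to action. Keep each slide under 40 words."
    )


def build_design_prompt(slides: str,
                        style: str,
                        brand_colors: Optional[Sequence[str]] = None,
                        brand_fonts: Optional[Sequence[str]] = None) -> str:
    """Step 2 prompt: hand the refined text to Canva with style guidance."""
    parts = [
        "@canva, generate a 5-slide LinkedIn carousel "
        f"using this content: {slides}",
        f"Use a {style}.",
    ]
    # Optional brand details, per the pro tip (phrasing is an assumption).
    if brand_colors:
        parts.append("Brand colors: " + ", ".join(brand_colors) + ".")
    if brand_fonts:
        parts.append("Brand fonts: " + ", ".join(brand_fonts) + ".")
    parts.append("Keep the text exactly as provided.")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_content_prompt("remote onboarding"))
    print(build_design_prompt("[paste slides]", "clean, minimalist style",
                              brand_colors=["#1A1A2E", "#E94560"],
                              brand_fonts=["Inter"]))
```

    Paste the first string into a new chat, refine the result, then paste the second string with your final slide text in place of the placeholder.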

    Upcoming Webinar: Enhancing AI Oversight with Fiddler AI

    Join Fiddler AI’s live webinar to explore how agentic observability can elevate AI system performance by providing comprehensive visibility, contextual insights, and control mechanisms. Learn to monitor AI behavior from development through deployment.

    • Validate agent decisions pre-production using golden and challenger datasets.
    • Monitor system health with detailed metrics across hierarchical agent layers.
    • Analyze reasoning chains to identify and troubleshoot failure points.

    Register now to attend live or receive the session recording.

    Research Spotlight: Poetry as a Novel AI Jailbreak Technique

    Researchers at Italy’s Icaro Labs have uncovered a surprising vulnerability: reframing harmful prompts as poetry can consistently bypass safety filters in many leading AI models.

    • Testing 25 state-of-the-art models from OpenAI, Google, Anthropic, and others, poetry-based prompts achieved a 62% average success rate in eliciting unsafe outputs.
    • Google’s Gemini 2.5 Pro was fully compromised (100% success), while OpenAI’s smaller GPT-5 nano model resisted all poetic jailbreak attempts.
    • Exploited topics included instructions on weapon creation, hacking techniques, and psychological manipulation.
    • The researchers withheld the exact poems due to their potential misuse, despite their simplicity.

    This finding underscores the ongoing cat-and-mouse nature of AI safety, where new creative exploits emerge as soon as defenses are implemented. Poetry joins roleplay, foreign language tricks, and code obfuscation as unexpected attack vectors, highlighting the need for continuous vigilance.

    Quick Updates in AI

    • Mistral announces its latest generation of open-source AI models, promising enhanced performance and accessibility.
    • ByteDance unveils a powerful image AI capable of advanced editing and text rendering.
    • A new avatar generation model now supports up to 5-minute video outputs, expanding creative possibilities.
    • Microsoft releases an open-source, real-time text-to-speech system, advancing voice synthesis technology.

    Additional industry news:

    • OpenAI is reevaluating its approach to shopping suggestions after criticism over perceived advertising bias, with CRO Mark Chen acknowledging shortcomings.
    • Meta-backed startup Limitless launches an AI pendant that records and transcribes real-world conversations for seamless note-taking.
    • The New York Times and Chicago Tribune have filed separate copyright infringement lawsuits against AI startup Perplexity, marking the NYT’s second legal action in this space.
    • Meta secures new AI licensing agreements with major publishers like CNN, Fox News, and USA Today to integrate real-time news into its AI platform.
    • The U.S. Department of Energy introduces AMP2, an ambitious AI research platform designed to autonomously study microbial ecosystems at unprecedented scale.

    Community Spotlight: AI in Action

    Each issue, we highlight how readers harness AI to boost productivity and simplify their lives. Today’s feature comes from an anonymous subscriber in Houston, TX:

    “I used ChatGPT as a strategic partner throughout a recent interview and negotiation process. It helped me prepare by refining my talking points and rehearsing answers, boosting my confidence and clarity. During the offer stage, ChatGPT assisted in crafting assertive yet professional positioning statements, negotiation language, and follow-up emails.”

    How are you leveraging AI? Share your story with us.

    Until next time,

    Rowan, Joey, Zach, Shubham, and Jennifer – your team behind The Rundown
