General Discussion
In reply to the discussion: I was thinking about this "AI" stuff . . .
highplainsdem (59,323 posts)
#26. Re hallucinations - see this article:
Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high
https://the-decoder.com/gemini-3-pro-tops-new-ai-reliability-benchmark-but-hallucination-rates-remain-high/
No single model demonstrated consistently strong factual reliability across all six domains. Claude 4.1 Opus led in law, software engineering, and the humanities; GPT‑5.1.1 ranked first in business questions; while Grok 4 performed best in health and science.
According to the study, these domain differences mean that relying solely on overall performance can obscure important gaps.
While larger models tend to achieve higher accuracy, they don't necessarily have lower hallucination rates. Several smaller models - like Nvidia's Nemotron Nano 9B V2 and Llama Nemotron Super 49B v1.5 - outperformed much larger competitors on the Omniscience Index.
Artificial Analysis confirmed that accuracy strongly correlates with model size, but hallucination rate does not. That explains why Gemini 3 Pro, despite its high accuracy, still hallucinates frequently.
69 replies
False premise. NASA and the other entities involved used computers. They did not only use slide rules.
Celerity
Sunday
#4
Didn't John Glenn ask the women mathematicians of "Hidden Figures" to do manual calculations
Deminpenn
Sunday
#5
Never saw that movie, but you said 'check the computer calculations' so computers were obviously used to a degree.
Celerity
Sunday
#7
No, the women checked the computers' outputs. Also see comments in this thread confirming that computers were used
Celerity
Monday
#58
Then anyone could "write" such a thesis because it would require minimal knowledge and the AI
highplainsdem
Sunday
#16
You can't enhance creativity with AI, any more than you enhance creativity asking someone else to
highplainsdem
Sunday
#9
Curious about what you mean when you say it inspires you. Do you mean you ask it for ideas?
highplainsdem
Sunday
#18
Okay, I'll give you an A+ for creativity just for writing a poem for a science communication workshop.
highplainsdem
Sunday
#49
Yours is one of the few nuanced takes I've read about one of the major faults with AI...
appmanga
Sunday
#54
Thanks, but I'm just trying to relay some of what I've heard from artists and writers and others
highplainsdem
Monday
#61
It isn't at all cool that AI is being widely used for cheating and students are learning less as a
highplainsdem
Sunday
#19
GenAI is never hallucination-free. I don't know where you got the idea that it is.
highplainsdem
Sunday
#23
It wasn't that long ago that Grok was identifying him as the main source of misinformation on X,
highplainsdem
Sunday
#46
You just contradicted what you said minutes ago about it being hallucination-free.
highplainsdem
Sunday
#31
I specifically said history topics along with other disciplines that don't change and are "set"
WarGamer
Sunday
#33
50 years ago if I told you I could hold a piece of glass and access global knowledge...
WarGamer
Sunday
#21
You don't know if it was "dead accurate" unless you took the time to check that those were the
highplainsdem
Sunday
#24
I find this discussion fascinating. It seems that the algorithm has figured out people are inherently lazy learners.
cayugafalls
Sunday
#38