Nature | Science & Technology

OpenAI’s GPT-5 Hallucinates Less than Previous Models Do

“In one literature-review benchmark known as ScholarQA-CS, GPT-5 ‘performs well’ when it is allowed to access the web, says Akari Asai, an AI researcher at the Allen Institute for Artificial Intelligence, based in Seattle, Washington, who ran the tests for Nature. In producing answers to open-ended computer-science questions, for example, the model performed marginally better than human experts did, with a correctness score of 55% (based on measures such as how well its statements are supported by citations) compared with 54% for scientists, but just behind a version of the institute’s own LLM-based system for literature review, OpenScholar, which achieved 57%.

However, GPT-5 suffered when the model was unable to get online, says Asai. The ability to cross-check with academic databases is a key feature of most AI-powered systems designed to help with literature reviews. Without Internet access, GPT-5 fabricated or muddled half the number of citations that one of its predecessors, GPT-4o, did. But it still got them wrong 39% of the time, she says.

On the LongFact benchmark, which tests accuracy in long-form responses to prompts, OpenAI reported that GPT-5 hallucinated 0.8% of claims in responses about people or places when it was allowed to browse the web, compared with 5.1% for OpenAI’s reasoning model o3. Performance dropped when browsing was not permitted, with GPT-5’s error rate climbing to 1.4% compared with 7.9% for o3. Both models showed worse performance than did the non-reasoning model GPT-4o, which had an error rate of 1.1% when offline.”

From Nature.

Bloomberg | Space

Space Startup Beams More Laser Energy to Panels than Ever Before

“Aerospace startup Star Catcher Industries Inc., which is developing technology to beam solar power to orbiting satellites, said it wirelessly transmitted more electricity in a ground test than ever before, marking another step toward creating the equivalent of a space grid.

Using a suite of lasers, the company successfully sent energy to off-the-shelf solar panels positioned more than 1 kilometer (0.62 miles) away. The tests took place at NASA’s Kennedy Space Center in Florida last month.

The 1.1 kilowatt of converted electricity delivered at once exceeded the previous record set by the US government’s Defense Advanced Research Projects Agency, or Darpa. During Star Catcher’s multiday campaign, it beamed more than 10 megajoules of energy, according to the company.”

From Bloomberg.

Reuters | Motor Vehicles

Pony.ai Granted Citywide Driverless Robotaxi Permit in Shenzhen

“Chinese autonomous driving firm Pony.ai has been granted the first citywide permit for driverless commercial robotaxi operations in the city of Shenzhen in southern China, it said on Friday.

The permit was jointly granted to Pony.ai and the city’s largest taxi operator Xihu Group, the company said in a statement.”

From Reuters.

NASASpaceflight | Space

Vast Completes Haven-1 Structural Testing, Launches Mission

“Vast is a space station company founded in 2021 by Jed McCaleb. It launched its pathfinder, Haven Demo, aboard a SpaceX Falcon 9 Bandwagon-4 mission on Sunday, Nov. 02, at 01:09 AM EDT (05:09 UTC) from Space Launch Complex 40 (SLC-40) at the Cape Canaveral Space Force Station in Florida.

Vast’s first station, Haven-1, is due to launch NET May 2026, also aboard a Falcon 9.

Haven-1 isn’t just any payload; it aims to help Vast beat other contenders for NASA Commercial LEO Destinations Phase 2 (CLD) funding. These companies include Axiom Space (Axiom Station) and Voyager Space/Airbus (Starlab), among others…

Vast’s Haven Demo will test out key capabilities, such as Reaction Control Systems (RCS), power systems, and propulsion, in preparation for Haven-1.”

From NASASpaceflight.