LLM Benchmark Framework

Can AI maintain accuracy in low-resource languages?

To assess the accuracy of truthfulness evaluations, the study compares three methods: human evaluation, multiple-choice ...

Tech Xplore on MSN 1d

Continuous skill acquisition in robots: New framework mimics human lifelong learning

Humans are known to accumulate knowledge over time, which in turn allows them to continuously improve their abilities and ...

Tech Xplore on MSN 3d

Neuro-inspired AI framework uses reverse-order learning to enhance code generation

Large language models (LLMs), such as the model behind OpenAI's popular platform ChatGPT, have been found to successfully ...

16d

Hugging Face clones OpenAI’s Deep Research in 24 hours

Hugging Face researchers released an open source AI research agent called "Open Deep Research," created by an in-house team ...

Factbox: China's AI firms take spotlight with deals, low-cost models

Alibaba's announcement this week that it will partner with Apple to support iPhones' artificial intelligence services ...

Decrypt 8d

New Open Source AI Model Rivals DeepSeek's Performance—With Far Less Training Data

OpenThinker-32B achieved benchmark-beating results using just 14% of the data its Chinese competitor needed, marking a win ...

10d

LangChain shows AI agents aren’t human-level yet because they’re overwhelmed by tools

LangChain evaluated a single AI agent to see if its performance degrades when given more context and tools, essentially overwhelming it.

Psychology Today 15d

Can AI Make Doctors Think Deeper?

LLMs foster deeper clinical reasoning, prompting iterative, reflective decision-making in physicians. AI-augmented doctors score higher, spending 119 seconds more per case, improving accuracy without ...
