AI still can’t answer questions about history: study

While artificial intelligence excels at tasks like coding and podcast generation, it struggles to accurately answer high-level history questions, according to a study.

Researchers tested OpenAI’s GPT-4, Meta’s Llama and Google’s Gemini using a newly developed benchmark called Hist-LLM.

The benchmark relies on the Seshat Global History Databank, a comprehensive database of historical knowledge.

The study, which was presented at the NeurIPS AI conference last month, found disappointing results, according to TechCrunch.

GPT-4 Turbo performed best but achieved only about 46% accuracy, barely above random guessing.

“LLMs, while impressive, still lack the depth required for advanced history,” said Maria del Rio-Chanona, a co-author of the paper and associate professor at University College London.

“They’re great for basic facts, but they fail at nuanced, PhD-level historical inquiries.”

Researchers found that LLMs often extrapolate from prominent historical data but struggle with more obscure details.

For instance, GPT-4 incorrectly stated that scale armor was present in ancient Egypt during a specific time period, when in reality the technology appeared only 1,500 years later.

Similarly, the model falsely claimed ancient Egypt had a professional standing army during a particular period, likely due to the prevalence of information on standing armies in other ancient empires, such as Persia.

“If you get told A and B 100 times, and C only once, you’re more likely to recall A and B,” del Rio-Chanona explained.

Another concern was potential bias.

OpenAI’s GPT-4 and Meta’s Llama models performed worse when answering questions about regions such as sub-Saharan Africa, indicating training data limitations.

“These biases suggest LLMs reflect gaps in historical documentation rather than an unbiased representation of history,” said Peter Turchin, the study’s lead researcher.

Despite these limitations, researchers remain hopeful that AI can assist historians in the future.

They plan to refine the Hist-LLM benchmark by incorporating more diverse data sources and increasing the complexity of the questions.

“Our findings highlight areas where LLMs need improvement, but they also showcase their potential to support historical research,” the paper concluded.

As AI continues to evolve, experts say it is clear that human historians remain irreplaceable in interpreting complex historical narratives and ensuring accuracy in academic inquiry.
