Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models.
Obsidian is already great, but my local LLM makes it better ...
LangGraph has been used to create a multi-agent large language model (LLM) coding framework. This framework is designed to automate various software development tasks, including coding, testing, and ...
AI coding agents from OpenAI, Anthropic, and Google can now work on software projects for hours at a time, writing complete apps, running tests, and fixing bugs with human supervision. But these tools ...
Despite the hype around AI-assisted coding, research shows LLMs only choose secure code 55% of the time, which points to fundamental limitations in their use.
Recently, AI risk and benefit evaluation company METR ran a randomized controlled trial (RCT) on a gaggle of experienced open source developers to gain objective data on how the use of LLMs affects their ...