Hello HN! We are building Jazzberry (https://jazzberry.ai), an AI bug finder that automatically tests your code when a pull request is opened, to find and flag real bugs before they are merged.

Here’s a demo video: https://www.youtube.com/watch?v=L6ZTu86qK8U#t=7

Jazzberry helps you find bugs in your code base. Here's how it works:

When a PR is created, Jazzberry clones the repo into a sandbox and puts the PR diff into the AI agent's context window. The agent can execute bash commands in the sandbox to interact with the rest of the code base, and the output of each command is fed back to it. It can read and write files, search, install packages, run executables and interpreters, and so on. It iteratively tests the code and observes the results to pinpoint bugs, which are then reported in the PR as markdown tables.
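
To make this concrete, here is a minimal sketch of that loop. It assumes a query_llm callable that returns either the next bash command or a final report; the names and the subprocess-based sandbox are illustrative, not our actual implementation:

    import subprocess

    def run_in_sandbox(cmd: str, repo_dir: str) -> str:
        # Illustrative stand-in: run a bash command in the cloned repo and
        # capture its output. The real sandbox is isolated from the host.
        result = subprocess.run(cmd, shell=True, cwd=repo_dir,
                                capture_output=True, text=True, timeout=120)
        return result.stdout + result.stderr

    def find_bugs(diff: str, repo_dir: str, query_llm, max_steps: int = 20) -> str:
        context = [f"PR diff:\n{diff}"]  # the diff seeds the agent's context window
        for _ in range(max_steps):
            action = query_llm(context)  # model proposes the next bash command, or stops
            if action["done"]:
                return action["report"]  # markdown bug table posted back on the PR
            output = run_in_sandbox(action["command"], repo_dir)
            context.append(f"$ {action['command']}\n{output}")  # observation fed back
        return "No confirmed bugs."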

Jazzberry focuses on dynamically testing code in a sandbox to confirm the existence of real bugs. We are not a code review tool. Our only goal is to provide concrete proof of what's broken, and how.

Here are some real examples of bugs we have found:

Authentication bypass (Critical) – When AUTH_ENABLED=False, the get_user dependency in home/api/deps.py returns the first superuser. This bypasses authentication and could lead to unauthorized access. It also falls back to the superuser if the Auth0-authenticated user is not in the database.
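
For illustration, here is a simplified, hypothetical reconstruction of that pattern (the names, data, and logic are our own sketch, not the project's actual code):

    AUTH_ENABLED = False  # hypothetical config flag from the report above

    USERS = {  # stand-in for the user table
        "alice@example.com": {"email": "alice@example.com", "superuser": True},
        "bob@example.com": {"email": "bob@example.com", "superuser": False},
    }

    def first_superuser():
        return next(u for u in USERS.values() if u["superuser"])

    def get_user(authenticated_email=None):
        if not AUTH_ENABLED:
            return first_superuser()  # bug: disabling auth silently grants superuser
        user = USERS.get(authenticated_email)
        if user is None:
            return first_superuser()  # bug: unknown Auth0 users fall back to superuser
        return user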

Insecure header processing (High) – The server does not validate header names and values, allowing malicious headers to be injected, which could lead to security issues.
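
As a hypothetical sketch of that pattern (not the server's actual code), an unvalidated value containing "\r\n" can smuggle extra headers into a raw response:

    def build_headers(user_value: str) -> bytes:
        # Vulnerable: user_value is interpolated without checking for CR/LF,
        # so a value like "x\r\nSet-Cookie: admin=1" injects a second header.
        return f"HTTP/1.1 200 OK\r\nX-Page: {user_value}\r\n\r\n".encode()

    def safe_header(name: str, value: str) -> str:
        # Fix: reject CR/LF in both header names and values.
        if any(c in "\r\n" for c in name + value):
            raise ValueError("invalid header")
        return f"{name}: {value}"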

API key leakage (High) – Different error messages in the browser console revealed whether API keys were valid. This let attackers brute-force valid credentials by distinguishing between format errors and authorization errors.
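
A hypothetical sketch of that oracle (the key format and messages are made up for illustration):

    import re

    VALID_KEYS = {"sk-" + "0" * 32}  # placeholder key store

    def check_key_leaky(key: str) -> str:
        if not re.fullmatch(r"sk-[0-9a-f]{32}", key):
            return "error: malformed API key"  # leaks that the format check failed
        if key not in VALID_KEYS:
            return "error: unauthorized"       # leaks that the format was right
        return "ok"

    def check_key_safe(key: str) -> str:
        # Fix: one generic message for every failure removes the oracle.
        if not re.fullmatch(r"sk-[0-9a-f]{32}", key) or key not in VALID_KEYS:
            return "error: invalid API key"
        return "ok"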

While working on this project, we realized how much the growth of LLM-generated code is increasing the need for automated testing. When dealing with thousands of LLM-generated lines of code, traditional code coverage metrics and manual review are already less effective. We believe this will only get worse over time, as the complexity of AI-authored systems will eventually require even more sophisticated AI tools for effective validation.

Mateo has a PhD in formal methods and reinforcement learning, with over 20 publications. Marco has an MSc in software testing, with a specialization in large language models (LLMs) for automated test generation.

We are actively developing Jazzberry and would love to hear your honest feedback!
