Montag, 29. Juni 2026

 

VibeKode.IT 2026

 

I had the great opportunity to visit VibeKode.IT which was held for the first time along the established MLConf in my hometown of Munich.

In this post I will try to sum up the talks I visited and present you with the gist of what I took from them. Should you spot any mistakes or have questions, feel free to leave them in the comments and I will try to address them. Sadly, I cannot link the talks themselves as those are not available publicly but only for conference participants. At the time of writing even some of the talk summaries have been removed from the website and app for some reason. Should they be added again I will link them.

Day 1

Keynote - Sebastian Meyen (Software & Support Media GmbH) & Dr. Pieter Buteners (Emma Legal)

The conference opened with a recap on the history of Machine Learning, charting its evolution into the current wave of agentic systems. As AI-assisted coding goes mainstream, the speakers emphasized that adaptation is no longer optional. The transition from "vibe coding" to structured agentic engineering marks a new maturity phase where developers must learn to guide complex multi-agent workflows rather than simple autocomplete prompts.

When Code Becomes Free:The Organizational Bottleneck of the AI Age - Markus Andrezak (überproduct)

Markus Andrezak framed the AI revolution alongside historical technological disruptions, noting the classic split between blind trust and complete skepticism. He illustrates this with a series of highly sceptic quotes along the lines of “People will not accept it” and “This can never work”. Only to later reveal these quotes were all in relation to Continuous Deployment, which after all is now an integral part of many development processes showing he is clearly in the camp of “anything is possible”.

As AI dramatically lowers the cost of writing code, the bottleneck is shifting from writing software to managing organizational flow and verification. To successfully rely on AI-generated code, organizations must build robust tooling and continuous integration pipelines.

Additionally, with AI creating software became a skill available to almost everyone which resembles the democratization of software development.

Andrezak invoked the philosophical concept of Geworfenheit (thrownness): AI is here, and we must actively shape how it improves our work instead of lamenting what it replaces.

How To Write Great Codewith LLMs #vibeless - Adam Bien (adam-bien.com)

Adam Bien is a household name at many conferences and especially in the Java ecosystem. In this session he shared his experiences and insights on working with LLMs in his daily business.

To break it down you could summarize it in two mantras:

  1. Keep your context small and clean, the more clutter you add, the worse the results will be
  2. Use well-known standards, he even goes so far as to say, not to use frameworks like spring or angular but have the LLM create plain java/js code

Especially the second one warrants some further explanation. The old APIs and SDKs are already deeply embedded in the training data of LLMs, for newer frameworks on the other hand training data is often sparse. According to him js code generated that way looks just like react and has no caveats regarding maintenance.

We did not see any actual code snippets supporting this, but it seems to work for him so maybe it is worth trying on a project sometimes.

What we did see was his setup for using kiro as SDD tool (Spec Driven Development) where he had his use case specification in a markdown file to generate code from and was able to have changes in either being reflected in the other by instructing his agent accordingly.

Critical Evaluation of AICoding Assistants: Productivity Gains vs. Security Risks - Maxim Salnikov(Microsoft)

After hearing about the possibilities and opportunities of AI we were in for a change of tone.

Salnikov traced the evolution of AI coding from simple chat interfaces to continuous, autonomous background assistants. With this increased autonomy comes a loss of observability and a host of new security vulnerabilities. 

New attack vectors include slopsquatting (creating fake packages which are engineered to take advantage of LLM hallucinations to be included in builds) and malicious unicode injections that are invisible to humans in github projects.

Since agents can access more than committed code, they present unique risks, especially those running with the same access privileges as your user.

Guardrails are needed: Restrict agent privileges, protect configuration files via CODEOWNERS, use sandboxing, and avoid "YOLO" flags.

Try to use Agent Package Manager (apm) to reduce the risk of using malicious agent skills, maintain security checks in pipelines, and keep a human in the loop.

From Local to Production:Running AI Workloads Anywhere with Docker + Serverless - Mikhail Rozhkov(Nebius)

To be honest, I was expecting something different from the title. Actually, the topic was not a docker + serverless setup for AI tooling but instead a sales pitch for a product called Nebius Serverless AI.

To summarize: Provisioning a GPU worker in cloud environments has several challenges which Serverless AI is helping to solve. So should you face the challenge of moving e.g. AI containers from on-prem to the cloud this product might be worth a look.

Spec Driven Development: The End of Vibe coding - Daniel Sogl (Thinktecture AG)

Daniel Sogl argued that the primary cause of software failure remains unchanged: bad, vague, or contradictory requirements. While AI agents are highly capable, they fail more often than realized due to five core limitations:

  •  Short memory: Even before the context window comes near begin full, information is lost and quality drops
  • Lack of trade-off awareness: Exiting code is copied even if it does not make sense for the current task, just because it is there
  • Ripple Blindness: Side-Effects are often missed, especially in more complex/grown architectures
  • Ignorance of company-specific rules: Many rules are not available to the agent
  • Silent failures: Sometimes they just pretend everything is fine, e.g. tests that are always green

A possible solution to this is Spec-Driven Development (SDD): Humans and AI collaborate to build a rigorous specification containing clear intents, constraints, and acceptance criteria using tools like Spec KitOpen Spec, or Kiro.

  • The Gold Rule: If an issue is found, restart by modifying the specification—do not fix the generated code manually (a point that interestingly contrasts with Adam Bien's approach of letting code changes update specs).
  • Current State: Spec tools are still missing features like refactoring or resolving conflicting specs and in general seem mostly suitable for greenfield projects but not for bugfixing in existing code bases 

Human in the Loop at Scale: From Coder to AI Orchestrator - Johannes Modersohn (Hermes Germany GmbH)

Johannes Modersohn shared Hermes’ real-world journey of scaling AI usage, which began small but quickly grew increasingly.

To kick-start AI-usage in their teams they partnered with training providers for intensive "AI Sprints" which brought teams up to speed quickly.

An important realization was that there will always be some form of resistance among developers. This can be caused by a variety of issues from anxiety about job security to cognitive overload from a growing number of code reviews.

These issues need not only to be recognized and acknowledged but also require suitable countermeasures. For one thing, common tooling and context engineering can help with cognitive load issues while psychological space and freedom are helpful to ease people into the transition.

Guardrails and Sanity Checks: Verifying LLM Input and Output for Developers - Enrique Lopez Manas (Snapp Mobile GmbH)

In contrast to the previous sessions this talk focused on LLM integration inside applications instead of the developer experience aspect. A key aspect here is reliability. Depending on your use-case you want different “levels” of reliability of the generated data. For this, requests to LLMs can be tuned via the temperature property, where a higher temperature results in more “creative” responses, e.g. an image generator might profit from higher temperatures than a research bot.

A more complex aspect is of course security. Processing input from users always bears certain risks of various attack vectors like prompt injections. To mitigate this we have a few tools at our disposal:

  • Input Control: Validate and sanitize inputs to prevent prompt injections (e.g., using regex to filter attack strings), or enforce use structured, safe prompts that instruct the LLM clearly which part of the context contains valid instructions
  • Output Validation: 
    • Enforce schema validation, monitor token budgets, and implement history management (sliding windows/summarizations)
    • Scan for sensitive data, run sanity checks against a secondary LLM, or ask for a summary check to detect logical anomalies. 

Human in the Loop Is Not aCheckbox - Tessa Pfattheicher (iteratec GmbH)

The last talk of a very long day. Not necessarily the most popular slot, but still th room as quite full when Tessa Pfattheicher challenged the simplistic "Human-in-the-Loop" (HITL) model, using iteratec’s maturity model to evaluate how organizations interact with AI.

The idea behind this is a layered schema with each layer representing a higher reliance on AI than the one before until to a point where no human is involved in the process anymore.

With the example of iteratec’s invoice process she illustrated how her company still considers the human factor a valuable asset that must not be delegated to AI completely.

  • True Integration: Humans must be deeply integrated into the actual workflow rather than acting as passive, end-of-process "approvers" but still utilize AI tools as much as sensibly possible
  • Complacency Risks: When people rely too heavily on AI validation, they become complacent and stop performing rigorous checks, leading to a false sense of security and potentially degrading trust in colleagues

Day 2

Harness Engineering: ThePressure Reactor of Agentic Engineering - Robert Glaser (Exxeta)

With the rise of AI assisted coding, the main bottleneck in software development moved from implementation to verification. So, we want to ensure generated code is easy to review by being concise and precise. One tool for this can be harness engineering.

But what exactly is the harness? Is it just your tool to interact with our agent like Pi or ClaudeCode or is it maybe even everything outside of the LLMs we use? There is no clear consensus on this yet. Robert Glaser gave an overview of different measures we can employ for this purpose.

To begin, we need to understand there are different kinds of harnesses, behavioral and structural. A behavioral harness can be instructions in your context e.g. via md files or system prompts. An issue with these is, as any other part of the context they can be “forgotten” or their understanding altered by other parts of the context.

Structural harnesses live outside of the context. It could be a simple loop over a call to an agent api that performs a fixed check on the result and continues the loop until the check is fulfilled. The LLM cannot try to game these constraints (unless we add information about to the context of course) so we can be sure certain criteria are met when we evaluate the response.

To further optimize the process, we can add backpressure to it, that could be automated quality gates, validation steps regarding architecture security or domain rules. If possible, those should be automated but especially for domain related feedback that usually has to come from PO or users outside the automated loop.

Anything that validates output from an LLM can be considered an outcome grader. Ideally those should be calibrated by providing positive and negative examples to compare output against.

Paired Programming with AI Agents: How Humans and Machines Build Together at FINN - Paul Beudert & Jan Heiselbetz (FINN)

To be honest the title of the talk suggested something entirely different to me than was ultimately presented. Paul and Jan illustrated how FINN is leaning very heavily into AI assisted development and tooling. The two of them have no actual engineering background but are utilizing AI at their company to build pipelines for several processes and even independent applications either on their own or assisting development teams.

For some of their developers this even goes as far as not using an IDE anymore.

MCP, A2A, and AG-UI: The Agentic Protocol Stack - Max Marschall (Thinktecture AG)

Max Marschall outlined the emerging technical standards that enable agents to work together seamlessly.

MCPs have been considered a plug & play portable extension to any AI tooling when they came out. Max gives various examples of MCPs and their use in different scenarios. Unfortunately, many MCPs also meant a high pollution of the context window which could cause quality loss.

With the new version of MCPs this does change as now those can then be added on a per use case basis, similar to skills. Further innovations include MCPs becoming stateless and the introduction of MCP-apps which allow you to also include a UI so the MCP becomes more self-contained, and its usage is more standardized.

A2A provides a standardized protocol for inter agent communications so implementation specifics on an agent are hidden, allowing for better interchangeability.

AG-UI also defines a protocol which allows for a new pattern for request flows. You can have frontend and backend parts which can both dispatch LLM calls and stream those calls and results together as they deem fit.

There was a lot more ground covered, so should this talk ever become available publicly I encourage you to take a look.

From Language to Reality: Why the Future of AI Is Spatial - Agata Chudzinska (theBlue.ai GmbH)

Agata Chudzinska addressed the ceiling of classically trained LLMs where in the past simply adding more training data resulted in more powerful models. But the curve of added data to model capabilities flattens more and more.

Illustrated by Moravec’s paradox, she explained that LLMs lack a real-world mental model, something humans learn intuitively.

There are approaches to add such world models to LLMs by building visual models and memory models. An example was a model playing Doom and trying to stay alive as long as possible, it was fed with random game movements as test data and in the end was able to avoid getting for quite           a while.

In general, there are 3 strategies on how to use such a world model. Generators and simulators are generative, while planners are predictive.

Generators like Genie3 can render e.g. a first-person view of realistic scenery on the fly so you can move around in it and the model creates new parts of that scenery in real-time.

A simulator can be fed a start and end image and then interpolates a realistic transition between the two. An example we saw were images of a robot arm in two different positions and then how the model built a sequence of how the arm would have to move for this transition.

Planners on the other hand can predict what will happen from the given input data onwards. All in all this was very interesting and I am curious to see how this will play out in the future.

Omega Programming: LeadingAI Systems That Write Code - Paul Dubs (Xpress AI)

Paul Dubs reminded the audience that the true output of software engineering is not the lines of code, but the mental model behind them. If there is no understanding of the created software, it is essentially dead.

The term omega programming refers to technics introduced by extreme programming mapped into the AI era where the agent acts as driver, but the human embodies the navigator.

Even if we let AI create everything, in the end we as humans must remain in the loop to be still in control of the product and own the code even though we shift at least partly from programmers to product managers.

Building Domain-Specific AI Agents On-Prem: Open-Source LLMs and MCP in Practice - Yatindra Shashi (Intel)

For enterprises dealing with strict privacy requirements, public cloud models represent a security liability. Shashi made the case for On-Premises AI, which provides complete data sovereignty.

In addition, using self-hosted LLMs allows for customization which can be advantageous in some scenarios. If you are dealing with very specific hardware like industrial robots a lot of the training data used for generic LLMs is useless.

He illustrated how his team used pruning and LoRA fine-tuning to reduce the size of models and used MCPs to increase efficiency and accuracy for their project. Also, the addition of domain specific guard rails like compiler output did help to improve results.

To keep up with changes in the base LLMs all such optimizations of course must be added to some kind of pipeline so they can be reapplied repeatedly.

Running Open WebUI or LibreChat with LiteLLM Proxy in Azure - Melanie Bauer (software architects)

Melanie Bauer gave an overview of her school project which used LiteLLM as a basis to host Open WebUI or LibreChat on Azure addressing several cross-cutting concerns like GDPR compliance, security, and budgeting of AI credits.

  • LiteLLM Proxy: An OpenAI-compatible gateway that provides means to configure provider endpoints, manage actual and virtual API keys, and enforces budgets.
  • Open WebUI provides a very versatile AI workbench for users to create their own workflows
  • LibreChat allows to build a robust, secure enterprise chatbot infrastructure

The actual topic was not that interesting for me personally and honestly, I was planning to visit the other talk of that slot but that one was cancelled. In hindsight I am grateful for that as I was very impressed by how Melanie handled herself. A 17-year-old student being able to hold a very competent talk in a foreign language and navigating through the pitfalls of a live demo (and of course there were a few, there always are) was unexpected.

Should you see that name on a future panel I encourage you to attend that talk, I know I will.

Conclusion

While HITL (Human in the loop) is a huge topic for a lot of people, we also saw proponents for eliminating interaction all together. Some advocate for context reduction at all costs while others claim more context delivers better results. For me the world of AI is still in its infancy, and we have differing opinions all over the place. But one thing is certain, there are interesting times ahead. Let’s try to make the best of it.